Title: XStream: a Signal-Oriented Data Stream Management System
1. XStream: a Signal-Oriented Data Stream Management System
- Lewis Girod, Yuan Mei, Ryan Newton, Stanislav Rost, Arvind Thiagarajan, Hari Balakrishnan, Samuel Madden
- CSAIL, MIT
2. XStream and Its Advantages
- A stream processing engine (SPE) for high-data-rate signal processing applications
- The SigSeg abstraction: efficient window-based signal processing
- The Sync operator: optimizes joining of streams in the time domain
- The Depth-First scheduler: minimizes scheduling and data-passing overhead
3. Example Application: Acoustic Localization
[Figure: operators VoiceDetector, Silence Filter, BeamForming]
DATA ABSTRACTION
7. Most Sensor Data is Isochronous
- Isochronous: occurring at equal intervals in time
8. Most Sensor Data is Isochronous
- A stream of <timestamp, sample> tuples is wasteful
- Inflates cache footprint
- Magnifies memory management costs
- Converting between windows and samples means stripping out and re-adding timestamps
- XStream adds an isochronous signal abstraction, SigSeg
- Timestamps are derived from index, sampling_rate, and phase
- SigSeg: a window of signal data
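To make the cost argument concrete, here is a minimal Python sketch of an isochronous window that stores one timebase instead of one timestamp per sample. The class `IsoWindow`, the helper `from_tuples`, and the mapping `timestamp = phase + index / sampling_rate` are assumptions made for illustration; the slides name the three quantities but do not give the formula, and this is not XStream's SigSeg implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class IsoWindow:
    """Hypothetical window of evenly spaced samples sharing one timebase."""
    sampling_rate: float   # samples per second (assumed unit)
    phase: float           # time of sample 0 in seconds (assumed meaning)
    samples: List[float]

    def timestamp(self, index: int) -> float:
        # Assumed mapping: evenly spaced samples starting at `phase`.
        return self.phase + index / self.sampling_rate

    def as_tuples(self) -> List[Tuple[float, float]]:
        # Re-adding timestamps: expand into <timestamp, sample> tuples.
        return [(self.timestamp(i), s) for i, s in enumerate(self.samples)]


def from_tuples(tuples: List[Tuple[float, float]], sampling_rate: float) -> IsoWindow:
    # Stripping timestamps: only the first one is kept, as the phase.
    phase = tuples[0][0]
    return IsoWindow(sampling_rate, phase, [s for _, s in tuples])


window = IsoWindow(sampling_rate=4.0, phase=10.0, samples=[0.5, 0.25, -0.5, -0.25])
tuples = window.as_tuples()
```

The window carries two numbers of timing metadata regardless of its length, while the tuple form carries one timestamp per sample.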
9. Signal Processing Applications Operate on Data Windows
- Windows used by a query may not align and may overlap
- Naïve implementations introduce copying and duplicate data
- XStream's implementation: a SigSeg is a reference to a time range of a signal
- Unreferenced portions of signal data are garbage-collected
10. SigSegs
- SigSeg API methods
- Signal(Timebase tb)
- SigSeg Signal.grow(T buf, size_t size)
- SigSeg ss.append(SigSeg otherSS)
- SigSeg ss.subseg(int start, int end)
- size_t ss.materialize(T buf)
- Data model
- Stream tuples may contain SigSegs (example: <boolean isWhiteNoise, SigSeg sound>)
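The API methods listed above can be illustrated with a small Python sketch in which a SigSeg is only a reference to an index range of a shared sample buffer. The semantics here are assumptions: indices stand in for time, `grow` takes a list rather than a buffer and size, `append` accepts only contiguous segments, and garbage collection of unreferenced data is not modeled. It is not the XStream implementation.

```python
from typing import List


class Signal:
    """Append-only sample buffer shared by every SigSeg that refers to it."""

    def __init__(self, timebase: str):
        self.timebase = timebase
        self.samples: List[float] = []

    def grow(self, buf: List[float]) -> "SigSeg":
        # Append new samples and return a SigSeg covering just that range.
        start = len(self.samples)
        self.samples.extend(buf)
        return SigSeg(self, start, len(self.samples))


class SigSeg:
    """A reference to the half-open index range [start, end) of a Signal."""

    def __init__(self, signal: Signal, start: int, end: int):
        self.signal, self.start, self.end = signal, start, end

    def __len__(self) -> int:
        return self.end - self.start

    def append(self, other: "SigSeg") -> "SigSeg":
        # In this sketch only contiguous ranges of the same signal can be joined.
        if other.signal is not self.signal or other.start != self.end:
            raise ValueError("segments are not contiguous")
        return SigSeg(self.signal, self.start, other.end)

    def subseg(self, start: int, end: int) -> "SigSeg":
        # Offsets are relative to this segment; no sample data is copied.
        if not 0 <= start <= end <= len(self):
            raise IndexError("subseg range out of bounds")
        return SigSeg(self.signal, self.start + start, self.start + end)

    def materialize(self, buf: List[float]) -> int:
        # The only place where samples are copied out; returns the count.
        buf[:] = self.signal.samples[self.start:self.end]
        return len(self)


sig = Signal("audio-clock")
a = sig.grow([1.0, 2.0, 3.0])
b = sig.grow([4.0, 5.0])
whole = a.append(b)
first = whole.subseg(0, 4)    # overlaps with `second` ...
second = whole.subseg(2, 5)   # ... yet both share one buffer
out: List[float] = []
n = second.materialize(out)
```

Overlapping or misaligned windows such as `first` and `second` cost two small reference objects each, and samples are copied only when `materialize` is called.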
TIME-BASED JOINS
12. The Time Join Operator
- Time join semantics
- Processes 2 streams at a time
- Aligns the streams by timestamps
- Outputs joined input tuples whose timestamps matched
- Possible implementation
- Hash join on timestamps, consume matching tuples from the other stream
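The "possible implementation" can be sketched as a symmetric hash join keyed on timestamp. This Python version is an illustration only: the function name `time_join`, the `(side, timestamp, value)` event format, and the assumption that timestamps are unique within each stream are all hypothetical.

```python
from typing import Any, Dict, Iterable, Iterator, Tuple

Event = Tuple[str, int, Any]   # (side, timestamp, value), side is "L" or "R"


def time_join(events: Iterable[Event]) -> Iterator[Tuple[int, Any, Any]]:
    """Yield (timestamp, left_value, right_value) for matching timestamps."""
    # One hash table per input, holding tuples that have not matched yet.
    tables: Dict[str, Dict[int, Any]] = {"L": {}, "R": {}}
    for side, ts, value in events:
        other = "R" if side == "L" else "L"
        if ts in tables[other]:
            # Consume the matching tuple from the other stream.
            match = tables[other].pop(ts)
            yield (ts, value, match) if side == "L" else (ts, match, value)
        else:
            tables[side][ts] = value


events = [("L", 0, "a0"), ("R", 0, "b0"), ("R", 1, "b1"), ("L", 2, "a2"), ("L", 1, "a1")]
joined = list(time_join(events))
```

Every sample pays for a hash insert or probe here, which is the per-tuple overhead that a window-based operator avoids.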
13. The Sync Operator
- Sync semantics: time join combined with select
- Processes N streams at a time
- Explicit control stream input
- <accept, start_time, end_time>
- <discard, start_time, end_time>
- Sync implementation
- N accumulator SigSegs, one control queue
- Matching based on windows and ranges
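The following Python sketch shows one way the pieces named above (N accumulators, one control queue, range-based matching) could fit together. It rests on several assumptions not stated in the slides: time is expressed as a sample index shared by all streams, control ranges arrive in time order without overlap, and plain lists stand in for accumulator SigSegs. The class and method names are hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple


class Sync:
    def __init__(self, n_streams: int):
        self.acc: List[List[float]] = [[] for _ in range(n_streams)]
        self.base = 0   # sample index of element 0 in every accumulator
        self.control: Deque[Tuple[str, int, int]] = deque()
        self.output: List[Tuple[List[float], ...]] = []

    def push_data(self, stream: int, samples: List[float]) -> None:
        self.acc[stream].extend(samples)
        self._drain()

    def push_control(self, action: str, start: int, end: int) -> None:
        if action not in ("accept", "discard") or start > end:
            raise ValueError("bad control tuple")
        self.control.append((action, start, end))
        self._drain()

    def _drain(self) -> None:
        while self.control:
            action, start, end = self.control[0]
            if start < self.base:
                raise ValueError("control ranges must be ordered and disjoint")
            if any(self.base + len(a) < end for a in self.acc):
                return   # some stream has not reached end yet; wait for data
            self.control.popleft()
            if action == "accept":
                # Direct offset into each accumulator; no index lookup needed.
                lo, hi = start - self.base, end - self.base
                self.output.append(tuple(a[lo:hi] for a in self.acc))
            # Accepted or discarded, data before `end` is no longer needed.
            drop = end - self.base
            for a in self.acc:
                del a[:drop]
            self.base = end


sync = Sync(2)
sync.push_control("discard", 0, 2)
sync.push_control("accept", 2, 4)
sync.push_data(0, [10, 11, 12, 13, 14])
sync.push_data(1, [20, 21, 22, 23])
```

In this run the first two samples of both streams are dropped without producing output, and the accepted range yields one tuple holding an aligned window from each stream.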
14. Sync is Better Than Time Join
- Window-based vs. tuple-at-a-time
- Direct offset into input streams to find the time ranges sought by control input
- No overhead of indices
- Buffering signal data in SigSegs is lightweight
- Versatile: can filter based on the control stream
- Time join would have to pass the whole stream of join results to a filter operator
SCHEDULING
16. Scheduling: FIFO
20. Scheduling: FIFO-TS
- Like FIFO, but operators yield after X seconds of processing on a given input queue
- X is the timeslice
- Similar to the Train Scheduler
21. Scheduling: RTC
- Run-to-completion: drain the input queues
27. Scheduling: DF
- Depth-first: call along the graph, one tuple at a time
33. Depth-First is Efficient
- Output tuples are allocated on the stack
- We also avoid the overhead of using a container to accumulate and pass around batches
- No need to queue or copy tuples
- Passing a tuple is a direct function call
- Minimizes the data cache footprint
- Minimal state along the edges of the query graph
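A minimal Python sketch of depth-first scheduling, in which emitting a tuple is a direct call into the downstream operator and no queue sits on the edge between them. The `Operator` class, the `traced` helper, and the three-stage pipeline are hypothetical, and Python cannot show the stack-allocation benefit, only the call order.

```python
from typing import Any, Callable, Iterable, List, Optional, Tuple

trace: List[Tuple[str, Any]] = []   # records the order in which operators run
results: List[Any] = []


class Operator:
    def __init__(self, fn: Callable[[Any], Iterable[Any]],
                 downstream: Optional["Operator"] = None):
        self.fn = fn                 # returns zero or more output tuples
        self.downstream = downstream

    def push(self, item: Any) -> None:
        for out in self.fn(item):
            if self.downstream is not None:
                # Passing a tuple is a direct function call, not an enqueue.
                self.downstream.push(out)


def traced(name: str, fn: Callable[[Any], Iterable[Any]]):
    def wrapper(item: Any) -> Iterable[Any]:
        trace.append((name, item))
        return fn(item)
    return wrapper


def collect(item: Any) -> Iterable[Any]:
    results.append(item)
    return []   # a sink emits nothing


sink = Operator(traced("sink", collect))
keep_loud = Operator(traced("filter", lambda x: [x] if abs(x) >= 2 else []), sink)
scale = Operator(traced("scale", lambda x: [x * 2]), keep_loud)

for sample in [1, 0.5, -3]:
    scale.push(sample)
```

The trace shows each input travelling through the whole graph before the next one is read, so no state accumulates on the edges.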
34. XStream Evaluation
[Figure: operators VoiceDetector, Silence Filter]
35. Summary of Evaluation Results
- Representation of signal data
- Samples as tuples → windows of samples: 2x
- Windows of samples → SigSegs: another 2x
- Sync with SigSegs vs. time join and select with windows
- About 10x speedup
- Depth-First scheduler: 1.2x-1.5x speedup over other schedulers
36. Conclusions and Current Work
- Contributions
- SigSegs: an efficient abstraction for isochronous streaming data
- Sync: a window-based time join combined with select
- Depth-First: a simple, low-overhead scheduler with good cache locality and inexpensive data management
- XStream significantly outperforms conventional SPEs in high-rate applications working on isochronous data
- Ongoing work for XStream
- Query language and compiler optimizations
- Sensor network component
- Multicore scheduling