Title: State-Slice: New Paradigm of Multi-query Optimization of Window-based Stream Queries
1 State-Slice: New Paradigm of Multi-query Optimization of Window-based Stream Queries
- Song Wang
- Elke Rundensteiner
- Database Systems Research Group
- Worcester Polytechnic Institute
- Worcester, MA, USA.
- Samrat Ganguly
- Sudeept Bhatnagar
- NEC Laboratories America Inc.
- Princeton, NJ, USA.
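2 Computation Sharing for Stream Processing
Figure: SPJA query network. Continuous queries are registered; streaming data flows through shared operators (selections, window joins with windows w1, w2, w3, and aggregations Agg) to produce streaming results.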
- New Challenges
- In-memory processing of stateful operators
- Stateful operators with various window constraints
3 Window Constraints for Stateful Operators
- Time-based sliding window constraints
- Each tuple has a timestamp
- Only tuples within W timeframe can form an output
- Observations
- States in the operator dominate memory usage
- State size is proportional to the input rate and window length
- Join CPU cost is proportional to the state size
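The observations above can be illustrated with a minimal sketch of a time-based sliding-window equi-join. The class name `SlidingWindowJoin`, the tuple layout `(timestamp, key)`, and the half-open window test `|Ta - Tb| < W` are assumptions made for illustration; they are not taken from the slides.

```python
from collections import deque


class SlidingWindowJoin:
    """Symmetric time-based sliding-window equi-join over streams A and B.

    Hypothetical sketch: tuples are (timestamp, key) and arrive in
    timestamp order; two tuples join if their keys match and their
    timestamps differ by less than `window`.
    """

    def __init__(self, window):
        self.window = window
        self.state = {"A": deque(), "B": deque()}

    def process(self, stream, ts, key):
        other = "B" if stream == "A" else "A"
        opp = self.state[other]
        # Purge: drop opposite-stream tuples that left the window.
        while opp and ts - opp[0][0] >= self.window:
            opp.popleft()
        # Probe: one comparison per stored tuple, so the CPU cost
        # grows with the state size.
        matches = [(ts, o_ts) for o_ts, o_key in opp if o_key == key]
        # Insert the new tuple into its own state.
        self.state[stream].append((ts, key))
        return matches

    def state_size(self):
        return len(self.state["A"]) + len(self.state["B"])
```

Feeding one tuple per time unit on each stream, the state settles at roughly rate times window length per stream, so doubling the window doubles the state that every probe has to scan.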
4 A Motivation Example
Q1: SELECT A.* FROM Temperature A, Humidity B
    WHERE A.LocationId = B.LocationId
    WINDOW w1 min
Q2: SELECT A.* FROM Temperature A, Humidity B
    WHERE A.LocationId = B.LocationId AND A.Value > Threshold
    WINDOW w2 min
Let w1 < w2
- Observations
- State AW1 overlaps with state AW2
- State BW1 overlaps with state BW2
- Joined results of Q1 and Q2 overlap
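5 Sharing with Selection Pull-up [CDF02, HFA03]
Figure: shared plan with selection pull-up. Streams A and B feed a single join with states A[w2] and B[w2]; a router (R) forwards all results toward Q2 through the selection σA and forwards results with |Ta - Tb| < W1 to Q1.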
- Selection pull up
- Using larger window (w2)
- [CDF02] J. Chen, D. J. DeWitt, and J. F. Naughton. Design and evaluation of alternative selection placement strategies in optimizing continuous queries. In ICDE02.
- [HFA03] M. A. Hammad, M. J. Franklin, W. G. Aref, and A. K. Elmagarmid. Scheduling for shared window joins over data streams. In VLDB03.
6 Sharing with Selection Pull-up [CDF02, HFA03]
- Pros
- Single Join Operator
- Cons
- Wasted Computation without Early Filtering
- Wasted State Memory without Early Filtering
- Per Output-Tuple Routing Cost
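To make the pros and cons above concrete, here is a small batch-style sketch of selection pull-up for the Q1/Q2 example. It is an illustration under assumptions, not the cited systems' implementation: the function name `pullup_shared_join`, the tuple layouts, and the nested-loop evaluation are all hypothetical, and a real engine would process the streams incrementally.

```python
def pullup_shared_join(a_tuples, b_tuples, w1, w2, threshold):
    """Selection pull-up, sketched as a batch nested-loop join.

    a_tuples: (ts, location_id, value); b_tuples: (ts, location_id).
    One join runs with the larger window w2 and no early filtering;
    a router then checks every joined result for Q1 and Q2.
    """
    q1, q2, wasted = [], [], 0
    for a_ts, a_loc, a_val in a_tuples:
        for b_ts, b_loc in b_tuples:
            gap = abs(a_ts - b_ts)
            if a_loc != b_loc or gap >= w2:
                continue  # not a result of the shared join
            # Router: per-output-tuple checks.
            to_q1 = gap < w1             # Q1 only wants the smaller window
            to_q2 = a_val > threshold    # Q2's selection, applied after the join
            if to_q1:
                q1.append((a_ts, b_ts))
            if to_q2:
                q2.append((a_ts, b_ts))
            if not (to_q1 or to_q2):
                wasted += 1              # joined, then discarded by the router
    return q1, q2, wasted
```

The `wasted` counter makes the cost visible: any A tuple that fails Q2's selection is still stored and joined over the full window w2, and its results outside w1 are thrown away only at the router.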
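7 Stream Partition with Selection Pushdown [KFH04]
Figure: shared plan with selection push-down. A split operator (S) partitions stream A by comparing A.Value with Threshold (<, >); two joins hold states A1[w1], B1[w1] and A2[w2], B2[w2]; a router (R) separates results with |Ta - Tb| < W1 from all results, and a union (U) merges the results delivered to Q1 and Q2.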
- Split stream A by A.Value
- Route shared join results
- [KFH04] S. Krishnamurthy, M. J. Franklin, J. M. Hellerstein, and G. Jacobson. The case for precision sharing. In VLDB04.
8 Stream Partition with Selection Pushdown [KFH04]
- Pros
- Selection pushdown: no wasted join computation
- Cons
- Multiple Join Operators
- Duplicated State Memory in Multiple Join Operators
- Per Output-Tuple Routing Cost
9 State-Slice: New Sharing Paradigm
- Key Ideas
- State-Slice Concept for Sliding Window Join
- Pipelined Chain of Join Slices
- Prospective Benefit
- Fine-grained Selection Push-down
- Pipelined Join Operators
- Avoiding Per-tuple Routing Cost
10 One-way State-Sliced Window Join
- The sliding window has a lower bound w1 and an upper bound w2: [w1, w2]
- A B tuple probes only those A tuples that are at least w1, but at most w2, older than itself
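The probing rule above can be sketched as a small operator. This is an illustrative sketch, not the authors' code: the class name `OneWaySlicedJoin`, the `(timestamp, key)` layout, and the half-open age test `w_start <= Tb - Ta < w_end` are assumptions (the slide says "at least" and "at most" without fixing the boundary case).

```python
from collections import deque


class OneWaySlicedJoin:
    """One-way sliced window join: only B tuples probe the A state.

    Hypothetical sketch. An A tuple matches a B tuple if their keys
    are equal and the A tuple is older by an amount in [w_start, w_end).
    """

    def __init__(self, w_start, w_end):
        self.w_start = w_start
        self.w_end = w_end
        self.a_state = deque()  # (timestamp, key), in arrival order

    def insert_a(self, ts, key):
        self.a_state.append((ts, key))

    def probe_b(self, ts, key):
        # Cross-purge: A tuples at least w_end older than the B tuple
        # no longer belong to this slice.
        purged = []
        while self.a_state and ts - self.a_state[0][0] >= self.w_end:
            purged.append(self.a_state.popleft())
        # Probe: keep only A tuples that are at least w_start older.
        matches = [
            (a_ts, ts)
            for a_ts, a_key in self.a_state
            if a_key == key and ts - a_ts >= self.w_start
        ]
        return matches, purged
```

For example, with the slice [2, 5) and A tuples at timestamps 0 to 4, a B tuple at timestamp 5 purges the A tuple from time 0 and joins with those from times 1, 2, and 3. The purged tuples are returned rather than dropped so that a following slice could take them over.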
11 The Chain of One-way State-Sliced Joins
- Split state memory into chain of joins
- No overlap of state memory in chain of joins
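A minimal sketch of such a chain follows, assuming a synchronous simulation in which each B tuple visits the slices in order and A tuples purged from one slice are handed to the next. The function name `chain_join`, the event layout, and the half-open slice boundaries are assumptions for illustration only.

```python
from collections import deque


def chain_join(events, boundaries):
    """Chain of one-way sliced joins (B probes A), simulated synchronously.

    events: time-ordered ('A' | 'B', timestamp, key) triples.
    boundaries: slice boundaries, e.g. [0, w1, w2] for two slices.
    Returns (results, states): one result list and one A state per slice.
    """
    n = len(boundaries) - 1
    states = [deque() for _ in range(n)]
    results = [[] for _ in range(n)]
    for stream, ts, key in events:
        if stream == "A":
            states[0].append((ts, key))  # new A tuples enter the first slice
            continue
        for i in range(n):  # the B tuple visits the slices in order
            end = boundaries[i + 1]
            while states[i] and ts - states[i][0][0] >= end:
                old = states[i].popleft()      # purged from slice i ...
                if i + 1 < n:
                    states[i + 1].append(old)  # ... and moved to slice i + 1
            results[i] += [
                (a_ts, ts) for a_ts, a_key in states[i] if a_key == key
            ]
    return results, states
```

Because a tuple lives in exactly one slice at a time, the slice states never overlap, and the union of the per-slice results equals the result of a single one-way join over the full window.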
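12 From One-way to Two-way Binary Join
Figure: chain of two-way sliced joins J1 and J2. Each A tuple and each B tuple has a male and a female copy. J1 holds the states of streams A and B for [0, w1]; J2 holds the states of streams A and B for [w1, w2]. Queue(s) connect J1 to J2, and a union (U) merges the joined results.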
- Intuitively, a combination of two one-way joins
- Two references (male and female) for each A or B tuple
- Male tuples are used to probe states
- Female tuples are inserted into, and cross-purged from, their respective states
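The male/female mechanics can be sketched as follows. This is a simplified, synchronous simulation under assumptions: the function name `two_way_chain`, the event layout, and the half-open slice boundaries are hypothetical, and the queues between joins are replaced by a plain loop over the slices.

```python
from collections import deque


def two_way_chain(events, boundaries):
    """Chain of two-way sliced joins with male and female tuple copies.

    events: time-ordered ('A' | 'B', timestamp, key) triples.
    boundaries: slice boundaries, e.g. [0, w1, w2].
    Returns one list of (a_ts, b_ts) results per slice.
    """
    n = len(boundaries) - 1
    # states[i][s]: female tuples of stream s currently held by slice i
    states = [{"A": deque(), "B": deque()} for _ in range(n)]
    results = [[] for _ in range(n)]
    for stream, ts, key in events:
        other = "B" if stream == "A" else "A"
        for i in range(n):  # the male copy travels down the chain
            opp = states[i][other]
            while opp and ts - opp[0][0] >= boundaries[i + 1]:
                female = opp.popleft()  # cross-purged by the male ...
                if i + 1 < n:
                    states[i + 1][other].append(female)  # ... into the next slice
            for o_ts, o_key in opp:  # the male probes the opposite state
                if o_key == key:
                    pair = (ts, o_ts) if stream == "A" else (o_ts, ts)
                    results[i].append(pair)
        states[0][stream].append((ts, key))  # the female copy enters the first slice
    return results
```

In this sketch a male tuple never enters any state; it only purges and probes. Female tuples are stored once and migrate from slice to slice as they age.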
13 State-Sliced Join Chain: The Example
- States of sliced joins in a chain are disjoint with each other
- → Minimize State Memory Usage
- Selection can be pushed down into middle of join chain
- → Avoid Unnecessary Resource Waste
- No routing step is needed
- → Avoid Per Output-Tuple Routing Cost Completely
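For the Q1/Q2 example, a two-slice chain with the selection placed between the slices might look like the sketch below. It is a hedged illustration, not the authors' implementation: the function name `shared_chain`, the event layout, the half-open windows, and the choice to derive Q2's small-window results by filtering the first slice's output are assumptions.

```python
from collections import deque


def shared_chain(events, w1, w2, threshold):
    """Two-slice chain sharing Q1 (window w1) and Q2 (window w2, A.Value > threshold).

    events: time-ordered ('A', ts, loc, value) or ('B', ts, loc, None).
    Returns (q1, q2) as lists of (a_ts, b_ts) pairs.
    """
    s1 = {"A": deque(), "B": deque()}  # slice [0, w1): all tuples
    s2 = {"A": deque(), "B": deque()}  # slice [w1, w2): A tuples are filtered first
    q1, q2 = [], []
    for stream, ts, loc, value in events:
        other = "B" if stream == "A" else "A"
        me = (ts, loc, value)
        # Slice 1: cross-purge, then probe.
        while s1[other] and ts - s1[other][0][0] >= w1:
            old = s1[other].popleft()
            # Selection in the middle of the chain: only A tuples that
            # pass the predicate are stored in the second slice.
            if other == "B" or old[2] > threshold:
                s2[other].append(old)
        for o in s1[other]:
            if o[1] == loc:
                a, b = (me, o) if stream == "A" else (o, me)
                q1.append((a[0], b[0]))      # slice 1 output is Q1's answer
                if a[2] > threshold:
                    q2.append((a[0], b[0]))  # filtered copy also feeds Q2
        s1[stream].append(me)
        # Slice 2: a probing A tuple continues only if it passes the selection.
        if stream == "A" and not value > threshold:
            continue
        while s2[other] and ts - s2[other][0][0] >= w2:
            s2[other].popleft()
        q2 += [
            (ts, o[0]) if stream == "A" else (o[0], ts)
            for o in s2[other]
            if o[1] == loc
        ]
    return q1, q2
```

Each slice's output goes directly to the queries that need it, so no router inspects joined results, and A tuples failing the selection are neither stored nor joined beyond w1.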
14 Summary: State-Sliced Join Chain
- Pros
- Minimized Memory Usage
- Reduced Routing Cost
- No Need of Operator Synchronization in the Chain
- Cons
- Stream traffic between pipelined joins
- Purge cost
15 Sharing via Chains: Memory-Optimal Chain
16 Mem-Optimal Chain? CPU-Optimal Chain?
- Overheads
- Too many operators may increase system context switch cost
- Too many sliced states increase purging cost
17 Merging Sliced Joins
- Tradeoff
- Gain from Merging
- Reduce number of Join operators
- Reduce extra purging cost
- Loss from Merging
- Introduce routing cost
- Increase memory usage due to selection pullup
- Cost Model for CPU Usage
18 CPU-Opt. Chain: Search Space and Solution
Legend: vi = window start/end time; edge vi to vj = one slice window
Figure: directed graph over the nodes v0, v1, v2, v3, v4, v5.
Shortest path problem
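The shortest-path formulation can be sketched with a simple dynamic program over the boundary nodes, since edges only go from vi to vj with i < j. The function name `cpu_optimal_chain` and the `edge_cost` callback are hypothetical; the actual CPU cost model is not given in these slides, so the cost is left as a caller-supplied assumption.

```python
def cpu_optimal_chain(boundaries, edge_cost):
    """Pick the cheapest chain of slices as a shortest path from v0 to vm.

    boundaries: sorted window boundaries [v0, v1, ..., vm].
    edge_cost(i, j): assumed CPU cost of one (possibly merged) slice
        covering [boundaries[i], boundaries[j]); supplied by the caller.
    Returns (total_cost, slices).
    """
    m = len(boundaries) - 1
    best = [float("inf")] * (m + 1)
    prev = [None] * (m + 1)
    best[0] = 0.0
    # Nodes are already in topological order, so one forward pass suffices.
    for j in range(1, m + 1):
        for i in range(j):
            cost = best[i] + edge_cost(i, j)
            if cost < best[j]:
                best[j], prev[j] = cost, i
    # Walk back from vm to v0 to recover the chosen slices.
    path, k = [], m
    while k is not None:
        path.append(k)
        k = prev[k]
    path.reverse()
    slices = [(boundaries[a], boundaries[b]) for a, b in zip(path, path[1:])]
    return best[m], slices
```

With a toy cost such as `lambda i, j: 4 + 3 * (j - i - 1)` (a fixed per-operator overhead plus a penalty per merged boundary), the path over `[0, 1, 2, 3]` collapses to the single slice `(0, 3)`; lowering the overhead from 4 to 1 keeps all three slices separate. Both cost functions are made up purely to show the trade-off.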
19 Summary: Mem-Opt. vs. CPU-Opt. Join Chain
- Mem-Optimal
- Minimized Memory Usage
- Higher System Overhead
- Higher Purging Cost
- CPU-Optimal
- Minimized CPU Usage
- More Memory Usage if Selection is Pulled Up to Merge Slices
20 Experimental: WPI Stream Engine CAPE
Software Demonstration, VLDB04
21 Experiment Study 1: Memory Consumption
22 Experiment Study 2: Total Service Rate
23 Experiment Study 3: Mem-Opt. vs. CPU-Opt.
Window Distributions Used for 12 Queries.
Small-Large 12 Queries
Small-Large 24 Queries
24 Conclusion
- Pipelined state sliced join chain
- Mem-Optimal chain construction
- CPU-Optimal chain construction
- Implemented in CAPE
- Performance evaluation
25 Thank You!
Visit CAPE Homepage: http://davis.wpi.edu/dsrg/CAPE/index.html
Supported by
CRI grant CNS 05-51584