Title: Continuous Stream Monitoring Technology
- Elke A. Rundensteiner
- Database Systems Research Laboratory
- Department of Computer Science
- Worcester Polytechnic Institute, USA
- rundenst _at_ cs.wpi.edu
- November 2006
A Database . . .
- Vast amounts of electronic information in organizations, companies, and scientific institutes need to be organized, stored securely, and accessed efficiently and easily.
- Three common steps
- Make schema design
- Load database
- Query static database
[Figure: the one-time query "Select name from employee" is submitted to the DBMS, which evaluates it over the stored database]
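The three steps can be illustrated with Python's built-in sqlite3 module. This is a minimal sketch: the employee table and its rows are made up for illustration. The point is that the query is evaluated once, over data that is already stored, and returns a finite answer.

```python
import sqlite3

# Step 1: schema design (a hypothetical employee table for illustration)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT)")

# Step 2: load the database
conn.executemany("INSERT INTO employee (name) VALUES (?)", [("Alice",), ("Bob",)])
conn.commit()

# Step 3: query the static database; the query runs once and returns a finite answer
names = [row[0] for row in conn.execute("SELECT name FROM employee ORDER BY id")]
print(names)  # ['Alice', 'Bob']
```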
A Look at Modern Data Streams!
- Digital radio telescopes
- Network traffic flow
- Stock tickers/feeds
- Sensor networks
- Web usage transactions
- Outpatient care
- Environmental instruments
[Figure: a DSMS filters and transforms the radio-telescope signal stream, e.g. with the continuous query: select fft(s) from radiosignal s where source(s) = Antenna1]
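A minimal Python sketch of the filter-then-transform query above. The tuple layout (RadioSignal) is hypothetical, and a naive discrete Fourier transform stands in for fft(); a real system would call an FFT library and would not hold the stream in a list.

```python
import cmath
from typing import Iterable, Iterator, List, NamedTuple


class RadioSignal(NamedTuple):
    """Hypothetical tuple layout: the antenna that produced it plus a block of samples."""
    source: str
    samples: List[float]


def dft(xs: List[float]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform, standing in for fft()."""
    n = len(xs)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(xs))
        for k in range(n)
    ]


def fft_query(stream: Iterable[RadioSignal], antenna: str) -> Iterator[List[complex]]:
    """Continuous version of: select fft(s) from radiosignal s where source(s) = antenna."""
    for s in stream:              # tuples are consumed one at a time as they arrive
        if s.source == antenna:   # Filter
            yield dft(s.samples)  # Transform


signals = [
    RadioSignal("Antenna1", [1.0, 0.0, 0.0, 0.0]),
    RadioSignal("Antenna2", [0.0, 1.0, 0.0, 0.0]),
]
results = list(fft_query(signals, "Antenna1"))
```

Because fft_query is a generator, the same code can consume an unbounded iterator of incoming signals and emit one transformed result per matching tuple.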
Databases: Everything is Upside Down!
[Figure: a traditional DBMS evaluates one-time queries over static data]
Continuous Queries on Data Streams
Online Stream Monitoring
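To make the inversion concrete, here is a hedged Python sketch; the class and method names are hypothetical and are not CAPE's API. Queries are registered once and stay resident in the engine, and each arriving tuple is pushed through all of them, so answers are produced incrementally rather than once.

```python
from typing import Callable, Dict, List

Predicate = Callable[[dict], bool]


class ContinuousQueryEngine:
    """Queries are registered once and stay resident; data is pushed through them."""

    def __init__(self) -> None:
        self.queries: Dict[str, Predicate] = {}
        self.results: Dict[str, List[dict]] = {}

    def register(self, name: str, predicate: Predicate) -> None:
        self.queries[name] = predicate
        self.results[name] = []

    def push(self, tup: dict) -> None:
        # Each arriving tuple is evaluated against every standing query; matches
        # are appended to that query's result stream (unbounded in a real system).
        for name, predicate in self.queries.items():
            if predicate(tup):
                self.results[name].append(tup)


engine = ContinuousQueryEngine()
engine.register("big_trades", lambda t: t["volume"] >= 1000)
for tick in [{"sym": "IBM", "volume": 500}, {"sym": "ORCL", "volume": 2000}]:
    engine.push(tick)
```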
Motivating Applications Everywhere
- Traffic management: streams of cars and mobile requests
- Market analysis: streams of stock exchange data
- Critical care: streams of vital sign measurements
- Physical plant monitoring: streams of RFID/environmental readings
- Emergency response: streams of sensor and people-tracking data
Mobile Traffic-Related Streams
- Moving objects
- Dynamic range query
- Dynamic kNN query
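A continuous range query over moving objects can be kept up to date incrementally instead of being re-run from scratch. The sketch below is one simple way to do this, under the assumption that objects report 2-D positions and the query is an axis-aligned rectangle; all names are hypothetical, and kNN queries are not covered. It reports only changes to the answer, both when an object moves and when the query region itself moves.

```python
from typing import Dict, Optional, Set, Tuple

Point = Tuple[float, float]
Delta = Tuple[str, str]  # ("+", obj) = object entered, ("-", obj) = object left


class ContinuousRangeQuery:
    """Keeps 'which objects are inside this rectangle?' up to date incrementally."""

    def __init__(self, xmin: float, ymin: float, xmax: float, ymax: float) -> None:
        self.box = (xmin, ymin, xmax, ymax)
        self.positions: Dict[str, Point] = {}  # latest known location of each object
        self.inside: Set[str] = set()          # current query answer

    def _contains(self, p: Point) -> bool:
        xmin, ymin, xmax, ymax = self.box
        return xmin <= p[0] <= xmax and ymin <= p[1] <= ymax

    def update_object(self, obj: str, pos: Point) -> Optional[Delta]:
        """A moving object reports a new location; return the change to the answer, if any."""
        self.positions[obj] = pos
        now_in, was_in = self._contains(pos), obj in self.inside
        if now_in and not was_in:
            self.inside.add(obj)
            return ("+", obj)
        if was_in and not now_in:
            self.inside.discard(obj)
            return ("-", obj)
        return None

    def move_query(self, xmin: float, ymin: float, xmax: float, ymax: float) -> Set[Delta]:
        """The query region itself moves; re-evaluate it against the known positions."""
        self.box = (xmin, ymin, xmax, ymax)
        new_inside = {o for o, p in self.positions.items() if self._contains(p)}
        entered = {("+", o) for o in new_inside - self.inside}
        left = {("-", o) for o in self.inside - new_inside}
        self.inside = new_inside
        return entered | left


q = ContinuousRangeQuery(0, 0, 10, 10)
q.update_object("car1", (5, 5))     # -> ("+", "car1")
q.update_object("car2", (15, 5))    # -> None
q.move_query(10, 0, 20, 10)         # -> {("-", "car1"), ("+", "car2")}
```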
Spatio-Temporal Continuous Tracking
- Monitor the traffic in the red areas
- Continuously return the area covered by the herd during the migration
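One way to make "the area covered by the herd" concrete, assumed here purely for illustration, is the area of the convex hull of each animal's latest position. The sketch below recomputes it on every position report using Andrew's monotone chain and the shoelace formula; an incremental hull would be cheaper, and the query class is hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns the hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(poly: List[Point]) -> float:
    """Shoelace formula for a simple polygon given in vertex order."""
    n = len(poly)
    twice = sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
                for i in range(n))
    return abs(twice) / 2.0


class HerdAreaQuery:
    """Continuously reports the convex-hull area of each animal's latest position."""

    def __init__(self) -> None:
        self.latest: Dict[str, Point] = {}

    def update(self, animal: str, pos: Point) -> float:
        self.latest[animal] = pos  # a new report replaces the animal's old position
        return polygon_area(convex_hull(list(self.latest.values())))


herd = HerdAreaQuery()
for animal, pos in [("a", (0, 0)), ("b", (4, 0)), ("c", (4, 4)), ("d", (0, 4)), ("e", (2, 2))]:
    area = herd.update(animal, pos)
print(area)  # 16.0
```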
FireEngine Project: Sensors in Rooms
Fire Monitoring Queries
- Track smoke and heat clouds (moving clusters) in terms of their sizes and speeds.
- Is there an outlier (a prank) or an actual fire?
- Match sensor readings of the fire against a fire-stream simulation to determine their similarity.
- Are any sensors faulty, and should they thus be ignored?
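The outlier question can be approached with a distance-based definition over a sliding time window: a reading is suspicious if few other recent readings are close to it. The sketch below assumes that definition, a one-dimensional temperature value, and a hypothetical reading layout; it is not the FireEngine project's method. Note that the first readings of a genuine fire are also flagged until enough neighbouring readings arrive.

```python
from collections import deque
from typing import Deque, Tuple

Reading = Tuple[int, str, float]  # (timestamp, sensor_id, temperature); assumed layout


class WindowOutlierDetector:
    """Flags a new reading as an outlier if fewer than k readings in the current time
    window lie within distance r of it (a distance-based outlier definition)."""

    def __init__(self, window: int, r: float, k: int) -> None:
        self.window, self.r, self.k = window, r, k
        self.readings: Deque[Reading] = deque()

    def insert(self, reading: Reading) -> bool:
        ts, _, value = reading
        # Expire readings that have slid out of the time window.
        while self.readings and self.readings[0][0] <= ts - self.window:
            self.readings.popleft()
        neighbours = sum(1 for _, _, v in self.readings if abs(v - value) <= self.r)
        self.readings.append(reading)
        return neighbours < self.k


detector = WindowOutlierDetector(window=10, r=5.0, k=2)
stream = [(1, "s1", 21.0), (2, "s2", 22.0), (3, "s3", 20.5), (4, "s4", 23.0),
          (5, "s5", 90.0),                      # a single hot reading: prank or faulty sensor?
          (6, "s6", 88.0), (7, "s7", 92.0)]     # neighbours heat up too: more like a fire
flags = [detector.insert(rd) for rd in stream]
```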
Dynamicity in Stream Query Processing
[Figure: continuous queries are registered with a scalable stream query engine; streaming data is pushed in (push-based paradigm) and a streaming result flows out]
- High workload of queries
- Real-time and accurate responses required
- Input streams may have time-varying rates and high volumes
- Available resources for executing each operator may vary over time
- Memory and CPU resource limitations (continuous evaluation)
New query processing technology is required.
Execution of Queries
[Figure: a query plan built from Slide and Tumble window operators and other query operators]
- Queries: graph query plan
- Boxes: query operators such as Filter or Join
- Arcs: streams with time-stamped tuples
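The box-and-arc model can be sketched in a few lines of Python. This is an illustration under stated assumptions, not CAPE's implementation: arcs are FIFO queues of (timestamp, value) tuples, and the boxes are a Filter and a hypothetical TumbleSum that aggregates over tumbling (non-overlapping) windows. Sliding windows (Slide) are omitted, and tuples are assumed to arrive in timestamp order.

```python
from collections import deque
from typing import Callable, Deque, Optional, Tuple

StreamTuple = Tuple[int, float]  # (timestamp, value)


class Arc:
    """An arc of the plan graph: a FIFO queue of time-stamped tuples."""

    def __init__(self) -> None:
        self.queue: Deque[StreamTuple] = deque()


class Filter:
    """Box: forwards only the tuples that satisfy a predicate."""

    def __init__(self, pred: Callable[[StreamTuple], bool], inp: Arc, out: Arc) -> None:
        self.pred, self.inp, self.out = pred, inp, out

    def run(self) -> None:
        while self.inp.queue:
            t = self.inp.queue.popleft()
            if self.pred(t):
                self.out.queue.append(t)


class TumbleSum:
    """Box: sums values over tumbling windows of `size` time units and emits
    (window_start, sum) once a later tuple shows that the window has closed."""

    def __init__(self, size: int, inp: Arc, out: Arc) -> None:
        self.size, self.inp, self.out = size, inp, out
        self.current: Optional[int] = None  # start of the open window
        self.total = 0.0

    def run(self) -> None:
        while self.inp.queue:
            ts, value = self.inp.queue.popleft()
            start = ts - ts % self.size
            if self.current is not None and start != self.current:
                self.out.queue.append((self.current, self.total))
                self.total = 0.0
            self.current = start
            self.total += value


# Plan: source --> Filter(value > 0) --> TumbleSum(size=10) --> sink
source, filtered, sink = Arc(), Arc(), Arc()
plan = [Filter(lambda t: t[1] > 0, source, filtered), TumbleSum(10, filtered, sink)]
source.queue.extend([(1, 5.0), (3, -2.0), (7, 1.0), (12, 4.0), (25, 2.0)])
for box in plan:  # run boxes in topological order
    box.run()
print(list(sink.queue))  # [(0, 6.0), (10, 4.0)]
```

The window starting at 20 is still open, so its sum has not been emitted yet.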
Execution of Queries
[Figure: several query plans built from Slide and Tumble window operators and other query operators, each feeding an application (App)]
Execution via Operator Scheduling
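Executing a plan means repeatedly choosing which box runs next. The sketch below uses plain round-robin scheduling with a fixed batch size, which is only one possible policy and an assumption here; the operator class and names are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, List, Optional


class Operator:
    """A box that consumes tuples from its input queue and appends results to its output queue."""

    def __init__(self, name: str, fn: Callable[[int], Optional[int]],
                 inp: Deque[int], out: Deque[int]) -> None:
        self.name, self.fn, self.inp, self.out = name, fn, inp, out

    def step(self, batch: int) -> int:
        """Process up to `batch` queued tuples and return how many were consumed."""
        done = 0
        while self.inp and done < batch:
            result = self.fn(self.inp.popleft())
            if result is not None:  # None means the tuple was filtered out
                self.out.append(result)
            done += 1
        return done


def round_robin(operators: List[Operator], batch: int = 2) -> List[str]:
    """Give each operator a turn of at most `batch` tuples until all queues are empty."""
    trace: List[str] = []
    while any(op.inp for op in operators):
        for op in operators:
            if op.step(batch):
                trace.append(op.name)
    return trace


q0: Deque[int] = deque([1, 2, 3, 4, 5])
q1: Deque[int] = deque()
q2: Deque[int] = deque()
select = Operator("select", lambda x: x if x % 2 else None, q0, q1)  # keep odd values
double = Operator("map", lambda x: 2 * x, q1, q2)
trace = round_robin([select, double], batch=2)
```

Other policies (for example, favouring the operator that reduces queued tuples fastest) only change which operator `round_robin` would pick next; the operators themselves stay the same.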
Adaptation Techniques in CAPE
- On-Line Query Plan Reshaping
- (with Yali Zhu and G. Heineman)
Published in ACM SIGMOD 2004; under submission to the TODS journal, 2006
Query Optimization
[Figure: two equivalent join plans over streams A, B and C: (A ⋈ B) ⋈ C and A ⋈ (B ⋈ C)]
How can we optimize a query that is continuously running?
Run-time Plan Re-Optimization
- Step 1: Decide when to optimize
- Statistics monitoring
- Step 2: Generate a new query plan
- Query optimization
- Step 3: Replace the current plan with the new plan
- Plan migration
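The three steps can be sketched for the two join orders over A, B and C. The cost model is a deliberately simplified assumption: a plan is costed by the output rate of its intermediate join, estimated from monitored input rates, join selectivities and the window size W. The threshold, names and numbers are hypothetical, and step 3 is only marked, not implemented.

```python
from typing import Dict

WINDOW = 10.0  # sliding-window size W in seconds (assumed)


def join_output_rate(left: str, right: str, rates: Dict[str, float], sel: Dict[str, float]) -> float:
    """Rough output rate of a sliding-window join: each arriving tuple probes the
    ~rate * W tuples held in the opposite window, and a fraction `sel` of probes match."""
    return 2.0 * rates[left] * rates[right] * WINDOW * sel[left + right]


def plan_costs(rates: Dict[str, float], sel: Dict[str, float]) -> Dict[str, float]:
    """Step 2 (query optimization): cost each plan by the rate of its intermediate result."""
    return {
        "(AB)C": join_output_rate("A", "B", rates, sel),
        "A(BC)": join_output_rate("B", "C", rates, sel),
    }


def reoptimize(current: str, rates: Dict[str, float], sel: Dict[str, float],
               threshold: float = 1.2) -> str:
    """Step 1 (decide when to optimize): switch only if the monitored statistics show
    the current plan is clearly worse than the best one. Returns the plan to run."""
    costs = plan_costs(rates, sel)
    best = min(costs, key=lambda p: costs[p])
    if costs[current] > threshold * costs[best]:
        # Step 3 (plan migration) would be triggered here.
        return best
    return current
```

The threshold avoids migrating back and forth when the two plans cost about the same, since each migration has its own cost.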
Naïve Plan Migration Strategy
[Figure: migrating between the two equivalent join plans over A, B and C]
- Migration Steps
- Pause execution of old plan
- Drain out all tuples inside old plan
- Replace old plan by new plan
- Resume execution of new plan
Problem: works for stateless operators only
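For a pipeline of stateless operators, the four naive migration steps can be sketched as follows. The pipeline class and operators are hypothetical; the sketch only shows why the drain step terminates when no operator keeps state.

```python
from collections import deque
from typing import Callable, Deque, List, Optional

Op = Callable[[int], Optional[int]]  # stateless operator: maps a tuple to a tuple, or None to drop it


class Pipeline:
    """A chain of stateless operators; queues[i] feeds ops[i], queues[-1] holds the output."""

    def __init__(self, ops: List[Op]) -> None:
        self.ops = ops
        self.queues: List[Deque[int]] = [deque() for _ in range(len(ops) + 1)]

    def step(self) -> bool:
        """Advance one tuple through the first operator that has input; False when idle."""
        for i, op in enumerate(self.ops):
            if self.queues[i]:
                result = op(self.queues[i].popleft())
                if result is not None:
                    self.queues[i + 1].append(result)
                return True
        return False


def naive_migrate(old: Pipeline, new_ops: List[Op]) -> Pipeline:
    # (1) Pause: the caller stops feeding new input into old.queues[0].
    # (2) Drain: run the old plan until no tuple is left inside it. This terminates
    #     only because stateless operators hold no tuples between calls.
    while old.step():
        pass
    # (3) Replace the old plan by the new plan, keeping the results produced so far.
    new = Pipeline(new_ops)
    new.queues[-1] = old.queues[-1]
    # (4) Resume: the caller now feeds input into new.queues[0].
    return new


old = Pipeline([lambda x: x if x > 0 else None, lambda x: 10 * x])  # filter, then map
old.queues[0].extend([3, -1, 2])
new = naive_migrate(old, [lambda x: 10 * x, lambda x: x if x > 0 else None])  # map, then filter
new.queues[0].append(5)
while new.step():
    pass
print(list(new.queues[-1]))  # [30, 20, 50]
```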
Stateful Operator in Streaming
- Why stateful?
- Need non-blocking operators
- Operators need to output partial results
Symmetric hash join: for each new tuple from A, purge state B, join the tuple with state B, and insert it into state A (and symmetrically for tuples from B).
[Figure: symmetric hash join A ⋈ B with State A and State B]
Key observation: the purge of tuples in states relies on the processing of new tuples.
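The purge / join / insert sequence can be sketched as a sliding-window symmetric hash join in Python. The tuple layout, window semantics (a tuple expires once it is W or more time units older than the newest tuple) and names are assumptions for illustration; timestamps are assumed to be non-decreasing across both inputs. The last lines show the key observation: old B tuples linger in state B until a new A tuple arrives to trigger their purge.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple

Tup = Tuple[int, str, str]  # (timestamp, join_key, payload); hypothetical layout


class SymmetricHashJoin:
    """Sliding-window symmetric hash join A ⋈ B on join_key with window W."""

    def __init__(self, window: int) -> None:
        self.window = window
        # Hash tables (join_key -> tuples) plus arrival-ordered queues used for purging.
        self.state: Dict[str, Dict[str, List[Tup]]] = {"A": defaultdict(list), "B": defaultdict(list)}
        self.arrival: Dict[str, Deque[Tup]] = {"A": deque(), "B": deque()}

    def _purge(self, side: str, now: int) -> None:
        q = self.arrival[side]
        while q and q[0][0] <= now - self.window:
            old = q.popleft()
            bucket = self.state[side][old[1]]
            bucket.remove(old)
            if not bucket:
                del self.state[side][old[1]]

    def insert(self, side: str, tup: Tup) -> List[Tuple[Tup, Tup]]:
        other = "B" if side == "A" else "A"
        self._purge(other, tup[0])                    # 1. purge the opposite state
        matches = self.state[other].get(tup[1], [])  # 2. probe (join with) the opposite state
        out = [(tup, m) if side == "A" else (m, tup) for m in matches]
        self.state[side][tup[1]].append(tup)          # 3. insert into its own state
        self.arrival[side].append(tup)
        return out


join = SymmetricHashJoin(window=10)
join.insert("A", (1, "k", "a1"))   # -> []
join.insert("B", (5, "k", "b1"))   # -> [((1, 'k', 'a1'), (5, 'k', 'b1'))]
join.insert("B", (12, "k", "b2"))  # purges a1 from state A -> []
# Without new A tuples, b1 and b2 stay in state B however old they get.
print(len(join.arrival["B"]))      # 2
```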
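Naïve Migration Strategy Revisited
[Figure: naive migration of the join plan over A, B and C through the stages (2) all tuples drained, (3) old plan replaced by new, (4) processing resumed]
- Steps
- (1) Pause execution of old plan
- (2) Drain out all tuples inside old plan
- (3) Replace old plan by new plan
- (4) Resume execution of new plan
Problem: deadlock waiting. Because tuples in the join states are purged only when new tuples are processed, step (2) cannot complete while execution is paused, so the migration waits indefinitely.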
Proposed Dynamic Migration Strategies
- Moving State Strategy
- Parallel Track Strategy
Moving State Strategy
- Basic idea: share common states between the two boxes
- Key steps
- Identify common states (state matching)
- Share common states (state moving)
- Recompute unmatched states (state recomputing)
Moving State Strategy
- State matching
- Each state in the old box has a unique ID
- During rewriting, each new state in the new box is given a new ID
- When rewriting is done, states are matched based on their IDs
- State moving
- Between matched states
- On the same machine, creates new pointers in the new box to the matched states
- What's left?
- Unmatched states in the new box
[Figure: old box ((A ⋈ B) ⋈ C) ⋈ D with states SA, SB, SAB, SC, SABC, SD, and new box A ⋈ ((B ⋈ C) ⋈ D) with states SA, SB, SC, SBC, SD, SBCD; both read input queues QA, QB, QC, QD and produce QABCD]
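State matching and moving can be sketched as a dictionary lookup followed by pointer sharing. The ID scheme here is an assumption: a state's ID is the set of input streams its tuples combine, so SBC in either box gets the same ID. Applied to the two boxes in the figure, only SBC and SBCD remain unmatched.

```python
from typing import Dict, FrozenSet, List, Optional

StateId = FrozenSet[str]  # assumed ID scheme: the set of input streams a state's tuples combine


def state_id(streams: str) -> StateId:
    return frozenset(streams)  # "BC" -> frozenset({"B", "C"})


def move_states(old_box: Dict[StateId, list], new_ids: List[StateId]) -> Dict[StateId, Optional[list]]:
    """State matching + state moving: each new-box state whose ID matches an old-box state
    gets a pointer to that same object (no copying); unmatched states are left as None."""
    return {sid: old_box.get(sid) for sid in new_ids}


# Old box ((A ⋈ B) ⋈ C) ⋈ D and new box A ⋈ ((B ⋈ C) ⋈ D), as in the figure.
old_box = {state_id(s): [f"tuples of S{s}"] for s in ["A", "B", "AB", "C", "ABC", "D"]}
new_box = move_states(old_box, [state_id(s) for s in ["A", "B", "C", "BC", "D", "BCD"]])
unmatched = sorted("".join(sorted(sid)) for sid, st in new_box.items() if st is None)
print(unmatched)  # ['BC', 'BCD']
```

Moving a state on the same machine costs only a pointer assignment, which is why the expensive part of this strategy is recomputing the unmatched states.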
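Unmatched States
- State recomputing
- Recursively recompute the unmatched states SBC and SBCD by joining matched states
[Figure: new box A ⋈ ((B ⋈ C) ⋈ D) with matched states SA, SB, SC, SD and recomputed states SBC (from SB and SC) and SBCD (from SBC and SD)]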
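Recursive state recomputing can be sketched as follows. The new box's structure is encoded as a hypothetical table from each composite state to its two child states and join attribute. Window constraints on the recomputed tuples are ignored, on the assumption that the moved states already hold only in-window tuples. Contents are toy data.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]  # a (composite) tuple: attribute -> value


def join(left: List[Row], right: List[Row], attr: str) -> List[Row]:
    """Nested-loop equi-join of two states on a shared attribute."""
    return [{**lt, **rt} for lt in left for rt in right if lt[attr] == rt[attr]]


# Hypothetical structure of the new box A ⋈ ((B ⋈ C) ⋈ D):
# composite state -> (left child, right child, join attribute)
CHILDREN: Dict[str, Tuple[str, str, str]] = {
    "BCD": ("BC", "D", "cd"),
    "BC": ("B", "C", "bc"),
}


def recompute(state: str, states: Dict[str, List[Row]]) -> List[Row]:
    """Return the contents of `state`; if it was not matched, rebuild it recursively
    by joining its child states (matched ones are used as-is, others recomputed first)."""
    if state not in states:
        left, right, attr = CHILDREN[state]
        states[state] = join(recompute(left, states), recompute(right, states), attr)
    return states[state]


# Matched states moved from the old box (toy contents).
states: Dict[str, List[Row]] = {
    "A": [{"A": "a1", "ab": 1}],
    "B": [{"B": "b1", "ab": 1, "bc": 1}, {"B": "b2", "ab": 2, "bc": 2}],
    "C": [{"C": "c1", "bc": 1, "cd": 7}],
    "D": [{"D": "d1", "cd": 7}, {"D": "d2", "cd": 8}],
}
recompute("BCD", states)
print([(r["B"], r["C"], r["D"]) for r in states["BCD"]])  # [('b1', 'c1', 'd1')]
```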
MS Migration Pros and Cons
- Pros
- Fast when the number of tuples in states is small
- e.g., low input rates or small window size
- Cons
- Output silence during the entire migration stage
- Can we output results even during migration?
- Motivation for the Parallel Track strategy
Parallel Track Strategy
- Basic idea
- Execute both old and new plans in parallel
- Gradually push old tuples out of the old box by purging
- Key steps
- Connect the new box
- Execute both boxes in parallel
- Remove the old box once it has expired, i.e., it contains only new tuples, with no old tuples or sub-tuples
Parallel Track Strategy
- Connect boxes
- Execute in parallel
- Until all old tuples are purged
- Disconnect the old box
[Figure: old and new boxes for QABCD connected to the same input queues QA, QB, QC, QD and executed in parallel; a tuple ABC in state SABC is built from sub-tuples A, B and C]
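The condition for disconnecting the old box can be sketched as a check over its states. The representation is an assumption: each tuple in a state records the arrival time of every sub-tuple, so a composite such as ABC counts as old as long as any one of its sub-tuples arrived before the migration started.

```python
from typing import Dict, List

# Hypothetical representation: a tuple in a state maps each input stream to the arrival
# timestamp of its sub-tuple, e.g. a tuple ABC in SABC -> {"A": 3, "B": 12, "C": 14}.
CompositeTuple = Dict[str, int]


def is_old(t: CompositeTuple, migration_start: int) -> bool:
    """A tuple counts as old if any of its sub-tuples arrived before migration started."""
    return any(ts < migration_start for ts in t.values())


def can_disconnect_old_box(old_box: Dict[str, List[CompositeTuple]], migration_start: int) -> bool:
    """The old box may be removed once none of its states holds an old tuple or sub-tuple."""
    return not any(is_old(t, migration_start) for state in old_box.values() for t in state)


old_box = {"SABC": [{"A": 3, "B": 12, "C": 14}], "SD": [{"D": 15}]}
print(can_disconnect_old_box(old_box, migration_start=10))  # False: sub-tuple A is old

old_box["SABC"] = [{"A": 11, "B": 12, "C": 14}]  # the ABC tuple with the old A has been purged
print(can_disconnect_old_box(old_box, migration_start=10))  # True
```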
PT Migration Pros and Cons
- Pros
- Keeps producing results even during migration
- (MS produces no results during migration)
- Cons
- Migration duration is at least 2W, where W is the window size
- MS may be faster, depending on the number of tuples in states
Summary: Stream Plan Migration
- Our central theme: optimization via adaptation
- First run-time solution for stateful operators
- Two migration methods
- Moving State Strategy
- Parallel Track Strategy
- Cost Models for Comparative Analysis
- System Implementation in CAPE
- Experimental Evaluations
Overall Summary: So Much Left to Do!
- Large variety of challenging stream applications
- Generic core technology for stream processing engines
- Startups starting to pop up, e.g., StreamBase for the stock market
- Major DBMS players like IBM, Oracle, etc. joining in
- Cool open research, great potential for real impact!
The End
http://davis.wpi.edu/dsrg
Subset of CAPE Publications
- [RDZ04] E. A. Rundensteiner, L. Ding, Y. Zhu, T. Sutherland and B. Pielech, "CAPE: A Constraint-Aware Adaptive Stream Processing Engine." Invited Book Chapter. http://www.cs.uno.edu/nauman/streamBook/. July 2004.
- [ZRH04] Y. Zhu, E. A. Rundensteiner and G. T. Heineman, "Dynamic Plan Migration for Continuous Queries Over Data Streams." SIGMOD 2004, pages 431-442.
- [DMR04] L. Ding, N. Mehta, E. A. Rundensteiner and G. T. Heineman, "Joining Punctuated Streams." EDBT 2004, pages 587-604.
- [DR04] L. Ding and E. A. Rundensteiner, "Evaluating Window Joins over Punctuated Streams." CIKM 2004, to appear.
- [DRH03] L. Ding, E. A. Rundensteiner and G. T. Heineman, "MJoin: A Metadata-Aware Stream Join Operator." DEBS 2003.
- [RDSZBM04] E. A. Rundensteiner, L. Ding, T. Sutherland, Y. Zhu, B. Pielech and N. Mehta, "CAPE: Continuous Query Engine with Heterogeneous-Grained Adaptivity." Demonstration Paper. VLDB 2004.
- [SR04] T. Sutherland and E. A. Rundensteiner, "D-CAPE: A Self-Tuning Continuous Query Plan Distribution Architecture." Tech Report WPI-CS-TR-04-18, 2004.
- [SPR04] T. Sutherland, B. Pielech, Y. Zhu, L. Ding and E. A. Rundensteiner, "Adaptive Multi-Objective Scheduling Selection Framework for Continuous Query Processing." IDEAS 2005.
- [SLJR05] T. Sutherland, B. Liu, M. Jbantova and E. A. Rundensteiner, "D-CAPE: Distributed and Self-Tuned Continuous Query Processing." CIKM, Bremen, Germany, Nov. 2005.
- [LR05] B. Liu and E. A. Rundensteiner, "Revisiting Pipelined Parallelism in Multi-Join Query Processing." VLDB 2005.
- [B05] B. Liu, Y. Zhu and E. A. Rundensteiner, "Spill Policies for Long-Running Queries." ACM SIGMOD 2006, to appear.
- CAPE Project: http://davis.wpi.edu/dsrg/CAPE/index.html