Title: The Raindrop Engine: Continuous Query Processing
1. The Raindrop Engine: Continuous Query Processing
- Elke A. Rundensteiner
- Database Systems Research Lab, WPI
- 2003
2. Monitoring Applications
- Monitor troop movements during combat and warn when soldiers veer off course
- Send an alert when a patient's vital signs begin to deteriorate
- Monitor incoming news feeds to see stories on Iraq
- Scour network traffic logs looking for intruders
3. Properties of Monitoring Applications
- Queries and monitors run continuously, possibly unending
- Applications have varying service preferences
  - Patient monitoring wants only the freshest data
  - Remote sensors have limited memory
  - A news service wishes maximal throughput
  - Taking 60 seconds to process vital signs and sound an alert may be too long
4. Properties of Streaming Data
- Possibly never-ending stream of data
- Unpredictable arrival patterns
  - Network congestion
  - Weather (for external sensors)
  - Sensor moves out of range
5. DBMS Approach to Continuous Queries
- Insert each new tuple into the database and encode queries as triggers MWA03
- Problems
  - High overhead with inserts CCC02
  - Triggers do not scale well CCC02
  - Uses static optimization and execution strategies that cannot adapt to unpredictable streams
  - System is underutilized if data streams arrive slowly
  - No means to input application service requirements
6. New Class of Query Systems
- CQ systems have emerged recently (Aurora, STREAM, NiagaraCQ, Telegraph, et al.)
- They generally work as follows
  - The system subscribes to some streams
  - End users issue continuous queries against the streams
  - The system returns the results to the user as a stream
- All CQ systems use some adaptive techniques to cope with unpredictable streams
7. Overview of Adaptive Techniques in CQ Systems

Research Work | Technique(s) | Goal
Aurora CCC02 | Load shedding; batch tuple processing to reduce context switching | Maintain high quality of service
STREAM MWA03 | Adaptive scheduling algorithm (Chain) BBM03; load shedding | Minimize memory requirements during periods of bursty arrival
NiagaraCQ CDT00 | Generate near-optimal query plans for multiple queries | Efficiently share computation between multiple queries; highly scalable system; maximize output rate
Eddies AH00 (Telegraph) | Dynamically route tuples among joins | Keep system constantly busy; improve throughput
XJoin UF00 | Break join into 3 stages and make use of memory and disk storage | Keep join and system running at full capacity at all times
UF01 | Schedule streams with the highest rate | Maximize throughput to clients
Tukwila IFF99 UF98 | Reorganize query plans on the fly by using synchronization packets to tell operators to finish up their current work | Improve ill-performing query plans
8. The WPI Stream Project: Raindrop

[Architecture figure: the CAPE Runtime Engine comprises a QoS Inspector, Operator Configurator, Operator Scheduler, Plan Migrator, Distribution Manager, Query Plan Generator, Execution Engine, Storage Manager, and Stream Receiver. Queries enter through a Stream/Query Registration GUI, streams arrive from a Stream Provider, and results flow back to the user.]
9. Topics Studied in the Raindrop Project
- Bring XML into Stream Engine
- Scalable Query Operators (Punctuations)
- Cooperative Plan Optimization
- Adaptive Operator Scheduling
- On-line Query Plan Migration
- Distributed Plan Execution
10. PART I: XQueries on XML Streams (Automaton Meets Algebra)
- Based on CIKM03
- Joint work with Hong Su and Jinhui Jian
11. What's Special about XML Stream Processing?

<Biditems>
  <book year="2001">
    <title>Dream Catcher</title>
    <author><last>King</last><first>S.</first></author>
    <publisher>Bt Bound</publisher>
    <price> 30 </price>
  </book>

Pattern retrieval on token streams
12. Two Computation Paradigms
- Automata-based: yfilter02, xscan01, xsm02, xsq03, xpush03
- Algebraic: niagara00
- This Raindrop framework intends to integrate both paradigms into one
13. Automata-Based Paradigm
- Auxiliary structures for
  - Buffering data
  - Filtering
  - Restructuring

FOR $b in stream("biditems.xml")//book
LET $p := $b/price, $t := $b/title
WHERE $p < 20
RETURN <Inexpensive> {$t} </Inexpensive>

[Automaton figure: states 1-4 connected by transitions on book, title, and price tokens, recognizing the patterns //book, //book/title, and //book/price.]
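The automaton above can be approximated in a few lines: a path stack stands in for the active states, and `//`-style patterns fire whenever the current stack ends with the pattern's steps. This is an illustrative sketch only, not the Raindrop implementation; the token format and function names are assumptions.

```python
def match_tokens(tokens, patterns):
    """Collect text content for each //-style path over a token stream."""
    stack = []                      # current open-element path (the "state")
    results = {p: [] for p in patterns}
    for event, payload in tokens:
        if event == "start":
            stack.append(payload)
        elif event == "end":
            stack.pop()
        else:                       # "text" event: payload is character data
            for p in patterns:
                steps = p.lstrip("/").split("/")
                if stack[-len(steps):] == steps:
                    results[p].append(payload)
    return results

tokens = [("start", "biditems"), ("start", "book"),
          ("start", "title"), ("text", "Dream Catcher"), ("end", "title"),
          ("start", "price"), ("text", "30"), ("end", "price"),
          ("end", "book"), ("end", "biditems")]
print(match_tokens(tokens, ["//book/title", "//book/price"]))
# -> {'//book/title': ['Dream Catcher'], '//book/price': ['30']}
```

In a real engine the per-token work is done by a compiled NFA rather than per-pattern string comparisons, but the event-driven shape is the same.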
14. Algebraic Computation

FOR $b in stream("biditems.xml")//book
LET $p := $b/price, $t := $b/title
WHERE $p < 20
RETURN <Inexpensive> {$t} </Inexpensive>

[Figure: the parsed tree of a book element (title, author/last, author/first, publisher, price, and their text nodes); the algebraic operator Navigate b, /title -> t maps each <book> element bound to b onto its <title> binding t.]
15. Observations

Automata Paradigm | Algebra Paradigm
Good for pattern retrieval on tokens | Does not support token inputs
Needs patches for filtering and restructuring | Good for filtering and restructuring
Presents all details at the same low level | Supports multiple descriptive levels (declarative -> procedural)
Little studied as a query processing paradigm | Well studied as a query processing paradigm

Each paradigm has deficiencies, but the two complement each other.
16. How to Integrate the Two Paradigms?

17. How to Integrate the Two Models?
- Design choices
  - Extend the algebraic paradigm to support automata?
  - Extend the automata paradigm to support algebra?
  - Come up with a completely new paradigm?
- Decision: extend the algebraic paradigm to support automata
  - Practical
    - Reuse and extend existing algebraic query processing engines
  - Natural
    - Present details of automata computation at a low level
    - Present semantics of automata computation (target patterns) at a high level
18. Raindrop Four-Level Framework

[Figure: the four plan levels arranged by abstraction level, from high (declarative) down to low (procedural), with the stream logical plan sitting between the semantics-focused plan and the physical and execution plans.]
19. Level I: Semantics-focused Plan (Rainbow ZPR02)
- Expresses query semantics regardless of stored or stream input sources
- Reuses existing techniques for stored XML processing
  - Query parser
  - Initial plan constructor
  - Rewriting optimization
    - Decorrelation
    - Selection push-down
20. Example Semantics-focused Plan

FOR $b in stream("biditems.xml")//book
LET $p := $b/price, $t := $b/title
WHERE $p < 20
RETURN <Inexpensive> {$t} </Inexpensive>

<Biditems>
  <book year="2001">
    <title>Dream Catcher</title>
    <author><last>King</last><first>S.</first></author>
    <publisher>Bt Bound</publisher>
    <price> 30 </price>
  </book>
21. Level II: Stream Logical Plan
- Extends the semantics-focused plan to accommodate tokenized stream inputs
- New input data format
  - Contextualized tokens
- New operators
  - StreamSource, Nav, ExtractUnnest, ExtractNest, StructuralJoin
- New rewrite rules
  - Push-into-Automata
22. One Uniform Algebraic View

[Figure: XML data stream -> token-based plan (automata plan) -> tuple stream -> tuple-based plan -> query answer, all within one algebraic stream logical plan.]
23. Modeling the Automaton in the Algebraic Plan: Black Box (XScan01) vs. White Box

FOR $b in stream("biditems.xml")//book
LET $p := $b/price, $t := $b/title
WHERE $p < 20
RETURN <Inexpensive> {$t} </Inexpensive>

[Figure: XScan as a single black-box operator producing the bindings b = //book, p = b/price, t = b/title.]
24. Example Uniform Algebraic Plan

FOR $b in stream("biditems.xml")//book
LET $p := $b/price, $t := $b/title
WHERE $p < 30
RETURN <Inexpensive> {$t} </Inexpensive>

[Figure: the plan split into a tuple-based plan on top of a token-based plan (automata plan).]
25. Example Uniform Algebraic Plan

FOR $b in stream("biditems.xml")//book
LET $p := $b/price, $t := $b/title
WHERE $p < 30
RETURN <Inexpensive> {$t} </Inexpensive>

Tuple-based plan:
  StructuralJoin b
    ExtractNest b, p
    ExtractNest b, t
  Navigate b, /title -> t
  Navigate b, /price -> p
  Navigate S1, //book -> b
26. Example Uniform Algebraic Plan

FOR $b in stream("biditems.xml")//book
LET $p := $b/price, $t := $b/title
WHERE $p < 30
RETURN <Inexpensive> {$t} </Inexpensive>

  Tagger Inexpensive, t -> r
  Select p < 30
  StructuralJoin b
    ExtractNest b, p
    ExtractNest b, t
  Navigate b, /title -> t
  Navigate b, /price -> p
  Navigate S1, //book -> b
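The top of the plan, the tuple-based part, can be sketched as ordinary set-at-a-time operators over binding tuples. This is a hypothetical illustration, not Raindrop's operator API: bindings are plain dicts, and `select`/`tagger` stand in for the Select and Tagger operators.

```python
def select(tuples, pred):
    """Select operator: keep only bindings satisfying the predicate."""
    return [t for t in tuples if pred(t)]

def tagger(tuples, tag, src, dst):
    """Tagger operator: wrap the src binding in a new element, bound to dst."""
    for t in tuples:
        t[dst] = f"<{tag}>{t[src]}</{tag}>"
    return tuples

# Binding tuples as they might arrive from StructuralJoin b (illustrative data)
bindings = [{"t": "Dream Catcher", "p": 30},
            {"t": "Cheap Reads", "p": 12}]
out = tagger(select(bindings, lambda x: x["p"] < 30), "Inexpensive", "t", "r")
print([x["r"] for x in out])   # -> ['<Inexpensive>Cheap Reads</Inexpensive>']
```

The point of the uniform view is that these tuple operators compose with the token-based operators below them under one algebra.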
27. From Semantics-focused Plan to Stream Logical Plan
28. Level III: Stream Physical Plan
- For each stream logical operator, defines how to generate outputs when given some inputs
- Multiple physical implementations may be provided for a single logical operator
- Automata details of some physical implementations are exposed at this level
  - Nav, ExtractNest, ExtractUnnest, StructuralJoin
29. One Implementation of Extract/StructuralJoin

[Figure: the token stream <biditems> <book> <title> Dream Catcher </title> ... </book> drives Nav ., //book -> b; Nav b, /title -> t and Nav b, /price -> p feed ExtractNest b, t and ExtractNest b, p, whose outputs are combined by SJoin //book.]
30. Level IV: Stream Execution Plan
- Describes the coordination between operators regarding when to fetch inputs
  - When the input operator generates one output tuple
  - When the input operator generates a batch
  - When a time period has elapsed
  - ...
- The potentially unstable data arrival rate of a stream makes a fixed scheduling strategy unsuitable
  - Data delayed under a fixed schedule may stall the engine
  - Bursty data outside the schedule may cause memory overflow
31. Raindrop Four-Level Framework (Recap)
- Semantics-focused plan: expresses the semantics of the query regardless of input sources
- Stream logical plan: accommodates tokenized input streams
- Stream physical plan: describes how operators manipulate given data
- Stream execution plan: decides the coordination among operators
32. Optimization Opportunities

33. Optimization Opportunities
- Semantics-focused plan: general rewriting (e.g., selection push-down)
- Stream logical plan: break-linear-navigation rewriting
- Stream physical plan: choosing physical implementations
- Stream execution plan: choosing execution strategies
34. From Semantics-focused to Stream Logical Plan: In or Out?

[Figure: pattern retrieval in the semantics-focused plan can stay in the tuple-based plan or be pushed into the token-based plan (automata plan) via the push-into-automata rewrite; either way, the XML data stream enters at the bottom and the query answer leaves at the top.]
35. Plan Alternatives

36. Experimentation Results
37. Contributions Thus Far
- Combined the automata- and algebra-based paradigms into one uniform algebraic paradigm
- Provided four layers in the algebraic paradigm
  - Query semantics expressed at the high layer
  - Automata computation on streams hidden at the low layer
- Supported optimization in an iterative manner (from high abstraction level to low abstraction level)
- Illustrated the enriched optimization opportunities by experiments
38. Ongoing Issues to Be Tackled
- Exploit XML schema constraints for query optimization
- Costing/query optimization of plans
- On-the-fly migration into/out of the automaton
- Physical implementation strategies of operators
- Load shedding from an automaton
39. PART II: On-line Query Plan Migration
40. Motivation for Migration
- An initially good query plan may become less effective over time because of
  - Changes in stream data distributions (selectivity)
  - Changes in data arrival rates (operator overload)
  - Addition of new queries / de-registration of existing queries
  - Changes in the resources allocated to query evaluation
  - Changes in quality-of-service requirements
41. A Simple Motivating Example

42. On-line Plan Optimization
- Detection of suboptimality in the plan
- Query optimization via plan rewriting
- On-line migration of the subplan
43. Related Work
- Efficient mid-query re-optimization of sub-optimal query execution plans, Kabra and DeWitt, SIGMOD 98 (only re-optimizes the part of the query that has not started executing yet)
- On reconfiguring query execution plans in distributed object-relational DBMS, K. Ng, ICPADS 1998 (plan cloning)
- Continuously adaptive continuous queries over streams, S. Madden, J. Hellerstein, UC Berkeley, ACM SIGMOD 2002
44. Focus on Dynamic Migration
- Given a better plan or sub-plan
  - Dynamically migrate the running plan to the given plan
  - Guarantee correctness of results
    - No missing results
    - No duplicate results
    - No incorrect results
45. Join Algorithm: a Stateful Operator
- Symmetric NLJ: for each new A tuple
  - Purge state B using the time-based window constraint W
  - Join with the tuples in state B
  - Output the results to the output queue
  - Put the tuple into state A

[Figure: node AB with input queues A = {a1, a2, a3} and B = {b1, b2}, states A and B, and output queue AB containing a1b1, a1b2, a2b1, a2b2, a3b2.]
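The symmetric join step above can be sketched directly. This is a minimal illustration under simplifying assumptions (tuples carry timestamps, one join attribute is ignored, names are not Raindrop's): each arrival purges the opposite state by the window, joins with the survivors, then is remembered in its own state.

```python
from collections import deque

def insert_and_join(new, own_state, other_state, W, out):
    """Process one arriving tuple of a symmetric NL join with window W."""
    ts, _ = new
    # 1) purge the opposite state of tuples that fell out of the window
    while other_state and ts - other_state[0][0] > W:
        other_state.popleft()
    # 2) join the new tuple with every surviving tuple in the other state
    for old in other_state:
        out.append((new, old))
    # 3) remember the new tuple in this stream's own state
    own_state.append(new)

state_a, state_b, out = deque(), deque(), []
insert_and_join((0, "b1"), state_b, state_a, 3, out)   # no A tuples yet
insert_and_join((1, "a1"), state_a, state_b, 3, out)   # joins with b1
insert_and_join((5, "a2"), state_a, state_b, 3, out)   # b1 expired, no output
print(out)   # -> [((1, 'a1'), (0, 'b1'))]
```

The deques are exactly the operator's states; it is these that make the join stateful and that migration must later deal with.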
46. So What's the Problem with Migration?
- Old states in the old plan still need to join with future incoming tuples, so they cannot be discarded.
- New tuples arrive randomly and continuously in a streaming system.

[Figure: old query plan with join nodes AB and ABC; state AB holds a1b1, a2b1, a2b2, state C holds c1, c2, c3, and states A = {a1, a2} and B = {b1, b2} feed node AB.]
47. Box Concept
- Migration unit: the box
  - An old box contains an old plan or sub-plan
  - A new box contains a new plan or sub-plan
- Two equivalent boxes
  - Have the same input queues
  - Have the same output queues
  - Contain semantically equivalent sub-plans
  - Can be migrated from one to the other
48. But How?
- Proposal of two migration strategies
  - Moving state strategy
  - Parallel track strategy
- Comparison via cost models
- Experimental evaluation
49. Parallel Track Strategy
- The new plan and the old plan co-exist during migration
  - They run in parallel
  - They share input queues and output queues
- Window constraints are used to eventually time out the old states
  - This is when the migration stage is over
  - Discard the old plan and run only the new plan
50. A Running Example

[Figure: the old plan (join AB, then ABC) and the new plan (join BC, then ABC) run in parallel over shared input queues A, B, C and the shared output queue ABC, with window constraint W = 3; old states (e.g., AB = {a1b1, a2b1, a2b2, a3b1, a3b2}) drain as the window advances while the new states fill, producing results such as a1b1c1, a2b1c1, a3b1c1, a2b2c2, a3b2c2.]
51. Pros and Cons
- We don't need to halt the system in order to migrate
- Low delay in generating results
- Overhead
  - In the old plan part, an all-new tuple pair is discarded only at the last node
52. Moving State Strategy
- First, freeze the inputs and drain out the old plan
- Then, establish and connect the new plan
- Next, move all old states over to the new states
- Lastly, let all new input data go to the new plan only
- Resume processing

53. Moving State Strategy (cont.)
- State matching: compare the states of the old and new plans
- State moving: if two states match, move them
- State re-computation: if there is no match, recompute the state
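The match/move/recompute steps can be abstracted as follows. This is a toy sketch, not Raindrop code: states are dicts keyed by an input signature, two states "match" when their signatures are equal, and the recompute callback stands in for re-running a join over already-moved states.

```python
def migrate(old_states, new_state_sigs, recompute):
    """Build the new plan's states: move matching old states, recompute the rest."""
    new_states = {}
    for sig in new_state_sigs:
        if sig in old_states:
            new_states[sig] = old_states[sig]   # state moving
        else:
            new_states[sig] = recompute(sig)    # state re-computation
    return new_states

# Old plan (AB then ABC) vs. new plan (BC then ABC): A, B, C match; BC does not.
old = {"A": ["a1", "a2"], "B": ["b1"], "C": ["c1"], "AB": [("a1", "b1")]}
new = migrate(old, ["A", "B", "C", "BC"],
              recompute=lambda sig: [])  # placeholder: BC rebuilt from B and C
print(sorted(new))   # -> ['A', 'B', 'BC', 'C']
```

The recompute branch is exactly where Algorithm II pays its extra cost (the CstateBC term in the cost model later).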
54. Abstract Description

[Figure: matching states between the old plan (states A, B, C, AB feeding joins AB and ABC) and the new plan (states A, B, C, BC feeding joins BC and ABC); states A, B, and C match directly, while state BC has no counterpart in the old plan.]
55. Intermediate State Sharing
- We can share the intermediate state BC only if
  - The inputs to both plans are exactly the same, and
  - Tuples arriving at the same state have passed exactly the same predicates
- Both conditions must hold for any sharing to be possible
56. Moving State Strategy: Running Example

[Figure: after migration, the matching states A, B, and C are moved directly into the new plan (BC then ABC), state BC is recomputed from the moved states B and C, expired tuples are crossed out, and results such as a3b1c1, a2b2c2, a3b2c2 appear on the shared output queue ABC; window constraint W = 3.]
57. Why Two Pointers Are Needed
- Two nodes sharing the same state may have different contents
- Each node therefore keeps two pointers for each associated state
  - First points to the first tuple in the state
  - Last points to the last tuple in the state

[Figure: joins AB and BC, each feeding an ABC node, share state B = {b1, b2, b3, b4} through their own First/Last pointer pairs.]
58. Comparing the Two Migration Algorithms
- The two algorithms distribute tasks differently between the old and new plans

A | B | C | Plan used in Algorithm 1 | Plan used in Algorithm 2
O | O | O | Old | Old
O | O | N | Old | New
O | N | O | Old | New
N | O | O | Old | New
N | N | O | Old | New
N | O | N | Old | New
O | N | N | Old | New
N | N | N | New | New

N = tuple arrives after the migration start time; O = tuple arrives before the migration start time. "New"/"Old" = which query plan computes the result.
59. Which Algorithm Performs Better?
- Compare the performance
  - Which one is faster? Which saves more cost?
- The new query plan should outperform the old query plan
  - In Algorithm 1, the old plan part handles 7 out of 8 cases
  - In Algorithm 2, the new plan part handles 7 out of 8 cases
  - Algorithm 2 seems to be winning
- However, Algorithm 2 needs extra cost to re-compute intermediate states
- So cost models are needed!
60. Cost Model Assumptions
- All joins are binary NL joins
- Assume that we already know the statistics of each node in the query plan
  - Input arrival rate
  - Join node selectivity
- We compute the processing power needed over the period during which migration happens
  - This is not the real power the system uses
  - The two differ because of resource limitations in a real system
61. Cost Model Assumptions (cont.)
- Assume tuple processing time is the same for tuples of different sizes
- Assume that when migration starts, the old query plan has passed its start-up stage and is fully running
  - States are at their maximum size, controlled by the window constraints
- Window constraints
  - Time-based
  - The same over all streams in a join
62. Running Example Revisited
- Old query plan vs. new query plan
63. Symbol Definitions
- λa, λb, λc: the average arrival rates (tuples/time unit) on streams A, B, and C
- σAB, σABC_old: the selectivities of nodes AB and ABC in the old query plan
- σBC, σABC_new: the selectivities of nodes BC and ABC in the new query plan
- W: the window constraint over all joins, time-based (for example, 5 time units)
- t: any t time units after migration has started
- Cj: the cost of joining one pair of tuples, including the cost of accessing the 2 tuples, comparing their values, and so on
- Cs: the cost of inserting/deleting a tuple into/from a state
- Ct: the cost of accessing a tuple, for example, checking a tuple's timestamp
- |input_name|: the size of the state for one input of a node; for example, |A| is the size of state A in node AB, and |A| = λa·W
64. Cost Model for Algorithm I: Old Plan Part
- Cost for node AB in the old plan (its states start full)
  - CAB = purge cost + insert cost + join cost
    = [Cs·(λa/λb)·λb + Cs·(λb/λa)·λa + Cs·λb + Cs·λa + Cj·(λa·|B| + λb·|A|)]·t
    = [2·Cj·λa·λb·W + 2·Cs·(λa + λb)]·t    --- formula (1)
  - λAB = (λa·|B| + λb·|A|)·σAB = 2·λa·λb·W·σAB    --- formula (2)
  - NAB = 2·λa·λb·W·σAB·t
- Apply the same formulas to each node in the old query plan
- Summing the cost of each node gives the total cost

[Figure: the old-plan node AB (states A, B) feeding node ABC (states AB, C).]
65. Cost Model for Algorithm I: New Plan Part
- Cost for node BC in the new plan (its states start empty, so no purge is needed while t < W)
  - CBC = join cost + insert cost
    = t²·λb·λc·Cj + (λb·t + λc·t)·Cs, where t < W    --- formula (3)
- In the i-th time unit after the migration start time, the output rate is
  - λi = (2i − 1)·λb·λc·σBC
- The total number generated in any time t (t < W) is
  - NBC = Σ_{i=1..t} λi = t²·λb·λc·σBC    --- formula (4)
- Apply the same formulas to each node in the new plan

[Figure: the new-plan node BC (states B, C) feeding node ABC (states A, BC).]
66. Cost Model for Algorithm II
- Extra cost is needed for computing the new states
- For our running example, we need to compute state BC; the extra cost is
  - CstateBC = Cj·|B|·|C| = Cj·λbW·λcW = W²·λb·λc·Cj
- We can apply formulas (1) and (2) to compute the cost of each node in the old plan, because all states are full
- Adding the node costs and the state re-computation cost together gives the total cost of Algorithm II
67. Analysis of the Cost Models
- Several parameters control the performance of the two migration algorithms
  - Arrival rate
  - Join node selectivity
  - Window size
  - Time
- Costs may not be linear in time
  - For example, the new plan part of Algorithm I grows quadratically (formula 3)
- The total migration time largely depends on the window size
- Experiments are designed by varying these parameters
70. Some Remaining Challenges
- Alternative migration strategies
- Selection of box sizes
- Dynamic optimization and migration
- Comparison study against Eddies
71. PART III: Adaptive Scheduler Selection Framework
- Joint work with Brad Pielech and Tim Sutherland
72. Idea
- We propose a novel adaptive selection of scheduling strategies
- Observations
  - The scheduling algorithm has a large impact on the behavior of the system
  - Using a single scheduling algorithm to execute a continuous query is not sufficient, because all scheduling algorithms have inherent flaws or tradeoffs
- Hypothesis
  - Adaptively choosing the next scheduling strategy can leverage the strengths and weaknesses of each and outperform any single strategy
73. Continuous Query Issues
- The arrival rate of data is unpredictable
- The volume of data may be extremely high
- Different domains may have different service requirements
- A scheduling strategy such as Round Robin, FIFO, etc. is designed to solve one particular scheduling problem (minimize memory, maximize throughput, etc.)
- What happens if we have multiple problems to solve?
74. Scheduling Example
- s = tuples output / tuples input (an operator's selectivity)
- C = an operator's processing cost in time units
- Operator 2 is the quickest and most selective
- Operator 1 is the slowest and least selective
- Every 1 time unit, a new tuple arrives in Q1, starting at t = 0
- When told to run, an operator processes at most 1 tuple; any extra is left in its queue for a later run
- An operator takes C time units to process its input, regardless of the size of the input
- An operator's output size = s × number of input tuples; if an operator inputs 1 tuple and s = 0.9, it outputs 0.9 tuples
- Assume zero time for context switches
- Assume all tuples are the same size
75. Scheduling Example II
- Two scheduling strategies
  - FIFO: start at the leaf and process the newest tuple until completion: 1, 2, 3, 1, 2, 3, etc.
  - Greedy: schedule the operator with the most tuples in its input buffer: 1, 1, 1, 2, 1, 1, 1, 2, 3
- We will compare throughput and total queue sizes for both algorithms over the first few time units
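The FIFO numbers quoted on the next slides follow from simple arithmetic: one tuple pushed through the chain shrinks by each operator's selectivity, and a full FIFO cycle costs the sum of the processing costs. A quick check (operator parameters taken from the example):

```python
sel = [0.9, 0.1, 1.0]     # s for operators 1, 2, 3
cost = [1.0, 0.25, 0.75]  # C for operators 1, 2, 3

tuples_out = 1.0
for s in sel:
    tuples_out *= s           # 1 -> 0.9 -> 0.09 -> 0.09 tuples per cycle
cycle_time = sum(cost)        # 2.0 time units per FIFO cycle

print(round(tuples_out, 3), cycle_time, round(tuples_out / cycle_time, 4))
# -> 0.09 2.0 0.045
```

This reproduces the slide's claims: 0.09 tuples delivered every 2 time units, i.e. an output rate of 0.045 tuples per time unit.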
76. Scheduling Example: FIFO
- FIFO: start at the leaf and process the newest tuple until completion

[Figure: a chain from the stream through Operator 1 (s = 0.9, C = 1), Operator 2 (s = 0.1, C = 0.25), and Operator 3 (s = 1, C = 0.75) to the end user.]

- FIFO's queue size grows very quickly
  - It spends 1 time unit processing Operator 1, then 1 time unit processing Operators 2 and 3
  - During these 2 time units, 2 tuples arrive in Operator 1's queue
- FIFO outputs 0.09 tuples to the end user every 2 time units
77. Scheduling Example: Greedy
- Greedy: schedule the operator with the largest input queue

[Figure: the same operator chain: Operator 1 (s = 0.9, C = 1), Operator 2 (s = 0.1, C = 0.25), Operator 3 (s = 1, C = 0.75).]

- Greedy's queue size grows at a slower rate because Operator 1 is run more often
- But tuples remain queued for long periods in Operators 2 and 3, until their queue sizes become larger than Operator 1's
- Greedy finally outputs a tuple at about t = 16
- At t = 16, Greedy outputs 1 tuple; by then, FIFO has output 0.72 (0.09 × 8) tuples
78. Scheduling Example: Wrap-up
- FIFO
  + Outputs tuples at regular intervals
  − Q1 grows very quickly
  − Output rate is low (0.045 tuples/unit)
  − Does not utilize operators fully: O1 gets 1 tuple per run, O2 0.9, O3 0.09; the maximum is 1 tuple
- Greedy
  + Queue sizes grow less quickly than FIFO's
  + Output rate is high: 0.0625 tuples/unit
  + More fully utilizes operators: each operator runs with 1 tuple each time
  − Some tuples stay in the system for a long time
  − Long delay before any tuples are output
79. So? What Is Our Point?

A single scheduling strategy is NOT sufficient when dealing with varying input stream rates, data volumes, and service requirements!
80. New Adaptive Framework
- In response to this need, we propose a novel technique that selects between several scheduling strategies based on current system conditions and quality-of-service requirements
81. Adaptive Framework
- Choosing among several scheduling strategies can leverage the strengths of each strategy and minimize the use of a strategy when it would not perform well
- Allowing a user to input service requirements means that the CQ system can adapt to the user's needs, rather than to a static set of needs fixed by the CQ system
82. Quality of Service Preferences
- Each application can specify its service requirements as a weighted list of behaviors to be maximized or minimized as desired
- Assumptions / restrictions
  - One (global) preference set at a time
  - Preferences can change during execution
  - Only relative behavior can be specified
83. Service Preferences II
- Input: three parameters
  - Metric: any statistic that is calculated by the system
  - Quantifier: maximize or minimize the given metric
  - Weight: the relative weight/importance of this metric; the sum of all weights is exactly 1
- Currently supported metrics
  - Throughput
  - Queue sizes
  - Delay
84. Adaptive Selection Overview
- Given a table of service preferences and a set of candidate scheduling algorithms:
  1. Initially run all candidate algorithms once in order to gather some statistics about their performance
  2. Assign a score to each algorithm based on how well it has met the preferences relative to the other algorithms (score formulas on the next slide)
  3. Choose the scheduling algorithm that will best meet the preferences based on how the algorithms have performed thus far
  4. Run that algorithm for a period of time and record statistics
  5. Repeat steps 2-4 until the query or the streams are over
85. Adaptive Formulas
- Zi: the normalized statistic for preference i
- I: the number of preferences
- H: the historical category
- decay: the decay factor
- A scheduler's score is computed by summing, over the statistics the user defined, the normalized statistic score times the weight of that statistic, with the historical component H discounted by the decay factor.
86. Choosing the Next Scheduling Strategy
- Once the schedulers' scores have been calculated, the next strategy has to be chosen
  - All strategies should be explored initially, so that the framework can learn how each will perform
  - Strategies should be rotated periodically, because a strategy that did poorly before could be viable now
  - Remember: the score of the last-run algorithm is not updated; only the other candidates have their scores updated
- Roulette wheel MIT99
  - Chooses the next algorithm with a probability proportional to its score
  - Assigns an initial score to each strategy so that each has a chance to run
  - Favors the better-scoring algorithms, but will still pick others
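The scoring and roulette-wheel steps above can be sketched as follows. The metric names, example values, and function names are illustrative only; the real framework normalizes measured statistics and applies the historical decay described on the previous slide.

```python
import random

def score(norm_stats, weights):
    """Weighted sum of normalized statistics; weights sum to 1 (slide 85)."""
    return sum(weights[m] * norm_stats[m] for m in weights)

def roulette(scores, rng=random.random):
    """Pick a strategy with probability proportional to its score."""
    total = sum(scores.values())
    r = rng() * total
    for name, s in scores.items():
        r -= s
        if r <= 0:
            return name
    return name  # fallback against floating-point rounding

weights = {"throughput": 0.5, "delay": 0.5}
scores = {"FIFO": score({"throughput": 0.2, "delay": 0.9}, weights),
          "Greedy": score({"throughput": 0.8, "delay": 0.1}, weights)}
print(scores)                              # FIFO -> 0.55, Greedy -> 0.45
print(roulette(scores, rng=lambda: 0.0))   # -> 'FIFO' (first wheel slot)
```

Because selection is probabilistic rather than greedy, a low-scoring strategy is still occasionally run, which is what lets its score recover when conditions change.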
87. Experiments
- 3 parameters
  - Number of streams
  - Arrival pattern
  - Number of service preferences
88. Experiment Setup
- 5 different query plans
  - Select and window-join operators
- Incoming streams use simulated data with a Poisson arrival pattern; the mean arrival time is altered to control burstiness
- Goal: show that the adaptive strategy meets the preferences better than any single algorithm; otherwise, the technique is not worthwhile
89. Single Stream Result (2 Requirements)
- The adaptive strategy performs as well as PTT in this environment

90. Single Stream Result (3 Requirements)

91. Multi Stream Result (2 Requirements)
- The adaptive framework performs as well as, if not better than, both individual scheduling algorithms under differing service requirements

92. Multi Stream Result (3 Requirements)
93. Related Work Comparison

Research Work | Comparison
Aurora CCC02 | More complex QoS model; makes use of alternative adaptive techniques
STREAM MWA03 | Only meets the memory requirement
NiagaraCQ CDT00 | Only adapts prior to query execution; concerned more with generating optimal query plans
Eddies AH00 (Telegraph) | Finer-grained adaptive strategy; we look to incorporate it in the future
XJoin UF00 | Finer-grained technique; we look to incorporate it in the future
UF01 | Only focuses on maximizing rate; uses a single adaptive strategy
Tukwila IFF99 UF98 | Reorganizes plans on the fly; we look to incorporate this in the future
94. Conclusions
- Identified a gap in existing CQ research and proposed a novel adaptive technique to address the problem
  - Draws on genetic algorithms and AI research
  - Alters the scheduling algorithm based on how well the execution is meeting the service preferences
- The adaptive strategy is showing promising experimental results
  - Never performs worse than any single strategy
  - Often performs as well as the best strategy, and often outperforms it
  - Adapts to varying user environments without manually changing scheduling strategies
95. Overall
- Many interesting problems arise in this new stream context
- There is room for lots of fun research
96. http://davis.wpi.edu/dsrg/raindrop/
- Project Overview, Publications, Talks
- Email: raindrop@cs.wpi.edu