The Raindrop Engine: Continuous Query Processing (PowerPoint transcript)
1
The Raindrop Engine: Continuous Query Processing
  • Elke A. Rundensteiner
  • Database Systems Research Lab, WPI
  • 2003

2
Monitoring Applications
  • Monitor troop movements during combat and warn
    when soldiers veer off course
  • Send an alert when a patient's vital signs begin to deteriorate
  • Monitor incoming news feeds to see stories on
    Iraq
  • Scour network traffic logs looking for intruders

3
Properties of Monitoring Applications
  • Queries and monitors run continuously, possibly without end
  • Applications have varying service preferences
  • Patient monitoring wants only the freshest data
  • Remote sensors have limited memory
  • A news service wants maximal throughput
  • Taking 60 seconds to process vital signs and sound an alert may be too long

4
Properties of Streaming Data
  • Possibly never ending stream of data
  • Unpredictable arrival patterns
  • Network congestion
  • Weather (for external sensors)
  • Sensor moves out of range

5
DBMS Approach to Continuous Queries
  • Insert each new tuple into the database, encode
    queries as triggers MWA03
  • Problems
  • High overhead with inserts CCC02
  • Triggers do not scale well CCC02
  • Uses static optimization and execution strategies
    that cannot adapt to unpredictable streams
  • System is less utilized if data streams arrive
    slowly
  • No means to input application service
    requirements

6
New Class of Query Systems
  • CQ Systems emerged recently (Aurora, Stream,
    NiagaraCQ, Telegraph, et al.)
  • Generally work as follows
  • System subscribes to some streams
  • End users issue continuous queries against the streams
  • System returns the results to the user as a
    stream
  • All CQ systems use some adaptive techniques to
    cope with unpredictable streams

7
Overview of Adaptive Techniques in CQ Systems
  • Aurora CCC02: load shedding and batch tuple processing to reduce context switching. Goal: maintain high quality of service.
  • STREAM MWA03: adaptive scheduling algorithm (Chain) BBM03 and load shedding. Goal: minimize memory requirements during periods of bursty arrival.
  • NiagaraCQ CDT00: generates near-optimal query plans for multiple queries. Goal: efficiently share computation between multiple queries, a highly scalable system, maximize output rate.
  • Eddies AH00 (Telegraph): dynamically routes tuples among joins. Goal: keep the system constantly busy, improve throughput.
  • XJoin UF00: breaks the join into 3 stages and makes use of memory and disk storage. Goal: keep the join and the system running at full capacity at all times.
  • UF01: schedules the streams with the highest rate. Goal: maximize throughput to clients.
  • Tukwila IFF99, UF98: reorganizes query plans on the fly by using synchronization packets to tell operators to finish up their current work. Goal: improve ill-performing query plans.
8
The WPI Stream Project Raindrop
[Architecture diagram: the CAPE Runtime Engine contains a QoS Inspector, Operator Configurator, Operator Scheduler, Plan Migrator, Distribution Manager, Query Plan Generator, Execution Engine, Storage Manager and Stream Receiver. A Stream/Query Registration GUI accepts queries, a Stream Provider feeds the Stream Receiver, and results stream back to the user.]
9
Topics Studied in Raindrop Project
  • Bring XML into Stream Engine
  • Scalable Query Operators (Punctuations)
  • Cooperative Plan Optimization
  • Adaptive Operator Scheduling
  • On-line Query Plan Migration
  • Distributed Plan Execution

10
PART I XQueries on XML Streams
(Automaton Meets Algebra)
  • Based on CIKM03
  • Joint work with Hong Su and Jinhui Jian

11
What's Special about XML Stream Processing?

<Biditems>
  <book year="2001">
    <title>Dream Catcher</title>
    <author><last>King</last><first>S.</first></author>
    <publisher>Bt Bound</publisher>
    <price> 30 </price>
  </book>

Pattern Retrieval on Token Streams
12
Two Computation Paradigms
  • Automata-based: YFilter02, XScan01, XSM02, XSQ03, XPush03
  • Algebraic: Niagara00
  • The Raindrop framework intends to integrate both paradigms into one

13
Automata-Based Paradigm
  • Auxiliary structures for
  • Buffering data
  • Filtering
  • Restructuring

FOR $b IN stream("biditems.xml")//book
LET $p := $b/price, $t := $b/title
WHERE $p < 20
RETURN <Inexpensive> {$t} </Inexpensive>

[Automaton figure: states 1-4, with transitions on book, price and title tokens recognizing the patterns //book, //book/price and //book/title]
14
Algebraic Computation
FOR $b IN stream("biditems.xml")//book
LET $p := $b/price, $t := $b/title
WHERE $p < 20
RETURN <Inexpensive> {$t} </Inexpensive>

[Figure: XML tree of book elements (title, author with last and first, publisher, price, each with text leaves), and the operator Navigate $b, /title -> $t mapping <book>...</book> bindings $b to <title>...</title> bindings $t]
15
Observations
Automata paradigm vs. algebra paradigm:
  • Good for pattern retrieval on tokens / does not support token inputs
  • Needs patches for filtering and restructuring / good for filtering and restructuring
  • Presents all details at the same low level / supports multiple descriptive levels (declarative -> procedural)
  • Little studied as a query processing paradigm / well studied as a query processing paradigm
Either paradigm alone has deficiencies; the two paradigms
16
How to Integrate Two Paradigms
17
How to Integrate Two Models?
  • Design choices:
  • Extend the algebraic paradigm to support automata?
  • Extend the automata paradigm to support algebra?
  • Come up with a completely new paradigm?
  • We extend the algebraic paradigm to support automata:
  • Practical: reuse and extend existing algebraic query processing engines
  • Natural: present the details of automata computation at a low level, and the semantics of automata computation (target patterns) at a high level

18
Raindrop Four-Level Framework
[Figure: the four plan levels ordered by abstraction, from high (declarative) down to low (procedural): semantics-focused plan, stream logical plan, stream physical plan, stream execution plan]
19
Level I Semantics-focused Plan Rainbow-ZPR02
  • Express query semantics regardless of stored or
    stream input sources
  • Reuse existing techniques for stored XML
    processing
  • Query parser
  • Initial plan constructor
  • Rewriting optimization
  • Decorrelation
  • Selection push down

20
Example Semantics-focused Plan
FOR $b IN stream("biditems.xml")//book
LET $p := $b/price, $t := $b/title
WHERE $p < 20
RETURN <Inexpensive> {$t} </Inexpensive>

<Biditems>
  <book year="2001">
    <title>Dream Catcher</title>
    <author><last>King</last><first>S.</first></author>
    <publisher>Bt Bound</publisher>
    <price> 30 </price>
  </book>

21
Level II Stream Logical Plan
  • Extend semantics-focused plan to accommodate
    tokenized stream inputs
  • New input data format
  • contextualized tokens
  • New operators
  • StreamSource, Nav, ExtractUnnest, ExtractNest,
    StructuralJoin
  • New rewrite rules
  • Push-into-Automata

22
One Uniform Algebraic View
[Figure: one uniform algebraic view: the XML data stream feeds a token-based plan (automata plan), whose tuple stream feeds a tuple-based plan, which produces the query answer]
23
Modeling the Automata in the Algebraic Plan: Black Box (XScan01) vs. White Box
FOR $b IN stream("biditems.xml")//book
LET $p := $b/price, $t := $b/title
WHERE $p < 20
RETURN <Inexpensive> {$t} </Inexpensive>

[Figure: XScan as a black box computing $b = //book, $p = $b/price, $t = $b/title in one opaque operator]
24
Example Uniform Algebraic Plan
FOR $b IN stream("biditems.xml")//book
LET $p := $b/price, $t := $b/title
WHERE $p < 30
RETURN <Inexpensive> {$t} </Inexpensive>
Tuple-based plan
Token-based plan (automata plan)
25
Example Uniform Algebraic Plan
FOR $b IN stream("biditems.xml")//book
LET $p := $b/price, $t := $b/title
WHERE $p < 30
RETURN <Inexpensive> {$t} </Inexpensive>
Tuple-based plan
StructuralJoin $b
ExtractNest $b, $p
ExtractNest $b, $t
Navigate $b, /title -> $t
Navigate $b, /price -> $p
Navigate S1, //book -> $b
26
Example Uniform Algebraic Plan
FOR $b IN stream("biditems.xml")//book
LET $p := $b/price, $t := $b/title
WHERE $p < 30
RETURN <Inexpensive> {$t} </Inexpensive>
Tagger <Inexpensive>, $t -> $r
Select $p < 30
StructuralJoin $b
ExtractNest $b, $p
ExtractNest $b, $t
Navigate $b, /title -> $t
Navigate $b, /price -> $p
Navigate S1, //book -> $b
27
From Semantics-focused Plan to Stream Logical Plan
28
Level III Stream Physical Plan
  • For each stream logical operator, define how to
    generate outputs when given some inputs
  • Multiple physical implementations may be provided
    for a single logical operator
  • Automata details of some physical implementation
    are exposed at this level
  • Nav, ExtractNest, ExtractUnnest, Structural Join

29
One Implementation of Extract/Structural Join
[Figure: the automaton (states 1-4; Nav ., //book -> $b; Nav $b, /title -> $t; Nav $b, /price -> $p) runs over the token stream <biditems><book><title>Dream Catcher</title>...</book> and feeds the extracted <title>...</title> and <price>...</price> elements to ExtractNest $b, $t and ExtractNest $b, $p, which feed SJoin //book]
30
Level IV Stream Execution Plan
  • Describes the coordination between operators regarding when to fetch inputs:
  • When the input operator generates one output tuple
  • When the input operator generates a batch
  • When a time period has elapsed
  • The potentially unstable data arrival rates of streams make a fixed scheduling strategy unsuitable:
  • Delayed data under a fixed schedule may stall the engine
  • Bursty data outside the schedule may cause queue overflow

31
Raindrop Four-Level Framework (Recap)
  • Semantics-focused plan: expresses the semantics of the query regardless of input sources
  • Stream logical plan: accommodates tokenized input streams
  • Stream physical plan: describes how operators manipulate given data
  • Stream execution plan: decides the coordination among operators
32
Optimization Opportunities
33
Optimization Opportunities
  • Semantics-focused plan: general rewriting (e.g., selection push-down)
  • Stream logical plan: break-linear-navigation rewriting
  • Stream physical plan: choice among physical implementations
  • Stream execution plan: choice among execution strategies
34
From Semantics-focused to Stream Logical Plan In
or Out?
[Figure: pattern retrieval can remain in the tuple-based, semantics-focused plan or be pushed into the token-based automata plan by applying the push-into-automata rewrite; either way the XML data stream enters at the bottom and the query answer leaves at the top]
35
Plan Alternatives
36
Experimentation Results
37
Contributions Thus Far
  • Combined automata and algebra based paradigms
    into one uniform algebraic paradigm
  • Provided four layers in algebraic paradigm
  • Query semantics expressed at high layer
  • Automata computation on streams hidden at low
    layer
  • Supported optimization in an iterative manner (from high
    abstraction level to low abstraction level)
  • Illustrated enriched optimization opportunities
    by experiments

38
On-Going Issues To be Tackled
  • Exploit XML schema constraints for query
    optimization
  • Costing/query optimization of plans
  • On-the-fly migration into/out of automaton
  • Physical implementation strategies of operators
  • Load-shedding from an automaton

39
PART II On-line Query Plan Migration
  • Joint work with Yali Zhu

40
Motivation for Migration
  • An initially good query plan may become less effective over time, due to:
  • Changes in stream data distributions
    (selectivity)
  • Changes in data arrival rates (operator overload)
  • Addition of new queries/de-registering of
    existing queries
  • Availability of resources allocated to query
    evaluation
  • Changes in quality of service requirements

41
A Simple Motivating Example
42
On-line Plan Optimization
  • Detection of Suboptimality in Plan
  • Query Optimization via Plan Rewriting
  • On-line Migration of Subplan

43
Related Work
  • Efficient mid-query re-optimization of sub-optimal query
    execution plans, Kabra and DeWitt, SIGMOD 1998. (Only
    re-optimizes the part of the query not yet started executing)
  • On reconfiguring query execution plans in distributed
    object-relational DBMS, K. Ng, 1998 ICPADS. (plan cloning)
  • Continuously Adaptive Continuous Queries over
    Streams, S. Madden, J. Hellerstein, UC Berkeley.
    ACM SIGMOD 2002.

44
Focus on Dynamic Migration
  • Given a better plan or sub-plan
  • Dynamically migrate running plan to given plan.
  • Guarantee correctness of results:
  • No missing results
  • No duplicate results
  • No incorrect results


45
Join Algorithm: a Stateful Operator
  • Symmetric NLJ (nested-loop join)
  • For each new A tuple:
  • Purge State B using the time-based window constraint W
  • Join the new tuple with the tuples in State B
  • Output the results to the output queue
  • Put the new tuple into State A

[Figure: join node AB with input queues A and B, State A = {a1, a2, a3}, State B = {b1, b2}, and output queue AB = {a1b1, a1b2, a2b1, a2b2, a3b2}]
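The per-tuple join steps listed above (purge, probe, output, insert) can be sketched in a few lines of Python. This is a minimal illustration assuming timestamped tuples and a time-based window, not Raindrop's actual implementation:

```python
from collections import deque

def process(new_val, now, own_state, other_state, window, out):
    # Purge the opposite state using the time-based window constraint W:
    # drop tuples older than `window` relative to the current time.
    while other_state and now - other_state[0][1] > window:
        other_state.popleft()
    # Join the new tuple with every tuple surviving in the other state.
    for other_val, _ in other_state:
        out.append((new_val, other_val))
    # Insert the new tuple, tagged with its timestamp, into its own state.
    own_state.append((new_val, now))

state_a, state_b, out = deque(), deque(), []
process("a1", 0, state_a, state_b, window=5, out=out)
process("b1", 1, state_b, state_a, window=5, out=out)  # joins with a1
process("a2", 7, state_a, state_b, window=5, out=out)  # b1 has expired
```

Note that purging happens lazily, driven by arrivals on the opposite stream; this is why old states cannot simply be discarded during migration, as the next slide explains.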
46
So what's the problem with migration?
  • Old states in the old plan still need to join with future
    incoming tuples, so they cannot be discarded.
  • New tuples arrive randomly and continuously in a streaming
    system.

[Figure: old query plan: node AB (State A = {a1, a2}, State B = {b1, b2}) feeds node ABC (State AB = {a1b1, a2b1}, State C = {c1, c2, c3}); intermediate tuples a2b1 and a2b2 are still in flight]
47
Box Concept
  • Migration Unit Box
  • Old box contains an old plan or sub-plan
  • New box contains a new plan or sub-plan
  • Two equivalent boxes
  • Have the same input queues
  • Have the same output queues
  • Contain semantically equivalent sub-plans
  • Can be migrated from one to another

48
But How?
  • Proposal of two Migration Strategies
  • Moving state strategy
  • Parallel track strategy
  • Comparison via Cost models
  • Experimental Evaluation

49
Parallel Track Strategy
  • The new plan and the old plan co-exist during migration
  • They run in parallel
  • They share input queues and output queues
  • Window constraints are used to eventually time out the old states
  • When that happens, the migration stage is over
  • Discard the old plan and run only the new plan

50
A Running Example
[Figure: parallel track migration example: the old box (AB then ABC, with States A, B, AB and C) and the new box (BC then ABC, with States B, C, BC and A) run side by side, sharing input queues A, B, C and output queue ABC; outputs such as a1b1c1, a2b1c1, a2b2c2, a3b1c1 and a3b2c2 are produced while the window constraint W times out old tuples]
51
Pros and Cons
  • We don't need to halt the system in order to migrate
  • Low delay in generating results
  • Overhead: in the old plan part, an all-new tuple pair is
    discarded only at the last node

52
Moving State Strategy
  • First, freeze the inputs and drain out the old plan
  • Then, establish and connect the new plan
  • Next, move all old states over to the new states
  • Lastly, let all new input data go to the new plan only
  • Resume processing

53
Moving State Strategy
  • State matching: compare the states of the old and new plans
  • State moving: if two states match, move the tuples over
  • State re-computation: if there is no match, recompute the
    state
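A tiny sketch of the three steps, under the simplifying assumption that each state is keyed by the set of stream names it covers; the keys and the recompute helper are illustrative, not Raindrop's API:

```python
def migrate_states(old_states, new_keys, recompute):
    """Moving state strategy: match states by key, move matches,
    and re-compute the rest."""
    new_states = {}
    for key in new_keys:
        if key in old_states:
            # State matching succeeded: move the matched state as-is.
            new_states[key] = old_states[key]
        else:
            # No match: re-compute the state from the old plan's contents.
            new_states[key] = recompute(key, old_states)
    return new_states

# Old plan (A join B) join C kept states A, B, AB and C.
old = {"A": ["a1", "a2"], "B": ["b1"], "AB": ["a1b1"], "C": ["c1"]}
# New plan A join (B join C) needs states A, B, C and BC; only BC is missing.
new = migrate_states(old, ["A", "B", "C", "BC"],
                     lambda key, s: [b + c for b in s["B"] for c in s["C"]])
```

Here only the intermediate state BC has to be re-computed; all leaf states are matched and moved, which is exactly where the extra cost of this strategy comes from.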

54
Abstract Description
[Figure: matching states between the old plan (node AB with States A and B; node ABC with States AB and C) and the new plan (node BC with States B and C; node ABC with States A and BC)]
55
Intermediate States Sharing
  • We can share the intermediate state BC only if:
  • The inputs for both plans are exactly the same
  • Tuples arriving at the same state have passed exactly the
    same predicates
  • Both conditions must hold for any sharing to be possible

56
Moving State Strategy
[Figure: moving state migration example: matched states (A, B, C) are moved from the old boxes to the new ones; the unmatched State BC is re-computed from States B and C (b1c1, b2c2); the old boxes and their partial results (marked X) are discarded; outputs a3b1c1, a2b2c2 and a3b2c2 go to output queue ABC]
57
Why We Need Two Pointers
  • Two nodes sharing the same state may have different contents
  • Each node keeps two pointers into each associated state:
  • First points to the first tuple in the state for that node
  • Last points to the last tuple in the state for that node

[Figure: nodes AB and BC of the two plans point into a shared State B = {b1, b2, b3, b4} with their own first/last pointers]
58
Compare the Two Migration Algorithms
  • Different distribution of tasks between old and
    new plans

A B C   Plan used in strategy 1   Plan used in strategy 2
O O O   Old                       Old
O O N   Old                       New
O N O   Old                       New
N O O   Old                       New
N N O   Old                       New
N O N   Old                       New
O N N   Old                       New
N N N   New                       New

N = new tuple, arriving after the migration start time.
O = old tuple, arriving before the migration start time.
New = the new query plan computes the result.
Old = the old query plan computes the result.
59
Which algorithm performs better?
  • Compare the performances: which one is faster? More cost-saving?
  • The new query plan should outperform the old query plan
  • In algorithm 1 (parallel track), the old plan part deals with
    7 out of 8 cases
  • In algorithm 2 (moving state), the new plan part deals with
    7 out of 8 cases
  • Algorithm 2 seems to be winning
  • However, algorithm 2 needs extra cost to re-compute
    intermediate states
  • So cost models are needed!

60
Cost Model Assumptions
  • All joins are binary NL joins
  • Assume we already know the statistics of each node in the
    query plan:
  • Input arrival rate
  • Join node selectivity
  • We compute the processing power needed in the period of time
    during which migration happens,
  • not the real power that the system used.
  • The two differ because of resource limitations in a real
    system.

61
Cost Model Assumptions (cont.)
  • Assume tuple processing time is the same for tuples of
    different sizes.
  • Assume that when migration starts, the old query plan has
    passed its start-up stage and is fully running:
  • States are at their maximum size, controlled by the window
    constraints.
  • Window constraints are
  • Time-based.
  • The same over all streams in a join.

62
Running Example Revisited
[Figure: old query plan (A join B) join C vs. new query plan A join (B join C)]
63
Symbol Definitions
  • λa, λb, λc: the average arrival rates (tuples/time unit) on
    streams A, B and C.
  • σab, σabc_old: the selectivities of nodes AB and ABC in the
    old query plan.
  • σbc, σabc_new: the selectivities of nodes BC and ABC in the
    new query plan.
  • W: the window constraint over all joins, time-based, for
    example 5 time units.
  • t: any t time units after migration has started.
  • Cj: cost of joining a pair of tuples, including the cost of
    accessing the 2 tuples, comparing their values, and so on.
  • Cs: cost of inserting/deleting a tuple to/from a state.
  • Ct: cost of accessing a tuple, for example checking a
    tuple's timestamp.
  • |input_name|: the size of the state for one input of a node.
    For example, |A| represents the size of State A in node AB,
    and |A| = λa·W.
64
Cost Model for Algorithm I, old plan part
  • Cost for node AB in the old plan
  • The state starts full
  • C_AB = (cost of purge + cost of insert + cost of join) · t
  •      = [Cs·(λa + λb) + Cs·(λa + λb) + Cj·(λa·|B| + λb·|A|)] · t
  •      = [2·Cj·λa·λb·W + 2·Cs·(λa + λb)] · t    --- formula (1)
  • λ_AB = (λa·|B| + λb·|A|)·σab = 2·λa·λb·W·σab    --- formula (2)
  • N_AB = 2·λa·λb·W·σab·t
  • Apply the same formulas to each node in the old query plan.
  • Adding up the cost of each node gives the total cost.
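As a sanity check, formulas (1) and (2) can be evaluated directly; the statistics below are made-up sample values, chosen only to show the arithmetic:

```python
la, lb = 2.0, 3.0        # λa, λb: arrival rates (tuples / time unit)
W = 5.0                  # time-based window constraint
Cj, Cs = 0.01, 0.001     # per-pair join cost, per-tuple state maintenance cost
sigma_ab = 0.1           # selectivity of node AB
t = 10.0                 # time units since migration started

# Formula (1): cost of node AB in the old plan over t time units.
C_AB = (2 * Cj * la * lb * W + 2 * Cs * (la + lb)) * t   # = 6.1
# Formula (2): output rate of node AB, and tuples produced by time t.
lambda_AB = 2 * la * lb * W * sigma_ab                   # = 6.0 tuples / time unit
N_AB = lambda_AB * t                                     # = 60 tuples
```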

[Figure: old plan nodes AB (State A = {a1, a2}, State B = {b1, b2}) and ABC (State AB = {a1b1, a2b1, a2b2}, State C = {c1, c2, c3})]
65
Cost Model for Algorithm I, new plan part
  • Cost for node BC in the new plan
  • The state starts empty, so no purge is needed while t < W.
  • C_BC = join cost + insert cost
  •      = t²·λb·λc·Cj + (λb·t + λc·t)·Cs, where t < W    --- formula (3)
  • In the i-th time unit after the migration start time, the
    output rate is
  •      λi = (2i − 1)·λb·λc·σbc
  • The total number of tuples generated in any time t (t < W) is
  •      N_BC = Σ_{i=1..t} λi = t²·λb·λc·σbc    --- formula (4)
  • Apply the same formulas to each node in the new plan.

[Figure: new plan nodes BC (States B and C) and ABC (States A and BC)]
66
Cost Model for Algorithm II
  • Extra cost is needed for computing the new states.
  • For the running example, we need to compute state BC. The
    extra cost is
  •      C_stateBC = Cj·|B|·|C| = Cj·(λb·W)·(λc·W) = W²·λb·λc·Cj
  • We can apply formulas (1) and (2) to compute the cost of each
    node in the old plan,
  • because all states are full after the new states are
    re-computed.
  • Adding the two together gives the total cost of algorithm II.

67
Analysis on Cost Models
  • Several parameters control the performance of the two
    migration algorithms:
  • Arrival rate
  • Join node selectivity
  • Window size
  • Time
  • Costs may not be linear in time,
  • as in the new plan part of algorithm I
  • Total migration time largely depends on the window size
  • Experiments are designed by varying those parameters.

68
(No Transcript)
69
(No Transcript)
70
Some remaining challenges
  • Alternate Migration Strategies
  • Selection of Box Sizes
  • Dynamic Optimization and Migration
  • Comparison Study to Eddies

71
PART III Adaptive Scheduler Selection
Framework
  • Joint work with Brad Pielech and Tim Sutherland

72
Idea
  • Propose a novel adaptive selection of scheduling strategies
  • Observations:
  • The scheduling algorithm has a large impact on the behavior
    of the system
  • Utilizing a single scheduling algorithm to execute a
    continuous query is not sufficient, because all scheduling
    algorithms have inherent flaws or tradeoffs
  • Hypothesis:
  • Adaptively choosing the next scheduling strategy can leverage
    the strengths and weaknesses of each and outperform any
    single strategy

73
Continuous Query Issues
  • The arrival rate of data is unpredictable.
  • The volume of data may be extremely high.
  • Different domains may have different service requirements.
  • A scheduling strategy such as Round Robin or FIFO is designed
    to solve one particular scheduling problem (minimize memory,
    maximize throughput, etc.).
  • What happens if we have multiple problems to solve?

74
Scheduling Example
s = tuples output / tuples input (operator selectivity)
C = operator processing cost, in time units
  • Operator 2 is the quickest and most selective
  • Operator 1 is the slowest and least selective
  • Every 1 time unit, a new tuple arrives in Q1, starting at t = 0
  • When told to run, an operator processes at most 1 tuple; any
    extra is left in its queue for a later run
  • An operator takes C time units to process its input,
    regardless of the size of the input
  • An operator's output size = s × number of input tuples; if an
    operator inputs 1 tuple and s = 0.9, it outputs 0.9 tuples
  • Assume zero time for context switches
  • Assume all tuples are the same size

75
Scheduling Example II
  • Two scheduling strategies:
  • FIFO: start at the leaf and process the newest tuple until
    completion: 1, 2, 3, 1, 2, 3, etc.
  • Greedy: schedule the operator with the most tuples in its
    input buffer: 1, 1, 1, 2, 1, 1, 1, 2, 3, ...

We will compare throughput and total queue sizes for both
algorithms over the first few time units.
76
Scheduling Example: FIFO
FIFO: start at the leaf and process the newest tuple until
completion.

[Figure: chain Stream -> Operator 1 (s = 0.9, C = 1) -> Operator 2 (s = 0.1, C = 0.25) -> Operator 3 (s = 1, C = 0.75) -> End User]

  • FIFO's queue size grows very quickly.
  • It spends 1 time unit processing Operator 1, then 1 time unit
    processing Operators 2 and 3.
  • During these 2 time units, 2 new tuples arrive in Operator
    1's queue.

FIFO outputs 0.09 tuples to the end user every 2 time units.
77
Scheduling Example: Greedy
  • Greedy: schedule the operator with the largest input queue.

[Figure: same chain as before: Stream -> Operator 1 (s = 0.9, C = 1) -> Operator 2 (s = 0.1, C = 0.25) -> Operator 3 (s = 1, C = 0.75) -> End User]

  • Greedy's queue size grows at a slower rate because Operator 1
    is run more often
  • But tuples remain queued for long periods in Operators 2 and
    3, until their queue sizes become larger than Operator 1's
  • Greedy finally outputs a tuple at about t = 16
  • At t = 16, Greedy outputs 1 tuple; by then, FIFO has output
    0.72 (0.09 × 8) tuples
78
Scheduling Example: Wrap-up
  • FIFO
  • + Outputs tuples at regular intervals
  • - Q1 grows very quickly
  • - Output rate is low (0.045 tuples / time unit)
  • - Does not utilize operators fully: O1 runs with 1 tuple per
    run, O2 with 0.9, O3 with 0.09; the max is 1 tuple
  • Greedy
  • + Queue sizes grow less quickly than FIFO's
  • + Output rate is higher (0.0625 tuples / time unit)
  • + More fully utilizes operators: each operator runs with 1
    tuple each time
  • - Some tuples stay in the system for a long time
  • - Long delay before any tuples are output
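The FIFO figures above can be reproduced in a few lines (Greedy's 0.0625 rate depends on simulating the full schedule, so only FIFO is checked here):

```python
# Operator chain from the example: (selectivity s, processing cost C).
ops = [(0.9, 1.0), (0.1, 0.25), (1.0, 0.75)]

# FIFO pushes one input tuple through the whole chain per cycle.
cycle_time = sum(cost for _, cost in ops)  # 1 + 0.25 + 0.75 = 2 time units
out_per_cycle = 1.0
for s, _ in ops:
    out_per_cycle *= s                     # 1 * 0.9 * 0.1 * 1 = 0.09 tuples
fifo_rate = out_per_cycle / cycle_time     # 0.045 tuples per time unit
```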

79
So? What is our point?
A single scheduling strategy is NOT sufficient when dealing with
varying input stream rates, data volumes and service
requirements!
80
New Adaptive Framework
  • In response to this need, we propose a novel technique that
    selects among several scheduling strategies based on current
    system conditions and quality-of-service requirements

81
Adaptive Framework
  • Choosing among several scheduling strategies can leverage
    the strengths of each strategy and minimize the use of a
    strategy when it would not perform as well.
  • Allowing a user to input service requirements means that the
    CQ system can adapt to the user's needs, not to a static set
    of needs fixed by the CQ system.

82
Quality of Service Preferences
  • Each application can specify its service requirements as a
    weighted list of behaviors to be maximized or minimized as
    desired.
  • Assumptions / restrictions:
  • One (global) preference set at a time
  • Preferences can change during execution.
  • Only relative behavior can be specified.

83
Service Preferences II
  • Input: three parameters
  • Metric: any statistic that is calculated by the system.
  • Quantifier: maximize or minimize the given metric.
  • Weight: the relative weight / importance of this metric. The
    sum of all weights is exactly 1.
  • Currently supported metrics:
  • Throughput
  • Queue sizes
  • Delay

84
Adaptive Selection Overview
  • Given a table of service preferences and a set of candidate
    scheduling algorithms:
  • Initially run each candidate algorithm once, in order to
    gather some statistics about its performance
  • Assign a score to each algorithm based on how well it has met
    the preferences relative to the other algorithms (score
    formulas on the next slide)
  • Choose the scheduling algorithm that will best meet the
    preferences, based on how the algorithms have performed thus
    far
  • Run that algorithm for a period of time and record statistics
  • Repeat steps 2-4 until the query or the streams are over

85
Adaptive Formulas
Zi: the normalized statistic for preference i
I: the number of preferences
H: the historical category
decay: the decay factor
A scheduler's score is computed by summing, over all the
statistics defined by the user, the normalized statistic score
times the weight of that statistic.
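In other words, score = Σi wi·Zi over the user's preferences. A minimal sketch; the handling of "minimize" by flipping the normalized value, and the blending with history via the decay factor, are assumptions for illustration:

```python
def score(stats, prefs, history=0.0, decay=0.0):
    """Weighted sum of normalized statistics, optionally blended
    with a historical score via the decay factor (assumed rule)."""
    s = 0.0
    for metric, (quantifier, weight) in prefs.items():
        z = stats[metric]                 # Zi, normalized to [0, 1]
        if quantifier == "minimize":      # assumption: invert for minimize
            z = 1.0 - z
        s += weight * z
    return decay * history + (1.0 - decay) * s

prefs = {"throughput": ("maximize", 0.6), "delay": ("minimize", 0.4)}
stats = {"throughput": 0.8, "delay": 0.25}   # normalized observations
s = score(stats, prefs)                      # 0.6*0.8 + 0.4*0.75 = 0.78
```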
86
Choosing Next Scheduling Strategy
  • Once the schedulers' scores have been calculated, the next
    strategy has to be chosen.
  • All strategies should be explored initially, so the framework
    can learn how each will perform
  • Strategies should be rotated periodically, because a strategy
    that did poorly before could be viable now
  • Remember: the score of the last-run algorithm is not updated;
    only the other candidates have their scores updated
  • Roulette wheel MIT99:
  • Chooses the next algorithm with a probability proportional to
    its score
  • Assigns an initial score to each strategy so that each will
    have a chance to run
  • Favors the better-scoring algorithms, but will
    still pick others.
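Roulette wheel selection can be sketched as follows; the strategy names and scores are illustrative:

```python
import random

def roulette_pick(scores, rng=random):
    """Roulette wheel selection: pick a strategy with probability
    proportional to its score."""
    total = sum(scores.values())
    r = rng.uniform(0.0, total)
    cumulative = 0.0
    for name, s in scores.items():
        cumulative += s
        if r <= cumulative:
            return name
    return name  # guard against floating-point round-off at the top end

# Scores produced by the scoring step (illustrative values).
scores = {"FIFO": 0.2, "Greedy": 0.5, "RoundRobin": 0.3}
choice = roulette_pick(scores)
```

With these scores, "Greedy" is picked about half the time, yet "FIFO" still gets a chance, which is exactly the explore/exploit balance described above.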

87
Experiments
  • 3 parameters
  • Number of streams
  • Arrival Pattern
  • Number of service preferences

88
Experiment Setup
  • 5 different query plans
  • Select and window-join operators
  • Incoming streams use simulated data with a Poisson arrival
    pattern; the mean arrival time is altered to control
    burstiness
  • The goal is to show that the adaptive strategy meets the
    preferences better than a single algorithm; if not, the
    technique is not worthwhile

89
Single Stream Result (2 Requirements)
The adaptive strategy performs as well as PTT in
this environment
90
Single Stream Result (3 Requirements)
91
Multi Stream Result (2 Requirements)
The Adaptive Framework performs as well as, if not better than,
both individual scheduling algorithms under differing service
requirements.
92
Multi Stream Result (3 Requirements)
93
Related Work Comparison
  • Aurora CCC02: more complex QoS model; makes use of alternate adaptive techniques
  • STREAM MWA03: only meets the memory requirement
  • NiagaraCQ CDT00: only adapts prior to query execution; concerned more with generating optimal query plans
  • Eddies AH00 (Telegraph): finer-grained adaptive strategy; we look to incorporate it in the future
  • XJoin UF00: finer-grained technique; we look to incorporate it in the future
  • UF01: only focuses on maximizing rate; uses a single adaptive strategy
  • Tukwila IFF99, UF98: reorganizes plans on the fly; we look to incorporate this in the future
94
Conclusions
  • Identified a gap in existing CQ research and proposed a
    novel adaptive technique to address it.
  • Draws on genetic algorithms and AI research
  • Alters the scheduling algorithm based on how well the
    execution is meeting the service preferences
  • The adaptive strategy is showing promising experimental
    results:
  • Never performs worse than any single strategy
  • Often performs as well as the best strategy, and sometimes
    outperforms it
  • Adapts to varying user environments without manually changing
    scheduling strategies

95
Overall Blizz
  • Many interesting problems arise in this new
    stream context
  • There is room for lots of fun research

96
http://davis.wpi.edu/dsrg/raindrop/
Project Overview, Publications, Talks
Email: raindrop@cs.wpi.edu