Eddies: Continuously Adaptive Query Processing - PowerPoint PPT Presentation

Slides: 34
Provided by: Avn8

Transcript and Presenter's Notes

Title: Eddies: Continuously Adaptive Query Processing


1
Eddies: Continuously Adaptive Query Processing
  • Based on a SIGMOD 2000 paper and talk
    by Avnur and Hellerstein.

2
State of the Art in Query Optimization
  • Given:
    • Database state and statistics known a priori
    • One (short) user query to process
    • Query may be run only once
  • Query processing:
    • Decide a priori on a (static) query plan
    • Run the query using this one plan
  • Also:
    • Possibly update statistics sometimes (in steady
      state)

3
Adaptive Systems: General Flavor
  • Repeat:
    • Observe (model) environment
    • Use observation to choose behavior
    • Take action

4
Adaptivity in Current DBs
  • Limited and coarse-grained
  • Repeat:
    • Observe (model) environment
      • runstats (once per week!!) models changes in data
    • Use observation to choose behavior
      • query optimization fixes a single static query
        plan
    • Take action
      • query execution blindly follows the plan

5
Query Optimization
  • Adaptivity at a per-week frequency!
  • Not suited for volatile environments

6
A Networking Problem!?
  • Networks do dataflow!
  • Significant history of adaptive techniques
    • E.g., TCP congestion control
    • E.g., routing
  • But traditionally much lower function
    • Ship bitstreams
    • Minimal, fixed code
  • Lately, moving up the food chain?
    • App-level routing
    • Active networks

7
Query Plans are Dataflow
  • Programming model: iterators
    • Old idea, widely used in DB query processing
    • Object with three methods:
      Init(), GetNext(), Close()
    • Input/output types
  • Query plan: graph of iterators
  • Pipelining: iterators that return results before
    children Close()
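
The iterator model on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `Scan` and `Select` and the helper `run` are hypothetical, and the methods are lower-cased versions of Init(), GetNext(), Close().

```python
class Scan:
    """Leaf iterator over an in-memory list (a stand-in for a table scan)."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._pos = 0

    def init(self):
        self._pos = 0

    def get_next(self):
        # Return the next row, or None when the input is exhausted.
        if self._pos >= len(self._rows):
            return None
        row = self._rows[self._pos]
        self._pos += 1
        return row

    def close(self):
        pass


class Select:
    """Pipelined selection: pulls from its child one row at a time."""

    def __init__(self, child, predicate):
        self._child = child
        self._predicate = predicate

    def init(self):
        self._child.init()

    def get_next(self):
        # Keep pulling until a row passes the predicate or input ends.
        while True:
            row = self._child.get_next()
            if row is None or self._predicate(row):
                return row

    def close(self):
        self._child.close()


def run(plan):
    """Drive the root iterator of a plan and collect its output."""
    plan.init()
    out = []
    row = plan.get_next()
    while row is not None:
        out.append(row)
        row = plan.get_next()
    plan.close()
    return out
```

A plan is then just a nesting of such objects, e.g. `run(Select(Scan([1, 2, 3, 4]), lambda r: r % 2 == 0))`. The selection returns rows before its child is closed, which is what the slide calls pipelining.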

8
Querying in Volatile Environments
  • Federated query processors
    • No control over stats, performance, admin
      (DataJoiner)
  • Shared-nothing systems
    • No control over system balance
  • User control of running queries
    • No control over user interaction (online
      aggregation)
  • Sensor nets: the next killer app
    • No control over anything!

9
Varying
  • Computing resources
    • Data flows unpredictably from sources
    • Code performs unpredictably along flows
    • Continuous volatility due to many decentralized
      systems
  • Data characteristics
    • Distributions
    • Burstiness
  • User preferences
    • What to get fast
    • How much data

10
Toward Continuous Adaptivity
  • Need much more frequent adaptivity
  • Goal: adapt per tuple of each relation?
  • The traditional runstats-optimize-execute loop is
    far too coarse-grained
  • So, continuously perform all 3 functions, at
    runtime
  • Aim for adaptivity over best-case performance (as
    the latter never exists for long)

11
Road Map
  • Adaptive Query Processing
  • Intra-join adaptivity
  • Synchronization Barriers
  • Moments of Symmetry
  • Eddies
  • Encapsulated, adaptive dataflow

12
Adaptable Operators and Plans
  • Moments of symmetry: query processing stages
    during which pipelined query operators or inputs
    can be easily reordered (with no or minimal state
    management)
  • Synchronization barriers: require inputs from
    different sources to be coordinated, possibly
    restricting progress to the rate of the slower
    input
  • We need good operators.

13
Adaptable Joins, Issue 1
  • Synchronization barrier: merge join
    • Right input frozen, waiting for left
    • Can't adapt while waiting for barrier!
  • So, favor joins that have:
    • no barriers, or seldom barriers
    • at worst, adaptable barriers

14
Adaptable Joins, Issue 2
  • Would like to reorder in-flight (pipelined) joins
  • Base case: swap inputs to a join
  • Moment of symmetry:
    • inputs can be swapped with no/little state
      management
  • Aim for frequent moments of symmetry → more
    frequent adaptivity

17
Adaptable Joins, Issue 2
  • Moments of Symmetry
  • Suppose you can adapt an in-flight query plan
    • How would you do it?
  • Base case: reorder inputs of a single join
    • Nested loops join
      • Cleaner if you wait until the end of the inner
        loop
    • Hybrid hash
      • Reorder while building?

18
Moments of Symmetry, cont.
  • Moment of symmetry:
    • Can swap join inputs w/o state modification
    • Nested loops join: end of each inner loop
    • Hybrid hash join: never
    • Sort-merge join: essentially always
  • More frequent moments of symmetry → more
    frequent adaptivity

19
Joins for Adaptivity
  • Pipelined hash join (hash ripple or XJoin)
    • No synchronization barriers
    • Continuous symmetry
    • Good for equi-join
  • Simple (or block) ripple join
    • Synchronization barriers at corners
    • Moments of symmetry at corners
    • Good for non-equi-join
  • When is there symmetry? At corners, i.e., for each
    new tuple, once it has been processed using the
    given operator's state
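
To make "no synchronization barriers, continuous symmetry" concrete, here is a minimal sketch of a symmetric (pipelined) hash join for an equi-join. It is a generic textbook version and not the hash ripple or XJoin implementation; the class name `SymmetricHashJoin`, the side labels "R" and "S", and the `insert` method are assumptions made for illustration.

```python
from collections import defaultdict


class SymmetricHashJoin:
    """Equi-join that accepts tuples from either input in any order."""

    def __init__(self):
        # One in-memory hash table per input, keyed on the join key.
        self._tables = {"R": defaultdict(list), "S": defaultdict(list)}

    def insert(self, side, key, row):
        """Add a row arriving on `side`; return the join results it produces."""
        other = "S" if side == "R" else "R"
        # Build: remember the row in its own side's table.
        self._tables[side][key].append(row)
        # Probe: match against everything seen so far on the other side.
        matches = self._tables[other].get(key, [])
        if side == "R":
            return [(row, m) for m in matches]
        return [(m, row) for m in matches]
```

Each arriving tuple is built into its own table and immediately probed against the other, so neither input ever waits for the other, and the two inputs play identical roles after every tuple.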

20
Beyond Binary Joins
  • Think of swapping inners
    • Can be done at a global moment of symmetry
  • Intuition: like an n-ary join
    • Except that each pair can be joined by a
      different algorithm!
  • So:
    • Need to introduce n-ary joins to a query engine

21
Need well-behaved join algorithms
  • Pipelining
  • Avoid synch barriers
  • Frequent moments of symmetry

22
Continuous Adaptivity Goal: Eddies
  • Avoid need for traditional cost estimation
  • Avoid generation of a good query plan

23
Continuous Adaptivity: Eddies
  • A pipelining n-ary tuple-routing iterator
    (just like join or sort)
  • Works well with ops that have frequent moments of
    symmetry

24
Continuous Adaptivity: Eddies
  • Adjusts flow adaptively
  • Tuples flow in different orders
  • Visit each op once before output

25
Routing Eddies
  • Naïve routing policy:
    • All ops fetch from eddy as fast as possible
    • Previously-seen tuples precede new tuples

26
Schedule: Grab when Ready?
  • Two expensive selections, s1 and s2
  • Selectivity(s1) = Selectivity(s2) = 50%
  • Cost(s2) = 5.
  • Vary Cost(s1).
  • What do we expect?
  • Does it make a difference at all?

27
Cost Factor?
  • Two expensive selections, 50% selectivity
  • Cost(s2) = 5. Vary cost of s1.
  • Favors faster operation

28
But is it Enough?
  • Given two expensive selections
  • Costs are the same, say cost(s1) = cost(s2) = 5
  • Selectivity(s2) = 50%.
  • Vary selectivity of s1.
  • Does that make a difference?

29
Selectivity-based?
  • Two expensive selections, cost 5
  • Selectivity(s2) = 50%. Vary selectivity of s1.

30
Schedule: Selectivity-based?
  • Conclusion: a heavy tuple shedder early on is good.
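
A small worked calculation shows why shedding tuples early helps. It assumes the two selections are independent and uses an illustrative 10% selectivity for s1, a value not taken from the slides. The function name `expected_cost` is hypothetical.

```python
def expected_cost(order):
    """Expected work per input tuple for selections applied in `order`.

    order: list of (cost, selectivity) pairs; selectivity is the fraction
    of tuples that pass the selection.
    """
    total = 0.0
    surviving = 1.0  # fraction of tuples that reach the current operator
    for cost, selectivity in order:
        total += surviving * cost
        surviving *= selectivity
    return total


s1 = (5.0, 0.1)  # cost 5, passes 10% of tuples (assumed value)
s2 = (5.0, 0.5)  # cost 5, passes 50% of tuples (as on the slide)

cost_s1_first = expected_cost([s1, s2])  # 5 + 0.1 * 5
cost_s2_first = expected_cost([s2, s1])  # 5 + 0.5 * 5
```

With equal costs, running the more selective s1 first costs about 5.5 units per tuple against about 7.5 for the opposite order, because fewer tuples survive to pay for the second selection.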

31
How to choose?
  • If we knew all selectivities and all costs (and
    they were static), maybe we could pick the best
    overall schedule here.
  • Otherwise, we need a cheap means to observe their
    changes
  • And, we need a means to react in a simple manner
    based on those perceived changes

32
An Aside: How to choose?
  • A machine learning problem?
    • Each agent pays off differently
    • Explore or exploit?
  • Heuristics?
    • Sometimes want to randomly choose one
    • Usually want to go with the best
    • If probabilities are stationary, dampen
      exploration over time

33
Eddies with Lottery Scheduling
  • Operator gets 1 ticket when it takes a tuple
    • Favors operators that run fast (low cost)
  • Operator loses a ticket when it returns a tuple
    • Favors operators that drop tuples (low
      selectivity)
  • Winner?
    • Large number of tickets: measure of goodness
  • Lottery scheduling:
    • When two operators vie for the same tuple,
      hold a lottery
    • Never let any operator go to zero tickets
    • Supports occasional random exploration
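
The ticket rules above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `LotteryRouter` and `route` are hypothetical, operators are modeled as plain predicates, and a floor of one ticket is one assumed way to honor "never go to zero tickets".

```python
import random


class LotteryRouter:
    """Keeps a ticket count per operator and holds weighted lotteries."""

    def __init__(self, names, seed=0):
        self.tickets = {name: 1 for name in names}  # start at the floor
        self._rng = random.Random(seed)

    def pick(self, eligible):
        # Weighted draw: more tickets means a higher chance of winning.
        weights = [self.tickets[name] for name in eligible]
        return self._rng.choices(eligible, weights=weights, k=1)[0]

    def on_take(self, name):
        self.tickets[name] += 1  # credit for consuming a tuple

    def on_return(self, name):
        # Debit for sending a tuple back, but never drop below one ticket.
        self.tickets[name] = max(1, self.tickets[name] - 1)


def route(router, row, operators):
    """Send one tuple through all operators; return True if it survives."""
    done = set()
    while len(done) < len(operators):
        eligible = [name for name in operators if name not in done]
        name = router.pick(eligible)
        router.on_take(name)
        done.add(name)
        if not operators[name](row):
            return False  # dropped: the operator keeps its ticket
        router.on_return(name)
    return True
```

An operator that drops most of its tuples accumulates tickets and so tends to win the lottery and see tuples first, while the one-ticket floor leaves every operator some chance of being tried again.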

34
Lottery-Based Eddy
  • Two expensive selections, cost 5
  • Selectivity(s2) = 50%. Vary selectivity of s1.

35
In a Volatile Environment
  • Two index joins
    • Slow: 5 second delay; Fast: no delay
    • Toggle after 30 seconds

36
Related Work
[Figure: prior work placed along a "Frequency of Adaptivity" axis. Labels: System R, Late Binding, Per Query, Competition Sampling, Inter-Operator, Query Scrambling, Ingres DECOMP, Eddies, Future Work]
  • Late Binding: Dynamic, Parametric
    [HP88, GW89, IN92, GC94, AC96, LP97]
  • Per Query: Mariposa [SA96], ASE [CR94]
  • Competition: RDB [AZ96]
  • Inter-Op: [KD98], Tukwila [IF99]
  • Query Scrambling: [AF96, UFA98]
  • Survey: Hellerstein, Franklin, et al., DE
    Bulletin 2000

37
Summary
  • Eddies: Continuously Adaptive Dataflow
    • Suited for volatile performance environments
      • Changes in operator/machine performance
      • Changes in selectivities (e.g., with sorted
        inputs)
      • Changes in data delivery
    • Currently adapts join order
      • Competitive methods to adapt access & join
        methods?
    • Requires well-behaved join algorithms
      • Pipelining
      • Avoid synch barriers
      • Frequent moments of symmetry
  • The end of the runstats/optimizer/executor
    boundary!
    • At best, System R is good for hints on initial
      ticket distribution