Adaptive Query Processing with Eddies - PowerPoint PPT Presentation

About This Presentation
Title:

Adaptive Query Processing with Eddies

Description:

Choosing which operator to route a given tuple to. The brain of the eddy ... E.g. hash join operator builds a hash table on one relation first, and then ... – PowerPoint PPT presentation

Number of Views:84
Avg rating:3.0/5.0
Slides: 93
Provided by: mega101
Category:

less

Transcript and Presenter's Notes

Title: Adaptive Query Processing with Eddies


1
Adaptive Query Processing with Eddies
  • Amol Deshpande
  • University of Maryland

2
Roadmap
  • Adaptive Query Processing Motivation
  • Eddies AH00
  • STAIRs DH04 and SteMs RDH03
  • Experimental Study
  • Implementation in PostgreSQL Des03
  • Continuous queries MSHR02 (very briefly)
  • Open problems

3
Query Processing in Database Systems
Declarative Query
Database System
Results
We will focus on traditional select-project-join
queries
4
Query Processing Example
select from students, enrolled, courses where
students.name enrolled.name and
enrolled.course courses.course
Database System
Course Instructor
CS2 Smith
Name Course
Joe CS1
Jen CS2
Name Level
Joe Junior
Jen Senior
Students
Enrolled
Courses
5
Query Processing Example
select from students, enrolled, courses where
students.name enrolled.name and
enrolled.course courses.course
Name Level Course Instructor
Jen Senior CS2 Smith
Name Level Course
Joe Junior CS1
Jen Senior CS2
Course Instructor
CS2 Smith
Courses
Name Course
Joe CS1
Jen CS2
Name Level
Joe Junior
Jen Senior
Students
Enrolled
6
Example Query Execution Plans
SEC
SEC
CE
SE
S
C
Students
Courses
C
S
E
E
Courses
Students
Enrolled
Enrolled
A Query Execution Plan
An alternate Execution Plan
7
Cost-based Query Optimization
Estimate cost of each plan and choose the best
SEC
Cost g(SE, C, R)
Input sizes

SE
C
Cost f(S, E, R)
Courses

S
E
Students
Enrolled
Runtime Parameters
Cost (Plan)
A Query Execution Plan
8
Cost-based Query Optimization
Results
Query Optimizer
Query Executor
Compiled Query Plan
Declarative Query
Disk(s)
9
Cost-based Query Optimization
Results
Query Optimizer
Query Executor
Compiled Query Plan
Declarative Query
Wide area data sources e.g. remote tables,
web data sources
Disk(s)
10
Cost-based Query Optimization
Results
Query Optimizer
Query Executor
Compiled Query Plan
Declarative Query
Streaming data e.g. Stock tickers
Network logs Sensor networks
11
Estimation Errors
Cost g(SE, C, R)
SEC
SE
C
Courses
S
E
Students
Enrolled
A Query Execution Plan
12
Estimation Errors
Cost g(SE, C, R)
SEC
SE
Unknown runtime parameters
C
Courses
S
E
Students
Enrolled
A Query Execution Plan
13
How to solve this problem ?
  • More sophisticated estimation techniques
  • Sophisticated summary structures
  • e.g. MHists PI97, Wavelets VWI98
  • Feedback loop in the optimization process
  • e.g. SLMK01, BC02
  • Adaptive query processing
  • Cant always build and maintain synopses
  • Runtime environments can be very unpredictable
  • Soadapt query plans mid-way during execution

14
Eddies Extreme Adaptivity
static plans
per tuple
inter- operator
late binding
intra- operator
Dynamic QEP, Parametric, Competitive
Traditional DBMS
Query Scrambling, MidQuery Re-opt
XJoin, DPHJ Convergent QP
Eddies
  • Telegraph TelegraphCQ (at UC Berkeley)
  • Eddies AH00
  • SteMs RDH03
  • Continuous queries MSHR02, CF02, C03, K03
  • Implementation in PostgreSQL Des04
  • Fault-tolerance and load balancing SHB04
  • STAIRs DH04
  • Other work
  • Distributed eddies, Content-based Routing BB05

15
Roadmap
  • Adaptive Query Processing Motivation
  • Eddies AH00
  • STAIRs DH04 and SteMs RDH03
  • Experimental Study
  • Implementation in PostgreSQL Des03
  • Continuous queries MSHR02 (very briefly)
  • Open problems

16
Eddies AH00
select from S where pred1(S) and pred2(S)
Plans considered by the optimizer
pred2(S)
pred1(S)
S
Output
pred1(S)
pred2(S)
S
Output
Decision made apriori based on statistics Sort
by (1-s)/c, where s selectivity, c cost Once
this decision is made, all tuples are processed
using the same order
17
Eddies AH00
select from S where pred1(S) and pred2(S)
Executing the query using an Eddy
  • An eddy operator
  • Intercepts tuples from source(s) and output
    tuples from operators
  • Query executed by routing tuples between the
    operators
  • Uses feedback from the operators to route

Change routing gt Change query
execution plan used
18
Per-tuple State
select from S where pred1(S) and pred2(S)
Executing the query using an Eddy
pred2(S)
Eddy
Output
S
  • Two Bitmaps
  • Ready bits - which operators can a tuple be
    routed to next
  • Done bits - which operators has a tuple already
    been through

pred1(S)
Example Ready(t1) 1, 1 - can be routed to
either Done(t1) 0, 0 - not done either
Example Ready(t2) 1, 0 - can be routed to
pred1 Done(t2) 0, 1 - done pred2
For selection queries, ready is a bit-complement
of done
19
Eddies Routing Policy
  • Choosing which operator to route a given tuple to
  • The brain of the eddy

Pred2 is more selective Send here 99 of the
time Send to the other operator 1 of the time
Lottery Scheduling Avnur 00 Simplified
Description 1. Maintain for each operator
tuples sent tuples returned
cost per tuple 2. Choose (roughly) based on the
above 3. Explore by randomly sending tuples in
the wrong orders
sent 100 received 2
pred2(S)
Eddy
Output
S
pred1(S)
sent 30 received 20
20
A Join Query
select from students, enrolled, courses where
students.name enrolled.name and
enrolled.course courses.course
Name Level Course Instructor
Jen Senior CS2 Smith
Name Level Course
Joe Junior CS1
Jen Senior CS2
Course Instructor
CS2 Smith
Courses
Name Course
Joe CS1
Jen CS2
Name Level
Joe Junior
Jen Senior
Students
Enrolled
21
Eddies AH00
Query execution using an eddy
A traditional query plan
A key difference Tuples cant be arbitrarily
routed to any operator E.g. S tuples cant be
routed to E Join C Use ready bits to identify
this
22
Eddies w/ Joins
  • Traditional join operators typically consume one
    relation entirely and then start reading the
    second relation
  • E.g. hash join operator builds a hash table on
    one relation first, and then reads in the other
    relation
  • This is problematic for eddies
  • An eddy needs to see tuples from different
    relations in order to make its routing decisions
  • Also, if the inner relations are pre-decided, not
    much options left for adapting the join order
  • Avnur, Hellerstein 00 discusses this issue in
    detail for traditional join operators

23
Symmetric Hash Join
  • We will use a new join operator called symmetric
    hash join operator
  • Also called doubly pipelined
  • Other variants include ripple joins, Xjoins
    (disk-based)
  • When a new S tuple arrives
  • It is built into S.name hashtable
  • Probed into E.name hash table to find matches
    with already arrived E tuples
  • Matches are immediately output
  • Symmetric Operation !!

S
E
24
Query Execution using Eddies
Probe to find matches
Insert with key hash(joe)
Joe Jr
Joe Junior
Joe Junior
Eddy
S
E
Output
C
No matches Eddy processes the next tuple
25
Query Execution using Eddies
Probe
Insert
Joe CS1
Joe Jr
Joe CS1
Jen Sr
Joe Jr CS1
Eddy
S
Joe CS1
E
Output
C
Joe Jr CS1
Joe Jr CS1
CS2 Smith
26
Query Execution using Eddies
Probe
Joe CS1
Joe Jr
Jen CS2 Smith
Jen Sr
Jen CS2 Smith
Jen Sr. CS2 Smith
Eddy
S
Jen Sr. CS2 Smith
Jen CS2
E
Output
C
Probe
Jen CS2
CS2 Smith
Joe Jr CS1
Jen CS2 Smith
Jen CS2
27
Per-tuple State
  • Here also we need to keep track of what operators
    a tuple has already been through
  • Again use
  • Ready bits - operators that can be applied next
  • Done bits - operators that have already been
    applied
  • Unlike selections, these are not bit-complements
    of each other

28
Per-tuple State
S Join E E Join C
Ready 1 0
Done 0 0
Joe Junior
Eddy
S
E
Output
C
29
Per-tuple State
S Join E E Join C
Ready 1 1
Done 0 0
Joe Jr
Jen Sr
Eddy
S
Joe CS1
E
Output
C
CS2 Smith
30
Per-tuple State
S Join E E Join C
Ready 0 1
Done 1 0
Joe CS1
Joe Jr
Jen Sr
Joe Jr CS1
Eddy
S
E
Output
C
CS2 Smith
31
Execution Postmortem
  • Can we talk about what exactly the eddy did
    during the execution ?
  • Yes !

32
Execution Postmortem
Output
Output
E C
E S
S E
Courses
C E
Students
Course Instructor
CS2 Smith
Name Level
Joe Junior
Jen Senior
Students
Enrolled
Courses
Enrolled
Name Level
Joe Junior
Jen Senior
Name Course
Joe CS1
Course Instructor
CS2 Smith
Name Course
Jen CS2
Eddy executes different query execution plans for
different parts of data
33
Execution Postmortem
  • Can we talk about what exactly the eddy did
    during the execution ?
  • Yes !
  • Eddy executes different plans for different parts
    of data
  • This is where the adaptivity comes from

34
Routing policy
  • Lottery scheduling unfortunately doesnt work
    well with joins
  • Just because a join operator does not return
    tuples right now doesnt mean it wont return
    more tuples later
  • In other words, a join operator is state-ful
  • Selection operators are state-less

35
Example Delayed Data Sources
SETUP
gtgt
Execution plan 1
Execution plan 2
SEC
SEC
CE
SE
S
C
C
E
S
E
Cost (Plan 1) gt Cost (Plan 2)
36
Example Delayed Data Sources
SETUP E and C arrive early S is
delayed
gtgt
S
E
C
time
37
SETUP E and C arrive early S is
delayed
gtgt
sent and received suggested (so far) that S Join
E is better option for E tuples
S0
S
E
S
S0
S S0
E
C
time
C
S0E
SE
(S S0)E
Eddy learns the correct sizes
Too Late !!
38
SETUP E and C arrive early S is
delayed
gtgt
S
E
C
C
SE
S
E
Execution Plan Used
Query is executed using the worse plan.
Too Late !!
39
Joins and Lottery Scheduling
  • Lottery scheduling doesnt work well with joins
  • Not clear how any routing policy can work without
    reasonable knowledge of future
  • Whatever the current state in the join operators,
    an adversary can send tuples to make it look very
    bad
  • Two possible solutions
  • Allow manipulation of state (STAIRs) DH04
  • Dont embed state in the operators (SteMs)
    RDH03

40
Roadmap
  • Adaptive Query Processing Motivation
  • Eddies AH00
  • STAIRs DH04 and SteMs RDH03
  • Experimental Study
  • Implementation in PostgreSQL Des03
  • Continuous queries MSHR02 (very briefly)
  • Open problems

41
STAIRs DH04
  • Expose join state to the eddy
  • Provide state management primitives
  • That guarantee correctness of execution
  • That can be used to manipulate embedded state in
    the operators
  • Also allow support for cyclic queries etc

42
New Operator STAIR
S E
HashTable S.Name
HashTable E.Name

Eddy
S
Output
E
C

HashTable E.Course
HashTable C.Course
E C
43
New Operator STAIR
Storage, Transformation and Access for
Intermediate Results
44
Query execution using STAIRS
  • Similar to using Join Operators

Probe into E.Name STAIR
s1
s1
s1
s1
45
STAIR Operations
  • Build (insert)
  • Insert the given tuple into the STAIR
  • Probe (lookup)
  • Find matching tuples for the given tuple
  • State Management Operations
  • Demotion
  • Promotion

46
State Management Primitive Demotion
  • Replace a tuple in a STAIR with a projection of
    that tuple

S.Name STAIR
HashTable

E.Name STAIR
s1
Demoting e2c1 to e2
HashTable

e1
e2c1
e2
Eddy
S
E
Output
C
HashTable

e2
s1e1
HashTable

c1
E.Course STAIR
Can be thought of as undoing work
C.Course STAIR
47
State Management Primitive Promotion
  • Replace a tuple in a STAIR with the result of
    joining it with other tuples

S.Name STAIR
  • Two arguments
  • A tuple
  • A join to be used to promote this
    tuple

HashTable

E.Name STAIR
s1
HashTable

e1
e1c1
e2c1
Eddy
S
E
Output
C
HashTable

e2
s1e1
HashTable

c1
e1
E.Course STAIR
Can be thought of as precomputation of work
C.Course STAIR
48
STAIRs Correctness
  • Theorem For any sequence of applications of the
    state management operations, STAIRs will produce
    the correct query output.
  • STAIRs will produce every result tuple
  • There will be no spurious duplicates

49
Lifting Burden of History Delayed Data Sources
50
SETUP E and C arrive early S is
delayed
gtgt
S0
S
E
S0
E
C
time
C
S0E
Eddy learns the correct selectivities
51
SETUP E and C arrive early S is
delayed
gtgt
S0
E.Name STAIR

HashTable
S
E
E
Eddy
S
C
E
Output
C
time
E

HashTable
C
Eddy decides to migrate E
Eddy learns the correct selectivities
By promoting E using E C
C.Course STAIR
52
SETUP E and C arrive early S is
delayed
gtgt
S.Name STAIR

HashTable
S
S0
E.Name STAIR

HashTable
S
S S0
S S0
E
Eddy
S
C
(S S0) E C
E
Output
C
time
E

HashTable
C
C.Course STAIR
53
S.Name STAIR

HashTable
S
E.Name STAIR

HashTable
UNION
Eddy
S
E
Output
C
E

HashTable
C
Most of the data is processed using the correct
plan
C.Course STAIR
54
Further Motivating Adaptive State Management
  • Eager pre-computation for faster response times
  • Query scrambling UFA98
  • Partial results RH02
  • Selective caching of intermediate results
  • Continuous queries over streams
  • Cyclic queries
  • Adapting the join spanning tree used

55
Making State Migration Decisions
  • Another policy question
  • Optimal migration decisions
  • Requires knowledge of future selectivities and
    the sizes of relations

56
Roadmap
  • Adaptive Query Processing Motivation
  • Eddies AH00
  • STAIRs DH04 and SteMs RDH03
  • Experimental Study
  • Implementation in PostgreSQL Des03
  • Continuous queries MSHR02 (very briefly)
  • Open problems

57
Alternative SteMs RDH03
  • Dont embed the state in the operators at all
  • Note Not the original motivation for SteMs
  • Focus was on increasing opportunities for
    adaptivity by breaking up the join operators
  • We will focus on a very simplistic version of the
    operator

58
Query Execution using SteMs
59
Query Execution using SteMs
S SteM
Probe

Joe Jr
Insert
Jen Sr
E SteM

Joe CS1
Jen CS2 Smith
Jen Sr. CS2 Smith
Jen CS2
Jen CS2
Eddy
S
Jen CS2
Jen CS2
E
C
C SteM
Jen Sr. CS2 Smith
Jen CS2

Probe
CS2 Smith
Jen CS2 Smith
60
Query Execution using SteMs
  • State inside the operators is independent of
    previous routing decisions
  • Because no intermediate tuples are ever stored
  • Doesnt have the same problem as the join or
    STAIR operators
  • Optimal routing policy easy to write down
  • Similarities to queries with only selections
  • But not storing intermediate results increases
    the computation cost significantly

61
SteMs Drawbacks
  • Recomputation of intermediate result tuples
  • Constrained plan choices
  • Available plans depend highly on the arrival
    order

62
SETUP E and C arrive early S is
delayed
gtgt
S0
S SteM

S
S0
E SteM
E

E
C
Eddy
S
E
C
C SteM
time

C
Under the mechanism, there is no way to execute
the other plan for this setup
63
SteMs Drawbacks
  • Recomputation of intermediate result tuples
  • Constrained plan choices
  • Available plans depend highly on the arrival
    order
  • Though more subtle, the second drawback might be
    the more important one

64
Recap
  • An eddy operator
  • Can affect the query execution plan(s) used by
    routing different tuples differently
  • Eddy w/ Selections
  • Well understood
  • Even if selections are correlated
  • Babu, Munagala et al SIGMOD 2004, ICDT 2005

65
Recap
  • Eddies for multi-way joins
  • Opportunities for adaptivity depend on the join
    operators used
  • Higher adaptivity tends to push logic into the
    eddy gt Routing policies very important

Sort-merge Hybrid-Hash
Index-nested loop joins
Nested-loop Joins
Pipelined/ Symmetric Hash Join
SteMs/ STAIRs
Blocking opeators Little adaptivity
Similarities to selections
Suffers from state accumulation problems
Policy issues not well-understood
See AH00
66
Roadmap
  • Adaptive Query Processing Motivation
  • Eddies AH00
  • STAIRs DH04 and SteMs RDH03
  • Experimental Study
  • Implementation in PostgreSQL Des03
  • Continuous queries MSHR02 (very briefly)
  • Open problems

67
Implementation Details
  • In PostgreSQL Database System code base
  • In the context of TelegraphCQ project
  • Highly efficient implementation SIGREC04
  • Eddy, SteMs, STAIRs export get_next() functions
  • Routing decisions are made per batch
  • Can control batch size
  • Routing decisions made for all possible ready
    bitmaps
  • Decisions are encoded in arrays that are indexed
    with ready bits
  • Efficiently find the operator to route to

68
Results - Overheads (1)
All plans have identical costs, so adaptivity
plays no role
69
Results - Overheads (2)
70
Policies used for experiments
  • Routing policy
  • Observe
  • Selectivities of predicates on base tables
  • Domain sizes of join attributes
  • Compute join selectivities and use them to route
    tuples
  • Migration policy
  • Tie state migration decisions to routing
    decisions
  • Follow the routing policy decisions to make sure
    that most tuples are routed correctly
  • Caveats
  • May end doing migrations late in the query
    execution
  • May thrash

71
State Migration Illustrative Example
select from customer c, orders o, lineitem l
where c.custkey o.custkey and o.orderkey
l.orderkey and c.nationkey 1 and c.acctbal
gt 9000 and l.shipdate gt date 1996-01-01
Setup lineitem arrives sorted on shipdate gt
selectivity(l.shipdate gt ) very low
initially gt orders routed to join with
lineitem (bad) No explicit delays introduced
72
Illustrative Example (1)
73
Illustrative Example (2)
74
Experiments Synthetic Workload
  • Modeled after the Wisconsin Benchmark
  • 20 Tables for varying sizes
  • Randomly generated queries
  • Environment
  • Rates proportional to table sizes no delays or
  • Random initial delays introduced or
  • Random data rates

75
Traditional vs STAIRs
76
SteMs vs STAIRs
77
Joins vs STAIRs
78
Roadmap
  • Adaptive Query Processing Motivation
  • Eddies AH00
  • STAIRs DH04 and SteMs RDH03
  • Experimental Study
  • Implementation in PostgreSQL Des03
  • Continuous queries MSHR02 (very briefly)
  • Open problems

79
Continous Query Processing
  • Eddies ideal for executing continuous queries
    over data streams
  • Dynamic runtime conditions make a static plan
    unsuitable
  • Queries typically executed over sliding windows
  • Find average over last one week
  • Note Continuous vs Multi-query processing
  • Not identical
  • Data streams literature does not make this
    difference explicit
  • Application environments tend to have a large
    number of simultaneous queries

80
Continuous Query Processing
  • CACQ Madden et al 2002
  • Focus on sharing work as much as adaptivity
  • Uses SteMs augmented with a deletion operator
  • To handle sliding windows
  • Also uses predicate indexes
  • For handling a large number of queries on the
    same set of streams but with different predicates
  • E.g. millions of stock alerts over a few streams

81
Roadmap
  • Adaptive Query Processing Motivation
  • Eddies AH00
  • STAIRs DH04 and SteMs RDH03
  • Experimental Study
  • Implementation in PostgreSQL Des03
  • Continuous queries MSHR02 (very briefly)
  • Open problems

82
Some open problems (1)
  • Eddies for continuous query processing
  • Much work since CACQ, but not a solved problem
  • E.g. computational inefficiency of SteMs
  • Many other proposed CQ architectures face the
    same problem
  • MJoins (NiagaraCQ)
  • Stanford STREAM processor (earlier version)
  • Later added intermediate result caches
  • Note These two dont use eddies explicitly
  • Routing policies for CQ still an open question
  • Different from routing policies for non-CQ queries

83
Some open problems (2)
  • Routing policies
  • Whether eddies will succeed depends on the
    routing policies
  • Little work so far...
  • SteMs, STAIRs
  • Theoretical analysis of optimization space, and
    practical viability analysis needed
  • Especially in the context of continuous query
    processing

84
Some open problems (3)
  • Eddies for multi-query processing (non-CQ)
  • SteMs may be sufficient for CQ processing, but
    not for normal multi-query processing
  • Parallel, distributed environments, P2P, Grid..
  • Disk
  • Flexibility demanded by adaptive techniques at
    odds against the careful scheduling typically
    done by DBMSs
  • XJoins
  • Very little work on understanding this

85
Some open problems (4)
  • Optimization with expanded plan space
  • Eddies can explore a plan space much larger than
    traditional plan space
  • They allow relations to be broken into pieces,
    with each piece executed separately
  • Can we explore this plan space in a non-adaptive
    setting ?
  • Recent work on
  • Conditional Planning Deshpande et al, ICDE 2005
  • Content-based Routing Babu et al, VLDB 2005

86
Summary
  • Increasing need for adaptivity
  • Eddy A highly adaptive query processor
  • Executes queries by routing tuples through
    operators
  • SteMs, STAIRs
  • New operators proposed to handle problems with
    traditional join operators
  • Very promising especially for continuous and
    wide-area query processing
  • Exciting research lies ahead

87
The End
  • Questions ?

88
Fatal Flaw Burden of Routing History
Routing decisions get embedded in the state
Joe CS1
Joe Jr
Jen Sr
Jen CS2 Smith
Eddy
S
E
Output
C
CS2 Smith
Joe Jr CS1
Jen CS2
Future adaptibility is severly constrained
89
Example Delayed Data Sources
SETUP
gtgt
Execution plan 1
Execution plan 2
SEC
SEC
CE
SE
S
C
C
E
S
E
Cost (Plan 1) gt Cost (Plan 2)
90
Example Delayed Data Sources
SETUP E and C arrive early S is
delayed
gtgt
S
E
C
time
A plan may have to be chosen without any
statistical information about the data
Earliest time sufficient information may be
available to choose optimal plan
91
Tricky State Configurations 1
  • Want to undo the decision to route E1 to S E

E1
S0
E2C
Result S0EC already produced
C
S0E1
E2
92
Tricky State Configurations 2
S E
HashTable E.Name

HashTable S.Name
S
E1
E2C1
E2C2I

HashTable E.Course
HashTable C.Course
Eddy
S
C1
SE1
E
C2I
E2
C
I
E C

HashTable C.Intstructor
HashTable I.Instructor
I
C2
SE1C1
SE2C1
C I
Write a Comment
User Comments (0)
About PowerShow.com