Title: Adaptive Query Processing in the Looking Glass
1Adaptive Query Processing in the Looking Glass
- Shivnath Babu (Stanford Univ.)
- Pedro Bizarro (Univ. of Wisconsin, Madison)
2Adaptive Query Processing (AQP)
SystemsPublication Timeline
STREAM
Tukwila
POP
Re-Opt
River
Eddies
Query Scrambling
CAPE
NiagaraCQ
Parametric opt.
1976
1977
1989
1990
1991
1992
1993
1994
1995
1996
1997
1998
1999
2000
2001
2002
2003
2004
Pipeline sch.
Ingres
DEC-Rdb
DQE
Corrective processing
RedBrick
Conquest
Expected cost opt.
Memory adap.
Introduction
3Motivation
- Plenty of recent work on Adaptive Query
Processing (AQP) in different contexts - Conventional DBMS query processing, data
integration, continuous queries in stream systems - No exhaustive, in-depth categorization and
comparison of AQP systems to date - Difficult to answer questions like
- Will techniques from one system work on another?
- What are the shortcomings of each system?
- Which system is best for a new application
domain?
Introduction
4Our Contributions
- Detailed study of current AQP systems
- Classification of AQP systems into 3 families
- Comparison across families in terms of AQP tasks
- Identification of shortcomings new approaches
to address them
Introduction
5Roadmap
- Introduction to AQP
- The three AQP system families
- Comparison across families in terms of AQP tasks
- Summary of what we learned
6Primer on Traditional Query Processing
Query
Optimizer Chooses best plan
Introduction
7Need for Adaptive Query Processing
Detect plan suboptimality, re-optimize
Errors in stats estimates, optimizer mistakes
Correlated skewed data distributions
Stats system conditions may change while query
is running
Monitor for changes, re-optimize
Continuous queries, long-running queries
AQP is integral to the current CS-wide push
towards autonomic computing
Introduction
8Our Focus AQP for a Single Query
- AQP System
- A system that interleaves the optimization and
execution aspects of query processing, possibly
multiple times, during the processing of a single
query
Introduction
9Roadmap
- Introduction to AQP
- The three AQP system families
- Comparison across families in terms of AQP tasks
- Summary of what we learned
10AQP System Families
- Plan-based AQP systems
- AQP for traditional plan-based DBMSs
- Continuous-Query-based (CQ-based) AQP systems
- AQP for long-running continuous queries over data
streams - Routing-based AQP systems
- AQP for DBMSs and continuous queries based on
adaptive tuple routing
AQP Families
11AQP in Plan-based Systems
Query
Optimizer Chooses best plan
Chosen plan
Executor Runs chosen plan
Runstats
Statistics Tracker Creates/updates stats
AQP Families
12AQP in Plan-based Systems
Query
Optimizer Chooses best plan
Chosen plan
Executor Runs chosen plan
Runstats
Statistics Tracker Creates/updates stats
AQP Families
13Example Plan-based AQP Systems
STREAM
Tukwila
POP
Re-Opt
River
Eddies
Query Scrambling
CAPE
NiagaraCQ
Parametric opt.
1976
1977
1989
1990
1991
1992
1993
1994
1995
1996
1997
1998
1999
2000
2001
2002
2003
2004
Pipeline sch.
Ingres
DEC-Rdb
DQE
Corrective processing
RedBrick
Conquest
Expected cost opt.
Memory adap.
AQP Families
14Primer on Continuous Query Processing
Chosen packets
- Continuous Queries (CQs) are long-running queries
usually over data streams - Example CQ Filtering packet streams
- Stream properties or system conditions may change
while query is running ? best plan may change
s3
s2
s1
Packets
AQP Families
15AQP in CQ-based Systems
Query
Optimizer Chooses best plan
Statistics Tracker Creates/updates stats
Runstats
AQP Families
16AQP in CQ-based Systems
Continuous Query
Optimizer Chooses best plan
Catalog (stream rates, data distr.)
Statistics Tracker Monitors stream stats and
system conditions
AQP Families
17AQP in CQ-based Systems
Continuous Query
Optimizer Ensures that plan is best for current
stats
Uses stats to cost plans
Catalog (stream rates, data distr.)
Statistics Tracker Monitors stream stats and
system conditions
AQP Families
18AQP in CQ-based Systems
Continuous Query
Optimizer Ensures that plan is best for current
stats
Uses stats to cost plans
Re-optimize
Catalog (stream rates, data distr.)
Stats to track
Statistics Tracker Monitors stream stats and
system conditions
AQP Families
19Example CQ-based AQP Systems
STREAM
Tukwila
POP
Re-Opt
River
Eddies
Query Scrambling
CAPE
NiagaraCQ
Parametric opt.
1976
1977
1989
1990
1991
1992
1993
1994
1995
1996
1997
1998
1999
2000
2001
2002
2003
2004
Pipeline sch.
Ingres
DEC-Rdb
DQE
Corrective processing
RedBrick
Conquest
Expected cost opt.
Memory adap.
AQP Families
20Primer on Routing-based Processing
- Non-plan-based architecture where tuples are
routed individually through operators - No optimizer
- Exemplified by Eddies AH00
AQP Families
21AQP in Routing-based Systems
Query
Optimizer Chooses best plan
AQP Families
22AQP in Routing-based Systems
Query or Continuous Query
Tuple Router Integrated Optimizer Stats Tracker
AQP Families
23Example Routing-based AQP Systems
STREAM
Tukwila
POP
Re-Opt
River
Query Scrambling
Eddies
CAPE
NiagaraCQ
Parametric opt.
1976
1977
1989
1990
1991
1992
1993
1994
1995
1996
1997
1998
1999
2000
2001
2002
2003
2004
Pipeline sch.
Ingres
DEC-Rdb
DQE
Corrective processing
RedBrick
Conquest
Expected cost opt.
Memory adap.
AQP Families
24Roadmap
- Introduction to AQP
- The three AQP system families
- Comparison across families in terms of AQP tasks
- Summary of what we learned
25Comparison Across AQP System Families
- Goal To bring out AQP algorithms and features,
not performance numbers
- Models, assumptions, and approach
- Techniques for tracking statistics
- Re-optimization subtasks
- When and how to re-optimize
- Switching between plans
- Pros cons of using a conventional optimizer
- Performance issues
- Quality of re-optimization
- Run-time overhead thrashing
- Scalability
Comparison
26Comparison Across AQP System Families
- Goal To bring out AQP algorithms and features,
not performance numbers
- Models, assumptions, and approach
- Techniques for tracking statistics
- Re-optimization subtasks
- When and how to re-optimize
- Switching between plans
- Pros cons of using a conventional optimizer
- Performance issues
- Quality of re-optimization
- Run-time overhead thrashing
- Scalability
Comparison
27Techniques for Tracking Statistics
- Observation
- Mostly in Plan-based systems
- Competition
- Mostly in Plan-based systems
- Profiling
- Mostly in CQ-based systems
- Exploration
- In Routing-based systems
Comparison
28Tracking Statistics Observation KD98
- Collect statistics on operator behavior or
intermediate subexpressions in a plan
Comparison
29Tracking Statistics Competition A93
- Extra processing to collect statistics
Comparison
30Tracking Statistics Profiling BMM04
- Extra processing on a fraction of the input
tuples (e.g., a random sample) to collect
statistics - Builds a statistical profile that can be used
to estimate many individual statistics
s3
s2
s1
Profiled tuples
Comparison
31Tracking Statistics Exploration AH00
- A fraction of tuples are routed along routes
different from the current best route to track
statistics along those routes - No redundant processing
Chosen packets
s2
s3
s1
Tuple Router
Packets
Comparison
32Comparing Statistics-Tracking Techniques Extra
Overhead Introduced
- Exploration (inefficient routes for some tuples)
Increasing overhead
- Profiling (extra processing on some tuples)
- Competition (lots of extra work)
Comparison
33Comparing Statistics-Tracking Techniques
Coverage of Different Statistics
- Observation Competition (limited by plan)
Increasing coverage
- Exploration (limited by large number of routes)
- Profiling (highest since it builds statistics
profile)
Comparison
34Comparing Statistics-Tracking Techniques
Accuracy of Estimation
Increasing accuracy
- Exploration (but, susceptible to routing bias)
- Profiling (depends on sampling fraction)
Comparison
35Roadmap
- Introduction to AQP
- The three AQP system families
- Comparison across families in terms of AQP tasks
- Summary of what we learned
36What have we learned? (1)
- Many similarities in internals of different AQP
families - Can re-use many current (and new) AQP techniques
across families - Ex Profiling from CQ-based systems
- Enables, e.g., faster detection of plan
suboptimality in Plan-based systems - Generates more accurate statistics at lower cost
in Routing-based systems
New Ideas
37What have we learned? (2)
- Current AQP systems are reactive
- E.g., do not consider sensitivity to
errors/changes in stats
Proactive Re-optimization
New Ideas
38What have we learned? (3)
- Challenging meta problems in AQP for continuous
queries need to be addressed - Larger and more complex plan spaces ? higher
costs for statistics tracking and re-optimization - Tracking Return-of-Investment on AQP
- Avoiding thrashing, e.g., on bursty changes in
statistics
Proposal Plan Logging for Continuous Queries
New Ideas
39Plan Logging for Continuous Queries
- Log the statistics and re-optimization history
- Query is long-running
- Example view over log for R S T
?
?
Rate(R) s(R,S) Plan Cost
1024 0.75 P1 12762
5642 0.72 P2 72332
934 0.76 P1 12003
time
Plans lying in a high-dimensional space of
statistics
New Ideas
40Summary
- AQP is becoming important
- New data and application trends
- CS-wide push towards Autonomic Computing
- Significant amount of work on AQP in recent years
- Our contributions
- In-depth categorization and comparison of AQP
systems and techniques - Identified current shortcomings and new
approaches to AQP
Conclusions