Adaptive Query Processing in the Looking Glass - PowerPoint PPT Presentation

1 / 40
About This Presentation
Title:

Adaptive Query Processing in the Looking Glass

Description:

... Challenging meta problems in AQP for continuous queries need to be addressed Larger and more complex plan ... queries need to ... Correlated & skewed data ... – PowerPoint PPT presentation

Number of Views:192
Avg rating:3.0/5.0
Slides: 41
Provided by: Compute46
Category:

less

Transcript and Presenter's Notes

Title: Adaptive Query Processing in the Looking Glass


1
Adaptive Query Processing in the Looking Glass
  • Shivnath Babu (Stanford Univ.)
  • Pedro Bizarro (Univ. of Wisconsin, Madison)

2
Adaptive Query Processing (AQP)
SystemsPublication Timeline
STREAM
Tukwila
POP
Re-Opt
River
Eddies
Query Scrambling
CAPE
NiagaraCQ
Parametric opt.

1976
1977
1989
1990
1991
1992
1993
1994
1995
1996
1997
1998
1999
2000
2001
2002
2003
2004
Pipeline sch.
Ingres
DEC-Rdb
DQE
Corrective processing
RedBrick
Conquest
Expected cost opt.
Memory adap.
Introduction
3
Motivation
  • Plenty of recent work on Adaptive Query
    Processing (AQP) in different contexts
  • Conventional DBMS query processing, data
    integration, continuous queries in stream systems
  • No exhaustive, in-depth categorization and
    comparison of AQP systems to date
  • Difficult to answer questions like
  • Will techniques from one system work on another?
  • What are the shortcomings of each system?
  • Which system is best for a new application
    domain?

Introduction
4
Our Contributions
  • Detailed study of current AQP systems
  • Classification of AQP systems into 3 families
  • Comparison across families in terms of AQP tasks
  • Identification of shortcomings new approaches
    to address them

Introduction
5
Roadmap
  • Introduction to AQP
  • The three AQP system families
  • Comparison across families in terms of AQP tasks
  • Summary of what we learned

6
Primer on Traditional Query Processing
Query
Optimizer Chooses best plan
Introduction
7
Need for Adaptive Query Processing
Detect plan suboptimality, re-optimize
Errors in stats estimates, optimizer mistakes
Correlated skewed data distributions
Stats system conditions may change while query
is running
Monitor for changes, re-optimize
Continuous queries, long-running queries
AQP is integral to the current CS-wide push
towards autonomic computing
Introduction
8
Our Focus AQP for a Single Query
  • AQP System
  • A system that interleaves the optimization and
    execution aspects of query processing, possibly
    multiple times, during the processing of a single
    query

Introduction
9
Roadmap
  • Introduction to AQP
  • The three AQP system families
  • Comparison across families in terms of AQP tasks
  • Summary of what we learned

10
AQP System Families
  • Plan-based AQP systems
  • AQP for traditional plan-based DBMSs
  • Continuous-Query-based (CQ-based) AQP systems
  • AQP for long-running continuous queries over data
    streams
  • Routing-based AQP systems
  • AQP for DBMSs and continuous queries based on
    adaptive tuple routing

AQP Families
11
AQP in Plan-based Systems
Query
Optimizer Chooses best plan
Chosen plan
Executor Runs chosen plan
Runstats
Statistics Tracker Creates/updates stats
AQP Families
12
AQP in Plan-based Systems
Query
Optimizer Chooses best plan
Chosen plan
Executor Runs chosen plan
Runstats
Statistics Tracker Creates/updates stats
AQP Families
13
Example Plan-based AQP Systems
STREAM
Tukwila
POP
Re-Opt
River
Eddies
Query Scrambling
CAPE
NiagaraCQ
Parametric opt.

1976
1977
1989
1990
1991
1992
1993
1994
1995
1996
1997
1998
1999
2000
2001
2002
2003
2004
Pipeline sch.
Ingres
DEC-Rdb
DQE
Corrective processing
RedBrick
Conquest
Expected cost opt.
Memory adap.
AQP Families
14
Primer on Continuous Query Processing
Chosen packets
  • Continuous Queries (CQs) are long-running queries
    usually over data streams
  • Example CQ Filtering packet streams
  • Stream properties or system conditions may change
    while query is running ? best plan may change

s3
s2
s1
Packets
AQP Families
15
AQP in CQ-based Systems
Query
Optimizer Chooses best plan
Statistics Tracker Creates/updates stats
Runstats
AQP Families
16
AQP in CQ-based Systems
Continuous Query
Optimizer Chooses best plan
Catalog (stream rates, data distr.)
Statistics Tracker Monitors stream stats and
system conditions
AQP Families
17
AQP in CQ-based Systems
Continuous Query
Optimizer Ensures that plan is best for current
stats
Uses stats to cost plans
Catalog (stream rates, data distr.)
Statistics Tracker Monitors stream stats and
system conditions
AQP Families
18
AQP in CQ-based Systems
Continuous Query
Optimizer Ensures that plan is best for current
stats
Uses stats to cost plans
Re-optimize
Catalog (stream rates, data distr.)
Stats to track
Statistics Tracker Monitors stream stats and
system conditions
AQP Families
19
Example CQ-based AQP Systems
STREAM
Tukwila
POP
Re-Opt
River
Eddies
Query Scrambling
CAPE
NiagaraCQ
Parametric opt.

1976
1977
1989
1990
1991
1992
1993
1994
1995
1996
1997
1998
1999
2000
2001
2002
2003
2004
Pipeline sch.
Ingres
DEC-Rdb
DQE
Corrective processing
RedBrick
Conquest
Expected cost opt.
Memory adap.
AQP Families
20
Primer on Routing-based Processing
  • Non-plan-based architecture where tuples are
    routed individually through operators
  • No optimizer
  • Exemplified by Eddies AH00

AQP Families
21
AQP in Routing-based Systems
Query
Optimizer Chooses best plan
AQP Families
22
AQP in Routing-based Systems
Query or Continuous Query
Tuple Router Integrated Optimizer Stats Tracker
AQP Families
23
Example Routing-based AQP Systems
STREAM
Tukwila
POP
Re-Opt
River
Query Scrambling
Eddies
CAPE
NiagaraCQ
Parametric opt.

1976
1977
1989
1990
1991
1992
1993
1994
1995
1996
1997
1998
1999
2000
2001
2002
2003
2004
Pipeline sch.
Ingres
DEC-Rdb
DQE
Corrective processing
RedBrick
Conquest
Expected cost opt.
Memory adap.
AQP Families
24
Roadmap
  • Introduction to AQP
  • The three AQP system families
  • Comparison across families in terms of AQP tasks
  • Summary of what we learned

25
Comparison Across AQP System Families
  • Goal To bring out AQP algorithms and features,
    not performance numbers
  • Models, assumptions, and approach
  • Techniques for tracking statistics
  • Re-optimization subtasks
  • When and how to re-optimize
  • Switching between plans
  • Pros cons of using a conventional optimizer
  • Performance issues
  • Quality of re-optimization
  • Run-time overhead thrashing
  • Scalability

Comparison
26
Comparison Across AQP System Families
  • Goal To bring out AQP algorithms and features,
    not performance numbers
  • Models, assumptions, and approach
  • Techniques for tracking statistics
  • Re-optimization subtasks
  • When and how to re-optimize
  • Switching between plans
  • Pros cons of using a conventional optimizer
  • Performance issues
  • Quality of re-optimization
  • Run-time overhead thrashing
  • Scalability

Comparison
27
Techniques for Tracking Statistics
  • Observation
  • Mostly in Plan-based systems
  • Competition
  • Mostly in Plan-based systems
  • Profiling
  • Mostly in CQ-based systems
  • Exploration
  • In Routing-based systems

Comparison
28
Tracking Statistics Observation KD98
  • Collect statistics on operator behavior or
    intermediate subexpressions in a plan

Comparison
29
Tracking Statistics Competition A93
  • Extra processing to collect statistics

Comparison
30
Tracking Statistics Profiling BMM04
  • Extra processing on a fraction of the input
    tuples (e.g., a random sample) to collect
    statistics
  • Builds a statistical profile that can be used
    to estimate many individual statistics

s3
s2
s1
Profiled tuples
Comparison
31
Tracking Statistics Exploration AH00
  • A fraction of tuples are routed along routes
    different from the current best route to track
    statistics along those routes
  • No redundant processing

Chosen packets
s2
s3
s1
Tuple Router
Packets
Comparison
32
Comparing Statistics-Tracking Techniques Extra
Overhead Introduced
  • Observation
  • Exploration (inefficient routes for some tuples)

Increasing overhead
  • Profiling (extra processing on some tuples)
  • Competition (lots of extra work)

Comparison
33
Comparing Statistics-Tracking Techniques
Coverage of Different Statistics
  • Observation Competition (limited by plan)

Increasing coverage
  • Exploration (limited by large number of routes)
  • Profiling (highest since it builds statistics
    profile)

Comparison
34
Comparing Statistics-Tracking Techniques
Accuracy of Estimation
Increasing accuracy
  • Exploration (but, susceptible to routing bias)
  • Profiling (depends on sampling fraction)
  • Observation Competition

Comparison
35
Roadmap
  • Introduction to AQP
  • The three AQP system families
  • Comparison across families in terms of AQP tasks
  • Summary of what we learned

36
What have we learned? (1)
  • Many similarities in internals of different AQP
    families
  • Can re-use many current (and new) AQP techniques
    across families
  • Ex Profiling from CQ-based systems
  • Enables, e.g., faster detection of plan
    suboptimality in Plan-based systems
  • Generates more accurate statistics at lower cost
    in Routing-based systems

New Ideas
37
What have we learned? (2)
  • Current AQP systems are reactive
  • E.g., do not consider sensitivity to
    errors/changes in stats

Proactive Re-optimization
New Ideas
38
What have we learned? (3)
  • Challenging meta problems in AQP for continuous
    queries need to be addressed
  • Larger and more complex plan spaces ? higher
    costs for statistics tracking and re-optimization
  • Tracking Return-of-Investment on AQP
  • Avoiding thrashing, e.g., on bursty changes in
    statistics

Proposal Plan Logging for Continuous Queries
New Ideas
39
Plan Logging for Continuous Queries
  • Log the statistics and re-optimization history
  • Query is long-running
  • Example view over log for R S T

?
?
Rate(R) s(R,S) Plan Cost
1024 0.75 P1 12762
5642 0.72 P2 72332
934 0.76 P1 12003
time
Plans lying in a high-dimensional space of
statistics
New Ideas
40
Summary
  • AQP is becoming important
  • New data and application trends
  • CS-wide push towards Autonomic Computing
  • Significant amount of work on AQP in recent years
  • Our contributions
  • In-depth categorization and comparison of AQP
    systems and techniques
  • Identified current shortcomings and new
    approaches to AQP

Conclusions
Write a Comment
User Comments (0)
About PowerShow.com