Title: Using State Modules for Adaptive Query Processing
1Using State Modules for Adaptive Query Processing
- Vijayshankar Raman
- IBM Almaden Research Center
- Amol Deshpande
- Joseph M. Hellerstein
- University of California,
Berkeley
2- All the material is taken directly or adapted from the paper "Using State Modules for Adaptive Query Processing" by Vijayshankar Raman, Amol Deshpande, and Joseph M. Hellerstein
3Contents
- Background
- Overview
- Framework
- Variations in Adaptations
- Illustrated Examples and Experiments
- Conclusion
4Background
- Uncertainties in query execution
- Cardinality estimates are highly imprecise
- Demands on memory, system load and network bandwidth are typically unknown at runtime
- Data distribution and rates often cannot be known in advance
- User preferences in interactive systems change over time
- Adaptive execution is necessary in stream systems
5Background
- Federated Facts and Figures (FFF) query system to combine data from diverse and distributed data sources
- Volatility of distributed data sources
- Volatility of user interests during online query processing
6What do you mean by adaptability?
- No static execution plan: the plan changes dynamically as the environment changes, while still guaranteeing that the result is correct
- Adaptability requires flexibility, such as
- Choices of AM and Join Algorithms
- Ordering of operators
- Choices of query spanning tree
7How to achieve flexibility?
- Proposed Solutions
- Refine the granularity of query models
- Break down large operators, exposing their internals to the control of the optimizer
- Separate and encapsulate the state data structure from the join
- Optimizer has more decisions to make
- Consequence
- Optimizer gains more flexibility and freedom at
the expense of assuming more responsibilities
8Overview
- Adaptive Execution of SPJ
- Routing constraints needed for on-the-fly adaptation
- Focus on the adaptive processing of joins
- Introduce the framework of dynamic routing
- Keep adding flexibility in execution by gradually revising and relaxing the routing constraints
- Support other join algorithms
- Support multiple access methods
- Support cyclic query
- Non-symmetric treatment of input relations
9Join Operator
- Logical construct, black box
- Typically involve multiple physical operations
- Q1 Which physical operations are involved?
10Different Levels of Adaptation
- Q2 What is the advantage of (b) compared to (a)?
- Q3 What is the advantage of (c) compared to (b)?
11Comparison Discussion
- Both (a) and (b) make use of only the index access method on T and pre-chosen implementations for the RS and ST joins
- (c) allows all access methods (tuples from AMs are routed to SteMs, rather than joins) and allows a variety of routing decisions that permit different join algorithms and join orders
- Q4 Decomposing the join operator brings about adaptation. Why?
12- Does the routing framework work?
- (Appendix) Showed that all SPJ queries can be executed by routing tuples carefully between AMs, SteMs and selections
- Caution
- Arbitrary routing results in
- Duplicate results
- Missing results
- Infinite loops
- Solution
- Flexibility comes at the price of routing constraints
- Proposed routing constraints
13Framework Components (overview)
- Four kinds of modules
- Selection modules: Query predicate
- Access modules: Access method over data source
- State modules: Encapsulate data structure in traditional join algorithms
- Eddy modules: Route tuples between the other modules
- Each module runs asynchronously
14Functionality of Main Modules
15Query Planning
- Check that the query is valid
- Create an AM on each access method
- Create a SM on each predicate
- Create a SteM on each base table
- Create any seed tuples needed for scans
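The planning steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' planner: the `Query` class, the `plan` function, the tuple-based module descriptors, and the validity check (every table has at least one access method) are all hypothetical names and choices made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Query:
    """Hypothetical query description used only for this sketch."""
    tables: list            # base table names, e.g. ["R", "S"]
    predicates: list        # predicate strings, e.g. ["R.a > 5"]
    access_methods: dict    # table name -> list of AM kinds ("scan" or "index")


def plan(query):
    """Instantiate one module per access method, predicate, and base table."""
    # Step 1: validity check (assumed here: every table has at least one AM).
    for t in query.tables:
        if not query.access_methods.get(t):
            raise ValueError(f"no access method for table {t}")
    modules = []
    # Step 2: one AM module per access method.
    for t in query.tables:
        for kind in query.access_methods[t]:
            modules.append(("AM", t, kind))
    # Step 3: one selection module (SM) per predicate.
    for p in query.predicates:
        modules.append(("SM", p))
    # Step 4: one SteM per base table.
    for t in query.tables:
        modules.append(("SteM", t))
    # Step 5: one seed tuple per scan AM, used to start the scan.
    seeds = [("seed", t) for t in query.tables
             if "scan" in query.access_methods[t]]
    return modules, seeds
```

No join modules are created at all: joins emerge at run time from how the Eddy routes tuples between the AMs and SteMs.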
16Example of N-way Symmetric Hash Join
- Demonstrate how to implement n-way symmetric join with SteMs
- Q5 Comparing (ii) with (i), which one will you choose?
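A rough Python sketch of an n-way symmetric hash join built from SteMs may help here. It assumes, for simplicity, that all tables join on one shared key and that the Eddy routing is a fixed loop (build into the tuple's own SteM, then probe every other SteM); the names `SteM`, `n_way_shj`, and the dict-based row format are illustrative assumptions, not the paper's interfaces.

```python
from collections import defaultdict


class SteM:
    """State module for one table: a hash index on the join key (assumed layout)."""

    def __init__(self, key):
        self.key = key
        self.index = defaultdict(list)

    def build(self, row):
        # Insert the row so that later arrivals from other tables can find it.
        self.index[row[self.key]].append(row)

    def probe(self, value):
        # .get avoids creating an empty bucket on a miss.
        return list(self.index.get(value, []))


def n_way_shj(tables, arrivals, key):
    """Join `tables` on `key`; `arrivals` is any interleaving of (table, row)."""
    stems = {t: SteM(key) for t in tables}
    results = []
    for table, row in arrivals:
        stems[table].build(row)              # build into the tuple's own SteM first
        partials = [{table: row}]
        for other in tables:                 # then probe the SteMs of all other tables
            if other == table:
                continue
            matches = stems[other].probe(row[key])
            partials = [{**p, other: m} for p in partials for m in matches]
            if not partials:
                break                        # no match: this tuple yields nothing yet
        results.extend(partials)
    return results
```

Because each tuple is built before it probes, every result is produced exactly once, at the moment the last of its component tuples arrives. A real Eddy would choose the probe order per tuple instead of using the fixed order of `tables`.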
17Executing Arbitrary SPJ Queries with SteMs
- 1. Acyclic SPJ queries with a single scan AM on each table
- Example: n-ary SHJ
- Required Rules
- SteMs implemented with hash indices.
- Eddy obeys Routing Constraints
- BuildFirst: A singleton tuple from table T must first be routed to build into SteMT
- SteM BounceBack: SteMs bounce back all build tuples and no probe tuples
- Atomicity: Build and probe are coupled
- BoundedRepetition: No tuple is routed to the same module more than once.
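As an illustration of what two of these constraints demand, the following sketch checks the route taken by a single singleton tuple against BuildFirst and BoundedRepetition. The `check_route` function, the `SteM_<table>` naming scheme, and the route encoding as `(module, operation)` pairs are hypothetical conventions chosen for this example; the slides do not prescribe any such checker.

```python
def check_route(source_table, route):
    """Return the names of the constraints violated by one tuple's route.

    `route` is the ordered list of (module_name, operation) pairs the tuple
    visited after leaving its access method (assumed encoding).
    """
    violations = []
    # BuildFirst: the first stop must be a build into the tuple's own SteM.
    if not route or route[0] != (f"SteM_{source_table}", "build"):
        violations.append("BuildFirst")
    # BoundedRepetition: no module may appear twice on the route.
    seen = set()
    for module, _ in route:
        if module in seen:
            violations.append("BoundedRepetition")
            break
        seen.add(module)
    return violations
```

For example, `check_route("T", [("SteM_T", "build"), ("SteM_S", "probe")])` returns an empty list, while a route that probes `SteM_S` before building into `SteM_T` is flagged for BuildFirst.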
18Relax constraints to allow other Join Algorithms
- SteMs NEED NOT be implemented with hash indices.
- Build and Probe operations decoupled
- Potential problems?
19- 2. Competitive AMs
- Example: Queries with more than one AM.
- Goal: Run multiple AMs per source and let the Eddy dynamically choose one AM or switch between AMs
- Duplication problem?
- Required Rules
- SteM BounceBack: A SteMS must bounce back a build tuple s unless it is a duplicate of another s that is already in SteMS.
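The build-side BounceBack rule can be sketched as a SteM that absorbs duplicates, so that two competing AMs on the same source never inject the same tuple into the dataflow twice. The class name `DedupSteM` and the convention that `build` returns `True` when the tuple is bounced back to the Eddy are assumptions made for this sketch; rows are plain hashable tuples.

```python
class DedupSteM:
    """SteM that bounces back a build tuple only the first time it is seen."""

    def __init__(self):
        self.rows = set()

    def build(self, row):
        """Return True if `row` is bounced back, False if absorbed as a duplicate."""
        if row in self.rows:
            return False          # already delivered by a competing AM: drop it
        self.rows.add(row)
        return True               # first copy: store it and bounce it back
```

With a scan AM delivering `(1, "a")` and `(2, "b")` and a competing AM delivering `(2, "b")` and `(3, "c")`, only three tuples are bounced back, so the overlap costs one set lookup rather than redundant join work downstream.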
20- 3. Index AMs
- When a data source has an index AM.
- Potential problem?
- Required Rules
- SteM BounceBack
- A SteMS must bounce back a build tuple s unless it is a duplicate of another s that is already in SteMS.
- A SteMS must bounce back a probe tuple r unless S has a scan AM, or SteMS already contains all matches for r.
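The probe-side rule can be sketched as follows. A probe returns whatever matches the SteM currently holds, plus a flag saying whether the probe tuple must be bounced back so the Eddy can route it on to an index AM on S. How a SteM learns that it holds all matches for a key is not spelled out on this slide; the `mark_complete` bookkeeping below is an assumption of this sketch, as are the class and method names.

```python
class IndexAwareSteM:
    """SteM on table S that applies the probe BounceBack rule (a sketch)."""

    def __init__(self, has_scan_am):
        self.has_scan_am = has_scan_am
        self.index = {}               # join key -> list of stored S rows
        self.complete_keys = set()    # keys whose matches are all present (assumed)

    def build(self, key, row):
        self.index.setdefault(key, []).append(row)

    def mark_complete(self, key):
        # Assumed hook: called once the index AM has returned every match for `key`.
        self.complete_keys.add(key)

    def probe(self, key):
        """Return (matches, bounce_back) for a probe tuple with join value `key`."""
        matches = list(self.index.get(key, []))
        # Bounce back unless S has a scan AM or all matches are already cached.
        bounce_back = not (self.has_scan_am or key in self.complete_keys)
        return matches, bounce_back
```

When S has a scan AM, every S tuple eventually reaches the SteM on its own, so probe tuples need not be bounced back. With only an index AM, the bounced-back probe tuple is what triggers the index lookup that brings in the remaining matches.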
21- More Adaptation
- 4. Cyclic Queries
- Static spanning tree choices hurt in two ways
- The spanning tree choice is typically made based on selectivities
- A static spanning tree choice can also constrain the generation of partial query results
- Required Rules
- ProbeCompletion Constraint: A tuple t that has been bounced back after probing into a SteMS must not probe into any other SteM afterwards. The routing policy must however maintain t in the dataflow, routing it to other modules, until it has been probed into an AM on S.
- Prior Probers and Probe Completion Table
- 5. Relaxing the BuildFirst Constraint
- What if one of the input tables is much larger than the others?
22Summary of Constraints
23Conclusion
- The salient points of our experimental study are as follows
- Even a simple join algorithm like the index join encapsulates multiple physical operations, and this causes a head-of-line blocking problem. This problem can be avoided by breaking the join module into SteMs.
- SteMs allow the Eddy to efficiently learn between competitive access methods, while doing almost no redundant work.
- SteMs allow the Eddy to dynamically choose the join spanning tree for cyclic queries.
- SteMs allow the Eddy to dynamically switch between an index join algorithm and a symmetric hash join algorithm during query execution.
- With SteMs, the Eddy can adaptively choose the way it reorders tuples in interactive environments.
- Thank you