Provided by: webC

Learn more at: http://web.cs.wpi.edu

Using State Modules for Adaptive Query Processing

Vijayshankar Raman
IBM Almaden Research Center
Amol Deshpande
Joseph M. Hellerstein
University of California,
Berkeley

All the material is taken directly or adapted from the paper "Using State Modules for Adaptive Query Processing" by Vijayshankar Raman, Amol Deshpande, and Joseph M. Hellerstein

3
Contents

Background
Overview
Framework
Variations in Adaptations
Illustrated Examples and Experiments
Conclusion

4
Background

Uncertainties in query execution
Cardinality estimates are highly imprecise
Demands on memory, system load, and network bandwidth are typically unknown at runtime
Data distributions and rates often cannot be known in advance
User preferences in interactive systems change over time
Adaptive execution is a necessity in stream systems

5
Background

The Federated Facts and Figures (FFF) query system combines data from diverse and distributed data sources
Volatility of distributed data sources
Volatility of user interests during online query processing

6
What do you mean by adaptability?

No static execution plan: the plan changes dynamically as the environment changes, while still guaranteeing that the result is correct
Adaptability requires flexibility, such as
Choice of access methods (AMs) and join algorithms
Ordering of operators
Choice of query spanning tree

7
How to achieve flexibility?

Proposed Solutions
Refine the granularity of query models
Break down large operators, exposing their internals to the control of the optimizer
Separate the state data structures from the join and encapsulate them
The optimizer has more decisions to make
Consequence
The optimizer gains more flexibility and freedom at the expense of assuming more responsibilities

8
Overview

Adaptive execution of SPJ queries
Routing constraints are needed for on-the-fly adaptation
Focus on the adaptive processing of joins
Introduce the framework of dynamic routing
Keep adding flexibility in execution by gradually revising and relaxing the routing constraints
Support other join algorithms
Support multiple access methods
Support cyclic queries
Non-symmetric treatment of input relations

9
Join Operator

Logical construct, black box
Typically involves multiple physical operations
Q1: Which physical operations are involved?

10
Different Levels of Adaptation

Join of three tables

Q2: What is the advantage of (b) compared to (a)?
Q3: What is the advantage of (c) compared to (b)?

11
Comparison Discussion

Both (a) and (b) use only the index access method on T and pre-chosen implementations for the RS and ST joins
(c) allows all access methods (tuples from AMs are routed to SteMs rather than to joins) and allows a variety of routing decisions that permit different join algorithms and join orders
Q4: Decomposing the join operator brings about adaptation. Why?

Does the routing framework work?
(Appendix) Every SPJ query can be executed by carefully routing tuples between AMs, SteMs, and selections
Caution
Arbitrary routing can result in
Duplicate results
Missing results
Infinite loops
Solution
Flexibility comes at the price of routing constraints
Proposed routing constraints

13
Framework Components (overview)

Four kinds of modules
Selection modules (SMs): Query predicates
Access modules (AMs): Access methods over data sources
State modules (SteMs): Encapsulate the data structures used in traditional join algorithms
Eddy modules: Route tuples between the other modules
Each module runs asynchronously

14
Functionality of Main Modules
15
Query Planning

Check that the query is valid
Create an AM for each access method
Create an SM for each predicate
Create a SteM for each base table
Create any seed tuples needed for scans
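
The planning steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the names Plan and plan_query, the tuple-based module descriptions, and the specific validity checks are assumptions made for this example and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    access_modules: list = field(default_factory=list)     # one AM per access method
    selection_modules: list = field(default_factory=list)  # one SM per predicate
    state_modules: list = field(default_factory=list)      # one SteM per base table
    seed_tuples: list = field(default_factory=list)        # one seed per scan AM


def plan_query(tables, access_methods, predicates):
    """Build the module set for a query (hypothetical helper).

    tables:         list of base-table names
    access_methods: list of (table, kind) pairs, kind is "scan" or "index"
    predicates:     list of (predicate_text, referenced_tables) pairs
    """
    known = set(tables)
    # Step 1: validity checks (a deliberately small, illustrative subset).
    for table, kind in access_methods:
        if table not in known or kind not in ("scan", "index"):
            raise ValueError(f"bad access method: {table!r}, {kind!r}")
    for text, referenced in predicates:
        if not set(referenced) <= known:
            raise ValueError(f"predicate {text!r} references an unknown table")
    if known - {table for table, _ in access_methods}:
        raise ValueError("every table needs at least one access method")

    plan = Plan()
    # Steps 2 and 5: one AM per access method, plus a seed tuple for each scan.
    for table, kind in access_methods:
        plan.access_modules.append(("AM", table, kind))
        if kind == "scan":
            plan.seed_tuples.append(("seed", table))
    # Step 3: one SM per predicate.
    for text, _ in predicates:
        plan.selection_modules.append(("SM", text))
    # Step 4: one SteM per base table.
    for table in tables:
        plan.state_modules.append(("SteM", table))
    return plan


plan = plan_query(
    tables=["R", "S"],
    access_methods=[("R", "scan"), ("S", "scan"), ("S", "index")],
    predicates=[("R.a > 5", ["R"])],
)
```

Note that the plan contains no join operators and no join order: the joins emerge at run time from how the Eddy routes tuples through the SteMs.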

16
Example of N-way Symmetric Hash Join

Demonstrates how to implement an n-way symmetric hash join with SteMs

Q5: Comparing (ii) with (i), which one would you choose?
17
Executing Arbitrary SPJ Queries with SteMs
1. Acyclic SPJ queries with a single scan AM on each table
Example: n-ary SHJ
Required Rules
SteMs are implemented with hash indices.
The Eddy obeys the following routing constraints:
BuildFirst: A singleton tuple from table T must first be routed to build into SteM_T.
SteM BounceBack: A SteM bounces back all build tuples and no probe tuples.
Atomicity: Build and probe operations are coupled.
BoundedRepetition: No tuple is routed to the same module more than once.
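
A minimal, single-threaded sketch of these rules for the acyclic query R.a = S.a AND S.b = T.b follows. The SteM class, the JOINS list, and the route function are hypothetical names introduced only for illustration. Processing one singleton tuple to completion before the next stands in for the Atomicity rule, and the fixed probe order stands in for a real adaptive routing policy.

```python
from collections import defaultdict


class SteM:
    """Hypothetical state module: hash indices over one table's tuples."""

    def __init__(self, table, key_columns):
        self.table = table
        self.index = {col: defaultdict(list) for col in key_columns}

    def build(self, row):
        for col, idx in self.index.items():
            idx[row[col]].append(row)
        return row  # build tuples are always bounced back to the Eddy

    def probe(self, col, value):
        # Probe tuples are not bounced back; only the matches are returned.
        return list(self.index[col].get(value, []))


# Join edges of the illustrative query: R.a = S.a and S.b = T.b.
JOINS = [("R", "a", "S", "a"), ("S", "b", "T", "b")]


def make_stems():
    cols = defaultdict(set)
    for lt, lc, rt, rc in JOINS:
        cols[lt].add(lc)
        cols[rt].add(rc)
    return {table: SteM(table, c) for table, c in cols.items()}


def route(stems, table, row):
    """Eddy routing for one singleton tuple: build first, then probe."""
    stems[table].build(row)        # BuildFirst
    partials = [{table: row}]      # partial results keyed by table name
    results = []
    while partials:
        p = partials.pop()
        if len(p) == len(stems):   # every table is covered: a full result
            results.append(p)
            continue
        for lt, lc, rt, rc in JOINS:
            if lt in p and rt not in p:
                src, sc, dst, dc = lt, lc, rt, rc
            elif rt in p and lt not in p:
                src, sc, dst, dc = rt, rc, lt, lc
            else:
                continue
            # Each partial tuple probes a given SteM at most once
            # (BoundedRepetition).
            for match in stems[dst].probe(dc, p[src][sc]):
                partials.append({**p, dst: match})
            break
    return results


stems = make_stems()
stream = [
    ("S", {"a": 1, "b": 2}),
    ("T", {"b": 2, "c": "x"}),
    ("R", {"a": 1}),
    ("R", {"a": 9}),
    ("T", {"b": 2, "c": "y"}),
]
results = []
for table, row in stream:
    results.extend(route(stems, table, row))
```

In this sketch each join result is emitted exactly once, at the moment its last-arriving base tuple is routed, regardless of the order in which the three inputs deliver their tuples.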
18
Relax constraints to allow other join algorithms
SteMs need not be implemented with hash indices.
Build and probe operations are decoupled.
Potential problems?
19

2. Competitive AMs
Example: Queries with more than one AM.
Goal: Run multiple AMs per source and let the Eddy dynamically choose one AM or switch between AMs
Duplication problem?
Required Rules
SteM BounceBack: A SteM_S must bounce back a build tuple s unless it is a duplicate of another tuple s' that is already in SteM_S.
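
The build-side BounceBack rule can be sketched as a SteM that absorbs duplicates. The class DedupSteM, the interleave helper, and the two sample AMs below are hypothetical and exist only to illustrate the rule; a real Eddy would choose between AMs adaptively rather than round-robin.

```python
class DedupSteM:
    """Hypothetical SteM that bounces back only first-seen build tuples."""

    def __init__(self):
        self.rows = set()

    def build(self, row):
        if row in self.rows:
            return None      # duplicate of a stored tuple: not bounced back
        self.rows.add(row)
        return row           # new tuple: bounced back to the Eddy


def interleave(*sources):
    """Round-robin over AMs, standing in for the Eddy switching between them."""
    iterators = [iter(s) for s in sources]
    while iterators:
        for it in list(iterators):
            try:
                yield next(it)
            except StopIteration:
                iterators.remove(it)


# Two competing AMs over the same source S, delivering overlapping tuples.
scan_am = [(1, "a"), (2, "b"), (3, "c")]
mirror_am = [(3, "c"), (1, "a"), (4, "d")]

stem_s = DedupSteM()
forwarded = []
for row in interleave(scan_am, mirror_am):
    bounced = stem_s.build(row)
    if bounced is not None:
        forwarded.append(bounced)   # only these go on to probe other SteMs
```

Because a duplicate never comes back from the build, it never probes the other SteMs, so running both AMs at once does not produce duplicate join results.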

3. Index AMs
When a data source has an index AM.
Potential problem?
Required Rules
SteM BounceBack
A SteM_S must bounce back a build tuple s unless it is a duplicate of another tuple s' that is already in SteM_S.
A SteM_S must bounce back a probe tuple r unless S has a scan AM, or SteM_S already contains all matches for r.
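
The probe-side rule can be sketched as follows. How a SteM learns that it "already contains all matches" is not stated on the slide; the complete_keys set, the mark_complete method, and the lookup_via_index helper are assumptions made for this example. Re-probing the SteM after the index lookup is also a simplification of the real dataflow, in which the fetched tuples would be routed by the Eddy.

```python
from collections import defaultdict


class ProbeSteM:
    """Hypothetical SteM_S that decides whether to bounce back probe tuples."""

    def __init__(self, has_scan_am):
        self.has_scan_am = has_scan_am
        self.index = defaultdict(list)
        self.complete_keys = set()   # assumption: keys whose matches are all stored

    def build(self, key, row):
        if row in self.index[key]:
            return False             # duplicate: not bounced back
        self.index[key].append(row)
        return True                  # new tuple: bounced back

    def mark_complete(self, key):
        self.complete_keys.add(key)

    def probe(self, key):
        matches = list(self.index.get(key, []))
        # Bounce back unless S has a scan AM or all matches are already here.
        bounce_back = not (self.has_scan_am or key in self.complete_keys)
        return matches, bounce_back


def lookup_via_index(stem, index_am, key):
    """Simplified Eddy step: on a bounce-back, send the probe to the index AM."""
    matches, bounced = stem.probe(key)
    if bounced:
        for row in index_am(key):    # index_am is a hypothetical key -> rows callable
            stem.build(key, row)
        stem.mark_complete(key)
        matches, _ = stem.probe(key)
    return matches


data = {1: ["s1", "s2"]}
calls = []


def index_am(key):
    calls.append(key)                # record every remote index lookup
    return data.get(key, [])


stem_s = ProbeSteM(has_scan_am=False)
first = lookup_via_index(stem_s, index_am, 1)
second = lookup_via_index(stem_s, index_am, 1)
```

In this sketch the second probe with the same key is answered from SteM_S alone, so the index AM is consulted only once per distinct key.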

More Adaptation
4. Cyclic Queries
Static spanning tree choices hurt in two ways:
The spanning tree choice is typically made based on selectivities
A static spanning tree choice can also constrain the generation of partial query results
Required Rules
ProbeCompletion Constraint: A tuple t that has been bounced back after probing into a SteM_S must not probe into any other SteM afterwards. The routing policy must, however, maintain t in the dataflow, routing it to other modules, until it has been probed into an AM on S.
Prior Probers and Probe Completion Table
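
One way to picture the ProbeCompletion constraint is as a filter on the modules the Eddy may choose for a tuple. The sketch below is an assumption-laden illustration: TupleState, eligible, and the two record_* helpers are hypothetical names, and the sketch assumes that a tuple leaves the dataflow once it has probed the AM on S, which the slide does not state.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TupleState:
    visited: set = field(default_factory=set)   # modules already visited
    pending_am_for: Optional[str] = None        # table S whose SteM bounced t back
    done: bool = False                          # assumption: t retires after the AM probe


def record_probe_bounce(state, table):
    """t probed SteM on `table` and was bounced back."""
    state.visited.add(("stem", table))
    state.pending_am_for = table


def record_am_probe(state, table):
    """t probed an AM on `table`."""
    state.visited.add(("am", table))
    if table == state.pending_am_for:
        state.done = True


def eligible(state, modules):
    """Modules the Eddy may route t to; modules is a list of (kind, name) pairs."""
    if state.done:
        return []
    out = []
    for kind, name in modules:
        if (kind, name) in state.visited:
            continue                 # BoundedRepetition
        if state.pending_am_for is not None and kind == "stem":
            continue                 # ProbeCompletion: no further SteM probes
        out.append((kind, name))
    return out


modules = [("stem", "R"), ("stem", "S"), ("stem", "T"), ("am", "S"), ("sm", "p1")]
t = TupleState()
t.visited.add(("stem", "R"))         # t was built into its own SteM first
before = eligible(t, modules)
record_probe_bounce(t, "S")
after_bounce = eligible(t, modules)
record_am_probe(t, "S")
after_am = eligible(t, modules)
```

After the bounce-back, the only remaining destinations in this sketch are the AM on S and the selection module, which matches the requirement that t stay in the dataflow without probing any other SteM.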
5. Relaxing the BuildFirst Constraint
What if one of the input tables is much larger than the others?

22
Summary of Constraints
23
Conclusion

The salient points of our experimental study are as follows:
Even a simple join algorithm like the index join encapsulates multiple physical operations, and this causes a head-of-line blocking problem. This problem can be avoided by breaking the join module into SteMs.
SteMs allow the Eddy to efficiently learn between competitive access methods, while doing almost no redundant work.
SteMs allow the Eddy to dynamically choose the join spanning tree for cyclic queries.
SteMs allow the Eddy to dynamically switch between an index join algorithm and a symmetric hash join algorithm during query execution.
With SteMs, the Eddy can adaptively choose the way it reorders tuples in interactive environments.
Thank you