Title: Multi-Query Optimization and Applications
1Multi-Query Optimization andApplications
- Prasan Roy
- Indian Institute of Technology - Bombay
2Motivation
- Queries often involve repeated computation
- Queries on overlapping views, stored procedures,
nested queries, etc. - Update expressions for a set of overlapping
materialized views - Automatically generated queries
- XML-QL complex path expressions ? SQL query
batches - Our focus Faster query processing by avoiding
repeated computation
3Outline
- Multi-query optimization
- Application to related problems
- Query result caching
- Materialized view selection and maintenance
- Conclusions and future work
4Multi-Query Optimization
Prasan Roy, S. Seshadri, S. Sudarshan and
Siddhesh Bhobe, Efficient and Extensible
Algorithms for Multi-Query Optimization, ACM
SIGMOD 2000
5Motivating Example
Best Plan for A JOIN B JOIN C
Best Plan for B JOIN C JOIN D
100
100
10
B
A
100
100
10
C
B
C
D
10
10
10
10
Total Cost 460
Foreign Key Dependency A?B?C?D
6Motivating Example
100
100
10
10
BC
10
D
A
10
10
100
B
C
10
10
Total Cost 370 Benefit 90
Foreign Key Dependency A?B?C?D
7Problem Statement
- Find the cheapest plan exploiting transiently
materialized common subexpressions (CSEs) - Assumption No shared pipelines
D
A
B
C
Common Subexpression
8Problems
- Locally optimal subplans may not be globally
optimal - Mutually exclusive alternatives
- (A JOIN B JOIN C)
- (B JOIN C JOIN D)
- (C JOIN D JOIN E)
- What to share (B JOIN C) or (C JOIN D) ?
- Materializing and sharing a CSE not necessarily
cheaper
9Example
Best Plan for A JOIN B JOIN C
Best Plan for B JOIN C JOIN D
10
100
10
B
A
1
10
10
C
B
C
D
10
1
1
1
Total Cost 154
Foreign Key Dependency A?B?C?D
10Example
100
10
10
10
BC
D
10
A
1
10
10
B
C
10
1
Total Cost 172 Benefit -18
Foreign Key Dependency A?B?C?D
11Approach
- Set up the search space of execution plans
- Explore the search space to find the best
execution plan
12Representation of Plan Space
BCD
ABC
Equivalence Class (OR node)
Operation (AND node)
BC
CD
AB
Example Plan (Solution Graph)
A
C
D
B
13DAG Generation ModificationsUnification
- Volcano Duplicate subexpressions ? No CSEs!
ABC
BCD
BC
BC
AB
CD
C
A
B
C
D
B
- Modification Duplicate subexpressions unified
14DAG Generation ModificationsSubsumption
- Volcano No expression subsumption ? Missed CSEs
?(Agt50)
?(Alt10)
?(Agt50)
Subsumption derivation
?(Alt10)
?(Agt50)
?(Agt50)
?(Alt10 or Agt50)
?(Agt10)
- Modification Subsumption derivations introduced
15Exploring the Search SpaceAn Exhaustive Algorithm
- Input DAG for query Q
- Output Set of nodes to materialize, corresp.
best plan - Y set of equivalence nodes in DAG
- Pick X ? Y which minimizes BestCost(Q, X)
- Return X
- BestCost(Q, X) cost of the best plan for Q
given that the nodes - in X are transiently materialized
Too expensive! Need heuristics.
16Exploring the Search SpaceA Greedy Heuristic
- Input DAG for query Q
- Output Set of nodes to materialize, corresp.
best plan - X Y set of equivalence nodes in DAG
- While( Y ? )
- Pick z ?Y which maximizes Benefit(z Q, X)
- If( Benefit(z Q, X) gt 0 )
- Y Y z X X U z
- Else Y
- Return X
- Benefit(z Q, X) BestCost(Q, X) - BestCost(Q,
X U z) - Appeared in Gupta, ICDT97. Our Contribution
improve efficiency
17Improving EfficiencySummary
- Input DAG for query Q
- Output Set of nodes to materialize, corresp.
best plan - X Y set of equivalence nodes in DAG
- While( Y ? )
- Pick z ?Y which maximizes Benefit(z Q, X)
- If( Benefit(z Q, X) gt 0 )
- Y Y z X X U z
- Else Y
- Return X
- Restrict the set of materialization candidates
- Compute Benefit efficiently
- Heuristically avoid computing Benefit for some
nodes
?
??
18Improving EfficiencyOnly CSEs Materialized
- CSEs identified in a bottom-up traversal
BCD
ABC
Common Subexpression
BC
CD
AB
A
C
D
B
19Improving EfficiencySummary
- Input DAG for query Q
- Output Set of nodes to materialize, corresp.
best plan - X Y set of equivalence nodes in DAG
- While( Y ? )
- Pick z ?Y which maximizes Benefit(z Q, X)
- If( Benefit(z Q, X) gt 0 )
- Y Y z X X U z
- Else Y
- Return X
- Restrict the set of materialization candidates ?
- Compute Benefit efficiently
- Heuristically avoid computing Benefit for some
nodes
?
??
20Efficient Benefit Computation
Incremental Re-optimization
- X Set of CSEs already materialized
- z unmaterialized CSE
- Best plan given X materialized ?
- Best plan given X U z materialized
- Observation
- Best plans change only for the ancestors of z
21Incremental Re-optimization Example
BCD
ABC
230
230
?
?
230
230
120
120
z (B JOIN C)
100
100
100
100
10
10
BC
BC
10
130
CD
AB
100
100
100
Best Plan
C
B
A
D
10
10
10
10
22Incremental Re-optimization Efficient Propagation
- Ancestor nodes visited bottom-up in a topological
order - Guarantees no revisits
- Propagation path pruned if the current nodes
best cost remains unchanged
23Improving EfficiencySummary
- Input DAG for query Q
- Output Set of nodes to materialize, corresp.
best plan - X Y set of equivalence nodes in DAG
- While( Y ? )
- Pick z ?Y which maximizes Benefit(z Q, X)
- If( Benefit(z Q, X) gt 0 )
- Y Y z X X U z
- Else Y
- Return X
- Restrict the set of materialization candidates ?
- Compute Benefit efficiently ?
- Heuristically avoid computing Benefit for some
nodes
?
??
24Avoiding Benefit Computation
- Monotonicity Assumption
- Benefit of a node does not increase due to
materialization of other nodes - Often true
- An earlier benefit of a node is an upper bound on
its current benefit - Do not recompute a nodes benefit if another
nodes current benefit is greater
Optimization costs decrease by 90
25Experimental Results
- TPCD-0.1 on Microsoft SQL Server 6.5
- using SQL rewriting for MQO
26Alternatives to Greedy Volcano-SH
- A lightweight post-pass heuristic
- Compute the best plan for each query
independently, using Volcano - Find the set of nodes in the best plans to
materialize (cost-based) - Similar previous work
- Subramanium and Venkataraman, SIGMOD 1998
27Alternatives to Greedy Volcano-RU
- A lightweight extension of Volcano
- Batched queries optimized in sequence Q1, Q2, ,
Qn - Find the best plan for query Qi given the best
plans for queries Qj, j lt i - Cost based materialization of nodes in best plans
of Qj, j lt i - Plan quality sensitive to the query sequence
28Experimental Results
29Experimental Results
30Features
- Easily implemented
- First MQO implementation integrated with a
state-of-the-art optimizer (as far as we know) - Also partially prototyped on Microsoft
SQL-Server - Support for index selection
- Index modeled as physical property
(like interesting order) - Extensible and flexible
- New operators, data models
- Readily adapts to other problems
- Query result caching
- Materialized view selection/maintenance
31Query Result Caching
P. Roy, K. Ramamritham, S. Seshadri, P. Shenoy
and S. Sudarshan, Dont Trash Your Intermediate
Results, Cache em, Submitted for publication
32Problem Statement
- Minimize the total execution time of an online
workload by - Caching intermediate/final results of individual
queries, and - Using these cached results to answer later
queries
33System Model
34Contributions
- Intermediate as well as final results cached
- Optimizer-driven cache management
- Adapts to workload changes
- Cache-aware cost-based optimization
- Novel framework for cached result matching
35Experimental Results
- Overheads negligible
- Performance on 900 query TPCD-1 based uniform
cube-point workload
36Materialized View Selection and Maintenance
Hoshi Mistry, Prasan Roy, K. Ramamritham and S.
Sudarshan, Materialized View Selection and
Maintenance Using Multi-Query Optimization, Submit
ted for publication
37Problem Statement
- Speed up maintenance of a set of materialized
views by - Exploiting CSEs between different view
maintenance expressions - Selecting additional views to be materialized
38Contributions
- Optimization of maintenance expressions
- Support for transiently materialized delta
views - Nicely integrates transient vs permanent view
materialization choices
39Experimental Results
- Overheads negligible
- Performance benefit for maintenance of two
TPCD-0.1 based SPJA views
40Conclusion
- MQO is practical
- Low overheads, high benefits
- Easily implemented and integrated
- Leads to novel solutions to related problems
- Query result caching
- Materialized view selection and maintenance
41Future Work
- Further extensions of MQO
- Shared execution pipelines
- Query result caching in presence of updates
- Other problems
- Continuous queries, XML view caching, etc.
42Other Contributions
- Garbage Collection in Object Oriented Databases
- Developed a transaction-aware cyclic reference
counting algorithm - Provided a formal proof of correctness
S. Ashwin, Prasan Roy, S. Seshadri, Avi
Silberschatz and S. Sudarshan, Garbage Collection
in Object-Oriented Databases Using Transactional
Cyclic Reference Counting, VLDB 1997 Prasan
Roy, S. Seshadri, Avi Silberschatz, S. Sudarshan
and S. Ashwin, Garbage Collection in
Object-Oriented Databases Using Transactional
Cyclic Reference Counting, Invited Paper, VLDB
Journal, August 1998