Title: Exploiting Similarity of Subqueries for Complex Query Optimization1
1COSC6421 Advanced Database Systems
Instructor Jarek Gryz
Exploiting Similarity of Subqueries for Complex
Query Optimization1
Presented by Qiong Wang
York University
2Outline
- Introduction-Why exploiting similarity
- Description of Technique
- Experimental Result
- Advantages and Limitation Discussion
- Related work
- Summary
3Background Introduction
- User queries become more and more complex, such
as increasing complexity of the operations and
the query structure. - Query optimizers are seeking an efficient query
execution plan( QEP) in DBMS. But, there are some
limitations - Limitations of two types of current algorithms
determining a optimal join orders - Dynamic programming -gt Optimal Solution but
long optimization time - Heuristics Based Algorithm-gt Good time complexity
but suboptimal solution - Limitations of some other studies which can find
a good plan - May not make use of some special characteristics
of the underlying queries - Only exploiting the common sub-queries in a query
4Technique Introduction
- A new similarity-based optimization technique.
- Takes the structural characteristics of a complex
query into consideration - The key idea
- Identify groups of similar sub-queries that often
appear in a complex query - Share the optimization result within each group
in the query
5Technique Description
- Preliminary
- Exploiting Similarity of Sub-Query
- Optimizing Query
6A query Q
Preliminary
-
- TR1,R2,R3,,Rm The set of Base Tables
referenced in Q - P p1,p2,p3,pn The set of predicates
referenced in Q - Table Instance - each table reference in Q
7 Preliminary
- (1) A Query Graph G
- V the set of all vertices in G lt Table
Instances - For x ? V, R ? T , mappingd(x) R
- E the set of all edges in G lt Predicates
- For e ? E, c ? 2P , e is labeled with ? (e)
c -
- If there is at least one predicate in P involving
vertices - x and y, then query graph G has an edge e
between x and y - The set of all predicates involving x and y
labeled on the edge e
8Constructing Query graph
- An example query graph
- G (V,E,T,P,df)
- Sizeof(x) Size of the table represented by
x - Sel (e) Selectivity of ?(e), i.e., the
selectivity of the conjunction of - all the simple predicates
in ?(e)
9 Preliminary
- A data structure which represents a query graph
- Every vertex x in query graph G has a node x
- Node x has a set of adjacent nodes y1, y2, ...,
yn - A ring x ? y1 ? y2 ? ... ? yn ? x
- to represent such an adjacency
relationship. - Node x is called the owner (node) of the ring,
- Nodes y1, y2, ..., yn are called the members of
the ring.
10 Preliminary
- (3) Similar Sub-query Graphes
Suppose we have two sub-query graphs G(V
,EV , TV , PV , dV , ?V ) G(V,
EV, TV, PV, dV, ?V) Rt
Error bound for table size Rs Error
bound for condition selectivities
11Constructing Query graph
- Similar Sub-query Graphes (contd)
If G and G satisfy the following conditions,
they are regarded as a pair of similar sub-query
graphs with respect to error bounds rt and rs,
G (rt,rs)G
12Example of a query graph with similar sub-queries
13Exploiting Similarity of Sub-Queries
- Algorithm Design Idea
- Identify pairs of similar sub-queries with
respect to the given error bounds rt and rs - Optimize one sub-query in a similar pair, and map
and apply the resulting execution plan to the
other sub-query - Replace similar sub-queries with their
(estimated) result tables in the query graph and
optimize the resulting modified query.
14Exploiting Similarity of Sub-Queries
- Algorithm Explanation
- Assume all self-loops are
removed
1. Choosing Starting Nodes (1) Construct
ring network and similarity starting lists
15Exploiting Similarity of Sub-Queries
Similarity Starting List
OL2
Each list is for one base table and has a header
containing -Base Table Name Ri, -Two
sublists OLi all its instances (nodes) SLi
--other table instances whose sizes are
within error bound rt with respect to
the size of Ri
SL2
Base Table R2
16Choosing Starting Nodes (contd)
- (2) In order to find a pair of as large as
possible similar subquery graphs, we should
choose a pair of nodes with the maximum number of
adjacent node pairs. -
- Because the larger the similar subquery graphs in
a pair, the more the optimization work can be
shared.
17How to Choose Starting Nodes
For any pair of potential starting nodes x and y
selected from a similarity starting list, we have
a formula
m be the size of set T for query Q
choosing a pair of starting nodes is to choose
the pair that maximizes the value of formula (1).
18How to Choose Starting Nodes
- Two indicator arrays O (occurrence) and S
(similarity) - For each table instance
- The lengths of arrays O and S are the size
of T - Oxi indicates
- current adjacent nodes representing base
table Ri -
- Sxi indicates
- current adjacent nodes whose table sizes are
- within the given error tolerance with respect
to Ri - excluding Oxi.
19How to Choose Starting Nodes (contd)
20Searching for Similar Sub-query Graphs
- 2. For all the unselected nodes (x1 and y1) in
the rings of x, y - Check if sizeof(x1) and sizeof(y1) are within
error bound rt - Check if adding x1 and y1 into the current
sub-query pair graph will violate the similarity
of sub-queries or not - Add x1, y1 into sub-query pair graphs
21Searching for Similar Sub-query Graphs (Contd)
- 3. Use two threshold values c1 and c2 to
determine whether to accept, reject, or hold a
pair of similar sub-query graphs. - Let n be the number of nodes in a similar
subquery graph - (1) if n c2, then the new pair of similar
subquery graphs is accepted - (2) if c1 n lt c2, then we put this pair on
hold - (3) if n lt c1, then the new pair is rejected.
22- 4. Remove all the nodes in the accepted graphs
and choose another starting nodes and repeated
the procedure until no pair of nodes can be
expanded.
235. Optimizing Query
- Apply an optimization algorithm to optimize one
of the sub-query graphs in each pair. - Map the execution plan for one sub-query to the
one for its partner - Example, G1 and G2
- Nodes x1 ? y1, x2 ? y2, x3 ? y3,
- By optimizing G1, we get such a plan
((x1x2) x3). - The mapped plan for G2 is ((y1y2) y3).
24Experimental Result
Comparison of I/O costs for execution plans
generated by two techniques For a set
of randomly-generated test queries.
25Effect of changing error bounds on I/O costs
26Effect of changing error bounds on I/O costs
- Very small error bounds cannot yield good
performance, since the smaller the error bounds,
the smaller the similar sub-queries - Moderate error bounds (0.15 - 0.40) yield the
best performance, since similar sub-queries with
reasonable sizes can be found - Large error bounds lead to poor performance,
since sub-queries are less similar in such cases,
which makes that sharing execution plans between
them may not be appropriate - Very large error bound (close to 1), the
performance of the similarity-based technique
stays at the same level. - The sizes of similar sub queries may reach
their maximums - when the error bounds are beyond a certain
limit.
27Advantages and Limitation Discussion
- Advantages
- Take the features of original structures of
complex query into consideration and design a
query graph - Exploiting the similarity of Sub-queries beyond
common sub-query, reduce the optimization time by
sharing the same mapped plan - Reduce the query complexity before apply any
optimization algorithm. - Limitation
- According to the experimental result, the
selectivity of similarity sub-queries depends on
the error bound. - 2. The paper didnt mention the application of
this technique in multiple queries.
28Related Work
- (1) The author also introduced a technique to
optimize a complex query by exploiting common
sub-queries recently. - This technique is shown to be more effective
than a pure randomization based method since it
takes the structural characteristics of a complex
query into consideration. - (2) Some other Algorithms used to optimize large
query - Iterative improvement (II)
- Simulated Annealing (SA)
- Tabu Search (TS)
- AB algorithm (AB)
- Generic Algorithm (GA)
29Related Work
- (2) Related topic-Multiple-Query
-
- A single query given to a system may result in
multiple queries - Queries are given to the system from various
users. Then batching all users requests is
possibly required. - Some parts of the query are automatically
generated by programs
30Multiple Query Optimization
- Major issue in multiple-query processing is
the redundancy due to accessing the same data
multiple times in different queries.2 - Recognize the possibilities of sharing
- Exploiting common sub-expressions and reduce
evaluation cost - Reusing Materialized or intermediate results from
other computation - Example Volcano-SH and RU 3
- Modify the optimizer search strategy and find a
globally optimal plan. - Example IE (Interleave Execution) and HA
(Heuristic Algorithm) 2
31Summary
- Described a new query optimization technique for
complex query, exploiting similarity of
Sub-queries - Algorithm Design Idea
- Query Graph, Ring Network, Similar Sub-query
Graph - Discussed experimental result, advantages and
limitation - Related work Discussion
32Reference
- 1 Yingying Tao, Qiang Zhu and Calisto Zuzarte.
Exploiting Similarity of Subqueries for Complex
Query Optimization, LNCS 2736, Jan 2003, Pages
747 - 759 - 2 TIMOS K. SELLIS and C. Lin. Multiple Query
Optimization, ACM TODS13(1) p23-52 (1988) -
- 3 Prasan Roy, S. Seshadri, S. Sudarshan,
Siddhesh Bhobe. Efficient and Extensible
Algorithms for Multi Query Optimization, ACM 2000
1-58113-218-2/00/05 -
- 4 Jamal R.Alsabbagh, Vijay V. Raghavan.
Analysis of Common Subexpression Exploitation
Models in Multiple-Query Processing -
33Thank You!