Title: PhD Defense Parametric Query Optimization
1PhD DefenseParametric Query Optimization
- Arvind Hulgeri
- Dept of Computer Science and Engg.
- Indian Institute of Technology Bombay
2Parametric query An example
- Select
- From A, B
- Where A.x B.y and A.z lt ? And B.w lt ?
- Example of parametric cost function
- f a1.s1 a2.s2 a3
- Where, s1 selectivity of predicate A.z lt ?
- s2 selectivity of predicate B.w
lt ? - a1, a2, a3 are constants
- JDBC/ODBC prepared statements
3Conventional Opt v/s PQO
- Conventional optimization
- Assumes complete knowledge of all cost parameters
- E.g. selectivity and resource availability
- Generates a single optimal plan for a given query
- Parametric query optimization (PQO)
- Generates multiple candidate plans, each optimal
for some region of the parameter space - POSP Parametrically optimal set of plans
- Picks appropriate plan at run time
- Extra optimization effort amortized over multiple
invocation of the query with different parameters
4PQO A 1-parameter example
cost
0
1
parameter
5Thesis Contributions
- We classify cost functions as
- linear, piecewise linear and non-linear
- PQO for linear cost functions
- Recursive decomposition algorithm
- Cost polytope algorithm
- PQO for piecewise linear cost functions
- Extend a conventional query optimizer
- PQO for non-linear cost functions
- AniPQO Almost Non-Intrusive PQO
- Memory Cognizant Query Optimization
6PQO Solution
- Finding POSP
- Or, as a heuristic, a subset thereof
- Picking an appropriate plan from POSP at runtime
- Index the parameter space decomposition
- Use the index to find the appropriate plan from
POSP - Or, evaluate the cost of each plan in POSP
- Optimization using an AND-OR DAG framework
7Parametric Query Optimization for Linear Cost
Functions
8PQO for Linear Cost Functions
- Our solutions use a conventional optimizer as a
subroutine - The solutions work for arbitrary number of
parameters - Assumptions
- The conventional optimizer returns the cost
function of the optimal plan - The parameter space of interest is a closed
convex polytope - Parameter space polytope
9Polytope Examples
- Convex polytope intersection of halfspaces
Lower convex polytope
Convex polytope
10Properties of Linear Cost Functions Ganguly,
VLDB98
- If all the vertices of a polytope in the
parameter space have same optimal plan then the
plan is optimal at all points within that
polytope - Each plan in POSP has only one region of
optimality and the region is a convex polytope.
11Recursive Decomposition Algorithm
- Start with the parameter space of interest
parameter space polytope - Optimize the vertices of the polytope using a
conventional query optimizer - If two of the vertices of a polytope have two
different optimal plans then - Partition the polytope into two polytopes
- Continue recursively
12Shortcomings of the recursive decomposition
algorithm
- May over-partition the parameter space and may
need to merge partitions in a post-pass. - We can reduce number of calls to the conventional
optimizer using cost polytope algorithm
13Cost Polytope Algorithm
- Based on an online polytope construction
algorithm - The cost function of each plan is represented by
a hyperplane in Rn1 - N parameter dimensions 1 cost dimension
- Construct a lower convex polytope that represents
the optimal cost at each point in the parameter
space
14Cost Polytope An Example
Cost
b
a
c
Parameter
15Cost Polytope Algorithm
- Start with a initial cost polytope
- Put vertices of the parameter space polytope into
a queue of vertices to be optimized - Repeat till the queue is empty
- Remove and optimize the first vertex in the queue
- Intersect the cost hyperplane with the cost
polytope - Project new vertices of the cost polytope onto
parameter space and insert the projection points
into the queue
16Cost polytope algorithm An example
Cost
Parameter
Not optimized
Currently optimized
Already optimized
17Cost polytope algorithm An example
Cost
a
Parameter
Not optimized
Currently optimized
Already optimized
18Cost polytope algorithm An example
Cost
c
a
Parameter
Not optimized
Currently optimized
Already optimized
19Cost polytope algorithm An example
Cost
b
a
c
Parameter
Not optimized
Currently optimized
Already optimized
20Faces and facets of a polytope
faces 2-faces U 1-faces U 0-faces
21Complexity of Cost Polytope Algorithm
- Cost polytope algorithm makes a maximum of F
calls to the optimizer - The lower bound on the number of calls is v
- Under certain assumptions, the expected number of
calls is (f v) - In general, in high-dimension, f ltlt v
22Parametric Query Optimization for
Piecewise-linear Cost Functions
23Piecewise Linear Cost Functions
Cost
Parameter
- PQO solutions for linear case do not extend to
piecewise linear case
24Piecewise Linear Cost Function
Cost
Parameter
- Partition the parameter space into convex
polytopes - Within each partition the cost function is linear
in the parameters - But pre-partitioning the space to make all cost
functions linear in each partition is impractical
25PQO Algorithm for Piecewise Linear Cost Functions
(PLCF)
- Extend a conventional query optimizer
- (System-R or Volcano)
- Extensions are intrusive to query optimizer
- Partition space only when necessary (on demand)
- Extend plan cost
- Cost ? Cost function
- Extend comparison of alternative operators or
plans - Pick min cost plan ? MinMergeCostFunctions
- Extensions work for arbitrary number of parameters
26MinMergeCostFunction An example
Cost
Parameter
27MinMergeCostFunction An example
Cost
Parameter
28Extending System-R Algorithm
- Extended System-R algorithm is exactly same as
basic System-R algorithm except - Replace cost by cost function
- Use AddCostFunction instead of simple cost
addition - Use MinMergeCostFunction instead of simple cost
comparision
29AniPQOAlmost Non-Intrusive Parametric Query
Optimizationfor Nonlinear Cost Functions
30Features of AniPQO
- Works with arbitrary nonlinear and discontinuous
cost functions - Experimental evaluation suggests that it works
well for standard cost models for relational
operators - Conceptually works for arbitrary number of
parameters - Experimental evaluation suggests that it is
practical for up to 4 parameters - Is minimally-intrusive
31Assumptions
- The parameter space of interest is a closed
convex polytope - Parameter space polytope
- The (cost estimation component of the)
conventional optimizer is extended so as to
return the cost of a given plan at a given point
in the parameter space
32Algorithm to find POSP
- Starts with
- CSOP Ø / current set of optimal plans /
- Decomposition parameter space polytope
- At each iteration
- A non-optimized vertex of the current
decomposition is optimized - If optimization returns a new plan
- The new plan inserted in CSOP
- Parameter space decomposed is modified based on
new CSOP - Parameter space decomposition is the partitioning
of parameter space into regions s.t. for each
region a plan from CSOP is optimal throughout the
region - AniPOSP CSOP / AniPOSP ? POSP /
33Algorithm Iterations
b
a
c
34Algorithm Iterations
b
a
C
c
35Algorithm Iterations
b
a
A
B
C
C
D
d
c
36Algorithm Iterations
b
a
A
B
C
C
D
d
c
37Algorithm to find POSP
- For linear cost functions
- The above algorithm is exact and finds the
complete POSP - For nonlinear cost functions
- May not find all plans in POSP
- We use it as a heuristic
38AniPQO contribution
- Handling parameter space decomposition with
nonlinear cost functions - Optimality threshold (t)
- AND-OR-DAG representation of the plan
alternatives - Reduces run-time overhead of plan choice
- Increases the quality of the heuristic solution
- Integrating it with the conventional optimizer
- faster optimization
39Optimality threshold (t)
- If the cost of a plan at a point is close to
optimal, we can treat the plan as optimal at the
point - We modify the algorithm as follows
- If optimizing a vertex returns a new plan
- if cost of no plan in CSOP is within a factor t
of the cost of the new plan at the point - Parameter space polytope is decomposed afresh
based on new CSOP - The new plan is inserted in CSOP
- This brings down calls to the conventional
optimizer
40Maintaining decomposition
- Hard to find exact decomposition
- We approximate non-linear regions to convex
polytopes - Maintain only edge skeleton of decomposition
- Edge skeleton consists of vertices and lines
connecting the vertices - Disregard higher dimensional faces
- The edge skeleton induced by a set of plans, say
CSOP, can be constructed given - Decomposition vertices
- Vertex tagging
- The subset of plans (in CSOP) that are optimal at
the vertex
41Vertex tagging
42Vertex tagging ? Edge skeleton
43Updating edge skeleton
- When a new plan is added to CSOP, we need to
carve out region for the new plan it involves - Identifying conflicting edges
- Edges in the decomposition, such that the new
plan is optimal at one end and sub-optimal at the
other end - Finding new decomposition vertices
- One-to-one correspondence between the conflicting
edges and the new vertices in the parameter space
decomposition - Finding equi-cost point for a given set of plans
44Conflicting edge ? New vertex
ae
a
e
A
ab
E
abf
abe
bde
B
de
bcd
D
bc
C
c
d
cd
45Storing Plans in DAGExample Two separate plans
EqClass3
EqClass3
EqClass2
EqClass1
EqClass2
EqClass1
S12
S11
S22
S21
Plan 1
Plan 2
Operator
Sub-plans
46Storing Plans in DAG Example Combined DAG
EqClass3
EqClass2
EqClass1
S12
S11
S22
S21
Operator
Sub-plans
47Advantages of storing AniPOSP in DAG
- Reduced effort in picking a plan at run-time
- Two plans sharing a operator/subplan
- Choosing a plan not in AniPOSP
- Combining parts of different plans in AniPOSP
result in a valid plan
48Conventional Optimizer
- Volcano based optimizer developed at IIT Bombay
- Generates bushy plans
- Standard techniques for estimating cost using
statistics about relations - I/O components
- CPU components
- Extended to return the cost of a plan at a given
point in the parameter space
49Database and Queries
- TPCD database with scale factor 1 (1GB)
- With and without indices on the primary keys of
the relations involved - Parameters are selectivities of range predicates
- table.column lt parameter_value
- SPJ queries tested
- Involved 2 to 5 relations
- Involved 2 to 4 parameters
- Query R4P4 SPJ query with 4 relations and 4
parameterized range predicates
50Results for query R4P4 with no indices
- POSP 134
- Sum of operators across the plans 1816
- operators in the DAG 85
- AniPOSP 49
- DAG-AniPOSP POSP 87
- where DAG-AniPOSP is the set of plans in the DAG
built using AniPOSP
U
51Result quality for query R4P4with no indices
Plans
U
Maximum Degradation ()
U
52Optimization Overhead for query R4P4 with no
indices
53Integration with the optimizer
- Loose integration
- Separate invocation of the optimizer for each
parameter value - Tight integration
- Optimizer equivalence rules are applied only once
- Resultant DAG of equivalent plans is used
repeatedly to find optimal plan with different
parameter values
54Execution time for query R4P4
- A single invocation of the underlying optimizer
takes about 16ms - AniPQO time (ms)
55Related Work
- Graefe and Karen SIGMOD'89 Cole and Graefe
SIGMOD'94 Ioannidis, Ng, Shim and Sellis
VLDB'92 - Ganguly and Krishnamurthy COMAD'94 Sumit
Ganguly VLDB'98 Sumit Ganguly Personal
Communication, 01 - Betawadkar IITK97 Prasad IITK97 Rao
IITK97 - Ghosh et. al. VLDB02
56Conclusion
- PQO for linear cost functions
- Simple and minimally intrusive
- Works for arbitrary number of parameters
- PQO for piecewise linear cost functions
- Intrusive
- Works for arbitrary number of parameters
- General since nonlinear and discontinuous cost
functions can be approximated to piecewise linear
form
57Conclusion (contd.)
- AniPQO
- Works with arbitrary nonlinear and discontinuous
cost functions - Conceptually works for arbitrary number of
parameters - Is minimally-intrusive
- Uses AND-OR-DAG representation of the plan
alternatives
58Questions?
http//www.cse.iitb.ac.in/aru
59When heuristic fails
p
p
parameter
60Backup Slides
61Query R4P4
- select partsupp.ps_suppkey
- from partsupp, supplier, nation, region
- where partsupp.ps_suppkey supplier.s_supp
key - and supplier.s_nationkey nation.n_nationk
ey - and nation.n_regionkey region.r_regionkey
- and ps_partkey lt 1
- and s_suppkey lt 2
- and n_nationkey lt 3
- and region.r_regionkey lt 4
62Memory Cognizant Query OptimizationAn overview
63Why Memory Cognizant Optimizer?
- Memory intensive operators
- Typical optimizers assume all the memory to be
available to each operator in the query tree
Wrong assumption
64Why Memory Cognizant Optimizer? (contd)
- Memory will get divided amongst all the operators
running simultaneously in a pipeline - Cost of a plan depends upon this memory division
65Contributions
- We develop efficient techniques to divide the
available memory optimally among operators in a
pipeline - We show how to make a cost-based decision of
breaking (i.e. converting) a pipelined edge into
a blocking edge - We develop practical memory cognizant query
optimization algorithm