PhD Defense Parametric Query Optimization - PowerPoint PPT Presentation

1 / 65

About This Presentation

Title:

PhD Defense Parametric Query Optimization

Description:

PhD Defense. Parametric Query Optimization. Arvind Hulgeri. Dept of ... plan is added to CSOP, we need to carve out region for the new plan; it involves ... – PowerPoint PPT presentation

Number of Views:372

Avg rating:3.0/5.0

Slides: 66

Provided by: ARU52

Category:

more less

Transcript and Presenter's Notes

Title: PhD Defense Parametric Query Optimization

1
PhD DefenseParametric Query Optimization

Arvind Hulgeri
Dept of Computer Science and Engg.
Indian Institute of Technology Bombay

2
Parametric query An example

Select
From A, B
Where A.x B.y and A.z lt ? And B.w lt ?
Example of parametric cost function
f a1.s1 a2.s2 a3
Where, s1 selectivity of predicate A.z lt ?
s2 selectivity of predicate B.w
lt ?
a1, a2, a3 are constants
JDBC/ODBC prepared statements

3
Conventional Opt v/s PQO

Conventional optimization
Assumes complete knowledge of all cost parameters
E.g. selectivity and resource availability
Generates a single optimal plan for a given query
Parametric query optimization (PQO)
Generates multiple candidate plans, each optimal
for some region of the parameter space
POSP Parametrically optimal set of plans
Picks appropriate plan at run time
Extra optimization effort amortized over multiple
invocation of the query with different parameters

4
PQO A 1-parameter example
cost
0
1
parameter
5
Thesis Contributions

We classify cost functions as
linear, piecewise linear and non-linear
PQO for linear cost functions
Recursive decomposition algorithm
Cost polytope algorithm
PQO for piecewise linear cost functions
Extend a conventional query optimizer
PQO for non-linear cost functions
AniPQO Almost Non-Intrusive PQO
Memory Cognizant Query Optimization

6
PQO Solution

Finding POSP
Or, as a heuristic, a subset thereof
Picking an appropriate plan from POSP at runtime
Index the parameter space decomposition
Use the index to find the appropriate plan from
POSP
Or, evaluate the cost of each plan in POSP
Optimization using an AND-OR DAG framework

7
Parametric Query Optimization for Linear Cost
Functions
8
PQO for Linear Cost Functions

Our solutions use a conventional optimizer as a
subroutine
The solutions work for arbitrary number of
parameters
Assumptions
The conventional optimizer returns the cost
function of the optimal plan
The parameter space of interest is a closed
convex polytope
Parameter space polytope

9
Polytope Examples

Convex polytope intersection of halfspaces

Lower convex polytope
Convex polytope
10
Properties of Linear Cost Functions Ganguly,
VLDB98

If all the vertices of a polytope in the
parameter space have same optimal plan then the
plan is optimal at all points within that
polytope
Each plan in POSP has only one region of
optimality and the region is a convex polytope.

11
Recursive Decomposition Algorithm

Start with the parameter space of interest
parameter space polytope
Optimize the vertices of the polytope using a
conventional query optimizer
If two of the vertices of a polytope have two
different optimal plans then
Partition the polytope into two polytopes
Continue recursively

12
Shortcomings of the recursive decomposition
algorithm

May over-partition the parameter space and may
need to merge partitions in a post-pass.
We can reduce number of calls to the conventional
optimizer using cost polytope algorithm

13
Cost Polytope Algorithm

Based on an online polytope construction
algorithm
The cost function of each plan is represented by
a hyperplane in Rn1
N parameter dimensions 1 cost dimension
Construct a lower convex polytope that represents
the optimal cost at each point in the parameter
space

14
Cost Polytope An Example
Cost
b
a
c
Parameter
15
Cost Polytope Algorithm

Start with a initial cost polytope
Put vertices of the parameter space polytope into
a queue of vertices to be optimized
Repeat till the queue is empty
Remove and optimize the first vertex in the queue
Intersect the cost hyperplane with the cost
polytope
Project new vertices of the cost polytope onto
parameter space and insert the projection points
into the queue

16
Cost polytope algorithm An example
Cost
Parameter
Not optimized
Currently optimized
Already optimized
17
Cost polytope algorithm An example
Cost
a
Parameter
Not optimized
Currently optimized
Already optimized
18
Cost polytope algorithm An example
Cost
c
a
Parameter
Not optimized
Currently optimized
Already optimized
19
Cost polytope algorithm An example
Cost
b
a
c
Parameter
Not optimized
Currently optimized
Already optimized
20
Faces and facets of a polytope
faces 2-faces U 1-faces U 0-faces
21
Complexity of Cost Polytope Algorithm

Cost polytope algorithm makes a maximum of F
calls to the optimizer
The lower bound on the number of calls is v
Under certain assumptions, the expected number of
calls is (f v)
In general, in high-dimension, f ltlt v

22
Parametric Query Optimization for
Piecewise-linear Cost Functions
23
Piecewise Linear Cost Functions
Cost
Parameter

PQO solutions for linear case do not extend to
piecewise linear case

24
Piecewise Linear Cost Function
Cost
Parameter

Partition the parameter space into convex
polytopes
Within each partition the cost function is linear
in the parameters
But pre-partitioning the space to make all cost
functions linear in each partition is impractical

25
PQO Algorithm for Piecewise Linear Cost Functions
(PLCF)

Extend a conventional query optimizer
(System-R or Volcano)
Extensions are intrusive to query optimizer
Partition space only when necessary (on demand)
Extend plan cost
Cost ? Cost function
Extend comparison of alternative operators or
plans
Pick min cost plan ? MinMergeCostFunctions
Extensions work for arbitrary number of parameters

26
MinMergeCostFunction An example
Cost
Parameter
27
MinMergeCostFunction An example
Cost
Parameter
28
Extending System-R Algorithm

Extended System-R algorithm is exactly same as
basic System-R algorithm except
Replace cost by cost function
Use AddCostFunction instead of simple cost
addition
Use MinMergeCostFunction instead of simple cost
comparision

29
AniPQOAlmost Non-Intrusive Parametric Query
Optimizationfor Nonlinear Cost Functions
30
Features of AniPQO

Works with arbitrary nonlinear and discontinuous
cost functions
Experimental evaluation suggests that it works
well for standard cost models for relational
operators
Conceptually works for arbitrary number of
parameters
Experimental evaluation suggests that it is
practical for up to 4 parameters
Is minimally-intrusive

31
Assumptions

The parameter space of interest is a closed
convex polytope
Parameter space polytope
The (cost estimation component of the)
conventional optimizer is extended so as to
return the cost of a given plan at a given point
in the parameter space

32
Algorithm to find POSP

Starts with
CSOP Ø / current set of optimal plans /
Decomposition parameter space polytope
At each iteration
A non-optimized vertex of the current
decomposition is optimized
If optimization returns a new plan
The new plan inserted in CSOP
Parameter space decomposed is modified based on
new CSOP
Parameter space decomposition is the partitioning
of parameter space into regions s.t. for each
region a plan from CSOP is optimal throughout the
region
AniPOSP CSOP / AniPOSP ? POSP /

33
Algorithm Iterations
b
a
c
34
Algorithm Iterations
b
a
C
c
35
Algorithm Iterations
b
a
A
B
C
C
D
d
c
36
Algorithm Iterations
b
a
A
B
C
C
D
d
c
37
Algorithm to find POSP

For linear cost functions
The above algorithm is exact and finds the
complete POSP
For nonlinear cost functions
May not find all plans in POSP
We use it as a heuristic

38
AniPQO contribution

Handling parameter space decomposition with
nonlinear cost functions
Optimality threshold (t)
AND-OR-DAG representation of the plan
alternatives
Reduces run-time overhead of plan choice
Increases the quality of the heuristic solution
Integrating it with the conventional optimizer
faster optimization

39
Optimality threshold (t)

If the cost of a plan at a point is close to
optimal, we can treat the plan as optimal at the
point
We modify the algorithm as follows
If optimizing a vertex returns a new plan
if cost of no plan in CSOP is within a factor t
of the cost of the new plan at the point
Parameter space polytope is decomposed afresh
based on new CSOP
The new plan is inserted in CSOP
This brings down calls to the conventional
optimizer

40
Maintaining decomposition

Hard to find exact decomposition
We approximate non-linear regions to convex
polytopes
Maintain only edge skeleton of decomposition
Edge skeleton consists of vertices and lines
connecting the vertices
Disregard higher dimensional faces
The edge skeleton induced by a set of plans, say
CSOP, can be constructed given
Decomposition vertices
Vertex tagging
The subset of plans (in CSOP) that are optimal at
the vertex

41
Vertex tagging
42
Vertex tagging ? Edge skeleton
43
Updating edge skeleton

When a new plan is added to CSOP, we need to
carve out region for the new plan it involves
Identifying conflicting edges
Edges in the decomposition, such that the new
plan is optimal at one end and sub-optimal at the
other end
Finding new decomposition vertices
One-to-one correspondence between the conflicting
edges and the new vertices in the parameter space
decomposition
Finding equi-cost point for a given set of plans

44
Conflicting edge ? New vertex
ae
a
e
A
ab
E
abf
abe
bde
B
de
bcd
D
bc
C
c
d
cd
45
Storing Plans in DAGExample Two separate plans
EqClass3
EqClass3
EqClass2
EqClass1
EqClass2
EqClass1
S12
S11
S22
S21
Plan 1
Plan 2
Operator
Sub-plans
46
Storing Plans in DAG Example Combined DAG
EqClass3
EqClass2
EqClass1
S12
S11
S22
S21
Operator
Sub-plans
47
Advantages of storing AniPOSP in DAG

Reduced effort in picking a plan at run-time
Two plans sharing a operator/subplan
Choosing a plan not in AniPOSP
Combining parts of different plans in AniPOSP
result in a valid plan

48
Conventional Optimizer

Volcano based optimizer developed at IIT Bombay
Generates bushy plans
Standard techniques for estimating cost using
statistics about relations
I/O components
CPU components
Extended to return the cost of a plan at a given
point in the parameter space

49
Database and Queries

TPCD database with scale factor 1 (1GB)
With and without indices on the primary keys of
the relations involved
Parameters are selectivities of range predicates
table.column lt parameter_value
SPJ queries tested
Involved 2 to 5 relations
Involved 2 to 4 parameters
Query R4P4 SPJ query with 4 relations and 4
parameterized range predicates

50
Results for query R4P4 with no indices

POSP 134
Sum of operators across the plans 1816
operators in the DAG 85
AniPOSP 49
DAG-AniPOSP POSP 87
where DAG-AniPOSP is the set of plans in the DAG
built using AniPOSP

U
51
Result quality for query R4P4with no indices
Plans
U
Maximum Degradation ()
U
52
Optimization Overhead for query R4P4 with no
indices
53
Integration with the optimizer

Loose integration
Separate invocation of the optimizer for each
parameter value
Tight integration
Optimizer equivalence rules are applied only once
Resultant DAG of equivalent plans is used
repeatedly to find optimal plan with different
parameter values

54
Execution time for query R4P4

A single invocation of the underlying optimizer
takes about 16ms
AniPQO time (ms)

55
Related Work

Graefe and Karen SIGMOD'89 Cole and Graefe
SIGMOD'94 Ioannidis, Ng, Shim and Sellis
VLDB'92
Ganguly and Krishnamurthy COMAD'94 Sumit
Ganguly VLDB'98 Sumit Ganguly Personal
Communication, 01
Betawadkar IITK97 Prasad IITK97 Rao
IITK97
Ghosh et. al. VLDB02

56
Conclusion

PQO for linear cost functions
Simple and minimally intrusive
Works for arbitrary number of parameters
PQO for piecewise linear cost functions
Intrusive
Works for arbitrary number of parameters
General since nonlinear and discontinuous cost
functions can be approximated to piecewise linear
form

57
Conclusion (contd.)

AniPQO
Works with arbitrary nonlinear and discontinuous
cost functions
Conceptually works for arbitrary number of
parameters
Is minimally-intrusive
Uses AND-OR-DAG representation of the plan
alternatives

58
Questions?
http//www.cse.iitb.ac.in/aru
59
When heuristic fails
p
p
parameter
60
Backup Slides
61
Query R4P4

select partsupp.ps_suppkey
from partsupp, supplier, nation, region
where partsupp.ps_suppkey supplier.s_supp
key
and supplier.s_nationkey nation.n_nationk
ey
and nation.n_regionkey region.r_regionkey
and ps_partkey lt 1
and s_suppkey lt 2
and n_nationkey lt 3
and region.r_regionkey lt 4

62
Memory Cognizant Query OptimizationAn overview
63
Why Memory Cognizant Optimizer?

Memory intensive operators
Typical optimizers assume all the memory to be
available to each operator in the query tree
Wrong assumption

64
Why Memory Cognizant Optimizer? (contd)

Memory will get divided amongst all the operators
running simultaneously in a pipeline
Cost of a plan depends upon this memory division

65
Contributions

We develop efficient techniques to divide the
available memory optimally among operators in a
pipeline
We show how to make a cost-based decision of
breaking (i.e. converting) a pipelined edge into
a blocking edge
We develop practical memory cognizant query
optimization algorithm

Write a Comment

User Comments (0)