PhD Defense Parametric Query Optimization - PowerPoint PPT Presentation

1 / 65
About This Presentation
Title:

PhD Defense Parametric Query Optimization

Description:

PhD Defense. Parametric Query Optimization. Arvind Hulgeri. Dept of ... plan is added to CSOP, we need to carve out region for the new plan; it involves ... – PowerPoint PPT presentation

Number of Views:372
Avg rating:3.0/5.0
Slides: 66
Provided by: ARU52
Category:

less

Transcript and Presenter's Notes

Title: PhD Defense Parametric Query Optimization


1
PhD DefenseParametric Query Optimization
  • Arvind Hulgeri
  • Dept of Computer Science and Engg.
  • Indian Institute of Technology Bombay

2
Parametric query An example
  • Select
  • From A, B
  • Where A.x B.y and A.z lt ? And B.w lt ?
  • Example of parametric cost function
  • f a1.s1 a2.s2 a3
  • Where, s1 selectivity of predicate A.z lt ?
  • s2 selectivity of predicate B.w
    lt ?
  • a1, a2, a3 are constants
  • JDBC/ODBC prepared statements

3
Conventional Opt v/s PQO
  • Conventional optimization
  • Assumes complete knowledge of all cost parameters
  • E.g. selectivity and resource availability
  • Generates a single optimal plan for a given query
  • Parametric query optimization (PQO)
  • Generates multiple candidate plans, each optimal
    for some region of the parameter space
  • POSP Parametrically optimal set of plans
  • Picks appropriate plan at run time
  • Extra optimization effort amortized over multiple
    invocation of the query with different parameters

4
PQO A 1-parameter example
cost
0
1
parameter
5
Thesis Contributions
  • We classify cost functions as
  • linear, piecewise linear and non-linear
  • PQO for linear cost functions
  • Recursive decomposition algorithm
  • Cost polytope algorithm
  • PQO for piecewise linear cost functions
  • Extend a conventional query optimizer
  • PQO for non-linear cost functions
  • AniPQO Almost Non-Intrusive PQO
  • Memory Cognizant Query Optimization

6
PQO Solution
  • Finding POSP
  • Or, as a heuristic, a subset thereof
  • Picking an appropriate plan from POSP at runtime
  • Index the parameter space decomposition
  • Use the index to find the appropriate plan from
    POSP
  • Or, evaluate the cost of each plan in POSP
  • Optimization using an AND-OR DAG framework

7
Parametric Query Optimization for Linear Cost
Functions
8
PQO for Linear Cost Functions
  • Our solutions use a conventional optimizer as a
    subroutine
  • The solutions work for arbitrary number of
    parameters
  • Assumptions
  • The conventional optimizer returns the cost
    function of the optimal plan
  • The parameter space of interest is a closed
    convex polytope
  • Parameter space polytope

9
Polytope Examples
  • Convex polytope intersection of halfspaces

Lower convex polytope
Convex polytope
10
Properties of Linear Cost Functions Ganguly,
VLDB98
  • If all the vertices of a polytope in the
    parameter space have same optimal plan then the
    plan is optimal at all points within that
    polytope
  • Each plan in POSP has only one region of
    optimality and the region is a convex polytope.

11
Recursive Decomposition Algorithm
  • Start with the parameter space of interest
    parameter space polytope
  • Optimize the vertices of the polytope using a
    conventional query optimizer
  • If two of the vertices of a polytope have two
    different optimal plans then
  • Partition the polytope into two polytopes
  • Continue recursively

12
Shortcomings of the recursive decomposition
algorithm
  • May over-partition the parameter space and may
    need to merge partitions in a post-pass.
  • We can reduce number of calls to the conventional
    optimizer using cost polytope algorithm

13
Cost Polytope Algorithm
  • Based on an online polytope construction
    algorithm
  • The cost function of each plan is represented by
    a hyperplane in Rn1
  • N parameter dimensions 1 cost dimension
  • Construct a lower convex polytope that represents
    the optimal cost at each point in the parameter
    space

14
Cost Polytope An Example
Cost
b
a
c
Parameter
15
Cost Polytope Algorithm
  • Start with a initial cost polytope
  • Put vertices of the parameter space polytope into
    a queue of vertices to be optimized
  • Repeat till the queue is empty
  • Remove and optimize the first vertex in the queue
  • Intersect the cost hyperplane with the cost
    polytope
  • Project new vertices of the cost polytope onto
    parameter space and insert the projection points
    into the queue

16
Cost polytope algorithm An example
Cost
Parameter
Not optimized
Currently optimized
Already optimized
17
Cost polytope algorithm An example
Cost
a
Parameter
Not optimized
Currently optimized
Already optimized
18
Cost polytope algorithm An example
Cost
c
a
Parameter
Not optimized
Currently optimized
Already optimized
19
Cost polytope algorithm An example
Cost
b
a
c
Parameter
Not optimized
Currently optimized
Already optimized
20
Faces and facets of a polytope
faces 2-faces U 1-faces U 0-faces
21
Complexity of Cost Polytope Algorithm
  • Cost polytope algorithm makes a maximum of F
    calls to the optimizer
  • The lower bound on the number of calls is v
  • Under certain assumptions, the expected number of
    calls is (f v)
  • In general, in high-dimension, f ltlt v

22
Parametric Query Optimization for
Piecewise-linear Cost Functions
23
Piecewise Linear Cost Functions
Cost
Parameter
  • PQO solutions for linear case do not extend to
    piecewise linear case

24
Piecewise Linear Cost Function
Cost
Parameter
  • Partition the parameter space into convex
    polytopes
  • Within each partition the cost function is linear
    in the parameters
  • But pre-partitioning the space to make all cost
    functions linear in each partition is impractical

25
PQO Algorithm for Piecewise Linear Cost Functions
(PLCF)
  • Extend a conventional query optimizer
  • (System-R or Volcano)
  • Extensions are intrusive to query optimizer
  • Partition space only when necessary (on demand)
  • Extend plan cost
  • Cost ? Cost function
  • Extend comparison of alternative operators or
    plans
  • Pick min cost plan ? MinMergeCostFunctions
  • Extensions work for arbitrary number of parameters

26
MinMergeCostFunction An example
Cost
Parameter
27
MinMergeCostFunction An example
Cost
Parameter
28
Extending System-R Algorithm
  • Extended System-R algorithm is exactly same as
    basic System-R algorithm except
  • Replace cost by cost function
  • Use AddCostFunction instead of simple cost
    addition
  • Use MinMergeCostFunction instead of simple cost
    comparision

29
AniPQOAlmost Non-Intrusive Parametric Query
Optimizationfor Nonlinear Cost Functions
30
Features of AniPQO
  • Works with arbitrary nonlinear and discontinuous
    cost functions
  • Experimental evaluation suggests that it works
    well for standard cost models for relational
    operators
  • Conceptually works for arbitrary number of
    parameters
  • Experimental evaluation suggests that it is
    practical for up to 4 parameters
  • Is minimally-intrusive

31
Assumptions
  • The parameter space of interest is a closed
    convex polytope
  • Parameter space polytope
  • The (cost estimation component of the)
    conventional optimizer is extended so as to
    return the cost of a given plan at a given point
    in the parameter space

32
Algorithm to find POSP
  • Starts with
  • CSOP Ø / current set of optimal plans /
  • Decomposition parameter space polytope
  • At each iteration
  • A non-optimized vertex of the current
    decomposition is optimized
  • If optimization returns a new plan
  • The new plan inserted in CSOP
  • Parameter space decomposed is modified based on
    new CSOP
  • Parameter space decomposition is the partitioning
    of parameter space into regions s.t. for each
    region a plan from CSOP is optimal throughout the
    region
  • AniPOSP CSOP / AniPOSP ? POSP /

33
Algorithm Iterations
b
a
c
34
Algorithm Iterations
b
a
C
c
35
Algorithm Iterations
b
a
A
B
C
C
D
d
c
36
Algorithm Iterations
b
a
A
B
C
C
D
d
c
37
Algorithm to find POSP
  • For linear cost functions
  • The above algorithm is exact and finds the
    complete POSP
  • For nonlinear cost functions
  • May not find all plans in POSP
  • We use it as a heuristic

38
AniPQO contribution
  • Handling parameter space decomposition with
    nonlinear cost functions
  • Optimality threshold (t)
  • AND-OR-DAG representation of the plan
    alternatives
  • Reduces run-time overhead of plan choice
  • Increases the quality of the heuristic solution
  • Integrating it with the conventional optimizer
  • faster optimization

39
Optimality threshold (t)
  • If the cost of a plan at a point is close to
    optimal, we can treat the plan as optimal at the
    point
  • We modify the algorithm as follows
  • If optimizing a vertex returns a new plan
  • if cost of no plan in CSOP is within a factor t
    of the cost of the new plan at the point
  • Parameter space polytope is decomposed afresh
    based on new CSOP
  • The new plan is inserted in CSOP
  • This brings down calls to the conventional
    optimizer

40
Maintaining decomposition
  • Hard to find exact decomposition
  • We approximate non-linear regions to convex
    polytopes
  • Maintain only edge skeleton of decomposition
  • Edge skeleton consists of vertices and lines
    connecting the vertices
  • Disregard higher dimensional faces
  • The edge skeleton induced by a set of plans, say
    CSOP, can be constructed given
  • Decomposition vertices
  • Vertex tagging
  • The subset of plans (in CSOP) that are optimal at
    the vertex

41
Vertex tagging
42
Vertex tagging ? Edge skeleton
43
Updating edge skeleton
  • When a new plan is added to CSOP, we need to
    carve out region for the new plan it involves
  • Identifying conflicting edges
  • Edges in the decomposition, such that the new
    plan is optimal at one end and sub-optimal at the
    other end
  • Finding new decomposition vertices
  • One-to-one correspondence between the conflicting
    edges and the new vertices in the parameter space
    decomposition
  • Finding equi-cost point for a given set of plans

44
Conflicting edge ? New vertex
ae
a
e
A
ab
E
abf
abe
bde
B
de
bcd
D
bc
C
c
d
cd
45
Storing Plans in DAGExample Two separate plans
EqClass3
EqClass3
EqClass2
EqClass1
EqClass2
EqClass1
S12
S11
S22
S21
Plan 1
Plan 2
Operator
Sub-plans
46
Storing Plans in DAG Example Combined DAG
EqClass3
EqClass2
EqClass1
S12
S11
S22
S21
Operator
Sub-plans
47
Advantages of storing AniPOSP in DAG
  • Reduced effort in picking a plan at run-time
  • Two plans sharing a operator/subplan
  • Choosing a plan not in AniPOSP
  • Combining parts of different plans in AniPOSP
    result in a valid plan

48
Conventional Optimizer
  • Volcano based optimizer developed at IIT Bombay
  • Generates bushy plans
  • Standard techniques for estimating cost using
    statistics about relations
  • I/O components
  • CPU components
  • Extended to return the cost of a plan at a given
    point in the parameter space

49
Database and Queries
  • TPCD database with scale factor 1 (1GB)
  • With and without indices on the primary keys of
    the relations involved
  • Parameters are selectivities of range predicates
  • table.column lt parameter_value
  • SPJ queries tested
  • Involved 2 to 5 relations
  • Involved 2 to 4 parameters
  • Query R4P4 SPJ query with 4 relations and 4
    parameterized range predicates

50
Results for query R4P4 with no indices
  • POSP 134
  • Sum of operators across the plans 1816
  • operators in the DAG 85
  • AniPOSP 49
  • DAG-AniPOSP POSP 87
  • where DAG-AniPOSP is the set of plans in the DAG
    built using AniPOSP

U
51
Result quality for query R4P4with no indices
Plans
U
Maximum Degradation ()
U
52
Optimization Overhead for query R4P4 with no
indices
53
Integration with the optimizer
  • Loose integration
  • Separate invocation of the optimizer for each
    parameter value
  • Tight integration
  • Optimizer equivalence rules are applied only once
  • Resultant DAG of equivalent plans is used
    repeatedly to find optimal plan with different
    parameter values

54
Execution time for query R4P4
  • A single invocation of the underlying optimizer
    takes about 16ms
  • AniPQO time (ms)

55
Related Work
  • Graefe and Karen SIGMOD'89 Cole and Graefe
    SIGMOD'94 Ioannidis, Ng, Shim and Sellis
    VLDB'92
  • Ganguly and Krishnamurthy COMAD'94 Sumit
    Ganguly VLDB'98 Sumit Ganguly Personal
    Communication, 01
  • Betawadkar IITK97 Prasad IITK97 Rao
    IITK97
  • Ghosh et. al. VLDB02

56
Conclusion
  • PQO for linear cost functions
  • Simple and minimally intrusive
  • Works for arbitrary number of parameters
  • PQO for piecewise linear cost functions
  • Intrusive
  • Works for arbitrary number of parameters
  • General since nonlinear and discontinuous cost
    functions can be approximated to piecewise linear
    form

57
Conclusion (contd.)
  • AniPQO
  • Works with arbitrary nonlinear and discontinuous
    cost functions
  • Conceptually works for arbitrary number of
    parameters
  • Is minimally-intrusive
  • Uses AND-OR-DAG representation of the plan
    alternatives

58
Questions?
http//www.cse.iitb.ac.in/aru
59
When heuristic fails
p
p
parameter
60
Backup Slides
61
Query R4P4
  • select partsupp.ps_suppkey
  • from partsupp, supplier, nation, region
  • where partsupp.ps_suppkey supplier.s_supp
    key
  • and supplier.s_nationkey nation.n_nationk
    ey
  • and nation.n_regionkey region.r_regionkey
  • and ps_partkey lt 1
  • and s_suppkey lt 2
  • and n_nationkey lt 3
  • and region.r_regionkey lt 4

62
Memory Cognizant Query OptimizationAn overview
63
Why Memory Cognizant Optimizer?
  • Memory intensive operators
  • Typical optimizers assume all the memory to be
    available to each operator in the query tree
    Wrong assumption

64
Why Memory Cognizant Optimizer? (contd)
  • Memory will get divided amongst all the operators
    running simultaneously in a pipeline
  • Cost of a plan depends upon this memory division

65
Contributions
  • We develop efficient techniques to divide the
    available memory optimally among operators in a
    pipeline
  • We show how to make a cost-based decision of
    breaking (i.e. converting) a pipelined edge into
    a blocking edge
  • We develop practical memory cognizant query
    optimization algorithm
Write a Comment
User Comments (0)
About PowerShow.com