Title: Memory Cognizant Query Optimization
1Memory Cognizant Query Optimization
- Arvind Hulgeri
- S. Seshadri
- S. Sudarshan
- Department of Computer Science and Engineering
- I.I.T. Bombay
2Outline
- Why Memory Cognizant Optimizer?
- Contributions
- 2-Phase versus 1-Phase Optimization
- Related Work
- Details of 1-Phase Approach
- Experimental Evaluation
- Conclusion and Future Work
3Why Memory Cognizant Optimizer?
- Memory intensive operators
- Typical optimizers assume all the memory to be
available to each operator in the query tree
Wrong assumption
4Why Memory Cognizant Optimizer? (contd)
- Memory will get divided amongst all the operators
running simultaneously in a pipeline - Cost of a plan depends upon this memory division
5Contributions
- We develop efficient techniques to divide the
available memory optimally among operators in a
pipeline - We show how to make a cost-based decision of
breaking (i.e. converting) a pipelined edge into
a blocking edge - We develop practical memory cognizant query
optimization algorithm
62-Phase versus 1-Phase Approach
- 2-Phase Approach (2PO)
- Phase 1 Optimize the query using conventional
optimizer - Phase 2 Distribute available memory optimally
amongst the operators in a pipeline - Problem
- May not give optimal execution time as the plan
generated may itself be sub-optimal for the given
memory
72-Phase versus 1-Phase Approach (contd)
- 1-Phase Approach (1PO)
- We propose to modify the traditional optimizer to
make it memory cognizant - Plan chosen takes into account the division of
memory amongst operators
8Related Work
- Two Phase Approach
- Phase 1 Optimize the query to get an optimal
query plan - Phase 2 Schedule the query plan generated by
phase 1 - Static Scheduling
- Bouganim,et.al., CIKM'98, Nag, DeWitt,
CIKM'98 - Dynamic Scheduling Bouganim,et.al., CIKM'98
91-Phase Approach
- We propose to modify the traditional optimizer to
make it memory cognizant - Plan chosen takes into account the division of
memory amongst operators - We show how to optimize the query given the cost
versus memory allocated function for each operator
10Overview of 1PO
- Optimal Division of Available Memory
- Approximating Cost Functions
- Extended Cost Function and Plan
- Extensions to Query Optimizer
- Breaking Pipelined Edges
11Optimal Division of Available Memory
- All the operators in a pipeline run
simultaneously in given space - The cost of running such a pipeline depends upon
how much memory each operator in the pipeline gets
12Optimal Division of Available Memory (contd)
- Need to optimally divide the available memory
amongst all the operators in the pipeline - Function OptMerge divides memory between two
operators/plans in a pipeline and generates
combined cost function
13function OptMergefor general form of cost
function
- for m 1 to MaxAvailMem do
- mergeCost(m) cost_\infty
- for mx 0 to m do
- my m - mx
- costxy costx(mx) costy(my)
- if costxy lt mergeCost(m)
- mergeCost(m) costxy
- return mergeCost
14Approximating Cost Functions
- General Form of Cost Function
- Time complexity of various operations on cost
function depends on available memory. e.g. - OptMerge O(m2)
- MinMerge O(m)
- The routine MinMerge compares two input cost
functions for the entire memory range and at each
memory point picks up the lower cost value
15Approximating Cost Functions (contd)
- Piecewise Linear Cost Function
- Time complexity is independent of available
memory, - And is a function of number of segments in the
cost functions. e.g. - OptMerge O((sum of segemts) 2) ltlt change
- MinMerge O(sum of segemts)
16function OptMerge for Piecewise Linear cost
Functions
- mergeCost infinity
- for each change-over point (m,c) in costx do
- costy' costy shifted by (m,c)
- mergeCost MinMerge(mergeCost, costy')
- for each change-over point (m,c) in costy do
- costx costx shifted by (m,c)
- mergeCost MinMerge(mergeCost, costx)
- return mergeCost
17Extended Cost Function
- Define cost as a function of memory allocated
instead of a single cost value for each operator
and plan
18Extended Plan
- For a given query no single plan may be optimal
in entire memory range - Define optimal Plan as list of pairs
- m1, m2, optPlan
- where m1, m2 is a range of memory and optPlan
is the optimal plan in that range
19Extensions to Query Optimizer
- Comparing alternative operators or plans
- One plan is consistently better than the other in
the entire memory range retain the one plan with
less cost - One plan is better at some memory range and the
other one is better at some other memory range
maintain both of them indicating which one is
better in which range
20Breaking Pipelined Edges
- A plan P feeding an operator O through a
pipeline edge E will have a trade-off - Option I E is pipelined
- P gets less memory, costs more
- no IO incurred at E
- Option II E is broken
- (converted into a blocking edge)
- P gets full memory, costs less
- IO incurred at E
21Breaking Pipelined Edges (contd)
- Let the cost of P be PPC(i)
- Option I
- Cost of P PPC(i)
- Option II
- Cost of P BBC(i) PPC(M 1) IO
- Where IO cost of IO at E when blocked
- M total available memory
22Breaking Pipelined Edges (contd)
- The optimal cost of plan P with blocking decision
incorporated is given by MinMerge(PPC, BPC) - The routine MinMerge compares two input cost
functions for the entire memory range and at each
memory point picks up the lower cost value
23Breaking Pipelined Edges (contd)
24Experimental Evaluation
- Comparison 1PO against 2PO
- Metric Estimated cost of the optimal plan
produced by the optimizer - Tested the algorithms with around 20,000 randomly
generated queries on a TPCD-1 based star schema
25Experimental Evaluation (contd)
- Each generated query is of the form
- select sum(quantity)
- from orders, supplier, part, customer, time
- where join-list and select-list
- group by groupby-list
- Memory available is chosen randomly between 10
blocks to 10,000 blocks
26Estimated Cost Reduction
27Optimization Times
28Conclusion
- For the class of queries we considered, 1PO gives
benefits, but generally 2PO performs about as
well as 1PO - The preliminary results indicate that using 1PO
for query optimization may not be beneficial
good news!!!
29Future Work
- Parametric Query Optimization
- Memory Cognizant Query Optimization for
ORDB(Object Oriented Databases)