Title: Reducing Order Enforcement Cost in Complex Query Plans
1Reducing Order Enforcement Cost in Complex Query
Plans
- Ravindra Guravannavar and S. Sudarshan
- (To appear in ICDE 2007)
2Background
- Sort-based query processing algorithms
- Sort-merge Join (also Union/Intersection)
- Sort-based grouping and duplicate elimination
- Explicit order by
- Notion of Interesting Sort Orders (System-R)
- Find and remember the best plan for each sort
order that may be useful - Optimization goal in Volcano (expr, sort-order)
3The Problem
- Interesting orders can be too many!
- Factorial in number of attributes involved
- Plan cost can vary substantially with the choice
of interesting order - Clustering and covering indices
- Other operators in the input sub-expressions
- Possibility of partial sorting
4Motivation
- Joins in data integration and decision support
involve large number of attributes - Increasing use of covering indices
- Several alternative sort orders
- Partial sorting
- Query patterns
- Attributes common to multiple operators
- Known techniques
- Work only for unary operators like group-by
5Outline of the Talk
- Partial sorting
- Changes to external sort
- Optimizer changes to handle partial sort orders
- Interesting orders for a join tree A special
case - Problem is NP-Hard
- A 2-approximation for the special case
- The general problem
- Notion of favorable orders
- Plan generation using favorable orders
- Post-optimization phase
- Experimental results
6Exploiting Partial Sort Orders
R.a1S.a1 and R.a2S.a2
(a1)? (a1,a2)
()? (a1,a2)
R
S
C. Index on (R.a1)
- Sort on (a1, a2) given (a1)
- Standard external-sort
- Cost is independent of input sort order
- Replacement-selection
- Produces single run but incurs I/O
- Both methods break the pipeline first o/p tuple
after reading all i/p
7A Minor Change to External Sorting
- Multiple partial sort segments
- Hold only one segment at any given time
- When a new segment starts
- Sort the current segment and output
- No run generation I/O if each segment fits in
memory - Early output (good for Top-K)
- Reduced comparisons
- O(n log n/k) Vs. O(n log n), k segments
8Optimizer Changes to Handle Partial Sort Orders
- Cost Model for Partial Sort
- Let the input order be o1
- Required (output) order be o2
- Let osLongest common prefix between o1 and o2
- Let oro2 os (i.e, os or o2)
- A(o) Attribute set of order o
- ? Empty (no) sort order
coe(e, o1,o2) D(e, A(os)) X coe(e, ?, or),
where esp(e) and p equates A(os) to a constant.
9Optimizer Changes to Handle Partial Sort Orders
- Cost Model for Partial Sort
coe(e, o1,o2) D(e, A(os)) X coe(e, ?, or),
where esp(e) and p equates A(os) to a constant.
o2(a,c)
o1(a,b)
e
os(a), or(c), es(ak)(e)
10Flexible Order Requirements
- Most operators have interest in any order on the
attributes involved - Merge-Join, Merge-Union, Group By, Duplicate
Elimination - Binary operators demand the same order from inputs
G
a1, a2
a1,a2,a3,a4
a4,a7
a3,a5,a6
11Finding Optimal is NP-Hard
- A special case
- All relations/intermediate results of the same
size - All attribute cardinalities same
- We try to maximize the length of common prefixes
- Maximize SLCP(pi, pj)
- Reduction from graph layout problem SUM-CUT
- Optimal algorithm for paths and 2-approximation
for binary trees
12A 2-Approximation Algorithm
- Optimal algorithm for paths
- OPT(i,j) max OPT(i,k) OPT(k1,j) c(i,j),
i k lt j - 2-Approximation for binary trees
Even levels
Odd levels
- OPT OPT-EVEN OPT-ODD - Take the one with
higher benefit
13General Case
- Logical plan space for inputs not expanded
- (i.e, Join order not fixed)
- Varying sizes of relations and intermediate
results - All orders on base relations do not have the same
cost (due to clustering and covering indices)
14Overview of the Approach
- Identify a small set of favorable orders
- Orders that are relatively inexpensive
- Should not require expanding the input plan space
- Plan generation (Phase-1)
- Deduce the interesting orders from the favorable
orders - Try each of the interesting order, retain the
best - Plan refinement (Phase-2)
- Use the 2-approximation algorithm and refine the
sort orders further
15Favorable Orders
- Benefit of an order
- benefit(o, e) cbp(e, ?) coe (e, ?, o)
cpb(e,o) - Positive benefit ?The order can be obtained at
cost - less than the full sort of unordered result
(e.g., the - clustering order)
- Favorable orders
- ford(e) o benefit(o,e) gt 0
- Can be a huge set
- E.g., Every order having the clustering order as
its prefix is a favorable order.
16Minimal Favorable Orders
- A favorable order o that satisfies
- o o s.t. cbp(e, o) coe(e, o, o) cbp(e,o)
- o s.t. o o and cbp(e, o) cbp(e,o)
- E.g., Relation R with clustering index on (a1,a2)
- (a1,a2) is a minimal favorable order
- (a1 ), (a1,a2,a3) are not
- ford-min(e) Set of all minimal favorable orders
for expression e - For base relations size of ford-min limited to
the number of covering indices
17Computing Favorable Orders Issues
- Defined in terms of cost of best plan
- Need them before optimizing input sub-expressions
- Even ford-min can get prohibitively large for
join, group-by expressions
J1
ford-min contains every permutation of the join
attributes
J2
S
R
18Heuristics for Computing ford-min
- eR o o is clustering or covering index
order - esp(e1) o o ford-min(e1)
- ePL(e1) o o ford-min(e1) and oo L
- Pa,b(e1), ford-min(e1)(a,c,b) ?
ford-min(e)(a) - ee1 e2
- Let Tford-min(e1) U ford-min(e2)
- T U o o T and o((o S) permute(S
A(o S)))
19Heuristics for Computing ford-min
Sa,b,c,d
T (a,b,e), (b), (a)
ford-min(a,b,e),(b)
ford-min(a)
20Plan Generation (Phase-1)
- Form the set I of interesting orders to try
- Collect input favorable orders and rqd. o/p order
- Take LCP with the set of join attributes
- Extend the orders (arbitrarily) to include
remaining attributes - For each order o in I, generate optimization
sub-goals for input sub-expressions
21Plan Refinement (Phase-2)
a,e,h
(a,b,c,h)
(a,h,b,c)
a,e,h
a,d,h
(a,h,e)
(a,d,h)
(a,e,h)
(a,h,d)
R4 (a)
R3 (a)
R2 (a)
R1 (a)
- Identify the suffix that can be freely reordered
- Use the 2-approximation algorithm to reorder the
suffix
22Experiments
- Benefits of exploiting partial sort orders
- Evaluate the plans produced by our optimizer
extensions
23Experiment 1
- SELECT suppkey, partkey FROM lineitem
- ORDER BY suppkey, partkey
(suppkey) ? (suppkey, partkey)
24Experiment 2
- R(c1,c2,c3), 10 M records, (c1)?(c1,c2),
card(c1)10,000
25Experiment 3
26Experiment 4
- Parts running out of stock
- SELECT ps_suppkey, ps_partkey, ps_availqty,
- sum(l_quantity) AS total_required
- FROM partsupp, lineitem
- WHERE ps_suppkeyl_suppkey AND
ps_partkeyl_partkey - AND l_linestatus'O'
- GROUP BY ps_partkey, ps_suppkey, ps_availqty,
- HAVING sum(l_quantity) gt ps_availqty
- ORDER BY ps_partkey
27Experiment 4 - Plans
Merge-Join Plan on SYS1 and SYS2
Plan Generated by PYRO-O
28Experiment 4 5 - Timings
29Experiments with Variants of PYRO
PYRO Baseline PYRO PYRO-O- No partial
sort PYRO-P Postgres Heuristic PYRO-O Our
Approach PYRO-E Exhaustive
30Optimization Overheads
31Questions?