Reducing Order Enforcement Cost in Complex Query Plans - PowerPoint PPT Presentation

About This Presentation
Title:

Reducing Order Enforcement Cost in Complex Query Plans

Description:

Reducing Order Enforcement Cost in Complex Query Plans. Ravindra Guravannavar and S. Sudarshan ... benefit(o, e) = cbp(e, ?) coe (e, ?, o) cpb(e,o) ... – PowerPoint PPT presentation

Number of Views:83
Avg rating:3.0/5.0
Slides: 32
Provided by: Rav124
Category:

less

Transcript and Presenter's Notes

Title: Reducing Order Enforcement Cost in Complex Query Plans


1
Reducing Order Enforcement Cost in Complex Query
Plans
  • Ravindra Guravannavar and S. Sudarshan
  • (To appear in ICDE 2007)

2
Background
  • Sort-based query processing algorithms
  • Sort-merge Join (also Union/Intersection)
  • Sort-based grouping and duplicate elimination
  • Explicit order by
  • Notion of Interesting Sort Orders (System-R)
  • Find and remember the best plan for each sort
    order that may be useful
  • Optimization goal in Volcano (expr, sort-order)

3
The Problem
  • Interesting orders can be too many!
  • Factorial in number of attributes involved
  • Plan cost can vary substantially with the choice
    of interesting order
  • Clustering and covering indices
  • Other operators in the input sub-expressions
  • Possibility of partial sorting

4
Motivation
  • Joins in data integration and decision support
    involve large number of attributes
  • Increasing use of covering indices
  • Several alternative sort orders
  • Partial sorting
  • Query patterns
  • Attributes common to multiple operators
  • Known techniques
  • Work only for unary operators like group-by

5
Outline of the Talk
  • Partial sorting
  • Changes to external sort
  • Optimizer changes to handle partial sort orders
  • Interesting orders for a join tree A special
    case
  • Problem is NP-Hard
  • A 2-approximation for the special case
  • The general problem
  • Notion of favorable orders
  • Plan generation using favorable orders
  • Post-optimization phase
  • Experimental results

6
Exploiting Partial Sort Orders
R.a1S.a1 and R.a2S.a2
(a1)? (a1,a2)
()? (a1,a2)
R
S
C. Index on (R.a1)
  • Sort on (a1, a2) given (a1)
  • Standard external-sort
  • Cost is independent of input sort order
  • Replacement-selection
  • Produces single run but incurs I/O
  • Both methods break the pipeline first o/p tuple
    after reading all i/p

7
A Minor Change to External Sorting
  • Multiple partial sort segments
  • Hold only one segment at any given time
  • When a new segment starts
  • Sort the current segment and output
  • No run generation I/O if each segment fits in
    memory
  • Early output (good for Top-K)
  • Reduced comparisons
  • O(n log n/k) Vs. O(n log n), k segments

8
Optimizer Changes to Handle Partial Sort Orders
  • Cost Model for Partial Sort
  • Let the input order be o1
  • Required (output) order be o2
  • Let osLongest common prefix between o1 and o2
  • Let oro2 os (i.e, os or o2)
  • A(o) Attribute set of order o
  • ? Empty (no) sort order

coe(e, o1,o2) D(e, A(os)) X coe(e, ?, or),
where esp(e) and p equates A(os) to a constant.
9
Optimizer Changes to Handle Partial Sort Orders
  • Cost Model for Partial Sort

coe(e, o1,o2) D(e, A(os)) X coe(e, ?, or),
where esp(e) and p equates A(os) to a constant.
o2(a,c)
o1(a,b)
e
os(a), or(c), es(ak)(e)
10
Flexible Order Requirements
  • Most operators have interest in any order on the
    attributes involved
  • Merge-Join, Merge-Union, Group By, Duplicate
    Elimination
  • Binary operators demand the same order from inputs

G
a1, a2
a1,a2,a3,a4
a4,a7
a3,a5,a6
11
Finding Optimal is NP-Hard
  • A special case
  • All relations/intermediate results of the same
    size
  • All attribute cardinalities same
  • We try to maximize the length of common prefixes
  • Maximize SLCP(pi, pj)
  • Reduction from graph layout problem SUM-CUT
  • Optimal algorithm for paths and 2-approximation
    for binary trees

12
A 2-Approximation Algorithm
  • Optimal algorithm for paths
  • OPT(i,j) max OPT(i,k) OPT(k1,j) c(i,j),
    i k lt j
  • 2-Approximation for binary trees

Even levels
Odd levels
- OPT OPT-EVEN OPT-ODD - Take the one with
higher benefit
13
General Case
  • Logical plan space for inputs not expanded
  • (i.e, Join order not fixed)
  • Varying sizes of relations and intermediate
    results
  • All orders on base relations do not have the same
    cost (due to clustering and covering indices)

14
Overview of the Approach
  • Identify a small set of favorable orders
  • Orders that are relatively inexpensive
  • Should not require expanding the input plan space
  • Plan generation (Phase-1)
  • Deduce the interesting orders from the favorable
    orders
  • Try each of the interesting order, retain the
    best
  • Plan refinement (Phase-2)
  • Use the 2-approximation algorithm and refine the
    sort orders further

15
Favorable Orders
  • Benefit of an order
  • benefit(o, e) cbp(e, ?) coe (e, ?, o)
    cpb(e,o)
  • Positive benefit ?The order can be obtained at
    cost
  • less than the full sort of unordered result
    (e.g., the
  • clustering order)
  • Favorable orders
  • ford(e) o benefit(o,e) gt 0
  • Can be a huge set
  • E.g., Every order having the clustering order as
    its prefix is a favorable order.

16
Minimal Favorable Orders
  • A favorable order o that satisfies
  • o o s.t. cbp(e, o) coe(e, o, o) cbp(e,o)
  • o s.t. o o and cbp(e, o) cbp(e,o)
  • E.g., Relation R with clustering index on (a1,a2)
  • (a1,a2) is a minimal favorable order
  • (a1 ), (a1,a2,a3) are not
  • ford-min(e) Set of all minimal favorable orders
    for expression e
  • For base relations size of ford-min limited to
    the number of covering indices

17
Computing Favorable Orders Issues
  • Defined in terms of cost of best plan
  • Need them before optimizing input sub-expressions
  • Even ford-min can get prohibitively large for
    join, group-by expressions

J1
ford-min contains every permutation of the join
attributes
J2
S
R
18
Heuristics for Computing ford-min
  • eR o o is clustering or covering index
    order
  • esp(e1) o o ford-min(e1)
  • ePL(e1) o o ford-min(e1) and oo L
  • Pa,b(e1), ford-min(e1)(a,c,b) ?
    ford-min(e)(a)
  • ee1 e2
  • Let Tford-min(e1) U ford-min(e2)
  • T U o o T and o((o S) permute(S
    A(o S)))

19
Heuristics for Computing ford-min
Sa,b,c,d
T (a,b,e), (b), (a)
ford-min(a,b,e),(b)
ford-min(a)
20
Plan Generation (Phase-1)
  • Form the set I of interesting orders to try
  • Collect input favorable orders and rqd. o/p order
  • Take LCP with the set of join attributes
  • Extend the orders (arbitrarily) to include
    remaining attributes
  • For each order o in I, generate optimization
    sub-goals for input sub-expressions

21
Plan Refinement (Phase-2)
a,e,h
(a,b,c,h)
(a,h,b,c)
a,e,h
a,d,h
(a,h,e)
(a,d,h)
(a,e,h)
(a,h,d)
R4 (a)
R3 (a)
R2 (a)
R1 (a)
  • Identify the suffix that can be freely reordered
  • Use the 2-approximation algorithm to reorder the
    suffix

22
Experiments
  • Benefits of exploiting partial sort orders
  • Evaluate the plans produced by our optimizer
    extensions

23
Experiment 1
  • SELECT suppkey, partkey FROM lineitem
  • ORDER BY suppkey, partkey

(suppkey) ? (suppkey, partkey)
24
Experiment 2
  • R(c1,c2,c3), 10 M records, (c1)?(c1,c2),
    card(c1)10,000

25
Experiment 3
26
Experiment 4
  • Parts running out of stock
  • SELECT ps_suppkey, ps_partkey, ps_availqty,
  • sum(l_quantity) AS total_required
  • FROM partsupp, lineitem
  • WHERE ps_suppkeyl_suppkey AND
    ps_partkeyl_partkey
  • AND l_linestatus'O'
  • GROUP BY ps_partkey, ps_suppkey, ps_availqty,
  • HAVING sum(l_quantity) gt ps_availqty
  • ORDER BY ps_partkey

27
Experiment 4 - Plans
Merge-Join Plan on SYS1 and SYS2
Plan Generated by PYRO-O
28
Experiment 4 5 - Timings
29
Experiments with Variants of PYRO
PYRO Baseline PYRO PYRO-O- No partial
sort PYRO-P Postgres Heuristic PYRO-O Our
Approach PYRO-E Exhaustive
30
Optimization Overheads
31
Questions?
Write a Comment
User Comments (0)
About PowerShow.com