Reducing Order Enforcement Cost in Complex Query Plans - PowerPoint PPT Presentation

1 / 31

About This Presentation

Title:

Reducing Order Enforcement Cost in Complex Query Plans

Description:

Reducing Order Enforcement Cost in Complex Query Plans. Ravindra Guravannavar and S. Sudarshan ... benefit(o, e) = cbp(e, ?) coe (e, ?, o) cpb(e,o) ... – PowerPoint PPT presentation

Number of Views:83

Avg rating:3.0/5.0

Slides: 32

Provided by: Rav124

Category:

more less

Transcript and Presenter's Notes

Title: Reducing Order Enforcement Cost in Complex Query Plans

1
Reducing Order Enforcement Cost in Complex Query
Plans

Ravindra Guravannavar and S. Sudarshan
(To appear in ICDE 2007)

2
Background

Sort-based query processing algorithms
Sort-merge Join (also Union/Intersection)
Sort-based grouping and duplicate elimination
Explicit order by
Notion of Interesting Sort Orders (System-R)
Find and remember the best plan for each sort
order that may be useful
Optimization goal in Volcano (expr, sort-order)

3
The Problem

Interesting orders can be too many!
Factorial in number of attributes involved
Plan cost can vary substantially with the choice
of interesting order
Clustering and covering indices
Other operators in the input sub-expressions
Possibility of partial sorting

4
Motivation

Joins in data integration and decision support
involve large number of attributes
Increasing use of covering indices
Several alternative sort orders
Partial sorting
Query patterns
Attributes common to multiple operators
Known techniques
Work only for unary operators like group-by

5
Outline of the Talk

Partial sorting
Changes to external sort
Optimizer changes to handle partial sort orders
Interesting orders for a join tree A special
case
Problem is NP-Hard
A 2-approximation for the special case
The general problem
Notion of favorable orders
Plan generation using favorable orders
Post-optimization phase
Experimental results

6
Exploiting Partial Sort Orders
R.a1S.a1 and R.a2S.a2
(a1)? (a1,a2)
()? (a1,a2)
R
S
C. Index on (R.a1)

Sort on (a1, a2) given (a1)
Standard external-sort
Cost is independent of input sort order
Replacement-selection
Produces single run but incurs I/O
Both methods break the pipeline first o/p tuple
after reading all i/p

7
A Minor Change to External Sorting

Multiple partial sort segments
Hold only one segment at any given time
When a new segment starts
Sort the current segment and output
No run generation I/O if each segment fits in
memory
Early output (good for Top-K)
Reduced comparisons
O(n log n/k) Vs. O(n log n), k segments

8
Optimizer Changes to Handle Partial Sort Orders

Cost Model for Partial Sort
Let the input order be o1
Required (output) order be o2
Let osLongest common prefix between o1 and o2
Let oro2 os (i.e, os or o2)
A(o) Attribute set of order o
? Empty (no) sort order

coe(e, o1,o2) D(e, A(os)) X coe(e, ?, or),
where esp(e) and p equates A(os) to a constant.
9
Optimizer Changes to Handle Partial Sort Orders

Cost Model for Partial Sort

coe(e, o1,o2) D(e, A(os)) X coe(e, ?, or),
where esp(e) and p equates A(os) to a constant.
o2(a,c)
o1(a,b)
e
os(a), or(c), es(ak)(e)
10
Flexible Order Requirements

Most operators have interest in any order on the
attributes involved
Merge-Join, Merge-Union, Group By, Duplicate
Elimination
Binary operators demand the same order from inputs

G
a1, a2
a1,a2,a3,a4
a4,a7
a3,a5,a6
11
Finding Optimal is NP-Hard

A special case
All relations/intermediate results of the same
size
All attribute cardinalities same
We try to maximize the length of common prefixes
Maximize SLCP(pi, pj)

Reduction from graph layout problem SUM-CUT
Optimal algorithm for paths and 2-approximation
for binary trees

12
A 2-Approximation Algorithm

Optimal algorithm for paths

OPT(i,j) max OPT(i,k) OPT(k1,j) c(i,j),
i k lt j
2-Approximation for binary trees

Even levels
Odd levels
- OPT OPT-EVEN OPT-ODD - Take the one with
higher benefit
13
General Case

Logical plan space for inputs not expanded
(i.e, Join order not fixed)
Varying sizes of relations and intermediate
results
All orders on base relations do not have the same
cost (due to clustering and covering indices)

14
Overview of the Approach

Identify a small set of favorable orders
Orders that are relatively inexpensive
Should not require expanding the input plan space
Plan generation (Phase-1)
Deduce the interesting orders from the favorable
orders
Try each of the interesting order, retain the
best
Plan refinement (Phase-2)
Use the 2-approximation algorithm and refine the
sort orders further

15
Favorable Orders

Benefit of an order
benefit(o, e) cbp(e, ?) coe (e, ?, o)
cpb(e,o)
Positive benefit ?The order can be obtained at
cost
less than the full sort of unordered result
(e.g., the
clustering order)
Favorable orders
ford(e) o benefit(o,e) gt 0
Can be a huge set
E.g., Every order having the clustering order as
its prefix is a favorable order.

16
Minimal Favorable Orders

A favorable order o that satisfies
o o s.t. cbp(e, o) coe(e, o, o) cbp(e,o)
o s.t. o o and cbp(e, o) cbp(e,o)
E.g., Relation R with clustering index on (a1,a2)
(a1,a2) is a minimal favorable order
(a1 ), (a1,a2,a3) are not
ford-min(e) Set of all minimal favorable orders
for expression e
For base relations size of ford-min limited to
the number of covering indices

17
Computing Favorable Orders Issues

Defined in terms of cost of best plan
Need them before optimizing input sub-expressions
Even ford-min can get prohibitively large for
join, group-by expressions

J1
ford-min contains every permutation of the join
attributes
J2
S
R
18
Heuristics for Computing ford-min

eR o o is clustering or covering index
order
esp(e1) o o ford-min(e1)
ePL(e1) o o ford-min(e1) and oo L
Pa,b(e1), ford-min(e1)(a,c,b) ?
ford-min(e)(a)
ee1 e2
Let Tford-min(e1) U ford-min(e2)
T U o o T and o((o S) permute(S
A(o S)))

19
Heuristics for Computing ford-min
Sa,b,c,d
T (a,b,e), (b), (a)
ford-min(a,b,e),(b)
ford-min(a)
20
Plan Generation (Phase-1)

Form the set I of interesting orders to try
Collect input favorable orders and rqd. o/p order
Take LCP with the set of join attributes
Extend the orders (arbitrarily) to include
remaining attributes
For each order o in I, generate optimization
sub-goals for input sub-expressions

21
Plan Refinement (Phase-2)
a,e,h
(a,b,c,h)
(a,h,b,c)
a,e,h
a,d,h
(a,h,e)
(a,d,h)
(a,e,h)
(a,h,d)
R4 (a)
R3 (a)
R2 (a)
R1 (a)

Identify the suffix that can be freely reordered
Use the 2-approximation algorithm to reorder the
suffix

22
Experiments

Benefits of exploiting partial sort orders
Evaluate the plans produced by our optimizer
extensions

23
Experiment 1

SELECT suppkey, partkey FROM lineitem
ORDER BY suppkey, partkey

(suppkey) ? (suppkey, partkey)
24
Experiment 2

R(c1,c2,c3), 10 M records, (c1)?(c1,c2),
card(c1)10,000

25
Experiment 3
26
Experiment 4

Parts running out of stock

SELECT ps_suppkey, ps_partkey, ps_availqty,
sum(l_quantity) AS total_required
FROM partsupp, lineitem
WHERE ps_suppkeyl_suppkey AND
ps_partkeyl_partkey
AND l_linestatus'O'
GROUP BY ps_partkey, ps_suppkey, ps_availqty,
HAVING sum(l_quantity) gt ps_availqty
ORDER BY ps_partkey

27
Experiment 4 - Plans
Merge-Join Plan on SYS1 and SYS2
Plan Generated by PYRO-O
28
Experiment 4 5 - Timings
29
Experiments with Variants of PYRO
PYRO Baseline PYRO PYRO-O- No partial
sort PYRO-P Postgres Heuristic PYRO-O Our
Approach PYRO-E Exhaustive
30
Optimization Overheads
31
Questions?

Write a Comment

User Comments (0)