Title: Query Execution (end) and Query Optimization
1 Query Execution (end) and Query Optimization
2 Query Execution
3 Outline
- Logical/physical operators
- Cost parameters and sorting
- One-pass algorithms
- Nested-loop joins
- Two-pass algorithms
- Index-based algorithms
- Query optimization
4 Query Execution
(Architecture diagram) The user/application issues a query or update to the query compiler, which produces a query execution plan for the execution engine. The execution engine issues record and index requests to the index/record manager, which sends page commands to the buffer manager, which reads and writes pages through the storage manager to storage.
5 Logical vs. Physical Operators
- Logical operators
  - what they do
  - e.g., union, selection, projection, join, grouping
- Physical operators
  - how they do it
  - e.g., nested loop join, sort-merge join, hash join, index join
6 Query Execution Plans
SELECT P.buyer
FROM Purchase P, Person Q
WHERE P.buyer = Q.name AND Q.city = 'urbana' AND Q.phone > '5430000'
- Query plan:
  - logical tree
  - implementation choice at every node
  - scheduling of operations
(Example physical plan, read bottom-up: Person (table scan) ⋈_{buyer=name} Purchase (index scan) using simple nested loops, then σ_{city='urbana' AND phone>'5430000'}, then π_{buyer}.)
Some operators are from relational algebra, and
others (e.g., scan, group) are not.
7 How Do We Combine Operations?
- The iterator model. Each operator is implemented by 3 functions (see the sketch below):
  - Open: sets up the data structures and performs initializations
  - GetNext: returns the next tuple of the result
  - Close: ends the operation and cleans up the data structures
- Enables pipelining!
- Contrast with the data-driven (materialization) model.
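A minimal Python sketch of the Open/GetNext/Close iterator interface, assuming in-memory lists of tuples stand in for stored relations; the class names and toy relation are illustrative, not from the slides.

```python
# Sketch of the iterator model (names and data are hypothetical).

class Scan:
    """Leaf operator: streams the tuples of a stored relation."""
    def __init__(self, tuples):
        self.tuples = tuples

    def open(self):
        self.pos = 0                      # Open: initialize state

    def get_next(self):
        if self.pos >= len(self.tuples):
            return None                   # end of stream
        t = self.tuples[self.pos]
        self.pos += 1
        return t                          # GetNext: one tuple at a time

    def close(self):
        self.pos = 0                      # Close: clean up


class Select:
    """Selection: pulls from its child and keeps tuples matching the predicate."""
    def __init__(self, child, predicate):
        self.child, self.predicate = child, predicate

    def open(self):
        self.child.open()

    def get_next(self):
        while True:
            t = self.child.get_next()
            if t is None or self.predicate(t):
                return t                  # pipelined: no intermediate materialization

    def close(self):
        self.child.close()


# Usage: σ_{price>100}(Product) over a toy relation.
plan = Select(Scan([("gizmo", 50), ("gadget", 150)]), lambda t: t[1] > 100)
plan.open()
while (t := plan.get_next()) is not None:
    print(t)
plan.close()
```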
8 Cost Parameters
- Cost parameters
  - M = number of blocks that fit in main memory
  - B(R) = number of blocks holding R
  - T(R) = number of tuples in R
  - V(R,a) = number of distinct values of the attribute a
- Estimating the cost
  - Important in optimization (next lecture)
  - Compute I/O cost only
  - We compute the cost to read the tables
  - We don't compute the cost to write the result (because of pipelining)
9 Reminder: Sorting
- Two-pass multi-way merge sort (sketched below)
- Step 1
  - Read M blocks at a time, sort, write
  - Result: sorted runs of length M on disk
- Step 2
  - Merge M−1 runs at a time, write to disk
  - Result: runs of length M(M−1) ≈ M²
- Cost: 3B(R); assumption: B(R) ≤ M²
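A minimal Python sketch of two-pass multi-way merge sort, assuming the relation is given as a list of blocks (each block a list of tuples) and that M blocks fit in memory; the function name and toy data are hypothetical.

```python
import heapq

def two_pass_merge_sort(blocks, M, key=lambda t: t):
    # Pass 1: read M blocks at a time, sort in memory, "write" a sorted run.
    runs = []
    for i in range(0, len(blocks), M):
        chunk = [t for b in blocks[i:i + M] for t in b]
        runs.append(sorted(chunk, key=key))

    # Pass 2: merge the runs; with B(R) <= M^2 there are at most M-1 runs,
    # so a single merge pass (one input buffer per run) suffices.
    assert len(runs) <= M - 1, "assumption B(R) <= M^2 violated"
    return list(heapq.merge(*runs, key=key))

# Usage with hypothetical blocks of 2 tuples each and M = 3.
blocks = [[5, 1], [9, 3], [7, 2], [8, 4]]
print(two_pass_merge_sort(blocks, M=3))   # [1, 2, 3, 4, 5, 7, 8, 9]
```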
10 Scanning Tables
- The table is clustered (i.e., its blocks consist only of records from this table)
  - Table scan: if we know where the blocks are
  - Index scan: if we have a sparse index to find the blocks
- The table is unclustered (e.g., its records are placed on blocks shared with other tables)
  - May need one read for each record
11 Cost of the Scan Operator
- Clustered relation
  - Table scan: B(R); to sort: 3B(R)
  - Index scan: B(R); to sort: B(R) or 3B(R)
- Unclustered relation
  - T(R); to sort: T(R) + 2B(R)
12 One-Pass Algorithms
13 One-Pass Algorithms
- Selection σ(R), projection π(R)
  - Both are tuple-at-a-time algorithms
  - Cost: B(R)
(Diagram: input buffer → unary operator → output buffer)
14 One-Pass Algorithms
- Duplicate elimination δ(R) (sketched below)
  - Need to keep a dictionary in memory
    - balanced search tree
    - hash table
    - etc.
  - Cost: B(R)
  - Assumption: B(δ(R)) ≤ M
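A minimal Python sketch of one-pass duplicate elimination, assuming the distinct tuples (B(δ(R)) blocks) fit in memory; the hash set plays the role of the in-memory dictionary, and the data is hypothetical.

```python
def one_pass_dedup(tuples):
    seen = set()                 # in-memory dictionary (here a hash set)
    for t in tuples:
        if t not in seen:        # first occurrence: remember it and emit it
            seen.add(t)
            yield t

# Usage with a hypothetical relation.
print(list(one_pass_dedup([1, 2, 1, 3, 2])))   # [1, 2, 3]
```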
15 One-Pass Algorithms
- Grouping γ_{city, sum(price)}(R) (sketched below)
  - Need to keep a dictionary in memory
  - Also store the running sum(price) for each city
  - Cost: B(R)
  - Assumption: the number of cities fits in memory
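A minimal Python sketch of one-pass grouping for γ_{city, sum(price)}, assuming one dictionary entry per city fits in memory and that tuples are (city, price) pairs; names and data are hypothetical.

```python
from collections import defaultdict

def one_pass_group_sum(tuples):
    sums = defaultdict(int)      # in-memory dictionary keyed by the grouping attribute
    for city, price in tuples:
        sums[city] += price      # maintain the running sum(price) per city
    return dict(sums)

# Usage with hypothetical tuples.
print(one_pass_group_sum([("Urbana", 10), ("Chicago", 5), ("Urbana", 7)]))
# {'Urbana': 17, 'Chicago': 5}
```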
16 One-Pass Algorithms
- Binary operations: R ∩ S, R ∪ S, R − S
  - Assumption: min(B(R), B(S)) ≤ M
  - Scan one table first, then the next, eliminating duplicates
  - Cost: B(R) + B(S)
17 Nested-Loop Joins
18 Nested Loop Joins
- Tuple-based nested loop join R ⋈ S
  - for each tuple r in R do
    - for each tuple s in S do
      - if r and s join then output (r,s)
  - Cost: T(R)·T(S), sometimes T(R)·B(S)
19 Nested Loop Joins
- Block-based nested loop join (sketched below)
  - for each chunk of (M−1) blocks bs of S do
    - for each block br of R do
      - for each tuple s in bs do
        - for each tuple r in br do
          - if r and s join then output (r,s)
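A minimal Python sketch of the block-based nested loop join, assuming each relation is a list of blocks (lists of tuples), that tuples join on equality of their first attribute, and that M−1 blocks of the outer relation plus one block of the inner fit in memory; all names and data are hypothetical.

```python
def block_nested_loop_join(S_blocks, R_blocks, M):
    for i in range(0, len(S_blocks), M - 1):
        # Load M-1 blocks of the (smaller) outer relation S and hash them in memory.
        chunk = {}
        for block in S_blocks[i:i + M - 1]:
            for s in block:
                chunk.setdefault(s[0], []).append(s)
        # Scan R once per chunk, one block at a time, probing the in-memory table.
        for block in R_blocks:
            for r in block:
                for s in chunk.get(r[0], []):
                    yield (r, s)

# Usage with hypothetical tiny relations and M = 3.
S = [[("a", 1), ("b", 2)], [("c", 3)]]
R = [[("a", 10)], [("b", 20), ("d", 40)]]
print(list(block_nested_loop_join(S, R, M=3)))
# [(('a', 10), ('a', 1)), (('b', 20), ('b', 2))]
```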
20Nested Loop Joins
R S
Join Result
Hash table for block of S (k lt B-1 pages)
. . .
. . .
Input buffer for R
Output buffer
21 Nested Loop Joins
- Block-based nested loop join
- Cost (worked example below)
  - Read S once: cost B(S)
  - The outer loop runs B(S)/(M−1) times, and each time we need to read R: cost B(S)·B(R)/(M−1)
  - Total cost: B(S) + B(S)·B(R)/(M−1)
- Notice: it is better to iterate over the smaller relation first
- R ⋈ S: R = outer relation, S = inner relation
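A hypothetical worked example (numbers chosen for illustration only): with B(R) = 1,000, B(S) = 100, and M = 101, iterating over the smaller relation S in chunks of M−1 = 100 blocks means the outer loop runs B(S)/(M−1) = 1 time, for a total of B(S) + B(S)·B(R)/(M−1) = 100 + 1,000 = 1,100 I/Os. Iterating over R instead would cost B(R) + B(R)·B(S)/(M−1) = 1,000 + 10·100 = 2,000 I/Os, which is why the smaller relation should drive the outer loop.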
22 Two-Pass Algorithms
23 Two-Pass Algorithms Based on Sorting
- Duplicate elimination δ(R)
- Simple idea: sort first, then eliminate duplicates
- Step 1: sort runs of size M, write to disk
  - Cost: 2B(R)
- Step 2: merge M−1 runs, but include each tuple only once
  - Cost: B(R)
  - Some complications...
- Total cost: 3B(R); assumption: B(R) ≤ M²
24 Two-Pass Algorithms Based on Sorting
- Grouping γ_{city, sum(price)}(R)
- Same as before: sort, then compute the sum(price) for each group
  - As before, compute sum(price) during the merge phase
- Total cost: 3B(R)
- Assumption: B(R) ≤ M²
25 Two-Pass Algorithms Based on Sorting
- Binary operations: R ∩ S, R ∪ S, R − S
- Idea: sort R, sort S, then do the right thing
- A closer look:
  - Step 1: split R into runs of size M, then split S into runs of size M. Cost: 2B(R) + 2B(S)
  - Step 2: merge M/2 runs from R, merge M/2 runs from S, and output a tuple on a case-by-case basis
- Total cost: 3B(R) + 3B(S)
- Assumption: B(R) + B(S) ≤ M²
26 Two-Pass Algorithms Based on Sorting
- Join R ⋈ S (merge step sketched below)
- Start by sorting both R and S on the join attribute
  - Cost: 4B(R) + 4B(S) (because we need to write to disk)
- Read both relations in sorted order, match tuples
  - Cost: B(R) + B(S)
- Difficulty: many tuples in R may match many in S
  - If at least one set of matching tuples fits in M, we are OK
  - Otherwise we need a nested loop, at higher cost
- Total cost: 5B(R) + 5B(S)
- Assumption: B(R) ≤ M², B(S) ≤ M²
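A minimal Python sketch of the merge step of sort-merge join, assuming R and S are already sorted lists of tuples joined on their first attribute, and that each group of matching tuples fits in memory; names and data are hypothetical.

```python
def merge_join(R, S):
    i = j = 0
    while i < len(R) and j < len(S):
        if R[i][0] < S[j][0]:
            i += 1
        elif R[i][0] > S[j][0]:
            j += 1
        else:
            # Collect all S tuples with this join value, then pair them with
            # every R tuple carrying the same value.
            key = R[i][0]
            j_start = j
            while j < len(S) and S[j][0] == key:
                j += 1
            while i < len(R) and R[i][0] == key:
                for s in S[j_start:j]:
                    yield (R[i], s)
                i += 1

# Usage with hypothetical sorted relations.
R = [("a", 1), ("b", 2), ("b", 3)]
S = [("b", 20), ("c", 30)]
print(list(merge_join(R, S)))   # [(('b', 2), ('b', 20)), (('b', 3), ('b', 20))]
```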
27 Two-Pass Algorithms Based on Sorting
- Join R ⋈ S
- If the number of tuples in R matching those in S is small (or vice versa), we can compute the join during the merge phase
- Total cost: 3B(R) + 3B(S)
- Assumption: B(R) + B(S) ≤ M²
28 Two-Pass Algorithms Based on Hashing
- Idea: partition a relation R into buckets on disk
- Each bucket has size approximately B(R)/M
- Does each bucket fit in main memory?
  - Yes, if B(R)/M ≤ M, i.e., B(R) ≤ M²
29 Hash-Based Algorithm for δ
- Recall: δ(R) = duplicate elimination
- Step 1. Partition R into buckets
- Step 2. Apply δ to each bucket (it can be read into main memory)
- Cost: 3B(R)
- Assumption: B(R) ≤ M²
30 Hash-Based Algorithm for γ
- Recall: γ(R) = grouping and aggregation
- Step 1. Partition R into buckets
- Step 2. Apply γ to each bucket (it can be read into main memory)
- Cost: 3B(R)
- Assumption: B(R) ≤ M²
31 Hash-Based Join
- R ⋈ S
- Recall the main-memory hash-based join:
  - Scan S, build buckets in main memory
  - Then scan R and join
32 Partitioned Hash Join
- R ⋈ S
- Step 1
  - Hash S into M buckets
  - Send all buckets to disk
- Step 2
  - Hash R into M buckets
  - Send all buckets to disk
- Step 3
  - Join every pair of corresponding buckets
33 Partitioned Hash Join
- Partition both relations using a hash function h: R tuples in partition i will only match S tuples in partition i.
- Read in a partition of R and hash it using h2 (≠ h!). Scan the matching partition of S and search for matches.
34 Partitioned Hash Join
- Cost: 3B(R) + 3B(S) (sketch below)
- Assumption: min(B(R), B(S)) ≤ M²
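A minimal Python sketch of the partitioned (Grace) hash join, assuming relations are lists of tuples joined on their first attribute; the in-memory bucket lists stand in for the partitions that a real system would write to and read back from disk, and all names are hypothetical.

```python
def partitioned_hash_join(R, S, num_buckets):
    h = lambda t: hash(t[0]) % num_buckets

    # Pass 1: partition both relations into buckets ("write to disk").
    R_parts = [[] for _ in range(num_buckets)]
    S_parts = [[] for _ in range(num_buckets)]
    for r in R:
        R_parts[h(r)].append(r)
    for s in S:
        S_parts[h(s)].append(s)

    # Pass 2: join each pair of corresponding buckets with an in-memory hash
    # table, using a second hash function (here, Python's dict hashing).
    for Ri, Si in zip(R_parts, S_parts):
        table = {}
        for s in Si:
            table.setdefault(s[0], []).append(s)
        for r in Ri:
            for s in table.get(r[0], []):
                yield (r, s)

# Usage with hypothetical relations and 4 buckets.
R = [("a", 1), ("b", 2)]
S = [("b", 20), ("c", 30)]
print(list(partitioned_hash_join(R, S, num_buckets=4)))   # [(('b', 2), ('b', 20))]
```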
35 Hybrid Hash Join Algorithm
- When we have more memory: B(S) ≪ M²
- Partition S into k buckets
  - But keep the first bucket S1 in memory; the other k−1 buckets go to disk
- Partition R into k buckets
  - The first bucket R1 is joined immediately with S1
  - The other k−1 buckets go to disk
- Finally, join the k−1 remaining pairs of buckets:
  - (R2,S2), (R3,S3), ..., (Rk,Sk)
36 Hybrid Hash Join Algorithm
- How big should we choose k?
- The average bucket size for S is B(S)/k
- We need to fit B(S)/k + (k−1) blocks in memory:
  - B(S)/k + (k−1) ≤ M
  - k slightly larger than B(S)/M
37 Hybrid Hash Join Algorithm
- How many I/Os? (worked example below)
- Recall the cost of partitioned hash join:
  - 3B(R) + 3B(S)
- Now we save 2 disk operations (one write, one read) for one bucket of each relation
- Recall there are k buckets
- Hence we save 2/k·(B(R) + B(S))
- Cost: (3 − 2/k)(B(R) + B(S)) ≈ (3 − 2M/B(S))(B(R) + B(S))
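A hypothetical worked example (illustrative numbers only): with B(S) = 1,000, B(R) = 10,000, and M = 210, we can pick k = 5 (slightly larger than B(S)/M ≈ 4.8), which needs B(S)/k + (k−1) = 200 + 4 = 204 ≤ M blocks of memory. The saving is 2/k = 40% of one write pass plus one read pass, so the cost is roughly (3 − 2/5)·(10,000 + 1,000) = 28,600 I/Os, versus 33,000 for plain partitioned hash join.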
38 Index-Based Algorithms
- In a clustered index, all tuples with the same value of the key are clustered on as few blocks as possible
39 Index-Based Selection
- Selection on equality: σ_{a=v}(R)
- Clustered index on a: cost B(R)/V(R,a)
- Unclustered index on a: cost T(R)/V(R,a)
40 Index-Based Selection
- Example: B(R) = 2,000, T(R) = 100,000, V(R,a) = 20; compute the cost of σ_{a=v}(R)
- Cost of table scan:
  - If R is clustered: B(R) = 2,000 I/Os
  - If R is unclustered: T(R) = 100,000 I/Os
- Cost of index-based selection:
  - If the index is clustered: B(R)/V(R,a) = 100
  - If the index is unclustered: T(R)/V(R,a) = 5,000
- Notice: when V(R,a) is small, an unclustered index is useless
41 Index-Based Join
- R ⋈ S (sketched below)
- Assume S has an index on the join attribute a
- Iterate over R; for each tuple, fetch the corresponding tuple(s) from S
- Assume R is clustered. Cost:
  - If the index is clustered: B(R) + T(R)·B(S)/V(S,a)
  - If the index is unclustered: B(R) + T(R)·T(S)/V(S,a)
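A minimal Python sketch of index-based (index nested loop) join, where a hypothetical in-memory dictionary stands in for the index on S's join attribute; tuples join on their first attribute, and all names and data are illustrative.

```python
from collections import defaultdict

def build_index(S):
    index = defaultdict(list)
    for s in S:
        index[s[0]].append(s)             # stand-in for an index on S's join attribute
    return index

def index_join(R, S_index):
    for r in R:                           # one scan of R ...
        for s in S_index.get(r[0], []):   # ... plus index lookups into S
            yield (r, s)

# Usage with hypothetical relations.
S_index = build_index([("a", 10), ("b", 20)])
print(list(index_join([("a", 1), ("c", 3)], S_index)))   # [(('a', 1), ('a', 10))]
```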
42 Index-Based Join
- Assume both R and S have a sorted index (B+ tree) on the join attribute
- Then perform a merge join (called a zig-zag join)
- Cost: B(R) + B(S)
43 Optimization
- Chapter 16
- At the heart of the database engine
- Step 1: convert the SQL query to some logical plan
- Step 2: find a better logical plan, and find an associated physical plan
44 Converting from SQL to Logical Plans
Select a1, ..., an From R1, ..., Rk Where C
  becomes  π_{a1,...,an}(σ_C(R1 ⋈ R2 ⋈ ... ⋈ Rk))

Select a1, ..., an From R1, ..., Rk Where C Group by b1, ..., bl
  becomes  π_{a1,...,an}(γ_{b1,...,bl, aggs}(σ_C(R1 ⋈ R2 ⋈ ... ⋈ Rk)))
45 Converting Nested Queries
Select distinct product.name
From product
Where product.maker in (Select company.name
                        From company
                        Where company.city = 'Urbana')

becomes

Select distinct product.name
From product, company
Where product.maker = company.name AND company.city = 'Urbana'
46 Converting Nested Queries
Select distinct x.name, x.maker
From product x
Where x.color = 'blue'
  AND x.price > ALL (Select y.price
                     From product y
                     Where x.maker = y.maker
                       AND y.color = 'blue')

How do we convert this one to a logical plan?
47 Converting Nested Queries
Let's compute the complement first:

Select distinct x.name, x.maker
From product x
Where x.color = 'blue'
  AND x.price <= SOME (Select y.price
                       From product y
                       Where x.maker = y.maker
                         AND y.color = 'blue')
48 Converting Nested Queries
This one becomes an SFW query:

Select distinct x.name, x.maker
From product x, product y
Where x.color = 'blue' AND x.maker = y.maker
  AND y.color = 'blue' AND x.price <= y.price

This returns exactly the products we DON'T want, so...
49 Converting Nested Queries
(Select x.name, x.maker
 From product x
 Where x.color = 'blue')
EXCEPT
(Select x.name, x.maker
 From product x, product y
 Where x.color = 'blue' AND x.maker = y.maker
   AND y.color = 'blue' AND x.price <= y.price)
50 Optimizing the Logical Query Plan
- Now we have one logical plan
- Algebraic laws
  - the foundation for every optimization
- Two approaches to optimization
  - Heuristic: apply laws that seem to result in cheaper plans
  - Cost-based: estimate the size and cost of intermediate results, and search systematically for the best plan
51 The Three Components of an Optimizer
- We need three things in an optimizer
- Algebraic laws
- An optimization algorithm
- A cost estimator
52 Algebraic Laws
- Commutative and associative laws
  - R ∪ S = S ∪ R,  R ∪ (S ∪ T) = (R ∪ S) ∪ T
  - R ∩ S = S ∩ R,  R ∩ (S ∩ T) = (R ∩ S) ∩ T
  - R ⋈ S = S ⋈ R,  R ⋈ (S ⋈ T) = (R ⋈ S) ⋈ T
- Distributive laws
  - R ⋈ (S ∪ T) = (R ⋈ S) ∪ (R ⋈ T)
53 Algebraic Laws
- Laws involving selection
  - σ_{C AND C'}(R) = σ_C(σ_{C'}(R)) = σ_C(R) ∩ σ_{C'}(R)
  - σ_{C OR C'}(R) = σ_C(R) ∪ σ_{C'}(R)
  - σ_C(R ⋈ S) = σ_C(R) ⋈ S
    - when C involves only attributes of R
  - σ_C(R − S) = σ_C(R) − S
  - σ_C(R ∪ S) = σ_C(R) ∪ σ_C(S)
  - σ_C(R ∩ S) = σ_C(R) ∩ S
54 Algebraic Laws
- Example: R(A, B, C, D), S(E, F, G)
  - σ_{F=3}(R ⋈_{D=E} S) = ?
  - σ_{A=5 AND G=9}(R ⋈_{D=E} S) = ?
55 Algebraic Laws
- Laws involving projections
  - π_M(R ⋈ S) = π_N(π_P(R) ⋈ π_Q(S))
    - where N, P, Q are appropriate subsets of the attributes of M
  - π_M(π_N(R)) = π_{M∩N}(R)
- Example: R(A,B,C,D), S(E,F,G)
  - π_{A,B,G}(R ⋈_{D=E} S) = π_?(π_?(R) ⋈ π_?(S))
56 Algebraic Laws
- Laws involving grouping and aggregation
  - δ(γ_{A, agg(B)}(R)) = γ_{A, agg(B)}(R)
  - γ_{A, agg(B)}(δ(R)) = γ_{A, agg(B)}(R) if agg is duplicate insensitive
    - Which of the following are duplicate insensitive? sum, count, avg, min, max
  - γ_{A, agg(D)}(R(A,B) ⋈_{B=C} S(C,D)) = γ_{A, agg(D)}(R(A,B) ⋈_{B=C} (γ_{C, agg(D)}(S(C,D))))
    - Why is this true?
    - Why would we do it?
57 Heuristic-Based Optimizations
- Query rewriting based on algebraic laws
- Results in better queries most of the time
- Heuristic number 1
  - Push selections down
- Heuristic number 2
  - Sometimes push selections up, then down
58 Predicate Pushdown
(Two equivalent plans: before pushdown, π_{pname}(σ_{price>100 AND city='Urbana'}(Product ⋈_{maker=name} Company)); after pushdown, π_{pname}(σ_{price>100}(Product) ⋈_{maker=name} σ_{city='Urbana'}(Company)).)
The earlier we process selections, the fewer tuples we need to manipulate higher up in the tree (but this may cause us to lose an important ordering of the tuples, if we use indexes).
59 Predicate Pushdown
Select y.name, Max(x.price)
From product x, company y
Where x.maker = y.name
GroupBy y.name
Having Max(x.price) > 100

Select y.name, Max(x.price)
From product x, company y
Where x.maker = y.name and x.price > 100
GroupBy y.name
Having Max(x.price) > 100

- For each company, find the maximal price of its products.
- Advantage: the size of the join will be smaller.
- Requires transformation rules specific to the grouping/aggregation operators.
- Won't work if we replace Max by Min.
60 Pushing Predicates Up
Bargain view V1: categories with some price < 20, and the cheapest price

Select V2.name, V2.price
From V1, V2
Where V1.category = V2.category and V1.p = V2.price

Create View V1 AS
  Select x.category, Min(x.price) AS p
  From product x
  Where x.price < 20
  GroupBy x.category

Create View V2 AS
  Select y.name, x.category, x.price
  From product x, company y
  Where x.maker = y.name
61 Query Rewrite: Pushing Predicates Up
Bargain view V1: categories with some price < 20, and the cheapest price

Select V2.name, V2.price
From V1, V2
Where V1.category = V2.category and V1.p = V2.price
  AND V1.p < 20

Create View V1 AS
  Select x.category, Min(x.price) AS p
  From product x
  Where x.price < 20
  GroupBy x.category

Create View V2 AS
  Select y.name, x.category, x.price
  From product x, company y
  Where x.maker = y.name
62 Query Rewrite: Pushing Predicates Up
Bargain view V1: categories with some price < 20, and the cheapest price

Select V2.name, V2.price
From V1, V2
Where V1.category = V2.category and V1.p = V2.price
  AND V1.p < 20

Create View V1 AS
  Select x.category, Min(x.price) AS p
  From product x
  Where x.price < 20
  GroupBy x.category

Create View V2 AS
  Select y.name, x.category, x.price
  From product x, company y
  Where x.maker = y.name AND x.price < 20

- The predicate pulled up in the previous step is now pushed down into V2 (since V1.p = V2.price = x.price), so V2 scans far fewer products.
63 Cost-Based Optimizations
- Main idea: apply algebraic laws until the estimated cost is minimal
- Practically: start from partial plans and introduce operators one by one
  - We will see this in a few slides
- Problem: there are too many ways to apply the laws, hence too many (partial) plans