Title: Examples of Physical Query Plan Alternatives
1Examples of Physical Query Plan Alternatives
- Selections from Chapters 12, 14, 15
2Query Optimization
- NOTE Relational query languages provide a wide
variety of ways in which a user can express. - HENCE system has many options for evaluating a
query. - Optimizer is important for query performance.
- Generates alternative plans
- Choose plan with least estimated cost.
- Ideally, find best plan.
- Realistically, consistently find a quite good one.
3A Query (Evaluation) Plan
- An extended relational algebra tree
- Annotations at each node indicate
- access methods to use for each table.
- implementation methods used for each relational
operator.
4Query Optimization
- Multi-operator Queries Pipelined Evaluation
- On-the-fly The result of one operator is
pipelined to another operator without creating a
temporary table to hold intermediate result,
called on-the-fly. - Materialized Otherwise, intermediate results
must be materialized.
C
B
A
5Alternative Plans Schema Examples
Sailors (sid integer, sname string, rating
integer, age real) Reserves (sid integer, bid
integer, day dates, rname string)
- Reserves
- Each tuple is 40 bytes long,
- 100 tuples per page,
- 1000 pages.
- Sailors
- Each tuple is 50 bytes long,
- 80 tuples per page,
- 500 pages.
6Alternative Plans Motivating Example
SELECT S.sname FROM Reserves R, Sailors S WHERE
R.sidS.sid AND R.bid100 AND S.ratinggt5
RA Tree
7RA Tree
SELECT S.sname FROM Reserves R, Sailors S WHERE
R.sidS.sid AND R.bid100 AND S.ratinggt5
- Costs
- 1. Scan Sailors
- For each page of Sailors, scan Reserves
- 5005001000 I/Os
- Or,
- 2. Scan Reserves
- For each page of Reserves, scan Sailors
- 10001000 500 I/Os
Plan
8Alternative Plans Motivating Example
RA Tree
SELECT S.sname FROM Reserves R, Sailors S WHERE
R.sidS.sid AND R.bid100 AND S.ratinggt5
- Cost 5005001000 I/Os
- Almost the worst plan!
- Reasons
- selections could be pushed earlier,
- no use made of indexes
- Goal of optimization To find more efficient
plans that compute the same answer.
Plan
9Alternative Plans 1 (No Indexes)
- Main difference push selects.Reduce size of
table to be joined - With 5 buffers, cost of plan
- Scan Reserves (1000) write temp T1 (10 pages,
if we have 100 boats, uniform distribution). - Scan Sailors (500) write temp T2 (250 pages, if
we have 10 ratings). - Sort T1 (2210), sort T2 (24250), merge
(10250) - Total 4060 page I/Os.
- Optimization1 block nested loops join
- join cost 104250, total cost 2770.
- Optimization2 push projections
- T1 has only sid, T2 only sid and sname
- T1 fits in 3 pages, cost of BNL drops to under
250 pages, total lt 2000.
10Alternative Plan Using Index ?
- Push Selections Down ?
- What indices help here?
- Index on Reserves.bid?
- Index on Sailors.sid?
- Index on Reserves.bid?
- Index on Sailors.rating?
11Example Plan With Index
- With index on Reserves.bid
- Assume 100 different bid values.
- Assume 100,000 tuples.
- Assume 100 tuples/disk page
- We get 100,000/100 1000 tuples
- On 1000/100 10 disk pages.
- If index clustered,
- Cost 10 I/Os.
12Example Plan Use Another Index
- Index on Sailors.bid?
- Selection on bid reduces number of tuples
considered in join. - INL with pipelining
- Outer is not materialized
- Projecting out unnecessary fields from outer
doesnt help.
13Example Plan Continued
- Index on Sailors.sid
- - Join column sid is key for Sailors.
- - At most one matching tuple,
unclustered on sid OK. - Cost?
- - For each Reserves tuples (1000) get
matching Sailors tuple (1.2 I/O) so total 1210
I/Os.
14Alternative Plan With Second Index
- Selection Pushing down?
- Push (ratinggt5) before join ?
- Answer
- No, because of availability of sid index on
Sailors. - Reason
- No index on selection result.
- Then selection requires scan Sailors.
15Summary
- A query is evaluated by converting it to a tree
of operators and evaluating the operators in the
tree. - There are several alternative evaluation
algorithms for each relational operator. - Query evaluation must compare alternative plans
based on their estimated costs - Must understand query optimization in order to
fully understand the performance impact of a
given database design on a query workload