Query Optimization - PowerPoint PPT Presentation

1 / 24
About This Presentation
Title:

Query Optimization

Description:

Big Cat Shere Khan 100 70 Big Cat Bageera 100 60 Bear Baloo 300 50 Big Cat Aslan 100 40 Operators Query Plan Query Result Query Query parser Query optimizer SELECT ... – PowerPoint PPT presentation

Number of Views:86
Avg rating:3.0/5.0
Slides: 25
Provided by: eben
Category:

less

Transcript and Presenter's Notes

Title: Query Optimization


1
Query Optimization
  • RG, Chapter 15
  • Lecture 16

2
Administrivia
  • Homework 3 available today
  • Written exercise will be posted on class website
  • Due date Tuesday, March 20 by end of class
    period
  • Homework 4 available later this week
  • Implement nested loops and hash join operators
    for minibase
  • Due date April 5 (after Spring Break)
  • Midterm 2 is 3/22, 2 weeks from today
  • In class, covers lectures 10-17
  • Review will be held Tuesday 3/20 7-9 pm 306 Soda
    Hall

3
Review
Now you are here
  • Query plans are a tree of operators that compute
    the result of a query
  • Optimization is the process of picking the best
    plan
  • Execution is the process of executing the plan

You were here
  • Operators are the building blocks for computing
    results of queries
  • Sort
  • Project
  • Join
  • Filter
  • Access methods for files
  • ...

4
Query Plans turning text into tuples
Query Result
Query
Shift
Name
SELECT A.aname, max(F.feedingshift) FROM Animals
A, Feeding F WHERE A.aid F.aid AND (A.species
'Big Cat' or A.species 'Bear') GROUP BY
A.aname HAVING COUNT() gt 1
Aslan 3
Bageera 3
Elsa 3
Shere Khan 3
Tigger 2
Operators
Query Plan
10 2 1 100 3
10 3 2 100 3
20 3 2 100 3
20 2 3 100 3
30 3 2 100 3

40 100 Aslan Big Cat
50 300 Baloo Bear
60 100 Bageera Big Cat
70 100 Shere Khan Big Cat
90 100 Dumbo Elephant

5
Operator Review
  • Access Path pulls tuples from tables
  • File scans
  • Index scans (clustered or unclustered)
  • Index-only scans
  • Select (or Filter) conditionally excludes tuples
  • Can be pushed/combined with Access Path
    operator
  • Use indexes where possible and apply other
    predicates on the result
  • Can also be applied at intermediate point in
    query plan
  • Projection removes columns and duplicates
  • Column projection often done by operators
  • Duplicate elimination via Sort or Hash

6
Operator Review
  • Sort sorts tuples in a particular order
  • Simple merge sort
  • General external merge sort (with various
    optimizations)
  • B tree traversal
  • Join combine tuples from 2 other operators
  • Page nested loops
  • Block nested loops
  • Index nested loops
  • Sort-merge join
  • Hash-join
  • Other operators for
  • Group By, Temping,

7
Query Optimization steps
SELECT A.aname, max(F.feedingshift) FROM Animals
A, Feeding F WHERE A.aid F.aid AND (A.species
'Big Cat' or A.species 'Bear') GROUP BY
A.aname HAVING COUNT() gt 1
Query parser
Block 3
  • Parse query from text to intermediate model
  • Traverse intermediate model and produce
    alternate query plans
  • Query plan tree of relational operators
  • Optimizer keeps track of cost and properties of
    plans
  • Pick the cheapest plan
  • Pass cheapest plan on to query execution engine
    to execute and produce results of query

Block 2
Block 1
Query optimizer
Cost 200
Cost 500
Cost 150
To execution engine
8
Query Blocks Units of Optimization
  • Intermediate model is a set of query blocks
  • 1 per SELECT/FROM/WHERE/GROUP BY/HAVING clause

SELECT A.aname, max(F.feedingshift) FROM Animals
A, Feeding F WHERE A.aid F.aid AND (A.species
'Big Cat' or A.species 'Bear') GROUP BY
A.aname HAVING COUNT() gt 1
Query Block
  • Subqueries produce nested query blocks
  • treated as calls to a subroutine, made once per
    tuple produced by outer query block
  • sometimes subqueries can be rewritten to produce
    cheaper plan

SELECT S.sname FROM Sailors S WHERE S.age IN
(SELECT MAX (S2.age) FROM Sailors
S2 GROUP BY S2.rating)
X
Outer Query Block
Rewritten Query Block
Nested Query Block
9
Query blocks are optimized 1 at a time
  • Convert block to relational algebra tree
  • Traverse tree and build plan bottom up
  • Pick best access method for each relation in FROM
    clause
  • Applying predicates if possible
  • Consider all join trees
  • All ways to join relations in FROM clause 1-at-a
    time
  • Consider multiple permutations and join methods
  • But not all! too many choices
  • Restrict to left-deep plans
  • Prune bad plans along the way

Query Block
10
Converting Query Blocks to Relational Algebra
Trees
  • SQL is relationally complete can express
    everything in relational algebra

SELECT S.sname FROM Reserves R, Sailors S WHERE
R.sidS.sid AND R.bid100 AND S.ratinggt5
?(sname)?(bid100 ? rating gt 5) (Reserves ??
Sailors)
11
SQL extends Relational Algebra
  • SQL is more powerful than relational algebra
  • extend relational algebra to include aggregate
    ops GROUP BY, HAVING
  • How is this query block expressed?
  • SELECT S.sname
  • FROM Sailors S
  • WHERE S.age IN (constant set from subquery)
  • And this query block?
  • SELECT MAX (S2.age)
  • FROM Sailors S2
  • GROUP BY S2.rating

?sname(s(age in set from subquery) Sailors)
s(age in set from subquery) Sailors
?Max(age)(GroupByRating(Sailors) )
GroupByRating(Sailors)
12
Why optimize?
  • Operators have implementation choices
  • Index scan? File scan? Nested loop join? Sort
    merge?
  • Operators can also be applied in different order!

13
Motivating Example -- Schema used
Sailors (sid integer, sname string, rating
integer, age real) Reserves (sid integer, bid
integer, day dates, rname string)
  • As seen in previous lectures
  • Reserves
  • Each tuple is 40 bytes long, 100 tuples per
    page, 1000 pages.
  • Assume there are 100 boats
  • Sailors
  • Each tuple is 50 bytes long, 80 tuples per page,
    500 pages.
  • Assume there are 10 different ratings
  • Assume we have 5 pages in our buffer pool!

14
Motivating Example
SELECT S.sname FROM Reserves R, Sailors S WHERE
R.sidS.sid AND R.bid100 AND S.ratinggt5
  • Cost 5005001000 I/Os
  • Not the worst plan, but
  • Misses several opportunities
  • selections could have been pushed earlier,
  • indexes might have been helpful.
  • Goal of optimization To find more efficient
    plans that compute the same answer.

Plan
1000
500
15
Selectivity calculation
  • Sailors 500 pages, 80 tuples per page, 10
    ratings
  • Selectivity of S.rating gt 5?
  • ½ -gt 50080/2 20,000 tuples
  • 20,000/80 250 pages
  • Reserves 1000 pages, 100 tuples per page, 100
    boats
  • Selectivity of R.bid 100?
  • 1/100 -gt 1000100/100 1000 tuples
  • 1000/100 10 pages

16
Alternative Plans Push Selects (No Indexes)
250
1000
500
1000
500
500,500 IOs
250,500 IOs
500 250 1000
17
Alternative Plans Push Selects (No Indexes)
250
10
250
1000
1000
500
250,500 IOs
500 250 1000
250,500 IOs
18
Alternative Plans Try different join order
10
swap
500
1000
6000 IOs
1000 10 500
250,500 IOs
19
Alternative Plans Push Selects and precompute
result (No Indexes)
10
250
10
250
500
1000
500
1000
6000 IOs
1000 500 250 (10 250)
4250 IOs
20
Alternative Plans Try different join order
10
250
10
250
swap
10
250
1000
500
500
1000
4250 IOs
500 1000 10 (250 10)
4010 IOs
21
Optimized query is 124x cheaper than the original!
4010 IOs
500,500 IOs
22
More Alternative Plans (No Indexes)
  • Main difference Sort Merge Join
  • With 5 buffers, cost of plan
  • Scan Reserves (1000) write temp T1 (10 pages,
    if we have 100 boats,
    uniform distribution) 1010.
  • Scan Sailors (500) write temp T2 (250 pages, if
    have 10 ratings) 750.
  • Sort T1 (2210) sort T2 (24250) merge
    (10250) 2300
  • Total 4060 page I/Os.
  • If use BNL join, join 104250, total cost
    2770.
  • Can also push projections, but must be careful!
  • T1 has only sid, T2 only sid, sname
  • T1 fits in 3 pgs, cost of BNL under 250 pgs,
    total lt 2000.

log4 ceil(10/5) 1 log4 ceil(250/5))3
Ceil(10/3) 4
23
More Alt Plans Indexes
(On-the-fly)
sname
(On-the-fly)
  • With clustered index on bid of Reserves, we get
    100,000/100 1000 tuples on 1000/100 10
    pages.
  • INL with outer not materialized.

rating gt 5
(Index Nested Loops,
with pipelining )
sidsid
(Use hash Index, do not write to temp)
bid100
Sailors
  • Projecting out unnecessary fields from outer
    doesnt help.

Reserves
  • Join column sid is a key for Sailors.
  • At most one matching tuple, unclustered index on
    sid OK.
  • Decision not to push ratinggt5 before the join
    is based on
  • availability of sid index on Sailors.
  • Cost Selection of Reserves tuples (10 I/Os)
    then, for each,
  • must get matching Sailors tuple (10001.2)
    total 1210 I/Os.

10 I/Os for 1000 tuples on 10 pages For each
tuple assume 1.2 pages to find match
24
What is needed for optimization?
  • A closed set of operators
  • Relational ops (table in, table out)
  • Encapsulation based on iterators
  • Plan space, based on
  • Based on relational equivalences, different
    implementations
  • Cost Estimation, based on
  • Cost formulas
  • Size estimation, based on
  • Catalog information on base tables
  • Selectivity (Reduction Factor) estimation
  • A search algorithm
  • To sift through the plan space based on cost!
Write a Comment
User Comments (0)
About PowerShow.com