Title: ICS 214A: Database Management Systems Winter 2004
1ICS 214A Database Management Systems Winter 2004
- Lecture 10
- Part III Query Processing and Optimization
- Professor Chen Li
2Big picture
We are here!
SQL statements
Access plan
Query Processor
optimizer
Read write records, scan relations
Get page containing tuples
Indexes
Record-oriented file system
Buffer manager
Basic file system
Read/write file pages
Hardware
3Query Processor Optimizer
- Overview
- Query processor
- Query optimizer
4Overview query processing steps
- Parsing
- ? Validation
- ? Optimization
- ? Compilation
- ? Execution
5Example SQL query
- Find the movies with stars born in 1960
- SELECT title
- FROM StarsIn
- WHERE starName IN (
- SELECT name
- FROM MovieStar
- WHERE birthdate LIKE 1960
- )
6Example Parse Tree
ltQuerygt
ltSFWgt
SELECT ltSelListgt FROM ltFromListgt
WHERE ltConditiongt
ltAttributegt ltRelNamegt
ltTuplegt IN ltQuerygt
title StarsIn
ltAttributegt ( ltQuerygt )
starName ltSFWgt
SELECT ltSelListgt FROM ltFromListgt
WHERE ltConditiongt
ltAttributegt ltRelNamegt
ltAttributegt LIKE ltPatterngt
name MovieStar
birthDate 1960
7Example Logical Query Plan
?title
?starNamename
?
StarsIn ?name
?birthdate LIKE 1960
MovieStar
8Example Improved Logical Query Plan
?title
starNamename
StarsIn ?name
?birthdate LIKE 1960
MovieStar
9Example One Physical Plan
Hash join
SEQ scan
index scan
StarsIn MovieStar
10Logical vs Physical Plan
Nested Loop
Merge Join
File scan R3
Sort
Sort
File scan R1
File scan R2
11Role of Optimizer
- Convert a logical plan to an optimal physical
plan w.r.t. one or more performance metrics. - Performance Metrics
- I/O,
- CPU,
- Communication,
- Power Consumption,
- Cache misses,
- of internet queries,
- .
12Two topics
- Query processor
- Relational algebra level
- Physical operator level
- estimate costs
- Query optimizer
13Operators in Relational Model
- Tuple-level unary operators ?,?
- can be evaluated immediately
- tuple at a time without looking at entire
relation - Relation-level unary operators
- S - duplicate removal
- VL- aggregations, grouping
- E - sorting
- Binary operators
-
14Example
R(A,B,C), S(C,D,E)
- Select B,D
- From R,S
- Where R.A c ? S.E 2 ? R.CS.C
15 R A B C S C D E a 1 10 10 x 2 b 1 20 2
0 y 2 c 2 10 30 z 2 d 2 35 40 x 1 e 3 45 50
y 3
16Plan I
- ?B,D
-
- sR.Ac? S.E2 ? R.CS.C
- x
- R S
OR ?B,D sR.Ac? S.E2 ? R.C S.C (R x S)
17R x S R.A R.B R.C S.C S.D S.E a 1 10 10
x 2 a 1 10 20 y 2 . .
C 2 10 10 x 2 . .
18Plan II
natural join
19 R S A B C s (R) s(S) C D
E a 1 10 A B C C D E 10
x 2 b 1 20 c 2 10 10 x 2 20 y
2 c 2 10 20 y 2 30 z 2 d 2
35 30 z 2 40 x 1 e 3 45
50 y 3
20Plan III
- Use R.A and S.C Indexes
- (1) Use R.A index to select R tuples with R.A
c - (2) For each R.C value found, use S.C index to
find matching tuples - (3) Eliminate S tuples S.E ? 2
- (4) Join matching R,S tuples, project B,D
attributes and place in result
21 R S A B C C D E a 1
10 10 x 2 b 1 20 20
y 2 c 2 10 30 z 2 d 2 35
40 x 1 e 3 45
50 y 3
A
C
I1
I2
22Query Processor physical level
- Implements a bunch of algorithms that form the
building blocks of the physical plan. - Unary operators
- sort, hash, select, project, aggregation, group
by. - binary operators
- join, union, intersection, difference.
- Important Issue
- Query processor architecture
- how is the execution of different physical
operators synchronized and how is data passed
form one operator to another?
23Query Processor Architecture
- Consider a simple join operator and query
consisting of 2 joins. How should data be passed
form one join to another? - Create temporary files.
- Create a process for each operator and use IPC
(e.g, shared memory, pipes). - OS does scheduling of these processes.
24Query Processor Architecture (cont)
- Take a query plan and convert as much of it as
possible into a block that can be executed and
one single iterative program with loops and other
control structures - Implement operators as iterators with open,
next, close as a single process easy to
parallelize
25Iterators
- Demand-driven evaluation of query tree.
- Operators exchange data in units such as records
- Each operator supports the following interfaces
- open
- next
- close
- open() at top of tree results in cascade of opens
down the tree. - An operator getting a next() call may recursively
make next() calls from within to produce its next
answer. - close() at top of tree results in cascade of
close down the tree
26Benefits of Iterators
- Query executed as one process
- Items produced one at a time
- No temporary files
- amenable to parallelization
27Sample Iterators
- Print
- Open()
- open input
- Next()
- call next on input format the item to screen
- Close()
- close input
28Sample Iterators (cont)
- Scan
- open
- open file
- next
- read next item
- close
- close file
29Sample Iterators (cont)
- Sort (merge sort)
- open
- open input
- build all initial run files calling next on
input - close input
- merge run files
- next
- determine next output item read new item from
correct run file - close
- destroy remaining run files
30Tuple-level unary operator
- selection and projection
- simple to implement using a scan
- relation scan
- index scan
- Memory requirement is small
- Disk IOs depends on
- Records clustered?
- Indexes available?
31Relation-level unary operators
- Duplicate elimination
- Keep seen records in a buffer
- Check if each record is in the buffer
- To speedup the search, use a hash table
- Grouping (MIN/MAX, COUNT, SUM, AVG)
- Scan the records
- Accumulate values
32Binary operators Union
- Could use set semantics or bag semantics
- Assuming set semantics R ? S
- Assume S is smaller than R
- Read S into buffers
- Build a search structure using the entire tuple
as the key - Output S
- For each tuple in R, output it only if it was not
seen in the structure - How about Bag?
33Binary operators Intersection
- Set or bag
- Assuming set semantics R ? S
- Read S into buffers
- Build a search structure using the entire tuple
as the key - For each tuple in R, output it only if it is in
the structure
34Binary operators Difference
- Set or bag
- Assuming set semantics R - S
- Read S into buffers
- Build a search structure using the entire tuple
as the key - For each tuple in R, output it only if it is NOT
in the structure
35Next
- Binary operators Join
- sort based
- hash based
- nested loop based
- indexed based