1 Database Systems II: Query Execution
2 Introduction
- We have optimized the logical query plan, applying relational algebra equivalences.
- In order to refine this plan into a physical query plan, we need in particular to choose one of the available algorithms to implement the basic operations (selection, join, ...) of the query plan.
- For each alternative physical query plan, we estimate its cost.
- The cost estimates are based on the size estimates that we discussed in the previous chapter.
3 Introduction
- Disk I/O (read / write of a disk block) is orders of magnitude more expensive than CPU operations.
- Therefore, we use the number of disk I/Os to measure the cost of a physical query plan.
- We ignore CPU costs, timing effects, and double buffering requirements.
- We assume that the arguments of an operator are found on disk, but the result of the operator is left in main memory.
4 Introduction
- We use the following parameters (statistics) to express the cost of an operator:
  - B(R): number of blocks containing tuples of R,
  - f(R): maximum number of tuples of R per block,
  - M: number of memory blocks available in the buffer,
  - HT(i): number of levels in index i,
  - LB(i): number of leaf blocks in index i.
- M may comprise the entire main memory, but typically the main memory needs to be shared with other processes, and M is much (!) smaller.
5 Introduction
- The performance of relational operators depends on many parameters, such as the following.
- Are the tuples of a relation stored physically contiguous (clustered)? If yes, the number of blocks to be read is greatly reduced compared to non-clustered storage.
- Is a relation sorted by the relevant (selection, join) attribute? Otherwise, it may need to be sorted on-the-fly.
- Which indexes exist? Some algorithms require the existence of a corresponding index.
6 Introduction
- Each operator (selection, join, ...) in a logical query plan can be implemented by one of a fairly large number of alternative algorithms.
- We distinguish three types of algorithms:
  - sorting-based algorithms,
  - hash-based algorithms,
  - index-based algorithms.
- Sorting, building a hash table, or building an index can either have happened in advance or may happen on the fly.
7 Introduction
- We can also categorize algorithms according to the number of passes over the data:
  - one-pass algorithms read the data only once from disk,
  - two-pass algorithms read the data once from disk, write an intermediate relation back to disk, and then read the intermediate relation once,
  - multiple-pass algorithms perform more than two passes over the data; they are not considered in this class.
8 One-Pass Algorithms for Unary Operations
- Consider the unary, tuple-at-a-time operations, selection and projection, on relation R.
- Read all the blocks of R into the input buffer, one at a time.
- Perform the operation on each tuple and move the selected / projected tuple to the output buffer.

[Figure: relation R is read block by block into an input buffer, the unary operation is applied, and results are placed in an output buffer]
9 One-Pass Algorithms for Unary Operations
- The output buffer may be the input buffer of another operation and is not counted.
- Thus, the algorithm requires only M = 1 buffer block.
- I/O cost is B(R).
- If some index is applicable for a selection, we have to read only the blocks that contain qualifying tuples.
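The one-pass pattern just described fits in a few lines. Below is a minimal Python sketch, assuming a relation is exposed as an iterable of blocks (each block a list of dict tuples); the names one_pass_select_project, predicate, and attrs are illustrative and not part of the lecture material.

```python
def one_pass_select_project(blocks, predicate, attrs):
    """One-pass selection + projection: stream R block by block through a
    single input buffer; qualifying, projected tuples go to the output."""
    for block in blocks:                        # B(R) block reads, M = 1
        for t in block:                         # tuple-at-a-time processing
            if predicate(t):                    # selection condition
                yield {a: t[a] for a in attrs}  # projection onto attrs
```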
10 One-Pass Algorithms for Binary Operations
- Binary operations: union, intersection, difference, Cartesian product, and join.
- Use subscripts B and S to distinguish between the bag and set versions, e.g. the bag union R ∪_B S versus the set union R ∪_S S.
- The bag union can be computed using a very simple one-pass algorithm: copy each tuple of R to the output, and copy each tuple of S to the output.
- I/O cost is B(R) + B(S), with M = 1.
11 One-Pass Algorithms for Binary Operations
- The other binary operations require reading the smaller of the two input relations into main memory.
- One buffer is used to read the blocks of the larger relation; M-1 buffers hold the entire smaller relation.
- I/O cost is B(R) + B(S).
- In main memory, a data structure is built that efficiently supports insertions and searches.
- The data structure is, e.g., a hash table or a balanced binary tree; its space overhead can be neglected.
12 One-Pass Algorithms for Binary Operations
- For set union, read the smaller relation (S) into M-1 buffers, representing it in a data structure whose search key consists of all attributes.
- All these tuples are also copied to the output.
- Read all blocks of R into the M-th buffer, one at a time.
- For each tuple t of R, check whether t is in S. If not, copy t to the output.
- For set intersection, copy t to the output only if it also is in S (see the sketch below).
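As an illustration, here is a minimal in-memory Python sketch of the one-pass set union and intersection just described; a Python set stands in for the main-memory search structure, tuples are assumed hashable, and the function names are mine.

```python
def one_pass_set_union(r_blocks, s_blocks):
    """Set union: keep the smaller relation S in memory, stream R."""
    s_tuples = set()
    for block in s_blocks:              # read S into the M-1 buffers
        for t in block:
            if t not in s_tuples:
                s_tuples.add(t)
                yield t                 # every S tuple goes to the output
    for block in r_blocks:              # stream R through the remaining buffer
        for t in block:
            if t not in s_tuples:       # output R tuples not found in S
                yield t

def one_pass_set_intersection(r_blocks, s_blocks):
    """Set intersection: output the R tuples that also occur in S."""
    s_tuples = {t for block in s_blocks for t in block}
    for block in r_blocks:
        for t in block:
            if t in s_tuples:
                yield t
```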
13 Nested-Loop Joins
- We now consider algorithms for the join operator.
- The simplest one is the nested-loop join, a one-and-a-half pass algorithm.
- One table is read once, the other one multiple times.
- It is not necessary that one relation fits in main memory.
- Perform the join through two nested loops over the two input relations.
14 Nested-Loop Joins
- Tuple-based nested-loop join, natural join R ⋈ S with join attribute C:

  for each r ∈ R do
    for each s ∈ S do
      if r.C = s.C then output (r, s)

- Outer relation R, inner relation S.
- One buffer for the outer relation, one buffer for the inner relation: M = 2.
- I/O cost is T(R) × T(S).
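A direct Python transcription of this tuple-at-a-time loop might look as follows (tuples modelled as dicts, the join attribute name passed as key; illustrative only):

```python
def tuple_nested_loop_join(r, s, key):
    """Tuple-based nested-loop join: the inner relation S is
    re-scanned once for every tuple of the outer relation R."""
    for rt in r:                        # outer relation R
        for st in s:                    # inner relation S
            if rt[key] == st[key]:      # join condition r.C = s.C
                yield (rt, st)
```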
15 Nested-Loop Joins
- Example:
  - relations not clustered,
  - T(R1) = 10,000, T(R2) = 5,000,
  - R1 as the outer relation.
- Cost for each R1 tuple t1: read tuple t1 + read relation R2.
- Total I/O cost is 10,000 × (1 + 5,000) = 50,010,000 IOs.
16 Nested-Loop Joins
- We can do much better by organizing the access to both relations by blocks.
- Use as much buffer space as possible (M-1 blocks) to store tuples of the outer relation.
- Block-based nested-loop join:

  for each chunk of M-1 blocks of R do
    read these blocks into the buffer
    for each block b of S do
      read b into the buffer
      for each tuple t of b do
        find the tuples of R that join with t and output the join results
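A Python sketch of the block-based variant, assuming each relation is given as a list of blocks (lists of dict tuples); organizing the current chunk in a dict keyed on the join attribute is my own shortcut for "find the tuples of R that join with t":

```python
def block_nested_loop_join(r_blocks, s_blocks, key, M):
    """Block-based nested-loop join: M-1 buffers hold a chunk of the
    outer relation R; S is scanned once per chunk."""
    chunk_size = M - 1
    for start in range(0, len(r_blocks), chunk_size):
        # "read" a chunk of M-1 blocks of R and organize it by join value
        chunk = {}
        for block in r_blocks[start:start + chunk_size]:
            for rt in block:
                chunk.setdefault(rt[key], []).append(rt)
        # scan S once for this chunk
        for block in s_blocks:
            for st in block:
                for rt in chunk.get(st[key], []):
                    yield (rt, st)
```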
17 Nested-Loop Joins
- Example:
  - R1 as the outer relation,
  - T(R1) = 10,000, T(R2) = 5,000,
  - S(R1) = S(R2) = 1/10 block (10 tuples per block),
  - M = 101: 100 buffers for R1, 1 buffer for R2,
  - 10 R1 chunks.
- Cost for each R1 chunk:
  - read chunk: 1,000 IOs,
  - read R2: 5,000 IOs.
- Total I/O cost is 10 × 6,000 = 60,000 IOs.
18 Nested-Loop Joins
- We can do even better by reversing the join order: R2 ⋈ R1.
  - T(R1) = 10,000, T(R2) = 5,000,
  - S(R1) = S(R2) = 1/10 block,
  - M = 101: 100 buffers for R2, 1 buffer for R1,
  - 5 R2 chunks.
- Cost for each R2 chunk:
  - read chunk: 1,000 IOs,
  - read R1: 10,000 IOs.
- Total I/O cost is 5 × 11,000 = 55,000 IOs.
19 Nested-Loop Joins
- Finally, performance improves dramatically when the input relations are clustered.
- With clustered relations, for each R2 chunk:
  - read chunk: 100 IOs,
  - read R1: 1,000 IOs.
- Total I/O cost is 5 × 1,100 = 5,500 IOs (see the cost helper below).
- Note that the I/O cost of a one-pass join (which has the minimum I/O of any join algorithm) in this example is 1,000 + 500 = 1,500 IOs.
- For comparison, the one-pass join requires M = 501 buffer blocks; its I/O cost is optimal.
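The chunk-by-chunk accounting above can be packaged into a tiny cost helper. This is my own sketch of the formula, valid only under the stated assumptions (clustered relations, outer relation read in chunks of M-1 blocks, inner relation scanned once per chunk):

```python
from math import ceil

def bnl_cost_clustered(b_outer, b_inner, M):
    """Approximate I/O cost of the block nested-loop join with clustered
    relations: (number of outer chunks) x (chunk read + full inner scan)."""
    chunks = ceil(b_outer / (M - 1))
    return chunks * ((M - 1) + b_inner)

print(bnl_cost_clustered(b_outer=500,  b_inner=1000, M=101))  # R2 outer: 5,500 IOs
print(bnl_cost_clustered(b_outer=1000, b_inner=500,  M=101))  # R1 outer: 6,000 IOs
```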
20 Two-Pass Algorithms Based on Sorting
- If the input relations are sorted, the efficiency of duplicate elimination, set-theoretic operations and join can be greatly improved.
- Reserve one buffer for each of the input relations R and S and another buffer for the output.
- Scan both relations simultaneously in sort order, merging matching tuples.
- For example, for set intersection repeatedly consider the tuple t that is least in the sort order (w.r.t. the primary key) among all tuples in the input buffers. If it appears in both R and S, output t.
21 Two-Pass Algorithms Based on Sorting
- In the following, we present a simple sort-merge join algorithm.
- It is called merge-join if step (1) can be skipped because the input relations R1 and R2 are already sorted.
- Sort-merge join:

  (1) if R1 and R2 are not sorted, sort them
  (2) i ← 1; j ← 1
      while (i ≤ T(R1)) ∧ (j ≤ T(R2)) do
        if R1[i].C = R2[j].C then outputTuples
        else if R1[i].C > R2[j].C then j ← j + 1
        else if R1[i].C < R2[j].C then i ← i + 1
22 Two-Pass Algorithms Based on Sorting
- Procedure outputTuples produces all pairs of tuples from R1 and R2 with C-value c = R1[i].C = R2[j].C.
- In the worst case, we need to match each pair of tuples from R1 and R2 (nested-loop join).
- Procedure outputTuples:

  while (R1[i].C = R2[j].C) ∧ (i ≤ T(R1)) do
    jj ← j
    while (R1[i].C = R2[jj].C) ∧ (jj ≤ T(R2)) do
      output pair (R1[i], R2[jj])
      jj ← jj + 1
    i ← i + 1
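Putting the main loop and outputTuples together, a compact Python version of the sort-merge join could look like this (lists of dict tuples, join attribute passed as key; if the inputs are already sorted, the two sorted() calls correspond to skipping step (1)):

```python
def sort_merge_join(r, s, key):
    """Sort-merge join: sort both inputs on the join attribute, then merge."""
    r = sorted(r, key=lambda t: t[key])     # step (1)
    s = sorted(s, key=lambda t: t[key])
    out, i, j = [], 0, 0
    while i < len(r) and j < len(s):        # step (2): merge scan
        if r[i][key] < s[j][key]:
            i += 1
        elif r[i][key] > s[j][key]:
            j += 1
        else:
            # outputTuples: emit all pairs sharing the current value c
            c = r[i][key]
            group_start = j
            while i < len(r) and r[i][key] == c:
                jj = group_start
                while jj < len(s) and s[jj][key] == c:
                    out.append((r[i], s[jj]))
                    jj += 1
                i += 1
            j = jj                          # skip past the group in s
    return out
```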
23 Two-Pass Algorithms Based on Sorting
- Example (join attribute C):

   i   R1[i].C   R2[j].C   j
   1     10         5      1
   2     20        20      2
   3     20        20      3
   4     30        30      4
   5     40        30      5
   6     50
   7     52
24 Two-Pass Algorithms Based on Sorting
- Example:
  - both R1 and R2 are ordered by C, and both relations are clustered,
  - B(R1) = 1,000, B(R2) = 500.

[Figure: both sorted relations are scanned through main-memory buffers and merged into the join result R1 ⋈ R2]

- Total cost = cost of reading R1 + cost of reading R2 = 1,000 + 500 = 1,500 IOs.
25 Two-Pass Algorithms Based on Sorting
- What if the input relations are not yet in the required sort order?
- Do a Two-Phase, Multiway Merge-Sort (2PMMS).
- Phase 1: repeatedly fill the M available buffer blocks with blocks of relation R, sort their tuples in main memory, and write the sorted sublists back to disk.
- Phase 2: merge all the B(R)/M sorted sublists.

[Figure: merge phase of 2PMMS — one input buffer per sorted sublist with a pointer to its first unchosen tuple; the smallest unchosen tuple is repeatedly selected and moved to the output buffer]
26 Two-Pass Algorithms Based on Sorting
- Each sorted sublist has a length of M blocks.
- The number of sublists is B(R)/M.
- Therefore, the merge phase needs one input buffer per sublist plus one output buffer: B(R)/M ≤ M - 1.
- This means we require B(R) ≤ M(M-1), i.e., roughly B(R) ≤ M².
- In phase 1, each tuple is read and written once. In phase 2, each tuple is read again. We ignore the cost of writing the result to disk.
- Thus, the I/O cost is 3B(R).
27 Two-Pass Algorithms Based on Sorting
- The I/O cost is 4B(R) if sorting is used as a first step of the sort-join and the results must be written to disk.
- If relation R is too big, apply the idea recursively.
- Divide R into chunks of size M(M-1), use 2PMMS to sort each one, and take the resulting sorted lists as input for a third (merge) phase.
- This leads to Multi-Phase, Multiway Merge Sort.
28 Two-Pass Algorithms Based on Sorting
- (i) For each 100-block chunk of R:
  - read the chunk,
  - sort it in memory,
  - write the sorted chunk to disk.

[Figure: R1 and R2 are read chunk by chunk through memory and written back to disk as sorted chunks]
29 Two-Pass Algorithms Based on Sorting
- (ii) Read all sorted chunks, merge them, and write out the sorted file.

[Figure: the sorted chunks are merged through memory into one sorted file]
30 Two-Pass Algorithms Based on Sorting
- Sort cost: each tuple is read, written, read, and written.
- Join cost: each tuple is read once more.
- Sort cost R1: 4 × 1,000 = 4,000 IOs.
- Sort cost R2: 4 × 500 = 2,000 IOs.
- Total cost = sort cost + join cost = 6,000 + 1,500 = 7,500 IOs.
31 Two-Pass Algorithms Based on Sorting
- The nested-loop join (best version discussed above) needs only 5,500 IOs, i.e. it outperforms the sort-join here.
- However, the situation changes for the following scenario:
  - R1: 10,000 blocks, clustered,
  - R2: 5,000 blocks, not ordered.
- (Memory note for the original example: R1 is 1,000 blocks, so sorting needs M ≥ √1,000 ≈ 31.62; R2 is 500 blocks, so sorting needs M ≥ √500 ≈ 22.36. I.e., the sort-join needs at least M = 32 buffers.)
32 Two-Pass Algorithms Based on Sorting
- Nested-loop join: (5,000 / 100) × (100 + 10,000) = 50 × 10,100 = 505,000 IOs.
- Sort-join: 5 × (10,000 + 5,000) = 75,000 IOs.
- The sort-join clearly outperforms the nested-loop join!
33 Two-Pass Algorithms Based on Sorting
- The simple sort-join costs 5(B(R) + B(S)) IOs.
- It requires B(R) ≤ M² and B(S) ≤ M², since each relation is sorted with 2PMMS.
- It assumes that the tuples with the same join attribute value fit in M blocks.
- If we do not have to worry about large numbers of tuples with the same join attribute value, then we can combine the second phase of the sort with the actual join (merge).
- We can save the writing to disk in the sort step and the reading in the merge step.
34 Two-Pass Algorithms Based on Sorting
- This algorithm is an advanced sort-merge join.
- Repeatedly find the least C-value c among the tuples in all input buffers.
- Instead of writing a sorted output buffer to disk and reading it again later, identify all the tuples of both relations that have C = c and join them immediately.
- The cost is only 3(B(R) + B(S)) IOs.
- Since we have to sort both input relations simultaneously and keep their sublists in memory during the merge, the memory requirement grows: roughly B(R) + B(S) ≤ M² must hold.
35 Two-Pass Algorithms Based on Hashing
- If both input relations are too large to be stored in the buffer, hash all the tuples of both relations, applying the same hash function to the join attribute(s).
- The hash function h has a range of k hash values, i.e. k buckets.
- Only tuples from R and S that fall into the same bucket i can join.
- Hash first relation R, then relation S, writing the buckets to disk.
36 Two-Pass Algorithms Based on Hashing
- To hash relation R, read it block by block.
- Allocate one buffer block to each of the k buckets.
- For each tuple t, move it to the buffer of bucket h(t).
- If a buffer is full, write it to disk and re-initialize it.
- Finally, write all partially full buffer blocks to disk.
- The I/O cost is B(R) for reading R; writing the buckets costs roughly another B(R).
- Memory requirement: M ≥ k + 1.
37 Two-Pass Algorithms Based on Hashing
- For each i, read the i-th bucket of R completely into memory, and read the i-th bucket of S into memory one block at a time.
- For each S tuple s in the buffer block, determine the matching tuples r in R and output (r, s).
- We assume that each bucket of R fits into main memory.
38 Two-Pass Algorithms Based on Hashing
- Hash join with hash function h, range 0 ... k:
  - buckets for R1: G0, G1, ..., Gk,
  - buckets for R2: H0, H1, ..., Hk.
- Algorithm:
  (1) hash the R1 tuples into the G buckets,
  (2) hash the R2 tuples into the H buckets,
  (3) for i = 0 to k do: match the tuples in buckets Gi and Hi and output the results.
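An in-memory Python sketch of this two-pass hash join; in a real system the G and H buckets would be written to disk after partitioning and read back pair-wise, and the per-bucket table would use a second hash function, as discussed below. The names and the use of Python's built-in hash are illustrative.

```python
from collections import defaultdict

def hash_join(r1, r2, key, k=100):
    """Two-pass hash join: partition both relations with the same hash
    function, then join the matching bucket pairs (Gi, Hi)."""
    h = lambda t: hash(t[key]) % k          # partitioning hash function

    g = defaultdict(list)                   # buckets G0 .. G(k-1) for R1
    hb = defaultdict(list)                  # buckets H0 .. H(k-1) for R2
    for t in r1:                            # (1) hash R1 tuples
        g[h(t)].append(t)
    for t in r2:                            # (2) hash R2 tuples
        hb[h(t)].append(t)

    out = []
    for i in range(k):                      # (3) match Gi with Hi
        table = defaultdict(list)           # in-memory table on the R1 bucket
        for t in g[i]:
            table[t[key]].append(t)
        for t in hb[i]:                     # probe with the R2 bucket
            for rt in table[t[key]]:
                out.append((rt, t))
    return out
```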
39 Two-Pass Algorithms Based on Hashing
- Example, hash function h(v) = v mod 2 (even/odd buckets):

  R1 join values: 2, 4, 3, 5, 8, 9        R2 join values: 5, 4, 12, 3, 13, 8, 11, 14

  Buckets:
    Even:  R1: 2, 4, 8      R2: 4, 12, 8, 14
    Odd:   R1: 3, 5, 9      R2: 5, 3, 13, 11
40 Two-Pass Algorithms Based on Hashing
- R, S clustered (un-ordered).
- Use 100 hash buckets of 10 blocks each.
- To hash R: read R, hash each tuple, and write the buckets to disk.
- Hash S in the same way.

[Figure: R is read through memory and partitioned into 100 buckets of 10 blocks each on disk]
41 Two-Pass Algorithms Based on Hashing
- Suppose R is the smaller of the input relations.
- Read one R bucket and build a main-memory hash table for it (with another hash function).
- Read the corresponding S bucket, one block at a time, and probe the hash table.
- Repeat the same procedure for all buckets.

[Figure: one R bucket is held in memory as a hash table while the corresponding S bucket is streamed through a single buffer]
42 Two-Pass Algorithms Based on Hashing
- Cost:
  - bucketize R: read R + write buckets,
  - bucketize S: read S + write buckets,
  - join: read the R and S buckets.
- Total cost = 3 × (1,000 + 500) = 4,500 IOs.
- This is an approximation, since the buckets will vary in size and we have to round up to full blocks.
43 Two-Pass Algorithms Based on Hashing
- Memory requirements:
  - the size of an R bucket is about B(R) / (M-1),
  - where k = M-1 is the number of hash buckets.
- This assumes that all hash buckets of R have the same size.
- The same calculation applies to S.
- The buckets of the smaller input relation must fit into main memory, i.e., B(R) / (M-1) ≤ M-1, or roughly B(R) ≤ M².
44 Index-Based Algorithms
- Index-based algorithms are especially useful for the selection operator, but also for the join operator.
- We distinguish clustering and non-clustering indexes.
- A clustering index is an index where all tuples with a given search key value appear on (roughly) as few blocks as possible.
- One relation can have only one clustering index, but multiple non-clustering indexes.
45 Index-Based Algorithms
- Index join (assuming an index on S.C exists):

  for each r ∈ R do
    X ← index(S, C, r.C)
    for each s ∈ X do
      output (r, s)

- index(rel, attr, value) returns the set of rel tuples with attr = value.
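A Python sketch of the index join where the "index" on S.C is simply an in-memory dict; in a DBMS it would be a B-tree or hash index that may itself reside on disk. Illustrative only:

```python
from collections import defaultdict

def index_join(r, s, key):
    """Index join: probe an index on S.key once for every tuple of R."""
    index = defaultdict(list)           # index(S, key, v) corresponds to index[v]
    for st in s:
        index[st[key]].append(st)

    out = []
    for rt in r:                        # one index lookup per R tuple
        for st in index[rt[key]]:
            out.append((rt, st))
    return out
```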
46 Index-Based Algorithms
- Example:
  - assume an index on R.C exists, with 2 levels,
  - assume S is clustered and unordered,
  - assume the R.C index fits in memory.
- Cost: reading S = 500 IOs.
- For each S tuple:
  - probe the index: no IO,
  - if there is a match, read the R tuple: 1 IO.
47Index-Based Algorithms
What is expected number of matching tuples?
(a) say R.C is key, S.C is foreign key then
expect 1 match
(b) say V(R,C) 5000, T(R) 10,000 with
uniform distribution assumption expect
10,000/5,000 2 matching tuples.
48 Index-Based Algorithms
- What is the expected number of matching tuples?
- (c) Say DOM(R,C) = 1,000,000 and T(R) = 10,000: with the alternate assumption, we expect 10,000 / 1,000,000 = 1/100 matching tuples.
49 Index-Based Algorithms
- Total cost of the index join:
  - (a) total cost = 500 + 5,000 × (1) × 1 = 5,500 IOs,
  - (b) total cost = 500 + 5,000 × (2) × 1 = 10,500 IOs,
  - (c) total cost = 500 + 5,000 × (1/100) × 1 = 550 IOs.
50 Index-Based Algorithms
- What if the index does not fit in memory?
- Example: say the R1.C index occupies 201 blocks.
- Keep the root and 99 leaf nodes in memory.
- Expected cost of each probe is E = (0) × (99/200) + (1) × (101/200) ≈ 0.5 IOs.
51 Index-Based Algorithms
- Total cost (including index probes):
- For case (b):
  - 500 + 5,000 × (probe + get records)
  - = 500 + 5,000 × (0.5 + 2)   (uniform assumption)
  - = 500 + 12,500 = 13,000 IOs.
- For case (c):
  - 500 + 5,000 × (0.5 + (1/100) × 1)
  - = 500 + 2,500 + 50 = 3,050 IOs.
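The cost formula used in these examples can be checked with a small helper; the function and its parameter names are mine, and it simply encodes "read S once, then pay one index probe plus the expected matching R tuples per S tuple":

```python
def index_join_cost(b_s, t_s, probe_cost, expected_matches, io_per_match=1):
    """Rough I/O estimate for the index join: B(S) + T(S) * (probe + matches)."""
    return b_s + t_s * (probe_cost + expected_matches * io_per_match)

# Case (b): index partly in memory, uniform assumption (2 matches per S tuple).
print(index_join_cost(500, 5000, probe_cost=0.5, expected_matches=2))     # 13000.0
# Case (c): alternate assumption (1/100 matches per S tuple).
print(index_join_cost(500, 5000, probe_cost=0.5, expected_matches=0.01))  # 3050.0
```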
52 Summary: Join Algorithms
- Nested-loop join: ok for small relations (relative to memory size).
- Hash join: usually best for equi-joins where the relations are not sorted and no indexes exist.
- Sort-merge join: good for non-equi-joins, e.g., R.C > S.C.
- If the relations are already sorted, use the merge join.
- If an index exists, the index join can be efficient (depends on the expected result size).