Title: Query Processing
1Query Processing
2Basic Steps in Query Processing
- 1. Parsing and translation
- 2. Optimization
- 3. Evaluation
3Basic Steps in Query Processing (Cont.)
- Parsing and translation
- translate the query into its internal form. This
is then translated into relational algebra. - Parser checks syntax, verifies relations
- Evaluation
- The query-execution engine takes a
query-evaluation plan, executes that plan, and
returns the answers to the query.
4Basic Steps Optimization
- A relational algebra expression may have many
equivalent expressions - E.g., ?balance?2500(?balance(account)) is
equivalent to ?balance(?balance?2500(acc
ount)) - Each relational algebra operation can be
evaluated using one of several different
algorithms - Correspondingly, a relational-algebra expression
can be evaluated in many ways. - Annotated expression specifying detailed
evaluation strategy is called an evaluation-plan. - e.g., can use an index on balance to find
accounts with balance lt 2500, - or, can perform complete relation scan and
discard accounts with balance ? 2500
5Basic Steps Optimization (Cont.)
- Query Optimization Amongst all equivalent
evaluation plans choose the one with lowest cost.
- Cost is estimated using statistical information
from the database catalog - e.g. number of tuples in each relation, size of
tuples, etc. - We first study
- How to measure query costs
- Algorithms for evaluating relational algebra
operations - How to combine algorithms for individual
operations in order to evaluate a complete
expression - Then
- We study how to optimize queries, that is, how to
find an evaluation plan with lowest estimated cost
6Measures of Query Cost
- Cost is generally measured as total elapsed time
for answering query - Many factors contribute to time cost
- disk accesses, CPU, or even network communication
- Typically disk access is the predominant cost,
and is also relatively easy to estimate.
Measured by taking into account - Number of seeks average-seek-cost
- Number of blocks read average-block-transfer-co
st - Number of blocks written average-block-transfer-
cost
7Statistics and Catalogs
- Need information about the relations and indexes
involved. Catalogs typically contain at least - tuples (NTuples) and pages (NPages) for each
relation. - distinct key values (NKeys) and NPages for each
index. - Index height, low/high key values (Low/High) for
each tree index. - Catalogs updated periodically.
- Updating whenever data changes is too expensive
lots of approximation anyway, so slight
inconsistency ok. - More detailed information (e.g., histograms of
the values in some field) are sometimes stored.
8Relational Operations
- We will consider how to implement
- Selection ( )
- Projection ( )
- Join ( )
- Since each op returns a relation, ops can be
composed. After we cover the operations, we will
discuss how to optimize queries formed by
composing them.
9Computing Selection ?(attr op value)
- No index on attr
- If rows are not sorted on attr
- Scan all data pages to find rows satisfying
selection condition - Cost F (number of pages in file to be sorted)
- If rows are sorted on attr and op is , gt, lt
then - Use binary search (at log2 F ) to locate first
data page containing row in which (attr value) - Scan further to get all rows satisfying (attr op
value) - Cost log2 F (cost of scan)
10Computing Selection ?(attr op value)
- Clustered B tree index on attr (for or
range search) - Locate first index entry corresponding to a row
in which (attr value). Cost depth of tree - Rows satisfying condition packed in sequence in
successive data pages scan those pages. - Cost number of pages occupied by qualifying
rows
B tree
index entries (containing rows) that
satisfy condition
11Computing Selection ?(attr op value)
- Unclustered B tree index on attr (for or
range search) - Locate first index entry corresponding to a row
in which (attr value). - Cost depth of tree
- Index entries with pointers to rows satisfying
condition are packed in sequence in successive
index pages - Scan entries and sort record Ids to identify
table data pages with qualifying rows - Any page that has at least one such row must be
fetched once. - Cost number of rows that satisfy selection
condition
12Unclustered B Tree Index
index entries (containing row Ids) that
satisfy condition
data page
Data file
B Tree
13Computing Selection ?(attr value)
- Hash index on attr (for search only)
- Hash on value. Cost ? 1.2
- 1.2 typical average cost of hashing (gt 1 due
to possible overflow chains) - Finds the (unique) bucket containing all index
entries satisfying selection condition - Clustered index all qualifying rows packed in
the bucket (a few pages) - Cost number of pages occupies by the bucket
- Unclustered index sort row Ids in the index
entries to identify data pages with qualifying
rows - Each page containing at least one such row must
be fetched once - Cost number of rows in bucket
14Computing Selection ?(attr value)
- Unclustered hash index on attr (for equality
search)
buckets
data pages
15Access Path
- Access path is the notion that denotes algorithm
data structure used to locate rows satisfying
some condition - Examples
- File scan can be used for any condition
- Hash equality search all search key
attributes of hash index are specified in
condition - B tree equality or range search a prefix of
the search key attributes are specified in
condition - B tree supports a variety of access paths
- Binary search Relation sorted on a sequence of
attributes and some prefix of that sequence is
specified in condition
16Access Paths Supported by B tree
- Example Given a B tree whose search key is the
sequence of attributes a2, a1, a3, a4 - Access path for search ?a1gt5 ? a23 ? a3x (R)
find first entry having a23 ? a1gt5 ? a3x and
scan leaves from there until entry having a2gt3 or
a3 ? x. Select satisfying entries - Access path for search ? a23 ? a3 gtx (R)
locate first entry having a23 and scan leaves
until entry having a2gt3. Select satisfying
entries - Access path for search ? a1gt5 ? a3 x (R)
Scan of R
17Choosing an Access Path
- Selectivity of an access path number of pages
retrieved using that path - If several access paths support a query, DBMS
chooses the one with lowest selectivity - Size of domain of attribute is an indicator of
the selectivity of search conditions that involve
that attribute - Example ? CrsCodeCS305 ? GradeB
- a B tree with search key CrsCode has lower
selectivity than a B tree with search key Grade
18Schema for Examples
Sailors (sid integer, sname string, rating
integer, age real) Reserves (sid integer, bid
integer, day dates, rname string)
- Similar to old schema rname added for
variations. - Reserves
- Each tuple is 40 bytes long, 100 tuples per
page, 1000 pages. - Sailors
- Each tuple is 50 bytes long, 80 tuples per page,
500 pages.
19Simple Selections
SELECT FROM Reserves R WHERE R.rname
Joe
- With no index, unsorted
- Must essentially scan the whole relation cost
is 1000 I/Os (pages in R). - With no index, sorted data
- Utilize the sort order on rname by doing a
binary search to locate the first Joe. Cost is
log2 1000 ? 10 I/Os. - With a B tree index on selection attribute
- Use index to find qualifying data entries, then
retrieve corresponding data records. Cost of
finding the starting page is 2 or 3 I/Os for a
clustered index add one more I/O for an
unclustered index add one page per qualifying
tuple. - Hash index
- 1 or 2 I/Os to retrieve the index pages. If 100
reservations by Joe then an additional 1-100 disk
accesses depending how these records are
distributed.
20Using an Index for Selections
SELECT FROM Reserves R WHERE R.rname lt
C
- Cost depends on qualifying tuples, and
clustering. - Assume we estimate roughly 10 of Reserves tuples
will be in result ( 10,000 tuples, or 100
pages). - With a clustered index cost is 100 I/Os 1 or 2
disk accesses for index. - With an unclustered index cost could be as high
as 10,000 I/Os in the worst case. (might be
cheaper to simply scan the entire relation)
21A Note on Complex Selections
(daylt8/9/94 AND rnamePaul) OR bid5 OR sid3
- Selection conditions are first converted to
conjunctive normal form (CNF)
- (daylt8/9/94 OR bid5 OR sid3 ) AND
(rnamePaul OR bid5 OR sid3)
22Two Approaches to General Selections
- Consider the selection condition
- daylt8/9/94 AND bid5 AND sid3
- First approach Find the most selective access
path, retrieve tuples using it, and apply any
remaining terms that dont match the index - A B tree index on day can be used then, bid5
and sid3 must be checked for each retrieved
tuple. - Similarly, a hash index on ltbid, sidgt could be
used daylt8/9/94 must then be checked. - Terms that match the index reduce the number of
tuples retrieved other terms are used to discard
some retrieved tuples, but do not affect number
of tuples/pages fetched.
23Intersection of Rids
- Second approach (if we have 2 or more matching
indexes) - Get sets of rids of data records using each
matching index. - Then intersect these sets of rids.
- Retrieve the records and apply any remaining
terms. - For the given example condition
- If we have a B tree index on day and an index
on sid, we can retrieve rids of records
satisfying daylt8/9/94 using the first, rids of
records satisfying sid3 using the second,
intersect, retrieve records and check bid5.
24The Projection Operation
- To implement projection we have to do the
following - Remove unwanted attributes.
- Eliminate any duplicate tuples produced.
- The expensive part is removing duplicates.
- SQL systems dont remove duplicates unless the
keyword DISTINCT is specified in a query. - There are two basic algorithms
- Sorting Approach.
- Hashing Approach.
25Approach based on sorting
- Modify Pass 1 of external sort to eliminate
unwanted fields. If B buffer pages are
available, runs of about 2B pages can be
produced, but tuples in runs are smaller than
input tuples. (Size ratio depends on and size
of fields that are dropped.) - Modify merging passes to eliminate duplicates.
Thus, number of result tuples smaller than input.
(Difference depends on of duplicates.)
26Example
SELECT DISTINCT R.sid, R.bid FROM Reserves
R
- Cost
- In Pass 1, read original relation (1000 pages),
write out same number of smaller tuples. - Assume we have 20 buffer pages
- Assume that each smaller tuple is 10 bytes long.
- Thus cost is 250 pages (7 runs about 40 pages
each). - In merging passes, fewer tuples written out in
each pass. - The temporary relation can be merged in 1 pass,
since we have 20 buffer pages. - We read the runs at a cost of 250 I/Os and merge
them. - The total cost is 1500 I/Os.
27Projection Based on Hashing
- Partitioning phase
- Read R using one input buffer. For each tuple,
discard unwanted fields, apply hash function h1
to choose one of B-1 output buffers. - Result is B-1 partitions (of tuples with no
unwanted fields). 2 tuples from different
partitions guaranteed to be distinct. - Duplicate elimination phase
- For each partition, read it and build an
in-memory hash table, using hash fn h2 (? h1) on
all fields, while discarding duplicates. - If partition does not fit in memory, can apply
hash-based projection algorithm recursively to
this partition. - Cost For partitioning, read R, write out each
tuple, but with fewer fields. This is read in
next phase. - In our projection example this cost is 1000 2
250 1500 I/Os.
28Discussion of Projection
- Sort-based approach is the standard better
handling of duplicate elimination and result is
sorted. - If an index on the relation contains all wanted
attributes in its search key, can do index-only
scan. - Apply projection techniques to data entries (much
smaller!) - If an ordered (i.e., tree) index contains all
wanted attributes as prefix of search key, can do
even better - Retrieve data entries in order (index-only scan),
discard unwanted fields, compare adjacent tuples
to check for duplicates.
29Computing Joins R AB S
- The cost of joining two relations makes the
choice of a join algorithm crucial - Assume M pages in R, pR tuples per page, N pages
in S, pS tuples per page. - In our examples, R is Reserves and S is Sailors.
- Cost metric of I/Os. We will ignore output
costs.
30Simple Nested Loops Join
foreach tuple r in R do foreach tuple s in S
do if ri sj then add ltr, sgt to result
- For each tuple in the outer relation R, we scan
the entire inner relation S. - Cost M pR M N 1000 1001000500
I/Os. - Page-oriented Nested Loops join For each page
of R, get each page of S, and write out matching
pairs of tuples ltr, sgt, where r is in R-page and
S is in S-page. - Cost M MN 1000 1000500
- If smaller relation (S) is outer, cost 500
5001000 - Choose smaller relation for the outer loop
31Block Nested Loops Join
- Use one page as an input buffer for scanning the
inner S, one page as the output buffer, and use
all remaining pages to hold block of outer R. - For each matching tuple r in R-block, s in
S-page, add ltr, sgt to result. Then read next
R-block, scan S, etc. - Cost can be reduced to
- M (M/(B-2)) ? N (by using B
buffer pages instead of 1.)
R S
Join Result
Hash table for block of R (B-2 pages)
. . .
. . .
Input buffer for S
Output buffer
32Examples of Block Nested Loops
- Cost Scan of outer outer blocks scan of
inner - outer blocks ? of pages of outer / blocksize?
- i.e. Cost M N ?M/(B-2) ?
- With Reserves (R) as outer, and 100 pages of R
- Cost of scanning R is 1000 I/Os a total of 10
blocks. - Per block of R, we scan Sailors (S) 10500
I/Os. - If space for just 90 pages of R, we would scan S
12 times. - With 100-page block of Sailors as outer
- Cost of scanning S is 500 I/Os a total of 5
blocks. - Per block of S, we scan Reserves 51000 I/Os.
33Index Nested Loops Join
foreach tuple r in R do foreach tuple s in S
where ri sj do add ltr, sgt to result
- If there is an index on the join column of one
relation (say S), can make it the inner and
exploit the index. - Cost M ( (MpR) cost of finding matching S
tuples) - For each R tuple, cost of probing S index is
about 1.2 for hash index, 2-4 for B tree. Cost
of then finding S tuples depends on clustering. - Clustered index 1 I/O (typical), unclustered
upto 1 I/O per matching S tuple. - Effective if number of rows of S that match
tuples in R is small and index is clustered
34Examples of Index Nested Loops
- Hash-index (unclustered) on sid of Sailors (as
inner) - Scan Reserves 1000 page I/Os, 1001000 tuples.
- For each Reserves tuple 1.2 I/Os to get data
entry in index, plus 1 I/O to get (the exactly
one) matching Sailors tuple. Total 220,000
I/Os. - Hash-index (unclustered) on sid of Reserves (as
inner) - Scan Sailors 500 page I/Os, 80500 tuples.
- For each Sailors tuple 1.2 I/Os to find index
page with data entries (1.2 40000 48,500
I/Os), plus cost of retrieving matching Reserves
tuples. Assuming uniform distribution, 2.5
reservations per sailor (100,000 / 40,000), thus
cost of retrieving them is 2.540,000
I/Os.Total cost is 48,500 100,000 148,500
I/Os (still better than simple nested loops)
35Sort-Merge Join R AB S
sort R on A sort S on B while !eof(R) and
!eof(S) do scan r and s concurrently until
tr.Ats.B if (tr.Ats.Bc) output
?Ac(r)??Bc (s)
?Ac(r)
r
?
s
?Bc (s)
36Sort-Merge Join
- Cost of sorting assuming B buffers
2 M log B-1 M 2 N log B-1 N - Cost of merging depends on whether ?Ac(R) and
?Bc (S) can be fit in buffers - If yes, merge step takes M N
- In no, then each ?Ac(r)??Bc (s) has to be
computed using block-nested join. -
37Example of Sort-Merge Join
- Both Reserves and Sailors can be sorted in 2
passes. - With 100 buffer pages Sorting R 22 1000
sorting S 22 500 Thus total join cost 7500.
38Hash-Join
- Partition both relations using hash fn h R
tuples in partition i will only match S tuples in
partition i.
- Read in a partition of R, hash it using h2 (ltgt
h!). Scan matching partition of S, search for
matches.
39Observations on Hash-Join
- partitions k ? B-1 (why?), and B-2 gt size of
largest partition to be held in memory. Assuming
uniformly sized partitions, and maximizing k, we
get - k B-1, and M/(B-1) lt B-2, i.e., B must be gt
- If we build an in-memory hash table to speed up
the matching of tuples, a little more memory is
needed. - If the hash function does not partition
uniformly, one or more R partitions may not fit
in memory. Can apply hash-join technique
recursively to do the join of this R-partition
with corresponding S-partition.
40Cost of Hash-Join
- In partitioning phase, readwrite both relns
2(MN). In matching phase, read both relns MN
I/Os. - Sort-Merge Join vs. Hash Join
- Given a minimum amount of memory, both have a
cost of 3(MN) I/Os. Hash Join superior on this
count if relation sizes differ greatly. Also,
Hash Join shown to be highly parallelizable. - Sort-Merge less sensitive to data skew result is
sorted.
41General Join Conditions
- Nested loop and block nested loops can be used
regardless of the join condition. - Equalities over several attributes (e.g.,
R.sidS.sid AND R.rnameS.sname) - For Index NL, build index on ltsid, snamegt (if S
is inner) or use existing indexes on sid or
sname. - For Sort-Merge and Hash Join, sort/partition on
combination of the two join columns. - Inequality conditions (e.g., R.rname lt S.sname)
- For Index NL, need (clustered!) B tree index.
- Range probes on inner matches likely to be
much higher than for equality joins. - Hash Join, Sort Merge Join not applicable.
- Block NL quite likely to be the best join method
here.