Title: Chapter 13 Query Processing
1Chapter 13 Query Processing
- Melissa Jamili
- CS 157B
- November 11, 2004
2Outline
- Overview
- Measures of query cost
- Selection operation
- Basic algorithms
- Selections using indices
- Selections involving comparisons
- Implementation of complex selections
- Sorting
3Overview
- Query processing the range of activities
involved in extracting data from a database. - Includes translation of queries,
query-optimizing transformations, evaluation of
queries.
4Overview (cont.)
- Steps involved in processing a query
- Parsing and translation
- Optimization
- Evaluation
5Step 1. Parsing and translation
- System checks the syntax of the query.
- Creates a parse-tree representation of the query.
- Translates the query into a relational-algebra
expression.
6Step 2. Optimization
- Each relational-algebra operation can be executed
by one of several different algorithms. - Ex. Every tuple can be searched linearly or a B
tree can be used to index the tuples. - A query optimizer must know the cost of each
operation.
7Step 3. Evaluation
- A relational-algebra expression and evaluation
primitive are needed. - evaluation primitive - relational-algebra
expression annotated with instructions specifying
how to evaluate each operation.
8Step 3. Evaluation (cont.)
- Query execution plan or query-evaluation plan
sequence of primitive operations that can be used
to evaluate the expression.
? balance (s balance lt 2500 (account))
9Step 3. Evaluation (cont.)
- Then the query-execution engine takes a
query-evaluation plan, executes that plan, and
returns the answers to the query.
10Steps in query processing
11Measures of Query Cost
- Measured in terms of different resources,
including disk accesses and CPU execution time - Measured in response time for a query-evaluation
plan (clock time to execute the plan)
12Measures of Query Cost (cont.)
- Disk accesses, number of block transfers from
disk, are usually the most important cost - To calculate add these numbers
- of seek operations performed
- of blocks read
- of blocks written
- After multiplying them by the average seek time,
average transfer time, and average transfer time,
respectively.
13Selection Operation
- File scan is the lowest-level operator to access
data. - File scans are search algorithms that locate and
retrieve records that fulfill a selection
condition. - Allows an entire relation to be read where the
relation is stored in a single, dedicated file.
14Basic Algorithms
- Two basic scan algorithms to implement the
selection operation - A1 (linear search).
- The system scans each file block and tests all
records to see whether they satisfy the selection
condition. - System terminates if the required record is
found, without looking at the other records. - A2 (binary search).
- System performs the binary search on the blocks
of the file. - File is ordered on an attribute, and selection
condition is an equality comparison.
15Selections Using Indices
- Index structures referred to a access paths,
since they provide a path through which data can
be located and accessed. - Primary index allows records to be read in an
order that corresponds to the physical order in
the file. - Secondary index any index that is not a primary
index.
16Selections Using Indices (cont.)
- A3 (primary index, equality on key).
- Use the index to retrieve a single record that
satisfies the corresponding equality condition. - A4 (primary index, equality on nonkey).
- Same as A3, but multiple records may need to be
fetched.
17Selections Using Indices (cont.)
- A5 (secondary index, equality).
- Retrieve a single record if the equality
condition is on a key. - Retrieve multiple records if the indexing field
is not a key.
18Selections Involving Comparisons
- Consider the form sA v (r).
- Can be implemented by linear or binary search or
by using indices - A6 (primary index, comparison).
- For A v, look for first tuple that has the
value of A v, return all the tuples starting
from the tuple up to the end of the file. - For A lt v, file scan from the beginning up to
(but not including) the first tuple with
attribute A v.
19Selections Involving Comparisons (cont.)
- A7 (secondary index, comparison).
- The lowest-level blocks are scanned
- From the smallest value up to v for lt and .
- From v up to the maximum value for gt and .
20Implementation of Complex Selections
- Previously only considered simple selection
conditions with an equality or comparison
operation. - Now consider more complex selection predicates.
21Implementation of Complex Selections (cont.)
- Conjunction s?1 n ?2 n n ?n (r)
- Disjunction s?1 ? ?2 ? ? ?n (r)
- Negation s-? (r)
22Implementation of Complex Selections (cont.)
- A8 (conjunctive selection using one index).
- Determine if there is an access path for an
attribute in one of the simple conditions. - If yes, algorithms A2 through A7 can be used.
- On all records retrieved test if they satisfy the
remaining simple conditions.
23Implementation of Complex Selections (cont.)
- A9 (conjunctive selection using composite index
(multiple attributes). - Search index directly if selection specifies an
equality condition on 2 or move attributes and a
composite index exists. - Type of index determines if A3, A4, or A5 will be
used.
24Implementation of Complex Selections (cont.)
- A10 (conjunction selection by intersection of
identifiers). - Requires indices with record pointers.
- Scan each index for pointers that satisfy an
individual condition. - Take intersection of all retrieved pointers to
get set of pointers that satisfy the conjunctive
condition. - Then pointers used to retrieve actual records.
- If indices not available on all conditions, then
algorithm tests retrieved records against
remaining conditions.
25Implementation of Complex Selections (cont.)
- A11 (disjunctive selection by union of
identifiers). - Each index is scanned for pointers to tuples that
satisfy the individual condition. - Union of all the retrieved pointers gives the set
of pointers to all tuples that satisfy the
disjunctive condition. - Pointers then used to retrieve actual records.
26Sorting
- Focus on external sorting where relations that
are bigger than memory. - Most common is external merge-sort algorithm.
27Sorting (cont.)
- N-way merge
- A number of runs are created and sorted.
- Runs are merged.
28Conclusion
- Query processing - translation, transformation,
and evaluation. - Query cost measured by number of disk accesses.
- Selection operation algorithms
- Sorting N-way merge.