Query Processing - PowerPoint PPT Presentation

About This Presentation

Title:

Query Processing

Description:

Disjunctions. Grouping pointers when selection is on multiple attributes: ... union or intersection, depending on the condition (disjunction or conjunction. ... – PowerPoint PPT presentation

Number of Views:15

Avg rating:3.0/5.0

Slides: 32

Provided by: richard204

Learn more at: http://www.cs.iit.edu

Category:

more less

Transcript and Presenter's Notes

Title: Query Processing

1
Query Processing
2
Steps in Query Processing

Validate and translate the query
Good syntax.
All referenced relations exist.
Translate the SQL to relational algebra.
Optimize
Make it run faster.
Evaluate

3
Translation Example

Possible SQL Query
SELECT balance
FROM account
WHERE balancelt2500
Possible Relational Algebra Query
?balance?balancelt2500(account))

4
Tree Representation of Relational Algebra

?balance?balancelt2500(account))

?balance
?balancelt2500
account
5
Making An Evaluation Plan

Annotate Query Tree with evaluation instructions
The query can now be executed by the query
execution engine.

?balance
?balancelt2500
use index 1
account
6
Before Optimizing the Query

Must predict the cost of execution plans.
Measured by
CPU time,
Number of disk block reads,
Network communication (in distributed DBs),
where C(CPU) lt C(Disk) lt C(Network).
Major factor is buffer space.
Use statistics found in the catalog to help
predict the work required to evaluate a query.

7
Disk Cost

Seek time rotational latency arm movement.
Scan time time to read the data.
Typically, seek time is orders of magnitude
greater.
Disk cost is assumed to be highest, so it can be
used to approximate total cost.

8
Reading Data, No Indices

Linear scan
Cost is a function of file size.
Binary search on ordering attribute
Cost is lg of the file size.
Requires table to be sorted.

9
Reading Data with Indices

Primary index index on sort key.
Can be dense or sparse.
Secondary index index on non-sort key.
Queries can be point queries or range queries.
Point queries return a single record.
Range queries return a sequence of consecutive
records.

10
Point Queries

Point queries
Cost index cost block read cost.
Range queries (c1 lt key lt c2)
Primary index
Cost index cost scan of blocks
Secondary index
Cost blocks(index cost scan of block)

11
More on Range Queries

Range query on sort key (c1 lt key)
c1 lt key Linear scan until you find key.
c1 gt key Use index to find key, then linear
scan.
Range query using secondary index
Scan through index blocks. Requires accessing
index for every record.

12
More Complex Selections

Conditions on multiple attributes
Negations
Disjunctions
Grouping pointers when selection is on multiple
attributes
Find a set of solutions for each condition.
Either compute its union or intersection,
depending on the condition (disjunction or
conjunction.)

13
Sorting

Sorted relations are easier to scan.
The cost of sorting a relation before querying it
can be less than querying an unsorted relation.
Two types of sorts
In memory
Out of memory (a.k.a., external sorting)

14
External Merge Sort

Use this when you cannot fit the relation in
memory.
Assume there are M memory buffers.
Two phases
Create sorted runs.
Merge sorted runs.

15
External Merge Sort, Phase 1

Fill the M memory buffers with the next M blocks
of the relation.
Sort the M blocks.
Write the sorted blocks to disk.

16
External Merge Sort, Phase 2

Assume there are at most M-1 runs.
Read the first block of each run into memory.
At each iteration, find the lowest record from
the M-1 runs.
Place it into the memory buffer.
If any run is empty, read its next block.

17
External Merge Sort Notes

Can be extended to an arbitrarily large relation
using multiple passes.
Cost is
Br(2 lg_(M-1) (Br/M) 1)
Br is the number of blocks for the relation.
B is the size of a memory buffer.

18
Nested Loop Join

No indices (for now).
Nested Loop
R join S
R is the outer relation.
S is the inner relation.
Read a block of R, then read each block of S and
compare their contents using the join condition.
Write any matching tuples to another block.

19
Nested Loop Join Cost

If you read tuple by tuple, its
tuples in R blocks in S blocks in R.
Question Which should be in inner relation, and
which should be the outer?

20
Block Nested Loop

Nested Loop Join, but block by block instead.
Cost for R join S, where R is outer, S is inner
blocks in R blocks in S blocks in S

21
Block Nested Loop Improvements

Sorted relations?
More memory?

22
Indexed Nested Loop Join

Assume we have an index on a join attribute of
one of the relations, R or S.
Questions
Which should the index be on?
Or, if both have indices on them, which should be
the outer one?

23
Indexed Nested Loop Join Cost

blocks in R rows in R Ls
Ls is the cost of looking up a record in S using
the index.

24
More Joins

Merge join
Sort R and S, and then merge them.
Hash join
Hash R and S into buckets, and compare the bucket
contents.

25
Evaluation

Materialization Build intermediate tables as
the expression goes up the tree.
Here, one intermediate table is created for the
select, and is the input of the project.

?balance
?balancelt2500
account
26
Materialization Cost

Cost of writing out intermediate results to disk.

27
Pipelining

Compute several operations simultaneously.
As soon as a tuple is created from one operation,
send it to the next. Here, send selected tuples
straight to the projection.

?balance
?balancelt2500
account
28
Implementation of Pipelining