Query Processing - PowerPoint PPT Presentation

About This Presentation
Title:

Query Processing

Description:

Disjunctions. Grouping pointers when selection is on multiple attributes: ... union or intersection, depending on the condition (disjunction or conjunction. ... – PowerPoint PPT presentation

Number of Views:15
Avg rating:3.0/5.0
Slides: 32
Provided by: richard204
Learn more at: http://www.cs.iit.edu
Category:

less

Transcript and Presenter's Notes

Title: Query Processing


1
Query Processing
2
Steps in Query Processing
  • Validate and translate the query
  • Good syntax.
  • All referenced relations exist.
  • Translate the SQL to relational algebra.
  • Optimize
  • Make it run faster.
  • Evaluate

3
Translation Example
  • Possible SQL Query
  • SELECT balance
  • FROM account
  • WHERE balancelt2500
  • Possible Relational Algebra Query
  • ?balance?balancelt2500(account))

4
Tree Representation of Relational Algebra
  • ?balance?balancelt2500(account))

?balance
?balancelt2500
account
5
Making An Evaluation Plan
  • Annotate Query Tree with evaluation instructions
  • The query can now be executed by the query
    execution engine.

?balance
?balancelt2500
use index 1
account
6
Before Optimizing the Query
  • Must predict the cost of execution plans.
  • Measured by
  • CPU time,
  • Number of disk block reads,
  • Network communication (in distributed DBs),
  • where C(CPU) lt C(Disk) lt C(Network).
  • Major factor is buffer space.
  • Use statistics found in the catalog to help
    predict the work required to evaluate a query.

7
Disk Cost
  • Seek time rotational latency arm movement.
  • Scan time time to read the data.
  • Typically, seek time is orders of magnitude
    greater.
  • Disk cost is assumed to be highest, so it can be
    used to approximate total cost.

8
Reading Data, No Indices
  • Linear scan
  • Cost is a function of file size.
  • Binary search on ordering attribute
  • Cost is lg of the file size.
  • Requires table to be sorted.

9
Reading Data with Indices
  • Primary index index on sort key.
  • Can be dense or sparse.
  • Secondary index index on non-sort key.
  • Queries can be point queries or range queries.
  • Point queries return a single record.
  • Range queries return a sequence of consecutive
    records.

10
Point Queries
  • Point queries
  • Cost index cost block read cost.
  • Range queries (c1 lt key lt c2)
  • Primary index
  • Cost index cost scan of blocks
  • Secondary index
  • Cost blocks(index cost scan of block)

11
More on Range Queries
  • Range query on sort key (c1 lt key)
  • c1 lt key Linear scan until you find key.
  • c1 gt key Use index to find key, then linear
    scan.
  • Range query using secondary index
  • Scan through index blocks. Requires accessing
    index for every record.

12
More Complex Selections
  • Conditions on multiple attributes
  • Negations
  • Disjunctions
  • Grouping pointers when selection is on multiple
    attributes
  • Find a set of solutions for each condition.
  • Either compute its union or intersection,
    depending on the condition (disjunction or
    conjunction.)

13
Sorting
  • Sorted relations are easier to scan.
  • The cost of sorting a relation before querying it
    can be less than querying an unsorted relation.
  • Two types of sorts
  • In memory
  • Out of memory (a.k.a., external sorting)

14
External Merge Sort
  • Use this when you cannot fit the relation in
    memory.
  • Assume there are M memory buffers.
  • Two phases
  • Create sorted runs.
  • Merge sorted runs.

15
External Merge Sort, Phase 1
  • Fill the M memory buffers with the next M blocks
    of the relation.
  • Sort the M blocks.
  • Write the sorted blocks to disk.

16
External Merge Sort, Phase 2
  • Assume there are at most M-1 runs.
  • Read the first block of each run into memory.
  • At each iteration, find the lowest record from
    the M-1 runs.
  • Place it into the memory buffer.
  • If any run is empty, read its next block.

17
External Merge Sort Notes
  • Can be extended to an arbitrarily large relation
    using multiple passes.
  • Cost is
  • Br(2 lg_(M-1) (Br/M) 1)
  • Br is the number of blocks for the relation.
  • B is the size of a memory buffer.

18
Nested Loop Join
  • No indices (for now).
  • Nested Loop
  • R join S
  • R is the outer relation.
  • S is the inner relation.
  • Read a block of R, then read each block of S and
    compare their contents using the join condition.
  • Write any matching tuples to another block.

19
Nested Loop Join Cost
  • If you read tuple by tuple, its
  • tuples in R blocks in S blocks in R.
  • Question Which should be in inner relation, and
    which should be the outer?

20
Block Nested Loop
  • Nested Loop Join, but block by block instead.
  • Cost for R join S, where R is outer, S is inner
  • blocks in R blocks in S blocks in S

21
Block Nested Loop Improvements
  • Sorted relations?
  • More memory?

22
Indexed Nested Loop Join
  • Assume we have an index on a join attribute of
    one of the relations, R or S.
  • Questions
  • Which should the index be on?
  • Or, if both have indices on them, which should be
    the outer one?

23
Indexed Nested Loop Join Cost
  • blocks in R rows in R Ls
  • Ls is the cost of looking up a record in S using
    the index.

24
More Joins
  • Merge join
  • Sort R and S, and then merge them.
  • Hash join
  • Hash R and S into buckets, and compare the bucket
    contents.

25
Evaluation
  • Materialization Build intermediate tables as
    the expression goes up the tree.
  • Here, one intermediate table is created for the
    select, and is the input of the project.

?balance
?balancelt2500
account
26
Materialization Cost
  • Cost of writing out intermediate results to disk.

27
Pipelining
  • Compute several operations simultaneously.
  • As soon as a tuple is created from one operation,
    send it to the next. Here, send selected tuples
    straight to the projection.

?balance
?balancelt2500
account
28
Implementation of Pipelining
  • Requires buffers for each operation.
  • Can be
  • Demand driven an operator must be asked to
    generate a tuple.
  • Producer driven an operator generates a tuple
    whether its asked for or not.

29
Query Optimization
30
Some Actions of Query Optimization
  • Reordering joins.
  • Changing the positions of projects and selects.
  • Changing the access structures used to read data.

31
Catalog Info
  • Number of tuples in r.
  • Number of blocks for r.
  • Size of tuple of r.
  • Blocking factor a r the number of r tuples that
    fit in a block.
  • The number of distinct values of each attribute
    of r.
Write a Comment
User Comments (0)
About PowerShow.com