Quick Review of Apr 17 material - PowerPoint PPT Presentation

1 / 15
About This Presentation
Title:

Quick Review of Apr 17 material

Description:

Bitmap Indices ... the attribute that this bitmap tracks (e.g., 'Moonroof': bit ... Queries are answered by combining several bitmaps using and, or, not. Today ... – PowerPoint PPT presentation

Number of Views:48
Avg rating:3.0/5.0
Slides: 16
Provided by: david227
Learn more at: http://www.cs.umd.edu
Category:
Tags: apr | bitmap | material | quick | review

less

Transcript and Presenter's Notes

Title: Quick Review of Apr 17 material


1
Quick Review of Apr 17 material
  • Multiple-Key Access
  • There are good and bad ways to run queries on
    multiple single keys
  • Indices on Multiple Attributes
  • Combining two keys into a single concatenated
    attribute
  • Grid files
  • crude array with one dimension and linear scale
    for each attribute
  • more than one cell may point to a given bucket of
    values
  • array grid may be dynamically resized during use
  • Other alternatives are spatial databases R-tree,
    quad-trees, k-d tree
  • Bitmap Indices
  • linear array of bits bit j is set if tuple j has
    the attribute that this bitmap tracks (e.g.,
    Moonroof bit j is 1 if record j is a car with
    moonroof)
  • Queries are answered by combining several bitmaps
    using and, or, not

2
Today
  • HW 4 due Thursday April 24 (next class)
  • Questions 12.11, 12.12, 12.13, 12.16
  • No HW for next week
  • Today
  • Start Chapter 13 Query Processing

3
Query Processing
  • SQL is good for humans, but not as an internal
    (machine) representation of how to calculate a
    result
  • Processing an SQL (or other) query requires these
    steps
  • parsing and translation
  • turning the query into a useful internal
    representation in the extended relational algebra
  • optimization
  • manipulating the relational algebra query into
    the most efficient form (one that gets results
    the fastest)
  • evaluation
  • actually computing the results of the query

4
Query Processing Diagram
5
Query Processing Steps
  • 1. parsing and translation
  • details of parsing are covered in other places
    (texts and courses on compilers). Weve already
    covered SQL and relational algebra translating
    between the two should be relatively familiar
    ground
  • 2. optimization
  • This is the meat of chapter 13. How to figure
    out which plan, among many, is the best way to
    execute a query
  • 3. evaluation
  • actually computing the results of the query is
    mostly mechanical (doesnt require much
    cleverness) once a good plan is in place.

6
Query Processing Example
  • Initial query select balance
  • from account
  • where balancelt2500
  • Two different relational algebra expressions
    could represent this query
  • sel balancelt2500(Pro balance(account))
  • Pro balance( sel balancelt2500(account))
  • which choice is better? It depends upon metadata
    (data about the data) and what indices are
    available for use on these operations.

7
Query Processing Metadata
  • Cost parameters (some are easy to maintain some
    are very hard -- this is statistical info
    maintained in the systems catalog)
  • n(r ) number of tuples in relation r
  • b(r ) number of disk blocks containing tuples of
    relation r
  • s(r ) average size of a tuple of relation r
  • f(r ) blocking factor of r how many tuples fit
    in a disk block
  • V(A,r) number of distinct values of attribute A
    in r. (V(A,r)n(r ) if A is a candidate key)
  • SC(A,r) average selectivity cardinality factor
    for attribute A of r. Equivalent to n(r
    )/V(A,r). (1 if A is a key)
  • min(A,r) minimum value of attribute A in r
  • max(A,r) maximum value of attribute A in r

8
Query Processing Metadata (2)
  • Cost parameters are used in two important
    computations
  • I/O cost of an operation
  • the size of the result
  • In the following examination well find it useful
    to differentiate three important operations
  • Selection (search) for equality (R.A1c)
  • Selection (search) for inequality (R.A1gtc) (range
    queries)
  • Projection on attribute A1

9
Selection for Equality (no indices)
  • Selection (search) for equality (R.A1c)
  • cost (sequential search on a sorted relation)
  • b(r )/2 average unsuccessful
  • b(r )/2 SC(A1,r) -1 average successful
  • cost (binary search on a sorted relation)
  • log b(r ) average unsuccessful
  • log b(r ) SC(A1,r) -1 average successful
  • size of the result n(select(R.A1c))
  • SC(A1,r) n(r )/V(A1,r)

10
Selection for Inequality (no indices)
  • Selection (search) for inequality (R.A1gtc)
  • cost (file unsorted)
  • b(r )
  • cost (file sorted on A1)
  • b(r )/2 b(r )/2 (if we assume that half the
    tuples qualify)
  • b(r ) in general
  • (regardless of the number of tuples that
    qualify. Why?)
  • size of the result
  • depends upon the query unpredictable

11
Projection on A1
  • Projection on attribute A1
  • cost
  • b(r )
  • size of the result n(Pro(R,A1))
  • V(A1,r)

12
Selection (Indexed Scan) for Equality
  • Primary Index on key
  • cost (height1) unsuccessful
  • cost (height1) 1 successful
  • Primary (clustering) Index on non-key
  • cost (height1) SC(A1,r)/f(r )
  • all tuples with the same value are clustered
  • Secondary Index
  • cost (height1) SC(A1,r)
  • tuples with the same value are scattered

13
Selection (Indexed Scan) for Inequality
  • Primary Index on key search for first value and
    then pick tuples gt value
  • cost (height1) 1 size of the result (in disk
    pages)
  • height2 n(r ) (max(A,r)-c)/(max(A,r)-min(A
    ,r))/f(r )
  • Primary (clustering) Index on non-key
  • cost as above (all tuples with the same value are
    clustered)
  • Secondary (non-clustering) Index
  • cost (height1) B-treeLeaves/2 size of
    result (in tuples)
  • height1 B-treeLeaves/2 n(r )
    (max(A,r)-c)/(max(A,r)-min(A,r))

14
Complex Selections
  • Conjunction (select where theta1 and theta2)
  • (s1 of tuples satisfying selection
    condition theta1)
  • combined SC (s1/n(r )) (s2/n(r )) s1s2/n(r
    )2
  • assuming independence of predicates
  • Disjunction (select where theta1 or theta2)
  • combined SC 1 - (1 - s1/n(r )) (1 - s2/n(r ))
  • s1/n(r )) s2/n(r ) - s1s2/n(r )2
  • Negation (select where not theta1)
  • n(! Theta1) n(r ) - n(Theta1)

15
Complex Selections with Indices
  • GOAL apply the most restrictive condition first
    and combined use of multiple indices to reduce
    the intermediate results as early as possible
  • Why? No index will be available on intermediate
    results!
  • Conjunctive selection using one index B
  • select using B and then apply remaining
    predicates on intermediate results
  • Conjunctive selection using a composite key index
    (R.A1, R.A2)
  • create a composite key or range from the query
    values and search directly (range search on the
    first attribute (MSB of the composite key) only)
  • Conjunctive selection using two indices B1 and
    B2
  • search each separately and intersect the tuple
    identifiers (TIDs)
  • Disjunctive selection using two indices B1 and
    B2
  • search each separately and union the tuple
    identifiers (TIDs)
Write a Comment
User Comments (0)
About PowerShow.com