Query Optimizer - PowerPoint PPT Presentation

1 / 29
About This Presentation
Title:

Query Optimizer

Description:

E.g. Selectivity of values - 1/domain - used to determine number of tuples of each value ... Fraction of rows with specified values(s) for specific ... – PowerPoint PPT presentation

Number of Views:32
Avg rating:3.0/5.0
Slides: 30
Provided by: susanv5
Category:

less

Transcript and Presenter's Notes

Title: Query Optimizer


1
  • Query Optimizer
  • (Chapter 9.0 - 9.6)

2
Optimization
  • Minimizes uses of resources by choosing best set
    of alternative query access plans
  • considers  I/O cost, CPU cost
  • gathers statistics - may become out of date (DB2
    RUNSTATS command)
  • E.g. Selectivity of values - 1/domain - used to
    determine number of tuples of each value

3
Filter Factor FF (selectivity)
  • Fraction of rows with specified values(s) for
    specific attribute that result from the
    predicate restriction
  • FF(c) records satisfying condition c
  • total of records in relation
  • Estimate attribute with i distinct values as
  • Assume R is rows in table R
  • ( R/i) / R  1/col_cardinality
  • e.g. (10,000/2)/10,000 1/2               

4
Filter Factor FF
  • FF tells how many tuples satisfy predicate -
    hopefully only need to access those tuples
    index
  • Statistical assumptions - uniform distribution
    of column values, independent join distribution
    of values from any 2 columns

5
Assumptions
  • Attribute values independent
  • Conjunctive select (independent)                
    C1 and C2                 FF(C1) FF(C2)   
  • e.g.  1/2 (gender) 1/4 (class)
  • 1/8 freshman in CS are female

6
Information for Optimization
  • SYSCOLUMNS     col_name, table_name, of values,
    High, Low
  • Statistics on columns that deviate strongly from
    the uniform assumption
  • Cluster Ratio how well clustering property holds
    for rows with respect to a given index          
    if 100 clustered with updates, becomes
    less clustered              if clustering ratio
    80 or more, use sequential prefetch

7
Examples of FF
  • if SQL statement specified
  • col const,
  • DB2 assumes FF is 1/col_cardinality
  • col between const1 and const2
  • DB2 assumes FF(const2 - const1)/(High - Low)
  • For some predicates, FF not predictable by simple
    formula

8
Explain Plan
  • You can have access to query plan with
  •   EXPLAIN PLAN statement for SQL_query in ORACLE
  • gives access type (index) col

9
Using Indexes
  • System must decide if to use index
  • What if more than one index, which one?
  • What if composite index?

10
Plans using Indexes
  • Can use an index if index matches select
    condition in where clause
  • A matching index scan - only have to access a
    limited number of contiguous leaf entries to
    access data
  • Predicate screening index entries to eliminate
    RIDs
  • Non-matching index scan use index to identify
    RIDs
  • Index-only retrieval dont have to access data,
    RIDs
  • Multiple index retrieval use gt1 index to
    identify RIDs

11
Indexes Matching index scan
  • A matching index scan is a single-step query plan
  • Only have to access contiguous leaf nodes
  • Example Assume a table T1 with multiple indexes
    on columns C1, C2 and C3
  • Single where clause and (one) index matches
  • Select from T1
  • where C110
  • search B-tree to leaf level for leftmost entry
    having specified values useful for , between

12
Index Scan
  • If multiple where clauses and all ''
  • Select from T1
  • where C110 and C25 and C31
  •   a)  if there is a composite index and a select
  • condition matches all index columns
  • only have to read contiguous leaf pages
  •  FF FF(P1) FF(P2) ...
  • b)   if there is a separate index for each clause
  •       must choose one of the indexes

13
Index Scan Predicate screening
  • 3. If all select conditions match composite index
    columns and some selects are a range
  • Select from T1 where C110 and C2
  • between 5 and 50 and C3 like A
  • - Access contiguous leaf pages, but not all
    results on contiguous leaf pages
  • - Must examine index entries to determine if in
    the result
  • called predicate screening

14
Predicate screening
  • discard RIDs based on values (for index)
  • will access fewer tuples because RIDs used
    instead to eliminate potential tuples

15
Index Scan
  • 4. If select conditions match some index columns
    of composite index
  •           Select from T1
  • where C110 and C230 and C620
  • - a matching scan can be used if at least one
    of the columns in select is first column of index
  • must eliminate tuples with what indexes you can,
    then examine the tuples

16
Rules for predicate matching
  • Decide how many attributes to match in a
    composite index after the first column, so can
    read in a small contiguous range of leaf entries
    in B-tree to get RIDs
  • Match first column of composite index then
  • look at index columns from left to right
  • Match ends when no predicate found
  • If range (lt, like, between) for a column, match
    terminates thereafter
  •   If a range, easier to scan all entries for
    range - treat rest of entries as screening
    predicates

17
Non-matching index scan
  • Not always used by DBMSs
  • attributes in where clause don't include initial
    attribute of index
  •           Select from T1
  • where C230 and C315
  • search leaf entries of index and compare values
    for entries
  • must read in all leaf pages to find C2, C3 values
          e.g. 50 index pages vs 500,000 data pages

18
Index only retrieval
  • elements retrieved in select clause are
    attributes of compose index
  • don't need to access rows (actual data)
        Select C1, C3 from T1
  • where C15 and C3 between 2 and 5
  •        Select count() from T1

19
Multiple Index Access
  • If conjunctive conditions in where clause (and),
    can use gt1 index
  • Extract RIDs from each index satisfying matching
    predicate
  • Intersect lists of RIDs (and them) from each
    index
  • Final list - satisfies all predicates indexed
  • If disjunctive conditions (or)          Union
    the two lists of RIDs

20
Some Query optimizer rules for using RID-lists
(then use list prefetch)
  •    1.  predicted active resulting RIDs must not
    be  gt 50 of RID pool
  • 2.  Limit to any single RID list the size of the
    RID memory pool (16M RIDs)
  • 3.  RID list cannot be generated by screening
    predicates

21
Rules for multiple indexes
  • Optimizer determines diminishing returns using
    multiple index access
  • 1.  List indexes with matching predicates in
    where clause
  • 2.  Place indexes in order by increasing filter
    factor
  • 3.  For successive indexes, extract RID list only
    if reduced cost for final row returned    
    e.g. no sense reading 100's of pages of a
    new index to get number of rows to only 1 tuple

22
List Prefetch for accessing rows with RID list
  • Assume once a list of RIDs is created, the system
    can order pages to minimize disk I/O
  • E.g. elevator algorithm for disk request
    scheduling

23
Problem Using RID lists with Multiple Indexes
  • Prospects Table 50M rows - 10 row per page
  • Indexes
  • zipcode 100,000 values (100 entries per page)
  • hobby 100 values (1000 entries per page
  • page 50 values (1000 entries per page
  • incomeclass 10 values (1000 entries per page)

24
Problem contd
  • Select name, stradr from prospects
  • where zipcode between 02159 and 02658
  • and age 40 and hobby chess and incomeclass
    10
  • FF in ascending order
  • FF(zipcode) 500/100,000 1/200
  • FF(hobby) 1/100
  • FF(age) 1/50
  • FF(incomeclass) 1/10

25
Problem contd
  • Rows in table is 5,000,000
  • Data rows read if use indexes
  • (1) 50,000,000/200 250,000
  • (1,2) 250,000/100 2500
  • (1,2,3) 2500/50 50
  • (1,2,3,4) 50/10 5
  • How much time will this take? Is it cost
    effective to use all of these indexes?
  • see textbook Pg. 579

26
Problem contd I/O costs
  • Cost
  • RIO is 1/80
  • Sequential Prefetch 1/800
  • List Prefetch 1/200
  • Note textbook assumes if read lt 3 pages use RIO

27
Problem contd
  • Table scan
  • 5,000,000/800 6250
  • Using index 1
  • index (500,000/200 3)/800 4
  • data 250,000/200 1250
  • Using indexes 12
  • index (50,000/100)/800 0.625
  • data 2500/200 12.5

28
Problem contd
  • Using indexes 1,2,3
  • index (50,000/50)/800 1.25
  • data 50/200 0.25
  • Using indexes 1,2,3,4
  • index (50,000/10)/800 6.25
  • data 5/200 0.025

29
Problem contd
Write a Comment
User Comments (0)
About PowerShow.com