Fast Incremental Maintenance of Approximate histograms : - PowerPoint PPT Presentation

About This Presentation
Title:

Fast Incremental Maintenance of Approximate histograms :

Description:

Viswanath Poosala (Bell Laboratories) Presented by: Amrita Tamrakar. CSE 6392. 09-feb-2006 ... http://www.cs.uwaterloo.ca/~ashraf/pubs/vldb04autostats.pdf ... – PowerPoint PPT presentation

Number of Views:34
Avg rating:3.0/5.0
Slides: 21
Provided by: amr67
Learn more at: https://crystal.uta.edu
Category:

less

Transcript and Presenter's Notes

Title: Fast Incremental Maintenance of Approximate histograms :


1
Fast Incremental Maintenance of Approximate
histograms
  • Phillip B. Gibbons (Intel Research Pittsburgh)
  • Yossi Matias (Tel Aviv University)
  • Viswanath Poosala (Bell Laboratories)
  • Presented by
  • Amrita Tamrakar
  • CSE 6392
  • 09-feb-2006

2
Introduction
  • What is a histogram?
  • Issues in Histogram maintenance
  • Novel concept of Backing sample
  • Types of approximate histograms
  • Incremental maintenance of approx histograms
  • Challenges and solutions
  • Conclusion

3
What is a histogram?
Frequency
  • maintained to approximate the distribution of
    data in the attributes
  • constructed by partitioning the data into
    mutually disjoint subsets
  • Frequency as y axis and the data intervals as x
    axis
  • Oracle, DB2, SQLserver, Sybase, Informix

Data value interval
http//www.shodor.org/interactivate/activities/his
togram/
4
History of Histogram
Equi-width histogram
Compressed histogram
Learn more on Histogram
5
Issues on Histogram Maintenance
  • precomputed on underlying data
  • Stored in main memory , less overhead
  • What about the maintenance ??
  • Database is modified
  • Query is changed(?)
  • Outdated histogram
  • Does periodic updates solve the problem?
  • Recomputing from the scratch
  • Poor estimation during the in-between period
  • Whats the solution ?

6
The solution to outdated histograms
  • Maintain Approximate histogram in presence of
    database updates
  • Split and merge technique for quick adjustment
  • Backing sample stored in secondary mm

7
Backing Sample
  • Only row id and the necessary attributes
  • At any time, backing sample random sample
  • No entire table scan
  • Records in Consecutive disk blocks

Histogram
Backing sample (100KB)
2 KB
Main memory
Relation (20GB)
8
How to maintain a backing sample?
  • During insertions
  • Reservoir sampling technique
  • Obtain sample of data from a single scan without
    a priori knowledge of no of tuples.
  • Length of random skip chosen such that each tuple
    is likely to be in the reservoir.

1
First n
2
MaintainBackingSample
Skip random no of record and replace
n
n1
9
How to maintain a backing sample?
  • During modification
  • Modify if tuple present in sample
  • During Deletion
  • Remove from the sample
  • If sample size decrease below lower bound L, then
    recompute from disk.

10
Maintain approximate Histograms Different
Classes of Histograms
  • Equidepth histograms
  • No. of tuples in each bucket is same
  • Contiguous ranges of attribute values

Frequency of occurrence
Data value
11
Different Classes of Histograms
  • Compressed (V,F) histogram
  • N highest frequencies stored in singleton buckets
  • For other values, use equi-depth histogram
  • Both histograms needs to store for each bucket
  • The largest value in the bucket B.maxval
  • The Count B.count
  • Approximate histograms are calculated from the
    random sample of the Relation
  • How to maintain these histograms?

12
Fast Incremental maintenance of approximate
equi-depth histograms
  • During Insertion
  • Maintain a threshold (T) upper bound
  • If no of tuples lt T, insertion will increment the
    bucket count.
  • Else recompute the histogram
  • Split and merge algorithm
  • Reduce the no. of recomputations from the sample
  • When bucket count reaches T, instead of
    recomputing split the bucket in half.
  • But maintain the number of bucket as fixed by
    merging two buckets whose total countltT

13
Split n merge algorithm
Insert threshold
14
To handle modify and delete
  • Deletion can lower the bucket count
  • Maintain a Tl as lower threshold
  • Merge if below threshold
  • Split bucket with largest count

Delete threshold
15
Fast Incremental maintenance of approximate
compressed histograms
  • Values with high frequencies can span more than
    one bucket replace by single bucket with single
    count singleton buckets
  • Construct compressed histogram on the sample and
    scale it by N/k factor.
  • During insertions
  • If the count doesnt exceed threshold, add to the
    bucket, else update bucket boundaries

16
Challenges to maintain compressed histograms
  • New values may lead to data skew, which may lead
    to new singleton buckets
  • Values may not belong to singleton buckets if
    tuples increase in equi-depth buckets
  • Number of equi-depth buckets needs adjustment
  • No. of tuples in equi-depth buckets needs
    adjustment

17
Solutions to the challenges
  • Large number of same value will cause an
    equi-depth bucket to split but the adjacent
    boundaries will have same value, hint create
    singleton bucket for that value
  • allow singleton buckets with small counts to be
    merged back into equi-depth buckets.
  • Split and merge technique to control imbalance
    between equi-depth buckets and their tuples
    without recomputation

18
To handle deletion and modification
  • Deletion can decrease number of tuples in a
    bucket relative to another bucket , making a
    singleton bucket
  • can drop a bucket count to the lower threshold
    TL.
  • What to do?
  • Merge the pair with smallest combined count and
    split the bucket with largest count
  • Else recompute from backing sample

19
Conclusion
  • Backing sample
  • Incremental maintenance of equi-depth and
    compressed histograms
  • Split and merge technique to reduce access to
    backing sample

20
Use of histograms in Commercial database
Write a Comment
User Comments (0)
About PowerShow.com