Title:

Constructing Optimal Wavelet Synopses

Description:

Title: Fast Approximate Wavelet Tracking on Streams Author: dimitris Last modified by: dimitris Created Date: 3/19/2006 2:46:03 PM Document presentation format – PowerPoint PPT presentation

1
Constructing Optimal Wavelet Synopses
  • Dimitris Sacharidis
  • dsachar_at_dblab.ntua.gr
  • Timos Sellis
  • timos_at_dblab.ntua.gr

2
outline
  • introduction
  • background
  • wavelet basics
  • example
  • wavelet synopses
  • example
  • error metrics
  • optimal synopses
  • interesting issues
  • data streams
  • models
  • streaming wavelet synopses
  • epilogue

3
introduction
  • analyzing massive multi-dimensional datasets
  • complex aggregate queries over large parts of the
    data
  • exploratory nature
  • promptness over accuracy, but with guarantees
  • resort to approximate query processing over
    precomputed synopses (e.g., histograms, samples,
    wavelets)
  • numerous data management applications need to
    continuously generate, process and analyze data
    on-line
  • the data streaming paradigm
  • summarize in real time, using small space and in
    one pass
  • provide approximate query answers with quality
    guarantees
  • provide useful data summarization
  • need to measure inaccuracy, application dependent

5
wavelet basics
  • wavelet decomposition is a mathematical tool for
    the hierarchical decomposition of functions
  • applications in signal/image processing
  • used extensively as a data reduction tool in db
    scenarios
  • selectivity estimation for large aggregate
    queries
  • fast approximate query answers
  • general purpose streaming synopsis
  • features
  • efficient: computed in linear time and space (vs.
    O(N^2) for histograms)
  • high compression ratio, small-B property
  • generalizes to multiple dimensions

6
example
assume a data vector d of 8 values
iteratively perform pair-wise averaging and
semi-differencing
every node contributes positively to the leaves
in its left subtree and negatively to the leaves
in its right subtree
intermediate averages are not needed (only the overall
average and the semi-differences are kept)
wavelet tree (a.k.a. error tree)
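
The averaging and semi-differencing procedure on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the 8-value vector are hypothetical (the slide's own data values did not survive in the transcript), and the input length is assumed to be a power of two.

```python
def haar_decompose(d):
    """Unnormalized Haar decomposition by pair-wise averaging and semi-differencing.

    Returns [overall average, coarsest detail, ..., finest details].
    Assumes len(d) is a power of two.
    """
    n = len(d)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    current = [float(x) for x in d]
    details = []
    while len(current) > 1:
        pairs = [(current[i], current[i + 1]) for i in range(0, len(current), 2)]
        averages = [(a + b) / 2 for a, b in pairs]
        semi_diffs = [(a - b) / 2 for a, b in pairs]
        details = semi_diffs + details  # coarser levels end up in front
        current = averages
    return current + details


def haar_reconstruct(coeffs):
    """Invert haar_decompose: add a detail for the left child, subtract for the right."""
    values = [coeffs[0]]
    pos = 1
    while pos < len(coeffs):
        level = coeffs[pos:pos + len(values)]
        expanded = []
        for v, c in zip(values, level):
            expanded.extend([v + c, v - c])
        pos += len(values)
        values = expanded
    return values


d = [2, 2, 0, 2, 3, 5, 4, 4]        # hypothetical data vector
coeffs = haar_decompose(d)
print(coeffs)                        # [2.75, -1.25, 0.5, 0.0, 0.0, -1.0, -1.0, 0.0]
print(haar_reconstruct(coeffs))      # recovers d as floats
```

The output keeps exactly N numbers: the overall average followed by the N-1 semi-differences, which is why the intermediate averages can be discarded.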

8
wavelet synopses
  • any set of B coefficients constitutes a B-term
    wavelet synopsis
  • stored as <index, value> pairs
  • implicitly all non-stored coefficients are set to
    zero
  • introduces reconstruction error per point estimate

$e = d - \hat{d}$, where $\hat{d}$ is the vector reconstructed from the synopsis
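
As a rough sketch of how a point estimate can be read off a synopsis, the code below walks the wavelet tree from the root to leaf i and treats every non-stored coefficient as zero. The indexing convention is an assumption made for this example (index 0 is the overall average, index 1 the root detail, and node j has children 2j and 2j+1), and the data values and the chosen synopsis are hypothetical.

```python
def point_estimate(synopsis, i, n):
    """Estimate d[i] from a synopsis {coefficient index: value}.

    n is the data length (a power of two). Coefficients missing from the
    synopsis are implicitly zero.
    """
    estimate = synopsis.get(0, 0.0)      # overall average
    node, lo, hi = 1, 0, n               # node covers leaves lo .. hi-1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        c = synopsis.get(node, 0.0)
        if i < mid:                      # leaf is in the left subtree: add
            estimate += c
            node, hi = 2 * node, mid
        else:                            # leaf is in the right subtree: subtract
            estimate -= c
            node, lo = 2 * node + 1, mid
    return estimate


d = [2, 2, 0, 2, 3, 5, 4, 4]                      # hypothetical data vector
synopsis = {0: 2.75, 1: -1.25, 5: -1.0}           # a 3-term synopsis (B = 3)
d_hat = [point_estimate(synopsis, i, len(d)) for i in range(len(d))]
e = [x - y for x, y in zip(d, d_hat)]             # point errors e = d - d_hat
print(d_hat)
print(e)
```

Each estimate touches only the coefficients on one root-to-leaf path, so a point query costs O(log N) lookups regardless of B.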
9
measuring accuracy
  • use some norm to aggregate individual errors
  • L2 norm: $\sum_i e_i^2$ is the sum squared error (sse)
  • sse = 224
  • L∞ norm: $\max_i |e_i|$ is the maximum absolute error
  • max-abs-error = 10
  • generalized to any weighted Lp norm: $\sum_i w_i |e_i|^p$
  • e.g. max-rel-error = $\max_i (1/|d_i|)\,|e_i|$ = 10/4 = 250%

vector of point errors e
vector of data values d
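
The error metrics listed on this slide can be computed directly from the point errors. The sketch below is illustrative only: the function names and the sample vectors are hypothetical (the slide's own figures of 224, 10 and 250% come from data that is not in the transcript), and the relative error assumes that no data value is zero.

```python
def point_errors(d, d_hat):
    """Vector of point errors e = d - d_hat."""
    return [x - y for x, y in zip(d, d_hat)]


def sse(e):
    """Sum squared error."""
    return sum(v * v for v in e)


def max_abs_error(e):
    """Maximum absolute error (L-infinity norm of e)."""
    return max(abs(v) for v in e)


def max_rel_error(e, d):
    """Maximum relative error; assumes every d[i] is nonzero."""
    return max(abs(v) / abs(x) for v, x in zip(e, d))


def weighted_lp(e, w, p):
    """Weighted Lp aggregate: sum of w[i] * |e[i]|**p."""
    return sum(wi * abs(v) ** p for wi, v in zip(w, e))


d = [2, 10, 12, 8]          # hypothetical data values
d_hat = [8, 8, 8, 8]        # hypothetical estimates
e = point_errors(d, d_hat)
print(sse(e), max_abs_error(e), max_rel_error(e, d))
```

With unit weights and p = 2, `weighted_lp` reduces to the sse; with weights 1/|d_i| it penalizes errors on small values more heavily, which is the idea behind relative-error metrics.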
10
optimal synopses
  • a B-term wavelet synopsis can be optimized for
    any error metric
  • sse optimal synopses are straightforward
  • wavelet transformation is orthonormal (after
    normalization) ⇒ by Parseval's theorem the L2 norm
    is preserved
  • choose the B coefficients with the highest
    absolute (normalized) value
  • other (weighted or unweighted) Lp norm optimal
    synopses require superlinear (quadratic) time in N
  • dynamic programming over the wavelet tree
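
A minimal sketch of the sse-optimal selection follows. It assumes unnormalized Haar coefficients (overall average at index 0, root detail at index 1, node j with children 2j and 2j+1) and normalizes each one by the square root of the number of leaves it covers; the function names and the sample coefficients are hypothetical.

```python
import math


def support_size(j, n):
    """Number of data values influenced by coefficient j (n is a power of two)."""
    if j == 0:
        return n
    return n >> (j.bit_length() - 1)     # n / 2**level, with level = floor(log2 j)


def sse_optimal_synopsis(coeffs, b):
    """Keep the b coefficients with the largest normalized absolute value."""
    n = len(coeffs)
    ranked = sorted(
        range(n),
        key=lambda j: abs(coeffs[j]) * math.sqrt(support_size(j, n)),
        reverse=True,
    )
    return {j: coeffs[j] for j in ranked[:b]}


def synopsis_sse(coeffs, synopsis):
    """By Parseval, the sse is the sum of squared normalized dropped coefficients."""
    n = len(coeffs)
    return sum(
        coeffs[j] ** 2 * support_size(j, n)
        for j in range(n)
        if j not in synopsis
    )


coeffs = [2.75, -1.25, 0.5, 0.0, 0.0, -1.0, -1.0, 0.0]   # hypothetical coefficients
synopsis = sse_optimal_synopsis(coeffs, 2)
print(synopsis, synopsis_sse(coeffs, synopsis))
```

Sorting makes this O(N log N); a selection algorithm would bring it down to linear time. Either way it avoids the dynamic programming that the other error metrics need.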

11
interesting issues
  • I/O efficiency issues when dealing with massive
    multi-dimensional datasets [M. Jahangiri, D.
    Sacharidis, C. Shahabi 05]
  • during transformation try to minimize I/Os
  • efficient maintenance as new data are appended
    (requires more than just some updating)
  • how about optimizing for workloads of range-sum
    queries?
  • no known results (without using the prefix-sum
    array)
  • ranges overlap arbitrarily ⇒ no easy dynamic
    programming formulation exists

13
working over data streams
  • main challenges when data are streaming
  • stream items are only seen once
  • require small working space
  • process stream items quickly
  • provide an answer quickly with quality guarantees

two models depending on how a data vector a is
rendered
  • turnstile model: stream elements are updates of
    type (i, u), which implies $a_i \leftarrow a_i + u$, and,
    further, do not appear ordered in i
  • time series model: stream elements are vector
    values of type $(i, a_i)$ and appear ordered in i
    (e.g., time)
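
The difference between the two models can be illustrated with a small sketch. The function names and sample streams are hypothetical, and the vector a is materialized here only to show the semantics; an actual streaming algorithm would not have the space to store it.

```python
def render_turnstile(n, updates):
    """Turnstile model: each element (i, u) means a[i] <- a[i] + u, in any order."""
    a = [0] * n
    for i, u in updates:
        a[i] += u
    return a


def render_time_series(n, items):
    """Time series model: each element (i, a_i) sets a[i]; i must be increasing."""
    a = [0] * n
    last = -1
    for i, value in items:
        if i <= last:
            raise ValueError("time series elements must arrive ordered in i")
        a[i] = value
        last = i
    return a


# The same index may be updated several times, and out of order.
print(render_turnstile(4, [(2, 5), (0, 1), (2, -3), (3, 7)]))
# Each index is seen once, in order.
print(render_time_series(4, [(0, 1), (1, 0), (2, 2), (3, 7)]))
```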
14
streaming wavelet synopses
  • time series model
  • at most log N coefficients are affected by each
    new value
  • a large number of coefficients have finalized
    values
  • can perform bottom-up dynamic programming (space
    required is prohibitive)
  • greedy techniques should be deployed instead
  • turnstile model
  • even optimizing for the sse is hard [G. Cormode,
    M. Garofalakis, D. Sacharidis 06]
  • other error metrics have not been studied
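
The locality of updates can be sketched as follows: changing a single value a[i] by u touches only the coefficients on the path from leaf i to the root of the wavelet tree. In this indexing (an assumption of the example: unnormalized coefficients, overall average at index 0, root detail at index 1, node j with children 2j and 2j+1) that is the overall average plus log2(N) detail coefficients. The function name and sample values are hypothetical.

```python
def apply_point_update(coeffs, i, u):
    """Apply a[i] <- a[i] + u to unnormalized Haar coefficients, in place.

    Returns the indices of the coefficients that were touched.
    len(coeffs) is assumed to be a power of two.
    """
    n = len(coeffs)
    coeffs[0] += u / n                   # the overall average shifts by u / N
    touched = [0]
    node, lo, hi = 1, 0, n               # node covers leaves lo .. hi-1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        delta = u / (hi - lo)            # change of this node's semi-difference
        if i < mid:                      # left subtree: positive contribution
            coeffs[node] += delta
            next_node, hi = 2 * node, mid
        else:                            # right subtree: negative contribution
            coeffs[node] -= delta
            next_node, lo = 2 * node + 1, mid
        touched.append(node)
        node = next_node
    return touched


coeffs = [2.75, -1.25, 0.5, 0.0, 0.0, -1.0, -1.0, 0.0]   # for d = [2, 2, 0, 2, 3, 5, 4, 4]
print(apply_point_update(coeffs, 3, 8))                  # indices on the path of leaf 3
print(coeffs)                                            # coefficients of [2, 2, 0, 10, 3, 5, 4, 4]
```

In the time series model the affected paths move left to right, so every coefficient whose subtree lies entirely in the past can no longer change. That is what the "finalized value" bullet refers to.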

16
epilogue
  • wavelet synopses are a highly successful data
    summarization technique
  • yet, several problems remain open
  • optimize for range query workloads
  • greedy (time-series) streaming algorithms
  • other metrics for general (turnstile) streaming
    data

17
thank you!
http://www.dblab.ntua.gr/
18
unrestricted wavelet synopses
  • the retained coefficients can assume any value,
    not restricted to their decomposed value (even
    harder optimization problem!)
  • quick example: optimize for max-abs-error, d =
    [2, 10, 12, 8] and B = 1
  • restricted synopsis: keep the overall average 8 ⇒
    m.a.e. = 6
  • unrestricted synopsis: keep the overall average
    but change its value to 7 ⇒ m.a.e. = 5
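
The numbers in this example can be checked with a few lines. The sketch assumes, as the slide does, that the single retained coefficient is the overall average, so every point estimate equals the stored value. Using the midrange as the unrestricted value is a standard fact about minimizing the maximum absolute deviation from a constant, not something stated on the slide.

```python
def max_abs_error(d, estimate):
    """Max absolute error when every value of d is estimated by one constant."""
    return max(abs(x - estimate) for x in d)


d = [2, 10, 12, 8]

restricted_value = sum(d) / len(d)            # the decomposed overall average: 8.0
unrestricted_value = (min(d) + max(d)) / 2    # midrange: 7.0

print(restricted_value, max_abs_error(d, restricted_value))      # 8.0 6.0
print(unrestricted_value, max_abs_error(d, unrestricted_value))  # 7.0 5.0
```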