Constructing Optimal Wavelet Synopses - PowerPoint PPT Presentation

About This Presentation

Title:

Constructing Optimal Wavelet Synopses

Description:

Title: Fast Approximate Wavelet Tracking on Streams Author: dimitris Last modified by: dimitris Created Date: 3/19/2006 2:46:03 PM Document presentation format – PowerPoint PPT presentation

Slides: 19

Provided by: dimitris

Transcript and Presenter's Notes

1
Constructing Optimal Wavelet Synopses

Dimitris Sacharidis
dsachar_at_dblab.ntua.gr
Timos Sellis
timos_at_dblab.ntua.gr

2
outline

introduction
background
wavelet basics
example
wavelet synopses
example
error metrics
optimal synopses
interesting issues
data streams
models
streaming wavelet synopses
epilogue

3
introduction

analyzing massive multi-dimensional datasets
complex aggregate queries over large parts of the
data
exploratory nature
promptness over accuracy, but with guarantees
resort to approximate query processing over
precomputed synopses (e.g., histograms, samples,
wavelets)
numerous data management applications need to
continuously generate, process and analyze data
on-line
the data streaming paradigm
summarize in real time, using small space and in
one pass
provide approximate query answers with quality
guarantees
provide useful data summarization
need to measure inaccuracy, application dependent

5
wavelet basics

wavelet decomposition is a mathematical tool for
the hierarchical decomposition of functions
applications in signal/image processing
used extensively as a data reduction tool in db
scenarios
selectivity estimation for large aggregate
queries
fast approximate query answers
general purpose streaming synopsis
features
efficient: performs in linear time and space (vs.
O(N^2) for histograms)
high compression ratio, small-B property
generalizes to multiple dimensions

6
example
assume a data vector d of 8 values
iteratively perform pairwise averaging and
semi-differencing
every node contributes positively to the leaves
in its left subtree and negatively to the leaves
in its right subtree
averages are not needed
wavelet tree (a.k.a. error tree)
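
The averaging and semi-differencing procedure above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the 8-value input vector are hypothetical (the vector shown on the original slide is not recoverable from this transcript), and coefficients are returned unnormalized in error-tree order (overall average first, then details from coarsest to finest).

```python
def haar_decompose(data):
    """Unnormalized Haar transform by pairwise averaging and semi-differencing.

    Returns [overall average, root detail, ..., finest details].
    The input length must be a power of two.
    """
    n = len(data)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    averages = list(data)
    details = []  # coarser levels are prepended as they are computed
    while len(averages) > 1:
        pairs = list(zip(averages[0::2], averages[1::2]))
        level_details = [(a - b) / 2 for a, b in pairs]  # semi-differences
        averages = [(a + b) / 2 for a, b in pairs]       # pairwise averages
        details = level_details + details
    return averages + details


def haar_reconstruct(coeffs):
    """Invert haar_decompose: a detail is added for the left child, subtracted for the right."""
    values = [coeffs[0]]
    pos = 1
    while pos < len(coeffs):
        level = coeffs[pos:pos + len(values)]
        values = [v for avg, det in zip(values, level)
                  for v in (avg + det, avg - det)]
        pos += len(level)
    return values


# Hypothetical data vector, chosen only for illustration.
d = [2, 2, 0, 2, 3, 5, 4, 4]
coeffs = haar_decompose(d)
print(coeffs)                    # [2.75, -1.25, 0.5, 0.0, 0.0, -1.0, -1.0, 0.0]
print(haar_reconstruct(coeffs))  # [2.0, 2.0, 0.0, 2.0, 3.0, 5.0, 4.0, 4.0]
```

The intermediate averages are discarded after each level, which reflects the remark that averages are not needed: the overall average plus the details suffice to rebuild every value.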
8
wavelet synopses

any set of B coefficients constitutes a B-term
wavelet synopsis
stored as <index, value> pairs
implicitly all non-stored coefficients are set to
zero
introduces a reconstruction error per point estimate

e = d - d̂, where d̂ is the vector reconstructed from the synopsis
9
measuring accuracy

use some norm to aggregate individual errors
L2 norm: Σ e_i^2 is the sum squared error (sse)
sse = 224
L∞ norm: max |e_i| is the maximum absolute error
max-abs-error = 10
generalized to any weighted Lp norm: Σ w_i |e_i|^p
e.g., max-rel-error: max (1/|d_i|) |e_i| = 10/4 = 250%

e: vector of point errors
d: vector of data values
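
A small sketch of these error metrics, under stated assumptions: the function names are hypothetical, the relative error assumes no data value is zero, and the sample vectors reuse the four-value example from the final slide of this deck (d = 2, 10, 12, 8 approximated everywhere by its average 8) rather than the slide's own lost figure.

```python
def point_errors(data, approx):
    """Per-point reconstruction errors e_i = d_i - approx_i."""
    return [d - a for d, a in zip(data, approx)]


def weighted_lp_error(errors, weights, p):
    """Weighted Lp aggregate: sum of w_i * |e_i|^p."""
    return sum(w * abs(e) ** p for w, e in zip(weights, errors))


def sse(errors):
    """Sum squared error: the unweighted case with p = 2."""
    return weighted_lp_error(errors, [1] * len(errors), 2)


def max_abs_error(errors):
    """L-infinity aggregate: largest absolute point error."""
    return max(abs(e) for e in errors)


def max_rel_error(data, errors):
    """Largest |e_i| / |d_i|; assumes every d_i is nonzero."""
    return max(abs(e) / abs(d) for d, e in zip(data, errors))


d = [2, 10, 12, 8]
approx = [8, 8, 8, 8]
e = point_errors(d, approx)
print(e)                    # [-6, 2, 4, 0]
print(sse(e))               # 56
print(max_abs_error(e))     # 6
print(max_rel_error(d, e))  # 3.0, i.e. 300%
```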
10
optimal synopses

a B-term wavelet synopsis can be optimized for
any error metric
sse optimal synopses are straightforward
wavelet transformation is orthonormal (after
normalization) ⇒ by Parseval's theorem the L2 norm is
preserved
choose the B coefficients with the highest absolute
(normalized) value
other (weighted or unweighted) Lp norm optimal synopses
require superlinear (quadratic) time in N
dynamic programming over the wavelet tree
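
The sse-optimal selection can be sketched as below. Assumptions: coefficients are unnormalized and stored in error-tree order (index 0 is the overall average, index 1 the root detail), so a coefficient is normalized by multiplying it by the square root of the number of data points it influences; all names are hypothetical, and the sample coefficients are the transform of the illustrative vector 2, 2, 0, 2, 3, 5, 4, 4.

```python
import math


def support_size(index, n):
    """Number of data points influenced by the coefficient at this error-tree index."""
    if index == 0:
        return n
    level = index.bit_length() - 1  # floor(log2(index)); root detail is level 0
    return n >> level


def normalized_magnitude(index, value, n):
    """Absolute value after normalization to an orthonormal basis."""
    return abs(value) * math.sqrt(support_size(index, n))


def sse_optimal_synopsis(coeffs, budget):
    """Keep the `budget` coefficients with the largest normalized magnitude."""
    n = len(coeffs)
    ranked = sorted(range(n),
                    key=lambda j: normalized_magnitude(j, coeffs[j], n),
                    reverse=True)
    return {j: coeffs[j] for j in sorted(ranked[:budget])}


def synopsis_sse(coeffs, synopsis):
    """By orthogonality, the sse is the sum of squared normalized dropped coefficients."""
    n = len(coeffs)
    return sum(c * c * support_size(j, n)
               for j, c in enumerate(coeffs) if j not in synopsis)


coeffs = [2.75, -1.25, 0.5, 0.0, 0.0, -1.0, -1.0, 0.0]
synopsis = sse_optimal_synopsis(coeffs, 4)
print(synopsis)                        # {0: 2.75, 1: -1.25, 5: -1.0, 6: -1.0}
print(synopsis_sse(coeffs, synopsis))  # 1.0
```

Note that the coefficient 0.5 at index 2 is dropped in favor of the two finer coefficients of magnitude 1: after normalization it weighs 0.5 · √4 = 1, against 1 · √2 ≈ 1.41 for each of the others.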

11
interesting issues

I/O efficiency issues when dealing with massive
multi-dimensional datasets [M. Jahangiri, D.
Sacharidis, C. Shahabi 05]
during transformation try to minimize I/Os
efficient maintenance as new data are appended
(requires more than just some updating)
how about optimizing for workloads of range-sum
queries?
no known results (without using the prefix-sum
array)
ranges overlap arbitrarily ⇒ no easy dynamic
programming formulation exists

13
working over data streams

main challenges when data are streaming
stream items are only seen once
require small working space
process stream items quickly
provide an answer quickly with quality guarantees

two models depending on how a data vector a is
rendered
turnstile model: stream elements are updates of
type (i, u), which implies a_i ← a_i + u, and,
further, do not appear ordered in i
time series model: stream elements are vector
values of type (i, a_i) and appear ordered in i
(e.g., time)
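
To make the turnstile semantics concrete, the sketch below applies updates (i, u) directly to an unnormalized Haar coefficient array in error-tree order. It is an illustration only: the function name is hypothetical, and keeping all N coefficients in memory is exactly what a streaming algorithm cannot afford. It does show that a single update touches only the overall average and the log N details on one root-to-leaf path.

```python
def apply_turnstile_update(coeffs, i, u):
    """Apply a_i <- a_i + u to unnormalized Haar coefficients, in place.

    coeffs[0] is the overall average; coeffs[1:] are details in
    error-tree order (children of node j are 2j and 2j + 1).
    """
    n = len(coeffs)
    coeffs[0] += u / n
    j, lo, size = 1, 0, n  # current node, start of its support, support size
    while size > 1:
        half = size // 2
        if i < lo + half:          # i lies in the left subtree: positive contribution
            coeffs[j] += u / size
            j = 2 * j
        else:                      # i lies in the right subtree: negative contribution
            coeffs[j] -= u / size
            j = 2 * j + 1
            lo += half
        size = half
    return coeffs


# Updates arrive in arbitrary index order, and an index may repeat.
c = [0.0] * 8
for i, u in [(5, 3), (0, 2), (7, 4), (3, 2), (1, 2), (4, 3), (6, 4), (5, 2)]:
    apply_turnstile_update(c, i, u)
print(c)  # [2.75, -1.25, 0.5, 0.0, 0.0, -1.0, -1.0, 0.0]
```

The updates above accumulate to the vector 2, 2, 0, 2, 3, 5, 4, 4, and the result matches its direct transform because the transform is linear.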
14
streaming wavelet synopses

time series model
at most log N coefficients are affected
a large number of coefficients have finalized
values
can perform bottom-up dynamic programming (space
required is prohibitive)
greedy techniques should be deployed instead

turnstile model
even optimizing for the sse is hard [G. Cormode,
M. Garofalakis, D. Sacharidis 06]
other error metrics have not been studied
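
For the time series model, the observation that only about log N coefficients are unfinalized at any moment can be sketched as follows. This is a hypothetical helper class, not an algorithm from the slides: it only computes the transform in one pass and does not decide which coefficients to keep, which is the part that would need the greedy techniques mentioned above. A detail is reported by its height (log2 of its support size) and its position among the details of that height.

```python
class StreamingHaar:
    """One-pass unnormalized Haar transform over values arriving in index order."""

    def __init__(self):
        self.pending = []  # (height, average) of subtrees still waiting for a sibling
        self.count = 0     # number of values consumed so far

    def push(self, value):
        """Consume one value; return the details finalized by it.

        Each returned item is (height, position, detail).
        """
        self.count += 1
        height, avg = 0, float(value)
        finalized = []
        # Merge with the left sibling while one of equal height is waiting.
        while self.pending and self.pending[-1][0] == height:
            _, left_avg = self.pending.pop()
            height += 1
            position = (self.count >> height) - 1
            finalized.append((height, position, (left_avg - avg) / 2))
            avg = (left_avg + avg) / 2
        self.pending.append((height, avg))
        return finalized


stream = StreamingHaar()
for v in [2, 2, 0, 2, 3, 5, 4, 4]:  # illustrative values only
    for item in stream.push(v):
        print(item)
print(stream.pending)  # [(3, 2.75)] : the overall average of the 8 values
```

The working state is the `pending` list, which never holds more than log2(count) + 1 entries; every detail returned by `push` is final and will not change as more values arrive.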

16
epilogue

wavelet synopses are a highly successful data
summarization technique
yet, several problems remain open
optimize for range query workloads
greedy (time-series) streaming algorithms
other metrics for general (turnstile) streaming
data

17
thank you!
http://www.dblab.ntua.gr/
18
unrestricted wavelet synopses

the retained coefficients can assume any value,
not restricted to their decomposed value (even
harder optimization problem!)
quick example: optimize for max-abs-error, d =
[2, 10, 12, 8] and B = 1
restricted synopsis: keep the overall average 8 ⇒
m.a.e. = 6
unrestricted synopsis: keep the overall average
but change its value to 7 ⇒ m.a.e. = 5
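
The numbers in this example can be checked with a few lines. With B = 1 and only the overall-average coefficient retained, every point is reconstructed as the same constant, so the sketch simply compares two constants. The function name is hypothetical; picking the midpoint of the minimum and maximum is the standard way to minimize the maximum absolute deviation from a single constant.

```python
def max_abs_error_of_constant(data, value):
    """Max absolute error when every point is approximated by `value`."""
    return max(abs(d - value) for d in data)


d = [2, 10, 12, 8]

restricted_value = sum(d) / len(d)          # the decomposed overall average
unrestricted_value = (min(d) + max(d)) / 2  # midrange, best constant for max-abs-error

print(restricted_value, max_abs_error_of_constant(d, restricted_value))      # 8.0 6.0
print(unrestricted_value, max_abs_error_of_constant(d, unrestricted_value))  # 7.0 5.0
```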
