XPathLearner: An On-Line Self-Tuning Markov Histogram for XML Path Selectivity Estimation

Authors: Lipyeow Lim, Min Wang, Sriram Padmanabhan, Jeffrey Scott Vitter, Ronald Parr ... XML is becoming the standard of data exchange ...

1
XPathLearner: An On-Line Self-Tuning Markov
Histogram for XML Path Selectivity Estimation

Authors: Lipyeow Lim, Min Wang, Sriram
Padmanabhan, Jeffrey Scott Vitter, Ronald Parr
Speaker: Ho Wai Shing

2
Contents

Introduction: the problems in XML path
selectivity estimation
XPathLearner: the properties and the details
Experiment Results
Conclusions
Future Work

3
Introduction

XML is becoming the standard of data exchange
We need to query the structure and text data of
XML documents
Selectivity estimates are essential for optimizing
query evaluation plans

4
Introduction

Example

5
Introduction

Example

FOR $b IN document("")//book
WHERE $b/publisher = "Morgan Kaufmann" AND $b/year = "1998"
RETURN $b/title

The path expressions:
//book/publisher = "Morgan Kaufmann"
//book/year = "1998"
//book/title

6
Introduction

We need a structure to store some statistics of
the data
Then calculate the estimated selectivity from
these statistics
Problem: estimate the selectivity of (simple,
single-value, multi-value) path expressions with
limited space

7
Related Work

Path Trees
Markov Tables
k-RO (in Lore)

8
Path Trees

Aggregate siblings with the same tag
tag names only (no data values)
e.g.,
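
As a rough illustration of the aggregation step, the sketch below merges elements that share a tag under the same parent and keeps a count per merged node. The nested-tuple input format and the names `PathTreeNode` and `build_path_tree` are assumptions made for this example, not part of the original work.

```python
from dataclasses import dataclass, field


@dataclass
class PathTreeNode:
    """One node of a path tree: a tag plus how many XML elements it summarizes."""
    tag: str
    count: int = 0
    children: dict = field(default_factory=dict)  # tag -> PathTreeNode


def build_path_tree(elements, node=None):
    """Merge elements with the same tag under the same parent into one node.

    `elements` is a list of (tag, children) tuples; data values are ignored,
    matching the "tag names only" property of path trees.
    """
    if node is None:
        node = PathTreeNode(tag="<root>")
    for tag, children in elements:
        child = node.children.setdefault(tag, PathTreeNode(tag=tag))
        child.count += 1
        build_path_tree(children, child)
    return node


# Hypothetical document: one A element with two B children and one C child.
doc = [("A", [("B", [("D", [])]), ("B", [("D", []), ("D", [])]), ("C", [])])]
tree = build_path_tree(doc)
a = tree.children["A"]
print(a.children["B"].count)                # number of A/B paths
print(a.children["B"].children["D"].count)  # number of A/B/D paths
```

The two B siblings collapse into a single node, so the summary grows with the number of distinct tag paths rather than with the number of elements.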

9
Markov Table

selectivities of short paths, up to length k, are
stored
selectivities of longer paths are estimated using a
Markov model
e.g.,
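
A minimal sketch of how such a table could be filled by counting every downward tag path of length at most k. The toy tree format and the function name `markov_table` are assumptions for illustration only.

```python
from collections import Counter


def markov_table(elements, k=2):
    """Count every downward tag path of length 1..k in a small XML-like tree.

    `elements` is a list of (tag, children) tuples (an assumed toy format).
    Returns a Counter mapping tag tuples, e.g. ("A", "B"), to their frequency.
    """
    table = Counter()

    def visit(tag, children, ancestors):
        # Keep only the last k tags on the root-to-node path.
        path = (ancestors + (tag,))[-k:]
        # Every suffix of `path` is a distinct path ending at this node.
        for start in range(len(path)):
            table[path[start:]] += 1
        for child_tag, grandchildren in children:
            visit(child_tag, grandchildren, path)

    for tag, children in elements:
        visit(tag, children, ())
    return table


doc = [("A", [("B", [("D", [])]), ("B", [("D", []), ("D", [])]), ("C", [])])]
table = markov_table(doc, k=2)
print(table[("B",)], table[("A", "B")], table[("B", "D")])
```

With k = 2 the table holds only single tags and tag pairs; a longer path such as A/B/D is not stored and has to be estimated from these entries.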

10
k-RO

used in the Lore system
very similar to Markov table
data values are also objects
stored as a graph

11
Twigs

can answer "twig" queries
a structural query with a small branch
based on suffix tree (for simple paths)
signatures in each node (for estimating
branching)

12
Problems Faced

Offline
need to scan the whole repository beforehand to
gather statistics
infeasible if the data is remote and extremely
large
Can handle only simple path expressions (SPEs), or the structure is too large
Ignore data values

13
Problems Faced

Not Adaptive to query workload
much space is wasted on infrequently queried paths
No Quick Update
needs periodic rescan of repository

14
Objective

XPathLearner
uses Markov based approach,
uses an online algorithm,
is adaptive to workload,
can answer simple paths, single-value paths
(//A/B='3') and multi-value paths
(//A='2'/B='3'),
considers data values,
can be easily updated.

15
XPathLearner
16
Architecture
17
A More Detailed Example
18
What to Store?

Markov table (1st order in the discussion)

19
What to Store?

the table may be large if there are many data values
solution: only "tag-tag", "tag", and top-k value
entries are stored exactly; other entries are
stored within buckets
default is 1

20
What is Actually Stored?

Compressed 1st order Markov table (or, Markov
histogram)

assumption: v1-v4 start with 'a', v5-v8 start
with 'b', k = 1
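
The compression idea can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: buckets are keyed by the first character of the value (as in the assumption on this slide), each bucket is assumed to store a frequency sum and an entry count, and the names `compress_value_entries` and `lookup` are hypothetical.

```python
from collections import defaultdict


def compress_value_entries(value_freqs, k=1):
    """Keep the k most frequent (tag, value) entries exactly; bucket the rest.

    Assumptions for this sketch: buckets are keyed by (tag, first character of
    the value) and each bucket stores [sum of frequencies, number of entries].
    """
    ranked = sorted(value_freqs.items(), key=lambda kv: kv[1], reverse=True)
    exact = dict(ranked[:k])
    buckets = defaultdict(lambda: [0, 0])
    for (tag, value), freq in ranked[k:]:
        bucket = buckets[(tag, value[0])]
        bucket[0] += freq
        bucket[1] += 1
    return exact, dict(buckets)


def lookup(tag, value, exact, buckets):
    """Return the exact frequency if stored, otherwise the bucket average."""
    if (tag, value) in exact:
        return exact[(tag, value)]
    total, n = buckets.get((tag, value[0]), (0, 0))
    return total / n if n else 0.0


# Hypothetical frequencies for values under tag D.
freqs = {("D", "apple"): 9, ("D", "avocado"): 2, ("D", "almond"): 4,
         ("D", "banana"): 3, ("D", "berry"): 5}
exact, buckets = compress_value_entries(freqs, k=1)
print(exact)                                  # the single top entry
print(lookup("D", "almond", exact, buckets))  # average of the 'a' bucket
```

Storage for value entries is then bounded by k plus the number of buckets, at the cost of averaging within each bucket.
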
21
How to Retrieve Selectivity?

Use this formula
σ: selectivity
t1, t2, ..., tn: tags
t1t2...tn: path with these tags
N: total number of data items

22
How to Retrieve Selectivity?

Use this formula (it's what we calculate)
σ: selectivity
t1, t2, ..., tn: tags
t1t2...tn: path with these tags
f(p): frequency of the path p
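
The formula image is not part of this transcript. A first-order form consistent with the worked example later in the talk (f(AC) / f(C) x f(CD) for the path ACD) would be the following. It is a reconstruction under that assumption, and the hat marks an estimated frequency:

```latex
\hat{f}(t_1 t_2 \cdots t_n) = f(t_1 t_2) \prod_{i=2}^{n-1} \frac{f(t_i t_{i+1})}{f(t_i)},
\qquad \text{e.g.}\quad \hat{f}(ACD) = \frac{f(AC)}{f(C)} \, f(CD)
```

Each factor f(t_i t_{i+1}) / f(t_i) plays the role of the conditional probability of stepping from tag t_i to tag t_{i+1}.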

23
How to Retrieve Selectivity?

Use this formula (if it's multi-valued)
σ: selectivity
t1, t2, ..., tn: tags
t1t2...tn: path with these tags
f(t,v): frequency of the value v in tag t

24
Retrieval Example

for path //B/C/D, estimated selectivity
for path //B/C/D=v3, estimated selectivity
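
The numbers for this example are missing from the transcript, so the sketch below uses made-up statistics. It applies a first-order chain over tag pairs and, for a path with a data value, scales by f(t, v) / f(t). That scaling is an independence assumption of this sketch rather than a formula quoted from the slides, and `estimate`, `table` and `value_table` are hypothetical names.

```python
def estimate(tags, table, values=None, value_table=None):
    """First-order Markov estimate of the frequency of the path t1/t2/.../tn.

    `table` maps tag strings such as "B" and "BC" to stored frequencies
    (single-character tags, so a pair key is just the two tags joined).
    `values` optionally maps a tag to a required data value, and `value_table`
    maps (tag, value) to f(t, v). Scaling by f(t, v) / f(t) is an independence
    assumption of this sketch, not a formula taken from the slides.
    """
    if len(tags) == 1:
        est = float(table[tags[0]])
    else:
        est = float(table[tags[0] + tags[1]])
        for i in range(1, len(tags) - 1):
            est *= table[tags[i] + tags[i + 1]] / table[tags[i]]
    for tag, value in (values or {}).items():
        est *= value_table[(tag, value)] / table[tag]
    return est


# Hypothetical stored statistics (not the numbers from the original slides).
table = {"B": 10, "C": 8, "D": 12, "BC": 4, "CD": 6}
value_table = {("D", "v3"): 3}

print(estimate(["B", "C", "D"], table))                            # //B/C/D
print(estimate(["B", "C", "D"], table, {"D": "v3"}, value_table))  # //B/C/D=v3
```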

25
How to Update?

get the query feedback, e.g., (BCD, 5)
update the histogram entries that are contained in
the query so that future estimates are
more accurate
e.g., update B, C, D, BC, CD so that the
estimate is nearer to 5 than before.
two update approaches: the Heavy-tail Rule and the
Delta Rule

26
Heavy Tail Rule

put more correction towards the end (tail) of the
path
equation
f_k() refers to the frequency before the update
f_{k+1}() refers to the frequency after the update
suggestion: w_i = 2^i

27
Heavy Tail Rule

updating those one-'tag' entries
safeguards the terms that were set by exact query
feedback

28
Heavy Tail Rule

A reminder of what is stored

29
Heavy Tail Rule

Example: query feedback (ACD, 6)
by the table, estimation = f(AC) / f(C) x f(CD)
= 3 / 7 x 6 ≈ 3

30
Heavy Tail Rule

updates
new estimation = 4 / 8 x 8 = 4
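
The update equation itself is not in the transcript. One reading that reproduces the pair values in this example (f(AC) from 3 to 4, f(CD) from 6 to 8) is an additive correction that splits the error across the pair entries in proportion to weights 2^i. The sketch below implements that reading; it is inferred from the numbers, not the authors' stated rule, and the function name and `base` parameter are hypothetical. The example also raises f(C) from 7 to 8, but the rule for single-tag entries cannot be recovered from the transcript, so the sketch leaves them unchanged.

```python
def heavy_tail_update(tags, feedback, table, base=2.0):
    """Spread the estimation error over the pair entries, tail-heavy.

    Assumed form (inferred, not quoted): each pair entry t_i t_{i+1} receives
    the share w_i / sum(w) of the error, with w_i = base ** i.
    Single-tag entries are left untouched in this sketch.
    """
    pairs = [tags[i] + tags[i + 1] for i in range(len(tags) - 1)]
    # Current first-order estimate: f(t1 t2) * prod f(t_i t_{i+1}) / f(t_i).
    est = float(table[pairs[0]])
    for i in range(1, len(pairs)):
        est *= table[pairs[i]] / table[tags[i]]
    error = feedback - est
    weights = [base ** (i + 1) for i in range(len(pairs))]
    total = sum(weights)
    updated = dict(table)
    for pair, w in zip(pairs, weights):
        updated[pair] = table[pair] + error * w / total
    return updated


table = {"C": 7, "AC": 3, "CD": 6}  # values from the example above
updated = heavy_tail_update(["A", "C", "D"], 6, table)
print(round(updated["AC"]), round(updated["CD"]))  # rounds to 4 and 8
```

The entry closest to the tail of the path (CD) absorbs twice as much of the correction as the entry before it (AC).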

31
Delta Rule

first proposed by Rumelhart et al.
basic idea
where
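
The equations for this slide are missing from the transcript. In its generic form, the delta rule is a gradient-descent step on the squared error between the feedback and the current estimate. The sketch below applies one such step to the first-order estimate; the learning rate `gamma`, the function name, and the restriction to paths in which no tag repeats are assumptions of this example, not details taken from the talk.

```python
def delta_rule_update(tags, feedback, table, gamma=0.1):
    """One gradient step on E = 0.5 * (feedback - estimate) ** 2.

    The estimate is the product f(t1 t2) * prod f(t_i t_{i+1}) / f(t_i), so
    d(est)/d f(pair) = est / f(pair) and d(est)/d f(t_i) = -est / f(t_i)
    for the interior tags t_2 .. t_{n-1}. Assumes no tag repeats in the path.
    """
    pairs = [tags[i] + tags[i + 1] for i in range(len(tags) - 1)]
    interior = tags[1:-1]
    est = float(table[pairs[0]])
    for i in range(1, len(pairs)):
        est *= table[pairs[i]] / table[tags[i]]
    error = feedback - est
    updated = dict(table)
    for pair in pairs:
        # Numerator terms move in the direction of the error.
        updated[pair] = table[pair] + gamma * error * est / table[pair]
    for tag in interior:
        # Denominator terms move in the opposite direction.
        updated[tag] = table[tag] - gamma * error * est / table[tag]
    return updated, est


table = {"C": 7, "AC": 3, "CD": 6}
updated, before = delta_rule_update(["A", "C", "D"], 6, table)
after = updated["AC"] * updated["CD"] / updated["C"]
print(before, after)  # the estimate moves towards the feedback value 6
```

A small `gamma` moves the estimate only part of the way towards the feedback, which damps the effect of any single query.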

32
Experiments
33
Experiments

Data Set: DBLP (other experiments were done but are
not included in the paper)
Metrics: average absolute error, average relative
error
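
For reference, the two metrics can be computed as below, assuming their usual definitions (mean of |actual - estimated|, and mean of that difference divided by the actual value). The paper's exact definitions, for instance how queries with zero actual selectivity are handled, may differ; the sample numbers are made up.

```python
def average_absolute_error(actual, estimated):
    """Mean of |actual - estimated| over all queries."""
    return sum(abs(a - e) for a, e in zip(actual, estimated)) / len(actual)


def average_relative_error(actual, estimated):
    """Mean of |actual - estimated| / actual; queries with actual == 0 are skipped."""
    ratios = [abs(a - e) / a for a, e in zip(actual, estimated) if a != 0]
    return sum(ratios) / len(ratios)


# Hypothetical true and estimated selectivities for three queries.
actual = [6, 10, 4]
estimated = [3, 12, 4]
print(average_absolute_error(actual, estimated))
print(average_relative_error(actual, estimated))
```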

34
Experiments
35
Experiments
36
Experiments
37
Experiments
38
Experiments
39
Conclusions

XPathLearner is a new method for estimating the
selectivity of path expressions
It is online, based on query feedback, and does not
need a database scan
It uses Markov histograms to store statistics

40
Future Work

change from fixed length Markov table to variable
length Markov table
choose the paths to be stored more carefully or
wisely
apply the update method to other areas, e.g.,
graph based structures, to answer branching
queries, etc

41
References

[1] Lipyeow Lim, Min Wang, Sriram Padmanabhan,
Jeffrey Scott Vitter, Ronald Parr. XPathLearner:
An On-Line Self-Tuning Markov Histogram for XML
Path Selectivity Estimation. VLDB'02
