XPathLearner: An On-Line Self-Tuning Markov Histogram for XML Path Selectivity Estimation


1
XPathLearner: An On-Line Self-Tuning Markov
Histogram for XML Path Selectivity Estimation
  • Authors: Lipyeow Lim, Min Wang, Sriram
    Padmanabhan, Jeffrey Scott Vitter, Ronald Parr
  • Speaker: Ho Wai Shing

2
Contents
  • Introduction: the problems in XML path
    selectivity estimation
  • XPathLearner: its properties and details
  • Experimental Results
  • Conclusions
  • Future Work

3
Introduction
  • XML is becoming the standard for data exchange
  • We need to query both the structure and the text
    data of XML documents
  • Selectivity estimates are essential for choosing
    efficient query evaluation plans

4
Introduction
  • Example

5
Introduction
  • Example

    FOR $b IN document("")//book
    WHERE $b/publisher = "Morgan Kaufmann" AND $b/year = "1998"
    RETURN $b/title
  • The path expressions:
    //book/publisher = "Morgan Kaufmann"
    //book/year = "1998"
    //book/title

6
Introduction
  • We need a structure that stores statistics about
    the data
  • The estimated selectivity is then calculated from
    these statistics
  • Problem: estimate the selectivity of simple,
    single-value, and multi-value path expressions
    using limited space

7
Related Work
  • Path Trees
  • Markov Tables
  • k-RO (in Lore)

8
Path Trees
  • Aggregate siblings with the same tag
  • tag names only (no data values)
  • e.g.,
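A path tree can be sketched in Python as below. This is a minimal illustration that assumes an in-memory document parsed with the standard `xml.etree.ElementTree` module; `PathTreeNode`, `build_path_tree`, and `path_count` are hypothetical names, not from the slides. Same-tag siblings, and recursively their descendants, are folded into one summary node that counts how many elements reach it. Only tag names are kept, as the slide notes.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class PathTreeNode:
    """Summary node: one per distinct root-to-node tag path (hypothetical structure)."""
    tag: str
    count: int = 0
    children: dict = field(default_factory=dict)


def _fold(summary: PathTreeNode, elem: ET.Element) -> None:
    # Merge every child of `elem` into the summary child with the same tag,
    # so same-tag siblings share one node and their counts add up.
    for child in elem:
        node = summary.children.setdefault(child.tag, PathTreeNode(child.tag))
        node.count += 1
        _fold(node, child)


def build_path_tree(root: ET.Element) -> PathTreeNode:
    tree = PathTreeNode(root.tag, count=1)
    _fold(tree, root)
    return tree


def path_count(tree: PathTreeNode, path: str) -> int:
    """Count for a root-anchored path such as '/dblp/book/title' (0 if absent)."""
    tags = path.strip("/").split("/")
    if tags[0] != tree.tag:
        return 0
    node = tree
    for tag in tags[1:]:
        node = node.children.get(tag)
        if node is None:
            return 0
    return node.count


doc = ET.fromstring(
    "<dblp><book><title/><year/></book><book><title/></book>"
    "<article><title/></article></dblp>"
)
tree = build_path_tree(doc)
print(path_count(tree, "/dblp/book/title"))  # 2
```

The two `book` elements collapse into one node, so the summary grows with the number of distinct tag paths rather than with the number of elements.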

9
Markov Table
  • the selectivity of every short path (up to length
    k) is stored
  • the selectivity of longer paths is estimated
    using a Markov model
  • e.g.,
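Below is a minimal Python sketch of a Markov table with k = 2. It assumes that f(t) counts the elements tagged t and that f(t1 t2) counts the t2 elements whose parent is t1. The names `build_markov_table` and `estimate` are hypothetical. A longer path is estimated by chaining pair frequencies, which assumes that the next tag depends only on the current one.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def build_markov_table(root: ET.Element) -> Counter:
    """Exact frequencies of all paths of length 1 and 2 (k = 2), keyed by tag tuples."""
    table = Counter()

    def walk(elem: ET.Element, parent_tag) -> None:
        table[(elem.tag,)] += 1                   # f(t): number of t elements
        if parent_tag is not None:
            table[(parent_tag, elem.tag)] += 1    # f(t1 t2): t2 elements whose parent is t1
        for child in elem:
            walk(child, elem.tag)

    walk(root, None)
    return table


def estimate(table: Counter, tags: list) -> float:
    """First-order Markov estimate for //t1/t2/.../tn."""
    if len(tags) == 1:
        return float(table[(tags[0],)])
    est = float(table[(tags[0], tags[1])])
    # Extend the path one tag at a time: multiply by f(t_i t_{i+1}) / f(t_i).
    for prev, cur in zip(tags[1:-1], tags[2:]):
        if table[(prev,)] == 0:
            return 0.0
        est *= table[(prev, cur)] / table[(prev,)]
    return est


doc = ET.fromstring("<a><b><c/></b><b><c/><c/></b><d><b/></d></a>")
table = build_markov_table(doc)
print(estimate(table, ["a", "b", "c"]))  # 2.0 (true answer: 3)
```

The gap between 2.0 and 3 comes from the independence assumption: the `b` under `d` has no `c` children, but the model spreads `c` counts evenly over all `b` elements.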

10
k-RO
  • used in the Lore system
  • very similar to Markov table
  • data values are also objects
  • stored as a graph

11
Twigs
  • can answer "twig" queries
  • a twig is a structural query with a small branch
  • based on a suffix tree (for simple paths) with
    signatures in each node (for estimating
    branching)

12
Problems Faced
  • Offline
  • need to scan the whole repository beforehand to
    gather statistics
  • infeasible if the data is remote or extremely
    large
  • Either handles only simple path expressions
    (SPEs), or the summary becomes too large
  • Ignore data values

13
Problems Faced
  • Not adaptive to the query workload
  • much space is wasted on infrequently queried paths
  • No quick update
  • needs periodic rescans of the repository

14
Objective
  • XPathLearner
  • uses a Markov-based approach,
  • uses an online algorithm,
  • is adaptive to the workload,
  • can answer simple paths, single-value paths
    (//A/B = '3'), and multi-value paths
    (//A = '2'/B = '3'),
  • considers data values,
  • can be easily updated

15
XPathLearner
16
Architecture
17
A More Detailed Example
18
What to Store?
  • Markov table (1st order in the discussion)

19
What to Store?
  • may be large if there are many data values
  • solution: only "tag-tag", "tag", and top-k value
    entries are stored exactly; other entries are
    stored in buckets
  • default is 1

20
What is Actually Stored?
  • Compressed 1st order Markov table (or, Markov
    histogram)

Assumption: v1-v4 start with 'a', v5-v8 start
with 'b'; k = 1
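Below is a rough Python sketch of how one tag's value entries might be compressed. It keeps the top-k values exactly and buckets the rest by first letter, as in the assumption above. The slides do not say how a bucket answers a lookup, so this sketch assumes uniform frequency inside a bucket: the bucket total divided by its number of distinct values. `compress_values`, `lookup`, and `first_letter` are hypothetical names, and the sample values are invented for illustration.

```python
from collections import defaultdict


def first_letter(value: str) -> str:
    # Bucketing rule from the example: values are grouped by their first character.
    return value[0]


def compress_values(freqs: dict, k: int = 1, bucket_of=first_letter):
    """Keep the k most frequent values exactly; summarize the rest per bucket."""
    ranked = sorted(freqs.items(), key=lambda item: item[1], reverse=True)
    exact = dict(ranked[:k])
    buckets = defaultdict(lambda: [0, 0])  # bucket -> [total frequency, distinct values]
    for value, f in ranked[k:]:
        buckets[bucket_of(value)][0] += f
        buckets[bucket_of(value)][1] += 1
    return exact, dict(buckets)


def lookup(exact: dict, buckets: dict, value: str, bucket_of=first_letter) -> float:
    """Exact frequency if stored, else the bucket average (uniformity assumption)."""
    if value in exact:
        return float(exact[value])
    total, distinct = buckets.get(bucket_of(value), (0, 0))
    return total / distinct if distinct else 0.0


# Hypothetical value frequencies for a single tag.
freqs = {"apple": 9, "arch": 3, "atom": 1, "axis": 2,
         "bell": 4, "bird": 2, "boat": 1, "bulb": 5}
exact, buckets = compress_values(freqs, k=1)
print(lookup(exact, buckets, "atom"), lookup(exact, buckets, "bird"))  # 2.0 3.0
```

Eight value entries shrink to one exact entry plus two buckets. The cost is that rare values inside a bucket are over-estimated and common ones are under-estimated.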
21
How to Retrieve Selectivity?
  • Use this formula
  • σ: selectivity
  • t1, t2, ..., tn: tags
  • t1t2...tn: the path formed by these tags
  • N: total number of data items

22
How to Retrieve Selectivity?
  • Use this formula (it's what we calculate)
  • σ: selectivity
  • t1, t2, ..., tn: tags
  • t1t2...tn: the path formed by these tags
  • f(p): frequency of the path p
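Written out, a first-order Markov estimate takes the form below. This is a reconstruction, not the slide's own formula. It assumes the probabilities are estimated from the stored frequencies as shown, and it agrees with the worked example (ACD, 6) later in the talk, which computes f(AC) / f(C) × f(CD).

```latex
\sigma(t_1 t_2 \cdots t_n) \approx N \cdot \Pr[t_1 t_2] \prod_{i=2}^{n-1} \Pr[t_{i+1} \mid t_i],
\qquad \Pr[t_1 t_2] = \frac{f(t_1 t_2)}{N}, \quad \Pr[t_{i+1} \mid t_i] = \frac{f(t_i t_{i+1})}{f(t_i)}

\Rightarrow \quad \sigma(t_1 t_2 \cdots t_n) \approx f(t_1 t_2) \prod_{i=2}^{n-1} \frac{f(t_i t_{i+1})}{f(t_i)},
\qquad \text{e.g. } \sigma(ACD) \approx \frac{f(AC)\, f(CD)}{f(C)}
```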

23
How to Retrieve Selectivity?
  • Use this formula (if the path is multi-valued)
  • σ: selectivity
  • t1, t2, ..., tn: tags
  • t1t2...tn: the path formed by these tags
  • f(t,v): frequency of the value v in tag t
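One way to fold value predicates into the tag-path estimate is sketched below in Python. The assumption, which is not confirmed by the slides, is that each predicate t = v independently keeps the fraction f(t, v) / f(t) of the matching t elements. `estimate_with_values` and the frequency numbers are hypothetical.

```python
def estimate_with_values(pair_f: dict, tag_f: dict, value_f: dict, steps: list) -> float:
    """Estimate for a path such as //A='2'/B='3'.

    steps: list of (tag, value) pairs; value is None when the step has no predicate.
    """
    tags = [tag for tag, _ in steps]
    if len(tags) == 1:
        est = float(tag_f[tags[0]])
    else:
        # Tag-only part: f(t1 t2) * prod f(t_i t_{i+1}) / f(t_i).
        est = float(pair_f[(tags[0], tags[1])])
        for prev, cur in zip(tags[1:-1], tags[2:]):
            est *= pair_f[(prev, cur)] / tag_f[prev]
    # Assumption: each predicate t = v keeps the fraction f(t, v) / f(t) of t elements,
    # independently of the rest of the path.
    for tag, value in steps:
        if value is not None:
            est *= value_f.get((tag, value), 0) / tag_f[tag]
    return est


pair_f = {("A", "B"): 10}
tag_f = {"A": 20, "B": 15}
value_f = {("A", "2"): 5, ("B", "3"): 6}
print(estimate_with_values(pair_f, tag_f, value_f, [("A", None), ("B", "3")]))  # //A/B = '3'
print(estimate_with_values(pair_f, tag_f, value_f, [("A", "2"), ("B", "3")]))   # //A = '2'/B = '3'
```

A value never seen for a tag yields 0 under this sketch. A real histogram would instead fall back to a bucket estimate.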

24
Retrieval Example
  • for path //B/C/D, estimated selectivity
  • for path //B/C/D = v3, estimated selectivity

25
How to Update?
  • get the query feedback, e.g., (BCD, 5): the true
    selectivity of //B/C/D is 5
  • update the histogram entries involved in the
    query so that future estimates are more accurate
  • e.g., update B, C, D, BC, CD so that the
    estimate is closer to 5 than before
  • two update approaches: the Heavy-Tail Rule and
    the Delta Rule

26
Heavy Tail Rule
  • put more correction towards the end (tail) of the
    path
  • equation
  • f_k() is the frequency before the update
  • f_{k+1}() is the frequency after the update
  • suggested weights: w_i = 2^i

27
Heavy Tail Rule
  • updating the single-tag entries
  • safeguarding entries that were set by exact query
    feedback

28
Heavy Tail Rule
  • A reminder of what is stored

29
Heavy Tail Rule
  • Example query feedback: (ACD, 6)
  • estimate from the table: f(AC) / f(C) × f(CD)
    = 3 / 7 × 6 ≈ 2.57, i.e., about 3

30
Heavy Tail Rule
  • updates
  • new estimate: f(AC) / f(C) × f(CD) = 4 / 8 × 8 = 4,
    closer to the true value 6
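The example above can be reproduced approximately with the Python sketch below. It is one plausible reading of the heavy-tail rule, not necessarily the paper's exact equation. The estimation error is split over the two-tag entries in proportion to w_i = 2^i, so the entry nearest the tail absorbs the largest correction. Applied to (ACD, 6), it raises f(AC) from 3 to about 4.14 and f(CD) from 6 to about 8.29, which round to the 4 and 8 shown here. The slide also raises f(C) from 7 to 8 through the separate single-tag update, which this sketch does not model. `markov_estimate` and `heavy_tail_update` are hypothetical names.

```python
def markov_estimate(f: dict, tags: list) -> float:
    """First-order estimate f(t1 t2) * prod f(t_i t_{i+1}) / f(t_i)."""
    est = f[(tags[0], tags[1])]
    for prev, cur in zip(tags[1:-1], tags[2:]):
        est *= f[(prev, cur)] / f[(prev,)]
    return est


def heavy_tail_update(f: dict, tags: list, true_count: float) -> None:
    """Hypothetical heavy-tail style update of the two-tag entries.

    The error is split over the pairs (t_i, t_{i+1}) in proportion to w_i = 2**i,
    so pairs nearer the tail of the path get more of the correction.
    Single-tag entries f(t) are left unchanged in this sketch.
    """
    error = true_count - markov_estimate(f, tags)
    pairs = list(zip(tags, tags[1:]))
    weights = [2 ** i for i in range(1, len(pairs) + 1)]
    total = sum(weights)
    for pair, w in zip(pairs, weights):
        f[pair] += error * w / total


f = {("A", "C"): 3, ("C",): 7, ("C", "D"): 6}
print(round(markov_estimate(f, ["A", "C", "D"]), 2))       # 2.57
heavy_tail_update(f, ["A", "C", "D"], true_count=6)
print(round(f[("A", "C")], 2), round(f[("C", "D")], 2))    # 4.14 8.29
print(round(markov_estimate(f, ["A", "C", "D"]), 2))       # 4.9
```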

31
Delta Rule
  • popularized by Rumelhart et al. (also known as
    the Widrow-Hoff or LMS rule)
  • basic idea
  • where
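A rough Python sketch of the delta rule applied to a Markov histogram follows. It treats the histogram entries as weights and takes one gradient step on the squared error E = (s - ŝ)^2 / 2, which gives Δf = η (s - ŝ) ∂ŝ/∂f. This is an illustration under stated assumptions, not the paper's exact update. The learning rate `eta`, the choice to update both pair and single-tag entries, and the names `markov_estimate` and `delta_rule_update` are all hypothetical.

```python
def markov_estimate(f: dict, tags: list) -> float:
    """First-order estimate f(t1 t2) * prod f(t_i t_{i+1}) / f(t_i)."""
    est = f[(tags[0], tags[1])]
    for prev, cur in zip(tags[1:-1], tags[2:]):
        est *= f[(prev, cur)] / f[(prev,)]
    return est


def delta_rule_update(f: dict, tags: list, true_count: float, eta: float = 0.1) -> None:
    """One gradient step on E = (true_count - estimate)**2 / 2 w.r.t. the entries used."""
    est = markov_estimate(f, tags)
    error = true_count - est
    # Exponent of each entry in the product: +1 per pair (numerator), -1 per interior tag.
    power = {}
    for pair in zip(tags, tags[1:]):
        power[pair] = power.get(pair, 0) + 1
    for tag in tags[1:-1]:
        power[(tag,)] = power.get((tag,), 0) - 1
    # d est / d f(x) = power * est / f(x); compute all gradients before changing f.
    grads = {key: p * est / f[key] for key, p in power.items() if p != 0}
    for key, g in grads.items():
        f[key] += eta * error * g


f = {("A", "C"): 3.0, ("C",): 7.0, ("C", "D"): 6.0}
before = markov_estimate(f, ["A", "C", "D"])
delta_rule_update(f, ["A", "C", "D"], true_count=6, eta=0.1)
after = markov_estimate(f, ["A", "C", "D"])
print(round(before, 2), round(after, 2))  # the estimate moves toward the feedback value 6
```

Unlike the heavy-tail sketch, every entry moves in proportion to its own influence on the estimate. The step size `eta` trades convergence speed against overshooting.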

32
Experiments
33
Experiments
  • Data set: DBLP (other experiments were performed
    but are not included in the paper)
  • Metrics: average absolute error and average
    relative error

34
Experiments
35
Experiments
36
Experiments
37
Experiments
38
Experiments
39
Conclusions
  • XPathLearner is a new method for estimating the
    selectivity of path expressions
  • It works online from query feedback and does not
    need a database scan
  • It uses Markov histograms to store statistics

40
Future Work
  • change from a fixed-length Markov table to a
    variable-length Markov table
  • choose more carefully which paths to store
  • apply the update method to other structures,
    e.g., graph-based ones, to answer branching
    queries

41
References
  • [1] Lipyeow Lim, Min Wang, Sriram Padmanabhan,
    Jeffrey Scott Vitter, Ronald Parr. XPathLearner:
    An On-Line Self-Tuning Markov Histogram for XML
    Path Selectivity Estimation. VLDB'02.