Approximate XML Query Answers - PowerPoint PPT Presentation

About This Presentation

Title:

Approximate XML Query Answers

Description:

Compress the synopsis by merging nodes with similar sub-structures ... XML data grows incrementally, so the synopsis model should be constructed incrementally ... (PowerPoint PPT presentation)

Slides: 28

Provided by: coe134

Transcript and Presenter's Notes

Title: Approximate XML Query Answers

1

Approximate XML Query Answers

Authors: N. Polyzotis, M. Garofalakis, Y. Ioannidis
Presenter: Hongyu Guo
2
Outline of this talk

Motivation
TreeSketch Approach
Experimental Results
Contributions and Limitations

3
Outline

Motivation
TreeSketch Approach
Experimental Results
Contributions and Limitations

4
Motivations
--Need fast feedback

XML is the de facto standard for data exchange
Need to explore large XML data sets and get fast feedback from complex XML queries
Conflict between fast online response and query execution cost

5
XML Query Challenges

Queries involve complex traversals of the XML data hierarchy
Complex queries over massive tree-structured data are very expensive
Approaches: optimize the query or optimize the data structure
Exact results are not always needed, so we can instead return approximate query answers

6
Approximate Query Answers
[Figure: a query evaluated over the data, producing the true result R and an approximation of R]

Obtain an approximation to the true result
Already used successfully in relational systems
Use the approximate result to get timely feedback

7
Outline
--A technique for returning fast, approximate results

Motivation
TreeSketch Approach
Experimental Results
Contributions and Limitations

8
Data and Query Model
--Background: XML document
[Figure: example XML document tree; tag legend: a = author, n = name, b = book, p = paper, y = year, k = keyword, t = title]
9
Data and Query Process
--Twig Query, Query Tree, and Nested Result Tree
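To show what producing the exact nested result involves, here is a minimal sketch of exact twig evaluation using Python's standard `xml.etree.ElementTree`. The twig //section{.//caption, .//equation} comes from the example query later in the talk. The sample document, the function name `evaluate_twig`, and the rule that a section is kept only when both branches match are assumptions made for illustration. Every section must be visited and its entire subtree scanned, and on large documents this kind of traversal becomes expensive.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document (not from the talk).
DOC = """<paper>
  <section><caption/><equation/><equation/></section>
  <section><caption/><section><equation/></section></section>
</paper>"""


def evaluate_twig(root):
    """Exact answer to the twig //section{.//caption, .//equation}.

    Returns one (section, captions, equations) triple per section binding.
    Assumption: a section is kept only if both branches have matches.
    """
    result = []
    for section in root.iter("section"):            # q1: //section, any depth
        captions = list(section.iter("caption"))     # q2: .//caption below it
        equations = list(section.iter("equation"))   # q3: .//equation below it
        if captions and equations:
            result.append((section, captions, equations))
    return result


if __name__ == "__main__":
    for section, caps, eqs in evaluate_twig(ET.fromstring(DOC)):
        print(section.tag, len(caps), "caption(s),", len(eqs), "equation(s)")
```

The nested inner section is dropped because it has no caption, and its equation is still counted under the outer section because .// follows the descendant axis.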
10
Basic Query Scenario
[Figure: the true nesting tree vs. the approximate nesting tree]

The key idea is to return fast, accurate feedback

11
Approximate Query Answers
--Two key problems

How to construct concise XML synopses that capture the statistical traits of the true data
How to produce approximate query answers over the synopsis efficiently

12
TreeSketch Construction
--Construction Algorithm

Step 1
Given an XML tree T, build a graph synopsis in which each node represents a set of elements with the same tag (the result can be a large tree)
Step 2
Compress the synopsis by merging nodes with similar sub-structures (i.e., clustering the XML elements)
Step 3
Repeat Step 2 until the predefined space budget constraint is met
Step 4
Return the TreeSketch synopsis

[Figure: compression from the perfect synopsis down to the space budget]
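The loop in Steps 2 and 3 can be sketched as greedy bottom-up merging. This minimal sketch relies on several assumptions that do not come from the talk. The synopsis is stored as plain dictionaries: `ext` maps each node to its element count, `tag` maps each node to its tag, and `edges` maps (parent, child) to the average number of children per element. The space budget is counted in synopsis nodes rather than bytes, and only nodes with the same tag may merge. The hypothetical `distance` function (an L1 gap between child-count vectors) stands in for the paper's actual merge criterion. Before merging, edge averages are converted back into absolute edge totals, which keeps the merged averages exact.

```python
from itertools import combinations


def merge(ext, tag, edges, a, b):
    """Merge synopsis node b into node a and return the updated edge map.

    Averages are first turned back into absolute edge totals
    (average * parent extent), so merging is a plain sum.
    """
    totals = {}
    for (p, c), avg in edges.items():
        key = (a if p == b else p, a if c == b else c)
        totals[key] = totals.get(key, 0.0) + avg * ext[p]
    ext[a] += ext.pop(b)
    tag.pop(b)
    # Re-derive per-element averages from the merged totals.
    return {(p, c): t / ext[p] for (p, c), t in totals.items()}


def distance(edges, a, b):
    """Hypothetical merge criterion: L1 gap between child-count vectors."""
    ca = {c: v for (p, c), v in edges.items() if p == a}
    cb = {c: v for (p, c), v in edges.items() if p == b}
    return sum(abs(ca.get(c, 0.0) - cb.get(c, 0.0)) for c in set(ca) | set(cb))


def compress(ext, tag, edges, budget):
    """Greedily merge the closest same-tag pair until <= budget nodes remain."""
    while len(ext) > budget:
        candidates = [(distance(edges, a, b), a, b)
                      for a, b in combinations(sorted(ext), 2)
                      if tag[a] == tag[b]]
        if not candidates:
            break  # no same-tag pair left; the budget cannot be met
        _, a, b = min(candidates)
        edges = merge(ext, tag, edges, a, b)
    return ext, tag, edges


if __name__ == "__main__":
    ext = {"r": 1, "s1": 1, "s2": 1, "c1": 2, "c2": 2}
    tag = {"r": "paper", "s1": "section", "s2": "section",
           "c1": "caption", "c2": "caption"}
    edges = {("r", "s1"): 1.0, ("r", "s2"): 1.0,
             ("s1", "c1"): 2.0, ("s2", "c2"): 2.0}
    print(compress(ext, tag, edges, budget=3))
```

In the usage example, the two caption nodes merge first because their distance is 0. That merge makes the two section nodes identical, so they merge next. The result has three nodes: r, s1 (2 elements), and c1 (4 elements), with an average of 2 children per element on both remaining edges.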
13
More Discussion
--of the construction procedure

Graph synopsis construction
Each node represents a set of elements with the same tag
Queries can be answered with zero error
The size can become very large: it can easily be on the order of the original document size
TreeSketch synopsis construction
Compress the synopsis by merging nodes
Bottom-up merging (clustering) algorithm
Key compression technique → clustering
Based on structure
Model accuracy depends on the quality of the clustering
Tight clusters → accurate synopsis, but large model
Loose clusters → less accuracy, but small model

14
Construction Example
--Count same-tag elements
[Figure: XML document and its graph synopsis]

Synopsis node: a set of elements with the same tag
Synopsis edge: summarizes the corresponding document edge(s)
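Here is a minimal sketch of this step with `xml.etree.ElementTree`. As the slide describes, it assumes elements are grouped by tag alone. The function name `tag_synopsis` and the sample document are hypothetical. The code records how many elements each synopsis node holds and how many document edges each synopsis edge summarizes.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_synopsis(root):
    """Build a tag-based graph synopsis of the tree rooted at `root`.

    Returns (extent, edge_count):
      extent[tag]              -> number of elements with that tag
      edge_count[(ptag, ctag)] -> number of parent/child document edges
                                  summarized by synopsis edge ptag -> ctag
    """
    extent = Counter()
    edge_count = Counter()
    for elem in root.iter():          # every element, in document order
        extent[elem.tag] += 1
        for child in elem:            # direct children only
            edge_count[(elem.tag, child.tag)] += 1
    return extent, edge_count


if __name__ == "__main__":
    doc = ET.fromstring("<r><p><s><f/></s><s><f/><f/></s></p></r>")
    extent, edges = tag_synopsis(doc)
    print(dict(extent))  # {'r': 1, 'p': 1, 's': 2, 'f': 3}
    print(dict(edges))   # {('r', 'p'): 1, ('p', 's'): 2, ('s', 'f'): 3}
```

Because grouping is by tag alone, nested elements with the same tag (e.g., a section inside a section) end up in one node with a self-loop edge.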

15
Construction Example
--Calculate the number of children per element

Calculate the average number of children for each edge
count(r, p): the mean number of children in p per element in r
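The same definition in symbols, using notation introduced here for illustration: extent(u) is the set of document elements grouped into synopsis node u. The value stored on edge (u, v) is the total number of parent/child document edges from u-elements to v-elements, divided by the number of u-elements. For example, 4 children spread over a 2-element node give an average of 2:

```latex
\mathrm{count}(u, v) = \frac{\bigl|\{(x, y) : x \in \mathrm{extent}(u),\ y \in \mathrm{extent}(v),\ y \text{ is a child of } x\}\bigr|}{|\mathrm{extent}(u)|},
\qquad \text{e.g. } \frac{4}{2} = 2.
```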

16
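Merging Nodes
--Smaller space budget
[Figure: TreeSketch synopsis (more concise synopsis); node labels give element counts, edge labels give average children per element]
R(1) → P(1): 1
P(1) → S(2): 2
S(2) → F(4): 2
F(4) → C(4): 1
F(4) → E(2): 0.5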
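The way edge averages combine shows why merging saves space but loses accuracy. The following worked sketch is derived from the per-element-average definition and uses notation introduced for illustration. Assume nodes u_1 and u_2 are merged into u, and that neither of them is the parent p or the child v. Outgoing averages become an extent-weighted mean, while a parent's incoming averages simply add:

```latex
\mathrm{count}(u, v) = \frac{|\mathrm{extent}(u_1)|\,\mathrm{count}(u_1, v) + |\mathrm{extent}(u_2)|\,\mathrm{count}(u_2, v)}{|\mathrm{extent}(u_1)| + |\mathrm{extent}(u_2)|},
\qquad
\mathrm{count}(p, u) = \mathrm{count}(p, u_1) + \mathrm{count}(p, u_2)
```

For example, suppose two single-element nodes have 1 and 3 children in v. They merge into a node with average (1·1 + 1·3)/2 = 2. The synopsis now stores 2 for both elements, and the 1-versus-3 variation is exactly the information a looser cluster gives up.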
17
Compute Approximate Answers
--Much like the traditional approach

Traverse the tree top-down
Match the query pattern against the structure and return a sub-tree
TreeSketch: fast response
Concise synopsis
Keeps statistical information
Node: number of elements with the same tag
Edge: average number of children per element
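This minimal sketch shows how the node and edge statistics combine. It assumes child-axis steps only, plus a uniformity assumption: every element in a node has the node's average number of children. The synopsis values form a small hypothetical example whose numbers mirror the merging example. `EXTENT`, `AVG`, and `estimate_path` are illustrative names.

```python
from math import prod

# Hypothetical synopsis: node -> number of elements (extent),
# and (parent, child) -> average number of children per parent element.
EXTENT = {"R": 1, "P": 1, "S": 2, "F": 4, "C": 4, "E": 2}
AVG = {("R", "P"): 1.0, ("P", "S"): 2.0, ("S", "F"): 2.0,
       ("F", "C"): 1.0, ("F", "E"): 0.5}


def estimate_path(path):
    """Estimated number of elements reached by following `path`
    (a list of synopsis nodes, child axis) from all elements of path[0]."""
    steps = zip(path, path[1:])
    return EXTENT[path[0]] * prod(AVG[(p, c)] for p, c in steps)


if __name__ == "__main__":
    print(estimate_path(["R", "P", "S", "F", "E"]))  # 1 * 1 * 2 * 2 * 0.5 = 2.0
```

Here the estimate matches the stored extent E(2) because every step agrees exactly with the synopsis. After lossy merges, the same product only approximates the true count.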

18
Compute Approximate Answers
--Example
[Figure: TreeSketch, query, and the resulting approximate nesting tree. Query nodes: q0 (root), q1 = //section, q2 = .//caption, q3 = .//equation]
Approximate results with structure: they take advantage of 1) the concise synopsis structure and 2) the statistical data
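The following is a simplified sketch, not the paper's evaluation algorithm. It walks a hypothetical acyclic synopsis (`TAG`, `AVG`, and all function names are illustrative). For each synopsis node that matches q1 = //section, it estimates how many section elements exist and how many captions (q2) and equations (q3) each one has on average. Descendant steps (//) are handled by summing average-count products over every synopsis path, which is one straightforward way to use the stored statistics. A cyclic synopsis, such as one where nested sections form a self-loop, would make this recursion run forever.

```python
# Hypothetical acyclic synopsis: node -> tag, (parent, child) -> avg children.
TAG = {"r": "paper", "s": "section", "b": "body", "c": "caption", "e": "equation"}
AVG = {("r", "s"): 3.0, ("s", "c"): 1.0, ("s", "b"): 1.0,
       ("b", "e"): 2.0, ("b", "c"): 0.5}


def reach(node, target):
    """Expected number of `target` elements below ONE element of `node`
    (descendant axis): sum of average-count products over all paths."""
    total = 0.0
    for (parent, child), avg in AVG.items():
        if parent == node:
            hit = 1.0 if child == target else 0.0
            total += avg * (hit + reach(child, target))
    return total


def descendants(node, tag):
    """Expected number of descendants with `tag` per element of `node`."""
    return sum(reach(node, v) for v, t in TAG.items() if t == tag)


def approximate_twig(root, root_extent=1):
    """Approximate nesting tree for q0 = root, q1 = //section,
    q2 = .//caption and q3 = .//equation (both relative to q1)."""
    answer = []
    for v, t in TAG.items():
        if t != "section":
            continue
        n = root_extent * reach(root, v)
        if n > 0:
            answer.append({"node": v, "sections": n,
                           "captions_each": descendants(v, "caption"),
                           "equations_each": descendants(v, "equation")})
    return answer


if __name__ == "__main__":
    print(approximate_twig("r"))
```

For this synopsis, the sketch estimates 3 sections, each with 1.5 captions (1 direct caption plus 0.5 via body) and 2 equations on average.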
19
Outline

Motivation
TreeSketch Approach
Experimental Results
Contributions and Limitations

20
Experimental Setup

Focus on:
the quality of the generated approximate answers
the efficiency of the construction process
Data sets: XMark, DBLP, IMDB, SwissProt
Workload: 1000 random twig queries

21
Evaluation Methods

Error: the distance between the true answer and the approximate answer
Popular metric: tree-edit distance
The minimum-cost sequence of edit operations that transforms one tree into the other
The authors argue it does not capture structural similarity
New evaluation metric: ESD (Element Simulation Distance)
Considers the number of children along each edge to capture the complete structure of the tree
Models how well the structures of two trees match each other
Measures the degree of simulation between two trees
Average ESD is used for evaluation
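For reference, the tree-edit distance mentioned above can be computed on small trees with a short memoized recursion over ordered forests, charging unit cost for each insert, delete, and relabel. This textbook formulation is chosen for brevity. It is not the paper's ESD metric, nor an efficient algorithm such as Zhang-Shasha. The tuple-based tree encoding and the function names are assumptions made for illustration.

```python
from functools import lru_cache


# A tree is (label, (child, child, ...)); a forest is a tuple of trees.
def tree(label, *children):
    return (label, tuple(children))


def size(forest):
    """Number of nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f1, f2):
    """Unit-cost ordered edit distance between two forests."""
    if not f1:
        return size(f2)                  # insert everything in f2
    if not f2:
        return size(f1)                  # delete everything in f1
    (l1, c1), (l2, c2) = f1[-1], f2[-1]  # rightmost roots
    return min(
        forest_distance(f1[:-1] + c1, f2) + 1,   # delete f1's rightmost root
        forest_distance(f1, f2[:-1] + c2) + 1,   # insert f2's rightmost root
        forest_distance(f1[:-1], f2[:-1])        # map the two roots onto
        + forest_distance(c1, c2)                # each other, align children,
        + (l1 != l2),                            # and relabel if tags differ
    )


def tree_edit_distance(t1, t2):
    return forest_distance((t1,), (t2,))


if __name__ == "__main__":
    exact = tree("paper", tree("section", tree("caption"), tree("equation")))
    approx = tree("paper", tree("section", tree("caption")))
    print(tree_edit_distance(exact, approx))  # 1 (one deleted equation)
```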

22
Experimental Results
--Approximate answers compared with Twig-XSketches
23
Experimental Results
--Relative Errors
Relative error < 5%, i.e., 95% accuracy
24
Outline
--Strengths and Weaknesses

Motivation
TreeSketch Approach
Experimental Results
Contributions and Limitations

25
TreeSketch Approach
--In this paper

Proposes an effective XML summarization mechanism
Captures the complete tree structure of large XML data
Experimental results show fast and accurate approximate query answers
Authors' claim: the first work to address the timely problem of producing approximate tree-structured answers for complex XML queries
Comparison with related work: two options
Either compute the exact answer to a path query (expensive)
Or use an approach such as Twig-XSketch, which does not capture the complete tree structure of the underlying XML database

26
Limitations
--Nice research; next steps for further investigation

Difficult to optimize some predefined parameters, such as the space budget, which is directly related to the accuracy of the approximate query answers
Too large → hurts efficiency; too small → hurts the quality of the answers; the right value depends on the query, data set, and computing resources
An incremental model construction process is needed
XML data grows incrementally, so the synopsis model should be constructed incrementally
More experiments or real applications are needed to justify the scalability of this technique

27
Thank You
