Title: Approximate XML Query Answers


1
  • Approximate XML Query Answers

Authors: N. Polyzotis, M. Garofalakis, Y. Ioannidis
Presenter: Hongyu Guo
2
Outline of this talk
  • Motivation
  • TreeSketch Approach
  • Experimental Results
  • Contributions and Limitations

4
Motivations
--Need fast feedback
  • XML is the de facto standard for data exchange
  • Users need to explore large XML data sets and get
    fast feedback from complex XML queries
  • Fast on-line response conflicts with query
    execution cost

5
XML Query Challenges
  • Queries involve complex traversals of the XML data
    hierarchy
  • Complex queries over massive tree-structured
    data are very expensive
  • Approaches: optimize the query or optimize the
    data structure
  • When exact results are not needed, we can instead
    return approximate query answers

6
Approximate Query Answers
[Figure: a query, its true result R, and an approximate result]
  • Obtain an approximation to the true result
  • Currently employed in relational systems
    successfully
  • Use approximate result to get timely feedback

7
Outline
--A technique being used to return fast,
approximate results
  • Motivation
  • TreeSketch Approach
  • Experimental Results
  • Contributions and Limitations

8
Data and Query Model
--Some background, XML document
Tag legend: a = author, n = name, b = book, p = paper, y = year,
k = keyword, t = title
9
Data and Query Process
--Twig Query, Query Tree, and Nested Result Tree
10
Basic Query Scenario
Approximate Nesting Tree
True Nesting Tree
  • Key idea is to return fast, accurate feedback

11
Approximate Query Answers
-- Two key problems
  • How to construct concise XML synopses, which
    capture the statistical traits of the true data
  • How to produce approximate query answers over the
    synopsis efficiently

12
TreeSketch Construction
--Construction Algorithm
  • Step 1
  • Given an XML tree T, build a graph synopsis in which
    each node represents a set of same-tag elements
    (this synopsis can be as large as the tree)
  • Step 2
  • Compress the synopsis by merging nodes with similar
    sub-structures (i.e. clustering of the XML
    elements)
  • Step 3
  • Repeat Step 2 until the predefined space budget
    constraint is met
  • Step 4
  • Return the TreeSketch synopsis
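
The merge loop in Steps 2 and 3 can be sketched roughly as follows. This is a simplified illustration, not the algorithm from the paper: the budget is assumed to be a maximum number of synopsis nodes, each element is described only by a dictionary of child-tag counts, and the names `centroid`, `distance`, and `compress` are hypothetical.

```python
from itertools import combinations

def centroid(vectors):
    """Mean child-count vector of a cluster of elements."""
    tags = {t for vec in vectors for t in vec}
    return {t: sum(vec.get(t, 0) for vec in vectors) / len(vectors) for t in tags}

def distance(vecs_a, vecs_b):
    """Squared distance between the centroids of two clusters (assumed measure)."""
    m_a, m_b = centroid(vecs_a), centroid(vecs_b)
    return sum((m_a.get(t, 0) - m_b.get(t, 0)) ** 2 for t in set(m_a) | set(m_b))

def compress(clusters, budget):
    """Greedily merge the closest same-tag clusters until at most `budget` remain."""
    clusters = {name: (tag, list(vecs)) for name, (tag, vecs) in clusters.items()}
    while len(clusters) > budget:
        candidates = [
            (distance(clusters[a][1], clusters[b][1]), a, b)
            for a, b in combinations(sorted(clusters), 2)
            if clusters[a][0] == clusters[b][0]  # only same-tag nodes may merge
        ]
        if not candidates:
            break  # nothing left to merge; the budget cannot be met
        _, a, b = min(candidates)
        tag, vecs_a = clusters.pop(a)
        _, vecs_b = clusters.pop(b)
        clusters[a + "+" + b] = (tag, vecs_a + vecs_b)
    return clusters

# Hypothetical input: node name -> (tag, child-count vectors of its elements)
start = {
    "p1": ("p", [{"s": 2}]),
    "p2": ("p", [{"s": 2}, {"s": 3}]),
    "p3": ("p", [{"s": 9}]),
    "r": ("r", [{"p": 4}]),
}
result = compress(start, budget=3)
print(sorted(result))  # ['p1+p2', 'p3', 'r']
```

Here p1 and p2 merge first because their elements have nearly the same number of s children, while p3 stays separate.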


[Figure labels: Space Budget, Perfect]
13
More Discussions
--of the construction procedure
  • Graph synopsis construction
  • Use a node to represent a set of same-tag elements
  • Queries can be answered with zero error
  • The size can become very large; it can easily be
    on the order of the original document size
  • TreeSketch synopsis construction
  • Compress the synopsis by merging nodes
  • Bottom-up merging clustering algorithm
  • Key technique to compress → clustering
  • Based on structure
  • Model accuracy depends on quality of clustering
  • Tight clusters → accurate synopsis, but large
    model
  • Loose clusters → less accuracy, but small model

14
Construction Example
--Count same tag elements
XML Document
(Graph Synopsis)
  • Synopsis node → set of elements with the same tag
  • Synopsis edge → document edge(s)

15
Construction Example
--Calculate number of children per element
  • Calculate the average number of children for each edge
  • Count(r, p): mean number of children in p per element in r
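
A minimal sketch of the two counting steps, assuming one synopsis node per tag as the slides describe. The function name `build_tag_synopsis` and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def build_tag_synopsis(xml_text):
    """Return (elements per tag, mean children per parent element for each tag pair)."""
    root = ET.fromstring(xml_text)
    node_count = Counter()   # synopsis node -> number of elements with that tag
    edge_total = Counter()   # (parent tag, child tag) -> number of document edges
    stack = [root]
    while stack:
        elem = stack.pop()
        node_count[elem.tag] += 1
        for child in elem:
            edge_total[(elem.tag, child.tag)] += 1
            stack.append(child)
    # Edge count = document edges divided by the number of parent elements
    edge_avg = {e: total / node_count[e[0]] for e, total in edge_total.items()}
    return dict(node_count), edge_avg

doc = "<r><p><s/><s/></p><p><s/></p></r>"
nodes, edges = build_tag_synopsis(doc)
print(nodes)  # {'r': 1, 'p': 2, 's': 3}
print(edges)  # {('r', 'p'): 2.0, ('p', 's'): 1.5}
```

The edge (p, s) gets 1.5 because three s elements are spread over two p elements.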

16
Merging Nodes
--Less space budget
TreeSketch synopsis (more concise synopsis)
[Figure: synopsis graph with nodes R(1), P(1), S(2), F(4), C(4), E(2)
and edge labels 1, 2, 2, 1, 0.5]
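
One way to picture what a merge does to the statistics: convert each edge average back to a total number of document edges, add the totals up, and divide by the new parent count. The sketch below assumes that rule. The function `merge_nodes` and the pre-merge numbers are hypothetical, chosen only so that the result resembles labels such as S(2), F(4), E(2) and 0.5 on this slide.

```python
def merge_nodes(counts, edges, a, b, merged):
    """Merge synopsis nodes a and b into one node named `merged`."""
    new_counts = {k: v for k, v in counts.items() if k not in (a, b)}
    new_counts[merged] = counts[a] + counts[b]
    totals = {}
    for (src, dst), avg in edges.items():
        new_src = merged if src in (a, b) else src
        new_dst = merged if dst in (a, b) else dst
        # avg * element count of the source = total document edges on this synopsis edge
        key = (new_src, new_dst)
        totals[key] = totals.get(key, 0.0) + avg * counts[src]
    new_edges = {e: total / new_counts[e[0]] for e, total in totals.items()}
    return new_counts, new_edges

# Hypothetical state before the merge: two F nodes, only one of which has E children
counts = {"S": 2, "F1": 2, "F2": 2, "E": 2}
edges = {("S", "F1"): 1.0, ("S", "F2"): 1.0, ("F1", "E"): 1.0}

new_counts, new_edges = merge_nodes(counts, edges, "F1", "F2", "F")
print(new_counts)  # {'S': 2, 'E': 2, 'F': 4}
print(new_edges)   # {('S', 'F'): 2.0, ('F', 'E'): 0.5}
```

The merged node is smaller to store, but the 0.5 on the edge from F to E no longer says which F elements have an E child. That lost detail is the source of approximation error.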
17
Compute Approximate Answers
--more like the traditional way
  • Traverse down the tree
  • Match a pattern in the structure and return a
    sub-tree
  • TreeSketch: fast response
  • Concise synopsis
  • Keep statistical information
  • Node: number of same-tag elements
  • Edge: average number of children per element

18
Compute Approximate Answers
--Example
[Figure panels: TreeSketch; Query with nodes q0 to q3 and paths
//section, .//caption, .//equation; Approximate Nesting Tree]
Approximate results with structure:
  1) Take advantage of the concise structure
  2) and the statistical data
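
As a rough illustration of how node and edge statistics can yield estimated counts for descendant steps such as //section, .//caption and .//equation, the sketch below multiplies edge averages along every downward path. It assumes an acyclic synopsis with one node per tag. The synopsis values, the `figure` tag, and the function `expected_descendants` are invented for the example and are not taken from the slide.

```python
def expected_descendants(edges, node, target):
    """Expected number of `target` descendants per element of `node`.

    Assumes the synopsis graph is acyclic, so the recursion terminates.
    """
    total = 0.0
    for (src, dst), avg in edges.items():
        if src == node:
            hit = 1.0 if dst == target else 0.0
            total += avg * (hit + expected_descendants(edges, dst, target))
    return total

# Hypothetical synopsis: element counts per node, mean children per edge
counts = {"R": 1, "section": 2, "figure": 4, "caption": 4, "equation": 2}
edges = {
    ("R", "section"): 2.0,
    ("section", "figure"): 2.0,
    ("figure", "caption"): 1.0,
    ("figure", "equation"): 0.5,
}

sections = counts["R"] * expected_descendants(edges, "R", "section")
captions_per_section = expected_descendants(edges, "section", "caption")
equations_per_section = expected_descendants(edges, "section", "equation")
print(sections, captions_per_section, equations_per_section)  # 2.0 2.0 1.0
```

The estimate is read off the synopsis alone, without touching the document, which is why the response is fast.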
19
Outline
  • Motivation
  • TreeSketch Approach
  • Experimental Results
  • Contributions and Limitations

20
Experimental Setup
  • Focus on:
  • the quality of the approximate answers generated
  • the efficiency of the construction process
  • Data sets: XMark, DBLP, IMDB, SwissProt
  • Workload: 1000 random twig queries

21
Evaluation Methods
  • Error → distance between the true result R and the
    approximate result
  • Popular metric: tree-edit distance
  • Min-cost sequence of operations that transforms
    one result tree into the other
  • Argument: it does not capture structural similarity
  • New evaluation metric: ESD (Element Simulation
    Distance)
  • Calculate the number of children for each edge in
    the tree to capture the complete structure of the
    tree
  • Models how well the structures of two trees match
    each other
  • Degree of simulation between two trees
  • Average ESD used for evaluation

22
Experimental Results
--Approximate answers, compared with twig-XSketches
23
Experimental Results
--Relative Errors
< 5%, i.e. 95% accuracy
24
Outline
-Strengths and Weaknesses
  • Motivation
  • TreeSketch Approach
  • Experimental Results
  • Contributions and Limitations

25
TreeSketch Approach
-In this paper
  • Propose an effective XML-summarization mechanism
  • Captures the complete tree structure of large XML
    data
  • Experimental results show fast and accurate
    approximate query answers
  • Authors' claim: the first work to address the
    timely problem of producing approximate
    tree-structured answers for complex XML queries
  • Comparison with related work: 2 options
  • Either compute the exact answer to a path query
    (expensive)
  • Or use an approach such as twig-XSketch, which
    does not capture the complete tree structure of
    the underlying XML database

26
Limitations
-Nice research; next steps for further investigation
  • Difficult to optimize some pre-defined
    parameters, such as the space budget
  • which is directly related to the accuracy of the
    approximate query answers
  • Too large → hurts efficiency; too small → hurts
    the quality of the answers; the right value
    depends on the query, data set, and computing
    resources
  • An incremental model construction process is
    desirable
  • XML data typically grow incrementally, so the
    synopsis model should be constructed incrementally
  • More experiments or some real applications are
    needed to justify the scalability of this
    technique

27
Thank You / Merci