Title: Efficient Processing of Ordered XML Twig Pattern
1Efficient Processing of Ordered XML Twig Pattern
- by Jiaheng Lu, Tok Wang Ling, Tian Yu, Changqing Li, Wei Ni
- Presented by Tian Yu
- 23 Aug 2005
2Outline
- Introduction and motivation
- Background
- XML tree and twig pattern matching
- Previous two algorithms: TwigStack and TwigStackList
- Our Ordered Twig Algorithms
- Ordered Children Extension (OCE for short)
- A generalized holistic matching algorithm: OrderedTJ
- Experiments
- Conclusion
4Introduction
- XML is rapidly gaining popularity as a data representation.
- XML documents are modeled as ordered trees.
- XML queries specify patterns of selection predicates on multiple elements that have some structural relationships (parent-child, ancestor-descendant).
5What is a Twig Pattern?
- A twig pattern is a small tree whose nodes are tags, attributes or text values, and whose edges are either Parent-Child (P-C) edges or Ancestor-Descendant (A-D) edges.
- E.g. query description: select Figure elements which are descendants of Paragraph elements, which in turn are children of Section elements that have a child element Title.
- Twig pattern
[Figure: twig pattern with root Section; Section has children Title and Paragraph; Paragraph has descendant Figure]
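A twig pattern such as the Section/Title/Paragraph/Figure example can be held in a small tree structure. The Python sketch below is illustrative only: `QueryNode`, its `edge` field and `root_leaf_paths` are hypothetical names chosen for this example, not part of the presented algorithms.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class QueryNode:
    """One node of a twig pattern (hypothetical representation)."""
    tag: str
    # Edge from the parent query node: "PC", "AD", or None for the root.
    edge: Optional[str] = None
    children: List["QueryNode"] = field(default_factory=list)


def root_leaf_paths(node: QueryNode, prefix: Tuple[str, ...] = ()) -> List[Tuple[str, ...]]:
    """Enumerate the root-to-leaf tag paths of the twig pattern."""
    path = prefix + (node.tag,)
    if not node.children:
        return [path]
    paths: List[Tuple[str, ...]] = []
    for child in node.children:
        paths.extend(root_leaf_paths(child, path))
    return paths


# The example twig: Section with child Title and child Paragraph,
# where Paragraph has a descendant Figure.
twig = QueryNode("Section", children=[
    QueryNode("Title", "PC"),
    QueryNode("Paragraph", "PC", [QueryNode("Figure", "AD")]),
])

print(root_leaf_paths(twig))
```

The root-leaf paths are the units that the holistic algorithms discussed later output as intermediate solutions.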
6Motivation
- XML documents are modeled as ordered trees, so it is natural to have ordered queries.
- Four ordered axes: following-sibling, preceding-sibling, following, preceding.
- Example
- Ordered query
- //book/title/following-sibling::chapter
- Unordered query
- //book/title/chapter
7Order axis
- Four axes: following-sibling, preceding-sibling, following, and preceding.
- In the sample document, set the context node to be f.
- Following of f: i and j
- Preceding of f: b, c and e
- Following-sibling of f: i
- Preceding-sibling of f: e
[Figure: sample XML document, a tree with nodes a, b, c, d, e, f, g, h, i and j; f is the context node]
- Following-sibling of f: a following node of f that shares the same parent with f.
- Preceding-sibling of f: a preceding node of f that shares the same parent with f.
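The four axes can be reproduced on a small tree. The sketch below is an illustration under an assumption: the exact shape of the sample document is not recoverable from the slide text, so `TREE` is a hypothetical shape chosen only because it yields the axis results listed above for context node f.

```python
# Hypothetical tree shape (parent -> children in document order).
TREE = {
    "a": ["b", "d", "j"],
    "b": ["c"],
    "d": ["e", "f", "i"],
    "f": ["g", "h"],
}


def document_order(root):
    """Pre-order traversal, i.e. the order of start tags in the document."""
    order = [root]
    for child in TREE.get(root, []):
        order.extend(document_order(child))
    return order


def parent_of(node):
    for parent, children in TREE.items():
        if node in children:
            return parent
    return None


def ancestors(node):
    result = []
    parent = parent_of(node)
    while parent is not None:
        result.append(parent)
        parent = parent_of(parent)
    return result


def descendants(node):
    return document_order(node)[1:]


def following(node, root="a"):
    """Nodes after `node` in document order, excluding its descendants."""
    order = document_order(root)
    after = order[order.index(node) + 1:]
    return [n for n in after if n not in descendants(node)]


def preceding(node, root="a"):
    """Nodes before `node` in document order, excluding its ancestors."""
    order = document_order(root)
    before = order[:order.index(node)]
    return [n for n in before if n not in ancestors(node)]


def following_sibling(node):
    return [n for n in following(node) if parent_of(n) == parent_of(node)]


def preceding_sibling(node):
    return [n for n in preceding(node) if parent_of(n) == parent_of(node)]


print(following("f"), preceding("f"), following_sibling("f"), preceding_sibling("f"))
```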
8Ordered Twig Pattern
- //chapter[title="related work"]/following::section
- Intuitive meaning: search for all the sections that appear after (but are not descendants of) chapter elements with the title "related work" in the XML document.
- The query node Book is ordered.
9Ordered Twig Pattern
- //chapter[title="related work"]/following::section
10Ordered Twig Pattern
- //chapter[title="related work"]/following::section
- If the twig pattern is unordered, section1, section2 and section3 are all matching elements.
11Ordered Twig Pattern
- //chapter[title="related work"]/following::section
- For the ordered query, however, section1 and section2 are not in the solution. How do we detect that in our method?
12Motivation
- Naïve method
- Use an existing algorithm to output the intermediate path solutions for each individual root-leaf query path.
- Merge the path solutions so that the final solutions are guaranteed to satisfy the order predicates of the query.
- Disadvantage of the naïve method
- Many intermediate results may not contribute to final answers.
- Our solution: efficient processing of ordered XML twig patterns.
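The cost of the naïve method can be seen in a small sketch. Everything here is assumed for illustration: the element names, the `POS` positions and the function `naive_ordered_match` are hypothetical, and the order predicate is simplified to "the chapter ends before the section starts".

```python
from itertools import product

# Hypothetical elements: name -> (start, end) positions in the document.
POS = {
    "b1": (1, 20),
    "c2": (8, 11),
    "s1": (3, 4),
    "s2": (6, 7),
    "s3": (13, 14),
}

# Path solutions produced independently for each root-leaf query path.
chapter_paths = [("b1", "c2")]
section_paths = [("b1", "s1"), ("b1", "s2"), ("b1", "s3")]


def naive_ordered_match(left_paths, right_paths):
    """Merge path solutions on the shared root, then filter on order."""
    merged = [
        (left, right)
        for left, right in product(left_paths, right_paths)
        if left[0] == right[0]
    ]
    # The order predicate is applied only after merging.
    return [
        (left, right)
        for left, right in merged
        if POS[left[1]][1] < POS[right[1]][0]
    ]


matches = naive_ordered_match(chapter_paths, section_paths)
print(len(chapter_paths) + len(section_paths), "path solutions,", len(matches), "match")
```

In this toy input, four path solutions are produced but only two of them take part in the single ordered match, which is the kind of waste the slide describes.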
14XML Twig Pattern Matching
- An XML document is commonly modeled as a rooted, ordered and tagged tree.
[Figure: sample XML tree rooted at book, with preface, chapter, section, title and paragraph elements and text values such as "Intro", "Data" and "XML"]
15Region Coding
- Node label [1]: (startPos, endPos, LevelNum)
- E.g.
[Figure: the sample tree with region codes. book (1,21,1); preface (2,4,2); chapter (5,12,2); chapter (13,20,2); Intro (3,3,3); section and title nodes (6,8,3), (9,11,3), (14,16,3), (17,19,3); level-4 nodes (7,7,4), (10,10,4), (15,15,4), (18,18,4), including the text value "Data"]
- [1] M. P. Consens and T. Milo. Optimizing queries on files. In Proceedings of ACM SIGMOD, 1994.
16Region Coding
- Given e1 and e2, e1 is an ancestor of e2 iff e1.start < e2.start and e1.end > e2.end.
[Figure: the labeled sample tree, with e1 marked at book (1,21,1) and e2 marked among the level-4 nodes]
17Region Coding
- Given e1 and e2, e1 is the parent of e2 iff e1.start < e2.start and e1.end > e2.end, and e1.level + 1 = e2.level.
[Figure: the labeled sample tree, with e1 marked at book (1,21,1) and e2 marked at one of its level-2 children]
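The two region-coding tests translate directly into code. This is a minimal sketch: the `Region` type and the function names are assumptions made for the example, while the labels are taken from the sample tree.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """Region code (startPos, endPos, LevelNum) of one element."""
    start: int
    end: int
    level: int


def is_ancestor(e1: Region, e2: Region) -> bool:
    """e1 is an ancestor of e2 iff e1's region strictly contains e2's."""
    return e1.start < e2.start and e1.end > e2.end


def is_parent(e1: Region, e2: Region) -> bool:
    """e1 is the parent of e2 iff it is an ancestor exactly one level up."""
    return is_ancestor(e1, e2) and e1.level + 1 == e2.level


book = Region(1, 21, 1)
chapter = Region(5, 12, 2)
leaf = Region(7, 7, 4)

print(is_ancestor(book, leaf), is_parent(book, chapter), is_parent(book, leaf))
```

Both tests need only the two labels, so no tree traversal is required to decide a structural relationship.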
19Previous work TwigStack
- TwigStack [2]: a holistic approach
- Two-phase algorithm
- Phase 1 (TwigJoin): part of the intermediate root-leaf paths are output.
- Phase 2 (Merge): merge the intermediate paths to get the final results.
- [2] N. Bruno, D. Srivastava, and N. Koudas. Holistic twig joins: optimal XML pattern matching. In Proceedings of ACM SIGMOD, 2002.
20Sub-optimality of TwigStack
- TwigStack is optimal when the query contains only ancestor-descendant relationships.
- If the query contains any parent-child relationship, TwigStack may output some intermediate path solutions that cannot contribute to final results.
- We say that TwigStack is sub-optimal for queries with parent-child relationships.
21TwigStackList
- The main problem of TwigStack is that it assumes all edges are ancestor-descendant relationships in the first phase, so it is not efficient for queries with parent-child relationships.
- Improved method: TwigStackList [3] (CIKM 2004)
- There is an additional list structure for each query node to cache elements that are likely to participate in final solutions.
- TwigStackList [3] improves on TwigStack, since it considers parent-child relationships in the first phase.
- TwigStackList is optimal when there is no P-C edge for branching nodes (a branching node is a node with more than one descendant or child).
- [3] J. Lu, T. Chen, and T. W. Ling. Efficient processing of XML twig patterns with parent child edges: a look-ahead approach. In CIKM, pages 533-542, 2004.
22TwigStackList v.s. TwigStack
[Figure: a twig pattern with root section and query nodes title, paragraph and figure, beside an XML tree with elements s1, s2, t1, t2, t3, p1, p2, p3, f1 and f2. Note on the figure: no parent-child relationship for the branching node]
- TwigStack outputs the useless path solution <s1,t1>, since it does not check for parent-child relationships.
- TwigStackList has no useless output: <s1,t1> is not in the output.
24Ordered Children Extension (OCE)
- Definition
- An element e_n (of type n) has an OCE if:
- 1) In the query Q, for every A-D child n' of n (if any), there is an element e_n' (with tag n') that is a descendant of e_n, and e_n' also has an OCE; and
- 2) In the query Q, for every P-C child n' of n (if any), there is an element e' (with tag n) on the path from e_n to e_n' such that e' is the parent of e_n', and e_n' also has an OCE; and
- 3) For each child (or descendant) n' of n, if there is a node m that is the immediate right sibling of n', there are elements e_n' and e_m such that e_n' is a child (or descendant) of element e_n, e_n'.end < e_m.start, and both e_n' and e_m have an OCE.
- The first two conditions are guaranteed by TwigStackList. Our main focus is the third condition.
25Ordered Children Extension (OCE)
- Definition
- Condition 3)
- For each child (or descendant) n' of n, if there is a node m that is the immediate right sibling of n', there are elements e_n' and e_m such that e_n' is a child (or descendant) of element e_n, e_n'.end < e_m.start, and both e_n' and e_m have an OCE.
[Figure: an ordered XML query in which ordered node n has child n' followed by its right sibling m, beside an XML document in which e_n contains e_n' followed by e_m]
26Ordered Children Extension (OCE)
- In an ordered XML query:
- If node n is an ordered node, all three of the previous conditions must be checked to find its OCE.
- If node n is an unordered node, only the first two conditions need to be checked to find its OCE. The last condition does not apply.
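The ordering test in condition 3 compares region codes of candidate elements for neighbouring query children. The sketch below covers only that test and assumes conditions 1 and 2 have already been established for every candidate. The function name and the (start, end) values are hypothetical; the two inputs are loosely modeled on a case where the elements appear one after another and a case where one candidate is nested inside the previous one.

```python
def ordered_condition_holds(candidates_per_child):
    """Check the pairwise ordering test of condition 3.

    candidates_per_child holds one list of (start, end) regions per query
    child, in left-to-right query order. For each child and its immediate
    right sibling there must be a pair of candidates where the left one
    ends before the right one starts.
    """
    for left, right in zip(candidates_per_child, candidates_per_child[1:]):
        if not any(
            l_end < r_start
            for (_, l_end) in left
            for (r_start, _) in right
        ):
            return False
    return True


# b1, c1, d1 appear one after another: the ordering test succeeds.
sequential = [[(2, 3)], [(4, 7)], [(8, 9)]]
# d1 is nested inside c1, so no d element starts after c1 ends.
nested = [[(2, 3)], [(4, 9)], [(5, 6)]]

print(ordered_condition_holds(sequential), ordered_condition_holds(nested))
```

For an unordered query node this test would simply be skipped.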
27Ordered Children Extension Example 1
[Figure: query with ordered node a and children b, c and d, beside a document with a1 and elements c1, e1, e2, b1 and d1]
28Ordered Children Extension Example 1
[Figure: the same query and document as on the previous slide]
- a1 has an OCE.
29Ordered Children Extension Example 1
[Figure: the same query and document as on the previous slide]
- a1 has an OCE:
- 1) a1 has descendants b1 and d1, and child c1 (fulfills conditions 1 and 2 of the OCE definition).
- 2) b1 has a right-sibling element c1, and c1 has a right-sibling element d1 (fulfills condition 3 of the OCE definition).
30Ordered Children Extension Example 2
[Figure: query with ordered node a and children b, c and d, beside a document with a1 and elements c1, e1, b1 and d1]
31Ordered Children Extension Example 2
[Figure: the same query and document as on the previous slide]
- a1 does not have any OCE.
32Ordered Children Extension Example 2
[Figure: the same query and document as on the previous slide]
- a1 does not have any OCE:
- 1) a1 has descendants b1 and d1, and child c1 (fulfills conditions 1 and 2 of the OCE definition).
- 2) b1 has a right-sibling node c1 (fulfills condition 3 of the OCE definition).
- 3) However, c1 only has d1 as a descendant. There is no element with the label d that is a right sibling of element c1 (does not satisfy condition 3 of the OCE definition).
34Data structure
- Each node n in the twig query has a stream, a list and a stack.
- Data stream Tn
- We partition an XML document into streams.
- All elements in a stream have the same tag and are ordered by their start position.
- The elements in each stream are read only once, from head to tail.
[Figure: query with ordered node a and children b, c and d, beside a four-level document with elements a1, a2, a3, b1, b2, c1, c2, d1, d2 and d3. Streams: Ta: a1, a2, a3; Tb: b1, b2; Tc: c1, c2; Td: d1, d2, d3]
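Partitioning a document into per-tag streams can be sketched as follows. The function name `build_streams` and the input elements are hypothetical; the sketch only shows the grouping by tag and the ordering by start position described above.

```python
from collections import defaultdict


def build_streams(elements):
    """Group (tag, start, end, level) elements into one stream per tag.

    Each stream is sorted by start position, so it can later be consumed
    once from head to tail with a simple cursor.
    """
    streams = defaultdict(list)
    for tag, start, end, level in elements:
        streams[tag].append((start, end, level))
    for tag in streams:
        streams[tag].sort()  # start positions are unique, so this orders by start
    return dict(streams)


# Hypothetical elements, deliberately given out of document order.
elements = [
    ("b", 7, 8, 3),
    ("a", 4, 19, 2),
    ("a", 1, 20, 1),
    ("d", 5, 6, 3),
    ("b", 2, 3, 2),
]

streams = build_streams(elements)
print(streams["a"], streams["b"], streams["d"])
```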
35Data structure
- Each node n in the twig query has a stream, a list and a stack.
- List Ln
- The elements in the lists help to check for P-C relationships.
- Elements in each list Ln are strictly nested from first to last, i.e. in the XML document, each element is an ancestor or parent of the element that follows it.
[Figure: lists for the query nodes a, b, c and d. La: a1, a2; Lb: b1 ..; Lc: c1; Ld: d1, d3]
36Data structure
- Each node n in the twig query has a stream, a list and a stack.
- Stack Sn
- Stacks are used to store elements that have at least one OCE.
- Elements in the stacks are potential solutions of the XML query.
- When we insert a new element into a stack, the top element of the stack is popped if it does not have an A-D relationship with the new element.
[Figure: stacks Sa, Sb, Sc and Sd for the query nodes a, b, c and d]
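The stack rule can be sketched in a few lines. This is an assumption-laden illustration: elements are reduced to hypothetical (start, end) pairs, and the pop is repeated in a loop until the top is an ancestor of the new element, which is how stack-based twig joins are commonly written; the slide itself only mentions popping the top element.

```python
def push(stack, element):
    """Push `element` after removing tops that are not its ancestors.

    stack: list of (start, end) regions, bottom first.
    element: (start, end) region of the new element.
    """
    start, end = element
    # Assumption: keep popping until the top strictly contains the new element.
    while stack and not (stack[-1][0] < start and stack[-1][1] > end):
        stack.pop()
    stack.append(element)


stack = []
push(stack, (1, 20))   # outermost element
push(stack, (2, 9))    # nested inside (1, 20)
push(stack, (10, 15))  # not inside (2, 9), so (2, 9) is popped first
print(stack)
```

After these pushes every element in the stack is an ancestor of the one above it, which is the nesting property the stacks rely on.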
37A holistic matching algorithm OrderedTJ
- We propose a general algorithm, OrderedTJ, that computes answers to an ordered query twig.
- Our key focus is to check the ordered nodes in the query and find elements that have at least one OCE.
38Main function
- OrderedTJ Main function operates in two phases.
39Main function
- The OrderedTJ main function operates in two phases.
- Phase 1: parts of the query root-leaf paths are output, and the ordering requirements of the ordered query are checked.
- Phase 2: these solutions are merge-joined to compute the answers to the whole query.
[Figure: main-function pseudocode with callouts "Important function", "Phase 1" and "Phase 2"]
40getNext(n)
- It gets the next stream to be processed and advanced.
[Figure: getNext pseudocode with callouts "Check Order" and "Check P-C"]
41An example of OrderedTJ algorithm
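[Figure: document tree with book b1, chapters c1, c2 and c3, titles t1, t2 and t3 (text values "Introduction", "Algorithm" and "Related work") and sections s1, s2 and s3. Query: ordered node Book with children Chapter and Section; Chapter has Title, which has the value "Related work". Streams: Book: b1; Chapter: c1, c2, c3; Section: s1, s2, s3; Title: t1, t2, t3]
- Next action: partition the XML document into streams.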
42An example of OrderedTJ algorithm
- Next action: show the lists for nodes with a P-C child.
43An example of OrderedTJ algorithm
- Next action: show the stacks of every node in the query.
44An example of OrderedTJ algorithm
- t1 has no descendant "related work".
- Next action: advance(Title).
45An example of OrderedTJ algorithm
- t2 has the descendant "related work".
- Next action: insert t2 into the list of Title.
46An example of OrderedTJ algorithm
- c1 has no descendant title that has the child "related work".
- Next action: advance(Chapter).
47An example of OrderedTJ algorithm
- c2 has a descendant t2 that has the child "related work".
- Next action: insert c2 into the list of Chapter.
48An example of OrderedTJ algorithm
- s1 is not a following element of c2.
- Next action: advance(Section).
49An example of OrderedTJ algorithm
- s2 is not a following element of c2.
- Next action: advance(Section).
50An example of OrderedTJ algorithm
- b1 has an OCE.
- Next action: push b1 into the stack of Book.
51An example of OrderedTJ algorithm
- c2 has an OCE.
- Next action: push c2 into the stack of Chapter.
52An example of OrderedTJ algorithm
- t2 has an OCE.
- Next action: push t2 into the stack of Title.
53An example of OrderedTJ algorithm
- "Related work" is the leaf node.
- Next action: push r into the stack of "Related work".
54An example of OrderedTJ algorithm
- A path is found.
- Next action: output b1, c2, t2, r.
55An example of OrderedTJ algorithm
- s3 is a leaf node and follows element c2.
- Next action: push s3 into the stack.
56An example of OrderedTJ algorithm
- A path is found.
- Next action: output b1, s3.
57An example of OrderedTJ algorithm
- Previous output: b1, c2, t2, r
- Output: b1, s3
58An example of OrderedTJ algorithm
- Next action: join the output paths.
- A match is found.
59An example of OrderedTJ algorithm
- A match is found.
60Optimality of OrderedTJ
- TwigStack does not consider P-C relationships; therefore, it produces more intermediate results than TwigStackList.
- Therefore, we compare the optimality of OrderedTJ with that of TwigStackList.
- Example: we match the ordered Query 1 against XML Document 1 using the two algorithms, TwigStackList and OrderedTJ.
[Figure: Query 1, an ordered node a with children b and c, beside Document 1 with elements a1, a2, b1 and c1]
61Optimality of OrderedTJ
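- TwigStackList can only solve an ordered XML query with the naïve method.
- Therefore, it converts Query 1 to Query 2 by removing the ordered sign from the twig pattern.
[Figure: Query 1 (ordered node a with children b and c), Query 2 (unordered node a with children b and c) and Document 1 with elements a1, a2, b1 and c1]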
62Optimality of OrderedTJ
- Sub-optimality of TwigStackList
- When there is a P-C relationship at the branching node, there could be redundant intermediate output.
- In this example:
- In the streams, the elements are read only once from head to tail.
- Therefore, when TwigStackList processes elements a1, c1 and b1, there is no way to decide whether there is an element b2 that is a child of a1. As a result, the algorithm outputs the useless solution <a1,c1>.
[Figure: Query 2 and the document with elements a1, a2, b1 and c1 and a possible b2, as processed by TwigStackList]
63Optimality of OrderedTJ
- Optimality of OrderedTJ
- It allows a parent-child relationship on the first branching edge of an ordered node.
- In this example:
- When OrderedTJ processes elements a1, c1 and b1, there is no element with tag name b before c1, so condition 3 in the definition of OCE is not satisfied and c1 does not contribute to any final answer.
- Therefore, the algorithm does not output the useless solution <a1,c1>.
[Figure: Query 1 and the document with elements a1, a2, b1 and c1, as processed by OrderedTJ]
64Optimality of OrderedTJ
[Figure: TwigStack optimality, query class "A-D only"]
- TwigStack is optimal for A-D only queries.
65Optimality of OrderedTJ
[Figure: TwigStackList optimality, query class "A-D for branching node" enclosing "A-D only"]
- TwigStackList is optimal for queries that have only A-D edges for branching nodes. The other edges in the query can be P-C edges.
66Optimality of OrderedTJ
[Figure: OrderedTJ optimality, query class "P-C for 1-branch of ordered node" enclosing "A-D for branching node" and "A-D only"]
- OrderedTJ allows a parent-child relationship on the first branching edge of ordered nodes.
68Experiments
- Algorithms for comparison
- Straightforward TwigStack (STW for short)
- Straightforward TwigStackList (STWL)
- Our proposed OrderedTJ
- Benchmarks
- XMark: synthetic data
- Size: 115 MB (factor 1.0)
- TreeBank: real data from the Wall Street Journal
- Size: 82 MB, 2.5 million nodes
69Experiments
- Test queries
- Q1, Q2, Q3 for XMark; Q4, Q5, Q6 for TreeBank
- Evaluation metrics
- Number of intermediate path solutions
- Total running time
70Experiments Execution Time
- OrderedTJ outputs fewer intermediate results; therefore, it has a shorter execution time.
71Experiments Intermediate result
Query  Dataset   STW    STWL   OrderedTJ  Useful solutions
Q1     XMark     71956  71956  44382      44382
Q2     XMark     65940  65940  10679      10679
Q3     XMark     71522  71522  23959      23959
Q4     TreeBank  2237   1502   381        302
Q5     TreeBank  92705  92705  83635      79941
Q6     TreeBank  10663  11     5          5
Table 1. The number of intermediate path solutions
- OrderedTJ has the smallest intermediate results.
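The gap between intermediate and useful solutions can be computed directly from Table 1. The short sketch below only restates the table's numbers; the names `TABLE1` and `redundant_paths` are made up for the example.

```python
# Rows of Table 1: query -> (STW, STWL, OrderedTJ, useful solutions).
TABLE1 = {
    "Q1": (71956, 71956, 44382, 44382),
    "Q2": (65940, 65940, 10679, 10679),
    "Q3": (71522, 71522, 23959, 23959),
    "Q4": (2237, 1502, 381, 302),
    "Q5": (92705, 92705, 83635, 79941),
    "Q6": (10663, 11, 5, 5),
}


def redundant_paths(query):
    """Intermediate path solutions of each algorithm that are not useful."""
    stw, stwl, ordered_tj, useful = TABLE1[query]
    return {
        "STW": stw - useful,
        "STWL": stwl - useful,
        "OrderedTJ": ordered_tj - useful,
    }


for query in TABLE1:
    print(query, redundant_paths(query))
```

By this measure OrderedTJ produces no redundant path solutions for Q1, Q2, Q3 and Q6, and a comparatively small number for Q4 and Q5.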
72Experiments Intermediate result
For all queries, OrderedTJ has the smallest
intermediate results.
73Experiments Intermediate result
Q1 (XMark), from Table 1: STW 71956, STWL 71956, OrderedTJ 44382, useful solutions 44382
[Figure: Query 1, an ordered twig over the tags test, bold and keyword]
- Query 1 has only A-D edges; therefore, STW and STWL output the same intermediate results.
- However, OrderedTJ has fewer intermediate results since it also considers the ordering relationship.
74Experiments Intermediate result
Q4 (TreeBank), from Table 1: STW 2237, STWL 1502, OrderedTJ 381, useful solutions 302
[Figure: Query 4, an ordered twig over the tags S, VP, PP, IN, NP and VBN]
- Query 4 has P-C edges for non-branching nodes; therefore, STWL outputs fewer intermediate results than STW.
- OrderedTJ outputs even fewer intermediate results since it also considers the ordering relationship.
- OrderedTJ still has redundant intermediate results compared with the final useful results, because there are P-C edges on the second branch of the ordered node PP.
75Experiments Intermediate result
Q6 (TreeBank), from Table 1: STW 10663, STWL 11, OrderedTJ 5, useful solutions 5
[Figure: Query 6, an ordered twig over the tags S, DT and PRP_DOLLAR_]
- STWL outputs fewer intermediate results than STW, since there is a P-C edge in the query.
- OrderedTJ outputs no redundant intermediate results compared with the final useful results, because the query only has a P-C edge on the first branch of the ordered node PP.
- OrderedTJ is optimal in this case.
77Conclusions
- We developed a new algorithm, OrderedTJ, to solve the problem of ordered twig pattern matching.
- OrderedTJ identifies a larger query class for which I/O optimality is guaranteed.
- Experimental results showed the effectiveness, scalability, and efficiency of our algorithm.
- Future work: implement a more efficient indexing method, e.g. B tree or R tree, to skip XML elements.
78Reference(1)
- [1] M. P. Consens and T. Milo. Optimizing queries on files. In Proceedings of ACM SIGMOD, 1994.
- Node label: region encoding.
- [2] N. Bruno, D. Srivastava, and N. Koudas. Holistic twig joins: optimal XML pattern matching. In SIGMOD Conference, pages 310-321, 2002.
- Proposes the TwigStack algorithm.
- [3] J. Lu, T. Chen, and T. W. Ling. Efficient processing of XML twig patterns with parent child edges: a look-ahead approach. In CIKM, pages 533-542, 2004.
- Proposes the TwigStackList algorithm.
79Reference(2)
- [4] Y. Chen, S. B. Davidson, and Y. Zheng. BLAS: an efficient XPath processing system. In Proc. of SIGMOD, pages 47-58, 2004.
- Proposes a new algorithm for XPath queries.
- [5] J. Lu, T. W. Ling, C. Y. Chan, and T. Chen. From region encoding to extended Dewey: on efficient processing of XML twig pattern matching. In VLDB, 2005.
- Proposes a new twig pattern matching algorithm based on a proposed prefix labeling scheme.
80END