Title: Evaluation of Partial Path Queries on XML Data
Stefanos Souldatos
Outline
- Querying XML data
- Partial path queries
- Query evaluation
- Experiments
- Conclusion
Difficulties in Querying XML Data
[Image: Creta]
Search for hotel
- Name: Xiaoying Wu
- Place: Athens Center, Heraklio
- Purpose: Sightseeing
→ structural difference
[Images: Parthenon (438 BC), Phaistos Disk (1700 BC)]
Search for hotel
- Name: Theodore Dalamagas
- Place: Islands
- Purpose: Sea sports
→ structural inconsistency
[Images: windsurf, jet ski]
Search for hotel
- Name: Dimitri Theodoratos
- Place: Heraklio
- Purpose: HDMS Conference
→ unknown structure
[Image: HDMS 2008]
Search for hotel
- Name: Stefanos Souldatos
- Place: Any island
- Purpose: Escape from PhD!
→ multiple sources
[Sources: theHotel.gr, 1400 islands, hotels.gr, holidays.gr]
Q1. Can we use XPath to express our queries?
Q2. Can we use existing techniques to evaluate our queries?
Can We Use XPath?
[Figure: keyword search and path queries expressed in XPath, placed on a scale of structure from 0 to 100]
1. //City descendant-or-self ancestor-or-self Island
2. /City//Island
[Figure: the scale of structure from 0 to 100, now placing keyword search, partial path queries and path queries]
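As a rough illustration of what a partial path query asks for, the following Python sketch (standard library only) evaluates one by brute force. The two documents, the `Hotel` tag and the helper `hotels_below_both` are hypothetical, not taken from the talk. The helper returns every `Hotel` that has both a `City` and an `Island` ancestor, in either order; a single fixed path such as /Island/City/Hotel would match only the first document.

```python
import xml.etree.ElementTree as ET

# Two hypothetical sources that nest City and Island in opposite orders.
DOC_A = "<Island name='Creta'><City name='Heraklio'><Hotel name='h1'/></City></Island>"
DOC_B = "<City name='Heraklio'><Island name='Creta'><Hotel name='h2'/></Island></City>"


def hotels_below_both(root, first, second, target="Hotel"):
    """Return the names of `target` elements that have both `first` and
    `second` among their ancestors, in either order (brute-force semantics)."""
    results = []

    def walk(elem, ancestors):
        if elem.tag == target and first in ancestors and second in ancestors:
            results.append(elem.get("name"))
        for child in elem:
            walk(child, ancestors + [elem.tag])

    walk(root, [])
    return results


for doc in (DOC_A, DOC_B):
    print(hotels_below_both(ET.fromstring(doc), "City", "Island"))
# prints ['h1'] then ['h2']
```

The recursive walk visits every element once and copies the ancestor list at each step, so it only shows the semantics; it is not meant as an efficient evaluation strategy.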
Partial Path Queries
- Query processing
  - Full form (13 inference rules)
  - Unsatisfiability (cycles)
  - Redundant nodes (4 patterns)
  - Canonical form
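The slides name these processing steps without spelling out the rules. As an assumption-laden sketch of the simplest part, the Python below derives implied ancestor relationships by transitivity alone (not the 13 inference rules, which are not listed here) and flags a query as unsatisfiable when some node would have to be its own ancestor, which is what a cycle in the query graph implies. Edges are read as "is an ancestor of"; the example graphs and helper names are hypothetical.

```python
from itertools import product

# Hypothetical examples: edge (u, v) means "u is an ancestor of v" in the query.
SATISFIABLE = [("r", "a"), ("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
CYCLIC = [("a", "b"), ("b", "c"), ("c", "a")]


def ancestor_closure(edges):
    """Derive every implied ancestor pair using transitivity only."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so the set can grow safely inside the loop.
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


def is_unsatisfiable(edges):
    """A node that must be its own ancestor can never be matched in a tree."""
    return any(u == v for u, v in ancestor_closure(edges))


print(("a", "d") in ancestor_closure(SATISFIABLE))  # True: a above b above d
print(is_unsatisfiable(SATISFIABLE), is_unsatisfiable(CYCLIC))  # False True
```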
Naive Techniques
NT1. Producing all possible path queries
[Figure: example partial path query over nodes r, a, b, c, d, e, f, g]
[Figure: the four path queries produced from the example query, each a different ordering of r, a, b, c, d, e, f, g]
→ too many queries to evaluate
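NT1 amounts to enumerating every total order of the query nodes that respects the query's edges, i.e. every topological sort of the query graph. The sketch below assumes the example figure encodes the edges r→a, a→b, a→c, b→d, c→d, d→e, d→f, e→g and f→g; this reading is reconstructed from the four orderings shown, so treat it as an assumption. `all_linearizations` is a hypothetical helper.

```python
# Assumed reading of the example figure: edge (u, v) means u is an ancestor of v.
NODES = ["r", "a", "b", "c", "d", "e", "f", "g"]
EDGES = [("r", "a"), ("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "g"), ("f", "g")]


def all_linearizations(nodes, edges):
    """Yield every total order of `nodes` consistent with `edges`.

    Each order is one ordinary path query that NT1 would have to evaluate.
    """
    preds = {n: set() for n in nodes}
    for u, v in edges:
        preds[v].add(u)

    def extend(prefix, remaining):
        if not remaining:
            yield list(prefix)
            return
        placed = set(prefix)
        for n in sorted(remaining):
            if preds[n] <= placed:  # every required ancestor is already placed
                prefix.append(n)
                yield from extend(prefix, remaining - {n})
                prefix.pop()

    yield from extend([], frozenset(nodes))


for order in all_linearizations(NODES, EDGES):
    print(" ".join(order))
```

On this graph it prints four orderings, matching the four path queries in the figure. With k mutually unordered query nodes the count grows as k!, which is the "too many queries" problem.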
NT2. Decomposing into binary relationships
[Figure: the example query decomposed into binary structural relationships, each evaluated with Stack-Tree-Desc or PathStack]
[Figure: the binary results combined with merge-joins]
→ intermediate results
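Each binary relationship in NT2 is a structural join. For illustration, here is a minimal Python sketch in the spirit of Stack-Tree-Desc, assuming a (start, end) region encoding in which x is an ancestor of y exactly when x.start < y.start and y.end < x.end. It handles ancestor-descendant joins only (no level check for parent-child), and the merge-join that NT2 applies to the binary results afterwards is not shown. The function name and the tiny document are hypothetical.

```python
from typing import List, Tuple

Region = Tuple[int, int]  # (start, end) of an element in a region encoding


def stack_tree_desc(alist: List[Region], dlist: List[Region]) -> List[Tuple[Region, Region]]:
    """Ancestor-descendant structural join in the style of Stack-Tree-Desc.

    Both inputs must be sorted by start; the output is sorted by descendant.
    """
    out: List[Tuple[Region, Region]] = []
    stack: List[Region] = []  # nested chain of candidate ancestors
    i = j = 0
    while j < len(dlist):
        # Take the next node in document order from either list.
        if i < len(alist) and alist[i][0] < dlist[j][0]:
            cur, is_anc = alist[i], True
            i += 1
        else:
            cur, is_anc = dlist[j], False
            j += 1
        # Pop stacked ancestors whose region ends before the current node starts.
        while stack and stack[-1][1] < cur[0]:
            stack.pop()
        if is_anc:
            stack.append(cur)
        else:
            # Every remaining stack entry contains cur, so each is an ancestor.
            out.extend((a, cur) for a in stack)
    return out


# Hypothetical document <a><b/><a><b/></a></a>, numbered by open/close positions.
A_LIST = [(1, 8), (4, 7)]
B_LIST = [(2, 3), (5, 6)]
print(stack_tree_desc(A_LIST, B_LIST))
# [((1, 8), (2, 3)), ((1, 8), (5, 6)), ((4, 7), (5, 6))]
```

The output of each such join grows with the number of matching pairs for that edge, regardless of whether a pair survives the later merge-joins; this is where the intermediate results come from.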
NT3. Decomposing into root-to-leaf paths
[Figure: each root-to-leaf path of the query evaluated with PathStack]
→ overlap between paths → intermediate results
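The decomposition step of NT3 can be sketched as enumerating the root-to-leaf paths of the query graph. The Python below reuses the assumed example edges from NT1 (a reconstruction, not given explicitly in the slides), and `root_to_leaf_paths` is a hypothetical helper. Each resulting path would then be evaluated separately, for example with PathStack, and the results joined on shared nodes; this sketch stops at the decomposition.

```python
# Assumed example query: edge (u, v) means u is an ancestor of v.
EDGES = [("r", "a"), ("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "g"), ("f", "g")]


def root_to_leaf_paths(edges, root):
    """List every path from `root` to a node without outgoing edges."""
    children = {}
    for u, v in edges:
        children.setdefault(u, []).append(v)

    def walk(node, prefix):
        path = prefix + [node]
        kids = children.get(node, [])
        if not kids:
            yield path
        for kid in kids:
            yield from walk(kid, path)

    return list(walk(root, []))


paths = root_to_leaf_paths(EDGES, "r")
for p in paths:
    print(" ".join(p))
# Total node occurrences across all paths versus distinct query nodes.
print(sum(len(p) for p in paths), len({n for p in paths for n in p}))  # 24 8
```

The four paths contain 24 node occurrences for only 8 distinct query nodes, so the same nodes are matched repeatedly; a chain of k diamond shapes like b/c or e/f would yield 2^k paths.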
Advanced Techniques
PartialMJ. Using a spanning tree
Remove edges to create a spanning tree
[Figure: evaluation with PathStack]
Join conditions (identity, structural, path)
→ overlap between paths → intermediate results
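One simple way to obtain a spanning tree from a single-rooted, acyclic query graph is to keep exactly one incoming edge per node. The slides do not say which edges PartialMJ removes, so the rule below (keep the first incoming edge seen) is an arbitrary assumption, and `spanning_tree` is a hypothetical helper. The removed edges are the structural relationships that would have to be re-checked later as join conditions; identity and path conditions are not modelled.

```python
# Assumed example query: edge (u, v) means u is an ancestor of v.
EDGES = [("r", "a"), ("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "g"), ("f", "g")]


def spanning_tree(edges):
    """Keep the first incoming edge of each node; report the rest as removed.

    Assumes a single-rooted, acyclic query graph, so the kept edges form a tree.
    """
    kept, removed, has_parent = [], [], set()
    for u, v in edges:
        if v in has_parent:
            removed.append((u, v))  # re-checked later as a structural join condition
        else:
            has_parent.add(v)
            kept.append((u, v))
    return kept, removed


tree, extra = spanning_tree(EDGES)
print(extra)  # [('c', 'd'), ('f', 'g')]
```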
PartialPathStack. Employing a topological order
[Figure: example query over nodes r, a, b, c, d, e, f, g]
[Figure: evaluation with PartialPathStack]
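PartialPathStack relies on a topological order of the query nodes. How it uses that order inside its stacks is not described here, so the Python sketch below only computes such an order with Kahn's algorithm on the same assumed example edges, and reports a cycle (an unsatisfiable query) if no order exists. `topological_order` is a hypothetical helper.

```python
from collections import deque

# Assumed example query: edge (u, v) means u is an ancestor of v.
NODES = ["r", "a", "b", "c", "d", "e", "f", "g"]
EDGES = [("r", "a"), ("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "g"), ("f", "g")]


def topological_order(nodes, edges):
    """Kahn's algorithm: repeatedly emit a node whose ancestors are all emitted."""
    indeg = {n: 0 for n in nodes}
    succ = {n: [] for n in nodes}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    queue = deque(n for n in nodes if indeg[n] == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    if len(order) != len(nodes):
        raise ValueError("query graph has a cycle (unsatisfiable)")
    return order


print(topological_order(NODES, EDGES))  # ['r', 'a', 'b', 'c', 'd', 'e', 'f', 'g']
```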
PathStack vs. PartialPathStack
- PathStack
  - Path queries
  - Indegree ≤ 1
  - Outdegree ≤ 1
  - O(input + output)
- PartialPathStack
  - Partial path queries
  - Indegree can exceed 1
  - Outdegree can exceed 1
  - O(input × indegree + output × outdegree)
Queries for Experiments
[Figure: query graphs Q1/Q5, Q2/Q6, Q3/Q7, Q4/Q8]
Experiment 1 (fixed datasets)
Execution time on fixed datasets
- Synthetic data
  - IBM AlphaWorks XML generator
  - 2.5 million nodes
- Benchmark data
  - Treebank
  - 2.5 million nodes
[Figure: execution time on Treebank, with annotations "path queries" and "too many results"]
[Figure: execution time on synthetic data]
Experiment 2 (size of dataset)
Execution time as the size of the tree varies
- Synthetic data
  - IBM AlphaWorks XML generator
  - 1 to 3 million nodes
[Figure: execution time of PartialMJ and PartialPathStack on Q2, Q3 and Q7 as the dataset size varies]
Conclusion
- Partial path queries
- PartialPathStack
Future Work
Questions?