Title: Containment of Partially Specified Tree-Pattern Queries
Dimitri Theodoratos (NJIT, USA)
Theodore Dalamagas (NTUA, Greece)
Pawel Placek (NJIT, USA)
Stefanos Souldatos (NTUA, Greece)
Timos Sellis (NTUA, Greece)
Outline
- Introduction
- Data Model
- Additional Concepts
- Query Containment
- Experiments
- Conclusion
Motivating Example
- Tree-structured data (e.g., XML) about motorbike spare parts.
- We search for spare parts.
- But:
Motivating Example
- Dimitri Theodoratos lives in NJ.
- He has a Yamaha Serrow motorbike in Greece.
- He searches for spare parts in Greece or the USA.
- Challenge: structural differences.
Motivating Example
- Theodore Dalamagas has a BMW motorbike.
- He looks for spare parts worldwide.
- Challenge: structural inconsistency, e.g., ../F650GS/650cc vs. ../650cc/F650GS.
Motivating Example
- Stefanos Souldatos has a Honda Varadero.
- But he is not fully aware of the tree structure.
- Challenge: unknown structure.
Motivating Example
- Pawel Placek wants to buy a motorbike for which he can easily find spare parts.
- He searches in many different tree structures.
- Challenge: source integration.
Motivation
- Querying tree-structured data.
- But:
  - the structure is not always strictly defined;
  - the user does not always deal with the structure.
- Example: find Honda spare parts in Greece.
Our Approach
- Dimensions: semantically related nodes.
- Dimension graphs: a summary of the tree structure.
- Query language: partial specification of the structure (Partially Specified Tree-Pattern Queries).
- We study the problem of query containment for Partially Specified Tree-Pattern Queries.
Dimension Graph
- Dimension graph: a summary of the tree structure.
- Dimensions: R (Root), C (Country), L (Location), B (Brand), T (Type), M (Model), E (Engine).
Dimension Graph
- offers a summary of the structure of the tree;
- provides the necessary semantics for query formulation;
- sets the framework for querying sources with structural differences and inconsistencies;
- supports query evaluation and optimization.
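To make the idea of a dimension graph concrete, here is a minimal Python sketch. It assumes a hypothetical mapping from node labels to the dimensions above (in practice this grouping comes from the dimension-definition step) and records an edge X → Y whenever a node of dimension X is the parent of a node of dimension Y on some root-to-leaf path. The function name `build_dimension_graph` and the sample paths are illustrative only, not taken from the authors' system.

```python
from collections import defaultdict

# Hypothetical label-to-dimension mapping for the motorbike example.
LABEL_TO_DIMENSION = {
    "r": "R",
    "Greece": "C", "USA": "C",
    "Athens": "L", "Newark": "L",
    "BMW": "B",
    "Enduro": "T",
    "F650GS": "M",
    "650cc": "E",
}


def build_dimension_graph(paths, label_to_dim):
    """Summarize root-to-leaf label paths as a graph over dimensions.

    An edge X -> Y is recorded whenever a node of dimension X is the
    parent of a node of dimension Y on some path.
    """
    graph = defaultdict(set)
    for path in paths:
        dims = [label_to_dim[label] for label in path]
        for parent, child in zip(dims, dims[1:]):
            graph[parent].add(child)
    # Sorted child lists make the summary deterministic.
    return {dim: sorted(children) for dim, children in graph.items()}


# Two hypothetical sources that nest Model and Engine in opposite orders.
paths = [
    ["r", "Greece", "Athens", "BMW", "Enduro", "F650GS", "650cc"],
    ["r", "USA", "Newark", "BMW", "Enduro", "650cc", "F650GS"],
]
graph = build_dimension_graph(paths, LABEL_TO_DIMENSION)
print(graph)
# {'R': ['C'], 'C': ['L'], 'L': ['B'], 'B': ['T'], 'T': ['E', 'M'], 'M': ['E'], 'E': ['M']}
```

The M ↔ E cycle in the result is the kind of structural inconsistency (../F650GS/650cc vs. ../650cc/F650GS) that a dimension graph makes visible in a single summary.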
Partially Specified Tree-Pattern Query
- Query: find shops with spare parts for all models and all engines of BMW motorbikes in Greece (no structural information).
- Query components: partially specified paths (PSP), the output path, parent-child and ancestor-descendant edges, and node sharing expressions (NSE).
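A partially specified tree-pattern query can be pictured as a set of PSPs plus node sharing expressions. The Python sketch below is one hypothetical encoding: each PSP is a list of (axis, test) steps starting below the root R, where the axis is "/" (parent-child) or "//" (ancestor-descendant) and a test is either a dimension name (e.g. "M") or a dimension with a value (e.g. "C:Greece"). The `PSTPQ` class, `step_accepts` and `psp_matches` are illustrative names; only single-PSP matching against one root-to-node path is shown, and evaluating node sharing expressions across PSPs is left out.

```python
from dataclasses import dataclass, field
from functools import lru_cache


@dataclass
class PSTPQ:
    psps: list                     # each PSP: list of (axis, test) steps
    output: int                    # index of the PSP that is the output path
    # Node sharing expressions: ((psp_i, step_i), (psp_j, step_j)) pairs
    # whose steps must be mapped to the same tree node.
    nses: list = field(default_factory=list)


def step_accepts(test, node):
    """A test 'M' accepts any M node; 'C:Greece' accepts only that value."""
    return node == test or node.split(":")[0] == test


def psp_matches(steps, path):
    """True if the PSP maps onto the root-to-node path `path`
    (path[0] is the root), with its last step on the last node."""
    @lru_cache(maxsize=None)
    def go(i, pos):
        if i == len(steps):
            return pos == len(path) - 1
        axis, test = steps[i]
        nxt = [pos + 1] if axis == "/" else range(pos + 1, len(path))
        return any(q < len(path) and step_accepts(test, path[q]) and go(i + 1, q)
                   for q in nxt)
    return go(0, 0)


# Hypothetical encoding of the BMW/Greece query: p0 reaches models, p1 engines,
# and the C and B steps of both PSPs denote the same nodes (two NSEs).
p0 = [("//", "C:Greece"), ("//", "B:BMW"), ("//", "M")]
p1 = [("//", "C:Greece"), ("//", "B:BMW"), ("//", "E")]
query = PSTPQ(psps=[p0, p1], output=0, nses=[((0, 0), (1, 0)), ((0, 1), (1, 1))])

path = ["R", "C:Greece", "L:Athens", "B:BMW", "T:Enduro", "M:F650GS"]
print(psp_matches(query.psps[0], path))  # True
print(psp_matches(query.psps[1], path))  # False (the path does not end at an engine)
```

The "//" steps are what make the PSPs partial: the user states that Greece lies above BMW without saying how many levels separate them.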
Additional Concepts
- Full Form Query
- Dimension Trees
[Figure: a query graph and its dimension trees]
Absolute Containment
- Q1 ⊆ Q2 if and only if each result of Q1 is a result of Q2.
- Test: a homomorphism from Q2 to Q1.
[Figure: example queries Q1 and Q2 with PSPs p1, p2, p3 and p4]
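The homomorphism test can be sketched for plain tree patterns with parent-child ("/") and ancestor-descendant ("//") edges. The Python code below is a minimal sketch assuming labeled patterns without wildcards and with a single output node. It is the classic tree-pattern homomorphism check, not necessarily the exact procedure used for PSTPQs (which must also respect node sharing expressions); `Node`, `maps_to` and `has_homomorphism` are hypothetical names. Finding a homomorphism from Q2 to Q1 shows that Q1 ⊆ Q2.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Node:
    label: str
    children: tuple = ()      # tuple of (axis, Node) with axis "/" or "//"
    is_output: bool = False


def descendants(w):
    """All proper descendants of pattern node w."""
    result = []
    for _, child in w.children:
        result.append(child)
        result.extend(descendants(child))
    return result


@lru_cache(maxsize=None)
def maps_to(v, w):
    """Can the subpattern of Q2 rooted at v be mapped onto node w of Q1?"""
    if v.label != w.label or (v.is_output and not w.is_output):
        return False
    for axis, vc in v.children:
        if axis == "/":        # a child edge must map onto a child edge
            targets = [wc for a, wc in w.children if a == "/"]
        else:                  # a "//" edge may map onto any downward path
            targets = descendants(w)
        if not any(maps_to(vc, t) for t in targets):
            return False
    return True


def has_homomorphism(q2, q1):
    """Root-to-root homomorphism from Q2 to Q1; if it exists, Q1 ⊆ Q2."""
    return maps_to(q2, q1)


m1 = Node("M", is_output=True)
b1 = Node("B", (("/", m1),))
c1 = Node("C", (("/", b1),))
q1 = Node("R", (("/", c1),))          # Q1 = R/C/B/M

m2 = Node("M", is_output=True)
b2 = Node("B", (("//", m2),))
q2 = Node("R", (("//", b2),))         # Q2 = R//B//M

print(has_homomorphism(q2, q1))  # True: Q1 ⊆ Q2
print(has_homomorphism(q1, q2))  # False: the child edge R/C has no image in Q2
```

Memoizing `maps_to` keeps the check polynomial in the sizes of the two patterns, which is consistent with absolute containment being the cheap test in the timings that follow.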
Relative Containment (w.r.t. G)
- Q1 ⊆G Q2 if and only if each result of Q1 in G is a result of Q2 in G.
- Test: a homomorphism from the dimension trees of Q2 to the dimension trees of Q1.
[Figure: a dimension tree of Q1 and a dimension tree of Q2]
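When the queries are single PSPs and G is small, relative containment can also be checked directly from its definition. The Python sketch below assumes an acyclic dimension graph given as an adjacency dict and treats every root-to-node path of G as a possible answer: Q1 ⊆G Q2 holds when every such path matched by Q1 is also matched by Q2. This brute-force enumeration is a hypothetical illustration of the definition, not the dimension-tree homomorphism test above, and all function names are hypothetical.

```python
from functools import lru_cache


def psp_matches(steps, path):
    """Does the PSP (list of (axis, dimension)) map onto `path`, ending at its last node?"""
    @lru_cache(maxsize=None)
    def go(i, pos):
        if i == len(steps):
            return pos == len(path) - 1
        axis, dim = steps[i]
        nxt = [pos + 1] if axis == "/" else range(pos + 1, len(path))
        return any(q < len(path) and path[q] == dim and go(i + 1, q) for q in nxt)
    return go(0, 0)


def root_paths(graph, root="R"):
    """All paths of the (assumed acyclic) dimension graph that start at the root."""
    paths, stack = [], [[root]]
    while stack:
        path = stack.pop()
        paths.append(path)
        for child in graph.get(path[-1], []):
            stack.append(path + [child])
    return paths


def relatively_contained(q1, q2, graph):
    """Q1 ⊆G Q2: every G-path answered by Q1 is also answered by Q2."""
    return all(psp_matches(q2, p) for p in root_paths(graph) if psp_matches(q1, p))


# Hypothetical dimension graph in which every path to B passes through C.
G = {"R": ["C"], "C": ["L"], "L": ["B"], "B": ["T"], "T": ["M", "E"]}
print(relatively_contained([("//", "B")], [("//", "C"), ("//", "B")], G))  # True
print(relatively_contained([("//", "B")], [("/", "B")], G))               # False
```

Here Q1 = R//B is not absolutely contained in Q2 = R//C//B (in an arbitrary tree, B need not have a C ancestor), but it is contained relative to this G.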
Relative Containment Heuristic (RCH)
- Absolute Containment (AC): 1 msec; Relative Containment (RC): 100 msec.
- RCH is sound but not complete:
  - extract structural information from the dimension graph G;
  - insert it into the query Q1;
  - check Q1 ⊆ Q2 instead of Q1 ⊆G Q2.
- If the enriched Q1 is contained in Q2, then Q1 ⊆G Q2 holds; a negative answer is inconclusive.
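The heuristic can be sketched for single-PSP queries over an acyclic dimension graph. The Python code below is a minimal, hypothetical version. The structural information it extracts from G (dimensions that lie on every G-path between two consecutive query nodes, and "//" edges that G forces to be "/") is an assumption about what such information could look like, not the authors' exact rule set. Absolute containment is then tested with a homomorphism from Q2 into the enriched Q1, so True implies Q1 ⊆G Q2 while False is inconclusive. The names `all_paths`, `enrich`, `path_contained` and `rch` are hypothetical.

```python
from functools import lru_cache


def all_paths(graph, src, dst):
    """All paths from src to dst in an acyclic dimension graph."""
    if src == dst:
        return [[dst]]
    return [[src] + rest
            for nxt in graph.get(src, [])
            for rest in all_paths(graph, nxt, dst)]


def enrich(steps, graph, root="R"):
    """Insert structure implied by G into a single-PSP query Q1."""
    enriched, prev = [], root
    for axis, dim in steps:
        paths = all_paths(graph, prev, dim)
        if axis == "//" and paths:
            # Dimensions strictly between prev and dim on every G-path.
            inner = [d for d in paths[0][1:-1] if all(d in p for p in paths)]
            hops = [prev] + inner + [dim]
            for a, b in zip(hops, hops[1:]):
                # "/" is safe only if every G-path from a to b is one edge.
                single_edge = all(len(p) == 2 for p in all_paths(graph, a, b))
                enriched.append(("/" if single_edge else "//", b))
        else:
            enriched.append((axis, dim))
        prev = dim
    return enriched


def path_contained(q1, q2):
    """Absolute Q1 ⊆ Q2 for linear PSPs via a homomorphism from Q2 into Q1."""
    n1, n2 = len(q1), len(q2)

    @lru_cache(maxsize=None)
    def go(j, i):  # place Q2 step j below Q1 position i (0 is the root)
        if j == n2:
            return i == n1
        axis, dim = q2[j]
        if axis == "/":
            cands = [i + 1] if i < n1 and q1[i][0] == "/" else []
        else:
            cands = range(i + 1, n1 + 1)
        return any(q1[k - 1][1] == dim and go(j + 1, k) for k in cands)

    return go(0, 0)


def rch(q1, q2, graph):
    """Sound, incomplete check of Q1 ⊆G Q2 (False means 'unknown')."""
    return path_contained(enrich(q1, graph), q2)


G = {"R": ["C"], "C": ["L", "B"], "L": ["B"], "B": ["T"], "T": ["M", "E"]}
q1 = [("//", "B"), ("/", "T")]                 # R//B/T
q2 = [("/", "C"), ("//", "B"), ("/", "T")]     # R/C//B/T
print(path_contained(q1, q2))  # False: no absolute containment
print(enrich(q1, G))           # [('/', 'C'), ('//', 'B'), ('/', 'T')]
print(rch(q1, q2, G))          # True, hence Q1 ⊆G Q2
```

Only facts that hold on every G-path are inserted, which is what keeps the check sound; anything G allows but does not force is left out, which is why it is not complete.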
Relative Containment Heuristic: Example
[Figure: queries Q1 and Q2 with PSPs p1 and p2; Q1 ⊆ Q2 is checked]
[Figure: structural information extracted from the dimension graph is inserted into Q1]
[Figure: the enriched Q1 is checked against Q2 to conclude Q1 ⊆G Q2]
Experiments
- We measured:
  - the execution time of Absolute Containment (AC), Relative Containment (RC) and the Relative Containment Heuristic (RCH);
  - the accuracy of RCH;
- for various graph sizes and various query sizes.
Time
[Chart: execution time (msec) of AC, RC and RCH for dimension graphs with 20, 30 and 40 dimensions (10-80, 15-120 and 20-160 paths)]
[Chart: execution time (msec) of AC, RC and RCH for queries with 1 and 2 PSPs (3-6 nodes per PSP)]
Accuracy of RCH
- 80% for graphs of common sizes (based on XML benchmarks such as XMach and XMark).
- 50% for graphs of higher density.
Conclusion
- Query containment for Partially Specified Tree-Pattern Queries (PSTPQs).
- A sound technique for checking relative query containment:
  - execution time one order of magnitude lower than the exact relative containment check;
  - accuracy over 80%.
Future Work
- Heuristics for checking Relative Containment
- precomputed and on-the-fly
- trade-off between time and accuracy
- Special forms of queries, e.g. swings
Questions?
Appendix
Who defines the dimensions?
- Automatically:
  - from the XML tags (the dimension graph then corresponds to a path summary, path index or structural summary).
- Semi-automatically:
  - by the graph administrator, from the XML tags (a dimension is a group of XML tags);
  - by the graph administrator, using an ontology.
- Manually:
  - by the graph administrator.
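For the automatic option, a dimension graph can be derived from the XML tags themselves. The sketch below assumes that every distinct tag is a dimension and records an edge parent tag → child tag whenever that nesting occurs in the document; it uses only Python's standard `xml.etree.ElementTree`. The function name `tag_graph` and the sample document are hypothetical, and a full path summary or path index would record more detail than this one-level tag graph.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def tag_graph(xml_text):
    """Map each XML tag to the sorted list of tags nested directly under it."""
    root = ET.fromstring(xml_text.strip())
    graph = defaultdict(set)
    for parent in root.iter():           # every element, in document order
        for child in parent:             # its direct children
            graph[parent.tag].add(child.tag)
    return {tag: sorted(children) for tag, children in graph.items()}


# Hypothetical document: the two shops nest model and engine differently.
doc = """
<shops>
  <country name="Greece">
    <shop><brand name="BMW"><model name="F650GS"><engine size="650cc"/></model></brand></shop>
  </country>
  <country name="USA">
    <shop><brand name="BMW"><engine size="650cc"><model name="F650GS"/></engine></brand></shop>
  </country>
</shops>
"""
print(tag_graph(doc))
```

Leaf tags (here none, since every leaf tag also appears as an inner node somewhere) do not get their own entry, because only tags with children are recorded as keys.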
Inference Rules
(IR1) ⊢ Rp1 ? Rp2
(IR2) Ap1 ? Ap2, Ap2 ? Ap3 ⊢ Ap1 ? Ap3
(IR3) a structural expression that involves Ap ⊢ Rp Ap
(IR4) Ap ? Bp ⊢ Ap Bp
(IR5) Ap Bp, Bp Cp ⊢ Ap Cp
(IR6) Ap ? Bp, Ap Cp ⊢ Bp Cp
(IR7) Ap ? Bp, Cp Bp ⊢ Cp Ap
(IR8) Ap1 ? Bp1, Bp1 ? Bp2 ⊢ Ap2 ? Bp2
(IR9) Ap1 Bp1, Bp1 ? Bp2 ⊢ Ap2 Bp2
(IR10) Ap1 Bp1, Ap1 ? Ap2, Rp2 Bp2 ⊢ Ap2 Bp2
(IR11) Ap1 Bp1, Bp1 ? Bp2 ⊢ Ap1 ? Ap2
(IR12) Ap1 ? Bp1, Cp2 ? Bp2, Dp1 ? Dp2 ⊢ Dp1 Ap1
(IR13) Ap1 ? Bp1, Ap2 ? Cp2, Dp1 ? Dp2 ⊢ Dp1 Ap1
(IR14) Ap1 Bp1, Bp2 Ap2, Cp1 ? Cp2 ⊢ Cp1 Ap1
1. Full Form Query
Dimension Trees
r/Greece/BMW/ TE/M
r/Greece/BMW/ T/M E
r/Greece/BMW/ T/E/M
r/Greece/BMW/ TM/E/EM
Previous Approaches
- Keyword-based search approach
  - absence of structure
- Naive approach
  - all possible query patterns are generated (e.g., Honda above Greece, and Greece above Honda)
- Approximation techniques
  - relax the query → more answers
- Traditional integration approach
  - global structure and mapping rules
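The cost of the naive approach is easy to see: without structural knowledge, every ordering of the query terms becomes a separate path pattern, so k distinct terms give k! patterns. The short Python sketch below is a hypothetical illustration (the `naive_patterns` name and the "//" path syntax are chosen for illustration only) that generates those patterns with `itertools.permutations`.

```python
from itertools import permutations


def naive_patterns(terms):
    """One ancestor-descendant path pattern per ordering of the terms."""
    return ["//" + "//".join(order) for order in permutations(terms)]


print(naive_patterns(["Honda", "Greece"]))
# ['//Honda//Greece', '//Greece//Honda']
print(len(naive_patterns(["Honda", "Greece", "Varadero"])))  # 6
```

The factorial growth is what partially specified queries avoid: one PSTPQ with "//" edges and a dimension graph stands in for the whole family of orderings.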