Title: Satisfiability of XPath Expressions
Motivation
- Satisfiability: is the result of an XPath expression non-empty for at least one XML document?
- Related to:
  - avoiding duplicate elimination and ordering
  - the subsumption problem
- Examples (each of these is unsatisfiable):
  - self::a/self::b
  - child::a/child/parent::b
  - /child/parent/parent
  - /preceding
  - /following
Problem Definition
- Data model
  - Alphabet of tag names Σ
  - XML tree: an ordered, node-labeled tree T
  - Document order: the order of a preorder tree walk
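- XPath expressions (a subset of XPath 2.0)
  P ::= A | A::σ | P/P | / | P[P] | P ∪ P | P ∩ P | P − P   (σ ∈ Σ)
  A ::= self | parent | child | anc-or-self | desc-or-self | foll-sibl | prec-sibl
- Note: desc ≡ desc-or-self/child and foll ≡ anc-or-self/foll-sibl/desc-or-self
- Fragments: P^{X} allows axis steps and P/P plus the constructs listed in X ⊆ {/, [ ], ∪, ∩, −}, e.g. P^{/,[],∩}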
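To make the data model and the step semantics concrete, here is a minimal Python sketch that evaluates chains of steps (the P/P case) on one concrete tree. The list-based tree encoding (labels plus parent ids, with nodes numbered in preorder so that comparing ids gives document order), the helper names axis_nodes and eval_path, and the pseudo-axis "root" standing for "/" are all assumptions made for this illustration; predicates, ∪, ∩ and − are left out.

```python
def axis_nodes(labels, parent, name, x):
    """Nodes reachable from node x via the named axis (hypothetical helper).
    Nodes are 0..n-1 in preorder; parent[root] == -1."""
    n = len(labels)

    def anc_or_self(y):
        chain = []
        while y != -1:
            chain.append(y)
            y = parent[y]
        return chain

    if name == "self":
        return {x}
    if name == "parent":
        return {parent[x]} if parent[x] != -1 else set()
    if name == "child":
        return {i for i in range(n) if parent[i] == x}
    if name == "anc-or-self":
        return set(anc_or_self(x))
    if name == "desc-or-self":
        return {i for i in range(n) if x in anc_or_self(i)}
    if name in ("foll-sibl", "prec-sibl"):
        if parent[x] == -1:  # the root has no siblings
            return set()
        sibs = {i for i in range(n) if parent[i] == parent[x] and i != x}
        # Ids follow document order, so a following sibling has a larger id.
        return {i for i in sibs if (i > x) == (name == "foll-sibl")}
    raise ValueError(f"unknown axis: {name}")


def eval_path(labels, parent, steps, context):
    """Evaluate a chain of (axis, label) steps from a context node.
    label None matches any tag; the pseudo-axis "root" models "/"."""
    current = {context}
    for name, label in steps:
        if name == "root":
            reached = {0} if current else set()
        else:
            reached = set()
            for x in current:
                reached |= axis_nodes(labels, parent, name, x)
        current = {y for y in reached if label is None or labels[y] == label}
    return current


# Example tree a(b, c(a)) with ids in preorder: 0=a, 1=b, 2=c, 3=a.
labels = ["a", "b", "c", "a"]
parent = [-1, 0, 0, 2]
print(eval_path(labels, parent, [("child", "c"), ("child", None)], 0))  # {3}
```

Satisfiability asks whether some tree and some context node make the result non-empty; a checker for one given tree, like this one, can confirm satisfiability by example but can never refute it.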
Tree Description Graphs
- Meaning
  - The special node r must map to the root.
  - A node with a label must map to a node with that label.
  - Solid edge with descendant label: descendant relationship.
  - Solid edge with desc-or-self label: desc-or-self relationship.
  - Solid edge without a label: child relationship.
  - Dashed edge: the target node follows the source node in document order.
- Studied in computational linguistics
- Generalization of tree patterns
- Defines an existential statement about an XML tree, which gives a satisfiability problem
- Defines a binary relation if input and output nodes are indicated
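The meaning of a TDG can be made concrete as a check of one candidate mapping into one tree. The Python sketch below assumes a hypothetical encoding: a dict of TDG node labels in which the special root node is named "r", and edges tagged "child", "desc", "desc-or-self" (solid) or "follows" (dashed). It reads a dashed edge as "the target's image comes strictly after the source's image in document order", and it does not require the mapping to be injective.

```python
def is_proper_ancestor(parent, a, d):
    """True if node a is a proper ancestor of node d (parent[root] == -1)."""
    d = parent[d]
    while d != -1:
        if d == a:
            return True
        d = parent[d]
    return False


def satisfies(tdg_labels, edges, labels, parent, m):
    """Check every TDG constraint for a mapping m from TDG nodes to tree nodes.

    Hypothetical encoding: tdg_labels maps each TDG node to a tag or None,
    a TDG node named "r" is the special root node, and edges are (kind, u, v).
    Tree nodes are numbered in document order, node 0 being the root.
    """
    if "r" in tdg_labels and m["r"] != 0:
        return False
    for u, tag in tdg_labels.items():
        if tag is not None and labels[m[u]] != tag:
            return False
    for kind, u, v in edges:
        x, y = m[u], m[v]
        if kind == "child":
            ok = parent[y] == x
        elif kind == "desc":
            ok = is_proper_ancestor(parent, x, y)
        elif kind == "desc-or-self":
            ok = x == y or is_proper_ancestor(parent, x, y)
        elif kind == "follows":
            ok = y > x  # v's image comes strictly after u's in document order
        else:
            raise ValueError(f"unknown edge kind: {kind}")
        if not ok:
            return False
    return True


labels = ["a", "b", "c", "a"]  # tree a(b, c(a)), ids in preorder
parent = [-1, 0, 0, 2]
tdg = {"r": None, "x": "c", "y": "a"}
tdg_edges = [("child", "r", "x"), ("desc", "x", "y")]
print(satisfies(tdg, tdg_edges, labels, parent, {"r": 0, "x": 2, "y": 3}))  # True
```

The TDG is satisfiable exactly when some tree and some mapping pass this check, which is the existential statement described above.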
TDGs and XPath
- (desc-or-self::a/parent::b/anc-or-self) ∩ (desc::c/foll-sibl)
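- Theorem: For every path expression in P^{/,[],∩} there is an equivalent TDG. Does the converse hold?
- Lemma: If a TDG with n nodes has a model, then there is a satisfying XML tree with at most n nodes.
- Corollary: Satisfiability of TDGs and of P^{/,[],∩} is in NP (guess a tree with at most n nodes and a mapping, then check it in polynomial time).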
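The direction from expressions to TDGs can be sketched as a syntax-directed translation: each step adds a node and an edge, a predicate hangs a subgraph off a node, and ∩ identifies the output nodes of its two operands. The Python below is a hypothetical illustration, not the construction from the paper. The AST shape, the class TDGBuilder, and the encoding of foll-sibl as two child edges from a fresh common parent plus a dashed edge are assumptions; desc and foll must be expanded first (for example desc ≡ desc-or-self/child), and a node that collects two different labels signals a TDG without a model.

```python
import itertools


class TDGBuilder:
    """Hypothetical translator from an AST of P^{/,[],∩} into a TDG.

    AST forms: ("step", axis, label_or_None), ("root",), ("seq", p, q) for p/q,
    ("pred", p, q) for p[q] and ("and", p, q) for p ∩ q.
    """

    def __init__(self):
        self.ids = itertools.count()
        self.uf = {}        # union-find parent pointers over TDG nodes
        self.labels = {}    # canonical node -> set of required labels
        self.edges = []     # (kind, u, v), canonicalised at the end
        self.root = None    # the special node r, created on the first "/"

    def new(self):
        n = next(self.ids)
        self.uf[n] = n
        self.labels[n] = set()
        return n

    def find(self, n):
        while self.uf[n] != n:
            n = self.uf[n]
        return n

    def union(self, a, b):
        a, b = self.find(a), self.find(b)
        if a != b:
            self.uf[b] = a
            self.labels[a] |= self.labels.pop(b)
        return a

    def tr(self, e, s):
        """Add the constraints of e with input node s; return its output node."""
        if e[0] == "root":
            if self.root is None:
                self.root = self.new()
            return self.root
        if e[0] == "seq":
            return self.tr(e[2], self.tr(e[1], s))
        if e[0] == "pred":
            t = self.tr(e[1], s)
            self.tr(e[2], t)  # the predicate's own output node stays free
            return t
        if e[0] == "and":     # same input node, output nodes identified
            return self.union(self.tr(e[1], s), self.tr(e[2], s))
        _, ax, label = e
        if ax == "self":
            t = s
        elif ax in ("foll-sibl", "prec-sibl"):
            p, t = self.new(), self.new()  # common parent of both siblings
            self.edges += [("child", p, s), ("child", p, t)]
            self.edges.append(("follows", s, t) if ax == "foll-sibl" else ("follows", t, s))
        else:
            t = self.new()
            table = {"child": ("child", s, t), "parent": ("child", t, s),
                     "desc-or-self": ("desc-or-self", s, t),
                     "anc-or-self": ("desc-or-self", t, s)}
            if ax not in table:
                raise ValueError(f"expand {ax} first, e.g. desc = desc-or-self/child")
            self.edges.append(table[ax])
        if label is not None:
            self.labels[self.find(t)].add(label)
        return t


def to_tdg(expr):
    """Return (input node, output node, root node or None, labels, edges)."""
    b = TDGBuilder()
    src = b.new()
    out = b.tr(expr, src)
    f = b.find
    edges = sorted({(k, f(u), f(v)) for k, u, v in b.edges})
    labels = {n: sorted(tags) for n, tags in b.labels.items()}
    root = f(b.root) if b.root is not None else None
    return f(src), f(out), root, labels, edges


# (desc-or-self::a/parent::b/anc-or-self) ∩ (desc-or-self/child::c/foll-sibl)
left = ("seq", ("seq", ("step", "desc-or-self", "a"), ("step", "parent", "b")),
        ("step", "anc-or-self", None))
right = ("seq", ("seq", ("step", "desc-or-self", None), ("step", "child", "c")),
         ("step", "foll-sibl", None))
src, out, root, labels, edges = to_tdg(("and", left, right))
print(len(labels), len(edges))  # 7 8
```

The translation adds a constant number of nodes and edges per step, so the resulting TDG has size linear in the expression.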
String Matching
- BMS: the Bounded Multiple String matching problem
  - Given a finite set of patterns that are strings over {0, 1}, is there a string over {0, 1} such that
    1. all the patterns can be matched, and
    2. this string is not longer than the longest pattern?
- Examples
  - {11, 0} is satisfiable: 1010
  - {11, 010} is satisfiable: 10100
  - {111, 00} is not satisfiable
- Theorem: Deciding BMS is NP-complete (reduction from 3SAT).
Lower Bounds (1/3)
- Theorem: Deciding satisfiability for P^{∩} is NP-hard (reduction from BMS).
  - Example BMS instance: {aba, ba, bbb}
- Corollary: Deciding satisfiability for TDGs is NP-hard.
- Corollary: Deciding satisfiability for P^{−} is NP-hard, since P ∩ Q ≡ P − (P − Q).
Lower Bounds (2/3)
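- Theorem: Deciding satisfiability for P^{[],∪} is NP-hard (reduction from 3SAT).
- Intuition: the i-th parent (ancestor) of the context node represents X_i; label a means true and label b means false. Each node has only one i-th ancestor, so the labels encode a single consistent truth assignment.
  - C1 = (X1 ∨ ¬X2 ∨ X4) ↦ p_{C1} = (parent::a/parent/parent/parent) ∪ (parent/parent::b/parent/parent) ∪ (parent/parent/parent/parent::a)
  - C1 ∧ C2 ∧ C3 ↦ self[p_{C1}][p_{C2}][p_{C3}]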
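The clause encoding is mechanical, so it can be generated. This Python sketch uses hypothetical function names, takes clauses as DIMACS-style integers (4 for X4, −2 for ¬X2), and emits the expression as a string in the notation of these slides; as in the example, it assumes n variables and uses exactly n parent steps per literal.

```python
def literal_path(var, positive, n):
    """n parent steps; step number var carries label a (X_var true) or b (false)."""
    steps = ["parent"] * n
    steps[var - 1] = "parent::" + ("a" if positive else "b")
    return "/".join(steps)


def clause_path(clause, n):
    """Union of the literal paths of one clause (DIMACS-style ints)."""
    return " ∪ ".join(f"({literal_path(abs(lit), lit > 0, n)})" for lit in clause)


def formula_expr(clauses, n):
    """self[p_C1][p_C2]...: one predicate per clause of the CNF formula."""
    return "self" + "".join(f"[{clause_path(c, n)}]" for c in clauses)


print(clause_path([1, -2, 4], 4))
```

For example, formula_expr([[1], [-1]], 1) yields self[(parent::a)][(parent::b)], which is unsatisfiable, just as X1 ∧ ¬X1 is.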
Lower Bounds (3/3)
- Theorem: Deciding satisfiability for P^{/,[]} is NP-hard (reduction from BMS).
- Example BMS instance {aba, ba, bbb}:
  - /self::a/child::b/child/child/child::a
  - anc-or-self::a/parent::b/parent
  - anc-or-self::b/parent::b/parent::b
Upper Bounds
- Theorem: Deciding satisfiability for P is in PTIME.
- Sketch of the algorithm:
  - Transform the path expression into a TDG.
  - Merge all parents of the same node.
  - Repeat the previous step until no more merges are possible.
  - If no label conflict occurs while merging, the answer is yes; otherwise it is no.
- Theorem: Deciding satisfiability for P^{/} is in PTIME.
  - Algorithm: similar.
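A Python sketch of the merge algorithm, under a deliberate restriction: it only handles self, child and parent steps (each with an optional label) plus the root step "/" needed for P^{/}, and it reads a list of steps rather than parsing XPath syntax. The other axes of P are not covered. Under this restriction the TDG is a chain of child edges, and in this sketch the only ways to fail are a label conflict after merging or a node forced above the root. The function name and input encoding are hypothetical.

```python
def satisfiable(steps):
    """Merge-based satisfiability test, restricted (by assumption) to the axes
    self, child and parent plus the root step "/".

    steps: list of (axis, label) with axis in {"self", "child", "parent", "root"}
    and label None for "any tag".
    """
    uf, labels, edges = [], [], []  # edges holds (child, parent) pairs

    def new():
        uf.append(len(uf))
        labels.append(set())
        return len(uf) - 1

    def find(x):
        while uf[x] != x:
            x = uf[x]
        return x

    # Step 1: build the TDG as a chain of child edges from the context node.
    root, cur = None, new()
    for axis, label in steps:
        if axis == "root":
            root = new() if root is None else root
            cur = root
        elif axis == "child":
            nxt = new()
            edges.append((nxt, cur))
            cur = nxt
        elif axis == "parent":
            nxt = new()
            edges.append((cur, nxt))
            cur = nxt
        elif axis != "self":
            raise ValueError(f"axis not covered by this sketch: {axis}")
        if label is not None:
            labels[cur].add(label)

    # Steps 2 and 3: merge all parents of the same node until nothing changes.
    changed = True
    while changed:
        changed = False
        parent_of = {}
        for c, p in edges:
            c, p = find(c), find(p)
            q = find(parent_of[c]) if c in parent_of else p
            if q != p:
                uf[p] = q
                labels[q] |= labels[p]
                changed = True
            parent_of[c] = q

    # Step 4: no label conflict, and (for P^{/}) no node above the root.
    if any(len(labels[n]) > 1 for n in range(len(uf)) if find(n) == n):
        return False
    return root is None or all(find(c) != find(root) for c, _ in edges)


print(satisfiable([("self", "a"), ("self", "b")]))  # False
```

On the motivating examples it returns False for self::a/self::b, child::a/child/parent::b and /child/parent/parent, and True for child::a/parent/child::b. Each pass is linear in the number of edges (up to the cost of find) and every merge removes a node, so the loop stays polynomial.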
Summary
- Satisfiability of TDGs is NP-complete.
- XPath fragments
- Open problems:
  - upper bound for P^{−}
  - complexity of P^{∪} and P^{/,∪}
  - relationship between TDGs and P^{/,[],∩}