Xpath Query Evaluation - PowerPoint PPT Presentation

About This Presentation

Title:

Xpath Query Evaluation

Slides: 34

Provided by: Adm9128

Transcript and Presenter's Notes

Title: Xpath Query Evaluation

1
Xpath Query Evaluation
2
Goal

Evaluating an Xpath query against a given
document
To find all matches
We will also consider the use of types
Complexity is important
Huge Documents

3
Data complexity vs. Combined Complexity

Two inputs to the query evaluation problem
Data (XML document) of size |D|
Query (XPath expression) of size |Q|
Usually |Q| << |D|
Polynomial data complexity
Complexity that is polynomial in |D|, possibly
exponential in |Q|
Polynomial combined complexity
Complexity that is polynomial in |D| and |Q|
Fixed Parameter Tractable complexity
Complexity Poly(|D|) * f(|Q|)

4
Xpath standard semantics
5
Core XPath

locpath ::= '/' locpath | locpath '/' locpath |
            locpath '|' locpath | locstep.
locstep ::= axis '::' ntst '[' bexpr ']' ... '[' bexpr ']'.
bexpr   ::= bexpr 'and' bexpr | bexpr 'or' bexpr |
            'not(' bexpr ')' | locpath.
axis    ::= 'self' | 'child' | 'parent' |
            'descendant' | 'descendant-or-self' |
            'ancestor' | 'ancestor-or-self' |
            'following' | 'following-sibling' |
            'preceding' | 'preceding-sibling'.

6
Xpath Query Evaluation

Input: XML document D, XPath query Q
Output: a subset of the nodes of D,
as defined by Q
We will follow "Efficient Algorithms for
Processing XPath Queries" (Gottlob, Koch,
Pichler, TODS 2005)

7
Simple algorithm

process-location-step(n, Q)
  S <- apply Q.first to n
  if |Q| > 1
    for each node n' in S do
      process-location-step(n', Q.next)
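
A minimal Python sketch of the simple algorithm above, under stated assumptions: the `Node` class, the `AXES` table, and the representation of a query as a list of (axis, node test) pairs are hypothetical stand-ins, not taken from the paper. Only four axes are covered, and the matches of the last step are collected in a list so the result can be inspected.

```python
class Node:
    """Hypothetical minimal XML element: label, parent link, ordered children."""
    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

def descendants(n):
    """Yield all proper descendants of n in document order."""
    for c in n.children:
        yield c
        yield from descendants(c)

# Assumed axis table; a full implementation would cover every XPath axis.
AXES = {
    "self": lambda n: [n],
    "child": lambda n: list(n.children),
    "parent": lambda n: [n.parent] if n.parent is not None else [],
    "descendant": lambda n: list(descendants(n)),
}

def apply_step(step, n):
    """Apply one location step (axis, node_test) to the single node n."""
    axis, test = step
    return [m for m in AXES[axis](n) if test == "*" or m.label == test]

def process_location_step(n, query, out):
    """Naive node-at-a-time evaluation; matches of the last step go to out."""
    selected = apply_step(query[0], n)
    if len(query) > 1:
        for m in selected:
            process_location_step(m, query[1:], out)
    else:
        out.extend(selected)

# Example document: <a><b><c/></b><b><c/><c/></b></a>
a = Node("a")
b1, b2 = Node("b", a), Node("b", a)
c1, c2, c3 = Node("c", b1), Node("c", b2), Node("c", b2)

matches = []
process_location_step(a, [("descendant", "c"), ("parent", "b")], matches)
```

Here `matches` contains `b2` twice, because it is reached once from each of its two `c` children. The naive algorithm processes such repeated nodes again at every later step, which is the source of the blow-up analysed next.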

8
Complexity

Worst case: in each step of Q the axis is
following
So we apply the query in each step on O(|D|)
nodes
And we get Time(|Q|) = |D| * Time(|Q|-1)
I.e. the complexity is O(|D|^|Q|)
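
To see the blow-up concretely, the sketch below counts the recursive calls made by the naive algorithm. The `Node` class, `step`, `count_calls`, and the query family `query(k)` are illustrative assumptions. The document is fixed at three nodes, so the growth here comes purely from the query length: the count roughly doubles with each added pair of steps, rather than multiplying by |D| as in the worst case.

```python
class Node:
    """Hypothetical minimal XML element."""
    def __init__(self, label, parent=None):
        self.label, self.parent, self.children = label, parent, []
        if parent is not None:
            parent.children.append(self)

def step(axis, label, n):
    """Apply one child:: or parent:: step with a label test to node n."""
    if axis == "child":
        candidates = n.children
    elif axis == "parent":
        candidates = [n.parent] if n.parent is not None else []
    else:
        raise ValueError(axis)
    return [m for m in candidates if m.label == label]

def count_calls(n, query):
    """Number of process-location-step invocations of the naive algorithm."""
    calls = 1
    if len(query) > 1:
        for m in step(*query[0], n):
            calls += count_calls(m, query[1:])
    return calls

# Document <a><b/><b/></a>; query b/parent::a/b/parent::a/b/... with k repeats.
a = Node("a")
Node("b", a)
Node("b", a)

def query(k):
    return [("child", "b")] + [("parent", "a"), ("child", "b")] * k

counts = [count_calls(a, query(k)) for k in range(5)]
```

For k = 0..4 the counts are 1, 5, 13, 29, 61, i.e. 2^(k+2) - 3, even though the answer is always the same two `b` nodes.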

9
Early Systems Performance
Figure taken from Gottlob, Koch, Pichler 05
10
Internet Explorer 6
Figure taken from Gottlob, Koch, Pichler 05
11
IE6 performance as a function of document size
Figure taken from Gottlob, Koch, Pichler 05
12
Polynomial data complexity

Poly data complexity is sometimes considered good
even if exponential in the query size
But can we have polynomial combined complexity
for Xpath query evaluation?
Yes!

13
Two main principles

Query parse trees: the query is divided into parts
according to its structure (not to be confused
with the XML tree structure)
Context-value tables: for every expression e
occurring in the parse tree, compute a table of
all valid combinations of context c and value v
such that e evaluates to v in c.

14
Xpath query parse tree

descendant::b/following-sibling::*[position() != last()]

15
Bottom-up vs. Top-down evaluation

We will discuss two kinds of query evaluation
algorithms
Bottom-up means that the query parse tree is
processed from the leaves up to the root
Top-down means that the parse tree is processed
from the root to the leaves
When processing we will fill in the context-value
table

16
Bottom-up evaluation

Main idea: compute the value for each leaf for
every possible context
Propagate upwards until the root
Dynamic programming algorithm to avoid
re-evaluation of queries in the same context

17
Operational semantics

Needed as a first step for evaluation algorithms
Similar ideas are used in compiler design
Here the semantics is based on the notion of
contexts

18
Contexts

The domain of contexts is
C = dom × {<k,n> | 1 ≤ k ≤ n ≤ |dom|}
A context is c = <x,k,n>
where x is a
context node
k is a
context position
n is the
context size
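
As a small illustration of the size of this domain, the sketch below enumerates every context over a toy node set. The function name `contexts` and the three-element `dom` are assumptions made for the example only.

```python
from itertools import product

def contexts(dom):
    """All triples (x, k, n) with x in dom and 1 <= k <= n <= |dom|."""
    size = len(dom)
    pairs = [(k, n) for n in range(1, size + 1) for k in range(1, n + 1)]
    return [(x, k, n) for x, (k, n) in product(dom, pairs)]

dom = ["r", "a", "b"]   # stand-in for the nodes of a document
C = contexts(dom)
```

With |dom| = 3 there are 3 * (3 * 4 / 2) = 18 contexts. In general the domain has |dom| * |dom| * (|dom| + 1) / 2 elements, which is cubic in the document size.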

19
Types
20
Semantics for Xpath expressions

The semantics of evaluating an expression is a
4-tuple where the first 3 elements are the
context, and the fourth is the value obtained by
evaluation in the context

21
Some notations

T(t): all nodes satisfying a predicate t
E(e): all nodes satisfying a regular exp. e
(applied with respect to a given axis)
idx_χ(x,S) is the index of a node x in the set S
with respect to a given axis χ and the document
order

22
(No Transcript)
23
Context-value Table

Given a query sub-expression e, the context-value
table of e specifies all combinations of context
c and value v, such that computing e on the
context c results in v
The bottom-up algorithm follows: compute the
context-value table in a bottom-up fashion with
respect to the query
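
A minimal sketch of context-value tables for the sub-expression `position() != last()`, assuming a table is stored as a Python dict from context triples to values. The helper names (`all_contexts`, `table_position`, `table_last`, `table_binary`) are hypothetical. The two leaf tables are filled first, then combined into the table of their parent in the parse tree.

```python
def all_contexts(dom):
    """Every context (x, k, n) with 1 <= k <= n <= |dom|."""
    size = len(dom)
    return [(x, k, n) for x in dom
            for n in range(1, size + 1) for k in range(1, n + 1)]

def table_position(ctxs):
    """Leaf table: position() evaluates to k in context (x, k, n)."""
    return {c: c[1] for c in ctxs}

def table_last(ctxs):
    """Leaf table: last() evaluates to n in context (x, k, n)."""
    return {c: c[2] for c in ctxs}

def table_binary(op, left, right):
    """Parent table: combine the two child tables context by context."""
    return {c: op(left[c], right[c]) for c in left}

dom = ["r", "a", "b"]   # stand-in for the document's nodes
ctxs = all_contexts(dom)
t_pos = table_position(ctxs)
t_last = table_last(ctxs)
t_neq = table_binary(lambda u, v: u != v, t_pos, t_last)
```

Each table has one row per context, so it is cubic in the document size. Because every sub-expression is evaluated once per context and then looked up, no context is evaluated twice.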

24
Bottom-up algorithm
25
Example
4 times
26
Complexity

O(|D|^3 * |Q|) space ignoring strings and numbers
O(|Q|) tables, with 3 columns, each including
values in {1..|D|}, thus O(|D|^3 * |Q|)
An extra O(|D| * |Q|) multiplicative factor for
strings and numbers
O(|D|^5 * |Q|) time ignoring strings and numbers
It can take O(|D|^2) to combine two nodesets
Extra O(|Q|) in case of strings and numbers

27
Optimization

Represent contexts as pairs of current and
previous node
This brings the time complexity down to
O(|D|^4 * |Q|^2)
Space complexity can be brought down to
O(|D|^2 * |Q|^2) via more optimizations

28
Top-down evaluation

Similar idea
But allows to compute only values for contexts
that are needed
Same worst-case bounds

29
Top-down or bottom-up?

General question in processing XML trees
The tradeoff
Usually easier to combine results computed in
children to obtain the result at the parent
So bottom-up traversal is usually easier to
design
On the other hand, some of the computation is
redundant since we don't know if it will become
relevant
So top-down traversal may be more efficient

30
Linear-time fragment

Core XPath includes only navigation
/ and //
Core XPath can be evaluated in O(|D| * |Q|)
Observation: no need to consider the entire
triple, only the current context node
Top-down or bottom-up evaluation with essentially
the same algorithm
But smaller tables (for every query node, all
document nodes and values of evaluation) are
maintained.
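
The idea of tracking only context nodes can be sketched as set-at-a-time evaluation: each location step maps a whole node set to a node set in one pass over the document. This is a simplified illustration, not the paper's algorithm. It handles only predicate-free paths over three axes, and `Node`, `axis_image`, and `eval_core` are assumed names.

```python
class Node:
    """Hypothetical minimal XML element; hashable by identity."""
    def __init__(self, label, parent=None):
        self.label, self.parent, self.children = label, parent, []
        if parent is not None:
            parent.children.append(self)

def axis_image(axis, S, root):
    """Set of nodes reachable from some node of S via axis, in O(|D|) time."""
    result = set()
    if axis == "child":
        for n in S:
            result.update(n.children)
    elif axis == "parent":
        for n in S:
            if n.parent is not None:
                result.add(n.parent)
    elif axis == "descendant":
        # One top-down pass carrying a flag "some proper ancestor is in S".
        stack = [(root, False)]
        while stack:
            n, below = stack.pop()
            if below:
                result.add(n)
            for c in n.children:
                stack.append((c, below or n in S))
    else:
        raise ValueError(axis)
    return result

def eval_core(steps, context, root):
    """Evaluate a list of (axis, node_test) steps, one node set per step."""
    S = {context}
    for axis, test in steps:
        S = {n for n in axis_image(axis, S, root)
             if test == "*" or n.label == test}
    return S

# Document <a><b/><b/></a> and a long alternating query.
a = Node("a")
b1, b2 = Node("b", a), Node("b", a)
long_query = [("child", "b")] + [("parent", "a"), ("child", "b")] * 20
result = eval_core(long_query, a, a)
```

Because intermediate results are sets, a node reached along many paths is kept once, and each of the |Q| steps costs at most one pass over the document. The 41-step query above therefore finishes immediately and returns the two `b` nodes.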

31
Types are helpful

Can direct the search
In some parts of the tree there is no hope to get
a match to a given sub-expression of the query
As a result we may have tables with fewer entries.
Whiteboard discussion

32
Type Checking and Inference

Type checking a single document: straightforward
Polynomial combined complexity if the automaton
representing the type is deterministic; exponential
in automaton size but polynomial in document size
otherwise
Type checking the results of an (XPath) query
Inferring the results of a query
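
A rough sketch of checking a single document against a type, under a strong simplifying assumption: the type is given as DTD-style content models (one regular expression over child labels per element label), which is only a restricted special case of the tree automata mentioned above. Labels are assumed to be single characters so that a child sequence can be matched as a string. `Node`, `conforms`, and `model` are hypothetical names, and the sketch illustrates the check itself, not the complexity bounds.

```python
import re

class Node:
    """Hypothetical minimal XML element with a single-character label."""
    def __init__(self, label, parent=None):
        self.label, self.parent, self.children = label, parent, []
        if parent is not None:
            parent.children.append(self)

def conforms(node, content_model):
    """True if every element's child-label sequence matches its content model."""
    pattern = content_model.get(node.label)
    if pattern is None:            # label not declared in the type
        return False
    word = "".join(c.label for c in node.children)
    if re.fullmatch(pattern, word) is None:
        return False
    return all(conforms(c, content_model) for c in node.children)

# Assumed type: r has any number of a's, a has at least one b,
# b has an optional c, and c is a leaf.
model = {"r": "a*", "a": "b+", "b": "c?", "c": ""}

r = Node("r")
a1, a2 = Node("a", r), Node("a", r)
Node("b", a1)
b = Node("b", a2)
Node("c", b)
ok = conforms(r, model)
```

Here `ok` is true. Adding a `c` child directly under `r` would make the check fail, because `aac` does not match the content model `a*`.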

33
Type Inference

An (incomplete) algorithm for type inference can
work its way to the top of the query parse tree
to infer a type in a bottom-up fashion
Start by inferring a type for the leaves (simple
queries), then use it for their parents
Type Inference is inherently incomplete.
Can be performed for some languages that are
regular in a sense.

34
Restricted language allowing for type inference

Axes: child, descendant, parent, ancestor,
following-sibling, etc.
Variables can be bound to nodes in the input
tree, then passed as parameters
An equality test can be performed between node
IDs, but not between node values.

35
Type Checking

In addition to inferring a type we need to verify
containment in another type.
Type Inference can be used as a tool for Type
Checking.
Type Checking was shown to be decidable for the
same language fragment, but with high complexity.

36
Intuitive connection to text

Queries -> regular expressions
Types (tree automata) -> context-free languages
Type Inference -> intersection of context-free
and regular languages, resulting in a context-free
one
Type checking -> Type Inference + inclusion of
context-free languages (with some restrictions to
guarantee decidability)
