XPath Query Evaluation - PowerPoint PPT Presentation

Slides: 34
Provided by: Adm9128

Transcript and Presenter's Notes

1
XPath Query Evaluation
2
Goal
  • Evaluating an XPath query against a given
    document
  • To find all matches
  • We will also consider the use of types
  • Complexity is important
  • Huge Documents

3
Data complexity vs. Combined Complexity
  • Two inputs to the query evaluation problem
  • Data (XML document) of size |D|
  • Query (XPath expression) of size |Q|
  • Usually |Q| << |D|
  • Polynomial data complexity
  • Complexity that is polynomial in |D|, possibly
    exponential in |Q|
  • Polynomial combined complexity
  • Complexity that is polynomial in |D| and |Q|
  • Fixed-parameter tractable complexity
  • Complexity Poly(|D|) · f(|Q|)

4
XPath standard semantics
5
Core XPath
  • locpath ::= '/' locpath | locpath '/' locpath |
    locpath '|' locpath | locstep.
  • locstep ::= axis '::' ntst '[' bexpr ']' ...
    '[' bexpr ']'.
  • bexpr ::= bexpr 'and' bexpr | bexpr 'or' bexpr |
    'not(' bexpr ')' | locpath.
  • axis ::= 'self' | 'child' | 'parent' |
    'descendant' | 'descendant-or-self' |
    'ancestor' | 'ancestor-or-self' |
    'following' | 'following-sibling' |
    'preceding' | 'preceding-sibling'.

6
XPath Query Evaluation
  • Input: XML document D, XPath query Q
  • Output: a subset of the nodes of D,
    as defined by Q
  • We will follow "Efficient Algorithms for
    Processing XPath Queries", Gottlob, Koch,
    Pichler, TODS 2005

7
Simple algorithm
  • process-location-step(n, Q)
  • S ← apply Q.first to n
  • If |Q| > 1
  • For each node n' in S do
  • process-location-step(n', Q.next)
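
The recursion above can be sketched in Python. This is only an illustration under assumptions: the `Node` class, the `AXES` table, the three axes it covers, and the choice to collect matches in a list at the last step are all hypothetical and not taken from the slides or the paper.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """Hypothetical minimal XML node: a label, children, and a parent link."""
    label: str
    children: list = field(default_factory=list)
    parent: "Node | None" = None

    def add(self, child):
        child.parent = self
        self.children.append(child)
        return child


def descendants(n):
    """Yield all proper descendants of n in document order."""
    for c in n.children:
        yield c
        yield from descendants(c)


# Assumed axis table: each axis maps one context node to a list of nodes.
AXES = {
    "child": lambda n: list(n.children),
    "parent": lambda n: [n.parent] if n.parent is not None else [],
    "descendant": lambda n: list(descendants(n)),
}


def process_location_step(n, steps, out):
    """Apply the first step to n, then recurse on every selected node."""
    axis, test = steps[0]
    selected = [m for m in AXES[axis](n) if test in ("*", m.label)]
    if len(steps) == 1:
        out.extend(selected)  # last step: these nodes are matches
        return
    for m in selected:
        process_location_step(m, steps[1:], out)


# Document a(b(c), b) and the query child::b/parent::a/child::b
root = Node("a")
b1 = root.add(Node("b"))
b2 = root.add(Node("b"))
b1.add(Node("c"))

matches = []
process_location_step(
    root, [("child", "b"), ("parent", "a"), ("child", "b")], matches
)
print([m.label for m in matches])
```

Note that each `b` node is reported once per path that reaches it, so the same sub-query is re-evaluated from the same node several times. That repeated work is the source of the blow-up.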

8
Complexity
  • Worst case: in each step of Q the axis is
    following
  • So we apply the query in each step on O(|D|)
    nodes
  • And we get Time(|Q|) = |D| · Time(|Q| - 1)
  • I.e., the complexity is O(|D|^|Q|)
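
To see the recurrence at work, the following sketch counts recursive calls under the worst-case assumption that every step selects all nodes of the document. The function name `count_calls` and the sample sizes are illustrative assumptions.

```python
def count_calls(doc_size, query_len):
    """Calls made by the simple algorithm if every step selects doc_size nodes."""
    if query_len == 1:
        return 1  # last step: no further recursion
    # One call for this step, plus one recursive evaluation per selected node.
    return 1 + doc_size * count_calls(doc_size, query_len - 1)


for q in range(1, 6):
    print(q, count_calls(10, q))
```

The number of calls grows like |D|^(|Q| - 1), and each call spends up to O(|D|) applying its axis, which gives the O(|D|^|Q|) bound.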

9
Early Systems Performance
Figure taken from Gottlob, Koch, Pichler 05
10
Internet Explorer 6
Figure taken from Gottlob, Koch, Pichler 05
11
IE6 performance as a function of document size
Figure taken from Gottlob, Koch, Pichler 05
12
Polynomial data complexity
  • Polynomial data complexity is sometimes considered
    good even if exponential in the query size
  • But can we have polynomial combined complexity
    for XPath query evaluation?
  • Yes!

13
Two main principles
  • Query parse trees: the query is divided into parts
    according to its structure (not to be confused
    with the XML tree structure)
  • Context-value tables: for every expression e
    occurring in the parse tree, compute a table of
    all valid combinations of context c and value v
    such that e evaluates to v in c.

14
XPath query parse tree
  • descendant::b/following-sibling::*[position() != last()]

15
Bottom-up vs. Top-down evaluation
  • We will discuss two kinds of query evaluation
    algorithms
  • Bottom-up means that the query parse tree is
    processed from the leaves up to the root
  • Top-down means that the parse tree is processed
    from the root to the leaves
  • When processing we will fill in the context-value
    table

16
Bottom-up evaluation
  • Main idea: compute the value for each leaf for
    every possible context
  • Propagate upwards until the root
  • Dynamic programming algorithm to avoid
    re-evaluation of queries in the same context

17
Operational semantics
  • Needed as a first step for evaluation algorithms
  • Similar ideas are used in compiler design
  • Here the semantics is based on the notion of
    contexts

18
Contexts
  • The domain of contexts is
  • C = dom × {<k, n> | 1 ≤ k ≤ n ≤ |dom|}
  • A context is c = <x, k, n>, where
  • x is the context node
  • k is the context position
  • n is the context size
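
As a small illustration of how large the domain of contexts is, the sketch below enumerates all triples for a toy document. The function name `contexts` and the node names are assumptions made for the example.

```python
def contexts(dom):
    """Enumerate C = dom x {(k, n) | 1 <= k <= n <= |dom|}."""
    size = len(dom)
    return [
        (x, k, n)
        for x in dom
        for n in range(1, size + 1)
        for k in range(1, n + 1)
    ]


dom = ["n1", "n2", "n3"]
C = contexts(dom)
print(len(C))  # |dom| * |dom| * (|dom| + 1) / 2 = 3 * 6 = 18
```

The count is cubic in |dom|, which is where the |D|^3 factor in the table sizes comes from.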

19
Types
20
Semantics for Xpath expressions
  • The semantics of evaluating an expression is a
    4-tuple where the first 3 elements are the
    context, and the fourth is the value obtained by
    evaluation in the context

21
Some notations
  • T(t): all nodes satisfying a predicate t
  • E(e): all nodes satisfying a regular exp. e
    (applied with respect to a given axis)
  • idx_χ(x, S) is the index of a node x in the set S
    with respect to a given axis χ and the document
    order

22
(No Transcript)
23
Context-value Table
  • Given a query sub-expression e, the context-value
    table of e specifies all combinations of context
    c and value v, such that computing e on the
    context c results in v
  • The bottom-up algorithm follows: compute the
    context-value tables in a bottom-up fashion with
    respect to the query parse tree
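
A minimal sketch of context-value tables for the predicate position() != last(): the two leaf tables are filled for every context, then combined into the table of the parent expression. The helper names (`context_value_table`, `not_equal`) and the dictionary representation are assumptions for illustration, not the paper's data structures.

```python
def context_value_table(dom, expr):
    """Map every context (x, k, n) to the value of expr in that context."""
    size = len(dom)
    return {
        (x, k, n): expr(x, k, n)
        for x in dom
        for n in range(1, size + 1)
        for k in range(1, n + 1)
    }


def position(x, k, n):
    """Leaf expression position(): the context position."""
    return k


def last(x, k, n):
    """Leaf expression last(): the context size."""
    return n


def not_equal(table_a, table_b):
    """Combine two child tables into the parent's table, context by context."""
    return {c: table_a[c] != table_b[c] for c in table_a}


dom = ["n1", "n2"]
t_pos = context_value_table(dom, position)    # leaf table
t_last = context_value_table(dom, last)       # leaf table
t_ne = not_equal(t_pos, t_last)               # parent table, built from the leaves
print(t_ne[("n1", 1, 2)], t_ne[("n1", 2, 2)])
```

Each table is computed once and then reused by the parent expression, which is the dynamic-programming step that avoids re-evaluating a sub-expression in the same context.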

24
Bottom-up algorithm
25
Example
4 times
26
Complexity
  • O(|D|^3 · |Q|) space, ignoring strings and numbers
  • O(|Q|) tables, with 3 columns, each holding
    values in 1..|D|, thus O(|D|^3 · |Q|)
  • An extra O(|D| · |Q|) multiplicative factor for
    strings and numbers
  • O(|D|^5 · |Q|) time, ignoring strings and numbers
  • It can take O(|D|^2) to combine two node sets
  • Extra O(|Q|) factor in case of strings and numbers

27
Optimization
  • Represent contexts as pairs of current and
    previous node
  • This brings the time complexity down to
    O(|D|^4 · |Q|^2)
  • Space complexity can be brought down to
    O(|D|^2 · |Q|^2) via more optimizations

28
Top-down evaluation
  • Similar idea
  • But allows to compute only values for contexts
    that are needed
  • Same worst-case bounds

29
Top-down or bottom-up?
  • General question in processing XML trees
  • The tradeoff
  • Usually easier to combine results computed in
    children to obtain the result at the parent
  • So bottom-up traversal is usually easier to
    design
  • On the other hand, some of the computation is
    redundant since we don't know if it will become
    relevant
  • So top-down traversal may be more efficient

30
Linear-time fragment
  • Core XPath includes only navigation
  • / and //
  • Core XPath can be evaluated in O(|D| · |Q|)
  • Observation: no need to consider the entire
    context triple, only the current context node
  • Top-down or bottom-up evaluation with essentially
    the same algorithm
  • But smaller tables (for every query node, all
    document nodes and values of evaluation) are
    maintained.
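
The idea of tracking only context nodes can be sketched as set-at-a-time evaluation: each step maps the whole current node set through an axis in one O(|D|) pass. This is a simplified sketch under assumptions: it handles predicate-free paths only, covers just three axes, and the array encoding (`parent`, `labels`, nodes numbered in document order) and the name `eval_core_path` are hypothetical.

```python
def eval_core_path(parent, labels, steps, start):
    """Evaluate a predicate-free path set-at-a-time.

    parent[i] is the parent index of node i (None for the root). Nodes are
    numbered in document order, so parent[i] < i. Each step is one pass over
    the document, so the whole path costs O(|D| * |Q|).
    """
    n = len(parent)
    current = [False] * n
    for s in start:
        current[s] = True
    for axis, test in steps:
        nxt = [False] * n
        if axis == "child":
            for i in range(n):
                p = parent[i]
                nxt[i] = p is not None and current[p]
        elif axis == "parent":
            for i in range(n):
                if current[i] and parent[i] is not None:
                    nxt[parent[i]] = True
        elif axis == "descendant":
            # A node is a descendant of the set if its parent is in the set
            # or is itself a descendant; parents are always visited first.
            for i in range(n):
                p = parent[i]
                nxt[i] = p is not None and (current[p] or nxt[p])
        else:
            raise ValueError(f"axis not covered by this sketch: {axis}")
        # Apply the node test after the axis image has been computed.
        current = [nxt[i] and test in ("*", labels[i]) for i in range(n)]
    return [i for i in range(n) if current[i]]


# Document a(b(c), b) in document order: 0=a, 1=b, 2=c, 3=b
parent = [None, 0, 1, 0]
labels = ["a", "b", "c", "b"]
query = [("child", "b"), ("parent", "a"), ("child", "b")]
print(eval_core_path(parent, labels, query, [0]))
```

Because the intermediate result is a set, each node appears at most once per step regardless of how many paths lead to it, which avoids the duplication of the simple recursive algorithm.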

31
Types are helpful
  • Can direct the search
  • In some parts of the tree there is no hope to get
    a match to a given sub-expression of the query
  • As a result we may have tables with fewer entries.
  • Whiteboard discussion

32
Type Checking and Inference
  • Type checking a single document: straightforward
  • Polynomial combined complexity if the automaton
    representing the type is deterministic; otherwise
    exponential in the automaton size but polynomial
    in the document size
  • Type checking the results of an (XPath) query
  • Inferring the type of the results of a query
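
For the single-document case, a deterministic check can be sketched as follows. This assumes a simplified, DTD-like notion of type: one deterministic finite automaton per element label that reads the sequence of child labels. That is a restricted form of tree automaton, and the names `conforms` and `dfas` as well as the tuple encoding of trees are illustrative assumptions.

```python
def conforms(tree, dfas):
    """Check a (label, children) tree against per-label deterministic automata.

    dfas[label] = (start_state, accepting_states, transitions), where
    transitions maps (state, child_label) to the next state.
    """
    label, children = tree
    if label not in dfas:
        return False
    state, accepting, delta = dfas[label]
    for child in children:
        key = (state, child[0])
        if key not in delta:
            return False  # no transition: child sequence rejected
        state = delta[key]
        if not conforms(child, dfas):
            return False
    return state in accepting


# Assumed toy type: a -> b*, b -> c?, c -> (empty)
dfas = {
    "a": (0, {0}, {(0, "b"): 0}),
    "b": (0, {0, 1}, {(0, "c"): 1}),
    "c": (0, {0}, {}),
}

good = ("a", [("b", [("c", [])]), ("b", [])])
bad = ("a", [("c", [])])
print(conforms(good, dfas), conforms(bad, dfas))
```

Every node is visited once and every child consumes exactly one transition, so the check is linear in the document size when the automata are deterministic.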

33
Type Inference
  • An (incomplete) algorithm for type inference can
    work its way to the top of the query parse tree
    to infer a type in a bottom-up fashion
  • Start by inferring a type for the leaves (simple
    queries), then use it for their parents
  • Type Inference is inherently incomplete.
  • Can be performed for some languages that are
    regular in a sense.

34
Restricted language allowing for type inference
  • Axes: child, descendant, parent, ancestor,
    following-sibling, etc.
  • Variables can be bound to nodes in the input
    tree, then passed as parameters
  • An equality test can be performed between node
    IDs, but not between node values.

35
Type Checking
  • In addition to inferring a type we need to verify
    containment in another type.
  • Type Inference can be used as a tool for Type
    Checking.
  • Type Checking was shown to be decidable for the
    same language fragment, but with high complexity.

36
Intuitive connection to text
  • Queries => regular expressions
  • Types (tree automata) => context-free languages
  • Type inference => intersection of a context-free
    and a regular language, resulting in a
    context-free one
  • Type checking => type inference + inclusion of
    context-free languages (with some restrictions to
    guarantee decidability)