Lazy Query Evaluation for Active XML

1
Lazy Query Evaluation for Active XML
Abiteboul, Benjelloun, Cautis, Manolescu, Milo, Preda
INRIA Futurs
presented by Grigoris Karvounarakis
Univ. of Pennsylvania
CIS 650, October 14, 2004
2
Active XML
function nodes
3
Tree Pattern Queries
result nodes
4
Tree Pattern Queries
  • Similar to Pattern Trees from TAX/TLC algebra
  • variable nodes, used to bind variables to
    sub-trees
  • (variable nodes with the same name must be mapped
    to elements with the same tag name)
  • result nodes
  • Embedding (of a query q into a document d): a match
  • Result of an embedding: the bindings of the output
    variables on the witness tree
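The notion of embedding above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Doc` and `Pat` classes, the `"*"` wildcard label, and the sample document (including the restaurant name `R1`) are hypothetical and not taken from the slides, and the sketch does not enforce the rule about variables that share a name.

```python
from dataclasses import dataclass, field
from itertools import product

@dataclass
class Doc:
    label: str
    children: list = field(default_factory=list)

@dataclass
class Pat:
    label: str                 # tag or value to match, "*" matches any label
    axis: str = "/"            # "/" = child edge, "//" = descendant edge
    var: str = ""              # output variable bound at this node, if any
    children: list = field(default_factory=list)

def descendants(d):
    """Yield every proper descendant of document node d."""
    for c in d.children:
        yield c
        yield from descendants(c)

def embed(p, d):
    """Return the bindings (var -> Doc) of all embeddings that map p to d."""
    if p.label != "*" and p.label != d.label:
        return []
    partial = [{p.var: d}] if p.var else [{}]
    for pc in p.children:
        candidates = d.children if pc.axis == "/" else descendants(d)
        child_bindings = [b for c in candidates for b in embed(pc, c)]
        # Every pattern child must be matched; an empty list kills the match.
        partial = [{**a, **b} for a, b in product(partial, child_bindings)]
    return partial

# Hypothetical document and query, loosely following the hotel example.
doc = Doc("nyHotels", [Doc("hotel", [
    Doc("name", [Doc("Best Western")]),
    Doc("nearby", [Doc("restaurant", [Doc("name", [Doc("R1")])])]),
])])
pattern = Pat("nyHotels", children=[Pat("hotel", children=[
    Pat("name", children=[Pat("Best Western")]),
    Pat("nearby", children=[
        Pat("restaurant", axis="//", children=[
            Pat("name", children=[Pat("*", var="X")])])]),
])])
result = embed(pattern, doc)
```

Each dictionary in `result` is one binding of the output variables; an empty list means the query has no embedding into the document.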

5
No embedding
6
No embedding
but if we evaluate
7
Embedding Example
8
Embedding Example
9
Embedding Example
10
Relevant rewriting
  • (getNearbyRestos) is a relevant function
    node
  • In general, a function node is relevant if there
    exists some rewriting of the document where some
    of the nodes it produces belong to a match
  • Rewriting the document by invoking relevant
    function nodes produces relevant rewritings
  • d1 →v1 d2 →v2 ... dn
  • A document that contains no calls that are
    relevant to a query q is said to be complete for q

11
Problem definition
  • Given an Active XML document d and a query q,
    find an efficient way to evaluate the query over
    the document
  • Naïve approach: interleave query evaluation with
    function calls
  • Better: try to compute (a superset of) the
    relevant function calls for q and execute q over
    the rewriting of d (that results from executing
    these function calls)

12
Problem definition
  • Efficiency tradeoff between:
  • time to compute an approximation of the set of relevant
    functions (larger for a more accurate approximation)
  • time to execute the function calls (smaller for a
    more accurate approximation) and time to execute the query
    over the resulting rewriting of the document (smaller
    document for a more accurate approximation)

13
Outline
  • Definitions
  • Finding relevant calls
  • Sequencing relevant calls
  • Improving accuracy
  • Reducing detection time
  • Conclusions - Discussion

14
Linear Path Queries
/()
/nyHotels/()
/nyHotels/hotel/()
/nyHotels/hotel/name/()
/nyHotels/hotel/rating/()
/nyHotels/hotel/nearby/()
/nyHotels/hotel/nearby//()
/nyHotels/hotel/nearby//restaurant/()
/nyHotels/hotel/nearby//restaurant/name/()
/nyHotels/hotel/nearby//restaurant/address/()
/nyHotels/hotel/nearby//restaurant/rating/()
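A list like the one above can be derived mechanically from the query tree. The sketch below is one possible way to do it, not necessarily how the authors generate LPQs: the tuple encoding `(label, axis, children)` and the `<rating value>` placeholder are assumptions made for illustration. For a node reached through a // edge it emits only the `//()` form, so it does not reproduce the separate `/nyHotels/hotel/nearby/()` entry, which `//()` already covers.

```python
def linear_path_queries(node, prefix=""):
    """Yield one LPQ per query node: path to its parent, its edge, then ()."""
    label, axis, children = node
    yield f"{prefix}{axis}()"
    for child in children:
        yield from linear_path_queries(child, f"{prefix}{axis}{label}")

# Query tree encoded as (label, axis, children); leaf labels stand for values
# or variables, and "<rating value>" is a placeholder for an elided value.
query = ("nyHotels", "/", [
    ("hotel", "/", [
        ("name", "/", [("Best Western", "/", [])]),
        ("rating", "/", [("<rating value>", "/", [])]),
        ("nearby", "/", [
            ("restaurant", "//", [
                ("name", "/", [("X", "/", [])]),
                ("address", "/", [("Y", "/", [])]),
                ("rating", "/", [("<rating value>", "/", [])]),
            ]),
        ]),
    ]),
])

# Siblings share the same LPQ, so drop duplicates while keeping the order.
lpqs = list(dict.fromkeys(linear_path_queries(query)))
for lpq in lpqs:
    print(lpq)
```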
15
Linear Path Queries
  • Correct, but usually inaccurate
  • Ignores filtering conditions on the path from the
    root or in other branches that could make some of
    the functions irrelevant (e.g. there is no chance
    that a getNearbyRestos() function node under a
    hotel is relevant if the hotel does not have the
    required rating)

16
Node Focused Queries
  • For each node in the query tree, replace it with
    an OR node (adding a branch () that matches any
    function, as with LPQs)
  • Then, for every node v in the resulting query
    tree, create qv = q minus v and its subtree, with
    output node fv pointing at the position of the
    () OR-sibling of v
  • Each such query tree involves the path from the
    root to the node (as in LPQs) plus any parts of the
    tree that would have to be matched anyway for
    the whole query tree to match

17
NFQ Example
[Figure: query tree. nyHotels / hotel, with children name ("Best Western"), nearby and rating; nearby // restaurant, with children name (X), address (Y) and rating]
18
NFQ Example
[Figure: query tree. nyHotels / hotel, with children name ("Best Western"), nearby and rating; nearby // restaurant, with children name (X), address (Y) and rating]
19
NFQ Example
nyHotels

20
NFQ Example
nyHotels

21
NFQ Example
nyHotels

22
Another NFQ Example
Best Western
23
Another NFQ Example
24
Another NFQ Example
25
Another NFQ Example
Best Western
26
Node Focused Queries
  • Assuming that functions can return data of
    arbitrary type, the function nodes that are
    relevant for a query q are precisely the ones
    retrieved by the NFQs of q

27
Outline
  • Definitions
  • Finding relevant calls
  • Sequencing relevant calls
  • Improving accuracy
  • Reducing detection time
  • Conclusions - Discussion

28
Sequencing relevant calls
  • Naïve NFQA algorithm: repeat the following steps
  • Step 1: evaluate all NFQs
  • Step 2: pick one of the returned functions, say fv
  • Step 3: evaluate the function and rewrite the document (d →fv d')
  • Stop when all NFQs return empty results (i.e., there
    are no more relevant calls)
  • After every iteration, although the NFQs remain the
    same, their results can change (since evaluating
    functions at step 3 can introduce new
    function nodes or make some results irrelevant)
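The loop described above can be sketched as follows. This is a toy model, not the authors' implementation: a document is reduced to the set of calls it still contains, each NFQ is a plain Python function from a document to the calls it returns, and the `RESULTS` table saying which new calls a function produces is hypothetical.

```python
def naive_nfqa(doc, nfqs, invoke):
    """Invoke relevant calls one at a time until no NFQ returns a call."""
    while True:
        # Step 1: evaluate all NFQs on the current document.
        relevant = [call for nfq in nfqs for call in nfq(doc)]
        if not relevant:
            return doc  # no relevant calls left: doc is complete for q
        # Steps 2 and 3: pick one returned call, invoke it, rewrite doc.
        doc = invoke(doc, relevant[0])

# Hypothetical table: the calls that appear in the result of each call.
RESULTS = {
    "getHotels": {"getRating", "getNearbyRestos"},
    "getRating": set(),
    "getNearbyRestos": set(),
}
calls_made = []

def invoke(doc, call):
    """Replace the call by its result, which may contain new calls."""
    calls_made.append(call)
    return (doc - {call}) | RESULTS[call]

def nfq_any(doc):
    """A single stand-in NFQ that treats every remaining call as relevant."""
    return sorted(doc)

final = naive_nfqa({"getHotels"}, [nfq_any], invoke)
```

Note that all NFQs are re-evaluated after every single invocation, which is exactly the cost that the later slides on layers try to reduce.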

29
Improving NFQA
  • Predict when NFQ results could not
    possibly have changed and avoid reevaluating them
  • Identify dependences between NFQs and the effect
    of executing functions they return

30
Influence of NFQs
[Figure: NFQ1 and NFQ2 on the nyHotels / "Best Western" query tree]
NFQ1 can influence NFQ2, but not vice versa
31
Influence of NFQs
  • NFQ1 may influence NFQ2 iff the output function
    node of NFQ1 is an ancestor (in the query tree)
    of the output node of NFQ2
  • Two NFQs belong in the same layer if they may
    influence (directly or transitively) each other.
  • Inside every layer, we have to reevaluate every
    NFQ after every function call
  • Multiple equivalent NFQs (i.e., in the same
    layer) can only exist under //, where, not
    knowing the output type, both nodes could appear
    as descendants of each other. E.g. for //a and //b:
    in /a/b, //a matches /a and //b matches /a/b, while
    in /b/a, //b matches /b and //a matches /b/a
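Grouping NFQs into layers amounts to finding groups of NFQs that can reach each other in the "may influence" graph. The sketch below assumes that graph is already given as an adjacency dictionary; the NFQ names are hypothetical, and the quadratic reachability computation is chosen for brevity rather than speed.

```python
def reachable(graph, start):
    """NFQs that `start` may influence, directly or transitively, plus itself."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in graph[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def layers(graph):
    """Group mutually influencing NFQs and order the groups for processing."""
    reach = {n: reachable(graph, n) for n in graph}
    found = []
    for n in graph:
        # Same layer: n influences m and m influences n.
        layer = frozenset(m for m in reach[n] if n in reach[m])
        if layer not in found:
            found.append(layer)
    # A layer that influences another one reaches strictly more NFQs,
    # so sorting by reach size (largest first) puts it earlier.
    return sorted(found, key=lambda layer: -len(reach[next(iter(layer))]))

# Hypothetical influence graph: q_a and q_b sit under // and influence
# each other; q_hotel influences both; both influence q_name.
influence = {
    "q_hotel": ["q_a", "q_b"],
    "q_a": ["q_b", "q_name"],
    "q_b": ["q_a", "q_name"],
    "q_name": [],
}
ordered_layers = layers(influence)
```

Layers are then processed in the returned order, and only the NFQs inside the current layer need to be re-evaluated after each call.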

32
Influence of NFQs
  • Layer L1 precedes layer L2 if some NFQ in L1 may influence (directly or transitively) some NFQ in L2
  • We have to process L1 before L2 (without having
    to process L1 again afterwards)
  • When processing L1 has finished, OR-nodes
    corresponding to returned functions are redundant
    and thus NFQs in L2 can be simplified by removing
    them

33
Parallelizing calls
  • Let qlin be the linear path from the root to the
    output node of NFQ q, not inclusive (note that qlin
    is a regular expression)
  • Two NFQs q, q' that belong to the same layer are
    independent iff there are no common words in the
    regular languages of qlin, q'lin
  • E.g //a, //b are independent
  • But //a//c and //b//c are not (e.g. both match
    /a/b/c)
  • If all NFQs in a layer are independent, we can
    call all functions returned by the same NFQ in a
    step of NFQA in parallel.
  • Other sufficient conditions could exist, too
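The independence test above is an emptiness check on the intersection of two regular languages. A minimal sketch, assuming each linear path is given as a list of `(axis, label)` steps and that a word is the sequence of element labels from the root: each path is read as a small automaton and the product of the two automata is searched. The encoding and the `#other` symbol (standing for any label not named in either path) are assumptions of this sketch.

```python
def step_targets(steps, state, symbol):
    """Automaton moves for one linear path; states are positions in `steps`."""
    if state == len(steps):
        return set()
    axis, label = steps[state]
    targets = set()
    if label in ("*", symbol):
        targets.add(state + 1)   # this element matches the current step
    if axis == "//":
        targets.add(state)       # skip an intermediate element
    return targets

def have_common_word(p, q):
    """True iff some label sequence is accepted by both linear paths."""
    alphabet = {label for _, label in p + q if label != "*"} | {"#other"}
    start, goal = (0, 0), (len(p), len(q))
    seen, frontier = {start}, [start]
    while frontier:
        i, j = frontier.pop()
        if (i, j) == goal:
            return True
        for symbol in alphabet:
            for a in step_targets(p, i, symbol):
                for b in step_targets(q, j, symbol):
                    if (a, b) not in seen:
                        seen.add((a, b))
                        frontier.append((a, b))
    return False

def independent(p, q):
    return not have_common_word(p, q)

# The two examples from the slide.
a_vs_b = independent([("//", "a")], [("//", "b")])
ac_vs_bc = independent([("//", "a"), ("//", "c")], [("//", "b"), ("//", "c")])
```

With these inputs `a_vs_b` is true and `ac_vs_bc` is false, since the word a, b, c (the path /a/b/c) is accepted by both //a//c and //b//c.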

34
Outline
  • Definitions
  • Finding relevant calls
  • Sequencing relevant calls
  • Improving accuracy
  • Reducing detection time
  • Conclusions - Discussion

35
Using types
  • Use function return type to predict shape of
    data that a function call can return
  • Similar to checking for the existence of a possible
    rewriting
  • If this shape cannot match the corresponding
    part of the query pattern, the function call can be discarded
  • In some cases, one can go further and restrict
    not only the output type but also the specific
    names of functions that could match
  • Refined NFQs
  • Use set of function names of appropriate return
    type instead of ()
  • Use F-guides (later) to make them even more
    refined
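As a rough illustration of refining `()` into a set of function names, the sketch below filters functions by a declared return type. The signatures in `RETURN_TYPES` are hypothetical (the slides name the functions but do not give their types), and a real check would compare a full type against the query pattern rather than a flat set of element names.

```python
# Hypothetical signatures: function name -> element names its result may contain.
RETURN_TYPES = {
    "getHotels": {"hotel"},
    "getRating": {"rating"},
    "getNearbyRestos": {"restaurant"},
    "getNearbyMuseums": {"museum"},
}

def refine(expected_labels, return_types=RETURN_TYPES):
    """Names of functions whose declared result could supply one of the labels."""
    wanted = set(expected_labels)
    return sorted(name for name, labels in return_types.items()
                  if labels & wanted)

# Below nearby the query looks for restaurant elements; below hotel it
# looks for name, rating and nearby elements.
below_nearby = refine({"restaurant"})
below_hotel = refine({"name", "rating", "nearby"})
```

Under these assumed signatures a getNearbyMuseums call is never returned by the refined NFQ, so it is never invoked for this query.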

36
Refined NFQ example
[Figure: query tree with nodes nyHotels, hotel, nearby, name, rating and the value "Best Western"]
37
Refined NFQ example
[Figure: the same query tree with nodes nyHotels, hotel, nearby, name, rating and the value "Best Western", now showing the function names getNearbyRestos and getRating]
38
Pushing queries
  • Similar to pushing selections on scans in
    relational queries or pushing queries to data
    sources in mediator systems
  • Reduce amount of (useless) data that are
    transferred (assuming functions correspond to
    remote (web) services), by filtering irrelevant
    matches and projecting only on output variable
    nodes

39
Outline
  • Definitions
  • Finding relevant calls
  • Sequencing relevant calls
  • Improving accuracy
  • Reducing detection time
  • Conclusions - Discussion

40
Lenient rewriting
  • Trade accuracy for efficiency
  • Use XPath or LPQs instead of NFQ (faster
    processing)
  • Use a lenient form of type checking (ignoring
    order and cardinality of elements)

41
Function call guides
  • Similar to dataguides, but for function calls
  • One occurrence of each path that leads to some
    function node, with pointers to the function nodes

42
Function call guides
Paths that don't lead to functions are left out
43
Function call guides
pointers to getHotels calls
pointers to getRating calls
pointers to getNearbyRestos, getNearbyMuseums calls
44
Function call guides
  • Use F-guides for:
  • Generation of refined NFQs (use the return type
    within the appropriate F-guide part to get only
    function names that can indeed appear in the
    corresponding tree fragment)
  • Efficient approximation of relevant function
    nodes: evaluating queries (NFQs) on the F-guide
    corresponds to evaluating LPQs on the original document
  • Initial filtering: NFQs for nodes
    that don't have any children in the F-guide can be discarded
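An F-guide can be pictured as a map from each path that leads to a function node to the function nodes found there. The sketch below builds such a map in one traversal; the `Node` class and the sample document are assumptions for illustration, and calls nested inside the parameters of other calls are ignored.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    is_call: bool = False
    children: list = field(default_factory=list)

def build_fguide(root):
    """Map each path leading to a function node to pointers to those nodes."""
    guide = defaultdict(list)

    def walk(node, path):
        for child in node.children:
            if child.is_call:
                guide[path].append(child)   # keep a pointer to the call
            else:
                walk(child, f"{path}/{child.label}")

    walk(root, f"/{root.label}")
    return dict(guide)   # paths without calls never get an entry

# Hypothetical document using the function names from the slides.
doc = Node("nyHotels", children=[
    Node("getHotels", is_call=True),
    Node("hotel", children=[
        Node("name", children=[Node("Best Western")]),
        Node("rating", children=[Node("getRating", is_call=True)]),
        Node("nearby", children=[
            Node("getNearbyRestos", is_call=True),
            Node("getNearbyMuseums", is_call=True),
        ]),
    ]),
])
fguide = build_fguide(doc)
names = {path: [n.label for n in nodes] for path, nodes in fguide.items()}
```

Because the guide is usually much smaller than the document, queries can be run on it first to find candidate calls, at the price of ignoring filtering conditions on data values.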

45
Conclusions
  • Active XML: an interesting new area
  • Nothing fundamentally novel
  • Applies known tools (distributed processing, lazy
    evaluation) in a new context, giving new life to
    documents
  • Greatest challenge: formulating the right research
    questions well
  • Answers to these well-formulated questions are
    fairly easy
  • Contributions of this paper:
  • Formulates such an interesting question
  • Thorough understanding of different aspects of
    the problem (accuracy vs. performance and their
    effect on overall efficiency)

46
Questions?