Lazy Query Evaluation for Active XML

1
Lazy Query Evaluation for Active XML
Abiteboul, Benjelloun, Cautis, Manolescu, Milo, Preda
INRIA Futurs
presented by Grigoris Karvounarakis
Univ. of Pennsylvania
CIS 650, October 14, 2004
2
Active XML
function nodes
3
Tree Pattern Queries
result nodes
4
Tree Pattern Queries

Similar to Pattern Trees from the TAX/TLC algebra
variable nodes, used to bind variables to
sub-trees
(variable nodes with the same name must be mapped
to elements with the same tag name)
result nodes
Embedding (of a query q into a document d), also
called a match
Result of an embedding: bindings of output
variables on the witness tree
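
The notion of an embedding can be illustrated with a minimal sketch. It assumes a toy encoding that is not from the slides: document nodes are (label, children) pairs, query nodes are (label, axis, children) triples, and function nodes are ordinary nodes whose label ends in "()". Variables and result nodes are ignored; the sketch only checks whether some embedding exists.

```python
from typing import Iterator, Tuple

DocNode = Tuple[str, list]           # (label, children), assumed encoding
QueryNode = Tuple[str, str, list]    # (label, axis from parent, children)


def descendants(node: DocNode) -> Iterator[DocNode]:
    """Yield every proper descendant of a document node."""
    for child in node[1]:
        yield child
        yield from descendants(child)


def embeds(q: QueryNode, d: DocNode) -> bool:
    """True if the query subtree rooted at q can be embedded at d."""
    q_label, _, q_children = q
    d_label, d_children = d
    if q_label != d_label:
        return False
    for qc in q_children:
        # "/" looks at children only, "//" at any proper descendant
        candidates = d_children if qc[1] == "/" else descendants(d)
        if not any(embeds(qc, c) for c in candidates):
            return False
    return True


query = ("hotel", "/", [
    ("name", "/", [("Best Western", "/", [])]),
    ("nearby", "/", [("restaurant", "//", [])]),
])
# Before the call is evaluated, nearby only holds a function node.
before = ("hotel", [("name", [("Best Western", [])]),
                    ("nearby", [("getNearbyRestos()", [])])])
# After the call is evaluated, its (hypothetical) result is in place.
after = ("hotel", [("name", [("Best Western", [])]),
                   ("nearby", [("restaurant", [])])])
```

With this toy data, the query has no embedding into `before` but does have one into `after`, which is the situation the following slides illustrate.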

5
No embedding
6
No embedding
but if we evaluate
7
Embedding Example
8
Embedding Example
9
Embedding Example
10
Relevant rewriting

getNearbyRestos is a relevant function
node
In general, a function node is relevant if there
exists some rewriting of the document where some
of the nodes it produces belong to a match
Rewriting the document by invoking relevant
function nodes produces relevant rewritings
d1 →v1 d2 →v2 ... dn
A document that contains no calls that are
relevant to a query q is said to be complete for q

11
Problem definition

Given an Active XML document d and a query q,
find an efficient way to evaluate the query over
the document
Naïve approach: interleave query evaluation with
function calls
Better: try to compute (a superset of) the
relevant function calls for q and execute q over
the rewriting of d (that results from executing
these function calls)

12
Problem definition

Efficiency tradeoff:
time to compute the approximation of the set of
relevant functions (larger for a more accurate
approximation)
versus time to execute the function calls (smaller
for a more accurate approximation) and time to
execute the query over the resulting rewriting of
the document (smaller document for a more accurate
approximation)

13
Outline

Definitions
Finding relevant calls
Sequencing relevant calls
Improving accuracy
Reducing detection time
Conclusions - Discussion

14
Linear Path Queries
/()
/nyHotels/()
/nyHotels/hotel/()
/nyHotels/hotel/name/()
/nyHotels/hotel/rating/()
/nyHotels/hotel/nearby/()
/nyHotels/hotel/nearby//()
/nyHotels/hotel/nearby//restaurant/()
/nyHotels/hotel/nearby//restaurant/name/()
/nyHotels/hotel/nearby//restaurant/address/()
/nyHotels/hotel/nearby//restaurant/rating/()
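
One plausible way to derive such a list from a query tree is sketched below. The derivation rule is inferred from the list itself rather than stated on the slide: every node contributes its root path followed by /(), and a node with a descendant (//) edge below it also contributes its root path followed by //(). The tuple encoding of the query tree is an assumption made for illustration.

```python
from typing import List, Tuple

# (label, axis from parent, children); axis is "/" or "//" (assumed encoding)
QueryNode = Tuple[str, str, list]


def linear_path_queries(root: QueryNode) -> List[str]:
    """List the linear path queries of a query tree in pre-order."""
    out: List[str] = []

    def visit(path: str, children: list) -> None:
        out.append(path + "/()")            # a call directly below this node
        if any(axis == "//" for _, axis, _ in children):
            out.append(path + "//()")       # a call anywhere below this node
        for label, axis, grandchildren in children:
            visit(path + axis + label, grandchildren)

    visit("", [root])                       # "" stands for the document root
    return out


QUERY = ("nyHotels", "/", [
    ("hotel", "/", [
        ("name", "/", []),
        ("rating", "/", []),
        ("nearby", "/", [
            ("restaurant", "//", [
                ("name", "/", []),
                ("address", "/", []),
                ("rating", "/", []),
            ]),
        ]),
    ]),
])
lpqs = linear_path_queries(QUERY)
```

For this query tree the sketch yields the eleven paths listed on the slide, in the same order.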
15
Linear Path Queries

Correct, but usually inaccurate
Ignores filtering conditions on the path from the
root or in other branches that could make some of
the functions irrelevant (e.g. a getNearbyRestos()
function node under a hotel cannot be relevant if
the hotel's rating does not satisfy the rating
condition of the query)

16
Node Focused Queries

For each node in the query tree, replace it with
an OR node (to add a branch () that matches any
function, as with LPQs)
Then, for every node v in the resulting query
tree, create qv = q without v and its subtree, with
output node fv pointing at the position of the
() OR-sibling of v
Each such query tree involves the path from the
root to the node (as in an LPQ) plus any parts of
the tree that would have to be matched anyway for
the whole query tree to match.

17
NFQ Example
[Figure: query tree. Node labels: nyHotels; hotel;
name, nearby, rating; Best Western; restaurant;
name, address, rating; X, Y]
18
NFQ Example
[Figure: query tree. Node labels: nyHotels; hotel;
name, nearby, rating; Best Western; restaurant;
name, address, rating; X, Y]
19
NFQ Example
nyHotels

20
NFQ Example
nyHotels

21
NFQ Example
nyHotels

22
Another NFQ Example
Best Western
23
Another NFQ Example
24
Another NFQ Example
25
Another NFQ Example
Best Western
26
Node Focused Queries

Assuming that functions can return data of
arbitrary type, the function nodes that are
relevant for a query q are precisely the ones
retrieved by the NFQs of q

27
Outline

Definitions
Finding relevant calls
Sequencing relevant calls
Improving accuracy
Reducing detection time
Conclusions - Discussion

28
Sequencing relevant calls

Naïve NFQA algorithm
Repeat:
1. Evaluate all NFQs
2. Pick one of the returned functions, say fv
3. Evaluate the function and rewrite the document
(d →fv d')
Until all NFQs return empty results (i.e., there
are no more relevant calls)
After every iteration, although the NFQs remain the
same, their results can change (since evaluating
a function at step 3 above can introduce new
function nodes or make some results irrelevant)
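
The loop can be sketched as follows. This is only an illustration under strong simplifying assumptions: the document is a flat list of strings, a call is an item starting with "call:", each NFQ is modelled as a plain Python function returning the relevant calls it finds, and the SERVICES table with its result items is made up for the example.

```python
from typing import Callable, List, Tuple

Doc = List[str]                    # toy document: a flat list of items
NFQ = Callable[[Doc], List[str]]   # returns the relevant calls it finds

# Hypothetical services: each call maps to the items it returns.
SERVICES = {
    "call:getHotels": ["hotel:Best Western", "call:getNearbyRestos"],
    "call:getNearbyRestos": ["restaurant:R1"],
    "call:getNearbyMuseums": ["museum:M1"],
}


def invoke(doc: Doc, call: str) -> Doc:
    """Rewrite the document by replacing one call with its result."""
    i = doc.index(call)
    return doc[:i] + SERVICES[call] + doc[i + 1:]


def naive_nfqa(doc: Doc, nfqs: List[NFQ]) -> Tuple[Doc, List[str]]:
    """Invoke relevant calls one at a time until none is left."""
    invoked: List[str] = []
    while True:
        relevant = [c for q in nfqs for c in q(doc)]   # evaluate all NFQs
        if not relevant:
            return doc, invoked                        # complete for the query
        call = relevant[0]                             # pick one returned call
        doc = invoke(doc, call)                        # rewrite the document
        invoked.append(call)


def wants(name: str) -> NFQ:
    """Toy NFQ: a call is relevant iff it has the given name."""
    return lambda doc: [item for item in doc if item == "call:" + name]


final_doc, invoked = naive_nfqa(
    ["call:getHotels", "call:getNearbyMuseums"],
    [wants("getHotels"), wants("getNearbyRestos")],
)
```

In this run getNearbyRestos only becomes visible after getHotels has been invoked, which is why all NFQs are re-evaluated on every iteration, and getNearbyMuseums is never called because no NFQ returns it.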

29
Improving NFQA

Predict when NFQ results could not have
possibly changed and avoid reevaluating them
Identify dependences between NFQs and the effect
of executing functions they return

30
Influence of NFQs
NFQ1
NFQ2
nyHotels

Best Western
NFQ1 can influence NFQ2, but not vice versa
31
Influence of NFQs

NFQ1 may influence NFQ2 iff the output function
node of NFQ1 is an ancestor (in the query tree)
of the output node of NFQ2
Two NFQs belong to the same layer if they may
influence each other (directly or transitively).
Inside every layer, we have to reevaluate every
NFQ after every function call
Multiple equivalent NFQs (i.e., in the same
layer) can only exist under //, where, not
knowing the output type, each node could appear
as a descendant of the other. E.g., for //a and
//b: in /a/b, //a matches /a and //b matches /a/b,
while in /b/a, //b matches /b and //a matches /b/a

32
Influence of NFQs

Layer L1 precedes layer L2 if some NFQ in L1 may
influence (directly or transitively) some NFQ in
L2
We have to process L1 before L2 (without having
to process L1 again afterwards)
When processing L1 has finished, OR-nodes
corresponding to returned functions are redundant
and thus NFQs in L2 can be simplified by removing
them
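
Given the "may influence" relation as a graph, layers are the groups of NFQs that influence each other, and they can be processed in an order compatible with the influence edges. The sketch below assumes the relation is already available as a dictionary from NFQ names to the NFQs they may influence; computing that relation from the query tree is not shown, and the names used are placeholders.

```python
from typing import Dict, List, Set


def layers(influences: Dict[str, Set[str]]) -> List[List[str]]:
    """Group mutually influencing NFQs and order the groups."""
    nodes = sorted(set(influences) | {v for vs in influences.values() for v in vs})

    # reach[u] = NFQs that u may influence, directly or transitively
    reach: Dict[str, Set[str]] = {}
    for u in nodes:
        seen: Set[str] = set()
        stack = list(influences.get(u, ()))
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.extend(influences.get(v, ()))
        reach[u] = seen

    # A layer collects the NFQs that may influence each other.
    groups: List[List[str]] = []
    assigned: Set[str] = set()
    for u in nodes:
        if u not in assigned:
            layer = [v for v in nodes
                     if v == u or (v in reach[u] and u in reach[v])]
            assigned.update(layer)
            groups.append(layer)

    # Emit a layer once no remaining layer may still influence it.
    ordered: List[List[str]] = []
    remaining = list(groups)
    while remaining:
        for g in remaining:
            if not any(g[0] in reach[o[0]] for o in remaining if o is not g):
                ordered.append(g)
                remaining.remove(g)
                break
    return ordered


example = layers({"a": {"b"}, "b": {"a", "c"}, "c": set()})
```

Here a and b influence each other and so share a layer, which is processed before the layer containing c.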

33
Parallelizing calls

Let qlin be the linear path from the root to the
output node of NFQ q, not inclusive (note that qlin
is a regular expression)
Two NFQs q and q' that belong to the same layer are
independent iff there are no common words in the
regular languages of qlin and q'lin
E.g., //a and //b are independent
But //a//c and //b//c are not (e.g. both match
/a/b/c)
If all NFQs in a layer are independent, we can
call all functions returned by the same NFQ in a
step of NFQA in parallel.
Other sufficient conditions could exist, too
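
The independence test amounts to checking that two path languages have an empty intersection. A minimal sketch follows, assuming paths that use only / and // steps with plain labels (no wildcards or predicates) and that must match a root-to-node label sequence in full. Each path is treated as a small automaton and the pair is explored jointly; the function names are illustrative.

```python
import re
from typing import List, Set, Tuple

Step = Tuple[str, str]   # (axis, label), axis is "/" or "//"


def parse(path: str) -> List[Step]:
    """Split a path such as //a//c into (axis, label) steps."""
    return re.findall(r"(//|/)([^/]+)", path)


def moves(steps: List[Step], state: int, symbol: str) -> Set[int]:
    """States reachable from `state` after reading one label."""
    if state == len(steps):
        return set()              # path already fully matched
    axis, label = steps[state]
    nxt: Set[int] = set()
    if axis == "//":
        nxt.add(state)            # label skipped by the descendant axis
    if symbol == label:
        nxt.add(state + 1)        # label consumed by this step
    return nxt


def independent(p1: str, p2: str) -> bool:
    """True if no label sequence is matched by both paths."""
    s1, s2 = parse(p1), parse(p2)
    # "#other" stands for any label mentioned in neither path
    alphabet = {label for _, label in s1 + s2} | {"#other"}
    goal = (len(s1), len(s2))
    seen, frontier = {(0, 0)}, [(0, 0)]
    while frontier:
        a, b = frontier.pop()
        if (a, b) == goal:
            return False          # both paths accept a common word
        for sym in alphabet:
            for na in moves(s1, a, sym):
                for nb in moves(s2, b, sym):
                    if (na, nb) not in seen:
                        seen.add((na, nb))
                        frontier.append((na, nb))
    return True
```

On the slide's examples, `independent("//a", "//b")` holds, while `independent("//a//c", "//b//c")` does not, since /a/b/c is matched by both.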

34
Outline

Definitions
Finding relevant calls
Sequencing relevant calls
Improving accuracy
Reducing detection time
Conclusions - Discussion

35
Using types

Use the function return type to predict the shape
of the data that a function call can return
Similar to checking for the existence of a possible
rewriting
If this shape cannot match the (corresponding
part of the) query pattern, the call can be
discarded
In some cases, one can go further and restrict
not only the output type but also the specific
names of functions that could match
Refined NFQs
Use the set of function names of appropriate
return type instead of ()
Use F-guides (later) to make them even more
refined

36
Refined NFQ example
[Figure: NFQ tree. Node labels: nyHotels; hotel;
nearby; name, rating; Best Western]
37
Refined NFQ example
[Figure: refined NFQ tree. Node labels: nyHotels;
hotel; nearby; name, rating; getNearbyRestos,
getRating; Best Western]
38
Pushing queries

Similar to pushing selections on scans in
relational queries or pushing queries to data
sources in mediator systems
Reduce the amount of (useless) data that is
transferred (assuming functions correspond to
remote (web) services) by filtering out irrelevant
matches and projecting only on output variable
nodes

39
Outline

Definitions
Finding relevant calls
Sequencing relevant calls
Improving accuracy
Reducing detection time
Conclusions - Discussion

40
Lenient rewriting

Trade accuracy for efficiency
Use XPath or LPQs instead of NFQs (faster
processing)
Use a lenient form of type checking (ignoring
order and cardinality of elements)

41
Function call guides

Similar to dataguides, but for function calls
One occurrence for each path that leads to some
function node, with pointers to the function nodes

42
Function call guides

Paths that don't lead to functions are left out
43
Function call guides


pointers to getHotels calls
pointers to getRating calls
pointers to getNearbyRestos, getNearbyMuseums
calls
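
A function call guide of this kind can be sketched as a map from label paths to the function nodes found under them. The encoding is assumed for illustration: document nodes are (label, children) pairs, function nodes are nodes whose label ends in "()", and the toy document layout below is made up around the function names shown on the slide.

```python
from typing import Dict, List, Tuple

DocNode = Tuple[str, list]   # (label, children), assumed encoding
Path = Tuple[str, ...]


def build_fguide(doc: DocNode) -> Dict[Path, List[DocNode]]:
    """Map each path that leads to a function node to those nodes."""
    guide: Dict[Path, List[DocNode]] = {}

    def walk(node: DocNode, path: Path) -> None:
        label, children = node
        if label.endswith("()"):
            # keep a pointer to the function node itself
            guide.setdefault(path, []).append(node)
            return
        for child in children:
            walk(child, path + (label,))

    walk(doc, ())
    return guide


doc = ("nyHotels", [
    ("getHotels()", []),
    ("hotel", [
        ("name", [("Best Western", [])]),
        ("rating", [("getRating()", [])]),
        ("nearby", [("getNearbyRestos()", []), ("getNearbyMuseums()", [])]),
    ]),
])
guide = build_fguide(doc)
```

Only three paths appear in the resulting guide; the path through name is absent because it leads to no function node.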
44
Function call guides

Use F-guides for
Generation of refined NFQs (use the return type
within the appropriate F-guide part to get only
function names that can indeed appear in the
corresponding tree fragment)
Efficient approximation of relevant function
nodes: evaluating queries (NFQs) on the F-guide
corresponds to evaluating queries on the original
document using LPQs
Initial filtering: can get rid of NFQs for nodes
that don't have any children in the F-guide

45
Conclusions

Active XML: an interesting new area
Nothing fundamentally novel
Applies known tools (distributed processing, lazy
evaluation) in a new context, giving new life to
documents
Greatest challenge: formulating the right research
questions well
Answers to these well-formulated questions are
fairly easy.
Contributions of this paper
Formulates such an interesting question
Thorough understanding of different aspects of
the problem (accuracy vs. performance and their
effect on overall efficiency)