Incremental Fusion of XML Fragments through Semantic Identifiers

1
Incremental Fusion of XML Fragments through
Semantic Identifiers
Maged El-Sayed, Elke A. Rundensteiner, and Murali Mani
Database Systems Research Lab, Computer Science Department
Worcester Polytechnic Institute, Worcester, MA 01609-2280, USA
2
Motivation
  • Views integrate data from different sources
  • Source data may be available at different times
  • View result to be computed incrementally
  • Incremental results need to be fused

3
Applications Needing Fusion
  • Incremental view maintenance
  • Stream monitoring
  • Data integration
  • Data warehousing
  • E-commerce
  • . . .

4
Motivation: Fusion for XML
  • Challenges in XML views
  • Nesting via FLWR expressions
  • Complex result re-structuring
  • Order handling
  • . . .

5
Running Example: XML Data
bib.xml
<bib>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author>
  </book>
  <book year="2000">
    <title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author>
  </book>
</bib>
prices.xml
<prices>
  <entry><price>39.95</price><b-title>Data on the Web</b-title></entry>
  <entry><price>65.95</price><b-title>TCP/IP Illustrated</b-title></entry>
  <entry><price>69.99</price><b-title>Advanced Programming in the Unix environment</b-title></entry>
</prices>
6
Example: XML View
<result>{
  for $y in distinct-values(doc("bib.xml")/bib/book/@year)
  order by $y
  return
    <yGroup Y="{$y}">
      <books>{
        for $b in doc("bib.xml")/bib/book,
            $e in doc("prices.xml")/prices/entry
        where $y = $b/@year and $b/title = $e/b-title
        return <entry>{$b/title} {$e/price}</entry>
      }</books>
    </yGroup>
}</result>
[Figure: the view result tree next to the two source trees. The result holds one yGroup per year: Y=1994 with books/entry (title "TCP/IP", price 65.95) and Y=2000 with books/entry (title "Data on..", price 39.95). The sources are bib.xml (bib/book with a year attribute, title, and author/last, author/first) and prices.xml (prices/entry with price and b-title).]
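To make the semantics of the example view concrete, here is a minimal Python sketch that computes the same grouped join over the two running-example documents with the standard library's xml.etree.ElementTree. It is only an illustration of what the view returns, not the evaluation strategy of the engine discussed in these slides; the function name build_view and the inline document strings are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book year="1994"><title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author></book>
  <book year="2000"><title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author></book>
</bib>"""

PRICES = """<prices>
  <entry><price>39.95</price><b-title>Data on the Web</b-title></entry>
  <entry><price>65.95</price><b-title>TCP/IP Illustrated</b-title></entry>
  <entry><price>69.99</price><b-title>Advanced Programming in the Unix environment</b-title></entry>
</prices>"""


def build_view(bib_xml, prices_xml):
    """Hypothetical helper: group books by year and join each with its price entry."""
    bib = ET.fromstring(bib_xml)
    prices = ET.fromstring(prices_xml)
    result = ET.Element("result")
    # distinct-values(...) followed by "order by $y"
    years = sorted({b.get("year") for b in bib.findall("book")})
    for y in years:
        group = ET.SubElement(result, "yGroup", {"Y": y})
        books = ET.SubElement(group, "books")
        for b in bib.findall("book"):
            if b.get("year") != y:
                continue
            for e in prices.findall("entry"):
                # join condition: $b/title = $e/b-title
                if b.findtext("title") == e.findtext("b-title"):
                    entry = ET.SubElement(books, "entry")
                    ET.SubElement(entry, "title").text = b.findtext("title")
                    ET.SubElement(entry, "price").text = e.findtext("price")
    return result


view = build_view(BIB, PRICES)
for g in view:
    pairs = [(e.findtext("title"), e.findtext("price")) for e in g.iter("entry")]
    print(g.get("Y"), pairs)
```

The third price entry has no matching book, so it does not appear in the result until a matching book arrives as an update.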
7
Example: Incremental Updates
What?
How?
8
Example: Alternatives for Fusion

[Figure: a newly computed fragment, result / yGroup / books / entry, with title "Advanced..." and price 69.99]
  • We need to decide, for each node:
  • Where to add
  • What to merge
  • Which order to impose


9
Outline
  • Semantic Identifier Generation
  • What are they?
  • Fusion of XML Results through Semantic Ids
  • How are they used?
  • Experimental Evaluation
  • Related Work
  • Conclusions

10
Overall Solution: Id-based Fusion
  • Goal: decide how to merge processed fragments into the
    XML result with respect to location and order.
  • Idea: assign semantic ids to nodes in XML
    results
  • Semantic ids must be reproducible:
  • when processing two source XML nodes that contribute
    to the same node in the XML result, the same id is
    generated
  • even when the source nodes are not equal in
    value or id, and
  • even when they are processed at different times.

11
Overall Solution: Id-based Fusion
  • Options
  • Syntactic Approach
  • Algebraic Approach

12
Background: XML Query Model
  • XQuery → XAT algebra tree [ZPR02]
  • XAT operators
  • XAT relational operators: Select, Join
  • XAT XML operators: Navigate Unnest, Navigate
    Collection, Tagger, Combine
  • XAT data model (XAT table)
  • Order-sensitive table of tuples
  • Columns represent user-specified or internally
    generated variable bindings
  • A cell in a tuple holds an XML node or a sequence of
    XML nodes

Example operator: Navigate $b, @year/text() → $col1
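As a rough illustration of the data model on this slide, the sketch below represents an XAT table as an ordered Python list of tuples (dicts keyed by column name) and implements a simplified Navigate-style operator that binds a new column by following a child path. The function name navigate_unnest and the dict-based table layout are assumptions made for this example, not the engine's actual data structures.

```python
import xml.etree.ElementTree as ET


def navigate_unnest(table, in_col, path, out_col):
    """Follow `path` from the node held in `in_col` of every tuple.

    Emits one output tuple per matching node and keeps the input order,
    which mirrors the order-sensitive table described above (simplified).
    """
    out = []
    for row in table:
        for node in row[in_col].findall(path):
            new_row = dict(row)      # copy the existing variable bindings
            new_row[out_col] = node  # bind the new column
            out.append(new_row)
    return out


bib = ET.fromstring(
    '<bib><book year="1994"><title>TCP/IP Illustrated</title></book>'
    '<book year="2000"><title>Data on the Web</title></book></bib>'
)

source = [{"S1": bib}]                                   # table with one tuple
books = navigate_unnest(source, "S1", "book", "b")       # binds $b
titles = navigate_unnest(books, "b", "title", "col2")    # binds $col2
print([row["col2"].text for row in titles])
```

Each call returns a new table, so operators can be chained bottom-up in the way an algebra tree is evaluated.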
13
Background: Base Node IDs
  • Fast Lexicographical Key [DR03]
  • Encodes:
  • Node hierarchy
  • Node order

[Figure: the trees of bib.xml, its update Δbib.xml, and prices.xml, with every node labeled by its lexicographical key. The bib root is b and the three book nodes are b.b, b.f, and b.l; under book b.b, title is b.b.b, author is b.b.f, and the author's two children (last, first) are b.b.f.b and b.b.f.f. The prices root is e and the three entry nodes are e.b, e.f, and e.l; under entry e.f, price is e.f.b and b-title is e.f.f.]
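The two properties listed for base node ids, hierarchy and order, can be sketched in a few lines of Python. The sketch assumes that keys are dot-separated strings such as b.f.b, that a parent's key is a prefix of its children's keys, and that sibling components are compared lexicographically; the helper names are hypothetical and the real key scheme of [DR03] may differ in detail.

```python
def key_parts(key):
    """Split a dotted key such as 'b.f.b' into its components."""
    return key.split(".")


def is_ancestor(anc, desc):
    """Hierarchy: a node is an ancestor if its key is a proper prefix."""
    a, d = key_parts(anc), key_parts(desc)
    return len(a) < len(d) and d[:len(a)] == a


def document_order(keys):
    """Order: component-wise lexicographic sort of the keys.

    Assumes sibling components were assigned in increasing order,
    so the sorted sequence is a pre-order walk of the tree.
    """
    return sorted(keys, key=key_parts)


keys = ["b.l", "b.f.b", "b", "b.b.f", "b.f", "b.b", "b.b.b"]
print(document_order(keys))
print(is_ancestor("b.b", "b.b.f"), is_ancestor("b.b", "b.f.b"))
```

Because both checks need only the key strings, no access to the document itself is required to compare two nodes.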
14
Background: XAT Algebra Tree
[Figure: XAT algebra tree for the example view, listed here from root to leaves; → marks an operator's output column]
  21  Expose $col8
  20  Tagger <result>$col7</result> → $col8
  19  Combine $col7
  18  Tagger <yGroup Y=$y>$col6</yGroup> → $col7
  17  OrderBy $y
  16  Tagger <books>$col5</books> → $col6
  15  GroupBy $y (Combine $col5)
  14  Tagger <entry>$col4</entry> → $col5
  13  XML Union $col2, $col3 → $col4
  12  Navigate $e, price → $col3
  11  Navigate $b, title → $col2
  10  Join $b/title = $e/b-title
   9  Navigate $S2, entry → $e
   7  LOJ $y = $col1
   6  Distinct($y)
  Nodes 1 to 5 and 8: Source bib.xml → $S1 (twice), Source prices.xml → $S2, Navigate $S1, book → $b, Navigate $b, @year/text() → $col1, Navigate $S1, book/@year/text() → $y
15
Semantic Identifier Generation
  • Phase One: Compute Context Schema
  • What: rules define how to compute node lineage
    and order
  • When: computed at algebra tree generation time
  • Where: defined at the schema level of the algebra tree
    (XAT table columns)
  • Note: no access to actual data is needed
  • Phase Two: Generate Semantic Ids
  • What: use the Context Schema to decide how to
    generate or manipulate semantic ids
  • When: performed at query execution time
  • Where: only some operators manipulate
    semantic ids of XML nodes (e.g., Tagger, XML
    Union)

16
Phase 1: Context Schema Computation
  • Define one context schema for each XAT column
    col_i
  • Composed of two lists: an order list and a lineage
    list
  • e.g., col_i.cnxtSchm = (ordCols) lngCols
  • The order list can be:
  • Null: the column has no order
  • Empty: order is reflected by the lineage list
  • Non-empty: 1 or more column names
    specifying order
  • The lineage list can be:
  • Empty: lineage of the column depends only on
    itself
  • Non-empty: 1 or more column names
    specifying lineage
  • Rules for computing the Context Schema are specific to
    each algebraic operator
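The three states of the order list and the two states of the lineage list can be captured in a small data structure. The following Python sketch is one possible encoding, assuming None for "null", an empty list for "empty", and a list of column names otherwise; the class and method names are hypothetical, and the per-operator rules that would populate such objects are not shown because the slides do not spell them out.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ContextSchema:
    # None  -> the column has no order
    # []    -> order is reflected by the lineage list
    # [...] -> names of the columns that specify order
    order_cols: Optional[List[str]] = None
    # []    -> lineage depends only on the column itself
    # [...] -> names of the columns that specify lineage
    lineage_cols: List[str] = field(default_factory=list)

    def order_kind(self) -> str:
        """Describe which of the three order cases applies."""
        if self.order_cols is None:
            return "no order"
        if not self.order_cols:
            return "order from lineage"
        return "order from " + ", ".join(self.order_cols)

    def lineage_sources(self, own_col: str) -> List[str]:
        """Columns whose ids or values determine this column's lineage."""
        return self.lineage_cols or [own_col]


# Illustrative values only; they are not taken from the slides' example.
cs = ContextSchema(order_cols=[], lineage_cols=["b", "e"])
print(cs.order_kind(), cs.lineage_sources("col5"))
```

Since such an object only names columns, it can be built when the algebra tree is generated, before any data is read.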

17
Context Schema Example
18
Context Schema Example
[Figure: XAT algebra tree, nodes 1 to 14 up to connector A (Source, Navigate, Distinct, LOJ, Join, XML Union, and the <entry> Tagger), annotated with context schemas]
19
Context Schema Example (cont.)
[Figure: XAT algebra tree, nodes 15 to 21 above connector A (GroupBy, <books> Tagger, OrderBy, <yGroup> Tagger, Combine, <result> Tagger, Expose), annotated with context schemas]
20
Phase 2: Semantic Identifier Generation
  • Based on the Context Schema, generate ids for nodes
    in the XML result.
  • Format: <order prefix> <body>
  • Order prefix (optional)
  • What: reflects local order, or no order ()
  • How: a composition of source node ids and/or
    values, or a constant
  • Body
  • What: reflects lineage (and possibly order)
  • How: a composition of source node ids, values,
    and/or a constant
  • Properties of semantic identifiers:
  • Reproducible
  • Compact
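A minimal Python sketch of the id format above: an optional order prefix followed by a body composed from source node ids or values. The function name semantic_id, the parenthesized prefix, and the ".." separator are assumptions loosely modeled on the ids shown in this deck's example figures; the exact textual encoding used by the authors is not recoverable from the transcript.

```python
def semantic_id(body_parts, order_prefix=None, sep=".."):
    """Compose an id from lineage parts and an optional order prefix.

    body_parts   -- source node ids and/or values that make up the lineage
    order_prefix -- value reflecting local order, or None for no prefix
    """
    body = sep.join(str(p) for p in body_parts)
    if order_prefix is None:
        return body
    return f"({order_prefix}){body}"


# Reproducibility: the id depends only on the contributing source ids,
# so two runs at different times give the same result.
first_run = semantic_id(["b.b", "e.f"])
later_run = semantic_id(["b.b", "e.f"])
print(first_run, first_run == later_run)
print(semantic_id(["b.b.b"], order_prefix="a"))
```

Because nothing but the source ids and constants enters the id, a fragment computed from an update receives the same id as the node it must be merged with.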

21
Semantic Identifiers Example
[Figure: XAT algebra tree, nodes 1 to 14 up to connector A, annotated with the semantic ids produced at each operator]
22
Semantic Identifiers Example (cont.)
[Figure: XAT algebra tree, nodes 15 to 21 above connector A, annotated with the semantic ids produced at each operator]
23
Semantic Ids Example (cont.)
[Figure: query result annotated with semantic ids]
  result: c
  yGroup Y=1994: 1994 c
  yGroup Y=2000: 2000c
  books (under Y=1994): 1994c
  books (under Y=2000): 2000c
  entry (under Y=1994): b.b..e.fc, with title (a)b.b.b "TCP/IP" and price (b)e.f.b 65.95
  entry (under Y=2000): b.f..e.bc, with title (a)b.f.b "Data .." and price (b)e.b.b 39.95
24
Fusing XML Results Through Semantic Ids
  • Deep Union operator
  • Unions two XML trees by matching their root nodes
    using semantic ids,
  • and recursively performs deep union on their
    respective lists of child nodes [BDT99]
  • Our XML views become distributive over Deep
    Union:
  • $V(S \cup \Delta S) = V(S) \uplus V(\Delta S)$,
    where $\uplus$ denotes Deep Union
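The recursive matching described on this slide can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the authors' operator: nodes are plain objects carrying a semantic id, children are matched by id, and sibling order is obtained by sorting the id strings. The Node class, the fragment helper, and the concrete id strings are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    sid: str                       # semantic id
    tag: str
    text: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def deep_union(a: Node, b: Node) -> Node:
    """Merge two trees whose roots carry the same semantic id."""
    if a.sid != b.sid:
        raise ValueError("roots must have matching semantic ids")
    merged = {c.sid: c for c in a.children}
    for c in b.children:
        # same id -> same result node: recurse; otherwise add the new child
        merged[c.sid] = deep_union(merged[c.sid], c) if c.sid in merged else c
    # Assumption: sorting the id strings reproduces the intended sibling order.
    kids = [merged[s] for s in sorted(merged)]
    return Node(a.sid, a.tag, a.text if a.text is not None else b.text, kids)


def fragment(year, sid, title, price):
    """Hypothetical helper: result/yGroup/books/entry for one joined pair."""
    entry = Node(sid, "entry", children=[
        Node("(a)" + sid, "title", title),
        Node("(b)" + sid, "price", price),
    ])
    books = Node(year + "c", "books", children=[entry])
    return Node("c", "result", children=[Node(year + " c", "yGroup", children=[books])])


old = fragment("1994", "b.b..e.f", "TCP/IP", "65.95")
new = fragment("1994", "b.l..e.l", "Advanced...", "69.99")
fused = deep_union(old, new)
entries = fused.children[0].children[0].children
print([(e.children[0].text, e.children[1].text) for e in entries])
```

Both fragments share the ids of result, yGroup, and books, so those nodes are merged rather than duplicated, and only the new entry is added. Applying the union to the same fragment twice leaves the tree unchanged.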

25
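Fusing XML Results Through Semantic Ids
  • For our running example ($\uplus$ is Deep Union):
  • $V(S_1 \cup \Delta S_1, S_1 \cup \Delta S_1, S_2) = V(S_1, S_1, S_2)
    \uplus V(\Delta S_1, S_1, S_2) \uplus V(S_1, \Delta S_1, S_2)
    \uplus V(\Delta S_1, \Delta S_1, S_2)$
  • Note that $V(S_1, \Delta S_1, S_2) \uplus V(\Delta S_1, \Delta S_1, S_2)
    = V(S_1', \Delta S_1, S_2)$, where $S_1' = S_1 \cup \Delta S_1$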

26
Example: Fusing Incremental Results
[Figure: three partial results annotated with semantic ids]
  $V(\Delta S_1, S_1, S_2)$: result (c) / yGroup Y=1994 (1994 c) / books (1994c) / entry (b.b..e.fc), with title (a)b.b.b "TCP/IP" and price (b)e.f.b 65.95
  $V(S_1, \Delta S_1, S_2)$: result (c) / yGroup Y=1994 (1994 c) / books (1994c) / entry (b.l..e.lc), with title (a)b.l.b "Advanced..." and price (b)e.l.b 69.99
  $V(S_1, S_1, S_2)$: result (c) with two groups:
    yGroup Y=1994 (1994 c) / books (1994c) / entry (b.b..e.fc), with title (a)b.b.b "TCP/IP" and price (b)e.f.b 65.95
    yGroup Y=2000 (2000c) / books (2000c) / entry (b.f..e.bc), with title (a)b.f.b "Data .." and price (b)e.b.b 39.95
27
Experimental Evaluation
  • Solution: implemented within the Rainbow Java XML
    query engine [Zetal03]
  • Data: XMark benchmark [SEBC02]
  • Queries: vary in their id generation

28
Experimental Results: Query 1
<result>
  <customers>{
    for $p in doc("site.xml")/people/person
    where $p/id/text() < 63750
    return
      <customer>
        <location>{$p/address/city/text()}</location>
        {$p/name}
      </customer>
  }</customers>
  <open_bids>{
    for $oa in doc("site.xml")/open_auctions/open_auction
    where $oa/id/text() < 30000
    return <bid>{$oa/reserve} {$oa/initial}</bid>
  }</open_bids>
</result>
29
Experimental Results: Query 1 (cont.)
30
Experimental Results: Query 2
<result>{
  for $p in doc("site.xml")/people/person
  where $p/id/text() < 63750
  return $p/name
}</result>
31
Related Work
  • View Maintenance
  • Materialization of auxiliary data
    [AMR98, AFP03, ZG98]
  • (Auxiliary data requires maintenance; no order
    support)
  • Reproducible ids [LD2000]
  • (Complex ids with nested structure; limits the
    set of maintainable views; no order
    support)
  • Skolem functions [PAG96, BCF04]
  • (No order support)
  • XML Stream Processing
  • Structure encoding [IHW02]
  • (Limited queries; no support for correlated
    nested queries)
  • Special ids [FLBC02]
  • (Predefined decomposition of the streamed
    document and id assignment)

32
Conclusions
  • Phase One Compute Context Schema
  • Phase Two Generate Semantic ids

33
Rainbow XQuery Engine website: http://davis.wpi.edu/dsrg/rainbow/index.html
Software download: http://davis.wpi.edu/dsrg/rainbow/RainbowCore/release.htm
Maged can be contacted at maged@cs.wpi.edu
34
References
  • [ZPR02] X. Zhang, B. Pielech, and E. A.
    Rundensteiner. Honey, I Shrunk the XQuery! An
    XML Algebra Optimization Approach. In WIDM, pages
    15-22, Nov. 2002.
  • [DR03] K. Deschler and E. Rundensteiner. MASS: A
    multi-axis storage structure for large XML
    documents. In CIKM, pages 520-523, Nov. 2003.
  • [BDT99] P. Buneman, A. Deutsch, and W. C. Tan. A
    deterministic model for semi-structured data. In
    Workshop on Query Processing for Semistructured
    Data and Non-Standard Data Formats, Jan. 1999.
  • [AMR98] S. Abiteboul et al. Incremental
    Maintenance for Materialized Views over
    Semistructured Data. In VLDB, pages 38-49, 1998.
  • [AFP03] M. A. Ali, A. Fernandes, and N. W.
    Paton. MOVIE: An incremental maintenance system
    for materialized object views. DKE Journal,
    47(2):131-166, 2003.
  • [ZG98] Y. Zhuge and H. Garcia-Molina. Graph
    Structured Views and Their Incremental
    Maintenance. In ICDE, pages 116-125, 1998.
  • [LD2000] H. Liefke and S. B. Davidson. View
    maintenance for hierarchical semistructured data.
    In DWKD, pages 114-125, 2000.
  • [PAG96] Y. Papakonstantinou et al. Object
    fusion in mediator systems. In VLDB, pages
    413-424, 1996.
  • [BCF04] P. Bohannon, B. Choi, and W. Fan.
    Incremental evaluation of schema-directed XML
    publishing. In SIGMOD, pages 503-514, 2004.
  • [IHW02] Z. G. Ives et al. An XML query engine
    for network-bound data. The VLDB Journal,
    11(4):402-402, December 2002.
  • [FLBC02] L. Fegaras et al. Query processing
    of streamed XML data. In CIKM, pages 126-133,
    2002.