Title: Beyond%20the%20Rainbow:%20
1Beyond the Rainbow A Pot of Gold ala XML
Database Projects
2Motivation
- XML is new, and here to stay
- Universal flexible representation of data
- De facto standard for information exchange
- XQuery is useful, and here to stay
- Powerful query language for XML
- De facto standard for XML querying
- Plentitude of relevant new issues
3XML Paradigm
EVE-Middleware
Internet
WWW global scale distributed information
system for sharing data
XML1
XML5
RDBL6
XML2
XML3
RDB4
XMLn
4 What We Aim For
- efficient
- flexible
- scalable
- lightweight
- resource-sensitive
- adaptive
Internet
EVE-Middleware
RDB5
XML1
XML6
XML2
RDB3
XML4
XMLn
5 WPI Project Directions
- RAINBOW Exploiting RDB for XML management
- Algebraic-XQuery processing
- XCube Flexible XML Mapping Tool
- Flexible loading/extracting XML to RDB via
XQuery - Updating Virtual XML Views
- Update decomposition and trigger-propagation
- MASS Native XML Query Engine
- Multi-axis compressed order-preserving XML
storage -
6 WPI Project Directions
- XCache XML Query Caching
- Cache containment and query rewriting
- Materialized XML View Maintenance
- Incremental algebraic maintenance strategy
- SAXE XML Incremental Updating Evolution
- Lightweight updating by update query rewriting
- RAINDROP XQuery-based Stream Processing
- Adaptive on-fly multi-subscription optimization
7 8XML meets Relational DBs
Rainbow
9Running Example
ltresultgt FOR t IN document(prices.xml)/book
/title RETURN t lt/resultgt
ltpricesgt ltbookgt lttitlegtTCP/IP
Illustratedlt/titlegt ltpricegt65.95lt/pricegt lt/book
gt ltbookgt lttitlegtData on the Weblt/titlegt ltpric
egt34.95lt/pricegt lt/bookgt lt/pricesgt
10XML Default View
XML Default View
- Fixed and straight-forward mapping scheme.
ltDBgt ltBOOKSgt ltROWgt
ltCovergtPaperbacklt/Covergt ltTITLEgtTexas
Holdem'lt/TITLEgt ltAUTHORSgtDavid Sklansky,
Straight Flushlt/AUTHORSgt lt/ROWgt
ltROWgt ltCovergtPaperbacklt/Covergt
ltTITLEgtDraculalt/TITLEgt ltAUTHORSgtBram
Stokerlt/AUTHORSgt lt/ROWgt lt/BOOKSgt ltgt
lt/DBgt
11Generic Loading
- Knowledge of schema of XML document to be loaded
helps to reduce unnecessary parts.
FUNCTION Q1(root) LET maintag
gettag(root) RETURN ltmaintag root/_at_gt FOR
actual IN root/ LET innertag
gettag(actual) RETURN IF (actual/element())
THEN Q1(actual) ELSE ltinnertag
actual/_at_gt IF (actual/text()) THEN
ltPCDATA valueactual/text()/gt ELSE ""
lt/innertaggt lt/maintaggt
12Instantiation
- Generic loading XQuery expression recursive.
- It works for every XML document.
- Many recursive calls return no value.-
Unnecessary FOR-loops, IF-clauses, and
getName()-fct.
XML Schema
XML
Schema
XQuery Expression (flat)
XQuery Expression
Instantiator
XQuery Expression (recursive)
XQuery Expression
13Instantiation (Example)
(First Step of CLOCK mapping scheme)
Instantiated Loading Query
FUNCTION Q1(root) ltBOOKLISTgt FOR book IN
root/BOOK RETURN ltBOOK book/_at_covergt ltTITLE
gt ltPCDATA valuebook/TITLE/text()/gt lt/TITL
Egt ltAUTHORgt FOR name IN
book/AUTHOR/NAME RETURN ltNAMEgt ltPC
DATA valuebook/A/gt lt/NAMEgt lt/AUTHORgt
lt/BOOKgt lt/BOOKLISTgt
- Short, non-recursive, more efficient
- But XML schema dependent!
14Flexible Mapping Management
15XCube in a Nutshell
- Generic XQuery loading expressions
- XQuery load expression instantiation
- Easy-to-use (no new transformation language).
- Flexible (interchangeable XQuery expressions).
- Adaptable (to workload, data specifics, ).
- General (Schema independent).
- Extendable (with new mapping schemes).
- Tunable (Loading manager).
16 Architecture
Rainbow
XAT Merger
XAT Generator
XAT Decorrelator
User Query Results in XML
User XAT
User XAT
User XQuery
XAT Executor
XAT Optimizer
SQL Generator
XAT
XAT
XAT
Tuples
XAT
View XAT
View XAT
View XQuery
SQL
RDBMS
XAT XML Algebra Tree
17XQuery-Level Optimization
- XAT - XML Algebra Tree Model
- XAT Algebraic Query Plan Optimization
- XAT Query Plan Reduction
18User XML Algebra Tree (XAT)
User Query
?col3
1
ltresultgt FOR t IN document(prices.xml)/book
/title RETURN t lt/resultgt
2
Tltresultsgttlt/resultgtcol3
Agg
3
?R0, book/titlet
6
Sprices.xmlR0
7
19View XML Algebra Tree (XAT)
View Query
ltpricesgt FOR book IN document(dxv.xml)/book/ro
w prices IN document(dxv.xml)/prices/row WHE
RE book/bid prices/bid RETURN ltbookgt bo
ok/title, prices/price lt/bookgt lt/pricesgt
?col6col7
26
?book, bidcol6
27
?prices, bidcol7
28
?
31
Tltbookgt col10col12 lt/bookgtcol5
22
23
Tltpricesgtcol5lt/pricesgtcol4
?book, titlecol10
11
?R3, /prices/rowprices
?R1, /book/rowbook
20
14
Agg
?prices, pricecol12
12
25
15
Sdxv.xml R3
Sdxv.xml R1
21
20Merged XML Algebra Tree (XAT)
User Query
View Query
?col6col7
?col3
26
1
2
Tltpricesgtcol5lt/pricesgtcol4
?book, bidcol6
Tltresultsgttlt/resultgtcol3
11
27
Agg
3
Agg
?prices, bidcol7
12
28
?R0, book/titlet
Tltbookgt col10col12 lt/bookgtcol5
6
?
22
31
?col4 R0
23
7
?book, titlecol10
?R3, /prices/rowprices
?R1, /book/rowbook
20
14
?prices, pricecol12
25
15
Sdxv.xml R3
Sdxv.xml R1
21
21XQuery-Level Optimization
- XML Algebra Representation XAT
- XAT Query Plan Rewriting
- XAT Query Plan Reduction
22XAT Rewrite
- Query Optimization at Logic Algebra Level.
- Goals
- Redundancy Elimination.
- Computation Pushdown.
- Technique
- Equivalence Rewrite Rules.
- Heuristics
- Pushdown Navigates
- Remove Construction of Intermediate Result
- Combine Multiple Operators.
23Before Navigation Pushdown
User Query
View Query
?col6col7
?col3
26
1
?book, bidcol6
2
Tltresultsgttlt/resultgtcol3
27
Agg
3
?prices, bidcol7
28
Tltbookgt col10col12 lt/bookgtcol5
?R0, book/titlet
?
6
22
31
?col4 R0
23
7
?book, titlecol10
?R3, /prices/rowprices
?R1, /book/rowbook
20
Tltpricesgtcol5lt/pricesgtcol4
11
14
?prices, pricecol12
25
15
Agg
Sdxv.xml R3
Sdxv.xml R1
12
21
24After Navigation Pushdown
View Query
User Query
Tltbookgt col10col12 lt/bookgtcol5
22
?col3
1
?col6col7
26
2
Tltresultsgttlt/resultgtcol3
?
31
Agg
3
?prices, pricecol12
?book, titlecol10
23
25
?R0, book/titlet
6
?prices, bidcol7
?book, bidcol6
28
27
Tltpricesgtcol5lt/pricesgtR0
11
Agg
12
25Remove any Taggers?
View Query
User Query
Tltbookgt col10col12 lt/bookgtcol5
22
?col3
1
?col6col7
26
2
Tltresultsgttlt/resultgtcol3
?
31
Agg
3
?prices, pricecol12
?book, titlecol10
23
25
?R0, book/titlet
6
?prices, bidcol7
?book, bidcol6
28
27
Tltpricesgtcol5lt/pricesgtR0
11
Agg
12
26After Tagger Cancel Out
User Query
View Query
?col6col7
26
?
31
?book, titlet
23
27After Making Join
View Query
User Query
JOIN col6col7
31
?prices, pricecol12
25
?prices, bidcol7
28
?R3, /prices/rowprices
20
Sdxv.xml R3
21
28XQuery-Level Optimization
- XML Algebra Representation XAT
- XAT Query Plan Rewriting
- XAT Query Plan Reduction
29XAT Cleanup
- Why
- SQL engine cannot reduce redundancy in XQuery.
- How
- Data Redundancy by Schema Cleanup
- Each operator produced, consumed and modified
some columns. - Minimum schema is then computed.
- Tree Redundancy by Unused Operator Cutting
- Cutting matrix generation.
- Required columns analysis.
- Operator cutting.
30XAT Operator Properties
- Produced
- Desc New column generated by operator.
- Example ?, S, T
- Consumed
- Desc Columns required by operator.
- Example ?, ?
- Modified
- Desc Columns modified by operator.
- Example ?, ?, ?
31Schema Computation
Old Schema
Consumed
Produced
Parent
Node
col3
col3
1
1
?col3
col3, R1, book, col6, t, R3, prices, col7,
col12
t
col3
1
2
Tltresultsgttlt/resultgtcol3
2
R1, book, col6, t, R3, prices, col7, col12
2
3
Agg
3
R1, book, col6, t, R3, prices, col7, col12
col6, col7
3
31
? col6col7
31
R1, book, col6, t
book
t
31
23
?prices, pricecol12
25
?book, titlet
23
R1, book, col6
book
col6
23
27
R1, book
R1
book
27
14
28
?prices, bidcol7
?book, bidcol6
27
R1
R1
14
15
20
R3, prices, col7, col12
prices
col12
31
25
?R3, /prices/rowprices
?R1, /book/rowbook
14
R3, prices, col7
prices
col7
25
28
Sdxv.xml R3
21
R3, prices
R3
prices
28
20
Sdxv.xml R1
15
R3
R3
20
21
32Schema Computation
Node Parent Produced Consumed Minimum Schema
1 col3 col3
2 1 col3 t col3
3 2 t
31 3 col6, col7 t
23 31 t book col6, t
27 23 col6 book book, col6
14 27 book R1 book
15 14 R1 R1
25 31 col12 prices col7, col12
28 25 col7 prices prices, col7
20 28 prices R3 prices
21 20 R3 R3
?col3
1
Tltresultsgttlt/resultgtcol3
2
Agg
3
? col6col7
31
?prices, pricecol12
25
?book, titlet
23
?prices, bidcol7
28
?book, bidcol6
27
?R3, /prices/rowprices
20
?R1, /book/rowbook
14
Sdxv.xml R3
21
Sdxv.xml R1
15
33Schema Computation
Intuition Dont keep anything thats not used
later.
New Schema
R3
prices
col12
R1
book
col7
col6
t
col3
Parent()
1
col3
C
1
?col3
col3
C
P
1
2
Tltresultsgttlt/resultgtcol3
2
t
2
3
t
C
C
3
31
Agg
3
col6, t
C
P
31
23
book, col6
C
P
23
27
? col6col7
31
book
C
P
27
14
?prices, pricecol12
25
?book, titlet
23
R1
P
14
15
col7, col12
C
P
31
25
28
?prices, bidcol7
?book, bidcol6
27
prices, col7
C
P
25
28
20
?R3, /prices/rowprices
?R1, /book/rowbook
prices
C
P
28
20
14
R3
P
20
21
Sdxv.xml R3
21
Sdxv.xml R1
We assume Join didnt modify t. Otherwise, only
node 25 will be deleted.
15
34Schema Cleanup Result
Node Original Schema Minimum Schema
1 col3, R1, book, col6, t, R3, prices, col7, col12 col3
2 col3, R1, book, col6, t, R3, prices, col7, col12 col3
3 R1, book, col6, t, R3, prices, col7, col12 t
31 R1, book, col6, t, R3, prices, col7, col12 t
23 R1, book, col6, t col6, t
27 R1, book, col6 book, col6
14 R1, book book
15 R1 R1
25 R3, prices, col7, col12 col7, col12
28 R3, prices, col7 prices, col7
20 R3, prices prices
21 R3 R3
35XAT Cleanup
- Schema Cleanup
- Each operator produced, consumed and modified
some columns. - Minimum schema is then computed.
- Unused Operator Cutting
- Cutting matrix generation.
- Required columns analysis.
- Operator cutting.
36Cutting Matrix
- Purpose
- Get rid of unused operators.
- Equations
- Propagation of modified
- Propagation of required
- Identify cuttable node.
37Matrix Computation
1
?col3
Parent() col3 t col6 col7 book R1 col12 prices R3 Cut?
1 C
2 1 P C
3 2 - - - - - - - - -
31 3 C C
23 31 P C
27 23 P C
14 27 P C
15 14 P
25 31 P C
28 25 P C
20 28 P C
21 20 P
Tltresultsgttlt/resultgtcol3
2
Agg
3
JOIN col6col7
31
?prices, pricecol12
25
?book, titlet
23
28
?prices, bidcol7
?book, bidcol6
27
20
?R3, /prices/rowprices
?R1, /book/rowbook
We assume Join didnt modify t. Otherwise, only
node 25 will be deleted.
14
Sdxv.xml R3
21
Sdxv.xml R1
15
38Matrix Computation (Cont.1)
Intuition Give me only the required columns in
order to get the final result.
1
?col3
Cut?
R3
prices
col12
R1
book
col7
col6
t
col3
Parent()
Tltresultsgttlt/resultgtcol3
2
R
R
R
R
1
C
P
1
2
-
-
-
-
-
-
-
M
-
2
3
Agg
3
C
C
3
31
C
P
31
23
JOIN col6col7
31
C
P
23
27
C
P
27
14
?prices, pricecol12
25
?book, titlet
23
P
14
15
C
P
31
25
28
?prices, bidcol7
C
P
25
28
?book, bidcol6
27
C
P
28
20
20
P
20
21
?R3, /prices/rowprices
?R1, /book/rowbook
We assume Join didnt modify t. Otherwise, only
node 25 will be deleted.
14
Sdxv.xml R3
21
Sdxv.xml R1
15
39Matrix Computation (Cont. 2)
1
?col3
Parent() col3 t col6 col7 book R1 col12 prices R3 Cut?
1 R R R R
2 1 P C
3 2 - M - - - - - - -
31 3 C C X
23 31 P C
27 23 P C X
14 27 P C
15 14 P
25 31 P C X
28 25 P C X
20 28 P C X
21 20 P X
Tltresultsgttlt/resultgtcol3
2
Agg
3
JOIN col6col7
31
?prices, pricecol12
25
?book, titlet
23
28
?prices, bidcol7
?book, bidcol6
27
20
?R3, /prices/rowprices
?R1, /book/rowbook
We assume Join didnt modify t. Otherwise, only
node 25 will be deleted.
14
Sdxv.xml R3
21
Sdxv.xml R1
15
40XAT after Cutting
1
?col3
?col3
1
Tltresultsgttlt/resultgtcol3
2
Tltresultsgttlt/resultgtcol3
Agg
3
2
Agg
JOIN col6col7
31
3
Reduced To
?prices, pricecol12
25
?book, titlet
?book, titlet
23
23
?R1, /book/rowbook
28
?prices, bidcol7
14
?book, bidcol6
27
20
Sdxv.xml R1
15
?R3, /prices/rowprices
?R1, /book/rowbook
14
Sdxv.xml R3
21
Sdxv.xml R1
15
41SQL Generated
SELECT book.title as t, book.bid as
col6, prices.price as col12, prices.bi
d as col7 FROM book book, prices
prices WHERE col6col7
1
?col3
?col3
1
Tltresultsgttlt/resultgtcol3
2
Tltresultsgttlt/resultgtcol3
2
Agg
3
Agg
3
31
JOIN col6col7
?book, titlet
23
?prices, pricecol12
25
23
?book, titlet
?R1, /book/rowbook
14
28
?prices, bidcol7
27
?book, bidcol6
Sdxv.xml R1
15
20
?R3, /prices/rowprices
?R1, /book/rowbook
14
SELECT book.title as t, FROM book book,
Sdxv.xml R3
21
15
Sdxv.xml R1
42XQuery-Level Optimization
- XML Algebra Representation XAT
- XAT Query Plan Rewriting
- XAT Query Plan Reduction
43Performance Gain in Execution
44Rainbow Engine Overhead
XAT Optimizer
XAT Merger
XAT Generator
XAT Decorrelator
User XAT
User XAT
XAT Rewrite
User XQuery
XAT
XAT
XAT
XAT Executor
SQL Generator
XAT
View XAT
View XAT
XAT Cleanup
View XQuery
Total 32,522 ms
Ack. XQuery using Kweelt Parser
45Rainbow
- http//davis.wpi.edu/dsrg/rainbow
- https//sourceforge.net/projects/rainbow-engine/
46Related Work
- XPERANTOVLDBJ2000 XQGM vs. XAT
- Xquery Views over RDB, Extension by UDFs for XML
features - SilkRouteIEEE2001(242)
- Xquery Views over RDB, Generate SQL Efficiently
- AGORAVLDB2000
- Syntax level rewriting.
47Summary
- Efficient XQuery Processing
- XML Algebra Tree (XAT)
- XAT Optimization
- Rewrite by using equivalent rules
- Cleanup
- Schema cleanup
- Operator cutting
- Prototype system implementation.