Title: Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams
1 Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams
- Hong Su, Elke Rundensteiner, Murali Mani, Ming Li
- Worcester Polytechnic Institute
- Worcester, MA
- VLDB 2004
2 Stream Processing
data sources
Networks
data requesters
3 What's Special for XML Stream Processing
Token-by-Token access manner
<auctions>
<auction>
<seller>
<primary>
<phone>
timeline
Pattern Retrieval on Token Streams
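To make the token-by-token access manner concrete, here is a minimal sketch in Python. It uses the standard library's `xml.etree.ElementTree.iterparse` as a stand-in tokenizer; the function name `token_stream` and the `(kind, value)` token shape are assumptions made for illustration, not part of Raindrop.

```python
import io
import xml.etree.ElementTree as ET

def token_stream(xml_text):
    """Yield (kind, value) tokens in document order (hypothetical token shape)."""
    source = io.StringIO(xml_text)
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            yield ("start", elem.tag)
        else:
            # Text is reported only for leaf elements, which is enough for this example.
            text = (elem.text or "").strip()
            if len(elem) == 0 and text:
                yield ("text", text)
            yield ("end", elem.tag)

doc = ("<auctions><auction><seller><primary><phone>508</phone>"
       "</primary></seller></auction></auctions>")
for token in token_stream(doc):
    print(token)
```

A consumer sees one token at a time and never the whole tree, which is why pattern retrieval has to be done over the token sequence itself.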
4 Two Computation Paradigms
- Automata-based: [YFilter, XScan, XSM, XSQ, XPush]
- Algebraic: [niagara00], ...
FOR $a in stream(bids)//auction, $b in $a/seller[homepage], $c in $a/bidder[sameAddr]
WHERE $b//phone = 508
RETURN <auction> $b, $c </auction>
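[Figure: the query expressed as an algebra plan and as an automaton]
Algebra (top to bottom):
Tagger
Navigate $a, /bidder -> $c
Navigate $a, /seller -> $b
Navigate stream(bids), //auction -> $a
Automata: states 1 to 9, with transitions labeled auction, seller, homepage, phone, bid, bidder, sameAddr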
5 Comparison of Two Paradigms
Automata Paradigm | Algebra Paradigm
Good for pattern retrieval on tokens | Does not support token inputs
Needs patches for filtering and restructuring | Good for filtering and restructuring
Presents all details on the same low level | Supports multiple descriptive levels (e.g., logical plan, physical plan)
Little studied as a query processing paradigm | Well studied as a query processing paradigm
Each paradigm has deficiencies; the two paradigms complement each other.
6 Four-Level Algebraic Framework
- The Raindrop framework intends to integrate both paradigms into one
Abstraction level, from high (declarative) to low (procedural):
- Level I, Semantics-Focused Plan: express the semantics of the query regardless of input sources
- Level II, Stream Logical Plan: accommodate tokenized streams and automata computation
- Level III: describe implementation details of operators
- Level IV: decide how an operator is invoked (scheduling)
7 Level I: Semantics-Focused Plan
- Express query semantics regardless of stored or stream input sources [Rainbow-ZPR02]
- Reuse existing general optimization techniques
  - Decorrelation
  - Cancel duplicate navigation operators
8 Example Semantics-Focused Plan
Stream Data
<auctions> <auction>
<seller> <primary><phone>508</phone></primary>
<secondary><phone>613</phone></secondary>
</seller> <bid><bidder></bidder><bidder></bidder></bid>
</auction>
Query
FOR $a in stream(bids)//auction, $b in $a/seller[homepage], $c in $a/bidder[sameAddr]
WHERE $b//phone = 508
RETURN <auction> $b, $c </auction>
Plan and Input/Output Data
source <auctions> </auctions>
$a <auction> </auction>
$b <seller> </seller>
$c <bidder> </bidder>
NavUnnest $a, /bid/bidder -> $c
<auctions> </auctions>
<auction> ... </auction>
NavUnnest $a, /seller -> $b
NavUnnest stream(bids), //auction -> $a
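As an illustration of how the three NavUnnest operators chain, here is a small Python sketch that evaluates them over a fully parsed document, that is, ignoring where the input comes from, as Level I intends. The helper `nav_unnest`, the dictionary-based tuple format, and the closing `</auctions>` tag added to the sample data are assumptions for the example; predicates and the WHERE clause are left out.

```python
import xml.etree.ElementTree as ET

DOC = """<auctions><auction>
<seller><primary><phone>508</phone></primary>
<secondary><phone>613</phone></secondary></seller>
<bid><bidder></bidder><bidder></bidder></bid>
</auction></auctions>"""

def nav_unnest(tuples, src, path, dst):
    """For each input tuple, bind dst to every node reached from tuple[src] via path."""
    for t in tuples:
        for node in t[src].findall(path):
            yield {**t, dst: node}

root = ET.fromstring(DOC)
plan = [{"source": root}]
plan = nav_unnest(plan, "source", ".//auction", "a")   # NavUnnest stream(bids), //auction -> $a
plan = nav_unnest(plan, "a", "seller", "b")            # NavUnnest $a, /seller -> $b
plan = nav_unnest(plan, "a", "bid/bidder", "c")        # NavUnnest $a, /bid/bidder -> $c
result = list(plan)
print([(t["b"].tag, t["c"].tag) for t in result])
```

Each operator consumes tuples and produces tuples, so the sample data with one seller and two bidders yields one output tuple per (seller, bidder) combination.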
9 Level II: Stream Logical Plan
- Extend semantics-focused plan to accommodate tokenized stream inputs
- New input data format
  - Tokens
- New operators
  - StreamSource, TokenNavigate, ExtractUnnest, ExtractNest, StructuralJoin
- New rewrite rules
  - Push-into/Pull-out-of Automata
10 One Uniform Algebraic View
[Figure: Algebraic Stream Logical Plan, shown from output down to input]
Query answer
Tuple-based plan
Tuple stream
Token-based plan (automata plan)
XML data stream
11 Modeling Automata in Algebraic Plan: Black Box [XScan01] vs. White Box
FOR $a in stream(bids)//auction, $b in $a/seller[homepage], $c in $a/bid/bidder[sameAddr]
WHERE $b//phone = 508
RETURN <auction> $b, $c </auction>
White Box:
StructuralJoin $a
ExtractUnnest $a, $b
ExtractUnnest $a, $c
TokenNavigate $a, /bid/bidder -> $c
TokenNavigate $a, /seller -> $b
TokenNavigate stream(bids), //auction -> $a
Black Box:
XScan
$a = stream(bids)//auction, $b = $a/seller, $c = $a/bid/bidder
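The white-box operators can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not Raindrop's implementation: a stack of open tags stands in for the automaton, `xml.etree.ElementTree.iterparse` supplies both the tokens and the assembled elements, nested auctions are ignored, and the function name `white_box_plan` is hypothetical. Predicates are omitted.

```python
import io
import xml.etree.ElementTree as ET

def white_box_plan(xml_text):
    """Yield ($b, $c) pairs per auction, driven by start/end tokens."""
    stack = []             # tags of currently open elements (automaton state)
    auction_depth = None   # stack depth of the auction currently being matched
    sellers, bidders = [], []
    for event, elem in ET.iterparse(io.StringIO(xml_text), events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
            if elem.tag == "auction" and auction_depth is None:
                auction_depth = len(stack)     # TokenNavigate //auction -> $a
                sellers, bidders = [], []
        else:
            rel = stack[auction_depth:] if auction_depth is not None else None
            if rel == ["seller"]:              # TokenNavigate $a, /seller -> $b
                sellers.append(elem)           # ExtractUnnest $a, $b
            elif rel == ["bid", "bidder"]:     # TokenNavigate $a, /bid/bidder -> $c
                bidders.append(elem)           # ExtractUnnest $a, $c
            elif rel == []:                    # </auction>: StructuralJoin $a
                for b in sellers:
                    for c in bidders:
                        yield (b, c)
                auction_depth = None
            stack.pop()

doc = ("<auctions><auction><seller><primary><phone>508</phone></primary></seller>"
       "<bid><bidder/><bidder/></bid></auction></auctions>")
print([(b.tag, c.tag) for b, c in white_box_plan(doc)])
```

In a black-box design the same bindings would come out of a single opaque operator; exposing each step as its own operator is what lets rewrite rules move work into or out of the automaton.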
12 Data Model in Algebraic Plan Modeling Automata
[Figure: the white-box plan annotated with sample data on its edges. Operators, top to bottom:]
StructuralJoin $a
ExtractUnnest $a, $b
ExtractUnnest $a, $c
TokenNavigate $a, /bid/bidder -> $c
TokenNavigate $a, /seller -> $b
TokenNavigate stream(bids), //auction -> $a
StreamSource
[Data shown: tokens such as <auctions> <auction> <seller> <primary> <phone> 508 </phone> </primary> ... and <bidder> <bidderid> 0314 ... on the token-based edges; elements <seller>...</seller> and <bidder>...</bidder> on the edges around StructuralJoin.]
13
- For details of Levels III and IV, please refer to:
  - Automaton Meets Query Algebra: Towards a Unified Model for XQuery Evaluation over XML Data Streams, ER 2003
  - Raindrop: A Uniform and Layered Algebraic Framework for XQueries on XML Streams, CIKM 2003
  - Raindrop: A Uniform and Layered Algebraic Framework for XQueries on XML Streams, Journal Submission 2004
14 Optimization I: Computation Into or Out of Automata?
Plan before rewriting:
NavigateUnnest $a, /bid/bidder -> $c
NavigateUnnest $a, /seller -> $b
NavUnnest stream(bids), //auction -> $a
Out of Automata:
NavigateUnnest $a, /bid/bidder -> $c
NavigateUnnest $a, /seller -> $b
Automata Plan:
ExtractUnnest stream(bids), $a
TokenNavigate stream(bids), //auction -> $a
Into Automata:
StructuralJoin $a
ExtractUnnest $a, $b
ExtractUnnest $a, $c
Automata Plan:
TokenNavigate $a, /seller -> $b
TokenNavigate $a, /bid/bidder -> $c
TokenNavigate stream(bids), //auction -> $a
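For contrast, the "out of automata" alternative can be sketched like this: only the //auction pattern is matched on tokens, the whole auction element is extracted, and the remaining navigation runs on the extracted tree. As before, this is an illustrative Python sketch with assumed names (`out_of_automata`), using `iterparse` as a stand-in for the token source and ignoring nested auctions and predicates.

```python
import io
import xml.etree.ElementTree as ET

def out_of_automata(xml_text):
    """Match only //auction on tokens; navigate inside each extracted auction."""
    depth = 0
    open_at = None   # depth at which the current auction was opened
    for event, elem in ET.iterparse(io.StringIO(xml_text), events=("start", "end")):
        if event == "start":
            depth += 1
            if elem.tag == "auction" and open_at is None:
                open_at = depth                      # TokenNavigate //auction -> $a
        else:
            if open_at == depth:                     # ExtractUnnest stream(bids), $a
                for b in elem.findall("seller"):             # NavigateUnnest $a, /seller -> $b
                    for c in elem.findall("bid/bidder"):     # NavigateUnnest $a, /bid/bidder -> $c
                        yield (b, c)
                open_at = None
            depth -= 1

doc = ("<auctions><auction><seller><primary><phone>508</phone></primary></seller>"
       "<bid><bidder/><bidder/></bid></auction></auctions>")
print([(b.tag, c.tag) for b, c in out_of_automata(doc)])
```

Here no StructuralJoin is needed, because $b and $c are found inside an already assembled $a; the price is that the whole auction is buffered before any navigation inside it starts.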
15 Experimentation Results
16 Optimization II: Semantic Query Optimization
- General schema-based optimizations
  - Eliminate predicate/join, ...
  - Focus on operators manipulating flat values
- XML-specific schema-based optimizations
  - Focus on pattern retrieval
  - Fall into two categories
    - General XML SQO
      - Minimize query tree [YCL-ATT 01]
    - Stream XML SQO (our focus)
17 Stream-Specific XML SQO
- Observations
  - Pattern retrieval over tokens relies solely on document-order traversal
  - Schema constraints help expedite document-order traversal
- State of the art
  - [XPush03] covers limited queries (boolean XPath match) and one type of constraint
- Our goals
  - Support more powerful queries (XQuery)
  - Support more types of constraints (XML Schema)
18 Step I: Construct Query Graph
FOR $a in stream(bids)//auction, $b in $a/seller[homepage], $c in $a/bid/bidder[sameAddr]
WHERE $b//phone = 508
RETURN <auction> $b, $c </auction>
(a) Example Query
(b) Query Tree
19 Example XML Schema
20 Step II: Apply Optimization Rules
- Offer optimization rules utilizing
  - occurrence constraints
  - exclusive constraints
  - order constraints
- Apply rules in an order ensuring
  - no beneficial rule is missed
  - no redundant rule is introduced
21 Step III: Translate Rewritten Query Graph Back to Plan (I)
Utilize Occurrence Constraints
When </phone> is encountered twice, check //phone; if it fails the predicate, suspend states s2 and s3.
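The occurrence-constraint rule can be illustrated with a small Python sketch. It assumes a hypothetical schema in which a seller has at most two phone elements (`MAX_PHONES`), represents one auction as a list of `(kind, value)` tokens, and models "suspend states" simply as stopping the scan early; none of these names come from Raindrop.

```python
MAX_PHONES = 2  # assumed occurrence constraint taken from a hypothetical schema

def match_auction(tokens, wanted="508"):
    """Return (matched, tokens_inspected) for the tokens of one auction."""
    phones_closed = 0
    matched = False
    in_phone = False
    inspected = 0
    for kind, value in tokens:
        inspected += 1
        if kind == "start" and value == "phone":
            in_phone = True
        elif kind == "text" and in_phone and value == wanted:
            matched = True
        elif kind == "end" and value == "phone":
            in_phone = False
            phones_closed += 1
            if phones_closed == MAX_PHONES and not matched:
                # No further phone can occur, so the predicate can never hold:
                # stop matching the remaining patterns for this auction.
                return False, inspected
    return matched, inspected

def sample(first_phone):
    """Build the token list of one auction with two phones and one bidder."""
    return [("start", "auction"), ("start", "seller"),
            ("start", "primary"), ("start", "phone"), ("text", first_phone),
            ("end", "phone"), ("end", "primary"),
            ("start", "secondary"), ("start", "phone"), ("text", "222"),
            ("end", "phone"), ("end", "secondary"), ("end", "seller"),
            ("start", "bid"), ("start", "bidder"), ("end", "bidder"),
            ("end", "bid"), ("end", "auction")]

print(match_auction(sample("111")))
print(match_auction(sample("508")))
```

When neither phone satisfies the predicate, the scan gives up right after the second </phone> instead of following the bidder patterns to the end of the auction.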
22 Step III: Translate Rewritten Query Graph Back to Plan (II)
Utilize Exclusive Constraints
When <billTo> or <shipTo> is encountered once, suspend states s2 and s9.
23 Step III: Translate Rewritten Query Graph Back to Plan (III)
Utilize Order Constraints
When <primary> is encountered once, check /homepage; if it is not present, suspend states s10, s3 and s2.
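The order-constraint rule admits a similarly small sketch. It assumes a hypothetical schema in which homepage, if present, must precede primary among the children of seller; the function name and the list-of-tag-names input are illustrative only.

```python
def seller_has_homepage(seller_children):
    """seller_children: child tag names of <seller> in arrival order.

    Returns (has_homepage, children_inspected).
    """
    for seen, tag in enumerate(seller_children, start=1):
        if tag == "homepage":
            return True, seen
        if tag == "primary":
            # Assumed order constraint: homepage precedes primary,
            # so once primary arrives homepage can no longer appear.
            return False, seen
    return False, len(seller_children)

print(seller_has_homepage(["primary", "secondary"]))
print(seller_has_homepage(["homepage", "primary", "secondary"]))
```

The decision is made at the first <primary> token rather than at </seller>, which is the point at which the corresponding automaton states could be suspended.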
24
- http://davis.wpi.edu/dsrg/raindrop/
- suhong_at_cs.wpi.edu
- Thanks to the WPI DSRG Rainbow Team for XAT Algebra Support