Title: Processing Recursive XQuery over XML Streams: The Raindrop Approach
1. Processing Recursive XQuery over XML Streams: The Raindrop Approach
Mingzhu Wei, Ming Li, Elke A. Rundensteiner, Murali Mani
Worcester Polytechnic Institute
XSDM Workshop, 2006
Supported by the USA National Science Foundation
2. What's Special for XML Streams
Token-by-token access manner
Q1: for $a in stream("persons")//person return $a, $a//name
[Figure: token stream along a timeline: <person> <name> Jack, Brooks </name> </person>]
Pattern retrieval on token streams
3. Running Example
Q1: for $a in stream("persons")//person return $a, $a//name
D1 (not recursive):
1 <person> 2 <name> 3 Jack, Brooks 4 </name> 5 <children> 6 </children> 7 </person> 8 <person> 9 <name> 10 Amy 11 </name> 12 </person>
D2 (recursive):
1 <person> 2 <name> 3 Jack, Brooks 4 </name> 5 <children> 6 <person> 7 <name> 8 Will, Brooks 9 </name> 10 </person> 11 </children> 12 </person>
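The distinction between D1 and D2 can be checked mechanically: a stream is recursive for an element name when that element opens while another element of the same name is still open. The following is a minimal Python sketch, assuming tokens are plain strings; the helper name `is_recursive` is hypothetical and not part of Raindrop.

```python
def is_recursive(tokens, name):
    """Return True if an element `name` opens inside another open `name`."""
    depth = 0  # number of currently open `name` elements
    for tok in tokens:
        if tok == f"<{name}>":
            depth += 1
            if depth > 1:
                return True
        elif tok == f"</{name}>":
            depth -= 1
    return False


D1 = ["<person>", "<name>", "Jack, Brooks", "</name>", "<children>",
      "</children>", "</person>", "<person>", "<name>", "Amy", "</name>",
      "</person>"]
D2 = ["<person>", "<name>", "Jack, Brooks", "</name>", "<children>",
      "<person>", "<name>", "Will, Brooks", "</name>", "</person>",
      "</children>", "</person>"]

print(is_recursive(D1, "person"), is_recursive(D2, "person"))
```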
4. Retrieving Patterns Using Automata
How to process "/" pattern retrieval in automata?
Q2: for $a in stream("persons")/person return $a, $a/name
[Figure: automaton of Q2; states s0, s1, s2; transitions labeled person and name]
How to process "//" pattern retrieval in automata?
Q1: for $a in stream("persons")//person return $a, $a//name
[Figure: automaton of Q1 and its stack; states s0, s1, s2, s3, s4; transitions labeled ?, ?, person, name]
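The role of the stack in "//" retrieval can be illustrated with a simplified matcher. The sketch below is not Raindrop's automaton: it only keeps a stack of open elements and, on each start tag of the inner name, pairs it with every open element of the outer name. The function name `match_descendant` and the string token encoding are assumptions made for illustration.

```python
def match_descendant(tokens, outer, inner):
    """Return (outer_start, inner_start) token positions for outer//inner."""
    stack = []  # open elements as (tag, start position)
    pairs = []
    for pos, tok in enumerate(tokens, start=1):
        if tok.startswith("</"):
            stack.pop()  # the innermost open element closes
        elif tok.startswith("<"):
            tag = tok[1:-1]
            if tag == inner:
                # every open `outer` element is an ancestor of this `inner`
                pairs.extend((p, pos) for t, p in stack if t == outer)
            stack.append((tag, pos))
    return pairs


D2 = ["<person>", "<name>", "Jack, Brooks", "</name>", "<children>",
      "<person>", "<name>", "Will, Brooks", "</name>", "</person>",
      "</children>", "</person>"]

print(match_descendant(D2, "person", "name"))
```

On D2 the second name is paired with both the inner and the outer person, which is exactly the case a stack-free automaton cannot track.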
5. Raindrop Algebra Plan
[Figure: algebra plan, listed top to bottom]
op5: StructuralJoin $a
op4: ExtractUnnest $a
op3: ExtractNest $b
op2: Navigate $a//name -> $b
op1: Navigate //person -> $a
Stream data
Data shown in the figure: <person> <name> </name> </person> and <name> </name>
Note that the structural join (in-time structural join) only performs Cartesian products! The person element will be purged after generating output!
6. Problems with Recursion
D2: 1 <person> 2 <name> 3 Jack, Brooks 4 </name> 5 <children> 6 <person> 7 <name> 8 Will, Brooks 9 </name> 10 </person> 11 </children> 12 </person>
[Figure: the plan StructuralJoin $a (op5), ExtractUnnest $a (op4), ExtractNest $b (op3), Navigate $a//name -> $b (op2), Navigate //person -> $a (op1) processing D2 as stream data. Data shown: <name> </name>; <person> <name> </name> <person> <name> </name> </person>; <name> </name>]
After the second person and name are joined, we can't get the correct result for the first person.
7. Goals
- How to correctly process recursive data and recursive queries?
- How to guarantee that data is output as early as possible?
- When data is non-recursive, how to make the cost of the plan as cheap as possible?
8. Recursive-Mode Operators
- Each operator has a recursive-mode version
- Associate IDs with elements
- Each element is associated with a triple (startID, endID, level)
- Given two elements and the corresponding triples, we can determine ancestor-descendant and parent-child relationships.
1 <person> 2 <name> 3 Jack 4 </name> 5 <children> 6 <person> 7 <name> 8 Amy 9 </name> 10 </person> 11 </children> 12 </person>
Triples shown: (1, 12, 1) for the outer person; (2, 4, 2) for its name
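The triple assignment and the two relationship checks can be sketched as follows. This is an illustrative Python sketch, assuming startID and endID are the positions of the start and end tokens and level is the nesting depth, as the numbers on the slide suggest; `Triple`, `assign_triples`, `is_ancestor`, and `is_parent` are hypothetical names.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    start_id: int
    end_id: int
    level: int


def assign_triples(tokens):
    """Return [(tag, Triple)] for every element, in start-tag order."""
    open_elems = []  # stack of (tag, start_id, level, index into result)
    result = []
    for pos, tok in enumerate(tokens, start=1):
        if tok.startswith("</"):
            tag, start, level, idx = open_elems.pop()
            result[idx] = (tag, Triple(start, pos, level))
        elif tok.startswith("<"):
            open_elems.append((tok[1:-1], pos, len(open_elems) + 1, len(result)))
            result.append(None)  # placeholder until the end tag arrives
    return result


def is_ancestor(a, b):
    """a is an ancestor of b if a's interval strictly contains b's."""
    return a.start_id < b.start_id and b.end_id < a.end_id


def is_parent(a, b):
    """a is the parent of b if it is an ancestor exactly one level up."""
    return is_ancestor(a, b) and b.level == a.level + 1


TOKENS = ["<person>", "<name>", "Jack", "</name>", "<children>", "<person>",
          "<name>", "Amy", "</name>", "</person>", "</children>", "</person>"]
ELEMS = assign_triples(TOKENS)
print(ELEMS)
```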
9. Features of Recursive Navigate Operators
- Keep track of the triple for each element.
- Call structural join only when all triples in the Navigate operator are complete.
1 <person> 2 <name> 3 Jack 4 </name> 5 <children> 6 <person> 7 <name> 8 Amy 9 </name> 10 </person> 11 </children> 12 </person>
[Figure: triples kept by Navigate //person -> $a and Navigate $a//name -> $b at Token 1, Token 2, Token 9, and Token 12. Incomplete triples shown: (1, -, 1) and (2, -, 2). Completed triples shown: (2, 4, 2), (7, 9, 4), (6, 10, 3), and endID 12 for the outer person.]
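The two bullets above can be sketched as a small operator that records triples and reports when every recorded triple has its endID. This is a hedged illustration, not Raindrop's implementation: the class `RecursiveNavigate`, the driver `join_invocation_points`, and the decision to clear the buffer after each invocation are assumptions.

```python
class RecursiveNavigate:
    """Tracks [startID, endID, level] for every element named `tag`."""

    def __init__(self, tag):
        self.tag = tag
        self.triples = []  # complete and incomplete triples, in start order
        self._open = []    # indexes of triples still missing their endID

    def on_start(self, tag, pos, level):
        if tag == self.tag:
            self._open.append(len(self.triples))
            self.triples.append([pos, None, level])

    def on_end(self, tag, pos):
        if tag == self.tag:
            self.triples[self._open.pop()][1] = pos

    def all_complete(self):
        return bool(self.triples) and not self._open


def join_invocation_points(tokens, tag):
    """Return (token position, triples) each time the join could be invoked."""
    nav = RecursiveNavigate(tag)
    level = 0
    points = []
    for pos, tok in enumerate(tokens, start=1):
        if tok.startswith("</"):
            name = tok[2:-1]
            nav.on_end(name, pos)
            level -= 1
            if name == tag and nav.all_complete():
                points.append((pos, [tuple(t) for t in nav.triples]))
                nav.triples.clear()  # assumed purge after the join runs
        elif tok.startswith("<"):
            level += 1
            nav.on_start(tok[1:-1], pos, level)
    return points


TOKENS = ["<person>", "<name>", "Jack", "</name>", "<children>", "<person>",
          "<name>", "Amy", "</name>", "</person>", "</children>", "</person>"]
print(join_invocation_points(TOKENS, "person"))
```

In this sketch the join is not triggered at token 10, because the outer person is still open; it fires once at token 12 with both person triples.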
10. Features of Recursive Extract Operators
- ExtractUnnest
  - Compose the tokens into tuples
  - Associate ID information with each corresponding element
- ExtractNest
  - Collect the tokens and create one tuple for the whole collection.
  - Move the groupby functionality to the top structural join
11. Changes of Structural Join
- In-time structural join
  - Do Cartesian product
- ID-based structural join
  - Change from in-time structural join to an ID-based-comparison method
- ID-based-comparison condition
  - (a.startID < b.startID and b.endID < a.endID and b.level = a.level + 1): valid for parent-child relationship
  - (a.startID < b.startID and b.endID < a.endID): valid for ancestor-descendant relationship
[Figure, in-time structural join: Structural Join $a receives a and b1, b2 and outputs (a, b1), (a, b2)]
[Figure, ID-based structural join: ExtractUnnest $a supplies a1 (1, 12, 1) and a2 (6, 10, 3); ExtractUnnest $b supplies b1 (2, 4, 2) and b2 (7, 9, 4); Structural Join $a outputs (a1, b1), (a1, b2), (a2, b2)]
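The difference between the two join strategies can be sketched on the four tuples from the figure. This is a minimal Python sketch; the function names `in_time_join` and `id_based_join` and the tuple layout (label, (startID, endID, level)) are assumptions made for illustration.

```python
def in_time_join(a_tuples, b_tuples):
    """Cartesian product: correct only when the buffers hold one `a` at a time."""
    return [(a[0], b[0]) for a in a_tuples for b in b_tuples]


def id_based_join(a_tuples, b_tuples, parent_child=False):
    """Keep (a, b) only if a's interval contains b's (and levels differ by 1)."""
    out = []
    for a_label, (a_start, a_end, a_level) in a_tuples:
        for b_label, (b_start, b_end, b_level) in b_tuples:
            contained = a_start < b_start and b_end < a_end
            if contained and (not parent_child or b_level == a_level + 1):
                out.append((a_label, b_label))
    return out


A = [("a1", (1, 12, 1)), ("a2", (6, 10, 3))]
B = [("b1", (2, 4, 2)), ("b2", (7, 9, 4))]

print(in_time_join(A, B))
print(id_based_join(A, B))
print(id_based_join(A, B, parent_child=True))
```

On recursive data the Cartesian product also emits (a2, b1), which is wrong because b1 is not inside a2; the ID comparison filters it out.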
12. Structural Join Invoking Issue
- Invoking strategy: structural join will be invoked only when all the triples are complete.
[Figure: ExtractUnnest $a holds a1 (1, 12, 1) and a2 (6, 10, 3); ExtractUnnest $b holds b1 (2, 4, 2) and b2 (7, 9, 4); Structural Join $a outputs (a1, b1), (a1, b2), (a2, b2), followed by a clean step]
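13. Another Query with ExtractNest Operators
Q3: for $x in //a return $x//b, $x//c
[Figure: algebra plan, top to bottom: StructuralJoin $x; ExtractNest $y and ExtractNest $z; Navigate $x//b -> $y and Navigate $x//c -> $z; Navigate //a -> $x; Stream data]
[Figure: data tree with (startID, endID) per element: a (1, 14) has children a (2, 9), b (10, 11), c (12, 13); a (2, 9) has children b (3, 4), b (5, 6), c (7, 8)]
- ExtractNest = ExtractUnnest + GroupBy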
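14. Process ExtractNest GroupBy
Q3: for $x in //a return $x//b, $x//c
Push GroupBy up
[Figure: plan Navigate //a -> $x, Navigate $x//b -> $y, Navigate $x//c -> $z, ExtractNest $y, ExtractNest $z, Structural Join $x over the stream data. Elements with (startID, endID) and level: a1 (1, 14), level 1; a2 (2, 9), level 2; b elements (3, 4) and (5, 6) at level 3 and (10, 11) at level 2; c elements (7, 8) at level 3 and (12, 13) at level 2. Grouped outputs of the join: {b1, b2, b3} with {c1, c2}, and {b2, b3} with {c1}.]
It is better to do groupby in structural join here!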
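Doing the groupby inside the structural join can be sketched as follows: each a element collects the b and c elements whose intervals it contains. This is an illustrative Python sketch using the (startID, endID) pairs from the data tree of Q3; the function name `join_with_groupby` is hypothetical.

```python
def contains(outer, inner):
    """True if outer's (startID, endID) interval strictly contains inner's."""
    return outer[0] < inner[0] and inner[1] < outer[1]


def join_with_groupby(a_elems, b_elems, c_elems):
    """For each a, group the contained b elements and the contained c elements."""
    return {
        a: ([b for b in b_elems if contains(a, b)],
            [c for c in c_elems if contains(a, c)])
        for a in a_elems
    }


A_ELEMS = [(1, 14), (2, 9)]
B_ELEMS = [(3, 4), (5, 6), (10, 11)]
C_ELEMS = [(7, 8), (12, 13)]

for a, (bs, cs) in join_with_groupby(A_ELEMS, B_ELEMS, C_ELEMS).items():
    print(a, bs, cs)
```

The outer a collects three b elements and two c elements, while the inner a collects two b elements and one c element, which matches the group sizes in the figure.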
15. Further Optimization
- Using context-aware structural join
Run-time switching from ID-based structural join to the efficient in-time structural join strategy.
[Figure: flowchart. Automata and Navigate feed a Context Check. If data is recursive: Recursive Structural Join. If data is not recursive: In-time Structural Join. Then: Output tuples, Purge tuples.]
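The run-time switch can be sketched as a join that first checks its context and then picks a strategy. This is a hedged Python sketch: the slides do not spell out how the context check is performed, so treating the data as recursive when one buffered a element contains another is an assumption, as is the function name `context_aware_join`.

```python
def contains(a, b):
    """True if triple a's (startID, endID) interval strictly contains b's."""
    return a[0] < b[0] and b[1] < a[1]


def context_aware_join(a_triples, b_triples):
    """Return (strategy used, joined pairs) for the buffered triples."""
    # Assumed context check: recursion means some `a` is nested in another `a`.
    recursive = any(contains(x, y) for x in a_triples for y in a_triples)
    if recursive:
        pairs = [(a, b) for a in a_triples for b in b_triples if contains(a, b)]
        return "id-based", pairs
    # Non-recursive data: the cheap Cartesian product is already correct.
    return "in-time", [(a, b) for a in a_triples for b in b_triples]


print(context_aware_join([(1, 7, 1)], [(2, 4, 2)]))
print(context_aware_join([(1, 12, 1), (6, 10, 3)], [(2, 4, 2), (7, 9, 4)]))
```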
16. Plan Optimization with Multiple Structural Joins
for $a in stream("s")//a return
for $b in $a//b return
for $c in $b//c return $c//d, $c//e, $c//f, $b//f, $a//g
Goal: Try to generate as many non-recursive operators as possible. Traverse the query plan in a top-down manner. When a structural join that corresponds to a path expression with // is encountered, we instantiate this structural join and its descendants as recursive-mode operators.
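The top-down traversal can be sketched on a small plan tree. This is an illustrative Python sketch: the `Join` class, the `assign_modes` function, and the example plan mixing "/" and "//" paths are hypothetical and are not taken from the query on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Join:
    path: str                       # path expression this structural join serves
    children: list = field(default_factory=list)
    mode: str = "recursion-free"


def assign_modes(join, inherited=False):
    """Top-down pass: a join on a // path makes itself and its descendants recursive."""
    recursive = inherited or "//" in join.path
    join.mode = "recursive" if recursive else "recursion-free"
    for child in join.children:
        assign_modes(child, recursive)
    return join


# Hypothetical plan: /a at the top, with joins for /a/b and //c below it,
# and a join for /d below //c.
plan = assign_modes(Join("/a", [Join("/a/b"), Join("//c", [Join("/d")])]))
print(plan.mode, [c.mode for c in plan.children])
```

In this example only the //c join and the /d join below it run in recursive mode; the other two joins keep the cheaper recursion-free mode.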
17. Experiments
- Advantages of early invocation of structural join
- Context-aware structural join vs. recursive structural join
18. Recursion-free Mode vs. Recursive Mode
19. Related Work
- Stack-Tree-Anc [AJK02]
  - Uses a stack to store the chain of ancestor candidates
  - Can be combined with our system
- Transducer-based XML query processor [LPY02]
  - FSAs without a stack are not sufficient for handling recursion.
- YFilter: NFA-based path navigation [DF03]
  - Does not guarantee that the structural join is processed at the first possible moment
20. Conclusions
- Propose a new class of stream operators for recursive XQuery stream processing
- Propose a context-aware structural join
- Use cheaper algebra operators whenever possible in plan generation
- Illustrate performance benefits with little overhead in experiments
21. http://davis.wpi.edu/dsrg/raindrop/
samanwei_at_cs.wpi.edu
22. Thank you!
Questions?