Title: Incremental Maintenance of XML Structural Indexes
1. Incremental Maintenance of XML Structural Indexes
- Ke Yi¹, Hao He¹, Ioana Stanoi² and Jun Yang¹
- ¹ Department of Computer Science, Duke University
- ² IBM T. J. Watson Research Center
2. Motivation
- XML has been gaining tremendously in popularity in recent years
  - Used to represent many kinds of data
- Major DB vendors are rushing to incorporate solutions for native XML repositories and retrieval
  - IBM DB2, Oracle, Microsoft SQL Server
  - Tamino, Natix, X-Hive, ...
3. Overview
[Figure: example data graph for a paper document. Nodes are numbered 1 to 18 and carry the labels paper, section, title, intro, algorithm, experiments, exp, proof, about, uses, 1-index, and A(k)-index.]
4. Label Path Expressions
- Example: /paper/section/algorithm
[Figure: the example data graph from the Overview slide, on which the path expression is evaluated.]
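A label path expression such as /paper/section/algorithm can be evaluated directly on the data graph by following child edges one label at a time. The sketch below illustrates this under assumptions: the toy graph, the node numbers, and the function name `evaluate_label_path` are hypothetical and are not taken from the slide figure.

```python
def evaluate_label_path(root, labels, children, path):
    """Return the set of nodes reached from `root` by following `path`.

    labels:   dict mapping node -> label
    children: dict mapping node -> list of child nodes
    path:     absolute label path such as "/paper/section/algorithm"
    """
    steps = [s for s in path.split("/") if s]
    if not steps or labels[root] != steps[0]:
        return set()
    frontier = {root}
    for step in steps[1:]:
        # Keep only the children whose label matches the next step.
        frontier = {c for n in frontier
                    for c in children.get(n, ())
                    if labels[c] == step}
    return frontier


# Hypothetical toy graph (not the graph from the slides).
labels = {1: "paper", 2: "section", 3: "title",
          4: "section", 5: "algorithm", 6: "algorithm"}
children = {1: [2, 4], 2: [3, 5], 4: [6]}

result = evaluate_label_path(1, labels, children, "/paper/section/algorithm")
print(sorted(result))  # [5, 6]
```

This traversal touches every matching data node along the way, which is the cost a structural index is meant to reduce.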
5. Structural Indexes
- Why do we need them?
  - Speed up the evaluation of path expressions
  - Provide a structural summary of the data graph
- Structural indexes
  - DataGuide [Goldman & Widom 97]
  - 1-index [Milo & Suciu 99]
  - A(k)-index [Kaushik et al. 02], D(k)-index [Qun et al. 03], M(k)-index [He & Yang 04]
  - Integration of structural indexes and inverted lists [Kaushik et al. 04]
- Focus on maintenance
  - Has a major effect on index efficiency
  - Remains an overlooked issue
6. Outline
[Figure: the example data graph from the Overview slide, used as the talk outline.]
7. 1-Index Definition
- Constructed by using bisimilarity
- Definition based on stability
  - Partition data nodes into index nodes
  - dnode (v) and inode (I_v)
  - I_u is v's index parent if u is v's parent
  - An inode is stable if all of its dnodes have the same set of index parents
  - In a 1-index, all inodes are stable
[Figure: dnodes u and v, with u the parent of v, and their inodes I_u and I_v.]
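The stability definition suggests a simple way to build a 1-index: start from the partition by label and keep splitting inodes until every inode is stable. The sketch below is a naive fixpoint version under assumptions; it is not the Paige-Tarjan algorithm, and the graph and the names `build_one_index` and `inodes` are hypothetical.

```python
def build_one_index(labels, parents):
    """Split the label partition until every inode is stable.

    labels:  dict mapping dnode -> label
    parents: dict mapping dnode -> list of parent dnodes
    Returns a dict mapping dnode -> inode id.
    """
    block = dict(labels)  # initially, one inode per label
    while True:
        ids = {}
        # Two dnodes stay together only if they share the current inode
        # and the same set of index parents.
        new_block = {
            v: ids.setdefault(
                (block[v], frozenset(block[u] for u in parents.get(v, ()))),
                len(ids))
            for v in labels
        }
        if len(ids) == len(set(block.values())):
            return new_block  # nothing was split: all inodes are stable
        block = new_block


def inodes(block):
    """Group dnodes by inode id and return the extents in sorted order."""
    groups = {}
    for v, b in block.items():
        groups.setdefault(b, []).append(v)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical graph: paper 1; sections 2 and 4 under the paper;
# section 6 nested under section 4; one title under each section.
labels = {1: "paper", 2: "section", 4: "section", 6: "section",
          3: "title", 5: "title", 7: "title"}
parents = {2: [1], 4: [1], 6: [4], 3: [2], 5: [4], 7: [6]}

print(inodes(build_one_index(labels, parents)))
# [[1], [2, 4], [3, 5], [6], [7]]
```

Each pass only refines the previous partition, so the loop stops as soon as a pass produces no new inode.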
8. 1-Index Example
[Figure: the example data graph (left) and its 1-index (right), with the path expression /paper/section/algorithm. The 1-index inodes have extents {1}, {2,4,8,13}, {3,5,9,14}, {6,10}, {15,16}, {7}, {11}, {17,18}, and {12}.]
9. 1-Index Quality
- Assigning dnodes that are bisimilar to different inodes
  - does not affect correctness,
  - but does affect efficiency
- The quality of an index: $\left( \frac{\#\,\text{inodes}}{\#\,\text{inodes in the minimum 1-index}} - 1 \right) \times 100\%$
- Ideal quality: 0
[Figure: the 1-index of the example with inode {2,4,8,13} split into {2,4} and {8,13}; the other inodes are {1}, {3,5,9,14}, {6,10}, {15,16}, {7}, {11}, {17,18}, and {12}.]
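As a worked instance of the quality measure, the index in the figure has 10 inodes while the minimum 1-index of the same graph has 9, counting the extents listed on the slides. The helper name `index_quality` is an assumption made for illustration.

```python
def index_quality(num_inodes: int, num_min_inodes: int) -> float:
    """Percentage by which an index exceeds the minimum 1-index in size."""
    if num_min_inodes <= 0:
        raise ValueError("the minimum 1-index has at least one inode")
    return 100 * (num_inodes - num_min_inodes) / num_min_inodes


print(index_quality(9, 9))             # 0.0, the ideal quality
print(round(index_quality(10, 9), 1))  # 11.1
```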
10. Previous Results
- Construction
  - The PT algorithm [Paige & Tarjan 87], in time O(m log n)
  - m: number of edges, n: number of nodes
- Edge changes
  - The propagate algorithm [Kaushik et al. 02]
  - Quality of the 1-index after update
    - No guarantee on the quality of the resulting index
    - 3% to 5% after 500 edge insertions in experiments
- Subgraph addition
  - Index reconstruction
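11. Edge Insertion: An Example (1)
[Figure: three panels. Data Graph: nodes R, A, B, C1, C2, C3, D1, D2, D3. 1-Index: the C and D nodes are grouped as {C1,C2}, {C3}, {D1,D2}, {D3}. Split 1: {C1,C2} is split into {C1} and {C2}; {D1,D2} and {D3} are unchanged.]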
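12. Edge Insertion: An Example (2)
[Figure: three panels. Split 2: every C and D node is in its own inode. Merge 1: {C2} and {C3} are merged into {C2,C3}. Merge 2: {D2} and {D3} are merged into {D2,D3}, leaving {C1}, {C2,C3}, {D1}, {D2,D3}.]
- The result is indeed the minimum 1-index for the data graph after the update. Not a coincidence!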
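The split and merge steps of the example can be replayed with a naive global sketch: split until all inodes are stable, then merge inodes that share a label and a set of index parents. This is an illustration under assumptions, not the algorithm from the talk, which presumably works locally around the changed edge. The edge set and the inserted edge B -> C2 are chosen to be consistent with the groupings on the slides, since the figure's edges are not recoverable; all function names are hypothetical.

```python
def index_parents(v, block, parents):
    """Set of inodes that contain a parent of dnode v."""
    return frozenset(block[u] for u in parents.get(v, ()))


def split_phase(block, parents):
    """Split inodes until every inode is stable."""
    while True:
        ids = {}
        new_block = {
            v: ids.setdefault((block[v], index_parents(v, block, parents)),
                              len(ids))
            for v in block
        }
        if len(ids) == len(set(block.values())):
            return new_block
        block = new_block


def merge_phase(block, labels, parents):
    """Merge inodes with equal label and equal index parents until none remain.

    Expects a stable index as input, so each inode maps to one merged group.
    """
    while True:
        ids = {}
        new_block = {
            v: ids.setdefault((labels[v], index_parents(v, block, parents)),
                              len(ids))
            for v in block
        }
        if len(ids) == len(set(block.values())):
            return new_block
        block = new_block


def extents(block):
    groups = {}
    for v, b in block.items():
        groups.setdefault(b, []).append(v)
    return sorted(sorted(g) for g in groups.values())


# Assumed data graph: R -> A, B; A -> C1, C2, C3; B -> C3; Ci -> Di.
labels = {"R": "R", "A": "A", "B": "B", "C1": "C", "C2": "C", "C3": "C",
          "D1": "D", "D2": "D", "D3": "D"}
parents = {"A": ["R"], "B": ["R"], "C1": ["A"], "C2": ["A"],
           "C3": ["A", "B"], "D1": ["C1"], "D2": ["C2"], "D3": ["C3"]}
# 1-index before the update, as shown on the slide.
index = {"R": 0, "A": 1, "B": 2, "C1": 3, "C2": 3, "C3": 4,
         "D1": 5, "D2": 5, "D3": 6}

parents["C2"].append("B")  # assumed update: insert edge B -> C2
index = merge_phase(split_phase(index, parents), labels, parents)
print(extents(index))
# [['A'], ['B'], ['C1'], ['C2', 'C3'], ['D1'], ['D2', 'D3'], ['R']]
```

Merging two inodes that agree on label and index parents keeps every inode stable, and it can make their children mergeable in turn, which matches the order Merge 1, then Merge 2 in the example.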
13. Minimum and Minimal Indexes
- Minimum: with the smallest number of inodes
- Minimal: no two inodes can be merged
[Figure: three panels. Data graph: nodes R, A1, A2, B1, B2. Minimum 1-index: inodes {R}, {A1,A2}, {B1,B2}. Minimal 1-index: A1, A2, B1, B2 each in a separate inode.]
14. Quality Guarantee
- Theorem: The split/merge algorithm always maintains a minimal 1-index
- Lemma: For acyclic data graphs, there is a unique minimal 1-index
  - The minimum 1-index is always maintained
- For cyclic data graphs, there could be more than one minimal 1-index
  - One of them is maintained
15. Outline
[Figure: the example data graph from the Overview slide, used as the talk outline.]
16. A(k)-Index Definition
- k-bisimilarity
- Definition based on stability
  - A(0)-index: partition by label
  - A(k)-index: an inode in the A(k)-index is stable if all of its dnodes have the same index parents in the A(k-1)-index
- Only interested in paths of length at most k
- Shown to be much smaller and more efficient than the 1-index [Kaushik et al. 02]
- But no efficient maintenance algorithms are known!
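The level-by-level definition can be sketched as a loop that derives each A(i) partition from A(i-1). This is a minimal sketch under assumptions: the function names are hypothetical, and the edges of the sample graph are chosen to be consistent with the groupings shown in the A(k)-index example rather than read from the figure.

```python
def build_ak_indexes(labels, parents, k):
    """Return [A(0), A(1), ..., A(k)], each a dict mapping dnode -> inode id."""
    ids = {}
    levels = [{v: ids.setdefault(labels[v], len(ids)) for v in labels}]
    for _ in range(k):
        prev = levels[-1]
        ids = {}
        # Two dnodes share an A(i) inode if they share an A(i-1) inode
        # and have the same set of index parents in A(i-1).
        levels.append({
            v: ids.setdefault(
                (prev[v], frozenset(prev[u] for u in parents.get(v, ()))),
                len(ids))
            for v in labels
        })
    return levels


def extents(level):
    groups = {}
    for v, b in level.items():
        groups.setdefault(b, []).append(v)
    return sorted(sorted(g) for g in groups.values())


# Assumed graph: R -> A, B; A -> C1; B -> C2, C3; C1 -> C4; C2 -> C5; C3 -> C6.
labels = {"R": "R", "A": "A", "B": "B",
          "C1": "C", "C2": "C", "C3": "C", "C4": "C", "C5": "C", "C6": "C"}
parents = {"A": ["R"], "B": ["R"], "C1": ["A"], "C2": ["B"], "C3": ["B"],
           "C4": ["C1"], "C5": ["C2"], "C6": ["C3"]}

for i, level in enumerate(build_ak_indexes(labels, parents, 2)):
    print(f"A({i}):", extents(level))
# A(0): [['A'], ['B'], ['C1', 'C2', 'C3', 'C4', 'C5', 'C6'], ['R']]
# A(1): [['A'], ['B'], ['C1'], ['C2', 'C3'], ['C4', 'C5', 'C6'], ['R']]
# A(2): [['A'], ['B'], ['C1'], ['C2', 'C3'], ['C4'], ['C5', 'C6'], ['R']]
```

Because A(i) is computed from A(i-1), any maintenance scheme for A(k) needs access to the coarser levels as well.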
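17. A(k)-index Example
[Figure: four panels showing a data graph with nodes R, A, B, C1 to C6 and its A(2)-index (which equals the 1-index), A(1)-index, and A(0)-index. The C nodes are grouped as {C1}, {C2,C3}, {C4}, {C5,C6} in A(2); as {C1}, {C2,C3}, {C4,C5,C6} in A(1); and as {C1,C2,C3,C4,C5,C6} in A(0).]
- Maintenance of the A(i)-index requires the information in the A(i-1)-index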
18. A(k)-index Refinement Tree
[Figure: the same data graph and A(2), A(1), A(0) indexes as on the previous slide, arranged as a refinement tree across the levels.]
19. A(k)-index Refinement Tree
[Figure: the refinement tree again, with every C inode in the A(2), A(1), and A(0) levels shown only by its label C instead of its full extent.]
- Reduce storage cost
- Reduce maintenance cost
- 0.5% to 13% additional storage
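One plausible reading of the refinement tree is that each A(i) inode points to the A(i-1) inode that contains it, so that extents need to be stored only at the finest level. The sketch below follows that reading and is an assumption, not the structure from the talk; the function names are hypothetical, and only the C nodes of the example are included for brevity.

```python
def build_refinement_tree(levels):
    """Link every A(i) inode to the A(i-1) inode that contains it.

    levels[i] is the A(i) partition as a list of frozensets of dnodes.
    Returns a dict mapping (i, inode) -> (i - 1, containing inode).
    """
    parent = {}
    for i in range(1, len(levels)):
        for inode in levels[i]:
            # A(i) refines A(i-1), so exactly one coarser inode contains it.
            container = next(p for p in levels[i - 1] if inode <= p)
            parent[(i, inode)] = (i - 1, container)
    return parent


def extent_from_leaves(node, parent, leaves):
    """Union of the leaf extents whose ancestor chain passes through node."""
    dnodes = set()
    for leaf in leaves:
        cur = leaf
        while cur != node and cur in parent:
            cur = parent[cur]
        if cur == node:
            dnodes |= leaf[1]
    return dnodes


# C-node groupings of A(0), A(1), A(2) as listed in the A(k)-index example.
levels = [
    [frozenset({"C1", "C2", "C3", "C4", "C5", "C6"})],
    [frozenset({"C1"}), frozenset({"C2", "C3"}),
     frozenset({"C4", "C5", "C6"})],
    [frozenset({"C1"}), frozenset({"C2", "C3"}),
     frozenset({"C4"}), frozenset({"C5", "C6"})],
]
tree = build_refinement_tree(levels)
leaves = [(2, inode) for inode in levels[2]]

node = (1, frozenset({"C4", "C5", "C6"}))
print(sorted(extent_from_leaves(node, tree, leaves)))  # ['C4', 'C5', 'C6']
```

With this layout an inner tree node carries only links, and its extent is recovered from the A(k) leaves beneath it when needed.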
20. Quality Guarantee
- Theorem: The split/merge algorithm always maintains a minimal A(k)-index
- Lemma: There is a unique minimal A(k)-index for any data graph, acyclic or cyclic
  - Hence the algorithm always maintains the minimum A(k)-index
21. Outline
[Figure: the example data graph from the Overview slide, used as the talk outline.]
22. Experiments on Edge Changes
- Datasets
  - Real-life: IMDB (272,000 nodes)
  - Benchmark: XMark (198,000 nodes)
- Setup
  - First delete a portion of existing ID-REF links
  - Then do random mixed insertions/deletions
- Compare with
  - 1-index: propagate (reconstruction)
  - A(k)-index: recompute affected portion (reconstruction)
23. Experiment Results: 1-index
24. Experiment Results: A(k)-index
[Figure: running times.]
25. Conclusions
- The first solutions for the maintenance (edge and subgraph additions/deletions) of the 1-index and A(k)-index that are both effective and efficient
  - Effective: quality guarantee on the resulting index
  - Efficient: the algorithms themselves are fast
- Thank you!
26. Graphical Illustration
[Figure: index size relative to a valid 1-index, showing the effect of split and merge operations.]
- The index can only grow in size due to splitting if merging is not enforced