XR-Tree: Indexing Data for Efficient Structural Joins

1
XR-Tree: Indexing Data for Efficient Structural
Joins

By Jiang, Lu, Wang, and Ooi

Presented by Jorge Mena
7 / April / 2005
2
Roadmap

Motivation
Introduction
Concepts and Definitions
XR-Tree Structure
Updating an XR-Tree
Structural Joins using XR-Trees
Performance Study and Evaluation
Conclusions

3
Motivation

Joins are among the most useful operations in
databases
They are the primary means of combining data
In XML, structural joins consume a large portion
of the time spent evaluating path expressions
Recent proposals read each element at most once,
but still access some data unnecessarily

4
The Problem

We want to evaluate structural relationships
between XML element sets efficiently.
For example, paragraph // section returns all
sections that are contained in each paragraph
How do we do this?
- Scan the XML document
- Index the tags and find occurrences of
  relationships (structural joins)
Goal: improve the efficiency of structural joins

5
The Proposal

A dynamic external-memory index structure
designed specifically for XML data
Build an index on the region codes of the element
nodes, which are of the form
(start, end)
Thus, XR-Tree = XML Region Tree
Fully exploits the strictly nested property of
XML.
Join two element sets using XR-Trees to skip
elements that do not participate in the join.

6
Introduction

XML Data are commonly modeled by a tree structure

7
Introduction

To determine structural relationships, use a
numbering scheme

Note that for any two elements u and v, either
They are nested: u.start < v.start < v.end < u.end
(or the same with u and v swapped)
They don't intersect
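
The two cases above translate directly into comparisons on region codes. A minimal sketch, assuming integer positions that are unique across the document; the names Region, is_ancestor and are_disjoint are illustrative and not taken from the slides:

```python
from typing import NamedTuple


class Region(NamedTuple):
    """Region code of an element: (start, end) positions in the document."""
    start: int
    end: int


def is_ancestor(u: Region, v: Region) -> bool:
    """True if u strictly contains v, i.e. u is an ancestor of v."""
    return u.start < v.start and v.end < u.end


def are_disjoint(u: Region, v: Region) -> bool:
    """True if the two regions do not intersect at all."""
    return u.end < v.start or v.end < u.start


# Hypothetical regions: one outer element with two children.
outer = Region(1, 10)
first_child = Region(2, 5)
second_child = Region(6, 9)
```

With strictly nested data, exactly one of is_ancestor(u, v), is_ancestor(v, u) or are_disjoint(u, v) holds for any two distinct elements.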

8
Introduction

Structural Join is to find all occurrences of
structural relationships between element sets

9
Concepts and Definitions

Definition 1
Given a key k and an element Ei with region (si,
ei), k stabs Ei (or Ei is stabbed by k) if
si ≤ k ≤ ei.
Given a set of ordered keys kj (0 ≤ j < n), where
kx < ky if x < y, and an element Ei (si, ei), kj
primarily stabs Ei (or Ei is primarily stabbed by
kj) if
(1) kj stabs Ei, i.e. si ≤ kj ≤ ei
(2) kj is the smallest key that stabs Ei
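
Definition 1 can be checked mechanically. A small sketch, assuming keys are kept in ascending order and elements are (start, end) tuples; the function names and sample values are hypothetical:

```python
from typing import Optional, Sequence, Tuple

Element = Tuple[int, int]  # (start, end)


def stabs(k: int, element: Element) -> bool:
    """k stabs the element if start <= k <= end."""
    s, e = element
    return s <= k <= e


def primary_stab_index(keys: Sequence[int], element: Element) -> Optional[int]:
    """Index j of the smallest key that stabs the element, or None.

    Assumes keys is sorted in ascending order, so the first hit is the
    primary stabbing key.
    """
    for j, k in enumerate(keys):
        if stabs(k, element):
            return j
    return None


keys = [10, 20, 30]
# (5, 25) is stabbed by 10 and 20; 10 is the smallest, so j = 0.
j = primary_stab_index(keys, (5, 25))
```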

10
Concepts and Definitions

For example
k0 stabs elements (s0, e0), (s1, e1), (s2, e2)
k0 primarily stabs interval (s0, e0)
k2 primarily stabs interval (s4, e4)

11
Concepts and Definitions

Definition 2
Given a set of ordered keys kj (0 ≤ j < n), where
kx < ky if x < y, and a set of elements ε =
∪i Ei(si, ei), the stab list of a key kj is the
list of elements in ε that are stabbed by kj,
denoted SLj or SL(kj).
The primary stab list, PSLj or PSL(kj), of a
key kj is the list of elements that are primarily
stabbed by kj.
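
To make the difference between SL and PSL concrete, here is a brute-force sketch that builds both for a small, hypothetical set of strictly nested elements. It is only meant to illustrate Definition 2, not how an XR-Tree maintains the lists:

```python
from typing import List, Sequence, Tuple

Element = Tuple[int, int]  # (start, end)


def stab_list(k: int, elements: Sequence[Element]) -> List[Element]:
    """SL(k): every element whose region contains k."""
    return [(s, e) for s, e in elements if s <= k <= e]


def primary_stab_lists(keys: Sequence[int],
                       elements: Sequence[Element]) -> List[List[Element]]:
    """PSL(kj) for each key: elements whose smallest stabbing key is kj.

    Assumes keys is sorted in ascending order.
    """
    psl: List[List[Element]] = [[] for _ in keys]
    for s, e in elements:
        for j, k in enumerate(keys):
            if s <= k <= e:
                psl[j].append((s, e))
                break  # only the smallest stabbing key counts
    return psl


keys = [10, 20, 30]
elements = [(1, 40), (5, 12), (15, 25), (18, 22), (31, 33)]
sl_20 = stab_list(20, elements)
psl = primary_stab_lists(keys, elements)
```

Here (1, 40) appears in SL(20) but not in PSL(20), because the smaller key 10 already stabs it, and PSL(30) is empty. Each element lands in at most one PSL.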

12
Concepts and Definitions

For example
SL1 = [(s0, e0), (s3, e3)]
PSL1 = [(s3, e3)]
Note: PSL3 = ∅
Note the strict ancestor-descendant relationships

13
Concepts and Definitions

Definition 3
Given kj,
psj = start position of the first element of
PSLj
pej = end position of the first element of
PSLj
Both values are nil if PSLj = ∅.
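
A tiny sketch of Definition 3, assuming "first element" means the element with the smallest start position and using None in place of nil; the function name is illustrative:

```python
from typing import List, Optional, Tuple

Element = Tuple[int, int]  # (start, end)


def ps_pe(psl: List[Element]) -> Tuple[Optional[int], Optional[int]]:
    """Return (ps, pe) for a primary stab list, or (None, None) if empty."""
    if not psl:
        return (None, None)
    # Assumption: the first element is the one with the smallest start.
    first = min(psl, key=lambda el: el[0])
    return first


non_empty = ps_pe([(1, 40), (5, 12)])
empty = ps_pe([])
```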

14
Concepts and Definitions

For example
For k0 with PSL0 = [(s0, e0), (s1, e1), (s2,
e2)], (ps0, pe0) = (s0, e0)
Note: (ps3, pe3) = (nil, nil)

15
XR-Tree Structure

An XR-Tree has the following properties
It is a balanced tree
An internal node contains m entries of the form
(ki, psi, pei), with k0 < k1 < ... < km-1 and
d ≤ m ≤ 2d, where d is the degree of the tree.
An internal node with m keys contains m + 1
pointers pj pointing to the nodes in the next
level of the tree.

16
XR-Tree Structure

More properties
Each internal node has a stab list SL(n) that
holds all elements Ei such that Ei is stabbed by
at least one key in n and not by any key of any
ancestor of n. Each element is in the form
(s,e,pointer).

17
XR-Tree Structure

More Properties
Leaf nodes contain element entries
(s,e,InStabList,pointer).
Leaf nodes are linked from left to right
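
The node layout described on these slides could be modelled roughly as follows. This is an in-memory sketch only: the class and field names are assumptions, the pointer fields are stand-ins for record or page addresses, and a real XR-Tree stores nodes and stab lists on disk pages.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class LeafEntry:
    s: int
    e: int
    in_stab_list: bool = False     # the InStabList flag
    pointer: Optional[int] = None  # hypothetical record id


@dataclass
class LeafNode:
    entries: List[LeafEntry] = field(default_factory=list)
    next_leaf: Optional["LeafNode"] = None  # leaves are linked left to right


@dataclass
class StabEntry:
    s: int
    e: int
    pointer: Optional[int] = None


@dataclass
class KeyEntry:
    k: int
    ps: Optional[int] = None  # start of the first element of PSL(k), or None
    pe: Optional[int] = None  # end of the first element of PSL(k), or None


@dataclass
class InternalNode:
    keys: List[KeyEntry]
    children: List[Union["InternalNode", LeafNode]]
    stab_list: List[StabEntry] = field(default_factory=list)


def is_well_formed(node: InternalNode, d: int) -> bool:
    """Check d <= m <= 2d, m + 1 child pointers, and ascending keys."""
    m = len(node.keys)
    ascending = all(a.k < b.k for a, b in zip(node.keys, node.keys[1:]))
    return d <= m <= 2 * d and len(node.children) == m + 1 and ascending


node = InternalNode(
    keys=[KeyEntry(10, 1, 40), KeyEntry(20, 15, 25)],
    children=[LeafNode(), LeafNode(), LeafNode()],
    stab_list=[StabEntry(1, 40), StabEntry(15, 25)],
)
```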

18
XR-Tree Structure Recap

Essentially a B+ tree
Complex index key entries
A Stab list associated with each internal node.
Keys should be chosen to minimize the size of the
stab lists.
Use the value (as the internal key) that is
smaller than the keys at the right branch.

19
Stab Lists

Contain elements (regions, intervals)
Each element is found in at most one stab list
Total elements in all stab lists ≤ total elements
indexed
Maximum number of pages for a stab list
Smax = (hd · BI · fmax) / (BS · fmin)
hd = maximum nesting depth
BI = max number of entries per node
BS = max number of tuples a stab list page can
hold
fmax, fmin = fill ratios
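
A quick worked example of the Smax bound, reading the formula as (hd · BI · fmax) / (BS · fmin). All parameter values below are made up for illustration and do not come from the paper or the slides:

```python
import math


def max_stab_list_pages(h_d: int, b_i: int, b_s: int,
                        f_max: float, f_min: float) -> int:
    """Upper bound on stab list pages: hd * BI * fmax / (BS * fmin)."""
    return math.ceil(h_d * b_i * f_max / (b_s * f_min))


# Hypothetical settings: nesting depth 10, 100 entries per node,
# 200 tuples per stab list page, fill ratios 1.0 (max) and 0.5 (min).
pages = max_stab_list_pages(h_d=10, b_i=100, b_s=200, f_max=1.0, f_min=0.5)

# Doubling the nesting depth doubles the bound.
pages_deeper = max_stab_list_pages(h_d=20, b_i=100, b_s=200, f_max=1.0, f_min=0.5)
```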

20
Stab Lists with high nesting

Smax is directly proportional to the nesting
depth hd.
Use a ps directory page that maps each psj to
the location of PSLj

21
Updating: Insertion

Insertion is similar to that in B+ trees, except
that the stab lists must also be maintained
1. Find the leaf page for the insertion
2. Insert into the leaf
3. Insert into the internal node (if the leaf
splits)
4. Grow the tree taller (if the root splits)
22
Updating: Insertion with Overflow

In case of overflow, we need to split the stab
list as well
However, the cost is constant because we only
need to read the page where the split occurs
A new key k is promoted to the upper level with
a stab list (StabSet) associated with it.
InStabList flags are updated accordingly.
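
The leaf-split step can be sketched as below. This is a simplified guess at the mechanics, not the authors' algorithm: it assumes integer positions, splits the leaf in half, and picks the separator key as the value just below the first start position of the right half. The names LeafEntry and split_leaf are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LeafEntry:
    s: int
    e: int
    in_stab_list: bool = False


def split_leaf(entries: List[LeafEntry]
               ) -> Tuple[List[LeafEntry], int, List[LeafEntry], List[LeafEntry]]:
    """Split an overflowing leaf (at least two entries, sorted by start).

    Returns (left, k, right, stab_set), where k is the key to promote and
    stab_set holds the entries newly stabbed by k.
    """
    mid = len(entries) // 2
    left, right = entries[:mid], entries[mid:]
    # Assumption: choose k just below the smallest start in the right half,
    # so no entry of the right half is stabbed by k.
    k = right[0].s - 1
    stab_set = []
    for entry in left:
        # Entries already held in an ancestor's stab list are left alone.
        if not entry.in_stab_list and entry.s <= k <= entry.e:
            entry.in_stab_list = True
            stab_set.append(entry)
    return left, k, right, stab_set


leaf = [LeafEntry(1, 20), LeafEntry(2, 3), LeafEntry(6, 9), LeafEntry(10, 15)]
left, k, right, stab_set = split_leaf(leaf)
```

In this example only (1, 20) spans the split point, so it is the only entry that moves into the StabSet and has its InStabList flag set.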

23
Updating: Insertion Cost

Theorem 1
The amortized I/O cost of an insertion is
O(log_F N + CDP)
N = number of elements indexed
F = fanout of the XR-Tree
CDP = the cost of one displacement of an element
(removing it from one stab list and inserting
it into another).

24
Updating: Deletion

Similar to deletion in a B+ tree
1. Find the element, deleting as you go
2. Delete the element from the leaf page
3. Delete the entry from the internal node
4. Shorten the tree if necessary

25
Updating: Deletion with Underflow

If a deletion causes underflow, first try to
redistribute the keys in the internal nodes
Then redistribute the elements in the stab lists
that are affected.
If a merge is needed, just link the stab lists of
the two merging nodes.
The cost of deletion is (Theorem 3)
O(log_F N + 3·CDP)

26
Cost of manipulating the Stab List

Maximum number of pages for a stab list
Smax = (hd · BI · fmax) / (BS · fmin)
Due to high nesting
CDP = CSI + CSD
CSI = cost of inserting into a stab list
CSD = cost of deleting from a stab list
Thanks to the use of ps directory pages,
insertion and deletion each cost just a couple of
I/Os

27
Structural Joins with XR-Trees

In order to do efficient structural joins, the
authors defined two basic operations
Search for Descendants
Search for Ancestors
These basic operations are used with the proposed
stack-based structural join to perform the
desired join.

28
Searching for Descendants

Given an element Ea (sa, ea), find all the
descendants Ei of Ea such that
sa < Ei.start < ea
No need to access stab lists
Cost
O(log_F N + R/B)
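
Because descendants occupy a contiguous range of start positions, the search reduces to locating sa and scanning forward until ea. The sketch below uses a sorted Python list and bisect as a stand-in for the XR-Tree leaf level; it illustrates the range scan only, not the paged index.

```python
from bisect import bisect_right
from typing import List, Tuple

Element = Tuple[int, int]  # (start, end)


def find_descendants(elements: List[Element], ancestor: Element) -> List[Element]:
    """Return all elements Ei with sa < Ei.start < ea.

    Assumes elements is sorted by start position, like the linked leaves.
    """
    sa, ea = ancestor
    starts = [s for s, _ in elements]  # recomputed here for simplicity
    i = bisect_right(starts, sa)       # first element starting after sa
    result = []
    while i < len(elements) and elements[i][0] < ea:
        result.append(elements[i])
        i += 1
    return result


elements = [(1, 40), (5, 12), (15, 25), (18, 22), (31, 33)]
inside = find_descendants(elements, (15, 25))
```

The locate step corresponds to the log_F N term and the forward scan to the R/B term of the cost above.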

29
Searching for Ancestors

Not as trivial as searching for descendants
We want elements Ei that are ancestors of Ed such
that Ei.start < sd < Ei.end
That is, the elements stabbed by sd
But such elements are not stored together: they
are scattered in leaf pages to the left of the
leaf page on the search path of sd.
That's why we have stab lists!
Traversing from the root to the leaf pages,
search the stab lists and find the elements
stabbed by sd. At the leaves, output those
elements stabbed by sd but not included in the
stab lists of internal nodes.

30
Searching for Ancestors

To find the elements stabbed by sd in internal
nodes, just scan the PSL of the particular node
Note that we will not waste time scanning a PSL,
because of the condition psc < sd < pec
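
One way to see why the (ps, pe) check avoids wasted scans: all elements in a PSL contain the same key, so under strict nesting they form a chain from outermost to innermost. If the outermost one does not contain sd, none do, and once one element fails the test all later ones fail too. A sketch under those assumptions, with the PSL held as a list sorted by start (the function name is hypothetical):

```python
from typing import List, Tuple

Element = Tuple[int, int]  # (start, end)


def ancestors_in_psl(psl: List[Element], sd: int) -> List[Element]:
    """Elements of one primary stab list that contain position sd.

    Assumes psl is sorted by start, so it runs from outermost to innermost.
    """
    if not psl:
        return []
    ps, pe = psl[0]
    if not (ps < sd < pe):
        return []  # the (ps, pe) check: skip the whole list without scanning
    out = []
    for s, e in psl:
        if s < sd < e:
            out.append((s, e))
        else:
            break  # every later element is nested inside this one
    return out


# Hypothetical PSL of a key 20: three nested elements.
psl_20 = [(1, 40), (15, 25), (18, 22)]
hits = ancestors_in_psl(psl_20, 16)
```

Every element read is either part of the answer or the single element that ends the scan, which is consistent with a cost that grows with the result size.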

31
Searching for Ancestors

Again, to find ancestors, just scan the PSLs
at each internal node and output the results
At the leaves, just output those elements that
are not in the stab lists of internal nodes
(because those are output already).

Cost: O(log_F N + R), by Theorem 4
32
The Structural Join Algorithm using XR-Tree
Indexed data

Assume that input lists A and D (for AList and
DList) are sorted by start position
Both sets are indexed by XR-Trees
Therefore leaf pages are sorted by start position
The algorithm proceeds just like Merge-Join but
it effectively skips elements that do not
participate in the join
Use either of the two search operations to
retrieve ancestors or descendants

33
The Structural Join Algorithm using XR-Tree
Indexed data

Loop until one list is empty
Keep a stack with ancestors that are potentially
joinable with CurD
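
The loop can be sketched as a stack-based merge over two lists sorted by start. This is a simplified illustration, not the XR-stack algorithm itself: it works on in-memory lists, uses bisect to skip descendants that cannot join, and scans the ancestor list linearly where the real algorithm would use the XR-Tree ancestor search to skip. All names are hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple

Element = Tuple[int, int]  # (start, end)


def structural_join(alist: List[Element],
                    dlist: List[Element]) -> List[Tuple[Element, Element]]:
    """Return all (ancestor, descendant) pairs; both inputs sorted by start."""
    d_starts = [s for s, _ in dlist]
    out: List[Tuple[Element, Element]] = []
    stack: List[Element] = []  # nested chain of ancestors of the current d
    i = j = 0
    while j < len(dlist):
        d = dlist[j]
        # Pop ancestors that end before the current descendant starts.
        while stack and stack[-1][1] <= d[0]:
            stack.pop()
        # Consume ancestor candidates that start before d.
        while i < len(alist) and alist[i][0] < d[0]:
            if alist[i][1] > d[0]:  # still open at d.start, so it contains d
                stack.append(alist[i])
            i += 1
        if stack:
            out.extend((a, d) for a in stack)
            j += 1
        elif i < len(alist):
            # No open ancestor: skip descendants up to the next ancestor start.
            j = bisect_right(d_starts, alist[i][0])
        else:
            break  # no ancestors left, so nothing more can join
    return out


alist = [(1, 40), (15, 25), (50, 60)]
dlist = [(5, 12), (18, 22), (31, 33), (45, 46), (70, 71)]
pairs = structural_join(alist, dlist)
```

In this run the descendant (45, 46) is skipped via bisect rather than compared against every remaining ancestor, which is the kind of saving the index is meant to provide on both lists.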

34
Performance Study and Evaluation

The players
No-index algorithm
B+ algorithm
XR-stack algorithm
The metrics
Number of elements scanned
Elapsed time

35
Varying Join Selectivity on Ancestors
Metric: elements scanned
36
Varying Join Selectivity on Descendants

Metric: elements scanned

37
Varying Join Selectivity

Metric: CPU time

38
Conclusion

XR-Trees support retrieval of all ancestors
or descendants of an element E in an element set
ε indexed by an XR-Tree with optimal worst-case
I/O cost.
A stack-based algorithm, XR-stack, is proposed.
It is a major improvement over structural joins
on non-indexed or B+-tree indexed data sets.
