1
Index Bulk Loading Techniques
  • Thanaa M. Ghanem
  • ghanemtm_at_cs.purdue.edu

2
Talk outline
  • Introduction.
  • Categorization of bulk loading techniques
  • Sort-based techniques.
  • Partitioning-based techniques.
  • Buffer-based techniques.
  • Conclusion.

3
Introduction
  • Multidimensional data is used in many
    applications, such as multimedia databases and GIS.
  • Many tree-based multidimensional access methods
    have been proposed, such as the R-tree, the quad
    tree, and the KD tree, along with their variants.
  • The main objective of these access methods is to
    support queries efficiently.
  • Less attention has been given to the performance
    of the insert operation.

4
Introduction (Contd)
  • Efficient construction of a multidimensional
    access method from scratch (bulk loading) is
    needed in both static and dynamic databases.
  • The main objective of bulk loading in static
    databases is to enhance storage utilization.
  • In a query optimizer, we may need to build an
    index on intermediate results to be supplied to a
    subsequent operation.

5
Bulk Loading
  • A bulk operation executes a series of operations
    atomically, without interruption by other
    actions.
  • A bulk loading algorithm has more knowledge than
    a single-insert algorithm, because all the data
    needed to build the index is known a priori.
  • Bulk loading techniques aim to reduce the number
    of disk I/Os needed to build the index.

6
Categorization of bulk loading techniques
  • Most previously proposed bulk loading
    techniques are designed specifically for the
    index structure in question.
  • GiST and SP-GiST emerged as extensible index
    engines that support whole classes of index
    methods, which calls for generic bulk loading
    techniques.
  • We can categorize bulk loading techniques as
    follows:
    • Sort based: presort the data before loading.
    • Partitioning based: apply some partitioning
      criterion to the data set before loading.
    • Buffer based: use buffers to group operations.

7
History of bulk loading techniques

8
Sort based bulk loading
  • The main idea is to presort the data according
    to some linear order before loading.
  • Examples:
    • Bulk loading the B-tree.
    • Bulk loading the R-tree.
    • Bulk loading the PMR quad tree.
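As an illustration of the sort-based idea on the simplest case, here is a minimal Python sketch of B+-tree bulk loading: sort once, pack the keys into full leaves, then build each parent level bottom-up. It is a generic textbook technique, not Klein et al.'s ordering; the names `bulk_load`, `Leaf`, and `Inner` are hypothetical, `capacity` must be at least 2, and the last node on each level may be underfull (a real B-tree would rebalance it).

```python
import bisect
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Leaf:
    keys: List[int]


@dataclass
class Inner:
    # separators[i] is the smallest key stored under children[i + 1]
    separators: List[int]
    children: List[Union["Inner", Leaf]]


def min_key(node):
    """Smallest key in the subtree rooted at node (follow the leftmost path)."""
    while isinstance(node, Inner):
        node = node.children[0]
    return node.keys[0]


def bulk_load(keys, capacity=4):
    """Sort once, pack keys into full leaves, then build each parent level bottom-up."""
    data = sorted(keys)
    if not data:
        return Leaf([])
    level = [Leaf(data[i:i + capacity]) for i in range(0, len(data), capacity)]
    while len(level) > 1:
        level = [Inner([min_key(c) for c in level[i + 1:i + capacity]],
                       level[i:i + capacity])
                 for i in range(0, len(level), capacity)]
    return level[0]


def search(node, key):
    """Descend by binary search on the separators, then scan the leaf."""
    while isinstance(node, Inner):
        node = node.children[bisect.bisect_right(node.separators, key)]
    return key in node.keys


root = bulk_load(range(20, 0, -1), capacity=4)
print(search(root, 7), search(root, 21))  # True False
```

Every leaf except possibly the last is completely full, which is where the storage-utilization gain over one-at-a-time insertion comes from.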

9
Optimal B-tree packing
  • By T. M. Klein et al., IS 1991.
  • Works as follows:
    • A special ordering is imposed on the data items
      such that a record is inserted only after the
      node in which it is to reside has already been
      split.
  • Goal: optimal storage utilization and search
    performance.

10
Bulk loading the R-tree
  • The main steps are:
    • Presort the data.
    • Pack the data in the sorted order to form full
      leaf nodes.
    • Build the upper levels of the tree recursively.
  • Algorithms differ in the way the data is
    presorted.
  • Examples:
    • N. Roussopoulos et al., SIGMOD 1985: linear sort
      on one of the dimensions.
    • I. Kamel et al., CIKM 1993: sort by the Hilbert
      space-filling curve (SFC) value of the centers.
    • S. T. Leutenegger et al., ICDE 1997: the
      Sort-Tile-Recursive (STR) algorithm.
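The three presorting strategies above can be sketched in Python as pluggable orderings for a single bottom-up packing routine. This is a simplified illustration, not the published algorithms: it assumes axis-aligned rectangles given as (xmin, ymin, xmax, ymax) tuples, snaps centers to a hypothetical 1024 x 1024 grid for the Hilbert key, and re-applies the chosen ordering to the node MBRs at every level. All function names are hypothetical.

```python
import math

# Rectangles are (xmin, ymin, xmax, ymax); an entry is (rectangle, child).


def mbr(rects):
    """Minimum bounding rectangle of a list of rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def center(r):
    return ((r[0] + r[2]) / 2, (r[1] + r[3]) / 2)


def by_x(entries, capacity):
    """Linear sort on one dimension: x-coordinate of the center."""
    return sorted(entries, key=lambda e: center(e[0])[0])


def hilbert_index(n, x, y):
    """Position of cell (x, y) along the Hilbert curve over an n-by-n grid (n a power of 2)."""
    d, s = 0, n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                        # rotate/flip the quadrant
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def by_hilbert(entries, capacity, n=1024):
    """Sort by the Hilbert value of the centers, snapped to an n-by-n grid."""
    box = mbr([e[0] for e in entries])
    w = (box[2] - box[0]) or 1.0
    h = (box[3] - box[1]) or 1.0

    def key(e):
        cx, cy = center(e[0])
        gx = min(n - 1, int((cx - box[0]) / w * n))
        gy = min(n - 1, int((cy - box[1]) / h * n))
        return hilbert_index(n, gx, gy)
    return sorted(entries, key=key)


def by_str(entries, capacity):
    """Sort-Tile-Recursive: sort by x, cut into about sqrt(#nodes) slices, sort each by y."""
    nodes = math.ceil(len(entries) / capacity)
    per_slice = math.ceil(math.sqrt(nodes)) * capacity
    xs = by_x(entries, capacity)
    out = []
    for i in range(0, len(xs), per_slice):
        out += sorted(xs[i:i + per_slice], key=lambda e: center(e[0])[1])
    return out


def pack(rects, capacity, order):
    """Order the entries, cut them into full nodes, repeat on the node MBRs."""
    if not rects or capacity < 2:
        raise ValueError("need at least one rectangle and capacity >= 2")
    level = [(r, None) for r in rects]
    while True:
        level = order(level, capacity)
        groups = [level[i:i + capacity] for i in range(0, len(level), capacity)]
        level = [(mbr([e[0] for e in g]), g) for g in groups]
        if len(level) == 1:
            return level[0]                # (root MBR, list of root entries)
```

For example, `pack(rects, 10, by_str)` on 100 rectangles yields a root with 10 children, each a full leaf of 10 rectangles. Only the ordering function changes between the three methods, which matches the slide's point that the algorithms differ in how the data is presorted.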

11
Bulk loading the PMR quad tree
  • By G. R. Hjaltason et al., VLDB Journal 2002.
  • Works as follows:
    • Sort the input data items in Z-order.
    • Insert data in the sorted order until the memory
      is full.
    • Use the next item to be inserted to identify the
      parts of the in-memory tree that will not be
      inserted into again, flush them to disk, and
      thereby free memory for inserting the remaining
      items.
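The Z-order sort in the first step, and the reason it lets parts of the tree be flushed early, can be sketched for integer point data. This is an illustration under simplifying assumptions (2-D integer points, no PMR splitting rule, hypothetical function names); the flush test is a plausible reading of the slide, not Hjaltason et al.'s exact criterion. The property it relies on is that every aligned quadtree block covers one contiguous range of Z-values.

```python
def z_value(x, y, bits=16):
    """Morton (Z-order) code: bit i of x goes to bit 2i, bit i of y to bit 2i + 1."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def z_sort(points, bits=16):
    """Sort integer points into Z-order."""
    return sorted(points, key=lambda p: z_value(p[0], p[1], bits))


def block_range(bx, by, size, bits=16):
    """Z-values covered by an aligned square block (size a power of 2, bx and by multiples of size)."""
    lo = z_value(bx, by, bits)
    return lo, lo + size * size - 1


def block_is_finished(block, next_point, bits=16):
    """Assumed flush test: once the input is past a block's Z-range, no later item lands in it."""
    _, hi = block_range(*block, bits=bits)
    return z_value(next_point[0], next_point[1], bits) > hi


print(z_sort([(1, 1), (0, 1), (1, 0), (0, 0)]))  # [(0, 0), (1, 0), (0, 1), (1, 1)]
```

Because the input arrives in Z-order, a block whose range lies entirely before the next item's Z-value can be written out and evicted from memory.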

12
Partitioning based bulk loading
  • The main steps are:
    • Data is partitioned according to some criterion.
    • Build subtrees from the partitions recursively.
    • Merge the subtrees to form the final index tree.
  • Examples:
    • Bulk loading the M-tree.
    • Bulk loading the KD tree.
    • A greedy algorithm for R-tree bulk loading.
    • Bulk loading weight-balanced trees.

13
Bulk loading the M-tree
  • By P. Ciaccia et al., ADC 1998.
  • Works as follows:
    • Randomly sample K objects from the data set and
      assign each of the remaining objects to its
      nearest sample point, obtaining K sets.
    • Invoke the bulk loading algorithm recursively on
      each of these sets, obtaining K subtrees.
    • Invoke the bulk loading algorithm on the sample
      set, obtaining a super tree with the K sample
      points as its leaves.
    • Attach each subtree to the leaf corresponding to
      its sample point in the super tree.
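The recursive structure of the M-tree algorithm can be sketched in Python. Assumptions: objects are points compared with Euclidean distance (`math.dist`), an entry is a hypothetical `(point, child)` pair, and K is restricted to between 2 and the node capacity so that the super tree over the samples is a single node. The sketch omits any balancing or underflow handling a full implementation would need, so subtrees may differ in height.

```python
import math
import random

# An entry is (point, child): child is None for a data object, or a node
# (a list of entries) when the point routes to a subtree.


def bulk_load(entries, k, capacity, rng):
    """Sampling-based recursive bulk load in the style of the M-tree algorithm."""
    if not 2 <= k <= capacity:
        raise ValueError("need 2 <= k <= capacity")
    if len(entries) <= capacity:
        return list(entries)                            # fits in a single node
    picked = set(rng.sample(range(len(entries)), k))    # K random sample objects
    samples = [entries[i] for i in sorted(picked)]
    sets = [[s] for s in samples]                       # each sample starts its own set
    for i, e in enumerate(entries):
        if i not in picked:                             # remaining objects: nearest sample
            nearest = min(range(k), key=lambda c: math.dist(e[0], samples[c][0]))
            sets[nearest].append(e)
    subtrees = [bulk_load(s, k, capacity, rng) for s in sets]   # K subtrees
    # Super tree over the K samples; with k <= capacity it is one node whose
    # entries point at the subtrees.
    routing = [(s[0], sub) for s, sub in zip(samples, subtrees)]
    return bulk_load(routing, k, capacity, rng)


def data_points(node):
    """All data objects reachable from node."""
    for point, child in node:
        if child is None:
            yield point
        else:
            yield from data_points(child)


pts = [(x, y) for x in range(10) for y in range(10)]
root = bulk_load([(p, None) for p in pts], k=3, capacity=8, rng=random.Random(0))
```

Each sample stays in its own set, so every recursive call works on strictly fewer entries and the recursion terminates.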

14
Bulk loading the KD tree
  • By J. L. Bentley, CACM 1975.
  • Works as follows:
    • Find the element with the median x-coordinate
      in the input data and make it the root.
    • Partition the remaining elements into two
      partitions around this median.
    • Work recursively on each partition.
    • Alternate between the median x-coordinate and
      the median y-coordinate as the partitioning
      element.
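The median-split construction can be sketched as follows for 2-D points. The names are hypothetical, and for brevity the sketch finds each median by sorting, which costs O(n log^2 n) overall; using linear-time median selection instead brings construction down to O(n log n).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class KDNode:
    point: Tuple[float, float]
    axis: int                          # 0 = split on x, 1 = split on y
    left: Optional["KDNode"] = None
    right: Optional["KDNode"] = None


def build_kdtree(points, depth=0):
    """Median element becomes the root; recurse on each side with the other axis."""
    if not points:
        return None
    axis = depth % 2                   # alternate between x and y
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2                # index of the median element
    return KDNode(pts[mid], axis,
                  build_kdtree(pts[:mid], depth + 1),
                  build_kdtree(pts[mid + 1:], depth + 1))


root = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(root.point, root.left.point, root.right.point)  # (7, 2) (5, 4) (9, 6)
```

Because every split is at a median, each subtree holds at most half of its parent's elements, so the tree height is logarithmic in the input size.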

15
A greedy algorithm for R-tree bulk loading
  • By Y. J. Garcia et al., GIS 1998.
  • Builds the R-tree top-down, greedily
    constructing the subtrees of the R-tree.
  • Works as follows:
    • Apply a cut orthogonal to an axis such that a
      user-specified objective function f(r1, r2) is
      minimized, where r1 and r2 are the MBRs of the
      two resulting subsets.
    • Apply the algorithm recursively to the two
      resulting subsets.
    • Stop when a subset has a pre-specified
      cardinality.
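The top-down greedy cut can be sketched as below. Assumptions not taken from the slide: the objective f is the sum of the two MBR areas, candidate cuts are limited to positions that are multiples of the node capacity after sorting by center along each axis, and a subtree is represented as a plain nested list. All names are hypothetical.

```python
def mbr(rects):
    """Minimum bounding rectangle of (xmin, ymin, xmax, ymax) tuples."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def area_sum(r1, r2):
    """Assumed objective f(r1, r2): total area of the two MBRs."""
    return area(r1) + area(r2)


def greedy_load(rects, capacity, f=area_sum):
    """Return a leaf (list of rectangles) or a pair [left_subtree, right_subtree]."""
    if len(rects) <= capacity:
        return list(rects)                        # pre-specified cardinality reached
    best = None
    for axis in (0, 1):                           # 0: cut orthogonal to x, 1: to y
        # Sort by center along this axis (xmin + xmax is twice the center).
        ordered = sorted(rects, key=lambda r: r[axis] + r[axis + 2])
        for i in range(capacity, len(ordered), capacity):
            left, right = ordered[:i], ordered[i:]
            cost = f(mbr(left), mbr(right))
            if best is None or cost < best[0]:
                best = (cost, left, right)
    _, left, right = best
    return [greedy_load(left, capacity, f), greedy_load(right, capacity, f)]
```

Passing a different `f`, for example one that penalizes overlap between `r1` and `r2`, changes only the cut selection, which reflects the slide's point that the objective is user-specified.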

16
Bulk loading weight-balanced trees
  • By P. K. Agarwal et al., ICALP 2001.
  • WP trees are a class of trees that guarantee
    that each child of a node holds no more than a
    specified fraction of the total number of
    elements in the subtree rooted at that node
    (e.g., the KD tree).
  • Works as follows:
    • Apply a grid to the input data and count the
      number of elements in each grid cell.
    • Use the cell counts to find the median without
      reading the whole input data.
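The grid idea can be illustrated in one dimension: once the per-cell counts are known, the rank of the median identifies the single cell that holds it, so only that cell has to be read and sorted. This is a simplified sketch, not Agarwal et al.'s algorithm; the cell boundaries are assumed to be given, and the function names are hypothetical.

```python
import bisect


def build_grid(xs, boundaries):
    """Assign each value to a grid cell; boundaries are the sorted cell edges (assumed given)."""
    cells = [[] for _ in range(len(boundaries) + 1)]
    for x in xs:
        cells[bisect.bisect_right(boundaries, x)].append(x)
    return cells


def median_from_grid(cells):
    """Lower median, using the cell counts to pick the one cell that must be read."""
    counts = [len(c) for c in cells]   # in practice only these counts stay in memory
    target = (sum(counts) - 1) // 2    # 0-based rank of the lower median
    seen = 0
    for i, c in enumerate(counts):
        if seen + c > target:
            return sorted(cells[i])[target - seen]   # read and sort just this cell
        seen += c
    raise ValueError("empty input")


print(median_from_grid(build_grid([7, 1, 9, 3, 5, 8, 2], [3, 6])))  # 5
```

With the counts in hand, the same reasoning can be repeated for medians of sub-ranges without rescanning every element.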

17
Buffer based bulk loading
  • These algorithms attach buffers to different
    nodes of the tree, based on the idea of the
    buffer tree.
  • Operations are first grouped into buffers and
    then performed at once.
  • Examples:
    • A generic algorithm for the class of Grow and
      Post trees.
    • Bulk operations on R-trees.

18
The buffer tree
  • By L. Arge, BRICS Report Series 1996.
  • A technique for optimal I/O algorithms.
  • Insertion:
    • Instead of inserting one element at a time,
      wait until a group of insertions has been
      collected and insert them into the buffer of
      the root.
    • When the buffer runs full, the elements in the
      buffer are pushed one level down to buffers on
      the next level.
    • Amortized cost: $O\left(\frac{\log_m n}{B}\right)$
      I/Os per insertion, where N is the number of
      elements, M the memory size, B the block size,
      $n = N/B$, and $m = M/B$.
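The buffering idea can be sketched with a tree whose shape is fixed in advance. This is a deliberately simplified illustration, not Arge's structure: there is no node splitting or rebalancing, leaves store keys directly, the buffer size is an arbitrary parameter, and the class and method names are hypothetical. It counts buffer-emptying steps, each of which moves a whole batch of elements one level down at once.

```python
import bisect


class Node:
    """Internal node when children is set; otherwise a leaf that stores keys."""

    def __init__(self, pivots=(), children=None):
        self.pivots = list(pivots)    # children[i] gets keys in [pivots[i-1], pivots[i])
        self.children = children
        self.buffer = []              # pending insertions at an internal node
        self.items = []               # keys stored at a leaf


class BufferTree:
    def __init__(self, root, buffer_size):
        self.root = root              # must be an internal node
        self.buffer_size = buffer_size
        self.emptyings = 0            # number of buffer-emptying steps

    def insert(self, key):
        """Collect insertions in the root buffer instead of descending per key."""
        self.root.buffer.append(key)
        if len(self.root.buffer) >= self.buffer_size:
            self._empty(self.root)

    def _empty(self, node):
        """Push the whole buffer one level down, then empty any child buffer that ran full."""
        self.emptyings += 1
        batch, node.buffer = node.buffer, []
        for key in batch:
            child = node.children[bisect.bisect_right(node.pivots, key)]
            (child.items if child.children is None else child.buffer).append(key)
        for child in node.children:
            if child.children is not None and len(child.buffer) >= self.buffer_size:
                self._empty(child)

    def flush(self, node=None):
        """Empty every buffer, top-down, so all keys reach the leaves."""
        node = self.root if node is None else node
        if node.children is None:
            return
        if node.buffer:
            self._empty(node)
        for child in node.children:
            self.flush(child)
```

For example, a root with pivot 50 over two internal nodes with pivots 25 and 75 routes the keys 0 to 99 into four leaves; after `flush()`, each leaf holds exactly its key range, and far fewer emptying steps than insertions have taken place.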

19
A generic algorithm for bulk loading GP trees
  • By J. V. Bercken et al., VLDB 2001.
  • Works for the class of Grow and Post trees.
  • Works as follows:
    • Insert items into the tree until the memory is
      full.
    • Assign buffers to the leaves.
    • Distribute the remaining items into the leaf
      buffers.
    • Flush the memory.
    • Work recursively on each leaf, using the leaf as
      the root and its associated buffer as the input
      source.
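A one-dimensional sketch of this recursive scheme follows, in which the in-memory tree built from the first `memory` items is modelled simply as sorted leaves of `leaf_size` keys with separators. Real Grow and Post trees (such as B-trees or R-trees) would run their own insertion algorithm in memory; the parameters and names here are hypothetical, and `memory` must exceed `leaf_size` so that each recursive input shrinks.

```python
import bisect


def gp_bulk_load(items, memory, leaf_size):
    """Return a leaf (sorted list) or an internal node (separators, children)."""
    if not 1 <= leaf_size < memory:
        raise ValueError("need 1 <= leaf_size < memory")
    if len(items) <= leaf_size:
        return sorted(items)                              # fits in one leaf
    # Insert items until the memory is full (modelled as sorting them into leaves).
    head = sorted(items[:memory])
    leaves = [head[i:i + leaf_size] for i in range(0, len(head), leaf_size)]
    separators = [leaf[0] for leaf in leaves[1:]]
    # Assign a buffer to each leaf and distribute the remaining items into them.
    buffers = [[] for _ in leaves]
    for x in items[memory:]:
        buffers[bisect.bisect_right(separators, x)].append(x)
    # Flush the memory, then recurse: each leaf becomes a root fed by its buffer.
    children = [gp_bulk_load(leaf + buf, memory, leaf_size)
                for leaf, buf in zip(leaves, buffers)]
    return (separators, children)


def contains(node, x):
    """Search the resulting tree: descend through separators, then scan the leaf."""
    while isinstance(node, tuple):
        separators, children = node
        node = children[bisect.bisect_right(separators, x)]
    return x in node
```

The design choice the sketch highlights is that the remaining input is never inserted item by item into the full tree; it is routed once into leaf buffers, and each buffer is then processed as an independent, smaller bulk load.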

20
Bulk operations on R-trees
  • By L. Arge et al., Algorithmica 2002.
  • Their algorithm works for bulk insertions and
    bulk deletions as well as for bulk loading.
  • Like the buffer tree, except that buffers are
    attached only to nodes on certain levels of the
    tree.
  • They proved that their bulk loading algorithm is
    I/O-optimal, matching the lower bound.

21
Conclusion
  • Multidimensional data is used in many
    applications.
  • Bulk loading is an important operation in both
    static and dynamic databases.
  • Bulk loading algorithms are either specific to
    one index structure or generic for a whole class
    of indexes.
  • Bulk loading algorithms can be categorized into
    three categories:
    • Sort-based bulk loading.
    • Partitioning-based bulk loading.
    • Buffer-based bulk loading.

22
References
  • L. Arge. The buffer tree: A new technique for
    optimal I/O algorithms. BRICS Report Series,
    1996.
  • P. K. Agarwal et al. A framework for index bulk
    loading and dynamization. ICALP 2001.
  • L. Arge et al. Efficient bulk operations on
    dynamic R-trees. Algorithmica 2002.
  • J. L. Bentley. Multidimensional binary search
    trees used for associative searching. CACM 1975.
  • P. Ciaccia et al. Bulk loading the M-tree. ADC
    1998.

23
References (Contd)
  • J. V. Bercken et al. An evaluation of generic
    bulk loading techniques. VLDB 2001.
  • Y. J. Garcia et al. A greedy algorithm for bulk
    loading R-trees. GIS 1998.
  • G. R. Hjaltason et al. Improved bulk loading
    algorithms for quadtrees. GIS 1999.
  • I. Kamel et al. On packing R-trees. CIKM 1993.
  • T. M. Klein et al. Optimal B-tree packing. IS
    1991.
  • S. T. Leutenegger et al. STR: A simple and
    efficient algorithm for R-tree packing. ICDE
    1997.