All-nearest-neighbors Queries in Spatial Databases

1
All-nearest-neighbors Queries in Spatial Databases
SSDBM 2004, June 23, 2004
2
All-nearest-neighbors Queries
  • All-nearest-neighbors queries: given two
    multidimensional datasets A and B, for each object
    in A, find its nearest neighbor in B
  • Applications of all-nearest-neighbors queries:
    Geographic Information Systems (GIS), data
    mining, etc.
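To pin down the definition above, here is a brute-force reference in Python. It is a sketch, not one of the methods in these slides: it assumes 2-D points stored as (x, y) tuples and simply performs |A| * |B| distance computations, which is exactly the cost the index-based and hash-based methods try to avoid. The function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def all_nearest_neighbors(A: Sequence[Point], B: Sequence[Point]) -> List[int]:
    """For each object in A, return the index of its nearest neighbor in B.

    Brute-force reference: |A| * |B| distance computations, no index.
    Ties are broken in favor of the smaller index in B.
    """
    if not B:
        raise ValueError("B must contain at least one object")
    return [min(range(len(B)), key=lambda j: math.dist(a, B[j])) for a in A]
```

For example, `all_nearest_neighbors([(0, 0), (5, 5)], [(1, 0), (6, 5), (0, 3)])` returns `[0, 1]`.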

3
R-trees and NN Search
  • R-trees [Gut84]: multi-dimensional index
    structure
  • Nearest neighbor (NN) queries
    • Depth-first NN query processing [RKV95]
    • Best-first NN query processing [HS99]:
      incremental, optimal I/O cost
  • Closest pair queries [CMTV00]: find the closest
    object pairs from A and B
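The best-first strategy [HS99] can be sketched with a priority queue keyed on mindist, the smallest possible distance from the query point to anything inside an entry's MBR. The `Node` class and `bulk_load` function below are hypothetical stand-ins for an R-tree (a toy sort-by-x bulk loader, not Guttman's insertion algorithm); only the search loop in `best_first_nn` is meant to illustrate the technique.

```python
import heapq
import math
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Node:
    mbr: Rect
    children: List["Node"] = field(default_factory=list)  # empty for leaf nodes
    points: List[Point] = field(default_factory=list)     # filled only in leaf nodes


def enclose(rects: Sequence[Rect]) -> Rect:
    """Smallest rectangle enclosing all given rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def mindist(q: Point, r: Rect) -> float:
    """Lower bound on the distance from q to anything stored under rectangle r."""
    dx = max(r[0] - q[0], 0.0, q[0] - r[2])
    dy = max(r[1] - q[1], 0.0, q[1] - r[3])
    return math.hypot(dx, dy)


def bulk_load(points: Sequence[Point], fanout: int = 4) -> Node:
    """Toy bulk loader (hypothetical): sort by x, pack leaves, then pack parents."""
    if not points or fanout < 2:
        raise ValueError("need at least one point and fanout >= 2")
    pts = sorted(points)
    level = [Node(enclose([(x, y, x, y) for x, y in pts[i:i + fanout]]),
                  points=pts[i:i + fanout])
             for i in range(0, len(pts), fanout)]
    while len(level) > 1:
        level = [Node(enclose([c.mbr for c in level[i:i + fanout]]),
                      children=level[i:i + fanout])
                 for i in range(0, len(level), fanout)]
    return level[0]


def best_first_nn(root: Node, q: Point) -> Tuple[Point, float]:
    """Best-first NN search: always expand the queued entry with the smallest mindist."""
    tie = 0  # unique counter so heapq never has to compare Node objects
    heap = [(mindist(q, root.mbr), tie, root, None)]
    while heap:
        dist, _, node, point = heapq.heappop(heap)
        if point is not None:
            # A popped point is no farther than every entry still in the queue.
            return point, dist
        for child in node.children:
            tie += 1
            heapq.heappush(heap, (mindist(q, child.mbr), tie, child, None))
        for p in node.points:
            tie += 1
            heapq.heappush(heap, (math.dist(q, p), tie, None, p))
    raise ValueError("the tree contains no points")
```

With `root = bulk_load(pts)`, a call such as `best_first_nn(root, (0.5, 0.5))` returns the nearest point and its distance; nodes whose mindist exceeds that distance are never expanded.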

4
All-nearest-neighbors Queries with CP
  • All-NN query processing with CP queries [HS98,
    CMTV01]
    • A closest pair query is performed on datasets A
      and B
    • A bitmap S0 of size |A| indicates, for each
      point in A, whether its nearest neighbor has
      been found
    • If an object pair <ai, bj> is the first pair
      reported for ai, then bj is the NN of ai
    • The algorithm terminates when S0[ai] = 1 for
      all ai ∈ A
  • The disadvantage of ANN with CP
    • If there is an object ai in A whose
      NNdist(ai, B) is large, all the entry pairs
      with distance smaller than NNdist(ai, B) must
      be visited
    • Inefficient
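The bitmap-based termination rule can be sketched as follows. As an assumption for brevity, the pairs are produced by heapifying all |A| * |B| distances rather than by an incremental closest-pair query over two R-trees [CMTV00]; the list `found` plays the role of the bitmap S0, and the function name is hypothetical.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def ann_via_closest_pairs(A: Sequence[Point], B: Sequence[Point]) -> Tuple[List[int], int]:
    """ANN by consuming (ai, bj) pairs in ascending distance order.

    Returns (nn, consumed): nn[i] is the index in B of ai's nearest neighbor and
    consumed is how many pairs were popped before every ai was resolved.
    Stand-in: all |A| * |B| pairs are heapified up front instead of being
    produced incrementally by a closest-pair query on two R-trees.
    """
    pairs = [(math.dist(a, b), i, j) for i, a in enumerate(A) for j, b in enumerate(B)]
    heapq.heapify(pairs)
    found = [False] * len(A)  # the bitmap S0
    nn = [-1] * len(A)
    remaining, consumed = len(A), 0
    while remaining and pairs:
        _, i, j = heapq.heappop(pairs)
        consumed += 1
        if not found[i]:  # first pair reported for ai, so bj is its NN
            found[i] = True
            nn[i] = j
            remaining -= 1
    return nn, consumed
```

For `A = [(0, 0), (1, 0), (100, 0)]` and `B = [(0, 1), (1, 1)]` the call returns `([0, 1, 1], 5)`: the first two objects are resolved after 2 pairs, but the distant object (100, 0) forces the loop to consume 5 of the 6 pairs, which is the inefficiency described above.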

5
Index-based ANN: Multiple NN
  • All-nearest-neighbors query processing with
    multiple nearest neighbor queries (assume dataset
    B is indexed)
    • Perform |A| NN queries (one for each object in
      A) on R-tree RB
    • The order of the NN queries is important if
      there is an LRU buffer
      • If A is not indexed, we sort the objects using
        a space-filling curve (e.g., Hilbert) and
        visit them in this order
      • If A is indexed, we traverse the R-tree RA
        following the Hilbert order of the entries
        (with respect to the center of each entry's MBR)
  • Some observations about multiple NN
    • The dominant cost is the CPU cost
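A sketch of MNN with a Hilbert-ordered query sequence, for the case where A is not indexed. `hilbert_index` is the standard iterative conversion from a grid cell to its position along a Hilbert curve; the grid resolution (`order`), the scaling of A's bounding box onto the grid, and the `nn_query` callback (standing in for an NN query on RB) are assumptions of this sketch, and all function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def hilbert_index(order: int, x: int, y: int) -> int:
    """Position of grid cell (x, y) along a Hilbert curve over a 2**order x 2**order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate/reflect so the sub-quadrant has the canonical orientation
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def hilbert_order(A: Sequence[Point], order: int = 16) -> List[int]:
    """Indices of A sorted by the Hilbert value of each point's grid cell."""
    if not A:
        return []
    xs = [p[0] for p in A]
    ys = [p[1] for p in A]
    xmin, xmax, ymin, ymax = min(xs), max(xs), min(ys), max(ys)
    cells = (1 << order) - 1

    def key(p: Point) -> int:
        # Scale A's bounding box onto the integer grid [0, cells] x [0, cells].
        gx = int((p[0] - xmin) / ((xmax - xmin) or 1.0) * cells)
        gy = int((p[1] - ymin) / ((ymax - ymin) or 1.0) * cells)
        return hilbert_index(order, gx, gy)

    return sorted(range(len(A)), key=lambda i: key(A[i]))


def multiple_nn(A: Sequence[Point], nn_query: Callable[[Point], int],
                order: int = 16) -> List[int]:
    """MNN: one NN query per object of A, issued in Hilbert order so that
    consecutive queries tend to touch the same pages of B's index (LRU hits)."""
    result = [-1] * len(A)
    for i in hilbert_order(A, order):
        result[i] = nn_query(A[i])
    return result
```

Any NN routine over B can be passed as `nn_query`; the Hilbert ordering changes only the order in which the queries are issued, not their results.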

6
Index-based ANN: Batched NN
  • All-nearest-neighbors query processing with
    batched nearest neighbor (BNN) queries
    • Retrieves the nearest neighbors for a group of
      objects at a time
    • Reduces the number of distance computations
    • Applies optimizations such as plane sweep
  • Several criteria for BNN grouping
    • The number of points in each group should be
      maximized
    • The MBR of each group should be minimized
    • Each group should be small enough to fit in
      memory
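The slides mention plane sweep without details. One common form, sketched here under that assumption, sorts the candidate objects of B by x once per group and, for each object of the group, scans outward in x, stopping as soon as the x-gap alone is at least the best distance found so far. The function name and the choice of the x-axis are illustrative.

```python
import bisect
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def plane_sweep_nn(group: Sequence[Point],
                   candidates: Sequence[Point]) -> List[Optional[Point]]:
    """Nearest candidate for every point of a group, pruning on the x-axis.

    Candidates are sorted by x once for the whole group. Each query walks outward
    from its own x position, always taking the side with the smaller x-gap, and
    stops once that gap alone is at least the best distance found so far.
    """
    cand = sorted(candidates)
    xs = [c[0] for c in cand]
    result: List[Optional[Point]] = []
    for q in group:
        best: Optional[Point] = None
        best_d = math.inf
        hi = bisect.bisect_left(xs, q[0])
        lo = hi - 1
        while True:
            gap_lo = q[0] - xs[lo] if lo >= 0 else math.inf
            gap_hi = xs[hi] - q[0] if hi < len(cand) else math.inf
            if min(gap_lo, gap_hi) >= best_d:
                break  # every remaining candidate is at least best_d away
            if gap_lo <= gap_hi:
                c, lo = cand[lo], lo - 1
            else:
                c, hi = cand[hi], hi + 1
            d = math.dist(q, c)
            if d < best_d:
                best, best_d = c, d
        result.append(best)
    return result
```

Sorting once per group is what makes the batch pay off: the sort cost is shared by all objects in the group, while each query usually touches only a narrow x-band of candidates.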

7
Forming Groups in Batched NN
  • Take the objects one by one until any of the
    following constraints is violated
    • The number of objects in the group is smaller
      than max_num
    • The area of the grouping MBR is smaller than
      max_area
  • Determining max_num and max_area
    • max_num is determined by the available memory
    • max_area is related to the density of dataset
      B in the area
      • Perform an NN search for the first object in
        the group
      • Estimate max_area as the average area of the
        leaf nodes of B visited in the search
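The greedy grouping rule can be sketched as below, assuming the input is already in Hilbert (or another locality-preserving) order. Two further assumptions: `max_num` is treated as the group capacity, and `max_area` is passed in directly rather than estimated from the leaf nodes of B visited by an NN search for the group's first object. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def form_groups(ordered: Sequence[Point], max_num: int,
                max_area: float) -> List[List[Point]]:
    """Cut a locality-preserving sequence of points into BNN groups.

    A point joins the current group only while the group holds fewer than max_num
    points and the enlarged MBR area stays below max_area; otherwise a new group starts.
    """
    groups: List[List[Point]] = []
    current: List[Point] = []
    x0 = y0 = x1 = y1 = 0.0  # MBR of the current group
    for x, y in ordered:
        if current:
            nx0, ny0, nx1, ny1 = min(x0, x), min(y0, y), max(x1, x), max(y1, y)
            if len(current) < max_num and (nx1 - nx0) * (ny1 - ny0) < max_area:
                current.append((x, y))
                x0, y0, x1, y1 = nx0, ny0, nx1, ny1
                continue
            groups.append(current)  # a constraint would be violated: close the group
        current = [(x, y)]
        x0, y0, x1, y1 = x, y, x, y
    if current:
        groups.append(current)
    return groups
```

For the points (0, 0), (1, 0), (1, 1), (10, 10), (11, 10) with `max_num=3` and `max_area=5.0`, the area constraint cuts the sequence before (10, 10), giving two groups of three and two points.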

8
Batched NN
  • Take the objects one by one into a group,
    maximizing locality
    • If A is not indexed: sort the objects in A by
      their Hilbert value
    • If A is indexed: start from the root and follow
      entries recursively according to the Hilbert
      value of their centroids
  • Find the NN for the objects in the group
    • Use the NN algorithm (with respect to the MBR
      of the objects)
    • Keep the current nearest neighbor for each
      object
    • The algorithm terminates when no better result
      is possible for any of the objects
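A sketch of the batched NN search for one group. As a simplifying assumption, B is modeled as a flat list of leaf pages (each a list of points) instead of a full R-tree, so only the leaf level of the search is shown. Pages are visited in ascending mindist from the group MBR, each object keeps its current nearest neighbor, and the search stops once no unvisited page can improve any object. Function names are hypothetical.

```python
import heapq
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def bounding_rect(points: Sequence[Point]) -> Rect:
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def rect_mindist(r: Rect, s: Rect) -> float:
    """Smallest distance between any point of r and any point of s (0 if they overlap)."""
    dx = max(s[0] - r[2], 0.0, r[0] - s[2])
    dy = max(s[1] - r[3], 0.0, r[1] - s[3])
    return math.hypot(dx, dy)


def batched_nn(group: Sequence[Point],
               pages: Sequence[Sequence[Point]]) -> Tuple[List[Optional[Point]], int]:
    """NN in B for every object of a non-empty group; returns (NN per object, pages read)."""
    g = bounding_rect(group)
    page_rects = [bounding_rect(pg) if pg else None for pg in pages]
    heap = [(rect_mindist(g, r), k) for k, r in enumerate(page_rects) if r is not None]
    heapq.heapify(heap)
    best = [math.inf] * len(group)
    nn: List[Optional[Point]] = [None] * len(group)
    pages_read = 0
    while heap:
        d, k = heapq.heappop(heap)
        if d >= max(best):
            # mindist(group MBR, page) <= mindist(object, page) for every object,
            # so neither this page nor any later one can improve a result.
            break
        pages_read += 1
        for i, q in enumerate(group):
            if rect_mindist((q[0], q[1], q[0], q[1]), page_rects[k]) >= best[i]:
                continue  # this page cannot improve object i
            for b in pages[k]:
                dist = math.dist(q, b)
                if dist < best[i]:
                    best[i], nn[i] = dist, b
    return nn, pages_read
```

With pages `[[(0, 0), (1, 1)], [(5, 5), (6, 6)], [(100, 100), (101, 101)]]` and the group `[(0.5, 0.2), (5.5, 5.4)]`, only the first two pages are read; the third is pruned because its mindist to the group MBR exceeds both current NN distances.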

9
Non-index-based ANN: a Hash Approach
  • All-nearest-neighbors query processing with
    hashing (assume neither dataset A nor B is
    indexed)
  • An observation: most objects are in the same
    partition as their nearest neighbors

10
Hash ANN Query Processing
  • Load the corresponding buckets and find the NN in
    the bucket
  • Put the border objects into a local pending set
    and a global pending list
  • When a bucket is loaded, process the objects in
    the pending set
  • A second pass may be required to handle buckets
    with a non-empty pending set
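The two-pass hash approach can be sketched with a uniform grid as the hash function. The grid, its cell side `side`, and the function name are assumptions (the slides do not specify the tiling), and for brevity pass 2 checks each border object against the other buckets directly instead of maintaining per-bucket pending sets scheduled with bucket loads. An object counts as a border object when its distance to the nearest B-object in its own cell exceeds its distance to the cell boundary, because a closer object could then lie in a neighboring cell.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]


def hash_ann(A: Sequence[Point], B: Sequence[Point],
             side: float) -> Tuple[List[Optional[Point]], int]:
    """Grid-hash ANN sketch. Returns (NN in B for each object of A, number of border objects)."""

    def cell_of(p: Point) -> Cell:
        return (math.floor(p[0] / side), math.floor(p[1] / side))

    def cell_mindist(p: Point, c: Cell) -> float:
        x0, y0 = c[0] * side, c[1] * side
        dx = max(x0 - p[0], 0.0, p[0] - (x0 + side))
        dy = max(y0 - p[1], 0.0, p[1] - (y0 + side))
        return math.hypot(dx, dy)

    buckets: Dict[Cell, List[Point]] = defaultdict(list)
    for b in B:
        buckets[cell_of(b)].append(b)

    nn: List[Optional[Point]] = [None] * len(A)
    best = [math.inf] * len(A)
    pending: List[int] = []  # global list of border objects

    # Pass 1: match each object only against the B-bucket of its own cell.
    for i, a in enumerate(A):
        c = cell_of(a)
        for b in buckets.get(c, []):
            d = math.dist(a, b)
            if d < best[i]:
                best[i], nn[i] = d, b
        x0, y0 = c[0] * side, c[1] * side
        to_border = min(a[0] - x0, x0 + side - a[0], a[1] - y0, y0 + side - a[1])
        if best[i] > to_border:  # the NN circle crosses the cell border
            pending.append(i)

    # Pass 2: check border objects against every other bucket that is still close enough.
    for i in pending:
        a, home = A[i], cell_of(A[i])
        for c, pts in buckets.items():
            if c == home or cell_mindist(a, c) >= best[i]:
                continue
            for b in pts:
                d = math.dist(a, b)
                if d < best[i]:
                    best[i], nn[i] = d, b
    return nn, len(pending)
```

In line with the observation above, most objects are settled in pass 1; only objects near a cell boundary (or in a cell with no B-objects) reach pass 2.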

11
Optimization of Hash ANN
  • The goal: minimize the number of page accesses
    required by the second pass of the algorithm
  • Define a good tiling and a good scheduling of
    partitions

12
Experiment Settings
  • Simulator written in C
  • CPU: Pentium III 733 MHz
  • Page size: 4 KB
  • LRU buffer: 512 KB
  • Dataset: United States

13
Experiments: Indexed Datasets
14
Experiments: Non-indexed Datasets
15
Experiments: Self ANN
16
Conclusions
  • Existing closest pair techniques are not suitable
    for all-nearest-neighbors queries
  • Depending on whether datasets A and B are indexed,
    we developed different ANN processing methods
  • For indexed datasets, batched nearest neighbor
    (BNN) is more efficient than multiple nearest
    neighbor (MNN)
  • For large non-indexed datasets, hash-based all
    nearest neighbor (HNN) outperforms BNN with the
    tree built on the fly
  • Future work: (1) extend to high-dimensional
    space; (2) extend to
    all-k-nearest-neighbors

17
  • Thank you!
  • Questions