All-nearest-neighbors Queries in Spatial Databases


Slides: 18

Provided by: Bob7151


Transcript and Presenter's Notes


1
All-nearest-neighbors Queries in Spatial
Databases
SSDBM 2004 June 23, 2004
2
All-nearest-neighbors Queries

All-nearest-neighbors (ANN) query: given two
multidimensional datasets A and B, for each object
in A, find its nearest neighbor in B
Applications of all-nearest-neighbors queries:
Geographic Information Systems (GIS), data
mining, etc.
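
As a point of reference for the definition above, here is a minimal brute-force sketch. It assumes points are coordinate tuples held in memory and uses Euclidean distance; the function name `all_nearest_neighbors` is illustrative and does not come from the slides.

```python
import math

def all_nearest_neighbors(A, B):
    """For each point in A, return (index of its NN in B, distance)."""
    result = []
    for a in A:
        best_j, best_d = None, math.inf
        for j, b in enumerate(B):
            d = math.dist(a, b)  # Euclidean distance (Python 3.8+)
            if d < best_d:
                best_j, best_d = j, d
        result.append((best_j, best_d))
    return result

A = [(0, 0), (5, 5)]
B = [(1, 0), (4, 5), (10, 10)]
nn = all_nearest_neighbors(A, B)
```

This costs |A| x |B| distance computations, which is the baseline the indexed and hash-based methods on the later slides try to beat.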

3
R-trees and NN Search

R-trees [Gut84]: multi-dimensional index
structure
Nearest neighbor (NN) queries
Depth-first NN query processing [RKV95]
Best-first NN query processing [HS99]:
incremental, optimal I/O cost
Closest pair queries [CMTV00]: find the closest
object pairs from A and B
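
The best-first idea can be sketched on a simplified, one-level stand-in for an R-tree. This is an assumption for illustration only: `leaves` is a flat list of `(mbr, points)` pairs with `mbr = (xlo, ylo, xhi, yhi)`, not a real R-tree, and the names `mindist` and `best_first_nn` are hypothetical. Leaves are opened in increasing order of their minimum possible distance to the query, and the search stops as soon as no unopened leaf can contain a closer point.

```python
import heapq
import math

def mindist(p, mbr):
    """Smallest possible distance from point p to any point inside mbr."""
    xlo, ylo, xhi, yhi = mbr
    dx = max(xlo - p[0], 0.0, p[0] - xhi)
    dy = max(ylo - p[1], 0.0, p[1] - yhi)
    return math.hypot(dx, dy)

def best_first_nn(q, leaves):
    """Return (nearest point, distance, number of leaves opened)."""
    heap = [(mindist(q, mbr), i) for i, (mbr, _) in enumerate(leaves)]
    heapq.heapify(heap)
    best, best_d, opened = None, math.inf, 0
    # Stop once the closest unopened leaf cannot beat the current best.
    while heap and heap[0][0] < best_d:
        _, i = heapq.heappop(heap)
        opened += 1
        for p in leaves[i][1]:
            d = math.dist(q, p)
            if d < best_d:
                best, best_d = p, d
    return best, best_d, opened

leaves = [((0, 0, 1, 1), [(0, 0), (1, 1)]),
          ((10, 10, 11, 11), [(10, 10), (11, 11)])]
point, dist, opened = best_first_nn((2, 2), leaves)
```

In this example only the first leaf is opened, because the second leaf's lower bound already exceeds the distance to (1, 1).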

4
All-nearest-neighbors Queries with CP

All-NN query processing with CP queries [HS98,
CMTV01]
A closest pair query is performed on datasets A and
B
A bitmap S0 of size |A| is used to indicate, for
each point in A, whether its nearest neighbor has
been found
If an object pair <ai, bj> is the first for ai,
then bj is the NN for ai
The algorithm terminates when S0[ai] = 1 for all
ai ∈ A
The disadvantage of ANN with CP
If there is an object ai in A whose NNdist(ai, B)
is large, all the entry pairs with distance
smaller than NNdist(ai, B) are required to be
visited
Inefficient
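
The bitmap scheme and its weakness can be illustrated with a small sketch. As an assumption made for brevity, the incremental closest-pair query is replaced by sorting all pairs in memory; a real implementation would produce pairs lazily from the two R-trees. The name `ann_by_closest_pairs` is hypothetical.

```python
import math

def ann_by_closest_pairs(A, B):
    """Return (nn, visited): nn[i] is the index in B of A[i]'s NN,
    visited is how many pairs were consumed before termination."""
    # Stand-in for an incremental closest-pair query: all pairs, by distance.
    pairs = sorted((math.dist(a, b), i, j)
                   for i, a in enumerate(A) for j, b in enumerate(B))
    found = [False] * len(A)   # plays the role of the bitmap S0
    nn = [None] * len(A)
    remaining = len(A)
    visited = 0
    for _, i, j in pairs:
        visited += 1
        if not found[i]:       # first pair reported for ai gives its NN
            found[i] = True
            nn[i] = j
            remaining -= 1
            if remaining == 0:
                break
    return nn, visited

A = [(0, 0), (1, 0), (100, 0)]
B = [(0, 1), (1, 1), (2, 1)]
nn, visited = ann_by_closest_pairs(A, B)
```

Here the outlier (100, 0) has a large NN distance, so every pair involving the other two objects has to be consumed before its first pair appears: 7 of the 9 pairs are visited to answer 3 queries.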

5
Index-based ANN: Multiple NN

All-nearest-neighbors query processing with
multiple nearest neighbor queries (assume dataset
B is indexed)
Perform |A| NN queries (one for each object in A)
on R-tree RB
The order of the NN queries is important if there
exists an LRU buffer
If A is not indexed, we sort the objects using a
space filling curve (e.g., Hilbert) and visit
them in this order
If A is indexed, we traverse the R-tree RA
following the Hilbert order of the entry (with
respect to the center of the entry MBR)
Some observations about multiple NN
The dominant cost is the CPU cost
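
Ordering the queries along a Hilbert curve can be sketched as follows. The sketch assumes integer grid coordinates in [0, n) with n a power of two; real coordinates would first have to be scaled onto such a grid. `hilbert_index` uses the widely known bit-manipulation conversion from (x, y) to curve position, and both function names are illustrative.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along the Hilbert curve on an n x n grid
    (n must be a power of two, 0 <= x, y < n)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d

def sort_by_hilbert(points, n):
    """Order the query points so that consecutive queries are spatially close."""
    return sorted(points, key=lambda p: hilbert_index(n, p[0], p[1]))

ordered = sort_by_hilbert([(1, 0), (1, 1), (0, 0), (0, 1)], 2)
```

Issuing the NN queries in this order means consecutive queries tend to touch the same R-tree pages, which is what makes an LRU buffer effective.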

6
Index-based ANN: Batched NN

All-nearest-neighbors query processing with
batched nearest neighbor (BNN) queries
Retrieves the nearest neighbors for a group of
objects at a time
Reduces the number of distance computations
Allows optimizations such as plane sweep
Several criteria for BNN grouping
The number of points in each group should be
maximized
The MBR of each group should be minimized
Each group should be small enough to fit in memory

7
Forming Groups in Batched NN

Take the objects one by one until any of the
following constraints is violated
The number of objects in the group is smaller
than max_num
The area of the grouping MBR is smaller than
max_area

Determine max_num and max_area
max_num is determined by the available memory
max_area is related to the density of the dataset
B in the area
Perform an NN search for the first object in the
group
Estimate the max_area as the average area of the
leaf nodes of B visited in the search
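
The grouping rule on this slide can be sketched as below. It assumes the objects arrive already ordered for locality (for example by Hilbert value) and that `max_num` and `max_area` have been chosen beforehand; estimating `max_area` from the leaf nodes of B is not shown. The sketch reads the constraints as "at most `max_num` objects" and "MBR area at most `max_area`", which is one interpretation of the slide, and `form_groups` is a hypothetical name.

```python
def form_groups(points, max_num, max_area):
    """Greedily cut an ordered point sequence into groups."""
    groups, current = [], []
    xlo = ylo = xhi = yhi = 0.0
    for p in points:
        if current:
            # MBR the group would have if p were added.
            nxlo, nylo = min(xlo, p[0]), min(ylo, p[1])
            nxhi, nyhi = max(xhi, p[0]), max(yhi, p[1])
            area = (nxhi - nxlo) * (nyhi - nylo)
            if len(current) + 1 > max_num or area > max_area:
                groups.append(current)   # a constraint would be violated
                current = []
        if current:
            xlo, ylo, xhi, yhi = nxlo, nylo, nxhi, nyhi
        else:
            xlo = xhi = p[0]             # p starts a new group
            ylo = yhi = p[1]
        current.append(p)
    if current:
        groups.append(current)
    return groups

pts = [(0, 0), (1, 0), (1, 1), (10, 10), (11, 10)]
groups = form_groups(pts, max_num=3, max_area=4)
```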

8
Batched NN

Take the objects one by one into a group,
maximizing locality
If A is not indexed
Sort the objects in A by their Hilbert value
If A is indexed
Start from the root, follow entries recursively
according to the Hilbert value of their
centroids
Find the NN for the objects in the group
Use NN algorithm (with respect to the MBR of the
objects)
Keep the current nearest neighbor for each object
The algorithm terminates when no better result is
possible for all the objects
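
A batched NN search for one group might look like the sketch below. It makes the same simplifying assumption as a flat index: `leaves` is a list of `(mbr, points)` pairs standing in for the leaf level of RB, with `mbr = (xlo, ylo, xhi, yhi)`. Leaves are visited in order of their minimum distance to the group MBR, and the search stops when that lower bound is no smaller than the worst current NN distance in the group. The names `rect_mindist` and `batched_nn` are illustrative, and the group is assumed to be non-empty.

```python
import heapq
import math

def rect_mindist(r, s):
    """Smallest possible distance between a point in r and a point in s."""
    dx = max(s[0] - r[2], 0.0, r[0] - s[2])
    dy = max(s[1] - r[3], 0.0, r[1] - s[3])
    return math.hypot(dx, dy)

def batched_nn(group, leaves):
    """Return (list of NNs, one per group object; number of leaves opened)."""
    xs = [p[0] for p in group]
    ys = [p[1] for p in group]
    gmbr = (min(xs), min(ys), max(xs), max(ys))
    heap = [(rect_mindist(gmbr, mbr), i) for i, (mbr, _) in enumerate(leaves)]
    heapq.heapify(heap)
    best = [(math.inf, None)] * len(group)  # current NN per object
    opened = 0
    while heap:
        lower_bound, i = heapq.heappop(heap)
        # No object in the group can improve: terminate.
        if lower_bound >= max(d for d, _ in best):
            break
        opened += 1
        for k, a in enumerate(group):
            for b in leaves[i][1]:
                d = math.dist(a, b)
                if d < best[k][0]:
                    best[k] = (d, b)
    return [b for _, b in best], opened

leaves = [((0, 0, 1, 1), [(0, 0), (1, 1)]),
          ((4, 1, 5, 3), [(4, 2), (5, 3)]),
          ((20, 20, 21, 21), [(20, 20)])]
neighbors, opened = batched_nn([(2, 2), (3, 2)], leaves)
```

Each leaf is read once for the whole group rather than once per object, which is where the savings over one NN query per object come from.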

9
Non-index-based ANN: a Hash Approach

All-nearest-neighbors query processing with
hashing (assume neither dataset A nor B is
indexed)

An observation: most of the objects are in the
same partition as their nearest neighbors
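
A simple way to picture the partitioning is spatial hashing onto a uniform grid. The slides do not specify the hash function, so the square cells of side `cell` and the name `partition` are assumptions for illustration.

```python
from collections import defaultdict

def partition(points, cell):
    """Hash each point to the grid cell (bucket) that contains it."""
    buckets = defaultdict(list)
    for p in points:
        key = (int(p[0] // cell), int(p[1] // cell))
        buckets[key].append(p)
    return buckets

buckets = partition([(0.5, 0.5), (1.5, 0.2), (0.9, 0.1)], 1.0)
```

Both A and B would be hashed with the same function, so that an object and its likely nearest neighbor land in buckets with the same key.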

10
Hash ANN Query Processing

Load the corresponding buckets and find the NN in
the bucket
For the border objects, put them into a local
pending set and a global pending list
When a bucket is loaded, process the objects in
the pending set
A 2nd pass might be required to handle buckets
with non-empty pending set
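
The two passes can be sketched in memory as follows. This is a simplified reading of the slide, not the authors' algorithm: buckets are uniform grid cells of side `cell`, an object counts as a border object when the NN found in its own bucket is farther away than the nearest cell boundary, and the pending set and global pending list are collapsed into a single `pending` list that is resolved by scanning only buckets close enough to matter. All names are hypothetical.

```python
import math
from collections import defaultdict

def cell_of(p, cell):
    return (int(p[0] // cell), int(p[1] // cell))

def cell_mindist(p, key, cell):
    """Smallest possible distance from p to any point in grid cell `key`."""
    xlo, ylo = key[0] * cell, key[1] * cell
    dx = max(xlo - p[0], 0.0, p[0] - (xlo + cell))
    dy = max(ylo - p[1], 0.0, p[1] - (ylo + cell))
    return math.hypot(dx, dy)

def hash_ann(A, B, cell):
    """Return (nn, num_pending): nn[i] is the NN in B of A[i]."""
    buckets = defaultdict(list)
    for b in B:
        buckets[cell_of(b, cell)].append(b)
    nn = [None] * len(A)
    pending = []
    for i, a in enumerate(A):                      # first pass: own bucket
        cx, cy = cell_of(a, cell)
        best_d, best = math.inf, None
        for b in buckets.get((cx, cy), []):
            d = math.dist(a, b)
            if d < best_d:
                best_d, best = d, b
        border = min(a[0] - cx * cell, (cx + 1) * cell - a[0],
                     a[1] - cy * cell, (cy + 1) * cell - a[1])
        if best_d <= border:
            nn[i] = best                           # no other bucket can do better
        else:
            pending.append((i, best_d, best))      # border object
    for i, best_d, best in pending:                # second pass
        home = cell_of(A[i], cell)
        for key, pts in buckets.items():
            if key == home or cell_mindist(A[i], key, cell) >= best_d:
                continue
            for b in pts:
                d = math.dist(A[i], b)
                if d < best_d:
                    best_d, best = d, b
        nn[i] = best
    return nn, len(pending)

nn, num_pending = hash_ann([(0.5, 0.5), (0.9, 0.5)],
                           [(0.4, 0.5), (1.05, 0.5)], 1.0)
```

In the example the first object is resolved inside its own bucket, while the second sits near the cell border and only finds its true NN, (1.05, 0.5), in the neighboring bucket.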

11
Optimization of Hash ANN

The goal: minimize the number of page accesses
required by the 2nd pass of the algorithm
Define a good tiling and a good scheduling of
partitions

12
Experiment Settings

Simulator written in C
CPU: Pentium III 733 MHz
Page size: 4 KByte
LRU buffer: 512 KByte
Dataset: United States

13
Experiments Indexed Datasets
14
Experiments Non-indexed Datasets
15
Experiments Self ANN
16
Conclusions

Existing closest pair techniques are not suitable
for all-nearest-neighbors queries
Based on whether datasets A and B are indexed, we
developed different ANN processing methods
For indexed datasets, batched nearest neighbor
(BNN) is more efficient than multiple nearest
neighbor (MNN)
For large non-indexed datasets, hash-based all
nearest neighbor (HNN) outperforms BNN with
the tree built on the fly
Future work: (1) extend to high-dimensional
spaces; (2) extend to
all-k-nearest-neighbors queries