Title: Reverse Nearest Neighbor Queries for Dynamic Databases
SIGMOD 2000
- SHOU Yu Tao
- Jan. 10th, 2003
2Outline of the Presentation
- Background
- Nearest neighbor (NN) search algorithm [RKV95]
- Reverse nearest neighbor (RNN) search algorithm [SAA00]
- Other NN-related problems: CNN, RNNA, etc.
- Conclusions
- References
- Q & A
3Background
- RNN(q) returns the set of data points that have the query point q as their nearest neighbor.
- RNN queries have received much interest in recent years because of their growing importance in advanced database applications.
- Example: the fixed wireless telephone access application.
- Load detection problem: count how many users are currently using a specific base station q. If q's load is too heavy, activate an inactive base station to lighten the load of the overloaded base station.
4Nonsymmetrical property of RNN queries
- NN(q) = p does not imply NN(p) = q
- If p is the nearest neighbor of q, then q need not be the nearest neighbor of p (in this case the nearest neighbor of p is r).
- Efficient NN algorithms therefore cannot be applied directly to solve RNN problems; dedicated RNN algorithms are needed.
- A straightforward solution: check for each point whether it has q as its nearest neighbor. This is not suitable for large data sets.
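The straightforward solution can be sketched in a few lines of Python. This is only an illustration of the idea on the slide, not code from [SAA00]: the function names are made up for the example, points are assumed to be coordinate tuples, ties are counted in favor of q, and `math.dist` requires Python 3.8 or later.

```python
import math

def nearest_neighbor(p, points):
    """Return the point in `points` closest to p, skipping p itself."""
    others = [x for x in points if x != p]
    return min(others, key=lambda x: math.dist(p, x))

def rnn_brute_force(q, points):
    """Return every data point that has q as its nearest neighbor.

    Each point is compared against all other points, so the cost is
    quadratic in the size of the data set.
    """
    result = []
    for p in points:
        if p == q:
            continue
        d_q = math.dist(p, q)
        # p is a reverse nearest neighbor of q if no other data point
        # is strictly closer to p than q is.
        if all(math.dist(p, o) >= d_q for o in points if o != p and o != q):
            result.append(p)
    return result

# The asymmetry from the slide: NN(q) = p, but NN(p) = r.
q, p, r = (0.0, 0.0), (2.0, 0.0), (3.0, 0.0)
print(nearest_neighbor(q, [p, r]))     # (2.0, 0.0)
print(nearest_neighbor(p, [q, p, r]))  # (3.0, 0.0)
print(rnn_brute_force(q, [p, r]))      # []
```

The quadratic scan is what makes this approach unsuitable for large data sets.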
5Two versions of RNN problem
- Monochromatic version
- -- all points are of the same color.
- Bichromatic version
- -- the data points are of two categories, say red and blue. The RNN query point q is in one of the categories, say blue. RNN(q) must then determine the red points that have the query point q as their closest blue point.
- -- e.g. the fixed wireless telephone access application: clients/red (e.g. call initiation or termination) and servers/blue (e.g. fixed wireless base stations).
- Static vs. dynamic
- -- whether insertions or deletions of data points are allowed.
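To make the two-color case concrete, here is a minimal brute-force sketch in Python. It is an illustration only: the function name and the sample coordinates are invented for the example, and ties are counted in favor of q.

```python
import math

def bichromatic_rnn(q, red, blue):
    """Return the red points whose closest blue point is q.

    q is treated as a blue point; it may or may not appear in `blue`.
    """
    result = []
    for r in red:
        d_q = math.dist(r, q)
        # r belongs to RNN(q) if no other blue point is strictly closer.
        if all(math.dist(r, b) >= d_q for b in blue if b != q):
            result.append(r)
    return result

# Hypothetical base stations (blue) and clients (red).
stations = [(0.0, 0.0), (10.0, 0.0)]
clients = [(1.0, 0.0), (4.0, 1.0), (9.0, 0.0)]
served = bichromatic_rnn(stations[0], clients, stations)
print(len(served))  # 2
```

In the load detection example, the size of the returned list is the load of base station q.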
6RNN problem this paper concerns
- Monochromatic case
- Dynamic case
- The whole algorithm is based on:
- (1) Geometric observations, which enable a reduction of the RNN problem to the NN problem.
- (2) The NN search algorithm of [RKV95].
- Both RNN(q) and NN(q) are sets of points in the database, while the query point q may or may not correspond to an actual data point in the database.
7Geometric Observations
Let the space around a query point q be divided into six equal regions Si (1 ≤ i ≤ 6) by straight lines intersecting at q. Each Si is therefore the space between two space dividing lines. Proposition 1: For a given 2-dimensional dataset, RNN(q) will return at most six data points. When there are exactly six, they must all lie on the same circle centered at q.
[Figure: three lines L1, L2 and L3 through q divide the plane into six regions S1 to S6.]
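A small Python sketch of how a point could be assigned to one of the six regions. The orientation is an assumption made for the example (S1 starts at the positive x-axis and regions are numbered counterclockwise), as is the rule that a point lying exactly on a dividing line goes to the region that starts at that line; the slides do not fix either choice.

```python
import math

def region_index(q, p):
    """Return i in 1..6 such that p lies in the 60-degree region S_i around q."""
    # Angle of the vector q -> p, normalized to [0, 2*pi).
    angle = math.atan2(p[1] - q[1], p[0] - q[0]) % (2 * math.pi)
    # Each region spans pi/3 radians; min() guards against rounding up to 6.
    return min(int(angle // (math.pi / 3)), 5) + 1

q = (0.0, 0.0)
print(region_index(q, (0.0, 1.0)))   # 2 (90 degrees)
print(region_index(q, (0.0, -1.0)))  # 5 (270 degrees)
```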
8Geometric Observations
- Proposition 2
- In each region Si:
- (1) There exist at most two RNN points.
- (2) If there exist exactly two RNN points in a region Si, then each point must be on one of the space dividing lines through q delimiting Si.
- Proposition 3
- In each region Si, let p = NN(q) in Si. If p is not on a space dividing line, then either NN(p) = q (and then RNN(q) = p in Si) or RNN(q) = null in Si.
9Important result from Observations
- Implications: in a region Si, if the number of results of NN(q) is:
- (1) one point only: if NN(q) is not on the space dividing lines, either the nearest neighbor is also the reverse nearest neighbor, or there is no RNN(q) in Si.
- (2) more than one point (NN(q) returns at most two points per region): these two points must be on the two dividing lines and on the same circle centered at q.
- This gives a criterion for limiting the choice of RNN(q) to one or two points in each of the six regions Si.
- The RNN query has been reduced to the NN query.
10Basic NN Search Algorithm
- Based on the MINDIST metric only
- Returns a single NN(q) result only
11Algorithms in RKV95
- Two metrics are introduced to direct and prune the NN search effectively:
- MINDIST (optimistic)
- MINMAXDIST (pessimistic)
- DFS search
12MINDIST(Optimistic)
- MINDIST(RECT, q): the shortest distance from RECT to the query point q.
- This provides a lower bound on the distance from q to objects in RECT.
- MINDIST guarantees that all points in RECT are at least MINDIST away from the query point q.
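MINDIST can be computed one axis at a time, as in the Python sketch below. The rectangle is assumed to be given by its lower corner `lo` and upper corner `hi` (names chosen for the example), and the function returns the Euclidean distance rather than its square.

```python
import math

def mindist(q, lo, hi):
    """Shortest Euclidean distance from point q to the rectangle [lo, hi].

    Returns 0.0 when q lies inside or on the boundary of the rectangle.
    """
    total = 0.0
    for qi, si, ti in zip(q, lo, hi):
        if qi < si:        # q is below the rectangle on this axis
            total += (si - qi) ** 2
        elif qi > ti:      # q is above the rectangle on this axis
            total += (qi - ti) ** 2
        # otherwise q is within the rectangle's extent: no contribution
    return math.sqrt(total)

print(mindist((5.0, 1.5), (1.0, 1.0), (3.0, 2.0)))  # 2.0
print(mindist((2.0, 1.5), (1.0, 1.0), (3.0, 2.0)))  # 0.0
```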
13MINMAXDIST(Pessimistic)
- MBR property: every face (edge in 2D, rectangle in 3D, hyper-face in higher dimensions) of any MBR contains at least one point of some spatial object in the DB.
- MINMAXDIST: calculate the maximum distance to each face, and choose the minimum.
- It is an upper bound on the minimal distance.
- MINMAXDIST guarantees that the MBR contains at least one object whose distance from q is less than or equal to MINMAXDIST.
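One way to compute MINMAXDIST is sketched below in Python. For each axis it takes the nearer face along that axis and the farthest extent along every other axis, then keeps the smallest result. The corner names `lo` and `hi` are assumptions for the example, and the function returns the Euclidean distance rather than its square.

```python
import math

def minmaxdist(q, lo, hi):
    """Smallest, over all faces of the rectangle [lo, hi], of the largest
    distance from q to a point on that face."""
    n = len(q)
    # Per-axis distance from q to the nearer and to the farther bound.
    near = [min(abs(q[i] - lo[i]), abs(q[i] - hi[i])) for i in range(n)]
    far = [max(abs(q[i] - lo[i]), abs(q[i] - hi[i])) for i in range(n)]
    best = min(
        # Nearer face on axis k, farthest corner of that face on the other axes.
        near[k] ** 2 + sum(far[i] ** 2 for i in range(n) if i != k)
        for k in range(n)
    )
    return math.sqrt(best)

# For q = (0, 0) and the rectangle (1, 1)-(3, 2), the closest face is x = 1
# and its farthest vertex is (1, 2), so the result is sqrt(5).
print(minmaxdist((0.0, 0.0), (1.0, 1.0), (3.0, 2.0)))
```

Only the nearer face on each axis needs to be considered, because the farther face on the same axis can never give a smaller maximum.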
14Illustration of MINMAXDIST
[Figure: an MBR with corners (s1,s2) and (t1,t2) and a query point (q1,q2), with the MINDIST and MINMAXDIST segments marked; axes x and y.]
15Pruning
- Downward pruning (during the descending phase):
- MINDIST(q, M) > MINMAXDIST(q, M') for some other MBR M'
- M can be pruned
- Distance(q, O) > MINMAXDIST(q, M)
- O can be discarded
- Upward pruning (when returning from the recursion):
- MINDIST(q, M) > Distance(q, O)
- M can be pruned
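The three pruning rules can be sketched in Python over a list of branches whose metrics have already been computed. The tuple layout `(name, mindist, minmaxdist)` and the function names are assumptions made for this example.

```python
def prune_downward(abl):
    """Drop every MBR whose MINDIST exceeds the MINMAXDIST of another MBR.

    abl: list of (name, mindist, minmaxdist) tuples.
    """
    if not abl:
        return []
    # The smallest MINMAXDIST bounds the distance to the nearest object.
    bound = min(mm for _, _, mm in abl)
    return [entry for entry in abl if entry[1] <= bound]

def object_can_be_discarded(dist_to_object, abl):
    """An object is discarded if some MBR is guaranteed to hold a closer one."""
    return any(dist_to_object > mm for _, _, mm in abl)

def prune_upward(abl, best_dist):
    """Drop every MBR that cannot contain anything closer than best_dist."""
    return [entry for entry in abl if entry[1] <= best_dist]

abl = [("A", 1.0, 2.0), ("B", 2.5, 4.0), ("C", 1.5, 3.0)]
print([name for name, _, _ in prune_downward(abl)])     # ['A', 'C']
print([name for name, _, _ in prune_upward(abl, 1.2)])  # ['A']
```

Because MINDIST never exceeds MINMAXDIST for the same MBR, the branch with the smallest MINMAXDIST always survives downward pruning.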
16DFS Search on R-Tree
- Traversal: DFS
- Expanding a non-leaf node during the descending phase: order all its children by the metric (MINDIST or MINMAXDIST) and sort them into an Active Branch List (ABL). Apply downward pruning techniques to the ABL to remove unnecessary branches.
- Expanding a leaf node: compare objects to the nearest neighbor found so far. Replace it if the new object is closer.
- On return from the recursion: apply the upward pruning technique.
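The traversal can be sketched in Python on a small in-memory tree. This is a simplified illustration, not the algorithm of [RKV95] in full: the `Node` class is a hypothetical stand-in for an R-tree node, the ABL is ordered by MINDIST only, and downward pruning with MINMAXDIST is left out to keep the example short.

```python
import math

class Node:
    """Hypothetical stand-in for an R-tree node (non-empty by assumption)."""
    def __init__(self, children=None, points=None):
        self.children = children or []
        self.points = points or []
        # The MBR is derived from the points or from the children's corners.
        pts = self.points or [c for ch in self.children for c in (ch.lo, ch.hi)]
        dims = range(len(pts[0]))
        self.lo = tuple(min(p[d] for p in pts) for d in dims)
        self.hi = tuple(max(p[d] for p in pts) for d in dims)

def mindist(q, node):
    """Shortest distance from q to the node's MBR."""
    return math.sqrt(sum(max(lo - x, 0.0, x - hi) ** 2
                         for x, lo, hi in zip(q, node.lo, node.hi)))

def nn_search(node, q, best=(math.inf, None)):
    """Depth-first NN search; returns (distance, point)."""
    if node.points:
        # Leaf: compare each object with the nearest neighbor found so far.
        for p in node.points:
            d = math.dist(q, p)
            if d < best[0]:
                best = (d, p)
        return best
    # Non-leaf: build the Active Branch List, ordered by MINDIST.
    abl = sorted(node.children, key=lambda c: mindist(q, c))
    for child in abl:
        # Upward pruning: after each return from the recursion, branches
        # that cannot beat the current best are skipped. The list is
        # sorted, so all remaining branches can be skipped as well.
        if mindist(q, child) > best[0]:
            break
        best = nn_search(child, q, best)
    return best

root = Node(children=[
    Node(points=[(1.0, 1.0), (2.0, 2.0)]),
    Node(points=[(8.0, 8.0), (9.0, 7.0)]),
    Node(points=[(5.0, 1.0), (6.0, 2.0)]),
])
print(nn_search(root, (5.2, 1.1))[1])  # (5.0, 1.0)
```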
17RNN Algorithm
- Algorithm outline for an RNN(q) query:
- 1. Construct the space dividing lines so that space is divided into 6 regions based on the query point q.
- 2. (a) Traverse the R-tree and find one or two points in each region Si that satisfy the nearest neighbor condition NN(q). This part is also called a conditional NN query.
- (b) Test each candidate point for whether its nearest neighbor is q, and add it to the answer list if the condition is fulfilled.
- 3. Eliminate duplicates in RNN(q).
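The outline can be sketched end to end in Python. To stay self-contained, this example replaces the R-tree traversals of steps 2(a) and 2(b) with linear scans, so it shows the filter-and-verify structure rather than the performance of the real algorithm. The function names, the region orientation (first region starts at the positive x-axis) and the treatment of ties in favor of q are assumptions.

```python
import math

def region_index(q, p):
    """Index 0..5 of the 60-degree region around q that contains p."""
    angle = math.atan2(p[1] - q[1], p[0] - q[0]) % (2 * math.pi)
    return min(int(angle // (math.pi / 3)), 5)

def rnn_by_regions(q, points):
    # Steps 1 and 2(a): keep only the nearest point(s) to q in each region.
    nearest = {}  # region index -> (distance, [candidate points])
    for p in points:
        if p == q:
            continue
        i = region_index(q, p)
        d = math.dist(q, p)
        if i not in nearest or d < nearest[i][0]:
            nearest[i] = (d, [p])
        elif d == nearest[i][0]:
            nearest[i][1].append(p)
    # Step 2(b): a candidate is kept only if no other point is closer to it than q.
    answer = []
    for d, candidates in nearest.values():
        for p in candidates:
            if all(math.dist(p, o) >= d for o in points if o != p and o != q):
                answer.append(p)
    # Step 3: eliminate duplicates.
    return sorted(set(answer))

pts = [(1.0, 0.0), (5.0, 0.0), (0.0, 2.0), (-3.0, -3.0), (-2.5, -3.5)]
print(rnn_by_regions((0.0, 0.0), pts))  # [(0.0, 2.0), (1.0, 0.0)]
```

At most a handful of candidates survive step 2(a), so step 2(b) runs a bounded number of verification queries regardless of the size of the data set.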
18How to find NN(q) in Si
[Figure: query point q, queried region Si, and data points p1 to p7.]
- Brute-force algorithm
- Finds nearest neighbors one after another until one falls in the queried region Si.
- Inefficient (as shown in the figure)!
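A minimal Python sketch of the brute-force approach, which also reports how many neighbors had to be examined. The function name and the `in_region` predicate are hypothetical, and sorting the whole data set stands in for repeatedly asking for the next nearest neighbor.

```python
import math

def conditional_nn_brute_force(q, points, in_region):
    """Walk the points in order of distance from q until one is in the region.

    Returns (point, number_of_points_examined), or (None, len(points))
    when no point lies in the region.
    """
    ordered = sorted(points, key=lambda p: math.dist(q, p))
    for examined, p in enumerate(ordered, start=1):
        if in_region(p):
            return p, examined
    return None, len(points)

# Hypothetical region: the half-plane x < 0.
pts = [(1.0, 0.0), (2.0, 0.0), (0.0, 3.0), (-4.0, 0.0)]
print(conditional_nn_brute_force((0.0, 0.0), pts, lambda p: p[0] < 0))
# ((-4.0, 0.0), 4)
```

In this example every point is examined before one in the region is found, which is the inefficiency the slide points out.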
19How to find NN(q) in Si
- The only difference between the NN algorithm proposed in [RKV95] and the conditional NN algorithm lies in the metric used to sort and prune the list of candidate nodes.
20New MINMAXDIST definition
[Figure: two panels, each showing Mindist(q, M) and Minmaxdist(q, M) for an MBR M and the queried region S.]
MINMAXDIST(q, M, Si) = distance to the furthest vertex on the closest face in Si
MINDIST(q, M, Si) = MINDIST(q, M)
21New metric definition
Mindist(q, M, Si) = Mindist(q, M), because Mindist(q, M) is valid in all cases: it provides a definite lower bound on the distance to data points inside an MBR, although a slightly looser one.
22CNN/NN algorithm difference
- When expanding a non-leaf node during the descending phase:
- NN search
- Order all its children by the metric (MINDIST or MINMAXDIST) and sort them into an Active Branch List (ABL). Apply downward pruning techniques to the ABL to remove unnecessary branches.
- Conditional NN (CNN) search: build a set of lists branchList[i][nodecard]
- 0 ≤ i < num_section: the list of pointers to the children of that node that overlap region (i+1)
- i = num_section: the list contains, for each child, a counter of the total number of sections that overlap this child. The child with the higher counter is visited first, for I/O optimization.
23Other NN-related research
- NN and RNN for moving objects [BJKS02]
- Continuous NN (CNN) [PTS02]
- RNN aggregates (RNNA) over data streams [KMS02]
24Conclusions
- The proposed RNN algorithm is based on the underlying index structure (R-tree), which is also needed to answer NN queries.
- By integrating RNN queries into the framework of already existing access structures, the approach developed in this paper is algorithmic and independent of data structures constructed especially for such queries.
- No additional data structures are necessary, so the space requirement does not increase.
25References
- [RKV95] N. Roussopoulos, S. Kelley, and F. Vincent. Nearest neighbor queries. In SIGMOD, 1995.
- [SAA00] I. Stanoi, D. Agrawal, and A. El Abbadi. Reverse nearest neighbor queries for dynamic databases. In Proceedings of the ACM SIGMOD Workshop on Data Mining and Knowledge Discovery (DMKD), 2000.
- [KM00] F. Korn and S. Muthukrishnan. Influence sets based on reverse nearest neighbor queries. In SIGMOD, 2000.
- [BJKS02] R. Benetis, C. Jensen, G. Karciauskas, and S. Saltenis. Nearest neighbor and reverse nearest neighbor queries for moving objects. In IDEAS, 2002.
- [PTS02] Papadias, Y. Tao, and D. Shen. Continuous nearest neighbor search. In VLDB, 2002.
- [KMS02] F. Korn, S. Muthukrishnan, and D. Srivastava. Reverse nearest neighbor aggregates over data streams. In VLDB, 2002.
26Questions and Answers
27Thank you for attending this presentation!