Title: Nearest Neighbor Search in High Dimensions
1 Nearest Neighbor Search in High Dimensions
- Seminar in Algorithms and Geometry
- Mica Arie-Nachimson and Daniel Glasner
- April 2009
2 Talk Outline
- Nearest neighbor problem
- Motivation
- Classical nearest neighbor methods
- KD-trees
- Efficient search in high dimensions
- Bucketing method
- Locality Sensitive Hashing
- Conclusion
Main results: Indyk and Motwani, 1998; Gionis, Indyk and Motwani, 1999
3 Nearest Neighbor Problem
- Input: a set P of points in $R^d$ (or any metric space).
- Output: given a query point q, find the point p* in P which is closest to q.
[Figure: a query point q and its nearest neighbor p*]
4 What is it good for?
- Many things!
- Examples
- Optical Character Recognition
- Spell Checking
- Computer Vision
- DNA sequencing
- Data compression
5 What is it good for?
- Example: Optical Character Recognition
[Figure: a query digit and labeled digit samples (1, 2, 3, 4, 7, 8) plotted in feature space]
6 What is it good for?
- Example: Spell Checking
[Figure: the query "abaut" among the dictionary words shout, bat, abate, scout, about, boat, able in feature space]
7 What is it good for?
- And many more
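9 Approximate Nearest Neighbor (ε-NN)
- Input: a set P of points in $R^d$ (or any metric space).
- Given a query point q, let
- p* = the point in P closest to q
- r = the distance $\|p^* - q\|$
- Output: some point p' with distance at most $r(1+\epsilon)$ from q.
[Figure: q, its nearest neighbor p* at distance r, and the ball of radius $r(1+\epsilon)$ around q]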
11 Approximate vs. Exact Nearest Neighbor
- Many applications give similar results with approximate NN
- Example from Computer Vision
12 Retiling
Slide from Lihi Zelnik-Manor
13 Exact NNS: 27 sec
Approximate NNS: 0.6 sec
Slide from Lihi Zelnik-Manor
14 Solution Method
- Input: a set P of n points in $R^d$.
- Method: construct a data structure to answer nearest neighbor queries.
- Complexity
- Preprocessing: space and time to construct the data structure
- Query: time to return an answer
15 Solution Method
- Naïve approach
- Preprocessing: O(nd)
- Query time: O(nd)
- Reasonable requirements
- Preprocessing: time and space poly(nd)
- Query time: sublinear in n
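The naïve approach can be sketched as a linear scan. This is a minimal illustration, assuming Euclidean distance and points stored as tuples; the function name `nearest_neighbor` is hypothetical and not from the talk.

```python
import math

def nearest_neighbor(points, q):
    """Return the point of `points` closest to `q` by scanning all n points."""
    best, best_dist = None, math.inf
    for p in points:
        dist = math.dist(p, q)  # Euclidean distance, O(d) work per point
        if dist < best_dist:
            best, best_dist = p, dist
    return best  # total query cost O(nd); no data structure is built
```

Every query touches all n points, which is exactly the cost the sublinear-query requirement is meant to avoid.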
17 Classical nearest neighbor methods
- Tree structures
- kd-trees
- Voronoi diagrams
- Preprocessing: poly(n), exp(d)
- Query: log(n), exp(d)
- Difficult problem in high dimensions
- The solutions still work, but are exp(d)
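18 KD-tree
[Figure: 1-D kd-tree over the points 7, 8, 10, 12, 13, 15, 18 on an axis from 5 to 20. The root splits the points into {7, 8, 10, 12} and {13, 15, 18}; the next level splits them into {7, 8}, {10, 12}, {13, 15} and {18}.]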
19 KD-tree
[Figure: the same 1-D kd-tree with query 17. The nearest point is 18, labeled "min dist 1".]
20 KD-tree
[Figure: the same 1-D kd-tree with query 16. Two candidates are labeled: "min dist 2" (point 18) and then "min dist 1" (point 15), so the search has to visit more than one leaf.]
21 KD-tree
- d > 1: alternate between dimensions
- Example: d = 2
Points: (12,5) (6,8) (17,4) (23,2) (20,10) (9,9) (1,6)
Split on x: left (12,5) (6,8) (1,6) (9,9); right (17,4) (23,2) (20,10)
Deeper levels split on y, then on x again.
23 KD-tree
- d > 1: alternate between dimensions
- Example: d = 2
- NN search
Animated gif from http://en.wikipedia.org/wiki/File:KDTree-animation.gif
24 KD-tree complexity
- Preprocessing: O(nd)
- Query
- O(log n) if points are randomly distributed
- Worst case: $O(k \cdot n^{1-1/k})$, where k is the dimension; almost linear when n is close to k
- Need to search the whole tree
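The kd-tree construction and NN search described above can be sketched as follows. This is a minimal sketch under stated assumptions: it stores one point per internal node and splits at the median, whereas the slides keep the points in the leaves; the names `build_kdtree` and `kd_nearest` are hypothetical.

```python
import math

def build_kdtree(points, depth=0):
    """Build a kd-tree, alternating the split dimension with depth."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2  # median point becomes this node
    return {
        "point": pts[mid],
        "axis": axis,
        "left": build_kdtree(pts[:mid], depth + 1),
        "right": build_kdtree(pts[mid + 1:], depth + 1),
    }

def kd_nearest(node, q, best=None):
    """Return the stored point closest to q (Euclidean distance)."""
    if node is None:
        return best
    p, axis = node["point"], node["axis"]
    if best is None or math.dist(q, p) < math.dist(q, best):
        best = p
    diff = q[axis] - p[axis]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = kd_nearest(near, q, best)
    # Visit the other side only if the splitting plane is closer than the best so far.
    if abs(diff) < math.dist(q, best):
        best = kd_nearest(far, q, best)
    return best

points = [(12, 5), (6, 8), (17, 4), (23, 2), (20, 10), (9, 9), (1, 6)]
tree = build_kdtree(points)
result = kd_nearest(tree, (16, 6))
```

The pruning test is what makes the query fast on well-spread data; in the worst case it fails at every node and the whole tree is searched.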
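26 Sublinear solutions
Method | Preprocessing | Query time
Bucketing | $n^{O(1/\epsilon^2)}$ | $O(\log n)$
LSH | $O(n^{1+1/(1+\epsilon)})$, i.e. $n^{3/2}$ when $\epsilon = 1$ | $O(n^{1/(1+\epsilon)})$, i.e. $\sqrt{n}$ when $\epsilon = 1$
Not counting log n factors
Linear in d
Solve ε-NN by reduction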
27 r-PLEB: Point Location in Equal Balls
- Given n balls of radius r, for every query q, find a ball that q resides in, if one exists.
- If q doesn't reside in any ball, return NO.
[Figure: q lies inside the ball around p1; return p1]
28 r-PLEB: Point Location in Equal Balls
[Figure: q lies outside every ball; return NO]
29 Reduction from ε-NN to r-PLEB
- The two problems are connected
- r-PLEB is like a decision problem for ε-NN
32 Reduction from ε-NN to r-PLEB: Naïve Approach
- Set R = the ratio between the largest and the smallest distance between two points
- Define $r_i \in \{(1+\epsilon)^0, (1+\epsilon)^1, \ldots, R\}$
- For each $r_i$ construct an $r_i$-PLEB
- Given q, find the smallest $r_i$ which gives a YES
- Use binary search to find $r_i$
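34 Reduction from ε-NN to r-PLEB: Naïve Approach
- Correctness
- Stopped at $r_i = (1+\epsilon)^k$
- $r_{i+1} = (1+\epsilon)^{k+1}$
- $(1+\epsilon)^k \le r \le (1+\epsilon)^{k+1}$
[Figure: nested structures r1-PLEB, r2-PLEB, r3-PLEB]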
35 Reduction from ε-NN to r-PLEB: Naïve Approach
- Reduction overhead
- Space: $O(\log_{1+\epsilon} R)$ r-PLEB constructions
- Size of $\{(1+\epsilon)^0, (1+\epsilon)^1, \ldots, R\}$ is $\log_{1+\epsilon} R$
- Query: $O(\log \log_{1+\epsilon} R)$ calls to r-PLEB
Drawback: dependency on R
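The naïve reduction can be sketched as a binary search over a geometric sequence of radii. This is only an illustration under stated assumptions: the r-PLEB oracle `pleb` is a brute-force stand-in rather than a real data structure, and the parameters `r_min` and `r_max` (hypothetical names) are assumed to bracket the true nearest-neighbor distance.

```python
import math

def pleb(points, q, r):
    """Brute-force stand-in for an r-PLEB structure: a center whose ball holds q, or None."""
    for p in points:
        if math.dist(p, q) <= r:
            return p
    return None

def approx_nn(points, q, eps, r_min, r_max):
    """Find the smallest radius r_min * (1+eps)^j that answers YES, by binary search."""
    radii = [r_min]
    while radii[-1] < r_max:
        radii.append(radii[-1] * (1 + eps))
    lo, hi, answer = 0, len(radii) - 1, None
    while lo <= hi:
        mid = (lo + hi) // 2
        hit = pleb(points, q, radii[mid])
        if hit is not None:
            answer, hi = hit, mid - 1  # YES: try a smaller radius
        else:
            lo = mid + 1               # NO: the nearest neighbor is farther away
    return answer

points = [(0, 0), (3, 4), (10, 0)]
q = (3, 3)
found = approx_nn(points, q, eps=0.5, r_min=0.1, r_max=20.0)
```

The binary search is valid because a YES at some radius implies a YES at every larger radius. The number of oracle calls is logarithmic in the number of radii, which is where the dependency on R enters.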
36 Reduction from ε-NN to r-PLEB: Better Approach
- Set $r_{med}$ as the radius which gives n/2 connected components (C.C.)
- Set $r_{top} = 4 n r_{med} \log n / \epsilon$
[Figure: balls of radius $r_{med}$ and $r_{top}$ around the input points]
Har-Peled 2001
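39 Reduction from ε-NN to r-PLEB: Better Approach
- If $q \notin B(p_i, r_{med})$ for all i and $q \in B(p_i, r_{top})$ for some i, set $R = r_{top}/r_{med}$ and perform binary search on $r \in \{(1+\epsilon)^0, (1+\epsilon)^1, \ldots, R\}$
- R is independent of the input points
- If $q \notin B(p_i, r_{med})$ and $q \notin B(p_i, r_{top})$ $\forall i$, then q is far away
- Enough to choose one point from each connected component (C.C.) and continue recursively with these points (accumulating error $1+\epsilon/3$)
- If $q \in B(p_i, r_{med})$ for some i, then continue recursively on the C.C.
[Figure: balls of radius $r_{med}$ and $r_{top}$ around the input points, with the query in each of the three cases]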
48 Reduction from ε-NN to r-PLEB: Better Approach
Complexity overhead: how many r-PLEB queries?
- Binary search case: $O(\log\log R) = O(\log(n/\epsilon))$
- Recursive cases: continue with half of the points
- Total: $O(\log n)$
49 (r,ε)-PLEB: Point Location in Equal Balls
- Given n balls of radius r, for query q:
- If q resides in a ball of radius r, return the ball.
- If q doesn't reside in any ball of radius $(1+\epsilon)r$, return NO.
- If q resides only in the border of a ball (between radius r and $(1+\epsilon)r$), return either the ball or NO.
[Figure: three cases. q inside the ball around p1: return p1. q outside every ball: return NO. q in the border of a ball: return YES or NO.]
53 Bucketing Method
- Apply a grid with cell side $r\epsilon/\sqrt{d}$
- Every ball is covered by at most k cubes
- Can show that $k \le C^d/\epsilon^d$ for some constant $C < 5$
- kn cubes cover all balls
- Finite number of cubes: can use a hash table
- Key: cube, Value: a ball it covers
- Space required: O(nk)
[Figure: grid laid over the balls of an r-PLEB instance]
Indyk and Motwani, 1998
57 Bucketing Method
- Given query q
- Compute the cube it resides in: O(d)
- Find the ball this cube intersects: O(1)
- The center of this ball is an (r,ε)-PLEB answer for q
[Figure: r-PLEB instance with grid cells of side $r\epsilon/\sqrt{d}$. A query inside a ball of radius r gets YES, a query in the border of width $r\epsilon$ gets YES or NO, and a query outside gets NO.]
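The bucketing method can be sketched with a dictionary keyed by grid cell. This is a minimal sketch, assuming Euclidean balls and a small dimension d (the loop enumerates every cell in the bounding box of each ball, which is exponential in d); the names `build_buckets` and `bucket_query` are hypothetical.

```python
import itertools
import math

def build_buckets(points, r, eps):
    """Map every grid cell that intersects some ball B(p, r) to one such center p."""
    d = len(points[0])
    side = r * eps / math.sqrt(d)  # cell diameter is side * sqrt(d) = r * eps
    table = {}
    for p in points:
        lo = [math.floor((x - r) / side) for x in p]
        hi = [math.floor((x + r) / side) for x in p]
        for cell in itertools.product(*(range(a, b + 1) for a, b in zip(lo, hi))):
            # Squared distance from p to the closest point of this cell.
            gap2 = 0.0
            for c, x in zip(cell, p):
                closest = min(max(x, c * side), (c + 1) * side)
                gap2 += (x - closest) ** 2
            if gap2 <= r * r:
                table.setdefault(cell, p)
    return table, side

def bucket_query(table, side, q):
    """Return a center within r(1+eps) of q, or None; O(d) to find the cell, O(1) lookup."""
    cell = tuple(math.floor(x / side) for x in q)
    return table.get(cell)

table, side = build_buckets([(0.0, 0.0), (10.0, 10.0)], r=1.0, eps=0.5)
inside = bucket_query(table, side, (0.5, 0.5))
outside = bucket_query(table, side, (5.0, 5.0))
```

Because a stored cell touches the ball and has diameter $r\epsilon$, any query in that cell is within $r(1+\epsilon)$ of the returned center, which is the (r,ε)-PLEB guarantee.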
60 Bucketing Method: Complexity
- Space required: $O(nk) = O(n \cdot (1/\epsilon)^d)$
- Query time: O(d)
- If $d = O(\log n)$ (or $n = O(2^d)$)
- Space required: $n^{O(\log(1/\epsilon))}$
- Else use dimensionality reduction in $\ell_2$ from d to $\epsilon^{-2}\log n$ dimensions (Johnson-Lindenstrauss lemma)
- Space: $n^{O(1/\epsilon^2)}$
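The space bound for the low-dimensional case follows from the bucket count in a few steps. The derivation below is a sketch, assuming $d \le c \log_2 n$ for some constant $c$ (a hypothetical name) and $\epsilon \le 1/2$, with $C$ the constant from the bound $k \le C^d/\epsilon^d$:

```latex
\begin{aligned}
nk &\le n \left(\frac{C}{\epsilon}\right)^{d}
    \le n \left(\frac{C}{\epsilon}\right)^{c \log_2 n} \\
   &= n \cdot n^{\,c \log_2 (C/\epsilon)}
    = n^{\,1 + c \log_2 (C/\epsilon)}
    = n^{O(\log(1/\epsilon))}.
\end{aligned}
```

The last step uses $\log_2(C/\epsilon) = \log_2 C + \log_2(1/\epsilon) = O(\log(1/\epsilon))$ when $\epsilon \le 1/2$.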
61 Break
63 Locality Sensitive Hashing
- Indyk and Motwani 98; Gionis, Indyk and Motwani 99
- A solution for (r,ε)-PLEB.
- Probabilistic construction, query succeeds with high probability.
- Use random hash functions $g: X \to U$ (some finite range).
- Preserve separation of near and far points with high probability.
64 Locality Sensitive Hashing
- If $\|p - q\| \le r$, then $\Pr[g(p) = g(q)]$ is high
- If $\|p - q\| > (1+\epsilon)r$, then $\Pr[g(p) = g(q)]$ is low
[Figure: ball of radius r around the query]
65 A locality sensitive family
- A family H of functions $h: X \to U$ is called $(P_1, P_2, r, (1+\epsilon)r)$-sensitive for metric $d_X$ if for any p, q:
- if $\|p - q\| < r$ then $\Pr[h(p) = h(q)] > P_1$
- if $\|p - q\| > (1+\epsilon)r$ then $\Pr[h(p) = h(q)] < P_2$
- For this notion to be useful we require $P_1 > P_2$
66 Intuition
- if $\|p - q\| < r$ then $\Pr[h(p) = h(q)] > P_1$
- if $\|p - q\| > (1+\epsilon)r$ then $\Pr[h(p) = h(q)] < P_2$
[Figure: two hash functions h1 and h2 partitioning the points]
Illustration from Lihi Zelnik-Manor
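67 Claim
- If there is a $(P_1, P_2, r, (1+\epsilon)r)$-sensitive family for $d_X$ then there exists an algorithm for (r,ε)-PLEB in $d_X$ with
- Space: $O(dn + n^{1+\rho})$
- Query: $O(d n^{\rho})$
where $\rho = \ln(1/P_1) / \ln(1/P_2)$
When $\epsilon = 1$: space $O(dn + n^{3/2})$, query $O(d\sqrt{n})$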
68 Algorithm: preprocessing
- For i = 1,…,L
- Uniformly select k functions from H
- Set $g_i(p) = (h_1(p), h_2(p), \ldots, h_k(p))$
- Compute $g_i(p)$ for all $p \in P$
- Store resulting values in a hash table
[Figure: each $h_i: R^d \to \{0,1\}$ splits the space into a 0 side and a 1 side]
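70 Algorithm: query
- $S \leftarrow \emptyset$, $i \leftarrow 1$
- While $|S| \le 2L$ (and $i \le L$)
- $S \leftarrow S \cup \{$points in bucket $g_i(q)$ of table $i\}$
- If $\exists p \in S$ s.t. $\|p - q\| \le (1+\epsilon)r$, return p and exit.
- $i \leftarrow i + 1$
- Return NO.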
71 Correctness
- Property I: if $\|q - p^*\| \le r$ then $g_i(p^*) = g_i(q)$ for some $i \in \{1, \ldots, L\}$
- Property II: the number of points $p \in P$ s.t. $\|q - p\| > (1+\epsilon)r$ and $g_i(p) = g_i(q)$ is less than 2L
- We show that $\Pr[\text{I and II hold}] \ge 1/2 - 1/e$
- Choose
- $k = \log_{1/P_2} n$
- $L = n^{\rho}$ where $\rho = \ln(1/P_1)/\ln(1/P_2)$
73 Complexity
- $k = \log_{1/P_2} n$
- $L = n^{\rho}$ where $\rho = \ln(1/P_1)/\ln(1/P_2)$
- Space
- $Ln$ (hash tables) $+\ dn$ (data points) $= O(n^{1+\rho} + dn)$
- Query
- L hash function evaluations + O(L) distance calculations = $O(d n^{\rho})$
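74 Significance of k and L
[Plot: $\Pr[g(p) = g(q)]$ as a function of $\|p - q\|$]
75 Significance of k and L
[Plot: $\Pr[g_i(p) = g_i(q) \text{ for some } i \in \{1, \ldots, L\}]$ as a function of $\|p - q\|$]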
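The effect of k and L on the two plotted probabilities can be computed directly. This is a small sketch, assuming the k functions inside each $g_i$ and the L tables are chosen independently; the function names and the example values $P_1 = 0.9$, $P_2 = 0.6$, $k = 10$, $L = 20$ are hypothetical.

```python
def collide_in_one_table(p, k):
    """Pr[g(p) = g(q)] when each of the k functions collides with probability p."""
    return p ** k

def collide_in_some_table(p, k, L):
    """Pr[g_i(p) = g_i(q) for some i in 1..L] over L independent tables."""
    return 1 - (1 - p ** k) ** L

near = collide_in_some_table(0.9, k=10, L=20)  # a near pair, single-function probability P1
far = collide_in_some_table(0.6, k=10, L=20)   # a far pair, single-function probability P2
```

Raising k pushes both probabilities down, but the far one drops much faster; raising L then restores the near probability while the far one stays small.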
76 Application
- Perform NNS in $R^d$ with $\ell_1$ distance.
- Reduce the problem to NNS in $H^d$, the Hamming cube of dimension d.
- $H^d$ = binary strings of length d.
- $d_{Ham}(s_1, s_2)$ = number of coordinates where $s_1$ and $s_2$ disagree.
77 Embedding $\ell_1^d$ in $H^{d'}$
- w.l.o.g. all coordinates of all points in P are positive integers < C.
- Map integer $i \in \{1, \ldots, C\}$ to (1,1,…,1,0,0,…,0): i ones followed by C − i zeros, so $d' = Cd$.
- Map a vector by mapping each coordinate.
- Example: {(5,3,2), (2,4,1)} → {(11111,11100,11000), (11000,11110,10000)}
78 Embedding $\ell_1^d$ in $H^{d'}$
- Distances are preserved.
- Actual computations are performed in the original space: O(log C) overhead.
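The unary embedding and the claim that distances are preserved can be checked on the example from the slides. This is a minimal sketch; the helper names `unary`, `embed`, `hamming` and `l1` are hypothetical.

```python
def unary(i, C):
    """Map integer i in 1..C to i ones followed by C - i zeros."""
    return [1] * i + [0] * (C - i)

def embed(v, C):
    """Embed an integer vector by concatenating the unary code of each coordinate."""
    return [bit for x in v for bit in unary(x, C)]

def hamming(a, b):
    """Number of coordinates where two bit strings disagree."""
    return sum(x != y for x, y in zip(a, b))

def l1(u, v):
    """l1 distance between two integer vectors."""
    return sum(abs(x - y) for x, y in zip(u, v))

u, v, C = (5, 3, 2), (2, 4, 1), 5
eu, ev = embed(u, C), embed(v, C)
```

Two unary codes of i and j differ in exactly |i − j| positions, so the Hamming distance of the embedded vectors equals the $\ell_1$ distance of the originals.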
79 A sensitive family for the Hamming cube
- $H_d = \{h_i : h_i(b_1, \ldots, b_d) = b_i,\ i = 1, \ldots, d\}$
- If $d_{Ham}(s_1, s_2) < r$, what is $\Pr[h(s_1) = h(s_2)]$?
- at least $1 - r/d$
- If $d_{Ham}(s_1, s_2) > (1+\epsilon)r$, what is $\Pr[h(s_1) = h(s_2)]$?
- at most $1 - (1+\epsilon)r/d$
- $H_d$ is $(1 - r/d,\ 1 - (1+\epsilon)r/d,\ r,\ (1+\epsilon)r)$-sensitive.
- Question: what are these projections in the original space?
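Putting the preprocessing and query steps together with the bit-sampling family gives the following sketch for the Hamming cube. It is an illustration under stated assumptions, not the authors' implementation: points are tuples of bits, k and L are passed in by the caller instead of being derived from n, and the names `build_lsh` and `lsh_query` and the `seed` parameter are hypothetical.

```python
import random

def hamming(a, b):
    """Number of coordinates where two bit tuples disagree."""
    return sum(x != y for x, y in zip(a, b))

def build_lsh(points, k, L, seed=0):
    """Build L hash tables; table i is keyed by g_i(p) = (p[c_1], ..., p[c_k])."""
    rng = random.Random(seed)
    d = len(points[0])
    tables = []
    for _ in range(L):
        coords = [rng.randrange(d) for _ in range(k)]  # k functions drawn uniformly from H_d
        buckets = {}
        for p in points:
            buckets.setdefault(tuple(p[c] for c in coords), []).append(p)
        tables.append((coords, buckets))
    return tables

def lsh_query(tables, q, r, eps):
    """Return a point within (1+eps)r of q, or None after inspecting more than 2L far points."""
    L = len(tables)
    far_seen = 0
    for coords, buckets in tables:
        for p in buckets.get(tuple(q[c] for c in coords), []):
            if hamming(p, q) <= (1 + eps) * r:
                return p
            far_seen += 1
            if far_seen > 2 * L:
                return None
    return None

points = [
    (0, 0, 0, 0, 0, 0, 0, 0),
    (1, 1, 1, 1, 0, 0, 0, 0),
    (0, 0, 0, 0, 1, 1, 1, 1),
    (1, 1, 0, 0, 1, 1, 0, 0),
]
tables = build_lsh(points, k=3, L=3)
hit = lsh_query(tables, (1, 1, 1, 1, 0, 0, 0, 0), r=1, eps=1)
miss = lsh_query(tables, (1, 0, 1, 0, 1, 0, 1, 0), r=1, eps=1)
```

A returned point is always within $(1+\epsilon)r$ of the query, so the only possible failure is returning None when a near point exists; the choice of k and L controls how likely that is.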
80 Corollary
- We can bound
- $\rho \le 1/(1+\epsilon)$
- Space: $O(dn + n^{1+1/(1+\epsilon)})$
- Query: $O(d n^{1/(1+\epsilon)})$
When $\epsilon = 1$: space $O(dn + n^{3/2})$, query $O(d\sqrt{n})$
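The bound on $\rho$ can be derived from the collision probabilities of the Hamming-cube family. The following is a sketch, writing $x = r/d$ and assuming $0 < (1+\epsilon)x < 1$, with $P_1 = 1 - x$ and $P_2 = 1 - (1+\epsilon)x$:

```latex
\begin{aligned}
(1 - x)^{1+\epsilon} &\ge 1 - (1+\epsilon)x
  && \text{(Bernoulli's inequality)} \\
(1+\epsilon)\,\ln(1 - x) &\ge \ln\bigl(1 - (1+\epsilon)x\bigr)
  && \text{(take logarithms)} \\
\rho = \frac{\ln(1/P_1)}{\ln(1/P_2)}
     = \frac{\ln(1 - x)}{\ln\bigl(1 - (1+\epsilon)x\bigr)}
  &\le \frac{1}{1+\epsilon}
  && \text{(divide by the negative quantity } (1+\epsilon)\ln(1 - (1+\epsilon)x)\text{)}
\end{aligned}
```

Dividing by a negative number flips the inequality, which is why the lower bound on $(1+\epsilon)\ln(1-x)$ becomes an upper bound on $\rho$.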
81 Recent results
- In Euclidean space
- $\rho \le 1/(1+\epsilon)^2 + O(\log\log n / \log^{1/3} n)$ (Andoni and Indyk 2008)
- $\rho \ge 0.462/(1+\epsilon)^2$ (Motwani, Naor and Panigrahy 2006)
- LSH family for $\ell_s$, $s \in [0,2)$ (Datar, Immorlica, Indyk and Mirrokni 2004)
- And many more.
82 Conclusion
- NNS is an important problem with many applications.
- The problem can be efficiently solved in low dimensions.
- We saw some efficient approximate solutions in high dimensions, which are applicable to many metrics.