Navigating Nets: Simple algorithms for proximity search

Provided by: ROBI196

Transcript and Presenter's Notes

1
Navigating Nets: Simple algorithms for proximity search

Robert Krauthgamer (IBM Almaden)
Joint work with James R. Lee (UC Berkeley)

2
A classical problem

Fix a metric space $(X,d)$:
$X$: set of points.
$d$: distance function over $X$.
Near-neighbor search (NNS) [Minsky-Papert]:
Preprocess a given $n$-point subset $S \subseteq X$.
Given a query point $q \in X$, quickly compute the closest point to $q$ among $S$.
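
As a baseline for the problem above, here is a minimal sketch of exact NNS by linear scan. It assumes the metric is available only as a Python callable `d`, a stand-in for a distance oracle; the names `nearest_neighbor` and `line_metric` are illustrative and not from the talk. It makes one oracle call per point of $S$, which is the query cost the rest of the talk tries to beat.

```python
from typing import Callable, Iterable, TypeVar

P = TypeVar("P")

def nearest_neighbor(q: P, S: Iterable[P], d: Callable[[P, P], float]) -> P:
    """Return a point of S closest to q, using one oracle call per point of S."""
    return min(S, key=lambda s: d(q, s))

# Example on the real line with d(x, y) = |x - y| (an assumed toy metric).
line_metric = lambda x, y: abs(x - y)
print(nearest_neighbor(4.2, [1.0, 3.0, 7.0], line_metric))  # prints 3.0
```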

3
Variations on NNS

$(1+\varepsilon)$-approximate nearest neighbor search:
Find $a \in S$ such that $d(q,a) \le (1+\varepsilon)\, d(q,S)$.
Dynamic case:
Allow updates to $S$ (insertions and deletions).
Distributed case:
No central index (e.g., nodes in a network).
Other cost measures (e.g., communication, stretch, load).
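
To make the approximation condition concrete, the following sketch checks whether a candidate answer is a valid $(1+\varepsilon)$-approximate nearest neighbor. It reads the condition as $d(q,a) \le (1+\varepsilon)\, d(q,S)$ with the answer drawn from $S$; the function name `is_approx_nn` and the toy line metric are assumptions for illustration only.

```python
def is_approx_nn(a, q, S, d, eps):
    """True if a is a point of S with d(q, a) <= (1 + eps) * d(q, S)."""
    best = min(d(q, s) for s in S)  # d(q, S), by brute force
    return a in S and d(q, a) <= (1 + eps) * best

line_metric = lambda x, y: abs(x - y)
S = [1.0, 3.0, 7.0]
print(is_approx_nn(3.0, 4.0, S, line_metric, 0.5))  # exact nearest neighbor: True
print(is_approx_nn(7.0, 4.0, S, line_metric, 0.5))  # distance 3 > 1.5 * 1: False
```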

4
General metrics

Only oracle access to distance function $d(\cdot,\cdot)$.
Models a complicated metric or on-demand measurement.
No hashing of coordinates or tuning for a specific metric.
Goal: efficient query (sublinear or polylog time).
Impossible, even if the data set $S$ is a path metric:

[Figure: a path metric on points $1, 2, \ldots, n$.]
What about approximate NNS?
5
Approximate NNS

Hard even for (near) uniform metrics:
$d(x,y) = 1$ for all distinct $x,y \in S$.

But many data sets lack large uniform subsets. Can we quantify this?
6
Abstract dimension

The doubling constant $\lambda_X$ of a metric $(X,d)$ is the minimum $\lambda$ such that every ball can be covered by $\lambda$ balls of half the radius.
The metric is doubling if $\lambda_X = O(1)$.
The (abstract) dimension is $\dim(X) = \log_2 \lambda_X$.
Immediate properties:
$\dim_A(\mathbb{R}^d, \ell_2) = O(d)$.
$\dim_A(X') \le \dim_A(X)$ for all $X' \subseteq X$.
$\dim_A(X) \le \log |X|$. (Equality for a uniform metric.)
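
The definition can be probed numerically on a small finite metric. The sketch below is a crude estimate, not an exact computation: for each ball it builds a cover greedily (so the count is only an upper bound on the best cover of that ball with centers in the ball), and it only looks at the radii passed in. The names `greedy_cover_size` and `estimate_doubling_constant` are illustrative.

```python
def greedy_cover_size(ball, d, radius):
    """Cover `ball` greedily with balls of the given radius centered at its own points."""
    uncovered = list(ball)
    centers = 0
    while uncovered:
        c = uncovered[0]  # any uncovered point becomes a new center
        uncovered = [p for p in uncovered if d(c, p) > radius]
        centers += 1
    return centers

def estimate_doubling_constant(X, d, radii):
    """Largest greedy cover size over all balls B(x, r), x in X, r in radii."""
    worst = 1
    for x in X:
        for r in radii:
            ball = [p for p in X if d(x, p) <= r]
            worst = max(worst, greedy_cover_size(ball, d, r / 2))
    return worst

# Uniform metric on 8 points: every ball of radius 1 needs 8 balls of radius 1/2.
uniform = lambda x, y: 0.0 if x == y else 1.0
print(estimate_doubling_constant(list(range(8)), uniform, [1.0]))  # prints 8
```

For the uniform metric the estimate is exact and its base-2 logarithm is 3, the logarithm of the number of points, which is the equality case noted above.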

7
Illustration

Grid with missing piece

8
Illustration

Grid with missing piece
Low-dimensional manifold (bounded curvature)

9
Illustration

Grid with missing piece
Manifold
Union of curves in Euclidean space

10
Embedding doubling metrics

Theorem [Assouad, 1983; Gupta, K., Lee, 2003]:
Fix $0 < \varepsilon < 1$, and let $(X,d)$ be a doubling metric.
Then $(X, d^{\varepsilon})$ can be embedded with $O(1)$ distortion into $\ell_2^{O(1)}$.
Not true for $\varepsilon = 1$ [Semmes, 1996].
Motivation: Embed $S$ and then apply Euclidean NNS.

11
Our results

Simple data structure for maintaining $S$:
$(1+\varepsilon)$-NNS query time: $(1/\varepsilon)^{O(\dim(S))} \log \Delta$ (for $\varepsilon < 1/2$), where $\Delta = d_{\max}/d_{\min}$ is the normalized diameter of $S$ (typically $\Delta = n^{O(1)}$).
Space: $n \cdot 2^{O(\dim(S))}$.
Dynamic maintenance of $S$:
Insertion / deletion time: $2^{O(\dim(S))} \log \Delta \log\log \Delta$.
Additional properties:
Best possible dependency on $\dim(S)$ (in a certain model).
Oblivious to $\dim(S)$ and robust against bad localities.
Matches/improves known (more specialized) results.
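
The bounds above depend on the normalized diameter of $S$, the ratio of the largest to the smallest interpoint distance. A minimal sketch of computing it by brute force, assuming at least two distinct points and a callable metric `d` (the function name is illustrative):

```python
from itertools import combinations

def normalized_diameter(S, d):
    """Ratio d_max / d_min over all pairs of distinct points of S."""
    dists = [d(x, y) for x, y in combinations(S, 2)]
    return max(dists) / min(dists)

S = [0.0, 1.0, 2.0, 3.0, 8.0]
print(normalized_diameter(S, lambda x, y: abs(x - y)))  # prints 8.0
```

Here the ratio is 8, so only about $\log_2 8 = 3$ distance scales separate the closest pair from the farthest pair.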

12
Nets

Definition: An $r$-net of $X$ is a subset $Y$ with
1. $d(y_1,y_2) \ge r$ for all distinct $y_1,y_2 \in Y$.
2. $d(x,Y) < r$ for all $x \in X \setminus Y$.
(I.e., a maximal $r$-separated subset.)
Note: Compare vs. $\varepsilon$-net.
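
A maximal $r$-separated subset can be built greedily in one pass. The sketch below is one simple way to obtain such a net for a finite point set; it is not claimed to be the construction used in the talk, and the name `greedy_net` is illustrative.

```python
def greedy_net(X, d, r):
    """Return a maximal r-separated subset of X, scanning X in the given order."""
    net = []
    for x in X:
        # Keep x only if it is at distance >= r from every net point chosen so far.
        if all(d(x, y) >= r for y in net):
            net.append(x)
    return net

line_metric = lambda x, y: abs(x - y)
print(greedy_net(list(range(10)), line_metric, 3))  # prints [0, 3, 6, 9]
```

Every rejected point was rejected because some net point lies within distance less than $r$ of it, which is condition 2; condition 1 holds by construction.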

13
More nets

Definition: An $r$-net of $X$ is a subset $Y$ with
1. $d(y_1,y_2) \ge r$ for all distinct $y_1,y_2 \in Y$.
2. $d(x,Y) < r$ for all $x \in X \setminus Y$.
(I.e., a maximal $r$-separated subset.)
Note: Compare vs. $\varepsilon$-net.

14
The data structure

For every $r = 2^i$, let $Y_r$ be an $r$-net of $S$.
Only $O(\log \Delta)$ values of $r$ are non-trivial.

For every $y \in Y_r$ maintain a navigation list
$L_{y,r} = \{ z \in Y_{r/2} : d(y,z) \le 2r \}$
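
A static version of this structure can be sketched directly from the two definitions. The code below is a brute-force illustration under stated assumptions: each net is built greedily from $S$ itself, points are hashable and pairwise distinct, and the scale range is chosen so that the bottom net is all of $S$ and the top net is a single point. The names `greedy_net` and `build_navigating_nets` are illustrative, and nothing here reflects the dynamic maintenance described in the talk.

```python
import math
from itertools import combinations

def greedy_net(points, d, r):
    """Maximal r-separated subset of `points`, scanning in the given order."""
    net = []
    for x in points:
        if all(d(x, y) >= r for y in net):
            net.append(x)
    return net

def build_navigating_nets(S, d):
    """Return (nets, lists): nets[i] is a 2^i-net of S and
    lists[(i, y)] = [z in nets[i - 1] : d(y, z) <= 2 * 2^i] for y in nets[i]."""
    dists = [d(x, y) for x, y in combinations(S, 2)]
    i_lo = math.floor(math.log2(min(dists)))     # 2^i_lo <= d_min: net is all of S
    i_hi = math.ceil(math.log2(max(dists))) + 1  # 2^i_hi > d_max: net is one point
    nets = {i: greedy_net(S, d, 2.0 ** i) for i in range(i_lo, i_hi + 1)}
    lists = {}
    for i in range(i_lo + 1, i_hi + 1):
        r = 2.0 ** i
        for y in nets[i]:
            lists[(i, y)] = [z for z in nets[i - 1] if d(y, z) <= 2 * r]
    return nets, lists

S = [0.0, 1.0, 2.0, 3.0, 8.0]
nets, lists = build_navigating_nets(S, lambda x, y: abs(x - y))
print(nets[1])          # net at scale r = 2: [0.0, 2.0, 8.0]
print(lists[(1, 0.0)])  # scale-1 net points within 4 of 0.0: [0.0, 1.0, 2.0, 3.0]
```

The number of scales stored is about $\log_2$ of the ratio between the largest and smallest interpoint distance, matching the remark that only logarithmically many values of $r$ are non-trivial.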

15
More on the data structure

For every $r = 2^i$, let $Y_r$ be an $r$-net of $S$.
Only $O(\log \Delta)$ values of $r$ are non-trivial.

[Figure label: $Y_{r/2}$]

For every $y \in Y_r$ maintain a navigation list
$L_{y,r} = \{ z \in Y_{r/2} : d(y,z) \le 2r \}$

16
Space requirement

Lemma: $|L_{y,r}| \le 2^{O(\dim(S))}$ for all $y \in Y_r$, $r > 0$.
Proof:
$L_{y,r}$ is contained in a ball of radius $2r$.
This ball can be covered by $\lambda_S^3$ balls of radius $r/4$.
Every point in $L_{y,r} \subseteq Y_{r/2}$ must be covered by a distinct ball.
Hence, $|L_{y,r}| \le \lambda_S^3 = 2^{3\dim(S)}$. ∎
Corollary: Total space is $2^{O(\dim(S))} \cdot n \log \Delta$.
We actually improve it to $2^{O(\dim(S))} \cdot n$.
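
A worked instance of the counting step may help. The exponent 3 comes from halving the radius three times, from $2r$ to $r$ to $r/2$ to $r/4$, with each halving multiplying the number of balls by at most the doubling constant. Assuming a hypothetical doubling constant of 4 (so dimension 2; this value is illustrative and not from the talk):

```latex
\[
|L_{y,r}| \;\le\; \lambda_S^{3} \;=\; 4^{3} \;=\; 64 \;=\; 2^{3 \cdot 2} \;=\; 2^{3\dim(S)} .
\]
```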

17
Back to running example
18
Navigating nets

Let $q$ denote the query point.

19
How to find zr/2?

Assume each $z_r \in Y_r$ is the closest point to $a$ (instead of to $q$).
Then $d(z_r, z_{r/2}) \le r + r/2 = 3r/2$.
And $z_{r/2}$ must be in $z_r$'s list $L_{z_r,r}$.

[Figure: points $z_r$, $a$, and $q$, with radii $r$ and $r/2$.]

For $z_r$ to be the closest $Y_r$ point to $q$,
It suffices that $d(q,a) \le r/4$.
And then $z_r$'s list $L_{z_r,r}$ contains $z_{r/2}$.
Note: $d(q,z_r) \le 3r/2$.
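
The distance bounds on this slide are triangle-inequality steps. Spelled out below, reading the lost relation symbols as $\le$ and using only the covering property of the nets: with $z_r$ and $z_{r/2}$ taken as the net points closest to $a$, they lie within $r$ and $r/2$ of $a$ respectively.

```latex
\begin{align*}
d(z_r, z_{r/2}) &\le d(z_r, a) + d(a, z_{r/2}) \le r + \tfrac{r}{2} = \tfrac{3r}{2} \le 2r
  && \text{so } z_{r/2} \text{ is in the list of } z_r, \\
d(q, z_{r/2}) &\le d(q, a) + d(a, z_{r/2}) \le \tfrac{r}{4} + \tfrac{r}{2} = \tfrac{3r}{4}
  && \text{when } d(q,a) \le \tfrac{r}{4}.
\end{align*}
```

The second line is the bound $d(q, z_r) \le 3r/2$ restated at the next scale $r/2$.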

[Figure: radius $r/4$ and point $z_{r/2}$.]
20
Stopping point

If we find a point $z_r$ with $d(q,z_r) \le 3r/2$,
But not a point $z_{r/2}$ with $d(q,z_{r/2}) \le 3r/4$,
We know that $d(q,S) > r/4$,
Yielding 6-NNS with query time $2^{O(\dim(S))} \log \Delta$.
This can be extended to $(1+\varepsilon)$-NNS.
Similar principles yield insertions and deletions.
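
The descent and its stopping rule can be sketched as follows. This is a simplified single-point walk in the spirit of the slides, not the authors' algorithm, and it makes two assumptions of its own so that the stopping argument follows directly from the triangle inequality: navigation lists use radius $9r/4$ (the slides state $2r$), and the nets extend one scale below the smallest interpoint distance. When the walk stops at scale $r$, the returned point is within $3r/2$ of the query while every point of $S$ is farther than $r/4$, a ratio below 6. All names (`GAMMA`, `build_index`, `query`) are illustrative; points must be hashable and distinct.

```python
import math
from itertools import combinations

GAMMA = 9 / 4  # list radius factor assumed by this sketch (the slides state 2)

def greedy_net(points, d, r):
    net = []
    for x in points:
        if all(d(x, y) >= r for y in net):
            net.append(x)
    return net

def build_index(S, d):
    dists = [d(x, y) for x, y in combinations(S, 2)]
    i_lo = math.floor(math.log2(min(dists))) - 1  # 2^i_lo <= d_min / 2
    i_hi = math.ceil(math.log2(max(dists))) + 1   # 2^i_hi > d_max: single net point
    nets = {i: greedy_net(S, d, 2.0 ** i) for i in range(i_lo, i_hi + 1)}
    lists = {(i, y): [z for z in nets[i - 1] if d(y, z) <= GAMMA * 2.0 ** i]
             for i in range(i_lo + 1, i_hi + 1) for y in nets[i]}
    return i_lo, i_hi, nets[i_hi][0], lists

def query(index, q, d):
    i_lo, i, z, lists = index
    while 1.5 * 2.0 ** i < d(q, z):  # far query: climb until d(q, z) <= 3r/2
        i += 1
    while i > i_lo:
        r = 2.0 ** i
        candidates = lists.get((i, z), [z])  # above the top scale the net is just z
        best = min(candidates, key=lambda c: d(q, c))
        if d(q, best) > 0.75 * r:  # stopping point: no z_{r/2} within 3r/4 of q
            return z
        z, i = best, i - 1         # invariant at the next scale: d(q, z) <= 3(r/2)/2
    return z

S = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
index = build_index(S, math.dist)
print(query(index, (4.6, 5.2), math.dist))  # a point of S within factor 6 of optimal
```

The sketch recomputes distances by brute force inside each list, so it illustrates the control flow and the guarantee rather than the stated running time.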

21
Near-optimality

The basic idea
Consider a uniform metric on $\lambda$ points.
Let the query point be at distance 1 from all of them,
Except for one point whose distance is $1-\varepsilon$.
Finding this point requires (in an oracle model) computing all $\lambda$ distances to $q$.
Can happen at every distance scale $r$.
We get a lower bound of $2^{\Omega(\dim(S))} \log \Delta$.
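
The single-scale argument can be illustrated with an adversarial oracle: it answers 1 for every probe and commits to the special point only when the last unprobed point is asked about, so any deterministic probing order pays for all distances. This is a toy illustration of the idea for one scale, not a proof of the stated bound; the class and function names are made up for the example.

```python
class AdversarialOracle:
    """Distances from a hidden query to `num_points` points of a uniform metric."""

    def __init__(self, num_points, eps):
        self.num_points = num_points
        self.eps = eps
        self.probed = set()
        self.special = None  # fixed only when the last unprobed point is asked about
        self.calls = 0

    def dist_to_query(self, i):
        self.calls += 1
        if self.special is None:
            self.probed.add(i)
            if len(self.probed) == self.num_points:
                self.special = i
        return 1.0 - self.eps if i == self.special else 1.0

def find_special(oracle, order):
    """Probe points in the given order until one is closer than 1."""
    for i in order:
        if oracle.dist_to_query(i) < 1.0:
            return i
    return None

oracle = AdversarialOracle(num_points=16, eps=0.1)
found = find_special(oracle, order=range(16))
print(found, oracle.calls)  # prints: 15 16
```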

22
Related work: general metrics

Let $K_X$ be the smallest $K$ such that
$|B(x,r)| \le K \cdot |B(x,r/2)|$ for all $x \in X$, $r > 0$.
Define the KR-dimension as $\dim_{KR}(X) = \log_2 K_X$.
Randomized exact NNS [Karger-Ruhl '02, Hildrum et al. '04]:
Space: $n \cdot 2^{O(\dim(S))} \log \Delta$.
Query time: $2^{O(\dim(S))} \log \Delta$.
If $\dim_{KR}(S) = O(1)$ the $\log \Delta$ term is actually $O(\log n)$.
Our results extend to this setting:
1. KR-metrics are doubling: $\dim(X) \le 4 \dim_{KR}(X)$.
2. Our algorithms actually give exact NNS.
Assumptions on query distribution [Clarkson '99].
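
For a finite point set the growth constant in the definition above can be computed by brute force. The sketch below assumes closed balls and uses the fact that, for a fixed center, the ratio can only peak at a radius equal to one of the distances from that center; the name `kr_constant` is illustrative.

```python
def kr_constant(X, d):
    """Smallest K with |B(x, r)| <= K * |B(x, r/2)| for all x in X and r > 0,
    for closed balls in the finite metric (X, d)."""
    K = 1.0
    for x in X:
        dx = sorted(d(x, p) for p in X)
        for r in dx:
            if r > 0:
                big = sum(1 for t in dx if t <= r)        # |B(x, r)|
                small = sum(1 for t in dx if t <= r / 2)  # |B(x, r/2)|, counts x itself
                K = max(K, big / small)
    return K

# Eight evenly spaced points on a line: an interior ball of radius 1 holds 3 points,
# while the ball of radius 1/2 around the same center holds only the center.
print(kr_constant(list(range(8)), lambda x, y: abs(x - y)))  # prints 3.0
```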

23
Related work: Euclidean metrics

Exact NNS for $\mathbb{R}^d$:
$O(d^5 \log n)$ query time and $O(n^{d+\delta})$ space [Meiser '93].
$(1+\varepsilon)$-NNS for $\mathbb{R}^d$:
$O((d/\varepsilon)^d \log n)$ query time and $O(dn)$ space by quad-tree like decompositions [AMNSW '94].
Our algorithm achieves similar bounds.
$O(d\,\mathrm{polylog}(dn))$ query time and $(dn)^{O(1)}$ space is useful for higher dimensions [IM '98, KOR '98].

24
Concluding remarks