Navigating Nets: Simple algorithms for proximity search

1
Navigating Nets: Simple algorithms for proximity search
  • Robert Krauthgamer (IBM Almaden)
  • Joint work with James R. Lee (UC Berkeley)

2
A classical problem
  • Fix a metric space (X,d):
  • X is a set of points.
  • d is a distance function over X.
  • Near-neighbor search (NNS) [Minsky-Papert]:
  • Preprocess a given n-point subset S ⊆ X.
  • Given a query point q ∈ X, quickly compute the
    closest point to q among S.
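
As a baseline for the problem statement above, here is a minimal sketch of NNS by linear scan, which spends n distance evaluations per query. The function name and the sample points are illustrative assumptions, not taken from the slides; the metric is passed in as a black-box function.

```python
import math

def nearest_neighbor(S, q, dist):
    """Linear scan: return the point of S closest to q (first one wins on ties)."""
    return min(S, key=lambda p: dist(q, p))

# Illustrative data only: three points in the Euclidean plane.
S = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
q = (0.9, 1.2)
print(nearest_neighbor(S, q, math.dist))  # → (1.0, 1.0)
```

Any preprocessing scheme is worthwhile only if it beats this scan on query time.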

3
Variations on NNS
  • (1+ε)-approximate nearest neighbor search:
  • Find a ∈ S such that d(q,a) ≤ (1+ε) d(q,S).
  • Dynamic case
  • Allow updates to S (insertions and deletions).
  • Distributed case
  • No central index (e.g., nodes in a network).
  • Other cost measures (e.g., communication,
    stretch, load).
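
The approximation guarantee can be stated as a small checker. This is a sketch under the assumption that the candidate answer a is compared against the true distance d(q,S), computed here by brute force; the function name is hypothetical.

```python
def is_approx_nn(a, S, q, dist, eps):
    """True if a is a (1+eps)-approximate nearest neighbor of q with respect to S."""
    best = min(dist(q, p) for p in S)      # d(q, S), by brute force
    return dist(q, a) <= (1 + eps) * best

# Illustrative data only: two points on the real line.
line = lambda x, y: abs(x - y)
print(is_approx_nn(10, [0, 10], 4, line, eps=0.5))   # → True  (6 <= 1.5 * 4)
print(is_approx_nn(10, [0, 10], 4, line, eps=0.25))  # → False (6 >  1.25 * 4)
```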

4
General metrics
  • Only oracle access to the distance function d(·,·).
  • Models a complicated metric or on-demand
    measurement.
  • No hashing of coordinates or tuning for a
    specific metric.
  • Goal: efficient query (sublinear or polylog
    time).
  • Impossible, even if the data set S is a path
    metric.

[Figure: path metric on the points 1, 2, ..., n]

What about approximate NNS?

5
Approximate NNS
  • Hard even for (near) uniform metrics:
  • d(x,y) = 1 for all distinct x,y ∈ S.

But many data sets lack large uniform
subsets. Can we quantify this?

6
Abstract dimension
  • The doubling constant λ_X of a metric (X,d) is the
    minimum λ such that every ball can be covered by
    λ balls of half the radius.
  • The metric is doubling if λ_X = O(1).
  • The (abstract) dimension is dim(X) = log_2 λ_X.
  • Immediate properties:
  • dim(ℝ^d, ℓ_2) = O(d).
  • dim(X') ≤ dim(X) for all X' ⊆ X.
  • dim(X) ≤ log |X|. (Equality for a uniform
    metric.)
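
For a small finite metric, the definition above can be explored numerically. The sketch below does not compute λ_X exactly; it assumes closed balls with centers taken from the point set and returns the size of a greedy cover, which is only an upper bound on λ_X. The function name is hypothetical.

```python
import math
from itertools import combinations

def greedy_doubling_upper_bound(points, dist):
    """Upper-bound the doubling constant of a finite metric by greedy covering."""
    pair = [dist(p, q) for p, q in combinations(points, 2)]
    # Ball memberships only change at these radii, so checking them suffices.
    radii = sorted(set(pair) | {2 * d for d in pair})
    worst = 1
    for x in points:
        for r in radii:
            ball = [p for p in points if dist(x, p) <= r]
            centers = []                   # centers of half-radius balls
            for p in ball:
                if all(dist(p, c) > r / 2 for c in centers):
                    centers.append(p)      # p is not covered yet
            worst = max(worst, len(centers))
    return worst

# Uniform metric on 8 points: every half-radius ball holds a single point.
uniform = lambda a, b: 0.0 if a == b else 1.0
bound = greedy_doubling_upper_bound(list(range(8)), uniform)
print(bound, math.log2(bound))  # → 8 3.0
```

On the uniform metric the bound is tight, matching the equality case dim(X) = log |X| noted above.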

7
Illustration
  • Grid with missing piece

8
Illustration
  • Low-dimensional manifold (bounded curvature)

9
Illustration
  • Union of curves in Euclidean space

10
Embedding doubling metrics
  • Theorem [Assouad, 1983; Gupta, K., Lee, 2003]:
    Fix 0 < ε < 1, and let (X,d) be a doubling metric.
    Then (X,d^ε) can be embedded with O(1) distortion
    into ℓ_2^{O(1)}.
  • Not true for ε = 1 [Semmes, 1996].
  • Motivation: embed S and then apply Euclidean NNS.

11
Our results
  • Simple data structure for maintaining S:
  • (1+ε)-NNS query time: (1/ε)^{O(dim(S))} · log Δ
    (for ε < ½), where Δ = d_max/d_min is the normalized
    diameter of S (typically Δ = n^{O(1)}).
  • Space: n · 2^{O(dim(S))}.
  • Dynamic maintenance of S:
  • Insertion / deletion time: 2^{O(dim(S))} · log Δ ·
    loglog Δ.
  • Additional properties
  • Best possible dependency on dim(S) (in a certain
    model).
  • Oblivious to dim(S) and robust against bad
    localities.
  • Matches/improves known (more specialized) results.

12
Nets
  • Definition: an r-net of X is a subset Y with
  • 1. d(y1,y2) ≥ r for all distinct y1,y2 ∈ Y.
  • 2. d(x,Y) < r for all x ∈ X \ Y.
  • (I.e., a maximal r-separated subset.)
  • Note: compare vs. ε-net.
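
One way to obtain a set satisfying both conditions of the definition is a single greedy pass. This is a sketch of that standard construction, not necessarily how the authors build their nets; the function name is an assumption.

```python
def greedy_net(points, r, dist):
    """Return a maximal r-separated subset of points (an r-net)."""
    net = []
    for p in points:
        # Keep p only if it is at distance >= r from every net point so far.
        if all(dist(p, y) >= r for y in net):
            net.append(p)
    return net

# Illustrative data only: the integers 0..9 on the real line.
line = lambda a, b: abs(a - b)
print(greedy_net(list(range(10)), 4, line))  # → [0, 4, 8]
```

Every rejected point was within distance less than r of some net point, which gives condition 2; accepted points are pairwise at distance at least r, which gives condition 1.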

14
The data structure
  • For every r = 2^i, let Y_r be an r-net of S.
  • Only O(log Δ) values of r are non-trivial.
  • For every y ∈ Y_r maintain a navigation list
  • L_{y,r} = { z ∈ Y_{r/2} : d(y,z) ≤ 2r }.
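
The structure described above can be sketched for a static point set as follows. Assumptions made here for illustration: at least two distinct points, greedy nets built independently at each scale, and scales indexed by the exponent i with r = 2^i. All names are hypothetical, and this static build ignores the dynamic maintenance covered in the talk.

```python
def greedy_net(points, r, dist):
    """Maximal r-separated subset, built by a single greedy pass."""
    net = []
    for p in points:
        if all(dist(p, y) >= r for y in net):
            net.append(p)
    return net

def build_navigating_nets(points, dist):
    """Return (nets, nav): nets[i] = Y_r and nav[(i, y)] = L_{y,r} for r = 2**i."""
    pair = [dist(p, q) for k, p in enumerate(points) for q in points[k + 1:]]
    i_top = 0
    while 2.0 ** i_top <= max(pair):   # a scale above the diameter: |Y_r| = 1
        i_top += 1
    i_bot = i_top
    while 2.0 ** i_bot > min(pair):    # largest scale at which Y_r = S
        i_bot -= 1
    nets = {i: greedy_net(points, 2.0 ** i, dist) for i in range(i_bot, i_top + 1)}
    nav = {(i, y): [z for z in nets[i - 1] if dist(y, z) <= 2 * 2.0 ** i]
           for i in range(i_bot + 1, i_top + 1) for y in nets[i]}
    return nets, nav

# Illustrative data only: the integers 0..15 on the real line.
pts = list(range(16))
nets, nav = build_navigating_nets(pts, lambda a, b: abs(a - b))
print(nets[2])      # → [0, 4, 8, 12]   (the 4-net)
print(nav[(3, 8)])  # → [0, 4, 8, 12]   (4-net points within 16 of y = 8)
```

Only the scales between the minimum and maximum interpoint distance are stored, which is where the O(log Δ) count of non-trivial values of r comes from.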

16
Space requirement
  • Lemma: |L_{y,r}| ≤ 2^{O(dim(S))} for all y ∈ Y_r, r > 0.
  • Proof:
  • L_{y,r} is contained in a ball of radius 2r.
  • This ball can be covered by λ_S^3 balls of radius
    r/4.
  • Every point in L_{y,r} ⊆ Y_{r/2} must be covered by a
    distinct ball.
  • Hence, |L_{y,r}| ≤ λ_S^3 = 2^{3 dim(S)}. ∎
  • Corollary: total space is 2^{O(dim(S))} · n log Δ.
  • We actually improve it to 2^{O(dim(S))} · n.
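
The counting in the proof can be written out step by step. The display below is a reading of the slide's argument, not an additional result: the radius 2r is halved three times, each halving multiplies the number of balls by at most λ_S, and the boundary case of two net points at distance exactly r/2 is ignored, as on the slide.

```latex
\begin{align*}
B(y,2r) &\subseteq \text{at most } \lambda_S \text{ balls of radius } r \\
        &\subseteq \text{at most } \lambda_S^{2} \text{ balls of radius } r/2 \\
        &\subseteq \text{at most } \lambda_S^{3} \text{ balls of radius } r/4, \\[4pt]
|L_{y,r}| &\le \lambda_S^{3} = \bigl(2^{\dim(S)}\bigr)^{3} = 2^{3\dim(S)}, \\[4pt]
\text{total space} &\le \underbrace{n}_{\text{points}} \cdot
   \underbrace{O(\log \Delta)}_{\text{scales}} \cdot
   \underbrace{2^{O(\dim(S))}}_{\text{list size}} .
\end{align*}
```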

17
Back to running example

18
Navigating nets
  • Let q denote the query point.

19
How to find zr/2?
  • Assume each z_r ∈ Y_r is the closest point to a
    (instead of to q).
  • Then d(z_r, z_{r/2}) ≤ r + r/2 = 3r/2.
  • And z_{r/2} must be in z_r's list L_{z_r,r}.

[Figure: q, a, z_r and z_{r/2}, with radii r, r/2 and r/4]

  • For z_r being the closest Y_r point to q:
  • It suffices that d(q,a) ≤ r/4.
  • And then z_r's list L_{z_r,r} contains z_{r/2}.
  • Note: d(q,z_r) ≤ 3r/2.
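
The three distance bounds used on this slide follow from the triangle inequality and the covering property of nets. The derivation below assumes that a denotes the nearest neighbor of q in S (the transcript does not define a explicitly), that z_r and z_{r/2} are the net points closest to q, and that d(q,a) ≤ r/4.

```latex
\begin{align*}
d(q, z_r)       &\le d(q,a) + d(a, Y_r)     \le \tfrac{r}{4} + r            = \tfrac{5r}{4} \le \tfrac{3r}{2}, \\
d(q, z_{r/2})   &\le d(q,a) + d(a, Y_{r/2}) \le \tfrac{r}{4} + \tfrac{r}{2} = \tfrac{3r}{4}, \\
d(z_r, z_{r/2}) &\le d(z_r, q) + d(q, z_{r/2}) \le \tfrac{5r}{4} + \tfrac{3r}{4} = 2r .
\end{align*}
```

The last line is exactly the radius 2r used in the definition of the navigation lists, so z_{r/2} appears in the list of z_r.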
20
Stopping point
  • If we find a point z_r with d(q,z_r) ≤ 3r/2,
  • But not a point z_{r/2} with d(q,z_{r/2}) ≤ 3r/4,
  • We know that d(q,S) > r/4,
  • Yielding 6-NNS with query time 2^{O(dim(S))} · log Δ.
  • This can be extended to (1+ε)-NNS.
  • Similar principles yield insertions and deletions.
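
The descent and stopping rule can be sketched end to end on a static point set. This is a simplified reading of the slides, not the authors' implementation: it follows a single point per scale, builds greedy nets independently at each scale, assumes at least two distinct points, and treats scales above the diameter as one-point nets. All names are hypothetical.

```python
import math

def greedy_net(points, r, dist):
    """Maximal r-separated subset, built by a single greedy pass."""
    net = []
    for p in points:
        if all(dist(p, y) >= r for y in net):
            net.append(p)
    return net

def build(points, dist):
    """Nets Y_r for r = 2**i and lists L_{y,r} = {z in Y_{r/2} : d(y,z) <= 2r}."""
    pair = [dist(p, q) for k, p in enumerate(points) for q in points[k + 1:]]
    i_top = 0
    while 2.0 ** i_top <= max(pair):   # a scale above the diameter: |Y_r| = 1
        i_top += 1
    i_bot = i_top
    while 2.0 ** i_bot > min(pair):    # largest scale at which Y_r = S
        i_bot -= 1
    nets = {i: greedy_net(points, 2.0 ** i, dist) for i in range(i_bot, i_top + 1)}
    nav = {(i, y): [z for z in nets[i - 1] if dist(y, z) <= 2 * 2.0 ** i]
           for i in range(i_bot + 1, i_top + 1) for y in nets[i]}
    return nets, nav, i_top, i_bot

def query(q, nets, nav, i_top, i_bot, dist):
    """Descend through the nets; stop when no z_{r/2} lies within 3r/4 of q."""
    z, i = nets[i_top][0], i_top
    while dist(q, z) > 1.5 * 2.0 ** i:     # far query: start at a coarser scale
        i += 1
    while i > i_bot:
        r = 2.0 ** i
        # Above i_top every net is the same single point, so its list is [z].
        best = min(nav.get((i, z), [z]), key=lambda c: dist(q, c))
        if dist(q, best) > 0.75 * r:       # stopping point: d(q,S) > r/4
            break
        z, i = best, i - 1
    return z

# Illustrative data only: a 6 x 6 grid in the Euclidean plane.
grid = [(float(x), float(y)) for x in range(6) for y in range(6)]
index = build(grid, math.dist)
print(query((5.0, 5.0), *index, math.dist))  # → (5.0, 5.0)
```

Each step inspects one navigation list, so the work per scale is bounded by the list size, and the number of scales visited is what contributes the log Δ factor.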

21
Near-optimality
  • The basic idea
  • Consider a uniform metric on λ points.
  • Let the query point be at distance 1 from all of
    them,
  • Except for one point whose distance is 1-ε.
  • Finding this point requires (in an oracle model)
    computing all λ distances to q.
  • Can happen at every distance scale r.
  • We get a lower bound of 2^{Ω(dim(S))} · log Δ.
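
The adversary behind this argument can be illustrated with a toy oracle. This is only a sketch of the idea at a single scale: the oracle (a hypothetical class, not from the slides) answers 1 for every probed point and commits to the near point only when a single point remains unprobed.

```python
class AdversarialOracle:
    """Distances from a query q to lam points of a uniform metric, chosen lazily."""

    def __init__(self, lam, eps):
        self.lam, self.eps = lam, eps
        self.answers = {}                  # point index -> committed distance

    def distance_to_query(self, i):
        if i not in self.answers:
            # Reveal the near point only when it is the last unprobed one.
            last = len(self.answers) == self.lam - 1
            self.answers[i] = 1 - self.eps if last else 1.0
        return self.answers[i]

def probes_needed(lam, eps, order):
    """Probe points in the given order until the near point is seen."""
    oracle = AdversarialOracle(lam, eps)
    for count, i in enumerate(order, start=1):
        if oracle.distance_to_query(i) < 1.0:
            return count
    return None

print(probes_needed(8, 0.1, range(8)))            # → 8
print(probes_needed(8, 0.1, reversed(range(8))))  # → 8
```

Whatever order the algorithm chooses, the answers stay consistent with a valid instance, and the near point is only seen on the last probe.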

22
Related work general metrics
  • Let K_X be the smallest K such that
  • |B(x,r)| ≤ K · |B(x,r/2)| for all x ∈ X, r > 0.
  • Define the KR-dimension as dim_KR(X) = log_2 K_X.
  • Randomized exact NNS [Karger-Ruhl'02, Hildrum et
    al.'04]:
  • Space: n · 2^{O(dim(S))} · log Δ.
  • Query time: 2^{O(dim(S))} · log Δ.
  • If dim_KR(S) = O(1), the log Δ term is actually
    O(log n).
  • Our results extend to this setting:
  • 1. KR-metrics are doubling: dim(X) ≤ 4 dim_KR(X).
  • 2. Our algorithms actually give exact NNS.
  • Assumptions on query distribution [Clarkson'99].
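
For a finite point set, the constant K_X defined above can be computed by brute force. The sketch assumes closed balls and uses the fact that the ratio |B(x,r)| / |B(x,r/2)| can only reach a new maximum at a radius equal to some distance from x; the function name is hypothetical.

```python
import math

def kr_constant(points, dist):
    """Smallest K with |B(x,r)| <= K * |B(x,r/2)| for all x in points and r > 0."""
    K = 1.0
    for x in points:
        ds = [dist(x, p) for p in points]
        for r in set(ds) - {0}:
            big = sum(1 for d in ds if d <= r)        # |B(x, r)|
            small = sum(1 for d in ds if d <= r / 2)  # |B(x, r/2)|, contains x
            K = max(K, big / small)
    return K

# Illustrative data only: the integers 0..7 on the real line.
K = kr_constant(list(range(8)), lambda a, b: abs(a - b))
print(K)  # → 3.0
```

Here the KR-dimension is log_2 3, attained by an interior point whose radius-1 ball holds three points while its radius-1/2 ball holds only the center.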

23
Related work Euclidean metrics
  • Exact NNS for ℝ^d:
  • O(d^5 log n) query time and O(n^{d+δ}) space
    [Meiser'93].
  • (1+ε)-NNS for ℝ^d:
  • O((d/ε)^d log n) query time and O(dn) space by
    quad-tree-like decompositions [AMNSW'94].
  • Our algorithm achieves similar bounds.
  • O(d polylog(dn)) query time and (dn)^{O(1)} space is
    useful for higher dimensions [IM'98, KOR'98].

24
Concluding remarks
  • Our approach
  • A decision tree that is not really a tree
    (saves space).
  • In progress
  • A different (static) scheme where log Δ is
    replaced by log n.
  • Bounds on the help of ambient space points.
  • Our data structure yields a spanner of the metric:
  • Immediate: O(1) stretch with average degree
    2^{dim(S)}.
  • More work: O(1) stretch with maximum degree
    2^{dim(S)}.
  • [Guibas,'04] applied the nets data structure to
    moving points in the plane.

25
Thank you!