Title: When Is Nearest Neighbor Meaningful?
1. When Is Nearest Neighbor Meaningful?
- By Kevin Beyer, Jonathan Goldstein, Raghu Ramakrishnan, and Uri Shaft
2. Nearest neighbor queries
- Typical query in 2D
- Unstable query in 2D
3. Main theoretical instability result
(i.e., as dimensionality increases, all points become equidistant from the query point)
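For precision, a LaTeX rendering of the paper's Theorem 1 (paraphrased from the published version; here X_m is the p-th power of the distance from the query to a random data point in dimension m, and DMIN_m, DMAX_m are the nearest and farthest of the n data-point distances):

    \[
      \text{If}\quad \lim_{m \to \infty}
        \operatorname{Var}\!\left(\frac{X_m}{\mathbb{E}[X_m]}\right) = 0,
      \quad\text{then for every } \varepsilon > 0,
    \]
    \[
      \lim_{m \to \infty}
        \Pr\!\left[\mathrm{DMAX}_m \le (1 + \varepsilon)\,\mathrm{DMIN}_m\right] = 1.
    \]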
4. IID contrast as dimensionality increases
5. Repercussions of the technical result
- Serious questions are raised about techniques that map approximate similarity into high-dimensional nearest neighbor problems.
- The ease with which linear scan beats more complex access methods for high-D nearest neighbor is explained by our theorem.
- These results should not be taken to mean that all high-dimensional nearest neighbor problems are badly framed, or that more complex access methods will always fail on individual high-D data sets.
6. Example application of the result
- Assume the following (a simulation sketch of this setting follows the list):
  - The data distribution and query distribution are IID in all dimensions.
  - All the appropriate moments are finite (i.e., up to the ⌈2p⌉-th moment).
  - The query point is chosen independently of the data points.
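Under these assumptions the theorem predicts that the farthest-to-nearest distance ratio collapses toward 1 as dimensionality grows. A minimal Python sketch, not from the slides (the sample size, dimensions, and choice of uniform data are arbitrary assumptions):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1000  # number of data points (arbitrary)

    for dim in (2, 10, 100, 1000):
        data = rng.uniform(size=(n, dim))  # data IID in all dimensions
        query = rng.uniform(size=dim)      # query chosen independently of the data
        dists = np.linalg.norm(data - query, axis=1)  # Euclidean (p = 2) distances
        print(f"dim={dim:4d}  DMAX/DMIN = {dists.max() / dists.min():.2f}")

The printed ratio should fall toward 1 as dim grows, which is exactly the instability the theorem describes.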
7. Examples that meet our condition
- IID (independent and identically distributed) dimensions, Q ∼ D (query distribution follows the data distribution)
- Variance converging to 0 at a bounded rate, Q ∼ D
- Variance converging to infinity at a bounded rate, Q ∼ D
- Partial correlation between all dimensions, Q ∼ D
- Variance converging to 0 at a bounded rate, and partial correlation between all dimensions, Q ∼ D
- Perfectly realized clustering, Q IID uniform
8. Examples that don't meet our condition
- Total correlation between all dimensions, Q ∼ D (illustrated in the sketch after this list)
- All dimensions are linear combinations of a fixed number of IID random variables, Q ∼ D
- Perfectly realized clustering with the query distribution following the data distribution, Q ∼ D
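For contrast, a sketch of the first failing case, under my own construction (every dimension is a copy of a single underlying uniform value, for data and query alike). Adding dimensions adds no new information, so the DMAX/DMIN contrast never washes out:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1000

    for dim in (2, 10, 100, 1000):
        base = rng.uniform(size=n)               # one value per point...
        data = np.tile(base[:, None], (1, dim))  # ...copied into every dimension
        query = np.full(dim, rng.uniform())      # query built the same way (Q ~ D)
        dists = np.linalg.norm(data - query, axis=1)
        print(f"dim={dim:4d}  DMAX/DMIN = {dists.max() / dists.min():.2f}")

Every distance here is just sqrt(dim) times the same one-dimensional gap, so the ratio does not shrink as dim grows.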
9. Contrast in ideally clustered data
- Figure: top right, typical distance distribution; bottom left, ideal clusters; bottom right, distance distribution for ideally clustered data/queries.
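A sketch of what the bottom panels illustrate (the cluster count, spread, and query-at-a-center convention are my assumptions, not the paper's parameters): with ideal, well-separated clusters and a query near a cluster center, the distance distribution splits into two well-separated modes, so the nearest neighbor stays meaningful even at high dimensionality:

    import numpy as np

    rng = np.random.default_rng(0)
    dim, k, per = 1000, 5, 200                    # dimensions, clusters, points per cluster
    centers = rng.uniform(0, 100, size=(k, dim))  # widely separated cluster centers
    data = np.concatenate(
        [c + rng.normal(scale=0.1, size=(per, dim)) for c in centers]
    )
    query = centers[0] + rng.normal(scale=0.1, size=dim)  # query near one center
    dists = np.linalg.norm(data - query, axis=1)
    # The query's own cluster sits at tiny distances and every other cluster
    # at huge ones: a bimodal distance distribution with DMIN << DMAX.
    print(f"DMIN = {dists.min():.1f}, DMAX = {dists.max():.1f}")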