1
Nearest Neighbour Condensing and Editing
  • David Claus
  • February 27, 2004
  • Computer Vision Reading Group
  • Oxford

2
Nearest Neighbour Rule
Non-parametric pattern classification. Consider a two-class problem where each sample consists of two measurements (x, y).

k = 1: for a given query point q, assign the class of the nearest neighbour.

k = 3: compute the k nearest neighbours and assign the class by majority vote.
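To make the rule concrete, here is a minimal Python sketch of k-NN classification by majority vote (my own illustration; the array names, the use of NumPy, and the Euclidean metric are assumptions, not part of the slides):

```python
import numpy as np
from collections import Counter

def knn_classify(X, y, q, k=1):
    """Assign query q the majority class among its k nearest training
    samples; k = 1 reduces to the plain nearest-neighbour rule."""
    d = np.linalg.norm(X - q, axis=1)   # distance from q to every training sample
    nearest = np.argsort(d)[:k]         # indices of the k closest samples
    return Counter(y[nearest]).most_common(1)[0][0]
```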
3
Example: Digit Recognition
  • Yann LeCun's MNIST digit recognition benchmark
  • Handwritten digits
  • 28x28 pixel images (d = 784)
  • 60,000 training samples
  • 10,000 test samples
  • Nearest neighbour is competitive

Method                                        Test error rate (%)
Linear classifier (1-layer NN)                12.0
K-nearest-neighbours, Euclidean               5.0
K-nearest-neighbours, Euclidean, deskewed     2.4
K-NN, tangent distance, 16x16                 1.1
K-NN, shape context matching                  0.67
1000 RBF + linear classifier                  3.6
SVM, degree 4 polynomial                      1.1
2-layer NN, 300 hidden units                  4.7
2-layer NN, 300 HU, deskewing                 1.6
LeNet-5, distortions                          0.8
Boosted LeNet-4, distortions                  0.7
4
Nearest Neighbour Issues
  • Expensive
    • To determine the nearest neighbour of a query point q, we must compute the distance to all N training examples
    • Pre-sort training examples into fast data structures such as kd-trees (see the sketch below)
    • Compute only an approximate distance (LSH)
    • Remove redundant data (condensing)
  • Storage requirements
    • Must store all training data P
    • Remove redundant data (condensing)
    • Pre-sorting often increases the storage requirements
  • High dimensional data
    • Curse of dimensionality
    • Required amount of training data increases exponentially with dimension
    • Computational cost also increases dramatically
    • Partitioning techniques degrade to linear search in high dimension
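As a concrete example of the kd-tree bullet above, a minimal sketch using scipy.spatial (the synthetic dataset and every name here are illustrative assumptions):

```python
import numpy as np
from scipy.spatial import cKDTree

# Synthetic training data: N samples in d dimensions.
rng = np.random.default_rng(0)
X = rng.random((10_000, 5))

tree = cKDTree(X)               # pre-sort once into a kd-tree
q = rng.random(5)               # a query point
dist, idx = tree.query(q, k=1)  # nearest neighbour without scanning all N samples
print(idx, dist)
```

In low dimensions this is much faster than a linear scan; as the slide notes, the advantage erodes as the dimension grows.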

5
Questions
  • What distance measure to use?
    • Often Euclidean distance is used
    • Locally adaptive metrics
    • More complicated with non-numeric data, or when different dimensions have different scales
  • Choice of k?
    • Cross-validation (see the sketch below)
    • 1-NN often performs well in practice
    • k-NN needed for overlapping classes
    • Re-label all data according to k-NN, then classify with 1-NN
    • Reduce the k-NN problem to 1-NN through dataset editing
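One way to pick k by cross-validation, sketched with scikit-learn (the library choice, the synthetic data, and the candidate values of k are my assumptions, not from the slides):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data standing in for a real training set.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Mean 5-fold accuracy for a few candidate values of k.
scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
          for k in (1, 3, 5, 7, 9)}
best_k = max(scores, key=scores.get)
print(scores, "best k:", best_k)
```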

6
Exact Nearest Neighbour
  • Asymptotic error (infinite sample size) is less
    than twice the Bayes classification error
  • Requires a lot of training data
  • Expensive for high dimensional data (d > 20?)
  • O(Nd) complexity for both storage and query time
  • N is the number of training examples, d is the
    dimension of each sample
  • This can be reduced through dataset
    editing/condensing
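For reference, the factor-of-two statement in the first bullet is the classical Cover-Hart bound; in the usual notation (my addition, not on the slide), with R* the Bayes error, R the asymptotic 1-NN error, and M classes:

```latex
R^{*} \le R \le R^{*}\left(2 - \frac{M}{M-1}\,R^{*}\right) \le 2R^{*}
```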

7
Decision Regions
Each cell contains one sample, and every location within the cell is closer to that sample than to any other sample. A Voronoi diagram divides the space into such cells.

Every query point will be assigned the classification of the sample within that cell.

The decision boundary separates the class regions based on the 1-NN decision rule. Knowledge of this boundary is sufficient to classify new points. The boundary itself is rarely computed explicitly; many algorithms seek to retain only those points necessary to generate an identical boundary.
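As an illustration (not from the slides), a small sketch that builds the Voronoi cells of a labelled point set with scipy.spatial and answers the "which cell does the query fall in?" question with a kd-tree instead of the explicit diagram; all names and data here are assumptions:

```python
import numpy as np
from scipy.spatial import Voronoi, cKDTree

# Hypothetical 2-D training set: points and their class labels.
rng = np.random.default_rng(0)
points = rng.random((20, 2))
labels = rng.integers(0, 2, size=20)

vor = Voronoi(points)                      # explicit Voronoi cells
print(len(vor.regions), "Voronoi regions")

# 1-NN classification of a query is just "which cell am I in?",
# answered here without constructing the cells at all.
tree = cKDTree(points)
_, nearest = tree.query([0.5, 0.5], k=1)
print("predicted class:", labels[nearest])
```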
8
Condensing
  • Aim is to reduce the number of training samples
  • Retain only the samples that are needed to define
    the decision boundary
  • This is reminiscent of a Support Vector Machine
  • Decision boundary consistent: a subset whose nearest neighbour decision boundary is identical to the boundary of the entire training set
  • Minimum consistent set: the smallest subset of the training data that correctly classifies all of the original training data

Figure: original data, condensed data, minimum consistent set
9
Condensing
  • Condensed Nearest Neighbour (CNN), Hart 1968
  • Incremental
  • Order dependent
  • Neither minimal nor decision boundary consistent
  • O(n³) for the brute-force method
  • Can follow up with Reduced Nearest Neighbour, Gates 1972: remove a sample if doing so does not cause any incorrect classifications
  1. Initialize the subset with a single training example
  2. Classify all remaining samples using the subset, and transfer any incorrectly classified samples to the subset
  3. Return to step 2 until no transfers occurred or the subset is full
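A minimal Python sketch of the three steps above (brute force and order dependent, like the original; the function name and the use of NumPy are my own choices):

```python
import numpy as np

def cnn_condense(X, y, max_passes=100):
    """Hart's CNN: grow a subset that still classifies every original
    training sample correctly under the 1-NN rule."""
    keep = [0]                              # 1. start with a single training example
    for _ in range(max_passes):
        transferred = False
        for i in range(len(X)):
            if i in keep:
                continue
            d = np.linalg.norm(X[keep] - X[i], axis=1)  # 2. classify i with the subset
            nearest = keep[int(np.argmin(d))]
            if y[nearest] != y[i]:          # misclassified -> transfer into the subset
                keep.append(i)
                transferred = True
        if not transferred:                 # 3. stop when a full pass makes no transfers
            break
    return np.array(keep)
```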

10-15
(Slides 10-15 repeat the text of slide 9.)
16
Proximity Graphs
  • Condensing aims to retain points along the
    decision boundary
  • How to identify such points?
  • Neighbouring points of different classes
  • Proximity graphs provide various definitions of
    neighbour

NNG = Nearest Neighbour Graph
MST = Minimum Spanning Tree
RNG = Relative Neighbourhood Graph
GG = Gabriel Graph
DT = Delaunay Triangulation
17
Proximity Graphs: Delaunay
  • The Delaunay triangulation is the dual of the Voronoi diagram
  • Three points are each other's neighbours if their circumscribing sphere contains no other points
  • Voronoi editing: retain those points whose neighbours (as defined by the Delaunay triangulation) are of the opposite class
  • The decision boundary is identical
  • Conservative subset
  • Retains extra points
  • Expensive to compute in high dimensions
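A rough Python sketch of Voronoi editing using scipy.spatial.Delaunay to obtain the Delaunay neighbours (the function name and the assumption of low-dimensional numeric data are mine):

```python
import numpy as np
from scipy.spatial import Delaunay

def voronoi_edit(X, y):
    """Keep points that have at least one Delaunay neighbour of a
    different class; these are the points bordering the 1-NN boundary."""
    tri = Delaunay(X)
    indptr, indices = tri.vertex_neighbor_vertices
    keep = []
    for i in range(len(X)):
        neighbours = indices[indptr[i]:indptr[i + 1]]
        if np.any(y[neighbours] != y[i]):
            keep.append(i)
    return np.array(keep)
```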

18
Proximity Graphs: Gabriel
  • The Gabriel graph is a subset of the Delaunay
    Triangulation
  • Points are neighbours only if their (diametral)
    sphere of influence is empty
  • Does not preserve the identical decision
    boundary, but most changes occur outside the
    convex hull of the data points
  • Can be computed more efficiently

Green lines denote Tomek links
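For illustration (my own sketch, not from the slides), the Gabriel neighbour test: p and q are neighbours iff no third point lies inside the sphere whose diameter is the segment pq.

```python
import numpy as np

def gabriel_neighbours(X, i, j):
    """True if X[i] and X[j] are Gabriel neighbours: the ball whose
    diameter is the segment X[i]X[j] contains no other sample."""
    centre = (X[i] + X[j]) / 2.0
    radius = np.linalg.norm(X[i] - X[j]) / 2.0
    return all(np.linalg.norm(X[k] - centre) >= radius
               for k in range(len(X)) if k not in (i, j))
```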
19
Proximity Graphs: RNG
  • The Relative Neighbourhood Graph (RNG) is a subset of the Gabriel graph
  • Two points are neighbours if the "lune" defined by the intersection of their radial spheres is empty
  • Further reduces the number of neighbours
  • Decision boundary changes are often drastic, and not guaranteed to be training set consistent

Gabriel edited
RNG edited (not consistent)
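For comparison with the Gabriel test above, a sketch of the RNG lune test (again my own illustration): p and q are RNG neighbours iff no third point is closer to both of them than they are to each other.

```python
import numpy as np

def rng_neighbours(X, i, j):
    """True if X[i] and X[j] are RNG neighbours: no other sample lies in
    the lune, i.e. is closer to both points than d(X[i], X[j])."""
    d_ij = np.linalg.norm(X[i] - X[j])
    return all(max(np.linalg.norm(X[k] - X[i]),
                   np.linalg.norm(X[k] - X[j])) >= d_ij
               for k in range(len(X)) if k not in (i, j))
```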
20
  • Matlab demo

21
Dataset Reduction: Editing
  • Training data may contain noise and overlapping classes
  • (we are now starting to make assumptions about the underlying distributions)
  • Editing seeks to remove noisy points and produce smooth decision boundaries, often by retaining points far from the decision boundary
  • Results in homogeneous clusters of points

22
Wilson Editing
  • Wilson 1972
  • Remove points that do not agree with the majority
    of their k nearest neighbours

Figure: original data and the result of Wilson editing with k = 7, for the earlier example and for two overlapping classes
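A short Python sketch of Wilson editing as stated above (my own code; it assumes a Euclidean metric and non-negative integer class labels, and excludes each point from its own neighbourhood):

```python
import numpy as np

def wilson_edit(X, y, k=7):
    """Remove points whose class disagrees with the majority vote of
    their k nearest neighbours."""
    keep = []
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                        # exclude the point itself
        nn = np.argsort(d)[:k]
        votes = np.bincount(y[nn], minlength=y.max() + 1)
        if votes.argmax() == y[i]:           # agrees with its neighbourhood -> keep
            keep.append(i)
    return np.array(keep)
```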
23
Multi-edit
  1. Diffusion: divide the data into N ≥ 3 random subsets
  2. Classification: classify S_i using 1-NN with S_((i+1) mod N) as the training set (i = 1, ..., N)
  3. Editing: discard all samples incorrectly classified in (2)
  4. Confusion: pool all remaining samples into a new data set
  5. Termination: if the last I iterations produced no editing then stop; otherwise go to (1)
  • Multi-edit, Devijver & Kittler 1979
  • Repeatedly apply Wilson editing to random partitions
  • Classify with the 1-NN rule
  • Approximates the error rate of the Bayes decision rule

Multi-edit, 8 iterations (last 3 the same)
Voronoi editing
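A compact Python sketch of the multi-edit loop above (illustrative only; the subset count, the patience parameter I, and the random seed are choices of mine mirroring the slide's notation):

```python
import numpy as np

def multi_edit(X, y, n_subsets=3, patience=3, max_iters=100, seed=0):
    """Devijver-Kittler style multi-edit: repeatedly discard samples that
    1-NN misclassifies across a rotation of random subsets."""
    rng = np.random.default_rng(seed)
    idx = np.arange(len(X))
    quiet = 0
    for _ in range(max_iters):
        parts = np.array_split(rng.permutation(idx), n_subsets)   # 1. diffusion
        survivors = []
        for i, part in enumerate(parts):
            ref = parts[(i + 1) % n_subsets]                      # 2. classify S_i with S_(i+1)
            if len(ref) == 0:
                survivors.extend(part.tolist())
                continue
            for p in part:
                d = np.linalg.norm(X[ref] - X[p], axis=1)
                if y[ref[np.argmin(d)]] == y[p]:                  # 3. keep only correct samples
                    survivors.append(p)
        quiet = quiet + 1 if len(survivors) == len(idx) else 0    # 4-5. pool, check progress
        idx = np.array(survivors, dtype=int)
        if quiet >= patience:
            break
    return idx
```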
24
Combined Editing/Condensing
  • First edit the data to remove noise and smooth
    the boundary
  • Then condense to obtain a smaller subset
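Putting the two earlier sketches together (a hypothetical pipeline that assumes the wilson_edit and cnn_condense helpers defined above, plus training arrays X and y):

```python
keep = wilson_edit(X, y, k=7)          # edit: remove noise, smooth the boundary
X_e, y_e = X[keep], y[keep]
subset = cnn_condense(X_e, y_e)        # condense: keep only boundary-defining points
X_c, y_c = X_e[subset], y_e[subset]    # reduced training set for 1-NN queries
```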

25
Where are we?
  • Simple method, pretty powerful rule
  • Can be made to run fast
  • Requires a lot of training data
  • Edit to reduce noise, class overlap
  • Condense to remove redundant data

26
Questions
(Recap of the questions from slide 5.)