1. Non-Parametric Methods
Pattern Recognition and Machine Learning
Debrup Chakraborty
2. Nearest Neighbor Classification
Given: a labeled sample of n feature vectors (call it X), and a distance measure (say the Euclidean distance).
To find: the class label of a given feature vector x which is not in X.
3. Nearest Neighbor Classification (contd.)
The NN rule:
Find the point y in X which is nearest to x.
Assign the label of y to x.
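A minimal sketch of the NN rule in Python (the function name and array layout are illustrative, not from the slides):

```python
import numpy as np

def nn_classify(x, X, labels):
    """Assign x the label of its nearest neighbor in X (Euclidean distance)."""
    dists = np.linalg.norm(X - x, axis=1)   # distance from x to every training point
    return labels[int(np.argmin(dists))]    # label of the closest point y
```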
4. Nearest Neighbor Classification (contd.)
This rule partitions the feature space into cells, each consisting of all points closer to a given training point x than to any other training point. All points in such a cell are labeled by the class of that training point. This partitioning is called a Voronoi tessellation.
5. Nearest Neighbor Classification (contd.)
Voronoi Cells in 2d
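A picture like this one can be reproduced with SciPy's Voronoi utilities; a small sketch (the random data is illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import Voronoi, voronoi_plot_2d

points = np.random.rand(20, 2)   # 2-D training points; each cell inherits its point's class
vor = Voronoi(points)            # compute the Voronoi tessellation
voronoi_plot_2d(vor)
plt.title("Voronoi cells in 2-D")
plt.show()
```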
6. Nearest Neighbor Classification
Complexity of the NN rule:
Distance calculation: O(dn) for n training points in d dimensions.
Finding the minimum distance: O(n).
7. Nearest Neighbor Classification
Nearest Neighbor Editing
X: data set, n: number of training points, j = 0.
Construct the full Voronoi diagram for X.
Do
    j = j + 1; consider the point x_j in X:
        find the Voronoi neighbors of x_j;
        if any neighbor is not from the same class as x_j, then mark x_j.
Until j = n.
Discard all points that are not marked.
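A sketch of this editing step in Python, using the fact that Voronoi neighbors are exactly the points joined by an edge of the Delaunay triangulation (SciPy assumed; the function name is illustrative):

```python
import numpy as np
from scipy.spatial import Delaunay

def voronoi_edit(X, labels):
    """Keep only points with at least one Voronoi neighbor of a different class."""
    tri = Delaunay(X)
    indptr, indices = tri.vertex_neighbor_vertices   # Delaunay (= Voronoi) neighbor lists
    marked = np.zeros(len(X), dtype=bool)
    for j in range(len(X)):
        neighbors = indices[indptr[j]:indptr[j + 1]]
        if np.any(labels[neighbors] != labels[j]):
            marked[j] = True                         # x_j lies on a class boundary
    return X[marked], labels[marked]                 # discard all unmarked points
```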
8. k-Nearest Neighbor Classification
Given: a labeled sample of N feature vectors (call it X), a distance measure (say the Euclidean distance), and an integer k (generally odd).
To find: the class label of a given feature vector x which is not in X.
9. k-NN Classification (contd.)
Algorithm
Find the k nearest neighbors of x in X; call them x_(1), ..., x_(k).
Out of these k samples, let k_i of them belong to class c_i.
Choose that c_i to be the class of x for which k_i is maximum.
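A short sketch of this majority-vote rule (names are illustrative):

```python
import numpy as np
from collections import Counter

def knn_classify(x, X, labels, k=3):
    """Classify x by majority vote among its k nearest neighbors in X."""
    dists = np.linalg.norm(X - x, axis=1)        # Euclidean distance to every training point
    nearest = np.argsort(dists)[:k]              # indices of the k closest points
    votes = Counter(labels[i] for i in nearest)  # k_i: number of neighbors in each class c_i
    return votes.most_common(1)[0][0]            # class with the maximum k_i
```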
10. k-NN Classification
(Figure: a query point z among samples from Class 1, Class 2 and Class 3.)
11. k-NN Classification (contd.)
Distance-weighted nearest neighbor
Training set: {(x_i, f(x_i)), i = 1, ..., N}.
Given an instance x to be classified, let x_(1), ..., x_(k) be the k nearest neighbors of x. Return the class c that maximizes
    sum_{i=1}^{k} w_i * delta(c, f(x_(i))),  with weights w_i = 1 / d(x, x_(i))^2,
where delta(a, b) = 1 if a = b and 0 otherwise.
In case x = x_(i) exactly, return f(x_(i)).
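A sketch of distance-weighted k-NN with the inverse-squared-distance weights assumed above (names are illustrative):

```python
import numpy as np
from collections import defaultdict

def weighted_knn_classify(x, X, labels, k=3, eps=1e-12):
    """Distance-weighted k-NN: each neighbor votes with weight 1 / d(x, x_i)^2."""
    dists = np.linalg.norm(X - x, axis=1)
    nearest = np.argsort(dists)[:k]
    scores = defaultdict(float)
    for i in nearest:
        if dists[i] < eps:                  # x coincides with a training point
            return labels[i]                # return f(x_i) directly
        scores[labels[i]] += 1.0 / dists[i] ** 2
    return max(scores, key=scores.get)      # class with the largest weighted vote
```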
12. Remarks on k-NN Classification
- The distance-weighted kNN is robust to noisy training data and is quite effective when it is provided a sufficiently large set of training examples.
- One drawback of the kNN method is that it defers all computation until a new query point is presented. Various methods have been developed to index the training examples so that the nearest neighbor can be found with less search time. One such indexing method is the kd-tree, developed by Bentley (1975).
- kNN is a lazy learner.
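For instance, SciPy's kd-tree can be built once over the training set and then queried quickly; a small sketch (the data here is illustrative):

```python
import numpy as np
from scipy.spatial import cKDTree

X = np.random.rand(1000, 2)               # training feature vectors
labels = np.random.randint(0, 3, 1000)    # their class labels

tree = cKDTree(X)                          # build the kd-tree index once
dist, idx = tree.query([0.5, 0.5], k=5)    # nearest-neighbor queries are then fast
print(labels[idx])                         # labels of the 5 nearest neighbors
```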
13. Locally Weighted Regression
- In the linear regression problem, to find h(x) at a point x we would do the following:
- Minimize E(w) = (1/2) * sum_{i=1}^{N} (f(x_i) - w^T x_i)^2 over the weight vector w.
- Output h(x) = w^T x.
14. Locally Weighted Regression
- In the locally weighted regression problem we would do the following:
- Minimize E(w; x) = (1/2) * sum_{i=1}^{N} K_i(x) * (f(x_i) - w^T x_i)^2 over w.
- Output h(x) = w^T x.
- A standard choice of weights is K_i(x) = exp(-||x - x_i||^2 / (2 * tau^2)).
- tau is called the bandwidth parameter.
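A sketch of locally weighted linear regression along these lines, with the Gaussian weights and bandwidth tau assumed above (the helper name and defaults are my own):

```python
import numpy as np

def lwr_predict(x, X, y, tau=0.5):
    """Locally weighted linear regression: fit a weighted least-squares line around x."""
    Xb = np.column_stack([np.ones(len(X)), X])          # add a bias column
    xb = np.concatenate([[1.0], np.atleast_1d(x)])
    # Gaussian weights: points near x influence the fit more
    K = np.exp(-np.sum((X - x) ** 2, axis=1) / (2 * tau ** 2))
    W = np.diag(K)
    # Weighted normal equations: w = (X^T W X)^{-1} X^T W y
    w = np.linalg.solve(Xb.T @ W @ Xb, Xb.T @ W @ y)
    return xb @ w
```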
15. Clustering
Clustering is different from classification.
Classification partitions the feature space, whereas clustering partitions the data into homogeneous groups.
Clustering is unsupervised!!
16. K-means Clustering
Given: a data set X = {x_1, ..., x_n}.
Fix the number of clusters K.
Let m_i^(k) represent the i-th cluster center (prototype) at the k-th iteration.
Let C_j^(k) represent the j-th cluster at the k-th iteration.
17. K-means Clustering
Steps
1. Choose the initial cluster centers m_1^(1), ..., m_K^(1).
2. At the k-th iterative step, distribute the points in X among the K clusters using:
   x belongs to C_j^(k) if ||x - m_j^(k)|| <= ||x - m_i^(k)|| for all i = 1, ..., K.
3. Compute the new centers m_j^(k+1) as the mean of the points in C_j^(k).
4. If m_j^(k+1) = m_j^(k) for all j, then the procedure has converged; else repeat from step 2.
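A compact sketch of these steps in Python (the function name and stopping test follow the convergence criterion above):

```python
import numpy as np

def kmeans(X, K, max_iter=100, seed=0):
    """Plain K-means: alternate assignment and center update until the centers stop moving."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)]   # step 1: initial centers
    for _ in range(max_iter):
        # step 2: assign each point to its nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = np.argmin(dists, axis=1)
        # step 3: recompute each center as the mean of its cluster
        new_centers = np.array([X[assign == j].mean(axis=0) if np.any(assign == j)
                                else centers[j] for j in range(K)])
        # step 4: stop when the centers no longer change
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, assign
```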