Title: Kernel Methods for Weakly Supervised Mean Shift Clustering
1Kernel Methods for Weakly Supervised Mean Shift Clustering
- Oncel Tuzel, Fatih Porikli
- Mitsubishi Electric Research Labs
- Peter Meer
- Rutgers University
2Outline
- Motivation
- Mean Shift
- Method Overview
- Kernel Mean Shift
- Constrained Kernel Mean Shift
- Experiments
- Conclusion
3Motivation
- Clustering is an ambiguous task
- In many cases, the initially designed similarity metric fails to resolve the ambiguities
- Simple supervision can guide clustering to desired structure
- We present a semi supervised mean shift clustering algorithm based on pair-wise similarities
4Mean Shift
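- Given n data points $x_i$ on $\mathbb{R}^d$ and associated bandwidths $h_i$, the sample point density estimator is given by
  $$\hat{f}(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h_i^{d}} \, k\!\left( \left\| \frac{x - x_i}{h_i} \right\|^2 \right)$$
  where $k(x)$ is the kernel profile
- Stationary points of the density can be found via the mean shift procedure
  $$m(x) = \frac{\sum_{i=1}^{n} \frac{x_i}{h_i^{d+2}} \, g\!\left( \left\| \frac{x - x_i}{h_i} \right\|^2 \right)}{\sum_{i=1}^{n} \frac{1}{h_i^{d+2}} \, g\!\left( \left\| \frac{x - x_i}{h_i} \right\|^2 \right)} - x$$
  where $g(x) = -k'(x)$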
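The mean shift procedure on this slide can be illustrated with a short NumPy sketch. It is not the authors' code: it assumes a Gaussian profile $k(t) = e^{-t/2}$ (so $g$ is proportional to $e^{-t/2}$ and the constant cancels), and the function name `mean_shift_mode`, the tolerance, and the toy data are illustrative choices.

```python
import numpy as np

def mean_shift_mode(x, points, h, n_iter=200, tol=1e-8):
    """Follow mean shift iterations from x to a stationary point.

    Assumes the Gaussian profile k(t) = exp(-t / 2), so g(t) = -k'(t)
    is proportional to exp(-t / 2); the constant cancels in the ratio.
    """
    d = points.shape[1]
    for _ in range(n_iter):
        # Squared distances scaled by the per-point bandwidths h_i.
        t = np.sum((points - x) ** 2, axis=1) / h ** 2
        # Weight of each sample: g(t_i) / h_i^(d + 2).
        w = np.exp(-0.5 * t) / h ** (d + 2)
        x_new = w @ points / w.sum()          # x + m(x): the weighted mean
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Toy data (hypothetical): two well separated 2-D blobs.
rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0.0, 0.3, (50, 2)),
                    rng.normal(5.0, 0.3, (50, 2))])
h = np.full(len(points), 1.0)
mode_a = mean_shift_mode(points[0].copy(), points, h)
mode_b = mean_shift_mode(points[-1].copy(), points, h)
```

Starting from a point in each blob, the iterations should settle near that blob's center, since each step moves the estimate to a bandwidth-weighted mean of the samples.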
5Mean Shift Clustering
- Mean shift iterations are initialized at the data points
- The cluster centers are located by the mean shift procedure
- The data points associated with the same local maxima of the density function produce a partitioning of the space
- There is no systematic semi supervised mean shift algorithm
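The clustering step described above (start at every data point, then group points that reach the same mode) can be sketched as follows. This is an illustrative sketch only: it assumes a single fixed bandwidth `h`, a Gaussian profile, and a hypothetical merging threshold `merge_tol`; none of these choices come from the slides.

```python
import numpy as np

def mean_shift_cluster(points, h, n_iter=300, merge_tol=1e-2):
    """Label each point by the mode its mean shift trajectory reaches.

    Assumes a fixed bandwidth h and a Gaussian profile, and merges
    modes closer than merge_tol * h (both are illustrative choices).
    """
    modes = points.copy()                     # one estimate per data point
    for _ in range(n_iter):
        # Squared distances between current estimates and the data.
        d2 = ((modes[:, None, :] - points[None, :, :]) ** 2).sum(-1)
        w = np.exp(-0.5 * d2 / h ** 2)
        modes = w @ points / w.sum(axis=1, keepdims=True)
    labels = -np.ones(len(points), dtype=int)
    centers = []
    for i, m in enumerate(modes):
        for c, center in enumerate(centers):
            if np.linalg.norm(m - center) < merge_tol * h:
                labels[i] = c                 # reached an already known mode
                break
        else:
            centers.append(m)                 # first point to reach this mode
            labels[i] = len(centers) - 1
    return labels, np.array(centers)

# Toy data (hypothetical): two well separated 2-D blobs.
rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0.0, 0.3, (50, 2)),
                    rng.normal(5.0, 0.3, (50, 2))])
labels, centers = mean_shift_cluster(points, h=1.0)
```

The number of clusters is not given in advance; it falls out of how many distinct modes the trajectories reach.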
6Method Overview
[Figure: Input Space and Embedded Space]
- The supervision is given in the form of a few pair-wise similarity constraints
- We embed the input space to a space where the constraint pairs are associated with the same mode
- Mode seeking is performed on the embedded space
- The method preserves all the advantages of mean shift clustering
7Pair-wise Constraints on the Input Space
- Data points are projected to the null space of the constraint matrix
- Since the constraint point pairs overlap after projection, they are clustered together
- The method fails if the clusters are not linearly separable
- At most d-1 constraints can be defined
[Figure: Input Points, Constraint Vector, Projection, Clustering]
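The projection on this slide can be made concrete with a small NumPy sketch. It assumes the constraint matrix stacks the difference vectors of the constrained pairs as rows and that the projector is built with a pseudo-inverse; the function name `null_space_projector` and the sample points are hypothetical.

```python
import numpy as np

def null_space_projector(points, pairs):
    """Projector onto the null space of the constraint matrix A.

    Row r of A is the difference between the two points of pair r
    (an assumed layout), and P = I - A^T (A A^T)^+ A.
    """
    A = np.array([points[i] - points[j] for i, j in pairs])   # m x d
    d = points.shape[1]
    return np.eye(d) - A.T @ np.linalg.pinv(A @ A.T) @ A

# Four hypothetical 3-D points; points 0 and 1 must be clustered together.
points = np.array([[0.0, 0.0, 0.0],
                   [1.0, 2.0, 0.5],
                   [3.0, -1.0, 2.0],
                   [0.5, 0.5, 4.0]])
pairs = [(0, 1)]
P = null_space_projector(points, pairs)
projected = points @ P.T      # P is symmetric, so this applies P to each row
```

After projection the two constrained points coincide, and each independent constraint removes one dimension, which is why at most d-1 constraints leave anything to cluster.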
8Pair-wise Constraints on the Feature Space
- The method can be extended to handle an increasing number of constraints, or the linearly inseparable case, using a mapping function
- The mapping embeds the input space to an enlarged feature space
- The projection is performed on the feature space
- Defining the mapping explicitly is not practical
- Solution: the kernel trick
[Figure: Input Points, Mapping to Feature Space, Constraint Vector, Projection, Clustering]
9Kernel Mean Shift (Explicit Form)
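- Given data points $x_i \in \mathcal{X}$, $i = 1, \dots, n$, and a p.s.d. kernel $K$ satisfying
  $$K(x, x') = \phi(x)^T \phi(x')$$
  where $\phi : \mathcal{X} \to \mathcal{H}$ maps the input space to the $d_\phi$-dimensional feature space $\mathcal{H}$
- The density estimator at $y \in \mathcal{H}$ is given by
  $$\hat{f}_{\mathcal{H}}(y) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h_i^{d_\phi}} \, k\!\left( \left\| \frac{y - \phi(x_i)}{h_i} \right\|^2 \right)$$
- The stationary points can be found via the mean shift procedure
  $$\bar{y} = \frac{\sum_{i=1}^{n} \frac{\phi(x_i)}{h_i^{d_\phi+2}} \, g\!\left( \left\| \frac{y - \phi(x_i)}{h_i} \right\|^2 \right)}{\sum_{i=1}^{n} \frac{1}{h_i^{d_\phi+2}} \, g\!\left( \left\| \frac{y - \phi(x_i)}{h_i} \right\|^2 \right)}$$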
10Kernel Mean Shift (Implicit Form)
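- Let $\Phi = [\phi(x_1) \; \phi(x_2) \; \cdots \; \phi(x_n)]$ be the $d_\phi \times n$ dimensional feature matrix and $K = \Phi^T \Phi$ be the $n \times n$ dimensional kernel matrix
- At each iteration the estimate, $\bar{y}$, lies in the column space of $\Phi$, and any point on the subspace can be written as
  $$y = \Phi \alpha_y$$
- The distance between two points $y$ and $y'$ is given by
  $$\| y - y' \|^2 = \alpha_y^T K \alpha_y + \alpha_{y'}^T K \alpha_{y'} - 2 \alpha_y^T K \alpha_{y'}$$
- The implicit form of mean shift updates the weighting vectors
  $$\bar{\alpha}_y = \frac{\sum_{i=1}^{n} \frac{e_i}{h_i^{d_\phi+2}} \, g\!\left( \frac{\alpha_y^T K \alpha_y + e_i^T K e_i - 2 \alpha_y^T K e_i}{h_i^2} \right)}{\sum_{i=1}^{n} \frac{1}{h_i^{d_\phi+2}} \, g\!\left( \frac{\alpha_y^T K \alpha_y + e_i^T K e_i - 2 \alpha_y^T K e_i}{h_i^2} \right)}$$
  where $e_i$ denotes the i-th canonical basis vector for $\mathbb{R}^n$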
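The implicit update touches only the kernel matrix, which a short NumPy sketch makes visible. This is a sketch under stated assumptions, not the authors' implementation: it assumes a Gaussian profile, uses the rank of the kernel matrix as the feature-space dimension in the bandwidth exponent, and the name `kernel_mean_shift` is hypothetical. The toy run uses a linear kernel so the result can be compared with ordinary mean shift.

```python
import numpy as np

def kernel_mean_shift(K, h, n_iter=100):
    """Run implicit mean shift from every data point.

    Row i of the result is the weighting vector alpha of the estimate
    started at point i, so that y_i = Phi @ alpha_i.
    Assumes a Gaussian profile, g(t) proportional to exp(-t / 2), and
    takes d_phi = rank(K) in the bandwidth exponent (an assumption).
    """
    n = K.shape[0]
    d_phi = np.linalg.matrix_rank(K)
    alphas = np.eye(n)                   # start at e_i, i.e. at phi(x_i)
    diag = np.diag(K)
    for _ in range(n_iter):
        AK = alphas @ K                  # AK[r, i] = alpha_r^T K e_i
        # ||y_r - phi(x_i)||^2 = a^T K a + e_i^T K e_i - 2 a^T K e_i
        d2 = np.sum(AK * alphas, axis=1)[:, None] + diag[None, :] - 2.0 * AK
        w = np.exp(-0.5 * np.maximum(d2, 0.0) / h[None, :] ** 2)
        w = w / h[None, :] ** (d_phi + 2)
        # The update is a weighted sum of canonical basis vectors, so the
        # new alpha is simply the normalized weight vector.
        alphas = w / w.sum(axis=1, keepdims=True)
    return alphas

# Toy check with a linear kernel K = X X^T (hypothetical data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)),
               rng.normal(5.0, 0.3, (50, 2))])
K = X @ X.T
alphas = kernel_mean_shift(K, h=np.ones(len(X)))
modes = alphas @ X       # with a linear kernel, Phi^T = X, so modes are explicit
```

With a nonlinear kernel the modes cannot be written out explicitly, but distances between them are still available through the same quadratic form in `K`.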
11Kernel Mean Shift Clustering
- The clustering algorithm starts on the data points
- Upon convergence the mode can be expressed via $\bar{y} = \Phi \bar{\alpha}_y$
- When the rank of the kernel matrix K is smaller than n, columns of $\Phi$ form an overcomplete basis and the modes can be identified within an equivalence relationship
- The procedure is restricted to the subspace spanned by the feature points therefore
- The convergence of the procedure follows from the original proof
12Constrained Kernel Mean Shift
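[Figure: Feature Space, Projection]
- Let $\{ (c_{j,1}, c_{j,2}) \}_{j=1}^{m}$ be the set of $m$ point pairs to be clustered together
- The constraint matrix is given by
  $$A = \begin{bmatrix} \phi(c_{1,1}) - \phi(c_{1,2}) & \cdots & \phi(c_{m,1}) - \phi(c_{m,2}) \end{bmatrix}^T$$
- The null space of A is the set of vectors
  $$\operatorname{null}(A) = \{ w \in \mathcal{H} : A w = 0 \}$$
  and the matrix
  $$P = I_{d_\phi} - A^T (A A^T)^{+} A$$
  projects $\phi(x)$ to $\operatorname{null}(A)$
- Under the projection the constraint point pairs are overlapped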
13Constrained Kernel Mean Shift
- The constrained mean shift algorithm implicitly maps the data points to the null space of the constraint matrix, $\hat{\phi}(x) = P \phi(x)$, and performs mean shift on the embedded space
- This process is equivalent to applying the kernel mean shift algorithm with the projected kernel function
  $$\hat{K}(x, x') = \phi(x)^T P \, \phi(x')$$
- The projected kernel matrix only involves mapping through the kernel function and can be expressed in terms of the original kernel matrix
  $$\hat{K} = K - K_A^T S^{+} K_A$$
  where $K_A = A \Phi$ is the $m \times n$ part of the kernel matrix involving the constraint set and $S = A A^T$ is the $m \times m$ scaling matrix
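The projected kernel matrix can be computed from the original kernel matrix alone, as the following NumPy sketch suggests. It assumes the constrained points are part of the data set and are given as index pairs; the names `projected_kernel`, `K_A`, and `S` are illustrative labels for the constraint part of the kernel matrix and the scaling matrix, and the Gaussian kernel in the toy run is just one possible choice.

```python
import numpy as np

def projected_kernel(K, pairs):
    """Return K_hat = K - K_A^T S^+ K_A for must-link index pairs.

    K_A is the m x n constraint part of the kernel matrix and S the
    m x m scaling matrix; both are built from entries of K only.
    """
    i = np.array([p[0] for p in pairs])
    j = np.array([p[1] for p in pairs])
    K_A = K[i, :] - K[j, :]        # row r: (phi(c_r1) - phi(c_r2))^T Phi
    S = K_A[:, i] - K_A[:, j]      # S[r, s]: inner product of difference vectors
    return K - K_A.T @ np.linalg.pinv(S) @ K_A

# Toy data (hypothetical) with a Gaussian kernel.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq / 2.0)
pairs = [(0, 1), (2, 3)]
K_hat = projected_kernel(K, pairs)

# Squared distance between the first constrained pair after embedding.
gap = K_hat[0, 0] + K_hat[1, 1] - 2.0 * K_hat[0, 1]
```

The distance between each constrained pair vanishes in the embedded space, so running kernel mean shift on `K_hat` necessarily sends both points of a pair to the same mode.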
14Experiments
- We conduct experiments on three datasets
- Synthetic experiments
- Clustering faces across illumination on the CMU PIE dataset
- Clustering object categories on the Caltech-4 dataset
- For the first two experiments we utilize the Gaussian kernel function
  $$K(x, x') = \exp\!\left( -\frac{\| x - x' \|^2}{2 \sigma^2} \right)$$
- For the last experiment we utilize a different kernel function
- We use adaptive bandwidth mean shift where the bandwidth for each point is selected as the k-th smallest distance from the point to all the data points on the feature space
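The adaptive bandwidth rule can be evaluated directly from a kernel matrix, since feature-space distances follow from its entries. The sketch below is a minimal illustration; the name `adaptive_bandwidths` is hypothetical, and it assumes the point's zero distance to itself is not counted among the k smallest.

```python
import numpy as np

def adaptive_bandwidths(K, k):
    """Bandwidth h_i = k-th smallest feature-space distance from point i.

    Uses ||phi(x_i) - phi(x_j)||^2 = K_ii + K_jj - 2 K_ij.
    """
    diag = np.diag(K)
    d2 = diag[:, None] + diag[None, :] - 2.0 * K
    dist = np.sqrt(np.maximum(d2, 0.0))   # clip tiny negatives from round-off
    # Column 0 of each sorted row is the point's zero distance to itself,
    # so column k is the distance to its k-th nearest other point.
    return np.sort(dist, axis=1)[:, k]

# Toy check with a linear kernel on hypothetical 1-D points 0, 1, 3, 7.
x = np.array([[0.0], [1.0], [3.0], [7.0]])
K = x @ x.T
h = adaptive_bandwidths(K, k=1)
```

In this toy case each bandwidth is just the distance to the nearest neighbor, so isolated points get wider kernels than points in dense regions.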
15Clustering Linear Structure
[Figure: Data Points, Mean Shift, Constrained Mean Shift]
- We generated 240 data points originating from six different lines
- Data is corrupted with normally distributed noise with standard deviation 0.1
- Three pair-wise constraints are given
16Clustering Circular Structure
[Figure: Data Points, Data Points with Outliers, Mean Shift, Constrained Mean Shift]
- We generated 200 data points originating from five concentric circles
- Data is corrupted with normally distributed noise with standard deviation 0.1
- 80 outlier points are added
- Four pair-wise constraints are enforced from the same circle
17Clustering Faces Across Illumination
[Figure: Samples from CMU PIE Dataset, Constraint Set]
- Dataset contains 441 images from 21 subjects under 21 different illumination conditions
- Images are coarsely registered and scaled to the same size 128x128
- Each image is represented with a 16384-dimensional vector
- Two pair-wise similarity constraints are given per subject
- Approximately 1/10 of the dataset is labeled
18Clustering Faces with Mean Shift
[Figure: Pair-wise Distances, Mean Shift]
- Mean shift finds 5 clusters, corresponding partly to illumination conditions and partly to subject labels
19Clustering Faces with Constrained Mean Shift
[Figure: Pair-wise Distances after Embedding, Constrained Mean Shift]
- Constrained mean shift recovers all 21 subjects perfectly
20Clustering Object Categories
[Figure: Samples from Caltech-4 Dataset]
- Dataset contains 400 images from four object categories: cars, motorcycles, faces, airplanes
- Each image is represented with a 500-bin feature histogram
- Pair-wise constraints are randomly selected within classes
- Experiment is repeated with a varying number of constraints (1 to 20 constraints per object class)
21Clustering Object Categories with Mean Shift
[Figure: Pair-wise Distances, Mean Shift]
- Some of the samples from the airplanes class and half of the motorcycles class are incorrectly identified as cars
- The overall clustering accuracy is 74.25%
22Clustering Object Categories with Constrained Mean Shift
[Figure: Pair-wise Distances after Embedding, Constrained Mean Shift]
- Clustering example after enforcing 10 constraints per class
- Only a single example among 400 is misclustered
23Clustering Performance vs. Number of Constraints
- The results are averaged over 20 runs, where at each run a different constraint set is selected
- Clustering accuracy is over 99% for more than 7 constraints per class
24Conclusion
- We presented a novel constrained mean shift clustering method that can incorporate pair-wise must-link priors
- The method preserves all the advantages of the original mean shift clustering algorithm
- The presented approach also extends to inner product spaces; thus, it is applicable to a wide range of problems