Title: Clustering with k-means and mixture of Gaussian densities
Slide 1: Clustering with k-means and mixture of Gaussian densities
- Jakob Verbeek
- December 4, 2009
Slide 2: Plan for this course
- Introduction to machine learning
- Clustering techniques
- k-means, Gaussian mixture density
- Gaussian mixture density continued
- Parameter estimation with EM, Fisher kernels
- Classification techniques 1
- Introduction, generative methods, semi-supervised
- Classification techniques 2
- Discriminative methods, kernels
- Decomposition of images
- Topic models
Slide 3: Clustering
- Finding a group structure in the data
- Data in one cluster similar to each other
- Data in different clusters dissimilar
- Map each data point to a discrete cluster index
- Flat methods find k groups (k known, or automatically set)
- Hierarchical methods define a tree structure over the data
Slide 4: Hierarchical Clustering
- Data set is partitioned into a tree structure
- Top-down construction
- Start with all data in one cluster (the root node)
- Apply flat clustering into k groups
- Recursively cluster the data in each group
- Bottom-up construction
- Start with each point in a separate cluster
- Recursively merge closest clusters
- Distance between clusters A and B
- Min, max, or mean distance between x in A and y in B (see the sketch below)
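As an illustration of the bottom-up construction, a minimal sketch using scipy's agglomerative clustering; the linkage methods 'single', 'complete', and 'average' correspond to the min, max, and mean inter-cluster distances above (the data here is synthetic and purely illustrative):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Synthetic 2-D data: two well-separated blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)),
               rng.normal(5, 1, (50, 2))])

# Bottom-up clustering: recursively merge the closest clusters.
# method='single'   -> min distance between x in A and y in B
# method='complete' -> max distance
# method='average'  -> mean distance
Z = linkage(X, method='average')

# Cut the tree to obtain a flat clustering with 2 groups
labels = fcluster(Z, t=2, criterion='maxclust')
print(labels)
```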
Slide 5: Clustering example
- Learn face similarity from training pairs labeled as same/different
- Cluster faces based on identity
- Example: Picasa web albums, label face clusters
Guillaumin, Verbeek, Schmid, ICCV 2009
Slide 6: Clustering example: visual words
Slide 7: Clustering for visual vocabulary construction
- Clustering of local image descriptors
- Most often done using k-means or a mixture of Gaussians
- Divides the space of region descriptors into a collection of non-overlapping cells
- Recap of the image representation pipeline
- Extract image regions at different locations and scales: randomly, on a regular grid, or using an interest point detector
- Compute a descriptor for each region (e.g. SIFT)
- Assign each descriptor to a cluster center (see the sketch after this list)
- Or do soft assignment or multiple assignment
- Make a histogram for the complete image
- Possibly separate histograms for different image regions
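A minimal sketch of the assignment and histogram steps, assuming descriptors and cluster centers are given as numpy arrays (the function and variable names are illustrative):

```python
import numpy as np

def bow_histogram(descriptors, centers):
    """Hard-assign each descriptor to its nearest cluster center and
    build a normalized bag-of-words histogram for the image."""
    # Pairwise squared distances, shape (n_descriptors, n_centers)
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    assignments = d2.argmin(axis=1)                    # nearest center index
    hist = np.bincount(assignments, minlength=len(centers))
    return hist / hist.sum()                           # normalize to sum to 1
```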
Slide 8: Definition of k-means clustering
- Given a data set of N points x_n, n = 1, ..., N
- Goal: find K cluster centers m_k, k = 1, ..., K
- Clustering: assign each point to the closest center
- Error criterion: sum of squared distances of the data points to their closest cluster centers
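Written out, the error criterion the slide refers to is

$$ E(\{m_k\}) = \sum_{n=1}^{N} \min_{k \in \{1,\dots,K\}} \| x_n - m_k \|^2 . $$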
Slide 9: Examples of k-means clustering
- Data uniformly sampled in R^2
- Data non-uniformly sampled in R^3
Slide 10: Minimizing the error function
- The error function is
- non-differentiable, due to the min operator
- non-convex, i.e. there are local minima
- Minimization can be done with an iterative algorithm (sketched below)
1) Initialize cluster centers
2) Assign each data point to the nearest center
3) Update the cluster centers as the mean of the associated data
4) If the cluster centers changed, return to step 2)
5) Return the cluster centers
- The iterations monotonically decrease the error function
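A minimal numpy sketch of this iteration (Lloyd's algorithm); initializing the centers at K randomly chosen data points is one common choice, an assumption not prescribed by the slide:

```python
import numpy as np

def kmeans(X, K, max_iter=100, seed=0):
    """Iteratively minimize the sum of squared distances to the
    nearest center. X: (N, d) data array, K: number of clusters."""
    rng = np.random.default_rng(seed)
    # 1) Initialize centers with K randomly chosen data points
    centers = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(max_iter):
        # 2) Assign each point to the nearest center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        z = d2.argmin(axis=1)
        # 3) Update each center as the mean of its assigned points
        new_centers = np.array([X[z == k].mean(axis=0) if np.any(z == k)
                                else centers[k] for k in range(K)])
        # 4) Stop once the centers no longer change
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # 5) Return centers and assignments
    return centers, z
```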
Slide 11: Iteratively minimizing the error function
- Introduce latent variables z_n, with values in {1, ..., K}
- z_n assigns data point x_n to one of the clusters
- Upper bound on the error function, without the min operator
- Error function and bound are equal for the min assignment
- Minimize the bound w.r.t. the cluster centers
- Update the cluster centers as the mean of the associated data
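Reconstructing the bound the slide refers to: for any assignment z = (z_1, ..., z_N),

$$ E(\{m_k\}) = \sum_{n=1}^{N} \min_k \|x_n - m_k\|^2 \;\le\; F(\{m_k\}, z) = \sum_{n=1}^{N} \|x_n - m_{z_n}\|^2 , $$

with equality when $z_n = \arg\min_k \|x_n - m_k\|^2$. For fixed z, the bound F is quadratic in each $m_k$ and is minimized by $m_k = \frac{1}{N_k} \sum_{n: z_n = k} x_n$, the mean of the data assigned to cluster k, where $N_k$ is the number of points assigned to it.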
Slide 12: Iteratively minimizing the error function
- Minimization can be done with an iterative algorithm
- Assign each data point to the nearest center: this constructs a tight bound on the error function
- Update the cluster centers as the mean of the associated data: this minimizes the bound
- An example of iterative bound optimization
- The EM algorithm is another example
Slide 13: Examples of k-means clustering
- Several iterations with two centers
[Figure: cluster assignments over the iterations, with the error function value per iteration]
Slide 14: Examples of k-means clustering
- Clustering the RGB vectors of the pixels in an image
- Compression of an image file of N pixels: originally N x 24 bits
- Store the RGB values of the cluster centers: K x 24 bits
- Store the cluster index of each pixel: N x log2(K) bits
[Figure: images compressed with k-means for several values of K; the three panels are labeled 16.7, 8.3, and 4.2]
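These labels are consistent with compressed size as a percentage of the original: ignoring the K x 24 bits for the centers, which are negligible for large N, the compressed size relative to the original is

$$ \frac{N \log_2 K}{24\,N} = \frac{\log_2 K}{24} , $$

which gives about 16.7% for K = 16, 8.3% for K = 4, and 4.2% for K = 2 (an assumption about the panel settings, not stated on the slide).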
Slide 15: Clustering with Gaussian mixture density
- Each cluster represented by Gaussian density
- Center, as in k-means
- Covariance matrix: spread of the cluster around the center
The Gaussian density, reconstructed from the slide's annotations, is

$$ \mathcal{N}(x; m, C) = (2\pi)^{-d/2}\, |C|^{-1/2} \exp\!\left( -\tfrac{1}{2} (x - m)^{\top} C^{-1} (x - m) \right) , $$

with data dimension d, the determinant |C| of the covariance matrix C, and a quadratic function of the point x and the mean m in the exponent.
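A minimal numerical check of this density with scipy (all values illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

m = np.zeros(2)                              # mean (cluster center)
C = np.array([[2.0, 0.5],
              [0.5, 1.0]])                   # covariance matrix
x = np.array([1.0, -0.5])                    # a query point

# Density of x under N(m, C), via scipy
p = multivariate_normal(mean=m, cov=C).pdf(x)

# The same value from the formula above
d = len(m)
quad = (x - m) @ np.linalg.inv(C) @ (x - m)  # quadratic function of x and m
p_manual = (2 * np.pi) ** (-d / 2) * np.linalg.det(C) ** (-0.5) * np.exp(-0.5 * quad)
assert np.isclose(p, p_manual)
```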
Slide 16: Clustering with Gaussian mixture density
- The mixture density is a weighted sum of Gaussians
- Mixing weight: the importance of each cluster
- The density has to integrate to 1, so we require the mixing weights to be non-negative and to sum to one
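In formulas, with mixing weights $\pi_k$:

$$ p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x; m_k, C_k), \qquad \pi_k \ge 0, \qquad \sum_{k=1}^{K} \pi_k = 1 . $$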
Slide 17: Clustering with Gaussian mixture density
- Given a data set of N points x_n, n = 1, ..., N
- Find the mixture of Gaussians (MoG) that best explains the data
- Assigns maximum likelihood to the data
- Assume the data points are drawn independently from the MoG
- Maximize the log-likelihood of the fixed data set X w.r.t. the parameters of the MoG
- As with k-means, the objective function has local optima
- Can use the Expectation-Maximization (EM) algorithm
- Similar to the iterative k-means algorithm
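The objective being maximized is the data log-likelihood

$$ \mathcal{L}(\theta) = \sum_{n=1}^{N} \log p(x_n) = \sum_{n=1}^{N} \log \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n; m_k, C_k), \qquad \theta = \{\pi_k, m_k, C_k\}_{k=1}^{K} . $$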
Slide 18: Assignment of data points to clusters
- As with k-means, z_n indicates the cluster index for x_n
- To sample a point from the MoG (sketched below):
- Select cluster index k with probability given by the mixing weight
- Sample a point from the k-th Gaussian
- The MoG is recovered if we marginalize over the unknown index
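Marginalizing over the index recovers the mixture: $p(x) = \sum_k p(z = k)\, p(x \mid z = k)$. A minimal sampling sketch following the two steps above (parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
pi = np.array([0.5, 0.3, 0.2])                     # mixing weights, sum to 1
means = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
covs = np.stack([np.eye(2)] * 3)                   # one covariance per cluster

def sample_mog(n):
    """Draw n points: first a cluster index with probability pi,
    then a point from that cluster's Gaussian."""
    z = rng.choice(len(pi), size=n, p=pi)          # cluster indices
    X = np.array([rng.multivariate_normal(means[k], covs[k]) for k in z])
    return X, z

X, z = sample_mog(500)
```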
Slide 19: Soft assignment of data points to clusters
- Given a data point x_n, infer the value of z_n
- Conditional probability of z_n given x_n
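By Bayes' rule, the posterior responsibility of cluster k for point $x_n$ is

$$ p(z_n = k \mid x_n) = \frac{\pi_k \, \mathcal{N}(x_n; m_k, C_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n; m_j, C_j)} . $$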
Slide 20: Maximum likelihood estimation of a Gaussian
- Given data points x_n, n = 1, ..., N
- Find the Gaussian that maximizes the data log-likelihood
- Set the derivatives of the data log-likelihood w.r.t. the parameters to zero
- The parameters are set to the data mean and data covariance
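Setting these derivatives to zero gives the familiar closed-form estimates, the sample mean and sample covariance:

$$ m = \frac{1}{N} \sum_{n=1}^{N} x_n, \qquad C = \frac{1}{N} \sum_{n=1}^{N} (x_n - m)(x_n - m)^{\top} . $$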
Slide 21: Maximum likelihood estimation of MoG
- Use the EM algorithm (a sketch follows below)
- Initialize the MoG parameters, or the soft assignments
- E-step: softly assign the data points to the clusters
- M-step: update the cluster parameters
- Repeat the EM steps, terminate if converged
- Convergence of parameters or assignments
- E-step: compute the posterior on z given x
- M-step: update the Gaussians from the data points, weighted by the posteriors
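A minimal numpy sketch of these EM steps, using the posterior formula from slide 19 and posterior-weighted versions of the slide-20 estimates; the initialization, fixed iteration count, and small covariance regularizer are simple illustrative choices, not prescribed by the slides:

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_mog(X, K, n_iter=100, seed=0):
    """Fit a K-component mixture of Gaussians to X (N, d) with EM."""
    N, d = X.shape
    rng = np.random.default_rng(seed)
    pi = np.full(K, 1.0 / K)                        # uniform mixing weights
    means = X[rng.choice(N, size=K, replace=False)] # init means at data points
    covs = np.stack([np.cov(X.T) + 1e-6 * np.eye(d)] * K)
    for _ in range(n_iter):
        # E-step: responsibilities q[n, k] = p(z_n = k | x_n)
        q = np.stack([pi[k] * multivariate_normal(means[k], covs[k]).pdf(X)
                      for k in range(K)], axis=1)
        q /= q.sum(axis=1, keepdims=True)
        # M-step: update parameters from posterior-weighted data
        Nk = q.sum(axis=0)                          # effective cluster sizes
        pi = Nk / N
        means = (q.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - means[k]
            covs[k] = (q[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(d)
    return pi, means, covs
```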
Slide 22: Maximum likelihood estimation of MoG
- Example of several EM iterations
Slide 23: Clustering with k-means and MoG
- Hard assignment in k-means is not robust near the borders of quantization cells
- Soft assignment in MoG accounts for ambiguity in the assignment
- Both algorithms are sensitive to initialization
- Run from several initializations
- Keep the best result
- The number of clusters needs to be set
- Both algorithms can be generalized to other types of distances or densities
Images from van Gemert et al., IEEE TPAMI, 2010
Slide 24: Plan for this course
- Introduction to machine learning
- Clustering techniques
- k-means, Gaussian mixture density
- Reading for next week
- Neal and Hinton, "A view of the EM algorithm that justifies incremental, sparse, and other variants", in Learning in Graphical Models, 1998
- Part of chapter 3 of my thesis
- Both available on the course website: http://lear.inrialpes.fr/verbeek/teaching
- Gaussian mixture density continued
- Parameter estimation with EM, Fisher kernels
- Classification techniques 1
- Introduction, generative methods, semi-supervised
- Classification techniques 2
- Discriminative methods, kernels