1
Clustering with k-means and mixture of Gaussian
densities
  • Jakob Verbeek
  • December 4, 2009

2
Plan for this course
  • Introduction to machine learning
  • Clustering techniques
  • k-means, Gaussian mixture density
  • Gaussian mixture density continued
  • Parameter estimation with EM, Fisher kernels
  • Classification techniques 1
  • Introduction, generative methods, semi-supervised
  • Classification techniques 2
  • Discriminative methods, kernels
  • Decomposition of images
  • Topic models

3
Clustering
  • Finding a group structure in the data
  • Data in the same cluster are similar to each other
  • Data in different clusters are dissimilar
  • Map each data point to a discrete cluster index
  • Flat methods find k groups (k known, or set
    automatically)
  • Hierarchical methods define a tree structure
    over the data

4
Hierarchical Clustering
  • Data set is partitioned into a tree structure
  • Top-down construction
  • Start with all data in one cluster (the root node)
  • Apply flat clustering into k groups
  • Recursively cluster the data in each group
  • Bottom-up construction
  • Start with each point in a separate cluster
  • Recursively merge closest clusters
  • Distance between clusters A and B
  • Min, max, or mean distance between x in A and
    y in B, as written below
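
In formulas, these are the standard single, complete, and average linkage distances:

  d_{\min}(A,B) = \min_{x \in A,\, y \in B} \| x - y \|
  d_{\max}(A,B) = \max_{x \in A,\, y \in B} \| x - y \|
  d_{\mathrm{mean}}(A,B) = \frac{1}{|A|\,|B|} \sum_{x \in A} \sum_{y \in B} \| x - y \|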

5
Clustering example
  • Learn face similarity from training pairs labeled
    as same/different
  • Cluster faces based on identity
  • Example: Picasa web albums, labeling face clusters

Guillaumin, Verbeek, Schmid, ICCV 2009
6
Clustering example: visual words
7
Clustering for visual vocabulary construction
  • Clustering of local image descriptors
  • Most often done using k-means or mixture of
    Gaussians
  • Divide the space of region descriptors into a
    collection of non-overlapping cells
  • Recap of the image representation pipeline (a
    code sketch follows below)
  • Extract image regions at different locations and
    scales: randomly, on a regular grid, or using an
    interest point detector
  • Compute a descriptor for each region (e.g. SIFT)
  • Assign each descriptor to a cluster center
  • Or do soft assignment or multiple assignment
  • Make a histogram for the complete image
  • Possibly separate histograms for different image
    regions
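
A minimal sketch of the assignment and histogram steps in Python/NumPy; the function name, array layout, and hard-assignment choice are illustrative assumptions, not the lecture's code:

  import numpy as np

  def bow_histogram(descriptors, centers):
      # descriptors: (N, d) array of region descriptors, e.g. SIFT.
      # centers: (K, d) array of cluster centers (the visual vocabulary).
      d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
      assignment = d2.argmin(axis=1)            # hard-assign to nearest center
      hist = np.bincount(assignment, minlength=len(centers))
      return hist / hist.sum()                  # normalized bag-of-words histogram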

8
Definition of k-means clustering
  • Given a data set of N points x_n, n = 1, ..., N
  • Goal: find K cluster centers m_k, k = 1, ..., K
  • Clustering: assign each point to the closest center
  • Error criterion: sum of squared distances from
    each data point to its closest cluster center,
    written out below
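
The error function in symbols:

  E(m_1, ..., m_K) = \sum_{n=1}^{N} \min_{k} \| x_n - m_k \|^2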

9
Examples of k-means clustering
  • Data uniformly sampled in R^2
  • Data non-uniformly sampled in R^3

10
Minimizing the error function
  • The error function is non-differentiable due to
    the min operator
  • Non-convex, i.e. there are local minima
  • Minimization can be done with an iterative
    algorithm (sketched in code below)
  • 1) Initialize the cluster centers
  • 2) Assign each data point to the nearest center
  • 3) Update the cluster centers as the mean of the
    associated data
  • 4) If the cluster centers changed, return to step 2)
  • 5) Return the cluster centers
  • The iterations monotonically decrease the error
    function
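
A minimal NumPy sketch of this algorithm; initializing from random data points is an assumption, not prescribed by the slides:

  import numpy as np

  def kmeans(X, K, n_iter=100, seed=0):
      rng = np.random.default_rng(seed)
      # 1) Initialize centers as K distinct data points picked at random.
      centers = X[rng.choice(len(X), size=K, replace=False)]
      for _ in range(n_iter):
          # 2) Assign each point to its nearest center.
          d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
          z = d2.argmin(axis=1)
          # 3) Update each center as the mean of its associated points.
          new_centers = np.array([X[z == k].mean(axis=0) if np.any(z == k)
                                  else centers[k] for k in range(K)])
          # 4) Stop once the centers no longer change.
          if np.allclose(new_centers, centers):
              break
          centers = new_centers
      # 5) Return the cluster centers and the final assignments.
      return centers, z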

11
Iteratively minimizing the error function
  • Introduce a latent variable z_n, with value in
    {1, ..., K}
  • z_n assigns data point x_n to one of the clusters
  • Upper bound on the error function, without the
    min operator (written out below)
  • The error function and the bound are equal for
    the minimizing assignment
  • Minimize the bound w.r.t. the cluster centers
  • Update the cluster centers as the mean of the
    associated data
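
With assignments z_n, the bound on the error function reads:

  F(\{m_k\}, \{z_n\}) = \sum_{n=1}^{N} \| x_n - m_{z_n} \|^2
    \;\geq\; \sum_{n=1}^{N} \min_k \| x_n - m_k \|^2 = E(\{m_k\}),

with equality when z_n = \arg\min_k \| x_n - m_k \|^2.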

12
Iteratively minimizing the error function
  • Minimization can be done with an iterative algorithm
  • Assign each data point to the nearest center:
    constructs a tight bound on the error function
  • Update the cluster centers as the mean of the
    associated data: minimizes the bound
  • This is an example of iterative bound optimization
  • The EM algorithm is another example

13
Examples of k-means clustering
  • Several iterations with two centers

(Plot: error function value over the iterations.)
14
Examples of k-means clustering
  • Clustering RGB vectors of pixels in images
  • Compression of the image file: original is N x 24 bits
  • Store the RGB values of the cluster centers: K x 24 bits
  • Store the cluster index of each pixel: N x log2(K) bits

(Figure: quantized images; with K = 16, 4, and 2 cluster centers the index costs 4, 2, and 1 bits per pixel, i.e. 16.7%, 8.3%, and 4.2% of the original 24 bits.)
15
Clustering with Gaussian mixture density
  • Each cluster is represented by a Gaussian density
  • Center, as in k-means
  • Covariance matrix: the cluster's spread around
    the center

N(x | m, C) = (2\pi)^{-d/2} \, |C|^{-1/2} \exp\left( -\tfrac{1}{2} (x - m)^\top C^{-1} (x - m) \right)

where |C| is the determinant of the covariance matrix C, (x - m)^\top C^{-1} (x - m) is a quadratic function of the point x and the mean m, and d is the data dimension.
16
Clustering with Gaussian mixture density
  • The mixture density is a weighted sum of Gaussians
  • Mixing weight: the importance of each cluster
  • The density has to integrate to 1, so we require
    the mixing weights to be non-negative and to sum
    to one, as written below
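
In symbols, with mixing weights \pi_k:

  p(x) = \sum_{k=1}^{K} \pi_k \, N(x | m_k, C_k),
  \qquad \pi_k \geq 0, \quad \sum_{k=1}^{K} \pi_k = 1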

17
Clustering with Gaussian mixture density
  • Given a data set of N points x_n, n = 1, ..., N
  • Find the mixture of Gaussians (MoG) that best
    explains the data
  • Assigns maximum likelihood to the data
  • Assume the data points are drawn independently
    from the MoG
  • Maximize the log-likelihood of the fixed data set
    X w.r.t. the parameters of the MoG (written out
    below)
  • As with k-means, the objective function has local
    optima
  • Can use the Expectation-Maximization (EM) algorithm
  • Similar to the iterative k-means algorithm
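
Using the independence assumption, the objective is:

  \log p(X) = \sum_{n=1}^{N} \log p(x_n)
            = \sum_{n=1}^{N} \log \sum_{k=1}^{K} \pi_k \, N(x_n | m_k, C_k)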

18
Assignment of data points to clusters
  • As with k-means, z_n indicates the cluster index
    for x_n
  • To sample a point from the MoG
  • Select a cluster index k with probability given by
    the mixing weight
  • Sample the point from the k-th Gaussian
  • The MoG density is recovered if we marginalize
    over the unknown index, as shown below
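
Marginalizing over the index recovers the mixture density:

  p(x) = \sum_{k=1}^{K} p(z = k) \, p(x | z = k)
       = \sum_{k=1}^{K} \pi_k \, N(x | m_k, C_k)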

19
Soft assignment of data points to clusters
  • Given a data point x_n, infer the value of z_n
  • Conditional probability of z_n given x_n, via
    Bayes' rule (below)
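
Bayes' rule applied to the mixture model gives the soft assignment:

  p(z_n = k | x_n) = \frac{\pi_k \, N(x_n | m_k, C_k)}
                          {\sum_{j=1}^{K} \pi_j \, N(x_n | m_j, C_j)}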

20
Maximum likelihood estimation of Gaussian
  • Given data points x_n, n = 1, ..., N
  • Find the Gaussian that maximizes the data
    log-likelihood
  • Set the derivative of the data log-likelihood
    w.r.t. the parameters to zero
  • The parameters are set to the data mean and
    covariance, as given below
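
Setting the derivatives to zero yields the closed-form estimates:

  m = \frac{1}{N} \sum_{n=1}^{N} x_n,
  \qquad C = \frac{1}{N} \sum_{n=1}^{N} (x_n - m)(x_n - m)^\top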

21
Maximum likelihood estimation of MoG
  • Use the EM algorithm
  • Initialize the MoG parameters, or the soft-assignments
  • E-step: soft-assign the data points to clusters
  • M-step: update the cluster parameters
  • Repeat the EM steps, terminate if converged
  • Convergence of parameters or of assignments
  • E-step: compute the posterior on z given x
  • M-step: update the Gaussians from the data points,
    weighted by the posterior (see the sketch after
    this list)
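
A minimal NumPy sketch of these EM steps for a MoG with full covariances; the initialization and the small regularization term are illustrative assumptions, and a robust version would compute the E-step in the log domain:

  import numpy as np

  def gauss(X, m, C):
      # Gaussian density N(x | m, C) evaluated at each row of X.
      d = X.shape[1]
      diff = X - m
      quad = np.einsum('ni,ij,nj->n', diff, np.linalg.inv(C), diff)
      norm = (2 * np.pi) ** (-d / 2) * np.linalg.det(C) ** (-0.5)
      return norm * np.exp(-0.5 * quad)

  def em_mog(X, K, n_iter=50, seed=0):
      rng = np.random.default_rng(seed)
      N, d = X.shape
      # Initialize: random centers, the data covariance, uniform weights.
      m = X[rng.choice(N, size=K, replace=False)]
      C = np.stack([np.cov(X.T) + 1e-6 * np.eye(d)] * K)
      pi = np.full(K, 1.0 / K)
      for _ in range(n_iter):
          # E-step: posterior q[n, k] = p(z_n = k | x_n).
          q = np.stack([pi[k] * gauss(X, m[k], C[k]) for k in range(K)], axis=1)
          q /= q.sum(axis=1, keepdims=True)
          # M-step: update the parameters from posterior-weighted points.
          Nk = q.sum(axis=0)
          pi = Nk / N
          m = (q.T @ X) / Nk[:, None]
          for k in range(K):
              diff = X - m[k]
              C[k] = (q[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(d)
      return pi, m, C, q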

22
Maximum likelihood estimation of MoG
  • Example of several EM iterations

23
Clustering with k-means and MoG
  • Hard assignment in k-means is not robust near the
    borders of the quantization cells
  • Soft assignment in a MoG accounts for ambiguity
    in the assignment
  • Both algorithms are sensitive to initialization
  • Run from several initializations
  • Keep the best result
  • The number of clusters needs to be set
  • Both algorithms can be generalized to other types
    of distances or densities

Images from van Gemert et al., IEEE TPAMI, 2010
24
Plan for this course
  • Introduction to machine learning
  • Clustering techniques
  • k-means, Gaussian mixture density
  • Reading for next week
  • Neal and Hinton, "A view of the EM algorithm
    that justifies incremental, sparse, and other
    variants", in Learning in Graphical Models, 1998
  • Part of chapter 3 of my thesis
  • Both available on the course website:
    http://lear.inrialpes.fr/verbeek/teaching
  • Gaussian mixture density continued
  • Parameter estimation with EM, Fisher kernels
  • Classification techniques 1
  • Introduction, generative methods, semi-supervised
  • Classification techniques 2
  • Discriminative methods, kernels