Title: Unsupervised Clustering of Images using their Joint Segmentation
1. Unsupervised Clustering of Images using their Joint Segmentation
- Sonia Starik, Yevgeny Seldin, Michael Werman
2. Problem Statement
- Classify a set of images by their content similarity
- Examples:
  - photos from traveling
  - image database organization
  - movie segmentation
- Unsupervised clustering based on a prior joint segmentation of the image set
3. Major Idea
- Pipeline: Images → Joint Segmentation → Segmented images → Classification → Classified images
- The classification step uses the segments ↔ words, images ↔ documents analogy
4Algorithm Framework
- Relate to an image as a collection of homogeneous
textures, where each texture can be identified by
its statistical nature - for each texture, assume it was sampled i.i.d.
from a single distribution - identify it by marginal density of its wavelet
subband coefficients
5. Algorithm Framework
- Represent each image as a soft mixture of a small number of textures common to all the images in the set
  - different regions have different textures
  - the same texture may appear at many locations in the set
  - ⇒ segment the image set into the family of textures constituting it
  - use the Deterministic Annealing (DA) framework for unsupervised segmentation
6. Algorithm Framework
- Use the co-occurrence statistics of the model centroids (which identify textures) and the images in order to cluster the images
  - parallel: words/documents ↔ segments/images
  - known and formally justified algorithms
7. General Schema
- Preprocessing
  - build parametric models for image sub-windows
  - use wavelet coefficient statistics for the models
- Segmentation
  - jointly segment the set of images to obtain a small set of global models and soft assignments of image regions to these models
  - top-down hierarchical unsupervised segmentation
  - based on joint work of Seldin, Bejerano, Tishby on natural languages and protein sequences
  - reminiscent of the work of Hofmann, Puzicha, Buhmann
8. General Schema (cont.)
- Image classification according to the segmentation map
  - compute the statistics of segment-image co-occurrences
  - for each image, obtain the conditional probability distribution P(segment|image), with Σ_segment P(segment|image) = 1
  - divide the image set into k clusters based on these statistics
  - use the sequential Information Bottleneck algorithm (N. Slonim, N. Friedman, N. Tishby)
9. Preprocessing Step
- Make an overlapping net of small square windows of a predefined size for each image
  - overlapping ⇒ spatial coherence
- Build a parametric model for each window s.t.
  - a good similarity measure between two windows can be defined
  - an average model of n window models can be computed
  - small perturbations of a model can be performed
- Natural texture modeling by its wavelet statistics (Do, Vetterli)
10. Preprocessing Step - Window Modeling
- Model a window by the marginal density of its wavelet subband coefficients (see the sketch below)
  - perform a conventional wavelet decomposition pyramid with L (usually L=3) levels (with Daubechies or reverse-biorthogonal filters)
  - build a histogram of the wavelet coefficients of each subband
  - normalize the histograms to obtain probability distributions
  - use the resulting set of distributions as a parametric model for the window
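A minimal sketch of this window model, assuming the PyWavelets (pywt) library; for simplicity it uses a fixed bin count per subband rather than the adaptive scheme of the next slide, and all function and parameter names are illustrative:

```python
import numpy as np
import pywt

def window_model(window, wavelet="db4", levels=3, n_bins=16):
    """Model a square image window by the normalized histograms
    (marginal densities) of its wavelet subband coefficients."""
    # Conventional L-level wavelet decomposition pyramid.
    coeffs = pywt.wavedec2(window, wavelet, level=levels)
    model = []
    # coeffs[0] is the approximation; coeffs[1:] holds the
    # (horizontal, vertical, diagonal) detail subbands per level.
    for detail_level in coeffs[1:]:
        for subband in detail_level:
            hist, _ = np.histogram(subband.ravel(), bins=n_bins)
            # Normalize to a probability distribution, with a small
            # floor so KL divergences stay finite later on.
            model.append((hist + 1e-10) / (hist.sum() + 1e-10 * n_bins))
    return model  # one distribution per subband
```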
11. Preprocessing Step - Window Modeling
- Building a histogram for a wavelet subband
  - number of histogram bins ≈ square root of the number of coefficients in the subband
    - an optimal tradeoff between resolution and statistical significance
  - construct the histogram bins s.t. each bin contains approximately the same number of samples
    - coarsely estimate the coefficient distribution of the subband over the whole set of windows by Gaussian fitting
    - use the inverse of the fitted Gaussian (its quantile function) to place the bin edges
  - normalize the histograms to obtain probability distributions
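The slide's "inverse Gaussian distribution" reads as the quantile function of the fitted Gaussian; a sketch of the equiprobable-bin construction under that assumption, using SciPy and illustrative names:

```python
import numpy as np
from scipy.stats import norm

def gaussian_quantile_bins(pooled_coeffs, n_bins):
    """Inner bin edges placed at Gaussian quantiles, so each of the
    n_bins bins holds roughly equal probability mass under a Gaussian
    fitted to the subband coefficients pooled over all windows."""
    mu, sigma = pooled_coeffs.mean(), pooled_coeffs.std()
    probs = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    return norm.ppf(probs, loc=mu, scale=sigma)

def subband_histogram(coeffs, inner_edges):
    """Normalized histogram over the equiprobable bins; the two outer
    bins extend to -inf and +inf via np.digitize."""
    idx = np.digitize(coeffs.ravel(), inner_edges)
    counts = np.bincount(idx, minlength=len(inner_edges) + 1)
    return (counts + 1e-10) / (counts.sum() + 1e-10 * len(counts))
```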
12. Preprocessing Step - Window Modeling
- Kullback-Leibler divergence as a common measure of similarity between distributions p, q
  - D_KL(p || q) = Σ_x p(x) log(p(x)/q(x))
- Distance from a window model H to a centroid model M
  - a weighted sum of the pairwise divergences D_KL(H_l || M_l) over the subbands l: D_total(H || M) = Σ_l w_l D_KL(H_l || M_l)
  - the number of coefficients decreases at lower resolutions ⇒ decrease the weights w_l accordingly
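In code the two formulas are short; this sketch assumes the per-subband distributions are NumPy arrays with no zero entries (the smoothing above guarantees that), and d_total is reused by the later sketches:

```python
import numpy as np

def kl(p, q):
    """D_KL(p || q) = sum_x p(x) log(p(x)/q(x)) for discrete
    distributions given as arrays."""
    return float(np.sum(p * np.log(p / q)))

def d_total(H, M, weights):
    """Distance from window model H to centroid model M: the
    w_l-weighted sum of per-subband KL divergences."""
    return sum(w * kl(h, m) for h, m, w in zip(H, M, weights))
```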
13. Preprocessing Step - Centroid Modeling
- To build a centroid model for a weighted set of image windows, build an average model: for each subband l,
  - WeightedAverageModel_l = Σ_k w_k H_{k,l}
- The centroid model M found this way minimizes the distortion (the sum of the distances of the windows to M)
  - M = argmin_M Σ_k D_total(H_k || M)
- Same parametric structure for the centroid model and the window models
- Color properties can be used instead
14. Joint Segmentation
- Goal: find k segment models {M_j}, j = 1..k, that minimize the average in-segment distortion
  - ⟨d⟩ = (1/N) Σ_j Σ_i P(M_j|x_i) d(M_j, x_i)
- Solve it within the DA framework to obtain segmentations at different levels of resolution
- Iteratively:
  - apply the EM algorithm to obtain a segmentation at the current resolution
  - when EM converges, increase the resolution and repeat
15. Segmentation - General Schema
- Start with a single average model for all the histograms in the set, at an initial (small) resolution
- Repeatedly:
  - split the largest model into 2 almost identical models with small asymmetric perturbations
  - use EM to obtain a new segmentation
    - re-assign the data (image patches) to the new set of models
    - re-estimate the models from the assigned regions
  - merge/eliminate identical/small models
  - increase the resolution
16. Soft-Clustering (EM) Step
- Assigning image patches to the given models (see the sketch below)
  - minimize the objective functional: min over {P(M_j|x_i)} of I(X,M), subject to ⟨d⟩ ≤ D
  - I(X,M) = (1/N) Σ_i Σ_j P(M_j|x_i) log(P(M_j|x_i)/P(M_j))
  - idea: the assignment that minimizes the mutual information subject to the given constraints (here, the maximal permitted distortion) is the most probable one
- Model re-estimation
  - M_j = WeightedAverageModel(H(patch), w_{M_j}(patch))
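A hedged sketch of the soft-assignment step: in the DA formulation, minimizing I(X,M) at a fixed permitted distortion yields Gibbs assignments P(M_j|x_i) ∝ P(M_j) exp(-d(M_j,x_i)/T), where the temperature T is the Lagrange multiplier of the distortion constraint:

```python
import numpy as np

def em_step(dist, prior, T):
    """One E step of the DA/EM loop. dist[i, j] = d(M_j, x_i),
    prior[j] = P(M_j). Returns the Gibbs soft assignments
    P(M_j|x_i) and the updated marginal P(M_j)."""
    logits = np.log(prior)[None, :] - dist / T
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)   # rows: P(M_j | x_i)
    return p, p.mean(axis=0)            # new P(M_j)
```

The M step then rebuilds each model as centroid_model(data, p[:, j]), per the re-estimation rule above.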
17. Soft-Clustering (EM) Step (cont.)
18. Deterministic Annealing Properties
- At each given resolution there is a (finite) number K* of models required to describe the data; extra models are unified (end up with identical parameters) or disappear (P(M) = 0, no data assigned to them)
- Avoids local minima
- Provides a hierarchy of segmentation solutions
19. Forced Hierarchy
- New algorithmic enhancement: forced hierarchy
  - child models created after a split are restricted to the data that belonged to their parent only
  - limits the algorithm's search space
  - speeds up the segmentation process
20. Choosing the Next Model to Split
- Previous solution: choose the model M with the largest portion of the data assigned to it (max P(M))
  - gives a division into approximately even-sized segments
- Improvement: choose the model containing the most variable data (scored as in the sketch below)
  - JS(M1,M2) = (P(M1)/P(M)) D_KL(M1 || M) + (P(M2)/P(M)) D_KL(M2 || M)
  - choose max_M P(M) · JS(M1,M2)
    - the reduction of the distortion achieved if we replace M with M1, M2
- Fewer small models (fewer insignificant splits), each split gives the maximal reduction of the total distortion, and top-down clustering becomes possible
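A sketch of the improved split score, reusing d_total from the earlier sketch; pM, pM1, pM2 stand for the probability masses P(M), P(M1), P(M2) of the parent and its tentative children:

```python
def split_score(pM, pM1, pM2, M, M1, M2, weights):
    """Score(M) = P(M) * JS(M1, M2): the mass-weighted JS-style
    divergence of the tentative children from their parent, i.e.
    the distortion reduction gained by replacing M with M1, M2."""
    js = (pM1 / pM) * d_total(M1, M, weights) \
       + (pM2 / pM) * d_total(M2, M, weights)
    return pM * js
```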
21. Forced Hierarchy DA
- Diagram: the initial model M (the average of all the data) undergoes a tentative split into children M1 and M2, scored by their JS divergence from M:
  - Score(M) = P(M) · JS(M1,M2)
  - JS(M1,M2) = (P(M1)/P(M)) D_KL(M1 || M) + (P(M2)/P(M)) D_KL(M2 || M)
22. Spatial Coherence
- Shifted grid: each patch belongs to 4 windows
- Distance between a model and a patch
  - d(M, patch) = Σ_{window ⊇ patch} D_KL(H(window) || M) / (number of windows intersecting the patch)
- Assignment probability of a window
  - w_M(window) = P(M|window) = Σ_{patch ⊆ window} w_M(patch) / (number of patches)
23. Whole Segmentation Algorithm
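The slide itself is a flowchart; below is a compact, assumption-laden sketch of the loop it depicts, wired from the helper functions above. Model merging/elimination and the JS-based split choice are elided; the largest model is split instead:

```python
import numpy as np

def perturb(model, eps=1e-3, rng=None):
    """Small multiplicative perturbation of a model's subband
    distributions, renormalized so they stay valid probabilities."""
    if rng is None:
        rng = np.random.default_rng(0)
    out = []
    for p in model:
        q = np.clip(p * (1.0 + eps * rng.standard_normal(len(p))),
                    1e-12, None)
        out.append(q / q.sum())
    return out

def joint_segmentation(data, weights, k, T0=1.0, cooling=0.9, iters=20):
    """Top-down DA segmentation: one average model, then alternate
    (EM at the current resolution, split, cool T) until k models."""
    models = [centroid_model(data, np.ones(len(data)))]
    prior, T = np.array([1.0]), T0
    while True:
        for _ in range(iters):  # EM at the current resolution
            dist = np.array([[d_total(H, M, weights) for M in models]
                             for H in data])
            p, prior = em_step(dist, prior, T)
            models = [centroid_model(data, p[:, j])
                      for j in range(len(models))]
        if len(models) >= k:
            return models, p
        j = int(np.argmax(prior))  # split the largest model
        models[j:j + 1] = [perturb(models[j]), perturb(models[j])]
        prior = np.insert(prior, j, prior[j] / 2.0)
        prior[j + 1] /= 2.0
        T *= cooling               # i.e. increase the resolution
```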
24. Image Classification According to the Segmentation Map
- Brings in the image ↔ document, model ↔ word analogy
- P(M|I) = (1/N(I)) Σ_{x_i ∈ I} P(M|x_i), analogous to the normalized count of the words appearing in a document
- Goal: represent the set of input images I_i by a small number of clusters C_k s.t. the distribution of the models M_j (the features) inside each C_k is maximally close to the original P(M_j|I_i) for all I_i ∈ C_k
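In code this is a one-liner, assuming the per-image soft assignments from the segmentation step; each row of the result is the "word histogram" of one image:

```python
import numpy as np

def image_model_distributions(per_image_assignments):
    """P(M|I) for every image: average the patch-level soft
    assignments P(M|x_i) over the N(I) patches of each image I --
    the analogue of a document's normalized word counts."""
    return np.stack([np.mean(p, axis=0) for p in per_image_assignments])
```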
25. Image Classification According to the Segmentation Map
- Measure of the quality of the representation
  - I(C;M)/I(I;M) (to be maximized)
- Use the sIB algorithm to approximate the solution
26. Sequential Information Bottleneck
- Start from a random partition of the data into k clusters
- Sequentially draw a sample I_i at random out of its cluster and reassign it to the cluster C s.t. I(C;M) (and thus I(C;M)/I(I;M)) is maximized (see the sketch below)
- I(C;M) grows monotonically ⇒ the algorithm converges
27. Experimental Results
28. Original Dataset
29. Division into 2 Clusters
30. Division into 3 Clusters
31. Division into 4 Clusters
32. Division into 5 Clusters
33. Division into 6 Clusters
34-37. Experimental Results (cont.)
38. Summary
- We took a set of still images
- The images were jointly segmented into soft mixtures of a small number of models global to the whole set
- The measure of similarity between different regions of an image was based on wavelet coefficient statistics
- Finally, we used the parallel between words/documents and segments/images co-occurrences to apply the sIB algorithm and classify the images by their content similarity
39. Discussion - Future Work
- Other statistics for modeling: color, other wavelets, beamlets, curvelets
- More sophisticated statistics of segments in images at the classification step
- Applications in other fields of research (top-down segmentation and subsequent classification of protein sequences)