Title: Machine Learning
1) Machine Learning: Category Representation
Cordelia Schmid, Jakob Verbeek
- Bag of words image representation
- Fisher Kernels and Topic Models
- Part of slides taken from Fei-Fei Li
2) Overview
- Recap bag-of-words representation
- Graphical models, naïve Bayes
- Topic Models for texts and images
- PLSA
- LDA
- Fisher vector image representation
3) Overview of relevant work
- Early bag-of-words models: mostly texture recognition
  - Cula & Dana, 2001; Leung & Malik, 2001; Mori, Belongie & Malik, 2001; Schmid, 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003
- Topic models for documents (pLSA, LDA, etc.)
  - Hofmann, 1999; Blei, Ng & Jordan, 2004; Teh, Jordan, Beal & Blei, 2004
- Object categorization
  - Csurka, Bray, Dance & Fan, 2004; Sivic, Russell, Efros, Freeman & Zisserman, 2005; Sudderth, Torralba, Freeman & Willsky, 2005
- Natural scene categorization
  - Vogel & Schiele, 2004; Fei-Fei & Perona, 2005; Bosch, Zisserman & Munoz, 2006
5) Analogy to documents
Of all the sensory impressions proceeding to the
brain, the visual experiences are the dominant
ones. Our perception of the world around us is
based essentially on the messages that reach the
brain from our eyes. For a long time it was
thought that the retinal image was transmitted
point by point to visual centers in the brain;
the cerebral cortex was a movie screen, so to
speak, upon which the image in the eye was
projected. Through the discoveries of Hubel and
Wiesel we now know that behind the origin of the
visual perception in the brain there is a
considerably more complicated course of events.
By following the visual impulses along their path
to the various cell layers of the optical cortex,
Hubel and Wiesel have been able to demonstrate
that the message about the image falling on the
retina undergoes a step-wise analysis in a system
of nerve cells stored in columns. In this system
each cell has its specific function and is
responsible for a specific detail in the pattern
of the retinal image.
6) A clarification: definition of BoW
- Looser definition
- Independent features
7) A clarification: definition of BoW
- Looser definition
- Independent features
- Stricter definition
- Independent features
- histogram representation
9) Representation
[Figure: the three pipeline steps: 1. feature detection and representation, 2. codewords dictionary formation, 3. image representation]
10) 1. Feature detection and representation
11) 1. Feature detection and representation
- Regular grid
  - Vogel & Schiele, 2003
  - Fei-Fei & Perona, 2005
12) 1. Feature detection and representation
- Regular grid
  - Vogel & Schiele, 2003
  - Fei-Fei & Perona, 2005
- Interest point detector
  - Csurka et al., 2004
  - Fei-Fei & Perona, 2005
  - Sivic et al., 2005
131.Feature detection and representation
Compute SIFT descriptor Lowe99
Normalize patch
Detect patches Mikojaczyk and Schmid 02 Mata,
Chum, Urban Pajdla, 02 Sivic Zisserman,
03
Slide credit Josef Sivic
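As a concrete illustration of the detect / normalize / describe step above, here is a minimal sketch using OpenCV's SIFT implementation; OpenCV is an assumption of this sketch, not something the slides prescribe.

```python
# Minimal sketch: interest point detection + SIFT description with OpenCV.
import cv2

def extract_sift(image_path):
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()                       # detector + descriptor
    keypoints, descriptors = sift.detectAndCompute(img, None)
    # descriptors: (num_patches, 128) array, one SIFT vector per detected patch
    return keypoints, descriptors
```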
14) 1. Feature detection and representation
15) 2. Codewords dictionary formation
16) 2. Codewords dictionary formation
- Quantization, e.g.
  - K-means (a minimal sketch follows below)
  - Mixture of Gaussians
Slide credit: Josef Sivic
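A minimal sketch of the k-means variant of this step, assuming scikit-learn's KMeans; the number of codewords and all variable names are illustrative choices, not values from the lecture.

```python
# Build a visual codebook by k-means over pooled training descriptors.
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(all_descriptors, num_words=1000):
    """all_descriptors: (total_patches, 128) SIFT descriptors pooled over
    the training images. Returns the (num_words, 128) codeword centers."""
    kmeans = KMeans(n_clusters=num_words, n_init=4, random_state=0)
    kmeans.fit(all_descriptors)
    return kmeans.cluster_centers_
```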
17) 2. Codewords dictionary formation
Fei-Fei et al. 2005
18) Image patch examples of codewords
Sivic et al. 2005
19) 3. Image representation
[Figure: histogram of codeword frequencies (y-axis: frequency, x-axis: codewords)]
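A minimal sketch of how such a histogram can be computed from one image's descriptors and the codebook built above (nearest-codeword assignment; all names are illustrative):

```python
# Bag-of-words histogram: frequency of each codeword in one image.
import numpy as np

def bow_histogram(descriptors, codebook):
    """descriptors: (N, D) patch descriptors of one image; codebook: (K, D) centers."""
    # squared distance between every descriptor and every codeword
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    assignments = d2.argmin(axis=1)                     # nearest codeword per patch
    hist = np.bincount(assignments, minlength=len(codebook)).astype(float)
    return hist / hist.sum()                            # normalized frequencies
```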
20) Representation
[Figure: recap of the three pipeline steps]
21) Learning and Recognition
category models (and/or) classifiers
22) Learning and Recognition
- Generative methods
  - graphical models
- Discriminative methods
  - e.g. SVM
category models (and/or) classifiers
23) Overview
- Recap bag-of-words representation
- Graphical models, naïve Bayes
- Topic Models for texts and images
- PLSA
- LDA
- Fisher vector image representation
24) Slide from C. Guestrin
25) Example Bayesian Network
slide from K. Murphy
26) Slide from K. Murphy
27) Plate notation for repetitions
slide from M. Jordan
28) Mixture of Gaussians example
- Independent, identically distributed data (the corresponding likelihood is written out below)
- X1, X2, X3, ..., XN
- Plate notation: n = 1, ..., N
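For reference, the i.i.d. assumption corresponds to the following likelihood under a K-component Gaussian mixture (standard form, reconstructed rather than copied from the slide):

```latex
p(X) \;=\; \prod_{n=1}^{N} p(x_n)
      \;=\; \prod_{n=1}^{N} \sum_{k=1}^{K} \pi_k \,\mathcal{N}(x_n \mid \mu_k, \Sigma_k),
\qquad \sum_{k=1}^{K} \pi_k = 1 .
```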
29) 2 generative models
- Naïve Bayes classifier
  - Csurka, Bray, Dance & Fan, 2004
- Topic models (pLSA and LDA)
  - Background: Hofmann 2001; Blei, Ng & Jordan, 2004
  - Object categorization: Sivic et al. 2005; Sudderth et al. 2005
  - Natural scene categorization: Fei-Fei et al. 2005
30) First, some notation
- w_n: the n-th patch in an image
- w: the collection of all N patches in an image, w = {w_1, w_2, ..., w_N}
- d_j: the j-th image in an image collection
- c: category of the image
- z: theme or topic of the patch
31) Case 1: the Naïve Bayes model
[Graphical model: class node c with an arrow to word node w, inside a plate over the N patches]
- Object class decision
- Prior prob. of the object classes
- Image likelihood given the class
(the full decision rule is written out below)
Csurka et al. 2004
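Written out with the notation of slide 30, the three labelled quantities combine into the usual Naïve Bayes decision rule (standard form, reconstructed here rather than copied from the slide):

```latex
p(c \mid \mathbf{w}) \;\propto\; p(c)\,p(\mathbf{w}\mid c)
  \;=\; p(c)\prod_{n=1}^{N} p(w_n \mid c),
\qquad
c^{*} \;=\; \arg\max_{c}\; p(c)\prod_{n=1}^{N} p(w_n \mid c).
```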
32) Csurka et al. 2004
33) Csurka et al. 2004
34) Overview
- Recap bag-of-words representation
- Graphical models, naive Bayes
- Topic Models for texts and images
- PLSA
- LDA
- Fisher vector image representation
35) Topic Models for texts & images
- Words no longer independent
- Seeing some words, you can guess topics to
predict other words
36) Topic Models for texts & images
- Document is mixture of (visual) topics
- Each document has its own special mix
- All documents mix the same set of topics
37) Hierarchical Bayesian text models
- Probabilistic Latent Semantic Analysis (pLSA): Hofmann, 2001
- Latent Dirichlet Allocation (LDA): Blei et al., 2001
38) Case 2: Hierarchical Bayesian text models
Probabilistic Latent Semantic Analysis (pLSA)
Sivic et al. ICCV 2005
39) Case 2: the pLSA model
40) Probabilistic Latent Semantic Analysis
- w: (visual) words
- z: topics
- d: documents / images
- P(z|d): document-specific topic mix
- P(w|z): topic-specific distribution over words
- To sample a word:
  - sample a topic from P(z|d)
  - sample a word from that topic using P(w|z)
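The sampling procedure above is equivalent to the standard pLSA decomposition of the word distribution in a document:

```latex
P(w \mid d) \;=\; \sum_{z} P(w \mid z)\, P(z \mid d),
\qquad
P(d, w) \;=\; P(d)\sum_{z} P(w \mid z)\, P(z \mid d).
```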
41) Case 2: the pLSA model
Slide credit: Josef Sivic
42) Topic Model Image Representation
43) EM algorithm for pLSA (1/2)
- Topic of each patch is unknown
- EM algorithm for maximum likelihood estimation
  - E: estimate the topic of each patch
  - M: update the distributions
    - P(z|d): distribution over topics given the document
    - P(w|z): distribution over words given the topic
- Data log-likelihood for a single document
44) EM algorithm for pLSA (2/2)
- E-step: distribution over the hidden variables
  - Image-wide context steers the interpretation of each patch
- M-step: maximize the expected joint log-likelihood
45) EM algorithm for pLSA (2/2)
- E-step: distribution over the hidden variables
- M-step: maximize the expected joint log-likelihood (see the sketch below)
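The sketch below shows one way to implement these E- and M-steps on a document-word count matrix; it uses a dense (docs x topics x words) responsibility array for readability and is meant as an illustration of the updates, not an efficient or faithful reproduction of the implementations cited in the slides.

```python
# Illustrative EM for pLSA on a count matrix n_dw of shape (docs, words).
import numpy as np

def plsa_em(n_dw, num_topics, num_iters=50, seed=0):
    rng = np.random.default_rng(seed)
    D, W = n_dw.shape
    p_w_z = rng.random((num_topics, W)); p_w_z /= p_w_z.sum(axis=1, keepdims=True)  # P(w|z)
    p_z_d = rng.random((D, num_topics)); p_z_d /= p_z_d.sum(axis=1, keepdims=True)  # P(z|d)
    for _ in range(num_iters):
        # E-step: responsibilities P(z|d,w) proportional to P(w|z) * P(z|d)
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]           # shape (D, K, W)
        resp = joint / joint.sum(axis=1, keepdims=True)
        # M-step: re-estimate both distributions from expected counts
        expected = n_dw[:, None, :] * resp                      # n(d,w) * P(z|d,w)
        p_w_z = expected.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True)
        p_z_d = expected.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    # data log-likelihood: sum_d sum_w n(d,w) log sum_z P(w|z) P(z|d)
    log_lik = (n_dw * np.log(p_z_d @ p_w_z + 1e-12)).sum()
    return p_w_z, p_z_d, log_lik
```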
46) pLSA as dimension reduction
- pLSA makes the image representation more compact
- Can work better when using few training images
- pLSA groups words that appear in similar contexts
47) Using Multiple Segmentations to Discover Objects and their Extent in Image Collections, Russell et al., CVPR 2006
- Multiple segmentations of each image, all imperfect
- Some segments are pure, others mix categories
- Goal: find the pure segments (the objects)
- pLSA: each segment is treated as a document
- Sort segments by how well they fit a topic, p(words|topic)
48) Best segments of 4 topics in street scenes
- Scene: buildings, cars, road, trees, sky
49) pLSA vs. Latent Dirichlet Allocation
- How about a new image?
  - What is its likelihood under the model?
- pLSA doesn't define what p(z|d) can look like
- LDA does specify a density over p(z|d)
  - A density over discrete distributions
50) Case 2: Hierarchical Bayesian text models
Latent Dirichlet Allocation (LDA)
Fei-Fei et al. ICCV 2005
51) Latent Dirichlet Allocation
- Image belongs to a class c
- Class defines a typical mix of scene topics
- Topic defines a distribution over visual words
- Unknown/hidden variables:
  - Image: topic mix theta
  - Patches: corresponding topic z
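In the standard LDA formulation these hidden variables enter the model as follows (Blei et al.'s notation, reconstructed here; in the scene-classification variant the class c roughly selects the Dirichlet parameter):

```latex
\theta \sim \mathrm{Dir}(\alpha), \qquad
z_n \mid \theta \sim \mathrm{Mult}(\theta), \qquad
w_n \mid z_n \sim \mathrm{Mult}(\beta_{z_n}),
```
```latex
p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)
  \;=\; p(\theta \mid \alpha)\prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta).
```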
52) Dirichlet density over distributions
- Parameter alpha controls sparsity
- Given samples from the multinomial, what is the conditional distribution on it?
  - Again a Dirichlet; only the counts matter
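For completeness, the Dirichlet density over a discrete distribution theta = (theta_1, ..., theta_K), and the conjugacy with multinomial counts alluded to above (standard formulas, not copied from the slide):

```latex
p(\theta \mid \alpha) \;=\;
  \frac{\Gamma\!\big(\textstyle\sum_k \alpha_k\big)}{\prod_k \Gamma(\alpha_k)}
  \prod_{k=1}^{K} \theta_k^{\alpha_k - 1},
\qquad
p(\theta \mid \alpha, n_1, \dots, n_K) \;=\; \mathrm{Dir}(\alpha_1 + n_1, \dots, \alpha_K + n_K).
```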
53) LDA parameter estimation (1/2)
- Joint likelihood of words, topics, and mixing weights
- Marginal likelihood of the observed words
- EM algorithm to learn alpha and beta (the topics)
- Unobserved variables: mixing weights, topics
54) LDA parameter estimation (2/2)
- Conditional on topics and mixing weights
  - Intractable to compute
  - Approximate with a product of two distributions
- Variational EM algorithm
  - E-step 1: update q(z)
  - E-step 2: update q(theta)
  - M-step: update the parameters
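In the standard variational treatment of LDA, the "product of two distributions" is a Dirichlet factor for the mixing weights and independent multinomial factors for the patch topics; sketched in the usual notation (gamma and phi are the variational parameters updated in the two E-steps):

```latex
p(\theta, \mathbf{z} \mid \mathbf{w}, \alpha, \beta)
  \;\approx\; q(\theta, \mathbf{z})
  \;=\; q(\theta \mid \gamma)\,\prod_{n=1}^{N} q(z_n \mid \phi_n),
```

with q(theta|gamma) a Dirichlet and each q(z_n|phi_n) a multinomial.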
55) Variational Inference
- Inference
  - Estimation of hidden variables given the data
- Approximate inference
  - Used when exact inference is intractable
  - Comes in many flavors
- Variational inference
  - Approximate a complicated distribution with a simpler one
56) Variational EM algorithm
- M-step as usual
- Approximate E-step
  - Most often by ignoring some of the dependencies
- Remember the bound-optimization picture
  - Here we have a sub-optimal E-step
  - The bound is no longer tight for the current parameters
60) Summary
- Generative models for bag-of-words image representations
- Naïve Bayes model
  - Poor model: assumes all (visual) words are independent
  - Reasonable for classification, worse for scene interpretation
- Topic models: pLSA, LDA
  - Model the image as a mix of scene elements
  - pLSA is not a complete generative model
  - LDA fixes this, but parameter estimation is more difficult
61) Overview
- Recap bag-of-words representation
- Graphical models, naïve Bayes
- Topic Models for texts and images
- PLSA
- LDA
- Fisher vector image representation
62) 3. Image representation
[Figure: histogram of codeword frequencies, as on slide 19]
63) Fisher Vector motivation
- Feature vector quantization is expensive: in practice linear in
  - N: nr. of feature vectors, ~10^3 per image
  - D: nr. of dimensions, ~10^2 (SIFT)
  - K: nr. of clusters, ~10^4 for recognition
- Looking at ~10^9 multiplications per image
- How to do this more efficiently?!
- See "Fisher Kernels on Visual Vocabularies for Image Categorization", F. Perronnin and C. Dance, CVPR'07
64) Fisher Vector intuition (1/2)
- K-means or Mixture of Gaussians partitions the feature space
- Bag-of-words histogram stores the nr. of features of each type (cluster)
  - Just the count of points in each cell!
- Trade-off in the nr. of clusters
  - Many clusters: richer representation, higher cost
  - Few clusters: coarser representation, lower cost
65) Fisher Vector idea
- Generative probabilistic model of the data, p(x)
- Represent the signal with the derivative of the log-likelihood
- Use a Mixture of Gaussians (MoG) as the model
- Use the gradient for further processing, such as
  - Classification
  - Clustering
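Concretely, for a model p(x|lambda) with parameters lambda and a set of descriptors X = {x_1, ..., x_T}, the (unnormalized) Fisher vector is the gradient of the log-likelihood; Perronnin & Dance additionally normalize by (an approximation of) the Fisher information, which is omitted here:

```latex
G_{\lambda}^{X} \;=\; \nabla_{\lambda} \log p(X \mid \lambda)
  \;=\; \sum_{t=1}^{T} \nabla_{\lambda} \log p(x_t \mid \lambda).
```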
66) Fisher Vector for MoG (1/2)
- Independent data points (descriptors) x
67) Fisher Vector for MoG (2/2)
- Derivatives for each MoG component, over the T patches (see the sketch below):
  - Number of points (softly) assigned to the component (1 dimension)
  - Their average w.r.t. the mean (D dimensions)
  - Their variation around the mean (D dimensions)
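A minimal sketch of these three per-component statistics for a diagonal-covariance MoG: soft counts, mean-centred first moments, and second moments. The exact scalings and Fisher-information normalization of Perronnin & Dance are left out, and all names are illustrative.

```python
# Sketch: Fisher-vector statistics for a diagonal-covariance MoG.
# X: (T, D) descriptors; pi: (K,) weights; mu, sigma: (K, D).
import numpy as np

def fisher_vector_stats(X, pi, mu, sigma):
    T, D = X.shape
    # log N(x_t | mu_k, diag(sigma_k^2)) for all t, k  -> shape (T, K)
    diff = (X[:, None, :] - mu[None, :, :]) / sigma[None, :, :]
    log_gauss = -0.5 * (diff ** 2).sum(axis=2) \
                - np.log(sigma).sum(axis=1)[None, :] \
                - 0.5 * D * np.log(2 * np.pi)
    log_post = np.log(pi)[None, :] + log_gauss
    log_post -= log_post.max(axis=1, keepdims=True)         # numerical stability
    gamma = np.exp(log_post)
    gamma /= gamma.sum(axis=1, keepdims=True)                # soft assignments (T, K)

    soft_count = gamma.sum(axis=0)                            # 1 value per component
    first = (gamma[:, :, None] * diff).sum(axis=0)            # (K, D): average w.r.t. the mean
    second = (gamma[:, :, None] * (diff ** 2 - 1.0)).sum(axis=0)  # (K, D): variation around it
    # concatenate into one (2D + 1) x K dimensional vector
    return np.concatenate([soft_count, first.ravel(), second.ravel()])
```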
68) Fisher Vector intuition
- MoG / k-means stores the nr. of points per cell
  - Many cells needed to represent the distribution in feature space
- Fisher vector adds 1st & 2nd order moments
  - More precise description of each cell's contents
  - Fewer cells needed
69) Fisher vector size and speed
- Much richer representation at the same cost!
- Size of the Fisher vector: (2D + 1) x N
- Size of the Bag of Visterms: N
- Time to compute either descriptor: N x D
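As a worked example with values chosen here purely for illustration (not taken from the slides): with D = 64-dimensional descriptors and N = 256 Gaussians, the Fisher vector has (2 x 64 + 1) x 256 = 33,024 dimensions, whereas a bag-of-visterms histogram over the same 256 centers has only 256 bins; in both cases each descriptor is compared against the 256 components, i.e. roughly 256 x 64 multiplications per descriptor.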
70) Example images from the categorization task of the PASCAL Visual Object Classes Challenge
71) Fisher Vector results
- 19-class data set: beach, bicycling, birds, boating, cats, clouds/sky, desert, dogs, flowers, golf, motorsports, mountains, people, sunrise/sunset, surfing, underwater, urban, waterfalls and wintersports
- Training: 30k images, 5k for test
- Similar performance using 16x fewer Gaussians
- Unsupervised/universal representation works well
72) Overview
- Graphical models, naïve Bayes
- Topic Models for texts and images
  - pLSA: probabilistic latent semantic analysis
  - LDA: latent Dirichlet allocation
- Fisher vector image representation
- Next lecture: classification methods
  - Linear methods: SVM & logistic discriminant
  - The kernel trick for non-linear features