Title: Machine Learning
1) Machine Learning: Category Representation
Cordelia Schmid, Jakob Verbeek
- Bag of words image representation
- Fisher Kernels and Topic Models
- Part of slides taken from Fei-Fei Li
2) Overview
- Recap bag-of-words representation
- Graphical models, naïve Bayes
- Topic Models for texts and images
- PLSA
- LDA
- Fisher vector image representation
3) Overview of relevant work
- Early bag-of-words models: mostly texture recognition
  - Cula & Dana, 2001; Leung & Malik, 2001; Mori, Belongie & Malik, 2001; Schmid, 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003
- Topic models for documents (pLSA, LDA, etc.)
  - Hofmann, 1999; Blei, Ng & Jordan, 2004; Teh, Jordan, Beal & Blei, 2004
- Object categorization
  - Csurka, Bray, Dance & Fan, 2004; Sivic, Russell, Efros, Freeman & Zisserman, 2005; Sudderth, Torralba, Freeman & Willsky, 2005
- Natural scene categorization
  - Vogel & Schiele, 2004; Fei-Fei & Perona, 2005; Bosch, Zisserman & Munoz, 2006
5) Analogy to documents
Of all the sensory impressions proceeding to the
brain, the visual experiences are the dominant
ones. Our perception of the world around us is
based essentially on the messages that reach the
brain from our eyes. For a long time it was
thought that the retinal image was transmitted
point by point to visual centers in the brain;
the cerebral cortex was a movie screen, so to
speak, upon which the image in the eye was
projected. Through the discoveries of Hubel and
Wiesel we now know that behind the origin of the
visual perception in the brain there is a
considerably more complicated course of events.
By following the visual impulses along their path
to the various cell layers of the optical cortex,
Hubel and Wiesel have been able to demonstrate
that the message about the image falling on the
retina undergoes a step-wise analysis in a system
of nerve cells stored in columns. In this system
each cell has its specific function and is
responsible for a specific detail in the pattern
of the retinal image.
6) A clarification: definition of BoW
- Looser definition
- Independent features
7) A clarification: definition of BoW
- Looser definition
- Independent features
- Stricter definition
- Independent features
- histogram representation
9) Representation
[Figure: the three pipeline steps: 1. feature detection and representation, 2. codewords dictionary formation, 3. image representation]
10) 1. Feature detection and representation
11) 1. Feature detection and representation
- Regular grid
  - Vogel & Schiele, 2003
  - Fei-Fei & Perona, 2005
12) 1. Feature detection and representation
- Regular grid
  - Vogel & Schiele, 2003
  - Fei-Fei & Perona, 2005
- Interest point detector
  - Csurka et al., 2004
  - Fei-Fei & Perona, 2005
  - Sivic et al., 2005
131.Feature detection and representation
Compute SIFT descriptor Lowe99
Normalize patch
Detect patches Mikojaczyk and Schmid 02 Mata,
Chum, Urban Pajdla, 02 Sivic Zisserman,
03
Slide credit Josef Sivic
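As a concrete illustration of the detect / normalize / describe step above, here is a minimal sketch using OpenCV's SIFT implementation; OpenCV is an assumption of this sketch, not something the slides prescribe.

```python
# Minimal sketch: interest point detection + SIFT description with OpenCV.
import cv2

def extract_sift(image_path):
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()                       # detector + descriptor
    keypoints, descriptors = sift.detectAndCompute(img, None)
    # descriptors: (num_patches, 128) array, one SIFT vector per detected patch
    return keypoints, descriptors
```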
14) 1. Feature detection and representation
15) 2. Codewords dictionary formation
16) 2. Codewords dictionary formation
- Quantization, e.g.
  - K-means (a minimal sketch follows below)
  - Mixture of Gaussians
Slide credit: Josef Sivic
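A minimal sketch of the k-means variant of this step, assuming scikit-learn's KMeans; the number of codewords and all variable names are illustrative choices, not values from the lecture.

```python
# Build a visual codebook by k-means over pooled training descriptors.
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(all_descriptors, num_words=1000):
    """all_descriptors: (total_patches, 128) SIFT descriptors pooled over
    the training images. Returns the (num_words, 128) codeword centers."""
    kmeans = KMeans(n_clusters=num_words, n_init=4, random_state=0)
    kmeans.fit(all_descriptors)
    return kmeans.cluster_centers_
```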
17) 2. Codewords dictionary formation
Fei-Fei et al. 2005
18) Image patch examples of codewords
Sivic et al. 2005
19) 3. Image representation
[Figure: histogram of codeword frequencies (y-axis: frequency, x-axis: codewords)]
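A minimal sketch of how such a histogram can be computed from one image's descriptors and the codebook built above (nearest-codeword assignment; all names are illustrative):

```python
# Bag-of-words histogram: frequency of each codeword in one image.
import numpy as np

def bow_histogram(descriptors, codebook):
    """descriptors: (N, D) patch descriptors of one image; codebook: (K, D) centers."""
    # squared distance between every descriptor and every codeword
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    assignments = d2.argmin(axis=1)                     # nearest codeword per patch
    hist = np.bincount(assignments, minlength=len(codebook)).astype(float)
    return hist / hist.sum()                            # normalized frequencies
```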
20) Representation
[Figure: recap of the three pipeline steps]
21) Learning and Recognition
category models (and/or) classifiers
22) Learning and Recognition
- Generative methods
  - graphical models
- Discriminative methods
  - e.g. SVM
category models (and/or) classifiers
23) Overview
- Recap bag-of-words representation
- Graphical models, naïve Bayes
- Topic Models for texts and images
- PLSA
- LDA
- Fisher vector image representation
24) Slide from C. Guestrin
25) Example Bayesian Network
slide from K. Murphy
26) Slide from K. Murphy
27) Plate notation for repetitions
slide from M. Jordan
28) Mixture of Gaussians example
- Independent, identically distributed data (the corresponding likelihood is written out below)
- X1, X2, X3, ..., XN
- Plate notation: n = 1, ..., N
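For reference, the i.i.d. assumption corresponds to the following likelihood under a K-component Gaussian mixture (standard form, reconstructed rather than copied from the slide):

```latex
p(X) \;=\; \prod_{n=1}^{N} p(x_n)
      \;=\; \prod_{n=1}^{N} \sum_{k=1}^{K} \pi_k \,\mathcal{N}(x_n \mid \mu_k, \Sigma_k),
\qquad \sum_{k=1}^{K} \pi_k = 1 .
```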
29) 2 generative models
- Naïve Bayes classifier
  - Csurka, Bray, Dance & Fan, 2004
- Topic models (pLSA and LDA)
  - Background: Hofmann 2001; Blei, Ng & Jordan, 2004
  - Object categorization: Sivic et al. 2005; Sudderth et al. 2005
  - Natural scene categorization: Fei-Fei et al. 2005
30) First, some notation
- w_n: the n-th patch in an image
- w: the collection of all N patches in an image, w = {w_1, w_2, ..., w_N}
- d_j: the j-th image in an image collection
- c: category of the image
- z: theme or topic of the patch
31) Case 1: the Naïve Bayes model
[Graphical model: class node c with an arrow to word node w, inside a plate over the N patches]
- Object class decision
- Prior prob. of the object classes
- Image likelihood given the class
(the full decision rule is written out below)
Csurka et al. 2004
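Written out with the notation of slide 30, the three labelled quantities combine into the usual Naïve Bayes decision rule (standard form, reconstructed here rather than copied from the slide):

```latex
p(c \mid \mathbf{w}) \;\propto\; p(c)\,p(\mathbf{w}\mid c)
  \;=\; p(c)\prod_{n=1}^{N} p(w_n \mid c),
\qquad
c^{*} \;=\; \arg\max_{c}\; p(c)\prod_{n=1}^{N} p(w_n \mid c).
```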
32) Csurka et al. 2004
33) Csurka et al. 2004
34) Overview
- Recap bag-of-words representation
- Graphical models, naive Bayes
- Topic Models for texts and images
- PLSA
- LDA
- Fisher vector image representation
35) Topic Models for texts & images
- Words no longer independent
- Seeing some words, you can guess topics to
predict other words
36) Topic Models for texts & images
- Document is mixture of (visual) topics
- Each document has its own special mix
- All documents mix the same set of topics
37) Hierarchical Bayesian text models
- Probabilistic Latent Semantic Analysis (pLSA): Hofmann, 2001
- Latent Dirichlet Allocation (LDA): Blei et al., 2001
38) Case 2: Hierarchical Bayesian text models
Probabilistic Latent Semantic Analysis (pLSA)
Sivic et al. ICCV 2005
39) Case 2: the pLSA model
40) Probabilistic Latent Semantic Analysis
- w: (visual) words
- z: topics
- d: documents / images
- P(z|d): document-specific topic mix
- P(w|z): topic-specific distribution over words
- To sample a word:
  - sample a topic from P(z|d)
  - sample a word from that topic using P(w|z)
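The sampling procedure above is equivalent to the standard pLSA decomposition of the word distribution in a document:

```latex
P(w \mid d) \;=\; \sum_{z} P(w \mid z)\, P(z \mid d),
\qquad
P(d, w) \;=\; P(d)\sum_{z} P(w \mid z)\, P(z \mid d).
```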
41) Case 2: the pLSA model
Slide credit: Josef Sivic
42) Topic Model Image Representation
43) EM algorithm for pLSA (1/2)
- Topic of each patch is unknown
- EM algorithm for maximum likelihood estimation
  - E: estimate the topic of each patch
  - M: update the distributions
    - P(z|d): distribution over topics given the document
    - P(w|z): distribution over words given the topic
- Data log-likelihood for a single document
44) EM algorithm for pLSA (2/2)
- E-step: distribution over the hidden variables
  - Image-wide context steers the interpretation of each patch
- M-step: maximize the expected joint log-likelihood
45) EM algorithm for pLSA (2/2)
- E-step: distribution over the hidden variables
- M-step: maximize the expected joint log-likelihood (see the sketch below)
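The sketch below shows one way to implement these E- and M-steps on a document-word count matrix; it uses a dense (docs x topics x words) responsibility array for readability and is meant as an illustration of the updates, not an efficient or faithful reproduction of the implementations cited in the slides.

```python
# Illustrative EM for pLSA on a count matrix n_dw of shape (docs, words).
import numpy as np

def plsa_em(n_dw, num_topics, num_iters=50, seed=0):
    rng = np.random.default_rng(seed)
    D, W = n_dw.shape
    p_w_z = rng.random((num_topics, W)); p_w_z /= p_w_z.sum(axis=1, keepdims=True)  # P(w|z)
    p_z_d = rng.random((D, num_topics)); p_z_d /= p_z_d.sum(axis=1, keepdims=True)  # P(z|d)
    for _ in range(num_iters):
        # E-step: responsibilities P(z|d,w) proportional to P(w|z) * P(z|d)
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]           # shape (D, K, W)
        resp = joint / joint.sum(axis=1, keepdims=True)
        # M-step: re-estimate both distributions from expected counts
        expected = n_dw[:, None, :] * resp                      # n(d,w) * P(z|d,w)
        p_w_z = expected.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True)
        p_z_d = expected.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    # data log-likelihood: sum_d sum_w n(d,w) log sum_z P(w|z) P(z|d)
    log_lik = (n_dw * np.log(p_z_d @ p_w_z + 1e-12)).sum()
    return p_w_z, p_z_d, log_lik
```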
46) pLSA as dimension reduction
- pLSA makes the image representation more compact
- Can work better when using few training images
- pLSA groups words that appear in similar contexts
47) Using Multiple Segmentations to Discover Objects and their Extent in Image Collections, Russell et al., CVPR 2006
- Multiple segmentations of each image, all imperfect
- Some segments are pure, others mix categories
- Goal: find the pure segments (the objects)
- pLSA: each segment is treated as a document
- Sort segments by how well they fit a topic, p(words|topic)
48) Best segments of 4 topics in street scenes
- Scene: buildings, cars, road, trees, sky
49) pLSA vs. Latent Dirichlet Allocation
- How about a new image?
  - What is its likelihood under the model?
- pLSA doesn't define what p(z|d) can look like
- LDA does specify a density over p(z|d)
  - A density over discrete distributions
50) Case 2: Hierarchical Bayesian text models
Latent Dirichlet Allocation (LDA)
Fei-Fei et al. ICCV 2005
51) Latent Dirichlet Allocation
- Image belongs to a class c
- Class defines a typical mix of scene topics
- Topic defines a distribution over visual words
- Unknown/hidden variables:
  - Image: topic mix theta
  - Patches: corresponding topic z
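In the standard LDA formulation these hidden variables enter the model as follows (Blei et al.'s notation, reconstructed here; in the scene-classification variant the class c roughly selects the Dirichlet parameter):

```latex
\theta \sim \mathrm{Dir}(\alpha), \qquad
z_n \mid \theta \sim \mathrm{Mult}(\theta), \qquad
w_n \mid z_n \sim \mathrm{Mult}(\beta_{z_n}),
```
```latex
p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)
  \;=\; p(\theta \mid \alpha)\prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta).
```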
52) Dirichlet density over distributions
- Parameter alpha controls sparsity
- Given samples from the multinomial, what is the conditional distribution on it?
  - Again a Dirichlet; only the counts matter
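For completeness, the Dirichlet density over a discrete distribution theta = (theta_1, ..., theta_K), and the conjugacy with multinomial counts alluded to above (standard formulas, not copied from the slide):

```latex
p(\theta \mid \alpha) \;=\;
  \frac{\Gamma\!\big(\textstyle\sum_k \alpha_k\big)}{\prod_k \Gamma(\alpha_k)}
  \prod_{k=1}^{K} \theta_k^{\alpha_k - 1},
\qquad
p(\theta \mid \alpha, n_1, \dots, n_K) \;=\; \mathrm{Dir}(\alpha_1 + n_1, \dots, \alpha_K + n_K).
```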
53) LDA parameter estimation (1/2)
- Joint likelihood of words, topics, and mixing weights
- Marginal likelihood of the observed words
- EM algorithm to learn alpha and beta (the topics)
- Unobserved variables: mixing weights, topics
54) LDA parameter estimation (2/2)
- Conditional on topics and mixing weights
  - Intractable to compute
  - Approximate with a product of two distributions
- Variational EM algorithm
  - E-step 1: update q(z)
  - E-step 2: update q(theta)
  - M-step: update the parameters
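In the standard variational treatment of LDA, the "product of two distributions" is a Dirichlet factor for the mixing weights and independent multinomial factors for the patch topics; sketched in the usual notation (gamma and phi are the variational parameters updated in the two E-steps):

```latex
p(\theta, \mathbf{z} \mid \mathbf{w}, \alpha, \beta)
  \;\approx\; q(\theta, \mathbf{z})
  \;=\; q(\theta \mid \gamma)\,\prod_{n=1}^{N} q(z_n \mid \phi_n),
```

with q(theta|gamma) a Dirichlet and each q(z_n|phi_n) a multinomial.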
55) Variational Inference
- Inference
  - Estimation of hidden variables given the data
- Approximate inference
  - Used when exact inference is intractable
  - Comes in many flavors
- Variational inference
  - Approximate a complicated distribution with a simpler one
56) Variational EM algorithm
- M-step as usual
- Approximate E-step
  - Most often by ignoring some of the dependencies
- Remember the bound-optimization picture
  - Here we have a sub-optimal E-step
  - The bound is no longer tight for the current parameters
60) Summary
- Generative models for bag-of-words image representations
- Naïve Bayes model
  - Poor model: assumes all (visual) words are independent
  - Reasonable for classification, worse for scene interpretation
- Topic models: pLSA, LDA
  - Model the image as a mix of scene elements
  - pLSA is not a complete generative model
  - LDA fixes this, but parameter estimation is more difficult
61) Overview
- Recap bag-of-words representation
- Graphical models, naïve Bayes
- Topic Models for texts and images
- PLSA
- LDA
- Fisher vector image representation
62) 3. Image representation
[Figure: histogram of codeword frequencies, as on slide 19]
63) Fisher Vector motivation
- Feature vector quantization is expensive: in practice linear in
  - N: nr. of feature vectors, ~10^3 per image
  - D: nr. of dimensions, ~10^2 (SIFT)
  - K: nr. of clusters, ~10^4 for recognition
- Looking at ~10^9 multiplications per image
- How to do this more efficiently?!
- See "Fisher Kernels on Visual Vocabularies for Image Categorization", F. Perronnin and C. Dance, CVPR'07
64) Fisher Vector intuition (1/2)
- K-means or Mixture of Gaussians partitions the feature space
- Bag-of-words histogram stores the nr. of features of each type (cluster)
  - Just the count of points in each cell!
- Trade-off in the nr. of clusters
  - Many clusters: richer representation, higher cost
  - Few clusters: coarser representation, lower cost
65) Fisher Vector idea
- Generative probabilistic model of the data, p(x)
- Represent the signal with the derivative of the log-likelihood
- Use a Mixture of Gaussians (MoG) as the model
- Use the gradient for further processing, such as
  - Classification
  - Clustering
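Concretely, for a model p(x|lambda) with parameters lambda and a set of descriptors X = {x_1, ..., x_T}, the (unnormalized) Fisher vector is the gradient of the log-likelihood; Perronnin & Dance additionally normalize by (an approximation of) the Fisher information, which is omitted here:

```latex
G_{\lambda}^{X} \;=\; \nabla_{\lambda} \log p(X \mid \lambda)
  \;=\; \sum_{t=1}^{T} \nabla_{\lambda} \log p(x_t \mid \lambda).
```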
66) Fisher Vector for MoG (1/2)
- Independent data points (descriptors) x
67) Fisher Vector for MoG (2/2)
- Derivatives for each MoG component, over the T patches (see the sketch below):
  - Number of points (softly) assigned to the component (1 dimension)
  - Their average w.r.t. the mean (D dimensions)
  - Their variation around the mean (D dimensions)
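A minimal sketch of these three per-component statistics for a diagonal-covariance MoG: soft counts, mean-centred first moments, and second moments. The exact scalings and Fisher-information normalization of Perronnin & Dance are left out, and all names are illustrative.

```python
# Sketch: Fisher-vector statistics for a diagonal-covariance MoG.
# X: (T, D) descriptors; pi: (K,) weights; mu, sigma: (K, D).
import numpy as np

def fisher_vector_stats(X, pi, mu, sigma):
    T, D = X.shape
    # log N(x_t | mu_k, diag(sigma_k^2)) for all t, k  -> shape (T, K)
    diff = (X[:, None, :] - mu[None, :, :]) / sigma[None, :, :]
    log_gauss = -0.5 * (diff ** 2).sum(axis=2) \
                - np.log(sigma).sum(axis=1)[None, :] \
                - 0.5 * D * np.log(2 * np.pi)
    log_post = np.log(pi)[None, :] + log_gauss
    log_post -= log_post.max(axis=1, keepdims=True)         # numerical stability
    gamma = np.exp(log_post)
    gamma /= gamma.sum(axis=1, keepdims=True)                # soft assignments (T, K)

    soft_count = gamma.sum(axis=0)                            # 1 value per component
    first = (gamma[:, :, None] * diff).sum(axis=0)            # (K, D): average w.r.t. the mean
    second = (gamma[:, :, None] * (diff ** 2 - 1.0)).sum(axis=0)  # (K, D): variation around it
    # concatenate into one (2D + 1) x K dimensional vector
    return np.concatenate([soft_count, first.ravel(), second.ravel()])
```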
68) Fisher Vector intuition
- MoG / k-means stores the nr. of points per cell
  - Many cells needed to represent the distribution in feature space
- Fisher vector adds 1st & 2nd order moments
  - More precise description of each cell's contents
  - Fewer cells needed
69) Fisher vector size and speed
- Much richer representation at the same cost!
- Size of the Fisher vector: (2D + 1) x N
- Size of the Bag of Visterms: N
- Time to compute either descriptor: N x D
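As a worked example with values chosen here purely for illustration (not taken from the slides): with D = 64-dimensional descriptors and N = 256 Gaussians, the Fisher vector has (2 x 64 + 1) x 256 = 33,024 dimensions, whereas a bag-of-visterms histogram over the same 256 centers has only 256 bins; in both cases each descriptor is compared against the 256 components, i.e. roughly 256 x 64 multiplications per descriptor.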
70) Example images from the categorization task of the PASCAL Visual Object Classes Challenge
71) Fisher Vector results
- 19-class data set: beach, bicycling, birds, boating, cats, clouds/sky, desert, dogs, flowers, golf, motorsports, mountains, people, sunrise/sunset, surfing, underwater, urban, waterfalls and wintersports
- Training: 30k images, 5k for test
- Similar performance using 16x fewer Gaussians
- Unsupervised/universal representation works well
72) Overview
- Graphical models, naïve Bayes
- Topic Models for texts and images
  - pLSA: probabilistic latent semantic analysis
  - LDA: latent Dirichlet allocation
- Fisher vector image representation
- Next lecture: classification methods
  - Linear methods: SVM & logistic discriminant
  - The kernel trick for non-linear features