Transcript and Presenter's Notes

Title: Machine Learning


1
Machine Learning: Category Representation
Cordelia Schmid, Jakob Verbeek
  • Bag of words image representation
  • Fisher Kernels and Topic Models
  • Part of the slides taken from Fei-Fei Li

2
Overview
  • Recap bag-of-words representation
  • Graphical models, naïve Bayes
  • Topic Models for texts and images
  • PLSA
  • LDA
  • Fisher vector image representation

3
Overview of relevant work
  • Early bag-of-words models: mostly texture recognition
  • Cula & Dana, 2001; Leung & Malik, 2001; Mori, Belongie & Malik, 2001; Schmid, 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003
  • Topic models for documents (pLSA, LDA, etc.)
  • Hofmann, 1999; Blei, Ng & Jordan, 2004; Teh, Jordan, Beal & Blei, 2004
  • Object categorization
  • Csurka, Bray, Dance & Fan, 2004; Sivic, Russell, Efros, Freeman & Zisserman, 2005; Sudderth, Torralba, Freeman & Willsky, 2005
  • Natural scene categorization
  • Vogel & Schiele, 2004; Fei-Fei & Perona, 2005; Bosch, Zisserman & Munoz, 2006

4
(No Transcript)
5
Analogy to documents
Of all the sensory impressions proceeding to the
brain, the visual experiences are the dominant
ones. Our perception of the world around us is
based essentially on the messages that reach the
brain from our eyes. For a long time it was
thought that the retinal image was transmitted
point by point to visual centers in the brain;
the cerebral cortex was a movie screen, so to
speak, upon which the image in the eye was
projected. Through the discoveries of Hubel and
Wiesel we now know that behind the origin of the
visual perception in the brain there is a
considerably more complicated course of events.
By following the visual impulses along their path
to the various cell layers of the optical cortex,
Hubel and Wiesel have been able to demonstrate
that the message about the image falling on the
retina undergoes a step-wise analysis in a system
of nerve cells stored in columns. In this system
each cell has its specific function and is
responsible for a specific detail in the pattern
of the retinal image.
6
A clarification: definition of BoW
  • Looser definition
  • Independent features

7
A clarification: definition of BoW
  • Looser definition
  • Independent features
  • Stricter definition
  • Independent features
  • Histogram representation

8
(No Transcript)
9
Representation
[Figure: three-step pipeline: 1. feature detection and representation, 2. codewords dictionary formation, 3. image representation]
10
1. Feature detection and representation
11
1. Feature detection and representation
  • Regular grid
  • Vogel & Schiele, 2003
  • Fei-Fei & Perona, 2005

12
1. Feature detection and representation
  • Regular grid
  • Vogel & Schiele, 2003
  • Fei-Fei & Perona, 2005
  • Interest point detector
  • Csurka et al., 2004
  • Fei-Fei & Perona, 2005
  • Sivic et al., 2005

13
1. Feature detection and representation
Detect patches [Mikolajczyk & Schmid '02; Matas, Chum, Urban & Pajdla '02; Sivic & Zisserman '03]
Normalize patch
Compute SIFT descriptor [Lowe '99]
Slide credit: Josef Sivic
14
1. Feature detection and representation
15
2. Codewords dictionary formation
16
2. Codewords dictionary formation
  • Quantization, e.g. (see the sketch below):
  • K-means
  • Mixture of Gaussians

Slide credit: Josef Sivic
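
A minimal sketch of this quantization step in Python, assuming the local descriptors are stacked in a NumPy array of shape (N, D); kmeans_codebook and its parameters are illustrative names, not code from the cited papers:

    import numpy as np

    def kmeans_codebook(descriptors, k=100, iters=20, seed=0):
        """Build a codeword dictionary: k cluster centers over local descriptors."""
        rng = np.random.default_rng(seed)
        # Initialize the centers with k randomly chosen descriptors.
        centers = descriptors[rng.choice(len(descriptors), k, replace=False)].astype(float)
        for _ in range(iters):
            # Assign every descriptor to its nearest center (codeword).
            d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
            labels = d2.argmin(1)
            # Move each center to the mean of the descriptors assigned to it.
            for j in range(k):
                if np.any(labels == j):
                    centers[j] = descriptors[labels == j].mean(0)
        return centers

A mixture of Gaussians would instead be fit with EM, keeping soft assignments rather than hard ones.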
17
2. Codewords dictionary formation
Fei-Fei et al. 2005
18
Image patch examples of codewords
Sivic et al. 2005
19
3. Image representation
[Figure: bag-of-words histogram: frequency of each codeword in the image]
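
Continuing the previous sketch (same NumPy import, names again illustrative), step 3 maps an image's descriptors to a normalized histogram over codewords:

    def bow_histogram(descriptors, centers):
        """Assign each descriptor to its nearest codeword, then count frequencies."""
        d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(1)
        hist = np.bincount(labels, minlength=len(centers)).astype(float)
        return hist / hist.sum()  # normalized codeword frequencies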
20
Representation
[Figure: three-step pipeline: 1. feature detection and representation, 2. codewords dictionary formation, 3. image representation]
21
Learning and Recognition
category models (and/or) classifiers
22
Learning and Recognition
  • Generative methods
  • - graphical models
  • Discriminative methods
  • - e.g. SVM

category models (and/or) classifiers
23
Overview
  • Recap bag-of-words representation
  • Graphical models, naïve Bayes
  • Topic Models for texts and images
  • PLSA
  • LDA
  • Fisher vector image representation

24
slide from C. Guestrin
25
Example Bayesian Network
slide from K. Murphy
26
slide from K. Murphy
27
Plate notation for repetitions
slide from M. Jordan
28
Mixture of Gaussians example
  • Independent and identically distributed data
  • X1, X2, X3, ..., XN

[Figure: plate notation, node repeated for n = 1, ..., N]
29
Two generative models
  • Naïve Bayes classifier
  • Csurka, Bray, Dance & Fan, 2004
  • Topic models (pLSA and LDA)
  • Background: Hofmann, 2001; Blei, Ng & Jordan, 2004
  • Object categorization: Sivic et al., 2005; Sudderth et al., 2005
  • Natural scene categorization: Fei-Fei et al., 2005

30
First, some notation
  • wn: the n-th patch in an image
  • w: the collection of all N patches in an image, w = {w1, w2, ..., wN}
  • dj: the j-th image in an image collection
  • c: category of the image
  • z: theme or topic of a patch

31
Case 1: the Naïve Bayes model

[Graphical model: class node c pointing to word node w, plate repeated over the N patches]

Object class decision (written out below)
Prior prob. of the object classes: p(c)
Image likelihood given the class: p(w|c)
Csurka et al. 2004
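
The decision rule the diagram encodes, written out in the notation of slide 30 (a standard naïve Bayes derivation, added here for reference):

    c^* = \arg\max_c p(c \mid w) \propto p(c)\, p(w \mid c) = p(c) \prod_{n=1}^{N} p(w_n \mid c)

The class prior p(c) and the per-class word distributions p(wn | c) are estimated from the training images.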
32
Csurka et al. 2004
33
Csurka et al. 2004
34
Overview
  • Recap bag-of-words representation
  • Graphical models, naïve Bayes
  • Topic Models for texts and images
  • PLSA
  • LDA
  • Fisher vector image representation

35
Topic Models for texts & images
  • Words no longer independent
  • Seeing some words, you can guess topics to
    predict other words

36
Topic Models for texts & images
  • Document is mixture of (visual) topics
  • Each document has its own special mix
  • All documents mix the same set of topics

37
Hierarchical Bayesian text models
Probabilistic Latent Semantic Analysis (pLSA)
Hofmann, 2001
Latent Dirichlet Allocation (LDA)
Blei et al., 2001
38
Case 2: Hierarchical Bayesian text models
Probabilistic Latent Semantic Analysis (pLSA)
Sivic et al. ICCV 2005
39
Case 2: the pLSA model
40
Probabilistic Latent Semantic Analysis
  • w: (visual) words
  • z: topics
  • d: documents / images
  • P(z|d): document-specific topic mix
  • P(w|z): topic-specific distribution over words
  • To sample a word (a minimal sketch follows below):
  • Sample a topic from p(z|d)
  • Sample a word from that topic using p(w|z)
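
A sketch of this sampling procedure, with assumed arrays P_z_given_d of shape (num_docs, K) and P_w_given_z of shape (V, K) holding the two distributions (names illustrative):

    import numpy as np

    def sample_word(P_z_given_d, P_w_given_z, d, rng=None):
        """Generate one (visual) word for document d under the pLSA model."""
        if rng is None:
            rng = np.random.default_rng()
        z = rng.choice(P_z_given_d.shape[1], p=P_z_given_d[d])     # topic ~ p(z|d)
        w = rng.choice(P_w_given_z.shape[0], p=P_w_given_z[:, z])  # word  ~ p(w|z)
        return w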

41
Case 2: the pLSA model
Slide credit: Josef Sivic
42
Topic Model Image Representation
43
EM algorithm for PLSA (1/2)
  • Topic of each patch is unknown
  • EM algorithm for maximum likelihood estimation
  • E: estimate the topic of each patch
  • M: update the distributions
  • P(z|d): distribution over topics given the document
  • P(w|z): distribution over words given the topic
  • Data log-likelihood for a single document d: L_d = Σ_w n(w,d) log Σ_z P(w|z) P(z|d)

44
EM algorithm for PLSA (2/2)
  • E-step: distribution over the hidden variables
  • Image-wide context steers the interpretation of each patch
  • M-step: maximize the expected joint log-likelihood (a compact sketch follows below)
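
A compact sketch of this EM loop for a term-document count matrix, with assumed array names and shapes; it illustrates the two updates and is not code from the cited papers:

    import numpy as np

    def plsa_em(n, K=10, iters=50, seed=0):
        """pLSA via EM; n[w, d] = count of (visual) word w in document d."""
        rng = np.random.default_rng(seed)
        V, D = n.shape
        Pwz = rng.random((V, K)); Pwz /= Pwz.sum(0)   # p(w|z), columns sum to 1
        Pzd = rng.random((K, D)); Pzd /= Pzd.sum(0)   # p(z|d), columns sum to 1
        for _ in range(iters):
            # E-step: posterior p(z|d,w) proportional to p(w|z) p(z|d) for every (w, d).
            post = Pwz[:, :, None] * Pzd[None, :, :]  # (V, K, D); fine for a sketch, large for real corpora
            post /= post.sum(1, keepdims=True) + 1e-12
            # M-step: re-estimate both distributions from expected counts.
            nz = n[:, None, :] * post
            Pwz = nz.sum(2); Pwz /= Pwz.sum(0)
            Pzd = nz.sum(0); Pzd /= Pzd.sum(0)
        return Pwz, Pzd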

45
EM algorithm for PLSA (2/2)
  • E-step: distribution over the hidden variables
  • M-step: maximize the expected joint log-likelihood

46
PLSA as dimension reduction
  • PLSA makes the image representation more compact
  • Can work better when using few training images
  • PLSA groups words that appear in similar contexts

47
Using Multiple Segmentations to Discover Objects and their Extent in Image Collections, Russell et al., CVPR 2006
  • Multiple segmentations of each image, all imperfect
  • Some segments are pure, others mix categories
  • Goal: find the pure segments, i.e. the objects
  • PLSA: segment = document
  • Sort segments by how well they fit a topic, p(words|topic)

48
Best segments of 4 topics in street scenes
  • Scene: buildings, cars, road, trees, sky

49
PLSA vs Latent Dirichlet Allocation
  • How about a new image?
  • What is its likelihood under the model?
  • PLSA doesn't define what p(z|d) can look like
  • LDA does specify a density over p(z|d)
  • A density over discrete distributions

50
Case 2: Hierarchical Bayesian text models
Latent Dirichlet Allocation (LDA)
Fei-Fei et al. ICCV 2005
51
Latent Dirichlet Allocation
  • Image belongs to a class c
  • Class defines a typical mix of scene topics
  • Topic defines a distribution over visual words
  • Unknown/hidden variables:
  • Per image: the topic mix theta
  • Per patch: its topic z

52
Dirichlet density over distributions
  • The parameter alpha controls sparsity
  • Given samples from the multinomial, what is the conditional distribution on it?
  • Again a Dirichlet; only the counts matter (both facts written out below)
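
In standard form, for a topic mix theta on the K-simplex (a textbook identity, added for reference):

    \mathrm{Dir}(\theta \mid \alpha) = \frac{\Gamma(\sum_k \alpha_k)}{\prod_k \Gamma(\alpha_k)} \prod_{k=1}^{K} \theta_k^{\alpha_k - 1}

and after observing topic counts n_1, ..., n_K drawn from the multinomial theta, the conditional is again a Dirichlet:

    p(\theta \mid \alpha, n) = \mathrm{Dir}(\theta \mid \alpha_1 + n_1, \ldots, \alpha_K + n_K)

Sparsity: values alpha_k < 1 put most mass near the corners of the simplex, favoring sparse topic mixes.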

53
LDA parameter estimation (1/2)
  • Joint likelihood of words, topics, and mixing weights
  • Marginal likelihood of the observed words (both written out below)
  • EM algorithm to learn alpha and beta (the topics)
  • Unobserved variables: mixing weights, topics
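
In the notation of Blei et al., these two quantities are (standard LDA formulas):

    p(\theta, z, w \mid \alpha, \beta) = p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta)

    p(w \mid \alpha, \beta) = \int p(\theta \mid \alpha) \left[ \prod_{n=1}^{N} \sum_{z_n} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta) \right] d\theta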

54
LDA parameter estimation (2/2)
  • Posterior over topics and mixing weights
  • Intractable to compute
  • Approximate it with a product of two distributions
  • Variational EM algorithm:
  • E-step 1: update q(z)
  • E-step 2: update q(theta)
  • M-step: update the parameters

55
Variational Inference
  • Inference:
  • estimation of the hidden variables given the data
  • Approximate inference:
  • used when exact inference is impossible
  • comes in many flavors
  • Variational inference:
  • approximate the complicated distribution with a simpler one (for LDA, see below)
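
For LDA, the "simpler one" is the factorized (mean-field) family, fit by maximizing a lower bound on the log-likelihood (standard variational identities):

    p(\theta, z \mid w, \alpha, \beta) \approx q(\theta)\, q(z)

    \log p(w \mid \alpha, \beta) \ge \mathbb{E}_q[\log p(\theta, z, w \mid \alpha, \beta)] - \mathbb{E}_q[\log q(\theta)\, q(z)]

Slide 54's updates of q(z) and q(theta) fit exactly these two factors.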

56
Variational EM algorithm
  • M-step as usual
  • Approximate E-step:
  • most often by ignoring some of the dependencies
  • Remember the bound-optimization picture:
  • here we have a sub-optimal E-step
  • the bound is no longer tight for the current parameters

57
(No Transcript)
58
(No Transcript)
59
(No Transcript)
60
Summary
  • Generative models for bag-of-words image representations
  • Naïve Bayes model
  • Poor model: assumes all (visual) words are independent
  • Reasonable for classification, worse for scene interpretation
  • Topic models: PLSA and LDA
  • Model the image as a mix of scene elements
  • PLSA is not a complete generative model
  • LDA fixes this, but parameter estimation is more difficult

61
Overview
  • Recap bag-of-words representation
  • Graphical models, naïve Bayes
  • Topic Models for texts and images
  • PLSA
  • LDA
  • Fisher vector image representation

62
3. Image representation
[Figure: bag-of-words histogram: frequency of each codeword in the image]
63
Fisher Vector motivation
  • Feature vector quantization is expensive: in practice linear in
  • N: nr. of feature vectors, ~10^3 per image
  • D: nr. of dimensions, ~10^2 (SIFT)
  • K: nr. of clusters, ~10^4 for recognition
  • That is N x D x K ~ 10^9 multiplications per image
  • How to do this more efficiently?
  • See "Fisher Kernels on Visual Vocabularies for Image Categorization", F. Perronnin and C. Dance, CVPR'07

64
Fisher Vector intuition 1/2
  • K-means or Mixture of Gaussians partitions the feature space
  • The bag-of-words histogram stores the nr. of features of each type (cluster)
  • Just the count of points in each cell!
  • Trade-off in the nr. of clusters:
  • many: representation +, cost +
  • few: representation -, cost -

65
Fisher Vector Idea
  • Generative probabilistic model of the data, p(x)
  • Represent the signal by the derivative of its log-likelihood (written out below)
  • Use a Mixture of Gaussians (MoG) as the model
  • Use the gradient for further processing, such as:
  • classification
  • clustering
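
In symbols, for an image's descriptors X = {x_1, ..., x_T} and model parameters lambda (the standard Fisher-score form):

    G^X_\lambda = \nabla_\lambda \log p(X \mid \lambda) = \sum_{t=1}^{T} \nabla_\lambda \log p(x_t \mid \lambda)

where the sum over patches uses the independence assumption of the next slide.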

66
Fisher Vector for MoG 1/2
  • Independent data points (descriptors) x

67
Fisher Vector for MoG 2/2
  • Derivatives w.r.t. the parameters of each MoG component, over the T patches:
  • number of points (softly) assigned to the component: 1 value
  • their average deviation w.r.t. the mean: D values
  • their variation around the mean: D values
  • (a sketch of all three blocks follows below)
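
A sketch of these three blocks for a diagonal-covariance MoG, following the general form of Perronnin & Dance '07 but omitting their Fisher-information normalization; pi (K,), mu (K, D) and var (K, D) are assumed parameter arrays:

    import numpy as np

    def fisher_vector(X, pi, mu, var):
        """X: (T, D) descriptors; returns the (2D+1)*K concatenated gradient statistics."""
        # Soft assignments q[t, k] = p(component k | x_t) under the diagonal MoG.
        logp = -0.5 * (((X[:, None, :] - mu) ** 2) / var + np.log(2 * np.pi * var)).sum(-1)
        logp += np.log(pi)
        q = np.exp(logp - logp.max(1, keepdims=True))
        q /= q.sum(1, keepdims=True)
        # Per-component blocks: count, mean deviation, variance deviation.
        counts = q.sum(0)                                   # (K,)   1 value per component
        diff = (X[:, None, :] - mu) / np.sqrt(var)          # (T, K, D) standardized deviations
        mean_dev = (q[:, :, None] * diff).sum(0)            # (K, D) D values per component
        var_dev = (q[:, :, None] * (diff ** 2 - 1)).sum(0)  # (K, D) D values per component
        return np.concatenate([counts, mean_dev.ravel(), var_dev.ravel()])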
68
Fisher Vector intuition 2/2
  • MoG / k-means stores only the nr. of points per cell
  • Many cells are needed to represent the distribution in feature space
  • The Fisher vector adds 1st and 2nd order moments:
  • more precise description of each cell's contents
  • fewer cells needed

69
Fisher vector size and speed
  • Much richer representation at the same cost!
  • Size of Fisher vector: (2 x D + 1) x K, for K Gaussians
  • Size of bag-of-visterms: K
  • Time to compute either descriptor is dominated by the N x D x K assignment step (illustrative arithmetic below)
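
Illustrative arithmetic under these formulas (the numbers are assumptions for illustration, not from the slides): with D = 64 and K = 16 Gaussians, the Fisher vector has (2 x 64 + 1) x 16 = 2064 dimensions; a bag-of-visterms of equal size would need K = 2064 clusters, i.e. roughly 129x more assignment cost, since the cost grows linearly in K.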

70
Example images from the categorization task of the PASCAL Visual Object Challenge
71
Fisher Vector results
  • 19-class data set: beach, bicycling, birds, boating, cats, clouds/sky, desert, dogs, flowers, golf, motorsports, mountains, people, sunrise/sunset, surfing, underwater, urban, waterfalls and wintersports
  • Training: 30k images, 5k for testing
  • Similar performance using 16x fewer Gaussians
  • Unsupervised/universal representation works well

72
Overview
  • Graphical models, naïve Bayes
  • Topic Models for texts and images
  • PLSA: probabilistic latent semantic analysis
  • LDA: latent Dirichlet allocation
  • Fisher vector image representation
  • Next lecture: classification methods
  • Linear methods: SVM, logistic discriminant
  • The kernel trick for non-linear features