Bag-of-words model and PLSA
David Liu, Feb 2006
Related work

- ICCV'05: Modeling Scenes with Local Descriptors and Latent Aspects. P. Quelhas, F. Monay, J.-M. Odobez, D. Gatica-Perez, T. Tuytelaars, and L. Van Gool (IDIAP, Katholieke Univ. Leuven)
- CVPR'05: A Bayesian Hierarchical Model for Learning Natural Scene Categories. L. Fei-Fei, P. Perona (UIUC, Caltech)
- ICCV'05: Discovering Objects and Their Location in Images. J. Sivic, B. C. Russell, A. A. Efros, A. Zisserman, W. T. Freeman (Oxford, CMU, MIT)
What do you see?
Outline
- Image Representation
- Learning
- Recognition
Slide from Li Fei-Fei
1. Feature detection and representation
Compute SIFT descriptor [Lowe '99]
Slide credit Josef Sivic
2. Codewords dictionary formation

[Figure: SIFT descriptors as points in R^128, clustered into codewords by vector quantization]
Slide credit: Josef Sivic
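The dictionary step above is essentially k-means vector quantization of the 128-D SIFT descriptors. A minimal sketch, with random vectors standing in for real SIFT output (`build_codebook` is a hypothetical name, not from the slides):

```python
import numpy as np

def build_codebook(descriptors, k, iters=20, seed=0):
    """Vector-quantize descriptors (n x 128) into k codewords via k-means."""
    rng = np.random.default_rng(seed)
    # Initialize codewords from k randomly chosen descriptors.
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(iters):
        # Assign each descriptor to its nearest codeword (Euclidean distance).
        d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(1)
        # Move each codeword to the mean of its assigned descriptors.
        for j in range(k):
            pts = descriptors[labels == j]
            if len(pts):
                centers[j] = pts.mean(0)
    return centers, labels

# Toy stand-in for SIFT descriptors: 200 random points in R^128.
rng = np.random.default_rng(1)
descs = rng.normal(size=(200, 128))
codebook, labels = build_codebook(descs, k=5)
print(codebook.shape)  # (5, 128)
```

Each codeword is a cluster center in R^128; a new descriptor is quantized to the index of its nearest center.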
Examples of codewords (Fei-Fei et al. 2005)
Examples of codewords (Sivic et al. 2005)
3. Image representation: bag of words

[Figure: word-count histogram n(w,d) over codewords w1, w2, w3]
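Once every descriptor in an image is mapped to its nearest codeword, the image reduces to the count vector n(w,d). A small sketch (the codeword indices are made up for illustration):

```python
from collections import Counter

def bag_of_words(codeword_ids, vocab_size):
    """Histogram of codeword occurrences: n(w, d) for one image/document d."""
    counts = Counter(codeword_ids)
    return [counts.get(w, 0) for w in range(vocab_size)]

# Hypothetical codeword assignments for one image, vocabulary of 4 codewords.
n_wd = bag_of_words([0, 2, 2, 1, 2, 0], vocab_size=4)
print(n_wd)  # [2, 1, 3, 0]
```

Stacking one such vector per image gives the word-by-document count matrix n(w,d) used in the learning stage.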
Outline
- Image Representation
- Learning
- Recognition
Slide from C. Guestrin
Slide from K. Murphy
Slide from K. Murphy
Slide from M. Jordan
- Also discuss d-separation, to make the conditional independence concept clear
Graphical representation of a mixture model

[Figure: graphical model with latent variable z and observed variable x; in plate notation for a Gaussian mixture model, latent z_n and observed x_n are repeated N times]
Slide credit: C. Bishop
Model: z_k is the topic of a word (z_1, z_2)

A document is a mixture of topics. A topic is a mixture of words.
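Written out, the slide's statement is the standard PLSA factorization (Hofmann 1999): each word in document d is generated by first drawing a topic z from P(z|d), then a word from P(w|z):

```latex
P(w \mid d) \;=\; \sum_{z} P(w \mid z)\, P(z \mid d)
```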
Model: z_k is the topic of a word

[Figure: documents d1, d2, d3 generating words w1, w2 through latent topics]
Model: z_k is the topic of a word

[Figure: PLSA graphical model over d, z, w, with plates of size N (words per document) and D (documents)]

Things that we have control over:
- Parameters: P(z|d), P(w|z)
- Data: n(w,d)
[Figure: toy example with documents d1..d4 and words w1..w3, showing the data n(w,d) and the parameters P(w|z), P(z|d)]
We now have a model well fitted to the data n(w,d): the learning stage is done. Next comes recognition.
[Figure: a new document d with its word counts n(w,d)]
For the new document d, keep P(w|z) fixed and estimate only P(z|d) from the counts n(w,d).
(skip) The quantity we can adjust is P(z|d); P(w|z) stays fixed.
Run EM on the new document's counts n(w,d), keeping P(w|z) fixed, to obtain P(z|d). The most likely topic for d is z_1: the image topic / category has been discovered.
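The recognition ("fold-in") step sketched above runs the same EM iterations, but with P(w|z) frozen, so only P(z|d) for the new document is re-estimated (standard PLSA fold-in, Hofmann 1999):

```latex
Q(z \mid d, w) = \frac{P(w \mid z)\, P(z \mid d)}{\sum_{z'} P(w \mid z')\, P(z' \mid d)},
\qquad
P(z \mid d) \;\propto\; \sum_{w} n(w, d)\, Q(z \mid d, w)
```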
Decide face vs. non-face: ROC curve
Slide from J. Sivic
Slide from J. Sivic
Slide from J. Sivic
Let's read the titles again

- ICCV'05: Modeling Scenes with Local Descriptors and Latent Aspects. P. Quelhas, F. Monay, J.-M. Odobez, D. Gatica-Perez, T. Tuytelaars, and L. Van Gool (IDIAP, Katholieke Univ. Leuven)
- CVPR'05: A Bayesian Hierarchical Model for Learning Natural Scene Categories. L. Fei-Fei, P. Perona (UIUC, Caltech)
- ICCV'05: Discovering Objects and Their Location in Images. J. Sivic, B. C. Russell, A. A. Efros, A. Zisserman, W. T. Freeman (Oxford, CMU, MIT)
Unsupervised object categorization! Unsupervised object detection?

Paradigms:
- Supervised: all training data labeled
- Semi-supervised: some labeled, some unlabeled
- Unsupervised: finds structure; no +/- labels
[Figure: toy documents d1..d4 with counts n(w,d) and parameters P(w|z), P(z|d)]

Clustering depends on (1) the features and (2) the data:

- Feature: if the feature is not robust to all kinds of deformations, then d1 and d2 will not have consistent features. SIFT is designed to be robust.
- Feature: if the feature is not good (similar to perception), then few features from the face region are captured (extreme case: all captured features come from the background).
- Data: if there is only one face image (only d2, d3, d4, no d1), then d2 is no different from d3 and d4, and clustering will not favor d2.
- Data: if there exist many, many background images similar to d3 or d4, then d1 and d2 will just appear as noise.
EM algorithm for PLSA

Data: n(w,d). Model: PLSA. Goal: find the parameters that maximize the likelihood of the data.
EM algorithm for PLSA

Optimizing with a Lagrange multiplier gives the M-step updates. But what is Q? Q is given by the E-step.
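The standard PLSA EM updates (Hofmann 1999) referred to here are:

```latex
\text{E-step:}\quad
Q(z \mid d, w) = \frac{P(z \mid d)\, P(w \mid z)}{\sum_{z'} P(z' \mid d)\, P(w \mid z')}
\qquad
\text{M-step:}\quad
P(w \mid z) \propto \sum_{d} n(w, d)\, Q(z \mid d, w),
\quad
P(z \mid d) \propto \sum_{w} n(w, d)\, Q(z \mid d, w)
```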
Fix P, find Q: the lower bound rises to meet the log-likelihood L. Fix Q, find P (via Lagrange multipliers on the normalization constraints): the lower bound increases, and therefore L increases as well.
EM algorithm for PLSA: alternate between updating Q (E-step) and updating P (M-step) until convergence.
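The alternating updates can be sketched compactly, assuming the counts are given as a word-by-document matrix (toy data below; not the deck's experiments):

```python
import numpy as np

def plsa(n_wd, K, iters=50, seed=0):
    """PLSA via EM. n_wd: (W, D) count matrix. Returns P(w|z), P(z|d)."""
    rng = np.random.default_rng(seed)
    W, D = n_wd.shape
    # Random positive initialization, normalized as distributions.
    p_w_z = rng.random((W, K)); p_w_z /= p_w_z.sum(0)   # P(w|z), columns sum to 1
    p_z_d = rng.random((K, D)); p_z_d /= p_z_d.sum(0)   # P(z|d), columns sum to 1
    for _ in range(iters):
        # E-step: Q(z|d,w) for every (w, d) pair, shape (W, K, D).
        q = p_w_z[:, :, None] * p_z_d[None, :, :]
        q /= q.sum(1, keepdims=True) + 1e-12
        # M-step: re-estimate parameters from expected counts n(w,d) Q(z|d,w).
        nq = n_wd[:, None, :] * q
        p_w_z = nq.sum(2); p_w_z /= p_w_z.sum(0) + 1e-12
        p_z_d = nq.sum(0); p_z_d /= p_z_d.sum(0) + 1e-12
    return p_w_z, p_z_d

# Toy corpus with two obvious topics: words 0-1 vs. words 2-3.
n_wd = np.array([[5, 4, 0, 0],
                 [4, 5, 0, 0],
                 [0, 0, 5, 4],
                 [0, 0, 4, 5]], float)
p_w_z, p_z_d = plsa(n_wd, K=2)
print(p_z_d.argmax(0))  # documents 0,1 share one topic; 2,3 share the other
```

On this clean block-structured data, EM separates the two topics, and P(z|d) assigns documents 0,1 to one topic and 2,3 to the other (which topic gets which index depends on the random initialization).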
Optimizing by EM yields the learned P(z|d) and P(w|z).
[Figure: the data n(w,d) approximated by the model P(w|d) = sum_z P(w|z) P(z|d)]
[Figure: two alternative factorizations of n(w,d) into P(z|d) and P(w|z)] In either case, d1 and d2 end up in one cluster and d3 in another.