Title: Generative Models for Image Understanding
1. Generative Models for Image Understanding
- Nebojsa Jojic and Thomas Huang
- Beckman Institute and ECE Dept.
- University of Illinois
2. Problem: Summarization of High-Dimensional Data
- Pattern analysis: for each of several classes c = 1, …, C of the data, define a probability distribution function p(x|c)
- Compression: define a probabilistic model p(x) and devise an optimal coding approach
- Video summary: drop most of the frames in a video sequence and keep the interesting information that summarizes it
3. Generative density modeling
- Find a probability model that
  - reflects the desired structure,
  - randomly generates plausible images,
  - represents the data by its parameters
- Parameters are fit by ML estimation
- p(image|class) is used for recognition, detection, ...
4. Problems we attacked
- Transformation as a discrete variable in generative models of intensity images
- Tracking articulated objects in dense stereo maps
- Unsupervised learning for video summary
- Idea: the structure of the generative model reveals the interesting objects we want to extract
5. Mixture of Gaussians
(Graphical model: class c → image z)
- P(c) = π_c
- The probability of pixel intensities z given that the image is from cluster c is p(z|c) = N(z; μ_c, Φ_c)
6. Mixture of Gaussians
(Graphical model: class c → image z)
- P(c) = π_c
- The parameters π_c, μ_c and Φ_c represent the data
- For an input z, the cluster responsibilities are P(c|z) = p(z|c) P(c) / Σ_c′ p(z|c′) P(c′), computed in the sketch below
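A minimal sketch of the responsibility computation, assuming diagonal covariances; the function and variable names (means, variances, priors) are illustrative, not from the slides:

import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(z, means, variances, priors):
    # P(c|z) = p(z|c) P(c) / sum_c' p(z|c') P(c'),
    # with p(z|c) = N(z; mu_c, Phi_c) and diagonal Phi_c.
    likelihood = np.array([
        multivariate_normal.pdf(z, mean=m, cov=np.diag(v))
        for m, v in zip(means, variances)
    ])
    joint = likelihood * np.asarray(priors)
    return joint / joint.sum()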
7. Example: Simulation
(Sample a cluster, here c = 1, from P(c) = π_c, then draw an image z from p(z|c) = N(z; μ_c, Φ_c).)
8. Example: Simulation
(Another draw: sample c = 2, then draw z from p(z|c) = N(z; μ_2, Φ_2).)
9. Example: Learning, E step
(Figure: for an image z from the data set, with priors π_1 = π_2 = 0.5 and current parameters μ_1, Φ_1 and μ_2, Φ_2, the responsibilities are P(c=1|z) = 0.52 and P(c=2|z) = 0.48.)
10. Example: Learning, E step
(Figure: for another image z from the data set, the responsibilities are P(c=1|z) = 0.48 and P(c=2|z) = 0.52.)
11. Example: Learning, M step
- Set μ_1 to the average of z weighted by P(c=1|z)
- Set μ_2 to the average of z weighted by P(c=2|z)
12. Example: Learning, M step
- Set Φ_1 to the average of diag((z − μ_1)(z − μ_1)ᵀ) weighted by P(c=1|z)
- Set Φ_2 to the average of diag((z − μ_2)(z − μ_2)ᵀ) weighted by P(c=2|z)
(A compact EM loop implementing these E and M steps is sketched below.)
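A compact EM loop for a diagonal-covariance mixture of Gaussians, following the E and M steps of the last four slides; a minimal sketch with illustrative names, assuming the images are flattened into the rows of Z:

import numpy as np

def em_mog(Z, C, iters=15, seed=0):
    rng = np.random.default_rng(seed)
    N, D = Z.shape
    mu = Z[rng.choice(N, C, replace=False)]   # cluster means mu_c
    var = np.tile(Z.var(axis=0), (C, 1))      # diagonal variances Phi_c
    pi = np.full(C, 1.0 / C)                  # priors pi_c
    for _ in range(iters):
        # E step: responsibilities R[t, c] = P(c | z_t)
        logp = -0.5 * (((Z[:, None, :] - mu) ** 2) / var
                       + np.log(2 * np.pi * var)).sum(axis=2)
        logp += np.log(pi)
        logp -= logp.max(axis=1, keepdims=True)   # numerical stability
        R = np.exp(logp)
        R /= R.sum(axis=1, keepdims=True)
        # M step: responsibility-weighted averages
        Nc = R.sum(axis=0)
        mu = (R.T @ Z) / Nc[:, None]
        var = (R.T @ Z ** 2) / Nc[:, None] - mu ** 2 + 1e-6
        pi = Nc / N
    return pi, mu, var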
13. Transformation as a Discrete Latent Variable
- with Brendan J. Frey
- Computer Science, University of Waterloo, Canada
- Beckman Institute and ECE, University of Illinois at Urbana-Champaign
14. The kind of data we're interested in
Even after tracking, the features still have unknown positions, rotations, scales, levels of shearing, ...
15. One approach
(Pipeline: Images → manual-labor Normalization → Normalized images → Pattern Analysis)
16. Our approach
(Pipeline: Images → Joint Normalization and Pattern Analysis)
17. What transforming an image does in the vector space of pixel intensities
- A continuous transformation moves an image along a continuous curve
- Our subspace model should assign images near this nonlinear manifold to the same point in the subspace
18. Tractable approaches to modeling the transformation manifold
- Linear approximation: good locally
- Discrete approximation: good globally
19. Adding transformation as a discrete latent variable
- Say there are N pixels
- We assume we are given a set of sparse N x N transformation-generating matrices G_1, …, G_l, …, G_L
- These generate the points G_l z from a point z (one way to build such a matrix is sketched below)
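A sketch of building one sparse N x N transformation-generating matrix that shifts a flattened H x W image by (dy, dx); the scipy sparse representation and the function name are assumptions for illustration:

import numpy as np
import scipy.sparse as sp

def shift_matrix(H, W, dy, dx):
    # G[target, source] = 1 where the target pixel is the source
    # pixel moved by (dy, dx); pixels shifted in from outside stay 0.
    rows, cols = [], []
    for y in range(H):
        for x in range(W):
            ys, xs = y - dy, x - dx   # source pixel coordinates
            if 0 <= ys < H and 0 <= xs < W:
                rows.append(y * W + x)
                cols.append(ys * W + xs)
    data = np.ones(len(rows))
    return sp.csr_matrix((data, (rows, cols)), shape=(H * W, H * W))

With z a flattened image, shift_matrix(H, W, dy, dx) @ z is the shifted version, so a set of these matrices plays the role of G_1, …, G_L.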
20. Transformed Mixture of Gaussians
(Graphical model: c → z; z and l → x)
- P(c) = π_c and p(z|c) = N(z; μ_c, Φ_c)
- P(l) = ρ_l and p(x|z,l) = N(x; G_l z, Ψ)
- The parameters ρ_l, π_c, μ_c and Φ_c represent the data
- The cluster/transformation responsibilities P(c,l|x) are quite easy to compute (a sketch follows below)
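A sketch of those joint responsibilities: integrating out the latent image z gives p(x|c,l) = N(x; G_l μ_c, G_l Φ_c G_lᵀ + Ψ), so P(c,l|x) ∝ π_c ρ_l p(x|c,l). The names are illustrative, the Gs are taken to be dense N x N arrays here, and forming the N x N covariance keeps this practical only for small images:

import numpy as np
from scipy.stats import multivariate_normal

def tmg_responsibilities(x, Gs, rho, pi, mu, Phi, Psi):
    # Phi[c] and Psi are vectors of diagonal variances.
    C, L = len(pi), len(Gs)
    joint = np.zeros((C, L))
    for l, G in enumerate(Gs):
        for c in range(C):
            cov = G @ np.diag(Phi[c]) @ G.T + np.diag(Psi)
            joint[c, l] = pi[c] * rho[l] * multivariate_normal.pdf(
                x, mean=G @ mu[c], cov=cov, allow_singular=True)
    return joint / joint.sum()   # P(c,l|x)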
21. Example: Simulation
(G_1 = shift left and up, G_2 = I, G_3 = shift right and up. Sample c = 1 and draw a latent image z; sample l = 1; then draw the observed image x from N(x; G_1 z, Ψ).)
22. ML estimation of a Transformed Mixture of Gaussians using EM
- E step: compute P(l|x), P(c|x) and p(z|c,x) for each x in the data
- M step: set
  - π_c = average of P(c|x)
  - ρ_l = average of P(l|x)
  - μ_c = average mean of p(z|c,x)
  - Φ_c = average variance of p(z|c,x)
  - Ψ = average variance of p(x − G_l z | x)
23. Face Clustering
- Examples from 400 outdoor images of 2 people (44 x 28 pixels)
24. Mixture of Gaussians
15 iterations of EM (1 minute in MATLAB). Cluster means shown for c = 1, 2, 3, 4.
25. Transformed Mixture of Gaussians
30 iterations of EM. Cluster means shown for c = 1, 2, 3, 4.
26. Video Analysis Using Generative Models
- with Brendan Frey, Nemanja Petrovic and Thomas Huang
27. Idea
- Use generative models of video sequences to do unsupervised learning
- Use the resulting model for video summarization, filtering, stabilization, recognition of objects, retrieval, etc.
28. Transformed Hidden Markov Model
(Graphical model: the class and transformation at each frame depend on the past through P(c, l | past).)
29. THMM Transition Models
- Independent distributions for the class and for the relative motion of the transformations:
  P(c_t, l_t | past) = P(c_t | c_{t-1}) P(d(l_t, l_{t-1}))
- Relative motion dependent on the class (sketched below):
  P(c_t, l_t | past) = P(c_t | c_{t-1}) P(d(l_t, l_{t-1}) | c_t)
- Autoregressive model for the transformation distribution
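A minimal sketch of the class-dependent variant, assuming each transformation index maps to a shift; motion_pdf, A and shifts are illustrative stand-ins for the learned quantities, not the authors' code:

import numpy as np

def transition_prob(c_t, l_t, c_prev, l_prev, A, shifts, motion_pdf):
    # P(c_t, l_t | c_{t-1}, l_{t-1})
    #   = P(c_t | c_{t-1}) * P(d(l_t, l_{t-1}) | c_t)
    d = np.subtract(shifts[l_t], shifts[l_prev])  # relative motion d(l_t, l_{t-1})
    return A[c_prev, c_t] * motion_pdf(d, c_t)

# For example, motion_pdf could favor small motions:
# motion_pdf = lambda d, c: np.exp(-0.5 * d @ d) / (2 * np.pi)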
30. Inference in THMM
- Tasks:
  - Find the most likely state at time t given the whole observed sequence x_1, …, x_T and the model parameters (class means and variances, transition probabilities, etc.)
  - Find the distribution over states for each time t (a forward-backward sketch follows this list)
  - Find the most likely state sequence
  - Learn the parameters that maximize the likelihood of the observed data
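A minimal forward-backward sketch over the joint state s = (c, l), treating the THMM as an ordinary HMM with S = C·L states; emission[t, s] = p(x_t | s), T[s, s'] = P(s' | s) and prior are assumed inputs, not the authors' implementation:

import numpy as np

def forward_backward(emission, T, prior):
    # Returns gamma[t, s] = P(s_t = s | x_1, ..., x_T), using
    # per-step normalization for numerical stability.
    Tn, S = emission.shape
    alpha = np.zeros((Tn, S))
    beta = np.zeros((Tn, S))
    alpha[0] = prior * emission[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, Tn):
        alpha[t] = emission[t] * (alpha[t - 1] @ T)
        alpha[t] /= alpha[t].sum()
    beta[-1] = 1.0
    for t in range(Tn - 2, -1, -1):
        beta[t] = T @ (emission[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)

Replacing the sums with maximizations (the Viterbi recursion) gives the most likely state sequence instead of the per-frame marginals.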
31. Video Summary and Filtering
- p(z|c) = N(z; μ_c, Φ_c)
- p(x|z,l) = N(x; G_l z, Ψ)
32. Example: Learning
- Hand-held camera
- Moving subject
- Cluttered background
(Data sequence shown.)
33. Examples
- Normalized sequence
- Simulated sequence
- De-noising
- Seeing through distractions
34. Future work
- Fast approximate learning and inference
- Multiple layers
- Learning transformations from images
- Nebojsa Jojic: www.ifp.uiuc.edu/jojic
35. Subspace models of images
(Example: an image x ∈ R^1200 lies near a manifold f(y), y ∈ R^2, whose two coordinates correspond to "shut eyes" and "frown".)
36. Factor analysis (generative PCA)
(Graphical model: y → z)
- p(y) = N(y; 0, I)
- The density of pixel intensities z given the subspace point y is p(z|y) = N(z; μ + Λy, Φ)
- The manifold f(y) = μ + Λy is linear
37. Factor analysis (generative PCA)
(Graphical model: y → z)
- p(y) = N(y; 0, I), p(z|y) = N(z; μ + Λy, Φ)
- The parameters μ, Λ represent the manifold
- Observing z induces a Gaussian posterior p(y|z):
  - Cov[y|z] = (ΛᵀΦ⁻¹Λ + I)⁻¹
  - E[y|z] = Cov[y|z] ΛᵀΦ⁻¹(z − μ)
(These two formulas are implemented in the sketch below.)
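A direct sketch of the posterior formulas above, assuming a diagonal Φ stored as a vector; the names are illustrative:

import numpy as np

def fa_posterior(z, mu, Lam, Phi_diag):
    # Cov[y|z] = (Lam^T Phi^-1 Lam + I)^-1
    # E[y|z]   = Cov[y|z] Lam^T Phi^-1 (z - mu)
    K = Lam.shape[1]
    LtPinv = Lam.T / Phi_diag          # Lam^T Phi^-1 for diagonal Phi
    cov = np.linalg.inv(LtPinv @ Lam + np.eye(K))
    mean = cov @ (LtPinv @ (z - mu))
    return mean, cov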
38. Example: Simulation
(Draw y from p(y) = N(y; 0, I) in the "shut eyes"/"frown" subspace, then draw z from p(z|y) = N(z; μ + Λy, Φ).)
39. Example: Simulation
(A second draw of y and z from the same model.)
40. Example: Simulation
(A third draw of y and z from the same model.)
41. Transformed Component Analysis
(Graphical model: y → z; z and l → x)
- p(y) = N(y; 0, I), p(z|y) = N(z; μ + Λy, Φ)
- P(l) = ρ_l
- The probability of the observed image x is p(x|z,l) = N(x; G_l z, Ψ)
(A sampling sketch of this generative process follows.)
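A minimal sketch of sampling from the TCA model just defined, assuming diagonal Φ and Ψ stored as vectors; all names are illustrative:

import numpy as np

def tca_sample(mu, Lam, Phi_diag, rho, Gs, Psi_diag, rng=np.random):
    y = rng.standard_normal(Lam.shape[1])                  # y ~ N(0, I)
    z = mu + Lam @ y + rng.standard_normal(len(mu)) * np.sqrt(Phi_diag)
    l = rng.choice(len(rho), p=rho)                        # l ~ rho
    x = Gs[l] @ z + rng.standard_normal(len(mu)) * np.sqrt(Psi_diag)
    return x, z, y, l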
42. Example: Simulation
(G_1 = shift left and up, G_2 = I, G_3 = shift right and up. Draw y in the "shut eyes"/"frown" subspace, draw z, sample l = 3, then draw x from N(x; G_3 z, Ψ).)
43. Example: Inference
(G_1 = shift left and up, G_2 = I, G_3 = shift right and up. For an observed image x, the inferred transformation responsibilities are P(l=1|x) = 0.01, P(l=2|x) = 0.01 and P(l=3|x) = 0.98, and the posterior over the subspace point y is shown for each hypothesis.)
44. EM algorithm for TCA
- Initialize μ, Λ, Φ, ρ, Ψ to random values
- E step: for each training case x(t), infer q(t)(l,z,y) = p(l,z,y | x(t))
- M step: compute μ_new, Λ_new, Φ_new, ρ_new, Ψ_new to maximize Σ_t E[log p(y) p(z|y) P(l) p(x(t)|z,l)], where the expectation is with respect to q(t)(l,z,y)
- Each iteration increases log p(Data)
45. A tough toy problem
- 144 images, 9 x 9 pixels each
- 1 shape (pyramid)
- 3-D lighting
- cluttered background
- 25 possible locations
46. First 8 principal components
- TCA: 3 components, 81 transformations (9 horizontal x 9 vertical shifts), 10 iterations of EM
- The model generates realistic examples
(Learned parameters shown: Φ, Ψ, μ and Λ_1, Λ_2, Λ_3.)
47. Expression modeling
- 100 training images, 16 x 24 pixels
- variation in expression
- imperfect alignment
48. PCA: mean and first 10 principal components
49. Fantasies from the FA model vs. fantasies from the TCA model
50. Modeling handwritten digits
- 200 8 x 8 images of each digit
- preprocessing normalizes vertical/horizontal translation and scale
- different writing angles (shearing) remain; see the 7s
51. TCA
- 29 shearing/translation combinations
- 10 components per digit
- 30 iterations of EM per digit
(Shown: transformed means and the mean of each digit.)
52. FA: mean and 10 components per digit. TCA: mean and 10 components per digit.
53. Classification Performance
- Training: 200 cases per digit, 20 components, 50 EM iterations
- Testing: 1000 cases; p(x|class) used for classification
- Results:
  Method                               Error rate
  k-nearest neighbors (optimized k)    7.6%
  Factor analysis                      3.2%
  Transformed component analysis       2.7%
- Bonus: P(l|x) infers the writing angle!
54. Wrap-up
- Papers and MATLAB scripts: www.ifp.uiuc.edu/jojic, www.cs.uwaterloo.ca/frey
- Other domains: audio, bioinformatics, ...
- Other latent image models p(z):
  - mixtures of factor analyzers (NIPS99)
  - layers, multiple objects, occlusions
  - time series (in preparation)
55. Wrap-up
- Discrete + linear combination: set some components equal to derivatives of μ with respect to the transformations
- Multiresolution approach
- Fast variational methods, belief propagation, ...
56. Other generative models
- Modeling human appearance in stereo images: articulated, self-occluding Gaussians