Generative Models for Image Understanding - PowerPoint PPT Presentation

1
Generative Models for Image Understanding
  • Nebojsa Jojic and Thomas Huang
  • Beckman Institute and ECE Dept.
  • University of Illinois

2
Problem: Summarization of High-Dimensional Data
  • Pattern analysis
  • For each of several classes c = 1,…,C of the data, define
    probability distribution functions p(x|c)
  • Compression
  • Define a probabilistic model p(x) and devise an
    optimal coding approach
  • Video summary
  • Drop most of the frames in a video sequence and
    keep the interesting information that summarizes it.

3
Generative density modeling
  • Find a probability model that
  • reflects the desired structure,
  • randomly generates plausible images,
  • represents the data by its parameters
  • ML estimation
  • p(image|class) used for recognition, detection,
    ...

4
Problems we attacked
  • Transformation as a discrete variable in
    generative models of intensity images
  • Tracking articulated objects in dense stereo maps
  • Unsupervised learning for video summary
  • Idea: the structure of the generative model
    reveals the interesting objects we want to
    extract.

5
Mixture of Gaussians
(Graphical model: class c → pixel intensities z.)
P(c) = πc
The probability of pixel intensities z given that
the image is from cluster c is p(z|c) = N(z; μc, Φc)
6
Mixture of Gaussians
(Graphical model: class c → pixel intensities z.)
P(c) = πc
  • Parameters πc, μc and Φc represent the data
  • For input z, the cluster responsibilities are
  • P(c|z) = p(z|c)P(c) / Σc' p(z|c')P(c')
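The responsibility computation above can be sketched in NumPy. This is a hedged sketch, not the authors' code; it assumes the diagonal covariances Φc are stored as vectors and works in log space for numerical stability:

```python
import numpy as np

def responsibilities(z, pis, mus, Phis):
    """P(c|z) for a mixture of Gaussians with diagonal covariances.

    z: (N,) pixel intensities; pis: (C,) priors P(c); mus: (C, N) means;
    Phis: (C, N) diagonal covariance entries.
    """
    # log p(z|c) for each cluster
    log_lik = -0.5 * np.sum(np.log(2 * np.pi * Phis) + (z - mus) ** 2 / Phis,
                            axis=1)
    log_joint = log_lik + np.log(pis)   # log p(z|c) + log P(c)
    log_joint -= log_joint.max()        # avoid underflow before exponentiating
    w = np.exp(log_joint)
    return w / w.sum()                  # Bayes' rule: normalize over c
```

The same normalization appears on every E-step slide that follows.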

7
Example: Simulation
(Figure: c = 1 is sampled from P(c) = πc, then an image z is drawn
from p(z|c) = N(z; μc, Φc).)
8
Example: Simulation
(Figure: another draw, now with c = 2.)
9
Example: Learning - E step
(Figure: with π1 = π2 = 0.5, the first image z from the data set gets
responsibilities P(c=1|z) = 0.52 and P(c=2|z) = 0.48.)
10
Example: Learning - E step
(Figure: the next image z from the data set gets responsibilities
P(c=1|z) = 0.48 and P(c=2|z) = 0.52.)
11
Example: Learning - M step
(Figure: π1 = π2 = 0.5; updating the cluster means.)
Set μ1 to the average of z weighted by P(c=1|z)
Set μ2 to the average of z weighted by P(c=2|z)
12
Example: Learning - M step
(Figure: updating the diagonal covariances.)
Set Φ1 to the average of diag((z-μ1)(z-μ1)T) weighted by P(c=1|z)
Set Φ2 to the average of diag((z-μ2)(z-μ2)T) weighted by P(c=2|z)
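The two M-step slides combine into one update. A minimal sketch under the same diagonal-covariance assumption, where R holds the E-step responsibilities:

```python
import numpy as np

def m_step(Z, R):
    """M step for a mixture of diagonal Gaussians.

    Z: (T, N) training images as rows; R: (T, C) responsibilities P(c|z).
    Returns updated priors, means and diagonal covariances.
    """
    Nc = R.sum(axis=0)                  # effective count per cluster
    pis = Nc / Z.shape[0]               # new priors
    mus = (R.T @ Z) / Nc[:, None]       # responsibility-weighted means
    Phis = np.stack([
        (R[:, c, None] * (Z - mus[c]) ** 2).sum(axis=0) / Nc[c]
        for c in range(len(Nc))
    ])                                  # weighted per-pixel variances
    return pis, mus, Phis
```

Alternating this with the E step on the previous slides is exactly the EM loop the example walks through.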
13
Transformation as a Discrete Latent Variable
  • with
  • Brendan J. Frey
  • Computer Science, University of Waterloo, Canada
  • Beckman Institute ECE, Univ of Illinois at
    Urbana

14
The kind of data we're interested in
Even after tracking, the features still have
unknown positions, rotations, scales, levels of
shearing, ...
15
One approach
Images
Labor
Normalization
Normalized images
Pattern Analysis
16
Our approach
Images
Joint Normalization and Pattern Analysis
17
What transforming an image does in the vector
space of pixel intensities
  • A continuous transformation moves an image
    along a continuous curve
  • Our subspace model should assign images near this
    nonlinear manifold to the same point in the
    subspace

18
Tractable approaches to modeling the
transformation manifold
  • Linear approximation
  • - good locally
  • Discrete approximation
  • - good globally

19
Adding transformation as a discrete latent
variable
  • Say there are N pixels
  • We assume we are given a set of sparse N x N
    transformation-generating matrices G1, …, Gl, …, GL
  • These generate points Gl z from a point z
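As a concrete example of one such sparse generating matrix (an assumed illustration, not the authors' construction), a cyclic shift of a w × h image is an N × N permutation matrix:

```python
import numpy as np

def shift_matrix(w, h, dx, dy):
    """Permutation matrix G that cyclically shifts a w*h image by dx
    columns and dy rows; images are flattened row-major, N = w*h."""
    idx = np.arange(w * h).reshape(h, w)
    # destination position of each pixel: (r, c) -> (r+dy, c+dx), wrapping
    dest = np.roll(np.roll(idx, -dy, axis=0), -dx, axis=1)
    G = np.zeros((w * h, w * h))
    G[dest.ravel(), idx.ravel()] = 1.0   # x = G z moves pixel idx to dest
    return G
```

Each row and column of G has a single 1, so the matrix is sparse and applying it is just a pixel permutation.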

20
Transformed Mixture of Gaussians
(Graphical model: class c and transformation l are latent; z is the
latent normalized image, x the observed image.)
P(c) = πc
p(z|c) = N(z; μc, Φc)
P(l) = ρl
p(x|z,l) = N(x; Gl z, Ψ)
  • ρl, πc, μc and Φc represent the data
  • The cluster/transformation responsibilities,
  • P(c,l|x), are quite easy to compute
21
Example: Simulation
G1 = shift left and up, G2 = I, G3 = shift right and up
(Figure: c = 1 picks a latent image z; l = 1 shifts it to produce the
observed image x.)
22
ML estimation of a Transformed Mixture of
Gaussians using EM
  • E step: Compute P(l|x), P(c|x) and p(z|c,x) for
    each x in the data
  • M step: Set
  • πc = avg of P(c|x)
  • ρl = avg of P(l|x)
  • μc = avg mean of p(z|c,x)
  • Φc = avg variance of p(z|c,x)
  • Ψ = avg variance of x - Gl z
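The E step's joint responsibilities can be sketched as follows, under the simplifying assumptions (mine, not stated on the slide) that each Gl is a permutation matrix and Φc, Ψ are diagonal, stored as vectors, so that marginalizing out z keeps the covariance diagonal:

```python
import numpy as np

def joint_responsibilities(x, Gs, rhos, pis, mus, Phis, Psi):
    """P(c, l | x) for a transformed mixture of Gaussians.

    Marginalizing z gives p(x|c,l) = N(x; Gl mu_c, Gl Phi_c Gl^T + Psi);
    with permutation Gl and diagonal Phi_c, Psi this stays diagonal.
    """
    C, L = len(pis), len(Gs)
    logp = np.empty((C, L))
    for l, G in enumerate(Gs):
        for c in range(C):
            m = G @ mus[c]              # transformed cluster mean
            v = G @ Phis[c] + Psi       # diagonal of Gl Phi_c Gl^T + Psi
            logp[c, l] = (-0.5 * np.sum(np.log(2 * np.pi * v)
                                        + (x - m) ** 2 / v)
                          + np.log(pis[c]) + np.log(rhos[l]))
    logp -= logp.max()                  # stabilize
    P = np.exp(logp)
    return P / P.sum()                  # normalize over all (c, l) pairs
```

P(c|x) and P(l|x) on this slide are then just the row and column sums of this table.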

23
Face Clustering
  • Examples of 400 outdoor images of 2 people
  • (44 x 28 pixels)

24
Mixture of Gaussians
15 iterations of EM (1 minute in MATLAB)
(Figure: cluster means for c = 1,…,4.)
25
Transformed mixture of Gaussians
30 iterations of EM
(Figure: cluster means for c = 1,…,4.)
26
Video Analysis Using Generative Models
  • with Brendan Frey, Nemanja Petrovic and Thomas
    Huang

27
Idea
  • Use generative models of video sequences to do
    unsupervised learning
  • Use the resulting model for video summarization,
    filtering, stabilization, recognition of objects,
    retrieval, etc.

28
Transformed Hidden Markov Model
P(c,l|past)
29
THMM Transition Models
  • Independent probability distributions for class
    and transformation (relative motion):
  • P(ct, lt | past) = P(ct | ct-1) P(d(lt, lt-1))
  • Relative motion dependent on the class:
  • P(ct, lt | past) = P(ct | ct-1) P(d(lt, lt-1) | ct)
  • Autoregressive model for the transformation
    distribution

30
Inference in THMM
  • Tasks
  • Find the most likely state at time t given the
    whole observed sequence and the model
    parameters (class means and variances, transition
    probabilities, etc.)
  • Find the distribution over states for each time t
  • Find the most likely state sequence
  • Learn the parameters that maximize the likelihood
    of the observed data
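For the filtering and smoothing tasks above, the standard HMM forward pass over the combined state s = (c, l) applies. A log-space sketch with an assumed interface (log_like[t, s] = log p(xt | s), log_T[s', s] = log P(s | s'), neither taken from the slides):

```python
import numpy as np

def logsumexp(a, axis):
    """Numerically stable log of a sum of exponentials along an axis."""
    m = a.max(axis=axis, keepdims=True)
    return np.squeeze(m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True)),
                      axis)

def forward(log_like, log_T, log_prior):
    """HMM forward pass: alpha[t, s] = log p(x_1..t, s_t = s)."""
    T, S = log_like.shape
    alpha = np.empty((T, S))
    alpha[0] = log_prior + log_like[0]
    for t in range(1, T):
        # sum over previous states s': alpha[t-1, s'] + log P(s | s')
        alpha[t] = log_like[t] + logsumexp(alpha[t - 1][:, None] + log_T,
                                           axis=0)
    return alpha
```

A matching backward pass gives the per-time state distributions, and replacing the sum with a max gives the Viterbi decoding for the most likely state sequence.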

31
Video Summary and Filtering
p(z|c) = N(z; μc, Φc)
p(x|z,l) = N(x; Gl z, Ψ)
32
Example Learning
  • Hand-held camera
  • Moving subject
  • Cluttered background

DATA
33
Examples
  • Normalized sequence
  • Simulated sequence
  • De-noising
  • Seeing through distractions

34
Future work
  • Fast approximate learning and inference
  • Multiple layers
  • Learning transformations from images
  • Nebojsa Jojic www.ifp.uiuc.edu/jojic

35
Subspace models of images: Example
(Figure: an image x ∈ R^1200 lies on a manifold f(y), y ∈ R^2; the two
subspace axes correspond to "shut eyes" and "frown".)
36
Factor analysis (generative PCA)
p(y) = N(y; 0, I)
The density of pixel intensities z given subspace
point y is p(z|y) = N(z; μ + Λy, Φ)
Manifold f(y) = μ + Λy, linear
37
Factor analysis (generative PCA)
p(y) = N(y; 0, I)
p(z|y) = N(z; μ + Λy, Φ)
  • Parameters μ, Λ represent the manifold
  • Observing z induces a Gaussian p(y|z)
  • Cov[y|z] = (ΛTΦ-1Λ + I)-1
  • E[y|z] = Cov[y|z] ΛTΦ-1 (z - μ)
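The posterior computation above as a minimal NumPy sketch, with Φ kept as a vector of diagonal entries (and the mean μ subtracted from z, which the slide leaves implicit):

```python
import numpy as np

def fa_posterior(z, mu, Lam, Phi):
    """p(y|z) for factor analysis: z = mu + Lam y + noise, noise ~ N(0, diag(Phi)).

    Returns the posterior mean E[y|z] and covariance Cov[y|z].
    """
    K = Lam.shape[1]
    A = Lam.T / Phi                              # Lam^T Phi^{-1}
    cov = np.linalg.inv(A @ Lam + np.eye(K))     # (Lam^T Phi^{-1} Lam + I)^{-1}
    mean = cov @ (A @ (z - mu))
    return mean, cov
```

With nearly noiseless pixels the posterior mean collapses onto the least-squares subspace coordinates of z.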
38
Example: Simulation
(Figure: a subspace point y is sampled from p(y) = N(y; 0, I), mapped
through Λ along the "shut eyes" and "frown" directions, and an image z
is drawn from p(z|y) = N(z; μ + Λy, Φ).)
39
Example: Simulation
(Figure: another sample.)
40
Example: Simulation
(Figure: another sample.)
41
Transformed Component Analysis
p(y) = N(y; 0, I)
p(z|y) = N(z; μ + Λy, Φ)
P(l) = ρl
The probability of the observed image x is
p(x|z,l) = N(x; Gl z, Ψ)
42
Example: Simulation
G1 = shift left and up, G2 = I, G3 = shift right and up
(Figure: a "shut eyes"/"frown" sample z is shifted by l = 3 to produce
the observed image x.)
43
Example: Inference
G1 = shift left and up, G2 = I, G3 = shift right and up
(Figure: for an observed x, the posterior over transformations is
P(l=1|x) = .01, P(l=2|x) = .01, P(l=3|x) = .98.)
44
EM algorithm for TCA
  • Initialize μ, Λ, Φ, ρ, Ψ to random values
  • E Step
  • For each training case x(t), infer
  • q(t)(l,z,y) = p(l,z,y | x(t))
  • M Step
  • Compute μnew, Λnew, Φnew, ρnew, Ψnew to maximize
  • Σt E[log p(y) p(z|y) P(l) p(x(t)|z,l)],
  • where E is wrt q(t)(l,z,y)
  • Each iteration increases log p(Data)

45
A tough toy problem
  • 144 9 x 9 images
  • 1 shape (pyramid)
  • 3-D lighting
  • cluttered background
  • 25 possible locations

46
1st 8 principal components
  • TCA
  • 3 components
  • 81 transformations
  • - 9 horiz shifts
  • - 9 vert shifts
  • 10 iters of EM
  • Model generates realistic examples

(Figure: learned Φ, Ψ, μ and components Λ1, Λ2, Λ3.)
47
Expression modeling
  • 100 16 x 24 training images
  • variation in expression
  • imperfect alignment

48
PCA: Mean + 1st 10 principal components
49
Fantasies from FA model
Fantasies from TCA model
50
Modeling handwritten digits
  • 200 8 x 8 images of each digit
  • preprocessing normalizes vert/horiz
    translation and scale
  • different writing angles (shearing) - see the 7

51
TCA: 29 shearing/translation combinations,
10 components per digit, 30 iterations of EM per digit
Transformed means
Mean of each digit
52
FA: Mean + 10 components per digit
TCA: Mean + 10 components per digit
53
Classification Performance
  • Training: 200 cases/digit, 20 components, 50 EM iters
  • Testing: 1000 cases, p(x|class) used for classification
  • Results
  • Method / Error rate
  • k-nearest neighbors (optimized k): 7.6%
  • Factor analysis: 3.2%
  • Transformed component analysis: 2.7%
  • Bonus: P(l|x) infers the writing angle!
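The classification rule used here, in sketch form; class_loglik is an assumed list of per-digit scoring functions, each returning log p(x|class) under that digit's fitted model:

```python
import numpy as np

def classify(x, class_loglik):
    """Assign x to the class with the highest log p(x|class)."""
    scores = [f(x) for f in class_loglik]   # one marginal log-likelihood per class
    return int(np.argmax(scores))
```

With equal class priors this is exactly the Bayes-optimal decision under the fitted models.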

54
Wrap-up
  • Papers, MATLAB scripts: www.ifp.uiuc.edu/jojic
  • www.cs.uwaterloo.ca/frey
  • Other domains: audio, bioinformatics, ...
  • Other latent image models, p(z)
  • mixtures of factor analyzers (NIPS99)
  • layers, multiple objects, occlusions
  • time series (in preparation)

55
Wrap-up
  • Discrete + linear combination: set some components
    equal to derivatives of μ wrt transformations
  • Multiresolution approach
  • Fast variational methods, belief propagation, ...

56
Other generative models
  • Modeling human appearance in stereo images:
    articulated, self-occluding Gaussians