Title: Inference in generative models of images and video
1. Inference in generative models of images and video
- John Winn
- MSR Cambridge
- May 2004
2. Overview
- Generative vs. conditional models
- Combined approach
- Inference in the flexible sprite model
- Extending the model
3. Generative vs. conditional models
We have an image I and latent variables H that we wish to infer, e.g. object position, orientation and class. There are also other sources of variability, e.g. illumination, parameterised by θ.
Generative model: $P(H, \theta, I)$
Conditional model: $P(H, \theta \mid I)$ or $P(H \mid I)$
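The two kinds of model are linked by Bayes' rule: the posterior that a conditional model represents directly is implied by the generative joint, and marginalising θ gives the second conditional form. Sums are written for discrete variables; for continuous θ they become integrals.

```latex
P(H, \theta \mid I) = \frac{P(H, \theta, I)}{\sum_{H', \theta'} P(H', \theta', I)},
\qquad
P(H \mid I) = \sum_{\theta} P(H, \theta \mid I)
```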
4. Conditional models use features
- Features are functions of I which aim to be
informative about H but invariant to ?.
Edge features
Corner features
Blob features
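The slides do not say which corner detector is meant. As one illustrative example of a corner feature, here is a minimal NumPy sketch of the standard Harris corner response, assuming a 3×3 box window and k = 0.05 (both choices are assumptions, and `box_sum3` / `harris_response` are hypothetical helper names). Large positive values mark corners, negative values mark edges, and flat regions give zero.

```python
import numpy as np

def box_sum3(a):
    """Sum over each 3x3 neighbourhood, with edge padding (assumed window)."""
    p = np.pad(a, 1, mode="edge")
    h, w = a.shape
    return sum(p[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3))

def harris_response(img, k=0.05):
    """Harris response R = det(M) - k * trace(M)^2 at every pixel."""
    iy, ix = np.gradient(img.astype(float))  # derivatives along rows (y) and columns (x)
    sxx = box_sum3(ix * ix)  # windowed structure-tensor entries
    syy = box_sum3(iy * iy)
    sxy = box_sum3(ix * iy)
    det = sxx * syy - sxy ** 2
    trace = sxx + syy
    return det - k * trace ** 2

# A bright square: corners respond positively, edge midpoints negatively.
img = np.zeros((20, 20))
img[5:15, 5:15] = 1.0
R = harris_response(img)
```

Because only image gradients are used, the response is unchanged by adding a constant brightness offset, a small example of the invariance to θ that features aim for.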
5. Conditional models
- Using features f(I), train a conditional model, e.g. from labelled data.
Example: Viola-Jones face detection using rectangle features and AdaBoost.
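Rectangle features are cheap because any rectangle sum can be read from an integral image with four lookups. A minimal sketch of that idea (function names are illustrative; the AdaBoost-trained classifier that combines such features is not shown):

```python
import numpy as np

def integral_image(img):
    """ii[r, c] = sum of img[:r, :c], with a leading row and column of zeros."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def rect_sum(ii, r, c, h, w):
    """Sum of the h x w rectangle whose top-left pixel is (r, c): four lookups."""
    return ii[r + h, c + w] - ii[r, c + w] - ii[r + h, c] + ii[r, c]

def two_rect_feature(ii, r, c, h, w):
    """Horizontal two-rectangle feature: left half minus right half (w even)."""
    half = w // 2
    return rect_sum(ii, r, c, h, half) - rect_sum(ii, r, c + half, h, half)

img = np.arange(16).reshape(4, 4)
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 2, 2))          # equals img[1:3, 1:3].sum()
print(two_rect_feature(ii, 0, 0, 4, 4))  # left two columns minus right two columns
```

The cost of each feature is constant regardless of rectangle size, which is what makes feature-based conditional inference fast.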
6. Conditional models
- Advantages
- Simple: only the variables of interest are modelled
- Inference is fast, due to the use of features and a simple model
- Disadvantages
- Non-robust
- Difficult to compare different models
- Difficult to combine different models
7. Generative models
- A generative model defines a process for generating the image pixels I from the latent variables H and θ, giving a joint distribution over all variables:
$P(H, \theta, I)$
Learning and inference are carried out using standard machine learning techniques, e.g. Expectation Maximisation, MCMC and variational methods. No features!
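For the variational methods mentioned above, the quantity usually optimised is a lower bound on the log model evidence. The slides do not write it out; this is the standard form, shown with the factorised Q distribution that the flexible sprite inference uses:

```latex
\log P(I) = \log \sum_{H, \theta} P(H, \theta, I)
\;\ge\; \sum_{H, \theta} Q(H, \theta) \log \frac{P(H, \theta, I)}{Q(H, \theta)},
\qquad Q(H, \theta) = Q(H)\, Q(\theta)
```

The bound is tight when Q equals the true posterior $P(H, \theta \mid I)$; the factorisation trades that tightness for tractable updates.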
8. Generative models
- Example: an image modelled as layers of flexible sprites.
9. Generative models
- Advantages
- Accurate, as the entire image is modelled
- Can compare different models
- Can combine different models
- Can generate new images
- Disadvantages
- Inference is difficult due to local minima
- Inference is slower due to complex model
- Limitations on model complexity
10. Combined approach
- Use a generative model, but speed up inference using proposal distributions given by a conditional model.
A proposal R(X) suggests a new distribution over some of the latent variables, $X \subseteq \{H, \theta\}$. Inference is extended to allow the proposal to be accepted or rejected, e.g. depending on whether it improves the model evidence.
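A minimal sketch of the accept-if-evidence-improves rule, under a hypothetical interface: `refine` stands for running generative-model inference (e.g. EM or variational updates) from a starting state, and `log_evidence` scores the result. The toy problem is one-dimensional with a local optimum at 2 and a global optimum at 10, and integer hill climbing stands in for real inference.

```python
def combined_inference(init, proposals, refine, log_evidence):
    """Greedy accept/reject of conditional-model proposals (hypothetical interface)."""
    state = refine(init)
    best = log_evidence(state)
    for proposed in proposals:
        candidate = refine(proposed)  # run local inference from the proposed state
        score = log_evidence(candidate)
        if score > best:  # accept only if the evidence improves
            state, best = candidate, score
    return state, best

def log_ev(x):
    # Toy evidence: local optimum at x = 2 (value 0), global optimum at x = 10 (value 5).
    return max(-(x - 2) ** 2, 5 - (x - 10) ** 2)

def hill_climb(x):
    # Integer hill climbing, standing in for EM/variational updates (illustrative only).
    while True:
        nxt = max((x - 1, x + 1), key=log_ev)
        if log_ev(nxt) <= log_ev(x):
            return x
        x = nxt

print(combined_inference(0, [], hill_climb, log_ev))   # → (2, 0): stuck at the local optimum
print(combined_inference(0, [7], hill_climb, log_ev))  # → (10, 5): the proposal is accepted
```

Local inference alone cannot leave the basin it starts in; a bottom-up proposal only needs to land in the right basin for the generative model to finish the job.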
11. Using proposals in an MCMC framework
Generative model: textured regions combined with face and text models
Conditional model: face and text detector using AdaBoost (Viola-Jones)
[Figure panels: proposals for text and faces; accepted proposals]
From Tu et al., 2003
12. Using proposals in an MCMC framework
Generative model: textured regions combined with face and text models
Conditional model: face and text detector using AdaBoost (Viola-Jones)
[Figure panels: proposals for text and faces; reconstructed image]
From Tu et al., 2003
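One standard way to use detector output inside MCMC is Metropolis-Hastings with an independence proposal q drawn from the detector. The sketch below is a toy on four discrete hypotheses, not Tu et al.'s implementation; the target p, the detector distribution q and all names are illustrative assumptions. Because the acceptance ratio corrects for q, the chain still samples the generative model's posterior even though q is biased.

```python
import math
import random

def mh_independence(log_target, propose, log_q, x0, n_steps, rng):
    """Metropolis-Hastings where each proposal x' ~ q ignores the current state,
    as a bottom-up detector would suggest it."""
    x = x0
    samples = []
    for _ in range(n_steps):
        x_new = propose(rng)
        # log of the acceptance ratio p(x') q(x) / (p(x) q(x'))
        log_a = (log_target(x_new) + log_q(x)) - (log_target(x) + log_q(x_new))
        if rng.random() < math.exp(min(0.0, log_a)):
            x = x_new  # accept the proposal
        samples.append(x)  # on rejection the current state is repeated
    return samples

p = [0.1, 0.2, 0.3, 0.4]  # toy posterior over 4 hypotheses
q = [0.1, 0.1, 0.4, 0.4]  # toy detector, biased towards hypotheses 2 and 3
samples = mh_independence(
    log_target=lambda s: math.log(p[s]),
    propose=lambda rng: rng.choices(range(4), weights=q)[0],
    log_q=lambda s: math.log(q[s]),
    x0=0, n_steps=20000, rng=random.Random(0),
)
print([round(samples.count(s) / len(samples), 2) for s in range(4)])
```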
13. Proposals in the flexible sprite model
14. Flexible sprite model
[Figure: graphical model node x, the set of images, e.g. frames from a video]
15. Flexible sprite model
[Figure: graphical model node x]
16. Flexible sprite model
[Figure: graphical model with nodes π and f (sprite shape and appearance) and x]
17. Flexible sprite model
[Figure: graphical model with nodes π, f, T (sprite transform for this image, discretised), m (transformed mask instance for this image) and x]
18. Flexible sprite model
[Figure: graphical model with nodes π, f, b (background), T, m and x]
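The nodes above can be read as a sampling recipe for one image. A minimal sketch, assuming a translation-only transform implemented as a cyclic shift, a uniform prior over shifts, a per-pixel Bernoulli mask with probabilities Tπ, and Gaussian pixel noise with a fixed sigma; these are illustrative assumptions, and `sample_image` is a hypothetical name.

```python
import numpy as np

def sample_image(pi, f, b, rng, sigma=0.05):
    """Ancestral sampling of one image x from a toy flexible sprite model."""
    h, w = pi.shape
    # T: a discrete translation drawn uniformly over all shifts.
    t = (int(rng.integers(h)), int(rng.integers(w)))

    def shift(a):
        # Cyclic shift is a simplifying assumption (no boundary handling).
        return np.roll(a, t, axis=(0, 1))

    # m: transformed mask instance, pixel i is "on" with probability (T pi)_i.
    m = rng.random((h, w)) < shift(pi)
    # x: transformed appearance where the mask is on, background b elsewhere, plus noise.
    x = np.where(m, shift(f), b) + sigma * rng.standard_normal((h, w))
    return x, m, t

rng = np.random.default_rng(0)
pi = np.zeros((8, 8))
pi[2:4, 2:5] = 1.0          # a hard 2x3 sprite shape
f = np.full((8, 8), 0.9)    # sprite appearance
b = np.zeros((8, 8))        # background
x, m, t = sample_image(pi, f, b, rng, sigma=0.0)
```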
19. Inference method problems
- Apply variational inference with a factorised Q distribution
- Slow, since the entire discrete transform space must be searched
- Limited size of transform space, e.g. translations only (160×120)
- Many local minima
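To see where the cost comes from, the sketch below computes the posterior over every translation T for fixed π, f and b, marginalising the mask pixel by pixel. This is exact inference over T rather than the factorised variational scheme, and the cyclic shifts, uniform prior on T and noise level sigma are illustrative assumptions. Every shift touches every pixel, so the cost is O((h·w)²), which is what makes a 160×120 translation space slow.

```python
import numpy as np

def transform_posterior(x, pi, f, b, sigma=0.1):
    """Exhaustive posterior over all cyclic translations T (uniform prior on T).

    p(x | T) = prod_i [ (T pi)_i N(x_i; (T f)_i) + (1 - (T pi)_i) N(x_i; b_i) ].
    Both Gaussians share sigma, so their normalising constant is dropped.
    """
    h, w = x.shape
    log_lik = np.empty((h, w))
    for dy in range(h):
        for dx in range(w):
            pi_t = np.roll(pi, (dy, dx), axis=(0, 1))
            f_t = np.roll(f, (dy, dx), axis=(0, 1))
            fg = pi_t * np.exp(-0.5 * ((x - f_t) / sigma) ** 2)
            bg = (1 - pi_t) * np.exp(-0.5 * ((x - b) / sigma) ** 2)
            log_lik[dy, dx] = np.sum(np.log(fg + bg + 1e-300))
    post = np.exp(log_lik - log_lik.max())
    return post / post.sum()

pi = np.zeros((6, 6))
pi[1:3, 1:3] = 1.0
f = np.full((6, 6), 0.9)
b = np.zeros((6, 6))
x = np.roll(pi * 0.9, (2, 3), axis=(0, 1))  # noise-free image, sprite shifted by (2, 3)
post = transform_posterior(x, pi, f, b)
print(np.unravel_index(post.argmax(), post.shape))
```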
20. Proposals in the flexible sprite model
- We wish to create a proposal R(T).
- Image features cannot be used directly until the object appearance has been found.
- Instead, use features of the inferred mask.
[Figure: graphical model with nodes π, T and m, and a proposal for T derived from m]
21. Moment-based features
- Use the first and second moments of the inferred mask as features, and learn a proposal distribution R(T).
[Figure: centre of gravity (C-of-G) of the mask, the true location, and a contour of the proposal distribution over object location]
R can also be used to obtain a probabilistic bound on T.
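A minimal sketch of a moment-based proposal over object location. The slides learn R(T) from the moments; here that learned mapping is replaced by a hypothetical fixed `scale` on the mask covariance, which is an assumption. For a 2-D Gaussian, the locations within Mahalanobis radius k carry 1 − exp(−k²/2) of the mass, giving a probabilistic bound on where to search.

```python
import numpy as np

def mask_moments(q_m):
    """First moment (centre of gravity) and second central moments of a soft mask."""
    h, w = q_m.shape
    ys, xs = np.mgrid[0:h, 0:w]
    total = q_m.sum()
    mean = np.array([(q_m * ys).sum(), (q_m * xs).sum()]) / total
    dy, dx = ys - mean[0], xs - mean[1]
    cov = np.array([[(q_m * dy * dy).sum(), (q_m * dy * dx).sum()],
                    [(q_m * dx * dy).sum(), (q_m * dx * dx).sum()]]) / total
    return mean, cov

def location_proposal(q_m, scale=0.5, k=3.0):
    """Gaussian proposal over object location and a k-sigma bounding region.

    `scale` stands in for a learned moments-to-width mapping (hypothetical).
    """
    mean, cov = mask_moments(q_m)
    cov_r = scale * cov + 1e-6 * np.eye(2)
    h, w = q_m.shape
    ys, xs = np.mgrid[0:h, 0:w]
    d = np.stack([ys - mean[0], xs - mean[1]], axis=-1)
    maha2 = np.einsum("...i,ij,...j->...", d, np.linalg.inv(cov_r), d)
    r = np.exp(-0.5 * maha2)
    r /= r.sum()
    bound = maha2 <= k ** 2                  # locations inside the bound
    return r, bound, 1 - np.exp(-k ** 2 / 2)  # mass of R inside the bound (2-D Gaussian)

q = np.zeros((32, 32))
q[10:15, 20:27] = 1.0  # toy inferred mask: 5 rows x 7 columns
r, bound, mass = location_proposal(q)
```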
22. Iteration 1
23. Iteration 2
24. Iteration 3
25. Iteration 4
26. Iteration 5
27. Iteration 6
28. Iteration 7
29. Results on scissors video
[Figure panels: original, reconstruction, foreground only]
- On average, 1% of the transform space is searched.
- Always converges, independent of initialisation.
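The saving comes from scoring only transforms inside the bound given by the proposal. A minimal sketch with illustrative numbers (not the reported results): a 160×120 translation space, a hypothetical per-transform `score` function, and a disk-shaped bound.

```python
import numpy as np

def bounded_search(score, bound):
    """Evaluate score(dy, dx) only at translations inside a boolean `bound`.

    Returns the best translation and the fraction of the space that was searched.
    """
    candidates = np.argwhere(bound)
    scores = [score(int(dy), int(dx)) for dy, dx in candidates]
    best = tuple(int(v) for v in candidates[int(np.argmax(scores))])
    return best, len(candidates) / bound.size

ys, xs = np.mgrid[0:120, 0:160]  # 160x120 translations (rows x columns = 120 x 160)
bound = (ys - 50) ** 2 + (xs - 60) ** 2 <= 25
best, frac = bounded_search(lambda dy, dx: -((dy - 50) ** 2 + (dx - 60) ** 2), bound)
print(best, f"{frac:.2%} of the transform space searched")
```

If the bound holds with high probability, the best transform is almost always among the candidates, so the small search rarely costs accuracy.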
30. Beyond translation
31. Extended transform space
[Figure panels: original, reconstruction]
32. Extended transform space
[Figure panels: original, reconstruction]
33. Extended transform space
[Figure panels: learned sprite appearance, normalised video]
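The slides do not say how proposals for rotation and scale were formed (the following slides use corner features). One illustrative option, assumed here and not taken from the source, extends the second-moment features: the principal axes of the inferred mask's covariance, compared with those of the learned sprite's mask, suggest a rotation (defined only modulo π) and a scale. Both function names are hypothetical.

```python
import numpy as np

def principal_axes(cov):
    """Major-axis angle (radians from the column axis, modulo pi) and the
    (major, minor) standard deviations of a 2x2 covariance in (row, col) order."""
    evals, evecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    major = evecs[:, 1]
    angle = np.arctan2(major[0], major[1])
    return angle, np.sqrt(evals[::-1])

def similarity_proposal(cov_image, cov_sprite):
    """Hypothetical rotation/scale proposal aligning the image mask with the sprite mask."""
    a_img, s_img = principal_axes(cov_image)
    a_spr, s_spr = principal_axes(cov_sprite)
    rotation = np.mod(a_img - a_spr, np.pi)       # axes have no sign, so only modulo pi
    scale = np.sqrt(s_img.prod() / s_spr.prod())  # geometric mean of the axis ratios
    return rotation, scale

cov_sprite = np.diag([1.0, 4.0])                # sprite mask elongated along the columns
cov_image = np.array([[10.0, 6.0], [6.0, 10.0]])  # same shape, rotated 45 degrees, doubled in size
rotation, scale = similarity_proposal(cov_image, cov_sprite)
```

The modulo-π ambiguity and the breakdown for nearly circular masks are reasons such moments would at best seed a proposal, with the generative model deciding whether to accept it.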
34. Corner features
[Figure panels: learned sprite appearance, masked normalised image]
35. Corner feature proposals
36. Preliminary results
37. Future directions
38. Extensions to the generative model
- Very wide range of possible extensions
- Local appearance model e.g. patch-based
- Multiple layered objects
- Object classes
- Illumination modelling
- Incorporation of object-specific models, e.g. faces
- Articulated models
39. Further investigation of using proposals
- Investigate other bottom-up features, including
- Optical flow
- Colour/texture
- Use of standard invariant features, e.g. SIFT
- Discriminative models for particular object classes, e.g. faces and text
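40. [Figure: complete graphical model of the flexible sprite model, with nodes π, f, b, T, m and x, and a plate over the N images]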