Inference in generative models of images and video

1
Inference in generative models of images and
video

John Winn
MSR Cambridge
May 2004

2
Overview

Generative vs. conditional models
Combined approach
Inference in the flexible sprite model
Extending the model

3
Generative vs. conditional models
We have an image I and latent variables H which
we wish to infer, e.g. object position,
orientation, class. There will also be other
sources of variability, e.g. illumination,
parameterised by θ.
Generative model: P(H, θ, I)
Conditional model: P(H, θ | I) or P(H | I)
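
As a reminder of how the two kinds of model relate, writing θ for the nuisance parameters and assuming they are discrete (an integral replaces the sum otherwise), the conditional distributions follow from the joint by the product and sum rules:

```latex
P(H, \theta \mid I) = \frac{P(H, \theta, I)}{\sum_{H', \theta'} P(H', \theta', I)},
\qquad
P(H \mid I) = \sum_{\theta} P(H, \theta \mid I)
```

A conditional model parameterises the left-hand sides directly, whereas a generative model specifies the joint and obtains them by inference.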
4
Conditional models use features

Features are functions of I which aim to be
informative about H but invariant to θ.

Edge features
Corner features
Blob features
5
Conditional models

Using features f(I), train a conditional model
e.g. using labelled data

Example: Viola-Jones face detection using
rectangle features and AdaBoost
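
To illustrate what a rectangle feature is, here is a minimal sketch using an integral image (summed-area table). The function names are hypothetical, and this shows only a single two-rectangle feature, not the boosted detector itself.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero row and column prepended."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using four table look-ups."""
    bottom, right = top + height, left + width
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

def two_rect_feature(ii, top, left, height, width):
    """Left half minus right half of a (height x 2*width) window."""
    left_half = rect_sum(ii, top, left, height, width)
    right_half = rect_sum(ii, top, left + width, height, width)
    return left_half - right_half

img = np.arange(16, dtype=np.float64).reshape(4, 4)  # toy 4x4 "image"
ii = integral_image(img)
feature = two_rect_feature(ii, 0, 0, 4, 2)
```

Once the table is built, each feature costs a constant number of look-ups regardless of rectangle size, which is one reason inference with such features is fast.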
6
Conditional models

Advantages
Simple - only model variables of interest
Inference is fast - due to use of features and
simple model

Disadvantages
Non-robust
Difficult to compare different models
Difficult to combine different models

7
Generative models

A generative model defines a process of
generating the image pixels I from the latent
variables H and θ, giving a joint distribution
over all variables

P(H, θ, I)
Learning and inference carried out using standard
machine learning techniques e.g. Expectation
Maximisation, MCMC, variational methods. No
features!
8
Generative models

Example: image modeled as layers of flexible
sprites.

9
Generative models

Advantages
Accurate as the entire image is modeled
Can compare different models
Can combine different models
Can generate new images

Disadvantages
Inference is difficult due to local minima
Inference is slower due to complex model
Limitations on model complexity

10
Combined approach

Use a generative model, but speed up inference
using proposal distributions given by a
conditional model.

A proposal R(X) suggests a new distribution over
some of the latent variables X ⊆ {H, θ}. Inference
is extended to allow accepting or rejecting the
proposal, e.g. depending on whether it improves
the model evidence.
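
One way the accept/reject step could look is sketched below, assuming a single discrete latent variable and using the variational lower bound on the log evidence as the acceptance criterion. The function names and the toy numbers are hypothetical; the slides do not specify the exact test used.

```python
import numpy as np

def evidence_bound(q, log_joint):
    """Lower bound sum_X q(X) [log P(X, I) - log q(X)] on log P(I)."""
    nz = q > 0  # states with zero probability contribute nothing
    return float(np.sum(q[nz] * (log_joint[nz] - np.log(q[nz]))))

def consider_proposal(q, r, log_joint):
    """Return the proposal r if it gives a higher bound than q, else keep q."""
    if evidence_bound(r, log_joint) > evidence_bound(q, log_joint):
        return r
    return q

log_joint = np.log(np.array([0.01, 0.02, 0.60, 0.05]))  # toy log P(X, I)
q = np.array([0.70, 0.10, 0.10, 0.10])     # current belief, stuck on a poor state
good = np.array([0.05, 0.05, 0.85, 0.05])  # proposal concentrated on the best state
bad = np.array([0.05, 0.85, 0.05, 0.05])   # proposal concentrated on a poor state

after_good = consider_proposal(q, good, log_joint)  # accepted
after_bad = consider_proposal(q, bad, log_joint)    # rejected
```

Because a poor proposal is simply rejected, the conditional model only needs to be right often enough to be useful; it cannot make the generative model's bound worse.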
11
Using proposals in an MCMC framework
Generative model: textured regions combined with
face and text models
Conditional model: face and text detector using
AdaBoost (Viola-Jones)
Proposals for text and faces
Accepted proposals
From Tu et al., 2003
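
As a rough illustration of proposals inside MCMC, the sketch below runs independence Metropolis-Hastings on a toy discrete target, with a fixed proposal standing in for the output of a detector. It is not the algorithm of Tu et al.; the function name and all numbers are assumptions made for the example.

```python
import numpy as np

def mh_with_proposal(log_target, proposal, n_steps, rng):
    """Independence Metropolis-Hastings over a discrete state space."""
    states = np.arange(len(proposal))
    x = rng.choice(states, p=proposal)
    samples = []
    for _ in range(n_steps):
        y = rng.choice(states, p=proposal)  # candidate drawn from the proposal
        # Log acceptance ratio: target(y) proposal(x) / (target(x) proposal(y))
        log_accept = (log_target[y] - log_target[x]
                      + np.log(proposal[x]) - np.log(proposal[y]))
        if np.log(rng.random()) < log_accept:
            x = y
        samples.append(x)
    return np.array(samples)

target = np.array([0.05, 0.05, 0.80, 0.10])    # toy posterior under the generative model
proposal = np.array([0.10, 0.10, 0.60, 0.20])  # stand-in for a detector's output
rng = np.random.default_rng(0)
samples = mh_with_proposal(np.log(target), proposal, 20000, rng)
freq = np.bincount(samples, minlength=4) / len(samples)
```

The acceptance ratio corrects for the proposal's bias, so the chain still samples from the generative model's posterior; a well-matched proposal just gets there faster.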
12
Using proposals in an MCMC framework
Generative model: textured regions combined with
face and text models
Conditional model: face and text detector using
AdaBoost (Viola-Jones)
Proposals for text and faces
Reconstructed image
From Tu et al, 2003
13
Proposals in the flexible sprite model
14
Flexible sprite model
x: set of images, e.g. frames from a video
15
Flexible sprite model
x
16
Flexible sprite model
Graphical model nodes: p, f, x
p, f: sprite shape and appearance
17
Flexible sprite model
Graphical model nodes: p, f, T, m, x
T: sprite transform for this image (discretised)
m: transformed mask instance for this image
18
Flexible sprite model
Graphical model nodes: p, f, b, T, m, x
b: background
19
Inference method problems

Apply variational inference with factorised Q
distribution
Slow since we have to search entire discrete
transform space
Limited size of transform space, e.g. translations
only (160×120).
Many local minima.
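
A minimal sketch of why the exhaustive search is slow, assuming a far simpler model than the one in the slides: a single sprite with no mask or background, cyclic translations, and Gaussian pixel noise. The function name and parameters are hypothetical.

```python
import numpy as np

def log_lik_all_translations(image, sprite, noise_var=1.0):
    """Score every cyclic translation of `sprite` against `image`."""
    h, w = image.shape
    scores = np.empty((h, w))
    for dy in range(h):
        for dx in range(w):
            shifted = np.roll(sprite, (dy, dx), axis=(0, 1))
            # Gaussian log-likelihood up to an additive constant
            scores[dy, dx] = -0.5 * np.sum((image - shifted) ** 2) / noise_var
    return scores

rng = np.random.default_rng(0)
sprite = rng.random((12, 16))
image = np.roll(sprite, (3, 5), axis=(0, 1))  # sprite shifted by a known amount
scores = log_lik_all_translations(image, sprite)
best = np.unravel_index(np.argmax(scores), scores.shape)
```

Every translation requires a pass over all pixels, so at 160×120 there are 19,200 candidates per image per iteration, and adding rotation or scale multiplies that count further.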

20
Proposals in the flexible sprite model

We wish to create a proposal R(T).
Cannot use features of the image directly until
object appearance found.
Use features of the inferred mask.

Figure labels: p, proposal, T, m
21
Moment-based features

Use the first and second moments of the inferred
mask as features. Learn a proposal distribution
R(T).
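
The first and second moments of a mask can be computed as below. This is a sketch assuming the inferred mask is a 2-D array of per-pixel foreground probabilities; the function name is hypothetical, and how the moments are mapped to R(T) is learned and not shown here.

```python
import numpy as np

def mask_moments(mask):
    """Return the centre of gravity and covariance of a soft mask."""
    mask = np.asarray(mask, dtype=np.float64)
    ys, xs = np.mgrid[0:mask.shape[0], 0:mask.shape[1]]
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(np.float64)
    weights = mask.ravel() / mask.sum()     # normalise the mask to sum to one
    mean = weights @ coords                 # first moment: (row, column) centre
    centred = coords - mean
    cov = (centred * weights[:, None]).T @ centred  # second central moment
    return mean, cov

mask = np.zeros((10, 12))
mask[2:6, 3:9] = 1.0  # toy rectangular object
mean, cov = mask_moments(mask)
```

The mean gives a rough estimate of object position, while the covariance carries information about its extent and orientation.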

Figure labels: centre of gravity (C-of-G) of mask;
true location; contour of proposal distribution
over object location
Can also use R to get a probabilistic bound on T.
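
One possible reading of the bound is sketched below, assuming R is a Gaussian over object location and that translations outside a chosen Mahalanobis radius are simply not searched. The function name, the radius, and the example mean and covariance are all assumptions.

```python
import numpy as np

def candidate_translations(mean, cov, shape, max_mahalanobis=3.0):
    """Boolean grid marking translations inside the proposal's confidence region."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    diff = np.stack([ys - mean[0], xs - mean[1]], axis=-1)
    # Squared Mahalanobis distance of every grid point from the proposal mean
    d2 = np.einsum("...i,ij,...j->...", diff, np.linalg.inv(cov), diff)
    return d2 <= max_mahalanobis ** 2

keep = candidate_translations(mean=np.array([60.0, 80.0]),
                              cov=np.array([[9.0, 0.0], [0.0, 16.0]]),
                              shape=(120, 160))
fraction = keep.mean()  # share of the translation grid that would be searched
```

Only the translations marked in `keep` would then be evaluated by the variational update, which shrinks the search to a small part of the grid when the proposal is confident.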

22
Iteration 1
23
Iteration 2
24
Iteration 3
25
Iteration 4
26
Iteration 5
27
Iteration 6
28
Iteration 7
29
Results on scissors video.
Original
Reconstruction
Foreground only

On average, 1% of transform space searched.
Always converges, independent of initialisation.

30
Beyond translation
31
Extended transform space
Original
Reconstruction
32
Extended transform space
Original
Reconstruction
33
Extended transform space
Learned sprite appearance
Normalised video
34
Corner features
Learned sprite appearance
Masked normalised image
35
Corner feature proposals
36
Preliminary results
37
Future directions
38
Extensions to the generative model