Title: LOCUS Learning Object Classes with Unsupervised Segmentation
1LOCUS (Learning Object Classes with Unsupervised
Segmentation): a variational approach to learning
model-based segmentation.
John Winn, Microsoft Research Cambridge, with
Nebojsa Jojic, MSR Redmond
7th July 2006
2Overview
- Learning object models
- The LOCUS model
- Experiments and results
- Extensions to LOCUS
3Goal
- Long Term Goal
- Recognise 10,000 object classes.
4Learning from buckets of images
Learning algorithm
Horse model
- Object Segmentation
- Object Recognition
- Object Detection
5Object segmentation
LOCUS
Horse model
6Related work
7Constellation models
- Weakly supervised
- Probabilistic framework
Object class recognition by unsupervised scale-invariant learning. R. Fergus, P. Perona, and A. Zisserman. CVPR 2003
A Bayesian approach to unsupervised One-Shot learning of Object categories. L. Fei-Fei, R. Fergus, and P. Perona. ICCV 2003
8Fragment-based
- Non-probabilistic
- No global shape model
Learning to segment. E. Borenstein and S. Ullman. ECCV 2004
Combining top-down and bottom-up segmentation. E. Borenstein, E. Sharon, and S. Ullman. CVPR 2004
9Codebook-based
- Probabilistic
- Dense model
- Supervised
- Ad-hoc inference
Combined object categorization and segmentation
with an implicit shape model. B. Leibe, A.
Leonardis, and B. Schiele. ECCV 04
10OBJ CUT
- Probabilistic
- Dense model
- Supervised
- Requires video
11LOCUS overview
- Weakly supervised learning: buckets of images, no annotation required.
- Probabilistic generative model of both object and background.
- Dense model: all pixels modelled, not just at interest points.
- Combines global and local cues: models global shape and local appearance and edges.
- Iterative inference process: simultaneous localisation, segmentation, pose estimation.
12The LOCUS model
13LOCUS model
Shared between images
Class shape π
Class edge sprite μo, σo
Deformation field D
Position and size T
Different for each image
Mask m
Edge image e
Object appearance λ1
Background appearance λ0
Image
14LOCUS model appearance
Mask m
Image z
15LOCUS model mask
background
object
8-neighbour Markov Random Field (as used in
GrabCut)
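
As a rough illustration of what an 8-neighbour Markov Random Field prior on the mask penalises, the sketch below counts label disagreements between neighbouring pixels. It assumes a uniform Potts penalty; GrabCut itself uses contrast-sensitive pairwise weights, and the exact form used in LOCUS is not given on this slide. The function name `potts_energy_8` and the `weight` parameter are hypothetical.

```python
import numpy as np

def potts_energy_8(mask, weight=1.0):
    """Sum a uniform penalty over every 8-neighbour pixel pair with different labels.

    Assumption: a plain Potts penalty, not the contrast-sensitive GrabCut term.
    """
    m = np.asarray(mask)
    # Four offsets visit every 8-neighbour pair exactly once.
    disagreements = (
        np.sum(m[:, :-1] != m[:, 1:])        # left/right pairs
        + np.sum(m[:-1, :] != m[1:, :])      # up/down pairs
        + np.sum(m[:-1, :-1] != m[1:, 1:])   # diagonal pairs
        + np.sum(m[:-1, 1:] != m[1:, :-1])   # anti-diagonal pairs
    )
    return weight * float(disagreements)

# A single object pixel surrounded by background disagrees with all 8 neighbours.
isolated = np.zeros((3, 3), dtype=int)
isolated[1, 1] = 1
isolated_energy = potts_energy_8(isolated)
```

Smooth masks score low under this energy and speckled masks score high, which is the behaviour an MRF prior on m is meant to encourage.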
16LOCUS model shape/position
17Iterative inference
Class shape π
Iteration 1
T4
T2
T3
18Iterative inference
Class shape π
Iteration 2
T4
T2
T3
19Iterative inference
Class shape π
Iteration 3
T4
T2
T3
20Iterative inference
Class shape π
Iteration 5
T4
T2
T3
21Iterative inference
Class shape π
Iteration 8
T4
T2
T3
22Iterative inference
Class shape π
Iteration 12
T4
T2
T3
23Non-rigid objects
Class shape π
Translation and scale are not enough.
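
To make the limitation concrete, the sketch below resamples a class-shape map under a translation and scale, and optionally adds per-pixel offsets standing in for a deformation field. It is an illustration under stated assumptions, not the model's actual parameterisation: isotropic scaling, nearest-neighbour sampling, and the names `warp_shape`, `scale`, `shift` and `deformation` are all hypothetical.

```python
import numpy as np

def warp_shape(shape_prob, scale, shift, deformation=None):
    """Resample a class-shape map under scale + translation, plus optional offsets.

    shape_prob  : (H, W) array of object probabilities (stand-in for the class shape).
    scale       : isotropic scale factor (assumed form of the size part of T).
    shift       : (dy, dx) translation in pixels (assumed form of the position part of T).
    deformation : optional (H, W, 2) per-pixel (dy, dx) offsets (stand-in for D).
    """
    h, w = shape_prob.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    if deformation is not None:
        # Local offsets let different parts of the object move independently.
        ys = ys - deformation[..., 0]
        xs = xs - deformation[..., 1]
    # Inverse mapping: for each output pixel, find its source pixel in the class shape.
    src_y = np.rint((ys - shift[0]) / scale).astype(int)
    src_x = np.rint((xs - shift[1]) / scale).astype(int)
    inside = (src_y >= 0) & (src_y < h) & (src_x >= 0) & (src_x < w)
    out = np.zeros((h, w), dtype=float)
    out[inside] = shape_prob[src_y[inside], src_x[inside]]
    return out

shape = np.zeros((5, 5))
shape[1, 1] = 1.0
moved = warp_shape(shape, scale=1.0, shift=(2, 1))
```

With `deformation=None` every pixel moves rigidly, so articulated poses such as a horse's legs cannot be matched; the per-pixel offsets are what add the missing flexibility.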
24LOCUS model pose
Class shape π
25LOCUS model pose
Class shape π
26LOCUS model edge
Original images
27LOCUS model overview
Shared between images
Class shape π
Class edge sprite μo, σo
Deformation field D
Position and size T
Different for each image
Mask m
Edge image e
Object appearance λ1
Background appearance λ0
Image
28Inference
- Aim to infer all latent variables:
- For each image: background appearance λ0, object appearance λ1, deformation D, transformation T, mask m.
- Class variables: shape π, edge sprite μo, σo.
- Bayesian inference is carried out using variational message passing with a fully factorised variational distribution.
- Optimisation of grid-structured variational free energy terms (relating to the deformation field D and the mask m) is achieved using graph cuts.
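
One way to write down the fully factorised approximation is sketched below. The symbols are assumptions made for this sketch: π for the class shape, μ_o and σ_o for the edge sprite, λ_0 and λ_1 for the background and object appearance, a superscript (i) for the image index over N images, z for the images and e for the edge images. Whether the factorisation goes further (for example per pixel of the mask) is not stated on the slide.

```latex
% Fully factorised variational distribution over class and per-image variables
q(\Theta) = q(\pi)\, q(\mu_o, \sigma_o)
  \prod_{i=1}^{N} q\big(\lambda_0^{(i)}\big)\, q\big(\lambda_1^{(i)}\big)\,
  q\big(D^{(i)}\big)\, q\big(T^{(i)}\big)\, q\big(m^{(i)}\big)

% Variational free energy minimised with respect to each factor in turn
F(q) = \mathbb{E}_{q}\big[\log q(\Theta)\big]
     - \mathbb{E}_{q}\big[\log p(z, e, \Theta)\big]
     \;\ge\; -\log p(z, e)
```

Each update holds all factors but one fixed; the terms of F that couple neighbouring grid sites, those involving D and m, are the ones the slide says are optimised with graph cuts.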
29Experiments and results
30Experiments
- LOCUS applied to 8 sets of 20 images each
containing objects of the same class.
- Horses
- Faces
- Cars (rear)
- Cars (side)
- Motorbikes
- Aeroplanes
- Cows
- Trees
For each class, we ran separate experiments for
color and texture appearance models.
31Results horses
32Results horses
33Results cars
34Results cars
35Results remaining classes
36Segmentation accuracy
To evaluate segmentation quantitatively, we used
hand segmentations for horses and cars (side).
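
The slide does not state which accuracy measure was used. A minimal sketch, assuming accuracy is the fraction of pixels whose inferred label agrees with the hand segmentation (the function name `pixel_accuracy` is hypothetical):

```python
import numpy as np

def pixel_accuracy(pred_mask, true_mask):
    """Fraction of pixels where the predicted mask matches the hand-labelled mask."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    if pred.shape != true.shape:
        raise ValueError("masks must have the same shape")
    return float(np.mean(pred == true))

# Toy example: one of four pixels is mislabelled.
predicted = [[1, 0], [1, 1]]
hand_labelled = [[1, 0], [0, 1]]
accuracy = pixel_accuracy(predicted, hand_labelled)
```

Other measures, such as intersection over union of the object region, would rank results differently when objects are small relative to the image.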
37Object registration
Transformation and deformation field together register
object outlines (and some internal edges).
38Object registration
39Extensions to LOCUS
40Recognition and segmentation
- Object recognition using only global shape
Overall 88% accuracy.
41Probabilistic Index Maps
2 indices
9 indices
Each image has its own palette of appearance models,
giving palette invariance.
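
The idea of palette invariance can be illustrated with a deterministic toy version: a shared index map assigns each pixel an index, and each image supplies its own palette that maps indices to appearance. This is only a sketch; the actual model is probabilistic, and the scalar palette entries and the name `render_from_palette` are assumptions made here.

```python
import numpy as np

def render_from_palette(index_map, palette_means):
    """Look up a per-image appearance value for every pixel via its index."""
    return np.asarray(palette_means)[np.asarray(index_map)]

# One shared index map with 2 indices, rendered with two different palettes.
index_map = np.array([[0, 0, 1],
                      [0, 1, 1]])
image_a = render_from_palette(index_map, [0.1, 0.9])
image_b = render_from_palette(index_map, [0.8, 0.3])
```

The two images look different pixel by pixel, but they share the same partition into regions, which is the structure the index map captures.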
42Probabilistic Index Maps
43Learning objects from video
Object shape
Object edge sprite
44Locumotion
- Add flow and track constraints to achieve motion
segmentation
Tracking/flow estimation by Larry Zitnick
45Conclusions
- LOCUS gives unsupervised segmentations of accuracy equivalent to state-of-the-art supervised methods.
- General-purpose model allows:
- Object localisation
- Pose estimation
- Object segmentation
- Motion segmentation/object tracking
- Object recognition/detection (in combination with
discriminative model)
46Questions?