Title: Street-level Scene Understanding
1Street-level Scene Understanding
- Paul Sturgess, Karteek Alahari, Pushmeet Kohli, Chris Russell, Lubor Ladický, Philip Torr.
2Motivation
- Abundance of street level imagery
- Google Street View
- Microsoft Live Search Maps
- Yotta DCL
- Aim: identify object classes automatically
- Build highway inventories
- Augment driving experience
3We would like a framework
- That can combine
- Deal with Things and stuff. (Ted Adelson)
- Efficient optimization.
- Leverage Video
- Large scale Learning
- Some work is under review, so we won't give all the details, but will sketch out the plan.
- Use of a global (CRF) energy
- A hierarchical representation (vehicle has sub-categories: car, van, etc.)
4Google Street View
54 Great Russell St
5Yotta
http://www.geospatialvision.com/
6Yotta
- The scenario is as follows:
- A van drives around the roads of the UK. In the van are GPS equipment and multiple calibrated cameras,
- synchronized to capture and store an image every two metres,
- giving a massive data set.
7CamVid (Brostow et al., ECCV 2008)
- 31 hand-labelled object classes plus void
Brostow et al 2009
8CamVid
- We use the top 11 classes plus void
9Labelled Ground Truth
10Sub-window Classification
- There is a car at x,y,size
- Classifies a sub-image as before
- Requires a separate model for each object-class
- Good results for rigid objects
Lampert, Blaschko, Hofmann (2008)
11Sub-window Classification
- But what about amorphous objects such as road and sky: stuff!!
- Not so good for amorphous objects
- Lots of pixels mislabelled in bounding box segment
- Lots of the image is left unclassified
Unclassified
Misclassified
12Segment → Classify
- Refines a sub-window segment
- Fewer misclassified pixels
- More unclassified pixels
- The model becomes prohibitively large as we add
more object-classes
Larlus, Verbeek and Jurie 2008
13Segment → Classify
- Need to classify each segment
- But what if the segments are wrong?
14Classify → Segment
- Every pixel in image is classified as one of a
set of object-class labels
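As a minimal sketch of per-pixel classification, assume each pixel already carries a score for every object class. The `scores` layout and the class names below are illustrative assumptions, not taken from the talk. The labelling then simply picks the best class at every pixel:

```python
# Hypothetical per-pixel class scores: scores[y][x] maps class name -> score.
def label_pixels(scores):
    """Assign each pixel the class with the highest score."""
    return [[max(px, key=px.get) for px in row] for row in scores]

scores = [
    [{"sky": 0.9, "road": 0.1}, {"sky": 0.6, "road": 0.4}],
    [{"sky": 0.2, "road": 0.8}, {"sky": 0.3, "road": 0.7}],
]
labels = label_pixels(scores)
```

Every pixel receives a label, so nothing is left unclassified, but each decision is made independently of its neighbours.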
15We would like a framework
- That can combine
- Super pixels-edges or segments
- Sliding window classifiers
- Use of a global (CRF) energy (so we know what we are optimizing)
- A hierarchical representation (vehicle has sub-categories: car, van, etc.)
- Efficient optimization.
- Some work is under review, so we won't give all the details, but will sketch out the plan.
16Sketch
- Lots of ongoing work
- Results not perfect
- Need more training data
- Snap shot/sketch ahead.
- Philosophy: link algorithms to problems (what can we solve now or soon?)
17Enforcing Label Consistency using Higher Order
Potentials
- CVPR08 VOC 08
- Joint work with Lubor Ladicky and Pushmeet Kohli
- Cambridge 2008
18Image labelling Problems
Assign a label to each image pixel
Object Segmentation
Image Denoising
Geometry Estimation
Sky
Building
Tree
Grass
19Segmentation Taster (VOC2008 data)
Competition "comp5" (train on VOC2008 data), accuracy (%). Entries in parentheses are synthesized from detection results.
- All pretty bad; ours slightly worse than some. Problem: training data.
- Big overhead to entry, lots of stuff to code; first year we entered.
20Object Segmentation using CRFs
(Shotton et al. ECCV 2006)
CRF Energy
Unary potentials based on Colour, Location and
Texture features
Encourages label consistency in adjacent pixels
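A minimal sketch of such an energy on a 4-connected grid may help. It assumes a contrast-sensitive Potts pairwise term of the form theta_p + theta_v * exp(-theta_beta * diff^2); the parameter names, default values and the grey-level `image` input are illustrative assumptions, not the talk's settings.

```python
import math

def pairwise_crf_energy(labels, unary, image,
                        theta_p=1.0, theta_v=4.0, theta_beta=0.5):
    """Energy of a labelling on a 4-connected grid.

    labels[y][x]   : label index at each pixel
    unary[y][x][l] : cost of giving pixel (y, x) label l
    image[y][x]    : grey-level intensity (assumed input for the contrast term)
    """
    h, w = len(labels), len(labels[0])
    energy = 0.0
    for y in range(h):
        for x in range(w):
            energy += unary[y][x][labels[y][x]]
            # Visit right and down neighbours so each edge is counted once.
            for ny, nx in ((y, x + 1), (y + 1, x)):
                if ny < h and nx < w and labels[y][x] != labels[ny][nx]:
                    diff = image[y][x] - image[ny][nx]
                    # Label changes are cheaper across strong image edges.
                    energy += theta_p + theta_v * math.exp(-theta_beta * diff * diff)
    return energy
```

Neighbouring pixels with different labels pay a penalty, and the penalty is largest where the image is locally uniform.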
21Limitations of Pairwise CRFs
- Encourages short boundaries (Shrinkage bias)
- Can only enforce label consistency in pairs of pixels
- Inability to incorporate region based features
Image
Unary Potential
MAP-CRF Solution
22Label Consistency in Image Regions
Image (MSRC)
Segmentation (Mean shift)
- All pixels constituting some regions belong to
- Same plane (Orientation)
- (Hoiem, Efros, Hebert, ICCV05)
- Same object
- (Russell, Efros, Sivic, Freeman, Zisserman, CVPR06)
23Image labelling using Segments
Object Labelling
Unsupervised Segmentation
Image
- Geometric Context
- Hoiem et al, ICCV05
- Object Segmentation
- He et al. ECCV06, Yang et al. CVPR07, Rabinovich et al. ICCV07, Batra et al. CVPR08
- Interactive Video Segmentation
- Wang, SIGGRAPH 2005
Not robust to Inconsistent Segments!
24Our Higher Order CRF Model
Encourages label consistency in regions
Multiple Segmentations
c
Comaniciu and Meer PAMI 2002; Shi and Malik PAMI 2000; Felzenszwalb and Huttenlocher IJCV 2004
25Higher Order Energy Functions
Unary
Pairwise
Higher order
- Efficient BP in Higher Order MRFs
- ECCV06 (Lan, Roth, Huttenlocher, Black)
- 2x2 cliques learned using FOE model
- Approximation methods to make BP feasible
- Search a restricted state space
- 16 minutes per iteration
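The three terms named on this slide (unary, pairwise, higher order) can be written as a single energy. The notation is an assumption for illustration: V is the set of pixels, E the set of neighbouring pixel pairs, S the set of segments (cliques), and x_c the labels of the pixels in segment c.

```latex
E(\mathbf{x}) = \sum_{i \in \mathcal{V}} \psi_i(x_i)
  + \sum_{(i,j) \in \mathcal{E}} \psi_{ij}(x_i, x_j)
  + \sum_{c \in \mathcal{S}} \psi_c(\mathbf{x}_c)
```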
26Label Consistency in Segments
- Encourages consistency within super-pixels
- Takes the form of a P^N Potts model
- Kohli et al. CVPR 2007
c
27Label Consistency in Segments
- Encourages consistency within super-pixels
- Takes the form of a P^N Potts model
- Kohli et al. CVPR 2007
c
Cost 0
28Label Consistency in Segments
- Encourages consistency within super-pixels
- Takes the form of a P^N Potts model
- Kohli et al. CVPR 2007
c
Cost f (c)
29Label Consistency in Segments
- Encourages consistency within super-pixels
- Takes the form of a P^N Potts model
- Kohli et al. CVPR 2007
Does not distinguish between good and bad segments!
c
Cost f (c)
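A minimal sketch of this potential: a fully consistent segment costs 0, and any inconsistency costs f(c). Here f is assumed to depend only on the segment size, |c| ** theta_alpha * theta_p; that functional form and both parameter names are illustrative assumptions.

```python
def pn_potts_cost(segment_labels, theta_p=1.0, theta_alpha=0.8):
    """Cost 0 if all pixels in the segment agree, otherwise f(c)."""
    if len(set(segment_labels)) <= 1:
        return 0.0
    # Assumed f(c): grows with the number of pixels in the segment.
    return (len(segment_labels) ** theta_alpha) * theta_p

consistent = pn_potts_cost(["grass"] * 4)
one_wrong = pn_potts_cost(["grass", "grass", "sheep", "grass"], 2.0, 1.0)
two_wrong = pn_potts_cost(["grass", "sheep", "sheep", "grass"], 2.0, 1.0)
```

One stray pixel costs exactly as much as two, and the cost ignores how trustworthy the segment is, which is the limitation noted on the slide.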
30Quality based Label Consistency
Label inconsistency cost depends on segment
quality
31Quality based Label Consistency
Label inconsistency cost depends on segment
quality
- How to measure quality G(c)?
- Ren and Malik ICCV03, Rabinovich et al. ICCV07, many others
- Colour and Texture Similarity
- Contour Energy
Measure quality from variance in feature responses
Higher order generalization of contrast-sensitive
pairwise potential
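One way to turn "variance in feature responses" into a quality score is sketched below. The exponential form of G(c), the way it scales the inconsistency cost, and every parameter name are assumptions for illustration; the features stand in for whatever colour or texture responses are available.

```python
import math

def segment_quality(features, theta_beta=1.0):
    """G(c) in (0, 1]: 1 for a uniform segment, smaller as variance grows.

    features: one feature vector (list of floats) per pixel in the segment.
    """
    n = len(features)
    dims = len(features[0])
    mean = [sum(f[d] for f in features) / n for d in range(dims)]
    # Per-dimension sum of squared deviations from the segment mean.
    sq_dev = [sum((f[d] - mean[d]) ** 2 for f in features) for d in range(dims)]
    norm = math.sqrt(sum(s * s for s in sq_dev))
    return math.exp(-theta_beta * norm / n)

def quality_weighted_cost(features, theta_p=0.2, theta_v=1.0,
                          theta_alpha=0.8, theta_beta=1.0):
    """Label inconsistency cost: higher for good (uniform) segments."""
    n = len(features)
    return (n ** theta_alpha) * (theta_p + theta_v * segment_quality(features, theta_beta))

uniform = [[0.5, 0.5]] * 4
mixed = [[0.0], [1.0], [0.0], [1.0]]
```

Breaking a uniform segment is expensive, while a segment that already straddles different textures can be split cheaply.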
32Quality based Label Consistency
Segment Quality (darker is better)
Mean shift segmentation
MSRC image
33Robust Consistency Potentials
Figure: cost (0 to γ_max) against inconsistent pixels (0 to 1) for the P^N Potts model.
Too rigid!
Kohli, Ladicky, Torr, CVPR 2008
Kohli, Kumar, Torr, CVPR 2007
34Robust Consistency Potentials
Add multiple potentials to generate arbitrary
concave increasing function.
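A minimal sketch of a robust potential and of the summation idea follows. The cost is assumed to rise linearly with the number of pixels that disagree with the dominant label and to saturate at gamma_max; `truncation` and the `pieces` list are hypothetical names introduced here.

```python
from collections import Counter

def robust_pn_cost(segment_labels, gamma_max, truncation):
    """Linear in the number of inconsistent pixels, capped at gamma_max.

    truncation: number of inconsistent pixels at which the cap is reached.
    """
    n_dominant = Counter(segment_labels).most_common(1)[0][1]
    n_inconsistent = len(segment_labels) - n_dominant
    return min(gamma_max, n_inconsistent * gamma_max / truncation)

def concave_cost(segment_labels, pieces):
    """Sum several robust potentials; pieces is a list of (gamma_max, truncation)."""
    return sum(robust_pn_cost(segment_labels, g, q) for g, q in pieces)

labels = ["road"] * 8 + ["car"] * 2
single = robust_pn_cost(labels, 4.0, 4)
combined = concave_cost(labels, [(4.0, 4), (2.0, 1)])
```

Each piece is a truncated linear function, so their sum is a piecewise-linear, concave, non-decreasing function of the number of inconsistent pixels.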
35Higher Order Cliques
- A way of assigning a cost if all the pixels in a clique take a particular value.
- Cliques can come
- from detectors
- Features
- Segments/super pixels
- Optimizer sorts it out (CRF energy).
36Minimizing Higher order Energy Functions
- Message passing is computationally expensive
- High runtime and space complexity: O(L^N)
- L = number of labels, N = size of clique
- Efficient BP for Higher Order MRFs
- Lan et al. ECCV 06, Potetz CVPR 2007
- 2x2 clique potentials for Image Denoising
- Take minutes per iteration (Hours to converge)
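To get a feel for the L^N growth, here is a small worked calculation. The label count of 21 and the clique sizes are example values chosen for illustration, not figures from the cited papers.

```python
def num_clique_labellings(num_labels, clique_size):
    """Number of joint labellings of one clique: L ** N."""
    return num_labels ** clique_size

small = num_clique_labellings(21, 4)    # a 2x2 clique with 21 labels
large = num_clique_labellings(21, 100)  # a 100-pixel segment with 21 labels
```

A 2x2 clique already has 194481 joint labellings, and a 100-pixel segment has more than 10^132, so enumerating clique states is not an option for segment-sized cliques.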
37Graph Cuts for Minimizing Higher order Energy
Functions (Our Approach)
- Binary label problems can be solved exactly
- Can handle very high order energy functions
- Extremely efficient: computation time in the order of seconds
- Graph Cut based move making algorithm for Multilabel Functions
- Primal Dual methods indicate how accurate we are (duality gap).
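The move-making idea can be sketched on a toy problem. This is not the graph-cut solver: the best binary move is found by brute force, which is only feasible for a handful of pixels, and the chain-shaped energy with a plain Potts term is an assumption chosen to keep the example short.

```python
from itertools import product

def energy(labels, unary, smooth):
    """Chain energy: unary costs plus a Potts penalty between neighbours."""
    e = sum(unary[i][l] for i, l in enumerate(labels))
    e += sum(smooth for a, b in zip(labels, labels[1:]) if a != b)
    return e

def best_expansion_move(labels, alpha, unary, smooth):
    """Try every binary move vector t; t_i = 1 switches pixel i to alpha."""
    best = list(labels)
    for t in product((0, 1), repeat=len(labels)):
        cand = [alpha if ti else l for ti, l in zip(t, labels)]
        if energy(cand, unary, smooth) < energy(best, unary, smooth):
            best = cand
    return best

def alpha_expansion(labels, num_labels, unary, smooth):
    """Cycle over labels until no expansion move lowers the energy."""
    labels = list(labels)
    improved = True
    while improved:
        improved = False
        for alpha in range(num_labels):
            cand = best_expansion_move(labels, alpha, unary, smooth)
            if energy(cand, unary, smooth) < energy(labels, unary, smooth):
                labels, improved = cand, True
    return labels

unary = [[0, 3, 3], [2, 1, 3], [3, 0, 3], [3, 3, 0]]  # 4 pixels, 3 labels
result = alpha_expansion([0, 0, 0, 0], 3, unary, smooth=1.5)
```

Each move is a binary problem (keep the current label or switch to alpha), which is the part a graph cut can solve exactly and quickly for suitable energies.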
38Solving the P^N Potts Model
- Computing the optimal expansion move
Source
Ms
v1
v2
vn
Mt
Sink
39Solving the P^N Potts Model
- Computing the optimal expansion move
Source
Ms
v1
v2
vn
Case 1: all t_i = 0 (x_i = x_i, every pixel keeps its current label)
Mt
Cost
Sink
40Solving the P^N Potts Model
- Computing the optimal expansion move
Source
Ms
v1
v2
vn
Case 2: all t_i = 1 (x_i = α)
Mt
Cost
Sink
41Solving the P^N Potts Model
- Computing the optimal expansion move
Source
Ms
v1
v2
vn
Case 3: t_i ∈ {0, 1} (x_i ∈ {x_i, α})
Mt
Cost
Sink
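The three cases can be written out as the cost a single clique contributes for a given move. This sketch only evaluates the cost; it does not build the source and sink graph. The `gamma` dictionary (cost when the whole clique takes one label) and `gamma_max` (cost of an inconsistent clique) are assumed names.

```python
def pn_potts_move_cost(current, t, alpha, gamma, gamma_max):
    """Cost of one clique after an expansion move on label alpha.

    current   : current labels of the clique's pixels
    t         : binary move variables; t[i] = 1 switches pixel i to alpha
    gamma     : dict label -> cost when every pixel takes that label
    gamma_max : cost when the clique ends up inconsistent
    """
    new = [alpha if ti else xi for ti, xi in zip(t, current)]
    if len(set(new)) == 1:
        return gamma[new[0]]   # all pixels agree after the move
    return gamma_max           # mixed labels after the move

gamma = {"road": 0.0, "car": 0.5}
current = ["road", "road", "car"]
keep_all = pn_potts_move_cost(current, (0, 0, 0), "car", gamma, 3.0)    # case 1
switch_all = pn_potts_move_cost(current, (1, 1, 1), "car", gamma, 3.0)  # case 2
mixed = pn_potts_move_cost(current, (1, 0, 0), "car", gamma, 3.0)       # case 3
```

In case 1 the cost depends on whether the current labels already agree; here they do not, so keeping everything still pays gamma_max.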
42Source
Sink
43Source
Source Clique nodes
Pixel nodes
Sink Clique nodes
Sink
44Source
Sink
45Opens up to Hierarchies
Same sort of idea as deep belief nets but
tractable inference
46Overview of our Method
Higher Order Energy
Unary Potentials Shotton et al. ECCV 2006
Energy Minimization
Contrast Sensitive Pairwise Potentials
Segmentation Solution
Higher Order Potentials (Multiple Segmentations)
47Experimental results
Datasets: MSRC (21), Sowerby (7)
Shotton et al. ECCV 2006
He et al. CVPR 04
48Qualitative Results
Image (MSRC-21)
Pairwise CRF
Higher order CRF
Ground Truth
Grass
Sheep
49Qualitative Results (Contd..)
Image (MSRC-21)
Pairwise CRF
Higher order CRF
Ground Truth
50Qualitative Results (Contd..)
Image (MSRC-21)
Pairwise CRF
Higher order CRF
Ground Truth
Results can be improved using image specific
colour models
Rother et al. SIGGRAPH 2004; Shotton et al. ECCV 2006
51Quantitative Results Problems
Rough ground truth segmentations
Fine structures have small influence on overall
pixel accuracy
52Generating Accurate Segmentations
- Generated accurate segmentation of 27 images
- 30 minutes per image
Image (MSRC-21)
Original Segmentation
New Segmentation
53Relationship between Qualitative and Quantitative
Results
Pairwise CRF
Higher order CRF
Ground Truth
Image (MSRC-21)
Overall pixel accuracy (%): 95.8, 98.7
Small changes in pixel accuracy can lead to large
improvements in segmentation results.
54Quantitative Accuracy
- Measure accuracy in labelling boundary pixels.
- Accuracy evaluated in boundary bands of variable
width
Hand-labelled Segmentation
Trimap (8-pixels)
Trimap (16-pixels)
Image (MSRC-21)
55Quantitative Accuracy
- Measure accuracy in labelling boundary pixels.
- Accuracy evaluated in boundary bands of variable
width
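A minimal sketch of boundary-band evaluation: collect the pixels that lie within a given distance of a ground-truth label change, then measure accuracy only there. The band is built with a square (Chebyshev) neighbourhood, which is an assumption; the function names are hypothetical.

```python
def boundary_band(gt, width):
    """Pixels within `width` (Chebyshev distance) of a ground-truth label change."""
    h, w = len(gt), len(gt[0])
    boundary = set()
    for y in range(h):
        for x in range(w):
            for ny, nx in ((y, x + 1), (y + 1, x)):
                if ny < h and nx < w and gt[y][x] != gt[ny][nx]:
                    boundary.update({(y, x), (ny, nx)})
    band = set()
    for by, bx in boundary:
        for y in range(max(0, by - width), min(h, by + width + 1)):
            for x in range(max(0, bx - width), min(w, bx + width + 1)):
                band.add((y, x))
    return band

def band_accuracy(pred, gt, width):
    """Fraction of correctly labelled pixels inside the band (None if no boundary)."""
    band = boundary_band(gt, width)
    if not band:
        return None
    correct = sum(pred[y][x] == gt[y][x] for y, x in band)
    return correct / len(band)

gt = [[0, 0, 0, 1, 1, 1]]
pred = [[0, 0, 1, 1, 1, 1]]
```

On this toy row the overall accuracy is 5/6, but the accuracy in the narrowest band is only 0.5, which shows how a band-based measure exposes boundary errors that overall pixel accuracy hides.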
56Generating Multiple Segmentations
- Sampling likely segmentations (Tu and Zhu, PAMI 2002)
- Segmentations at multiple scales (Sharon et al., CVPR 2001)
- Generate multiple segmentations by varying the parameters of segmentation algorithms (Russell et al., CVPR 2006)
- Unsupervised segmentation algorithms (Comaniciu and Meer, PAMI 2002; Shi and Malik, PAMI 2000; Felzenszwalb and Huttenlocher, IJCV 2004)
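The idea of varying a segmentation parameter can be shown with a toy stand-in. The 1D threshold segmenter below is not mean shift, normalized cuts or any of the cited algorithms; it is a hypothetical simplification whose single parameter plays the role of their bandwidth or scale settings.

```python
def segment_1d(signal, threshold):
    """Split a 1D signal wherever neighbouring values differ by more than threshold."""
    segments, current = [], [0]
    for i in range(1, len(signal)):
        if abs(signal[i] - signal[i - 1]) > threshold:
            segments.append(current)
            current = []
        current.append(i)
    segments.append(current)
    return segments

def multiple_segmentations(signal, thresholds):
    """One segmentation per parameter value, from fine to coarse."""
    return {t: segment_1d(signal, t) for t in thresholds}

signal = [0, 1, 5, 6, 20, 21]
segs = multiple_segmentations(signal, [0.5, 2, 10])
```

Each segmentation contributes its own set of cliques to the higher order term, so no single parameter setting has to be right everywhere.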
57Qualitative Results (Contd..)
Image (MSRC-21)
Pairwise CRF
Higher order CRF
Ground Truth
58Results
Figure: VOC2008 images alongside their results. Labels shown: background, person, aeroplane, dining table, horse, car, bird, train.
59Extension to Video
- Tracking features
- Use of 3D
- Space time super pixels (on volume)
60Cues from Point-clouds, Brostow et al
Brostow et al (2008)
61From 3D to 2D
Brostow et al (2008)
62Point-clouds for object-class segmentation
- Cues
- SfM
- Texton
- Learning/inference
- Energy involves unary and pairwise terms
- Pixels
- Superpixels
- Sets of super pixels
63Higher Order Potential
- Single Segmentation?
- Combine multiple segmentations
64Unary
pairwise
higher order
G-Truth
Raw
65Result
Unary Pairwise Higher Order
G-Truth
Raw
66Results Summary
- Note: some classes that are not superpixel-friendly do worse, e.g. pole, sign.
- Combine detectors (things) with this.
- Detectors fit very naturally into this framework.
67Results for all test frames
Raw Image
Ground Truth
Unary Pairwise
Unary Pairwise Higher Order
68Conclusion
- Big problem: CamVid
- only 700 images labelled,
- half used to test half to train.
- Training Data
- how to get much more
- Internet games, ESP, LabelMe, etc.
- Unsupervised training? Semi Supervised?
- Once we have the data
- how to do inference?
- large Scale learning?