Title: Object Recognition and Scene Understanding: Looking for Effective Representations
1. Object Recognition and Scene Understanding: Looking for Effective Representations
Zhuowen Tu
Lab of Neuro Imaging, Department of Neurology; Department of Computer Science
University of California, Los Angeles
Supported by ONR N00014-09-1-0099, NSF 0844566, NIH U54-RR021813
2. Outline
- Auto-context (Tu 08; Jiang and Tu 09): supervised, mostly implicit representation
- Multiple Component Learning (Dollar et al. 08): weakly supervised, implicit + explicit representation
- Active Skeleton (Bai et al. 09): weakly supervised, explicit representation
3. Context
For object recognition, context comes both from within objects (parts) and from between objects (configurations).
4. Auto-Context: Motivation
[Figure: observed data X and label Y, related through the Bayesian approach]
5. Challenges
Modeling: it is often very hard to learn p(X|Y) and p(Y) for complex patterns.
Computing: finding the optimal solution that maximizes the posterior is not an easy task. A desirable algorithm should be both efficient and effective.
We are looking for the joint statistics of p(Y|X), i.e., context.
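In symbols (the standard Bayesian formulation, with X the observed image and Y the label map as above), the approach seeks the MAP estimate

```latex
\hat{Y} \;=\; \arg\max_{Y}\, p(Y \mid X) \;=\; \arg\max_{Y}\, p(X \mid Y)\, p(Y),
```

and both learning the factors p(X|Y), p(Y) and carrying out the maximization are hard for complex patterns; auto-context instead targets the posterior p(Y|X) directly.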
6. Problems with MRFs, BP, and CRFs
- Use a fixed topology with a limited number of neighborhood connections (context).
- Usually slow; it takes many steps for messages to propagate.
- Not guaranteed to find the globally optimal solution.
- Modeling and computing processes are separate (maybe an advantage in some situations).
7. Auto-Context
8. A Classification Approach
[Figure: training set]
9. Auto-Context
Features:
(1) Appearance features on X(N): 20,000 (gradients, Gabor, and Haar filters at different scales)
(2) Context (shape) features on P: 10,000, over a fairly large neighborhood
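As a toy sketch of this loop (not the authors' implementation: the decision stump, the 1-D "image", and the three-pixel context neighborhood here are stand-ins for the boosted classifier and large 2-D context windows described on these slides), each round trains a classifier on appearance features plus the previous round's probability map:

```python
def train_stump(feats, labels):
    """Exhaustively fit a decision stump: (feature index, threshold, polarity)."""
    best = None
    for j in range(len(feats[0])):
        for t in sorted({f[j] for f in feats}):
            for pol in (1, -1):
                err = sum((1 if pol * (f[j] - t) >= 0 else 0) != y
                          for f, y in zip(feats, labels))
                if best is None or err < best[0]:
                    best = (err, (j, t, pol))
    return best[1], best[0]

def stump_predict(model, f):
    j, t, pol = model
    return 1 if pol * (f[j] - t) >= 0 else 0

def auto_context(x, y, rounds=3):
    """Toy 1-D auto-context: features = [appearance, left, self, right context].

    Including each pixel's own previous probability lets a round reuse the
    previous map when no better stump exists.
    """
    n = len(x)
    probs = [0.5] * n                      # uninformative initial context map
    errors = []
    for _ in range(rounds):
        feats = [[x[i],
                  probs[i - 1] if i > 0 else 0.5,
                  probs[i],
                  probs[i + 1] if i < n - 1 else 0.5]
                 for i in range(n)]
        model, err = train_stump(feats, y)
        probs = [stump_predict(model, f) for f in feats]
        errors.append(err)
    return probs, errors
```

Each round records its training error; in this toy setting the per-round errors come out non-increasing, mirroring the monotone behavior claimed for the full algorithm.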
10. Results on Test Images
[Figure: input images with classification maps after step 1, step 3, and the final step]
11. Comparisons with Classification Methods
12. Comparisons on the Weizmann Dataset
13. Additional Experiments
14. Images from Google
15. Convergence of Auto-Context
Theorem: the auto-context algorithm monotonically decreases the training error.
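One way to see why such a theorem can hold (a sketch, assuming the hypothesis class is as described below, not the paper's exact proof): if the context features at round t include each pixel's own probability from round t-1, then the pool of candidate classifiers contains a "copy" classifier h_copy that simply reproduces the previous round's labels, so the empirical risk minimizer can do no worse:

```latex
\varepsilon_t \;=\; \min_{h \in \mathcal{H}} \widehat{\mathrm{err}}(h)
\;\le\; \widehat{\mathrm{err}}(h_{\mathrm{copy}}) \;=\; \varepsilon_{t-1}.
```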
16. Human Body Configuration
Training images from Google
17. Results
18. 24-class Object Labeling / Scene Parsing (MSR-Cambridge; Shotton et al., ECCV 2006)
19. Scene Parsing/Labeling
[Figure: example parses with per-region labels such as sky, building, tree, mountain, airplane, boat, water, grass, road, car, human]
20. Confusion Matrix
Average pixel accuracy: 77.7% (vs. 72.2% for Shotton et al., ECCV 2006)
21. Features Learned
22. More Extensions
- A new multi-class classifier, Data-assisted Output Code (speeds up training and testing)
- Scale-space approach (less scale sensitive)
- Region-based voting scheme (speeds up testing)
23. Scale-Space Patch
24. Region-based Voting Scheme
25. Element of a Disease-Specific Atlas (Thompson and Toga 2004)
26. Challenges for 3D Brain Segmentation
- Large volume size.
- Very weak intensity patterns.
- Hard to capture 3D shape information.
- Hard to capture high-level knowledge and adapt to different protocols.
27. Segmenting the Caudate
BWH and UNC data for caudate segmentation
28. Grand Challenge Competition
29. Medical Image Segmentation (Morra et al.)
30. Robust Whole-Brain Image Segmentation
31. Belief Propagation on MRFs/CRFs
[Figure: graphical model with hidden labels y1-y6, observations x1-x6, and messages m23, m34 passed between nodes]
32. Conclusions for Auto-Context
Advantages:
- Learns the low-level and context models in an integrated framework.
- Very easy to implement.
- Significantly faster (30 to 50 seconds) than MCMC and BP on MRFs or CRFs.
- General; avoids heavy algorithm design.
- Learning and computing use the same procedures.
- Can be applied in other domains.
33. Conclusions for Auto-Context
Disadvantages:
- Requires training for each different problem.
- Explicit high-level information is not included.
- Training time can be long (half a day to a week).
- Requires fully labeled data (fully supervised).
34. Outline
- Auto-context (Tu 08; Jiang and Tu 09): supervised, mostly implicit representation
- Multiple Component Learning (Dollar et al. 08): weakly supervised, implicit + explicit representation
- Active Skeleton (Bai et al. 09): weakly supervised, explicit representation
35. Objects as Collections of Parts: Previous Work

Algorithm                      | Parts                               | Discriminative/generative | Supervision
Viola 01                       | no parts (bottom-up features)       | discriminative            | supervised (objects)
Viola 05                       | no parts (bottom-up features)       | discriminative            | weakly supervised
Brunelli 93, Poggio 01         | parts, hand-designed                | discriminative            | supervised (parts)
Weber 00, Fergus 03, Feifei 03 | parts, learned from detected points | generative                | weakly supervised
Fischler 73, Huttenlocher 00   | parts, learned                      | generative                | weakly supervised
Amit 98, Agarwal 02, Vidal 03  | parts, learned from detected points | discriminative            | weakly supervised
MCL (our work)                 | parts, learned                      | discriminative            | weakly supervised
36. Objects as Collections of Parts: MCL
- Multiple Component Learning
- Learns a part-based classifier with weak supervision
- Object labels are provided, but no part labels
- Part classifiers are learned in an unsupervised manner
- Part classifiers are complex models (rather than Gaussian distributions or template matching)
- Run time is fast (no inference, since the model is discriminative)
37. Pedestrian Detection
[Figure: positive and negative training samples and results; each training sample yields a bag of all 25x25 patches]
38. Core Computation Routine
- Use weakly supervised learning to learn components
- Specifically, use algorithms developed for Multiple Instance Learning (MIL)
- MIL has a well-developed theory and practical algorithms
- In boosting terminology:
  - Use MIL to obtain weak classifiers (components)
  - Combine components into a strong classifier
39. Multiple Instance Learning (Dietterich 97)
- Training data are given in bags (weak supervision)
- If all instances in a bag are negative, the bag is negative
- A bag is positive if at least one instance in it is positive
- The goal is to learn an instance classifier f
- If an oracle gave the positive instance for each positive bag, f could be trained using standard supervised learning
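The bag semantics, and one simple way to act on them, can be made concrete. The sketch below (an alternating-style heuristic with 1-D instances and a threshold classifier; it is an illustration, not any specific algorithm from these slides) takes each positive bag's best-scoring instance as its presumed positive witness, while forcing every instance of every negative bag to be negative:

```python
def bag_label(instance_labels):
    """A bag is positive iff at least one instance in it is positive."""
    return int(any(instance_labels))

def mil_train_threshold(pos_bags, neg_bags):
    """Pick the threshold t minimizing error when each positive bag is
    represented by its best (largest) instance and every instance of a
    negative bag must fall below t."""
    witnesses = [max(bag) for bag in pos_bags]   # presumed positive instances
    negatives = [v for bag in neg_bags for v in bag]
    best_t, best_err = None, None
    for t in sorted(set(witnesses + negatives)):
        err = sum(w < t for w in witnesses) + sum(v >= t for v in negatives)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t

def classify_bag(t, bag):
    """Max rule: a bag is positive iff some instance scores positive."""
    return int(max(bag) >= t)
```

With a monotone scorer like a threshold, the max instance is the natural witness, which is why the max rule appears in both training and bag classification.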
40. MIL Example
- Object detection with weak supervision
- Positive bag: the image contains the object
- Goal: train a standard object detector
- Example positive bag (Viola 05)
41. MIL vs. MCL
[Figure: side-by-side comparison of MIL and MCL in terms of input data (bags), target label, and goal]
42. Standard vs. MIL vs. MCL
[Figure: given labels and target decision boundaries under standard, MIL, and MCL supervision]
43. MIL Results
[Figure: training positive bags and a training negative bag]
44. MIL Results
[Figure: test image]
45. Learning a Single Component
- The first formulation of MCL with k = 1 is equivalent to MIL: MCL (k = 1) ≡ MIL
- A reduction can also be shown for k > 1, but training becomes exponential in k
- Therefore, existing MIL algorithms provide a mechanism for learning single components
46. Learning Multiple Components
- Additive formulation
- Additive models are simple but powerful
- Prevalent in statistics, with a rich theory
- Boosting can be used to train the additive model
47. Learning Multiple Components
- General algorithm:
  - Use MIL to obtain weak classifiers (components)
  - Use boosting to combine components into a strong classifier
- AdaBoost for MCL
- RealBoost for MCL
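The additive combination can be sketched in a few lines. In this hypothetical reading (the component classifiers, the ±1 outputs, and the weights below are illustrative stand-ins, not the slides' actual learners), each component scores every patch of an image, its best response is kept, and the responses are summed with boosting-style weights:

```python
def component_response(component, patches):
    """Score of one component on an image: best response over all patches."""
    return max(component(p) for p in patches)

def mcl_score(components, alphas, patches):
    """Additive (boosted) combination of per-component best responses."""
    return sum(a * component_response(h, patches)
               for h, a in zip(components, alphas))
```

Because each component takes a max over patches, no part locations are needed at test time, which is the weakly supervised setting MCL targets.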
48. Object Detection (object vs. background)
[Figure: detection results on frontal faces and motorbikes (rigid) and on spotted cats (articulated)]
- Not optimized for absolute performance
49. Pedestrian Detection
- INRIA dataset (Dalal and Triggs 2005)
  - 1213 training positives (plus reflections)
  - O(2000) background training images
  - Test set about half as large
- Verification task: does a window contain a pedestrian?
- Challenging dataset, with much recent work
- Specialized version of MCL:
  - Optimized MIL training
  - Incorporates a spatial model
50. Latent SVM (Felzenszwalb, McAllester, and Ramanan, CVPR 2008)
51. Pedestrian Detection: A Benchmark (Dollar et al., CVPR 2009)
A more informative measure: per-image evaluation, on 100k images with 155k pedestrians.
We are still not there yet!
52. MCL Summary
Advantages:
- General notion of parts (components)
- Components are learned in an unsupervised manner
- Principled, general algorithm
- State-of-the-art results with simple features
Disadvantages:
- Requires a large amount of training data; the learning phase is complex
- Learns multiple classifiers, which makes testing slower
53. Outline
- Auto-context (Tu 08; Jiang and Tu 09): supervised, mostly implicit representation
- Multiple Component Learning (Dollar et al. 08): weakly supervised, implicit + explicit representation
- Active Skeleton (Bai et al. 09): weakly supervised, explicit representation
54. Active Skeleton
55. Representation
56. Detection
57. Some Results
58. Thank You! Questions?
59. Related Work
- M. Fink and P. Perona, "Mutual Boosting for Contextual Inference," NIPS, 2003.
- A. Torralba, K. P. Murphy, and W. T. Freeman, "Contextual Models for Object Detection Using Boosted Random Fields," NIPS, 2004.
- S. Avidan, "SpatialBoost: Adding Spatial Reasoning to AdaBoost," ECCV, 2006.
- D. Hoiem, A. Efros, and M. Hebert, "Putting Objects in Perspective," CVPR, 2006.
- S. Kumar and M. Hebert, "Discriminative Random Fields: A Discriminative Framework for Contextual Interaction in Classification," ICCV, 2003.
- J. Shotton, J. Winn, C. Rother, and A. Criminisi, "TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-Class Object Recognition and Segmentation," ECCV, 2006.
- A. Rabinovich, A. Vedaldi, C. Galleguillos, E. Wiewiora, and S. Belongie, "Objects in Context," ICCV, 2007.
61. Multiclass Classifier: DAOC
Code bits:
Class 1: 0 1 0
Class 2: 0 0 1
Class 3: 1 1 1
Class 4: 1 0 1
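The matrix above assigns each class a binary codeword, one bit per binary classifier. A generic way to use such a matrix at test time (standard error-correcting output-code decoding; DAOC's actual decoding rule may differ) is to run the bit classifiers and pick the class whose codeword is nearest in Hamming distance, so a single flipped bit can often still be decoded correctly:

```python
# Codewords from the slide's code-bit matrix.
CODES = {
    1: (0, 1, 0),
    2: (0, 0, 1),
    3: (1, 1, 1),
    4: (1, 0, 1),
}

def hamming(a, b):
    """Number of positions where two codewords disagree."""
    return sum(x != y for x, y in zip(a, b))

def decode(bits):
    """Pick the class whose codeword is nearest in Hamming distance."""
    return min(CODES, key=lambda c: hamming(CODES[c], bits))
```

For example, the observed bit string (1, 0, 0) is one flip away from class 4's codeword (1, 0, 1) and at least two flips from every other codeword, so it decodes to class 4.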