Title: Object Recognition and Scene Understanding: Looking for Effective Representations
1. Object Recognition and Scene Understanding: Looking for Effective Representations
Zhuowen Tu
Lab of Neuro Imaging, Department of Neurology; Department of Computer Science
University of California, Los Angeles
Supported by ONR N00014-09-1-0099, NSF 0844566, NIH U54-RR021813
2. Outline
- Auto-context (Tu 08; Jiang and Tu 09): supervised, mostly implicit representation
- Multiple Component Learning (Dollar et al. 08): weakly supervised, implicit + explicit representation
- Active Skeleton (Bai et al. 09): weakly supervised, explicit representation
3. Context
For object recognition, context comes both from within objects (parts) and from between objects (configurations).
4. Auto-Context: Motivation
[Figure: observed data X and label Y, related through the Bayesian approach]
5. Challenges
Modeling: it is often very hard to learn p(X|Y) and p(Y) for complex patterns.
Computing: finding the optimal solution that maximizes the posterior is not an easy task. A desirable algorithm should be both efficient and effective.
We are looking for the joint statistics of p(Y|X), i.e., context.
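In symbols (the standard Bayesian formulation, with X the observed image and Y the label map as above), the approach seeks the MAP estimate

```latex
\hat{Y} \;=\; \arg\max_{Y}\, p(Y \mid X) \;=\; \arg\max_{Y}\, p(X \mid Y)\, p(Y),
```

and both learning the factors p(X|Y), p(Y) and carrying out the maximization are hard for complex patterns; auto-context instead targets the posterior p(Y|X) directly.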
6. Problems with MRFs, BP, and CRFs
- Use a fixed topology with a limited number of neighborhood connections (context).
- Usually slow; it takes many steps for messages to propagate.
- Not guaranteed to find the globally optimal solution.
- Modeling and computing processes are separate (maybe an advantage in some situations).
7. Auto-Context
8. A Classification Approach
[Figure: training set]
9. Auto-Context
Features:
(1) Appearance features on X(N): 20,000 (gradients, Gabor, and Haar filters at different scales)
(2) Context (shape) features on P: 10,000, over a fairly large neighborhood
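As a toy sketch of this loop (not the authors' implementation: the decision stump, the 1-D "image", and the three-pixel context neighborhood here are stand-ins for the boosted classifier and large 2-D context windows described on these slides), each round trains a classifier on appearance features plus the previous round's probability map:

```python
def train_stump(feats, labels):
    """Exhaustively fit a decision stump: (feature index, threshold, polarity)."""
    best = None
    for j in range(len(feats[0])):
        for t in sorted({f[j] for f in feats}):
            for pol in (1, -1):
                err = sum((1 if pol * (f[j] - t) >= 0 else 0) != y
                          for f, y in zip(feats, labels))
                if best is None or err < best[0]:
                    best = (err, (j, t, pol))
    return best[1], best[0]

def stump_predict(model, f):
    j, t, pol = model
    return 1 if pol * (f[j] - t) >= 0 else 0

def auto_context(x, y, rounds=3):
    """Toy 1-D auto-context: features = [appearance, left, self, right context].

    Including each pixel's own previous probability lets a round reuse the
    previous map when no better stump exists.
    """
    n = len(x)
    probs = [0.5] * n                      # uninformative initial context map
    errors = []
    for _ in range(rounds):
        feats = [[x[i],
                  probs[i - 1] if i > 0 else 0.5,
                  probs[i],
                  probs[i + 1] if i < n - 1 else 0.5]
                 for i in range(n)]
        model, err = train_stump(feats, y)
        probs = [stump_predict(model, f) for f in feats]
        errors.append(err)
    return probs, errors
```

Each round records its training error; in this toy setting the per-round errors come out non-increasing, mirroring the monotone behavior claimed for the full algorithm.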
10. Results on Test Images
[Figure: input images with classification maps after step 1, step 3, and the final step]
11. Comparisons with Classification Methods
12. Comparisons on the Weizmann Dataset
13. Additional Experiments
14. Images from Google
15. Convergence of Auto-Context
Theorem: the auto-context algorithm monotonically decreases the training error.
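One way to see why such a theorem can hold (a sketch, assuming the hypothesis class is as described below, not the paper's exact proof): if the context features at round t include each pixel's own probability from round t-1, then the pool of candidate classifiers contains a "copy" classifier h_copy that simply reproduces the previous round's labels, so the empirical risk minimizer can do no worse:

```latex
\varepsilon_t \;=\; \min_{h \in \mathcal{H}} \widehat{\mathrm{err}}(h)
\;\le\; \widehat{\mathrm{err}}(h_{\mathrm{copy}}) \;=\; \varepsilon_{t-1}.
```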
16. Human Body Configuration
Training images from Google
17. Results
18. 24-class Object Labeling / Scene Parsing (MSR-Cambridge; Shotton et al., ECCV 2006)
19. Scene Parsing/Labeling
[Figure: example parses with per-region labels such as sky, building, tree, mountain, airplane, boat, water, grass, road, car, human]
20. Confusion Matrix
Average pixel accuracy: 77.7% (vs. 72.2% for Shotton et al., ECCV 2006)
21. Features Learned
22. More Extensions
- A new multi-class classifier, Data-assisted Output Code (speeds up training and testing)
- Scale-space approach (less scale sensitive)
- Region-based voting scheme (speeds up testing)
23. Scale-Space Patch
24. Region-based Voting Scheme
25. Element of a Disease-Specific Atlas (Thompson and Toga 2004)
26. Challenges for 3D Brain Segmentation
- Large volume size.
- Very weak intensity patterns.
- Hard to capture 3D shape information.
- Hard to capture high-level knowledge and adapt to different protocols.
27. Segmenting the Caudate
BWH and UNC data for caudate segmentation
28. Grand Challenge Competition
29. Medical Image Segmentation (Morra et al.)
30. Robust Whole-Brain Image Segmentation
31. Belief Propagation on MRFs/CRFs
[Figure: graphical model with hidden labels y1-y6, observations x1-x6, and messages m23, m34 passed between nodes]
32. Conclusions for Auto-Context
Advantages:
- Learns the low-level and context models in an integrated framework.
- Very easy to implement.
- Significantly faster (30 to 50 seconds) than MCMC and BP on MRFs or CRFs.
- General; avoids heavy algorithm design.
- Learning and computing use the same procedures.
- Can be applied in other domains.
33. Conclusions for Auto-Context
Disadvantages:
- Requires training for each different problem.
- Explicit high-level information is not included.
- Training time can be long (half a day to a week).
- Requires fully labeled data (fully supervised).
34. Outline
- Auto-context (Tu 08; Jiang and Tu 09): supervised, mostly implicit representation
- Multiple Component Learning (Dollar et al. 08): weakly supervised, implicit + explicit representation
- Active Skeleton (Bai et al. 09): weakly supervised, explicit representation
35. Objects as Collections of Parts: Previous Work

Algorithm                      | Parts                               | Discriminative/generative | Supervision
Viola 01                       | no parts (bottom-up features)       | discriminative            | supervised (objects)
Viola 05                       | no parts (bottom-up features)       | discriminative            | weakly supervised
Brunelli 93, Poggio 01         | parts, hand-designed                | discriminative            | supervised (parts)
Weber 00, Fergus 03, Feifei 03 | parts, learned from detected points | generative                | weakly supervised
Fischler 73, Huttenlocher 00   | parts, learned                      | generative                | weakly supervised
Amit 98, Agarwal 02, Vidal 03  | parts, learned from detected points | discriminative            | weakly supervised
MCL (our work)                 | parts, learned                      | discriminative            | weakly supervised
36. Objects as Collections of Parts: MCL
- Multiple Component Learning
- Learns a part-based classifier with weak supervision
- Object labels are provided, but no part labels
- Part classifiers are learned in an unsupervised manner
- Part classifiers are complex models (rather than Gaussian distributions or template matching)
- Run time is fast (no inference, since the model is discriminative)
37. Pedestrian Detection
[Figure: positive and negative training samples and results; each training sample yields a bag of all 25x25 patches]
38. Core Computation Routine
- Use weakly supervised learning to learn components
- Specifically, use algorithms developed for Multiple Instance Learning (MIL)
- MIL has a well-developed theory and practical algorithms
- In boosting terminology:
  - Use MIL to obtain weak classifiers (components)
  - Combine components into a strong classifier
39. Multiple Instance Learning (Dietterich 97)
- Training data are given in bags (weak supervision)
- If all instances in a bag are negative, the bag is negative
- A bag is positive if at least one instance in it is positive
- The goal is to learn an instance classifier f
- If an oracle gave the positive instance for each positive bag, f could be trained using standard supervised learning
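The bag semantics, and one simple way to act on them, can be made concrete. The sketch below (an alternating-style heuristic with 1-D instances and a threshold classifier; it is an illustration, not any specific algorithm from these slides) takes each positive bag's best-scoring instance as its presumed positive witness, while forcing every instance of every negative bag to be negative:

```python
def bag_label(instance_labels):
    """A bag is positive iff at least one instance in it is positive."""
    return int(any(instance_labels))

def mil_train_threshold(pos_bags, neg_bags):
    """Pick the threshold t minimizing error when each positive bag is
    represented by its best (largest) instance and every instance of a
    negative bag must fall below t."""
    witnesses = [max(bag) for bag in pos_bags]   # presumed positive instances
    negatives = [v for bag in neg_bags for v in bag]
    best_t, best_err = None, None
    for t in sorted(set(witnesses + negatives)):
        err = sum(w < t for w in witnesses) + sum(v >= t for v in negatives)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t

def classify_bag(t, bag):
    """Max rule: a bag is positive iff some instance scores positive."""
    return int(max(bag) >= t)
```

With a monotone scorer like a threshold, the max instance is the natural witness, which is why the max rule appears in both training and bag classification.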
40. MIL Example
- Object detection with weak supervision
- Positive bag: the image contains the object
- Goal: train a standard object detector
- Example positive bag (Viola 05)
41. MIL vs. MCL
[Figure: side-by-side comparison of MIL and MCL in terms of input data (bags), target label, and goal]
42. Standard vs. MIL vs. MCL
[Figure: given labels and target decision boundaries under standard, MIL, and MCL supervision]
43. MIL Results
[Figure: training positive bags and a training negative bag]
44. MIL Results
[Figure: test image]
45. Learning a Single Component
- The first formulation of MCL with k = 1 is equivalent to MIL: MCL (k = 1) ≡ MIL
- A reduction can also be shown for k > 1, but training becomes exponential in k
- Therefore, existing MIL algorithms provide a mechanism for learning single components
46. Learning Multiple Components
- Additive formulation
- Additive models are simple but powerful
- Prevalent in statistics, with a rich theory
- Boosting can be used to train the additive model
47. Learning Multiple Components
- General algorithm:
  - Use MIL to obtain weak classifiers (components)
  - Use boosting to combine components into a strong classifier
- AdaBoost for MCL
- RealBoost for MCL
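The additive combination can be sketched in a few lines. In this hypothetical reading (the component classifiers, the ±1 outputs, and the weights below are illustrative stand-ins, not the slides' actual learners), each component scores every patch of an image, its best response is kept, and the responses are summed with boosting-style weights:

```python
def component_response(component, patches):
    """Score of one component on an image: best response over all patches."""
    return max(component(p) for p in patches)

def mcl_score(components, alphas, patches):
    """Additive (boosted) combination of per-component best responses."""
    return sum(a * component_response(h, patches)
               for h, a in zip(components, alphas))
```

Because each component takes a max over patches, no part locations are needed at test time, which is the weakly supervised setting MCL targets.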
48. Object Detection (object vs. background)
[Figure: detection results on frontal faces and motorbikes (rigid) and on spotted cats (articulated)]
- Not optimized for absolute performance
49. Pedestrian Detection
- INRIA dataset (Dalal and Triggs 2005)
  - 1213 training positives (plus reflections)
  - O(2000) background training images
  - Test set about half as large
- Verification task: does a window contain a pedestrian?
- Challenging dataset, with much recent work
- Specialized version of MCL:
  - Optimized MIL training
  - Incorporates a spatial model
50. Latent SVM (Felzenszwalb, McAllester, and Ramanan, CVPR 2008)
51. Pedestrian Detection: A Benchmark (Dollar et al., CVPR 2009)
A more informative measure: per-image evaluation, on 100k images with 155k pedestrians.
We are still not there yet!
52. MCL Summary
Advantages:
- General notion of parts (components)
- Components are learned in an unsupervised manner
- Principled, general algorithm
- State-of-the-art results with simple features
Disadvantages:
- Requires a large amount of training data; the learning phase is complex
- Learns multiple classifiers, which makes testing slower
53. Outline
- Auto-context (Tu 08; Jiang and Tu 09): supervised, mostly implicit representation
- Multiple Component Learning (Dollar et al. 08): weakly supervised, implicit + explicit representation
- Active Skeleton (Bai et al. 09): weakly supervised, explicit representation
54. Active Skeleton
55. Representation
56. Detection
57. Some Results
58. Thank You! Questions?
59. Related Work
- M. Fink and P. Perona, "Mutual Boosting for Contextual Inference," NIPS, 2003.
- A. Torralba, K. P. Murphy, and W. T. Freeman, "Contextual Models for Object Detection Using Boosted Random Fields," NIPS, 2004.
- S. Avidan, "SpatialBoost: Adding Spatial Reasoning to AdaBoost," ECCV, 2006.
- D. Hoiem, A. Efros, and M. Hebert, "Putting Objects in Perspective," CVPR, 2006.
- S. Kumar and M. Hebert, "Discriminative Random Fields: A Discriminative Framework for Contextual Interaction in Classification," ICCV, 2003.
- J. Shotton, J. Winn, C. Rother, and A. Criminisi, "TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-Class Object Recognition and Segmentation," ECCV, 2006.
- A. Rabinovich, A. Vedaldi, C. Galleguillos, E. Wiewiora, and S. Belongie, "Objects in Context," ICCV, 2007.
61. Multiclass Classifier: DAOC
Code bits:
Class 1: 0 1 0
Class 2: 0 0 1
Class 3: 1 1 1
Class 4: 1 0 1
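The matrix above assigns each class a binary codeword, one bit per binary classifier. A generic way to use such a matrix at test time (standard error-correcting output-code decoding; DAOC's actual decoding rule may differ) is to run the bit classifiers and pick the class whose codeword is nearest in Hamming distance, so a single flipped bit can often still be decoded correctly:

```python
# Codewords from the slide's code-bit matrix.
CODES = {
    1: (0, 1, 0),
    2: (0, 0, 1),
    3: (1, 1, 1),
    4: (1, 0, 1),
}

def hamming(a, b):
    """Number of positions where two codewords disagree."""
    return sum(x != y for x, y in zip(a, b))

def decode(bits):
    """Pick the class whose codeword is nearest in Hamming distance."""
    return min(CODES, key=lambda c: hamming(CODES[c], bits))
```

For example, the observed bit string (1, 0, 0) is one flip away from class 4's codeword (1, 0, 1) and at least two flips from every other codeword, so it decodes to class 4.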