Object Recognition and Scene Understanding: Looking for Effective Representations

Transcript and Presenter's Notes



1
Object Recognition and Scene Understanding
Looking for Effective Representations
Zhuowen Tu
Lab of Neuro Imaging, Department of Neurology, and
Department of Computer Science,
University of California, Los Angeles
Supported by ONR N00014-09-1-0099, NSF 0844566, NIH U54-RR021813
2
Outline
  1. Auto-context (Tu 08, Jiang and Tu 09;
    supervised, mostly implicit representation)
  2. Multiple Component Learning (Dollar et al. 08;
    weakly supervised, implicit + explicit
    representation)
  3. Active Skeleton (Bai et al. 09; weakly
    supervised, explicit representation)

3
Context
For object recognition, context comes both from
within an object (its parts) and from between
objects (their configurations).
4
Auto-Context: Motivation
Data (observation): X. Label: Y.
Bayesian approach: maximize the posterior p(Y|X) ∝ p(X|Y) p(Y).
5
Challenges
Modeling: it is often very hard to learn p(X|Y)
and p(Y) for complex patterns.
Computing: finding the optimal solution that
maximizes the posterior is not an easy task. A
desirable algorithm should be both efficient and
effective.
We are looking for the joint statistics in
p(Y|X), i.e., context.
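
Written out, the formulation behind these challenges is the standard MAP labeling problem; the auto-context reformulation below follows the idea described in these slides (targeting the posterior marginals directly), with notation that is a paraphrase rather than a quote from the paper:

```latex
% Standard MAP labeling: model the likelihood p(X|Y) and prior p(Y),
% then search for the label map Y maximizing the posterior.
Y^{*} = \arg\max_{Y} p(Y \mid X) = \arg\max_{Y} p(X \mid Y)\, p(Y)

% Auto-context instead learns the posterior marginals discriminatively,
% feeding the previous iteration's probability map P^{(t-1)} back in as
% context features for each pixel i and its neighborhood N_i:
p^{(t)}(y_i \mid X) \;=\; q\big( y_i \;\big|\; X(N_i),\, P^{(t-1)}(N_i) \big)
```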
6
Problems with MRFs, BP, and CRFs
  • Use a fixed topology with a limited number of
    neighborhood connections (context).
  • Usually slow; it takes many steps for messages
    to propagate.
  • Not guaranteed to find the globally optimal
    solution.
  • Modeling and computing processes are separate
    (maybe an advantage in some situations).

7
Auto-Context
8
A Classification Approach
Training Set
9
Auto-Context
Features: (1) 20,000 appearance features on the
image patch X(N): gradients, Gabor, and Haar
responses at different scales; (2) 10,000 context
(shape) features on the probability map P, taken
from a fairly large neighborhood.
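
A minimal sketch of the auto-context training loop described above, assuming per-pixel appearance features are already extracted; GradientBoostingClassifier stands in for whichever boosting classifier the authors use, and the context features here are just each pixel's own class probabilities (the paper samples them from a large neighborhood):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier  # stand-in classifier

def auto_context_train(X_app, y, n_classes, n_iters=3):
    """X_app: (n_pixels, d) appearance features (gradients, Gabor, Haar, ...).
    y: (n_pixels,) labels; assumes every class appears at least once in y."""
    # Iteration 0: uniform probability map, i.e. no informative context yet.
    probs = np.full((X_app.shape[0], n_classes), 1.0 / n_classes)
    classifiers = []
    for _ in range(n_iters):
        X_t = np.hstack([X_app, probs])      # appearance + context features
        clf = GradientBoostingClassifier().fit(X_t, y)
        probs = clf.predict_proba(X_t)       # updated probability (context) map
        classifiers.append(clf)
    return classifiers

def auto_context_predict(classifiers, X_app, n_classes):
    probs = np.full((X_app.shape[0], n_classes), 1.0 / n_classes)
    for clf in classifiers:
        probs = clf.predict_proba(np.hstack([X_app, probs]))
    return probs  # final per-pixel class probabilities
```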
10
Results on Test Images
(Result images for two examples: the input, the probability maps after step 1 and step 3, and the final result.)
11
Comparisons with Classification Methods
12
Comparisons on the Weizmann Dataset
13
Additional Experiments
14
Images from Google
15
Convergence of Auto-Context
Theorem: The auto-context algorithm monotonically
decreases the training error. (Intuitively, each
round's classifier sees the previous round's
probability map as a feature, so it can at least
reproduce the previous output; retraining can
therefore only keep or reduce the training error.)
16
Human Body Configuration
Training images from Google
17
Results
18
24-class Object Labeling/Scene Parsing
(MSR-Cambridge, Shotton et al. ECCV 2006)
19
Scene Parsing/Labeling
(Example parsing results with regions labeled sky, building, tree, mountain, airplane, boat, water, grass, road, car, and human.)
20
Confusion Matrix
Average pixel accuracy: 77.7% (Shotton et al., ECCV 2006: 72.2%)
21
Features Learned
22
More Extensions
  1. A new multi-class classifier, Data-Assisted
    Output Code (DAOC), to speed up training and testing
  2. A scale-space approach (less scale sensitive)
  3. A region-based voting scheme (to speed up testing)

23
Scale Space Patch
24
Region-based Voting Scheme
25
Element of a disease-specific atlas (Thompson and
Toga 2004)
26
Challenges for 3D Brain Segmentation
  1. Large volume size.
  2. Very weak intensity patterns.
  3. Hard to capture 3D shape info.
  4. Hard to capture the high-level knowledge and
    adapt to different protocols.

27
Segmenting Caudate
BWH and UNC data for caudate segmentation
28
Grand Challenge Competition
29
Medical Image Segmentation (Morra et al.)
30
Robust Whole Brain Image Segmentation
31
Belief Propagation on MRFs/CRFs
(Diagram: a pairwise graphical model with hidden label nodes y1-y6 and observation nodes x1-x6; messages such as m23 and m34 are passed between neighboring label nodes.)
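
For reference, the message update that BP iterates on such a pairwise model (standard textbook form, not taken from the slides) is:

```latex
% Sum-product message from label node y_i to neighbor y_j,
% with unary potential \phi_i and pairwise potential \psi_{ij}:
m_{i \to j}(y_j) = \sum_{y_i} \phi_i(x_i, y_i)\, \psi_{ij}(y_i, y_j)
                   \prod_{k \in \mathcal{N}(i) \setminus \{j\}} m_{k \to i}(y_i)

% Belief (approximate posterior marginal) at node i:
b_i(y_i) \propto \phi_i(x_i, y_i) \prod_{k \in \mathcal{N}(i)} m_{k \to i}(y_i)
```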
32
Conclusions for Auto-Context
Advantages
  • Learns the low-level and context models in an
    integrated framework.
  • Very easy to implement.
  • Significantly faster than MCMC and BP (30-50
    seconds) on MRFs or CRFs.
  • General, and avoids heavy algorithm design.
  • Learning and computing use the same procedures.
  • Can be applied in other domains.

33
Conclusions for Auto-Context
Disadvantages
  • Requires training for different problems.
  • Explicit high-level information is not included.
  • Training time can be long (half a day to a week).
  • Requires fully labeled data (fully supervised).

34
Outline
  1. Auto-context (Tu 08, Jiang and Tu 09;
    supervised, mostly implicit representation)
  2. Multiple Component Learning (Dollar et al. 08;
    weakly supervised, implicit + explicit
    representation)
  3. Active Skeleton (Bai et al. 09; weakly
    supervised, explicit representation)

35
Objects as Collections of Parts: Previous Work

Algorithm | Parts vs. no parts | Discriminative vs. generative | Level of supervision
Viola 01 | no parts (bottom-up features) | discriminative | supervised (objects)
Viola 05 | no parts (bottom-up features) | discriminative | weakly supervised
Brunelli 93, Poggio 01 | parts (hand designed) | discriminative | supervised (parts)
Weber 00, Fergus 03, Feifei 03 | parts (learned from detected points) | generative | weakly supervised
Fischler 73, Huttenlocher 00 | parts (learned) | generative | weakly supervised
Amit 98, Agarwal 02, Vidal 03 | parts (learned from detected points) | discriminative | weakly supervised
MCL (our work) | parts (learned) | discriminative | weakly supervised
36
Objects as Collections of Parts: MCL
  • Multiple Component Learning
  • Learn a part-based classifier with weak supervision
  • Object labels are provided, but no part labels
  • Part classifiers are learned in an unsupervised manner
  • Part classifiers are complex models (rather than
    Gaussian distributions or template matching)
  • Run time is fast (no inference, since the model is
    discriminative)

37
Pedestrian Detection
(Figure: positive and negative training samples and detection results; each training sample is a bag of all 25x25 patches.)
38
Core Computation Routine
  • Use weakly supervised learning to learn
    components
  • Specifically, use algorithms developed for
    Multiple Instance Learning (MIL)
  • MIL has a well-developed theory and practical
    MIL algorithms
  • In boosting terminology:
  • Use MIL to obtain weak classifiers (components)
  • Combine the components into a strong classifier

39
Multiple Instance Learning
Dietterich 97
  • Training data is given in bags (weak supervision)
  • If all instances in a bag are negative, the bag is
    negative
  • A bag is positive if at least one instance in the
    bag is positive (see the sketch after this list)
  • The goal is to learn an instance classifier f
  • If an oracle gave the positive instance j for each
    positive bag, f could be trained using standard
    supervised learning

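The bag rule above is the standard MIL assumption. As a small illustration (not the paper's exact formulation), bag probabilities are commonly obtained from instance probabilities with a noisy-OR or max rule:

```python
import numpy as np

def bag_probability(instance_probs, rule="noisy_or"):
    """Combine per-instance probabilities into a bag probability.
    A bag is positive if at least one instance is positive, so the
    noisy-OR rule returns the probability that not all instances are negative."""
    p = np.asarray(instance_probs, dtype=float)
    if rule == "noisy_or":
        return 1.0 - np.prod(1.0 - p)
    return p.max()  # simple "max" rule

# One confident positive instance makes the whole bag positive:
print(bag_probability([0.05, 0.10, 0.92]))  # ~0.93
```
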
40
MIL Example
  • Object detection with weak supervision
  • Positive bag: the image contains the object
  • Goal: train a standard object detector
  • Example positive bag:

Viola 05
41
MIL vs MCL
(Side-by-side comparison of MIL and MCL in terms of their input data (bags), target labels, and goals.)
42
Standard vs MIL vs MCL
(Illustration comparing standard supervised learning, MIL, and MCL; legend: given labels and target decision boundary.)
43
MIL Results
Training Positive Bags
Training Negative Bag
44
MIL Results
Test Image
45
Learning Single Component
  • Note: the first formulation of MCL with k = 1 is
    equivalent to MIL
  • A reduction can also be shown for k > 1, but
    training is exponential in k
  • Therefore, existing MIL algorithms provide a
    mechanism to learn single components

MCL (k = 1) ⇔ MIL
46
Learning Multiple Components
  • Additive formulation
  • Additive models are simple but powerful
  • Prevalent in statistics, with a rich theory
  • Boosting can be used to train the additive model

47
Learning Multiple Components
  • General algorithm (see the sketch after this list)
  • Use MIL to obtain weak classifiers (components)
  • Use boosting to combine the components into a
    strong classifier
  • AdaBoost for MCL
  • RealBoost for MCL

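A hedged sketch of this additive scheme: each component is an MIL-trained scorer, and an AdaBoost-style loop weights the bags and combines the components; `train_mil_component` is a hypothetical MIL trainer, not an actual MCL implementation:

```python
import numpy as np

def mcl_adaboost(bags, bag_labels, train_mil_component, n_components=5):
    """bags: list of (n_instances, d) arrays; bag_labels: array of {+1, -1}.
    train_mil_component(bags, labels, weights) -> callable mapping a bag to a score."""
    y = np.asarray(bag_labels, dtype=float)
    w = np.full(len(bags), 1.0 / len(bags))           # bag weights
    components, alphas = [], []
    for _ in range(n_components):
        h = train_mil_component(bags, y, w)           # weak (component) classifier
        pred = np.sign([h(b) for b in bags])
        err = np.clip(np.sum(w * (pred != y)), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)         # AdaBoost component weight
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()                                  # re-weight bags
        components.append(h)
        alphas.append(alpha)
    # Strong classifier: H(bag) = sign( sum_t alpha_t * h_t(bag) )
    return lambda bag: np.sign(sum(a * np.sign(h(bag)) for a, h in zip(alphas, components)))
```
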
48
Object Detection
Object vs. background classification on frontal faces,
motorbikes, and spotted cats (from rigid to articulated
categories).
  • Not optimized for absolute performance

49
Pedestrian Detection
  • INRIA dataset (Dalal and Triggs 2005)
  • 1,213 training positives (plus reflections)
  • O(2000) background training images
  • Test set about half as big
  • Verification task:
  • Does a window contain a pedestrian?
  • Challenging dataset, much recent work
  • A specialized version of MCL:
  • Optimize MIL training
  • Incorporate a spatial model

50
Latent SVM (Felzenszwalb, McAllester, and Ramanan,
CVPR 2008)
51
Pedestrian Detection (a benchmark: Dollar et al.,
CVPR 2009)
A more informative measure: per-image evaluation.
100k images with 155k pedestrians.
We are still not there yet!
52
MCL Summary
  • Advantages
  • General notion of parts (components)
  • Components are learned in an unsupervised manner
  • Principled, general algorithm
  • State-of-the-art results with simple features
  • Disadvantages
  • Requires a large amount of training data; the
    learning phase is complex
  • Learns multiple classifiers, which makes testing
    slower

53
Outline
  1. Auto-context (Tu 08, Jiang and Tu 09;
    supervised, mostly implicit representation)
  2. Multiple Component Learning (Dollar et al. 08;
    weakly supervised, implicit + explicit
    representation)
  3. Active Skeleton (Bai et al. 09; weakly
    supervised, explicit representation)

54
Active Skeleton
55
Representation
56
Detection
57
Some Results
58
Thank you! Questions?
59
Related Work
  • M. Fink and P. Perona, Mutual Boosting for
    Contextual Inference, NIPS, 2003.
  • A. Torralba, K. P. Murphy, and W. T. Freeman,
    Contextual Models for Object Detection Using
    Boosted Random Fields, NIPS, 2004.
  • S. Avidan, SpatialBoost: Adding Spatial
    Reasoning to AdaBoost, ECCV, 2006.
  • D. Hoiem, A. Efros, and M. Hebert, Putting
    Objects in Perspective, CVPR, 2006.
  • S. Kumar and M. Hebert, Discriminative Random
    Fields: A Discriminative Framework for Contextual
    Interaction in Classification, ICCV, 2003.
  • J. Shotton, J. Winn, C. Rother, and A. Criminisi,
    TextonBoost: Joint Appearance, Shape and Context
    Modeling for Multi-Class Object Recognition and
    Segmentation, ECCV, 2006.
  • A. Rabinovich, A. Vedaldi, C. Galleguillos, E.
    Wiewiora, and S. Belongie, Objects in Context,
    ICCV, 2007.

60
Confusion Matrix
Average pixel accuracy: 77.7% (Shotton et al., ECCV 2006: 72.2%)
61
Multiclass Classifier: DAOC (Data-Assisted Output Code)
Code bits per class:
Class 1 | 0 1 0
Class 2 | 0 0 1
Class 3 | 1 1 1
Class 4 | 1 0 1
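
To show how such a code table is used, here is a minimal output-code decoding sketch (an illustration, not the DAOC algorithm itself): each code bit is predicted by a binary classifier, and the class whose code row is nearest in Hamming distance wins. The matrix copies the table above:

```python
import numpy as np

# Code matrix from the slide: one row of code bits per class.
CODE = np.array([
    [0, 1, 0],  # Class 1
    [0, 0, 1],  # Class 2
    [1, 1, 1],  # Class 3
    [1, 0, 1],  # Class 4
])

def decode(predicted_bits):
    """Return the 1-based class whose code row is closest in Hamming distance."""
    bits = np.asarray(predicted_bits)
    dists = np.sum(CODE != bits, axis=1)
    return int(np.argmin(dists)) + 1

print(decode([1, 0, 1]))  # -> 4 (exactly matches Class 4's code)
```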