Title: Part 3: Classifier-based methods
1. Part 3: Classifier-based methods
Antonio Torralba
2. Overview of section
- A short history of discriminative methods
- Object detection with classifiers
- Boosting
- Gentle boosting
- Weak detectors
- Object model
- Object detection
- Multiclass object detection
- Context-based object recognition
3. Classifier-based methods
Object detection and recognition is formulated as a classification problem. The image is partitioned into a set of overlapping windows, and at each window a decision is made about whether it contains a target object or not.
Where are the screens?
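A minimal sketch of the window-by-window decision, assuming a grayscale image stored as a 2D NumPy array. The 32x32 window, the stride of 8 pixels (smaller than the window, so neighboring windows overlap) and the `classify` callback are hypothetical choices, not the settings of the course code.

```python
import numpy as np

def sliding_windows(image, win_h, win_w, stride):
    """Yield (row, col, patch) for every window; stride < window size gives overlapping windows."""
    H, W = image.shape
    for r in range(0, H - win_h + 1, stride):
        for c in range(0, W - win_w + 1, stride):
            yield r, c, image[r:r + win_h, c:c + win_w]

def detect(image, classify, win_h=32, win_w=32, stride=8):
    """Return the top-left corners of the windows that the (hypothetical) classifier accepts."""
    return [(r, c) for r, c, patch in sliding_windows(image, win_h, win_w, stride)
            if classify(patch)]

# Toy usage: a "classifier" that fires on windows that are almost entirely bright.
img = np.zeros((64, 64))
img[16:48, 16:48] = 1.0
hits = detect(img, lambda p: p.mean() > 0.9)
print(hits)   # [(16, 16)]
```

Only the window that lies completely on the bright square passes; its overlapping neighbors cover it only partially and are rejected.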
4. Discriminative vs. generative
[Figure: models plotted as a function of x = data]
5.
- The representation and matching of pictorial structures. Fischler, Elschlager (1973).
- Face recognition using eigenfaces. M. Turk and A. Pentland (1991).
- Human Face Detection in Visual Scenes. Rowley, Baluja, Kanade (1995).
- Graded Learning for Object Detection. Fleuret, Geman (1999).
- Robust Real-time Object Detection. Viola, Jones (2001).
- Feature Reduction and Hierarchy of Classifiers for Fast Object Detection in Video Images. Heisele, Serre, Mukherjee, Poggio (2001).
- ...
7. Face detection
8. Face detection
9. Formulation
- Formulation: binary classification

               Training data                     Test data
Features x:    x_1   x_2   x_3   ...   x_N       x_{N+1}   x_{N+2}   ...   x_{N+M}
Labels y:      +1    -1    -1    ...   -1        ?         ?         ...   ?

Training data: each image patch is labeled as containing the object or background.
- Minimize misclassification error
- (Not that simple: we need some guarantees that there will be generalization)
10. Discriminative methods
- Nearest neighbor (10^6 examples): Shakhnarovich, Viola, Darrell 2003; Berg, Berg, Malik 2005
- Neural networks: LeCun, Bottou, Bengio, Haffner 1998; Rowley, Baluja, Kanade 1998
- Support Vector Machines and Kernels: Guyon, Vapnik; Heisele, Serre, Poggio, 2001
- Conditional Random Fields: McCallum, Freitag, Pereira 2000; Kumar, Hebert 2003
11. A simple object detector with Boosting
- Download
- Toolbox for manipulating dataset
- Code and dataset
- Matlab code
- Gentle boosting
- Object detector using a part-based model
- Dataset with cars and computer monitors
http://people.csail.mit.edu/torralba/iccv2005/
12. Why boosting?
- A simple algorithm for learning robust classifiers
  - Freund, Schapire, 1995
  - Friedman, Hastie, Tibshirani, 1998
- Provides an efficient algorithm for sparse visual feature selection
  - Tieu, Viola, 2000
  - Viola, Jones, 2003
- Easy to implement; does not require external optimization tools.
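13. Boosting
- Boosting fits the additive model $F(x) = f_1(x) + f_2(x) + \dots + f_M(x)$
by minimizing the exponential loss $J(F) = \sum_{t=1}^{N} e^{-y_t F(x_t)}$
over the training samples $(x_t, y_t)$, with labels $y_t \in \{-1, +1\}$.
The exponential loss is a differentiable upper bound on the misclassification error.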
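A quick numerical check of the upper-bound claim, assuming labels $y \in \{-1, +1\}$ and an arbitrary real-valued score $F(x)$. Whenever $yF(x) \le 0$ (a mistake, counting ties as mistakes) we have $e^{-yF(x)} \ge 1$, and otherwise $e^{-yF(x)} > 0$, so the exponential loss can never fall below the 0/1 error count. The scores here are random numbers, used purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.choice([-1, 1], size=1000)     # labels in {-1, +1}
F = rng.normal(size=1000)              # arbitrary real-valued classifier scores F(x_t)

margin = y * F
zero_one = np.mean(margin <= 0)        # misclassification rate (ties count as errors)
exp_loss = np.mean(np.exp(-margin))    # exponential loss J(F) divided by N

# The bound holds sample by sample: exp(-m) >= [m <= 0], hence also on average.
assert np.all(np.exp(-margin) >= (margin <= 0))
print(f"0/1 error = {zero_one:.3f}, exponential loss = {exp_loss:.3f}")
```

Because the exponential loss is smooth in $F$, it can be driven down step by step, while the 0/1 error itself is piecewise constant and gives no gradient.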
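14. Boosting
Sequential procedure. At each step $m$ we add a weak classifier $f_m(x)$ to the current model $F_{m-1}(x) = \sum_{i=1}^{m-1} f_i(x)$, choosing it to minimize the residual loss:
$\phi_m = \arg\min_{\phi} \sum_{t=1}^{N} e^{-y_t \left(F_{m-1}(x_t) + f(x_t; \phi)\right)}$
where $x_t$ is the input, $y_t$ the desired output, and $\phi$ the parameters of the weak classifier.
For more details: Friedman, Hastie, Tibshirani. Additive Logistic Regression: a Statistical View of Boosting (1998).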
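The step becomes concrete once the part of the loss that is already fixed is separated out. Writing $F_{m-1}$ for the model built so far, $\sum_t e^{-y_t(F_{m-1}(x_t) + f(x_t))} = \sum_t w_t\, e^{-y_t f(x_t)}$ with $w_t = e^{-y_t F_{m-1}(x_t)}$, so the earlier weak classifiers only affect the next one through per-sample weights. The short check below verifies this identity on random numbers (no real data).

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200
y = rng.choice([-1.0, 1.0], size=N)
F_prev = rng.normal(size=N)    # F_{m-1}(x_t): output of the current additive model
f_new = rng.normal(size=N)     # f(x_t): output of a candidate weak classifier

# Residual loss of the updated model F_{m-1} + f.
residual = np.sum(np.exp(-y * (F_prev + f_new)))

# The same quantity as a weighted loss of f alone, with w_t = exp(-y_t F_{m-1}(x_t)).
w = np.exp(-y * F_prev)
weighted = np.sum(w * np.exp(-y * f_new))

print(np.isclose(residual, weighted))   # True: the past enters only through the weights w_t
```

This is why each weak learner receives weighted training samples: samples that the current model already classifies with a large margin get small weights.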
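15. Weak classifiers
- The input is a set of weighted training samples $(x, y, w)$.
- Regression stumps: simple, but commonly used in object detection.
$f_m(x) = a\,[x_k < \theta] + b\,[x_k > \theta]$, with $a = E_w(y \mid x_k < \theta)$ and $b = E_w(y \mid x_k > \theta)$
Four parameters: $a$, $b$, the threshold $\theta$, and the feature index $k$.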
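A compact Python/NumPy sketch of Gentle AdaBoost with regression stumps of the form f(x) = a if x_k < θ, else b, where a and b are the weighted means of y on each side of the threshold. It is not a port of the course's Matlab code. Assumptions: features are stored in an array `X` of shape (samples, features), labels are in {-1, +1}, and thresholds are searched exhaustively between consecutive distinct feature values. The function names and the toy data are illustrative. Each round fits a stump by weighted least squares, adds it to F, and reweights the samples by exp(-y f_m(x)).

```python
import numpy as np

def fit_stump(X, y, w):
    """Weighted least-squares regression stump: f(x) = a if x[k] < theta else b."""
    best = None
    for k in range(X.shape[1]):
        order = np.argsort(X[:, k])
        xs, wy, ws = X[order, k], (w * y)[order], w[order]
        S_left, W_left = np.cumsum(wy)[:-1], np.cumsum(ws)[:-1]   # samples 0..i
        S_right = np.cumsum(wy[::-1])[::-1][1:]                     # samples i+1..end
        W_right = np.cumsum(ws[::-1])[::-1][1:]
        # Weighted squared error = sum(w y^2) - S_L^2/W_L - S_R^2/W_R, so maximize the last two terms.
        gain = S_left ** 2 / W_left + S_right ** 2 / W_right
        gain[xs[1:] <= xs[:-1]] = -np.inf                           # never split between equal values
        i = int(np.argmax(gain))
        if np.isfinite(gain[i]) and (best is None or gain[i] > best[0]):
            theta = 0.5 * (xs[i] + xs[i + 1])
            a = S_left[i] / W_left[i]     # a = E_w(y | x_k < theta)
            b = S_right[i] / W_right[i]   # b = E_w(y | x_k > theta)
            best = (gain[i], k, theta, a, b)
    if best is None:
        raise ValueError("all features are constant")
    return best[1:]                        # (k, theta, a, b)

def stump_predict(X, stump):
    k, theta, a, b = stump
    return np.where(X[:, k] < theta, a, b)

def gentle_boost(X, y, n_rounds=30):
    """Gentle AdaBoost: fit f_m by weighted least squares, add it to F, reweight by exp(-y f_m)."""
    w = np.full(len(y), 1.0 / len(y))
    stumps = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        w = w * np.exp(-y * stump_predict(X, stump))
        w /= w.sum()
        stumps.append(stump)
    return stumps

def strong_classifier(X, stumps):
    """Additive model F(x) = sum_m f_m(x); the detector output is sign(F)."""
    return sum(stump_predict(X, s) for s in stumps)

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = np.where(X[:, 0] + 0.5 * X[:, 3] > 0, 1.0, -1.0)   # toy labels depending on features 0 and 3
stumps = gentle_boost(X, y)
train_error = np.mean(np.sign(strong_classifier(X, stumps)) != y)
print(f"first feature selected: {stumps[0][0]}, training error: {train_error:.3f}")
```

Fitting the stump in closed form (weighted means on each side) is what makes Gentle AdaBoost easy to implement without an external optimizer: the only search is over the feature index and the threshold.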
16. Flavors of boosting
- AdaBoost (Freund and Schapire, 1995)
- Real AdaBoost (Friedman et al., 1998)
- LogitBoost (Friedman et al., 1998)
- Gentle AdaBoost (Friedman et al., 1998)
- BrownBoost (Freund, 2000)
- FloatBoost (Li et al., 2002)
- ...
17. From images to features: a myriad of weak detectors
- We will now define a family of visual features that can be used as weak classifiers (weak detectors).
Each feature takes an image as input and produces a binary response; used this way, the feature is a weak detector.
18. A myriad of weak detectors
- Yuille, Snow, Nitzberg, 1998
- Amit, Geman, 1998
- Papageorgiou, Poggio, 2000
- Heisele, Serre, Poggio, 2001
- Agarwal, Awan, Roth, 2004
- Schneiderman, Kanade, 2004
- Carmichael, Hebert, 2004
- ...
19. Weak detectors
- Textures of textures
- Tieu and Viola, CVPR 2000
Every combination of three filters generates a different feature.
This gives thousands of features. Boosting selects a sparse subset, so computation at test time is very efficient. Boosting also avoids overfitting to some extent.
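A rough sketch of how a chain of three filters turns into one scalar feature. Assumptions: a tiny hypothetical three-filter bank, absolute-value rectification and 2x subsampling between stages, and a final spatial average. The filters and nonlinearities of Tieu and Viola's paper are not reproduced here. With a bank of F filters the chain yields F^3 features, so even 3 filters give 27. `sliding_window_view` requires NumPy 1.20 or later.

```python
import itertools
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def correlate_valid(img, kernel):
    """2D 'valid' correlation via a sliding-window view (no SciPy needed)."""
    windows = sliding_window_view(img, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)

# Hypothetical filter bank: horizontal edge, vertical edge, center-surround.
BANK = [
    np.array([[1.0, 1.0], [-1.0, -1.0]]),
    np.array([[1.0, -1.0], [1.0, -1.0]]),
    np.array([[-1.0, -1.0, -1.0], [-1.0, 8.0, -1.0], [-1.0, -1.0, -1.0]]) / 8.0,
]

def texture_of_texture(img, f1, f2, f3):
    """Filter, rectify, subsample; repeated for three filters, then averaged to one number."""
    out = img
    for kernel in (f1, f2, f3):
        out = np.abs(correlate_valid(out, kernel))[::2, ::2]
    return out.mean()

img = np.random.default_rng(0).random((64, 64))
features = [texture_of_texture(img, *combo) for combo in itertools.product(BANK, repeat=3)]
print(len(features))   # 27 = 3**3 features from a 3-filter bank
```

Boosting would then treat each of these values as a candidate weak-detector input and keep only a few of them.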
20. Haar wavelets
- Haar filters and integral image
- Viola and Jones, ICCV 2001
With the integral image, the average intensity in a block is computed from four array references, independently of the block size.
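A minimal NumPy sketch of the integral-image trick, assuming a grayscale image stored as a 2D array. Padding the integral image with a zero row and column is one common convention (it avoids bounds checks), and `haar_two_rect` is an illustrative two-rectangle feature, not Viola and Jones's full feature set.

```python
import numpy as np

def integral_image(img):
    """ii[r, c] = sum of img[:r, :c], padded with a zero row and column."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] from four lookups, whatever the block size."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

def haar_two_rect(ii, r, c, h, w):
    """Illustrative two-rectangle Haar feature: left half minus right half (w even)."""
    half = w // 2
    return box_sum(ii, r, c, r + h, c + half) - box_sum(ii, r, c + half, r + h, c + w)

img = np.arange(16, dtype=float).reshape(4, 4)
ii = integral_image(img)
print(box_sum(ii, 1, 1, 3, 3))       # 30.0 = 5 + 6 + 9 + 10
print(box_sum(ii, 1, 1, 3, 3) / 4)   # 7.5, the average intensity of that 2x2 block
```

The integral image costs one pass over the image; after that every box sum, and hence every Haar feature, costs a constant number of operations.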
21. Haar wavelets
Papageorgiou, Poggio (2000)
Polynomial SVM
22. Edges and chamfer distance
Gavrila, Philomin, ICCV 1999
23. Edge fragments
Opelt, Pinz, Zisserman, ECCV 2006
Weak detector: k edge fragments and a threshold. The chamfer distance uses 8 orientation planes.
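A brute-force sketch of the chamfer distance and of an edge-fragment weak detector, assuming edges are given as (row, col) point lists with an orientation index from 0 to 7 for each point. In the oriented version, a template point may only match image edges in the same orientation plane. Real implementations use a distance transform instead of all-pairs distances; the threshold of 1.5 and the orientation labels are hypothetical.

```python
import numpy as np

def chamfer_distance(template_pts, image_pts):
    """Mean distance from each template edge point to its nearest image edge point (brute force)."""
    diff = template_pts[:, None, :] - image_pts[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1).mean()

def oriented_chamfer(template_pts, template_ori, image_pts, image_ori, n_planes=8):
    """Chamfer distance in which template points only match image edges of the same orientation plane."""
    total, count = 0.0, 0
    for plane in range(n_planes):
        t = template_pts[template_ori == plane]
        if len(t) == 0:
            continue
        im = image_pts[image_ori == plane]
        if len(im) == 0:
            return np.inf                     # no image edges in this plane: no match
        total += chamfer_distance(t, im) * len(t)
        count += len(t)
    return total / count if count else np.inf

# A vertical edge fragment (orientation plane 2, chosen arbitrarily) and image edges one pixel to its right.
template = np.array([[0, 0], [1, 0], [2, 0]], dtype=float)
template_ori = np.full(3, 2)
image_edges = np.array([[0, 1], [1, 1], [2, 1], [7, 7]], dtype=float)
image_ori = np.array([2, 2, 2, 5])

d = oriented_chamfer(template, template_ori, image_edges, image_ori)
fires = d < 1.5                               # weak detector: chamfer distance below a threshold
print(d, fires)                               # 1.0 True
```

Splitting edges into orientation planes keeps a vertical template from matching a nearby horizontal edge, which plain chamfer matching would accept.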
24. Histograms of oriented gradients
- Shape context
- Belongie, Malik, Puzicha, NIPS 2000
25. Weak detectors
- Part-based: similar to part-based generative models. We create weak detectors by using parts that vote for the object center location.
[Figure: screen model and car model]
These features are used by the detector available on the course web site.
26. Weak detectors
First we collect a set of part templates from a set of training objects.
Vidal-Naquet, Ullman, Nature Neuroscience 2003
27. Weak detectors
We now define a family of weak detectors as
Better than chance
28. Weak detectors
We can do a better job using filtered images.
Still a weak detector, but better than before.
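The exact formula for these part-based weak detectors does not survive in this transcript. The sketch below is one plausible reading of "parts voting for the object center", and every detail in it is an assumption: the image is matched against a part template with zero-mean normalized correlation, the match map is sharpened with an exponent `p` (10 here, arbitrary), and each window then votes for the object center at a fixed `offset` from the window's corner. `part_vote_map` and its parameters are hypothetical names.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def normalized_correlation(img, patch):
    """Zero-mean normalized cross-correlation of every valid window of img with patch, in [-1, 1]."""
    win = sliding_window_view(img, patch.shape)
    win = win - win.mean(axis=(-2, -1), keepdims=True)
    p = patch - patch.mean()
    num = np.einsum("ijkl,kl->ij", win, p)
    den = np.sqrt((win ** 2).sum(axis=(-2, -1)) * (p ** 2).sum()) + 1e-8
    return num / den

def part_vote_map(img, patch, offset, p=10):
    """Hypothetical part-based weak detector output.

    The match map is sharpened with the exponent p; each window then votes for the object
    center, assumed to lie at offset = (dy, dx) from the window's top-left corner.
    """
    match = np.clip(normalized_correlation(img, patch), 0.0, None) ** p
    votes = np.zeros(img.shape)
    dy, dx = offset
    rows, cols = np.indices(match.shape)
    cy, cx = rows + dy, cols + dx
    ok = (cy >= 0) & (cy < img.shape[0]) & (cx >= 0) & (cx < img.shape[1])
    votes[cy[ok], cx[ok]] = match[ok]   # the shift is one-to-one, so plain assignment suffices
    return votes

rng = np.random.default_rng(0)
img = rng.random((40, 40))
patch = img[10:17, 20:27].copy()        # part template cut from a "training object"
votes = part_vote_map(img, patch, offset=(5, 3))
peak = tuple(int(v) for v in np.unravel_index(np.argmax(votes), votes.shape))
print(peak)   # (15, 23): the window at (10, 20) matches exactly and votes 5 rows down, 3 columns right
```

The variant that uses filtered images would pass a filtered version of `img` (for example an edge-energy map) to the same function.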
29. Training
First we evaluate all the N features on all the training images.
Then we sample the feature outputs at the object center and at random locations in the background.
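A small sketch of this sampling step, assuming the N feature outputs for each training image are already stored as an array of shape (N, H, W) and the object center of each image is known. The number of background samples per image and the minimum distance `min_dist` that keeps background samples off the object are hypothetical parameters.

```python
import numpy as np

def sample_training_set(feature_maps, centers, n_background=10, min_dist=5, rng=None):
    """Turn dense feature outputs into a (X, y) training set for boosting.

    feature_maps: list (one per image) of arrays (N_features, H, W), every feature
                  evaluated at every pixel of that training image.
    centers:      list of (row, col) object centers, one per image.
    Positives are the feature vectors at the object centers; negatives come from random
    background pixels at least min_dist away (L1 distance, an assumed parameter).
    """
    rng = np.random.default_rng() if rng is None else rng
    X, y = [], []
    for maps, (r, c) in zip(feature_maps, centers):
        _, H, W = maps.shape
        X.append(maps[:, r, c])
        y.append(1.0)
        drawn = 0
        while drawn < n_background:
            br, bc = int(rng.integers(H)), int(rng.integers(W))
            if abs(br - r) + abs(bc - c) < min_dist:
                continue                      # too close to the object: draw again
            X.append(maps[:, br, bc])
            y.append(-1.0)
            drawn += 1
    return np.array(X), np.array(y)

rng = np.random.default_rng(0)
maps = [rng.random((5, 20, 20)) for _ in range(3)]   # 5 hypothetical features, 3 images
X, y = sample_training_set(maps, [(10, 10), (5, 12), (14, 3)], rng=rng)
print(X.shape, int((y > 0).sum()), int((y < 0).sum()))   # (33, 5) 3 30
```

The resulting `X`, `y` pair has exactly the form that a boosting routine with regression stumps expects.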
30. Representation and object model
Selected features for the screen detector
Lousy painter
31. Representation and object model
Selected features for the car detector
[Figure: selected features, panels labeled 1, 2, 3, 4, 10, 100]
32. Detection
- Invariance: search strategy
- Part-based
Here, invariance to translation and scale is achieved by the search strategy: the classifier is evaluated at all locations (by translating the image) and at all scales (by scaling the image in small steps). The search cost can be reduced using a cascade.
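A sketch of this search strategy, assuming a detector with a fixed window size and an image pyramid built by nearest-neighbor downsampling. The scale step of 2^(1/4) is an assumed "small step", and `score_map` is a hypothetical callback that returns one score per window position. Shrinking the image by a factor s lets the fixed window cover a region s times larger in the original image.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def resize_nearest(img, factor):
    """Shrink img by `factor` (> 1) with nearest-neighbor sampling, to stay dependency-free."""
    H, W = img.shape
    h, w = int(H / factor), int(W / factor)
    rows = np.minimum((np.arange(h) * factor).astype(int), H - 1)
    cols = np.minimum((np.arange(w) * factor).astype(int), W - 1)
    return img[np.ix_(rows, cols)]

def multiscale_detect(img, score_map, win, thresh, scale_step=2 ** 0.25):
    """Evaluate a fixed-size detector at every location of every pyramid level.

    Returns (row, col, size) boxes in original image coordinates.
    """
    detections, scale, current = [], 1.0, img
    while min(current.shape) >= win:
        scores = score_map(current)
        for r, c in zip(*np.nonzero(scores > thresh)):
            detections.append((int(r * scale), int(c * scale), int(win * scale)))
        scale *= scale_step
        current = resize_nearest(img, scale)
    return detections

WIN = 16

def mean_score(image):
    """Toy detector: mean brightness of every 16x16 window."""
    return sliding_window_view(image, (WIN, WIN)).mean(axis=(-2, -1))

img = np.zeros((64, 64))
img[16:48, 16:48] = 1.0                      # one bright 32x32 "object"
dets = multiscale_detect(img, mean_score, WIN, thresh=0.99)
print(max(size for _, _, size in dets))      # 31 or 32, depending on float rounding of the scale
```

The 16x16 detector fires at full resolution inside the object and again near scale 2, where the whole 32x32 object fits in one window. This is the same object detected at a larger scale.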
33. Example screen detection
[Panel: feature output]
34. Example screen detection
[Panels: feature output, thresholded output]
Weak detector: produces many false alarms.
35. Example screen detection
[Panels: feature output, thresholded output]
Strong classifier at iteration 1
36. Example screen detection
[Panels: feature output, thresholded output, strong classifier]
Second weak detector: produces a different set of false alarms.
37. Example screen detection
[Panels: feature output, thresholded output, strong classifier]
Strong classifier at iteration 2
38. Example screen detection
[Panels: feature output, thresholded output, strong classifier]
Strong classifier at iteration 10
39. Example screen detection
[Panels: feature output, thresholded output, strong classifier]
Adding features: final classification
Strong classifier at iteration 200
40. Cascade of classifiers
- Fleuret and Geman 2001, Viola and Jones 2001
[Figure: performance curves for classifiers with 3, 30, and 100 features]
We want the complexity of the 3-feature classifier with the performance of the 100-feature classifier.
Select a threshold with high recall for each stage; precision then increases as windows pass through the cascade.
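A toy sketch of the cascade logic on synthetic scores, not real image features. Each stage sums the first k features, with k = 1, 3, 10 standing in for the slide's 3, 30 and 100 features. Each stage's threshold is set to keep at least 99% of the training positives (an assumed recall target), and a window is rejected as soon as one stage fails. The "features per window" count is only a cost proxy.

```python
import numpy as np

def pick_threshold(pos_scores, recall=0.99):
    """Threshold that keeps at least a fraction `recall` of the positive scores (score >= thr passes)."""
    s = np.sort(pos_scores)
    return s[int(np.floor((1.0 - recall) * len(s)))]

def run_cascade(X, stages):
    """Evaluate stages cheapest first; a window is rejected as soon as one stage fails.

    Returns the accept mask and a cost proxy (features evaluated, summed over windows).
    """
    alive = np.ones(len(X), dtype=bool)
    cost = 0
    for score_fn, thr, n_features in stages:
        idx = np.flatnonzero(alive)
        cost += n_features * len(idx)            # only surviving windows reach this stage
        alive[idx] = score_fn(X[idx]) >= thr
    return alive, cost

# Synthetic windows: every feature is mildly informative (illustration only).
rng = np.random.default_rng(0)
pos = rng.normal(1.5, 1.0, size=(500, 10))
neg = rng.normal(0.0, 1.0, size=(5000, 10))

stages = []
for k in (1, 3, 10):                             # toy stand-ins for 3 / 30 / 100 features
    score = lambda X, k=k: X[:, :k].sum(axis=1)
    stages.append((score, pick_threshold(score(pos), recall=0.99), k))

accept_pos, _ = run_cascade(pos, stages)
accept_neg, cost = run_cascade(neg, stages)
print(f"recall={accept_pos.mean():.3f}  false positives={accept_neg.mean():.4f}  "
      f"features per background window={cost / len(neg):.2f} (vs 10 without a cascade)")
```

Because each stage loses at most 1% of the training positives, overall recall stays at or above 97%. Most background windows are discarded by the cheap early stages, so the average cost per window ends up well below that of the full classifier.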
41. Some goals for object recognition
- Able to detect and recognize many object classes
- Computationally efficient
- Able to deal with data-starved situations:
  - Some training samples might be harder to collect than others
  - We want on-line learning to be fast
42. Multiclass object detection
43. Multiclass object detection
44. Shared features
- Is learning object class 1000 easier than learning the first?
- Can we transfer knowledge from one object to another?
- Are the shared properties interesting by themselves?
45. Multitask learning
R. Caruana. Multitask Learning. ML 1997
Primary task: detect door knobs
Tasks used:
- horizontal location of right door jamb
- width of left door jamb
- width of right door jamb
- horizontal location of left edge of door
- horizontal location of right edge of door
- horizontal location of doorknob
- single or double door
- horizontal location of doorway center
- width of doorway
- horizontal location of left door jamb
46. Sharing invariances
S. Thrun. Is Learning the n-th Thing Any Easier Than Learning the First? NIPS 1996
Knowledge is transferred between tasks via a learned model of the invariances of the domain: object recognition is invariant to rotation, translation, scaling, lighting, ... These invariances are common to all object recognition tasks.
[Figure: toy world, with sharing vs. without sharing]
47. Sharing transformations
- Miller, E., Matsakis, N., and Viola, P. (2000).
Learning from one example through shared
densities on transforms. In IEEE Computer Vision
and Pattern Recognition.
Transformations are shared and can be learnt from
other tasks.
48. Models of object recognition
- I. Biederman. Recognition-by-components: A theory of human image understanding. Psychological Review, 1987.
- M. Riesenhuber and T. Poggio. Hierarchical models of object recognition in cortex. Nature Neuroscience, 1999.
- T. Serre, L. Wolf and T. Poggio. Object recognition with features inspired by visual cortex. CVPR 2005.
49. Sharing in constellation models
- Pictorial Structures: Fischler, Elschlager, IEEE Trans. Comp. 1973
- SVM Detectors: Heisele, Poggio, et al., NIPS 2001
- Constellation Model: Burl, Leung, Perona, 1996; Weber, Welling, Perona, 2000; Fergus, Perona, Zisserman, CVPR 2003
- Model-Guided Segmentation: Mori, Ren, Efros, Malik, CVPR 2004
50. Variational EM
Random initialization
Fei-Fei, Fergus, Perona, ICCV 2003
(Attias, Hinton, Beal, etc.)
Slide from Fei-Fei Li
51. Reusable Parts
Krempp, Geman, Amit. Sequential Learning of Reusable Parts for Object Detection. TR 2002
Goal: look for a vocabulary of edges that reduces the number of features.
[Figure: examples of reused parts; number of features vs. number of classes]
52. Sharing patches
For a new class, use only features similar to features that were good for other classes.
[Figure: proposed dog features]
53. Multiclass boosting
- AdaBoost.MH (Schapire, Singer, 2000)
- Error-correcting output codes (Dietterich, Bakiri, 1995)
- L_K-TreeBoost (Friedman, 2001)
- ...
54. Shared features
- Independent binary classifiers:
[Figure: separate screen, car, and face detectors]
Torralba, Murphy, Freeman. CVPR 2004. PAMI 2007
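In contrast to independent binary classifiers, shared features let one weak learner serve several classes. The following is a hedged sketch based on our reading of the joint boosting formulation in the cited Torralba, Murphy, Freeman papers. One regression stump is shared by a subset S of classes, with a common output a above the threshold and b below it. Every class outside S gets only a class-specific constant k_c. The search over subsets, features and thresholds is omitted (the threshold is fixed), and `fit_shared_stump` is a hypothetical name. In the toy data, classes 0 and 1 depend on the same feature and class 2 is unrelated.

```python
import numpy as np

def fit_shared_stump(v, Z, W, subset, theta):
    """Fit one regression stump shared by the classes in `subset` (threshold fixed for brevity).

    v: (N,) feature values; Z: (N, C) labels in {-1, +1}, one column per class;
    W: (N, C) per-class boosting weights. Returns (a, b, k, weighted squared error).
    """
    C = Z.shape[1]
    in_S = np.zeros(C, dtype=bool)
    in_S[list(subset)] = True
    above = (v > theta)[:, None]                          # (N, 1), broadcasts over classes
    Ws, Zs = W[:, in_S], Z[:, in_S]
    a = (Ws * Zs * above).sum() / (Ws * above).sum()      # shared output above the threshold
    b = (Ws * Zs * ~above).sum() / (Ws * ~above).sum()    # shared output below the threshold
    k = (W * Z).sum(axis=0) / W.sum(axis=0)               # per-class constant for classes outside S
    H = np.where(in_S[None, :], np.where(above, a, b), k[None, :])   # (N, C) stump outputs
    err = (W * (Z - H) ** 2).sum()                        # error summed over all classes
    return a, b, k, err

N = 100
v = np.linspace(-1, 1, N)
Z = np.stack([np.sign(v), np.sign(v), np.where(np.arange(N) % 2 == 0, 1.0, -1.0)], axis=1)
W = np.full((N, 3), 1.0 / N)

errors = {S: round(float(fit_shared_stump(v, Z, W, S, theta=0.0)[3]), 3)
          for S in [(0,), (0, 1), (0, 1, 2)]}
print(errors)   # {(0,): 2.0, (0, 1): 1.0, (0, 1, 2): 1.667}
```

Sharing the stump between the two related classes gives the lowest total error. Forcing the unrelated class into the same subset costs accuracy, which is why the subset itself has to be searched.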
55. 50 training samples/class, 29 object classes, 2000 entries in the dictionary.
Results averaged over 20 runs; error bars show the 80% interval.
[Figure: class-specific features vs. shared features]
Krempp, Geman, Amit, 2002; Torralba, Murphy, Freeman, CVPR 2004
56. Generalization as a function of object similarities
[Figure: two panels, K = 2.1 and K = 4.8; area under ROC vs. number of training samples per class]
Torralba, Murphy, Freeman. CVPR 2004. PAMI 2007
57. Generalization
Efficiency
Opelt, Pinz, Zisserman, CVPR 2006
58. Some references on multiclass
- Caruana, 1997
- Schapire, Singer, 2000
- Thrun, Pratt, 1997
- Krempp, Geman, Amit, 2002
- E. L. Miller, Matsakis, Viola, 2000
- Mahamud, Hebert, Lafferty, 2001
- Fink, 2004
- LeCun, Huang, Bottou, 2004
- Holub, Welling, Perona, 2005
- ...
59. Context-based methods
Antonio Torralba
60. Why is this hard?
61. What are the hidden objects?
[Figure: two images, labeled 1 and 2]
62. What are the hidden objects?
Chance 1/30000
63. Context-based object recognition
- Cognitive psychology
  - Palmer 1975
  - Biederman 1981
  - ...
- Computer vision
  - Noton and Stark (1971)
  - Hanson and Riseman (1978)
  - Barrow and Tenenbaum (1978)
  - Ohta, Kanade, Sakai (1978)
  - Haralick (1983)
  - Strat and Fischler (1991)
  - Bobick and Pinhanez (1995)
  - Campbell et al. (1997)
67. Global and local representations
[Figure: urban street scene; local labels: building, car, sidewalk]
68. Global and local representations
[Figure: urban street scene; local labels: building, car, sidewalk]
Image index: summary statistics, configuration of textures
[Figure: urban street scene and its histogram of features]
69. Global scene representations
- Spatially organized textures: M. Gorkani, R. Picard, ICPR 1994; A. Oliva, A. Torralba, IJCV 2001; S. Lazebnik et al., CVPR 2006
- Bag of words: Sivic, Russell, Freeman, Zisserman, ICCV 2005; Fei-Fei and Perona, CVPR 2005; Bosch, Zisserman, Munoz, ECCV 2006
- Non-localized textons: Walker, Malik, Vision Research 2004
Spatial structure is important in order to provide context for object localization.
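A small sketch of why spatial organization matters, assuming gradient-orientation energy as a crude stand-in for the filter banks used in the cited work. It is not the Oliva and Torralba gist implementation, and the 4x4 grid and 4 orientation channels are arbitrary choices. An orderless descriptor (energies averaged over the whole image) cannot tell a picture from its upside-down version; averaging the same energies over a coarse grid can.

```python
import numpy as np

def orientation_energy(img, n_orient=4):
    """Gradient magnitude split into n_orient orientation channels (crude stand-in for a filter bank)."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ori = np.mod(np.arctan2(gy, gx), np.pi)                       # edge orientation in [0, pi)
    bins = np.minimum((ori / np.pi * n_orient).astype(int), n_orient - 1)
    return np.stack([mag * (bins == k) for k in range(n_orient)])  # (n_orient, H, W)

def spatial_descriptor(img, grid=4, n_orient=4):
    """Average each channel over a grid x grid layout: a coarse, spatially organized texture code."""
    E = orientation_energy(img, n_orient)
    H, W = img.shape
    cells = [E[:, i * H // grid:(i + 1) * H // grid, j * W // grid:(j + 1) * W // grid].mean(axis=(1, 2))
             for i in range(grid) for j in range(grid)]
    return np.concatenate(cells)                                   # length grid * grid * n_orient

def orderless_descriptor(img, n_orient=4):
    """Same channels pooled over the whole image: the spatial layout is discarded."""
    return orientation_energy(img, n_orient).mean(axis=(1, 2))

# Horizontal stripes in the top half, and the same picture flipped upside down.
img = np.zeros((64, 64))
img[:32] = ((np.arange(32) // 4) % 2)[:, None]
flipped = img[::-1]

print(np.allclose(orderless_descriptor(img), orderless_descriptor(flipped)))   # True: layout lost
print(np.allclose(spatial_descriptor(img), spatial_descriptor(flipped)))       # False: layout kept
```

For context-based localization, only the second kind of descriptor can say where in the image a given texture, and therefore a likely object, appears.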
70. Contextual object relationships
Carbonetto, de Freitas, Barnard (2004)
Kumar, Hebert (2005)
Torralba, Murphy, Freeman (2004)
E. Sudderth et al. (2005)
Fink, Perona (2003)
71. Context
- Murphy, Torralba, Freeman (NIPS 03)
- Use global context to predict the presence and location of objects
[Figure: keyboards]
72. 3D scene context
[Figure: image and world]
Hoiem, Efros, Hebert, ICCV 2005
73. 3D scene context
[Figure: image labeled with surface classes: support, vertical, sky; vertical subclasses V-Center, V-Left, V-Right, V-Porous, V-Solid]
Hoiem, Efros, Hebert, ICCV 2005
74. Object-Object Relationships
- Enforce spatial consistency between labels using an MRF
Carbonetto, de Freitas, Barnard (04)
75. Object-Object Relationships
- Use latent variables to induce long-distance correlations between labels in a Conditional Random Field (CRF)
He, Zemel, Carreira-Perpinan (04)
76. Object-Object Relationships
- Fink, Perona (NIPS 03)
- Use the output of boosting for other objects at previous iterations as input to boosting at this iteration
77. CRFs: Object-Object Relationships
Torralba, Murphy, Freeman 2004
Kumar, Hebert 2005
78. Hierarchical Sharing and Context
E. Sudderth, A. Torralba, W. T. Freeman, and A. Willsky.
- Scenes share objects
- Objects share parts
- Parts share features
79. Some references on context
With a mixture of generative and discriminative approaches:
- Strat, Fischler (PAMI 91)
- Torralba, Sinha (ICCV 01)
- Torralba (IJCV 03)
- Fink, Perona (NIPS 03)
- Murphy, Torralba, Freeman (NIPS 03)
- Kumar, Hebert (NIPS 04)
- Carbonetto, de Freitas, Barnard (ECCV 04)
- He, Zemel, Carreira-Perpinan (CVPR 04)
- Sudderth, Torralba, Freeman, Willsky (ICCV 05)
- Hoiem, Efros, Hebert (ICCV 05)
- ...
80. A car out of context
81. Integrated models for scene and object recognition
Banksy
82. Summary
- Many techniques for training discriminative models that I have not mentioned here:
  - Conditional random fields
  - Kernels for object recognition
  - Learning object similarities
  - ...