Title: AdaBoost
1. AdaBoost and Its Applications
2. Outline
- Overview
- The AdaBoost Algorithm
- How and why does AdaBoost work?
- AdaBoost for Face Detection
3. AdaBoost and Its Applications
4. Introduction
AdaBoost = Adaptive Boosting
- A learning algorithm
- Builds a strong classifier from a lot of weaker ones
5. AdaBoost Concept
[Figure: many weak classifiers, each slightly better than random, are combined into a strong classifier]
6. Weaker Classifiers
- Each weak classifier learns by considering one simple feature
- The T most beneficial features for classification should be selected
- How to
  - define features?
  - select beneficial features?
  - train weak classifiers?
  - manage (weight) the training samples?
  - associate a weight with each weak classifier?
7. The Strong Classifiers
How good will the strong one be?
8. AdaBoost and Its Applications
9–10. The AdaBoost Algorithm
Given: $(x_1, y_1), \ldots, (x_m, y_m)$, where $x_i \in X$ and $y_i \in \{-1, +1\}$
Initialization: $D_1(i) = 1/m$ for $i = 1, \ldots, m$
For $t = 1, \ldots, T$:
- Find the classifier $h_t : X \to \{-1, +1\}$ that minimizes the error with respect to $D_t$, i.e., $h_t = \arg\min_{h_j} \epsilon_j$, where $\epsilon_j = \sum_{i=1}^{m} D_t(i)\,[y_i \neq h_j(x_i)]$
- Weight the classifier: $\alpha_t = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t}$
- Update the distribution: $D_{t+1}(i) = \frac{D_t(i)\, e^{-\alpha_t y_i h_t(x_i)}}{Z_t}$, where $Z_t$ is a normalization factor
Output the final classifier: $H(x) = \mathrm{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$
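A minimal sketch of this loop in Python, with decision stumps as the weak learners (the NumPy data layout, the exhaustive stump search, and the small guard inside the log are illustrative choices, not part of the original slides):

```python
import numpy as np

def best_stump(X, y, D):
    """Exhaustive search over thresholded single-feature stumps."""
    best_h, best_eps = None, np.inf
    for j in range(X.shape[1]):                  # one simple feature at a time
        for theta in np.unique(X[:, j]):
            for s in (+1, -1):                   # direction of the inequality
                pred = s * np.where(X[:, j] < theta, 1, -1)
                eps = D[pred != y].sum()         # weighted error w.r.t. D
                if eps < best_eps:
                    best_h = lambda Z, j=j, t=theta, s=s: s * np.where(Z[:, j] < t, 1, -1)
                    best_eps = eps
    return best_h, best_eps

def adaboost(X, y, T):
    """AdaBoost for labels y in {-1, +1}; returns the strong classifier H."""
    m = len(y)
    D = np.full(m, 1.0 / m)                      # D_1(i) = 1/m
    ensemble = []
    for _ in range(T):
        h, eps = best_stump(X, y, D)             # h_t minimizing error w.r.t. D_t
        alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-12))
        D *= np.exp(-alpha * y * h(X))           # up-weight the mistakes
        D /= D.sum()                             # normalize by Z_t
        ensemble.append((alpha, h))
    return lambda Z: np.sign(sum(a * h(Z) for a, h in ensemble))
```

Calling `H = adaboost(X, y, T=10)` and then `H(X_test)` evaluates the boosted ensemble; each round performs exactly the three steps above: pick $h_t$, weight it by $\alpha_t$, and reweight the examples.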
11. Boosting Illustration
[Figure] Weak Classifier 1
12. Boosting Illustration
[Figure] Weights Increased
13. Boosting Illustration
[Figure] Weak Classifier 2
14. Boosting Illustration
[Figure] Weights Increased
15. Boosting Illustration
[Figure] Weak Classifier 3
16. Boosting Illustration
Final classifier is a combination of the weak classifiers
17. AdaBoost and Its Applications
- How and why does AdaBoost work?
18–19. The AdaBoost Algorithm
What goal does AdaBoost want to reach?
(The algorithm of slides 9–10 is shown again; the next slides derive where the classifier weight $\alpha_t$ and the distribution update $D_{t+1}$ come from.)
20–21. Goal
Final classifier: $H(x) = \mathrm{sign}(f(x))$, where $f(x) = \sum_{t=1}^{T} \alpha_t h_t(x)$
Goal: minimize the exponential loss $L(f) = \sum_{i=1}^{m} e^{-y_i f(x_i)}$
Minimizing the exponential loss tends to maximize the margin $y\,f(x)$: each term penalizes small or negative margins, and the loss upper-bounds the training error since $e^{-y_i f(x_i)} \geq 1$ whenever $\mathrm{sign}(f(x_i)) \neq y_i$.

22–31. Minimizing the Exponential Loss
Define $f_t(x) = \sum_{s=1}^{t} \alpha_s h_s(x)$, with $f_0 \equiv 0$, so that $f_t = f_{t-1} + \alpha_t h_t$ and $f_T = f$. Then

$L(f_t) = \sum_{i} e^{-y_i f_{t-1}(x_i)}\, e^{-\alpha_t y_i h_t(x_i)}$

Define the distribution $D_t(i) \propto e^{-y_i f_{t-1}(x_i)}$, normalized so that $\sum_i D_t(i) = 1$. Up to a constant factor, round $t$ must then choose $h_t$ and $\alpha_t$ to minimize

$\sum_i D_t(i)\, e^{-\alpha_t y_i h_t(x_i)} = (1-\epsilon_t)\, e^{-\alpha_t} + \epsilon_t\, e^{\alpha_t}$

where $\epsilon_t = \sum_{i:\, h_t(x_i) \neq y_i} D_t(i)$ is the weighted error, since $y_i h_t(x_i) = +1$ on correct examples and $-1$ on mistakes. Setting the derivative with respect to $\alpha_t$ to 0:

$-(1-\epsilon_t)\, e^{-\alpha_t} + \epsilon_t\, e^{\alpha_t} = 0 \implies \alpha_t = \frac{1}{2} \ln \frac{1-\epsilon_t}{\epsilon_t}$

With this optimal $\alpha_t$, the round multiplies the loss by $Z_t = 2\sqrt{\epsilon_t(1-\epsilon_t)} \leq 1$, which is smallest when $\epsilon_t$ is as far from $1/2$ as possible; this is why each round picks the weak classifier with the minimum weighted error, and why that classifier only needs to be slightly better than random.

The distributions satisfy exactly the update used by the algorithm: at time 1, $D_1(i) = 1/m$; at time $t+1$,

$D_{t+1}(i) = \frac{D_t(i)\, e^{-\alpha_t y_i h_t(x_i)}}{Z_t}$

with $Z_t$ a normalization factor.
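A quick numeric instance of the $\alpha_t$ formula (the error value is chosen purely for illustration):

```latex
% Suppose the selected weak classifier has weighted error \epsilon_t = 0.3.
\alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}
         = \frac{1}{2}\ln\frac{0.7}{0.3} \approx 0.424
% Correct examples are scaled by e^{-\alpha_t} \approx 0.65 and mistakes
% by e^{+\alpha_t} \approx 1.53, before renormalization by Z_t.
```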
32. AdaBoost and Its Applications
- AdaBoost for Face Detection
33. The Task of Face Detection
Many slides adapted from P. Viola
34. Basic Idea
- Slide a window across the image and evaluate a face model at every location.
35. Challenges
- Slide a window across the image and evaluate a face model at every location.
- A sliding-window detector must evaluate tens of thousands of location/scale combinations.
- Faces are rare: 0–10 per image.
- For computational efficiency, we should spend as little time as possible on the non-face windows.
- A megapixel image has about $10^6$ pixels, and a comparable number of candidate face locations.
- To avoid a false positive in every image, our false positive rate must be less than $10^{-6}$.
36. The Viola/Jones Face Detector
- A seminal approach to real-time object detection
- Training is slow, but detection is very fast
- Key ideas
  - Integral images for fast feature evaluation
  - Boosting for feature selection
  - Attentional cascade for fast rejection of non-face windows

P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. CVPR 2001.
P. Viola and M. Jones. Robust real-time face detection. IJCV 57(2), 2004.
37–38. Image Features
Rectangle filters
[Figure: two-, three-, and four-rectangle filters; a feature's value is the sum of the pixels in the white rectangles minus the sum in the shaded rectangles]
39. Size of the Feature Space
- How many possible rectangle features are there for a 24×24 detection region?
[Figure: rectangle filter types A, B, C, and D]
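A brute-force count, as a sketch (the five cell shapes below, covering two-, three-, and four-rectangle filters, are a commonly used parameterization of the A–D types; the exact total depends on which shapes the deck includes):

```python
def count_rectangle_features(W=24, H=24,
                             shapes=((2, 1), (1, 2), (3, 1), (1, 3), (2, 2))):
    """Count all placements and scales of each filter shape in a W x H window.

    A shape (sw, sh) is a filter built from sw-by-sh equal cells, so its
    width must be a multiple of sw and its height a multiple of sh.
    """
    total = 0
    for sw, sh in shapes:
        for w in range(sw, W + 1, sw):              # admissible widths
            for h in range(sh, H + 1, sh):          # admissible heights
                total += (W - w + 1) * (H - h + 1)  # top-left positions
    return total

print(count_rectangle_features())  # 162336 for these five shapes
```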
40. Feature Selection
- How many possible rectangle features are there for a 24×24 detection region?
- What features are good for face detection?
41. Feature Selection
- Can we create a good classifier using just a small subset of all possible features?
- How do we select such a subset?
42. Integral Images
- The integral image computes a value at each pixel (x, y) that is the sum of the pixel values above and to the left of (x, y), inclusive.
[Figure: the shaded region above and to the left of the point (x, y)]
43. Computing the Integral Image
- The integral image computes a value at each pixel (x, y) that is the sum of the pixel values above and to the left of (x, y), inclusive.
- This can be computed quickly in one pass through the image, e.g., via the recurrence $ii(x, y) = i(x, y) + ii(x-1, y) + ii(x, y-1) - ii(x-1, y-1)$.
44. Computing the Sum within a Rectangle
[Figure: a rectangle whose corners have integral-image values A (top-left), B (top-right), C (bottom-left), and D (bottom-right)]
With that labeling, the sum of the pixels within the rectangle is $D - B - C + A$.
Only 3 additions are required for any size of rectangle!
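A minimal NumPy sketch of both operations (the array layout and function names are mine):

```python
import numpy as np

def integral_image(img):
    """ii[y, x] = sum of img over all rows <= y and columns <= x, inclusive."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, bottom, right):
    """Sum of img[top:bottom+1, left:right+1], read off the integral image."""
    total = ii[bottom, right]
    if top > 0:
        total -= ii[top - 1, right]        # strip above the rectangle
    if left > 0:
        total -= ii[bottom, left - 1]      # strip to the left
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]     # corner was removed twice, add back
    return total

img = np.arange(16.0).reshape(4, 4)
ii = integral_image(img)
assert rect_sum(ii, 1, 1, 2, 2) == img[1:3, 1:3].sum()
```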
45. Scaling
- The integral image enables us to evaluate rectangles of any size in constant time.
- Therefore, no image scaling is necessary: scale the rectangular features instead!
[Figure: the rectangle features evaluated at several scales]
46. Boosting
- Boosting is a classification scheme that works by combining weak learners into a more accurate ensemble classifier
- A weak learner need only do better than chance
- Training consists of multiple boosting rounds
- During each boosting round, we select a weak learner that does well on examples that were hard for the previous weak learners
- Hardness is captured by weights attached to training examples
Y. Freund and R. Schapire. A short introduction to boosting. Journal of Japanese Society for Artificial Intelligence, 14(5):771–780, September 1999.
47–48. The AdaBoost Algorithm
(The algorithm of slides 9–10, shown again: initialize $D_1(i) = 1/m$; at each round find the weak classifier $h_t$ that minimizes the error with respect to $D_t$, weight it by $\alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}$, update the distribution, and output the final classifier $H(x) = \mathrm{sign}\!\left(\sum_t \alpha_t h_t(x)\right)$.)
49. Weak Learners for Face Detection
What base learner is appropriate for face detection?
50. Weak Learners for Face Detection
A thresholded single rectangle feature, as in Viola and Jones: $h_j(x) = 1$ if $p_j f_j(x) < p_j \theta_j$ and $h_j(x) = 0$ otherwise, where $f_j$ is a rectangle feature, $\theta_j$ a threshold, and the parity $p_j \in \{+1, -1\}$ gives the direction of the inequality.
51. Boosting
- The training set contains face and non-face examples
  - Initially, all with equal weight
- For each round of boosting:
  - Evaluate each rectangle filter on each example
  - Select the best threshold for each filter (see the sketch below)
  - Select the best filter/threshold combination
  - Reweight the examples
- Computational complexity of learning: O(MNK)
  - M rounds, N examples, K features
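A sketch of the per-filter threshold search: sort the examples by feature value and sweep once, tracking the cumulative weight of faces and non-faces below each candidate threshold, so one filter costs O(N log N) (labels in {0, 1}, the parity bookkeeping, and all names are my choices):

```python
import numpy as np

def best_threshold(f, y, D):
    """Best (theta, parity, error) for the stump h(x) = 1 iff parity*f(x) < parity*theta.

    f: feature values; y: labels in {0, 1} (1 = face); D: example weights.
    """
    order = np.argsort(f)
    f, y, D = f[order], y[order], D[order]
    w_pos, w_neg = D[y == 1].sum(), D[y == 0].sum()
    pos_below = neg_below = 0.0
    best = (None, None, np.inf)
    for i in range(len(f)):
        # error if everything below f[i] is labeled "face" (parity +1) ...
        e_plus = neg_below + (w_pos - pos_below)
        # ... or everything below f[i] is labeled "non-face" (parity -1)
        e_minus = pos_below + (w_neg - neg_below)
        err, parity = min((e_plus, +1), (e_minus, -1))
        if err < best[2]:
            best = (f[i], parity, err)
        if y[i] == 1:
            pos_below += D[i]
        else:
            neg_below += D[i]
    return best
```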
52. Features Selected by Boosting
First two features selected by boosting.
This feature combination can yield a 100% detection rate and a 50% false positive rate.
53. ROC Curve for a 200-Feature Classifier
A 200-feature classifier can yield a 95% detection rate and a false positive rate of 1 in 14084.
Not good enough!
To be practical for real applications, the false positive rate must be closer to 1 in 1,000,000.
54. Attentional Cascade
- We start with simple classifiers that reject many of the negative sub-windows while detecting almost all positive sub-windows
- A positive response from the first classifier triggers the evaluation of a second (more complex) classifier, and so on
- A negative outcome at any point leads to the immediate rejection of the sub-window
[Figure: classifiers 1, 2, 3 chained; T passes the sub-window onward, F rejects it]
55. Attentional Cascade
- Chain classifiers that are progressively more complex and have lower false positive rates
[Figure: classifiers 1, 2, 3 chained]
56. Detection Rate and False Positive Rate for Chained Classifiers
- The detection rate and the false positive rate of the cascade are found by multiplying the respective rates of the individual stages
- A detection rate of 0.9 and a false positive rate on the order of $10^{-6}$ can be achieved by a 10-stage cascade if each stage has a detection rate of 0.99 ($0.99^{10} \approx 0.9$) and a false positive rate of about 0.30 ($0.3^{10} \approx 6 \times 10^{-6}$)
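A quick check of the multiplicative rates, together with the early-exit evaluation loop they motivate (representing a stage as a (classifier, threshold) pair is my assumption):

```python
def cascade_rates(stage_rates):
    """Overall (detection, false positive) rate: the product over all stages."""
    d = f = 1.0
    for di, fi in stage_rates:
        d, f = d * di, f * fi
    return d, f

print(cascade_rates([(0.99, 0.30)] * 10))  # (~0.904, ~5.9e-06)

def classify(window, stages):
    """Evaluate stages in order; a negative outcome rejects immediately."""
    for clf, threshold in stages:
        if clf(window) < threshold:
            return False   # rejected: the more complex stages never run
    return True            # passed all stages: report a face
```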
57. Training the Cascade
- Set target detection and false positive rates for each stage
- Keep adding features to the current stage until its target rates have been met
  - Need to lower the AdaBoost threshold to maximize detection (as opposed to minimizing the total classification error)
  - Test on a validation set
- If the overall false positive rate is not low enough, add another stage
- Use the false positives from the current stage as the negative training examples for the next stage
58. Training the Cascade
59–60. ROC Curves: Cascaded Classifier vs. Monolithic Classifier
- There is little difference between the two in terms of accuracy.
- There is a big difference in terms of speed: the cascaded classifier is nearly 10 times faster, since its first stage throws out most non-faces so that they are never evaluated by subsequent stages.
61. The Implemented System
- Training data
  - 5000 faces
    - All frontal, rescaled to 24×24 pixels
  - 300 million non-face sub-windows
    - Drawn from 9500 non-face images
- Faces are normalized
  - Scale, translation
- Many variations
  - Across individuals
  - Illumination
  - Pose
62. Structure of the Detector Cascade
- Combining successively more complex classifiers in a cascade
- 38 stages, including a total of 6060 features
63. Structure of the Detector Cascade
[Figure: all sub-windows enter stage 1; each of the stages 1, 2, ..., 38 either passes (T) the sub-window to the next stage or rejects (F) it immediately; a sub-window that passes all 38 stages is declared a face]
64. Speed of the Final Detector
- On a 700 MHz Pentium III processor, the face detector can process a 384×288 pixel image in about 0.067 seconds (15 Hz)
- 15 times faster than previous detectors of comparable accuracy (Rowley et al., 1998)
- An average of 8 features evaluated per window on the test set
65. Image Processing
- Training: all example sub-windows were variance-normalized to minimize the effect of different lighting conditions
- Detection: sub-windows are variance-normalized as well
66. Scanning the Detector
- Scaling is achieved by scaling the detector itself, rather than scaling the image
- Good detection results for a scaling factor of 1.25
- The detector is also scanned across locations: subsequent locations are obtained by shifting the window by $s\Delta$ pixels, where $s$ is the current scale
- Results for $\Delta = 1.0$ and $\Delta = 1.5$ were reported
67. Merging Multiple Detections
68. ROC Curves for Face Detection
69. Output of Face Detector on Test Images
70. Other Detection Tasks
Facial Feature Localization
Profile Detection
Male vs. Female
71. Other Detection Tasks
Facial Feature Localization
Profile Detection
72. Other Detection Tasks
Male vs. Female
73. Conclusions
- How does AdaBoost work?
- Why does AdaBoost work?
- AdaBoost for face detection
  - Rectangle features
  - Integral images for fast computation
  - Boosting for feature selection
  - Attentional cascade for fast rejection of negative windows