AdaBoost - PowerPoint PPT Presentation

1
AdaBoost & Its Applications

2
Outline
  • Overview
  • The AdaBoost Algorithm
  • How and why AdaBoost works
  • AdaBoost for Face Detection

3
AdaBoost & Its Applications
  • Overview

4
Introduction
  • Boosting: building a strong classifier out of a lot of weaker ones
  • AdaBoost (Adaptive Boosting): a learning algorithm that does this adaptively, reweighting the training data after each round
5
AdaBoost Concept
[Figure: a strong classifier built as a weighted combination of weak classifiers, each only slightly better than random]
6
Weaker Classifiers
  • Each weak classifier learns by considering one simple feature
  • The T most beneficial features for classification should be selected
  • Questions:
  • How to define features?
  • How to select beneficial features?
  • How to train weak classifiers?
  • How to manage (weight) training samples?
  • How to associate a weight to each weak classifier?

[Figure: a strong classifier built as a weighted combination of weak classifiers, each only slightly better than random]
7
The Strong Classifier
How good will the strong classifier be?
[Figure: a strong classifier built as a weighted combination of weak classifiers, each only slightly better than random]
8
AdaBoost & Its Applications
  • The AdaBoost Algorithm

9
The AdaBoost Algorithm
Given: (x_1, y_1), …, (x_m, y_m), where y_i ∈ {−1, +1}
Initialization: D_1(i) = 1/m, for i = 1, …, m
For t = 1, …, T:
  • Find the classifier h_t which minimizes error w.r.t. D_t, i.e.,
    h_t = argmin_h ε_h, where ε_h = Σ_i D_t(i) · 1[h(x_i) ≠ y_i]
  • Weight classifier: α_t = ½ ln((1 − ε_t)/ε_t)
  • Update distribution: D_{t+1}(i) = D_t(i) exp(−α_t y_i h_t(x_i)) / Z_t,
    where Z_t is a normalization factor so that D_{t+1} sums to 1

Output final classifier: H(x) = sign(Σ_{t=1}^T α_t h_t(x))
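The algorithm above can be sketched in Python, using one-feature threshold stumps as the weak classifiers (a minimal illustration under that assumption, not code from the presentation; the brute-force stump search is for clarity, not speed):

```python
import numpy as np

def train_adaboost(X, y, T):
    """AdaBoost with single-feature threshold stumps.
    X: (n, d) features; y: labels in {-1, +1}; T: boosting rounds."""
    n, d = X.shape
    D = np.full(n, 1.0 / n)              # initial distribution D_1(i) = 1/n
    ensemble = []                        # (alpha, feature, threshold, polarity)
    for _ in range(T):
        # Find the stump h_t minimizing weighted error w.r.t. D_t
        best = None
        for j in range(d):
            for thr in np.unique(X[:, j]):
                for p in (1, -1):
                    pred = np.where(p * X[:, j] < p * thr, 1, -1)
                    err = D[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, p)
        err, j, thr, p = best
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)    # weight the classifier
        pred = np.where(p * X[:, j] < p * thr, 1, -1)
        D *= np.exp(-alpha * y * pred)           # update the distribution
        D /= D.sum()                             # normalize (the Z_t factor)
        ensemble.append((alpha, j, thr, p))
    return ensemble

def predict(ensemble, X):
    """Final classifier H(x) = sign(sum_t alpha_t * h_t(x))."""
    score = sum(a * np.where(p * X[:, j] < p * thr, 1, -1)
                for a, j, thr, p in ensemble)
    return np.sign(score)
```

After training, `predict` combines the stumps exactly as the final classifier on the slide: a sign of the α-weighted vote.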
11
Boosting illustration
Weak Classifier 1
12
Boosting illustration
Weights Increased
13
Boosting illustration
Weak Classifier 2
14
Boosting illustration
Weights Increased
15
Boosting illustration
Weak Classifier 3
16
Boosting illustration
Final classifier is a combination of weak
classifiers
17
AdaBoost & Its Applications
  • How and why AdaBoost works

18
The AdaBoost Algorithm
What goal does AdaBoost want to reach?
Given: (x_1, y_1), …, (x_m, y_m), where y_i ∈ {−1, +1}
Initialization: D_1(i) = 1/m
For t = 1, …, T:
  • Find the classifier h_t which minimizes error w.r.t. D_t, i.e.,
    ε_t = Σ_i D_t(i) · 1[h_t(x_i) ≠ y_i]
  • Weight classifier: α_t = ½ ln((1 − ε_t)/ε_t)
  • Update distribution: D_{t+1}(i) = D_t(i) exp(−α_t y_i h_t(x_i)) / Z_t

Output final classifier: H(x) = sign(Σ_{t=1}^T α_t h_t(x))
20
Goal
Final classifier: H(x) = sign(Σ_{t=1}^T α_t h_t(x))
Minimize the exponential loss: L(H) = Σ_i exp(−y_i H(x_i))
21
Goal
Final classifier: H(x) = sign(Σ_{t=1}^T α_t h_t(x))
Minimize the exponential loss — equivalently, maximize the margin y·H(x), since the loss penalizes small and negative margins exponentially
22-31
Minimize L = Σ_i exp(−y_i F_T(x_i)) greedily, one weak classifier at a time
Final classifier: H(x) = sign(F_T(x))
Define F_t(x) = F_{t−1}(x) + α_t h_t(x), with F_0(x) = 0
Then
  L_t = Σ_i exp(−y_i F_{t−1}(x_i)) · exp(−α_t y_i h_t(x_i))
With D_t(i) ∝ exp(−y_i F_{t−1}(x_i)), normalized to sum to 1, this is proportional to
  e^{−α_t}(1 − ε_t) + e^{α_t} ε_t, where ε_t = Σ_{i: h_t(x_i) ≠ y_i} D_t(i)
Set the derivative with respect to α_t to 0:
  −e^{−α_t}(1 − ε_t) + e^{α_t} ε_t = 0  ⇒  α_t = ½ ln((1 − ε_t)/ε_t)
The per-round reduction in loss is maximized when h_t minimizes the weighted error ε_t, so each round selects the weak classifier that does best under D_t
At time 1: D_1(i) = 1/m
At time t: D_{t+1}(i) = D_t(i) exp(−α_t y_i h_t(x_i)) / Z_t, so at time t+1, D_{t+1}(i) ∝ exp(−y_i F_t(x_i))
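The classifier-weight formula α_t = ½ ln((1 − ε_t)/ε_t) can be checked numerically (an illustrative calculation, not from the slides): a weak classifier with weighted error 0.3 receives weight about 0.424, while a classifier at chance (ε_t = 0.5) receives weight 0.

```python
import math

def classifier_weight(eps):
    """alpha_t = 0.5 * ln((1 - eps) / eps); positive iff eps < 0.5."""
    return 0.5 * math.log((1 - eps) / eps)

print(round(classifier_weight(0.3), 3))  # 0.424
print(classifier_weight(0.5))            # 0.0
```

Note that the weight grows without bound as ε_t approaches 0, and becomes negative for ε_t > 0.5 (a worse-than-chance classifier is used with its vote flipped).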
32
AdaBoost & Its Applications
  • AdaBoost for Face Detection

33
The Task of Face Detection
Many slides adapted from P. Viola
34
Basic Idea
  • Slide a window across the image and evaluate a face
    model at every location.

35
Challenges
  • Slide a window across the image and evaluate a face
    model at every location.
  • A sliding-window detector must evaluate tens of
    thousands of location/scale combinations.
  • Faces are rare: 0-10 per image
  • For computational efficiency, we should try to
    spend as little time as possible on the non-face
    windows
  • A megapixel image has ~10^6 pixels and a
    comparable number of candidate face locations
  • To avoid having a false positive in every image,
    our false positive rate has to be less than 10^-6

36
The Viola/Jones Face Detector
  • A seminal approach to real-time object detection
  • Training is slow, but detection is very fast
  • Key ideas
  • Integral images for fast feature evaluation
  • Boosting for feature selection
  • Attentional cascade for fast rejection of
    non-face windows

P. Viola and M. Jones. Rapid object detection
using a boosted cascade of simple features. CVPR
2001.
P. Viola and M. Jones. Robust real-time face
detection. IJCV 57(2), 2004.
37
Image Features
Rectangle filters
38
Image Features
Rectangle filters
39
Size of Feature Space
  • How many possible rectangle features are there
    for a 24x24 detection region?

[Figure: rectangle filter types A, B, C, D]
Rectangle filters
40
Feature Selection
  • How many possible rectangle features are there
    for a 24x24 detection region?

[Figure: rectangle filter types A, B, C, D]
What features are good for face detection?
41
Feature Selection
  • How many possible rectangle features are there
    for a 24x24 detection region?

[Figure: rectangle filter types A, B, C, D]
  • Can we create a good classifier using just a
    small subset of all possible features?
  • How to select such a subset?

42
Integral images
  • The integral image computes a value at each pixel
    (x, y) that is the sum of the pixel values above
    and to the left of (x, y), inclusive.

(x, y)
43
Computing the Integral Image
  • The integral image computes a value at each pixel
    (x, y) that is the sum of the pixel values above
    and to the left of (x, y), inclusive.
  • This can quickly be computed in one pass through
    the image.

(x, y)
44
Computing Sum within a Rectangle
[Figure: rectangle sum computed from the integral image values at the four corner points A, B, C, D]
Only 3 additions are required for any size of
rectangle!
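The integral image and the four-corner rectangle sum above can be sketched as follows (a minimal illustration; cumulative sums along both axes build the table in one pass, and any rectangle sum then takes four table lookups, i.e. three additions/subtractions):

```python
import numpy as np

def integral_image(img):
    """ii(x, y) = sum of img over all pixels above and to the left, inclusive."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, bottom, right):
    """Sum of img[top:bottom+1, left:right+1] via four corner lookups."""
    total = ii[bottom, right]
    if top > 0:
        total -= ii[top - 1, right]      # strip above the rectangle
    if left > 0:
        total -= ii[bottom, left - 1]    # strip to the left
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]   # corner subtracted twice; add back
    return total

img = np.arange(16).reshape(4, 4)
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 2, 2))              # 5 + 6 + 9 + 10 = 30
print(rect_sum(ii, 0, 0, 3, 3) == img.sum()) # True
```

Because the cost is independent of rectangle size, every rectangle filter can be evaluated in constant time at any scale.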
45
Scaling
  • Integral image enables us to evaluate all
    rectangle sizes in constant time.
  • Therefore, no image scaling is necessary.
  • Scale the rectangular features instead!

[Figure: rectangle features evaluated at multiple scales and positions]
46
Boosting
  • Boosting is a classification scheme that works by
    combining weak learners into a more accurate
    ensemble classifier
  • A weak learner need only do better than chance
  • Training consists of multiple boosting rounds
  • During each boosting round, we select a weak
    learner that does well on examples that were hard
    for the previous weak learners
  • Hardness is captured by weights attached to
    training examples

Y. Freund and R. Schapire. A short introduction
to boosting. Journal of Japanese Society for
Artificial Intelligence, 14(5):771-780,
September, 1999.
47
The AdaBoost Algorithm
Given: (x_1, y_1), …, (x_m, y_m), where y_i ∈ {−1, +1}
Initialization: D_1(i) = 1/m
For t = 1, …, T:
  • Find the classifier h_t which minimizes error w.r.t. D_t, i.e.,
    ε_t = Σ_i D_t(i) · 1[h_t(x_i) ≠ y_i]
  • Weight classifier: α_t = ½ ln((1 − ε_t)/ε_t)
  • Update distribution: D_{t+1}(i) = D_t(i) exp(−α_t y_i h_t(x_i)) / Z_t

Output final classifier: H(x) = sign(Σ_{t=1}^T α_t h_t(x))
49
Weak Learners for Face Detection
What base learner is appropriate for face detection?
Recap: each round finds the weak classifier h_t minimizing the weighted error ε_t under D_t, weights it by α_t = ½ ln((1 − ε_t)/ε_t), updates the distribution, and outputs H(x) = sign(Σ_t α_t h_t(x)).
50
Weak Learners for Face Detection
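In Viola-Jones the weak learner is a decision stump over a single rectangle-filter value: h(x) = 1 if p·f(x) < p·θ, else 0, where f is the filter response, θ a threshold, and p ∈ {+1, −1} a polarity that flips the inequality. A minimal sketch (parameter values here are illustrative, not from the trained detector):

```python
def stump(f_value, theta, polarity):
    """Viola-Jones weak classifier: 1 ("face") if p*f(x) < p*theta, else 0.
    polarity in {+1, -1} selects the direction of the inequality."""
    return 1 if polarity * f_value < polarity * theta else 0

print(stump(f_value=3.0, theta=5.0, polarity=1))   # 1: response below threshold
print(stump(f_value=3.0, theta=5.0, polarity=-1))  # 0: inequality reversed
```

One such stump is trained per candidate rectangle filter, and boosting then picks the best stump each round.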
51
Boosting
  • Training set contains face and non-face examples
  • Initially, all examples have equal weight
  • For each round of boosting
  • Evaluate each rectangle filter on each example
  • Select best threshold for each filter
  • Select best filter/threshold combination
  • Reweight examples
  • Computational complexity of learning O(MNK)
  • M rounds, N examples, K features
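The "select best threshold for each filter" step can be done efficiently by sorting the filter's responses once and sweeping candidate thresholds (a sketch of one common O(N log N) approach; the slides do not spell out the procedure, and the names are illustrative). With polarity +1 the stump predicts +1 for responses at or below the threshold; polarity −1 flips this.

```python
import numpy as np

def best_threshold(f, y, w):
    """Sweep sorted responses f; at each candidate threshold compute the
    weighted error of predicting +1 at-or-below it (and the flipped rule).
    y in {-1, +1}, w = example weights. Returns (error, threshold, polarity).
    Ties in f are ignored for simplicity."""
    order = np.argsort(f)
    f, y, w = f[order], y[order], w[order]
    total_pos = w[y == 1].sum()
    total_neg = w[y == -1].sum()
    # Trivial thresholds: predict everything -1 (err = total_pos) or +1.
    best = min((total_pos, -np.inf, 1), (total_neg, -np.inf, -1))
    pos_below = 0.0   # weight of positives at or below the current threshold
    neg_below = 0.0
    for i in range(len(f)):
        if y[i] == 1:
            pos_below += w[i]
        else:
            neg_below += w[i]
        # polarity +1: errors = negatives below + positives above
        err_p1 = neg_below + (total_pos - pos_below)
        err_m1 = pos_below + (total_neg - neg_below)
        if err_p1 < best[0]:
            best = (err_p1, f[i], 1)
        if err_m1 < best[0]:
            best = (err_m1, f[i], -1)
    return best
```

Repeating this for all K filters over N examples across M rounds gives the O(MNK) learning cost quoted above (up to the sort).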

52
Features Selected by Boosting
First two features selected by boosting
This feature combination can yield a 100% detection
rate and a 50% false positive rate
53
ROC Curve for 200-Feature Classifier
A 200-feature classifier can yield a 95% detection
rate and a false positive rate of 1 in 14084.
Not good enough!
To be practical for real applications, the false
positive rate must be closer to 1 in 1,000,000.
54
Attentional Cascade
  • We start with simple classifiers which reject
    many of the negative sub-windows while detecting
    almost all positive sub-windows
  • Positive response from the first classifier
    triggers the evaluation of a second (more
    complex) classifier, and so on
  • A negative outcome at any point leads to the
    immediate rejection of the sub-window

Classifier 1 → Classifier 2 → Classifier 3 → …
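The cascade's control flow can be sketched as follows (illustrative only; `stages` stands for a list of trained stage classifiers, each returning True for "possibly a face", and the toy stages here just check properties of a number):

```python
def cascade_classify(window, stages):
    """Evaluate stage classifiers in order; a negative outcome at any
    stage rejects the sub-window immediately."""
    for stage in stages:
        if not stage(window):
            return False    # rejected: most non-faces exit after a few stages
    return True             # survived every stage: report a face

# Toy stages standing in for increasingly complex classifiers.
stages = [lambda w: w > 0, lambda w: w % 2 == 0, lambda w: w < 100]
print(cascade_classify(8, stages))   # True
print(cascade_classify(-4, stages))  # False (rejected by the first stage)
```

The speedup comes from early exits: cheap first stages discard the bulk of sub-windows, so later, expensive stages run on only a tiny fraction of them.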
55
Attentional Cascade
  • Chain classifiers that are progressively more
    complex and have lower false positive rates

Classifier 1 → Classifier 2 → Classifier 3 → …
56
Detection Rate and False Positive Rate for
Chained Classifiers
  • The detection rate and the false positive rate of
    the cascade are found by multiplying the
    respective rates of the individual stages
  • A detection rate of 0.9 and a false positive rate
    on the order of 10^-6 can be achieved by a
    10-stage cascade if each stage has a detection
    rate of 0.99 (0.99^10 ≈ 0.9) and a false positive
    rate of about 0.30 (0.30^10 ≈ 6×10^-6)
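The per-stage rate multiplication above can be checked directly (illustrative arithmetic only):

```python
detection = 0.99 ** 10   # 10 stages, each with 0.99 detection rate
false_pos = 0.30 ** 10   # 10 stages, each with 0.30 false positive rate

print(round(detection, 3))            # 0.904
print(round(false_pos * 1e6, 2))      # 5.9  (i.e. about 6 x 10^-6)
```

This also shows the design tension: every stage must have a very high detection rate, because even small per-stage losses compound across the cascade.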

Classifier 1 → Classifier 2 → Classifier 3 → …
57
Training the Cascade
  • Set target detection and false positive rates for
    each stage
  • Keep adding features to the current stage until
    its target rates have been met
  • Need to lower AdaBoost threshold to maximize
    detection (as opposed to minimizing total
    classification error)
  • Test on a validation set
  • If the overall false positive rate is not low
    enough, then add another stage
  • Use false positives from current stage as the
    negative training examples for the next stage

58
Training the Cascade
59
ROC Curves: Cascaded Classifier vs. Monolithic
Classifier
60
ROC Curves: Cascaded Classifier vs. Monolithic
Classifier
  • There is little difference between the two in
    terms of accuracy.
  • There is a big difference in terms of speed.
  • The cascaded classifier is nearly 10 times faster
    since its first stage throws out most non-faces
    so that they are never evaluated by subsequent
    stages.

61
The Implemented System
  • Training data
  • 5000 faces, all frontal, rescaled to 24x24 pixels
  • 300 million non-face sub-windows, sampled from
    9500 non-face images
  • Faces are normalized for scale and translation
  • Many variations: across individuals,
    illumination, and pose

62
Structure of the Detector Cascade
  • Combining successively more complex classifiers
    in a cascade
  • 38 stages, with a total of 6060 features

63
Structure of the Detector Cascade
All Sub-Windows → stage 1 → 2 → 3 → … → 38 → Face
A "T" (pass) at each stage advances the sub-window to the next stage; an "F" (fail) at any stage leads to Reject Sub-Window
64
Speed of the Final Detector
  • On a 700 MHz Pentium III processor, the face
    detector can process a 384x288 pixel image in
    about 0.067 seconds
  • 15 Hz
  • 15 times faster than previous detectors of
    comparable accuracy (Rowley et al., 1998)
  • An average of 8 features is evaluated per window
    on the test set

65
Image Processing
  • Training: all example sub-windows were variance-
    normalized to minimize the effect of different
    lighting conditions
  • Detection: sub-windows are variance-normalized
    as well

66
Scanning the Detector
  • Scaling is achieved by scaling the detector
    itself, rather than scaling the image
  • Good detection results for a scaling factor of 1.25
  • The detector is scanned across locations
  • Subsequent locations are obtained by shifting the
    window by s·Δ pixels, where s is the current scale
    and Δ is the step size
  • Results for Δ = 1.0 and Δ = 1.5 were reported
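The scanning strategy can be sketched as follows (an illustrative sketch; the 24-pixel base size and 1.25 scaling factor follow the slides, while the generator interface and integer rounding are assumptions of this example):

```python
def scan_locations(width, height, base=24, scale_factor=1.25, delta=1.0):
    """Yield (x, y, size) sub-windows: the detector is scaled by 1.25 per
    step and shifted by s*delta pixels at scale s; the image itself is
    never rescaled."""
    s = 1.0
    while base * s <= min(width, height):
        size = int(base * s)
        step = max(1, int(s * delta))   # coarser steps at larger scales
        for y in range(0, height - size + 1, step):
            for x in range(0, width - size + 1, step):
                yield (x, y, size)
        s *= scale_factor
```

For example, `list(scan_locations(48, 48))` enumerates windows of sizes 24, 30, 37, and 46 pixels, each fully contained in the 48x48 image.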

67
Merging Multiple Detections
68
ROC Curves for Face Detection
69
Output of Face Detector on Test Images
70
Other Detection Tasks
Facial Feature Localization
Profile Detection
Male vs. Female
71
Other Detection Tasks
Facial Feature Localization
Profile Detection
72
Other Detection Tasks
Male vs. Female
73
Conclusions
  • How AdaBoost works
  • Why AdaBoost works
  • AdaBoost for face detection
  • Rectangle features
  • Integral images for fast computation
  • Boosting for feature selection
  • Attentional cascade for fast rejection of
    negative windows