Title: A Robust Real Time Face Detection
- AdaBoost Learning Algorithm
- Face Detection in real life
- Using AdaBoost for Face Detection
- Improvements
- Demonstration
- A Short Introduction to Boosting (Freund & Schapire, 1999)
- Logistic Regression, AdaBoost and Bregman Distances (Collins, Schapire & Singer, 2002)
- The Horse-Racing Gambler Problem
- Rules of thumb for a set of races
- How should we choose the set of races in order to get the best rules of thumb?
- How should the rules be combined into a single highly accurate prediction rule?
- Boosting!
AdaBoost - the idea
- AdaBoost combines many weak classifiers into one strong classifier.
- Initialize sample weights
- For each cycle:
  - Find a classifier that performs well on the weighted sample
  - Increase weights of misclassified examples
- Return a weighted list of classifiers
[Figure: example plots with a "Shoe size" axis]
AdaBoost - algorithm
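The loop outlined earlier (initialize the weights, repeatedly fit a weak classifier to the weighted sample, increase the weights of the mistakes, return a weighted list) could look like the following minimal sketch. It assumes labels in {-1, +1}, one-dimensional inputs, and simple threshold stumps as the weak classifiers; the names `train_stump`, `adaboost` and `predict` are illustrative and do not come from the slides.

```python
import math

def train_stump(xs, ys, w):
    """Return (weighted error, threshold, polarity) of the best 1-D stump."""
    best = None
    values = sorted(set(xs))
    # Candidate thresholds: one below the smallest value, then midpoints.
    candidates = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
    for thr in candidates:
        for polarity in (+1, -1):
            preds = [polarity if x > thr else -polarity for x in xs]
            err = sum(wi for wi, p, y in zip(w, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, thr, polarity)
    return best

def adaboost(xs, ys, rounds):
    m = len(xs)
    w = [1.0 / m] * m                       # initialize sample weights uniformly
    ensemble = []                           # weighted list: (alpha, threshold, polarity)
    for _ in range(rounds):
        err, thr, pol = train_stump(xs, ys, w)
        err = min(max(err, 1e-12), 1 - 1e-12)    # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # vote of this weak classifier
        ensemble.append((alpha, thr, pol))
        # Misclassified examples (y * h(x) = -1) gain weight, the rest lose weight.
        w = [wi * math.exp(-alpha * y * (pol if x > thr else -pol))
             for wi, x, y in zip(w, xs, ys)]
        z = sum(w)
        w = [wi / z for wi in w]            # renormalize to a distribution
    return ensemble

def predict(ensemble, x):
    """Sign of the weighted vote of all stumps."""
    score = sum(a * (pol if x > thr else -pol) for a, thr, pol in ensemble)
    return 1 if score >= 0 else -1
```

On the toy data xs = 0..9 with labels [1, 1, 1, -1, -1, -1, 1, 1, 1, -1], no single stump is error-free, but three rounds are enough for the weighted vote to classify every training point correctly.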
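AdaBoost training error
- Freund and Schapire (1997) proved an upper bound on the training error of the final hypothesis; the bound shrinks exponentially fast as long as every weak hypothesis is slightly better than random guessing.
- AdaBoost ADApts to the error rates of the individual weak hypotheses.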
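For reference, the training error bound of Freund and Schapire (1997) is usually written as follows. The symbols are assumptions, since none are defined here: H is the final classifier, m the number of training examples, T the number of rounds, ε_t the weighted error of the t-th weak hypothesis, and γ_t = 1/2 - ε_t its edge over random guessing.

```latex
\frac{1}{m}\,\bigl|\{\, i : H(x_i) \neq y_i \,\}\bigr|
  \;\le\; \prod_{t=1}^{T} 2\sqrt{\epsilon_t (1 - \epsilon_t)}
  \;=\; \prod_{t=1}^{T} \sqrt{1 - 4\gamma_t^2}
  \;\le\; \exp\!\Bigl(-2 \sum_{t=1}^{T} \gamma_t^2\Bigr)
```

Each factor depends on that round's own ε_t, which is the sense in which the algorithm adapts to the individual error rates.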
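AdaBoost generalization error
- Freund and Schapire (1997) showed that the generalization error can be bounded in terms of the training error, the sample size, the VC-dimension of the weak hypothesis space, and the number of boosting rounds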
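The bound is commonly quoted in the form below, holding with high probability. The notation is an assumption: m is the sample size, d the VC-dimension of the weak hypothesis space, T the number of boosting rounds, and the hat denotes the empirical probability on the training sample.

```latex
\Pr_{D}\bigl[H(x) \neq y\bigr]
  \;\le\; \hat{\Pr}\bigl[H(x) \neq y\bigr]
  + \tilde{O}\!\left(\sqrt{\frac{T d}{m}}\right)
```

The second term grows with T, which is why this analysis suggests overfitting when boosting is run for many rounds.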
AdaBoost generalization error
- The analysis implies that boosting will overfit if run for too many rounds
- However, it was observed empirically that AdaBoost does not overfit, even when run for thousands of rounds.
- Moreover, it was observed that the generalization error continues to decrease long after the training error has reached zero
AdaBoost generalization error
- An alternative analysis, in terms of the margins of the training examples, was presented by Schapire et al. (1998); it agrees with the empirical findings
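The margin-based bound of Schapire et al. (1998) is usually stated as below, for any θ > 0 and with high probability. The notation is an assumption: α_t and h_t are the weights and weak hypotheses of the final vote, m is the sample size and d the VC-dimension of the weak hypothesis space.

```latex
\Pr_{D}\bigl[H(x) \neq y\bigr]
  \;\le\; \hat{\Pr}\bigl[\mathrm{margin}_f(x, y) \le \theta\bigr]
  + \tilde{O}\!\left(\sqrt{\frac{d}{m\,\theta^{2}}}\right),
\qquad
\mathrm{margin}_f(x, y) = \frac{y \sum_{t} \alpha_t h_t(x)}{\sum_{t} |\alpha_t|}
```

The bound does not depend on the number of rounds T, and further rounds can keep enlarging the margins after the training error reaches zero.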
AdaBoost different point of view
- We try to solve the problem of approximating the labels $y_i$ using a linear combination of weak hypotheses
- In other words, we are interested in the problem of finding a vector of parameters $a$ such that $f(x_i) = \sum_j a_j h_j(x_i)$ is a good approximation of $y_i$
- For classification problems we try to match the sign of $f(x_i)$ to $y_i$
AdaBoost different point of view
- Sometimes it is advantageous to minimize some other (non-negative) loss function instead of the number of classification errors
- For AdaBoost the loss function is the exponential loss, $\sum_i \exp(-y_i f(x_i))$
- This point of view was used by Collins, Schapire and Singer (2002) to demonstrate that AdaBoost converges to optimality
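To make the contrast between the two objectives concrete, the sketch below computes the number of classification errors and the exponential loss, sum over i of exp(-y_i f(x_i)), for a list of labels and real-valued scores f(x_i). The function names and the toy numbers in the usage note are illustrative assumptions.

```python
import math

def zero_one_loss(ys, scores):
    """Number of examples where sign(f(x_i)) does not match y_i."""
    return sum(1 for y, s in zip(ys, scores) if y * s <= 0)

def exp_loss(ys, scores):
    """Exponential loss: sum over i of exp(-y_i * f(x_i))."""
    return sum(math.exp(-y * s) for y, s in zip(ys, scores))
```

Since exp(-z) is at least 1 whenever z is not positive, the exponential loss is always an upper bound on the error count, so driving it down also drives the number of mistakes down. For example, with labels [1, -1, 1, -1] and scores [2.0, -0.5, -0.3, 1.0] there are 2 errors and the exponential loss is about 4.81.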
Face Detection (not face recognition)
Face Detection in Monkeys
- There are cells that detect faces
Face Detection in Humans
- There are processes of face detection
Faces Are Special
- We analyze faces in a different way
Face Recognition in Humans
- We analyze faces in a specific location
Robust Real-Time Face Detection
- Picture analysis, Integral Image
- The system classifies images based on the value of simple features
Value = Σ (pixels in white area) - Σ (pixels in black area)
Contrast Features
- Notice that each feature is tied to a specific location in the sub-window
- Why features and not pixels?
  - Encode domain knowledge
  - A feature-based system operates faster
  - Inspiration from human V1
- Later we will see that there are other features that can be used to implement an efficient face detector
- The original system of Viola and Jones used only rectangle features
Computing Features
- Given a detection resolution of 24x24, and size of 200x200, the set of rectangle features is 160,000!
- We need to find a way to rapidly compute the features
Integral Image
- Intermediate representation of the image
- Computed in one pass over the original image
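Integral Image
Using the integral image representation, one can compute the sum over any rectangle in constant time. For example, the sum inside rectangle D can be computed as ii(4) + ii(1) - ii(2) - ii(3), where ii(1), ii(2), ii(3) and ii(4) are the integral image values at D's top-left, top-right, bottom-left and bottom-right corners.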
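A minimal sketch of the idea, assuming the image is a list of rows of numbers and using a zero-padded integral image so that the four-corner lookup needs no special cases at the border. The function names, and the choice of a left-white, right-black two-rectangle feature, are illustrative assumptions.

```python
def integral_image(img):
    """ii[y][x] = sum of img over rows < y and columns < x (one pass)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0                          # running sum along the current row
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, top, left, height, width):
    """Rectangle sum from four lookups: ii(4) + ii(1) - ii(2) - ii(3)."""
    bottom, right = top + height, left + width
    return ii[bottom][right] + ii[top][left] - ii[top][right] - ii[bottom][left]

def two_rect_feature(ii, top, left, height, width):
    """White (left half) minus black (right half) of the given rectangle."""
    half = width // 2
    white = rect_sum(ii, top, left, height, half)
    black = rect_sum(ii, top, left + half, height, half)
    return white - black
```

Whatever the rectangle size, `rect_sum` touches exactly four entries of `ii`, so a two-rectangle feature costs a fixed number of lookups at any scale.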
Building a Detector
- Cascading, training a cascade
Main Ideas
- The features will be used as weak classifiers
- We will concatenate several detectors serially into a cascade
- We will boost (using a version of AdaBoost) a number of features to get good enough detectors
Weak Classifiers
- Weak classifier: a feature which best separates the examples
- Given a sub-window (x), a feature (f), a threshold (T), and a polarity (p) indicating the direction of the inequality, the weak classifier is $h(x, f, p, T) = 1$ if $p\,f(x) < p\,T$, and $0$ otherwise
Weak Classifiers
- A weak classifier is a combination of a feature and a threshold
- We have K features
- We have N thresholds, where N is the number of examples
- Thus there are KN weak classifiers
Weak Classifier Selection
- For each feature, sort the examples based on feature value
- For each element, evaluate the total sum of positive/negative example weights (T+/T-) and the sum of positive/negative weights below the current example (S+/S-)
- The error for a threshold which splits the range between the current and previous example in the sorted list is $e = \min\bigl(S^{+} + (T^{-} - S^{-}),\; S^{-} + (T^{+} - S^{+})\bigr)$
An example
x    y    f    W     T+    T-    S+    S-    A     B     e
X1   -1   2    1/5   3/5   2/5   0     0     2/5   3/5   2/5
X2   -1   3    1/5   3/5   2/5   0     1/5   1/5   4/5   1/5
X3    1   5    1/5   3/5   2/5   0     2/5   0     5/5   0
X4    1   7    1/5   3/5   2/5   1/5   2/5   1/5   4/5   1/5
X5    1   8    1/5   3/5   2/5   2/5   2/5   2/5   3/5   2/5
Here A = S+ + (T- - S-), B = S- + (T+ - S+), and e = min(A, B).
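The table can be reproduced with a single pass over the examples sorted by feature value. This is a sketch under stated assumptions: labels are in {-1, +1}, S+ and S- count only the examples strictly below the current one, and the function name `threshold_errors` is illustrative.

```python
from fractions import Fraction

def threshold_errors(values, labels, weights):
    """Return one (feature value, A, B, e) row per example, in sorted order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    t_pos = sum(w for w, y in zip(weights, labels) if y == 1)    # T+
    t_neg = sum(w for w, y in zip(weights, labels) if y == -1)   # T-
    s_pos = s_neg = 0                     # S+ and S-: weights strictly below
    rows = []
    for i in order:
        a = s_pos + (t_neg - s_neg)   # error if everything below is called negative
        b = s_neg + (t_pos - s_pos)   # error if everything below is called positive
        rows.append((values[i], a, b, min(a, b)))
        if labels[i] == 1:
            s_pos += weights[i]
        else:
            s_neg += weights[i]
    return rows

weights = [Fraction(1, 5)] * 5
rows = threshold_errors([2, 3, 5, 7, 8], [-1, -1, 1, 1, 1], weights)
best = min(rows, key=lambda r: r[3])  # threshold just below f = 5, with e = 0
```

After the sort, every candidate threshold for a feature is scored in one linear scan, which is what makes searching all KN weak classifiers affordable.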
Main Ideas
- The features will be used as weak classifiers
- We will concatenate several detectors serially into a cascade
- We will boost (using a version of AdaBoost) a number of features to get good enough detectors
- We start with simple classifiers which reject many of the negative sub-windows while detecting almost all positive sub-windows
- A positive result from the first classifier triggers the evaluation of a second (more complex) classifier, and so on
- A negative outcome at any point leads to the immediate rejection of the sub-window
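The early-rejection behaviour described above can be sketched as a short loop. The data layout is an assumption made for illustration: each stage is a pair of a list of (alpha, predict) weak classifiers and a stage threshold, where predict(window) returns 0 or 1.

```python
def cascade_classify(stages, window):
    """Return True only if every stage of the cascade accepts the sub-window."""
    for weak_classifiers, stage_threshold in stages:
        # Weighted vote of the boosted weak classifiers in this stage.
        score = sum(alpha * predict(window) for alpha, predict in weak_classifiers)
        if score < stage_threshold:
            return False          # negative outcome: reject immediately
    return True                   # accepted by every stage

# Toy cascade over plain numbers standing in for sub-windows.
stages = [
    ([(1.0, lambda w: int(w > 0))], 0.5),                                   # cheap first stage
    ([(1.0, lambda w: int(w > 5)), (1.0, lambda w: int(w % 2 == 0))], 1.5), # stricter second stage
]
```

In this toy setup the value 8 passes both stages, 3 is rejected by the second stage, and -1 is rejected by the first stage without the second ever being evaluated.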
Training a cascade
- User selects values for:
  - Maximum acceptable false positive rate per layer
  - Minimum acceptable detection rate per layer
  - Target overall false positive rate
- User gives a set of positive and negative examples
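Because a sub-window must pass every layer, the per-layer rates multiply: n layers that each achieve false positive rate f and detection rate d give roughly f^n and d^n overall. The sketch below uses that relation to show how the three user-selected values interact; the function names and the numbers in the usage note are illustrative assumptions, not values from the slides.

```python
def layers_needed(max_fp_per_layer, target_fp):
    """Smallest n such that max_fp_per_layer ** n <= target_fp."""
    n = 0
    rate = 1.0
    while rate > target_fp:
        rate *= max_fp_per_layer   # each extra layer multiplies the overall rate
        n += 1
    return n

def overall_detection_rate(min_det_per_layer, layers):
    """Detection rate of the whole cascade if every layer just meets its minimum."""
    return min_det_per_layer ** layers
```

For instance, a per-layer false positive rate of 0.5 needs 10 layers to get below an overall target of 0.001, and if each of those layers keeps 99% of the faces, the cascade as a whole keeps about 90% of them.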
Training a cascade (cont.)
- While the overall false positive rate target is not met:
  - Add a new layer
  - While the false positive rate of the current layer is more than the maximum per layer:
    - Train a classifier with n features using AdaBoost on the set of positive and negative examples, increasing n on each pass
    - Decrease the threshold for the current classifier until the detection rate of the layer is more than the minimum
    - Evaluate the current cascade classifier on the validation set
  - Evaluate the current cascade detector on a set of non-face images and put any false detections into the negative training set
Training Data Set
- 4916 hand-labeled faces
- Aligned to base resolution (24x24)
- Non-faces for the first layer were collected from 9500 non-face images
- Non-faces for subsequent layers were obtained by scanning the partial cascade across non-face images and collecting false positives (max 6000 for each layer)
Structure of the Detector
- 38 layer cascade
- 6060 features
Speed of final Detector
- On a 700 MHz Pentium III processor, the face detector can process a 384 by 288 pixel image in about 0.067 seconds (roughly 15 frames per second)
- Learning Object Detection from a Small Number of Examples: the Importance of Good Features (Levi & Weiss, 2004)
- Performance depends crucially on the features that are used to represent the objects (Levi & Weiss, 2004)
- Good features imply:
  - Good results from small training databases
  - Better generalization abilities
  - Shorter (faster) classifiers
Edge Orientation Histogram
- Invariant to global illumination changes
- Captures geometric properties of faces
- Domain knowledge represented:
  - The inner part of the face includes more horizontal edges than vertical ones
  - The ratio between vertical and horizontal edges is bounded
  - The area of the eyes includes mainly horizontal edges
  - The chin has more or less the same number of oblique edges on both sides
Edge Orientation Histogram
- The EOH can be calculated using a variant of the Integral Image
- We find the gradients at the point (x,y) using Sobel masks
- We calculate the orientation of the edge at (x,y)
- We divide the edges into K bins
- The result is stored in K matrices
- We use the same idea of Integral Image for the
EOH Features
- The ratio between two orientations
- The dominance of a given orientation
- Symmetry features
- Already with only 250 positive examples we can see a detection rate above 90%
- Faster classifier
- Better performance on profile faces
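The EOH steps (Sobel gradients, orientation, K bins, K matrices, one integral image per matrix) and the ratio and dominance features could be sketched as follows. This is a loose illustration rather than the authors' implementation: it bins the gradient direction modulo 180 degrees, stores the gradient magnitude in the matching bin matrix, and adds a small `eps` to avoid division by zero. All function names and `eps` are assumptions, and symmetry features are not covered.

```python
import math

def orientation_bin_maps(img, k):
    """Return k matrices; matrix b holds the gradient magnitude of pixels in bin b."""
    h, w = len(img), len(img[0])
    maps = [[[0.0] * w for _ in range(h)] for _ in range(k)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Sobel masks for the x and y derivatives.
            gx = (img[y-1][x+1] + 2 * img[y][x+1] + img[y+1][x+1]
                  - img[y-1][x-1] - 2 * img[y][x-1] - img[y+1][x-1])
            gy = (img[y+1][x-1] + 2 * img[y+1][x] + img[y+1][x+1]
                  - img[y-1][x-1] - 2 * img[y-1][x] - img[y-1][x+1])
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            theta = math.atan2(gy, gx) % math.pi         # direction in [0, pi)
            b = min(int(theta / math.pi * k), k - 1)     # bin index
            maps[b][y][x] = mag
    return maps

def integral_image(m):
    """Zero-padded integral image of a matrix."""
    h, w = len(m), len(m[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += m[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, top, left, height, width):
    bottom, right = top + height, left + width
    return ii[bottom][right] + ii[top][left] - ii[top][right] - ii[bottom][left]

def ratio_feature(iis, k1, k2, rect, eps=1e-6):
    """Ratio between the edge energy of two orientations inside rect."""
    return (rect_sum(iis[k1], *rect) + eps) / (rect_sum(iis[k2], *rect) + eps)

def dominance_feature(iis, k, rect, eps=1e-6):
    """Share of the edge energy inside rect that falls in orientation bin k."""
    total = sum(rect_sum(ii, *rect) for ii in iis)
    return (rect_sum(iis[k], *rect) + eps) / (total + eps)

# Toy image with a single vertical step edge; rect is (top, left, height, width).
img = [[0, 0, 0, 10, 10, 10] for _ in range(6)]
iis = [integral_image(m) for m in orientation_bin_maps(img, 4)]
rect = (0, 0, 6, 6)
```

On this toy image all of the gradient energy lands in bin 0, so `dominance_feature(iis, 0, rect)` is 1 and `ratio_feature(iis, 0, 2, rect)` is very large.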
Demo
Implementing Viola Jones system
Frank Fritze, 2004