Title: Face Recognition: A Literature Survey
1. Face Recognition: A Literature Survey
- By W. Zhao, R. Chellappa, P.J. Phillips, and A. Rosenfeld
- Presented by Shane Brennan
- 5/02/2005
2. Early Methods of Recognition
- Early methods treated recognition as a problem of 2D pattern recognition.
- Methods included distance-measuring algorithms: these determined the distances between important facial features and compared them to the corresponding distances on known faces (a sketch follows below).
- These methods were fairly inaccurate and performed poorly under variations in orientation and size, but did well under variations in intensity.
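A minimal Python sketch of this distance-measuring idea (the landmark layout, names, and gallery values here are hypothetical illustrations, not from the survey): each face is reduced to its pairwise inter-feature distances and matched by nearest neighbor.

import numpy as np

def pairwise_distances(landmarks):
    """Flatten all pairwise Euclidean distances between 2D landmarks."""
    n = len(landmarks)
    d = [np.linalg.norm(landmarks[i] - landmarks[j])
         for i in range(n) for j in range(i + 1, n)]
    return np.array(d)

def recognize(probe, gallery):
    """Return the gallery label whose distance vector is closest to the probe's."""
    probe_vec = pairwise_distances(probe)
    best = min(gallery.items(),
               key=lambda kv: np.linalg.norm(pairwise_distances(kv[1]) - probe_vec))
    return best[0]

# Hypothetical landmarks: two eyes, nose tip, two mouth corners (x, y in pixels).
gallery = {
    "alice": np.array([[30, 40], [70, 40], [50, 60], [35, 80], [65, 80]], float),
    "bob":   np.array([[28, 42], [75, 42], [52, 65], [33, 85], [70, 85]], float),
}
probe = np.array([[31, 41], [69, 39], [50, 61], [36, 79], [64, 81]], float)
print(recognize(probe, gallery))  # expected: "alice" (probe is nearest to alice)

Note that raw pixel distances are neither scale- nor rotation-invariant, which is exactly why these early methods degraded under variations in orientation and size.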
3. More Modern Approaches
- Among appearance-based methods, eigenfaces and Fisherfaces have proved effective in experiments involving large databases.
- Feature-based graph matching approaches have been successful as well, and are less sensitive to variations in illumination and viewpoint, as well as to inaccuracy in face localization.
- Feature extraction techniques in graph matching approaches are currently inadequate; for example, they cannot detect an eye if the eyelid is closed.
4. Lessons That Have Been Learned
- The upper half of the face aids recognition more than the lower half. Bottom lighting may actually make it more difficult to recognize a face.
- The nose is not as significant as the eyes, ears, and mouth in recognition, although in profile views a distinctive nose can help greatly.
- Low-frequency components play a dominant role, enough to identify the gender of the face, although higher frequency bands are necessary for recognition.
5. Three Aspects of Recognition
- Face detection: locating the faces in an image or video sequence.
- Feature extraction: finding the locations of the eyes, nose, mouth, etc.
- Face recognition: identifying the face(s) in the input image or video.
- Face detection and feature extraction may be performed simultaneously.
6. Face Detection
- Detection is considered successful if the presence and rough location of a face are correctly identified.
- Two statistics are important: true positives (correct detections) and false positives (incorrect detections).
- Multi-view-based methods do much better than invariant-feature methods when head rotation is large.
7. Face Detection, continued
- By treating detection as a two-class (face vs. non-face) problem, false positives can be reduced while maintaining a high true-positive rate. This is done by retraining the system on examples it falsely detects as faces (bootstrapping); a sketch follows below.
- Appearance-based methods have achieved the best results in face detection, compared to feature-based and template-matching methods.
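A rough Python sketch of the bootstrapping loop (the feature arrays are stand-ins, and scikit-learn's LogisticRegression is used only as a placeholder two-class model, not the detectors the survey describes):

import numpy as np
from sklearn.linear_model import LogisticRegression

def bootstrap_train(clf, faces, nonface_pool, rounds=3, batch=500):
    """Iteratively retrain a face/non-face classifier on its own false positives.

    faces        : array of face feature vectors (positives)
    nonface_pool : large array of non-face vectors to mine negatives from
    """
    rng = np.random.default_rng(0)
    negatives = nonface_pool[rng.choice(len(nonface_pool), batch, replace=False)]
    for _ in range(rounds):
        X = np.vstack([faces, negatives])
        y = np.r_[np.ones(len(faces)), np.zeros(len(negatives))]
        clf.fit(X, y)
        # Mine the pool for windows the detector wrongly calls "face" ...
        fp = nonface_pool[clf.predict(nonface_pool) == 1]
        if len(fp) == 0:
            break
        # ... and fold them back into the negative training set.
        negatives = np.vstack([negatives, fp[:batch]])
    return clf

# Usage (hypothetical data): bootstrap_train(LogisticRegression(max_iter=1000), faces, pool)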
8. Feature Extraction
- Feature extraction is the most important part of face recognition; even holistic methods need accurate feature locations for normalization.
- Some methods use feature restoration, which fills in occluded parts of the face using symmetry.
- Three basic approaches: edge detection methods, feature-template methods, and structural matching methods that take geometric constraints on features into consideration.
9. Feature Extraction, continued
- Early methods used template approaches that focused on individual features.
- These methods fail when those important features are occluded or obscured.
- More recent methods use structural matching methods such as Active Shape Models (ASM), which are more robust to variations in image intensity and feature shape.
10. ASM
- Create a model of the features you wish to find, defined by a series of model points and the connections between them.
- Overlay the model onto the image. Examine the region around each model point to find the image point that best matches it, move the model point there, and update the model.
- The matching is usually done using image edges.
- Repeat this process for several iterations until convergence, i.e. the model points no longer move far (a skeleton of this loop follows below).
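A skeleton of this fitting loop in Python (assumed inputs: a binary edge map and initial model point positions; a real ASM would also re-project the updated points onto the learned shape model at each iteration, which is only noted in a comment here):

import numpy as np

def fit_asm(image_edges, model_points, search_radius=5, max_iters=20, tol=0.5):
    """Skeleton of the iterative ASM search loop described above.

    image_edges  : HxW binary edge map
    model_points : (N, 2) array of (row, col) model point positions
    """
    points = model_points.astype(float).copy()
    edge_rc = np.argwhere(image_edges)          # coordinates of all edge pixels
    for _ in range(max_iters):
        moved = 0.0
        for i, p in enumerate(points):
            # Use the nearest edge pixel in the search window as the "best match"
            # (a simplification of matching along a sampled profile).
            d = np.linalg.norm(edge_rc - p, axis=1)
            j = np.argmin(d)
            if d[j] > search_radius:
                continue                        # no edge nearby; leave point in place
            moved = max(moved, d[j])
            points[i] = edge_rc[j].astype(float)
        # A full ASM would now project `points` back onto the learned shape model
        # to enforce the geometric constraints between points (omitted here).
        if moved < tol:                         # converged: points barely move
            break
    return points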
11. Examples of ASM Implementations
A successful match (top) and a semi-successful match (bottom).
12. ASM, continued
- Suppose you take k samples on either side of a model point; this provides a vector of 2k+1 sample points. Call this vector g_i.
- Normalize the sample by dividing by the sum of absolute element values: g_i <- g_i / Σ_j |g_ij|
- Repeat this for each training image to obtain a set of normalized samples g_i for each model point.
- Assume the set of normalized samples is distributed as a multivariate Gaussian, and find its mean g_mean and covariance S_g (a sketch of these statistics follows below).
- Repeat this process for each model point.
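These per-point statistics are straightforward to compute; a small Python sketch (the names g_mean and S_g follow the slide; the training profiles are assumed given):

import numpy as np

def normalize_profile(g):
    """Normalize a sampled profile by the sum of absolute element values."""
    return g / np.sum(np.abs(g))

def profile_model(training_profiles):
    """Fit the per-model-point Gaussian: mean g_mean and covariance S_g.

    training_profiles : (num_images, 2k+1) raw samples for one model point
    """
    G = np.array([normalize_profile(g) for g in training_profiles])
    g_mean = G.mean(axis=0)
    S_g = np.cov(G, rowvar=False)   # (2k+1) x (2k+1) covariance matrix
    return g_mean, S_g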
13. ASM, continued
- The quality of fit (measure of accuracy) of a new sample g_s to the model is given by: f(g_s) = (g_s - g_mean)^T S_g^{-1} (g_s - g_mean)
- This is the Mahalanobis distance. Minimizing f(g_s) is equivalent to maximizing the probability that g_s comes from the distribution (see the sketch below).
- This iterative process can be sped up by the use of multi-resolution (coarse-to-fine) feature matching.
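A one-function Python sketch of this fit measure (the pseudo-inverse is used defensively in case S_g is singular when estimated from few samples, an assumption not discussed on the slide):

import numpy as np

def fit_quality(g_s, g_mean, S_g):
    """Mahalanobis distance f(g_s) = (g_s - g_mean)^T S_g^{-1} (g_s - g_mean)."""
    d = g_s - g_mean
    # pinv guards against a singular covariance estimate; lower is a better fit
    return d @ np.linalg.pinv(S_g) @ d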
14. Facial Recognition
- One successful facial recognition approach has been the use of eigenfaces.
- This involves projecting an input image into a lower-dimensional "face space" and then computing the distance between the projected input image and known faces (a sketch follows below).
- More detail on eigenfaces will be provided in my next presentation.
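A minimal Python sketch of the projection-and-distance step only (the mean face, eigenface matrix, and stored coefficients are assumed precomputed; eigenface training itself is deferred to the next presentation, as the slide says):

import numpy as np

def project(face, mean_face, eigenfaces):
    """Project a vectorized face into face space (eigenfaces: k x d matrix)."""
    return eigenfaces @ (face - mean_face)

def nearest_face(probe, mean_face, eigenfaces, known_coeffs):
    """Identify the known face whose projection is closest in Euclidean distance.

    known_coeffs : dict mapping label -> precomputed face-space coefficients
    """
    z = project(probe, mean_face, eigenfaces)
    dists = {name: np.linalg.norm(z - c) for name, c in known_coeffs.items()}
    return min(dists, key=dists.get)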
15. Linear Discriminant Analysis
- Face systems using Linear Discriminant Analysis (LDA) have also been successful.
- Training of LDA systems is carried out via scatter matrix analysis.
- For an M-class problem, the within- and between-class scatter matrices S_w and S_b are computed as follows:
  S_w = Σ_{i=1..M} Pr(w_i) C_i
  S_b = Σ_{i=1..M} Pr(w_i) (m_i - m_0)(m_i - m_0)^T
- where Pr(w_i) is the prior class probability, typically assigned the value 1/M.
16. Linear Discriminant Analysis, cont.
- C_i is the average scatter matrix (conditional covariance matrix), defined as:
  C_i = E[ (x(w) - m_i)(x(w) - m_i)^T | w = w_i ]
- S_w shows the average scatter (C_i) of the sample vectors x of the different classes w_i around their respective means m_i.
- S_b shows the scatter of the conditional mean vectors m_i around the overall mean vector m_0.
- A measure for quantifying discriminatory power is: G(T) = (T^T S_b T) / (T^T S_w T)
- The projection matrix W that optimizes this criterion can be found by solving the generalized eigenvalue problem S_b W = S_w W Λ_W (a sketch follows below).
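A Python sketch of these computations (it assumes S_w is nonsingular, which is why subspace LDA systems first reduce dimensionality with PCA; SciPy's generalized symmetric eigensolver is used as a stand-in):

import numpy as np
from scipy.linalg import eigh

def lda_projection(classes, n_components):
    """Compute S_w and S_b as on these slides and solve S_b W = S_w W Λ.

    classes : list of (n_i, d) arrays, one per class w_i
    """
    M = len(classes)
    prior = 1.0 / M                                  # Pr(w_i) = 1/M
    means = [X.mean(axis=0) for X in classes]
    m0 = np.mean(means, axis=0)                      # overall mean vector
    d = classes[0].shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for X, mi in zip(classes, means):
        Sw += prior * np.cov(X, rowvar=False)        # C_i, conditional covariance
        diff = (mi - m0)[:, None]
        Sb += prior * (diff @ diff.T)
    # Generalized symmetric eigenproblem; keep eigenvectors with largest eigenvalues.
    vals, vecs = eigh(Sb, Sw)
    order = np.argsort(vals)[::-1]
    return vecs[:, order[:n_components]]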
17. Linear Discriminant Analysis, cont.
- The basic method of this algorithm: classification is performed by projecting the input x into a subspace via a projection/basis matrix Proj (the W from the previous slide): Z = Proj x
- By comparing the projection coefficient vector Z of the input to all pre-stored projection vectors of known, labeled classes, you can identify and label the input vector.
- The vector comparison varies between systems; PCA algorithms tend to use either the angle or the Euclidean distance (see the sketch below).
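A compact Python sketch of this classification step, supporting both the Euclidean and angle comparisons mentioned above (the stored class vectors are assumed precomputed):

import numpy as np

def classify(x, Proj, stored, metric="euclidean"):
    """Project input x (Z = Proj x) and match against stored class vectors.

    stored : dict mapping class label -> pre-stored projection vector
    """
    z = Proj @ x
    def dist(c):
        if metric == "angle":     # 1 - cosine similarity, i.e. angle-based match
            return 1 - z @ c / (np.linalg.norm(z) * np.linalg.norm(c))
        return np.linalg.norm(z - c)
    return min(stored, key=lambda label: dist(stored[label]))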
18. PDBNN
- A fully automatic face detection and recognition system based on Probabilistic Decision-Based Neural Networks (PDBNN) has been proposed.
- It consists of three modules: a face detector, an eye localizer, and a face recognizer.
- The PDBNN does not use the lower face; this excludes the influence of facial expressions (smiling, frowning, etc.).
19. PDBNN, continued
- The input is broken into two features at a resolution of 14x10 pixels.
- The features are normalized intensity and edges.
- Each feature is fed into a separate PDBNN, and the final recognition result is the combination of the outputs of the two networks (a sketch follows below).
- Advantages of this implementation are that it converges quickly and is easily implemented on distributed computing platforms.
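A loose Python sketch of the two-channel idea only (the real PDBNN uses probabilistic subnets and learned fusion, so the simple gradient edge map and score sum here are stand-ins):

import numpy as np

def make_features(face_14x10):
    """Split a 14x10 upper-face window into the two channels described above."""
    img = face_14x10.astype(float)
    intensity = (img - img.mean()) / (img.std() + 1e-8)   # normalized intensity
    gy, gx = np.gradient(img)
    edges = np.hypot(gx, gy)                              # simple edge magnitude
    return intensity.ravel(), edges.ravel()

def combine_outputs(intensity_scores, edge_scores):
    """Combine per-class outputs of the two networks (here: a plain score sum).

    Each argument maps class label -> score from one feature's network.
    """
    combined = {c: intensity_scores[c] + edge_scores[c] for c in intensity_scores}
    return max(combined, key=combined.get)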
20. PDBNN, continued
- A key feature is that each individual to be recognized has a subnet in the PDBNN devoted to them.
21. EBGM
- The most successful feature-based structural matching approach has been the use of Elastic Bunch Graph Matching (EBGM) systems.
- Local features are represented by wavelet coefficients for different rotations and scales.
- These wavelet bases are referred to as "jets".
- EBGM is based on the Dynamic Link Architecture (DLA).
- DLAs use synaptic plasticity to form sets of neurons grouped into structured graphs in a neural network.
22. EBGM, continued
- The basic mechanisms are T_ij, the connection between two neurons i and j, and J_ij, a dynamic variable.
- The J-variables are the synaptic weights for signal transmission among neurons.
- The T-parameters act as constraints on the J-variables; small changes in T over time, from synaptic plasticity, cause the J-variables to change as well.
- A new image is recognized by transforming the image into a grid of jets and comparing this grid to those of known images (a much-simplified sketch follows below).
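A much-simplified Python sketch of jet extraction and grid comparison (real EBGM uses complex Gabor wavelets and elastic deformation of the graph; the cosine-carrier filter and rigid grid here are crude stand-ins):

import numpy as np

def jet(image, x, y, frequencies=(0.1, 0.2, 0.4), orientations=8, size=9):
    """Crude Gabor-style jet: filter responses at several scales and rotations.

    Assumes (x, y) lies at least size // 2 pixels inside the image.
    """
    half = size // 2
    patch = image[y - half:y + half + 1, x - half:x + half + 1].astype(float)
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    responses = []
    for f in frequencies:
        for k in range(orientations):
            theta = np.pi * k / orientations
            wave = np.cos(2 * np.pi * f * (xs * np.cos(theta) + ys * np.sin(theta)))
            env = np.exp(-(xs**2 + ys**2) / (2 * (half / 2) ** 2))  # Gaussian window
            responses.append(np.sum(patch * wave * env))
    return np.array(responses)

def grid_similarity(jets_a, jets_b):
    """Average normalized correlation between corresponding jets of two grids."""
    sims = [a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
            for a, b in zip(jets_a, jets_b)]
    return float(np.mean(sims))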
23. EBGM, continued
- The basic DLA architecture is extended to EBGM by attaching a set of jets to each grid node, instead of just one jet.
- Each jet in the set is derived from a different stored (known) face image.
- The EBGM method has been applied to: face detection and extraction, pose estimation, gender classification, sketch-image-based recognition, and general object recognition.
24. EBGM, continued
On the left is an image graph. The graph is positioned over the input image; at each node, the local jet around the corresponding image point is computed and stored. This pattern of jets is used to represent the pattern classes, and a new image is recognized by transforming it into a grid of jets and comparing it to known models. EBGM (represented by the image on the right) works the same way, but each node holds a set of jets, each derived from a different face image. Pose variation is handled by determining the pose of the face using prior class information; the jet transformations under variations in pose are then learned.
25. Results and Conclusions
- The subspace LDA system, the EBGM system, and the probabilistic eigenface system are judged to be the top three face recognition methods based on the accuracy of their results.
- Each method performs differently on different subsets of images.
- When the number of training samples per class is large, LDA performs best. When only one or two samples are available per face class, PCA (eigenfaces) is the better choice.
26. Other Interesting Results
- It has been demonstrated that the image size can be very small and recognition methods will still perform well:
  For the LDA system: 12x11 pixels
  For the PDBNN: 12x11 pixels
  For human perception: 24x18 pixels
- It is interesting to note that the algorithms can recognize faces at a lower resolution than the human brain can.