Title: Generalized Linear Discriminant Analysis
1. Generalized Linear Discriminant Analysis
Hao Wu
Center for Automation Research
Department of Electrical and Computer Engineering
University of Maryland, College Park
ENEE698A
2. Outline
- Recast LDA as a linear regression problem
- Flexible discriminant analysis
- Penalized discriminant analysis
- Mixture discriminant analysis
- Examples and conclusions
3. Linear Discriminant Analysis
- Virtues of LDA
  - Simple prototype method for multiple-class classification
  - Linear decision boundary leads to a simple decision rule, and often produces the best classification
  - Can provide natural low-dimensional views of the data
- Limitations of LDA
  - Often linear decision boundaries are not adequate to separate the classes
  - A single prototype per class is insufficient
  - Sometimes too many correlated predictors lead to noisy coefficients
4. Generalizing LDA
- Recast LDA as a linear regression problem
  - Many techniques exist for generalizing linear regression to more flexible, nonparametric forms of regression. These in turn lead to a more flexible form of discriminant analysis, called FDA.
- Penalized discriminant analysis
  - When there are too many predictors, we want to fit the LDA model but penalize its coefficients to be smooth or otherwise coherent. Also, the expanded basis set of FDA can be so large that regularization is required. Both can be achieved via suitably regularized regression in the context of the FDA model.
- Mixture discriminant analysis
  - Model each class by a mixture of two or more Gaussians with different centroids. This allows for more complex decision boundaries.
5. LDA by optimal scoring
Suppose $\theta: \mathcal{G} \to \mathbb{R}$ is a function that assigns scores to the classes, such that the transformed class labels are optimally predicted by linear regression on $X$. This produces a one-dimensional separation between the classes.
More generally, we can find $K$ sets of independent scorings for the class labels, $\theta_1, \ldots, \theta_K$, and $K$ corresponding linear maps $\eta_k(X) = X^T \beta_k$, chosen to be optimal for multiple regression in $\mathbb{R}^P$.
6. LDA by optimal scoring (cont'd)
- Linear regression for classification, with $J$ classes and $P+1$ features (including the intercept):
  - Fit the linear model $\hat{B} = (X^T X)^{-1} X^T Y$, where $Y$ is the $N \times J$ indicator response matrix
  - Compute the prediction $\hat{f}(x) = \hat{B}^T x$
  - Decide the class: $\hat{G}(x) = \arg\max_j \hat{f}_j(x)$
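A minimal numpy sketch of this indicator-regression classifier (the function names and the labels-in-$\{0,\ldots,J-1\}$ convention are mine, not from the talk):

```python
import numpy as np

def fit_indicator_regression(X, g, J):
    """Fit B = (X^T X)^{-1} X^T Y for the N x J indicator response matrix Y.
    X is N x P; an intercept column is appended, giving P+1 features."""
    N = X.shape[0]
    Xc = np.column_stack([np.ones(N), X])      # P+1 features incl. intercept
    Y = np.zeros((N, J))
    Y[np.arange(N), g] = 1.0                   # indicator response matrix
    B, *_ = np.linalg.lstsq(Xc, Y, rcond=None)
    return B                                   # (P+1) x J coefficients

def predict_class(Xnew, B):
    """Compute f(x) = B^T x and decide the class with the largest fit."""
    F = np.column_stack([np.ones(len(Xnew)), Xnew]) @ B
    return np.argmax(F, axis=1)
```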
7. LDA by optimal scoring (cont'd)
- By optimal scoring
- Some notation:
  - $\Theta$: $J \times K$ matrix of the $K$ score vectors for the $J$ classes
  - $\Theta^* = Y\Theta$: $N \times K$ matrix of the $K$ score vectors for the $N$ training samples, where $Y$ is the indicator response matrix
  - Regression projection matrix $P_X = X(X^T X)^{-1} X^T$
- Then the average squared residual turns into
  $ASR = \frac{1}{N} \operatorname{tr}\left( \Theta^{*T} (I - P_X) \Theta^* \right)$
8. LDA by optimal scoring (cont'd)
- With the normalization $\frac{1}{N}\Theta^{*T}\Theta^* = \Theta^T D_p \Theta = I_K$, where $D_p = Y^T Y / N$ is the diagonal matrix of class proportions,
- minimizing $ASR = \frac{1}{N} \operatorname{tr}\left( \Theta^{*T} (I - P_X) \Theta^* \right)$
- amounts to finding the $K$ largest eigenvectors $\Theta$ of $Y^T P_X Y$, with the normalization $\Theta^T D_p \Theta = I_K$.
9. Summary for LDA by optimal scoring
- Form the indicator response matrix $Y$ and fit the multiresponse linear regression of $Y$ on $X$, giving $\hat{Y} = P_X Y$ and coefficients $B$
- Obtain the optimal scores $\Theta$ as the $K$ largest eigenvectors of $Y^T P_X Y$, normalized so that $\Theta^T D_p \Theta = I_K$
- Update the coefficient matrix using the optimal scores: $B \leftarrow B\Theta$
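Putting slides 6-9 together, here is a hedged end-to-end sketch (function names and label convention are mine; the nearest-centroid rule of slide 10 is included, though exact LDA distances would also weight each coordinate by its eigenvalue):

```python
import numpy as np

def lda_optimal_scoring(X, g, J, K):
    """LDA by optimal scoring. Assumes labels g in {0, ..., J-1}."""
    N = X.shape[0]
    Xc = np.column_stack([np.ones(N), X])        # intercept + P features
    Y = np.zeros((N, J)); Y[np.arange(N), g] = 1.0
    B, *_ = np.linalg.lstsq(Xc, Y, rcond=None)   # multiresponse regression
    Yhat = Xc @ B                                # P_X Y
    d = np.sqrt(Y.mean(axis=0))                  # sqrt of class proportions
    M = (Y.T @ Yhat) / N                         # Y^T P_X Y / N
    evals, V = np.linalg.eigh(M / np.outer(d, d))    # symmetrized eigenproblem
    order = np.argsort(evals)[::-1][1:K + 1]     # drop the trivial constant score
    Theta = V[:, order] / d[:, None]             # normalized: Theta^T D_p Theta = I
    return B @ Theta                             # update B with the optimal scores

def classify(Xnew, Xtrain, g, B, J):
    """Slide 10: assign x to the class whose fitted centroid is nearest
    in the space of fits eta(x). (Eigenvalue weighting omitted.)"""
    eta = np.column_stack([np.ones(len(Xtrain)), Xtrain]) @ B
    centroids = np.stack([eta[g == j].mean(axis=0) for j in range(J)])
    eta_new = np.column_stack([np.ones(len(Xnew)), Xnew]) @ B
    d2 = ((eta_new[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return np.argmin(d2, axis=1)
```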
10. An important fact
- A well-known fact
  - LDA can be performed by a sequence of linear regressions, followed by classification to the closest class centroid in the space of fits.
- In the optimal scoring method
  - The final coefficient matrix $B$ is, up to a diagonal scale matrix, the same as the discriminant analysis coefficient matrix.
- Classification
  - Assign an observation $x$ to the class $j$ that minimizes $\|\eta(x) - \bar{\eta}^j\|^2$, where $\eta(x) = B^T x$ and $\bar{\eta}^j$ is the fitted centroid of class $j$.
11. Flexible Discriminant Analysis
- Generalization
  - The real power of the above result is in the generalizations it invites. We can replace the linear regression fits by far more flexible, nonparametric fits to achieve a more flexible classifier than LDA.
- A more general form of the regression criterion (generalized additive fits, spline functions, MARS):
  $ASR(\{\theta_k, \eta_k\}) = \frac{1}{N} \sum_{k=1}^{K} \left[ \sum_{i=1}^{N} \left( \theta_k(g_i) - \eta_k(x_i) \right)^2 + \lambda J(\eta_k) \right]$
12. Summary for FDA
- Multivariate nonparametric regression
  - Fit a multiresponse, adaptive nonparametric regression of $Y$ on $X$, giving fitted values $\hat{Y}$. Let $S_\lambda$ be the linear operator that fits the final chosen model, and $\eta(x)$ be the vector of fitted regression functions.
- Optimal scores
  - Compute the eigen-decomposition of $\Theta^T Y^T \hat{Y} \Theta = \Theta^T Y^T S_\lambda Y \Theta$, where the eigenvectors $\Theta$ are normalized: $\Theta^T D_p \Theta = I_K$.
- Update
  - Update the model from step 1 using the optimal scores: $\eta(x) \leftarrow \Theta^T \eta(x)$. (A sketch of all three steps follows.)
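A minimal sketch of these three steps, assuming the smoother is a symmetric linear operator (so that $Y^T \hat{Y}$ stays symmetric). The quadratic-basis stand-in regressor at the end is my illustration; the talk's examples use splines, additive fits, or MARS:

```python
import numpy as np

def optimal_scores(Y, Yhat, K):
    """Step 2: eigen-decompose Y^T Yhat / N with the normalization
    Theta^T D_p Theta = I, dropping the trivial constant score."""
    N = Y.shape[0]
    d = np.sqrt(Y.mean(axis=0))                   # sqrt of class proportions
    M = (Y.T @ Yhat) / N
    evals, V = np.linalg.eigh(M / np.outer(d, d))
    order = np.argsort(evals)[::-1][1:K + 1]
    return V[:, order] / d[:, None]

def fda(X, g, J, K, fit):
    """FDA sketch: `fit(X, Y)` is any multiresponse regression routine
    returning fitted values (additive model, splines, MARS, ...)."""
    N = X.shape[0]
    Y = np.zeros((N, J)); Y[np.arange(N), g] = 1.0
    Yhat = fit(X, Y)                  # step 1: flexible multiresponse fit
    Theta = optimal_scores(Y, Yhat, K)
    return fit(X, Y @ Theta)          # step 3: refit using the optimal scores

# Hypothetical stand-in for the flexible regressor: least squares on a
# quadratic basis expansion.
def quad_fit(X, Y):
    H = np.column_stack([np.ones(len(X)), X, X ** 2])
    B, *_ = np.linalg.lstsq(H, Y, rcond=None)
    return H @ B
```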
13. Some Results
14. FDA vs. Regression
15. Penalized Discriminant Analysis
- FDA can also be viewed directly as a form of regularized discriminant analysis.
- Suppose the regression used in FDA is a linear regression on an expanded basis $h(x)$, with a quadratic penalty on the coefficients:
  $ASR(\{\theta_k, \beta_k\}) = \frac{1}{N} \sum_{k=1}^{K} \left[ \sum_{i=1}^{N} \left( \theta_k(g_i) - h(x_i)^T \beta_k \right)^2 + \lambda \beta_k^T \Omega \beta_k \right]$
- Then the steps in FDA can be viewed as a generalized form of LDA, called PDA (see the sketch below).
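Plugging a quadratically penalized regression into the FDA steps above yields PDA. A minimal sketch, assuming a precomputed basis matrix H = h(X) and a symmetric penalty matrix Omega (both hypothetical inputs); its hat matrix is symmetric, so it can serve as the `fit` routine in the FDA sketch:

```python
import numpy as np

def penalized_fit(lam, Omega):
    """Return an FDA-compatible fit routine that does penalized least
    squares on a basis expansion: min ||Y - H B||^2 + lam * B^T Omega B.
    Omega encodes the desired coherence, e.g. a discrete roughness
    penalty over pixel coefficients for image predictors."""
    def fit(H, Y):
        A = H.T @ H + lam * Omega           # (H^T H + lam * Omega)
        B = np.linalg.solve(A, H.T @ Y)     # penalized coefficients
        return H @ B                        # fitted values
    return fit
```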
16. Some Results
17. Mixture Discriminant Analysis
- LDA can be derived as the maximum likelihood method for normal populations with different means and a common covariance matrix.
- It is natural to generalize LDA by assuming that each observed class is in fact a mixture of unobserved, normally distributed subclasses.
- Gaussian mixture model: $P(X \mid G = j) = \sum_{r=1}^{R_j} \pi_{jr} \, \phi(X; \mu_{jr}, \Sigma)$
- MLE estimation of the parameters: the EM algorithm
- Assumption: the same covariance matrix $\Sigma$ for every subclass
- Then the M-step becomes a weighted LDA, so FDA can be used (a sketch of the EM iteration follows).
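A compact EM sketch for MDA under the shared-covariance assumption (the function name, a fixed R subclasses per class, and the label convention are mine). The M-step computes responsibility-weighted means and a pooled covariance, which is the weighted-LDA structure the slide refers to:

```python
import numpy as np

def mda_em(X, g, J, R, n_iter=50, seed=0):
    """EM sketch for MDA: class j is a mixture of R Gaussian subclasses
    with centroids mu[j, r] and mixing weights pi[j, r], all subclasses
    sharing one covariance Sigma. Labels g are assumed in {0,...,J-1}."""
    rng = np.random.default_rng(seed)
    N, P = X.shape
    mu = X[rng.choice(N, (J, R))]                  # init centroids from data
    pi = np.full((J, R), 1.0 / R)                  # subclass mixing weights
    Sigma = np.cov(X.T) + 1e-6 * np.eye(P)
    resp = np.zeros((N, R))
    for _ in range(n_iter):
        Si = np.linalg.inv(Sigma)
        for j in range(J):                         # E-step, within each class
            idx = np.where(g == j)[0]
            d = X[idx, None, :] - mu[j][None, :, :]            # n_j x R x P
            logw = -0.5 * np.einsum('nrp,pq,nrq->nr', d, Si, d) + np.log(pi[j])
            logw -= logw.max(axis=1, keepdims=True)
            w = np.exp(logw)
            resp[idx] = w / w.sum(axis=1, keepdims=True)       # responsibilities
        Sigma = 1e-6 * np.eye(P)                   # M-step: a weighted LDA --
        for j in range(J):                         # weighted means, pooled cov
            idx = np.where(g == j)[0]
            w = resp[idx]
            pi[j] = w.mean(axis=0)
            mu[j] = (w.T @ X[idx]) / w.sum(axis=0)[:, None]
            d = X[idx, None, :] - mu[j][None, :, :]
            Sigma += np.einsum('nr,nrp,nrq->pq', w, d, d) / N
    return pi, mu, Sigma
```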
18. Some Results
19. Conclusions
- Linear discriminant analysis is equivalent to multi-response linear regression using optimal scorings to represent the groups.
- Replacing linear regression with any nonparametric regression method produces flexible discriminant analysis. (In this way, any multi-response regression technique can be post-processed to improve its classification performance.)
- PDA is designed for situations with many highly correlated predictors, such as images.
- MDA fits a Gaussian mixture to each class and gives good performance on non-normal classification problems.
20. References
- Hastie, T., Tibshirani, R. and Friedman, J. "The Elements of Statistical Learning: Data Mining, Inference, and Prediction." Springer-Verlag, New York.
- Hastie, T. J., Tibshirani, R. and Buja, A. "Flexible Discriminant Analysis by Optimal Scoring." JASA, December 1994.
- Hastie, T. and Tibshirani, R. "Discriminant Analysis by Gaussian Mixtures." JRSSB, January 1996.
- Hastie, T. J., Buja, A. and Tibshirani, R. "Penalized Discriminant Analysis." Annals of Statistics, 1995.
- Hastie, T., Tibshirani, R. and Buja, A. "Flexible Discriminant and Mixture Models." In J. Kay and D. Titterington (Eds.), edited proceedings of the "Neural Networks and Statistics" conference, Edinburgh, 1995. Oxford University Press.
- Hastie, T. Talk on "Flexible Discriminant and Mixture Models."
21. Thank you!
Thanks to Kevin for very helpful discussions!