Title: Supervised Learning Methods
- ESI6912
- Optimization in Data Mining
Fisher Linear Discriminant
- The most famous example of dimensionality reduction is principal components analysis (PCA).
- This technique searches for the directions in the data that have the largest variance and subsequently projects the data onto them.
- This yields a lower-dimensional representation of the data that removes some of the noisy directions, as sketched below.
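A minimal sketch of this idea (not part of the original slides; assumes NumPy), computing the projection from the top eigenvectors of the sample covariance matrix:

    import numpy as np

    def pca_project(X, k):
        """Project data onto the k directions of largest variance."""
        Xc = X - X.mean(axis=0)                 # center the data
        cov = np.cov(Xc, rowvar=False)          # sample covariance matrix
        eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: covariance is symmetric
        top = eigvecs[:, np.argsort(eigvals)[::-1][:k]]  # top-k directions
        return Xc @ top                         # lower-dimensional representation

    X = np.random.randn(100, 5)
    Z = pca_project(X, 2)                       # shape (100, 2)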
Fisher Linear Discriminant
- How do we utilize the label information in finding informative projections?
- The idea proposed by Fisher is to obtain a large separation between the projected class means and a small variance within each class.
Fisher Linear Discriminant
Figure: Two classes (depicted in red and blue) with the histograms resulting from projection onto the line joining the class means. On the left, there is considerable class overlap in the projected space. The right plot shows the corresponding projection based on the Fisher linear discriminant.
Fisher Linear Discriminant
- Maximize the Fisher criterion J(w) = (w^T S_B w) / (w^T S_W w).
- S_B is the between-classes scatter matrix; for two classes with means m_1 and m_2, S_B = (m_2 - m_1)(m_2 - m_1)^T.
- S_W is the within-classes scatter matrix: the sum, over the classes, of the scatter of each class's points around its own mean (a computational sketch follows).
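As an illustration (a sketch added here, assuming NumPy and the two-class case), both scatter matrices can be computed directly from the labeled data:

    import numpy as np

    def scatter_matrices(X1, X2):
        """Between-class (S_B) and within-class (S_W) scatter, two classes."""
        m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
        d = (m2 - m1).reshape(-1, 1)
        S_B = d @ d.T                             # between-class scatter
        S_W = ((X1 - m1).T @ (X1 - m1)
               + (X2 - m2).T @ (X2 - m2))         # within-class scatter
        return S_B, S_W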
Fisher Linear Discriminant
- Why does this objective make sense?
- A good solution is one where the class means are well separated, measured relative to the (sum of the) variances of the data assigned to each class.
- The gap between the classes is then expected to be large.
Fisher Linear Discriminant
- Since the Fisher criterion is invariant to rescaling of w, the optimization problem boils down to: maximize w^T S_B w subject to w^T S_W w = 1.
- How do you solve this optimization problem?
- Use the Lagrangian function.
- Satisfy the KKT conditions (the algebra is sketched below).
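The stationarity step, reconstructed here in the standard way since the original equation slide did not survive extraction:

    L(w, λ) = w^T S_B w - λ (w^T S_W w - 1)
    ∂L/∂w = 2 S_B w - 2 λ S_W w = 0  =>  S_B w = λ S_W w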
Fisher Linear Discriminant
- Using the KKT conditions gives S_B w = λ S_W w, i.e. S_W^{-1} S_B w = λ w.
- This looks like an eigenvalue equation.
- But the matrix S_W^{-1} S_B is not symmetric.
- S_B is symmetric and positive semidefinite (rank one in the two-class case); assuming S_W is positive definite, the generalized eigenproblem is well posed (see the numerical sketch below).
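A minimal numerical sketch (reusing the hypothetical scatter_matrices helper above; scipy.linalg.eigh solves the generalized symmetric problem S_B w = λ S_W w directly):

    import numpy as np
    from scipy.linalg import eigh

    rng = np.random.default_rng(0)
    X1 = rng.normal(loc=[0.0, 0.0], size=(50, 2))  # class 1 samples
    X2 = rng.normal(loc=[3.0, 1.0], size=(50, 2))  # class 2 samples

    S_B, S_W = scatter_matrices(X1, X2)  # helper from the earlier sketch
    eigvals, eigvecs = eigh(S_B, S_W)    # generalized problem S_B w = lam S_W w
    w = eigvecs[:, -1]                   # eigenvector of the largest eigenvalue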
Fisher Linear Discriminant
- Using the largest eigenvalue we obtain the corresponding eigenvector w; for two classes this gives the familiar closed form w ∝ S_W^{-1}(m_2 - m_1).
- The optimal discriminant function is then y(x) = w^T x, assigning a new point to a class by thresholding y(x) (a usage sketch follows).
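Putting the pieces together (a hypothetical end-to-end continuation of the code above, with the threshold chosen at the midpoint of the projected class means):

    # Continue from the previous sketch: w solves S_B w = lam S_W w.
    if (X2 @ w).mean() < (X1 @ w).mean():
        w = -w                               # orient w so class 2 projects higher

    threshold = ((X1 @ w).mean() + (X2 @ w).mean()) / 2

    def classify(x):
        """Assign class 2 if the projected value exceeds the midpoint."""
        return 2 if x @ w > threshold else 1

    print(classify(np.array([3.0, 1.0])))    # lies near the class 2 mean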
- Check Pattern Recognition and Machine Learning
by C. M. Bishop for details.