Linear discriminant analysis - PowerPoint PPT Presentation

About This Presentation
Title:

Linear discriminant analysis

Description:

This presentation guides you through Linear Discriminant Analysis (LDA): an overview, the assumptions of LDA, and how to prepare data for LDA. For more topics, stay tuned with Learnbay.


Transcript and Presenter's Notes

Title: Linear discriminant analysis


1
Linear Discriminant Analysis
2
Linear Discriminant Analysis
Linear discriminant analysis (LDA) is a supervised
machine learning technique that finds a linear
combination of features that separates two or
more classes of objects or events. LDA does the
separation by computing the directions (linear
discriminants) that represent the axes along
which the separation between classes is greatest.
Like logistic regression, LDA is a linear
classification technique, with the following
additional capabilities in comparison to logistic
regression.
3
Linear Discriminant Analysis
  • LDA can be applied to classification problems
    with two or more classes.
  • Unlike logistic regression, whose parameter
    estimates can become unstable when classes are
    well separated, LDA remains stable in that case.
  • LDA works relatively well in comparison to
    logistic regression when we have few examples.
  • LDA is also a dimensionality reduction technique.
    As the name implies, dimensionality reduction
    techniques reduce the number of dimensions (i.e.
    variables or features) in a dataset while
    retaining as much information as possible.

4
LDA Overview
Linear discriminant analysis (LDA) does
classification by assuming that the data within
each class are normally distributed:
$$f_k(x) = P(X = x \mid G = k) = N(\mu_k, \Sigma).$$
We allow each class to have its own mean
$\mu_k \in \mathbb{R}^p$, but we assume a common
variance matrix $\Sigma \in \mathbb{R}^{p \times p}$. Thus
$$f_k(x) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu_k)^T \Sigma^{-1} (x - \mu_k)\right).$$
We want to find $k$ so that
$P(G = k \mid X = x) \propto f_k(x)\pi_k$ is the largest.
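As a rough illustration of the rule above, the following sketch evaluates the shared-covariance Gaussian density for each class, weights it by the class prior, and picks the class with the largest posterior. The means, covariance matrix, priors, and query point are made-up values chosen for the example, and `gaussian_density` is a hypothetical helper name, not part of the presentation.

```python
import numpy as np

def gaussian_density(x, mean, cov):
    """Multivariate normal density N(mean, cov) evaluated at the point x."""
    p = mean.shape[0]
    diff = x - mean
    # Solve cov @ z = diff rather than forming the inverse explicitly.
    mahalanobis_sq = diff @ np.linalg.solve(cov, diff)
    norm_const = (2.0 * np.pi) ** (p / 2.0) * np.sqrt(np.linalg.det(cov))
    return np.exp(-0.5 * mahalanobis_sq) / norm_const

# Assumed (illustrative) parameters: two classes sharing one covariance matrix.
means = np.array([[0.0, 0.0], [2.0, 2.0]])
shared_cov = np.array([[1.0, 0.3], [0.3, 1.0]])
priors = np.array([0.5, 0.5])

x = np.array([1.5, 1.0])
# f_k(x) * pi_k for each class, then normalize to get P(G = k | X = x).
weighted = np.array([gaussian_density(x, m, shared_cov) * pk
                     for m, pk in zip(means, priors)])
posterior = weighted / weighted.sum()
predicted_class = int(np.argmax(posterior))
```

Normalizing by the sum is only needed to read the values as probabilities; the predicted class depends on the unnormalized products alone.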
5
LDA Overview
The linear discriminant functions are derived
from the relation
$$\log(f_k(x)\pi_k) = -\frac{1}{2}(x - \mu_k)^T \Sigma^{-1} (x - \mu_k) + \log(\pi_k) + C$$
$$= x^T \Sigma^{-1} \mu_k - \frac{1}{2}\mu_k^T \Sigma^{-1} \mu_k + \log(\pi_k) + C',$$
and we denote
$$\delta_k(x) = x^T \Sigma^{-1} \mu_k - \frac{1}{2}\mu_k^T \Sigma^{-1} \mu_k + \log(\pi_k).$$
The decision rule is $G(x) = \arg\max_k \delta_k(x)$.
The Bayes classifier is a linear classifier.
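The discriminant function can be sketched in a few lines of NumPy. This is a minimal example under assumed parameters (three classes in two dimensions, with made-up means, covariance, and priors); `discriminant_scores` is a hypothetical helper name.

```python
import numpy as np

def discriminant_scores(x, means, cov, priors):
    """Linear discriminant score for every class:
    x^T S^-1 mu_k - 0.5 * mu_k^T S^-1 mu_k + log(pi_k)."""
    # Column k of cov_inv_means holds S^-1 mu_k.
    cov_inv_means = np.linalg.solve(cov, means.T)          # shape (p, K)
    linear_term = x @ cov_inv_means                        # shape (K,)
    quadratic_term = 0.5 * np.sum(means.T * cov_inv_means, axis=0)
    return linear_term - quadratic_term + np.log(priors)

# Assumed (illustrative) parameters.
means = np.array([[0.0, 0.0], [2.0, 2.0], [0.0, 3.0]])
cov = np.array([[1.0, 0.3], [0.3, 1.0]])
priors = np.array([0.5, 0.3, 0.2])

x = np.array([0.2, 2.5])
scores = discriminant_scores(x, means, cov, priors)
predicted_class = int(np.argmax(scores))   # decision rule: argmax over classes
```

Because the term that is quadratic in x is the same for every class, dropping it leaves the ranking of the classes unchanged, which is why the scores are linear in x.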
6
LDA Overview
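We need to estimate the parameters based on the
training data $x_i \in \mathbb{R}^p$ and
$y_i \in \{1, \dots, K\}$ by
$$\hat{\pi}_k = N_k / N,$$
$$\hat{\mu}_k = N_k^{-1} \sum_{y_i = k} x_i, \text{ the centroid of class } k,$$
$$\hat{\Sigma} = \frac{1}{N - K} \sum_{k=1}^{K} \sum_{y_i = k} (x_i - \hat{\mu}_k)(x_i - \hat{\mu}_k)^T, \text{ the pooled sample variance matrix.}$$
The decision boundary between each pair of
classes $k$ and $l$ is given by
$\{x : \delta_k(x) = \delta_l(x)\}$, which is
equivalent to
$$(\hat{\mu}_k - \hat{\mu}_l)^T \hat{\Sigma}^{-1} x = \frac{1}{2}(\hat{\mu}_k + \hat{\mu}_l)^T \hat{\Sigma}^{-1} (\hat{\mu}_k - \hat{\mu}_l) - \log(\hat{\pi}_k / \hat{\pi}_l).$$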
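The estimates can be computed directly from labelled data. The sketch below is one possible way to do it, assuming synthetic two-class Gaussian data generated only for illustration; `fit_lda` and `predict_lda` are hypothetical names, not functions from the presentation or from any library.

```python
import numpy as np

def fit_lda(X, y):
    """Estimate class priors, class centroids, and the pooled covariance matrix."""
    classes = np.unique(y)
    n_samples, n_features = X.shape
    priors = np.array([np.mean(y == k) for k in classes])         # N_k / N
    means = np.array([X[y == k].mean(axis=0) for k in classes])   # class centroids
    pooled = np.zeros((n_features, n_features))
    for k, mu in zip(classes, means):
        centered = X[y == k] - mu
        pooled += centered.T @ centered                           # within-class scatter
    pooled /= n_samples - len(classes)                            # divide by N - K
    return classes, priors, means, pooled

def predict_lda(X, classes, priors, means, pooled):
    """Assign each row of X to the class with the largest discriminant score."""
    cov_inv_means = np.linalg.solve(pooled, means.T)              # shape (p, K)
    scores = (X @ cov_inv_means
              - 0.5 * np.sum(means.T * cov_inv_means, axis=0)
              + np.log(priors))
    return classes[np.argmax(scores, axis=1)]

# Synthetic training data (an assumption made for this example).
rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.3], [0.3, 1.0]])
X = np.vstack([rng.multivariate_normal([0.0, 0.0], cov, size=100),
               rng.multivariate_normal([3.0, 3.0], cov, size=100)])
y = np.repeat([0, 1], 100)

classes, priors, means, pooled = fit_lda(X, y)
accuracy = np.mean(predict_lda(X, classes, priors, means, pooled) == y)
```

The accuracy here is measured on the training data, so it is only a sanity check and not an estimate of performance on new data.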
7
Assumptions of LDA
LDA assumes:
  • Each feature (variable, dimension, or attribute)
    in the dataset follows a Gaussian distribution.
    In other words, each feature has a bell-shaped
    distribution.
  • Each feature has the same variance: the values
    of each feature vary around the mean by the same
    amount on average.
  • Each feature is assumed to be randomly sampled.
  • There is no multicollinearity among the
    independent features. As correlations between
    independent features increase, the power of
    prediction decreases.
8
Assumptions of LDA
LDA projects features from a higher-dimensional
space to a lower-dimensional space. To achieve
this, LDA:
  • Computes the mean vector of each class of the
    dependent variable.
  • Computes the within-class and between-class
    scatter matrices.
  • Computes the eigenvalues and eigenvectors from
    SW (the within-class scatter matrix) and SB (the
    between-class scatter matrix), typically those
    of the matrix SW^-1 SB.
  • Sorts the eigenvalues in descending order and
    selects the top k.
  • Creates a new matrix containing the
    eigenvectors that map to the k eigenvalues.
  • Obtains the new features (i.e. linear
    discriminants) by taking the dot product of the
    data and the matrix.
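The projection steps can be sketched with NumPy as follows. This is a minimal illustration, assuming the eigen-decomposition is taken of SW^-1 SB (a common formulation) and using synthetic three-class data; `lda_projection` is a hypothetical name.

```python
import numpy as np

def lda_projection(X, y, n_components):
    """Project X onto the top linear discriminants via the scatter matrices."""
    classes = np.unique(y)
    n_features = X.shape[1]
    overall_mean = X.mean(axis=0)
    S_W = np.zeros((n_features, n_features))   # within-class scatter
    S_B = np.zeros((n_features, n_features))   # between-class scatter
    for k in classes:
        X_k = X[y == k]
        mean_k = X_k.mean(axis=0)              # mean vector of class k
        centered = X_k - mean_k
        S_W += centered.T @ centered
        diff = (mean_k - overall_mean).reshape(-1, 1)
        S_B += X_k.shape[0] * (diff @ diff.T)
    # Eigen-decomposition of S_W^-1 S_B; keep real parts to drop numerical noise.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
    eigvals, eigvecs = eigvals.real, eigvecs.real
    order = np.argsort(eigvals)[::-1][:n_components]   # top-k, descending
    W = eigvecs[:, order]                               # projection matrix
    return X @ W, W, eigvals[order]

# Synthetic data: 3 classes, 4 features (an assumption made for this example).
rng = np.random.default_rng(0)
class_means = np.array([[0.0, 0.0, 0.0, 0.0],
                        [3.0, 0.0, 1.0, 0.0],
                        [0.0, 3.0, 0.0, 1.0]])
X = np.vstack([rng.normal(m, 1.0, size=(50, 4)) for m in class_means])
y = np.repeat([0, 1, 2], 50)

Z, W, top_eigvals = lda_projection(X, y, n_components=2)
```

With K classes, the between-class scatter matrix has rank at most K - 1, so at most K - 1 eigenvalues are non-zero; that is why two components are requested for three classes.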
9
Prepare the data for LDA
  • Machine learning model performance depends
    greatly on how well we pre-process the data.
    Let's see how to prepare our data before we
    apply LDA:
  • Outlier treatment
  • Equal variance
  • Gaussian distribution
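One possible reading of these three preparation steps is sketched below: log-transform a right-skewed feature to make it closer to Gaussian, drop rows containing extreme values, and standardize each feature to unit variance. The z-score threshold of 3, the choice of a log transform, and the name `prepare_for_lda` are assumptions for illustration, not recommendations from the presentation.

```python
import numpy as np

def prepare_for_lda(X, z_threshold=3.0):
    """Drop outlier rows, then standardize each feature to mean 0 and variance 1."""
    z = (X - X.mean(axis=0)) / X.std(axis=0)
    keep = np.all(np.abs(z) <= z_threshold, axis=1)   # rows with no extreme value
    X_kept = X[keep]
    X_scaled = (X_kept - X_kept.mean(axis=0)) / X_kept.std(axis=0)
    return X_scaled, keep

# Synthetic data (an assumption): one Gaussian feature, one right-skewed feature.
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(10.0, 2.0, size=200),
                     rng.lognormal(0.0, 1.0, size=200)])
X[0, 0] = 100.0                  # inject one obvious outlier

X[:, 1] = np.log(X[:, 1])        # Gaussian distribution: log-transform the skewed feature
X_scaled, keep = prepare_for_lda(X)   # outlier treatment, then equal variance
```

Which transform and which outlier rule are appropriate depends on the data, so these choices should be checked against the actual feature distributions.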

10
Topics for next Post
  • Decision tree
  • k-nearest neighbor algorithm
  • Neural networks
Stay Tuned with