Title: Linear Discriminant Analysis (LDA)
2. Goal
- To classify observations into 2 or more groups based on k discriminant functions. (The dependent variable Y is categorical with k classes; see the fitting sketch after this list.)
- Assumptions
  - Multivariate normal distribution: the variables are normally distributed within each class/group.
  - Similar group covariances: the correlations between variables, and the variances within each group, should be similar across groups.
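
As a side note, here is a minimal sketch of fitting an LDA classifier in Python; scikit-learn and the toy data are illustrative assumptions, not part of the original slides.

# Minimal LDA fit (assumes scikit-learn; data is made up).
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Toy data: 6 observations, 2 predictors, Y categorical with 3 classes.
X = np.array([[20, 25], [18, 16], [15, 15], [14, 16], [8, 9], [10, 8]])
y = np.array([1, 1, 2, 2, 3, 3])

lda = LinearDiscriminantAnalysis()   # pools one covariance matrix across groups
lda.fit(X, y)
print(lda.predict(X))                # predicted class per observation
print(lda.predict_proba(X))          # posterior class probabilities

The similar-group-covariance assumption above is what justifies pooling a single covariance matrix across the classes.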
3. Dependent Variable
- Must be categorical with 2 or more classes (groups).
- If there are only 2 classes, the discriminant analysis procedure will give the same result as the multiple regression procedure (a sketch of this equivalence follows).
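
One way to see that two-class connection (a standard equivalence, shown here on simulated data; NumPy and scikit-learn are assumptions for the sketch): the OLS slope vector for a 0/1 target is proportional to the LDA discriminant direction.

# Two classes: OLS slopes on a 0/1 target vs. the LDA direction.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 3)),    # class 0
               rng.normal(1, 1, (50, 3))])   # class 1
y = np.repeat([0, 1], 50)

A = np.column_stack([np.ones(100), X])       # add intercept column
beta = np.linalg.lstsq(A, y, rcond=None)[0][1:]   # OLS slopes

lda = LinearDiscriminantAnalysis().fit(X, y)
print(beta / lda.coef_.ravel())              # near-constant ratio: same direction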
4. Independent Variables
- Continuous or categorical independent variables.
- If categorical, they are converted into binary (dummy) variables, as in multiple linear regression (see the sketch below).
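
A quick illustration of the dummy-variable conversion; pandas and the column names are assumptions for the example.

# Dummy-coding a categorical predictor (pandas; data is made up).
import pandas as pd

df = pd.DataFrame({"age_band": ["18-30", "18-30", "31-49", ">50", ">50"],
                   "income": [30, 32, 50, 100, 90]})
# drop_first=True avoids the dummy-variable trap (perfect collinearity).
X = pd.get_dummies(df, columns=["age_band"], drop_first=True)
print(X)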
5. Output
- Example: assume 3 classes (Y = 1, 2, 3) of the dependent variable. Each observation receives a score from each discriminant function (f1, f2, f3) and is assigned to the class with the highest score; a scoring sketch follows the table.

Y    x11   x12   x13   x14   f1   f2   f3    Pred. Y
1    20    25    10    12    85   78   58    1
1    18    16    14    12    80   68   65    1
..   ..    ..    ..    ..    ..   ..   ..    ..
2    15    15    16    17    75   84   70    2
2    14    16    17    18    70   88   67    2
..   ..    ..    ..    ..    ..   ..   ..    ..
3    8     9     9     11    95   86   105   3
3    10    8     8     10    96   84   100   3
..   ..    ..    ..    ..    ..   ..   ..    ..
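
The scoring step works as in this sketch: each class k has a linear classification function f_k(x) = w_k . x + b_k, and an observation is assigned to the class whose function scores highest. The weights and intercepts below are made-up placeholders chosen only to reproduce the 1/2/3 pattern, not the actual functions behind the table.

# Classification-function scoring sketch (NumPy; weights are placeholders).
import numpy as np

X = np.array([[20.0, 25, 10, 12],   # first row of each class from the table
              [15,   15, 16, 17],
              [8,     9,  9, 11]])
W = np.array([[ 3,  0, -3],
              [ 3,  0, -3],
              [-2,  3,  2],
              [-2,  3,  2]], dtype=float)   # columns: class 1, 2, 3
b = np.array([0.0, 0.0, 100.0])

scores = X @ W + b                    # columns play the role of f1, f2, f3
pred = scores.argmax(axis=1) + 1      # highest score wins
print(scores)
print(pred)                           # [1 2 3]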
6. Binary Dependent - Regression
- If the dependent variable has only 2 classes, one can run a multiple regression instead.
- Sample data shown below; a fitting sketch follows the table.

Status   Age (18-30)   Age (>50)   Income
Y        X1            X2          X3
0        1             0           30
0        1             0           32
..       ..            ..          ..
0        0             0           50
0        0             0           28
0        0             0           75
..       ..            ..          ..
1        0             1           100
1        0             1           90
1        0             1           95
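
A minimal sketch of fitting this regression in Python; statsmodels is an assumption here (the SUMMARY OUTPUT on the next slide suggests the original used a spreadsheet regression tool).

# OLS on a binary Y (statsmodels; rows are a made-up subset of the table).
import numpy as np
import statsmodels.api as sm

X = np.array([[1, 0, 30], [1, 0, 32], [0, 0, 50],
              [0, 1, 100], [0, 1, 90], [0, 1, 95]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1], dtype=float)

model = sm.OLS(y, sm.add_constant(X)).fit()
print(model.summary())   # coefficients, R Square, F test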
7. Regression Output
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.833615561
R Square             0.694914903
Adjusted R Square    0.649152139
Standard Error       0.301479577
Observations         24

ANOVA
             df   SS            MS            F             Significance F
Regression   3    4.140534632   1.380178211   15.18516005   2.19698E-05
Residual     20   1.817798702   0.090889935
Total        23   5.958333333

            Coefficients    Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept   -0.337942024    0.22002876       -1.535899327   0.14023269    -0.796913973   0.121029925
X1          -0.160950017    0.155728156      -1.033531901   0.313691534   -0.485793257   0.163893223
X2           0.426373823    0.153140052       2.784208421   0.011449703    0.106929273   0.745818373
Income       0.013571735    0.003078379       4.408727859   0.00027065     0.007150349   0.019993121
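
These coefficients define the prediction equation applied on the next slide: Pred. Y = -0.3379 - 0.1610 X1 + 0.4264 X2 + 0.0136 Income. A quick check (the helper function is just for illustration):

# Reproducing Predicted Y from the regression output above.
b0, b1, b2, b3 = -0.337942024, -0.160950017, 0.426373823, 0.013571735

def pred_y(x1, x2, income):
    return b0 + b1 * x1 + b2 * x2 + b3 * income

print(round(pred_y(1, 0, 30), 4))   # -0.0917, first row of the next slide
print(round(pred_y(0, 1, 75), 4))   #  1.1063, as in the table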
8. Classification
Status   Age (18-30)   Age (>50)   Income
Y        X1            X2          X3     Predicted Y   Class
0        1             0           30     -0.0917       0
0        1             0           32     -0.0646       0
0        1             0           40      0.0440       0
0        1             0           38      0.0168       0
0        1             0           55      0.2476       0
0        1             0           56      0.2611       0
0        0             0           45      0.2728       0
0        0             0           40      0.2049       0
0        0             0           65      0.5442       1
0        0             0           50      0.3406       0
0        0             0           28      0.0421       0
1        0             0           75      0.6799       1
1        0             0           50      0.3406       0
1        1             0           80      0.5868       1
1        0             0           100     1.0192       1
1        0             0           90      0.8835       1
1        0             0           95      0.9514       1
1        0             1           75      1.1063       1
1        0             1           50      0.7670       1
1        0             1           85      1.2420       1
1        0             1           40      0.6313       1
1        0             1           88      1.2827       1
1        0             0           78      0.7207       1
1        0             1           65      0.9706       1
Classification rule in this case: if Pred. Y > 0.5, then Class = 1; else Class = 0. This model yielded 2 misclassifications out of 24 (the rows where Class differs from Y). How good is the R-square (0.69) as a measure of that performance? A sketch of the thresholding step follows.
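
The two arrays below are the Y and Predicted Y columns from the table above.

# Classification rule: Class = 1 if Predicted Y > 0.5, else 0 (NumPy).
import numpy as np

y = np.array([0]*11 + [1]*13)   # actual Status column
pred = np.array([-0.0917, -0.0646, 0.0440, 0.0168, 0.2476, 0.2611,
                 0.2728, 0.2049, 0.5442, 0.3406, 0.0421, 0.6799,
                 0.3406, 0.5868, 1.0192, 0.8835, 0.9514, 1.1063,
                 0.7670, 1.2420, 0.6313, 1.2827, 0.7207, 0.9706])

cls = (pred > 0.5).astype(int)
print(int((cls != y).sum()))    # 2 misclassifications out of 24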
9. Crosstab of Pred. Y and Y
- For large datasets, one can bin the Predicted Y variable and create a crosstab with Y to see how accurately the model classifies the data (fictitious results shown here; a sketch of building such a crosstab follows the table).
- The Good and Bad columns give the counts of the actual Y outcomes (Good vs. Bad) within each band of predicted scores.

Predicted Y (x 1000)   Good   Bad
900 to 1000             410    50
850 to 900              390    70
800 to 850              370    90
750 to 800              350   110
700 to 750              330   130
650 to 700              310   150
600 to 650              290   170
550 to 600              270   190
500 to 550              250   210
450 to 500              230   230
400 to 450              210   250
350 to 400              190   270
300 to 350              170   290
250 to 300              150   310
200 to 250              130   330
150 to 200              110   350
100 to 150               90   370
50 to 100                70   390
0 to 50                  50   410
Total                  4370  4370
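
A sketch of building such a crosstab with pandas; the 50-point band edges follow the table above, while the scores, outcomes, and 0/1 coding are simulated assumptions.

# Banded crosstab of predicted score vs. actual outcome (pandas).
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=8740)                        # 0 = Bad, 1 = Good (assumed coding)
score = np.clip(rng.normal(500 + 200 * y, 180), 0, 999)  # simulated 0-1000 scores

bands = pd.cut(score, bins=range(0, 1050, 50))           # 50-point score bands
print(pd.crosstab(bands, y, margins=True))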
10. Kolmogorov-Smirnov Test
- Use the crosstab shown on the last slide to conduct the KS test to determine
  - the cutoff score,
  - classification accuracy, and
  - forecasts of model performance.
- A sketch of the KS computation follows.
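
A minimal sketch of the KS computation on the fictitious crosstab counts: cumulate the Good and Bad shares band by band from the top score down; the KS statistic is the maximum gap between the two cumulative curves, and the band where it peaks suggests the cutoff score.

# KS statistic from the crosstab on the previous slide (NumPy).
import numpy as np

good = np.arange(410, 49, -20)   # Good counts, 900-1000 band down to 0-50
bad = np.arange(50, 411, 20)     # Bad counts for the same bands

cum_good = np.cumsum(good) / good.sum()   # cumulative share of Goods
cum_bad = np.cumsum(bad) / bad.sum()      # cumulative share of Bads

ks = np.abs(cum_good - cum_bad)
i = int(ks.argmax())
print(f"KS = {ks[i]:.3f} at band index {i}")   # ~0.41 at the 450-500 band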