Multivariate Data Analysis

About This Presentation

Title:

Multivariate Data Analysis

Description:

Example with 2 variables, 6 objects. Find best (most informative) direction ... Scree plot. What is rank? Mathematical rank = max(min(I,K)) Gives zero residual ...

Slides: 41

Provided by: Pau1137

Transcript and Presenter's Notes

Title: Multivariate Data Analysis

1
Multivariate Data Analysis

Principal Component Analysis

2
Principal Component Analysis (PCA)

Singular Value Decomposition
Eigenvector / eigenvalue calculation

3
Data Matrix X (I x K): I objects (rows), K variables (columns)

Reduce variables
Improve projections
Remove noise
Find outliers
Find classes

4
PCA

Example with 2 variables, 6 objects
Find best (most informative) direction in space
Describe direction
Make projection

5
[Figure: plot in the x1, x2 plane]
6
[Figure: plot in the x1, x2 plane]
7
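[Figure: 1st PC]
8
[Figure: 1st PC, with the score and the residual of an object]
9
[Figure: 1st PC, with the unit vector and its loadings p1 and p2]
10
[Figure: 1st PC, with the unit vector at angle a]
Loading p1 = cos(a)
Loading p2 = sin(a)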
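The two-variable picture on slides 7 to 10 can be sketched numerically. This is a minimal illustration, not the presenter's calculation: the six data points are made up, and the use of NumPy's SVD to find the direction is an assumption. It shows that the loading vector is a unit vector whose elements are cos(a) and sin(a), and how the scores and residuals follow from projecting onto it.

```python
import numpy as np

# Hypothetical data: 6 objects (rows) measured on 2 variables x1, x2 (columns).
X = np.array([[1.0, 1.2], [2.0, 1.9], [3.0, 3.2],
              [4.0, 3.8], [5.0, 5.1], [6.0, 5.9]])
Xc = X - X.mean(axis=0)            # mean-center each variable

# The first right singular vector is the direction of largest variation.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
p = Vt[0]                          # loading vector (p1, p2), unit length
if p[0] < 0:                       # the sign is arbitrary; make p1 positive
    p = -p

a = np.arctan2(p[1], p[0])         # angle between the 1st PC and the x1 axis
t = Xc @ p                         # scores: position of each object along the PC
E = Xc - np.outer(t, p)            # residuals: the part the 1st PC does not describe

print("loadings p1, p2:", p)
print("cos(a), sin(a): ", np.cos(a), np.sin(a))
print("scores:", t)
```

The residuals are orthogonal to the loading vector, which is what makes the projection the closest point on the PC line for each object.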
11
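[Figure: data matrix X (I x K) with score vector t (length I) and loading vector p (length K); row i marked]
12
[Figure: data matrix X (I x K) with score vector t and loading vector p; column k marked]
13
[Figure: data matrix X (I x K) with score vector t and loading vector p]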
14
$X = t_1 p_1' + t_2 p_2' + \dots + t_A p_A' + E$
$X = T P' + E$
X: properly preprocessed data matrix (I x K)
T: score matrix (I x A)
P: loading matrix (K x A)
E: residual matrix (I x K)
t_a: score vector; p_a: loading vector
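The decomposition on this slide can be sketched with a singular value decomposition, which slide 2 lists as one way to compute PCA. The function name `pca_svd` and the random test matrix are assumptions made for illustration; mean-centering stands in for "properly preprocessed".

```python
import numpy as np

def pca_svd(X, n_components):
    """Return scores T (I x A), loadings P (K x A) and residuals E (I x K)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]   # score vectors t_a as columns
    P = Vt[:n_components].T                      # loading vectors p_a as columns
    E = X - T @ P.T                              # residual matrix
    return T, P, E

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))          # made-up data, I = 6 objects, K = 4 variables
X = X - X.mean(axis=0)               # mean-centering as the preprocessing step

A = 2
T, P, E = pca_svd(X, A)

# The sum of outer products t_a p_a' is the same thing as the matrix product T P'.
model = sum(np.outer(T[:, a], P[:, a]) for a in range(A))
print(np.allclose(model, T @ P.T), np.allclose(T @ P.T + E, X))
```

With this construction the loading vectors are orthonormal, so the scores are simply the projections of X onto P.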
15
The Wine Example
People magazine
Wise & Gallagher
16
         Wine      Beer       Spirit   LifeEx    HeartD
France   63.5000   40.1000    2.5000   78.0000   61.1000
Italy    58.0000   25.1000    0.9000   78.0000   94.1000
Switz    46.0000   65.0000    1.7000   78.0000   106.4000
Austra   15.7000   102.1000   1.2000   78.0000   173.0000
Brit     12.2000   100.0000   1.5000   77.0000   199.7000
U.S.A.   8.9000    87.8000    2.0000   76.0000   176.0000
Russia   2.7000    17.1000    3.8000   69.0000   373.6000
Czech    1.7000    140.0000   1.0000   73.0000   283.7000
Japan    1.0000    55.0000    2.1000   79.0000   34.7000
Mexico   0.2000    50.4000    0.8000   73.0000   36.4000
17
                     Wine      Beer      Spirit   LifeEx    HeartD
Mean                 20.9900   68.2600   1.7500   75.9000   153.8700
Standard Deviation   24.9270   38.6718   0.9132   3.2128    110.8182
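The means and standard deviations can be recomputed from the wine data on slide 16. The sketch below assumes the column order Wine, Beer, Spirit, LifeEx, HeartD from that slide and uses the sample standard deviation (divisor I - 1). The final autoscaling step is an assumption about what the preprocessing was; the slides only say that X is "properly preprocessed".

```python
import numpy as np

variables = ["Wine", "Beer", "Spirit", "LifeEx", "HeartD"]
countries = ["France", "Italy", "Switz", "Austra", "Brit",
             "U.S.A.", "Russia", "Czech", "Japan", "Mexico"]

# Values from slide 16, one row per country.
X = np.array([
    [63.5,  40.1, 2.5, 78.0,  61.1],
    [58.0,  25.1, 0.9, 78.0,  94.1],
    [46.0,  65.0, 1.7, 78.0, 106.4],
    [15.7, 102.1, 1.2, 78.0, 173.0],
    [12.2, 100.0, 1.5, 77.0, 199.7],
    [ 8.9,  87.8, 2.0, 76.0, 176.0],
    [ 2.7,  17.1, 3.8, 69.0, 373.6],
    [ 1.7, 140.0, 1.0, 73.0, 283.7],
    [ 1.0,  55.0, 2.1, 79.0,  34.7],
    [ 0.2,  50.4, 0.8, 73.0,  36.4],
])

mean = X.mean(axis=0)
std = X.std(axis=0, ddof=1)        # sample standard deviation (divide by I - 1)

# Assumed preprocessing: autoscaling (mean-center, then scale to unit variance).
Z = (X - mean) / std

for name, m, s in zip(variables, mean, std):
    print(f"{name:7s} mean = {m:9.4f}  std = {s:9.4f}")
```

Autoscaling puts variables with very different units and ranges, such as Spirit and HeartD, on an equal footing before the decomposition.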
18
[Figure: singular value vs. component; λ1 = 46 %, then 32 %, 12 %, 8 %, 2 %]
19
[Figure: score plot, Score 2 (32 %) vs. Score 1 (46 %); points labelled Czech, Brit, Austral, Mex, USA, Japan, Switz, Italy, France, Russia]
20
[Figure: loading plot, Loading 2 vs. Loading 1; points labelled Beer, Life exp., Heart dis., Wine, Spirit]
21
Conclusions

Scores: positions of objects in multivariate space
Loadings: importance of original variables for new directions
Try to explain a large enough portion of X (46 % + 32 % = 78 %)

22
The Apricot Example
Manley & Geladi
23
[Figure: "Appelkoos" (Afrikaans for apricot) spectra; pseudoabsorbance vs. wavelength (nm)]
24
[Figure: scree plot; singular value vs. component number]
25
What is rank?

Mathematical rank: at most min(I, K)
Gives zero residual
Effective rank: A
Separates model from noise
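The difference between the two ranks can be sketched on simulated data. Everything here is an assumption made for illustration: the matrix is built from A = 2 underlying components plus a small amount of noise, and the helper `residual_ss` is a hypothetical name. Using all min(I, K) components reproduces X exactly, while A components leave only the noise in the residual.

```python
import numpy as np

rng = np.random.default_rng(1)
I, K, A = 8, 5, 2

# Made-up data: rank-A structure plus a small amount of noise.
structure = rng.normal(size=(I, A)) @ rng.normal(size=(A, K))
X = structure + 0.01 * rng.normal(size=(I, K))

U, s, Vt = np.linalg.svd(X, full_matrices=False)

def residual_ss(n):
    """Sum of squares of the residual after an n-component model."""
    Xhat = (U[:, :n] * s[:n]) @ Vt[:n]
    return float(np.sum((X - Xhat) ** 2))

math_rank = np.linalg.matrix_rank(X)

print("mathematical rank:", math_rank, "(min(I, K) =", min(I, K), ")")
print("residual SS with all components:", residual_ss(min(I, K)))
print("residual SS with A components:  ", residual_ss(A))
print("total SS:                       ", residual_ss(0))
```

In practice A is not known in advance; a scree plot of the singular values is one way to judge where the model ends and the noise begins.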

26
ANOVA
Comp    SS        SS (%)   SScum (%)
1       68.8269   98.10    98.10
2       1.2843    1.83     99.93
3       0.0463    0.07     100
4       0.0045    0.01
5       0.0007    0.00
6       0.0003    0.00
7       0.0002    0.00
8       0.0001    0.00
9       0.0000    0.00
10      0.0000    0.00
Total   70.1634   100
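The percentage figures on this slide follow directly from the sum-of-squares values. The short sketch below recomputes them from the ten SS values and the total of 70.1634 given on the slide; only the variable names are new.

```python
import numpy as np

# SS per component and the total, as listed on the slide.
ss = np.array([68.8269, 1.2843, 0.0463, 0.0045, 0.0007,
               0.0003, 0.0002, 0.0001, 0.0000, 0.0000])
ss_total = 70.1634

ss_pct = 100 * ss / ss_total       # share of the total per component
ss_cum = np.cumsum(ss_pct)         # cumulative share

for comp, (v, p, c) in enumerate(zip(ss, ss_pct, ss_cum), start=1):
    print(f"{comp:2d}  {v:8.4f}  {p:6.2f}  {c:7.2f}")
```

The first two components together account for 99.93 % of the total, which is the figure quoted for the two-component apricot model.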
27
[Figure: score plot, Score 2 (2 %) vs. Score 1 (98 %)]
28
ANOVA

$SS_{tot} = SS_1 + SS_2 + SS_3 + \dots + SS_{(I\ \mathrm{or}\ K)}$

$SS_{tot} = \lambda_1 + \lambda_2 + \lambda_3 + \dots + \lambda_{(I\ \mathrm{or}\ K)}$
From largest to smallest!
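The additivity of the sums of squares can be checked numerically. This sketch assumes that each λ is the square of the corresponding singular value of the preprocessed X, which is the usual relation but is not stated on the slide; the random matrix is made up.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(10, 5))       # made-up data, I = 10, K = 5
X = X - X.mean(axis=0)

ss_tot = np.sum(X ** 2)            # total sum of squares of X

# Singular values come back sorted from largest to smallest.
s = np.linalg.svd(X, compute_uv=False)
lam = s ** 2                       # assumed: lambda_a = squared singular value

print("SS_tot:        ", ss_tot)
print("sum of lambdas:", lam.sum())
print("lambdas:       ", lam)
```

There are min(I, K) terms in the sum, which is where the "(I or K)" upper limit comes from.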
29
ANOVA

$X = T P' + E$
data = model + residual
$SS_{tot} = SS_{mod} + SS_{res}$
$R^2 = SS_{mod} / SS_{tot} = 1 - SS_{res} / SS_{tot}$
Coefficient of determination (often in %)
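The two expressions for R2 can be compared on a small example. The function name `r_squared` and the random test matrix are assumptions made for illustration; the model part is an A-component truncated SVD.

```python
import numpy as np

def r_squared(X, n_components):
    """Return R2 computed both ways, plus (SS_tot, SS_mod, SS_res)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    model = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components]  # T P'
    E = X - model                                                         # residual
    ss_tot = np.sum(X ** 2)
    ss_mod = np.sum(model ** 2)
    ss_res = np.sum(E ** 2)
    return ss_mod / ss_tot, 1 - ss_res / ss_tot, (ss_tot, ss_mod, ss_res)

rng = np.random.default_rng(3)
X = rng.normal(size=(10, 5))       # made-up data
X = X - X.mean(axis=0)

r2_a, r2_b, (ss_tot, ss_mod, ss_res) = r_squared(X, 2)
print(f"R2 = {100 * r2_a:.2f} % (from SS_mod), {100 * r2_b:.2f} % (from SS_res)")
```

The two values agree because the model and the residual are orthogonal, so their sums of squares add up to the total.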

30
Examples

Wines: R2 = 78 % (SSmod = 78 %, SSres = 22 %), 2 components
Apricots 1: R2 = 99.93 % (SSmod = 99.93 %, SSres = 0.07 %), 2 components
Apricots 2: R2 = 100 % (SSmod = 100 %, SSres = 0.0 %), 3 components

31
[Figure: spectra with outliers removed; absorbance vs. wavelength (nm)]
32
No outliers
[Figure: singular values vs. component; λ1 = 81 %, then 16 %, 3 %]
33
[Figure: score plot, Score 3 (3 %) vs. Score 2 (16 %); groups labelled Whole fruit, No kernel, Thin slice]
34
[Figure: loadings 2 and 3 vs. wavelength (nm)]
35
[Figure: loading plot, Loading 3 vs. Loading 2]
36
More nomenclature