Multivariate Data Analysis - PowerPoint PPT Presentation

About This Presentation
Title:

Multivariate Data Analysis

Description:

Example with 2 variables, 6 objects. Find best (most informative) direction ... Scree plot. What is rank? Mathematical rank = max(min(I,K)) Gives zero residual ...

Slides: 41
Provided by: Pau1137

Transcript and Presenter's Notes

Title: Multivariate Data Analysis


1
Multivariate Data Analysis
  • Principal Component Analysis

2
Principal Component Analysis (PCA)
  • Singular Value Decomposition
  • Eigenvector / eigenvalue calculation
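The two calculation routes named on this slide can be compared numerically. The following is a minimal sketch, assuming random illustrative data and mean-centering as the only preprocessing; neither assumption comes from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))          # hypothetical data: 6 objects, 3 variables
Xc = X - X.mean(axis=0)              # mean-center each variable

# Route 1: singular value decomposition of the centered data
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Route 2: eigenvector / eigenvalue calculation on the cross-product matrix Xc'Xc
eigvals, eigvecs = np.linalg.eigh(Xc.T @ Xc)
order = np.argsort(eigvals)[::-1]    # eigh returns eigenvalues in ascending order
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Squared singular values equal the eigenvalues; loading vectors agree up to sign
print(np.allclose(s**2, eigvals))
print(np.allclose(np.abs(Vt), np.abs(eigvecs.T)))
```

Both routes give the same components; the sign of each loading vector is arbitrary, which is why the comparison uses absolute values.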

3
Data Matrix (I x K)
  • Reduce variables
  • Improve projections
  • Remove noise
  • Find outliers
  • Find classes

[Figure: data matrix X with I rows and K columns]
4
PCA
  • Example with 2 variables, 6 objects
  • Find best (most informative) direction in space
  • Describe direction
  • Make projection
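The steps listed above can be sketched for a two-variable, six-object case. The numbers below are made up for illustration (the slide's own data points are not recoverable from the transcript), and the SVD is used here as one possible way to find the most informative direction.

```python
import numpy as np

# Hypothetical data: 6 objects (rows), 2 variables x1, x2 (columns)
X = np.array([[1.0, 1.2], [2.0, 1.9], [3.0, 3.2],
              [4.0, 3.8], [5.0, 5.1], [6.0, 5.9]])
Xc = X - X.mean(axis=0)                 # mean-center

# Find the best direction: first right singular vector of the centered data
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
p = Vt[0]                               # loading vector, unit length

# Describe the direction: a unit vector is (cos(a), sin(a)) for its angle a to the x1 axis
a = np.arctan2(p[1], p[0])

# Make the projection: scores along the direction, residuals perpendicular to it
t = Xc @ p
E = Xc - np.outer(t, p)

print("loadings:", p)
print("cos(a), sin(a):", np.cos(a), np.sin(a))
print("scores:", t)
```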

5
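[Figure: plot with axes x1 and x2]
6
[Figure: plot with axes x1 and x2]
7
[Figure: 1st PC]
8
[Figure: 1st PC, with score and residual marked]
9
[Figure: 1st PC as a unit vector, with loadings p1 and p2]
10
[Figure: 1st PC as a unit vector at angle a; loading p1 = cos(a), loading p2 = sin(a)]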
11
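[Figure: data matrix X (I x K) with score vector t and loading vector p; object i marked]
12
[Figure: data matrix X (I x K) with score vector t and loading vector p; variable k marked]
13
[Figure: data matrix X (I x K) with score vector t and loading vector p]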
14
$X = t_1 p_1^T + t_2 p_2^T + \dots + t_A p_A^T + E$
$X = T P^T + E$
X: properly preprocessed data matrix (I x K)
T: score matrix (I x A)
P: loading matrix (K x A)
E: residual matrix (I x K)
$t_a$: score vector, $p_a$: loading vector
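The decomposition on this slide can be checked numerically. This is a sketch under stated assumptions: random data, mean-centering as the only preprocessing, and hypothetical sizes I = 8, K = 5, A = 2.

```python
import numpy as np

rng = np.random.default_rng(1)
I, K, A = 8, 5, 2                        # hypothetical sizes; A = number of components kept
X = rng.normal(size=(I, K))
X = X - X.mean(axis=0)                   # preprocessing here is mean-centering only

U, s, Vt = np.linalg.svd(X, full_matrices=False)
T = U[:, :A] * s[:A]                     # score matrix, I x A
P = Vt[:A].T                             # loading matrix, K x A
E = X - T @ P.T                          # residual matrix, I x K

# The same model written as a sum of outer products t_a p_a'
X_model = sum(np.outer(T[:, a], P[:, a]) for a in range(A))

print(T.shape, P.shape, E.shape)
print(np.allclose(X, X_model + E))
```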
15
The Wine Example
People magazine
Wise & Gallagher
16
          Wine    Beer    Spirit   LifeEx   HeartD
France    63.5    40.1    2.5      78.0     61.1
Italy     58.0    25.1    0.9      78.0     94.1
Switz     46.0    65.0    1.7      78.0     106.4
Austra    15.7    102.1   1.2      78.0     173.0
Brit      12.2    100.0   1.5      77.0     199.7
U.S.A.    8.9     87.8    2.0      76.0     176.0
Russia    2.7     17.1    3.8      69.0     373.6
Czech     1.7     140.0   1.0      73.0     283.7
Japan     1.0     55.0    2.1      79.0     34.7
Mexico    0.2     50.4    0.8      73.0     36.4
17
                     Wine      Beer      Spirit   LifeEx    HeartD
Mean                 20.9900   68.2600   1.7500   75.9000   153.8700
Standard Deviation   24.9270   38.6718   0.9132   3.2128    110.8182
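The means and standard deviations above can be recomputed from the data slide and used to autoscale the wine data before PCA. The sketch below assumes the column order Wine, Beer, Spirit, LifeEx, HeartD and the sample (n - 1) standard deviation; the slides do not state the preprocessing explicitly, so autoscaling is an assumption.

```python
import numpy as np

# Values as listed on the data slide; columns: Wine, Beer, Spirit, LifeEx, HeartD
X = np.array([
    [63.5,  40.1, 2.5, 78.0,  61.1],   # France
    [58.0,  25.1, 0.9, 78.0,  94.1],   # Italy
    [46.0,  65.0, 1.7, 78.0, 106.4],   # Switz
    [15.7, 102.1, 1.2, 78.0, 173.0],   # Austra
    [12.2, 100.0, 1.5, 77.0, 199.7],   # Brit
    [ 8.9,  87.8, 2.0, 76.0, 176.0],   # U.S.A.
    [ 2.7,  17.1, 3.8, 69.0, 373.6],   # Russia
    [ 1.7, 140.0, 1.0, 73.0, 283.7],   # Czech
    [ 1.0,  55.0, 2.1, 79.0,  34.7],   # Japan
    [ 0.2,  50.4, 0.8, 73.0,  36.4],   # Mexico
])

mean = X.mean(axis=0)
std = X.std(axis=0, ddof=1)             # sample standard deviation (n - 1)
Z = (X - mean) / std                    # autoscaled data (assumed preprocessing)

s = np.linalg.svd(Z, compute_uv=False)
percent = 100 * s**2 / np.sum(s**2)     # percent of total sum of squares per component

print(np.round(mean, 4))
print(np.round(std, 4))
print(np.round(percent, 1))
```

The printed percentages can be compared with the values shown on the singular value and score plots that follow.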
18
[Figure: singular value vs. component; labels λ1 = 46%, 32%, 12%, 8%, 2%]
19
[Figure: score plot, Score 1 (46%) vs. Score 2 (32%); objects: Czech, Brit, Austral, Mex, USA, Japan, Switz, Italy, France, Russia]
20
[Figure: loading plot, Loading 1 vs. Loading 2; variables: Beer, Life exp., Heart dis., Wine, Spirit]
21
Conclusions
  • Scores = positions of objects in multivariate space
  • Loadings = importance of original variables for new directions
  • Try to explain a large enough portion of X (46% + 32% = 78%)

22
The Apricot Example
Manley & Geladi
23
[Figure: apricot (Appelkoos) spectra, pseudoabsorbance vs. wavelength, nm]
24
[Figure: scree plot, singular value vs. component number]
25
What is rank?
  • Mathematical rank: at most min(I, K)
  • Gives zero residual
  • Effective rank: A
  • Separates model from noise
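The difference between mathematical and effective rank can be illustrated with simulated data. This is a sketch only: the sizes, the noise level, and the two-component structure are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
I, K, A = 20, 6, 2                        # hypothetical sizes; built-in structure has A = 2
signal = rng.normal(size=(I, A)) @ rng.normal(size=(A, K))
X = signal + 0.01 * rng.normal(size=(I, K))   # small added noise

rank_signal = np.linalg.matrix_rank(signal)   # rank of the noise-free structure
rank_data = np.linalg.matrix_rank(X)          # mathematical rank of the noisy data
s = np.linalg.svd(X, compute_uv=False)

print(rank_signal, rank_data)
print(np.round(s, 3))                     # two large singular values, the rest near the noise level
```

With noise present, the mathematical rank reaches min(I, K), while the singular values show a clear drop after the components that carry the structure.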

26
ANOVA
Comp    SS        SS (%)   SScum (%)
1       68.8269   98.10    98.10
2       1.2843    1.83     99.93
3       0.0463    0.07     100
4       0.0045    0.01
5       0.0007    0.00
6       0.0003    0.00
7       0.0002    0.00
8       0.0001    0.00
9       0.0000    0.00
10      0.0000    0.00
Total   70.1634   100
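The percentage columns follow from the SS column by simple arithmetic. The sketch below uses the SS values and the total as listed in the table; the variable names are illustrative.

```python
import numpy as np

# SS per component as listed in the table (10 components)
ss = np.array([68.8269, 1.2843, 0.0463, 0.0045, 0.0007,
               0.0003, 0.0002, 0.0001, 0.0000, 0.0000])
ss_total = 70.1634                        # total as listed in the table

ss_percent = 100 * ss / ss_total          # each component's share of the total
ss_cum = np.cumsum(ss_percent)            # cumulative share

print(np.round(ss_percent[:3], 2))
print(np.round(ss_cum[:3], 2))
```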
27
[Figure: score plot, Score 1 (98%) vs. Score 2 (2%)]
28
ANOVA
  • $SS_{tot} = SS_1 + SS_2 + SS_3 + \dots + SS_{(I \text{ or } K)}$

$SS_{tot} = \lambda_1 + \lambda_2 + \lambda_3 + \dots + \lambda_{(I \text{ or } K)}$
From largest to smallest!
29
ANOVA
  • $X = T P^T + E$
  • data = model + residual
  • $SS_{tot} = SS_{mod} + SS_{res}$
  • $R^2 = SS_{mod} / SS_{tot} = 1 - SS_{res} / SS_{tot}$
  • Coefficient of determination (often in %)
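The sum-of-squares bookkeeping above can be verified on a small example. This is a minimal sketch, assuming random mean-centered data and a hypothetical choice of A = 2 components.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(10, 4))              # hypothetical data
X = X - X.mean(axis=0)
A = 2                                     # hypothetical number of components

U, s, Vt = np.linalg.svd(X, full_matrices=False)
model = (U[:, :A] * s[:A]) @ Vt[:A]       # TP'
E = X - model                             # residual

ss_tot = np.sum(X**2)
ss_mod = np.sum(model**2)
ss_res = np.sum(E**2)
r2 = ss_mod / ss_tot

print(np.isclose(ss_tot, ss_mod + ss_res))   # SStot = SSmod + SSres
print(np.isclose(ss_tot, np.sum(s**2)))      # SStot is also the sum of the eigenvalues
print(np.isclose(r2, 1 - ss_res / ss_tot))   # both forms of R2 agree
print(round(100 * r2, 2), "% explained")
```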

30
Examples
  • Wines: $R^2 = SS_{mod} = 78\%$, $SS_{res} = 22\%$, 2 Comp.
  • Apricots 1: $R^2 = SS_{mod} = 99.93\%$, $SS_{res} = 0.07\%$, 2 Comp.
  • Apricots 2: $R^2 = SS_{mod} = 100\%$, $SS_{res} = 0.0\%$, 3 Comp.

31
[Figure: absorbance vs. wavelength (nm), outliers removed]
32
[Figure: singular values vs. component, no outliers; labels λ1 = 81%, 16%, 3%]
33
[Figure: score plot, Score 2 (16%) vs. Score 3 (3%); groups: whole fruit, no kernel, thin slice]
34
[Figure: loadings 2 and 3 vs. wavelength, nm]
35
[Figure: Loading 2 vs. Loading 3]
36
More nomenclature
  • Score = Latent Variable
  • Loading vector = Eigenvector
  • Effective rank = Pseudorank = Model dimensionality = Number of components
  • $SS_a$ = Eigenvalue
  • Singular value = $SS_a^{1/2}$
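The last two equivalences can be checked directly. The sketch below assumes random mean-centered data and takes the eigenvalues from the cross-product matrix X'X; both choices are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(9, 4))               # hypothetical data
X = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
T = U * s                                 # score vectors (latent variables) as columns
ss_a = np.sum(T**2, axis=0)               # SS_a: sum of squares of each score vector

eigvals = np.linalg.eigvalsh(X.T @ X)[::-1]   # eigenvalues of X'X, largest first

print(np.allclose(ss_a, eigvals))         # SS_a = eigenvalue
print(np.allclose(np.sqrt(ss_a), s))      # singular value = SS_a^(1/2)
```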

37
An analysis sequence
  • 1. Scale, mean-center data
  • 2. Calculate a few components
  • 3. Check scores, loadings
  • 4. Find outliers, groupings, explain
  • 5. Remove outliers

38
An analysis sequence
  • 6. Scale, mean-center data
  • 7. Calculate enough components
  • 8. Try to determine pseudorank
  • 9. Check score plots
  • 10. Check loading plots
  • 11. Check residuals
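Steps 6, 7, and 11 of the sequence can be sketched as a small helper that reports the residual standard deviation after each number of components. The function name `residual_std`, the random data, and the definition of residual standard deviation as the root mean square of the residual matrix (with no degrees-of-freedom correction) are all assumptions made for illustration.

```python
import numpy as np

def residual_std(X, max_components):
    """Residual standard deviation after 0..max_components PCA components."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # scale and mean-center
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    out = []
    for a in range(max_components + 1):
        E = Z - (U[:, :a] * s[:a]) @ Vt[:a]            # residual after a components
        out.append(np.sqrt(np.mean(E**2)))             # RMS of the residual (assumed definition)
    return np.array(out)

rng = np.random.default_rng(4)
X = rng.normal(size=(12, 5))              # hypothetical data: 12 objects, 5 variables
r = residual_std(X, 4)
print(np.round(r, 3))
```

A curve of these values against the number of components is one way to judge the pseudorank: the residual stops dropping sharply once the remaining components describe mostly noise.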

39
[Figure: Wines, residual standard deviation; axis ticks 0 to 4]
40
[Figure: Wines, residual standard deviation; axis ticks 0 to 4]