Principal Component Analysis

1
Principal Component Analysis
  • An Introduction
  • by Brandon Merkl

2
What is it good for?
  • Change of Variable technique
  • Reduction of Variables
  • Interpretation of Principal Components

3
Description of Data
4
Centroid (µ)
  • µH = the average of the heights
  • µW = the average of the widths
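  • A minimal Matlab sketch (H and W are assumed column vectors holding the raw measurements):

    muH = mean(H);       % average of the heights
    muW = mean(W);       % average of the widths
    mu  = [muH muW];     % centroid, used as the origin later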

5
Spread of the data
  • Just looking at the standard deviations of the
    heights and widths will not tell us how these
    variables are related
  • We need to understand how these variables co-vary

6
Covariance Matrix (S)
  • The diagonal values are just the variances of
    Height and Width
  • The off-diagonal values measure how Height and
    Width co-vary
  • S is always symmetric (sij = sji)
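  • Continuing the sketch, with the data collected into an n-by-2 matrix:

    X = [H W];           % one row per observation: [height width]
    S = cov(X);          % 2-by-2 covariance matrix
    % S(1,1), S(2,2): variances of Height and Width
    % S(1,2) = S(2,1): their covariance (S is symmetric)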

7
Correlation Matrix (R)
  • Related to S by dividing each element sij by
    sqrt(sii × sjj)
  • Each element rij is the correlation between the
    variables in row i and column j
  • The diagonal is all 1s
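  • A sketch of both routes to R, the element-wise rescaling and Matlab's built-in:

    d = sqrt(diag(S));   % standard deviations sqrt(sii)
    R = S ./ (d * d');   % divide each sij by sqrt(sii × sjj); diagonal becomes 1
    % equivalently: R = corrcoef(X)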

8
Definition: Trace
  • The Trace of a matrix is the sum of its
    diagonal elements
  • The trace of the Covariance Matrix is commonly
    referred to as the Total Variance
  • The trace of the Correlation Matrix is p (number
    of vars.)
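  • Both identities can be checked directly:

    totalVariance = trace(S);   % sum of the diagonal of S = Total Variance
    p = trace(R);               % equals the number of variables (here 2)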

9
Caveat
  • You must calculate either S or R to perform PCA
  • If possible, use S: it is appropriate when all
    variables have similar variance
  • Otherwise use R, which is better when the
    variables are not on the same scale (miles, mm,
    etc.)

10
Eigenvalue/Eigenvector Decomposition
  • Purpose: break a matrix A into eigenvalues (λ) and
    eigenvectors (x) according to
  • Ax = λx
  • Found by solving the homogeneous equation
  • (A − λI)x = 0, which has nontrivial solutions
    exactly when det(A − λI) = 0
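  • A small example, worked by hand: with A = [2 1; 1 2],
    det(A − λI) = (2 − λ)^2 − 1 = (λ − 1)(λ − 3), so λ1 = 3 with
    eigenvector [1 1]' and λ2 = 1 with eigenvector [1 −1]'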

11
Eigenvalue Decomposition (con't)
  • In Matlab:
  • [V,D] = eig(A)
  • where the columns of V are the eigenvectors and
    the diagonal of D holds the eigenvalues, such that A*V = V*D
  • or
  • A = V*D*V' (valid here because S and R are
    symmetric, so V is orthogonal)
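  • The decomposition is easy to sanity-check numerically, e.g. on the covariance matrix S:

    [V,D] = eig(S);      % eigenvectors in the columns of V, eigenvalues on diag(D)
    norm(S*V - V*D)      % ~ 0, confirming S*V = V*D
    norm(S - V*D*V')     % ~ 0; S is symmetric, so inv(V) = V'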

12
How do you use it?
  • V, D, and µ represent a linear coordinate
    transform
  • The columns of V are directions
  • The diagonal entries of D are variances, which
    scale the coordinates
  • µ is the origin
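  • As a sketch, applying the transform to the data matrix X from above:

    Xc = X - mu;         % move the origin to the centroid µ
    Y  = Xc * V;         % rotate onto the principal directions (columns of V)
    % note: X - mu needs implicit expansion (R2016b+);
    % on older Matlab use bsxfun(@minus, X, mu)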

13
Our example V and D
14
Principal Component Scores
  • Score each data point as follows:
  • pcScorei = (dataj − µ) · PCi / si, where si = sqrt(λi)
  • What's nice about using PC scores:
  • mean = 0
  • st.dev. = 1
  • the scores have no correlation with each other
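  • A sketch of the scoring formula applied to all points at once:

    s       = sqrt(diag(D))';       % st.dev. along each principal direction (1-by-2)
    pcScore = (X - mu) * V ./ s;    % center, rotate, then scale each column
    % sanity checks: mean(pcScore) ~ [0 0], std(pcScore) ~ [1 1],
    % and corrcoef(pcScore) ~ the identity matrix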

15
Our example PC scores
16
Reduction of Variables (ROV)
  • In this scenario, we want to determine how many
    dimensions the data actually occupy

17
Example of ROV
18
Understanding Eigenvalues
  • λ1 > λ2 ≈ λ3 > 0 (one dominant direction, equal spread in the other two)
  • λ1 ≈ λ2 ≈ λ3 > 0 (the data fill all three dimensions about equally)
  • λ1 ≈ λ2 > λ3 ≈ 0 (the data lie near a plane)
  • λ1 > λ2 ≈ λ3 ≈ 0 (the data lie near a line)

19
For n-Dimensional Data
  • Sort eigenvalues largest to smallest
  • Keep only the first k eigenvalues that are deemed
    significant (k < p)
  • Use only the PC scores associated with
    significant eigenvalues
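  • A sketch of this recipe for an n-by-p data matrix X; the 95% cutoff is an
    assumed significance rule, not part of the slides:

    S = cov(X);
    [V,D] = eig(S);
    [lambda, idx] = sort(diag(D), 'descend');   % largest eigenvalues first
    V = V(:, idx);                              % reorder eigenvectors to match
    explained = cumsum(lambda) / sum(lambda);   % cumulative share of Total Variance
    k = find(explained >= 0.95, 1);             % first k eigenvalues deemed significant
    scores = (X - mean(X)) * V(:, 1:k);         % keep only those k PC scores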

20
Interpretation of Principal Components
  • Seeks to explain the meaning of the particular
    directions given by the columns of V

21
Bivariate Box plots
22
Explanation
  • The 1st PC is related to scale
  • The 2nd PC is seen as a deviation from the
    typical Width/Height ratio

23
In 3 Dimensions
  • If the data are points in 3D, then the 1st PC is
    the principal axis (the direction of minimum
    moment of inertia)
  • The 2nd PC is the next-largest axis, orthogonal
    to the 1st
  • The 3rd PC is the normal of the orthogonal-distance
    regression plane
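  • A sketch for 3D points stored in an n-by-3 matrix P (hypothetical name):

    [V,D] = eig(cov(P));
    [~, idx] = sort(diag(D), 'descend');
    V = V(:, idx);
    axis1  = V(:,1);   % 1st PC: principal axis of the point cloud
    normal = V(:,3);   % 3rd PC: normal of the orthogonal-distance regression plane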

24
Orthogonal-Distance Regression plane