Principal Component Analysis

1
Principal Component Analysis
  • An Introduction
  • by Brandon Merkl

2
What is it good for?
  • Change of Variable technique
  • Reduction of Variables
  • Interpretation of Principal Components

3
Description of Data
4
Centroid (µ)
  • µH = the average of the heights
  • µW = the average of the widths
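  • A minimal Matlab sketch (H and W are assumed column vectors holding the raw measurements):

    muH = mean(H);       % average of the heights
    muW = mean(W);       % average of the widths
    mu  = [muH muW];     % centroid, used as the origin later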

5
Spread of the data
  • Just looking at the standard deviations of the
    heights and widths will not tell us how these
    variables are related
  • We need to understand how these variables co-vary

6
Covariance Matrix (S)
  • The diagonal values are just the variances of
    Height and Width
  • The off-diagonal values measure how Height and
    Width co-vary
  • S is always symmetric (sij = sji)
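  • Continuing the sketch, with the data collected into an n-by-2 matrix:

    X = [H W];           % one row per observation: [height width]
    S = cov(X);          % 2-by-2 covariance matrix
    % S(1,1), S(2,2): variances of Height and Width
    % S(1,2) = S(2,1): their covariance (S is symmetric)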

7
Correlation Matrix (R)
  • Related to S by dividing each element sij by
    sqrt(sii × sjj)
  • Each element rij is the correlation between the
    variables in row i and column j
  • The diagonal is all 1s
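  • A sketch of both routes to R, the element-wise rescaling and Matlab's built-in:

    d = sqrt(diag(S));   % standard deviations sqrt(sii)
    R = S ./ (d * d');   % divide each sij by sqrt(sii × sjj); diagonal becomes 1
    % equivalently: R = corrcoef(X)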

8
Definition: Trace
  • The Trace of a matrix is the sum of its
    diagonal elements
  • The trace of the Covariance Matrix is commonly
    referred to as the Total Variance
  • The trace of the Correlation Matrix is p (number
    of vars.)
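  • Both identities can be checked directly:

    totalVariance = trace(S);   % sum of the diagonal of S = Total Variance
    p = trace(R);               % equals the number of variables (here 2)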

9
Caveat
  • You must calculate either S or R to perform PCA
  • If possible, use S: it is appropriate when all
    variables have similar variance
  • Otherwise use R, which is better when the
    variables are not on the same scale (miles, mm,
    etc.)

10
Eigenvalue/Eigenvector Decomposition
  • Purpose: break a matrix A into eigenvalues (λ) and
    eigenvectors (x) according to
  • Ax = λx
  • Found by solving the homogeneous equation
  • (A − λI)x = 0, which has nontrivial solutions
    exactly when det(A − λI) = 0
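  • A small example, worked by hand: with A = [2 1; 1 2],
    det(A − λI) = (2 − λ)^2 − 1 = (λ − 1)(λ − 3), so λ1 = 3 with
    eigenvector [1 1]' and λ2 = 1 with eigenvector [1 −1]'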

11
Eigenvalue Decomposition (con't)
  • In Matlab:
  • [V,D] = eig(A)
  • where the columns of V are the eigenvectors and
    the diagonal of D holds the eigenvalues, such that A*V = V*D
  • or
  • A = V*D*V' (valid here because S and R are
    symmetric, so V is orthogonal)
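  • The decomposition is easy to sanity-check numerically, e.g. on the covariance matrix S:

    [V,D] = eig(S);      % eigenvectors in the columns of V, eigenvalues on diag(D)
    norm(S*V - V*D)      % ~ 0, confirming S*V = V*D
    norm(S - V*D*V')     % ~ 0; S is symmetric, so inv(V) = V'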

12
How do you use it?
  • V, D, and µ represent a linear coordinate
    transform
  • The columns of V are directions
  • The diagonal entries of D are variances, which
    scale the coordinates
  • µ is the origin
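  • As a sketch, applying the transform to the data matrix X from above:

    Xc = X - mu;         % move the origin to the centroid µ
    Y  = Xc * V;         % rotate onto the principal directions (columns of V)
    % note: X - mu needs implicit expansion (R2016b+);
    % on older Matlab use bsxfun(@minus, X, mu)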

13
Our example V and D
14
Principal Component Scores
  • Score each data point as follows:
  • pcScorei = (dataj − µ) · PCi / si, where si = sqrt(λi)
  • What's nice about using PC scores:
  • mean = 0
  • st.dev. = 1
  • the scores have no correlation with each other
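  • A sketch of the scoring formula applied to all points at once:

    s       = sqrt(diag(D))';       % st.dev. along each principal direction (1-by-2)
    pcScore = (X - mu) * V ./ s;    % center, rotate, then scale each column
    % sanity checks: mean(pcScore) ~ [0 0], std(pcScore) ~ [1 1],
    % and corrcoef(pcScore) ~ the identity matrix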

15
Our example PC scores
16
Reduction of Variables (ROV)
  • In this scenario, we want to determine how many
    dimensions the data actually occupy

17
Example of ROV
18
Understanding Eigenvalues
  • λ1 > λ2 ≈ λ3 > 0 (one dominant direction, equal spread in the other two)
  • λ1 ≈ λ2 ≈ λ3 > 0 (the data fill all three dimensions about equally)
  • λ1 ≈ λ2 > λ3 ≈ 0 (the data lie near a plane)
  • λ1 > λ2 ≈ λ3 ≈ 0 (the data lie near a line)

19
For n-Dimensional Data
  • Sort eigenvalues largest to smallest
  • Keep only the first k eigenvalues that are deemed
    significant (k < p)
  • Use only the PC scores associated with
    significant eigenvalues
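  • A sketch of this recipe for an n-by-p data matrix X; the 95% cutoff is an
    assumed significance rule, not part of the slides:

    S = cov(X);
    [V,D] = eig(S);
    [lambda, idx] = sort(diag(D), 'descend');   % largest eigenvalues first
    V = V(:, idx);                              % reorder eigenvectors to match
    explained = cumsum(lambda) / sum(lambda);   % cumulative share of Total Variance
    k = find(explained >= 0.95, 1);             % first k eigenvalues deemed significant
    scores = (X - mean(X)) * V(:, 1:k);         % keep only those k PC scores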

20
Interpretation of Principal Components
  • Seeks to explain the meaning of the particular
    directions given by the columns of V

21
Bivariate Box plots
22
Explanation
  • The 1st PC is related to scale
  • The 2nd PC is seen as a deviation from the
    typical Width/Height ratio

23
In 3 Dimensions
  • If the data are points in 3D, then the 1st PC is
    the principal axis (the direction of minimum
    moment of inertia)
  • The 2nd PC is the next-largest axis, orthogonal
    to the 1st
  • The 3rd PC is the normal of the orthogonal-distance
    regression plane
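  • A sketch for 3D points stored in an n-by-3 matrix P (hypothetical name):

    [V,D] = eig(cov(P));
    [~, idx] = sort(diag(D), 'descend');
    V = V(:, idx);
    axis1  = V(:,1);   % 1st PC: principal axis of the point cloud
    normal = V(:,3);   % 3rd PC: normal of the orthogonal-distance regression plane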

24
Orthogonal-Distance Regression plane