1
Principal Component Analysis (PCA)
Presented by Aycan YALÇIN
2003700369
2
Outline of the Presentation
  • Introduction
  • Objectives of PCA
  • Terminology
  • Algorithm
  • Applications
  • Conclusion

3
  • Introduction

4
Introduction
  • Problem
  • Analysis of multivariate data plays a key role in
    many data analysis tasks
  • A multidimensional hyperspace is often difficult
    to visualize

Represent the data in a manner that facilitates the
analysis
5
Introduction (contd)
  • Objectives of unsupervised learning methods
  • Reduce dimensionality
  • Score all observations
  • Cluster similar observations together
  • Well-known linear transformation methods
  • PCA, Factor Analysis, Projection Pursuit, etc.

6
Introduction (contd)
  • Benefits of dimensionality reduction
  • The computational overhead of the subsequent
    processing stages is reduced
  • Noise may be reduced
  • A projection into a subspace of a very low
    dimension is useful for visualizing the data

7
Objectives of PCA
8
Objectives of PCA
  • Principal Component Analysis is a technique used to
  • Reduce the dimensionality of the data set
  • Identify new, meaningful underlying variables
  • Lose a minimum of information
  • by finding the directions in which a cloud of data
    points is stretched most

9
Objectives of PCA (contd)
  • PCA, or the Karhunen-Loève transform, summarizes
    the variation in (possibly) correlated
    multi-attribute data with a set of (a smaller
    number of) uncorrelated components, the principal
    components
  • These uncorrelated variables are linear
    combinations of the original variables
  • The objective of PCA is to reduce the
    dimensionality by extracting the smallest number
    of components that account for most of the
    variation in the original multivariate data, and
    to summarize the data with little loss of
    information

10
Terminology
11
Terminology
  • Variance
  • Covariance
  • Eigenvectors and Eigenvalues
  • Principal Components

12
Terminology (Variance)
  • Standard deviation
  • Roughly, the average distance from the mean to a
    point
  • Variance
  • The standard deviation squared
  • A one-dimensional measure (formulas after this
    slide)
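For reference, a standard formulation of these quantities for m observations (not shown explicitly on the original slide):

```latex
\bar{x} = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad
s^2 = \operatorname{var}(X) = \frac{1}{m-1}\sum_{i=1}^{m}(x_i - \bar{x})^2, \qquad
s = \sqrt{\operatorname{var}(X)}
```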

13
Terminology (Covariance)
  • Measures how two dimensions vary from the mean
    with respect to each other (formula after this
    slide)
  • cov(X,Y) > 0 the two dimensions increase together
  • cov(X,Y) < 0 one increases while the other
    decreases
  • cov(X,Y) = 0 the dimensions are uncorrelated
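The sample covariance behind these statements, in its standard form (assuming m paired observations):

```latex
\operatorname{cov}(X,Y) = \frac{1}{m-1}\sum_{i=1}^{m}(x_i - \bar{x})(y_i - \bar{y})
```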

14
Terminology (Covariance Matrix)
  • Contains covariance values between all possible
    dimensions
  • Example for three dimensions (x, y, z), always
    symmetric (sketched after this slide)

cov(x,x) = variance of component x
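A sketch of the 3 x 3 covariance matrix the slide refers to (shown as an image in the original):

```latex
C = \begin{pmatrix}
\operatorname{cov}(x,x) & \operatorname{cov}(x,y) & \operatorname{cov}(x,z) \\
\operatorname{cov}(y,x) & \operatorname{cov}(y,y) & \operatorname{cov}(y,z) \\
\operatorname{cov}(z,x) & \operatorname{cov}(z,y) & \operatorname{cov}(z,z)
\end{pmatrix}
```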
15
Terminology (Eigenvalues and Eigenvectors)
  • Eigenvalues measure the amount of variation
    explained by each PC (largest for the first PC,
    smaller for the subsequent PCs)
  • An eigenvalue > 1 indicates that a PC accounts for
    more variance than one of the original variables
    does in standardized data
  • This is commonly used as a cutoff point for
    deciding which PCs are retained
  • Eigenvectors provide the weights used to compute
    the uncorrelated PCs. These vectors give the
    directions in which the data cloud is stretched
    most

16
Terminology (Eigenvalues and Eigenvectors)
  • Vectors x having the same direction as Ax are
    called eigenvectors of A (A is an n x n matrix)
  • In the equation Ax = λx, λ is called an eigenvalue
    of A
  • Ax = λx  ⇒  (A - λI)x = 0
  • How to calculate x and λ (a NumPy sketch follows
    this slide)
  • Compute det(A - λI), which yields a polynomial of
    degree n
  • Determine the roots of det(A - λI) = 0; the roots
    are the eigenvalues λ
  • Solve (A - λI)x = 0 for each λ to obtain the
    eigenvectors x
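As a minimal sketch of this eigen-decomposition step (using NumPy, which the slides do not mention; the matrix A here is just an illustrative example):

```python
import numpy as np

# A small symmetric matrix, as covariance matrices in PCA always are
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# np.linalg.eig returns eigenvalues and column-oriented eigenvectors
# satisfying A @ x = lam * x
eigenvalues, eigenvectors = np.linalg.eig(A)

for lam, x in zip(eigenvalues, eigenvectors.T):
    # Verify that A x and lam x point in the same direction
    print(lam, np.allclose(A @ x, lam * x))
```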

17
Terminology (Principal Component)
  • The extracted uncorrelated components are called
    principal components (PCs)
  • They are estimated from the eigenvectors of the
    covariance or correlation matrix of the original
    variables
  • They are the projections of the data onto the
    eigenvectors
  • They are extracted by linear transformations of
    the original variables so that the first few PCs
    contain most of the variation in the original
    dataset

18
Algorithm
19
Algorithm
We look for axes which minimise the projection errors
and maximise the variance after projection
Example transforming the data from 2 dimensions to 1
20
Algorithm (contd)
  • Preserve as much of the variance as possible

21
Algorithm (contd)
  • The data is a matrix where
  • Rows → observations (values)
  • Columns → attributes (dimensions)
  • First center the data by subtracting the mean in
    each dimension, giving DataAdjust (i indexes the
    observations, j the dimensions, and m is the
    total number of observations)
  • Calculate the covariance matrix of DataAdjust
    (see the sketch after this slide)
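A minimal sketch of the centering and covariance step, assuming rows are observations and columns are dimensions as described above; the toy numbers and the NumPy usage are illustrative, only the name DataAdjust comes from the slides:

```python
import numpy as np

# Toy data: m = 5 observations (rows), n = 2 dimensions (columns)
Data = np.array([[2.5, 2.4],
                 [0.5, 0.7],
                 [2.2, 2.9],
                 [1.9, 2.2],
                 [3.1, 3.0]])

# Center each dimension by subtracting its mean
mean = Data.mean(axis=0)
DataAdjust = Data - mean

# Covariance matrix of the centered data (columns treated as variables)
C = np.cov(DataAdjust, rowvar=False)
print(C)
```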

22
Algorithm (contd)
  • Calculate the eigenvalues λ and eigenvectors x of
    the covariance matrix
  • The eigenvalues λj are used to calculate the
    percentage of total variance (Vj) explained by
    each component j (formula after this slide)
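The standard proportion-of-variance formula behind Vj (shown as an image in the original slide), for n components:

```latex
V_j = 100 \cdot \frac{\lambda_j}{\sum_{k=1}^{n} \lambda_k}
```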

23
Algorithm (contd)
  • Choose components to form the feature vector
  • The eigenvalues λ and eigenvectors x are sorted in
    descending order of λ
  • The component with the highest λ is the first
    principal component
  • FeatureVector = (x1, ... , xn), where each xi is a
    column-oriented eigenvector; it contains the
    chosen components
  • Derive the new dataset (see the sketch after this
    slide)
  • Transpose FeatureVector and DataAdjust
  • FinalData = RowFeatureVector x RowDataAdjust
  • This is the original data expressed in terms of
    the chosen components
  • FinalData has the eigenvectors as coordinate axes
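Continuing the earlier NumPy sketch (it assumes C, DataAdjust and mean from that snippet), the sorting and projection steps might look like this; the variable names follow the slides:

```python
# Eigen-decomposition of the covariance matrix C from the previous sketch
eigenvalues, eigenvectors = np.linalg.eigh(C)

# Sort eigenvalues (and matching eigenvectors) in descending order
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
FeatureVector = eigenvectors[:, order]      # columns are eigenvectors

# Keep only the first k components (here k = 1)
k = 1
RowFeatureVector = FeatureVector[:, :k].T   # chosen eigenvectors as rows
RowDataAdjust = DataAdjust.T                # dimensions as rows

# Project the centered data onto the chosen components
FinalData = RowFeatureVector @ RowDataAdjust
print(FinalData.shape)                      # (k, m)
```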

24
Algorithm (contd)
  • Retrieving old data (e.g. in data compression)
  • RetrievedRowData =
    (RowFeatureVector^T x FinalData) + OriginalMean
  • Yields the original data (approximately, if only
    some components were kept) expressed with the
    chosen components (see the sketch after this
    slide)
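A sketch of this reconstruction step, continuing the previous snippets (exact when all components are kept, an approximation when k < n):

```python
# Map the projected data back to the original coordinate system.
# RowFeatureVector has the chosen eigenvectors as rows, so its transpose
# maps component space back to the original dimensions.
RetrievedRowData = RowFeatureVector.T @ FinalData + mean.reshape(-1, 1)

# Rows are dimensions, columns are observations; transpose to compare with Data
RetrievedData = RetrievedRowData.T
print(np.round(RetrievedData, 2))
```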

25
Algorithm (contd)
  • Estimating the number of PCs
  • Scree Test plotting the eigenvalues against the
    corresponding PCs produces a scree plot that
    illustrates the rate of change in the magnitude
    of the eigenvalues for the PCs. The rate of
    decline tends to be fast at first and then levels
    off. The elbow, the point at which the curve
    bends, is considered to indicate the maximum
    number of PCs to extract. One less PC than the
    number at the elbow might be appropriate if you
    are concerned about getting an overly defined
    solution. (A plotting sketch follows this slide.)
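A minimal scree-plot sketch using matplotlib (not part of the original slides); the eigenvalues here are illustrative and assumed to be sorted in descending order:

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative eigenvalues, sorted in descending order
eigenvalues = np.array([4.2, 2.1, 0.6, 0.3, 0.2, 0.1])
components = np.arange(1, len(eigenvalues) + 1)

plt.plot(components, eigenvalues, "o-")
plt.xlabel("Principal component")
plt.ylabel("Eigenvalue")
plt.title("Scree plot: look for the elbow")
plt.show()
```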

26
Applications
27
Applications
  • Example applications
  • Computer Vision
  • Representation
  • Pattern Identification
  • Image compression
  • Face recognition
  • Gene expression analysis
  • Purpose determine a core set of conditions for
    useful gene comparison
  • Handwritten character recognition
  • Data Compression, etc.

28
Conclusion
29
Conclusion
  • PCA can be useful when there is a high degree of
    correlation present among the attributes
  • When a data set consists of several clusters, the
    principal axes found by PCA usually pick
    projections with good separation. PCA provides
    an effective basis for feature extraction in this
    case.
  • For good data compression, PCA offers a useful
    self-organized learning procedure

30
Conclusion (contd)
  • Shortcomings of PCA
  • PCA requires diagonalising the matrix C
    (dimension n x n), which is heavy if n is large
  • PCA only finds linear sub-spaces
  • It works best if the individual components are
    Gaussian-distributed (e.g. ICA does not rely on
    such a distribution)
  • PCA does not say how many target dimensions to use

31
Questions?