Matlab's PCA and Learning Method

1
Matlab's PCA and Learning Method
  • Levenberg-Marquardt Method

2
Matlab PCA
  • There are several versions of PCA; we talked
    about only one. For our examples, we will use the
    PCA in the Statistics Toolbox and the PCA in the
    NN Toolbox.

3
Matlab PCA Statistics princomp()
  • For princomp() from the Statistics Toolbox, it is
    important to remember what is actually being
    done.
  • princomp(x)
  • x is an n-by-p matrix, thus there are n data
    points (vectors) of p dimensions each.
  • Before doing the PCA, princomp zeros the mean of
    each column of x (a sketch of the equivalent
    computation follows).
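For reference, here is a minimal sketch of what princomp() effectively computes. This is not the toolbox code; it assumes a plain eigen-decomposition of the sample covariance, and the signs of the components may differ from princomp's.

x  = rand(10,3);                            % n-by-p example data: rows are observations
x0 = x - repmat(mean(x,1), size(x,1), 1);   % zero the mean of each column
[V,D] = eig(cov(x0));                       % eigen-decomposition of the covariance matrix
[latent, order] = sort(diag(D), 'descend'); % eigenvalues, largest first
coeff = V(:, order);                        % columns are the principal components
score = x0 * coeff;                         % coordinates of the centered data on the new axes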

4
Matlab PCA Statistics princomp()
  • If you use [COEFF,SCORE,latent] = princomp(x)
  • COEFF is a p-by-p matrix with each column
    containing a principal component. The columns are
    ordered by decreasing variance.
  • SCORE contains the principal component scores.
    Rows of SCORE correspond to observations and
    columns to components.
  • latent holds the eigenvalues of the covariance
    matrix of x.
  • They can also be viewed as the variances of the
    columns of SCORE.

5
Matlab PCA Statistics princomp()
  • >> p = [1 2 3 4 5 6; 1 2 3 4 5 6]
  • gives
  • 1 2 3 4 5 6
  • 1 2 3 4 5 6
  • unfortunately for princomp we want rows to be
    observations, so
  • >> p = p'
  • >> [coeff,score,latent] = princomp(p)

6
Matlab PCA Statistics princomp()
  • >> coeff
  • 0.7071 -0.7071
  • 0.7071 0.7071
  • >> score
  • -3.5355 0.0000
  • -2.1213 -0.0000
  • -0.7071 -0.0000
  • 0.7071 0.0000
  • 2.1213 0.0000
  • 3.5355 0.0000
  • >> latent
  • 7.0000
  • 0.0000

7
Matlab PCA Statistics princomp()
  • The questions are: what do COEFF and SCORE
    represent, and how do we get from them back to p?
  • Remember that SCORE is computed from the
    zero-mean version of p multiplied by COEFF.

8
COEFF
  • COEFF is a p-by-p matrix with each column
    containing a principal component.
  • We have COEFF =
  • 0.7071 -0.7071
  • 0.7071 0.7071
  • This means there are two principal components,
    namely [0.7071 0.7071]ᵀ and [-0.7071 0.7071]ᵀ.

9
COEFF
  • What the preceding tells us is that if we think
    of the points of p as plotted against a new set
    of axes in 2D space, all we have to do is
    multiply each point by the two principal
    component vectors to get each new pair of
    coordinates.

10
COEFF
  • >> P = COEFF*p      doesn't work
  • >> P = COEFF'*p     doesn't work
  • >> P = COEFF'*p'    works, why?
  • 1.4142 2.8284 4.2426 5.6569 7.0711 8.4853
  • 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
  • Are these answers correct?

11
COEFF
  • 1.4142 2.8284 4.2426 5.6569 7.0711 8.4853
  • 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
  • Are these answers correct?
  • They don't seem to be, because the SCORE values
    are
  • -3.5355 -2.1213 ...
  • 0.0000 0.0000 ...
  • REMEMBER: princomp() finds the principal
    components after setting each column to have 0
    mean (see the check below).
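A quick check (using p, coeff, and score from the example above): centering p first makes the projection match SCORE.

p0 = p - repmat(mean(p,1), size(p,1), 1);   % subtract the column means (here 3.5)
score_check = p0 * coeff                    % same values as SCORE
% COEFF'*p' differed only because p had not been centered first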

12
COEFF
  • Remember the component values were:
  •   Original    Zeroed mean
  •   1   1       -2.5  -2.5
  •   2   2       -1.5  -1.5
  •   3   3       -0.5  -0.5
  •   4   4        0.5   0.5
  •   5   5        1.5   1.5
  •   6   6        2.5   2.5
  • The average of each column is 3.5.

13
COEFF
  • The zeroed-mean values are the "real" p values
    that princomp works with; the sketch below shows
    how to get back from COEFF and SCORE to the
    original p.
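And to answer the question from slide 7, the original p can be recovered from COEFF and SCORE by undoing the projection and adding the column means back (again using the variables from the example above):

mu = mean(p, 1);                                         % the column means princomp removed
p_again = score * coeff' + repmat(mu, size(score,1), 1)  % reproduces p (up to round-off)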

14
PCA in Neural Networks
  • The principal component analysis we have just
    presented would work for a NN.
  • However:
  • If there is a wide variation in the ranges of the
    elements of the vectors, then one component may
    overpower the other components.
  • There are a couple of things one can do in this
    case.

15
Mapminmax()
  • mapminmax(x,ymin,ymax) maps the values in each
    row of x so that they lie in the range
    [ymin, ymax]. If ymin and ymax are not given, the
    default range is [-1, 1].
  • This should be done over each of the vector
    components and over the outputs or targets.
  • The formula for evaluating an x is
  • y = (ymax - ymin) * (x - xmin) / (xmax - xmin) + ymin
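A hand evaluation of that formula on one row, [1 2 4], with the default range [-1, 1] reproduces what mapminmax returns for that row in the example on the next slide (the variable names here are ours):

x = [1 2 4];  xmin = min(x);  xmax = max(x);
ymin = -1;  ymax = 1;
y = (ymax - ymin) * (x - xmin) / (xmax - xmin) + ymin
% y = -1.0000  -0.3333  1.0000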

16
  • x1 =
  • 1 2 4
  • 1 1 1
  • 3 2 2
  • 0 0 0
  • >> [y,ps] = mapminmax(x1,-1,1)
  • Warning: Use REMOVECONSTANTROWS to remove rows
    with constant values.
  • >> y
  • -1.0000 -0.3333 1.0000
  • -1.0000 -1.0000 -1.0000
  • 1.0000 -1.0000 -1.0000
  • -1.0000 -1.0000 -1.0000

17
  • >> ps
  • name: 'mapminmax'
  • xrows: 4
  • xmax: [4x1 double]
  • xmin: [4x1 double]
  • xrange: [4x1 double]
  • yrows: 4
  • ymax: 1
  • ymin: -1
  • yrange: 2

18
  • Now to reverse the transform
  • >> x = mapminmax('reverse',y,ps)
  • x =
  • 1 2 4
  • 1 1 1
  • 3 2 2
  • 0 0 0
  • >>

19
MAPSTD
  • mapstd in essence normalizes the mean and the
    standard deviation of each row of a matrix of
    data.
  • mapstd(x,ymean,ystd)
  • x is an n-by-q matrix
  • ymean is the target mean value for each row
    (default is 0)
  • ystd is the target standard deviation for each
    row (default is 1)

20
Here is how to format a matrix so that the mean and
standard deviation of each row are mapped to the
default mean and std of 0 and 1.
x1 = [1 2 4; 1 1 1; 3 2 2; 0 0 0]
[y1,ps] = mapstd(x1)
Next, we apply the same processing settings to new
values.
x2 = [5 2 3; 1 1 1; 6 7 3; 0 0 0]
y2 = mapstd('apply',x2,ps)

21
Here we reverse the processing of y1 to get x1
again.
x1_again = mapstd('reverse',y1,ps)
Algorithm: It is assumed that x has only finite real
values, and that the elements of each row are not
all equal.
y = (x - xmean) * (ystd/xstd) + ymean
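A hand check of that algorithm on the first row of x1, using the default ymean = 0 and ystd = 1 (the variable names here are ours):

x = [1 2 4];                    % first row of x1
y = (x - mean(x)) * (1/std(x))  % matches the first row of y1 from mapstd(x1)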
22
At Last!
  • In the NN package there is also a PCA capability.
    It is called processpca().
  • processpca(X,MAXFRAC) takes X and an optional
    parameter:
  • X - NxQ matrix
  • MAXFRAC - maximum fraction of variance for removed
    rows (default 0); e.g. if it is 0.02, a component
    is deleted if it contributes less than 2% of the
    variation in the data set
  • and returns:
  • Y - NxQ matrix with N-M rows deleted (optional)
  • PS - process settings, to allow consistent
    processing of values (a sketch using a nonzero
    MAXFRAC follows)
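As a sketch of the MAXFRAC option (here x stands for any N-by-Q data matrix and x_new for hypothetical new data with the same rows; neither comes from the slides):

[y, ps] = processpca(x, 0.02);            % drop components contributing < 2% of the variance
y_new   = processpca('apply', x_new, ps); % apply the same settings to new data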

23
At Last!
  • We create a matrix with an independent row, a
    correlated row, and a completely redundant row,
    so that after processing its rows are
    uncorrelated and the redundant row is dropped.
  • x1_independent = rand(1,5)
  • x1_correlated = rand(1,5) + x1_independent
  • x1_redundant = x1_independent + x1_correlated
  • x1 = [x1_independent; x1_correlated; x1_redundant]
  • [y1,ps] = processpca(x1)

24
  • x1 =
  • 0.8753 0.7043 0.4970 0.6096 0.0778
  • 1.1167 0.7301 1.3435 0.6245 0.6282
  • 1.9920 1.4344 1.8405 1.2341 0.7061
  • >> [y1,ps] = processpca(x1)
  • y1 =
  • -2.4417 -1.7450 -2.3117 -1.5006 -0.9070
  • 0.1388 0.2038 -0.3090 0.1805 -0.2769
  • 0.0000 0.0000 0.0000 0.0000 0

25
At Last!
  • Next, we apply the same processing settings to
    new values.
  • x2_independent = rand(1,5)
  • x2_correlated = rand(1,5) + x2_independent
  • x2_redundant = x2_independent + x2_correlated
  • x2 = [x2_independent; x2_correlated; x2_redundant]
  • y2 = processpca('apply',x2,ps)

26
At Last!
  • Algorithm:
  • Values in rows whose elements are not all
    the same are set to
  • y = 2*(x - minx)/(maxx - minx) - 1
  • Values in rows with all the same value are
    set to 0.

27
Warning
  • One mistake sometimes made is to train the
    network on normalized and/or PCA-modified data,
    and then, when it comes time to test the network,
    forget to modify the testing data in the same
    way.
  • Also, recognize that the NN preprocessing
    functions assume that the data is pxn, i.e. n
    columns of vectors, whereas princomp assumes nxp,
    i.e. n rows of vectors. A sketch of the
    train/test pattern follows.
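A sketch of the pattern this warning calls for (train_x and test_x are hypothetical matrices with one sample per column):

[train_y, ps] = mapminmax(train_x);       % normalization settings come from the training data only
test_y = mapminmax('apply', test_x, ps);  % the SAME settings are applied to the test data
% (outputs/targets get the same treatment with their own ps)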

28
Gradient Learning
  • Nearly all of the optimization techniques are
    based on some form of optimizing a cost function
    using a measure of its slope (the gradient).
  • In general, they look at the Taylor series
    expansion of the cost function and use an
    approximation of that function.
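Concretely (a standard statement, not from the slides), expanding the cost F around the current point x gives
F(x + Δx) ≈ F(x) + gᵀΔx + (1/2)ΔxᵀHΔx,
where g is the gradient and H is the Hessian of F at x; the methods below differ in how much of this expansion they keep.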

29
Gradient Learning
  • Steepest descent looks at the gradient of the
    cost function, i.e. it considers only up to the
    first derivative of the Taylor series expansion.
  • The LMS algorithm, of which BP is a version, also
    uses gradient descent, but applies it to the
    instantaneous error.
  • If one includes terms up through the second
    derivative of the cost function, we have Newton's
    method.
  • The matrix of second derivatives is called the
    Hessian.
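In update form (standard statements, not from the slides): steepest descent steps x(k+1) = x(k) - a·g(k), while Newton's method steps x(k+1) = x(k) - H(k)⁻¹·g(k), where g(k) and H(k) are the gradient and Hessian at x(k) and a is the learning rate.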

30
Levenberg-Marquardt (LM) Method
  • The LM method is a compromise between the
    Gauss-Newton method, which
  • converges rapidly to a minimum but may diverge,
    and
  • requires the computation of the Hessian, which is
    computationally expensive,
  • and gradient descent, which
  • converges, but slowly, if the learning rate is
    properly chosen.

31
Levenberg-Marquardt (LM) Method
  • The Hessian requires second-order derivatives.
  • LM uses the squared Jacobian (JᵀJ), which needs
    only first-order derivatives.
  • Don't worry about the details; just accept that
    it is faster and simpler than either the GN
    (Gauss-Newton) or SD (steepest descent) method.
    The standard update is sketched below.
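For reference, the standard LM formulation (not taken from these slides): for a sum-of-squared-errors cost with error vector e and Jacobian J, the gradient is Jᵀe, the Hessian is approximated by JᵀJ, and the weight update is
Δw = -(JᵀJ + µI)⁻¹ Jᵀe.
A large µ makes the step behave like gradient descent with a small learning rate; a small µ makes it behave like Gauss-Newton.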