Title: Matlab's PCA and Learning Method
1 Matlab's PCA and Learning Method
- Levenberg-Marquardt Method
2 Matlab PCA
- There are several versions of PCA; we talked about only one. For our example, we will use the PCA in the Statistics Toolbox (princomp) and the PCA in the NN package (processpca).
3 Matlab PCA: Statistics princomp()
- For princomp() from the Statistics Toolbox, it is important to remember what is actually being done.
- princomp(x)
- x is an n×p matrix; there are n data points (vectors), each of p dimensions.
- Before doing the PCA, princomp zeros the mean of each column of x.
4 Matlab PCA: Statistics princomp()
- If you use [COEFF, SCORE, latent] = princomp(x)
- COEFF is a p×p matrix with each column containing a principal component. The columns are ordered by decreasing variance.
- SCORE contains the principal component scores. Rows of SCORE correspond to observations and columns to components.
- latent contains the eigenvalues of the covariance matrix.
- The eigenvalues can also be viewed as the variances of the columns of SCORE.
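As a quick orientation, here is a minimal sketch (my own made-up data, not from the slides) of how the three outputs relate to the input:

x = randn(10, 3);                       % made-up data: n = 10 observations, p = 3 dims
[COEFF, SCORE, latent] = princomp(x);   % newer MATLAB releases use pca(x) instead
mu = mean(x);                           % the column means that princomp removes
% SCORE is the zero-mean data projected onto the principal components:
%   SCORE  == (x - repmat(mu, 10, 1)) * COEFF     (up to round-off)
% latent holds the eigenvalues of cov(x), i.e. the variances of SCORE's columns:
%   latent == var(SCORE)'                         (up to round-off)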
5 Matlab PCA: Statistics princomp()
- >> p = [1 2 3 4 5 6; 1 2 3 4 5 6]
- gives
-      1     2     3     4     5     6
-      1     2     3     4     5     6
- Unfortunately, princomp wants rows to be observations, so
- >> p = p'
- >> [coeff, score, latent] = princomp(p)
6 Matlab PCA: Statistics princomp()
- >> coeff
-     0.7071   -0.7071
-     0.7071    0.7071
- >> score
-    -3.5355    0.0000
-    -2.1213   -0.0000
-    -0.7071   -0.0000
-     0.7071    0.0000
-     2.1213    0.0000
-     3.5355    0.0000
- >> latent
-     7.0000
-     0.0000
7 Matlab PCA: Statistics princomp()
- The questions are: what do COEFF and SCORE represent, and how do we get from them back to p?
- Remember:
8 COEFF
- COEFF is a p×p matrix with each column containing a principal component.
- We have COEFF =
-     0.7071   -0.7071
-     0.7071    0.7071
- This means there are two principal components, namely [0.7071 0.7071]^T and [-0.7071 0.7071]^T.
9 COEFF
- What the preceding tells us is that if we think of the points p plotted according to a new set of axes in a 2D space, all we have to do is multiply each point by the two principal component vectors to get each new pair of coordinates.
10 COEFF
- >> P = COEFF*p      doesn't work
- >> P = COEFF*p'     doesn't work
- >> P = COEFF'*p'    works, why?
-     1.4142    2.8284    4.2426    5.6569    7.0711    8.4853
-     0.0000    0.0000    0.0000    0.0000    0.0000    0.0000
- Are these answers correct?
11 COEFF
-     1.4142    2.8284    4.2426    5.6569    7.0711    8.4853
-     0.0000    0.0000    0.0000    0.0000    0.0000    0.0000
- Are these answers correct?
- They don't seem to be, because the SCORE values are
-    -3.5355   -2.1213
-     0.0000    0.0000
- REMEMBER: princomp() finds the principal components after setting the component values to have 0 mean (i.e., after removing the column means).
12 COEFF
- Remember, the component values were
-     Original        0'd Mean
-     1    1         -2.5   -2.5
-     2    2         -1.5   -1.5
-     3    3         -0.5   -0.5
-     4    4          0.5    0.5
-     5    5          1.5    1.5
-     6    6          2.5    2.5
- Average for each column: 3.5
13 COEFF
- The 0'd-mean values are the real p values for the data, i.e., the values princomp actually works with.
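To tie the pieces together, here is a minimal sketch (using the same p as in the example above) that rebuilds the original data from SCORE, COEFF, and the removed column means:

p = [1 2 3 4 5 6; 1 2 3 4 5 6]';             % the 6x2 data used above
[coeff, score, latent] = princomp(p);
mu = mean(p);                                 % the column means (3.5 3.5) that were removed
p_again = score*coeff' + repmat(mu, 6, 1);    % undo the projection, then restore the means
% p_again equals p (up to round-off), which answers "how do we get from them to p?"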
14 PCA in Neural Networks
- The principal component analysis we have just presented would work for a NN.
- However:
- If there is a wide variation in the ranges of the elements of the vectors, then one component may overpower the other components.
- There are a couple of things one can do in this case.
15 mapminmax()
- mapminmax(p, ymin, ymax) maps the values of p so that they are in the range [ymin, ymax]. If ymin and ymax are not given, the default range is [-1, 1].
- This should be done over each of the vector components and over the outputs or targets.
- The formula for evaluating an x is y = (ymax - ymin)*(x - xmin)/(xmax - xmin) + ymin.
16
- x1 =
-      1     2     4
-      1     1     1
-      3     2     2
-      0     0     0
- >> [y, ps] = mapminmax(x1, -1, 1)
- Warning: Use REMOVECONSTANTROWS to remove rows with constant values.
- >> y
-    -1.0000   -0.3333    1.0000
-    -1.0000   -1.0000   -1.0000
-     1.0000   -1.0000   -1.0000
-    -1.0000   -1.0000   -1.0000
17
- >> ps
- ps =
-       name: 'mapminmax'
-      xrows: 4
-       xmax: [4x1 double]
-       xmin: [4x1 double]
-     xrange: [4x1 double]
-      yrows: 4
-       ymax: 1
-       ymin: -1
-     yrange: 2
18
- Now to reverse the transform:
- >> x = mapminmax('reverse', y, ps)
- x =
-      1     2     4
-      1     1     1
-      3     2     2
-      0     0     0
- >>
19 MAPSTD
- mapstd in essence normalizes the mean and the standard deviation of a matrix of data.
- mapstd(x, ymean, ystd)
- x is an n×q matrix
- ymean is the desired mean for each row (default is 0)
- ystd is the desired standard deviation for each row (default is 1)
20
- Here is how to format a matrix so that each row is mapped to the default mean and std of 0 and 1.
- x1 = [1 2 4; 1 1 1; 3 2 2; 0 0 0]
- [y1, ps] = mapstd(x1)
- Next, we apply the same processing settings to new values.
- x2 = [5 2 3; 1 1 1; 6 7 3; 0 0 0]
- y2 = mapstd('apply', x2, ps)
21
- Here we reverse the processing of y1 to get x1 again.
- x1_again = mapstd('reverse', y1, ps)
- Algorithm: It is assumed that x has only finite real values, and that the elements of each row are not all equal.
- y = (x - xmean)*(ystd/xstd) + ymean
22 At Last!
- In the NN package there is also a PCA capability. It is called processpca().
- processpca(X, MAXFRAC) takes X and an optional parameter:
- X - N×Q matrix
- MAXFRAC - maximum fraction of variance for removed rows (default 0); e.g., if 0.02, a component is deleted if it contributes less than 2% of the variation in the data set
- and returns:
- Y - M×Q matrix with N-M rows deleted (optional)
- PS - process settings, to allow consistent processing of values
23 At Last!
- We create a matrix with an independent row, a correlated row, and a completely redundant row; processpca makes the rows uncorrelated and drops the redundant row.
- x1_independent = rand(1,5)
- x1_correlated = rand(1,5) + x1_independent
- x1_redundant = x1_independent + x1_correlated
- x1 = [x1_independent; x1_correlated; x1_redundant]
- [y1, ps] = processpca(x1)
24
- x1 =
-     0.8753    0.7043    0.4970    0.6096    0.0778
-     1.1167    0.7301    1.3435    0.6245    0.6282
-     1.9920    1.4344    1.8405    1.2341    0.7061
- >> [y1, ps] = processpca(x1)
- y1 =
-    -2.4417   -1.7450   -2.3117   -1.5006   -0.9070
-     0.1388    0.2038   -0.3090    0.1805   -0.2769
-     0.0000    0.0000    0.0000    0.0000         0
25 At Last!
- Next, we apply the same processing settings to new values.
- x2_independent = rand(1,5)
- x2_correlated = rand(1,5) + x2_independent
- x2_redundant = x2_independent + x2_correlated
- x2 = [x2_independent; x2_correlated; x2_redundant]
- y2 = processpca('apply', x2, ps)
26 At Last!
- Algorithm:
- Values in rows whose elements are not all the same are set to
- y = 2*(x - minx)/(maxx - minx) - 1
- Values in rows with all the same value are set to 0.
27 Warning
- One mistake sometimes made is to train the network on normalized and/or PCA-modified data, and then, when it comes time to test the network, forget to likewise modify the testing data (see the sketch after this list).
- Also, recognize that the NN preprocessing functions assume that the data is p×n, i.e., n columns of vectors, whereas princomp assumes n×p, i.e., n rows of vectors.
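A minimal sketch of the intended pipeline; trainP, trainT, and testP are hypothetical training/test matrices, and net is assumed to be a network already created (e.g., with newff):

[trainPn, ps] = mapstd(trainP);          % normalize the training inputs
net = train(net, trainPn, trainT);       % train on the normalized data
testPn = mapstd('apply', testP, ps);     % apply the SAME settings to the test inputs
testY  = sim(net, testPn);               % skipping the 'apply' step invalidates the test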
28 Gradient Learning
- Nearly all of the optimization techniques are based on some form of trying to optimize a cost function using a measure of its slope (gradient).
- In general, they look at the Taylor series expansion of the cost function and use an approximation of that function.
29 Gradient Learning
- Steepest descent looks at the gradient of the cost function, i.e., it considers only up to the first-derivative term of the Taylor series expansion.
- The LMS algorithm, of which BP is a version, also uses gradient descent, but does it for the instantaneous error.
- If one includes terms up through the second derivative of the cost function, we have Newton's method (see the expansion below).
- This second derivative is called the Hessian.
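For reference, in standard notation (not from the slides), the second-order Taylor expansion of the cost F around the current weights w, and the two updates just described, are:

F(\mathbf{w} + \Delta\mathbf{w}) \approx F(\mathbf{w}) + \nabla F(\mathbf{w})^{T}\,\Delta\mathbf{w} + \tfrac{1}{2}\,\Delta\mathbf{w}^{T} H\,\Delta\mathbf{w}

Steepest descent (first derivative only): \Delta\mathbf{w} = -\alpha\,\nabla F(\mathbf{w})

Newton's method (uses the Hessian H): \Delta\mathbf{w} = -H^{-1}\,\nabla F(\mathbf{w})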
30 Levenberg-Marquardt (LM) Method
- The LM method is a compromise between the Gauss-Newton method
- converges rapidly to a minimum but may diverge
- requires the computation of the Hessian, which is computationally expensive
- and gradient descent
- converges, though slowly, if the learning rate is properly chosen.
31 Levenberg-Marquardt (LM) Method
- The Hessian requires second-order derivatives.
- LM instead uses the Jacobian, which involves only first-order derivatives (the Hessian is approximated from it).
- Don't worry about the details; just accept that it is faster and simpler than either GN (Gauss-Newton) or SD (steepest descent).
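In standard LM notation (not from the slides), with error vector e and Jacobian J of the errors with respect to the weights, the approximation and the weight update are:

H \approx J^{T} J, \qquad \nabla F = J^{T}\mathbf{e}

\Delta\mathbf{w} = -\left(J^{T} J + \mu I\right)^{-1} J^{T}\mathbf{e}

Large \mu makes the step behave like (small) steepest descent; small \mu makes it behave like Gauss-Newton, which is exactly the compromise described above.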