Title: Interpreting Principal Components
1 Interpreting Principal Components
- Simon Mason
- International Research Institute for Climate Prediction
- The Earth Institute of Columbia University
Linking Science to Society
2 Retaining Principal Components
Principal components analysis is designed as a data reduction technique. How many of the new variables should be retained to represent the total variability of the original variables adequately?
A stopping rule is required to identify the point beyond which additional principal components are no longer required.
3 Retaining Principal Components
There is a range of criteria that could be used to formulate a stopping rule.
Internal criteria
1. Total variance explained
2. Marginal variance explained
3. Comparison with other deleted/retained eigenvalues
External criteria
4. Usefulness
5. Physical interpretability
4 Retaining Principal Components
Total variance explained
Ensures a minimum loss of information, but there are no a priori criteria for deciding what proportion of the total variance is signal.
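As a minimal sketch of the total-variance rule, the function below returns the smallest number of leading components whose cumulative share of the variance reaches a chosen threshold. The function name, the threshold values, and the eigenvalues are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def n_components_for_variance(eigenvalues, threshold=0.9):
    """Smallest number of leading components whose cumulative share of
    the total variance reaches `threshold` (a fraction between 0 and 1)."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    cumulative = np.cumsum(lam) / lam.sum()
    # searchsorted returns the first index at which cumulative >= threshold
    return int(np.searchsorted(cumulative, threshold) + 1)

# Made-up eigenvalues, purely for illustration
example_eigenvalues = [4.0, 2.0, 1.0, 0.5, 0.5]
k_70 = n_components_for_variance(example_eigenvalues, threshold=0.70)
k_90 = n_components_for_variance(example_eigenvalues, threshold=0.90)
print(k_70, k_90)
```

The answer depends entirely on the threshold, which is the weakness noted above: nothing in the data says whether 70% or 90% is the right proportion.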
5 Retaining Principal Components
Marginal variance explained
Ensures that each retained component explains a substantial proportion of the total variance. How should the threshold c be chosen?
6 Retaining Principal Components
Marginal variance explained
1. Original variables
For the correlation matrix, the Guttman-Kaiser criterion sets c = 1. For the covariance matrix, Kaiser's rule sets c to the average variance of the original variables.
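Both versions of the rule amount to comparing each eigenvalue with the average eigenvalue, since the trace of the matrix equals the sum of its eigenvalues. The sketch below assumes that reading; the helper name `kaiser_cutoff` and the two small matrices are hypothetical examples.

```python
import numpy as np

def kaiser_cutoff(matrix):
    """Return (number of components retained, cutoff c).

    c is the average eigenvalue: 1 for a correlation matrix, and the
    average variance of the original variables for a covariance matrix."""
    matrix = np.asarray(matrix, dtype=float)
    eigenvalues = np.linalg.eigvalsh(matrix)  # assumes a symmetric matrix
    c = np.trace(matrix) / matrix.shape[0]
    return int(np.sum(eigenvalues > c)), float(c)

# Illustrative correlation matrix: three equally correlated variables
corr = np.array([[1.0, 0.6, 0.6],
                 [0.6, 1.0, 0.6],
                 [0.6, 0.6, 1.0]])
n_corr, c_corr = kaiser_cutoff(corr)

# Illustrative covariance matrix: uncorrelated variables, unequal variances
cov = np.diag([4.0, 3.0, 1.0, 0.5, 0.5])
n_cov, c_cov = kaiser_cutoff(cov)
print(n_corr, c_corr, n_cov, c_cov)
```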
7 Retaining Principal Components
Marginal variance explained
2. Significant
a. The broken stick rule
b. Rule N
Randomization procedures.
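The two rules above can be sketched as follows. The broken-stick part uses the usual expected fractions b_k = (1/p) Σ_{j=k..p} 1/j. The second function is a Rule N style Monte Carlo under assumptions of my own (uncorrelated Gaussian noise, correlation-matrix PCA, a 95th percentile cutoff), so treat it as an illustration of the idea and not a definitive implementation. All names and the observed fractions are hypothetical.

```python
import numpy as np

def broken_stick(p):
    """Expected variance fractions of p components under the broken-stick model."""
    inv = 1.0 / np.arange(1, p + 1)
    # Reversed cumulative sum gives the tail sums sum_{j=k}^{p} 1/j
    return np.cumsum(inv[::-1])[::-1] / p

def count_leading(mask):
    """Number of leading True values, stopping at the first False."""
    mask = np.asarray(mask, dtype=bool)
    return int(mask.size if mask.all() else np.argmin(mask))

def noise_eigenvalue_thresholds(n_samples, n_vars, n_trials=200,
                                percentile=95, seed=0):
    """Percentiles of the variance fractions obtained from pure noise."""
    rng = np.random.default_rng(seed)
    sims = np.empty((n_trials, n_vars))
    for t in range(n_trials):
        noise = rng.standard_normal((n_samples, n_vars))
        corr = np.corrcoef(noise, rowvar=False)
        # eigvalsh returns ascending order; reverse to descending fractions
        sims[t] = np.linalg.eigvalsh(corr)[::-1] / n_vars
    return np.percentile(sims, percentile, axis=0)

# Made-up observed variance fractions for five components
observed_fraction = np.array([0.55, 0.25, 0.10, 0.06, 0.04])

n_broken_stick = count_leading(observed_fraction > broken_stick(5))
noise_thresholds = noise_eigenvalue_thresholds(n_samples=50, n_vars=5)
n_rule_n = count_leading(observed_fraction > noise_thresholds)
print(n_broken_stick, n_rule_n)
```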
8 Retaining Principal Components
Similar variance explained
Delete a component if components with similar variance are deleted.
1. χ² approximations
2. Scree test
Delete eigenvalues below the elbow.
9 Retaining Principal Components
Similar variance explained
3. Log-eigenvalue test
A scree test using the logarithms of the eigenvalues. Based on the assumption that the eigenvalues should decline exponentially.
10 Retaining Principal Components
Usefulness
If principal components are to be used in other applications, retain the number that gives the best results. Use cross-validation. Perhaps retain subsets that do not necessarily include the first few components. The results may be subject to sampling errors, especially with subset selection.
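One way to read "retain the number that gives the best results" is sketched below: principal components are used as predictors in a regression, and the number retained is the one with the lowest cross-validated error. The regression setting, the K-fold scheme, the function name, and the synthetic data are all assumptions made for illustration. In the synthetic data the predictand depends on the second mode, so one component alone performs poorly.

```python
import numpy as np

def cv_error_by_n_components(X, y, max_components, n_folds=5):
    """Mean squared cross-validated error of principal component
    regression for 1..max_components leading components."""
    n = X.shape[0]
    folds = np.array_split(np.arange(n), n_folds)
    errors = np.zeros(max_components)
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        # Centre with training statistics only, so the held-out fold stays unseen
        x_mean, y_mean = X[train].mean(axis=0), y[train].mean()
        X_train, X_test = X[train] - x_mean, X[test_idx] - x_mean
        # PCA of the training data via SVD; rows of Vt are the loading vectors
        _, _, Vt = np.linalg.svd(X_train, full_matrices=False)
        for k in range(1, max_components + 1):
            scores = X_train @ Vt[:k].T
            coef, *_ = np.linalg.lstsq(scores, y[train] - y_mean, rcond=None)
            pred = (X_test @ Vt[:k].T) @ coef + y_mean
            errors[k - 1] += np.sum((y[test_idx] - pred) ** 2)
    return errors / n

# Synthetic example: two orthogonal modes, predictand tied to the weaker one
rng = np.random.default_rng(1)
n = 100
z1 = 3.0 * rng.standard_normal(n)   # dominant mode
z2 = 1.5 * rng.standard_normal(n)   # weaker mode that drives y
a = np.ones(6) / np.sqrt(6)
b = np.array([1, 1, 1, -1, -1, -1]) / np.sqrt(6)
X = np.outer(z1, a) + np.outer(z2, b) + 0.3 * rng.standard_normal((n, 6))
y = z2 + 0.1 * rng.standard_normal(n)

errors = cv_error_by_n_components(X, y, max_components=4)
best_k = int(np.argmin(errors) + 1)
print(errors, best_k)
```

Selecting an arbitrary subset of components, as the slide suggests, would replace the loop over k with a loop over candidate subsets; the sampling-error caveat applies more strongly there because many more candidates are compared.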
11 Retaining Principal Components
Physical interpretability
1. Time scores
Do the time scores differ from white noise?
2. Spatial loadings
Loadings identify modes of variability.
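A simple check on the time scores, offered here as one possible approach and not necessarily the one intended on the slide, is to compare the lag-1 autocorrelation with the approximate 95% bounds for white noise, ±1.96/√n. The function names and the synthetic autoregressive series are assumptions for illustration.

```python
import numpy as np

def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation of a one-dimensional series."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    return float(np.sum(x[:-1] * x[1:]) / np.sum(x * x))

def differs_from_white_noise(series, z=1.96):
    """Flag a series whose lag-1 autocorrelation lies outside the
    approximate white-noise bounds +/- z / sqrt(n)."""
    n = len(series)
    r1 = lag1_autocorrelation(series)
    bound = z / np.sqrt(n)
    return bool(abs(r1) > bound), r1, float(bound)

# Synthetic "time scores" with persistence (AR(1), coefficient 0.8)
rng = np.random.default_rng(0)
n = 400
shocks = rng.standard_normal(n)
red = np.zeros(n)
for t in range(1, n):
    red[t] = 0.8 * red[t - 1] + shocks[t]

is_structured, r1, bound = differs_from_white_noise(red)
print(is_structured, r1, bound)
```

This only looks at one lag; a series can pass it and still differ from white noise in other ways, for example through a strong periodic component.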
12 Interpreting the Principal Components
Principal components are notoriously difficult to interpret physically. The weights are defined to maximize the variance, not to maximize the interpretability!
With spatial data (including climate data) the interpretation becomes even more difficult because there are geometric controls on the correlations between the data points.
13 Buell patterns
Imagine a rectangular domain in which all the points are strongly correlated with their neighbours.
14 Buell patterns
The points in the middle of the domain will have the strongest average correlations with all other points, simply because their average distance to all other grid points is a minimum.
The strong correlations between neighbouring grid points will be represented by PC 1, with the central grid points dominating.
15 Buell patterns
The points in the corners of the domain will have the weakest average correlations with all other points, simply because their average distance to all other grid points is a maximum.
The weak correlations between distant grid points will be represented by PC 2. The direction of the dipole reflects the domain shape.
16 Buell patterns?
Are these real, or are they a function of the domain shape?
17 Buell patterns
- Because of domain shape dependency:
  - the first PC frequently indicates positive loadings with strongest values in the centre of the domain;
  - the second PC frequently indicates negative loadings on one side and positive loadings on the other side, in the direction of the longest dimension of the domain.
- Similar kinds of problems arise when using:
  - gridded data with converging longitudes, or simply with longitude spacing different from latitude spacing;
  - station data.
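The domain shape dependency can be illustrated with a purely synthetic field that has no physical modes at all. The sketch below assumes a 12 by 6 rectangular grid and a correlation that decays exponentially with distance, exp(-d/L); the grid size, the length scale L, and the correlation function are all choices made for the example. The leading eigenvectors of this correlation matrix still show a centre-weighted monopole and a dipole along the long axis.

```python
import numpy as np

# Hypothetical rectangular domain: longer in x than in y
nx, ny = 12, 6
length_scale = 5.0  # assumed decorrelation length, in grid units

# Grid-point coordinates, flattened row by row (y index varies slowest)
xx, yy = np.meshgrid(np.arange(nx), np.arange(ny))
coords = np.column_stack([xx.ravel(), yy.ravel()]).astype(float)

# Correlation depends only on the distance between points
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
corr = np.exp(-dist / length_scale)

# eigh returns eigenvalues in ascending order, so the last columns lead
eigvals, eigvecs = np.linalg.eigh(corr)
eof1 = eigvecs[:, -1].reshape(ny, nx)
eof2 = eigvecs[:, -2].reshape(ny, nx)
eof1 = eof1 * np.sign(eof1.sum())  # the overall sign of an eigenvector is arbitrary

centre_loading = eof1[ny // 2, nx // 2]
corner_loading = eof1[0, 0]
left_mean = eof2[:, : nx // 2].mean()
right_mean = eof2[:, nx // 2 :].mean()
print(centre_loading, corner_loading, left_mean, right_mean)
```

Swapping `nx` and `ny` turns the dipole to follow the new long axis, which is the sense in which the pattern is a function of the domain and not of the data.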
18 Rotation
The principal component weights are defined to maximize the variance, not to maximize the interpretability! The weights could be redefined to meet alternative criteria.
Rotation is sometimes performed to maximize the weights of as many metrics as possible, and to minimize the weights of the others.
An objective of rotation is to attain simple structure:
1. weights are either close to zero or close to one;
2. variables have high weights on only one component.
20 Rotation
- Commonly used rotation procedures include:
  - Varimax: maximises the variance of the squared loadings.
  - Quartimin: an oblique rotation.
  - Procrustes: maximises the similarity between one set of loadings and a target set. Can be orthogonal or oblique.
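As an illustration of the first procedure, the sketch below implements varimax with the widely used SVD-based iteration. It is a minimal version without Kaiser normalisation, and the loadings are made up: a perfect simple structure is deliberately mixed by a 30 degree rotation so that the effect of varimax is easy to see. Function and variable names are hypothetical.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation. Returns (rotated loadings, rotation matrix)."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    R = np.eye(k)
    objective = 0.0
    for _ in range(max_iter):
        rotated = L @ R
        # Gradient of the varimax criterion with respect to the rotation
        column_ss = np.sum(rotated ** 2, axis=0)
        grad = L.T @ (rotated ** 3 - rotated * column_ss / p)
        u, s, vt = np.linalg.svd(grad)
        R = u @ vt  # nearest orthogonal matrix to the gradient
        new_objective = s.sum()
        if new_objective < objective * (1 + tol):
            break
        objective = new_objective
    return L @ R, R

def varimax_criterion(loadings):
    """Sum over components of the variance of the squared loadings."""
    return float(np.sum(np.var(np.asarray(loadings) ** 2, axis=0)))

# Made-up loadings with simple structure, then mixed by a 30 degree rotation
simple = np.array([[0.90, 0.0], [0.80, 0.0], [0.85, 0.0],
                   [0.0, 0.90], [0.0, 0.80], [0.0, 0.85]])
theta = np.deg2rad(30.0)
mix = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
mixed = simple @ mix

rotated, R = varimax(mixed)
print(np.round(rotated, 3))
print(varimax_criterion(mixed), varimax_criterion(rotated))
```

Because the rotation is orthogonal, the sum of squared loadings of each variable is unchanged; only the way that variance is shared among the components changes.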
21 Rotation
Rotation does NOT solve Buell pattern problems, nor station and uneven gridded data problems; it only reduces them.
What if a mode does not have simple structure, for example a general warming trend?
These problems are only of concern for interpretation. Rotation may be redundant if the principal components are used as input into some other procedure.