Title: Correspondence Analysis
1Correspondence Analysis
Correspondence analysis is a descriptive/exploratory
technique designed to analyse simple two-way
and multi-way tables containing some measure of
correspondence between the rows and columns. The
results provide information similar in nature to
that produced by Factor Analysis techniques, and
they allow one to explore the structure of the
categorical variables included in the table. The
most common table of this type is the two-way
frequency cross-tabulation table.
Tuesday, 14 January 2014, 3:44 PM
2Correspondence Analysis
Correspondence analysis (CA) may be defined as a
special case of principal components analysis
(PCA) of the rows and columns of a table,
especially applicable to a cross-tabulation.
However, CA and PCA are used under different
circumstances. Principal components analysis is
used for tables consisting of continuous
measurements, whereas correspondence analysis is
applied to contingency tables (i.e.
cross-tabulations). Its primary goal is to
transform a table of numerical information into a
graphical display, in which each row and each
column is depicted as a point.
3Correspondence Analysis
In a typical correspondence analysis, a
cross-tabulation table of frequencies is first
standardised, so that the relative frequencies
across all cells sum to 1.0. One way to state
the goal of a typical analysis is to represent
the entries in the table of relative frequencies
in terms of the distances between individual rows
and/or columns in a low-dimensional space. There
are several parallels in interpretation between
correspondence analysis and factor analysis.
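The standardisation step can be sketched in a few lines of Python. This is a minimal illustration only: the 3 x 3 table is made up for the example (it is not the region by political affiliation data), and the function name is an assumption.

```python
def relative_frequencies(table):
    """Divide every cell by the grand total so the cells sum to 1.0."""
    n = sum(sum(row) for row in table)
    return [[cell / n for cell in row] for row in table]

# Hypothetical toy cross-tabulation (NOT the data used in these slides).
toy_table = [
    [20, 10, 5],
    [10, 20, 10],
    [5, 10, 10],
]

P = relative_frequencies(toy_table)
total = sum(sum(row) for row in P)  # 1.0 up to rounding error
```

The matrix P of relative frequencies is the starting point for the masses, profiles and distances discussed on the later slides.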
4Correspondence Analysis
L. Doey and J. Kurta (2011). Correspondence
Analysis Applied to Psychological Research.
Tutorials in Quantitative Methods for Psychology,
Vol. 7(1), p. 5-14.
5Correspondence Analysis
P.M. Yelland (2010). An Introduction to
Correspondence Analysis. The Mathematica Journal,
Vol. 12, p. 1-23.
6Correspondence Analysis
The data summarises individuals' political
affiliation (1, ..., 5) and geographic region
(1, ..., 4).
1 Liberal
2 Tend Lib
3 Moderate
4 Tend Cons
5 Conservative
7Correspondence Analysis
The data summarises individuals' political
affiliation (1, ..., 5) and geographic region
(1, ..., 4).
1 Northeast
2 Midwest
3 South
4 West
8Correspondence Analysis
The data (a) summarises individuals' political
affiliation (1, ..., 5) and geographic region
(1, ..., 4).
725 rows of data
9Correspondence Analysis
Analyze > Dimension Reduction > Correspondence
Analysis
10Correspondence Analysis
Select the row and column variables and define
their ranges.
Having defined the ranges, use the buttons at the
side of the screen to set the desired parameters.
11Correspondence Analysis
Define Row Range: set the row bounds, then click
Update and Continue.
There are 4 regions.
12Correspondence Analysis
Define Column Range: set the column bounds, then
click Update and Continue.
There are 5 political affiliations.
13Correspondence Analysis
Finally, use the buttons at the side of the
screen to set the desired parameters.
14Correspondence Analysis
Select Statistics
15Correspondence Analysis
Select Plots
16Correspondence Analysis
Finally, use the OK button to run the analysis.
17Correspondence Analysis
The Correspondence Table is simply the
cross-tabulation of the row and column variables,
including the row and column marginal totals,
serving as input.
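As a rough sketch of how such a table arises from raw coded records, the following Python counts (row, column) pairs and appends the marginal totals. The seven records are invented purely for illustration; the real file has 725 rows, and the function name is an assumption.

```python
from collections import Counter

def crosstab_with_margins(records, n_rows, n_cols):
    """Cross-tabulate (row_code, col_code) pairs coded 1..n_rows and 1..n_cols."""
    counts = Counter(records)
    table = [[counts[(r, c)] for c in range(1, n_cols + 1)]
             for r in range(1, n_rows + 1)]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return table, row_totals, col_totals, sum(row_totals)

# Hypothetical (region, affiliation) pairs, NOT the slide data.
records = [(1, 1), (1, 1), (1, 2), (2, 3), (3, 5), (4, 4), (4, 4)]

table, row_totals, col_totals, n = crosstab_with_margins(records, 4, 5)
```

Here rows stand for the 4 regions and columns for the 5 political affiliations, matching the ranges defined in the dialog.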
18Correspondence Analysis
The Row Profiles are the cell contents divided by
their corresponding row total (e.g. 19/131 =
0.145 for the first cell). This table also shows
the column masses (column marginals as a
proportion of n) (e.g. 93/725 = 0.128). These are
intermediate calculations on the way toward
computing distances between points. Note the
column of 1s.
19Correspondence Analysis
Column Profiles are the cell elements divided by
the column marginals (e.g. 19/93 = 0.204). This
table also shows the row masses (row marginals as
a proportion of n) (e.g. 131/725 = 0.181). These
are intermediate calculations on the way toward
computing distances between points. Note the row
of 1s.
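The profile and mass calculations can be checked with a short Python sketch. The four spot checks use only figures quoted on these slides (first cell 19, first row total 131, n = 725, and a first column total of 93, as the column mass 93/725 implies). The 3 x 3 table is a made-up example and the function name is an assumption.

```python
def profiles_and_masses(table):
    """Return row profiles, column profiles, row masses and column masses."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    row_profiles = [[cell / rt for cell in row]
                    for row, rt in zip(table, row_totals)]
    col_profiles = [[cell / ct for cell, ct in zip(row, col_totals)]
                    for row in table]
    row_masses = [rt / n for rt in row_totals]
    col_masses = [ct / n for ct in col_totals]
    return row_profiles, col_profiles, row_masses, col_masses

# Spot checks with the figures quoted on the slides.
first_row_profile_cell = round(19 / 131, 3)   # 0.145
first_col_profile_cell = round(19 / 93, 3)    # 0.204
first_col_mass = round(93 / 725, 3)           # 0.128
first_row_mass = round(131 / 725, 3)          # 0.181

# Hypothetical toy table (NOT the slide data).
toy_table = [[20, 10, 5], [10, 20, 10], [5, 10, 10]]
row_profiles, col_profiles, row_masses, col_masses = profiles_and_masses(toy_table)
```

Each row profile sums to 1 (the column of 1s), and each column of the column profiles sums to 1 (the row of 1s).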
20Correspondence Analysis
In the Summary table, we first look at the
chi-square value and see that it is significant,
justifying the assumption that the two variables
are related.
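For reference, the Pearson chi-square statistic can be sketched as follows. The table is a made-up 3 x 3 example rather than the slide data, and the function name is an assumption. The sketch also returns chi-square divided by n, which is the total inertia of the table.

```python
def chi_square(table):
    """Pearson chi-square, degrees of freedom, and total inertia (chi2 / n)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for row, rt in zip(table, row_totals):
        for observed, ct in zip(row, col_totals):
            expected = rt * ct / n  # count expected under independence
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, chi2 / n

# Hypothetical toy table (NOT the slide data).
toy_table = [[20, 10, 5], [10, 20, 10], [5, 10, 10]]
chi2, df, total_inertia = chi_square(toy_table)

# A table whose rows are exactly proportional shows no association.
chi2_indep, _, _ = chi_square([[1, 2], [2, 4]])
```

With 4 degrees of freedom the 5% critical value is 9.488, so the toy table would count as significant. SPSS reports the significance level directly in the Summary table.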
21Correspondence Analysis
SPSS has computed the interpoint distances and
subjected the distance matrix to principal
components analysis, yielding in this case three
dimensions.
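A table with r rows and c columns yields at most min(r - 1, c - 1) dimensions, hence three for the 4 x 5 table here. The Python sketch below uses one common formulation, an eigen-decomposition of the cross-product of the standardised residuals. It is not necessarily the algorithm SPSS uses internally. The toy table and all function names are assumptions, and a small Jacobi rotation routine is written out by hand so that only the standard library is needed.

```python
import math

def max_dimensions(n_rows, n_cols):
    return min(n_rows - 1, n_cols - 1)

def standardized_residuals(table):
    """(p_ij - r_i * c_j) / sqrt(r_i * c_j) for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[(obs / n - (rt / n) * (ct / n)) / math.sqrt((rt / n) * (ct / n))
             for obs, ct in zip(row, col_totals)]
            for row, rt in zip(table, row_totals)]

def jacobi_eigenvalues(matrix, sweeps=50):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    m = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(m) for j in range(m) if i != j)
        if off < 1e-24:
            break
        for p in range(m - 1):
            for q in range(p + 1, m):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(m):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(m):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(m)), reverse=True)

# Hypothetical toy table (NOT the slide data).
toy_table = [[20, 10, 5], [10, 20, 10], [5, 10, 10]]
S = standardized_residuals(toy_table)
cols = list(zip(*S))
cross_product = [[sum(x * y for x, y in zip(cj, ck)) for ck in cols] for cj in cols]

inertias = jacobi_eigenvalues(cross_product)   # eigenvalues, largest first
singular_values = [math.sqrt(max(v, 0.0)) for v in inertias]
max_dims = max_dimensions(len(toy_table), len(toy_table[0]))
```

For the 3 x 3 toy table only two eigenvalues are non-zero, and they add up to the total inertia (chi-square divided by n).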
22Correspondence Analysis
Only the interpretable dimensions are reported,
not the full solution, which is why the
eigenvalues (labelled Inertia; these are the
percent of variance explained by each dimension)
add to something less than 100%, in this case
only 0.057 = 5.7%. This reflects the fact that the
correlation between region and political outlook,
while significant, is weak.
23Correspondence Analysis
The eigenvalues (called inertia here) reflect
the relative importance of each dimension, with
the first always being the most important, the
next second most important, etc.
24Correspondence Analysis
The singular values are simply the square roots
of the eigenvalues. They are interpreted as the
maximum canonical correlation between the
categories of the variables in the analysis for
any given dimension.
25Correspondence Analysis
Note that the "Proportion of Inertia" columns are
the dimension eigenvalues divided by the total
(table) eigenvalue. That is, they are the percent
of variance each dimension explains of the
variance explained; thus the first dimension
explains 62.7% of the 5.7% of the variance
explained by the model.
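The arithmetic linking these columns can be sketched in Python using only the two rounded figures quoted on the slides (total inertia 0.057 and a first-dimension proportion of 62.7%). Because the inputs are rounded, the results are approximate and may differ in the last digit from the SPSS output. The variable names are assumptions.

```python
import math

total_inertia = 0.057      # total (table) inertia quoted on the slides
proportion_dim1 = 0.627    # proportion of inertia for dimension 1

# Eigenvalue (inertia) of dimension 1 = proportion * total inertia.
inertia_dim1 = proportion_dim1 * total_inertia

# Singular value = square root of the eigenvalue.
singular_value_dim1 = math.sqrt(inertia_dim1)
```

This gives an inertia of roughly 0.036 and a singular value of roughly 0.19 for the first dimension.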
26Correspondence Analysis
The standard deviation columns refer back to the
singular values and help the researcher assess
the relative precision of each dimension.
27Correspondence Analysis
The Overview Row Points table, for each row point
in the correspondence table, displays the mass,
scores in dimension, inertia, contribution of the
point to the inertia of the dimension, and
contribution of the dimension to the inertia of
the point.
28Correspondence Analysis
Keyword interpretations:
Mass: the marginal proportions of the row
variable, used to weight the point profiles when
computing point distances. This weighting has the
effect of compensating for unequal numbers of
cases.
Scores in dimension: the scores used as
coordinates for points when plotting the
correspondence map. Each point has a score on
each dimension.
Inertia: variance.
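To show how the masses act as weights, here is a minimal Python sketch of the chi-square distance between row profiles. Each squared difference is divided by the corresponding column mass, and the mass-weighted squared distances of the row profiles from the average profile add up to the total inertia. The table is a made-up example, not the slide data, and the function name is an assumption.

```python
def chi_square_distance_sq(profile_a, profile_b, col_masses):
    """Squared chi-square distance between two row profiles."""
    return sum((a - b) ** 2 / m
               for a, b, m in zip(profile_a, profile_b, col_masses))

# Hypothetical toy table (NOT the slide data).
toy_table = [[20, 10, 5], [10, 20, 10], [5, 10, 10]]
row_totals = [sum(row) for row in toy_table]
col_totals = [sum(col) for col in zip(*toy_table)]
n = sum(row_totals)

row_masses = [rt / n for rt in row_totals]
col_masses = [ct / n for ct in col_totals]
row_profiles = [[cell / rt for cell in row]
                for row, rt in zip(toy_table, row_totals)]

# The average row profile (centroid) equals the vector of column masses.
total_inertia = sum(
    mass * chi_square_distance_sq(profile, col_masses, col_masses)
    for mass, profile in zip(row_masses, row_profiles)
)

# Distance between the first two row categories.
d01_sq = chi_square_distance_sq(row_profiles[0], row_profiles[1], col_masses)
```

Rows with larger masses contribute more to the total, which is the sense in which the weighting compensates for unequal numbers of cases.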
29Correspondence Analysis
Contribution of points to dimensions: just as
factor loadings are used in conventional factor
analysis to ascribe meaning to dimensions, so
"contribution of points to dimensions" is used to
intuit the meaning of correspondence dimensions.
Contribution of dimensions to points: these are
multiple correlations, which reflect how well the
principal components model explains any given
point (category).
30Correspondence Analysis
The Overview Column Points table is similar to
the previous one, except for the column variable
(party rather than region) in the correspondence
table.
31Correspondence Analysis
The Confidence Row Points tables display the
standard deviations of the row scores (the values
used as coordinates to plot the correspondence
map) and are used to assess their precision.
32Correspondence Analysis
The Confidence Column Points tables display the
standard deviations of the column scores (the
values used as coordinates to plot the
correspondence map) and are used to assess their
precision.
33Correspondence Analysis
The plots of transformed categories for
dimensions display a plot of the transformation
of the row category values and of column category
values into scores in dimension, with one plot
per dimension. The x-axis has the category
values and the y-axis has the corresponding
dimension scores. Thus the category "Northeast"
in the Overview Row Points table above had a
score in dimension of -0.702, as shown on the
plot.
34Correspondence Analysis
Refer back to Overview Row Points, dimension 1.
Why join!
35Correspondence Analysis
Refer back to Overview Row Points dimension 2
36Correspondence Analysis
Refer back to Overview Column Points dimension 1
37Correspondence Analysis
Refer back to Overview Column Points dimension 2
38Correspondence Analysis
The uniplots for the row and column variables.
Note that the origin of the axes is slightly
different in the two plots.
39Correspondence Analysis
Refer back to Overview Row Points, dimensions 1
and 2.
40Correspondence Analysis
Refer back to Overview Column Points, dimensions
1 and 2.
41Correspondence Analysis
Finally, the biplot correspondence map is
obtained. Note that the axes now encompass the
most extreme values of both uniplots. While some
generalizations can be made about the association
of categories (South more conservative, West more
liberal), the researcher must keep firmly in mind
that correspondence is not association. That is,
the researcher should not allow the map's display
of inter-category distances to obscure the fact
that, for this example, the model only explains
5.7% of the variance in the correspondence table.
42Correspondence Analysis
Refer back to Overview Row Points, dimensions 1
and 2, and Overview Column Points, dimensions 1
and 2.
43Correspondence Analysis
Care must be taken when interpreting the previous
plot. It must be remembered that distances
between columns and rows are not defined.
44Correspondence Analysis
Input of a collated data matrix: an SPSS
procedure that will do this operation is ANACOR,
although since we are using data in table form,
it has to be run using command syntax.
45Correspondence Analysis
The data (b) editor is shown here.
It contains the collated data matrix. Note that
we have only the matrix of interest in this view.
46Correspondence Analysis
You must employ the syntax, either via File >
Open > Syntax
47Correspondence Analysis
With the prepared commands in an ASCII file:
ANACOR TABLE=ALL(5,4)
  /DIMENSION=2
  /NORMALIZATION=CANONICAL
  /VARIANCES=COLUMNS
  /PLOT=NDIM(1,2).
Note the keyword "ALL", since we are providing
the table.
Note "5" for the number of rows.
Note "4" for the number of columns.
48Correspondence Analysis
Or via File > New > Syntax
49Correspondence Analysis
With the commands input into the Syntax Editor
50Correspondence Analysis
The solution is, of course, unchanged.
51SPSS Tips
Now you should go and try for yourself. Each
week our cluster (5.05) is booked for 2 hours
after this session. This will enable you to come
and go as you please. Obviously, other
timetabled sessions for this module take
precedence.