Title: Correlating Constructs: Why the Theoretical Bounds on a Correlation Coefficient are MUCH Larger than You Think

Slide 1: Correlating Constructs: Why the Theoretical Bounds on a Correlation Coefficient are MUCH Larger than You Think
Niels G. Waller
Department of Psychology, University of Minnesota
September 29, 2006
Slide 2: ABSTRACT
Conventional formulas for assessing the statistical significance and confidence bounds of correlation coefficients are often highly misleading in psychological research. The equations that you learned in Stat 101 were designed for manifest (observed) variables. Since at least the time of Cronbach and Meehl (1955), behavioral scientists have recognized the importance of latent variables in theory construction and testing. Statistical bounds on correlations between manifest variables are almost always smaller than the associated theoretical bounds on latent correlations. Moreover, the bounds on latent correlations do not get smaller with increasing sample size. Structural equation modeling cannot solve this problem. In this talk I will describe new methods for computing the theoretical bounds on latent correlations and provide examples showing why these bounds should be routinely computed in research dealing with latent constructs.
Slide 3: Suppose that we have two tests (X, Y) and find that they correlate at some observed value r. What do we know?
Slide 4: If (x, y) is bivariate normal, the sampling distribution of r is governed by ρ, the population correlation (i.e., the parameter).
Slide 5: Fisher's r-to-Z transform: Z = (1/2) ln[(1 + r) / (1 − r)].
Slide 6: where Z is approximately normal with sampling variance 1/(N − 3).
Slide 7: As N gets big, 1/(N − 3) gets small, so the confidence interval around r shrinks.
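To make slides 5 through 7 concrete, here is a minimal sketch (mine, not from the talk) of the Fisher-z confidence interval for a manifest correlation; the values r = .30 and the sample sizes in the loop are made up for illustration:

    import math

    def fisher_ci(r, n, z_crit=1.96):
        """Approximate 95% confidence interval for a population correlation,
        using Fisher's r-to-Z transform (assumes bivariate normality)."""
        z = 0.5 * math.log((1 + r) / (1 - r))    # Fisher's Z
        se = 1.0 / math.sqrt(n - 3)              # SE of Z is roughly 1/sqrt(N - 3)
        lo, hi = z - z_crit * se, z + z_crit * se
        # Back-transform from the Z scale to the r scale (tanh inverts the transform)
        return math.tanh(lo), math.tanh(hi)

    # The interval shrinks as N grows, because 1/(N - 3) shrinks
    for n in (50, 200, 1000):
        print(n, fisher_ci(0.30, n))

This is the manifest-variable behavior the talk contrasts with latent correlations, whose bounds do not shrink with N.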
Slides 8-9: [figures only; no transcript]
Slide 10: Often, we are not really interested in ρ_xy.
- In many research contexts x and y are simply convenient stand-ins for their underlying constructs (call them ξ and η).
Slide 11: In many cases we are not really interested in ρ_xy.
- Suppose that x is an imperfect measure of latent construct ξ, and
- y is an imperfect measure of latent construct η.
We do not assume that the Beck Depression Inventory is an infallible measure of depression. We do not assume that the WAIS is an infallible measure of IQ.
Slide 12: Rather than talk about x and y, further suppose that each observed variable is a fallible indicator of its construct. [Measurement equations shown on slide, not transcribed]
Slide 13: Further suppose that [equations shown on slide, not transcribed].
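The equations on slides 12 and 13 were not transcribed, but the classical-test-theory attenuation relation they presumably build on is standard and worth stating (my rendering, not copied from the slides):

    % If x and y measure the constructs xi and eta with reliabilities
    % rho_xx and rho_yy, classical test theory gives the attenuation relation
    \[
      \rho_{xy} \;=\; \rho_{\xi\eta}\,\sqrt{\rho_{xx}\,\rho_{yy}},
    \]
    % so the manifest correlation rho_xy understates the latent correlation
    % rho_{xi eta} whenever the measures are fallible.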
Slide 14: [no transcript]
Slide 15: THE SAD TRUTH: even when N = 1 gazillion, we do not know whether the correlation between the constructs is positive!
Slide 16: To understand these results we must discuss an often neglected property of the factor analysis model. Namely: for any given factor model, one can construct an infinite set of factor scores that perfectly fit the model.
Slide 17: In the psychometrics literature, this issue is known as the problem of Factor Score Indeterminacy. PLEASE NOTE: this issue is relevant whenever your models include latent variables (whether or not you perform a factor analysis).
Slide 18: For a given factor model, one can construct an infinite set of factor scores that perfectly fit that model.
In the remainder of my talk I will:
- Briefly discuss the history (Hx) and underlying mathematics (with lots of pictures) of factor score indeterminacy,
- Present some new results and practical implications of factor score indeterminacy for the applied researcher who wishes to determine the correlation between constructs.
Slide 19: Our story begins about 100 years ago with Charles Spearman, the father of factor analysis.
Slide 20: Charles Spearman (born September 10, 1863; died September 17, 1945). 2004 marked the centennial of the birth of exploratory factor analysis: Spearman, C. (1904). "General Intelligence," Objectively Determined and Measured. American Journal of Psychology, 15, 201-293.
Slide 21: Spearman's 1-factor model: for a battery of achievement and aptitude tests, the scores for a group of individuals can be explained by their scores on a single, unobserved common factor (which Spearman called g). (Note: Spearman called this a "two-factor" model.)
Slide 22: Spearman noticed a positive manifold. [Figure: correlation matrix among x1-x4, with all correlations positive]
Slide 23: Spearman suggested that the observed variables correlated because they shared something in common. More formally . . .
Slide 24: The Basic 1-Factor Model
- A data set of N observations (people) on n variables (tests) can be arranged in a matrix X. Then, if the FA model holds, X = FΛ' + U, where X holds the observed scores, Λ the factor loadings (weights), F the common factor scores, and U the unique factor scores.
Slide 25: The observed scores are presumed to be a weighted linear function of more fundamental (scientifically interesting) common factor scores plus unique (residual) scores. It looks like a multivariate linear regression model.
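A small simulation (mine, not the speaker's) makes the regression analogy on slide 25 concrete: observed scores are generated as X = FΛ' + U, with loadings chosen arbitrarily for illustration.

    import numpy as np

    rng = np.random.default_rng(1)
    N, n = 10_000, 4                          # people, tests
    lam = np.array([0.8, 0.7, 0.6, 0.5])      # illustrative factor loadings
    psi = np.sqrt(1.0 - lam**2)               # unique-score standard deviations

    F = rng.standard_normal((N, 1))           # common factor scores
    U = rng.standard_normal((N, n)) * psi     # unique factor scores
    X = F @ lam.reshape(1, n) + U             # observed scores = F Lambda' + U

    # The sample correlations approach the model-implied values lam_i * lam_j
    print(np.round(np.corrcoef(X, rowvar=False), 2))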
Slide 26: The Big Picture [figure]
Slide 27: To help us see the FA model (without looking at numbers) we can look at vectors in space.
Slide 28: The pictorial view. From a geometric point of view, a vector is a directed line segment (an arrow) in space. [Figure: a vector labeled "Variable a"]
Slide 29: An important property of the spatial representation of vectors (i.e., variables) is that correlations between variables can be represented by the cosine of the angle between their associated vectors. For example, two vectors separated by 60 degrees represent variables that correlate cos(60°) = .50. [Figure: vectors for Var a and Var b]
Slide 30: Moving between the algebraic and geometric views of factor analysis.
Slide 31: Suppose an observed score matrix X contains data for three variables: Variable 1, Variable 2, and Variable 3.
Slide 32: Given the correlation matrix, R_X, we can determine the inter-vector angles and plot the vectors in space (hard to see in 4 or more dimensions).
Slide 33: [Figure: vectors for Variables 1-3 and the General Factor] Spearman's great insight was the following: if the one-factor model holds, then the scores in X presumably correlate because they share common variance with (i.e., have high correlations with, or small angles to) a latent variable called g, the general intelligence factor.
Slide 34: The purple vector represents scores on the common (general) factor. [Figure]
Slide 35: Each observed variable will have a correlation with (make an angle with) the latent factor. We can assemble these correlations into a matrix, f. [Figure]
Slide 36: f contains the correlations between the red vectors (the observed variables) and the purple vector (the common factor), which is g in Spearman's model.
Slide 37: f = the factor weights, r_{x,g}, from Spearman (1904). [Table of loadings not transcribed]
Slide 38: Unfortunately, we can calculate f (the correlations, or weights), but we cannot calculate the factor scores uniquely.
Slide 39: Actually, the factor scores cannot be calculated uniquely at all.
Slide 40: This is the problem of Factor Score Indeterminacy: for a given factor model, one can construct an infinite set of factor scores that perfectly fit the model. Here's a picture that explains why.
Slide 41: In the model X = FΛ' + U, only X is observed; everything else (the loadings, the common factor scores, and the unique factor scores) must be estimated.
Slide 42: Because only X is observed while the loadings and the common and unique factor scores must all be estimated, the EFA model has more unknowns than equations!
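To spell out the counting behind slide 42, here is the per-person form of the one-factor model in standard notation (the symbols are mine, not the slide's):

    % One-factor model for person i on tests j = 1, ..., n
    \[
      x_{ij} \;=\; \lambda_j\,\theta_i \;+\; u_{ij}, \qquad j = 1,\dots,n .
    \]
    % Even if the n loadings lambda_j were known exactly, each person
    % supplies n equations but n + 1 unknowns (the common factor score
    % theta_i plus the n unique scores u_i1, ..., u_in), so the factor
    % scores cannot be solved for uniquely.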
Slide 43: Seeing the problem with the help of vectors.
Slide 44: The observed variables (red vectors) define a space called the test space. The common factor (the vector of common factor scores) lies outside of the test space and must be estimated; the yellow vector represents the estimated factor scores. [Figure]
Slide 45: The BIG Problem. This much is known: the test space spanned by Var a and Var b. But where do we place the vector of true factor scores? We are faced with an infinite number of choices, all of which fit the model perfectly (i.e., result in the same matrix of weights, f). [Figure]
Slide 46: [Figure: one solution for the true factor scores, and a second solution for the true factor scores]
Slide 47: There are infinitely many mathematically acceptable solutions for the factor scores!
Slide 48: This point was first noted by the famed Harvard mathematician Edwin B. Wilson almost 80 years ago (1928) while reviewing The Abilities of Man (by Charles Spearman).
Slide 49: "Review of The Abilities of Man, their Nature and Measurement. By C. Spearman." Wilson's review appeared in the prestigious journal Science.
Slide 50: The history of factor score indeterminacy in 5 seconds: the problem was later generalized to multiple factor analysis (MFA).
Slide 51: The Baskin-Robbins picture of factor score indeterminacy. Imagine a sugar cone and a thin chopstick. (A gustatorial adaptation of work done by Stan Mulaik.)
Slide 52: [Figure: a chopstick]
Slide 53: Place the chopstick upright in the center of the sugar cone.
Slide 54: [Figure: the chopstick represents the estimated factor scores] Any chopstick (vector) on the side of the cone equals a mathematically acceptable set of true factor scores.
Slide 55: Mathematically, there exist an infinite number of factors that have the EXACT same pattern of correlations with your observed variables.
Slide 56: EFA at 50 (midlife crisis). Guttman, L. (1916-1987). The determinacy of factor score matrices with implications for five other basic problems of common-factor theory. British Journal of Statistical Psychology, 8, 65-81, 1955.
Slide 57: Guttman began his article by discussing an important equation that had been known for some time: the squared multiple correlation of the factor on the observed scores, ρ² = λ'R⁻¹λ. The equation is easy to compute and should ALWAYS be reported!
Slide 58: This quantity is related to half of the width of the ice cream cone. [Figure]
Slide 59: Guttman then showed that the minimum correlation between two sets of equally acceptable factor scores (the width of the cone) is a simple function of the squared correlation between the true and estimated factor scores: ρ*_min = 2ρ² − 1.
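A short sketch (mine) of the quantities on slides 57-59: for a one-factor model with loadings λ and implied correlation matrix R, the squared multiple correlation of the factor on the observed scores is ρ² = λ'R⁻¹λ, and the minimum correlation between two equally acceptable sets of factor scores is 2ρ² − 1. The second set of loadings below is taken from the example on slide 87; the first set is an invented high-loading comparison.

    import numpy as np

    def determinacy(lam):
        """Return (rho^2, 2*rho^2 - 1) for a 1-factor model with loadings lam:
        the factor's squared multiple correlation on the observed scores and
        Guttman's minimum correlation between alternative factor-score sets."""
        lam = np.asarray(lam, dtype=float)
        R = np.outer(lam, lam)
        np.fill_diagonal(R, 1.0)                 # model-implied correlation matrix
        rho2 = lam @ np.linalg.solve(R, lam)     # lam' R^{-1} lam
        return rho2, 2.0 * rho2 - 1.0

    for lam in ([0.9, 0.9, 0.9, 0.9],            # strong loadings (invented)
                [0.7, 0.5, 0.3, 0.3, 0.3]):      # the loadings from slide 87
        rho2, r_min = determinacy(lam)
        print(lam, "rho^2 = %.3f" % rho2, "min corr = %.3f" % r_min)

With the weak loadings from slide 87, the minimum correlation between rival factor-score sets is far below 1, even though the one-factor model fits perfectly.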
Slide 60: What does this all mean?
Slides 61-62: Table 1. The minimal correlation (ρ*) always attainable between two alternative solutions for the same factor (common or deviant), as a function of the multiple correlation (ρ) of that factor on the observed scores. [Table values not transcribed; by the result on slide 59 they follow ρ* = 2ρ² − 1]
Slide 63: [Figure: the edge of the cone and a straw, labeled "Your SAT z-score"]
Slide 64: Practical Implications of Factor Score Indeterminacy
Slide 65: Spearman envisioned a world in which factor scores would be used to predict important real-life variables:
"Indeed, so many possibilities suggest themselves that it is difficult to speak freely without seeming extravagant . . . . It seems even possible to anticipate the day when there will be yearly official registration of the intellective index, as we will call it, of every child throughout the kingdom . . . . The present difficulties of picking out the abler children for more advanced education, and the mentally defective children for less advanced, would vanish in the solution of the more general problem of adapting education to all . . . . Citizens, instead of choosing their career at almost blind hazard, will undertake just the professions really suited to their capacities. One can even conceive the establishment of a minimum index to qualify for parliamentary vote, and above all for the right to have offspring." (Hart & Spearman, 1912, pp. 78-79)
Slide 66: Steiger pointed out that there were some flies in the ointment: Steiger, J. (1979). The relationship between external variables and common factors. Psychometrika, 44, 93-97.
Slide 67: [no transcript]
Slide 68: Steiger showed that for a given data set, there is an infinite set of values for the correlation between a factor and an external variable, all of which lie between computable lower and upper bounds.
Slide 69: [no transcript]
Slide 70: Steiger's formula for the lower and upper bounds on that correlation. [Equation not transcribed] Back to pictures . . .
Slide 71: Waller & Steiger (2005): a new derivation of Steiger's formula for the bounds on the correlation between the true factor scores and an external variable, y.
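Neither Steiger's formula nor the Waller-Steiger derivation is transcribed on these slides, so the following sketch (mine) implements bounds of the general form suggested by the appendix slides: with factor determinacy ρ² = λ'R⁻¹λ, the correlation between the factor and an external variable y can lie anywhere in r_xy'R⁻¹λ ± sqrt((1 − ρ²)(1 − r_xy'R⁻¹r_xy)), where r_xy holds y's correlations with the observed tests. Treat it as an illustration of the idea, not a verbatim statement of Steiger's (1979) equation. The numbers used (loadings of .5, taken from slide 98, and an assumed r(x_j, y) = .25) are illustrative.

    import numpy as np

    def factor_external_bounds(lam, r_xy):
        """Sketch of Steiger-type bounds on corr(factor, y) for a 1-factor model
        with loadings lam, given the correlations r_xy between y and the tests."""
        lam = np.asarray(lam, dtype=float)
        r_xy = np.asarray(r_xy, dtype=float)
        R = np.outer(lam, lam)
        np.fill_diagonal(R, 1.0)                    # implied correlations among the x's
        R_inv_lam = np.linalg.solve(R, lam)
        rho2 = lam @ R_inv_lam                      # factor determinacy rho^2
        center = r_xy @ R_inv_lam                   # the determinate part of corr(y, factor)
        slack = np.sqrt((1.0 - rho2) * (1.0 - r_xy @ np.linalg.solve(R, r_xy)))
        return center - slack, center + slack

    # Four indicators with loadings .5 (slide 98) and assumed r(x_j, y) = .25
    lb, ub = factor_external_bounds([0.5] * 4, [0.25] * 4)
    print("lower = %.2f, upper = %.2f" % (lb, ub))   # roughly -0.32 to 0.89

Even with a perfectly fitting model and an enormous N, the sign of the latent correlation is not determined in this example, which is the "sad truth" of slide 15.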
Slides 72-73: [no transcript]
Slide 74: Table 1. Lower and Upper Bounds on the Correlation between a Factor and an External Variable. [Table values not transcribed]
Slide 75: Correlating Constructs: What are the theoretical bounds on a correlation between two latent variables?
Slides 76-83: [no transcript]
Slide 84: Table 2. Lower and Upper Bounds on the Correlation between Two Factors. [Table values not transcribed]
Slide 85: Practical Implications: Some Examples
Slide 86: Table 3. An Interesting Example. [Table values not transcribed]
Slide 87: ML FA of variables 1-5

Loadings (Factor1): 0.7, 0.5, 0.3, 0.3, 0.3
Factor1 SS loadings = 1.010; Proportion Var = 0.202
Test of the hypothesis that 1 factor is sufficient: the chi-square statistic is 0 on 5 degrees of freedom; the p-value is 1.
Slide 88: ML FA of variables 6-10

Loadings (Factor1): 0.70, 0.35, 0.30, 0.30, 0.30
Factor1 SS loadings = 0.883; Proportion Var = 0.177
Test of the hypothesis that 1 factor is sufficient: the chi-square statistic is 0 on 5 degrees of freedom; the p-value is 1.
Slide 89: ML FA of variables 1-10

Loadings:        Factor1   Factor2
  Variable 1:      0.70      0.70
  Variable 2:      0.50
  Variable 3:      0.30
  Variable 4:      0.30
  Variable 5:      0.30
  Variable 6:      0.70      0.70
  Variable 7:                0.35
  Variable 8:                0.30
  Variable 9:                0.30
  Variable 10:               0.30

SS loadings: Factor1 = 1.499, Factor2 = 1.374
Proportion Var: 0.150, 0.137; Cumulative Var: 0.150, 0.287
Test of the hypothesis that 2 factors are sufficient: the chi-square statistic is 0 on 26 degrees of freedom; the p-value is 1.
Slide 90: [no transcript]
Slide 91: Can we avoid these problems by avoiding estimated factor scores?
Slide 92: NO! The latent factors in SEM are also indeterminate. Consider the following example.
Slides 93-94: The Model. [Path diagram: one latent factor with four observed indicators (x1-x4) and one external variable, y]
Slide 95: The Data. [Correlation matrix not transcribed]
Slide 96: Factor loadings, f. [Values not transcribed]
Slide 97: Lower bound (lb) and upper bound (ub) on the factor-y correlation. [Values not transcribed]
Slide 98: [Path diagram: the latent factor loads .5 on each of x1-x4; y is the external variable]
Slide 99: Unidentified Model. [Path diagram: the same model with loadings of .5 on x1-x4, the external variable y, a parameter marked "?", and the unique factor u4; the model is not identified]
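Here is a sketch (mine, with made-up numbers, since the slide-95 data are not transcribed) of why a model like the one on slide 99 is not identified: if y is allowed to covary with the unique factors (the path involving u4), entirely different parameter values reproduce exactly the same covariances among the observed variables.

    import numpy as np

    lam = np.full(4, 0.5)        # loadings from slide 98
    psi = 1.0 - lam**2           # unique variances

    def implied_cov(phi, gamma):
        """Implied covariance matrix of (x1..x4, y), all standardized,
        when corr(y, factor) = phi and cov(y, u_j) = gamma_j."""
        S = np.empty((5, 5))
        S[:4, :4] = np.outer(lam, lam) + np.diag(psi)   # covariances among the x's
        S[:4, 4] = S[4, :4] = lam * phi + gamma          # cov(x_j, y)
        S[4, 4] = 1.0
        return S

    # Solution A: y relates to the x's only through the common factor
    A = implied_cov(phi=0.5, gamma=np.zeros(4))
    # Solution B: y relates to the x's only through the unique factors
    B = implied_cov(phi=0.0, gamma=np.full(4, 0.25))
    print(np.allclose(A, B))     # True: the observed data cannot tell them apart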
Slide 100: Take-Home Message
- Theoretical constructs imply latent variables.
- Correlations among manifest variables are often crude approximations of the latent correlations.
- Knowing the correlation between two manifest variables tells us very little about the correlation between two constructs.
Slide 101: Psychology will become a science to the extent that it takes measurement seriously. Thank you.
Slide 102: [no transcript]
Slide 103: Parting Thoughts
"If the misapplication of factor methods continues at the present rate, we shall find general disappointment with the results because they are usually meaningless as far as psychological research interpretation is concerned." --L. L. Thurstone (1937, p. 73)
Slide 104: Thank You
Slide 105: Define y_O as that part of y that can be predicted from X_O (the orthogonal complement of the estimated factor scores).
Slide 106: If y includes measurement error, then [equation not transcribed].
Slides 107-111: [no transcript]
Slide 112: The vectors in X will define a space. [Figure: vectors x1 and x2]
Slide 113: The vector of estimated factor scores lies in the space spanned by X.
Slide 114: Our ability to predict y from X is a function of X_O.
Slide 115: X_O is the subspace spanned by X that is orthogonal to the estimated factor scores. [Figure]
Slide 116: When we set the determinant to 0.00 and solve for the one unknown, we can re-express Steiger's equation as explicit lower and upper bounds. [Equations not transcribed]
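The re-expressed bounds themselves are not transcribed. For completeness, here is one standard way to write bounds of this type in the notation used above; it follows from setting the determinant of the augmented correlation matrix to zero, but it should be read as my reconstruction rather than a quotation of the Waller-Steiger slide:

    % rho^2 = lambda' R^{-1} lambda          (factor determinacy)
    % R^2_{y.x} = r_{xy}' R^{-1} r_{xy}      (y's squared multiple correlation on the tests)
    \[
      r_{xy}'R^{-1}\lambda \;-\; \sqrt{\bigl(1-\rho^{2}\bigr)\bigl(1-R^{2}_{y\cdot x}\bigr)}
      \;\;\le\;\; \rho_{y\theta} \;\;\le\;\;
      r_{xy}'R^{-1}\lambda \;+\; \sqrt{\bigl(1-\rho^{2}\bigr)\bigl(1-R^{2}_{y\cdot x}\bigr)} .
    \]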