Correlating Constructs: - PowerPoint PPT Presentation


1
Correlating Constructs: Why the Theoretical Bounds on a Correlation Coefficient are MUCH Larger than You Think
Niels G. Waller
Department of Psychology
University of Minnesota, September 29, 2006
2
ABSTRACT: Conventional formulas for assessing
the statistical significance and confidence
bounds for correlation coefficients are often
highly misleading in psychological research. The
equations that you learned in Stat 101 were
designed for manifest (observed) variables.
Since at least the time of Cronbach and Meehl
(1955), behavioral scientists have recognized the
importance of latent variables in theory
construction and testing. Statistical bounds on
correlations between manifest variables are
almost always narrower than the associated
theoretical bounds on latent correlations.
Moreover, the bounds on latent correlations do
not get smaller with increasing sample size.
Structural Equation Modeling cannot solve this
problem. In this talk I will describe new
methods for computing the theoretical bounds on
latent correlations and provide examples showing
why these bounds should be routinely computed in
research dealing with latent constructs.
3
Suppose that we have two tests (X, Y) and find their observed correlation, $r_{x,y}$.
What do we know?
4
If $(x, y) \sim \mathrm{MVN}$ (bivariate normal),
where $\rho$ is the population correlation (i.e., the parameter)
5
Fisher's r-to-Z transform
6
Where
7
As N gets big, 1/(N-3), the sampling variance of Fisher's Z, gets small.
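Here is a minimal sketch of the Stat 101 computation. It assumes bivariate normal data and a hypothetical observed correlation of .30. The interval is built on Fisher's Z with standard error 1/sqrt(N-3) and then transformed back to the r metric. As N grows, the interval for $\rho_{x,y}$ collapses around r. The function name `fisher_ci` is hypothetical.

```python
import math
from statistics import NormalDist


def fisher_ci(r: float, n: int, level: float = 0.95) -> tuple[float, float]:
    """Confidence interval for a population correlation via Fisher's r-to-Z.

    Assumes (x, y) are bivariate normal and n > 3.
    """
    z = math.atanh(r)                              # Fisher's Z = 0.5 * ln((1 + r) / (1 - r))
    se = 1.0 / math.sqrt(n - 3)                    # approximate standard error of Z
    crit = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    lo, hi = z - crit * se, z + crit * se
    return math.tanh(lo), math.tanh(hi)            # back-transform to the r metric


if __name__ == "__main__":
    # Hypothetical observed correlation r = .30: the interval shrinks as N grows.
    for n in (50, 500, 5000, 50000):
        lo, hi = fisher_ci(0.30, n)
        print(f"N = {n:>6}: [{lo:.3f}, {hi:.3f}]  width = {hi - lo:.3f}")
```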
8
(No Transcript)
9
(No Transcript)
10
Often, we are not really interested in $\rho_{x,y}$.
  • In many research contexts x and y are simply
    convenient stand-ins for their underlying
    constructs ($\xi$ and $\eta$).

11
In many cases we are not really interested in
$\rho_{x,y}$.
  • Suppose that x is an imperfect measure of latent
    construct $\xi$, and
  • y is an imperfect measure of latent construct $\eta$.

We do not assume that the Beck Depression
Inventory is an infallible measure of
depression. We do not assume that the WAIS is an
infallible measure of IQ.
12
Further suppose that

Rather than talk about x and y
13
Further suppose that

14
(No Transcript)
15
THE SAD TRUTH
When N = 1 gazillion, we do not even know whether the
correlation is positive!
16
To understand these results we must discuss an
often neglected property of the factor analysis
model. Namely,
For any given factor model, one can construct an
infinite set of factor scores that perfectly fit
the model.
17
In the Psychometrics literature, this issue is
known as the problem of
Factor Score Indeterminacy. PLEASE NOTE: This
issue is relevant whenever your models include
latent variables (whether or not you perform a
factor analysis).
18
For a given factor model, one can construct an
infinite set of factor scores that perfectly fit
that model.
In the remainder of my talk, I will:
  • Briefly discuss the history (Hx) and underlying
    mathematics (with lots of pictures) of factor
    score (fs) indeterminacy,
  • Present some new results and practical
    implications of fs indeterminacy for the applied
    researcher who wishes to determine the
    correlation between constructs.

19
Our story begins about 100 years ago . . .
Charles Spearman: The father of Factor Analysis
20
Charles Spearman
Born September 10, 1863; died September 17, 1945
2004 marked the centennial of the birth of
Exploratory Factor Analysis:
Spearman, C. (1904). "General intelligence," objectively determined and measured. American Journal of Psychology, 15, 201-293.
21
Spearman's 1-factor model
For a battery of achievement and aptitude tests,
the scores for a group of individuals can be
explained by their scores on a single, unobserved
common factor (which Spearman called g).
(Note: Spearman called this a 2-factor model.)
22
Spearman noticed a positive manifold: the tests x1, x2, x3, and x4 all correlated positively with one another.
[Figure: correlation matrix of x1, x2, x3, x4]
23
Spearman suggested that the observed variables correlated because they shared something in common.
More formally:
24
The Basic 1-Factor Model
  • A data set of N observations (people) on n
    variables (tests) can be arranged in a matrix X.
    Then, if the FA model holds,

$X = F\Lambda' + U\Psi$

where X holds the observed scores, $\Lambda$ the factor loadings (weights), F the common factor scores, and U the unique factor scores (weighted by $\Psi$).
25

The observed scores are presumed to be a weighted
linear function of more fundamental
(scientifically interesting) common factor
scores and residuals.
This looks like a multivariate linear regression model, except that the "predictors" (the common and unique factor scores) are unobserved.
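Here is a minimal simulation sketch of the one-factor model. It assumes standardized variables, hypothetical loadings (.8, .7, .6, .5), and normally distributed common and unique factor scores. Data generated this way show a positive manifold, and the correlations approach the products of the loadings, $r_{ij} = \lambda_i \lambda_j$.

```python
import numpy as np

# Hypothetical loadings for four tests on one common factor (g).
loadings = np.array([0.8, 0.7, 0.6, 0.5])
uniq_sd = np.sqrt(1.0 - loadings**2)        # unique-factor weights, so each test has unit variance

rng = np.random.default_rng(2006)
n_people = 200_000
g = rng.standard_normal(n_people)                     # common factor scores, shape (N,)
u = rng.standard_normal((n_people, loadings.size))    # unique factor scores, shape (N, n)

# X = g * lambda' + U * Psi: each test is a weighted sum of g and its own unique factor
X = np.outer(g, loadings) + u * uniq_sd

observed_r = np.corrcoef(X, rowvar=False)
implied_r = np.outer(loadings, loadings)     # r_ij = lambda_i * lambda_j off the diagonal
np.fill_diagonal(implied_r, 1.0)
print(np.round(observed_r, 2))
print(np.round(implied_r, 2))
```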
26
The Big Picture
27
To help us see the FA model (without looking at
numbers) we can look at vectors in space.
28
The pictorial view
From a geometrical view, a vector is a directed
line segment (or arrow) in space.
[Figure: the vector for Variable a]
29
An important property of the spatial
representation of vectors (i.e., variables) is
that correlations between variables can be
represented by the cosine of the angle between
their associated vectors.
[Figure: the vectors for Var a and Var b]
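Here is a quick numerical check of the cosine property. It assumes deviation (mean-centered) scores and simulated data. The Pearson correlation equals the cosine of the angle between the two centered score vectors. The helper `cosine` is a hypothetical name.

```python
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


rng = np.random.default_rng(0)
a = rng.standard_normal(100)
b = 0.6 * a + rng.standard_normal(100)   # hypothetical variable correlated with a

# Center each variable (deviation scores); then r equals the cosine of the angle.
a_c, b_c = a - a.mean(), b - b.mean()
r = np.corrcoef(a, b)[0, 1]
angle_deg = np.degrees(np.arccos(cosine(a_c, b_c)))
print(f"r = {r:.4f}, cos(angle) = {cosine(a_c, b_c):.4f}, angle = {angle_deg:.1f} degrees")
```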
30
Moving between the Algebraic and Geometric Views of Factor Analysis
31

Suppose an observed score matrix X contains data
for 3 variables: Variable 1, Variable 2, and
Variable 3.
32
Given the correlation matrix, $R_X$, we can
determine the inter-vector angles and plot the
vectors in space (hard to see in 4 or more
dimensions).
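Here is a small sketch of going from $R_X$ to a picture, using a hypothetical 3 × 3 correlation matrix. The angle between each pair of vectors is arccos($r_{ij}$). The rows of a Cholesky factor of $R_X$ give one set of coordinates for unit-length vectors with exactly those angles. Any rotation of these coordinates would serve equally well.

```python
import numpy as np

# Hypothetical correlation matrix R_X for Variable 1, Variable 2, Variable 3.
R_X = np.array([
    [1.00, 0.56, 0.48],
    [0.56, 1.00, 0.42],
    [0.48, 0.42, 1.00],
])

# Inter-vector angles in degrees: theta_ij = arccos(r_ij)
angles = np.degrees(np.arccos(np.clip(R_X, -1.0, 1.0)))

# One set of coordinates for unit-length vectors with these angles:
# the rows of the lower-triangular Cholesky factor L, where R_X = L @ L.T.
L = np.linalg.cholesky(R_X)
print(np.round(angles, 1))
print(np.round(L, 3))
```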
33
[Figure: Variable 1, Variable 2, Variable 3, and the General Factor]
Spearman's great insight was the following: if
the one-factor model holds, then scores in X
presumably correlate because they share common
variance (have high correlations, or small angles)
with a latent variable called g, the general
intelligence factor.
34
The purple vector represents scores on the common
(general) factor.
35
Each observed variable will have a correlation
with (make an angle with) the latent factor.
We can assemble these correlations into a
matrix, f.
36
f contains the correlations between the red
vectors (obs vars) and the purple vector (the
common factor).
This is g in Spearman's model.
37
f: factor weights, $r_{x,g}$
(From Spearman, 1904)
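Under the one-factor model, each correlation is the product of two weights, $r_{ij} = f_i f_j$. For that reason the weights can be calculated from the correlations alone. One way to see this assumes three tests that fit the model exactly and have positive weights, and uses the triad identity $f_1^2 = r_{12} r_{13} / r_{23}$. This is a textbook identity offered only as an illustration, not Spearman's 1904 computational procedure. The matrix and the name `triad_loadings` are hypothetical.

```python
import math

import numpy as np


def triad_loadings(R: np.ndarray) -> np.ndarray:
    """Recover one-factor weights from a 3x3 correlation matrix.

    Assumes the one-factor model holds exactly (r_ij = f_i * f_j) and that
    all weights are positive.
    """
    r12, r13, r23 = R[0, 1], R[0, 2], R[1, 2]
    f1 = math.sqrt(r12 * r13 / r23)
    f2 = math.sqrt(r12 * r23 / r13)
    f3 = math.sqrt(r13 * r23 / r12)
    return np.array([f1, f2, f3])


R = np.array([[1.00, 0.56, 0.48],
              [0.56, 1.00, 0.42],
              [0.48, 0.42, 1.00]])
print(triad_loadings(R))  # weights implied by the hypothetical matrix: 0.8, 0.7, 0.6
```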
38
Unfortunately, although we can calculate f
(the correlations, or weights), we cannot
calculate the factor scores uniquely.
39
Actually,
Cannot Calculate Uniquely
40
This is the problem of Factor Score Indeterminacy
For a given factor model, one can construct an
infinite set of factor scores that
perfectly fit the model.
Here's a picture that explains why:
41
X is observed; everything else must be estimated.
42
In the EFA model there are more unknowns than
equations!
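The counting argument can be made concrete. The sketch below uses hypothetical sizes. It counts one equation per observed score, and one unknown per common factor score, unique factor score, loading, and unique-factor weight. Even if the loadings were known, the N·m + N·n score unknowns would still exceed the N·n equations. The name `count_efa` is hypothetical.

```python
def count_efa(n_obs: int, n_vars: int, n_factors: int) -> tuple[int, int]:
    """Count scalar equations and unknowns in X = F Lambda' + U Psi.

    Equations: one per observed score (N * n).
    Unknowns: common factor scores (N * m), unique factor scores (N * n),
    loadings (n * m), and unique-factor weights (n).
    """
    equations = n_obs * n_vars
    unknowns = n_obs * n_factors + n_obs * n_vars + n_vars * n_factors + n_vars
    return equations, unknowns


eq, unk = count_efa(n_obs=1000, n_vars=5, n_factors=1)
print(eq, unk)  # prints 5000 6010
```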
43
Seeing the problem with the help of vectors
44
[Figure: the test space and the vector of common factor scores]
The observed variables (red vectors) define a
space called the test space. The vector of common
factor scores lies outside of the test space and
must be estimated (the yellow vector represents
the estimated factor scores).
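Here is a simulation sketch of this picture. It assumes hypothetical loadings and, unlike real data, access to the simulated true factor scores. Projecting the true scores onto the span of the observed variables shows how much of the factor lies inside the test space. It also shows how much lies outside it and therefore cannot be recovered from X.

```python
import numpy as np

f = np.array([0.8, 0.7, 0.6, 0.5])   # hypothetical loadings
rng = np.random.default_rng(1)
n = 100_000
xi = rng.standard_normal(n)                                   # simulated true common factor scores
X = np.outer(xi, f) + rng.standard_normal((n, f.size)) * np.sqrt(1.0 - f**2)
Xc = X - X.mean(axis=0)                                       # deviation scores

# Project the true factor-score vector onto the test space (span of the columns of Xc).
b, *_ = np.linalg.lstsq(Xc, xi - xi.mean(), rcond=None)
xi_hat = Xc @ b                                               # estimated scores (inside the test space)
residual = (xi - xi.mean()) - xi_hat                          # component orthogonal to the test space

share_inside = xi_hat.var() / xi.var()
print(f"share of factor variance inside test space: {share_inside:.3f}")
print(f"share outside (cannot be recovered from X): {1 - share_inside:.3f}")
```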
45
The BIG Problem
This much is known:
[Figure: Var a and Var b spanning the test space]
Where do we place the vector of true factor
scores?
We are faced with an infinite number of choices,
all of which fit the model perfectly (i.e.,
result in the same matrix of weights, f).
46
One solution for the factor scores
A second solution for the factor scores
47
There is an infinite number of mathematically
acceptable solutions for the factor scores!
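Here is a minimal sketch of why infinitely many solutions exist. It places the regression-method estimate at the center of the cone. That is a standard construction in the indeterminacy literature, but not necessarily the one used in this talk. Take any unit-variance vector s that is orthogonal to the test space, scale it by $\sqrt{1-\rho^2}$, and add it to the estimate. The result is a set of factor scores with exactly the same correlations, f, with the observed variables. The loadings and the random choice of s are hypothetical.

```python
import numpy as np

f = np.array([0.8, 0.7, 0.6, 0.5])             # hypothetical loadings
rng = np.random.default_rng(7)
n = 5_000
X = np.outer(rng.standard_normal(n), f) + rng.standard_normal((n, f.size)) * np.sqrt(1 - f**2)
Z = (X - X.mean(axis=0)) / X.std(axis=0)       # standardized scores (ddof = 0)
R = Z.T @ Z / n                                # correlation matrix of the tests

w = np.linalg.solve(R, f)
xi_hat = Z @ w                                 # regression estimate; cov(Z, xi_hat) = f
rho2 = f @ w                                   # its variance, f' R^{-1} f

# Any unit-variance vector orthogonal to the test space will do; here, a random one.
s = rng.standard_normal(n)
s -= s.mean()
s -= Z @ np.linalg.solve(R, Z.T @ s / n)       # remove the part of s predictable from Z
s /= s.std()

k = np.sqrt(1.0 - rho2)
xi_1 = xi_hat + k * s                          # one mathematically acceptable solution
xi_2 = xi_hat - k * s                          # a second one


def corr_with_tests(v: np.ndarray) -> np.ndarray:
    """Correlations between a score vector and each standardized test."""
    return Z.T @ (v - v.mean()) / (n * v.std())


print(np.round(corr_with_tests(xi_1), 4))     # reproduces f
print(np.round(corr_with_tests(xi_2), 4))     # reproduces f as well
print(round(np.corrcoef(xi_1, xi_2)[0, 1], 4), round(2 * rho2 - 1, 4))
```

The two solutions fit equally well, yet they correlate only $2\rho^2 - 1$ with each other.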
48
This point was first noted by the famed Harvard
mathematician Edwin B. Wilson almost 80 years ago
(1928), while reviewing The Abilities of Man
(by Charles Spearman).
49
Review of The Abilities of Man, their Nature and
Measurement, by C. Spearman.
Wilson's review appeared in the prestigious
journal Science.
50
History (Hx) of factor score (fs) indeterminacy in 5 seconds
The problem was later generalized to multiple factor analysis (MFA).
51
The Baskin Robbins picture of factor score
indeterminacy
Imagine a sugar cone and a thin chopstick.
(A gustatorial adaptation of work done by Stan
Mulaik.)
52
[Figure: the chopstick]
53
Place the chopstick upright in the center of the
sugar cone
54
[Figure: the chopstick represents the estimated factor scores]
Any chopstick (vector) lying along the side of the cone represents a
mathematically acceptable set of true
factor scores.
55
Mathematically, there exists an infinite number
of factors that have the EXACT pattern of
correlations with your observed variables.
56
EFA at 50 (midlife crisis)
Louis Guttman (1916 - 1987):
Guttman, L. (1955). The determinacy of factor score matrices with
implications for five other basic problems of
common-factor theory. British Journal of
Statistical Psychology, 8, 65-81.
57
Guttman began his article by discussing an
important equation that had been known for some
time.
The equation is easy to compute and should
ALWAYS be reported!
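This sketch assumes that the equation Guttman discussed is the squared multiple correlation of the factor on the observed scores, $\rho^2 = f' R^{-1} f$, the usual index of factor determinacy. On that assumption, the computation takes one line. The loadings are the ones reported for variables 1 - 5 in the ML FA example later in the talk. As a further assumption, R is the correlation matrix that a perfectly fitting one-factor model implies. The name `factor_determinacy` is hypothetical.

```python
import numpy as np


def factor_determinacy(f: np.ndarray, R: np.ndarray) -> float:
    """Squared multiple correlation of a factor on the observed tests: f' R^{-1} f."""
    return float(f @ np.linalg.solve(R, f))


f = np.array([0.7, 0.5, 0.3, 0.3, 0.3])          # loadings from the ML FA of variables 1 - 5
R = np.outer(f, f) + np.diag(1.0 - f**2)         # correlations implied by a perfectly fitting model
rho2 = factor_determinacy(f, R)
print(f"rho^2 = {rho2:.3f}, rho = {np.sqrt(rho2):.3f}")
```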
58
Related to half of the width of the ice cream
cone.
59
Guttman then showed that the minimum correlation between two sets of
factor scores (the width of the cone) is determined by the
squared correlation between true and estimated
factor scores.
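A standard trigonometric identity connects the cone to Guttman's result; it is offered here as a sketch. Let θ be the angle between the estimated factor scores (the chopstick in the center) and any acceptable set of true factor scores (a vector on the side of the cone). Then cos θ = ρ, the correlation between true and estimated scores. Two acceptable solutions on opposite sides of the cone are separated by 2θ, so their correlation is the minimum:

```latex
\cos\theta = \rho, \qquad
\cos(2\theta) = 2\cos^2\theta - 1 = 2\rho^2 - 1
```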
60
What does this all mean?
61
Table 1. The minimal Correlation (2ρ² - 1) always
attainable between Two Alternative Solutions for
the Same Factor (Common or Deviant), as a
Function of the Multiple Correlation (ρ) of that
Factor on the Observed Scores.
62
Table 1. The minimal Correlation (2ρ² - 1) always
attainable between Two Alternative Solutions for
the Same Factor (Common or Deviant), as a
Function of the Multiple Correlation (ρ) of that
Factor on the Observed Scores.
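Guttman's minimum correlation is $2\rho^2 - 1$, so a table of the same kind can be sketched in a few lines. The ρ values below are illustrative choices, not the rows of Guttman's published table, and `guttman_min_corr` is a hypothetical name. Once ρ drops to about .71, two equally valid solutions for the same factor can be uncorrelated.

```python
def guttman_min_corr(rho: float) -> float:
    """Minimum correlation between two equally valid solutions for the same factor.

    rho is the multiple correlation of the factor with the observed scores.
    """
    return 2.0 * rho**2 - 1.0


for rho in (1.0, 0.95, 0.90, 0.80, 0.70, 0.60, 0.50):
    print(f"rho = {rho:.2f}  ->  minimum correlation = {guttman_min_corr(rho):+.3f}")
```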
63
[Figure: the edge of the cone and the straw; your SAT z-score]
64
Practical Implications of Factor Score
Indeterminacy
65
Spearman envisioned a world in which factor
scores would be used to predict important
real-life variables:
Indeed, so many possibilities suggest
themselves that it is difficult to speak freely
without seeming extravagant . . . . It seems even
possible to anticipate the day when there will be
yearly official registration of the intellective
index, as we will call it, of every child
throughout the kingdom . . . . . The present
difficulties of picking out the abler children
for more advanced education, and the mentally
defective children for less advanced, would
vanish in the solution of the more general
problem of adapting education to all . . . .
Citizens, instead of choosing their career at
almost blind hazard, will undertake just the
professions really suited to their capacities.
One can even conceive the establishment of a
minimum index to qualify for parliamentary vote,
and above all for the right to have offspring
Hart and Spearman, 1912, pp. 78-79
66
Steiger pointed out that there were some flies
in the ointment:
Steiger, J. (1979). The relationship between
external variables and common factors.
Psychometrika, 44, 93-97.
67
(No Transcript)
68
Steiger showed that, for a given data set, there
is an infinite set of values for the correlation between a
common factor and an external variable, all lying
between a lower and an upper bound.
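Here is a sketch of bounds of this kind. It assumes that all variables are standardized, that the factor model fits exactly, and that y is an external variable whose correlations with the tests are $r_{xy}$. Require the partial correlation between the factor and y, controlling for the tests, to lie between -1 and 1. That gives bounds $c \pm \sqrt{(1-\rho^2)(1-R^2_{y \cdot X})}$, where $c = f' R^{-1} r_{xy}$. This is one reading of the result, so check it against Steiger (1979) before relying on it. The example values are hypothetical: four loadings of .5, and y correlating .5 with the factor and only through it. So is the name `steiger_bounds`.

```python
import numpy as np


def steiger_bounds(f, R, r_xy):
    """Bounds on corr(factor, y) from loadings f, test correlations R, and r_xy = corr(tests, y).

    c    = f' R^{-1} r_xy     (the part fixed by the test space)
    rho2 = f' R^{-1} f        (factor determinacy)
    R2   = r_xy' R^{-1} r_xy  (squared multiple correlation of y on the tests)
    bounds = c -/+ sqrt((1 - rho2) * (1 - R2))
    """
    f, r_xy = np.asarray(f, float), np.asarray(r_xy, float)
    c = f @ np.linalg.solve(R, r_xy)
    rho2 = f @ np.linalg.solve(R, f)
    R2 = r_xy @ np.linalg.solve(R, r_xy)
    half_width = np.sqrt((1.0 - rho2) * (1.0 - R2))
    return c - half_width, c + half_width


f = np.full(4, 0.5)                          # four tests, each loading .5
R = np.outer(f, f) + np.diag(1.0 - f**2)     # model-implied correlations among the tests
r_xy = 0.5 * f                               # hypothetical: y correlates .5 with the factor, only through it
lb, ub = steiger_bounds(f, R, r_xy)
print(f"lower = {lb:.3f}, upper = {ub:.3f}")
```

Even though y was built to correlate .5 with the factor, the data alone allow any value from about -.32 to .89. The sign of the correlation is therefore not determined.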
69
(No Transcript)
70
Steiger's formula for the lower and upper bounds
on the correlation between the factor and the external variable
Back to pictures . . .
71
Waller & Steiger (2005):
a new derivation of Steiger's formula for the
bounds on the correlation between true
factor scores, $\xi$, and an external variable, y.
72
(No Transcript)
73
(No Transcript)
74
Table 1. Lower and Upper Bounds on the
Correlation between a Factor and an External
Variable
75
Correlating Constructs: What are the Theoretical
Bounds on a Correlation between two latent
variables?
76
(No Transcript)
77
(No Transcript)
78
(No Transcript)
79
(No Transcript)
80
(No Transcript)
81
(No Transcript)
82
(No Transcript)
83
(No Transcript)
84
Table 2. Lower and Upper Bounds on the
Correlation between two Factors
85
Practical Implications
Some Examples
86
Table 3. An Interesting Example
87
ML FA of variables 1 - 5
Loadings:
    Factor1
1   0.7
2   0.5
3   0.3
4   0.3
5   0.3
                 Factor1
SS loadings        1.010
Proportion Var     0.202
Test of the hypothesis that 1 factor is sufficient.
The chi square statistic is 0 on 5 degrees of freedom.
The p-value is 1
88
ML FA of variables 6 - 10
Loadings:
    Factor1
1   0.70
2   0.35
3   0.30
4   0.30
5   0.30
                 Factor1
SS loadings        0.883
Proportion Var     0.177
Test of the hypothesis that 1 factor is sufficient.
The chi square statistic is 0 on 5 degrees of freedom.
The p-value is 1
89
ML FA of variables 1 - 10
Loadings:
     Factor1 Factor2
1    0.70    0.70
2    0.50
3    0.30
4    0.30
5    0.30
6    0.70    0.70
7            0.35
8            0.30
9            0.30
10           0.30
                 Factor1 Factor2
SS loadings        1.499   1.374
Proportion Var     0.150   0.137
Cumulative Var     0.150   0.287
Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 0 on 26 degrees of freedom.
The p-value is 1
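The printed summaries can be checked by hand, as the short sketch below does. The SS loadings are the column sums of squared loadings, and Proportion Var divides them by the number of variables. The ML chi-square degrees of freedom for m factors in p variables are ((p - m)² - (p + m))/2. The function names are hypothetical. The two-factor SS loadings are left out because the printed loadings are rounded.

```python
import numpy as np


def ss_loadings(loadings: np.ndarray) -> np.ndarray:
    """Sum of squared loadings for each factor (column)."""
    return (loadings**2).sum(axis=0)


def ml_fa_df(p: int, m: int) -> int:
    """Degrees of freedom for the ML chi-square test of m factors in p variables."""
    return ((p - m) ** 2 - (p + m)) // 2


one_a = np.array([[0.7], [0.5], [0.3], [0.3], [0.3]])        # variables 1 - 5
one_b = np.array([[0.70], [0.35], [0.30], [0.30], [0.30]])   # variables 6 - 10
print(ss_loadings(one_a), ss_loadings(one_a) / 5)   # about 1.010 and 0.202
print(ss_loadings(one_b), ss_loadings(one_b) / 5)   # about 0.883 and 0.177
print(ml_fa_df(5, 1), ml_fa_df(10, 2))              # 5 and 26
```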
90
(No Transcript)
91
Can we avoid these problems by avoiding estimated
factor scores?
92
NO!
The latent factors in SEM are also indeterminate.
Consider the following example:
93
The Model
[Path diagram: 1 latent factor; 4 observed x variables (x1, X2, X3, X4); 1 external variable, y]
94
The Model
95
The Data
96
Factor loadings, f
97
lb (lower bound) and ub (upper bound)
98
[Path diagram: the latent factor with loadings of .5 on x1, X2, X3, X4; external variable y]
99
Unidentified Model
[Path diagram: loadings of .5 on x1, X2, X3, X4; external variable y; a path marked "?"; unique factor u4]
100
Take Home Message
  • Theoretical Constructs imply Latent Variables
  • Correlations among manifest variables are often
    crude approximations of the latent correlations
  • Knowing the correlation between two manifest
    variables tells us very little about the
    correlation between two constructs.

101
Psychology will become a science to the extent
that it takes measurement seriously
Thank you
102
(No Transcript)
103
Parting Thoughts
If the misapplication of factor methods continues
at the present rate, we shall find general
disappointment with the results because they are
usually meaningless as far as psychological
research interpretation is concerned. --L. L.
Thurstone (1937, p. 73)
104
Thank You
105
Define $y_O$ as the part of y that can be predicted
from $X_O$ (the part of the test space that is the orthogonal complement of the
estimated factor scores).
106
If y includes measurement error then
107
(No Transcript)
108
(No Transcript)
109
(No Transcript)
110
(No Transcript)
111
(No Transcript)
112
The vectors in X define a space.
[Figure: vectors x1 and x2]
113
The vector of estimated factor scores is in the
space spanned by X
114
Our ability to predict y from X is a function of
$X_O$.
115
$X_O$ is the part of the space spanned by X that is orthogonal
to the estimated factor scores.
[Figure: $X_O$]
116
When we set the determinant to 0.00 and solve for
the one unknown, we can re-express Steiger's
equation as follows:
Lower bound
Upper bound
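This sketch assumes that the determinant in question is that of the full correlation matrix of the tests, the factor, and y, with the factor-y correlation t as the one unknown. On that assumption, the bounds can also be found numerically. The determinant is a quadratic in t, and its two roots are the points where the matrix stops being a valid correlation matrix. The helper name and the example values (four loadings of .5, and y correlating .5 with the factor only through it) are hypothetical.

```python
import numpy as np


def bounds_by_determinant(f, R, r_xy):
    """Find the values t = corr(factor, y) at which the augmented correlation
    matrix [[R, f, r_xy], [f', 1, t], [r_xy', t, 1]] has determinant 0.

    The determinant is a quadratic in t, so three evaluations determine it exactly.
    """
    f, r_xy = np.asarray(f, float), np.asarray(r_xy, float)
    p = f.size

    def det_at(t: float) -> float:
        M = np.eye(p + 2)
        M[:p, :p] = R
        M[:p, p] = M[p, :p] = f              # correlations of the tests with the factor
        M[:p, p + 1] = M[p + 1, :p] = r_xy   # correlations of the tests with y
        M[p, p + 1] = M[p + 1, p] = t        # the one unknown
        return np.linalg.det(M)

    ts = np.array([-1.0, 0.0, 1.0])
    coefs = np.polyfit(ts, [det_at(t) for t in ts], deg=2)   # exact fit of the quadratic
    lower, upper = np.sort(np.roots(coefs).real)
    return lower, upper


f = np.full(4, 0.5)
R = np.outer(f, f) + np.diag(1.0 - f**2)
r_xy = 0.5 * f
lb, ub = bounds_by_determinant(f, R, r_xy)
print(f"lower = {lb:.3f}, upper = {ub:.3f}")
```

Between the two roots the determinant is positive. These roots agree with the closed form $f'R^{-1}r_{xy} \mp \sqrt{(1-f'R^{-1}f)(1-r_{xy}'R^{-1}r_{xy})}$, which follows from writing the determinant as $\det(R)$ times the determinant of the 2 × 2 Schur complement.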