Title: G89.2247 Structural Equation Models
1G89.2247 Structural Equation Models
- Overview of course
- Setting time for lab session
- Statistical thinking about causality
- SEM as "Causal Models"
- Matrix Algebra Tools
2Goals of Course
- Introduce you to basic concepts and techniques of
structural equation models
- Help you develop a critical perspective regarding
what is and is not learned using SEM
- Provide skills for your continued self-education
- Provide a context for you to apply SEM methods to
a sophisticated problem of your own choosing
3What are Structural Equation Models?
- Systems of linear equations that describe a
network of relations among variables.
- Structural, not simply predictive relations
- Implied systems of nonlinear equations that
describe patterns of variances and covariances
among variables.
- Output of software systems such as LISREL, EQS,
AMOS, and Mplus.
4Why are SEM methods useful?
- Hoyle's (1994) review tells us that SEM can
address
- Questions about causal process
- Basic questions of measurement
- Questions about causal process when variables are
not well measured
- SEM methods share most of the strengths of OLS
multiple regression
- SEM models can be used to impress your family,
friends and colleagues, if not reviewers and
editors
5An example of a structural equation model
Stress -> Distress
- Extreme stress is known to lead to psychological
breakdown (battle fatigue, PTSD)
- Severe stress is believed to cause depression,
anxiety disorder, psychosis
[Path diagram with nodes Stress, Distress1, and Distress2]
6Stress causes distress and psychopathology
- To what extent is this common belief true?
- How much stress is needed to cause distress?
- For a unit change in stress, how much do we
expect distress to increase?
- How do we account for the many persons who
experience stress who manage to function without
psychopathology?
- Is the purported causal process universal, or
does it operate only in a subset of the
population?
7Causal Inference Issues
- Causal inference is often elusive in the social and
behavioral sciences
- Prototypes of causal effects seem to implicate
primary (single) causes.
- billiard balls
- bacteria or viruses
- In reality, effects usually have multiple causes
- For distress
- Stressors
- Personal dispositions
- Familial factors
- Social environment
- Biological environment
8Causal Inference, continued
- Effects of causes are not always constant
- social buffers
- developmental stages
- immune system interventions
- synergistic causal effects
- stochastic variation in causal factor strength
- stochastic measurement factors
9David Hume's framework for Causality
- If E is said to be the effect of C, then
- 1) C and E must have temporal and spatial
contiguity (ASSOCIATION)
- 2) C must precede E temporally (DIRECTION)
- 3) There must be CONSTANT CONJUNCTION: if C,
then E for all situations
10Although still influential, Hume's analysis is
known to have limitations.
- Analysis of any cause C must be isolated from
competing causes (ISOLATION)
- Constant conjunction is too restrictive:
stochastic processes affect causal relations, and
mechanisms may vary across situations.
- Causal relations may be expressed in terms of
expectations over stochastic variation
11Formal causal analyses have led to important
advances
- Robert Koch, the Nobel Prize winning
bacteriologist, investigated bacteria as causes
of disease using three principles
- The organism must be found in all cases of the
disease in question. (association)
- The organism must be isolated and grown in pure
culture (isolation)
- When inoculated with the isolated organism,
susceptible subjects must reproduce the disease
(direction and hedged constant conjunction)
12Causal Process in Time
- In the behavioral, social, and biological
sciences, the units of observation cannot be
trusted to stay the same over time.
- For example, in Koch's inoculation test, how do
we know that the subject had not been infected by
chance?
- For studies of distress, we expect both stress
and distress to change over time.
13Statisticians developed the randomized experiment
to address causal issues
- Randomly assign subjects to one of two
conditions, Treatment (T) or Control (C)
- Administer treatment and control procedures
- Measure outcome variable Y (assumed to reflect
the process of interest) blind to treatment group
- Infer effect of treatment from difference in
group means
14Holland's formal analysis of randomized
experiments
- Suppose Y(u) is a measurement on subject u that
reflects the process that is supposed to be
affected by treatment, T.
- If subject u is given treatment T, then $Y_T(u)$ is
observed.
- If subject u is given a control treatment, C,
then $Y_C(u)$ is observed.
- We would like to compare $Y_T(u)$ with $Y_C(u)$, but
only one of these can be available, as u is either
in T or C.
- Let the desired comparison be called $D = Y_T(u) - Y_C(u)$.
- Holland calls this the effect of cause T.
- Although D cannot be observed, its average can
be estimated by computing the difference between the
treatment and control group means, $\bar{Y}_T - \bar{Y}_C$.
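Holland's argument can be illustrated with a small simulation. The sketch below is only an illustration under assumed values: the sample size, the outcome distribution, and the average effect of 2.0 are all hypothetical. It generates both potential outcomes for every subject, reveals only one of them after random assignment, and compares the unobservable average of D with the observable difference in group means.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is reproducible

n = 10_000
true_effect = 2.0  # hypothetical average effect of T, chosen for illustration

# Each subject u has two potential outcomes, Y_C(u) and Y_T(u).
y_c = [random.gauss(10.0, 3.0) for _ in range(n)]
# Subject-level effects D(u) vary around the average effect.
d = [random.gauss(true_effect, 1.0) for _ in range(n)]
y_t = [yc + du for yc, du in zip(y_c, d)]

# Randomly assign each subject to treatment (True) or control (False).
treated = [random.random() < 0.5 for _ in range(n)]

# Only one potential outcome is observed per subject.
obs_t = [yt for yt, t in zip(y_t, treated) if t]
obs_c = [yc for yc, t in zip(y_c, treated) if not t]

avg_d = mean(d)                       # unobservable in a real study
estimate = mean(obs_t) - mean(obs_c)  # difference in group means

print(f"average D (unobservable):  {avg_d:.3f}")
print(f"difference in group means: {estimate:.3f}")
```

The two printed values should be close, which is the sense in which between-subject information stands in for the within-subject comparison that can never be observed.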
15Between-subject is substituted for within-subject
information.
- Within-subject analyses are intuitively
appealing, but require strong assumptions about
constancy over time.
- When $\bar{D} \neq 0$ (the average of D is nonzero), then
ASSOCIATION is established.
- Randomization prior to treatment deals with the
causal issue of DIRECTION.
- It also partially supports ISOLATION (double
blind trials, manipulation checks help address
other aspects of isolation).
- Randomization does not establish CONSTANT
CONJUNCTION. The effect is only established for
the specific experimental conditions used in the
study.
16Key Feature: Treatment is applied to subjects
sampled into group T
- Holland argues that this manipulation is critical
to guarantee DIRECTION and ISOLATION.
- Holland and Rubin go on to assert that clear
causal inference is only possible if manipulation
is at least conceivable. They propose the motto,
17NO CAUSATION WITHOUT MANIPULATION
- This motto is not popular with sociologists and
economists. It explicitly denies causal status
to personal attributes, such as race, sex, age,
nationality, and family history.
- Instead, it encourages the investigation of
processes such as discrimination, physical
changes corresponding to age, government policy,
and biochemical consequences of genetic makeup.
18NO CAUSATION WITHOUT MANIPULATION
- To illustrate, Holland would not say that my
height causes me to hit my head going into my
suburban cellar, as my height cannot be
manipulated.
- My failure to duck, and the dangerous obstruction,
could be shown to be causally related to my
bumped head.
19Structural Equation Models
- Researchers of topics such as stress,
discrimination, poverty, coping and so on cannot
easily design randomized experiments
- Structural Equation Models (SEM) are often
presented as a major tool for establishing
causes.
- The use of SEM is increasing. The number of
articles has been doubling over 8-year periods.
Software is more available and accessible.
20SEM and ISOLATION, ASSOCIATION, and DIRECTION
- Consider a simple SEM model
- $Y = b_1 X + e$
- For every unit change in X, Y is expected to
change by $b_1$ units. This equation implies clear
association of Y and X, and it makes the assumed
direction underlying the association unambiguous.
For the equation to be meaningful in terms of
causation, we must also assume that alternative
causes of Y are accounted for by the independent
stochastic term, e.
- Bollen calls the requirement that e be
uncorrelated with X the pseudo-isolation
condition.
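The role of the pseudo-isolation condition can be sketched with simulated data. Everything numeric here is an assumption made for illustration (the coefficient 0.5, the contamination weight 0.4, the sample size); the helper functions `cov` and `slope` are hypothetical names, not part of any SEM package. The sketch estimates the slope of Y on X once when e is independent of X and once when e shares a component with X.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def slope(x, y):
    """Least-squares slope of y on x: cov(x, y) / var(x)."""
    return cov(x, y) / cov(x, x)

n = 20_000
b1 = 0.5  # hypothetical structural coefficient

x = [random.gauss(0.0, 1.0) for _ in range(n)]

# Case 1: e is independent of X, so pseudo-isolation holds.
e_iso = [random.gauss(0.0, 1.0) for _ in range(n)]
y_iso = [b1 * xi + ei for xi, ei in zip(x, e_iso)]

# Case 2: e contains a component correlated with X, so the condition fails.
e_bad = [0.4 * xi + random.gauss(0.0, 1.0) for xi in x]
y_bad = [b1 * xi + ei for xi, ei in zip(x, e_bad)]

slope_iso = slope(x, y_iso)
slope_bad = slope(x, y_bad)
print(f"pseudo-isolation holds: slope = {slope_iso:.3f}")
print(f"pseudo-isolation fails: slope = {slope_bad:.3f}")
```

In the first case the slope should land near the assumed 0.5; in the second it should land near 0.9, because the part of e that moves with X is credited to X.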
21Analysis of Randomized Experiment through SEM
- $Y = b_0 + b_1 X + e$
- Let X take one of two values representing whether
a subject received the treatment (X = 1) or the
control placebo (X = 0). $b_1$ estimates the average
of D. Because the assignment is randomized, X is
expected to be uncorrelated with residual causes of Y.
- Randomization justifies the pseudo-isolation
condition.
- The randomized experiment also reminds us that
between-subject comparisons can be informative
about average within-subject effects. We can
contemplate what would have happened if a given
subject had been assigned to a different group.
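With a 0/1 treatment indicator, the least-squares slope is algebraically the same number as the difference in group means, and the intercept is the control group mean. The sketch below checks this on simulated data; the intercept 5.0, effect 1.5, and error standard deviation 2.0 are hypothetical values chosen only for illustration.

```python
import random
from statistics import mean

random.seed(2)  # fixed seed so the sketch is reproducible

n = 2_000
b0, b1 = 5.0, 1.5  # hypothetical intercept and treatment effect

# X = 1 for treatment, X = 0 for control, assigned at random.
x = [random.randint(0, 1) for _ in range(n)]
y = [b0 + b1 * xi + random.gauss(0.0, 2.0) for xi in x]

# Least-squares estimates of b1 and b0.
mx, my = mean(x), mean(y)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sxx = sum((xi - mx) ** 2 for xi in x)
b1_hat = sxy / sxx
b0_hat = my - b1_hat * mx

# Group means for comparison.
mean_t = mean(yi for xi, yi in zip(x, y) if xi == 1)
mean_c = mean(yi for xi, yi in zip(x, y) if xi == 0)

print(f"b1_hat = {b1_hat:.4f}, mean_t - mean_c = {mean_t - mean_c:.4f}")
print(f"b0_hat = {b0_hat:.4f}, mean_c          = {mean_c:.4f}")
```

The equality of `b1_hat` and `mean_t - mean_c` holds for any data set with a binary X, not only for randomized ones; randomization is what licenses reading that number as an estimate of the average causal effect.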
22In non-experimental studies, Isolation is
difficult to establish
- We need to specify EVERY causal factor that is
correlated with X, the causal variable of
interest.
[Path diagram with nodes X, W2, W3, W4, Y, and residual e]
23Stress example continued
- For example, if Y is distress, and X is exposure
to stress, W2 is some measure of previous
distress, W3 is social class and W4 is coping
skills, then all four exogenous (predictor)
variables might be considered as possible causes
of distress.
- In this model, variables W2, W3, and W4 are
introduced as additional causes of Y that are
distinct from, but not necessarily uncorrelated
with, X. If the list of covariates is complete,
then the condition of pseudo-isolation will be
justified.
- In practice, we never know if the list of
competing covariates is complete.
- SEM analyses become credible as they withstand
the alternative explanations advanced by their
critics.
24The effects of model misspecification
- Suppose some W2 is missing in the data set, even
though we know it is correlated with both Y and
X. If we know that W2 is a causal factor for both
X and Y, then we would portray the model as on
the right
- If we consider the misspecified model, in which
W2 is missing, we can see that the estimated
effect of X will absorb part of the effect of
W2 on Y. The causal impact of X will be
overestimated in the misspecified model when W2
is positively related to both X and Y.
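The consequence of leaving out W2 can be sketched numerically. The coefficients below (0.3 for X, 0.6 for W2, and 0.5 for the effect of W2 on X) are hypothetical, and `cov` is a small helper written for this illustration. The sketch compares the slope of Y on X alone with the partial slope of X when W2 is included, using the standard two-predictor least-squares formula.

```python
import random

random.seed(3)  # fixed seed so the sketch is reproducible

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

n = 20_000
b1, b2 = 0.3, 0.6  # hypothetical effects of X and W2 on Y
a = 0.5            # hypothetical effect of W2 on X

w2 = [random.gauss(0.0, 1.0) for _ in range(n)]
x = [a * wi + random.gauss(0.0, 1.0) for wi in w2]
y = [b1 * xi + b2 * wi + random.gauss(0.0, 1.0) for xi, wi in zip(x, w2)]

# Misspecified model: Y regressed on X alone (W2 omitted).
slope_missing = cov(x, y) / cov(x, x)

# Correctly specified model: partial slope of X with W2 included.
sxx, sww, sxw = cov(x, x), cov(w2, w2), cov(x, w2)
sxy, swy = cov(x, y), cov(w2, y)
slope_full = (sxy * sww - swy * sxw) / (sxx * sww - sxw ** 2)

print(f"W2 omitted:  slope of X = {slope_missing:.3f}")
print(f"W2 included: slope of X = {slope_full:.3f}")
```

Under these assumed values the omitted-variable slope should be near 0.54 rather than 0.3; the excess equals b2 times cov(X, W2) / var(X).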
25Estimation of SEM Models
- We will see that the systems of equations imply
certain patterns of correlations (covariances)
among the variables in the model
- Estimates are obtained by fitting the sample
covariance matrix rather than the individual
observations
- This matrix is computed over persons
- SEM analyses report how well the covariance
matrix is fit by alternative models.
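As a rough illustration of what "fitting the covariance matrix" refers to, the sketch below takes the one-equation model Y = b1*X + e with e uncorrelated with X, writes down the covariance matrix that model implies, and compares it with a sample covariance matrix computed over simulated persons. The parameter values and the function name `implied_cov` are assumptions made for this example, and no estimation is performed: the implied matrix is evaluated at the generating values rather than at fitted ones.

```python
import numpy as np

rng = np.random.default_rng(4)  # fixed seed so the sketch is reproducible

n = 5_000
b1, var_x, var_e = 0.5, 2.0, 1.0  # hypothetical parameter values

x = rng.normal(0.0, np.sqrt(var_x), size=n)
y = b1 * x + rng.normal(0.0, np.sqrt(var_e), size=n)

# Sample covariance matrix, computed over persons (rows are persons).
data = np.column_stack([x, y])
S = np.cov(data, rowvar=False)

def implied_cov(b1, var_x, var_e):
    """Covariance matrix of (X, Y) implied by Y = b1*X + e, cov(X, e) = 0."""
    return np.array([
        [var_x,      b1 * var_x],
        [b1 * var_x, b1 ** 2 * var_x + var_e],
    ])

sigma = implied_cov(b1, var_x, var_e)
discrepancy = np.abs(S - sigma).max()

print("sample covariance matrix S:\n", S)
print("model-implied matrix Sigma:\n", sigma)
print("largest absolute discrepancy:", discrepancy)
```

Note that the implied variance of Y involves `b1 ** 2`, so the linear structural equation yields nonlinear equations for the variances and covariances.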
26Major strengths of SEM are that
- Proposed causal explanations are made explicit
- Tests of fit allow implausible models to be
rejected
- Competing models can often be compared, and one
may emerge as more plausible given the data.
27Major problems with SEM are that
- Models are often (usually?) misspecified
- Linearity assumption is often made uncritically
- Measurement error distorts analysis
- Important variables may be missing
- Communicating results is challenging
- Novices may overstate claims or make errors in
complex analyses that are difficult to detect
28Math Review for SEM
- Matrix notation
- Many of the better books use matrix notation
- New results in journals often use matrices
- Expectation and Variance operators
- Helps distinguish population parameters from
sample statistics
29Some Special Matrices
- Data vector
- Data matrix
- Unit vector
- Null vector/matrix
- Identity matrix
- Diagonal matrix
- Symmetric square matrix X'X
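The special matrices listed above can be written out concretely. The sketch below uses NumPy with made-up data values, and it assumes "unit vector" is meant in the statistical sense of a vector of ones (used for summing over persons); if the course intends a vector of length one, that line would differ.

```python
import numpy as np

# Data matrix X: 4 persons (rows) by 2 variables (columns); values are made up.
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 5.0],
              [4.0, 4.0]])

x1 = X[:, 0]                # data vector: one variable observed over persons
ones = np.ones(4)           # unit vector (assumed here: every element is 1)
null = np.zeros((2, 2))     # null matrix: every element is 0
identity = np.eye(2)        # identity matrix: 1s on the diagonal, 0s elsewhere
D = np.diag([3.0, 7.0])     # diagonal matrix: nonzero entries only on the diagonal
XtX = X.T @ X               # X'X: a symmetric square matrix

column_sums = ones @ X      # the unit vector sums each column over persons

print("X'X =\n", XtX)
print("column sums =", column_sums)
```

X'X holds the sums of squares on its diagonal and the sums of cross-products off the diagonal, which is why it reappears whenever covariances are computed.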
30Review
- Matrix rank/order
- Matrix transpose
- Matrix addition/subtraction
- Scalar multiplication of matrices
- Matrix multiplication
- Matrix Determinant
- Matrix inversion
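Each operation in this review has a direct NumPy counterpart. The matrices A and B below are arbitrary small examples chosen so the results are easy to check by hand; they are not taken from the course materials.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
B = np.array([[1.0, 0.0],
              [4.0, 2.0]])

order = A.shape                  # order: (rows, columns)
rank = np.linalg.matrix_rank(A)  # number of linearly independent columns
At = A.T                         # transpose
total = A + B                    # addition (element by element)
diff = A - B                     # subtraction (element by element)
scaled = 3 * A                   # scalar multiplication
prod = A @ B                     # matrix multiplication (rows times columns)
det = np.linalg.det(A)           # determinant: 2*3 - 1*1 = 5
A_inv = np.linalg.inv(A)         # inverse, defined because det is nonzero

print("order:", order, "rank:", rank)
print("A @ B =\n", prod)
print("det(A) =", det)
print("A @ A_inv =\n", A @ A_inv)
```

Matrix multiplication is not commutative in general, so `A @ B` and `B @ A` should not be expected to agree, and the inverse exists only for square matrices of full rank.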