Title: Instrumental Variables Estimation (with Examples from Criminology)
- Robert Apel, Ph.D.
- School of Criminal Justice
- University at Albany
Center for Social and Demographic Analysis, University at Albany, May 5-7, 2009
Vital Statistics
- Ph.D., Criminology and Criminal Justice, 2004
- University of Maryland
- Coursework in Department of Economics
- Dissertation used instrumental variables
- State child labor laws as instrumental variables
for the causal effect of youth employment on
antisocial behavior
Topics That Will Be Covered in This Workshop
- Why use IV?
- Discussion of endogeneity bias
- Statistical motivation for IV
- What is an IV?
- Identification issues
- Statistical properties of IV estimators
- How is an IV model estimated?
- Software and data examples
- Diagnostics: IV relevance, IV exogeneity, Hausman
Review of the Linear Model
- Population model: $Y = \alpha + \beta X + \varepsilon$
- Assume that the true slope is positive, so $\beta > 0$
- Sample model: $Y = a + bX + e$
- Least squares (LS) estimator of $\beta$
- $b_{LS} = (X'X)^{-1}X'Y = \mathrm{Cov}(X,Y) / \mathrm{Var}(X)$
- Under what conditions can we speak of bLS as a
causal estimate of the effect of X on Y?
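As a quick numerical check that the matrix form and the covariance form of $b_{LS}$ are the same number, here is a minimal sketch on simulated data. The sample size, seed, and true parameter values are arbitrary assumptions, and X is generated independently of the error, so the exogeneity assumption holds by construction.

```python
import numpy as np

# Simulated data under exogeneity: X is drawn independently of the error term.
rng = np.random.default_rng(0)
n = 10_000
alpha, beta = 1.0, 0.5                  # assumed true intercept and (positive) slope
x = rng.normal(size=n)
eps = rng.normal(size=n)
y = alpha + beta * x + eps

# Matrix form: b = (X'X)^{-1} X'Y, with a column of ones for the intercept.
X = np.column_stack([np.ones(n), x])
b_matrix = np.linalg.solve(X.T @ X, X.T @ y)

# Covariance form for the slope: Cov(X, Y) / Var(X), both with the n - 1 divisor.
b_cov = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

print(b_matrix[1], b_cov)               # the two slope estimates agree, both near beta
```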
Review of the Linear Model
- Key assumption of the linear model
- $E(X'\varepsilon) = \mathrm{Cov}(X,\varepsilon) = E(\varepsilon \mid X) = 0$
- Exogeneity assumption: X is uncorrelated with the unobserved determinants of Y
- Important statistical property of the LS estimator under exogeneity
- $E(b_{LS}) = \beta + \mathrm{Cov}(X,\varepsilon) / \mathrm{Var}(X)$
- $\mathrm{plim}(b_{LS}) = \beta + \mathrm{Cov}(X,\varepsilon) / \mathrm{Var}(X)$
The second terms are 0, so $b_{LS}$ is unbiased and consistent
Endogeneity and the Evaluation Problem
- When is the exogeneity assumption violated?
- Measurement error → attenuation bias
- Instantaneous causation → simultaneity bias
- Omitted variables → selection bias
- Selection bias is the problem in observational research that undermines causal inference
- Measurement error and instantaneous causation can be posed as problems of omitted variables
When Is the Exogeneity Assumption Violated?
- (1) Measurement error in X ($u$) that is correlated with M.E. in Y ($v$) or with the model error ($\varepsilon$)
- Classical M.E. leads to attenuation, $0 < E(b_{LS}) < \beta$, but non-random M.E. (or correlation between M.E. and X, Y, $v$, and/or $\varepsilon$) introduces unknown biases
And, if there are multiple Xs, the bias contaminates the whole model, not just the coefficient on the X measured with error (a.k.a. smearing)
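To see classical attenuation numerically, the sketch below adds independent noise $u$ to a true regressor and compares the LS slope with the textbook prediction $\beta \cdot \mathrm{Var}(X^*) / (\mathrm{Var}(X^*) + \mathrm{Var}(u))$. All values are illustrative assumptions: $\mathrm{Var}(X^*) = \mathrm{Var}(u) = 1$, so the reliability ratio is 0.5 and the slope should be cut roughly in half.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
beta = 2.0                                   # assumed true slope
x_true = rng.normal(scale=1.0, size=n)       # true regressor X*, Var = 1
u = rng.normal(scale=1.0, size=n)            # classical M.E.: Var = 1, independent of everything
y = beta * x_true + rng.normal(size=n)
x_obs = x_true + u                           # the noisy measure the researcher actually has

b_ls = np.cov(x_obs, y)[0, 1] / np.var(x_obs, ddof=1)
reliability = 1.0 / (1.0 + 1.0)              # Var(X*) / (Var(X*) + Var(u))
print(b_ls, beta * reliability)              # both near 1.0: shrunk toward zero, same sign
```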
When Is the Exogeneity Assumption Violated?
- (2) Instantaneous causation of Y on X
- Direction of the bias depends on the sign of the feedback effect, Y → X
- If positive, $E(b_{LS}) > \beta$, so we overestimate the true effect
- If negative, $E(b_{LS}) < \beta$, so we underestimate the true effect, and in severe cases the sign can even flip so that $E(b_{LS}) < 0$ even though $\beta > 0$
This non-recursivity complicates the relationship between price and quantity in economics
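The direction of simultaneity bias can be checked with a two-equation system, $Y = \beta X + \varepsilon$ and $X = \gamma Y + u$, solved jointly. This is a minimal sketch under assumed values ($\beta = 0.5$, unit error variances, no intercepts); `simulate_feedback` and `predicted_plim` are hypothetical helper names. With these assumptions, $\mathrm{plim}(b_{LS}) = \beta + \gamma(1 - \beta\gamma)/(\gamma^2 + 1)$, which exceeds $\beta$ for positive feedback and drops below zero for strong negative feedback.

```python
import numpy as np

def simulate_feedback(beta, gamma, n=200_000, seed=2):
    """LS slope of Y on X when Y = beta*X + e and X = gamma*Y + u hold simultaneously."""
    rng = np.random.default_rng(seed)
    e = rng.normal(size=n)
    u = rng.normal(size=n)
    x = (gamma * e + u) / (1 - beta * gamma)   # reduced-form solution for X
    y = beta * x + e
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

def predicted_plim(beta, gamma):
    """beta + Cov(X, e) / Var(X), assuming Var(e) = Var(u) = 1."""
    return beta + gamma * (1 - beta * gamma) / (gamma**2 + 1)

beta = 0.5                                     # assumed true effect of X on Y (positive)
for gamma in (0.5, -1.5):                      # positive vs. strongly negative feedback Y -> X
    print(gamma, simulate_feedback(beta, gamma), predicted_plim(beta, gamma))
```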
When Is the Exogeneity Assumption Violated?
- (3) Omitted variable (W) that is correlated with both X and Y
- Classic problem of omitted variables bias
- Coefficient on X will absorb the indirect path through W, whose sign depends on Cov(X,W) and Cov(W,Y)
Things are more complicated in applied settings because there are bound to be many Ws, not to mention that the smearing problem applies in this context also
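The indirect path through W can be made concrete with the omitted-variable-bias formula, $\mathrm{plim}(b_{LS}) = \beta + \delta \cdot \mathrm{Cov}(X,W)/\mathrm{Var}(X)$, where $\delta$ is the effect of W on Y. In this sketch the parameter values are assumptions chosen so that W raises X but lowers Y; the bias is then negative and nearly cancels a true effect of 1.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
beta, delta, pi = 1.0, -2.0, 0.8      # assumed: X->Y effect, W->Y effect, W->X effect
w = rng.normal(size=n)                # the omitted variable
x = pi * w + rng.normal(size=n)       # X is partly driven by W
y = beta * x + delta * w + rng.normal(size=n)

# Short regression of Y on X alone (W omitted)
b_short = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
# Omitted-variable-bias formula: beta + delta * Cov(X, W) / Var(X)
b_formula = beta + delta * np.cov(x, w)[0, 1] / np.var(x, ddof=1)
print(b_short, b_formula)             # both near 0.02, far from the true beta = 1
```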
Example 1: Police Hiring
- Measurement error
- Mobilization of sworn officers (M.E. in X) as well as differential victim reporting or crime recording (M.E. in Y) may be correlated with police size
- Instantaneous causation
- More police might be hired during a crime wave
- Omitted variables
- Large departments may differ in fundamental ways that are difficult to measure (e.g., urban, heterogeneous)
Example 2: Sanction Perceptions
- Measurement error
- Measures of perceived sanction risk are probably noisy (M.E. in X), resulting in attenuation at best
- Instantaneous causation
- Perceptions are sensitive to the success/failure of criminal behavior, so feedback is negative
- Omitted variables
- Perceived risk is probably correlated with unobserved determinants of crime (e.g., intelligence)
Example 3: Delinquent Peers
- Measurement error
- Highly delinquent youth probably overestimate the delinquency of their peers (M.E. in X), and likely underestimate their own delinquency (M.E. in Y)
- Instantaneous causation
- If there is influence/imitation, then it is bidirectional
- Omitted variables
- High-risk youth probably select themselves into delinquent peer groups (birds of a feather)
Regression Estimation: Ignoring Omitted Variables
- Suppose we estimate the treatment effect model
- $Y = \alpha + \beta X + \varepsilon$
- Let's assume without loss of generality that X is a binary treatment (X = 1 if treated, X = 0 if untreated)
- Least squares estimator
- $b_{LS} = \mathrm{Cov}(X,Y) / \mathrm{Var}(X) = E(Y \mid X=1) - E(Y \mid X=0)$
- Simply the difference in means between treated units (X = 1) and untreated units (X = 0)
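With a binary X, the LS slope and the treated-minus-untreated difference in means are the same number in any sample, not just in expectation. A minimal sketch with a hypothetical outcome model:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000
x = rng.integers(0, 2, size=n)            # binary treatment: 1 = treated, 0 = untreated
y = 3.0 - 1.5 * x + rng.normal(size=n)    # hypothetical outcome model

b_ls = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
diff_means = y[x == 1].mean() - y[x == 0].mean()
print(b_ls, diff_means)                   # identical up to floating-point error
```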
Regression Estimation: Ignoring Omitted Variables
- But suppose the population treatment effect model is instead
- $Y = \alpha + \beta X + (\delta W + \nu)$
- Now the residual conveys information about W
- Consider a plausible example
- Y = crime, X = marriage, W = marriageability
- Marriageability can be broadly construed to encompass earnings potential, desire for children, willingness to compromise, faithfulness, verbal communication skills, ...
- Including signals that individuals emit about these qualities
Regression Estimation: Ignoring Omitted Variables
- What does LS estimate when W is omitted?
- $b_{LS} = \frac{C(X,Y)}{V(X)} = \beta + \frac{C(W,Y)}{V(W)} \cdot \frac{C(X,W)}{V(X)} = \beta + \delta\,[E(W \mid X=1) - E(W \mid X=0)]$
- The crime-reducing effect of marriage will be overestimated
- IMPORTANT: Even if $\beta = 0$, $b_{LS} < 0$
Regression Estimation: Ignoring Omitted Variables
- So...
- $b_{LS} = \beta + \delta\,[E(W \mid X=1) - E(W \mid X=0)]$
- Estimate of $\beta$ is unbiased if and only if
- 1. Marriageability is uncorrelated with crime
- $\delta = 0$
- or...
- 2. Marriageability is balanced (i.e., equivalent) between married and unmarried subjects
- $E(W \mid X=1) = E(W \mid X=0)$
Omitted Variables in Criminological Research
- What variables of interest to criminologists are surely endogenous?
- Micro: employment, education, marriage, military service, fertility, conviction, family structure, ...
- Macro: poverty, unemployment rate, collective efficacy, immigrant concentration, ...
- Basically, EVERYTHING!
- (I'm sorry to be the one to break it to you)
Traditional Strategies to Deal with Omitted Variables
- Randomization (physical control)
- Achieves balance (in expectation) on any and all potential Ws
- Control variables are technically unnecessary
- Covariate adjustment (statistical control)
- Control for potential Ws in a regression model
- But... we have no idea how many Ws there are, so model misspecification is still a real problem here
Quasi-Experimental Strategies to Deal with Omitted Variables
- Difference in differences (fixed-effects model)
- Requires panel data
- Propensity score matching
- Requires a lot of measured background variables
- Similar to covariate adjustment, but only the treated and untreated cases that are on (common) support are utilized
- Instrumental variables estimation
- Requires an exclusion restriction
Instrumental Variables Estimation Is a Viable Approach
- An instrumental variable for X is one solution to the problem of omitted variables bias
- Requirements for Z to be a valid instrument for X
- Relevant: correlated with X
- Exogenous: not correlated with Y except through its correlation with X
Important Point about Instrumental Variables Models
- I often hear... "A good instrument should not be correlated with the dependent variable"
- WRONG!!!
- Z has to be correlated with Y, otherwise it is useless as an instrument
- It can only be correlated with Y through X
- A good instrument must not be correlated with the unobserved determinants of Y
Important Point about Instrumental Variables Models
- Not all of the available variation in X is used
- Only that portion of X which is explained by Z is used to explain Y
[Figure: X = endogenous variable, Y = response variable, Z = instrumental variable]
Important Point about Instrumental Variables Models
Best-case scenario: A lot of X is explained by Z, and most of the overlap between X and Y is accounted for
Realistic scenario: Very little of X is explained by Z, or what is explained does not overlap much with Y
Important Point about Instrumental Variables Models
- The IV estimator is BIASED
- In other words, $E(b_{IV}) \neq \beta$ (finite-sample bias)
- The appeal of IV derives from its consistency
- Consistency is a way of saying that $b$ converges in probability to $\beta$ as $N \to \infty$, i.e., $\mathrm{plim}(b) = \beta$
- So... IV studies often have very large samples
- But with endogeneity, $E(b_{LS}) \neq \beta$ and $\mathrm{plim}(b_{LS}) \neq \beta$ anyway
- Asymptotic behavior of IV
- $\mathrm{plim}(b_{IV}) = \beta + \mathrm{Cov}(Z,\varepsilon) / \mathrm{Cov}(Z,X)$
- If Z is truly exogenous, then $\mathrm{Cov}(Z,\varepsilon) = 0$
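The contrast between an inconsistent LS estimate and a consistent IV estimate shows up clearly in a large simulated sample. In this sketch (all coefficients are assumptions) an unobserved confounder `w` enters both X and the error of Y, while the instrument `z` shifts X and is independent of `w`, so $\mathrm{Cov}(Z,\varepsilon) = 0$ by construction.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000
beta = 1.0                                    # assumed true causal effect
w = rng.normal(size=n)                        # unobserved confounder
z = rng.normal(size=n)                        # instrument: shifts X, independent of w
x = 0.5 * z + 0.8 * w + rng.normal(size=n)
y = beta * x - 2.0 * w + rng.normal(size=n)   # w sits in the error term of Y

b_ls = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
b_iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]   # Cov(Z,Y) / Cov(Z,X)
print(b_ls, b_iv)                             # LS near 0.15; IV near 1.0
```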
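Instrumental Variables Terminology
- Three different models to be familiar with
- First stage: $X = \alpha_0 + \alpha_1 Z + u$
- Structural model: $Y = \beta_0 + \beta_1 X + \varepsilon$
- Reduced form: $Y = \delta_0 + \delta_1 Z + \nu$
- An interesting equality
- $\delta_1 = \alpha_1 \beta_1$
- so
- $\beta_1 = \delta_1 / \alpha_1$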
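The equality $\beta_1 = \delta_1 / \alpha_1$ holds exactly in-sample when there is one instrument: dividing the reduced-form slope by the first-stage slope cancels $\mathrm{Var}(Z)$ and leaves $\mathrm{Cov}(Z,Y)/\mathrm{Cov}(Z,X)$. A minimal sketch on simulated data (the coefficients are assumptions; `slope` is a hypothetical helper):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5_000
z = rng.normal(size=n)
w = rng.normal(size=n)                      # unobserved confounder
x = 0.7 * z + w + rng.normal(size=n)
y = 2.0 * x + w + rng.normal(size=n)        # assumed structural slope beta_1 = 2

def slope(a, b):
    """OLS slope from regressing b on a (with an intercept)."""
    return np.cov(a, b)[0, 1] / np.var(a, ddof=1)

alpha_1 = slope(z, x)                       # first stage: X on Z
delta_1 = slope(z, y)                       # reduced form: Y on Z
beta_1_ratio = delta_1 / alpha_1
beta_1_iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]
print(beta_1_ratio, beta_1_iv)              # same number, close to 2
```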
Different Types of Instrumental Variables Estimators
- Wald estimator for binary instrument
- $b_{Wald} = [E(Y \mid Z=1) - E(Y \mid Z=0)] / [E(X \mid Z=1) - E(X \mid Z=0)]$
- Difference in response ÷ difference in treatment
- Instrumental variables (IV) estimator
- $b_{IV} = (Z'X)^{-1}Z'Y = \mathrm{Cov}(Z,Y) / \mathrm{Cov}(Z,X)$
- Shows that $b_{IV}$ can be recovered from two samples
- Two-stage least squares (2SLS) estimator
- $b_{2SLS} = (\hat{X}'\hat{X})^{-1}\hat{X}'Y = \mathrm{Cov}(\hat{X},Y) / \mathrm{Var}(\hat{X})$
- $\hat{X}$ represents the fitted values from the first-stage model
Different Types of Instrumental Variables Estimators
- Single binary instrument and no control variables...
- $b_{Wald} = b_{IV} = b_{2SLS}$
- Single instrument (binary or continuous) with or without control variables...
- $b_{IV} = b_{2SLS}$
- Multiple instruments (binary or continuous) with or without control variables...
- $b_{2SLS}$
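For the first case (one binary instrument, no controls), the three estimators coincide exactly in any sample. The sketch below checks this on simulated data in which a randomly assigned `z` shifts a binary take-up variable `x` that is also driven by an unobserved confounder; every coefficient is an assumption chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 10_000
z = rng.integers(0, 2, size=n).astype(float)    # binary instrument (e.g., random assignment)
w = rng.normal(size=n)                          # unobserved confounder
latent = 0.2 + 0.6 * z + 0.3 * w + rng.normal(scale=0.3, size=n)
x = (latent > 0.5).astype(float)                # binary take-up, partly driven by w
y = 1.0 - 0.5 * x + w + rng.normal(size=n)

# Wald: difference in mean response / difference in mean treatment, by Z
b_wald = (y[z == 1].mean() - y[z == 0].mean()) / (x[z == 1].mean() - x[z == 0].mean())

# IV: Cov(Z, Y) / Cov(Z, X)
b_iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

# 2SLS: regress X on Z (with intercept), then Y on the fitted values X_hat
Z = np.column_stack([np.ones(n), z])
x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
b_2sls = np.cov(x_hat, y)[0, 1] / np.var(x_hat, ddof=1)

print(b_wald, b_iv, b_2sls)                     # identical up to floating-point error
```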
More on the Method of Two-Stage Least Squares (2SLS)
- Step 1: $X = \alpha_0 + \alpha_1 Z_1 + \alpha_2 Z_2 + \cdots + \alpha_k Z_k + u$
- Obtain fitted values ($\hat{X}$) from the first-stage model
- Step 2: $Y = b_0 + b_1 \hat{X} + e$
- Substitute the fitted $\hat{X}$ in place of the original X
- Note: If done manually in two stages, the standard errors are based on the wrong residual
- $e = Y - b_0 - b_1 \hat{X}$ when it should be $e = Y - b_0 - b_1 X$
- Best to just let the software do it for you
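The residual issue can be seen by doing the two stages by hand. This sketch (simulated data, assumed coefficients, homoskedastic errors) computes the 2SLS coefficients from the two steps and then builds conventional standard errors twice: once from the second-stage residual $Y - b_0 - b_1\hat{X}$ and once from the correct residual $Y - b_0 - b_1 X$. The coefficients are the same either way; only the variance estimate changes. `se` is a hypothetical helper, not a library function.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 5_000
z = rng.normal(size=n)
w = rng.normal(size=n)                          # unobserved confounder
x = 0.6 * z + w + rng.normal(size=n)
y = 1.0 + 2.0 * x + w + rng.normal(size=n)      # assumed true b0 = 1, b1 = 2

Z = np.column_stack([np.ones(n), z])            # instruments (incl. constant)
X = np.column_stack([np.ones(n), x])            # regressors (incl. constant)

# Step 1: first-stage fitted values for every column of X
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
# Step 2: regress Y on the fitted values
b = np.linalg.lstsq(X_hat, y, rcond=None)[0]

def se(resid, design):
    """Conventional (homoskedastic) standard errors built from a given residual."""
    s2 = resid @ resid / (n - design.shape[1])
    return np.sqrt(np.diag(s2 * np.linalg.inv(design.T @ design)))

se_wrong = se(y - X_hat @ b, X_hat)   # manual second stage: residual uses X_hat
se_right = se(y - X @ b, X_hat)       # correct 2SLS: residual uses the original X
print(b, se_wrong[1], se_right[1])
```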
Including Control Variables in an IV/2SLS Model
- Control variables (Ws) should be entered into the model at both stages
- First stage: $X = \alpha_0 + \alpha_1 Z + \alpha_2 W + u$
- Second stage: $Y = b_0 + b_1 \hat{X} + b_2 W + e$
- Control variables are considered instruments; they are just not excluded instruments
- They serve as their own instruments
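In matrix terms, "entering W at both stages" means W appears in the regressor matrix and in the instrument matrix, so projecting W on the instruments returns W itself. A minimal sketch under assumed coefficients (the instrument is allowed to depend on the measured control, but not on the unmeasured confounder `v`); `two_sls` is a hypothetical helper.

```python
import numpy as np

def two_sls(y, X, Z):
    """2SLS: project each column of X on Z, then regress y on the projections.
    X holds the regressors (constant, endogenous X, controls W);
    Z holds the instruments (constant, excluded Z, the same controls W)."""
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]

rng = np.random.default_rng(9)
n = 50_000
w_obs = rng.normal(size=n)                    # measured control variable W
v = rng.normal(size=n)                        # unmeasured confounder
z = rng.normal(size=n) + 0.5 * w_obs          # instrument may be related to W
x = 0.8 * z + 0.5 * w_obs + v + rng.normal(size=n)
y = 1.0 + 2.0 * x - 1.0 * w_obs + v + rng.normal(size=n)   # assumed b1 = 2, b2 = -1

ones = np.ones(n)
X = np.column_stack([ones, x, w_obs])         # second stage: constant, X, W
Z = np.column_stack([ones, z, w_obs])         # first stage: constant, Z, W (W instruments itself)
coef = two_sls(y, X, Z)
print(coef)                                   # roughly [1, 2, -1]
```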
Functional Form Considerations with IV/2SLS
- Binary endogenous regressor (X)
- Consistency of second-stage estimates does not hinge on getting the first-stage functional form correct
- Binary response variable (Y)
- IV probit (or logit) is feasible but is technically unnecessary
- In both cases, the linear model is tractable, easily interpreted, and consistent
- Although variance adjustment is well advised
Functional Form Considerations with IV/2SLS
- Quadratic second stage with a continuous endogenous regressor
- Entering first-stage fitted values and their square into the second-stage model leads to inconsistency
- The square of a linear projection is not equivalent to a linear projection on a quadratic
- Squares and cross-products of IVs should be treated as additional instruments
- Kelejian (1971)
- Linear and squared Xs are treated as two different endogenous regressors
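The Kelejian approach can be sketched as ordinary 2SLS with two endogenous regressors ($X$ and $X^2$) and two excluded instruments ($Z$ and $Z^2$). The data-generating values below are assumptions; the confounder `c` enters both X and the error of Y, and `two_sls` is the same hypothetical helper used for the control-variable case.

```python
import numpy as np

def two_sls(y, X, Z):
    """2SLS: project every column of X on Z, then regress y on the projections."""
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]

rng = np.random.default_rng(10)
n = 100_000
z = rng.normal(size=n)                        # excluded instrument
c = rng.normal(size=n)                        # unobserved confounder
x = 1.0 + 0.8 * z + c + rng.normal(size=n)    # continuous endogenous regressor
y = 0.5 + 1.0 * x + 0.3 * x**2 + 2.0 * c + rng.normal(size=n)   # assumed quadratic model

ones = np.ones(n)
X = np.column_stack([ones, x, x**2])          # X and X^2 as two endogenous regressors
Z = np.column_stack([ones, z, z**2])          # Z and Z^2 as the excluded instruments
coef = two_sls(y, X, Z)
print(coef)                                   # roughly [0.5, 1.0, 0.3]
```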
Technical Conditions Required for Model Identification
- Order condition: at least as many IVs as endogenous Xs
- Just-identified model: # IVs = # Xs
- Overidentified model: # IVs > # Xs
- Rank condition: at least one IV must be significant in the first-stage model
- Number of linearly independent columns in a matrix
- $E(X \mid Z, W)$ cannot be perfectly correlated with $E(X \mid W)$
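Statistical Inference with IV
- Variance estimation
- $\sigma^2_{\hat\beta_{LS}} = \sigma^2_e / SST_X$
- $\sigma^2_{\hat\beta_{IV}} = \sigma^2_e / (SST_X \cdot R^2_{X,Z})$
- where
- $e = Y - \hat\beta_0 - \hat\beta_1 X$
- NOTICE: Because $R^2_{X,Z} < 1$, $\sigma_{\hat\beta_{IV}} > \sigma_{\hat\beta_{LS}}$
- IV standard errors tend to be large, especially when $R^2_{X,Z}$ is very small, which can lead to type II errors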
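The variance penalty can be checked numerically. This sketch (simulated data, assumed coefficients, a deliberately modest first stage) uses the same $\sigma^2_e$ in both formulas, as the slide does, confirms that $\sigma^2_e/(SST_X \cdot R^2_{X,Z})$ equals the usual just-identified IV variance $\sigma^2_e \sum(z-\bar z)^2 / [\sum(z-\bar z)(x-\bar x)]^2$, and shows the standard-error ratio is $1/\sqrt{R^2_{X,Z}}$.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 10_000
z = rng.normal(size=n)
x = 0.3 * z + rng.normal(size=n)          # modest first stage: R^2 around 0.08
y = 1.0 + 0.5 * x + rng.normal(size=n)    # hypothetical model; X is exogenous here

zc, xc = z - z.mean(), x - x.mean()       # centered variables
b1 = (zc @ y) / (zc @ xc)                 # IV slope: Cov(Z,Y) / Cov(Z,X)
b0 = y.mean() - b1 * x.mean()
e = y - b0 - b1 * x                       # residual uses the original X
s2_e = e @ e / (n - 2)

sst_x = xc @ xc
r2_xz = (zc @ xc) ** 2 / ((zc @ zc) * sst_x)   # R^2 from regressing X on Z

var_ls_form = s2_e / sst_x
var_iv_form = s2_e / (sst_x * r2_xz)
var_iv_direct = s2_e * (zc @ zc) / (zc @ xc) ** 2   # usual just-identified IV variance
print(np.sqrt(var_iv_form / var_ls_form), 1 / np.sqrt(r2_xz))   # equal, and well above 1
```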
Instrumental Variables and Randomized Experiments
- Imperfect compliance in randomized trials
- Some individuals assigned to the treatment group will not receive Tx, and some assigned to the control group will receive Tx
- Assignment error, subject refusal, investigator discretion
- Some individuals who receive Tx will not change their behavior, and some who do not receive Tx will change their behavior
- A problem in randomized job training studies and other social experiments (e.g., housing vouchers)
Instrumental Variables and Randomized Experiments
- Two different measures of treatment (X)
- Treatment assigned: exogenous
- Intention-to-treat (ITT) analysis
- Reduced-form model: $Y = \delta_0 + \delta_1 Z + \nu$
- Often leads to underestimation of the treatment effect
- Treatment delivered: endogenous
- Individuals who do not comply probably differ in ways that can undermine the study
- Self-selection → bias and inconsistency
Angrist (2006), J.E.C.
- Minneapolis D.V. experiment
- Sherman and Berk (1984)
- Cases of male-on-female misdemeanor assault in two high-density precincts, in which both parties were present at the scene
- Random assignment to arrest, mediation, or separation
- But... treatment assigned was not treatment delivered
- Fidelity vis-à-vis arrest, but many subjects (25%) assigned to mediation/separation were arrested
- Upgrading was more likely when the suspect was rude, the suspect assaulted an officer, weapons were involved, the victim persistently demanded arrest, and the incident violated a restraining order
Angrist (2006), J.E.C.
- Estimates of the effect of arrest (vs. mediate or separate) on D.V. recidivism (Tables 2, 3)
- OLS: $b = -.070$ (s.e. .038)
- ITT: $b = -.108$ (s.e. .041)
- 2SLS: $b = -.140$ (s.e. .053)
- Deterrent effect of arrest is twice as large in 2SLS as in OLS
- In this context, the 2SLS estimate is known as a local average treatment effect (I'll come back to this)
Sexton and Hebel (1984), J.A.M.A.
- Maternal smoking and birth weight
- Sexton and Hebel (1984)
- Sample of pregnant women who were confirmed smokers, recruited from prenatal care registrants
- At least 10 cigarettes per day and not past the 18th week of pregnancy
- Random assignment of staff assistance in a smoking cessation program
- Personal visits, telephone, and mail contacts
- But... some smokers in the treatment group did not quit and some smokers in the control group did quit
Sexton and Hebel (1984), J.A.M.A.
(1) First-stage model: mean cigarettes smoked
- Treatment: 6.4; Control: 12.8
- First-stage effect: $b_{FS} = 6.4 - 12.8 = -6.4$
(2) Reduced-form model: mean birth weight
- Treatment: 3,278g; Control: 3,186g
- Reduced-form effect: $b_{RF} = 3{,}278 - 3{,}186 = 92$g
(3) Structural model: effect of smoking frequency on mean birth weight
- $b_{IV} = 92 / (-6.4) = -14.4$g
- Each cigarette reduces birth weight by 14.4 grams
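The three steps above are just the Wald estimator applied to group means. A small sketch that reproduces the arithmetic from the means reported on the slide (no new data; variable names are illustrative):

```python
# Group means reported on the slide (Sexton and Hebel 1984)
cig_treat, cig_control = 6.4, 12.8         # mean cigarettes smoked per day
bw_treat, bw_control = 3278.0, 3186.0      # mean birth weight (grams)

b_fs = cig_treat - cig_control             # first stage: change in cigarettes
b_rf = bw_treat - bw_control               # reduced form: change in grams
b_iv = b_rf / b_fs                         # Wald/IV: grams per additional cigarette
print(round(b_fs, 1), round(b_rf), round(b_iv, 1))   # -6.4 92 -14.4
```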
Sexton and Hebel (1984), J.A.M.A.
- As an interesting aside, it's also possible to estimate the effect of continuing smoking (vs. quitting) from the data
- First stage: $b_{FS} = -0.23$ (57% vs. 80% smokers)
- Reduced form: $b_{RF} = 92$g
- Structural: $b_{IV} = 92 / (-0.23) = -400$g
- Women who kept smoking by the 8th month of pregnancy bore children who were 400 grams lighter, on average
Permutt and Hebel (1989), Biometrics
- Estimates of the effect of smoking frequency (in 8th month) on birth weight
- OLS: $b = -2$g (s.e. not reported)
- 2SLS: $b = -14$g (s.e. 7g)
- Here as well, 2SLS yields the local average treatment effect of smoking on birth weight
Instrumental Variables and Local Average Treatment Effects
- Definition of a L.A.T.E.
- "The average treatment effect for individuals who can be induced to change treatment status by a change in the instrument"
- Imbens and Angrist (1994, p. 470)
- The average causal effect of X on Y for compliers, as opposed to always-takers or never-takers
- Not a particularly well-defined (sub)population
- L.A.T.E. is instrument-dependent, in contrast to the population A.T.E.
L.A.T.E. in the Previous Two Examples
- In the D.V. study...
- For men who were arrested as per the experimental protocol, arrest resulted in a mean 14-point decline in the probability of recidivism compared to non-arrest interventions
- In the maternal smoking study...
- For women who reduced their smoking frequency because they were assigned to the intervention, each one-cigarette reduction resulted in a 14-gram increase in birth weight (from a mean of 11 cigarettes)