Instrumental Variables Estimation (with Examples from Criminology)

Learn more at: http://cega.berkeley.edu
Transcript and Presenter's Notes


1
Instrumental Variables Estimation (with Examples
from Criminology)
  • Robert Apel, Ph.D.
  • School of Criminal Justice
  • University at Albany

Center for Social and Demographic Analysis, University at Albany, May 5-7, 2009
2
Vital Statistics
  • Ph.D., Criminology and Criminal Justice, 2004
  • University of Maryland
  • Coursework in Department of Economics
  • Dissertation used instrumental variables
  • State child labor laws as instrumental variables
    for the causal effect of youth employment on
    antisocial behavior

3
Topics That Will Be Covered in this Workshop
  • Why use IV?
  • Discussion of endogeneity bias
  • Statistical motivation for IV
  • What is an IV?
  • Identification issues
  • Statistical properties of IV estimators
  • How is an IV model estimated?
  • Software and data examples
  • Diagnostics: IV relevance, IV exogeneity, Hausman test

4
Review of the Linear Model
  • Population model: $Y = \alpha + \beta X + \varepsilon$
  • Assume that the true slope is positive, so $\beta > 0$
  • Sample model: $Y = a + bX + e$
  • Least squares (LS) estimator of $\beta$
  • $b_{LS} = (X'X)^{-1}X'Y = \mathrm{Cov}(X,Y) / \mathrm{Var}(X)$
  • Under what conditions can we speak of $b_{LS}$ as a
    causal estimate of the effect of X on Y?

5
Review of the Linear Model
  • Key assumption of the linear model
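  • $E(X'\varepsilon) = \mathrm{Cov}(X,\varepsilon) = E(\varepsilon \mid X) = 0$
  • Exogeneity assumption: X is uncorrelated with
    the unobserved determinants of Y
  • Important statistical property of the LS
    estimator under exogeneity
  • $E(b_{LS}) = \beta + \mathrm{Cov}(X,\varepsilon) / \mathrm{Var}(X)$
  • $\mathrm{plim}(b_{LS}) = \beta + \mathrm{Cov}(X,\varepsilon) / \mathrm{Var}(X)$

The second terms equal 0, so $b_{LS}$ is unbiased and consistent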
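
The two expressions for the LS slope, and its behavior when X really is exogenous, can be checked on simulated data. This is a minimal sketch assuming NumPy; the sample size, seed, and "true" values of the intercept and slope are illustrative assumptions, not values from the workshop.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
alpha, beta = 1.0, 0.5            # assumed population intercept and slope

x = rng.normal(size=n)
eps = rng.normal(size=n)          # drawn independently of x, so Cov(x, eps) is about 0
y = alpha + beta * x + eps

# Slope as Cov(X, Y) / Var(X)
b_cov = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# Slope from the matrix formula (X'X)^{-1} X'Y, with an intercept column
X = np.column_stack([np.ones(n), x])
b_mat = np.linalg.solve(X.T @ X, X.T @ y)[1]

print(round(b_cov, 3), round(b_mat, 3))
```

Both numbers agree with each other and land close to the assumed slope, which is what exogeneity buys.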
6
Endogeneity and the Evaluation Problem
  • When is the exogeneity assumption violated?
  • Measurement error → Attenuation bias
  • Instantaneous causation → Simultaneity bias
  • Omitted variables → Selection bias
  • Selection bias is the problem in observational
    research that undermines causal inference
  • Measurement error and instantaneous causation can
    be posed as problems of omitted variables

7
When Is the Exogeneity Assumption Violated?
  • (1) Measurement error in X ($u$) that is correlated
    with M.E. in Y ($v$) or with the model error ($\varepsilon$)
  • Classical M.E. leads to attenuation, $0 < E(b_{LS}) < \beta$,
    but non-random M.E. (or correlation between
    M.E. and X, Y, V, and/or $\varepsilon$) introduces unknown
    biases

And, if there are multiple Xs, bias contaminates the whole model, not just the
coefficient on the X measured with error (a.k.a. smearing)
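
Attenuation under classical measurement error is easy to see in a simulation. The sketch below assumes NumPy and made-up parameter values; the error `u` is drawn independently of everything else, which is the "classical" case only. With equal variances for the true X and the error, the slope should shrink to about half its true size.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000
beta = 0.5                              # assumed true slope

x_true = rng.normal(size=n)
u = rng.normal(size=n)                  # classical M.E.: independent of x_true and the model error
x_obs = x_true + u                      # what the analyst actually observes
y = 1.0 + beta * x_true + rng.normal(size=n)

def slope(x, y):
    """Bivariate LS slope, Cov(x, y) / Var(x)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

b_true = slope(x_true, y)               # close to beta
b_obs = slope(x_obs, y)                 # shrunk toward zero
reliability = np.var(x_true, ddof=1) / np.var(x_obs, ddof=1)

print(round(b_true, 3), round(b_obs, 3), round(beta * reliability, 3))
```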
8
When Is the Exogeneity Assumption Violated?
  • (2) Instantaneous causation of Y on X
  • Direction of the bias depends on the sign of
    the feedback effect, Y → X
  • If positive, $E(b_{LS}) > \beta$, so we overestimate the true
    effect
  • If negative, $E(b_{LS}) < \beta$, so we underestimate the true
    effect, and in severe cases the sign can even flip
    so that $E(b_{LS}) < 0$ even though $\beta > 0$

This non-recursivity complicates the relationship between price and quantity in
economics
9
When Is the Exogeneity Assumption Violated?
  • (3) Omitted variable (W) that is correlated with
    both X and Y
  • Classic problem of omitted variables bias
  • Coefficient on X will absorb the indirect path
    through W, whose sign depends on Cov(X,W) and
    Cov(W,Y)

Things are more complicated in applied settings because there are bound to be
many Ws, not to mention that the smearing problem applies in this context also
10
Example 1: Police Hiring
  • Measurement error
  • Mobilization of sworn officers (M.E. in X) as
    well as differential victim reporting or crime
    recording (M.E. in Y) may be correlated with
    police size
  • Instantaneous causation
  • More police might be hired during a crime wave
  • Omitted variables
  • Large departments may differ in fundamental ways
    difficult to measure (e.g., urban, heterogeneous)

11
Example 2: Sanction Perceptions
  • Measurement error
  • Measures of perceived sanction risk are probably
    noisy (M.E. in X), resulting in attenuation at
    best
  • Instantaneous causation
  • Perceptions are sensitive to the success/failure
    of criminal behavior, so feedback is negative
  • Omitted variables
  • Perceived risk probably correlated with
    unobserved determinants of crime (e.g.,
    intelligence)

12
Example 3: Delinquent Peers
  • Measurement error
  • Highly delinquent youth probably overestimate the
    delinquency of their peers (M.E. in X), and
    likely underestimate their own delinquency (M.E.
    in Y)
  • Instantaneous causation
  • If there is influence/imitation, then it is
    bidirectional
  • Omitted variables
  • High-risk youth probably select themselves into
    delinquent peer groups (birds of a feather)

13
Regression Estimation Ignoring Omitted Variables
  • Suppose we estimate the treatment effect model
  • $Y = \alpha + \beta X + \varepsilon$
  • Let's assume without loss of generality that X is
    a binary treatment (= 1 if treated; = 0 if
    untreated)
  • Least squares estimator
  • $b_{LS} = \mathrm{Cov}(X,Y) / \mathrm{Var}(X) = E(Y \mid X = 1) - E(Y \mid X = 0)$
  • Simply the difference in means between treated
    units (X = 1) and untreated units (X = 0)

14
Regression Estimation Ignoring Omitted Variables
  • But suppose the population treatment effect model
    is instead
  • $Y = \alpha + \beta X + (\delta W + \eta)$
  • Now the residual conveys information about W
  • Consider a plausible example
  • Y = crime, X = marriage, W = marriageability
  • Marriageability can be broadly construed to
    encompass earnings potential, desire for
    children, willingness to compromise,
    faithfulness, verbal communication skills,...
  • Including signals that individuals emit about
    these qualities

15
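Regression Estimation Ignoring Omitted Variables
  • What does LS estimate when W is omitted?
  • $b_{LS} = C(X,Y)/V(X) = \beta + \delta \times C(X,W)/V(X)$,
    where $\delta$ is the effect of W on Y (loosely, $C(W,Y)/V(W)$)
  • $= \beta + \delta \times [E(W \mid X = 1) - E(W \mid X = 0)]$
  • Marriage effect on crime will be overestimated
  • IMPORTANT: Even if $\beta = 0$, $b_{LS} < 0$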
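
The omitted-variable result can be reproduced numerically. This is a sketch under stated assumptions: NumPy, simulated data, a true marriage effect of zero, and a "marriageability" variable `w` that both raises the chance of marriage and lowers crime. None of the numbers come from the workshop.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
beta, delta = 0.0, -1.0                  # assumed: no true effect of X; W lowers Y

w = rng.normal(size=n)                   # unobserved marriageability
x = (w + rng.normal(size=n) > 0).astype(float)   # higher W makes marriage more likely
y = 2.0 + beta * x + delta * w + rng.normal(size=n)

# LS with a binary X and no controls is the difference in group means
b_ls = y[x == 1].mean() - y[x == 0].mean()

# Imbalance in W between the married and the unmarried
imbalance = w[x == 1].mean() - w[x == 0].mean()
predicted = beta + delta * imbalance     # what the bias formula says LS should return

print(round(b_ls, 3), round(predicted, 3))
```

The estimated "marriage effect" is clearly negative even though the assumed true effect is zero, and it matches the bias formula up to sampling noise.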

16
Regression Estimation Ignoring Omitted Variables
  • So...
  • $b_{LS} = \beta + \delta \times [E(W \mid X = 1) - E(W \mid X = 0)]$
  • Estimate of $\beta$ is unbiased if and only if
  • 1. Marriageability is uncorrelated with crime
  • $\delta = 0$
  • or...
  • 2. Marriageability is balanced (i.e.,
    equivalent) between married and unmarried
    subjects
  • $E(W \mid X = 1) = E(W \mid X = 0)$

17
Omitted Variables in Criminological Research
  • What variables of interest to criminologists are
    surely endogenous?
  • Micro: Employment, education, marriage, military
    service, fertility, conviction, family
    structure, ...
  • Macro: Poverty, unemployment rate, collective
    efficacy, immigrant concentration, ...
  • Basically, EVERYTHING!
  • (I'm sorry to be the one to break it to you)

18
Traditional Strategies to Deal with Omitted
Variables
  • Randomization (physical control)
  • Achieves balance (in expectation) on any and all
    potential Ws
  • Control variables are technically unnecessary
  • Covariate adjustment (statistical control)
  • Control for potential Ws in a regression model
  • But...we have no idea how many Ws there are, so
    model misspecification is still a real problem
    here

19
Quasi-Experimental Strategies to Deal with
Omitted Variables
  • Difference in differences (fixed-effects model)
  • Requires panel data
  • Propensity score matching
  • Requires a lot of measured background variables
  • Similar to covariate adjustment, but only the
    treated and untreated cases which are on
    support are utilized
  • Instrumental variables estimation
  • Requires an exclusion restriction

20
Instrumental Variables Estimation Is a Viable
Approach
  • An instrumental variable for X is one solution
    to the problem of omitted variables bias
  • Requirements for Z to be a valid instrument for X
  • Relevant: Correlated with X
  • Exogenous: Not correlated with Y except through its
    correlation with X

21
Important Point about Instrumental Variables
Models
  • I often hear... "A good instrument should not be
    correlated with the dependent variable"
  • WRONG!!!
  • Z has to be correlated with Y, otherwise it is
    useless as an instrument
  • It can only be correlated with Y through X
  • A good instrument must not be correlated with the
    unobserved determinants of Y

22
Important Point about Instrumental Variables
Models
  • Not all of the available variation in X is used
  • Only that portion of X which is explained by Z
    is used to explain Y

X = Endogenous variable; Y = Response variable; Z = Instrumental variable
23
Important Point about Instrumental Variables
Models
Best-case scenario: A lot of X is explained by Z, and most of the overlap
between X and Y is accounted for
Realistic scenario: Very little of X is explained by Z, or what is explained
does not overlap much with Y
24
Important Point about Instrumental Variables
Models
  • The IV estimator is BIASED
  • In other words, $E(b_{IV}) \neq \beta$ (finite-sample bias)
  • The appeal of IV derives from its consistency
  • Consistency is a way of saying that $E(b) \to \beta$ as
    $N \to \infty$
  • So... IV studies often have very large samples
  • But with endogeneity, $E(b_{LS}) \neq \beta$ and $\mathrm{plim}(b_{LS}) \neq \beta$
    anyway
  • Asymptotic behavior of IV
  • $\mathrm{plim}(b_{IV}) = \beta + \mathrm{Cov}(Z,\varepsilon) / \mathrm{Cov}(Z,X)$
  • If Z is truly exogenous, then $\mathrm{Cov}(Z,\varepsilon) = 0$
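
A large-sample simulation illustrates the contrast between LS and IV when X is endogenous. This is a minimal sketch assuming NumPy and invented parameter values: an unobserved confounder `w` enters both X and the structural error, while the instrument `z` is generated independently of that error.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
beta = 0.5                               # assumed true effect of X on Y

z = rng.normal(size=n)                   # instrument, independent of the error
w = rng.normal(size=n)                   # unobserved confounder
x = 0.8 * z + w + rng.normal(size=n)     # X depends on Z and on W
eps = w + rng.normal(size=n)             # structural error contains W
y = 1.0 + beta * x + eps

def cov(a, b):
    return np.cov(a, b)[0, 1]

b_ls = cov(x, y) / cov(x, x)             # inconsistent: Cov(X, eps) is not 0
b_iv = cov(z, y) / cov(z, x)             # consistent: Cov(Z, eps) is about 0

print(round(b_ls, 3), round(b_iv, 3))
```

In this setup LS settles well above the assumed slope while IV sits close to it.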

25
Instrumental Variables Terminology
  • Three different models to be familiar with
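  • First stage: $X = \alpha_0 + \alpha_1 Z + \nu$
  • Structural model: $Y = \beta_0 + \beta_1 X + \varepsilon$
  • Reduced form: $Y = \delta_0 + \delta_1 Z + \xi$
  • An interesting equality
  • $\delta_1 = \alpha_1 \times \beta_1$
  • so
  • $\beta_1 = \delta_1 / \alpha_1$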
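
The equality follows from substituting the first stage into the structural model. A short derivation, writing the first-stage error as $\nu$ (a label assumed here for illustration):

```latex
\begin{align*}
Y &= \beta_0 + \beta_1 X + \varepsilon \\
  &= \beta_0 + \beta_1(\alpha_0 + \alpha_1 Z + \nu) + \varepsilon \\
  &= \underbrace{(\beta_0 + \beta_1\alpha_0)}_{\delta_0}
     + \underbrace{\alpha_1\beta_1}_{\delta_1}\, Z
     + (\beta_1\nu + \varepsilon)
\end{align*}
```

The coefficient on Z in the reduced form is therefore the product of the first-stage and structural slopes, and dividing it by the first-stage slope recovers the structural slope, provided the first-stage slope is not zero.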

26
Different Types of Instrumental Variables
Estimators
  • Wald estimator for binary instrument
  • $b_{Wald} = [E(Y \mid Z = 1) - E(Y \mid Z = 0)] / [E(X \mid Z = 1) - E(X \mid Z = 0)]$
  • Difference in response / Difference in treatment
  • Instrumental variables (IV) estimator
  • $b_{IV} = (Z'X)^{-1}Z'Y = \mathrm{Cov}(Z,Y) / \mathrm{Cov}(Z,X)$
  • Shows that $b_{IV}$ can be recovered from two samples
  • Two-stage least squares (2SLS) estimator
  • $b_{2SLS} = (\hat{X}'\hat{X})^{-1}\hat{X}'Y = \mathrm{Cov}(\hat{X},Y) / \mathrm{Var}(\hat{X})$
  • $\hat{X}$ represents the fitted value from the first-stage
    model

27
Different Types of Instrumental Variables
Estimators
  • Single binary instrument and no control
    variables...
  • $b_{Wald} = b_{IV} = b_{2SLS}$
  • Single instrument (binary or continuous) with or
    without control variables...
  • $b_{IV} = b_{2SLS}$
  • Multiple instruments (binary or continuous) with
    or without control variables...
  • $b_{2SLS}$
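
With a single binary instrument and no controls, the three estimators can be computed side by side to confirm they coincide. The sketch assumes NumPy and simulated data; the coefficients used to generate the data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000
z = rng.integers(0, 2, size=n).astype(float)    # binary instrument
w = rng.normal(size=n)                          # unobserved confounder
x = 1.0 + 0.7 * z + w + rng.normal(size=n)
y = 2.0 + 0.5 * x + w + rng.normal(size=n)

# Wald: difference in response over difference in treatment
b_wald = (y[z == 1].mean() - y[z == 0].mean()) / (x[z == 1].mean() - x[z == 0].mean())

# IV: Cov(Z, Y) / Cov(Z, X)
b_iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

# 2SLS: regress X on Z, then use the fitted values in place of X
Z = np.column_stack([np.ones(n), z])
x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
b_2sls = np.cov(x_hat, y)[0, 1] / np.var(x_hat, ddof=1)

print(b_wald, b_iv, b_2sls)
```

The three printed values are identical up to floating-point rounding.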

28
More on the Method of Two-Stage Least Squares
(2SLS)
  • Step 1: $X = a_0 + a_1 Z_1 + a_2 Z_2 + \cdots + a_k Z_k + u$
  • Obtain fitted values ($\hat{X}$) from the first-stage
    model
  • Step 2: $Y = b_0 + b_1 \hat{X} + e$
  • Substitute the fitted $\hat{X}$ in place of the original
    X
  • Note: If done manually in two stages, the
    standard errors are based on the wrong residual
  • $e = Y - b_0 - b_1 \hat{X}$ when it should be $e = Y -
    b_0 - b_1 X$
  • Best to just let the software do it for you
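
The standard-error problem with manual two-step estimation can be made concrete. The sketch below assumes NumPy, simulated data with two instruments, and the conventional homoskedastic variance formula; it computes the slope's standard error once with the residual a naive second-stage regression would use and once with the residual built from the original X.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20_000
z1, z2 = rng.normal(size=n), rng.normal(size=n)
w = rng.normal(size=n)                            # unobserved confounder
x = 0.5 * z1 + 0.5 * z2 + w + rng.normal(size=n)
y = 1.0 + 0.5 * x + w + rng.normal(size=n)

# Step 1: fitted values from the first stage
Z = np.column_stack([np.ones(n), z1, z2])
x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]

# Step 2: regress Y on the fitted values
Xh = np.column_stack([np.ones(n), x_hat])
b = np.linalg.lstsq(Xh, y, rcond=None)[0]         # [b0, b1]

resid_wrong = y - b[0] - b[1] * x_hat             # what a manual second stage reports
resid_right = y - b[0] - b[1] * x                 # uses the original X

XtX_inv = np.linalg.inv(Xh.T @ Xh)

def slope_se(resid):
    s2 = resid @ resid / (n - 2)                  # residual variance
    return np.sqrt(s2 * XtX_inv[1, 1])

se_wrong, se_right = slope_se(resid_wrong), slope_se(resid_right)
print(round(b[1], 3), round(se_wrong, 4), round(se_right, 4))
```

The point estimate is the same either way; only the standard error changes. In this particular setup the naive version comes out too large, but the direction is not guaranteed in general.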

29
Including Control Variables in an IV/2SLS Model
  • Control variables (Ws) should be entered into
    the model at both stages
  • First stage: $X = a_0 + a_1 Z + a_2 W + u$
  • Second stage: $Y = b_0 + b_1 \hat{X} + b_2 W + e$
  • Control variables are considered instruments,
    they are just not excluded instruments
  • They serve as their own instrument
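
In matrix form, "serving as their own instrument" means the control appears in both the instrument set and the regressor set, so projecting it onto the instruments returns it unchanged. A minimal sketch, assuming NumPy and simulated data with one excluded instrument `z`, one observed control `w`, and an unobserved confounder `u` (all names and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50_000
w = rng.normal(size=n)                       # observed control variable
u = rng.normal(size=n)                       # unobserved confounder
z = 0.5 * w + rng.normal(size=n)             # instrument, correlated with the control
x = 0.8 * z + 0.5 * w + u + rng.normal(size=n)
y = 1.0 + 0.5 * x + 0.7 * w + u + rng.normal(size=n)

ones = np.ones(n)
instruments = np.column_stack([ones, z, w])  # excluded instrument plus the control
regressors = np.column_stack([ones, x, w])   # endogenous X plus the same control

# First stage for every regressor: project each column onto the instruments
fitted = instruments @ np.linalg.lstsq(instruments, regressors, rcond=None)[0]

# Second stage: regress Y on the fitted regressors
b_2sls = np.linalg.lstsq(fitted, y, rcond=None)[0]   # [b0, b1, b2]

print(np.allclose(fitted[:, 2], w), np.round(b_2sls, 3))
```

The fitted version of the control column equals the control itself, while only the endogenous column is actually replaced by a prediction.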

30
Functional Form Considerations with IV/2SLS
  • Binary endogenous regressor (X)
  • Consistency of second-stage estimates does not
    hinge on getting the first-stage functional form
    correct
  • Binary response variable (Y)
  • IV probit (or logit) is feasible but is
    technically unnecessary
  • In both cases, linear model is tractable, easily
    interpreted, and consistent
  • Although variance adjustment is well advised

31
Functional Form Considerations with IV/2SLS
  • Quadratic second stage with a continuous
    endogenous regressor
  • Entering first-stage fitted values and their
    square into second-stage model leads to
    inconsistency
  • The square of a linear projection is not
    equivalent to a linear projection on a quadratic
  • Squares and cross-products of IVs should be
    treated as additional instruments
  • Kelejian (1971)
  • Linear and squared Xs are treated as two
    different endogenous regressors

32
Technical Conditions Required for Model
Identification
  • Order condition: At least as many IVs as
    endogenous Xs
  • Just-identified model: # IVs = # Xs
  • Overidentified model: # IVs > # Xs
  • Rank condition: At least one IV must be
    significant in the first-stage model
  • Number of linearly independent columns in a
    matrix
  • $E(X \mid Z, W)$ cannot be perfectly correlated with
    $E(X \mid W)$

33
Statistical Inference with IV
  • Variance estimation
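  • $\sigma^2_{\beta_{LS}} = \sigma^2_{\varepsilon} / SST_X$
  • $\sigma^2_{\beta_{IV}} = \sigma^2_{\varepsilon} / (SST_X \times R^2_{X,Z})$
  • where
  • $\varepsilon = Y - \beta_0 - \beta_1 X$
  • NOTICE: Because $R^2_{X,Z} < 1$ → $\sigma_{b_{IV}} > \sigma_{b_{LS}}$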
  • IV standard errors tend to be large, especially
    when R2X,Z is very small, which can lead to type
    II errors
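
The variance formulas imply that the IV standard error is the LS standard error inflated by one over the square root of the first-stage R-squared. The sketch below assumes NumPy, simulated data with a deliberately weak first stage, and a common error variance of 1 plugged into both formulas so that only the R-squared term differs.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 10_000
z = rng.normal(size=n)
x = 0.2 * z + rng.normal(size=n)            # weak first stage (assumed)

# First-stage fitted values and R-squared of X on Z
Z = np.column_stack([np.ones(n), z])
x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
sst_x = np.sum((x - x.mean()) ** 2)
r2_xz = np.sum((x_hat - x.mean()) ** 2) / sst_x

sigma2 = 1.0                                # assumed error variance, same in both formulas
se_ls = np.sqrt(sigma2 / sst_x)
se_iv = np.sqrt(sigma2 / (sst_x * r2_xz))
inflation = se_iv / se_ls                   # equals 1 / sqrt(R^2)

print(round(r2_xz, 4), round(inflation, 2))
```

With a first-stage R-squared of only a few percent, the IV standard error is several times the LS one, which is why weak instruments cost so much power.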

34
Instrumental Variables and Randomized Experiments
  • Imperfect compliance in randomized trials
  • Some individuals assigned to treatment group will
    not receive Tx, and some assigned to control
    group will receive Tx
  • Assignment error; subject refusal; investigator
    discretion
  • Some individuals who receive Tx will not change
    their behavior, and some who do not receive Tx
    will change their behavior
  • A problem in randomized job training studies and
    other social experiments (e.g., housing vouchers)

35
Instrumental Variables and Randomized Experiments
  • Two different measures of treatment (X)
  • Treatment assigned: Exogenous
  • Intention-to-treat (ITT) analysis
  • Reduced-form model: $Y = \delta_0 + \delta_1 Z + \xi$
  • Often leads to underestimation of treatment
    effect
  • Treatment delivered: Endogenous
  • Individuals who do not comply probably differ in
    ways that can undermine the study
  • Self-selection → bias and inconsistency
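
A simulated experiment with imperfect compliance shows how the three analyses differ. This is a sketch assuming NumPy, a constant treatment effect, and a made-up compliance rule in which an unobserved risk factor `w` determines who gets treated regardless of assignment; none of it is taken from the studies discussed in the workshop.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 200_000
beta = -1.0                                  # assumed constant treatment effect

z = rng.integers(0, 2, size=n)               # treatment assigned (randomized)
w = rng.normal(size=n)                       # unobserved risk factor
always = w > 1.0                             # treated regardless of assignment
never = w < -1.5                             # untreated regardless of assignment
x = np.where(always, 1, np.where(never, 0, z))   # treatment delivered
y = 1.0 + beta * x + w + rng.normal(size=n)

def diff_in_means(outcome, group):
    return outcome[group == 1].mean() - outcome[group == 0].mean()

b_as_treated = diff_in_means(y, x)           # compares by treatment delivered
b_itt = diff_in_means(y, z)                  # reduced form: compares by assignment
first_stage = diff_in_means(x, z)            # effect of assignment on delivery
b_iv = b_itt / first_stage

print(round(b_as_treated, 3), round(b_itt, 3), round(b_iv, 3))
```

The as-treated comparison is badly biased by self-selection, the ITT estimate is attenuated, and the ratio of reduced form to first stage lands near the assumed effect.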

36
Angrist (2006), J.E.C.
  • Minneapolis D.V. experiment
  • Sherman and Berk (1984)
  • Cases of male-on-female misdemeanor assault in
    two high-density precincts, in which both parties
    present at scene
  • Random assignment of arrest-mediation-separation
  • But...treatment assigned was not treatment
    delivered
  • Fidelity vis-à-vis arrest, but many subjects
    (25) assigned to mediation/separation were
    arrested
  • Upgrading was more likely when suspect was
    rude, suspect assaulted officer, weapons were
    involved, victim persistently demanded arrest,
    and incident violated restraining order

37
Angrist (2006), J.E.C.
38
Angrist (2006), J.E.C.
  • Estimates of effect of arrest (vs. mediate or
    separate) on D.V. recidivism (Tables 2, 3)
  • OLS: b = -.070 (s.e. = .038)
  • ITT: b = -.108 (s.e. = .041)
  • 2SLS: b = -.140 (s.e. = .053)
  • Deterrent effect of arrest is twice as large in
    2SLS as in OLS
  • In this context, the 2SLS estimate is known as a local
    average treatment effect (I'll come back to this)

39
Sexton and Hebel (1984), J.A.M.A.
  • Maternal smoking and birth weight
  • Sexton and Hebel (1984)
  • Sample of pregnant women who were confirmed
    smokers, recruited from prenatal care registrants
  • At least 10 cigarettes per day and not past 18th
    week
  • Random assignment of staff assistance in a
    smoking cessation program
  • Personal visits; telephone and mail contacts
  • But...some smokers in treatment group did not
    quit and some smokers in control group did quit

40
Sexton and Hebel (1984), J.A.M.A.
41
Sexton and Hebel (1984), J.A.M.A.
  • (1) First-stage model: Mean cigarettes smoked
  • Treatment = 6.4; Control = 12.8
  • First-stage effect: $b_{FS} = -6.4$
  • (2) Reduced-form model: Mean birth weight
  • Treatment = 3,278g; Control = 3,186g
  • Reduced-form effect: $b_{RF} = 92$
  • (3) Structural model: Effect of smoking frequency
    on mean birth weight
  • $b_{IV} = 92 / -6.4 = -14.4$g
  • Each cigarette reduces birth weight by 14.4 grams
42
Sexton and Hebel (1984), J.A.M.A.
  • As an interesting aside, it's also possible to
    estimate the effect of continuing smoking (vs.
    quitting) from the data
  • First stage: $b_{FS} = -0.23$ (57% vs. 80% smokers)
  • Reduced form: $b_{RF} = 92$g
  • Structural: $b_{IV} = 92 / -0.23 = -400$g
  • Women who were still smoking in the 8th month of
    pregnancy bore children who were 400 grams
    lighter, on average
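
Both Wald calculations from the Sexton and Hebel example can be reproduced from the reported group means. The helper name `wald` is hypothetical, and reading "57 vs. 80 smokers" as 57% and 80% is an assumption, chosen because it matches the 0.23 first-stage difference.

```python
def wald(y_treat, y_ctrl, x_treat, x_ctrl):
    """Reduced-form difference divided by first-stage difference."""
    return (y_treat - y_ctrl) / (x_treat - x_ctrl)

# Effect of one additional cigarette per day on birth weight (grams)
per_cigarette = wald(3278, 3186, 6.4, 12.8)

# Effect of continuing to smoke vs. quitting (grams), using shares still smoking
continuing = wald(3278, 3186, 0.57, 0.80)

print(round(per_cigarette, 1), round(continuing))
```

The results are roughly 14.4 grams lost per cigarette and about 400 grams lost for continuing smokers, with the negative sign coming from the fact that the intervention lowered smoking.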

43
Permutt and Hebel (1989), Biometrics
  • Estimates of the effect of smoking frequency (in
    8th month) on birth weight
  • OLS: b = -2g (s.e. not reported)
  • 2SLS: b = -14g (s.e. = 7g)
  • Here as well, 2SLS yields the local average
    treatment effect of smoking on birth weight

44
Instrumental Variables and Local Average
Treatment Effects
  • Definition of a L.A.T.E.
  • The average treatment effect for individuals who
    can be induced to change treatment status by a
    change in the instrument
  • Imbens and Angrist (1994, p. 470)
  • The average causal effect of X on Y for
    "compliers," as opposed to "always takers" or
    "never takers"
  • Not a particularly well-defined (sub)population
  • L.A.T.E. is instrument-dependent, in contrast to
    the population A.T.E.
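
When treatment effects differ across people, the Wald/IV estimate picks out the complier effect rather than the population average. The sketch below assumes NumPy and an invented population: 60% compliers, 20% always takers, 20% never takers, with effects of 1.0, 3.0, and 0.0 respectively.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 400_000
kind = rng.choice(3, size=n, p=[0.6, 0.2, 0.2])   # 0 = complier, 1 = always taker, 2 = never taker
effect = np.array([1.0, 3.0, 0.0])[kind]          # assumed heterogeneous treatment effects

z = rng.integers(0, 2, size=n)                    # randomized instrument
x = np.where(kind == 0, z, np.where(kind == 1, 1, 0))   # only compliers follow Z
y = 2.0 + effect * x + rng.normal(size=n)

b_wald = (y[z == 1].mean() - y[z == 0].mean()) / (x[z == 1].mean() - x[z == 0].mean())
complier_effect = effect[kind == 0].mean()        # 1.0 by construction
population_ate = effect.mean()                    # about 1.2 in this setup

print(round(b_wald, 3), complier_effect, round(population_ate, 3))
```

The IV estimate tracks the complier effect, not the population A.T.E. A different instrument would induce a different set of compliers and could yield a different L.A.T.E.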

45
L.A.T.E. in the Previous Two Examples
  • In the D.V. study...
  • For men who were arrested as per the experimental
    protocol, arrest resulted in a mean 14-point
    decline in the probability of recidivism compared
    to non-arrest interventions
  • In the maternal smoking study...
  • For women who reduced their smoking frequency
    because they were assigned to the intervention,
    each one-cigarette reduction resulted in a
    14-gram increase in birth weight (from mean 11
    cigarettes)