Econometrics Course: Endogeneity

1
Econometrics Course: Endogeneity & Simultaneity

Mark W. Smith

2
Overview

Endogeneity
Sources
Responses
Omitted Variables
Measurement Error
Proxy Variables
Method of Instrumental Variables
Properties
Validity and strength of instruments

3
Definition of Endogeneity

Suppose we have a regression equation
$y = a + b_1 x_1 + b_2 x_2 + e$
The variable x1 is endogenous if it is correlated
with e.
Note that this is related to, but not identical
to, the heuristic definition that x1 is
determined within the model.

4
Sources of Endogeneity

1. Omitted variables
If the true model underlying the data is
$y = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + n$
but you estimate the model
$y = a + b_1 x_1 + b_2 x_2 + e$
then variable x1 will be endogenous if it is
correlated with x3. Why? Because $e = f(n, x_3)$;
here, $e = b_3 x_3 + n$.
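
The omitted-variable argument above can be checked with a small simulation. This is only a sketch under assumed values: the coefficients, the sample size, and the way x3 is made to correlate with x1 are all hypothetical choices, not taken from the presentation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Assumed data-generating process: x3 is correlated with x1 but not with x2.
x3 = rng.normal(size=n)
x1 = 0.8 * x3 + rng.normal(size=n)
x2 = rng.normal(size=n)
nu = rng.normal(size=n)                      # the error "n" of the true model
y = 1.0 + 2.0 * x1 + 3.0 * x2 + 1.5 * x3 + nu

def ols(y, *columns):
    """Return OLS coefficients, intercept first."""
    X = np.column_stack([np.ones(len(y)), *columns])
    return np.linalg.lstsq(X, y, rcond=None)[0]

full = ols(y, x1, x2, x3)     # true model
short = ols(y, x1, x2)        # x3 omitted, so e = b3*x3 + n

print("b1 with x3 included:", round(full[1], 3))
print("b1 with x3 omitted: ", round(short[1], 3))
```

With x3 included, the estimate of b1 should be close to the assumed value of 2. With x3 omitted, it drifts away from 2 because part of the effect of x3 is attributed to x1.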

5
Sources of Endogeneity

2. Measurement error
Suppose the true model underlying the data is
$y = a + b_1 x_1 + b_2 x_2^{*} + e$
but you estimate the model
$y = a + b_1 x_1 + b_2 x_2 + e$
where $x_2 = x_2^{*} + j$: the observed x2 equals
the true value x2* plus a measurement error j.

6
Sources of Endogeneity

2. Measurement error - continued
Variable x2 will be endogenous if j depends on
x2.
Example: Suppose that x2 measures hospital size
(no. of beds), and that the measurement error is
greater for larger hospitals. Then as x2 grows,
so does j. Thus the error term of the estimated
model is correlated with x2, causing endogeneity.

7
Sources of Endogeneity

2. Measurement error - continued
Rearranging the equation, we have
$y = a + b_1 x_1 + b_2 x_2^{*} + e$
$y = a + b_1 x_1 + b_2 (x_2 - j) + e$
$y = a + b_1 x_1 + b_2 x_2 + (e - b_2 j)$
If $j = f(x_2)$ then the error term is correlated with
x2, causing endogeneity.
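
A simulated version of the hospital-size example may make this concrete. It is a sketch under assumptions: the names `x2_true` and `x2_obs`, the bed counts, the coefficients, and the rule that the error spread is proportional to hospital size are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000

x1 = rng.normal(size=n)
x2_true = rng.uniform(50, 500, size=n)        # true number of beds (assumed range)
j = 0.2 * x2_true * rng.normal(size=n)        # error spread grows with hospital size
x2_obs = x2_true + j                          # what the analyst actually observes
e = rng.normal(scale=10.0, size=n)

b2 = 0.5
y = 1.0 + 2.0 * x1 + b2 * x2_true + e

# Estimate the model with the mismeasured regressor.
X = np.column_stack([np.ones(n), x1, x2_obs])
b_hat = np.linalg.lstsq(X, y, rcond=None)[0]

# The error term of the estimated model is (e - b2*j).
composite_error = e - b2 * j
corr = np.corrcoef(composite_error, x2_obs)[0, 1]

print("true b2:", b2, " estimated b2:", round(b_hat[2], 3))
print("corr(composite error, observed x2):", round(corr, 3))
```

The composite error is negatively correlated with the observed x2, and the estimated b2 is pulled toward zero.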

8
Sources of Endogeneity

3. Simultaneity
A system of simultaneous equations occurs when
two or more left-hand side variables are
functions of each other (there are other ways of
stating it, too):
$y_1 = a_1 + b_1 x_1 + g_1 y_2 + e_1$
$y_2 = a_2 + b_2 x_1 + g_2 y_1 + e_2$

9
Sources of Endogeneity

3. Simultaneity
With some algebra you can rewrite these two
equations in reduced form as a single equation
with an endogenous regressor.
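
The algebra can be sketched as follows. The notation is an assumption made here for clarity: the two equations are written with separate intercepts, coefficients, and errors (subscripts 1 and 2), and x1, e1, and e2 are taken to be mutually uncorrelated.

```latex
\begin{align*}
y_1 &= a_1 + b_1 x_1 + g_1 y_2 + e_1 \\
y_2 &= a_2 + b_2 x_1 + g_2 y_1 + e_2 \\
\intertext{Substituting the first equation into the second and solving
for $y_2$ (assuming $g_1 g_2 \neq 1$):}
y_2 &= \frac{(a_2 + g_2 a_1) + (b_2 + g_2 b_1)\, x_1 + g_2 e_1 + e_2}{1 - g_1 g_2} \\
\intertext{so that}
\operatorname{cov}(y_2, e_1) &= \frac{g_2 \operatorname{var}(e_1)}{1 - g_1 g_2}
\neq 0 \quad \text{whenever } g_2 \neq 0 .
\end{align*}
```

In other words, y2 is correlated with the error e1 of the first equation, which is exactly the definition of an endogenous regressor given earlier.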

10
Pretesting for Endogeneity

The most famous test is Hausman (1978). Many
others are described in Nakamura and Nakamura
(1998).
Idea: the method of instrumental variables (IV)
uses two-stage least squares (2SLS). If there is
no endogeneity, it is more efficient to use OLS.
If there is endogeneity, OLS is inconsistent and
so 2SLS is best.
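
One regression-based form of this idea (often called the Durbin-Wu-Hausman test) can be sketched as below. It is an illustration on simulated data rather than the procedure of any particular paper: the instrument z, the unobserved variable u, and all coefficients are assumptions. The first-stage residual v is added to the OLS equation, and a large t statistic on v is evidence of endogeneity.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000

# Assumed data-generating process: u is unobserved and drives both x2 and y.
x1 = rng.normal(size=n)
z = rng.normal(size=n)                        # hypothetical instrument
u = rng.normal(size=n)
x2 = 0.3 * x1 + z + u + rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 1.5 * x2 + 2.0 * u + rng.normal(size=n)

def ols(X, y):
    """OLS coefficients and conventional standard errors."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se

const = np.ones(n)

# First stage: regress x2 on the exogenous variable and the instrument.
Z = np.column_stack([const, x1, z])
gamma, _ = ols(Z, x2)
v = x2 - Z @ gamma

# Augmented regression: y on x1, x2 and the first-stage residual v.
beta, se = ols(np.column_stack([const, x1, x2, v]), y)
t_v = beta[3] / se[3]
print("t statistic on first-stage residual:", round(t_v, 1))
```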

11
Pretesting for Endogeneity

Problem: the tests all have low power,
particularly when 2SLS would cause a significant
loss of efficiency.
In practice, many people use a Hausman test, fail
to reject the null hypothesis of no endogeneity,
and then use OLS.
A more statistically reliable approach is to base
judgments of endogeneity on how the system under
study works.

12
Responses to Endogeneity

What if you are unsure whether a variable is
endogenous?
Approach 1: ignore it
Approach 2: use instrumental variables (IV) --
described later -- for every possibly endogenous
variable
Approach 3: subtract out the variable using
time-series (panel) data

13
Responses to Endogeneity

Approach 1: ignore it
-- Not advisable: true endogeneity causes
OLS to be inconsistent
Approach 2: use IV on every possibly endogenous
variable
-- Not advisable: it will cause a loss of
efficiency (and hence wider confidence intervals)
and may lead to bias.

14
Responses to Endogeneity

Approach 3: Difference it out
Suppose that the source of endogeneity is fixed
over time, such as a time-invariant measurement
error or omitted variable.
Further, suppose that we observe data in two time
periods.
A difference-in-difference (DD) model can be
used: subtract values at time 1 (before) from
values at time 2 (after) and the fixed endogenous
component will drop out.
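
A two-period simulation illustrates the differencing step. This is a minimal sketch under assumptions: q stands for a time-invariant unobserved variable, and the coefficients and sample size are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50_000

q = rng.normal(size=n)                  # unobserved, fixed over time
x_t1 = q + rng.normal(size=n)           # x is correlated with q in both periods
x_t2 = q + rng.normal(size=n)
y_t1 = 1.0 + 2.0 * x_t1 + q + rng.normal(size=n)
y_t2 = 1.0 + 2.0 * x_t2 + q + rng.normal(size=n)

def slope(x, y):
    """Slope from a simple regression of y on x with an intercept."""
    X = np.column_stack([np.ones(len(x)), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

# Pooled OLS ignores q and is biased; differencing removes q entirely.
pooled = slope(np.concatenate([x_t1, x_t2]), np.concatenate([y_t1, y_t2]))
differenced = slope(x_t2 - x_t1, y_t2 - y_t1)

print("pooled OLS slope: ", round(pooled, 3))
print("differenced slope:", round(differenced, 3))
```

The differenced regression recovers the assumed slope of 2 because q cancels when period 1 is subtracted from period 2. The pooled regression does not.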

15
Responses to Endogeneity

Approach 3: Difference it out -- continued
Limitations:
- DD models will not eliminate selection bias.
- DD models only eliminate fixed variables;
sometimes endogenous variables change values over
time.

16
Dealing with Omitted Variables
17
Dealing with Omitted Variables

The investigator should have a conceptual model
of the process under study. Guided by this
understanding, there are a few options for
dealing with omitted variables.
1. Find additional data so that every relevant
variable is included.
2. Ignore it
- Acceptable only if the omitted variable is
uncorrelated with all included variables;
otherwise the coefficient estimates will be
biased up or down.

18
Dealing with Omitted Variables

3. Find proxy variable
Suppose the following
y is the outcome
q is the omitted variable
z is the proxy for q
What properties should the proxy z have?

19
Dealing with Omitted Variables

a. Proxy z should be strongly correlated with q.
b. Proxy z must be redundant (= ignorable):
$E(y \mid x, q, z) = E(y \mid x, q)$
c. Omitted q must be uncorrelated with other
regressors conditional on z:
$\mathrm{corr}(q, x_j \mid z) = 0$ for each $x_j$

20
Dealing with Omitted Variables

The last two mean roughly that q and z provide
similar information about the outcome.
You don't observe q, so how can you prove these
conditions are met? Either argue it from theory
or test the assumption using other data.

21
Dealing with Measurement Error
22
Dealing with Measurement Error

1. Improve measurement
- DSS improved by refusing extreme outlier
values
- NPPD improved by requiring more complete data
2. Argue that the degree of error is small
- Use outside data for validation
3. Argue that error is uncorrelated with included
variables

23
Dealing with Proxy Variables
24
Dealing with Proxy Variables

1. What if proxy variable z is correlated with a
regressor x?
OLS is inconsistent, but one can hope and argue
that the inconsistency is less than if z is
omitted.

25
Dealing with Proxy Variables

2. Consider using a lagged dependent variable as
a proxy variable.
Example: If you believe that omitted variable $q_t$
strongly affects outcome $y_t$, then a lagged value
of y (such as $y_{t-2}$) is probably correlated with
$q_t$ as well.
Problem: $y_{t-2}$ may be correlated with other x's as
well, leading to inconsistency.

26
Dealing with Proxy Variables

3. Consider using multiple proxy variables for a
single omitted variable.
How? Simply put all proxy variables in the
equation.
Note: they all must meet the requirements for
proxies.

27
Dealing with Proxy Variables

4. What if omitted variable q interacts with a
regressor x?
$y = a + b_1 x + b_2 q + b_3 q x + e$
$\Rightarrow \; dy/dx = b_1 + b_3 q$
The marginal effect of x on y involves q, which is
unobserved.

28
Dealing with Proxy Variables

Demean z: take every value of z and subtract out
the grand (overall) average value. Call it zd.
$y = a + b_1 x + b_2 z_d + b_3 z_d x + e$
$\Rightarrow \; dy/dx = b_1 + b_3 z_d$
On average this equals $b_1$, because $E[z_d] = 0$.
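
The effect of demeaning can be seen numerically. The sketch below is an assumption-laden illustration: it treats the proxy z as if it entered the outcome equation directly, gives it a non-zero mean of 4, and uses made-up coefficients.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 50_000

x = rng.normal(size=n)
z = 4.0 + rng.normal(size=n)            # proxy with a non-zero mean (assumed)
y = 1.0 + 1.0 * x + 0.7 * z + 0.5 * z * x + rng.normal(size=n)

def coef(*cols):
    """OLS coefficients of y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(n), *cols])
    return np.linalg.lstsq(X, y, rcond=None)[0]

raw = coef(x, z, z * x)                 # interaction with the raw proxy
zd = z - z.mean()
demeaned = coef(x, zd, zd * x)          # interaction with the demeaned proxy

# Average marginal effect of x implied by the raw fit: b1 + b3 * mean(z).
avg_effect = raw[1] + raw[3] * z.mean()

print("coefficient on x, raw z:     ", round(raw[1], 3))
print("coefficient on x, demeaned z:", round(demeaned[1], 3))
print("average effect from raw fit: ", round(avg_effect, 3))
```

With the raw proxy, the coefficient on x is the effect of x when z equals zero. After demeaning, the coefficient on x is the effect of x at the average value of z, which is usually the more useful quantity.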

29
Instrumental Variables
30
Method of Instrumental Variables

Often used to deal with simultaneity.
More generally, IV applies whenever a regressor x
is correlated with the error term e.

31
IV Definition

Model: $y = a + b_1 x_1 + b_2 x_2 + e$
Suppose that x2 is endogenous to y. An
instrumental variable is one that
(a) is correlated with the endogenous variable
x2
(b) is uncorrelated with error term e
(c) does not enter the main equation (i.e.,
does not explain y directly)

32
Two-Stage Least Squares

Two-stage least squares (2SLS) approach
Stage 1:
Predict x2 as a function of all other
variables plus an IV (call it z):
$x_2 = a + g_1 x_1 + g_2 z + n$
Create predicted values of x2; call them x2p.

33
Two-Stage Least Squares

Two-stage least squares (2SLS) approach
Stage 2:
Predict y as a function of x2p and all other
variables (but not z):
$y = a + b_1 x_1 + b_2 x_{2p} + e$
Note: adjust the standard errors to account
for the fact that x2p is predicted.
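
The two stages can be sketched in a few lines of NumPy. This is a minimal illustration on simulated data, not a production implementation: the unobserved variable u, the coefficients, and the sample size are assumptions. The standard-error adjustment shown is the usual one of computing residuals with the actual x2 rather than the predicted x2p.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

x1 = rng.normal(size=n)
z = rng.normal(size=n)                       # instrument
u = rng.normal(size=n)                       # unobserved; makes x2 endogenous
x2 = 0.5 + 0.3 * x1 + 1.0 * z + u + rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 1.5 * x2 + 2.0 * u + rng.normal(size=n)

const = np.ones(n)

def fit(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Stage 1: predict x2 from x1 and the instrument z.
Z = np.column_stack([const, x1, z])
x2p = Z @ fit(Z, x2)

# Stage 2: regress y on x1 and the predicted x2p (z is excluded).
X_hat = np.column_stack([const, x1, x2p])
beta_2sls = fit(X_hat, y)

# Standard errors: residuals must use the actual x2, not x2p.
X = np.column_stack([const, x1, x2])
resid = y - X @ beta_2sls
sigma2 = resid @ resid / (n - X.shape[1])
se_2sls = np.sqrt(np.diag(sigma2 * np.linalg.inv(X_hat.T @ X_hat)))

beta_ols = fit(X, y)
print("OLS  b2:", round(beta_ols[2], 3))
print("2SLS b2:", round(beta_2sls[2], 3), "+/-", round(se_2sls[2], 3))
```

In this simulation OLS overstates b2 because x2 and the error share the component u, while 2SLS is close to the assumed value of 1.5.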

34
Two-Stage Residual Inclusion

2SLS is only consistent when the Stage 2 equation
is linear.
If Stage 2 is nonlinear, use the two-stage
residual inclusion (2SRI) method
- Stage 1: as in 2SLS, leading to predicted x2p
- Develop residuals: $v = x_2 - x_{2p}$

35
Two-Stage Residual Inclusion

- Stage 2:
Predict y as a function of x1, x2 (not x2p)
and the new residuals v:
$y = f(a + b_1 x_1 + b_2 x_2 + b_3 v) + e$
where f(.) is a nonlinear function.
Note that if Stage 2 is linear, then 2SRI
yields the same results as 2SLS.
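
The equivalence in the linear case can be verified numerically. The sketch below covers only that linear special case, on simulated data with assumed coefficients; a nonlinear f(.) would need a nonlinear estimator in Stage 2, which is not shown here.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5_000

x1 = rng.normal(size=n)
z = rng.normal(size=n)                       # instrument
u = rng.normal(size=n)                       # unobserved; makes x2 endogenous
x2 = 0.3 * x1 + z + u + rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 1.5 * x2 + 2.0 * u + rng.normal(size=n)

def coef(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

const = np.ones(n)

# Stage 1 (shared by both methods): predicted x2p and residuals v.
Z = np.column_stack([const, x1, z])
x2p = Z @ coef(Z, x2)
v = x2 - x2p

# 2SLS: replace x2 with x2p.  2SRI: keep x2 and add v as a regressor.
b_2sls = coef(np.column_stack([const, x1, x2p]), y)
b_2sri = coef(np.column_stack([const, x1, x2, v]), y)

print("2SLS (a, b1, b2):", np.round(b_2sls, 4))
print("2SRI (a, b1, b2):", np.round(b_2sri[:3], 4))
```

The intercept and the coefficients on x1 and x2 agree to numerical precision, because v is orthogonal to the intercept, x1, and x2p by construction.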

36
Multiple IVs

What if you have multiple endogenous variables?
1. The number of IVs must equal or exceed the
number of endogenous variables
2. Estimate a separate 1st-stage regression for
each endogenous variable
3. Every 1st-stage regression should contain all
IVs

37
IV Issues

Two issues plague the IV method:
1. No IV is available
2. A potential IV is found, but its quality is
uncertain

38
IV Issues

What if there is no IV?
State that no IV exists and forge ahead anyway,
arguing that any bias in OLS is likely to be
small.
Argue that the endogeneity is weak on theoretical
grounds.
Argue that external data indicate that the bias
from OLS is likely to be small.

39
IV Properties

What if you have an IV of unknown quality?
Two characteristics mark a good IV
1. Validity
2. Strength

40
IV Validity

Validity has several components
a. Non-zero correlation with x2
b. Uncorrelated with error term e
c. Uncorrelated with y except through x2
d. Monotonicity: as z increases, x2 increases

41
IV Validity

There are several ways to show validity of an IV:
- Non-zero correlation with the endogenous variable
can be shown directly.
- Robustness: do alternative IVs yield similar
results?
- Non-correlation with the outcome variable of the
2nd stage: this point must be argued from theory,
an understanding of how the system under study
works.

42
IV Validity

Warning: one cannot simply add a candidate IV to
the main model (i.e., the 2nd stage) to see
whether it is significant. The result is biased.
BUT
If there are multiple IVs, one can use a test of
over-identifying restrictions.

43
IV Validity

Overidentification: number of candidate IVs
exceeds number of endogenous variables.
Suppose that
(a) You have one endogenous variable and three
candidate IVs
(b) You know that one of the IVs is truly valid.
Use the known-valid IV in the 1st stage and put
the remaining two IVs in the 2nd stage.

44
IV Validity

Over-identification test, continued
If the two remaining IVs are jointly
insignificant in the 2nd stage, then this
supports their use as alternative IVs.
Problem: this only works if the IV(s) in the 1st
stage are truly valid, and you don't know that!

45
IV Validity

Over-identification test, continued
Partial solution: use Sargan's (1984) test,
which assumes only that one or more of your IVs
are valid; you don't have to specify which. This
method fails only if none of the IVs is valid.
In the end, you must argue for validity on
conceptual grounds at a minimum.
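
A common textbook form of the Sargan statistic can be sketched as follows: estimate 2SLS with all IVs, regress the 2SLS residuals on all exogenous variables and IVs, and compute n times the R-squared. Under the null that the IVs are valid, this is compared with a chi-square distribution whose degrees of freedom equal the number of IVs minus the number of endogenous variables. The simulated data, the function name `sargan_statistic`, and the choice of which IV is invalid are all assumptions for illustration.

```python
import numpy as np

def sargan_statistic(y, exog, endog, ivs):
    """n * R^2 from regressing 2SLS residuals on all exogenous variables and IVs."""
    Z = np.column_stack([exog, ivs])
    endog_hat = Z @ np.linalg.lstsq(Z, endog, rcond=None)[0]
    beta = np.linalg.lstsq(np.column_stack([exog, endog_hat]), y, rcond=None)[0]
    resid = y - np.column_stack([exog, endog]) @ beta   # uses actual endog
    fitted = Z @ np.linalg.lstsq(Z, resid, rcond=None)[0]
    r2 = 1.0 - np.sum((resid - fitted) ** 2) / np.sum((resid - resid.mean()) ** 2)
    return len(y) * r2

rng = np.random.default_rng(7)
n = 20_000
x1 = rng.normal(size=n)
ivs = rng.normal(size=(n, 3))                # three candidate IVs
u = rng.normal(size=n)                       # unobserved; makes x2 endogenous
x2 = 0.3 * x1 + ivs.sum(axis=1) + u + rng.normal(size=n)
e = 2.0 * u + rng.normal(size=n)

y_valid = 1.0 + 2.0 * x1 + 1.5 * x2 + e      # all three IVs valid
y_invalid = y_valid + 1.0 * ivs[:, 2]        # third IV affects y directly

exog = np.column_stack([np.ones(n), x1])
stat_valid = sargan_statistic(y_valid, exog, x2, ivs)
stat_invalid = sargan_statistic(y_invalid, exog, x2, ivs)

CRITICAL_5PCT_DF2 = 5.991                    # chi-square, 2 df, 5% level
print("valid IVs:  ", round(stat_valid, 2))
print("one invalid:", round(stat_invalid, 2), "critical value:", CRITICAL_5PCT_DF2)
```

When one IV enters the outcome equation directly, the statistic is far above the critical value. A small statistic does not prove validity, for the reason given above.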

46
IV Validity

Conceptual arguments
1. Explain why z should influence x2
2. Explain why z should not influence y directly
3. Anticipate objections about omitted variables
that link z to the error term e. Show that z is
not related to those omitted variables, perhaps
using outside data. For example, use data on
non-veterans to support a claim about how
veterans act.

47
IV Properties

Two characteristics mark a good instrumental
variable
1. Validity
2. Strength

48
Strong IVs

A strong instrument has a high correlation with
the endogenous variable.
How strong a correlation? Staiger & Stock (1997)
recommend a partial F statistic of 10 or greater.
- Run the 1st stage with and without the IV(s).
- Compute the partial F statistic for the joint
significance of the IV(s); a value of 10 or more
is commonly taken as sufficient evidence of strength.
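
The partial F statistic can be computed from the residual sums of squares of the first stage run with and without the IV(s). The sketch below uses simulated data; the function name `partial_f` and the strong and weak coefficients on z are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 2_000
x1 = rng.normal(size=n)
z = rng.normal(size=n)
noise = rng.normal(size=n)

def rss(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return resid @ resid

def partial_f(x2, exog, ivs):
    """F statistic for the joint significance of the IVs in the first stage."""
    restricted = np.column_stack([np.ones(len(x2)), exog])    # without IVs
    unrestricted = np.column_stack([restricted, ivs])         # with IVs
    q = unrestricted.shape[1] - restricted.shape[1]           # number of IVs
    dof = len(x2) - unrestricted.shape[1]
    rss_r, rss_u = rss(restricted, x2), rss(unrestricted, x2)
    return ((rss_r - rss_u) / q) / (rss_u / dof)

x2_strong = 0.3 * x1 + 1.0 * z + noise       # z strongly predicts x2
x2_weak = 0.3 * x1 + 0.01 * z + noise        # z barely predicts x2

f_strong = partial_f(x2_strong, x1, z)
f_weak = partial_f(x2_weak, x1, z)
print("partial F, strong IV:", round(f_strong, 1))
print("partial F, weak IV:  ", round(f_weak, 2))
```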

49
Weak IVs

If the IVs are weak:
- 2SLS and 2SRI are consistent, but there can be
considerable bias even in large samples
- standard errors are too small
- 2SLS and 2SRI perform poorly

50
Weak IVs

What to do if IVs are weak?
If there is a single endogenous variable, use a
conditional likelihood ratio (CLR) test:
- perform a regular likelihood ratio test
- adjust the critical values
- available in Stata: see Moreira and Poi,
Stata Journal, 3, 57-70,
and http://elsa.berkeley.edu/wp/marcelo.pdf

51
Weak IVs

What if there are multiple endogenous variables
and only weak IVs?
A solution has not been developed yet!

52
Selected References
53
Selected References

JM Wooldridge. Econometric analysis of cross
section and panel data. Cambridge, MA: MIT
Press, 2002.
A graduate-level econometrics textbook with
lengthy textual descriptions of practical issues.
HS Bloom, ed. Learning more from social
experiments: evolving analytic approaches.
Russell Sage.
A largely non-technical exploration of how
instrumental variables are found and used, with
examples from welfare reform studies.

54
Selected References

MP Murray. Avoiding invalid instruments and
coping with weak instruments. Journal of Economic
Perspectives 2006;20(4):111-132.
A superb reference with relatively few
equations. Has an extensive reference list.
A Nakamura, M Nakamura. Model specification and
endogeneity. Journal of Econometrics
1998;83:213-237.
Presents major endogeneity tests, explores
approaches to endogeneity testing. Somewhat
iconoclastic.

55
Selected References

M McClellan, B McNeil, J Newhouse. Does more
intensive treatment of acute myocardial
infarction in the elderly reduce mortality?
Analysis using instrumental variables.
JAMA 1994;272(11):859-66.
Classic paper using IV in health, but
challenging to read.
J Newhouse, M McClellan. Econometrics in outcomes
research: the use of instrumental variables. Ann
Rev Pub Health 1998;19:17-34.
Non-technical introduction to IV.

56
Selected References

J Terza, A Basu, P Rathouz. Two-stage residual
inclusion estimation: Addressing endogeneity in
health econometric modeling. Journal of Health
Economics 2008;27:531-543.
Explains two-stage residual inclusion models and
contrasts them to two-stage least squares.
Moderately technical.

57
Acknowledgements

Much of the content of this presentation is
derived from Wooldridge (2002), Murray (2006),
and Nakamura and Nakamura (1998).
Helpful comments were also provided by HERC staff.

58
Questions?
