Title: Differences-in-Differences and A Brief Introduction to Panel Data
1Differences-in-Differences and A Brief
Introduction to Panel Data
2John Snow again
3The Grand Experiment
- Water supplied to households by competing private companies
- Sometimes different companies supplied households in same street
- In south London two main companies
- Lambeth Company (water supply from Thames Ditton, 22 miles upstream)
- Southwark and Vauxhall Company (water supply from Thames)
4In 1853/54 cholera outbreak
- Death Rates per 10000 people by water company
- Lambeth 10
- Southwark and Vauxhall 150
- Might be water but perhaps other factors
- Snow compared death rates in 1849 epidemic
- Lambeth 150
- Southwark and Vauxhall 125
- In 1852 Lambeth Company had changed supply from
Hungerford Bridge
5What would be good estimate of effect of clean
water?
6This is basic idea of Differences-in-Differences
- Have already seen idea of using differences to estimate causal effects
- Treatment/control groups in experimental data
- Often would like to find treatment and control group who can be assumed to be similar in every way except receipt of treatment
- This may be very difficult to do
7A Weaker Assumption is..
- Assume that, in absence of treatment, difference between treatment and control group is constant over time
- With this assumption can use observations on treatment and control group pre- and post-treatment to estimate causal effect
- Idea
- Difference pre-treatment is normal difference
- Difference post-treatment is normal difference plus causal effect
- Difference-in-difference is causal effect
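As a rough illustration of the idea, the sketch below applies it to the death rates quoted on the John Snow slides. It assumes Lambeth is the treatment group (its supply changed in 1852) and Southwark and Vauxhall the control group; the key 1854 stands for the 1853/54 outbreak, and the function name is purely illustrative.

```python
# Death rates per 10,000 people, taken from the slides above.
rates = {
    ("Lambeth", 1849): 150,
    ("Lambeth", 1854): 10,
    ("Southwark and Vauxhall", 1849): 125,
    ("Southwark and Vauxhall", 1854): 150,
}


def diff_in_diff(rates, treated, control, pre, post):
    """Post-treatment gap between the groups minus the pre-treatment gap."""
    post_gap = rates[(treated, post)] - rates[(control, post)]
    pre_gap = rates[(treated, pre)] - rates[(control, pre)]
    return post_gap - pre_gap


did = diff_in_diff(rates, "Lambeth", "Southwark and Vauxhall", 1849, 1854)
print(did)  # -165
```

The simple post-treatment difference is 10 - 150 = -140; subtracting the pre-treatment difference of 150 - 125 = 25 gives -165 deaths per 10,000.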
8A Graphical Representation
9What is D-in-D estimate?
- Standard differences estimator is AB
- But normal difference estimated as CB
- Hence D-in-D estimate is AC
- Note assumes trends in outcome variables the same for treatment and control groups
- This is not testable
- With two periods can get no idea of plausibility, but can with more periods
10Some Notation
- Define
- $\mu_{it} = E(y_{it})$
- Where $i = 0$ is control group, $i = 1$ is treatment
- Where $t = 0$ is pre-period, $t = 1$ is post-period
- Standard differences estimate of causal effect is estimate of
- $\mu_{11} - \mu_{01}$
- Differences-in-Differences estimate of causal effect is estimate of
- $(\mu_{11} - \mu_{01}) - (\mu_{10} - \mu_{00})$
11How to estimate?
- Can write D-in-D estimate as
- $(\mu_{11} - \mu_{10}) - (\mu_{01} - \mu_{00})$
- This is simply the difference in the change of treatment and control groups, so can estimate by regressing the change in y for each individual on a treatment dummy
12
- This is simply differences estimator applied to the difference
- To implement this need to have repeat observations on the same individuals
- May not have this: individuals observed pre- and post-treatment may be different
- What can we do in this case?
13In this case can estimate a levels regression on a treatment dummy, a post-period dummy and their interaction
- D-in-D estimate is estimate of $\beta_3$, the coefficient on the interaction. Why is this?
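One common way to write the two regressions is sketched below. The slide equations are not reproduced in this text, so the symbols are assumptions: $X_i$ is a treatment-group dummy, $T_t$ a post-period dummy, $\gamma$ names the coefficients of the differences version, and the $\beta$ numbering is chosen so that $\beta_3$ is the interaction coefficient.

```latex
% Differences version (repeat observations on the same individuals)
\Delta y_i = y_{i1} - y_{i0} = \gamma_0 + \gamma_1 X_i + \Delta\varepsilon_i

% Levels version (individuals may differ between periods)
y_{it} = \beta_0 + \beta_1 X_i + \beta_2 T_t + \beta_3 X_i T_t + \varepsilon_{it}

% Implied group means \mu_{it} (first index: group, second index: period)
\mu_{00} = \beta_0, \quad
\mu_{10} = \beta_0 + \beta_1, \quad
\mu_{01} = \beta_0 + \beta_2, \quad
\mu_{11} = \beta_0 + \beta_1 + \beta_2 + \beta_3

% Hence
(\mu_{11} - \mu_{01}) - (\mu_{10} - \mu_{00}) = (\beta_1 + \beta_3) - \beta_1 = \beta_3
```

Under this parameterisation $\beta_1$ is the normal difference between the groups, $\beta_2$ is the common time effect, and $\beta_3$ is what is left over for the treatment group in the post-period.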
14A Comparison of the Two Methods
- Where have repeated observations could use both methods
- Will give same parameter estimates
- But will give different standard errors
- Levels version will assume residuals are independent, unlikely to be a good assumption
- Can deal with this by
- Clustering
- Or estimating differences version
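The claim that both methods give the same parameter estimate can be checked on simulated data. The sketch below is only an illustration: the data-generating process, the sample size and all variable names are made up, and it uses plain NumPy least squares rather than any particular econometrics package.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200                                    # individuals, each observed twice
treated = rng.integers(0, 2, size=n)       # treatment-group dummy (hypothetical)
effect_i = rng.normal(size=n)              # individual effect, constant over time
true_effect = 2.0

y_pre = 1.0 + 0.5 * treated + effect_i + rng.normal(size=n)
y_post = 1.5 + 0.5 * treated + effect_i + true_effect * treated + rng.normal(size=n)


def ols(X, y):
    """Return OLS coefficients computed by least squares."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


# Differences version: regress the change in y on the treatment dummy.
X_diff = np.column_stack([np.ones(n), treated])
beta_diff = ols(X_diff, y_post - y_pre)[1]

# Levels version: stack both periods, add post-period dummy and interaction.
group = np.concatenate([treated, treated])
post = np.concatenate([np.zeros(n), np.ones(n)])
X_lev = np.column_stack([np.ones(2 * n), group, post, group * post])
beta_lev = ols(X_lev, np.concatenate([y_pre, y_post]))[3]

print(beta_diff, beta_lev)   # the two point estimates coincide
```

The point estimates agree because both are the same combination of the four cell means; what differs is the residual assumption behind the standard errors, which this sketch does not compute.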
15Other Regressors
- Can put in other regressors as before
- Perhaps should think about way in which they enter the estimating equation
- E.g. if level of W affects level of y then should include $\Delta W$ in differences version
16Differential Trends in Treatment and Control
Groups
- Key assumption underlying validity of D-in-D estimate is that differences between treatment and control group would have remained constant in absence of treatment
- Can never test this
- With only two periods can get no idea of plausibility
- But can with more than two periods
17An Example: Vertical Relationships and Competition in Retail Gasoline Markets, by Justine Hastings, American Economic Review, 2004
- Interested in effect of vertical integration on retail petrol prices
- Investigates take-over in CA of independent Thrifty chain of petrol stations by ARCO (more integrated)
- Defines treatment group as petrol stations which had a Thrifty within 1 mile
- Control group those that did not
- Lots of reasons why these groups might be different so D-in-D approach seems a good idea
18This picture contains relevant information
- Can see D-in-D estimate of 5c per gallon
- Also can see trends before and after change very similar, so D-in-D assumption looks valid
19A Case which does not look so good: Ashenfelter's Dip
- Interested in effect of government-sponsored training (MDTA) on earnings
- Treatment group are those who received training in 1964
- Control group are random sample of population as a whole
20Earnings for period 1959-69
21Things to Note..
- Earnings for trainees very low in 1964 as they were training, not working, in that year, so should ignore this year
- Simple D-in-D approach would compare earnings in 1965 with 1963
- But earnings of trainees in 1963 seem to show a dip, so D-in-D assumption probably not valid
- Probably because those who enter training are those who had a bad shock (e.g. job loss)
22Differences-in-Differences: Summary
- A very useful and widespread approach
- Validity does depend on assumption that trends would have been the same in absence of treatment
- Can use other periods to see if this assumption is plausible or not
- Uses 2 observations on same individual: most rudimentary form of panel data
23A Brief Introduction to Panel Data
- Panel Data has both time-series and cross-section dimension: N individuals over T periods
- Will restrict attention to balanced panels: same number of observations on each individual
- Whole books written about this, but basics can be understood very simply and are not very different from what we have seen before
- Asymptotics typically done on large N, small T
- Use $y_{it}$ to denote variable for individual i at time t
24The Pooled Model
- Can simply ignore panel nature of data and estimate
- $y_{it} = \beta x_{it} + \varepsilon_{it}$
- This will be consistent if $E(\varepsilon_{it} x_{it}) = 0$ or $\mathrm{plim}(X'\varepsilon/N) = 0$
- But computed standard errors will only be consistent if errors uncorrelated across observations
- This is unlikely
- Correlation between residuals of same individual in different time periods
- Correlation between residuals of different individuals in same time period (aggregate shocks)
25A More Plausible Model
- Should recognise this as model with group-level dummies or residuals
- Here, individual is a group
26Three Models
- Fixed Effects Model
- Treats individual effect $\alpha_i$ as parameter to be estimated (like $\beta$)
- Consistency does not require anything about correlation with $x_{it}$
- Random Effects Model
- Treats individual effect $\alpha_i$ as part of residual (like $\varepsilon_{it}$)
- Consistency does require no correlation between $\alpha_i$ and $x_{it}$
- Between-Groups Model
- Runs regression on averages for each individual
27Proposition 5.2: The fixed effect estimator of $\beta$ will be consistent if
- $E(\varepsilon_{it} x_{it}) = 0$
- $\mathrm{rank}(X, D) = N + K$
- Proof: Simple application of what you should know about linear regression model
28Intuition
- First condition should be obvious: regressors uncorrelated with residuals
- Second condition requires regressors to be of full rank
- Main way in which this is likely to fail in fixed effects model is if some regressors vary only across individuals and not over time
- Such a variable is perfectly multicollinear with individual fixed effect
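The rank failure can be seen directly in a small made-up panel. The sketch below builds the matrix of individual dummies D by hand and compares the rank of (X, D) with and without a regressor that is constant within each individual; the panel dimensions and variable names are illustrative assumptions.

```python
import numpy as np

n_ind, n_per = 4, 3                                   # N individuals, T periods
ids = np.repeat(np.arange(n_ind), n_per)              # individual id of each row
D = (ids[:, None] == np.arange(n_ind)).astype(float)  # one dummy per individual

rng = np.random.default_rng(1)
x_varying = rng.normal(size=n_ind * n_per)            # varies over time
x_fixed = np.repeat(rng.normal(size=n_ind), n_per)    # constant for each individual

# K = 1 time-varying regressor: rank should equal N + K = 5.
rank_ok = np.linalg.matrix_rank(np.column_stack([x_varying, D]))

# K = 2 with a time-invariant regressor: N + K = 6, but one column is redundant.
rank_bad = np.linalg.matrix_rank(np.column_stack([x_varying, x_fixed, D]))

print(rank_ok, rank_bad)
```

The time-invariant column is an exact linear combination of the dummies, so adding it raises K without raising the rank.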
29Estimating the Fixed Effects Model
- Can estimate by brute force: include separate dummy variable for every individual, but may be a lot of them
- Can also estimate in mean-deviation form
30How does de-meaning work?
- Can do simple OLS on de-meaned variables
- STATA command is like
- . xtreg y x, fe i(id)
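A minimal NumPy sketch of the same idea, on simulated data with made-up parameter values: the slope from OLS on de-meaned variables matches the slope from the brute-force regression with a dummy for every individual. It illustrates the equivalence only and is not a description of what the Stata command does internally.

```python
import numpy as np

rng = np.random.default_rng(2)
n_ind, n_per = 50, 4
ids = np.repeat(np.arange(n_ind), n_per)

effect_i = rng.normal(size=n_ind)                          # individual effects
x = 0.8 * effect_i[ids] + rng.normal(size=n_ind * n_per)   # x correlated with them
y = 1.5 * x + effect_i[ids] + rng.normal(size=n_ind * n_per)


def demean_by_individual(v):
    """Subtract each individual's time average from its observations."""
    means = np.bincount(ids, weights=v, minlength=n_ind) / n_per
    return v - means[ids]


# Mean-deviation form: OLS of de-meaned y on de-meaned x (no constant needed).
x_dm = demean_by_individual(x)
y_dm = demean_by_individual(y)
beta_within = (x_dm @ y_dm) / (x_dm @ x_dm)

# Brute force: include a separate dummy variable for every individual.
D = (ids[:, None] == np.arange(n_ind)).astype(float)
beta_dummies = np.linalg.lstsq(np.column_stack([x, D]), y, rcond=None)[0][0]

print(beta_within, beta_dummies)   # identical up to rounding error
```

De-meaning avoids building the N dummy columns, which matters when N is large.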
31Problems with fixed effect estimator
- Only uses variation within individuals: sometimes called within-group estimator
- This variation may be small part of total (so low precision) and more prone to measurement error (so more attenuation bias)
- Cannot use it to estimate effect of regressor that is constant for an individual
32Random Effects Estimator
- Treats individual effect $\alpha_i$ as part of residual (like $\varepsilon_{it}$)
- Consistency does require no correlation between $\alpha_i$ and $x_{it}$
- Should recognise as like model with clustered standard errors
- But random effects estimator is feasible GLS estimator
33More on RE Estimator
- Will not describe how we compute $\hat{\Omega}$: see Wooldridge
- STATA command
- . xtreg y x, re i(id)
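34Proposition 5.3: The random effects estimator of $\beta$ will be consistent if
- a. $E(\varepsilon_{it} \mid x_{i1}, \ldots, x_{it}, \ldots, x_{iT}) = 0$
- b. $E(\alpha_i \mid x_{i1}, \ldots, x_{it}, \ldots, x_{iT}) = 0$, where $\alpha_i$ is the individual effect
- c. $\mathrm{rank}(X'\Omega^{-1}X) = k$
- Proof: RE estimator a special case of the feasible GLS estimator so conditions for consistency are the same.
- Error has two components so need a. and b.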
35Comments
- Assumption about exogeneity of errors is stronger than for FE model: need to assume $\varepsilon_{it}$ uncorrelated with whole history of x; this is called strong exogeneity
- Assumption about rank condition weaker than for FE model, e.g. can estimate effect of variables that are constant for a given individual
36Another reason why may prefer RE to FE model
- If exogeneity assumptions are satisfied RE estimate will be more efficient than FE estimator
- Application of general principle that imposing true restriction on data leads to efficiency gain.
37Another Useful Result
- Can show that RE estimator can be thought of as an OLS regression of quasi-de-meaned y on quasi-de-meaned x
- This is sometimes called quasi-time demeaning
- See Wooldridge (ch10, pp286-7) if want to know more
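The sketch below checks this result numerically under stated assumptions: the two variance components are treated as known (so this is GLS rather than feasible GLS), the names sigma2_eps, sigma2_ind and theta are illustrative, and theta is taken to be the standard textbook weight $\theta = 1 - \sqrt{\sigma^2_\varepsilon / (\sigma^2_\varepsilon + T\sigma^2_\alpha)}$, which may differ in notation from the missing slide equation.

```python
import numpy as np

rng = np.random.default_rng(3)
n_ind, n_per = 30, 5
sigma2_eps, sigma2_ind = 1.0, 2.0          # variance components, assumed known
ids = np.repeat(np.arange(n_ind), n_per)   # rows sorted by individual

x = rng.normal(size=n_ind * n_per)
y = (0.5 + 1.2 * x
     + np.sqrt(sigma2_ind) * rng.normal(size=n_ind)[ids]
     + np.sqrt(sigma2_eps) * rng.normal(size=n_ind * n_per))
X = np.column_stack([np.ones_like(x), x])

# GLS: Omega is block diagonal, one T x T block per individual.
block = sigma2_eps * np.eye(n_per) + sigma2_ind * np.ones((n_per, n_per))
omega_inv = np.kron(np.eye(n_ind), np.linalg.inv(block))
beta_gls = np.linalg.solve(X.T @ omega_inv @ X, X.T @ omega_inv @ y)

# Quasi-time-demeaning: subtract theta times the individual mean.
theta = 1.0 - np.sqrt(sigma2_eps / (sigma2_eps + n_per * sigma2_ind))


def quasi_demean(v):
    means = np.bincount(ids, weights=v) / n_per
    return v - theta * means[ids]


# The constant is transformed too (it becomes 1 - theta).
X_q = np.column_stack([quasi_demean(X[:, 0]), quasi_demean(X[:, 1])])
beta_q = np.linalg.lstsq(X_q, quasi_demean(y), rcond=None)[0]

print(beta_gls, beta_q)   # same intercept and slope
```

Setting theta to 1 would give full de-meaning (the FE estimator) and theta equal to 0 would give pooled OLS, so the RE estimator sits between the two.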
38Between-Groups Estimator
- This takes individual means and estimates the regression by OLS
- Stata command is xtreg y x, be i(id)
- Condition for consistency the same as for RE estimator
- But BE estimator less efficient as does not exploit variation in regressors for a given individual
- And cannot estimate effect of variables like time trends whose average values do not vary across individuals
- So why would anyone ever use it? Let's think about measurement error
39Measurement Error in Panel Data Models
- Assume true model is
- Where x is one-dimensional
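- Assume $E(\varepsilon_{it} \mid x_{i1}, \ldots, x_{it}, \ldots, x_{iT}) = 0$ and $E(\alpha_i \mid x_{i1}, \ldots, x_{it}, \ldots, x_{iT}) = 0$, where $\alpha_i$ is the individual effect, so that RE and BE estimators are consistent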
40Measurement Error Model
- Assume
- where $u_{it}$ is classical measurement error, $x_i$ is average value of x for individual i and $\eta_{it}$ is variation around the true value, which is assumed to be uncorrelated with $x_i$ and $u_{it}$ and iid.
- We know this measurement error is likely to cause attenuation bias but this will vary between FE, RE and BE estimators.
41Proposition 5.4
- For FE model we have
- For BE model we have
- For RE model we have
- Where
42Proof
- General idea is to write each model as an OLS estimator and then use what we know about attenuation bias in that model
- Will use those earlier results
43Proof for FE Model
- Can write as OLS estimator on de-meaned data
- We have that
- And that
44- De-meaning we have that
- And that
- Take variances
45- Standard formula for attenuation bias gives us
46Proof for BE estimator
- Using earlier results we have
47Proof for RE estimator (a bit more complicated)
- Use result that can write it as OLS on quasi-de-meaned data
- Attenuation bias can be written as
48- Can write these elements as
- Leading to attenuation bias is
- Where
49What should we learn from this?
- All rather complicated: don't worry too much about details
- But intuition is simple
- Attenuation bias largest for FE estimator: Var(x) does not appear in denominator as FE estimator does not use this variation in data
50
- Attenuation bias larger for RE than BE estimator as $T > 1 > \theta$
- The averaging in the BE estimator reduces the importance of measurement error.
- Important to note that these results are dependent on the particular assumption about the measurement error process and the nature of the variation in $x_{it}$; things would be very different if measurement error for a given individual did not vary over time
- But general point is that measurement error considerations could affect choice of model to estimate with panel data
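A small simulation can make the ranking concrete for the FE and BE estimators. Everything in it is an assumption chosen for illustration: the true coefficient is 1, and the individual mean of x, the variation around that mean, and the classical measurement error all have variance 1, with T = 4. Under these particular values the within estimate should settle near 1/(1 + 1) = 0.5 and the between estimate near 1.25/1.5, about 0.83.

```python
import numpy as np

rng = np.random.default_rng(4)
n_ind, n_per, beta = 5000, 4, 1.0
ids = np.repeat(np.arange(n_ind), n_per)
n_obs = n_ind * n_per

x_mean = rng.normal(size=n_ind)[ids]        # individual average of true x
x_true = x_mean + rng.normal(size=n_obs)    # iid variation around that average
x_obs = x_true + rng.normal(size=n_obs)     # classical measurement error added
y = beta * x_true + rng.normal(size=n_ind)[ids] + rng.normal(size=n_obs)


def individual_means(v):
    return np.bincount(ids, weights=v) / n_per


def slope(x, y):
    """OLS slope of y on x, with a constant."""
    xc, yc = x - x.mean(), y - y.mean()
    return (xc @ yc) / (xc @ xc)


xbar, ybar = individual_means(x_obs), individual_means(y)
beta_fe = slope(x_obs - xbar[ids], y - ybar[ids])   # within (fixed effects)
beta_be = slope(xbar, ybar)                          # between groups

print(round(beta_fe, 2), round(beta_be, 2))
```

Averaging over T periods shrinks the measurement-error variance by a factor of T while the variance of the individual means is untouched, which is why the between estimate is attenuated less here.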
51Time Effects
- Have treated time and individual dimensions asymmetrically: no good reason for this
- Errors likely to be correlated for different individuals in same time period: most common way to deal with this is to include set of time dummies
52Estimating Fixed Effects Model in Differences
- Can also get rid of fixed effect by differencing
53Comparison of two methods
- Estimate parameters by OLS on differenced data
- If only 2 observations then get same estimates as de-meaning method
- But standard errors different
- Why? Assumption about autocorrelation in residuals
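The two-period equivalence can be checked on simulated data. The sketch assumes a single regressor, no time effects (so the differenced regression has no constant), and made-up parameter values; variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
n_ind = 100
effect_i = rng.normal(size=n_ind)                       # fixed effects
x = effect_i[:, None] + rng.normal(size=(n_ind, 2))     # columns: period 1, period 2
y = 2.0 * x + effect_i[:, None] + rng.normal(size=(n_ind, 2))

# Differencing: regress the change in y on the change in x.
dx, dy = x[:, 1] - x[:, 0], y[:, 1] - y[:, 0]
beta_fd = (dx @ dy) / (dx @ dx)

# De-meaning: subtract each individual's two-period average, then pool.
x_dm = (x - x.mean(axis=1, keepdims=True)).ravel()
y_dm = (y - y.mean(axis=1, keepdims=True)).ravel()
beta_within = (x_dm @ y_dm) / (x_dm @ x_dm)

print(beta_fd, beta_within)   # identical when T = 2
```

With T = 2 each de-meaned observation is plus or minus half the first difference, so the two slopes are the same ratio; with more than two periods they generally differ.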
54What Are these assumptions?
55This leads to time series
- Which is better depends on which assumption is right: how can we decide this?
- Allow cross-section dimension to wither
- Focus on case where observations are a time series for a single unit e.g. a whole economy.