Title: Evaluation Course Spring, 2006
1. Evaluation Course, Spring 2006
- Petra Todd
- University of Pennsylvania
- Department of Economics
2. The Evaluation Problem
- Will study econometric methods for evaluating effects of active labor market programs
- Employment, training and job search assistance programs
- School subsidy programs
- Health interventions
3. Key questions
- Do program participants benefit from the program?
- What is the social return to the program?
- Would an alternative program yield greater impact at the same cost?
4. Major Goals
- Understand the identifying assumptions needed to justify application of different estimators
- Statistical assumptions
- Behavioral assumptions
- Need to recognize heterogeneity in how people respond to a program intervention
5. Potential Outcomes
- $Y_0$: outcome without treatment
- $Y_1$: outcome with treatment
- $D = 1$ if receive treatment, else $D = 0$
- Observed outcome
- $Y = D Y_1 + (1 - D) Y_0$
- Treatment effect
- $\Delta = Y_1 - Y_0$
- Not directly observed: a missing data problem
6. Parameters of Interest
- Average impact of treatment on the treated (TT)
- $E(Y_1 - Y_0 \mid D = 1, X)$
- Average treatment effect (ATE)
- $E(Y_1 - Y_0 \mid X)$
- Average effect of treatment on the untreated (UT)
- $E(Y_1 - Y_0 \mid D = 0, X)$
- $ATE = \Pr(D = 1 \mid X)\, TT + (1 - \Pr(D = 1 \mid X))\, UT$
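The weighting identity linking ATE, TT and UT can be checked on simulated data. The sketch below is only an illustration: it drops the conditioning on X, and the gain distribution and the participation rule are hypothetical choices, not part of the course material.

```python
import random

def simulate_parameters(n=100_000, seed=0):
    """Return sample analogues of Pr(D=1), TT, UT and ATE from simulated gains."""
    rng = random.Random(seed)
    gains_treated, gains_untreated = [], []
    for _ in range(n):
        gain = 1.0 + rng.gauss(0.0, 1.0)  # Delta = Y1 - Y0, heterogeneous across people
        # Hypothetical participation rule: enroll when the gain plus noise exceeds 1.
        participates = gain + rng.gauss(0.0, 1.0) > 1.0
        (gains_treated if participates else gains_untreated).append(gain)
    p = len(gains_treated) / n
    tt = sum(gains_treated) / len(gains_treated)
    ut = sum(gains_untreated) / len(gains_untreated)
    ate = (sum(gains_treated) + sum(gains_untreated)) / n
    return {"p": p, "tt": tt, "ut": ut, "ate": ate}

result = simulate_parameters()
weighted = result["p"] * result["tt"] + (1 - result["p"]) * result["ut"]
print(result, weighted)
```

Because people with larger gains are more likely to enroll under this rule, TT exceeds ATE, which exceeds UT, while the weighted average of TT and UT reproduces ATE.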
7. Other parameters of interest
- Proportion of people benefiting from the program
- $\Pr(Y_1 > Y_0 \mid D = 1) = \Pr(\Delta > 0 \mid D = 1)$
- Distribution of treatment effects
- $F(\Delta \mid D = 1, X)$
- Selected quantile
- $\inf\{\Delta : F(\Delta \mid D = 1, X) > q\}$
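8. Model for outcomes with and without treatment
- Model
- $Y_1 = X\beta_1 + U_1$
- $Y_0 = X\beta_0 + U_0$
- $E(U_1 \mid X) = E(U_0 \mid X) = 0$
- Observed outcome
- $Y = Y_0 + D(Y_1 - Y_0)$
- $Y = X\beta_0 + D(X\beta_1 - X\beta_0) + U_0 + D(U_1 - U_0)$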
9. Distinction between TT and ATE
- $TT = E(\Delta \mid D = 1, X) = X\beta_1 - X\beta_0 + E(U_1 - U_0 \mid D = 1, X)$
- $ATE = E(\Delta \mid X) = X\beta_1 - X\beta_0$
- TT depends on structural parameters as well as means of unobservables
- Parameters are the same if
- (A1) $U_1 = U_0$
- (A2) $E(U_1 - U_0 \mid D = 1, X) = 0$
- Condition (A2) means that $D$ is uninformative on $U_1 - U_0$, i.e. there is ex post heterogeneity but it is not acted on ex ante
10. Three Basic Assumptions from least to most general
- Coefficient on D is fixed (given X) and is the same for everyone (most restrictive)
- $U_1 = U_0$
- $Y = X\beta + D\alpha(X) + U$
- Common model used in applied work
- $E(Y_1 - Y_0 \mid X, D) = \alpha(X)$
11. (continued)
- Coefficient on D is random given X, but $U_1 - U_0$ does not help predict participation in the program
- $\Pr(D = 1 \mid U_1 - U_0, X) = \Pr(D = 1 \mid X)$
- which implies
- $E(U_1 - U_0 \mid D = 1, X) = E(U_1 - U_0 \mid X)$
- Coefficient on D is random given X and $U_1 - U_0$ helps predict program participation (least restrictive)
- $E(U_1 - U_0 \mid D = 1, X) \neq E(U_1 - U_0 \mid X)$
12. How Can Randomization Solve the Evaluation Problem?
- Comparison group selected using a randomization device to randomly exclude some fraction of program applicants from the program
- Main advantage: increases comparability between program participants and nonparticipants
- Have same distribution of observables and of unobservables
- Satisfy program eligibility criteria
13. What problems can arise in social experiments?
- Randomization bias occurs when introducing randomization changes the nature of the program
- Greater recruitment needs may lead to a change in acceptance standards
- Individuals may decide not to apply if they know they will be subject to randomization
14. (continued)
- Contamination bias occurs when control group members seek alternative forms of treatment
- Ethical considerations: there may be opposition to the experiment and some sites may refuse to participate
- Dropout: some of the treatment group members may drop out before completing the program
- Sample attrition: may have differential attrition between the treatment and control groups
15. At what stage should randomization be applied?
- Randomization after acceptance into the program
- Randomization of eligibility
- Let $R = 1$ if randomized in (treatment group),
- $R = 0$ if randomized out (control group)
- Let $Y_1$ and $Y_0$ denote outcomes with and without treatment
- Let $D = 1$ denote someone who applies to the program and is subject to randomization
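16. (continued)
- From treatment group, get $E(Y_1 \mid X, D = 1, R = 1)$
- From control group, get $E(Y_0 \mid X, D = 1, R = 0)$
- No randomization bias and random assignment imply
- $E(Y_1 \mid X, D = 1, R = 1) = E(Y_1 \mid X, D = 1)$
- $E(Y_0 \mid X, D = 1, R = 0) = E(Y_0 \mid X, D = 1)$
- Thus, the experiment gives
- $TT = E(Y_1 - Y_0 \mid X, D = 1)$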
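A small simulation can illustrate why randomizing among applicants recovers TT rather than ATE. This is a sketch under assumed ingredients: no covariates X, normally distributed outcomes, and a hypothetical application rule in which people with larger gains are more likely to apply.

```python
import random

def mean(values):
    return sum(values) / len(values)

def experiment_among_applicants(n=200_000, seed=1):
    """Randomize applicants (D=1) into treatment (R=1) and control (R=0)."""
    rng = random.Random(seed)
    treated_y, control_y, applicant_gains, all_gains = [], [], [], []
    for _ in range(n):
        y0 = rng.gauss(0.0, 1.0)
        gain = 1.0 + rng.gauss(0.0, 1.0)  # Delta = Y1 - Y0
        all_gains.append(gain)
        # Hypothetical application rule: D = 1 when gain plus noise exceeds 1.
        if not gain + rng.gauss(0.0, 1.0) > 1.0:
            continue
        applicant_gains.append(gain)
        if rng.random() < 0.5:
            treated_y.append(y0 + gain)  # R = 1: observe Y1
        else:
            control_y.append(y0)         # R = 0: observe Y0
    return {
        "experimental": mean(treated_y) - mean(control_y),
        "true_tt": mean(applicant_gains),
        "ate": mean(all_gains),
    }

print(experiment_among_applicants())
```

The experimental mean difference should sit close to the simulated TT and well above the population ATE, because the control group is drawn from applicants only.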
17. How does program dropout affect experiments?
- Can define treatment as "intent-to-treat" or "offer of treatment", in which case dropout is not a problem
- If dropout occurs prior to receiving the program (i.e. dropouts do not get treatment), then could treat it like randomization on eligibility
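18. Randomization on eligibility
- Let $e = 1$ if eligible, $e = 0$ if not eligible
- Let $D = 1$ denote would-be participants if the program was made available
- $E(Y \mid X, e = 1) = \Pr(D = 1 \mid X, e = 1) E(Y_1 \mid X, e = 1, D = 1) + \Pr(D = 0 \mid X, e = 1) E(Y_0 \mid X, e = 1, D = 0)$
- $E(Y \mid X, e = 0) = \Pr(D = 1 \mid X, e = 0) E(Y_0 \mid X, e = 0, D = 1) + \Pr(D = 0 \mid X, e = 0) E(Y_0 \mid X, e = 0, D = 0)$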
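19. (continued)
- Because eligibility is randomized,
- $\Pr(D = 1 \mid X, e = 1) = \Pr(D = 1 \mid X, e = 0)$
- $\Pr(D = 0 \mid X, e = 1) = \Pr(D = 0 \mid X, e = 0)$
- $E(Y_0 \mid X, e, D = 1) = E(Y_0 \mid X, D = 1)$
- $E(Y_1 \mid X, e, D = 1) = E(Y_1 \mid X, D = 1)$
- Likewise $E(Y_0 \mid X, e, D = 0) = E(Y_0 \mid X, D = 0)$, so the nonparticipant terms cancel
- Thus, the difference between $E(Y \mid X, e = 1)$ and $E(Y \mid X, e = 0)$ gives
- $\Pr(D = 1 \mid X, e = 1)\,[E(Y_1 \mid X, D = 1) - E(Y_0 \mid X, D = 1)]$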
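Since the eligible-minus-ineligible difference equals the take-up probability times TT, dividing by the take-up rate among eligibles recovers TT. The simulation below sketches this; it ignores X, and the outcome distributions and the participation rule are hypothetical.

```python
import random

def mean(values):
    return sum(values) / len(values)

def eligibility_experiment(n=200_000, seed=2):
    """Randomize eligibility e; only eligible would-be participants are treated."""
    rng = random.Random(seed)
    y_eligible, y_ineligible, participant_gains = [], [], []
    takers = 0
    for _ in range(n):
        y0 = rng.gauss(0.0, 1.0)
        gain = 1.0 + rng.gauss(0.0, 1.0)
        would_participate = gain + rng.gauss(0.0, 1.0) > 1.0  # D = 1
        if would_participate:
            participant_gains.append(gain)
        if rng.random() < 0.5:  # e = 1
            y_eligible.append(y0 + gain if would_participate else y0)
            takers += 1 if would_participate else 0
        else:                   # e = 0: everyone has outcome Y0
            y_ineligible.append(y0)
    difference = mean(y_eligible) - mean(y_ineligible)
    take_up = takers / len(y_eligible)  # estimate of Pr(D=1 | e=1)
    return {
        "difference": difference,
        "take_up": take_up,
        "tt_estimate": difference / take_up,
        "true_tt": mean(participant_gains),
    }

print(eligibility_experiment())
```

The raw difference is attenuated by the take-up rate; rescaling it gives an estimate close to the simulated TT.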
20. What about control group contamination?
- Not necessarily a problem if willing to define the benchmark state as being excluded from the program
21. What about sample attrition?
- Attrition is a problem that is common to both experimental and nonexperimental studies
- Attrition occurs when some people are not followed in the data (maybe due to nonresponse)
- If attrition is nonrandom with respect to treatment, then the evaluation requires the use of nonexperimental methods
22. Nonexperimental Estimators: Matching
- Assume have access to data on treated and untreated individuals ($D = 1$ and $D = 0$)
- Assume also have access to a set of X variables whose distribution is not affected by D
- $F(X \mid D, Y^P) = F(X \mid Y^P)$
- where $Y^P = (Y_0, Y_1)$ are the potential outcomes
23. (continued)
- Matching estimators pair treated individuals with observably similar untreated individuals
- It is usually assumed that
- $(Y_0, Y_1) \perp D \mid X$ (M-1)
- or
- $\Pr(D = 1 \mid X, Y_0, Y_1) = \Pr(D = 1 \mid X)$
- and
- $0 < \Pr(D = 1 \mid X) < 1$ (M-2)
- To justify this assumption, individuals cannot select into the program based on anticipated treatment impact
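24. (continued)
- Assumption (M-1) implies
- $F(Y_0 \mid D = 1, X) = F(Y_0 \mid D = 0, X) = F(Y_0 \mid X)$
- $F(Y_1 \mid D = 1, X) = F(Y_1 \mid D = 0, X) = F(Y_1 \mid X)$
- also
- $E(Y_0 \mid D = 1, X) = E(Y_0 \mid D = 0, X) = E(Y_0 \mid X)$
- $E(Y_1 \mid D = 1, X) = E(Y_1 \mid D = 0, X) = E(Y_1 \mid X)$
- Under assumptions that justify matching, can estimate TT, ATE, and UT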
25. (continued)
- Let n denote the number of observations in the treatment group
- A typical matching estimator for TT takes the form
26. (continued)
- is an estimator for the matched no-treatment outcome
- Recall that (M-1) implies
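The formulas on these two slides did not survive in the transcript. A standard form of a matching estimator for TT that is consistent with the surrounding definitions is sketched below; the notation $\hat{\Delta}_{TT}$ and $\hat{E}(\cdot)$ is an assumption and may differ from what the slides showed.

```latex
\hat{\Delta}_{TT} = \frac{1}{n} \sum_{i \in \{D_i = 1\}}
  \left[ Y_{1i} - \hat{E}(Y_{0i} \mid D_i = 1, X_i) \right],
\qquad
E(Y_0 \mid D = 1, X) = E(Y_0 \mid D = 0, X).
```

Read this way, the second equality is what allows the matched no-treatment outcome of a treated person to be estimated from untreated persons with similar X.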
27. How does matching compare to a randomized experiment?
- Distribution of observables of the matched controls will be the same as in the treatment group
- However, distribution of unobservables not necessarily balanced across groups
- Experiment has full support (M-2), but with matching there can be a failure of the common support condition
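28. (continued)
- Even though matching methods assume
- $E(Y_1 - Y_0 \mid D = 1, X) = E(Y_1 - Y_0 \mid X)$
- Can still have potentially
- $E(Y_1 - Y_0 \mid D = 1) \neq E(Y_1 - Y_0)$
- $E(\Delta \mid D = 1) = \int E(\Delta \mid D = 1, X) f(X \mid D = 1)\, dX$
- $E(\Delta) = \int E(\Delta \mid X) f(X)\, dX$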
29. (continued)
- If interest centers on TT, (M-1) can be replaced by a weaker assumption
- $E(Y_0 \mid X, D = 1) = E(Y_0 \mid X, D = 0) = E(Y_0 \mid X)$
- The weaker assumption allows selection into the program to depend on $Y_1$ and allows
- $E(Y_1 - Y_0 \mid X, D) \neq E(Y_1 - Y_0 \mid X)$
- Only require
- $\Pr(D = 1 \mid X, Y_0, Y_1) = \Pr(D = 1 \mid X, Y_1)$
30. Implementing Matching Estimators
- Problems
- How to construct a match when X is of high dimension
- How to choose the set of X variables
- What to do if $\Pr(D = 1 \mid X) = 1$ for some X (violation of the common support condition (M-2))
31. Rosenbaum and Rubin (1983) Theorem
- Provide a solution to the problem of constructing a match when X is of high dimension
- Show that
- $(Y_0, Y_1) \perp D \mid X$
- implies
- $(Y_0, Y_1) \perp D \mid \Pr(D = 1 \mid X)$
- Reduces the matching problem to a univariate problem, provided $\Pr(D = 1 \mid X)$ can be parametrically estimated
- $\Pr(D = 1 \mid X)$ is known as the propensity score
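32. Proof of RR theorem
- Let $P(X) = \Pr(D = 1 \mid X)$
- $E(D \mid Y_0, P(X)) = E(E(D \mid Y_0, X) \mid Y_0, P(X))$
- $= E(P(X) \mid Y_0, P(X))$
- $= P(X)$
- $= E(D \mid P(X))$
- where the first equality holds because X is finer than P(X)
- Recall $E(Y \mid Z) = E_{X \mid Z}[E(Y \mid X, Z) \mid Z]$
- Here, $Z = P(X)$, so $E(Y \mid X, Z) = E(Y \mid X)$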
33. Matching can be implemented in two steps
- Step 1: estimate a model for program participation and use it to estimate the propensity score $P(X_i)$ for each person
- Step 2: select matches based on the estimated propensity score
34. Ways of constructing matched outcomes
- Define a neighborhood $C(P_i)$ for each person $i$ in $\{D_i = 1\}$
- Neighbors are persons in $\{D_j = 0\}$ for whom $P_j \in C(P_i)$
- Set of persons matched to $i$ is
- $A_i = \{ j \in \{D_j = 0\} \text{ such that } P_j \in C(P_i) \}$
35. Nearest Neighbor Matching
- $C(P_i) = \min_{j} |P_i - P_j|$, $j \in \{D_j = 0\}$
- $\Rightarrow A_i$ is a singleton set
- Caliper matching
- Matches only made if $|P_i - P_j| \le \varepsilon$ for some prespecified tolerance $\varepsilon$ (tries to avoid bad matches)
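Nearest-neighbor matching with an optional caliper can be sketched in a few lines. The function below assumes the propensity scores have already been estimated in a first step. Its name, the input layout as `(score, outcome)` pairs, and the toy numbers are hypothetical. Controls are reused (matching with replacement), so the result does not depend on the order of the treated observations.

```python
def nearest_neighbor_tt(treated, controls, caliper=None):
    """Estimate TT by nearest-neighbor matching on the propensity score.

    treated, controls: lists of (propensity_score, outcome) pairs.
    Returns (tt_estimate, number_of_matched_treated).
    """
    gaps = []
    for p_i, y_i in treated:
        # Closest untreated observation in terms of |P_i - P_j|.
        p_j, y_j = min(controls, key=lambda c: abs(p_i - c[0]))
        if caliper is not None and abs(p_i - p_j) > caliper:
            continue  # no acceptable match within the tolerance
        gaps.append(y_i - y_j)
    if not gaps:
        raise ValueError("no treated observation could be matched")
    return sum(gaps) / len(gaps), len(gaps)

# Toy data: the third treated person has no control within the caliper.
treated = [(0.6, 10.0), (0.8, 12.0), (0.95, 20.0)]
controls = [(0.55, 7.0), (0.7, 9.0), (0.2, 1.0)]
print(nearest_neighbor_tt(treated, controls, caliper=0.15))
print(nearest_neighbor_tt(treated, controls))
```

With the caliper, the poorly matched treated observation is dropped; without it, that observation is paired with a distant control and pulls the estimate upward.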
36. Kernel and Local Linear Matching
- Estimate matched outcomes by nonparametric
regression
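A kernel-weighted average of control outcomes is one simple way to build the matched outcome by nonparametric regression. The sketch below uses a Gaussian kernel and a fixed bandwidth. Both are illustrative assumptions, and local linear matching would replace the weighted average with a weighted linear fit.

```python
import math

def kernel_matched_outcome(p_i, controls, bandwidth):
    """Kernel-weighted mean of control outcomes around propensity score p_i."""
    weights = [math.exp(-0.5 * ((p_i - p_j) / bandwidth) ** 2) for p_j, _ in controls]
    total = sum(weights)
    return sum(w * y_j for w, (_, y_j) in zip(weights, controls)) / total

def kernel_matching_tt(treated, controls, bandwidth=0.1):
    """Average gap between treated outcomes and their kernel-matched outcomes."""
    gaps = [y_i - kernel_matched_outcome(p_i, controls, bandwidth) for p_i, y_i in treated]
    return sum(gaps) / len(gaps)

# Two controls equally far from the treated score receive equal weight.
print(kernel_matched_outcome(0.5, [(0.4, 1.0), (0.6, 3.0)], bandwidth=0.1))
```

Unlike nearest-neighbor matching, every control contributes to each matched outcome, with weights that decline in the propensity score distance.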
37. Should matches be reused?
- If matches are not reused, then results will not be invariant to the order in which observations were matched
38. Difference-in-difference matching
- Assume
- $(Y_{0t} - Y_{0t'}) \perp D \mid \Pr(D = 1 \mid X)$, where $t$ is a post-program period and $t'$ a pre-program period
- $0 < \Pr(D = 1 \mid X) < 1$
- Main advantage
- Allows for time-invariant unobservable differences between the treatment group and the control group
- Selection into the program can be based on the unobservables
39. Econometric Models of Program Participation
- Assume individuals have the option of taking training in period k
- Prior to k, observe $Y_{0j}$, $j = 1, \ldots, k$
- After k, observe two potential outcomes $(Y_{0t}, Y_{1t})$
- To participate in training, persons must apply and be accepted, so there are several decision-makers
- $D = 1$ if participates, 0 else
- Assume participation decisions are based on maximization of future earnings
40. Simple model of participation
- First term is earnings stream if participates
- $C$ is the direct cost of training
- Last term is earnings stream if do not participate
- $I_k$ is the information set at time k
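The decision rule these bullets describe is missing from the transcript. One common way to write such a rule, consistent with the three terms listed above, is sketched below. The discount rate $r$ and the final earnings period $T$ are assumed notation, as is the timing convention that training occupies period $k$, so that earnings in $k$ are forgone.

```latex
D = 1 \iff
E\left[ \sum_{j=1}^{T-k} \frac{Y_{1,k+j}}{(1+r)^{j}}
  \;-\; C \;-\;
  \sum_{j=0}^{T-k} \frac{Y_{0,k+j}}{(1+r)^{j}}
  \;\middle|\; I_k \right] > 0
```

Under this reading, a person enrolls when the expected discounted earnings stream with training, net of the direct cost, exceeds the expected discounted stream without training.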
41. Implications of this simple decision rule
- Past earnings are irrelevant except for their value in predicting future earnings
- Persons with lower forgone earnings or lower costs are more likely to participate
- Older persons and persons with higher discount rates are less likely to participate
- The decision to take training is correlated with future earnings only through the correlation with expected future earnings
42. Special case of above model
- Assume constant treatment effect $\alpha$
- $D = 1$ if expected rewards exceed costs
- If earnings are temporarily low, people are more likely to enroll in the program
- Consistent with Ashenfelter's Dip pattern
43. Model of the decision process
- Let $IN = H(X) - V$
- $H(X)$: expected future rewards
- $V$: costs, $V = C + Y_{0k}$ (assumed unknown)
- If $V$ is assumed to be independent of $X$, then could estimate by a logistic or probit model
- $\Pr(D = 1 \mid X) = e^{H(X)} / (1 + e^{H(X)})$
- $\Pr(D = 1 \mid X) = \Phi((H(X) - \mu_V) / \sigma_V)$
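The two participation probabilities can be evaluated directly. In the sketch below the function names and the parameter names `mu_v` and `sigma_v` (the mean and standard deviation of the cost term V) are hypothetical. The probit case assumes V is normally distributed, and the logit is written in the algebraically equivalent form 1 / (1 + exp(-H)).

```python
import math

def logit_participation_prob(h):
    """Pr(D=1 | X) = exp(H) / (1 + exp(H)), with h = H(X)."""
    return 1.0 / (1.0 + math.exp(-h))

def probit_participation_prob(h, mu_v=0.0, sigma_v=1.0):
    """Pr(D=1 | X) = Phi((H - mu_v) / sigma_v) when V ~ N(mu_v, sigma_v^2)."""
    z = (h - mu_v) / sigma_v
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF

for h in (-1.0, 0.0, 1.0):
    print(h, logit_participation_prob(h), probit_participation_prob(h))
```

Both probabilities rise with expected rewards H(X). In the probit case a person whose H(X) equals the mean cost participates with probability one half.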
44. Nonexperimental Estimators: Control Function Methods
- References
- Roy (1951), Willis and Rosen (1979), Heckman and Honoré (1990), Heckman and Sedlacek (1985), Heckman and Robb (1985, 1986)
- Allow selectivity into the program to be based on unobservables, explicitly modeling and controlling for potential selectivity bias
- Can assume unobservables are normally distributed, but can relax normality
45. (continued)
- Bias can arise because $E(\varepsilon_0 \mid D, X) \neq 0$
- Control function methods find estimators for $E(\varepsilon_0 \mid D = 1, X)$, $E(\varepsilon_0 \mid D = 0, X)$
46. (continued)
- Let $D = 1$ if $Z\gamma - v > 0$, $D = 0$ else
- Assume
47. Semiparametric approach (Heckman, 1980)
48. (continued)
- where the residual terms in brackets have mean zero
- $K_1(P)$ and $K_0(P)$ are termed control functions (Heckman, 1980)
49. (continued)
- Could approximate
- $K_1(P) = \alpha_0 + \alpha_1 P + \alpha_2 P^2 + \alpha_3 P^3 + \cdots + \alpha_k P^k$
- $K_0(P) = \delta_0 + \delta_1 P + \delta_2 P^2 + \delta_3 P^3 + \cdots + \delta_k P^k$
- But the intercepts will not be separately identified from the treatment effect, unless there is a group for which P is close to 1
- "Identification at infinity" (Heckman, 1989)
- Also need an exclusion restriction to ensure that $X\beta_i$ and $K_i(P)$ are separately identified (a variable that affects participation but does not affect the outcome equation)
- Identification is not necessarily a problem within the normal model, because can identify off of functional form
50. Nonexperimental Estimators: Regression Discontinuity Methods
- Assumptions
- Rule determining who gets treatment
- Probability of getting treatment changes discontinuously as a function of some observed random variables
- Examples
- Barnow et al. (1980) analyze the effect of the Head Start program on children's test scores. Only families with incomes below a threshold got the program
- Design first discussed by Thistlethwaite and Campbell (1960), who used it to evaluate the effects of National Merit awards on career aspirations
52. Additional applications
- Berk and Rauma (1983)
- Van der Klaauw (1996)
- Angrist and Lavy (1996)
- Black (1996)
- Angrist and Krueger (1991)
- Hahn, Todd and Van der Klaauw (2001)
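53. (continued)
- Let $Y_{0i}$ and $Y_{1i}$ be outcomes without and with treatment
- Observed outcome is
- $Y_i = D_i Y_{1i} + (1 - D_i) Y_{0i}$
- $= \alpha_i + D_i \Delta_i$, where $\alpha_i = Y_{0i}$
- $\Delta_i = Y_{1i} - Y_{0i}$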
55. Two types of RD designs
- Sharp design: $D_i = f(z_i)$
- Assume the point at which $f(z_i)$ is discontinuous is known ($z_0$)
- A special case of selection on observables
- Because $\Pr(D_i = 1 \mid z_i)$ is equal to 1 or 0, this violates the strict ignorability condition required for matching
- Fuzzy design
- $D_i$ is a random variable given $z_i$
- $E(D_i \mid z_i = z) = \Pr(D_i = 1 \mid z_i = z)$ is known to be discontinuous at $z_0$
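56. Identification with sharp design
- Comparison of outcomes for $D_i = 1$ and $D_i = 0$ generally subject to selectivity bias
- Comparison of outcomes for $D_i = 1$ and $D_i = 0$ groups with z values close to $z_0$
- $E(Y_i \mid z_i = z_0 + \varepsilon) - E(Y_i \mid z_i = z_0 - \varepsilon)$
- $= E(\Delta_i \mid z_i = z_0 + \varepsilon) + E(\alpha_i \mid z_i = z_0 + \varepsilon) - E(\alpha_i \mid z_i = z_0 - \varepsilon)$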
57. (continued)
- Assume
- (C1) $E(\alpha_i \mid z_i = z)$ is continuous in z at $z_0$
- (C2) The limit of $E(\Delta_i \mid z_i = z_0 + \varepsilon)$ as $\varepsilon \to 0$ is well defined
- Then
- $\lim_{\varepsilon \to 0} [E(Y_i \mid z_i = z_0 + \varepsilon) - E(Y_i \mid z_i = z_0 - \varepsilon)] = E(\Delta_i \mid z_i = z_0)$
- Can only identify the treatment effect locally at point $z_0$
- By increasing the number of discontinuity points, can identify impacts over a wider range of the support of z and test a common treatment effect assumption
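The sharp-design comparison of mean outcomes just above and just below the cutoff can be tried on simulated data. This is a sketch under assumed ingredients: a uniform running variable, a linear and therefore continuous baseline outcome, a constant effect, and treatment assigned to everyone at or above the cutoff. The estimator is the simple difference in local means, not a local linear regression.

```python
import random

def sharp_rd_estimate(n=200_000, z0=0.0, eps=0.05, effect=1.5, seed=3):
    """Difference in mean outcomes within eps above and below the cutoff z0."""
    rng = random.Random(seed)
    above, below = [], []
    for _ in range(n):
        z = rng.uniform(-1.0, 1.0)
        d = 1 if z >= z0 else 0  # sharp design: D is a deterministic function of z
        alpha = 1.0 * z          # baseline outcome, continuous in z at z0 as in (C1)
        y = alpha + d * effect + rng.gauss(0.0, 0.5)
        if z0 <= z < z0 + eps:
            above.append(y)
        elif z0 - eps < z < z0:
            below.append(y)
    return sum(above) / len(above) - sum(below) / len(below)

print(sharp_rd_estimate())
```

For a fixed window the estimate carries a bias of roughly `eps` times the slope of the baseline outcome, which vanishes as the window shrinks toward the cutoff.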
58. Identification with the Fuzzy Design
- $\Pr(D_i = 1 \mid z_i = z)$ is discontinuous at $z_0$, for example,
- $D_i = 1$ if $f(z_i) + v_i > 0$, else 0
- Case 1: Constant treatment effect. Under (C1),
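The result for this case is cut off in the transcript. A standard statement for a constant effect $\Delta$, obtained by taking expectations of $Y_i = \alpha_i + D_i \Delta$ on each side of $z_0$ and using (C1) to cancel the $\alpha_i$ terms, is sketched below. It may not match the slide's exact wording or notation.

```latex
\Delta =
\frac{\lim_{\varepsilon \downarrow 0} E(Y_i \mid z_i = z_0 + \varepsilon)
      - \lim_{\varepsilon \downarrow 0} E(Y_i \mid z_i = z_0 - \varepsilon)}
     {\lim_{\varepsilon \downarrow 0} E(D_i \mid z_i = z_0 + \varepsilon)
      - \lim_{\varepsilon \downarrow 0} E(D_i \mid z_i = z_0 - \varepsilon)}
```

The denominator is nonzero because the fuzzy design requires $\Pr(D_i = 1 \mid z_i = z)$ to jump at $z_0$. In the sharp design it equals one in magnitude and the ratio reduces to the difference in outcome limits.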
59. LIV Estimators
60. Ex-ante Evaluation