Evaluation Course Spring, 2006

1
Evaluation Course Spring, 2006
  • Petra Todd
  • University of Pennsylvania
  • Department of Economics

2
The Evaluation Problem
  • Will study econometric methods for evaluating
    effects of active labor market programs
  • Employment, training and job search assistance
    programs
  • School subsidy programs
  • Health interventions

3
Key questions
  • Do program participants benefit from the program?
  • What is the social return to the program?
  • Would an alternative program yield greater impact
    at the same cost?

4
Major Goals
  • Understand the identifying assumptions needed to
    justify application of different estimators
  • Statistical assumptions
  • Behavioral assumptions
  • Need to recognize heterogeneity in how people
    respond to a program intervention

5
Potential Outcomes
  • $Y_0$: outcome without treatment
  • $Y_1$: outcome with treatment
  • $D = 1$ if treatment is received, else $D = 0$
  • Observed outcome
  • $Y = D Y_1 + (1 - D) Y_0$
  • Treatment effect
  • $\Delta = Y_1 - Y_0$
  • Not directly observed: a missing data problem

6
Parameters of Interest
  • Average impact of treatment on the treated (TT)
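  • $E(Y_1 - Y_0 \mid D = 1, X)$
  • Average treatment effect (ATE)
  • $E(Y_1 - Y_0 \mid X)$
  • Average effect of treatment on the untreated (UT)
  • $E(Y_1 - Y_0 \mid D = 0, X)$
  • $\mathrm{ATE} = \Pr(D = 1 \mid X)\,\mathrm{TT} + [1 - \Pr(D = 1 \mid X)]\,\mathrm{UT}$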
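
The decomposition of ATE into TT and UT can be checked numerically. The following is a minimal sketch on simulated data, not part of the course material; the selection rule, the effect distribution, and all variable names are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical potential outcomes with a heterogeneous effect Y1 - Y0.
y0 = rng.normal(0.0, 1.0, n)
gain = rng.normal(1.0, 1.0, n)
y1 = y0 + gain

# Assumed selection rule: people with larger gains are more likely to enroll.
d = (gain + rng.normal(0.0, 1.0, n) > 1.0).astype(int)

# Observed outcome: only one potential outcome is seen per person.
y = d * y1 + (1 - d) * y0

tt = (y1 - y0)[d == 1].mean()   # effect on the treated
ut = (y1 - y0)[d == 0].mean()   # effect on the untreated
ate = (y1 - y0).mean()          # average effect
p = d.mean()                    # share treated

# ATE = Pr(D=1) * TT + (1 - Pr(D=1)) * UT
decomposed = p * tt + (1 - p) * ut
print(tt, ate, ut, decomposed)
```

Because enrollment here depends on the gain, TT exceeds ATE, which exceeds UT. In real data `y1` and `y0` are never both observed, which is why the parameters need identifying assumptions.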

7
Other parameters of interest
  • Proportion of people benefiting from the program
  • $\Pr(Y_1 > Y_0 \mid D = 1) = \Pr(\Delta > 0 \mid D = 1)$
  • Distribution of treatment effects
  • $F(\Delta \mid D = 1, X)$
  • Selected quantile
  • $\inf\{\Delta : F(\Delta \mid D = 1, X) > q\}$

8
Model for outcomes with and without treatment
  • Model
  • $Y_1 = X\beta_1 + U_1$
  • $Y_0 = X\beta_0 + U_0$
  • $E(U_1 \mid X) = E(U_0 \mid X) = 0$
  • Observed outcome
  • $Y = Y_0 + D(Y_1 - Y_0)$
  • $Y = X\beta_0 + D(X\beta_1 - X\beta_0) + U_0 + D(U_1 - U_0)$

9
Distinction between TT and ATE
  • $\mathrm{TT} = E(\Delta \mid D = 1, X) = X\beta_1 - X\beta_0 + E(U_1 - U_0 \mid D = 1, X)$
  • $\mathrm{ATE} = E(\Delta \mid X) = X\beta_1 - X\beta_0$
  • TT depends on structural parameters as well as
    means of unobservables
  • Parameters are the same if
  • (A1) $U_1 = U_0$
  • (A2) $E(U_1 - U_0 \mid D = 1, X) = 0$
  • Condition (A2) means that D is uninformative on
    $U_1 - U_0$, i.e., there is ex post heterogeneity
    but it is not acted on ex ante

10
Three Basic Assumptions from least to most general
  • Coefficient on D is fixed (given X) and is the
    same for everyone (most restrictive)
  • $U_1 = U_0$
  • $Y = X\beta + D\alpha(X) + U$
  • Common model used in applied work
  • $E(Y_1 - Y_0 \mid X, D) = \alpha(X)$

11
  • Coefficient on D is random given X, but U1-U0
    does not help predict participation in the
    program
  • $\Pr(D = 1 \mid U_1 - U_0, X) = \Pr(D = 1 \mid X)$
  • which implies
  • $E(U_1 - U_0 \mid D = 1, X) = E(U_1 - U_0 \mid X)$
  • Coefficient on D is random given X and $U_1 - U_0$
    helps predict program participation (least
    restrictive)
  • $E(U_1 - U_0 \mid D = 1, X) \neq E(U_1 - U_0 \mid X)$

12
How Can Randomization Solve the Evaluation
Problem?
  • Comparison group selected using a randomization
    device to randomly exclude some fraction of
    program applicants from the program
  • Main advantage: increases comparability between
    program participants and nonparticipants
  • Have same distribution of observables and of
    unobservables
  • Satisfy program eligibility criteria

13
What problems can arise in social experiments?
  • Randomization bias occurs when introducing
    randomization changes the nature of the program
  • Greater recruitment needs may lead to change in
    acceptance standards
  • Individuals may decide not to apply if they know
    they will be subject to randomization

14
  • Contamination bias occurs when control group
    members seek alternative forms of treatment
  • Ethical considerations: there may be opposition
    to the experiment and some sites may refuse to
    participate
  • Dropout: some of the treatment group members may
    drop out before completing the program
  • Sample attrition: there may be differential
    attrition between the treatment and control groups

15
At what stage should randomization be applied?
  • Randomization after acceptance into the program
  • Randomization of eligibility
  • Let $R = 1$ if randomized in (treatment group),
  • $R = 0$ if randomized out (control group)
  • Let $Y_1$ and $Y_0$ denote outcomes
  • Let $D = 1$ denote someone who applies to the
    program and is subject to randomization

16
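  • From treatment group, get $E(Y_1 \mid X, D = 1, R = 1)$
  • From control group, get $E(Y_0 \mid X, D = 1, R = 0)$
  • No randomization bias and random assignment
    imply
  • $E(Y_1 \mid X, D = 1, R = 1) = E(Y_1 \mid X, D = 1)$
  • $E(Y_0 \mid X, D = 1, R = 0) = E(Y_0 \mid X, D = 1)$
  • Thus, the experiment gives
  • $\mathrm{TT} = E(Y_1 - Y_0 \mid X, D = 1)$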
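
The argument that randomizing among applicants recovers TT can be illustrated with a small simulation. This is only a sketch under assumed distributions: the application rule, the 50 percent assignment probability, and the variable names are hypothetical and chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical potential outcomes with heterogeneous gains.
y0 = rng.normal(0.0, 1.0, n)
gain = rng.normal(1.0, 1.0, n)
y1 = y0 + gain

# D = 1: applies to the program (assumed to depend partly on the gain).
d = gain + rng.normal(0.0, 1.0, n) > 1.0

# R = 1: randomized in; assignment is independent of outcomes by design.
r = rng.random(n) < 0.5

# Applicants randomized in reveal Y1; applicants randomized out reveal Y0.
treat_mean = y1[d & r].mean()
control_mean = y0[d & ~r].mean()
experimental_tt = treat_mean - control_mean

# Infeasible benchmark that uses both potential outcomes for applicants.
true_tt = (y1 - y0)[d].mean()
print(experimental_tt, true_tt)
```

The experimental contrast uses only outcomes that would be observed in practice, yet it lands close to the benchmark because assignment is unrelated to outcomes among applicants.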

17
How does program dropout affect experiments?
  • Can define treatment as intent-to-treat or
    offer of treatment, in which case dropout is not
    a problem
  • If dropout occurs prior to receiving the program
    (i.e. dropouts do not get treatment), then could
    treat it like randomization on eligibility.

18
Randomization on eligibility
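  • Let $e = 1$ if eligible, $e = 0$ if not eligible
  • Let $D = 1$ denote would-be participants if the
    program were made available
  • $E(Y \mid X, e = 1) = \Pr(D = 1 \mid X, e = 1) E(Y_1 \mid X, e = 1, D = 1)$
    $+ \Pr(D = 0 \mid X, e = 1) E(Y_0 \mid X, e = 1, D = 0)$
  • $E(Y \mid X, e = 0) = \Pr(D = 1 \mid X, e = 0) E(Y_0 \mid X, e = 0, D = 1)$
    $+ \Pr(D = 0 \mid X, e = 0) E(Y_0 \mid X, e = 0, D = 0)$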

19
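  • Because eligibility is randomized,
  • $\Pr(D = 1 \mid X, e = 1) = \Pr(D = 1 \mid X, e = 0)$
  • $\Pr(D = 0 \mid X, e = 1) = \Pr(D = 0 \mid X, e = 0)$
  • $E(Y_0 \mid X, e, D = 1) = E(Y_0 \mid X, D = 1)$
  • $E(Y_1 \mid X, e, D = 1) = E(Y_1 \mid X, D = 1)$
  • Thus, difference in previous two equations gives
  • $\Pr(D = 1 \mid X, e = 1)\,[E(Y_1 \mid X, D = 1) - E(Y_0 \mid X, D = 1)]$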
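
Writing the difference out step by step may help. The sketch below assumes, beyond the equalities listed on the slide, that randomization also equalizes the mean of Y0 among would-be nonparticipants across eligibility states (the same argument applied to D = 0), and that some eligible people would participate. The shorthand TT(X) is introduced here for illustration only.

```latex
\begin{align*}
E(Y \mid X, e=1) - E(Y \mid X, e=0)
  &= \Pr(D=1 \mid X, e=1)\,\bigl[E(Y_1 \mid X, D=1) - E(Y_0 \mid X, D=1)\bigr] \\
  &\quad + \Pr(D=0 \mid X, e=1)\,\bigl[E(Y_0 \mid X, e=1, D=0) - E(Y_0 \mid X, e=0, D=0)\bigr] \\
  &= \Pr(D=1 \mid X, e=1)\,\mathrm{TT}(X), \\[4pt]
\mathrm{TT}(X)
  &= \frac{E(Y \mid X, e=1) - E(Y \mid X, e=0)}{\Pr(D=1 \mid X, e=1)} .
\end{align*}
```

The second bracket vanishes under the added assumption, so dividing the eligible-versus-ineligible contrast by the participation rate among eligibles gives TT.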

20
What about control group contamination?
  • Not necessarily a problem if willing to define
    benchmark state as being excluded from the program

21
What about sample attrition?
  • Attrition is a problem that is common to both
    experimental and nonexperimental studies
  • Attrition occurs when some people are not
    followed in the data (maybe due to nonresponse)
  • If attrition is nonrandom with respect to
    treatment, then nonexperimental evaluation
    methods are required

22
Nonexperimental Estimators: Matching
  • Assume have access to data on treated and
    untreated individuals ($D = 1$ and $D = 0$)
  • Assume also have access to a set of X variables
    whose distribution is not affected by D
  • $F(X \mid D, Y^P) = F(X \mid Y^P)$
  • where $Y^P = (Y_0, Y_1)$ are the potential outcomes

23
  • Matching estimators pair treated individuals with
    observably similar untreated individuals
  • It is usually assumed that
  • $(Y_0, Y_1) \perp D \mid X$ (M-1)
  • or
  • $\Pr(D = 1 \mid X, Y_0, Y_1) = \Pr(D = 1 \mid X)$
  • and
  • $0 < \Pr(D = 1 \mid X) < 1$ (M-2)
  • To justify this assumption, individuals cannot
    select into the program based on anticipated
    treatment impact

24
  • Assumption (M-1) implies
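  • $F(Y_0 \mid D = 1, X) = F(Y_0 \mid D = 0, X) = F(Y_0 \mid X)$
  • $F(Y_1 \mid D = 1, X) = F(Y_1 \mid D = 0, X) = F(Y_1 \mid X)$
  • also
  • $E(Y_0 \mid D = 1, X) = E(Y_0 \mid D = 0, X) = E(Y_0 \mid X)$
  • $E(Y_1 \mid D = 1, X) = E(Y_1 \mid D = 0, X) = E(Y_1 \mid X)$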
  • Under assumptions that justify matching, can
    estimate TT, ATE, and UT

25
  • Let n denote number of observations in the
    treatment group
  • A typical matching estimator for TT takes the
    form

26
  • is an estimator for the matched no treatment
    outcome
  • Recall, that (M-1) implies
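
The formulas for these two slides are not present in the transcript. As a hedged sketch only, a standard way to write a matching estimator for TT that is consistent with the surrounding text is shown below; the weight function W(i, j) and the hat notation are assumptions introduced here, not recovered from the slides.

```latex
\begin{align*}
\widehat{\mathrm{TT}}
  &= \frac{1}{n} \sum_{i \in \{D_i = 1\}}
     \bigl[\, Y_{1i} - \widehat{E}(Y_{0i} \mid D_i = 1, X_i) \,\bigr], \\
\widehat{E}(Y_{0i} \mid D_i = 1, X_i)
  &= \sum_{j \in \{D_j = 0\}} W(i, j)\, Y_{0j},
  \qquad \sum_{j \in \{D_j = 0\}} W(i, j) = 1, \\
E(Y_0 \mid D = 1, X) &= E(Y_0 \mid D = 0, X)
  \quad \text{(implied by (M-1))}.
\end{align*}
```

The last line is what justifies using outcomes of untreated persons with similar X to stand in for the missing no-treatment outcome of each treated person.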

27
How does matching compare to a randomized
experiment?
  • Distribution of observables of the matched
    controls will be the same as in the treatment
    group
  • However, distribution of unobservables not
    necessarily balanced across groups
  • Experiment has full support (M-2), but with
    matching there can be a failure of the common
    support condition

28
  • Even though matching methods assume
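  • $E(Y_1 - Y_0 \mid D = 1, X) = E(Y_1 - Y_0 \mid X)$
  • Can still have potentially
  • $E(Y_1 - Y_0 \mid D = 1) \neq E(Y_1 - Y_0)$
  • $E(\Delta \mid D = 1) = \int E(\Delta \mid D = 1, X) f(X \mid D = 1)\,dX$
  • $E(\Delta) = \int E(\Delta \mid X) f(X)\,dX$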

29
  • If interest centers on TT, (M-1) can be replaced
    by weaker assumption
  • $E(Y_0 \mid X, D = 1) = E(Y_0 \mid X, D = 0) = E(Y_0 \mid X)$
  • The weaker assumption allows selection into the
    program to depend on $Y_1$ and allows
  • $E(Y_1 - Y_0 \mid X, D) \neq E(Y_1 - Y_0 \mid X)$
  • Only require
  • $\Pr(D = 1 \mid X, Y_0, Y_1) = \Pr(D = 1 \mid X, Y_1)$

30
Implementing Matching Estimators
  • Problems
  • How to construct a match when X is of high
    dimension
  • How to choose the set of X variables
  • What to do if $\Pr(D = 1 \mid X) = 1$ for some X
    (violation of common support condition (M-2))

31
Rosenbaum and Rubin (1983) Theorem
  • Provide a solution to the problem of constructing
    a match when X is of high dimension
  • Show that
  • $(Y_0, Y_1) \perp D \mid X$
  • Implies
  • $(Y_0, Y_1) \perp D \mid \Pr(D = 1 \mid X)$
  • Reduces the matching problem to a univariate
    problem, provided $\Pr(D = 1 \mid X)$ can be
    parametrically estimated
  • $\Pr(D = 1 \mid X)$ is known as the propensity score

32
Proof of RR theorem
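  • Let $P(X) = \Pr(D = 1 \mid X)$
  • $E(D \mid Y_0, P(X)) = E(E(D \mid Y_0, X) \mid Y_0, P(X))$
  • $= E(P(X) \mid Y_0, P(X))$
  • $= P(X)$
  • $= E(D \mid P(X))$
  • The first equality holds because X is finer than
    P(X); the second uses (M-1)
  • Recall $E(Y \mid Z) = E_{X \mid Z}[E(Y \mid X, Z) \mid Z]$
  • Here, $Z = P(X)$, so $E(Y \mid X, Z) = E(Y \mid X)$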

33
Matching can be implemented in two steps
  • Step 1: Estimate a model for program
    participation and compute the estimated
    propensity score $P(X_i)$ for each person
  • Step 2: Select matches based on the estimated
    propensity score

34
Ways of constructing matched outcomes
  • Define a neighborhood $C(P_i)$ for each person i
    with $D_i = 1$
  • Neighbors are persons with $D_j = 0$ for whom
    $P_j \in C(P_i)$
  • Set of persons matched to i is
  • $A_i = \{\, j \in \{D_j = 0\} \text{ such that } P_j \in C(P_i) \,\}$

35
Nearest Neighbor Matching
  • $C(P_i) = \min_j |P_i - P_j|$, $j \in \{D_j = 0\}$
  • $\Rightarrow A_i$ is a singleton set
  • Caliper matching
  • Matches only made if $|P_i - P_j| < \varepsilon$ for
    some prespecified tolerance $\varepsilon$ (tries to
    avoid bad matches)
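
Nearest-neighbor matching with an optional caliper can be sketched in a few lines. This is an illustrative implementation, not the course's code: the function name `nn_match_tt`, the choice to match with replacement, and the toy data are all assumptions.

```python
import numpy as np

def nn_match_tt(y, d, p, caliper=None):
    """Nearest-neighbor matching on the propensity score, with replacement.

    Returns the estimated effect on the treated and the number of treated
    units that found a match.
    """
    y, d, p = (np.asarray(a) for a in (y, d, p))
    treated = np.flatnonzero(d == 1)
    controls = np.flatnonzero(d == 0)

    # |P_i - P_j| for every treated i (rows) and control j (columns).
    dist = np.abs(p[treated][:, None] - p[controls][None, :])
    nearest = dist.argmin(axis=1)
    min_dist = dist[np.arange(len(treated)), nearest]

    # Caliper: drop treated units whose closest control is too far away.
    if caliper is None:
        keep = np.ones(len(treated), dtype=bool)
    else:
        keep = min_dist < caliper
    if not keep.any():
        raise ValueError("no treated unit has a control within the caliper")

    matched_y0 = y[controls[nearest]]
    effect = np.mean(y[treated][keep] - matched_y0[keep])
    return float(effect), int(keep.sum())

# Toy data (hypothetical): three treated and three untreated persons.
y = np.array([5.0, 6.0, 9.0, 2.0, 3.0, 4.0])
d = np.array([1, 1, 1, 0, 0, 0])
p = np.array([0.30, 0.52, 0.90, 0.28, 0.50, 0.60])

print(nn_match_tt(y, d, p))                # uses all three treated persons
print(nn_match_tt(y, d, p, caliper=0.05))  # drops the treated person at 0.90
```

In the toy data the treated person with score 0.90 has no control closer than 0.30, so the caliper version excludes that person rather than accept a poor match.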

36
Kernel and Local Linear Matching
  • Estimate matched outcomes by nonparametric
    regression
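
Kernel matching replaces the single nearest neighbor with a weighted average of all untreated outcomes, with weights that decline in the propensity score distance. The sketch below uses a Gaussian kernel and a fixed bandwidth, both of which are assumptions made for illustration; it does not implement the local linear variant.

```python
import numpy as np

def kernel_match_tt(y, d, p, bandwidth=0.1):
    """Kernel matching on the propensity score (Nadaraya-Watson weights)."""
    y, d, p = (np.asarray(a, dtype=float) for a in (y, d, p))
    y_treated, p_treated = y[d == 1], p[d == 1]
    y_control, p_control = y[d == 0], p[d == 0]

    # Scaled score distance between each treated (row) and control (column).
    u = (p_treated[:, None] - p_control[None, :]) / bandwidth
    weights = np.exp(-0.5 * u ** 2)                 # Gaussian kernel
    weights /= weights.sum(axis=1, keepdims=True)   # each row sums to one

    # Matched no-treatment outcome: weighted mean of control outcomes.
    matched_y0 = weights @ y_control
    return float(np.mean(y_treated - matched_y0))

# Toy data (hypothetical).
y = np.array([5.0, 6.0, 9.0, 2.0, 3.0, 4.0])
d = np.array([1, 1, 1, 0, 0, 0])
p = np.array([0.30, 0.52, 0.90, 0.28, 0.50, 0.60])
print(kernel_match_tt(y, d, p, bandwidth=0.1))
```

A smaller bandwidth concentrates weight on the closest controls and approaches nearest-neighbor matching; a larger one approaches a simple comparison with the overall control mean.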

37
Should matches be reused?
  • If matches are not reused, then results will not
    be invariant to the order in which observations
    were matched

38
Difference-in-difference matching
  • Assume
  • $(Y_{0t} - Y_{0t'}) \perp D \mid \Pr(D = 1 \mid X)$, where t is a
    post-program and t' a pre-program period
  • $0 < \Pr(D = 1 \mid X) < 1$
  • Main advantage
  • Allows for time invariant unobservable
    differences between the treatment group and the
    control group
  • Selection into the program can be based on the
    unobservables
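
The role of differencing can be seen in a toy example where treated persons have permanently higher outcomes for reasons unrelated to the program. The sketch below applies nearest-neighbor matching to before-after changes; the function name and the data are hypothetical.

```python
import numpy as np

def did_nn_match(y_post, y_pre, d, p):
    """Nearest-neighbor matching applied to before-after outcome changes."""
    y_post, y_pre, d, p = (np.asarray(a, dtype=float) for a in (y_post, y_pre, d, p))
    change = y_post - y_pre          # removes time-invariant differences
    t, c = d == 1, d == 0
    # Index of the closest control (by propensity score) for each treated unit.
    nearest = np.abs(p[t][:, None] - p[c][None, :]).argmin(axis=1)
    return float(np.mean(change[t] - change[c][nearest]))

# Hypothetical data: treated persons start 10 units higher in both periods.
y_pre = np.array([12.0, 13.0, 2.0, 3.0])
y_post = np.array([15.0, 16.0, 3.0, 4.0])
d = np.array([1, 1, 0, 0])
p = np.array([0.40, 0.60, 0.41, 0.59])

did_estimate = did_nn_match(y_post, y_pre, d, p)
print(did_estimate)
```

Matching on post-program levels alone would attribute the permanent 10-unit gap to the program, while matching on changes compares a growth of 3 among the treated with a growth of 1 among matched controls.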

39
Econometric Models of Program Participation
  • Assume individuals have the option of taking
    training in period k
  • Prior to k, observe $Y_{0j}$, $j = 1, \ldots, k$
  • After k, observe two potential outcomes (Y0t,Y1t)
  • To participate in training, persons must apply
    and be accepted, so there are several
    decision-makers
  • $D = 1$ if participates, 0 otherwise
  • Assume participation decisions are based on
    maximization of future earnings

40
Simple model of participation
  • $D = 1$ if
  • First term is earnings stream if participates
  • C is the direct cost of training
  • Last term is earnings stream if do not
    participate
  • Ik is the information set at time k
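
The decision rule itself is missing from the transcript. One common way to write a rule that matches the verbal description (earnings stream with participation, minus the direct cost, minus the earnings stream without participation, all conditional on the information set) is sketched below. The horizon T and discount rate r are symbols assumed here for illustration; they do not appear in the transcript.

```latex
\[
D = 1 \quad \text{if} \quad
E\!\left[\,
  \sum_{j=1}^{T-k} \frac{Y_{1,k+j}}{(1+r)^{j}}
  \;-\; C
  \;-\; \sum_{j=0}^{T-k} \frac{Y_{0,k+j}}{(1+r)^{j}}
  \;\Big|\; I_k
\right] > 0 .
\]
```

In this form the no-participation sum starts at j = 0 because earnings in period k are forgone while training, and a shorter remaining horizon or a higher discount rate shrinks the first term relative to the costs.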

41
Implications of this simple decision rule
  • Past earnings are irrelevant except for value in
    predicting future earnings
  • Persons with lower foregone earnings or lower
    costs are more likely to participate
  • Older persons and persons with higher discount
    rates are less likely to participate
  • The decision to take training is correlated with
    future earnings only through the correlation with
    expected future earnings

42
Special case of above model
  • Assume constant treatment effect $\alpha$
  • $D = 1$ if expected rewards exceed costs
  • If earnings are temporarily low, people are more
    likely to enroll in the program
  • Consistent with the "Ashenfelter's dip" pattern

43
Model of the decision process
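  • Let $IN = H(X) - V$, with $D = 1$ if $IN > 0$
  • $H(X)$: expected future rewards
  • $V$: costs, $V = C + Y_{0k}$ (assumed unknown)
  • If V is assumed to be independent of X, then
    could estimate by a logistic or probit model
  • Logit: $\Pr(D = 1 \mid X) = e^{H(X)} / (1 + e^{H(X)})$
  • Probit: $\Pr(D = 1 \mid X) = \Phi((H(X) - \mu_1)/\sigma_V)$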
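
A logit participation model can be fit by maximum likelihood to produce estimated propensity scores. The sketch below assumes a linear index H(X) = b0 + b1 X and uses a plain Newton-Raphson loop; the function names, the simulated data, and the coefficient values are all hypothetical.

```python
import numpy as np

def fit_logit(x, d, iterations=25):
    """Maximum-likelihood logit via Newton-Raphson (intercept added here)."""
    design = np.column_stack([np.ones(len(x)), x])
    beta = np.zeros(design.shape[1])
    for _ in range(iterations):
        prob = 1.0 / (1.0 + np.exp(-design @ beta))
        gradient = design.T @ (d - prob)
        hessian = (design * (prob * (1.0 - prob))[:, None]).T @ design
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta

def propensity(x, beta):
    """Estimated Pr(D = 1 | X) from fitted logit coefficients."""
    design = np.column_stack([np.ones(len(x)), x])
    return 1.0 / (1.0 + np.exp(-design @ beta))

# Simulated participation data under an assumed linear index.
rng = np.random.default_rng(2)
n = 50_000
x = rng.normal(size=(n, 1))
true_beta = np.array([-0.5, 1.0])
p_true = 1.0 / (1.0 + np.exp(-(true_beta[0] + true_beta[1] * x[:, 0])))
d = (rng.random(n) < p_true).astype(float)

beta_hat = fit_logit(x, d)
p_hat = propensity(x, beta_hat)
print(beta_hat)
```

The fitted `p_hat` values are what a matching step would use in place of the unknown propensity score.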

44
Nonexperimental Estimators: Control Function
Methods
  • References
  • Roy (1951), Willis and Rosen (1979), Heckman and
    Honore (1990), Heckman and Sedlacek (1985),
    Heckman and Robb (1985, 1986)
  • Allow selectivity into the program to be based on
    unobservables, explicitly modeling and
    controlling for potential selectivity bias
  • Can assume unobservables normally distributed,
    but can relax normality

45
  • Assume
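  • $Y_0 = X\beta_0 + \varepsilon_0$
  • $Y_1 = X\beta_1 + \varepsilon_1$
  • Bias can arise because $E(\varepsilon_0 \mid D, X) \neq 0$
  • Control function methods find estimators for
    $E(\varepsilon_0 \mid D = 1, X)$, $E(\varepsilon_0 \mid D = 0, X)$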

46
  • Let $D = 1$ if $Z\gamma - v > 0$, $D = 0$ otherwise
  • Assume

47
Semiparametric approach (Heckman, 1980)

48
Where the residual terms in brackets have mean
zero. K1(P) and K0(P) are termed control
functions (Heckman, 1980)
  • Can write the model as
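
The model equation for this slide is not present in the transcript. A hedged reconstruction that is consistent with the outcome equations above and with the description of bracketed mean-zero residuals is sketched below. It assumes that the conditional means of the errors depend on X and Z only through P, the participation probability; that index restriction is an assumption stated here, not taken from the slide.

```latex
\begin{align*}
Y &= D\,Y_1 + (1 - D)\,Y_0 \\
  &= X\beta_0 + D\,X(\beta_1 - \beta_0) + D\,K_1(P) + (1 - D)\,K_0(P) \\
  &\quad + \bigl\{\, D\,[\varepsilon_1 - K_1(P)] + (1 - D)\,[\varepsilon_0 - K_0(P)] \,\bigr\}, \\[4pt]
K_1(P) &= E(\varepsilon_1 \mid D = 1, P), \qquad
K_0(P) = E(\varepsilon_0 \mid D = 0, P).
\end{align*}
```

The term in braces has mean zero conditional on D and P by construction, so the selection problem is moved into the two functions K1 and K0.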

49
  • Could approximate
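  • $K_1(P) = \alpha_0 + \alpha_1 P + \alpha_2 P^2 + \alpha_3 P^3 + \cdots + \alpha_k P^k$
  • $K_0(P) = \delta_0 + \delta_1 P + \delta_2 P^2 + \delta_3 P^3 + \cdots + \delta_k P^k$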
  • But the intercepts will not be separately
    identified from the treatment effect, unless
    there is a group for which P is close to 1.
  • identification at infinity (Heckman, 1989)
  • Also need an exclusion restriction to ensure
    that Xßi and Ki(P) are separately identified
    (variable that affects participation but does not
    affect the outcome equation)
  • Identification not necessarily a problem within
    the normal model, because can identify off of
    functional form

50
Nonexperimental Estimators: Regression
Discontinuity Methods
  • Assumptions
  • Rule determining who gets treatment
  • Probability of getting treatment changes
    discontinuously as a function of some observed
    random variables
  • Examples
  • Barnow et al. (1980) analyze effect of Head
    Start program on children's test scores. Only
    families with outcomes below a threshold got the
    program
  • Design first discussed by Thistlethwaite and
    Campbell (1960), who used it to evaluate the
    effects of National Merit awards on career
    aspirations

51
(No Transcript)
52
Additional applications
  • Berk and Rauma (1983)
  • Van der Klaauw (1996)
  • Angrist and Lavy (1996)
  • Black (1996)
  • Angrist and Krueger (1991)
  • Hahn, Todd and Van der Klaauw (2001)

53
  • Let $Y_{0i}$ and $Y_{1i}$ be outcomes without and
    with treatment
  • Observed outcome is
  • $Y_i = D_i Y_{1i} + (1 - D_i) Y_{0i}$
  • $= \alpha_i + D_i \Delta_i$, with $\alpha_i = Y_{0i}$
  • $\Delta_i = Y_{1i} - Y_{0i}$

54
Two types of RD designs
  • Sharp design
  • $D_i = f(z_i)$
  • Assume point at which $f(z_i)$ is discontinuous is
    known
  • A special case of selection on observables
  • Because $\Pr(D_i = 1 \mid z_i)$ is either 0 or 1, this
    design violates the strict ignorability
    condition invoked by matching estimators

55
Two types of RD designs
  • Sharp design: $D_i = f(z_i)$
  • Assume the point at which $f(z_i)$ is discontinuous
    is known ($z_0$)
  • A special case of selection on observables
  • Because $\Pr(D_i = 1 \mid z_i)$ is equal to 1 or 0, this
    violates the strict ignorability condition
    required for matching
  • Fuzzy design
  • $D_i$ is a random variable given $z_i$
  • $E(D_i \mid z_i = z) = \Pr(D_i = 1 \mid z_i = z)$ is known to be
    discontinuous at $z_0$

56
Identification with sharp design
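  • Comparison of outcomes for $D_i = 1$ and $D_i = 0$
    generally subject to selectivity bias
  • Instead compare outcomes for $D_i = 1$ and $D_i = 0$
    groups with z values close to $z_0$
  • $E(Y_i \mid z_i = z_0 + \varepsilon) - E(Y_i \mid z_i = z_0 - \varepsilon)$
  • $= E(\Delta_i \mid z_i = z_0 + \varepsilon) + E(\alpha_i \mid z_i = z_0 + \varepsilon) - E(\alpha_i \mid z_i = z_0 - \varepsilon)$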

57
  • Assume
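  • (C1) $E(\alpha_i \mid z_i = z)$ is continuous in z at $z_0$
  • (C2) The limit of $E(\Delta_i \mid z_i = z_0 + \varepsilon)$ as
    $\varepsilon \to 0^+$ is well defined
  • Then
  • $\lim_{\varepsilon \to 0^+} [E(Y_i \mid z_i = z_0 + \varepsilon) - E(Y_i \mid z_i = z_0 - \varepsilon)] = E(\Delta_i \mid z_i = z_0)$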
  • Can only identify the treatment effect locally at
    point z0
  • By increasing the number of discontinuity points,
    can identify impacts over a wider range of the
    support of z and test a common treatment effect
    assumption
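
The limit comparison in the sharp design can be approximated by fitting a line on each side of the cutoff within a narrow window and comparing the two fitted values at z0. The sketch below is one simple way to do that, assuming treatment is assigned when z is at or above z0; the function name, bandwidth, and simulated data are hypothetical.

```python
import numpy as np

def sharp_rd(y, z, z0, bandwidth):
    """Difference in boundary values of E(Y | z) just above and below z0."""
    y, z = np.asarray(y, dtype=float), np.asarray(z, dtype=float)

    def boundary_value(mask):
        # Linear fit of y on (z - z0) inside the window; the intercept is
        # the fitted value at the cutoff itself.
        slope, intercept = np.polyfit(z[mask] - z0, y[mask], 1)
        return intercept

    above = (z >= z0) & (z < z0 + bandwidth)
    below = (z < z0) & (z > z0 - bandwidth)
    return float(boundary_value(above) - boundary_value(below))

# Simulated sharp design: D = 1 exactly when z >= 0, true jump of 2.
rng = np.random.default_rng(3)
n = 20_000
z = rng.uniform(-1.0, 1.0, n)
d = (z >= 0.0).astype(float)
y = 1.0 + 0.5 * z + 2.0 * d + rng.normal(0.0, 0.5, n)

rd_estimate = sharp_rd(y, z, z0=0.0, bandwidth=0.25)
print(rd_estimate)
```

The estimate is local to the cutoff: it says nothing about the effect for persons whose z is far from z0.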

58
Identification with the Fuzzy Design
  • $\Pr(D_i = 1 \mid z_i = z)$ is discontinuous at $z_0$, for
    example,
  • $D_i = 1$ if $f(z_i) + v_i > 0$, else 0
  • Case 1: Constant treatment effect. Under (C1),

59
LIV Estimators
60
Ex-ante Evaluation