Title: Statistical Analysis of the Regression-Discontinuity Design
Analysis Requirements
C O X O
C O   O
- Pre-post
- Two-group
- Treatment-control (dummy-code)
Assumptions in the Analysis
- The cutoff criterion is perfectly followed.
- The pre-post distribution is a polynomial or can be transformed to one.
- The comparison group has sufficient variance on the pretest.
- The pretest distribution is continuous.
- The program is uniformly implemented.
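The first and third assumptions can be checked directly from the data before any modeling. The sketch below is a minimal illustration only. It assumes the program group is everyone scoring strictly below the cutoff, so a score exactly at the cutoff goes to the comparison group. The function names, the `treat_below` flag, and the example scores are all hypothetical.

```python
import statistics


def cutoff_violations(pretest, treated, cutoff, treat_below=True):
    """Indices of units whose group membership breaks the cutoff rule.

    Hypothetical rule: with treat_below=True, a unit should be treated
    exactly when its pretest is strictly below the cutoff; otherwise it
    should be treated exactly when its pretest is at or above the cutoff.
    """
    bad = []
    for i, (score, z) in enumerate(zip(pretest, treated)):
        should_treat = score < cutoff if treat_below else score >= cutoff
        if bool(z) != should_treat:
            bad.append(i)
    return bad


def comparison_variance(pretest, treated):
    """Sample variance of the pretest within the comparison (untreated) group."""
    return statistics.variance([s for s, z in zip(pretest, treated) if not z])


# Made-up data: the unit scoring 48 should have been treated but was not.
pre = [32, 45, 50, 61, 48]
grp = [1, 1, 0, 0, 0]
print(cutoff_violations(pre, grp, cutoff=50))  # [4]
print(comparison_variance(pre, grp))
```

A non-empty list means the cutoff criterion was not perfectly followed for those units. A comparison-group variance near zero would signal that the pretest gives too little spread to estimate the pre-post slope.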
The Curvilinearity Problem
If the true pre-post relationship is not linear...
[Figure: scatterplot of the posttest (posteff, 0 to 80) against the pretest (pre, 0 to 90).]
...and we fit parallel straight lines as the model, the result will be biased.
And even if the lines aren't parallel (an interaction effect), the result will still be biased.
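The bias described in these slides can be reproduced with a small noiseless simulation. The setup is entirely hypothetical: pretest scores 0 to 90, a cutoff of 50, units at or above the cutoff coded as treated, and a true curved relationship post = 0.01 * pre^2 with no program effect at all. Any nonzero coefficient on the treatment dummy is therefore pure bias. The `ols` helper is a plain normal-equations solver written for this sketch, not a library routine.

```python
def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


CUTOFF = 50                                 # hypothetical cutoff
pre = list(range(91))                       # hypothetical pretest scores 0..90
post = [0.01 * x ** 2 for x in pre]         # curved, no jump at the cutoff
X = [x - CUTOFF for x in pre]               # pretest minus cutoff
Z = [1 if x >= CUTOFF else 0 for x in pre]  # treatment dummy

parallel = ols([[1, x, z] for x, z in zip(X, Z)], post)
interaction = ols([[1, x, z, x * z] for x, z in zip(X, Z)], post)
quadratic = ols([[1, x, z, x * x] for x, z in zip(X, Z)], post)

# Index 2 is the coefficient on Z, the estimated jump at the cutoff.
print(round(parallel[2], 2))      # 5.31: biased
print(round(interaction[2], 2))   # 1.82: still biased
print(abs(quadratic[2]) < 1e-4)   # True: this model contains the true curve
```

The direction and size of the spurious jump depend on the shape of the curve; with this particular made-up curve both straight-line models overstate the effect, while the model that includes the squared term recovers the true value of zero.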
Model Specification
- If you specify the model exactly, there is no bias.
- If you overspecify the model (add more terms than needed), the result is unbiased but inefficient.
- If you underspecify the model (omit one or more necessary terms), the result is biased.
For instance, if the true function is
$$y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i$$
and we fit
$$y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + e_i$$
our model is exactly specified and we obtain an unbiased and efficient estimate.
On the other hand, if the true function is
$$y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i$$
and we fit
$$y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + \beta_3 X_i Z_i + e_i$$
our model is overspecified: we included an unnecessary term, and we obtain an unbiased but inefficient estimate.
And finally, if the true function is
$$y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + \beta_3 X_i Z_i + \beta_4 X_i^2$$
and we fit
$$y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + e_i$$
our model is underspecified: we excluded some necessary terms, and we obtain a biased estimate.
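The three cases follow from a standard least-squares result, sketched here in matrix notation that the slides themselves do not use: $X$ collects the columns we fit and $W$ the columns we leave out (for the underspecified example above, $W$ would hold the $X_i Z_i$ and $X_i^2$ columns), and $M_W$ is the usual residual-maker for $W$.

```latex
% True model: y = X\beta + W\gamma + e, with E[e \mid X, W] = 0.

% Underspecified: regress y on X only (omit W).
E[\hat\beta_{\text{under}}] = \beta + (X^\top X)^{-1} X^\top W \gamma
% Biased unless \gamma = 0 or X^\top W = 0.

% Overspecified: the true \gamma = 0, but we regress y on [X \; W].
E[\hat\beta_{\text{over}}] = \beta, \qquad E[\hat\gamma_{\text{over}}] = 0
% Unbiased, but with \operatorname{Var}(e) = \sigma^2 I:
\operatorname{Var}(\hat\beta_{\text{over}}) = \sigma^2 (X^\top M_W X)^{-1}
  \succeq \sigma^2 (X^\top X)^{-1} = \operatorname{Var}(\hat\beta_{\text{exact}}),
\qquad M_W = I - W (W^\top W)^{-1} W^\top
```

The first line is why omitting a needed term biases the treatment coefficient; the last line is why adding an unneeded term costs only precision.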
Overall Strategy
- The best option is to specify the true function exactly.
- We would prefer to err by overspecifying our model, because that only leads to inefficiency.
- Therefore, start with a likely overspecified model and reduce it.
Steps in the Analysis
1. Transform the pretest by subtracting the cutoff.
2. Examine the relationship visually.
3. Specify higher-order terms and interactions.
4. Estimate the initial model.
5. Refine the model by eliminating unneeded higher-order terms.
Transform the Pretest
$$X_i \leftarrow X_i - X_c$$
where $X_c$ is the cutoff.
- Do this because we want to estimate the jump at the cutoff.
- When we subtract the cutoff from $X$, then $X = 0$ at the cutoff, so the cutoff becomes the intercept of the model.
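Why the intercept moves to the cutoff can be shown with a single straight line. This is a minimal sketch with hypothetical helper names and made-up numbers (an uncentered line y = 10 + 0.8 * pre and a cutoff of 50): rewriting the line in terms of X = pre - cutoff leaves the slope unchanged and turns the intercept into the predicted value at the cutoff.

```python
def center_pretest(pretest, cutoff):
    """Transform the pretest so that X = 0 exactly at the cutoff."""
    return [x - cutoff for x in pretest]


def recenter_line(intercept, slope, cutoff):
    """Re-express y = a + b*pre as y = a' + b*X, where X = pre - cutoff.

    The new intercept a' = a + b*cutoff is the predicted value AT the cutoff.
    """
    return intercept + slope * cutoff, slope


print(center_pretest([35, 50, 62], cutoff=50))  # [-15, 0, 12]
print(recenter_line(10.0, 0.8, cutoff=50))      # (50.0, 0.8)
```

Because both groups' lines are expressed on the centered scale, their intercepts are their values at the cutoff, and the difference between them is the jump the analysis is after.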
Examine Relationship Visually
Count the number of flexion points (bends) across both groups. Here, there are no bends, so we can assume a linear relationship.
Specify the Initial Model
- The rule of thumb is to include polynomial terms up to order (number of flexion points) + 2.
- Here, there were no flexion points, so...
- Specify polynomials up to order 0 + 2 = 2 (i.e., up to the quadratic).
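The rule of thumb can be turned into a list of model terms: each power of the pretest enters together with its interaction with the treatment dummy. This is a sketch; the function names and the term labels ("X^2*Z" and so on) are hypothetical, chosen to mirror the notation of the RD model.

```python
def initial_order(flexion_points):
    """Rule of thumb from the slides: polynomial order = flexion points + 2."""
    return flexion_points + 2


def model_terms(order):
    """Pretest powers up to `order`, each followed by its treatment interaction."""
    terms = []
    for p in range(1, order + 1):
        x = "X" if p == 1 else f"X^{p}"
        terms.append(x)
        if p == 1:
            terms.append("Z")  # the treatment dummy itself enters once
        terms.append(f"{x}*Z")
    return terms


print(model_terms(initial_order(0)))  # ['X', 'Z', 'X*Z', 'X^2', 'X^2*Z']
```

With no flexion points this yields exactly the five predictors of the quadratic RD model; one bend would push the starting model to the cubic.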
The RD Analysis Model
$$y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + \beta_3 X_i Z_i + \beta_4 X_i^2 + \beta_5 X_i^2 Z_i + e_i$$
where
- $y_i$ = outcome score for the ith unit
- $\beta_0$ = coefficient for the intercept
- $\beta_1$ = linear pretest coefficient
- $\beta_2$ = mean difference for treatment
- $\beta_3$ = linear interaction
- $\beta_4$ = quadratic pretest coefficient
- $\beta_5$ = quadratic interaction
- $X_i$ = transformed pretest
- $Z_i$ = dummy variable for treatment (0 = control, 1 = treatment)
- $e_i$ = residual for the ith unit
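Each unit's predictors for this model can be computed from its raw pretest and group code. The column names below (precut, group, linint, quad, quadint) are taken from the regression output that follows; matching each name to a term of the equation is our reading of that output. The helper function and the cutoff value of 50 are hypothetical.

```python
def rd_columns(pretest, group, cutoff):
    """Predictor values for one unit in the full quadratic RD model.

    group is the treatment dummy Z (0 = control, 1 = treatment).
    """
    precut = pretest - cutoff  # X_i, the transformed pretest
    return {
        "precut": precut,                 # beta_1 term, X_i
        "group": group,                   # beta_2 term, Z_i
        "linint": precut * group,         # beta_3 term, X_i * Z_i
        "quad": precut ** 2,              # beta_4 term, X_i^2
        "quadint": precut ** 2 * group,   # beta_5 term, X_i^2 * Z_i
    }


print(rd_columns(pretest=42, group=1, cutoff=50))
# {'precut': -8, 'group': 1, 'linint': -8, 'quad': 64, 'quadint': 64}
```

For a comparison-group unit both interaction columns are zero, so that group's curve is described by $\beta_0$, $\beta_1$, and $\beta_4$ alone.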
Data to Analyze
Initial (Full) Model
The regression equation is
posteff = 49.1 + 0.972 precut + 10.2 group - 0.236 linint - 0.00539 quad + 0.00276 quadint

Predictor   Coef        Stdev       t-ratio   p
Constant    49.1411     0.8964      54.82     0.000
precut      0.9716      0.1492      6.51      0.000
group       10.231      1.248       8.20      0.000
linint      -0.2363     0.2162      -1.09     0.275
quad        -0.005391   0.004994    -1.08     0.281
quadint     0.002757    0.007475    0.37      0.712

s = 6.643   R-sq = 47.7%   R-sq(adj) = 47.1%
Without Quadratic
The regression equation is
posteff = 49.8 + 0.824 precut + 9.89 group - 0.0196 linint

Predictor   Coef        Stdev       t-ratio   p
Constant    49.7508     0.6957      71.52     0.000
precut      0.82371     0.05889     13.99     0.000
group       9.8939      0.9528      10.38     0.000
linint      -0.01963    0.08284     -0.24     0.813

s = 6.639   R-sq = 47.5%   R-sq(adj) = 47.2%
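The reductions in these two tables (first dropping quad and quadint, then linint) can be expressed as a hierarchical elimination rule. This is a sketch under assumptions: the slides do not state a significance threshold, so `ALPHA = 0.05` is our choice, and the refit between steps is represented by passing each step the p-values from the corresponding table. The function name is hypothetical.

```python
# p-values copied from the two regression tables above.
full_model_p = {"precut": 0.000, "group": 0.000, "linint": 0.275,
                "quad": 0.281, "quadint": 0.712}
no_quad_p = {"precut": 0.000, "group": 0.000, "linint": 0.813}

ALPHA = 0.05  # assumed threshold; the slides do not state one


def reduce_model(terms, steps, alpha=ALPHA):
    """Drop higher-order blocks, highest first, while none of their terms is significant.

    steps: list of (block_of_terms, p-values from the model that still contains it).
    """
    kept = list(terms)
    for block, pvalues in steps:
        if all(pvalues[t] > alpha for t in block):
            kept = [t for t in kept if t not in block]
        else:
            break  # this block is needed; keep it and every lower-order term
    return kept


steps = [(["quad", "quadint"], full_model_p),  # quadratic terms first
         (["linint"], no_quad_p)]               # then the linear interaction
print(reduce_model(["precut", "group", "linint", "quad", "quadint"], steps))
# ['precut', 'group']
```

Dropping terms as blocks from the top down keeps the model hierarchical: a quadratic term is never kept while the linear interaction below it is removed.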
Final Model
The regression equation is
posteff = 49.8 + 0.814 precut + 9.89 group

Predictor   Coef        Stdev       t-ratio   p
Constant    49.8421     0.5786      86.14     0.000
precut      0.81379     0.04138     19.67     0.000
group       9.8875      0.9515      10.39     0.000

s = 6.633   R-sq = 47.5%   R-sq(adj) = 47.3%
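Reading the final model: because the two fitted lines are parallel, the gap between them at the cutoff (precut = 0) is simply the group coefficient. The sketch below uses the coefficients and standard error from the final-model table. The 95% interval is our addition and assumes a sample large enough for the normal critical value 1.96; the slides do not report the sample size.

```python
# Coefficients and standard error copied from the final-model table above.
B0, B_PRECUT, B_GROUP = 49.8421, 0.81379, 9.8875
SE_GROUP = 0.9515


def predict(precut, group):
    """Fitted posttest for a transformed pretest value and a 0/1 group code."""
    return B0 + B_PRECUT * precut + B_GROUP * group


jump = predict(0, 1) - predict(0, 0)  # gap between the two lines at the cutoff
t_ratio = B_GROUP / SE_GROUP
# Rough 95% interval, assuming a large sample (normal critical value 1.96).
low, high = B_GROUP - 1.96 * SE_GROUP, B_GROUP + 1.96 * SE_GROUP

print(round(jump, 4), round(t_ratio, 2))  # 9.8875 10.39
print(round(low, 2), round(high, 2))      # 8.02 11.75
```

The recomputed t-ratio matches the 10.39 reported in the table, which is a quick consistency check on the transcribed numbers.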
Final Fitted Model