Title: Problems with the Design and Implementation of Randomized Experiments
1. Problems with the Design and Implementation of Randomized Experiments
- By Larry V. Hedges, Northwestern University
Presented at the 2009 IES Research Conference
2. Hard Answers to Easy Questions
3. Easy Question
- Isn't it OK if I just match (schools) on some variable before randomizing?
  - (You know lots of people do it)
- This is a simple question, but answering it requires serious thinking about design and analysis
4. What Does this Question Mean?
- Generally, adding matching or blocking variables means adding another (blocking) factor to the design
- The exact consequences depend on the design you started with
  - Individually randomized (completely randomized design)
  - Cluster randomized (hierarchical design)
  - Multicenter or matched (randomized blocks design)
5. Individually Randomized (Completely Randomized) Design
- In this case you are adding a blocking factor crossed with treatment (p blocks)
- In other words, the design becomes a (generalized) randomized block design

     Blocks
     1     2     ...   p
T
C
6. Individually Randomized (Completely Randomized) Design
- How does this impact the analysis?
- Think about a balanced design with 2n students per block and p blocks, and the ANOVA partitioning of sums of squares and degrees of freedom
- Original partitioning
  - $SS_{Total} = SS_T + SS_{WT}$
  - $df_{Total} = df_T + df_{WT}$
  - $2pn - 1 = 1 + (2pn - 2)$
- Original test statistic
  - $F = SS_T/(SS_{WT}/df_{WT})$
7. Individually Randomized (Completely Randomized) Design
- New partitioning
  - $SS_{Total} = SS_T + SS_B + SS_{B \times T} + SS_{WC}$
  - $df_{Total} = df_T + df_B + df_{B \times T} + df_{WC}$
  - $2pn - 1 = 1 + (p - 1) + (p - 1) + 2p(n - 1)$
- New test statistic?
  - $F = SS_T/(SS_{WC}/df_{WC})$
  - Or
  - $F = SS_T/(SS_{B \times T}/df_{B \times T})$
- It depends on the inference model
8. Individually Randomized (Completely Randomized) Design

      Original Design                 Blocked Design
SS    $SS_T + SS_{WT}$                $SS_T + (SS_B + SS_{B \times T} + SS_{WC})$
df    $df_T + df_{WT}$                $df_T + (df_B + df_{B \times T} + df_{WC})$
      $2pn - 1 = 1 + (2pn - 2)$       $2pn - 1 = 1 + (p - 1) + (p - 1) + 2p(n - 1)$
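
The degrees-of-freedom bookkeeping in the blocked column can be checked mechanically. The sketch below is only an illustration: the function name `df_partition_blocked` and the values of `p` and `n` are hypothetical, and `n` is taken to be the number of students per treatment arm within each block, as in the balanced design described above.

```python
def df_partition_blocked(p: int, n: int) -> dict:
    """Degrees of freedom for a balanced blocked design.

    p = number of blocks, n = students per treatment arm per block
    (so 2n students per block and 2pn students in total).
    """
    return {
        "treatment": 1,
        "blocks": p - 1,
        "block_x_treatment": p - 1,
        "within_cells": 2 * p * (n - 1),
    }


# Hypothetical design sizes, chosen only for illustration.
p, n = 10, 20
df = df_partition_blocked(p, n)
total = 2 * p * n - 1

# The four components must add up to the total degrees of freedom.
assert sum(df.values()) == total
print(df, "total:", total)
```

With many students per block, almost all of the total degrees of freedom sit in the within-cell term, and only p - 1 belong to the block-by-treatment term.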
9. Inference Models
- I will mention two inference models
  - Conditional inference model
  - Unconditional inference model
- These inference models determine the type of inference (generalization) you wish to make
- The inference model chosen has implications for the statistical analysis procedure chosen
- The inference model determines the natural random effects
10. Inference Models
- Conditional inference model
  - Generalization is to the blocks actually in the experiment (or those just like them)
  - Blocks in the experiment are the universe (population)
  - Generalization to other blocks depends on extra-statistical considerations (which blocks are just like them? How do you know?)
  - Generalization obviously cannot be model free
11. Inference Models
- Unconditional inference model
  - Generalization is to a universe (of blocks) including blocks not in the experiment
  - Blocks in the experiment are a sample of blocks in the universe (population)
  - If blocks in the experiment can be considered a representative sample, inference to the population of blocks is by sampling theory
  - If blocks are not a probability sample, generalization gets tricky (what is the universe? How do you know?)
12. Inference Models
- You can think of the inference model as linked to the sampling model for blocks
  - If the blocks observed are a (random) sample of blocks, then they are a source of random variation
  - If the blocks observed are the entire universe of relevant blocks, then they are not a source of random variation
- The statistical analysis can be chosen independently of the inference model, but if it doesn't include all sources of random variation, inferences will be compromised
13. Inference Models and Statistical Analyses: Individually Randomized Design
- Blocks are fixed effects under the conditional inference model
  - In this case the correct test statistic is
  - $F_C = SS_T/(SS_{WC}/df_{WC})$
  - and the F-distribution has 1 and $2p(n - 1)$ df
- Block effects are random under the unconditional inference model
  - In this case the correct test statistic is
  - $F_U = SS_T/(SS_{B \times T}/df_{B \times T})$
  - and the F-distribution has 1 and $(p - 1)$ df
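
To make the two test statistics concrete, here is a minimal sketch that computes the balanced ANOVA partition and both F ratios from raw data. Everything specific in it is an assumption for illustration: the function name `blocked_anova`, the nested-list data layout, and the simulated effect sizes are hypothetical and do not come from the presentation.

```python
import random
from statistics import mean


def blocked_anova(data):
    """Balanced blocked ANOVA. data[j][t] lists n outcomes for block j, arm t (0 = C, 1 = T)."""
    p = len(data)
    n = len(data[0][0])
    cell = [[mean(data[j][t]) for t in (0, 1)] for j in range(p)]
    arm = [mean(cell[j][t] for j in range(p)) for t in (0, 1)]
    block = [mean(cell[j]) for j in range(p)]
    grand = mean(arm)

    ss_t = p * n * sum((arm[t] - grand) ** 2 for t in (0, 1))
    ss_b = 2 * n * sum((block[j] - grand) ** 2 for j in range(p))
    ss_bxt = n * sum(
        (cell[j][t] - arm[t] - block[j] + grand) ** 2
        for j in range(p) for t in (0, 1)
    )
    ss_wc = sum(
        (y - cell[j][t]) ** 2
        for j in range(p) for t in (0, 1) for y in data[j][t]
    )
    ss_total = sum(
        (y - grand) ** 2 for j in range(p) for t in (0, 1) for y in data[j][t]
    )
    df_wc = 2 * p * (n - 1)
    df_bxt = p - 1
    return {
        "ss_t": ss_t, "ss_b": ss_b, "ss_bxt": ss_bxt, "ss_wc": ss_wc,
        "ss_total": ss_total,
        "f_c": ss_t / (ss_wc / df_wc),    # blocks fixed: within-cell error term
        "f_u": ss_t / (ss_bxt / df_bxt),  # blocks random: block-by-treatment error term
        "df_c": (1, df_wc),
        "df_u": (1, df_bxt),
    }


# Hypothetical simulated data: 8 blocks, 15 students per arm per block.
rng = random.Random(2009)
p, n = 8, 15
data = []
for _ in range(p):
    block_effect = rng.gauss(0.0, 0.5)
    block_te = 0.4 + rng.gauss(0.0, 0.2)  # block-specific treatment effect
    data.append([
        [block_effect + t * block_te + rng.gauss(0.0, 1.0) for _ in range(n)]
        for t in (0, 1)
    ])

result = blocked_anova(data)
print("F_C =", round(result["f_c"], 2), "with df", result["df_c"])
print("F_U =", round(result["f_u"], 2), "with df", result["df_u"])
```

The numerator is the same in both ratios; only the error term and its degrees of freedom change with the inference model.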
14. Inference Models and Statistical Analyses: Individually Randomized Design
- You can see that the error term in the test has (a lot) more df under the fixed effects model: $2p(n - 1)$ versus $(p - 1)$
- What you can't see is that (if there is a treatment effect) the average value of the F-statistic is typically also larger under the fixed effects model
- It is bigger by a factor proportional to [expression not recoverable]
  - where $\omega = \sigma_{B \times T}^2/\sigma_B^2$ is a treatment heterogeneity parameter and $\rho$ is the intraclass correlation
15. Possible Statistical Analyses: Individually Randomized Design
- Possible statistical analyses
  - Ignore the blocking
  - Include blocks as fixed effects
  - Include blocks as random effects
- Consequences depend on whether you want to make a conditional or unconditional inference
16. Making Unconditional Inferences: Individually Randomized Design
- Possible statistical analyses
  - Ignore the blocking
    - Bad idea: will substantially inflate the significance levels of tests for treatment effects
  - Include blocks as fixed effects
    - Bad idea: will substantially inflate the significance levels of tests for treatment effects
  - Include blocks as random effects
    - Correct significance levels (but less power than the conditional analysis)
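
The inflation under the unconditional model can be seen in a small simulation. This is a sketch under stated assumptions, not the author's analysis: the function name `one_null_replication`, the design sizes, and the variance values are hypothetical. The average treatment effect over the universe of blocks is set to zero, but treatment effects vary from block to block. A central F statistic with 1 and ν denominator df has expected value ν/(ν - 2), which is close to 1 unless ν is small, so a null average far above that signals a test that rejects too often.

```python
import random
from statistics import mean, variance


def one_null_replication(rng, p=10, n=20, sd_bxt=0.5):
    """Simulate one experiment with zero average treatment effect; return (F_C, F_U)."""
    diffs = []       # block-level differences in cell means (T minus C)
    within_ss = 0.0  # pooled within-cell sum of squares
    for _ in range(p):
        block_effect = rng.gauss(0.0, 1.0)
        tau = rng.gauss(0.0, sd_bxt)  # block-specific treatment effect, mean zero
        cell_means = []
        for t in (0, 1):
            ys = [block_effect + t * tau + rng.gauss(0.0, 1.0) for _ in range(n)]
            m = sum(ys) / n
            within_ss += sum((y - m) ** 2 for y in ys)
            cell_means.append(m)
        diffs.append(cell_means[1] - cell_means[0])
    ss_t = p * n * mean(diffs) ** 2 / 2   # treatment sum of squares (two arms)
    ms_wc = within_ss / (2 * p * (n - 1))  # within-cell mean square
    ms_bxt = n * variance(diffs) / 2       # block-by-treatment mean square
    return ss_t / ms_wc, ss_t / ms_bxt


rng = random.Random(12345)
reps = 1000
results = [one_null_replication(rng) for _ in range(reps)]
mean_f_c = mean(f_c for f_c, _ in results)
mean_f_u = mean(f_u for _, f_u in results)
print(f"mean F with blocks as fixed effects:  {mean_f_c:.2f}")
print(f"mean F with blocks as random effects: {mean_f_u:.2f}")
```

Under these assumed variances the fixed-effects statistic averages well above 1 even though there is no average treatment effect, while the random-effects statistic stays near the value expected for 1 and p - 1 df.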
17. Making Conditional Inferences: Individually Randomized Design
- Possible statistical analyses
  - Ignore the blocking
    - Bad idea: may substantially deflate the actual significance levels of tests for treatment effects (unless $\rho = 0$)
  - Include blocks as fixed effects
    - Correct significance levels and a more powerful test than for the unconditional analysis
  - Include blocks as random effects
    - Bad idea: may deflate significance levels and reduce power
18. Cluster Randomized (Hierarchical) Design
- The issues about blocking in the cluster randomized design are the same as in the individually randomized design
- The inference model will determine the most appropriate statistical analysis
- Examining the properties of the statistical analysis may also reveal the weakness of the design for a given inference purpose
  - For example, a small number of blocks may provide only very uncertain inference to a universe of blocks based on sampling arguments
19. Cluster Randomized (Hierarchical) Design
- In this case you are adding a blocking factor crossed with treatment (p blocks), but clusters are still nested within treatments; here $C_{ij}$ is the jth cluster in the ith block
- Note that there are m clusters in each treatment per block

     Block 1                                 ...   Block p
     C11, ..., C1m    C1(m+1), ..., C1(2m)         Cp1, ..., Cpm    Cp(m+1), ..., Cp(2m)
T                     ---                                           ---
C    ---                                           ---
20. Cluster Randomized (Hierarchical) Design
- How does this impact the analysis?
- Think about a balanced design with 2mn students per block and p blocks, and the ANOVA partitioning of sums of squares and degrees of freedom
- Original partitioning
  - $SS_{Total} = SS_T + SS_C + SS_{WCT}$
  - $df_{Total} = df_T + df_C + df_{WCT}$
  - $2mn - 1 = 1 + 2(m - 1) + 2m(n - 1)$
- Original test statistic
  - $F = SS_T/(SS_C/df_C)$
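21. Cluster Randomized (Hierarchical) Design
- New partitioning
  - $SS_{Total} = SS_T + SS_B + SS_{B \times T} + SS_{CB \times T} + SS_{WC}$
  - $df_{Total} = df_T + df_B + df_{B \times T} + df_{CB \times T} + df_{WC}$
  - $2mpn - 1 = 1 + (p - 1) + (p - 1) + 2p(m - 1) + 2pm(n - 1)$
- New test statistic?
  - $F = SS_T/(SS_{WC}/df_{WC})$
  - $F = SS_T/(SS_{CB \times T}/df_{CB \times T})$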
22. Inference Models and Statistical Analyses: Cluster Randomized Design
- Blocks are fixed under the conditional inference model, but clusters are typically random
  - In this case the correct test statistic is
  - $F_C = SS_T/(SS_{CB \times T}/df_{CB \times T})$
  - and the F-distribution has 1 and $2p(m - 1)$ df
- Blocks are random under the unconditional inference model, and clusters are typically random as well
  - In this case there is no exact ANOVA test if there are block-by-treatment interactions, but a conservative test uses the test statistic
  - $F_U = SS_T/(SS_B/df_B)$
  - and the F-distribution has 1 and $(p - 1)$ df (large sample tests, e.g., based on HLM, are available)
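
As an illustration of the conditional test for the cluster randomized case, the sketch below computes the ratio of the treatment sum of squares to the mean square for clusters within block-by-treatment cells. The function name `cluster_blocked_fc` and the nested-list layout are assumptions made for this example, and the tiny data set is hypothetical, chosen so the arithmetic can be checked by hand.

```python
from statistics import mean


def cluster_blocked_fc(data):
    """Conditional F test for a balanced, blocked, cluster randomized design.

    data[j][t][k] is the list of n outcomes for cluster k in block j, arm t
    (0 = C, 1 = T), with m clusters per arm per block.
    Returns (F, denominator df).
    """
    p = len(data)
    m = len(data[0][0])
    n = len(data[0][0][0])
    cluster = [[[mean(c) for c in data[j][t]] for t in (0, 1)] for j in range(p)]
    cell = [[mean(cluster[j][t]) for t in (0, 1)] for j in range(p)]
    arm = [mean(cell[j][t] for j in range(p)) for t in (0, 1)]
    grand = mean(arm)

    # Treatment sum of squares: each arm contains p * m * n individuals.
    ss_t = p * m * n * sum((a - grand) ** 2 for a in arm)
    # Clusters within block-by-treatment cells: each cluster mean rests on n individuals.
    ss_cbxt = n * sum(
        (cluster[j][t][k] - cell[j][t]) ** 2
        for j in range(p) for t in (0, 1) for k in range(m)
    )
    df_cbxt = 2 * p * (m - 1)
    return ss_t / (ss_cbxt / df_cbxt), df_cbxt


# Hypothetical toy data: 1 block, 2 clusters per arm, 1 student per cluster.
toy = [[[[0.0], [2.0]], [[4.0], [6.0]]]]
f_value, df_den = cluster_blocked_fc(toy)
print(f_value, df_den)  # 8.0 2
```

In the toy case the arm means are 1 and 5, so the treatment sum of squares is 16, the cluster sum of squares is 4 on 2 df, and the ratio is 8.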
23. Inference Models and Statistical Analyses: Cluster Randomized Design
- You can see that the error term has more df under the fixed effects model
- If there is a treatment effect, the average value of the F-statistic is also larger under the fixed effects model
- It is bigger by a factor proportional to [expression not recoverable]
  - where $\omega_B = \sigma_{B \times T}^2/\sigma_B^2$ is a treatment heterogeneity parameter and $\rho_B$ and $\rho_C$ are the block and cluster level intraclass correlations, respectively
24. Possible Statistical Analyses: Cluster Randomized Design
- Possible statistical analyses
  - Ignore the blocking
  - Include blocks as fixed effects
  - Include blocks as random effects
- Consequences depend on whether you want to make a conditional or unconditional inference
25. Making Unconditional Inferences: Cluster Randomized Design
- Possible statistical analyses
  - Ignore the blocking
    - Bad idea: will substantially inflate the significance levels of tests for treatment effects
  - Include blocks as fixed effects
    - Bad idea: will substantially inflate the significance levels of tests for treatment effects
  - Include blocks as random effects
    - Correct significance levels, but less power than the conditional analysis
26. Making Conditional Inferences: Cluster Randomized Design
- Possible statistical analyses
  - Ignore the blocking
    - Bad idea: may substantially deflate the actual significance levels of tests for treatment effects
  - Include blocks as fixed effects
    - Correct significance levels and a more powerful test than for the unconditional analysis
  - Include blocks as random effects
    - Not such a bad idea: significance levels are unaffected
27. Multi-center (Randomized Blocks) Design
- The issues about blocking in the multicenter (randomized blocks) design are the same as in the cluster randomized design
- The inference model will determine the most appropriate statistical analysis
- Examining the properties of the statistical analysis may also reveal the weakness of the design for a given inference purpose
  - For example, a small number of blocks may provide only very uncertain inference to a universe of blocks based on sampling arguments
28. Multi-center (Randomized Blocks) Design
- In this case you are adding a blocking factor crossed with treatment (p blocks) and clusters, but clusters are still nested within blocks; here $C_{ij}$ is the jth cluster in the ith block
- Note that there are m clusters in each treatment per block and n individuals in each treatment in each cluster

     Block 1            ...   Block p
     C11   ...   C1m          Cp1   ...   Cpm
T
C
29. Multi-center (Randomized Blocks) Design
- How does this impact the analysis?
- Think about a balanced design with 2mn students per block, p blocks, and n individuals per cell, and the ANOVA partitioning of sums of squares and degrees of freedom
- Original partitioning
  - $SS_{Total} = SS_T + SS_C + SS_{T \times C} + SS_{WC}$
  - $df_{Total} = df_T + df_C + df_{T \times C} + df_{WC}$
  - $2pmn - 1 = 1 + (pm - 1) + (pm - 1) + 2pm(n - 1)$
- Original test statistic
  - $F = SS_T/(SS_{T \times C}/df_{T \times C})$
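30. Multi-center (Randomized Blocks) Design
- New partitioning
  - $SS_{Total} = SS_T + SS_B + SS_{CB} + SS_{B \times T} + SS_{CB \times T} + SS_{WC}$
  - $df_{Total} = df_T + df_B + df_{CB} + df_{B \times T} + df_{CB \times T} + df_{WC}$
  - $2mpn - 1 = 1 + (p - 1) + p(m - 1) + (p - 1) + p(m - 1) + 2pm(n - 1)$
- New test statistic?
  - $F = SS_T/(SS_{WC}/df_{WC})$
  - $F = SS_T/(SS_{B \times T}/df_{B \times T})$
  - $F = SS_T/(SS_{CB \times T}/df_{CB \times T})$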
31. Inference Models and Statistical Analyses: Randomized Blocks Design
- Blocks are fixed under the conditional inference model, but clusters are typically random
  - In this case the correct test statistic is
  - $F_C = SS_T/(SS_{CB \times T}/df_{CB \times T})$
  - and the F-distribution has 1 and $p(m - 1)$ df
- Blocks are random under the unconditional inference model, and clusters are typically random as well
  - In this case the correct test statistic is
  - $F_U = SS_T/(SS_{B \times T}/df_{B \times T})$
  - and the F-distribution has 1 and $(p - 1)$ df
32. Inference Models and Statistical Analyses: Randomized Blocks Design
- You can see that the error term has more df under the fixed effects model
- If there is a treatment effect, the average value of the F-statistic is also larger under the fixed effects model
- It is bigger by a factor proportional to [expression not recoverable]
  - where $\omega_B = \sigma_{B \times T}^2/\sigma_B^2$ and $\omega_C = \sigma_{C \times T}^2/\sigma_C^2$ are treatment heterogeneity parameters and $\rho_B$ and $\rho_C$ are the block and cluster level intraclass correlations, respectively
33. Possible Statistical Analyses: Randomized Blocks Design
- Possible statistical analyses
  - Ignore the blocking
  - Include blocks as fixed effects
  - Include blocks as random effects
- Consequences depend on whether you want to make a conditional or unconditional inference
34. Making Unconditional Inferences: Randomized Blocks Design
- Possible statistical analyses
  - Ignore the blocking
    - Bad idea: will substantially inflate the significance levels of tests for treatment effects
  - Include blocks as fixed effects
    - Bad idea: will substantially inflate the significance levels of tests for treatment effects
  - Include blocks as random effects
    - Correct significance levels, but less power than the conditional analysis
35. Making Conditional Inferences: Randomized Blocks Design
- Possible statistical analyses
  - Ignore the blocking
    - Bad idea: may substantially deflate the actual significance levels of tests for treatment effects
  - Include blocks as fixed effects
    - Correct significance levels and a more powerful test than for the unconditional analysis
  - Include blocks as random effects
    - Bad idea: may deflate significance levels and reduce power
36. Another Easy Question
- There was some attrition from my study after assignment. Does that cause a serious problem?
- This is another simple question, but the answer is far from simple. One answer can be framed using concepts of experimental design
37. Post Assignment Attrition
- A different question has a simple answer
  - Does that (attrition) cause a problem in principle?
- The simple answer to that question is YES!
- Randomized experiments with attrition no longer give model-free, unbiased estimates of the causal effect of treatment
- Whether the bias is serious or not depends on the model that generates the missing data
38. Post Assignment Attrition
- The design is changed by adding a crossed factor corresponding to missingness, like this:

     Observed   Missing
T
C

- Now we can see a problem with estimating the treatment effect from only the observed part of the design: the observed treatment effect is only part of the total treatment effect
39. Post Assignment Attrition
- Suppose that the means are given by the $\mu$'s and the proportions are given by the $\pi$'s

     Observed                      Missing
     Proportion     Mean           Proportion   Mean
T    $1 - \pi_T$    $\mu_{TO}$     $\pi_T$      $\mu_{TM}$
C    $1 - \pi_C$    $\mu_{CO}$     $\pi_C$      $\mu_{CM}$
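40. Post Assignment Attrition
- The treatment effect on all individuals randomized is
  - $\delta = [(1 - \pi_T)\mu_{TO} + \pi_T\mu_{TM}] - [(1 - \pi_C)\mu_{CO} + \pi_C\mu_{CM}]$
- When the proportion of dropouts is equal in T and C, so that
  - $\pi_T = \pi_C = \pi$
- the mean of the treatment effect on all individuals randomized is
  - $\delta = (1 - \pi)(\mu_{TO} - \mu_{CO}) + \pi(\mu_{TM} - \mu_{CM})$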
41. Post Assignment Attrition
- Rewriting this, we see that the average treatment effect for individuals assigned to treatment is
  - $\delta = (1 - \pi)\delta_O + \pi\delta_M$
- where $\delta_O$ is the treatment effect among the individuals that are observed, $\delta_M$ is the treatment effect among the individuals that are not observed, $\pi$ is the proportion of dropouts, and $\delta$ is the treatment effect among all individuals assigned
- Thus bounds on $\delta_M$ imply bounds on $\delta$
42. Post Assignment Attrition
- No estimate of the treatment effect is possible without an estimate of the treatment effect among the missing individuals
- One possibility is to model (assume) that we know something about the treatment effect in the missing individuals
- We can assume a range of values to get bounds on the possible treatment effect
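
With equal attrition in both arms, the overall effect is a weighted average of the effect among observed individuals and the effect among missing individuals, with the dropout proportion as the weight on the missing group. The sketch below turns an assumed range for the effect among the missing into bounds on the overall effect. The function names and all numbers are hypothetical and serve only to show the arithmetic.

```python
def overall_effect(delta_obs: float, delta_miss: float, pi: float) -> float:
    """Weighted average of observed and missing effects; pi is the dropout proportion."""
    return (1 - pi) * delta_obs + pi * delta_miss


def effect_bounds(delta_obs: float, pi: float, miss_low: float, miss_high: float):
    """Bounds on the overall effect implied by an assumed range for the missing group."""
    return (
        overall_effect(delta_obs, miss_low, pi),
        overall_effect(delta_obs, miss_high, pi),
    )


# Hypothetical values: observed effect 0.30, 20% attrition in each arm,
# and an assumed range of -0.10 to 0.30 for the effect among dropouts.
low, high = effect_bounds(delta_obs=0.30, pi=0.20, miss_low=-0.10, miss_high=0.30)
print(f"overall effect lies between {low:.2f} and {high:.2f}")
```

The width of the interval is the dropout proportion times the width of the assumed range, so low attrition or a tight assumption about the missing group both narrow the bounds.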
43. Post Assignment Attrition
- When the attrition rate is not the same in the treatment groups ($\pi_T \neq \pi_C$), the analysis is trickier
- One idea is to convince ourselves that the treatment effect for those who drop out is the same as for those who do not

       Observed   Missing
       Mean       Mean
T      90         33
C      67         10
T-C    23         23
45. Post Assignment Attrition
- This does not assure that attrition has not altered the treatment effect
- We have to know both $\mu_{TM}$ and $\mu_{CM}$ to identify the treatment effect; knowing $\delta_M = \mu_{TM} - \mu_{CM}$ is not enough

       Observed        Missing         Total
       n     Mean      n     Mean      n     Mean
T      10    90        90    33        100   39
C      90    67        10    10        100   61
T-C          23              23              -23
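
The totals in this table follow from weighting each group mean by its sample size. The short check below reproduces them; the helper name `total_mean` is hypothetical, while the counts and means are the ones shown in the table.

```python
def total_mean(n_obs: int, mean_obs: float, n_miss: int, mean_miss: float) -> float:
    """Mean over everyone assigned, weighting observed and missing groups by size."""
    return (n_obs * mean_obs + n_miss * mean_miss) / (n_obs + n_miss)


t_total = total_mean(10, 90, 90, 33)  # treatment arm: mostly missing
c_total = total_mean(90, 67, 10, 10)  # control arm: mostly observed

print(f"T total mean: {t_total:.1f}")              # 38.7
print(f"C total mean: {c_total:.1f}")              # 61.3
print(f"T - C overall: {t_total - c_total:.1f}")   # -22.6
```

The effect is 23 among the observed and 23 among the missing, yet the overall difference is about -23, because the two arms mix observed and missing individuals in very different proportions.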
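46. Post Assignment Attrition
- Suppose that
  - $B^L_{TM}$ and $B^L_{CM}$ are lower bounds on the means for missing individuals in the treatment and control groups, and
  - $B^U_{TM}$ and $B^U_{CM}$ are the upper bounds
- Then the upper and lower bounds on the treatment effect are
  - Lower: $[(1 - \pi_T)\mu_{TO} + \pi_T B^L_{TM}] - [(1 - \pi_C)\mu_{CO} + \pi_C B^U_{CM}]$
  - Upper: $[(1 - \pi_T)\mu_{TO} + \pi_T B^U_{TM}] - [(1 - \pi_C)\mu_{CO} + \pi_C B^L_{CM}]$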
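
One way to compute such bounds is to pair the lowest plausible treatment-arm mean with the highest plausible control-arm mean for the lower bound, and the reverse for the upper bound. The sketch below does this under stated assumptions: the function names are hypothetical, the observed means and attrition rates are borrowed from the earlier numerical example, and the 0 to 100 outcome range used to bound the missing means is an assumption made here for illustration.

```python
def assigned_mean(mean_obs: float, mean_miss: float, pi_miss: float) -> float:
    """Mean over all individuals assigned to an arm; pi_miss is the arm's dropout proportion."""
    return (1 - pi_miss) * mean_obs + pi_miss * mean_miss


def attrition_bounds(mu_to, mu_co, pi_t, pi_c, t_bounds, c_bounds):
    """Lower and upper bounds on the treatment effect among all individuals assigned.

    t_bounds and c_bounds are (low, high) bounds on the means of the
    missing individuals in the treatment and control arms.
    """
    t_low, t_high = t_bounds
    c_low, c_high = c_bounds
    lower = assigned_mean(mu_to, t_low, pi_t) - assigned_mean(mu_co, c_high, pi_c)
    upper = assigned_mean(mu_to, t_high, pi_t) - assigned_mean(mu_co, c_low, pi_c)
    return lower, upper


# Observed means 90 (T) and 67 (C); 90% missing in T and 10% missing in C.
# Assumption for illustration: outcomes are scores that must lie between 0 and 100.
lower, upper = attrition_bounds(
    mu_to=90, mu_co=67, pi_t=0.9, pi_c=0.1,
    t_bounds=(0, 100), c_bounds=(0, 100),
)
print(f"treatment effect lies between {lower:.1f} and {upper:.1f}")
```

With attrition this heavy and only the outcome range to constrain the missing means, the bounds are very wide; tighter assumptions about the missing individuals are what make them informative.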
47. Post Assignment Attrition
- Note that none of the results on attrition involve sampling or estimation error
- Results get more complex if we take this into account, but the basic ideas are the same as those presented here
48. Conclusions
- Many simple questions arise in connection with field experiments
- The answers to these questions often require thinking through complex aspects of
  - the design
  - the inference model
  - assumptions about missing data
- No correct answers are possible without recognizing these complexities