Sample Design for Group-Randomized Trials (Presentation Transcript)
1
Sample Design for Group-Randomized Trials
  • Howard S. Bloom
  • Chief Social Scientist
  • MDRC
  • Prepared for the IES/NCER Summer Research
    Training Institute held at Northwestern
    University on July 27, 2010.

2
Today we will examine
  • Sample size determinants
  • Precision requirements
  • Sample allocation
  • Covariate adjustments
  • Matching and blocking
  • Subgroup analyses
  • Generalizing findings for sites and blocks
  • Using two-level data for three-level situations

3
Part I
  • The Basics

4
Statistical properties of group-randomized impact
estimators
  • Unbiased estimates
  • Y_ij = α + B₀T_j + e_j + ε_ij
  • E(b₀) = B₀
  • Less precise estimates
  • VAR(ε_ij) = σ²
  • VAR(e_j) = τ²
  • ρ = τ²/(τ² + σ²)
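
A quick way to see these variance components at work is simulation. Below is a minimal sketch (all parameter values are assumed for illustration) that generates data from this model and recovers ρ by the method of moments:

```python
# Minimal simulation of Y_ij = alpha + B0*T_j + e_j + eps_ij with assumed
# parameter values, checking that rho = tau^2 / (tau^2 + sigma^2) is recovered.
import numpy as np

rng = np.random.default_rng(0)
J, n = 200, 50                  # randomized groups, individuals per group
alpha, B0 = 0.0, 0.2            # intercept and true treatment effect
tau, sigma = 0.2, 0.87          # SDs of group-level and individual-level errors

T = rng.permutation(np.repeat([0, 1], J // 2))             # balanced assignment
Y = (alpha + B0 * T[:, None] + rng.normal(0, tau, (J, 1))
     + rng.normal(0, sigma, (J, n)))

within = Y.var(axis=1, ddof=1).mean()                      # estimates sigma^2
between = Y[T == 0].mean(axis=1).var(ddof=1) - within / n  # estimates tau^2
print(f"true rho = {tau**2 / (tau**2 + sigma**2):.3f}, "
      f"estimated rho = {between / (between + within):.3f}")
```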

5
Design Effect (for a given total number of
individuals)

  Intraclass        Individuals per Group (n)
  Correlation (ρ)   10      50      500
  _____________________________________________
  0.01              1.04    1.22    2.48
  0.05              1.20    1.86    5.09
  0.10              1.38    2.43    7.13
  _____________________________________________
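
A sketch reproducing this table, assuming the usual design effect on the standard error for a fixed total number of individuals, √(1 + (n − 1)ρ). The values match to rounding, except that the table's 2.48 comes out as 2.45 under this formula, possibly a rounding slip in the original:

```python
# Design effect on the standard error for a fixed total number of
# individuals, assumed here to be sqrt(1 + (n - 1) * rho).
import math

print("rho     n=10    n=50    n=500")
for rho in (0.01, 0.05, 0.10):
    cells = "    ".join(f"{math.sqrt(1 + (n - 1) * rho):.2f}" for n in (10, 50, 500))
    print(f"{rho:.2f}    {cells}")
```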

6
Sample design parameters
  • Number of randomized groups (J)
  • Number of individuals per randomized group (n)
  • Proportion of groups randomized to program status
    (P)

7
Reporting precision
  • A minimum detectable effect (MDE) is the smallest
    true effect that has a good chance of being
    found to be statistically significant.
  • We typically define an MDE as the smallest true
    effect that has 80 percent power for a two-tailed
    test of statistical significance at the 0.05
    level.
  • An MDE is reported in natural units, whereas a
    minimum detectable effect size (MDES) is reported
    in units of standard deviations.
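
For example, with an outcome whose standard deviation is 40 scale-score points (a value assumed here for illustration), an MDES of 0.25 corresponds to an MDE of 0.25 × 40 = 10 points.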

8
Minimum Detectable Effect Sizes for a
Group-Randomized Design with ρ = 0.05 and No
Covariates

  Randomized     Individuals per Group (n)
  Groups (J)     10      50      500
  ___________________________________________
  10             0.77    0.53    0.46
  20             0.50    0.35    0.30
  40             0.35    0.24    0.21
  120            0.20    0.14    0.12
  ___________________________________________
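
A sketch that reproduces this table, assuming the standard MDES formula for a balanced two-level design without covariates, with 80 percent power and a two-tailed test at the 0.05 level:

```python
# MDES for a group-randomized design without covariates, assuming
# MDES = M_{J-2} * sqrt( rho/(P(1-P)J) + (1-rho)/(P(1-P)Jn) ),
# where M_{J-2} = t_{alpha/2, J-2} + t_{power, J-2}.
from scipy.stats import t

def mdes(J, n, rho=0.05, P=0.5, alpha=0.05, power=0.80):
    df = J - 2
    M = t.ppf(1 - alpha / 2, df) + t.ppf(power, df)   # df multiplier
    return M * (rho / (P * (1 - P) * J)
                + (1 - rho) / (P * (1 - P) * J * n)) ** 0.5

for J in (10, 20, 40, 120):
    print(J, [round(mdes(J, n), 2) for n in (10, 50, 500)])
```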

9
Implications for sample design
  • It is extremely important to randomize an
    adequate number of groups.
  • It is often far less important how many
    individuals per group you have.

10
Part II
  • Determining required precision

11
When assessing how much precision is needed
  • Always ask: relative to what?
  • Program benefits
  • Program costs
  • Existing outcome differences
  • Past program performance

12
Effect Size Gospel According to Cohen and Lipsey

           Cohen            Lipsey
           (speculative)    (empirical)
  _______________________________________________
  Small    0.2σ             0.15σ
  Medium   0.5σ             0.45σ
  Large    0.8σ             0.90σ

13
Five-year impacts of the Tennessee class-size
experiment
  • Treatment
  • 13-17 versus 22-26 students per class
  • Effect sizes
  • 0.11σ to 0.22σ for reading and math
  • Findings are summarized from Nye, Barbara, Larry
    V. Hedges and Spyros Konstantopoulos (1999) "The
    Long-Term Effects of Small Classes: A Five-Year
    Follow-up of the Tennessee Class Size
    Experiment," Educational Evaluation and Policy
    Analysis, 21(2): 127-142.

14
Annual reading and math growth

  Grade          Reading Growth    Math Growth
  Transition     Effect Size       Effect Size
  --------------------------------------------
  K - 1          1.52              1.14
  1 - 2          0.97              1.03
  2 - 3          0.60              0.89
  3 - 4          0.36              0.52
  4 - 5          0.40              0.56
  5 - 6          0.32              0.41
  6 - 7          0.23              0.30
  7 - 8          0.26              0.32
  8 - 9          0.24              0.22
  9 - 10         0.19              0.25
  10 - 11        0.19              0.14
  11 - 12        0.06              0.01
  --------------------------------------------

Based on work in progress using documentation on
the national norming samples for the CAT5, SAT9,
Terra Nova CTBS, Gates MacGinitie (for reading
only), MAT8, Terra Nova CAT, and SAT10. 95%
confidence intervals range from ±0.03 to ±0.15 in
reading and from ±0.03 to ±0.22 in math.
15
Performance gap between average (50th percentile)
and weak (10th percentile) schools

  Subject and grade   District I   District II   District III   District IV
  Reading
    Grade 3           0.31         0.18          0.16           0.43
    Grade 5           0.41         0.18          0.35           0.31
    Grade 7           0.25         0.11          0.30           NA
    Grade 10          0.07         0.11          NA             NA
  Math
    Grade 3           0.29         0.25          0.19           0.41
    Grade 5           0.27         0.23          0.36           0.26
    Grade 7           0.20         0.15          0.23           NA
    Grade 10          0.14         0.17          NA             NA

Source: District I outcomes are based on ITBS
scaled scores, District II on SAT 9 scaled
scores, District III on MAT NCE scores, and
District IV on SAT 8 NCE scores.
16
Demographic performance gap in reading and math:
Main NAEP scores

  Subject and grade   Black-    Hispanic-   Male-     Eligible-Ineligible for
                      White     White       Female    free/reduced-price lunch
  Reading
    Grade 4           -0.83     -0.77       -0.18     -0.74
    Grade 8           -0.80     -0.76       -0.28     -0.66
    Grade 12          -0.67     -0.53       -0.44     -0.45
  Math
    Grade 4           -0.99     -0.85       0.08      -0.85
    Grade 8           -1.04     -0.82       0.04      -0.80
    Grade 12          -0.94     -0.68       0.09      -0.72

Source: U.S. Department of Education, Institute
of Education Sciences, National Center for
Education Statistics, National Assessment of
Educational Progress (NAEP), 2002 Reading
Assessment and 2000 Mathematics Assessment.
17
ES Results from Randomized Studies

  Achievement Measure              n      Mean
  Elementary School                389    0.33
    Standardized test (Broad)      21     0.07
    Standardized test (Narrow)     181    0.23
    Specialized Topic/Test         180    0.44
  Middle Schools                   36     0.51
  High Schools                     43     0.27
18
Part III
  • The ABCs of Sample Allocation

19
Sample allocation alternatives
  • Balanced allocation
  • maximizes precision for a given sample size
  • maximizes robustness to distributional
    assumptions
  • Unbalanced allocation
  • precision erodes slowly with imbalance for a
    given sample size
  • imbalance can facilitate a larger sample
  • imbalance can facilitate randomization

20
Variance relationships for the program and
control groups
  • Equal variances when the program does not affect
    the outcome variance.
  • Unequal variances when the program does affect
    the outcome variance.

21
MDES for equal variances without covariates
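
(Equation not transcribed. Presumably it is the standard expression, cf. Bloom 2005 in the Further References, with M_{J-2}, J, n, P, and ρ as defined earlier:

  MDES = M_{J-2} √[ ρ/(P(1−P)J) + (1−ρ)/(P(1−P)Jn) ]  )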
22
How allocation affects MDES
23
Minimum Detectable Effect Size for Sample
Allocations Given Equal Variances

  Allocation   Example   Ratio to Balanced Allocation
  ________________________________________
  0.5/0.5      0.54σ     1.00
  0.6/0.4      0.55σ     1.02
  0.7/0.3      0.59σ     1.09
  0.8/0.2      0.68σ     1.25
  0.9/0.1      0.91σ     1.67
  ________________________________________
  Example is for n = 20, J = 10, ρ = 0.05, a
  one-tail hypothesis test, and no covariates.
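
A sketch reproducing the ratio column. With equal variances the MDES scales with 1/√(P(1−P)), so holding J and n fixed, the ratio to a balanced allocation depends only on the treatment proportion P:

```python
# Ratio of the MDES under an allocation P/(1-P) to the balanced 0.5/0.5
# case: sqrt( 0.25 / (P * (1 - P)) ), everything else held fixed.
import math

for P in (0.5, 0.6, 0.7, 0.8, 0.9):
    ratio = math.sqrt(0.25 / (P * (1 - P)))
    print(f"{P:.1f}/{1 - P:.1f}: {ratio:.2f}")
```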

24
Implications of unbalanced allocations with
unequal variances
25
Implications Continued
  • The estimated standard error is unbiased
  • When the allocation is balanced
  • When the variances are equal
  • The estimated standard error is biased upward
  • When the larger sample has the larger variance
  • The estimated standard error is biased downward
  • When the larger sample has the smaller variance

26
Interim Conclusions
  • Don't use the equal variance assumption for an
    unbalanced allocation with many degrees of
    freedom.
  • Use a balanced allocation when there are few
    degrees of freedom.

27
References
  • Gail, Mitchell H., Steven D. Mark, Raymond J.
    Carroll, Sylvan B. Green and David Pee (1996) "On
    Design Considerations and Randomization-Based
    Inferences for Community Intervention Trials,"
    Statistics in Medicine, 15: 1069-1092.
  • Bryk, Anthony S. and Stephen W. Raudenbush (1988)
    "Heterogeneity of Variance in Experimental
    Studies: A Challenge to Conventional
    Interpretations," Psychological Bulletin, 104(3):
    396-404.

28
Part IV
  • Using Covariates to Reduce
  • Sample Size

29
Basic ideas
  • Goal: reduce the number of clusters randomized
  • Approach: reduce the standard error of the impact
    estimator by controlling for baseline covariates
  • Alternative covariates
  • Individual-level
  • Cluster-level
  • Pretests
  • Other characteristics

30
Impact Estimation with a Covariate
  • y_ij = the outcome for student i from school j
  • T_j = 1 for treatment schools and 0 for control
    schools
  • X_j = a covariate for school j
  • x_ij = a covariate for student i from school j
  • e_j = a random error term for school j
  • ε_ij = a random error term for student i from
    school j
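
(Estimation equation not transcribed. Given these definitions, it is presumably the covariate-adjusted version of the model from Part I:

  y_ij = α + B₀T_j + B₁X_j + B₂x_ij + e_j + ε_ij  )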

31
Minimum Detectable Effect Size with a Covariate
  • MDES = minimum detectable effect size
  • M_{J-K} = a degrees-of-freedom multiplier¹
  • J = the total number of schools randomized
  • n = the number of students in a grade per school
  • P = the proportion of schools randomized to
    treatment
  • ρ = the unconditional intraclass correlation
    (without a covariate)
  • R₁² = the proportion of variance across
    individuals within schools (at level 1) predicted
    by the covariate
  • R₂² = the proportion of variance across schools
    (at level 2) predicted by the covariate
  • ¹ For 20 or more degrees of freedom, M_{J-K} equals
    2.8 for a two-tail test and 2.5 for a one-tail
    test with statistical power of 0.80 and
    statistical significance of 0.05.
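
(Formula not transcribed. From the Bloom, Richburg-Hayes and Black (2007) source cited below, the expression these parameters plug into is:

  MDES = M_{J-K} √[ ρ(1−R₂²)/(P(1−P)J) + (1−ρ)(1−R₁²)/(P(1−P)Jn) ]  )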

32
Questions Addressed Empirically about the
Predictive Power of Covariates
  • School-level vs. student-level pretests
  • Earlier vs. later follow-up years
  • Reading vs. math
  • Elementary vs. middle vs. high school
  • All schools vs. low-income schools vs.
    low-performing schools

33
Empirical Analysis
  • Estimate ρ, R₂² and R₁² from data on thousands of
    students from hundreds of schools, across multiple
    years, in five urban school districts
  • Summarize these estimates for reading and math in
    grades 3, 5, 8 and 10
  • Compute implications for minimum detectable
    effect sizes

34
Estimated Parameters for Reading with a
School-level Pretest Lagged One Year

                     School District
             A       B       C       D       E
  ___________________________________________________
  Grade 3
    ρ        0.20    0.15    0.19    0.22    0.16
    R₂²      0.31    0.77    0.74    0.51    0.75
  Grade 5
    ρ        0.25    0.15    0.20    NA      0.12
    R₂²      0.33    0.50    0.81    NA      0.70
  Grade 8
    ρ        0.18    NA      0.23    NA      NA
    R₂²      0.77    NA      0.91    NA      NA
  Grade 10
    ρ        0.15    NA      0.29    NA      NA
    R₂²      0.93    NA      0.95    NA      NA
  ___________________________________________________

35
Minimum Detectable Effect Sizes for Reading with
a School-Level Pretest (Y-1) or a Student-Level
Pretest (y-1) Lagged One Year

                    Grade 3   Grade 5   Grade 8   Grade 10
  ________________________________________________________
  20 schools randomized
    No covariate    0.57      0.56      0.61      0.62
    Y-1             0.37      0.38      0.24      0.16
    y-1             0.38      0.40      0.28      0.15
  40 schools randomized
    No covariate    0.39      0.38      0.42      0.42
    Y-1             0.26      0.26      0.17      0.11
    y-1             0.26      0.27      0.19      0.10
  60 schools randomized
    No covariate    0.32      0.31      0.34      0.34
    Y-1             0.21      0.21      0.13      0.09
    y-1             0.21      0.22      0.15      0.08
  ________________________________________________________

36
Key Findings
  • Using a pretest improves precision dramatically.
  • This improvement increases appreciably from
    elementary school to middle school to high school
    because R22 increases.
  • School-level pretests produce as much precision
    as do student-level pretests.
  • The effect of a pretest declines somewhat as the
    time between it and the post-test increases.
  • Adding a second pretest increases precision
    slightly.
  • Using a pretest for a different subject increases
    precision substantially.
  • Narrowing the sample to schools that are similar
    to each other does not improve precision beyond
    that achieved by a pretest.

37
Source
  • Bloom, Howard S., Lashawn Richburg-Hayes and
    Alison Rebeck Black (2007) "Using Covariates to
    Improve Precision for Studies that Randomize
    Schools to Evaluate Educational Interventions,"
    Educational Evaluation and Policy Analysis,
    29(1): 30-59.

38
Part V
  • The Putative Power of Pairing
  • A Tail of Two Tradeoffs
  • ("It was the best of techniques. It was the worst
    of techniques." Who the dickens said that?)

39
Pairing
  • Why match pairs?
  • for face validity
  • for precision
  • How to match pairs?
  • rank order clusters by covariate
  • pair clusters in rank-ordered list
  • randomize clusters in each pair

40
When to pair?
  • When the gain in predictive power outweighs the
    loss of degrees of freedom
  • Degrees of freedom
  • J - 2 without pairing
  • J/2 - 1 with pairing

41
Deriving the Minimum Required Predictive Power
of Pairing
  • Without pairing
  • With pairing
  • Breakeven R²
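
(Derivation not transcribed. A sketch of the logic, with the degrees-of-freedom multipliers M defined as on the earlier MDES slides: without pairing the MDES is proportional to M_{J-2}; with pairing, blocking removes a share R² of the between-cluster variance but halves the degrees of freedom, so the MDES is proportional to M_{J/2-1}·√(1 − R²). Setting the two equal gives the breakeven point:

  R²_min = 1 − (M_{J-2} / M_{J/2-1})²

Pairing pays off only when its predictive power R² exceeds R²_min.)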

42
The Minimum Required Predictive Power of Pairing
  Randomized     Required Predictive
  Clusters (J)   Power (R²_min)
  6              0.52
  8              0.35
  10             0.26
  20             0.11
  30             0.07
  For a two-tail test.
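
A sketch reproducing this table from the breakeven condition sketched on the previous slide, using multipliers for 80 percent power and a two-tailed test at the 0.05 level:

```python
# Breakeven predictive power of pairing:
# R^2_min = 1 - (M_{J-2} / M_{J/2-1})^2.
from scipy.stats import t

def M(df, alpha=0.05, power=0.80):
    return t.ppf(1 - alpha / 2, df) + t.ppf(power, df)

for J in (6, 8, 10, 20, 30):
    print(f"J = {J:2d}: R2_min = {1 - (M(J - 2) / M(J // 2 - 1)) ** 2:.2f}")
```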

43
A few key points about blocking
  • Blocking for face validity vs. blocking for
    precision
  • Treating blocks as fixed effects vs. random
    effects
  • Defining blocks using baseline information

44
Part VI
  • Subgroup Analyses 1
  • When to Emphasize Them

45
Confirmatory vs. Exploratory Findings
  • Confirmatory: draw conclusions about the
    program's effectiveness if results are
  • consistent with theory and contextual factors
  • statistically significant and large
  • and the subgroup was pre-specified
  • Exploratory: develop hypotheses for further study

46
Pre-specification
  • Before the analysis, state that conclusions about
    the program will be based in part on findings for
    this set of subgroups
  • Pre-specification can be based on
  • Theory
  • Prior evidence
  • Policy relevance

47
Statistical significance
  • When should we discuss subgroup findings?
  • It depends on
  • whether there are significant differences in
    impacts across subgroups
  • whether impacts for the full sample are
    statistically significant

48
Part VII
  • Subgroup Analyses 2
  • Creating Subgroups

49
Defining Features
  • Creating subgroups in terms of
  • Program characteristics
  • Randomized group characteristics
  • Individual characteristics

50
Defining Subgroups by Program Characteristics
  • Based only on program features that were
    randomized
  • Thus one cannot use implementation quality

51
Defining Subgroups by Characteristics Of
Randomized Groups
  • Types of impacts
  • Net impacts
  • Differential impacts
  • Internal validity
  • only use pre-existing characteristics
  • Precision
  • Net impact estimates are limited by reduced
    number of randomized groups
  • Differential impact estimates are triply limited
    (and often need four times as many randomized
    groups)

52
Defining Subgroups by Characteristics of
Individuals
  • Types of impacts
  • Net impacts
  • Differential impacts
  • Internal validity
  • Only use pre-existing characteristics
  • Only use subgroups with sample members from all
    randomized groups
  • Precision
  • for net impacts, precision can be almost as good
    as for the full sample
  • for differential impacts, precision can be even
    better than for the full sample

54
Part VIII
  • Generalizing Results from
  • Multiple Sites and Blocks

55
Fixed vs. Random Effects Inference: A Vexing Issue
  • Known vs. unknown populations
  • Broader vs. narrower inferences
  • Weaker vs. stronger precision
  • Few vs. many sites or blocks

56
Weighting Sites and Blocks
  • Implicitly through a pooled regression
  • Explicitly based on
  • Number of schools
  • Number of students
  • Explicitly based on precision
  • Fixed effects
  • Random effects
  • Bottom line: the question addressed is what counts

57
Part IX
  • Using Two-Level Data for Three-Level Situations

58
The Issue
  • General question: What happens when you design a
    study with randomized groups that comprise three
    levels based on data that do not account
    explicitly for the middle level?
  • Specific example: What happens when you design a
    study that randomizes schools (with students
    clustered in classrooms within schools) based on
    data for students clustered in schools?

59
3-level vs. 2-level Variance Components
60
3-level vs. 2-level MDES for Original Sample
61
Further References
  • Bloom, Howard S. (2005) "Randomizing Groups to
    Evaluate Place-Based Programs," in Howard S.
    Bloom, editor, Learning More From Social
    Experiments: Evolving Analytic Approaches (New
    York: Russell Sage Foundation).
  • Bloom, Howard S., Lashawn Richburg-Hayes and
    Alison Rebeck Black (2005) Using Covariates to
    Improve Precision: Empirical Guidance for Studies
    that Randomize Schools to Measure the Impacts of
    Educational Interventions (New York: MDRC).
  • Donner, Allan and Neil Klar (2000) Cluster
    Randomization Trials in Health Research (London:
    Arnold).
  • Hedges, Larry V. and Eric C. Hedberg (2006)
    Intraclass Correlation Values for Planning Group
    Randomized Trials in Education (Chicago:
    Northwestern University).
  • Murray, David M. (1998) Design and Analysis of
    Group-Randomized Trials (New York: Oxford
    University Press).
  • Raudenbush, Stephen W., Andres Martinez and
    Jessaca Spybrook (2005) Strategies for Improving
    Precision in Group-Randomized Experiments
    (University of Chicago).
  • Raudenbush, Stephen W. (1997) "Statistical
    Analysis and Optimal Design for Cluster
    Randomized Trials," Psychological Methods, 2(2):
    173-185.
  • Schochet, Peter Z. (2005) Statistical Power for
    Random Assignment Evaluations of Education
    Programs (Princeton, NJ: Mathematica Policy
    Research).