Title: Sample size and analytical issues for cluster trials
- David Torgerson
- Director, York Trials Unit
- djt6_at_york.ac.uk
- www.rcts.org
Background
- For any trial we want to make it sufficiently large that, if there were a true difference between the groups, this difference would be statistically significant.
- A Type II error occurs when we wrongly conclude there is no difference when there actually is one.
Sample size calculations
- "Most hand calculations diabolically strain human limits, even for the easiest formula..." (Schulz & Grimes, Lancet 2005)
Sample size formulae
- A computer is usually needed to calculate sample size. However, a simple approximation for a two-armed randomised trial with 1:1 allocation and a continuous outcome (e.g., blood pressure) is $N = 32/d^2$ participants in total (80% power, two-sided 5% significance), where d = effect size (difference/standard deviation).
Example
- We want to investigate a treatment for back pain. The outcome measure is the Roland and Morris back pain scale, which has a standard deviation of 4. If we want to detect a 2-point difference, how many participants do we need?
- 2/4 = 0.5 effect size (d); 0.5 × 0.5 = 0.25.
- 32/0.25 = 128 in total for 80% power and 5% significance (use 42 instead of 32 for 90% power).
- NB: using computer software, the answer is 126.
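A minimal Python sketch of the 32/d² shortcut used in the example above, set against the usual normal-approximation formula (n per arm = 2(z₁₋α/₂ + z_power)²/d²). The function names are hypothetical; the second formula is the textbook normal approximation, not necessarily what any particular software package uses, although here it happens to reproduce the 126 quoted above.

```python
import math
from statistics import NormalDist

def n_total_shortcut(diff, sd, multiplier=32):
    """Total N (both arms, 1:1) from the slide shortcut N = multiplier / d**2.

    multiplier = 32 for 80% power or 42 for 90% power (two-sided 5% significance).
    """
    d = diff / sd  # standardised effect size
    return multiplier / d**2

def n_total_normal(diff, sd, alpha=0.05, power=0.80):
    """Total N from the normal approximation:
    n per arm = 2 * (z_{1-alpha/2} + z_{power})**2 / d**2, rounded up per arm."""
    z = NormalDist().inv_cdf
    d = diff / sd
    per_arm = 2 * (z(1 - alpha / 2) + z(power)) ** 2 / d**2
    return 2 * math.ceil(per_arm)

# Roland and Morris example: 2-point difference, SD 4.
print(n_total_shortcut(2, 4))   # 128.0
print(n_total_normal(2, 4))     # 126

# Where the multipliers come from: 4 * (z_{1-alpha/2} + z_{power})**2.
z = NormalDist().inv_cdf
print(round(4 * (z(0.975) + z(0.80)) ** 2, 1))  # 31.4
print(round(4 * (z(0.975) + z(0.90)) ** 2, 1))  # 42.0
```

The last two lines show why 32 and 42 work: they are rounded versions of 4(z₁₋α/₂ + z_power)² for 80% and 90% power. Rounding 31.4 up to 32 is one reason the shortcut slightly overestimates.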
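Binary variables
- For a dichotomous outcome (e.g., cured/not cured) the following approximation is useful, where d = difference in proportions and a = average proportion: $N = 32 \big/ \left[ d^2/(a - a^2) \right]$ participants in total (80% power, 5% significance; use 42 for 90% power).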
Example
- Breast feeding rates are only 50%, and we have an educational intervention that we think will increase this to 60%. How many participants do we need?
- d² = (0.6 - 0.5)² = 0.1² = 0.01
- a = (0.6 + 0.5)/2 = 0.55
- a² = 0.55² = 0.3025
- 0.01/(0.55 - 0.3025) = 0.0404
- 32/0.0404 = 792
- We need 792 participants to have 80% power to show a 10 percentage-point difference in breast feeding rates, if it were present (use 42 for 90% power).
- NB: using computer software, the answer is 774.
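The binary-outcome shortcut from the breast feeding example can be written as a small Python function. This is a sketch of the slide's approximation only (the name `n_total_binary_shortcut` is hypothetical); it does not reproduce the exact software figure of 774.

```python
import math

def n_total_binary_shortcut(p1, p2, multiplier=32):
    """Slide shortcut for a binary outcome (1:1 allocation):
    N = multiplier / (d**2 / (a - a**2)), with d = p2 - p1 and a = (p1 + p2) / 2.
    multiplier = 32 for 80% power, 42 for 90% power (two-sided 5% significance)."""
    d = p2 - p1
    a = (p1 + p2) / 2
    n = multiplier / (d**2 / (a - a**2))
    # Strip floating-point noise (e.g. 792.0000000004) before rounding up.
    return math.ceil(round(n, 6))

print(n_total_binary_shortcut(0.5, 0.6))                 # 792
print(n_total_binary_shortcut(0.5, 0.6, multiplier=42))  # 1040
```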
Approximations
- The formulae slightly overestimate the true sample size needed, but they can be done on a hand calculator and you can impress the statisticians.
- What about cluster trials?
Cluster Sample Size
- Usual sample size estimates assume independence of observations. When people are members of the same cluster (e.g., classroom, GP surgery), their outcomes tend to be more similar to each other than we would expect at random.
- This similarity is measured by the intra-cluster correlation coefficient (ICC).
ICC
- The ICC needs to be incorporated into the sample size calculation. The formula is $\text{Design effect} = 1 + (m - 1) \times \text{ICC}$. The design effect is the factor by which the sample size needs to be inflated; m is the number of people in each cluster.
Sample size example
- Let's assume that for an individually randomised trial we need 128 people to detect an effect size of 0.5 with 80% power (2p = 0.05). Now assume we have 24 groups with 7 members each. The ICC is 0.05, which is quite high.
- 1 + (7 - 1) × 0.05 = 1.3, so we need to increase the sample size by 30%. Therefore, we will need 166 participants.
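The design-effect arithmetic above, as a minimal Python sketch (the function name `design_effect` is hypothetical). Like the slide, it assumes all clusters are the same size and rounds the inflated total to the nearest whole participant.

```python
def design_effect(m, icc):
    """Design effect for equal clusters of size m: 1 + (m - 1) * ICC."""
    return 1 + (m - 1) * icc

n_individual = 128            # individually randomised requirement (effect size 0.5, 80% power)
de = design_effect(7, 0.05)   # 24 groups of 7 members, ICC = 0.05
n_cluster = n_individual * de

print(round(de, 2))           # 1.3
print(round(n_cluster))       # 166
```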
What happens if the cluster gets bigger?
- If our cluster size is twice as big (14), things begin to get really interesting.
- 1 + (14 - 1) × 0.05 = 1.65.
- What about 30? 1 + (30 - 1) × 0.05 = 2.45 (i.e., 314 participants).
- Say we randomise a larger cluster, such as a school (n = 500): 1 + (500 - 1) × 0.05 = 25.95 (i.e., 3,322 participants).
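To see how quickly the requirement grows, here is a small Python sweep over the cluster sizes used above. It is a sketch under the same assumptions (equal cluster sizes, ICC = 0.05, base requirement of 128); variable names are illustrative.

```python
def design_effect(m, icc):
    return 1 + (m - 1) * icc

n_individual = 128
icc = 0.05
required = {m: n_individual * design_effect(m, icc) for m in (7, 14, 30, 500)}

for m, n in required.items():
    print(f"cluster size {m:>3}: design effect {design_effect(m, icc):5.2f}, total N {n:7.1f}")
```

The unrounded totals (166.4, 211.2, 313.6 and 3,321.6) round to the participant numbers quoted on the slides.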
ICC size
- ICCs can be large for some outcomes. ICCs for educational outcomes, for example, are often around 0.4 to 0.5.
- A class-based RCT with n = 30 per class and an ICC of 0.4 would need 1,612 participants, or 54 classes with 30 in each class.
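A Python sketch converting the inflated sample size into a whole number of clusters, assuming equal class sizes of 30 and the same base requirement of 128 (the helper name `clusters_needed` is hypothetical). Rounding the number of classes up gives the 54 classes quoted above.

```python
import math

def clusters_needed(n_individual, m, icc):
    """Inflate an individually randomised sample size by the design effect,
    then convert it into a whole number of clusters of size m."""
    n_inflated = n_individual * (1 + (m - 1) * icc)
    return n_inflated, math.ceil(n_inflated / m)

n_inflated, k = clusters_needed(128, m=30, icc=0.4)
print(round(n_inflated, 1), k)   # 1612.8 54
```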
What makes the ICC large?
- If the treatment is applied to the health care provider (e.g., guidelines will increase ICCs for patients).
- If the cluster relates to the outcome variable (e.g., smoking cessation and schools).
- If members of the cluster are expected to influence each other (e.g., households).
Reviews of Cluster Trials
Authors | Source | Years | Clustering allowed for in sample size | Clustering allowed for in analysis
Donner et al. (1990) | 16 non-therapeutic intervention trials | 1979–1989 | <20% | <50%
Simpson et al. (1995) | 21 trials from American Journal of Public Health and Preventive Medicine | 1990–1993 | 19% | 57%
Isaakidis and Ioannidis (2003) | 51 trials in Sub-Saharan Africa | 1973–2001 (half post 1995) | 20% | 37%
Puffer et al. (2003) | 36 trials in British Medical Journal, Lancet, and New England Journal of Medicine | 1997–2002 | 56% | 92%
Eldridge et al. (Clinical Trials 2004) | 152 trials in primary health care | 1997–2000 | 20% | 59%
Sample Size Problems
Cluster Trials Demand Larger Sample Sizes
Conditional ICC
- The key ICC is the conditional ICC (the ICC remaining after adjusting for covariates), but usually we only have access to estimates of the unconditional ICC.
- If we know, and can measure, characteristics that cause the ICC, we can adjust for them and lower the ICC.
- Cook claims that using covariates allows a school-based RCT to reduce the number of schools from about 50 to around 22.
Summary of sample size
- The KEY thing is the size of the cluster. It is nearly always better to have lots of small clusters than a few large ones (e.g., a trial with small hospital wards, GP practices or classrooms will, ceteris paribus, be better than one with large clusters).
- BUT if the ICC is tiny, cluster size may not affect the sample size much.
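Both points in this summary can be checked with the design-effect formula. The Python sketch below compares hypothetical "small" (5) and "large" (50) clusters at the ICC of 0.05 used earlier and at a hypothetical tiny ICC of 0.001, starting from the same base requirement of 128.

```python
def total_n(n_individual, m, icc):
    """Total sample size after inflating by the design effect 1 + (m - 1) * ICC."""
    return n_individual * (1 + (m - 1) * icc)

n_individual = 128
results = {}
for icc in (0.05, 0.001):
    for m in (5, 50):  # hypothetical "small" and "large" cluster sizes
        results[(icc, m)] = total_n(n_individual, m, icc)
        print(f"ICC {icc}, cluster size {m}: total N {results[(icc, m)]:.1f}")
```

With an ICC of 0.05, clusters of 50 need almost three times as many participants as clusters of 5 (441.6 vs 153.6). With an ICC of 0.001 the gap nearly disappears (about 128.5 vs 134.3).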
Cluster Trials: Should I do one?
- If possible, avoid them like the plague. BUT although they are difficult to do properly, when done properly they WILL give more robust answers than other methods (e.g., observational data).
- Is it possible to avoid doing one and do an individually randomised trial instead?
Contamination
- An important justification for their use is SUPPOSED contamination between participants allocated to the intervention and people allocated to the control.
Spurious Contamination?
- Trial proposal to cluster randomise practices for a breast feeding study: new mothers might talk to each other!
- Trial for reducing cardiac risk factors: patients, again, might talk to each other.
- Trial for removing allergens from the homes of asthmatic children.
Contamination
- Contamination occurs when some of the control patients receive the novel intervention.
- It is a problem because it reduces the effect size, which increases the risk of a Type II error (concluding there is no effect when there actually is one).
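A rough Python sketch of why contamination costs power, under a deliberately simple assumption that is not from the slides: a fraction c of controls receive the full intervention effect and the outcome SD is unchanged. The observed difference then shrinks to (1 - c) times the true difference, and because sample size scales with 1/d², the required N grows by 1/(1 - c)². Function names are hypothetical.

```python
def diluted_difference(true_diff, contamination):
    """Observed ITT difference when a fraction `contamination` of controls
    receive the full intervention effect (simplifying assumption)."""
    return true_diff * (1 - contamination)

def sample_size_inflation(contamination):
    """Required N scales with 1/d**2, so diluting d by (1 - c) inflates N by 1/(1 - c)**2."""
    return 1 / (1 - contamination) ** 2

for c in (0.1, 0.2, 0.3):
    print(f"{c:.0%} contamination: effect size 0.5 -> {diluted_difference(0.5, c):.2f}, "
          f"N x {sample_size_inflation(c):.2f}")
```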
Patient level contamination
- In a trial of counselling adults to reduce their risk of cardiovascular disease, general practices were randomised to avoid contamination of control participants by intervention patients.
Steptoe. BMJ 1999;319:943.
Accepting Contamination
- We should accept some contamination and deal with it through individual randomisation and by boosting the sample size, rather than going for cluster randomisation.
Torgerson. BMJ 2001;322:355.
Counselling Trial
- Steptoe et al. wanted to detect a 9% reduction in smoking prevalence with a health promotion intervention. They needed 2,000 participants (rather than 1,282) because of clustering.
- If they had randomised 2,000 individuals, this would have been able to detect a 7% reduction allowing for 20% CONTAMINATION.
Steptoe. BMJ 1999;319:943.
Comparison of Sample Sizes
NB: assuming an ICC of 0.02.
Misplaced contamination
- The ONLY health study I'm aware of to date that directly compares an individually randomised study with a cluster design showed no evidence of contamination.
- In an RCT of nurse-led cardiovascular risk factor screening, some intervention clusters had participants allocated to no treatment. NO contamination was observed.
What about dilution bias?
- If, in the presence of contamination, we use individual allocation, we might observe a difference that is statistically significant but is not clinically or economically significant.
- Dilution has biased the estimate towards the null (no difference).
Dealing with contamination
- Sometimes there may be substantial contamination, and this will dilute the treatment effect. It may, however, still be best to individually randomise if you can measure contamination.
Per-protocol analysis?
- We cannot adjust for contamination using either per-protocol or on-treatment analysis: these popular analytical methods are plainly wrong, as they violate the random allocation.
CACE analysis: a solution?
- If we can measure contamination, we can use a statistical approach known as Complier Average Causal Effect (CACE) analysis.
Assumptions of CACE
- Assumption 1: if the control group had been offered treatment, the same proportion would have complied with it. This must be true, because random allocation ensures it.
- Assumption 2: merely being offered treatment has no effect on outcomes.
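A Python simulation sketch of why per-protocol analysis misleads and how a CACE estimate avoids this. It uses the standard instrumental-variable (Wald) form of CACE: the ITT effect on the outcome divided by the difference in the proportion actually treated. The data are entirely hypothetical. Healthier people are assumed more likely to take up the offer, and outcomes depend only on treatment received, not on the offer itself (Assumption 2). With contamination, controls would also have d = 1 for some people, and the same ratio applies.

```python
import random
from statistics import fmean

def wald_cace(z, d, y):
    """CACE estimate (instrumental-variable / Wald form): the ITT effect on the
    outcome divided by the difference in the proportion actually treated."""
    def arm_mean(values, arm):
        return fmean(v for zi, v in zip(z, values) if zi == arm)
    itt_y = arm_mean(y, 1) - arm_mean(y, 0)
    itt_d = arm_mean(d, 1) - arm_mean(d, 0)
    return itt_y / itt_d

# Hypothetical simulation: healthier people are more likely to take up the offer.
random.seed(1)
true_effect = 2.0
z, d, y = [], [], []
for _ in range(100_000):
    health = random.gauss(0, 1)               # prognostic factor that also drives compliance
    complier = health > 0                     # about half would comply if offered
    zi = 1 if random.random() < 0.5 else 0    # random allocation
    di = 1 if (zi == 1 and complier) else 0   # only allocated compliers are treated
    y.append(health + true_effect * di + random.gauss(0, 1))
    z.append(zi)
    d.append(di)

treated = [yi for zi, di, yi in zip(z, d, y) if zi == 1 and di == 1]
controls = [yi for zi, yi in zip(z, y) if zi == 0]
itt = fmean(yi for zi, yi in zip(z, y) if zi == 1) - fmean(controls)
per_protocol = fmean(treated) - fmean(controls)
cace = wald_cace(z, d, y)
print(f"ITT {itt:.2f}, per-protocol {per_protocol:.2f}, CACE {cace:.2f}")
```

In this set-up the ITT estimate is diluted to roughly half the true effect, the per-protocol contrast is inflated (to roughly 2.8) because compliers are healthier, and the CACE estimate lands close to the true effect of 2.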
Example: CRC screening
- In an RCT of bowel cancer screening, only 53% of people invited for screening attended.
- ITT relative risk = 0.85. BUT what happened to those who were screened? The per-protocol RR was 0.62: THIS IS WRONG.
- What is the true estimate?
True differences
- For ITT (the policy of offering screening to the whole community), the RR = 0.85, that is, a 15% reduction in CRC deaths.
- For those who accepted screening, the RR was 0.68, a 32% reduction in deaths, NOT a 38% reduction.
Individuals are best
- Using CACE we can get the best of both worlds: retain individual randomisation and get unbiased estimates.
Sample size simulation
- CACE analysis generally produces wider confidence intervals, as there are two sources of variance.
- Therefore, it is possible that cluster allocation may actually have a lower standard error in some circumstances.
- To assess whether this is true, we undertook a simulation exercise.
Sample size: trade-off between cluster and individual allocation (ICC = 0.04)
Cluster size | Cluster trial | Contamination (%) | Individual RCT with CACE | Contamination effect
10 | 1,080 | 0 | 630 | 1
30 | 1,740 | 10 | 756 | 1.20
50 | 2,400 | 20 | 890 | 1.41
100 | 4,000 | 30 | 1,090 | 1.73
NB: 80% power to detect an effect size of 0.2. Source: Hewitt PhD thesis.
Sample size
- CACE performs better than cluster allocation in a range of sample size scenarios.
- Because of the difficulties of doing a cluster trial, an individual trial design with CACE analysis might be best.
Limitations
- The assumption that being offered treatment has no effect is a weakness, as some participants may appear not to comply but actually access some of the treatment.
Still need to do a cluster trial?
- If a cluster trial is to be undertaken, it is important, once the trial has been completed, that it is analysed correctly and that the effect of the clustering is accounted for. This has been known since 1940, when Lindquist advocated that educational trials should use the class as the natural unit of allocation.
What did Lindquist propose?
- Each class should be treated both as the unit of allocation and as the unit of analysis.
- Put simply, a trial with 20 classes of 30 children is NOT a trial of 600 children; it is a trial of 20 classes.
- The simplest approach is to calculate the mean score of each cluster and do a t-test comparing the cluster means of the two groups.
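A minimal Python sketch of Lindquist's approach: collapse individual records to one mean per cluster, then run an ordinary equal-variance two-sample t-test on those means. The record layout `(cluster_id, arm, score)` and all function names are assumptions for illustration. As a check, `pooled_t` is fed the cluster-level summary statistics printed in the adult literacy output later in these slides. It returns SE ≈ 0.8451 and t ≈ 1.787 on 26 degrees of freedom, in line with Stata's .8450746 and 1.7869. P-values are omitted because the standard library has no t distribution.

```python
import math
from collections import defaultdict
from statistics import mean, stdev

def cluster_means(records):
    """Collapse (cluster_id, arm, score) records into one mean per cluster,
    grouped by arm. Assumes every member of a cluster has the same arm."""
    scores = defaultdict(list)
    arm_of = {}
    for cluster_id, arm, score in records:
        scores[cluster_id].append(score)
        arm_of[cluster_id] = arm
    by_arm = defaultdict(list)
    for cluster_id, values in scores.items():
        by_arm[arm_of[cluster_id]].append(mean(values))
    return by_arm

def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Equal-variance two-sample t-test from summary statistics.
    Returns (difference, standard error, t, degrees of freedom)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return diff, se, diff / se, df

def lindquist_t(records, arm_a, arm_b):
    """Cluster-level t-test: each cluster contributes exactly one observation (its mean)."""
    by_arm = cluster_means(records)
    a, b = by_arm[arm_a], by_arm[arm_b]
    return pooled_t(len(a), mean(a), stdev(a), len(b), mean(b), stdev(b))

# Check against the cluster-level summary statistics of the adult literacy example.
diff, se, t, df = pooled_t(14, 6.69932, 2.790422, 14, 5.189229, 1.487165)
print(f"diff {diff:.6f}, SE {se:.4f}, t {t:.4f}, df {df}")
```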
Example
- A randomised trial of 28 adult literacy classes sought to ascertain whether or not paying participants an incentive to attend would improve adherence.
- 14 classes were randomised for students to get an incentive; 14 were controls.
- Students were paid 5 per class attended.
- There were 152 students in total; the ICC was 0.39.
See Martin Bland's website (http://www-users.york.ac.uk/mb55/) for a worked example.
Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
 Group X |      70    6.685714    .4177941    3.495516    5.852238    7.519191
 Group Y |      82    5.280488    .2991881    2.709263    4.685197    5.875778
---------+--------------------------------------------------------------------
combined |     152    5.927632    .2566817    3.164585     5.42048    6.434783
---------+--------------------------------------------------------------------
    diff |            1.405226    .5037841                .4097968    2.400656
------------------------------------------------------------------------------
    diff = mean(Group X) - mean(Group Y)                          t =   2.7893
Ho: diff = 0                                     degrees of freedom =      150

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.9970         Pr(|T| > |t|) = 0.0060          Pr(T > t) = 0.0030
Wrong
- This analysis is wrong: it treats all of the students as independent individuals and ignores the clustering of outcomes within classes.
- Let us try Lindquist's approach to the analysis.
Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       1 |      14     6.69932    .7457716    2.790422    5.088178    8.310461
       2 |      14    5.189229    .3974616    1.487165    4.330565    6.047893
---------+--------------------------------------------------------------------
combined |      28    5.944274     .439363     2.32489    5.042776    6.845773
---------+--------------------------------------------------------------------
    diff |            1.510091    .8450746               -.226985    3.247166
------------------------------------------------------------------------------
    diff = mean(1) - mean(2)                                      t =   1.7869
Ho: diff = 0                                     degrees of freedom =       26

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.9572         Pr(|T| > |t|) = 0.0856          Pr(T > t) = 0.0428
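As a rough consistency check, and not a formal identity, the squared ratio of the two standard errors of the difference can be compared with the design effect implied by the stated ICC of 0.39. This Python sketch takes the standard errors from the two outputs above and assumes, counterfactually, that every class had the average size of 152/28 students. The two figures need not agree exactly, because class sizes vary and the cluster-mean analysis weights every class equally. Here they come out close (about 2.81 vs 2.73).

```python
# Standard errors of the difference, copied from the two Stata outputs above.
se_individual = 0.5037841  # 152 students analysed as if independent
se_cluster = 0.8450746     # 28 class means (Lindquist's approach)
variance_ratio = (se_cluster / se_individual) ** 2

# Design effect for the same data, pretending all classes had the average size 152/28.
icc = 0.39
average_class_size = 152 / 28
design_effect = 1 + (average_class_size - 1) * icc

print(f"variance ratio {variance_ratio:.2f}, design effect {design_effect:.2f}")
```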
T-test method
- This is correct in the sense that it takes clustering into account; however, it does not take into account chance differences in cluster size or powerful predictors of outcome.
- We have information on cluster size and pre-test literacy score that we can use to improve the precision of our estimate (i.e., reduce the width of the confidence interval). We can use the cluster summary statistics in a regression approach.
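A minimal sketch of the cluster-level regression in Python with NumPy: ordinary least squares of a cluster-level outcome on a 0/1 group indicator plus one cluster-level covariate, with classical OLS standard errors. The data are simulated and hypothetical (not the literacy trial data), and `baseline` is an illustrative stand-in for a pre-test covariate. Cluster size could be added as a further column in the same way.

```python
import numpy as np

def cluster_level_ols(y, group, covariate):
    """OLS of a cluster-level outcome on a 0/1 group indicator and one
    cluster-level covariate. Returns (coefficients, standard errors),
    both ordered as [intercept, group, covariate]."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(y),
                         np.asarray(group, dtype=float),
                         np.asarray(covariate, dtype=float)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)          # residual mean square
    cov = sigma2 * np.linalg.inv(X.T @ X)     # classical (non-robust) OLS covariance
    return coef, np.sqrt(np.diag(cov))

# Hypothetical data only: 28 clusters, 14 per arm, with a made-up baseline score.
rng = np.random.default_rng(2024)
group = np.repeat([1, 0], 14)
baseline = rng.normal(50, 10, size=28)
outcome = 13.0 - 1.5 * group - 0.09 * baseline + rng.normal(0, 1.4, size=28)

coef, se = cluster_level_ols(outcome, group, baseline)
for name, b, s in zip(["intercept", "group", "baseline"], coef, se):
    print(f"{name:>9}: {b:8.3f} (SE {s:.3f})")
```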
      Source |       SS       df       MS              Number of obs =      28
-------------+------------------------------           F(  2,    25) =   22.97
       Model |  88.6762362     2  44.3381181           Prob > F      =  0.0000
    Residual |   48.252853    25  1.93011412           R-squared     =  0.6476
-------------+------------------------------           Adj R-squared =  0.6194
       Total |  136.929089    27  5.07144775           Root MSE      =  1.3893

------------------------------------------------------------------------------
    sessions |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       group |  -1.778653   .5301429    -3.36   0.003    -2.870503   -.6868038
      midscl |  -.0945941    .015181    -6.23   0.000    -.1258598   -.0633283
       _cons |   13.13811   1.175841    11.17   0.000     10.71642     15.5598
------------------------------------------------------------------------------
Other methods
- There are other, more complex statistical methods that may yield slightly different results. However, simple methods are approximately correct and easier to do.
Summary
- Cluster trials need larger sample sizes than individually randomised studies.
- Clustering needs to be taken into account both in the sample size and in the analysis.
- There are simple methods that can do this.