Title: Randomised Controlled Trials in the Social Sciences
Cluster randomised trials
Martin Bland
Professor of Health Statistics, University of York
www-users.york.ac.uk/mb55/
- Cluster randomised trials
  - Also called group randomised trials.
  - Research subjects are not sampled independently, but in a group. For example:
    - all the patients in a general practice are allocated to the same intervention, the general practice forming a cluster;
    - all pupils in a school class are allocated to the same intervention, the class forming a cluster.
Members of a cluster will be more like one another than they are like members of other clusters. We need to take this into account in the analysis and design.
- Methods of analysis which ignore clustering:
  - two-sample t method,
  - chi-squared test for a two-way table,
  - difference between two proportions,
  - relative risk,
  - analysis of covariance,
  - logistic regression.
- These may mislead, because they assume that all subjects are independent observations.
Methods which ignore clustering may mislead, because they assume that all subjects are independent observations. Observations within the same cluster are correlated. This may lead to standard errors which are too small, confidence intervals which are too narrow, and P values which are too small.
A little simulation
Four cluster means, two in each group, were drawn from a Normal distribution with mean 10 and standard deviation 2. Ten members of each cluster were generated by adding to the cluster mean a random number from a Normal distribution with mean zero and standard deviation 1. The null hypothesis, that there is no difference between the means in the two populations, is true. The groups were compared with a two-sample t test, ignoring the clustering.
In 1000 simulation runs there were 600 significant differences, with P < 0.05, and 502 highly significant ones, with P < 0.01. If the t test ignoring the clustering were valid, we would expect 50 significant differences (5%) and 10 highly significant ones (1%). The analysis assumes that we have 20 independent observations in each group. This is not true. We have two independent clusters of observations in each group, and the observations within a cluster are really the same thing repeated ten times.
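The simulation described above can be sketched in Python with only the standard library. This is an illustrative reconstruction, not the original simulation code: the seed and function names are arbitrary, and instead of computing exact P values it compares |t| with tabulated two-sided critical values of t on 38 degrees of freedom (2.024 for 5%, 2.712 for 1%).

```python
import random
import statistics

# Two-sided critical values of t with 20 + 20 - 2 = 38 degrees of freedom,
# taken from standard t tables (this sketch compares |t| with these rather
# than computing exact P values).
T_CRIT_38 = {0.05: 2.024, 0.01: 2.712}


def simulate_group(rng, n_clusters=2, n_members=10):
    """One group: cluster means ~ N(10, 2^2), members = cluster mean + N(0, 1)."""
    clusters = []
    for _ in range(n_clusters):
        cluster_mean = rng.gauss(10, 2)
        clusters.append([cluster_mean + rng.gauss(0, 1) for _ in range(n_members)])
    return clusters


def pooled_t(x, y):
    """Two-sample t statistic using the pooled variance estimate."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    se = (pooled_var * (1 / nx + 1 / ny)) ** 0.5
    return (statistics.mean(x) - statistics.mean(y)) / se


def naive_rejections(runs=1000, seed=1):
    """Count 'significant' results when the clustering is ignored."""
    rng = random.Random(seed)
    counts = {0.05: 0, 0.01: 0}
    for _ in range(runs):
        # Flatten each group's two clusters into 20 supposedly independent values.
        x = [v for cluster in simulate_group(rng) for v in cluster]
        y = [v for cluster in simulate_group(rng) for v in cluster]
        t = abs(pooled_t(x, y))
        for alpha, crit in T_CRIT_38.items():
            if t > crit:
                counts[alpha] += 1
    return counts


if __name__ == "__main__":
    print(naive_rejections())
```

Because the null hypothesis is true, a valid test would give about 50 and 10 rejections; this naive analysis rejects far more often.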
- A valid statistical analysis.
- Possible analysis:
  - find the means for the four clusters;
  - carry out a two-sample t test using these four means only.
- 1000 simulation runs:
  - 53 (5.3%) significant at P < 0.05;
  - 14 (1.4%) highly significant at P < 0.01.
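The valid analysis can be sketched the same way: reduce each cluster to its mean, then apply the two-sample t test to the four means. Again this is an illustrative sketch rather than the original code; the seed and names are arbitrary, and 4.303 and 9.925 are the tabulated two-sided 5% and 1% points of t with 4 - 2 = 2 degrees of freedom.

```python
import random
import statistics

# Two-sided critical values of t with 4 - 2 = 2 degrees of freedom,
# from standard t tables.
T_CRIT_2 = {0.05: 4.303, 0.01: 9.925}


def simulate_cluster_means(rng, n_clusters=2, n_members=10):
    """Simulate one group and return the mean of each of its clusters."""
    means = []
    for _ in range(n_clusters):
        cluster_mean = rng.gauss(10, 2)
        members = [cluster_mean + rng.gauss(0, 1) for _ in range(n_members)]
        means.append(statistics.mean(members))
    return means


def pooled_t(x, y):
    """Two-sample t statistic using the pooled variance estimate."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    se = (pooled_var * (1 / nx + 1 / ny)) ** 0.5
    return (statistics.mean(x) - statistics.mean(y)) / se


def summary_rejections(runs=1000, seed=1):
    """Count significant results when each cluster is summarised by its mean."""
    rng = random.Random(seed)
    counts = {0.05: 0, 0.01: 0}
    for _ in range(runs):
        x = simulate_cluster_means(rng)  # two cluster means for group 1
        y = simulate_cluster_means(rng)  # two cluster means for group 2
        t = abs(pooled_t(x, y))
        for alpha, crit in T_CRIT_2.items():
            if t > crit:
                counts[alpha] += 1
    return counts


if __name__ == "__main__":
    print(summary_rejections())
```

Under this model the cluster means are independent Normal observations, so the test keeps its nominal level and the counts should be close to 50 and 10.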
This simulation is very extreme: two groups of two clusters and a very large cluster effect. But I have seen a proposed study with two groups of two clusters. A smaller cluster effect would reduce the shrinking of the P values, but it would not remove it. The simulation shows that spurious significant differences can occur if we ignore the clustering.
Example: GP education trial
A trial of general practitioner (GP) education to improve the treatment of asthma. GPs would be educated in small groups, or not, and this education would be evaluated by giving repeated questionnaires to their asthmatic patients. I was asked for my views on the sample size calculations.
The original proposal ignored the clustering and the GPs, and treated the design as a comparison of two groups of patients. The revised proposal produced a sample size calculation based primarily on the number of GPs, not patients.
The trial was funded and a research fellow, a GP, appointed. The cluster nature of the study was self-evident to me. It was not self-evident to the research fellow! Many researchers find the importance of clustering very hard to understand.
The published study included the following description of the analysis: "For each general practitioner a score was calculated for each questionnaire item. Analysis of variance was then carried out for each questionnaire item to compare the three groups . . ."
How big is the effect of clustering? The design effect is the factor by which we must multiply the sample size for a trial which is not clustered to achieve the same power. Equivalently, the power of a cluster randomised trial is the power of an individually randomised trial whose size is the cluster trial's sample size divided by the design effect. The design effect is
$\mathrm{Deff} = 1 + (m - 1)\,\mathrm{ICC}$
where m is the number of observations in a cluster and ICC is the intra-cluster correlation coefficient, the correlation between pairs of subjects chosen at random from the same cluster.
$\mathrm{Deff} = 1 + (m - 1)\,\mathrm{ICC}$
The ICC is usually quite small; 0.04 is a typical figure. If $m = 1$ (cluster size one, no clustering), then $\mathrm{Deff} = 1$; otherwise, provided the ICC is positive, Deff will exceed 1.
If we estimate the required sample size ignoring clustering, we must multiply it by the design effect to get the sample size required for the clustered sample. Equivalently, a clustered sample of size n has the same power as a simple (unclustered) sample of size n / Deff.
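These two rules can be written as a minimal Python sketch. The function names are hypothetical, and the example of a trial needing 400 subjects if unclustered, with clusters of 20 and ICC 0.04, is invented purely for illustration.

```python
def design_effect(m: float, icc: float) -> float:
    """Design effect Deff = 1 + (m - 1) * ICC for clusters of size m."""
    if m < 1:
        raise ValueError("cluster size m must be at least 1")
    return 1 + (m - 1) * icc


def clustered_sample_size(n_unclustered: float, m: float, icc: float) -> float:
    """Sample size a cluster trial needs for the same power as an
    individually randomised trial of size n_unclustered."""
    return n_unclustered * design_effect(m, icc)


def effective_sample_size(n_clustered: float, m: float, icc: float) -> float:
    """Size of the individually randomised trial with the same power."""
    return n_clustered / design_effect(m, icc)


if __name__ == "__main__":
    print(design_effect(1, 0.04))                 # 1.0: cluster size one, no clustering
    print(design_effect(500, 0.001))              # about 1.5
    print(clustered_sample_size(400, 20, 0.04))   # hypothetical trial: about 704
```

The two helper functions are inverses of each other, which is the "multiply or divide by Deff" rule above.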
If we analyse the data as if there were no clusters, the variances of the estimates must be multiplied by Deff, and hence the standard errors must be multiplied by $\sqrt{\mathrm{Deff}}$.
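A minimal sketch of this correction in Python: given an estimate and its standard error from an analysis that ignored clustering, inflate the standard error by the square root of Deff and recompute an approximate 95% confidence interval. The function name and the Normal-approximation interval (estimate plus or minus 1.96 SE) are assumptions of this sketch; with few clusters a t critical value would be more appropriate.

```python
import math


def adjusted_ci(estimate, naive_se, deff, z=1.96):
    """Inflate a naive standard error by sqrt(Deff).

    Returns (adjusted_se, lower, upper) for an approximate 95% interval.
    """
    se = naive_se * math.sqrt(deff)
    return se, estimate - z * se, estimate + z * se


if __name__ == "__main__":
    # Hypothetical numbers: a difference of 2.0 with naive SE 0.5 and Deff = 4,
    # so the standard error doubles.
    print(adjusted_ci(2.0, 0.5, 4.0))
```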
$\mathrm{Deff} = 1 + (m - 1)\,\mathrm{ICC}$
Clustering may have a large effect if the ICC is large OR if the cluster size is large. For example, if ICC = 0.001 and the cluster size is 500, the design effect will be $1 + (500 - 1) \times 0.001 = 1.499 \approx 1.5$. We need to increase the sample size by 50% to achieve the same power as an unclustered trial.
We also need to estimate variances both within and between clusters. If the number of clusters is small, the between-clusters variance will have few degrees of freedom and we will be using the t distribution in inference rather than the Normal. This too will cost power.
Example: a grant application
An evaluation of a peer-led health education intervention: a comparison of two groups, each of two clusters (counties) of about 750 people.
The applicants were aware of the problem of cluster randomisation, but did not give any assessment of its likely impact on the power of the study, except to say that the intra-cluster correlation was "small", i.e. 0.005, based on a US study.
$\mathrm{Deff} = 1 + (m - 1)\,\mathrm{ICC}$
For the proposed design, the mean number of subjects in a cluster was about 750, so
$\mathrm{Deff} = 1 + (750 - 1) \times 0.005 = 4.745 \approx 4.75$
Thus the estimated sample size for any given comparison should be multiplied by 4.75.
With the sample size effectively divided by 4.75, the 3000 subjects (four clusters of about 750) have the same power as an individually randomised sample of $3000 / 4.75 \approx 630$.
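The arithmetic for the grant application, written out as a short Python check. The inputs are the figures quoted in the application (four clusters of about 750, ICC 0.005); note that using m - 1 = 749 gives 4.745, which the slide rounds to 4.75.

```python
# Figures from the grant application.
n_clusters = 4        # two groups of two counties
m = 750               # approximate cluster size
icc = 0.005           # intra-cluster correlation quoted from a US study

n_total = n_clusters * m          # 3000 subjects
deff = 1 + (m - 1) * icc          # 1 + 749 * 0.005 = 4.745
n_effective = n_total / deff      # roughly 630 individually randomised subjects

print(n_total, deff, round(n_effective))
```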
Degrees of freedom
In large sample approximation sample size calculations, power of 80% and a two-sided alpha of 5% are embodied in the multiplier $(0.85 + 1.96)^2 = 7.90$.
For a small sample calculation using the t test, 1.96 must be replaced by the corresponding 5% point of the t distribution with the appropriate degrees of freedom. With two groups of two clusters, the comparison of cluster means has 4 - 2 = 2 degrees of freedom, which gives t = 4.30. Hence the sample size multiplier is $(0.85 + 4.30)^2 = 26.52$, which is $26.52 / 7.90 = 3.36$ times that for the large sample.
This will reduce the effective sample size even more, down to $630 / 3.36 \approx 188$. Thus the 3000 men in two groups of two clusters will give the same power to detect the same difference as 188 men randomised individually. The proposal came back with many more clusters.
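The degrees-of-freedom penalty can be checked the same way. This sketch uses the slide's values: 0.85 for 80% power (the exact Normal deviate, `statistics.NormalDist().inv_cdf(0.8)`, is about 0.8416), 1.96, and the tabulated t value 4.30 for 2 degrees of freedom. Like the slide, it replaces only the 1.96 by the t value.

```python
# Values as used on the slides.
Z_POWER = 0.85      # Normal deviate for 80% power (slide's rounded value)
Z_ALPHA = 1.96      # two-sided 5% point of the Normal distribution
T_ALPHA_2DF = 4.30  # two-sided 5% point of t with 2 df, from tables


def multiplier(crit, z_power=Z_POWER):
    """Sample size multiplier: (power deviate + critical value) squared."""
    return (z_power + crit) ** 2


large = multiplier(Z_ALPHA)        # (0.85 + 1.96)^2, about 7.90
small = multiplier(T_ALPHA_2DF)    # (0.85 + 4.30)^2, about 26.52
inflation = small / large          # about 3.36

# Effective size: 3000 subjects, divided by Deff and then by the t penalty.
n_effective = 3000 / 4.75 / inflation   # about 188

print(round(large, 2), round(small, 2), round(inflation, 2), round(n_effective))
```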
Cluster size small, large number of clusters, small ICC: the design effect is close to one, and ignoring the clustering has little effect. For example, a randomised controlled trial of the effects of coordinating care for terminally ill cancer patients (Addington-Hall et al., 1992) randomised 554 patients by GP. There were about 200 GPs, so most clusters had only a few patients. The analysis ignored the clustering.
- Several approaches can be used to allow for clustering:
  - summary statistic for each cluster,
  - adjust standard errors using the design effect,
  - robust variance estimates,
  - generalised estimating equations (GEEs),
  - multilevel modelling,
  - Bayesian hierarchical models,
  - others.
- Any method which takes the clustering into account will be a vast improvement on methods which do not.
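To adjust standard errors using the design effect, the ICC has to be estimated. One common choice, sketched here under the assumption of equal cluster sizes, is the one-way analysis of variance estimator $\mathrm{ICC} = (\mathrm{MSB} - \mathrm{MSW}) / (\mathrm{MSB} + (m - 1)\,\mathrm{MSW})$, where MSB and MSW are the between- and within-cluster mean squares. This is only one of several estimators; the function name is hypothetical and the toy data are made up to illustrate the arithmetic.

```python
import statistics


def anova_icc(clusters):
    """One-way ANOVA estimate of the intra-cluster correlation.

    clusters: list of lists, one inner list of observations per cluster.
    This sketch assumes every cluster has the same size m >= 2.
    """
    k = len(clusters)
    m = len(clusters[0])
    if k < 2 or m < 2 or any(len(c) != m for c in clusters):
        raise ValueError("need at least 2 clusters of equal size >= 2")
    means = [statistics.mean(c) for c in clusters]
    grand = statistics.mean(means)  # equals the overall mean when sizes are equal
    msb = m * sum((mu - grand) ** 2 for mu in means) / (k - 1)
    msw = sum((x - mu) ** 2 for c, mu in zip(clusters, means) for x in c) / (k * (m - 1))
    # The estimate can be negative by chance; it is often truncated at zero.
    return (msb - msw) / (msb + (m - 1) * msw)


if __name__ == "__main__":
    toy = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # made-up data, three clusters of 3
    icc = anova_icc(toy)
    deff = 1 + (len(toy[0]) - 1) * icc
    print(icc, deff)
```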
A refereeing case study
A paper sent to me by the BMJ in 1997: a study of the impact of a specialist outreach team on the quality of nursing and residential home care.
The intervention was carried out at the residential home level. Eligible homes were put into matched pairs and one of each pair was randomised to intervention. Thus the randomisation was clustered.
The randomisation was clustered. The intervention was applied to the care staff, not to the patients. The residents in the home were used to monitor the effect of the intervention on the staff.
Clustering was totally ignored in the analysis, which used the patient as the unit of analysis. The authors carried out a Mann-Whitney test of the scores between the two groups at baseline; this was not significant. They then carried out a Mann-Whitney test at follow-up, completely ignoring the baseline measurements. Finally, they carried out a Wilcoxon matched pairs test for each group separately and found that one was significant and the other not.
Possible approaches
Summary statistic for the home, e.g. the mean change in score. These could then be compared using a t method. As the homes were randomised within pairs, I suggested the paired t method. (This may not be right, as the matching variables may not be informative and the loss of degrees of freedom may be a problem.) The results should be given as a difference in mean change, with a confidence interval, as recommended in the BMJ's guidelines to authors, rather than as a P value.
Alternatively, fit a multilevel model, with homes as one level of variability, subjects another, and variation within subjects a third. A job for a professional statistician.
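The summary-statistic approach with the paired t method can be sketched as follows. This is an illustration only: the function names are hypothetical, the home summary is taken as mean follow-up score minus mean baseline score (the same as the mean change when the same residents are measured twice), and the t critical value must be supplied from tables for (number of pairs - 1) degrees of freedom.

```python
import statistics
from math import sqrt


def home_summary(baseline, follow_up):
    """Summary for one home: mean follow-up score minus mean baseline score."""
    return statistics.mean(follow_up) - statistics.mean(baseline)


def paired_difference_ci(pairs, t_crit):
    """Mean difference (intervention minus control) over matched pairs of
    homes, with a confidence interval.

    pairs:  list of (intervention_summary, control_summary), one per pair.
    t_crit: two-sided critical value of t with len(pairs) - 1 degrees of
            freedom, from tables (e.g. 4.303 for 2 df at the 5% level).
    """
    diffs = [a - b for a, b in pairs]
    mean_diff = statistics.mean(diffs)
    se = statistics.stdev(diffs) / sqrt(len(diffs))
    return mean_diff, (mean_diff - t_crit * se, mean_diff + t_crit * se)


if __name__ == "__main__":
    # Hypothetical summaries for three matched pairs of homes.
    pairs = [(3, 2), (5, 3), (6, 3)]
    print(paired_difference_ci(pairs, 4.303))
```

This reports the result as a difference in mean change with a confidence interval, in line with the recommendation above, rather than as a P value.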
What happened next? The paper was rejected. The study was then reported in the Lancet! There was an extra author, a well-known medical statistician. "The unit of randomisation in the study was the residential home and not the resident. Thus, all data were analysed by use of general estimated equation models to adjust for clustering effects within homes. . . . Clinical data are presented as means with 95% CIs calculated with Huber variance estimates."
I looked for the acknowledgement to an unknown referee, in vain.
Reviews of published trials
There have been several reviews of published cluster randomised trials in medical applications.
Some reviews of published cluster randomised trials

Authors | Source | Years | Clustering allowed for in sample size | Clustering allowed for in analysis
Donner et al. (1990) | 16 non-therapeutic intervention trials | 1979-1989 | <20% | <50%
Simpson et al. (1995) | 21 trials from American Journal of Public Health and Preventive Medicine | 1990-1993 | 19% | 57%
Isaakidis and Ioannidis (2003) | 51 trials in Sub-Saharan Africa | 1973-2001 (half post 1995) | 20% | 37%
Puffer et al. (2003) | 36 trials in British Medical Journal, Lancet, and New England Journal of Medicine | 1997-2002 | 56% | 92%
Eldridge et al. (in press, 2003) | 152 trials in primary health care | 1997-2000 | 20% | 59%
Importance for the evidence base
Incorrect analyses may produce false conclusions. Sample sizes may be too small.
Key references
- Murray DM. (1998) The Design and Analysis of Group-Randomized Trials. Oxford University Press.
- Donner A, Klar N. (2000) Design and Analysis of Cluster Randomization Trials in Health Research. London: Arnold.
- Many papers by Allan Donner and colleagues.
- Campbell MK, Elbourne DR, Altman DG for the CONSORT Group. The CONSORT statement: extension to cluster randomised trials. Submitted for publication.
- Bland JM, Kerry SM, Altman DG. Statistics Notes series in the British Medical Journal, numbers 29-34. www-users.york.ac.uk/mb55/
Publications on cluster designs
- How-to-do-it papers.
- Statistics Notes in the BMJ.
- Articles in GP journals.
- Special editions of Statistical Methods in Medical Research and Statistics in Medicine.
- Papers reporting intraclass correlation coefficients to help others to design clustered studies.
Web of Knowledge search on "randomi* in clusters" OR "cluster randomi*"
This is not a thorough search and will have missed many studies. The 2001 count includes special issues of Statistics in Medicine and Statistical Methods in Medical Research on cluster randomisation. The search ignores papers using clusters in observational studies.
It also ignores other terms, e.g. "group randomised". Cornfield (1978), "Randomization by group: a formal analysis", includes the following: "Randomization by cluster accompanied by an analysis appropriate to randomization by individual is an exercise in self-deception, however, and should be discouraged."
Murray DM. (1998) The Design and Analysis of Group-Randomized Trials. Oxford University Press.
Are any of these trials social science?
van der Molen HF, Sluiter JK, Hulshof CTJ, Vink P, van Duivenbooden C, Holman R, Frings-Dresen MHW. Implementation of participatory ergonomics intervention in construction companies. Scandinavian Journal of Work, Environment & Health 31, 191-204.
Study objective: The effectiveness of the implementation of participatory ergonomics intervention to reduce physical work demands in construction work was studied.
Are any of these trials social science?
Shemilt I, Harvey I, Shepstone L, Swift L, Reading R, Mugford M, Belderson P, Norris N, Thoburn J, Robinson J. (2004) A national evaluation of school breakfast clubs: evidence from a cluster randomized controlled trial and an observational analysis. Child: Care, Health and Development 30, 413-427.
Study objective: To measure the health, educational and social impacts of breakfast club provision in schools serving deprived areas across England.
Also: Shemilt I, Mugford M, Moffatt P, Harvey I, Reading R, Shepstone L, Belderson P. (2004) A national evaluation of school breakfast clubs: where does economics fit in? Child: Care, Health and Development 30, 429-437.
Are any of these trials social science?
Strang J, McCambridge J. (2004) Can the practitioner correctly predict outcome in motivational interviewing? Journal of Substance Abuse Treatment 27, 83-88.
Study objective: We have examined whether practitioner ratings (immediately post-intervention) or other recorded characteristics of a single-session 1-hour motivational intervention were predictive of 3-month cannabis use outcome.
Are any of these trials social science?
Stephenson JM, Strange V, Forrest S, Oakley A, Copas A, Allen E, Babiker A, Black S, Ali M, Monteiro H, Johnson AM. (2004) Pupil-led sex education in England (RIPPLE study): cluster-randomised intervention trial. Lancet 364, 338-346.
Study objective: Improvement of sex education in schools is a key part of the UK government's strategy to reduce teenage pregnancy in England. We examined the effectiveness of one form of peer-led sex education in a school-based randomised trial of over 8000 pupils.
Are any of these trials social science?
Kendrick D, Royal S. (2004) Cycle helmet ownership and use: a cluster randomised controlled trial in primary school children in deprived areas. Archives of Disease in Childhood 89, 330-335.
Study objective: To assess the effectiveness of two different educational interventions, plus free cycle helmets, in increasing cycle helmet ownership and use.
- Conclusions
  - The effects of clustering can be large, inflating Type I error rates.
  - This may not be obvious to researchers, even to statisticians. (Quandoque bonus dormitat Homerus: even the worthy Homer sometimes nods, i.e. even the greatest get it wrong.)
- Conclusions
  - The effects of clustering can be large, inflating Type I error rates.
  - This may not be obvious to researchers, even to statisticians.
  - There are many ways to allow for clustering.
  - The number of cluster randomised trials published has increased greatly.
  - The effects of clustering have often been ignored.
  - The situation has improved.
- Recommendations
  - Keep up the pressure.
  - Extend to specialist journals.