Title: Difficulties in analysing nonrandomised trials and ways forward
1 Difficulties in analysing non-randomised trials
(and ways forward?)
- RCTs in the Social Sciences: challenges and
prospects. York University, 13-15 Sept. 2006
- Paul Marchant
- Leeds Metropolitan University
- p.marchant_at_leedsmet.ac.uk
- (Paul Baxter from the Department of Statistics,
University of Leeds, is involved in developing
some of this work)
2 The Basic Point
- My thoughts:
- If non-RCTs are used, we need a good
understanding of the system being studied and a
quantitative model to work out what is lost and
what the effect is.
- The effects being sought may be small, so the
impact of small systematic errors can be important.
- Probably best just to use RCTs, especially when
policy implications are costly.
3 The problem
- In crime research there is a 5-point Maryland
Scientific Methods Scale which orders trial
designs (the RCT is at the top).
- While the ordering may be fine, there is no
formal indication of what is lost by using a 4
rather than a 5.
- It would seem that a large potential exists for
drawing false inferences.
4 The Randomised Controlled Trial (a truly
marvellous scientific invention)
- Note to avoid bias
- Allocation is best made tamper-proof.
- (e.g. use concealment)
- Use multiple blinding of
- patients,
- physicians,
- assessors,
- analysts
[Flow diagram: Population -> Take Sample -> Randomise to 2 groups (Old Treatment, New Treatment)]
Compare outcomes (averages), recognising that
these are sample results and subject to sampling
variation when applying back to the population.
5 Counts of those cured and not cured under the two
treatments
By comparing the ratios of numbers cured to
not cured in the 2 arms of the trial, i.e. the
cross-product ratio CPR = (ad)/(cb), where a, b, c, d
are the four cell counts, it is possible to tell if
the new treatment is better.
6 Confidence Intervals
- However, there is sampling variability, because we
don't study everybody of interest, just our
random sample.
- So we cannot have perfect knowledge of the effect of
interest, but only an estimate of it within a
confidence interval (CI).
- We need to know how to calculate the CI
appropriately. This can be done under
assumptions which seem reasonable for the case
of a clinical RCT, and leads to a simple formula
for the approximate CI of ln(CPR), namely
ln(CPR) ± 1.96 × standard error, where
\[ \left(\mathrm{s.e.}(\ln \mathrm{CPR})\right)^2 = \mathrm{Var}(\ln \mathrm{CPR}) = \frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d} \]
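As a minimal sketch of the formula above (not taken from the slides), the approximate CI can be computed directly from the four cell counts. The function name cpr_confidence_interval and the counts used are hypothetical; which cell is a, b, c or d follows the table on the slide, which is not reproduced here.

```python
import math

def cpr_confidence_interval(a, b, c, d, z=1.96):
    """Return (CPR, lower, upper) using the simple formula
    Var(ln CPR) = 1/a + 1/b + 1/c + 1/d.

    This assumes independent events, as in a clinical RCT.
    """
    cpr = (a * d) / (c * b)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_cpr = math.log(cpr)
    # The interval is symmetric on the log scale, then transformed back.
    return cpr, math.exp(log_cpr - z * se), math.exp(log_cpr + z * se)

# Hypothetical counts, for illustration only.
cpr, lower, upper = cpr_confidence_interval(a=40, b=60, c=30, d=70)
print(f"CPR = {cpr:.2f}, approximate CI = ({lower:.2f}, {upper:.2f})")
```

If the interval includes 1, the data are compatible with no difference between the two arms.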
7 Crime counts before and after in two areas, where one
gets a crime reduction initiative (CRI) (4 on the Methods Scale)
- A similar table results. But this is not the same
as the RCT set-up, as:
- 1. Not randomised, so no statistical equivalence
exists at the start.
- 2. The unit is the area, rather than the crime event.
8 Lighting and crime
- There seem to be many theoretical suggestions
why lighting might increase or decrease crime.
- The meta-analysis HORS251, by Farrington and
Welsh, strongly suggests that lighting beats
crime. However, my contention is that this study
remains flawed and so we are ignorant of the
effect of lighting on crime. (Note also HORS252
on CCTV.)
9 Forest Plot as HORS251 Meta-analysis, reconstructed
10 But this can't be right.
- The assumptions for calculating the CIs cannot be
correct in this case. The unit is the area, not the crime.
The events are not statistically independent.
- Too much variation (heterogeneity) exists between
individual study results compared with the
uncertainty indicated by the confidence intervals
(if the lighting has the same effect on crime in
every study).
- Note there is great variation in crime counts
between periods in the comparison areas, where
nothing is changed, so the heterogeneity is
inherent to the natural variation of crime.
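One standard way to put a number on "too much variation" is Cochran's Q statistic. The sketch below is an illustration only, not necessarily the calculation used in HORS251 or in Marchant (2004). The log effects and standard errors are hypothetical, as is the function name cochran_q.

```python
def cochran_q(log_effects, standard_errors):
    """Return (Q, degrees of freedom, pooled log effect).

    If every study had the same true effect and the standard errors
    were right, Q would be roughly chi-squared with k - 1 degrees of
    freedom, so its expected value would be about k - 1.
    """
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * y for w, y in zip(weights, log_effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_effects))
    return q, len(log_effects) - 1, pooled

# Hypothetical ln CPR values and standard errors for four studies.
q, df, pooled = cochran_q([0.1, 0.5, -0.2, 0.8], [0.1, 0.1, 0.1, 0.1])
print(f"Q = {q:.1f} on {df} degrees of freedom, pooled ln CPR = {pooled:.2f}")
```

A Q far larger than its degrees of freedom indicates that the CIs are too narrow, that the effect differs between studies, or both.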
11 Pointing out the problem
- Marchant (2004): a 7-page article in the British
Journal of Criminology drawing attention to the
problem. The formula used for the CIs must be
inappropriate (other shortcomings are also
mentioned).
- The authors of HORS251 had a 20-page response,
starting on the next page, justifying the claim that
lighting reduces crime.
- But I remain unconvinced by the claim.
12 Fixing the Heterogeneity Problem
- A way of making the problem go away is simply to
increase the uncertainty, i.e. stretch the CIs
(a quasi-Poisson model).
- Here the CIs are stretched by a factor of 2.1
(equivalent to reducing the events counted in
every setting by a factor of 2.1² ≈ 4.4). This
adjustment has been made by the authors.
- Problem solved... or is it? Is such a model
plausible? It assumes every study should have its CI
stretched by the same factor. This cannot be
guaranteed.
- Only relatively few (13) studies.
- Need sensitivity analysis.
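The equivalence between stretching the CI and shrinking the counts follows from the simple variance formula 1/a + 1/b + 1/c + 1/d, and can be checked numerically. This is a sketch only. The counts are hypothetical and se_log_cpr is a helper name introduced here.

```python
import math

def se_log_cpr(a, b, c, d):
    """Standard error of ln CPR under the simple formula."""
    return math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

stretch = 2.1                    # stretch factor quoted on the slide
counts = (400, 300, 350, 360)    # hypothetical cell counts

se_original = se_log_cpr(*counts)
se_stretched = stretch * se_original

# Dividing every count by stretch**2 multiplies each 1/n term by
# stretch**2, and hence the standard error by stretch.
se_reduced = se_log_cpr(*(n / stretch ** 2 for n in counts))

print(f"stretch squared = {stretch ** 2:.2f}")
print(f"stretched s.e. = {se_stretched:.4f}, reduced-count s.e. = {se_reduced:.4f}")
```

The same single factor is applied to every study, which is exactly the assumption questioned on the slide.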
13 Time Variation in Crime
- It appears that little is known about how crime
varies on various scales.
- Much more needs to be known about the occurrence
of crime events to know how to analyse them
properly, so as to be able to find effects.
- Access to suitable data sets is needed to examine this
issue. This is ongoing research in which my
colleagues and I are engaged.
- A general point: one needs to have knowledge
about the system in order to understand if an
intervention changes things (and in order to
design studies).
14 The Bristol Study (Shaftoe 1994)
Shaftoe said no discernible lighting benefit,
but HORS251 said z = 6.6. Note: had the data for the
year immediately prior to the introduction of the
relighting, i.e. periods 2 and 3, been used,
rather than unnaturally using periods 1 and 2,
which leaves a gap of ½ year, the effect found
would have been half of that claimed. (This shows
large variability.)
15 Household studies
- In a couple of instances, instead of just
counting recorded crimes a, b, c, d in the 4
cells (before, after; intervention, comparison),
a household survey of recalled crimes is carried
out before and after within the 2 areas
(intervention, comparison).
- One problem (unrecognised by the authors,
Painter and Farrington) is that spatial correlation
between the occurrences of crime needs to be
considered. This gives rise to a Design Effect,
familiar in clustered designs, which reduces the
precision of the estimate of effect.
- There are other problems, e.g. differential change
of composition between periods.
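To give a feel for the size of a design effect, the sketch below uses one common approximation for equal-sized clusters, deff = 1 + (m - 1) × ICC (Kish's formula). The slides do not say which formula applies to the household surveys. The cluster size, intra-cluster correlation and sample size below are hypothetical.

```python
import math

def design_effect(cluster_size, icc):
    """Kish's approximation for equal-sized clusters (an assumption here)."""
    return 1 + (cluster_size - 1) * icc

# Hypothetical values: 20 households per neighbouring cluster, with
# an intra-cluster correlation of 0.05 in victimisation.
deff = design_effect(cluster_size=20, icc=0.05)
n_surveyed = 800                       # hypothetical number of households
n_effective = n_surveyed / deff        # equivalent independent sample size
se_inflation = math.sqrt(deff)         # factor by which the s.e. grows

print(f"design effect = {deff:.2f}")
print(f"effective sample size = {n_effective:.0f} of {n_surveyed}")
print(f"standard errors inflated by a factor of {se_inflation:.2f}")
```

Even a modest correlation between neighbouring households can roughly halve the effective sample size, so CIs computed as if households were independent are too narrow.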
16 Lack of Equivalence between Areas
- Invariably it is the most crime-ridden area that
gets the lighting, whereas the relatively
crime-free control area is not re-lit. So there
is a lack of equivalence at the start. One effect
of this is to allow regression towards the mean
to operate.
- The name "Control Area" is a misnomer.
"Comparison Area" is a better name.
17 Regression towards the mean
[Figure: scatter plot of a cloud of data points, with X (the before measurement) and Y (the after measurement) each on an axis from 0 to 100, showing the line of equality and the line of the mean of Y for a given X.]
18 The response given to the lack of equivalence
between the 2 areas (RTM)
- Farrington and Welsh (2006) claim that RTM is not
a problem because counted crimes in 250 Police
Basic Command Units (BCUs), going from 2002/3 to
2003/4, showed only a small effect (a few per
cent). This is hardly surprising, as the areas, and
hence the numbers of crimes counted, are an order
of magnitude larger than in HORS251, so the
year-to-year correlation is expected to be higher than
for the small lighting study areas.
- Note Wrigley (1995): "This tendency for
correlation coefficients to increase in magnitude
as the size of the areal unit involved increases
has been known since the work of Gehlke and Biehl
(1934)."
19 Log crime rates in successive periods
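20 Estimating the effect of RTM
- On the basis of log-normal crime rates it can be
shown that, if the intervention has no effect, the
expected ln CPR is
\[ E[\ln \mathrm{CPR}] = \left(1 - \rho\,\frac{s_y}{s_x}\right) \ln\frac{x_1}{x_2} \]
- Here x_1/x_2 is the crime rate ratio, s_x and s_y are the sds on
the log scale, and \rho is the correlation on the log
scale.
- The variance is
\[ \mathrm{Var}(\ln \mathrm{CPR}) = 2\, s_y^2 \left(1 - \rho^2\right) \]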
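The two expressions above can be checked by simulation. The sketch below assumes bivariate normal log rates, independent areas, and no intervention effect. It draws the after-period log rates conditional on the before-period difference ln(x1/x2). All parameter values and the function name simulate_ln_cpr are hypothetical, chosen only to illustrate the model.

```python
import math
import random
import statistics

def simulate_ln_cpr(ln_rate_ratio, s_x, s_y, rho, n_sims=20000, seed=1):
    """Simulate ln CPR under no intervention effect.

    ln CPR = (x1 - x2) - (y1 - y2), with x the before and y the after
    log rates. Only differences matter, so the comparison area's
    before-period log rate is set to zero.
    """
    rng = random.Random(seed)
    slope = rho * s_y / s_x                     # regression slope of y on x
    resid_sd = s_y * math.sqrt(1 - rho ** 2)    # sd of y given x
    values = []
    for _ in range(n_sims):
        y1 = slope * ln_rate_ratio + rng.gauss(0, resid_sd)  # intervention area
        y2 = rng.gauss(0, resid_sd)                          # comparison area
        values.append(ln_rate_ratio - (y1 - y2))
    return statistics.mean(values), statistics.variance(values)

# Hypothetical inputs: the intervention area starts with twice the rate.
ln_ratio, s_x, s_y, rho = math.log(2.0), 0.5, 0.5, 0.8
sim_mean, sim_var = simulate_ln_cpr(ln_ratio, s_x, s_y, rho)

formula_mean = (1 - rho * s_y / s_x) * ln_ratio
formula_var = 2 * s_y ** 2 * (1 - rho ** 2)
print(f"mean ln CPR: simulated {sim_mean:.3f}, formula {formula_mean:.3f}")
print(f"variance:    simulated {sim_var:.3f}, formula {formula_var:.3f}")
```

The simulated mean is positive even though nothing was done to either area, which is the RTM bias in favour of the intervention.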
21 Estimation of the effect of RTM
- The simple model of crime rates suggests that the
high year-to-year correlation, typically 0.95 for
the BCU data, would indeed give an effect of a
few per cent.
- However, the smaller areas used in CRI evaluation
would be expected to have lower correlation.
- Burglary data from a study of 124 areas have a
correlation of about 0.8, giving, all else equal,
an expected effect 4 times larger, comparable to
the claimed lighting effect.
- Note that in general we don't know the correlation
or the rates being compared for the lighting
studies. However, we do know that, whereas the
household crime rate ratio at the start was 1.40
for Dudley, that for Stoke was 2.51, giving a much
larger expected RTM effect.
- Without better knowledge we can't be definite
about the impact of RTM, but the indications are
that the bias could be serious and the uncertainty
large.
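The figures quoted above can be put through the expected ln CPR expression as a rough worked example. The sketch assumes s_y = s_x, and, for the Dudley and Stoke comparison, a correlation of 0.8 in both places. That correlation is an assumption borrowed from the burglary data, since the true correlations for the lighting studies are not known. The function name expected_ln_cpr is introduced here.

```python
import math

def expected_ln_cpr(rate_ratio, rho, sd_ratio=1.0):
    """Expected ln CPR with no intervention effect.

    sd_ratio is s_y / s_x; taking it as 1 is an assumption.
    """
    return (1 - rho * sd_ratio) * math.log(rate_ratio)

# Same starting rate ratio, different correlations: BCU-like vs small areas.
any_ratio = 1.40
effect_ratio = expected_ln_cpr(any_ratio, 0.8) / expected_ln_cpr(any_ratio, 0.95)
print(f"rho = 0.8 gives {effect_ratio:.1f} times the effect of rho = 0.95")

# Same assumed correlation, different starting rate ratios.
dudley = expected_ln_cpr(1.40, rho=0.8)
stoke = expected_ln_cpr(2.51, rho=0.8)
print(f"Dudley: expected ln CPR = {dudley:.3f} (CPR = {math.exp(dudley):.2f})")
print(f"Stoke:  expected ln CPR = {stoke:.3f} (CPR = {math.exp(stoke):.2f})")
```

The factor of 4 arises because (1 - 0.8)/(1 - 0.95) = 4. The larger starting ratio for Stoke raises the expected RTM effect further still.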
22 Expected natural log of CPR and its CI for a set
of burglary data
23 Potential consequences of weak methods
- Because there is a tendency to find positive
effects, and probably even more so with less
rigorous work, one is likely to end up with an
even more distorted research record.
- This might lead to dubious justification of a bad
policy through flimsy cost-benefit analyses.
- While it might be possible to estimate the effect
of the excess variability or of the RTM
discussed, it would seem problematic to be
confident about adequately adjusting for them.
- RCTs would avoid many problems and may be very
cheap relative to policy costs.
24 Some conclusions
- A Methods Scale seems to suggest that designs
weaker than RCTs might suffice, without
indicating what is lost.
- I have indicated some of the problems which
result.
- Need to foster scepticism (Gorard 2002).
- I remain to be convinced that the deficiencies
can be adequately overcome through estimating
quantitatively the consequences of using a weaker
design.
- Weaker designs might be useful in preliminary
research but should not be considered adequate
when there are expensive consequences.
- RCTs can be problematic enough! (We need
registered trials, published protocols, blinding,
etc.)
- Evaluations of policies need to be done to a high
scientific standard.
25 References
- Farrington D.P. and Welsh B.C. (2002) The Effects
of Improved Street Lighting on Crime: A
Systematic Review, Home Office Research Study
251, http://www.homeoffice.gov.uk/rds/pdfs2/hors251.pdf
- Farrington D.P. and Welsh B.C. (2004) Measuring
the Effects of Improved Street Lighting on Crime:
A reply to Dr. Marchant, The British Journal of
Criminology 44, 448-467,
http://bjc.oupjournals.org/cgi/content/abstract/44/3/448
- Farrington D.P. and Welsh B.C. (2006) How
Important is Regression to the Mean in Area-Based
Crime Prevention Research?, Crime Prevention and
Community Safety 8, 50
- Gorard S. (2002) Fostering Scepticism: The
Importance of Warranting Claims, Evaluation and
Research in Education 16, 3, p136
- Marchant P.R. (2004) A Demonstration that the
Claim that Brighter Lighting Reduces Crime is
Unfounded, The British Journal of Criminology 44,
441-447,
http://bjc.oupjournals.org/cgi/content/abstract/44/3/441
26 References continued
- Marchant P.R. (2005) What Works? A Critical Note
on the Evaluation of Crime Reduction Initiatives,
Crime Prevention and Community Safety 7, 7-13
- Painter K. and Farrington D.P. (1997) The
Crime Reducing Effect of Improved Street
Lighting: The Dudley Project, in R.V. Clarke (ed.),
Situational Crime Prevention: Successful case
studies, 209-226, Harrow and Heston, Guilderland
NY.
- Shaftoe H. (1994) Easton/Ashley, Bristol:
Lighting Improvements, in S. Osborn (ed.), Housing
Safe Communities: An Evaluation of Recent
Initiatives, 72-77, Safe Neighbourhoods Unit,
London
- Tilley N., Pease K., Hough M. and Brown R. (1999)
Burglary Prevention: Early Lessons from the Crime
Reduction Programme, Crime Reduction Research
Series Paper 1, London: Home Office
- Wrigley N. (1995) Revisiting the Modifiable Areal Unit
Problem and Ecological Fallacy, pp49-71 in Gould
P.R., Hoare A.G. and Cliff A.D. (eds), Diffusing
Geography: Essays for Peter Haggett
27 The RTM problem
- The effect of RTM depends on the correlation (the
weaker the correlation, the bigger the effect) and
increases with the size of the initial difference
between groups.
- The authors attempt to justify having no RTM concern
using large-area crime data, which show only a small
RTM effect. But this is wrong, as the correlation
won't be as high in the smaller areas used in the
trials. We also don't know the starting rates in the
areas in general. For the 2 where we do, they are
quite different (1.4× and 2.5×).