Title: Tackling over-dispersion in NHS performance indicators
1Tackling over-dispersion in NHS performance
indicators
- Robert Irons (Analyst Statistician)
- Dr David Cromwell (Team Leader)
20/10/2004
2Outline of presentation
- NHS Star Ratings Model
- Criticism of some of the indicators
- The reason: over-dispersion
- Options for tackling the problem
- Our solution: an additive random-effects model
- Effects on the ratings indicators
3Performance Assessment in the UK
- 1990s: Government focused on efficiency
- 1997: Labour replaces Conservative government
- Late 90s: Labour focus on quality and efficiency
- Define Performance Assessment Framework
- Publish NHS Plan in 2000
- Commission for Health Improvement (CHI) created
- Performance ratings first published in 2001; responsibility passed to CHI for 2003 publication
- Healthcare Commission replaces CHI in April 2004, has broader inspection role
4NHS Performance Ratings
- An at-a-glance assessment of NHS trusts' performance
- Performance rated as 0, 1, 2, or 3 stars
- Yearly publication
- Focus on how trusts deliver government priorities
- Linked to implementation of key policies
  - Priorities and Planning Framework
  - National Service Frameworks
- Have limited role in direct quality improvement
- Modernisation Agency helps trusts with low rating
5Scope of NHS ratings
Trust type             Years rated (out of 2001, 2002, 2003, 2004)
Acute trusts           4
Ambulance trusts       3
Mental health trusts   2
Primary care trusts    2
6The ratings model
- Overall rating derived from many different indicators, and affected by Clinical Governance Reviews
- Two types of indicators, organised in 4 groups
  - Key targets and Balanced Scorecard (BS) indicators
  - BS indicators grouped into 3 focus areas
  - Patient focus, clinical focus, capacity and capability
7Combining the indicators
- Indicators are measured on different scales
  - Categorical (e.g. Yes/No)
  - Proportional (e.g. proportion of patients waiting longer than 15 months)
  - Rates (e.g. mortality rate within 30 days following selected surgical procedures)
- Further complication
  - Performance on some indicators is measured against published targets, which define thresholds
  - Performance on other indicators is based on relative differences between trusts
8Combining the indicators
- Indicators first transformed so they are all on an equivalent scale
- Key targets assigned to three levels
  - achieved
  - under-achieved
  - significantly under-achieved
- Balanced scorecard indicators assigned to five bands
  - 1: significantly below average (worst performance)
  - 2: below average
  - 3: average
  - 4: above average
  - 5: significantly above average (best performance)
9Transforming the indicators
- Key target indicators transformed using thresholds defined by government policy
- Balanced scorecard indicators transformed via several methods
  - Percentile method
  - Statistical method
  - Absolute method, if policy target exists
  - Mapping method (for indicators with ordinal scales)
Number of balanced scorecard indicators by transformation method and trust type
Method            Acute trusts   Ambulance trusts   Mental health trusts   Primary care trusts
Percentile        11             3                  9                      11
Statistical       12             8                  9                      11
Absolute          8              3                  5                      4
Defined mapping   4              5                  8                      7
10Transforming the indicators- the statistical
method
Number of indicators using the statistical method, by trust type (- means none)
Indicators                  Acute trusts   Ambulance trusts   Mental health trusts   Primary care trusts
Clinical indicators         4              -                  2                      -
Patient survey              5              5                  4                      5
Staff survey                3              3                  3                      3
Change in rate indicators   -              -                  -                      3
11The old statistical method
- Based on simple confidence intervals
- 95% and 99% confidence intervals calculated for a trust's indicator value
- Trust confidence interval compared with the overall national rate (effectively a single point)
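The comparison described above can be sketched roughly as follows. This is an illustration only: it assumes symmetric normal-approximation intervals (multipliers 1.96 and 2.576), which the slides do not specify, and the function name `band_old_method` and its labels are hypothetical. The labels describe the position of the trust's value relative to the national rate; whether "above" is good or bad depends on the indicator.

```python
def band_old_method(value, std_error, national_rate):
    """Band a trust by whether its 95% / 99% intervals exclude the national rate.

    Assumption: intervals are value +/- z * std_error with normal quantiles.
    """
    z95, z99 = 1.96, 2.576
    if value - z99 * std_error > national_rate:
        return "significantly above"   # whole 99% interval lies above the rate
    if value + z99 * std_error < national_rate:
        return "significantly below"   # whole 99% interval lies below the rate
    if value - z95 * std_error > national_rate:
        return "above"                 # only the 95% interval excludes the rate
    if value + z95 * std_error < national_rate:
        return "below"
    return "average"                   # the 95% interval contains the rate


# Hypothetical trust: indicator value 11.0, standard error 0.45, national rate 10.0
band = band_old_method(11.0, 0.45, 10.0)
```

Because the national rate is treated as a fixed point, any trust with a small enough standard error will eventually fall outside "average", however small its real difference.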
12The old statistical method- problematic
- Not a proper statistical hypothesis test
- Differentiates between trusts on any difference that exceeds the level of sampling variation
- On some indicators, this led to the assignment of too many NHS trusts to the significantly good/bad bands
13Working example- standardised readmission rate
of patients within 28 days of initial discharge
Significantly below average Below average Average Above average Significantly above average Total
32 6 40 13 49 140
14Readmissions within 28 days of discharge- funnel
plot (2003/04 data)
15Mortality within 30 days of selected surgical
procedures- funnel plot (2003/04 data)
16Z scores
- Standardised residual
- Z scores are used to summarise extremeness of the indicators
- Funnel plot limits approximate to the naïve Z score
- Naïve Z score given by
  - $Z_i = (y_i - t)/s_i$
- where $y_i$ is the indicator value, $t$ is the target (here the overall national rate), and $s_i$ is the local standard error
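As a small illustration of the naïve Z score and the matching funnel limits, here is a sketch using only the quantities named on this slide. The function names and the 1.96 multiplier (approximate 95% limits) are assumptions made for the example.

```python
def naive_z(y, t, s):
    """Naive Z score: distance of indicator value y from target t, in standard errors."""
    if s <= 0:
        raise ValueError("standard error must be positive")
    return (y - t) / s


def funnel_limits(t, s, z_crit=1.96):
    """Lower and upper funnel limits around target t for a trust with standard error s."""
    return t - z_crit * s, t + z_crit * s


# Hypothetical trust: indicator value 12.0, target 10.0, standard error 0.8
z = naive_z(12.0, 10.0, 0.8)
lower, upper = funnel_limits(10.0, 0.8)
outside = not (lower <= 12.0 <= upper)   # equivalent to abs(z) > 1.96
```

A trust plotted outside the funnel limits is the same trust whose naïve Z score exceeds the chosen multiplier in absolute value.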
17Dealing with over-dispersion
- Three options were considered
  - Use of an interval null hypothesis
  - Allow for over-dispersion using a multiplicative variance model
  - or a random-effects additive variance model
18Interval null hypothesis
- Similar to the naïve Z score or standard funnel limits
- Uses a judgement of what constitutes a normal range for the indicator
- Define normal range (e.g. percentiles, national rate ± x)
- Funnel limits then defined as
  - upper/lower limit = range limit ± $x \, s_{i0}$
- Reduces number of significant results
- But might be considered somewhat arbitrary
- Interval could be defined based on previous years' data, or prior knowledge
- Makes minimal use of the sampling error
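One way to picture the interval null hypothesis is sketched below. It assumes that the funnel limits are the two ends of the judged normal range, each moved outwards by a chosen multiple of the null standard error. The function name, the `multiplier` parameter and its default of 1.96 are assumptions for illustration, not values taken from the slides.

```python
def interval_null_limits(range_lower, range_upper, null_std_error, multiplier=1.96):
    """Funnel limits for an interval null hypothesis.

    The 'normal range' [range_lower, range_upper] replaces the single target
    point; each end is widened by multiplier * null_std_error.
    """
    if range_lower > range_upper:
        raise ValueError("range_lower must not exceed range_upper")
    lower = range_lower - multiplier * null_std_error
    upper = range_upper + multiplier * null_std_error
    return lower, upper


# Hypothetical normal range of 9.0 to 11.0 and a null standard error of 0.5
lower, upper = interval_null_limits(9.0, 11.0, 0.5)
```

When the normal range collapses to a single point, the limits reduce to the standard funnel limits around that point.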
19Interval null hypothesis-a funnel plot
20Multiplicative variance model
- Inflates the variance associated with each observation by an over-dispersion factor ($\phi$)
- $\sum_i Z_i^2$ = Pearson $X^2$
- $\hat{\phi} = X^2 / I$, where $I$ is the number of trusts
- Limits on funnel plot are then expanded by $\sqrt{\hat{\phi}}$
- Do not want $\hat{\phi}$ to be influenced by the outliers we are trying to identify
- Data are first winsorised (shrinks the extreme z-values in)
- Over-dispersion factor could be provisionally defined based on previous years' data
- Statistically respectable, based on a quasi-likelihood approach
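The multiplicative adjustment can be sketched as below: the over-dispersion factor is the mean squared Z score, and the funnel limits are stretched by its square root. The function names and the 1.96 multiplier are assumptions. The Z scores passed in would normally be winsorised first, as the slide notes; this sketch simply uses whatever scores it is given.

```python
import math


def overdispersion_factor(z_scores):
    """Estimate phi as the Pearson X^2 (sum of squared Z scores) divided by I."""
    if not z_scores:
        raise ValueError("need at least one Z score")
    pearson_x2 = sum(z * z for z in z_scores)
    return pearson_x2 / len(z_scores)


def expanded_limits(t, s, phi, z_crit=1.96):
    """Funnel limits around target t, widened by sqrt(phi)."""
    half_width = z_crit * s * math.sqrt(phi)
    return t - half_width, t + half_width


# Hypothetical Z scores for six trusts
z_scores = [-3.1, -1.2, 0.4, 0.9, 2.6, 4.0]
phi = overdispersion_factor(z_scores)
lower, upper = expanded_limits(10.0, 0.5, phi)
```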
21Multiplicative over-dispersion-a funnel plot
(not winsorised, $\hat{\phi}$ = 21.45)
22Multiplicative over-dispersion-a funnel plot
(10% winsorised, $\hat{\phi}$ = 13.97)
23Winsorising
- Winsorising consists of shrinking in the extreme Z-scores to some selected percentile, using the following method.
  - Rank cases according to their naïve Z-scores.
  - Identify $Z_q$ and $Z_{1-q}$, the naïve Z-scores at the 100q% and 100(1-q)% percentiles, where q might, for example, be 0.1
  - Set the lowest 100q% of Z-scores to $Z_q$, and the highest 100q% of Z-scores to $Z_{1-q}$. These are the winsorised statistics.
- This retains the same number of Z-scores but discounts the influence of outliers.
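The winsorising steps above can be sketched in a few lines. The function name and the convention for picking the cut-off values (the score ranked `int(n * q)` places in from each end) are assumptions; other percentile conventions would give slightly different cut-offs for small samples.

```python
def winsorise(z_scores, q=0.1):
    """Pull the lowest and highest 100q% of Z scores in to the cut-off values."""
    if not 0 <= q < 0.5:
        raise ValueError("q must be in [0, 0.5)")
    n = len(z_scores)
    if n == 0:
        return []
    ranked = sorted(z_scores)           # step 1: rank the cases
    k = int(n * q)                      # number of scores to shrink at each end
    z_low = ranked[k]                   # step 2: lower cut-off, Z_q
    z_high = ranked[n - 1 - k]          #         upper cut-off, Z_(1-q)
    # step 3: clamp every score into [z_low, z_high], keeping the original order
    return [min(max(z, z_low), z_high) for z in z_scores]


# Hypothetical naive Z scores for ten trusts, with one extreme value at each end
z_scores = [-8.0, -2.0, -1.0, -0.5, 0.0, 0.3, 0.9, 1.5, 2.5, 12.0]
winsorised = winsorise(z_scores, q=0.1)
```

The list keeps its length, so the number of trusts used when estimating the over-dispersion factor is unchanged; only the two extreme scores are moved.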
24Winsorising
Non-winsorised
10% winsorised
25Random effects additive variance model
- Based on a technique developed for meta-analysis
- Originally designed for combining the results of disparate studies of the same effect
- In meta-analysis terms, consider the indicator value of each trust to be a separate study
- Essentially seeks to compare each trust to a null distribution instead of a point
- Assumes that $E[y_i] = \theta_i$ and $V[\theta_i] = \tau^2$
- Uses a method-of-moments estimator for $\tau^2$ (DerSimonian and Laird, 1986)
- Based on winsorised estimate of $\phi$
26Random effects additive variance model
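- If $I\hat{\phi} < (I - 1)$ then
  - the data are not over-dispersed, and $\hat{\tau}^2 = 0$
  - use standard funnel limits/naïve Z scores
- Otherwise
  - $\hat{\tau}^2 = \dfrac{I\hat{\phi} - (I - 1)}{\sum_i w_i - \sum_i w_i^2 / \sum_i w_i}$
  - where $w_i = 1/s_i^2$
- The new random-effects Z score is then calculated as
  - $Z_i = (y_i - t)/\sqrt{s_i^2 + \hat{\tau}^2}$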
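A minimal sketch of the additive adjustment follows. It assumes the DerSimonian and Laird moment estimator in the form tau^2 = (I*phi - (I - 1)) / (sum(w) - sum(w^2)/sum(w)) with w_i = 1/s_i^2, and an adjusted score (y_i - t)/sqrt(s_i^2 + tau^2). The function names are hypothetical, and phi is taken as an input (in practice the winsorised estimate).

```python
import math


def tau_squared(phi, std_errors):
    """Method-of-moments estimate of the between-trust variance tau^2."""
    n_trusts = len(std_errors)
    if n_trusts < 2:
        raise ValueError("need at least two trusts")
    if n_trusts * phi < n_trusts - 1:
        return 0.0                      # not over-dispersed: keep naive Z scores
    weights = [1.0 / (s * s) for s in std_errors]
    sum_w = sum(weights)
    sum_w2 = sum(w * w for w in weights)
    return (n_trusts * phi - (n_trusts - 1)) / (sum_w - sum_w2 / sum_w)


def random_effects_z(y, t, s, tau2):
    """Z score with tau^2 added to the trust's own sampling variance."""
    return (y - t) / math.sqrt(s * s + tau2)


# Hypothetical standard errors for four trusts and an assumed phi of 6.0
std_errors = [0.5, 0.8, 1.0, 1.2]
tau2 = tau_squared(6.0, std_errors)
z_adjusted = random_effects_z(12.0, 10.0, 0.5, tau2)
```

Because the same tau^2 is added to every trust's variance, the adjusted score is always closer to zero than the naïve one, and the shrinkage is strongest for the most precisely measured trusts.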
27Comparing to a null distribution
28Additive over-dispersion-a funnel plot (20%
winsorised)
29Effects on the banding of trusts- Readmissions
2002/03 data
Method                            Significantly below average   Below average   Average   Above average   Significantly above average
Previous banding method           32                            6               40        13              49
Random-effects (20% winsorised)   3                             9               101       21              6
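To put the change in perspective, the share of trusts falling in the two "significantly" bands can be worked out directly from the counts in the table above. The helper name below is hypothetical; the counts are those shown on the slide.

```python
# Band counts in order: significantly below, below, average, above, significantly above
previous = [32, 6, 40, 13, 49]
random_effects = [3, 9, 101, 21, 6]


def share_in_extreme_bands(counts):
    """Proportion of trusts in the first and last (significant) bands."""
    return (counts[0] + counts[-1]) / sum(counts)


print(f"{share_in_extreme_bands(previous):.0%}")        # prints 58%
print(f"{share_in_extreme_bands(random_effects):.0%}")  # prints 6%
```

Both rows cover the same 140 trusts, so the drop from 81 to 9 trusts in the extreme bands is entirely due to the change of method.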
30Why we chose the additive variance method
- Generally avoids situations where two trusts which have the same value for the indicator get put in different bands because of precision
- A multiplicative model would increase the variance at some trusts more than at others
  - e.g. a small trust with large variance would be affected much more than a large trust with small variance
- By contrast, an additive model increases the variance at all trusts by the same amount
- Better conceptual fit with our understanding of the problem, that the factors inflating variance affect all trusts equally, so an additive model is preferable
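The contrast between the two models can be illustrated with two made-up trusts. All numbers here are hypothetical (a small trust with standard error 2.0, a large trust with standard error 0.5, phi = 4 and tau^2 = 1), chosen only to show how each model changes the variance.

```python
def multiplicative_variance(s, phi):
    """Variance after multiplying the sampling variance by phi."""
    return phi * s * s


def additive_variance(s, tau2):
    """Variance after adding tau^2 to the sampling variance."""
    return s * s + tau2


small_trust_se, large_trust_se = 2.0, 0.5   # hypothetical standard errors
phi, tau2 = 4.0, 1.0                        # hypothetical over-dispersion values

# Increase in variance under each model
mult_increase_small = multiplicative_variance(small_trust_se, phi) - small_trust_se ** 2
mult_increase_large = multiplicative_variance(large_trust_se, phi) - large_trust_se ** 2
add_increase_small = additive_variance(small_trust_se, tau2) - small_trust_se ** 2
add_increase_large = additive_variance(large_trust_se, tau2) - large_trust_se ** 2
```

With these values the multiplicative model adds 12.0 to the small trust's variance but only 0.75 to the large trust's, while the additive model adds 1.0 to both.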
31References
- DJ Spiegelhalter (2004) Funnel plots for comparing institutional performance. Statistics in Medicine, 24 (to appear)
- DJ Spiegelhalter (2004) Handling over-dispersion of performance indicators (submitted)
- R DerSimonian, N Laird (1986) Meta-analysis in clinical trials. Controlled Clinical Trials, 7: 177-188
Acknowledgements
- David Spiegelhalter
- Adrian Cook
- Theo Georghiou
Thank you