Title: Performance Monitoring in the Public Services
Slide 1: Performance Monitoring in the Public Services
Slide 2: Royal Statistical Society - Performance Indicators: Good, Bad, and Ugly
- Some good examples, but
- Scientific standards, in particular statistical standards, have been largely ignored
Slide 3: Royal Statistical Society concern
- PM schemes should be:
- Well-designed, avoiding perverse behaviours
- Sufficiently analysed (context/case-mix)
- Fairly reported (measures of uncertainty)
- Shielded from political interference
- Address seriously the criticisms/concerns of those being monitored
Slide 4: 1. Introduction
- 1990s rise in government by measurement
- goad to efficiency & effectiveness
- better public accountability (financial)
Slide 5: Three uses of PM data
- What works? (research role)
- Well/under-performing institutions or public servants . . . (managerial role)
- Hold Ministers to account for stewardship of public services (democratic role)
Slide 6: 2. PM Design, Target Setting & Protocol
- How to set targets
- Step 1: Reasoned assessment of plausible improvement within PM time-scale
- Step 2: Work out PM scheme's statistical potential (power) re this rational target - see p11
Slide 7: Power matters
- Excess power incurs unnecessary cost
- Insufficient power risks failing to identify effects that matter
- Insufficient power: can't trust claims of policy equivalence
- How not to set targets - see p12
Slide 8: 3. Analysis of PM data - same principles
- Importance of variability
- intrinsic part of the real world; interesting per se; contributes to uncertainty in primary conclusions (p15)
- Adjusting for context to achieve comparability
- note (p17) the incompleteness of any adjustment
- Multiple indicators
- resist 1-number summary
- (avoid value judgements; reveal intrinsic variation)
Slide 9: 4. Presentation of PIs - same principles
- Simplicity / discard uncertainty?
- League tables? Uncertainty of ranking - PLOT 1
- Star banding? Show uncertainty of institutions' banding
- Funnel plot: variability depends on sample size; divergent hospitals stand out - see PLOT 2
Slide 10: Plot 1 - 95% intervals for ranks
Slide 11: 5. Impact of PM on the public services
- Public cost if PM fails to identify under-performing institutions, so no remedial action is taken
- Less well recognised:
- Institutional cost: falsely labelled as under-performing
- Unintended consequences, e.g. risk-averse surgeons
Slide 12: 6. Evaluating PM initiatives
- Commensurate with risks & costs
- How soon to start (Europe's BSE testing)
- Pre-determined policy roll-out (rMDT)
- Disentangling (several) policy effects
- Role of experiments (randomisation)
Slide 13: Missed opportunities for experiments (including randomisation)
- rMDT of prisoners
- Drug Treatment & Testing Orders
- Cost-effectiveness matters!
Slide 14: What works in UK criminal justice?
- RCTs essentially untried . . .
Slide 15: Judges prescribe sentences on lesser evidence than doctors prescribe medicines
Slide 16: 7. Integrity, confidentiality & ethics
- Integrity (statistical)
- For public accountability, PIs need wider-than-government consensus & safeguards, as for National Statistics.
- Lacking if targets are irrational, power is insufficient, the scheme is cost-inefficient, or the analysis lacks objectivity or is superficial.
Slide 18: Confidentiality & ethics
- PM accesses data about people: 3rd-party approval?
- Public naming of individuals: their informed consent? Legal requirements re confidentiality, as for National Statistics?
- Unwarranted cost to reputations from superficial analysis or missing uncertainty measures: legal & human-rights liabilities?
Slide 19: Royal Statistical Society is calling for
- PM protocols
- Independent scrutiny
- Reporting of measures of uncertainty
- Research into strategies other than name & shame; better designs for evaluating policy initiatives
- Wider consideration of PM ethics & cost-effectiveness
Slide 20: Royal Statistical Society is appealing to journalists
- When league tables or star ratings are published,
- insist on access to (& reporting of) the measure of uncertainty that qualifies each ranking or rating.
- Without this qualifier, no-one can separate the chaff from the wheat, the good from the bad . . .
Slide 21: Experiments & Power matter
- Sheila M. Bird, chair
- RSS Working Party on Performance Monitoring in the Public Services
Slide 22: Evaluations-charade - public money spent on inferior (usually non-randomised) study designs that result in poor-quality evidence about how well policies actually work
- Costly and inefficient; by denying scientific method, a serious loss in public accountability
Slide 23: Experimentation as beacon of light on policy
Slide 24: Missed opportunities for experiments (including randomisation)
- rMDT of prisoners
- Drug Treatment & Testing Orders (DTTOs)
- Cost-effectiveness matters!
Slide 25: Court Portcullis, DTTO-eligible offenders - do DTTOs work?
- Off 1: DTTO
- Off 2: DTTO
- Off 3: alternative
- Off 4: DTTO
- Off 5: alternative
- Off 6: alternative
- Database linkage to find out about major harms: offenders' deaths, re-incarcerations . . .
Slide 26: Court Portcullis, DTTO-eligible offenders - cost-effectiveness?
- Off 7: DTTO
- Off 8: alternative
- Off 9: alternative
- Off 10: DTTO
- Off 11: DTTO
- Off 12: alternative
- Off 13: DTTO
- Off 14: alternative
- Breaches . . . drugs spend?
Slide 27: Court Portcullis, DTTO-eligible offenders - UK plc? Guess
- Off 7: DTTO?
- Off 8: DTTO?
- Off 9: DTTO?
- Off 10: DTTO?
- Off 11: DTTO?
- Off 12: DTTO?
- Off 13: DTTO?
- Off 14: DTTO?
- (Before/after) interviews versus . . . ?
- (A randomisation sketch follows.)
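Slides 25-27 describe the design the Working Party wanted: random allocation of DTTO-eligible offenders at court, with major harms found later by database linkage. Below is a minimal Python sketch of that idea; the offender labels, group sizes, and harm rates are invented for illustration, not taken from any study.

```python
# What randomisation buys at the court portcullis: eligible offenders are
# randomly allocated to DTTO or the alternative sentence, and major harms
# (deaths, re-incarcerations) are later found by database linkage.
# All labels and rates below are invented for illustration.
import random

random.seed(4)
offenders = [f"Off {i}" for i in range(1, 401)]
random.shuffle(offenders)
dtto, alternative = offenders[:200], offenders[200:]

def linked_harms(group, true_rate):
    """Stand-in for database linkage: count harms at an assumed true rate."""
    return sum(random.random() < true_rate for _ in group)

harms_dtto = linked_harms(dtto, 0.30)
harms_alt = linked_harms(alternative, 0.40)
print(f"DTTO: {harms_dtto}/200 harmed; alternative: {harms_alt}/200 harmed")
# Because allocation was random, the difference estimates the causal effect
# of DTTOs - no funded guesswork about the sentence a judge would have given.
```

Note the group size: by slide 35's back-of-envelope sum, 200 per group gives only minimum (50%) power for a 10-point difference, so a real trial would want about 400 per group.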
Slide 28: Similar - other offences . . .
- Random Mandatory Drugs Testing of Prisoners (Cons.): punishment as deterrent/exacerbation of heroin use in jail
- Intensive Control and Change Programme (Lab.): employment & re-offending rate
- Scotland's Airborne Initiative (Lib/Lab): relative cost-effectiveness
- On-charge Drugs Testing (Lab.): impact on a) arrest referral, b) treatment uptake, c) court sentence
- etc.
Slide 29: Evaluations-charade
- Failure to randomise
- Failure to find out about major harms
- Failure even to elicit the alternative sentence: funded guesswork on relative cost-effectiveness
- Volunteer-bias in follow-up interviews
- Inadequate study size re major outcomes . . .
Slide 30: Power (study size) matters!
- Back-of-envelope sum for 80% power
- Percentages
- Counts
- If MPs don't know, UK plc keeps hurting
Slide 31: Percentages!
- Prison Service, E&W: 60,000 rMDTs per annum gives excess statistical power to identify whether prisoners' heroin +ve rate decreased from 5% to 4.5% between years (simple sum)
- Scottish Prison Service: 6,000 rMDTs per annum gives only 50% power to identify whether the heroin +ve rate decreased from 15% to 13.5%
- BUT, comfortable power per 2-year period
Slide 32: Back-of-envelope sum for 80% power?
- Step 1:
  [success rate × failure rate (old) + success rate × failure rate (new target)]
  ÷ [difference in success rates × difference in success rates]
- Step 2: Multiply the Step 1 answer by 8 to get the study number for old. Study the same number for new.
Slide 33: To compare completion rates of 40% (old) versus 50% (new target)?
- Step 1:
  [success rate × failure rate (old) + success rate × failure rate (new target)]
  ÷ [difference in success rates × difference in success rates]
  = (40 × 60 + 50 × 50) ÷ [(50 − 40) × (50 − 40)]
Slide 34: To compare CJ completion rates of 40% (old) versus 50% (new target)?
- Step 1: (40 × 60 + 50 × 50) ÷ (10 × 10) = (2400 + 2500) ÷ 100 = 49
- Step 2 (for 80% power): Multiply the Step 1 answer (49) by 8 to get the study number for old: about 400. Same number for new.
Slide 35: To compare CJ completion rates of 40% (old) versus 50% (new target)?
- For comfortable (80%) power, completion rates to be compared between 400 (old disposal) versus 400 (new initiative)
- Minimum (50%) power requires half that!
- Completion rates to be compared between 200 (old) versus 200 (new)
- (The rule of eight is sketched in code below.)
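The two-step sum above is easy to mechanise. Here is a minimal Python sketch of the slides' rule of eight for percentages; the function name is our own, and the ~80% power figure assumes the usual two-sided 5% significance level.

```python
# Back-of-envelope sample size for ~80% power, as on slides 32-34:
#   Step 1: (p_old * q_old + p_new * q_new) / (p_old - p_new)**2
#           with rates entered as percentages (q = 100 - p)
#   Step 2: multiply by 8 -> study number per group.

def rule_of_eight(old_pct: float, new_pct: float) -> float:
    step1 = (old_pct * (100 - old_pct) + new_pct * (100 - new_pct)) \
            / (old_pct - new_pct) ** 2
    return 8 * step1

# Worked example from the slides: 40% (old) versus 50% (new target).
# Step 1 = (40*60 + 50*50) / 100 = 49; 8 * 49 = 392, i.e. roughly 400.
print(rule_of_eight(40, 50))  # -> 392.0 per group
```

The factor 8 is the usual normal-approximation constant (1.96 + 0.84)² ≈ 7.85, rounded up, for 80% power at two-sided α = 0.05.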
Slide 36: Power matters - counts!
- Scotland: 290 drugs-related deaths in 1999; EU target of 20% reduction by 2004 - rationale?
- Nearly sufficient power to assess whether the target of 230 drugs deaths is met in 2004 - simple sum
- Step 1: (old count + target count) ÷ (count difference × count difference)
- Step 2: Multiply the Step 1 answer (0.14) by 8 to get the number of years of study for old. Same number of years for new.
- (A sketch of the counts version follows.)
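The same heuristic in code, for counts. A minimal sketch, assuming (as the slide's sum implies) that yearly death counts behave roughly like Poisson counts, so variance ≈ mean:

```python
# Counts version of the rule of eight (slide 36):
#   Step 1: (old_count + target_count) / (old_count - target_count)**2
#   Step 2: multiply by 8 -> years of observation per period.

def rule_of_eight_counts(old: float, target: float) -> float:
    return 8 * (old + target) / (old - target) ** 2

# Slide example: 290 drugs-related deaths versus a target of 230.
# Step 1 = (290 + 230) / 60**2 = 520 / 3600 ~ 0.14; 8 * 0.14 ~ 1.2 years,
# hence "nearly sufficient power" from one year's count in each period.
print(round(rule_of_eight_counts(290, 230), 2))  # -> 1.16
```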
Slide 37: Scotland - does the hazard of drugs-related death depend on gender? Synthesis of published data
Slide 38: Ministerial response?
- Confidential Inquiry into Scotland's Drugs-related Deaths in 2003 (£200K, funded in November 2003)
- versus
- RCT of naloxone, heroin antidote
Slide 39: Ministers, mind your Ps & Qs!
- Five PQs for every CJ initiative
Slide 40: Five PQs for every CJ initiative
- PQ1: Minister, why no randomised controls?
- PQ2: Minister, why have judges not even been asked to document the offender's alternative sentence that this CJ initiative supplants?
- PQ3: What statistical power does the Ministerial pilot have re well-reasoned targets & performance indicators? Or kite flying . . .
- PQ4: Minister, cost-effectiveness is driven by longer-term health & CJ harms; how are these ascertained?
- PQ5: Minister, any ethical/consent issues?
Slide 42: Uncertainty, Reporting and Performance Monitoring
- David Spiegelhalter
- MRC Biostatistics Unit, Cambridge
- david.spiegelhalter@mrc-bsu.cam.ac.uk
- POST, March 2004
Slide 43: Some relevant experience
- Expert witness in GMC case against Bristol heart surgeons
- Led the statistical team in the Bristol Royal Infirmary Inquiry
- Contributor to Shipman Inquiry
- Part-time post with Commission for Health Improvement (monitoring and surveillance)
Slide 44: Outline
- Allowing for uncertainty when monitoring performance
- Can we fairly create league tables?
- How to rank if you must
- Avoiding ranking - funnel plots
- Conclusions
Slide 45: Primary schools ranked by value-added measure
- Class size around 30 means there is great uncertainty about the true underlying pass-rate
- BBC website: "Note the resulting rankings need taking with a pinch of salt. Official statisticians say the significance that can be attached to different scores depends on various factors, including the numbers of children involved."
Slide 46: Quantifying uncertainty
- If a hospital reports a 20% mortality rate, what might the true underlying rate be?
- First, what do we mean by the "true underlying rate"?
- The chance that the next patient will die
- The long-run rate
Slide 47: Quantifying uncertainty
If a centre reports a 20% mortality rate, what might the true underlying rate be? This depends on the sample size, as the sketch below illustrates.
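A minimal Python sketch of this dependence, using the normal-approximation 95% interval for a binomial proportion; the sample sizes are our own choices for illustration.

```python
# A reported 20% mortality rate: how far might the true rate plausibly be?
# Normal-approximation 95% intervals at several sample sizes.
from math import sqrt

p = 0.20
for n in (10, 50, 200, 1000):
    se = sqrt(p * (1 - p) / n)
    print(f"n={n:4d}: 95% interval {p - 1.96*se:.1%} to {p + 1.96*se:.1%}")
# n=10 spans roughly -5% to 45% (the approximation leaks below 0% at small
# n; an exact binomial interval would not), while n=1000 pins the rate
# down to about 17.5%-22.5%.
```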
Slide 48: Indicators with 95% confidence intervals - CABG survival rates for New York surgeons
Slide 49: Cardiac surgery in New York State hospitals - risk-adjusted mortality rates and 95% intervals
Slide 51: How fair are league tables, or how to rank if you must
- Someone's got to be bottom
- If we allow for uncertainty, how reasonable is it to rank institutions?
- We can check this by placing a confidence interval around the rank (sketched in code below)
- This is new statistical methodology, but not difficult to implement
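A minimal sketch of the idea, bootstrapping ranks for five hypothetical hospitals; the death counts are invented, and a real application would add risk adjustment:

```python
# Confidence intervals around ranks: simulate each hospital's rate from
# its observed data, re-rank on every draw, read off rank percentiles.
import random

random.seed(1)
deaths = [18, 22, 25, 30, 33]      # observed deaths (invented)
n = 100                            # patients per hospital

def ranks(rates):
    order = sorted(range(len(rates)), key=lambda i: rates[i])
    r = [0] * len(rates)
    for pos, i in enumerate(order):
        r[i] = pos + 1             # rank 1 = lowest mortality
    return r

boot = [[] for _ in deaths]
for _ in range(2000):
    sim = [sum(random.random() < d / n for _ in range(n)) / n for d in deaths]
    for i, rk in enumerate(ranks(sim)):
        boot[i].append(rk)

for i, rks in enumerate(boot):
    rks.sort()
    lo, hi = rks[int(0.025 * len(rks))], rks[int(0.975 * len(rks))]
    print(f"hospital {i+1}: point rank {i+1}, 95% interval {lo} to {hi}")
# Most intervals span several ranks: the league-table ordering is far
# less certain than a bare list suggests.
```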
Slide 52: Ranks for 51 hospitals with 95% intervals - mortality after fractured hip
Slide 53: Ranks for 51 hospitals with 95% intervals - mortality after fractured hip
- Cannot be 95% confident about any hospital being in the top quarter or bottom quarter
- Shows any attempt at detailed ranking is futile and misleading
Slide 54: Allowing for uncertainty in a league table - Dr Foster in the Sunday Times
Slide 55: What about star ratings?
- This is a form of ranking
- By allowing for chance variability in the indicators that contribute to the star rating, we could quantify how confident we are in the rating
- e.g. 90% chance of 3 stars, 10% chance of 2 (simulated in the sketch below)
Slide 56: But could we avoid ranking altogether?
- Difficult, as the media will do it anyway
- But could set a good example and educate in the dangers of spurious ranking
- A funnel plot is one way of comparing performance without ranking (control limits sketched below)
- Plots indicator against sample size
- Alert and alarm limits set
- Form of control chart
- (NHS Modernisation Agency)
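A minimal sketch of the funnel's control limits for a proportion; the overall rate and the example units are invented, and the 95%/99.8% limits (z = 1.96 and 3.09) are the conventional alert and alarm lines:

```python
# Funnel-plot limits: for a binomial indicator with overall rate p0, the
# approximate limits at sample size n are p0 +/- z * sqrt(p0*(1-p0)/n).
from math import sqrt

p0 = 0.10                      # assumed overall mortality rate
units = [(50, 0.16), (200, 0.16), (800, 0.16)]   # (n, observed rate)

for n, rate in units:
    se = sqrt(p0 * (1 - p0) / n)
    if abs(rate - p0) > 3.09 * se:
        flag = "ALARM (beyond 99.8% limits)"
    elif abs(rate - p0) > 1.96 * se:
        flag = "alert (beyond 95% limits)"
    else:
        flag = "in control"
    print(f"n={n:3d}: rate {rate:.0%} -> {flag}")
# The same observed 16% is unremarkable at n=50 but alarming at n=800:
# the funnel narrows as sample size grows, without ranking anyone.
```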
Slide 57: Funnel plot - an alternative to the league table
Slide 58: Teenage pregnancies
- Government aim to reduce teenage pregnancies (ages 13-15)
- Target reduction is 15% between 1998 and 2004
- Hope for a 7.5% reduction by 2001
Slide 60: New York cardiac surgeons
Slide 61: Bristol as an outlier, and a volume effect
Slide 62: Conclusions
- Performance monitoring should acknowledge uncertainty due to limited sample sizes
- Ranking is particularly sensitive to uncertainty
- Analysis and presentation can avoid league tables
- But must be part of a coherent performance assessment framework
Slide 63: Inspection matters
- Clive B. Fairweather CBE
- Formerly HM Chief Inspector of Prisons, Scotland
Slide 64: Inspection matters
- Qualifications for the job:
- Nil on prisons, but
- Clean driving licence
- Maths?
Slide 65: Inspection matters
Slide 66: Inspection matters
- Instructions for the job: nil
Slide 67: Inspection matters
- Minister's interest in my reports: ?
Slide 68: Inspection matters
- Initially: every prison every 4 years
- Prisons could look good on KPIs, but . . .
Slide 69: Inspection matters
- Then came Glenochil
- Prison's health-centre attendances/prescriptions (per fortnight, TWO prisoners)
- Stirling Hospital admissions: dislocated shoulder, punctured lung . . .
- Turn-around from "drugs on top of prison" to "prison on top of drugs"
Slide 70: Inspection matters
- Cornton Vale: women's suicides
- Prison's health-centre attendances/prescriptions (same pattern as Glenochil)
- Stirling Hospital admissions (fits, drugs)
Slide 71: Inspection matters - intelligently targeted
- No KPI re deaths in prison
- (in-cell television v. morbid contemplation)
- Self-harm
- Prescriptions, rMDT heroin positives, assaults
- Overcrowding, staff sickness
Slide 72: Inspection matters - intelligently targeted
- Multiple indicators: resist 1-number summary
- for every prison in every report
- (illustrated for 4 prisons only)
Slides 73-76: Multiple indicators - comparative statistics for 04/2001 to 03/2002
[Four table slides, one per prison; tables not reproduced in this transcript]
Slide 77: Protocol & Independence matter
- Andy P. Grieve
- President, Royal Statistical Society
- Pfizer Pharmaceuticals
Slide 78: Protocol matters - questions
- What is a protocol?
- Where did the idea of a protocol come from?
- Why is a protocol necessary?
Slide 79: What is a protocol?
Slide 80: A protocol
- Assumptions / rationale: choice of PI
- Objectives
- Calculations (power), consultations, piloting
- Context/case-mix, data checks
- Analysis plan, dissemination rules
- Statistical performance of proposed PI: monitoring
- Follow-up inspections
- Perverse consequences
- PM's cost-effectiveness?
- Identify the PM designer & analyst to whom queries . . .
Slide 81: Where did the idea for a protocol come from?
- Scientific method
- e.g. drug development
- Good Clinical Practice (GCP)
- Requires a pre-defined protocol
Slide 82: GCP process
- Say what you are going to do: the protocol
- Do what you said you were going to do: the conduct / analysis plan
- Verify you did what you said: study report / audit trail
Slide 83: Why is a protocol necessary?
- Trying to achieve: hypothesis → experiment → data → accept/reject
- Trying to prevent: experiment → data → hypothesis → accept
- Increases faith/trust in the scientific process
- No less true for performance monitoring
Slide 84: Independence matters
- PIs for public accountability
- Wide consensus (not only within government)
- How?
- Design & monitoring: evidence-based, not dogma-based
- Safeguards: independence
- c.f. National Statistics
Slide 85: Value-added (case-mix) matters
- Harvey Goldstein FBA
- Institute of Education, University of London
Slide 86: Value-added illustration for two schools - note different slopes and crossover
[Chart: Key Stage 2 outcome (thresholds T1, T2) plotted against Key Stage 1 intake (levels S1, S2), with regression lines for All Schools, School A and School B]
Slide 87: Value-added - true complexity
- 1. Black line: all schools' true linear regression of Key Stage 2 outcome on Key Stage 1 results.
- 2. Blue line: school B truly adds more value than the schools' average to the Key Stage 2 performance of children doing well at Key Stage 1 (S2), but less value for poorer performers at Key Stage 1 (S1).
- 3. Red regression: school A's true regression is non-linear; school A exceeds the schools' average for lower-ability pupils (S1).
- (A small regression sketch follows.)
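A minimal Python sketch of why a single value-added number misleads when slopes differ; the Key Stage scores are invented, and the least-squares fit is written out by hand to stay self-contained:

```python
# Value-added with school-specific slopes: a single "average" regression
# hides the crossover the slide describes. Pure-Python least squares.

def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b          # intercept, slope

# Invented data: school A flat-ish (helps weaker pupils), B steep.
ks1_a, ks2_a = [1, 2, 3, 4, 5], [3.0, 3.3, 3.6, 3.9, 4.2]
ks1_b, ks2_b = [1, 2, 3, 4, 5], [2.0, 2.8, 3.6, 4.4, 5.2]

for name, (a, b) in [("School A", fit_line(ks1_a, ks2_a)),
                     ("School B", fit_line(ks1_b, ks2_b))]:
    print(f"{name}: predicted KS2 at KS1=1 is {a + b*1:.1f}, "
          f"at KS1=5 is {a + b*5:.1f}")
# School A adds more value at low intake (S1), School B at high intake
# (S2) - so which school "adds more value" depends on the pupil.
```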
Slide 89: Performance Monitoring in the Public Services - Cost-effectiveness matters
- Peter C. Smith, University of York
Slide 90: Cost-effectiveness matters
- Why cost-effectiveness matters
- Benefits of performance monitoring
- Costs of performance monitoring
- Direct costs
- Indirect (inadvertent) costs
Slide 91: Goodhart's Law
- "As soon as the government attempts to regulate any particular set of financial assets, these become unreliable as indicators of economic trends."
- Or:
- "When a measure becomes a target, it ceases to be a good measure."
Slide 92: An example from children's social services
- CF/A1: Stability of placements of children looked after (BVPI 49)
- Percentage of children looked after at 31 March with three or more placements during the year
- Target is 16% or less
- Objective is to improve stability for the child, leading to better general outcomes
Slide 93: Stability of placements of children - some unintended incentives?
- Avoid looking after challenging children, especially older children (numerator)
- Encourage looking after less challenging children (denominator)
- Time placements to occur after the year end
- Redefine "placement", especially in the first year
- Discourage temporary care orders
- (For the manager) move to a less challenging locality!
- (The denominator trick is made concrete in the sketch below.)
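A minimal sketch of the denominator incentive with invented counts; nothing about any child improves, yet the indicator crosses the target:

```python
# BVPI 49: percentage of looked-after children at 31 March with three or
# more placements during the year. Target: 16% or less.

def bvpi49(three_plus_moves: int, looked_after: int) -> float:
    return 100 * three_plus_moves / looked_after

print(f"{bvpi49(20, 110):.1f}%")  # 18.2% - misses the 16% target
# Take on 20 extra low-need children who will never move placement:
print(f"{bvpi49(20, 130):.1f}%")  # 15.4% - target "met", though the same
                                  # 20 children still had 3+ placements.
```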
Slide 94: The balance sheet for performance monitoring
- A performance monitoring system should, over time, secure benefits for service users and the broader citizenry that outweigh its costs.
Slide 95: Some potential benefits of performance monitoring
- Clarifies objectives and priorities
- Helps identify beacons and basket cases
- Yardstick competition (drives forward the performance of all)
- Reduces danger of producer capture
- Public reassurance (someone is on the case)
- Contributes to democratic debate
- Enhances accountability
- Can identify more readily what works
- Reduces disparities in quality and outcome of services
Slide 96: Some potential costs of performance monitoring
- Collection costs
- Distorted priorities
- Short-termism
- Misrepresentation
- Gaming
- Alienation amongst staff
- Ossification
Slide 97: Adverse responses arise from
- Poorly designed instruments (e.g. indicators allowing no breaches, or no allowance for uncertainty)
- Inadequate statistical procedures (e.g. police league tables not allowing for difficulty of local environment)
- Poorly designed incentives (e.g. relative difficulty and immediacy of crime prevention vs detection targets)
- Lack of attention to workforce perspective (n.b. reliance for data collection on front-line staff)
Slide 98: From the RSS report - three broad categories of use
- What works? (research role)
- Well/under-performing institutions or public servants . . . (managerial role)
- Hold local and national governments to account for stewardship of public services (democratic role)
Slide 99: 1. Research role of performance monitoring
- Seeking to identify what works
- Interest is in general patterns rather than individual performance
- Numerous statistical issues in efficient design, collection and analysis of data
- Critical role of variation & experimentation.
Slide 100: 2. The managerial role of performance monitoring
[Diagram: a cycle around "the public service" - 1. Measurement, 2. Analysis, 3. Response]
Slide 101: The managerial role
- Objectives are taken as given
- Seeking to identify good & bad performers
- Critical role of targets and incentives (explicit and implicit)
- Major scope for perverse outcomes.
Slide 102: Proportion of variability in performance indicators attributable to health authorities (intra-class correlation coefficients)
- (The slide's table is not reproduced; an ICC sketch follows.)
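For readers unfamiliar with the statistic: the intra-class correlation is the share of indicator variance lying between authorities rather than within them. A minimal one-way ANOVA sketch on invented data:

```python
# Intra-class correlation via one-way ANOVA: ICC close to 1 means the
# indicator mostly separates authorities; close to 0, mostly noise.
# Invented data: 3 authorities x 4 yearly indicator values each.

groups = [[0.12, 0.14, 0.13, 0.15],   # authority A
          [0.22, 0.20, 0.23, 0.21],   # authority B
          [0.16, 0.18, 0.17, 0.17]]   # authority C

k = len(groups)                        # number of authorities
n = len(groups[0])                     # observations per authority
grand = sum(sum(g) for g in groups) / (k * n)
ms_between = n * sum((sum(g) / n - grand) ** 2 for g in groups) / (k - 1)
ms_within = sum((x - sum(g) / n) ** 2
                for g in groups for x in g) / (k * (n - 1))
icc = (ms_between - ms_within) / (ms_between + (n - 1) * ms_within)
print(f"ICC = {icc:.2f}")  # ~0.92 here: variation is mostly between authorities
```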
Slide 103: 3. Democratic model of performance monitoring
- Public release of a broad range of information
- Cannot predict how, or by whom, information will be used
- Should not focus only on current governmental objectives
- Independence is a crucial element.
Slide 106: Hugely diverging interests of different stakeholders
- Citizens as service users (service effectiveness)
- Citizens as taxpayers (value for money, effectiveness, inequalities)
- Public service professionals
- Public service managers
- Inspectors and regulators
- Researchers
- National and local governments
- Politicians
- The media
Slide 107: This diversity implies
- Interests in different aspects of performance
- Interests in different levels of detail
- Different methods of analysis
- Different methods of presentation
- Different priorities attached to timeliness, comprehensiveness, precision, etc.
Slide 108: How to avoid perverse outcomes
- Careful design and evaluation of the measurement instrument
- Careful treatment and presentation of uncertainty
- Careful adjustment for case mix
- Careful attention to ethical issues
- Rigorous audit
- Introduce other measurement instruments to complement the indicator
- Undertake inspection alongside measurement
- Ensure inspection and other incentives are proportionate
- Make the data useful and accessible to front-line staff
- Work with staff to understand and endorse the chosen instrument.
Slide 109: In short
- Independence
- Protocols
- Evaluation
- Education
Slide 110: Specific Recommendations
- Royal Statistical Society
- Working Party on Performance Monitoring in the Public Services
Slide 111: Royal Statistical Society - 11 Recommendations
- 1. PM procedures need a detailed protocol
- 2. Must have clearly specified objectives and achieve them with rigour; input to PM from the institutions being monitored
- 3. Designed so that counter-productive behaviour is discouraged
- 4. Cost-effectiveness given wider consideration in design; PM's benefits should outweigh the burden of collecting quality-assured data
- 5. Independent scrutiny as a safeguard of public accountability, of methodological rigour, and of those being monitored
Slide 112: Royal Statistical Society - 11 Recommendations
- 6. Major sources of variation - due to case-mix, for example - must be recognised in design, target setting & analysis
- 7. Report measures of uncertainty - always
- 8. Research Councils to investigate a range of aspects of PM, including strategies other than name & shame
- 9. Research into robust methods for evaluating new government policies, including the role of randomised trials . . . In particular, efficient designs for roll-out of new initiatives
Slide 113: Royal Statistical Society - 11 Recommendations
- 10. Ethical considerations may be involved in all aspects of PM procedures, and must be properly addressed
- 11. Wide-ranging educational effort is required about the role and interpretation of PM data
- Scotland's Airborne score-card: 11/11 . . . All wrong!
Slide 115: Statisticians' role in PM - both
- Strenuously to safeguard from misconceived reactions to uncertainty those who are monitored
- To design an effective PM protocol so that data are properly collected, exceptional performance can be recognised, and reasons further investigated: efficient, informative random sampling for inspections
Slide 116: How not to set targets (p12)
- Don't ignore uncertainty: a 75% success-rate target and your class of 30 pupils (simulated below)
- Progressive sharpening: better of current target and current performance
- Cascading: the same target . . .
- Setting extreme targets: no-one to wait 4 hours
- Don't ignore known-about variation, etc.
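A closing sketch of the first point: even a class performing exactly at a 75% target "fails" about half the time through sampling variation alone. The class size and target come from the slide; the simulation is ours.

```python
# With 30 pupils and a TRUE success rate of exactly 75%, how often does
# the observed pass rate fall below a 75% target?
import random

random.seed(3)
trials, short = 20000, 0
for _ in range(trials):
    passes = sum(random.random() < 0.75 for _ in range(30))
    if passes / 30 < 0.75:
        short += 1
print(f"{short / trials:.0%} of classes miss the target by chance alone")
# Roughly half of on-target classes "fail": a target that ignores
# uncertainty punishes noise, not performance.
```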