Transcript and Presenter's Notes

Title: Performance Monitoring in the Public Services


1
Performance Monitoring in the Public Services
2
Royal Statistical Society
Performance Indicators: Good, Bad, and Ugly
  • Some good examples, but
  • Scientific standards, in particular statistical
    standards, have been largely ignored

3
Royal Statistical Society's concern
  • PM schemes should be:
  • Well-designed, avoiding perverse behaviours
  • Sufficiently analysed (context/case-mix)
  • Fairly reported (measures of uncertainty)
  • Shielded from political interference
  • Seen seriously to address criticisms/concerns of
    those being monitored

4
1. Introduction
  • 1990s rise in government by measurement
  • goad to efficiency & effectiveness
  • better public accountability
  • (financial)

5
Three uses of PM data
  • What works? (research role)
  • Well/under-performing institutions or public
    servants . . . (managerial role)
  • Hold Ministers to account for stewardship of
    public services (democratic role)

6
2. PM Design, Target Setting & Protocol
  • How to set targets
  • Step 1: Reasoned assessment of plausible
    improvement within PM time-scale
  • Step 2: Work out PM scheme's statistical potential
    (power) re this rational target (see p11)

7
Power matters
  • Excess power incurs unnecessary cost
  • Insufficient power risks failing to identify
    effects that matter
  • Insufficient power: can't trust claims of policy
    equivalence
  • How not to set targets: see p12

8
3. Analysis of PM data: same principles
  • Importance of variability
  • intrinsic part of real world, interesting per se;
    contributes to uncertainty in primary
    conclusions (p15)
  • Adjusting for context to achieve comparability
  • note (p17) incompleteness of any
    adjustment
  • Multiple indicators
  • resist 1-number summary
  • (avoid value judgements; reveal intrinsic
    variation)

9
4. Presentation of PIs: same principles
  • Simplicity need not mean discarding uncertainty
  • League tables → uncertainty of ranking (PLOT 1)
  • Star banding → show uncertainty of
    institution's banding
  • Funnel plot: variability depends on sample size;
    divergent hospitals stand out (see PLOT 2)

10
Plot 1: 95% intervals for ranks
11
5. Impact of PM on the public services
  • Public cost if PM fails to identify
    under-performing institutions so no remedial
    action is taken
  • Less well recognised:
  • Institutional cost of being falsely labelled as
    under-performing
  • Unintended consequences, e.g. risk-averse
    surgeons

12
6. Evaluating PM initiatives
  • Commensurate with risks & costs
  • How soon to start (Europe's BSE testing)
  • Pre-determined policy roll-out (rMDT)
  • Disentangling (several) policy effects
  • Role of experiments (randomisation)

13
Missed opportunities for experiments
(including randomisation)
  • rMDT of prisoners
  • Drug Treatment Testing Orders
  • Cost-effectiveness matters!

14
What works in UK criminal justice?
  • RCTs essentially untried . . .

15
Judges prescribe sentence on lesser evidence than
doctors prescribe medicines
  • Is the public aware?

16
7. Integrity, confidentiality & ethics
  • Integrity (statistical)
  • For public accountability, PIs need
    wider-than-government consensus & safeguards, as
    for National Statistics.
  • Lacking if: irrational targets, insufficient
    power, cost-inefficient, analysis lacks
    objectivity or is superficial.

17
(No Transcript)
18
Confidentiality & ethics
  • PM accesses data about people: 3rd party
    approval?
  • Public naming of individuals: their informed
    consent & legal requirements re confidentiality,
    as for National Statistics?
  • Unwarranted cost to reputations (superficial
    analysis, missing uncertainty measures) → legal &
    human rights liabilities?

19
Royal Statistical Society is calling for
  • PM protocols
  • Independent scrutiny
  • Reporting of measures of uncertainty
  • Research into strategies other than name &
    shame; better designs for evaluating policy
    initiatives
  • Wider consideration of PM ethics &
    cost-effectiveness

20
Royal Statistical Society is appealing to
journalists
  • When league tables or star ratings are
    published,
  • insist on access to (& reporting of) the
    measure of uncertainty that qualifies each
    ranking or rating.
  • Without this qualifier, no-one can separate the
    chaff from the wheat, the good from the bad . . .

21
Experiments & Power matter
  • Sheila M. Bird, chair
  • RSS Working Party on Performance Monitoring in
    the Public Services

22
Evaluations-charade: Public money spent on
inferior (usually non-randomised) study designs
that result in poor-quality evidence about how
well policies actually work
  • → costly, inefficient; by denying scientific
    method, a serious loss in public accountability

23
Experimentation as beacon of light on policy
24
Missed opportunities for experiments
(including randomisation)
  • rMDT of prisoners
  • Drug Treatment Testing Orders (DTTOs)
  • Cost-effectiveness matters!

25
Court Portcullis: DTTO-eligible offenders; do
DTTOs work? (allocation sketched below)
  • Off 1: DTTO
  • Off 2: DTTO
  • Off 3: alternative
  • Off 4: DTTO
  • Off 5: alternative
  • Off 6: alternative
  • Database linkage to find out about major harms:
    offenders' deaths, re-incarcerations .
    . .

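The allocation above is, in effect, a randomised experiment. A minimal Python sketch of a balanced random allocation of DTTO-eligible offenders (the labels and seed are illustrative assumptions, not from the deck):

```python
import random

# Hypothetical DTTO-eligible offenders appearing at one court.
offenders = [f"Off{i}" for i in range(1, 7)]

random.seed(42)  # fixed seed only so the illustration is reproducible
random.shuffle(offenders)

# Balanced randomisation: half to DTTO, half to the alternative sentence,
# so the two groups differ only by chance.
half = len(offenders) // 2
allocation = {off: ("DTTO" if i < half else "alternative")
              for i, off in enumerate(offenders)}

for off in sorted(allocation):
    print(off, "->", allocation[off])
```

Database linkage, as the slide says, then supplies the outcome data (deaths, re-incarcerations) for each arm.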
26
Court Portcullis: DTTO-eligible offenders;
cost-effectiveness?
  • Off 7: DTTO
  • Off 8: alternative
  • Off 9: alternative
  • Off10: DTTO
  • Off11: DTTO
  • Off12: alternative
  • Off13: DTTO
  • Off14: alternative
  • Breaches . . . & drugs spend?

27
Court Portcullis: DTTO-eligible offenders; UK
plc → guess?
  • Off 7: DTTO ?
  • Off 8: DTTO ?
  • Off 9: DTTO ?
  • Off10: DTTO ?
  • Off11: DTTO ?
  • Off12: DTTO ?
  • Off13: DTTO ?
  • Off14: DTTO ?
  • (before/after) Interviews versus . . .
    ?

28
Similar: other offences . . .
  • Random Mandatory Drugs Testing of Prisoners
    (Cons.): punishment as deterrent/exacerbation of
    heroin use in jail
  • Intensive Control and Change Programme (Lab.):
    employment & re-offending rate
  • Scotland's Airborne Initiative (Lib/Lab):
    relative cost-effectiveness
  • On-charge Drugs Testing (Lab.): impact on a)
    arrest referral, b) treatment uptake, c) court
    sentence
  • etc.

29
Evaluations-charade
  • Failure to randomise
  • Failure to find out about major harms
  • Failure even to elicit alternative sentence →
    funded guesswork on relative cost-effectiveness
  • Volunteer-bias in follow-up interviews
  • Inadequate study size re major outcomes . . .

30
Power (study size) matters!
  • Back-of-envelope sum for 80% power
  • Percentages
  • Counts
  • If MPs don't know,
  • UK plc keeps hurting

31
Percentages!
  • Prison Service, E&W: 60,000 rMDTs per annum →
    excess statistical power to identify if
    prisoners' heroin +ve rate decreased from 5% to
    4.5% between years (simple sum)
  • Scottish Prison Service: 6,000 rMDTs per annum →
    only 50% power to identify if heroin +ve rate
    decreased from 15% to 13.5%.
  • BUT, comfortable power per 2-year period

32
Back-of-envelope sum for 80% power? (coded below)
  • Step 1:
    [success rate x failure rate (old)] + [success rate x failure rate (new target)]
    ---------------------------------------------------------------------------------
    (difference in success rates) x (difference in success rates)
  • Step 2:
  • Multiply Step 1 answer by 8 to get study number
    for old. Study same number for new.

33
To compare completion rates of 40% (old) versus
50% (new target)?
  • Step 1:
    [success rate x failure rate (old)] + [success rate x failure rate (new target)]
    ---------------------------------------------------------------------------------
    (difference in success rates) x (difference in success rates)
    =
    (40 x 60) + (50 x 50)
    ---------------------
    (50 - 40) x (50 - 40)

34
To compare CJ completion rates of 40% (old)
versus 50% (new target)?
  • Step 1:
    (40 x 60) + (50 x 50)   2400 + 2500   4900
    --------------------- = ----------- = ---- = 49
          10 x 10               100       100
  • Step 2 (for 80% power):
  • Multiply Step 1 answer (49) by 8 to get study
    number for old: about 400. Same number for new.

35
To compare CJ completion rates of 40% (old)
versus 50% (new target)?
  • For comfortable 80% power:
  • Completion rates to be compared between
  • 400 (old disposal) versus 400 (new initiative)
  • Minimum 50% power requires half that!
  • Completion rates to be compared between
  • 200 (old) versus 200 (new)

36
Power matters: counts!
  • Scotland: 290 drugs-related deaths in 1999; EU
    target of 20% reduction by 2004; rationale?
  • → Nearly sufficient power to assess if target of
    230 drugs deaths is met in 2004
  • simple sum (coded below)
  • Step 1:
    old count + target count
    -----------------------------------------
    count difference x count difference
  • Step 2:
  • Multiply Step 1 answer (0.14) by 8 to get study
    number of years for old. Same number of years
    for new.

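The same rule for counts, as a sketch (the function name is mine):

```python
def years_for_80_power(old_count: float, target_count: float) -> float:
    """Back-of-envelope years of data needed, for each of the 'old' and
    'new' periods, for ~80% power to detect a change in an annual count.

    Step 1: (old_count + target_count) / (old_count - target_count)**2
    Step 2: multiply by 8.
    """
    step1 = (old_count + target_count) / (old_count - target_count) ** 2
    return 8 * step1

# Scotland: 290 drugs-related deaths in 1999 vs a 2004 target of 230
print(years_for_80_power(290, 230))  # ~1.2 years each of 'old' and 'new' data
```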
37
Scotland: hazard of drugs-related death depends
on gender? (synthesis of published data)
38
Ministerial response?
  • Confidential Inquiry into Scotland's
    Drugs-related Deaths in 2003
  • (£200K, funded in November 2003)
  • versus
  • RCT of Naloxone, heroin antidote

39
Ministers, mind your Ps & Qs!
  • Five PQs for
  • every CJ initiative

40
Five PQs for every CJ initiative
  • PQ1: Minister, why no randomised controls?
  • PQ2: Minister, why have judges not even been
    asked to document offenders' alternative sentence
    that this CJ initiative supplants?
  • PQ3: What statistical power does Ministerial
    pilot have re well-reasoned targets & performance
    indicators? Or kite-flying . . .
  • PQ4: Minister, cost-effectiveness is driven by
    longer-term health & CJ harms; how are these
    ascertained?
  • PQ5: Minister, any ethical/consent issues?

41
(No Transcript)
42
Uncertainty, Reporting and Performance Monitoring
  • David Spiegelhalter
  • MRC Biostatistics Unit, Cambridge
  • david.spiegelhalter@mrc-bsu.cam.ac.uk
  • POST, March 2004

43
Some relevant experience
  • Expert witness in GMC case against Bristol heart
    surgeons
  • Led statistical team in Bristol Royal
    Infirmary Inquiry
  • Contributor to Shipman Inquiry
  • Part-time post with Commission for Health
    Improvement (monitoring and surveillance)

44
Outline
  • Allowing for uncertainty when monitoring
    performance
  • Can we fairly create league tables?
  • How to rank if you must
  • Avoiding ranking - funnel plots
  • Conclusions

45
Primary schools ranked by value-added measure
Class size around 30 means there is great
uncertainty about true underlying pass-rate
BBC Website: "Note the resulting rankings need
taking with a pinch of salt. Official
statisticians say the significance that can be
attached to different scores depends on various
factors, including the numbers of children
involved."
46
Quantifying uncertainty
  • If a hospital reports a 20% mortality rate, what
    might the true underlying rate be?
  • First, what do we mean by the 'true underlying
    rate'?
  • The chance that the next patient will die
  • The long-run rate

47
Quantifying uncertainty
If a centre reports a 20% mortality rate, what
might the true underlying rate be? This
depends on the sample size (sketch below).
48
Indicators with 95% confidence intervals: CABG
survival rates for New York surgeons
49
Cardiac surgery in New York State hospitals:
risk-adjusted mortality rates and 95% intervals
50
(No Transcript)
51
How fair are league tables, or how to rank if
you must
  • Someone's got to be bottom
  • If we allow for uncertainty, how reasonable is it
    to rank institutions?
  • We can check this by placing a confidence
    interval around the rank
  • This is new statistical methodology, but not
    difficult to implement

52
Ranks for 51 hospitals with 95%
intervals: mortality after fractured hip
53
Ranks for 51 hospitals with 95%
intervals: mortality after fractured hip
  • Cannot be 95% confident about any hospital being
    in top quarter or bottom quarter
  • Shows any attempt at detailed ranking is futile
    and misleading (simulation sketch below)

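The rank intervals behind these two slides can be mimicked with a small parametric bootstrap. A sketch on simulated hospitals (not the fractured-hip data), where every apparent difference is pure chance:

```python
import random

random.seed(1)

# Simulated hospitals: same true mortality rate everywhere, so every
# apparent ranking difference below is pure chance.
n_hosp, cases, true_rate = 20, 100, 0.20
deaths = [sum(random.random() < true_rate for _ in range(cases))
          for _ in range(n_hosp)]

def ranks(rates):
    """Rank 1 = lowest rate; ties broken arbitrarily (fine for a sketch)."""
    order = sorted(range(len(rates)), key=lambda i: rates[i])
    r = [0] * len(rates)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

# Parametric bootstrap: resample each hospital's deaths, re-rank, repeat.
boot = [[] for _ in range(n_hosp)]
for _ in range(1000):
    sim_rates = [sum(random.random() < d / cases for _ in range(cases)) / cases
                 for d in deaths]
    for i, rk in enumerate(ranks(sim_rates)):
        boot[i].append(rk)

for i in range(n_hosp):
    rs = sorted(boot[i])
    lo, hi = rs[25], rs[974]  # central ~95% of 1000 bootstrap ranks
    print(f"hospital {i + 1:2d}: rate {deaths[i] / cases:4.0%}, "
          f"95% rank interval {lo:2d}-{hi:2d}")
```

Typically every hospital's 95% rank interval spans much of the table, which is the slide's point: detailed ranking is futile when the differences are chance.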
54
Allowing for uncertainty in a league table: Dr
Foster in the Sunday Times
55
What about star ratings?
  • This is a form of ranking
  • By allowing for chance variability in indicators
    that contribute to the star rating, we could
    quantify how confident we are in the rating
  • e.g. 90% chance of 3 stars, 10% chance of 2
    (sketch below)

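A sketch of that quantification by simulation; the star cut-points and the institution's figures are invented for illustration:

```python
import random

random.seed(7)

# Hypothetical institution: 184 successes out of 200 on one indicator.
successes, cases = 184, 200
rate = successes / cases

def band(r: float) -> int:
    """Hypothetical star bands on the success rate."""
    if r >= 0.90:
        return 3
    if r >= 0.80:
        return 2
    return 1

# Simulate chance variability in the observed rate around the estimate.
reps = 10_000
counts = {1: 0, 2: 0, 3: 0}
for _ in range(reps):
    sim = sum(random.random() < rate for _ in range(cases)) / cases
    counts[band(sim)] += 1

for stars in (3, 2, 1):
    print(f"{stars} stars: {counts[stars] / reps:.0%} of simulations")
```

With these numbers, roughly nine in ten simulations land in the 3-star band and one in ten in the 2-star band, matching the flavour of the slide's example.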
56
But could we avoid ranking altogether?
  • Difficult, as the media will do it anyway
  • But could set good example and educate in dangers
    of spurious ranking
  • A funnel plot is one way of comparing
    performance without ranking
  • Plots indicator against sample size
  • Alert and alarm limits set
  • Form of control chart
  • (NHS Modernisation Agency)

57
Funnel plot: an alternative to the league table (sketch below)
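A minimal matplotlib sketch of such a funnel plot (the data are simulated; the 95% and 99.8% limits are conventional alert/alarm choices, in the spirit of the NHS usage the previous slide mentions):

```python
import math
import random
import matplotlib.pyplot as plt

random.seed(3)

# Simulated units: varying sample sizes, common true rate of 20%,
# so all scatter around the target is chance.
target = 0.20
sizes = [random.randint(20, 500) for _ in range(40)]
rates = [sum(random.random() < target for _ in range(n)) / n for n in sizes]

# Funnel limits: target +/- z * sqrt(target * (1 - target) / n)
xs = list(range(10, 510, 5))
def limits(z):
    half = [z * math.sqrt(target * (1 - target) / n) for n in xs]
    return [target - h for h in half], [target + h for h in half]

lo95, hi95 = limits(1.96)    # ~95% "alert" limits
lo998, hi998 = limits(3.09)  # ~99.8% "alarm" limits

plt.scatter(sizes, rates, s=15)
plt.plot(xs, lo95, "b--", xs, hi95, "b--")
plt.plot(xs, lo998, "r-", xs, hi998, "r-")
plt.axhline(target, color="k", linewidth=1)
plt.xlabel("sample size")
plt.ylabel("indicator (proportion)")
plt.title("Funnel plot: no ranking, divergent units stand out")
plt.show()
```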
58
Teenage pregnancies
  • Government aim to reduce teenage pregnancies
    (ages 13-15)
  • Target reduction is 15% between 1998 and 2004
  • Hope for 7.5% reduction by 2001

59
(No Transcript)
60
New York Cardiac Surgeons
61
Bristol as an outlier and a volume effect
62
Conclusions
  • Performance monitoring should acknowledge
    uncertainty due to limited sample sizes
  • Ranking is particularly sensitive to uncertainty
  • Analysis and presentation can avoid league
    tables
  • But must be part of a coherent performance
    assessment framework

63
Inspection matters
  • Clive B. Fairweather CBE
  • formerly, HM Chief Inspector of Prisons, Scotland

64
Inspection matters
  • Qualifications for the job
  • Nil on prisons, but
  • Clean driving licence
  • Maths ?

65
Inspection matters
  • Training for the job
  • nil

66
Inspection matters
  • Instructions for the job
  • nil

67
Inspection matters
  • Ministers' interest in my reports
  • ?

68
Inspection matters
  • Initially
  • Every prison every 4 years
  • Prisons could look good on KPIs, but . . .

69
Inspection matters
  • Then came Glenochil
  • Prison's health-centre attendances/prescriptions
  • (per fortnight, TWO prisoners)
  • Stirling Hospital admissions:
  • dislocated shoulder, punctured lung . . .
  • Turn-around from "drugs on top of prison" to
  • "prison on top of drugs"

70
Inspection matters
  • Cornton Vale: women's suicides
  • Prison's health-centre attendances/prescriptions
  • (same pattern as Glenochil)
  • Stirling Hospital admissions
  • (fits & drugs)

71
Inspection matters: intelligently targeted
  • No KPI re deaths in prison
  • (in-cell television v. morbid contemplation)
  • Self-harm
  • Prescriptions; rMDT heroin positives; Assaults
  • Overcrowding & staff sickness

72
Inspection matters: intelligently targeted
  • Multiple indicators: resist 1-number summary
  • for every prison in every report
  • (illustrated for 4 prisons only)

73
Multiple Indicators: comparative statistics for
04/2001 to 03/2002
74
Multiple Indicators: comparative statistics for
04/2001 to 03/2002
75
Multiple Indicators: comparative statistics for
04/2001 to 03/2002
76
Multiple Indicators: comparative statistics for
04/2001 to 03/2002
77
Protocol & Independence matter
  • Andy P. Grieve
  • President, Royal Statistical Society
  • Pfizer Pharmaceuticals

78
Protocol Matters / Questions
  • What is a protocol ?
  • Where did the idea of a protocol come from ?
  • Why is a protocol necessary ?

79
What is a Protocol ?
  • PLAN or
  • BLUEPRINT

80
A Protocol
  • Assumptions/Rationale & Choice of PI
  • Objectives
  • Calculations (power), consultations & piloting
  • Context/case-mix & data checks
  • Analysis plan & dissemination rules
  • Statistical performance of proposed PI; monitoring
    & follow-up inspections
  • Perverse consequences
  • PM's cost-effectiveness?
  • Identify PM designer & analyst to whom queries .
    . .

81
Where Did the Idea for a Protocol come from ?
  • Scientific Method
  • E.g. drug development
  • Good Clinical Practice (GCP)
  • Requires a pre-defined protocol

82
GCP Process
  • Say what you are going to do → the protocol
  • Do what you said you were going to do → the
    conduct / analysis plan
  • Verify you did what you said → the study report /
    audit trail

83
Why is a Protocol Necessary ?
  • Trying to achieve:
  • Hypothesis → experiment → data →
    accept/reject
  • Trying to prevent:
  • Experiment → data → hypothesis → accept
  • Increase the faith/trust in the scientific
    process
  • No less true for performance monitoring

84
Independence Matters
  • PIs for Public Accountability
  • Wide consensus (not only within government)
  • How ?
  • Design & monitoring
  • Evidence-based not dogma-based
  • Safeguards
  • Independence
  • c.f. National Statistics

85
Value-added (case-mix) matters
  • Harvey Goldstein FBA
  • Institute of Education,
  • University of London

86
Value added illustration for two schools
Note different slopes and crossover
(Figure: Key Stage 2 outcome plotted against Key
Stage 1 score; the all-schools line and the lines
for School A and School B cross; S1, S2 mark Key
Stage 1 scores and T1, T2 the corresponding Key
Stage 2 outcomes.)
87
Value-added → true complexity
  • 1. Black line: all schools' true linear
    regression of Key Stage 2 outcome on Key Stage 1
    results.
  • 2. Blue line: school B truly adds more value
    than schools' average to Key Stage 2 performance
    of well-performing children at Key Stage 1 (S2)
    but less value for poorer performers at Key Stage
    1 (S1).
  • 3. Red regression: school A's true regression
    is non-linear; school A exceeds schools' average
    for lower-ability pupils (S1). (Sketch below.)

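A small numerical sketch of the crossover described above, fitting straight lines for simplicity even though the slide notes school A's true regression is non-linear (all data invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented Key Stage 1 scores and Key Stage 2 outcomes.
ks1_all = rng.uniform(0, 100, 200)
ks2_all = 10 + 0.8 * ks1_all + rng.normal(0, 5, 200)  # all schools

ks1_ab = rng.uniform(0, 100, 60)
ks2_a = 35 + 0.5 * ks1_ab + rng.normal(0, 5, 60)  # school A: flatter line,
                                                  # above average at low KS1
ks2_b = -5 + 1.0 * ks1_ab + rng.normal(0, 5, 60)  # school B: steeper line,
                                                  # above average at high KS1

for label, x, y in [("all schools", ks1_all, ks2_all),
                    ("school A", ks1_ab, ks2_a),
                    ("school B", ks1_ab, ks2_b)]:
    slope, intercept = np.polyfit(x, y, 1)
    print(f"{label}: KS2 = {intercept:5.1f} + {slope:.2f} * KS1")

# The fitted lines cross: which school "adds more value" depends on a
# pupil's Key Stage 1 starting point, so one value-added number per
# school hides the interaction.
```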
88
(No Transcript)
89
Performance Monitoring in the Public Services
Cost-effectiveness matters
Peter C. Smith, University of York
90
Cost-effectiveness matters
  • Why cost-effectiveness matters
  • Benefits of performance monitoring
  • Costs of performance monitoring
  • Direct costs
  • Indirect (inadvertent) costs

91
Goodhart's Law
  • "As soon as the government attempts to regulate
    any particular set of financial assets, these
    become unreliable as indicators of economic
    trends."
  • Or:
  • "When a measure becomes a target, it ceases to be
    a good measure."

92
An example from children's social services
  • CF/A1: Stability of placements of children looked
    after (BVPI 49)
  • Percentage of children looked after at 31 March
    with three or more placements during the year
    (computation sketched below)
  • Target is 16% or less.
  • Objective is to improve stability for the child,
    leading to better general outcomes.

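The indicator itself is a simple proportion. A sketch of computing BVPI 49 from placement counts (the children and counts are invented):

```python
# Hypothetical children looked after at 31 March, with the number of
# placements each had during the year.
placements = {"child_a": 1, "child_b": 3, "child_c": 2,
              "child_d": 4, "child_e": 1, "child_f": 2}

looked_after = len(placements)                            # denominator
three_or_more = sum(n >= 3 for n in placements.values())  # numerator

indicator = 100 * three_or_more / looked_after
print(f"BVPI 49: {indicator:.1f}% (target: 16% or less)")
```

The next slide's perverse incentives map directly onto this numerator and denominator: shedding challenging children shrinks the numerator, while recruiting less challenging ones inflates the denominator.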
93
Stability of placements of children: some
unintended incentives?
  • Avoid looking after challenging children,
    especially older children (numerator)
  • Encourage looking after less challenging children
    (denominator)
  • Time placements to occur after the year end
  • Redefine "placement", especially in first year
  • Discourage temporary care orders
  • (For the manager) move to a less challenging
    locality!

94
The balance sheet for performance monitoring
  • A performance monitoring system should, over
    time, secure benefits for service users and the
    broader citizenry that outweigh its costs.

95
Some potential benefits of performance monitoring
  • Clarifies objectives and priorities
  • Helps identify beacons and basket cases
  • Yardstick competition
  • (drives forward performance of all)
  • Reduces danger of producer capture
  • Public reassurance (someone is on the case)
  • Contributes to democratic debate
  • Enhances accountability
  • Can identify more readily what works
  • Reduces disparities in quality and outcome of
    services

96
Some potential costs of performance monitoring
  • Collection costs
  • Distorted priorities
  • Short-termism
  • Misrepresentation
  • Gaming
  • Alienation amongst staff
  • Ossification

97
Adverse responses arise from
  • Poorly designed instruments (e.g. indicators
    allowing no breaches or no allowance for
    uncertainty)
  • Inadequate statistical procedures (e.g. police
    league tables not allowing for difficulty of
    local environment)
  • Poorly designed incentives (e.g. relative
    difficulty and immediacy of crime prevention vs
    detection targets)
  • Lack of attention to workforce perspective (n.b.
    reliance for data collection on front-line staff)

98
From the RSS reportThree broad categories of use
  • What works? (research role)
  • Well/under-performing institutions or public
    servants . . . (managerial role)
  • Hold local and national governments to account
    for stewardship of public services (democratic
    role)

99
1. Research role of performance monitoring
  • Seeking to identify what works
  • Interest is in general patterns rather than
    individual performance
  • Numerous statistical issues in efficient design,
    collection and analysis of data
  • Critical role of variation & experimentation.

100
2. The managerial role of performance monitoring
(Diagram: the public service within a cycle of
1. Measurement → 2. Analysis → 3. Response)
101
The managerial role
  • Objectives are taken as given
  • Seeking to identify good & bad performers
  • Critical role of targets and incentives (explicit
    and implicit)
  • Major scope for perverse outcomes.

102
Proportion of variability in performance
indicators attributable to health authorities
(intra-class correlation coefficients; estimator
sketched below)
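A sketch of how such intra-class correlation coefficients are estimated, via the usual one-way ANOVA estimator (the data are invented; with k observations per authority, ICC = (MSB - MSW) / (MSB + (k - 1) * MSW)):

```python
import numpy as np

rng = np.random.default_rng(5)

# Invented indicator values: 10 health authorities, 8 areas each.
# Authority effects contribute some, but not all, of the variability.
authority_effect = rng.normal(0.0, 1.0, 10)
data = np.array([eff + rng.normal(0.0, 2.0, 8) for eff in authority_effect])

k = data.shape[1]                        # observations per authority
msb = k * data.mean(axis=1).var(ddof=1)  # between-authority mean square
msw = data.var(axis=1, ddof=1).mean()    # within-authority mean square

icc = (msb - msw) / (msb + (k - 1) * msw)
print(f"intra-class correlation: {icc:.2f}")  # true value here: 1/(1+4) = 0.20
```

A low ICC means most variability lies within, not between, authorities, so authority-level league tables carry little signal.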
103
3. Democratic model of performance monitoring
  • Public release of broad range of information
  • Cannot predict how, or by whom, information will
    be used
  • Should not focus only on current governmental
    objectives
  • Independence is a crucial element.

104
(No Transcript)
105
(No Transcript)
106
Hugely diverging interests of different
stakeholders
  • Citizens as service users (service effectiveness)
  • Citizens as taxpayers (value for money,
    effectiveness, inequalities)
  • Public service professionals
  • Public service managers
  • Inspectors and regulators
  • Researchers
  • National and local governments
  • Politicians
  • The media

107
This diversity implies
  • Interests in different aspects of performance
  • Interests in different levels of detail
  • Different methods of analysis
  • Different methods of presentation
  • Different priorities attached to timeliness,
    comprehensiveness, precision, etc. etc.

108
How to avoid perverse outcomes
  • Careful design and evaluation of measurement
    instrument
  • Careful treatment and presentation of uncertainty
  • Careful adjustment for case mix
  • Careful attention to ethical issues
  • Rigorous audit
  • Introduce other measurement instruments to
    complement the indicator
  • Undertake inspection alongside measurement
  • Ensure inspection and other incentives are
    proportionate
  • Make the data useful and accessible to front line
    staff
  • Work with staff to understand and endorse the
    chosen instrument.

109
In short
  • Independence
  • Protocols
  • Evaluation
  • Education

110
Specific Recommendations
  • Royal Statistical Society
  • Working Party on Performance Monitoring in the
    Public Services

111
Royal Statistical Society: 11 Recommendations
  • 1. PM procedures need detailed protocol
  • 2. Must have clearly specified objectives,
    achieve them with rigour; input to PM from
    institutions being monitored
  • 3. Designed so that counter-productive behaviour
    is discouraged
  • 4. Cost-effectiveness given wider consideration
    in design; PM's benefits should outweigh burden
    of collecting quality-assured data
  • 5. Independent scrutiny as safeguard of public
    accountability, methodological rigour, and of
    those being monitored

112
Royal Statistical Society: 11 Recommendations
  • 6. Major sources of variation (due to case-mix,
    for example) must be recognised in design,
    target setting & analysis
  • 7. Report measures of uncertainty, always
  • 8. Research Councils to investigate range of
    aspects of PM, including strategies other than
    name & shame
  • 9. Research into robust methods for evaluating
    new government policies, including role of
    randomised trials . . . in particular, efficient
    designs for roll-out of new initiatives

113
Royal Statistical Society: 11 Recommendations
  • 10. Ethical considerations may be involved in all
    aspects of PM procedures, and must be properly
    addressed
  • 11. Wide-ranging educational effort is required
    about the role and interpretation of PM data
  • Scotland's Airborne score-card: 11/11 . . .
    All wrong!

114
(No Transcript)
115
Statisticians' role in PM: both
  • Strenuously to safeguard from misconceived
    reactions to uncertainty those who are monitored
  • Design effective PM protocol so that data are
    properly collected, exceptional performance can
    be recognised & reasons further investigated
    → efficient, informative random sampling for
    inspections

116
How not to set targets (p12)
  • Don't ignore uncertainty: 75% success-rate
    target & your class of 30 pupils (sketch below)
  • Progressive sharpening: better of current target
    & current performance
  • Cascading same target . . .
  • Setting extreme target: no-one to wait 4 hours
  • Don't ignore known-about variation, etc.
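On the first bullet: even a class whose true success rate exactly meets the 75% target will often miss it by chance in a group of 30. A minimal binomial sketch:

```python
import math

# Chance distribution of observed pass rates in a class of 30 pupils
# whose true underlying success rate exactly meets a 75% target.
n, p = 30, 0.75

def binom_pmf(k: int) -> float:
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Probability the observed rate falls below the target (fewer than
# 22.5, i.e. 22 or fewer passes out of 30).
below = sum(binom_pmf(k) for k in range(0, 23))
print(f"P(class observed below the 75% target): {below:.0%}")
# Roughly half of on-target classes "fail" through chance alone.
```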