Title: Assessing Responsiveness of Health Measurements
Slide 1: Assessing Responsiveness of Health Measurements
- Ian McDowell
- INTA, Santiago, March 20, 2001
Slide 2: Link Purpose of Measure to Validation Method
- For example:
- In a diagnostic instrument, inter-rater and test-retest reliability are important.
- For an evaluative measure, internal consistency is paramount.
- For a prognostic or diagnostic instrument, criterion validity is relevant.
- For an evaluative measure, construct validation is central.
Slide 3: Responsiveness
- For outcome measures, sensitivity to change is a crucial characteristic.
- Responsiveness refers to how sensitive a measure is in indicating change over time or contrast between groups.
- Normally considered an element of validity for an evaluative measurement.
Slide 4: Responsiveness (cont'd)
- There is little consensus over how responsiveness should be assessed.
- This may be because responsiveness requires a finer breakdown than is normally given.
- Different facets of responsiveness are relevant to different types of measure.
Slide 5: Conceptions of Responsiveness
- The smallest change that could potentially be detected
- The smallest change that could reliably be detected beyond error
- The change typically observed in a population
- The change observed in the subset of the population judged to have changed
- The change seen in those judged to have made an important change
Slide 6: Preliminary Decisions (Before We Begin!)
- What parameter is to be measured? (Pain, QoL, etc.)
- Whose perspective is important: the patient's, the clinician's, or society's?
- What if these conflict?
- Responsive to what? Differences between groups, differences within a group over time, or comparison of changes over time between two groups?
- Unit of analysis? (Average scores, or individual classification such as a diagnosis?)
Slide 7: Approaches to Estimating Responsiveness
1. Theoretical (equivalent to content validity)
2. Empirical: internal evaluation (equivalent to concurrent validity)
3. Empirical: external comparison (equivalent to criterion validity)
Slide 8: 1. Modeling Approach
- Content should reflect the types of change expected to occur with the therapy: states, not traits.
- There should be no floor or ceiling effects.
- Scoring must ensure that change is not diluted by other factors that do not vary.
- Scale must have fine enough gradation.
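As a rough illustration of the floor and ceiling check mentioned on this slide, the sketch below counts the share of respondents scoring at the scale minimum or maximum. The function name `floor_ceiling_rates`, the 0 to 100 scale range, and the scores are all hypothetical, invented only for this example; no particular cut-off for an "acceptable" rate is implied.

```python
def floor_ceiling_rates(scores, scale_min, scale_max):
    """Return the proportion of scores at the scale minimum and maximum.

    Respondents already at the maximum cannot register improvement,
    and those at the minimum cannot register deterioration.
    """
    n = len(scores)
    if n == 0:
        raise ValueError("scores must not be empty")
    at_floor = sum(1 for s in scores if s == scale_min) / n
    at_ceiling = sum(1 for s in scores if s == scale_max) / n
    return at_floor, at_ceiling


# Hypothetical baseline scores on an assumed 0-100 scale.
baseline = [0, 0, 10, 35, 50, 80, 100, 100, 100, 100]
floor_rate, ceiling_rate = floor_ceiling_rates(baseline, 0, 100)
print(f"at floor: {floor_rate:.0%}, at ceiling: {ceiling_rate:.0%}")
```

A high ceiling rate at baseline would suggest the scale has little room to show improvement in that sample.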
Slide 9: 2. Internal Empirical Approach
- Apply the scale before and after; calculate an effect size statistic.
- Because measurement scales vary, results are expressed in standard deviation units: $(M_t - M_c)/SD_c$
- The effect size is comparable to a z score: if scores are normally distributed, it indicates how many percentiles a patient will move following treatment.
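The effect size and its percentile reading can be sketched as follows. This is a minimal illustration, assuming that Mt and Mc denote the means of the treated and comparison scores and SDc the standard deviation of the comparison scores (the slide does not define the symbols), and that scores are normally distributed. The function names and the data are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def effect_size(treated, control):
    """(Mt - Mc) / SDc: the mean difference in comparison-group SD units."""
    return (mean(treated) - mean(control)) / stdev(control)


def percentile_after_shift(es, start_percentile=50.0):
    """Percentile reached by a patient who starts at start_percentile
    and moves es SD units, assuming normally distributed scores."""
    z_start = NormalDist().inv_cdf(start_percentile / 100.0)
    return 100.0 * NormalDist().cdf(z_start + es)


# Hypothetical scores, invented purely for illustration.
control = [10, 12, 14, 16, 18]
treated = [13, 15, 17, 19, 21]

es = effect_size(treated, control)
print(f"effect size: {es:.2f}")
print(f"a median patient moves to about percentile {percentile_after_shift(es):.0f}")
```

Under these assumptions, an effect size near 1 moves a patient from the 50th percentile to somewhere in the low 80s.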
Slide 10: Effect Size Statistics
- 1. Use a t-test and report statistically significant differences as indicators of responsiveness.
- 2. Remove the n from the denominator to make the statistic independent of sample size.
- 3. The denominator can be the SD of the baseline scores, of scores among stable subjects, or of change in scores.
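The three options above can be compared side by side. The sketch below assumes a simple before/after design; the function names, the data, and the separate sample of stable subjects are hypothetical. The variant that divides by the SD of change scores is often called the standardized response mean, and the paired t statistic equals that value multiplied by the square root of n, which is why t depends on sample size.

```python
from math import sqrt
from statistics import mean, stdev


def change_scores(before, after):
    """Per-subject change (after minus before)."""
    return [a - b for b, a in zip(before, after)]


def paired_t(before, after):
    """Paired t statistic: mean change over its standard error."""
    change = change_scores(before, after)
    return mean(change) / (stdev(change) / sqrt(len(change)))


def standardized_change(before, after, denominator_sd):
    """Mean change divided by whichever SD is chosen as the yardstick."""
    return mean(change_scores(before, after)) / denominator_sd


# Hypothetical data, invented for illustration only.
before = [40, 45, 50, 55, 60, 50]
after = [46, 49, 57, 58, 68, 56]
stable_scores = [48, 52, 50, 47, 53]  # scores of subjects judged stable

change = change_scores(before, after)
t = paired_t(before, after)
es_baseline = standardized_change(before, after, stdev(before))
es_stable = standardized_change(before, after, stdev(stable_scores))
es_change = standardized_change(before, after, stdev(change))

print(f"paired t:              {t:.2f}")
print(f"ES (baseline SD):      {es_baseline:.2f}")
print(f"ES (stable-subject SD): {es_stable:.2f}")
print(f"ES (SD of change):     {es_change:.2f}")
```

The same mean change yields quite different values depending on the denominator, so reports need to state which one was used.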
Slide 11: Effect Size Statistics (2)
- Refinements include correction for level of reliability. E.g., Wyrwich proposed using the standard error of measurement (SEM) in the denominator.
- $SEM = SD_1 \sqrt{1 - \alpha}$
- However, a high alpha does not ensure responsiveness if the measure includes inter-correlated traits that do not change.
Slide 12: [Figure: Impact of including alpha in the effect size calculation (at a difference of 1.5 and SD of 3). Axes: effect size, alpha.]
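The relationship plotted on this slide can be approximated numerically. This is a sketch only, assuming the reliability-corrected denominator is the standard error of measurement, SEM = SD × sqrt(1 − alpha), and using the slide's difference of 1.5 and SD of 3. The function names and the particular alpha values listed are hypothetical choices, not taken from the original figure.

```python
from math import sqrt


def standard_error_of_measurement(sd, alpha):
    """SEM = SD * sqrt(1 - alpha), with alpha the reliability coefficient."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1) for a usable denominator")
    return sd * sqrt(1.0 - alpha)


def sem_effect_size(difference, sd, alpha):
    """Score difference expressed in SEM units rather than SD units."""
    return difference / standard_error_of_measurement(sd, alpha)


DIFFERENCE = 1.5  # from the slide
SD = 3.0          # from the slide

for alpha in (0.0, 0.5, 0.75, 0.9, 0.95):
    es = sem_effect_size(DIFFERENCE, SD, alpha)
    print(f"alpha = {alpha:.2f} -> effect size = {es:.2f}")
```

With alpha at 0 the result equals the ordinary effect size (1.5 / 3 = 0.5); as alpha approaches 1 the denominator shrinks and the statistic grows without any change in the underlying score difference.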
Slide 13: Comment: Effect Sizes
- Useful for comparing responsiveness of different health measures
- Helpful in calculating the power of a study
- However:
- Formulae seem somewhat arbitrary
- Effect sizes offer no indication of the clinical change represented by a given shift in scores
Slide 14: The MID as a Criterion
- Introduces theme of Minimally Important Difference (MID) and its cousin, the MCID.
- MCID: the smallest difference in score in the domain of interest which patients perceive as beneficial and which would mandate, in the absence of troublesome side effects and excessive cost, a change in the patient's management.
- Estimate internally (using the scale itself), or externally (using some other criterion).
Slide 15: Setting Internal MIDs
- 1. Apply the measurement; select the change threshold seen as important by clinical experts. How much would the outcome have to change before they would alter treatment?
- 2. Present clinicians with written scenarios and compare each with the previous one. MCID = average difference in scores between pairs rated as "a little less" or "a little more".
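The second method can be sketched in a few lines. The example assumes each comparison is recorded as the scale score of the previous scenario, the score of the current scenario, and the clinician's rating of the pair; this data layout, the rating labels other than "a little less" and "a little more", and the function name are hypothetical.

```python
from statistics import mean

SMALL_CHANGE_LABELS = ("a little less", "a little more")


def mcid_from_pairs(rated_pairs, labels=SMALL_CHANGE_LABELS):
    """Average absolute score difference across pairs rated as a small change."""
    diffs = [
        abs(current - previous)
        for previous, current, rating in rated_pairs
        if rating in labels
    ]
    if not diffs:
        raise ValueError("no pairs were rated as a small change")
    return mean(diffs)


# Hypothetical (previous score, current score, clinician rating) triples.
rated_pairs = [
    (50, 54, "a little more"),
    (60, 57, "a little less"),
    (40, 52, "much more"),
    (55, 55, "about the same"),
    (45, 50, "a little more"),
]

print(f"estimated MCID: {mcid_from_pairs(rated_pairs)} points")
```

Only the three pairs rated as small changes contribute (differences of 4, 3 and 5 points), so pairs judged unchanged or greatly changed do not pull the estimate in either direction.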
Slide 16: Externally-Based MIDs
- Clinicians view patient scenarios and rate whether they changed significantly or not.
- Patients can judge the change in their own condition: no change, a little better, etc.
- Alternatively, clinically assess patients, then randomly assign pairs of them to hold conversations about their illness, leading to ratings of whether they were better than the other, much better, etc.
Slide 17: 3. External Criteria for Responsiveness
- 1. Establish MID or MCID. Group patients who improve (or deteriorate) > MID and compare to the rest using the measure.
- 2. Various statistics:
- Sensitivity, specificity, ROCs
- Point-biserial correlations
- Regression to analyse average scale change on the measure for each MCID unit change
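Two of the statistics listed above can be illustrated with a small sketch. It assumes each patient has a change score on the measure being evaluated and a yes/no flag from the external criterion saying whether they improved by more than the MID. The function names and data are hypothetical, and the area under the ROC curve is computed here with a plain trapezoid rule rather than any particular statistics package.

```python
from math import sqrt
from statistics import mean, pstdev


def roc_points(change, improved):
    """(1 - specificity, sensitivity) at every possible cut-off on the change score."""
    n_pos = sum(improved)
    n_neg = len(improved) - n_pos
    points = [(0.0, 0.0)]
    for cut in sorted(set(change), reverse=True):
        tp = sum(1 for c, y in zip(change, improved) if y and c >= cut)
        fp = sum(1 for c, y in zip(change, improved) if not y and c >= cut)
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoid-rule area under the ROC points."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


def point_biserial(change, improved):
    """Correlation between the change score and the improved / not improved flag."""
    pos = [c for c, y in zip(change, improved) if y]
    neg = [c for c, y in zip(change, improved) if not y]
    p = len(pos) / len(change)
    return (mean(pos) - mean(neg)) / pstdev(change) * sqrt(p * (1.0 - p))


# Hypothetical change scores and external-criterion flags (improved > MID).
change = [8, 6, 3, 7, 2, 1, 5, 0]
improved = [True, True, True, True, False, False, False, False]

auc = area_under_curve(roc_points(change, improved))
r_pb = point_biserial(change, improved)
print(f"AUC: {auc:.3f}")
print(f"point-biserial r: {r_pb:.2f}")
```

An area near 1 means the change score separates improved from non-improved patients almost perfectly, while 0.5 means it does no better than chance.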
Slide 18: [Figure: ROC curve for 3 instruments (SF-36, AIMS2, HAQ) in detecting an MCID. Y-axis: sensitivity (true positives), 0.0 to 1.0. X-axis: 1 - specificity (false positives), 0.0 to 1.0.]
Slide 19: Questions for Discussion
- Are MIDs constant across range? (next slide)
- How can we encourage people to routinely report before and after changes in scores, SD, and alpha?
- Should we apply a measure to standard scenarios to get X1 and X2, and use these to simulate the effect size?
- How does this all apply to nutritional assessments?
Slide 20: [Figure: Notional size of an MCID at various levels of overall health. Y-axis: size of change (none to large). X-axis: health status (poor to good). Curves labelled "Cognition?" and "Physical function?".]