Title: TESTING A TEST
1TESTING A TEST
- Ian McDowell
- Department of Epidemiology & Community Medicine
- November, 2004
2A Lab Report (Montfort Hospital Biochem Lab)
3The Challenge of Clinical Measurement
- Diagnoses are based on information, from formal measurements or from your clinical judgment
- This information is seldom perfectly accurate
- Random errors can occur
- Biases in judgment or measurement can occur
- Due to biological variability, this patient may not fit the general rule
- Diagnosis (e.g., hypertension) involves a categorical judgment; this often requires dividing a continuous score (blood pressure) into categories. Choosing the cutting-point may be arbitrary
4Therefore
- You need to be aware:
- That diagnosis is a matter of probabilities
- That using a quantitative approach is better than just guessing!
- That you will ultimately become familiar with the typical accuracy of measurements in your chosen clinical field
- Of some of the ways to describe the accuracy of a measurement
- That the principles apply to both diagnostic and screening tests
5Attributes of Tests or Measures
- Cost, Safety, Acceptability, etc.
- Reliability = reproducibility; this considers chance or random errors
- Validity: Does it measure what it is supposed to measure? By extension, what diagnostic conclusion can I draw from a particular score on the test? Validity may be affected by bias, or systematic errors
6Reliability and Validity
[Figure: reliability (low, high) contrasted with validity (low, high)]
7Ways of Assessing Validity
- Face, Content validity: does it make clinical or biological sense? Does it include the relevant symptoms?
- Criterion: comparison to a gold standard, definitive measure
- Expressed as sensitivity and specificity
- Construct validity (this is used with abstract themes, such as quality of life, for which there is no definitive standard)
8Gold Standards
- Sensitivity and specificity are judged against:
- More definitive (but expensive or invasive) tests, such as a complete work-up
- Or against the eventual outcome (for screening tests, when work-up of well patients is unethical)
9 2 x 2 Table for Testing a Test
                    Gold standard
                    Disease present    Disease absent
Positive test       a (TP)             b (FP)
Negative test       c (FN)             d (TN)
Validity            Sensitivity        Specificity
                    = a/(a+c)          = d/(b+d)
TP = true positive; FP = false positive; FN = false negative; TN = true negative
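As a minimal sketch of the two column-based formulas in the table, the Python below computes sensitivity and specificity from the four cell counts. The counts are hypothetical and chosen only for illustration; they do not come from the lecture.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Proportion of truly diseased people the test flags: a / (a + c)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Proportion of truly non-diseased people the test clears: d / (b + d)."""
    return tn / (fp + tn)


# Hypothetical cell counts (assumption, for illustration only)
a, b, c, d = 90, 30, 10, 170  # TP, FP, FN, TN

print(f"Sensitivity = {sensitivity(a, c):.2f}")  # 90 / (90 + 10)
print(f"Specificity = {specificity(d, b):.2f}")  # 170 / (30 + 170)
```

Both measures are read down the columns of the table, so neither depends on how many diseased and non-diseased people happen to be in the sample.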
10A Bit More on Sensitivity
- Ability to detect disease when it is present
- a/(a+c) = TP/(TP+FN)
- Mnemonics: a sensitive person is one who can detect your feelings
- (1 - seNsitivity) = false Negative rate (i.e., how many cases are missed by the screening test?)
- Cf. power of statistical test (1-β)
11and More on Specificity
- Ability to detect absence of disease when it is truly absent (can it detect non-disease?)
- d/(b+d) = TN/(FP+TN)
- Mnemonics:
- a specific test would identify only that type of disease. "Nothing else looks like this"
- (1 - sPecificity) = false Positive rate (How many are falsely classified as having the disease?)
12Clinical applications
- A specific test can be useful to rule in a disease. If the result on a highly specific test is positive, you can be fairly sure the patient has the condition (SpPin)
- A sensitive test can be useful for ruling a disease out. A negative result on a very sensitive test reassures you that the patient does not have the disease (SnNout)
13The Selection of a Cutting Point
[Figure: overlapping score distributions for the well population (healthy scores) and the sick population (pathological scores). Moving the cut-point one way increases sensitivity; moving it the other way increases specificity.]
Crucial issue: changing the cut-point can improve sensitivity or specificity, but at the expense of the other
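The trade-off can be illustrated with a small Python sketch. The scores are invented for illustration, and the sketch assumes that higher scores indicate disease and that a score at or above the cut-point counts as a positive test.

```python
# Hypothetical test scores (assumption: higher scores suggest disease)
well_scores = [2, 3, 3, 4, 5, 5, 6, 7]
sick_scores = [4, 6, 6, 7, 8, 8, 9, 9]


def sens_spec_at(cut_point, sick, well):
    """Sensitivity and specificity when scores >= cut_point are called positive."""
    true_pos = sum(score >= cut_point for score in sick)
    true_neg = sum(score < cut_point for score in well)
    return true_pos / len(sick), true_neg / len(well)


for cut in (4, 6, 8):
    sens, spec = sens_spec_at(cut, sick_scores, well_scores)
    print(f"cut-point {cut}: sensitivity {sens:.3f}, specificity {spec:.3f}")
```

With these made-up scores, raising the cut-point from 4 to 8 moves sensitivity from 1.0 down to 0.5 while specificity climbs from 0.375 to 1.0.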
14Problems with Wrong Results
- False Positives can arise due to other factors (such as taking other medications, diet, etc.). They entail cost and danger of investigations, labeling, worry
- This is similar to Type I or alpha error in a test of statistical significance: the possibility of falsely concluding that there is an effect of an intervention
- False Negatives imply missed cases, so potentially bad outcomes if untreated
- Cf. Type II or beta error: the chance of missing a true difference
15The Crucial Point: Predictive Values
- Sensitivity & specificity are characteristics of the test
- But the clinician, of course, gets the test result and does not know if this person is a true positive or a false positive (or a true or false negative). Hmmm...
- How do we assess the predictive value of a positive or negative result?
16Predictive Values
          D+     D-
T+        a      b
T-        c      d
- Based on rows, not columns
- PPV = a/(a+b): interprets positive test
- NPV = d/(c+d): interprets negative test
- Immediately useful to clinician: they tell us about the population and thus the patient
- Depend upon prevalence of disease, so must be determined for each clinical setting
- As prevalence goes down, PPV goes down and NPV rises
17Same Test, Two Clinical Situations
A. Referral hospital: Prevalence = 55/165 = 33%
          D+     D-
T+        50     10
T-         5    100
Sensitivity = 50/55 = 91%; Specificity = 100/110 = 91%
PPV = 50/60 = 83%; NPV = 100/105 = 95%

B. Primary care: Prevalence = 55/1155 = about 5%
          D+     D-
T+        50    100
T-         5   1000
Sensitivity = 50/55 = 91%; Specificity = 1000/1100 = 91%
PPV = 50/150 = 33%; NPV = 1000/1005 = 99.5%
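The two settings can be checked with a short Python sketch that reads the predictive values along the rows of each 2 x 2 table. The cell counts are the ones given on this slide; the function and variable names are illustrative choices, not part of the lecture.

```python
def predictive_values(a, b, c, d):
    """Return (PPV, NPV) from 2 x 2 cells: a=TP, b=FP, c=FN, d=TN."""
    ppv = a / (a + b)  # row of positive tests
    npv = d / (c + d)  # row of negative tests
    return ppv, npv


# Cell counts (a, b, c, d) taken from the two tables on this slide
settings = {
    "Referral hospital": (50, 10, 5, 100),
    "Primary care": (50, 100, 5, 1000),
}

for name, (a, b, c, d) in settings.items():
    prevalence = (a + c) / (a + b + c + d)
    ppv, npv = predictive_values(a, b, c, d)
    print(f"{name}: prevalence {prevalence:.1%}, PPV {ppv:.1%}, NPV {npv:.1%}")
```

Sensitivity and specificity are identical in both tables; only the number of non-diseased people changes, and that alone moves the PPV from roughly five in six to one in three.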
18Practical Question: "Doctor, what's my likelihood of having the disease?"
- To answer this question:
- You need to have a general idea of the sensitivity & specificity of the test
- To interpret the results, you also need to know roughly the prevalence of the condition in your practice. You can then work out the PPV and answer the patient's question.
- "Give me a break, dude! Surely there is an easier way to bring all this together?"
19Prevalence of Disease
- We have seen how this influences the interpretation of a test score
- Before you do the test, prevalence gives your best guess about the probability that the patient has the disease
- Also known as Pretest Probability of Disease = (a+c) / N in 2 x 2 table
- Or, can be expressed as odds of disease = (a+c) / (b+d)
[2 x 2 table: cells a, b, c, d; total N = a+b+c+d]
20Estimating predictive values for a specific
setting is called calibrating the test
- You could:
- Apply the test and a definitive test to a consecutive series of patients (rarely feasible)
- Calculate from Bayes's Theorem (ouch!)
- Draw a hypothetical table (maybe?)
- Use a nomogram (tell me how)
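For the Bayes's Theorem option, a minimal Python sketch is given below. It uses the standard form of the theorem for a dichotomous test; the function names are illustrative, and the inputs are the sensitivity, specificity and prevalence figures from the two-setting example earlier in the lecture.

```python
def ppv_bayes(sens, spec, prev):
    """P(disease | positive test) from sensitivity, specificity, prevalence."""
    true_pos = sens * prev
    false_pos = (1 - spec) * (1 - prev)
    return true_pos / (true_pos + false_pos)


def npv_bayes(sens, spec, prev):
    """P(no disease | negative test) from sensitivity, specificity, prevalence."""
    true_neg = spec * (1 - prev)
    false_neg = (1 - sens) * prev
    return true_neg / (true_neg + false_neg)


sens, spec = 50 / 55, 100 / 110  # same test in both settings
for label, prev in [("Referral hospital", 55 / 165), ("Primary care", 55 / 1155)]:
    print(f"{label}: PPV {ppv_bayes(sens, spec, prev):.3f}, "
          f"NPV {npv_bayes(sens, spec, prev):.3f}")
```

The results agree with the values read directly off the two 2 x 2 tables, which is a useful check that the formula and the table method are the same calculation.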
21Calibration by hypothetical table
- Fill cells in following order:
                  Truth
                  Disease Present           Disease Absent            Total    PV
Test Pos          4th (from sensitivity)    7th                       8th      10th
Test Neg          5th                       6th (from specificity)    9th      11th
Total             2nd (from prevalence)     3rd                       1st
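A sketch of the hypothetical-table method in Python follows. It assumes the fill order runs: grand total, column totals from prevalence, true positives from sensitivity, true negatives from specificity, the remaining cells by subtraction, then row totals and predictive values. The cohort size of 1000 and the example inputs are arbitrary illustrative choices.

```python
def hypothetical_table(prevalence, sensitivity, specificity, n=1000):
    """Fill a 2 x 2 table for an imaginary cohort of n people."""
    diseased = n * prevalence      # 2nd: column total, from prevalence
    healthy = n - diseased         # 3rd: the other column total
    tp = diseased * sensitivity    # 4th: from sensitivity
    fn = diseased - tp             # 5th: by subtraction
    tn = healthy * specificity     # 6th: from specificity
    fp = healthy - tn              # 7th: by subtraction
    test_pos = tp + fp             # 8th: row total
    test_neg = fn + tn             # 9th: row total
    return {
        "TP": tp, "FP": fp, "FN": fn, "TN": tn,
        "PPV": tp / test_pos,      # 10th
        "NPV": tn / test_neg,      # 11th
    }


# Hypothetical inputs (assumption): 10% prevalence, 90% sensitivity, 80% specificity
table = hypothetical_table(prevalence=0.10, sensitivity=0.90, specificity=0.80)
for key, value in table.items():
    print(f"{key}: {value:.3f}")
```

Working with a round cohort size keeps every cell a whole number of imaginary patients, which is the main appeal of this method over the algebraic form.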
22 Combining Sensitivity and Specificity: Receiver Operating Characteristic Curves
Work out Sen and Spec at every possible cut-point, then plot these. Area under the curve indicates the information provided by the test
[Figure: ROC curve. Vertical axis: Sensitivity (0 to 1). Horizontal axis: 1-Specificity (false positives) (0 to 1).]
Note: the theme of sensitivity and (1-specificity) will appear again!
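The procedure can be sketched in Python as follows. The scores are invented, higher scores are assumed to indicate disease, and the area is approximated with the trapezoidal rule; none of these choices come from the lecture.

```python
# Hypothetical test scores (assumption: higher scores suggest disease)
well_scores = [2, 3, 3, 4, 5, 5, 6, 7]
sick_scores = [4, 6, 6, 7, 8, 8, 9, 9]


def roc_points(sick, well):
    """(1 - specificity, sensitivity) at every possible cut-point."""
    cuts = sorted(set(sick) | set(well))
    cuts.append(cuts[-1] + 1)  # a cut-point above every score: nobody tests positive
    points = []
    for cut in cuts:
        sens = sum(s >= cut for s in sick) / len(sick)
        false_pos_rate = sum(s >= cut for s in well) / len(well)
        points.append((false_pos_rate, sens))
    return sorted(points)


def area_under_curve(points):
    """Trapezoidal rule over points sorted along the horizontal axis."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


points = roc_points(sick_scores, well_scores)
print(f"AUC = {area_under_curve(points):.3f}")
```

An area of 0.5 corresponds to a test that does no better than chance, and 1.0 to a test that separates the two groups perfectly.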
23Likelihood Ratios
- Defined as the odds that a given level of a diagnostic test result would be expected in a patient with the disease, as opposed to a patient without: true-positive rate / false-positive rate
- Advantages:
- Express sensitivity and specificity in one number
- Can be calculated for many levels of the test
- Can be turned into predictive values
- LR for positive test = Sensitivity / (1-Specificity)
- LR for negative test = (1-Sensitivity) / Specificity
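The two formulas translate directly into Python. As an illustration, the sketch below applies them to the sensitivity and specificity from the two-setting example earlier in the lecture (50/55 and 100/110); the function name is an illustrative choice.

```python
def likelihood_ratios(sens, spec):
    """Return (LR for a positive test, LR for a negative test)."""
    lr_pos = sens / (1 - spec)
    lr_neg = (1 - sens) / spec
    return lr_pos, lr_neg


lr_pos, lr_neg = likelihood_ratios(sens=50 / 55, spec=100 / 110)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
```

Unlike predictive values, these ratios are built only from sensitivity and specificity, so they can be carried from one clinical setting to another.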
24Calibration with a Nomogram
1) You need the LR.
2) Select pretest probability (prevalence) on left axis.
3) Select likelihood ratio on center axis.
4) Draw line through to right axis to indicate post-test probability of disease.
Example: Prevalence 30%, LR 20: Post-test probability about 90%
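The nomogram is a graphical shortcut for a three-step calculation: convert the pretest probability to odds, multiply by the likelihood ratio, and convert back to a probability. A minimal Python sketch, with an illustrative function name, applied to the example on this slide:

```python
def post_test_probability(pretest_probability, likelihood_ratio):
    """Pretest probability -> odds, multiply by the LR, -> post-test probability."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Example from this slide: prevalence 30%, LR 20
print(f"{post_test_probability(0.30, 20):.3f}")
```

The exact arithmetic gives a value just under 0.90, close to what can be read off the nomogram by eye.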
25Chaining LRs Together
- Example: 45 year-old woman with 1-month history of intermittent chest pain
- Pretest probability about 1% for CAD
- History suggestive of angina (substernal pain radiating down arm; induced by effort; relieved by rest)
- LR of this history for angina is about 100
26The previous example
1. From the History
[Figure: nomogram] She's young: pretest probability about 1%
Pretest probability rises to 50%, based on history
27Chaining LRs Together
- 45 year-old woman with 1-month history of intermittent chest pain
- After the history, post-test probability is now about 50%. What will you do?
- Record an ECG
- Results: 2.2 mm ST-segment depression. LR for ECG 2.2 mm = 10
- Overall post-test probability is now >90% for coronary artery disease (see next slide)
28The previous example: ECG Results
[Figure: nomogram] Now start pretest probability (i.e. prior to ECG) at 50%, based on history
Post-test probability now rises to 90%
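The chest-pain example can be traced numerically with a short Python sketch. It takes the figures from the slides (pretest probability 1%, LR of about 100 for the history, LR of 10 for the ECG) and assumes the two findings can be treated as independent, which is what multiplying their likelihood ratios in sequence implies. The function name is an illustrative choice.

```python
def update(probability, likelihood_ratio):
    """One step on the nomogram: probability -> odds, times LR, -> probability."""
    odds = probability / (1 - probability)
    new_odds = odds * likelihood_ratio
    return new_odds / (1 + new_odds)


probability = 0.01  # pretest probability of CAD from the slides
steps = [("history suggestive of angina", 100), ("ECG, 2.2 mm ST depression", 10)]

for finding, lr in steps:
    probability = update(probability, lr)
    print(f"After {finding}: {probability:.1%}")
```

Each post-test probability becomes the pretest probability for the next finding, which is the sense in which the LRs are chained.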