Title: Diagnostic Testing
1 - Diagnostic Testing
- Brian Gage, MD
- October 7, 2009
- DOC Research
- Acknowledgment: http://www.cebm.utoronto.ca/glossary/spsncriteria.htm
2 - Goals
- To use the terms that describe diagnostic tests
- To understand concepts pertaining to Dx tests
- To set up a 2 x 2 table in Excel to do the math
- To understand key principles when designing a study of a Dx test
3 - 2 x 2 Table
4 - Accuracy
- Sensitivity
- the proportion of diseased persons who have a positive test
- also called the true positive rate and can be calculated from a/(a+c)
- Specificity
- the proportion of non-diseased persons who have a negative test
- also called the true negative rate and can be calculated from d/(b+d)
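As a minimal sketch of the two formulas above, assuming the usual cell layout they imply (a = true positives, b = false positives, c = false negatives, d = true negatives), the arithmetic might look like this in Python. The counts and function names are hypothetical, chosen only for illustration.

```python
def sensitivity(a: int, c: int) -> float:
    """True positive rate: a / (a + c)."""
    return a / (a + c)


def specificity(b: int, d: int) -> float:
    """True negative rate: d / (b + d)."""
    return d / (b + d)


# Hypothetical counts, not taken from any study cited in these slides.
a, b, c, d = 90, 30, 10, 170   # TP, FP, FN, TN

print(f"Sensitivity = {sensitivity(a, c):.2f}")   # 90 / (90 + 10)
print(f"Specificity = {specificity(b, d):.2f}")   # 170 / (30 + 170)
```

Both measures are computed within a single column of the table (diseased only, or non-diseased only), so neither one uses the ratio of diseased to non-diseased persons.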
5 - Why Aren't Sensitivity & Specificity Affected by Disease Prevalence?
6 - SpIn & SnOut
- Specificity: Remember SpPin
- When a test has a high Specificity (i.e., few false positives), a Positive test rules IN the disorder.
- Sensitivity: Remember SnNout
- When a test has a high Sensitivity, a Negative result rules OUT the disorder.
7 - Diagnostic Accuracy
- How would you evaluate the accuracy of 4 commercial platforms that can genotype for 2 genes (VKORC1, CYP2C9)?
- You have existing DNA genotyping platforms
- What are the most important parts of your study design?
8 - Accuracy results (95% CI) for CYP2C9 and VKORC1 on 112 DNA samples
C. King et al. Am J Clin Path 2008
9 - Calculating 95% Confidence Intervals (CI) with 0% or 100% Results
- Suppose our Dx test was correct 95/100?
- Then, we could calculate the 95% CI
- Formula > More functions > Statistical > Binomial
- CRITBINOM(100, 0.95, 0.025) = 90
- CRITBINOM(100, 0.95, 0.975) = 99
- 89-98% calculated at http://faculty.vassar.edu/lowry/prop1.html
- Normal approximation: p ± z × SQRT(p(1-p)/n)
- i.e., 0.95 ± 1.96 × sqrt((0.95 × 0.05)/100) = 91-99%
- 89-98% calculated in SAS (PROC FREQ)
- What if our test was perfect (i.e., 100/100)?
- Excel won't calculate it; the website gets 96-100%.
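The interval calculations above can be sketched in plain Python. The first function is the normal approximation quoted on this slide. The second is the Wilson score interval without continuity correction, which is an assumption swapped in here: the slides do not say which method the website or SAS used, but this formula needs no statistics library and reproduces the 89-98% and 96-100% figures.

```python
from math import sqrt

Z95 = 1.96  # two-sided 95% critical value of the standard normal


def normal_approx_ci(successes: int, n: int, z: float = Z95) -> tuple[float, float]:
    """Normal-approximation interval: p +/- z * sqrt(p(1-p)/n)."""
    p = successes / n
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


def wilson_ci(successes: int, n: int, z: float = Z95) -> tuple[float, float]:
    """Wilson score interval without continuity correction (assumed method)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


for correct in (95, 100):
    lo, hi = normal_approx_ci(correct, 100)
    print(f"{correct}/100 normal approx: {lo:.1%} to {hi:.1%}")
    lo, hi = wilson_ci(correct, 100)
    print(f"{correct}/100 Wilson score:  {lo:.1%} to {hi:.1%}")
```

With 100/100 the normal approximation collapses to a zero-width interval because p(1-p) is 0, which is why a different method is needed for a perfect result.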
10 - 95% CI for a perfect test: Rule of 3
- The upper 95% CI is obviously 100%.
- Calculate the lower 95% CI using the Rule of 3
- Lower limit = 100% - (3 failures / # trials)
- 97% in this case.
- What is the 95% CI if 1000 out of 1000 test results were correct?
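A short sketch of the Rule of 3, assuming it is read as "lower limit = 1 - 3/n" when all n results are correct. For comparison it also solves p^n = 0.05 directly, which is the one-sided 95% bound the rule approximates. Function names are illustrative.

```python
def rule_of_three_lower(n: int) -> float:
    """Approximate lower 95% limit for a success rate when n of n trials succeed."""
    return 1 - 3 / n


def exact_lower(n: int, alpha: float = 0.05) -> float:
    """Lower limit p solving p**n = alpha (one-sided bound approximated by the rule)."""
    return alpha ** (1 / n)


for n in (100, 1000):
    print(f"n={n}: rule of 3 -> {rule_of_three_lower(n):.2%}, "
          f"solving p^n = 0.05 -> {exact_lower(n):.2%}")
```

Under this reading, 1000 correct results out of 1000 give a lower limit of about 99.7%.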
11 - Predictive Values
- Positive Predictive Value
- The proportion of patients with a positive test who have the disease
- Also known as post-test probability or posterior probability following a positive test.
- Negative Predictive Value
- The proportion of patients with a negative test who don't have the disease.
- Are predictive values affected by prevalence?
12 - What is the formula for PPV?
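One way to write the formula is Bayes' rule in terms of sensitivity, specificity, and prevalence: PPV = (sens × prev) / (sens × prev + (1 - spec) × (1 - prev)). The sketch below is an illustration rather than the slide's own derivation. It uses a sensitivity and specificity of 95% at prevalences of 20% and 2%, the values from the Lyme disease example later in these slides.

```python
def ppv(sens: float, spec: float, prev: float) -> float:
    """Positive predictive value via Bayes' rule."""
    true_pos = sens * prev
    false_pos = (1 - spec) * (1 - prev)
    return true_pos / (true_pos + false_pos)


def npv(sens: float, spec: float, prev: float) -> float:
    """Negative predictive value via Bayes' rule."""
    true_neg = spec * (1 - prev)
    false_neg = (1 - sens) * prev
    return true_neg / (true_neg + false_neg)


# Same test, two prevalences: only the predictive values change.
for prev in (0.20, 0.02):
    print(f"prevalence {prev:.0%}: PPV = {ppv(0.95, 0.95, prev):.0%}, "
          f"NPV = {npv(0.95, 0.95, prev):.1%}")
```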
13 - Odds
- Odds = P/(1-P), where P is the probability
- E.g., if the probability is 25%, what are the odds?
- P = O/(O+1)
- E.g., if the odds are 1/3, what is the probability?
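The two conversions above can be checked with exact fractions. This is a minimal sketch; the function names are illustrative.

```python
from fractions import Fraction


def prob_to_odds(p):
    """Odds = P / (1 - P)."""
    return p / (1 - p)


def odds_to_prob(odds):
    """P = O / (O + 1)."""
    return odds / (odds + 1)


p = Fraction(1, 4)                      # probability of 25%
odds = prob_to_odds(p)
print(f"P = {p} -> odds = {odds}")      # odds of 1/3, i.e. 1 to 3
print(f"odds = {odds} -> P = {odds_to_prob(odds)}")
```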
14 - Does Baby J. have M.D.?
- In a population of 100,020 we expect 20 true positives, but
- Of the 100,000 without M.D.
- Specificity = 99.98%; False + rate = 0.02%
- 99,980 will be true negative; 20 will be false +.
15 - The post-test odds are 1:1
- Therefore,
- Out of 100,020 infants, 20 will be truly positive and 20 will be false positive
- Positive predictive value = 50%
- An unselected baby boy with a positive screening test only has a 50/50 chance of having this rare disease.
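The same result can be reached with the odds form of Bayes' formula: post-test odds = pretest odds × likelihood ratio. The sketch below assumes a sensitivity of 100%, which is inferred from all 20 affected infants testing positive and is not stated outright on the slides.

```python
from fractions import Fraction

sens = Fraction(1)                      # assumed: all 20 affected infants test positive
spec = Fraction(9998, 10000)            # specificity of 99.98%
pretest_odds = Fraction(20, 100_000)    # 20 affected : 100,000 unaffected

lr_positive = sens / (1 - spec)         # likelihood ratio of a positive test
posttest_odds = pretest_odds * lr_positive
posttest_prob = posttest_odds / (posttest_odds + 1)

print(f"LR+ = {lr_positive}")
print(f"Post-test odds = {posttest_odds} (i.e. 1:1)")
print(f"Post-test probability (PPV) = {float(posttest_prob):.0%}")
```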
16 - Example: PPV of pap smears?
- Rate of atypia in healthy women is 1 out of 1000.
- Sensitivity = 0.70
- Specificity = 0.90
Find the probability that a woman will have atypical cervical cells given that she had a positive pap smear.
17 - Example: Pap Screening for Cervical CA (adapted from www.stat.psu.edu/ljs_05)
18 - Example: Pap Screening for Cervical CA (adapted from www.stat.psu.edu/ljs_05)
19 - Example: Pap smear
20 - Example: Pap smear
21 - PAP predictive value
- PPV = 70/10,060 = 0.00696 = 0.7%
- NPV = 89,910/89,940 ≈ 0.9997
A healthy woman with a positive pap has a tiny chance (0.7%) of truly having disease, while a healthy woman with a negative pap almost certainly will be disease free.
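These figures can be reproduced by filling in a 2 x 2 table from the prevalence, sensitivity, and specificity given earlier. The sketch assumes a cohort of 100,000 women, the size implied by the 10,060 and 89,940 denominators. The function name is illustrative.

```python
def two_by_two(n: int, prev: float, sens: float, spec: float) -> dict[str, int]:
    """Expected cell counts for a cohort of n people."""
    diseased = round(n * prev)
    healthy = n - diseased
    tp = round(diseased * sens)     # diseased, test positive
    fn = diseased - tp              # diseased, test negative
    tn = round(healthy * spec)      # healthy, test negative
    fp = healthy - tn               # healthy, test positive
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


cells = two_by_two(100_000, prev=0.001, sens=0.70, spec=0.90)
ppv = cells["TP"] / (cells["TP"] + cells["FP"])
npv = cells["TN"] / (cells["TN"] + cells["FN"])

print(cells)
print(f"PPV = {ppv:.5f}")
print(f"NPV = {npv:.4f}")
```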
22 - Antman E. et al. N Engl J Med 1996;335:1342
23 - Example: Cardiac Troponin I to Dx MI
- Hospital B has decreased the threshold of a + test to 0.2 ng/mL
- How will this change affect
- the % of R/O MI pts. who now rule in?
- a quality indicator of post-MI mortality?
- a quality indicator of LOS?
- a quality indicator of post-MI beta-blocker use?
- Medicare reimbursement?
24 - Example: You are contacted to help design a study of a D-Dimer test
- Background: Plasma D-Dimer is a fibrin degradation product (FDP) resulting from activation of coagulation and fibrinolysis
- Accuracy of Acme Cardiac D-Dimer
- Area under the ROC curve (sens. vs. 1-spec) = 0.89
- Intra-assay reproducibility, CV = 12%
- Coefficient of Variation = SD/mean
- Sensitivity 95%; specificity 50% because of FPs w/ CA, inflammation, surgery, etc.
- FDA approved for measurement of D-Dimer
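The coefficient of variation defined above can be computed from repeated measurements of one sample. In this sketch the replicate values are hypothetical, and using the sample standard deviation (n - 1 denominator) is an assumption, since the slide only says SD/mean.

```python
from statistics import mean, stdev


def coefficient_of_variation(values: list[float]) -> float:
    """CV = SD / mean, using the sample standard deviation."""
    return stdev(values) / mean(values)


# Hypothetical replicate readings of one specimen (arbitrary units).
replicates = [90, 100, 110]

print(f"CV = {coefficient_of_variation(replicates):.0%}")
```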
25 - Question: D-Dimer
- D-Dimer is less accurate, but much faster and cheaper than Doppler LE, spiral CT, V/Q scan, or angiogram
- If using the D-Dimer test to Dx a DVT or PE, should you target patients with a low, medium, or high pretest probability of disease?
- Could use the pretest probabilities of PE from PIOPED, Table 6: 9%, 30%, 68%
- Or use the pretest probabilities from Wells clinical prediction rule for DVT: 3%, 17%, 75%
- Wells PS et al. Value of assessment of pretest probability of deep-vein thrombosis in clinical management. Lancet 1997;350:1796
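One way to explore the question is to compute the probability of disease that remains after a negative D-Dimer at each Wells pretest probability. The sketch below takes the 95% sensitivity and 50% specificity quoted on the previous slide at face value and applies the odds form of Bayes' formula. It illustrates the arithmetic only and is not a validated clinical rule.

```python
SENS, SPEC = 0.95, 0.50             # D-Dimer accuracy quoted on the previous slide
LR_NEGATIVE = (1 - SENS) / SPEC     # likelihood ratio of a negative test


def prob_after_negative(pretest_prob: float) -> float:
    """Post-test probability of disease after a negative result."""
    pretest_odds = pretest_prob / (1 - pretest_prob)
    posttest_odds = pretest_odds * LR_NEGATIVE
    return posttest_odds / (1 + posttest_odds)


for label, pretest in (("low", 0.03), ("medium", 0.17), ("high", 0.75)):
    print(f"{label:>6} pretest {pretest:.0%} -> "
          f"{prob_after_negative(pretest):.1%} after a negative D-Dimer")
```

Under these assumptions a negative result leaves well under 1% probability in the low group but roughly 23% in the high group, so a negative D-Dimer is most informative when the pretest probability is low.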
26 - Question: D-Dimer
- Question: How would you design the study to determine whether availability of the D-Dimer test in the ER or outpatient setting reduced the use of Doppler LE, V/Q scans, and/or spiral CT?
27 - D-Dimer Results: Goldstein N. et al. Arch Int Med 2001
28 - More Results: Goldstein N. et al. 2001
29 - If You Don't Have a Gold Standard: Two Classes of Evidence for Validity
1. Criterion Validity
- Concurrent: Fuchs et al. (2003) used simultaneous reading comprehension to validate a reading grade level test
- E.g., Troponin I correlates with CPK MB and ECG change
- Predictive
- E.g., driver's test and prediction of MVA
2. Construct Validity - Define a model and collect data to test that model, e.g., vitamin K supplementation and carboxylation of proteins with certain base pairs (CALU, MGP); e.g., warfarin Rx and change in levels of assay.
30 - Reliability
- Inter-rater Reliability: Correlations between users
- Intra-rater Reliability: Correlation w/ same users
- Why do we have to be careful w/ use of correlation in this context?
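One commonly cited reason for caution is that correlation measures association rather than agreement. In the sketch below, with hypothetical readings, rater B always reads 5 units higher than rater A, so the Pearson correlation is perfect even though the two raters never agree.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


rater_a = [10, 12, 15, 20, 24]          # hypothetical measurements
rater_b = [v + 5 for v in rater_a]      # rater B always reads 5 units higher

r = pearson_r(rater_a, rater_b)
mean_diff = sum(b - a for a, b in zip(rater_a, rater_b)) / len(rater_a)

print(f"Pearson r = {r:.3f}")
print(f"Mean difference (B - A) = {mean_diff}")
```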
31 - Other Factors Influence the Clinical Decision to Use a Diagnostic Test
- ✓ Pretest probability of disease
- ✓ Test sensitivity and specificity
- Test costs (clinical and financial)
- Treatment risks and benefits
32 - Determining the Optimal Use of Diagnostic Tests for Patients with Acute Respiratory Infections
[Figure: axis "Probability of Specific Pathogen" from 0 to 100 with thresholds X and Y; labels: Diagnostic Test, Pathogen Rx, Alternative Dx, No Test/Test, Test/Treat]
Adapted from Pauker and Kassirer. NEJM. 1980
33 - When Studying a Test
- Calculate the reliability/reproducibility
- Intra- and inter-observer and intra- and interlab
- CV, Kappa to measure concordance of categorical vars, intraclass correlation coefficient (ICC) for cont. vars.
- Calculate the validity/accuracy
- Sensitivity, Specificity, PPV, NPV often require a gold standard
- May need to use rule of 3 to calculate 95% CI of accuracy.
- E.g., if you have 100% sensitivity w/ 20 patients, the upper limit of the 95% CI for the false-negative rate = 3/20, or 15% (i.e., the lower limit for sensitivity is 85%).
- Study design principles still hold
- Blinded assessment; an accepted gold standard
- Adequate sampling
- Estimate the clinical utility
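For the kappa statistic mentioned above, a minimal sketch of Cohen's kappa for two raters and a binary result follows. The agreement counts are hypothetical and the function name is illustrative.

```python
def cohen_kappa(both_pos: int, a_pos_b_neg: int, a_neg_b_pos: int, both_neg: int) -> float:
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    n = both_pos + a_pos_b_neg + a_neg_b_pos + both_neg
    observed = (both_pos + both_neg) / n
    a_pos = (both_pos + a_pos_b_neg) / n     # rater A's positive rate
    b_pos = (both_pos + a_neg_b_pos) / n     # rater B's positive rate
    expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    return (observed - expected) / (1 - expected)


# Hypothetical readings of 100 specimens by two observers.
kappa = cohen_kappa(both_pos=40, a_pos_b_neg=10, a_neg_b_pos=5, both_neg=45)
print(f"kappa = {kappa:.2f}")
```

Here the raters agree on 85 of 100 specimens, but half of that agreement is expected by chance given their positive rates, so kappa is lower than the raw 85% agreement.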
34 - Choosing the Threshold of a Test
[Figure: probability density function vs. AFP for Down's Syndrome and Normal Karyotype; labels: NL Risk, Down's Risk]
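The trade-off in choosing a threshold can be sketched with two overlapping normal distributions. Everything numeric below is hypothetical: the means and standard deviations are not real AFP data, and the sketch assumes that values below the threshold are called positive.

```python
from statistics import NormalDist

# Hypothetical marker distributions (arbitrary units), NOT real AFP data.
affected = NormalDist(mu=0.7, sigma=0.3)
unaffected = NormalDist(mu=1.0, sigma=0.3)


def sens_spec(threshold: float) -> tuple[float, float]:
    """Sensitivity and specificity when values BELOW the threshold are called positive."""
    sens = affected.cdf(threshold)          # affected cases correctly called positive
    spec = 1 - unaffected.cdf(threshold)    # unaffected cases correctly called negative
    return sens, spec


for threshold in (0.6, 0.8, 1.0):
    sens, spec = sens_spec(threshold)
    print(f"threshold {threshold:.1f}: sensitivity {sens:.0%}, specificity {spec:.0%}")
```

Raising the threshold in this sketch labels more people positive, so sensitivity rises while specificity falls. No threshold improves both when the distributions overlap.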
35 - Complexity and other characteristics of tests can also be quantified
36 - Summary
- Accuracy of a diagnostic test can be summarized by two measures: sensitivity & specificity
- These numbers determine likelihood ratios, LR
- LR can be used in the odds form of Bayes' formula to design a study or evaluate the post-test odds of disease, or on a 2 x 2 table
- The threshold to call a test + is a trade-off between sensitivity & specificity, costs, and treatment benefits/risks
37 - Next Week
- 3:30-5:45 in Wohl Auditorium
- Scientific writing: Jeanne Erdmann. Read
- Writing your manuscript
- "INSTEAD OF" and "TRY" tips in writing
- Time-to-event analysis: Brian Gage. Read
- Katz MH. Multivariable Analysis: A Primer for Readers of Medical Research. Ann Intern Med 2003
- Katz MH et al. Proportional hazards (Cox) regression. J Gen Intern Med 1993
38 - More Examples 1: Lyme Disease
- Antibody assay
- Sensitivity 95%; specificity 95%
- High Lyme Disease prevalence (20%)
- Positive predictive value = 83%
- Low Lyme Disease prevalence (2%)
- Positive predictive value = 28%
- Brown SL. JAMA 1999;282:62-6.
39 - More Examples 2: Use of D-Dimer post Joint Replacement
- Suppose you'd like to use the D-Dimer test to determine how long to prescribe an anticoagulant after orthopedic surgery, a controversial area
- You'd like to test the hypothesis that anticoagulant therapy can be stopped once the D-Dimer falls to a certain level.
- You're not sure what that level is because surgery can elevate the D-Dimer.
- There's also no agreement on how long to Rx an anticoagulant (3 to 42 days is used).
- How would you design a study to answer these questions?
40 - More Examples 3: Mammography
- Mammography in women between 40-50 yrs
- If 100,000 low-risk women are screened
- 6,034 mammograms will have some abnormality
- 5,998 (99.4%) will be false-positive
- 36 will actually have breast cancer
- Why? Prevalence = 0.036%
- Hamm RM. J Fam Pract 1998;47:44-52.