Title: Analysis of matched data; plus, diagnostic testing
2. Correlated Observations
- Correlated data arise when pairs or clusters of observations are related and thus are more similar to each other than to other observations in the dataset.
- Ignoring correlations will:
  - overestimate p-values for within-person or within-cluster comparisons
  - underestimate p-values for between-person or between-cluster comparisons
3. Pair Matching: Why match?
- Pairing can control for extraneous sources of variability and increase the power of a statistical test.
- Match 1 control to 1 case based on potential confounders, such as age, gender, and smoking.
4. Example
- Johnson and Johnson (NEJM 287: 1122-1125, 1972) selected 85 Hodgkin's patients who had a sibling of the same sex who was free of the disease and whose age was within 5 years of the patient's. They presented the data as two independent samples:
OR = 1.47, chi-square = 1.53 (NS)
From John A. Rice, Mathematical Statistics and Data Analysis.
5. Example
- But several letters to the editor pointed out that those investigators had made an error by ignoring the pairings. These are not independent samples, because the sibs are paired; it is better to analyze data like this as matched pairs:
OR = 2.14, chi-square = 2.91 (p = .09)
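The sketch below reproduces both sets of statistics for the sibling-pair example. The underlying cell counts do not appear in this transcript. They are an assumption, recalled from Rice's presentation of the Johnson and Johnson data, and they are used here because they reproduce the OR and chi-square values quoted on the two slides. The function names are illustrative.

```python
# Sketch: compare an unpaired and a paired analysis of the same 85 matched pairs.
# ASSUMPTION: the cell counts below are recalled from Rice's textbook version of this
# example; they are not shown in this transcript.

def unpaired_analysis(exp_cases, unexp_cases, exp_ctrls, unexp_ctrls):
    """Odds ratio and Pearson chi-square, (wrongly) treating patients and siblings
    as two independent samples."""
    odds_ratio = (exp_cases * unexp_ctrls) / (unexp_cases * exp_ctrls)
    table = [[exp_cases, unexp_cases], [exp_ctrls, unexp_ctrls]]
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    return odds_ratio, chi2


def paired_analysis(b, c):
    """Matched-pairs OR (b / c) and McNemar chi-square, using discordant pairs only.

    b = pairs in which only the patient is exposed;
    c = pairs in which only the sibling is exposed.
    """
    return b / c, (b - c) ** 2 / (b + c)


# Patients: 41 exposed, 44 unexposed; siblings: 33 exposed, 52 unexposed (assumed).
or_u, chi2_u = unpaired_analysis(41, 44, 33, 52)
# Discordant pairs: 15 patient-only exposed, 7 sibling-only exposed (assumed).
or_p, chi2_p = paired_analysis(15, 7)
print(f"unpaired: OR = {or_u:.2f}, chi-square = {chi2_u:.2f}")  # OR = 1.47, chi-square = 1.53
print(f"paired:   OR = {or_p:.2f}, chi-square = {chi2_p:.2f}")  # OR = 2.14, chi-square = 2.91
```

Only the 22 discordant pairs enter the paired analysis. The unpaired table mixes them with the 63 concordant pairs, which carry no information about the matched odds ratio.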
6. Pair Matching: example
- Match each MI case to a control without MI, based on age and gender.
- Ask about history of diabetes to find out whether diabetes increases the risk of MI.
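7. Pair Matching: example
Which cells are informative?
                        Control: diabetes   Control: no diabetes
Case: diabetes          a = 9               b = 37
Case: no diabetes       c = 16              d = 82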
8. Pair Matching
The OR estimate comes only from discordant pairs! The question is: among the discordant pairs, what proportion are discordant in the direction of the case vs. in the direction of the control? If more discordant pairs favor the case, this indicates OR > 1.
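9. P(favors case | discordant pair)
$P(\text{favors case} \mid \text{discordant pair}) = \frac{b}{b + c}$
10. odds(favors case | discordant pair)
$\text{odds}(\text{favors case} \mid \text{discordant pair}) = \frac{b/(b + c)}{c/(b + c)} = \frac{b}{c}$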
11. The OR estimate comes only from discordant pairs!
OR = 37/16 = 2.31. Makes sense!
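Here is a minimal sketch of the discordant-pair calculation, using the diabetes/MI counts b = 37 (only the case has diabetes) and c = 16 (only the control has diabetes). Exact fractions make the identity odds = b/c visible. The function name is hypothetical.

```python
from fractions import Fraction

def discordant_pair_summary(b, c):
    """Summaries of the discordant pairs in a pair-matched case-control study.

    b = pairs where only the case is exposed; c = pairs where only the control is exposed.
    The concordant pairs (cells a and d) are not needed at all.
    """
    p_favors_case = Fraction(b, b + c)                       # P(favors case | discordant pair)
    odds_favors_case = p_favors_case / (1 - p_favors_case)   # simplifies to b / c
    return p_favors_case, odds_favors_case

p, odds = discordant_pair_summary(b=37, c=16)
print(p, round(float(p), 2))   # 37/53 0.7
print(odds, float(odds))       # 37/16 2.3125
```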
12. McNemar's Test
Null hypothesis: P(favors case | discordant pair) = .5 (note: equivalent to OR = 1.0, or cell b = cell c)
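13. McNemar's Test
By the normal approximation to the binomial: under the null hypothesis, the number of the 53 discordant pairs that favor the case is binomial(n = 53, p = .5), and 37 were observed:
$Z = \frac{37 - 53(.5)}{\sqrt{53(.5)(.5)}} = \frac{10.5}{3.64} = 2.88$, two-sided p ≈ .004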
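14. McNemar's Test generally
By the normal approximation to the binomial:
$Z = \frac{b - \frac{b + c}{2}}{\sqrt{(b + c)(.5)(.5)}} = \frac{b - c}{\sqrt{b + c}}$
Equivalently:
$Z^2 = \frac{(b - c)^2}{b + c} \sim \chi^2_1$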
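The normal-approximation McNemar test can be sketched in a few lines. Here it is applied to the diabetes/MI discordant pairs (b = 37, c = 16). The sketch applies no continuity correction. Some software applies one by default, which gives a slightly smaller statistic. The function name is illustrative.

```python
import math

def mcnemar_normal(b, c):
    """McNemar's test via the normal approximation to Binomial(b + c, 0.5).

    Returns (z, chi-square with 1 df, two-sided p-value); no continuity correction.
    """
    n = b + c                                         # number of discordant pairs
    z = (b - n * 0.5) / math.sqrt(n * 0.5 * 0.5)      # same as (b - c) / sqrt(b + c)
    chi2 = z ** 2                                     # same as (b - c)**2 / (b + c)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))    # equals 2 * (1 - Phi(|z|))
    return z, chi2, p_two_sided

z, chi2, p = mcnemar_normal(b=37, c=16)
print(f"z = {z:.2f}, chi-square = {chi2:.2f}, p = {p:.4f}")
```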
15. McNemar's Test
16. Example: McNemar's EXACT test
- Split-face trial
- Researchers assigned 56 subjects to apply SPF 85 sunscreen to one side of their faces and SPF 50 to the other prior to engaging in 5 hours of outdoor sports during mid-day. The outcome is sunburn (yes/no).
- Unit of observation: side of a face
- Are the observations correlated? Yes.
Russak JE et al. JAAD 2010; 62: 348-349.
17. Results ignoring correlation
Table I. Dermatologist grading of sunburn after an average of 5 hours of skiing/snowboarding (P = .03, Fisher's exact test)
Sun protection factor   Sunburned   Not sunburned
85                      1           55
50                      8           48
Fisher's exact test compares the following proportions: 1/56 versus 8/56. Note that individuals are being counted twice!
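The P = .03 can be traced with a from-scratch sketch of a two-sided Fisher's exact test applied to Table I. The test sums the probabilities of all tables with the same margins that are no more probable than the observed one. The analysis treats the 112 face sides as if they were independent, which is exactly the mistake. The helper name is illustrative. SciPy's `scipy.stats.fisher_exact` serves the same purpose.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the observed
    margins whose probability does not exceed that of the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    total = comb(n, col1)

    def prob(k):  # P(top-left cell = k) given the fixed margins
        return comb(row1, k) * comb(row2, col1 - k) / total

    p_obs = prob(a)
    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    return sum(prob(k) for k in range(k_min, k_max + 1)
               if prob(k) <= p_obs * (1 + 1e-7))  # small tolerance for float ties

# Table I: SPF 85 sides 1 burned / 55 not; SPF 50 sides 8 burned / 48 not.
p = fisher_exact_two_sided(1, 55, 8, 48)
print(f"P = {p:.3f}")
```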
18. Correct analysis of data
Table 1. Correct presentation of the data (P = .016, McNemar's exact test)
                          SPF-50 side
SPF-85 side               Sunburned   Not sunburned
Sunburned                 1           0
Not sunburned             7           48
McNemar's exact test: under the null hypothesis, X ~ binomial(n = 7, p = .5). All 7 discordant pairs favor the SPF-85 side, so the two-sided P = 2 × (.5)^7 = .016.
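A minimal sketch of McNemar's exact test follows. It is simply a two-sided binomial test on the discordant pairs, shown here with the split-face counts (0 pairs where only the SPF-85 side burned, 7 where only the SPF-50 side burned). The function name is hypothetical.

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided McNemar exact test: under H0, b ~ Binomial(b + c, 0.5).

    b, c = counts of the two kinds of discordant pairs.
    """
    n = b + c
    k = min(b, c)
    # P(X <= k) under Binomial(n, 0.5), doubled for a two-sided test (capped at 1).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

p = mcnemar_exact(b=0, c=7)
print(f"P = {p:.4f}")   # P = 0.0156
```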
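19. RECALL: 95% confidence interval for a difference in INDEPENDENT proportions
$(\hat{p}_1 - \hat{p}_2) \pm 1.96\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$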
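20. 95% CI for a difference in dependent proportions
With n pairs and discordant-cell counts b and c, $\hat{p}_1 - \hat{p}_2 = \frac{(a + b) - (a + c)}{n} = \frac{b - c}{n}$, and the 95% CI is
$\frac{b - c}{n} \pm 1.96\,\frac{1}{n}\sqrt{(b + c) - \frac{(b - c)^2}{n}}$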
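This sketch contrasts two 95% Wald intervals on the split-face sunscreen data (1/56 vs 8/56 sunburned; 0 and 7 discordant pairs). The first treats the two sides of the face as independent samples. The second is the paired (dependent-proportions) interval. Function names are hypothetical.

```python
import math

Z = 1.96  # multiplier for a 95% interval

def ci_independent(x1, n1, x2, n2):
    """Wald 95% CI for p1 - p2, assuming two independent samples."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - Z * se, diff + Z * se

def ci_paired(b, c, n):
    """Wald 95% CI for p1 - p2 from n matched pairs.

    b = pairs positive only on measure 1, c = pairs positive only on measure 2;
    SE = sqrt((b + c) - (b - c)**2 / n) / n.
    """
    diff = (b - c) / n
    se = math.sqrt((b + c) - (b - c) ** 2 / n) / n
    return diff - Z * se, diff + Z * se

# p1 = P(burn | SPF 85 side) = 1/56, p2 = P(burn | SPF 50 side) = 8/56.
wrong = ci_independent(1, 56, 8, 56)   # ignores the pairing
right = ci_paired(b=0, c=7, n=56)      # uses the 56 pairs
print("independent:", [round(x, 3) for x in wrong])
print("paired:     ", [round(x, 3) for x in right])
```

Both intervals are centered on -7/56 = -0.125. The paired one is narrower because the positive within-person correlation is taken into account.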
22. The connection between McNemar and Cochran-Mantel-Haenszel Tests
23. View each pair as its own age-gender stratum
Example: Concordant for exposure (cell a from before)
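24. Number of pairs (strata) of each type
- Concordant, both exposed (cell a): × 9
- Discordant, case exposed (cell b): × 37
- Discordant, control exposed (cell c): × 16
- Concordant, both unexposed (cell d): × 82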
25. Mantel-Haenszel for pair-matched data
We want to know the relationship between diabetes and MI, controlling for age and gender (the matching variables). Mantel-Haenszel methods apply.
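26. RECALL: The Mantel-Haenszel Summary Odds Ratio
$OR_{MH} = \frac{\sum_i a_i d_i / T_i}{\sum_i b_i c_i / T_i}$, where $a_i, b_i, c_i, d_i$ are the cells and $T_i$ is the total of stratum i.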
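27. Each pair-stratum has T = 2:
- Concordant, both exposed (× 9): ad/T = 0, bc/T = 0
- Discordant, case exposed (× 37): ad/T = 1/2, bc/T = 0
- Discordant, control exposed (× 16): ad/T = 0, bc/T = 1/2
- Concordant, both unexposed (× 82): ad/T = 0, bc/T = 0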
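28. Mantel-Haenszel Summary OR
$OR_{MH} = \frac{37(1/2)}{16(1/2)} = \frac{37}{16} = 2.31$, the same as the matched-pair OR from the discordant pairs.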
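A minimal sketch treating each diabetes/MI pair as its own 2x2 stratum and applying the Mantel-Haenszel summary odds ratio. Within a stratum, a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls, and T = 2. The multiplicities 9, 37, 16, and 82 are the pair counts for this example. Names are illustrative.

```python
from fractions import Fraction

# Each matched pair is a stratum: (a, b, c, d) = (exposed cases, exposed controls,
# unexposed cases, unexposed controls), with T = a + b + c + d = 2.
PAIR_TYPES = {
    "concordant, both exposed":    ((1, 1, 0, 0), 9),
    "discordant, case exposed":    ((1, 0, 0, 1), 37),
    "discordant, control exposed": ((0, 1, 1, 0), 16),
    "concordant, both unexposed":  ((0, 0, 1, 1), 82),
}

def mantel_haenszel_or(strata):
    """OR_MH = sum(a*d/T) / sum(b*c/T) over (table, multiplicity) items."""
    num = den = Fraction(0)
    for (a, b, c, d), count in strata:
        t = a + b + c + d
        num += count * Fraction(a * d, t)
        den += count * Fraction(b * c, t)
    return num / den

or_mh = mantel_haenszel_or(PAIR_TYPES.values())
print(or_mh, float(or_mh))   # 37/16 2.3125
```

Only the discordant strata have nonzero ad/T or bc/T, so the concordant pairs drop out of both sums.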
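29. Mantel-Haenszel Test Statistic (same as McNemar's)
$\chi^2_{MH} = \frac{\left[\sum_i \left(a_i - E(a_i)\right)\right]^2}{\sum_i \mathrm{Var}(a_i)}$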
30. Concordant cells contribute nothing to the Mantel-Haenszel statistic (observed = expected)
31. Discordant cells
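The sketch below computes the Cochran-Mantel-Haenszel statistic from the pair-strata of the diabetes/MI example (9, 37, 16, 82 pairs of each type). It uses the usual hypergeometric mean and variance of a in each stratum and no continuity correction. It then checks the result against McNemar's (b - c)^2/(b + c). The function name is hypothetical.

```python
from fractions import Fraction

def cmh_chi2(strata):
    """Cochran-Mantel-Haenszel chi-square (no continuity correction).

    strata: iterable of ((a, b, c, d), multiplicity), where a = exposed cases,
    b = exposed controls, c = unexposed cases, d = unexposed controls.
    """
    obs_minus_exp = Fraction(0)
    variance = Fraction(0)
    for (a, b, c, d), count in strata:
        t = a + b + c + d
        expected_a = Fraction((a + b) * (a + c), t)
        var_a = Fraction((a + b) * (c + d) * (a + c) * (b + d), t * t * (t - 1))
        obs_minus_exp += count * (a - expected_a)
        variance += count * var_a
    return obs_minus_exp ** 2 / variance

pairs = [((1, 1, 0, 0), 9), ((1, 0, 0, 1), 37), ((0, 1, 1, 0), 16), ((0, 0, 1, 1), 82)]
b, c = 37, 16
print(cmh_chi2(pairs), Fraction((b - c) ** 2, b + c))   # 441/53 441/53
```

Each discordant pair has expected a = 1/2 and variance 1/4. Each concordant pair has observed = expected and variance 0.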
34. Example: Salmonella outbreak in France, 1993 (reported 1996)
From "Large outbreak of Salmonella enterica serotype paratyphi B infection caused by a goats' milk cheese, France, 1993: a case finding and epidemiological study." BMJ 312: 91-94, Jan 1996.
36. Epidemic Curve
37. Matched Case-Control Study
- Case: Salmonella gastroenteritis.
- Community controls (1:1) matched for:
  - age group (<1, 1-4, 5-14, 15-34, 35-44, 45-54, 55-64, or >65 years)
  - gender
  - city of residence
38. Results
39. In 2x2 table form: any goats' cheese
40. In 2x2 table form: Brand A goats' cheese
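41. Number of pairs (strata) of each type
- Concordant, both exposed: × 8
- Discordant, case exposed: × 24
- Discordant, control exposed: × 2
- Concordant, both unexposed: × 25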
42. Summary: 8 concordant-exposed pairs (strata) contribute nothing to the numerator (observed - expected = 0) and nothing to the denominator (variance = 0).
Summary: 25 concordant-unexposed pairs contribute nothing to the numerator (observed - expected = 0) and nothing to the denominator (variance = 0).
43. Summary: 2 discordant control-exposed pairs contribute -.5 each to the numerator (observed - expected = -.5) and .25 each to the denominator (variance = .25).
Summary: 24 discordant case-exposed pairs contribute .5 each to the numerator (observed - expected = .5) and .25 each to the denominator (variance = .25).
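Adding up the per-pair contributions listed in the summaries gives the Mantel-Haenszel statistic for the outbreak data, and it matches McNemar's test on the 24 vs 2 discordant pairs. A minimal sketch with exact fractions. The summary OR of 24/2 = 12 applies the same discordant-pair reasoning as the diabetes example and is not a figure quoted in the transcript.

```python
from fractions import Fraction

# (count, observed - expected, variance) for each type of pair-stratum.
contributions = [
    (8,  Fraction(0),     Fraction(0)),      # concordant, both exposed
    (25, Fraction(0),     Fraction(0)),      # concordant, both unexposed
    (2,  Fraction(-1, 2), Fraction(1, 4)),   # discordant, control exposed
    (24, Fraction(1, 2),  Fraction(1, 4)),   # discordant, case exposed
]

numerator = sum(n * o_minus_e for n, o_minus_e, _ in contributions)   # 24(.5) - 2(.5) = 11
denominator = sum(n * var for n, _, var in contributions)             # 26(.25) = 6.5
chi2_mh = numerator ** 2 / denominator
chi2_mcnemar = Fraction((24 - 2) ** 2, 24 + 2)
or_mh = Fraction(24, 2)   # case-exposed discordant / control-exposed discordant

print(float(numerator), float(denominator))   # 11.0 6.5
print(chi2_mh == chi2_mcnemar, round(float(chi2_mh), 1), or_mh)   # True 18.6 12
```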
45. Diagnostic Testing and Screening Tests
46. Characteristics of a diagnostic test
- Sensitivity: the probability that, if you truly have the disease, the diagnostic test will catch it.
- Specificity: the probability that, if you truly do not have the disease, the test will register negative.
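47. Calculating sensitivity and specificity from a 2x2 table
                 Test +    Test -    Total
Disease +        a         b         a + b
Disease -        c         d         c + d
Sensitivity = a/(a + b): among those with true disease, how many test positive?
Specificity = d/(c + d): among those without the disease, how many test negative?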
48. Hypothetical Example
                 Test +    Test -    Total
Disease +        9         1         10
Disease -        109       881       990
Sensitivity = 9/10 = .90 (1 false negative out of 10 cases)
Specificity = 881/990 = .89 (109 false positives out of 990)
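A minimal sketch of the two calculations from the counts of the hypothetical example. The argument names (tp, fn, fp, tn for true/false positives/negatives) and the function name are illustrative.

```python
from fractions import Fraction

def sensitivity_specificity(tp, fn, fp, tn):
    """tp/fn: diseased people who test +/-; fp/tn: disease-free people who test +/-."""
    sensitivity = Fraction(tp, tp + fn)   # P(test + | disease)
    specificity = Fraction(tn, tn + fp)   # P(test - | no disease)
    return sensitivity, specificity

sens, spec = sensitivity_specificity(tp=9, fn=1, fp=109, tn=881)
print(round(float(sens), 2), round(float(spec), 2))   # 0.9 0.89
```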
49. What factors determine the effectiveness of screening?
- The prevalence (risk) of disease.
- The effectiveness of screening in preventing illness or death:
  - Is the test any good at detecting disease/precursor (sensitivity of the test)?
  - Is the test detecting a clinically relevant condition?
  - Is there anything we can do if disease (or pre-disease) is detected (cures, treatments)?
  - Does detecting and treating disease at an earlier stage really result in a better outcome?
- The risks of screening, such as false positives and radiation.
50. Positive predictive value
- The probability that, if you test positive for the disease, you actually have the disease.
- Depends on the characteristics of the test (sensitivity, specificity) and the prevalence of disease.
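The dependence of PPV on sensitivity, specificity, and prevalence is Bayes' theorem. The sketch below uses the hypothetical test's rounded sensitivity (.90) and specificity (.89) as assumed inputs. Because the inputs are rounded, the printed PPVs only approximately match values computed from exact counts. The function name is illustrative.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """PPV and NPV from Bayes' theorem (all arguments are probabilities)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)   # P(disease | test +)
    npv = true_neg / (true_neg + false_neg)   # P(no disease | test -)
    return ppv, npv

for prev in (0.01, 0.02, 0.10):
    ppv, npv = predictive_values(0.90, 0.89, prev)
    print(f"prevalence {prev:.0%}: PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

Raising the prevalence, or raising the specificity (fewer false positives), increases the PPV.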
51. Example: Mammography
- Mammography uses ionizing radiation to image breast tissue.
- The examination is performed by compressing the breast firmly between a plastic plate and an x-ray cassette that contains special x-ray film.
- Mammography can identify breast cancers too small to detect on physical examination.
- Early detection and treatment of breast cancer (before metastasis) can improve a woman's chances of survival.
- Studies show that, among 50-69-year-old women, screening results in 20-35% reductions in mortality from breast cancer.
52. Mammography
- Controversy exists over the efficacy of mammography in reducing mortality from breast cancer in 40-49-year-old women.
- Mammography has a high rate of false-positive tests that cause anxiety and necessitate further costly diagnostic procedures.
- Mammography exposes a woman to some radiation, which may slightly increase the risk of mutations in breast tissue.
53. Example
- A 60-year-old woman has an abnormal mammogram. What is the chance that she has breast cancer? I.e., what is the positive predictive value?
54. Calculating PPV and NPV from a 2x2 table
                 Test +    Test -
Disease +        a         b
Disease -        c         d
Total            a + c     b + d
PPV = a/(a + c): among those who test positive, how many truly have the disease?
NPV = d/(b + d): among those who test negative, how many truly do not have the disease?
55. Hypothetical Example
                 Test +    Test -
Disease +        9         1
Disease -        109       881
Total            118       882
PPV = 9/118 = 7.6%
NPV = 881/882 = 99.9%
Prevalence of disease = 10/1000 = 1%
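A minimal sketch computing PPV, NPV, and prevalence directly from the counts of the hypothetical example (1% prevalence). The function and argument names are illustrative.

```python
from fractions import Fraction

def ppv_npv_from_counts(tp, fn, fp, tn):
    """PPV, NPV, and prevalence from a 2x2 table of counts."""
    n = tp + fn + fp + tn
    ppv = Fraction(tp, tp + fp)          # among test positives, proportion truly diseased
    npv = Fraction(tn, tn + fn)          # among test negatives, proportion truly disease-free
    prevalence = Fraction(tp + fn, n)
    return ppv, npv, prevalence

ppv, npv, prev = ppv_npv_from_counts(tp=9, fn=1, fp=109, tn=881)
print(f"PPV = {float(ppv):.1%}, NPV = {float(npv):.1%}, prevalence = {float(prev):.0%}")
# PPV = 7.6%, NPV = 99.9%, prevalence = 1%
```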
56. What if disease were twice as prevalent in the population?
                 Test +    Test -    Total
Disease +        18        2         20
Disease -        108       872       980
Sensitivity = 18/20 = .90
Specificity = 872/980 = .89
Sensitivity and specificity are characteristics of the test, so they don't change!
57. What if disease were more prevalent?
                 Test +    Test -
Disease +        18        2
Disease -        108       872
Total            126       874
PPV = 18/126 = 14.3%
NPV = 872/874 = 99.8%
Prevalence of disease = 20/1000 = 2%
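This sketch puts the two hypothetical tables side by side. Sensitivity and specificity stay essentially fixed, while PPV roughly doubles when prevalence goes from 1% to 2%. The counts are those of the two examples. The variable names are illustrative.

```python
from fractions import Fraction

# Hypothetical test applied to 1,000 people at two prevalences: (tp, fn, fp, tn).
scenarios = {"1%": (9, 1, 109, 881), "2%": (18, 2, 108, 872)}
results = {}

for label, (tp, fn, fp, tn) in scenarios.items():
    sens = Fraction(tp, tp + fn)   # unchanged by prevalence
    spec = Fraction(tn, tn + fp)   # unchanged (up to rounding) by prevalence
    ppv = Fraction(tp, tp + fp)    # rises with prevalence
    npv = Fraction(tn, tn + fn)    # falls slightly with prevalence
    results[label] = (sens, spec, ppv, npv)
    print(f"prevalence {label}: sensitivity {float(sens):.2f}, specificity {float(spec):.2f}, "
          f"PPV {float(ppv):.1%}, NPV {float(npv):.1%}")
```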
58. Conclusions
- Positive predictive value increases with increasing prevalence of disease, or if you change the diagnostic test to improve its accuracy.