Title: Introduction to Biostatistics (BIO/EPI 540) Contingency Tables
1. Introduction to Biostatistics (BIO/EPI 540)
Contingency Tables
Acknowledgement: Thanks to Professor Pagano (Harvard School of Public Health) for lecture material.
2. Contingency Tables
- Nominal data that are grouped into categories are
often presented in the form of contingency tables
- Rows denote levels of one variable (e.g. disease)
- Columns denote the levels of the other variable
(e.g. exposure)
3. Example: Discrete Outcomes
Consider whether the rate of caesareans is
different for subjects receiving electronic
fetal monitoring (EFM), as compared to those
without EFM.
Sample: 5,824 deliveries; of these, 2,850 were EFM
exposed and 2,974 were not.
358 of the 2,850 had c-sections, as did 229 of the
2,974.
Binomial outcomes with very large n.
4. Do the c-section rates differ?
Example: Discrete Outcomes
Chi-square test
- Proceed as usual:
- 1. If there is no difference (null hypothesis),
what do we expect to see?
- 2. How does this compare to what we have
observed? (test statistic and its distribution)
5. Data: Contingency Table
                      EFM Exposure
Caesarean Delivery    Yes      No       Total
Yes                   358      229      587
No                    2,492    2,745    5,237
Total                 2,850    2,974    5,824
If the c-section rate is the same in both
populations, then ignore column classification
and go with totals.
6. 2x2 Table: Null Hypothesis
- Ho: The proportion of C-sections among patients
receiving EFM is identical to the proportion of
C-sections among patients who do not receive EFM
- Ha: The proportion of C-sections among patients
receiving EFM is different from the proportion of
C-sections among patients who do not receive EFM
7. Probability of c-section
From the totals we can estimate the overall probability of a c-section:
p = 587 / 5,824 ≈ 0.101
8. Expected counts under Ho
What do we expect to see if EFM has no effect?
EFM exposed (2,850 mothers): 2,850 x 587/5,824 ≈ 287 c-sections
No EFM (2,974 mothers): 2,974 x 587/5,824 ≈ 300 c-sections
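The expected counts under Ho can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not part of the original lecture; the variable names (`observed`, `p_hat`, `expected`) are chosen here for illustration, and the only inputs are the observed counts from the EFM table.

```python
# Observed 2x2 table from the slides:
# rows = c-section (yes, no), columns = EFM exposure (yes, no).
observed = [[358, 229],
            [2492, 2745]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pooled estimate of the c-section probability, ignoring EFM status.
p_hat = row_totals[0] / n

# Expected count for each cell = row total * column total / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

print(f"pooled c-section rate: {p_hat:.4f}")
for row in expected:
    print([round(e, 2) for e in row])
```

Rounded to whole numbers, the expected counts agree with the values 287, 300, 2563 and 2674 used in the observed-and-expected table.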
9. Observed and Expected Counts: Contingency Table
Expected counts, if independence of row and column
classification is true, are shown in brackets
          EFM Exposure?
C-sect    Yes              No               Total
Yes       358  [287]       229  [300]       587
No        2492 [2563]      2745 [2674]      5237
Total     2850 [2850]      2974 [2974]      5824
10. Chi-Square Goodness of Fit
Chi Square Test
(Table page A-26)
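As an illustration of the test statistic, the sketch below assumes the usual Pearson form: the sum over all cells of (observed - expected)^2 / expected, with (rows - 1) x (columns - 1) degrees of freedom. The helper name `pearson_chi_square` is hypothetical and not taken from the lecture; no continuity correction is applied here.

```python
def pearson_chi_square(table):
    """Return (statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count under independence of rows and columns.
            exp = row_totals[i] * col_totals[j] / n
            stat += (obs - exp) ** 2 / exp

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


# EFM example: rows = c-section (yes, no), columns = EFM (yes, no).
efm_table = [[358, 229], [2492, 2745]]
stat, df = pearson_chi_square(efm_table)
print(f"X^2 = {stat:.2f} on {df} degree(s) of freedom")
```

The statistic is then compared with the chi-square table for the given degrees of freedom.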
11. Continuity correction factor
In 2x2 tables (only) we apply a continuity
correction factor
12. Example
For the EFM and c-section example above
Note: This is a two-sided test
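The corrected test for the EFM example can be sketched as follows. This assumes the common (Yates) form of the continuity correction, in which 0.5 is subtracted from each |observed - expected| before squaring; the slides do not spell out the formula, so treat that as an assumption. The p-value uses the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal.

```python
import math

# EFM example: rows = c-section (yes, no), columns = EFM (yes, no).
observed = [[358, 229], [2492, 2745]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

corrected = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / n
        # Assumed Yates correction: shrink each deviation by 0.5.
        corrected += (abs(obs - exp) - 0.5) ** 2 / exp

# For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(corrected / 2))

print(f"corrected X^2 = {corrected:.2f}, p = {p_value:.2e}")
```

Because the expected counts are not rounded here, the statistic may differ slightly from a hand calculation that uses the rounded values 287, 300, 2563 and 2674.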
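13. Equivalent Tests
- The above example can be analyzed equivalently
using a two-sample test of proportions (Chapter
14.6)
- The two-sample test of proportions (Z test) and the
chi-square test are mathematically equivalent: without
the continuity correction, the chi-square statistic
equals the square of the Z statistic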
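The equivalence can be checked numerically on the EFM data. The sketch below assumes the pooled-variance form of the two-sample Z statistic and the chi-square statistic without continuity correction; variable names are illustrative only.

```python
import math

# C-sections and group sizes: EFM exposed vs. not exposed.
x1, n1 = 358, 2850
x2, n2 = 229, 2974

# Two-sample Z test of proportions with a pooled estimate under Ho.
p1, p2 = x1 / n1, x2 / n2
p_pooled = (x1 + x2) / (n1 + n2)
se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided

# Chi-square statistic (no continuity correction) on the same data.
observed = [[x1, x2], [n1 - x1, n2 - x2]]
row_totals = [sum(row) for row in observed]
col_totals = [n1, n2]
n = n1 + n2
chi_sq = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(2)
    for j in range(2)
)

print(f"z = {z:.3f}, z^2 = {z * z:.3f}, X^2 = {chi_sq:.3f}")
```

The two statistics agree up to floating-point error, so both tests give the same two-sided p-value.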
14. Assumptions: Chi-Square Test
- The chi-square test is an asymptotic test, i.e. it
works only when the sample size is large
- The chi-square test treats the row totals and column
totals of the data as fixed (i.e. not random)
15. Assumptions: Two-Sample Test of Proportions
- The Z test is also an asymptotic test. It assumes that
the Central Limit Theorem for sample means (i.e.
proportions) holds. Thus this test is appropriate
only when the sample size is large
- The Z test assumes that the proportions in each
group being compared are random variables
16. Extending to multiple categories: r x c Tables
e.g. Accuracy of Death Certificates
            Certificate Status
Hospit.     Conf. Accur.   Inacc. No Ch.   Incorr. Recode   Total
Comm.       157            18              54               229
Teach.      268            44              34               346
Total       425            62              88               575
17. e.g.
Expected counts under independence are shown in brackets
            Certificate Status
Hospital    Confirmed Accurate   Inaccurate No Change   Incorrect Recoded   Total
Comm.       157 [169.3]          18 [24.7]              54 [35.0]           229
Teach.      268 [255.7]          44 [37.3]              34 [53.0]           346
Total       425 [425]            62 [62]                88 [88]             575
Stata: tabi 157 18 54 \ 268 44 34
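The same calculation extends directly to the 2 x 3 death certificate table. The sketch below is an illustration in Python rather than the lecture's own software; it assumes the Pearson statistic with (r - 1) x (c - 1) degrees of freedom and no continuity correction, which the slides apply to 2x2 tables only.

```python
import math

# Rows = hospital (Comm., Teach.); columns = certificate status
# (confirmed accurate, inaccurate no change, incorrect recoded).
table = [[157, 18, 54],
         [268, 44, 34]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Expected counts under independence of hospital type and status.
expected = [[r * c / n for c in col_totals] for r in row_totals]

stat = sum(
    (table[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(len(row_totals))
    for j in range(len(col_totals))
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# For exactly 2 degrees of freedom, P(chi-square > x) = exp(-x / 2).
p_value = math.exp(-stat / 2)

for row in expected:
    print([round(e, 1) for e in row])
print(f"X^2 = {stat:.2f} on {df} df, p = {p_value:.2e}")
```

The closed-form p-value holds only for 2 degrees of freedom; a larger table would need a chi-square table or a statistics library.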
18. Summary
- Contingency Tables
- Analysis of 2x2 tables
- Analysis of r x c tables
- Equivalence between Chi square test and two
sample test of proportions