Title: Why are multiple tests a problem?
1 Why are multiple tests a problem?
2 Other names
- Multiple comparisons
- Data snooping
- Others?
3 References
- H. Scheffe (1953), A method for judging all
contrasts in the analysis of variance,
Biometrika 40:87-104
- D.B. Duncan (1965), A Bayesian approach to
multiple comparisons, Technometrics 7:171-222
- J.W. Tukey (1953), The problem of multiple
comparisons, reprinted in CWJWT Vol. VIII (1994)
- R.G. Miller, Simultaneous Statistical Inference,
2nd ed. (Springer 1981)
4 Thanks to Yoav Benjamini
- Benjamini and Hochberg (1995), Controlling the
false discovery rate: a practical and powerful
approach to multiple testing, J. R. Stat. Soc. Ser.
B
5 Example
E. Giovannucci, A. Ascherio, E. Rimm, M.
Stampfer, G. Colditz, W. Willett, Intake of
Carotenoids and Retinol in Relation to Risk of
Prostate Cancer, Journal of the National Cancer
Institute 87(23):1767-1776 (6 Dec 1995).
6 Using responses to a validated,
semiquantitative food frequency questionnaire
mailed to participants in the Health
Professionals Follow-up Study in 1986,
we assessed dietary intake for a 1-year period
for a cohort of 47,894 eligible subjects
initially free of diagnosed cancer....We
calculate the relative risk (RR) for each of the
upper categories of intake of a specific food
or nutrient by dividing the incidence of prostate
cancer among men in each of these categories by
the rate among men in the lowest intake level....
7 Of 46 vegetables and fruits or related
products, four were significantly associated with
lower prostate cancer risk; of the four,
tomato sauce (P for trend = 0.001), tomatoes (P
for trend = 0.03), and pizza (P for trend =
0.05), but not strawberries, were primary
sources of lycopene.
8 BUT the Methods section one page later
states: For each of 131 food and beverage
items listed ...
And the (presumably strongest)
carotenoids and p-values are listed in Table 2
(p. 1770):
Tomato sauce   Tomatoes   Tomato juice   Pizza
0.001          0.03       0.67           0.05
Our findings ... suggest that tomato-based foods may
be especially beneficial regarding prostate
cancer risk.
9 What is a p-value again?
- When nothing protects, we expect
131 x 0.05 ≈ 7
foods/nutrients to have p-values < 0.05
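The arithmetic above can be checked with a small simulation. This is a minimal sketch, assuming that all 131 null hypotheses are true, that the tests are independent, and that each p-value is uniform on [0, 1] under the null (true for continuous test statistics). The function name `count_false_positives` and the number of simulated studies are illustrative choices, not part of the original study.

```python
import random

def count_false_positives(n_tests, alpha, rng):
    """Count how many of n_tests true-null p-values fall below alpha.

    Assumption: under a true null a p-value is uniform on [0, 1],
    so each p-value is drawn with rng.random().
    """
    return sum(rng.random() < alpha for _ in range(n_tests))

n_tests, alpha = 131, 0.05

# Expected number of "significant" items when nothing protects.
expected = n_tests * alpha  # 6.55, i.e. about 7

# Chance of at least one p-value below alpha (assumes independent tests).
p_at_least_one = 1 - (1 - alpha) ** n_tests

# Repeat the whole 131-item study many times with no real effects.
rng = random.Random(0)
n_sims = 2000
counts = [count_false_positives(n_tests, alpha, rng) for _ in range(n_sims)]
simulated_mean = sum(counts) / n_sims

print(f"expected false positives: {expected:.2f}")
print(f"simulated mean:           {simulated_mean:.2f}")
print(f"P(at least one p < {alpha}): {p_at_least_one:.4f}")
```

The simulated mean should land close to 6.55, and under these assumptions a study with no real effects would almost always report at least one p-value below 0.05.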
10 Microarrays
- When no genes are changing between two groups, we
expect
20,000 x 0.01 = 200
genes to have p-value < 0.01
- However, false positives are not as bad as in
other fields
11 What can we do?
- p-values no longer mean what they used to: no
argument
- A histogram of p-values is a useful plot
- What can we do? Lots of argument
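To illustrate why a histogram of p-values is useful, here is a minimal sketch that bins simulated p-values and prints a text histogram. Everything about the data is an assumption made for illustration: 900 true nulls, 100 true alternatives, and a two-sided z-test whose statistic is shifted by 3 under the alternative. The helpers `two_sided_p` and `p_value_histogram` are hypothetical names.

```python
import math
import random

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2))

def p_value_histogram(p_values, n_bins=10):
    """Count p-values in n_bins equal-width bins covering [0, 1]."""
    counts = [0] * n_bins
    for p in p_values:
        # p == 1.0 goes into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        counts[idx] += 1
    return counts

rng = random.Random(1)
m0, m1, shift = 900, 100, 3.0  # hypothetical mix of nulls and alternatives

null_p = [two_sided_p(rng.gauss(0, 1)) for _ in range(m0)]
alt_p = [two_sided_p(rng.gauss(shift, 1)) for _ in range(m1)]
hist = p_value_histogram(null_p + alt_p)

for i, count in enumerate(hist):
    lo, hi = i / 10, (i + 1) / 10
    print(f"[{lo:.1f}, {hi:.1f}) {'#' * (count // 10)} {count}")
```

Null p-values spread roughly evenly across the bins, so a spike near zero sitting on a flat background suggests that some hypotheses are truly non-null; a completely flat histogram suggests that the small p-values are what chance alone would produce.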
12 Multiple Hypothesis Testing
               Called Significant   Not Called Significant   Total
Null True      V                    m0 - V                   m0
Altern. True   S                    m1 - S                   m1
Total          R                    m - R                    m
Null = Equivalent Expression
Alternative = Differential Expression
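The table can be filled in mechanically once the truth and the decisions are known, which only happens in simulations or toy examples. A minimal sketch, using made-up truth and decision vectors for six hypotheses and the hypothetical helper `outcome_table`:

```python
def outcome_table(is_null, called):
    """Tabulate the counts in the multiple-testing table.

    is_null[i] is True when hypothesis i is truly null;
    called[i] is True when hypothesis i was called significant.
    """
    m = len(is_null)
    m0 = sum(is_null)                                      # true nulls
    m1 = m - m0                                            # true alternatives
    V = sum(n and c for n, c in zip(is_null, called))      # false positives
    S = sum((not n) and c for n, c in zip(is_null, called))  # true positives
    R = V + S                                              # total called significant
    return {
        "V": V, "S": S, "R": R,
        "m0 - V": m0 - V, "m1 - S": m1 - S, "m - R": m - R,
        "m0": m0, "m1": m1, "m": m,
    }

# Made-up example: four true nulls, two true alternatives.
is_null = [True, True, True, True, False, False]
called = [True, False, False, False, True, False]

table = outcome_table(is_null, called)
print(table)
```

In a real study only R and m are observed; V, S, m0, and m1 are unknown, which is why the error rates are defined as expectations or probabilities.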
13 Error Rates
- Per comparison error rate (PCER): the expected
value of the number of Type I errors over the
number of hypotheses, PCER = E(V)/m
- Per family error rate (PFER): the expected number
of Type I errors, PFER = E(V)
- Family-wise error rate (FWER): the probability of at
least one Type I error, FWER = Pr(V ≥ 1)
- False discovery rate (FDR): rate that false
discoveries occur, FDR = E(V/R; R>0) = E(V/R |
R>0) Pr(R>0)
- Positive false discovery rate (pFDR): rate that
discoveries are false, pFDR = E(V/R | R>0)
- Many others
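The definitions above can be compared numerically with a Monte Carlo estimate. This is a sketch under stated assumptions, not a recommended procedure: 90 true nulls and 10 true alternatives, independent two-sided z-tests, a shift of 3 under the alternative, and every hypothesis with p < 0.05 called significant with no correction. All of these numbers and the function names are hypothetical.

```python
import math
import random

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2))

def simulate_once(m0, m1, shift, alpha, rng):
    """Return (V, R) for one simulated experiment."""
    V = sum(two_sided_p(rng.gauss(0, 1)) < alpha for _ in range(m0))
    S = sum(two_sided_p(rng.gauss(shift, 1)) < alpha for _ in range(m1))
    return V, V + S

def error_rates(m0, m1, shift, alpha, n_sims, seed=0):
    """Estimate PCER, PFER, FWER, FDR, and pFDR by simulation."""
    rng = random.Random(seed)
    m = m0 + m1
    sum_V = 0        # running total of Type I errors
    n_any_V = 0      # experiments with V >= 1
    sum_ratio = 0.0  # running total of V/R over experiments with R > 0
    n_R_pos = 0      # experiments with R > 0
    for _ in range(n_sims):
        V, R = simulate_once(m0, m1, shift, alpha, rng)
        sum_V += V
        n_any_V += V >= 1
        if R > 0:
            sum_ratio += V / R
            n_R_pos += 1
    pfer = sum_V / n_sims
    return {
        "PCER": pfer / m,                 # E(V)/m
        "PFER": pfer,                     # E(V)
        "FWER": n_any_V / n_sims,         # Pr(V >= 1)
        "FDR": sum_ratio / n_sims,        # V/R counts as 0 when R = 0
        "pFDR": sum_ratio / n_R_pos if n_R_pos else float("nan"),
    }

rates = error_rates(m0=90, m1=10, shift=3.0, alpha=0.05, n_sims=2000)
for name, value in rates.items():
    print(f"{name:5s} {value:.3f}")
```

The estimates respect the orderings that follow from the definitions: PCER ≤ FWER ≤ PFER, and FDR ≤ pFDR. With uncorrected testing at 0.05, PFER should come out near m0 x 0.05 = 4.5 and FWER close to 1.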
14 Conclusions
- Let's do a multiple comparison of the different
beers sold by the IF