Title: Why multiple tests are a problem?


1
Why multiple tests are a problem?
  • Rafael A. Irizarry

2
Other names
  • Multiple comparisons
  • Data snooping
  • Others?

3
References
  • H. Scheffé (1953), A method for judging all
    contrasts in the analysis of variance,
    Biometrika 40:87-104
  • D.B. Duncan (1965), A Bayesian approach to
    multiple comparisons, Technometrics 7:171-222
  • J.W. Tukey (1953), The problem of multiple
    comparisons, reprinted in CWJWT Vol. VIII (1994)
  • R.G. Miller, Simultaneous Statistical Inference,
    2nd ed. (Springer 1981)

4
Thanks to Yoav Benjamini
  • Benjamini and Hochberg (1995), Controlling the
    false discovery rate: a practical and powerful
    approach to multiple testing, J. R. Stat. Soc. Ser.
    B 57:289-300

5
Example
E. Giovannucci, A. Ascherio, E. Rimm, M.
Stampfer, G. Colditz, W. Willett, Intake of
Carotenoids and Retinol in Relation to Risk of
Prostate Cancer, Journal of the National Cancer
Institute 87(23):1767-1776 (6 Dec 1995).
6
Using responses to a validated,
semiquantitative food frequency questionnaire
mailed to participants in the Health
Professionals Follow-up Study in 1986,
we assessed dietary intake for a 1-year period
for a cohort of 47,894 eligible subjects
initially free of diagnosed cancer....We
calculate the relative risk (RR) for each of the
upper categories of intake of a specific food
or nutrient by dividing the incidence of prostate
cancer among men in each of these categories by
the rate among men in the lowest intake level....
7
Of 46 vegetables and fruits or related
products, four were significantly associated with
lower prostate cancer risk; of the four,
tomato sauce (P for trend = 0.001), tomatoes (P
for trend = 0.03), and pizza (P for trend =
0.05), but not strawberries, were primary
sources of lycopene.
8
BUT the Methods section one page later
states: "For each of 131 food and beverage
items listed ..."
And the (presumably strongest) carotenoids and
p-values are listed in Table 2 (p. 1770):
  Tomato sauce   0.001
  Tomatoes       0.03
  Tomato juice   0.67
  Pizza          0.05
"Our findings ... suggest that tomato-based foods
may be especially beneficial regarding prostate
cancer risk."
9
What is a p-value again?
  • When nothing protects, we expect
  • 131 x 0.05 ≈ 7
  • foods/nutrients to have p-values < 0.05
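
The arithmetic on this slide can be checked directly. The sketch below is an illustration added to the transcript, not part of the original slides. It assumes that all 131 null hypotheses are true and, for the second quantity only, that the tests are independent, which the food-questionnaire data almost certainly are not.

```python
# Expected number of false positives when every null hypothesis is true.
m = 131        # food and beverage items tested (from the Methods section)
alpha = 0.05   # significance threshold

expected_false_positives = m * alpha

# Assumption for illustration only: the 131 tests are independent.
prob_at_least_one = 1 - (1 - alpha) ** m

print(f"expected false positives: {expected_false_positives:.2f}")
print(f"P(at least one false positive): {prob_at_least_one:.4f}")
```

The expected count is 6.55, which the slide rounds to 7. Under the independence assumption, at least one spurious "significant" food is close to certain.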

10
Microarrays
  • When no genes are changing between two groups we
    expect
  • 20,000 x 0.01 = 200
  • genes to have p-values < 0.01
  • However, false positives are not as bad as in
    other fields
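
A small simulation can make the microarray count concrete. This is a sketch under a stated assumption: when a gene is truly unchanged, its p-value is taken to be uniform on [0, 1], so null p-values are drawn as uniform random numbers. The function name `count_null_hits` and the seed are illustrative choices, not from the slides.

```python
import random

def count_null_hits(n_genes=20_000, alpha=0.01, seed=0):
    """Count p-values below alpha when every gene is truly unchanged.

    Under a true null hypothesis a (continuous) p-value is uniform on
    [0, 1], so we simply draw uniform random numbers.
    """
    rng = random.Random(seed)
    return sum(rng.random() < alpha for _ in range(n_genes))

hits = count_null_hits()
print(hits)  # close to 20,000 x 0.01 = 200, varying with the seed
```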

11
What can we do?
  • p-values no longer mean what they used to: no
    argument
  • A histogram of p-values is a useful plot
  • What can we do: lots of argument
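
To illustrate why a histogram of p-values is useful, here is a minimal text-only sketch. The mixture is hypothetical: 900 true nulls with uniform p-values and 100 true alternatives whose p-values are generated as `u ** 6` so that they pile up near zero. Neither the proportions nor the generating rule come from the slides.

```python
import random

def pvalue_histogram(pvalues, n_bins=10):
    """Return counts of p-values in n_bins equal-width bins over [0, 1]."""
    counts = [0] * n_bins
    for p in pvalues:
        # min() keeps p == 1.0 in the last bin.
        counts[min(int(p * n_bins), n_bins - 1)] += 1
    return counts

rng = random.Random(1)
# Hypothetical mixture: 900 true nulls (uniform p-values) plus
# 100 true alternatives whose p-values concentrate near zero.
null_p = [rng.random() for _ in range(900)]
alt_p = [rng.random() ** 6 for _ in range(100)]
counts = pvalue_histogram(null_p + alt_p)

for i, c in enumerate(counts):
    print(f"{i / 10:.1f}-{(i + 1) / 10:.1f} {'#' * (c // 5)}")
```

If every null were true, the bars would be roughly flat. A spike in the leftmost bin above that flat level suggests that some hypotheses are true alternatives.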

12
Multiple Hypothesis Testing
                Called Significant   Not Called Significant   Total
Null True       V                    m0 - V                   m0
Altern. True    S                    m1 - S                   m1
Total           R                    m - R                    m
Null = Equivalent Expression; Alternative =
Differential Expression
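
The counts in this table can be computed from two lists: which nulls are actually true, and which hypotheses were called significant. The sketch below uses a made-up example of six hypotheses. The function name `tabulate_outcomes` and the data are illustrative only, and in practice the truth column is unknown, so V and S are not observable.

```python
def tabulate_outcomes(null_is_true, called_significant):
    """Return the counts V, S, R, m0, m1, m used in the table."""
    if len(null_is_true) != len(called_significant):
        raise ValueError("inputs must have the same length")
    m = len(null_is_true)
    m0 = sum(null_is_true)
    # V: true nulls that were (wrongly) called significant.
    V = sum(n and c for n, c in zip(null_is_true, called_significant))
    R = sum(called_significant)
    return {"V": V, "S": R - V, "R": R, "m0": m0, "m1": m - m0, "m": m}

# Made-up example with six hypotheses, four of them true nulls.
null_is_true       = [True, True, True, True, False, False]
called_significant = [True, False, False, False, True, False]
table = tabulate_outcomes(null_is_true, called_significant)
print(table)
```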
13
Error Rates
  • Per comparison error rate (PCER): the expected
    value of the number of Type I errors over the
    number of hypotheses, PCER = E(V)/m
  • Per family error rate (PFER): the expected number
    of Type I errors, PFER = E(V)
  • Family-wise error rate (FWER): the probability of
    at least one Type I error, FWER = Pr(V ≥ 1)
  • False discovery rate (FDR): rate that false
    discoveries occur, FDR = E(V/R; R > 0) = E(V/R |
    R > 0) Pr(R > 0)
  • Positive false discovery rate (pFDR): rate that
    discoveries are false, pFDR = E(V/R | R > 0)
  • Many others
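
The definitions on this slide can be compared numerically with a small Monte Carlo sketch. Everything about the setup is a hypothetical choice for illustration: 100 hypotheses of which 90 are true nulls, independent tests, alternative p-values generated as `u ** 8`, and each hypothesis rejected when p < 0.05 with no multiplicity correction.

```python
import random

def simulate_error_rates(m=100, m0=90, alpha=0.05, n_sims=2000, seed=0):
    """Monte Carlo estimates of PCER, PFER, FWER, FDR and pFDR.

    Hypothetical setup: m0 true nulls with uniform p-values and m - m0
    true alternatives with p-values drawn as u**8 (concentrated near 0).
    Each hypothesis is rejected when its p-value is below alpha.
    """
    rng = random.Random(seed)
    v_list, r_list = [], []
    for _ in range(n_sims):
        V = sum(rng.random() < alpha for _ in range(m0))
        S = sum(rng.random() ** 8 < alpha for _ in range(m - m0))
        v_list.append(V)
        r_list.append(V + S)
    # V/R for the simulations that produced at least one rejection.
    ratios = [v / r for v, r in zip(v_list, r_list) if r > 0]
    mean_v = sum(v_list) / n_sims
    return {
        "PCER": mean_v / m,
        "PFER": mean_v,
        "FWER": sum(v >= 1 for v in v_list) / n_sims,
        # V/R is counted as 0 whenever R = 0.
        "FDR": sum(ratios) / n_sims,
        # Average of V/R over simulations with at least one rejection.
        "pFDR": sum(ratios) / len(ratios) if ratios else float("nan"),
    }

rates = simulate_error_rates()
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```

In this setup PFER should land near 90 x 0.05 = 4.5, and FDR can never exceed pFDR, because FDR is pFDR multiplied by Pr(R > 0).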

14
Conclusions
  • Let's do a multiple comparison of the different
    beers sold by the IF