Reproducibility and Ranks of True Positives in Large Scale Genomics Experiments

1
Reproducibility and Ranks of True Positives in
Large Scale Genomics Experiments
  • Russ Wolfinger (1), Dmitri Zaykin (2), Lev
    Zhivotovsky (3),
  • Wendy Czika (1), Susan Shao (1)
  • (1) SAS Institute, Inc., (2) National Institute of
    Environmental Health Sciences, (3) Vavilov Institute
    of General Genetics
  • MCP Vienna
  • July 11, 2007

2
Criticism of Statistical Methods in Genomics
  • Two labs run the same microarray experiment, and
    resulting lists of significant genes barely
    overlap.
  • Significant SNPs from a genetic study are not
    validated in subsequent follow up studies.
  • Conclusions from the scientific community:
  • Statistical results are not reproducible.
  • Genomics technology is not reliable.

3
P vs FC Controversy
  • Occurred recently within the FDA-driven
    Microarray Quality Control Consortium (MAQC)
  • Biologists, chemists, and regulators are concerned
    with the lack of reproducibility of significant gene
    lists, and have observed that lists based on fold
    change (FC) are more consistent than those based
    on p-values (P)
  • Statisticians usually seek an optimal tradeoff
    between specificity (Type 1 error) and sensitivity
    (Type 2 error, power), often portrayed in a Receiver
    Operating Characteristic (ROC) plot

4
Outline
  • Reproducibility versus specificity and
    sensitivity
  • Rank distribution of a single true positive
  • P-value combination methods for multiple true
    positives
  • All results are based on simulation.

5
Questions
  • Should statisticians concern themselves with
    reproducibility, the hallmark of science? YES!
  • How to define reproducibility?
  • How does it relate to specificity and
    sensitivity?
  • Is it possible to dialectically reconcile
    conflicting perspectives, or at least provide an
    explanatory (and hence mollifying) framework?

6
Simulation Study 1: Based on MAQC Phase 1
Experiment
  • Initially designed and implemented by Wendell
    Jones, Expression Analysis Inc.
  • Two treatment groups, n = 5 in each
  • 15,000 genes, 1,000 truly changed with varying
    degrees of expression that mimic real data
  • Coefficient of variation (CV) on original data
    scale set to varying percentages
    (2%, 10%, 30%, 100%)

7
Simulation Study 1 (continued)
  • For the sake of simplicity, we focus only on
    gene-selection rules based on fold change (FC,
    same as effect size) or simple t-test p-values
  • Note that gene lists can be constructed in many
    other ways, e.g., shrunken t-statistics
  • Use Proportion of Overlapping Genes (POG) as a
    measure of reproducibility, based on a simple Venn
    diagram
  • Compute POG on simulated pairs of gene lists;
    list sizes range from 10 to 15,000
  • Require direction of FC to match
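
The POG computation described above can be sketched as follows. This is a minimal sketch, assuming POG is the number of genes shared by two equally sized lists divided by the list size; the function name pog and the (gene, direction) pair layout are illustrative, not taken from the slides.

```python
def pog(list_a, list_b):
    """Proportion of overlapping genes between two equally sized gene lists.

    Each list holds (gene_id, direction) pairs, with direction +1 or -1 for
    up- or down-regulation. A gene counts as overlapping only when it is in
    both lists with the same direction of fold change.
    """
    if len(list_a) != len(list_b) or not list_a:
        raise ValueError("lists must be non-empty and of equal size")
    shared = set(list_a) & set(list_b)
    return len(shared) / len(list_a)


# Hypothetical top-4 lists from two labs.
lab1 = [("g1", +1), ("g2", -1), ("g3", +1), ("g4", +1)]
lab2 = [("g1", +1), ("g2", +1), ("g4", +1), ("g9", -1)]
print(pog(lab1, lab2))  # g1 and g4 overlap; g2 disagrees in direction
```

Storing the direction in the pair makes the "direction of FC must match" requirement fall out of plain set intersection.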

8
Simulated POG vs. Gene List Size
[Figure: two panels, FC Ranking and P-Value Ranking]

9
Three Dimensions, CV = 2%
[Figure: two panels, FC Ranking and P-Value Ranking]

10
Discussion 1
  • Reproducibility is not monotonically related to
    specificity and sensitivity.
  • There appear to be tradeoffs in all three
    dimensions: specificity, sensitivity, and
    reproducibility.
  • The weight attached to each dimension depends on
    the objectives of the study.
  • Simple rules based on both FC and P-value cutoffs
    appear viable as a starting compromise.
  • Challenge you to

11
Enter the Third Dimension
Specificity - Sensitivity - Reproducibility
12
Volcano Plots Help Visualize Ranking Rules
[Figure: Dormant Volcano from Two-Sample T-Test (df = 4)
on 10,000 Genes]
13
Outline
  • Reproducibility versus specificity and
    sensitivity
  • Rank distribution of a single true positive
  • P-value combination methods for multiple true
    positives
  • All results are based on simulation.

14
Simulation Study 2A: Number of Best T-Test
Results Required to Cover a Single True Positive
  • Compare different ranking rules based on P, FC,
    or a functional combination
  • Two treatment groups, n = 100 in each
  • 38,500 t-tests (4 df), only 1 truly changed
  • Power for the one true positive set to 80%, 90%,
    95%, and 99% at alpha = 5%, and to 80% at the
    Sidák-adjusted alpha
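
The ranking rules compared above can be sketched as one scoring function. This is a minimal sketch, assuming the "functional combination" takes the form -log(p) times a power of the absolute effect size; the function name rank_of and the exponent parameter k are illustrative, not taken from the slides.

```python
import math


def rank_of(index, pvals, effects, k=None):
    """1-based rank of test `index` among all tests under one ranking rule.

    k=None ranks by |d| alone; otherwise the score is -log(p) * |d|**k,
    so k=0 is ranking by p-value. Larger scores rank first.
    """
    def score(p, d):
        if k is None:
            return abs(d)
        return -math.log(p) * abs(d) ** k

    scores = [score(p, d) for p, d in zip(pvals, effects)]
    target = scores[index]
    # Count how many tests score strictly better than the one of interest.
    return 1 + sum(s > target for s in scores)


# Hypothetical values: test 0 plays the role of the single true positive.
pvals = [0.001, 0.0005, 0.02]
effects = [2.0, 0.5, 3.0]
for k in (0, 0.5, 1, 2, None):
    print(k, rank_of(0, pvals, effects, k))
```

In a full simulation the rank of the true positive would be recorded over many replicates, and its 95th percentile would give the number of top results needed to cover it.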

15
Simulation Study 2A Results: Number of best t-test
(df = 4) results out of 38,500 required to cover a
single true positive with 95% probability

Power      Ranking by:
           p-value (p)  log(p) d^(1/2)  log(p) d  log(p) d^2  d
80% at 5%  7255         6727            6544      6410        6374
90% at 5%  2067         1868            1863      1937        2322
95% at 5%  467          422             455       531         856
99% at 5%  11           11              16        26          101
80% at a   1            1               1         2           12

p = p-value, d = effect size, a = 1-(1-0.05)^(1/38500)
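
The Sidák-adjusted level a in the table footnote can be evaluated directly. A minimal sketch; the function name sidak_alpha is illustrative, and expm1/log1p are used only to avoid rounding error when the per-test level is tiny.

```python
import math


def sidak_alpha(family_alpha, n_tests):
    """Per-test level a = 1 - (1 - family_alpha)**(1 / n_tests)."""
    return -math.expm1(math.log1p(-family_alpha) / n_tests)


a = sidak_alpha(0.05, 38500)
print(a)             # per-test level for 38,500 tests
print(0.05 / 38500)  # Bonferroni level, slightly smaller
```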
16
Simulation Study 2B: Number of Best Chi-Square
Test Results Required to Cover a Single True
Positive
  • Again compare different ranking rules based on
    p-value, effect size, or a functional combination
  • Two binomial proportions, n = 500 in each group
  • 200,000 chi-square 1-df tests, only 1 true
    association
  • Genetic allele frequency for true negatives
    simulated to be uniform on [0.05, 0.95]
  • Genetic allele frequency for true positive
    control group set to 0.1 or 0.5. Frequency for
    case group set higher to achieve power of 80%,
    90%, 95%, and 99% at alpha = 5%, or 80% at the
    Sidák-adjusted alpha
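
A single test of the kind described above can be sketched as follows. This is a minimal sketch, assuming a Pearson chi-square without continuity correction and taking the effect size to be the difference in sample proportions; the function name chisq_1df and the example counts are illustrative, not taken from the slides.

```python
import math


def chisq_1df(successes_a, n_a, successes_b, n_b):
    """Pearson chi-square (1 df) for two binomial proportions.

    Returns (statistic, p_value, effect), where effect is the difference
    in sample proportions (group b minus group a).
    """
    prop_a = successes_a / n_a
    prop_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    variance = pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b)
    if variance == 0.0:
        # All successes or all failures in both groups: no evidence either way.
        return 0.0, 1.0, prop_b - prop_a
    stat = (prop_b - prop_a) ** 2 / variance
    # Upper tail of chi-square with 1 df: P(Z**2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value, prop_b - prop_a


# Hypothetical counts: control frequency 0.10, case frequency 0.17, n = 500 each.
stat, p, d = chisq_1df(50, 500, 85, 500)
print(round(stat, 2), p, d)
```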

17
Simulation Study 2B Results: Number of best
chi-square (1 df) test results out of 200,000
required to cover a single true positive with 95%
probability; TP case frequency = 0.1

Power      Ranking by:
           p-value (p)  log(p) d^(1/2)  log(p) d  log(p) d^2  d
80% at 5%  38776        43559           46292     49332       58689
90% at 5%  12159        15075           16895     19675       27466
95% at 5%  2753         3764            4667      5900        10102
99% at 5%  55           101             157       261         869
80% at a   1            1               1         2           7

p = p-value, d = effect size, a = 1-(1-0.05)^(1/200,000)
18
Simulation Study 2B Results: Number of best
chi-square (1 df) test results out of 200,000
required to cover a single true positive with 95%
probability; TP case frequency = 0.5

Power      Ranking by:
           p-value (p)  log(p) d^(1/2)  log(p) d  log(p) d^2  d
80% at 5%  39940        35887           33784     31678       28451
90% at 5%  11107        9293            8451      7682        6685
95% at 5%  2962         2338            2078      1856        1582
99% at 5%  51           36              31        27          23
80% at a   1            1               1         1           1

p = p-value, d = effect size, a = 1-(1-0.05)^(1/200,000)
19
Discussion 2
  • Incorporating effect size into ranking rules can
    improve ranking performance, particularly when
    variance of true positives is comparatively
    larger than variance of true negatives
  • Possible Empirical Bayes effect

20
Outline
  • Reproducibility versus specificity and
    sensitivity
  • Rank distribution of a single true positive
  • P-value combination methods for multiple true
    positives
  • All results are based on simulation.

21
Simulation Study 3: Compare Power of P-Value
Combination Methods with Multiple True Positives
  • 5,000 Chi-Square (1 df) tests
  • Number of true associations ranges from 10 to 200
    with various powers
  • Compare Sidák, Simes, Fisher Combination, and
    three more modern methods:
  • Gamma Method (GM)
  • Truncated Product Method (TPM)
  • Rank Truncated Product (RTP)
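
The three classical methods named above can be sketched in a few lines. This is a minimal sketch, assuming independent p-values strictly between 0 and 1; the function names are illustrative. Fisher's test uses the closed-form chi-square tail for even degrees of freedom, evaluated in log space so that thousands of p-values do not overflow.

```python
import math


def sidak(pvals):
    """Global p-value from the smallest p-value: 1 - (1 - p_min)**L."""
    return 1.0 - (1.0 - min(pvals)) ** len(pvals)


def simes(pvals):
    """Simes global p-value: min over i of L * p_(i) / i on sorted p-values."""
    n = len(pvals)
    return min(n * p / (i + 1) for i, p in enumerate(sorted(pvals)))


def fisher(pvals):
    """Fisher combination: upper tail of chi-square (2L df) at -2 * sum(ln p)."""
    n = len(pvals)
    half = -sum(math.log(p) for p in pvals)  # half of the chi-square statistic
    if half == 0.0:
        return 1.0
    # Tail for even df: exp(-half) * sum_{k=0}^{L-1} half**k / k!
    log_terms = [k * math.log(half) - math.lgamma(k + 1) - half for k in range(n)]
    peak = max(log_terms)
    return min(1.0, math.exp(peak) * sum(math.exp(t - peak) for t in log_terms))


print(sidak([0.01, 0.5]), simes([0.01, 0.5]), fisher([0.1, 0.1]))
```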

22
Gamma Method (GM)
  • Generalization of Fisher and Stouffer
  • Sum inverse Gamma-transformed 1 - p_i
  • Tune using Soft Truncation Threshold;
    accommodates effect heterogeneity
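
One way to write the statistic described above, as a sketch assuming L independent p-values and a Gamma distribution with shape a and unit scale. The symbols T_GM, G_a, a, and L are notation introduced here, not taken from the slides, and the link between a and the Soft Truncation Threshold is not reconstructed.

```latex
T_{\mathrm{GM}} = \sum_{i=1}^{L} G_a^{-1}(1 - p_i),
\qquad
T_{\mathrm{GM}} \sim \mathrm{Gamma}(La,\, 1) \ \text{under the global null}
```

Here G_a^{-1} is the inverse CDF of Gamma(a, 1). With a = 1, G_1^{-1}(1 - p) = -ln p, so T_GM is half of Fisher's statistic -2 Σ ln p_i and yields the same test.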

23
Truncated Product Method (TPM)
  • Combine only the subset of p-values less than
    some threshold
  • Assess significance by evaluating product
    distribution via Monte Carlo on uniforms.
  • Upon rejecting the null, can claim true positives
    are in the subset
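
The Monte Carlo assessment described above can be sketched as follows. This is a minimal sketch, assuming independent p-values; the function name tpm_pvalue, the default threshold tau, the simulation count, and the seed are illustrative, not taken from the slides.

```python
import math
import random


def tpm_pvalue(pvals, tau=0.05, n_sim=2000, seed=0):
    """Monte Carlo p-value for the product of all p-values at or below tau."""
    rng = random.Random(seed)
    n = len(pvals)

    def log_product(ps):
        # Log of the truncated product; an empty subset gives product 1.
        return sum(math.log(p) for p in ps if p <= tau)

    observed = log_product(pvals)
    hits = 0
    for _ in range(n_sim):
        # Uniform draws on (0, 1] stand in for p-values under the global null.
        null = [1.0 - rng.random() for _ in range(n)]
        if log_product(null) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_sim + 1)


print(tpm_pvalue([0.001] * 5 + [0.5] * 15))
print(tpm_pvalue([0.5] * 20))
```

Working with log products avoids underflow when many small p-values are multiplied.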

24
Rank Truncated Product (RTP)
  • Combine the K smallest p-values
  • Assess significance by evaluating product
    distribution with Monte Carlo
  • K = 1 same as Sidák, K = max same as Fisher
  • On rejecting the null, cannot claim true
    positives are in the subset
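
The same Monte Carlo idea applies to the K smallest p-values. This is a minimal sketch, assuming independent p-values; the function name rtp_pvalue, the simulation count, and the seed are illustrative, not taken from the slides.

```python
import math
import random


def rtp_pvalue(pvals, k, n_sim=2000, seed=0):
    """Monte Carlo p-value for the product of the k smallest p-values."""
    n = len(pvals)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of p-values")
    rng = random.Random(seed)

    def log_product(ps):
        return sum(math.log(p) for p in sorted(ps)[:k])

    observed = log_product(pvals)
    hits = 0
    for _ in range(n_sim):
        # Uniform draws on (0, 1] stand in for p-values under the global null.
        null = [1.0 - rng.random() for _ in range(n)]
        if log_product(null) <= observed:
            hits += 1
    return (hits + 1) / (n_sim + 1)


print(rtp_pvalue([0.0001, 0.0002] + [0.5] * 18, k=2))
print(rtp_pvalue([0.9] * 20, k=2))
```

Setting k=1 uses only the smallest p-value and k=len(pvals) uses all of them, matching the Sidák and Fisher limits noted on the slide.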

25
Simulation Study 3 Results: Power of different
p-value combination methods from 5,000
chi-square (1 df) tests

TA   TA Power  Sidák  Simes  Fisher  GM 0.05  GM 0.1  TPM 0.05  TPM 0.01  TPM 0.005  TPM 0.001  RTP 10  RTP 50  RTP 100  RTP 200
10   0.90      0.899  0.756  0.225   0.791    0.650   0.279     0.455     0.550      0.752      0.879   0.814   0.739    0.625
50   0.50      0.498  0.351  0.525   0.799    0.789   0.595     0.650     0.656      0.601      0.636   0.751   0.769    0.764
50   0.60      0.592  0.553  0.693   0.961    0.950   0.788     0.876     0.888      0.864      0.875   0.947   0.951    0.942
100  0.30      0.297  0.181  0.598   0.644    0.697   0.595     0.543     0.495      0.378      0.377   0.544   0.607    0.649
100  0.40      0.401  0.339  0.831   0.926    0.944   0.861     0.853     0.825      0.715      0.703   0.874   0.907    0.926
200  0.20      0.202  0.143  0.756   0.653    0.746   0.696     0.563     0.490      0.332      0.314   0.511   0.605    0.682
200  0.25      0.255  0.216  0.920   0.883    0.936   0.895     0.814     0.742      0.545      0.509   0.765   0.847    0.904
200  0.30      0.297  0.300  0.981   0.978    0.992   0.980     0.949     0.915      0.764      0.715   0.932   0.967    0.984

TA = number of true associations
26
Discussion 3
  • Gamma Method competitive as a global test
  • Truncated Product Method enables more specific
    inference.
