Controlling the FDR in the Analysis of Genetic Expression

1
Controlling the FDR in the Analysis of Genetic
Expression
Anat Reiner, Tel-Aviv University
2
Outline
  • Correlated test statistics
  • Complex Analysis

3
The False Discovery Rate (FDR) criterion
Benjamini and Hochberg (95)
  • R = number of rejected hypotheses (discoveries)
  • V = number of false discoveries
  • The error (type I) in the entire study is
    measured by

$Q = V/R$ if $R > 0$, and $Q = 0$ if $R = 0$,

i.e. the proportion of false discoveries among
the discoveries (0 if none found). The FDR is its
expectation, $\mathrm{FDR} = E(Q)$.
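As a small illustration of the quantity inside the expectation, the sketch below computes the proportion of false discoveries for one hypothetical set of test outcomes. The function name and the boolean inputs are assumptions made for illustration only; in a real study the true-null indicator is unknown, which is why the FDR is controlled in expectation.

```python
def false_discovery_proportion(rejected, is_true_null):
    """Return V / R for one study, or 0.0 when nothing is rejected."""
    r = sum(1 for rej in rejected if rej)
    v = sum(1 for rej, null in zip(rejected, is_true_null) if rej and null)
    return v / r if r > 0 else 0.0

# Hypothetical outcome: 4 rejections, 1 of which is a true null.
rejected = [True, True, True, True, False, False]
is_true_null = [False, False, False, True, True, True]
q_observed = false_discovery_proportion(rejected, is_true_null)
print(q_observed)  # 0.25
```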
4
FDR controlling procedures
  • Linear Step-Up Procedure
    (Benjamini and Hochberg, 95)
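The linear step-up procedure sorts the m p-values, finds the largest rank k whose p-value is at most kq/m, and rejects the k hypotheses with the smallest p-values. A minimal sketch, with the function name and the example p-values chosen here for illustration:

```python
def bh_step_up(pvalues, q=0.05):
    """Return a list of booleans: True where the step-up procedure rejects."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0  # largest rank whose p-value passes its threshold
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            k = rank
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected

# Hypothetical p-values; 0.025 misses its own threshold (0.02) but is
# still rejected because 0.028 passes the rank-3 threshold (0.03).
pvals = [0.9, 0.028, 0.005, 0.2, 0.025]
decisions = bh_step_up(pvals, q=0.05)
print(decisions)  # [False, True, True, False, True]
```

The step-up character shows in the example: a p-value can be rejected even if it fails its own threshold, as long as a larger p-value passes a later one.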

5
FDR Adjusted P-Values
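  • For an individual hypothesis $H_{(i)}$, the adjusted p-value is
    $p^{BH}_{(i)} = \min_{j \ge i} \{ m\,p_{(j)} / j \}$,
    where $p_{(1)} \le \dots \le p_{(m)}$ are the ordered p-values.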
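The adjusted p-value of a hypothesis can be read as the smallest FDR level at which the linear step-up procedure would reject it. The sketch below computes it under that standard definition with a running minimum from the largest p-value downward; the function name and input values are illustrative assumptions.

```python
def bh_adjusted_pvalues(pvalues):
    """Return BH-adjusted p-values in the original order of the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, keeping the minimum
    # of m * p_(j) / j over all ranks j at or above the current rank.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

adj = bh_adjusted_pvalues([0.005, 0.025, 0.028, 0.2, 0.9])
print(adj)
```

Rejecting every hypothesis whose adjusted p-value is at most q gives the same decisions as running the step-up procedure at level q.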

6
Data inter-dependencies
  • Between genes
  • Between measurement errors of expression levels

- co-regulation - spatial effects
  • RNA source
  • normalization process
  • pooled variability estimation

Multiple testing of such data will produce
correlated test statistics!
7
Resampling Idea
  • Create B data sets by permuting the order of subjects
    (mixing treatment and control)
  • Underlying assumption: the joint distribution of
    p-values corresponding to the true null
    hypotheses, which is generated through the
    p-value resampling scheme, represents the real
    joint distribution under the null hypothesis.
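The resampling idea can be sketched as follows. The difference in group means is used as the test statistic purely as an assumption for illustration, and all names are hypothetical. The point to notice is that one permutation of the subject labels is applied to every gene at once, so the dependence between genes is carried into the resampled statistics.

```python
import random
from statistics import mean

def mean_difference(values, labels):
    """Difference in means between treatment (1) and control (0)."""
    treated = [v for v, g in zip(values, labels) if g == 1]
    control = [v for v, g in zip(values, labels) if g == 0]
    return mean(treated) - mean(control)

def permutation_statistics(data, labels, n_resamples, seed=0):
    """data: one list of per-subject values for each gene.
    Returns n_resamples lists, each with one statistic per gene."""
    rng = random.Random(seed)
    resampled = []
    for _ in range(n_resamples):
        permuted = labels[:]
        rng.shuffle(permuted)  # same permutation for all genes
        resampled.append([mean_difference(gene, permuted) for gene in data])
    return resampled

# Hypothetical data: 3 genes measured on 6 subjects (3 treated, 3 control).
data = [[5.1, 4.9, 5.3, 3.0, 3.2, 2.9],
        [1.0, 1.2, 0.9, 1.1, 1.0, 0.8],
        [2.2, 2.0, 2.5, 2.1, 2.4, 2.3]]
labels = [1, 1, 1, 0, 0, 0]
stats = permutation_statistics(data, labels, n_resamples=20)
```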

8
  • For each value of p, the number of
    resampling-based p-values less than p, denoted by
    V(p), is an estimate of the expected number of
    p-values corresponding to true null hypotheses
    less than p.
  • Since the FDR is also a function of the number of
    false null hypotheses being rejected, estimate
    conservatively the number of false null
    hypotheses less than p, denoted by $\hat{s}(p)$.

9
  • $\mathrm{FDR} = E(V/R)$
  • $R = V + S$
  • The number of resampling-based p-values less than p,
    V(p), is an estimate of the expected number of
    p-values corresponding to true null hypotheses
    less than p.
  • Estimate conservatively the number of false null
    hypotheses less than p, denoted by $\hat{s}(p)$.

10
  • Then conservatively estimate the FDR adjustment;
    two adjustments are suggested:
    • The FDR local estimator: conservative on the mean
    • The FDR upper limit: bounds the FDR with
      probability 95%.
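A simplified sketch of how such a resampling-based estimate might be computed at a given cutoff p is shown below. The formula for the estimated number of false nulls (observed rejections minus m times p, floored at zero) is an assumption made here in the spirit of the local estimator; the transcript does not show the exact expressions, and the upper-limit variant is not covered.

```python
def resampling_fdr_estimate(observed_p, resampled_p, p):
    """Average of V(p) / (V(p) + s_hat) over the resampled p-value sets.

    observed_p  : p-values from the real data
    resampled_p : list of resampled p-value sets (one list per resample)
    p           : cutoff at which the FDR is estimated
    """
    m = len(observed_p)
    r = sum(1 for x in observed_p if x <= p)  # observed rejections at p
    s_hat = max(r - m * p, 0.0)               # ASSUMED estimate of false nulls
    ratios = []
    for p_star in resampled_p:
        v = sum(1 for x in p_star if x <= p)  # V(p) in this resample
        ratios.append(v / (v + s_hat) if v + s_hat > 0 else 0.0)
    return sum(ratios) / len(ratios)

# Hypothetical inputs: 5 genes, 2 resamples.
observed = [0.001, 0.002, 0.003, 0.5, 0.8]
resampled = [[0.2, 0.4, 0.6, 0.7, 0.9],
             [0.004, 0.3, 0.5, 0.6, 0.9]]
estimate = resampling_fdr_estimate(observed, resampled, p=0.01)
```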

11
  • BH Point Estimator
  • Use the linear step-up procedure to control the
    FDR
  • Instead of raw p-values, use p-values estimated
    by resampling from the marginal distribution
  • For the k-th gene, with an observed test
    statistic $t_k$,
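One way to read the marginal resampling p-value is as the fraction of resampled statistics that are at least as extreme as the observed one. The sketch below pools absolute statistics over all genes and all resamples; this pooling, the two-sided comparison, and the function name are assumptions for illustration, since the transcript does not show the formula.

```python
from bisect import bisect_left

def pooled_resampling_pvalues(observed_t, resampled_t):
    """p_k = share of pooled resampled |t| values that are >= |t_k|."""
    pool = sorted(abs(t) for stats in resampled_t for t in stats)
    n = len(pool)
    pvalues = []
    for t in observed_t:
        # bisect_left gives the index of the first pooled value >= |t|,
        # so n - index counts the values at least as extreme.
        pvalues.append((n - bisect_left(pool, abs(t))) / n)
    return pvalues

# Hypothetical statistics: 2 genes, 2 resamples.
p_marginal = pooled_resampling_pvalues([2.0, 0.5], [[0.1, -0.6], [1.0, -2.5]])
print(p_marginal)  # [0.25, 0.75]
```

These estimated p-values would then be passed to the linear step-up procedure in place of the raw p-values.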

12
Data: Lipid Metabolism Study (Yang et al.,
2001)
Reference (common control): a pool from 8
control mice
  • Purpose
  • Identify genes with altered expression

13
Original Data Statistics
14
Study: Applying Multiple Comparison Procedures to
Microarray Data
Procedures Used
  • Control FWE
    • Resampling-based procedure (Westfall et al., 1989)
    • Holm's procedure
  • Control FDR
    • Linear step-up procedure (Benjamini and
      Hochberg, 1995)
    • Two resampling-based procedures (Yekutieli et al.,
      1999)
    • BH point estimator
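For comparison with the FDR procedures, Holm's FWE-controlling procedure is a step-down method: it compares the smallest p-value with alpha/m, the next with alpha/(m-1), and so on, and stops at the first failure. A minimal sketch, with hypothetical names and inputs:

```python
def holm(pvalues, alpha=0.05):
    """Return a list of booleans: True where Holm's procedure rejects."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= alpha / (m - rank + 1):
            rejected[idx] = True
        else:
            break  # step-down: stop at the first p-value that fails
    return rejected

holm_decisions = holm([0.9, 0.028, 0.005, 0.2, 0.025], alpha=0.05)
print(holm_decisions)  # [False, False, True, False, False]
```

On these hypothetical p-values Holm's procedure rejects one hypothesis, whereas the linear step-up procedure at q = 0.05 would reject three.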

15
Adjusted P-Values: Original Data
16
Simulation Study
  • To obtain simulation data
    1. Remove effects.
    2. Shuffle experiment and control groups.
    3. Add effects to 70 randomly selected genes.
    4. Apply multiple testing procedures (100
       iterations).
  • Repeat 1-4 400 times.
  • Calculate the average FDR and power over the 400
    simulations.
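The averaging step can be illustrated with a toy simulation. This sketch does not use the microarray data: it swaps in independent normal test statistics as a stand-in, and the number of genes (200), the effect size (3.0), and all names are assumptions. Only the 70 affected genes and the 400 repetitions mirror the slide.

```python
import math
import random

def bh_reject(pvalues, q):
    """Indices rejected by the linear step-up procedure at level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            k = rank
    return set(order[:k])

def one_sided_p(z):
    """Upper-tail p-value of a standard normal statistic."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def simulate(m=200, n_effects=70, effect=3.0, q=0.05, n_sims=400, seed=1):
    rng = random.Random(seed)
    fdp_sum = power_sum = 0.0
    for _ in range(n_sims):
        affected = set(rng.sample(range(m), n_effects))  # genes given an effect
        z = [rng.gauss(effect if i in affected else 0.0, 1.0) for i in range(m)]
        rejected = bh_reject([one_sided_p(x) for x in z], q)
        false_rejections = len(rejected - affected)
        fdp_sum += false_rejections / len(rejected) if rejected else 0.0
        power_sum += len(rejected & affected) / n_effects
    return fdp_sum / n_sims, power_sum / n_sims

avg_fdr, avg_power = simulate()
print(avg_fdr, avg_power)
```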

17
Simulation Differential Expression Patterns
$r = n^{1/p}\, i^{-1/p}$, $i = 1, \dots, n$
18
Adjusted P-Values: Simulation Data
19
(No Transcript)
20
Test Power
21
Conclusions
  • All four FDR controlling procedures retain higher
    power than FWE controlling procedures.
  • The choice among the four is a matter of buying
    more power and better properties at the expense
    of more complicated computations.

22
Conclusions (cont'd)
  • A substantial increase in power is gained when
    the p-values are estimated by resampling, and
    then used in the linear step-up procedure.
  • Still, if the software is available, the
    researcher may be better off using the more
    powerful resampling estimators.

23
Correlated Test Statistics
Positive Dependency (Benjamini and Yekutieli, 2001
and Yekutieli, 2002).
  • The linear step-up procedure controls the FDR for
    positively dependent test statistics.
  • This condition is satisfied by:
    • positively correlated one-sided normal and t
      test statistics.
    • absolute values of normal and t test
      statistics, when all null hypotheses are true.
24
Correlated Test Statistics: Simulation Study
[Figure panels: X1 vs. X2; abs(X1) vs. abs(X2)]
25
FDR vs. ρ², m = 2
26
FDR Deviation vs. ρ (m = 2)
27
Joint Distribution of X2,X1 - FDR Areas
28
FDR vs. ρ²
29
FDR vs. ρ²
30
FDR vs. ρ², m = 3
31
FDR Deviation vs. ρ (m = 3)
32
FDR Deviation vs. ρ (m = 4)
33
FDR Deviation vs. ρ (m = 6)
34
FDR for General m and corr. = 1
  • Consider a set of m p-values
  • $m_0$ of them correspond to the subset of true null
    hypotheses
  • $m_1 = m - m_0$ correspond to the subset of false null
    hypotheses
  • If the correlation is 1, all p-values in each subset
    are identical
  • Represent these m p-values by two p-values, one per
    subset,
  • with respective weights $w_0 = m_0$, $w_1 = m_1$.

35
FDR for General m: BH proc.
36
FDR for General m: LF Case
37
Maximal FDR and FDR Deviation vs. m0 / m
38
FDR in Complex Study
  • Family of Hypotheses (Westfall and Young, 1993)
  • Questions asked form a natural and coherent unit
  • All tests are considered simultaneously
  • Probable that many or all hypotheses are true

39
FDR in Complex Study
Family of Hypotheses (Westfall and Young, 1993)
Should FDR be directly controlled for all of the
hypotheses in the study?
40
FDR in Complex Study
  • Suggestions
  • Direct approach: use the scalability of the FDR.
  • Select subsets using statistics that are
    independent from step to step.

41
FDR in Complex Study
  • Suggestions
  • Organize families in hierarchical tree structure
    and use appropriate FDR controlling procedures.

42
Gene expression relative to behavioral markers
  • Purpose
  • Identify changes in gene expression related to
    behavioral effects of opioids.
  • Data: Expression of 26,300 genes for 10 mouse
    strains in 5 brain regions.

43
  • Experimental Design

44
Research Questions
  • Identify a pool of genes distinguishing between
    strains
  • Pairwise comparisons of genes
  • Test for significant interaction indicating
    unusual level of expression in particular strain
    by brain-region combinations.
  • Correlate strain differences in gene expression
    levels with behavior markers.

45
Strain by brain-region interaction
  • subset method
  • 957 genes identified in the first stage at
    thresh. 0.05
  • 50,000 interactions tested in second stage
  • only 13 interactions discovered

46
Strain by brain-region interaction
  • Hierarchical testing scheme
  • use thresh. 0.017
  • 758 genes are selected in the first stage
  • 76 interactions discovered

47
Correlation Analysis
  • subset method
  • Use thresh. 0.025 in 1st stage, 0.05 in 2nd stage
  • 225 triplicates discovered

48
Correlation Analysis
  • Hierarchical testing scheme
  • Use thresh. 0.025 in each stage
  • 230 triplicates discovered

49
Tree: 1st stage 0.025, 2nd stage 0.025
50
Subset: 1st stage 0.025, 2nd stage 0.05
51
The two-stage procedure
  • Benjamini, Krieger, Yekutieli (00)
  • Use the BH at level q once, and get $r_1$.
  • Estimate $m_0$ from $r_1$.
  • Proved: $\mathrm{FDR} \le q$ under independence. Conjectured:
    $\mathrm{FDR} \le q$ under positive dependency.

52
Further Procedures
  • Recall that for the BH procedure $\mathrm{FDR} \le (m_0/m)\,q$
  • Hence estimate $m_0$, and
  • use $q\,m/m_0$ instead of q in BH
  • The adaptive procedure: Benjamini and Hochberg (89/00)
  • The two-stage procedure: Benjamini, Krieger, Yekutieli (00)
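The two-stage idea can be sketched as follows: run the step-up procedure once, use the number of rejections to estimate the number of true nulls, then rerun the procedure at the inflated level. The first-stage level q/(1+q) and the estimate m minus r1 used here are assumptions, since the transcript does not show the estimator; the function names and example p-values are hypothetical.

```python
def bh_count(pvalues, q):
    """Number of rejections made by the linear step-up procedure at level q."""
    m = len(pvalues)
    k = 0
    for rank, p in enumerate(sorted(pvalues), start=1):
        if p <= rank * q / m:
            k = rank
    return k

def two_stage_count(pvalues, q=0.05):
    """Number of rejections made by a two-stage adaptive procedure."""
    m = len(pvalues)
    q1 = q / (1.0 + q)          # ASSUMED first-stage level
    r1 = bh_count(pvalues, q1)  # stage 1: ordinary step-up
    if r1 == 0 or r1 == m:
        return r1               # nothing to adapt
    m0_hat = m - r1             # ASSUMED estimate of the number of true nulls
    return bh_count(pvalues, q1 * m / m0_hat)  # stage 2 at the inflated level

pvals = [0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.04, 0.5, 0.7, 0.9]
print(bh_count(pvals, 0.05), two_stage_count(pvals, 0.05))  # 6 7
```

In this hypothetical example the adaptive version gains one rejection because the first stage suggests that only 4 of the 10 hypotheses are true nulls.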