Controlling the FDR in the Analysis of Genetic Expression - PowerPoint PPT Presentation

1
Controlling the FDR in the Analysis of Genetic Expression
Anat Reiner, Tel-Aviv University
2
Outline

Correlated test statistics

Complex Analysis

3
The False Discovery Rate (FDR) criterion
Benjamini and Hochberg (95)

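R = number of rejected hypotheses (discoveries)
V = number of false discoveries
The type I error in the entire study is
measured by

$\mathrm{FDR} = E(Q)$, where $Q = V/R$ if $R > 0$ and $Q = 0$ if $R = 0$,

i.e. the expected proportion of false discoveries among
the discoveries (0 if none found).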
4
FDR controlling procedures

Linear Step-Up Procedure
(Benjamini and Hochberg, 95)
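The slide names the linear step-up procedure without stating it. Here is a minimal Python sketch, assuming NumPy is available; the function name `bh_step_up` is ours. Order the p-values $p_{(1)} \le \dots \le p_{(m)}$, find the largest $k$ with $p_{(k)} \le kq/m$, and reject $H_{(1)}, \dots, H_{(k)}$.

```python
import numpy as np

def bh_step_up(pvals, q=0.05):
    """Linear step-up procedure: return a boolean mask of rejected hypotheses."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                        # indices that sort the p-values
    thresholds = q * np.arange(1, m + 1) / m     # i * q / m for i = 1, ..., m
    below = p[order] <= thresholds               # is p_(i) <= i * q / m ?
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()           # last crossing, as a 0-based rank
        reject[order[: k + 1]] = True            # reject the k + 1 smallest p-values
    return reject

print(bh_step_up([0.011, 0.019, 0.029, 0.039, 0.5], q=0.05).tolist())
# [True, True, True, True, False]
```

In the example, the smallest p-value (0.011) exceeds its own threshold $q/m = 0.01$ but is still rejected. This happens because the procedure steps up from the largest p-value and rejects everything at or below the last crossing.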

5
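FDR Adjusted P-Values

For an individual hypothesis, the FDR-adjusted p-value is the
smallest level q at which the linear step-up procedure rejects it.
With ordered p-values $p_{(1)} \le \dots \le p_{(m)}$,
$\tilde{p}_{(i)} = \min_{j \ge i} \min\{1,\; m\,p_{(j)}/j\}$.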
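Below is a companion sketch for computing these adjusted p-values with NumPy; the name `bh_adjusted_pvalues` is hypothetical. Rejecting every hypothesis whose adjusted p-value is at most q should reproduce the linear step-up decision at level q.

```python
import numpy as np

def bh_adjusted_pvalues(pvals):
    """FDR-adjusted p-values of the linear step-up procedure."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)          # m * p_(j) / j
    # running minimum from the largest p-value down gives the min over j >= i
    adj_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(adj_sorted, 1.0)         # cap at 1, restore input order
    return adjusted

adj = bh_adjusted_pvalues([0.011, 0.019, 0.029, 0.039, 0.5])
print(np.round(adj, 5))
```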

6
Data inter-dependencies

Between genes:
- co-regulation
Between measurement errors of expression levels:
- spatial effects
- RNA source
- normalization process
- pooled variability estimation

Multiple testing of such data will produce
correlated test statistics !
7
Resampling Idea

Create B data sets by permuting the order of the subjects
(mixing treatment and control).
Underlying assumption: the joint distribution of the
p-values corresponding to the true null
hypotheses, which is generated through the
p-value resampling scheme, represents the real
joint distribution under the null hypothesis.
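One way to sketch the resampling step in Python follows. Several details are assumptions that the slides do not specify: the expression data form a genes × subjects NumPy array whose first `n_treat` columns are the treated subjects, each gene gets a Welch two-sample t statistic, and the helper names `welch_t` and `permutation_statistics` are hypothetical. The sketch permutes whole columns, so each subject's expression profile stays intact. As a result, the dependence between genes carries over to the resampled statistics, which is what the underlying assumption relies on.

```python
import numpy as np

def welch_t(x, y):
    """Per-gene Welch t statistics; rows are genes, columns are subjects."""
    mx, my = x.mean(axis=1), y.mean(axis=1)
    vx, vy = x.var(axis=1, ddof=1), y.var(axis=1, ddof=1)
    return (mx - my) / np.sqrt(vx / x.shape[1] + vy / y.shape[1])

def permutation_statistics(data, n_treat, B=200, seed=0):
    """Observed statistics, shape (m,), and B permuted statistics, shape (B, m)."""
    rng = np.random.default_rng(seed)
    t_obs = welch_t(data[:, :n_treat], data[:, n_treat:])
    t_perm = np.empty((B, data.shape[0]))
    for b in range(B):
        cols = rng.permutation(data.shape[1])   # shuffle subjects, mixing the groups
        shuffled = data[:, cols]                # genes of one subject stay together
        t_perm[b] = welch_t(shuffled[:, :n_treat], shuffled[:, n_treat:])
    return t_obs, t_perm

rng = np.random.default_rng(1)
expr = rng.normal(size=(1000, 10))              # hypothetical data: 1000 genes, 5 + 5 subjects
t_obs, t_perm = permutation_statistics(expr, n_treat=5, B=100)
```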

For each value of p, the number of
resampling-based p-values less than p, denoted by
V(p), is an estimate of the expected number of
p-values corresponding to true null hypotheses
that are less than p.
Since the FDR is also a function of the number of
false null hypotheses being rejected, conservatively
estimate the number of false null hypotheses with
p-values less than p, denoted by $\hat{S}(p)$.

$\mathrm{FDR} = E(V/R)$, where $R = V + S$

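Then conservatively estimate the FDR adjustment
by averaging over the resampled data sets:
$\widehat{\mathrm{FDR}}(p) = E^{*}\left[\frac{V(p)}{V(p) + \hat{S}(p)}\right]$ (taken as 0 when $V(p) = 0$),
where $\hat{S}(p)$ is the conservative estimate of the number of false null
hypotheses with p-values less than p. Two adjustments are suggested:
The FDR local estimator: conservative on the mean.
The FDR upper limit: bounds the FDR with
probability 95%.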
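Below is a sketch of a resampling-based FDR estimate at a single threshold p. It rests on stated assumptions. `p_obs` holds the m observed p-values, and `p_star` is a B × m array of p-values recomputed on the permuted data sets. $r(p)$ counts the observed p-values at most p, and $V^{*}_b(p)$ is the corresponding count in resampled set b. The number of rejected false nulls is estimated by the plug-in $\hat S(p) = \max\{r(p) - \overline{V^{*}}(p), 0\}$. That plug-in is our simplification for illustration and is not claimed to match either the local estimator or the upper-limit version of Yekutieli et al. (1999). The function name `resampling_fdr` is hypothetical.

```python
import numpy as np

def resampling_fdr(p, p_obs, p_star):
    """Resampling-based FDR estimate at threshold p (illustrative sketch).

    p_obs:  (m,) observed p-values
    p_star: (B, m) p-values recomputed on B permuted data sets
    """
    p_obs = np.asarray(p_obs, dtype=float)
    p_star = np.asarray(p_star, dtype=float)
    r = np.sum(p_obs <= p)                      # r(p): observed p-values <= p
    v_star = np.sum(p_star <= p, axis=1)        # V*_b(p) for each resampled set b
    s_hat = max(r - v_star.mean(), 0.0)         # ASSUMPTION: simple plug-in for S(p)
    denom = v_star + s_hat
    # V*/(V* + S_hat), taken as 0 when the denominator is 0
    ratios = np.divide(v_star, denom, out=np.zeros(v_star.shape), where=denom > 0)
    return ratios.mean()                        # average over the B resampled sets
```

Evaluating `resampling_fdr(pk, p_obs, p_star)` at each observed p-value `pk` gives one conservative FDR adjustment per gene under these assumptions.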

BH Point Estimator
Use the linear step-up procedure to control the
FDR.
Instead of raw p-values, use p-values estimated
by resampling from the marginal distribution.
For the k-th gene, with an observed test
statistic $t_k$, the p-value is the resampling estimate of the
probability of a statistic at least as extreme as $t_k$.
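Here is a minimal sketch of the BH point estimator idea. `t_obs` holds the m observed statistics, and `t_perm` holds a B × m array of statistics from permuted data sets. We assume a two-sided, per-gene marginal estimate $p_k = \#\{b : |t^{*}_{bk}| \ge |t_k|\}/B$. Pooling the resampled statistics of all genes is another reading of "marginal distribution", and the slides do not say which is meant. The resulting p-values are passed to the linear step-up procedure. Both function names are hypothetical.

```python
import numpy as np

def marginal_resampling_pvalues(t_obs, t_perm):
    """ASSUMPTION: per-gene two-sided p_k = #{b : |t*_bk| >= |t_k|} / B."""
    t_obs = np.abs(np.asarray(t_obs, dtype=float))
    t_perm = np.abs(np.asarray(t_perm, dtype=float))
    return (t_perm >= t_obs).mean(axis=0)       # broadcast (B, m) against (m,)

def bh_step_up(pvals, q=0.05):
    """Linear step-up procedure: boolean mask of rejected hypotheses."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        reject[order[: np.nonzero(below)[0].max() + 1]] = True
    return reject

t_obs = [10.0, 0.1, 5.0]
t_perm = [[1, 1, 1], [2, -2, 2], [-3, 0.05, 3], [0.5, 0.2, -0.5]]
p_hat = marginal_resampling_pvalues(t_obs, t_perm)
print(p_hat.tolist(), bh_step_up(p_hat).tolist())
# [0.0, 0.75, 0.0] [True, False, True]
```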

12
Data: Lipid Metabolism Study (Yang et al.,
2001)
Reference (common control): a pool from 8
control mice

Purpose
Identify genes with altered expression

13
Original Data Statistics
14
Study: Applying Multiple Comparison Procedures to
Microarray Data
Procedures used:

Control FWE
- Resampling-based procedure (Westfall et al., 1989)
- Holm's procedure
Control FDR
- Linear step-up procedure (Benjamini and Hochberg, 1995)
- Two resampling-based procedures (Yekutieli et al., 1999)
- BH point estimator
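For comparison with the FWE-controlling side of the study, here is a minimal sketch of Holm's step-down procedure; the function name is ours. The procedure compares the i-th smallest p-value with $\alpha/(m - i + 1)$ and stops at the first failure.

```python
import numpy as np

def holm_step_down(pvals, alpha=0.05):
    """Holm's step-down procedure controlling the FWE at level alpha."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    reject = np.zeros(m, dtype=bool)
    for i, idx in enumerate(np.argsort(p)):     # i = 0, ..., m - 1
        if p[idx] <= alpha / (m - i):           # p_(i+1) <= alpha / (m - i)
            reject[idx] = True
        else:
            break                               # stop at the first non-rejection
    return reject

print(holm_step_down([0.011, 0.019, 0.029, 0.039, 0.5]).tolist())
# [False, False, False, False, False]
```

On these p-values, Holm's procedure at $\alpha = 0.05$ rejects nothing, because 0.011 exceeds 0.05/5. The linear step-up procedure at q = 0.05 rejects four of the same hypotheses. This is a small instance of the power gap the study reports.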

15
Adjusted P-Values: Original Data
16
Simulation Study

To obtain the simulation data:
1. Remove effects.
2. Shuffle the experiment and control groups.
3. Add effects to 70 randomly selected genes.
4. Apply the multiple testing procedures (100 iterations).
Repeat steps 1-4 400 times.
Calculate the average FDR and power over the 400
simulations.
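The final averaging step can be sketched as follows. This sketch makes assumptions about how the study summarized its runs. It takes "FDR" to be the average of the realized proportion V/R (0 when R = 0). It takes "power" to be the average proportion of false null hypotheses (the genes with added effects) that are rejected. The function names are hypothetical.

```python
import numpy as np

def fdp_and_power(reject, is_alternative):
    """Realized V/R (0 if R = 0) and S/m1 for one simulated data set."""
    reject = np.asarray(reject, dtype=bool)
    is_alt = np.asarray(is_alternative, dtype=bool)
    R = reject.sum()
    V = np.sum(reject & ~is_alt)                # false discoveries
    S = np.sum(reject & is_alt)                 # true discoveries
    fdp = V / R if R > 0 else 0.0
    power = S / is_alt.sum() if is_alt.any() else 0.0
    return fdp, power

def average_fdr_and_power(runs):
    """Average (FDP, power) over a list of (reject, is_alternative) pairs."""
    results = np.array([fdp_and_power(r, a) for r, a in runs], dtype=float)
    return results.mean(axis=0)
```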

17
Simulation Differential Expression Patterns
$r_i = n^{1/p}\, i^{-1/p}, \quad i = 1, \dots, n$
18
Adjusted P-Values: Simulation Data
19
20
Test Power
21
Conclusions

All four FDR controlling procedures retain higher
power than FWE controlling procedures.

The choice among the four is a matter of buying
more power and better properties at the expense
of more complicated computations.

22
Conclusions (contd)

A substantial increase in power is gained when
the p-values are estimated by resampling, and
then used in the linear step-up procedure.

Still, if the software is available, the
researcher may be better off using the more
powerful resampling estimators.

23
Correlated Test Statistics
Positive Dependency (Benjamini and Yekutieli, 2001;
Yekutieli, 2002).

The linear step-up procedure controls the FDR for
positively dependent test statistics.

This condition is satisfied by

- positively correlated one-sided normal and t
test statistics.
- absolute values of normal and t test
statistics, when all null hypotheses are true.
24
Correlated Test Statistics Simulation Study
(scatter plots: X1 vs. X2; abs(X1) vs. abs(X2))
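Here is a small Monte Carlo sketch of the kind of m = 2 experiment these slides plot. The setup is hypothetical. (X1, X2) is bivariate normal with unit variances and correlation ρ. H1 is true (mean 0), H2 is false (mean μ2), and two-sided p-values come from abs(X1) and abs(X2). For m = 2, the linear step-up procedure rejects both hypotheses if the larger p-value is at most q. Otherwise it rejects the hypothesis with the smaller p-value if that p-value is at most q/2. The function name is ours.

```python
import math
import numpy as np

def bh_fdr_two_tests(rho, mu2, q=0.05, n_sim=100_000, seed=0):
    """Monte Carlo FDR of the linear step-up procedure for m = 2 correlated tests."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    x = rng.multivariate_normal([0.0, mu2], cov, size=n_sim)
    two_sided = np.vectorize(lambda z: math.erfc(abs(z) / math.sqrt(2.0)))
    p1, p2 = two_sided(x[:, 0]), two_sided(x[:, 1])     # H1 true null, H2 false null
    reject_both = np.maximum(p1, p2) <= q                 # p_(2) <= 2q/2
    reject_min = ~reject_both & (np.minimum(p1, p2) <= q / 2)   # p_(1) <= q/2
    # V/R: 1/2 if both rejected; 1 if only H1 (the smaller p-value) is rejected
    fdp = np.where(reject_both, 0.5, np.where(reject_min & (p1 < p2), 1.0, 0.0))
    return fdp.mean()

print(bh_fdr_two_tests(rho=0.0, mu2=3.0))
```

With ρ = 0 the two p-values are independent, so the estimate should be close to $m_0 q / m = 0.025$. Varying `rho` and `mu2` traces out curves of the kind shown on the following slides under this assumed setup.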
25
FDR vs. ?2, m = 2
26
FDR Deviation vs. ? (m = 2)
27
Joint Distribution of X2, X1: FDR Areas
28
FDR vs. ?2
29
FDR vs. ?2
30
FDR vs. ?2, m = 3
31
FDR Deviation vs. ? (m = 3)
32
FDR Deviation vs. ? (m = 4)
33
FDR Deviation vs. ? (m = 6)
34
FDR for General m and corr. = 1

Consider a set of m p-values:
m0 of them correspond to the subset of true null
hypotheses;
m1 = m - m0 correspond to the subset of false null
hypotheses.
If the correlation is 1, all p-values in each subset
are identical, so these m p-values can be represented
by two p-values, one per subset, with
respective weights w0 = m0, w1 = m1.
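Below is a sketch of the two-value representation; the function names are hypothetical. With correlation 1 inside each subset, the m p-values collapse to one true-null value p0 repeated m0 times and one false-null value p1 repeated m1 times. V/R of the linear step-up procedure is then a deterministic function of (p0, p1). The Monte Carlo wrapper adds assumptions of its own: one-sided z-tests, a shared Z0 ~ N(0, 1) for the true nulls, a shared Z1 ~ N(μ, 1) for the false nulls, and Z0 independent of Z1.

```python
import math
import numpy as np

def bh_fdp_two_values(p0, p1, m0, m1, q=0.05):
    """V/R of the linear step-up procedure when m0 true-null p-values equal p0
    and m1 false-null p-values equal p1 (correlation 1 within each subset)."""
    m = m0 + m1
    p = np.array([p0] * m0 + [p1] * m1, dtype=float)
    is_null = np.array([True] * m0 + [False] * m1)
    order = np.argsort(p, kind="stable")
    below = p[order] <= q * np.arange(1, m + 1) / m
    if not below.any():
        return 0.0                               # R = 0, so V/R is taken as 0
    rejected = order[: np.nonzero(below)[0].max() + 1]
    return float(is_null[rejected].sum() / rejected.size)

def bh_fdr_corr_one(m0, m1, mu, q=0.05, n_sim=20_000, seed=0):
    """Monte Carlo FDR; HYPOTHETICAL one-sided z-test setup, Z0 and Z1 independent."""
    rng = np.random.default_rng(seed)
    z0 = rng.standard_normal(n_sim)              # shared statistic of the true nulls
    z1 = rng.standard_normal(n_sim) + mu         # shared statistic of the false nulls

    def upper(z):
        return 0.5 * math.erfc(z / math.sqrt(2.0))   # one-sided p-value P(Z >= z)

    fdps = [bh_fdp_two_values(upper(a), upper(b), m0, m1, q) for a, b in zip(z0, z1)]
    return float(np.mean(fdps))
```

For example, `bh_fdp_two_values(0.045, 0.001, m0=3, m1=2)` rejects all five hypotheses, because the largest p-value 0.045 is below q = 0.05, so V/R = 3/5.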

35
FDR for General m BH proc.
36
FDR for General m LF Case
37
Maximal FDR and FDR Deviation vs. m0 / m
38
FDR in Complex Study

Family of Hypotheses (Westfall and Young, 1993)
Questions asked form a natural and coherent unit
All tests are considered simultaneously
It is probable that many or all hypotheses are true

39
FDR in Complex Study
Family of Hypotheses (Westfall and Young, 1993)
Should FDR be directly controlled for all of the
hypotheses in the study?
40
FDR in Complex Study

Suggestions
Direct approach: use the scalability of FDR.
Select subsets using statistics that are
independent from step to step.

41
FDR in Complex Study

Suggestions
Organize families in hierarchical tree structure
and use appropriate FDR controlling procedures.

42
Gene expression relative to behavioral markers

Purpose
Identify changes in gene expression related to the
behavioral effects of opioids.
Data: Expression of 26,300 genes for 10 mouse
strains in 5 brain regions.

Experimental Design

44
Research Questions

Identify a pool of genes distinguishing between
strains
Pairwise comparisons of genes
Test for significant interaction indicating an
unusual level of expression in particular
strain-by-brain-region combinations.
Correlate strain differences in gene expression
levels with behavioral markers.

45
Strain by brain-region interaction