Title: Friday 130 second computer lab session:
1Friday (1/30) second computer lab
session Location 3073 (3rd floor), Department
of Computational Biology, BST3, 3501 Fifth
Avenue. Time 930-1045AM
2Agenda
- Detecting differentially expressed (DE) genes
- Traditional t-test and Wilcoxon rank sum test
- Non-parametric permutation method
- Multiple testing
- Introduction
- False discovery rate
- Empirical Bayes analysis
- Significant Analysis of Microarray (SAM)
- Limma pacakge
- SAM software
3Statistical Issues in Microarray Analysis
Experimental design
Integrative analysis meta-analysis
41. Select DE genes
Which genes are differentially expressed between
tumor and normal?
51. Select DE genes
Traditional two-sample comparison by t-test
Equal variance
P-value is defined as the probability that, under
null hypothesis, the test statistic occur to be
at least as extreme as the observed value t.
61. Select DE genes
Traditional two-sample comparison by t-test
Unequal variance
Welsh approximation
P-value is defined as the probability that, under
null hypothesis, the test statistic occur to be
at least as extreme as the observed value t.
71. Select DE genes
The logic of hypothesis in biology and in
statistics is very different. Biological
hypothesis Design validation experiments to
validate the hypothesis. Key design a feasible
experiment and find a reasonable
control. Statistical hypothesis testing Set up
H0/HA and testing statistics. Hope to reject H0
by seeing that the observed test statistic is
extremely impossible to happen under null
hypothesis. Key Clarify the problem by setting
up H0/HA. Constructing a good test statistic and
obtaining its null distribution (under null
hypothesis) is the key.
81. Select DE genes
Statistical significance the observed
differential expression is unlikely to be due to
chance. Scientific (biological)
significance the observed level of differential
expression is of sufficient magnitude to be of
biological relevance.
91.1. Wilcoxon Rank Sum Test
- Problem with t-test
- Normal assumption (not robust)
- Method vulnerable to outliers.
101.1. Wilcoxon Rank Sum Test
X Y (78, 72) (102, 105) 2
1 3 4 WA213. Whats the null
distribution of WA? If X and Y have the same
distribution, then WA should have the same
probability of all the possible
permutations. (1,2)(3,4) WA123 (1,3)(2,4)
WA134 (1,4)(2,3) WA145 (2,3)(1,4)
WA235 (2,4)(1,3) WA246 (3,4)(1,2)
WA347 So p-value of the observed data is
2/60.33. Not significant because sample size too
small.
11Compare t-test and Wilcoxon rank sum test
- If data is normal, t-test is the most efficient.
Wilcoxon will lose some efficiency. - If data not normal, Wilcoxon is usually better
than t-test. - A surprising result is that even when data is
normal, Wilcoxon only lose very little
efficiency. - Pitman (1949) proposed the concept of asymptotic
relative efficiency (ARE) to compare two tests.
It is defined as the reciprocal ratio of sample
size needed to achieve the same statistical
power. - If t-test needs 100 samples, we only need
n2100/0.864115.7 samples for Wilcoxon to
achieve the same statistical power.
121.2. Permutation test
- T-test is efficient but not robust.
- Rank-sum test is robust but relatively
inefficient. - Permutation method can be an extension from any
powerful testing methods. - Instead of obtaining the null distribution from
model assumptions (such as normal assumption),
the null distribution is calculated by randomly
permuting the class labels. - Advantage The method is more robust (model-free)
than t-test and more efficient than Wilcoxon
test.
131.2. Permutation test
X Y (78, 72) (102, 105)
Whats the null distribution of T? If X and
Y have the same distribution, then T should have
the same probability of all the possible
permutations. (78, 72)(102, 105) ? T ? (78,
102)(72, 105) ? T ? (78, 105)(72, 102) ? T
? (72, 102)(78, 105) ? T ? (72, 105)(78, 102) ?
T ? (102, 105)(78, 72) ? T ? So p-value of the
observed data is 2/60.33. Not significant
because sample size too small.
Instead of rank, we can use any reasonable
statistics (e.g. t-statistics)
141.2. Permutation test
- Disadvantage
- The resampling nature of the method, however,
makes it slow. - The tail distribution is difficult to obtain. In
multiple comparison of thousands of testings, we
usually concern a small p-value. (talk about it
later) - e.g. If 2 samples in normal and 3 samples in
tumor, therere totally 5!120 permutations. The
p-value has no enough precision.
151.2. Permutation test
Other test statistics
- Welch t-test allows unequal variances
- Penalized t-statistics
- Efron et al. 2001
- Tusher et al. 2001 (SAM)
- Moderated t-statistics
- where is the
shrunken estimate of standard deviation.
? Is to stabilize the statistics in case s is
close to 0.
162. Multiple comparison
Which genes are differentially expressed between
tumor and normal?
So far we discuss hypothesis testing applied for
each individual gene.
172. Multiple comparison
p-value for each individual gene is misleading
for scientific interpretation.
Yeh UCSF
182. Multiple comparison
What does that mean? If a biologist gives us a
10000-gene data set that contains no DE gene
(like what we did in simulation), we will always
report around 100 DE genes using p-value0.01
threshold. P-value is a very misleading measure
in scientific interpretation when multiple
comparisons are performed simultaneously.
192. Multiple comparison
What a scientist cares is Among the detected DE
gene list, how many are real and how many are
false positives? gt concept of false discovery
rate (FDR)
202. Multiple comparison
- Family-wise Error Rate (FWER)
- Pr(Vgt0)Pr(at least one false positives)
- False Discovery Rate (FDR)
- FWER is usually too strict for many scientific
application. - (Benjamini and Hochberg, JRSS-B, 1995) proposed
the idea of FDR the proportion of type I errors
among rejected hypotheses. - FDRE(Q)E(V/R)
212. Multiple comparison
- Controlling FWER
- Bonferroni ?/m
- .
- If ?0.05, m6000 genes, then the p-value
threshold is selected as 0.05/60000.0000083. - The method is too conservative. Almost no genes
will be selected in microarray study. - Sidak
- minP (Westfall Young)
- maxT
- Other methods to control FWER have better bound.
But normally trying to control FWER selects only
very few genes.
222.2. False discovery rate
When to use FWER and when to use
FDR? FWERPr(Vgt0)Pr(at least one false
positives) FDRE(Q)E(V/R)
- Choose FWER if high confidence in ALL selected
genes is desired (for example, selecting
candidate genes for RT-PCR validation). Loss of
power due to strong control of type-I error. - Use more flexible FDR procedures if certain
proportions of false positives are tolerable
(e.g. gene discovery, selecting candidate
co-regulated gene sets for GO/pathway analysis).
232.2. False discovery rate
Implementation is easy. Assumes independence
between genes and take the p-values of each gene
as input.
Yeh UCSF
242.2. False discovery rate
Concept of q-value and a method to estimate it
Yeh UCSF
252.2. False discovery rate
Yeh UCSF
262.2. False discovery rate
Concept of q-value similar to p-value, its a
number attached to each gene that reflects the
corresponding FDR if the threshold is taken at
this gene.
27FWER and FDR
282.3. Empirical Bayes
Empirical Bayes analysis of a microarray
experiment. Efron B, Tibshirani R, Storey JD and
Tusher V. JASA 2001
292.3. Empirical Bayes
We will come back for the selection of a0 later.
302.3. Empirical Bayes
p1 probability of being a DE gene. p0
1-p1probability of being not a DE gene. f1(z)
the density of Z for DE genes. f2(z) the
density of Z for non-DE genes. Mixture density
for Z f(z) p0 f0(z) p1 f1(z)
312.3. Empirical Bayes
Bayes rule
How to estimate f(z), f0(z) and p0?
322.3. Empirical Bayes
Estimate f0(z) and f(z) U, I unirradiated or
irradiated by ions A, B two wild-type human
lymphoblastoid cell ines. 1, 2 replicate slides
332.3. Empirical Bayes
Estimate p0
Obtain an upper boud of p0 . In general, it is
very difficult to get a good estimate of p0.
342.3. Empirical Bayes
352.3. Empirical Bayes
Corresponding estimates of f, f0 and f1.
362.3. Empirical Bayes
- Conclusion
- The paper provides a unifying model with very
careful statistical robust inference. The result
gives much statistical insight. - Assume independence among genes which is not
true.
- Why not popular in routine microarray analysis?
- No convenient software to implement the method.
- The estimation of f0, p0 and a0 generally are not
easy. - Biologists like easy methods (instead of a black
box). They may prefer a slightly inferior method
but working and understandable.
- Lesson to learn
- Develop methods that the biologists concern.
- Statisticians need to promote their methods and
explain the advantage of using a more complex
model.
372.4. Significant analysis of microarray (SAM)
- Constructing a test statistics for each gene
- s0 is introduced to stabilize the variance of
d(i) when s(i) small. - Suppose d(i) are ordered such that d(1)?d(2).
382.4. Significant analysis of microarray (SAM)
- Compute the null distribution via permutaion of
samples - For each permutation p, similarly compute dp(i)
such that dp(1)?dp(2). - Define dE(i)Averagep(dp(i)).
- Criterion for calling a DE gene is judged by the
threshold ? - if d(i)-dE(i)gt ?
392.4. Significant analysis of microarray (SAM)
- How many false positives? Estimate through
permuted samples
Is it a good estimate? Its a conservative
estimate (the estimated FDR is larger than
actual FDR).
402.4. Significant analysis of microarray (SAM)
Estimated from permutation Not from biological
validation
412.4. Significant analysis of microarray (SAM)
Software package SAM
http//www-stat.stanford.edu/tibs/SAM/
42ANOVA (Analysis of Variance)Model
2.5. LIMMA package
- Let yijkg be the fluorescent intensity measured
from Array i, Dye j, Variety k, and Gene g, on
the appropriate scale (such as log). A typical
analysis of variance (ANOVA) model is - yijkg µ Ai Dj Vk Gg (AG)ig (DG)jg
(VG)kg ?ijkg
µ, A, D, V are normalization terms G are
the overall gene effects AGs are spot
effects DGs are gene-specific dye
effects VGs are the effects of interest. The
capture the expression of genes specifically
attributable to varieties. ? is random error
43Two stage ANOVA
2.5. LIMMA package
- Global ANOVA model
- yijkgr µ Ai Dj Vk Gg (AG)ig (DG)jg
(VG)kg eijkg - However, fitting the global model is
computationally prohibitive. In stead, breaking
the model into two stages - Two stage ANOVA
- Fit the normalization model
- yijkg µ Ai Dj Vk rijkgr
- Fit residuals on per gene basis
- rijkr G (AG)i (DG)j (VG)k eijk
44limma package
2.5. LIMMA package
- Fitting of gene-wise linear models to estimate
log ratios between two or more target samples
simultaneously lm.series, rlm.series, glm.series
(handle replicate spots). - ebayes moderated t-statistics and log-odds of
differential expression by empirical Bayes
shrinkage of the standard errors towards a common
value.
45Conclusion
- How many DE genes in condition A versus B?
- Difficulties encountered
- Parametric assumptions not valid
- Complex correlation structure between genes
- P-value threshold even for FDR is usually very
small. The accuracy of tail distribution may be
problematic. (e.g. permutation methods) - Interprete p-values carefully
- DE is continuous, not binary
- In practice, its sometimes more realistic to
rank the genes in order of evidence (test
statistics) of DE, just as if you dont know
statistics at all.
Remember the strength of microarray is the
ability of global monitoring, not a rigorous
biological proof of a single genes function or
behavior.
46Reference
Multiple comparison -- Benjamini, Y Y Hochberg
(1995). Controlling the false discovery rate a
practical and powerful approach to multiple
testing. Journal of the Royal Statistical Society
B, 57, 289-300.-- Westfall, PH and SS Young
(1993) Resampling-based multiple testing
Examples and Methods for p-Value Adjustment.
Wiley.-- Storey (2002) A direct approach to
false discovery rates. J. Roy. Stat. Soc. Ser. B,
64479-498. -- Efron B, Tibshirani R, Storey JD
and Tusher V. (2001) Empirical Bayes analysis fo
a microarray experiment. JASA 96(456)1151-1160.-
- Tusher V, Tibshirani R and Chu G. Significance
analysis of microarrays applied to the ionizing
radiation response. PNAS 2001 98 5116-5121.-- S
Dudoit, JP Shaffer JC Boldrick (2003). Multiple
hypothesis testing in microarray experiments.
Statistical Science, Vol. 18, 71-103. A lot
more in the past few years!! Multiple comparison
is a hot topic in statistical community.
473. SAM
1. SAM Excel-add-on installation After
installation, two SAM button controls will be
shown in Excel..
SAM Button Control
483. SAM
2. Two-sample comparison Highlight data then
click SAM. A dialog box will pop-up for
parameters in SAM. Check for SAM manual for data
format for un-paired two sample comparison,
paired two sample comparison and multiple
class comparison.
493. SAM
3. Select parameter Delta A SAM plot
controller will pop-up to enhance selection of a
suitable Delta (which correspond to FDR).
FDR information
Selection of Delta
503. SAM
The Excel add-on is convenient for biologist to
apply the method with visualization. In the lab
session and homework, we will learn to use the
SAM package in R.