Multiple Comparisons for Microarray Experiments: Motivation and Methods

Mik Black, The University of Auckland, February 13, 2004.


1
SPONSORED BY
2
Microarrays, pathways and Thomas Bayes
  • Mik Black
  • Department of Statistics, The University of
    Auckland
  • Otago Genomics Facility Meeting
  • February 13, 2004

3
Overview
  • Statisticians and microarrays.
  • Summary of current research.
  • Statistical models for microarray data.
  • False discovery rate control.
  • Current and future work.

4
Statisticians and microarrays
  • Statistician's role:
  • Experimental design
  • Power calculations
  • Normalization
  • Analysis
  • Similar to any experimental situation.
  • Often little scope for novel statistical research.

5
Why bother?
  • Build research partnerships.
  • Interdisciplinary crosstalk.
  • Publications.
  • Increase diversity of research.

6
Why bother?
  • Encourage good statistical practice.
  • Early adoption of novel statistical methods.

7
Student research
  • Marcus Davy (M.Sc. thesis)
  • characteristics of FDR controlling procedures for
    microarray experimentation.
  • Hadley Wickham (M.Sc. project)
  • spatial normalization.
  • visualization techniques.
  • Thomas Tiang (PGDipSci project)
  • normalization strategies.
  • Po-Hsun Huang (Summer studentship/Honours
    project)
  • normalization for boutique arrays.

8
Personal research
  • Bayesian and likelihood-based models for
    microarray data.
  • Semi-parametric modeling.
  • False discovery rate control and extensions.

9
Standard post-normalization analysis
  • Calculate test statistics for each gene.
  • These reflect the magnitude of observed
    differential expression relative to observed
    variability.
  • Use resampling methods to obtain p-values.
  • Bootstrapping
  • Permutations
  • Use p-values to determine genes undergoing
    significant differential expression, subject to
    a pre-determined level of error rate control.
  • Parametric alternatives use normal distribution
    theory.
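
A minimal sketch of the permutation route to a p-value for a single gene, in Python using only the standard library. The statistic used here (difference in group means divided by its standard error) is one possible choice and not necessarily the one used in the talk; the function names, the number of permutations and the example values are all illustrative assumptions.

```python
import random
from math import sqrt
from statistics import mean, variance


def diff_statistic(x, y):
    """Difference in means relative to its (unpooled) standard error."""
    return (mean(x) - mean(y)) / sqrt(variance(x) / len(x) + variance(y) / len(y))


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for one gene (group labels shuffled)."""
    rng = random.Random(seed)
    observed = abs(diff_statistic(x, y))
    combined = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(combined)
        permuted = abs(diff_statistic(combined[:len(x)], combined[len(x):]))
        # Small tolerance so that ties with the observed value are counted.
        if permuted >= observed - 1e-12:
            hits += 1
    # Adding 1 to numerator and denominator keeps the p-value above zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical log-expression values for one gene under two conditions.
treated = [2.1, 2.5, 1.9, 2.3, 2.6]
control = [0.1, -0.2, 0.3, 0.0, 0.2]
p = permutation_p_value(treated, control)
```

In a real experiment the same shuffling would be applied to every gene, and the smallest attainable p-value is limited by the number of distinct relabelings, which is small when there are few replicates.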

10
Bayesian models for microarrays
  • Newton et al. (2001) introduced a Bayesian model
    for single-array experiments based on Gamma
    distributions.
  • This approach has since been extended to
    encompass multiple arrays, and to provide greater
    flexibility in terms of (non)parametric
    assumptions (Kendziorski et al., 2002; Newton and
    Kendziorski, 2002; Newton et al., 2003).
  • Produces estimates of the probability of
    differential expression for each gene in the
    experiment.
  • These probabilities can then be used to produce a
    list of genes undergoing significant
    differential expression.

11
Which approach?
  • P-values:
  • Advantages: simple, (non-)parametric, standard.
  • Disadvantages: under-powered, genes considered in
    isolation, variance structure?
  • Bayesian model:
  • Advantages: intuitive output (probabilities),
    models underlying distribution of expression
    means (improved power), variance shrinkage.
  • Disadvantages: poor performance under model
    mis-specification, complex implementation,
    non-standard.

12
Determining differential expression
  • Each of the previously described methods produces
    an estimate of the likelihood of differential
    expression for each gene.
  • The next step involves deciding which genes
    should be considered to have undergone
    significant differential expression.
  • This decision is closely linked to the level of
    error we are willing to tolerate in our analysis.
  • Although there are many options available,
    control of the false discovery rate has become a
    popular approach.

13
False discovery rate control
  • Introduced by Benjamini and Hochberg (1995).
  • Want to control the number of incorrect
    rejections, V, as a proportion of the total
    number of rejections, R.
  • Stepwise p-value adjustment guarantees
    $\mathrm{FDR} = E[V/R] \le \alpha$.
  • Finner and Roters (2001) showed that
    $\mathrm{FDR} = \pi_0 \alpha$,
    where $\pi_0$ is the proportion of true null
    hypotheses.
  • The FDR is an expectation, so control holds on
    average.
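
The stepwise adjustment of Benjamini and Hochberg (1995) can be sketched as a step-up rule on the sorted p-values: find the largest rank k with p_(k) <= k * alpha / m and reject the k smallest. The function name and the example p-values below are illustrative assumptions, not taken from the talk.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected."""
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k (1-based) whose p-value lies under the line k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for ten genes.
p_example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
decisions = benjamini_hochberg(p_example, alpha=0.05)
```

Because the rule is step-up, a p-value that misses its own threshold is still rejected whenever a larger p-value further down the sorted list meets its threshold.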

14
Adaptive control of the FDR
  • Control of the FDR at level $\alpha$ requires estimation
    of the proportion of true null hypotheses, $\pi_0$.
  • In the microarray setting, this is the proportion
    of genes on the array which do not undergo
    differential expression.
  • Estimation of this quantity is not
    straightforward. Although Storey (2002) and
    Storey and Tibshirani (2003) have proposed
    methods for this, they can produce severely
    biased estimates (Black, 2004).
  • Newton et al. (2003) demonstrated that their
    Bayesian approach provides adaptive FDR control.
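
As a rough illustration of the estimation problem, the sketch below implements a simple estimator in the spirit of Storey (2002): p-values above a cutoff are assumed to come almost entirely from true nulls, which are uniform on (0, 1). The cutoff `lam`, the function name and the synthetic p-values are assumptions made for illustration, and, as noted above, estimators of this kind can be badly biased.

```python
def estimate_pi0(p_values, lam=0.5):
    """Estimate the proportion of true nulls from the p-values above lam."""
    m = len(p_values)
    n_above = sum(1 for p in p_values if p > lam)
    # Under the null, a fraction (1 - lam) of p-values is expected above lam.
    return min(1.0, n_above / (m * (1.0 - lam)))


# Synthetic example: 80 evenly spread "null" p-values plus 20 very small ones.
null_like = [(i + 0.5) / 80 for i in range(80)]
non_null = [0.001] * 20
pi0_hat = estimate_pi0(null_like + non_null)
```

The choice of `lam` trades bias against variance: a larger cutoff uses fewer, cleaner p-values, while a smaller one admits more non-null p-values into the count.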

15
Case study: DNA methylation in Arabidopsis
  • Two-array, dye-swapped design.
  • ddm1 mutant versus wild-type.
  • 4224 spots, 1882 features (genes).
  • Multiple replicate spots per gene.
  • 1523 genes with 2 spots each.

16
Logged data: background and foreground (array 1)
[Figure panels: Foreground, Background, Normalized Foreground]
17
Logged data: background and foreground (array 2)
[Figure panels: Foreground, Background, Normalized Foreground]
18
Per-array loess normalization (FG only)
19
Spatial check
20
Simple analysis of normalized data
  • Calculate a two-sample pooled-variance t-test
    statistic for each gene.
  • Calculate p-value for each gene either based on
    normal probabilities, or from bootstrapping.
  • Use p-values in the $\pi_0$ estimation procedure.
  • Use stepwise p-value adjustment to adaptively
    control the FDR, applying the adjustment at
    level $\alpha / \hat{\pi}_0$ to achieve an FDR of $\alpha$.
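
The first two steps of this analysis can be sketched for a single gene as follows. The code computes the pooled-variance t statistic and a two-sided p-value from normal probabilities; the function names and the replicate values are hypothetical, and with very few replicates the normal approximation should be treated with caution.

```python
from math import erfc, sqrt
from statistics import mean, variance


def pooled_t(x, y):
    """Two-sample t statistic using the pooled variance estimate."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(sp2 * (1 / nx + 1 / ny))


def normal_two_sided_p(t):
    """P(|Z| >= |t|) for a standard normal Z, which equals erfc(|t| / sqrt(2))."""
    return erfc(abs(t) / sqrt(2))


# Hypothetical replicate log-ratios for one gene in two groups.
mutant = [2.0, 4.0, 6.0]
wild_type = [1.0, 2.0, 3.0]
t_stat = pooled_t(mutant, wild_type)
p_val = normal_two_sided_p(t_stat)
```

Repeating this for every gene gives the vector of p-values that feeds the later estimation and adjustment steps.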

21
Improved analysis
  • Often small per-gene variances can make small
    fold-changes statistically significant.
  • Tusher et al. (2001) proposed the SAM
    (Significance Analysis of Microarrays) method to
    overcome this problem.
  • Add small fudge factor to denominator of test
    statistic.
  • Usually a low quantile of the distribution of
    gene-specific standard errors.
  • Functions as a shrinkage estimation procedure.
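
A minimal sketch of the fudge-factor idea, assuming the constant s0 is taken as a fixed quantile of the per-gene standard errors. The published SAM method chooses this constant by a more involved criterion, so this is an illustration of the shrinkage effect only; the function names, the quantile and the data are assumptions.

```python
from math import sqrt
from statistics import mean, variance


def standard_error(x, y):
    """Pooled standard error of the difference in means for one gene."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return sqrt(sp2 * (1 / nx + 1 / ny))


def sam_like_statistics(genes, quantile=0.05):
    """genes: list of (x, y) pairs of replicate values. Returns (stats, s0)."""
    diffs = [mean(x) - mean(y) for x, y in genes]
    ses = [standard_error(x, y) for x, y in genes]
    ranked = sorted(ses)
    # s0: the chosen quantile of the gene-specific standard errors.
    s0 = ranked[int(quantile * (len(ranked) - 1))]
    return [d / (s + s0) for d, s in zip(diffs, ses)], s0


# Hypothetical data: the first gene has a tiny fold-change and tiny variance.
genes = [
    ([1.00, 1.02], [0.90, 0.92]),
    ([3.0, 1.0], [0.5, -0.5]),
    ([2.0, 2.4], [1.0, 1.2]),
]
stats, s0 = sam_like_statistics(genes, quantile=0.5)
```

In this toy example the first gene has the largest ordinary t statistic (about 7.1) purely because of its tiny variance, but after adding s0 to the denominator it ranks last.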

22
Bayesian analysis
  • Fit model of Newton and Kendziorski (2002) to the
    data.
  • Use probabilities of differential expression to
    achieve adaptive FDR control.
  • Hierarchical model structure allows data
    sharing across genes (effectively producing
    shrinkage).
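
One common way to turn posterior probabilities into a gene list with a target FDR is to rank genes by probability of differential expression and keep the longest list whose average posterior probability of being null stays at or below the target. The sketch below assumes this rule; it is not necessarily the exact procedure of Newton et al., and the function name and probabilities are illustrative.

```python
def select_by_posterior(post_probs, alpha=0.05):
    """Indices of the largest top-ranked list with estimated FDR <= alpha."""
    # Rank genes from most to least likely to be differentially expressed.
    order = sorted(range(len(post_probs)), key=lambda i: post_probs[i], reverse=True)
    error_sum = 0.0
    n_selected = 0
    for k, idx in enumerate(order, start=1):
        # 1 - posterior probability is the chance this gene is a false discovery.
        error_sum += 1.0 - post_probs[idx]
        if error_sum / k > alpha:
            break  # the running average only grows from here on
        n_selected = k
    return order[:n_selected]


# Hypothetical posterior probabilities of differential expression.
probs = [0.99, 0.2, 0.97, 0.6, 0.95, 0.999]
selected = select_by_posterior(probs, alpha=0.05)
```

No separate estimate of the proportion of true nulls is needed here, because that information is already carried by the fitted posterior probabilities.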

23
Summary of results
  • Numbers of differentially expressed genes, and
    estimates of $\pi_0$

24
Bayes versus p-values: gene order
  • Percentage agreement on rankings of first n
    genes

25
Conclusions
  • Per-gene variances resulted in a large number of
    genes reported as differentially expressed under
    adaptive FDR control.
  • Use of SAM procedure radically reduced this
    number through shrinkage estimation.
  • Removes problem of small variances making small
    fold-changes significant.
  • Bayesian model also used shrinkage estimators and
    adaptive FDR control, but detected more (and
    different) genes as differentially expressed.
  • Simulations support superiority of Bayesian
    procedure assuming model is correctly specified.

26
Current and future work
  • Likelihood-based method which can control the
    actual (rather than average) proportion of false
    discoveries for a given set of rejections.
  • e.g., for a list of differentially expressed
    genes, the probability that fewer than 10% of
    these are false positives is at least 95%.
  • Applying the Bayesian analysis approach to the
    problem of identifying differentially regulated
    pathways.
  • Extension of the work of Mootha et al. (2003).

27
Acknowledgements
  • Rebecca Doerge (Purdue University)
  • Rob Martienssen (Cold Spring Harbor)
  • Vincent Colot (URGV)
  • Zach Lippman (Cold Spring Harbor)