1
Bayesian Modelling for Differential Gene Expression
Alex Lewin (Imperial College), Sylvia Richardson (IC Epidemiology), Tim Aitman (IC Microarray Centre)
In collaboration with Anne-Mette Hein, Natalia Bochkina (IC Epidemiology), Helen Causton (IC Microarray Centre), Peter Green (Bristol)
2
Insulin-resistance gene Cd36
cDNA microarray hybridisation signal for SHR much lower than for Brown Norway and SHR.4 control strains
Aitman et al. 1999, Nature Genet 21:76-83
3
Larger microarray experiment: look for other genes associated with Cd36
Microarray Data:
  • 3 SHR compared with 3 transgenic rats (with Cd36)
  • 3 wildtype (normal) mice compared with 3 mice with Cd36 knocked out
  • ≈ 12000 genes on each array
Biological Question: Find genes which are expressed differently between animals with and without Cd36.
4
  • Bayesian Hierarchical Model for Differential
    Expression
  • Decision Rules
  • Predictive Model Checks
  • Simultaneous estimation of normalization and
    differential expression
  • Gene Ontology analysis for differentially
    expressed genes

5
Microarray analysis is a multi-step process
We aim to integrate all the steps in a common
statistical framework
6
Bayesian Modelling Framework
  • Model different sources of variability
    simultaneously (within array, between array)
  • Uncertainty propagated from data to parameter
    estimates (so not over-optimistic in
    conclusions).
  • Share information in appropriate ways to get
    robust estimates.

7
Gene Expression Data
3 wildtype mice, fat tissue hybridised to Affymetrix chips
Newton et al. 2001: showed data fit well by Gamma or Log Normal distributions
Kerr et al. 2000: linear model on log scale
8
Bayesian hierarchical model for differential
expression
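  • Data: $y_{gsr}$ = log expression for gene g, condition
    s, replicate r
  • $\alpha_g$ = gene effect
  • $\delta_g$ = differential effect for gene g between 2
    conditions
  • $\beta_{r(g)s}$ = array effect (expression-level
    dependent)
  • $\sigma_{gs}^2$ = gene variance
  • 1st level
  • $y_{g1r} \mid \alpha_g, \delta_g, \sigma_{g1} \sim N(\alpha_g - \tfrac{1}{2}\delta_g + \beta_{r(g)1},\ \sigma_{g1}^2)$,
  • $y_{g2r} \mid \alpha_g, \delta_g, \sigma_{g2} \sim N(\alpha_g + \tfrac{1}{2}\delta_g + \beta_{r(g)2},\ \sigma_{g2}^2)$,
  • $\sum_r \beta_{r(g)s} = 0$
  • $\beta_{r(g)s}$ = function of $\alpha_g$, parameters a
    and b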
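
The first level of the model on this slide can be sketched in Python by simulating replicate log-expression values for one gene. This is only an illustration: the function name `simulate_gene`, the numerical values, and the fixed array effects are assumptions, not taken from the talk.

```python
import random

def simulate_gene(alpha, delta, sigma1, sigma2, beta1, beta2, rng):
    """Draw replicate log-expression values for one gene under two conditions.

    Condition 1 mean: alpha - delta/2 + beta1[r]
    Condition 2 mean: alpha + delta/2 + beta2[r]
    beta1, beta2 are per-replicate array effects, treated as known here.
    """
    y1 = [rng.gauss(alpha - 0.5 * delta + b, sigma1) for b in beta1]
    y2 = [rng.gauss(alpha + 0.5 * delta + b, sigma2) for b in beta2]
    return y1, y2

rng = random.Random(1)
# Hypothetical values: 3 replicates per condition, array effects summing to zero.
y1, y2 = simulate_gene(alpha=7.0, delta=1.0, sigma1=0.2, sigma2=0.2,
                       beta1=[0.1, -0.1, 0.0], beta2=[0.0, 0.05, -0.05], rng=rng)
```

With both standard deviations set to zero the draws collapse onto the two condition means, which makes the sign convention on the differential effect easy to check.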

9
Priors for gene effects
  • Mean effect $\alpha_g$
  • $\alpha_g \sim \mathrm{Unif}$ (much wider than data range)
  • Differential effect $\delta_g$
  • $\delta_g \sim N(0, 10^4)$, fixed effects (no structure
    in prior)
  • OR mixture
  • $\delta_g \sim \pi_0 \delta_0 + \pi_1 G_-(1.5, \eta_1) + \pi_2 G_+(1.5, \eta_2)$
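
A rough Python sketch of drawing from the three-component mixture prior above: a point mass at zero plus a negative and a positive Gamma component, each with shape 1.5. The function name `sample_delta`, the weights, and the reading of the second Gamma argument as a rate (the BUGS convention) are assumptions made for illustration.

```python
import random

def sample_delta(pi0, pi1, eta1, eta2, rng):
    """Draw one differential effect from the mixture prior.

    pi0: weight of the point mass at zero (not differentially expressed)
    pi1: weight of the negative Gamma component (under-expressed)
    1 - pi0 - pi1: weight of the positive Gamma component (over-expressed)
    eta1, eta2: assumed to be rate parameters, so scale = 1 / eta
    """
    u = rng.random()
    if u < pi0:
        return 0.0
    if u < pi0 + pi1:
        return -rng.gammavariate(1.5, 1.0 / eta1)
    return rng.gammavariate(1.5, 1.0 / eta2)

rng = random.Random(0)
# Hypothetical weights: 90% null, 5% down, 5% up.
draws = [sample_delta(0.9, 0.05, 2.0, 2.0, rng) for _ in range(1000)]
```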

10
References
  • Fixed Effects
  • Kerr et al. 2000
  • Mixture Models
  • Newton et al. 2004 (non-parametric mixture)
  • Lönnstedt and Speed 2003, Smyth 2004
    (conjugate mixture prior)
  • Broet et al. 2002 (several levels of DE)

11
Prior for gene variances
  • Two extreme cases
  • (1) Constant variance: $\epsilon_{gsr} \sim N(0, \sigma^2)$
  • Too stringent: poor fit
  • (2) Independent variances: $\epsilon_{gsr} \sim N(0, \sigma_g^2)$
  • Variance estimates based on few replications
    are highly variable
  • Need to share information between genes to
    better estimate their variance, while allowing
    some variability: hierarchical model

12
Prior for gene variances
  • 2nd level
  • $\sigma_{gs}^2 \mid \mu_s, \tau_s \sim \mathrm{logNormal}(\mu_s, \tau_s)$
  • Hyper-parameters $\mu_s$ and $\tau_s$ can be influential.
  • Empirical Bayes
  • E.g. Lönnstedt and Speed 2003, Smyth 2004
  • Fixes $\mu_s$, $\tau_s$
  • Fully Bayesian
  • 3rd level
  • $\mu_s \sim N(c, d)$, $\tau_s \sim \mathrm{Gamma}(e, f)$

13
Gene specific variances are stabilised
  • Variances estimated using information from all G
    x R measurements (12000 x 3) rather than just 3
  • Variances stabilised and shrunk towards average
    variance

14
Prior for array effects (Normalization)
  • Spline Curve
  • $\beta_{r(g)s}$ = quadratic in $\alpha_g$ for $a_{rs(k-1)} \le \alpha_g \le a_{rs(k)}$
  • with coeff $(b_{rsk}^{(1)}, b_{rsk}^{(2)})$, k = 1, ..., #breakpoints

Locations of break points not fixed
Must do sensitivity checks on break points
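
A piecewise-quadratic array effect of this kind can be evaluated with a short Python sketch. It is a simplification: break points are treated as fixed, and each interval carries an explicit constant term alongside the linear and quadratic coefficients, whereas the slide lists only two coefficients per interval. The name `array_effect` and all numbers are hypothetical.

```python
import bisect

def array_effect(alpha, breakpoints, coeffs):
    """Evaluate a piecewise quadratic function of the gene effect alpha.

    breakpoints: sorted values a_0 < a_1 < ... < a_K
    coeffs[k]: (b0, b1, b2) used on the interval [a_k, a_{k+1}),
               applied to the local coordinate x = alpha - a_k
    Values outside the range are evaluated with the nearest interval.
    """
    k = bisect.bisect_right(breakpoints, alpha) - 1
    k = min(max(k, 0), len(coeffs) - 1)
    b0, b1, b2 = coeffs[k]
    x = alpha - breakpoints[k]
    return b0 + b1 * x + b2 * x * x

# Hypothetical spline with one interior break point at 5.
breakpoints = [0.0, 5.0, 10.0]
coeffs = [(0.0, 1.0, 0.0), (5.0, 0.0, 1.0)]
effect_low = array_effect(2.0, breakpoints, coeffs)
effect_high = array_effect(7.0, breakpoints, coeffs)
```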
15
Array effect as a function of gene effect
16
Effect of normalisation on density
Before ($y_{gsr}$)

After ($y_{gsr} - \beta_{r(g)s}$)
17
Bayesian hierarchical model for differential
expression
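  • 1st level
  • $y_{gsr} \mid \alpha_g, \delta_g, \sigma_{gs} \sim N(\alpha_g \mp \tfrac{1}{2}\delta_g + \beta_{r(g)s},\ \sigma_{gs}^2)$
    (minus for s = 1, plus for s = 2)
  • 2nd level
  • Fixed effect priors for $\alpha_g$, $\delta_g$
  • Array effect coefficients, Normal and Uniform
  • $\sigma_{gs}^2 \mid \mu_s, \tau_s \sim \mathrm{logNormal}(\mu_s, \tau_s)$
  • 3rd level
  • $\mu_s \sim N(c, d)$
  • $\tau_s \sim \mathrm{Gamma}(e, f)$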

18
WinBUGS software for fitting Bayesian models
  • Declare the model

WinBUGS does the calculations
for (i in 1:ngenes) {
  for (j in 1:nreps) {
    y1[i, j] ~ dnorm(x1[i, j], tau1[i])
    x1[i, j] <- alpha[i] - 0.5 * delta[i] + beta1[i, j]
  }
}
for (i in 1:ngenes) {
  tau1[i] <- 1.0 / sig21[i]
  sig21[i] <- exp(lsig21[i])
  lsig21[i] ~ dnorm(mm1, tt1)
}
mm1 ~ dnorm(0.0, 1.0E-3)
tt1 ~ dgamma(0.01, 0.01)
19
WinBUGS software for fitting Bayesian models
  • Whole posterior distribution
  • Posterior means, medians, quantiles

20
  • Bayesian Hierarchical Model for Differential
    Expression
  • Decision Rules
  • Predictive Model Checks
  • Simultaneous estimation of normalization and
    differential expression
  • Gene Ontology analysis for differentially
    expressed genes

21
Decision Rules for Inference
  • So far, discussed fitting the model.
  • How do we decide which genes are differentially
    expressed?
  • Parameters of interest: $\alpha_g$, $\delta_g$, $\sigma_g$
  • What quantity do we consider: $\delta_g$, $(\delta_g / \sigma_g)$, ...?
  • How do we summarize the posterior distribution?

22
Fixed Effects Model
  • Inference on $\delta$
  • (1) $\hat{\delta}_g = E(\delta_g \mid \text{data})$, posterior mean
  • Like point estimate of log fold change.
  • Decision Rule: gene g is DE if $|\hat{\delta}_g| > \delta_{cut}$
  • (2) $p_g = P(|\delta_g| > \delta_{cut} \mid \text{data})$,
    posterior probability (incorporates
    uncertainty)
  • Decision Rule: gene g is DE if $p_g > p_{cut}$
  • This allows biologist to specify what size of
    effect is interesting (not just statistical
    significance)
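
Both decision rules can be approximated from MCMC output. The sketch below assumes a two-sided rule on the absolute differential effect and uses a made-up list of posterior draws for a single gene; `posterior_mean` and `prob_exceeds` are hypothetical helper names.

```python
import math

def posterior_mean(samples):
    """Rule (1): Monte Carlo estimate of E(delta_g | data)."""
    return sum(samples) / len(samples)

def prob_exceeds(samples, cut):
    """Rule (2): fraction of draws with |delta_g| > cut,
    a Monte Carlo estimate of P(|delta_g| > cut | data)."""
    return sum(1 for d in samples if abs(d) > cut) / len(samples)

# Hypothetical posterior draws of delta_g for one gene.
draws = [0.9, 1.1, 0.6, 1.3, 0.8]
delta_cut = math.log(2)   # effect size of interest: a two-fold change
p_cut = 0.8               # hypothetical probability threshold

delta_hat = posterior_mean(draws)
p_g = prob_exceeds(draws, delta_cut)
is_de = p_g > p_cut
```

In practice the list of draws would hold thousands of MCMC iterations per gene; five values are used here only to keep the arithmetic visible.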

23
Fixed Effects Model
  • Inference on $\delta$, $\sigma$
  • (1) $t_g = E(\delta_g \mid \text{data}) / E(\sigma_g \mid \text{data})$
  • Like t-statistic.
  • Decision Rule: gene g is DE if $|t_g| > t_{cut}$
  • (2) $p_g = P(|\delta_g / \sigma_g| > t_{cut} \mid \text{data})$
  • Decision Rule: gene g is DE if $p_g > p_{cut}$
  • Bochkina and Richardson (in preparation)

24
Mixture Model
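  • $\delta_g \sim \pi_0 \delta_0 + \pi_1 G_-(1.5, \eta_1) + \pi_2 G_+(1.5, \eta_2)$

(1) $\hat{\delta}_g = E(\delta_g \mid \text{data})$, posterior mean
Shrunk estimate of log fold change.
Decision Rule: gene g is DE if $|\hat{\delta}_g| > \delta_{cut}$
(2) Classify genes into the mixture components.
$p_g = P(\text{gene } g \text{ not in } H_0 \mid \text{data})$
Decision Rule: gene g is DE if $p_g > p_{cut}$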
25
Illustration of decision rule
$p_g = P(|\delta_g| > \log(2) \text{ and } \alpha_g > 4 \mid \text{data})$
Plot markers: × for $p_g > 0.8$; second marker for t-statistic > 2.78 (95% CI)
26
  • Bayesian Hierarchical Model for Differential
    Expression
  • Decision Rules
  • Predictive Model Checks
  • Simultaneous estimation of normalization and
    differential expression
  • Gene Ontology analysis for differentially
    expressed genes

27
Bayesian P-values
  • Compare observed data to a null distribution
  • P-value: probability of an observation from the
    null distribution being more extreme than the
    actual observation
  • If all observations come from the null
    distribution, the distribution of p-values is
    Uniform

28
Cross-validation p-values
Idea of cross validation is to split the data: one part for fitting the model, the rest for validation.
n units of observation.
For each observation $y_i$, run model on rest of data $y_{-i}$, predict new data $y_i^{new}$ from posterior distribution.
Bayesian p-value: $p_i = \mathrm{Prob}(y_i^{new} > y_i \mid \text{data } y_{-i})$
Distribution of p-values $\{p_i,\ i = 1, \dots, n\}$ is approximately Uniform if model adequately describes the data.
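
One simple numerical summary of how far a set of p-values is from Uniform is the Kolmogorov-Smirnov distance between their empirical distribution and the Uniform(0, 1) distribution. This is an illustrative choice, not the check used in the talk (which shows plots), and `ks_distance_uniform` is a hypothetical name.

```python
def ks_distance_uniform(pvalues):
    """Largest gap between the empirical CDF of the p-values
    and the Uniform(0, 1) CDF, F(x) = x."""
    xs = sorted(pvalues)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        # Compare F(x) = x with the empirical CDF just after and just before x.
        d = max(d, (i + 1) / n - x, x - i / n)
    return d

# Evenly spread p-values give a small distance ...
even = ks_distance_uniform([0.125, 0.375, 0.625, 0.875])
# ... while p-values piled up at zero give the maximum distance of 1.
piled = ks_distance_uniform([0.0, 0.0, 0.0, 0.0])
```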
29
Posterior Predictive p-values
For large n, not possible to run model n times.
Run model on all data.
For each observation $y_i$, predict new data $y_i^{new}$ from posterior distribution.
Bayesian p-value: $p_i = \mathrm{Prob}(y_i^{new} > y_i \mid \text{all data})$
"All data" includes $y_i$, so p-values are less extreme than they should be: p-values are conservative (not quite Uniform).
30
Example Check priors on gene variances
  1. Compare equal and exchangeable variance models
  2. Compare different exchangeable priors

Want to compare data for each gene, not gene and replicate, so use sample variance $S_g^2$ (suppress index s here)
Bayesian p-value: $\mathrm{Prob}(S_g^{2\,new} > S_g^{2\,obs} \mid \text{data})$
31
WinBUGS code for posterior predictive checks
Steps: replicate relevant sampling distribution; calculate sample variances; count no. times predicted sample variance is bigger than observed sample variance
for (i in 1:ngenes) {
  for (j in 1:nreps) {
    y1[i, j] ~ dnorm(x1[i, j], tau1[i])
    ynew1[i, j] ~ dnorm(x1[i, j], tau1[i])
    x1[i, j] <- alpha[i] - 0.5 * delta[i] + beta1[i, j]
  }
  s21[i] <- pow(sd(y1[i, ]), 2)
  s2new1[i] <- pow(sd(ynew1[i, ]), 2)
  pval1[i] <- step(s2new1[i] - s21[i])
}
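
The same check can be written outside WinBUGS. The Python sketch below assumes posterior draws of the replicate means and the standard deviation for one gene are already available as a list of pairs; the function name `ppc_pvalue` and that data layout are assumptions made for illustration.

```python
import random
import statistics

def ppc_pvalue(y_obs, posterior_draws, rng):
    """Posterior predictive p-value for the sample variance of one gene.

    y_obs: observed replicate values for the gene
    posterior_draws: list of (means, sd) pairs, one per MCMC iteration,
                     where means has one entry per replicate
    Returns the fraction of iterations in which the variance of the
    replicated data is at least as large as the observed variance.
    """
    s2_obs = statistics.variance(y_obs)
    exceed = 0
    for means, sd in posterior_draws:
        y_new = [rng.gauss(m, sd) for m in means]
        # Mirrors step(s2new - s2) in the BUGS code: 1 when the difference is >= 0.
        if statistics.variance(y_new) >= s2_obs:
            exceed += 1
    return exceed / len(posterior_draws)

rng = random.Random(0)
# Hypothetical posterior draws: 200 iterations, 3 replicates, sd 0.3.
draws = [([7.0, 7.1, 6.9], 0.3)] * 200
pval = ppc_pvalue([7.2, 6.8, 7.0], draws, rng)
```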
32
Posterior predictive
Graph shows structure of model
33
Mixed predictive
Less conservative than posterior predictive (Marshall and Spiegelhalter, 2003)
34
Four models for gene variances
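Equal variance model:
Model 1: $\sigma^2 \sim \mathrm{logNormal}(0, 10000)$
Exchangeable variance models:
Model 2: $\sigma_g^{-2} \sim \mathrm{Gamma}(2, \beta)$
Model 3: $\sigma_g^{-2} \sim \mathrm{Gamma}(\alpha, \beta)$
Model 4: $\sigma_g^2 \sim \mathrm{logNormal}(\mu, \tau)$
($\alpha$, $\beta$, $\mu$, $\tau$ all parameters)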
35
Bayesian predictive p-values
36
  • Bayesian Hierarchical Model for Differential
    Expression
  • Decision Rules
  • Predictive Model Checks
  • Simultaneous estimation of normalization and
    differential expression
  • Gene Ontology analysis for differentially
    expressed genes

37
Expression level dependent normalization
Many gene expression data sets need normalization
which depends on expression level. Usually
normalization is performed in a pre-processing
step before the model for differential expression
is used. These analyses ignore the fact that the
expression level is measured with
variability. Ignoring this variability leads to
bias in the function used for normalization.
38
Simulated Data
Gene variances: similar range and distribution to mouse data
Array effects: cubic functions of expression level
Differential effects:
  • 900 genes: $\delta_g = 0$
  • 50 genes: $\delta_g \sim N(\log(3), 0.1^2)$
  • 50 genes: $\delta_g \sim N(-\log(3), 0.1^2)$
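
The differential effects for such a simulation can be generated as follows. The sketch reads the garbled variance on the slide as a standard deviation of 0.1 (an assumption), and `simulate_deltas` is a hypothetical name.

```python
import math
import random

def simulate_deltas(rng, n_null=900, n_up=50, n_down=50, sd=0.1):
    """Return a list of true differential effects: n_null zeros,
    n_up draws centred on log(3) and n_down draws centred on -log(3)."""
    null = [0.0] * n_null
    up = [rng.gauss(math.log(3), sd) for _ in range(n_up)]
    down = [rng.gauss(-math.log(3), sd) for _ in range(n_down)]
    return null + up + down

deltas = simulate_deltas(random.Random(42))
truly_de = [d != 0.0 for d in deltas]   # truth labels for later error-rate checks
```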
39
Array Effects and Variability for Simulated Data
40
Two-step method (using loess)
  1. Use loess smoothing to obtain array effects
    $\beta^{loess}_{r(g)s}$
  2. Subtract loess array effects from data:
    $y^{loess}_{gsr} = y_{gsr} - \beta^{loess}_{r(g)s}$
  3. Run our model on $y^{loess}_{gsr}$ with no array effects

41
Decision rules for selecting differentially
expressed genes
If $P(|\delta_g| > \delta_{cut} \mid \text{data}) > p_{cut}$ then gene g is called differentially expressed.
$\delta_{cut}$ chosen according to biological hypothesis of interest (here we use log(3)).
$p_{cut}$ corresponds to the error rate (e.g. False Discovery Rate or Mis-classification Penalty) considered acceptable.
42
Full model v. two-step method
Plot observed False Discovery Rate against $p_{cut}$ (averaged over 5 simulations)
Solid line for full model
Dashed line for pre-normalized method
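
For simulated data the truth is known, so the observed False Discovery Rate at a given threshold is the share of called genes that are not truly differentially expressed. A minimal sketch, with hypothetical names and made-up inputs:

```python
def observed_fdr(p, is_truly_de, p_cut):
    """Observed FDR among genes called DE (posterior probability > p_cut).

    p: posterior probabilities, one per gene
    is_truly_de: matching list of booleans from the simulation truth
    Returns 0.0 when no gene is called.
    """
    called = [truth for pg, truth in zip(p, is_truly_de) if pg > p_cut]
    if not called:
        return 0.0
    false_calls = sum(1 for truth in called if not truth)
    return false_calls / len(called)

# Hypothetical posterior probabilities and truth labels for four genes.
p = [0.9, 0.85, 0.4, 0.95]
truth = [True, False, True, True]
fdr_curve = [(cut, observed_fdr(p, truth, cut)) for cut in (0.5, 0.8, 0.9)]
```

Evaluating the function over a grid of thresholds gives the kind of curve plotted on this slide.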
43
Different two-step methods
  • $y^{loess}_{gsr} = y_{gsr} - \beta^{loess}_{r(g)s}$
  • $y^{model}_{gsr} = y_{gsr} - E(\beta_{r(g)s} \mid \text{data})$
  • Results from 2 different two-step methods are
    much closer to each other than to full model
    results.

44
  • Bayesian Hierarchical Model for Differential
    Expression
  • Decision Rules
  • Predictive Model Checks
  • Simultaneous estimation of normalization and
    differential expression
  • Gene Ontology analysis for differentially
    expressed genes

45
Gene Ontology (GO)
Database of biological terms
Arranged in graph connecting related terms
Directed Acyclic Graph: links indicate more specific terms
16,000 terms
from QuickGO website (EBI)
46
Gene Ontology (GO)
from QuickGO website (EBI)
47
Gene Annotations
  • Genes/proteins annotated to relevant GO terms
  • Gene may be annotated to several GO terms
  • GO term may have 1000s of genes annotated to it
    (or none)
  • Gene annotated to term A ⇒ annotated to all
    ancestors of A (terms that are related and more
    general)
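
The ancestor rule in the last bullet amounts to a graph traversal. The sketch below assumes the ontology is stored as a dictionary from each term to its direct parents; the term and gene names are placeholders, not real GO identifiers.

```python
def ancestors(term, parents):
    """All terms reachable from `term` by following parent links in the DAG."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def propagate(annotations, parents):
    """Extend each gene's direct annotations with all ancestor terms."""
    full = {}
    for gene, terms in annotations.items():
        closed = set(terms)
        for term in terms:
            closed |= ancestors(term, parents)
        full[gene] = closed
    return full

# Placeholder ontology: 'specific' has two parents, both children of 'root'.
parents = {"specific": ["general_a", "general_b"],
           "general_a": ["root"], "general_b": ["root"], "root": []}
full_annotations = propagate({"gene1": ["specific"], "gene2": ["general_a"]}, parents)
```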

48
GO annotations of genes associated with the
insulin-resistance gene Cd36
Compare GO annotations of genes most and least differentially expressed
Most differentially expressed: $p_g > 0.5$ (280 genes)
Least differentially expressed: $p_g < 0.2$ (11171 genes)
49
GO annotations of genes associated with the
insulin-resistance gene Cd36
For each GO term, Fisher's exact test on proportion of differentially expressed genes with annotations v. proportion of non-differentially expressed genes with annotations.
observed O = A, expected E = C(A+B)/(C+D) if no association of GO annotation with DE
FatiGO website: http://fatigo.bioinfo.cnio.es/
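
The expected count and a one-sided version of the test can be computed with the standard library. The letters are not defined in the transcript, so this sketch assumes A and B are the numbers of annotated genes in the DE and non-DE lists, and C and D are the sizes of those two lists. FatiGO's own procedure may differ (for example by using a two-sided test).

```python
from math import comb

def expected_count(A, B, C, D):
    """Expected number of annotated DE genes if annotation is unrelated to DE:
    (size of DE list) * (overall proportion of annotated genes)."""
    return C * (A + B) / (C + D)

def fisher_over_representation(A, B, C, D):
    """One-sided Fisher exact p-value: probability of seeing A or more
    annotated genes among the C DE genes, from the hypergeometric distribution."""
    total = C + D          # all genes
    annotated = A + B      # all annotated genes
    upper = min(annotated, C)
    tail = sum(comb(annotated, k) * comb(total - annotated, C - k)
               for k in range(A, upper + 1))
    return tail / comb(total, C)

# Tiny made-up example: 2 DE genes (both annotated), 2 non-DE genes (none annotated).
E = expected_count(2, 0, 2, 2)
pval = fisher_over_representation(2, 0, 2, 2)
```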
50
GO annotations of genes associated with the
insulin-resistance gene Cd36
O = observed no. differentially expressed genes
E = expected no. differentially expressed genes
51
All GO ancestors of Inflammatory response
This term was not accessed by FatiGO
Relations between GO terms were found using QuickGO: http://www.ebi.ac.uk/ego/
52
Further Work to do on GO
  • Account for dependencies between GO terms
  • Multiple testing corrections
  • Uncertainty in annotation
  • ( work in preparation )

53
Summary
  • Bayesian hierarchical model flexible, estimates
    variances robustly
  • Predictive model checks show exchangeable prior
    good for gene variances
  • Useful to find GO terms over-represented in the
    most differentially-expressed genes
  • Paper available (Lewin et al. 2005, Biometrics,
    in press)
  • http://www.bgx.org.uk/

55
Decision Rules
  • In full Bayesian framework, introduce latent
    allocation variable $z_g = 0, 1$ for gene g in null,
    alternative
  • For each gene, calculate posterior probability of
    belonging to unmodified component: $p_g = \Pr(z_g = 0 \mid \text{data})$
  • Classify using cut-off on $p_g$ (Bayes rule
    corresponds to 0.5)
  • For any given $p_g$, can estimate FDR, FNR.

For gene-list S, est. (FDR | data) = $\sum_{g \in S} p_g / |S|$
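
The FDR estimate for a gene list is the average posterior null probability over the list. The sketch assumes the list S is formed by taking genes whose null probability falls below the cut-off; the function name and the probabilities are made up.

```python
def estimated_fdr(p_null, p_cut=0.5):
    """Estimate FDR for the list S of genes with null probability below p_cut.

    p_null: posterior probabilities Pr(z_g = 0 | data), one per gene
    Returns the mean of p_null over S, or 0.0 when S is empty.
    """
    selected = [p for p in p_null if p < p_cut]
    if not selected:
        return 0.0
    return sum(selected) / len(selected)

# Hypothetical null probabilities for four genes; three fall below 0.5.
fdr_hat = estimated_fdr([0.1, 0.3, 0.9, 0.05])
```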
56
The Null Hypothesis
Composite Null
Point Null, alternative not modelled
Point Null, alternative modelled