PPT – b PowerPoint presentation | free to download

About This Presentation

Title:

b

Description:

... signal for SHR much lower than for Brown Norway and SHR.4 control strains ... 3 SHR compared with 3 transgenic rats (with Cd36) ... – PowerPoint PPT presentation

Number of Views:40

Avg rating:3.0/5.0

Slides: 57

Provided by: richar509

Category:

Tags:

more less

Transcript and Presenter's Notes

Title: b

1
Bayesian Modelling for Differential Gene
Expression
Alex Lewin (Imperial College) Sylvia Richardson
(IC Epidemiology) Tim Aitman (IC Microarray
Centre) In collaboration with Anne-Mette Hein,
Natalia Bochkina (IC Epidemiology) Helen Causton
(IC Microarray Centre) Peter Green (Bristol)
2
Insulin-resistance gene Cd36
cDNA microarray hybridisation signal for SHR
much lower than for Brown Norway and SHR.4
control strains Aitman et al 1999, Nature
Genet 2176-83
3
Larger microarray experiment look for other
genes associated with Cd36
Microarray Data 3 SHR compared with 3 transgenic
rats (with Cd36) 3 wildtype (normal) mice
compared with 3 mice with Cd36 knocked out ?
12000 genes on each array Biological
Question Find genes which are expressed
differently between animals with and without Cd36.
4

Bayesian Hierarchical Model for Differential
Expression
Decision Rules
Predictive Model Checks
Simultaneous estimation of normalization and
differential expression
Gene Ontology analysis for differentially
expressed genes

5
Microarray analysis is amulti-step process
We aim to integrate all the steps in a common
statistical framework
6
Bayesian Modelling Framework

Model different sources of variability
simultaneously,
within array, between array
Uncertainty propagated from data to parameter
estimates (so not over-optimistic in
conclusions).
Share information in appropriate ways to get
robust estimates.

7
Gene Expression Data
3 wildtype mice, Fat tissue hybridised to
Affymetrix chips
Newton et al. 2001 Showed data fit well by Gamma
or Log Normal distributions Kerr et al.
2000 Linear model on log scale
8
Bayesian hierarchical model for differential
expression

Data ygsr log expression for gene g, condition
s, replicate r
?g gene effect
dg differential effect for gene g between 2
conditions
?r(g)s array effect (expression-level
dependent)
?gs2 gene variance
1st level
yg1r ?g, dg, ?g1 ? N(?g ½ dg ?r(g)1 ,
?g12),
yg2r ?g, dg, ?g2 ? N(?g ½ dg ?r(g)2 ,
?g22),
Sr ?r(g)s 0
?r(g)s function of ?g , parameters a
and b

9
Priors for gene effects

Mean effect ?g
?g Unif (much wider than data range)
Differential effect dg
dg N(0,104) fixed effects (no structure
in prior)
OR mixture
dg p0d0 p1G_ (1.5, ?1) p2G (1.5, ?2)

10
References

Fixed Effects
Kerr et al. 2000
Mixture Models
Newton et al. 2004 (non-parametric mixture)
Löenstedt and Speed 2003, Smyth 2004
(conjugate mixture prior)
Broet et al. 2002 (several levels of DE)

11
Prior for gene variances

Two extreme cases
(1) Constant variance ?gsr ? N(0, ?2)
Too stringent Poor fit
(2) Independent variances ?gsr ? N(0, ?g2)
! Variance estimates based on few replications
are highly variable
Need to share information between genes to
better estimate their variance, while allowing
some variability
Hierarchical
model

12
Prior for gene variances

2nd level
?gs2 µs, ts ? logNormal (µs, ts)
Hyper-parameters µs and ts can be influential.
Empirical Bayes
Eg. Löenstedt and Speed 2003, Smyth 2004
Fixes µs , ts
Fully Bayesian
3rd level
µs ? N( c, d) ts ? Gamma (e, f)

13
Gene specific variances are stabilised

Variances estimated using information from all G
x R measurements (12000 x 3) rather than just 3
Variances stabilised and shrunk towards average
variance

14
Prior for array effects (Normalization)

Spline Curve
?r(g)s quadratic in ?g for ars(k-1) ?g
ars(k)
with coeff (brsk(1), brsk(2) ), k 1,
breakpoints

Locations of break points not fixed Must do
sensitivity checks on break points
15
Array effect as a function of gene effect
16
Effect of normalisation on density
Before (ygsr)

After (ygsr- ?r(g)s )
17
Bayesian hierarchical model for differential
expression

1st level
ygsr ?g, dg, ?gs ? N(?g ½ dg ?r(g)s ,
?gs2),
2nd level
Fixed effect priors for ?g, dg
Array effect coefficients, Normal and Uniform
?gs2 µs, ts ? logNormal (µs, ts)
3rd level
µs ? N( c, d)
ts ? Gamma (e, f)

18
WinBUGS software for fitting Bayesian models

Declare the model

WinBUGS does the calculations
for( i in 1 ngenes ) for( j in 1 nreps)
y1i, j dnorm(x1i, j, tau1i)
x1i, j lt- alphai - 0.5deltai
beta1i, j for( i
in 1 ngenes ) tau1i lt- 1.0/sig21i
sig21i lt- exp(lsig21i) lsig21i
dnorm(mm1,tt1) mm1 dnorm( 0.0,1.0E-3) tt1
dgamma(0.01,0.01)
19
WinBUGS software for fitting Bayesian models

Whole posterior distribution
Posterior means, medians, quantiles

Bayesian Hierarchical Model for Differential
Expression
Decision Rules
Predictive Model Checks
Simultaneous estimation of normalization and
differential expression
Gene Ontology analysis for differentially
expressed genes

21
Decision Rules for Inference

So far, discussed fitting the model.
How do we decide which genes are differentially
expressed?
Parameters of interest ?g , dg , ?g
What quantity do we consider, dg , (dg /?g) , ?
How do we summarize the posterior distribution?

22
Fixed Effects Model

Inference on d
(1) dg E(dg data) posterior mean
Like point estimate of log fold change.
Decision Rule gene g is DE if dg gt dcut
(2) pg P( dg gt dcut data)
posterior probability (incorporates
uncertainty)
Decision Rule gene g is DE if pg gt pcut
This allows biologist to specify what size of
effect is interesting (not just statistical
significance)

23
Fixed Effects Model

Inference on d, ?
(1) tg E(dg data) / E(?g data)
Like t-statistic.
Decision Rule gene g is DE if tg gt tcut
(2) pg P( dg /?g gt tcut data)
Decision Rule gene g is DE if pg gt pcut
Bochkina and Richardson (in preparation)

24
Mixture Model

dg p0d0 p1G_ (1.5, ?1) p2G (1.5, ?2)

(1) dg E(dg data) posterior mean Shrunk
estimate of log fold change. Decision Rule
gene g is DE if dg gt dcut (2) Classify genes
into the mixture components. pg P(gene g not
in H0 data) Decision Rule gene g is DE if pg
gt pcut
25
Illustration of decision rule
pg P( dg gt log(2) and ?g gt 4 data) x
pg gt 0.8 ? t-statistic gt 2.78 (95 CI)
26

Bayesian Hierarchical Model for Differential
Expression
Decision Rules
Predictive Model Checks
Simultaneous estimation of normalization and
differential expression
Gene Ontology analysis for differentially
expressed genes

27
Bayesian P-values

Compare observed data to a null distribution
P-value probability of an observation from the
null distribution being more extreme than the
actual observation
If all observations come from the null
distribution, the distribution of p-values is
Uniform

28
Cross-validation p-values
Idea of cross validation is to split the data
one part for fitting the model, the rest for
validation n units of observation For each
observation yi, run model on rest of data y-i,
predict new data yinew from posterior
distribution.
Bayesian p-value pi Prob(yinew gt yi data y-i)
Distribution of p-values pi, i1,,n is
approximately Uniform if model adequately
describes the data.
29
Posterior Predictive p-values
For large n, not possible to run model n
times. Run model on all data. For each
observation yi, predict new data yinew from
posterior distribution.
Bayesian p-value pi Prob(yinew gt yi all data)
all data includes yi p-values are less extreme
than they should be p-values are
conservative (not quite Uniform).
30
Example Check priors on gene variances

Compare equal and exchangeable variance models
Compare different exchangeable priors

Want to compare data for each gene, not gene and
replicate, so use sample variance Sg2 (suppress
index s here)
Bayesian p-value Prob( Sg2 new gt Sg2 obs data)
31
WinBUGS code for posterior predictive checks
replicate relevant sampling distribution calculat
e sample variances count no. times predicted
sample variance is bigger than observed sample
variance
for( i in 1 ngenes ) for( j in 1 nreps)
y1i, j dnorm(x1i, j, tau1i)
ynew1i, j dnorm(x1i, j, tau1i)
x1i, j lt- alphai - 0.5deltai
beta1i, j s21i
lt- pow(sd(y1i, ), 2) s2new1i lt-
pow(sd(ynew1i, ), 2) pval1i lt-
step(s2new1i - s21i)
32
Posterior predictive
Graph shows structure of model
33
Less conservative than posterior
predictive (Marshall and Spiegelhalter, 2003)
Mixed predictive
34
Four models for gene variances
Equal variance model Model 1 ?2 ? log Normal
(0, 10000) Exchangeable variance models Model
2 ?g-2 ? Gamma (2, ß) Model 3 ?g-2 ? Gamma
(a, ß) Model 4 ?g2 ? log Normal (µ, t) (a, ß,
µ, t all parameters)
35
Bayesian predictive p-values
36

Bayesian Hierarchical Model for Differential
Expression
Decision Rules
Predictive Model Checks
Simultaneous estimation of normalization and
differential expression
Gene Ontology analysis for differentially
expressed genes

37
Expression level dependent normalization
Many gene expression data sets need normalization
which depends on expression level. Usually
normalization is performed in a pre-processing
step before the model for differential expression
is used. These analyses ignore the fact that the
expression level is measured with
variability. Ignoring this variability leads to
bias in the function used for normalization.
38
Simulated Data
Gene variances similar range and distribution to
mouse data Array effects cubic functions of
expression level Differential effects 900
genes dg 0 50 genes dg ? N( log(3),
0.12) 50 genes dg ? N( -log(3), 0.12)
39
Array Effects and Variability for Simulated Data
40
Two-step method (using loess)

Use loess smoothing to obtain array effects
?loessr(g)s
Subtract loess array effects from data
yloessgsr ygsr - ?loessr(g)s
Run our model on yloessgsr with no array effects

41
Decision rules for selecting differentially
expressed genes
If P( dg gt dcut data) gt pcut then gene g is
called differentially expressed. dcut chosen
according to biological hypothesis of interest
(here we use log(3) ). pcut corresponds to the
error rate (e.g. False Discovery Rate or
Mis-classification Penalty) considered acceptable.
42
Full model v. two-step method
Plot observed False Discovery Rate against pcut
(averaged over 5 simulations) Solid line for
full model Dashed line for pre-normalized method
43
Different two-step methods

yloessgsr ygsr - ?loessr(g)s
ymodelgsr ygsr - E(?r(g)s data)
Results from 2 different two-step methods are
much closer to each other than to full model
results.

Bayesian Hierarchical Model for Differential
Expression
Decision Rules
Predictive Model Checks
Simultaneous estimation of normalization and
differential expression
Gene Ontology analysis for differentially
expressed genes

45
Gene Ontology (GO)
Database of biological terms Arranged in graph
connecting related terms Directed Acyclic Graph
links indicate more specific terms 16,000 terms
from QuickGO website (EBI)
46
Gene Ontology (GO)
from QuickGO website (EBI)
47
Gene Annotations

Genes/proteins annotated to relevant GO terms
Gene may be annotated to several GO terms
GO term may have 1000s of genes annotated to it
(or none)
Gene annotated to term A ? annotated to all
ancestors of A (terms that are related and more
general)

48
GO annotations of genes associated with the
insulin-resistance gene Cd36
Compare GO annotations of genes most and least
differentially expressed Most differentially
expressed ? pg gt 0.5 (280 genes) Least
differentially expressed ? pg lt 0.2 (11171
genes)
49
GO annotations of genes associated with the
insulin-resistance gene Cd36
For each GO term, Fishers exact test on
proportion of differentially expressed genes
with annotations v. proportion of
non-differentially expressed genes with
annotations observed O A expected E
C(AB)/(CD) if no association of GO annotation
with DE FatiGO website http//fatigo.bioinfo.cnio
.es/
50
GO annotations of genes associated with the
insulin-resistance gene Cd36
O observed no. differentially expressed genes E
expected no. differentially expressed genes
51
All GO ancestors of Inflammatory response
This term was not accessed by FatiGO Relations
between GO terms were found using
QuickGO http//www.ebi.ac.uk/ego/
52
Further Work to do on GO

Account for dependencies between GO terms
Multiple testing corrections
Uncertainty in annotation
( work in preparation )

53
Summary

Bayesian hierarchical model flexible, estimates
variances robustly
Predictive model checks show exchangeable prior
good for gene variances
Useful to find GO terms over-represented in the
most differentially-expressed genes
Paper available (Lewin et al. 2005, Biometrics,
in press)
http //www.bgx.org.uk/

54
(No Transcript)
55
Decision Rules

In full Bayesian framework, introduce latent
allocation variable zg 0,1 for gene g in null,
alternative
For each gene, calculate posterior probability of
belonging to unmodified component pg Pr( zg
0 data )
Classify using cut-off on pg (Bayes rule
corresponds to 0.5)
For any given pg , can estimate FDR, FNR.

For gene-list S, est. (FDR data) Sg ? S pg
/ S
56
The Null Hypothesis
Composite Null Point Null, alternative not
modelled Point Null, alternative modelled

Write a Comment

User Comments (0)