1
Bayesian inference in differential expression experiments
Sylvia Richardson, Natalia Bochkina, Alex Lewin
Centre for Biostatistics, Imperial College, London
Biological Atlas of Insulin Resistance
www.bgx.org.uk
BBSRC
2
Background

Investigating changes of gene expression under
different conditions is one of the key questions
in many biological experiments
Specific features of this context:
High dimensional data (tens of thousands of genes)
and few samples
Need to borrow information
Many sources of variability
Important to adopt a flexible modelling framework

Bayesian hierarchical modelling captures
important features of the data while maintaining
the generalisability of the tools and techniques
developed
3
Modelling differential expression
For each condition (Condition 1, Condition 2):
Start with given point estimates of expression
Hierarchical model of replicate variability and
array effect

Differential expression parameter
Posterior distribution (flat prior)
Mixture modelling for classification
4
Outline

Background
Bayesian hierarchical models for differential
expression experiments
Decision rules based on tail posterior
probabilities
Comparison with existing approaches
FDR estimation for tail posterior probabilities
Extension of tail posterior probabilities to
analysing multiclass experiments
Illustration
Discussion and further work

5
I -- Bayesian hierarchical model for differential
expression (Lewin et al, Biometrics, 2006)

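Data: $y_{gcr}$ = log gene expression for gene $g$,
replicate $r$, condition $c$
$\alpha_g$ = gene effect
$\delta_g$ = differential effect for gene $g$ between
the 2 conditions
$\beta_{r(g)c}$ = array effect, modelled as a smooth
(spline) function of $\alpha_g$
$\sigma_{gc}^2$ = gene-specific variance
1st level:
$y_{g1r} \sim N(\alpha_g - \tfrac{1}{2}\delta_g + \beta_{r(g)1},\ \sigma_{g1}^2)$
$y_{g2r} \sim N(\alpha_g + \tfrac{1}{2}\delta_g + \beta_{r(g)2},\ \sigma_{g2}^2)$
$\sum_r \beta_{r(g)c} = 0$, with $\beta_{r(g)c}$ a function of $\alpha_g$
with parameters $\{c, d\}$
2nd level:
Flat priors for $\alpha_g$, $\delta_g$, $\{c, d\}$
$\sigma_{gc}^2 \sim$ distribution with parameters $(a_c, b_c)$
(lognormal or inverse-gamma)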

Exchangeable variances
6
Joint modelling of array effects and differential
expression

Performs normalisation simultaneously with
estimation
Gives fewer false positives than a plug-in approach
The BHM set-up allows some of the modelling
assumptions to be checked using mixed posterior
predictive checks:
the need for gene-specific variances
their 2nd level distribution

Found that a lognormal or a 2-parameter inverse-gamma
distribution for the variances gave similar
model checks
7
Selecting genes that are differentially expressed

Interested in testing the null hypothesis $H_0: \delta_g = 0$
Two broad approaches have been used

Approach | Under H0 | Under H1 | References
P-value type | U[0,1] | close to 0 | Baldi and Long; Smyth 2004 (moderated t statistic)
Mixture, $P(H_0 \mid y_{gcr})$ | close to 1 | close to 0 | Lonnstedt and Speed 02; Newton, Kendziorski 01, 03; Lonnstedt and Britton 05; Gottardo 06

8
Bayesian mixtures

Relies on specification of a prior model for $\delta_g$
Choice of model for the alternative (see the
poster by Alex Lewin)
Could influence the performance of the
classification
Checking how well the alternative fits the data is
non-standard

Investigate properties of Bayesian selection
rules based on a non-informative prior for $\delta_g$
9
II -- Bayesian selection rules for pairwise
comparisons

1st level (no array effect)
Hierarchical model
Extend the p-value approach to consider the tail
probabilities of an appropriate function of the
parameters

10
Posterior distributions

Define the Bayesian T statistic
The following conditional distributions hold


11
Tail posterior probabilities 1 (N. Bochkina and
SR, 2006)

Use selection rules of the form $P(T_g > T^{(\alpha)} \mid \text{data}) > p_{cut}$
What statistic $T_g$ to choose?
How to define its percentiles $T^{(\alpha)}$?
we suppose that we could have observed data
with the difference in sample means equal to 0
(its expected value under the null)
work out the percentiles using posterior
distributions conditional on this value

Summarise the distribution of $T_g$ by a tail
area
12
Tail posterior probabilities 2
Recall

The corresponding distribution function involves
numerical integration → computationally expensive

But
the distribution function of the statistic under the null
does not involve gene-specific parameters

→ The percentile is easy to calculate
→ Consider the tail probability
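
The idea on this slide can be sketched numerically. The following is a minimal illustration, not the authors' implementation. It assumes no array effect, a common variance in both conditions, flat priors on the means, and an inverse-gamma prior on the variance with hypothetical hyperparameters `a` and `b`. It also assumes a two-sided rule and takes the Bayesian T statistic to be $t_g = \delta_g / \sqrt{\sigma_g^2 w}$ with $w = 1/n_1 + 1/n_2$. Under these assumptions, $t_g$ given the variance is normal with unit variance, so when the observed difference in means is 0 its percentile is a standard normal quantile that is the same for every gene.

```python
import numpy as np
from scipy import stats


def tail_posterior_probability(y1, y2, alpha=0.05, a=2.0, b=0.1,
                               n_draws=20000, seed=0):
    """Monte Carlo estimate of P(|t_g| > t_alpha | data) for one gene.

    Assumptions (not taken from the slides): equal variance in both
    conditions, flat priors on the means, inverse-gamma(a, b) prior on
    the variance, two-sided tail.
    """
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    n1, n2 = len(y1), len(y2)
    ybar = y2.mean() - y1.mean()           # observed difference in means
    w = 1.0 / n1 + 1.0 / n2                # delta | sigma2, y ~ N(ybar, sigma2 * w)
    ss = ((y1 - y1.mean()) ** 2).sum() + ((y2 - y2.mean()) ** 2).sum()

    # Posterior of the variance: inverse-gamma(a + (n1 + n2 - 2) / 2, b + ss / 2)
    rng = np.random.default_rng(seed)
    sigma2 = stats.invgamma.rvs(a + (n1 + n2 - 2) / 2, scale=b + ss / 2,
                                size=n_draws, random_state=rng)

    # Given sigma2, t = delta / sqrt(sigma2 * w) ~ N(shift, 1).
    # With ybar = 0 the shift vanishes, so the percentile is gene independent.
    t_alpha = stats.norm.ppf(1 - alpha / 2)
    shift = ybar / np.sqrt(sigma2 * w)
    tail = stats.norm.sf(t_alpha - shift) + stats.norm.cdf(-t_alpha - shift)
    return tail.mean()                     # average over the variance draws


# Hypothetical log-expression values for two genes, 3 replicates each
p_null = tail_posterior_probability([7.0, 7.1, 6.9], [7.0, 7.05, 6.95])
p_de = tail_posterior_probability([7.0, 7.1, 6.9], [9.0, 9.1, 8.9])
```

When the observed difference is exactly 0 the estimate equals `alpha`, and it moves towards 1 as the difference grows relative to the replicate variability.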
13
Key point: $F_0$ is gene-independent (conjugate case)
14
Another Bayesian rule

A natural idea is to compare the parameter $\delta_g$
to 0,
i.e. to consider $p(\delta_g, 0) = P(\delta_g > 0 \mid \text{data})$,
or its complement, or the 2-sided alternative
It turns out that this Bayesian selection rule
behaves like a p-value
The distribution of $p(\delta_g, 0)$ is uniform under H0
There is equivalence with frequentist testing
based on the marginal distribution under
the null, in the spirit of the moderated t
statistic introduced by Smyth 2004

15
Link between $p(\delta_g, 0)$ and the moderated t statistic
Moderated t statistic
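
The link can be illustrated with a short sketch. It is a hedged example under stated assumptions, not the derivation shown on the slide. It assumes a flat prior on the differential effect, a conjugate inverse-gamma prior on the variance written in Smyth's parametrisation (prior degrees of freedom `d0`, prior variance `s0_2`), and `w = 1/n1 + 1/n2`. The function and argument names are illustrative. Under these assumptions the posterior of the differential effect is a shifted and scaled Student t, so the posterior probability that it is positive is the t distribution function evaluated at the moderated t statistic, which is one minus the one-sided p-value.

```python
import numpy as np
from scipy import stats


def moderated_t(ybar, s2, d, w, d0, s0_2):
    """Moderated t statistic (Smyth 2004) and its degrees of freedom.

    ybar: observed difference in means; s2: pooled sample variance on d
    degrees of freedom; w: 1/n1 + 1/n2; d0, s0_2: prior degrees of freedom
    and prior variance (hyperparameters, assumed known here).
    """
    s2_tilde = (d0 * s0_2 + d * s2) / (d0 + d)   # shrunken variance
    return ybar / np.sqrt(s2_tilde * w), d0 + d


def posterior_prob_positive(ybar, s2, d, w, d0, s0_2):
    """P(delta > 0 | data) under a flat prior on delta and the
    inverse-gamma(d0 / 2, d0 * s0_2 / 2) prior on the variance."""
    t_mod, df = moderated_t(ybar, s2, d, w, d0, s0_2)
    return stats.t.cdf(t_mod, df)


# Hypothetical summary statistics for one gene (3 replicates per condition)
t_mod, df = moderated_t(0.2, 0.04, 4, 2 / 3, 4.0, 0.05)
p_pos = posterior_prob_positive(0.2, 0.04, 4, 2 / 3, 4.0, 0.05)
p_value_one_sided = stats.t.sf(t_mod, df)
```

Here `p_pos + p_value_one_sided` equals 1, which is the sense in which the rule behaves like a p-value.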
16
Histograms of measures of differential
expression: simulated data
17
Tail posterior probabilities 3

Investigate the performance of selection rules
based on $p(t_g, t^{(\alpha)})$
In particular:
what is the FDR associated with each value of
$p_{cut}$?
In the conjugate case
How does this rule compare to rules based on
$p(\delta_g, 0)$?

Use $F_0$
Use Storey
Use observed proportion
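
One possible reading of the three labels above is an FDR estimate of the form $\pi_0 (1 - F_0(p_{cut}))$ divided by the observed proportion of genes with tail posterior probability above $p_{cut}$, with $\pi_0$ supplied by a Storey-type estimator. The sketch below follows that reading, which is an assumption and not confirmed by the transcript. The function name and inputs are hypothetical, and $F_0$ is approximated here by an empirical sample of null values rather than by numerical integration.

```python
import numpy as np


def estimate_fdr(tpp, null_tpp, pi0, p_cut):
    """Estimated FDR for the rule 'select gene g if tpp[g] > p_cut'.

    tpp: tail posterior probabilities for all genes.
    null_tpp: values of the same quantity generated under the null,
              standing in for the null distribution function F0.
    pi0: estimated proportion of null genes (supplied by the caller).
    """
    tpp = np.asarray(tpp, dtype=float)
    null_tpp = np.asarray(null_tpp, dtype=float)
    null_tail = np.mean(null_tpp > p_cut)      # approximates 1 - F0(p_cut)
    selected = np.mean(tpp > p_cut)            # observed proportion selected
    if selected == 0:
        return 0.0                             # nothing selected, no false discoveries
    return min(1.0, pi0 * null_tail / selected)


# Hypothetical toy inputs
tpp_example = [0.99, 0.97, 0.5, 0.2, 0.1, 0.05, 0.3, 0.4, 0.6, 0.95]
null_example = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.85, 0.95]
fdr_example = estimate_fdr(tpp_example, null_example, pi0=0.7, p_cut=0.9)
```

Evaluating the function over a grid of cut-offs gives the estimated FDR curve as a function of $p_{cut}$.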
18
Comparison of estimated (solid line) and true
FDR (dashed line) on simulated data
$\pi_0 = 0.90$
$\pi_0 = 0.70$
$\pi_0 = 0.95$
19
III-- Data Sets and Biological questions

Biological Questions
Understand the mechanisms of insulin resistance
Cell line experiments in which the reaction of a mouse
muscle cell line to treatment by insulin or
metformin (an insulin replacement drug) is
observed after 2 and 12 hours
Questions of interest relate to simple and
compound comparisons
3 replicates for each condition, Affymetrix
MOE430A chip, 22690 genes per chip
Data pre-processed by RMA and normalised using
intensity-dependent LOESS normalisation

20
Volcano plots for muscle cell data: change
between insulin and control at 2 hours
$p(t_g, t^{(\alpha)})$, $\alpha = 0.05$
$2\max\{p(\delta_g, 0),\ 1 - p(\delta_g, 0)\} - 1$
Cut-off: 0.925
Peaked around zero; varies steeply as a function
of
Less peaked around zero; allows better separation
21
Insulin versus control
$\pi_0 = 0.61$
$\pi_0 = 0.98$
22
Metformin versus control
Tail posterior probabilities
2 hours
12 hours
$\pi_0 = 0.56$
$\pi_0 = 0.79$
Estimated FDR
72 selected (FDR 0.5)
1854 selected (FDR 0.5)
23
IV -- Extension to the analysis of multi class data

In our case study, 3 groups (control c0, insulin
c1, metformin c2) and 3 time points (t0, t1
(2 hours), t2 (12 hours)), each replicated 3
times
ANOVA-like model formulation suited to the
analysis of such multifactorial experiments

Global variance parametrisation (borrowing
information)
24
Joint tail posterior probabilities

Interest is in testing a compound null
hypothesis, i.e. one involving several differential
parameters
e.g. testing jointly for the effect of insulin
and metformin at 2 hours
In this case, we are interested in a specific
alternative
Note: rejecting the null hypothesis in an ANOVA
setting corresponds to a different alternative
Define joint tail posterior probabilities
based on the Bayesian T statistic for each
treatment

25
Benefits of joint posterior probabilities

Takes into account correlation of the
differential expression measures between the
conditions induced by sharing the same variance
parameter
Usual practice is to:
Carry out pairwise comparisons
Select genes for each comparison using the same
cut-off on the posterior probability (pp)
Intersect the lists and find genes common to both
lists
Joint pp shown to lead to fewer false positives
in this case of positive correlation (simulation
study)
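
The contrast between a joint probability and intersected pairwise lists can be sketched from posterior draws. This is an illustrative example only. It assumes the joint tail posterior probability is the posterior probability that both Bayesian T statistics exceed the same threshold in absolute value, and it uses hypothetical correlated normal draws in place of real posterior samples.

```python
import numpy as np


def joint_and_pairwise_tpp(t1, t2, t_alpha=1.96):
    """Joint and marginal tail probabilities from posterior draws.

    t1, t2: posterior draws of the Bayesian T statistic of one gene for
    two treatments (same length, draws paired by iteration).
    Returns (joint, marginal 1, marginal 2).
    """
    exceed1 = np.abs(np.asarray(t1)) > t_alpha
    exceed2 = np.abs(np.asarray(t2)) > t_alpha
    return (exceed1 & exceed2).mean(), exceed1.mean(), exceed2.mean()


# Hypothetical draws with positive correlation, as induced by a shared variance
rng = np.random.default_rng(0)
draws = rng.multivariate_normal([2.5, 2.2], [[1.0, 0.6], [0.6, 1.0]], size=50000)
joint, p1, p2 = joint_and_pairwise_tpp(draws[:, 0], draws[:, 1])

p_cut = 0.6
in_both_pairwise_lists = (p1 > p_cut) and (p2 > p_cut)  # intersect the lists
in_joint_list = joint > p_cut                           # joint rule
```

The joint probability can never exceed either marginal probability, so a gene can pass both pairwise cut-offs while failing the joint rule at the same cut-off.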

26
Correlation of DE parameters and Bayesian T
statistic for insulin and metformin (2 hours)

With joint tail posterior probabilities and a
cut-off of $p_{cut} = 0.92$, 280 genes are selected as jointly
perturbed at 2 hours
Applying pairwise comparisons and combining the
lists adds another 47 genes to the list

27
Discussion 1

Tail posterior probabilities (Tpp) are a generic
tool that can be used in any situation where a
large number of hypotheses related in a
hierarchical fashion are to be tested
We have derived the distribution of the Tpp under
the null and proposed a corresponding estimate of
FDR
This distribution requires numerical integration
but is gene-independent (conjugate case), so it only
needs to be evaluated once
Tpp is a smooth function of the amount of DE with
a gradient that spreads the genes, thus
allowing genes to be chosen with a desired level of
uncertainty about their DE
Interesting connection between Bayesian and
frequentist inference for the differential
expression parameter

28
Discussion 2

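Interesting to compare the performance of Tpp with
that of mixture models
E.g. Gamma mixtures (see poster by Alex Lewin):
$\delta_g \sim \pi_0 \delta_0 + \pi_1 G(-x \mid 1.5, \lambda_1) + \pi_2 G(x \mid 1.5, \lambda_2)$
(first component: H0; the two Gamma components: H1)
Dirichlet distribution for $(\pi_0, \pi_1, \pi_2)$
Exp(1) hyperprior for $\lambda_1$ and $\lambda_2$
Normal and t mixtures have also been considered:
$\delta_g \sim \pi_0 \delta_0 + (1-\pi_0)\, T(\nu, \mu, \tau)$, with $\mu \sim 1$ and $\tau, \nu^{-1} \sim \text{Exp}(1)$
$\delta_g \sim \pi_0 \delta_0 + (1-\pi_0)\, N(\mu, \tau)$, with $\mu \sim 1$ and $\tau \sim \text{Exp}(1)$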

29
Simulated data

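3000 variables, 6 replicates, 2 conditions
$y_{g1r} \sim N(\alpha_g,\ \sigma_g^2)$
$y_{g2r} \sim N(\alpha_g + \delta_g,\ \sigma_g^2)$
$\sigma_g^2 \sim 0.03 + \text{LogNorm}(-3.85, 0.82)$,
$\alpha_g \sim \text{Norm}(7, 25)$,
$\delta_g$ slightly asymmetric:
5%: $\delta_g \mid \delta_g > 0 \sim h(\delta_g)$,
10%: $\delta_g \mid \delta_g < 0 \sim h(-\delta_g)$,
85%: $\delta_g \sim N(0, 0.01)$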
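
A data set of this shape can be generated with a few lines of NumPy. The sketch below is an approximation under several assumptions that the transcript does not settle. The second arguments of Norm and N are read as variances, the second argument of LogNorm as the standard deviation of the log, and the variance as 0.03 plus a lognormal draw. The density h is not given, so a Gamma(2, 0.5) draw is used purely as a placeholder.

```python
import numpy as np


def simulate_expression(n_genes=3000, n_reps=6, seed=0):
    """Simulate two-condition log-expression data (illustrative only)."""
    rng = np.random.default_rng(seed)
    alpha = rng.normal(7.0, np.sqrt(25.0), n_genes)           # gene effects
    sigma2 = 0.03 + rng.lognormal(-3.85, 0.82, n_genes)       # gene variances

    # 0 = null-like (85%), 1 = up-regulated (5%), 2 = down-regulated (10%)
    group = rng.choice(3, size=n_genes, p=[0.85, 0.05, 0.10])
    delta = rng.normal(0.0, np.sqrt(0.01), n_genes)           # N(0, 0.01) component
    up, down = group == 1, group == 2
    delta[up] = rng.gamma(2.0, 0.5, up.sum())                 # placeholder for h
    delta[down] = -rng.gamma(2.0, 0.5, down.sum())            # placeholder for h

    sd = np.sqrt(sigma2)[:, None]                             # one sd per gene
    y1 = rng.normal(alpha[:, None], sd, (n_genes, n_reps))    # condition 1
    y2 = rng.normal((alpha + delta)[:, None], sd, (n_genes, n_reps))  # condition 2
    return y1, y2, delta, group


y1, y2, delta, group = simulate_expression()
```

The returned `group` labels make it possible to compute the true FDR of any selection rule applied to `y1` and `y2`.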

30
Comparison of mixture and tail pp

Fit 3 mixture models (Gamma, Normal, t
alternative) and the flat model.
Classification: mixtures use $P(H_1 \mid \text{data})$; the flat
model uses the tail posterior probability.

Comparable performance, with a slight edge for
the Gamma and Normal mixtures
31
Thanks
BBSRC Exploiting Genomics grant
Wellcome Trust BAIR consortium
Colleagues in the Biostatistics group: Marta Blangiardo,
Anne Mette Hein, Maria de Iorio
Colleagues in the Biology group at Imperial: Tim Aitman,
Ulrika Andersson, Dave Carling
Papers and technical reports: www.bgx.org.uk/
For the tail probability paper: www.bgx.org.uk/Natalia/Bochkina.ps
