Title: Replicated Microarray Data: Proportion of Differentially Expressed Genes
1. Replicated Microarray Data: Proportion of Differentially Expressed Genes
- Ingrid Lönnstedt
- Department of Mathematics, Uppsala University
- FMS Bayesian Inference in Biostatistics, April 4, 2003
- Tom Britton, SU (Terry Speed, UCB)
2. Microarrays: genetic fingerprints comparing two samples of cells
Green intensity: gene activity before treatment
Red intensity: gene activity after treatment
One spot = one gene = one log ratio $M = \log(R/G)$
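A minimal sketch of the per-spot log ratio. The slide writes only $\log$; base 2 is assumed here because it is a common choice for microarray log ratios, and `log_ratio` is a hypothetical helper name.

```python
import math


def log_ratio(red, green):
    """Log ratio M = log2(R / G) for one spot (base 2 is an assumption)."""
    return math.log2(red / green)
```

With red measured after treatment, `log_ratio(200.0, 100.0)` gives 1.0 (activity doubled) and `log_ratio(50.0, 100.0)` gives -1.0 (activity halved).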
3. Replicated microarray data
$M_{ij} = \log(R_{ij}/G_{ij})$, with genes $i = 1, \dots, N$ (e.g. $N = 10000$) and replicates $j = 1, \dots, n$ (e.g. $n = 4$)
Thousands of genes, few replicates, large variance. Which genes are differentially expressed between the cell samples, i.e. which genes have $E(M_{ij}) \neq 0$?
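The replicated data form an $N \times n$ matrix with one row per gene. A sketch, using the hypothetical helper `gene_summaries`, of the per-gene average (which estimates $E(M_{ij})$) and the sample variance; with $n = 4$ both estimates rest on only four numbers per gene, so they are themselves noisy.

```python
def gene_summaries(M):
    """Per-gene mean and sample variance of the replicate log ratios.

    M is a list of N rows (genes), each holding the n replicate values M_ij.
    Returns a list of (mean, variance) pairs; requires n >= 2.
    """
    out = []
    for row in M:
        n = len(row)
        mean = sum(row) / n
        var = sum((x - mean) ** 2 for x in row) / (n - 1)
        out.append((mean, var))
    return out
```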
4. Common statistics
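The content of this slide did not survive. As an assumption, two statistics often used to rank genes in replicated experiments are the average log ratio $\bar M_i$ and the one-sample $t$ statistic $t_i = \bar M_i / (s_i/\sqrt{n})$. The sketch computes the latter with a hypothetical helper `t_statistic`.

```python
import math


def t_statistic(row):
    """One-sample t statistic for H0: E(M_ij) = 0, from one gene's replicates."""
    n = len(row)
    mean = sum(row) / n
    var = sum((x - mean) ** 2 for x in row) / (n - 1)  # sample variance s_i^2
    return mean / math.sqrt(var / n)
```

A gene whose replicates are all equal has $s_i = 0$ and no defined $t_i$; the function raises `ZeroDivisionError` in that case.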
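5. Bayesian framework
Each gene $i$ has a (stochastic) mean $\mu_i$ and a (stochastic) variance $\sigma_i^2$: $M_{ij} \mid \mu_i, \sigma_i^2 \sim N(\mu_i, \sigma_i^2)$
Most genes have $\mu_i = 0$. A small proportion $p$ of genes have $\mu_i \mid \sigma_i^2 \sim N(0, c\sigma_i^2)$. All genes have $\sigma_i^2 \sim \Gamma^{-1}(a, b)$ (inverse gamma).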
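6. Bayesian framework
Empirical Bayes model:
$M_{ij} \mid \mu_i, \sigma_i^2 \sim N(\mu_i, \sigma_i^2)$, $\sigma_i^2 \sim \Gamma^{-1}(a, b)$
$\mu_i = 0$ with prob. $1 - p$; $\mu_i \mid \sigma_i^2 \sim N(0, c\sigma_i^2)$ with prob. $p$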
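A sketch that draws data from the empirical Bayes model above, which is useful for checking estimators on data where the truth is known. It assumes $\Gamma^{-1}(a, b)$ means shape $a$ and scale $b$, and draws $\sigma_i^2 = 1/X$ with $X \sim \text{Gamma}(\text{shape } a, \text{rate } b)$; `simulate_eb` and its parameter names are hypothetical.

```python
import random


def simulate_eb(N, n, p, a, b, c, seed=0):
    """Draw an N x n matrix of log ratios from the empirical Bayes model.

    sigma_i^2 ~ inverse gamma(a, b), drawn as 1 / Gamma(shape a, rate b)
    mu_i = 0 with prob. 1 - p, else mu_i | sigma_i^2 ~ N(0, c sigma_i^2)
    M_ij | mu_i, sigma_i^2 ~ N(mu_i, sigma_i^2)
    Returns (M, mu) so the true differential-expression status is known.
    """
    rng = random.Random(seed)
    M, mu = [], []
    for _ in range(N):
        # random.gammavariate takes a scale, so scale 1/b means rate b.
        sigma2 = 1.0 / rng.gammavariate(a, 1.0 / b)
        m = rng.gauss(0.0, (c * sigma2) ** 0.5) if rng.random() < p else 0.0
        mu.append(m)
        M.append([rng.gauss(m, sigma2 ** 0.5) for _ in range(n)])
    return M, mu
```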
7. Log-odds ratio for differential expression
($p$ fixed)
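The slide's formula did not survive. Under the empirical Bayes model with $p$, $a$, $b$, $c$ treated as known, integrating out $\mu_i$ and $\sigma_i^2$ gives a closed form for the log posterior odds that gene $i$ is differentially expressed. The expression below is our own derivation (assuming $\Gamma^{-1}(a, b)$ has shape $a$ and scale $b$), not a copy of the slide. With $\bar M_i$ the replicate mean and $S_i = \sum_j (M_{ij} - \bar M_i)^2$:

$$\log\frac{P(\mu_i \neq 0 \mid D_i)}{P(\mu_i = 0 \mid D_i)} = \log\frac{p}{1-p} - \frac{1}{2}\log(1 + nc) + \left(a + \frac{n}{2}\right)\log\frac{b + \frac{1}{2}\left(S_i + n\bar M_i^2\right)}{b + \frac{1}{2}\left(S_i + \frac{n\bar M_i^2}{1 + nc}\right)}$$

`log_odds_de` is a hypothetical name.

```python
import math


def log_odds_de(m, p, a, b, c):
    """Log posterior odds that mu_i != 0 for one gene, with p fixed.

    m    : the n replicate log ratios M_i1, ..., M_in of gene i
    p    : proportion of differentially expressed genes (fixed)
    a, b : shape and scale of the inverse-gamma prior on sigma_i^2
    c    : mu_i | sigma_i^2 ~ N(0, c sigma_i^2) under differential expression
    """
    n = len(m)
    mbar = sum(m) / n
    s = sum((x - mbar) ** 2 for x in m)     # within-gene sum of squares S_i
    q0 = s + n * mbar ** 2                   # sum of M_ij^2, used when mu_i = 0
    q1 = s + n * mbar ** 2 / (1 + n * c)     # after integrating out mu_i
    log_bayes_factor = (-0.5 * math.log(1 + n * c)
                        + (a + n / 2) * (math.log(b + q0 / 2) - math.log(b + q1 / 2)))
    return math.log(p / (1 - p)) + log_bayes_factor
```

The last term grows with $\bar M_i^2$ relative to $S_i$, so genes with consistently large replicates get high log odds; a gene whose replicates are all 0 gets $\log\frac{p}{1-p} - \frac12\log(1 + nc)$.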
8. Estimate proportion $p$ of differentially expressed genes
Method of moments (works fine for simulated data)
SRBI data: $\hat{p} = 0.860$
The model does not fit our real data!
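The slide does not give its moment equations. One possible moment estimator, assuming $c$ is known and $a > 1$ so that $E(\sigma_i^2)$ is finite: under the model, $E(\bar M_i^2) = E(\sigma_i^2)(1/n + pc)$ and $E(s_i^2) = E(\sigma_i^2)$, which can be solved for $p$. `moment_estimate_p` is a hypothetical name.

```python
def moment_estimate_p(M, c):
    """Method-of-moments sketch for p under the empirical Bayes model.

    Uses E[mean_i^2] = E[sigma^2] (1/n + p c) and E[s_i^2] = E[sigma^2],
    so p = (E[mean_i^2] / E[s_i^2] - 1/n) / c. Assumes c is known, n >= 2,
    and a finite E[sigma^2] (inverse-gamma shape a > 1).
    """
    n = len(M[0])
    means = [sum(row) / n for row in M]
    s2 = [sum((x - m) ** 2 for x in row) / (n - 1) for row, m in zip(M, means)]
    ratio = (sum(m * m for m in means) / len(M)) / (sum(s2) / len(M))
    return (ratio - 1.0 / n) / c
```

Sampling noise can push the estimate outside $[0, 1]$, so in practice it would be clipped to that range.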
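9. What is wrong with the model?
- The connection $c$ between the variances: the spread of a non-zero $\mu_i$ is forced to scale with the gene's own variance $\sigma_i^2$
$M_{ij} \mid \mu_i, \sigma_i^2 \sim N(\mu_i, \sigma_i^2)$, $\sigma_i^2 \sim \Gamma^{-1}(a, b)$
$\mu_i = 0$ with prob. $1 - p$; $\mu_i \mid \sigma_i^2 \sim N(0, c\sigma_i^2)$ with prob. $p$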
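10. Hierarchical Bayes model
$M_{ij} \mid \mu_i, \sigma_i^2 \sim N(\mu_i, \sigma_i^2)$, $\sigma_i^2 \sim \Gamma^{-1}(a, b)$
$\mu_i = 0$ with prob. $1 - p$; $\mu_i \sim N(0, v)$ with prob. $p$
- No explicit posterior distributions
- Use MCMC and the Gibbs sampler
- Sample each parameter realization from $P(\theta_k \mid \theta_{-k}, \text{Data})$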
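A sketch of one Gibbs sampler for this model, under several assumptions not stated on the slide: $a$ and $b$ are held fixed, $p$ gets a Beta prior, $v$ gets an inverse-gamma prior, and an indicator $z_i = 1\{\mu_i \neq 0\}$ is added. The pair $(z_i, \mu_i)$ is drawn as one block: $z_i$ from its conditional with $\mu_i$ integrated out, then $\mu_i$ given $z_i$. All names (`gibbs_normal_hb`, `p_prior`, `v_prior`) are hypothetical.

```python
import math
import random


def log_normal_pdf(x, mean, var):
    """Log density of N(mean, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def gibbs_normal_hb(M, a, b, n_iter=1000, p_prior=(1.0, 1.0), v_prior=(1.0, 1.0), seed=0):
    """Gibbs sampler sketch for the normal hierarchical Bayes model.

    M       : N genes x n replicates of log ratios (list of lists, n >= 2)
    a, b    : inverse-gamma shape/scale for sigma_i^2, held fixed (assumption)
    p_prior : Beta(alpha, beta) prior on p (assumption)
    v_prior : inverse-gamma (shape, scale) prior on v (assumption)
    Returns the sampled values of p and v, one per sweep.
    """
    rng = random.Random(seed)
    N, n = len(M), len(M[0])
    mbar = [sum(row) / n for row in M]
    # Start sigma_i^2 at the sample variances, floored to stay positive.
    sig2 = [max(sum((x - m) ** 2 for x in row) / (n - 1), 1e-6)
            for row, m in zip(M, mbar)]
    mu, z = [0.0] * N, [0] * N
    p, v = 0.5, 1.0
    p_draws, v_draws = [], []
    for _ in range(n_iter):
        # Blocked step: z_i | sigma_i^2, p, v with mu_i integrated out, then mu_i | z_i.
        for i in range(N):
            se2 = sig2[i] / n  # variance of the gene mean given sigma_i^2
            log_odds = (math.log(p) - math.log1p(-p)
                        + log_normal_pdf(mbar[i], 0.0, v + se2)
                        - log_normal_pdf(mbar[i], 0.0, se2))
            z[i] = 1 if rng.random() < sigmoid(log_odds) else 0
            if z[i]:
                post_var = 1.0 / (1.0 / se2 + 1.0 / v)
                mu[i] = rng.gauss(post_var * mbar[i] / se2, math.sqrt(post_var))
            else:
                mu[i] = 0.0
        # sigma_i^2 | mu_i, data ~ InvGamma(a + n/2, b + SS_i/2)
        for i, row in enumerate(M):
            ss = sum((x - mu[i]) ** 2 for x in row)
            sig2[i] = 1.0 / rng.gammavariate(a + n / 2, 1.0 / (b + ss / 2))
        # p | z ~ Beta(alpha + K, beta + N - K), K = number of genes with z_i = 1
        k = sum(z)
        p = rng.betavariate(p_prior[0] + k, p_prior[1] + N - k)
        p = min(max(p, 1e-12), 1 - 1e-12)
        # v | mu, z ~ InvGamma(shape + K/2, scale + (sum of mu_i^2 with z_i = 1) / 2)
        ss_mu = sum(m * m for m, zi in zip(mu, z) if zi)
        v = 1.0 / rng.gammavariate(v_prior[0] + k / 2, 1.0 / (v_prior[1] + ss_mu / 2))
        p_draws.append(p)
        v_draws.append(v)
    return p_draws, v_draws
```

Averaging the `p` draws after a burn-in gives a posterior-mean estimate of the proportion. The posterior for $p$ depends on the Beta prior chosen, which is one place where a prior mean such as $E(p) = 0.5$ or $E(p) = 0.01$ would enter.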
11. Marginal posterior for the mean, $P(\mu_i \mid p, \nu, D_i)$
- Two cases:
- $\mu_i = 0$ with prob. $1 - p_i$
- $\mu_i \sim N(\,\cdot\,, \,\cdot\,)$ with prob. $p_i$
- Posterior probability $p_i = P(\mu_i \neq 0 \mid p, \nu, D_i)$; the probabilities are only known up to proportionality
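Since the two cases are known only up to proportionality, $p_i$ comes from normalizing the two weights. Working on the log scale and subtracting the larger log weight keeps the exponentials from underflowing or overflowing; `posterior_prob_de` is a hypothetical helper.

```python
import math


def posterior_prob_de(log_w_zero, log_w_nonzero):
    """Turn two unnormalised log weights into p_i = P(mu_i != 0 | ...).

    log_w_zero    : log of the unnormalised weight for mu_i = 0
    log_w_nonzero : log of the unnormalised weight for mu_i != 0
    """
    top = max(log_w_zero, log_w_nonzero)  # shift so the larger weight becomes 1
    w0 = math.exp(log_w_zero - top)
    w1 = math.exp(log_w_nonzero - top)
    return w1 / (w0 + w1)
```

For example, `posterior_prob_de(-1000.0, -999.0)` returns about 0.731, whereas exponentiating the raw log weights would give zero divided by zero.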
12. Estimate proportion $p$ of differentially expressed genes
Simulated data, 8 replicates
SRBI data: prior $E(p) = 0.5$ gives $\hat{p} = 0.9950$; prior $E(p) = 0.01$ gives $\hat{p} = 0.8786$
13. What is wrong with the model?
- The connection c between variances
- Thick tails in data
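14. Hierarchical Bayes with $t$-distribution
- No explicit posterior distributions
- No explicit marginal posteriors $P(\theta_k \mid \theta_{-k}, \text{Data})$
- Use MCMC and the single-component Metropolis-Hastings algorithm:
- Sample each parameter realization from a proposal distribution, $\theta_k' \sim q(\theta_k \mid \theta_{-k}, \text{Data})$
- Accept the new realization with probability
$$\alpha = \min\left\{1, \frac{P(\theta_k' \mid \theta_{-k}, \text{Data})\, q(\theta_k \mid \theta_{-k}, \text{Data})}{P(\theta_k \mid \theta_{-k}, \text{Data})\, q(\theta_k' \mid \theta_{-k}, \text{Data})}\right\}$$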
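A sketch of single-component Metropolis-Hastings for a generic log target. It swaps in a Gaussian random-walk proposal $\theta_k' = \theta_k + \varepsilon$ in place of the slide's $q(\theta_k \mid \theta_{-k}, \text{Data})$; because that proposal is symmetric, the $q$ terms cancel and only the ratio of target densities remains. Since only component $k$ changes, the ratio of full conditionals equals the ratio of joint densities, so `log_target` can be the unnormalized log joint posterior. `mh_single_component` and `step` are hypothetical names.

```python
import math
import random


def mh_single_component(log_target, theta0, step, n_iter, seed=0):
    """Single-component Metropolis-Hastings with a Gaussian random-walk proposal.

    log_target : log of the (unnormalised) joint posterior density
    theta0     : starting parameter vector (list of floats)
    step       : standard deviation of the random-walk proposal
    Returns one copy of the parameter vector per sweep.
    """
    rng = random.Random(seed)
    theta = list(theta0)
    current = log_target(theta)
    draws = []
    for _ in range(n_iter):
        for k in range(len(theta)):
            proposal = list(theta)
            proposal[k] += rng.gauss(0.0, step)  # change component k only
            candidate = log_target(proposal)
            # With theta_{-k} fixed, the full-conditional ratio equals the
            # joint-density ratio, and the symmetric proposal cancels.
            if rng.random() < math.exp(min(0.0, candidate - current)):
                theta, current = proposal, candidate
        draws.append(list(theta))
    return draws
```

For a $t$-model, `log_target` would sum the log $t$ densities of the replicates and the log priors; the sampler itself does not change.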
15. Estimate proportion $p$ of differentially expressed genes
2 years
Sample a dataset from the $t$-model (true $p = 0.01$); estimate from the N-model:
$\hat{p} = ?$
16. What is wrong with the model?
- The connection $c$ between variances
- Thick tails in data
- If $\mu_i \sim N(0, v)$, most differentially expressed genes have means around 0
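To see why, note that under $\mu_i \sim N(0, v)$ the share of differentially expressed genes with $|\mu_i| < \delta$ is $\operatorname{erf}(\delta / \sqrt{2v})$. A sketch with the hypothetical helper `fraction_near_zero`:

```python
import math


def fraction_near_zero(delta, v):
    """P(|mu| < delta) when mu ~ N(0, v), i.e. erf(delta / sqrt(2 v))."""
    return math.erf(delta / math.sqrt(2.0 * v))
```

With the illustrative values $\delta = 0.5$ and $v = 1$ (not taken from the slides), `fraction_near_zero(0.5, 1.0)` is about 0.383: roughly 38% of the non-null genes have effects that small, so they look almost like null genes.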
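17. Hierarchical Bayes with $\Gamma$-distribution
[Figure: prior density $P(\mu_i \mid \lambda, \gamma)$ plotted against $\mu_i$, with 0 marked on the axis]
Sample a dataset from this model (true $p = 0.01$); estimate from the N-model:
$\hat{p} = 0.01227$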
18. Summary
- Microarray data: many genes, few replicates.
- Which genes are differentially expressed ($\mu_i \neq 0$)?
- Empirical Bayes method: $\hat{p} = 0.86$
- Normal hierarchical Bayes model
- Student $t$ HB model: ?
- $\Gamma$ HB model: ?