Title: Model checks for complex hierarchical models

1
Model checks for complex hierarchical models

Alex Lewin and Sylvia Richardson
Imperial College
Centre for Biostatistics

2
Background and Aims

Many complex models are used in bioinformatics
Classification/clustering can be greatly affected by choice of distributions
Our approach: exploit the structure of the model to perform predictive checks
Hierarchical models generally involve exchangeability assumptions
Mixture models are partially exchangeable

3
Outline of Talk

Mixture model for gene expression data
Model checks for mixture model
distribution for gene-specific variances
different mixture priors
Future work: model checks for a clustering and variable selection model (Tadesse et al. 2005)

4
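Hierarchical mixture model for gene expression data
$w \sim \mathrm{Dirichlet}(1, \ldots, 1)$; various priors for $\delta_g$, $\sigma_g$
$\delta_g \sim \sum_j w_j h_j(\eta_j)$, $\sigma_g^2 \mid \mu, \tau \sim f(\mu, \tau)$
$y_{gr} \mid \delta_g, \sigma_g \sim N(\delta_g, \sigma_g^2)$
g = gene, r = replicate, j = mixture component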
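As a rough illustration of this hierarchy, the sketch below simulates data from a 3-component version of the model. The component densities (reflected gamma, narrow normal, gamma), the gamma choice for f, the shape/rate parameterisation, and all numerical values are assumptions made for the example, not settings taken from the talk. The function name `simulate_expression` is hypothetical.

```python
import numpy as np

def simulate_expression(n_genes, n_rep, w, eta_minus, eta_plus, eps, mu, tau, rng):
    """Draw (z, delta, sigma2, y) from a 3-component hierarchical mixture.

    Assumed for illustration: gamma distributions use (shape, rate),
    and f(mu, tau) is Gam(shape=mu, rate=tau).
    """
    z = rng.choice(3, size=n_genes, p=w)  # mixture allocation per gene
    under, null, over = (z == 0), (z == 1), (z == 2)
    delta = np.empty(n_genes)
    # Under-expressed: reflected gamma; null: narrow normal; over-expressed: gamma.
    delta[under] = -rng.gamma(shape=1.5, scale=1.0 / eta_minus, size=under.sum())
    delta[null] = rng.normal(0.0, np.sqrt(eps), size=null.sum())
    delta[over] = rng.gamma(shape=1.5, scale=1.0 / eta_plus, size=over.sum())
    # Gene-specific variances from the global distribution f.
    sigma2 = rng.gamma(shape=mu, scale=1.0 / tau, size=n_genes)
    # Replicate-level data: y_gr | delta_g, sigma_g ~ N(delta_g, sigma_g^2).
    y = rng.normal(delta[:, None], np.sqrt(sigma2)[:, None], size=(n_genes, n_rep))
    return z, delta, sigma2, y

rng = np.random.default_rng(0)
n_genes, n_rep = 1000, 5
w = rng.dirichlet(np.ones(3))  # w ~ Dirichlet(1, 1, 1)
z, delta, sigma2, y = simulate_expression(
    n_genes, n_rep, w, eta_minus=1.0, eta_plus=1.0, eps=0.01, mu=2.0, tau=4.0, rng=rng
)
print(y.shape, np.bincount(z, minlength=3) / n_genes)
```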
5
Mixture model for gene expression data

Many mixture models have been proposed for gene
expression data
Set-up is similar to a variable selection prior: point mass + alternative distribution
Particular choices for alternative:
Normal (Lönnstedt and Speed)
Uniform (Parmigiani et al)
many others

6
Mixture model for gene expression data
Allow for asymmetry in over- and under-expressed genes → 3-component mixture model
$\delta_g \sim w_1 h_1(\eta_1) + w_2 h_2(\eta_2) + w_3 h_3(\eta_3)$
6 knock-out and 5 wildtype mice; MAS5.0 processed data
7
Mixture model for gene expression data
Classify each gene into mixture components using
posterior probabilities
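One common way to obtain such posterior probabilities from MCMC output is to count how often each gene is allocated to each component. The sketch below assumes the sampler stores an allocation draw per gene at every iteration. The array `z_draws` and the function `classify` are hypothetical names, and the tiny input is made up.

```python
import numpy as np

def classify(z_draws, n_comp):
    """Estimate P(z_g = j) as the fraction of iterations with gene g in component j.

    z_draws: integer array of shape (n_iter, n_genes) with values in 0..n_comp-1.
    Returns the probability matrix (n_genes, n_comp) and the most probable component.
    """
    probs = np.stack([(z_draws == j).mean(axis=0) for j in range(n_comp)], axis=1)
    return probs, probs.argmax(axis=1)

# Four iterations, three genes (made-up allocations).
z_draws = np.array([
    [0, 1, 2],
    [0, 1, 1],
    [1, 1, 2],
    [0, 1, 2],
])
probs, labels = classify(z_draws, n_comp=3)
print(probs)
print(labels)
```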
8
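Choice of mixture prior affects classification results

Mixture prior for $\delta_g$    Est. $w_2$ (proportion in null)
$w_1 \mathrm{Unif}(-\eta_-, 0) + w_2 \delta(0) + w_3 \mathrm{Unif}(0, \eta_+)$    0.96
$w_1 \mathrm{Gam}^-(1.5, \eta_-) + w_2 \delta(0) + w_3 \mathrm{Gam}(1.5, \eta_+)$    0.68
$w_1 \mathrm{Gam}^-(1.5, \eta_-) + w_2 N(0, \epsilon) + w_3 \mathrm{Gam}(1.5, \eta_+)$    0.99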
9
Outline of Talk

Mixture model for gene expression data
Model checks for mixture model
distribution for gene-specific variances
different mixture priors
Future work: model checks for a clustering and variable selection model (Tadesse et al. 2005)

10
Predictive model checks

Predict new data from the model
Use posterior predictive distribution
Condition on hyperparameters (mixed predictive → not very conservative)
Get Bayesian p-value for each gene/marker/sample
Use all p-values together (100s or 1000s) to assess model fit
Gelman, Meng and Stern 1995; Marshall and Spiegelhalter 2003

11
Checking distribution for gene variances
Bayesian p-value for gene g: $p_g = \mathrm{Prob}(S^{\mathrm{mpred}} > S_g^{\mathrm{obs}} \mid \mathrm{data})$
All genes are exchangeable → histogram of p-values for all genes together
12
Mixed v. posterior predictive

Predictive p-values for data simulated from the
model
Histograms should be Uniform
Mixed predictive distribution much less
conservative than posterior predictive

[Figure: histograms of predictive p-values; panels: Using global distribution, Using gene-specific distributions]
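The conservativeness of posterior predictive p-values can be seen in a toy normal-normal model where both kinds of p-value have closed forms. This is a stand-in chosen for simplicity, not the gene expression model from the talk. The assumptions are unit sampling variance, known hyperparameters `mu` and `tau2`, and the observed value itself as the test statistic.

```python
import math
import numpy as np

def normal_cdf(x):
    """Standard normal CDF via the error function (avoids a SciPy dependency)."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))

rng = np.random.default_rng(1)
n_units, mu, tau2 = 5000, 0.0, 4.0

# Simulate from the model: theta_i ~ N(mu, tau2), y_i ~ N(theta_i, 1).
theta = rng.normal(mu, math.sqrt(tau2), size=n_units)
y = rng.normal(theta, 1.0)

# Mixed predictive: new theta from the global distribution, then new data,
# so y_rep ~ N(mu, 1 + tau2) and p_i = P(y_rep > y_i).
p_mixed = 1.0 - normal_cdf((y - mu) / math.sqrt(1.0 + tau2))

# Posterior predictive: theta_i | y_i ~ N(mu + B (y_i - mu), B), B = tau2 / (1 + tau2),
# so y_rep | y_i ~ N(posterior mean, 1 + B).
B = tau2 / (1.0 + tau2)
post_mean = mu + B * (y - mu)
p_post = 1.0 - normal_cdf((y - post_mean) / math.sqrt(1.0 + B))

# A Uniform(0, 1) sample has standard deviation about 0.289.
print("mixed predictive sd:    ", p_mixed.std())
print("posterior predictive sd:", p_post.std())
```

In this toy setting the mixed predictive p-values are exactly uniform, while the posterior predictive p-values pile up around 0.5 because each unit's own data is used twice.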
13
Checking different variance models
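Model differential expression between 3 transgenic and 3 wildtype mice
Variance models checked:
$\sigma_g^2 \mid \mu, \tau \sim \mathrm{Gam}(\mu, \tau)$, $\mu$ fixed
$\sigma_g^2 = \sigma^2$ for all genes
$\sigma_g^2 \mid \mu, \tau \sim \mathrm{Gam}(\mu, \tau)$
$\sigma_g^2 \mid \mu, \tau \sim \mathrm{logNorm}(\mu, \tau)$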
14
Implementation (MCMC)

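$p_g = 0$
for $t = 1, \ldots, n_{\mathrm{iter}}$:
    $(\sigma_t^{\mathrm{pred}})^2 \sim f(\mu_t, \tau_t)$
    $S_t^{\mathrm{mpred}} \sim \mathrm{Gam}(m, m (\sigma_t^{\mathrm{pred}})^{-2})$
    $p_g \leftarrow p_g + I[S_t^{\mathrm{mpred}} > S_g^{\mathrm{obs}}]$
$p_g \leftarrow p_g / n_{\mathrm{iter}}$

$n_{\mathrm{iter}}$ = no. MCMC iterations; $m$ = (no. replicates - 1)/2
Just two extra parameters predicted at each iteration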
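The loop above might be written as follows. This is a minimal sketch under stated assumptions: S is taken to be the sample variance, f is Gam(shape = mu, rate = tau), and the hyperparameter "draws" are constant stand-ins for real MCMC output. The function name `mixed_predictive_pvalues` is hypothetical.

```python
import numpy as np

def mixed_predictive_pvalues(s2_obs, mu_draws, tau_draws, n_rep, rng):
    """Mixed predictive p-values for gene-specific sample variances.

    s2_obs: observed sample variance per gene, shape (n_genes,).
    mu_draws, tau_draws: hyperparameter draws, one per MCMC iteration.
    """
    m = (n_rep - 1) / 2.0
    p = np.zeros(len(s2_obs))
    for mu_t, tau_t in zip(mu_draws, tau_draws):
        # New variance from the global distribution (NumPy takes scale = 1 / rate).
        sigma2_pred = rng.gamma(shape=mu_t, scale=1.0 / tau_t)
        # New sample variance given it: Gam(m, rate = m / sigma2_pred).
        s2_mpred = rng.gamma(shape=m, scale=sigma2_pred / m)
        # The same predicted value is compared against every gene.
        p += s2_mpred > s2_obs
    return p / len(mu_draws)

rng = np.random.default_rng(0)
n_genes, n_rep, n_iter = 2000, 6, 1000
mu_true, tau_true = 2.0, 4.0

# Data simulated from the assumed variance model.
sigma2 = rng.gamma(shape=mu_true, scale=1.0 / tau_true, size=n_genes)
m = (n_rep - 1) / 2.0
s2_obs = rng.gamma(shape=m, scale=sigma2 / m)

# Stand-in for posterior draws of (mu, tau): fixed at the simulation values.
mu_draws = np.full(n_iter, mu_true)
tau_draws = np.full(n_iter, tau_true)

p_values = mixed_predictive_pvalues(s2_obs, mu_draws, tau_draws, n_rep, rng)
counts, _ = np.histogram(p_values, bins=10, range=(0.0, 1.0))
print(counts)
```

Because the data here come from the same variance model that generates the predictions, the histogram counts should be roughly flat.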
15
Outline of Talk

Mixture model for gene expression data
Model checks for mixture model
distribution for gene-specific variances
different mixture priors
Future work: model checks for a clustering and variable selection model (Tadesse et al. 2005)

16
Checking mixture prior
$\delta_g \sim w_1 h_1(\eta_1) + w_2 h_2(\eta_2) + w_3 h_3(\eta_3)$
OR
$\delta_g \mid \eta, z_g = j \sim h_j(\eta_j)$, $j = 1, \ldots, 3$; $P(z_g = j) = w_j$
Model checking: focus on separate mixture components
17
Issues for mixture model checking

$\delta_g \mid \eta, z_g = j \sim h_j(\eta_j)$, $j = 1, \ldots, 3$
Think about MCMC iterations
Mixture component is estimated from genes currently assigned to that component
Can only define p-value for a given gene and mix. component when the gene is assigned to that component (i.e. condition on $z_g$ in p-value)
So check each component using only the genes currently assigned (i.e. condition on $z_g$ in histogram)

18
Predictive checks for mixture model
Bayesian p-value for gene g and mix. component j:
$p_{gj} = \mathrm{Prob}(\bar{y}_{gj}^{\mathrm{mpred}} > \bar{y}_g^{\mathrm{obs}} \mid \mathrm{data}, z_g = j)$

Genes assigned to the same mix. component are exchangeable
→ histogram of p-values for each mix. component separately
Histogram for component j made only from genes with large $P(z_g = j)$

19
Condition on classification to check separate components
Predictive p-values for data simulated from the model
[Figure: histograms of p-values; panels: All genes with $P(z_g = j) > 0$, Only genes with $P(z_g = j) > 0.5$]
Effectively we condition on a best classification
20
Checking different mixture distributions
$w_1 \mathrm{Unif}(-\eta_-, 0) + w_2 \delta(0) + w_3 \mathrm{Unif}(0, \eta_+)$

Outer mix. components skewed too much away from zero
Null component too narrow

21
Checking different mixture distributions
$w_1 \mathrm{Gam}^-(1.5, \eta_-) + w_2 \delta(0) + w_3 \mathrm{Gam}(1.5, \eta_+)$

Outer components skewed opposite
Null still too narrow?

22
Checking different mixture distributions
$w_1 \mathrm{Gam}^-(1.5, \eta_-) + w_2 N(0, \epsilon) + w_3 \mathrm{Gam}(1.5, \eta_+)$

Better fit for all components

23
Implementation

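$p_{gj} = 0$
for $t = 1, \ldots, n_{\mathrm{iter}}$:
    $\delta_{jt}^{\mathrm{pred}} \sim h_j(\eta_{jt})$, $j = 1, \ldots, 3$
    $\bar{y}_{gt}^{\mathrm{mpred}} \sim N(\delta_{jt}^{\mathrm{pred}}, \sigma_g^2 / n_{\mathrm{rep}})$ for $j = z_{gt}$
    $p_{gj} \leftarrow p_{gj} + I[\bar{y}_{gt}^{\mathrm{mpred}} > \bar{y}_g^{\mathrm{obs}}]$ for $j = z_{gt}$
$p_{gj} \leftarrow p_{gj} / n_{\mathrm{iter}}(z_g = j)$

Need $n_{\mathrm{genes}}$ extra parameters at each iteration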
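A possible vectorised version of this loop is sketched below. It assumes the sampler has stored, per iteration, the allocations, one predicted value per component, and the gene variances. All input arrays are synthetic stand-ins, and `component_pvalues` and `draw_component` are hypothetical helpers. The component densities follow the Gam-/N(0, eps)/Gam prior with gamma parameters read as (shape, rate), which is an assumption.

```python
import numpy as np

def component_pvalues(ybar_obs, z_draws, delta_pred_draws, sigma2_draws, n_rep, rng):
    """Predictive p-values p_gj, counted only over iterations with z_g = j.

    z_draws: (n_iter, n_genes) allocations; delta_pred_draws: (n_iter, n_comp);
    sigma2_draws: (n_iter, n_genes). Entries are NaN where gene g never visits j.
    """
    n_iter, n_genes = z_draws.shape
    n_comp = delta_pred_draws.shape[1]
    exceed = np.zeros((n_genes, n_comp))
    visits = np.zeros((n_genes, n_comp))
    genes = np.arange(n_genes)
    for t in range(n_iter):
        z_t = z_draws[t]
        # Predicted mean for each gene comes from its current component.
        mean_t = delta_pred_draws[t, z_t]
        ybar_pred = rng.normal(mean_t, np.sqrt(sigma2_draws[t] / n_rep))
        exceed[genes, z_t] += ybar_pred > ybar_obs
        visits[genes, z_t] += 1
    with np.errstate(invalid="ignore", divide="ignore"):
        p = exceed / visits
    return p, visits / n_iter  # p-values and estimated P(z_g = j)

def draw_component(j, rng, size=None, eta=1.0, eps=0.01):
    """Draw from h_j: reflected gamma (j=0), narrow normal (j=1), gamma (j=2)."""
    if j == 0:
        return -rng.gamma(1.5, 1.0 / eta, size)
    if j == 1:
        return rng.normal(0.0, np.sqrt(eps), size)
    return rng.gamma(1.5, 1.0 / eta, size)

rng = np.random.default_rng(2)
n_genes, n_rep, n_iter = 300, 5, 400
z_true = rng.choice(3, size=n_genes, p=[0.1, 0.8, 0.1])
delta_true = np.array([draw_component(j, rng) for j in z_true])
sigma2_true = np.full(n_genes, 0.25)
ybar_obs = rng.normal(delta_true, np.sqrt(sigma2_true / n_rep))

# Stand-ins for MCMC output: allocations and variances held at their true values.
z_draws = np.tile(z_true, (n_iter, 1))
delta_pred_draws = np.column_stack([draw_component(j, rng, n_iter) for j in range(3)])
sigma2_draws = np.tile(sigma2_true, (n_iter, 1))

p, post_prob = component_pvalues(ybar_obs, z_draws, delta_pred_draws, sigma2_draws, n_rep, rng)
null_genes = post_prob[:, 1] > 0.5
print(np.histogram(p[null_genes, 1], bins=5, range=(0.0, 1.0))[0])
```

The last two lines build the histogram for the null component from genes with estimated P(z_g = j) above 0.5, mirroring the conditioning on a best classification.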
24
Summary of model checking procedure

Find part of model where individuals are assumed
to be exchangeable (so information is shared)
Choose test statistic T (eg. sample mean or
variance)
Predict Tpred from distribution for exchangeable
individuals (whole posterior for Tpred)
Compare observed Ti for each individual i to
distribution of Tpred
For checking mixture components, condition on the
best classification
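The comparison step of this procedure can be expressed generically. In the sketch below, `draw_t_pred` is a hypothetical callback that returns one predicted test statistic per MCMC iteration. Here it is a standard normal draw purely for illustration, where in practice it would use the sampler's current hyperparameter values.

```python
import numpy as np

def predictive_pvalues(t_obs, draw_t_pred, n_iter):
    """Estimate P(T_pred > T_i) for each individual i by Monte Carlo."""
    t_obs = np.asarray(t_obs, dtype=float)
    exceed = np.zeros_like(t_obs)
    for t in range(n_iter):
        # One draw of T_pred per iteration, compared against every individual.
        exceed += draw_t_pred(t) > t_obs
    return exceed / n_iter

rng = np.random.default_rng(3)
t_obs = [-10.0, 0.0, 10.0]
p = predictive_pvalues(t_obs, lambda t: rng.normal(), n_iter=2000)
print(p)
```

Individuals whose observed statistic sits far in either tail of the predictive distribution get p-values near 1 or 0.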

25
Outline of Talk

Mixture model for gene expression data
Model checks for mixture model
distribution for gene-specific variances
different mixture priors
Future work: model checks for a clustering and variable selection model (Tadesse et al. 2005)

26
Clustering and variable selection (Tadesse et al. 2005)

$y_i$: vector of gene expression for each sample $i = 1, \ldots, n$
Multivariate mixture model for clustering samples
$y_i \mid z_i = j \sim \mathrm{MVN}(\mu_j, \Sigma_j)$, $j = 1, \ldots, J$
$P(z_i = j) = w_j$
No. of mix. components (J) is estimated in the model
Aim to select genes which are informative for clustering the samples

27
Clustering and variable selection (Tadesse et al. 2005)
$(\gamma)$: vector of indices of selected variables
$(\gamma^c)$: vector of indices of variables not used to cluster samples
Likelihood conditional on allocation to mixture
Conjugate priors on multivariate means and covariance matrices
$P(\gamma_g = 1) = \phi$
i = sample, g = gene, j = mix. component
28
Clustering and variable selection (Tadesse et al. 2005)
Model checking: want to check the distribution for each mixture component separately (conditional on J)
In addition, need to condition on a given variable selection
Clearly impossible computationally
i = sample, g = gene, j = mix. component
29
Computing predictive p-values

Run model with no prediction
Find the best configuration:
set of selected variables $(\gamma)$
no. mixture components J
allocation of samples to mixture components $z_i$
Re-run model with $(\gamma)$, J and $z_i$ fixed, and calculate predictive p-values

$p_{ij} = \mathrm{Prob}(T_j^{\mathrm{pred}} > T_i^{\mathrm{obs}} \mid \mathrm{data}, z_i = j, J, (\gamma))$, where $T = y^2$ (for example)
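The second stage could look roughly like the sketch below, which takes the allocation, the number of components and the selected variables as fixed inputs. It reads the statistic T as the sum of squares over the selected variables, which is an assumption. The arrays `mu_draws` and `cov_draws` are constant stand-ins for posterior draws of the component means and covariances, and `cluster_pvalues` is a hypothetical name.

```python
import numpy as np

def cluster_pvalues(y, z_fixed, selected, mu_draws, cov_draws, rng):
    """Predictive p-values with allocation, J and variable selection held fixed.

    y: (n_samples, n_vars); z_fixed: (n_samples,) component labels;
    selected: indices of selected variables (k of them);
    mu_draws: (n_iter, J, k); cov_draws: (n_iter, J, k, k).
    """
    t_obs = (y[:, selected] ** 2).sum(axis=1)
    n_iter, n_comp = mu_draws.shape[:2]
    exceed = np.zeros(len(y))
    for t in range(n_iter):
        # One predicted statistic per mixture component at this iteration.
        t_pred = np.array([
            (rng.multivariate_normal(mu_draws[t, j], cov_draws[t, j]) ** 2).sum()
            for j in range(n_comp)
        ])
        # Each sample is compared with the prediction from its own component.
        exceed += t_pred[z_fixed] > t_obs
    return exceed / n_iter

rng = np.random.default_rng(4)
n_samples, n_vars, n_iter = 40, 6, 300
selected = np.array([0, 2, 3])
z_fixed = rng.choice(2, size=n_samples)
means = np.array([[-2.0, -2.0, -2.0], [2.0, 2.0, 2.0]])

# Synthetic data: selected variables carry the cluster signal.
y = rng.normal(size=(n_samples, n_vars))
y[:, selected] += means[z_fixed]

# Stand-ins for posterior draws: held at the simulation values.
mu_draws = np.tile(means, (n_iter, 1, 1))
cov_draws = np.tile(np.eye(3), (n_iter, 2, 1, 1))

p_ij = cluster_pvalues(y, z_fixed, selected, mu_draws, cov_draws, rng)
print(np.round(p_ij[:5], 2))
```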
30
Conclusions

Choice of model distributions can greatly
influence results of clustering and
classification
For models where information is shared across
individuals, predictive checks can be used as an
alternative to cross-validation
Should be possible to do this even for quite
complex models (if you can fit the model, you can
check it)

31
Acknowledgements
Collaborators on BBSRC Exploiting Genomics Grant: Natalia Bochkina, Clare Marshall, Peter Green
Meeting on model checking in Cambridge: David Spiegelhalter, Shaun Seaman
BBSRC Exploiting Genomics Grant
Paper and software at http://www.bgx.org.uk/
