Title: Model Selection Methods
1Model Selection Methods
2Model Selection and Model Averaging-I
- Model Selection
  - Upon which model (of a set) should inference be based, i.e. which is the best model?
- Model averaging
  - How to combine results from different models, taking account of their relative probability / likelihood.
  - How to calculate the uncertainty associated with model choice when making predictions.
3Model Selection and Model Averaging-II
- Let $\theta$ be a quantity in which we are interested and let us assume we can estimate $\theta$ from K models. Assume the value of $\theta$ from model k is $\hat{\theta}_k$ and that the weight assigned to model k is $w_k$, then
  $$\hat{\theta} = \sum_{k} w_k \hat{\theta}_k$$
- Model averaging uses all (or most) of the candidate models whereas model selection selects the best model (the model with the highest value of $w_k$).
- How then to choose $w_k$?
4Interlude - The Example Problem
- Estimate the value of y(9) where the set of candidate models (k = 0, 1, 2, 3) is
5Model Selection(Common sense-I)
- Only consider models that
  - make biological sense (i.e. are at least conceptually acceptable),
  - are not clearly mis-specified (e.g. check the residuals for runs), and
  - are consistent with the distributional assumptions (i.e. errors are normal with constant variance when this is assumed).
- If none of the models satisfy the above criteria, it is time to find a new set of models!
6Model Selection(Common sense-II)
The constant model seems mis-specified, but let's
check that with a residual plot.
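Alongside the residual plot, the "check the residuals for runs" advice can be made numerical. The sketch below applies the Wald-Wolfowitz runs test to the signs of the residuals; the function name and the residual values are hypothetical and are not taken from the example problem.

```python
import math

def runs_test(residuals):
    """Wald-Wolfowitz runs test on the signs of the residuals.

    Returns (observed runs, expected runs, z-score). A strongly negative
    z-score means too few runs, i.e. residuals of the same sign cluster
    together, which suggests the model is mis-specified.
    """
    signs = [r > 0 for r in residuals if r != 0]  # exact zeros are dropped
    n_pos = sum(signs)
    n_neg = len(signs) - n_pos
    n = n_pos + n_neg
    if n_pos == 0 or n_neg == 0 or n < 3:
        raise ValueError("need at least 3 residuals, with both signs present")
    # A new run starts every time the sign changes.
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    expected = 2.0 * n_pos * n_neg / n + 1.0
    variance = (2.0 * n_pos * n_neg * (2.0 * n_pos * n_neg - n)
                / (n * n * (n - 1.0)))
    z = (runs - expected) / math.sqrt(variance)
    return runs, expected, z

# Hypothetical residuals from fitting a constant to trending data:
# all negative residuals come first, then all positive ones.
example_residuals = [-3.0, -2.0, -1.0, -0.5, 0.5, 1.0, 2.0, 3.0]
runs, expected, z = runs_test(example_residuals)
print(runs, expected, round(z, 2))
```

The z-score relies on a normal approximation, so for very short series it is a rough guide rather than a formal test.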
7Model Selection(Common sense-III)
8Model Selection(Overview)
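- Most model selection criteria are based on an information criterion of the form
  $$I = -2 \ln L + q$$
  where $L$ is the maximized likelihood and $q$ is a penalty term.
- The various approaches differ in terms of how q is specified.
- This definition leads to the following natural definition for the weights
  $$w_k = \frac{\exp(-I_k / 2)}{\sum_j \exp(-I_j / 2)}$$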
9Model Selection(Classical methods-I)
- The options for q are
  - AIC: $q = 2p$
  - AICc: $q = 2p + \frac{2p(p+1)}{n-p-1}$
  - BIC: $q = p \ln n$
- where p is the number of parameters and n is the number of data points (arguably difficult to count for models with various data types)
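A minimal sketch of these penalties and the resulting weights, assuming the criterion takes the usual form of twice the negative log-likelihood plus q. The function names and the negative log-likelihood values in the usage lines are hypothetical.

```python
import math

def penalty(criterion, p, n):
    """Penalty q for p estimated parameters and n data points."""
    if criterion == "AIC":
        return 2.0 * p
    if criterion == "AICc":
        if n - p - 1 <= 0:
            raise ValueError("AICc needs n > p + 1")
        return 2.0 * p + 2.0 * p * (p + 1) / (n - p - 1)
    if criterion == "BIC":
        return p * math.log(n)
    raise ValueError("unknown criterion: " + criterion)

def information_criterion(neg_log_lik, p, n, criterion="AIC"):
    """Twice the negative log-likelihood plus the penalty q."""
    return 2.0 * neg_log_lik + penalty(criterion, p, n)

def ic_weights(ic_values):
    """Weights proportional to exp(-I_k / 2), normalized to sum to one."""
    best = min(ic_values)  # subtracting the minimum avoids underflow
    rel = [math.exp(-0.5 * (ic - best)) for ic in ic_values]
    total = sum(rel)
    return [r / total for r in rel]

# Hypothetical fits: (negative log-likelihood, number of parameters), n = 20.
fits = [(60.0, 2), (48.5, 3), (47.9, 4), (47.8, 5)]
for name in ("AIC", "AICc", "BIC"):
    ics = [information_criterion(nll, p, 20, name) for nll, p in fits]
    print(name, [round(w, 3) for w in ic_weights(ics)])
```

Subtracting the smallest criterion value before exponentiating leaves the weights unchanged but keeps the exponentials in a safe numerical range.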
10Model Selection(Classical methods-II)
The best and second-best models do not differ
much. Should the conclusions of the
second-best model be ignored completely?
11Model Selection(Classical methods-III)
- Model-specific predictions for y(9): 241.2, 358.2, 347.3, and 330.44.
- Model-averaged estimates
  - AIC: 335.6
  - BIC: 337.9
  - AICc: 340.1
- How to quantify the uncertainty associated with these predictions? The variance of a model-averaged quantity will be at least as large as if the estimate was based on the best model. Why?
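One commonly used way to attach a standard error to a model-averaged estimate is the estimator of Buckland et al. (1997), which inflates each model's conditional variance by the squared distance between that model's estimate and the average. The sketch below uses the four model-specific predictions for y(9) quoted above, but the weights and conditional variances are hypothetical placeholders, so the output does not reproduce the model-averaged estimates on the slide.

```python
import math

def model_average(estimates, weights):
    """Weighted average of the model-specific estimates."""
    return sum(w * e for w, e in zip(weights, estimates))

def unconditional_se(estimates, conditional_vars, weights):
    """Standard error that includes between-model spread:
    sum_k w_k * sqrt(var_k + (estimate_k - average)^2)."""
    avg = model_average(estimates, weights)
    return sum(
        w * math.sqrt(v + (e - avg) ** 2)
        for w, e, v in zip(weights, estimates, conditional_vars)
    )

predictions = [241.2, 358.2, 347.3, 330.44]   # from the slide
weights = [0.0, 0.2, 0.5, 0.3]                # hypothetical weights
cond_vars = [100.0, 150.0, 200.0, 400.0]      # hypothetical variances

print(model_average(predictions, weights))
print(unconditional_se(predictions, cond_vars, weights))
```

When all weight sits on one model the between-model term vanishes and the result reduces to that model's own standard error.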
12Model Selection(Classical methods-IV)
- Bootstrapping can be used to quantify model selection error. This involves generating pseudo data sets, fitting each model and applying the model selection algorithm.
- The results of bootstrapping tell us
  - the frequency with which each model is selected,
  - the variances of the model outputs conditioned on each model, and
  - the variances of the model outputs accounting for model selection error.
13Model Selection(Classical methods-V)
- How to generate the bootstrap data sets
  - Residuals and model predictions based on the model with the largest value of $w_k$.
  - Residuals and model predictions based on the model that explains the most of the variance in the data.
  - Residuals and model predictions based on a model chosen for each data set with probability equal to its AIC (or BIC) weight.
  - Resampling the underlying raw data (with replacement).
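The first of these options (residuals and predictions from the best model) can be sketched as follows. To stay self-contained the sketch uses only two hypothetical candidate models, a constant and a straight line, selects by AIC under normal errors, and runs on made-up data; none of this is the example problem's actual model set or data.

```python
import math
import random

def fit_constant(x, y):
    mean = sum(y) / len(y)
    return lambda t: mean

def fit_linear(x, y):
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    sxx = sum((xi - xb) ** 2 for xi in x)
    slope = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) / sxx
    return lambda t: yb + slope * (t - xb)

# (name, fitting function, number of regression coefficients)
MODELS = [("constant", fit_constant, 1), ("linear", fit_linear, 2)]

def aic(x, y, predict, n_coef):
    """AIC for normal errors, up to an additive constant.
    The residual variance counts as one extra parameter."""
    n = len(y)
    rss = sum((yi - predict(xi)) ** 2 for xi, yi in zip(x, y))
    return n * math.log(max(rss, 1e-300) / n) + 2.0 * (n_coef + 1)

def select(x, y):
    """Fit every candidate and return (name, predictor) of the AIC-best one."""
    fits = [(name, fit(x, y), k) for name, fit, k in MODELS]
    scores = [aic(x, y, pred, k) for _, pred, k in fits]
    best = scores.index(min(scores))
    return fits[best][0], fits[best][1]

def bootstrap_selection(x, y, x_new, n_boot=500, seed=1):
    rng = random.Random(seed)
    _, best_pred = select(x, y)
    fitted = [best_pred(xi) for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    counts = {name: 0 for name, _, _ in MODELS}
    outputs = []
    for _ in range(n_boot):
        # Pseudo data: best-model predictions plus resampled residuals.
        y_star = [fi + rng.choice(resid) for fi in fitted]
        name, pred = select(x, y_star)
        counts[name] += 1
        outputs.append(pred(x_new))
    mean = sum(outputs) / n_boot
    var = sum((o - mean) ** 2 for o in outputs) / (n_boot - 1)
    return counts, var  # selection frequencies, variance incl. selection error

x_obs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y_obs = [12.1, 13.8, 17.2, 17.9, 21.3, 22.0, 25.4, 25.9]  # made-up data
counts, var = bootstrap_selection(x_obs, y_obs, x_new=9.0)
print(counts, var)
```

Grouping the stored outputs by the selected model name would give the variances conditioned on each model; the variance returned here is the one that accounts for model selection error.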
14Model Selection(Bayesian methods-I)
- DIC (Deviance Information Criterion)
  $$\mathrm{DIC} = 2\bar{D} - \hat{D}$$
  where $\bar{D}$ is the mean deviance and $\hat{D}$ is the estimated deviance.
- The value for the mean deviance (twice the negative log-likelihood) is evaluated from an MCMC chain.
- Determining the value of the estimated deviance is less simple; approaches include using the Maximum Posterior Density (MPD) estimate, the deviance of the mean of the posterior, and the deviance of the median of the posterior.
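A minimal sketch of the DIC calculation, assuming the usual form DIC = 2 x (mean deviance) - (estimated deviance). The model is a hypothetical one-parameter example (normal data with known standard deviation), and the "MCMC chain" is replaced by direct draws from the known posterior under a flat prior, purely so the example runs on its own.

```python
import math
import random
import statistics

def deviance(mu, data, sigma=1.0):
    """Twice the negative log-likelihood for normal data with known sigma."""
    n = len(data)
    return (n * math.log(2.0 * math.pi * sigma ** 2)
            + sum((y - mu) ** 2 for y in data) / sigma ** 2)

def dic(samples, data, point="mean"):
    """Return (DIC, effective number of parameters p_D).

    `point` chooses where the estimated deviance is evaluated:
    the posterior mean or the posterior median of the samples.
    """
    deviances = [deviance(mu, data) for mu in samples]
    d_bar = sum(deviances) / len(deviances)        # mean deviance
    if point == "mean":
        estimate = sum(samples) / len(samples)
    elif point == "median":
        estimate = statistics.median(samples)
    else:
        raise ValueError("point must be 'mean' or 'median'")
    d_hat = deviance(estimate, data)               # estimated deviance
    return 2.0 * d_bar - d_hat, d_bar - d_hat

data = [4.1, 5.3, 4.8, 5.9, 5.0, 4.6]              # made-up observations
y_bar = sum(data) / len(data)
rng = random.Random(42)
# Stand-in for an MCMC chain: draws from the flat-prior posterior of mu.
chain = [rng.gauss(y_bar, 1.0 / math.sqrt(len(data))) for _ in range(5000)]

for point in ("mean", "median"):
    print(point, dic(chain, data, point))
```

Comparing the two lines of output gives a feel for how much the choice of estimated deviance matters; in this well-behaved example the difference is small, but for skewed posteriors it can be large.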
15Model Selection(Bayesian methods-II)
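- Bayes Factor (essentially the posterior odds ratio, assuming that the prior probability for each model is the same), i.e. the weight in favor of model k is
  $$w_k = \frac{P(D \mid M_k)}{\sum_j P(D \mid M_j)}$$
- where $P(D \mid M_k)$, the marginal likelihood of the data $D$ under model k, is computed from the MCMC output for that model.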
16Model Selection(Bayesian methods-III)
- Notes
  - The formula used to compute the Bayes factor can be numerically unstable; alternative formulations are available.
  - The results of using DIC may be very sensitive to how the estimated deviance is defined.
  - Both methods require that MCMC chains (that converge) are available for all models.
  - Computing the posterior for a model output is relatively straightforward: the posterior distribution for model k is sampled with probability $w_k$.
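The sketch below illustrates two of these notes. It assumes the marginal likelihood of each model is approximated by the harmonic mean of the likelihoods along its MCMC chain, which is one well-known (and notoriously unstable) estimator and may not be the formula the slide used. Working in log space avoids overflow, and the model-averaged posterior is then built by picking model k with probability w_k and drawing from its chain. All function names and input values are hypothetical.

```python
import math
import random

def log_mean_exp(values):
    """log(mean(exp(v))) computed without overflow."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))

def log_marginal_harmonic(log_liks):
    """Harmonic-mean estimate of log P(D | M) from the log-likelihoods
    evaluated at each draw of the chain: 1 / P(D) ~ mean(1 / L_i)."""
    return -log_mean_exp([-ll for ll in log_liks])

def model_weights(log_marginals):
    """Posterior model probabilities under equal prior model probabilities."""
    m = max(log_marginals)
    rel = [math.exp(v - m) for v in log_marginals]
    total = sum(rel)
    return [r / total for r in rel]

def sample_averaged_posterior(chains, weights, n_draws, seed=0):
    """Pick model k with probability w_k, then draw from its chain."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        k = rng.choices(range(len(chains)), weights=weights)[0]
        draws.append(rng.choice(chains[k]))
    return draws

# Hypothetical log-likelihood traces and model-output chains for two models.
log_lik_traces = [[-1002.3, -1001.1, -1003.8], [-1000.4, -1001.9, -1000.9]]
output_chains = [[330.1, 334.8, 329.5], [345.2, 349.9, 347.3]]

w = model_weights([log_marginal_harmonic(t) for t in log_lik_traces])
print(w)
print(sample_averaged_posterior(output_chains, w, n_draws=5))
```

The harmonic mean is dominated by the smallest likelihood values in the chain, which is one reason such estimates can vary a lot between runs.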
17Model Selection(Bayesian methods-IV)
For simplicity, uniform priors are placed on all
of the parameters.
18Model Selection(Bayesian methods-V)
19Model Selection(Bayesian methods-VI)
20Additional Caveat
- Care needs to be taken when choosing models for consideration. For example, if the same model is included more than once (or a number of slight variants of one model are included in the set of models), that model will be (unintentionally) overweighted.
  - The bootstrapping approach outlined above may handle this problem.
  - Alternative priors can be assigned to each model.
21References
- Buckland, S.T., K.P. Burnham and N.H. Augustin. 1997. Model selection: An integral part of inference. Biometrics 53: 603-618.
- Burnham, K.P. and D.R. Anderson. 2002. Model selection and multi-model inference: A practical information-theoretic approach. 2nd ed. New York: Springer.
- Hoeting, J.A., D. Madigan, A.E. Raftery and C.T. Volinsky. 1999. Bayesian model averaging: A tutorial. Statistical Science 14: 382-417.
- Kass, R.E. and A.E. Raftery. 1995. Bayes factors. JASA 90: 773-795.
- Spiegelhalter, D.J., N.G. Best, B.P. Carlin and A. van der Linde. 2002. Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society B 64: 583-639.