Title: Practical Model Selection and Multi-model Inference using R
1Practical Model Selection and Multi-model
Inference using R
- Presented by
- Eric Stolen and Dan Hunt
2Foundation: Theory, hypotheses, and models
3Theory
- This is the link with science, which is about
understanding how the world works
4Theory
- A set of propositions set out as an
explanation.
- Theories are generalizations.
- Theories contain questions.
- Theories continually change
- (Ford, E. D. 2000. Scientific Method for
Ecological Research. Cambridge University
Press.)
5Theory
- Example 1: Wading bird foraging
- Ideal Free Distribution
- Marginal Value Theorem
- Scramble Competition
6Theory
- Example 2: Indigo Snake habitat selection
- Animal perception
- Evolutionary Biology
- Population Demography
7Hypotheses
- Many views: confusing!
- A hypothesis is a statement derived from
scientific theory that postulates something about
how the world works
- A testable hypothesis is a hypothesis that can be
falsified by a contradiction between a prediction
derived from the hypothesis and data measured in
the appropriate way
8Hypotheses
- To use the Information-theoretic toolbox, we must
be able to state a hypothesis as a statistical
model (or more precisely an equation which allows
us to calculate the maximum likelihood of the
hypothesis)
9Multiple Working Hypotheses
- We operate with a set of multiple alternative
hypotheses (models)
- The many advantages include safeguarding
objectivity and allowing rigorous inference.
- Multiple working hypotheses: Chamberlin (1890)
- Strong Inference: Platt (1964)
- Bold Conjectures: Karl Popper (ca. 1960)
10Deriving the model set
- This is the tough part (but also the creative
part)
- Much thought needed, so don't rush
- Collaborate, seek outside advice, read the
literature, go to meetings
- "How" and "When" hypotheses are better than "What"
hypotheses (strive to predict rather than
describe)
11Models Indigo Snake example
- Study of indigo snake habitat use
- Response variable: home range size, ln(ha)
- SEX
- Land cover: 2-3 levels (lc2)
- weeks: effort/exposure
- Science question: Is there a seasonal difference
in habitat use between sexes?
12Models Indigo Snake example
- SEX
- land cover type (lc2)
- weeks
- SEX + lc2
- SEX + weeks
- lc2 + weeks
- SEX + lc2 + weeks
- SEX + lc2 + SEX × lc2
- SEX + lc2 + weeks + SEX × lc2
15Models fish habitat use example
- Study of fish habitat use in salt marsh
- Response variable: density, ln(fish m^-2 + 1)
- Habitat: vegetated or unvegetated
- Site: 7 impoundments
- Season: 4 seasons
- Science questions
- Is there evidence for a difference in density
between habitats?
- Is there a seasonal difference in habitat use by
resident marsh fish?
16Models fish habitat use example
- Site + Season + Habitat + Site×Habitat + Season×Habitat + Site×Season
- Site + Season + Habitat + Site×Habitat + Season×Habitat
- Site + Season + Habitat + Site×Season + Site×Habitat
- Site + Season + Habitat + Site×Season + Season×Habitat
- Site + Season + Habitat + Site×Habitat
- Site + Habitat + Site×Habitat
- Site + Season + Habitat + Season×Habitat
- Season + Habitat + Season×Habitat
- Site + Season + Habitat + Site×Season
- Site + Season + Site×Season
- Site + Season + Habitat
- Site + Season
- Site + Habitat
- Season + Habitat
- Site
- Season
- Habitat
19The importance of a priori thinking: you can't
go back home!
20Modeling
- Trade-off between precision and bias
- Trying to derive knowledge / advance learning,
not fit the data
- Relationship between data (quantity and quality)
and sophistication of the model
21Precision-Bias Trade-off
[Figure: Bias² and variance plotted against model complexity (increasing number of parameters)]
24Kullback-Leibler Information
- Basic concept from Information theory
- The information lost when a model is used to
represent full reality
- Can also think of it as the distance between a
model and full reality
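As a rough illustration of the idea on this slide, the sketch below computes the K-L information lost when each of three candidate distributions stands in for a known "truth". The discrete distributions and the names `truth`, `g1`, `g2`, `g3`, and `kl_information` are invented for this example. In real work truth is unknown, which is why only relative distances between models can be estimated.

```r
# K-L information I(f, g) = sum( f * log(f / g) ) for discrete distributions.
# f plays the role of truth, g the approximating model.
kl_information <- function(f, g) {
  stopifnot(length(f) == length(g), all(f > 0), all(g > 0))
  sum(f * log(f / g))
}

# Hypothetical "truth" and three approximating models (each sums to 1)
truth <- c(0.10, 0.20, 0.40, 0.20, 0.10)
g1 <- c(0.12, 0.20, 0.36, 0.20, 0.12)
g2 <- c(0.20, 0.20, 0.20, 0.20, 0.20)
g3 <- c(0.40, 0.30, 0.15, 0.10, 0.05)

kl <- c(g1 = kl_information(truth, g1),
        g2 = kl_information(truth, g2),
        g3 = kl_information(truth, g3))

print(round(kl, 4))            # information lost by each model
print(names(which.min(kl)))    # model closest to truth
```

The model with the smallest value loses the least information, so it is the "best model in the set" in the K-L sense.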
25Kullback-Leibler Information
[Figure: truth / reality and three models, G1 (best model in set), G2, and G3]
- The relative difference between models is constant
29Akaike's Contributions
- Figured out how to estimate the relative
Kullback-Leibler distance between models in a set
of models
- Figured out how to link maximum likelihood
estimation theory with expected K-L information
- An (Akaike's) Information Criterion
- $\mathrm{AIC} = -2 \log_e\left(\mathcal{L}(\text{model}_i \mid \text{data})\right) + 2K$
32I-T mechanics
- $\mathrm{AICc}_i = -2 \log_e\left(\mathcal{L}(\text{model}_i \mid \text{data})\right) + 2K \left(\frac{n}{n-K-1}\right)$
- or
- $\mathrm{AICc} = \mathrm{AIC} + \frac{2K(K+1)}{n-K-1}$
- (where $K$ = the number of parameters estimated and
$n$ = the sample size)
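A minimal sketch of the AICc calculation in R follows. The helper name `aicc` is hypothetical (it is not a base R function), and the built-in `mtcars` data are used only so the example runs. It assumes the fitted model supports `logLik()` and `nobs()`.

```r
# AICc for a fitted model, using the "AIC plus correction" form.
aicc <- function(fit) {
  ll  <- logLik(fit)
  K   <- attr(ll, "df")    # number of estimated parameters
  n   <- nobs(fit)         # sample size
  aic <- -2 * as.numeric(ll) + 2 * K
  aic + (2 * K * (K + 1)) / (n - K - 1)
}

fit <- lm(mpg ~ wt + hp, data = mtcars)   # example model only

print(AIC(fit))    # uncorrected AIC
print(aicc(fit))   # small-sample corrected AICc

# The two forms of the formula agree:
K <- attr(logLik(fit), "df")
n <- nobs(fit)
print(-2 * as.numeric(logLik(fit)) + 2 * K * (n / (n - K - 1)))
```

The correction term shrinks toward zero as n grows relative to K, so AICc approaches AIC for large samples.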
33I-T mechanics
- $\mathrm{AICc}_{\min}$ = AICc for the model with the lowest AICc
value
- $\Delta_i = \mathrm{AICc}_i - \mathrm{AICc}_{\min}$
34I-T mechanics
- Model probability $w_i$ (also Bayesian posterior model
probabilities)
- Evidence ratio of model i to model j = $w_i / w_j$
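The sketch below works through these quantities using the usual Akaike weight formula, $w_i = \exp(-\Delta_i / 2) / \sum_r \exp(-\Delta_r / 2)$. The AICc values and model names are invented for illustration.

```r
# Hypothetical AICc values for four candidate models
aicc_values <- c(m1 = 102.3, m2 = 100.0, m3 = 104.1, m4 = 110.6)

delta   <- aicc_values - min(aicc_values)   # Delta_i = AICc_i - AICc_min
rel_lik <- exp(-0.5 * delta)                # relative likelihood of each model
w       <- rel_lik / sum(rel_lik)           # model probabilities (sum to 1)

tab <- data.frame(AICc = aicc_values, delta = delta, weight = round(w, 3))
tab <- tab[order(tab$delta), ]              # best-supported model first
print(tab)

# Evidence ratio of model m2 (lowest AICc) to model m1
evidence_ratio <- w[["m2"]] / w[["m1"]]
print(evidence_ratio)
```

Because the normalising sum cancels, an evidence ratio depends only on the difference in AICc between the two models being compared.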
35I-T mechanics
- Least Squares Regression
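- $\mathrm{AICc} = n \log_e(\hat{\sigma}^2) + 2K \left(\frac{n}{n-K-1}\right)$
- Where $\hat{\sigma}^2 = \mathrm{RSS} / n$
- (this leaves out the constant part of the log-likelihood, an offset that is the same for every model fit to the same data)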
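To make the constant offset concrete, the sketch below computes the least-squares form by hand and compares it with R's `AIC()`, which keeps the constant part of the Gaussian log-likelihood. The model and the built-in `mtcars` data are placeholders chosen only so the code runs.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)   # example model only

n   <- nobs(fit)
rss <- sum(residuals(fit)^2)
K   <- length(coef(fit)) + 1   # coefficients (incl. intercept) + residual variance

aic_ls  <- n * log(rss / n) + 2 * K                       # least-squares AIC
aicc_ls <- n * log(rss / n) + 2 * K * (n / (n - K - 1))   # with small-sample correction

# AIC() includes the constant n * (log(2 * pi) + 1), so the two versions
# differ by the same amount for every model fit to these data.
const_offset <- AIC(fit) - aic_ls
print(c(aic_ls = aic_ls, aicc_ls = aicc_ls, const_offset = const_offset))
print(n * (log(2 * pi) + 1))
```

Since the offset is identical across models, differences in AIC and the resulting model weights are unaffected, provided one version is used consistently throughout the model set.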
36I-T mechanics
- Counting Parameters
- K = number of parameters estimated
- Least Squares Regression
- K = number of slope parameters + 2 (for intercept and σ)
37I-T mechanics
- Counting Parameters
- K = number of parameters estimated
- Logistic Regression
- K = number of slope parameters + 1 (for intercept)
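One way to check a parameter count in R is to read the `df` attribute of `logLik()`, which reports the number of estimated parameters for `lm` and `glm` fits. The models below use the built-in `mtcars` data purely as placeholders.

```r
# Least squares regression: 2 slopes + intercept + residual variance
fit_ls <- lm(mpg ~ wt + hp, data = mtcars)
K_ls   <- attr(logLik(fit_ls), "df")

# Logistic regression: 2 slopes + intercept (no variance parameter)
fit_logit <- glm(am ~ wt + hp, family = binomial, data = mtcars)
K_logit   <- attr(logLik(fit_logit), "df")

print(c(K_ls = K_ls, K_logit = K_logit))
```

Comparing this value with a hand count is a quick guard against forgetting the variance term in least-squares models.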
38I-T mechanics
- Counting Parameters
- Non-identifiable parameters
39Comparing Models
40Comparing Models
Combined model weight = 0.995
41Comparing Models
Evidence Ratio = 4.52
42Comparing Models
43Comparing Models
Evidence Ratio = 3.03
44Comparing Models
Evidence Ratio = 4.28, from (.34 + .22 + .14 + .08) /
(.11 + .04 + .02 + .01)
45Generalized Linear Models
46Mathematical details
- Three parts to a GLM
- Link function
- linear equation
- error distribution
47Mathematical details
- General Linear Models: linear regression and
ANOVA
- Link function: Identity link
- linear equation
- error distribution: Normal Distribution
(Gaussian)
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$
48Mathematical details
- Logistic Regression
- Link function: Logit link, ln(p / (1-p))
- linear equation
- error distribution: Binomial Distribution
$\operatorname{logit}(p) = \beta_0 + \beta_1 X_1 + \beta_2 X_2$
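The three parts of a GLM map directly onto the arguments of `glm()`: the formula is the linear equation, and the `family` object carries both the error distribution and the link. The sketch below fits both model types to simulated data. The coefficients, sample size, and variable names are arbitrary choices for illustration.

```r
set.seed(1)
x1 <- rnorm(100)
x2 <- rnorm(100)

# General linear model: identity link, Normal errors
y <- 1 + 0.5 * x1 - 0.3 * x2 + rnorm(100, sd = 0.5)
fit_normal <- glm(y ~ x1 + x2, family = gaussian(link = "identity"))

# Logistic regression: logit link, binomial errors
p <- plogis(-0.5 + 1.2 * x1 - 0.8 * x2)   # inverse logit of the linear equation
z <- rbinom(100, size = 1, prob = p)
fit_logit <- glm(z ~ x1 + x2, family = binomial(link = "logit"))

print(coef(fit_normal))
print(coef(fit_logit))
print(c(family(fit_normal)$link, family(fit_logit)$link))
```

With the Gaussian family and identity link, `glm()` gives the same coefficient estimates as `lm()`.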
49Mathematical details
- What types of models can be compared within a
single I-T analysis?
- Data must be fixed (including response)
- Must be able to calculate maximum likelihood
- (ways to deal with quasi-likelihood)
- Models do not need to be nested
- In some cases AIC is additive
50Model Fitting Preliminaries
- Understanding the data/variables
- Avoid data dredging!
- safe data screening practices
- Detect outliers, scale issues, collinearity
- Tools in R
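A minimal screening sketch using base R follows. It looks only at the predictors, not at their relationship with the response, which is one reading of "safe" screening that avoids dredging. The built-in `mtcars` columns stand in for real predictors.

```r
dat <- mtcars[, c("wt", "hp", "disp")]   # stand-in predictors only

# Ranges and quartiles reveal scale differences between variables
print(summary(dat))

# Flag values more than 1.5 * IQR beyond the quartiles (boxplot rule)
outliers <- lapply(dat, function(v) boxplot.stats(v)$out)
print(outliers)

# Pairwise correlations among predictors; large |r| suggests collinearity
r <- cor(dat)
print(round(r, 2))

# Centre and scale predictors so coefficients are on comparable scales
dat_scaled <- as.data.frame(scale(dat))
print(round(colMeans(dat_scaled), 10))
```

Flagged values are candidates for checking against field notes or data entry, not automatic deletions.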
51Tools in R
- Tools in R
- Generalized linear models
- lm
- glm
- Packages
- Design package
- Harrell, F. E. 2001. Regression Modeling Strategies
with Applications to Linear Models, Logistic
Regression, and Survival Analysis. Springer.
- car package
- Fox, J. 2002. An R and S-Plus Companion to
Applied Regression. Sage Publications.
52Tools in R
- Tools in R
- Model formula
- Ex)
model4 <- glm(helpage2 ~ sex + mom_dad + suburb + brdeapp + matepp +
              density + I(density^2),
              family = binomial, data = choices)
- Output
- summary(model4)
- model4$aic
- model4$coefficients
53Tools in R
- Fitting the model set
- R program does the work
- Trouble-shooting
- Export results
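One possible shape for such a program is sketched below: store the candidate models as a named list of formulas, fit them in a loop, build the AICc table, and write it to a file. The built-in `warpbreaks` data, the four candidate models, the helper `aicc`, and the output file name are all placeholders, not the presenters' actual script.

```r
# Candidate model set as named formulas (placeholder example)
candidates <- list(
  wool        = log(breaks) ~ wool,
  tension     = log(breaks) ~ tension,
  additive    = log(breaks) ~ wool + tension,
  interaction = log(breaks) ~ wool * tension
)

# Fit every model in the set to the same data
fits <- lapply(candidates, function(f) lm(f, data = warpbreaks))

# Hypothetical helper: small-sample corrected AIC
aicc <- function(fit) {
  K <- attr(logLik(fit), "df")
  n <- nobs(fit)
  AIC(fit) + 2 * K * (K + 1) / (n - K - 1)
}

tab <- data.frame(
  model = names(fits),
  K     = sapply(fits, function(f) attr(logLik(f), "df")),
  AICc  = sapply(fits, aicc)
)
tab$delta  <- tab$AICc - min(tab$AICc)
tab$weight <- exp(-0.5 * tab$delta) / sum(exp(-0.5 * tab$delta))
tab <- tab[order(tab$delta), ]   # sort by evidence
print(tab, digits = 3)

# Export the table (a temporary file is used here; any path would do)
out_file <- file.path(tempdir(), "model_table.csv")
write.csv(tab, out_file, row.names = FALSE)
```

Keeping the model set in one list makes it harder to add models after seeing the results, which fits the a priori approach.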
54Fish Example
55Model Checking
- Model Checking
- Global model must fit
- Models used for inference must meet assumptions,
- Look for numerical problems
- Tools in R
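A short sketch of two numerical checks on a global model follows, assuming a Poisson GLM fit to the built-in `warpbreaks` data as a stand-in. It checks that the fit converged and estimates the variance inflation factor (often written c-hat) from the Pearson chi-square.

```r
# Global (most complex) model in a hypothetical candidate set
global <- glm(breaks ~ wool * tension, family = poisson, data = warpbreaks)

# Numerical problems: did the fitting algorithm converge?
print(global$converged)

# Goodness of fit: Pearson chi-square / residual df should be near 1
# for a Poisson model; much larger values indicate overdispersion.
c_hat <- sum(residuals(global, type = "pearson")^2) / df.residual(global)
print(c_hat)

# Standard diagnostic plots (residuals vs fitted, Q-Q, leverage);
# uncomment when working interactively.
# plot(global)
```

For these data c-hat is well above 1, so the Poisson global model does not fit, and a quasi-likelihood adjustment or a different error distribution would be worth considering before any model selection.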
56Fish Example
57Interpretation of I-T results
58Interpretation of models for inference
- Case 1: one or a few best models
- Examining model parameters and predictions
- Effects
- Prediction
- graphing results
- nomograms
- Presenting Results
- Anderson, D. R., W. A. Link, D. H. Johnson, and
K. P. Burnham. 2001. Suggestions for presenting
the results of data analysis. Journal of Wildlife
Management 65:373-378.
59Tools
- Calculations in Excel
- AICc, model weights, model likelihood, evidence
ratios
- Sorting the models by evidence (exciting concept)
- Model weights, evidence ratios, relative variable
importance
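The same spreadsheet-style calculation for relative variable importance can be sketched in R: sum the model weights over every model that contains the variable. The weights and the model structure below are invented for illustration.

```r
# Hypothetical Akaike weights for four models
w <- c(m1 = 0.50, m2 = 0.30, m3 = 0.15, m4 = 0.05)

# Which variables appear in which model (TRUE = present); made-up structure
in_model <- rbind(
  m1 = c(Site = TRUE,  Season = TRUE,  Habitat = TRUE),
  m2 = c(Site = TRUE,  Season = FALSE, Habitat = TRUE),
  m3 = c(Site = FALSE, Season = TRUE,  Habitat = TRUE),
  m4 = c(Site = TRUE,  Season = FALSE, Habitat = FALSE)
)

# Relative importance = sum of weights over the models containing the variable
# (row i of in_model is multiplied by w[i], then columns are summed)
importance <- colSums(in_model * w)
print(importance)
```

These sums are easiest to compare when each variable appears in the same number of models in the set.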
60Fish Example
61Multi-model Inference
- Model selection uncertainty
- Model-average prediction
- Model-average parameter estimates
62Model Averaging Predictions
63Model Averaging Predictions
64Model Averaging Predictions
65Model Averaging Predictions
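A model-averaged prediction is the weighted sum of each model's prediction, using the Akaike weights. The sketch below assumes that standard form. The three models, the built-in `warpbreaks` data, and the prediction point are placeholders.

```r
fits <- list(
  tension     = lm(log(breaks) ~ tension,        data = warpbreaks),
  additive    = lm(log(breaks) ~ wool + tension, data = warpbreaks),
  interaction = lm(log(breaks) ~ wool * tension, data = warpbreaks)
)

# Akaike weights from AICc
aicc <- sapply(fits, function(f) {
  K <- attr(logLik(f), "df")
  n <- nobs(f)
  AIC(f) + 2 * K * (K + 1) / (n - K - 1)
})
w <- exp(-0.5 * (aicc - min(aicc)))
w <- w / sum(w)

# Point at which to predict (wool A, medium tension)
newdata <- data.frame(
  wool    = factor("A", levels = levels(warpbreaks$wool)),
  tension = factor("M", levels = levels(warpbreaks$tension))
)

# One prediction per model, then the weighted average
preds    <- sapply(fits, function(f) unname(predict(f, newdata = newdata)))
avg_pred <- sum(w * preds)

print(rbind(weight = w, prediction = preds))
print(avg_pred)
```

The averaged prediction always lies between the smallest and largest single-model predictions, pulled toward the models with the most weight.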
66Model Averaging Parameters
67Unconditional Variance Estimator
68Unconditional Variance Estimator
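The sketch below assumes the usual Burnham and Anderson form of the unconditional standard error, $\widehat{se}(\bar{\theta}) = \sum_i w_i \sqrt{\widehat{var}(\hat{\theta}_i \mid g_i) + (\hat{\theta}_i - \bar{\theta})^2}$, which adds a between-model term to each model's conditional variance. The models and the built-in `mtcars` data are placeholders. The parameter averaged here is the slope for `wt`, which appears in every model.

```r
fits <- list(
  m1 = lm(mpg ~ wt,             data = mtcars),
  m2 = lm(mpg ~ wt + hp,        data = mtcars),
  m3 = lm(mpg ~ wt + hp + qsec, data = mtcars)
)

# Akaike weights from AICc
aicc <- sapply(fits, function(f) {
  K <- attr(logLik(f), "df")
  n <- nobs(f)
  AIC(f) + 2 * K * (K + 1) / (n - K - 1)
})
w <- exp(-0.5 * (aicc - min(aicc)))
w <- w / sum(w)

# Estimate and conditional standard error of the wt slope in each model
est <- sapply(fits, function(f) coef(f)[["wt"]])
se  <- sapply(fits, function(f) sqrt(vcov(f)["wt", "wt"]))

avg_est   <- sum(w * est)                              # model-averaged estimate
uncond_se <- sum(w * sqrt(se^2 + (est - avg_est)^2))   # unconditional SE

print(rbind(weight = w, estimate = est, se = se))
print(c(avg_est = avg_est, uncond_se = uncond_se))
```

The unconditional standard error is never smaller than the weighted average of the conditional ones, because it also reflects disagreement among models about the estimate.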
69Snake Example
70Multi-model Inference
71Multi-model Inference