Practical Model Selection and Multi-model Inference using R

Provided by: Stol75

1
Practical Model Selection and Multi-model
Inference using R

Presented by
Eric Stolen and Dan Hunt

2
Foundation: Theory, hypotheses, and models
3
Theory

This is the link with science, which is about
understanding how the world works

4
Theory

A set of propositions set out as an
explanation.
Theories are generalizations.
Theories contain questions.
Theories continually change
(Ford, E. D. 2000. Scientific Method for
Ecological Research. Cambridge University
Press.)

5
Theory

Example 1: Wading bird foraging
Ideal Free Distribution
Marginal Value Theorem
Scramble Competition

6
Theory

Example 2: Indigo snake habitat selection
Animal perception
Evolutionary Biology
Population Demography

7
Hypotheses

There are many views, which can be confusing!
A hypothesis is a statement derived from
scientific theory that postulates something about
how the world works
A testable hypothesis is a hypothesis that can be
falsified by a contradiction between a prediction
derived from the hypothesis and data measured in
the appropriate way

8
Hypotheses

To use the information-theoretic toolbox, we must
be able to state a hypothesis as a statistical
model (more precisely, an equation that allows
us to calculate the maximized likelihood of the
model given the data)

9
Multiple Working Hypotheses

We operate with a set of multiple alternative
hypotheses (models)
The many advantages include safeguarding
objectivity and allowing rigorous inference.
Chamberlin (1890): Multiple working hypotheses
Platt (1964): Strong inference
Karl Popper (ca. 1960): Bold conjectures

10
Deriving the model set

This is the tough part (but also the creative
part)
much thought needed, so don't rush
collaborate, seek outside advice, read the
literature, go to meetings
How and When hypotheses are better than What
hypotheses (strive to predict rather than
describe)

11
Models: Indigo snake example

Study of indigo snake habitat use
Response variable: home range size, ln(ha)
SEX
Land cover: 2-3 levels (lc2)
Weeks: effort/exposure
Science question: Is there a seasonal difference
in habitat use between sexes?

12
Models: Indigo snake example
SEX
land cover type (lc2)
weeks
SEX + lc2
SEX + weeks
lc2 + weeks
SEX + lc2 + weeks
SEX + lc2 + SEX × lc2
SEX + lc2 + weeks + SEX × lc2
14
Models: Indigo snake example
SEX
land cover
weeks
SEX + land cover
SEX + weeks
land cover + weeks
SEX + land cover + weeks
SEX + land cover + SEX × land cover
SEX + land cover + weeks + SEX × land cover
15
Models: fish habitat use example

Study of fish habitat use in salt marsh
Response variable: density, ln(fish m⁻² + 1)
Habitat: vegetated or unvegetated
Site: 7 impoundments
Season: 4 seasons
Science questions:
Is there evidence for a difference in density
between habitats?
Is there a seasonal difference in habitat use by
resident marsh fish?

16
Models: fish habitat use example

Site + Season + Habitat + Site × Habitat + Season × Habitat + Site × Season
Site + Season + Habitat + Site × Habitat + Season × Habitat
Site + Season + Habitat + Site × Season + Site × Habitat
Site + Season + Habitat + Site × Season + Season × Habitat
Site + Season + Habitat + Site × Habitat
Site + Habitat + Site × Habitat
Site + Season + Habitat + Season × Habitat
Season + Habitat + Season × Habitat
Site + Season + Habitat + Site × Season
Site + Season + Site × Season
Site + Season + Habitat
Site + Season
Site + Habitat
Season + Habitat
Site
Season
Habitat
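One way to carry an a priori model set into R is to write each candidate model as a formula in a named list before any fitting is done. The sketch below encodes the 17 fish models listed above. It assumes a response column named `log_density` and factor columns `Site`, `Season`, and `Habitat`; these column names are hypothetical. In R formulas, `:` denotes an interaction term.

```r
# A priori fish-habitat model set written as R formulas.
# Column names (log_density, Site, Season, Habitat) are hypothetical.
rhs <- c(
  "Site + Season + Habitat + Site:Habitat + Season:Habitat + Site:Season",
  "Site + Season + Habitat + Site:Habitat + Season:Habitat",
  "Site + Season + Habitat + Site:Season + Site:Habitat",
  "Site + Season + Habitat + Site:Season + Season:Habitat",
  "Site + Season + Habitat + Site:Habitat",
  "Site + Habitat + Site:Habitat",
  "Site + Season + Habitat + Season:Habitat",
  "Season + Habitat + Season:Habitat",
  "Site + Season + Habitat + Site:Season",
  "Site + Season + Site:Season",
  "Site + Season + Habitat",
  "Site + Season",
  "Site + Habitat",
  "Season + Habitat",
  "Site",
  "Season",
  "Habitat"
)

# Every model shares the same response, as AIC comparison requires
model_set <- lapply(rhs, function(r) as.formula(paste("log_density ~", r)))
names(model_set) <- paste0("fish", seq_along(model_set))

length(model_set)   # number of candidate models
model_set$fish6     # the sixth model: Site + Habitat + Site:Habitat
```

Writing the set down as code before seeing any fits helps enforce the "a priori" rule: the list is fixed, and the same list is later passed to the fitting step.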

19
The importance of a priori thinking: you can't
go back home!
20
Modeling

Trade-off between precision and bias
The goal is to derive knowledge and advance learning,
not just to fit the data
Relationship between data (quantity and quality)
and sophistication of the model

21
Precision-Bias Trade-off
[Figure: squared bias (Bias²) plotted against model complexity (increasing number of parameters)]
22
Precision-Bias Trade-off
[Figure: variance and squared bias (Bias²) plotted against model complexity (increasing number of parameters)]
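The trade-off in the figure can be reproduced with a small Monte Carlo simulation. This is a sketch under assumed conditions, not data from the talk. The "truth" is taken to be y = sin(2πx) plus normal noise (sd 0.3) on 30 fixed x values. Model complexity is the degree of a polynomial fitted with `lm()`. For each degree, the code measures squared bias and variance of the prediction at one point.

```r
# Monte Carlo sketch of the bias-variance trade-off.
# Assumed truth: y = sin(2*pi*x) + N(0, 0.3^2) noise on a fixed 30-point design.
set.seed(1)
x <- seq(0, 1, length.out = 30)
f_true <- function(x) sin(2 * pi * x)
x0 <- 0.25                 # point at which predictions are compared
degrees <- c(1, 3, 8)      # model complexity = polynomial degree
n_sim <- 500               # simulated data sets per degree

# Matrix of predictions at x0: one column per degree, one row per data set
preds <- sapply(degrees, function(d) {
  replicate(n_sim, {
    y <- f_true(x) + rnorm(length(x), sd = 0.3)
    fit <- lm(y ~ poly(x, d))
    predict(fit, newdata = data.frame(x = x0))
  })
})

bias2    <- (colMeans(preds) - f_true(x0))^2   # squared bias of the prediction
variance <- apply(preds, 2, var)               # sampling variance of the prediction
res <- data.frame(degree = degrees, bias2 = bias2, variance = variance)
print(round(res, 4))
```

Squared bias falls as parameters are added, and variance rises. The simple model is precise but biased. The complex model is nearly unbiased but imprecise.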
24
Kullback-Leibler Information

A basic concept from information theory
The information lost when a model is used to
represent full reality
It can also be thought of as the distance between a
model and full reality

25
Kullback-Leibler Information
[Figure: truth / reality and candidate models G1 (best model in set), G2, and G3]
28
Kullback-Leibler Information
[Figure: truth / reality and candidate models G1 (best model in set), G2, and G3]
The relative difference between models is constant
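To see why only relative distances matter, a minimal sketch helps. It uses a hypothetical truth f = Normal(0, 1) and three fixed candidate densities named after the figure's G1 to G3. K-L information is I(f, g) = E_f[log f] − E_f[log g]. The first term depends only on the truth, so it is the same for every model. Differences between models therefore never need it, and that unknown constant is the quantity AIC avoids estimating.

```r
# Hypothetical truth f = Normal(0, 1) and three fixed candidate models (log densities)
f_log <- function(x) dnorm(x, 0, 1, log = TRUE)
g_log <- list(
  G1 = function(x) dnorm(x, 0.2, 1,   log = TRUE),
  G2 = function(x) dnorm(x, 0.5, 1.2, log = TRUE),
  G3 = function(x) dnorm(x, 1,   2,   log = TRUE)
)

# E_f[log h(X)] by numerical integration over a wide finite range
expect_log <- function(h_log) {
  integrate(function(x) exp(f_log(x)) * h_log(x), -12, 12, rel.tol = 1e-8)$value
}

e_f <- expect_log(f_log)           # constant: depends only on the truth
e_g <- sapply(g_log, expect_log)   # model-specific term
kl  <- e_f - e_g                   # I(f, g) = E_f[log f] - E_f[log g]
round(kl, 4)                       # G1 is closest to the truth

# A difference between two models does not involve the unknown constant e_f
all.equal(kl[["G2"]] - kl[["G1"]], e_g[["G1"]] - e_g[["G2"]])
```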
29
Akaike's Contributions

Figured out how to estimate the relative
Kullback-Leibler distance between models in a set
of models
Figured out how to link maximum likelihood
estimation theory with expected K-L information
An (Akaike's) Information Criterion
$\mathrm{AIC} = -2\log_e\big(\mathcal{L}(\text{model}_i \mid \text{data})\big) + 2K$
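In R, the two ingredients of AIC are available from any fitted model that supports `logLik()`. The returned object carries the maximized log-likelihood and, in its `df` attribute, the number of estimated parameters K. The sketch below uses the built-in `mtcars` data purely as an illustration. It computes AIC by hand and checks the result against `AIC()`.

```r
# Built-in mtcars data; the model is illustrative only
fit <- lm(mpg ~ wt + hp, data = mtcars)

ll <- logLik(fit)          # maximized log-likelihood, log_e L(model | data)
K  <- attr(ll, "df")       # estimated parameters: 3 coefficients + sigma = 4
aic_by_hand <- -2 * as.numeric(ll) + 2 * K

c(by_hand = aic_by_hand, builtin = AIC(fit))   # the two values agree
```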

32
I-T mechanics

$\mathrm{AIC}_{c,i} = -2\log_e\big(\mathcal{L}(\text{model}_i \mid \text{data})\big) + 2K\left(\frac{n}{n-K-1}\right)$
or
$\mathrm{AIC}_c = \mathrm{AIC} + \frac{2K(K+1)}{n-K-1}$
(where K = the number of parameters estimated and
n = the sample size)
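Base R's `AIC()` returns uncorrected AIC. AICc can be computed with a small helper, sketched below. The name `aicc` is hypothetical and not a base R function. The sketch also confirms numerically that the two forms on the slide are algebraically identical, using the built-in `mtcars` data (n = 32, K = 4).

```r
# AICc for any fitted model that supports logLik() and nobs()
aicc <- function(fit) {
  ll <- logLik(fit)
  K  <- attr(ll, "df")
  n  <- nobs(fit)
  -2 * as.numeric(ll) + 2 * K * (n / (n - K - 1))
}

fit <- lm(mpg ~ wt + hp, data = mtcars)
K <- attr(logLik(fit), "df")
n <- nobs(fit)

# First form (2K * n / (n - K - 1)) versus second form (AIC + small-sample term)
c(form1 = aicc(fit), form2 = AIC(fit) + 2 * K * (K + 1) / (n - K - 1))
```

The correction term shrinks as n grows relative to K, so AICc converges to AIC for large samples.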

33
I-T mechanics

$\mathrm{AIC}_{c,\min}$ = AICc for the model with the lowest AICc
value
$\Delta_i = \mathrm{AIC}_{c,i} - \mathrm{AIC}_{c,\min}$

34
I-T mechanics

Model probability (also Bayesian posterior model
probabilities)
Evidence ratio of model i to model j = $w_i / w_j$
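The weight formula did not survive in this transcript. The sketch below uses the standard Akaike-weight definition, w_i = exp(−Δ_i/2) / Σ_r exp(−Δ_r/2), starting from a vector of AICc values. The four AICc values are hypothetical.

```r
# Hypothetical AICc values for four candidate models
aicc <- c(m1 = 100.0, m2 = 101.5, m3 = 104.2, m4 = 110.0)

delta <- aicc - min(aicc)     # Delta_i = AICc_i - AICc_min
lik   <- exp(-delta / 2)      # relative likelihood of each model
w     <- lik / sum(lik)       # Akaike weights (model probabilities), sum to 1

round(cbind(aicc, delta, w), 3)
w[["m1"]] / w[["m2"]]         # evidence ratio of m1 to m2, equal to exp(0.75)
```

The evidence ratio depends only on the difference in Δ between the two models, not on which other models are in the set.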

35
I-T mechanics

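Least Squares Regression
$\mathrm{AIC}_c = n\log_e(\hat{\sigma}^2) + 2K\left(\frac{n}{n-K-1}\right)$
where $\hat{\sigma}^2 = \mathrm{RSS}/n$
(This form omits the additive constant $n(\log_e(2\pi) + 1)$. That constant is the same for every model fitted to the same data, so it does not change the $\Delta_i$ values.)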
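The offset can be checked directly. The sketch uses the built-in `mtcars` data as an illustration. It computes AICc from the residual sum of squares (the least-squares form) and from the full Gaussian log-likelihood. The two differ by exactly n(log_e(2π) + 1).

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)
n   <- nobs(fit)
K   <- length(coef(fit)) + 1            # coefficients + sigma^2
rss <- sum(residuals(fit)^2)

# Least-squares form from the slide
aicc_ls <- n * log(rss / n) + 2 * K * (n / (n - K - 1))
# Full-likelihood form
aicc_ll <- -2 * as.numeric(logLik(fit)) + 2 * K * (n / (n - K - 1))

aicc_ll - aicc_ls                       # constant offset ...
n * (log(2 * pi) + 1)                   # ... equal to n * (log(2*pi) + 1)
```

Mixing the two forms within one model set would be an error. Use one form consistently for all models.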

36
I-T mechanics

Counting Parameters
K = number of parameters estimated
Least Squares Regression
K = number of predictor coefficients + 2 (for the intercept and $\hat{\sigma}^2$)

37
I-T mechanics

Counting Parameters
K = number of parameters estimated
Logistic Regression
K = number of predictor coefficients + 1 (for the intercept)
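R's `logLik()` reports K in its `df` attribute, so the counting rules above can be checked against it. The sketch uses two predictors from the built-in `mtcars` data as an illustration. A least-squares model counts the residual variance as a parameter. A binomial (logistic) model has no separate variance parameter.

```r
fit_ls  <- lm(mpg ~ wt + hp, data = mtcars)
fit_log <- glm(am ~ wt + hp, family = binomial, data = mtcars)

# Least squares: 2 predictor coefficients + intercept + sigma^2 = 4
c(by_rule = 2 + 2, logLik_df = attr(logLik(fit_ls), "df"))
# Logistic: 2 predictor coefficients + intercept = 3
c(by_rule = 2 + 1, logLik_df = attr(logLik(fit_log), "df"))
```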

38
I-T mechanics

Counting Parameters
Non-identifiable parameters

39
Comparing Models
40
Comparing Models
Combined model weight 0.995
41
Comparing Models
Evidence Ratio 4.52
42
Comparing Models
43
Comparing Models
Evidence Ratio 3.03
44
Comparing Models
Evidence Ratio 4.28: (0.34 + 0.22 + 0.14 + 0.08) /
(0.11 + 0.04 + 0.02 + 0.01), with weights shown rounded to two decimals
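An evidence ratio can compare groups of models by summing their weights. The sketch uses the rounded weights shown on the slide, which give about 4.33. The slide's 4.28 was presumably computed from unrounded weights. The grouping (which models fall in each group) is not recoverable from this transcript, so the vector names below are hypothetical.

```r
# Weights as displayed on the slide (rounded to two decimals)
w_group1 <- c(0.34, 0.22, 0.14, 0.08)
w_group2 <- c(0.11, 0.04, 0.02, 0.01)

# Evidence for group 1 relative to group 2
sum(w_group1) / sum(w_group2)   # 0.78 / 0.18, about 4.33 from the rounded weights
```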
45
Generalized Linear Models
46
Mathematical details

Three parts to a GLM
Link function
linear equation
error distribution

47
Mathematical details

General linear models: linear regression and
ANOVA
Link function: identity link
linear equation
error distribution: normal (Gaussian) distribution

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$
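A general linear model is the special case of a GLM with an identity link and Gaussian errors. In R, `glm()` with `family = gaussian(link = "identity")` therefore reproduces `lm()`. The sketch below checks this on the built-in `mtcars` data, used here only as an illustration.

```r
fit_lm  <- lm(mpg ~ wt + hp, data = mtcars)
fit_glm <- glm(mpg ~ wt + hp, family = gaussian(link = "identity"), data = mtcars)

all.equal(coef(fit_lm), coef(fit_glm))   # same beta estimates
all.equal(AIC(fit_lm), AIC(fit_glm))     # same AIC (K counts sigma^2 in both)
```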
48
Mathematical details

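Logistic Regression
Link function: logit link, $\ln\left(\frac{p}{1-p}\right)$
linear equation
error distribution: binomial distribution

$\operatorname{logit}(p) = \ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2$ (no additive error term: the binomial distribution supplies the random variation)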
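In R, the logit link and its inverse are `qlogis()` and `plogis()`. A fitted logistic `glm()` returns predictions either on the link scale (the linear equation) or on the response scale (probabilities). The sketch uses the binary `am` variable in the built-in `mtcars` data as an illustrative response.

```r
p <- 0.8
c(qlogis(p), log(p / (1 - p)))          # logit link: both equal log(4)

fit <- glm(am ~ wt, family = binomial(link = "logit"), data = mtcars)
eta <- predict(fit, type = "link")      # linear predictor b0 + b1 * wt
mu  <- predict(fit, type = "response")  # fitted probabilities p
all.equal(mu, plogis(eta))              # inverse logit maps eta back to p
```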
49
Mathematical details

What types of models can be compared within a
single I-T analysis?
The data must be fixed: the same observations and the same
response (including any transformation) for every model
The maximized likelihood must be computable
(there are ways to deal with quasi-likelihood)
Models do not need to be nested
In some cases AIC is additive
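A common way to break the "data must be fixed" rule is missing values. R drops incomplete rows separately for each model, so models with different predictors can end up fitted to different observations. The sketch uses the built-in `airquality` data, which has missing values in `Ozone` and `Solar.R`, as an illustration. It fixes the data once and then compares two non-nested models.

```r
vars <- c("Ozone", "Solar.R", "Wind", "Temp")
# Fix the data once: keep rows complete for every variable in the model set
aq <- airquality[complete.cases(airquality[, vars]), vars]

m_wind  <- lm(Ozone ~ Wind, data = aq)
m_solar <- lm(Ozone ~ Temp + Solar.R, data = aq)   # not nested in m_wind

c(nobs(m_wind), nobs(m_solar))   # same number of observations for both
AIC(m_wind, m_solar)             # valid comparison of non-nested models
```

Fitting `m_wind` to the raw `airquality` data instead would use more rows than `m_solar`, and the two AIC values would no longer be comparable.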

50
Model Fitting Preliminaries

Understanding the data/variables
Avoid data dredging!
Safe data-screening practices:
detect outliers, scale issues, and collinearity
Tools in R
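A few base-R checks cover the screening steps above without touching the response, which helps avoid data dredging. The sketch screens three predictors from the built-in `mtcars` data as an illustration. It computes variance inflation factors by hand as VIF_j = 1 / (1 − R²_j), where R²_j comes from regressing predictor j on the others.

```r
X <- mtcars[, c("wt", "hp", "disp")]   # candidate predictors only, no response

summary(X)            # ranges: spot scale issues and suspicious values
round(cor(X), 2)      # pairwise correlations among predictors

# Variance inflation factor for each predictor
vif <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
  1 / (1 - r2)
})
round(vif, 2)         # large values flag predictors that are hard to separate
```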

51
Tools in R

Linear and generalized linear models
lm
glm
Packages
Design package
Harrell, F. E. 2001. Regression Modeling Strategies
with Applications to Linear Models, Logistic
Regression, and Survival Analysis. Springer.
car package
Fox, J. 2002. An R and S-Plus Companion to
Applied Regression. Sage Publications.

52
Tools in R

Model formula, for example:
model4 <- glm(helpage2 ~ sex + mom_dad + suburb + brdeapp + matepp +
                density + I(density^2),
              family = binomial, data = choices)

Output
summary(model4)
model4$aic
model4$coefficients
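The `choices` data behind `model4` are not available here. The runnable sketch below therefore simulates a hypothetical data frame with a binary response `helped`, a factor `sex`, and a numeric `density`, and assumes a hump-shaped density effect. It shows the same pattern of calls: `I()` around the quadratic term, then `summary()`, `$aic`, and `$coefficients`.

```r
set.seed(42)
n <- 200
sim <- data.frame(
  sex     = factor(sample(c("F", "M"), n, replace = TRUE)),
  density = runif(n, 0, 10)
)
# Hypothetical truth: hump-shaped effect of density on the logit scale
eta <- -1 + 0.8 * sim$density - 0.07 * sim$density^2 + 0.5 * (sim$sex == "M")
sim$helped <- rbinom(n, size = 1, prob = plogis(eta))

# I() makes ^ arithmetic squaring rather than formula crossing
fit <- glm(helped ~ sex + density + I(density^2), family = binomial, data = sim)

summary(fit)
fit$aic            # AIC stored on the glm object
fit$coefficients   # same as coef(fit)
```

Without `I()`, `density^2` inside a formula means "density crossed with itself", which collapses to just `density`, and the quadratic term would silently disappear.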
53
Tools in R

Fitting the model set
R program does the work
Trouble-shooting
Export results
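The fitting step can be scripted so that R does the repetitive work. The sketch fits every formula in a list, builds an AICc table sorted by Δ, and exports it to CSV. The four `mtcars` models are purely illustrative. The `aicc` helper is a hypothetical function defined here, and the output path is a temporary file you would replace with your own.

```r
formulas <- list(
  m1 = mpg ~ wt,
  m2 = mpg ~ hp,
  m3 = mpg ~ wt + hp,
  m4 = mpg ~ wt + hp + wt:hp
)
fits <- lapply(formulas, function(f) lm(f, data = mtcars))

aicc <- function(fit) {
  ll <- logLik(fit); K <- attr(ll, "df"); n <- nobs(fit)
  -2 * as.numeric(ll) + 2 * K * (n / (n - K - 1))
}

tab <- data.frame(
  model = names(fits),
  K     = sapply(fits, function(f) attr(logLik(f), "df")),
  AICc  = sapply(fits, aicc)
)
tab$delta  <- tab$AICc - min(tab$AICc)
tab$weight <- exp(-tab$delta / 2) / sum(exp(-tab$delta / 2))
tab <- tab[order(tab$delta), ]           # sort models by evidence

print(tab, digits = 3, row.names = FALSE)
out_file <- tempfile(fileext = ".csv")   # replace with your own path
write.csv(tab, out_file, row.names = FALSE)
```

For trouble-shooting, fitting inside `lapply()` keeps every model object available, so a suspicious row in the table can be traced back to its fit.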

54
Fish Example
55
Model Checking

Global model must fit
Models used for inference must meet assumptions
Look for numerical problems
Tools in R
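A few quick checks on the global (most complex) model cover the points above. The sketch treats a Poisson model of the built-in `warpbreaks` data as a hypothetical global model. It checks convergence, looks for aliased (NA) coefficients, which signal non-identifiable parameters, and estimates overdispersion as c-hat = Pearson χ² / residual df. When c-hat is well above 1, Burnham and Anderson recommend a quasi-likelihood criterion such as QAICc.

```r
# Hypothetical global model: Poisson counts in built-in warpbreaks data
global <- glm(breaks ~ wool * tension, family = poisson, data = warpbreaks)

global$converged          # TRUE if the fitting algorithm converged
anyNA(coef(global))       # NA coefficients flag aliased (non-identifiable) terms

# Goodness of fit / overdispersion: c-hat = Pearson chi-square / residual df
c_hat <- sum(residuals(global, type = "pearson")^2) / df.residual(global)
c_hat                     # well above 1 here, suggesting overdispersion

# Residuals vs fitted values: look for patterns and outliers
plot(fitted(global), residuals(global, type = "pearson"))
```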

56
Fish Example
57
Interpretation of I-T results
58
Interpretation of models for inference

Case 1: One or a few best models
Examining model parameters and predictions
Effects
Prediction
graphing results
nomograms
Presenting Results
Anderson, D. R., W. A. Link, D. H. Johnson, and
K. P. Burnham. 2001. Suggestions for presenting
the results of data analysis. Journal of Wildlife
Management 65:373-378.
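For graphing predictions from a logistic model, confidence intervals are usually built on the link (logit) scale and then back-transformed, which keeps them inside 0 to 1. The sketch uses the built-in `mtcars` data and arbitrary `wt` values as an illustration. The 1.96 multiplier assumes an approximate normal 95% interval.

```r
fit <- glm(am ~ wt, family = binomial, data = mtcars)
new <- data.frame(wt = c(2, 3, 4))      # values at which to predict

# Interval on the logit scale, then back-transform with plogis()
pr <- predict(fit, newdata = new, type = "link", se.fit = TRUE)
new$p     <- plogis(pr$fit)
new$lower <- plogis(pr$fit - 1.96 * pr$se.fit)
new$upper <- plogis(pr$fit + 1.96 * pr$se.fit)
round(new, 3)
```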

59
Tools

Calculations in Excel
AICc, model weights, model likelihoods, evidence
ratios
Sorting the models by evidence (exciting concept)
Model weights, evidence ratios, relative variable
importance
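The spreadsheet calculations listed above translate directly to R. The sketch starts from hypothetical AICc values and a hypothetical table of which variables appear in which model. It computes model likelihoods exp(−Δ/2) and Akaike weights, sorts the models by evidence, and sums the weights of the models that contain each variable to get its relative importance.

```r
# Hypothetical AICc values and variable membership (rows = models)
aicc <- c(m1 = 102.3, m2 = 108.9, m3 = 100.0, m4 = 101.2)
contains <- rbind(
  m1 = c(wt = TRUE,  hp = FALSE),
  m2 = c(wt = FALSE, hp = TRUE),
  m3 = c(wt = TRUE,  hp = TRUE),
  m4 = c(wt = TRUE,  hp = TRUE)
)

delta   <- aicc - min(aicc)
mod_lik <- exp(-delta / 2)          # model likelihood given the data
w       <- mod_lik / sum(mod_lik)   # Akaike weights

ord <- order(delta)                 # sort models by evidence
round(cbind(delta, mod_lik, w)[ord, ], 3)

# Relative importance = sum of weights of the models containing each variable
importance <- colSums(contains * w)
round(importance, 3)
```

Relative importance is only balanced when each variable appears in a comparable number of models, which is worth checking when the model set is built.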

60
Fish Example
61
Multi-model Inference

Model selection uncertainty
Model-averaged predictions
Model-averaged parameter estimates

62
Model Averaging Predictions
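A model-averaged prediction is the Akaike-weighted sum of each model's prediction, ŷ = Σ w_i ŷ_i. The sketch below uses three illustrative `mtcars` models and one hypothetical new observation.

```r
formulas <- list(m1 = mpg ~ wt, m2 = mpg ~ wt + hp, m3 = mpg ~ hp)
fits <- lapply(formulas, function(f) lm(f, data = mtcars))

# AICc and Akaike weights for the three models
aicc <- sapply(fits, function(fit) {
  ll <- logLik(fit); K <- attr(ll, "df"); n <- nobs(fit)
  -2 * as.numeric(ll) + 2 * K * (n / (n - K - 1))
})
w <- exp(-(aicc - min(aicc)) / 2)
w <- w / sum(w)

new   <- data.frame(wt = 3, hp = 150)   # hypothetical new observation
preds <- sapply(fits, function(f) unname(predict(f, newdata = new)))
y_avg <- sum(w * preds)                 # model-averaged prediction
round(c(preds, averaged = y_avg), 2)
```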
66
Model Averaging Parameters
67
Unconditional Variance Estimator
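The slide formulas are not in the transcript. The sketch below uses the unconditional variance estimator as usually given by Burnham and Anderson (2002): θ̄ = Σ w_i θ̂_i and SE(θ̄) = Σ w_i sqrt(var(θ̂_i | g_i) + (θ̂_i − θ̄)²). The second term adds model-selection uncertainty to each model's sampling variance. The `mtcars` models are illustrative, and they are chosen so that `wt` appears in every model, which means no re-weighting over a subset is needed.

```r
formulas <- list(m1 = mpg ~ wt, m2 = mpg ~ wt + hp, m3 = mpg ~ wt + qsec)
fits <- lapply(formulas, function(f) lm(f, data = mtcars))

aicc <- sapply(fits, function(fit) {
  ll <- logLik(fit); K <- attr(ll, "df"); n <- nobs(fit)
  -2 * as.numeric(ll) + 2 * K * (n / (n - K - 1))
})
w <- exp(-(aicc - min(aicc)) / 2)
w <- w / sum(w)

# Estimate and conditional SE of the wt coefficient in each model
theta <- sapply(fits, function(f) coef(f)[["wt"]])
se    <- sapply(fits, function(f) sqrt(vcov(f)["wt", "wt"]))

theta_bar <- sum(w * theta)                                  # model-averaged estimate
se_uncond <- sum(w * sqrt(se^2 + (theta - theta_bar)^2))     # unconditional SE
round(c(theta_bar = theta_bar, se_uncond = se_uncond), 3)
```

The unconditional SE is never smaller than the weighted average of the conditional SEs, which reflects the added uncertainty about which model is best.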
69
Snake Example
70
Multi-model Inference
