Practical Model Selection and Multi-model Inference using R

Slides: 72
Provided by: Stol75

Transcript and Presenter's Notes



1
Practical Model Selection and Multi-model
Inference using R
  • Presented by Eric Stolen and Dan Hunt

2
Foundation: Theory, Hypotheses, and Models
3
Theory
  • This is the link with science, which is about
    understanding how the world works

4
Theory
  • A set of propositions set out as an
    explanation.
  • Theories are generalizations.
  • Theories contain questions.
  • Theories continually change
  • (Ford, E. D. 2000. Scientific Method for
    Ecological Research. Cambridge University
    Press.)

5
Theory
  • Example 1: Wading bird foraging
  • Ideal Free Distribution
  • Marginal Value Theorem
  • Scramble Competition

6
Theory
  • Example 2: Indigo snake habitat selection
  • Animal perception
  • Evolutionary Biology
  • Population Demography

7
Hypotheses
  • There are many (confusing!) views of what a hypothesis is
  • A hypothesis is a statement derived from
    scientific theory that postulates something about
    how the world works
  • A testable hypothesis is a hypothesis that can be
    falsified by a contradiction between a prediction
    derived from the hypothesis and data measured in
    the appropriate way

8
Hypotheses
  • To use the Information-theoretic toolbox, we must
    be able to state a hypothesis as a statistical
    model (or more precisely an equation which allows
    us to calculate the maximum likelihood of the
    hypothesis)

9
Multiple Working Hypotheses
  • We operate with a set of multiple alternative
    hypotheses (models)
  • The many advantages include safeguarding
    objectivity, and allowing rigorous inference.
  • Chamberlin (1890)
  • Strong Inference - Platt (1964)
  • Karl Popper (ca. 1960) Bold Conjectures

10
Deriving the model set
  • This is the tough part (but also the creative
    part)
  • Much thought is needed, so don't rush
  • collaborate, seek outside advice, read the
    literature, go to meetings
  • How and When hypotheses are better than What
    hypotheses (strive to predict rather than
    describe)

11
Models: Indigo Snake example
  • Study of indigo snake habitat use
  • Response variable: home range size, ln(ha)
  • SEX
  • Land cover: 2-3 levels (lc2)
  • weeks: effort/exposure
  • Science question: Is there a seasonal difference
    in habitat use between sexes?

12
Models: Indigo Snake example
  • SEX
  • land cover type (lc2)
  • weeks
  • SEX + lc2
  • SEX + weeks
  • lc2 + weeks
  • SEX + lc2 + weeks
  • SEX + lc2 + SEX × lc2
  • SEX + lc2 + weeks + SEX × lc2
14
Models: Indigo Snake example
  • SEX
  • land cover
  • weeks
  • SEX + land cover
  • SEX + weeks
  • land cover + weeks
  • SEX + land cover + weeks
  • SEX + land cover + SEX × land cover
  • SEX + land cover + weeks + SEX × land cover
15
Models: fish habitat use example
  • Study of fish habitat use in salt marsh
  • Response variable: density, ln(fish m^-2 + 1)
  • Habitat: vegetated or unvegetated
  • Site: 7 impoundments
  • Season: 4 seasons
  • Science questions
  • Is there evidence for a difference in density
    between habitats?
  • Is there a seasonal difference in habitat use by
    resident marsh fish?

16
Models: fish habitat use example
  • Site + Season + Habitat + Site × Habitat
    + Season × Habitat + Site × Season
  • Site + Season + Habitat + Site × Habitat
    + Season × Habitat
  • Site + Season + Habitat + Site × Season
    + Site × Habitat
  • Site + Season + Habitat + Site × Season
    + Season × Habitat
  • Site + Season + Habitat + Site × Habitat
  • Site + Habitat + Site × Habitat
  • Site + Season + Habitat + Season × Habitat
  • Season + Habitat + Season × Habitat
  • Site + Season + Habitat + Site × Season
  • Site + Season + Site × Season
  • Site + Season + Habitat
  • Site + Season
  • Site + Habitat
  • Season + Habitat
  • Site
  • Season
  • Habitat

19
The importance of a priori thinking: you can't
go back home!
20
Modeling
  • Trade-off between precision and bias
  • Trying to derive knowledge and advance learning,
    not to fit the data
  • Relationship between data (quantity and quality)
    and sophistication of the model

21
Precision-Bias Trade-off
[Figure: squared bias (Bias²) versus model complexity (increasing number of parameters)]
22
Precision-Bias Trade-off
[Figure: variance and squared bias (Bias²) versus model complexity (increasing number of parameters)]
24
Kullback-Leibler Information
  • Basic concept from Information theory
  • The information lost when a model is used to
    represent full reality
  • Can also think of it as the distance between a
    model and full reality

25
Kullback-Leibler Information
[Figure: truth/reality and candidate models G1 (best model in set), G2, and G3]
28
Kullback-Leibler Information
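[Figure: truth/reality and candidate models G1 (best model in set), G2, and G3]
The relative difference between models is constant: the part of the K-L
distance that depends only on truth is the same for every model, so the
relative distances between models can be estimated even though truth is unknown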
29
Akaike's Contributions
  • Figured out how to estimate the relative
    Kullback-Leibler distance between models in a set
    of models
  • Figured out how to link maximum likelihood
    estimation theory with expected K-L information
  • An (Akaike's) Information Criterion
  • $\mathrm{AIC} = -2\log_e\big(\mathcal{L}(\text{model}_i \mid \text{data})\big) + 2K$
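The AIC formula can be checked numerically in R. A minimal sketch, assuming the built-in `mtcars` data and the formula `mpg ~ wt + hp` purely as stand-ins (neither comes from the slides): it computes $-2\log_e\mathcal{L} + 2K$ by hand from `logLik()` and compares it with `AIC()`.

```r
# Stand-in model: mtcars and mpg ~ wt + hp are illustrative assumptions
fit <- lm(mpg ~ wt + hp, data = mtcars)

ll <- logLik(fit)                 # maximized log-likelihood (natural log)
K  <- attr(ll, "df")              # parameters estimated, including sigma^2
aic_manual <- -2 * as.numeric(ll) + 2 * K

aic_manual
AIC(fit)                          # should agree with aic_manual
```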

32
I-T mechanics
  • $\mathrm{AICc}_i = -2\log_e\big(\mathcal{L}(\text{model}_i \mid \text{data})\big) + 2K\left(\frac{n}{n-K-1}\right)$
  • or
  • $\mathrm{AICc} = \mathrm{AIC} + \frac{2K(K+1)}{n-K-1}$
  • (where K = the number of parameters estimated and
    n = the sample size)
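Both forms of AICc are algebraically identical, since $2K + \frac{2K(K+1)}{n-K-1} = 2K\frac{n}{n-K-1}$. A sketch that checks this numerically, again assuming `mtcars` and `mpg ~ wt + hp` as stand-ins; `aicc_form1()` and `aicc_form2()` are illustrative helper names, not package functions.

```r
# Form 1: -2 log L + 2K * n / (n - K - 1)
aicc_form1 <- function(ll, K, n) -2 * ll + 2 * K * (n / (n - K - 1))

# Form 2: AIC + 2K(K + 1) / (n - K - 1)
aicc_form2 <- function(ll, K, n) {
  aic <- -2 * ll + 2 * K
  aic + 2 * K * (K + 1) / (n - K - 1)
}

fit <- lm(mpg ~ wt + hp, data = mtcars)   # stand-in model
ll  <- as.numeric(logLik(fit))
K   <- attr(logLik(fit), "df")
n   <- nrow(mtcars)

aicc_form1(ll, K, n)
aicc_form2(ll, K, n)   # identical up to floating-point rounding
```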

33
I-T mechanics
  • $\mathrm{AICc}_{\min}$ = AICc for the model with the lowest AICc
    value
  • $\Delta_i = \mathrm{AICc}_i - \mathrm{AICc}_{\min}$
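Computing $\Delta_i$ is a single subtraction. A sketch with made-up AICc values for four hypothetical models, `m1` to `m4`.

```r
# Hypothetical AICc values (illustrative only)
aicc <- c(m1 = 102.3, m2 = 100.1, m3 = 105.8, m4 = 100.9)

aicc_min <- min(aicc)          # AICc of the best model in the set
delta    <- aicc - aicc_min    # Delta_i; the best model has Delta = 0
delta
```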

34
I-T mechanics
  • Model probability: the Akaike weight, $w_i$ (also
    interpretable as Bayesian posterior model probabilities)
  • Evidence ratio of model i to model j = $w_i / w_j$
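Model probabilities are usually computed as Akaike weights, $w_i = \exp(-\Delta_i/2) / \sum_r \exp(-\Delta_r/2)$, which makes the evidence ratio $w_i/w_j = \exp\big((\Delta_j - \Delta_i)/2\big)$. A sketch using hypothetical AICc values; `akaike_weights()` is an illustrative helper, not a package function.

```r
akaike_weights <- function(aicc) {
  delta   <- aicc - min(aicc)
  rel_lik <- exp(-delta / 2)      # likelihood of each model given the data
  rel_lik / sum(rel_lik)          # normalize so the weights sum to 1
}

# Hypothetical AICc values (illustrative only)
aicc <- c(m1 = 102.3, m2 = 100.1, m3 = 105.8, m4 = 100.9)
w <- akaike_weights(aicc)
round(w, 3)

# Evidence ratio of m2 to m1; equals exp((Delta_1 - Delta_2) / 2) = exp(1.1)
w[["m2"]] / w[["m1"]]
```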

35
I-T mechanics
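  • Least Squares Regression
  • $\mathrm{AICc} = n\log_e(\hat\sigma^2) + 2K\left(\frac{n}{n-K-1}\right)$
  • where $\hat\sigma^2 = \mathrm{RSS}/n$
  • (this omits an additive constant that depends only on n,
    so it is the same for every model fitted to the same data)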
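The omitted constant can be made explicit: for a Gaussian model with $\hat\sigma^2 = \mathrm{RSS}/n$, $-2\log_e\mathcal{L} = n\log_e\hat\sigma^2 + n(\log_e 2\pi + 1)$. The second term depends only on n, so it shifts every model's AIC equally and drops out of the $\Delta_i$. A sketch that confirms this against `logLik()`, using `mtcars` and `mpg ~ wt + hp` as stand-ins.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)   # stand-in model
n   <- nrow(mtcars)
rss <- sum(residuals(fit)^2)
sigma2_hat <- rss / n                     # ML estimate of the error variance

neg2ll_ls   <- n * log(sigma2_hat)              # least-squares form
neg2ll_full <- -2 * as.numeric(logLik(fit))     # full Gaussian form
offset      <- neg2ll_full - neg2ll_ls

offset                    # the constant dropped by the least-squares form
n * (log(2 * pi) + 1)     # same value; depends only on n
```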

36
I-T mechanics
  • Counting Parameters
  • K number of parameters estimated
  • Least Square Regression
  • K = number of predictor coefficients + 2 (for the intercept and $\sigma^2$)
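In R, the K used by `AIC()` is stored as the `df` attribute of `logLik()`. For an `lm` fit with two predictors (a stand-in model on `mtcars`), it counts two slopes, the intercept, and $\sigma^2$.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)   # two predictors (stand-in)
n_predictors <- 2
K <- attr(logLik(fit), "df")
K    # 2 slopes + intercept + sigma^2
```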

37
I-T mechanics
  • Counting Parameters
  • K number of parameters estimated
  • Logistic Regression
  • K = number of predictor coefficients + 1 (for the intercept)
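A binomial `glm` has no separate variance parameter, so `logLik()` counts only the intercept and the slopes. A sketch using `am` (0/1 transmission type) in `mtcars` as a stand-in binary response.

```r
fit <- glm(am ~ wt, family = binomial, data = mtcars)   # stand-in model
n_predictors <- 1
K <- attr(logLik(fit), "df")
K    # 1 slope + intercept; no variance parameter
```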

38
I-T mechanics
  • Counting Parameters
  • Non-identifiable parameters

39
Comparing Models
40
Comparing Models
Combined model weight 0.995
41
Comparing Models
Evidence Ratio 4.52
42
Comparing Models
43
Comparing Models
Evidence Ratio 3.03
44
Comparing Models
Evidence Ratio 4.28: (0.34 + 0.22 + 0.14 + 0.08) /
(0.11 + 0.04 + 0.02 + 0.01)
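A combined evidence ratio divides the summed Akaike weight of one group of models by the summed weight of another. With the rounded weights shown on the slide the ratio comes out near 4.33; the slide's 4.28 was presumably computed from unrounded weights, which is an assumption since the underlying table did not survive. The group names below are hypothetical.

```r
# Rounded Akaike weights as displayed on the slide
group_1 <- c(0.34, 0.22, 0.14, 0.08)
group_2 <- c(0.11, 0.04, 0.02, 0.01)

er <- sum(group_1) / sum(group_2)
er   # about 4.33 with these rounded weights
```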
45
Generalized Linear Models
46
Mathematical details
  • Three parts to a GLM
  • Link function
  • linear equation
  • error distribution

47
Mathematical details
  • General Linear Models: linear regression and
    ANOVA
  • Link function: identity link
  • linear equation
  • error distribution: normal distribution
    (Gaussian)

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$
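The identity-link, Gaussian GLM is the ordinary linear model, so `lm()` and `glm(..., family = gaussian)` give the same $\beta$ estimates. A sketch on stand-in data (`mtcars`, `mpg ~ wt + hp`).

```r
fit_lm  <- lm(mpg ~ wt + hp, data = mtcars)
fit_glm <- glm(mpg ~ wt + hp, family = gaussian(link = "identity"),
               data = mtcars)

coef(fit_lm)
coef(fit_glm)    # same b0, b1, b2
```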
48
Mathematical details
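  • Logistic Regression
  • Link function: logit link, ln(p / (1 - p))
  • linear equation
  • error distribution: binomial distribution

$\operatorname{logit}(p) = \ln\frac{p}{1-p} = \beta_0 + \beta_1 X_1 + \beta_2 X_2$ (the binomial distribution supplies the error, so there is no additive error term)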
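The logit link ties the fitted probabilities to the linear equation: applying $\ln(p/(1-p))$ (base R's `qlogis()`) to the fitted probabilities returns the linear predictor. Stand-in data: `am ~ wt` in `mtcars`.

```r
fit <- glm(am ~ wt, family = binomial(link = "logit"), data = mtcars)

eta <- predict(fit, type = "link")       # b0 + b1 * wt
p   <- predict(fit, type = "response")   # fitted probabilities

all.equal(log(p / (1 - p)), eta)         # logit(p) recovers the linear predictor
all.equal(qlogis(p), eta)                # same thing with the built-in logit
```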
49
Mathematical details
  • What types of models can be compared within a
    single I-T analysis?
  • The data must be the same for every model
    (including the response variable)
  • Must be able to calculate the maximized likelihood
  • (ways to deal with quasi-likelihood)
  • Models do not need to be nested
  • In some cases AIC is additive

50
Model Fitting Preliminaries
  • Understanding the data/variables
  • Avoid data dredging!
  • safe data screening practices
  • Detect outliers, scale issues, collinearity
  • Tools in R

51
Tools in R
  • Generalized linear models
  • lm
  • glm
  • Packages
  • Design package
  • Harrell, F. E. 2001. Regression Modeling Strategies
    with Applications to Linear Models, Logistic
    Regression, and Survival Analysis. Springer.
  • car package
  • Fox, J. 2002. An R and S-PLUS Companion to
    Applied Regression. Sage Publications.

52
Tools in R
  • Model formula
  • Example:
  • Output
  • summary(model4)
  • model4$aic
  • model4$coefficients

model4 <- glm(helpage2 ~ sex + mom_dad + suburb +
  brdeapp + matepp + density + I(density^2),
  family = binomial, data = choices)
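The `choices` data behind `model4` are not part of the presentation, so this sketch simulates a small hypothetical stand-in (`choices_sim`, with only `sex`, `density`, and `helpage2`) to show the same formula features running: `I(density^2)` for a quadratic term, and `$aic` and `$coefficients` for extracting output. The simulated values are invented for illustration only.

```r
set.seed(1)
n <- 200
# Hypothetical stand-in for `choices` (simulated, not real data)
choices_sim <- data.frame(
  sex     = factor(sample(c("F", "M"), n, replace = TRUE)),
  density = runif(n, 0, 10)
)
lin <- -1 + 0.4 * choices_sim$density - 0.05 * choices_sim$density^2
choices_sim$helpage2 <- rbinom(n, size = 1, prob = plogis(lin))

# I() makes ^ arithmetic (a squared term) rather than formula crossing
model_sim <- glm(helpage2 ~ sex + density + I(density^2),
                 family = binomial, data = choices_sim)

summary(model_sim)
model_sim$aic
model_sim$coefficients
```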
53
Tools in R
  • Fitting the model set
  • R program does the work
  • Trouble-shooting
  • Export results
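One way to let the R program do the work of fitting a model set, sketched in base R under the assumption of a hypothetical four-model set on `mtcars`: fit each formula, compute AICc, $\Delta_i$, and Akaike weights, sort by AICc, and export the table with `write.csv()`. The output file name is a placeholder.

```r
# Hypothetical candidate set (not the authors' models)
formulas <- list(
  m1 = mpg ~ wt,
  m2 = mpg ~ hp,
  m3 = mpg ~ wt + hp,
  m4 = mpg ~ wt * hp          # expands to wt + hp + wt:hp
)
fits <- lapply(formulas, function(f) lm(f, data = mtcars))
n <- nrow(mtcars)

K    <- sapply(fits, function(f) attr(logLik(f), "df"))
ll   <- sapply(fits, function(f) as.numeric(logLik(f)))
aicc <- -2 * ll + 2 * K * n / (n - K - 1)

tab <- data.frame(model = names(fits), K = K, logLik = ll, AICc = aicc)
tab$delta  <- tab$AICc - min(tab$AICc)
tab$weight <- exp(-tab$delta / 2) / sum(exp(-tab$delta / 2))
tab <- tab[order(tab$AICc), ]     # sort models by evidence
rownames(tab) <- NULL
print(tab, digits = 3)

# Export results (placeholder file name in a temporary directory)
write.csv(tab, file.path(tempdir(), "model_table.csv"), row.names = FALSE)
```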

54
Fish Example
55
Model Checking
  • Global model must fit
  • Models used for inference must meet assumptions
  • Look for numerical problems
  • Tools in R

56
Fish Example
57
Interpretation of I-T results
58
Interpretation of models for inference
  • Case 1: one or a few best models
  • Examining model parameters and predictions
  • Effects
  • Prediction
  • graphing results
  • nomograms
  • Presenting Results
  • Anderson, D. R., W. A. Link, D. H. Johnson, and
    K. P. Burnham. 2001. Suggestions for presenting
    the results of data analysis. Journal of Wildlife
    Management 65:373-378.

59
Tools
  • Calculations in Excel
  • AICc, Model weights, model likelihood, evidence
    ratios
  • Sorting the models by evidence (exciting concept)
  • Model weights, evidence ratios, relative variable
    importance
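The spreadsheet calculations can also be scripted. For relative variable importance, a common approach sums the Akaike weights of every model that contains a variable; it is most meaningful when each variable appears in the same number of models. The weights and model memberships below are hypothetical.

```r
# Hypothetical Akaike weights for four models
w <- c(m1 = 0.50, m2 = 0.30, m3 = 0.15, m4 = 0.05)

# Hypothetical membership: which models contain each variable
contains <- list(
  wt = c("m1", "m3", "m4"),
  hp = c("m2", "m3", "m4")
)

importance <- sapply(contains, function(models) sum(w[models]))
importance    # wt about 0.70, hp about 0.50
```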

60
Fish Example
61
Multi-model Inference
  • Model selection uncertainty
  • Model-average prediction
  • Model-average parameter estimates

62
Model Averaging Predictions
63
Model Averaging Predictions
64
Model Averaging Predictions
65
Model Averaging Predictions
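A model-averaged prediction weights each model's prediction by its Akaike weight, $\hat{\bar y} = \sum_i w_i \hat y_i$. A sketch with three hypothetical models on `mtcars` and a made-up new observation (`wt = 3`, `hp = 150`).

```r
formulas <- list(m1 = mpg ~ wt, m2 = mpg ~ hp, m3 = mpg ~ wt + hp)
fits <- lapply(formulas, function(f) lm(f, data = mtcars))
n <- nrow(mtcars)

aicc <- sapply(fits, function(f) {
  K <- attr(logLik(f), "df")
  AIC(f) + 2 * K * (K + 1) / (n - K - 1)
})
delta <- aicc - min(aicc)
w <- exp(-delta / 2) / sum(exp(-delta / 2))

newdata <- data.frame(wt = 3, hp = 150)   # hypothetical new car
preds <- sapply(fits, function(f) unname(predict(f, newdata = newdata)))

pred_avg <- sum(w * preds)                # model-averaged prediction
round(cbind(weight = w, prediction = preds), 3)
pred_avg
```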
66
Model Averaging Parameters
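For a model-averaged parameter estimate, one common choice averages the coefficient only over the models that contain it, renormalizing their weights to sum to 1; another averages over all models, treating the coefficient as 0 where it is absent. This sketch shows the first variant for the `wt` coefficient in a hypothetical three-model set on `mtcars` (plain AIC is used for brevity).

```r
formulas <- list(m1 = mpg ~ wt, m2 = mpg ~ hp, m3 = mpg ~ wt + hp)
fits <- lapply(formulas, function(f) lm(f, data = mtcars))

aic <- sapply(fits, AIC)
w <- exp(-(aic - min(aic)) / 2)
w <- w / sum(w)

# Keep only the models containing wt and renormalize their weights
has_wt <- sapply(fits, function(f) "wt" %in% names(coef(f)))
w_wt   <- w[has_wt] / sum(w[has_wt])
b_wt   <- sapply(fits[has_wt], function(f) coef(f)[["wt"]])

b_wt_avg <- sum(w_wt * b_wt)   # model-averaged wt coefficient
b_wt_avg
```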
67
Unconditional Variance Estimator
68
Unconditional Variance Estimator
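The unconditional variance estimator adds model-selection uncertainty to each model's own sampling variance. A sketch of the Burnham and Anderson form, $\widehat{\mathrm{var}}(\hat{\bar\theta}) = \left[\sum_i w_i \sqrt{\widehat{\mathrm{var}}(\hat\theta_i \mid g_i) + (\hat\theta_i - \hat{\bar\theta})^2}\right]^2$, returning its square root (the unconditional SE). `uncond_se()` is a hypothetical helper name, and the two `mtcars` models are stand-ins.

```r
uncond_se <- function(est, se, w) {
  w <- w / sum(w)                          # weights must sum to 1
  theta_bar <- sum(w * est)                # model-averaged estimate
  # each conditional SE is inflated by the spread of estimates across models
  sum(w * sqrt(se^2 + (est - theta_bar)^2))
}

# Illustrative use: the wt coefficient from two stand-in models
fits <- list(m1 = lm(mpg ~ wt, data = mtcars),
             m3 = lm(mpg ~ wt + hp, data = mtcars))
aic <- sapply(fits, AIC)
w   <- exp(-(aic - min(aic)) / 2)
est <- sapply(fits, function(f) coef(f)[["wt"]])
se  <- sapply(fits, function(f) summary(f)$coefficients["wt", "Std. Error"])

uncond_se(est, se, w)
```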
69
Snake Example
70
Multi-model Inference
71
Multi-model Inference