Day 7 Model Evaluation - PowerPoint PPT Presentation

About This Presentation
Description:

Day 7 Model Evaluation. Lecture 6A - Model Evaluation. C. D. Canham. Just to dispense with this at first... The traditional Chi-squared test is frequently referred to ...

Slides: 30
Provided by: canhamc
Learn more at: http://www.sortie-nd.org

Transcript and Presenter's Notes



1
Day 7 Model Evaluation
2
Elements of Model evaluation
  • Goodness of fit
  • Prediction Error
  • Bias
  • Outliers and patterns in residuals

3
Assessing Goodness of Fit for Continuous Data
  • Visual methods
  • Don't underestimate the power of your eyes, but
    eyes can deceive, too...
  • Quantification
  • A variety of traditional measures, all with some
    limitations...

A good review... C. D. Schunn and D. Wallach.
Evaluating Goodness-of-Fit in Comparison of
Models to Data. Source: http://www.lrdc.pitt.edu/schunn/gof/GOF.doc
4
Traditional inferential tests masquerading as GOF
measures
  • The χ² (chi-squared) goodness of fit statistic
  • For categorical data only, this can be used as a
    test statistic: how probable is a discrepancy at
    least this large between observed and expected
    counts, if the model is true?
  • The test can only be used to reject a model. If
    the model is accepted, the statistic contains no
    information on how good the fit is.
  • Thus, this is really a badness of fit
    statistic
  • Other limitations as a measure of goodness of
    fit
  • Rewards sloppy research if you are actually
    trying to test (as a null hypothesis) a real
    model, because small sample size and noisy data
    will limit power to reject the null hypothesis
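The sample-size point can be seen with a small Python sketch of Pearson's χ² statistic. The model and data are hypothetical: a model predicts a 50/50 split between two categories, and the data show the same 60/40 split at two different sample sizes. With two categories and a fully specified model there is 1 degree of freedom. For that case the upper-tail p-value has the closed form erfc(sqrt(x/2)), so no statistics library is needed.

```python
import math

def chi_squared_stat(observed, expected):
    """Pearson chi-squared statistic: sum of (O - E)^2 / E over categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def p_value_df1(stat):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom.

    Uses the identity P(X > x) = erfc(sqrt(x / 2)), valid only for df = 1.
    """
    return math.erfc(math.sqrt(stat / 2.0))

# Hypothetical model: two categories, each expected with probability 0.5.
# Hypothetical data: the same 60/40 split, observed at two sample sizes.
for n in (10, 1000):
    observed = [n * 6 // 10, n * 4 // 10]
    expected = [n / 2, n / 2]
    stat = chi_squared_stat(observed, expected)
    print(f"n = {n}: chi2 = {stat:.2f}, p = {p_value_df1(stat):.2g}")
```

The discrepancy between model and data is identical in both cases. Only the larger sample has the power to reject the model. A failure to reject at n = 10 says nothing about how good the fit is.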

5
Visual evaluation for continuous data
  • Graphing observed vs. predicted...

6
Examples
Goodness of fit of neighborhood models of canopy
tree growth for 2 species at Date Creek, BC
(Figure: observed vs. predicted values; axes: Observed, Predicted)
Source: Canham, C. D., P. T. LePage, and K. D.
Coates. 2004. A neighborhood analysis of canopy
tree competition: effects of shading versus
crowding. Canadian Journal of Forest Research.
7
Goodness of Fit vs. Bias
(Figure: observed vs. predicted values relative to the 1:1 line)
8
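R² as a measure of goodness of fit
  • R² = 1 - SSE/SST: the proportion of variance
    explained by the model (relative to that
    explained by the simple mean of the data), where
    SSE = Σ(observed - predicted)² and
    SST = Σ(observed - mean of observed)²

(Note: R² is NOT bounded between 0 and 1; it is
negative whenever the model predicts worse than the
mean of the data.)
This interpretation of R² is technically only
valid for data where SSE is an appropriate
estimate of variance (e.g., normal data).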
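A minimal Python sketch of R² = 1 - SSE/SST, with SST taken about the mean of the observations. The data values are made up for illustration. The second call shows how R² goes negative when predictions are worse than simply using the mean.

```python
def r_squared(observed, predicted):
    """R^2 = 1 - SSE/SST, with SST taken about the mean of the observations."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst

observed = [2.0, 4.0, 6.0, 8.0]
print(r_squared(observed, [2.1, 3.9, 6.2, 7.8]))     # close to 1
print(r_squared(observed, [10.0, 12.0, 14.0, 16.0]))  # negative: worse than the mean
```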
9
R²: when is the mean the mean?
  • Clark et al. (1998) Ecological Monographs 68:220

For i = 1..N observations in j = 1..S sites, Clark
et al. use the SITE means, rather than the overall
mean, to calculate R²
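The effect of the choice of mean can be sketched in Python. This follows the idea as summarized on the slide, not necessarily Clark et al.'s exact calculation. The data are hypothetical: two sites whose levels differ greatly. The model gets each site's level right but is only partly right within sites.

```python
def r_squared_overall(obs_by_site, pred_by_site):
    """R^2 = 1 - SSE/SST, with SST taken about the overall mean of all observations."""
    obs = [o for site in obs_by_site for o in site]
    pred = [p for site in pred_by_site for p in site]
    mean_all = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_all) ** 2 for o in obs)
    return 1.0 - sse / sst

def r_squared_site_means(obs_by_site, pred_by_site):
    """R^2 with SST taken about each site's own mean (sketch of the site-mean idea)."""
    sse = 0.0
    sst = 0.0
    for obs, pred in zip(obs_by_site, pred_by_site):
        site_mean = sum(obs) / len(obs)
        sse += sum((o - p) ** 2 for o, p in zip(obs, pred))
        sst += sum((o - site_mean) ** 2 for o in obs)
    return 1.0 - sse / sst

# Hypothetical data: j = 2 sites, 3 observations each.
obs_by_site = [[1.0, 2.0, 3.0], [11.0, 12.0, 13.0]]
pred_by_site = [[1.5, 2.0, 2.5], [11.5, 12.0, 12.5]]
print("overall-mean R^2:", round(r_squared_overall(obs_by_site, pred_by_site), 4))
print("site-mean R^2:   ", round(r_squared_site_means(obs_by_site, pred_by_site), 4))
```

The overall-mean version rewards the model for the large between-site difference. The site-mean version only credits what the model explains within sites, so it is the stricter standard.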
10
r² as a measure of goodness of fit
r² = squared correlation (r) between observed (x)
and predicted (y)
NOTE: r is bounded between -1 and 1, so r² is
bounded between 0 and 1
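A minimal Python sketch of r and r². The data are hypothetical and chosen so that the predictions run in exactly the wrong direction. In that case r = -1, yet r² = 1, which is one reason r² alone can flatter a model.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

observed = [1.0, 2.0, 3.0, 4.0]
predicted = [4.0, 3.0, 2.0, 1.0]  # perfectly inverted predictions
r = pearson_r(observed, predicted)
print("r =", r, " r^2 =", r ** 2)
```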
11
R² vs r²
Is this a good fit (r² = 0.81) or a really lousy
fit (R² = -0.39)? (It's undoubtedly biased...)
12
A note about notation...
Check the documentation when a package reports
"R²" or "r²". Don't assume they will be used as
I have used them...
Sample Excel output using the trendline option
for a chart: the "R²" value of 0.89 reported by
Excel is actually r² (while R² is actually
0.21). (If you specify no intercept, Excel
reports true R²...)
13
R² vs. r² for goodness of fit
  • When there is no bias, the two measures will be
    almost identical (but I prefer R², in principle).
  • When there is bias, R² will be low to negative,
    but r² will indicate how good the fit could be
    after taking the bias into account...
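Both points can be checked with a short Python sketch. The data are hypothetical. The "biased" predictions are a linear distortion of the truth (slope 1.5, offset 2), so they correlate perfectly with the observations.

```python
def r_squared(observed, predicted):
    """R^2 = 1 - SSE/SST, with SST about the mean of the observations."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst

def r2_corr(observed, predicted):
    """Squared Pearson correlation between observed and predicted."""
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    sop = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    soo = sum((o - mo) ** 2 for o in observed)
    spp = sum((p - mp) ** 2 for p in predicted)
    return sop ** 2 / (soo * spp)

observed = [1.0, 2.0, 3.0, 4.0, 5.0]
unbiased = [1.2, 1.9, 3.1, 3.8, 5.0]         # small, unsystematic errors
biased = [1.5 * o + 2.0 for o in observed]   # proportional + systematic bias

for label, pred in (("unbiased", unbiased), ("biased", biased)):
    print(label, "R^2 =", round(r_squared(observed, pred), 3),
          "r^2 =", round(r2_corr(observed, pred), 3))
```

The two measures nearly agree for the unbiased predictions. For the biased ones, r² = 1 while R² is strongly negative.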

14
Sensitivity of R² and r² to data range
15
The Tyranny of R² (and r²)
  • Limitations of R² (and r²) as a measure of
    goodness of fit...
    • Not an absolute measure (as frequently
      assumed), particularly when the variance of the
      appropriate PDF is NOT independent of the mean
      (expected) value, e.g., lognormal, gamma, Poisson

16
Gamma-Distributed Data...
For a fixed shape parameter, the variance of the
gamma increases as the square of the mean!...
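A small Python simulation sketches why R² is not absolute for such data. The "model" here is the true mean itself, so it is as good as a model can be. Observations are drawn from a gamma distribution with an assumed fixed shape of 4, so the variance is mean²/4. The only thing that changes between the two runs is the range of means sampled (hypothetical: 10 to 12 vs. 1 to 100). The sketch uses the standard-library `random.gammavariate(alpha, beta)`, whose mean is alpha*beta and whose variance is alpha*beta².

```python
import random

def r_squared(observed, predicted):
    """R^2 = 1 - SSE/SST, with SST about the mean of the observations."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst

def simulate_gamma(means, shape, rng):
    """Draw one gamma observation per mean, with scale = mean / shape.

    This keeps each draw's expected value at `mean` and gives variance mean**2 / shape.
    """
    return [rng.gammavariate(shape, m / shape) for m in means]

rng = random.Random(42)
n, shape = 2000, 4.0
narrow = [rng.uniform(10, 12) for _ in range(n)]   # hypothetical narrow data range
wide = [rng.uniform(1, 100) for _ in range(n)]     # hypothetical wide data range

# The predictions ARE the true means, so any shortfall from R^2 = 1 is pure noise.
r2_narrow = r_squared(simulate_gamma(narrow, shape, rng), narrow)
r2_wide = r_squared(simulate_gamma(wide, shape, rng), wide)
print("narrow range R^2:", round(r2_narrow, 3))
print("wide range R^2:  ", round(r2_wide, 3))
```

The same perfectly specified model scores near zero over the narrow range and much higher over the wide one.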
17
So, how good is good?
  • Our assessment is ALWAYS subjective, because of
    • Complexity of the process being studied
    • Sources of noise in the data
  • From a likelihood perspective, should you ever
    expect R² = 1?

18
Other Goodness of Fit Issues...
  • In complex models, a good fit may be due to the
    overwhelming effect of one variable...
  • The best-fitting model may not be the most
    general
  • i.e. the fit can be improved by adding terms that
    account for unique variability in a specific
    dataset, but that limit applicability to other
    datasets. (The curse of ad hoc multiple
    regression models...)
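The "curse of ad hoc" terms can be illustrated with a deliberately extreme Python sketch. The data are hypothetical: the true process is y = 2x, and the noise differs between the original dataset and a new one. The "ad hoc" model stands in for adding dataset-specific terms. It carries one free term per observation, so it reproduces the original data exactly. It is compared with a simple least-squares line.

```python
def fit_line(x, y):
    """Ordinary least-squares line; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

def sse(observed, predicted):
    """Sum of squared errors."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))

# Hypothetical data: same process (y = 2x), different noise in each dataset.
x = list(range(10))
noise = [(-1) ** i for i in x]                 # +1, -1, +1, ...
train = [2 * xi + e for xi, e in zip(x, noise)]
new = [2 * xi - e for xi, e in zip(x, noise)]

a, b = fit_line(x, train)
line_pred = [a + b * xi for xi in x]
adhoc_pred = list(train)  # one free term per observation: fits the training data perfectly

print("original data SSE: line", round(sse(train, line_pred), 3), " ad hoc", sse(train, adhoc_pred))
print("new data SSE:      line", round(sse(new, line_pred), 3), " ad hoc", sse(new, adhoc_pred))
```

The ad hoc model wins on the data it was tuned to. It does worse than the simple line on a new dataset from the same process.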

19
How good is good: deviance
  • Comparison of your model to a full model, given
    the probability model.

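For $i = 1..n$ observations, a vector $X$ of observed
data ($x_i$), and a vector $\theta$ of $j = 1..m$ parameters
($\theta_j$):
Define a full model with $n$ parameters $\theta_i = x_i$
($\theta_{full}$). Then, with $\hat{\theta}$ the maximum
likelihood estimates of your model's parameters, the
deviance is
$D = 2\left[\ln L(\theta_{full} \mid X) - \ln L(\hat{\theta} \mid X)\right]$
Nelder and Wedderburn (1972)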
20
Deviance for normally-distributed data
Log-likelihood of the full model is a function of
both sample size (n) and variance (σ²): every
residual is zero, so ln L = -(n/2) ln(2πσ²)
Therefore deviance is NOT an absolute measure
of goodness of fit... But it does establish a
standard of comparison (the full model), given
your sample size and your estimate of the
underlying variance...
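A minimal Python sketch of the deviance for normal data. It assumes a common σ that is held fixed; in practice σ would be estimated. The full model sets each mean equal to its own observation. Its log-likelihood therefore depends only on n and σ, and the deviance reduces to SSE/σ². The data values are hypothetical.

```python
import math

def normal_loglik(observed, means, sigma):
    """Normal log-likelihood of the observations given per-observation means and a common sigma."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (o - m) ** 2 / (2.0 * sigma ** 2)
        for o, m in zip(observed, means)
    )

def deviance(observed, predicted, sigma):
    """D = 2 * [lnL(full model) - lnL(your model)]; the full model uses mean_i = x_i."""
    ll_full = normal_loglik(observed, observed, sigma)
    ll_model = normal_loglik(observed, predicted, sigma)
    return 2.0 * (ll_full - ll_model)

observed = [1.0, 2.0, 3.0]
predicted = [1.5, 2.0, 2.5]
for sigma in (1.0, 2.0):
    print(f"sigma = {sigma}: lnL(full) = {normal_loglik(observed, observed, sigma):.3f}, "
          f"D = {deviance(observed, predicted, sigma):.3f}")
```

The same predictions give a different deviance under a different σ. This is why D works as a standard of comparison rather than an absolute measure.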
21
Forms of Bias
Proportional bias (slope ≠ 1)
Systematic bias (intercept ≠ 0)
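One way to quantify both forms of bias is to fit a least-squares line to observed vs. predicted and compare it to the 1:1 line. The Python sketch below assumes observed is regressed on predicted; that orientation is a choice, not something the slide specifies. The data are hypothetical and constructed with slope 0.5 and intercept 2.

```python
def bias_line(observed, predicted):
    """Least-squares regression of observed (y) on predicted (x); returns (intercept, slope).

    An unbiased model should give roughly intercept 0 and slope 1 (the 1:1 line).
    """
    n = len(observed)
    mx, my = sum(predicted) / n, sum(observed) / n
    sxy = sum((p - mx) * (o - my) for p, o in zip(predicted, observed))
    sxx = sum((p - mx) ** 2 for p in predicted)
    slope = sxy / sxx
    return my - slope * mx, slope

predicted = [0.0, 1.0, 2.0, 3.0, 4.0]
observed = [2.0 + 0.5 * p for p in predicted]  # hypothetical: both kinds of bias
intercept, slope = bias_line(observed, predicted)
print("intercept =", intercept, " slope =", slope)
```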
22
Learn from your mistakes (Examine your
residuals...)
  • Residual = observed - predicted
  • Basic questions to ask of your residuals
    • Do they fit the PDF?
    • Are they correlated with factors that aren't in
      the model (but maybe should be?)
    • Do some subsets of your data fit better than
      others?
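The last two questions can be sketched in Python. Compute residuals, correlate them with a candidate factor that is not in the model, and average them within subsets. The covariate, groups, and data values are all hypothetical.

```python
import math
from collections import defaultdict

def residuals(observed, predicted):
    """Residual = observed - predicted."""
    return [o - p for o, p in zip(observed, predicted)]

def correlation(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mean_residual_by_group(res, groups):
    """Average residual within each subset of the data."""
    totals, counts = defaultdict(float), defaultdict(int)
    for r, g in zip(res, groups):
        totals[g] += r
        counts[g] += 1
    return {g: totals[g] / counts[g] for g in totals}

# Hypothetical data: a covariate NOT in the model, and two subsets "a" and "b".
observed = [3.0, 4.5, 6.0, 8.5, 10.0, 12.5]
predicted = [3.5, 4.5, 5.5, 7.0, 8.0, 9.5]
covariate = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
groups = ["a", "a", "a", "b", "b", "b"]

res = residuals(observed, predicted)
print("corr(residual, covariate):", round(correlation(res, covariate), 3))
print("mean residual by group:", mean_residual_by_group(res, groups))
```

Here the residuals rise with the omitted covariate, and subset "b" is consistently underpredicted. Both are hints that the covariate may belong in the model.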

23
Using Residuals to Calculate Prediction Error
  • RMSE (root mean squared error), i.e., the
    standard deviation of the residuals (strictly,
    only when the residuals average zero, i.e., no bias)
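A short Python sketch shows how RMSE relates to the standard deviation of the residuals, both computed with n in the denominator. RMSE² equals SD² plus the squared mean residual. The two coincide only when the model is unbiased. The residual values are hypothetical.

```python
import math

def rmse(res):
    """Root mean squared error of a list of residuals."""
    return math.sqrt(sum(r * r for r in res) / len(res))

def residual_sd(res):
    """Standard deviation of residuals, with n in the denominator to match RMSE."""
    m = sum(res) / len(res)
    return math.sqrt(sum((r - m) ** 2 for r in res) / len(res))

unbiased = [1.0, -1.0, 1.0, -1.0]
biased = [3.0, 1.0, 3.0, 1.0]  # same spread, but a mean error of 2

for label, res in (("unbiased", unbiased), ("biased", biased)):
    print(label, "RMSE =", round(rmse(res), 3), " SD =", round(residual_sd(res), 3))
```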

24
Predicting lake chemistry from spatially-explicit
watershed data
  • At steady state

Where concentration, lake volume and flushing
rate are observed, and input and in-lake decay
are estimated
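The steady-state equation itself is not recoverable from the slide. As a hypothetical sketch only, one common form for a well-mixed lake assumes first-order loss to both flushing and in-lake decay: dM/dt = W - ρM - kM, with M = C·V. Setting dM/dt = 0 gives C = W / (V(ρ + k)). Here W is input load (mass per time), V is volume, ρ is flushing rate, and k is decay rate (both per time). This is an assumed form, not necessarily the one used by Maranger et al. (2006). In a fitting context, W and k would be the estimated quantities, and C, V and ρ the observed ones.

```python
def steady_state_concentration(load, volume, flushing_rate, decay_rate):
    """Steady-state concentration of a well-mixed lake (hypothetical mass-balance form).

    Assumes dM/dt = load - flushing_rate * M - decay_rate * M, with M = C * volume.
    Setting dM/dt = 0 gives C = load / (volume * (flushing_rate + decay_rate)).
    Units must be consistent, e.g. load in g/yr, volume in m^3, rates in 1/yr -> C in g/m^3.
    """
    return load / (volume * (flushing_rate + decay_rate))

# Hypothetical values in consistent units.
print(steady_state_concentration(load=100.0, volume=10.0, flushing_rate=1.0, decay_rate=1.0))
```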
25
Predicting iron concentrations in Adirondack lakes
Results from a spatially-explicit, mass-balance
model of the effects of watershed composition on
lake chemistry
Source: Maranger et al. (2006)
26
Should we incorporate lake depth?
  • Shallow lakes are more unpredictable than deeper
    lakes
  • The model consistently underestimates Fe
    concentrations in deeper lakes

27
Adding lake depth improves the model...
R² went from 0.56 to 0.65
It is just as important that it made sense to add
depth...
28
But shallow lakes are still a problem...
29
Summary: Model Evaluation
  • There are no silver bullets...
  • The issues are even muddier for categorical
    data...
  • An increase in goodness of fit does not
    necessarily result in an increase in knowledge
  • Increasing goodness of fit reduces uncertainty in
    the predictions of the models, but this costs
    money (more and better data). How much are you
    willing to spend?
  • The signal-to-noise issue: if you can see the
    signal through the noise, how far are you willing
    to go to reduce the noise?