Title: Day 7 Model Evaluation
1. Day 7: Model Evaluation
2. Elements of Model Evaluation
- Goodness of fit
- Prediction Error
- Bias
- Outliers and patterns in residuals
3. Assessing Goodness of Fit for Continuous Data
- Visual methods
- Don't underestimate the power of your eyes, but eyes can deceive, too...
- Quantification
- A variety of traditional measures, all with some limitations...
A good review... C. D. Schunn and D. Wallach. Evaluating Goodness-of-Fit in Comparison of Models to Data. Source: http://www.lrdc.pitt.edu/schunn/gof/GOF.doc
4. Traditional inferential tests masquerading as GOF measures
- The χ² goodness-of-fit statistic (see the sketch after this list)
- For categorical data only, this can be used as a test statistic: what is the probability that the model is true, given the observed results?
- The test can only be used to reject a model. If the model is accepted, the statistic contains no information on how good the fit is...
- Thus, this is really a badness-of-fit statistic.
- Other limitations as a measure of goodness of fit:
- Rewards sloppy research if you are actually trying to test (as a null hypothesis) a real model, because small sample size and noisy data will limit power to reject the null hypothesis.
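A minimal sketch of how such a test is typically run on categorical counts; the counts and categories below are invented for illustration, and only scipy is assumed:

    # Hypothetical counts in four categories vs. counts predicted by a candidate model.
    from scipy.stats import chisquare

    observed = [18, 30, 42, 10]   # observed category counts (made up)
    expected = [25, 25, 35, 15]   # model-predicted counts (made up; sums must match)

    stat, p = chisquare(f_obs=observed, f_exp=expected)
    print(f"chi-square = {stat:.2f}, p = {p:.3f}")
    # A small p lets you reject the model; a large p says nothing about how
    # good the fit actually is -- the "badness of fit" point made above.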
5. Visual evaluation for continuous data
- Graphing observed vs. predicted...
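A minimal sketch of that plot (all data here are invented; only numpy and matplotlib are assumed). The 1:1 line is the perfect-fit reference:

    import numpy as np
    import matplotlib.pyplot as plt

    # Placeholder data: in practice these come from your data and your model.
    rng = np.random.default_rng(1)
    observed = rng.gamma(shape=4.0, scale=2.0, size=100)
    predicted = observed * 0.9 + rng.normal(0, 1.0, size=100)

    plt.scatter(predicted, observed, s=15, alpha=0.7)
    lims = [0, max(observed.max(), predicted.max())]
    plt.plot(lims, lims, "k--", label="1:1 line")   # perfect-fit reference
    plt.xlabel("Predicted")
    plt.ylabel("Observed")
    plt.legend()
    plt.show()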
6. Examples
Goodness of fit of neighborhood models of canopy tree growth for 2 species at Date Creek, BC
[Figure: observed vs. predicted growth for the two species]
Source: Canham, C. D., P. T. LePage, and K. D. Coates. 2004. A neighborhood analysis of canopy tree competition: effects of shading versus crowding. Canadian Journal of Forest Research.
7. Goodness of Fit vs. Bias
[Figure: observed vs. predicted, with a 1:1 line for reference]
8. R² as a measure of goodness of fit
- R² = proportion of variance explained by the model... (relative to that explained by the simple mean of the data; see the formula below)
(Note: R² is NOT bounded between 0 and 1)
This interpretation of R² is technically only valid for data where SSE is an appropriate estimate of variance (e.g. normal data).
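Written out, using the sum of squared errors around the model predictions relative to the total sum of squares around the mean:

    R^2 = 1 - \frac{\sum_{i}\left(x_i - \hat{x}_i\right)^2}{\sum_{i}\left(x_i - \bar{x}\right)^2}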
9. R²: when is the mean the mean?
- Clark et al. (1998) Ecological Monographs 68:220
For i = 1..N observations in j = 1..S sites, this uses the SITE means, rather than the overall mean, to calculate R².
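A plausible rendering of that site-level version (my notation, not taken from the slide): for observations x_ij in site j, predictions x̂_ij, and site means x̄_j,

    R^2 = 1 - \frac{\sum_{j=1}^{S}\sum_{i=1}^{N_j}\left(x_{ij} - \hat{x}_{ij}\right)^2}{\sum_{j=1}^{S}\sum_{i=1}^{N_j}\left(x_{ij} - \bar{x}_j\right)^2}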
10. r² as a measure of goodness of fit
r² = the squared correlation (r) between observed (x) and predicted (y)
NOTE: r is bounded between -1 and 1, so r² is bounded between 0 and 1.
11. R² vs. r²
Is this a good fit (r² = 0.81) or a really lousy fit (R² = -0.39)? (It's undoubtedly biased...)
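A quick numerical sketch of how the two measures diverge when predictions are biased. The data are invented for illustration; only numpy is assumed:

    import numpy as np

    rng = np.random.default_rng(0)
    observed = rng.normal(50, 10, size=200)
    # "Predictions" that track the observations well but are strongly biased.
    predicted = 0.5 * observed + 40 + rng.normal(0, 2, size=200)

    # R^2: variance explained relative to the simple mean of the observations.
    sse = np.sum((observed - predicted) ** 2)
    sst = np.sum((observed - observed.mean()) ** 2)
    R2 = 1 - sse / sst

    # r^2: squared correlation between observed and predicted.
    r2 = np.corrcoef(observed, predicted)[0, 1] ** 2

    print(f"R^2 = {R2:.2f}, r^2 = {r2:.2f}")  # r^2 stays high; R^2 goes negative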
12. A note about notation...
Check the documentation when a package reports R² or r². Don't assume they will be used as I have used them...
Sample Excel output using the trendline option for a chart: the R² value of 0.89 reported by Excel is actually r² (while R² is actually 0.21). (If you specify no intercept, Excel reports true R²...)
13. R² vs. r² for goodness of fit
- When there is no bias, the two measures will be almost identical (but I prefer R², in principle).
- When there is bias, R² will be low to negative, but r² will indicate how good the fit could be after taking the bias into account...
14. Sensitivity of R² and r² to data range
15. The Tyranny of R² (and r²)
- Limitations of R² (and r²) as a measure of goodness of fit...
- Not an absolute measure (as frequently assumed), particularly when the variance of the appropriate PDF is NOT independent of the mean (expected) value (e.g. lognormal, gamma, Poisson...)
16. Gamma Distributed Data...
The variance of the gamma increases as the square
of the mean!...
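For reference, with shape a and scale s in the standard parameterization, the mean is as and the variance is as², so for a fixed shape parameter the variance grows with the square of the mean:

    \mathrm{E}[x] = a s, \qquad \mathrm{Var}[x] = a s^{2} = \frac{\left(\mathrm{E}[x]\right)^{2}}{a}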
17. So, how good is good?
- Our assessment is ALWAYS subjective, because of:
- Complexity of the process being studied
- Sources of noise in the data
- From a likelihood perspective, should you ever expect R² = 1?
18. Other Goodness of Fit Issues...
- In complex models, a good fit may be due to the overwhelming effect of one variable...
- The best-fitting model may not be the most general: the fit can be improved by adding terms that account for unique variability in a specific dataset, but that limit applicability to other datasets. (The curse of ad hoc multiple regression models...)
19. How good is good? Deviance
- Comparison of your model to a full model, given the probability model.
For i = 1..n observations, a vector X of observed data (x_i), and a vector θ of j = 1..m parameters (θ_j):
Define a full model with n parameters θ_i = x_i (θ_full), and compare your model against it.
Nelder and Wedderburn (1972)
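In the usual notation, the deviance is twice the log-likelihood difference between the full model and your fitted model:

    D = 2\left[\ln L\left(\theta_{\mathrm{full}} \mid X\right) - \ln L\left(\theta \mid X\right)\right]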
20. Deviance for normally distributed data
The log-likelihood of the full model is a function of both sample size (n) and variance (σ²).
Therefore deviance is NOT an absolute measure of goodness of fit... But it does establish a standard of comparison (the full model), given your sample size and your estimate of the underlying variance...
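A sketch of why, assuming the full model reproduces each observation exactly and σ² is treated as given: the squared-error term vanishes, leaving a log-likelihood that depends only on n and σ²:

    \ln L_{\mathrm{full}} = -\frac{n}{2}\ln\left(2\pi\sigma^{2}\right)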
21. Forms of Bias
- Proportional bias (slope ≠ 1)
- Systematic bias (intercept ≠ 0)
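One common way to quantify both forms (a sketch, not from the original slides): regress observed on predicted and inspect the intercept and slope. The data and function name below are hypothetical; only numpy is assumed:

    import numpy as np

    def bias_check(observed, predicted):
        """Fit observed = a + b * predicted; a != 0 suggests systematic bias,
        b != 1 suggests proportional bias."""
        b, a = np.polyfit(predicted, observed, deg=1)  # returns [slope, intercept]
        return a, b

    # Hypothetical example values
    obs = np.array([2.0, 4.1, 6.3, 8.2, 10.4])
    pred = np.array([1.5, 3.0, 4.5, 6.0, 7.5])
    intercept, slope = bias_check(obs, pred)
    print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")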
22. Learn from your mistakes (examine your residuals...)
- Residual = observed - predicted
- Basic questions to ask of your residuals:
- Do they fit the PDF?
- Are they correlated with factors that aren't in the model (but maybe should be)?
- Do some subsets of your data fit better than others?
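A minimal sketch of the checks just listed; all variable names and data here are placeholders, and only numpy and matplotlib are assumed:

    import numpy as np
    import matplotlib.pyplot as plt

    # Placeholders: observed, predicted, and a covariate not in the model.
    rng = np.random.default_rng(7)
    covariate = rng.uniform(0, 10, 150)          # e.g., a factor left out of the model
    observed = 3 + 2 * covariate + rng.normal(0, 2, 150)
    predicted = 3 + 1.6 * covariate              # an imperfect model

    residuals = observed - predicted

    fig, axes = plt.subplots(1, 2, figsize=(9, 4))
    axes[0].hist(residuals, bins=20)             # do residuals fit the assumed PDF?
    axes[0].set_title("Residual distribution")
    axes[1].scatter(covariate, residuals, s=12)  # correlated with an omitted factor?
    axes[1].axhline(0, color="k", ls="--")
    axes[1].set_title("Residuals vs. omitted covariate")
    plt.tight_layout()
    plt.show()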
23. Using Residuals to Calculate Prediction Error
- RMSE (Root mean squared error) (i.e. the
standard deviation of the residuals)
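In symbols, for n observations x_i and predictions x̂_i:

    \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \hat{x}_i\right)^{2}}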
24. Predicting lake chemistry from spatially-explicit watershed data
Where concentration, lake volume, and flushing rate are observed, and input and in-lake decay are estimated.
25. Predicting iron concentrations in Adirondack lakes
Results from a spatially-explicit, mass-balance model of the effects of watershed composition on lake chemistry.
Source: Maranger et al. (2006)
26. Should we incorporate lake depth?
- Shallow lakes are more unpredictable than deeper lakes.
- The model consistently underestimates Fe concentrations in deeper lakes.
27. Adding lake depth improves the model...
R² went from 56% to 65%.
It is just as important that it made sense to add depth...
28. But shallow lakes are still a problem...
29. Summary: Model Evaluation
- There are no silver bullets...
- The issues are even muddier for categorical data...
- An increase in goodness of fit does not necessarily result in an increase in knowledge.
- Increasing goodness of fit reduces uncertainty in the predictions of the models, but this costs money (more and better data). How much are you willing to spend?
- The signal-to-noise issue: if you can see the signal through the noise, how far are you willing to go to reduce the noise?