Title: Day 7 Model Evaluation
1. Day 7: Model Evaluation
2. Elements of Model Evaluation
- Goodness of fit
- Prediction Error
- Bias
- Outliers and patterns in residuals
3. Assessing Goodness of Fit for Continuous Data
- Visual methods
- Don't underestimate the power of your eyes, but eyes can deceive, too...
- Quantification
- A variety of traditional measures, all with some limitations...
A good review... C. D. Schunn and D. Wallach. Evaluating Goodness-of-Fit in Comparison of Models to Data. Source: http://www.lrdc.pitt.edu/schunn/gof/GOF.doc
4. Traditional inferential tests masquerading as GOF measures
- The χ² goodness-of-fit statistic (see the sketch after this list)
- For categorical data only, this can be used as a test statistic: what is the probability that the model is true, given the observed results?
- The test can only be used to reject a model. If the model is accepted, the statistic contains no information on how good the fit is...
- Thus, this is really a badness-of-fit statistic.
- Other limitations as a measure of goodness of fit:
- Rewards sloppy research if you are actually trying to test (as a null hypothesis) a real model, because small sample size and noisy data will limit power to reject the null hypothesis.
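A minimal sketch of how such a test is typically run on categorical counts; the counts and categories below are invented for illustration, and only scipy is assumed:

    # Hypothetical counts in four categories vs. counts predicted by a candidate model.
    from scipy.stats import chisquare

    observed = [18, 30, 42, 10]   # observed category counts (made up)
    expected = [25, 25, 35, 15]   # model-predicted counts (made up; sums must match)

    stat, p = chisquare(f_obs=observed, f_exp=expected)
    print(f"chi-square = {stat:.2f}, p = {p:.3f}")
    # A small p lets you reject the model; a large p says nothing about how
    # good the fit actually is -- the "badness of fit" point made above.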
5. Visual evaluation for continuous data
- Graphing observed vs. predicted...
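A minimal sketch of that plot (all data here are invented; only numpy and matplotlib are assumed). The 1:1 line is the perfect-fit reference:

    import numpy as np
    import matplotlib.pyplot as plt

    # Placeholder data: in practice these come from your data and your model.
    rng = np.random.default_rng(1)
    observed = rng.gamma(shape=4.0, scale=2.0, size=100)
    predicted = observed * 0.9 + rng.normal(0, 1.0, size=100)

    plt.scatter(predicted, observed, s=15, alpha=0.7)
    lims = [0, max(observed.max(), predicted.max())]
    plt.plot(lims, lims, "k--", label="1:1 line")   # perfect-fit reference
    plt.xlabel("Predicted")
    plt.ylabel("Observed")
    plt.legend()
    plt.show()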
6. Examples
Goodness of fit of neighborhood models of canopy tree growth for 2 species at Date Creek, BC
[Figure: observed vs. predicted growth for the two species]
Source: Canham, C. D., P. T. LePage, and K. D. Coates. 2004. A neighborhood analysis of canopy tree competition: effects of shading versus crowding. Canadian Journal of Forest Research.
7. Goodness of Fit vs. Bias
[Figure: observed vs. predicted, with a 1:1 line for reference]
8. R² as a measure of goodness of fit
- R² = proportion of variance explained by the model... (relative to that explained by the simple mean of the data; see the formula below)
(Note: R² is NOT bounded between 0 and 1)
This interpretation of R² is technically only valid for data where SSE is an appropriate estimate of variance (e.g. normal data).
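Written out, using the sum of squared errors around the model predictions relative to the total sum of squares around the mean:

    R^2 = 1 - \frac{\sum_{i}\left(x_i - \hat{x}_i\right)^2}{\sum_{i}\left(x_i - \bar{x}\right)^2}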
9. R²: when is the mean the mean?
- Clark et al. (1998) Ecological Monographs 68:220
For i = 1..N observations in j = 1..S sites, this uses the SITE means, rather than the overall mean, to calculate R².
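A plausible rendering of that site-level version (my notation, not taken from the slide): for observations x_ij in site j, predictions x̂_ij, and site means x̄_j,

    R^2 = 1 - \frac{\sum_{j=1}^{S}\sum_{i=1}^{N_j}\left(x_{ij} - \hat{x}_{ij}\right)^2}{\sum_{j=1}^{S}\sum_{i=1}^{N_j}\left(x_{ij} - \bar{x}_j\right)^2}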
10. r² as a measure of goodness of fit
r² = the squared correlation (r) between observed (x) and predicted (y)
NOTE: r is bounded between -1 and 1, so r² is bounded between 0 and 1.
11. R² vs. r²
Is this a good fit (r² = 0.81) or a really lousy fit (R² = -0.39)? (It's undoubtedly biased...)
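A quick numerical sketch of how the two measures diverge when predictions are biased. The data are invented for illustration; only numpy is assumed:

    import numpy as np

    rng = np.random.default_rng(0)
    observed = rng.normal(50, 10, size=200)
    # "Predictions" that track the observations well but are strongly biased.
    predicted = 0.5 * observed + 40 + rng.normal(0, 2, size=200)

    # R^2: variance explained relative to the simple mean of the observations.
    sse = np.sum((observed - predicted) ** 2)
    sst = np.sum((observed - observed.mean()) ** 2)
    R2 = 1 - sse / sst

    # r^2: squared correlation between observed and predicted.
    r2 = np.corrcoef(observed, predicted)[0, 1] ** 2

    print(f"R^2 = {R2:.2f}, r^2 = {r2:.2f}")  # r^2 stays high; R^2 goes negative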
12. A note about notation...
Check the documentation when a package reports R² or r². Don't assume they will be used as I have used them...
Sample Excel output using the trendline option for a chart: the R² value of 0.89 reported by Excel is actually r² (while R² is actually 0.21). (If you specify no intercept, Excel reports true R²...)
13. R² vs. r² for goodness of fit
- When there is no bias, the two measures will be almost identical (but I prefer R², in principle).
- When there is bias, R² will be low to negative, but r² will indicate how good the fit could be after taking the bias into account...
14. Sensitivity of R² and r² to data range
15. The Tyranny of R² (and r²)
- Limitations of R² (and r²) as a measure of goodness of fit...
- Not an absolute measure (as frequently assumed), particularly when the variance of the appropriate PDF is NOT independent of the mean (expected) value (e.g. lognormal, gamma, Poisson...)
16. Gamma Distributed Data...
The variance of the gamma increases as the square
of the mean!...
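For reference, with shape a and scale s in the standard parameterization, the mean is as and the variance is as², so for a fixed shape parameter the variance grows with the square of the mean:

    \mathrm{E}[x] = a s, \qquad \mathrm{Var}[x] = a s^{2} = \frac{\left(\mathrm{E}[x]\right)^{2}}{a}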
17. So, how good is good?
- Our assessment is ALWAYS subjective, because of:
- Complexity of the process being studied
- Sources of noise in the data
- From a likelihood perspective, should you ever expect R² = 1?
18. Other Goodness of Fit Issues...
- In complex models, a good fit may be due to the overwhelming effect of one variable...
- The best-fitting model may not be the most general: the fit can be improved by adding terms that account for unique variability in a specific dataset, but that limit applicability to other datasets. (The curse of ad hoc multiple regression models...)
19. How good is good? Deviance
- Comparison of your model to a full model, given the probability model.
For i = 1..n observations, a vector X of observed data (x_i), and a vector θ of j = 1..m parameters (θ_j):
Define a full model with n parameters θ_i = x_i (θ_full), and compare your model against it.
Nelder and Wedderburn (1972)
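In the usual notation, the deviance is twice the log-likelihood difference between the full model and your fitted model:

    D = 2\left[\ln L\left(\theta_{\mathrm{full}} \mid X\right) - \ln L\left(\theta \mid X\right)\right]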
20. Deviance for normally distributed data
The log-likelihood of the full model is a function of both sample size (n) and variance (σ²).
Therefore deviance is NOT an absolute measure of goodness of fit... But it does establish a standard of comparison (the full model), given your sample size and your estimate of the underlying variance...
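A sketch of why, assuming the full model reproduces each observation exactly and σ² is treated as given: the squared-error term vanishes, leaving a log-likelihood that depends only on n and σ²:

    \ln L_{\mathrm{full}} = -\frac{n}{2}\ln\left(2\pi\sigma^{2}\right)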
21. Forms of Bias
- Proportional bias (slope ≠ 1)
- Systematic bias (intercept ≠ 0)
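One common way to quantify both forms (a sketch, not from the original slides): regress observed on predicted and inspect the intercept and slope. The data and function name below are hypothetical; only numpy is assumed:

    import numpy as np

    def bias_check(observed, predicted):
        """Fit observed = a + b * predicted; a != 0 suggests systematic bias,
        b != 1 suggests proportional bias."""
        b, a = np.polyfit(predicted, observed, deg=1)  # returns [slope, intercept]
        return a, b

    # Hypothetical example values
    obs = np.array([2.0, 4.1, 6.3, 8.2, 10.4])
    pred = np.array([1.5, 3.0, 4.5, 6.0, 7.5])
    intercept, slope = bias_check(obs, pred)
    print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")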
22. Learn from your mistakes (examine your residuals...)
- Residual = observed - predicted
- Basic questions to ask of your residuals:
- Do they fit the PDF?
- Are they correlated with factors that aren't in the model (but maybe should be)?
- Do some subsets of your data fit better than others?
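A minimal sketch of the checks just listed; all variable names and data here are placeholders, and only numpy and matplotlib are assumed:

    import numpy as np
    import matplotlib.pyplot as plt

    # Placeholders: observed, predicted, and a covariate not in the model.
    rng = np.random.default_rng(7)
    covariate = rng.uniform(0, 10, 150)          # e.g., a factor left out of the model
    observed = 3 + 2 * covariate + rng.normal(0, 2, 150)
    predicted = 3 + 1.6 * covariate              # an imperfect model

    residuals = observed - predicted

    fig, axes = plt.subplots(1, 2, figsize=(9, 4))
    axes[0].hist(residuals, bins=20)             # do residuals fit the assumed PDF?
    axes[0].set_title("Residual distribution")
    axes[1].scatter(covariate, residuals, s=12)  # correlated with an omitted factor?
    axes[1].axhline(0, color="k", ls="--")
    axes[1].set_title("Residuals vs. omitted covariate")
    plt.tight_layout()
    plt.show()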
23. Using Residuals to Calculate Prediction Error
- RMSE (Root mean squared error) (i.e. the
standard deviation of the residuals)
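In symbols, for n observations x_i and predictions x̂_i:

    \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \hat{x}_i\right)^{2}}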
24. Predicting lake chemistry from spatially-explicit watershed data
Where concentration, lake volume, and flushing rate are observed, and input and in-lake decay are estimated.
25. Predicting iron concentrations in Adirondack lakes
Results from a spatially-explicit, mass-balance model of the effects of watershed composition on lake chemistry.
Source: Maranger et al. (2006)
26. Should we incorporate lake depth?
- Shallow lakes are more unpredictable than deeper lakes.
- The model consistently underestimates Fe concentrations in deeper lakes.
27. Adding lake depth improves the model...
R² went from 56% to 65%.
It is just as important that it made sense to add depth...
28. But shallow lakes are still a problem...
29. Summary: Model Evaluation
- There are no silver bullets...
- The issues are even muddier for categorical data...
- An increase in goodness of fit does not necessarily result in an increase in knowledge.
- Increasing goodness of fit reduces uncertainty in the predictions of the models, but this costs money (more and better data). How much are you willing to spend?
- The signal-to-noise issue: if you can see the signal through the noise, how far are you willing to go to reduce the noise?