Title: Diagnostics
1. Diagnostics Part II
- Using statistical tests to check whether the
assumptions we made about the model are realistic
2. Diagnostic methods
- Some simple (but subjective) plots. (Then)
- Some formal statistical tests. (Now)
3. Simple linear regression model
The response $Y_i$ is a function of a systematic
linear component and a random error component,
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$,
with assumptions that
- Error terms have mean 0, i.e., $E(\varepsilon_i) = 0$.
- $\varepsilon_i$ and $\varepsilon_j$ are uncorrelated (independent) for $i \neq j$.
- Error terms have the same variance, i.e., $\mathrm{Var}(\varepsilon_i) = \sigma^2$.
- Error terms $\varepsilon_i$ are normally distributed.
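As a rough illustration of where the residuals used by the tests come from, the sketch below simulates data from a simple linear regression model and fits it by least squares. The function names (`simulate_slr`, `fit_slr`, `residuals`) and all parameter values are hypothetical, chosen only for illustration.

```python
import random


def simulate_slr(x_values, beta0, beta1, sigma, seed=0):
    """Draw Y_i = beta0 + beta1 * X_i + e_i with e_i ~ Normal(0, sigma^2)."""
    rng = random.Random(seed)
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in x_values]


def fit_slr(x, y):
    """Return the least-squares intercept b0 and slope b1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def residuals(x, y, b0, b1):
    """Observed response minus fitted value, one residual per observation."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
```

The residuals stand in for the unobservable error terms, which is why each test in this part is applied to them.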
4. Why should we keep NAGGING ourselves about the
model?
- All of the estimates, confidence intervals,
prediction intervals, hypothesis tests, etc. have
been developed assuming that the model is
correct.
- If the model is incorrect, then the formulas and
methods we use are at risk of being incorrect.
(Some are more forgiving than others.)
5. Summary of the tests we'll learn
- Durbin-Watson test for detecting correlated
(adjacent) error terms.
- Modified Levene test for constant error variance.
- (Ryan-Joiner) correlation test for normality of
error terms.
6. The Durbin-Watson test for uncorrelated
(adjacent) error terms
Durbin-Watson test statistic:
$D = \sum_{t=2}^{n} (e_t - e_{t-1})^2 \big/ \sum_{t=1}^{n} e_t^2$, where $e_t$ is the residual at time $t$.
- Compare $D$ to the Durbin-Watson test bounds in Table
B.7.
- If $D > d_U$ (the upper bound), conclude no
correlation.
- If $D < d_L$ (the lower bound), conclude positive
correlation.
- If $D$ is between the two bounds, the test is
inconclusive.
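The statistic and the decision rule can be sketched in a few lines of Python. This is a minimal illustration, not Minitab's implementation; the function names are hypothetical, and the bounds must still be looked up in a Durbin-Watson table.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by sum of squared residuals.

    The residuals must be in time order.
    """
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def dw_positive_test(d, d_lower, d_upper):
    """One-sided test for positive autocorrelation using tabled bounds."""
    if d > d_upper:
        return "no correlation"
    if d < d_lower:
        return "positive correlation"
    return "inconclusive"
```

For the Blaisdell Company values reported in this deck, `dw_positive_test(0.73, 0.95, 1.15)` returns `"positive correlation"`.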
7. Example: Blaisdell Company
Seasonally adjusted quarterly data, 1988 to 1992.
Reasonable fit, but are the error terms
positively auto-correlated?
9. Blaisdell Company Example: Durbin-Watson test
- Stat >> Regression >> Regression. Under
Options, select Durbin-Watson statistic.
- Durbin-Watson statistic = 0.73.
- Table B.7 with level of significance $\alpha = 0.01$,
$(p-1) = 1$ predictor variable, and $n = 20$ (5 years, 4
quarters each) gives $d_L = 0.95$ and $d_U = 1.15$.
- Since $D = 0.73 < d_L = 0.95$, conclude error terms are
positively auto-correlated.
10. For completeness' sake: one more thing about the
Durbin-Watson test
- If a test for negative auto-correlation is desired,
use $D^* = 4 - D$ instead. If $D^* < d_L$, then conclude
error terms are negatively auto-correlated.
- If a two-sided test is desired (both positive and
negative auto-correlation possible), conduct both
one-sided tests, $D$ and $D^*$, separately. Level of
significance is then $2\alpha$.
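A small sketch of the two-sided procedure follows, assuming the statistic D has already been computed and the bounds come from a table at level alpha. The function names and return strings are hypothetical.

```python
def dw_one_sided(stat, d_lower, d_upper):
    """Apply the bounds rule to a statistic (D, or 4 - D for the negative test)."""
    if stat > d_upper:
        return "not correlated"
    if stat < d_lower:
        return "correlated"
    return "inconclusive"


def dw_two_sided(d, d_lower, d_upper):
    """Run both one-sided tests; the overall significance level is 2 * alpha."""
    d_star = 4.0 - d  # statistic for the negative auto-correlation test
    return {
        "positive": dw_one_sided(d, d_lower, d_upper),
        "negative": dw_one_sided(d_star, d_lower, d_upper),
    }
```

With D = 0.73 and bounds 0.95 and 1.15, the positive test flags correlation while the negative test, based on 4 - 0.73 = 3.27, does not.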
11. Modified Levene Test for nonconstant error
variance
- Divide the data set into two roughly equal-sized
groups, based on the level of X.
- If the error variance is either increasing or
decreasing with X, the absolute deviations of the
residuals around their group median will be
larger for one of the two groups.
- Use a two-sample t-test to check whether the mean of
absolute deviations for one group differs significantly
from the mean of absolute deviations for the second group.
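The three steps above can be sketched as follows. This is a minimal illustration that assumes a pooled-variance two-sample t statistic and a split at the middle of the sorted X values; the function name `modified_levene_t` is hypothetical, and the p-value would still need to come from a t distribution with the returned degrees of freedom.

```python
import math
import statistics


def modified_levene_t(x, residuals):
    """Return (t statistic, degrees of freedom) for the modified Levene test."""
    # Step 1: split the residuals into low-X and high-X groups.
    pairs = sorted(zip(x, residuals))
    half = len(pairs) // 2
    group1 = [e for _, e in pairs[:half]]
    group2 = [e for _, e in pairs[half:]]

    # Step 2: absolute deviations of residuals around their group median.
    med1 = statistics.median(group1)
    med2 = statistics.median(group2)
    d1 = [abs(e - med1) for e in group1]
    d2 = [abs(e - med2) for e in group2]

    # Step 3: pooled two-sample t statistic on the absolute deviations.
    n1, n2 = len(d1), len(d2)
    mean1 = sum(d1) / n1
    mean2 = sum(d2) / n2
    ss1 = sum((d - mean1) ** 2 for d in d1)
    ss2 = sum((d - mean2) ** 2 for d in d2)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    t = (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2
```

For two groups, the square of this pooled t statistic equals the F-type statistic of a one-way analysis of variance on the same absolute deviations, which is the form many software packages report.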
12. Modified Levene Test in Minitab
- Use Manip >> Code >> Numeric to numeric to
create a GROUP variable based on the values of X.
- Stat >> Regression >> Regression. Under Storage,
select residuals.
- Stat >> Basic statistics >> 2 Variances. Specify
Samples (RESI1) and Subscripts (GROUP). Select
OK. Look in the session window for the Levene P-value.
13. Example: How is plutonium activity related to
alpha particle counts?
14. A residuals versus fits plot suggesting
non-constant error variance
15. Plutonium Alpha Example: Modified Levene's Test
Levene's Test (any continuous distribution)
Test Statistic: 9.452   P-Value: 0.006
It is highly unlikely (P = 0.006) that we'd get
such an extreme Levene statistic (L = 9.452) if the
variances of the two groups were equal. Reject
the null hypothesis at the 0.01 level, and
conclude that the error variances are not
constant.
16. (Ryan-Joiner) Correlation test for normality of
error terms in Minitab
- $H_0$: Error terms are normally distributed vs. $H_A$:
Error terms are not normally distributed.
- Stat >> Regression >> Regression. Under
Storage, select residuals.
- Stat >> Basic statistics >> Normality Test.
Select residuals (RESI1) and request the Ryan-Joiner
test. Select OK.
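The idea behind a correlation test for normality can be sketched as the correlation between the ordered residuals and their approximate expected values under normality. The sketch below assumes the plotting positions (i - 3/8) / (n + 1/4), a common choice for normal scores; whether this matches Minitab's exact computation is an assumption, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def normal_scores(n):
    """Approximate expected standard normal order statistics for a sample of size n."""
    return [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def normality_correlation(residuals):
    """Pearson correlation between sorted residuals and their normal scores."""
    ordered = sorted(residuals)
    scores = normal_scores(len(ordered))
    n = len(ordered)
    mean_e = sum(ordered) / n
    mean_s = sum(scores) / n
    cov = sum((e - mean_e) * (s - mean_s) for e, s in zip(ordered, scores))
    var_e = sum((e - mean_e) ** 2 for e in ordered)
    var_s = sum((s - mean_s) ** 2 for s in scores)
    return cov / math.sqrt(var_e * var_s)
```

A correlation close to 1 is consistent with normal error terms; normality is rejected when the correlation falls below a tabled critical value for the given sample size and significance level.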
17. 100 chi-square (1 df) data values
18. Normal probability plot and test for 100
chi-square (1 df) data values
19. 100 normal(0,1) data values
20. Normal probability plot and test for 100
normal(0,1) data values
21. Normal probability plot for Tree diameter (X) and
C-dating Age (Y)
22. Tree diameter and Age Example: Ryan-Joiner
Correlation Test
23. Some closing comments
- Checking of assumptions is important, but be
aware of the robustness of your methods, so you
don't get too hung up.
- Model checking is an art as well as a science.
- Do not think that there is some definitive
correct answer in the back of the book.
- Use your knowledge of the subject matter.