Title: Regression Diagnostics
Regression Diagnostics
Contents
- Residuals
- Graphical Methods
- Multicollinearity, Nonnormality, Heteroscedasticity, Autocorrelation
- Measures of Influence
4.1 Introduction
- The conditions required for the model must be checked; violation of any condition makes the inferences invalid.
- Is the error variable normally distributed? Draw a histogram of the residuals.
- Is the error variance constant?
- Are the errors independent? Plot the residuals versus the time periods.
- Can we identify outliers?
- Is multicollinearity (intercorrelation) a problem?
4.2 Residuals
- Ordinary least squares residuals: $e_i = Y_i - \hat{Y}_i$, $i = 1, \ldots, n$.
- Internally studentized residuals: $r_i = \dfrac{e_i}{\hat{\sigma}\sqrt{1 - p_{ii}}}$.
- Externally studentized residuals: $r_i^* = \dfrac{e_i}{\hat{\sigma}_{(i)}\sqrt{1 - p_{ii}}}$,
where $p_{ii}$ is the $i$th diagonal element of the hat matrix $P = X(X'X)^{-1}X'$ and $\hat{\sigma}_{(i)}$ is the estimate of $\sigma$ with the $i$th observation deleted.
4.2 Residuals (continued)
Once the internally studentized residuals are calculated, the externally studentized residuals can be calculated through the relationship
$r_i^* = r_i \sqrt{\dfrac{n - p - 2}{n - p - 1 - r_i^2}}$.
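As a quick check of this relationship, here is a minimal R sketch; it assumes `fit` is an already-fitted lm() object (a placeholder, not a course file).

## Verify the relationship above for any lm() fit.
r <- rstandard(fit)                    # internally studentized residuals
n <- nobs(fit)                         # number of observations
p <- length(coef(fit)) - 1             # number of predictors
r.ext <- r * sqrt((n - p - 2) / (n - p - 1 - r^2))
all.equal(r.ext, rstudent(fit))        # should be TRUE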
4.3 Graphical Methods
- "There is no single statistical tool that is as powerful as a well-chosen graph." (Chambers et al., 1983)
- "Eye-balling can give diagnostic insights no formal diagnostics will ever provide." (Huber, 1991)
Anscombe's Quartet: Four Data Sets Having the Same Summary Statistics
Mean(Y) = 7.501, Mean(X) = 9.0, Std(Y) = 2.031, Std(X) = 3.32, Cor(Y, X) = 0.8, fitted line $\hat{Y} = 3 + 0.5X$, etc.
[Figure: scatter plots of the four Anscombe data sets, each with the same fitted line]
Graphical methods can be useful in many ways:
- Detect errors in the data (e.g., an outlying point may be the result of a typographical error),
- Recognize patterns in the data (e.g., clusters, outliers),
- Explore relationships among variables,
- Discover new phenomena,
- Confirm or negate assumptions,
- Assess the adequacy of a fitted model,
- Suggest remedial actions (e.g., transform the data),
- Enhance numerical analysis in general.
Graphical methods can be classified into two classes:
- Graphs before fitting the model. These are useful for correcting errors in the data and for selecting a model.
- Graphs after fitting the model. These are particularly useful for checking the model assumptions and for assessing the goodness of the fit.
Functionality of common plots
- Plot of Y vs. Xi, i = 1, ..., p, to reveal the Y-X relationship.
- Normal probability plot of the studentized residuals for checking the normality assumption.
- Scatter plots of the studentized residuals against each of the predictor variables for checking the linearity of the Y-X relation and the constancy of the error variance.
Functionality of common plots (continued)
- Scatter plot of the studentized residuals versus the fitted values (similar to the above).
- Index plot of the studentized residuals for checking the independence assumption.
- Matrix plot of the predictors for checking multicollinearity.
Example 4.1. Using R (lm() and plot()) to produce the following plots based on the motor inn data in Example 3.1; a sketch follows this list.
- Plot Y vs. each of X1, X2, X3, X4, X5, X6.
- Give a matrix plot of all X variables.
- Plot the OLS residuals vs. Xi, i = 1, 2, ..., 6.
- Plot the OLS residuals vs. the fitted values.
- Normal probability plot of the studentized residuals r (R command: qqnorm(r)).
- Index plot of the studentized residuals.
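A minimal R sketch of these plots, assuming the motor inn data of Example 3.1 are in a data frame named inn with response Y and predictors X1-X6 (placeholder names, not the actual file's):

inn.fit <- lm(Y ~ X1 + X2 + X3 + X4 + X5 + X6, data = inn)

par(mfrow = c(2, 3))
for (x in paste0("X", 1:6))            # Y vs. each predictor
  plot(inn[[x]], inn$Y, xlab = x, ylab = "Y")

pairs(inn[paste0("X", 1:6)])           # matrix plot of the predictors

e <- resid(inn.fit)                    # OLS residuals vs. each Xi
for (x in paste0("X", 1:6))
  plot(inn[[x]], e, xlab = x, ylab = "OLS residual")

plot(fitted(inn.fit), e,               # OLS residuals vs. fitted values
     xlab = "Fitted values", ylab = "OLS residual")

r <- rstudent(inn.fit)                 # studentized residuals
qqnorm(r); qqline(r)                   # normal probability plot
plot(r, xlab = "Index", ylab = "Studentized residual")   # index plot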
Diagnostics: Multicollinearity
- Example 4.2: Predicting house price (EX4-01.xls)
- A real estate agent believes that a house's selling price can be predicted from the house size, the number of bedrooms, and the lot size.
- A random sample of 100 houses was drawn and the data recorded.
- Analyze the relationship among the four variables.
Diagnostics: Multicollinearity
- The proposed model is PRICE = β0 + β1 BEDROOMS + β2 H-SIZE + β3 LOTSIZE + ε.
The model is valid, but no variable is significantly related to the selling price! Why?
Diagnostics: Multicollinearity
- Multicollinearity is found to be a problem (a sketch of how to check it follows this list).
- Multicollinearity causes two kinds of difficulties:
- The t statistics appear to be too small.
- The b coefficients cannot be interpreted as slopes.
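A hedged sketch of detecting the problem, assuming the Example 4.2 data are in a data frame houses with columns BEDROOMS, HSIZE, and LOTSIZE (placeholder names for the EX4-01.xls variables):

X <- houses[c("BEDROOMS", "HSIZE", "LOTSIZE")]
round(cor(X), 3)                       # high pairwise correlations signal trouble

## Variance inflation factors from first principles:
## VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing X_j on the rest.
vif <- sapply(seq_along(X), function(j) {
  r2 <- summary(lm(X[[j]] ~ ., data = X[-j]))$r.squared
  1 / (1 - r2)
})
names(vif) <- names(X)
vif                                    # values well above 10 indicate severe multicollinearity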
Remedying Violations of the Required Conditions
- Nonnormality or heteroscedasticity can be remedied by applying transformations to the Y variable.
- The transformations can also improve the linear relationship between the dependent variable and the independent variables.
- Many software systems make such transformations easy.
Reducing Nonnormality by Transformations
- A brief list of transformations (an R sketch follows this list):
- Y' = log Y (for Y > 0): use when $\sigma_\varepsilon$ increases with Y, or when the error distribution is positively skewed.
- Y' = Y^2: use when $\sigma_\varepsilon^2$ is proportional to E(Y), or when the error distribution is negatively skewed.
- Y' = Y^{1/2} (for Y > 0): use when $\sigma_\varepsilon^2$ is proportional to E(Y).
- Y' = 1/Y: use when $\sigma_\varepsilon^2$ increases significantly when Y increases beyond some critical value.
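A brief R sketch of refitting after each transformation; dat, Y, and X are placeholder names, not from the course files:

fit.log  <- lm(log(Y) ~ X, data = dat)    # sigma_e increases with Y / positive skew
fit.sq   <- lm(I(Y^2) ~ X, data = dat)    # negative skew
fit.sqrt <- lm(sqrt(Y) ~ X, data = dat)   # variance proportional to E(Y)
fit.inv  <- lm(I(1/Y) ~ X, data = dat)    # variance grows past a critical value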
Durbin-Watson Test: Are the Errors Autocorrelated?
- This test detects first-order autocorrelation between consecutive residuals in a time series.
- If autocorrelation exists, the error variables are not independent.
The test statistic is
$d = \dfrac{\sum_{i=2}^{n}(e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2}$,
where $e_i$ is the residual at time $i$.
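The statistic is easy to compute directly from the residuals; a minimal sketch for any lm() object fit (the helper name dw is hypothetical; lmtest::dwtest is a packaged alternative if that package is installed):

dw <- function(fit) {
  e <- resid(fit)
  sum(diff(e)^2) / sum(e^2)            # d = sum (e_i - e_{i-1})^2 / sum e_i^2
}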
Positive First Order Autocorrelation
[Figure: residuals plotted against time, drifting in runs about 0]
Positive first-order autocorrelation occurs when consecutive residuals tend to be similar. Then the value of d is small (less than 2).
Negative First Order Autocorrelation
[Figure: residuals plotted against time, alternating in sign about 0]
Negative first-order autocorrelation occurs when consecutive residuals tend to differ markedly. Then the value of d is large (greater than 2).
One-Tail Test for Positive First Order Autocorrelation
- If d < dL, there is enough evidence to conclude that positive first-order autocorrelation exists.
- If d > dU, there is not enough evidence to conclude that positive first-order autocorrelation exists.
- If d is between dL and dU, the test is inconclusive.
One-Tail Test for Negative First Order Autocorrelation
- If d > 4 - dL, negative first-order autocorrelation exists.
- If d < 4 - dU, negative first-order autocorrelation does not exist.
- If d falls between 4 - dU and 4 - dL, the test is inconclusive.
Two-Tail Test for First Order Autocorrelation
- If d < dL or d > 4 - dL, first-order autocorrelation exists.
- If d falls between dL and dU, or between 4 - dU and 4 - dL, the test is inconclusive.
- If d falls between dU and 4 - dU, there is no evidence of first-order autocorrelation. (A small helper applying this rule follows.)
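A tiny helper applying the two-tail rule above; dL and dU are the table values for the given n and p, and the function name is hypothetical:

dw.two.tail <- function(d, dL, dU) {
  if (d < dL || d > 4 - dL)      "first-order autocorrelation exists"
  else if (d > dU && d < 4 - dU) "no evidence of first-order autocorrelation"
  else                           "inconclusive"
}
dw.two.tail(0.5931, dL = 1.10, dU = 1.54)   # values from Example 4.3 below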
Testing the Existence of Autocorrelation: Example
- Example 4.3 (EX4-03)
- How does the weather affect the sales of lift tickets at a ski resort?
- Data on the past 20 years' ticket sales, along with the total snowfall and the average temperature during Christmas week in each year, were collected.
- The hypothesized model was TICKETS = b0 + b1 SNOWFALL + b2 TEMPERATURE + e.
- Regression analysis yielded the following results:
The Regression Equation: Assessment (I)
The model seems to be very poor:
- R-square = 0.1200.
- It is not valid (Significance F = 0.3373).
- No variable is linearly related to Sales.
Diagnostics: The Error Distribution
[Figure: histogram of the residuals]
The errors may be normally distributed.
Diagnostics: Heteroscedasticity
Diagnostics: First Order Autocorrelation
The errors are not independent!
Diagnostics: First Order Autocorrelation (continued)
Using the computer (Excel):
Tools > Data Analysis > Regression (check the residuals option, then OK). Tools > Data Analysis Plus > Durbin-Watson Statistic > highlight the range of the residuals from the regression run > OK.
Test for positive first-order autocorrelation: n = 20, p = 2. From the Durbin-Watson table we have dL = 1.10 and dU = 1.54. The statistic is d = 0.5931. Conclusion: because d < dL, there is sufficient evidence to infer that positive first-order autocorrelation exists.
[Figure: the residuals from the regression run]
The Modified Model: Time Included
The modified regression model (EX4-02mod.xls):
TICKETS = b0 + b1 SNOWFALL + b2 TEMPERATURE + b3 TIME + e
- All the required conditions are met for this model.
- The fit of this model is high: R2 = 0.7410.
- The model is valid: Significance F = 0.0001.
- SNOWFALL and TIME are linearly related to ticket sales.
- TEMPERATURE is not linearly related to ticket sales.
(An R sketch of this refit follows.)
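A sketch of the refit, assuming the Example 4.3 data are in a data frame ski with columns TICKETS, SNOWFALL, and TEMPERATURE recorded over 20 consecutive years (placeholder names), and reusing the dw() helper sketched earlier:

ski$TIME <- seq_len(nrow(ski))         # time index 1, 2, ..., 20
fit2 <- lm(TICKETS ~ SNOWFALL + TEMPERATURE + TIME, data = ski)
summary(fit2)                          # R-squared, F test, t tests
dw(fit2)                               # recheck the Durbin-Watson statistic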
4.4 Leverage, Influence, and Outliers
- Influential point: a point is influential if its deletion causes substantial changes in the fitted model (estimated coefficients, fitted values, t-tests, etc.).
- Outliers in the response variable: observations with large standardized residuals are outliers in the response variable. A rule of thumb: more than 3 standard deviations away from the mean (zero).
- Leverage value: p_ii, the ith diagonal element of the P matrix.
- Outliers in the predictors: outliers in the predictors (the X-space) are defined based on the magnitude of p_ii;
4.4 Leverage, Influence, and Outliers (continued)
- Specifically, if $p_{ii} > 2(p+1)/n$, then the ith observation is an outlying observation with respect to the X variables.
- This is because $p_{ii}$ measures the distance of a point from the center of the X-space. It is clearer in simple linear regression, where
$p_{ii} = \dfrac{1}{n} + \dfrac{(x_i - \bar{x})^2}{\sum_{j=1}^{n}(x_j - \bar{x})^2}$.
(A sketch applying these rules follows.)
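A short sketch flagging the two kinds of outliers under these rules; fit is any lm() object:

p <- length(coef(fit)) - 1             # number of predictors
n <- nobs(fit)
h <- hatvalues(fit)                    # leverage values p_ii
which(h > 2 * (p + 1) / n)             # outlying in the X-space
which(abs(rstandard(fit)) > 3)         # outlying in the response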
4.5 Measures of Influence
- Let $\hat{Y}_{(i)}$ and $\hat{\sigma}_{(i)}$ be the fitted values and the estimate of $\sigma$ when we drop the ith observation.
- Cook's distance measures the influence of the ith observation by summarizing the differences between the fitted values obtained from the full data and the fitted values obtained by deleting the ith observation:
$C_i = \dfrac{\sum_{j=1}^{n}\bigl(\hat{Y}_j - \hat{Y}_{j(i)}\bigr)^2}{(p+1)\,\hat{\sigma}^2}$.
4.5 Measures of Influence (continued)
- Cook's distance can be calculated through the relation
$C_i = \dfrac{r_i^2}{p+1} \cdot \dfrac{p_{ii}}{1 - p_{ii}}$,
where $r_i$ is the internally studentized residual.
- A rule of thumb: if Ci > 1, then the ith observation is influential.
- A more flexible and informative way of detecting influential observations is an index plot of Ci (see the sketch below).
- There are other measures of influence; see the text, Sec. 4.9.
- See the R script file MotorInn.r on the course website for details on calculating the various quantities and plotting.
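A minimal sketch of these diagnostics; fit is any lm() object (cooks.distance() is built into R):

Ci <- cooks.distance(fit)
which(Ci > 1)                          # rule-of-thumb influential points
plot(Ci, type = "h",                   # index plot of Cook's distance
     xlab = "Index", ylab = "Cook's distance")
abline(h = 1, lty = 2)                 # reference line at the cutoff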