Title: STATS 330: Lecture 12
1. STATS 330 Lecture 12
Diagnostics 4
2. Diagnostics 4
- Aim of today's lecture
  - To discuss diagnostics for independence
3. Independence
- One of the regression assumptions is that the errors are independent.
- Data collected sequentially over time often have errors that are not independent.
- If the independence assumption does not hold, then the standard errors will be wrong and the tests and confidence intervals will be unreliable.
- Thus, we need to be able to detect lack of independence.
4. Types of dependence
- If large positive errors have a tendency to follow large positive errors, and large negative errors a tendency to follow large negative errors, we say the data has positive autocorrelation.
- If large positive errors have a tendency to follow large negative errors, and large negative errors a tendency to follow large positive errors, we say the data has negative autocorrelation.
5. Diagnostics
- If the errors are positively autocorrelated:
  - Plotting the residuals against time will show long runs of positive and negative residuals
  - Plotting residuals against the previous residual (i.e. $e_i$ vs $e_{i-1}$) will show a positive trend
  - A correlogram of the residuals will show positive spikes, gradually decaying
6. Diagnostics (2)
- If the errors are negatively autocorrelated:
  - Plotting the residuals against time will show alternating positive and negative residuals
  - Plotting residuals against the previous residual (i.e. $e_i$ vs $e_{i-1}$) will show a negative trend
  - A correlogram of the residuals will show alternating positive and negative spikes, gradually decaying
7. Residuals against time
- Can omit the x vector if it is just the sequence numbers 1, 2, ..., n

    res<-residuals(lm.obj)
    plot(1:length(res), res,
         xlab="time", ylab="residuals", type="b")   # type="b": dots/lines
    lines(1:length(res), res)
    abline(h=0, lty=2)                               # dotted line at 0 (mean residual)
9. Residuals against previous

    res<-residuals(lm.obj)
    n<-length(res)
    plot.res<-res[-1]     # element 1 has no previous
    prev.res<-res[-n]     # have to be equal length
    plot(prev.res, plot.res,
         xlab="previous residual", ylab="residual")
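The negative-index trick can be checked on a toy vector. The sketch below uses made-up residual values (not from any dataset in these slides) and a hypothetical name `lagged` to show which values end up paired.

```r
# Toy residual vector (made-up values, purely for illustration)
res <- c(1.5, -0.5, 2.0, -1.0)
n <- length(res)

plot.res <- res[-1]   # residuals 2..n: each one has a predecessor
prev.res <- res[-n]   # residuals 1..(n-1): the predecessors

# Each row pairs a residual with the one immediately before it
lagged <- cbind(previous = prev.res, current = plot.res)
print(lagged)
```

Dropping the first element from one copy and the last from the other is what shifts the two vectors by one time step while keeping them the same length.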
10. Plots for different degrees of autocorrelation
11. Correlogram
- The correlogram (autocorrelation function, acf) is a plot of the lag k autocorrelation versus k.
- The lag k autocorrelation is the correlation of residuals k time units apart.
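To make "correlation of residuals k time units apart" concrete, here is a minimal sketch that computes the lag k autocorrelation by hand and compares it with R's `acf`. The helper name `lag_autocorr` and the simulated vector `x` are assumptions for illustration; the hand formula follows the convention `acf` uses (deviations from the overall mean, divided by the lag 0 sum of squares).

```r
set.seed(1)            # arbitrary seed so the sketch is reproducible
x <- rnorm(50)         # stand-in for a residual vector

# Lag-k sample autocorrelation: products of mean-centred values that are
# k steps apart, divided by the total sum of squares.
lag_autocorr <- function(x, k) {
  n <- length(x)
  d <- x - mean(x)
  sum(d[(k + 1):n] * d[1:(n - k)]) / sum(d^2)
}

by_hand  <- sapply(0:5, function(k) lag_autocorr(x, k))
from_acf <- as.vector(acf(x, lag.max = 5, plot = FALSE)$acf)

print(round(cbind(lag = 0:5, by_hand, from_acf), 4))
```

The lag 0 value is always 1, which is why the first spike in a correlogram carries no information.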
13. Durbin-Watson test
- We can also do a formal hypothesis test (the Durbin-Watson test) for independence.
- The test assumes the errors follow a model of the form
  $\varepsilon_i = \rho\,\varepsilon_{i-1} + u_i$
  where the $u_i$'s are independent, normal and have constant variance.
- NB: $\rho$ is the lag 1 correlation; this is the autoregressive model of order 1, AR(1).
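A short simulation can show what errors from an autoregressive model of order 1 look like. This is a sketch under stated assumptions: the function names `simulate_ar1` and `lag1_cor`, the seed, the sample size and the values of rho are all illustrative choices, and the first error is simply set to the first innovation.

```r
set.seed(330)   # arbitrary seed

# Simulate n errors from e[i] = rho * e[i-1] + u[i], with u ~ N(0, sd^2)
simulate_ar1 <- function(n, rho, sd = 1) {
  u <- rnorm(n, mean = 0, sd = sd)
  e <- numeric(n)
  e[1] <- u[1]                                  # simple start-up value (an assumption)
  for (i in 2:n) e[i] <- rho * e[i - 1] + u[i]
  e
}

# Correlation between each error and the one before it
lag1_cor <- function(e) cor(e[-1], e[-length(e)])

e_pos <- simulate_ar1(500, rho = 0.7)    # positive autocorrelation
e_neg <- simulate_ar1(500, rho = -0.7)   # negative autocorrelation
e_ind <- simulate_ar1(500, rho = 0)      # independent errors

print(c(positive = lag1_cor(e_pos),
        negative = lag1_cor(e_neg),
        independent = lag1_cor(e_ind)))
```

Plotting `e_pos` and `e_neg` against their index should show the long runs and the alternating pattern described on the diagnostics slides.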
14. Durbin-Watson test (2)
- When $\rho = 0$, the errors are independent.
- The DW test tests independence by testing $\rho = 0$.
- $\rho$ is estimated from the residuals by the sample lag 1 autocorrelation, $\hat{\rho}$.
15. Durbin-Watson test (3)
- The value of DW is between 0 and 4.
- Values of DW around 2 are consistent with independence.
- Values close to 4 indicate negative serial correlation.
- Values close to 0 indicate positive serial correlation.
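The 0-to-4 range follows from the usual textbook definition of the statistic. The formulas below are the standard ones and are an assumption here, since this transcript does not preserve the slide's own equations; $e_i$ denotes the i-th residual.

```latex
\[
DW = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2},
\qquad
\hat{\rho} = \frac{\sum_{i=2}^{n} e_i e_{i-1}}{\sum_{i=1}^{n} e_i^2}.
\]
Expanding the square in the numerator gives
\[
DW = \frac{\sum_{i=2}^{n} e_i^2 + \sum_{i=2}^{n} e_{i-1}^2
          - 2\sum_{i=2}^{n} e_i e_{i-1}}{\sum_{i=1}^{n} e_i^2}
   \approx 2\,(1 - \hat{\rho}),
\]
because each of the first two sums differs from $\sum_{i=1}^{n} e_i^2$
by a single term. Hence
$\hat{\rho} \approx 0 \Rightarrow DW \approx 2$,
$\hat{\rho} \approx 1 \Rightarrow DW \approx 0$, and
$\hat{\rho} \approx -1 \Rightarrow DW \approx 4$.
```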
16. Durbin-Watson test (4)
- There exist values dL, dU depending on the number of variables k in the regression and the sample size n (see table on next slide).
- Use the value of DW to decide on independence as follows:

    Range of DW        Conclusion
    0 to dL            Positive autocorrelation
    dL to dU           Inconclusive
    dU to 4-dU         Independence
    4-dU to 4-dL       Inconclusive
    4-dL to 4          Negative autocorrelation
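The decision rule can be written as a small function. This is a minimal sketch: `dw_decision` is a hypothetical helper, the bounds `dL = 1.3` and `dU = 1.6` are made-up illustrative values rather than entries from a real table, and the handling of values exactly on a boundary is a convention chosen here.

```r
# Classify a Durbin-Watson statistic given table bounds dL and dU.
dw_decision <- function(dw, dL, dU) {
  stopifnot(dw >= 0, dw <= 4, dL < dU)
  if (dw < dL) {
    "positive autocorrelation"
  } else if (dw < dU) {
    "inconclusive"
  } else if (dw <= 4 - dU) {
    "independence"
  } else if (dw <= 4 - dL) {
    "inconclusive"
  } else {
    "negative autocorrelation"
  }
}

# Illustrative bounds only (not taken from a real table)
dL <- 1.3
dU <- 1.6
for (dw in c(0.9, 1.45, 2.0, 2.55, 3.1)) {
  cat(dw, "->", dw_decision(dw, dL, dU), "\n")
}
```

In practice dL and dU would be read from a Durbin-Watson table for the relevant n and k.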
17. Durbin-Watson table
18. Example: the advertising data
- Sales and advertising data
- Data on monthly sales and advertising spend for 35 months
- Model is Sales ~ spend + prev.spend
  - (prev.spend = spend in previous month)
19. Advertising data

    > ad.df
       spend prev.spend sales
    1     16         15  20.5
    2     18         16  21.0
    3     27         18  15.5
    4     21         27  15.3
    5     49         21  23.5
    6     21         49  24.5
    7     22         21  21.3
    8     28         22  23.5
    9     36         28  28.0
    10    40         36  24.0
    11     3         40  15.5
          21          3  17.3
    ...
    (35 lines in all)
20. R code for residual vs previous plot

    advertising.lm<-lm(sales~spend + prev.spend, data = ad.df)
    res<-residuals(advertising.lm)
    n<-length(res)
    plot.res<-res[-1]
    prev.res<-res[-n]
    plot(prev.res, plot.res,
         xlab="previous residual", ylab="residual",
         main="Residual versus previous residual \n for the advertising data")
    # add the least squares line of residual on previous residual
    abline(coef(lm(plot.res~prev.res)), col="red", lwd=2)
22. Time series plot, correlogram R code

    layout(matrix(1:2, nrow=2))   # two plots, one above the other
    plot(res, type="b", xlab="Time Sequence", ylab="Residual",
         main="Time series plot of residuals for the advertising data")
    abline(h=0, lty=2, lwd=2, col="blue")
    acf(res, main="Correlogram of residuals for the advertising data")
23. Increasing trend?
24. Calculating DW

    > rhohat<-cor(plot.res,prev.res)
    > rhohat
    [1] 0.4450734
    > DW<-2*(1-rhohat)
    > DW
    [1] 1.109853

For n = 35 and k = 2, dL = 1.34. Since DW = 1.11 < dL = 1.34, there is strong evidence of positive serial correlation.
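The calculation above uses the approximation DW = 2(1 - rhohat). As a hedged sketch, the code below compares that approximation with the usual sum-of-squared-differences form of the statistic on simulated residuals. The data are simulated stand-ins (not the advertising data), and the seed and AR coefficient are arbitrary.

```r
set.seed(12)   # arbitrary seed

# Simulated stand-in for a residual vector with positive autocorrelation
res <- as.vector(arima.sim(model = list(ar = 0.45), n = 35))
res <- res - mean(res)   # residuals from a model with an intercept average to zero

# Usual definition: squared successive differences over the sum of squares
dw_exact <- sum(diff(res)^2) / sum(res^2)

# Approximation used on the slide
rhohat    <- cor(res[-1], res[-length(res)])
dw_approx <- 2 * (1 - rhohat)

print(c(exact = dw_exact, approx = dw_approx))
```

The two values usually differ a little in small samples, because `cor` centres and scales the two shifted vectors separately and the end terms are handled differently.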
25. Durbin-Watson table
- Use (1.28 + 1.39)/2 ≈ 1.34
26. Remedy (1)
- If we detect serial correlation, we need to fit special time series models to the data.
- For full details see postgrad course 726.
- Assuming that the AR(1) model is OK, we can use the arima function in R to fit the regression.
27. Fitting a regression with AR(1) errors

    > arima(ad.df$sales, order=c(1,0,0),
    +       xreg=cbind(spend,prev.spend))

    Call:
    arima(x = ad.df$sales, order = c(1, 0, 0), xreg = cbind(spend, prev.spend))

    Coefficients:
             ar1  intercept   spend  prev.spend
          0.4966    16.9080  0.1218      0.1391
    s.e.  0.1580     1.6716  0.0308      0.0316

    sigma^2 estimated as 9.476:  log likelihood = -89.16,  aic = 188.32
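For readers without the advertising data, the same kind of fit can be tried on simulated data. This is a sketch under stated assumptions: the predictor `x`, the coefficients 17 and 0.12, the AR coefficient 0.5 and the seed are all made up for illustration.

```r
set.seed(726)   # arbitrary seed
n <- 100
x <- runif(n, 0, 50)                                         # hypothetical predictor
err <- as.vector(arima.sim(model = list(ar = 0.5), n = n))   # AR(1) errors
y <- 17 + 0.12 * x + err                                     # made-up coefficients

ols     <- lm(y ~ x)                                         # ignores the autocorrelation
ar1_fit <- arima(y, order = c(1, 0, 0), xreg = cbind(x = x)) # regression with AR(1) errors

print(coef(ols))
print(coef(ar1_fit))                # ar1, intercept, x
print(sqrt(diag(vcov(ar1_fit))))    # standard errors of the arima estimates
```

Naming the column in `cbind(x = x)` is what gives the regression coefficient a readable label in the output.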
28. Comparisons
29. Remedy (2)
- Recall there was a trend in the time series plot of the residuals: they seem to be related to time.
- Thus, time is a "lurking variable", a variable that should be in the regression but isn't.
- Try the model
  - Sales ~ spend + prev.spend + time
30. Fitting new model

    time=1:35
    new.advertising.lm<-lm(sales~spend + prev.spend + time, data = ad.df)
    res<-residuals(new.advertising.lm)
    n<-length(res)
    plot.res<-res[-1]
    prev.res<-res[-n]
    DW = 2*(1-cor(plot.res,prev.res))
31. DW Retest
- DW is now 1.73.
- For a model with 3 explanatory variables, dU is about 1.66 (refer to the table). Since DW = 1.73 is above dU, there is no evidence of serial correlation.
- Time is a highly significant variable in the regression.
- Problem is fixed!
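The lurking-variable effect can be reproduced on simulated data. In this sketch the data, the coefficients, the seed and the helper name `dw_approx` are all illustrative assumptions; the errors are generated independently, so any apparent serial correlation comes only from leaving time out of the model.

```r
set.seed(31)   # arbitrary seed
n <- 200
time <- 1:n
x <- runif(n, 0, 50)                          # hypothetical predictor
y <- 10 + 0.1 * x + 0.05 * time + rnorm(n)    # independent errors, linear time trend

# Approximate DW statistic, computed as on the slides
dw_approx <- function(fit) {
  res <- residuals(fit)
  2 * (1 - cor(res[-1], res[-length(res)]))
}

dw_without <- dw_approx(lm(y ~ x))            # time omitted
dw_with    <- dw_approx(lm(y ~ x + time))     # time included

print(c(without_time = dw_without, with_time = dw_with))
```

With time omitted the residuals inherit the trend and DW falls well below 2; once time is included DW should move back towards 2.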