STATS 330: Lecture 12
1
STATS 330 Lecture 12
Diagnostics 4
2
Diagnostics 4
  • Aim of today's lecture
  • To discuss diagnostics for independence

3
Independence
  • One of the regression assumptions is that the
    errors are independent.
  • Data collected sequentially over time often
    have errors that are not independent.
  • If the independence assumption does not hold,
    then the standard errors will be wrong and the
    tests and confidence intervals will be
    unreliable.
  • Thus, we need to be able to detect lack of
    independence.

4
Types of dependence
  • If large positive errors have a tendency to
    follow large positive errors, and large negative
    errors a tendency to follow large negative
    errors, we say the data has positive
    autocorrelation
  • If large positive errors have a tendency to
    follow large negative errors, and large negative
    errors a tendency to follow large positive
    errors, we say the data has negative
    autocorrelation

5
Diagnostics
  • If the errors are positively autocorrelated:
  • Plotting the residuals against time will show
    long runs of positive and negative residuals
  • Plotting residuals against the previous residual
    (i.e. e_i vs e_(i-1)) will show a positive trend
  • A correlogram of the residuals will show positive
    spikes, gradually decaying

6
Diagnostics (2)
  • If the errors are negatively autocorrelated:
  • Plotting the residuals against time will show
    alternating positive and negative residuals
  • Plotting residuals against the previous residual
    (i.e. e_i vs e_(i-1)) will show a negative trend
  • A correlogram of the residuals will show
    alternating positive and negative spikes,
    gradually decaying

7
Residuals against time
Can omit the x vector if it is just the sequence numbers

res <- residuals(lm.obj)
plot(1:length(res), res,
     xlab = "time", ylab = "residuals", type = "b")  # type = "b": dots joined by lines
lines(1:length(res), res)
abline(h = 0, lty = 2)                               # dotted line at 0 (the mean residual)
8
(Plot: residuals plotted against time)
9
Residuals against previous
res <- residuals(lm.obj)
n <- length(res)
plot.res <- res[-1]     # element 1 has no previous residual
prev.res <- res[-n]     # drop the last so the two vectors have equal length
plot(prev.res, plot.res,
     xlab = "previous residual", ylab = "residual")

10
Plots for different degrees of autocorrelation
11
Correlogram
acf(residuals(lm.obj))

  • The correlogram (autocorrelation function, acf) is a
    plot of the lag k autocorrelation versus k
  • The lag k autocorrelation is the correlation of
    residuals k time units apart

12
(Plot: correlogram of the residuals)
13
Durbin-Watson test
  • We can also do a formal hypothesis test (the
    Durbin-Watson test) for independence
  • The test assumes the errors follow a model of the
    form

        e_i = ρ e_(i-1) + u_i

NB: the u_i are independent, normal and have
constant variance, and ρ is the lag 1 correlation;
this is the autoregressive model of order 1 (AR(1))
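
A minimal sketch, not from the original slides, showing how errors of
this AR(1) form can be simulated with arima.sim, so the diagnostic plots
of slides 5 to 11 can be tried on data with known autocorrelation; the
value 0.7 for ρ, the predictor x and the coefficients are arbitrary
choices for illustration.

set.seed(123)
n <- 100
x <- rnorm(n)                                   # arbitrary explanatory variable
e <- arima.sim(model = list(ar = 0.7), n = n)   # AR(1) errors: e_i = 0.7 e_(i-1) + u_i
y <- 1 + 2 * x + e                              # arbitrary regression with autocorrelated errors
sim.lm <- lm(y ~ x)
res <- residuals(sim.lm)
plot(res, type = "b", xlab = "time", ylab = "residual")   # long runs of same-sign residuals
abline(h = 0, lty = 2)
plot(res[-n], res[-1], xlab = "previous residual",
     ylab = "residual")                                   # positive trend
acf(res)                                                  # positive spikes, gradually decaying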
14
Durbin-Watson test (2)
  • When ρ = 0, the errors are independent
  • The DW test tests independence by testing ρ = 0
  • ρ is estimated by the lag 1 sample correlation of
    the residuals, i.e. the correlation between each
    residual and the previous one (computed as
    cor(plot.res, prev.res) on slide 24)
15
Durbin-Watson test (3)
  • The DW test statistic is

        DW = Σ(e_i - e_(i-1))² / Σ e_i²,

    summing over i = 2,...,n in the numerator and over
    i = 1,...,n in the denominator; this is approximately
    2(1 - ρ̂), where ρ̂ is the estimated lag 1 correlation
    (a direct calculation in R is sketched below)
  • The value of DW is between 0 and 4
  • Values of DW around 2 are consistent with
    independence
  • Values close to 4 indicate negative serial
    correlation
  • Values close to 0 indicate positive serial
    correlation
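
A minimal sketch, not on the original slides, of computing DW directly
from the residuals; lm.obj stands for any fitted model, as on slide 7.
The dwtest function in the lmtest package computes the same statistic
together with a p-value.

e <- residuals(lm.obj)            # lm.obj: any fitted lm, as on slide 7
DW <- sum(diff(e)^2) / sum(e^2)   # squared successive differences over the sum of squares
DW                                # roughly 2 * (1 - lag 1 autocorrelation of the residuals)

# alternatively, with the lmtest package:
# library(lmtest)
# dwtest(lm.obj)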

16
Durbin-Watson test (4)
  • There exist values dL, dU, depending on the number
    of variables k in the regression and the sample
    size n (see the table on the next slide)
  • Use the value of DW to decide on independence as
    follows (a helper function is sketched below):

    0 to dL            positive autocorrelation
    dL to dU           inconclusive
    dU to 4 - dU       independence
    4 - dU to 4 - dL   inconclusive
    4 - dL to 4        negative autocorrelation
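
The decision rule can be wrapped in a small helper function; this is a
sketch, not part of the original slides, and dw_decision is a hypothetical
name. dL and dU must still be read off the table for the given n and k;
only dL = 1.34 in the example call comes from slide 24, the dU value is an
assumed illustrative figure.

dw_decision <- function(DW, dL, dU) {
  if (DW < dL)           "positive autocorrelation"
  else if (DW < dU)      "inconclusive"
  else if (DW <= 4 - dU) "independence"
  else if (DW <= 4 - dL) "inconclusive"
  else                   "negative autocorrelation"
}
dw_decision(1.109853, dL = 1.34, dU = 1.58)   # advertising data; dU = 1.58 assumed
                                              # returns "positive autocorrelation"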
17
Durbin-Watson table
18
Example: the advertising data
  • Sales and advertising data
  • Data on monthly sales and advertising spend for
    35 months
  • Model is sales ~ spend + prev.spend
  • (prev.spend = spend in the previous month)

19
Advertising data
> ad.df
   spend prev.spend sales
1     16         15  20.5
2     18         16  21.0
3     27         18  15.5
4     21         27  15.3
5     49         21  23.5
6     21         49  24.5
7     22         21  21.3
8     28         22  23.5
9     36         28  28.0
10    40         36  24.0
11     3         40  15.5
12    21          3  17.3
...
(35 lines in all)

20
R code for residual vs previous plot
advertising.lm <- lm(sales ~ spend + prev.spend, data = ad.df)
res <- residuals(advertising.lm)
n <- length(res)
plot.res <- res[-1]
prev.res <- res[-n]
plot(prev.res, plot.res,
     xlab = "previous residual", ylab = "residual",
     main = "Residual versus previous residual \n for the advertising data")
abline(coef(lm(plot.res ~ prev.res)), col = "red", lwd = 2)
21
(Plot: residual versus previous residual for the advertising data)
22
Time series plot and correlogram: R code
layout(matrix(1:2, 2, 1))   # two plots, one above the other
plot(res, type = "b", xlab = "Time Sequence", ylab = "Residual",
     main = "Time series plot of residuals for the advertising data")
abline(h = 0, lty = 2, lwd = 2, col = "blue")
acf(res, main = "Correlogram of residuals for the advertising data")
23
Increasing trend?
24
Calculating DW
> rhohat <- cor(plot.res, prev.res)
> rhohat
[1] 0.4450734
> DW <- 2*(1 - rhohat)
> DW
[1] 1.109853

For n = 35 and k = 2, dL = 1.34. Since DW = 1.109 < dL = 1.34,
there is strong evidence of positive serial correlation.
25
Durbin-Watson table
interpolate: use (1.28 + 1.39)/2 ≈ 1.34
26
Remedy (1)
  • If we detect serial correlation, we need to fit
    special time series models to the data.
  • For full details see postgrad course 726.
  • Assuming that the AR(1) model is OK, we can use
    the arima function in R to fit the regression

27
Fitting a regression with AR(1) errors
> arima(ad.df$sales, order = c(1,0,0),
        xreg = cbind(spend, prev.spend))

Call:
arima(x = ad.df$sales, order = c(1, 0, 0),
      xreg = cbind(spend, prev.spend))

Coefficients:
         ar1  intercept   spend  prev.spend
      0.4966    16.9080  0.1218      0.1391
s.e.  0.1580     1.6716  0.0308      0.0316

sigma^2 estimated as 9.476:  log likelihood = -89.16,  aic = 188.32

28
Comparisons
29
Remedy (2)
  • Recall that there was a trend in the time series
    plot of the residuals; the residuals seem related
    to time
  • Thus, time is a "lurking variable", a variable
    that should be in the regression but isn't
  • Try the model
    sales ~ spend + prev.spend + time

30
Fitting new model
time <- 1:35
new.advertising.lm <- lm(sales ~ spend + prev.spend + time, data = ad.df)
res <- residuals(new.advertising.lm)
n <- length(res)
plot.res <- res[-1]
prev.res <- res[-n]
DW <- 2*(1 - cor(plot.res, prev.res))
31
DW Retest
  • DW is now 1.73
  • For a model with 3 explanatory variables, dU is
    about 1.66 (refer to the table), so there is no
    evidence of serial correlation
  • Time is a highly significant variable in the
    regression (a quick check is sketched below)
  • Problem is fixed!
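
A quick check, sketched here rather than shown on the slides, assuming
new.advertising.lm from slide 30 is still in the workspace.

summary(new.advertising.lm)         # coefficient table: t value and p-value for time

e <- residuals(new.advertising.lm)
sum(diff(e)^2) / sum(e^2)           # exact DW statistic; compare with the 2*(1 - rhohat)
                                    # value of 1.73 from slide 30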