STATS 330: Lecture 12
1
STATS 330 Lecture 12
Diagnostics 4
2
Diagnostics 4
  • Aim of today's lecture
  • To discuss diagnostics for independence

3
Independence
  • One of the regression assumptions is that the
    errors are independent.
  • Data collected sequentially over time often
    have errors that are not independent.
  • If the independence assumption does not hold,
    then the standard errors will be wrong and the
    tests and confidence intervals will be
    unreliable.
  • Thus, we need to be able to detect lack of
    independence.

4
Types of dependence
  • If large positive errors have a tendency to
    follow large positive errors, and large negative
    errors a tendency to follow large negative
    errors, we say the data has positive
    autocorrelation
  • If large positive errors have a tendency to
    follow large negative errors, and large negative
    errors a tendency to follow large positive
    errors, we say the data has negative
    autocorrelation

5
Diagnostics
  • If the errors are positively autocorrelated:
  • Plotting the residuals against time will show
    long runs of positive and negative residuals
  • Plotting residuals against the previous residual
    (i.e. e_i vs e_(i-1)) will show a positive trend
  • A correlogram of the residuals will show positive
    spikes, gradually decaying

6
Diagnostics (2)
  • If the errors are negatively autocorrelated:
  • Plotting the residuals against time will show
    alternating positive and negative residuals
  • Plotting residuals against the previous residual
    (i.e. e_i vs e_(i-1)) will show a negative trend
  • A correlogram of the residuals will show
    alternating positive and negative spikes,
    gradually decaying

7
Residuals against time
Can omit the x vector if it is just the sequence numbers

res <- residuals(lm.obj)
plot(1:length(res), res,
     xlab = "time", ylab = "residuals", type = "b")  # type = "b": dots joined by lines
lines(1:length(res), res)
abline(h = 0, lty = 2)                               # dotted line at 0 (the mean residual)
8
(Plot: residuals plotted against time)
9
Residuals against previous
res <- residuals(lm.obj)
n <- length(res)
plot.res <- res[-1]     # element 1 has no previous residual
prev.res <- res[-n]     # drop the last so the two vectors have equal length
plot(prev.res, plot.res,
     xlab = "previous residual", ylab = "residual")

10
Plots for different degrees of autocorrelation
11
Correlogram
acf(residuals(lm.obj))

  • The correlogram (autocorrelation function, acf) is a
    plot of the lag k autocorrelation versus k
  • The lag k autocorrelation is the correlation of
    residuals k time units apart

12
(Plot: correlogram of the residuals)
13
Durbin-Watson test
  • We can also do a formal hypothesis test (the
    Durbin-Watson test) for independence
  • The test assumes the errors follow a model of the
    form

        e_i = ρ e_(i-1) + u_i

NB: the u_i are independent, normal and have
constant variance, and ρ is the lag 1 correlation;
this is the autoregressive model of order 1 (AR(1))
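
A minimal sketch, not from the original slides, showing how errors of
this AR(1) form can be simulated with arima.sim, so the diagnostic plots
of slides 5 to 11 can be tried on data with known autocorrelation; the
value 0.7 for ρ, the predictor x and the coefficients are arbitrary
choices for illustration.

set.seed(123)
n <- 100
x <- rnorm(n)                                   # arbitrary explanatory variable
e <- arima.sim(model = list(ar = 0.7), n = n)   # AR(1) errors: e_i = 0.7 e_(i-1) + u_i
y <- 1 + 2 * x + e                              # arbitrary regression with autocorrelated errors
sim.lm <- lm(y ~ x)
res <- residuals(sim.lm)
plot(res, type = "b", xlab = "time", ylab = "residual")   # long runs of same-sign residuals
abline(h = 0, lty = 2)
plot(res[-n], res[-1], xlab = "previous residual",
     ylab = "residual")                                   # positive trend
acf(res)                                                  # positive spikes, gradually decaying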
14
Durbin-Watson test (2)
  • When ρ = 0, the errors are independent
  • The DW test tests independence by testing ρ = 0
  • ρ is estimated by the lag 1 sample correlation of
    the residuals, i.e. the correlation between each
    residual and the previous one (computed as
    cor(plot.res, prev.res) on slide 24)
15
Durbin-Watson test (3)
  • The DW test statistic is

        DW = Σ(e_i - e_(i-1))² / Σ e_i²,

    summing over i = 2,...,n in the numerator and over
    i = 1,...,n in the denominator; this is approximately
    2(1 - ρ̂), where ρ̂ is the estimated lag 1 correlation
    (a direct calculation in R is sketched below)
  • The value of DW is between 0 and 4
  • Values of DW around 2 are consistent with
    independence
  • Values close to 4 indicate negative serial
    correlation
  • Values close to 0 indicate positive serial
    correlation
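
A minimal sketch, not on the original slides, of computing DW directly
from the residuals; lm.obj stands for any fitted model, as on slide 7.
The dwtest function in the lmtest package computes the same statistic
together with a p-value.

e <- residuals(lm.obj)            # lm.obj: any fitted lm, as on slide 7
DW <- sum(diff(e)^2) / sum(e^2)   # squared successive differences over the sum of squares
DW                                # roughly 2 * (1 - lag 1 autocorrelation of the residuals)

# alternatively, with the lmtest package:
# library(lmtest)
# dwtest(lm.obj)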

16
Durbin-Watson test (4)
  • There exist values dL, dU, depending on the number
    of variables k in the regression and the sample
    size n (see the table on the next slide)
  • Use the value of DW to decide on independence as
    follows (a helper function is sketched below):

    0 to dL            positive autocorrelation
    dL to dU           inconclusive
    dU to 4 - dU       independence
    4 - dU to 4 - dL   inconclusive
    4 - dL to 4        negative autocorrelation
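
The decision rule can be wrapped in a small helper function; this is a
sketch, not part of the original slides, and dw_decision is a hypothetical
name. dL and dU must still be read off the table for the given n and k;
only dL = 1.34 in the example call comes from slide 24, the dU value is an
assumed illustrative figure.

dw_decision <- function(DW, dL, dU) {
  if (DW < dL)           "positive autocorrelation"
  else if (DW < dU)      "inconclusive"
  else if (DW <= 4 - dU) "independence"
  else if (DW <= 4 - dL) "inconclusive"
  else                   "negative autocorrelation"
}
dw_decision(1.109853, dL = 1.34, dU = 1.58)   # advertising data; dU = 1.58 assumed
                                              # returns "positive autocorrelation"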
17
Durbin-Watson table
18
Example: the advertising data
  • Sales and advertising data
  • Data on monthly sales and advertising spend for
    35 months
  • Model is sales ~ spend + prev.spend
  • (prev.spend = spend in the previous month)

19
Advertising data
> ad.df
   spend prev.spend sales
1     16         15  20.5
2     18         16  21.0
3     27         18  15.5
4     21         27  15.3
5     49         21  23.5
6     21         49  24.5
7     22         21  21.3
8     28         22  23.5
9     36         28  28.0
10    40         36  24.0
11     3         40  15.5
12    21          3  17.3
...
(35 lines in all)

20
R code for residual vs previous plot
advertising.lm <- lm(sales ~ spend + prev.spend, data = ad.df)
res <- residuals(advertising.lm)
n <- length(res)
plot.res <- res[-1]
prev.res <- res[-n]
plot(prev.res, plot.res,
     xlab = "previous residual", ylab = "residual",
     main = "Residual versus previous residual \n for the advertising data")
abline(coef(lm(plot.res ~ prev.res)), col = "red", lwd = 2)
21
(Plot: residual versus previous residual for the advertising data)
22
Time series plot and correlogram: R code
layout(matrix(1:2, 2, 1))   # two plots, one above the other
plot(res, type = "b", xlab = "Time Sequence", ylab = "Residual",
     main = "Time series plot of residuals for the advertising data")
abline(h = 0, lty = 2, lwd = 2, col = "blue")
acf(res, main = "Correlogram of residuals for the advertising data")
23
Increasing trend?
24
Calculating DW
> rhohat <- cor(plot.res, prev.res)
> rhohat
[1] 0.4450734
> DW <- 2*(1 - rhohat)
> DW
[1] 1.109853

For n = 35 and k = 2, dL = 1.34. Since DW = 1.109 < dL = 1.34,
there is strong evidence of positive serial correlation.
25
Durbin-Watson table
interpolate: use (1.28 + 1.39)/2 ≈ 1.34
26
Remedy (1)
  • If we detect serial correlation, we need to fit
    special time series models to the data.
  • For full details see postgrad course 726.
  • Assuming that the AR(1) model is OK, we can use
    the arima function in R to fit the regression

27
Fitting a regression with AR(1) errors
> arima(ad.df$sales, order = c(1,0,0),
        xreg = cbind(spend, prev.spend))

Call:
arima(x = ad.df$sales, order = c(1, 0, 0),
      xreg = cbind(spend, prev.spend))

Coefficients:
         ar1  intercept   spend  prev.spend
      0.4966    16.9080  0.1218      0.1391
s.e.  0.1580     1.6716  0.0308      0.0316

sigma^2 estimated as 9.476:  log likelihood = -89.16,  aic = 188.32

28
Comparisons
29
Remedy (2)
  • Recall that there was a trend in the time series
    plot of the residuals; the residuals seem related
    to time
  • Thus, time is a "lurking variable", a variable
    that should be in the regression but isn't
  • Try the model
    sales ~ spend + prev.spend + time

30
Fitting new model
time <- 1:35
new.advertising.lm <- lm(sales ~ spend + prev.spend + time, data = ad.df)
res <- residuals(new.advertising.lm)
n <- length(res)
plot.res <- res[-1]
prev.res <- res[-n]
DW <- 2*(1 - cor(plot.res, prev.res))
31
DW Retest
  • DW is now 1.73
  • For a model with 3 explanatory variables, dU is
    about 1.66 (refer to the table), so there is no
    evidence of serial correlation
  • Time is a highly significant variable in the
    regression (a quick check is sketched below)
  • Problem is fixed!
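
A quick check, sketched here rather than shown on the slides, assuming
new.advertising.lm from slide 30 is still in the workspace.

summary(new.advertising.lm)         # coefficient table: t value and p-value for time

e <- residuals(new.advertising.lm)
sum(diff(e)^2) / sum(e^2)           # exact DW statistic; compare with the 2*(1 - rhohat)
                                    # value of 1.73 from slide 30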