Title: Autocorrelation
Chapter 6
What is in this Chapter?
- How do we detect this problem?
- What are the consequences?
- What are the solutions?
- Regarding the problem of detection, we start with the Durbin-Watson (DW) statistic and discuss its several limitations and extensions:
- Durbin's h-test for models with lagged dependent variables.
- Tests for higher-order serial correlation.
- We discuss (in Section 6.5) the consequences of serially correlated errors for the OLS estimators.
- The solutions to the problem of serial correlation are discussed in:
- Section 6.3: estimation in levels versus first differences.
- Section 6.9: strategies when the DW test statistic is significant.
- Section 6.10: trends and random walks.
- This chapter is very important, and its several ideas have to be understood thoroughly.
6.1 Introduction
- The order of autocorrelation: first-order autoregressive errors are u_t = ρ u_{t-1} + e_t   (6.1)
- In the following sections we discuss how to:
- 1. Test for the presence of serial correlation.
- 2. Estimate the regression equation when the errors are serially correlated.
6.2 Durbin-Watson Test
- The DW statistic is d = Σ(û_t − û_{t-1})² / Σ û_t², computed from the OLS residuals û_t; approximately, d ≈ 2(1 − ρ̂).
- The sampling distribution of d depends on the values of the explanatory variables, and hence Durbin and Watson derived upper limits (d_U) and lower limits (d_L) for the significance levels of d.
- There are tables to test the hypothesis of zero autocorrelation against the hypothesis of first-order positive autocorrelation. (For negative autocorrelation the limits are interchanged.)
- If d < d_L, we reject the null hypothesis of no autocorrelation.
- If d > d_U, we do not reject the null hypothesis.
- If d_L ≤ d ≤ d_U, the test is inconclusive.
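The DW statistic above is straightforward to compute directly. Below is a minimal sketch: the function name, seed, and simulated series are illustrative, not from the text; the check uses the rough relation d ≈ 2(1 − ρ̂).

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    of the residuals divided by their sum of squares."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Illustrative check: white-noise residuals should give d near 2;
# positively autocorrelated residuals give d well below 2.
rng = np.random.default_rng(0)
white = rng.standard_normal(500)
ar1 = np.empty(500)
ar1[0] = rng.standard_normal()
for t in range(1, 500):            # u_t = 0.8 u_{t-1} + e_t
    ar1[t] = 0.8 * ar1[t - 1] + rng.standard_normal()

print(round(durbin_watson(white), 2))  # close to 2
print(round(durbin_watson(ar1), 2))    # well below 2, near 2(1 - 0.8)
```

Remember that the critical values d_L and d_U must still be taken from the DW tables for the given n and k.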
- Although we have said that d ≈ 2(1 − ρ̂), this approximation is valid only in large samples.
- The mean of d when ρ = 0 has been shown to be given approximately by (the proof is rather complicated for our purpose) E(d) ≈ 2 + 2(k − 1)/(n − k), where k is the number of regression parameters estimated (including the constant term) and n is the sample size.
- Thus, even for zero serial correlation, the statistic is biased upward from 2.
- If k = 5 and n = 15, the bias is as large as 0.8.
- We illustrate the use of the DW test with an example.
- Illustrative Example
- Consider the data in Table 3.11 and the estimated production function, which gave a DW statistic of d = 0.86.
- Referring to the DW table with k = 2 and n = 39 at the 5% significance level, we see that the observed d lies below d_L. Since d < d_L, we reject the hypothesis of zero autocorrelation at the 5% level.
6.2 Limitations of the DW Test
- It tests only for first-order serial correlation.
- The test is inconclusive if the computed value lies between d_L and d_U.
- The test cannot be applied in models with lagged dependent variables.
6.3 Estimation in Levels Versus First Differences
- Simple solutions to the serial correlation problem: differencing.
- If the DW test rejects the hypothesis of zero serial correlation, what is the next step?
- In such cases one estimates a regression by transforming all the variables by:
- ρ-differencing (quasi-first differencing): y_t − ρ y_{t-1}
- First-differencing: y_t − y_{t-1}
- When comparing equations in levels and first differences, one cannot compare the R² values because the explained variables are different.
- One can compare the residual sums of squares, but only after making a rough adjustment (see p. 231).
- For instance, if the residual sum of squares is, say, 1.2 from the levels equation and 0.8 from the difference equation, and n = 11, k = 1, DW = 0.9, then the adjusted residual sum of squares for the levels equation is (9/10)(0.9)(1.2) ≈ 0.97, which is the number to be compared with 0.8.
- Since we have comparable residual sums of squares (RSS), we can get comparable R² values as well, using the relationship RSS = S_yy(1 − R²).
- Let
- S_yy = total sum of squares of the dependent variable in the first-difference equation
- RSS₁ = residual sum of squares from the levels equation
- RSS₂ = residual sum of squares from the first-difference equation
- R_D² = comparable R² from the levels equation
- Then the comparable measures follow from the relationship RSS = S_yy(1 − R²).
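The adjustment and the RSS-to-R² conversion above can be sketched as follows. The function names are mine, and note one loud assumption: the 9/10 factor in the text's example is read here as (n − 2)/(n − 1), which is a guess at the general rule; the text only gives the single worked number.

```python
def adjusted_levels_rss(rss_levels, n, dw):
    """Rough adjustment so the levels RSS is comparable with the
    first-difference RSS (factor (n-2)/(n-1) is an assumed
    generalization of the text's 9/10 with n = 11)."""
    return ((n - 2) / (n - 1)) * dw * rss_levels

def comparable_r2(rss, s_yy):
    """Recover R^2 from the relationship RSS = S_yy * (1 - R^2)."""
    return 1.0 - rss / s_yy

# Numbers from the text's example: RSS = 1.2 (levels), n = 11, DW = 0.9.
print(round(adjusted_levels_rss(1.2, 11, 0.9), 2))  # 0.97
print(comparable_r2(0.5, 2.0))                      # 0.75
```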
- Illustrative Examples
- Consider the simple Keynesian model discussed by Friedman and Meiselman. The equation estimated in levels is C_t = α + β A_t + u_t, where
- C_t = personal consumption expenditure (current dollars)
- A_t = autonomous expenditure (current dollars)
- The model fitted for the 1929-1939 period gave (figures in parentheses are standard errors):
- This is to be compared with the corresponding figures from the equation in first differences.
- For the production function data in Table 3.11 the first-difference equation was also estimated.
- The comparable figures for the levels equation, reported earlier in Chapter 4, equation (4.24), are:
- This is to be compared with the corresponding figures from the equation in first differences.
- Harvey gives a different definition of the comparable R². He defines it as follows.
- This does not adjust for the fact that the error variances in the levels equation and the first-difference equation are not the same.
- The arguments for his suggestion are given in his paper.
- In the example with the Friedman-Meiselman data his measure of R² can be computed as well.
- Although R² cannot be greater than 1, this measure can be negative.
- This would be the case when the level model gives a poorer explanation than the naive model, which says that the change in the dependent variable is a constant.
- Usually, with time-series data, one gets high R² values if the regressions are estimated in the levels y_t and x_t, but low R² values if the regressions are estimated in the first differences (y_t − y_{t-1}) and (x_t − x_{t-1}).
- Since a high R² is usually considered proof of a strong relationship between the variables under investigation, there is a strong tendency to estimate the equations in levels rather than in first differences.
- This is sometimes called the "R² syndrome."
- An example:
- However, if the DW statistic is very low, it often implies a misspecified equation, no matter what the value of the R² is.
- In such cases one should estimate the regression equation in first differences; if the R² is then low, this merely indicates that the variables y and x are not related to each other.
- Granger and Newbold present some examples with artificially generated data where y, x, and the error u are each generated independently, so that there is no relationship between y and x.
- But the correlations between y_t and y_{t-1}, x_t and x_{t-1}, and u_t and u_{t-1} are very high.
- Although there is no relationship between y and x, the regression of y on x gives a high R² but a low DW statistic.
- When the regression is run in first differences, the R² is close to zero and the DW statistic is close to 2, thus demonstrating that there is indeed no relationship between y and x and that the R² obtained earlier is spurious.
- Thus regressions in first differences might often reveal the true nature of the relationship between y and x.
- An example:
Homework
- Find the data:
- Y is the Taiwan stock index.
- X is the U.S. stock index.
- Run two equations:
- The equation in levels (log-based prices).
- The equation in first differences.
- Compare the two equations on:
- The beta estimate and its significance.
- The R².
- The value of the DW statistic.
- Q: Should we adopt the equation in levels or in first differences?
- For instance, suppose that we have quarterly data; then it is possible that the errors in any quarter this year are most highly correlated with the errors in the corresponding quarter last year, rather than with the errors in the preceding quarter.
- That is, u_t could be uncorrelated with u_{t-1} but highly correlated with u_{t-4}.
- If this is the case, the DW statistic will fail to detect it.
- What we should be using is a modified statistic defined with the appropriate lag, e.g., d₄ = Σ(û_t − û_{t-4})² / Σ û_t².
- Quarterly data (e.g., GDP): lag 4.
- Monthly data (e.g., an industrial production index): lag 12.
6.4 Estimation Procedures with Autocorrelated Errors
- Now we will derive var(u_t) and the correlations between u_t and lagged values of u_t.
- From equation (6.1) note that u_t depends on e_t and u_{t-1}, u_{t-1} depends on e_{t-1} and u_{t-2}, and so on.
- Thus u_t depends on e_t, e_{t-1}, e_{t-2}, and so on. Since the e_t are serially independent, and u_{t-1} depends on e_{t-1}, e_{t-2}, and so on, but not on e_t, we have cov(u_{t-1}, e_t) = 0.
- Since E(e_t) = 0, we have E(u_t) = 0 for all t.
- If we denote var(u_t) by σ_u², we have σ_u² = ρ²σ_u² + σ_e².
- Thus σ_u² = σ_e² / (1 − ρ²).
- This gives the variance of u_t in terms of the variance of e_t and the parameter ρ.
- Let us now derive the correlations. Denote the correlation between u_t and u_{t-s} (which is called the correlation of lag s) by ρ_s.
- For lag 1, cov(u_t, u_{t-1}) = E[(ρu_{t-1} + e_t)u_{t-1}] = ρσ_u².
- Hence ρ₁ = cov(u_t, u_{t-1})/σ_u² = ρ.
- Since cov(u_t, u_{t-s}) = ρ cov(u_{t-1}, u_{t-s}), we get by successive substitution ρ_s = ρ^s.
- Thus the lag correlations are all powers of ρ and decline geometrically.
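The geometric decay ρ_s = ρ^s is easy to verify by simulation; the sketch below (seed, ρ = 0.7, and function name are illustrative) compares sample lag-s autocorrelations of a simulated AR(1) series with the theoretical powers of ρ.

```python
import numpy as np

# Simulate u_t = rho * u_{t-1} + e_t, starting from the stationary
# distribution, and compare sample lag-s correlations with rho**s.
rng = np.random.default_rng(7)
rho, n = 0.7, 100_000
u = np.empty(n)
u[0] = rng.standard_normal() / np.sqrt(1 - rho**2)
for t in range(1, n):
    u[t] = rho * u[t - 1] + rng.standard_normal()

def lag_corr(x, s):
    """Sample correlation between x_t and x_{t-s}."""
    return np.corrcoef(x[s:], x[:-s])[0, 1]

for s in (1, 2, 3):
    print(s, round(lag_corr(u, s), 3), round(rho**s, 3))  # close pairs
```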
- GLS (generalized least squares)
- In actual practice ρ is not known.
- There are two types of procedures for estimating ρ:
- 1. Iterative procedures.
- 2. Grid-search procedures.
- Iterative Procedures
- Among the iterative procedures, the earliest was the Cochrane-Orcutt (C-O) procedure.
- In the Cochrane-Orcutt procedure we estimate equation (6.2) by OLS, get the estimated residuals û_t, and estimate ρ̂ = Σ û_t û_{t-1} / Σ û_{t-1}².
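A minimal sketch of the iterative C-O procedure as described: run OLS, estimate ρ from the residuals, quasi-difference, re-estimate, and repeat until ρ settles. Function names, the seed, and the data-generating values (α = 1, β = 2, ρ = 0.6) are all illustrative assumptions.

```python
import numpy as np

def cochrane_orcutt(y, x, tol=1e-8, max_iter=100):
    """Iterative Cochrane-Orcutt sketch for y_t = a + b*x_t + u_t,
    u_t = rho*u_{t-1} + e_t."""
    def ols(yv, Xv):
        b, *_ = np.linalg.lstsq(Xv, yv, rcond=None)
        return b
    X = np.column_stack([np.ones_like(x), x])
    a, b = ols(y, X)
    rho = 0.0
    for _ in range(max_iter):
        u = y - a - b * x
        rho_new = (u[1:] @ u[:-1]) / (u[:-1] @ u[:-1])
        # Quasi-differenced regression: y* = y_t - rho*y_{t-1}, same for x.
        ys = y[1:] - rho_new * y[:-1]
        xs = x[1:] - rho_new * x[:-1]
        Xs = np.column_stack([np.ones_like(xs), xs])
        c, b = ols(ys, Xs)
        a = c / (1 - rho_new)   # intercept of the y* regression is a(1 - rho)
        if abs(rho_new - rho) < tol:
            break
        rho = rho_new
    return a, b, rho

rng = np.random.default_rng(3)
n = 2000
x = rng.standard_normal(n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.6 * u[t - 1] + 0.5 * rng.standard_normal()
y = 1.0 + 2.0 * x + u
a, b, rho = cochrane_orcutt(y, x)
print(round(a, 2), round(b, 2), round(rho, 2))  # near 1, 2, 0.6
```

Note the constant-term adjustment: the quasi-differenced regression estimates α(1 − ρ), so the intercept must be rescaled, exactly as the text discusses below.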
- Durbin suggested an alternative method of estimating ρ.
- In this procedure, we write equation (6.5) as y_t = α(1 − ρ) + ρ y_{t-1} + β x_t − ρβ x_{t-1} + e_t.
- We regress y_t on y_{t-1}, x_t, and x_{t-1}, and take the estimated coefficient of y_{t-1} as an estimate of ρ.
- Use equation (6.6) and estimate a regression of y* on x*.
- The only thing to note is that the slope coefficient in this equation is β, but the intercept is α(1 − ρ).
- Thus after estimating the regression of y* on x*, we have to adjust the constant term appropriately to get estimates of the parameters of the original equation (6.2).
- Further, the standard errors we compute from the regression of y* on x* are now asymptotic standard errors, because ρ has been estimated.
- If there are lagged values of y as explanatory variables, these standard errors are not correct even asymptotically.
- The adjustment needed in this case is discussed in Section 6.7.
- Grid-Search Procedures
- One of the first grid-search procedures is the Hildreth-Lu procedure, suggested in 1960.
- The procedure is as follows. Calculate y* and x* in equation (6.6) for different values of ρ at intervals of 0.1 in the range −1 < ρ < 1.
- Estimate the regression of y* on x* and calculate the residual sum of squares RSS in each case.
- Choose the value of ρ for which the RSS is minimum.
- Again repeat this procedure for smaller intervals of ρ around this value.
- For instance, if the value of ρ for which RSS is minimum is −0.4, repeat this search procedure for values of ρ at intervals of 0.01 in the range −0.5 < ρ < −0.3.
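The two-stage grid search can be sketched as below; the function name, seed, and data-generating values (ρ = 0.5) are illustrative.

```python
import numpy as np

def hildreth_lu(y, x, grid=None):
    """Hildreth-Lu sketch: search over rho, minimizing the RSS of the
    quasi-differenced regression of (y_t - rho*y_{t-1}) on
    (x_t - rho*x_{t-1}) with an intercept."""
    if grid is None:
        grid = np.arange(-0.9, 0.95, 0.1)   # coarse grid, step 0.1
    best = None
    for rho in grid:
        ys = y[1:] - rho * y[:-1]
        xs = x[1:] - rho * x[:-1]
        X = np.column_stack([np.ones_like(xs), xs])
        b, *_ = np.linalg.lstsq(X, ys, rcond=None)
        rss = np.sum((ys - X @ b) ** 2)
        if best is None or rss < best[1]:
            best = (rho, rss)
    return best  # (rho with minimum RSS, that RSS)

rng = np.random.default_rng(5)
n = 1000
x = rng.standard_normal(n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.5 * u[t - 1] + rng.standard_normal()
y = 1.0 + 2.0 * x + u

rho0, _ = hildreth_lu(y, x)                                   # coarse search
rho1, _ = hildreth_lu(y, x, np.arange(rho0 - 0.1, rho0 + 0.1, 0.01))  # refine
print(round(rho0, 1), round(rho1, 2))  # both near 0.5
```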
- This procedure is not the same as the ML procedure. If the errors e_t are normally distributed, we can write the log-likelihood function log L (the derivation is omitted), where Q is the residual sum of squares from the transformed equation.
- Thus minimizing Q is not the same as maximizing log L.
- We can use the grid-search procedure to get the ML estimate.
- Consider the data in Table 3.11 and the estimation of the production function.
- The OLS estimation gave a DW statistic of 0.86, suggesting significant positive autocorrelation.
- Assuming that the errors were AR(1), two estimation procedures were used: the Hildreth-Lu grid search and the iterative Cochrane-Orcutt (C-O).
- The other procedures we have described can also be tried, but this is left as an exercise.
- The Hildreth-Lu procedure and the iterative C-O procedure each gave an estimate of ρ.
- The DW test statistic implied that ρ̂ ≈ 1 − d/2 = 0.57.
- The estimates of the parameters (with standard errors in parentheses) were as follows:
- In this example the parameter estimates given by the Hildreth-Lu and the iterative C-O procedures are pretty close to each other.
- Correcting for the autocorrelation in the errors has resulted in a significant change in the parameter estimates.
6.5 Effect of AR(1) Errors on OLS Estimates
- In Section 6.4 we described different procedures for the estimation of regression models with AR(1) errors.
- We will now answer two questions that might arise with the use of these procedures:
- 1. What do we gain from using these procedures?
- 2. When should we not use these procedures?
- First, in the case we are considering (i.e., the case where the explanatory variable x_t is independent of the error u_t), the OLS estimates are unbiased.
- However, they will not be efficient.
- Further, the tests of significance we apply, which will be based on the wrong covariance matrix, will be wrong.
- In the case where the explanatory variables include lagged dependent variables, we will have some further problems, which we discuss in Section 6.7.
- For the present, let us consider the simple regression model y_t = βx_t + u_t.
- Let b denote the OLS estimator of β, with variance var(b) as given in equation (6.10).
- If the u_t are AR(1), we have u_t = ρu_{t-1} + e_t.
- If we ignore the autocorrelation problem, we would be computing var(b) = σ²/Σx_t². Thus we would be ignoring the expression in the parentheses of equation (6.10).
- To get an idea of the magnitude of this expression, let us assume that the x_t series also follows an AR(1) process, with serial correlation r.
- Since we are now assuming x_t to be stochastic, we will consider the asymptotic variance of b.
- The expression in parentheses in equation (6.10) is now (1 + ρr)/(1 − ρr).
- Thus asy. var(b) ≈ (σ_u²/(T σ_x²)) · (1 + ρr)/(1 − ρr), where T is the number of observations.
- If ρ = r = 0.8, then (1 + ρr)/(1 − ρr) = 1.64/0.36 ≈ 4.56.
- Thus ignoring the expression in the parentheses of equation (6.10) results in an underestimation by close to 78% for the variance of b.
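The arithmetic behind the 78% figure is worth making explicit; a two-line sketch (the function name is mine, the symbols follow the text):

```python
def inflation(rho, r):
    """Variance inflation factor (1 + rho*r)/(1 - rho*r) for OLS when
    both the errors and the regressor are AR(1)."""
    return (1 + rho * r) / (1 - rho * r)

rho = r = 0.8
f = inflation(rho, r)
underestimation = 1 - 1 / f      # share of the true variance that is missed
print(round(f, 2), round(underestimation, 2))  # 4.56 0.78
```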
- If ρ = 0, this is an unbiased estimate.
- If ρ ≠ 0, then under the assumptions we are making, we have an approximate expression for its expectation.
- Again if ρ = r = 0.8, the understatement is substantial.
- We can also derive the asymptotic variance of the ML estimator of β when both x and u are first-order autoregressive, as follows. Note that the ML estimator of β is asymptotically equivalent to the estimator obtained from a regression of (y_t − ρy_{t-1}) on (x_t − ρx_{t-1}).
- When x_t is autoregressive we have an analogous expression for the moments of the transformed regressor.
- Hence by substitution we get the asymptotic variance of the ML estimator of β.
- Thus the efficiency of the OLS estimator is the ratio of the asymptotic variance of the ML estimator to that of the OLS estimator.
- One can compute this for different values of ρ and r.
- For ρ = r = 0.8 this efficiency is 0.21.
- Thus the consequences of autocorrelated errors are:
- 1. The least squares estimators are unbiased but are not efficient. Sometimes they are considerably less efficient than procedures that take account of the autocorrelation.
- 2. The sampling variances are biased and sometimes likely to be seriously understated. Thus R² as well as the t and F statistics tend to be exaggerated.
- The solution to these problems is to use the maximum likelihood procedure (a one-step procedure) or some other procedure mentioned earlier (a two-step procedure) that takes account of the autocorrelation.
- However, there are four important points to note:
- 1. If ρ is known, it is true that one can get estimators better than OLS that take account of autocorrelation. However, in practice ρ is not known and has to be estimated. In small samples it is not necessarily true that one gains (in terms of mean-square error for β̂) by estimating ρ.
- This problem has been investigated by Rao and Griliches, who suggest the rule of thumb (for samples of size 20) that one can use the methods that take account of autocorrelation if |ρ̂| ≥ 0.3, where ρ̂ is the estimated first-order serial correlation from an OLS regression. In samples of larger sizes it would be worthwhile using these methods for ρ̂ smaller than 0.3.
- 2. The discussion above assumes that the true errors are first-order autoregressive. If they have a more complicated structure (e.g., second-order autoregressive), it might be thought that it would still be better to proceed on the assumption that the errors are first-order autoregressive rather than ignore the problem completely and use the OLS method.
- Engle shows that this is not necessarily true (i.e., sometimes one can be worse off making the assumption of first-order autocorrelation than ignoring the problem completely).
- 3. In regressions with quarterly (or monthly) data, one might find that the errors exhibit fourth-order (or twelfth-order) autocorrelation because of not making adequate allowance for seasonal effects. In such cases, if one looks only for first-order autocorrelation, one might not find any. This does not mean that autocorrelation is not a problem. In this case the appropriate specification for the error term may be u_t = ρu_{t-4} + e_t for quarterly data and u_t = ρu_{t-12} + e_t for monthly data.
- 4. Finally, and most important, it is often possible to confuse misspecified dynamics with serial correlation in the errors. For instance, a static regression model with first-order autocorrelation in the errors, that is, y_t = βx_t + u_t with u_t = ρu_{t-1} + e_t, can be written as y_t = ρy_{t-1} + βx_t − ρβx_{t-1} + e_t.
- The model is the same as
- y_t = β₁y_{t-1} + β₂x_t + β₃x_{t-1} + e_t   (6.11)
- with the restriction β₁β₂ + β₃ = 0.
- We can estimate the model (6.11) and test this restriction. If it is rejected, clearly it is not valid to estimate the serial correlation model. (The test procedure is described in Section 6.8.)
- The errors would be serially correlated, but not because the errors follow a first-order autoregressive process: rather because the terms x_{t-1} and y_{t-1} have been omitted.
- This is what is meant by misspecified dynamics. Thus significant serial correlation in the estimated residuals does not necessarily imply that we should estimate a serial correlation model.
- Some further tests are necessary (like the test of the restriction β₁β₂ + β₃ = 0 in the above-mentioned case).
- In fact, it is always best to start with an equation like (6.11) and test this restriction before applying any test for serial correlation.
6.7 Tests for Serial Correlation in Models with Lagged Dependent Variables
- In previous sections we considered explanatory variables that were uncorrelated with the error term.
- This will not be the case if we have lagged dependent variables among the explanatory variables together with serially correlated errors.
- There are several situations in which we would be considering lagged dependent variables as explanatory variables.
- These could arise through expectations, adjustment lags, and so on.
- Let us consider a simple model: y_t = αy_{t-1} + βx_t + u_t, with u_t = ρu_{t-1} + e_t   (6.12)
- The e_t are independent with mean 0 and variance σ².
- Because u_t depends on u_{t-1}, and y_{t-1} also depends on u_{t-1}, the two variables y_{t-1} and u_t will be correlated.
- An example:
- Durbin's h-Test
- Since the DW test is not applicable in these models, Durbin suggests an alternative test, called the h-test.
- This test uses h = ρ̂ · sqrt(n / (1 − n·V̂(α̂))) as a standard normal variable.
- Here ρ̂ is the estimated first-order serial correlation from the OLS residuals, V̂(α̂) is the estimated variance of the OLS estimate of α, and n is the sample size.
- If n·V̂(α̂) ≥ 1, the test is not applicable. In this case Durbin suggests the following test.
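The h-statistic is a one-liner once ρ̂ and V̂(α̂) are available; in the sketch below the function name, the residual series, and the variance value are illustrative, not the text's numbers.

```python
import numpy as np

def durbin_h(resid, var_alpha_hat, n):
    """Durbin's h-statistic, treated as standard normal under H0.
    var_alpha_hat: estimated variance of the OLS coefficient on y_{t-1}.
    Returns None when n*var_alpha_hat >= 1 (test not applicable)."""
    rho_hat = (resid[1:] @ resid[:-1]) / (resid[:-1] @ resid[:-1])
    if n * var_alpha_hat >= 1:
        return None
    return rho_hat * np.sqrt(n / (1 - n * var_alpha_hat))

# Hypothetical residuals and variance, purely for illustration.
resid = np.array([0.5, -0.2, 0.4, -0.1, 0.3, 0.2, -0.4, 0.1])
print(durbin_h(resid, var_alpha_hat=0.002, n=50) is not None)  # True
print(durbin_h(resid, var_alpha_hat=0.02, n=50))               # None
```

The `None` branch is exactly the case n·V̂(α̂) ≥ 1, where Durbin's alternative test below must be used instead.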
- Durbin's Alternative Test
- From the OLS estimation of equation (6.12), compute the residuals û_t.
- Then regress û_t on û_{t-1}, y_{t-1}, and x_t.
- The test for ρ = 0 is carried out by testing the significance of the coefficient of û_{t-1} in the latter regression.
- An equation of demand for food estimated from 50 observations gave the following results (figures in parentheses are standard errors):
- where q_t = food consumption per capita
- p_t = food price (retail price deflated by the consumer price index)
- y_t = per capita disposable income deflated by the consumer price index
- From these results we can compute Durbin's h-statistic.
- It is significant at the 1% level.
- Thus we reject the hypothesis ρ = 0, even though the DW statistic is close to 2 and the estimate ρ̂ from the OLS residuals is small.
- Let us keep all the numbers the same and just change the standard error of α̂. The following are the results:
- Thus, other things being equal, the precision with which α is estimated has a significant effect on the outcome of the h-test.
- In the case where the h-test cannot be used, we can use the alternative test suggested by Durbin.
- However, the Monte Carlo study by Maddala and Rao suggests that this test does not have good power in those cases where the h-test cannot be used.
- On the other hand, in cases where the h-test can be used, Durbin's second test is almost as powerful.
- It is not often used because it involves more computation.
- However, we will show that Durbin's second test can be generalized to higher-order autoregressions, whereas the h-test cannot.
6.8 A General Test for Higher-Order Serial Correlation: The LM Test
- The h-test we have discussed is, like the Durbin-Watson test, a test for first-order autoregression.
- Breusch and Godfrey discuss some general tests that are easy to apply and are valid for very general hypotheses about the serial correlation in the errors.
- These tests are derived from a general principle called the Lagrange multiplier (LM) principle.
- A discussion of this principle is beyond the scope of this book. For the present we will explain what the test is.
- The test is similar to Durbin's second test that we have discussed.
- Consider the regression model with errors u_t = ρ₁u_{t-1} + … + ρ_p u_{t-p} + e_t   (6.14)
- We are interested in testing H₀: ρ₁ = ρ₂ = … = ρ_p = 0.
- The x's in equation (6.14) can include lagged dependent variables as well.
- The LM test is as follows.
- First, estimate (6.14) by OLS and obtain the least squares residuals û_t.
- Next, estimate the regression of û_t on the x's and û_{t-1}, …, û_{t-p}   (6.16)
- and test whether the coefficients of û_{t-1}, …, û_{t-p} are all zero.
- We take the conventional F-statistic and use p·F as χ² with p degrees of freedom.
- We use the χ²-test rather than the F-test because the LM test is a large-sample test.
- The test can be used for different specifications of the error process.
- For instance, for the problem of testing for fourth-order autocorrelation, u_t = ρ₄u_{t-4} + e_t   (6.17)
- we just estimate the regression of û_t on the x's and û_{t-4}   (6.18)
- instead of (6.16), and test ρ₄ = 0.
- The test procedure is the same for autoregressive or moving-average errors.
- For instance, if we have a moving-average (MA) error u_t = e_t + δe_{t-4} instead of (6.17), the test procedure is still to estimate (6.18) and test the significance of the coefficient of û_{t-4}.
- In all these cases, we just test H₀ by estimating equation (6.16) with p = 2 and testing ρ₁ = ρ₂ = 0.
- What is of importance is the degree of the autoregression, not its nature.
- Finally, an alternative to the estimation of (6.16) is to estimate the equation (6.19), which regresses y_t (rather than û_t) on the x's and û_{t-1}, …, û_{t-p}.
- The LM test for serial correlation is:
- Estimate equation (6.14) by OLS and get the residuals û_t.
- Estimate equation (6.16) or (6.19) by OLS and compute the F-statistic for testing the hypothesis ρ₁ = ρ₂ = … = ρ_p = 0.
- Use p·F as χ² with p degrees of freedom.
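The steps above can be sketched as follows. One swap to note: instead of the p·F form, this sketch uses the asymptotically equivalent n·R² statistic from the auxiliary regression, also compared with a χ² with p degrees of freedom; function names, seed, and data values are mine.

```python
import numpy as np

def lm_serial_corr(y, X, p):
    """Breusch-Godfrey-style LM sketch: regress the OLS residuals on the
    original regressors plus p lagged residuals; return n*R^2, which is
    asymptotically chi-square with p degrees of freedom under H0."""
    def ols_resid(yv, Xv):
        b, *_ = np.linalg.lstsq(Xv, yv, rcond=None)
        return yv - Xv @ b
    u = ols_resid(y, X)
    # Lagged-residual columns, dropping the first p observations.
    lags = np.column_stack([u[p - s : len(u) - s] for s in range(1, p + 1)])
    Xa = np.column_stack([X[p:], lags])
    e = ols_resid(u[p:], Xa)
    tss = (u[p:] - u[p:].mean()) @ (u[p:] - u[p:].mean())
    return len(u[p:]) * (1 - (e @ e) / tss)

rng = np.random.default_rng(9)
n = 500
x = rng.standard_normal(n)
X = np.column_stack([np.ones(n), x])
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.6 * u[t - 1] + rng.standard_normal()

# Strongly autocorrelated errors: statistic far above the chi2(2)
# 5% critical value of 5.99.
print(lm_serial_corr(1 + 2 * x + u, X, p=2) > 5.99)  # True
```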
6.9 Strategies When the DW Test Statistic is Significant
- The DW test is designed as a test of the hypothesis ρ = 0 when the errors follow a first-order autoregressive process.
- However, the test has been found to be robust against other alternatives such as AR(2), MA(1), ARMA(1,1), and so on.
- Further, and more disturbingly, it catches specification errors like omitted variables that are themselves autocorrelated, and misspecified dynamics (a term that we will explain).
- Thus the strategy to adopt if the DW test statistic is significant is not clear. We discuss three different strategies:
- 1. Assume that the significant DW statistic is an indication of serial correlation, but that the serial correlation may not be due to AR(1) errors.
- 2. Test whether the serial correlation is due to omitted variables.
- 3. Test whether the serial correlation is due to misspecified dynamics.
- Errors Not AR(1)
- In case 1, since a significant DW statistic does not necessarily mean that the errors are AR(1), we should check for higher-order autoregressions by estimating equations of the form u_t = ρ₁u_{t-1} + ρ₂u_{t-2} + … + e_t.
- Once the order has been determined, we can estimate the model with the appropriate assumptions about the error structure by the methods described in Section 6.4.
- Moving-average (MA) errors and ARMA errors?
- Estimation with MA errors and ARMA errors is more complicated than with AR errors.
- However, researchers suggest that it is the order of the error process that is more important than its particular form. From the practical point of view, for most economic data it is sufficient to determine the order of an AR process.
- Thus, if a significant DW statistic is observed, the appropriate strategy would be to see whether the errors are generated by a higher-order AR process than AR(1), and then undertake estimation.
- Autocorrelation Caused by Omitted Variables
- Suppose that the true regression equation is y_t = β₀ + β₁x_t + β₂z_t + u_t, and instead we estimate y_t = β₀ + β₁x_t + v_t   (6.20)
- Then since v_t = β₂z_t + u_t, if the omitted variable z_t is autocorrelated, this will produce autocorrelation in v_t.
- Moreover, v_t is no longer independent of x_t.
- Thus not only are the OLS estimators of β₀ and β₁ from (6.20) inefficient, they are inconsistent as well.
- Serial Correlation Due to Misspecified Dynamics
- In a seminal paper published in 1964, Sargan pointed out that a significant DW statistic does not necessarily imply that we have a serial correlation problem.
- This point was also emphasized by Hendry and Mizon.
- The argument goes as follows. Consider y_t = βx_t + u_t, u_t = ρu_{t-1} + e_t   (6.24)
- where the e_t are independent with a common variance σ².
- We can write this model as y_t = ρy_{t-1} + βx_t − ρβx_{t-1} + e_t   (6.25)
- Consider an alternative stable dynamic model y_t = β₁y_{t-1} + β₂x_t + β₃x_{t-1} + e_t   (6.26)
- Equation (6.25) is the same as equation (6.26) with the restriction β₁β₂ + β₃ = 0   (6.27)
- A test for ρ = 0 is a test for β₁ = 0 (and β₃ = 0).
- But before we test this, what Sargan says is that we should first test the restriction (6.27), and test for ρ = 0 only if that hypothesis is not rejected.
- If the hypothesis is rejected, we do not have a serial correlation model, and the serial correlation in the errors in (6.24) is due to misspecified dynamics, that is, the omission of the variables y_{t-1} and x_{t-1} from the equation.
- If the DW test statistic is significant, a proper approach is to test the restriction (6.27) to make sure that what we have is a serial correlation model, before we undertake any autoregressive transformation of the variables.
- In fact, Sargan suggests starting with the general model (6.26) and testing the restriction (6.27) first, before attempting any test for serial correlation.
- Illustrative Example
- Consider the data in Table 3.11 and the estimation of the production function (4.24).
- In Section 6.4 we presented estimates of the equation assuming that the errors are AR(1). This was based on a DW test statistic of 0.86.
- Suppose that we estimate an equation of the form (6.26). The results are as follows (all variables in logs; figures in parentheses are standard errors):
- Under the assumption that the errors are AR(1), the residual sum of squares obtained from the Hildreth-Lu procedure we used in Section 6.4 is RSS₁ = 0.02635.
- Since we have two slope coefficients, we have two restrictions of the form (6.27).
- Note that for the general dynamic model we are estimating six parameters (α and five β's). For the serial correlation model we are estimating four parameters (α, two β's, and ρ).
- We will use the likelihood ratio (LR) test.
- Here −2 log_e λ has a χ²-distribution with d.f. = 2 (the number of restrictions).
- In our example the computed −2 log_e λ is significant at the 1% level.
- Thus the hypothesis of first-order autocorrelation is rejected.
- Although the DW statistic is significant, this does not mean that the errors are AR(1).
6.10 Trends and Random Walks
- Throughout our discussion we have assumed that E(u_t) = 0 and var(u_t) = σ² for all t, and cov(u_t, u_{t+k}) = ρ_k σ² for all t and k, where ρ_k is the serial correlation of lag k (this is simply a function of the lag k and does not depend on t).
- If these assumptions are satisfied, the series u_t is called covariance stationary (covariances are constant over time), or just stationary.
- Many economic time series are clearly nonstationary in the sense that the mean and variance depend on time, and they tend to depart ever further from any given value as time goes on.
- If this movement is predominantly in one direction (up or down), we say that the series exhibits a trend.
- More detailed discussion of the topics covered briefly here can be found in Chapter 14.
6.10 Trends and Random Walks
- Nonstationary time series are frequently
de-trended before further analysis is done. - There are two procedures used for de-trending:
- Estimating regressions on time.
- Successive differencing.
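Both procedures can be sketched on a simulated trend-stationary series (the intercept, slope, and sample size below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200
t = np.arange(T)

# Hypothetical trend-stationary series: y_t = 2 + 0.5 t + u_t, u_t iid N(0, 1)
y = 2.0 + 0.5 * t + rng.standard_normal(T)

# (1) Regression on time: the de-trended series is the least squares residual
slope, intercept = np.polyfit(t, y, 1)
resid = y - (intercept + slope * t)

# (2) Successive differencing: one difference turns the linear trend into
# a constant equal to the slope
dy = np.diff(y)
```

The regression residuals average zero by construction, while the differenced series fluctuates around the trend slope 0.5.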
6.10 Trends and Random Walks
- In the regression approach it is assumed that the
series yt is generated by the mechanism
yt = f(t) + ut, - where f(t) is the trend and ut is a
stationary series with mean zero and variance
σ². - Let us suppose that f(t) is linear, so that we have
yt = α + βt + ut (6.29)
6.10 Trends and Random Walks
- Note that the trend-eliminated series is ût,
the least squares residuals, which satisfy the
relationship Σût = 0.
- If differencing is used to eliminate the trend we
get Δyt = β + ut - ut-1. - We have to take a first difference again to
eliminate β, and we get Δ²yt = ut - 2ut-1 + ut-2
as the de-trended series.
6.10 Trends and Random Walks
- On the other hand, suppose we assume that yt is
generated by the model
- yt = yt-1 + β + et (6.30)
- where et is a stationary series with mean zero
and variance σ². - In this case the first difference of yt is
stationary with mean β. - This model is also known as the random-walk
model. - Accumulating yt starting with an initial value y0,
we get from equation (6.30)
6.10 Trends and Random Walks
- yt = y0 + βt + (e1 + e2 + ⋯ + et) (6.31)
- which has the same form as (6.29) except for
the fact that the disturbance is not stationary:
it has variance tσ², which increases over time. - Nelson and Plosser call model (6.29) a
trend-stationary process (TSP) and model (6.30) a
difference-stationary process (DSP).
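The growing variance of the DSP model can be checked by simulating many independent drifting random walks (the drift and horizon below are arbitrary illustrative choices) and computing the cross-path variance at two dates:

```python
import numpy as np

rng = np.random.default_rng(2)
n_paths, T, beta, sigma = 5000, 400, 0.1, 1.0

# Many independent DSP paths: y_t = y_{t-1} + beta + e_t, y_0 = 0
e = rng.normal(0.0, sigma, size=(n_paths, T))
y = np.cumsum(beta + e, axis=1)

# Across paths, Var(y_t) grows like t * sigma^2
var_100 = y[:, 99].var()     # theoretical value: 100
var_400 = y[:, 399].var()    # theoretical value: 400
```

Quadrupling t roughly quadruples the variance, exactly the tσ² behavior that distinguishes the DSP model from a TSP model.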
6.10 Trends and Random Walks
- Both models exhibit a linear trend, but the
appropriate method of eliminating the trend
differs. - To test the hypothesis that a time series belongs
to the TSP class against the alternative that it
belongs to the DSP class, Nelson and Plosser use
a test developed by Dickey and Fuller, based on
the equation yt = α + βt + ρyt-1 + et, - which belongs to the DSP class if ρ = 1, β = 0, and to
the TSP class if ρ < 1.
6.10 Trends and Random Walks
- Thus we have to test the hypothesis ρ = 1, β = 0
against ρ < 1. - The problem here is that we cannot use the usual
least squares distribution theory when ρ = 1. - Dickey and Fuller show that the least squares
estimate of ρ is not distributed around unity
under the DSP hypothesis (that is, the true value
ρ = 1) but rather around a value less than one. - However, the negative bias diminishes as the
number of observations increases.
6.10 Trends and Random Walks
- They tabulate the significance points for testing
the hypothesis ρ = 1 against ρ < 1. - Nelson and Plosser applied the Dickey-Fuller test
to a wide range of historical time series for the
U.S. economy and found that the DSP hypothesis
was accepted in all cases, with the exception of
the unemployment rate. - They conclude that for most economic time series
the DSP model is more appropriate.
6.10 Trends and Random Walks
- The problem of testing the hypothesis ρ = 1 in the
first-order autoregressive equation of the form
yt = ρyt-1 + et - is called testing for unit roots.
- There is an enormous literature on this problem,
but one of the most commonly used tests is the
Dickey-Fuller test. - The standard expression for the large sample
variance of the least squares estimator ρ̂ is
(1 - ρ²)/n, which would be zero under the null
hypothesis. - Hence, one needs to derive the limiting
distribution of ρ̂ under H0: ρ = 1 to apply the
test.
Three Types of RW
- RW without drift: Yt = Yt-1 + ut
- RW with drift: Yt = α + Yt-1 + ut
- RW with drift and time trend: Yt = α + βt + Yt-1 + ut
- ut ~ iid(0, σ²)
- An example
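The three variants can be simulated from one common shock series (a sketch; the drift 0.2 and trend coefficient 0.01 are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(3)
T = 300
u = rng.standard_normal(T)       # u_t ~ iid(0, 1)
t = np.arange(1, T + 1)

# Three random-walk variants built from the same shocks
rw_plain = np.cumsum(u)                    # Y_t = Y_{t-1} + u_t
rw_drift = np.cumsum(0.2 + u)              # Y_t = 0.2 + Y_{t-1} + u_t
rw_trend = np.cumsum(0.2 + 0.01 * t + u)   # Y_t = 0.2 + 0.01 t + Y_{t-1} + u_t
```

Because the shocks are shared, the drift series ends exactly 0.2·T above the plain walk, and the trend series accumulates the additional βt term on top of that.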
Augmented D-F (ADF) Tests
- Yt = a1Yt-1 + ut
- Yt - Yt-1 = (a1 - 1)Yt-1 + ut
- ΔYt = (a1 - 1)Yt-1 + ut
- ΔYt = γYt-1 + ut
- H0: a1 = 1, i.e., H0: γ = 0
- ΔYt = γYt-1 + Σ δi ΔYt-i + ut
- Unit root test: an example
- Limitations of ADF Tests
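The ΔYt = γYt-1 + ut regression above can be coded directly (a minimal sketch without the augmentation lags or a constant; simulated data, not the class example):

```python
import numpy as np

rng = np.random.default_rng(4)
T = 500

def df_regression(y):
    """Regress dY_t on Y_{t-1} (no constant); return (gamma_hat, t_stat)."""
    dy, ylag = np.diff(y), y[:-1]
    gamma = (ylag @ dy) / (ylag @ ylag)
    resid = dy - gamma * ylag
    s2 = (resid @ resid) / (len(dy) - 1)
    return gamma, gamma / np.sqrt(s2 / (ylag @ ylag))

e = rng.standard_normal(T)
walk = np.cumsum(e)               # unit root: true gamma = 0
ar = np.zeros(T)                  # stationary AR(1): true gamma = -0.5
for i in range(1, T):
    ar[i] = 0.5 * ar[i - 1] + e[i]

g_walk, t_walk = df_regression(walk)
g_ar, t_ar = df_regression(ar)
# Compare the t statistics with Dickey-Fuller critical values (about
# -1.95 at 5% for this no-constant case), not the normal table.
```

For the stationary series γ̂ sits near -0.5 with a strongly negative t statistic, while for the random walk γ̂ is close to zero, illustrating H0: γ = 0.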
6.10 Trends and Random Walks
- Spurious Trends
- If β = 0 in equation (6.30), the model is called a
trendless random walk or a random walk with zero
drift. - However, from equation (6.31) note that even
though there is no trend in the mean, there is a
trend in the variance. - Suppose that the true model is of the DSP type
with β = 0. - What happens if we estimate a TSP type model?
6.10 Trends and Random Walks
- That is, the true model is one with no trend in
the mean but only a trend in the variance, and we
estimate a model with a trend in the mean but no
trend in the variance. - It is intuitively clear that the trend in the
variance will be transmitted to the mean and we
will find a significant coefficient for t even
though in reality there is no trend in the mean. - How serious is this problem?
6.10 Trends and Random Walks
- Nelson and Kang analyze this and conclude that
- 1. Regression of a random walk on time by least
squares will produce R² values of around 0.44
regardless of sample size when, in fact, the mean
of the variable has no relationship with time
whatever. - 2. In the case of random walks with drift, that
is, β ≠ 0, the R² will be higher and will increase
with the sample size, reaching one in the limit
regardless of the value of β.
6.10 Trends and Random Walks
- 3. The residual from the regression on time, which
we take as the de-trended series, has on
average only about 14% of the true stochastic
variance of the original series. - 4. The residuals from the regression on time are
also autocorrelated, being roughly (1 - 10/N) at lag
one, where N is the sample size.
6.10 Trends and Random Walks
- 5. Conventional t-tests to test the significance
of some of the regressors are not valid. They
tend to reject the null hypothesis of no
dependence on time, with very high frequency.
6.10 Trends and Random Walks
- 6. Regression of one random walk on another, with
time included for trend, is strongly subject to
the spurious regression phenomenon. That is, the
conventional t-test will tend to indicate a
relationship between the variables when none is
present. - A spurious regression example
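A Monte Carlo sketch of this spurious regression phenomenon (illustrative, not the example used in class): regress one random walk on another many times and count how often the conventional t-test "finds" a relationship:

```python
import numpy as np

rng = np.random.default_rng(5)
n_sims, T = 500, 100
rejections = 0

for _ in range(n_sims):
    # Two independent random walks: there is no true relationship
    y = np.cumsum(rng.standard_normal(T))
    x = np.cumsum(rng.standard_normal(T))
    X = np.column_stack([np.ones(T), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    s2 = (resid @ resid) / (T - 2)
    se_b1 = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    if abs(b[1] / se_b1) > 1.96:
        rejections += 1

# With iid data this would be about 0.05; for random walks it is far higher
rejection_rate = rejections / n_sims
```

The rejection rate comes out many times the nominal 5% level, which is exactly the sense in which the conventional t-test is misleading here.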
6.10 Trends and Random Walks
- The main conclusion is that using a regression on
time has serious consequences when, in fact, the
time series is of the DSP type and, hence,
differencing is the appropriate procedure for
trend elimination. - Plosser and Schwert also argue that with most
economic time series it is always best to work
with differenced data rather than data in levels. - The reason is that if indeed the data series are
of the DSP type, the errors in the levels
equation will have variances increasing over time.
6.10 Trends and Random Walks
- Under these circumstances many of the properties
of least squares estimators as well as tests of
significance are invalid. - On the other hand, suppose that the levels
equation is correctly specified. Then all
differencing will do is produce a moving average
error, and at worst ignoring it will give
inefficient estimates. - For instance, suppose that we have the model
yt = α + βxt + ut
6.10 Trends and Random Walks
- where the ut are independent with mean zero and
common variance σ². - If we difference this equation, we get
- Δyt = βΔxt + (ut - ut-1)
- where the error is a moving average and,
hence, not serially independent. - But estimating the first-difference equation by
least squares still gives us unbiased and consistent
estimates.
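This point can be checked with a small simulation (a sketch; the parameter values and the persistent regressor are illustrative choices): difference a correctly specified levels equation and estimate the slope by OLS on the differences:

```python
import numpy as np

rng = np.random.default_rng(6)
T, alpha, beta = 2000, 1.0, 2.0

# Correctly specified levels model with iid errors
x = np.cumsum(rng.standard_normal(T))   # a persistent regressor
u = rng.standard_normal(T)
y = alpha + beta * x + u

# Differencing gives dy_t = beta * dx_t + (u_t - u_{t-1}): an MA(1) error,
# so OLS on differences is inefficient but still consistent
dy, dx = np.diff(y), np.diff(x)
beta_diff = (dx @ dy) / (dx @ dx)

# Lag-1 autocorrelation of the differenced-equation error is about -0.5
err = dy - beta_diff * dx
ac1 = np.corrcoef(err[:-1], err[1:])[0, 1]
```

The slope estimate stays close to the true β even though the differenced error is clearly autocorrelated, matching the "inefficient but consistent" conclusion above.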
6.10 Trends and Random Walks
- Thus, the consequences of differencing when it is
not needed are much less serious than those of
failing to difference when it is appropriate
(when the true model is of the DSP type). - In practice, it is best to use the Dickey-Fuller
test to check whether the data are of the DSP or TSP
type. - Failing that, it is better to use differencing and
regressions in first differences, rather than
regressions in levels with time as an extra
explanatory variable.
6.10 Trends and Random Walks
- The Concept of Cointegration: Differencing vs.
Long-Run Effects - One drawback of the procedure of differencing is
that it results in a loss of valuable "long-run
information" in the data. - Recently, the concept of cointegrated series has
been suggested as one solution to this problem.
First, we need to define the term
"cointegration."
6.10 Trends and Random Walks
- If yt follows a random walk model, that is,
yt = yt-1 + et, - then we get by successive substitution
yt = y0 + e1 + e2 + ⋯ + et. - Thus, yt is a summation of the ej, and
Var(yt) = tσ².
6.10 Trends and Random Walks
- Yt ~ I(1)
- Yt is a random walk
- ΔYt is white noise (iid)
- No one can predict the future price change
- The market is efficient
- The impact of a previous shock on the price will
remain and not decay to zero
6.10 Trends and Random Walks
- We say in this case that yt is I(1), integrated of
order one. If yt is I(1) and we add to it zt,
which is I(0), then yt + zt will be I(1). - When we specify regression models in time series,
we have to make sure that the different variables
are integrated to the same degree. Otherwise, the
equation does not make sense. - For instance, if we specify the regression model
- yt = βxt + ut (6.34)
- and we say that ut is I(0), we have to make sure
that yt and xt are integrated to the same order.
6.10 Trends and Random Walks
- For instance, if yt is I(1) and xt is I(0), there
will not be any β that will satisfy
relationship (6.34). - Suppose yt is I(1) and xt is I(1); then if there
is a nonzero β such that yt - βxt is I(0),
yt and xt are said to be cointegrated.
6.10 Trends and Random Walks
- Suppose that yt and xt are both random walks, so
that they are both I(1). Then an equation in first
differences of the form
- Δyt = αΔxt + γ(yt-1 - βxt-1) + vt (6.35)
- is a valid equation, since Δyt, Δxt,
(yt - βxt), and vt are all I(0). - Equation (6.34) is considered a long-run
relationship between yt and xt, and
equation (6.35) describes short-run dynamics. - Engle and Granger suggest estimating (6.34) by
ordinary least squares, obtaining the estimator β̂,
and substituting it in equation (6.35) to
estimate the parameters α and γ.
6.10 Trends and Random Walks
- This two-step estimation procedure, however,
rests on the assumption that yt and xt are
cointegrated. - It is, therefore, important to test for
cointegration. - Engle and Granger suggest estimating (6.34) by
ordinary least squares and getting the residuals ût.
6.10 Trends and Random Walks
- The test amounts to testing the hypothesis ρ = 1 in
ût = ρût-1 + et, - that is, testing the hypothesis
- H0: ut is I(1)
- In essence, we are testing the null hypothesis that
yt and xt are not cointegrated. - Note that yt is I(1) and xt is I(1), so we are
trying to see whether ut is not I(1).
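The first step and the residual-based test can be sketched on a simulated cointegrated pair (the constant 2, slope 0.8, and AR coefficient 0.3 below are hypothetical values chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
T = 1000

# Hypothetical cointegrated pair: x_t is I(1) and y_t = 2 + 0.8 x_t + u_t
# with u_t a stationary AR(1), so y_t - 0.8 x_t is I(0)
x = np.cumsum(rng.standard_normal(T))
e = rng.standard_normal(T)
u = np.zeros(T)
for i in range(1, T):
    u[i] = 0.3 * u[i - 1] + e[i]
y = 2.0 + 0.8 * x + u

# Step 1: estimate the long-run relation by OLS and keep the residuals
X = np.column_stack([np.ones(T), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
uhat = y - X @ coef

# Residual-based test: regress d(uhat) on uhat_{-1}; a coefficient well
# below zero speaks against H0: u_t is I(1) (compare with Engle-Granger
# critical values, not the ordinary t table)
duhat, ulag = np.diff(uhat), uhat[:-1]
rho_minus_1 = (ulag @ duhat) / (ulag @ ulag)
```

The step-1 slope lands close to the true cointegrating coefficient, and the residual regression coefficient is far below zero, consistent with rejecting "no cointegration" in this constructed example.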
Cointegration
- Co-integration test: an example
- Run the VECM (vector error correction model) in
EViews
Homework
- Find the spot and futures prices
- 5-year daily data at least
- Run the cointegration test
- Run the VECM
- Check the lead-lag relationship using the EC
parameter estimate
6.11 ARCH Models and Serial Correlation
- We saw in Section 6.9 that a significant DW
statistic can arise through a number of
misspecifications. - We will now discuss one other source. This is the
ARCH model suggested by Engle (1982), which has,
in recent years, been found useful in the
analysis of speculative prices. - ARCH stands for "autoregressive conditional
heteroskedasticity." - Robert Engle and Clive Granger
- NOBEL PRIZE WINNERS FOR ECONOMICS, 2003
6.11 ARCH Models and Serial Correlation
- GARCH (p,q) Model (by Bollerslev, 1986)
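The GARCH(1,1) recursion can be sketched by simulation (parameter values are hypothetical; the point is the conditional-variance recursion and the unconditional variance it implies):

```python
import numpy as np

rng = np.random.default_rng(8)
T = 20000
omega, alpha, beta = 0.05, 0.1, 0.8   # persistence alpha + beta = 0.9

# GARCH(1,1): h_t = omega + alpha * eps_{t-1}^2 + beta * h_{t-1},
# with eps_t = sqrt(h_t) * z_t and z_t iid N(0, 1)
z = rng.standard_normal(T)
h = np.empty(T)
eps = np.empty(T)
h[0] = omega / (1.0 - alpha - beta)   # start at the unconditional variance
eps[0] = np.sqrt(h[0]) * z[0]
for t in range(1, T):
    h[t] = omega + alpha * eps[t - 1] ** 2 + beta * h[t - 1]
    eps[t] = np.sqrt(h[t]) * z[t]

# Unconditional variance: omega / (1 - alpha - beta)
uncond_var = omega / (1.0 - alpha - beta)
```

When alpha + beta approaches one, as in the empirical estimates discussed below, this unconditional variance blows up, which is the high-persistence problem in variance settings.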
6.11 ARCH Models and Serial Correlation
- The high level of persistence in GARCH models:
the sum of the two GARCH parameter estimates
approximates unity in most cases. - Li and Lin (2003): this finding provides some
support for the notion that GARCH models are
handicapped by the inability to account for
structural changes during the estimation period
and thus suffer from a high-persistence problem
in variance settings. - An example: GARCH (1,1)
6.11 ARCH Models and Serial Correlation
- Find the stock returns
- 5-year daily data at least
- Run the GARCH(1,1) model
- Check the sum of the two GARCH parameter
estimates - Parameter estimates
- Graph the time-varying variance estimates