Title: Autocorrelation
Chapter 6
What is in this Chapter?
- How do we detect this problem?
- What are the consequences?
- What are the solutions?
- Regarding the problem of detection, we start with the Durbin-Watson (DW) statistic and discuss its several limitations and extensions:
- Durbin's h-test for models with lagged dependent variables.
- Tests for higher-order serial correlation.
- We discuss (in Section 6.5) the consequences of serially correlated errors for the OLS estimators.
- The solutions to the problem of serial correlation are discussed in:
- Section 6.3: estimation in levels versus first differences.
- Section 6.9: strategies when the DW test statistic is significant.
- Section 6.10: trends and random walks.
- This chapter is very important, and its several ideas have to be understood thoroughly.
6.1 Introduction
- The order of autocorrelation: first-order autoregressive errors are u_t = ρ u_{t-1} + e_t   (6.1)
- In the following sections we discuss how to:
- 1. Test for the presence of serial correlation.
- 2. Estimate the regression equation when the errors are serially correlated.
6.2 Durbin-Watson Test
- The DW statistic is d = Σ(û_t − û_{t-1})² / Σ û_t², computed from the OLS residuals û_t; approximately, d ≈ 2(1 − ρ̂).
- The sampling distribution of d depends on the values of the explanatory variables, and hence Durbin and Watson derived upper limits (d_U) and lower limits (d_L) for the significance levels of d.
- There are tables to test the hypothesis of zero autocorrelation against the hypothesis of first-order positive autocorrelation. (For negative autocorrelation the limits are interchanged.)
- If d < d_L, we reject the null hypothesis of no autocorrelation.
- If d > d_U, we do not reject the null hypothesis.
- If d_L ≤ d ≤ d_U, the test is inconclusive.
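The DW statistic above is straightforward to compute directly. Below is a minimal sketch: the function name, seed, and simulated series are illustrative, not from the text; the check uses the rough relation d ≈ 2(1 − ρ̂).

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    of the residuals divided by their sum of squares."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Illustrative check: white-noise residuals should give d near 2;
# positively autocorrelated residuals give d well below 2.
rng = np.random.default_rng(0)
white = rng.standard_normal(500)
ar1 = np.empty(500)
ar1[0] = rng.standard_normal()
for t in range(1, 500):            # u_t = 0.8 u_{t-1} + e_t
    ar1[t] = 0.8 * ar1[t - 1] + rng.standard_normal()

print(round(durbin_watson(white), 2))  # close to 2
print(round(durbin_watson(ar1), 2))    # well below 2, near 2(1 - 0.8)
```

Remember that the critical values d_L and d_U must still be taken from the DW tables for the given n and k.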
- Although we have said that d ≈ 2(1 − ρ̂), this approximation is valid only in large samples.
- The mean of d when ρ = 0 has been shown to be given approximately by (the proof is rather complicated for our purpose) E(d) ≈ 2 + 2(k − 1)/(n − k), where k is the number of regression parameters estimated (including the constant term) and n is the sample size.
- Thus, even for zero serial correlation, the statistic is biased upward from 2.
- If k = 5 and n = 15, the bias is as large as 0.8.
- We illustrate the use of the DW test with an example.
- Illustrative Example
- Consider the data in Table 3.11 and the estimated production function, which gave a DW statistic of d = 0.86.
- Referring to the DW table with k = 2 and n = 39 at the 5% significance level, we see that the observed d lies below d_L. Since d < d_L, we reject the hypothesis of zero autocorrelation at the 5% level.
6.2 Limitations of the DW Test
- It tests only for first-order serial correlation.
- The test is inconclusive if the computed value lies between d_L and d_U.
- The test cannot be applied in models with lagged dependent variables.
6.3 Estimation in Levels Versus First Differences
- Simple solutions to the serial correlation problem: differencing.
- If the DW test rejects the hypothesis of zero serial correlation, what is the next step?
- In such cases one estimates a regression by transforming all the variables by:
- ρ-differencing (quasi-first differencing): y_t − ρ y_{t-1}
- First-differencing: y_t − y_{t-1}
- When comparing equations in levels and first differences, one cannot compare the R² values because the explained variables are different.
- One can compare the residual sums of squares, but only after making a rough adjustment (see p. 231).
- For instance, if the residual sum of squares is, say, 1.2 from the levels equation and 0.8 from the difference equation, and n = 11, k = 1, DW = 0.9, then the adjusted residual sum of squares for the levels equation is (9/10)(0.9)(1.2) ≈ 0.97, which is the number to be compared with 0.8.
- Since we have comparable residual sums of squares (RSS), we can get comparable R² values as well, using the relationship RSS = S_yy(1 − R²).
- Let
- S_yy = total sum of squares of the dependent variable in the first-difference equation
- RSS₁ = residual sum of squares from the levels equation
- RSS₂ = residual sum of squares from the first-difference equation
- R_D² = comparable R² from the levels equation
- Then the comparable measures follow from the relationship RSS = S_yy(1 − R²).
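The adjustment and the RSS-to-R² conversion above can be sketched as follows. The function names are mine, and note one loud assumption: the 9/10 factor in the text's example is read here as (n − 2)/(n − 1), which is a guess at the general rule; the text only gives the single worked number.

```python
def adjusted_levels_rss(rss_levels, n, dw):
    """Rough adjustment so the levels RSS is comparable with the
    first-difference RSS (factor (n-2)/(n-1) is an assumed
    generalization of the text's 9/10 with n = 11)."""
    return ((n - 2) / (n - 1)) * dw * rss_levels

def comparable_r2(rss, s_yy):
    """Recover R^2 from the relationship RSS = S_yy * (1 - R^2)."""
    return 1.0 - rss / s_yy

# Numbers from the text's example: RSS = 1.2 (levels), n = 11, DW = 0.9.
print(round(adjusted_levels_rss(1.2, 11, 0.9), 2))  # 0.97
print(comparable_r2(0.5, 2.0))                      # 0.75
```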
- Illustrative Examples
- Consider the simple Keynesian model discussed by Friedman and Meiselman. The equation estimated in levels is C_t = α + β A_t + u_t, where
- C_t = personal consumption expenditure (current dollars)
- A_t = autonomous expenditure (current dollars)
- The model fitted for the 1929-1939 period gave (figures in parentheses are standard errors):
- This is to be compared with the corresponding figures from the equation in first differences.
- For the production function data in Table 3.11 the first-difference equation was also estimated.
- The comparable figures for the levels equation, reported earlier in Chapter 4, equation (4.24), are:
- This is to be compared with the corresponding figures from the equation in first differences.
- Harvey gives a different definition of the comparable R². He defines it as follows.
- This does not adjust for the fact that the error variances in the levels equation and the first-difference equation are not the same.
- The arguments for his suggestion are given in his paper.
- In the example with the Friedman-Meiselman data his measure of R² can be computed as well.
- Although R² cannot be greater than 1, this measure can be negative.
- This would be the case when the level model gives a poorer explanation than the naive model, which says that the change in the dependent variable is a constant.
- Usually, with time-series data, one gets high R² values if the regressions are estimated in the levels y_t and x_t, but low R² values if the regressions are estimated in the first differences (y_t − y_{t-1}) and (x_t − x_{t-1}).
- Since a high R² is usually considered proof of a strong relationship between the variables under investigation, there is a strong tendency to estimate the equations in levels rather than in first differences.
- This is sometimes called the "R² syndrome."
- An example:
- However, if the DW statistic is very low, it often implies a misspecified equation, no matter what the value of the R² is.
- In such cases one should estimate the regression equation in first differences; if the R² is then low, this merely indicates that the variables y and x are not related to each other.
- Granger and Newbold present some examples with artificially generated data where y, x, and the error u are each generated independently, so that there is no relationship between y and x.
- But the correlations between y_t and y_{t-1}, x_t and x_{t-1}, and u_t and u_{t-1} are very high.
- Although there is no relationship between y and x, the regression of y on x gives a high R² but a low DW statistic.
- When the regression is run in first differences, the R² is close to zero and the DW statistic is close to 2, thus demonstrating that there is indeed no relationship between y and x and that the R² obtained earlier is spurious.
- Thus regressions in first differences might often reveal the true nature of the relationship between y and x.
- An example:
Homework
- Find the data:
- Y is the Taiwan stock index.
- X is the U.S. stock index.
- Run two equations:
- The equation in levels (log-based prices).
- The equation in first differences.
- Compare the two equations on:
- The beta estimate and its significance.
- The R².
- The value of the DW statistic.
- Q: Should we adopt the equation in levels or in first differences?
- For instance, suppose that we have quarterly data; then it is possible that the errors in any quarter this year are most highly correlated with the errors in the corresponding quarter last year, rather than with the errors in the preceding quarter.
- That is, u_t could be uncorrelated with u_{t-1} but highly correlated with u_{t-4}.
- If this is the case, the DW statistic will fail to detect it.
- What we should be using is a modified statistic defined with the appropriate lag, e.g., d₄ = Σ(û_t − û_{t-4})² / Σ û_t².
- Quarterly data (e.g., GDP): lag 4.
- Monthly data (e.g., an industrial production index): lag 12.
6.4 Estimation Procedures with Autocorrelated Errors
- Now we will derive var(u_t) and the correlations between u_t and lagged values of u_t.
- From equation (6.1) note that u_t depends on e_t and u_{t-1}, u_{t-1} depends on e_{t-1} and u_{t-2}, and so on.
- Thus u_t depends on e_t, e_{t-1}, e_{t-2}, and so on. Since the e_t are serially independent, and u_{t-1} depends on e_{t-1}, e_{t-2}, and so on, but not on e_t, we have cov(u_{t-1}, e_t) = 0.
- Since E(e_t) = 0, we have E(u_t) = 0 for all t.
- If we denote var(u_t) by σ_u², we have σ_u² = ρ²σ_u² + σ_e².
- Thus σ_u² = σ_e² / (1 − ρ²).
- This gives the variance of u_t in terms of the variance of e_t and the parameter ρ.
- Let us now derive the correlations. Denote the correlation between u_t and u_{t-s} (which is called the correlation of lag s) by ρ_s.
- For lag 1, cov(u_t, u_{t-1}) = E[(ρu_{t-1} + e_t)u_{t-1}] = ρσ_u².
- Hence ρ₁ = cov(u_t, u_{t-1})/σ_u² = ρ.
- Since cov(u_t, u_{t-s}) = ρ cov(u_{t-1}, u_{t-s}), we get by successive substitution ρ_s = ρ^s.
- Thus the lag correlations are all powers of ρ and decline geometrically.
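The geometric decay ρ_s = ρ^s is easy to verify by simulation; the sketch below (seed, ρ = 0.7, and function name are illustrative) compares sample lag-s autocorrelations of a simulated AR(1) series with the theoretical powers of ρ.

```python
import numpy as np

# Simulate u_t = rho * u_{t-1} + e_t, starting from the stationary
# distribution, and compare sample lag-s correlations with rho**s.
rng = np.random.default_rng(7)
rho, n = 0.7, 100_000
u = np.empty(n)
u[0] = rng.standard_normal() / np.sqrt(1 - rho**2)
for t in range(1, n):
    u[t] = rho * u[t - 1] + rng.standard_normal()

def lag_corr(x, s):
    """Sample correlation between x_t and x_{t-s}."""
    return np.corrcoef(x[s:], x[:-s])[0, 1]

for s in (1, 2, 3):
    print(s, round(lag_corr(u, s), 3), round(rho**s, 3))  # close pairs
```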
- GLS (generalized least squares)
- In actual practice ρ is not known.
- There are two types of procedures for estimating ρ:
- 1. Iterative procedures.
- 2. Grid-search procedures.
- Iterative Procedures
- Among the iterative procedures, the earliest was the Cochrane-Orcutt (C-O) procedure.
- In the Cochrane-Orcutt procedure we estimate equation (6.2) by OLS, get the estimated residuals û_t, and estimate ρ̂ = Σ û_t û_{t-1} / Σ û_{t-1}².
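A minimal sketch of the iterative C-O procedure as described: run OLS, estimate ρ from the residuals, quasi-difference, re-estimate, and repeat until ρ settles. Function names, the seed, and the data-generating values (α = 1, β = 2, ρ = 0.6) are all illustrative assumptions.

```python
import numpy as np

def cochrane_orcutt(y, x, tol=1e-8, max_iter=100):
    """Iterative Cochrane-Orcutt sketch for y_t = a + b*x_t + u_t,
    u_t = rho*u_{t-1} + e_t."""
    def ols(yv, Xv):
        b, *_ = np.linalg.lstsq(Xv, yv, rcond=None)
        return b
    X = np.column_stack([np.ones_like(x), x])
    a, b = ols(y, X)
    rho = 0.0
    for _ in range(max_iter):
        u = y - a - b * x
        rho_new = (u[1:] @ u[:-1]) / (u[:-1] @ u[:-1])
        # Quasi-differenced regression: y* = y_t - rho*y_{t-1}, same for x.
        ys = y[1:] - rho_new * y[:-1]
        xs = x[1:] - rho_new * x[:-1]
        Xs = np.column_stack([np.ones_like(xs), xs])
        c, b = ols(ys, Xs)
        a = c / (1 - rho_new)   # intercept of the y* regression is a(1 - rho)
        if abs(rho_new - rho) < tol:
            break
        rho = rho_new
    return a, b, rho

rng = np.random.default_rng(3)
n = 2000
x = rng.standard_normal(n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.6 * u[t - 1] + 0.5 * rng.standard_normal()
y = 1.0 + 2.0 * x + u
a, b, rho = cochrane_orcutt(y, x)
print(round(a, 2), round(b, 2), round(rho, 2))  # near 1, 2, 0.6
```

Note the constant-term adjustment: the quasi-differenced regression estimates α(1 − ρ), so the intercept must be rescaled, exactly as the text discusses below.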
- Durbin suggested an alternative method of estimating ρ.
- In this procedure, we write equation (6.5) as y_t = α(1 − ρ) + ρ y_{t-1} + β x_t − ρβ x_{t-1} + e_t.
- We regress y_t on y_{t-1}, x_t, and x_{t-1}, and take the estimated coefficient of y_{t-1} as an estimate of ρ.
- Use equation (6.6) and estimate a regression of y* on x*.
- The only thing to note is that the slope coefficient in this equation is β, but the intercept is α(1 − ρ).
- Thus after estimating the regression of y* on x*, we have to adjust the constant term appropriately to get estimates of the parameters of the original equation (6.2).
- Further, the standard errors we compute from the regression of y* on x* are now asymptotic standard errors, because ρ has been estimated.
- If there are lagged values of y as explanatory variables, these standard errors are not correct even asymptotically.
- The adjustment needed in this case is discussed in Section 6.7.
- Grid-Search Procedures
- One of the first grid-search procedures is the Hildreth-Lu procedure, suggested in 1960.
- The procedure is as follows. Calculate y* and x* in equation (6.6) for different values of ρ at intervals of 0.1 in the range −1 < ρ < 1.
- Estimate the regression of y* on x* and calculate the residual sum of squares RSS in each case.
- Choose the value of ρ for which the RSS is minimum.
- Again repeat this procedure for smaller intervals of ρ around this value.
- For instance, if the value of ρ for which RSS is minimum is −0.4, repeat this search procedure for values of ρ at intervals of 0.01 in the range −0.5 < ρ < −0.3.
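The two-stage grid search can be sketched as below; the function name, seed, and data-generating values (ρ = 0.5) are illustrative.

```python
import numpy as np

def hildreth_lu(y, x, grid=None):
    """Hildreth-Lu sketch: search over rho, minimizing the RSS of the
    quasi-differenced regression of (y_t - rho*y_{t-1}) on
    (x_t - rho*x_{t-1}) with an intercept."""
    if grid is None:
        grid = np.arange(-0.9, 0.95, 0.1)   # coarse grid, step 0.1
    best = None
    for rho in grid:
        ys = y[1:] - rho * y[:-1]
        xs = x[1:] - rho * x[:-1]
        X = np.column_stack([np.ones_like(xs), xs])
        b, *_ = np.linalg.lstsq(X, ys, rcond=None)
        rss = np.sum((ys - X @ b) ** 2)
        if best is None or rss < best[1]:
            best = (rho, rss)
    return best  # (rho with minimum RSS, that RSS)

rng = np.random.default_rng(5)
n = 1000
x = rng.standard_normal(n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.5 * u[t - 1] + rng.standard_normal()
y = 1.0 + 2.0 * x + u

rho0, _ = hildreth_lu(y, x)                                   # coarse search
rho1, _ = hildreth_lu(y, x, np.arange(rho0 - 0.1, rho0 + 0.1, 0.01))  # refine
print(round(rho0, 1), round(rho1, 2))  # both near 0.5
```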
- This procedure is not the same as the ML procedure. If the errors e_t are normally distributed, we can write the log-likelihood function log L (the derivation is omitted), where Q is the residual sum of squares from the transformed equation.
- Thus minimizing Q is not the same as maximizing log L.
- We can use the grid-search procedure to get the ML estimate.
- Consider the data in Table 3.11 and the estimation of the production function.
- The OLS estimation gave a DW statistic of 0.86, suggesting significant positive autocorrelation.
- Assuming that the errors were AR(1), two estimation procedures were used: the Hildreth-Lu grid search and the iterative Cochrane-Orcutt (C-O).
- The other procedures we have described can also be tried, but this is left as an exercise.
- The Hildreth-Lu procedure and the iterative C-O procedure each gave an estimate of ρ.
- The DW test statistic implied that ρ̂ ≈ 1 − d/2 = 0.57.
- The estimates of the parameters (with standard errors in parentheses) were as follows:
- In this example the parameter estimates given by the Hildreth-Lu and the iterative C-O procedures are pretty close to each other.
- Correcting for the autocorrelation in the errors has resulted in a significant change in the parameter estimates.
6.5 Effect of AR(1) Errors on OLS Estimates
- In Section 6.4 we described different procedures for the estimation of regression models with AR(1) errors.
- We will now answer two questions that might arise with the use of these procedures:
- 1. What do we gain from using these procedures?
- 2. When should we not use these procedures?
- First, in the case we are considering (i.e., the case where the explanatory variable x_t is independent of the error u_t), the OLS estimates are unbiased.
- However, they will not be efficient.
- Further, the tests of significance we apply, which will be based on the wrong covariance matrix, will be wrong.
- In the case where the explanatory variables include lagged dependent variables, we will have some further problems, which we discuss in Section 6.7.
- For the present, let us consider the simple regression model y_t = βx_t + u_t.
- Let b denote the OLS estimator of β, with variance var(b) as given in equation (6.10).
- If the u_t are AR(1), we have u_t = ρu_{t-1} + e_t.
- If we ignore the autocorrelation problem, we would be computing var(b) = σ²/Σx_t². Thus we would be ignoring the expression in the parentheses of equation (6.10).
- To get an idea of the magnitude of this expression, let us assume that the x_t series also follows an AR(1) process, with serial correlation r.
- Since we are now assuming x_t to be stochastic, we will consider the asymptotic variance of b.
- The expression in parentheses in equation (6.10) is now (1 + ρr)/(1 − ρr).
- Thus asy. var(b) ≈ (σ_u²/(T σ_x²)) · (1 + ρr)/(1 − ρr), where T is the number of observations.
- If ρ = r = 0.8, then (1 + ρr)/(1 − ρr) = 1.64/0.36 ≈ 4.56.
- Thus ignoring the expression in the parentheses of equation (6.10) results in an underestimation by close to 78% for the variance of b.
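The arithmetic behind the 78% figure is worth making explicit; a two-line sketch (the function name is mine, the symbols follow the text):

```python
def inflation(rho, r):
    """Variance inflation factor (1 + rho*r)/(1 - rho*r) for OLS when
    both the errors and the regressor are AR(1)."""
    return (1 + rho * r) / (1 - rho * r)

rho = r = 0.8
f = inflation(rho, r)
underestimation = 1 - 1 / f      # share of the true variance that is missed
print(round(f, 2), round(underestimation, 2))  # 4.56 0.78
```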
- If ρ = 0, this is an unbiased estimate.
- If ρ ≠ 0, then under the assumptions we are making, we have an approximate expression for its expectation.
- Again if ρ = r = 0.8, the understatement is substantial.
- We can also derive the asymptotic variance of the ML estimator of β when both x and u are first-order autoregressive, as follows. Note that the ML estimator of β is asymptotically equivalent to the estimator obtained from a regression of (y_t − ρy_{t-1}) on (x_t − ρx_{t-1}).
- When x_t is autoregressive we have an analogous expression for the moments of the transformed regressor.
- Hence by substitution we get the asymptotic variance of the ML estimator of β.
- Thus the efficiency of the OLS estimator is the ratio of the asymptotic variance of the ML estimator to that of the OLS estimator.
- One can compute this for different values of ρ and r.
- For ρ = r = 0.8 this efficiency is 0.21.
- Thus the consequences of autocorrelated errors are:
- 1. The least squares estimators are unbiased but are not efficient. Sometimes they are considerably less efficient than procedures that take account of the autocorrelation.
- 2. The sampling variances are biased and sometimes likely to be seriously understated. Thus R² as well as the t and F statistics tend to be exaggerated.
- The solution to these problems is to use the maximum likelihood procedure (a one-step procedure) or some other procedure mentioned earlier (a two-step procedure) that takes account of the autocorrelation.
- However, there are four important points to note:
- 1. If ρ is known, it is true that one can get estimators better than OLS that take account of autocorrelation. However, in practice ρ is not known and has to be estimated. In small samples it is not necessarily true that one gains (in terms of mean-square error for β̂) by estimating ρ.
- This problem has been investigated by Rao and Griliches, who suggest the rule of thumb (for samples of size 20) that one can use the methods that take account of autocorrelation if |ρ̂| ≥ 0.3, where ρ̂ is the estimated first-order serial correlation from an OLS regression. In samples of larger sizes it would be worthwhile using these methods for ρ̂ smaller than 0.3.
- 2. The discussion above assumes that the true errors are first-order autoregressive. If they have a more complicated structure (e.g., second-order autoregressive), it might be thought that it would still be better to proceed on the assumption that the errors are first-order autoregressive rather than ignore the problem completely and use the OLS method.
- Engle shows that this is not necessarily true (i.e., sometimes one can be worse off making the assumption of first-order autocorrelation than ignoring the problem completely).
- 3. In regressions with quarterly (or monthly) data, one might find that the errors exhibit fourth-order (or twelfth-order) autocorrelation because of not making adequate allowance for seasonal effects. In such cases, if one looks only for first-order autocorrelation, one might not find any. This does not mean that autocorrelation is not a problem. In this case the appropriate specification for the error term may be u_t = ρu_{t-4} + e_t for quarterly data and u_t = ρu_{t-12} + e_t for monthly data.
- 4. Finally, and most important, it is often possible to confuse misspecified dynamics with serial correlation in the errors. For instance, a static regression model with first-order autocorrelation in the errors, that is, y_t = βx_t + u_t with u_t = ρu_{t-1} + e_t, can be written as y_t = ρy_{t-1} + βx_t − ρβx_{t-1} + e_t.
- The model is the same as
- y_t = β₁y_{t-1} + β₂x_t + β₃x_{t-1} + e_t   (6.11)
- with the restriction β₁β₂ + β₃ = 0.
- We can estimate the model (6.11) and test this restriction. If it is rejected, clearly it is not valid to estimate the serial correlation model. (The test procedure is described in Section 6.8.)
- The errors would be serially correlated, but not because the errors follow a first-order autoregressive process: rather because the terms x_{t-1} and y_{t-1} have been omitted.
- This is what is meant by misspecified dynamics. Thus significant serial correlation in the estimated residuals does not necessarily imply that we should estimate a serial correlation model.
- Some further tests are necessary (like the test of the restriction β₁β₂ + β₃ = 0 in the above-mentioned case).
- In fact, it is always best to start with an equation like (6.11) and test this restriction before applying any test for serial correlation.
6.7 Tests for Serial Correlation in Models with Lagged Dependent Variables
- In previous sections we considered explanatory variables that were uncorrelated with the error term.
- This will not be the case if we have lagged dependent variables among the explanatory variables together with serially correlated errors.
- There are several situations in which we would be considering lagged dependent variables as explanatory variables.
- These could arise through expectations, adjustment lags, and so on.
- Let us consider a simple model: y_t = αy_{t-1} + βx_t + u_t, with u_t = ρu_{t-1} + e_t   (6.12)
- The e_t are independent with mean 0 and variance σ².
- Because u_t depends on u_{t-1}, and y_{t-1} also depends on u_{t-1}, the two variables y_{t-1} and u_t will be correlated.
- An example:
- Durbin's h-Test
- Since the DW test is not applicable in these models, Durbin suggests an alternative test, called the h-test.
- This test uses h = ρ̂ · sqrt(n / (1 − n·V̂(α̂))) as a standard normal variable.
- Here ρ̂ is the estimated first-order serial correlation from the OLS residuals, V̂(α̂) is the estimated variance of the OLS estimate of α, and n is the sample size.
- If n·V̂(α̂) ≥ 1, the test is not applicable. In this case Durbin suggests the following test.
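The h-statistic is a one-liner once ρ̂ and V̂(α̂) are available; in the sketch below the function name, the residual series, and the variance value are illustrative, not the text's numbers.

```python
import numpy as np

def durbin_h(resid, var_alpha_hat, n):
    """Durbin's h-statistic, treated as standard normal under H0.
    var_alpha_hat: estimated variance of the OLS coefficient on y_{t-1}.
    Returns None when n*var_alpha_hat >= 1 (test not applicable)."""
    rho_hat = (resid[1:] @ resid[:-1]) / (resid[:-1] @ resid[:-1])
    if n * var_alpha_hat >= 1:
        return None
    return rho_hat * np.sqrt(n / (1 - n * var_alpha_hat))

# Hypothetical residuals and variance, purely for illustration.
resid = np.array([0.5, -0.2, 0.4, -0.1, 0.3, 0.2, -0.4, 0.1])
print(durbin_h(resid, var_alpha_hat=0.002, n=50) is not None)  # True
print(durbin_h(resid, var_alpha_hat=0.02, n=50))               # None
```

The `None` branch is exactly the case n·V̂(α̂) ≥ 1, where Durbin's alternative test below must be used instead.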
- Durbin's Alternative Test
- From the OLS estimation of equation (6.12), compute the residuals û_t.
- Then regress û_t on û_{t-1}, y_{t-1}, and x_t.
- The test for ρ = 0 is carried out by testing the significance of the coefficient of û_{t-1} in the latter regression.
- An equation of demand for food estimated from 50 observations gave the following results (figures in parentheses are standard errors):
- where q_t = food consumption per capita
- p_t = food price (retail price deflated by the consumer price index)
- y_t = per capita disposable income deflated by the consumer price index
- From these results we can compute Durbin's h-statistic.
- It is significant at the 1% level.
- Thus we reject the hypothesis ρ = 0, even though the DW statistic is close to 2 and the estimate ρ̂ from the OLS residuals is small.
- Let us keep all the numbers the same and just change the standard error of α̂. The following are the results:
- Thus, other things being equal, the precision with which α is estimated has a significant effect on the outcome of the h-test.
- In the case where the h-test cannot be used, we can use the alternative test suggested by Durbin.
- However, the Monte Carlo study by Maddala and Rao suggests that this test does not have good power in those cases where the h-test cannot be used.
- On the other hand, in cases where the h-test can be used, Durbin's second test is almost as powerful.
- It is not often used because it involves more computation.
- However, we will show that Durbin's second test can be generalized to higher-order autoregressions, whereas the h-test cannot.
6.8 A General Test for Higher-Order Serial Correlation: The LM Test
- The h-test we have discussed is, like the Durbin-Watson test, a test for first-order autoregression.
- Breusch and Godfrey discuss some general tests that are easy to apply and are valid for very general hypotheses about the serial correlation in the errors.
- These tests are derived from a general principle called the Lagrange multiplier (LM) principle.
- A discussion of this principle is beyond the scope of this book. For the present we will explain what the test is.
- The test is similar to Durbin's second test that we have discussed.
- Consider the regression model with errors u_t = ρ₁u_{t-1} + … + ρ_p u_{t-p} + e_t   (6.14)
- We are interested in testing H₀: ρ₁ = ρ₂ = … = ρ_p = 0.
- The x's in equation (6.14) can include lagged dependent variables as well.
- The LM test is as follows.
- First, estimate (6.14) by OLS and obtain the least squares residuals û_t.
- Next, estimate the regression of û_t on the x's and û_{t-1}, …, û_{t-p}   (6.16)
- and test whether the coefficients of û_{t-1}, …, û_{t-p} are all zero.
- We take the conventional F-statistic and use p·F as χ² with p degrees of freedom.
- We use the χ²-test rather than the F-test because the LM test is a large-sample test.
- The test can be used for different specifications of the error process.
- For instance, for the problem of testing for fourth-order autocorrelation, u_t = ρ₄u_{t-4} + e_t   (6.17)
- we just estimate the regression of û_t on the x's and û_{t-4}   (6.18)
- instead of (6.16), and test ρ₄ = 0.
- The test procedure is the same for autoregressive or moving-average errors.
- For instance, if we have a moving-average (MA) error u_t = e_t + δe_{t-4} instead of (6.17), the test procedure is still to estimate (6.18) and test the significance of the coefficient of û_{t-4}.
- In all these cases, we just test H₀ by estimating equation (6.16) with p = 2 and testing ρ₁ = ρ₂ = 0.
- What is of importance is the degree of the autoregression, not its nature.
- Finally, an alternative to the estimation of (6.16) is to estimate the equation (6.19), which regresses y_t (rather than û_t) on the x's and û_{t-1}, …, û_{t-p}.
- The LM test for serial correlation is:
- Estimate equation (6.14) by OLS and get the residuals û_t.
- Estimate equation (6.16) or (6.19) by OLS and compute the F-statistic for testing the hypothesis ρ₁ = ρ₂ = … = ρ_p = 0.
- Use p·F as χ² with p degrees of freedom.
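The steps above can be sketched as follows. One swap to note: instead of the p·F form, this sketch uses the asymptotically equivalent n·R² statistic from the auxiliary regression, also compared with a χ² with p degrees of freedom; function names, seed, and data values are mine.

```python
import numpy as np

def lm_serial_corr(y, X, p):
    """Breusch-Godfrey-style LM sketch: regress the OLS residuals on the
    original regressors plus p lagged residuals; return n*R^2, which is
    asymptotically chi-square with p degrees of freedom under H0."""
    def ols_resid(yv, Xv):
        b, *_ = np.linalg.lstsq(Xv, yv, rcond=None)
        return yv - Xv @ b
    u = ols_resid(y, X)
    # Lagged-residual columns, dropping the first p observations.
    lags = np.column_stack([u[p - s : len(u) - s] for s in range(1, p + 1)])
    Xa = np.column_stack([X[p:], lags])
    e = ols_resid(u[p:], Xa)
    tss = (u[p:] - u[p:].mean()) @ (u[p:] - u[p:].mean())
    return len(u[p:]) * (1 - (e @ e) / tss)

rng = np.random.default_rng(9)
n = 500
x = rng.standard_normal(n)
X = np.column_stack([np.ones(n), x])
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.6 * u[t - 1] + rng.standard_normal()

# Strongly autocorrelated errors: statistic far above the chi2(2)
# 5% critical value of 5.99.
print(lm_serial_corr(1 + 2 * x + u, X, p=2) > 5.99)  # True
```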
6.9 Strategies When the DW Test Statistic is Significant
- The DW test is designed as a test of the hypothesis ρ = 0 when the errors follow a first-order autoregressive process.
- However, the test has been found to be robust against other alternatives such as AR(2), MA(1), ARMA(1,1), and so on.
- Further, and more disturbingly, it catches specification errors like omitted variables that are themselves autocorrelated, and misspecified dynamics (a term that we will explain).
- Thus the strategy to adopt if the DW test statistic is significant is not clear. We discuss three different strategies:
- 1. Assume that the significant DW statistic is an indication of serial correlation, but that the serial correlation may not be due to AR(1) errors.
- 2. Test whether the serial correlation is due to omitted variables.
- 3. Test whether the serial correlation is due to misspecified dynamics.
- Errors Not AR(1)
- In case 1, since a significant DW statistic does not necessarily mean that the errors are AR(1), we should check for higher-order autoregressions by estimating equations of the form u_t = ρ₁u_{t-1} + ρ₂u_{t-2} + … + e_t.
- Once the order has been determined, we can estimate the model with the appropriate assumptions about the error structure by the methods described in Section 6.4.
- Moving-average (MA) errors and ARMA errors?
- Estimation with MA errors and ARMA errors is more complicated than with AR errors.
- However, researchers suggest that it is the order of the error process that is more important than its particular form. From the practical point of view, for most economic data it is sufficient to determine the order of an AR process.
- Thus, if a significant DW statistic is observed, the appropriate strategy would be to see whether the errors are generated by a higher-order AR process than AR(1), and then undertake estimation.
- Autocorrelation Caused by Omitted Variables
- Suppose that the true regression equation is y_t = β₀ + β₁x_t + β₂z_t + u_t, and instead we estimate y_t = β₀ + β₁x_t + v_t   (6.20)
- Then since v_t = β₂z_t + u_t, if the omitted variable z_t is autocorrelated, this will produce autocorrelation in v_t.
- Moreover, v_t is no longer independent of x_t.
- Thus not only are the OLS estimators of β₀ and β₁ from (6.20) inefficient, they are inconsistent as well.
- Serial Correlation Due to Misspecified Dynamics
- In a seminal paper published in 1964, Sargan pointed out that a significant DW statistic does not necessarily imply that we have a serial correlation problem.
- This point was also emphasized by Hendry and Mizon.
- The argument goes as follows. Consider y_t = βx_t + u_t, u_t = ρu_{t-1} + e_t   (6.24)
- where the e_t are independent with a common variance σ².
- We can write this model as y_t = ρy_{t-1} + βx_t − ρβx_{t-1} + e_t   (6.25)
- Consider an alternative stable dynamic model y_t = β₁y_{t-1} + β₂x_t + β₃x_{t-1} + e_t   (6.26)
- Equation (6.25) is the same as equation (6.26) with the restriction β₁β₂ + β₃ = 0   (6.27)
- A test for ρ = 0 is a test for β₁ = 0 (and β₃ = 0).
- But before we test this, what Sargan says is that we should first test the restriction (6.27), and test for ρ = 0 only if that hypothesis is not rejected.
- If the hypothesis is rejected, we do not have a serial correlation model, and the serial correlation in the errors in (6.24) is due to misspecified dynamics, that is, the omission of the variables y_{t-1} and x_{t-1} from the equation.
- If the DW test statistic is significant, a proper approach is to test the restriction (6.27) to make sure that what we have is a serial correlation model, before we undertake any autoregressive transformation of the variables.
- In fact, Sargan suggests starting with the general model (6.26) and testing the restriction (6.27) first, before attempting any test for serial correlation.
- Illustrative Example
- Consider the data in Table 3.11 and the estimation of the production function (4.24).
- In Section 6.4 we presented estimates of the equation assuming that the errors are AR(1). This was based on a DW test statistic of 0.86.
- Suppose that we estimate an equation of the form (6.26). The results are as follows (all variables in logs; figures in parentheses are standard errors):
- Under the assumption that the errors are AR(1), the residual sum of squares obtained from the Hildreth-Lu procedure we used in Section 6.4 is RSS₁ = 0.02635.
- Since we have two slope coefficients, we have two restrictions of the form (6.27).
- Note that for the general dynamic model we are estimating six parameters (α and five β's). For the serial correlation model we are estimating four parameters (α, two β's, and ρ).
- We will use the likelihood ratio (LR) test.
- Here −2 log_e λ has a χ²-distribution with d.f. = 2 (the number of restrictions).
- In our example the computed −2 log_e λ is significant at the 1% level.
- Thus the hypothesis of first-order autocorrelation is rejected.
- Although the DW statistic is significant, this does not mean that the errors are AR(1).
6.10 Trends and Random Walks
- Throughout our discussion we have assumed that E(u_t) = 0 and var(u_t) = σ² for all t, and cov(u_t, u_{t+k}) = ρ_k σ² for all t and k, where ρ_k is the serial correlation of lag k (this is simply a function of the lag k and does not depend on t).
- If these assumptions are satisfied, the series u_t is called covariance stationary (covariances are constant over time), or just stationary.
- Many economic time series are clearly nonstationary in the sense that the mean and variance depend on time, and they tend to depart ever further from any given value as time goes on.
- If this movement is predominantly in one direction (up or down), we say that the series exhibits a trend.
- More detailed discussion of the topics covered briefly here can be found in Chapter 14.
6.10 Trends and Random Walks
- Nonstationary time series are frequently
de-trended before further analysis is done. - There are two procedures used for de-trending:
- Estimating regressions on time.
- Successive differencing.
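Both procedures can be sketched on a simulated trend-stationary series (the intercept, slope, and sample size below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200
t = np.arange(T)

# Hypothetical trend-stationary series: y_t = 2 + 0.5 t + u_t, u_t iid N(0, 1)
y = 2.0 + 0.5 * t + rng.standard_normal(T)

# (1) Regression on time: the de-trended series is the least squares residual
slope, intercept = np.polyfit(t, y, 1)
resid = y - (intercept + slope * t)

# (2) Successive differencing: one difference turns the linear trend into
# a constant equal to the slope
dy = np.diff(y)
```

The regression residuals average zero by construction, while the differenced series fluctuates around the trend slope 0.5.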
6.10 Trends and Random Walks
- In the regression approach it is assumed that the
series yt is generated by the mechanism
yt = f(t) + ut, - where f(t) is the trend and ut is a
stationary series with mean zero and variance
σ². - Let us suppose that f(t) is linear, so that we have
yt = α + βt + ut (6.29)
6.10 Trends and Random Walks
- Note that the trend-eliminated series is ût,
the least squares residuals, which satisfy the
relationship Σût = 0.
- If differencing is used to eliminate the trend we
get Δyt = β + ut - ut-1. - We have to take a first difference again to
eliminate β, and we get Δ²yt = ut - 2ut-1 + ut-2
as the de-trended series.
6.10 Trends and Random Walks
- On the other hand, suppose we assume that yt is
generated by the model
- yt = yt-1 + β + et (6.30)
- where et is a stationary series with mean zero
and variance σ². - In this case the first difference of yt is
stationary with mean β. - This model is also known as the random-walk
model. - Accumulating yt starting with an initial value y0,
we get from equation (6.30)
6.10 Trends and Random Walks
- yt = y0 + βt + (e1 + e2 + ⋯ + et) (6.31)
- which has the same form as (6.29) except for
the fact that the disturbance is not stationary:
it has variance tσ², which increases over time. - Nelson and Plosser call model (6.29) a
trend-stationary process (TSP) and model (6.30) a
difference-stationary process (DSP).
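The growing variance of the DSP model can be checked by simulating many independent drifting random walks (the drift and horizon below are arbitrary illustrative choices) and computing the cross-path variance at two dates:

```python
import numpy as np

rng = np.random.default_rng(2)
n_paths, T, beta, sigma = 5000, 400, 0.1, 1.0

# Many independent DSP paths: y_t = y_{t-1} + beta + e_t, y_0 = 0
e = rng.normal(0.0, sigma, size=(n_paths, T))
y = np.cumsum(beta + e, axis=1)

# Across paths, Var(y_t) grows like t * sigma^2
var_100 = y[:, 99].var()     # theoretical value: 100
var_400 = y[:, 399].var()    # theoretical value: 400
```

Quadrupling t roughly quadruples the variance, exactly the tσ² behavior that distinguishes the DSP model from a TSP model.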
6.10 Trends and Random Walks
- Both models exhibit a linear trend, but the
appropriate method of eliminating the trend
differs. - To test the hypothesis that a time series belongs
to the TSP class against the alternative that it
belongs to the DSP class, Nelson and Plosser use
a test developed by Dickey and Fuller, based on
the equation yt = α + βt + ρyt-1 + et, - which belongs to the DSP class if ρ = 1, β = 0, and to
the TSP class if ρ < 1.
6.10 Trends and Random Walks
- Thus we have to test the hypothesis ρ = 1, β = 0
against ρ < 1. - The problem here is that we cannot use the usual
least squares distribution theory when ρ = 1. - Dickey and Fuller show that the least squares
estimate of ρ is not distributed around unity
under the DSP hypothesis (that is, the true value
ρ = 1) but rather around a value less than one. - However, the negative bias diminishes as the
number of observations increases.
6.10 Trends and Random Walks
- They tabulate the significance points for testing
the hypothesis ρ = 1 against ρ < 1. - Nelson and Plosser applied the Dickey-Fuller test
to a wide range of historical time series for the
U.S. economy and found that the DSP hypothesis
was accepted in all cases, with the exception of
the unemployment rate. - They conclude that for most economic time series
the DSP model is more appropriate.
6.10 Trends and Random Walks
- The problem of testing the hypothesis ρ = 1 in the
first-order autoregressive equation of the form
yt = ρyt-1 + et - is called testing for unit roots.
- There is an enormous literature on this problem,
but one of the most commonly used tests is the
Dickey-Fuller test. - The standard expression for the large sample
variance of the least squares estimator ρ̂ is
(1 - ρ²)/n, which would be zero under the null
hypothesis. - Hence, one needs to derive the limiting
distribution of ρ̂ under H0: ρ = 1 to apply the
test.
Three Types of RW
- RW without drift: Yt = Yt-1 + ut
- RW with drift: Yt = α + Yt-1 + ut
- RW with drift and time trend: Yt = α + βt + Yt-1 + ut
- ut ~ iid(0, σ²)
- An example
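The three variants can be simulated from one common shock series (a sketch; the drift 0.2 and trend coefficient 0.01 are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(3)
T = 300
u = rng.standard_normal(T)       # u_t ~ iid(0, 1)
t = np.arange(1, T + 1)

# Three random-walk variants built from the same shocks
rw_plain = np.cumsum(u)                    # Y_t = Y_{t-1} + u_t
rw_drift = np.cumsum(0.2 + u)              # Y_t = 0.2 + Y_{t-1} + u_t
rw_trend = np.cumsum(0.2 + 0.01 * t + u)   # Y_t = 0.2 + 0.01 t + Y_{t-1} + u_t
```

Because the shocks are shared, the drift series ends exactly 0.2·T above the plain walk, and the trend series accumulates the additional βt term on top of that.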
Augmented D-F (ADF) Tests
- Yt = a1Yt-1 + ut
- Yt - Yt-1 = (a1 - 1)Yt-1 + ut
- ΔYt = (a1 - 1)Yt-1 + ut
- ΔYt = γYt-1 + ut
- H0: a1 = 1, i.e., H0: γ = 0
- ΔYt = γYt-1 + Σ δi ΔYt-i + ut
- Unit root test: an example
- Limitations of ADF Tests
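The ΔYt = γYt-1 + ut regression above can be coded directly (a minimal sketch without the augmentation lags or a constant; simulated data, not the class example):

```python
import numpy as np

rng = np.random.default_rng(4)
T = 500

def df_regression(y):
    """Regress dY_t on Y_{t-1} (no constant); return (gamma_hat, t_stat)."""
    dy, ylag = np.diff(y), y[:-1]
    gamma = (ylag @ dy) / (ylag @ ylag)
    resid = dy - gamma * ylag
    s2 = (resid @ resid) / (len(dy) - 1)
    return gamma, gamma / np.sqrt(s2 / (ylag @ ylag))

e = rng.standard_normal(T)
walk = np.cumsum(e)               # unit root: true gamma = 0
ar = np.zeros(T)                  # stationary AR(1): true gamma = -0.5
for i in range(1, T):
    ar[i] = 0.5 * ar[i - 1] + e[i]

g_walk, t_walk = df_regression(walk)
g_ar, t_ar = df_regression(ar)
# Compare the t statistics with Dickey-Fuller critical values (about
# -1.95 at 5% for this no-constant case), not the normal table.
```

For the stationary series γ̂ sits near -0.5 with a strongly negative t statistic, while for the random walk γ̂ is close to zero, illustrating H0: γ = 0.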
6.10 Trends and Random Walks
- Spurious Trends
- If β = 0 in equation (6.30), the model is called a
trendless random walk or a random walk with zero
drift. - However, from equation (6.31) note that even
though there is no trend in the mean, there is a
trend in the variance. - Suppose that the true model is of the DSP type
with β = 0. - What happens if we estimate a TSP type model?
6.10 Trends and Random Walks
- That is, the true model is one with no trend in
the mean but only a trend in the variance, and we
estimate a model with a trend in the mean but no
trend in the variance. - It is intuitively clear that the trend in the
variance will be transmitted to the mean and we
will find a significant coefficient for t even
though in reality there is no trend in the mean. - How serious is this problem?
6.10 Trends and Random Walks
- Nelson and Kang analyze this and conclude that
- 1. Regression of a random walk on time by least
squares will produce R² values of around 0.44
regardless of sample size when, in fact, the mean
of the variable has no relationship with time
whatever. - 2. In the case of random walks with drift, that
is, β ≠ 0, the R² will be higher and will increase
with the sample size, reaching one in the limit
regardless of the value of β.
6.10 Trends and Random Walks
- 3. The residual from the regression on time, which
we take as the de-trended series, has on
average only about 14% of the true stochastic
variance of the original series. - 4. The residuals from the regression on time are
also autocorrelated, being roughly (1 - 10/N) at lag
one, where N is the sample size.
6.10 Trends and Random Walks
- 5. Conventional t-tests to test the significance
of some of the regressors are not valid. They
tend to reject the null hypothesis of no
dependence on time, with very high frequency.
6.10 Trends and Random Walks
- 6. Regression of one random walk on another, with
time included for trend, is strongly subject to
the spurious regression phenomenon. That is, the
conventional t-test will tend to indicate a
relationship between the variables when none is
present. - A spurious regression example
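A Monte Carlo sketch of this spurious regression phenomenon (illustrative, not the example used in class): regress one random walk on another many times and count how often the conventional t-test "finds" a relationship:

```python
import numpy as np

rng = np.random.default_rng(5)
n_sims, T = 500, 100
rejections = 0

for _ in range(n_sims):
    # Two independent random walks: there is no true relationship
    y = np.cumsum(rng.standard_normal(T))
    x = np.cumsum(rng.standard_normal(T))
    X = np.column_stack([np.ones(T), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    s2 = (resid @ resid) / (T - 2)
    se_b1 = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    if abs(b[1] / se_b1) > 1.96:
        rejections += 1

# With iid data this would be about 0.05; for random walks it is far higher
rejection_rate = rejections / n_sims
```

The rejection rate comes out many times the nominal 5% level, which is exactly the sense in which the conventional t-test is misleading here.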
6.10 Trends and Random Walks
- The main conclusion is that using a regression on
time has serious consequences when, in fact, the
time series is of the DSP type and, hence,
differencing is the appropriate procedure for
trend elimination. - Plosser and Schwert also argue that with most
economic time series it is always best to work
with differenced data rather than data in levels. - The reason is that if indeed the data series are
of the DSP type, the errors in the levels
equation will have variances increasing over time.
6.10 Trends and Random Walks
- Under these circumstances many of the properties
of least squares estimators as well as tests of
significance are invalid. - On the other hand, suppose that the levels
equation is correctly specified. Then all
differencing will do is produce a moving average
error, and at worst ignoring it will give
inefficient estimates. - For instance, suppose that we have the model
yt = α + βxt + ut
6.10 Trends and Random Walks
- where the ut are independent with mean zero and
common variance σ². - If we difference this equation, we get
- Δyt = βΔxt + (ut - ut-1)
- where the error is a moving average and,
hence, not serially independent. - But estimating the first-difference equation by
least squares still gives us unbiased and consistent
estimates.
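This point can be checked with a small simulation (a sketch; the parameter values and the persistent regressor are illustrative choices): difference a correctly specified levels equation and estimate the slope by OLS on the differences:

```python
import numpy as np

rng = np.random.default_rng(6)
T, alpha, beta = 2000, 1.0, 2.0

# Correctly specified levels model with iid errors
x = np.cumsum(rng.standard_normal(T))   # a persistent regressor
u = rng.standard_normal(T)
y = alpha + beta * x + u

# Differencing gives dy_t = beta * dx_t + (u_t - u_{t-1}): an MA(1) error,
# so OLS on differences is inefficient but still consistent
dy, dx = np.diff(y), np.diff(x)
beta_diff = (dx @ dy) / (dx @ dx)

# Lag-1 autocorrelation of the differenced-equation error is about -0.5
err = dy - beta_diff * dx
ac1 = np.corrcoef(err[:-1], err[1:])[0, 1]
```

The slope estimate stays close to the true β even though the differenced error is clearly autocorrelated, matching the "inefficient but consistent" conclusion above.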
6.10 Trends and Random Walks
- Thus, the consequences of differencing when it is
not needed are much less serious than those of
failing to difference when it is appropriate
(when the true model is of the DSP type). - In practice, it is best to use the Dickey-Fuller
test to check whether the data are of the DSP or TSP
type. - Failing that, it is better to use differencing and
regressions in first differences, rather than
regressions in levels with time as an extra
explanatory variable.
6.10 Trends and Random Walks
- The Concept of Cointegration: Differencing vs.
Long-Run Effects - One drawback of the procedure of differencing is
that it results in a loss of valuable "long-run
information" in the data. - Recently, the concept of cointegrated series has
been suggested as one solution to this problem.
First, we need to define the term
"cointegration."
6.10 Trends and Random Walks
- If yt follows a random walk model, that is,
yt = yt-1 + et, - then we get by successive substitution
yt = y0 + e1 + e2 + ⋯ + et. - Thus, yt is a summation of the ej, and
Var(yt) = tσ².
6.10 Trends and Random Walks
- Yt ~ I(1)
- Yt is a random walk
- ΔYt is white noise (iid)
- No one can predict the future price change
- The market is efficient
- The impact of a previous shock on the price will
remain and not decay to zero
6.10 Trends and Random Walks
- We say in this case that yt is I(1), integrated of
order one. If yt is I(1) and we add to it zt,
which is I(0), then yt + zt will be I(1). - When we specify regression models in time series,
we have to make sure that the different variables
are integrated to the same degree. Otherwise, the
equation does not make sense. - For instance, if we specify the regression model
- yt = βxt + ut (6.34)
- and we say that ut is I(0), we have to make sure
that yt and xt are integrated to the same order.
6.10 Trends and Random Walks
- For instance, if yt is I(1) and xt is I(0), there
will not be any β that will satisfy
relationship (6.34). - Suppose yt is I(1) and xt is I(1); then if there
is a nonzero β such that yt - βxt is I(0),
yt and xt are said to be cointegrated.
6.10 Trends and Random Walks
- Suppose that yt and xt are both random walks, so
that they are both I(1). Then an equation in first
differences of the form
- Δyt = αΔxt + γ(yt-1 - βxt-1) + vt (6.35)
- is a valid equation, since Δyt, Δxt,
(yt - βxt), and vt are all I(0). - Equation (6.34) is considered a long-run
relationship between yt and xt, and
equation (6.35) describes short-run dynamics. - Engle and Granger suggest estimating (6.34) by
ordinary least squares, obtaining the estimator β̂,
and substituting it in equation (6.35) to
estimate the parameters α and γ.
6.10 Trends and Random Walks
- This two-step estimation procedure, however,
rests on the assumption that yt and xt are
cointegrated. - It is, therefore, important to test for
cointegration. - Engle and Granger suggest estimating (6.34) by
ordinary least squares and getting the residuals ût.
6.10 Trends and Random Walks
- The test amounts to testing the hypothesis ρ = 1 in
ût = ρût-1 + et, - that is, testing the hypothesis
- H0: ut is I(1)
- In essence, we are testing the null hypothesis that
yt and xt are not cointegrated. - Note that yt is I(1) and xt is I(1), so we are
trying to see whether ut is not I(1).
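The first step and the residual-based test can be sketched on a simulated cointegrated pair (the constant 2, slope 0.8, and AR coefficient 0.3 below are hypothetical values chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
T = 1000

# Hypothetical cointegrated pair: x_t is I(1) and y_t = 2 + 0.8 x_t + u_t
# with u_t a stationary AR(1), so y_t - 0.8 x_t is I(0)
x = np.cumsum(rng.standard_normal(T))
e = rng.standard_normal(T)
u = np.zeros(T)
for i in range(1, T):
    u[i] = 0.3 * u[i - 1] + e[i]
y = 2.0 + 0.8 * x + u

# Step 1: estimate the long-run relation by OLS and keep the residuals
X = np.column_stack([np.ones(T), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
uhat = y - X @ coef

# Residual-based test: regress d(uhat) on uhat_{-1}; a coefficient well
# below zero speaks against H0: u_t is I(1) (compare with Engle-Granger
# critical values, not the ordinary t table)
duhat, ulag = np.diff(uhat), uhat[:-1]
rho_minus_1 = (ulag @ duhat) / (ulag @ ulag)
```

The step-1 slope lands close to the true cointegrating coefficient, and the residual regression coefficient is far below zero, consistent with rejecting "no cointegration" in this constructed example.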
Cointegration
- Co-integration test: an example
- Run the VECM (vector error correction model) in
EViews
Homework
- Find the spot and futures prices
- 5-year daily data at least
- Run the cointegration test
- Run the VECM
- Check the lead-lag relationship using the EC
parameter estimate
6.11 ARCH Models and Serial Correlation
- We saw in Section 6.9 that a significant DW
statistic can arise through a number of
misspecifications. - We will now discuss one other source. This is the
ARCH model suggested by Engle (1982), which has,
in recent years, been found useful in the
analysis of speculative prices. - ARCH stands for "autoregressive conditional
heteroskedasticity." - Robert Engle and Clive Granger
- NOBEL PRIZE WINNERS FOR ECONOMICS, 2003
6.11 ARCH Models and Serial Correlation
- GARCH (p,q) Model (by Bollerslev, 1986)
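The GARCH(1,1) recursion can be sketched by simulation (parameter values are hypothetical; the point is the conditional-variance recursion and the unconditional variance it implies):

```python
import numpy as np

rng = np.random.default_rng(8)
T = 20000
omega, alpha, beta = 0.05, 0.1, 0.8   # persistence alpha + beta = 0.9

# GARCH(1,1): h_t = omega + alpha * eps_{t-1}^2 + beta * h_{t-1},
# with eps_t = sqrt(h_t) * z_t and z_t iid N(0, 1)
z = rng.standard_normal(T)
h = np.empty(T)
eps = np.empty(T)
h[0] = omega / (1.0 - alpha - beta)   # start at the unconditional variance
eps[0] = np.sqrt(h[0]) * z[0]
for t in range(1, T):
    h[t] = omega + alpha * eps[t - 1] ** 2 + beta * h[t - 1]
    eps[t] = np.sqrt(h[t]) * z[t]

# Unconditional variance: omega / (1 - alpha - beta)
uncond_var = omega / (1.0 - alpha - beta)
```

When alpha + beta approaches one, as in the empirical estimates discussed below, this unconditional variance blows up, which is the high-persistence problem in variance settings.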
6.11 ARCH Models and Serial Correlation
- The high level of persistence in GARCH models:
the sum of the two GARCH parameter estimates
approximates unity in most cases. - Li and Lin (2003): this finding provides some
support for the notion that GARCH models are
handicapped by the inability to account for
structural changes during the estimation period
and thus suffer from a high-persistence problem
in variance settings. - An example: GARCH (1,1)
6.11 ARCH Models and Serial Correlation
- Find the stock returns
- 5-year daily data at least
- Run the GARCH(1,1) model
- Check the sum of the two GARCH parameter
estimates - Parameter estimates
- Graph the time-varying variance estimates