Autocorrelation

1
Chapter 6
  • Autocorrelation

2
What is in this Chapter?
  • How do we detect this problem?
  • What are the consequences?
  • What are the solutions?

3
What is in this Chapter?
  • Regarding the problem of detection, we start with
    the Durbin-Watson (DW) statistic, and discuss its
    several limitations and extensions.
  • Durbin's h-test for models with lagged dependent
    variables
  • Tests for higher-order serial correlation.
  • We discuss (in Section 6.5) the consequences of
    serially correlated errors for OLS estimators.

4
What is in this Chapter?
  • The solutions to the problem of serial
    correlation are discussed in
  • Section 6.3 estimation in levels versus first
    differences
  • Section 6.9 strategies when the DW test
    statistic is significant
  • Section 6.10 trends and random walks
  • This chapter is very important and its several
    ideas have to be understood thoroughly.

5
6.1 Introduction
  • The order of autocorrelation
  • In the following sections we discuss how to
  • 1. Test for the presence of serial correlation.
  • 2. Estimate the regression equation when the
    errors are serially correlated.

6
6.2 Durbin-Watson Test
7
6.2 Durbin-Watson Test
  • The sampling distribution of d depends on the values
    of the explanatory variables, and hence Durbin and
    Watson derived an upper limit dU and a lower
    limit dL for the significance level of d.
  • There are tables to test the hypothesis of zero
    autocorrelation against the hypothesis of
    first-order positive autocorrelation. (For
    negative autocorrelation the same tables are used
    with 4 - d in place of d.)

8
6.2 Durbin-Watson Test
  • If d < dL, we reject the null hypothesis
    of no autocorrelation.
  • If d > dU, we do not reject the null
    hypothesis.
  • If dL < d < dU, the test is
    inconclusive.

9
6.2 Durbin-Watson Test
  • Although we have said that d ≈ 2(1 - ρ), this
    approximation is valid only in large samples.
  • The mean of d when ρ = 0 has been shown to
    be given approximately by (the proof is rather
    complicated for our purpose)
    E(d) = 2 + 2(k - 1)/(n - k)
  • where k is the number of regression
    parameters estimated (including the constant
    term) and n is the sample size.

10
6.2 Durbin-Watson Test
  • Thus, even for zero serial correlation, the
    statistic is biased upward from 2.
  • If k = 5 and n = 15, the bias is as large as 0.8.
  • We illustrate the use of the DW test with an
    example.

11
6.2 Durbin-Watson Test
  • Illustrative Example
  • Consider the data in Table 3.11. The estimated
    production function is the one reported in
    Chapter 4, equation (4.24).
  • Referring to the DW tables with k = 2 and n = 39 at
    the 5% significance level, we obtain the bounds
    dL and dU.
  • Since the observed d = 0.86 < dL, we
    reject the hypothesis ρ = 0 at the 5% level.

12
6.2 Limitations of D-W Test
  • It tests only for first-order serial correlation.
  • The test is inconclusive if the computed value
    lies between dL and dU.
  • The test cannot be applied in models with lagged
    dependent variables.

13
6.3 Estimation in Levels Versus First Differences
  • Simple solutions to the serial correlation
    problem: first differencing
  • If the DW test rejects the hypothesis of zero
    serial correlation, what is the next step?
  • In such cases one estimates a regression by
    transforming all the variables by
  • ρ-differencing (quasi-first differencing), or
  • First differencing

14
6.3 Estimation in Levels Versus First Differences
15
6.3 Estimation in Levels Versus First Differences
16
6.3 Estimation in Levels Versus First Differences
  • When comparing equations in levels and first
    differences, one cannot compare the R2 because
    the explained variables are different.
  • One can compare the residual sums of squares, but
    only after making a rough adjustment. (Please
    refer to p. 231.)

17
6.3 Estimation in Levels Versus First Differences
18
6.3 Estimation in Levels Versus First Differences
  • For instance, if the residual sum of squares is,
    say, 1.2 for the levels equation and 0.8 for the
    first-difference equation, and n = 11, k = 1,
    DW = 0.9, then the adjusted residual sum of squares
    for the levels equation is (9/10)(0.9)(1.2) = 0.97,
    which is the number to be compared with 0.8.

19
6.3 Estimation in Levels Versus First Differences
  • Since we have comparable residual sums of squares
    (RSS), we can get comparable R2 values as well,
    using the relationship RSS = Syy(1 - R2)

20
6.3 Estimation in Levels Versus First Differences
  • Let
  • RD2 = R2 from the first-difference equation
  • RSS1 = adjusted residual sum of squares from the
    levels equation
  • RSS2 = residual sum of squares from the
    first-difference equation
  • RL2 = comparable R2 from the levels
    equation
  • Then RL2 = 1 - (RSS1/RSS2)(1 - RD2)

21
6.3 Estimation in Levels Versus First Differences
  • Illustrative Examples
  • Consider the simple Keynesian model discussed by
    Friedman and Meiselman. The equation estimated in
    levels is Ct = α + βAt + ut
  • where Ct = personal consumption expenditure
    (current dollars)
  • At = autonomous expenditure
    (current dollars)

22
6.3 Estimation in Levels Versus First Differences
  • The model fitted for the 1929-1939 period gave the
    following (figures in parentheses are standard errors)

23
6.3 Estimation in Levels Versus First Differences
  • This is to be compared with the R2
    from the equation in first differences.

24
6.3 Estimation in Levels Versus First Differences
  • For the production function data in Table 3.11
    the first difference equation is
  • The comparable figures for the levels equation
    reported earlier in Chapter 4, equation (4.24), are

25
6.3 Estimation in Levels Versus First Differences
  • This is to be compared with the R2
    from the equation in first differences.

26
6.3 Estimation in Levels Versus First Differences
  • Harvey gives a different definition of RD2. He
    defines it as
  • This does not adjust for the fact that the error
    variances in the levels equation and the
    first-difference equation are not the same.
  • The arguments for his suggestion are given in his
    paper.

27
6.3 Estimation in Levels Versus First Differences
  • In the example with the Friedman-Meiselman data
    his measure of RD2 is given by
  • Although RD2 cannot be greater than 1, it can
    be negative.
  • This would be the case when the levels model
    gives a poorer explanation than the naïve model,
    which says that Δyt is a constant.

28
6.3 Estimation in Levels Versus First Differences
  • Usually, with time-series data, one gets high R2
    values if the regressions are estimated with the
    levels yt and Xt but one gets low R2 values if
    the regressions are estimated in first
    differences (yt - yt-1) and (xt - xt-1).

29
6.3 Estimation in Levels Versus First Differences
  • Since a high R2 is usually considered as proof of
    a strong relationship between the variables under
    investigation, there is a strong tendency to
    estimate the equations in levels rather than in
    first differences.
  • This is sometimes called the "R2 syndrome."
  • An example

30
6.3 Estimation in Levels Versus First Differences
  • However, if the DW statistic is very low, it
    often implies a misspecified equation, no matter
    what the value of the R2 is.
  • In such cases one should estimate the regression
    equation in first differences; if the R2 is
    low, this merely indicates that the variables y
    and x are not related to each other.

31
6.3 Estimation in Levels Versus First Differences
  • Granger and Newbold present some examples with
    artificially generated data where y, x, and the
    error u are each generated independently so that
    there is no relationship between y and x.
  • But the correlations between yt and yt-1, xt and
    xt-1, and ut and ut-1 are very high.
  • Although there is no relationship between y and x
    the regression of y on x gives a high R2 but a
    low DW statistic.

32
6.3 Estimation in Levels Versus First Differences
  • When the regression is run in first differences,
    the R2 is close to zero and the DW statistic is
    close to 2, demonstrating that there is indeed no
    relationship between y and x and that the R2
    obtained earlier is spurious.
  • Thus regressions in first differences might often
    reveal the true nature of the relationship
    between y and x.
  • An example

33
Homework
  • Find the data
  • Y is the Taiwan stock index
  • X is the U.S. stock index
  • Run two equations
  • The equation in levels (log-based prices)
  • The equation in first differences
  • A comparison between the two equations
  • The beta estimate and its significance
  • The R square
  • The value of the DW statistic
  • Q: Adopt the equation in levels or in first
    differences? (A sketch of this exercise follows.)
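
A minimal sketch of this exercise in Python with statsmodels; the file
name and column names below are hypothetical placeholders for your own
data:

```python
# Hedged sketch of the homework: compare the levels regression with the
# first-difference regression on daily index data.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

df = pd.read_csv("stock_indices.csv")        # hypothetical file
y = np.log(df["taiwan_index"])               # log-based prices
x = np.log(df["us_index"])

def report(name, yv, xv):
    res = sm.OLS(yv, sm.add_constant(xv)).fit()
    print(name, "beta:", res.params.iloc[1], "t:", res.tvalues.iloc[1],
          "R2:", res.rsquared, "DW:", durbin_watson(res.resid))

report("levels", y, x)
report("first differences", y.diff().dropna(), x.diff().dropna())
```

A very low DW in the levels equation, together with a much weaker beta
in first differences, would point toward the spurious-regression reading
discussed above.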

34
6.3 Estimation in Levels Versus First Differences
  • For instance, suppose that we have quarterly
    data; then it is possible that the errors in any
    quarter this year are most highly correlated with
    the errors in the corresponding quarter last year
    rather than with the errors in the preceding quarter.
  • That is, ut could be uncorrelated with ut-1 but
    highly correlated with ut-4.
  • If this is the case, the DW statistic will fail
    to detect it.

35
6.3 Estimation in Levels Versus First Differences
  • What we should be using is a modified statistic,
    defined, for lag-four correlation, as
    d4 = Σ(ût - ût-4)² / Σût²
  • The quarterly data (e.g. GDP): use d4
  • The monthly data (e.g. the industrial production
    index): use d12, defined analogously

36
6.4 Estimation Procedures with Autocorrelated
Errors
  • Now we will derive var(ut) and the correlations
    between ut and lagged values of ut.
  • From equation (6.1), ut = ρut-1 + et, note that ut
    depends on et and ut-1, ut-1 depends on et-1 and
    ut-2, and so on.

37
6.4 Estimation Procedures with Autocorrelated
Errors
  • Thus ut depends on et, et-1, et-2, .... Since the et
    are serially independent, and ut-1 depends on
    et-1, et-2, and so on, but not on et, we have
    E(ut-1 et) = 0.
  • Since E(et) = 0, we have E(ut) = 0
    for all t.

38
6.4 Estimation Procedures with Autocorrelated
Errors
  • If we denote var(ut) by σu², we have
    σu² = ρ²σu² + σe²
  • Thus we have σu² = σe² / (1 - ρ²)
  • This gives the variance of ut in terms of the
    variance of et and the parameter ρ.

39
6.4 Estimation Procedures with Autocorrelated
Errors
  • Let us now derive the correlations. Denoting the
    correlation between ut and ut-s (which is called
    the correlation of lag s) by ρs, we get ρ1 from
  • cov(ut, ut-1) = E[(ρut-1 + et) ut-1] = ρσu²
  • Hence ρ1 = ρ
  • or, more generally, ρs = ρ·ρs-1

40
6.4 Estimation Procedures with Autocorrelated
Errors
  • Since ρ1 = ρ, we get by successive substitution
    ρs = ρ^s
  • Thus the lag correlations are all powers of ρ
    and decline geometrically (see the simulation below).
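
This geometric decline is easy to check by simulation; a small sketch:

```python
# Simulate u_t = rho * u_{t-1} + e_t and compare the sample lag-s
# correlations with the theoretical values rho**s.
import numpy as np

rng = np.random.default_rng(0)
rho, n = 0.7, 200_000
e = rng.normal(size=n)
u = np.empty(n)
u[0] = e[0]
for t in range(1, n):
    u[t] = rho * u[t - 1] + e[t]

for s in range(1, 5):
    sample = np.corrcoef(u[s:], u[:-s])[0, 1]
    print(f"lag {s}: sample {sample:.3f}   theory {rho**s:.3f}")
```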

41
6.4 Estimation Procedures with Autocorrelated
Errors
  • GLS (Generalized least squares)

42
6.4 Estimation Procedures with Autocorrelated
Errors
43
6.4 Estimation Procedures with Autocorrelated
Errors
  • In actual practice ρ is not known
  • There are two types of procedures for estimating ρ:
  • 1. Iterative procedures
  • 2. Grid-search procedures

44
6.4 Estimation Procedures with Autocorrelated
Errors
  • Iterative Procedures
  • Among the iterative procedures, the earliest was
    the Cochrane-Orcutt (C-O) procedure.
  • In the Cochrane-Orcutt procedure we estimate
    equation (6.2) by OLS, get the estimated residuals
    ût, and estimate ρ̂ = Σût ût-1 / Σût².
    (A code sketch of the iteration follows.)
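
A minimal sketch of the iterative idea (not the textbook's code; it
assumes y and x are NumPy arrays for the model y_t = α + βx_t + u_t):

```python
# Iterative Cochrane-Orcutt sketch: alternate between quasi-differencing
# with the current rho and re-estimating rho from the residuals.
import numpy as np
import statsmodels.api as sm

def cochrane_orcutt(y, x, tol=1e-6, max_iter=50):
    rho = a = b = 0.0
    for _ in range(max_iter):
        ys, xs = y[1:] - rho * y[:-1], x[1:] - rho * x[:-1]
        res = sm.OLS(ys, sm.add_constant(xs)).fit()
        a = res.params[0] / (1.0 - rho)    # undo the (1 - rho) intercept scaling
        b = res.params[1]
        u = y - a - b * x                  # residuals of the original equation
        rho_new = np.sum(u[1:] * u[:-1]) / np.sum(u[:-1] ** 2)
        if abs(rho_new - rho) < tol:
            break
        rho = rho_new
    return a, b, rho

# statsmodels ships a close relative of this procedure:
# sm.GLSAR(y, sm.add_constant(x), rho=1).iterative_fit()
```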

45
6.4 Estimation Procedures with Autocorrelated
Errors
  • Durbin suggested an alternative method of
    estimating ρ.
  • In this procedure, we write equation (6.5) as
    yt = α(1 - ρ) + ρyt-1 + βxt - βρxt-1 + et
  • We regress yt on yt-1, xt, and xt-1,
    and take the estimated coefficient of yt-1 as
    an estimate of ρ.

46
6.4 Estimation Procedures with Autocorrelated
Errors
  • Use the transformed variables in equation (6.6)
    and estimate a regression of y* on x*.
  • The only thing to note is that the slope
    coefficient in this equation is β, but the
    intercept is α(1 - ρ).
  • Thus after estimating the regression of y* on x*,
    we have to adjust the constant term
    appropriately to get estimates of the parameters
    of the original equation (6.2).

47
6.4 Estimation Procedures with Autocorrelated
Errors
  • Further, the standard errors we compute from the
    regression of y* on x* are now asymptotic
    standard errors, because ρ has
    been estimated.
  • If there are lagged values of y as explanatory
    variables, these standard errors are not correct
    even asymptotically.
  • The adjustment needed in this case is discussed
    in Section 6.7.

48
6.4 Estimation Procedures with Autocorrelated
Errors
  • Grid-Search Procedures
  • One of the first grid-search procedures is the
    Hildreth and Lu procedure suggested in 1960.
  • The procedure is as follows. Calculate y* and x*
    in equation (6.6) for different values of ρ
    at intervals of 0.1 in the range
    -1 < ρ < 1.
  • Estimate the regression of y* on x* and
    calculate the residual sum of squares RSS in each
    case.

49
6.4 Estimation Procedures with Autocorrelated
Errors
  • Choose the value of ρ for which the RSS is
    minimum.
  • Again repeat this procedure for smaller intervals
    of ρ around this value.
  • For instance, if the value of ρ for which RSS
    is minimum is -0.4, repeat this search procedure
    for values of ρ at intervals of 0.01 in the
    range -0.5 < ρ < -0.3. (A code sketch follows.)
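
A sketch of the two-pass grid search, run here on placeholder simulated
data:

```python
# Hildreth-Lu sketch: coarse 0.1 grid over (-1, 1), then a finer 0.01
# grid around the RSS-minimizing value, as described above.
import numpy as np
import statsmodels.api as sm

def grid_rss(y, x, grid):
    results = []
    for rho in grid:
        ys, xs = y[1:] - rho * y[:-1], x[1:] - rho * x[:-1]
        results.append((sm.OLS(ys, sm.add_constant(xs)).fit().ssr, rho))
    return min(results)              # (minimum RSS, corresponding rho)

rng = np.random.default_rng(1)       # placeholder data
x = rng.normal(size=100)
y = 1.0 + 0.5 * x + rng.normal(size=100)

_, rho0 = grid_rss(y, x, np.arange(-0.9, 1.0, 0.1))
_, rho1 = grid_rss(y, x, np.arange(rho0 - 0.09, rho0 + 0.1, 0.01))
print(rho0, rho1)
```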

50
6.4 Estimation Procedures with Autocorrelated
Errors
  • This procedure is not the same as the ML
    procedure. If the errors et are normally
    distributed, we can write the log-likelihood
    function as (derivation is omitted)
    log L = const - (n/2) log σe² + (1/2) log(1 - ρ²) - Q/(2σe²)
  • where Q = (1 - ρ²)(y1 - α - βx1)²
    + Σ [(yt - ρyt-1) - α(1 - ρ) - β(xt - ρxt-1)]²
  • Thus minimizing Q is not the same as maximizing
    log L.
  • We can use the grid-search procedure to get the
    ML estimate.

51
6.4 Estimation Procedures with Autocorrelated
Errors
  • Consider the data in Table 3.11 and the
    estimation of the production function (4.24).
  • The OLS estimation gave a DW statistic of 0.86,
    suggesting significant positive autocorrelation.
  • Assuming that the errors were AR(1), two
    estimation procedures were used: the Hildreth-Lu
    grid search and the iterative Cochrane-Orcutt
    (C-O).

52
6.4 Estimation Procedures with Autocorrelated
Errors
  • The other procedures we have described can also
    be tried, but this is left as an exercise.
  • The Hildreth-Lu procedure and the iterative C-O
    procedure each gave an estimate of ρ.
  • The DW test statistic implied that
    ρ̂ ≈ 1 - d/2 = 1 - 0.86/2 = 0.57.

53
6.4 Estimation Procedures with Autocorrelated
Errors
  • The estimates of the parameters (with standard
    errors in parentheses) were as follows

54
6.4 Estimation Procedures with Autocorrelated
Errors
  • In this example the parameter estimates given by
    the Hildreth-Lu and iterative C-O procedures are
    pretty close to each other.
  • Correcting for the autocorrelation in the errors
    has resulted in a significant change in the
    parameter estimates.

55
6.5 Effect of AR(1) Errors on OLS Estimates
  • In Section 6.4 we described different procedures
    for the estimation of regression models with
    AR(1) errors
  • We will now answer two questions that might arise
    with the use of these procedures
  • 1. What do we gain from using these procedures?
  • 2. When should we not use these procedures?

56
6.5 Effect of AR(1) Errors on OLS Estimates
  • First, in the case we are considering (i.e., the
    case where the explanatory variable Xt is
    independent of the error ut), the OLS estimates
    are unbiased
  • However, they will not be efficient
  • Further, the tests of significance we apply,
    which will be based on the wrong covariance
    matrix, will be wrong.

57
6.5 Effect of AR(1) Errors on OLS Estimates
  • In the case where the explanatory variables
    include lagged dependent variables, we will have
    some further problems, which we discuss in
    Section 6.7
  • For the present, let us consider the simple
    regression model

58
6.5 Effect of AR(1) Errors on OLS Estimates
  • For the present, let us consider the simple
    regression model yt = βxt + ut
  • Let β̂ = Σxt yt / Σxt² (the OLS estimator)
  • If the ut are AR(1), we have
    var(β̂) = (σu²/Σxt²)[1 + 2ρ(Σxt xt-1/Σxt²)
    + 2ρ²(Σxt xt-2/Σxt²) + ...]    (6.10)

59
6.5 Effect of AR(1) Errors on OLS Estimates
60
6.5 Effect of AR(1) Errors on OLS Estimates
  • If we ignore the autocorrelation problem, we
    would be computing var(β̂) = σu²/Σxt².
    Thus we would be ignoring the expression in the
    parentheses of equation (6.10).
  • To get an idea of the magnitude of this
    expression, let us assume that the xt series also
    follows an AR(1) process with
    corr(xt, xt-1) = r.

61
6.5 Effect of AR(1) Errors on OLS Estimates
  • Since we are now assuming xt to be stochastic,
    we will consider the asymptotic variance of β̂.
  • The expression in parentheses in equation (6.10)
    is now approximately
    1 + 2ρr + 2ρ²r² + ... = (1 + ρr)/(1 - ρr)

62
6.5 Effect of AR(1) Errors on OLS Estimates
  • Thus asy var(β̂) = (σu²/Tσx²)·(1 + ρr)/(1 - ρr)
  • where T is the number of observations.
  • If ρ = r = 0.8, then
    (1 + ρr)/(1 - ρr) = 1.64/0.36 ≈ 4.56.
  • Thus ignoring the expression in the parentheses
    of equation (6.10) results in an underestimation
    of close to 78% for the variance of β̂.

63
6.5 Effect of AR(1) Errors on OLS Estimates
  • If ρ = 0, the usual residual-variance estimator
    is an unbiased estimate of σu².
  • If ρ ≠ 0, then under the assumptions we are
    making, this estimator is biased; with ρ and r
    positive it is biased downward.
  • Again, if ρ = r = 0.8,
    the bias is substantial.

64
6.5 Effect of AR(1) Errors on OLS Estimates
  • We can also derive the asymptotic variance of the
    ML estimator of β when both x and u are
    first-order autoregressive, as follows. Note that
    the ML estimator of β is asymptotically
    equivalent to the estimator obtained from a
    regression of (yt - ρyt-1) on (xt - ρxt-1).

65
6.5 Effect of AR(1) Errors on OLS Estimates
  • Hence asy var(β̂ML) = σe² / E[Σ(xt - ρxt-1)²]
  • where σe² = σu²(1 - ρ²)

66
6.5 Effect of AR(1) Errors on OLS Estimates
  • When xt is autoregressive we have
    E[Σ(xt - ρxt-1)²] ≈ Tσx²(1 - 2ρr + ρ²)
  • Hence by substitution we get the asymptotic
    variance of β̂ML as
    σu²(1 - ρ²) / [Tσx²(1 - 2ρr + ρ²)]

67
6.5 Effect of AR(1) Errors on OLS Estimates
  • Thus the efficiency of the OLS estimator is
    [(1 - ρ²)/(1 - 2ρr + ρ²)] · [(1 - ρr)/(1 + ρr)]
  • One can compute this for different values of
    ρ and r.
  • For ρ = r = 0.8 this efficiency is about 0.21.
    (A numerical check follows.)
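
A quick numerical check of this expression, as reconstructed above:

```python
# Efficiency of OLS relative to ML under AR(1) x and AR(1) u,
# using the expression given in the text.
def ols_efficiency(rho, r):
    return ((1 - rho**2) / (1 - 2*rho*r + rho**2)) * ((1 - rho*r) / (1 + rho*r))

print(ols_efficiency(0.8, 0.8))   # ~0.22, i.e. the "about 0.21" quoted above
```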

68
6.5 Effect of AR(1) Errors on OLS Estimates
  • Thus the consequences of autocorrelated errors
    are
  • 1. The least squares estimators are unbiased but
    are not efficient. Sometimes they are
    considerably less efficient than the procedures
    that take account of the autocorrelation
  • 2. The sampling variances are biased and
    sometimes likely to be seriously understated.
    Thus R2 as well as t and F statistics tend to be
    exaggerated.

69
6.5 Effect of AR(1) Errors on OLS Estimates
  • The solution to these problems is to use the
    maximum likelihood procedure (one-step procedure)
    or some other procedure mentioned earlier
    (two-step procedure) that takes account of the
    autocorrelation.
  • However, there are four important points to note

70
6.5 Effect of AR(1) Errors on OLS Estimates
  • 1. If ρ is known, it is true that one can get
    estimators better than OLS that take account of
    autocorrelation. However, in practice ρ is not
    known and has to be estimated. In small samples
    it is not necessarily true that one gains (in
    terms of mean-square error for β̂) by
    estimating ρ.
  • This problem has been investigated by Rao
    and Griliches, who suggest the rule of thumb (for
    samples of size 20) that one can use the methods
    that take account of autocorrelation if
    |ρ̂| ≥ 0.3, where ρ̂ is the estimated first-order
    serial correlation from an OLS regression. In
    samples of larger sizes it would be worthwhile
    using these methods for ρ̂ smaller than 0.3.

71
6.5 Effect of AR(1) Errors on OLS Estimates
  • 2. The discussion above assumes that the true
    errors are first-order autoregressive. If they
    have a more complicated structure (e.g.,
    second-order autoregressive), it might be thought
    that it would still be better to proceed on the
    assumption that the errors are first-order
    autoregressive rather than ignore the problem
    completely and use the OLS method.
  • Engle shows that this is not necessarily true
    (i.e., sometimes one can be worse off making the
    assumption of first-order autocorrelation than
    ignoring the problem completely).

72
6.5 Effect of AR(1) Errors on OLS Estimates
  • 3. In regressions with quarterly (or monthly) data,
    one might find that the errors exhibit fourth-
    (or twelfth-) order autocorrelation because of not
    making adequate allowance for seasonal effects.
    In such cases, if one looks for only first-order
    autocorrelation, one might not find any. This
    does not mean that autocorrelation is not a
    problem. In this case the appropriate
    specification for the error term may be
    ut = ρut-4 + et for quarterly data and
    ut = ρut-12 + et for monthly data.

73
6.5 Effect of AR(1) Errors on OLS Estimates
  • 4. Finally, and most important, it is often possible
    to confuse misspecified dynamics with serial
    correlation in the errors. For instance, a static
    regression model with first-order autocorrelation
    in the errors, that is, yt = βxt + ut,
    ut = ρut-1 + et, can be written as
    yt = ρyt-1 + βxt - ρβxt-1 + et

74
6.5 Effect of AR(1) Errors on OLS Estimates
  • This model is the same as
    yt = β1 yt-1 + β2 xt + β3 xt-1 + et    (6.11)
  • with the restriction β3 = -β1 β2.
  • We can estimate the model (6.11) and test
    this restriction. If it is rejected, clearly it
    is not valid to estimate the serial correlation
    model. (The test procedure is described in
    Section 6.8.)

75
6.5 Effect of AR(1) Errors on OLS Estimates
  • The errors would be serially correlated, but not
    because the errors follow a first-order
    autoregressive process: the terms xt-1
    and yt-1 have been omitted.
  • This is what is meant by misspecified dynamics.
    Thus significant serial correlation in the
    estimated residuals does not necessarily imply
    that we should estimate a serial correlation
    model.

76
6.5 Effect of AR(1) Errors on OLS Estimates
  • Some further tests are necessary (like the test of
    the restriction β3 = -β1 β2 in the
    above-mentioned case).
  • In fact, it is always best to start with an
    equation like (6.11) and test this restriction
    before applying any test for serial correlation.

77
6.7 Tests for Serial Correlation in Models with
Lagged Dependent Variables
  • In previous sections we considered explanatory
    variables that were uncorrelated with the error
    term
  • This will not be the case if we have lagged
    dependent variables among the explanatory
    variables and we have serially correlated errors
  • There are several situations under which we would
    be considering lagged dependent variables as
    explanatory variables
  • These could arise through expectations,
    adjustment lags, and so on.
  • Let us consider a simple model

78
6.7 Tests for Serial Correlation in Models with
Lagged Dependent Variables
  • Let us consider a simple model
    yt = αyt-1 + βxt + ut, ut = ρut-1 + et    (6.12)
  • where the et are independent with mean 0 and
    variance σe², and |ρ| < 1.
  • Because ut depends on ut-1 and yt-1 depends on
    ut-1, the two variables yt-1 and ut will be
    correlated.

79
6.7 Tests for Serial Correlation in Models with
Lagged Dependent Variables
An example
80
6.7 Tests for Serial Correlation in Models with
Lagged Dependent Variables
  • Durbin's h-Test
  • Since the DW test is not applicable in these
    models, Durbin suggests an alternative test,
    called the h-test.
  • This test uses
    h = ρ̂ · sqrt(n / (1 - n·V̂(α̂)))
  • as a standard normal variable.

81
6.7 Tests for Serial Correlation in Models with
Lagged Dependent Variables
  • Here ρ̂ is the estimated first-order serial
    correlation from the OLS residuals, V̂(α̂) is
    the estimated variance of the OLS estimate of α,
    and n is the sample size.
  • If n·V̂(α̂) > 1, the test is not applicable.
    In this case Durbin suggests the following test.
    (A helper function for h appears below.)
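
A small helper based on the formula above (ρ̂ can be approximated by
1 - DW/2 from the OLS regression):

```python
# Durbin's h statistic; valid only when n * Var(alpha_hat) < 1,
# and compared with the standard normal distribution.
import math

def durbin_h(rho_hat, var_alpha_hat, n):
    denom = 1.0 - n * var_alpha_hat
    if denom <= 0:
        raise ValueError("h-test not applicable: n * Var(alpha_hat) >= 1")
    return rho_hat * math.sqrt(n / denom)
```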

82
6.7 Tests for Serial Correlation in Models with
Lagged Dependent Variables
  • Durbin's Alternative Test
  • From the OLS estimation of equation (6.12) compute
    the residuals ût.
  • Then regress ût on ût-1, yt-1, and xt.
  • The test for ρ = 0 is carried out by testing the
    significance of the coefficient of ût-1 in the
    latter regression.

83
6.7 Tests for Serial Correlation in Models with
Lagged Dependent Variables
  • An equation of demand for food estimated from 50
    observations gave the following results (figures
    in parentheses are standard errors)
  • where qt = food consumption per capita
  • pt = food price (retail price deflated
    by the consumer price index)
  • yt = per capita disposable income
    deflated by the consumer price index

84
6.7 Tests for Serial Correlation in Models with
Lagged Dependent Variables
  • We have n = 50.
  • Hence Durbin's h-statistic, computed as above, is
    significant at the 1% level.
  • Thus we reject the hypothesis ρ = 0, even though
    the DW statistic is close to 2 and the estimate of ρ
    from the OLS residuals is small.

85
6.7 Tests for Serial Correlation in Models with
Lagged Dependent Variables
  • Let us keep all the numbers the same and just
    change the standard error of α̂.
  • The following are the results.
  • Thus, other things being equal, the precision with
    which α is estimated has a significant
    effect on the outcome of the h-test.

86
6.7 Tests for Serial Correlation in Models with
Lagged Dependent Variables
  • In the case where the h-test cannot be used, we
    can use the alternative test suggested by Durbin.
  • However, the Monte Carlo study by Maddala and Rao
    suggests that this test does not have good power
    in those cases where the h-test cannot be used.

87
6.7 Tests for Serial Correlation in Models with
Lagged Dependent Variables
  • On the other hand, in cases where the h-test can
    be used, Durbin's second test is almost as
    powerful.
  • It is not often used because it involves more
    computations.
  • However, we will show that Durbin's second test
    can be generalized to higher-order
    autoregressions, whereas the h-test cannot.

88
6.8 A General Test for Higher-Order Serial
Correlation: The LM Test
  • The h-test we have discussed is, like the
    Durbin-Watson test, a test for first-order
    autoregression.
  • Breusch and Godfrey discuss some general tests
    that are easy to apply and are valid for very
    general hypotheses about the serial correlation
    in the errors.

89
6.8 A General Test for Higher-Order Serial
Correlation: The LM Test
  • These tests are derived from a general principle
    called the Lagrange multiplier (LM) principle
  • A discussion of this principle is beyond the
    scope of this book.
  • For the present we will explain what the test is.
  • The test is similar to Durbin's second test that
    we have discussed

90
6.8 A General Test for Higher-Order Serial
Correlation: The LM Test
  • Consider the regression model
    yt = β1 x1t + ... + βk xkt + ut    (6.14)
    ut = ρ1 ut-1 + ρ2 ut-2 + ... + ρp ut-p + et
  • We are interested in testing
    H0: ρ1 = ρ2 = ... = ρp = 0.
  • The x's in equation (6.14) include lagged
    dependent variables as well.
  • The LM test is as follows.

91
6.8 A General Test for Higher-Order Serial
Correlation: The LM Test
  • First, estimate (6.14) by OLS and obtain the
    least squares residuals ût.
  • Next, estimate the regression equation
    ût = Σ βi xit + γ1 ût-1 + ... + γp ût-p + vt    (6.16)
  • and test whether the coefficients of ût-1, ..., ût-p
    are all zero.
  • We take the conventional F-statistic and use p·F
    as χ² with d.f. of p.
  • We use the χ²-test rather than the F-test because
    the LM test is a large-sample test.

92
6.8 A General Test for Higher-Order Serial
Correlation: The LM Test
  • The test can be used for different specifications
    of the error process.
  • For instance, for the problem of testing for
    fourth-order autocorrelation
    ut = ρ4 ut-4 + et    (6.17)
  • we just estimate
    ût = Σ βi xit + γ4 ût-4 + vt    (6.18)
  • instead of (6.16) and test γ4 = 0

93
6.8 A General Test for Higher-Order Serial
Correlation: The LM Test
  • The test procedure is the same for autoregressive
    or moving average errors.
  • For instance, if we have a moving average (MA)
    error ut = et + δ et-4
  • instead of (6.17), the test procedure is still
    to estimate (6.18) and test γ4 = 0.

94
6.8 A General Test for Higher-Order Serial
Correlation: The LM Test
  • In all these cases, we just test H0 by estimating
    equation (6.16) with p = 2 and testing γ1 = γ2 = 0.
  • What is of importance is the degree of
    autoregression, not its nature.

95
6.8 A General Test for Higher-Order Serial
Correlation: The LM Test
  • Finally, an alternative to the estimation of
    (6.16) is to estimate the regression of yt on the
    xit and the lagged residuals
    ût-1, ..., ût-p    (6.19)

96
6.8 A General Test for Higher-Order Serial
Correlation: The LM Test
  • The LM test for serial correlation is:
  • Estimate equation (6.14) by OLS and get the
    residuals ût.
  • Estimate equation (6.16) or (6.19) by OLS and
    compute the F-statistic for testing the
    hypothesis γ1 = γ2 = ... = γp = 0.
  • Use p·F as χ² with p degrees of freedom.
    (A statsmodels sketch follows.)
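
This test (the Breusch-Godfrey LM test) is available in statsmodels; a
sketch on illustrative simulated data:

```python
# Breusch-Godfrey LM test for serial correlation of order p = 4,
# applied to an OLS fit.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(2)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(size=200)

res = sm.OLS(y, sm.add_constant(x)).fit()
lm, lm_pval, fval, f_pval = acorr_breusch_godfrey(res, nlags=4)
print(lm, lm_pval)   # LM statistic, compared with chi-square(4) under H0
```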

97
6.9 Strategies When the DW Test Statistic is
Significant
  • The DW test is designed as a test for the
    hypothesis ρ = 0 if the errors follow a
    first-order autoregressive process.
  • However, the test has been found to be robust
    against other alternatives such as AR(2), MA(1),
    ARMA(1, 1), and so on.
  • Further, and more disturbingly, it catches
    specification errors like omitted variables that
    are themselves autocorrelated, and misspecified
    dynamics (a term that we will explain).
  • Thus the strategy to adopt, if the DW test
    statistic is significant, is not clear.
  • We discuss three different strategies

98
6.9 Strategies When the DW Test Statistic is
Significant
  • 1. Assume that the significant DW statistic is an
    indication of serial correlation but may not be
    due to AR(1) errors
  • 2. Test whether serial correlation is due to
    omitted variables.
  • 3. Test whether serial correlation is due to
    misspecified dynamics.

99
6.9 Strategies When the DW Test Statistic is
Significant
  • Errors Not AR(1)
  • In case 1, if the DW statistic is significant,
    since it does not necessarily mean that the
    errors are AR(1), we should check for
    higher-order autoregressions by estimating
    equations of the form
    ût = ρ1 ût-1 + ρ2 ût-2 + ... + et
  • Once the order has been determined, we can estimate
    the model with appropriate assumptions about the
    error structure by the methods described in
    Section 6.4.

100
6.9 Strategies When the DW Test Statistic is
Significant
  • Moving average (MA) errors and ARMA errors?
  • Estimation with MA errors and ARMA errors is more
    complicated than with AR errors.
  • However, researchers suggest that it is the order
    of the error process that is more important than
    its particular form. From the practical point of
    view, for most economic data, it is sufficient to
    determine the order of the AR process.
  • Thus if a significant DW statistic is observed,
    the appropriate strategy would be to try to see
    whether the errors are generated by a
    higher-order AR process than AR(1) and then
    undertake estimation.

101
6.9 Strategies When the DW Test Statistic is
Significant
  • Autocorrelation Caused by Omitted Variables
  • Suppose that the true regression equation is
    yt = β0 + β1 xt + β2 zt + ut
  • and instead we estimate
    yt = β0 + β1 xt + vt    (6.20)

102
6.9 Strategies When the DW Test Statistic is
Significant
  • Then, since vt = β2 zt + ut, if the omitted
    variable zt is autocorrelated (as it will be if,
    say, zt = xt-1 and xt is autocorrelated), this
    will produce autocorrelation in vt.
  • However, vt is no longer independent of xt.
  • Thus not only are the OLS estimators of β0 and β1
    from (6.20) inefficient; they are inconsistent as
    well.

103
6.9 Strategies When the DW Test Statistic is
Significant
  • Serial Correlation Due to Misspecified Dynamics
  • In a seminal paper published in 1964, Sargan
    pointed out that a significant DW statistic does
    not necessarily imply that we have a serial
    correlation problem.
  • This point was also emphasized by Hendry and
    Mizon.
  • The argument goes as follows.
  • Consider yt = βxt + ut, ut = ρut-1 + et    (6.24)
  • where the et are independent with a common variance
    σ².
  • We can write this model as
    yt = ρyt-1 + βxt - ρβxt-1 + et    (6.25)

104
6.9 Strategies When the DW Test Statistic is
Significant
  • Consider an alternative stable dynamic model
    yt = β1 yt-1 + β2 xt + β3 xt-1 + et    (6.26)
  • Equation (6.25) is the same as equation (6.26)
    with the restriction β3 = -β1 β2    (6.27)

105
6.9 Strategies When the DW Test Statistic is
Significant
  • A test for ρ = 0 is a test for β1 = 0 (and β3 = 0).
  • But before we test this, what Sargan says is that
    we should first test the restriction (6.27) and
    test for ρ = 0 only if the hypothesis
    β3 = -β1 β2 is not rejected.
  • If this hypothesis is rejected, we do not have a
    serial correlation model, and the serial
    correlation in the errors in (6.24) is due to
    misspecified dynamics, that is, the omission of
    the variables yt-1 and xt-1 from the equation.

106
6.9 Strategies When the DW Test Statistic is
Significant
  • If the DW test statistic is significant, a proper
    approach is to test the restriction (6.27) to make
    sure that what we have is a serial correlation
    model before we undertake any autoregressive
    transformation of the variables.
  • In fact, Sargan suggests starting with the
    general model (6.26) and testing the restriction
    (6.27) first, before attempting any test for
    serial correlation.

107
6.9 Strategies When the DW Test Statistic is
Significant
  • Illustrative Example
  • Consider the data in Table 3.11 and the
    estimation of the production function (4.24).
  • In Section 6.4 we presented estimates of the
    equation assuming that the errors are AR(1).
  • This was based on a DW test statistic of 0.86.
  • Suppose that we estimate an equation of the
    form (6.26).
  • The results are as follows (all variables in
    logs; figures in parentheses are standard errors)

108
6.9 Strategies When the DW Test Statistic is
Significant
  • Illustrative Example
  • Under the assumption that the errors are AR(1),
    the residual sum of squares obtained from the
    Hildreth-Lu procedure we used in Section 6.4 is
    RSS1 = 0.02635

109
6.9 Strategies When the DW Test Statistic is
Significant
  • Since we have two slope coefficients, we have two
    restrictions of the form (6.27).
  • Note that for the general dynamic model we are
    estimating six parameters (α and five β's).
  • For the serial correlation model we are
    estimating four parameters (α, two β's, and ρ).
  • We will use the likelihood ratio (LR) test.

110
6.9 Strategies When the DW Test Statistic is
Significant
  • Let λ = (RSS2/RSS1)^(n/2), where RSS2 is the
    residual sum of squares from the unrestricted
    model (6.26); -2 loge λ has a χ²-distribution with
    d.f. 2 (the number of restrictions).
  • In our example the statistic is significant at the
    1% level. (A helper for this computation follows.)
  • Thus the hypothesis of first-order
    autocorrelation is rejected.
  • Although the DW statistic is significant, this
    does not mean that the errors are AR(1).
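
For normal errors the LR statistic can be computed directly from the two
residual sums of squares; a small helper (rss2 stands for the
unrestricted RSS from (6.26)):

```python
# -2 log(lambda) = n * log(RSS_restricted / RSS_unrestricted),
# compared with chi-square(d.f. = number of restrictions).
import math

def lr_statistic(rss_restricted, rss_unrestricted, n):
    return n * math.log(rss_restricted / rss_unrestricted)

# e.g. lr_statistic(0.02635, rss2, n=39), compared with chi-square(2)
```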

111
6.10 Trends and Random Walks
  • Throughout our discussion we have assumed that
    E(ut) = 0 and var(ut) = σ² for all t, and
  • cov(ut, ut+k) = ρk σ² for all t and
    k, where ρk is the serial correlation of lag k
    (this is simply a function of the lag k and does
    not depend on t).
  • If these assumptions are satisfied, the series ut
    is called covariance stationary (covariances are
    constant over time), or just stationary.
112
6.10 Trends and Random Walks
  • Many economic time series are clearly
    nonstationary in the sense that the mean and
    variance depend on time, and they tend to depart
    ever further from any given value as time goes
    on.
  • If this movement is predominantly in one
    direction (up or down), we say that the series
    exhibits a trend.
  • More detailed discussion of the topics covered
    briefly here can be found in Chapter 14.

113
6.10 Trends and Random Walks
  • Nonstationary time series are frequently
    de-trended before further analysis is done.
  • There are two procedures used for de-trending:
  • Estimating regressions on time.
  • Successive differencing.

114
6.10 Trends and Random Walks
  • In the regression approach it is assumed that the
    series yt is generated by the mechanism
    yt = f(t) + ut
  • where f(t) is the trend and ut is a
    stationary series with mean zero and variance
    σu².
  • Let us suppose that f(t) is linear, so that we have
    yt = α + βt + ut    (6.29)

115
6.10 Trends and Random Walks
  • Note that the trend-eliminated series is ût,
    the least squares residuals, which satisfy the
    relationships Σût = 0 and Σt·ût = 0.
  • If differencing is used to eliminate the trend, we
    get Δyt = β + (ut - ut-1).
  • We have to take a first difference again to
    eliminate β, and we get Δ²yt = Δut - Δut-1
    as the de-trended series (see the sketch below).
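
A sketch contrasting the two de-trending procedures on a simulated
trend-stationary series:

```python
# Compare de-trending by regression on time with de-trending by
# differencing for y_t = alpha + beta*t + u_t.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
t = np.arange(200.0)
y = 1.0 + 0.5 * t + rng.normal(size=t.size)

# (i) regression on time: the de-trended series is the OLS residual
u_hat = sm.OLS(y, sm.add_constant(t)).fit().resid

# (ii) differencing: the first difference still contains beta;
# differencing once more removes it, as in the text
d2y = np.diff(y, n=2)
```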

116
6.10 Trends and Random Walks
  • On the other hand, suppose we assume that yt is
    generated by the model
    yt - yt-1 = β + et    (6.30)
  • where et is a stationary series with mean zero
    and variance σe².
  • In this case the first difference of yt is
    stationary with mean β.
  • This model is also known as the random-walk
    model.
  • Accumulating yt starting with an initial value y0,
    we get from equation (6.30)
    yt = y0 + βt + Σ ej    (6.31)

117
6.10 Trends and Random Walks
  • which has the same form as (6.29) except for
    the fact that the disturbance is not stationary:
    it has variance tσ², which increases over time.
  • Nelson and Plosser call models like (6.29)
    trend-stationary processes (TSP) and models like
    (6.30) difference-stationary processes (DSP).

118
6.10 Trends and Random Walks
  • Both models exhibit a linear trend. But the
    appropriate method of eliminating the trend
    differs.
  • To test the hypothesis that a time series belongs
    to the TSP class against the alternative that it
    belongs to the DSP class, Nelson and Plosser use
    a test developed by Dickey and Fuller, based on
    yt = α + βt + ρyt-1 + et
  • which belongs to the DSP class if ρ = 1, β = 0,
    and to the TSP class if |ρ| < 1.

119
6.10 Trends and Random Walks
  • Thus we have to test the hypothesis ρ = 1, β = 0
    against the alternative |ρ| < 1.
  • The problem here is that we cannot use the usual
    least squares distribution theory when ρ = 1.
  • Dickey and Fuller show that the least squares
    estimate of ρ is not distributed around unity
    under the DSP hypothesis (that is, the true value
    ρ = 1) but rather around a value less than one.
  • However, the negative bias diminishes as the
    number of observations increases.

120
6.10 Trends and Random Walks
  • They tabulate the significance points for testing
    the hypothesis ρ = 1 against |ρ| < 1.
  • Nelson and Plosser applied the Dickey-Fuller test
    to a wide range of historical time series for the
    U.S. economy and found that the DSP hypothesis
    was accepted in all cases, with the exception of
    the unemployment rate.
  • They conclude that for most economic time series
    the DSP model is more appropriate

121
6.10 Trends and Random Walks
  • The problem of testing the hypothesis ρ = 1 in the
    first-order autoregressive equation of the form
    yt = ρyt-1 + et
  • is called testing for unit roots.
  • There is an enormous literature on this problem,
    but one of the most commonly used tests is the
    Dickey-Fuller test.
  • The standard expression for the large-sample
    variance of the least squares estimator ρ̂
    involves (1 - ρ²), which would be zero under the
    null hypothesis.
  • Hence, one needs to derive the limiting
    distribution of ρ̂ under H0: ρ = 1 to apply the
    test.

122
Three Types of RW
  • RW without drift: Yt = Yt-1 + ut
  • RW with drift: Yt = α + Yt-1 + ut
  • RW with drift and time trend:
    Yt = α + βt + Yt-1 + ut
  • ut ~ iid(0, σ²)
  • An example

123
Augmented D-F (ADF) tests
  • Yt = a1 Yt-1 + ut
  • Yt - Yt-1 = (a1 - 1) Yt-1 + ut
  • ΔYt = (a1 - 1) Yt-1 + ut
  • ΔYt = γYt-1 + ut
  • H0: a1 = 1 is equivalent to H0: γ = 0
  • ΔYt = γYt-1 + Σ δi ΔYt-i + ut
  • Unit root test: an example (see the sketch below)
  • Limitations of ADF tests
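
The (augmented) Dickey-Fuller test is available in statsmodels; a sketch
on a simulated driftless random walk, where the unit-root null should
survive:

```python
# ADF test: H0 is gamma = 0 (a unit root). regression="c" includes a
# constant, matching the "RW with drift" specification above.
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(4)
y = np.cumsum(rng.normal(size=500))        # Y_t = Y_{t-1} + u_t

stat, pvalue, usedlag, nobs, crit, icbest = adfuller(y, regression="c")
print(stat, pvalue)   # typically a large p-value: cannot reject the unit root
```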

124
6.10 Trends and Random Walks
  • Spurious Trends
  • If β = 0 in equation (6.30), the model is called a
    trendless random walk or a random walk with zero
    drift.
  • However, from equation (6.31) note that even
    though there is no trend in the mean, there is a
    trend in the variance.
  • Suppose that the true model is of the DSP type
    with β = 0.
  • What happens if we estimate a TSP-type model?

125
6.10 Trends and Random Walks
  • That is, the true model is one with no trend in
    the mean but only a trend in the variance, and we
    estimate a model with a trend in the mean but no
    trend in the variance.
  • It is intuitively clear that the trend in the
    variance will be transmitted to the mean and we
    will find a significant coefficient for t even
    though in reality there is no trend in the mean.
  • How serious is this problem?

126
6.10 Trends and Random Walks
  • Nelson and Kang analyze this and conclude that
  • 1. Regression of a random walk on time by least
    squares will produce R2 values of around 0.44
    regardless of sample size when, in fact, the mean
    of the variable has no relationship with time
    whatever.
  • 2. In the case of random walks with drift, that
    is, β ≠ 0, the R2 will be higher and will increase
    with the sample size, reaching one in the limit
    regardless of the value of β.

127
6.10 Trends and Random Walks
  • 3. The residuals from the regression on time, which
    we take as the de-trended series, have on
    average only about 14% of the true stochastic
    variance of the original series.
  • 4. The residuals from the regression on time are
    also autocorrelated, being roughly (1 - 10/N) at lag
    one, where N is the sample size.

128
6.10 Trends and Random Walks
  • 5. Conventional t-tests to test the significance
    of some of the regressors are not valid. They
    tend to reject the null hypothesis of no
    dependence on time, with very high frequency.

129
6.10 Trends and Random Walks
  • 6. Regression of one random walk on another, with
    time included for trend, is strongly subject to
    the spurious regression phenomenon. That is, the
    conventional t-test will tend to indicate a
    relationship between the variables when none is
    present.
  • A spurious regression example (see the sketch below)
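
A simulation of point 6 on two independent random walks:

```python
# Regress one independent random walk on another (with time included),
# then repeat in first differences to show the relationship is spurious.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(5)
y = np.cumsum(rng.normal(size=300))
x = np.cumsum(rng.normal(size=300))
t = np.arange(300.0)

levels = sm.OLS(y, sm.add_constant(np.column_stack([x, t]))).fit()
print(levels.rsquared, durbin_watson(levels.resid))   # often high R2, low DW

diffs = sm.OLS(np.diff(y), sm.add_constant(np.diff(x))).fit()
print(diffs.rsquared, durbin_watson(diffs.resid))     # R2 near 0, DW near 2
```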

130
6.10 Trends and Random Walks
  • The main conclusion is that using a regression on
    time has serious consequences when, in fact, the
    time series is of the DSP type and, hence,
    differencing is the appropriate procedure for
    trend elimination
  • Plosser and Schwert also argue that with most
    economic time series it is always best to work
    with differenced data rather than data in levels
  • The reason is that if indeed the data series are
    of the DSP type, the errors in the levels
    equation will have variances increasing over time

131
6.10 Trends and Random Walks
  • Under these circumstances many of the properties
    of least squares estimators as well as tests of
    significance are invalid
  • On the other hand, suppose that the levels
    equation is correctly specified. Then all
    differencing will do is produce a moving average
    error and at worst ignoring it will give
    inefficient estimates
  • For instance, suppose that we have the model

132
6.10 Trends and Random Walks
  • yt = α + βxt + ut
  • where the ut are independent with mean zero and
    common variance σu².
  • If we difference this equation, we get
    Δyt = βΔxt + (ut - ut-1)
  • where the error is a moving average and,
    hence, not serially independent.
  • But estimating the first-difference equation by
    least squares still gives us unbiased and
    consistent estimates.

133
6.10 Trends and Random Walks
  • Thus, the consequences of differencing when it is
    not needed are much less serious than those of
    failing to difference when it is appropriate
    (when the true model is of the DSP type).
  • In practice, it is best to use the Dickey-Fuller
    test to check whether the data are of DSP or TSP
    type.
  • Otherwise, it is better to use differencing and
    regressions in first differences, rather than
    regressions in levels with time as an extra
    explanatory variable.

134
6.10 Trends and Random Walks
  • The Concept of Cointegration: Differencing vs.
    Long-Run Effects
  • One drawback of the procedure of differencing is
    that it results in a loss of valuable "long-run
    information" in the data.
  • Recently, the concept of cointegrated series has
    been suggested as one solution to this problem.
    First, we need to define the term
    "cointegration."

135
6.10 Trends and Random Walks
  • If yt follows a random walk model, that is,
    yt = yt-1 + et,
  • then we get by successive substitution
    yt = y0 + Σ ej
  • Thus, yt is a summation of the ej, and is said to
    be integrated of order one, denoted I(1).

136
6.10 Trends and Random Walks
  • Yt ~ I(1)
  • Yt is a random walk
  • ΔYt is white noise, or iid
  • No one could predict the future price change
  • The market is efficient
  • The impact of a previous shock on the price will
    remain and not approach zero

137
6.10 Trends and Random Walks
  • We say in this case that yt is I(1): integrated of
    order one. If yt is I(1) and we add to it zt,
    which is I(0), then (yt + zt) will be I(1).
  • When we specify regression models in time series,
    we have to make sure that the different variables
    are integrated to the same degree. Otherwise, the
    equation does not make sense.
  • For instance, if we specify the regression model
    yt = βxt + ut    (6.34)
  • and we say that ut ~ I(0), we have to make sure
    that yt and xt are integrated to the same order.

138
6.10 Trends and Random Walks
  • For instance, if yt is I(1) and xt is I(0), there
    will not be any β that will satisfy the
    relationship (6.34).
  • Suppose yt is I(1) and xt is I(1); then if there
    is a nonzero β such that yt - βxt is I(0),
    yt and xt are said to be cointegrated.

139
6.10 Trends and Random Walks
  • Suppose that yt and xt are both random walks, so
    that they are both I(1). Then an equation in first
    differences of the form
    Δyt = αΔxt + γ(yt-1 - βxt-1) + vt    (6.35)
  • is a valid equation, since Δyt, Δxt,
    (yt-1 - βxt-1), and vt are all I(0).
  • Equation (6.34) is considered a long-run
    relationship between yt and xt, and
    equation (6.35) describes the short-run dynamics.
  • Engle and Granger suggest estimating (6.34) by
    ordinary least squares, obtaining the estimator
    β̂, and substituting it in equation (6.35) to
    estimate the parameters α and γ.

140
6.10 Trends and Random Walks
  • This two-step estimation procedure, however,
    rests on the assumption that yt and xt are
    cointegrated.
  • It is, therefore, important to test for
    cointegration.
  • Engle and Granger suggest estimating (6.34) by
    ordinary least squares and getting the residuals ût.

141
6.10 Trends and Random Walks
  • What the test amounts to is testing the hypothesis
    ρ = 1 in ût = ρût-1 + et
  • that is, testing the hypothesis
  • H0: ut is I(1)
  • In essence, we are testing the null hypothesis
    that yt and xt are not cointegrated.
  • Note that yt is I(1) and xt is I(1), so we are
    trying to see whether ut is I(0) rather than I(1).
    (A sketch follows.)
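
An Engle-Granger style test is available in statsmodels; a sketch on
simulated cointegrated data:

```python
# H0 for coint() is "no cointegration" (the residual has a unit root).
import numpy as np
from statsmodels.tsa.stattools import coint

rng = np.random.default_rng(6)
x = np.cumsum(rng.normal(size=500))     # x_t is I(1)
y = 2.0 * x + rng.normal(size=500)      # y_t - 2*x_t is I(0): cointegrated

stat, pvalue, crit = coint(y, x)
print(stat, pvalue)   # small p-value: reject "y and x are not cointegrated"
```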

142
Cointegration
  • Cointegration test: an example
  • Run the VECM (vector error correction model) in
    EViews

143
Homework
  • Find the spot and futures prices
  • 5-year daily data at least
  • Run the cointegration test
  • Run the VECM
  • Check the lead-lag relationship using the
    error-correction (EC) parameter estimates

144
6.11 ARCH Models and Serial Correlation
  • We saw in Section 6.9 that a significant DW
    statistic can arise through a number of
    misspecifications.
  • We will now discuss one other source. This is the
    ARCH model suggested by Engle (1982), which has,
    in recent years, been found useful in the
    analysis of speculative prices.
  • ARCH stands for "autoregressive conditional
    heteroskedasticity."
  • Robert Engle and Clive Granger:
    Nobel Prize winners for economics, 2003

145
6.11 ARCH Models and Serial Correlation
  • GARCH (p,q) Model (by Bollerslev, 1986)

146
6.11 ARCH Models and Serial Correlation
  • The high level of persistence in GARCH models:
    the sum of the two GARCH parameter estimates
    approximates unity in most cases.
  • Li and Lin (2003): this finding provides some
    support for the notion that GARCH models are
    handicapped by the inability to account for
    structural changes during the estimation period
    and thus suffer from a high-persistence problem
    in variance settings.
  • An example: GARCH(1,1) (see the sketch below)
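
A sketch with the third-party arch package (an assumption; the
presentation does not name the software). For GARCH(1,1) the conditional
variance is σt² = ω + α et-1² + β σt-1², and α̂ + β̂ near one is the
persistence measure discussed above:

```python
# GARCH(1,1) on placeholder simulated returns; replace `returns`
# with your own daily return series.
import numpy as np
from arch import arch_model

rng = np.random.default_rng(7)
returns = rng.standard_t(df=6, size=1500)   # placeholder daily returns (%)

fit = arch_model(returns, vol="GARCH", p=1, q=1, mean="Constant").fit(disp="off")
print(fit.params)
print("persistence:", fit.params["alpha[1]"] + fit.params["beta[1]"])
# fit.conditional_volatility holds the time-varying volatility estimates to plot
```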

147
6.11 ARCH Models and Serial Correlation
  • Find the stock returns
  • 5-year daily data at least
  • Run the GARCH(1,1) model
  • Check the sum of the two GARCH parameter
    estimates
  • Parameter estimates
  • Graph the time-varying variance estimates