Title: Multicollinearity
Chapter 7
What is in this Chapter?
- How do we detect this problem?
- What are the consequences?
- What are the solutions?
- An example by Gauss
- In Chapter 4 we stated that one of the assumptions in the basic regression model is that the explanatory variables are not exactly linearly related. If they are, then not all parameters are estimable.
- What we are concerned with in this chapter is the case where the individual parameters are not estimable with sufficient precision (because of high standard errors).
- This often occurs if the explanatory variables are highly intercorrelated (although this condition is not necessary).
- This chapter is very important, because multicollinearity is one of the most misunderstood problems in multiple regression.
- Several measures of multicollinearity have been suggested in the literature (variance inflation factors, VIF; condition numbers, CN; etc.).
- This chapter argues that all these are useless and misleading.
- They all depend on the correlation structure of the explanatory variables only.
- It is argued here that this is only one of several factors determining high standard errors.
- High intercorrelations among the explanatory variables are neither necessary nor sufficient to cause the multicollinearity problem.
- The best indicators of the problem are the t-ratios of the individual coefficients.
- This chapter also discusses the solutions offered for the multicollinearity problem:
- principal component regression
- dropping of variables
- However, these are ad hoc and do not help.
- The only solutions are to get more data or to seek prior information.
7.1 Introduction
- Very often the data we use in multiple regression analysis cannot give decisive (significant) answers to the questions we pose.
- This is because the standard errors are very high or the t-ratios are very low.
- This sort of situation occurs when the explanatory variables display little variation and/or high intercorrelations.
- The situation where the explanatory variables are highly intercorrelated is referred to as multicollinearity.
- When the explanatory variables are highly intercorrelated, it becomes difficult to disentangle the separate effects of each of the explanatory variables on the explained variable.
7.2 Some Illustrative Examples
- Suppose, for example, that y = β1x1 + β2x2 + u and that x2 = 2x1 exactly, so the model collapses to y = (β1 + 2β2)x1 + u.
- Thus only (β1 + 2β2) would be estimable.
- We cannot get estimates of β1 and β2 separately.
- In this case we say that there is perfect multicollinearity, because x1 and x2 are perfectly correlated (r² = 1).
- In actual practice we encounter cases where r² is not exactly 1 but close to 1.
- As an illustration, consider the case where S11 = 200, S12 = 150, S22 = 113, S1y = 350, and S2y = 263, so that the normal equations are
- 200β̂1 + 150β̂2 = 350
- 150β̂1 + 113β̂2 = 263
- The solution is β̂1 = 1, β̂2 = 1.
- Suppose that we drop an observation and the new values are S11 = 199, S12 = 149, S22 = 112, S1y = 347.5, and S2y = 261.5.
- Now when we solve the equations
- 199β̂1 + 149β̂2 = 347.5
- 149β̂1 + 112β̂2 = 261.5
- we get β̂1 = -0.5 and β̂2 = 3.
- Thus very small changes in the variances and covariances produce drastic changes in the estimates of the regression parameters.
- It is easy to see that the squared correlation coefficient between the two explanatory variables is given by r² = S12²/(S11 S22) = (150)²/[(200)(113)] ≈ 0.995, which is very high.
- These computations are reproduced in the sketch below.
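- The arithmetic can be verified directly. The following sketch (in Python with numpy, using the S-values quoted above) solves both sets of normal equations and recomputes r²:

```python
import numpy as np

# Normal equations for the full sample:
#   200*b1 + 150*b2 = 350
#   150*b1 + 113*b2 = 263
A_full = np.array([[200.0, 150.0], [150.0, 113.0]])
c_full = np.array([350.0, 263.0])
print(np.linalg.solve(A_full, c_full))   # -> [1. 1.]

# Normal equations after dropping one observation:
#   199*b1 + 149*b2 = 347.5
#   149*b1 + 112*b2 = 261.5
A_drop = np.array([[199.0, 149.0], [149.0, 112.0]])
c_drop = np.array([347.5, 261.5])
print(np.linalg.solve(A_drop, c_drop))   # -> [-0.5  3.]

# Squared correlation between x1 and x2 implied by the cross products:
print(150.0**2 / (200.0 * 113.0))        # about 0.995
```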
- In practice, addition or deletion of observations would produce changes in the variances and covariances.
- Thus one of the consequences of high correlation between x1 and x2 is that the parameter estimates are very sensitive to the addition or deletion of observations.
- This aspect of multicollinearity can be checked in practice by deleting or adding some observations and examining the sensitivity of the estimates to such perturbations (a sketch of such a check follows).
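- This check is easy to automate. A minimal sketch, assuming a design matrix X (including a column of ones for the intercept) and a dependent variable y, refits the regression with each observation left out in turn:

```python
import numpy as np

def deletion_sensitivity(X, y):
    """Refit OLS with each observation deleted and return the
    coefficient estimates from every leave-one-out fit."""
    n = X.shape[0]
    estimates = []
    for i in range(n):
        keep = np.arange(n) != i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        estimates.append(beta)
    return np.array(estimates)

# Example use:
# est = deletion_sensitivity(X, y)
# print(est.min(axis=0), est.max(axis=0))  # wide ranges signal instability
```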
- From equation (7.1), var(β̂1) = σ²/[S11(1 - r²)], where r is the correlation between x1 and x2.
- Thus the variance of β̂1 will be high if: 1. σ² is high. 2. S11 is low. 3. r² is high.
- Even if r² is high, if σ² is low and S11 is high, we will not have the problem of high standard errors.
- On the other hand, even if r² is low, the standard errors can be high if σ² is high and S11 is low (i.e., there is not sufficient variation in x1).
- What this suggests is that a high value of r² does not by itself tell us whether we have a multicollinearity problem or not (a numerical sketch follows).
- When we have more than two explanatory variables, the simple correlations among them become all the more meaningless.
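- The point can be checked numerically. The sketch below evaluates var(β̂1) = σ²/[S11(1 - r²)] for two made-up scenarios (the numbers are purely illustrative, not from the text):

```python
def var_beta1(sigma2, S11, r_sq):
    """Variance of the estimate of beta1 in the two-regressor model."""
    return sigma2 / (S11 * (1.0 - r_sq))

# High intercorrelation, but low sigma^2 and large S11: small variance.
print(var_beta1(sigma2=0.1, S11=1000.0, r_sq=0.95))   # 0.002

# Low intercorrelation, but high sigma^2 and little variation in x1.
print(var_beta1(sigma2=10.0, S11=2.0, r_sq=0.1))      # about 5.6
```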
- As an illustration, consider the following example with 20 observations on x1, x2, and x3:
- x1 = (1, 1, 1, 1, 1, 0, 0, 0, 0, 0, and 10 zeros)
- x2 = (0, 0, 0, 0, 0, 1, 1, 1, 1, 1, and 10 zeros)
- x3 = (1, 1, 1, 1, 1, 1, 1, 1, 1, 1, and 10 zeros)
- Obviously, x3 = x1 + x2 and we have perfect multicollinearity.
- But we can see that r12² = 1/9 and r13² = r23² = 1/3, and thus the simple correlations are not high (this is checked in the sketch below).
- In the case of more than two explanatory variables, what we have to consider are the multiple correlations of each of the explanatory variables with the other explanatory variables.
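- The pairwise correlations for this constructed data set can be verified directly:

```python
import numpy as np

x1 = np.array([1]*5 + [0]*5 + [0]*10, dtype=float)
x2 = np.array([0]*5 + [1]*5 + [0]*10, dtype=float)
x3 = x1 + x2        # perfect multicollinearity by construction

print(np.corrcoef([x1, x2, x3]))
# r12^2 = 1/9 and r13^2 = r23^2 = 1/3, so no pairwise correlation is
# anywhere near 1 even though x3 is an exact linear function of x1 and x2.
```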
- Note that the standard error formulas corresponding to equations (7.1) and (7.2) generalize to var(β̂i) = σ²/[Sii(1 - Ri²)], where σ² and Sii are defined as before in the case of two explanatory variables and Ri² represents the squared multiple correlation coefficient between xi and the other explanatory variables.
7.3 Some Measures of Multicollinearity
- It is important to be familiar with two measures that are often suggested in the discussion of multicollinearity: the variance inflation factor (VIF) and the condition number (CN).
- The VIF is defined as VIF(β̂i) = 1/(1 - Ri²), where Ri² is the squared multiple correlation coefficient between xi and the other explanatory variables (a computational sketch follows).
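- The VIFs can be computed directly from this definition by running each auxiliary regression of xi on the remaining explanatory variables. A minimal sketch, assuming the explanatory variables are the columns of a matrix X (without the constant):

```python
import numpy as np

def vifs(X):
    """Variance inflation factors: VIF_i = 1 / (1 - R_i^2), where R_i^2
    is the R^2 from regressing column i of X on the other columns."""
    n, k = X.shape
    out = []
    for i in range(k):
        y = X[:, i]
        Z = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
        coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ coef
        r2 = 1.0 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))
        out.append(1.0 / (1.0 - r2))
    return np.array(out)
```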
- A measure that also considers the correlations of the explanatory variables with the explained variable is Theil's measure, which is defined as
- m = R² - Σi (R² - R(i)²)
- where R² is the squared multiple correlation from a regression of y on x1, x2, ..., xk, and R(i)² is the squared multiple correlation from a regression of y on x1, x2, ..., xk with xi omitted.
- The quantity (R² - R(i)²) is termed the "incremental contribution" to the squared multiple correlation by Theil.
- If x1, x2, ..., xk are mutually uncorrelated, then m will be 0 because the incremental contributions all add up to R².
- In other cases m can be negative as well as highly positive (see the sketch below).
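- Theil's measure can be computed in the same way as the VIFs, by comparing the full-model R² with the R² obtained after dropping each regressor in turn (again assuming X holds the explanatory variables and y the explained variable):

```python
import numpy as np

def r_squared(X, y):
    """R^2 from an OLS regression of y on X plus a constant."""
    Z = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    return 1.0 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

def theil_m(X, y):
    """Theil's measure m = R^2 - sum_i (R^2 - R^2 with x_i omitted)."""
    full = r_squared(X, y)
    increments = [full - r_squared(np.delete(X, i, axis=1), y)
                  for i in range(X.shape[1])]
    return full - sum(increments)
```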
7.4 Problems with Measuring Multicollinearity
- Let us define
- C = real consumption per capita
- Y = real per capita current income
- Yp = real per capita permanent income
- YT = real per capita transitory income
- Y = Yp + YT, and Yp and YT are uncorrelated.
- Suppose that we formulate the consumption function in one of the following forms:
- C = αY + βYp + u (7.5)
- C = αYT + (α + β)Yp + u (7.6)
- C = (α + β)Y - βYT + u (7.7)
- All these equations are equivalent. However, the correlations between the explanatory variables will be different depending on which of the three equations is considered.
- In equation (7.5), since Y and Yp are often highly correlated, we would say that there is high multicollinearity.
- In equation (7.6), since YT and Yp are uncorrelated, we would say that there is no multicollinearity.
- However, the two equations are essentially the same.
- What we should be talking about is the precision with which α and β, or (α + β), are estimable (the algebraic equivalence is sketched below).
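- The equivalence of the three forms follows from the identity Y = Yp + YT. A short derivation, using the forms of (7.5)-(7.7) as reconstructed above:

```latex
\begin{align*}
C &= \alpha Y + \beta Y_p + u
   && \text{(7.5)} \\
  &= \alpha (Y_p + Y_T) + \beta Y_p + u
   = (\alpha + \beta) Y_p + \alpha Y_T + u
   && \text{(7.6), using } Y = Y_p + Y_T \\
  &= \alpha Y + \beta (Y - Y_T) + u
   = (\alpha + \beta) Y - \beta Y_T + u
   && \text{(7.7), using } Y_p = Y - Y_T
\end{align*}
```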
Consider, for instance, the following data
- For these data, estimation of equation (7.5) gives estimates of α and β with very high standard errors.
- One reason for the imprecision in the estimates is that Y and Yp are highly correlated (the correlation coefficient is 0.95).
- For equation (7.6) the correlation between the explanatory variables is zero, and for equation (7.7) it is 0.32.
- The least squares estimates of α and β are no more precise in equation (7.6) or (7.7).
- Let us consider the estimation of equation (7.6).
- The estimate of (α + β) is 0.89 and its standard error is 0.11.
- Thus (α + β) is indeed more precisely estimated than either α or β.
- As for α, it is not precisely estimated even though the explanatory variables in this equation are uncorrelated.
- The reason is that the variance of YT is very low [see formula (7.1)].
- In Table 7.1 we present data on C, Y, and L for the period from the first quarter of 1952 to the second quarter of 1961.
- C is consumption expenditure, Y is disposable income, and L is liquid assets at the end of the previous quarter.
- All figures are in billions of 1954 dollars.
- Using the 38 observations we get the regression equations (7.8), (7.9), and (7.10).
- Equation (7.10) shows that L and Y are very highly correlated.
- In fact, substituting the value of L in terms of Y from (7.10) into equation (7.9) and simplifying, we get equation (7.8) correct to four decimal places!
- However, looking at the t-ratios in equation (7.9), we might conclude that multicollinearity is not a problem.
- Are we justified in this conclusion?
- Let us consider the stability of the coefficients when some observations are deleted.
- Using only the first 36 observations, we get the results in equations (7.11) and (7.12).
- Comparing equation (7.11) with (7.8) and equation (7.12) with (7.9), we see that the coefficients in the latter pair show far greater changes than those in the former pair.
- Of course, if one applies the tests for stability discussed in Section 4.11, one might conclude that the differences are not statistically significant at the 5% level.
- Note that the test for stability we use here is the predictive test for stability.
- Finally, we might consider predicting C for the first two quarters of 1961 using equations (7.11) and (7.12) and comparing the predictions with the actual values.
- The predictions from the equation including L are further off from the true values than the predictions from the equation excluding L.
- Thus, if prediction were the sole criterion, one might as well drop the variable L.
- The example above illustrates four different ways of looking at the multicollinearity problem:
- 1. The correlation between the explanatory variables L and Y, which is high.
- 2. The standard errors or t-ratios of the estimated coefficients. In this example the t-ratios are significant, suggesting that multicollinearity might not be serious.
- 3. The stability of the estimated coefficients when some observations are deleted.
- 4. Examining the predictions from the model. If multicollinearity is a serious problem, the predictions from the model would be worse than those from a model that includes only a subset of the set of explanatory variables.
- The last criterion should be applied if prediction is the object of the analysis. Otherwise, it is advisable to consider the second and third criteria.
- The first criterion is not useful, as we have so frequently emphasized.
7.6 Principal Component Regression
- Another solution that is often suggested for the multicollinearity problem is principal component regression.
- Suppose that we have k explanatory variables. Then we can consider linear functions of these variables, such as z1 = a1x1 + a2x2 + ... + akxk.
- Suppose we choose the a's so that the variance of z1 is maximized, subject to the condition that a1² + a2² + ... + ak² = 1.
- This is called the normalization condition.
- z1 is then said to be the first principal component.
- It is the linear function of the x's that has the highest variance.
- The process of maximizing the variance of the linear function z, subject to the condition that the sum of squares of the coefficients of the x's is equal to 1, produces k solutions.
- Corresponding to these we construct k linear functions z1, z2, ..., zk. These are called the principal components of the x's.
- They can be ordered so that var(z1) > var(z2) > ... > var(zk).
- z1, the one with the highest variance, is called the first principal component; z2, with the next highest variance, is called the second principal component; and so on.
- Unlike the x's, which are correlated, the z's are orthogonal or uncorrelated (a computational sketch follows).
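- The principal components can be obtained from the eigen-decomposition of the correlation matrix of the explanatory variables. A minimal sketch, assuming the variables are the columns of a matrix X:

```python
import numpy as np

def principal_components(X):
    """Principal components of the standardized columns of X, ordered by
    decreasing variance, together with their variances (the eigenvalues)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # normalized X's
    R = np.corrcoef(Z, rowvar=False)                   # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)               # ascending order
    order = np.argsort(eigvals)[::-1]                  # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return Z @ eigvecs, eigvals                        # z1, ..., zk and var(zi)

# A variance (eigenvalue) near zero flags a near-exact linear relation
# among the explanatory variables.
```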
- The data are presented in Table 7.3.
- First let us estimate an import demand function.
- The regression of y on x1, x2, and x3 gives the results discussed below.
- The R² is very high and the F-ratio is highly significant, but the individual t-ratios are all insignificant.
- This is evidence of the multicollinearity problem.
- Chatterjee and Price argue that before any further analysis is made, we should look at the residuals from this equation.
- They find (we are omitting the residual plot here) a distinctive pattern: the residuals decline until 1960 and then rise.
- Chatterjee and Price argue that the difficulty with the model is that the European Common Market began operations in 1960, causing a change in import-export relationships.
- Hence they drop the years after 1959 and consider only the 11 years 1949-1959, re-estimating the regression for this period.
- The residual plot (not shown here) is now satisfactory (there are no systematic patterns), so we can proceed.
- Even though the R² is very high, the coefficient of x1 is not significant.
- There is thus a multicollinearity problem.
- To see what should be done about it, we first look at the simple correlations among the explanatory variables.
- We suspect that the high correlation between x1 and x3 could be the source of the trouble.
- Does principal component analysis help us? First, the principal components are obtained from a principal components program.
- X1, X2, X3 are the normalized values of x1, x2, x3.
- That is, Xi = (xi - mi)/si, where m1, m2, m3 are the means and s1, s2, s3 are the standard deviations of x1, x2, x3, respectively.
- Hence var(X1) = var(X2) = var(X3) = 1.
- The variances of the principal components are var(z1) = 1.999, var(z2) = 0.998, and var(z3) = 0.003.
- Note that var(z1) + var(z2) + var(z3) = 3 = var(X1) + var(X2) + var(X3).
- The fact that var(z3) ≈ 0 identifies that linear function as the source of the multicollinearity.
- In this example there is only one such linear function. In some examples there could be more.
- Since E(X1) = E(X2) = E(X3) = 0 because of the normalization, the z's have mean zero.
- Thus z3 has mean zero and its variance is also close to zero, so we can say that z3 ≈ 0.
- Looking at the coefficients of the X's (and ignoring the coefficients that are very small), z3 is approximately proportional to (X3 - X1), so z3 ≈ 0 implies X1 ≈ X3.
- In terms of the original (nonnormalized) variables, the regression of x3 on x1 is x̂3 = 6.258 + 0.686x1.
- Substituting for x3 in terms of x1, we obtain a regression of y on x1 and x2 only.
- This gives the linear functions of the β's that are estimable. Denoting the constant term by β0, they are (β0 + 6.258β3), (β1 + 0.686β3), and β2.
- The regression of y on x1 and x2 gave the results discussed below.
- Of course, we can instead estimate a regression of x1 on x3.
- The regression coefficient is 1.451.
- We then substitute for x1 and estimate a regression of y on x2 and x3.
- The results we get are slightly better (we get a higher R²).
- In this regression the coefficient of x3 is an estimate of (β3 + 1.451β1).
- Suppose that we consider regressing y on the principal components z1 and z2 (z3 is omitted because it is almost zero).
- We saw that z1 is approximately proportional to (X1 + X3) and that z2 is approximately X2.
- We have to transform these back to the original variables.
- Thus, using z2 as a regressor is equivalent to using x2, and using z1 is equivalent to using a combination of x1 and x3 that gives the standardized variables X1 and X3 equal weight.
- Thus the principal component regression amounts to regressing y on x2 and this combination of x1 and x3.
- The resulting equation is the one we would have estimated if we had assumed that the coefficients of the standardized variables X1 and X3 are equal (that is, s1β1 = s3β3).
- Thus the principal component regression amounts, in this example, to the use of this prior information (a computational sketch of the procedure is given below).
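- A generic sketch of the procedure described here (drop the principal components whose variance is near zero, regress y on the rest, and map the fit back to the standardized variables); this is an illustration under those assumptions, not the exact computation in the text:

```python
import numpy as np

def pc_regression(X, y, var_tol=0.01):
    """Regress y on the principal components of X whose variance exceeds
    var_tol, then express the fit in terms of the standardized X's."""
    n, k = X.shape
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)     # normalized X's
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    keep = eigvals > var_tol                             # drop z's with var ~ 0
    pcs = Z @ eigvecs[:, keep]
    D = np.column_stack([np.ones(n), pcs])
    gamma, *_ = np.linalg.lstsq(D, y, rcond=None)        # y on retained z's
    beta_std = eigvecs[:, keep] @ gamma[1:]              # implied coefficients
    return gamma, beta_std                               #   on the X's

# Dropping a component with near-zero variance imposes the corresponding
# linear restriction on the coefficients, as discussed above.
```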
7.7 Dropping Variables
- Consider the model
- y = β1x1 + β2x2 + u (7.15)
- Suppose that our main interest is in β1.
- Then we drop x2 and estimate the equation
- y = β1x1 + v (7.16)
- Let the estimator of β1 from the complete model (7.15) be denoted by β̂1 and the estimator of β1 from the omitted-variable model (7.16) be denoted by β̃1.
- β̂1 is the OLS estimator and β̃1 is the OV (omitted variable) estimator.
- As an estimator of β1 we can also use the conditional omitted variable (COV) estimator, which chooses between β̂1 and β̃1 according to the outcome of a preliminary t-test of the hypothesis β2 = 0.
- Also, instead of using either β̂1 or β̃1 alone, we can consider a linear combination of both, with weights depending on t2 (the t-ratio of β̂2).
- This is called the weighted (WTD) estimator, and it has minimum mean-square error for a particular choice of the weight, which depends on the true value of the standardized coefficient of x2.
- That true value is not known, so we have to use its estimated value (a simulation sketch comparing the OLS and OV estimators follows).
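- The trade-off between the two estimators can be illustrated by simulation. The sketch below uses made-up data (not the book's example) and a plain comparison of the OLS and OV estimators rather than the COV or WTD formulas:

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta1, beta2 = 50, 1.0, 0.3
ols_est, ov_est = [], []
for _ in range(2000):
    x1 = rng.normal(size=n)
    x2 = 0.95 * x1 + 0.1 * rng.normal(size=n)      # highly correlated with x1
    y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)
    b_full, *_ = np.linalg.lstsq(np.column_stack([x1, x2]), y, rcond=None)
    b_ov, *_ = np.linalg.lstsq(x1[:, None], y, rcond=None)
    ols_est.append(b_full[0])
    ov_est.append(b_ov[0])

ols_est, ov_est = np.array(ols_est), np.array(ov_est)
# OLS is unbiased for beta1 but has a large variance; the OV estimator is
# biased (it picks up part of beta2) but is much less variable.
print("OLS:", ols_est.mean(), ols_est.std())
print("OV :", ov_est.mean(), ov_est.std())
```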
7.8 Miscellaneous Other Solutions
- Using Ratios or First Differences
- Getting More Data