1
Chapter 5: Multicollinearity
2
  • 1. Introduction

3
Introduction
  • A CLRM assumption: no exact linear relationships
    among the explanatory variables.
  • Otherwise we get multicollinearity, where the
    explanatory variables move together.

4
Example
  • Perfect collinearity example:
  • X2    X3    X4
  • 10    50    52
  • 15    75    75
  • 18    90    97
  • 24   120   129
  • X2 and X3 are perfectly correlated: X3 = 5X2.

5
Example
  • Suppose we estimate a consumption function where
    Y = consumption, X2 = income and X3 = wealth:
  • Y = b1 + b2X2 + b3X3
  • Now suppose X3 = 5X2:
  • Y = b1 + b2X2 + b3(5X2)
  • Y = b1 + (b2 + 5b3)X2 (see the sketch below)
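A minimal numerical sketch of this in Python with numpy, using the X2 values from the earlier example slide (the code and variable names are ours, not the presenter's): with X3 = 5X2 the design matrix is rank deficient, so X'X cannot be inverted and no unique OLS estimates exist.

    # With X3 = 5*X2 the design matrix loses a column of independent
    # information, so X'X is singular and OLS has no unique solution.
    import numpy as np

    X2 = np.array([10.0, 15.0, 18.0, 24.0])
    X3 = 5 * X2                                 # perfect collinearity
    X = np.column_stack([np.ones_like(X2), X2, X3])

    print(np.linalg.matrix_rank(X))             # 2, not 3: rank deficient
    # np.linalg.inv(X.T @ X) would raise LinAlgError: singular matrix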

6
Example
  • We can estimate (b2 + 5b3), but we cannot figure
    out the contributions of b2 and b3 individually.
  • We can't get unique estimators of b2 and b3; each
    one is dependent on the other.
  • The regression coefficients are indeterminate.
  • Their standard errors are infinite.

7
Multicollinearity
  • Perfect collinearity does not usually happen,
  • except in the case of the dummy variable trap.
  • Near collinearity (highly correlated variables)
    is common.
  • One can still estimate the coefficients,
  • but they will have very large standard errors
    relative to the coefficients, so the estimates
    are imprecise.

8
  • 2. Sources of Multicollinearity

9
Sources of Multicollinearity
  • Data collection method used
  • Ex: sampling over a limited range of X values.
  • Constraints on the population being sampled
  • Ex: people with higher incomes tend to have more
    wealth.
  • We may not have enough observations on low-income
    individuals with high wealth, or high-income
    individuals with low wealth.

10
Sources of Multicollinearity
  • Model specification
  • Ex: adding polynomial terms to a model,
    especially if the range of the X variable is
    small.
  • Variables sharing a common time trend.
  • An overdetermined model:
  • more explanatory variables than the number of
    observations.

11
  • 3. Consequences of Multicollinearity

12
Theoretical Consequences
  • If we have near multicollinearity, but not
    perfect collinearity:
  • The OLS estimators are still BLUE;
  • the estimators are unbiased and efficient.
  • In theory the OLS estimators are unbiased,
  • but this is a repeated-sampling property; it says
    nothing about the estimates in our particular
    sample.

13
Practical Consequences
  • Standard error of coefficients will be large.
  • The confidence intervals will be large and the t
    stats may not be significant.
  • Th estimates will not be very precise.
  • We interpret variables as having no effect and
    this may not be correct.

14
Practical Consequences
  • Though we have insignificant t stats, we will
    probably have a very high R2.
  • There is not enough individual variation in the
    Xs;
  • there is a lot of common variation.
  • We get low t stats but a high R2 (simulated
    below).
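A short simulation in Python (statsmodels; the data are made up for illustration, not the presentation's dataset) reproduces this symptom: two nearly collinear regressors give a high R2 but individually weak t statistics.

    # Near-collinear income and wealth: high R-squared, weak t stats.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 30
    income = rng.normal(100, 10, n)
    wealth = 10 * income + rng.normal(0, 5, n)   # wealth tracks income
    y = 10 + 0.5 * income + 0.02 * wealth + rng.normal(0, 2, n)

    X = sm.add_constant(np.column_stack([income, wealth]))
    res = sm.OLS(y, X).fit()
    print(res.rsquared)    # high, near 1
    print(res.tvalues)     # slope t stats typically small individually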

15
Practical Consequences
  • Estimation will not be very robust:
  • small changes in the data, or subsample
    analysis, will lead to big changes in the
    coefficients.
  • Some regressors may have the wrong signs,
  • because the effects of these variables are all
    mixed up together.

16
Example
  • Consumption function results:
  • Y = 24.77 + 0.94X2 - 0.04X3
  • t = (3.67) (1.14) (-0.53)
  • R2 = 0.96, F = 92.40
  • X2 is income
  • X3 is wealth
  • The high R2 shows that income and wealth together
    explain about 96% of the variation in consumption.

17
Example
  • Neither of the slope coefficients is individually
    significant.
  • The wealth variable has the wrong sign.
  • The high F value means we reject the null that
    the slope coefficients are jointly equal to 0.
  • The two variables are so highly correlated that
    it is impossible to isolate their individual
    effects.

18
Example
  • If we regress X3 on X2 we get:
  • X3 = 7.54 + 10.19X2
  • t = (0.26) (62.04), R2 = 0.99
  • There is almost perfect collinearity between X2
    and X3.
  • Regress Y on income only:
  • Y = 24.45 + 0.51X2
  • t = (3.81) (14.24), R2 = 0.96

19
Example
  • Income variable is highly significant, whereas
    before it was insignificant.
  • Regress Y on wealth only
  • Y = 24.41 + 0.05X3
  • t = (3.55) (13.29), R2 = 0.96
  • Wealth variable is highly significant, whereas
    before it was insignificant.

20
  • 4. Detecting Multicollinearity

21
Methods for Detecting
  • High R2 but few significant t ratios.
  • High correlation among regressors
  • Check the correlation coefficients between pairs
    of variables (see the sketch below).
  • Check partial correlation coefficients:
  • correlations between two Xs holding another X
    constant.
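A one-line check in Python with numpy, using the X2 and X4 columns from the slide 4 table (the code itself is ours, not the presentation's):

    # Pairwise correlation of two regressors from the earlier table.
    import numpy as np

    X2 = np.array([10.0, 15.0, 18.0, 24.0])
    X4 = np.array([52.0, 75.0, 97.0, 129.0])
    print(np.corrcoef(X2, X4))   # off-diagonal near 1 flags near collinearity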

22
Methods for Detecting
  • These are just indicators, since we may still
    have multicollinearity even when pairwise
    correlations are low:
  • combinations of variables may be correlated.
  • Auxiliary regressions:
  • regress each X variable on the other X variables
    and look for a high R2.

23
Methods for Detecting
  • Suppose we have 4 Xs: regress each one on the
    other three and obtain the R2.
  • Then use an F test of whether the R2 in each case
    is significantly different from zero:
  • F = [R2/(k-1)] / [(1-R2)/(n-k)]
  • k is the number of Xs.
  • If F is high, then R2 is significantly different
    from zero (see the sketch below).
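A sketch of the auxiliary-regression check in Python (statsmodels; the data matrix is simulated, and the F statistic follows the slide's formula):

    # Regress each X on the others; a high auxiliary R2 (and F) flags
    # that regressor as nearly a linear combination of the rest.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n, k = 50, 4                                    # n obs, k regressors
    X = rng.normal(size=(n, k))
    X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=n)   # near-collinear pair

    for j in range(k):
        others = sm.add_constant(np.delete(X, j, axis=1))
        r2 = sm.OLS(X[:, j], others).fit().rsquared
        F = (r2 / (k - 1)) / ((1 - r2) / (n - k))
        print(f"X{j + 1}: R2 = {r2:.3f}, F = {F:.1f}")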

24
Methods for Detecting
  • Variance inflation factor (VIF)
  • This measures how fast the variances and
    covariances of the estimators increase as
    collinearity rises.
  • VIF = 1/(1 - R2), where R2 is the coefficient of
    determination from regressing one X on the other
    Xs.
  • As R2 increases, VIF increases.
  • A VIF above 10 is usually taken to signal a
    problem (see the sketch below).
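statsmodels ships a VIF helper; a minimal sketch with simulated income and wealth data (ours, for illustration):

    # VIF for each regressor; statsmodels expects the design matrix to
    # include the constant column.
    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    rng = np.random.default_rng(2)
    income = rng.normal(100, 10, 50)
    wealth = 10 * income + rng.normal(0, 5, 50)
    X = sm.add_constant(np.column_stack([income, wealth]))

    for j in (1, 2):                              # skip the constant
        print(variance_inflation_factor(X, j))    # values above ~10 flag trouble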

25
Methods for Detecting
  • Eigenvalues:
  • k = maximum eigenvalue / minimum eigenvalue of
    X'X (the condition number).
  • Condition index = square root of k.
  • If k is between 100 and 1000, there is moderate
    to strong multicollinearity.
  • If k exceeds 1000, there is severe
    multicollinearity (see the sketch below).
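A sketch in Python with numpy (simulated data; scaling the columns of X to unit length first is the usual convention):

    # Condition number k and condition index from the eigenvalues of X'X.
    import numpy as np

    rng = np.random.default_rng(3)
    X = rng.normal(size=(50, 3))
    X[:, 2] = X[:, 1] + 0.01 * rng.normal(size=50)   # near-collinear pair
    Xs = X / np.linalg.norm(X, axis=0)               # unit-length columns

    eig = np.linalg.eigvalsh(Xs.T @ Xs)              # eigenvalues of X'X
    k = eig.max() / eig.min()
    print(k, np.sqrt(k))                             # condition number, index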

26
  • 5. Remedial Measures

27
General Rules of Thumb
  • Don't worry if the t stats are higher than 2.
  • Don't worry if the R2 from the regression is
    higher than the R2 of any X regressed on the
    other Xs.
  • If we are only interested in prediction, and the
    linear combination of these Xs is likely to
    continue in the future, don't worry.

28
Solutions
  • Drop a variable.
  • Ex: drop the wealth variable from our consumption
    function.
  • This is only okay if we think the true
    coefficient on that variable is zero.
  • If not, we get specification error, since theory
    says both should be included.

29
Solutions
  • The income variable is picking up some of the
    wealth effect, so its coefficient will be biased
    upward.
  • This solution may be worse than the problem,
    since we now have bias, whereas with
    multicollinearity we still get BLUE.

30
Solutions
  • New data
  • Get another sample, or a bigger sample.
  • Even if a bigger sample had the same
    multicollinearity problem, the additional
    information would help to reduce the variances.

31
Solutions
  • Rethink the model:
  • there may be alternative specifications or
    alternative functional forms.
  • Use a priori information:
  • we may know coefficient values from previous
    estimations in which the correlation was less
    serious, or
  • theory may indicate the relationship between the
    correlated variables.

32
Solutions
  • Ex: we may know that b3 = 0.10b2:
  • the rate of change of consumption with respect to
    wealth is one tenth of the rate with respect to
    income.
  • So run:
  • Y = b1 + b2X2 + 0.10b2X3 + e
  • Y = b1 + b2X + e, where X = X2 + 0.1X3
  • Once we estimate b2, we can recover b3 from our a
    priori relationship (see the sketch below).
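A sketch of imposing this restriction in Python (statsmodels; the data are simulated so that b3 = 0.10b2 holds by construction):

    # Impose b3 = 0.10*b2 by regressing on X = X2 + 0.1*X3, then
    # recover b3 from the restriction.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    income = rng.normal(100, 10, 40)
    wealth = 10 * income + rng.normal(0, 5, 40)
    y = 20 + 0.6 * income + 0.06 * wealth + rng.normal(0, 2, 40)

    X = income + 0.1 * wealth                 # combined regressor
    res = sm.OLS(y, sm.add_constant(X)).fit()
    b2 = res.params[1]
    b3 = 0.10 * b2                            # from the a priori restriction
    print(b2, b3)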

33
Solutions
  • Transform variables
  • Suppose we have time-series data on consumption,
    income, and wealth.
  • We have high multicollinearity partly because
    these variables move in the same direction over
    time.

34
Solutions
  • We want to estimate:
  • Yt = b1 + b2X2t + b3X3t + et
  • Suppose we write the model at time t-1:
  • Yt-1 = b1 + b2X2,t-1 + b3X3,t-1 + et-1
  • Now subtract the second from the first; this is
    the first-difference form:
  • Yt - Yt-1 = b2(X2t - X2,t-1) + b3(X3t - X3,t-1)
    + vt (see the sketch below)
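A sketch of the first-difference regression in Python (statsmodels; the trending series are simulated). The intercept drops out because b1 cancels in the subtraction.

    # Difference the series, then regress without an intercept.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(5)
    t = np.arange(60.0)
    income = 100 + 2 * t + rng.normal(0, 3, 60)    # common trend drives
    wealth = 1000 + 20 * t + rng.normal(0, 30, 60) # the collinearity
    y = 10 + 0.5 * income + 0.02 * wealth + rng.normal(0, 2, 60)

    dX = np.column_stack([np.diff(income), np.diff(wealth)])
    res = sm.OLS(np.diff(y), dX).fit()             # no constant: b1 cancels
    print(res.params)                              # estimates of b2, b3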

35
Solutions
  • This sometimes solves the problem, since the
    differences of the variables may not be
    correlated even though their levels are.
  • The problem created is that the new error term vt
    may not satisfy the assumption that the
    disturbances are uncorrelated.

36
Solutions
  • Could also combine the variables in some way to
    form some sort of index.
  • This is what principal components analysis does
    (see the sketch below).
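A sketch of a principal-components index in Python with numpy's SVD (simulated data; scikit-learn's PCA would give the same result):

    # Standardize the collinear regressors, then take the first
    # principal component as a single combined index.
    import numpy as np

    rng = np.random.default_rng(6)
    income = rng.normal(100, 10, 50)
    wealth = 10 * income + rng.normal(0, 5, 50)

    Z = np.column_stack([income, wealth])
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)        # standardize
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    pc1 = Z @ Vt[0]                                 # first component scores
    print(pc1[:5])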

37
Solutions
  • Combine cross-sectional and time-series data.
  • Ex: we want to study the demand for autos and
    have time-series data:
  • lnY = b1 + b2 lnPrice + b3 lnIncome + e
  • Y is the number of cars sold.
  • Generally, in time-series data the price and
    income variables are highly collinear, so we
    can't run this model.

38
Solutions
  • Suppose we also have some cross-sectional data.
  • We can get a reliable estimate of the income
    elasticity b3 from it, since in such a sample
    the prices will not vary much.
  • Then estimate the time-series regression:
  • Y* = b1 + b2 lnP + e
  • where Y* = lnY - b3 lnIncome.
  • Y* represents the value of Y after removing the
    effect of income (see the sketch below).
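A sketch of the two-step procedure in Python (statsmodels; the series and the cross-sectional elasticity b3 = 0.8 are invented for illustration):

    # Step 1 (done elsewhere): estimate b3 from cross-sectional data.
    # Step 2: subtract the income effect, then regress on price alone.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(7)
    lnP = np.log(100 + np.arange(30.0))          # trending price
    lnInc = np.log(200 + 3 * np.arange(30.0))    # trending income
    lnY = 5 - 1.2 * lnP + 0.8 * lnInc + rng.normal(0, 0.05, 30)

    b3 = 0.8                                     # from the cross section
    Ystar = lnY - b3 * lnInc                     # remove the income effect
    res = sm.OLS(Ystar, sm.add_constant(lnP)).fit()
    print(res.params[1])                         # price elasticity, near -1.2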

39
Solutions
  • We are assuming that the cross sectional income
    elasticity is the same as the time series
    elasticity.
  • In polynomial regressions we often have
    multicollinearity.
  • We can express the explanatory variables as
    deviations from their mean values, and this
    tends to reduce the multicollinearity (see the
    sketch below).
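A quick check in Python with numpy (illustrative positive-valued x): centering x before squaring sharply reduces the correlation between x and x2.

    # Raw x and x^2 are almost perfectly correlated when x > 0;
    # centering x first breaks most of that correlation.
    import numpy as np

    rng = np.random.default_rng(8)
    x = rng.uniform(10, 20, 100)
    print(np.corrcoef(x, x**2)[0, 1])       # near 1 for raw x

    xc = x - x.mean()                       # deviation from the mean
    print(np.corrcoef(xc, xc**2)[0, 1])     # much closer to 0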