Multicollinearity occurs when X'X is ill-conditioned

Slides: 19
Provided by: nobl4
Transcript and Presenter's Notes
1
Multicollinearity occurs when X'X is
ill-conditioned.
Indications of the presence of serious
multicollinearity are given by the following
informal diagnostics:
  • Large changes in the estimated regression
    coefficients when a predictor variable is
    added or deleted, or when an observation is added
    or deleted
  • Non-significant results in individual tests on
    the regression coefficients while the global F
    test shows significance.
  • Estimated regression coefficients with an
    algebraic sign that is opposite of that expected
    from theoretical considerations or prior
    experience
  • Large standard errors for regression coefficients

2
Variance inflation factors (VIF) are a widely used
formal method for detecting the presence of
multicollinearity. These factors measure how much
the variances of the estimated regression
coefficients are inflated compared to when the
predictor variables are not linearly related
(orthogonal).
The predictor variables are generally on
different scales, so it is useful to work with
dimensionless quantities in order to put each
variable on an equal footing. We therefore work with
the standardized variable model.
3
Fit the no-intercept model
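The equation on this slide did not survive in the transcript. A sketch of one common form follows, assuming the correlation transformation used in standard regression texts. The symbols (n, p, s_Y, s_k, R, r_YX and the starred quantities) are assumptions, not taken from the slides.

```latex
% Correlation transformation of the response and of each predictor k = 1, ..., p-1
Y_i^{*} = \frac{1}{\sqrt{n-1}} \left( \frac{Y_i - \bar{Y}}{s_Y} \right), \qquad
X_{ik}^{*} = \frac{1}{\sqrt{n-1}} \left( \frac{X_{ik} - \bar{X}_k}{s_k} \right)

% Standardized regression model (no intercept)
Y_i^{*} = \beta_1^{*} X_{i1}^{*} + \cdots + \beta_{p-1}^{*} X_{i,p-1}^{*} + \varepsilon_i^{*}

% With X the matrix of transformed predictors, R the correlation matrix of the
% predictors, and r_{YX} the vector of correlations between Y and each predictor:
X'X = R, \qquad X'Y = r_{YX}, \qquad
b^{*} = R^{-1} r_{YX}, \qquad
\operatorname{Var}(b^{*}) = (\sigma^{*})^{2} R^{-1}
```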
4
(No Transcript)
5
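When the predictors are uncorrelated, the
correlation matrix R of the predictors is the
identity matrix.
The amount by which the variance of the b_i's is
inflated over the orthogonal case is therefore given
by the diagonal elements of R^-1, which are called
the variance inflation factors (VIF).
VIF values greater than 10 are generally taken to
indicate serious multicollinearity.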
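As a rough illustration of the definition above, the sketch below computes VIFs as the diagonal of the inverse correlation matrix. It uses only the Python standard library. The helper names (invert, vif) and the correlation matrix are hypothetical, chosen for illustration and not taken from the slides.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Use the row with the largest pivot to limit round-off error.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    # The right half of the augmented matrix now holds the inverse.
    return [row[n:] for row in aug]


def vif(corr):
    """Return the VIFs: the diagonal elements of the inverse correlation matrix."""
    inv = invert(corr)
    return [inv[i][i] for i in range(len(corr))]


# Hypothetical correlation matrix: x1 and x2 are strongly correlated,
# x3 is uncorrelated with both.
R = [[1.0, 0.95, 0.0],
     [0.95, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
vifs = vif(R)
print([round(v, 2) for v in vifs])
```

In this made-up example the two correlated predictors each have a VIF of 1 / (1 - 0.95^2), just above 10, while the uncorrelated predictor has a VIF of 1.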
6
  • Remediation
  • Remove subsets of variables contributing to the
    near-linear dependencies
  • Principal Components Analysis
  • Ridge Regression

7
Ridge Regression
A small constant (c) is added to the diagonal of
the standardized X'X matrix to make it more stable.
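The slide's formula is missing from the transcript. In the standardized model the usual form of the ridge estimator is sketched below, assuming R denotes the correlation matrix of the predictors (the standardized X'X) and r_YX the vector of correlations between the response and each predictor. Both symbols are assumptions, not taken from the slides.

```latex
% Ridge estimator: add c to each diagonal element of the standardized X'X = R
b^{R} = (R + cI)^{-1} r_{YX}, \qquad c \ge 0

% Setting c = 0 recovers the ordinary least squares estimator
b^{R}\big|_{c=0} = R^{-1} r_{YX}
```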
8
Properties of the ridge estimator
Goal: Minimize mean squared error
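The formulas for this slide were not transcribed. The standard properties behind this goal can be sketched as follows, assuming the standardized model with predictor correlation matrix R, true standardized coefficients \beta^{*} and error variance (\sigma^{*})^{2}. All symbols are assumptions, not taken from the slides.

```latex
% The ridge estimator is biased for c > 0 ...
E\left[ b^{R} \right] = (R + cI)^{-1} R \, \beta^{*}

% ... but has smaller variance than least squares
\operatorname{Var}\left( b^{R} \right) = (\sigma^{*})^{2} \, (R + cI)^{-1} R \, (R + cI)^{-1}

% Total mean squared error = total variance + squared bias
E \left\| b^{R} - \beta^{*} \right\|^{2}
  = \operatorname{tr} \operatorname{Var}\left( b^{R} \right)
  + \left\| E\left[ b^{R} \right] - \beta^{*} \right\|^{2}
```

The diagonal elements of (R + cI)^{-1} R (R + cI)^{-1} play the role of the VIFs for the ridge estimator. At c = 0 they reduce to the diagonal of R^{-1}.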
9
Problem: We don't know the true coefficients β.
Solution: Trace plot. Plot the regression
estimates against c and choose the smallest c
(introducing only as much bias as is necessary to
fix the inflated variance of the estimates) at which
the estimates seem to have settled.
10
(No Transcript)
11
Max VIF at vertical red line is 10
12
Max VIF is 5 at the vertical blue line
13
  • Common criticisms
  • Choice of c is very subjective
  • c is generally chosen to be too large when using
    the trace plot

14
Proposal: Since the max VIF should be less than 10,
choose c so that the max VIF is something less than
10 (5, for example).
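One way to carry out this proposal is sketched below, using only the Python standard library. It assumes the ridge VIFs are the diagonal elements of (R + cI)^-1 R (R + cI)^-1, where R is the correlation matrix of the predictors, and it finds c by bisection because the maximum ridge VIF decreases as c grows. The helper names (invert, matmul, max_ridge_vif, choose_c) and the correlation matrix are hypothetical, not taken from the slides.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def max_ridge_vif(corr, c):
    """Largest diagonal element of (R + cI)^-1 R (R + cI)^-1."""
    n = len(corr)
    shifted = [[corr[i][j] + (c if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    inv = invert(shifted)
    v = matmul(matmul(inv, corr), inv)
    return max(v[i][i] for i in range(n))


def choose_c(corr, target=5.0, upper=1.0, tol=1e-8):
    """Smallest c (to within tol) whose maximum ridge VIF does not exceed target."""
    if max_ridge_vif(corr, 0.0) <= target:
        return 0.0  # ordinary least squares already meets the target
    lo, hi = 0.0, upper
    while max_ridge_vif(corr, hi) > target:
        hi *= 2.0  # widen the bracket until the target is met
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if max_ridge_vif(corr, mid) > target:
            lo = mid
        else:
            hi = mid
    return hi


# Hypothetical correlation matrix: x1 and x2 are strongly correlated.
R = [[1.0, 0.95, 0.0],
     [0.95, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
c = choose_c(R, target=5.0)
print(c, max_ridge_vif(R, c))
```

Unlike reading a trace plot, this rule is reproducible: the same data and the same target always give the same c.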
15
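/* Fit the model with VIFs and a ridge constant; save the estimates in ridgeout */
proc reg outest=ridgeout;
  model y = x1 x2 x3 x4 / vif ridge=0.09143117;
run;
/* Print the saved OLS and ridge estimates */
proc print data=ridgeout;
run;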
16
The SAS System

The REG Procedure
Model: MODEL1
Dependent Variable: Y

Number of Observations Read    160
Number of Observations Used    160

Analysis of Variance

                            Sum of          Mean
Source             DF      Squares        Square    F Value    Pr > F
Model               4    158.98085      39.74521     321781    <.0001
Error             155      0.01915    0.00012352
Corrected Total   159    159.00000

Root MSE           0.01111    R-Square    0.9999
Dependent Mean    10.00000    Adj R-Sq    0.9999
Coeff Var          0.11114

Parameter Estimates

                  Parameter     Standard                           Variance
Variable    DF     Estimate        Error    t Value    Pr > |t|   Inflation
Intercept    1    898.70570      0.95362     942.41      <.0001           0
X1           1      0.56404      0.00183     308.57      <.0001     4.30133
X2           1     -4.77187      0.00535    -892.64      <.0001    36.78685
X3           1     -4.82067      0.00511    -943.62      <.0001    33.59635
X4           1     -1.25061      0.00199    -627.61      <.0001     5.11127
17
The SAS System

Obs    _MODEL_    _TYPE_    _DEPVAR_    _RIDGE_     _PCOMIT_    _RMSE_
  1    MODEL1     PARMS     Y           .           .           0.01111
  2    MODEL1     RIDGE     Y           0.091431    .           0.70899

Obs    Intercept    X1          X2          X3          X4          Y
  1    898.706       0.56404    -4.77187    -4.82067    -1.25061    -1
  2    144.024      -0.28152    -0.57944    -0.77588    -0.02338    -1
18
Write a SAS macro that takes as input a SAS
dataset from the user along with any other
parameters you deem appropriate.
  • Perform OLS if the maximum VIF is less than 10
  • Perform ridge regression if the maximum VIF is 10
    or more
  • Have the output from both methods look similar to
    that produced by proc reg

E-mail the macro code to me by the start of class on
11/8.