Transcript and Presenter's Notes

Title: Regression


1
KNN Ch7
  • Multiple Regression II
  • SS Decomposition
  • Tests of significance
  • Multicollinearity

2
Extra Sum of Squares (ESS)
  • Marginal reduction in SSE when one or several
    predictor variables are added to the regression
    model given that the other variables are already
    in the model.
  • In what other, equivalent manner, can you state
    the above?
  • The word Extra is used since we would like to
    know what the marginal (or extra) contribution of
    a variable, or a set of variables, is when it is
    added to the regression model as an explanatory
    variable; see the sketch below.
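A minimal sketch of this definition in KNN's notation, for the case of adding X1 to a model that already contains X2 (this is also one equivalent restatement: the extra sum of squares is the increase in SSR, not just the reduction in SSE):

\[ SSR(X_1 \mid X_2) \;=\; SSE(X_2) - SSE(X_1, X_2) \;=\; SSR(X_1, X_2) - SSR(X_2) \]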

3
Decomposition of SSR into ESS
  • A pictorial representation is also possible. See
    page 261, Fig. 7.1 of KNN

(Figure 7.1: bars showing SSTO split into SSR(X2) and SSE(X2), and further into
SSR(X1, X2) = SSR(X2) + SSR(X1 | X2) and SSE(X1, X2).)
4
Decomposition of SSR into ESS
  • For two or three explanatory variables the
    formulae are quite easy.
  • The two-variable and three-variable
    decompositions are sketched below.
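A sketch of the decompositions, following KNN's notation:

\[ SSR(X_1, X_2) = SSR(X_1) + SSR(X_2 \mid X_1) = SSR(X_2) + SSR(X_1 \mid X_2) \]
\[ SSR(X_1, X_2, X_3) = SSR(X_1) + SSR(X_2 \mid X_1) + SSR(X_3 \mid X_1, X_2) \]

The annotations below refer to the last term, SSR(X3 | X1, X2), viewed as the output of a regression on adjusted variables.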

Considering X3 adjusted for X1 and X2 as the
predictor, this would be the SSR.
Considering Y adjusted for X1 and X2 as the
response variable, and X3 adjusted for X1 and X2
as the predictor, this would be the SSE.
Considering Y adjusted for X1 and X2 as the
response variable, this would be the SSTO.
5
Decomposition of SSR into ESS
  • Note that with three variables, other orderings
    of the decomposition are also possible, e.g.
    SSR(X2) + SSR(X3 | X2) + SSR(X1 | X2, X3).
  • To test a single coefficient, e.g. H0: ß3 = 0
    vs. Ha: ß3 ≠ 0, the test statistic is given
    below.
  • To test several coefficients at once, e.g.
    H0: ß2 = ß3 = 0 vs. Ha: not both ß2 and ß3 equal
    zero, the test statistic is also given below.
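A sketch of these partial F statistics, assuming the hypotheses above and the full three-variable model:

\[ F^* = \frac{SSR(X_3 \mid X_1, X_2)/1}{MSE(X_1, X_2, X_3)} \sim F(1, n-4) \quad \text{under } H_0\!: \beta_3 = 0 \]
\[ F^* = \frac{SSR(X_2, X_3 \mid X_1)/2}{MSE(X_1, X_2, X_3)} \sim F(2, n-4) \quad \text{under } H_0\!: \beta_2 = \beta_3 = 0 \]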

6
Decomposition of SSR into ESS
  • In general, however, we can write the test
    statistic in terms of the full and reduced
    models, as sketched below.
  • This form is very convenient to use since we do
    not have to keep track of the individual sums of
    squares.
  • Also, this form will minimize any errors due to
    subtraction when calculating the SSRs.
  • On the next page we see the ANOVA table with
    decomposition of SSR and three variables.
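A sketch of the general linear test statistic referred to above, written only in terms of the error sums of squares of the reduced model (R) and the full model (F):

\[ F^* = \frac{\bigl(SSE(R) - SSE(F)\bigr)/(df_R - df_F)}{SSE(F)/df_F} \]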

7
The ANOVA Table
Source of variation   Sum of squares      df    Mean squares
Regression            SSR(X1, X2, X3)      3    MSR(X1, X2, X3)
  X1                  SSR(X1)              1    MSR(X1)
  X2 | X1             SSR(X2 | X1)         1    MSR(X2 | X1)
  X3 | X1, X2         SSR(X3 | X1, X2)     1    MSR(X3 | X1, X2)
Error                 SSE(X1, X2, X3)    n-4    MSE(X1, X2, X3)
Total                 SSTO               n-1
8
Another ANOVA Table (what's the difference?)
Source of variation   Sum of squares      df    Mean squares
Regression            SSR(X1, X2, X3)      3    MSR(X1, X2, X3)
  X3                  SSR(X3)              1    MSR(X3)
  X2 | X3             SSR(X2 | X3)         1    MSR(X2 | X3)
  X1 | X2, X3         SSR(X1 | X2, X3)     1    MSR(X1 | X2, X3)
Error                 SSE(X1, X2, X3)    n-4    MSE(X1, X2, X3)
Total                 SSTO               n-1
9
An Example
The regression equation is
Y = 236 - 0.203 X1 + 9.09 X2 - 0.330 X3

Predictor       Coef     StDev       T      P
Constant       236.1     254.5    0.93  0.355
X1          -0.20286   0.05894   -3.44  0.001
X2             9.090     1.718    5.29  0.000
X3           -0.3303    0.2229   -1.48  0.141

S = 1802   R-Sq = 95.7%   R-Sq(adj) = 95.6%

Analysis of Variance
Source          DF           SS           MS        F      P
Regression       3   9833046236   3277682079  1009.04  0.000
Error          137    445017478      3248303
Total          140  10278063714

Source  DF      Seq SS
X1       1    80601012
X2       1  9745311037
X3       1     7134188

Source  DF      Seq SS
X3       1  9733071257
X2       1    61498868
X1       1    38476111
10
Test for a single ßk = 0 in a general model
  • Fit the full model with all variables and
    compute its error sum of squares, SSE(F).
  • Fit the reduced model without Xk and compute its
    error sum of squares, SSE(R).
  • Form the partial F statistic from SSE(R), SSE(F)
    and their degrees of freedom (see the sketch
    below).
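A minimal sketch of this full-versus-reduced comparison in Python with statsmodels; the data frame, the column names, and the choice of X3 as the variable to drop are hypothetical:

  import numpy as np
  import pandas as pd
  import statsmodels.formula.api as smf
  from statsmodels.stats.anova import anova_lm

  # Hypothetical data: response Y and predictors X1, X2, X3
  rng = np.random.default_rng(0)
  df = pd.DataFrame(rng.normal(size=(141, 4)), columns=["Y", "X1", "X2", "X3"])

  full = smf.ols("Y ~ X1 + X2 + X3", data=df).fit()   # full model: gives SSE(F), df_F
  reduced = smf.ols("Y ~ X1 + X2", data=df).fit()     # reduced model without X3: SSE(R), df_R

  # Partial F-test: F* = [(SSE(R) - SSE(F)) / (df_R - df_F)] / [SSE(F) / df_F]
  print(anova_lm(reduced, full))   # F* and its p-value; for a single dropped Xk, F* equals t^2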

11
An Example
The regression equation is
Y = 236 - 0.203 X1 + 9.09 X2 - 0.330 X3

Predictor       Coef     StDev       T      P
Constant       236.1     254.5    0.93  0.355
X1          -0.20286   0.05894   -3.44  0.001
X2             9.090     1.718    5.29  0.000
X3           -0.3303    0.2229   -1.48  0.141

S = 1802   R-Sq = 95.7%   R-Sq(adj) = 95.6%

Analysis of Variance
Source          DF           SS           MS        F      P
Regression       3   9833046236   3277682079  1009.04  0.000
Error          137    445017478      3248303
Total          140  10278063714

The regression equation is
Y = 881 - 0.0918 X1 + 0.846 X3

Predictor       Coef     StDev       T      P
Constant       881.4     244.2    3.61  0.000
X1          -0.09185   0.06023   -1.52  0.130
X3           0.84614   0.01696   49.88  0.000

S = 1971   R-Sq = 94.8%   R-Sq(adj) = 94.7%

Analysis of Variance
Source          DF           SS           MS        F      P
Regression       2   9742103306   4871051653  1254.21  0.000
Error          138    535960409      3883771
Total          140  10278063714
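Using the two outputs above (the full model and the reduced model without X2), the partial F statistic for H0: ß2 = 0 works out to

\[ F^* = \frac{(535{,}960{,}409 - 445{,}017{,}478)/1}{3{,}248{,}303} \approx 28.0, \]

which matches the square of the t-statistic for X2 in the full model (5.29² ≈ 28).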
12
Test for several ßk = 0 in a general model
See (7.26), pg. 267 of KNN
  • Fit the full model with all variables and
    compute SSE(F).
  • Fit the reduced model without the set of
    predictors being tested and compute SSE(R).
  • Form the partial F statistic from SSE(R), SSE(F)
    and their degrees of freedom, or equivalently
    from the extra sum of squares (see the sketch
    below).
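A sketch of the statistic for dropping a set of q predictors from a full model with p parameters, in the two equivalent forms referred to above:

\[ F^* = \frac{SSR(\text{dropped set} \mid \text{retained set})/q}{MSE(F)} = \frac{\bigl(SSE(R) - SSE(F)\bigr)/q}{SSE(F)/(n-p)} \sim F(q, n-p) \text{ under } H_0 \]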

13
An Example
The regression equation is
Y = 236 - 0.203 X1 + 9.09 X2 - 0.330 X3

Predictor       Coef     StDev       T      P
Constant       236.1     254.5    0.93  0.355
X1          -0.20286   0.05894   -3.44  0.001
X2             9.090     1.718    5.29  0.000
X3           -0.3303    0.2229   -1.48  0.141

S = 1802   R-Sq = 95.7%   R-Sq(adj) = 95.6%

Analysis of Variance
Source          DF           SS           MS        F      P
Regression       3   9833046236   3277682079  1009.04  0.000
Error          137    445017478      3248303
Total          140  10278063714

The regression equation is
Y = 14 + 6.50 X2

Predictor       Coef     StDev       T      P
Constant        14.4     194.9    0.07  0.941
X2            6.4957    0.1225   53.05  0.000

S = 1866   R-Sq = 95.3%   R-Sq(adj) = 95.3%

Analysis of Variance
Source            DF           SS           MS        F      P
Regression         1   9794265737   9794265737  2813.99  0.000
Residual Error   139    483797978      3480561
Total            140  10278063714
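From the two outputs above (the full model vs. Y regressed on X2 alone, i.e. testing H0: ß1 = ß3 = 0):

\[ F^* = \frac{(483{,}797{,}978 - 445{,}017{,}478)/2}{3{,}248{,}303} \approx 5.97, \]

well above the 5% critical value of F(2, 137) (roughly 3.1), so X1 and X3 together do add significantly to the model containing X2.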
14
Test for ßk = ßq in a general model
  • Fit the full model with all variables and
    compute SSE(F).
  • Fit the reduced model in which Xk and Xq enter
    only through their sum (Xk + Xq), so that they
    share a single coefficient, and compute SSE(R).
  • Form the partial F statistic from SSE(R), SSE(F)
    and their degrees of freedom.

15
An Example
The regression equation is
Y = 236 - 0.203 X1 + 9.09 X2 - 0.330 X3

Predictor       Coef     StDev       T      P
Constant       236.1     254.5    0.93  0.355
X1          -0.20286   0.05894   -3.44  0.001
X2             9.090     1.718    5.29  0.000
X3           -0.3303    0.2229   -1.48  0.141

S = 1802   R-Sq = 95.7%   R-Sq(adj) = 95.6%

Analysis of Variance
Source          DF           SS           MS        F      P
Regression       3   9833046236   3277682079  1009.04  0.000
Error          137    445017478      3248303
Total          140  10278063714

The regression equation is
Y = 324 - 0.200 (X1 + X3) + 8.09 X2

Predictor       Coef     StDev       T      P
Constant       324.2     208.7    1.55  0.123
(X1 + X3)   -0.19971   0.05858   -3.41  0.001
X2            8.0891    0.4820   16.78  0.000

S = 1798   R-Sq = 95.7%   R-Sq(adj) = 95.6%

Analysis of Variance
Source            DF           SS           MS        F      P
Regression         2   9831847860   4915923930  1520.33  0.000
Residual Error   138    446215854      3233448
Total            140  10278063714
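From the two outputs above (the full model vs. the model in which ß1 = ß3 is imposed by using X1 + X3 as a single predictor):

\[ F^* = \frac{(446{,}215{,}854 - 445{,}017{,}478)/1}{3{,}248{,}303} \approx 0.37, \]

so the hypothesis ß1 = ß3 is not rejected.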
16
Coefficients of Partial Determination
  •   Recall the definition of the coefficient of
    (multiple) determination.
  • R-sq is the proportionate reduction in Y
    variation when the set of X variables is
    considered in the model.
  • Now consider a coefficient of partial
    determination
  • R-sq for a predictor, given the presence of a set
    of predictors in the model, measures the marginal
    contribution of each variable given that others
    are already in the model.
  • A graphical representation of the strength of the
    relationship between Y and X1, adjusted for X2,
    is provided by partial regression plots (see HW6)

17
Coefficients of Partial Determination
  •   For a model with two independent variables
  • Interpret this ,
  • Generalization is easy, for e.g.,
  • etc.
  • Is there an alternate interpretation of the above
    partial coefficients? What, is say ??
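A sketch of these coefficients in KNN's notation:

\[ R^2_{Y1 \mid 2} = \frac{SSR(X_1 \mid X_2)}{SSE(X_2)}, \qquad R^2_{Y2 \mid 1} = \frac{SSR(X_2 \mid X_1)}{SSE(X_1)} \]

and, generalizing, e.g.

\[ R^2_{Y4 \mid 123} = \frac{SSR(X_4 \mid X_1, X_2, X_3)}{SSE(X_1, X_2, X_3)}. \]

One alternate interpretation: the signed square root of a coefficient of partial determination (taking the sign of the corresponding regression coefficient) is the coefficient of partial correlation.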

18
An Example
The regression equation is
Y = -4.9 + 1.12 X1

Predictor       Coef     StDev       T      P
Constant       -4.92     51.52   -0.10  0.924
X1            1.1209    0.9349    1.20  0.233

S = 87.46   R-Sq = 1.0%   R-Sq(adj) = 0.3%

Analysis of Variance
Source            DF        SS      MS      F      P
Regression         1     10995   10995   1.44  0.233
Residual Error   139   1063300    7650
Total            140   1074295

The regression equation is
Y = -6.17 + 0.144 X2

Predictor       Coef      StDev       T      P
Constant      -6.167      2.075   -2.97  0.003
X2          0.144481   0.002842   50.83  0.000

S = 19.86   R-Sq = 94.9%   R-Sq(adj) = 94.9%

Analysis of Variance
Source            DF        SS        MS        F      P
Regression         1   1019453   1019453  2583.84  0.000
Residual Error   139     54842       395
Total            140   1074295
19
Another Example
The regression equation is
Y = 236 - 0.203 X1 + 9.09 X2 - 0.330 X3

Predictor       Coef     StDev       T      P
Constant       236.1     254.5    0.93  0.355
X1          -0.20286   0.05894   -3.44  0.001
X2             9.090     1.718    5.29  0.000
X3           -0.3303    0.2229   -1.48  0.141

S = 1802   R-Sq = 95.7%   R-Sq(adj) = 95.6%

Analysis of Variance
Source            DF           SS           MS        F      P
Regression         3   9833046236   3277682079  1009.04  0.000
Residual Error   137    445017478      3248303
Total            140  10278063714

Source  DF      Seq SS
X1       1    80601012
X2       1  9745311037
X3       1     7134188

The regression equation is
Y = 408 - 0.173 X1 + 6.55 X2

Predictor       Coef     StDev       T      P
Constant       407.8     227.6    1.79  0.075
X1          -0.17253   0.05551   -3.11  0.002
X2            6.5506    0.1201   54.54  0.000

S = 1810   R-Sq = 95.6%   R-Sq(adj) = 95.5%

Analysis of Variance
Source            DF           SS           MS        F      P
Regression         2   9825912049   4912956024  1499.47  0.000
Residual Error   138    452151666      3276461
Total            140  10278063714
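Using the two outputs above (the full model vs. the model with X1 and X2 only), the coefficient of partial determination for X3 is

\[ R^2_{Y3 \mid 12} = \frac{SSR(X_3 \mid X_1, X_2)}{SSE(X_1, X_2)} = \frac{452{,}151{,}666 - 445{,}017{,}478}{452{,}151{,}666} = \frac{7{,}134{,}188}{452{,}151{,}666} \approx 0.016, \]

so, given X1 and X2, adding X3 removes only about 1.6% of the remaining variation in Y.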
20
The Standardized Multiple Regression Model
21
The Standardized Multi. Regression Model
  • Why necessary?
  • - Round-off errors in normal equations
    calculations (especially when inverting a large
    X'X matrix. What is the size of this inverse
    for, say, Y = b0 + b1X1 + ... + b7X7?)
  • - Lack of comparability of coefficients in
    regression models
  • (differences in the units involved)
  • - Especially important in the presence of
    multicollinearity, where the determinant of the
    X'X matrix is close to zero.
  • OK. So we have a problem. How do we take care of
    it?
  • - The Correlation Transformation
  • - Centering: take the difference between
    each observation and the average, AND
  • - Scaling: divide the centered observation
    by the standard deviation of the variable.
  • You must have noticed that this is nothing but
    regular standardization. What's the twist? See
    the next slide.

22
The Standardized Multi. Regression Model
Standardization
Correlation Transformation
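A sketch of the two transformations named above, assuming KNN's definitions (the "twist" is the extra 1/sqrt(n-1) factor):

Standardization:
\[ \frac{Y_i - \bar{Y}}{s_Y}, \qquad \frac{X_{ik} - \bar{X}_k}{s_k} \]

Correlation transformation:
\[ Y_i^{*} = \frac{1}{\sqrt{n-1}}\,\frac{Y_i - \bar{Y}}{s_Y}, \qquad X_{ik}^{*} = \frac{1}{\sqrt{n-1}}\,\frac{X_{ik} - \bar{X}_k}{s_k} \]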
23
The Standardized Multi. Regression Model
  • Once we have performed the Correlation
    Transformation, all that remains is to obtain
    the new regression parameters. The standardized
    regression model, and the back-transformation
    that recovers the original parameters, are
    sketched below.
  • In matrix notation we have some interesting
    relationships (also sketched below).
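A sketch of the standardized model and the back-transformation, assuming the correlation transformation above:

\[ Y_i^{*} = \beta_1^{*} X_{i1}^{*} + \cdots + \beta_{p-1}^{*} X_{i,p-1}^{*} + \varepsilon_i^{*} \]
\[ b_k = \frac{s_Y}{s_k}\, b_k^{*} \quad (k = 1, \ldots, p-1), \qquad b_0 = \bar{Y} - b_1\bar{X}_1 - \cdots - b_{p-1}\bar{X}_{p-1} \]

In matrix notation, with X* the matrix of correlation-transformed predictors, X*'X* = r_XX (the correlation matrix of the predictors) and X*'Y* = r_YX (the vector of correlations between Y and each predictor), so the normal equations involve only correlations. Note also that the standardized model has no intercept term.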

Is this surprising?
WHY?
24
An Example
Part of the original (unstandardized) data set
The regression equation is
Y = 236 - 0.203 X1 + 9.09 X2 - 0.330 X3

Predictor       Coef     StDev       T      P
Constant       236.1     254.5    0.93  0.355
X1          -0.20286   0.05894   -3.44  0.001
X2             9.090     1.718    5.29  0.000
X3           -0.3303    0.2229   -1.48  0.141

S = 1802   R-Sq = 95.7%   R-Sq(adj) = 95.6%

Analysis of Variance
Source            DF           SS           MS        F      P
Regression         3   9833046236   3277682079  1009.04  0.000
Residual Error   137    445017478      3248303
Total            140  10278063714
25
An Example (continued)
Standardized, and then correlation-transformed, data
26
An Example (continued)
The regression equation is
Y = -0.00000 - 0.0660 X1 + 1.37 X2 - 0.381 X3

Predictor        Coef      StDev       T      P
Constant    -0.000000   0.001497   -0.00  1.000
X1           -0.06596    0.01916   -3.44  0.001
X2             1.3661     0.2582    5.29  0.000
X3            -0.3813     0.2573   -1.48  0.141

S = 0.01778   R-Sq = 95.7%   R-Sq(adj) = 95.6%

Analysis of Variance
Source            DF        SS        MS        F      P
Regression         3   0.95670   0.31890  1009.04  0.000
Residual Error   137   0.04330   0.00032
Total            140   1.00000
  • Compared to the regression model obtained from
    the untransformed variables, what can we say
    about the two models? Is there a difference in
    predictive power, or is there a difference in
    ease of interpretation?
  • Why is b0 = 0? Just by chance?

27
Multicollinearity
  • One of the assumptions of the OLS model is that
    the predictor variables are not linearly related
    to one another.
  • When this assumption is not satisfied,
    multicollinearity is said to exist. (Think about
    Venn diagrams for this.)
  • Note that multicollinearity is strictly a
    sample phenomenon.
  • We may try to avoid it by doing controlled
    experiments, but in most social sciences
    research, this is very difficult to do.
  • Let us first, consider the case of uncorrelated
    predictor variables, i.e., no multicollinearity.
  • -Usually occurs in controlled experiments
  • -In this case the R2 between each pair of
    predictor variables is zero.
  • -The ESS for each variable is then the same as
    the SSR obtained when that variable is the only
    predictor of the response variable.

28
An Example
The regression equation is
Y = -4.73 + 0.107 X1 + 3.75 X2

Predictor       Coef     StDev       T      P
Constant      -4.732     4.428   -1.07  0.334
X1            0.1071    0.3537    0.30  0.774
X2             3.750     1.621    2.31  0.069

S = 2.292   R-Sq = 52.1%   R-Sq(adj) = 33.0%

Analysis of Variance
Source            DF       SS       MS      F      P
Regression         2   28.607   14.304   2.72  0.159
Residual Error     5   26.268    5.254
Total              7   54.875

Source  DF  Seq SS
X1       1   0.482
X2       1  28.125

Source  DF  Seq SS
X2       1  28.125
X1       1   0.482
X1 X2 Y
1 2 1
2 2 5
3 3 7
4 3 8
5 3 4
6 3 9
7 2 5
8 2 2
29
An Example (continued)
The regression equation is
Y = 4.64 + 0.107 X1

Predictor       Coef     StDev       T      P
Constant       4.643     2.346    1.98  0.095
X1            0.1071    0.4646    0.23  0.825

S = 3.011   R-Sq = 0.9%   R-Sq(adj) = 0.0%

Analysis of Variance
Source            DF       SS      MS      F      P
Regression         1    0.482   0.482   0.05  0.825
Residual Error     6   54.393   9.065
Total              7   54.875

The regression equation is
Y = -4.25 + 3.75 X2

Predictor       Coef     StDev       T      P
Constant      -4.250     3.807   -1.12  0.307
X2             3.750     1.493    2.51  0.046

S = 2.111   R-Sq = 51.3%   R-Sq(adj) = 43.1%

Analysis of Variance
Source            DF       SS       MS      F      P
Regression         1   28.125   28.125   6.31  0.046
Residual Error     6   26.750    4.458
Total              7   54.875

(From the previous slide)
Source  DF  Seq SS
X1       1   0.482
X2       1  28.125

Source  DF  Seq SS
X2       1  28.125
X1       1   0.482
30
Multicollinearity (Effects of)
  • The regression coefficient of any independent
    variable cannot be interpreted as usual. One has
    to take into account which other correlated
    variables are included in the model.
  • The predictive ability of the overall model is
    usually unaffected.
  • The ESS are usually reduced to a great extent.
  • The variability of the OLS regression parameter
    estimates is inflated.
  • (Let us see an intuitive reason for this based on
    a model with p - 1 = 2.)
  • Note that the standardized regression
    coefficients have equal standard deviations. Will
    this be the case even when p - 1 = 3? Or is this
    just a special-case scenario?

31
Multicollinearity (Effects of)
  • High R2, but few significant t-ratios
  • (By now, you should be able to guess the reason
    for this)
  • Wider individual confidence intervals for
    regression parameters (This is obvious based on
    what we discussed on the earlier slide)
  • e.g. What would you conclude based on the above
    picture?

32
Multicollinearity (How to detect it?)
  • High R2 (> 0.8), but few significant t-ratios
  • Caveat: there is a particular situation in which
    the above is caused without any multicollinearity.
    Thankfully this situation never arises in
    practice.
  • High pair-wise correlation (> 0.8) between
    independent variables
  • Caveat: this is a sufficient, but not a necessary,
    condition. For example, consider the case where
    rX1X2 = 0.5, rX1X3 = 0.5 and rX2X3 = -0.5. We may
    conclude there is no multicollinearity. However,
    we find that R2 = 1 when we regress X1 on X2 and
    X3 together. This means that X1 is a perfect
    linear combination of the two other independent
    variables. The formula for this R2 is given
    below, and one can readily verify that the
    numbers satisfy this equation.
  • Due to the above caveat, always examine the
    partial correlation coefficients.
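The formula referred to above, for the squared multiple correlation when X1 is regressed on X2 and X3, in terms of the pairwise correlations:

\[ R^2_{1.23} = \frac{r_{12}^2 + r_{13}^2 - 2\, r_{12}\, r_{13}\, r_{23}}{1 - r_{23}^2} \]

With r12 = 0.5, r13 = 0.5 and r23 = -0.5 this gives (0.25 + 0.25 + 0.25) / 0.75 = 1, confirming the perfect linear dependence claimed above.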

33
Multicollinearity (How to detect it?)
  • Run auxiliary regressions, i.e., regress each of
    the independent variables on the other
    independent variables taken together, and decide
    whether it is correlated with the others based
    on the R2.
  • The Condition Index (CI), sketched below.
  • If 10 ≤ CI ≤ 30, there is moderate to strong
    multicollinearity.
  • CI > 30 means severe multicollinearity.
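A sketch of the condition index, assuming it is computed from the eigenvalues of the (scaled) X'X matrix:

\[ CI = \sqrt{\frac{\lambda_{\max}}{\lambda_{\min}}} \]

where lambda_max and lambda_min are the largest and smallest eigenvalues; the thresholds above (10 to 30 moderate to strong, above 30 severe) are the usual rule of thumb.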

34
Multicollinearity (What is the remedy?)
  • Rely on joint confidence intervals rather than
    individual ones
  • A priori information of relationship between some
    independent variables? Then include it!
  • For example b1ab2 is known. Then use this in
    the regression model which then becomes, Yb0
    b2X, (where, XX2 aX1)
  • Data Pooling (Usually done by combining
    cross-sectional and time series data. Time series
    data is notorious for multicollinearity)

35
Multicollinearity (What is the remedy?)
  • Delete a variable which is causing problems
  • Caveat: beware of specification bias. This arises
    when a model is incorrectly specified.
  • For example, in order to explain consumption
    expenditure, we may include only income and drop
    wealth, since it is highly correlated with
    income. However, economic theory may postulate
    that you use both variables.
  • First-difference transformation of variables
    from time series data
  • The regression is run on differences between
    successive values of the variables rather than
    the original variables, e.g. (Xi,1 - Xi-1,1) and
    (Xi,2 - Xi-1,2). The logic is that even if X1
    and X2 are correlated, there is no reason for
    their first differences to be correlated too.
  • Caveat: beware of autocorrelation, which usually
    arises due to this procedure. Also, we lose one
    degree of freedom due to the differencing
    procedure.
  • Correlation transformation
  • Getting a new sample (Why?) and/or increasing
    sample size (Why?)
  • Factor Analysis, Principal Components Analysis,
    Ridge Regression

36
An Example
37
An Example (continued)
The regression equation is
Y = -0.032 + 6.99 X1 - 0.064 X2

Predictor       Coef     StDev       T      P
Constant     -0.0322    0.2516   -0.13  0.898
X1             6.986     1.667    4.19  0.000
X2           -0.0640    0.2171   -0.29  0.769

S = 1.872   R-Sq = 95.3%   R-Sq(adj) = 95.2%

Analysis of Variance
Source            DF        SS       MS        F      P
Regression         2    9794.6   4897.3  1397.80  0.000
Residual Error   138     483.5      3.5
Total            140   10278.1

Source  DF  Seq SS
X1       1  9794.3
X2       1     0.3

Source  DF  Seq SS
X2       1  9733.1
X1       1    61.5

Predicted Values
   Fit   StDev Fit        95.0% CI           95.0% PI
12.020       3.351   (5.394, 18.646)   (4.431, 19.609)
  • High R2
  • Low t-value for b2
  • Low ESS for X2 (i.e. SSR(X2 | X1))
  • Clearly, X2 contributes little to the model.
  • Really? Look at SSR(X2): it is enormous!
  • A clear case of multicollinearity.
  • Of course, we knew that rX1X2 = 0.997. This
    should have made us suspect that something was
    amiss.

38
Multicollinearity (Specification Bias)
  • Types of Specification Errors
  • Omitting a relevant variable
  • Including an unnecessary or irrelevant variable
  • Incorrect functional form
  • Errors of measurement (bias)
  • Incorrect specification of stochastic error term
    (This is a model mis-specification error)
  • More on omitting a relevant variable
    (under-fitting)
  • True model: Yi = b0 + b1Xi1 + b2Xi2 + ui
  • Fitted model: Yi = a0 + a1Xi1 + νi
  • Consequences of omission
  • If r12 is non-zero, then the estimators of a0 and
    a1 are biased and inconsistent.
  • The variance of the estimator of a1 is a biased
    estimate of the variance of the estimator of b1.
  • s2 is incorrectly estimated, and confidence
    intervals and hypothesis tests are misleading.
  • E(estimator of a1) = b1 + b2·b21, where b21 is
    the slope from the regression of X2 on X1.