Title: Lab 13
1. Lab 13
- Partial and Semipartial Correlations, Collinearity, and Nonlinear Trends
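2. Partial and Semipartial Correlation
- Partial correlation: the correlation between two variables with the effects of a third variable removed from both.
- Semipartial correlation: the third variable is removed from only one of the two variables.
- To test a partial correlation, remove the variance attributable to the third variable from each of the two variables, then compute the correlation between what remains of each.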
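For a single control variable, both quantities can also be written directly in terms of the three pairwise correlations. The standard first-order formulas are given here as a reference sketch; the subscripts 1 and 2 (the two variables of interest) and 3 (the control variable) are notation introduced for this note, not taken from the slides. The first expression is the partial correlation (variable 3 removed from both 1 and 2); the second is the semipartial correlation (variable 3 removed from variable 2 only).

```latex
r_{12\cdot 3} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{\left(1 - r_{13}^{2}\right)\left(1 - r_{23}^{2}\right)}}
\qquad
r_{1(2\cdot 3)} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{1 - r_{23}^{2}}}
```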
3. Partial and Semipartial Correlation (cont.)
- Run a regression of IV1 on IV3.
- Regression partitions the variance in the DV (in this case IV1) into two parts: variance due to the IV (R-squared) and variance not due to the IV (the residuals).
- Run a second regression of IV2 on IV3.
- Compute the correlation between the residuals from the first regression and the residuals from the second regression. This is the correlation between IV1 and IV2 with IV3 partialed out.
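The residual procedure above can be sketched in a few lines of Python (standard library only) as a language-neutral check of the logic. The data values are hypothetical and made up purely for illustration, and the helper names (`pearson`, `residuals`) are assumptions of this sketch. It also compares the result with the usual first-order partial correlation formula computed from the three pairwise correlations.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def residuals(y, x):
    """Residuals from the simple regression of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


# Hypothetical toy data, made up for illustration only.
iv1 = [2.0, 4.0, 5.0, 7.0, 9.0, 10.0]
iv2 = [1.0, 3.0, 2.0, 6.0, 5.0, 8.0]
iv3 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# Steps 1 and 2: regress IV1 on IV3 and IV2 on IV3, keeping the residuals.
res1 = residuals(iv1, iv3)
res2 = residuals(iv2, iv3)

# Step 3: correlate the two sets of residuals.
partial_from_residuals = pearson(res1, res2)

# Cross-check with the first-order partial correlation formula.
r12, r13, r23 = pearson(iv1, iv2), pearson(iv1, iv3), pearson(iv2, iv3)
partial_from_formula = (r12 - r13 * r23) / sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

print(round(partial_from_residuals, 6), round(partial_from_formula, 6))
```

The two printed values agree, which is the point of the residual method: correlating the residuals gives the same number as the formula.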
4. Example of Partial Correlation
- We want to know the correlation between education and salary. We expect that the employees' gender and minority status influence this correlation, so we partial out their influence.
- Compute the correlation between education and salary, controlling for gender and minority status.
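5. Example of Partial Correlation
    data d1;
      infile 'C:\WINDOWS\Desktop\lab13.txt';
      /* sex and minority must be read as character ($) for the comparisons below; */
      /* add $ to any other field that is character in the file                    */
      input id sex $ hiredat educ title salary
            startsal jobtime prevexp minority $;
      if sex = "Male" then gender = 1;
      if sex = "Female" then gender = 2;
      if minority = "Yes" then minor = 1;
      if minority = "No" then minor = 2;
    proc reg data=d1;
      model salary = gender minor;
      output out=data2 r=r1;    /* r1 = salary residuals */
    proc reg data=d1;
      model educ = gender minor;
      output out=data3 r=r2;    /* r2 = educ residuals */
    data merged;
      merge data2 data3;        /* one-to-one merge; both keep the row order of d1 */
    proc corr data=merged;
      var salary educ gender minor r1 r2;
    run;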
6. Output for regressing salary on gender and minority status
    Model: MODEL1
    Dependent Variable: salary

    Analysis of Variance
                                 Sum of          Mean
    Source             DF       Squares        Square   F Value    Pr > F
    Model               2   34116446497   17058223249     77.40    <.0001
    Error             471      1.038E11     220382270
    Corrected Total   473   1.379165E11

    Root MSE             14845    R-Square    0.2474
    Dependent Mean       34420    Adj R-Sq    0.2442
    Coeff Var         43.13034

    Parameter Estimates
                      Parameter     Standard
    Variable    DF     Estimate        Error   t Value   Pr > |t|
    Intercept    1        42051   3496.63766     12.03     <.0001
    gender       1       -15961   1373.05406    -11.62     <.0001
    minor        1   8762.76693   1652.36821      5.30     <.0001
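The summary statistics in this listing all follow from the Analysis of Variance table, and recomputing them is a quick way to learn how to read the output. The Python sketch below copies the sums of squares and mean squares from the salary regression and rebuilds R-Square, Adj R-Sq, F Value, Root MSE, and Coeff Var. The variable names are assumptions of this sketch, and n = 474 is inferred from the corrected total DF of 473. Because the inputs are rounded as printed, the results match the listing only to the displayed precision.

```python
from math import sqrt

# Values copied from the Analysis of Variance table for salary.
ss_model = 34116446497
ss_total = 1.379165e11
ms_model = 17058223249
ms_error = 220382270
n_obs = 474            # corrected total DF (473) + 1
n_predictors = 2       # gender and minor
dependent_mean = 34420

r_square = ss_model / ss_total
adj_r_square = 1 - (1 - r_square) * (n_obs - 1) / (n_obs - n_predictors - 1)
f_value = ms_model / ms_error
root_mse = sqrt(ms_error)
coeff_var = 100 * root_mse / dependent_mean

print(f"R-Square  = {r_square:.4f}")
print(f"Adj R-Sq  = {adj_r_square:.4f}")
print(f"F Value   = {f_value:.2f}")
print(f"Root MSE  = {root_mse:.0f}")
print(f"Coeff Var = {coeff_var:.3f}")
```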
7. Output for regressing education on gender and minority status
    Model: MODEL1
    Dependent Variable: educ

    Analysis of Variance
                                 Sum of          Mean
    Source             DF       Squares        Square   F Value    Pr > F
    Model               2     599.98412     299.99206     42.35    <.0001
    Error             471    3336.48213       7.08383
    Corrected Total   473    3936.46624

    Root MSE           2.66155    R-Square    0.1524
    Dependent Mean    13.49156    Adj R-Sq    0.1488
    Coeff Var         19.72749

    Parameter Estimates
                      Parameter     Standard
    Variable    DF     Estimate        Error   t Value   Pr > |t|
    Intercept    1     14.59945      0.62690     23.29     <.0001
    gender       1     -2.13024      0.24617     -8.65     <.0001
    minor        1      1.11934      0.29625      3.78     0.0002
8. Example of Partial Correlation - Output
    Simple Statistics
    Variable     N       Mean    Std Dev        Sum    Minimum    Maximum
    salary     474      34420      17076   16314875      15750     135000
    educ       474   13.49156    2.88485       6395    8.00000   21.00000
    gender     474    1.45570    0.49856    690.000    1.00000    2.00000
    minor      474    1.78059    0.41428    844.000    1.00000    2.00000
    r1         474          0      14814          0     -22315      91385
    r2         474          0    2.65591          0   -6.70790    6.29210
9. Example of Partial Correlation - Output (cont.)
    Pearson Correlation Coefficients, N = 474
    Prob > |r| under H0: Rho=0

                 salary       educ     gender      minor         r1         r2
    salary      1.00000    0.66056   -0.44992    0.17734    0.86754    0.50662
                            <.0001     <.0001     0.0001     <.0001     <.0001
    educ        0.66056    1.00000   -0.35599    0.13289    0.53763    0.92064
                 <.0001                <.0001     0.0038     <.0001     <.0001
    gender     -0.44992   -0.35599    1.00000    0.07567    0.00000    0.00000
                 <.0001     <.0001                0.0999     1.0000     1.0000
    minor       0.17734    0.13289    0.07567    1.00000    0.00000    0.00000
                 0.0001     0.0038     0.0999                1.0000     1.0000
    r1          0.86754    0.53763      0.000     0.0000    1.00000    0.58397
    Residual     <.0001     <.0001      1.000     1.0000                <.0001
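The correlation matrix contains both kinds of correlation from the start of the lab: the r1-r2 entry (0.58397) is the partial correlation between salary and educ, while the salary-r2 and educ-r1 entries are semipartial correlations. The Python sketch below, offered as an arithmetic check rather than as part of the SAS workflow, uses only numbers printed in the output to show how they fit together. The variable names are assumptions of this sketch.

```python
from math import sqrt

# Values copied from the PROC REG and PROC CORR output above.
r2_salary = 0.2474                  # R-Square: salary on gender and minor
r2_educ = 0.1524                    # R-Square: educ on gender and minor
corr_salary_r1 = 0.86754            # salary with its own residual
corr_educ_r2 = 0.92064              # educ with its own residual
semipartial_salary_r2 = 0.50662     # salary with the educ residual
semipartial_educ_r1 = 0.53763       # educ with the salary residual
partial_r1_r2 = 0.58397             # residual with residual

# A variable correlates sqrt(1 - R^2) with its own regression residual.
check_salary_r1 = sqrt(1 - r2_salary)
check_educ_r2 = sqrt(1 - r2_educ)

# Dividing a semipartial correlation by sqrt(1 - R^2) of the variable that
# was left unresidualized gives the partial correlation.
partial_a = semipartial_salary_r2 / corr_salary_r1
partial_b = semipartial_educ_r1 / corr_educ_r2

print(f"sqrt(1 - R^2), salary: {check_salary_r1:.5f}  (listed {corr_salary_r1})")
print(f"sqrt(1 - R^2), educ:   {check_educ_r2:.5f}  (listed {corr_educ_r2})")
print(f"partial from semipartials: {partial_a:.5f}, {partial_b:.5f}  (listed {partial_r1_r2})")
```

Controlling for gender and minority status lowers the education-salary correlation from 0.66056 to 0.58397.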
10. Collinearity
- Collinearity means that within the set of IVs, some of the IVs are (nearly) totally predicted by the other IVs.
- Diagnostics:
  - Correlation matrix: look for large correlations between IVs.
  - Variance Inflation Factor (VIF): look for values greater than 10.
  - Tolerance: look for small values close to zero.
  - Condition indices: look for values greater than 30. Collinearity is spotted by finding 2 or more variables that have large proportions of variance (.50 or more) that correspond to large condition indices.
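The cutoffs above are easier to remember alongside the usual definitions, given here as a reference sketch. The notation is introduced for this note and is an assumption, not taken from the slides: R_j^2 is the R-squared from regressing IV j on all of the other IVs, and the lambdas are the eigenvalues reported in the collinearity diagnostics.

```latex
\mathrm{Tolerance}_j = 1 - R_j^{2},
\qquad
\mathrm{VIF}_j = \frac{1}{\mathrm{Tolerance}_j} = \frac{1}{1 - R_j^{2}},
\qquad
\text{Condition index}_k = \sqrt{\frac{\lambda_{\max}}{\lambda_k}}
```

Under these definitions, a VIF above 10 means that more than 90% of the variance in that IV is predicted by the other IVs.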
11. Example of Collinearity Analysis
- Research on eating disorders.
- BMI: used to approximate body fat.
- Percent overweight
- Appearance anxiety
- Body image
- Eating disorder: measures the number of behaviors that signal an eating disorder.
- Check for collinearity by running a correlation matrix and a regression analysis.
12. Program
    data d1;
      input (bmi percent anxiety image disorder) (5.0);
      cards;
    [data lines omitted on the slide]
    ;
    proc corr;
    proc reg;
      model disorder = bmi percent anxiety image / vif tol collin;
    run;
13. Proc Corr Output
    Pearson Correlation Coefficients, N = 235
    Prob > |r| under H0: Rho=0

                    bmi    percent    anxiety      image   disorder
    bmi         1.00000    0.97992    0.43771    0.65529    0.54376
                            <.0001     <.0001     <.0001     <.0001
    percent     0.97992    1.00000    0.40085    0.68914    0.48138
                 <.0001                <.0001     <.0001     <.0001
    anxiety     0.43771    0.40085    1.00000    0.33633    0.14723
                 <.0001     <.0001                <.0001     0.0240
    image       0.65529    0.68914    0.33633    1.00000    0.36574
                 <.0001     <.0001     <.0001                <.0001
    disorder    0.54376    0.48138    0.14723    0.36574    1.00000
                 <.0001     <.0001     0.0240     <.0001
14. Proc Reg Output
    Analysis of Variance
                                 Sum of          Mean
    Source             DF       Squares        Square   F Value    Pr > F
    Model               4         10628    2657.02322     37.79    <.0001
    Error             230         16171      70.30887
    Corrected Total   234         26799

    Root MSE           8.38504    R-Square    0.3966
    Dependent Mean    80.18298    Adj R-Sq    0.3861
    Coeff Var         10.45738

    Parameter Estimates
                      Parameter     Standard
    Variable    DF     Estimate        Error   t Value   Pr > |t|
    Intercept    1     56.92129      8.20539      6.94     <.0001
    bmi          1      2.11256      0.27077      7.80     <.0001
    percent      1     -1.61429      0.27513     -5.87     <.0001
    anxiety      1     -0.19021      0.06199     -3.07     0.0024
    image        1      0.18510      0.08071      2.29     0.0227
15. Collinearity Output (tol and VIF)
    Parameter Estimates
                                     Variance
    Variable    DF    Tolerance     Inflation
    Intercept    1            .             0
    bmi          1      0.03632      27.53589
    percent      1      0.03460      28.90430
    anxiety      1      0.77532       1.28979
    image        1      0.50634       1.97497
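Tolerance and VIF are reciprocals of each other, so either column can be recovered from the other. The short Python sketch below copies the tolerances from the listing, recomputes each VIF, and applies the "VIF greater than 10" rule. It also reports 1 minus tolerance, which is the share of each IV's variance predicted by the other IVs. The dictionary names are assumptions of this sketch, and small differences from the listed VIFs come from the tolerances being rounded to five decimals.

```python
# Tolerances and VIFs copied from the Parameter Estimates table above.
tolerance = {"bmi": 0.03632, "percent": 0.03460, "anxiety": 0.77532, "image": 0.50634}
reported_vif = {"bmi": 27.53589, "percent": 28.90430, "anxiety": 1.28979, "image": 1.97497}

# VIF is the reciprocal of tolerance.
vif = {name: 1.0 / tol for name, tol in tolerance.items()}

# 1 - tolerance is the R-squared of each IV regressed on the other IVs.
r2_with_other_ivs = {name: 1.0 - tol for name, tol in tolerance.items()}

# Rule of thumb from the lab: VIF greater than 10 signals collinearity.
flagged = [name for name, value in vif.items() if value > 10]

for name in tolerance:
    print(f"{name:8s} VIF = {vif[name]:8.3f}  (listed {reported_vif[name]:.5f})  "
          f"R^2 with other IVs = {r2_with_other_ivs[name]:.3f}")
print("VIF > 10:", flagged)
```

Only bmi and percent are flagged: about 96% of the variance in each is predicted by the other IVs, consistent with their correlation of 0.97992.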
16. Collinearity Output (collin)
    Collinearity Diagnostics
                             Condition
    Number    Eigenvalue         Index
         1       4.97499       1.00000
         2       0.01424      18.69085
         3       0.00840      24.33043
         4       0.00207      48.99494
         5    0.00029641     129.55302

    Collinearity Diagnostics
              --------------------Proportion of Variation--------------------
    Number     Intercept         bmi     percent     anxiety       image
         1       0.00017     0.00003     0.00002     0.00044    0.000115
         2       0.06825     0.01779     0.00724     0.16035     0.00312
         3       0.13010     0.00157     0.00003     0.77013     0.04982
         4       0.70537     0.00941     0.00277     0.01361     0.87007
         5       0.09609     0.97120     0.98993     0.05546     0.07688
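As a sketch of how to read this table, the Python below recomputes each condition index as the square root of the largest eigenvalue divided by that row's eigenvalue, then applies the lab's rule of thumb (index greater than 30, with two or more proportions of .50 or more in the same row). All numbers are copied from the listing; the variable names are assumptions of this sketch, and the recomputed indices differ slightly from the listed ones because the printed eigenvalues are rounded.

```python
from math import sqrt

# Eigenvalues, condition indices, and variance proportions from the COLLIN output.
eigenvalues = [4.97499, 0.01424, 0.00840, 0.00207, 0.00029641]
reported_index = [1.00000, 18.69085, 24.33043, 48.99494, 129.55302]
columns = ["Intercept", "bmi", "percent", "anxiety", "image"]
proportions = [
    [0.00017, 0.00003, 0.00002, 0.00044, 0.000115],
    [0.06825, 0.01779, 0.00724, 0.16035, 0.00312],
    [0.13010, 0.00157, 0.00003, 0.77013, 0.04982],
    [0.70537, 0.00941, 0.00277, 0.01361, 0.87007],
    [0.09609, 0.97120, 0.98993, 0.05546, 0.07688],
]

# Condition index: sqrt(largest eigenvalue / this eigenvalue).
largest = max(eigenvalues)
condition_index = [sqrt(largest / ev) for ev in eigenvalues]

# Rule of thumb: index > 30 and two or more proportions >= .50 in that row.
flagged = {}
for number, (index, row) in enumerate(zip(condition_index, proportions), start=1):
    heavy = [name for name, p in zip(columns, row) if p >= 0.50]
    if index > 30 and len(heavy) >= 2:
        flagged[number] = heavy

for number, index in enumerate(condition_index, start=1):
    print(f"dimension {number}: condition index = {index:.3f} "
          f"(listed {reported_index[number - 1]:.5f})")
print(flagged)
```

Dimension 5 ties bmi and percent together, matching the VIF results. Dimension 4 is also flagged, but it pairs image with the intercept rather than two substantive IVs.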
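17. Add collinear variables and rerun
    data d1;
      input (bmi percent anxiety image disorder) (5.0);
      bmiperc = bmi + percent;    /* combine the two collinear IVs into one */
      cards;
    [data lines omitted on the slide]
    ;
    proc corr;
      var bmiperc anxiety image;  /* matches the variables shown in the output */
    proc reg;
      model disorder = bmiperc anxiety image / vif tol collin;
    run;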
18. Proc Corr Output
    Pearson Correlation Coefficients, N = 235
    Prob > |r| under H0: Rho=0

                bmiperc    anxiety      image
    bmiperc     1.00000    0.42132    0.67569
                            <.0001     <.0001
    anxiety     0.42132    1.00000    0.33633
                 <.0001                <.0001
    image       0.67569    0.33633    1.00000
                 <.0001     <.0001
19. Proc Reg Output
                                 Sum of          Mean
    Source             DF       Squares        Square   F Value    Pr > F
    Model               3    7291.63332    2430.54444     28.78    <.0001
    Error             231         19507      84.44805
    Corrected Total   234         26799

    Root MSE           9.18956    R-Square    0.2721
    Dependent Mean    80.18298    Adj R-Sq    0.2626
    Coeff Var         11.46074

    Parameter Estimates
                      Parameter     Standard
    Variable    DF     Estimate        Error   t Value   Pr > |t|
    Intercept    1     39.18791      8.53866      4.59     <.0001
    bmiperc      1      0.26428      0.03998      6.61     <.0001
    anxiety      1     -0.09313      0.06615     -1.41     0.1605
    image        1      0.04590      0.08563      0.54     0.5924
20. Collinearity Diagnostics
                                     Variance
    Variable    DF    Tolerance     Inflation
    Intercept    1            .             0
    bmiperc      1      0.50098       1.99609
    anxiety      1      0.81758       1.22313
    image        1      0.54020       1.85116

                             Condition
    Number    Eigenvalue         Index
         1       3.98060       1.00000
         2       0.00943      20.54641
         3       0.00799      22.32086
         4       0.00198      44.82460

              -----------------Proportion of Variation----------------
    Number     Intercept     bmiperc     anxiety       image
         1    0.00030972  0.00052415  0.00072670  0.00019231
         2       0.01865     0.41736     0.56626     0.01061
         3       0.26238     0.19662     0.41635     0.03812
         4       0.71866     0.38550     0.01666     0.95108
21. Testing Interactions with Regression
    data d1;
      /* sex must be read as character ($) for the comparisons below */
      input id sex $ hiredat educ title salary
            startsal jobtime prevexp minority $;
      if sex = "Male" then gender = 1;
      if sex = "Female" then gender = 2;
      inter = gender*prevexp;     /* interaction term */
      cards;
    [data lines omitted on the slide]
    ;
    proc reg;
      model salary = gender prevexp;
    proc reg;
      model salary = gender prevexp inter;
    run;
22. Proc Reg Output without interaction
                                 Sum of          Mean
    Source             DF       Squares        Square   F Value    Pr > F
    Model               2   32095090228   16047545114     71.43    <.0001
    Error             471   1.058214E11     224673896
    Corrected Total   473   1.379165E11

    Root MSE             14989    R-Square    0.2327
    Dependent Mean       34420    Adj R-Sq    0.2295
    Coeff Var         43.54827

    Parameter Estimates
                      Parameter     Standard
    Variable    DF     Estimate        Error   t Value   Pr > |t|
    Intercept    1        61063   2340.43528     26.09     <.0001
    prevexp      1    -28.80629      6.68120     -4.31     <.0001
    gender       1       -16406   1401.56098    -11.71     <.0001
23. Proc Reg Output with interaction
                                 Sum of          Mean
    Source             DF       Squares        Square   F Value    Pr > F
    Model               3   32501255237   10833751746     48.30    <.0001
    Error             470   1.054152E11     224287745
    Corrected Total   473   1.379165E11

    Root MSE             14976    R-Square    0.2357
    Dependent Mean       34420    Adj R-Sq    0.2308
    Coeff Var         43.51083

    Parameter Estimates
                      Parameter     Standard
    Variable    DF     Estimate        Error   t Value   Pr > |t|
    Intercept    1        63525   2969.20179     21.39     <.0001
    prevexp      1    -54.37887     20.14155     -2.70     0.0072
    gender       1       -18074   1870.07488     -9.66     <.0001
    inter        1     18.45578     13.71463      1.35     0.1790
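Whether the interaction adds anything can also be read from the two Analysis of Variance tables: the increase in the model sum of squares, divided by the full model's error mean square, is an F statistic for the added term. With one added term this F equals the square of the t value for inter. The Python sketch below checks that using only numbers printed in the two listings; the variable names are assumptions of this sketch.

```python
from math import sqrt

# Values copied from the two PROC REG listings above.
ss_model_reduced = 32095090228   # model: salary = gender prevexp
ss_model_full = 32501255237      # model: salary = gender prevexp inter
ms_error_full = 224287745        # error mean square of the full model
ss_total = 1.379165e11
df_added = 1                     # one extra term (inter)
t_inter_reported = 1.35

# Incremental F test for the added term.
ss_added = ss_model_full - ss_model_reduced
f_change = (ss_added / df_added) / ms_error_full

# Increase in R-Square from adding the interaction.
r2_change = ss_added / ss_total

print(f"F for adding inter = {f_change:.3f}")
print(f"sqrt(F)            = {sqrt(f_change):.3f}  (t for inter is listed as {t_inter_reported})")
print(f"R-Square change    = {r2_change:.4f}")
```

The R-Square change is about 0.003 (0.2327 to 0.2357), and the matching t value of 1.35 has p = 0.1790, so the interaction is not significant at the .05 level in this output.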
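24. If the interaction is significant, run proc corr and proc gplot
- Note: in the output above the interaction term is not significant (t = 1.35, p = 0.1790); the program below shows the follow-up that would be used if it were.
    data d1;
      input id sex $ hiredat educ title salary
            startsal jobtime prevexp minority $;
      if sex = "Male" then gender = 1;
      if sex = "Female" then gender = 2;
      inter = gender*prevexp;
      cards;
    [data lines omitted on the slide]
    ;
    /* interpol=rl draws a linear regression line; the slide text reads r1 and r2 */
    symbol1 color=blue interpol=rl value=none;
    symbol2 color=black interpol=rl value=none;
    proc sort; by gender;
    proc gplot;
      plot salary*prevexp=gender;   /* one regression line per gender */
    proc corr;
      var salary prevexp;
      by gender;
    run;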
25. (No transcript available for this slide.)
26. Correlations by gender
    ------------------- gender=1 -------------------
    Pearson Correlation Coefficients, N = 258
    Prob > |r| under H0: Rho=0

                 salary    prevexp
    salary      1.00000   -0.20208
                            0.0011
    prevexp    -0.20208    1.00000
                 0.0011

    ------------------- gender=2 -------------------
    Pearson Correlation Coefficients, N = 216
    Prob > |r| under H0: Rho=0

                 salary    prevexp
    salary      1.00000   -0.21958
                            0.0012
    prevexp    -0.21958    1.00000
                 0.0012
27. In-Class Examples
- Download data8lab. Compute a partial correlation between iq and age, controlling for knwldge.
- Download data8lab. Regress iq on knwldge and age. Then run the regression again, including the interaction term of knwldge and age.
- Download the dataset assign10.txt and check for multicollinearity.