Title: Polynomial Regression Models
1Polynomial Regression Models
- an option if your data are curved
2Uses of polynomial models
- When the true response function really is a polynomial function.
- (Very common!) When the true response function is unknown or complex, but a polynomial function approximates the true function well.
3Example: Life expectancy over time
4Would a quadratic function better describe the
relationship?
5A quadratic polynomial regression function
$$Y_i = \beta_0 + \beta_1 X_i + \beta_{11} X_i^2 + \varepsilon_i$$
- where
- Yi = life expectancy of U.S. population in years
- Xi = year by decades (1920, ..., 1990)
- typical assumptions about error terms
6But, a multicollinearity problem
Pearson correlation of Year and YearSq = 1.000
7Instead, center the predictors
Mean of Year = 1955.0

Year     M
1920  53.6
1930  58.1
1940  60.8
1950  65.6
1960  66.6
1970  67.1
1980  70.0
1990  71.8
8Does it really work?
Pearson correlation of YrCent and YrCentSq = 0.000
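The effect of centering can be checked numerically. The following is a minimal sketch in Python with NumPy (an assumption, since the slides themselves use Minitab); the variable names are illustrative and only the eight decade values come from the example.

```python
import numpy as np

# Decade values from the life expectancy example: 1920, 1930, ..., 1990
year = np.arange(1920, 2000, 10, dtype=float)

# The raw predictor and its square are almost perfectly correlated
r_raw = np.corrcoef(year, year**2)[0, 1]

# Center first (subtract the mean, 1955.0), then square
yr_cent = year - year.mean()
r_cent = np.corrcoef(yr_cent, yr_cent**2)[0, 1]

print(f"corr(Year, YearSq)     = {r_raw:.6f}")
print(f"corr(YrCent, YrCentSq) = {r_cent:.6f}")
```

The centered correlation is essentially zero here because the decades are spaced symmetrically about their mean. With unevenly spaced predictor values, centering typically reduces the correlation but need not remove it entirely.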
9A better quadratic polynomial regression function
10The regression equation is
M = 65.4 + 0.246 YrCent - 0.00231 YrCentSq

Predictor        Coef     SE Coef       T      P
Constant      65.4125      0.5450  120.02  0.000
YrCent        0.24619     0.01564   15.74  0.000
YrCentSq   -0.0023095   0.0007821   -2.95  0.032

S = 1.014   R-Sq = 98.1%   R-Sq(adj) = 97.3%

Analysis of Variance
Source      DF      SS      MS       F      P
Regression   2  263.52  131.76  128.22  0.000
Error        5    5.14    1.03
Total        7  268.66

Source     DF  Seq SS
YrCent      1  254.56
YrCentSq    1    8.96
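As a cross-check on the output above, the centered quadratic fit can be reproduced by ordinary least squares. This is a sketch assuming NumPy rather than Minitab; the data are the eight (Year, M) pairs from the example, and the variable names are illustrative.

```python
import numpy as np

year = np.arange(1920, 2000, 10, dtype=float)
m = np.array([53.6, 58.1, 60.8, 65.6, 66.6, 67.1, 70.0, 71.8])

yr_cent = year - year.mean()

# Design matrix columns: intercept, YrCent, YrCentSq
X = np.column_stack([np.ones_like(yr_cent), yr_cent, yr_cent**2])
coef, *_ = np.linalg.lstsq(X, m, rcond=None)

resid = m - X @ coef
sse = float(resid @ resid)                  # error sum of squares
sst = float(((m - m.mean()) ** 2).sum())    # total sum of squares
n, p = X.shape
s = (sse / (n - p)) ** 0.5                  # residual standard error
r_sq = 1 - sse / sst

print("coefficients:", coef)
print("S =", round(s, 3), " R-Sq =", round(100 * r_sq, 1), "%")
```

The coefficients, S, and R-Sq should agree with the Minitab table to the number of digits it reports.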
12Expressing regression model in terms of original
variables
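One way to carry out this conversion is to substitute YrCent = Year - 1955 into the centered equation and collect terms. Writing the centered estimates as b0*, b1*, b11* and the mean of Year as Xbar, this gives b0 = b0* - b1* Xbar + b11* Xbar^2, b1 = b1* - 2 b11* Xbar, and b11 = b11*. The sketch below (plain Python, illustrative names) applies those formulas to the centered estimates reported in these slides.

```python
# Estimates from the centered fit (taken from the Minitab output)
b0_c, b1_c, b11_c = 65.4125, 0.24619, -0.0023095
x_bar = 1955.0  # mean of Year

# Expand b0_c + b1_c*(X - x_bar) + b11_c*(X - x_bar)**2 and collect powers of X
b0 = b0_c - b1_c * x_bar + b11_c * x_bar**2
b1 = b1_c - 2 * b11_c * x_bar
b11 = b11_c  # the quadratic coefficient is unchanged by centering

print(f"M = {b0:.0f} + {b1:.3f} Year + ({b11}) YearSq")
```

The results match, up to rounding, the coefficients Minitab reports when the model is fit directly to Year and YearSq.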
14What is the predicted life expectancy for males in
the year 2100?
There is an even greater danger in extrapolation
when modeling data with a polynomial function,
because the fitted curve can change direction
outside the range of the data.
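The change in direction can be made concrete with the fitted centered quadratic from this example. The sketch below (plain Python; the function and variable names are illustrative) evaluates the fitted curve at a few years and locates the turning point of the parabola, where the slope b1 + 2 b11 x equals zero.

```python
# Centered estimates from the Minitab output in these slides
b0_c, b1_c, b11_c = 65.4125, 0.24619, -0.0023095
x_bar = 1955.0  # mean of Year


def predict(year):
    """Fitted life expectancy from the centered quadratic model."""
    x = year - x_bar
    return b0_c + b1_c * x + b11_c * x**2


# Turning point: solve b1_c + 2*b11_c*x = 0 for x, then shift back to years
peak_year = x_bar - b1_c / (2 * b11_c)

for yr in (1990, 2050, 2100):
    print(yr, round(predict(yr), 2))
print("fitted curve peaks near year", round(peak_year, 1))
```

Because the quadratic coefficient is negative, the fitted curve turns downward shortly after the observed range (1920 to 1990) ends, so the prediction for 2100 falls below the fitted value for 1990. Nothing in the data supports that decline; it is a consequence of the functional form.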
15The good news!
18The regression equation is
M = - 9243 + 9.28 Year - 0.00231 YearSq

Predictor        Coef     SE Coef      T      P
Constant        -9243        2989  -3.09  0.027
Year            9.276       3.058   3.03  0.029
YearSq     -0.0023095   0.0007821  -2.95  0.032

S = 1.014   R-Sq = 98.1%   R-Sq(adj) = 97.3%

Analysis of Variance
Source      DF      SS      MS       F      P
Regression   2  263.52  131.76  128.22  0.000
Error        5    5.14    1.03
Total        7  268.66

Source   DF  Seq SS
Year      1  254.56
YearSq    1    8.96
19Predicted Values for New Observations

New Obs     Fit  SE Fit         95.0% CI           95.0% PI
1        52.552  16.197  (10.914, 94.191)  (10.833, 94.272) XX

X  denotes a row with X values away from the center
XX denotes a row with very extreme X values

Values of Predictors for New Observations

New Obs  Year   YearSq
1        2100  4410000
20It is possible to overfit the data with
polynomial models
21It is even theoretically possible to fit the data
perfectly.
If you have n data points with distinct x values,
then a polynomial of order n-1 will fit the data
perfectly, that is, it will pass through each data point.
But good statistical software will keep an
unsuspecting user from fitting such a model:
Error: Not enough non-missing observations to fit a
polynomial of this order; execution aborted
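The perfect-fit claim can be illustrated with the eight life expectancy observations and a polynomial of order 7. This is a sketch assuming NumPy, which, unlike the software message quoted above, does not refuse such a fit; rescaling the predictor is my own choice to keep the high powers numerically manageable.

```python
import numpy as np

year = np.arange(1920, 2000, 10, dtype=float)
m = np.array([53.6, 58.1, 60.8, 65.6, 66.6, 67.1, 70.0, 71.8])

# Rescale to "decades from the mean" so that x**7 stays small
x = (year - year.mean()) / 10.0

n = len(x)
coefs = np.polyfit(x, m, deg=n - 1)   # order n-1 = 7 polynomial
fitted = np.polyval(coefs, x)

max_abs_resid = float(np.max(np.abs(m - fitted)))
print("largest absolute residual:", max_abs_resid)
```

The residuals are zero up to floating-point error, which leaves no degrees of freedom for estimating the error variance. The curve reproduces the data exactly while saying nothing reliable about the underlying trend.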
22Hierarchical approach to model fitting
A widely accepted approach is to fit a higher-order
model and then explore whether a lower-order
(simpler) model is adequate.
Is a first-order linear model (line) adequate?
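For the life expectancy example, this question can be framed as a comparison of the error sums of squares of the straight-line and quadratic fits. The sketch below assumes NumPy and uses illustrative names; it computes the partial F statistic for adding the squared term, which should equal the square of the t statistic reported for YrCentSq (t = -2.95, P = 0.032 in the Minitab output).

```python
import numpy as np

year = np.arange(1920, 2000, 10, dtype=float)
m = np.array([53.6, 58.1, 60.8, 65.6, 66.6, 67.1, 70.0, 71.8])
x = year - year.mean()


def sse(X, y):
    """Error sum of squares from a least-squares fit of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)


ones = np.ones_like(x)
sse_line = sse(np.column_stack([ones, x]), m)          # first-order model
sse_quad = sse(np.column_stack([ones, x, x**2]), m)    # second-order model

df_error = len(m) - 3                 # n minus 3 estimated coefficients
extra_ss = sse_line - sse_quad        # sum of squares explained by the squared term
f_stat = (extra_ss / 1) / (sse_quad / df_error)

print("extra SS:", round(extra_ss, 2), " F:", round(f_stat, 2))
```

Since the reported P-value for the quadratic term is 0.032, the straight line would be judged inadequate at the 0.05 level for these data. The P-value itself is not recomputed here, to avoid depending on a statistics library.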
23Hierarchical approach to model fitting
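But if a polynomial term of a given order is
retained, then all related lower-order terms are
also retained. That is, if the quadratic term is
significant, you would use this regression
function, keeping the linear term even if it is
not significant on its own:
$$Y_i = \beta_0 + \beta_1 X_i + \beta_{11} X_i^2 + \varepsilon_i$$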
24Example: Relationship between entrance test
scores and GPA
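25A two-predictor, second-order polynomial
regression function
$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_{11} X_{i1}^2 + \beta_{22} X_{i2}^2 + \beta_{12} X_{i1} X_{i2} + \varepsilon_i$$
- where
- Yi = college GPA
- Xi1 = verbal test score
- Xi2 = math test score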
26But again, multicollinearity issues
Correlations: GPA, Verbal, Math, MSq, VSq, VM

           GPA  Verbal    Math     MSq     VSq
Verbal   0.529
Math     0.573  -0.107
MSq      0.544  -0.109   0.995
VSq      0.466   0.994  -0.123  -0.125
VM       0.832   0.742   0.571   0.565   0.723
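27A better two-predictor, second-order polynomial
regression function
$$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_{11} x_{i1}^2 + \beta_{22} x_{i2}^2 + \beta_{12} x_{i1} x_{i2} + \varepsilon_i$$
- where
- Yi = college GPA
- xi1 = centered verbal test score
- xi2 = centered math test score
- β12 = interaction effect coefficient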
28Reduced multicollinearity
Correlations: GPA, MCent, VCent, MCentSq, VCentSq, MCentVCent

             GPA   MCent   VCent  MCentSq  VCentSq
MCent      0.573
VCent      0.529  -0.107
MCentSq   -0.286  -0.015  -0.024
VCentSq   -0.555  -0.146  -0.045    0.060
MCentVCe   0.065  -0.026  -0.146   -0.105   -0.156
29The regression equation is
GPA = - 9.92 + 0.167 Verbal + 0.138 Math - 0.00111 VSq
      - 0.000843 MSq + 0.000241 VM

Predictor        Coef     SE Coef      T      P
Constant       -9.917       1.354  -7.32  0.000
Verbal        0.16681     0.02124   7.85  0.000
Math          0.13760     0.02673   5.15  0.000
VSq        -0.0011082   0.0001173  -9.45  0.000
MSq        -0.0008433   0.0001594  -5.29  0.000
VM          0.0002411   0.0001440   1.67  0.103

S = 0.1871   R-Sq = 93.7%   R-Sq(adj) = 92.7%
30Analysis of Variance

Source      DF       SS      MS       F      P
Regression   5  17.5827  3.5165  100.41  0.000
Error       34   1.1908  0.0350
Total       39  18.7735

Source   DF  Seq SS
Verbal    1  5.2549
Math      1  7.5311
VSq       1  3.6434
MSq       1  1.0552
VM        1  0.0982
31A simpler model
The regression equation is
GPA = - 11.5 + 0.189 Verbal + 0.159 Math - 0.00114 VSq
      - 0.000871 MSq

Predictor        Coef     SE Coef       T      P
Constant      -11.458       1.019  -11.24  0.000
Verbal        0.18887     0.01709   11.05  0.000
Math          0.15874     0.02417    6.57  0.000
VSq        -0.0011412   0.0001186   -9.62  0.000
MSq        -0.0008705   0.0001626   -5.35  0.000

S = 0.1919   R-Sq = 93.1%   R-Sq(adj) = 92.3%
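The decision to drop the interaction term can be traced through the numbers already reported. The sketch below (plain Python, illustrative names) takes the sequential sum of squares for VM and the error line from the full model's ANOVA table, forms the partial F statistic for VM, and checks that moving VM's sum of squares into error reproduces the S of the simpler model.

```python
# Values read from the ANOVA table of the full second-order model
seq_ss_vm = 0.0982   # sequential SS for VM, the last term entered
sse_full = 1.1908    # error sum of squares, full model
df_full = 34         # error degrees of freedom, full model

mse_full = sse_full / df_full
f_vm = seq_ss_vm / mse_full           # partial F for the interaction term

# Dropping VM returns its sum of squares (and one degree of freedom) to error
sse_reduced = sse_full + seq_ss_vm
df_reduced = df_full + 1
s_reduced = (sse_reduced / df_reduced) ** 0.5

print("partial F for VM:", round(f_vm, 2))
print("S for the model without VM:", round(s_reduced, 4))
```

The partial F is approximately the square of the t statistic for VM (1.67, with P = 0.103), and the implied S agrees with the 0.1919 reported for the model without the interaction. This is consistent with the interaction contributing little once the squared terms are present.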
34Fitting polynomial models in Minitab
- Use Calc >> Calculator to create squared predictors, cubic predictors, and interaction predictors.
- Use Stat >> Regression >> Regression as always.
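Outside Minitab, the same two steps (build the derived predictors, then run an ordinary regression) could be sketched as follows. This assumes Python with NumPy; `second_order_terms` is a hypothetical helper written for this illustration, and the scores are made-up values, not the GPA data from the slides.

```python
import numpy as np


def second_order_terms(x1, x2, center=True):
    """Return columns x1, x2, x1^2, x2^2, x1*x2, mean-centering first if requested."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    if center:
        x1 = x1 - x1.mean()
        x2 = x2 - x2.mean()
    return np.column_stack([x1, x2, x1**2, x2**2, x1 * x2])


# Made-up scores purely for illustration (NOT the data behind the slides)
verbal = [81, 68, 57, 100, 54]
math_score = [87, 99, 86, 49, 83]

terms = second_order_terms(verbal, math_score)
print(terms.shape)   # one row per observation, five predictor columns
```

The resulting columns, with an intercept column added, can be passed to any least-squares routine, for example `np.linalg.lstsq`. Centering before squaring follows the advice given earlier in these slides for reducing multicollinearity.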