Title: Multiple Regression
Multiple Regression
- 12.1 The Linear Regression Model and Assumptions
- 12.2 The Least Squares Estimates and Prediction
- 12.3 The Mean Squared Error and the Standard Error
- 12.4 Model Utility: R² and Adjusted R²
- 12.5 The Overall F-Test
- 12.6 Testing Significance of Independent Variables
- 12.10 Dummy Variables
Multiple Regression
- One independent variable may not be sufficient to adequately explain the variation in the dependent variable.
- We may have to include more than one independent variable in the model.
- There is a separate slope coefficient for each independent variable.
- We can use the multiple regression model to make predictions of the dependent variable, y.
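12.1 The Linear Regression Model
The linear regression model relating y to x1, x2, ..., xk is
$$y = \mu_y + \varepsilon = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$
where
- $\mu_y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$ is the mean value of the dependent variable y when the values of the independent variables are x1, x2, ..., xk
- β0, β1, ..., βk are unknown regression parameters relating the mean value of y to x1, x2, ..., xk
- ε is an error term that describes the effects on y of all factors other than the values of the independent variables x1, x2, ..., xk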
The Regression Model Assumptions
Assumptions about the model error terms, the ε's:
- Mean zero: The mean of the error terms is equal to 0.
- Constant variance: The variance of the error terms, σ², is the same for every combination of values of x1, x2, ..., xk.
- Normality: The error terms follow a normal distribution for every combination of values of x1, x2, ..., xk.
- Independence: The values of the error terms are statistically independent of each other.
12.2 Least Squares Estimates and Prediction
Estimation/Prediction Equation
$$\hat{y} = b_0 + b_1 x_{01} + b_2 x_{02} + \cdots + b_k x_{0k}$$
ŷ is the point estimate of the mean value of the dependent variable when the values of the independent variables are x01, x02, ..., x0k. It is also the point prediction of an individual value of the dependent variable when the values of the independent variables are x01, x02, ..., x0k.
b0, b1, b2, ..., bk are the least squares point estimates of the parameters β0, β1, β2, ..., βk. x01, x02, ..., x0k are specified values of the independent (predictor) variables x1, x2, ..., xk.
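The slides do not show how b0, b1, ..., bk are computed. As a minimal sketch, assuming the usual approach of solving the normal equations (XᵀX)b = Xᵀy, the Python below fits a model using plain lists and Gaussian elimination. The function name `least_squares` and the data are hypothetical: the data are generated exactly from y = 2 + 3x1 - x2 with no error term, so the fit should recover those coefficients. In practice a statistics package (such as the Minitab output used in this chapter) does this step.

```python
def least_squares(xs, ys):
    """Fit y = b0 + b1*x1 + ... + bk*xk by solving the normal equations (X'X) b = X'y.

    xs: list of rows, each row a list of the k predictor values.
    ys: list of observed y values.
    Returns [b0, b1, ..., bk].
    """
    # Design matrix with a leading column of 1s for the intercept b0.
    X = [[1.0] + list(row) for row in xs]
    p = len(X[0])
    # Build X'X (p x p) and X'y (length p).
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(X, ys)) for i in range(p)]
    # Augmented matrix [X'X | X'y], reduced by Gaussian elimination with partial pivoting.
    a = [xtx[i] + [xty[i]] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, p):
            factor = a[r][col] / a[col][col]
            for c in range(col, p + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution gives b0, b1, ..., bk.
    b = [0.0] * p
    for i in range(p - 1, -1, -1):
        b[i] = (a[i][p] - sum(a[i][j] * b[j] for j in range(i + 1, p))) / a[i][i]
    return b


# Hypothetical data generated exactly from y = 2 + 3*x1 - 1*x2.
xs = [[1, 4], [2, 1], [3, 5], [4, 2], [5, 7], [6, 3]]
ys = [2 + 3 * x1 - x2 for x1, x2 in xs]
print([round(v, 6) for v in least_squares(xs, ys)])  # [2.0, 3.0, -1.0]
```

With real data the observations do not lie exactly on a plane, and the same normal equations return the coefficients that minimize the sum of squared residuals.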
Example: The Linear Regression Model
Example 12.1 The Fuel Consumption Case
Multiple Regression: Fuel Consumption Based on Temperature and Chill Index
Example 12.3 The Fuel Consumption Case
Minitab Output
FuelCons = 13.1 - 0.0900 Temp + 0.0825 Chill

Predictor   Coef       StDev     T       P
Constant    13.1087    0.8557    15.32   0.000
Temp        -0.09001   0.01408   -6.39   0.001
Chill       0.08249    0.02200   3.75    0.013

S = 0.3671   R-Sq = 97.4%   R-Sq(adj) = 96.3%

Analysis of Variance
Source           DF   SS       MS       F       P
Regression       2    24.875   12.438   92.30   0.000
Residual Error   5    0.674    0.135
Total            7    25.549

Predicted Values (Temp = 40, Chill = 10)
Fit      StDev Fit   95.0% CI          95.0% PI
10.333   0.170       (9.895, 10.771)   (9.293, 11.374)
Example: Point Predictions and Residuals
Example 12.3 The Fuel Consumption Case
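The table of point predictions and residuals for this slide did not survive extraction. As a small sketch using only the least squares coefficients from the Minitab output above, the point prediction at Temp = 40 and Chill = 10 can be reproduced. A residual is the observed value minus this prediction. The observed value 10.5 used below is hypothetical, because the weekly fuel data are not shown here, and the helper names are illustrative.

```python
# Least squares coefficients from the Minitab output (Example 12.3).
B0, B_TEMP, B_CHILL = 13.1087, -0.09001, 0.08249

def predict_fuel(temp, chill):
    """Point estimate/prediction: y_hat = b0 + b1*temp + b2*chill."""
    return B0 + B_TEMP * temp + B_CHILL * chill

def residual(observed, temp, chill):
    """Residual e = y - y_hat for one observation."""
    return observed - predict_fuel(temp, chill)

y_hat = predict_fuel(40, 10)
print(round(y_hat, 3))  # 10.333, matching Minitab's Fit
# Hypothetical observed value, for illustration only.
print(round(residual(10.5, 40, 10), 3))  # 0.167
```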
Interpreting Slope Estimates in a Multiple Regression Model
Interpretation is similar to simple regression, but each slope describes the effect of its variable with the other explanatory variables in the model held constant.
Temp: For a given chill index, the mean weekly fuel consumption is expected to decrease by 0.09 units for each additional 1°F rise in temperature.
Interpreting Slope Estimates in a Multiple Regression Model
Chill: For a given average outside temperature, the mean weekly fuel consumption is expected to increase by 0.083 units for each one-unit increase in the chill index.
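A quick numerical check of the "held constant" interpretation, sketched with the Minitab coefficients: raising Temp by one degree while Chill stays fixed changes the prediction by exactly b1, and raising Chill by one unit at a fixed Temp changes it by exactly b2. The starting point (Temp = 40, Chill = 10) is an arbitrary choice. Any other starting values give the same differences because the model is linear.

```python
# Least squares coefficients from the Minitab output (Example 12.3).
B0, B_TEMP, B_CHILL = 13.1087, -0.09001, 0.08249

def predict_fuel(temp, chill):
    """Predicted mean weekly fuel consumption."""
    return B0 + B_TEMP * temp + B_CHILL * chill

# One-unit increase in each variable, with the other variable held at the same value.
temp_effect = predict_fuel(41, 10) - predict_fuel(40, 10)
chill_effect = predict_fuel(40, 11) - predict_fuel(40, 10)
print(round(temp_effect, 5), round(chill_effect, 5))  # -0.09001 0.08249
```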
12.3 Mean Square Error and Standard Error
Sum of squared errors:
$$SSE = \sum e_i^2 = \sum (y_i - \hat{y}_i)^2$$
Mean square error, the point estimate of the residual variance σ²:
$$s^2 = MSE = \frac{SSE}{n-(k+1)}$$
Standard error, the point estimate of the residual standard deviation σ:
$$s = \sqrt{MSE} = \sqrt{\frac{SSE}{n-(k+1)}}$$
Example 12.3 The Fuel Consumption Case
Analysis of Variance
Source           DF   SS       MS       F       P
Regression       2    24.875   12.438   92.30   0.000
Residual Error   5    0.674    0.135
Total            7    25.549
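The formulas can be checked against the ANOVA table. This is a minimal sketch: n = 8 follows from Total DF = n - 1 = 7, and k = 2 counts Temp and Chill. The result, s ≈ 0.3672, differs from Minitab's S = 0.3671 in the fourth decimal because the printed SSE is rounded.

```python
import math

# From the ANOVA table: SSE (Residual Error SS).
SSE = 0.674
n, k = 8, 2  # Total DF = n - 1 = 7; two predictors (Temp, Chill)

mse = SSE / (n - (k + 1))  # s^2, point estimate of the error variance sigma^2
s = math.sqrt(mse)         # s, point estimate of the error standard deviation sigma
print(round(mse, 4), round(s, 4))  # 0.1348 0.3672
```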
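12.4 Model Utility: Multiple Coefficient of Determination, R²
$$R^2 = \frac{\text{Explained variation}}{\text{Total variation}}$$
where Total variation = $\sum (y_i - \bar{y})^2$, Explained variation = $\sum (\hat{y}_i - \bar{y})^2$, and Unexplained variation = SSE = Total variation - Explained variation.
R² is the proportion of the total variation in y explained by the linear regression model.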
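Using the Example 12.3 ANOVA table, R² is the Regression SS (explained variation) divided by the Total SS (total variation). This is a minimal sketch that uses only numbers printed in the output. It also shows the equivalent form 1 - SSE/SST.

```python
# From the ANOVA table in Example 12.3.
explained = 24.875   # Regression SS (explained variation)
total = 25.549       # Total SS (total variation)
unexplained = 0.674  # Residual Error SS (SSE, unexplained variation)

r_sq = explained / total
print(round(100 * r_sq, 1))  # 97.4, matching Minitab's R-Sq
# Equivalent form: 1 - SSE / Total variation.
print(round(100 * (1 - unexplained / total), 1))  # 97.4
```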
The Adjusted R²
- Adding an independent variable to a multiple regression model never lowers R².
- R² will usually rise slightly even if the new variable has no relationship to y.
- The adjusted R² corrects this tendency by accounting for the number of independent variables, k, relative to the number of observations, n.
- As a result, it gives a better estimate of the importance of the independent variables.
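The slide does not give the adjustment itself. As a sketch, assuming the standard form of adjusted R², 1 - (1 - R²)(n - 1)/(n - (k + 1)), the Example 12.3 numbers reproduce Minitab's R-Sq(adj). An algebraically equivalent textbook form is (R² - k/(n - 1))((n - 1)/(n - (k + 1))). The function name is illustrative.

```python
def adjusted_r_squared(r_sq, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - (k + 1))."""
    return 1 - (1 - r_sq) * (n - 1) / (n - (k + 1))

r_sq = 24.875 / 25.549  # R^2 from the Example 12.3 ANOVA table
print(round(100 * adjusted_r_squared(r_sq, n=8, k=2), 1))  # 96.3, matching R-Sq(adj)
```

Because (n - 1)/(n - (k + 1)) grows with k, a new variable raises adjusted R² only if it raises R² by enough to offset the penalty.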
Comparison with Simple Coefficient of Determination
The simple coefficient of determination r² is
$$r^2 = \frac{\text{Explained variation}}{\text{Total variation}}$$
r² is the proportion of the total variation in y explained by the simple linear regression model.
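Comparison with Simple Correlation Coefficient
The simple correlation coefficient measures the strength of the linear relationship between y and x and is denoted by r.
$$r = \begin{cases} +\sqrt{r^2} & \text{if } b_1 \text{ is positive} \\ -\sqrt{r^2} & \text{if } b_1 \text{ is negative} \end{cases}$$
where b1 is the slope of the least squares line.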
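The sign rule can be checked on a small hypothetical data set. This sketch fits the simple regression, computes r² as explained over total variation, attaches the sign of b1, and compares the result with the Pearson correlation computed directly. The data and function names are illustrative only.

```python
import math

def simple_fit_and_r(x, y):
    """Return (b1, r_sq, r) for the simple regression of y on x.

    r is +sqrt(r^2) when b1 > 0 and -sqrt(r^2) when b1 < 0.
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    total = sum((yi - y_bar) ** 2 for yi in y)
    explained = sum((b0 + b1 * xi - y_bar) ** 2 for xi in x)
    r_sq = explained / total
    r = math.copysign(math.sqrt(r_sq), b1)  # attach the sign of the slope
    return b1, r_sq, r

def pearson_r(x, y):
    """Pearson correlation computed directly from the data."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    return ss_xy / math.sqrt(ss_xx * ss_yy)

# Hypothetical data with a negative relationship.
x = [1, 2, 3, 4, 5]
y = [9.8, 8.1, 7.9, 6.2, 5.0]
b1, r_sq, r = simple_fit_and_r(x, y)
print(b1 < 0, abs(r - pearson_r(x, y)) < 1e-9)  # True True
```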
12.5 Model Utility: F Test for the Multiple Regression Model (Are Any Variables Useful?)
To test
H0: β1 = β2 = ... = βk = 0 versus
Ha: At least one of β1, β2, ..., βk is not equal to 0
This is sometimes referred to as the global F-test.
Test statistic:
$$F(\text{model}) = \frac{(\text{Explained variation})/k}{(\text{Unexplained variation})/[n-(k+1)]}$$
Reject H0 in favor of Ha if F(model) > Fα or p-value < α. Fα is based on k numerator and n - (k + 1) denominator degrees of freedom.
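Example: F Test for Linear Regression
Example 12.5 The Fuel Consumption Case: Minitab Output
Analysis of Variance
Source           DF   SS       MS       F       P
Regression       2    24.875   12.438   92.30   0.000
Residual Error   5    0.674    0.135
Total            7    25.549
Test statistic:
$$F(\text{model}) = \frac{(\text{Explained variation})/k}{(\text{Unexplained variation})/[n-(k+1)]} = \frac{24.875/2}{0.674/5} \approx 92.3$$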
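A minimal sketch of the global F-test with these numbers. The critical value F0.05 = 5.79 for 2 numerator and 5 denominator degrees of freedom is taken from a standard F table and hard-coded, because the Python standard library has no F quantile function. The computed F (about 92.27) differs slightly from Minitab's 92.30 because the printed sums of squares are rounded.

```python
# Sums of squares and sizes from the Example 12.5 ANOVA table.
explained, unexplained = 24.875, 0.674
n, k = 8, 2

f_model = (explained / k) / (unexplained / (n - (k + 1)))
# F_0.05 with k = 2 numerator and n - (k + 1) = 5 denominator df, from an F table.
F_CRIT_05 = 5.79
print(round(f_model, 2), f_model > F_CRIT_05)  # 92.27 True
```

Since F(model) far exceeds F0.05, H0 is rejected at α = 0.05, consistent with Minitab's p-value of 0.000.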
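12.6 Testing Significance of the Independent Variables: Which Ones Are Significant?
If the regression assumptions hold, we can reject H0: βj = 0 at the α level of significance (probability of a Type I error equal to α) if and only if the appropriate rejection point condition holds or, equivalently, if the corresponding p-value is less than α.

Alternative     Reject H0 if    p-Value
Ha: βj > 0      t > tα          Area under the t distribution to the right of t
Ha: βj < 0      t < -tα         Area under the t distribution to the left of t
Ha: βj ≠ 0      |t| > tα/2      Twice the area under the t distribution to the right of |t|

Test statistic:
$$t = \frac{b_j}{s_{b_j}}$$
where $s_{b_j}$ is the standard error of bj (StDev in the Minitab output).
100(1 - α)% confidence interval for βj:
$$\left[\, b_j \pm t_{\alpha/2}\, s_{b_j} \,\right]$$
tα, tα/2, and p-values are based on n - (k + 1) degrees of freedom.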
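Example: Testing and Estimation for the βs
Example 12.6 The Fuel Consumption Case
Minitab Output
Predictor   Coef       StDev     T       P
Constant    13.1087    0.8557    15.32   0.000
Temp        -0.09001   0.01408   -6.39   0.001
Chill       0.08249    0.02200   3.75    0.013
Test: for Chill, t = b2 / s_b2 = 0.08249 / 0.02200 = 3.75, with p-value 0.013.
Interval: a 95% confidence interval for β2 is [0.08249 ± t0.025(0.02200)] = [0.08249 ± 2.571(0.02200)] = [0.0259, 0.1391].
Chill is significant at the α = 0.05 level, but not at α = 0.01.
tα, tα/2, and p-values are based on n - (k + 1) = 8 - 3 = 5 degrees of freedom.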
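The individual t tests and the interval for the Chill coefficient can be reproduced from the Coef and StDev columns. This is a sketch in which the two-sided critical values for 5 degrees of freedom, t0.025 = 2.571 and t0.005 = 4.032, are hard-coded from a standard t table. The comparison of |t| with tα/2 corresponds to the two-sided alternative Ha: βj ≠ 0.

```python
# Coefficients and standard errors (Coef, StDev) from the Minitab output.
coefs = {"Constant": (13.1087, 0.8557), "Temp": (-0.09001, 0.01408), "Chill": (0.08249, 0.02200)}
# Two-sided critical values for 5 df, from a t table.
T_025, T_005 = 2.571, 4.032

for name, (b, se) in coefs.items():
    t = b / se
    # Reject H0: beta_j = 0 against Ha: beta_j != 0 when |t| > t_{alpha/2}.
    print(name, round(t, 2), "reject at 0.05:", abs(t) > T_025, "reject at 0.01:", abs(t) > T_005)

# 95% confidence interval for the Chill coefficient: b +/- t_{0.025} * s_b.
b, se = coefs["Chill"]
ci = (b - T_025 * se, b + T_025 * se)
print([round(v, 4) for v in ci])  # [0.0259, 0.1391]
```

The Chill row prints reject at 0.05 but not at 0.01, matching the conclusion drawn from its p-value of 0.013.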
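12.7 Confidence and Prediction Intervals in Simple Regression Compared to Multiple Regression
If the regression assumptions hold,
100(1 - α)% confidence interval for the mean value of y, $\mu_{y|x_0}$:
$$\left[\, \hat{y} \pm t_{\alpha/2}\, s \sqrt{\text{Distance value}} \,\right]$$
100(1 - α)% prediction interval for an individual value of y:
$$\left[\, \hat{y} \pm t_{\alpha/2}\, s \sqrt{1 + \text{Distance value}} \,\right]$$
where, in simple regression,
$$\text{Distance value} = \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{SS_{xx}}$$
tα/2 is based on n - 2 degrees of freedom.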
Example: C.I. and P.I. in Simple Regression
Example 11.7 The Fuel Consumption Case: Minitab Output (predicted FuelCons when Temp, x = 40)
Predicted Values
Fit      StDev Fit   95.0% CI           95.0% PI
10.721   0.241       (10.130, 11.312)   (9.014, 12.428)
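C.I. and P.I. in Multiple Regression Prediction
If the regression assumptions hold,
100(1 - α)% confidence interval for the mean value of y:
$$\left[\, \hat{y} \pm t_{\alpha/2}\, s \sqrt{\text{Distance value}} \,\right]$$
100(1 - α)% prediction interval for an individual value of y:
$$\left[\, \hat{y} \pm t_{\alpha/2}\, s \sqrt{1 + \text{Distance value}} \,\right]$$
(The distance value requires matrix algebra; it is provided in MegaStat output.)
tα/2 is based on n - (k + 1) degrees of freedom.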
Example: C.I. and P.I. in Multiple Regression
Example 12.9 The Fuel Consumption Case: Minitab Output
FuelCons = 13.1 - 0.0900 Temp + 0.0825 Chill
Predicted Values (Temp = 40, Chill = 10)
Fit      StDev Fit   95.0% CI          95.0% PI
10.333   0.170       (9.895, 10.771)   (9.293, 11.374)
95% confidence interval: [9.895, 10.771]
95% prediction interval: [9.293, 11.374]
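Minitab's StDev Fit equals s·√(Distance value), so the distance value can be recovered from the output without matrix algebra. This is a minimal sketch that rebuilds both intervals from Fit, StDev Fit, and S. It uses t0.025 = 2.571 for 5 degrees of freedom from a t table. Because the inputs are rounded, the endpoints agree with Minitab's to within about 0.001.

```python
import math

fit, se_fit = 10.333, 0.170   # Fit and StDev Fit from the Minitab output
s = 0.3671                    # S from the regression output
T_025 = 2.571                 # t_{0.025} with n - (k + 1) = 5 df, from a t table

# StDev Fit = s * sqrt(Distance value), so the distance value can be backed out.
distance = (se_fit / s) ** 2

ci_half = T_025 * s * math.sqrt(distance)        # same as T_025 * se_fit
pi_half = T_025 * s * math.sqrt(1 + distance)    # same as T_025 * sqrt(s^2 + se_fit^2)
print("distance value:", round(distance, 4))
print("95% CI:", (round(fit - ci_half, 3), round(fit + ci_half, 3)))
print("95% PI:", (round(fit - pi_half, 3), round(fit + pi_half, 3)))
```

The prediction interval is always wider than the confidence interval because the "1 +" term adds the variance of an individual error around the mean.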
12.10 Using Dummy Variables to Model Qualitative Independent Variables
Part 3
- So far, we have only looked at including quantitative data in a regression model.
- However, we may wish to include descriptive qualitative data as well.
- For example, we might want to include the gender of respondents.
- We can model the effects of different levels of a qualitative variable by using what are called dummy variables.
- Dummy variables are also known as indicator variables.
How to Construct Dummy Variables
- A dummy variable always has a value of either 0 or 1.
- For example, to model sales at two locations, we would code the first location as 0 and the second as 1.
- Operationally, it does not matter which is coded 0 and which is coded 1.
What If We Have More Than Two Categories?
- Consider having three categories, say A, B, and C.
- We cannot code this using one dummy variable.
- Coding A = 0, B = 1, and C = 2 would be invalid, because it assumes the difference between A and B is the same as the difference between B and C.
- We must use multiple dummy variables.
- Specifically, a qualitative variable with a categories requires a - 1 dummy variables.
- For A, B, and C, we would need two dummy variables:
- x1 is 1 for A, 0 otherwise
- x2 is 1 for B, 0 otherwise
- If x1 and x2 are both 0, the observation must be C.
- This is why the third dummy variable is not needed.
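The a - 1 coding rule can be written as a small helper. This is a sketch, and the name `dummy_code` is hypothetical. Each non-reference category gets one 0/1 variable, and the reference category (C here) is the one with all dummies equal to 0.

```python
def dummy_code(value, categories, reference):
    """Return the a - 1 dummy variables for one observation.

    categories: all a category labels, e.g. ["A", "B", "C"].
    reference: the category coded with every dummy equal to 0 (it gets no dummy of its own).
    """
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    # One dummy per non-reference category, in the order given.
    return tuple(1 if value == c else 0 for c in categories if c != reference)

cats = ["A", "B", "C"]
for v in cats:
    print(v, dummy_code(v, cats, reference="C"))
# A (1, 0)
# B (0, 1)
# C (0, 0)
```

Adding a third dummy for C would make the dummies sum to 1 for every observation, duplicating the intercept column, which is why it is left out.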
12.10 Dummy Variables
Example 12.11 The Electronics World Case
Code 0 for the category you wish to be the reference.
Example: Regression with a Dummy Variable
Example 12.11 The Electronics World Case
Minitab Output
Sales = 17.4 + 0.851 Households + 29.2 DM

Predictor   Coef      StDev     T       P
Constant    17.360    9.447     1.84    0.109
Househol    0.85105   0.06524   13.04   0.000
Mall        29.216    5.594     5.22    0.001

S = 7.329   R-Sq = 98.3%   R-Sq(adj) = 97.8%

Analysis of Variance
Source           DF   SS      MS      F        P
Regression       2    21412   10706   199.32   0.000
Residual Error   7    376     54
Total            9    21788
Interpreting Slope Estimates for Dummy Variables
Since dummy variable values are limited to 0 and 1, we cannot refer to a unit increase. Instead, we use the change in the mean y value when x = 1 compared to when x = 0.
Mall: For a given number of households, the mean sales volume for stores in a mall location is expected to be 29.22 units higher than that for stores in other (street) locations.
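This interpretation can be checked with the Example 12.11 coefficients. At any fixed number of households, switching the dummy from 0 (street) to 1 (mall) shifts predicted sales by exactly the Mall coefficient. This is a sketch. The households value of 150 is hypothetical, and the helper name is illustrative.

```python
# Coefficients from the Example 12.11 Minitab output.
B0, B_HOUSEHOLDS, B_MALL = 17.360, 0.85105, 29.216

def predict_sales(households, mall):
    """mall is the dummy variable: 1 for a mall location, 0 for a street location."""
    return B0 + B_HOUSEHOLDS * households + B_MALL * mall

households = 150  # hypothetical value, for illustration only
gap = predict_sales(households, 1) - predict_sales(households, 0)
print(round(gap, 3))  # 29.216, the Mall coefficient, for any number of households
```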