Title: Slides Prepared by
1Slides Prepared by JOHN S. LOUCKS St. Edwards
University
2Chapter 15 Multiple Regression
- Multiple Regression Model
- Least Squares Method
- Multiple Coefficient of Determination
- Model Assumptions
- Testing for Significance
- Using the Estimated Regression Equation
- for Estimation and Prediction
- Qualitative Independent Variables
- Residual Analysis
3The Multiple Regression Model
- The Multiple Regression Model
-
- y ?0 ?1x1 ?2x2 . . . ?pxp ?
- The Multiple Regression Equation
- E(y) ?0 ?1x1 ?2x2 . . . ?pxp
- The Estimated Multiple Regression Equation
- y b0 b1x1 b2x2 . . . bpxp
4The Least Squares Method
- Least Squares Criterion
- Computation of Coefficients Values
- The formulas for the regression coefficients
b0, b1, b2, . . . bp involve the use of matrix
algebra. We will rely on computer software
packages to perform the calculations. - A Note on Interpretation of Coefficients
- bi represents an estimate of the change in y
corresponding to a one-unit change in xi when all
other independent variables are held constant.
5The Multiple Coefficient of Determination
- Relationship Among SST, SSR, SSE
- SST SSR SSE
- Multiple Coefficient of Determination
- R 2 SSR/SST
- Adjusted Multiple Coefficient of Determination
6Model Assumptions
- Assumptions About the Error Term ?
- The error ? is a random variable with mean of
zero. - The variance of ? , denoted by ??2, is the same
for all values of the independent variables. - The values of ? are independent.
- The error ? is a normally distributed random
variable reflecting the deviation between the y
value and the expected value of y given by - ?0 ?1x1 ?2x2 . . . ?pxp
7Testing for Significance F Test
- Hypotheses
- H0 ?1 ?2 . . . ?p 0
- Ha One or more of the parameters
- is not equal to zero.
- Test Statistic
- F MSR/MSE
8Testing for Significance F Test
- Rejection Rule
- Using test statistic Reject H0 if F gt
F? - Using p-value Reject H0 if p-value lt a
-
- where F? is based on an F distribution with
- p d.f. in the numerator and n - p - 1 d.f. in
the denominator
9Testing for Significance F Test
- ANOVA Table (assuming p independent variables)
- Source of Sum of Degrees of Mean
- Variation Squares Freedom Squares
F - Regression SSR p
- Error SSE n - p - 1
- Total SST n - 1
10Testing for Significance t Test
- Hypotheses
- H0 ?i 0
- Ha ?i 0
- Test Statistic
11Testing for Significance t Test
- Rejection Rule
- Using test statistic Reject H0 if t lt
-t????or t gt t???? -
- Using p-value Reject H0 if p-value lt
a - where t??? is based on a t distribution with
- n - p - 1 degrees of freedom
12Testing for Significance Multicollinearity
- The term multicollinearity refers to the
correlation among the independent variables. - When the independent variables are highly
correlated (say, r gt .7), it is not possible
to determine the separate effect of any
particular independent variable on the dependent
variable. - If the estimated regression equation is to be
used only for predictive purposes,
multicollinearity is usually not a serious
problem. - Every attempt should be made to avoid including
independent variables that are highly correlated.
13Using the Estimated Regression Equationfor
Estimation and Prediction
- The procedures for estimating the mean value of y
and predicting an individual value of y in
multiple regression are similar to those in
simple regression. - We substitute the given values of x1, x2, . . . ,
xp into the estimated regression equation and use
the corresponding value of y as the point
estimate. - The formulas required to develop interval
estimates for the mean value of y and for an
individual value of y are beyond the scope of
the text. - Software packages for multiple regression will
often provide these interval estimates.
14Example Programmer Salary Survey
- A software firm collected data for a sample of
20 - computer programmers. A suggestion was made that
- regression analysis could be used to determine if
salary - was related to the years of experience and the
score on - the firms programmer aptitude test.
- The years of experience, score on the aptitude
test, - and corresponding annual salary (1000s) for a
sample - of 20 programmers is shown on the next slide.
15Example Programmer Salary Survey
- Exper. Score Salary Exper.
Score Salary - 4 78 24 9 88 38
- 7 100 43 2 73 26.6
- 1 86 23.7 10 75 36.2
- 5 82 34.3 5 81 31.6
- 8 86 35.8 6 74 29
- 10 84 38 8 87 34
- 0 75 22.2 4 79 30.1
- 1 80 23.1 6 94 33.9
- 6 83 30 3 70 28.2
- 6 91 33 3 89 30
16Example Programmer Salary Survey
- Multiple Regression Model
- Suppose we believe that salary (y) is related to
the years of experience (x1) and the score on the
programmer aptitude test (x2) by the following
regression model - y ?0 ?1x1 ?2x2 ?
-
- where
- y annual salary (000)
- x1 years of experience
- x2 score on programmer aptitude test
-
17Example Programmer Salary Survey
- Multiple Regression Equation
- Using the assumption E (? ) 0, we obtain
- E(y ) ?0 ?1x1 ?2x2
- Estimated Regression Equation
- b0, b1, b2 are the least squares estimates of
?0, ?1, ?2 - Thus
- y b0 b1x1 b2x2
18Example Programmer Salary Survey
- Solving for the Estimates of ?0, ?1, ?2
Least Squares Output
Input Data
Computer Package for Solving Multiple Regression P
roblems
b0 b1 b2 R2 etc.
x1 x2 y 4 78 24 7 100 43 .
. . . . . 3 89 30
19Using Excels Regression Tool to Developthe
Estimated Multiple Regression Equation
- Formula Worksheet (showing data entered)
Note Rows 10-21 are not shown.
20Using Excels Regression Tool to Developthe
Estimated Multiple Regression Equation
- Performing the Multiple Regression Analysis
- Step 1 Select the Tools pull-down menu
- Step 2 Choose the Data Analysis option
- Step 3 Choose Regression from the list of
Analysis Tools - continued
21Using Excels Regression Tool to Developthe
Estimated Multiple Regression Equation
- Performing the Multiple Regression Analysis
- Step 4 When the Regression dialog box appears
- Enter D1D21 in the Input Y Range box
- Enter B1C21 in the Input X Range box
- Select Labels
- Select Confidence Level
- Enter 95 in the Confidence Level box
- Select Output Range and enter A24 in
the - Output Range box
- Click OK
22Using Excels Regression Tool to Developthe
Estimated Multiple Regression Equation
- Value Worksheet (Regression Statistics)
23Using Excels Regression Tool to Developthe
Estimated Multiple Regression Equation
- Value Worksheet (ANOVA Output)
The Significance F value in cell F35 is the
p-value used to test for overall significance.
24Using Excels Regression Tool to Developthe
Estimated Multiple Regression Equation
- Value Worksheet (Regression Equation Output)
Note Columns F-I are not shown.
The P-value in cell E41 is used to test for the
individual significance of Experience.
25Using Excels Regression Tool to Developthe
Estimated Multiple Regression Equation
- Value Worksheet (Regression Equation Output)
Note Columns F-I are not shown.
The P-value in cell E42 is used to test for the
individual significance of Test Score.
26Using Excels Regression Tool to Developthe
Estimated Multiple Regression Equation
- Estimated Regression Equation
- SALARY 3.174 1.404(EXPER) 0.2509(SCORE)
- Note Predicted salary will be in thousands of
dollars
27Using Excels Regression Tool to Developthe
Estimated Multiple Regression Equation
- Value Worksheet (Regression Equation Output)
Note Columns C-E are hidden.
28Example Programmer Salary Survey
- F Test
- Hypotheses H0 ?1 ?2 0
- Ha One or both of the parameters
- is not equal to zero.
- Rejection Rule
- For ? .05 and d.f. 2, 17 F.05
3.59 - Reject H0 if F gt 3.59.
- Test Statistic
- F MSR/MSE 250.16/5.85 42.76
- Conclusion
- We can reject H0.
29Example Programmer Salary Survey
- t Test for Significance of Individual Parameters
- Hypotheses H0 ?i 0
- Ha ?i 0
- Rejection Rule
- For ? .05 and d.f. 17, t.025 2.11
- Reject H0 if t gt 2.11
- Test Statistics
- Conclusions
- Reject H0 ?1 0 Reject H0
?2 0
30Qualitative Independent Variables
- In many situations we must work with qualitative
independent variables such as gender (male,
female), method of payment (cash, check, credit
card), etc. - For example, x2 might represent gender where x2
0 indicates male and x2 1 indicates female. - In this case, x2 is called a dummy or indicator
variable. - If a qualitative variable has k levels, k - 1
dummy variables are required, with each dummy
variable being coded as 0 or 1. - For example, a variable with levels A, B, and C
would be represented by x1 and x2 values of (0,
0), - (1, 0), and (0,1), respectively.
31Example Programmer Salary Survey (B)
- As an extension of the problem involving the
- computer programmer salary survey, suppose that
- management also believes that the annual salary
is - related to whether or not the individual has a
graduate - degree in computer science or information
systems. - The years of experience, the score on the
programmer - aptitude test, whether or not the individual has
a - relevant graduate degree, and the annual salary
(000) - for each of the sampled 20 programmers are shown
on - the next slide.
32Example Programmer Salary Survey (B)
- Exp. Score Degr. Salary Exp. Score
Degr. Salary - 4 78 No 24 9 88 Yes 38
- 7 100 Yes 43 2 73 No 26.6
- 1 86 No 23.7 10 75 Yes 36.2
- 5 82 Yes 34.3 5 81 No 31.6
- 8 86 Yes 35.8 6 74 No 29
- 10 84 Yes 38 8 87 Yes 34
- 0 75 No 22.2 4 79 No 30.1
- 1 80 No 23.1 6 94 Yes 33.9
- 6 83 No 30 3 70 No 28.2
- 6 91 Yes 33 3 89 No 30
33Example Programmer Salary Survey (B)
- Multiple Regression Equation
- E(y ) ?0 ?1x1 ?2x2 ?3x3
- Estimated Regression Equation
- y b0 b1x1 b2x2 b3x3
- where
- y annual salary (000)
- x1 years of experience
- x2 score on programmer aptitude test
- x3 0 if individual does not have a grad.
degree - 1 if individual does have a grad.
degree - Note x3 is referred to as a dummy variable.
34Using Excels Regression Tool to Developthe
Estimated Multiple Regression Equation
- Formula Worksheet (showing data)
Note Rows 9-21 are not shown.
35Using Excels Regression Tool to Developthe
Estimated Multiple Regression Equation
- Value Worksheet (Regression Statistics)
36Using Excels Regression Tool to Developthe
Estimated Multiple Regression Equation
- Value Worksheet (ANOVA Output)
37Using Excels Regression Tool to Developthe
Estimated Multiple Regression Equation
- Value Worksheet (Regression Equation Output)
Note Columns F-I are not shown.
38Using Excels Regression Tool to Developthe
Estimated Multiple Regression Equation
- Value Worksheet (Regression Equation Output)
Note Columns C-E are hidden.
39Example Programmer Salary Survey (B)
- Interpreting the Parameters
- b1 1.15
- Salary is expected to increase by 1,150 for
each additional year of experience (when all
other independent variables are held constant)
40Example Programmer Salary Survey (B)
- Interpreting the Parameters
- b2 0.197
- Salary is expected to increase by 197 for each
additional point scored on the programmer
aptitude test (when all other independent
variables are held constant)
41Example Programmer Salary Survey (B)
- Interpreting the Parameters
- b3 2.28
- Salary is expected to be 2,280 higher for an
individual with a graduate degree than one
without a graduate degree (when all other
independent variables are held constant)
42Residual Analysis
- For simple linear regression the residual plot
against - and the residual plot against x provide
the same information. - In multiple regression analysis it is preferable
to use the residual plot against to determine
if the model assumptions are satisfied.
43Residual Analysis
- Standardized residuals are frequently used in
residual plots for purposes of - Identifying outliers (typically, standardized
residuals lt -2 or gt 2) - Providing insight about the assumption that the
error term e has a normal distribution - The computation of the standardized residuals in
multiple regression analysis is too complex to be
done by hand - Excels Regression tool can be used
44Using Excel to Construct a Standardized Residual
Plot
- Value Worksheet (Residual Output)
Note Rows 37-51 are not shown.
45Using Excel to Construct a Standardized Residual
Plot
Outlier
46End of Chapter 15