Title: Multiple Regression and Regression Model Building
1. Multiple Regression and Regression Model Building
2. A Comment on Regression
- Woody Durham, commenting on a lopsided game
between the Chicago Bulls and the New Jersey Nets
(Dean Dome, 10/20/90)
- "Watching this game is as much fun as watching a
multiple regression."
3. Multiple regression is a direct extension of
simple regression
- See the comparison in the coursepack on pp. 31-32
4. Multiple regression example
- Campus Stationery Store (CSS): the model that
predicts sales using both advertising and price
as independent variables (p. 27)
- We will let Excel do the calculations for us
- When using Excel, the independent variables need
to be in neighboring columns
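As a rough companion to the Excel workflow, the sketch below fits the same kind of two-variable model by ordinary least squares. The numbers are hypothetical (they are not the Campus Stationery Store data from the coursepack), and the helper name `fit_multiple_regression` is made up for illustration.

```python
import numpy as np

# Hypothetical data, NOT the coursepack's Campus Stationery Store figures.
advertising = np.array([10.0, 12.0, 8.0, 15.0, 11.0, 14.0, 9.0, 13.0])
price = np.array([150.0, 160.0, 140.0, 155.0, 165.0, 145.0, 160.0, 150.0])
sales = np.array([420.0, 410.0, 400.0, 480.0, 395.0, 470.0, 380.0, 450.0])


def fit_multiple_regression(y, *xs):
    """Return [b0, b1, ..., bk] from an ordinary least-squares fit."""
    # A leading column of ones estimates the intercept b0.
    X = np.column_stack([np.ones(len(y)), *xs])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs


b0, b_adv, b_price = fit_multiple_regression(sales, advertising, price)
print(f"intercept={b0:.3f}, advertising={b_adv:.3f}, price={b_price:.3f}")
```

The columns are stacked side by side in one matrix, which mirrors Excel's requirement that the independent variables sit in neighboring columns.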
5. Using the multiple regression model for
forecasting
- Often in business we use historical data to make
forecasts about the future
- "Forecasting is like trying to drive a car
blindfolded following directions given by a
person who is looking out the back window."
(Anonymous)
6. Forecasting - the mechanics
- Two kinds of forecasting
- Point estimates: single, best guesses about
the value of the dependent variable
- Interval estimates: a range of values in which
the dependent variable is likely to occur
7. Multiple regression point estimates
- Just as with simple regression, we use the data
to estimate the model parameters (the intercept
and slope coefficients), and combine these with
(given) values of the independent variables to
forecast a value of the dependent variable
- Example - What level of sales would you predict
for CSS when the advertising level is 13 and the
price is 150?
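The arithmetic of a point estimate can be sketched as follows. The coefficients here are hypothetical placeholders, not the values Excel reports for the CSS data, so the resulting forecast only illustrates the mechanics; `point_estimate` is an illustrative helper name.

```python
def point_estimate(intercept, slopes, x_values):
    """Forecast y as b0 + b1*x1 + ... + bk*xk."""
    if len(slopes) != len(x_values):
        raise ValueError("need one x value per slope coefficient")
    return intercept + sum(b * x for b, x in zip(slopes, x_values))


# Hypothetical coefficients (assumed for illustration only).
b0, b_adv, b_price = 500.0, 10.0, -2.0

# Advertising level 13 and price 150, as in the example question.
forecast = point_estimate(b0, [b_adv, b_price], [13, 150])
print(forecast)  # 500 + 10*13 - 2*150 = 330.0
```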
8. Multiple regression interval estimates
- Just as with simple regression, we build an
interval centered on the point estimate of y
- Approximate formulas for these interval estimates
are on p. 32 of the coursepack
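The coursepack formulas are not reproduced in these slides. One common rule of thumb, assumed here rather than taken from p. 32, is to center a roughly 95% interval for an individual value at the point estimate and extend it two standard errors of the regression (s) in each direction. The data and the function name below are hypothetical.

```python
import numpy as np

# Hypothetical data, NOT the coursepack's figures.
X_raw = np.array([
    [10.0, 150.0], [12.0, 160.0], [8.0, 140.0], [15.0, 155.0],
    [11.0, 165.0], [14.0, 145.0], [9.0, 160.0], [13.0, 150.0],
])
sales = np.array([420.0, 410.0, 400.0, 480.0, 395.0, 470.0, 380.0, 450.0])


def approx_prediction_interval(X_raw, y, x_new, multiplier=2.0):
    """Return (y_hat, lower, upper) using the assumed y_hat +/- 2s rule."""
    X = np.column_stack([np.ones(len(y)), X_raw])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coefs
    n, p = X.shape  # p = k + 1 estimated parameters
    s = np.sqrt(residuals @ residuals / (n - p))  # standard error of the regression
    y_hat = coefs[0] + np.dot(coefs[1:], x_new)
    return y_hat, y_hat - multiplier * s, y_hat + multiplier * s


y_hat, lower, upper = approx_prediction_interval(X_raw, sales, np.array([13.0, 150.0]))
print(f"point estimate {y_hat:.1f}, approximate interval ({lower:.1f}, {upper:.1f})")
```

This shortcut ignores the extra uncertainty in the estimated coefficients, so it is best treated as a quick approximation for x values well inside the observed data.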
9. Statistical analyses with the multiple regression
model
- Testing the model itself
- A test for the overall model, i.e., testing the
entire collection of independent variables for
usefulness in predicting the dependent variable
- Tests for the usefulness of individual
independent variables
10. Testing the overall model in multiple regression
- This is a new test, i.e., one we did not discuss
for simple regression (but it works there as
well)
- There are three equivalent ways to express the
hypotheses we will be testing
11. Testing the overall model, cont.
- or
- or
- H0: The collection of xs does not help to
predict y
- Ha: The collection of xs does help to predict
y
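In symbols, the verbal hypotheses above are commonly written as shown below. The notation is an assumption: k is the number of independent variables and the betas are the population slope coefficients.

```latex
\begin{align*}
H_0 &: \beta_1 = \beta_2 = \cdots = \beta_k = 0 \\
H_a &: \text{at least one } \beta_i \neq 0
\end{align*}
```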
12. Testing the overall model, cont.
- The statistic we use to conduct these hypothesis
tests is the F statistic in the ANOVA box of
Excel's Regression output
- Note in passing: the sampling distribution of
this statistic is an F distribution. We will
study this distribution later in the course
13. Testing the overall model, cont.
- For the moment, the p-value for the test we want
to conduct is the "Significance F" value in the
Excel output
- Small Significance F values imply that the
collection of independent variables does help to
predict the dependent variable
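For readers curious where the ANOVA box's F value comes from, the sketch below computes it from the sums of squares as (SSR / k) / (SSE / (n - k - 1)). The data and the function name are hypothetical. The p-value itself is left to Excel's Significance F, since it requires the F distribution.

```python
import numpy as np

# Hypothetical data, NOT the coursepack's figures.
X_raw = np.array([
    [10.0, 150.0], [12.0, 160.0], [8.0, 140.0], [15.0, 155.0],
    [11.0, 165.0], [14.0, 145.0], [9.0, 160.0], [13.0, 150.0],
])
sales = np.array([420.0, 410.0, 400.0, 480.0, 395.0, 470.0, 380.0, 450.0])


def overall_f_statistic(X_raw, y):
    """Return (F, numerator d.f., denominator d.f.) for the overall model test."""
    X = np.column_stack([np.ones(len(y)), X_raw])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coefs
    sse = np.sum((y - fitted) ** 2)         # unexplained variation
    ssr = np.sum((fitted - y.mean()) ** 2)  # variation explained by the xs
    n = len(y)
    k = X.shape[1] - 1                      # number of independent variables
    f_stat = (ssr / k) / (sse / (n - k - 1))
    return f_stat, k, n - k - 1


f_stat, df1, df2 = overall_f_statistic(X_raw, sales)
print(f"F = {f_stat:.3f} with {df1} and {df2} d.f.")
```

A large F corresponds to a small Significance F, which is the evidence that the collection of xs helps to predict y.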
14. Testing the usefulness of individual xs
- Again we will be testing the following hypotheses
15. Testing the individual xs, cont.
- These tests will be conducted using Excel's
P-values contained in the bottom box of the
Regression output
- As in simple regression
- Low p-value means the variable is useful in
helping to predict y
- High p-value means the variable is not useful in
helping to predict y
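The p-values in Excel's bottom box are derived from t statistics, each equal to a coefficient divided by its standard error, and each testing whether that coefficient is zero. The sketch below reproduces that calculation on hypothetical data. The function name is illustrative, and the conversion to p-values (which needs the t distribution) is left to Excel.

```python
import numpy as np

# Hypothetical data, NOT the coursepack's figures.
X_raw = np.array([
    [10.0, 150.0], [12.0, 160.0], [8.0, 140.0], [15.0, 155.0],
    [11.0, 165.0], [14.0, 145.0], [9.0, 160.0], [13.0, 150.0],
])
sales = np.array([420.0, 410.0, 400.0, 480.0, 395.0, 470.0, 380.0, 450.0])


def coefficient_t_statistics(X_raw, y):
    """Return (coefficients, standard errors, t statistics), intercept first."""
    X = np.column_stack([np.ones(len(y)), X_raw])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coefs
    n, p = X.shape
    s2 = residuals @ residuals / (n - p)     # estimated error variance
    cov = s2 * np.linalg.inv(X.T @ X)        # covariance matrix of the estimates
    std_errors = np.sqrt(np.diag(cov))
    return coefs, std_errors, coefs / std_errors


coefs, std_errors, t_stats = coefficient_t_statistics(X_raw, sales)
for name, b, se, t in zip(["intercept", "advertising", "price"], coefs, std_errors, t_stats):
    print(f"{name:12s} b={b:9.3f}  se={se:8.3f}  t={t:7.3f}")
```

A t statistic far from zero goes with a low p-value, meaning the variable is useful in helping to predict y given the other variables in the model.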
16. Potential pitfalls of regression
- Strong relationships between the independent
variables (multicollinearity)
- Predicting outside the range of values of the
independent variables
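Both pitfalls can be screened for with quick checks, sketched below on hypothetical data. The helper names are made up, and what counts as a "strong" correlation is a judgment call rather than a fixed rule.

```python
import numpy as np

# Hypothetical data, NOT the coursepack's figures.
advertising = np.array([10.0, 12.0, 8.0, 15.0, 11.0, 14.0, 9.0, 13.0])
price = np.array([150.0, 160.0, 140.0, 155.0, 165.0, 145.0, 160.0, 150.0])


def correlation_between_predictors(x1, x2):
    """Correlation between two independent variables (a multicollinearity screen)."""
    return float(np.corrcoef(x1, x2)[0, 1])


def outside_observed_range(x_column, new_value):
    """True if new_value lies outside the range of x values used to fit the model."""
    return bool(new_value < np.min(x_column) or new_value > np.max(x_column))


print("corr(advertising, price):", round(correlation_between_predictors(advertising, price), 3))
print("advertising = 20 is extrapolation:", outside_observed_range(advertising, 20.0))
print("advertising = 13 is extrapolation:", outside_observed_range(advertising, 13.0))
```

The range check looks at one variable at a time. A combination of x values can still be unlike anything in the data even when each value is individually in range.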
17. Checking the assumptions
- We will show how to generate and use two graphs
- The scatter diagram of the residuals vs. an
independent variable
- The Normal probability plot of the residual
values
- to check for three assumptions
- Constant scatter of the residuals
(homoskedasticity)
- Linearity of the data
- Normality of the residuals
18. Checking the assumptions, cont.
- All of these checks employ "art appreciation"
- Check for constant scatter and linearity using
the scatter diagrams of the residuals vs. the
independent variables
- In Excel, check the Residual Plots option in
the Regression dialog box
19. Interpretation of the scatter diagrams
- Constant scatter is not met if the residuals have
different amounts of variation at different
values of x, e.g., butterfly or fan shapes
- Linearity is not met if the residuals show a
curved pattern as x varies
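The sketch below produces the numbers behind such a residual plot, using deliberately curved hypothetical data so that the pattern described above appears. The helper names are illustrative. The half-by-half spread comparison is only a crude numeric companion to the picture, not a formal test.

```python
import numpy as np


def regression_residuals(X_raw, y):
    """Residuals (actual minus fitted) from a least-squares fit with intercept."""
    X = np.column_stack([np.ones(len(y)), X_raw])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coefs


def spread_by_half(x, residuals):
    """Std. dev. of residuals for the lower and upper halves of x (fan-shape hint)."""
    order = np.argsort(x)
    half = len(x) // 2
    return float(np.std(residuals[order[:half]])), float(np.std(residuals[order[half:]]))


# Hypothetical, deliberately curved data: a straight line is the wrong model here.
x = np.arange(1.0, 11.0)
y = x ** 2

residuals = regression_residuals(x, y)
for xi, ri in zip(x, residuals):
    print(f"x={xi:4.1f}  residual={ri:7.2f}")
print("spread (low x, high x):", spread_by_half(x, residuals))
```

The residuals are positive at both ends of the x range and negative in the middle, which is the curved pattern that signals a linearity problem.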
20. Checking the assumptions, cont.
- Create the Normal probability plot of the
residuals to check for Normality of the residuals
- To do this in Excel, follow the procedure given
in the "Doing Regression Residual Analysis in
Excel" section of the coursepack
21. Interpretation of the Normal probability plot
- The residuals are Normally distributed if the
points lie roughly in a straight-line pattern
(along the reference line)
- The residuals are not Normally distributed if the
points are curved relative to the reference line
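A Normal probability plot pairs each sorted residual with the z value a Normal distribution would put at the same rank. The sketch below computes those pairs. The plotting position (i - 0.5) / n is one of several conventions and is an assumption here, so the coursepack's Excel procedure may differ slightly. The residual values and the function name are hypothetical.

```python
from statistics import NormalDist

import numpy as np


def normal_probability_points(residuals):
    """Return (theoretical z values, sorted residuals) for a Normal probability plot."""
    ordered = np.sort(np.asarray(residuals, dtype=float))
    n = len(ordered)
    # Assumed plotting positions: (i - 0.5) / n for i = 1..n.
    probs = (np.arange(1, n + 1) - 0.5) / n
    z = np.array([NormalDist().inv_cdf(float(p)) for p in probs])
    return z, ordered


# Hypothetical residuals for illustration.
residuals = [-3.1, 1.2, 0.4, -0.8, 2.5, -1.6, 0.1, 1.3]
z, ordered = normal_probability_points(residuals)

# A correlation near 1 means the points lie close to a straight line.
straightness = float(np.corrcoef(z, ordered)[0, 1])
for zi, ri in zip(z, ordered):
    print(f"z={zi:6.3f}  residual={ri:5.2f}")
print("straightness (correlation):", round(straightness, 3))
```

The correlation is only a numeric summary. The slides' advice to look at the plot itself still applies.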
22. Introducing qualitative variables into regression
- Basic idea - use a dummy variable, i.e., one
that has only two values, 0 and 1
- Example - exploration of potential salary bias in
the "Illustrating Dummy Variables" section of the
coursepack (you may have seen these data before!)
23. Interpretation of the dummy variable model
- The original model
- if x2 is a dummy variable defined as
24. Interpretation of the dummy variable model, cont.
- can be rewritten as
- which represents a pair of parallel models, with
b2 representing the change for men relative to
women
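The pair of parallel models can be written out as below. The notation is assumed: x1 is the quantitative variable, and x2 is coded 1 for men and 0 for women, which is the coding implied by b2 being the change for men relative to women.

```latex
\begin{align*}
y &= b_0 + b_1 x_1 + b_2 x_2, \qquad
x_2 = \begin{cases} 1 & \text{if male} \\ 0 & \text{if female} \end{cases} \\
\text{Women } (x_2 = 0):\quad y &= b_0 + b_1 x_1 \\
\text{Men } (x_2 = 1):\quad y &= (b_0 + b_2) + b_1 x_1
\end{align*}
```

Both lines share the slope b1, so they are parallel, and b2 is the vertical gap between them.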
25. Interpretation of the dummy variable model, cont.
- Any dummy variable model has a base or
reference case (determined by all the dummy
variables = 0)
- All dummy variable coefficients are interpreted
as changes relative to the base case
26. Building a model in which both slope and
intercept change
- Key - add an interaction term to the model
- and b3 is the change in slope relative to the
base case
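A model of this kind is commonly written y = b0 + b1*x1 + b2*x2 + b3*x1*x2, with x2 a 0/1 dummy. That form is assumed here, since the slide's equation is not reproduced. The sketch below builds the interaction column and fits the model to hypothetical, noise-free data generated from known coefficients, so the fit should recover them.

```python
import numpy as np

# Hypothetical data: x1 is quantitative, x2 is a 0/1 dummy variable.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0])

# Noise-free y built from assumed coefficients b0=20, b1=2, b2=5, b3=1.5.
y = 20.0 + 2.0 * x1 + 5.0 * x2 + 1.5 * x1 * x2

# The interaction term is simply the product of the two columns.
interaction = x1 * x2
X = np.column_stack([np.ones(len(y)), x1, x2, interaction])
b0, b1, b2, b3 = np.linalg.lstsq(X, y, rcond=None)[0]

print(f"base case (x2=0):  intercept={b0:.2f}, slope={b1:.2f}")
print(f"other case (x2=1): intercept={b0 + b2:.2f}, slope={b1 + b3:.2f}")
```

In Excel, the same idea means adding one more column that holds the product of the x1 and x2 columns.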
27. Adding qualitative variables with more than two
values
- Create a dummy variable for each value of the
qualitative variable
- Make sure you leave at least one of the dummy
variables out of the model when you run it using
Excel
- Example - the weight loss data analyzed in the
"Qualitative / Quantitative Interactions" section
of the packet
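The dummy-coding step can be sketched as follows. The category labels are hypothetical (they are not the weight loss data), and `make_dummies` is an illustrative helper. The omitted value becomes the base case, where every dummy column is 0.

```python
import numpy as np


def make_dummies(labels, base):
    """Return (kept levels, 0/1 columns), omitting the dummy for the base level."""
    levels = sorted(set(labels))
    if base not in levels:
        raise ValueError(f"base level {base!r} does not appear in the data")
    kept = [level for level in levels if level != base]
    columns = np.array([[1.0 if lab == level else 0.0 for level in kept] for lab in labels])
    return kept, columns


# Hypothetical three-valued qualitative variable.
labels = ["A", "B", "C", "A", "C", "B"]
kept, dummies = make_dummies(labels, base="A")
print("dummy columns for levels:", kept)
print(dummies)
```

If a dummy for every value were kept alongside the intercept, the dummy columns would always sum to 1, duplicating the intercept column, and the coefficients could not be estimated uniquely.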
28. A statistical test to compare two regression
models
- Basic idea - compare reduced and complete
models
(Reduced)
(Complete)
29. Comparing two models, cont.
- Important - every variable in the reduced model
must also be in the complete model
- Calculate the comparison statistic using numbers
from both regression outputs
30. Comparing two models, cont.
- A confusion - there are two ways to calculate the
value of the statistic
- The book's method
31. Comparing two models, cont.
- And the method shown in the packet
- these will always give the same answer!
32. Comparing two models, cont.
- This F statistic has an F sampling
distribution with k - g and n - (k + 1) d.f.
- The rejection region is in the upper tail only
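The sketch below computes the comparison statistic in two algebraically equivalent ways, one from the SSE values and one from the R-squared values of the two outputs. Whether these match the book's and the packet's methods exactly is an assumption. Here k and g are taken to be the numbers of independent variables in the complete and reduced models, and the data are synthetic.

```python
import numpy as np


def sse_and_r2(X_raw, y):
    """Return (SSE, R-squared) from a least-squares fit with intercept."""
    X = np.column_stack([np.ones(len(y)), X_raw])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = float(np.sum((y - X @ coefs) ** 2))
    sst = float(np.sum((y - y.mean()) ** 2))
    return sse, 1.0 - sse / sst


def partial_f_from_sse(sse_reduced, sse_complete, k, g, n):
    return ((sse_reduced - sse_complete) / (k - g)) / (sse_complete / (n - (k + 1)))


def partial_f_from_r2(r2_reduced, r2_complete, k, g, n):
    return ((r2_complete - r2_reduced) / (k - g)) / ((1.0 - r2_complete) / (n - (k + 1)))


# Synthetic data for illustration only.
rng = np.random.default_rng(0)
n = 20
x1, x2, x3 = rng.normal(size=(3, n))
y = 5.0 + 2.0 * x1 + 0.5 * x2 + rng.normal(size=n)

g, k = 1, 3  # reduced model uses x1; complete model uses x1, x2, x3
sse_r, r2_r = sse_and_r2(x1, y)
sse_c, r2_c = sse_and_r2(np.column_stack([x1, x2, x3]), y)

f_sse = partial_f_from_sse(sse_r, sse_c, k, g, n)
f_r2 = partial_f_from_r2(r2_r, r2_c, k, g, n)
print(f"F from SSE: {f_sse:.4f}   F from R-squared: {f_r2:.4f}")
print(f"d.f.: {k - g} and {n - (k + 1)}")
```

The two values agree because both models share the same total sum of squares, so differences in SSE and differences in R-squared carry the same information.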
33. Fitting curved models to data
- If we use a polynomial model
- we can still estimate the parameters of the model
using regression
34. Fitting curved models, cont.
- In Excel, a column for each power of the values
of the independent variable must be created
- Example - the chicken feed supplement problem in
the "An Example of a One Variable, Second Order
Model" section of the coursepack
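The column-per-power idea can be sketched as below for a second-order model, y = b0 + b1*x + b2*x^2. The data are hypothetical and noise-free (they are not the chicken feed data), generated from known coefficients so the fit should recover them.

```python
import numpy as np

# Hypothetical x values and a noise-free quadratic response (b0=3, b1=2, b2=-0.5).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = 3.0 + 2.0 * x - 0.5 * x ** 2

# One column per power of x, just like the extra columns created in Excel.
x_squared = x ** 2
X = np.column_stack([np.ones(len(y)), x, x_squared])
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

print(f"b0={b0:.3f}, b1={b1:.3f}, b2={b2:.3f}")
```

The model is curved in x but still linear in the coefficients, which is why ordinary regression can estimate it.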