Multiple Regression and Regression Model Building - PowerPoint PPT Presentation

Provided by: Morgan75
1
Multiple Regression and Regression Model Building
2
A Comment on Regression
  • Woody Durham, commenting on a lopsided game
    between the Chicago Bulls and the New Jersey Nets
    (Dean Dome, 10/20/90):
  • "Watching this game is as much fun as watching a
    multiple regression."

3
Multiple regression is a direct extension of
simple regression
  • See the comparison in the coursepack on pp. 31-32

4
Multiple regression example
  • Campus Stationery Store - the model which
    predicts sales using both advertising and price
    as independent variables - p. 27
  • We will let Excel do the calculations for us
  • When using Excel, the independent variables need
    to be in neighboring columns

5
Using the multiple regression model for
forecasting
  • Often in business we use historical data to make
    forecasts about the future
  • "Forecasting is like trying to drive a car
    blindfolded following directions given by a
    person who is looking out the back window."
    (Anonymous)

6
Forecasting - the mechanics
  • Two kinds of forecasting:
  • Point estimates - single, best guesses about
    the value of the dependent variable
  • Interval estimates - a range of values in which
    the dependent variable is likely to occur

7
Multiple regression point estimates
  • Just as with simple regression, we use the data
    to estimate the model parameters (the intercept
    and slope coefficients), and combine these with
    (given) values of the independent variables to
    forecast a value of the dependent variable
  • Example - What level of sales would you predict
    for CSS when advertising level is 13 and price is
    150?
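
A minimal sketch of the point-estimate mechanics, in Python rather than Excel. The data below are made up for illustration (the actual CSS data are on p. 27 of the coursepack and are not reproduced here), and all variable names are assumptions.

```python
import numpy as np

# Hypothetical data, NOT the coursepack's CSS data.
advertising = np.array([8, 10, 12, 14, 16, 9, 11, 15], dtype=float)
price = np.array([160, 175, 140, 155, 145, 150, 170, 165], dtype=float)
sales = np.array([61, 61.5, 90.5, 92, 108.5, 69, 70, 92.5])

# Design matrix: a column of ones for the intercept, then one column per x.
X = np.column_stack([np.ones_like(advertising), advertising, price])

# Least-squares estimates of the intercept and the two slope coefficients.
coef, *_ = np.linalg.lstsq(X, sales, rcond=None)
b0, b1, b2 = coef

# Point estimate: plug the given x values into the fitted equation.
point_estimate = b0 + b1 * 13 + b2 * 150
print(f"sales-hat = {b0:.2f} + {b1:.2f}*advertising + {b2:.2f}*price")
print(f"point estimate at advertising=13, price=150: {point_estimate:.2f}")
```

With the real CSS data the coefficients and the forecast will of course differ; only the plug-in step carries over.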

8
Multiple regression interval estimates
  • Just as with simple regression, we build an
    interval centered on the point estimate of y
  • Approximate formulas for these interval estimates
    are on p. 32 of the coursepack

9
Statistical analyses with the multiple regression
model
  • Testing the model itself
  • A test for the overall model, i.e., testing the
    entire collection of independent variables for
    usefulness in predicting the dependent variable
  • Tests for the usefulness of individual
    independent variables

10
Testing the overall model in multiple regression
  • This is a new test, i.e., one we did not discuss
    for simple regression (but it works there as
    well)
  • There are three equivalent ways to express the
    hypotheses we will be testing

11
Testing the overall model, cont.
  • or
  • or
  • H0: The collection of x's does not help to
    predict y
  • Ha: The collection of x's does help to predict
    y
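
For reference, the overall test is commonly stated in terms of the coefficients. The notation here is assumed rather than taken from the slide: the \beta_j are the population slope coefficients and k is the number of independent variables.

```latex
H_0:\ \beta_1 = \beta_2 = \cdots = \beta_k = 0
\qquad\text{vs.}\qquad
H_a:\ \text{at least one } \beta_j \neq 0
```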

12
Testing the overall model, cont.
  • The statistic we use to conduct these hypothesis
    tests is the F statistic in the ANOVA box of
    Excel's Regression output
  • Note in passing - the sampling distribution of
    this statistic is an F distribution. We will
    study this distribution later in the course

13
Testing the overall model, cont.
  • For the moment, the p-value for the test we want
    to conduct is the Significance F value in the
    Excel output
  • Small Significance F values imply that the
    collection of independent variables does help to
    predict the dependent variable
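
As a rough sketch of where the ANOVA box's F comes from, the function below computes it as mean square regression over mean square residual. The data and names are hypothetical. Turning F into the Significance F value requires the upper-tail area of an F distribution, which this NumPy-only sketch does not compute.

```python
import numpy as np

def overall_f(x_columns, y):
    """Return (F, df_regression, df_residual) for the overall-model test."""
    y = np.asarray(y, dtype=float)
    xs = np.column_stack(x_columns).astype(float)
    n, k = xs.shape
    X = np.column_stack([np.ones(n), xs])        # add the intercept column
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coef
    sse = np.sum((y - fitted) ** 2)              # residual sum of squares
    ssr = np.sum((fitted - y.mean()) ** 2)       # regression sum of squares
    f_stat = (ssr / k) / (sse / (n - k - 1))     # MS regression / MS residual
    return f_stat, k, n - k - 1

# Hypothetical data, not from the coursepack.
advertising = np.array([8, 10, 12, 14, 16, 9, 11, 15], dtype=float)
price = np.array([160, 175, 140, 155, 145, 150, 170, 165], dtype=float)
sales = np.array([61, 61.5, 90.5, 92, 108.5, 69, 70, 92.5])

f_stat, df1, df2 = overall_f([advertising, price], sales)
print(f"F = {f_stat:.1f} with ({df1}, {df2}) d.f.")
```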

14
Testing the usefulness of individual x's
  • Again we will be testing the following hypotheses

15
Testing the individual x's, cont.
  • These tests will be conducted using Excel's
    p-values contained in the bottom box of the
    Regression output
  • As in simple regression:
  • Low p-value means the variable is useful in
    helping to predict y
  • High p-value means the variable is not useful in
    helping to predict y
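
Each individual test is usually written H0: βj = 0 against Ha: βj ≠ 0 (assumed notation). The sketch below computes the coefficient, standard error and t statistic for each term, using hypothetical data. The p-value for each row is the two-tailed area beyond |t| in a t distribution with n - (k + 1) d.f.; that last step is left out to keep the example NumPy-only.

```python
import numpy as np

# Hypothetical data, not from the coursepack.
advertising = np.array([8, 10, 12, 14, 16, 9, 11, 15], dtype=float)
price = np.array([160, 175, 140, 155, 145, 150, 170, 165], dtype=float)
sales = np.array([61, 61.5, 90.5, 92, 108.5, 69, 70, 92.5])

X = np.column_stack([np.ones(len(sales)), advertising, price])
n, p = X.shape                        # p = k + 1 (intercept plus k slopes)

coef, *_ = np.linalg.lstsq(X, sales, rcond=None)
residuals = sales - X @ coef
s2 = residuals @ residuals / (n - p)  # estimate of the error variance

# Standard errors are the square roots of the diagonal of s^2 (X'X)^-1.
std_err = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
t_stats = coef / std_err

for name, b, se, t in zip(["intercept", "advertising", "price"],
                          coef, std_err, t_stats):
    print(f"{name:12s} coef={b:8.3f}  se={se:6.3f}  t={t:7.2f}")
```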

16
Potential pitfalls of regression
  • Strong relationships between the independent
    variables (multicollinearity)
  • Predicting outside the range of values of the
    independent variables
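
One simple screen for both pitfalls, sketched with hypothetical data: a pairwise correlation between the x's, and a check that a forecast input lies inside the observed range. The helper name `inside_observed_range` is made up for this example. Checking each variable separately will not catch unusual combinations of values, so treat it as a first pass only.

```python
import numpy as np

# Hypothetical independent variables that move almost in lockstep.
advertising = np.array([8, 10, 12, 14, 16, 9, 11, 15], dtype=float)
price = np.array([181, 169, 160, 151, 139, 175, 166, 144], dtype=float)

# A correlation near +1 or -1 between two x's signals multicollinearity.
r = np.corrcoef(advertising, price)[0, 1]
print(f"correlation between advertising and price: {r:.3f}")

def inside_observed_range(value, observed):
    """True when a forecast input lies within the range seen in the data."""
    return bool(observed.min() <= value <= observed.max())

print(inside_observed_range(13, advertising))  # 13 is between 8 and 16
print(inside_observed_range(25, advertising))  # 25 would be extrapolation
```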

17
Checking the assumptions
  • We will show how to generate and use two graphs:
  • The scatter diagram of the residuals vs. an
    independent variable
  • The Normal probability plot of the residual
    values
  • These graphs are used to check three assumptions:
  • Constant scatter of the residuals
    (homoskedasticity)
  • Linearity of the data
  • Normality of the residuals

18
Checking the assumptions, cont.
  • All of these checks employ "art appreciation"
  • Check for constant scatter and linearity using
    the scatter diagrams of the residuals vs. the
    independent variables
  • In Excel, check the "Residual Plots" option in
    the Regression dialog box

19
Interpretation of the scatter diagrams
  • Constant scatter is not met if the residuals have
    different amounts of variation at different
    values of x - e.g., butterfly or fan shapes
  • Linearity is not met if the residuals show a
    curved pattern as x varies

20
Checking the assumptions, cont.
  • Create the Normal probability plot of the
    residuals to check for Normality of the residuals
  • To do this in Excel, follow the procedure given
    in the "Doing Regression Residual Analysis in
    Excel" section of the coursepack

21
Interpretation of the Normal probability plot
  • The residuals are Normally distributed if the
    points lie roughly in a straight-line pattern
    (along the reference line)
  • The residuals are not Normally distributed if the
    points are curved relative to the reference line
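
A sketch of how the points on a Normal probability plot can be computed: sorted residuals paired with theoretical Normal scores. The plotting positions (i - 0.5)/n are one common convention and may differ from the coursepack's Excel procedure. The residual values are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def normal_plot_points(residuals):
    """Return (theoretical Normal scores, sorted residuals)."""
    r = np.sort(np.asarray(residuals, dtype=float))
    n = len(r)
    # Plotting positions (i - 0.5)/n; other conventions exist.
    probs = (np.arange(1, n + 1) - 0.5) / n
    scores = np.array([NormalDist().inv_cdf(p) for p in probs])
    return scores, r

# Hypothetical residuals from some fitted regression.
residuals = [-1.2, 0.4, 0.9, -0.3, 1.6, -0.8, 0.1, -0.5, 0.7, -1.9]
scores, sorted_resid = normal_plot_points(residuals)

# A correlation close to 1 means the points lie roughly on a straight line.
straightness = np.corrcoef(scores, sorted_resid)[0, 1]
print(f"correlation of residuals with Normal scores: {straightness:.3f}")
```

Plotting `sorted_resid` against `scores` gives the graph itself. The correlation is only a numeric companion to the visual check, not a replacement for it.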

22
Introducing qualitative variables into regression
  • Basic idea - use a dummy variable, i.e., one
    that has only two values, 0 and 1
  • Example - exploration of potential salary bias in
    the "Illustrating Dummy Variables" section of the
    coursepack (you may have seen these data before!)

23
Interpretation of the dummy variable model
  • The original model
  • if x2 is a dummy variable defined as

24
Interpretation of the dummy variable model, cont.
  • can be rewritten as
  • which represents a pair of parallel models, with
    b2 representing the change for men relative to
    women
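
One standard way to write the pair of parallel models is sketched below. It assumes x_1 is a quantitative predictor and that x_2 is coded 1 for men and 0 for women, which is the coding implied by "b2 representing the change for men relative to women". The rest of the notation is assumed.

```latex
\hat{y} = b_0 + b_1 x_1 + b_2 x_2,
\qquad
x_2 = \begin{cases} 1 & \text{if male} \\ 0 & \text{if female} \end{cases}

\begin{aligned}
\text{Women } (x_2 = 0): \quad & \hat{y} = b_0 + b_1 x_1 \\
\text{Men } (x_2 = 1):   \quad & \hat{y} = (b_0 + b_2) + b_1 x_1
\end{aligned}
```

Both lines have slope b_1, so they are parallel; b_2 is the vertical gap between them.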

25
Interpretation of the dummy variable model, cont.
  • Any dummy variable model has a base or
    reference case (determined by setting all the
    dummy variables to 0)
  • All dummy variable coefficients are interpreted
    as changes relative to the base case

26
Building a model in which both slope and
intercept change
  • Key - add an interaction term to the model
  • and b3 is the change in slope relative to the
    base case
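
A sketch of the usual interaction model, in assumed notation with x_1 quantitative and x_2 a 0/1 dummy variable, showing why b_3 is the change in slope:

```latex
\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_1 x_2

\begin{aligned}
\text{Base case } (x_2 = 0): \quad & \hat{y} = b_0 + b_1 x_1 \\
\text{Other case } (x_2 = 1): \quad & \hat{y} = (b_0 + b_2) + (b_1 + b_3)\, x_1
\end{aligned}
```

Here b_2 shifts the intercept and b_3 shifts the slope, both relative to the base case.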

27
Adding qualitative variables with more than two
values
  • Create a dummy variable for each value of the
    qualitative variable
  • Make sure you leave at least one of the dummy
    variables out of the model when you run it using
    Excel
  • Example - the weight loss data analyzed in the
    "Qualitative / Quantitative Interactions" section
    of the packet
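
A small sketch of why one dummy variable must be left out. The category labels and the helper `dummy_columns` are hypothetical. With an intercept in the model, the full set of dummy columns adds up to the intercept column, so the coefficients cannot be estimated uniquely.

```python
import numpy as np

def dummy_columns(labels, base):
    """One 0/1 column per category except `base`, the reference case."""
    labels = np.asarray(labels)
    kept = [c for c in sorted(set(labels.tolist())) if c != base]
    columns = np.column_stack([(labels == c).astype(float) for c in kept])
    return kept, columns

# Hypothetical qualitative variable with three values.
program = ["A", "B", "C", "A", "B", "C", "A", "B"]
ones = np.ones(len(program))

# Leaving "A" out makes it the base case; the design matrix has full rank.
kept, D = dummy_columns(program, base="A")
X_ok = np.column_stack([ones, D])

# Including all three dummies duplicates the intercept's information.
all_three = np.column_stack(
    [(np.asarray(program) == c).astype(float) for c in ["A", "B", "C"]]
)
X_bad = np.column_stack([ones, all_three])

print(kept, X_ok.shape[1], np.linalg.matrix_rank(X_ok))
print(X_bad.shape[1], np.linalg.matrix_rank(X_bad))
```

The coefficients on the kept dummies are then read as changes relative to the omitted category.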

28
A statistical test to compare two regression
models
  • Basic idea - compare reduced and complete
    models

(Reduced)
(Complete)
29
Comparing two models, cont.
  • Important - every variable in the reduced model
    must also be in the complete model
  • Calculate the comparison statistic using numbers
    from both regression outputs

30
Comparing two models, cont.
  • A confusion - there are two ways to calculate the
    value of the statistic
  • The book's method

31
Comparing two models, cont.
  • And the method shown in the packet
  • these will always give the same answer!

32
Comparing two models, cont.
  • This F statistic has an F sampling
    distribution with k - g and n - (k + 1) d.f.
  • The rejection region is in the upper-tail only
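
The two formulas themselves are not shown in this transcript. As an illustration of why two routes can agree, the sketch below computes the comparison statistic once from the error sums of squares (SSE) and once from the R-squared values. Which form the book uses and which the packet uses is not known here. The data are hypothetical; g and k are the numbers of x's in the reduced and complete models.

```python
import numpy as np

def fit_sse_r2(x_columns, y):
    """Least-squares fit with an intercept; return (SSE, R^2)."""
    X = np.column_stack([np.ones(len(y))] + list(x_columns))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = np.sum((y - X @ coef) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return sse, 1 - sse / sst

# Hypothetical data: n = 10 observations, three candidate x's.
x1 = np.arange(1, 11, dtype=float)
x2 = np.array([2, 1, 4, 3, 6, 5, 8, 7, 10, 9], dtype=float)
x3 = np.array([5, 3, 8, 1, 9, 2, 7, 4, 6, 0], dtype=float)
y = np.array([12, 11, 19, 15, 27, 22, 33, 29, 40, 35], dtype=float)

n, g, k = len(y), 1, 3                  # reduced: x1; complete: x1, x2, x3
sse_r, r2_r = fit_sse_r2([x1], y)
sse_c, r2_c = fit_sse_r2([x1, x2, x3], y)

df1, df2 = k - g, n - (k + 1)
f_from_sse = ((sse_r - sse_c) / df1) / (sse_c / df2)
f_from_r2 = ((r2_c - r2_r) / df1) / ((1 - r2_c) / df2)
print(f_from_sse, f_from_r2, (df1, df2))
```

The two values agree because SSE = SST(1 - R^2) and both models share the same y, so SST cancels from the ratio.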

33
Fitting curved models to data
  • If we use a polynomial model
  • we can still estimate the parameters of the model
    using regression

34
Fitting curved models, cont.
  • In Excel, a column for each power of the values
    of the independent variable must be created
  • Example - the chicken feed supplement problem in
    the "An Example of a One Variable, Second Order
    Model" section of the coursepack
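
A minimal sketch of the "one column per power" idea for a second-order model, using made-up numbers rather than the coursepack's chicken feed data. The model is still linear in its coefficients, which is why ordinary regression can fit it.

```python
import numpy as np

# Hypothetical data: supplement amount x and response y.
x = np.array([0, 1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([2.1, 4.4, 6.1, 6.5, 5.9, 4.6, 1.9])

# One column per power of x, as they would sit in neighboring Excel columns.
X = np.column_stack([np.ones_like(x), x, x ** 2])

# Ordinary least squares on the expanded columns.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = coef
print(f"y-hat = {b0:.2f} + {b1:.2f}*x + {b2:.2f}*x^2")
```

A negative coefficient on the squared term, as in these made-up data, means the fitted curve rises and then falls.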