MODEL BUILDING - PowerPoint PPT Presentation

Slides: 27
Provided by: JohnL248

Transcript and Presenter's Notes

Title: MODEL BUILDING


1
  • MODEL BUILDING
  • IN
  • REGRESSION MODELS

2
Model Building and Multicollinearity
  • Suppose we have five factors that we feel could
    linearly affect y. If all 5 are included we
    have
  • $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \varepsilon$
  • But while the p-value for the F-test
    (Significance F) might be small, one or more (if
    not all) of the p-values for the individual
    t-tests may be large.
  • Question: Which factors make up the "best"
    model?
  • This is called model building.

3
Model Building
  • There are many approaches to model building
  • Elimination of some (all) of the variables with
    high p-values is one approach
  • Forward stepwise regression builds the model by
    adding one variable at a time.
  • Modified F-tests can be used to test whether a
    certain subset of the variables should be
    included in the model.

4
The Stepwise Regression Approach
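  • $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \varepsilon$
  • Step 1: Run five simple linear regressions
  • $y = \beta_0 + \beta_1 x_1$
  • $y = \beta_0 + \beta_2 x_2$
  • $y = \beta_0 + \beta_3 x_3$
  • $y = \beta_0 + \beta_4 x_4$
  • $y = \beta_0 + \beta_5 x_5$
  • Check the p-values for each
  • Note: for simple linear regression, Significance F
    = p-value for the t-test.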
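
The note above follows from a standard identity, sketched here for reference. With a single predictor, the overall F statistic is the square of the slope's t statistic, and an F distribution with (1, n-2) degrees of freedom is the distribution of a squared t with n-2 degrees of freedom, so the two p-values coincide. The symbols $b_1$ (estimated slope) and $s_{b_1}$ (its standard error) are notation assumed for this sketch; they do not appear in the slides.

```latex
F = \frac{MSR}{MSE} = \left(\frac{b_1}{s_{b_1}}\right)^2 = t^2,
\qquad
P\left(F_{1,\,n-2} > t^2\right) = P\left(\lvert t_{n-2}\rvert > \lvert t\rvert\right)
```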

5
Stepwise Regression
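  • Step 2: Run four 2-variable linear regressions, each
    containing the variable kept from Step 1 (here
    assumed to be x4)
  • Check Significance F and the p-values for
  • $y = \beta_0 + \beta_4 x_4 + \beta_1 x_1$
  • $y = \beta_0 + \beta_4 x_4 + \beta_2 x_2$
  • $y = \beta_0 + \beta_4 x_4 + \beta_3 x_3$
  • $y = \beta_0 + \beta_4 x_4 + \beta_5 x_5$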

6
Stepwise Regression
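  • Step 3: Run three 3-variable linear regressions
    (here x3 is assumed to have been added in Step 2)
  • $y = \beta_0 + \beta_3 x_3 + \beta_4 x_4 + \beta_1 x_1$
  • $y = \beta_0 + \beta_3 x_3 + \beta_4 x_4 + \beta_2 x_2$
  • $y = \beta_0 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5$
  • Suppose none of these models has all p-values
    $< \alpha$: STOP. The best model is the one with
    x3 and x4 only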
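
The forward stepwise procedure described on these slides can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the presenter's Excel workflow: the data are synthetic (y is built from x4 and x3 plus noise), the helper names `t_two_sided_p`, `slope_p_values`, and `forward_stepwise` are hypothetical, and when several candidate models qualify, the one whose newly added variable has the lowest p-value is kept. It uses only the standard library, so least squares is solved by Gauss-Jordan elimination and the t-test p-value by numerical integration; a statistics package would normally do both.

```python
import math
import random

def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value of a t statistic (Simpson's rule after x = sqrt(df)*tan(theta))."""
    theta0 = math.atan(abs(t) / math.sqrt(df))
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    def f(theta):
        return math.cos(theta) ** (df - 1)
    h = (math.pi / 2 - theta0) / steps
    area = f(theta0) + f(math.pi / 2)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * f(theta0 + i * h)
    return 2 * const * area * h / 3

def slope_p_values(columns, y):
    """Fit y on an intercept plus the given x columns; return each slope's p-value."""
    n, k = len(y), len(columns) + 1
    X = [[1.0] + [col[r] for col in columns] for r in range(n)]
    # Augmented matrix [X'X | I]; Gauss-Jordan turns it into [I | (X'X)^-1].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [float(i == j) for j in range(k)] for i in range(k)]
    for i in range(k):
        pivot = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[pivot] = A[pivot], A[i]
        d = A[i][i]
        A[i] = [v / d for v in A[i]]
        for r in range(k):
            if r != i:
                factor = A[r][i]
                A[r] = [a - factor * b for a, b in zip(A[r], A[i])]
    inv = [row[k:] for row in A]
    xty = [sum(row[i] * yr for row, yr in zip(X, y)) for i in range(k)]
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    sse = sum((yr - sum(b * x for b, x in zip(beta, row))) ** 2 for row, yr in zip(X, y))
    mse = sse / (n - k)  # error degrees of freedom: n - (number of x's) - 1
    return [t_two_sided_p(beta[j] / math.sqrt(mse * inv[j][j]), n - k) for j in range(1, k)]

def forward_stepwise(data, y, alpha=0.05):
    """data maps variable name -> values. Returns variable names in the order added."""
    selected = []
    while len(selected) < len(data):
        best = None
        for name in data:
            if name in selected:
                continue
            p = slope_p_values([data[v] for v in selected + [name]], y)
            # A candidate model qualifies only if all of its p-values are below alpha.
            if max(p) < alpha and (best is None or p[-1] < best[0]):
                best = (p[-1], name)
        if best is None:
            break  # STOP: no larger model has all p-values below alpha
        selected.append(best[1])
    return selected

# Synthetic data for illustration only: y depends on x4 and x3.
rng = random.Random(0)
n = 40
data = {f"x{j}": [rng.gauss(0, 1) for _ in range(n)] for j in range(1, 6)}
y = [3 * a + 2 * b + rng.gauss(0, 0.5) for a, b in zip(data["x4"], data["x3"])]
print(forward_stepwise(data, y))
```

Because each step only adds a variable, this sketch never removes one that later becomes insignificant; that matches the forward procedure outlined above.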

7
Example
8
Regression on 5 Variables

9
Summary of Results from 1-Variable Tests

10
Performing Tests With More Than One Variable
  • Remember the Range for X must be contiguous
  • Use CUT and INSERT CUT CELLS to arrange the X
    columns so that they are next to each other

11
Summary of Results from 2-Variable Tests

12
Summary of Results from 3-Variable Tests

13
Summary of Results from 4-Variable Tests

14
Best Model
  • The best model is the three-variable model that
    includes x1, x4, and x5.

15
TESTING PARTS OF THE MODEL
  • Sometimes we wish to see whether to keep a set of
    variables as a group or eliminate them from the
    model.
  • Example: A model might include 3 dummy variables
    to account for how the dependent variable is
    affected by a particular season (or quarter) of
    the year.
  • Will either keep all seasons or will keep none
  • The general approach is to assess how much extra
    value these additional variables will add to the
    model.
  • Approach is a Modified F-test
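
As a small illustration of the seasonal example, four quarters can be encoded with three 0/1 dummy variables, with the omitted quarter acting as the baseline. The sketch below is hypothetical: the function name `quarter_dummies`, the choice of quarter 4 as the baseline, and the sample observations are all assumptions made for the example.

```python
def quarter_dummies(quarters, baseline=4):
    """Encode quarters 1-4 as three 0/1 columns, omitting the baseline quarter."""
    levels = [q for q in (1, 2, 3, 4) if q != baseline]
    return {f"Q{q}": [1 if obs == q else 0 for obs in quarters] for q in levels}

observed = [1, 2, 3, 4, 1, 2]  # hypothetical quarter of each observation
dummies = quarter_dummies(observed)
for name, column in dummies.items():
    print(name, column)
```

The three columns enter or leave the model together, which is why they are tested as a group rather than one t-test at a time.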

16
Approach: Compare Two Models, the Full Model
and the Reduced Model
  • Suppose a model consists of p variables and we
    wish to consider whether or not to keep a set of
    p-q of those p variables in the model.
  • Two models:
  • Full model: p variables
  • Reduced model: q variables
  • For notational convenience, assume the last p-q
    of the p variables are the ones that would be
    eliminated.
  • Sample of size n is taken

17
The Modified F-Test
  • Modified F-Test
  • $H_0$: $\beta_{q+1} = \beta_{q+2} = \cdots = \beta_p = 0$
  • $H_A$: At least one of these $p-q$ $\beta$'s $\neq 0$
  • This is an F-test of the form
  • Reject $H_0$ (Accept $H_A$) if $F > F_{\alpha,\,p-q,\,n-p-1}$

18
The Modified F-Statistic
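  • For this model, the F-statistic is defined by
    $F = \dfrac{(SSE_{Reduced} - SSE_{Full})/(p-q)}{MSE_{Full}}$,
    where $MSE_{Full} = SSE_{Full}/(n-p-1)$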
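
The modified F-statistic, as summarized on the review slide, is the drop in SSE per removed term divided by the MSE of the full model. A minimal Python sketch of that arithmetic follows. The function name `modified_f` is hypothetical and the SSE values are made-up numbers chosen for easy checking; only n = 38, p = 5, and q = 3 come from the housing example in these slides.

```python
def modified_f(sse_reduced, sse_full, n, p, q):
    """Modified (partial) F statistic for dropping the last p - q variables."""
    terms_removed = p - q            # numerator degrees of freedom
    dfe_full = n - p - 1             # error degrees of freedom of the full model
    mse_full = sse_full / dfe_full
    f_stat = ((sse_reduced - sse_full) / terms_removed) / mse_full
    return f_stat, terms_removed, dfe_full

# Hypothetical sums of squared errors, for illustration only (not from the slides)
f_stat, df1, df2 = modified_f(sse_reduced=500.0, sse_full=320.0, n=38, p=5, q=3)
print(f_stat, df1, df2)  # 9.0 2 32
```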

19
Example
  • A housing price model (full model) is proposed
    for homes in Laguna Hills that takes into account
    $p = 5$ factors:
  • House Size, Lot Size, Age, whether or not there
    is a Pool, Bedrooms
  • A reduced model that takes into account only the
    first three of these ($q = 3$) was discussed earlier.
  • Based on a sample of $n = 38$ sales, can we
    conclude that adding the $p - q = 2$ additional
    variables (Pool, Bedrooms) is significant?

20
(No Transcript)
21
The Modified F-Test For This Example
  • Modified F-Test
  • $H_0$: $\beta_4 = \beta_5 = 0$
  • $H_A$: At least one of $\beta_4$ and $\beta_5 \neq 0$
  • For $\alpha = .05$, the test is
  • Reject $H_0$ (Accept $H_A$) if $F > F_{.05,2,32}$
  • $F_{.05,2,32}$ can be generated in Excel by
    FINV(.05,2,32) = 3.29.
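
Outside Excel, this particular critical value can be checked by hand, because an F distribution with 2 numerator degrees of freedom has a closed-form upper tail: P(F > x) = (1 + 2x/df2)^(-df2/2). Solving for x gives the sketch below. The function name `f_critical_df1_2` is hypothetical, and the shortcut applies only when exactly two variables are being tested; other cases need a general inverse F function such as FINV.

```python
def f_critical_df1_2(alpha, df2):
    """Upper-alpha critical value of F with 2 numerator df (closed form)."""
    return (df2 / 2.0) * (alpha ** (-2.0 / df2) - 1.0)

crit = f_critical_df1_2(0.05, 32)
print(round(crit, 2))  # 3.29
```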

22
Full Model
23
Reduced Model
24
The Partial F-Test
25
The Modified F-Statistic
  • For this model, the modified F-statistic is $F = 21.43522834$
  • The critical value of F is $F_{.05,2,32} = 3.29453087$
  • $21.43522834 > 3.29453087$
  • There is enough evidence to conclude that
    including Pool and Bedrooms is significant.
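
The decision above can be reproduced with a few lines of Python. This sketch takes the F-statistic and critical value as quoted on the slide, and additionally computes a p-value using the closed-form upper tail that holds when the numerator has 2 degrees of freedom. The function name `f_upper_tail_df1_2` is hypothetical, and the p-value is not reported in the slides.

```python
def f_upper_tail_df1_2(f_stat, df2):
    """P(F > f_stat) for an F distribution with 2 numerator df (closed form)."""
    return (1.0 + 2.0 * f_stat / df2) ** (-df2 / 2.0)

f_stat = 21.43522834    # modified F-statistic quoted on the slide
critical = 3.29453087   # critical value quoted on the slide
p_value = f_upper_tail_df1_2(f_stat, 32)

# Both criteria agree: F exceeds the critical value and the p-value is below .05.
print(f_stat > critical, p_value < 0.05)  # True True
```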

26
Review
  • Stepwise regression helps determine a "best"
    model from a series of possible independent
    variables (x's)
  • Approach:
  • Step 1: Run one-variable regressions
  • If there is a p-value $< \alpha$, keep the variable with
    the lowest p-value as a variable in the model
  • Step 2: Run 2-variable regressions
  • One of the two variables in each model is the one
    determined in Step 1
  • Keep the one with the lowest p-values if both are
    $< \alpha$
  • Repeat with 3, 4, 5 variables, etc. until no
    model has all p-values $< \alpha$
  • Modified F-test for testing the significance of
    parts of the model
  • Compare F to $F_{\alpha,\,p-q,\,DFE(Full)}$, where
  • $F = \dfrac{(SSE_{Reduced} - SSE_{Full})/(\text{terms removed})}{MSE_{Full}}$