Model Building, Estimation - PowerPoint PPT Presentation

Slides: 34
Provided by: hrcla

Transcript and Presenter's Notes

1
Model Building, Estimation and Prediction
Topics:
  • Motivational Example
  • Multicollinearity (Redundancy)
  • Avoiding Multicollinearity
  • Stepwise Procedure
  • Some Caveats
  • Estimation and Prediction with Regression Models
2
Problem Scenario
  • A real estate agent wanted to develop a model to
    predict the selling price of a home. The agent
    believed that the most important variables in
    determining the price of a house are its size,
    number of bedrooms, and lot size. Accordingly, he
    gathered relevant data on a random sample of 100
    recently sold homes.

3
Problem Scenario: Regression Model
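The regression output shown on this slide did not survive in the transcript. Given the three predictors named in the scenario, the fitted model presumably has the usual multiple regression form. The coefficient symbols and the error term below are standard notation assumed here, and the variable names H_Size, Bedrooms and Lot_Sz are borrowed from the later "Avoidance Example" slide.

```latex
\text{Price} = \beta_0 + \beta_1\,\text{H\_Size} + \beta_2\,\text{Bedrooms} + \beta_3\,\text{Lot\_Sz} + \varepsilon
```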
4
Contradictory Conclusions from Regression Output
  • Global F-test is highly significant: at least one
    explanatory variable is useful in predicting
    home price
  • Yet all t-tests indicate that none of the
    explanatory variables has a significant marginal
    contribution
  • Estimated home price decreases as lot size
    increases, which is counterintuitive

5
What's Going On Here?
High correlation among X-variables leads to
redundancy
6
Multicollinearity
  • Exists when any X-variable can be expressed
    (exactly or nearly) as a linear combination of
    the other X-variables; this is the root cause
  • E.g. X1 = 0.9 X2, or X3 = 0.4 X2 + 0.3 X1
  • Another symptom is unstable coefficient
    estimates from one sample to the next, i.e. high
    variance of b_i
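The instability symptom can be illustrated with a small simulation. This is a minimal sketch on made-up data, not taken from the presentation: the true coefficients (1, 2, 3), the noise levels, the sample size and the helper names fit_two_predictors and spread_of_b1 are all assumptions chosen for illustration. It refits the same two-predictor model on many random samples and compares how much the estimate b1 varies when X1 is nearly 0.9 X2 versus when X1 is unrelated to X2.

```python
import random
import statistics


def fit_two_predictors(x1, x2, y):
    """Least-squares slopes (b1, b2) for y = b0 + b1*x1 + b2*x2."""
    m1, m2, my = statistics.fmean(x1), statistics.fmean(x2), statistics.fmean(y)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2  # close to zero when x1 and x2 are collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


def spread_of_b1(collinear, samples=200, n=30, seed=1):
    """Standard deviation of b1 across repeated simulated samples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(samples):
        x2 = [rng.gauss(0, 1) for _ in range(n)]
        if collinear:
            # X1 is almost exactly 0.9 * X2 (hypothetical redundancy)
            x1 = [0.9 * v + rng.gauss(0, 0.05) for v in x2]
        else:
            x1 = [rng.gauss(0, 1) for _ in range(n)]
        y = [1.0 + 2.0 * a + 3.0 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
        estimates.append(fit_two_predictors(x1, x2, y)[0])
    return statistics.stdev(estimates)


sd_collinear = spread_of_b1(collinear=True)
sd_independent = spread_of_b1(collinear=False)
print(f"std. dev. of b1 when X1 is nearly 0.9*X2: {sd_collinear:.2f}")
print(f"std. dev. of b1 when X1 is unrelated:     {sd_independent:.2f}")
```

Under these assumptions the spread of b1 should come out many times larger in the collinear case, which is the "high variance of b_i" the slide refers to.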

7
Implications of Multicollinearity
  • Interpretation of regression coefficients is not
    meaningful in the presence of multiC
  • Is often a problem in models that contain
    interaction terms and quadratic terms
  • E.g. X1, X2, X1*X2, X1^2, X2^2
  • Does not affect predictions
  • A significant F-test with one or more significant
    t-tests does NOT imply multiC

8
Avoiding Multicollinearity
  • If chief purpose of the model is to predict Y
    with no interest in interpreting relationships,
    ignore multiC
  • Inspect correlation matrix and choose X variables
    that have highest correlation with Y but low
    correlation with other Xs
  • Eliminate insignificant X variables from model in
    stepwise fashion
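The correlation-matrix screening described above can be sketched as follows. This is one possible reading of the rule, not the presenter's procedure: the greedy strategy, the 0.7 cutoff for "low correlation", the function names and the toy numbers (house size in hundreds of square feet, lot size in thousands, price in thousands of dollars) are all hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def screen_predictors(y, predictors, max_r=0.7):
    """Rank X variables by |correlation with y|, then keep each one only if
    its |correlation| with every already-kept X is at most max_r."""
    ranked = sorted(
        predictors,
        key=lambda name: abs(pearson(predictors[name], y)),
        reverse=True,
    )
    kept = []
    for name in ranked:
        if all(abs(pearson(predictors[name], predictors[k])) <= max_r for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy data, made up for illustration only
data = {
    "H_Size": [12, 15, 18, 21, 24, 27],
    "Bedrooms": [2, 2, 3, 3, 4, 4],
    "Lot_Sz": [5, 7, 6, 9, 9, 12],
}
price = [121, 149, 180, 210, 239, 271]

kept = screen_predictors(price, data)
print(kept)  # ['H_Size']
```

In this toy data all three predictors are strongly correlated with one another, so only the one most correlated with price is retained.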

9
Avoidance Example
  • H_Size has highest correlation with Price
  • H_Size also highly correlated with Lot_Sz and
    Bedrooms
  • Choose 1-variable model with H_Size only

10
Manual Stepwise Regression Review
  1. Include all reasonable potential X variables in
     the model
  2. If all |t-stat| > 2 or all p-values < .05, stop
     and accept the model
  3. Else, drop the X variable with the lowest
     |t-stat| or highest p-value and rerun the
     regression
  4. Go back to step 2 and continue until all
     |t-stat| > 2 or all p-values < .05
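The manual procedure above can be sketched in code. This is a minimal illustration under stated assumptions, not the StatTools implementation: it uses only the |t-stat| > 2 rule (no p-values), fits ordinary least squares with a hand-written matrix inverse, and runs on contrived toy data in which x2 and x3 have only tiny true effects. The function names invert, t_stats and backward_eliminate are hypothetical.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def t_stats(columns, y):
    """Fit y on an intercept plus the named columns; return {name: t-stat}."""
    names = list(columns)
    rows = [[1.0] + [columns[k][i] for k in names] for i in range(len(y))]
    p = len(names) + 1
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    sse = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))
    s2 = sse / (len(y) - p)  # squared standard error of estimate
    return {k: beta[j + 1] / math.sqrt(s2 * inv[j + 1][j + 1]) for j, k in enumerate(names)}


def backward_eliminate(columns, y, cutoff=2.0):
    """Drop the X with the smallest |t-stat| until every |t-stat| exceeds cutoff."""
    kept = dict(columns)
    while kept:
        stats = t_stats(kept, y)
        weakest = min(stats, key=lambda k: abs(stats[k]))
        if abs(stats[weakest]) > cutoff:
            break  # all remaining variables pass: accept the model
        del kept[weakest]  # drop the weakest variable and rerun
    return list(kept)


# Contrived toy data: y depends strongly on x1, barely on x2 and x3
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [1, -1, -1, 1, 1, -1, -1, 1]
x3 = [1, 1, -1, -1, -1, -1, 1, 1]
noise = [0.5, -0.5, 0.5, -0.5, -0.5, 0.5, -0.5, 0.5]
y = [2 + 3 * a + 0.1 * b + 0.05 * c + e for a, b, c, e in zip(x1, x2, x3, noise)]

columns = {"x1": x1, "x2": x2, "x3": x3}
selected = backward_eliminate(columns, y)
print(selected)  # x3 is dropped first, then x2; x1 stays
```

A p-value rule would need a t-distribution, which the Python standard library does not provide, so the sketch sticks to the |t-stat| > 2 rule of thumb from the slide.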

11
Automatic Stepwise Regression
  • Is a search process that adds or deletes
    variables at each step until no changes can
    improve the model
  • Three variants available
  • Forward
  • Backward
  • General

12
Variants of Stepwise Regression
  • Forward procedure: begins with no explanatory
    variables in the model and successively adds one
    at a time until no new explanatory variable makes
    a significant contribution
  • Typical criterion to enter: p-value < .05

13
Variants of Stepwise Regression
  • Backward procedure: begins with all potential
    explanatory variables in the model and deletes
    them one at a time until further deletion would
    do more harm than good
  • Typical criterion to leave: p-value > .05

14
Variants of Stepwise Regression
  • General procedure: much like the Forward variant,
    but a variable that enters the model could be
    deleted in a later step
  • Typical criterion to enter: p-value < .05
  • Typical criterion to leave: p-value > .10

15
Application of General Stepwise: Predicting
Monthly Rent Payment
16
Caveats with Stepwise
  • Do NOT use automatic procedure mindlessly. Can
    get nonsensical models
  • Does NOT guarantee best model
  • Regard number of X variables given in final model
    as a guide to how many should be your target.
    Check for glaring omissions or inclusions

17
Running Stepwise Regression in StatTools
  • Name the data set in the usual way
  • Place the cursor anywhere in the spreadsheet and
    click on the Regression and Classification icon
    (3rd from right)
  • Select Regression, then arrow down to highlight
    the regression type: forward, backward, or
    (general) stepwise

18
Running Stepwise Regression in StatTools
  • Click to insert a check mark in the box next
    to the Y variable (D) and the X-variables (I)
  • Click to insert a check mark in the box next
    to the advanced option "Include detailed step
    information"
  • Accept the default radio button "Use p-values",
    adjust the criteria to enter or leave (if
    needed), then click OK

19
Prediction Interval for a new Individual
Observation
  • Best point estimate for response variable (Y)
    when explanatory variables take on given values
    is given by plugging given values into final
    regression equation

20
Prediction Interval for a new Individual
Observation
  • Account for sampling variation by expressing
    prediction interval

21
Prediction Interval for a new Individual
Observation
  • t-mult value depends on the level of confidence
  • Typically the 95% level is employed
  • Lower and upper prediction limits available in
    StatTools
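The interval formula from these slides did not survive in the transcript. For reference, the standard textbook prediction interval for a new individual observation is given below; it is offered as the likely form, not as a reconstruction of the slide. The notation is assumed: \hat{Y} is the point prediction, t_mult the t multiplier for the chosen confidence level, s_e the standard error of estimate, X the matrix of explanatory variables (with a leading column of ones), x_0 the vector of given values, and n the sample size.

```latex
\hat{Y} \;\pm\; t_{\text{mult}} \, s_e
  \sqrt{1 + \mathbf{x}_0^{\top}\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{x}_0}
\;\approx\;
\hat{Y} \;\pm\; t_{\text{mult}} \, s_e \sqrt{1 + \tfrac{1}{n}}
```

The approximation on the right holds when the given values are close to the sample means of the explanatory variables; for large n it reduces to roughly \hat{Y} \pm t_mult s_e.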

22
Example Prediction Interval for new Individual
Obs.
  • A particular family not included in the study has
    the following description:
  • Family size: 5; located in NE

23
Example Prediction Interval for new Individual
Obs.
  • Rents home; first wage earner: $50K
  • Second wage earner: $20K
  • Avg. monthly util.: $200; total debt: $5K

24
StatTools Prediction Interval for new obs.
  • With 95% confidence we predict that a new
    family fitting this description will pay between
    $442 and $1,409 per month in rent

25
Confidence Interval for Mean Obs. with Given
Characteristic
  • Best point estimate for response variable (Y)
    same as for prediction interval given by
    plugging given values into final regression
    equation

26
Confidence Interval for Mean Obs. with Given
Characteristic
  • Sampling variation smaller for estimating mean
    than for predicting individual new observation

27
Confidence Interval formula for mean Obs.
  • Lower and upper confidence limits NOT available
    in StatTools
  • Easy hand calculation in Excel
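The hand calculation itself is not preserved in the transcript. A common shortcut, valid when the given values are near the sample means of the explanatory variables, is Yhat ± t-mult × Se / sqrt(n) for the mean, compared with Yhat ± t-mult × Se × sqrt(1 + 1/n) for an individual. The sketch below assumes that shortcut; the function names, the multiplier 1.96 and the input values are hypothetical and are not the figures from the rent example.

```python
import math


def mean_confidence_interval(y_hat, s_e, n, t_mult=1.96):
    """Approximate CI for the mean response (assumes x0 is near the X means)."""
    half_width = t_mult * s_e / math.sqrt(n)
    return y_hat - half_width, y_hat + half_width


def prediction_interval(y_hat, s_e, n, t_mult=1.96):
    """Approximate PI for one new observation (same assumption about x0)."""
    half_width = t_mult * s_e * math.sqrt(1 + 1 / n)
    return y_hat - half_width, y_hat + half_width


# Hypothetical inputs chosen for easy arithmetic
ci_low, ci_high = mean_confidence_interval(y_hat=1000.0, s_e=200.0, n=400)
pi_low, pi_high = prediction_interval(y_hat=1000.0, s_e=200.0, n=400)
print(f"mean CI:       {ci_low:.1f} to {ci_high:.1f}")
print(f"individual PI: {pi_low:.1f} to {pi_high:.1f}")
```

Both intervals are centered on the same point estimate; the interval for the mean is much narrower because it omits the variation of an individual observation around the mean.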

28
Example Confidence Interval for mean of all Obs.
  • Estimate mean monthly payment for all families
    having the following description:
  • Family size: 5; located in NE

29
Example Confidence Interval for mean of all Obs.
  • Rents home; first wage earner: $50K
  • Second wage earner: $20K
  • Avg. monthly util.: $200; total debt: $5K

30
Excel Calculation of Confidence Interval for mean
obs.
  • From StatTools, Yhat = 925.35

31
Excel Calculation of Confidence Interval for mean
obs.
  • We are 95% confident that families fitting this
    description will pay on average between $905 and
    $947 per month in rent

32
Concluding Remarks about Use of Regression Model
  • Some useful variables were found to predict and
    estimate monthly rent/mortgage payments
  • R^2 of 53% implies there might be other useful
    variables not yet considered

33
Concluding Remarks about Use of Regression Model
  • Wide prediction interval due to large Se
  • Need to investigate assumption violations