Model Building - PowerPoint PPT Presentation

About This Presentation
Title:

Model Building

Description:

Regression analysis is one of the most commonly used techniques in statistics. ... The dealer believes that color is a variable that affects a car's price. ... – PowerPoint PPT presentation

Slides: 27
Provided by: zvigol

Transcript and Presenter's Notes



1
Model Building
  • Chapter 19

2
19.1 Introduction
  • Regression analysis is one of the most commonly
    used techniques in statistics.
  • It is considered powerful for several reasons:
    • It can cover a variety of mathematical models:
      • linear relationships
      • nonlinear relationships
      • qualitative variables
    • It provides efficient methods for model building, that is, for selecting the best-fitting set of variables.

3
19.2 Polynomial Models
  • The independent variables may appear as functions
    of a number of predictor variables.
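  • Polynomial model of order p with one predictor variable:
    $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_p x^p + \varepsilon$
  • Polynomial models with two predictor variables, for example:
    $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$
    $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon$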
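To make the order-p model concrete, here is a minimal sketch, assuming NumPy is available, that builds the design matrix with columns 1, x, x², ..., x^p and estimates the coefficients by least squares. The function names `polynomial_design_matrix` and `fit_polynomial` are hypothetical helpers for illustration, not part of any course software.

```python
import numpy as np

def polynomial_design_matrix(x, p):
    """Columns 1, x, x^2, ..., x^p for a one-predictor polynomial model of order p."""
    x = np.asarray(x, dtype=float)
    # np.vander with increasing=True yields the columns x**0, x**1, ..., x**p
    return np.vander(x, N=p + 1, increasing=True)

def fit_polynomial(x, y, p):
    """Least-squares estimates b0, b1, ..., bp of the order-p polynomial model."""
    X = polynomial_design_matrix(x, p)
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef

# Usage: noise-free data generated from y = 2 + 0.5x - 0.1x^2 (illustrative only)
x = np.arange(10)
y = 2 + 0.5 * x - 0.1 * x**2
b = fit_polynomial(x, y, p=2)
print(np.round(b, 6))
```

Because the model is still linear in the coefficients, the same least-squares machinery used for first order models applies unchanged; only the columns of the design matrix differ.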

4
Polynomial models with one predictor variable
  • First order model ($p = 1$)
    $y = \beta_0 + \beta_1 x + \varepsilon$
  • Second order model ($p = 2$)
    $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$
5
  • Third order model ($p = 3$)
    $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \varepsilon$
6
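Polynomial models with two predictor variables
  • First order model
    $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$
[Figure: the response plane of y over $x_1$ and $x_2$, shown for $\beta_1 > 0$, $\beta_1 < 0$, $\beta_2 > 0$, and $\beta_2 < 0$]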
7
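Polynomial models with two predictor variables
  • First order model
    $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$
    The effect of one predictor variable on y is independent of the effect of the other predictor variable on y: for each fixed value of $x_2$, y is a line in $x_1$ with the same slope $\beta_1$.
    $x_2 = 1$: $y = [\beta_0 + \beta_2(1)] + \beta_1 x_1$
    $x_2 = 2$: $y = [\beta_0 + \beta_2(2)] + \beta_1 x_1$
    $x_2 = 3$: $y = [\beta_0 + \beta_2(3)] + \beta_1 x_1$
  • First order model with interaction, $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon$: the slope in $x_1$ now depends on $x_2$.
    $x_2 = 1$: $y = [\beta_0 + \beta_2(1)] + [\beta_1 + \beta_3(1)] x_1$
    $x_2 = 2$: $y = [\beta_0 + \beta_2(2)] + [\beta_1 + \beta_3(2)] x_1$
    $x_2 = 3$: $y = [\beta_0 + \beta_2(3)] + [\beta_1 + \beta_3(3)] x_1$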
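The difference between the two first order models comes down to one expression: holding $x_2$ fixed, the slope in $x_1$ is $\beta_1 + \beta_3 x_2$, which reduces to $\beta_1$ when there is no interaction ($\beta_3 = 0$). The sketch below evaluates that slope and the intercept $\beta_0 + \beta_2 x_2$ for $x_2 = 1, 2, 3$; the coefficient values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def slope_in_x1(b1, b3, x2):
    """Slope of y with respect to x1 when x2 is held fixed.

    Model: y = b0 + b1*x1 + b2*x2 + b3*x1*x2 + e.
    Setting b3 = 0 gives the first order model without interaction.
    """
    return b1 + b3 * x2

def intercept_at(b0, b2, x2):
    """Intercept of the line in x1 when x2 is held fixed: b0 + b2*x2."""
    return b0 + b2 * x2

# Hypothetical coefficients, for illustration only
b0, b1, b2, b3 = 10.0, 2.0, 1.5, 0.5
for x2 in (1, 2, 3):
    no_interaction = slope_in_x1(b1, 0.0, x2)   # same slope for every x2
    with_interaction = slope_in_x1(b1, b3, x2)  # slope changes with x2
    print(f"x2={x2}: intercept={intercept_at(b0, b2, x2)}, "
          f"slope without interaction={no_interaction}, "
          f"with interaction={with_interaction}")
```

Without interaction the printed slopes are identical (parallel lines); with interaction they grow with $x_2$, which is exactly what "the effect of one predictor depends on the other" means.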
8
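  • Second order model
    $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_2^2 + \varepsilon$
    For each fixed value of $x_2$, y is a parabola in $x_1$; only the intercept changes with $x_2$:
    $x_2 = 1$: $y = [\beta_0 + \beta_2(1) + \beta_4(1^2)] + \beta_1 x_1 + \beta_3 x_1^2 + \varepsilon$
    $x_2 = 2$: $y = [\beta_0 + \beta_2(2) + \beta_4(2^2)] + \beta_1 x_1 + \beta_3 x_1^2 + \varepsilon$
    $x_2 = 3$: $y = [\beta_0 + \beta_2(3) + \beta_4(3^2)] + \beta_1 x_1 + \beta_3 x_1^2 + \varepsilon$
  • Second order model with interaction
    $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_2^2 + \beta_5 x_1 x_2 + \varepsilon$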
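Holding $x_2$ fixed in the second order model with the interaction term $\beta_5 x_1 x_2$ and collecting terms in $x_1$ shows where the bracketed intercepts come from, and how the interaction changes the linear coefficient of $x_1$ (a short derivation, not taken from the slides):

```latex
\begin{aligned}
y &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_2^2 + \beta_5 x_1 x_2 + \varepsilon \\
  &= \underbrace{\left(\beta_0 + \beta_2 x_2 + \beta_4 x_2^2\right)}_{\text{intercept}}
   + \underbrace{\left(\beta_1 + \beta_5 x_2\right)}_{\text{coefficient of } x_1} x_1
   + \beta_3 x_1^2 + \varepsilon
\end{aligned}
```

With $\beta_5 = 0$ and $x_2 = 3$ this reduces to $y = [\beta_0 + \beta_2(3) + \beta_4(3^2)] + \beta_1 x_1 + \beta_3 x_1^2 + \varepsilon$, so the curves are vertical shifts of one another; with $\beta_5 \neq 0$ the location of each parabola's vertex also moves with $x_2$.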
9
Example 19.1 Location for a new restaurant
  • A fast food restaurant chain tries to identify
    new locations that are likely to be profitable.
  • The primary market for such restaurants is
    middle-income adults and their children (between
    the ages of 5 and 12).
  • Which regression model should be proposed to
    predict the profitability of new locations?

10
  • Solution
  • The dependent variable will be Gross Revenue.
  • We expect a quadratic relationship between Revenue and each predictor variable. Why?
    • Members of middle-class families are more likely to visit a fast food restaurant than members of poor or wealthy families.
    • Families with very young or older kids will not visit the restaurant as frequently as families whose kids are in the mid-range of ages.
  • The proposed model:
    $\text{Revenue} = \beta_0 + \beta_1\,\text{Income} + \beta_2\,\text{Age} + \beta_3\,\text{Income}^2 + \beta_4\,\text{Age}^2 + \beta_5\,(\text{Income})(\text{Age}) + \varepsilon$
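The proposed model needs six columns: an intercept, the two predictors, their squares, and their product. A minimal sketch, assuming NumPy, of building that design matrix follows; the function name and the two input rows are hypothetical, and the actual Xm19-02.xls data are not reproduced here.

```python
import numpy as np

def restaurant_design_matrix(income, age):
    """Columns for Revenue = b0 + b1*Income + b2*Age + b3*Income^2
    + b4*Age^2 + b5*Income*Age + e."""
    income = np.asarray(income, dtype=float)
    age = np.asarray(age, dtype=float)
    return np.column_stack([
        np.ones_like(income),  # b0: intercept
        income,                # b1
        age,                   # b2
        income**2,             # b3
        age**2,                # b4
        income * age,          # b5: interaction
    ])

# Two hypothetical areas (mean income, mean age of children)
X = restaurant_design_matrix([20.0, 30.0], [6.0, 10.0])
print(X)
```

Given a revenue vector for the same areas, the coefficients could then be estimated with `np.linalg.lstsq(X, revenue, rcond=None)`.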

11
Example 19.2
  • To verify the validity of the model proposed in
    Example 19.1, 25 areas with fast food restaurants
    were randomly selected.
  • The data collected (see Xm19-02.xls) include:
    • Previous year's annual gross sales
    • Mean annual household income
    • Mean age of children

12
(No Transcript)
13
The model provides a good fit
14
The model can be used to make predictions.
However, do not interpret the coefficients or
test them: multicollinearity is a problem!
To check the correlations among the predictors in Excel: Tools > Data Analysis > Correlation.
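Outside Excel, the same check can be done by computing the correlation matrix of the predictor columns. This sketch, assuming NumPy, flags every pair whose absolute correlation exceeds a threshold; the 0.8 cutoff, the function name, and the sample values are assumptions for illustration, not the slide's data or rule.

```python
import numpy as np

def high_correlation_pairs(columns, names, threshold=0.8):
    """Return predictor pairs whose absolute sample correlation exceeds threshold.

    The 0.8 default is an assumed rule of thumb, not a standard cutoff.
    """
    r = np.corrcoef(np.vstack(columns))  # each row of the stack is one variable
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(r[i, j]) > threshold:
                pairs.append((names[i], names[j], round(float(r[i, j]), 3)))
    return pairs

# Illustrative values only (not the Xm19-02.xls data)
income = np.array([20.0, 21, 22, 23, 24, 25, 26, 27, 28])
age = np.array([6.0, 9, 7, 10, 8, 11, 9, 7, 8])
flagged = high_correlation_pairs([income, age, income**2],
                                 ["Income", "Age", "Income^2"])
print(flagged)
```

A predictor and its own square are typically flagged, which is why the coefficients of this kind of model should not be interpreted one at a time.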
15
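The multicollinearity can be reduced by modifying (centering) the original predictor variables:
$\text{Income}^* = \text{Income} - \overline{\text{Income}}$, $\text{Age}^* = \text{Age} - \overline{\text{Age}}$
Here, $\text{Income}^* = \text{Income} - 24.2$ and $\text{Age}^* = \text{Age} - 8.392$.
The squared and cross-product terms are then computed from the centered variables, for example $(-0.7)^2$ and $(-0.7)(2.11)$.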
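Why centering helps can be seen on a toy example: when all values of a predictor are positive and far from zero, the predictor and its square rise together almost perfectly, while the centered predictor and its square do not. The sketch below, assuming NumPy and illustrative income values symmetric about 24 (not the slide's data), compares the two correlations.

```python
import numpy as np

def corr(a, b):
    """Sample correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

# Illustrative income values 20, 21, ..., 28 (not the Xm19-02.xls data)
income = np.arange(20.0, 29.0)
centered = income - income.mean()  # Income* = Income - average Income

r_raw = corr(income, income**2)           # close to 1: Income and Income^2 move together
r_centered = corr(centered, centered**2)  # 0 here because the centered values are symmetric about 0
print(round(r_raw, 4), round(r_centered, 4))
```

With real data the centered values are rarely perfectly symmetric, so centering usually reduces the correlation rather than removing it entirely.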
16
Regression results of the modified model
Multicollinearity is no longer a problem.
17
19.3 Qualitative Independent Variables
  • In many real-life situations, one or more
    independent variables are qualitative.
  • Qualitative variables are included in a regression
    model via indicator variables.
  • An indicator variable (I) takes one of
    two values, 0 or 1. For example:
    I = 1 if the degree earned is in Finance; I = 0 if the degree earned is not in Finance
    I = 1 if the temperature was below 50°; I = 0 if the temperature was 50° or more
    I = 1 if the first of two conditions is met; I = 0 if the second condition is met
    I = 1 if the data were collected before 1980; I = 0 if the data were collected after 1980
18
Example 17.1 - continued
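  • The dealer believes that color is a variable that
    affects a car's price.
  • Three color categories are considered:
    • White
    • Silver
    • Other colors
  • Note: Color is a qualitative variable.
  • Define $I_1 = 1$ if the color is white (0 otherwise) and $I_2 = 1$ if the color is silver (0 otherwise).

And what about Other colors? Set $I_1 = 0$ and $I_2 = 0$.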
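Three categories need only two indicators: Other colors is the baseline coded by $I_1 = I_2 = 0$, and a third indicator would be redundant because the three would always sum to 1, duplicating the intercept. A minimal sketch of the encoding, using $I_1$ for white and $I_2$ for silver as in this example's price equations; the function name and the string labels are hypothetical.

```python
def color_indicators(color):
    """Return (I1, I2) for a car color; Other colors is the baseline (0, 0).

    I1 = 1 if the car is white, I2 = 1 if the car is silver.
    """
    c = color.strip().lower()
    i1 = 1 if c == "white" else 0
    i2 = 1 if c == "silver" else 0
    return i1, i2

for color in ("White", "Silver", "Red"):
    print(color, color_indicators(color))
```

In general, a qualitative variable with m categories is represented by m - 1 indicator variables.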
19
  • Solution
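  • The proposed model is $y = \beta_0 + \beta_1(\text{Odometer}) + \beta_2 I_1 + \beta_3 I_2 + \varepsilon$
  • The data
[Table of the data, with rows for white cars, silver cars, and cars of other colors]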
20
From Excel we get the regression equation
$\text{Price} = 6350 - .0278(\text{Odometer}) + 45.2 I_1 + 148 I_2$
For each additional mile on the odometer, the auction price decreases by 2.78 cents, on average.
A white car sells, on average, for 45.20 dollars more than a car of the Other color category.
A silver car sells, on average, for 148 dollars more than a car of the Other color category.
The equation for a car of silver color:
$\text{Price} = 6350 - .0278(\text{Odometer}) + 45.2(0) + 148(1) = 6498 - .0278(\text{Odometer})$
The equation for a car of white color:
$\text{Price} = 6350 - .0278(\text{Odometer}) + 45.2(1) + 148(0) = 6395.2 - .0278(\text{Odometer})$
The equation for a car of the Other color category:
$\text{Price} = 6350 - .0278(\text{Odometer}) + 45.2(0) + 148(0) = 6350 - .0278(\text{Odometer})$
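The three equations for silver, white, and Other colors are one formula evaluated with different indicator values. This sketch hard-codes the estimated coefficients from the Excel output (6350, -.0278, 45.2, 148); the function name `predicted_price` and the lowercase color labels are illustrative assumptions.

```python
# Estimated coefficients from the regression output (prices in dollars)
INTERCEPT = 6350.0
B_ODOMETER = -0.0278  # dollars per mile, i.e. -2.78 cents per mile
B_WHITE = 45.2        # coefficient of I1 (white)
B_SILVER = 148.0      # coefficient of I2 (silver)

def predicted_price(odometer, color):
    """Predicted auction price; color is "white", "silver", or anything else (Other)."""
    i1 = 1 if color == "white" else 0
    i2 = 1 if color == "silver" else 0
    return INTERCEPT + B_ODOMETER * odometer + B_WHITE * i1 + B_SILVER * i2

for color in ("silver", "white", "other"):
    print(color, round(predicted_price(10000, color), 2))
```

At any fixed odometer reading, the gap between a white car and an Other-color car is 45.2 and the gap between a silver car and an Other-color car is 148, matching the interpretation of the indicator coefficients.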
21
There is insufficient evidence to infer that a
white car and a car of the Other color category sell
for different auction prices.
There is sufficient evidence to infer that a
silver car sells for a higher price than
a car of the Other color category.
22
19.4 Regression and the Analysis of Variance
  • We can use regression analysis with indicator
    variables to conduct ANOVA.

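  • Single factor, independent samples. Example: compare 3 treatments.
    ANOVA model: $y = \mu_i + \varepsilon \quad (i = 1, 2, 3)$
    Regression model: $y = \beta_0 + \beta_1 I_1 + \beta_2 I_2 + \varepsilon$
  • Single factor, randomized block design. Example: compare 2 treatments, 3 blocks.
    ANOVA model: $y = T_i + B_j + \varepsilon \quad (i = 1, 2 \text{ and } j = 1, 2, 3)$
    Regression model: $y = \beta_0 + \beta_1 I_1 + \beta_2 I_2 + \beta_3 I_3 + \varepsilon$, where $I_1$ is the indicator variable for treatment 1 and $I_2$, $I_3$ are the indicator variables for blocks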
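For the single-factor case, the link between the two models can be checked numerically: with treatment 3 as the baseline, least squares gives $b_0$ equal to the mean of treatment 3 and $b_1$, $b_2$ equal to the differences between the means of treatments 1 and 2 and that baseline. A sketch assuming NumPy, with made-up responses and treatment 3 chosen as the baseline by assumption:

```python
import numpy as np

# Illustrative responses for three treatments (not from the slides)
groups = {
    1: [12.0, 14.0, 13.0],
    2: [18.0, 17.0, 19.0],
    3: [10.0, 11.0, 9.0],
}
y = np.concatenate([groups[g] for g in (1, 2, 3)])
labels = np.repeat([1, 2, 3], [len(groups[g]) for g in (1, 2, 3)])

I1 = (labels == 1).astype(float)  # indicator for treatment 1
I2 = (labels == 2).astype(float)  # indicator for treatment 2; treatment 3 is the baseline
X = np.column_stack([np.ones_like(y), I1, I2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

means = {g: float(np.mean(v)) for g, v in groups.items()}
print(np.round(b, 6), means)
```

The overall F test of this regression is the same test as the one-way ANOVA F test of equal treatment means.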
23
19.5 Stepwise Regression
  • Multicollinearity may prevent the study of the
    relationship between dependent and independent
    variables.
  • To reduce multicollinearity, we can use stepwise
    regression.
  • In stepwise regression, variables are added to or
    deleted from the model one at a time, based on their
    contribution to the current model.
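The "one at a time, based on their contribution" idea can be sketched as forward selection: at each step, add the candidate with the largest partial F statistic, and stop when none exceeds an F-to-enter threshold. This is a simplified sketch, assuming NumPy; it only adds variables (full stepwise regression also re-checks and deletes variables), and the threshold of 4.0, the function names, and the simulated data are assumptions, not the procedure of any particular package.

```python
import numpy as np

def sse(X, y):
    """Residual sum of squares of the least-squares fit of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

def forward_selection(X, y, f_to_enter=4.0):
    """Add one predictor at a time while the best partial F exceeds f_to_enter."""
    n, k = X.shape
    selected = []
    current = np.ones((n, 1))  # start from the intercept-only model
    while len(selected) < k:
        sse_current = sse(current, y)
        best = None
        for j in range(k):
            if j in selected:
                continue
            candidate = np.column_stack([current, X[:, j]])
            sse_new = sse(candidate, y)
            df_resid = n - candidate.shape[1]
            if sse_new <= 0.0:
                f_stat = float("inf")  # perfect fit: treat as maximal contribution
            else:
                f_stat = (sse_current - sse_new) / (sse_new / df_resid)
            if best is None or f_stat > best[1]:
                best = (j, f_stat)
        if best is None or best[1] < f_to_enter:
            break
        selected.append(best[0])
        current = np.column_stack([current, X[:, best[0]]])
    return selected

# Simulated data: y depends on columns 0 and 2 only
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = 3 * X[:, 0] + 2 * X[:, 2] + 0.5 * rng.normal(size=100)
sel = forward_selection(X, y)
print(sel)
```

Statistical packages typically use p-values or paired F-to-enter/F-to-remove thresholds; the fixed 4.0 here only keeps the sketch free of extra dependencies.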

24
19.6 Model Building
  • Identify the dependent variable, and clearly
    define it.
  • List potential predictors.
  • Bear in mind the problem of multicollinearity.
  • Consider the cost of gathering, processing, and
    storing data.
  • Be selective in your choice (try to use as few
    variables as possible).

25
  • Gather the required observations (have at least
    six observations for each independent variable).
  • Identify several possible models.
  • A scatter diagram of the dependent variable against each
    independent variable can be helpful in formulating the right model.
  • If you are uncertain, start with first order and
    second order models, with and without
    interaction.
  • Try other relationships (transformations) if the
    polynomial models fail to provide a good fit.
  • Use statistical software to estimate the model.
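When several candidate models are fitted, one common way to compare them is adjusted R², which penalizes extra predictors. The sketch below, assuming NumPy, compares a first order and a second order model on illustrative data with clear curvature; adjusted R² is one possible criterion chosen here by assumption, and the data and function name are hypothetical.

```python
import numpy as np

def adjusted_r2(X, y):
    """Adjusted R^2 of the least-squares fit; X must include the intercept column."""
    n, p = X.shape  # p counts the intercept column
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sse = float(resid @ resid)
    sst = float(((y - y.mean()) ** 2).sum())
    k = p - 1  # number of predictors, excluding the intercept
    return 1 - (sse / (n - k - 1)) / (sst / (n - 1))

# Illustrative data: curvature plus a small alternating perturbation
x = np.arange(20, dtype=float)
y = 1 + 2 * x - 0.3 * x**2 + 0.1 * (-1) ** np.arange(20)

first_order = np.column_stack([np.ones_like(x), x])
second_order = np.column_stack([np.ones_like(x), x, x**2])
a1 = adjusted_r2(first_order, y)
a2 = adjusted_r2(second_order, y)
print(round(a1, 4), round(a2, 4))
```

A higher adjusted R² is only one input to choosing a model; the required conditions and judgment still apply.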

26
  • Determine whether the required conditions are
    satisfied. If not, attempt to correct the
    problem.
  • Select the best model.
  • Use the statistical output.
  • Use your judgment!!