Title: Model Building
19.1 Introduction
- Regression analysis is one of the most commonly used techniques in statistics.
- It is considered powerful for several reasons:
- It can cover a variety of mathematical models:
  - linear relationships,
  - nonlinear relationships,
  - qualitative variables.
- It provides efficient methods for model building,
to select the best fitting set of variables.
19.2 Polynomial Models
- The independent variables may appear as functions of a number of predictor variables.
- Polynomial model of order p with one predictor variable:
  \( y = b_0 + b_1 x + b_2 x^2 + \dots + b_p x^p + e \)
- Polynomial models with two predictor variables, for example:
  \( y = b_0 + b_1 x_1 + b_2 x_2 + e \)
  \( y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_1 x_2 + e \)
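Polynomial models with one predictor variable
- First order model (p = 1): \( y = b_0 + b_1 x + e \)
- Second order model (p = 2): \( y = b_0 + b_1 x + b_2 x^2 + e \)
- Third order model (p = 3): \( y = b_0 + b_1 x + b_2 x^2 + b_3 x^3 + e \)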
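A polynomial model of order p can be estimated by ordinary least squares once the columns 1, x, x^2, ..., x^p are built. The following is a minimal sketch, assuming NumPy is available; the helper name `fit_polynomial` and the data are made up for illustration and are not part of the original slides.

```python
import numpy as np

def fit_polynomial(x, y, order):
    """Least-squares fit of y = b0 + b1*x + ... + bp*x**p; returns [b0, ..., bp]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix with columns 1, x, x**2, ..., x**order.
    X = np.vander(x, order + 1, increasing=True)
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

# Synthetic, noise-free data generated from y = 1 + 2x + 3x^2 (illustration only).
x = np.arange(-3.0, 4.0)
y = 1 + 2 * x + 3 * x**2
b = fit_polynomial(x, y, order=2)
```

Because the synthetic data contain no error term, the estimates in `b` should match the generating coefficients up to rounding.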
Polynomial models with two predictor variables
- First order model: \( y = b_0 + b_1 x_1 + b_2 x_2 + e \)
[Figure: y plotted against x1 and x2, showing the cases b1 > 0, b1 < 0, b2 > 0, and b2 < 0]
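Polynomial models with two predictor variables
- First order model: \( y = b_0 + b_1 x_1 + b_2 x_2 + e \)
  The effect of one predictor variable on y is independent of the effect of the other predictor variable on y. For a fixed \( x_2 \), y is a line in \( x_1 \) whose slope does not change:
  - \( x_2 = 1 \): \( [b_0 + b_2(1)] + b_1 x_1 \)
  - \( x_2 = 2 \): \( [b_0 + b_2(2)] + b_1 x_1 \)
  - \( x_2 = 3 \): \( [b_0 + b_2(3)] + b_1 x_1 \)
- First order model with interaction: \( y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_1 x_2 + e \)
  The slope in \( x_1 \) now depends on the value of \( x_2 \):
  - \( x_2 = 1 \): \( [b_0 + b_2(1)] + [b_1 + b_3(1)] x_1 \)
  - \( x_2 = 2 \): \( [b_0 + b_2(2)] + [b_1 + b_3(2)] x_1 \)
  - \( x_2 = 3 \): \( [b_0 + b_2(3)] + [b_1 + b_3(3)] x_1 \)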
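The difference between the two first order models can be checked numerically: without the interaction term the slope in x1 is the same at every x2, while with it the slope is b1 + b3*x2. This is a small sketch with hypothetical coefficient values chosen only for illustration.

```python
def slope_in_x1(b1, b3, x2):
    """Slope of y with respect to x1 in y = b0 + b1*x1 + b2*x2 + b3*x1*x2."""
    return b1 + b3 * x2

# Hypothetical coefficients, not taken from the slides.
b1 = 2.0
no_interaction = [slope_in_x1(b1, 0.0, x2) for x2 in (1, 2, 3)]
with_interaction = [slope_in_x1(b1, 0.5, x2) for x2 in (1, 2, 3)]
```

Setting b3 = 0 recovers the model without interaction, which is why `no_interaction` holds one repeated value while `with_interaction` changes with x2.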
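- Second order model: \( y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_1^2 + b_4 x_2^2 + e \)
  For a fixed \( x_2 \), y is a quadratic function of \( x_1 \):
  - \( x_2 = 1 \): \( y = [b_0 + b_2(1) + b_4(1^2)] + b_1 x_1 + b_3 x_1^2 + e \)
  - \( x_2 = 2 \): \( y = [b_0 + b_2(2) + b_4(2^2)] + b_1 x_1 + b_3 x_1^2 + e \)
  - \( x_2 = 3 \): \( y = [b_0 + b_2(3) + b_4(3^2)] + b_1 x_1 + b_3 x_1^2 + e \)
- Second order model with interaction: \( y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_1^2 + b_4 x_2^2 + b_5 x_1 x_2 + e \)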
Example 19.1: Location for a new restaurant
- A fast food restaurant chain tries to identify new locations that are likely to be profitable.
- The primary market for such restaurants is middle-income adults and their children (between the ages of 5 and 12).
- Which regression model should be proposed to predict the profitability of new locations?
Solution
- The dependent variable will be Gross Revenue.
- There are quadratic relationships between Revenue and each predictor variable. Why?
  - Members of middle-class families are more likely to visit a fast food restaurant than members of poor or wealthy families.
  - Families with very young or older kids will not visit the restaurant as frequently as families with kids in the mid-range ages.
- The proposed model:
  \( \text{Revenue} = b_0 + b_1 \text{Income} + b_2 \text{Age} + b_3 \text{Income}^2 + b_4 \text{Age}^2 + b_5 (\text{Income})(\text{Age}) + e \)
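To estimate a second order model with interaction, the squared and cross-product columns are computed from Income and Age and added to the design matrix. The sketch below assumes NumPy; the function name `second_order_design`, the coefficient vector and the data are synthetic and hypothetical, and do not reproduce the data set in Xm19-02.xls.

```python
import numpy as np

def second_order_design(income, age):
    """Columns: 1, Income, Age, Income^2, Age^2, Income*Age."""
    income = np.asarray(income, dtype=float)
    age = np.asarray(age, dtype=float)
    return np.column_stack([
        np.ones_like(income), income, age,
        income**2, age**2, income * age,
    ])

# Synthetic predictors and hypothetical coefficients (illustration only).
rng = np.random.default_rng(0)
income = rng.uniform(15, 35, size=40)
age = rng.uniform(3, 15, size=40)
true_b = np.array([-1000.0, 170.0, 20.0, -3.0, -4.0, 1.5])

X = second_order_design(income, age)
revenue = X @ true_b  # noise-free response, so the fit is exact up to rounding
b_hat, *_ = np.linalg.lstsq(X, revenue, rcond=None)
fitted = X @ b_hat
```

With real data the response contains an error term, so the fitted values would only approximate the observed revenues.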
Example 19.2
- To verify the validity of the model proposed in Example 19.1, 25 areas with fast food restaurants were randomly selected.
- Data collected included (see Xm19-02.xls):
  - Previous year's annual gross sales.
  - Mean annual household income.
  - Mean age of children.
The model provides a good fit.
The model can be used to make predictions. However, do not interpret the coefficients or test them. Multicollinearity is a problem!
To check the correlations among the predictors in Excel: Tools > Data Analysis > Correlation.
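The multicollinearity can be reduced by modifying the original predictor variables, that is, by subtracting the mean from each:
- Centered Income = Income - average Income = Income - 24.2
- Centered Age = Age - average Age = Age - 8.392
The squared and interaction terms are then computed from the centered values. For example, an observation with centered Income of -0.7 and centered Age of 2.11 has the interaction term (-0.7)(2.11) and the squared income term (-0.7)^2.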
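The effect of centering can be seen by comparing the correlation between a predictor and its square before and after subtracting the mean. The sketch below uses only the standard library; the income values are hypothetical and the helper names `pearson` and `center` are introduced here for illustration.

```python
from math import sqrt

def pearson(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    sd_u = sqrt(sum((a - mean_u) ** 2 for a in u))
    sd_v = sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (sd_u * sd_v)

def center(values):
    """Subtract the sample mean from every value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

# Hypothetical mean household incomes (illustration only).
income = [18.0, 20.5, 22.0, 24.0, 25.5, 27.0, 30.0]

raw_r = pearson(income, [v**2 for v in income])
centered = center(income)
centered_r = pearson(centered, [v**2 for v in centered])
```

For these values `raw_r` is close to 1, while `centered_r` is much smaller in magnitude, which is the reduction in multicollinearity the slide describes.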
Regression results of the modified model
Multicollinearity is not a problem anymore.
19.3 Qualitative Independent Variables
- In many real-life situations one or more independent variables are qualitative.
- Qualitative variables are included in a regression model via indicator variables.
- An indicator variable (I) can assume one of two values, zero or one.
Examples of indicator variables:
- I = 1 if a degree earned is in Finance; I = 0 if a degree earned is not in Finance.
- I = 1 if the temperature was below 50°; I = 0 if the temperature was 50° or more.
- I = 1 if the first of two conditions is met; I = 0 if the second is met.
- I = 1 if data were collected before 1980; I = 0 if data were collected after 1980.
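Example 17.1 - continued
- The dealer believes that color is a variable that affects a car's price.
- Three color categories are considered:
  - White
  - Silver
  - Other colors
- Note: Color is a qualitative variable, so it is represented by two indicator variables:
  - I1 = 1 if the color is white, 0 otherwise
  - I2 = 1 if the color is silver, 0 otherwise
- And what about Other colors? Set I1 = 0 and I2 = 0.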
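A three-category variable needs only two indicator variables, because the third category is identified by both indicators being zero. A minimal sketch of that coding follows; the function name `color_indicators` is hypothetical, and "Red" stands in for any color in the Other category.

```python
def color_indicators(color):
    """Return (I1, I2): I1 = 1 for white, I2 = 1 for silver, both 0 otherwise."""
    c = color.strip().lower()
    return (1 if c == "white" else 0, 1 if c == "silver" else 0)

# "Red" is an arbitrary example of the Other colors category.
encoded = {c: color_indicators(c) for c in ("White", "Silver", "Red")}
```

Using a third indicator for Other colors would make the three indicators sum to one for every car, duplicating the intercept column.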
Solution
- The proposed model is \( y = b_0 + b_1(\text{Odometer}) + b_2 I_1 + b_3 I_2 + e \)
- The data
[Data table not transcribed; rows were labeled as white cars, silver cars, and cars of other colors]
From Excel we get the regression equation
\( \text{Price} = 6350 - 0.0278(\text{Odometer}) + 45.2 I_1 + 148 I_2 \)
- For one additional mile, the auction price decreases by 2.78 cents.
- A white car sells, on average, for $45.20 more than a car of the Other color category.
- A silver car sells, on average, for $148 more than a car of the Other color category.
The equation for a white car:
\( \text{Price} = 6350 - 0.0278(\text{Odometer}) + 45.2(1) + 148(0) \)
The equation for a silver car:
\( \text{Price} = 6350 - 0.0278(\text{Odometer}) + 45.2(0) + 148(1) \)
The equation for a car of the Other color category:
\( \text{Price} = 6350 - 0.0278(\text{Odometer}) + 45.2(0) + 148(0) \)
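The three equations differ only in their intercepts, so at any odometer reading the predicted prices differ by the indicator coefficients. The sketch below evaluates the fitted equation from the slide; the odometer reading of 36,000 miles is a hypothetical value chosen for illustration.

```python
def predicted_price(odometer, i1, i2):
    """Fitted equation from the slide: 6350 - 0.0278*Odometer + 45.2*I1 + 148*I2."""
    return 6350 - 0.0278 * odometer + 45.2 * i1 + 148 * i2

odometer = 36000  # hypothetical reading, in miles

other = predicted_price(odometer, 0, 0)
white = predicted_price(odometer, 1, 0)
silver = predicted_price(odometer, 0, 1)

white_premium = white - other    # equals the coefficient of I1
silver_premium = silver - other  # equals the coefficient of I2
```

The premiums do not depend on the odometer value because the model contains no interaction between color and odometer.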
There is insufficient evidence to infer that a white car and a car of the Other color category sell for different auction prices.
There is sufficient evidence to infer that a silver car sells for a higher price than a car of the Other color category.
19.4 Regression and the Analysis of Variance
- We can use regression analysis with indicator variables to conduct ANOVA.
Single factor, independent samples. Example: compare 3 treatments.
- ANOVA model: \( y = \mu_i + e \) (i = 1, 2, 3)
- Regression model: \( y = b_0 + b_1 I_1 + b_2 I_2 + e \)
Single factor, randomized block design. Example: compare 2 treatments, 3 blocks.
- ANOVA model: \( y = T_i + B_j + e \) (i = 1, 2 and j = 1, 2, 3)
- Regression model: \( y = b_0 + b_1 I_1 + b_2 I_2 + b_3 I_3 + e \), where \( I_1 \) is the indicator variable for treatment 1 and \( I_2 \), \( I_3 \) are the indicator variables for blocks.
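For the single factor case, the link between the two models is that the regression coefficients reproduce the treatment means: b0 is the mean of the baseline treatment, and b0 + b1 and b0 + b2 are the means of the other two. The sketch below assumes NumPy and uses made-up responses; treating treatment 3 as the baseline is a choice made here for illustration.

```python
import numpy as np

# Made-up responses for three treatments (illustration only).
groups = {1: [10.0, 12.0, 11.0], 2: [15.0, 14.0, 16.0], 3: [20.0, 19.0, 21.0]}

rows, y = [], []
for treatment, values in groups.items():
    for value in values:
        # Columns: intercept, I1 (treatment 1), I2 (treatment 2); treatment 3 is the baseline.
        rows.append([1.0, float(treatment == 1), float(treatment == 2)])
        y.append(value)

b0, b1, b2 = np.linalg.lstsq(np.array(rows), np.array(y), rcond=None)[0]
means = {t: float(np.mean(v)) for t, v in groups.items()}
```

Testing whether b1 = b2 = 0 in the regression is then the same question as testing whether the three treatment means are equal.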
19.5 Stepwise Regression
- Multicollinearity may prevent the study of the relationship between dependent and independent variables.
- To reduce multicollinearity we can use stepwise regression.
- In stepwise regression, variables are added or deleted one at a time, based on their contribution to the current model.
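The "add one at a time" half of the procedure can be sketched as a greedy forward selection. This is a simplified illustration, not the algorithm used by any particular statistical package: it never deletes a variable, and the stopping rule `min_gain` (a minimum fractional reduction in the sum of squared errors) is a hypothetical stand-in for the F-to-enter test that packages typically apply. NumPy is assumed and the data are synthetic.

```python
import numpy as np

def sse(X, y):
    """Sum of squared residuals of the least-squares fit of y on X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ b
    return float(residuals @ residuals)

def forward_select(X, y, min_gain=0.2):
    """Add the column that lowers SSE the most until the gain falls below min_gain."""
    n, k = X.shape
    ones = np.ones((n, 1))
    chosen = []
    current = sse(ones, y)  # intercept-only model
    while len(chosen) < k:
        trials = {
            j: sse(np.hstack([ones, X[:, chosen + [j]]]), y)
            for j in range(k) if j not in chosen
        }
        best = min(trials, key=trials.get)
        # Stop when the best candidate contributes too little to the current model.
        if current <= 0 or (current - trials[best]) / current < min_gain:
            break
        chosen.append(best)
        current = trials[best]
    return chosen

# Synthetic data: only columns 0 and 2 actually drive y.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.1, size=60)
selected = forward_select(X, y)
```

A full stepwise procedure would also re-test the variables already in the model after each addition and remove any whose contribution is no longer significant.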
19.6 Model Building
- Identify the dependent variable, and clearly define it.
- List potential predictors.
  - Bear in mind the problem of multicollinearity.
  - Consider the cost of gathering, processing and storing data.
  - Be selective in your choice (try to use as few variables as possible).
- Gather the required observations (have at least six observations for each independent variable).
- Identify several possible models.
  - A scatter diagram of the dependent variable against each predictor can be helpful in formulating the right model.
  - If you are uncertain, start with first order and second order models, with and without interaction.
  - Try other relationships (transformations) if the polynomial models fail to provide a good fit.
- Use statistical software to estimate the model.
- Determine whether the required conditions are satisfied. If not, attempt to correct the problem.
- Select the best model.
  - Use the statistical output.
  - Use your judgment!
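One piece of statistical output that helps compare candidate models with different numbers of predictors is the adjusted R^2. The sketch below compares a first order and a second order model on synthetic data; NumPy is assumed, the helper name `adjusted_r2` is introduced here, and adjusted R^2 is only one possible criterion, not the selection rule prescribed by the slides.

```python
import numpy as np

def adjusted_r2(X, y):
    """Adjusted R^2 of the least-squares fit; X must include the intercept column."""
    n, p = X.shape  # p counts the intercept, so there are p - 1 predictors
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = float(np.sum((y - X @ b) ** 2))
    sst = float(np.sum((y - y.mean()) ** 2))
    return 1.0 - (sse / (n - p)) / (sst / (n - 1))

# Synthetic data with a genuinely quadratic relationship (illustration only).
rng = np.random.default_rng(2)
x = np.linspace(-3, 3, 30)
y = 1.0 + 0.5 * x + 2.0 * x**2 + rng.normal(scale=0.5, size=30)

first_order = np.column_stack([np.ones_like(x), x])
second_order = np.column_stack([np.ones_like(x), x, x**2])

scores = {
    "first order": adjusted_r2(first_order, y),
    "second order": adjusted_r2(second_order, y),
}
best = max(scores, key=scores.get)
```

The numerical comparison is a starting point; the required conditions and subject-matter judgment still have to be checked before a model is chosen.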