Title: Model Building
1 Model Building
2 Why Model Building Is Important
- By model building, we mean writing a model that will provide a good fit to a set of data and that will give good estimates of the mean value of y and good predictions of future values of y.
- The goodness of fit of the model is measured by the coefficient of determination, $R^2$.
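3 The Two Types of Independent Variables
Quantitative and Qualitative
- Quantitative variable: assumes numerical values corresponding to points on a line (e.g., engine speed in rpm)
- Qualitative variable: its levels are categories rather than numbers (e.g., fuel type F1, F2, F3)
4 Definition 5.1
- The different values of an independent variable used in regression are called its levels.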
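5 A $p$th-Order Polynomial with One Independent Variable
$$E(y) = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_p x^p$$
- where $p$ is a positive integer and $\beta_0, \beta_1, \ldots, \beta_p$ are unknown parameters that must be estimated.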
6 First-Order (Straight-Line) Model with One Independent Variable
$$E(y) = \beta_0 + \beta_1 x$$
- Interpretation of model parameters
- $\beta_0$: y-intercept, the value of E(y) when x = 0
- $\beta_1$: slope of the line, the change in E(y) for a 1-unit increase in x
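As a concrete illustration of slides 2 and 6, the sketch below computes the least squares estimates of $\beta_0$ and $\beta_1$ and the coefficient of determination $R^2 = 1 - SSE/SS_{yy}$ from the textbook closed-form formulas. The function name and the data are hypothetical, chosen only for illustration.

```python
def fit_first_order(x, y):
    """Least squares fit of E(y) = beta0 + beta1*x; returns (b0, b1, R^2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx                  # estimated slope
    b0 = y_bar - b1 * x_bar             # estimated y-intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1 - sse / ss_yy                # fraction of variation in y explained
    return b0, b1, r2

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
b0, b1, r2 = fit_first_order(x, y)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, R^2 = {r2:.4f}")
```

Here b1 estimates the change in E(y) per 1-unit increase in x, and an $R^2$ near 1 indicates a close fit to these data (which says nothing yet about predicting new data).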
7 A Second-Order (Quadratic) Model with One Independent Variable
$$E(y) = \beta_0 + \beta_1 x + \beta_2 x^2$$
- where $\beta_0$, $\beta_1$, and $\beta_2$ are unknown parameters that must be estimated.
- Interpretation of model parameters
- $\beta_0$: y-intercept, the value of E(y) when x = 0
- $\beta_1$: shift parameter; changing the value of $\beta_1$ shifts the parabola to the right or left. The vertex lies at $x = -\beta_1/(2\beta_2)$, so increasing $\beta_1$ moves the parabola to the right when $\beta_2 < 0$ and to the left when $\beta_2 > 0$.
- $\beta_2$: rate of curvature
8 Graphs for Two Second-Order Polynomial Models
[Figure: two panels, $\beta_2 > 0$ (parabola opens upward) and $\beta_2 < 0$ (parabola opens downward)]
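The two panels differ only in the sign of $\beta_2$, since the second derivative of the quadratic is $2\beta_2$. The minimal sketch below (a hypothetical helper, not from the slides) reports the concavity and the turning point (vertex) at $x = -\beta_1/(2\beta_2)$ for given parameter values.

```python
def describe_quadratic(beta0, beta1, beta2):
    """Concavity and vertex of E(y) = beta0 + beta1*x + beta2*x**2."""
    if beta2 == 0:
        raise ValueError("beta2 = 0: the model is first-order, not quadratic")
    # The sign of the second derivative 2*beta2 decides which way the parabola opens
    shape = "concave upward (opens up)" if beta2 > 0 else "concave downward (opens down)"
    x_vertex = -beta1 / (2 * beta2)     # where dE(y)/dx = beta1 + 2*beta2*x = 0
    y_vertex = beta0 + beta1 * x_vertex + beta2 * x_vertex ** 2
    return shape, x_vertex, y_vertex

print(describe_quadratic(1.0, -4.0, 2.0))   # beta2 > 0: minimum at x = 1
print(describe_quadratic(1.0, 4.0, -2.0))   # beta2 < 0: maximum at x = 1
```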
9 Example of the Use of a Quadratic Model
What happens out here?
10 Third-Order Model with One Independent Variable
$$E(y) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3$$
- Interpretation of model parameters
- $\beta_0$: y-intercept, the value of E(y) when x = 0
- $\beta_1$: shift parameter (shifts the polynomial right or left on the x-axis)
- $\beta_2$: rate of curvature
- $\beta_3$: the magnitude of $\beta_3$ controls the rate of reversal of curvature for the polynomial
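To see why $\beta_3$ governs the reversal of curvature, consider a short derivation (standard calculus, added here for illustration): the second derivative of the cubic is linear in x, so the curvature changes sign exactly once, at the inflection point.

```latex
\frac{d^2 E(y)}{dx^2} = 2\beta_2 + 6\beta_3 x = 0
\quad\Longrightarrow\quad
x = -\frac{\beta_2}{3\beta_3} \qquad (\beta_3 \neq 0)
```

The second derivative changes at a rate of $6\beta_3$ per unit of x, so a larger $|\beta_3|$ makes the curvature reverse more sharply around that point.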
11 First-Order Model in k Quantitative Independent Variables
$$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$$
- where $\beta_0, \beta_1, \ldots, \beta_k$ are unknown parameters that must be estimated.
- Interpretation of model parameters
- $\beta_0$: y-intercept, the value of E(y) when $x_1 = x_2 = \cdots = x_k = 0$
- $\beta_1$: change in E(y) for a 1-unit increase in $x_1$, when $x_2, x_3, \ldots, x_k$ are held fixed
- $\beta_2$: change in E(y) for a 1-unit increase in $x_2$, when $x_1, x_3, \ldots, x_k$ are held fixed
- ...
- $\beta_k$: change in E(y) for a 1-unit increase in $x_k$, when $x_1, x_2, \ldots, x_{k-1}$ are held fixed
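A minimal sketch of estimating $\beta_0, \ldots, \beta_k$ by least squares with NumPy's `np.linalg.lstsq`, assuming the predictors are stored one column per variable. The data are hypothetical and noise-free, generated from $E(y) = 1 + 2x_1 - 0.5x_2$, so the fit should recover those values exactly.

```python
import numpy as np

def fit_first_order_k(X, y):
    """OLS estimates [b0, b1, ..., bk] for E(y) = b0 + b1*x1 + ... + bk*xk.
    X is an (n, k) array holding one quantitative predictor per column."""
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])   # prepend the intercept column
    beta, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return beta

# Hypothetical noise-free data from E(y) = 1 + 2*x1 - 0.5*x2
X = np.array([[0, 0], [1, 0], [0, 1], [2, 3], [3, 1], [4, 2]])
y = 1 + 2 * X[:, 0] - 0.5 * X[:, 1]
print(np.round(fit_first_order_k(X, y), 6))
```

Each slope estimate is read exactly as on the slide: the estimated change in E(y) for a 1-unit increase in that variable with the others held fixed.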
12 Graph
13 Contour Lines
- Plot y versus $x_1$ for different values of $x_2$:
- Plot y versus $x_1$ for $x_2 = 1$
- Plot y versus $x_1$ for $x_2 = 2$
- Plot y versus $x_1$ for $x_2 = 3$
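For a first-order model in two variables, $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$, fixing $x_2$ gives a straight line in $x_1$ with intercept $\beta_0 + \beta_2 x_2$ and slope $\beta_1$. The sketch below (hypothetical helper and parameter values) computes those lines for $x_2 = 1, 2, 3$; they all share the slope $\beta_1$, i.e., the contour lines are parallel.

```python
def contour_line(beta0, beta1, beta2, x2):
    """(intercept, slope) of E(y) versus x1 when x2 is held fixed,
    for the first-order model E(y) = beta0 + beta1*x1 + beta2*x2."""
    return beta0 + beta2 * x2, beta1

for x2 in (1, 2, 3):
    intercept, slope = contour_line(1.0, 2.0, -0.5, x2)   # hypothetical betas
    print(f"x2 = {x2}: E(y) = {intercept} + {slope}*x1")
```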
14 Example
15 Interaction (Second-Order) Model with Two Independent Variables
$$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$$
- Interpretation of model parameters
- $\beta_0$: y-intercept, the value of E(y) when $x_1 = x_2 = 0$
- $\beta_1$ and $\beta_2$: changing $\beta_1$ and $\beta_2$ causes the surface to shift along the $x_1$ and $x_2$ axes
- $\beta_3$: controls the rate of twist in the ruled surface (see Figure 5.10)
16 Continued
- When one independent variable is held fixed, the model produces straight lines with the following slopes:
- $\beta_1 + \beta_3 x_2$: change in E(y) for a 1-unit increase in $x_1$, when $x_2$ is held fixed
- $\beta_2 + \beta_3 x_1$: change in E(y) for a 1-unit increase in $x_2$, when $x_1$ is held fixed
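A quick numerical check of the slope formula for the interaction model $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$: the change in E(y) when $x_1$ rises by one unit equals $\beta_1 + \beta_3 x_2$, so it depends on where $x_2$ is held. The parameter values below are hypothetical.

```python
def interaction_mean(b, x1, x2):
    """E(y) for the interaction model b0 + b1*x1 + b2*x2 + b3*x1*x2."""
    b0, b1, b2, b3 = b
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

b = (1.0, 2.0, -0.5, 0.75)   # hypothetical parameter values

for x2 in (1, 2, 3):
    # Change in E(y) when x1 rises by one unit (0 -> 1) with x2 held fixed
    change = interaction_mean(b, 1, x2) - interaction_mean(b, 0, x2)
    print(f"x2 = {x2}: change = {change}, b1 + b3*x2 = {b[1] + b[3] * x2}")
```

Because the slope in $x_1$ grows with $x_2$ whenever $\beta_3 \neq 0$, the contour lines are no longer parallel; that is exactly the interaction described in Definition 5.2.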
17 Definition 5.2
- Two variables $x_1$ and $x_2$ are said to interact if the change in E(y) for a 1-unit change in $x_1$ (when $x_2$ is held fixed) depends on the value of $x_2$.
18 Graph
19 Contours
20 Complete Second-Order Model with Two Independent Variables
$$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2$$
- Interpretation of model parameters
- $\beta_0$: y-intercept, the value of E(y) when $x_1 = x_2 = 0$
- $\beta_1$ and $\beta_2$: changing $\beta_1$ and $\beta_2$ causes the surface to shift along the $x_1$ and $x_2$ axes
- $\beta_3$: the value of $\beta_3$ controls the rotation of the surface
- $\beta_4$ and $\beta_5$: the signs and values of these parameters control the type of surface and the rates of curvature
- Three types of surfaces may be produced by a second-order model:
- A paraboloid that opens upward (Figure 5.12a)
- A paraboloid that opens downward (Figure 5.12b)
- A saddle-shaped surface (Figure 5.12c)
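Which of the three surfaces appears can be read from the matrix of second derivatives, $\begin{bmatrix} 2\beta_4 & \beta_3 \\ \beta_3 & 2\beta_5 \end{bmatrix}$, whose determinant is $4\beta_4\beta_5 - \beta_3^2$ (a standard calculus result, added here for illustration). The hypothetical helper below applies that rule; note that $\beta_3$ enters the determinant, so a large enough twist term can turn a paraboloid into a saddle.

```python
def classify_surface(b3, b4, b5):
    """Classify E(y) = b0 + b1*x1 + b2*x2 + b3*x1*x2 + b4*x1**2 + b5*x2**2.

    Only the second-order terms matter: the Hessian is [[2*b4, b3], [b3, 2*b5]].
    b0, b1, b2 merely move the surface without changing its type."""
    det = 4 * b4 * b5 - b3 ** 2
    if det > 0:
        # det > 0 forces b4 and b5 to share a sign; that sign sets the direction
        return "paraboloid opening upward" if b4 > 0 else "paraboloid opening downward"
    if det < 0:
        return "saddle-shaped surface"
    return "degenerate case (determinant 0): no unique stationary point"

print(classify_surface(0, 1, 1))    # upward paraboloid
print(classify_surface(0, 1, -1))   # saddle
print(classify_surface(3, 1, 1))    # same b4, b5 as the first line, but b3 makes it a saddle
```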
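21 Complete Second-Order Model with Three Quantitative Independent Variables
$$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1 x_2 + \beta_5 x_1 x_3 + \beta_6 x_2 x_3 + \beta_7 x_1^2 + \beta_8 x_2^2 + \beta_9 x_3^2$$
- where $\beta_0, \beta_1, \ldots, \beta_9$ are unknown parameters that must be estimated.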
22 Coding Procedure for Observational Data
- Let
- x = uncoded quantitative independent variable
- u = coded quantitative independent variable
- Then if x takes values $x_1, x_2, \ldots, x_n$ for the n data points in the regression analysis, let
$$u_i = \frac{x_i - \bar{x}}{s_x}$$
- where $s_x$ is the standard deviation of the x values, i.e.,
$$s_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
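23 Procedure for Writing a Model with One Qualitative Independent Variable at k Levels (A, B, C, D, ...)
$$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_{k-1} x_{k-1}$$
- where $x_1 = 1$ if level B, 0 if not; $x_2 = 1$ if level C, 0 if not; ...; $x_{k-1} = 1$ if the kth level, 0 if not
- The number of dummy variables for a single qualitative variable is always 1 less than the number of levels for the variable. Then, assuming the base level is A, the mean for each level is
$$\mu_A = \beta_0, \quad \mu_B = \beta_0 + \beta_1, \quad \mu_C = \beta_0 + \beta_2, \quad \mu_D = \beta_0 + \beta_3, \quad \ldots$$
24 Continued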
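A small sketch of the dummy-variable setup: k levels produce k − 1 zero/one columns, the base level is coded all zeros, and (because the model has one parameter per level mean) the least squares estimates are $\hat\beta_0 = \bar{y}_A$ and $\hat\beta_i$ = (mean of level i) − $\bar{y}_A$. Function names and data are hypothetical.

```python
def dummy_rows(levels, base):
    """One 0/1 column per non-base level; the base level is coded all zeros."""
    others = sorted(set(levels) - {base})
    return others, [[int(lev == o) for o in others] for lev in levels]

def dummy_estimates(levels, y, base):
    """Least squares estimates for E(y) = b0 + b1*x1 + ... + b_{k-1}*x_{k-1}:
    b0 is the base-level sample mean; each b_i is (level mean) - (base mean)."""
    means = {}
    for lev in set(levels):
        vals = [yi for li, yi in zip(levels, y) if li == lev]
        means[lev] = sum(vals) / len(vals)
    others, _ = dummy_rows(levels, base)
    return [means[base]] + [means[o] - means[base] for o in others]

levels = ["A", "A", "B", "B", "C", "C"]   # hypothetical data, k = 3 levels
y = [10, 12, 15, 17, 20, 22]
print(dummy_rows(levels, "A"))             # 2 dummy columns for 3 levels
print(dummy_estimates(levels, y, "A"))     # [mu_A, mu_B - mu_A, mu_C - mu_A] estimates
```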
25 Population Means
- Show setup
- Example, page 280
26 Table 8.4: Summary of the Sample Results for Five Populations
27 Multiple t Tests
28 Analysis of Variance Procedures
Assumptions:
- Each of the five populations has a normal distribution.
- The variances of the five populations are equal; that is, $\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma_4^2 = \sigma_5^2$.
- The five sets of measurements are independent random samples from their respective populations.
29 The Null and Alternative Hypotheses
- $H_0: \mu_1 = \mu_2 = \cdots = \mu_t$ (i.e., the t population means are equal)
- $H_a$: at least one of the t population means differs from the rest.
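The test of $H_0$ uses the ratio of between-sample to within-sample variability, $F = \dfrac{SST/(t-1)}{SSE/(n-t)}$, which is compared with the upper-α critical value of the F distribution with $t - 1$ and $n - t$ degrees of freedom. A from-scratch sketch with hypothetical samples (SciPy's `scipy.stats.f_oneway` computes the same statistic, plus a p-value, if a cross-check is wanted):

```python
def one_way_anova(samples):
    """Return (F, df_between, df_within) for H0: mu_1 = ... = mu_t."""
    t = len(samples)
    n = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n
    means = [sum(s) / len(s) for s in samples]
    # Between-sample (treatment) and within-sample (error) sums of squares
    sst = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    sse = sum(sum((yi - m) ** 2 for yi in s) for s, m in zip(samples, means))
    df_between, df_within = t - 1, n - t
    f = (sst / df_between) / (sse / df_within)
    return f, df_between, df_within

# Hypothetical samples from t = 3 populations
print(one_way_anova([[5, 6, 7], [8, 9, 10], [11, 12, 13]]))
```

The F ratio is only trustworthy under the assumptions on slide 28 (normality, equal variances, independent random samples).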
30 Figure 8.5: Distributions of four populations that satisfy AOV assumptions
31 Model
32 Main Effect Model with Two Qualitative Independent Variables, One at Three Levels (F1, F2, F3) and the Other at Two Levels (B1, B2)
33 Interaction Model with Two Qualitative Independent Variables, One at Three Levels (F1, F2, F3) and the Other at Two Levels (B1, B2)
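The model equations for these two slides did not survive extraction. The sketch below assumes the usual dummy coding with F1 and B1 as base levels ($x_1 = 1$ if F2, $x_2 = 1$ if F3, $x_3 = 1$ if B2, else 0), giving the main effect model $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$ and the interaction model that adds $\beta_4 x_1 x_3 + \beta_5 x_2 x_3$. It computes the six cell means from hypothetical parameter values to show the difference between the two models.

```python
def cell_means(beta, interaction):
    """E(y) for each (F, B) combination, ASSUMING dummy coding with base levels F1, B1:
    x1 = 1 if F2, x2 = 1 if F3, x3 = 1 if B2 (0 otherwise)."""
    means = {}
    for f in ("F1", "F2", "F3"):
        for b in ("B1", "B2"):
            x1, x2, x3 = int(f == "F2"), int(f == "F3"), int(b == "B2")
            mu = beta[0] + beta[1] * x1 + beta[2] * x2 + beta[3] * x3
            if interaction:
                mu += beta[4] * x1 * x3 + beta[5] * x2 * x3
            means[(f, b)] = mu
    return means

main = cell_means([10, 2, 4, 3], interaction=False)          # hypothetical betas
inter = cell_means([10, 2, 4, 3, 1, -2], interaction=True)

# Main effect model: the B2 - B1 difference is beta3 at every fuel level
print([main[(f, "B2")] - main[(f, "B1")] for f in ("F1", "F2", "F3")])
# Interaction model: the B2 - B1 difference changes with the fuel level
print([inter[(f, "B2")] - inter[(f, "B1")] for f in ("F1", "F2", "F3")])
```

With six parameters for six cells, the interaction model can reproduce any pattern of cell means, while the main effect model forces parallel profiles.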
34 Population Means
35 Factorial Treatment Structure in a Completely Randomized Design
A factorial experiment is an experiment in which the response y is observed at all factor-level combinations of the independent variables.
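Enumerating "all factor-level combinations" is a Cartesian product of the level lists. For the three-level by two-level layout used in these slides, that gives 3 × 2 = 6 treatments; the factor names below are illustrative.

```python
from itertools import product

factor_f = ["F1", "F2", "F3"]   # three levels of the first factor
factor_b = ["B1", "B2"]         # two levels of the second factor

treatments = list(product(factor_f, factor_b))   # every factor-level combination
print(len(treatments), treatments)
```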
36 Population Parameters: A by B
37 Population Parameters: 2 by 2
38 Main Effects
39 Figure 15.6a: Illustration of the Absence of Interaction in a 2 × 2 Factorial Experiment
[Figure: mean response by factor level; factors A and B do not interact]
40 Figure 15.6b,c: Illustration of the Presence of Interaction in a 2 × 2 Factorial Experiment
[Figure: mean response for level 1 and level 2 of factor B; factors A and B interact]
41 Population Parameters: 2 by 2, No Interaction
42 Population Parameters: 2 by 2, Interaction
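In a 2 × 2 experiment with cell means $\mu_{ij}$ (level i of A, level j of B), the absence of interaction means the effect of B is the same at both levels of A, i.e., $(\mu_{11} - \mu_{12}) - (\mu_{21} - \mu_{22}) = 0$, which is what the parallel lines in Figure 15.6a show. A tiny sketch with hypothetical means:

```python
def interaction_contrast(mu):
    """mu[i][j] = mean response at level i+1 of A and level j+1 of B (2 x 2 case).
    Returns (mu11 - mu12) - (mu21 - mu22); zero means A and B do not interact."""
    return (mu[0][0] - mu[0][1]) - (mu[1][0] - mu[1][1])

no_interaction = [[10, 14], [12, 16]]   # hypothetical: B effect is -4 at both A levels
interaction = [[10, 14], [12, 11]]      # hypothetical: B effect is -4, then +1

print(interaction_contrast(no_interaction))  # 0
print(interaction_contrast(interaction))     # -5
```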
43 Engine Performance Example
44 Graph of Sample Means for the Engine Performance Example
45 Pattern of the Model Relating E(y) to k Qualitative Independent Variables
$E(y) = \beta_0$ + the following terms:
- Main effect terms for all independent variables
- All two-way interaction terms between pairs of independent variables
- All three-way interaction terms between different groups of three independent variables
- ...
- All k-way interaction terms for the k independent variables
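The pattern above lists one group of terms for every nonempty subset of the k factors, so there are $2^k - 1$ groups in all (each group then expands into products of the corresponding dummy variables). A short sketch that enumerates them, with hypothetical factor names:

```python
from itertools import combinations

def factorial_terms(factors):
    """Main effects, then all two-way, ..., k-way interaction term groups."""
    terms = []
    for order in range(1, len(factors) + 1):
        terms += ["*".join(c) for c in combinations(factors, order)]
    return terms

print(factorial_terms(["A", "B", "C"]))
# ['A', 'B', 'C', 'A*B', 'A*C', 'B*C', 'A*B*C']
```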
46 Models with Both Quantitative and Qualitative Independent Variables
- Perhaps the most interesting data analysis problems are those that involve both quantitative and qualitative independent variables. For example, suppose the mean performance of a diesel engine is a function of one qualitative independent variable, fuel type at levels F1, F2, and F3, and one quantitative independent variable, engine speed in revolutions per minute (rpm). We will build the model in stages, showing graphically how to interpret the model at each stage. This will help you see the contribution of the various terms in the model.
47 Analysis of Covariance
48 Example 16.14
49 Simple Model
[Figure labels: covariate; common slope; factor level]
50 Simple Model
51 Hypothesis Testing
52 SPSS
53 SPSS: Simple Model
54 SPSS: Simple
What is being tested?
55 Estimates
- eq1: Predicted sales = 17.368 + 0.899 × prev_sales
- eq2: Predicted sales = 12.292 + 0.899 × prev_sales
- eq3: Predicted sales = 4.391 + 0.899 × prev_sales
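Because the simple model forces a common slope, the three prediction equations are parallel lines, and the gap between any two groups is the same at every value of the covariate. The sketch below uses the intercepts and slope from the estimates slide, assuming the equations read "Predicted sales = intercept + 0.899 × prev_sales" (the operators were lost in extraction).

```python
# Estimates from the SPSS output; operators ASSUMED to be "=" and "+"
INTERCEPTS = {"eq1": 17.368, "eq2": 12.292, "eq3": 4.391}
COMMON_SLOPE = 0.899

def predicted_sales(eq, prev_sales):
    """Predicted sales from one of the three parallel prediction equations."""
    return INTERCEPTS[eq] + COMMON_SLOPE * prev_sales

for prev in (50, 100):   # arbitrary covariate values
    gap = predicted_sales("eq1", prev) - predicted_sales("eq2", prev)
    print(prev, round(gap, 3))   # same gap (the intercept difference) at both values
```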
56 More Complex Model
[Figure labels: covariate; different slopes; factor level]
57 Complex Model
58 Hypothesis Testing
59 SPSS: Complex
60 SPSS: Complex
What is being tested?
61 SPSS: Complex
What are the prediction equations?
62 Which Model Is Appropriate?
- Simple?
- Complex?
- We do not know at this point.
63 Need to Test
64 L Matrix
- /lmatrix betas all 0 0 0 1 -1 0
- all 0 0 0 1 0 -1
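A software-independent way to decide between the simple (common slope) and complex (separate slopes) models is the nested-model partial F test, $F = \dfrac{(SSE_R - SSE_C)/(\text{number of extra terms})}{SSE_C/(n - \text{parameters in complete model})}$. Assuming the two /lmatrix rows above contrast the group-specific slopes, this tests the same hypothesis of equal slopes. The sketch uses NumPy with hypothetical data (three groups, clearly different slopes, small fixed perturbations); the helper names are illustrative.

```python
import numpy as np

def sse(design, y):
    """Residual sum of squares from an OLS fit."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)

def equal_slopes_f_test(x, group, y):
    """Partial F test of H0: all groups share a common slope on covariate x.
    group holds integer codes; the lowest code is the base level."""
    x, group, y = (np.asarray(a, dtype=float) for a in (x, group, y))
    levels = np.unique(group)
    dummies = np.column_stack([(group == lev).astype(float) for lev in levels[1:]])
    reduced = np.column_stack([np.ones_like(x), x, dummies])      # common slope
    complete = np.column_stack([reduced, dummies * x[:, None]])    # separate slopes
    sse_r, sse_c = sse(reduced, y), sse(complete, y)
    df_num = complete.shape[1] - reduced.shape[1]
    df_den = len(y) - complete.shape[1]
    f = ((sse_r - sse_c) / df_num) / (sse_c / df_den)
    return f, df_num, df_den

# Hypothetical data: groups 0, 1, 2 with slopes 1, 2, 3
x = [1, 2, 3, 4, 5] * 3
group = [0] * 5 + [1] * 5 + [2] * 5
noise = [0.1, -0.1, 0.0, 0.1, -0.1] * 3
lines = [(2, 1)] * 5 + [(5, 2)] * 5 + [(1, 3)] * 5
y = [a + b * xi + e for (a, b), xi, e in zip(lines, x, noise)]
f, df_num, df_den = equal_slopes_f_test(x, group, y)
print(round(f, 1), df_num, df_den)
```

A large F (relative to the F distribution with df_num and df_den degrees of freedom) favors the complex model; a small F means the common-slope model is adequate.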
65 Additional Topics
- Estimated marginal means
- Test at some other value of X
- $R^2$
- Design matrix
66 Problem Areas
- Multicollinearity
- Problem points
- Non-constant variance as a function of the independent variables
- Variable selection
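One common diagnostic for multicollinearity is the variance inflation factor, $VIF_j = 1/(1 - R_j^2)$, where $R_j^2$ comes from regressing $x_j$ on the other independent variables; values above about 10 are often treated as a warning sign. A NumPy sketch with hypothetical data in which $x_2$ nearly duplicates $x_1$:

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), regressing column j on the other columns plus an intercept."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r2 = 1 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out.append(1 / (1 - r2))
    return out

x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [1.1, 1.9, 3.2, 3.8, 5.1, 6.2, 6.9, 8.1]   # nearly a copy of x1
x3 = [3, 1, 4, 1, 5, 9, 2, 6]                   # unrelated to x1
X = np.column_stack([x1, x2, x3])
print([round(v, 1) for v in vif(X)])            # large VIFs flag x1 and x2
```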
67 External Model Validation
- Models that fit the sample data well may not be successful predictors of y when applied to new data. For this reason, it is important to assess the validity of the regression model, in addition to its adequacy, before using it in practice.
- Model validation involves an assessment of how the fitted regression model will perform in practice. Techniques include:
- Examining the predicted values
- Examining the estimated model parameters
- Collecting new data for prediction
- Data splitting (cross-validation)
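A minimal sketch of data splitting: fit the model on a random half of the data, then measure predictive performance on the held-out half with $R^2_{\text{prediction}} = 1 - \sum (y_i - \hat{y}_i)^2 / \sum (y_i - \bar{y})^2$ computed over the validation observations only. The 50/50 split, the helper name, and the simulated data are illustrative assumptions.

```python
import numpy as np

def split_validate(X, y, train_frac=0.5, seed=0):
    """Fit OLS on a random training subset; return (beta, R^2 on the held-out subset)."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])  # add intercept
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_train = int(train_frac * len(y))
    train, test = idx[:n_train], idx[n_train:]
    beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
    resid = y[test] - X[test] @ beta                 # prediction errors on unseen data
    r2_pred = 1 - (resid @ resid) / np.sum((y[test] - y[test].mean()) ** 2)
    return beta, r2_pred

# Simulated data: y = 3 + 2x + noise
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=40)
y = 3 + 2 * x + rng.normal(0, 1, size=40)
beta, r2_pred = split_validate(x, y)
print(np.round(beta, 3), round(r2_pred, 3))
```

A prediction $R^2$ far below the training $R^2$ suggests the model fits the sample better than it will predict new data.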