Title: Regression Analysis Model Building
Lesson 10
Regression Analysis Model Building
- General Linear Model
- Determining When to Add or Delete Variables
- Analysis of a Larger Problem
- Variable-Selection Procedures
- Residual Analysis
- Multiple Regression Approach to Analysis of Variance and Experimental Design
General Linear Model
- Models in which the parameters (β0, β1, . . . , βp) all have exponents of one are called linear models.
- First-Order Model with One Predictor Variable
- Second-Order Model with One Predictor Variable
- Second-Order Model with Two Predictor Variables with Interaction
General Linear Model
- Often the problem of nonconstant variance can be corrected by transforming the dependent variable to a different scale.
- Logarithmic Transformations
- Most statistical packages provide the ability to apply logarithmic transformations using either base 10 (common log) or base e = 2.71828... (natural log).
- Reciprocal Transformation
- Use 1/y as the dependent variable instead of y.
Transforming y
- If the plot of residuals versus y-hat is convex up, lower the power on y; if it is convex down, increase the power on y.
- Examples, in increasing power: 1/y², 1/y, 1/y^0.5, log y, y, y², y³
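As a hedged illustration of transforming y, the sketch below fits a simple regression on both the original and the log scale using made-up data (the variable names and values are assumptions, not from the lesson); predictions from the log-scale fit are back-transformed with exp().

```python
import numpy as np

# Hypothetical data in which the spread of y grows with x,
# a common symptom of nonconstant variance.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.1, 2.9, 4.5, 6.8, 9.9, 15.2, 22.4, 33.0])

# Fit on the original scale and on the log scale.
b1, b0 = np.polyfit(x, y, 1)            # y      = b0 + b1*x
c1, c0 = np.polyfit(x, np.log(y), 1)    # log(y) = c0 + c1*x

print("original scale: intercept", b0, "slope", b1)
print("log scale:      intercept", c0, "slope", c1)

# Predictions from the log-scale model are back-transformed with exp().
x_new = 9.0
print("predicted y at x = 9:", np.exp(c0 + c1 * x_new))
```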
Determining When to Add or Delete Variables
- F Test
- To test whether the addition of x2 to a model involving x1 (or the deletion of x2 from a model involving x1 and x2) is statistically significant.
Example
- In a regression analysis involving 27 observations, the following estimated regression equation was developed. For this estimated regression equation, SST = 1,550 and SSE = 520.
- a. At α = .05, test whether x1 is significant.
- b. Suppose that variables x2 and x3 are added to the model and the following regression equation is obtained. For this estimated regression equation, SST = 1,550 and SSE = 100. Use an F test and a .05 level of significance to determine whether x2 and x3 contribute significantly to the model.
Example
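A minimal sketch of the standard partial F test computations for this example follows; the arithmetic uses only the SST and SSE values stated above, while the use of scipy.stats.f for the critical values is my own choice, not something taken from the lesson.

```python
from scipy import stats

n = 27
SST = 1550.0

# (a) Is x1 significant in the one-variable model (SSE = 520)?
SSE_reduced = 520.0
MSR = (SST - SSE_reduced) / 1             # 1030
MSE = SSE_reduced / (n - 2)               # 520/25 = 20.8
F_a = MSR / MSE                           # about 49.5
F_crit_a = stats.f.ppf(0.95, 1, n - 2)    # about 4.24
print("part (a):", F_a, ">", F_crit_a, "-> x1 is significant")

# (b) Do x2 and x3 add significantly? Full model has p = 3 predictors, SSE = 100.
SSE_full, q, p = 100.0, 2, 3
F_b = ((SSE_reduced - SSE_full) / q) / (SSE_full / (n - p - 1))   # about 48.3
F_crit_b = stats.f.ppf(0.95, q, n - p - 1)                        # about 3.42
print("part (b):", F_b, ">", F_crit_b, "-> x2 and x3 contribute significantly")
```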
Variable-Selection Procedures
- Stepwise Regression
- At each iteration, the first consideration is to see whether the least significant variable currently in the model can be removed because its F value, FMIN, is less than the user-specified or default F value, FREMOVE.
- If no variable can be removed, the procedure checks to see whether the most significant variable not in the model can be added because its F value, FMAX, is greater than the user-specified or default F value, FENTER.
- If no variable can be removed and no variable can be added, the procedure stops.
Variable-Selection Procedures
- Forward Selection
- This procedure is similar to stepwise regression, but does not permit a variable to be deleted.
- The forward-selection procedure starts with no independent variables.
- It adds variables one at a time as long as a significant reduction in the error sum of squares (SSE) can be achieved, as sketched below.
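The following is a hedged sketch of a forward-selection loop driven by the partial F statistic; the function name, the f_enter threshold, and the use of plain least squares are assumptions for illustration, not the procedure used by any particular package.

```python
import numpy as np

def forward_selection(X, y, names, f_enter=4.0):
    """Greedy forward selection driven by the partial F statistic.

    On each pass, the candidate whose addition gives the largest partial F
    is added, provided that F exceeds f_enter (an F-to-enter threshold).
    """
    n = len(y)
    selected, remaining = [], list(range(X.shape[1]))

    def sse(cols):
        # SSE from a least-squares fit on an intercept plus the given columns.
        A = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
        resid = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
        return float(resid @ resid)

    while remaining:
        sse_current = sse(selected)
        best_f, best_c = -np.inf, None
        for c in remaining:
            trial = selected + [c]
            df_error = n - len(trial) - 1
            sse_trial = sse(trial)
            f = (sse_current - sse_trial) / (sse_trial / df_error)
            if f > best_f:
                best_f, best_c = f, c
        if best_f < f_enter:
            break              # no remaining variable reduces SSE significantly
        selected.append(best_c)
        remaining.remove(best_c)
        print(f"added {names[best_c]} (partial F = {best_f:.2f})")
    return [names[c] for c in selected]
```

A stepwise version would add a removal pass (comparing each selected variable's F to an F-to-remove threshold) before each addition, and backward elimination would start from the full set and only remove.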
Variable-Selection Procedures
- Backward Elimination
- This procedure begins with a model that includes all the independent variables the modeler wants considered.
- It then attempts to delete one variable at a time by determining whether the least significant variable currently in the model can be removed because its F value, FMIN, is less than the user-specified or default F value, FREMOVE.
- Once a variable has been removed from the model, it cannot reenter at a subsequent step.
Variable-Selection Procedures
- Best-Subsets Regression
- The three preceding procedures are one-variable-at-a-time methods offering no guarantee that the best model for a given number of variables will be found.
- Some software packages include best-subsets regression, which enables the user to find, for a specified number of independent variables, the best regression model; a brute-force sketch follows below.
- Minitab output identifies the two best one-variable estimated regression equations, the two best two-variable equations, and so on.
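Below is a minimal brute-force sketch of the best-subsets idea: enumerate every subset of predictors up to a given size and rank them by adjusted R². The function name and the choice of adjusted R² as the ranking criterion are assumptions for illustration; packages typically report several criteria for each subset.

```python
import itertools
import numpy as np

def best_subsets(X, y, names, max_size=None):
    """Enumerate every subset of predictors and rank them by adjusted R-squared."""
    n, k = X.shape
    max_size = max_size or k
    sst = float(((y - y.mean()) ** 2).sum())
    results = []
    for size in range(1, max_size + 1):
        for cols in itertools.combinations(range(k), size):
            A = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
            resid = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
            sse = float(resid @ resid)
            adj_r2 = 1 - (sse / (n - size - 1)) / (sst / (n - 1))
            results.append((adj_r2, [names[c] for c in cols]))
    return sorted(results, reverse=True)   # best subsets first
```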
Autocorrelation or Serial Correlation
- Serial correlation, or autocorrelation, is the violation of the assumption that different observations of the error term are uncorrelated with each other. It occurs most frequently in time series data sets. In practice, serial correlation implies that the error term from one time period depends in some systematic way on error terms from other time periods.
- The most widely used test for serial correlation is the Durbin-Watson d test.
Residual Analysis: Autocorrelation
- Durbin-Watson Test for Autocorrelation
- The d statistic ranges in value from zero to four.
- If successive values of the residuals are close together (positive autocorrelation), the statistic will be small.
- If successive values are far apart (negative autocorrelation), the statistic will be large.
- A value of two indicates no autocorrelation.
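As a hedged illustration, the sketch below computes the Durbin-Watson statistic both from its definition and with statsmodels' durbin_watson helper; the simulated residual series is an assumption made purely to have a positively autocorrelated series to test.

```python
import numpy as np
from statsmodels.stats.stattools import durbin_watson

# Simulate residuals with positive autocorrelation (each value carries over
# part of the previous shock), purely for illustration.
rng = np.random.default_rng(0)
shocks = rng.normal(size=50)
residuals = np.convolve(shocks, [1.0, 0.7], mode="valid")

# d = sum((e_t - e_{t-1})^2) / sum(e_t^2); 2 means no autocorrelation,
# positive autocorrelation pushes d toward 0, negative toward 4.
d_manual = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)
d_package = durbin_watson(residuals)
print(d_manual, d_package)   # both well below 2 for this series
```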
General Linear Model
- Models in which the parameters (β0, β1, . . . , βp) have exponents other than one are called nonlinear models.
- In some cases we can perform a transformation of variables that will enable us to use regression analysis with the general linear model.
- Exponential Model
- The exponential model involves the regression equation E(y) = β0(β1)^x.
- We can transform this nonlinear model to a linear model by taking the logarithm of both sides.
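Taking logs gives log E(y) = log β0 + x log β1, which is linear in x. The sketch below fits that linearized form on made-up data (the values are assumptions, not from the lesson) and back-transforms the estimates.

```python
import numpy as np

# Hypothetical data that roughly follow an exponential model y = b0 * b1**x.
x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([2.2, 4.1, 8.3, 15.8, 32.5, 63.0])

# Fit the linearized model: log(y) = log(b0) + x*log(b1)
slope, intercept = np.polyfit(x, np.log(y), 1)
b0, b1 = np.exp(intercept), np.exp(slope)

print("estimated model: y =", round(b0, 3), "*", round(b1, 3), "** x")
print("prediction at x = 7:", b0 * b1 ** 7)
```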
Chapter 18: Forecasting
- Time Series and Time Series Methods
- Components of a Time Series
- Smoothing Methods
- Trend Projection
- Trend and Seasonal Components
- Regression Analysis
- Qualitative Approaches to Forecasting
Time Series and Time Series Methods
- By reviewing historical data over time, we can better understand the pattern of past behavior of a variable and better predict its future behavior.
- A time series is a set of observations on a variable measured at successive points in time or over successive periods of time.
- The objective of time series methods is to discover a pattern in the historical data and then extrapolate the pattern into the future.
- The forecast is based solely on past values of the variable and/or past forecast errors.
The Components of a Time Series
- Trend Component
- It represents a gradual shifting of a time series to relatively higher or lower values over time.
- Trend is usually the result of changes in the population, demographics, technology, and/or consumer preferences.
- Cyclical Component
- It represents any recurring sequence of points above and below the trend line lasting more than one year.
- We assume that this component represents multiyear cyclical movements in the economy.
The Components of a Time Series
- Seasonal Component
- It represents any repeating pattern, less than one year in duration, in the time series.
- The pattern's duration can be as short as an hour, or even less.
- Irregular Component
- It is the catch-all factor that accounts for the deviation of the actual time series value from what we would expect based on the other components.
- It is caused by the short-term, unanticipated, and nonrecurring factors that affect the time series.
Forecast Accuracy
- Mean Squared Error (MSE)
- It is the average of the squared forecast errors.
- Mean Absolute Deviation (MAD)
- It is the average of the absolute values of the forecast errors.
- One major difference between MSE and MAD is that the MSE measure is influenced much more by large forecast errors than by small errors.
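A minimal sketch of both accuracy measures, using made-up actual values and forecasts (the numbers are assumptions, not the lesson's examples):

```python
import numpy as np

# Hypothetical actual values and their one-step-ahead forecasts.
actual   = np.array([34, 40, 35, 39, 41], dtype=float)
forecast = np.array([35, 36, 38, 37, 39], dtype=float)

errors = actual - forecast
mse = np.mean(errors ** 2)       # squaring penalizes large errors more heavily
mad = np.mean(np.abs(errors))    # average size of the errors, sign ignored
print("MSE =", mse, " MAD =", mad)
```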
Example: MSE
Example: MAD
Using Smoothing Methods in Forecasting
- Moving Averages
- We use the average of the most recent n data values in the time series as the forecast for the next period.
- The average changes, or moves, as new observations become available.
- The moving average calculation is
  Moving Average = Σ(most recent n data values)/n
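A hedged sketch of a 3-period moving average forecast on made-up data (the series and the choice n = 3 are assumptions):

```python
import numpy as np

# Hypothetical series; forecast each period with a 3-period moving average.
y = np.array([24, 26, 22, 25, 28, 27, 30], dtype=float)
n = 3

# The forecast for period t+1 is the mean of the n most recent observations.
forecasts = [y[i - n:i].mean() for i in range(n, len(y) + 1)]
print(forecasts)   # the final value is the forecast for the next, unseen period
```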
Example
Using Smoothing Methods in Forecasting
- Weighted Moving Averages
- This method involves selecting weights for each of the data values and then computing a weighted mean as the forecast.
- For example, a 3-period weighted moving average would be computed as follows:
  Ft+1 = w1(Yt-2) + w2(Yt-1) + w3(Yt)
  where the sum of the weights (w values) is 1.
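A one-step sketch of the formula above with made-up weights and observations (both are assumptions; the only requirement is that the weights sum to 1):

```python
import numpy as np

weights = np.array([0.2, 0.3, 0.5])          # w1 (oldest) ... w3 (most recent)
last_three = np.array([22.0, 25.0, 28.0])    # Y(t-2), Y(t-1), Y(t)

forecast_next = float(weights @ last_three)  # F(t+1) = 0.2*22 + 0.3*25 + 0.5*28
print(forecast_next)                         # 25.9
```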
Using Smoothing Methods in Forecasting
- Exponential Smoothing
- It is a special case of the weighted moving averages method in which we select only the weight for the most recent observation.
- The weight placed on the most recent observation is the value of the smoothing constant, α.
- The weights for the other data values are computed automatically and become smaller at an exponential rate as the observations become older.
Using Smoothing Methods in Forecasting
- Exponential Smoothing
- Ft+1 = αYt + (1 - α)Ft
- where
  Ft+1 = forecast value for period t + 1
  Yt = actual value for period t
  Ft = forecast value for period t
  α = smoothing constant (0 < α < 1)
Example: Executive Seminars, Inc.
- Executive Seminars specializes in conducting management development seminars. In order to better plan future revenues and costs, management would like to develop a forecasting model for their Time Management seminar.
- Enrollments for the past ten Time Management seminars are (oldest to newest):
  Seminar     1   2   3   4   5   6   7   8   9   10
  Enrollment  34  40  35  39  41  36  33  38  43  40
Example: Executive Seminars, Inc.
- Exponential Smoothing
- Let α = .2 and F1 = Y1 = 34
- F2 = αY1 + (1 - α)F1 = .2(34) + .8(34) = 34
- F3 = αY2 + (1 - α)F2 = .2(40) + .8(34) = 35.20
- F4 = αY3 + (1 - α)F3 = .2(35) + .8(35.20) = 35.16
- . . . and so on
Example: Executive Seminars, Inc.
  Seminar   Actual Enrollment   Exp. Smoothing Forecast
     1             34                  34.00
     2             40                  34.00
     3             35                  35.20
     4             39                  35.16
     5             41                  35.93
     6             36                  36.94
     7             33                  36.75
     8             38                  36.00
     9             43                  36.40
    10             40                  37.72
    11    (forecast for the next seminar)  38.18
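A hedged sketch that reproduces this recursion in code; the enrollments and α = .2 come from the slides, and small differences from the rounded table values are to be expected.

```python
# Exponential smoothing recursion: F(t+1) = a*Y(t) + (1 - a)*F(t), with a = 0.2
enrollments = [34, 40, 35, 39, 41, 36, 33, 38, 43, 40]
alpha = 0.2

forecasts = [enrollments[0]]          # F1 is set equal to Y1 = 34
for y in enrollments:
    forecasts.append(alpha * y + (1 - alpha) * forecasts[-1])

# forecasts[1:] holds F2 through F11; F11 is the forecast for the next seminar.
for t, f in enumerate(forecasts[1:], start=2):
    print(f"F{t} = {f:.2f}")
```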
Using Trend Projection in Forecasting
- Equation for Linear Trend
- Tt = b0 + b1t
- where
  Tt = trend value in period t
  b0 = intercept of the trend line
  b1 = slope of the trend line
  t = time
- Note: t is the independent variable.
Using Trend Projection in Forecasting
- Computing the Slope (b1) and Intercept (b0)
- b1 = [ΣtYt - (Σt)(ΣYt)/n] / [Σt² - (Σt)²/n]
- b0 = ΣYt/n - b1(Σt/n) = Ȳ - b1·t̄
- where
  Yt = actual value in period t
  n = number of periods in the time series
Example: Sailboat Sales, Inc.
- Sailboat Sales is a major marine dealer in Chicago. The firm has experienced tremendous sales growth in the past several years. Management would like to develop a forecasting method that would enable them to better control inventories.
- The annual sales, in number of boats, for one particular sailboat model for the past five years are:
  Year   1   2   3   4   5
  Sales  11  14  20  26  34
Example: Sailboat Sales, Inc.
- Linear Trend Equation
      t    Yt   tYt   t²
      1    11    11    1
      2    14    28    4
      3    20    60    9
      4    26   104   16
      5    34   170   25
  Total    15   105   373   55
Example: Sailboat Sales, Inc.
- Trend Projection
- b1 = [373 - (15)(105)/5] / [55 - (15)²/5] = 5.8
- b0 = 105/5 - 5.8(15/5) = 3.6
- Tt = 3.6 + 5.8t
- T6 = 3.6 + 5.8(6) = 38.4
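A hedged check of these numbers in code, computing b1 and b0 both from the summation formulas on the earlier slide and from an ordinary least-squares fit (np.polyfit is my choice of tool, not the lesson's):

```python
import numpy as np

t = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([11, 14, 20, 26, 34], dtype=float)
n = len(t)

# Summation formulas from the trend-projection slide.
b1 = (np.sum(t * y) - np.sum(t) * np.sum(y) / n) / (np.sum(t ** 2) - np.sum(t) ** 2 / n)
b0 = y.mean() - b1 * t.mean()
print("b0 =", b0, " b1 =", b1)          # 3.6 and 5.8

# The same trend line from a least-squares fit, projected to period 6.
slope, intercept = np.polyfit(t, y, 1)
print("T6 =", intercept + slope * 6)    # 3.6 + 5.8*6 = 38.4
```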