Forecasting - PowerPoint PPT Presentation

About This Presentation
Title: Forecasting
Author: Juei-Chao Chen
Last modified by: USER
Created Date: 4/17/1996 5:08:18 PM

Slides: 71
Provided by: Juei

Transcript and Presenter's Notes

Title: Forecasting


1
Forecasting Methods
  • Quantitative
    • Causal
    • Time Series
      • Smoothing
      • Trend Projection
      • Trend Projection Adjusted for Seasonal Influence
  • Qualitative
2
General Linear Model
  • Models in which the parameters (β0, β1, . . . , βp) all
    have exponents of one are called linear models.
  • It does not imply that the relationship
    between y and the xi's is linear.
  • A general linear model involving p independent
    variables is
    y = β0 + β1z1 + β2z2 + . . . + βpzp + ε
  • Each of the independent variables zj is a function
    of x1, x2, . . . , xk (the variables for which data
    have been collected).

3
General Linear Model
  • The simplest case is when we have collected data
    for just one variable x1 and want to estimate y
    by using a straight-line relationship. In this
    case z1 = x1, and the model is
    y = β0 + β1x1 + ε
  • This model is called a simple first-order model
    with one predictor variable.

4
Estimated Multiple Regression Equation
A simple random sample is used to compute
sample statistics b0, b1, b2, . . . , bp that are
used as the point estimators of the parameters
β0, β1, β2, . . . , βp.
The estimated multiple regression equation is
ŷ = b0 + b1x1 + b2x2 + . . . + bpxp
5
Estimation Process
Multiple Regression Model: y = β0 + β1x1 + β2x2 + . . . + βpxp + ε
Multiple Regression Equation: E(y) = β0 + β1x1 + β2x2 + . . . + βpxp
Unknown parameters are β0, β1, β2, . . . , βp
b0, b1, b2, . . . , bp provide estimates of β0,
β1, β2, . . . , βp
6
Least Squares Method
  • Least Squares Criterion: min Σ (yi − ŷi)²
  • Computation of Coefficient Values

The formulas for the regression
coefficients b0, b1, b2, . . ., bp involve the
use of matrix algebra. We will rely on computer
software packages to perform the calculations.
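The matrix algebra mentioned here can be summarized in one line. This is a sketch under assumed notation that the slides do not use: X is the n × (p + 1) matrix whose first column is all ones and whose remaining columns hold the observed values of x1, . . . , xp, y is the vector of observed responses, and b = (b0, b1, . . . , bp)ᵀ. It also assumes XᵀX is invertible.

```latex
\min_{b}\ \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2
\quad\Longrightarrow\quad
X^{\top} X\, b = X^{\top} y
\quad\Longrightarrow\quad
b = \left(X^{\top} X\right)^{-1} X^{\top} y
```

The middle expression is the set of normal equations; software packages solve it numerically rather than forming the inverse by hand.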
7
Multiple Regression Equation
  • Example: Butler Trucking Company
  • To develop better work schedules, the managers
    want to estimate the total daily travel time for
    their drivers.
  • Data

8
Multiple Regression Equation
  • MINITAB Output

9
Multiple Regression Model
  • Example: Programmer Salary Survey

A software firm collected data for a
sample of 20 computer programmers. A
suggestion was made that regression analysis
could be used to determine if salary was
related to the years of experience and the
score on the firm's programmer aptitude test.
The years of experience, score on the
aptitude test, and corresponding annual salary
($1000s) for a sample of 20 programmers are
shown on the next slide.
10
Multiple Regression Model
Salary is in $1000s.

Exper.  Score  Salary      Exper.  Score  Salary
   4      78    24.0          9      88    38.0
   7     100    43.0          2      73    26.6
   1      86    23.7         10      75    36.2
   5      82    34.3          5      81    31.6
   8      86    35.8          6      74    29.0
  10      84    38.0          8      87    34.0
   0      75    22.2          4      79    30.1
   1      80    23.1          6      94    33.9
   6      83    30.0          3      70    28.2
   6      91    33.0          3      89    30.0
11
Multiple Regression Model
Suppose we believe that salary (y) is related
to the years of experience (x1) and the score
on the programmer aptitude test (x2) by the
following regression model:
y = β0 + β1x1 + β2x2 + ε
where
y = annual salary ($1000)
x1 = years of experience
x2 = score on programmer aptitude test
12
Solving for the Estimates of β0, β1, β2
Input Data:
  x1   x2    y
   4   78   24
   7  100   43
   .    .    .
   3   89   30
Computer Package for Solving Multiple Regression Problems
Least Squares Output: b0, b1, b2, R², etc.
13
Solving for the Estimates of β0, β1, β2
  • Excel Worksheet (showing partial data entered)

Note: Rows 10-21 are not shown.
14
Solving for the Estimates of β0, β1, β2
  • Excel's Regression Dialog Box

15
Solving for the Estimates of β0, β1, β2
  • Excel's Regression Equation Output

Note: Columns F-I are not shown.
16
Estimated Regression Equation
SALARY = 3.174 + 1.404(EXPER) + 0.251(SCORE)
Note: Predicted salary will be in thousands of
dollars.
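The estimated equation can be reproduced from the 20 observations in the survey table. The sketch below is one way to do it with only the Python standard library, by forming and solving the normal equations; the helper names `solve` and `least_squares` are illustrative and are not taken from the slides, which rely on Excel instead.

```python
# Programmer salary data from the survey slide: (experience, score, salary in $1000s).
DATA = [
    (4, 78, 24.0), (7, 100, 43.0), (1, 86, 23.7), (5, 82, 34.3), (8, 86, 35.8),
    (10, 84, 38.0), (0, 75, 22.2), (1, 80, 23.1), (6, 83, 30.0), (6, 91, 33.0),
    (9, 88, 38.0), (2, 73, 26.6), (10, 75, 36.2), (5, 81, 31.6), (6, 74, 29.0),
    (8, 87, 34.0), (4, 79, 30.1), (6, 94, 33.9), (3, 70, 28.2), (3, 89, 30.0),
]


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy; inputs untouched
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def least_squares(rows):
    """Return [b0, b1, ..., bp] for rows shaped (x1, ..., xp, y)."""
    xs = [[1.0] + list(r[:-1]) for r in rows]  # prepend the intercept column
    ys = [r[-1] for r in rows]
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(k)]
    return solve(xtx, xty)


b0, b1, b2 = least_squares(DATA)
print(f"SALARY = {b0:.3f} + {b1:.3f}(EXPER) + {b2:.3f}(SCORE)")
```

Solving the normal equations directly is adequate for a small, well-behaved data set like this one; statistical packages typically use more numerically stable factorizations.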
17
Interpreting the Coefficients
In multiple regression analysis, we
interpret each regression coefficient as
follows:
bi represents an estimate of the change in y
corresponding to a 1-unit increase in xi when
all other independent variables are held
constant.
18
Interpreting the Coefficients
b1 = 1.404
Salary is expected to increase by $1,404
for each additional year of experience (when
the variable score on programmer aptitude test
is held constant).
19
Interpreting the Coefficients
b2 = 0.251
Salary is expected to increase by $251 for
each additional point scored on the programmer
aptitude test (when the variable years of
experience is held constant).
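The "held constant" reading of the coefficients can be checked numerically. A minimal sketch, using the rounded coefficients 3.174, 1.404 and 0.251 from the estimated regression equation; the programmer with 5 years of experience and a score of 80 is a hypothetical case chosen only for illustration.

```python
def predicted_salary(exper, score):
    """Estimated regression equation from the slides; result is in $1000s."""
    return 3.174 + 1.404 * exper + 0.251 * score


base = predicted_salary(exper=5, score=80)        # hypothetical programmer
more_exper = predicted_salary(exper=6, score=80)  # one more year, same score
more_score = predicted_salary(exper=5, score=81)  # one more point, same experience

print(round(more_exper - base, 3))  # change attributable to experience alone
print(round(more_score - base, 3))  # change attributable to test score alone
```

The two differences equal b1 and b2 regardless of the starting values chosen, because the model has no interaction term.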
20
Multiple Coefficient of Determination
  • Relationship Among SST, SSR, SSE

SST = SSR + SSE
where
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error
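For reference, the three sums of squares have the following standard definitions, written here with ȳ for the sample mean of y and ŷi for the value predicted by the estimated regression equation (these symbols are assumed notation; the slide gives only the names):

```latex
\mathrm{SST} = \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad
\mathrm{SSR} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2, \qquad
\mathrm{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
```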
21
Multiple Coefficient of Determination
  • Excel's ANOVA Output (SSR and SST highlighted)

22
Multiple Coefficient of Determination
R² = SSR/SST
R² = 500.3285/599.7855 = .83418
  • In general, R² always increases as independent
    variables are added to the model.
  • Many analysts prefer adjusting R² for the number
    of independent variables to avoid overestimating
    the impact of adding an independent variable.

23
Adjusted Multiple Coefficient of Determination
  • Ra² = 1 − (1 − R²)(n − 1)/(n − p − 1)
  • n denotes the number of observations
  • p denotes the number of independent variables
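As a worked check, the sketch below computes R² from the SSR and SST values quoted for the programmer example and then applies the standard adjustment Ra² = 1 − (1 − R²)(n − 1)/(n − p − 1), with n = 20 observations and p = 2 independent variables. The function names are illustrative.

```python
def r_squared(ssr, sst):
    """Multiple coefficient of determination."""
    return ssr / sst


def adjusted_r_squared(r2, n, p):
    """Adjust R-squared for n observations and p independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


r2 = r_squared(ssr=500.3285, sst=599.7855)
adj = adjusted_r_squared(r2, n=20, p=2)
print(round(r2, 5), round(adj, 5))
```

The adjusted value is always at most R², and the gap widens as p grows relative to n.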

24
Adjusted Multiple Coefficient of Determination
  • Excel's Regression Statistics

25
Assumptions About the Error Term ε
The error ε is a random variable with mean of
zero.
The variance of ε, denoted by σ², is the same
for all values of the independent variables.
The values of ε are independent.
The error ε is a normally distributed random
variable reflecting the deviation between the y
value and the expected value of y given by β0 +
β1x1 + β2x2 + . . . + βpxp.
26
Multiple Regression Analysis with Two Independent
Variables
  • Graph

27
Testing for Significance
In simple linear regression, the F and t tests
provide the same conclusion.
In multiple regression, the F and t tests have
different purposes.
28
Testing for Significance: F Test
The F test is used to determine whether a
significant relationship exists between the
dependent variable and the set of all the
independent variables.
The F test is referred to as the test for
overall significance.
29
Testing for Significance: t Test
If the F test shows overall significance, the
t test is used to determine whether each of the
individual independent variables is significant.
A separate t test is conducted for each of the
independent variables in the model.
We refer to each of these t tests as a test for
individual significance.
30
Testing for Significance: F Test
Hypotheses
H0: β1 = β2 = . . . = βp = 0
Ha: One or more of the parameters is not equal to zero.
Test Statistic
F = MSR/MSE
Rejection Rule
Reject H0 if p-value < α or if F > Fα, where
Fα is based on an F distribution with p d.f. in
the numerator and n − p − 1 d.f. in the
denominator.
31
Testing for Significance: F Test
  • ANOVA Table for a Multiple Regression Model with
    p Independent Variables

32
F Test for Overall Significance
Hypotheses
H0: β1 = β2 = 0
Ha: One or both of the parameters is not equal to zero.
Rejection Rule
For α = .05 and d.f. = 2, 17: F.05 = 3.59
Reject H0 if p-value < .05 or F > 3.59
33
F Test for Overall Significance
  • Excel's ANOVA Output

The p-value is used to test for overall significance.
34
F Test for Overall Significance
Test Statistic
F = MSR/MSE = 250.16/5.85 = 42.76
Conclusion
p-value < .05, so we can reject H0. (Also, F =
42.76 > 3.59)
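The arithmetic behind the F statistic can be traced from the sums of squares quoted earlier for the programmer example. This is a minimal sketch: the function name `overall_f` is illustrative, and the critical value 3.59 is simply the F.05 figure quoted on the slide rather than something computed here.

```python
def overall_f(ssr, sst, n, p):
    """Return (MSR, MSE, F) for a model with p independent variables."""
    sse = sst - ssr
    msr = ssr / p              # regression mean square, p d.f.
    mse = sse / (n - p - 1)    # error mean square, n - p - 1 d.f.
    return msr, mse, msr / mse


msr, mse, f_stat = overall_f(ssr=500.3285, sst=599.7855, n=20, p=2)
F_CRITICAL = 3.59  # F.05 with 2 and 17 d.f., as quoted on the slide

print(round(msr, 2), round(mse, 2), round(f_stat, 2))
print("reject H0" if f_stat > F_CRITICAL else "do not reject H0")
```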
35
Testing for Significance: t Test
Hypotheses
H0: βi = 0
Ha: βi ≠ 0
Test Statistic
t = bi / s_bi, where s_bi is the estimated standard error of bi
Rejection Rule
Reject H0 if p-value < α or if t < −tα/2 or t >
tα/2, where tα/2 is based on a t
distribution with n − p − 1 degrees of freedom.
36
t Test for Significance of Individual Parameters
Hypotheses
H0: βi = 0
Ha: βi ≠ 0
Rejection Rule
For α = .05 and d.f. = 17, t.025 = 2.11
Reject H0 if p-value < .05, or if t < −2.11 or t > 2.11
37
t Test for Significance of Individual Parameters
  • Excel's Regression Equation Output

Note: Columns F-I are not shown.
The t statistic and p-value are used to test for the
individual significance of Experience.
38
t Test for Significance of Individual Parameters
  • Excel's Regression Equation Output

Note: Columns F-I are not shown.
The t statistic and p-value are used to test for the
individual significance of Test Score.
39
t Test for Significance of Individual Parameters
Test Statistics
Conclusions
Reject both H0: β1 = 0 and H0: β2 = 0. Both
independent variables are significant.
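The individual t statistics can be recomputed from the survey data. The sketch below assumes the usual formulas, t = bi / s_bi with s_bi = sqrt(MSE × the i-th diagonal entry of (XᵀX)⁻¹); this matrix notation and the helper `solve` are not in the slides, which read the values off Excel's output.

```python
import math

# Programmer salary data from the survey slide: (experience, score, salary in $1000s).
DATA = [
    (4, 78, 24.0), (7, 100, 43.0), (1, 86, 23.7), (5, 82, 34.3), (8, 86, 35.8),
    (10, 84, 38.0), (0, 75, 22.2), (1, 80, 23.1), (6, 83, 30.0), (6, 91, 33.0),
    (9, 88, 38.0), (2, 73, 26.6), (10, 75, 36.2), (5, 81, 31.6), (6, 74, 29.0),
    (8, 87, 34.0), (4, 79, 30.1), (6, 94, 33.9), (3, 70, 28.2), (3, 89, 30.0),
]


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


xs = [[1.0, exper, score] for exper, score, _ in DATA]  # intercept, x1, x2
ys = [salary for _, _, salary in DATA]
n, k = len(xs), len(xs[0])                              # k = p + 1 coefficients

xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(k)]
b = solve(xtx, xty)

sse = sum((y - sum(bj * xj for bj, xj in zip(b, x))) ** 2 for x, y in zip(xs, ys))
mse = sse / (n - k)                                     # n - p - 1 = 17 d.f.

# j-th diagonal entry of the inverse of X'X: solve against the j-th unit vector.
inv_diag = [solve(xtx, [1.0 if i == j else 0.0 for i in range(k)])[j] for j in range(k)]
t_stats = [bj / math.sqrt(mse * d) for bj, d in zip(b, inv_diag)]

T_CRITICAL = 2.11  # t.025 with 17 d.f., as quoted on the slide
for name, t in zip(("EXPER", "SCORE"), t_stats[1:]):
    print(name, round(t, 2), "significant" if abs(t) > T_CRITICAL else "not significant")
```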
40
Testing for Significance: Multicollinearity
The term multicollinearity refers to the
correlation among the independent variables.
When the independent variables are highly
correlated (say, |r| > .7), it is not possible
to determine the separate effect of any
particular independent variable on the dependent
variable.
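A quick screen for this problem is the sample correlation between each pair of independent variables. The sketch below applies the .7 rule of thumb to the two predictors in the programmer example; the function `correlation` is an illustrative helper, and a pairwise check like this will not detect collinearity that involves three or more variables jointly.

```python
import math

# Independent variables from the programmer salary survey slide.
EXPER = [4, 7, 1, 5, 8, 10, 0, 1, 6, 6, 9, 2, 10, 5, 6, 8, 4, 6, 3, 3]
SCORE = [78, 100, 86, 82, 86, 84, 75, 80, 83, 91, 88, 73, 75, 81, 74, 87, 79, 94, 70, 89]


def correlation(u, v):
    """Sample (Pearson) correlation coefficient between two equal-length lists."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    suv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    suu = sum((a - mean_u) ** 2 for a in u)
    svv = sum((b - mean_v) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


r = correlation(EXPER, SCORE)
print(round(r, 3), "high" if abs(r) > 0.7 else "below the .7 rule of thumb")
```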
41
Testing for Significance: Multicollinearity
If the estimated regression equation is to be
used only for predictive purposes,
multicollinearity is usually not a serious
problem.
Every attempt should be made to avoid including
independent variables that are highly correlated.
42
Modeling Curvilinear Relationships
  • y = β0 + β1x1 + β2x1² + ε
  • This model is called a second-order model with
    one predictor variable.

43
Modeling Curvilinear Relationships
  • Example: Reynolds, Inc.
  • Managers at Reynolds want to investigate the
    relationship between length of employment of
    their salespeople and the number of electronic
    laboratory scales sold.
  • Data

44
Modeling Curvilinear Relationships
  • Scatter Diagram for the Reynolds Example

45
Modeling Curvilinear Relationships
  • Let us consider a simple first-order model. The
    estimated regression equation is
    Sales = 111 + 2.38 Months,
  • where
    Sales = number of electronic laboratory scales sold,
    Months = the number of months the salesperson
    has been employed

46
Modeling Curvilinear Relationships
  • MINITAB output: first-order model

47
Modeling Curvilinear Relationships
  • Standardized residual plot: first-order model
  • The standardized residual plot suggests that a
    curvilinear model is needed

48
Modeling Curvilinear Relationships
  • Reynolds Example: The second-order model
  • The estimated regression equation is
    Sales = 45.3 + 6.34 Months − .0345 MonthsSq
  • where
    Sales = number of electronic laboratory scales sold,
    MonthsSq = the square of the number of months the
    salesperson has been employed
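To see how the two fitted models differ, the sketch below evaluates both estimated equations at a few lengths of employment. It assumes the coefficient on MonthsSq is negative (−.0345), which is what a curve that flattens out requires; the operators did not survive in this transcript, so treat that sign as an assumption. The month values are hypothetical inputs, not the Reynolds data.

```python
def first_order(months):
    """Estimated first-order model: Sales = 111 + 2.38 Months."""
    return 111 + 2.38 * months


def second_order(months, sq_coef=-0.0345):
    """Estimated second-order model; sq_coef is the assumed MonthsSq coefficient."""
    return 45.3 + 6.34 * months + sq_coef * months ** 2


for months in (20, 60, 100):  # hypothetical lengths of employment
    print(months, round(first_order(months), 1), round(second_order(months), 1))

# Under the sign assumption, predicted sales peak where the slope is zero.
turning_point = -6.34 / (2 * -0.0345)
print(round(turning_point, 1))
```

Beyond the turning point the fitted parabola predicts declining sales, so the second-order equation should not be extrapolated far outside the range of the observed data.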

49
Modeling Curvilinear Relationships
  • MINITAB output: second-order model

50
Modeling Curvilinear Relationships
  • Standardized residual plot: second-order model

51
Variable Selection Procedures
  • Stepwise Regression
  • Forward Selection
  • Backward Elimination

Iterative: one independent variable at a time is
added or deleted based on the F statistic.
52
Variable Selection: Stepwise Regression
  • Start with no indep. variables in model.
  • Compute F stat. and p-value for each indep.
    variable in model.
  • Any p-value > alpha to remove?
    Yes: the indep. variable with largest p-value is
    removed from model; recompute for the variables
    still in model.
    No: compute F stat. and p-value for each indep.
    variable not in model.
  • Any p-value < alpha to enter?
    Yes: the indep. variable with smallest p-value is
    entered into model; return to the removal check.
    No: stop.
53
Variable Selection: Forward Selection
  • Start with no indep. variables in model.
  • Compute F stat. and p-value for each indep.
    variable not in model.
  • Any p-value < alpha to enter?
    Yes: the indep. variable with smallest p-value is
    entered into model; repeat the previous step.
    No: stop.
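The forward selection loop can be sketched in a few lines. Note one deliberate substitution: the slides compare p-values with alpha, whereas this sketch compares each candidate's partial F statistic with a fixed threshold, because the Python standard library has no F distribution. Within a step every candidate has the same degrees of freedom, so the largest F corresponds to the smallest p-value. The threshold `f_to_enter=4.0` and all function names are illustrative assumptions.

```python
# Programmer salary data from the survey slide: (experience, score, salary in $1000s).
DATA = [
    (4, 78, 24.0), (7, 100, 43.0), (1, 86, 23.7), (5, 82, 34.3), (8, 86, 35.8),
    (10, 84, 38.0), (0, 75, 22.2), (1, 80, 23.1), (6, 83, 30.0), (6, 91, 33.0),
    (9, 88, 38.0), (2, 73, 26.6), (10, 75, 36.2), (5, 81, 31.6), (6, 74, 29.0),
    (8, 87, 34.0), (4, 79, 30.1), (6, 94, 33.9), (3, 70, 28.2), (3, 89, 30.0),
]


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def sse(rows, cols):
    """Error sum of squares for a model with an intercept plus the chosen columns."""
    xs = [[1.0] + [r[c] for c in cols] for r in rows]
    ys = [r[-1] for r in rows]
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(k)]
    b = solve(xtx, xty)
    return sum((y - sum(bj * xj for bj, xj in zip(b, x))) ** 2 for x, y in zip(xs, ys))


def forward_selection(rows, n_candidates, f_to_enter):
    """Return the column indices entered, in order of entry."""
    n = len(rows)
    selected = []
    current = sse(rows, selected)  # intercept-only model, so this equals SST
    while len(selected) < n_candidates:
        best = None
        for c in range(n_candidates):
            if c in selected:
                continue
            trial = sse(rows, selected + [c])
            df_error = n - len(selected) - 2  # n - p - 1 with p = len(selected) + 1
            f_stat = (current - trial) / (trial / df_error)
            if best is None or f_stat > best[0]:
                best = (f_stat, c, trial)
        if best[0] <= f_to_enter:
            break  # no remaining candidate clears the threshold: stop
        selected.append(best[1])
        current = best[2]
    return selected


order = forward_selection(DATA, n_candidates=2, f_to_enter=4.0)
print(order)  # column 0 is experience, column 1 is test score
```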
54
Variable Selection: Backward Elimination
  • Start with all indep. variables in model.
  • Compute F stat. and p-value for each indep.
    variable in model.
  • Any p-value > alpha to remove?
    Yes: the indep. variable with largest p-value is
    removed from model; repeat the previous step.
    No: stop.
55
Qualitative Independent Variables
In many situations we must work with
qualitative independent variables such as gender
(male, female), method of payment (cash, check,
credit card), etc.
For example, x2 might represent gender where x2 =
0 indicates male and x2 = 1 indicates female.
In this case, x2 is called a dummy or indicator
variable.
56
Qualitative Independent Variables
  • Example: Programmer Salary Survey
  • As an extension of the problem involving the
    computer programmer salary survey, suppose
    that management also believes that the
    annual salary is related to whether the
    individual has a graduate degree in
    computer science or information systems.
  • The years of experience, the score on the
    programmer aptitude test, whether the individual
    has a relevant graduate degree, and the annual
    salary ($1000) for each of the sampled 20
    programmers are shown on the next slide.

57
Qualitative Independent Variables
Salary is in $1000s.

Exper.  Score  Degr.  Salary      Exper.  Score  Degr.  Salary
   4      78    No     24.0          9      88    Yes    38.0
   7     100    Yes    43.0          2      73    No     26.6
   1      86    No     23.7         10      75    Yes    36.2
   5      82    Yes    34.3          5      81    No     31.6
   8      86    Yes    35.8          6      74    No     29.0
  10      84    Yes    38.0          8      87    Yes    34.0
   0      75    No     22.2          4      79    No     30.1
   1      80    No     23.1          6      94    Yes    33.9
   6      83    No     30.0          3      70    No     28.2
   6      91    Yes    33.0          3      89    No     30.0
58
Estimated Regression Equation
ŷ = b0 + b1x1 + b2x2 + b3x3
where x3 is a dummy variable
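Fitting this equation only requires turning the Yes/No degree column into a 0/1 column before running least squares. The sketch below assumes the coding x3 = 1 for "Yes" and x3 = 0 for "No"; the slide does not state which level is coded 1, so that choice and the helper names are assumptions. It prints the fitted coefficients rather than asserting particular values.

```python
# Survey data with the degree column: (experience, score, degree, salary in $1000s).
ROWS = [
    (4, 78, "No", 24.0), (7, 100, "Yes", 43.0), (1, 86, "No", 23.7),
    (5, 82, "Yes", 34.3), (8, 86, "Yes", 35.8), (10, 84, "Yes", 38.0),
    (0, 75, "No", 22.2), (1, 80, "No", 23.1), (6, 83, "No", 30.0),
    (6, 91, "Yes", 33.0), (9, 88, "Yes", 38.0), (2, 73, "No", 26.6),
    (10, 75, "Yes", 36.2), (5, 81, "No", 31.6), (6, 74, "No", 29.0),
    (8, 87, "Yes", 34.0), (4, 79, "No", 30.1), (6, 94, "Yes", 33.9),
    (3, 70, "No", 28.2), (3, 89, "No", 30.0),
]


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


# Assumed coding: x3 = 1 if the programmer has a graduate degree, 0 otherwise.
xs = [[1.0, exper, score, 1.0 if degree == "Yes" else 0.0]
      for exper, score, degree, _ in ROWS]
ys = [row[-1] for row in ROWS]
k = len(xs[0])

xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(k)]
b = solve(xtx, xty)

residuals = [y - sum(bj * xj for bj, xj in zip(b, x)) for x, y in zip(xs, ys)]
sse = sum(e ** 2 for e in residuals)

print("b0, b1, b2, b3 =", [round(v, 3) for v in b])
print("SSE =", round(sse, 3))
```

With this coding, b3 estimates the difference in expected salary between programmers with and without the degree when experience and test score are held constant.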
59
Qualitative Independent Variables
  • Excel's Regression Statistics

60
Qualitative Independent Variables
  • Excel's ANOVA Output

61
Qualitative Independent Variables
  • Excel's Regression Equation Output

Note: Columns F-I are not shown.
Not significant
62
More Complex Qualitative Variables
  • If a qualitative variable has k levels, k − 1
    dummy variables are required, with each dummy
    variable being coded as 0 or 1.

For example, a variable with levels A, B, and C
could be represented by x1 and x2 values of (0,
0) for A, (1, 0) for B, and (0, 1) for C.
Care must be taken in defining and interpreting
the dummy variables.
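The k − 1 coding rule is easy to mechanize. In the sketch below the first level in the list is treated as the baseline and receives all zeros, which reproduces the (0, 0), (1, 0), (0, 1) coding for A, B and C; the function name `dummy_code` is illustrative.

```python
def dummy_code(level, levels):
    """Code `level` with k - 1 zero/one variables; levels[0] is the baseline."""
    if level not in levels:
        raise ValueError(f"unknown level: {level!r}")
    return tuple(1 if level == other else 0 for other in levels[1:])


LEVELS = ["A", "B", "C"]
for level in LEVELS:
    print(level, dummy_code(level, LEVELS))
```

Each dummy coefficient is then interpreted relative to the baseline level, which is one reason care is needed when choosing and reporting the coding.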
63
More Complex Qualitative Variables
For example, a variable indicating level
of education could be represented by x1 and x2
values as follows
64
Interaction
  • If the original data set consists of observations
    for y and two independent variables x1 and x2, we
    might develop a second-order model with two
    predictor variables:
    y = β0 + β1x1 + β2x2 + β3x1² + β4x2² + β5x1x2 + ε
  • In this model, the variable z5 = x1x2 is added to
    account for the potential effects of the two
    variables acting together.
  • This type of effect is called interaction.

65
Interaction
  • Example: Tyler Personal Care
  • For a new shampoo product, the two factors believed to
    have the most influence on sales are unit selling
    price and advertising expenditure.
  • Data

66
Interaction
  • Mean Unit Sales (1000s) for the Tyler Personal
    Care Example
  • At higher selling prices, the effect of increased
    advertising expenditure diminishes. These
    observations provide evidence of interaction
    between the price and advertising expenditure
    variables.

67
Interaction
  • Mean Sales as a Function of Selling Price and
    Advertising Expenditure

68
Interaction
  • To account for the effect of interaction, use the
    following regression model
    y = β0 + β1x1 + β2x2 + β3x1x2 + ε
  • where
    y = unit sales (1000s),
    x1 = price ($),
    x2 = advertising expenditure ($1000s).
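A short derivation shows why a cross-product term captures interaction. It assumes the model has the usual form with the term x1x2 and coefficient β3 (the numbering of the coefficients is an assumption of this sketch):

```latex
E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2
\quad\Longrightarrow\quad
\frac{\partial E(y)}{\partial x_2} = \beta_2 + \beta_3 x_1
```

The effect of one more unit of advertising expenditure therefore depends on the price x1. If β3 is negative, that effect shrinks as price rises, which is the pattern described for the Tyler Personal Care means.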

69
Interaction
  • General Linear Model involving three independent
    variables (z1, z2, and z3)
    y = β0 + β1z1 + β2z2 + β3z3 + ε
  • where
    y = Sales = unit sales (1000s)
    z1 = x1 (Price) = price of the product ($)
    z2 = x2 (AdvExp) = advertising expenditure ($1000s)
    z3 = x1x2 (PriceAdv) = interaction term (Price
    times AdvExp)

70
Interaction
  • MINITAB Output for the Tyler Personal Care
    Example