Title: Forecasting Theory
1. Forecasting Theory
2. What is Forecasting?
- Forecast: to calculate or predict some future event or condition, usually as a result of rational study or analysis of pertinent data. (Webster's Dictionary)
3. What is Forecasting?
- Forecasting methods
  - Qualitative: intuitive, educated guesses that may or may not depend on past data
  - Quantitative: based on mathematical or statistical models
- The goal of forecasting is to reduce forecast error.
4. What is Forecasting?
- We will consider two types of forecasts based on mathematical models:
  - Regression forecasting
  - Single-variable (time series) forecasting
5. What is Forecasting?
- Regression forecasting
  - We use the relationship between the variable of interest and the other variables that explain its variation to make predictions.
  - The explanatory variables are non-stochastic.
  - The explanatory variables are independent; the variable of interest is dependent.
6. What is Forecasting?
- Regression forecasting
  - Height is the independent variable.
  - Weight is the dependent variable.
7. What is Forecasting?
- Single-variable (time series) forecasting
  - We use the past history of the variable of interest to predict the future.
  - Predictions exploit correlations between past history and the future.
  - Past history is stochastic.
8. What is Forecasting?
- Single-variable (time series) forecasting
9. Normal Distribution
- A continuous random variable X is normally distributed if its density function is given by
  $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$.
- In this case,
  - $E[X] = \mu$
  - $\mathrm{var}(X) = \sigma^2$.
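As a quick check of the two moments above, here is a minimal sketch, assuming numpy and scipy are available and using hypothetical values mu = 100 and sigma = 4: it evaluates the density directly from the formula and verifies E[X] = mu and var(X) = sigma^2 by simulation.

```python
import numpy as np
from scipy.stats import norm

mu, sigma = 100.0, 4.0          # hypothetical mean and standard deviation

# Evaluate the normal density at one point, directly from the formula above
x = 103.0
pdf_manual = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))
pdf_scipy = norm.pdf(x, loc=mu, scale=sigma)
print(pdf_manual, pdf_scipy)    # the two values agree

# Check E[X] = mu and var(X) = sigma^2 by simulation
rng = np.random.default_rng(0)
sample = rng.normal(mu, sigma, size=100_000)
print(sample.mean(), sample.var())   # close to 100 and 16
```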
10. Normal Density Function
11. Maximum Likelihood Estimation
- Suppose that $Y_1, \ldots, Y_n$ are continuous random variables with respective densities $f_i(y; \theta)$ that depend on some common parameter $\theta$ (which can be vector-valued). Assume that
  - $\theta$ is unknown
  - we observe $y_1, \ldots, y_n$.
- We want to estimate the value of $\theta$ associated with $Y_1, \ldots, Y_n$. Intuitively, we want to find the value of $\theta$ that is most likely to give rise to the data sample $y_1, \ldots, y_n$.
12. Maximum Likelihood Estimation
- Example: Consider the data sample $y_1, \ldots, y_{20}$ below. Assume that all the densities are the same, and that the unknown parameter is the mean. Which of the two distributions most likely produced the data sample below?
13. Maximum Likelihood Estimation
- Assume that the observations are independent. We define the likelihood function
  $L(\theta \mid y_1, \ldots, y_n) = \prod_{i=1}^{n} f_i(y_i; \theta)$.
- In maximum likelihood estimation, we choose the value of $\theta$ that maximizes the likelihood function.
14. Maximum Likelihood Estimation
- Furthermore, since the logarithm is a monotone increasing function, the value of $\theta$ that maximizes $L(\theta \mid y_1, \ldots, y_n)$ also maximizes the log of the likelihood function,
  $\log L(\theta \mid y_1, \ldots, y_n) = \sum_{i=1}^{n} \log f_i(y_i; \theta)$.
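Here is a minimal sketch of this idea, assuming numpy and scipy are available: for a small hypothetical sample assumed to come from N(theta, sigma^2) with known sigma = 2, numerically maximizing the log-likelihood (by minimizing its negative) recovers the sample mean.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical sample, assumed drawn from N(theta, sigma^2) with known sigma
y = np.array([10.1, 11.3, 9.4, 10.8, 12.0, 9.9, 10.5, 11.1])
sigma = 2.0

def neg_log_likelihood(theta):
    # -log L(theta | y) for independent N(theta, sigma^2) observations
    return (0.5 * np.sum((y - theta) ** 2) / sigma ** 2
            + len(y) * np.log(sigma * np.sqrt(2 * np.pi)))

result = minimize_scalar(neg_log_likelihood)
print(result.x, y.mean())   # the numerical maximizer matches the sample mean
```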
15. Maximum Likelihood Estimation and Least Squares Estimation
- Now assume that $Y_i$ is normally distributed with mean $\mu_i(\theta)$, where $\theta$ is unknown. Assume also that all of the densities have a common known variance $\sigma^2$. Then the log likelihood function becomes
  $\log L(\theta \mid y_1, \ldots, y_n) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - \mu_i(\theta)\right)^2$.
16. Maximum Likelihood Estimation and Least Squares Estimation
- Hence maximizing $L(\theta \mid y_1, \ldots, y_n)$ is equivalent to minimizing the sum of squared deviations
  $S(\theta) = \sum_{i=1}^{n}\left(y_i - \mu_i(\theta)\right)^2$.
- The value of $\theta$ that minimizes $S(\theta)$ is called the least squares estimate of $\theta$.
17. Regression Forecasting
- We suppose that Y is a variable of interest, and $X_1, \ldots, X_p$ are explanatory or predictor variables such that
  $Y = h(X_1, \ldots, X_p; \beta)$.
- h is the mathematical model that determines the relationship between the variable of interest and the explanatory variables.
- $\beta = (\beta_0, \ldots, \beta_m)'$ are the model parameters.
18. Regression Forecasting
- Further assume that
  - we know h (i.e., the model is known), but we do not know $\beta$
  - we have noisy measurements of the variable of interest, Y:
    $y_i = h(x_{i1}, \ldots, x_{ip}; \beta) + e_i$.
19. Regression Forecasting
- The random noise terms $e_i$ satisfy
  - $E[e_i] = 0$ for all i.
  - $\mathrm{var}(e_i) = \sigma^2$, a constant that does not depend on i.
  - The $e_i$'s are uncorrelated.
  - The $e_i$'s are each normally distributed (which, together with being uncorrelated, implies that they are independent), i.e.,
    $e_i \sim N(0, \sigma^2)$.
20. Regression Forecasting
- Note that since
  $y_i = h(x_{i1}, \ldots, x_{ip}; \beta) + e_i$
  and
  $e_i \sim N(0, \sigma^2)$,
  then
  $y_i \sim N\left(h(x_{i1}, \ldots, x_{ip}; \beta), \sigma^2\right)$.
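A minimal sketch of this sampling model, assuming numpy is available and using a hypothetical linear h with made-up parameter values: each observation is its mean h(x_i; beta) plus independent N(0, sigma^2) noise.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical linear model h(x; beta) = beta0 + beta1 * x with Gaussian noise
beta0, beta1, sigma = 1.5, 0.6, 0.3
x = np.linspace(1, 6, 20)

mean = beta0 + beta1 * x                  # h(x_i; beta)
y = mean + rng.normal(0, sigma, x.size)   # y_i ~ N(h(x_i; beta), sigma^2)
print(np.round(y[:5], 3))
```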
21. Regression Forecasting
- For any values of the explanatory variables $x_1, \ldots, x_p$, if $\beta$ is known, we can predict y as
  $y = h(x_1, \ldots, x_p; \beta)$.
- Since $\beta$ is unknown, we use least squares estimation to estimate $\beta$, which we denote by $\hat{\beta}$. In this case, we forecast y as
  $\hat{y} = h(x_1, \ldots, x_p; \hat{\beta})$.
22. Regression Forecasting
- Example:

  x    y
  1    2.6
  2.3  2.8
  3.1  3.1
  4.8  4.7
  5.6  5.1
  6.3  5.3

- What are the best values for $\beta_0$ and $\beta_1$?
23. Regression Forecasting
- Residuals are the differences between the observed values and the predicted values. We define the residual for the ith observation as
  $e_i = y_i - \hat{y}_i$.
- A good set of parameters is one for which the residuals are small.
24. Regression Forecasting
- More specifically, if
  $e_i = y_i - h(x_{i1}, \ldots, x_{ip}; \beta)$,
  then we choose $\beta$ to minimize
  $S(\beta) = \sum_{i=1}^{n} e_i^2$.
25. Regression Forecasting
- Examples of Regression Models
26. Constant Mean Regression
- Suppose that the $y_i$'s are a constant value plus noise:
  $y_i = \beta_0 + e_i$,
  i.e., $\beta = \beta_0$. Hence
  $y_i \sim N(\beta_0, \sigma^2)$.
- We want to determine the value of $\beta_0$ that minimizes
  $S(\beta_0) = \sum_{i=1}^{n} (y_i - \beta_0)^2$.
27. Constant Mean Regression
- Taking the derivative of $S(\beta_0)$ gives
  $\frac{dS}{d\beta_0} = -2\sum_{i=1}^{n}(y_i - \beta_0)$.
- Setting this equal to zero leads to
  $\hat{\beta}_0 = \frac{1}{n}\sum_{i=1}^{n} y_i = \bar{y}$.
- Hence the sample mean is the least squares estimator for $\beta_0$.
28. Constant Mean Regression
y
98.30963
99.18569
101.2684
97.52997
103.4013
98.84521
111.1842
98.70812
93.08922
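A minimal sketch applying the result above to the y values listed on this slide, assuming numpy is available: the least squares estimate of beta_0 is simply the sample mean.

```python
import numpy as np

# y values from the constant mean regression example above
y = np.array([98.30963, 99.18569, 101.2684, 97.52997, 103.4013,
              98.84521, 111.1842, 98.70812, 93.08922])

beta0_hat = y.mean()                 # least squares estimate of beta_0
S = np.sum((y - beta0_hat) ** 2)     # minimized sum of squared deviations
print(beta0_hat, S)
```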
29. Simple Linear Regression
- Consider the model
  $y_i = \beta_0 + \beta_1 x_i + e_i$,
  i.e., $\beta = (\beta_0, \beta_1)'$. Hence
  $y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$.
- We want to determine the values of $\beta_0$ and $\beta_1$ that minimize
  $S(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$.
30. Simple Linear Regression
- Setting the first partial derivatives equal to zero gives the normal equations
  $\sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i) = 0$ and $\sum_{i=1}^{n} x_i (y_i - \beta_0 - \beta_1 x_i) = 0$.
31. Simple Linear Regression
- Solving for $\beta_0$ and $\beta_1$ leads to the least squares estimates
  $\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$, $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$.
- (The derivation is left as a homework exercise.)
32. Simple Linear Regression
- Define $e = (e_1, \ldots, e_n)'$, where $e_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i$.
- The normal equations imply that
  $\sum_{i=1}^{n} e_i = 0$ and $\sum_{i=1}^{n} x_i e_i = 0$.
33. Simple Linear Regression
x y
1 2.6
2.3 2.8
3.1 3.1
4.8 4.7
5.6 5.1
6.3 5.3
34. Simple Linear Regression
x y
1 2.6
2.3 2.8
3.1 3.1
4.8 4.7
5.6 5.1
6.3 5.3
35. Simple Linear Regression
- Example (continued)
- Regression equation (least squares fit to the data above; see the sketch below):
  $\hat{y} \approx 1.68 + 0.58\,x$
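A minimal sketch of this fit, assuming numpy is available: it uses the (x, y) values from the example table together with the closed-form estimates from slide 31, and also checks the two residual identities from slide 32.

```python
import numpy as np

# (x, y) data from the simple linear regression example
x = np.array([1.0, 2.3, 3.1, 4.8, 5.6, 6.3])
y = np.array([2.6, 2.8, 3.1, 4.7, 5.1, 5.3])

xbar, ybar = x.mean(), y.mean()

# Closed-form least squares estimates for the simple linear model
beta1_hat = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
beta0_hat = ybar - beta1_hat * xbar
print(beta0_hat, beta1_hat)          # roughly 1.68 and 0.58

# The residuals satisfy the normal equations (sums are zero up to round-off)
residuals = y - (beta0_hat + beta1_hat * x)
print(residuals.sum(), (x * residuals).sum())
```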
36. General Linear Regression
- Consider the linear regression model
  $y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + e_i$,
  or
  $y_i = x_i'\beta + e_i$,
  where $x_i = (1, x_{i1}, \ldots, x_{ip})'$ and $\beta = (\beta_0, \ldots, \beta_p)'$.
37. General Linear Regression
- Suppose that we have n observations $y_i$. We introduce matrix notation and define
  $y = (y_1, \ldots, y_n)'$, $e = (e_1, \ldots, e_n)'$, and $X$ as the matrix whose ith row is $x_i' = (1, x_{i1}, \ldots, x_{ip})$.
- Note that y is $n \times 1$, e is $n \times 1$, and X is $n \times (p+1)$.
38. General Linear Regression
- Then we can write the regression model as
  $y = X\beta + e$.
- Note that y has a mean vector and covariance matrix given by
  $E[y] = X\beta$ and $\mathrm{var}(y) = \sigma^2 I$,
  where I is the $n \times n$ identity matrix.
39. General Linear Regression
- Note that by $\mathrm{var}(y)$ we mean the matrix whose (i, j) entry is $\mathrm{cov}(y_i, y_j)$, that is, $\mathrm{var}(y) = E\left[(y - E[y])(y - E[y])'\right]$.
- Note that this is a symmetric matrix.
40. General Linear Regression
- We assume that the matrix X, which is called the design matrix, is of full rank. This means that the columns of the X matrix are not linearly related.
- A violation of this assumption would indicate that some of the independent variables are redundant, since at least one of the variables would contain the same information as a linear combination of the others.
41. General Linear Regression
- In matrix notation, the least squares criterion can be expressed as minimizing
  $S(\beta) = (y - X\beta)'(y - X\beta)$.
- The least squares estimator is given by
  $\hat{\beta} = (X'X)^{-1}X'y$.
- (Proof is omitted.)
42. General Linear Regression
- It follows that the prediction for Y is given by
  $\hat{y} = X\hat{\beta}$;
  for a new set of explanatory values $x = (1, x_1, \ldots, x_p)'$, the forecast is $\hat{y} = x'\hat{\beta}$. (See the sketch below.)
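A minimal sketch of the matrix computation, assuming numpy is available: it uses the example data from slide 22 so the result can be compared with the simple regression estimates, and it solves the normal equations with a linear solve rather than forming the inverse explicitly.

```python
import numpy as np

# Example data; the design matrix has a leading column of ones (p = 1)
x = np.array([1.0, 2.3, 3.1, 4.8, 5.6, 6.3])
y = np.array([2.6, 2.8, 3.1, 4.7, 5.1, 5.3])
X = np.column_stack([np.ones_like(x), x])   # n x (p + 1) design matrix

# beta_hat = (X'X)^(-1) X'y, computed via a linear solve for numerical stability
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat                        # fitted values
print(beta_hat)                             # matches the simple regression estimates
```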
43. General Linear Regression
- Example: Simple Linear Regression
x y
1 2.6
2.3 2.8
3.1 3.1
4.8 4.7
5.6 5.1
6.3 5.3
44. General Linear Regression
- Example: Simple Linear Regression (p = 1)
- For these data, the design matrix X has a leading column of ones and a second column containing the x values, so X is $6 \times 2$.
x y
1 2.6
2.3 2.8
3.1 3.1
4.8 4.7
5.6 5.1
6.3 5.3
45. General Linear Regression
- Example: Simple Linear Regression (p = 1)
- Applying $\hat{\beta} = (X'X)^{-1}X'y$ to these data reproduces the least squares estimates obtained earlier.
46. Properties of Least Squares Estimators
- The least squares estimator of $\beta$ is unbiased: $E[\hat{\beta}] = \beta$.
- The $(p+1) \times (p+1)$ covariance matrix of the least squares estimator is given by
  $\mathrm{var}(\hat{\beta}) = \sigma^2 (X'X)^{-1}$.
- If the errors are normally distributed, then
  $\hat{\beta} \sim N\left(\beta, \sigma^2 (X'X)^{-1}\right)$.
47. Properties of Least Squares Estimators
- It follows from the derivation of the least squares estimate that the residuals satisfy
  $X'e = 0$.
- In particular, $\sum_{i=1}^{n} e_i = 0$: the residuals sum to zero.
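A minimal numerical check of this property, assuming numpy is available and again using the example data: X'e comes out as the zero vector up to floating-point round-off, and its first entry is the sum of the residuals.

```python
import numpy as np

x = np.array([1.0, 2.3, 3.1, 4.8, 5.6, 6.3])
y = np.array([2.6, 2.8, 3.1, 4.7, 5.1, 5.3])
X = np.column_stack([np.ones_like(x), x])

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat                 # residual vector

# X'e should be the zero vector (up to round-off);
# its first entry is the sum of the residuals.
print(X.T @ e)
```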
48. Testing the Regression Model
- How well does the regression line describe the relationship between the independent and dependent variables?
49. Testing the Regression Model
- For each observation, the total deviation $y_i - \bar{y}$ splits into an explained deviation $\hat{y}_i - \bar{y}$ and an unexplained deviation $y_i - \hat{y}_i$.
50. Testing the Regression Model
- Let's analyze these variations:
  $\sum_{i=1}^{n}(y_i - \bar{y})^2 = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 + 2\sum_{i=1}^{n}(\hat{y}_i - \bar{y})(y_i - \hat{y}_i)$.
- But the cross-product term is zero, because the residuals satisfy $X'e = 0$.
51. Testing the Regression Model
- Hence
  $\mathrm{SSTO} = \mathrm{SSR} + \mathrm{SSE}$, where
  - Total sum of squares: $\mathrm{SSTO} = \sum_{i=1}^{n}(y_i - \bar{y})^2$
  - Sum of squares due to regression: $\mathrm{SSR} = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2$
  - Sum of squares due to error: $\mathrm{SSE} = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$
52. Coefficient of Determination
- The coefficient of determination
  $R^2 = \frac{\mathrm{SSR}}{\mathrm{SSTO}} = 1 - \frac{\mathrm{SSE}}{\mathrm{SSTO}}$
  is a measure of how well the model is doing in explaining the variation of the observations around their mean.
- A large $R^2$ (near 1) indicates that a large portion of the variation is explained by the model.
- A small value of $R^2$ (near 0) indicates that only a small fraction of the variation is explained by the model.
53. Correlation Coefficient
- The correlation coefficient R is the square root of the coefficient of determination, taken with the sign of $\hat{\beta}_1$. For simple linear regression, it can also be expressed as
  $R = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \sum_{i=1}^{n}(y_i - \bar{y})^2}}$.
- R varies between -1 and 1, and quantifies the strength of the association between the independent and dependent variables. A value of R close to 1 indicates a strong positive correlation; a value close to -1 indicates a strong negative correlation. A value close to zero indicates weak or no correlation.
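A minimal sketch, assuming numpy is available, that computes R for the example data both from the formula above and from numpy's built-in correlation matrix.

```python
import numpy as np

x = np.array([1.0, 2.3, 3.1, 4.8, 5.6, 6.3])
y = np.array([2.6, 2.8, 3.1, 4.7, 5.1, 5.3])

xd, yd = x - x.mean(), y - y.mean()
R_manual = np.sum(xd * yd) / np.sqrt(np.sum(xd ** 2) * np.sum(yd ** 2))
R_numpy = np.corrcoef(x, y)[0, 1]    # off-diagonal entry of the correlation matrix
print(R_manual, R_numpy)             # both about 0.97; R**2 is about 0.95
```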
54. Correlation
- (Figure: scatter plots illustrating positive correlation, negative correlation, and no correlation.)
55. Testing the Regression Model
- Example: Simple Linear Regression
- Regression equation: $\hat{y} \approx 1.68 + 0.58\,x$

  x    y
  1    2.6
  2.3  2.8
  3.1  3.1
  4.8  4.7
  5.6  5.1
  6.3  5.3

- SSTO = 7.573
- SSR = 7.172
- SSE = 0.394
- $R^2$ = 0.947
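A minimal sketch, assuming numpy is available, that reproduces these sums of squares for the example data; small differences from the figures on the slide come from rounding of the fitted coefficients.

```python
import numpy as np

x = np.array([1.0, 2.3, 3.1, 4.8, 5.6, 6.3])
y = np.array([2.6, 2.8, 3.1, 4.7, 5.1, 5.3])
X = np.column_stack([np.ones_like(x), x])

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat

SSTO = np.sum((y - y.mean()) ** 2)     # total sum of squares
SSR = np.sum((y_hat - y.mean()) ** 2)  # sum of squares due to regression
SSE = np.sum((y - y_hat) ** 2)         # sum of squares due to error
R2 = SSR / SSTO
print(SSTO, SSR, SSE, R2)              # about 7.57, 7.19, 0.39, 0.95
```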
56. Estimating the Variance
- So far we have assumed that we know the variance $\sigma^2$. But in general this value will be unknown. We can estimate $\sigma^2$ from the sample data by
  $s^2 = \frac{\mathrm{SSE}}{n - p - 1} = \frac{1}{n - p - 1}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$.
57. Confidence Interval for Regression Line
- Simple Linear Regression: Suppose $x_0$ is a specified value of the independent variable. A $100(1-\alpha)\%$ confidence interval for the value of the mean of the dependent variable $y_0$ at $x_0$ is given by
  $\hat{\beta}_0 + \hat{\beta}_1 x_0 \pm t_{1-\alpha/2,\,n-2}\, s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}$.
58. Prediction Interval for an Observation
- Simple Linear Regression: A $100(1-\alpha)\%$ prediction interval for an observation $y_0$ associated with $x_0$ is given by
  $\hat{\beta}_0 + \hat{\beta}_1 x_0 \pm t_{1-\alpha/2,\,n-2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}$.
- Note that the prediction interval is wider than the confidence interval, since it must also account for the noise in a single new observation. (See the sketch below.)
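A minimal sketch, assuming numpy and scipy are available, that evaluates both intervals for the example data at a hypothetical value x0 = 4 with alpha = 0.05.

```python
import numpy as np
from scipy.stats import t

x = np.array([1.0, 2.3, 3.1, 4.8, 5.6, 6.3])
y = np.array([2.6, 2.8, 3.1, 4.7, 5.1, 5.3])
n = len(x)

xbar = x.mean()
Sxx = np.sum((x - xbar) ** 2)
beta1 = np.sum((x - xbar) * (y - y.mean())) / Sxx
beta0 = y.mean() - beta1 * xbar

SSE = np.sum((y - (beta0 + beta1 * x)) ** 2)
s = np.sqrt(SSE / (n - 2))              # estimate of sigma for p = 1

x0 = 4.0                                # hypothetical new value of x
y0_hat = beta0 + beta1 * x0
tcrit = t.ppf(0.975, df=n - 2)          # 95% two-sided critical value

ci_half = tcrit * s * np.sqrt(1/n + (x0 - xbar) ** 2 / Sxx)       # mean response
pi_half = tcrit * s * np.sqrt(1 + 1/n + (x0 - xbar) ** 2 / Sxx)   # new observation
print(y0_hat - ci_half, y0_hat + ci_half)   # confidence interval for the mean
print(y0_hat - pi_half, y0_hat + pi_half)   # prediction interval for an observation
```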
59. What is Forecasting (Revisited)?
- Statistical forecasting is not predicting
  - a value.
- Statistical forecasting is predicting
  - the expected value
  - the variability about the expected value.
60. Homework
- Complete the proof of the result that, for the simple linear regression model,
  $\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$.
- Prove that if Y is a random variable with finite expected value, then the constant c that minimizes $E[(Y - c)^2]$ is $c = E[Y]$.
61. Homework
- Suppose that the following data represent the total costs and the number of units produced by a company.
  - Graph the relationship between X and Y.
  - Determine the simple linear regression line relating Y to X.
  - Predict the costs for producing 10 units. Give a 95% confidence interval for the costs, and for the expected value (mean) of the costs associated with 10 units.
  - Compute the SSTO, SSR, SSE, R and $R^2$. Interpret the value of $R^2$.

  Total Cost (Y):     25  11  34  23  32
  Units Produced (X):  5   2   8   4   6
62. Homework
- Consider the fuel consumption data on the next slide, and the following model, which relates fuel consumption (Y) to the average hourly temperature (X1) and the chill index (X2):
  $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + e_i$.
  - Plot Y versus X1 and Y versus X2.
  - Determine the least squares estimates for the model parameters.
  - Predict the fuel consumption when the temperature is 35 and the chill index is 10.
  - Compute the SSTO, SSR, SSE and $R^2$. Interpret the value of $R^2$.
63. Data for Problem 4
  Average Hourly Temperature (x_i1)   Chill Index (x_i2)   Weekly Fuel Consumption (y_i)
28 18 12.4
32.5 24 12.3
28 14 11.7
39 22 11.2
57.8 16 9.5
45.9 8 9.4
58.1 1 8.0
62.5 0 7.5
64. References
- Bovas Abraham and Johannes Ledolter, Statistical Methods for Forecasting, Wiley Series in Probability and Mathematical Statistics, Wiley, 1983.
- Stanton A. Glantz and Bryan K. Slinker, Primer of Applied Regression and Analysis of Variance, Second Edition, McGraw-Hill, 2001.
- Spyros Makridakis, Steven C. Wheelwright, and Rob J. Hyndman, Forecasting: Methods and Applications, Third Edition, John Wiley & Sons, 1998.