Title: Topic 6: Estimation and Prediction of Yh
2 Outline
- Estimation and inference of E(Yh)
- Prediction of a new observation
- Construction of a confidence band for the entire
regression line
3 Estimation of E(Yh)
- $E(Y_h) = \mu_h = \beta_0 + \beta_1 X_h$, the mean value of Y for the subpopulation with $X = X_h$
- We will estimate $E(Y_h)$ by $\hat{\mu}_h = b_0 + b_1 X_h$
- KNNL use $\hat{Y}_h$ for this estimate; see equation (2.28) on p. 52
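As a worked example, plugging the Toluca estimates from the Parameter Estimates table ($b_0 = 62.36586$, $b_1 = 3.57020$) into $\hat{\mu}_h = b_0 + b_1 X_h$ reproduces, up to rounding, the Predicted Value column for the two added lot sizes:

```latex
\hat{\mu}_{65}  = 62.36586 + 3.57020 \times 65  \approx 294.429 \\
\hat{\mu}_{100} = 62.36586 + 3.57020 \times 100 \approx 419.386
```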
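4 Theory for Estimation of E(Yh)
- $\hat{\mu}_h$ is Normal with mean $\mu_h$ and variance
  $\sigma^2(\hat{\mu}_h) = \sigma^2\left[\frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum(X_i - \bar{X})^2}\right]$
- The Normality is a consequence of the fact that $b_0 + b_1 X_h$ is a linear combination of the $Y_i$'s
- See KNNL pp. 52-54 for details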
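One way to see where the two terms in the variance come from (a standard derivation, sketched here rather than quoted from KNNL): since $b_0 = \bar{Y} - b_1 \bar{X}$, the estimate can be rewritten around the point $(\bar{X}, \bar{Y})$, and $\bar{Y}$ and $b_1$ are uncorrelated.

```latex
\hat{\mu}_h = b_0 + b_1 X_h = \bar{Y} + b_1 (X_h - \bar{X}) \\
\sigma^2(\hat{\mu}_h) = \sigma^2(\bar{Y}) + (X_h - \bar{X})^2 \, \sigma^2(b_1)
  = \frac{\sigma^2}{n} + (X_h - \bar{X})^2 \frac{\sigma^2}{\sum (X_i - \bar{X})^2}
```

The second term is the teeter-totter effect: the fitted line pivots about $(\bar{X}, \bar{Y})$, so uncertainty in the slope matters more the farther $X_h$ is from $\bar{X}$.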
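5 Application of the Theory
- We estimate $\sigma^2(\hat{\mu}_h)$ by $s^2(\hat{\mu}_h) = s^2\left[\frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum(X_i - \bar{X})^2}\right]$, where $s^2 = \text{MSE}$
- It then follows that $\frac{\hat{\mu}_h - E(Y_h)}{s(\hat{\mu}_h)} \sim t(n-2)$
- Confidence intervals and significance tests follow directly from this result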
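To check the formula numerically, the sketch below evaluates $s(\hat{\mu}_h)$ for the Toluca data. It assumes the summary values MSE = 2383.716, $\bar{X} = 70$ and $\sum(X_i - \bar{X})^2 = 19800$ reported in KNNL for this example; they are not printed in these notes, but they agree with the Std Error Mean Predict column of the SAS output (for example, $25 \times 9.7647^2 \approx 2383.7$).

```python
import math

# Toluca summary values (ASSUMED from KNNL; not printed in these notes)
MSE = 2383.716   # s^2, mean squared error
N = 25           # number of lots
XBAR = 70.0      # mean lot size
SSX = 19800.0    # sum of (X_i - XBAR)^2


def se_mean_response(xh, mse=MSE, n=N, xbar=XBAR, ssx=SSX):
    """Estimated standard error s(mu_hat_h) of the fitted mean at X = xh."""
    return math.sqrt(mse * (1.0 / n + (xh - xbar) ** 2 / ssx))


# Compare with the "Std Error Mean Predict" column of the PROC REG output
for xh in (65, 70, 80, 100):
    print(xh, round(se_mean_response(xh), 4))
```

The smallest standard error occurs at $X_h = \bar{X} = 70$, where only the $1/n$ term remains.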
6 95% Confidence Interval for E(Yh)
- $\hat{\mu}_h \pm t_c \, s(\hat{\mu}_h)$
- where $t_c = t(.975, n-2)$
- NOTE: significance tests can be constructed, but they are rarely used in practice
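A minimal sketch of the 95% interval for the Toluca fit, assuming the same KNNL summary values (MSE = 2383.716, $\bar{X} = 70$, $\sum(X_i - \bar{X})^2 = 19800$) and the table value $t(.975, 23) \approx 2.068658$. It should match the 95% CL Mean columns of the PROC REG output up to rounding of $b_0$ and $b_1$.

```python
import math

B0, B1 = 62.36586, 3.57020                        # Parameter Estimates (SAS output)
MSE, N, XBAR, SSX = 2383.716, 25, 70.0, 19800.0   # ASSUMED Toluca summaries (KNNL)
T_C = 2.068658                                    # t(.975, 23), ASSUMED table value


def ci_mean_response(xh, tc=T_C):
    """95% confidence interval for E(Y_h): mu_hat_h +/- t_c * s(mu_hat_h)."""
    mu_hat = B0 + B1 * xh
    se = math.sqrt(MSE * (1.0 / N + (xh - XBAR) ** 2 / SSX))
    return mu_hat - tc * se, mu_hat + tc * se


for xh in (65, 100):
    lo, hi = ci_mean_response(xh)
    print(xh, round(lo, 4), round(hi, 4))
```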
7 Toluca Company Example (p. 19)
- Manufactures refrigeration equipment
- One replacement part is manufactured in lots of varying sizes
- Company wants to determine the optimum lot size
- To do this, the company first needs to describe the relationship between work hours and lot size
8 Scatterplot with regression line
9 SAS CODE
* Generating the data set;
data toluca;
  infile '../data/CH01TA01.txt';
  input lotsize hours;
* Add lot sizes 65 and 100 with hours missing;
data other;
  lotsize=65; output;
  lotsize=100; output;
data toluca1; set toluca other;
proc print data=toluca1; run;
10 SAS CODE
* Generating the confidence intervals for all values of X in the data set;
proc reg data=toluca1;
  model hours=lotsize/clm;
  id lotsize;
run;
The clm option generates confidence intervals for the mean.
11 Parameter Estimates
Variable   DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
Intercept   1            62.36586        26.17743     2.38    0.0259
lotsize     1             3.57020         0.34697    10.29    <.0001

Output Statistics
Obs  lotsize  Dependent Variable  Predicted Value  Std Error Mean Predict  95% CL Mean (lower, upper)
 1      80          399.0000          347.9820            10.3628           326.5449, 369.4191
25      70          323.0000          312.2800             9.7647           292.0803, 332.4797
26      65              .             294.4290             9.9176           273.9129, 314.9451
27     100              .             419.3861            14.2723           389.8615, 448.9106
12 Notes
- The standard error is affected by how far $X_h$ is from $\bar{X}$ (see Figure 2.6)
- Recall the teeter-totter idea: a change in the slope has a bigger impact on $\hat{Y}$ as you move away from $\bar{X}$
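13 Prediction of Yh(new)
- Want to predict the value of a new observation at $X = X_h$
- Model: $Y_{h(new)} = \beta_0 + \beta_1 X_h + \varepsilon$
- Since $E(\varepsilon) = 0$, the point prediction is the same value as for $E(Y_h)$, namely $\hat{\mu}_h = b_0 + b_1 X_h$
- The prediction interval, however, relies heavily on the assumption that the $\varepsilon$ are Normally distributed
Note!!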
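14 Prediction of Yh(new)
- $\sigma^2(\text{pred}) = \text{Var}(Y_{h(new)} - \hat{\mu}_h) = \text{Var}(\hat{\mu}_h) + \text{Var}(\varepsilon)$, estimated by
  $s^2(\text{pred}) = s^2(\hat{\mu}_h) + s^2 = s^2\left[1 + \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum(X_i - \bar{X})^2}\right]$
- It then follows that $\frac{Y_{h(new)} - \hat{\mu}_h}{s(\text{pred})} \sim t(n-2)$, giving the prediction interval $\hat{\mu}_h \pm t_c \, s(\text{pred})$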
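A minimal sketch of the 95% prediction interval for the Toluca fit. It assumes the KNNL summary values (MSE = 2383.716, $\bar{X} = 70$, $\sum(X_i - \bar{X})^2 = 19800$) and $t(.975, 23) \approx 2.068658$; the only change from the confidence interval is adding MSE inside the square root.

```python
import math

B0, B1 = 62.36586, 3.57020                        # Parameter Estimates (SAS output)
MSE, N, XBAR, SSX = 2383.716, 25, 70.0, 19800.0   # ASSUMED Toluca summaries (KNNL)
T_C = 2.068658                                    # t(.975, 23), ASSUMED table value


def prediction_interval(xh, tc=T_C):
    """95% prediction interval for a new observation at X = xh."""
    y_hat = B0 + B1 * xh
    var_mean = MSE * (1.0 / N + (xh - XBAR) ** 2 / SSX)   # s^2(mu_hat_h)
    s_pred = math.sqrt(MSE + var_mean)                     # add variance about the line
    return y_hat - tc * s_pred, y_hat + tc * s_pred


for xh in (65, 100):
    print(xh, [round(v, 4) for v in prediction_interval(xh)])
```

The results should agree, up to rounding, with the 95% CL Predict columns that the cli option produces.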
15 Notes
- The procedure can be modified for the mean of m new observations at $X = X_h$ (see equations 2.39a and 2.39b on p. 60)
- The standard error is affected by how far $X_h$ is from $\bar{X}$ (see Figure 2.6)
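A sketch of the modification for the mean of m new observations, assuming (2.39a) has the form $s^2(\text{predmean}) = \text{MSE}/m + s^2(\hat{\mu}_h)$ and using the same assumed KNNL summary values for the Toluca data. The values of m below are hypothetical, chosen only to show the trend.

```python
import math

MSE, N, XBAR, SSX = 2383.716, 25, 70.0, 19800.0   # ASSUMED Toluca summaries (KNNL)


def se_pred_mean(xh, m):
    """s(predmean): standard error for the mean of m new observations at X = xh."""
    var_mean = MSE * (1.0 / N + (xh - XBAR) ** 2 / SSX)   # s^2(mu_hat_h)
    return math.sqrt(MSE / m + var_mean)


# m = 1 gives s(pred); as m grows the value shrinks toward s(mu_hat_h)
for m in (1, 3, 10, 1000):
    print(m, round(se_pred_mean(65, m), 4))
```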
16 SAS CODE
* Generating the prediction intervals for all values of X in the data set;
proc reg data=toluca1;
  model hours=lotsize/cli;
  id lotsize;
run;
The cli option generates a prediction interval for a new observation.
17 Output Statistics
Obs  lotsize  Dependent Variable  Predicted Value  Std Error Mean Predict  95% CL Predict (lower, upper)
 1      80          399.0000          347.9820            10.3628           244.7333, 451.2307
25      70          323.0000          312.2800             9.7647           209.2811, 415.2789
26      65              .             294.4290             9.9176           191.3676, 397.4904
27     100              .             419.3861            14.2723           314.1604, 524.6117
The Std Error Mean Predict values are not the right standard errors for prediction: they are the same as before and do not include the variability about the regression line.
18 Notes
- The standard error (Std Error Mean Predict) given in this output is the standard error of $\hat{\mu}_h$, not $s(\text{pred})$
- The prediction interval is correct, and it is wider than the previous confidence interval
19 Notes
- To get the correct standard error, we need to add the variance about the regression line: $s(\text{pred}) = \sqrt{s^2(\hat{\mu}_h) + \text{MSE}}$
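The printed output itself confirms this note. Each interval has the form estimate $\pm t_c \times$ SE, so dividing an interval's width by $2t_c$ recovers the standard error actually used. This sketch uses the Obs 26 rows ($X_h = 65$) of the two PROC REG outputs, and assumes $t(.975, 23) \approx 2.068658$ and the KNNL value MSE = 2383.716.

```python
import math

T_C = 2.068658   # t(.975, 23), ASSUMED table value
MSE = 2383.716   # ASSUMED Toluca MSE (KNNL)

# Obs 26 (lotsize = 65), copied from the clm and cli outputs
cl_mean = (273.9129, 314.9451)   # 95% CL Mean
cl_pred = (191.3676, 397.4904)   # 95% CL Predict
se_printed = 9.9176              # Std Error Mean Predict (same in both outputs)

# Interval = estimate +/- t_c * SE, so SE = width / (2 * t_c)
se_from_clm = (cl_mean[1] - cl_mean[0]) / (2 * T_C)
se_from_cli = (cl_pred[1] - cl_pred[0]) / (2 * T_C)
s_pred = math.sqrt(se_printed ** 2 + MSE)   # add the variance about the line

print(round(se_from_clm, 4), round(se_from_cli, 4), round(s_pred, 4))
```

The cli interval implies a standard error of about 49.82, which matches $\sqrt{s^2(\hat{\mu}_h) + \text{MSE}}$ rather than the printed 9.9176.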
20 Confidence band for regression line
- $\hat{\mu}_h \pm W \, s(\hat{\mu}_h)$
- where $W^2 = 2F(1-\alpha; 2, n-2)$
- This gives combined (simultaneous) confidence intervals for all $X_h$
- The boundary values of the confidence band define a hyperbola
- At any given $X_h$, the band is wider than the single CI
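This is the Working-Hotelling band. Because the numerator degrees of freedom are 2, the F(2, ν) distribution has the closed-form CDF $1 - (1 + 2x/\nu)^{-\nu/2}$, so W can be computed without statistical tables. The sketch below does that for a 90% band and compares it with a single 90% CI; it assumes the KNNL summary values for the Toluca data and takes tsingle = 1.71387 from the SAS output for slide 23.

```python
import math

B0, B1 = 62.36586, 3.57020                        # Parameter Estimates (SAS output)
MSE, N, XBAR, SSX = 2383.716, 25, 70.0, 19800.0   # ASSUMED Toluca summaries (KNNL)
T_SINGLE = 1.71387                                # t(.95, 23), "tsingle" in the SAS output


def working_hotelling_w(alpha, n):
    """W = sqrt(2 F(1 - alpha; 2, n - 2)).

    For 2 numerator df, P(F <= x) = 1 - (1 + 2x/nu)^(-nu/2), so the
    quantile has the closed form x = (nu/2) * (alpha^(-2/nu) - 1).
    """
    nu = n - 2
    f_quantile = (nu / 2.0) * (alpha ** (-2.0 / nu) - 1.0)
    return math.sqrt(2.0 * f_quantile)


def interval(xh, multiplier):
    """mu_hat_h +/- multiplier * s(mu_hat_h)."""
    mu_hat = B0 + B1 * xh
    se = math.sqrt(MSE * (1.0 / N + (xh - XBAR) ** 2 / SSX))
    return mu_hat - multiplier * se, mu_hat + multiplier * se


W = working_hotelling_w(0.10, N)      # multiplier for the 90% band
for xh in (65, 100):
    band = interval(xh, W)            # simultaneous 90% band
    single = interval(xh, T_SINGLE)   # single 90% CI
    print(xh, [round(v, 2) for v in band], [round(v, 2) for v in single])
```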
21 Confidence band for regression line
- The theory comes from the joint confidence region for $(\beta_0, \beta_1)$, which is an ellipse (Stat 524)
- We can find an $\alpha$ for $t_c$ that gives the same results as W
- We find $W^2$ and then find the $\alpha$ for $t_c$ that will give $W = t_c$
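22 SAS CODE
data a1;
  n=25; alpha=.10; dfn=2; dfd=n-2;
  tsingle=tinv(1-alpha/2,dfd);    * t multiplier for a single 90% CI;
  w2=2*finv(1-alpha,dfn,dfd);     * W^2 = 2F(1-alpha; 2, n-2);
  w=sqrt(w2);
  alphat=2*(1-probt(w,dfd));      * alpha at which t_c equals W;
  t_c=tinv(1-alphat/2,dfd);
  output;
proc print data=a1; run;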
23 SAS OUTPUT
n   alpha  dfn  dfd  tsingle  w2       w        alphat    t_c
25  0.1    2    23   1.71387  5.09858  2.25800  0.033740  2.25800
- w (equivalently t_c): used for the 90% confidence band
- tsingle: used for a single 90% CI
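The alphat column can be checked without SAS. This sketch substitutes a hand-rolled Student t CDF (Simpson's rule applied to the t density, built from `math.lgamma`) for `probt`; the numerical integration is a stand-in technique, not how SAS computes it. It takes W = 2.25800 from the output above.

```python
import math


def t_pdf(t, nu):
    """Student t density with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(t * t / nu))


def t_cdf(x, nu, steps=2000):
    """P(T <= x) for x >= 0: Simpson's rule on [0, x] plus symmetry about 0."""
    h = x / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, nu) + t_pdf(x, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 + total * h / 3


dfd = 25 - 2
w = 2.25800                            # W for the 90% band (SAS output)
alphat = 2 * (1 - t_cdf(w, dfd))       # same formula as alphat=2*(1-probt(w,dfd))
print(round(alphat, 5))
```

With $\alpha_t \approx 0.034$, a t-based interval at roughly the 96.6% level has the same width as the 90% band, which appears to be why the plotting code uses `rlclm97`.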
24 SAS CODE
* rlclm97: 97% limits for the mean, about 100(1-alphat), i.e. the 90% band;
symbol1 v=circle i=rlclm97;
proc gplot data=toluca; plot hours*lotsize;
run;
26 Estimation of E(Yh) and Prediction of Yh
27 SAS CODE
* Confidence intervals;
symbol1 v=circle i=rlclm95;
proc gplot data=toluca; plot hours*lotsize; run;
* Prediction intervals;
symbol1 v=circle i=rlcli95;
proc gplot data=toluca; plot hours*lotsize; run;
28 Confidence band
29 Confidence intervals
30 Prediction intervals
31 Background Reading
- Program topic6.sas has the code for the various plots and calculations
- Sections 2.7, 2.8, and 2.9