Title: Prediction concerning the response Y
1Prediction concerning the response Y
2Where does this topic fit in?
- Model formulation
- Model estimation
- Model evaluation
- Model use
3Translating two research questions into two
reasonable statistical answers
- What is the mean weight, µ, of all American
women, aged 18-24?
- If we want to estimate µ, what would be a good
estimate?
- What is the weight, y, of a randomly selected
American woman, aged 18-24?
- If we want to predict y, what would be a good
prediction?
4Could we do better by taking into account a
person's height?
5One thing to estimate (µY) and one thing to
predict (y)
6Two different research questions
- What is the mean response µY when the predictor
value is xh?
- What value will a new observation Ynew be when
the predictor value is xh?
7Example Skin cancer mortality and latitude
- What is the expected (mean) mortality rate for
all locations at 40° N latitude?
- What is the predicted mortality rate for 1 new
randomly selected location at 40° N?
8Example Skin cancer mortality and latitude
9Point estimators
- The fitted value $\hat{y}_h = b_0 + b_1 x_h$ answers both research questions.
- That is, it is
- the best guess of the mean response at xh
- the best guess of a new observation at xh
But, as always, to be confident in the answer to
our research question, we should put an interval
around our best guess.
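As a minimal sketch of the point estimator, the snippet below fits a least-squares line and evaluates it at a chosen xh. The data values and the function names `fit_line` and `point_estimate` are hypothetical, chosen only for illustration; they are not from the slides.

```python
from statistics import mean

def fit_line(x, y):
    """Return the least-squares intercept b0 and slope b1."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def point_estimate(b0, b1, x_h):
    """Best guess of both the mean response and a new observation at x_h."""
    return b0 + b1 * x_h

# Hypothetical data, not taken from the slides.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(x, y)
y_hat = point_estimate(b0, b1, 3.5)
print(b0, b1, y_hat)
```

The same number serves as the estimate of µY and as the prediction of Ynew; only the interval placed around it differs.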
10It is dangerous to extrapolate beyond the scope of
the model.
12A confidence interval for the population mean
response µY
- when the predictor value is xh
13Again, what are we estimating?
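14(1-α)100% t-interval for mean response µY
Formula in words
Sample estimate ± (t-multiplier × standard error)
Formula in notation
$$\hat{y}_h \pm t_{(\alpha/2,\,n-2)} \times \sqrt{MSE\left(\frac{1}{n} + \frac{(x_h-\bar{x})^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}\right)}$$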
15Example Skin cancer mortality and latitude
Predicted Values for New Observations
New Obs     Fit  SE Fit          95.0% CI          95.0% PI
      1  150.08    2.75  (144.56, 155.61)  (111.23, 188.93)

Values of Predictors for New Observations
New Obs   Lat
      1  40.0
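The confidence interval in the Minitab output above can be roughly reproduced from the Fit and SE Fit columns. This is a sketch only: the t-multiplier 2.0117 assumes 47 error degrees of freedom, which the slides do not state, and the function name `confidence_interval` is made up for illustration.

```python
def confidence_interval(fit, se_fit, t_mult):
    """Interval for the mean response: fit +/- t_mult * se_fit."""
    half_width = t_mult * se_fit
    return fit - half_width, fit + half_width

# Fit and SE Fit copied from the Minitab output for Lat = 40.
fit, se_fit = 150.08, 2.75

# ASSUMPTION: 47 error degrees of freedom (not given in the slides);
# the 0.975 quantile of the t distribution is then about 2.0117.
t_mult = 2.0117

ci_low, ci_high = confidence_interval(fit, se_fit, t_mult)
print(round(ci_low, 2), round(ci_high, 2))
```

The result should land close to the printed (144.56, 155.61); small differences are expected because Fit and SE Fit are rounded in the output.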
16Factors affecting the length of the confidence
interval for µY
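- As the confidence level decreases, the interval gets shorter.
- As MSE decreases, the interval gets shorter.
- As the sample size increases, the interval gets shorter.
- The more spread out the predictor values, the shorter the interval.
- The closer xh is to the sample mean, the shorter the interval.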
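The factors listed above can be checked numerically through the standard error of the fit. The sketch below uses made-up summary statistics (none of these numbers come from the slides) and changes one input at a time. The confidence level is not shown because it acts through the t-multiplier rather than the standard error.

```python
import math

def se_fit(mse, n, x_h, x_bar, sxx):
    """Standard error of the estimated mean response at x_h.

    sxx is the sum of squared deviations of the predictor values
    from their mean, a measure of how spread out they are.
    """
    return math.sqrt(mse * (1.0 / n + (x_h - x_bar) ** 2 / sxx))

# Hypothetical baseline values, for illustration only.
base = se_fit(mse=4.0, n=20, x_h=6.0, x_bar=5.0, sxx=50.0)

smaller_mse = se_fit(mse=2.0, n=20, x_h=6.0, x_bar=5.0, sxx=50.0)
larger_n = se_fit(mse=4.0, n=40, x_h=6.0, x_bar=5.0, sxx=50.0)
more_spread = se_fit(mse=4.0, n=20, x_h=6.0, x_bar=5.0, sxx=200.0)
closer_to_mean = se_fit(mse=4.0, n=20, x_h=5.0, x_bar=5.0, sxx=50.0)

for label, value in [("baseline", base), ("smaller MSE", smaller_mse),
                     ("larger n", larger_n), ("more spread", more_spread),
                     ("xh at the mean", closer_to_mean)]:
    print(f"{label:>15}: {value:.3f}")
```

Holding sxx fixed while n grows is a simplification; in practice adding observations usually increases sxx as well, which shortens the interval further.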
17Does the estimate of µY when xh = 1 vary more
here?
Variable    N  StDev
yhat(x=1)   5  0.320
18... or here?
Variable    N  StDev
yhat(x=1)   5  2.127
19Does the estimate of µY vary more when xh = 1 or
when xh = 5.5?
Variable      N  StDev
yhat(x=1)     5  2.127
yhat(x=5.5)   5  0.512
20Example Skin cancer mortality and latitude
Predicted Values for New Observations
New Obs     Fit  SE Fit        95.0% CI         95.0% PI
      1  150.08    2.75  (144.6, 155.6)  (111.2, 188.93)
      2  221.82    7.42  (206.9, 236.8)  (180.6, 263.07) X
X denotes a row with X values away from the center

Values of Predictors for New Observations
New Obs  Latitude
      1      40.0
      2      28.0
Mean of Lat = 39.533
21When is it okay to use the confidence interval
for µY formula?
- When xh is a value within the scope of the model.
xh does not have to be one of the actual x
values in the data set.
- When the LINE assumptions are met.
- The formula works okay even if the error terms
are only approximately normal.
- If you have a large sample, the error terms can
even deviate substantially from normality.
22Prediction interval for a new response Ynew
23Again, what are we predicting?
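24(1-α)100% prediction interval for new response
Ynew
Formula in words
Sample prediction ± (t-multiplier × standard
error)
Formula in notation
$$\hat{y}_h \pm t_{(\alpha/2,\,n-2)} \times \sqrt{MSE\left(1 + \frac{1}{n} + \frac{(x_h-\bar{x})^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}\right)}$$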
25Example Skin cancer mortality and latitude
Predicted Values for New Observations
New Obs     Fit  SE Fit          95.0% CI          95.0% PI
      1  150.08    2.75  (144.56, 155.61)  (111.23, 188.93)

Values of Predictors for New Observations
New Obs   Lat
      1  40.0
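A prediction interval can also be computed directly from raw data. The sketch below uses a small hypothetical data set (not the skin cancer data) and a hypothetical function name, `prediction_interval`. The t-multiplier is passed in by the caller; 3.1824 is the 0.975 quantile of the t distribution with 3 degrees of freedom, matching n - 2 for these five points.

```python
import math
from statistics import mean

def prediction_interval(x, y, x_h, t_mult):
    """Prediction interval for a new response at x_h in simple linear regression."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    # MSE: sum of squared residuals divided by n - 2.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)
    y_hat = b0 + b1 * x_h
    # The leading "1 +" is what separates this from the confidence interval.
    se_pred = math.sqrt(mse * (1 + 1 / n + (x_h - x_bar) ** 2 / sxx))
    return y_hat - t_mult * se_pred, y_hat + t_mult * se_pred

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# t-multiplier for a 95% interval with n - 2 = 3 degrees of freedom.
lo, hi = prediction_interval(x, y, x_h=3.5, t_mult=3.1824)
print(lo, hi)
```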
26When is it okay to use the prediction interval
for Ynew formula?
- When xh is a value within the scope of the model.
xh does not have to be one of the actual x
values in the data set.
- When the LINE assumptions are met.
- The formula for the prediction interval depends
strongly on the assumption that the error terms
are normally distributed.
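27What's the difference in the two formulas?
Confidence interval for µY
$$\hat{y}_h \pm t_{(\alpha/2,\,n-2)} \times \sqrt{MSE\left(\frac{1}{n} + \frac{(x_h-\bar{x})^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}\right)}$$
Prediction interval for Ynew
$$\hat{y}_h \pm t_{(\alpha/2,\,n-2)} \times \sqrt{MSE\left(1 + \frac{1}{n} + \frac{(x_h-\bar{x})^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}\right)}$$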
28Prediction of Ynew if the mean µY is known
Suppose it were known that the mean skin cancer
mortality at xh = 40° N is 150 deaths per
million (with variance 400). What is the
predicted skin cancer mortality in Columbus, Ohio?
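With the mean and variance treated as known, the prediction and a range around it follow directly. The sketch below assumes normally distributed errors and a 95% level; neither is stated on this slide.

```python
from statistics import NormalDist

mean_mortality = 150.0   # known mean at 40 degrees N, deaths per million
variance = 400.0         # known variance
sd = variance ** 0.5     # standard deviation = 20

# ASSUMPTION: normal errors and a 95% level (not stated on the slide).
z = NormalDist().inv_cdf(0.975)

prediction = mean_mortality
low = mean_mortality - z * sd
high = mean_mortality + z * sd
print(prediction, round(low, 1), round(high, 1))
```

The best single prediction is the mean itself, 150, and the range reflects only the variation in Y because nothing had to be estimated.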
29And then reality sets in
- The mean µY is not known.
- Estimate it with the predicted response $\hat{y}_h$.
- The cost of using $\hat{y}_h$ to estimate µY is the
variance of $\hat{y}_h$.
- The variance σ² is not known.
30Variance of the prediction
The variation in the prediction of a new response
depends on two components
1. the variation due to estimating the mean µY
with $\hat{y}_h$
2. the variation in Y
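The two components can be separated using the Minitab output for Lat = 40 shown earlier (Fit 150.08, SE Fit 2.75, CI and PI as printed). The sketch below backs out the t-multiplier from the confidence interval, then uses the prediction interval to recover the variation in Y. Because the printed values are rounded, the recovered numbers are approximate.

```python
import math

# Values copied from the Minitab output for Lat = 40.
se_fit = 2.75
ci = (144.56, 155.61)
pi = (111.23, 188.93)

# CI half-width = t_mult * se_fit, so the multiplier can be backed out.
t_mult = (ci[1] - ci[0]) / 2 / se_fit

# PI half-width = t_mult * se_pred.
se_pred = (pi[1] - pi[0]) / 2 / t_mult

# se_pred^2 = MSE + se_fit^2: variation in Y plus variation from estimating the mean.
mse = se_pred ** 2 - se_fit ** 2

print(f"t-multiplier  ~ {t_mult:.3f}")
print(f"se_pred       ~ {se_pred:.2f}")
print(f"MSE           ~ {mse:.1f}  (sqrt ~ {math.sqrt(mse):.2f})")
print(f"share from estimating the mean ~ {se_fit ** 2 / se_pred ** 2:.1%}")
```

In this example nearly all of the prediction variance comes from the variation in Y, which is why the prediction interval is so much wider than the confidence interval.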
31What's the effect of the difference in the two
formulas?
Confidence interval for µY
Prediction interval for Ynew
32What's the effect of the difference in the two
formulas?
- A (1-α)100% confidence interval for µY at xh will
always be narrower than a (1-α)100% prediction
interval for Ynew at xh.
- The confidence interval's standard error can
approach 0, whereas the prediction interval's
standard error cannot get close to 0, because it
is always at least √MSE.
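The contrast between the two standard errors shows up clearly as the sample size grows. The sketch below evaluates both at xh equal to the sample mean, where the distance term drops out; the MSE value of 400 is hypothetical and the function names are made up for illustration.

```python
import math

def se_fit_at_mean(mse, n):
    """Standard error for the mean response when x_h equals the sample mean."""
    return math.sqrt(mse / n)

def se_pred_at_mean(mse, n):
    """Standard error for a new response when x_h equals the sample mean."""
    return math.sqrt(mse * (1 + 1 / n))

mse = 400.0  # hypothetical value, for illustration only

for n in (10, 100, 10_000, 1_000_000):
    print(f"n = {n:>9}: se_fit = {se_fit_at_mean(mse, n):8.3f}, "
          f"se_pred = {se_pred_at_mean(mse, n):8.3f}")
```

As n increases, the first column shrinks toward 0 while the second levels off at the square root of MSE, here 20.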
33Confidence intervals and prediction intervals for
response in Minitab
- Stat >> Regression >> Regression
- Specify response and predictor(s).
- Select Options
- In Prediction intervals for new observations
box, specify either the X value or a column name
containing multiple X values.
- Specify confidence level (default is 95).
- Click on OK. Click on OK.
- Results appear in session window.
34Confidence intervals and prediction intervals for
response in Minitab
[Screenshot: worksheet column C6 holding the values 40 and 28]
36Example Skin cancer mortality and latitude
Predicted Values for New Observations
New Obs     Fit  SE Fit        95.0% CI         95.0% PI
      1  150.08    2.75  (144.6, 155.6)  (111.2, 188.93)
      2  221.82    7.42  (206.9, 236.8)  (180.6, 263.07) X
X denotes a row with X values away from the center

Values of Predictors for New Observations
New Obs  Latitude
      1      40.0
      2      28.0
Mean of Lat = 39.533
37A plot of the confidence interval and prediction
interval in Minitab
- Stat >> Regression >> Fitted line plot
- Specify predictor and response.
- Under Options
- Select Display confidence bands.
- Select Display prediction bands.
- Specify desired confidence level (default is 95).
- Select OK. Select OK.