Title: Prediction and Lack of Fit in Regression
1. Prediction, Correlation, and Lack of Fit in Regression (11.4, 11.5, 11.7)
Outline
- Confidence interval and prediction interval.
- Regression Assumptions.
- Checking Assumptions (model adequacy).
- Correlation.
- Influential observations.
2. Prediction

Our regression model is

    y_i = b0 + b1*x_i + e_i,   i = 1, ..., n,

fit to the repair-time data (x = number of components, y = repair time):

     i   x_i   y_i
     1    1     23
     2    2     29
     3    4     64
     4    4     72
     5    4     80
     6    5     87
     7    6     96
     8    6    105
     9    8    127
    10    8    119
    11    9    145
    12    9    149
    13   10    165
    14   10    154

so that the average value of the response at X = x* is

    E(Y | x*) = b0 + b1*x*.
3. The Estimated Average Response

The estimated average response at X = x* is therefore

    yhat(x*) = b0 + b1*x*,

an estimate of the expected value E(Y | x*). This quantity is a statistic, a random variable, hence it has a sampling distribution.

Regression assumptions: normal distribution for the errors e_i. The sample estimate has associated variance

    Var(yhat(x*)) = sigma^2 * (1/n + (x* - xbar)^2 / Sxx),

estimated by replacing sigma^2 with MSE. A (1 - a)100% CI for the average response at X = x* is therefore

    yhat(x*) +/- t(a/2, n-2) * sqrt( MSE * (1/n + (x* - xbar)^2 / Sxx) ).
4. Prediction and Predictor Confidence

The best predictor of an individual response y at X = x*, denoted yhat_pred(x*), is simply the estimated average response at x*. The estimates b0 and b1 are random variables; they vary from sample to sample, hence the predicted value is also a random variable.

The variance associated with an individual prediction is larger than that for the mean value. Why? Because it must account for both the uncertainty in estimating the mean response and the variability of a single new observation around that mean.

A (1 - a)100% prediction interval for an individual response at X = x* is

    yhat(x*) +/- t(a/2, n-2) * sqrt( MSE * (1 + 1/n + (x* - xbar)^2 / Sxx) ).
5. Prediction and Confidence Bands

Prediction band: what we would expect for one new observation.
Confidence band: what we would expect for the mean of many observations taken at the value x*.
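The two intervals above can be sketched in code. This is a minimal illustration using the slides' repair-time data; the choice of x* = 7 is arbitrary, and the t critical value t(0.025, 12) = 2.179 is the standard tabled value for a 95% interval with n - 2 = 12 degrees of freedom.

```python
import math

# Repair-time data from the slides: x = number of components, y = repair time
x = [1, 2, 4, 4, 4, 5, 6, 6, 8, 8, 9, 9, 10, 10]
y = [23, 29, 64, 72, 80, 87, 96, 105, 127, 119, 145, 149, 165, 154]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)
Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1 = Sxy / Sxx                      # fitted slope
b0 = ybar - b1 * xbar               # fitted intercept
SSE = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
MSE = SSE / (n - 2)

t = 2.179    # t(0.025, 12): tabled value for a 95% interval, n - 2 = 12 df
xstar = 7    # an arbitrary new value of X for illustration
yhat = b0 + b1 * xstar

# Standard error for the MEAN response at x* ...
se_mean = math.sqrt(MSE * (1 / n + (xstar - xbar) ** 2 / Sxx))
# ... and for an INDIVIDUAL response: one extra MSE term, so it is wider
se_pred = math.sqrt(MSE * (1 + 1 / n + (xstar - xbar) ** 2 / Sxx))

ci = (yhat - t * se_mean, yhat + t * se_mean)   # confidence interval (mean)
pi = (yhat - t * se_pred, yhat + t * se_pred)   # prediction interval (individual)
print(f"yhat = {yhat:.1f}")
print(f"95% CI for mean response: ({ci[0]:.1f}, {ci[1]:.1f})")
print(f"95% PI for new response:  ({pi[0]:.1f}, {pi[1]:.1f})")
```

The prediction interval is always wider than the confidence interval at the same x*, because of the extra "1" inside the square root.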
7. Regression Assumptions and Lack of Fit
Regression Model Assumptions
- Effect additivity (multiple regression)
- Normality of the residuals
- Homoscedasticity of the residuals
- Independence of the residuals
8. Additivity
Additivity assumption.
The expected value of an observation is a
weighted linear combination of a number of
factors.
- Which factors? (model uncertainty)
- number of factors in the model
- interactions of factors
- powers or transformations of factors
9. Homoscedasticity and Normality

Observations never equal their expected values; the model assumes there are no systematic biases.

Homoscedasticity assumption: the unexplained component has a common variance for all i.

Normality assumption: the unexplained component has a normal distribution.
10. Independence
Independence assumption.
Responses in one experimental unit are not
correlated with, affected by, or related to,
responses for other experimental units.
11. Correlation Coefficient

A measure of the strength of the linear relationship between two variables. The product-moment correlation coefficient is

    r = Sxy / sqrt(Sxx * Syy).

In SLR, r is related to the slope of the fitted regression equation: b1 = r * sqrt(Syy / Sxx), so r has the same sign as b1.

r^2 (or R^2) represents the proportion of total variability of the Y-values that is accounted for by the linear regression on the independent variable X: R^2 = proportion of variability in Y explained by X.
12. Properties of r

1. r lies between -1 and 1.
   r > 0 indicates a positive linear relationship.
   r < 0 indicates a negative linear relationship.
   r = 0 indicates no linear relationship.
   r = +/-1 indicates a perfect linear relationship.
2. The larger the absolute value of r, the stronger the linear relationship.
3. r^2 also lies between 0 and 1.
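As a small check, r and r^2 can be computed directly from the slides' repair-time data using the product-moment formula above:

```python
import math

# Repair-time data from the prediction slides
x = [1, 2, 4, 4, 4, 5, 6, 6, 8, 8, 9, 9, 10, 10]
y = [23, 29, 64, 72, 80, 87, 96, 105, 127, 119, 145, 149, 165, 154]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

Sxx = sum((xi - xbar) ** 2 for xi in x)
Syy = sum((yi - ybar) ** 2 for yi in y)
Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

# Product-moment correlation coefficient
r = Sxy / math.sqrt(Sxx * Syy)
# r has the same sign as the slope b1 = Sxy / Sxx,
# and r^2 is the proportion of variability in Y explained by X
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")
```

For these data the correlation is strongly positive, consistent with the steadily increasing repair times in the table.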
13. Checking Assumptions

How well does the model fit?

- Do predicted values seem to be placed in the middle of observed values?
- Do residuals satisfy the regression assumptions? (Problems seen in a plot of X vs. Y will be reflected in the residual plot.)
- Constant variance?
- Regularities suggestive of lack of independence, or of a more complex model?
- Poorly fit observations?
14. Model Adequacy

Studentized residuals allow us to gauge whether a residual is too large:

    e_i* = e_i / sqrt( MSE(i) * (1 - h_i) ).

Here MSE(i) is the MSE calculated leaving observation i out of the computations, and h_i is the ith diagonal element of the projection matrix for the predictor space (the ith hat diagonal element). Each e_i* should have approximately a standard normal distribution, hence it is very unlikely that any studentized residual will fall outside the range [-3, 3].
15. Normality of Residuals

Formal goodness-of-fit tests: the Kolmogorov-Smirnov test, the Shapiro-Wilk test (n < 50), and D'Agostino's test (n >= 50). All are quite conservative: they fail to reject the hypothesis of normality more often than they should.

Graphical approach: the quantile-quantile plot (qq-plot).

1. Compute and sort the simple residuals e_1, e_2, ..., e_n.
2. Associate to each sorted residual a standard normal quantile z_i = normsinv((i - 0.5)/n).
3. Plot z_i versus e_(i). Compare to the 45-degree line.
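The three qq-plot steps above can be sketched directly. The residuals below are hypothetical values for illustration only (not computed from the slides' data); the normal quantiles use the same (i - 0.5)/n plotting positions as the slide.

```python
from statistics import NormalDist

# Hypothetical simple residuals e_i from some fitted regression (illustration)
e = [2.1, -0.5, 3.3, -4.0, 1.2, -1.8, 0.4, -2.6,
     5.1, -2.2, 0.9, -1.4, 2.8, -3.3]
n = len(e)

# Step 1: sort the residuals
e_sorted = sorted(e)

# Step 2: standard normal quantile for each plotting position (i - 0.5)/n
z = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

# Step 3: "plot" z_i versus e_(i); near-normal residuals fall close to a
# straight line (the 45-degree line after standardizing the residuals)
for zi, ei in zip(z, e_sorted):
    print(f"{zi:7.3f}  {ei:6.2f}")
```

In practice the (z_i, e_(i)) pairs would be passed to a plotting routine; systematic curvature in the pattern suggests non-normal residuals.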
17. Influence Diagnostics (Ways to Detect Influential Observations)

Does a particular observation, consisting of a pair of (X, Y) values (a case), have undue influence on the fit of the regression model? That is, which cases are greatly affecting the estimates of the p regression parameters in the model? (For simple linear regression, p = 2.)

Standardized/studentized residuals: the e_i* are used to detect cases that are outlying with respect to their Y values. Check cases with |e_i*| > 2 or 3.

Hat diagonal elements: the h_i are used to detect cases that are outlying with respect to their X values. Check cases with h_i > 2p/n.
18. Dffits: measures the influence that the ith case has on the ith fitted value. Compares the ith fitted value with the ith fitted value obtained by omitting the ith case. Check cases for which |Dffits| > 2*sqrt(p/n).

Cook's distance: similar to Dffits, but considers instead the influence of the ith case on all n fitted values. Check when Cook's distance > F(p, n-p, 0.50).

Covariance ratio: the change in the determinant of the covariance matrix that occurs when the ith case is deleted. Check cases with |CovRatio - 1| >= 3p/n.

Dfbetas: a measure of the influence of the ith case on each estimated regression parameter. For each regression parameter, check cases with |Dfbeta| > 2/sqrt(n).
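A sketch of these diagnostics for the slides' repair-time data, assuming the standard textbook definitions of the hat diagonals, studentized deleted residuals, Dffits, and Cook's distance (the leave-one-out MSE is computed via the usual algebraic identity rather than by refitting):

```python
import math

# Repair-time data from the slides; p = 2 parameters in SLR
x = [1, 2, 4, 4, 4, 5, 6, 6, 8, 8, 9, 9, 10, 10]
y = [23, 29, 64, 72, 80, 87, 96, 105, 127, 119, 145, 149, 165, 154]
n, p = len(x), 2

xbar, ybar = sum(x) / n, sum(y) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / Sxx
b0 = ybar - b1 * xbar
e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]   # simple residuals
MSE = sum(ei ** 2 for ei in e) / (n - p)

# Hat diagonals for SLR; flag h_i > 2p/n
h = [1 / n + (xi - xbar) ** 2 / Sxx for xi in x]

for i in range(n):
    # Leave-one-out MSE via the standard identity (no refit needed)
    mse_i = ((n - p) * MSE - e[i] ** 2 / (1 - h[i])) / (n - p - 1)
    t_i = e[i] / math.sqrt(mse_i * (1 - h[i]))       # studentized deleted residual
    dffits = t_i * math.sqrt(h[i] / (1 - h[i]))      # flag |Dffits| > 2*sqrt(p/n)
    r_i = e[i] / math.sqrt(MSE * (1 - h[i]))         # internally studentized
    cook = r_i ** 2 * h[i] / (p * (1 - h[i]))        # compare to F(p, n-p, 0.50)
    print(f"obs {i + 1:2d}: h = {h[i]:.3f}  t = {t_i:6.2f}  "
          f"Dffits = {dffits:6.2f}  CookD = {cook:.3f}")

print(f"cutoffs: h > {2 * p / n:.2f}, |Dffits| > {2 * math.sqrt(p / n):.2f}, "
      f"|Dfbeta| > {2 / math.sqrt(n):.2f}")
```

The printed cutoffs match the numeric thresholds quoted on the next slide for this data set (n = 14, p = 2).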
19. Cutoffs for the repair-time data (n = 14, p = 2): hat = 2p/n = 0.29, CovRatio = 3p/n = 0.43, Dffits = 2*sqrt(p/n) = 0.76, Dfbetas = 2/sqrt(n) = 0.53.
(Slides 20-21: diagnostic plots; observations 1, 2, and 5 are labeled.)