Title: LECTURE 12 Multiple regression analysis
- Epsy 640
- Texas A&M University
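Multiple regression analysis
- The overall hypothesis that y is unrelated to all predictors is equivalent to
  - H0: $\rho^2_{y\cdot123} = 0$
  - H1: $\rho^2_{y\cdot123} > 0$
- It is tested by
  - $F = \frac{R^2_{y\cdot123}/p}{(1 - R^2_{y\cdot123})/(n - p - 1)}$
  - or, equivalently, $F = \frac{SS_{reg}/p}{SS_e/(n - p - 1)}$, with $p$ and $n - p - 1$ degrees of freedom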
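The two forms of F give the same number because $R^2 = SS_{reg}/SS_y$ and $1 - R^2 = SS_e/SS_y$, so $SS_y$ cancels. The short Python sketch below checks this with made-up sums of squares; the values (SSreg = 120, SSe = 80, n = 100, p = 3) and the function names are hypothetical, not taken from the lecture. The p-value would come from an F distribution with p and n - p - 1 df (for example `scipy.stats.f.sf`, not used here).

```python
def overall_f_from_r2(r2, n, p):
    """Overall F test: (R^2 / p) / ((1 - R^2) / (n - p - 1))."""
    return (r2 / p) / ((1 - r2) / (n - p - 1))


def overall_f_from_ss(ss_reg, ss_e, n, p):
    """Same test from sums of squares: (SSreg / p) / (SSe / (n - p - 1))."""
    return (ss_reg / p) / (ss_e / (n - p - 1))


# Hypothetical numbers: n = 100 cases, p = 3 predictors,
# SSy = 200 split into SSreg = 120 and SSe = 80, so R^2 = 120 / 200 = .60
n, p = 100, 3
ss_reg, ss_e = 120.0, 80.0
r2 = ss_reg / (ss_reg + ss_e)

f_r2 = overall_f_from_r2(r2, n, p)
f_ss = overall_f_from_ss(ss_reg, ss_e, n, p)
print(f_r2, f_ss)  # both approximately 48, on p = 3 and n - p - 1 = 96 df
```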
Multiple regression analysis

    SOURCE            df       Sum of Squares   Mean Square      F
    x1, x2, ...       p        SSreg            SSreg / p        [SSreg / p] / [SSe / (n-p-1)]
    e (residual)      n-p-1    SSe              SSe / (n-p-1)
    total             n-1      SSy              SSy / (n-1)

Table 8.2 Multiple regression table for Sums of Squares
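Table 8.2 can be filled in from raw data with an ordinary least squares fit. This is a minimal sketch, assuming an intercept is included and using simulated data; `regression_anova` is a hypothetical helper name. It returns the rows of the table, and the usage lines print them.

```python
import numpy as np


def regression_anova(X, y):
    """Sums-of-squares table (source, df, SS, MS) and overall F for an OLS fit with intercept."""
    n, p = X.shape
    Xd = np.column_stack([np.ones(n), X])        # prepend intercept column
    b, *_ = np.linalg.lstsq(Xd, y, rcond=None)   # OLS coefficients
    resid = y - Xd @ b
    ss_y = float(np.sum((y - y.mean()) ** 2))    # total SS around the mean of y
    ss_e = float(resid @ resid)                  # residual (error) SS
    ss_reg = ss_y - ss_e                         # SS explained by the predictors
    rows = [
        ("regression", p, ss_reg, ss_reg / p),
        ("residual", n - p - 1, ss_e, ss_e / (n - p - 1)),
        ("total", n - 1, ss_y, ss_y / (n - 1)),
    ]
    f = rows[0][3] / rows[1][3]                  # MS_reg / MS_e
    return rows, f


# Simulated data (hypothetical): 40 cases, 2 predictors
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = 1.0 + X @ np.array([0.8, -0.5]) + rng.normal(size=40)

rows, f = regression_anova(X, y)
for source, df, ss, ms in rows:
    print(f"{source:<11}{df:>4}{ss:>10.2f}{ms:>10.2f}")
print(f"F = {f:.2f}")
```

The degrees of freedom and the sums of squares both add up (p + (n - p - 1) = n - 1 and SSreg + SSe = SSy), which is what makes the table a partition of the total variation.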
Multiple regression analysis predicting Depression
- Predictors: LOCUS OF CONTROL, SELF-ESTEEM, SELF-RELIANCE
[Fig. 8.4 Venn diagram for multiple regression with two predictors and one outcome measure; regions labeled SSy, ssx1, ssx2, SSreg, and SSe]
[Fig. 8.5 Type I contributions; Venn diagram of SSy, SSx1, SSx2, and SSe with regions labeled Type I ssx1 and Type III ssx2]
[Fig. 8.6 Type III unique contributions; Venn diagram of SSy, SSx1, SSx2, and SSe with regions labeled Type III ssx1 and Type III ssx2]
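The difference between Type I (sequential) and Type III (unique) sums of squares can be computed directly by comparing residual SS from nested models. This is a sketch under assumed conditions: two positively correlated predictors in simulated data, with `ss_error` a hypothetical helper. Type I splits SSreg completely, while the two Type III pieces leave out the overlap that neither predictor owns uniquely.

```python
import numpy as np


def ss_error(y, *predictors):
    """Residual SS after regressing y on an intercept plus the given predictors."""
    Xd = np.column_stack([np.ones(len(y)), *predictors])
    b, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    r = y - Xd @ b
    return float(r @ r)


# Simulated, correlated predictors (hypothetical data)
rng = np.random.default_rng(42)
n = 200
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + 0.8 * rng.normal(size=n)   # x1 and x2 correlate about .6
y = x1 + x2 + rng.normal(size=n)

ss_y = ss_error(y)                          # intercept only: total SS
ss_full = ss_error(y, x1, x2)               # SSe for the two-predictor model
ss_reg = ss_y - ss_full

type1_x1 = ss_y - ss_error(y, x1)           # x1 entered first
type1_x2 = ss_error(y, x1) - ss_full        # x2 given x1 (SSx2·x1)
type3_x1 = ss_error(y, x2) - ss_full        # x1 given x2: unique to x1
type3_x2 = type1_x2                         # last-entered term: Type I = Type III

shared = ss_reg - type3_x1 - type3_x2       # overlap credited to neither predictor
print(type1_x1 + type1_x2, ss_reg, type3_x1 + type3_x2, shared)
```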
Multiple Regression ANOVA table

    SOURCE (Type I)   df     Sum of Squares   Mean Square      F
    Model             2      SSreg            SSreg / 2        [SSreg / 2] / [SSe / (n-3)]
    x1                1      SSx1             SSx1 / 1         [SSx1 / 1] / [SSe / (n-3)]
    x2                1      SSx2·x1          SSx2·x1 / 1      [SSx2·x1 / 1] / [SSe / (n-3)]
    e                 n-3    SSe              SSe / (n-3)
    total             n-1    SSy              SSy / (n-1)

Table 8.3 Multiple regression table for Sums of Squares of each predictor
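PATH DIAGRAM FOR REGRESSION
[Path diagram: X1 → Y with β = .5; X2 → Y with β = .6; X1 ↔ X2 with r = .4; residual e → Y with path .387]
- $R^2 = \frac{.74^2 + .8^2 - 2(.74)(.8)(.4)}{1 - .4^2} = .85$, where .74 and .8 are the correlations of X1 and X2 with Y
- The residual path is $\sqrt{1 - R^2} = \sqrt{.15} = .387$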
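The path values follow from the three correlations implied by the R² formula ($r_{y1} = .74$, $r_{y2} = .8$, $r_{12} = .4$). The sketch below uses the standard two-predictor formulas for standardized weights, $\beta_1 = (r_{y1} - r_{y2} r_{12})/(1 - r_{12}^2)$ and likewise for $\beta_2$; the function names are illustrative only. It also checks that $R^2 = \beta_1 r_{y1} + \beta_2 r_{y2}$ gives the same value.

```python
import math


def standardized_betas(r_y1, r_y2, r_12):
    """Standardized regression weights for two predictors, from correlations."""
    denom = 1 - r_12 ** 2
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    return beta1, beta2


def r_squared(r_y1, r_y2, r_12):
    """Two-predictor R^2 from the correlation formula on the slide."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


b1, b2 = standardized_betas(0.74, 0.8, 0.4)
r2 = r_squared(0.74, 0.8, 0.4)
r2_alt = b1 * 0.74 + b2 * 0.8          # R^2 as sum of beta * validity
e_path = math.sqrt(1 - r2)             # residual path e -> Y
print(round(b1, 3), round(b2, 3), round(r2, 3), round(e_path, 3))  # 0.5 0.6 0.85 0.387
```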
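Depression
[Path diagram: LOC. CON. → DEPRESSION (.471); SELF-EST → DEPRESSION (-.317); SELF-REL → DEPRESSION (-.186); residual e → DEPRESSION with path √.4; R² = .60]
- The residual path is $\sqrt{1 - R^2} = \sqrt{.40}$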
Shrinkage R²
- Several definitions exist; ask which one is being used (k = number of predictors)
- What is the population value for a sample R²?
  - $R^2_s = 1 - (1 - R^2)\frac{n - 1}{n - k - 1}$
- What is the cross-validated value from sample to sample?
  - $R^2_{sc} = 1 - (1 - R^2)\frac{n + k}{n - k}$
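Both shrinkage formulas are simple arithmetic on R², n, and k. A minimal Python sketch follows; the sample R² of .60 with k = 3 predictors is a hypothetical value chosen for illustration, and the function names are not from the lecture.

```python
def adjusted_r2(r2, n, k):
    """Estimated population R^2: 1 - (1 - R^2)(n - 1)/(n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def cross_validated_r2(r2, n, k):
    """Estimated sample-to-sample R^2: 1 - (1 - R^2)(n + k)/(n - k)."""
    return 1 - (1 - r2) * (n + k) / (n - k)


# Hypothetical sample R^2 = .60 with k = 3 predictors at two sample sizes
for n in (100, 20):
    print(n, round(adjusted_r2(0.60, n, 3), 3), round(cross_validated_r2(0.60, n, 3), 3))
```

At n = 100 the two estimates stay close to .60 (about .59 and .58), while at n = 20 they fall to .525 and about .46, so shrinkage matters most when n is small relative to k.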
Estimation Methods
- Types of estimation
  - Ordinary Least Squares (OLS)
    - Minimize the sum of squared errors around the prediction line
  - Generalized Least Squares (GLS)
    - A regression technique used when the error terms from an OLS regression display non-random patterns such as autocorrelation or heteroskedasticity
  - Maximum Likelihood
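OLS and GLS differ only in how the errors are weighted: OLS solves $b = (X'X)^{-1}X'y$, while GLS solves $b = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y$ for an error covariance matrix $\Sigma$. The sketch below assumes $\Sigma$ is known and diagonal (heteroskedasticity without autocorrelation), which makes it weighted least squares; in practice $\Sigma$ usually has to be estimated first. The data and function names are hypothetical.

```python
import numpy as np


def ols(X, y):
    """OLS: b = (X'X)^-1 X'y."""
    return np.linalg.solve(X.T @ X, X.T @ y)


def gls(X, y, sigma):
    """GLS: b = (X' S^-1 X)^-1 X' S^-1 y, where S is the error covariance matrix."""
    s_inv = np.linalg.inv(sigma)
    return np.linalg.solve(X.T @ s_inv @ X, X.T @ s_inv @ y)


# Hypothetical heteroskedastic data: error SD grows with x
rng = np.random.default_rng(3)
n = 60
x = rng.uniform(1, 10, size=n)
X = np.column_stack([np.ones(n), x])
err_var = (0.5 * x) ** 2                     # assumed known error variances
y = 2.0 + 1.5 * x + rng.normal(scale=np.sqrt(err_var))
sigma = np.diag(err_var)                     # no autocorrelation: diagonal S

print("OLS:", ols(X, y))
print("GLS:", gls(X, y, sigma))
```

With $\Sigma = I$ (equal, uncorrelated errors) the GLS formula reduces exactly to OLS.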
Maximum Likelihood Estimation
- There is nothing visual about the maximum likelihood method, but it is a powerful method and, at least for large samples, very precise.
- Maximum likelihood estimation begins with writing a mathematical expression known as the Likelihood Function of the sample data. Loosely speaking, the likelihood of a set of data is the probability of obtaining that particular set of data, given the chosen probability distribution model. This expression contains the unknown model parameters. The values of these parameters that maximize the sample likelihood are known as the Maximum Likelihood Estimates, or MLEs. Maximum likelihood estimation is a totally analytic maximization procedure.
- MLEs and likelihood functions generally have very desirable large-sample properties:
  - they become unbiased minimum variance estimators as the sample size increases
  - they have approximate normal distributions and approximate sample variances that can be calculated and used to generate confidence bounds
  - likelihood functions can be used to test hypotheses about models and parameters
- With small samples, MLEs may not be very precise and may even generate a line that lies above or below the data points.
- There are only two drawbacks to MLEs, but they are important ones:
  - With small numbers of failures (fewer than 5, and sometimes fewer than 10, counts as small), MLEs can be heavily biased, and the large-sample optimality properties do not apply.
  - Calculating MLEs often requires specialized software for solving complex non-linear equations. This is less of a problem as time goes by, as more statistical packages add MLE capability every year.
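For linear regression with independent normal errors, the likelihood function has a closed-form maximum: the MLE of the coefficients equals the OLS solution, and the MLE of the error variance is $SS_e/n$ rather than the unbiased $SS_e/(n - p - 1)$. This small-sample downward bias is one concrete case of the bias mentioned above. The sketch assumes normal errors and uses simulated data; `normal_loglik` is a hypothetical helper.

```python
import numpy as np


def normal_loglik(beta, sigma2, X, y):
    """Log-likelihood of y under y = X beta + e, with e ~ N(0, sigma2) independent."""
    n = len(y)
    r = y - X @ beta
    return -0.5 * n * np.log(2 * np.pi * sigma2) - 0.5 * (r @ r) / sigma2


# Simulated data (hypothetical): 30 cases, one predictor plus intercept
rng = np.random.default_rng(1)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=1.5, size=n)

beta_mle = np.linalg.lstsq(X, y, rcond=None)[0]   # MLE of beta = OLS under normal errors
ss_e = float(np.sum((y - X @ beta_mle) ** 2))
sigma2_mle = ss_e / n                             # MLE divides by n (biased downward)
sigma2_unbiased = ss_e / (n - X.shape[1])         # usual residual mean square, n - p - 1

ll_max = normal_loglik(beta_mle, sigma2_mle, X, y)
ll_other = normal_loglik(beta_mle + np.array([0.1, -0.1]), sigma2_mle, X, y)
print(ll_max, ll_other, sigma2_mle, sigma2_unbiased)
```

Moving the coefficients or the variance away from the MLE lowers the log-likelihood, which is the sense in which these values "maximize the sample likelihood".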
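Outliers
- Leverage (for a single predictor)
  - $L_i = \frac{1}{n} + \frac{(X_i - M_X)^2}{\sum x^2}$, where $\sum x^2 = \sum (X - M_X)^2$; minimum $1/n$, maximum 1
  - Values much larger than $1/n$ should be of concern
- Cook's $D_i = \frac{\sum (\hat{Y} - \hat{Y}_{(i)})^2}{(k + 1)\,MS_{res}}$
  - based on the difference between the predicted Y values with and without case $i$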
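Both diagnostics can be computed by hand for a single predictor (k = 1). The sketch below computes leverage with the formula on the slide and computes Cook's D literally, by refitting the line without each case; a standard closed-form shortcut, $D_i = e_i^2 L_i / [(k+1)\,MS_{res}(1 - L_i)^2]$, gives the same values without refitting. The data and helper names are hypothetical; the last case is deliberately placed far out on X and far off the line.

```python
import numpy as np


def leverage(x):
    """Single-predictor leverage: 1/n + (x_i - mean)^2 / sum((x - mean)^2)."""
    dev = x - x.mean()
    return 1.0 / len(x) + dev ** 2 / np.sum(dev ** 2)


def predict_line(x_fit, y_fit, x_new):
    """Fit y = b0 + b1*x by OLS and return predictions at x_new."""
    b1, b0 = np.polyfit(x_fit, y_fit, 1)   # polyfit returns [slope, intercept]
    return b0 + b1 * x_new


def cooks_d(x, y):
    """Cook's D by deletion: sum of squared prediction changes / ((k + 1) * MS_res)."""
    n, k = len(x), 1
    y_hat = predict_line(x, y, x)
    ms_res = np.sum((y - y_hat) ** 2) / (n - k - 1)
    d = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        y_hat_without_i = predict_line(x[keep], y[keep], x)   # refit without case i
        d[i] = np.sum((y_hat - y_hat_without_i) ** 2) / ((k + 1) * ms_res)
    return d


# Hypothetical data: nine well-behaved points plus one far-out case
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 25], dtype=float)
y = 2 * x + np.array([0.5, -0.5, 0.3, -0.3, 0.4, -0.4, 0.2, -0.2, 0.1, 0.0])
y[-1] = 20.0                                   # far below the line 2x = 50

h = leverage(x)
d = cooks_d(x, y)
print(np.round(h, 3))
print(np.round(d, 3))
```

Leverages sum to k + 1 = 2, so their mean is 2/n; the far-out case has leverage near .88 and by far the largest Cook's D.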
Outliers
- In SPSS Regression, the SAVE option computes leverage and Cook's D and saves them as new variables, with a value for each case