Title: Multiple Regression and the General Linear Model
1. Multiple Regression and the General Linear Model
2. Figure 11.2 Theoretical Distribution of y in Regression
3. Which Model is the Best?
4. Which Model is the Best?
Can you compare bell peppers and apples?
5. Which Model is the Best?
- To compare models using RSQ, both models must have the same dependent variable.
- To compare models with different dependent variables, we use Predicted Mean Squares, or PREDMS.
6. PREDMS
For the original model, we use
7. PREDMS1
8. Example
9. Problem Points
- High Leverage Point
- High Influence Point
10. Figure 11.11(a) High Influence Points
11. Figure 11.11(b) Low Influence Points
12. Diagnostic Measures
- Residuals
- Residual Standard Deviation
- Sample standard deviation around the regression
line, the standard error of estimate, or the
residual standard deviation.
13. SPSS
14. Multiple Regression Model
- Cross-Product Term: a term equal to x1x2
- First-Order Model: y = β0 + β1x1 + β2x2 + ... + βkxk + ε (no cross-product terms)
- Partial Slopes: the coefficients βj, each the expected change in y per unit change in xj with the other x's held fixed
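The model above can be fit by ordinary least squares. A minimal sketch using NumPy; the data here are hypothetical, generated only to illustrate a first-order model with a cross-product term:

```python
import numpy as np

# Hypothetical data for a first-order model plus a cross-product
# (interaction) term: y = b0 + b1*x1 + b2*x2 + b3*x1*x2 + error.
rng = np.random.default_rng(0)
n = 50
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 10, n)
y = 2.0 + 1.5 * x1 - 0.8 * x2 + 0.3 * x1 * x2 + rng.normal(0, 1, n)

# Design matrix: intercept, x1, x2, and the cross-product x1*x2.
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])

# Least-squares estimates of the intercept and partial slopes.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # close to the true values (2.0, 1.5, -0.8, 0.3)
```

Note that when a cross-product term is included, β1 and β2 are no longer partial slopes in the simple sense: the effect of x1 on y depends on the value of x2.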
15. Assumptions for Multiple Regression
- The mathematical form of the relation is correct, so E(εi) = 0 for all i.
- Var(εi) = σε² for all i.
- The εi's are independent.
- εi is normally distributed.
16. General Linear Model
17. Estimating Multiple Regression Coefficients
- Least-squares prediction equation: ŷ = β̂0 + β̂1x1 + ... + β̂kxk
- Minimize SS(Residual) = Σ(yi - ŷi)²
18. Residual Standard Deviation
- Residual Standard Deviation
- sε = √[SS(Residual) / (n - (k + 1))]
- where SS(Residual) = Σ(yi - ŷi)²
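A sketch of the residual standard deviation computed from a least-squares fit; the data are made up, with a true error standard deviation of 0.5:

```python
import numpy as np

# s_e = sqrt(SS(Residual) / (n - (k + 1))) for k predictors.
# Hypothetical data with true error sd = 0.5.
rng = np.random.default_rng(1)
n, k = 40, 2
X = np.column_stack([np.ones(n), rng.uniform(0, 5, (n, k))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(0, 0.5, n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
ss_resid = np.sum(resid ** 2)
s_e = np.sqrt(ss_resid / (n - (k + 1)))  # residual standard deviation
print(s_e)  # should land near the true sd of 0.5
```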
19. Coefficient of Determination
- Coefficient of determination, R²: the proportion of variation in y explained by the regression, R² = [SS(Total) - SS(Residual)] / SS(Total)
20. Definition 12.2
21. R versus R² versus Adjusted R²
- "My R is .7, is that not super?"
- No, you have explained only 49% of the variation in Y; 51% is unexplained.
- Adjusted R² is an index to keep you honest.
22. Adjusted R²
- Adjusted R² = 1 - (1 - R²)((n - 1)/(n - k - 1))
- where
- R² = coefficient of determination
- n = number of observations
- k = number of independent variables
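The formula above translates directly into a small helper function. Continuing the R = .7 example, with hypothetical values n = 25 and k = 3:

```python
# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1).
def adjusted_r2(r2, n, k):
    """Penalize R^2 for the number of predictors k."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# An R of .7 gives R^2 = .49; with n = 25 observations and k = 3
# predictors, the honest, adjusted figure is lower:
print(adjusted_r2(0.49, 25, 3))  # 0.4171...
```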
23. F Test of H0
- H0: β1 = β2 = ... = βk = 0
- Ha: At least one βj ≠ 0
- T.S.: F = [SS(Regression)/k] / [SS(Residual)/(n - (k + 1))]
- R.R.: With df1 = k and df2 = n - (k + 1), reject H0 if F > Fα.
- Check assumptions and draw conclusions.
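The test statistic above can be sketched from the sums of squares of a fitted model. The data below are hypothetical, constructed so that both predictors genuinely matter:

```python
import numpy as np

# Overall F statistic: F = [SS(Regression)/k] / [SS(Residual)/(n - (k + 1))].
rng = np.random.default_rng(2)
n, k = 30, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([0.5, 1.0, 2.0]) + rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
ss_resid = np.sum((y - X @ beta_hat) ** 2)
ss_total = np.sum((y - y.mean()) ** 2)
ss_reg = ss_total - ss_resid  # valid because the model has an intercept

F = (ss_reg / k) / (ss_resid / (n - (k + 1)))
print(F)  # large here, so H0: beta1 = beta2 = 0 would be rejected
```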
24. Definition 12.3
- Estimated standard error of β̂j in a multiple regression:
- SE(β̂j) = sε / √[Σ(xij - x̄j)²(1 - Rj²)]
- where Rj² is the R² value obtained by letting xj be the dependent variable in a multiple regression, with all other x's independent variables. Note that sε is the residual standard deviation for the multiple regression of y on x1, ..., xk.
25. Collinearity
- When the independent variables are themselves correlated, collinearity (sometimes called multicollinearity) is present.
26. Effect of Collinearity
- Rj² is by definition very large and 1 - Rj² is near zero. Division by a near-zero number yields a very large standard error.
27. Variance Inflation Factor
- The term 1/(1 - Rj²) is called the variance inflation factor (VIF).
- If the VIF is very large, such as 10 or more, collinearity is a serious problem.
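A sketch of the VIF computation, regressing each predictor on the others exactly as the definition prescribes; the predictors are hypothetical, with x2 built as a near copy of x1 so that collinearity shows up:

```python
import numpy as np

# Variance inflation factor VIF_j = 1 / (1 - Rj^2), where Rj^2 is the R^2
# from regressing x_j on the remaining predictors (with an intercept).
def vif(X):
    n, k = X.shape
    out = []
    for j in range(k):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        xj = X[:, j]
        coef, *_ = np.linalg.lstsq(others, xj, rcond=None)
        resid = xj - others @ coef
        r2 = 1 - resid @ resid / np.sum((xj - xj.mean()) ** 2)
        out.append(1 / (1 - r2))
    return np.array(out)

# Hypothetical predictors: x2 is nearly a copy of x1, so both inflate.
rng = np.random.default_rng(3)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.05, size=100)
x3 = rng.normal(size=100)
v = vif(np.column_stack([x1, x2, x3]))
print(v)  # first two VIFs far above 10; the third near 1
```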
28. Definition 12.4
- The confidence interval for βj is β̂j ± tα/2 SE(β̂j)
- where tα/2 cuts off area α/2 in the right tail of a t distribution with df = n - (k + 1), the error df.
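A sketch of the interval on hypothetical data; the standard errors come from the diagonal of sε²(X'X)⁻¹, and the critical value 2.056 is the table entry for t.025 with 26 df rather than a value computed in code:

```python
import numpy as np

# 95% CI for a partial slope: beta_hat_j +/- t(.025) * SE(beta_hat_j),
# with df = n - (k + 1). Data are hypothetical.
rng = np.random.default_rng(4)
n, k = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.5, -0.5, 2.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
s_e2 = resid @ resid / (n - (k + 1))                  # residual variance
se = np.sqrt(s_e2 * np.diag(np.linalg.inv(X.T @ X)))  # SE(beta_hat_j)

t_crit = 2.056  # t(.025) with df = 26, from a t table
for j in range(k + 1):
    lo, hi = beta_hat[j] - t_crit * se[j], beta_hat[j] + t_crit * se[j]
    print(f"beta_{j}: [{lo:.3f}, {hi:.3f}]")
```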
29. Interpretation of H0
- The usual null hypothesis for inference about βj is H0: βj = 0. This hypothesis does not assert that xj has no predictive value by itself. It asserts that xj has no additional predictive value over and above that contributed by the other independent variables.
30. Summary for Testing βj
- H0: 1. βj ≤ 0   2. βj ≥ 0   3. βj = 0
- Ha: 1. βj > 0   2. βj < 0   3. βj ≠ 0
- T.S.: t = β̂j / SE(β̂j)
- R.R.: 1. t > tα   2. t < -tα   3. |t| > tα/2
- where tα cuts off a right-tail area α in the t distribution with df = n - (k + 1).
- Check assumptions and draw conclusions.
31. F Test of a Subset of Predictors
- H0: βg+1 = βg+2 = ... = βk = 0
- Ha: H0 is not true.
- T.S.: F = [(SS(Regression, complete) - SS(Regression, reduced))/(k - g)] / [SS(Residual, complete)/(n - (k + 1))]
- R.R.: F > Fα, where Fα cuts off a right-tail area α of the F distribution with df1 = (k - g) and df2 = n - (k + 1).
- Check assumptions and draw conclusions.
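The subset test compares a complete and a reduced fit. A sketch on hypothetical data in which only x1 carries signal, so the extra predictors should contribute little; the difference in regression sums of squares is computed as the equivalent difference in residual sums of squares:

```python
import numpy as np

# F = [(SSE_reduced - SSE_complete)/(k - g)] / [SSE_complete/(n - (k + 1))]
# Complete model: k predictors; reduced model: the first g of them.
rng = np.random.default_rng(5)
n, g, k = 40, 1, 3
x = rng.normal(size=(n, k))
y = 1.0 + 2.0 * x[:, 0] + rng.normal(size=n)  # only x1 matters here

def sse(Xd):
    b, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    r = y - Xd @ b
    return r @ r

X_complete = np.column_stack([np.ones(n), x])
X_reduced = np.column_stack([np.ones(n), x[:, :g]])

sse_c, sse_r = sse(X_complete), sse(X_reduced)
F = ((sse_r - sse_c) / (k - g)) / (sse_c / (n - (k + 1)))
print(F)  # compare with F(.05) on df1 = k - g, df2 = n - (k + 1)
```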
32. Forecasting Using Multiple Regression
- Confidence Interval (for the mean of y at given x values)
- Prediction Interval (for an individual y at given x values)
33. Extrapolation in Multiple Regression