Title: Chapter 15, continued
III. Adjusted R²
- In simple (single-variable) regression, we assess
goodness of fit with R², the coefficient of
determination:
- $R^2 = SSR/SST$
- This value is interpreted as the proportion of
the variability in y that is explained by the
estimated regression equation.
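To make $R^2 = SSR/SST$ concrete, here is a minimal Python sketch for the single-variable case. It fits the least squares line, then computes SST, SSR, and SSE directly from their definitions. The function name `simple_regression_r2` and the five data points are made up for illustration.

```python
def simple_regression_r2(x, y):
    """Fit y = b0 + b1*x by least squares and return (R^2, SSR, SSE, SST)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Least squares slope and intercept
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total variability in y
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # variability explained by the line
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # unexplained variability
    return ssr / sst, ssr, sse, sst


# Hypothetical toy data
r2, ssr, sse, sst = simple_regression_r2([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"SSR = {ssr:.2f}, SSE = {sse:.2f}, SST = {sst:.2f}, R^2 = {r2:.2f}")
```

For these toy points SSR = 3.6 and SST = 6, so R² = 0.6: the estimated line explains 60% of the variability in y.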
A. Inclusion of more variables
- An unfortunate result of adding more independent
variables to our regression is that R² will
increase (it can never decrease), even if we are
adding insignificant variables.
- For example, if we had added x2 = color of the
car to our repair regression, R² would have
increased marginally, despite the ridiculous idea
that the color of a car should influence its
repair cost.
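The claim that R² never falls when a variable is added can be checked numerically. This is a sketch under stated assumptions, not the repair data. It simulates y from one real predictor and then adds a pure-noise column as a stand-in for something like car color. Here color is coded as a random number rather than a category. Both models are fit by ordinary least squares through the normal equations, and the helper `ols_r2` is hypothetical.

```python
import random


def ols_r2(columns, y):
    """Fit y on an intercept plus the given x columns by least squares; return R^2."""
    n = len(y)
    # Design matrix rows: [1, x1, x2, ...]
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented normal equations [X'X | X'y]
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
        + [sum(X[r][i] * y[r] for r in range(n))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(k):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    coef = [A[i][k] / A[i][i] for i in range(k)]
    y_hat = [sum(b * x for b, x in zip(coef, row)) for row in X]
    y_bar = sum(y) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    # With an intercept, SST = SSR + SSE, so SSR/SST = 1 - SSE/SST
    return 1 - sse / sst


random.seed(1)
n = 30
x1 = [random.uniform(0, 10) for _ in range(n)]
y = [5 + 2 * a + random.gauss(0, 3) for a in x1]
# Hypothetical stand-in for an irrelevant variable such as car color
noise = [random.random() for _ in range(n)]

r2_one = ols_r2([x1], y)
r2_two = ols_r2([x1, noise], y)
print(f"R^2 with x1 only:    {r2_one:.4f}")
print(f"R^2 with x1 + noise: {r2_two:.4f}")
```

Least squares can always set the extra coefficient to zero, so adding a column can never raise SSE. R² therefore cannot fall, and in practice it almost always creeps up a little.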
B. Adjustment
- To keep R² from being inflated simply by adding
more and more variables, we compensate for
the number of independent variables in the model.
With n denoting the number of observations in the
sample and p the number of independent variables
included in the model, the adjusted coefficient of
determination is
$R_a^2 = 1 - (1 - R^2)\frac{n - 1}{n - p - 1}$
C. An Example
- Y is the number of hours of television watched in a week.
- X1 is the amount of alcohol consumed in a typical
week.
Can you interpret these estimated coefficients
and test their significance? Can you correctly
evaluate the fit of the equation?
Include one more variable
- Now I'll add X2 = age of the student. I don't
believe age affects television viewing, but I am
adding it to make a point.
If you looked only at R², you would conclude
that the goodness of fit slightly improved.
However, looking at the adjusted R² ($R_a^2$), you can
see that adding this insignificant variable actually
decreased the fit. Alcohol is still significant and
positive, but Age is insignificant.
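Here is a minimal sketch of the adjusted R² calculation, assuming the standard textbook formula written in the docstring. The R² values are hypothetical, because the actual Excel output is not reproduced here. They are chosen only to show how a small gain in R² from adding a second variable can still lower the adjusted value when n = 60.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


n = 60
# Hypothetical R^2 values: adding X2 nudges R^2 up slightly
r2_one, r2_two = 0.200, 0.205
ra_one = adjusted_r2(r2_one, n, p=1)
ra_two = adjusted_r2(r2_two, n, p=2)
print(f"p = 1: R^2 = {r2_one:.3f}, adjusted R^2 = {ra_one:.4f}")
print(f"p = 2: R^2 = {r2_two:.3f}, adjusted R^2 = {ra_two:.4f}")
```

In this example R² rises from .200 to .205, but the adjusted value falls from about .186 to about .177. The penalty factor (n - 1)/(n - p - 1) grows from 59/58 to 59/57, and the small gain in R² is not enough to offset that penalty.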
IV. Model Assumptions
These assumptions are modified from Chapter 14 to
accommodate the inclusion of multiple independent
variables.
- The error term ε is a normally distributed random
variable with a mean of zero, and thus
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$
- The variance of ε is constant for all values of
x1, x2, …, xp, so the spread of y about the
regression equation is the same everywhere.
- All ε are independent; no error term is influenced
by any other error term.
V. Testing for Significance
- Now that we have more than one independent
variable, we can conduct a true F-test of
overall significance.
- $H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0$
- $H_a$: One or more of the parameters is not equal
to zero.
A. The F-test
- As described in Chapter 14, the test statistic is
calculated by $F = MSR/MSE$,
- where $MSR = SSR/p$ and p is the number of x-variables,
- and $MSE = SSE/(n - p - 1)$.
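The function below is a sketch of the arithmetic that turns ANOVA sums of squares into the F statistic. The name `f_statistic` and the input values are hypothetical. In practice SSR and SSE would come from the ANOVA table in the regression output.

```python
def f_statistic(ssr, sse, n, p):
    """Overall F statistic for a regression with p independent variables."""
    msr = ssr / p              # mean square due to regression
    mse = sse / (n - p - 1)    # mean square error
    return msr / mse


# Hypothetical ANOVA quantities for n = 60 observations and p = 2 variables
f = f_statistic(ssr=40.0, sse=285.0, n=60, p=2)
print(f"F = {f:.2f}")
```

With these made-up numbers, MSR = 40/2 = 20 and MSE = 285/57 = 5, so F = 4.0.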
B. Rejection Rule
- The critical value $F_\alpha$ is based on an F distribution
with p degrees of freedom in the numerator and
(n - p - 1) degrees of freedom in the denominator.
- Reject $H_0$ if $F \ge F_\alpha$.
- So I'll test the overall significance of my
television-watching model.
C. The Example
- I have a sample of n = 60 and p = 2 independent
variables.
- I have 2 d.f. in the numerator and 57 d.f. in the
denominator.
- So at the .05 level of significance, my critical
F is approximately 3.15.
- If my test F is greater than 3.15, I reject the
null and conclude that at least one of my
coefficients is NOT zero and my model has overall
significance.
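The 3.15 comes from an F table. When the numerator has exactly 2 degrees of freedom, as it does here, the upper-tail probability has a closed form: $P(F > f) = (1 + 2f/\nu_2)^{-\nu_2/2}$. That means the critical value can be solved for directly. This shortcut only works for 2 numerator d.f. For a general p you would use a table or a statistics library. `f_critical_2_df` is a hypothetical helper name.

```python
def f_critical_2_df(alpha, df2):
    """Upper-tail critical value of F with 2 numerator d.f. and df2 denominator d.f.

    Uses the closed form P(F > f) = (1 + 2f/df2) ** (-df2/2), solved for f.
    Valid ONLY when the numerator has exactly 2 degrees of freedom.
    """
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)


n, p = 60, 2
df1, df2 = p, n - p - 1   # 2 and 57 degrees of freedom
for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: critical F({df1}, {df2}) = {f_critical_2_df(alpha, df2):.3f}")
```

With 57 denominator d.f., this gives about 3.16 at the .05 level and about 5.00 at the .01 level. Tables that list only 60 d.f. show 3.15 at .05, which is where the approximation above comes from.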
Excel Output
My test statistic is greater than 3.15, so I
reject H0. You can see from the p-value
that it is less than α (.05), which also
indicates that I should reject H0. However, it is
not less than α (.01). Thus my model is
significant at the 95% level, but not at the 99%
level of confidence.
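With 2 numerator d.f., the same closed form also gives the p-value. That makes the comparison against both α levels easy to reproduce. The test statistic 4.0 below is hypothetical and is not the value in my Excel output. `f_p_value_2_df` is likewise a hypothetical helper.

```python
def f_p_value_2_df(f, df2):
    """P(F > f) for an F distribution with 2 numerator and df2 denominator d.f."""
    return (1 + 2 * f / df2) ** (-df2 / 2)


f_stat = 4.0    # hypothetical test statistic, not the value from the Excel output
p_value = f_p_value_2_df(f_stat, df2=57)
for alpha in (0.05, 0.01):
    decision = "reject H0" if p_value < alpha else "cannot reject H0"
    print(f"alpha = {alpha}: p-value = {p_value:.4f} -> {decision}")
```

A p-value of about 0.024 for this hypothetical F leads to the same split conclusion: reject H0 at α = .05, but not at α = .01.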
E. t-Tests
- A t-test of a coefficient's statistical
significance is done the same way as in Chapter
14.
- If $|t| > t_{\alpha/2}$ (with n - p - 1 degrees of
freedom), reject the null that $\beta_i = 0$ for that
coefficient.
- Reviewing my Excel output reveals that
the coefficient on Age is insignificant. You
can't reject the null that that coefficient is
zero. You CAN reject the null for the
Alcohol coefficient. It is statistically
significant.
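Here is a minimal sketch of the t-test decision for one coefficient. It computes t = b / s_b and compares |t| with $t_{\alpha/2}$ for n - p - 1 = 57 d.f. The critical value 2.002 is an approximate t-table value that I am assuming. The coefficients and standard errors are hypothetical, not the estimates from my output, and `coefficient_t_test` is a hypothetical helper name.

```python
def coefficient_t_test(b, s_b, t_crit):
    """Test H0: beta_i = 0 using t = b / s_b; reject when |t| > t_crit."""
    t = b / s_b
    return t, abs(t) > t_crit


# Approximate t_{.025} with 57 degrees of freedom (two-tailed test, alpha = .05)
T_CRIT = 2.002

# Hypothetical estimates and standard errors (not the real Excel output)
t_alcohol, reject_alcohol = coefficient_t_test(0.90, 0.35, T_CRIT)
t_age, reject_age = coefficient_t_test(0.05, 0.08, T_CRIT)

print(f"Alcohol: t = {t_alcohol:.2f}, reject H0: {reject_alcohol}")
print(f"Age:     t = {t_age:.2f}, reject H0: {reject_age}")
```

The absolute value makes this a two-tailed test. A large negative t is just as strong evidence against $\beta_i = 0$ as a large positive one.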