Title: Inference for Regression
1Lesson 15 - 1
2Knowledge Objectives
- Identify the conditions necessary to do inference for regression.
- Explain what is meant by the standard error about the least-squares line.
3Construction Objectives
- Given a set of data, check that the conditions for doing inference for regression are present.
- Compute a confidence interval for the slope of the regression line.
- Conduct a test of the hypothesis that the slope of the regression line is 0 (or that the correlation is 0) in the population.
4Vocabulary
- Statistical inference: tests to see whether the relationship is statistically significant
5Conditions for Regression Inference
- Repeated responses y are independent of each other
- The mean response, µy, has a straight-line relationship with x: µy = α + βx, where the slope β and intercept α are unknown parameters
- The standard deviation of y (call it σ) is the same for all values of x. The value of σ is unknown.
- For any fixed value of x, the response variable y varies according to a Normal distribution
6Sampling Distribution Concepts
- Remember from our sampling distribution lesson that the means of repeated samples are approximately Normally distributed (n > 30, so the CLT applies)
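The same repeated-sampling idea applies to the sample slope b. The sketch below is only an illustration: the "true" values ALPHA, BETA and SIGMA, the x values, and the number of repetitions are all made-up assumptions, not numbers from this lesson. It draws many samples from a straight-line model with Normal scatter and shows that the sample slopes centre on the true slope.

```python
import random

def slope(x, y):
    """Least-squares slope b = Sxy / Sxx."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical "true" population values, chosen only for illustration.
ALPHA, BETA, SIGMA = 2.0, 0.5, 1.0
x = list(range(1, 21))      # fixed x values, n = 20

random.seed(15)             # fixed seed so the run is repeatable
slopes = []
for _ in range(2000):       # 2000 repeated samples from the same model
    y = [ALPHA + BETA * xi + random.gauss(0, SIGMA) for xi in x]
    slopes.append(slope(x, y))

mean_b = sum(slopes) / len(slopes)
print(f"true slope = {BETA}, average of sample slopes = {mean_b:.3f}")
```

Individual slopes vary from sample to sample, but their average sits close to the true slope, which is what it means for b to be an unbiased estimator.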
7Checking Regression Conditions
- Observations are independent
- No repeated observations on the same individual
- The true relationship is linear
- Scatter plot the data to check this
- Remember the transformations to make non-linear data linear
- Response standard deviation is the same everywhere
- Check the scatter plot to see if this is violated
- Response varies Normally about the true regression line
- To check this, we look at the residuals (since they must be Normally distributed as well) with either a box plot or a normality plot
- These procedures are robust, so slight departures from Normality will not materially affect the inference
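The residual check can be sketched in a few lines. This is a minimal illustration, not the calculator procedure: the data are hypothetical, and the helper name `residuals` is made up for this example. It computes the residuals from the least-squares line and the five-number summary a box plot would be drawn from.

```python
import statistics

def residuals(x, y):
    """Residuals y - y-hat from the least-squares line y-hat = a + bx."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
         / sum((xi - xbar) ** 2 for xi in x))
    a = ybar - b * xbar
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Hypothetical data, made up for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

res = residuals(x, y)
q1, median, q3 = statistics.quantiles(res, n=4)   # quartiles for a box plot

print("min, Q1, median, Q3, max of residuals:")
print(round(min(res), 3), round(q1, 3), round(median, 3),
      round(q3, 3), round(max(res), 3))
```

A roughly symmetric summary with no extreme values is consistent with the Normality condition; strong skew or outliers in the residuals would be a warning sign.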
8Estimating the Parameters
- We need to estimate the parameters α and β in µy = α + βx, and σ
- From the least-squares regression line ŷ = a + bx we get unbiased estimators a (for α) and b (for β)
- The standard error about the line, s = sqrt(Σ(y − ŷ)² / (n − 2)), estimates σ
- We use n − 2 because we used a and b as estimators
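A minimal sketch of these estimates, assuming the usual least-squares formulas b = Sxy/Sxx and a = ȳ − b·x̄. The function name `fit_line` and the small data set are hypothetical, introduced only for illustration.

```python
import math

def fit_line(x, y):
    """Return (a, b, s) for the least-squares line y-hat = a + bx."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx                      # estimated slope
    a = ybar - b * xbar                # estimated intercept
    # Sum of squared residuals, divided by n - 2 because a and b
    # were both estimated from the same data.
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))       # standard error about the line
    return a, b, s

# Hypothetical data, made up for illustration only.
x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 8.8, 11.0]

a, b, s = fit_line(x, y)
print(f"a = {a:.3f}, b = {b:.3f}, s = {s:.3f}")
```

Dividing by n − 2 rather than n is what makes s² an unbiased estimate of the variance about the true line.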
9Confidence Interval on β
- Remember our form: Point Estimate ± Margin of Error
- Since β is the true slope, b is the point estimate
- The Margin of Error takes the form t* × SEb
10Confidence Intervals in Practice
- We rarely have to calculate this by hand
- Output from Minitab
Parameters: b (1.4929), a (91.3), s (17.50)
t* = 2.042 (df = n − 2, 95% CL)
CI = PE ± MOE = 1.4929 ± (2.042)(0.4870)
= 1.4929 ± 0.9944
= (0.4985, 2.4873)
Since 0 is not in the interval, we might conclude that β ≠ 0
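The interval arithmetic can be reproduced in a few lines. The three input numbers are the ones quoted on the slide (b, t* and SEb); the variable names are just labels chosen for this sketch.

```python
# Values quoted on the slide.
b = 1.4929        # point estimate of the slope
t_star = 2.042    # critical t value used on the slide for 95% confidence
se_b = 0.4870     # standard error of the slope

margin = t_star * se_b
low, high = b - margin, b + margin
contains_zero = low <= 0 <= high

print(f"margin of error = {margin:.4f}")
print(f"95% CI for the slope: ({low:.4f}, {high:.4f})")
print("0 inside the interval?", contains_zero)
```

The result agrees with the slide's interval to about three decimal places; the small difference in the last digit comes from rounding the margin of error before adding and subtracting.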
11Inference Tests on β
- Since the null hypothesis cannot be proved, our hypotheses for tests on the regression slope will be
H0: β = 0 (no correlation between x and y)
Ha: β ≠ 0 (some linear correlation)
- Testing correlation makes sense only if the observations are a random sample.
- This is often not the case in regression settings, where researchers often fix in advance the values of x being tested
12Test Statistic
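The summary slide gives the test statistic as t = b/SEb. A minimal sketch of that calculation follows, using the slope, standard error and critical value quoted on the confidence-interval slide. The function name `slope_t_statistic` and its `beta0` parameter are hypothetical, introduced only for this example.

```python
def slope_t_statistic(b, se_b, beta0=0.0):
    """t statistic for H0: beta = beta0 (beta0 = 0 for the usual test)."""
    return (b - beta0) / se_b

# Values quoted on the confidence-interval slide.
b = 1.4929
se_b = 0.4870
t_star = 2.042          # two-sided 95% critical value used on that slide

t = slope_t_statistic(b, se_b)
reject_h0 = abs(t) > t_star

print(f"t = {t:.4f}")
print("Reject H0 (slope = 0) at the 5% level?", reject_h0)
```

The statistic has n − 2 degrees of freedom, the same degrees of freedom used for the standard error about the line.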
13Beer vs BAC Example
- 16 student volunteers at Ohio State drank a randomly assigned number of cans of beer. Thirty minutes later, a police officer measured their BAC. Here are the data.
- Enter the data into your calculator.
- Draw a scatter plot of the data and the regression line
- Conduct an inference test on the effect of beers on BAC
Student  1     2     3     4     5      6      7     8
Beers    5     2     9     8     3      7      3     5
BAC      0.10  0.03  0.19  0.12  0.04   0.095  0.07  0.06
Student  9     10    11    12    13     14     15    16
Beers    3     5     4     6     5      7      1     4
BAC      0.02  0.05  0.07  0.10  0.085  0.09   0.01  0.05
LinReg(a+bx) L1, L2, Y1
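For readers without the calculator, the same test can be sketched in plain Python using the beer and BAC data from the table. This is not the TI routine: it assumes the standard formula SEb = s / sqrt(Sxx), and the critical value 2.145 (two-sided 95%, 14 degrees of freedom) is taken from a t table rather than from this lesson.

```python
import math

# Data from the table: beers drunk and measured BAC for 16 students.
beers = [5, 2, 9, 8, 3, 7, 3, 5, 3, 5, 4, 6, 5, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.12, 0.04, 0.095, 0.07, 0.06,
       0.02, 0.05, 0.07, 0.10, 0.085, 0.09, 0.01, 0.05]

n = len(beers)
xbar, ybar = sum(beers) / n, sum(bac) / n
sxx = sum((x - xbar) ** 2 for x in beers)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(beers, bac))

b = sxy / sxx                          # slope
a = ybar - b * xbar                    # intercept
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(beers, bac))
s = math.sqrt(sse / (n - 2))           # standard error about the line
se_b = s / math.sqrt(sxx)              # standard error of the slope (assumed formula)
t = b / se_b                           # test statistic for H0: beta = 0

T_STAR_DF14 = 2.145                    # from a t table: 95%, two-sided, df = 14

print(f"y-hat = {a:.4f} + {b:.5f} x")
print(f"s = {s:.4f}, SEb = {se_b:.5f}, t = {t:.2f}, df = {n - 2}")
print("Reject H0 at the 5% level?", abs(t) > T_STAR_DF14)
```

The slope comes out at roughly 0.018 BAC per beer with a t statistic of about 7.5, far beyond the critical value, so the data give strong evidence that the slope is not 0.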
14Scatter plot and Regression Line
D F S O C
- Interpret the scatter plot
15Output from Minitab
- Could we have used this instead of output from
our calculator?
16Using the TI for Inference Test on β
- Enter explanatory data into L1
- Enter response data into L2
- STAT → TESTS → E:LinRegTTest
- Xlist: L1
- Ylist: L2
- (Test type) β & ρ: ≠0  <0  >0
- RegEQ: (leave blank)
- Test will take two screens to output the data
Inference: t-statistic, degrees of freedom and p-value
Regression: a, b, s, r², and r
17TI Output from page 907
- y = a + bx
- β ≠ 0 and ρ ≠ 0
- t = 3.06548
- p = .004105
- df = 36
- a = 91.26829
- b = 1.492896
- s = 17.49872
- r² = .206999
- r = .4549725
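The numbers in this output are tied together. The sketch below checks them against each other using the identity t = r·sqrt(n − 2)/sqrt(1 − r²), which holds because the test for a zero slope and the test for a zero correlation are the same test. The identity is a standard result rather than something stated on the slide, and the variable names are chosen for this example only.

```python
import math

# Values copied from the TI output above.
r = 0.4549725
df = 36             # n - 2
b = 1.492896
t_reported = 3.06548

# t statistic recovered from the correlation alone.
t_from_r = r * math.sqrt(df) / math.sqrt(1 - r ** 2)

# Standard error of the slope implied by t = b / SEb.
se_b = b / t_from_r

print(f"t from r    = {t_from_r:.5f} (reported: {t_reported})")
print(f"r squared   = {r ** 2:.6f}")
print(f"implied SEb = {se_b:.4f}")
```

The implied SEb of about 0.487 matches the value used in the confidence-interval calculation from the Minitab output.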
Minitab Output
18Interpreting Computer Output
- In the following examples of computer output from commonly used statistical packages:
- Find the a and b values for the regression equation
- Find r and r²
- Find SEb, t-value and p-value (if available)
- We can use these outputs to finish an inference test on the association of our explanatory and response variables.
19Sample from Excel prob 15.10
20Sample from CrunchIt prob 15.20
21Summary and Homework
- Summary
- Inference conditions needed:
1) Observations independent
2) True relationship is linear
3) σ is constant
4) Responses Normally distributed about the line
- Confidence intervals on β can be done
- Inference testing on β uses the t statistic t = b/SEb
- Homework
- Pg 914-918: 15.18-19, 15.21-23