Title: Regression lecture 2
- 1. Review deterministic and random components
- 2. The coefficient of determination
- 3. Using the regression line
- 4. Estimation of the mean value of Y for some X
- 5. Prediction of an individual value of Y for some X
- 6. Estimation and prediction contrasted
- 7. Estimation and prediction formulas
- 8. Examples
1. Deterministic and random components
- Our basic question is whether there is a relationship between two variables, X and Y.
- To answer this question, we compare the deterministic part of the relationship to the random part.
- The deterministic part is the part that would look the same if we sampled and measured again: it's there for a reason (if it's there at all).
1. Deterministic and random components
- Deterministic part: the least squares line (which determines a $\hat{Y}$ for each value of X).
- Random part: deviations of observed Y scores from the least squares line.
- (Note the similarity here to t, F, and Z tests, where we compare the numerator (treatment + error) to the denominator (error).)
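The split into a deterministic part and a random part can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the data values and the helper name `fit_least_squares` are invented for the example, and the slope and intercept use the usual least squares formulas ($SS_{XY}/SS_{XX}$ for the slope).

```python
def fit_least_squares(xs, ys):
    """Return (b0, b1) for the least squares line Y-hat = b0 + b1 * X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_least_squares(xs, ys)

# Deterministic part: the value the line assigns to each X.
fitted = [b0 + b1 * x for x in xs]
# Random part: what is left of each observed Y after removing the line.
residuals = [y - f for y, f in zip(ys, fitted)]

for x, y, f, e in zip(xs, ys, fitted, residuals):
    print(f"X={x:.0f}  Y={y:.1f}  fitted={f:.2f}  residual={e:+.2f}")
```

Each observed Y is exactly its fitted value plus its residual, and the residuals of a least squares line sum to zero (up to floating-point rounding).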
2. Coefficient of determination
- Once we have a regression line, we can assess its usefulness as a numerical model of the X-Y relationship.
- We can do this by testing a hypothesis about the slope $\beta_1$ or the correlation $\rho$, as last week.
- We can also square the correlation coefficient to get the coefficient of determination, $r^2$.
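[Figure: plot of Y against X; the lines (---) run from each observed $Y_i$ to the mean, $\bar{Y}$.]
The sum of the squared line (---) lengths gives the total error when we predict the $Y_i$ using the mean, $\bar{Y}$:
$SS_{YY} = \sum (Y_i - \bar{Y})^2$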
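Regression line
[Figure: plot of Y against X with the regression line; the lines (---) run from each observed $Y_i$ to the regression line.]
The sum of the squared line (---) lengths gives the total error when we predict the $Y_i$ using the regression line:
$SSE = \sum (Y_i - \hat{Y}_i)^2$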
2. The coefficient of determination
- If knowing X reduces our uncertainty about Y, then $SSE \ll SS_{YY}$. In that case $r^2$, the coefficient of determination, tells us something useful:
- $r^2 = \dfrac{SS_{YY} - SSE}{SS_{YY}} = \dfrac{\text{explained sample variability in } Y}{\text{total sample variability in } Y}$
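As a quick numerical check, the sketch below computes $r^2$ two ways: from the sums of squares, and by squaring the correlation coefficient. The inputs are the summary statistics quoted in the Emotional intelligence example later in this lecture. The formula $r = SS_{XY} / \sqrt{SS_{XX}\, SS_{YY}}$ is the standard one and is assumed here rather than taken from these slides.

```python
import math

# Summary statistics from the Emotional intelligence example in this lecture.
ss_yy = 115.429   # total sample variability in Y
ss_xx = 139.71
ss_xy = 109.143
sse = 30.188      # variability left over around the regression line

# Route 1: share of the total variability that the line explains.
r2_from_ss = (ss_yy - sse) / ss_yy

# Route 2: square the correlation coefficient r.
r = ss_xy / math.sqrt(ss_xx * ss_yy)
r2_from_r = r ** 2

print(f"r^2 from sums of squares: {r2_from_ss:.3f}")
print(f"r^2 from squaring r:      {r2_from_r:.3f}")
```

The two routes agree to about three decimal places (roughly .74). The small gap comes from the slides rounding the slope to .781 before computing SSE.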
3. Using the regression line
- So far, we've learned how to decide whether our regression line is useful.
- Suppose the test of hypothesis tells us the line is useful. What can we do with it?
- We'll consider two alternative uses: estimation and prediction.
3. Using the regression line
- Estimation
  - gives the average value of Y ($\bar{Y}$) for all cases that have a given value of X
- Prediction
  - gives an individual Y score for one case that has a given value of X
4. Estimation of the mean value of Y for some X
- We can estimate the mean value of Y for a specific value of X.
- e.g., we can estimate $\bar{Y}$ for ALL people whose blood contains a 4% concentration of some drug
- here, Y would be some variable of interest such as (for example) reaction time (RT) to perform some task
- we could estimate mean RT for all people who have the 4% drug concentration in their blood
5. Prediction of an individual value of Y for some X
- We can predict an individual value of Y for a given value of X.
- e.g., we could predict RT for a specific person whose blood contains a 4% concentration of the drug
6. Estimation and prediction contrasted
- Recall from last week the two sources of error when using X to calculate an expected Y:
- 1. In the population, Y is not uniquely determined by X. As a result, for each value of X, there is a distribution of possible Y values.
  - even if we knew the true line $Y = \beta_0 + \beta_1 X + \varepsilon$, we would still have this source of error
6. Estimation and prediction contrasted
- Two sources of error when using X to calculate an expected value of Y:
- 2. The line we do have, $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$, is not precisely correct
  - it does not capture the relationship between X and Y exactly, because it is based on sample data
6. Estimation and prediction contrasted
- Estimation
  - only the second source of error is at work
  - things other than X that influence Y in the population are random effects, so on average across all cases they cancel out
- Prediction
  - both sources of error are at work
7. Estimation and prediction formulas
- Estimation interval:
- $\hat{Y} \pm t_{\alpha/2}\, s \sqrt{\dfrac{1}{n} + \dfrac{(X_P - \bar{X})^2}{SS_{XX}}}$
- $t_{\alpha/2}$ is based on d.f. $= n - 2$
7. Estimation and prediction formulas
- Prediction interval:
- $\hat{Y} \pm t_{\alpha/2}\, s \sqrt{1 + \dfrac{1}{n} + \dfrac{(X_P - \bar{X})^2}{SS_{XX}}}$
- $t_{\alpha/2}$ is based on d.f. $= n - 2$
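The estimation and prediction formulas differ only in the extra 1 under the square root, which a short Python sketch makes concrete. The function name `interval_half_width` and all input values below are hypothetical, chosen only to illustrate the formulas. The critical value is passed in from a t table rather than computed.

```python
import math


def interval_half_width(t_crit, s, n, x_p, x_bar, ss_xx, individual):
    """Half-width of the interval around Y-hat at X = x_p.

    individual=False: estimation interval for the mean of Y at x_p.
    individual=True:  prediction interval for one new Y at x_p
                      (adds the extra 1 under the square root).
    """
    inside = 1.0 / n + (x_p - x_bar) ** 2 / ss_xx
    if individual:
        inside += 1.0
    return t_crit * s * math.sqrt(inside)


# Hypothetical inputs, invented for illustration only.
t_crit = 2.306   # t table value for alpha/2 = .025 with d.f. = n - 2 = 8
s, n, x_bar, ss_xx = 1.5, 10, 20.0, 50.0

for x_p in (20.0, 25.0):
    est = interval_half_width(t_crit, s, n, x_p, x_bar, ss_xx, individual=False)
    pred = interval_half_width(t_crit, s, n, x_p, x_bar, ss_xx, individual=True)
    print(f"X_P={x_p}: estimation +/- {est:.3f}, prediction +/- {pred:.3f}")
```

For any given $X_P$ the prediction interval is wider than the estimation interval, and both widen as $X_P$ moves away from $\bar{X}$.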
Examples: Emotional intelligence
- First, we find $\bar{X}$ and $\bar{Y}$:
- $\bar{X} = \dfrac{\sum X}{n} = \dfrac{74}{7} = 10.571$
- $\bar{Y} = \dfrac{\sum Y}{n} = \dfrac{82}{7} = 11.714$
Examples: Emotional intelligence
- From last week:
- $SS_{XY} = 109.143$
- $SS_{XX} = 139.71$
- Thus, $\hat{\beta}_1 = \dfrac{109.143}{139.71} = .781$
Examples: Emotional intelligence
- $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$
- $= 11.714 - .781(10.571)$
- $= 3.46$
- $SSE = SS_{YY} - \hat{\beta}_1 (SS_{XY})$
- $= 115.429 - .781(109.143)$
- $= 30.188$
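The arithmetic on these slides can be reproduced from the quoted summary statistics. The sketch below is only a check, and it carries full precision throughout. The slides round the slope to .781 before reusing it, so the later decimals differ slightly (SSE comes out near 30.17 rather than 30.188).

```python
# Summary statistics quoted on the slides for the Emotional intelligence example.
n = 7
sum_x, sum_y = 74, 82
ss_xy, ss_xx, ss_yy = 109.143, 139.71, 115.429

x_bar = sum_x / n
y_bar = sum_y / n

b1 = ss_xy / ss_xx          # slope
b0 = y_bar - b1 * x_bar     # intercept
sse = ss_yy - b1 * ss_xy    # error sum of squares

print(f"x_bar = {x_bar:.3f}, y_bar = {y_bar:.3f}")
print(f"slope = {b1:.3f}, intercept = {b0:.2f}, SSE = {sse:.3f}")
```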
Examples: Emotional intelligence
- $s = \sqrt{\dfrac{SSE}{n - 2}} = \sqrt{\dfrac{30.188}{5}} = 2.457$
Examples: Emotional intelligence
- The question says: "Use the data to form a 95% prediction interval for the Openness score of someone with an EI score of 13."
- $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X = 3.46 + .781(13) = 13.613$
- $t_{crit} = t(5, \alpha/2 = .025) = 2.571$
Examples: Emotional intelligence
- Interval is:
- $13.613 \pm (2.571)(2.457)\sqrt{1 + \dfrac{1}{7} + \dfrac{(13 - 10.571)^2}{139.71}}$
- $= 13.613 \pm 6.877$
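The prediction interval can be checked numerically. The sketch below plugs the values from the slides into the prediction-interval formula. It takes $s = \sqrt{SSE/(n-2)}$, the usual estimate of the error standard deviation, which matches the 2.457 used in the interval.

```python
import math

# Values taken from the Emotional intelligence slides.
n = 7
x_bar = 10.571
ss_xx = 139.71
sse = 30.188
y_hat = 13.613     # predicted Openness score at EI = 13
x_p = 13
t_crit = 2.571     # t(5, alpha/2 = .025) from a t table

# Estimated standard deviation of the errors, with n - 2 degrees of freedom.
s = math.sqrt(sse / (n - 2))

# Prediction interval: note the leading 1 under the square root.
half_width = t_crit * s * math.sqrt(1 + 1 / n + (x_p - x_bar) ** 2 / ss_xx)
lower, upper = y_hat - half_width, y_hat + half_width

print(f"s = {s:.3f}")
print(f"interval: {y_hat} +/- {half_width:.3f}  ->  ({lower:.2f}, {upper:.2f})")
```

The interval runs from about 6.74 to 20.49. That is wide relative to the data, which is what we expect when predicting a single person's score from only seven cases.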
Examples: Laughing
- First, we find $\bar{X}$ and $\bar{Y}$:
- $\bar{X} = \dfrac{\sum X}{n} = \dfrac{4.2}{7} = .60$
- $\bar{Y} = \dfrac{\sum Y}{n} = \dfrac{32}{7} = 4.5714$
Examples: Laughing
- From last week:
- $SS_{XY} = 2.15$
- $SS_{XX} = .34$
- Thus, $\hat{\beta}_1 = \dfrac{2.15}{.34} = 6.3235$
Examples: Laughing
- $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$
- $= 4.5714 - 6.3235(.60)$
- $= .7773$
- $SSE = SS_{YY} - \hat{\beta}_1 (SS_{XY})$
- $= 15.2143 - 6.3235(2.15)$
- $= 1.6188$
Examples: Laughing
- $s = \sqrt{\dfrac{SSE}{n - 2}} = \sqrt{\dfrac{1.6188}{5}} = .569$
Examples: Laughing
- The question says: "Regardless of your answer to part (a), form the 95% confidence interval for the predicted y value for a delay of .5 seconds (i.e., for all instances of .5)."
- $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X = .7773 + 6.3235(.5) = 3.939$
- $t_{crit} = t(5, \alpha/2 = .025) = 2.571$
Examples: Laughing
- Interval is:
- $3.939 \pm (2.571)(.569)\sqrt{\dfrac{1}{7} + \dfrac{(.5 - .6)^2}{.34}}$
- $= 3.939 \pm (2.571)(.569)(.4151)$
- $= 3.939 \pm .607$
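The same check works for the Laughing example, this time with the estimation (confidence) formula because the question asks about all instances of a .5 second delay. The last two lines are not part of the question. They add the prediction-interval half-width for a single instance at the same delay, purely for contrast.

```python
import math

# Values taken from the Laughing slides.
n = 7
x_bar = 0.60
ss_xx = 0.34
sse = 1.6188
b0, b1 = 0.7773, 6.3235
x_p = 0.5
t_crit = 2.571     # t(5, alpha/2 = .025) from a t table

y_hat = b0 + b1 * x_p
s = math.sqrt(sse / (n - 2))

# Estimation interval for the mean of Y at x_p: no leading 1 under the root.
root_term = math.sqrt(1 / n + (x_p - x_bar) ** 2 / ss_xx)
half_width = t_crit * s * root_term
print(f"y_hat = {y_hat:.3f}, s = {s:.3f}, root term = {root_term:.4f}")
print(f"95% confidence interval: {y_hat:.3f} +/- {half_width:.3f}")

# For contrast only (not asked in the question): one individual instance.
pred_half_width = t_crit * s * math.sqrt(1 + 1 / n + (x_p - x_bar) ** 2 / ss_xx)
print(f"95% prediction interval would be: {y_hat:.3f} +/- {pred_half_width:.3f}")
```

The confidence interval matches the slide's $3.939 \pm .607$. The prediction interval for a single instance is more than twice as wide, because it also carries the first source of error.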