Title: Correlation and Regression
Chapter 11
- Correlation and Regression
Outline
- 11-1 Introduction
- 11-2 Scatter Plots
- 11-3 Correlation
- 11-4 Regression
- 11-5 Coefficient of Determination and Standard Error of Estimate
Objectives
- Draw a scatter plot for a set of ordered pairs.
- Find the correlation coefficient.
- Test the hypothesis $H_0\colon \rho = 0$.
- Find the equation of the regression line.
- Find the coefficient of determination.
- Find the standard error of estimate.
- Find a prediction interval.
11-2 Scatter Plots
- A scatter plot is a graph of the ordered pairs
(x, y) of numbers consisting of the independent
variable, x, and the dependent variable, y.
11-2 Scatter Plots - Example
- Construct a scatter plot for the data obtained in
a study of age and systolic blood pressure of six
randomly selected subjects.
- The data are given on the next slide.
11-2 Scatter Plots - Example
Subject   Age, x   Pressure, y
A         43       128
B         48       120
C         56       135
D         61       143
E         67       141
F         70       152
11-2 Scatter Plots - Example
Positive Relationship
[Figure: scatter plot of pressure (vertical axis, 120 to 150) against age (horizontal axis, 40 to 70)]
11-2 Scatter Plots - Other Examples
Negative Relationship
11-2 Scatter Plots - Other Examples
No Relationship
11-3 Correlation Coefficient
- The correlation coefficient computed from the
sample data measures the strength and direction
of a linear relationship between two variables.
- Sample correlation coefficient: $r$.
- Population correlation coefficient: $\rho$.
11-3 Range of Values for the Correlation Coefficient
- $r = -1$: strong negative relationship
- $r = +1$: strong positive relationship
- $r = 0$: no linear relationship
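11-3 Formula for the Correlation Coefficient r
$$r = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\left(\sum x^2\right) - \left(\sum x\right)^2\right]\left[n\left(\sum y^2\right) - \left(\sum y\right)^2\right]}}$$
where $n$ is the number of data pairs.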
11-3 Correlation Coefficient - Example (Verify)
- Compute the correlation coefficient for the age
and blood pressure data.
- $\sum x = 345$, $\sum y = 819$, $\sum xy = 47{,}634$, $\sum x^2 = 20{,}399$, $\sum y^2 = 112{,}443$.
- Substituting in the formula for $r$ gives $r = 0.897$.
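One way to carry out the "verify" step is a short Python sketch that applies the sum-based formula for r to the six (age, pressure) pairs from the example table. The function and variable names are illustrative choices, not part of the slides.

```python
import math

# Age (x) and systolic blood pressure (y) for subjects A-F, from the example table.
ages = [43, 48, 56, 61, 67, 70]
pressures = [128, 120, 135, 143, 141, 152]


def correlation_coefficient(xs, ys):
    """Sample correlation coefficient r, using the sum-based formula."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


r = correlation_coefficient(ages, pressures)
print(round(r, 3))  # 0.897
```

The intermediate sums (345, 819, 47,634, 20,399, and 112,443) can be printed from inside the function to check them against the slide.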
11-3 The Significance of the Correlation Coefficient
- The population correlation coefficient, $\rho$, is the
correlation between all possible pairs of data
values (x, y) taken from a population.
11-3 The Significance of the Correlation Coefficient
- $H_0\colon \rho = 0$ and $H_1\colon \rho \neq 0$
- This tests for a significant correlation between
the variables in the population.
11-3 Formula for the t Test for the Correlation Coefficient
$$t = r\sqrt{\frac{n - 2}{1 - r^2}}$$
with d.f. $= n - 2$.
11-3 Example
- Test the significance of the correlation
coefficient for the age and blood pressure data.
Use $\alpha = 0.05$ and $r = 0.897$.
- Step 1: State the hypotheses.
- $H_0\colon \rho = 0$ and $H_1\colon \rho \neq 0$
11-3 Example
- Step 2: Find the critical values. Since $\alpha = 0.05$
and there are $6 - 2 = 4$ degrees of freedom,
the critical values are $t = +2.776$ and $t = -2.776$.
- Step 3: Compute the test value. $t = 4.059$ (verify).
11-3 Example
- Step 4: Make the decision. Reject the null
hypothesis, since the test value falls in the
critical region ($4.059 > 2.776$).
- Step 5: Summarize the results. There is a
significant relationship between the variables of
age and blood pressure.
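Steps 3 and 4 can be sketched in Python as follows. The critical value 2.776 is taken directly from the example rather than computed, and the function name is an illustrative assumption.

```python
import math


def t_statistic(r, n):
    """Test value for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


r = 0.897               # sample correlation coefficient from the example
n = 6                   # number of data pairs
critical_value = 2.776  # two-tailed, alpha = 0.05, d.f. = 4 (from the example)

t = t_statistic(r, n)
reject_null = abs(t) > critical_value

print(round(t, 3))   # approximately 4.059
print(reject_null)   # True: the test value falls in the critical region
```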
11-4 Regression
- The scatter plot for the age and blood pressure
data displays a linear pattern.
- We can model this relationship with a straight
line.
- This line is called the line of best fit, or the
regression line.
- The equation of the line is $y' = a + bx$.
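11-4 Formulas for the Regression Line $y' = a + bx$
$$a = \frac{\left(\sum y\right)\left(\sum x^2\right) - \left(\sum x\right)\left(\sum xy\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2}$$
$$b = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2}$$
where $a$ is the y intercept and $b$ is the slope of the line.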
11-4 Example
- Find the equation of the regression line for the
age and the blood pressure data.
- Substituting into the formulas gives $a = 81.048$
and $b = 0.964$ (verify).
- Hence, $y' = 81.048 + 0.964x$.
- Note that $a$ is the y intercept and $b$ is the slope
of the line.
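A minimal Python sketch of the "verify" step, applying the formulas for a and b to the age and blood pressure data. The function name is illustrative and not part of the slides.

```python
# Age (x) and systolic blood pressure (y) for subjects A-F.
ages = [43, 48, 56, 61, 67, 70]
pressures = [128, 120, 135, 143, 141, 152]


def regression_coefficients(xs, ys):
    """Return (a, b) for the line y' = a + bx, using the sum-based formulas."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2  # shared by both formulas
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    b = (n * sum_xy - sum_x * sum_y) / denominator
    return a, b


a, b = regression_coefficients(ages, pressures)
print(round(a, 3), round(b, 3))  # 81.048 0.964
```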
11-4 Example
$y' = 81.048 + 0.964x$
11-4 Using the Regression Line to Predict
- The regression line can be used to predict a
value for the dependent variable (y) for a given
value of the independent variable (x).
- Caution: use x values within the experimental
region when predicting y values.
11-4 Example
- Use the equation of the regression line to
predict the blood pressure for a person who is 50
years old.
- Since $y' = 81.048 + 0.964x$, then $y' = 81.048 + 0.964(50) = 129.248 \approx 129$.
- Note that the value of 50 is within the range of
x values.
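The prediction and the caution about staying inside the experimental region can be combined in a small Python sketch. The function name and the choice to raise an error outside the observed ages (43 to 70) are illustrative assumptions. The slides only advise caution.

```python
def predict_pressure(age, a=81.048, b=0.964, x_min=43, x_max=70):
    """Predict y' = a + bx, refusing x values outside the observed range."""
    if not x_min <= age <= x_max:
        raise ValueError(
            f"age {age} is outside the experimental region [{x_min}, {x_max}]"
        )
    return a + b * age


predicted = predict_pressure(50)
print(round(predicted, 3))  # 129.248
print(round(predicted))     # 129
```

Calling `predict_pressure(80)` raises `ValueError` in this sketch, because 80 lies beyond the oldest subject in the sample.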
11-5 Coefficient of Determination and Standard Error of Estimate
- The coefficient of determination, denoted by $r^2$,
is a measure of the variation of the dependent
variable that is explained by the regression line
and the independent variable.
11-5 Coefficient of Determination and Standard Error of Estimate
- $r^2$ is the square of the correlation coefficient.
- The coefficient of nondetermination is $1 - r^2$.
- Example: If $r = 0.90$, then $r^2 = 0.81$.
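As a small illustration, the same arithmetic can be applied to the age and blood pressure example, where r = 0.897. The function name is an illustrative choice.

```python
def determination(r):
    """Return (r squared, 1 - r squared) for a correlation coefficient r."""
    r_squared = r ** 2
    return r_squared, 1 - r_squared


# Age and blood pressure example: r = 0.897.
explained, unexplained = determination(0.897)
print(round(explained, 3), round(unexplained, 3))  # 0.805 0.195
```

Read this way, roughly 80% of the variation in blood pressure in that sample is explained by the regression line on age, and roughly 20% is not.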
11-5 Coefficient of Determination and Standard Error of Estimate
- The standard error of estimate, denoted by $s_{est}$,
is the standard deviation of the observed y
values about the predicted $y'$ values.
- The formula is given on the next slide.
11-5 Formula for the Standard Error of Estimate
$$s_{est} = \sqrt{\frac{\sum \left(y - y'\right)^2}{n - 2}}$$
or
$$s_{est} = \sqrt{\frac{\sum y^2 - a\sum y - b\sum xy}{n - 2}}$$
11-5 Standard Error of Estimate - Example
- From the regression equation $y' = 55.57 + 8.13x$
and $n = 6$, find $s_{est}$.
- Here, $a = 55.57$, $b = 8.13$, and $n = 6$.
- Substituting into the formula gives $s_{est} = 6.48$ (verify).
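A Python sketch of the "verify" step, using the second (sum-based) formula. The data are the six copy-machine pairs (age, monthly cost) from the prediction-interval example later in this chapter. The function name is illustrative. The rounded coefficients a = 55.57 and b = 8.13 are used as given, and the result is somewhat sensitive to that rounding.

```python
import math

# Copy-machine data from the prediction-interval example:
# age in years (x) and monthly maintenance cost (y).
ages = [1, 2, 3, 4, 4, 6]
costs = [62, 78, 70, 90, 93, 103]

a, b = 55.57, 8.13  # rounded regression coefficients, as given in the example


def standard_error_of_estimate(xs, ys, a, b):
    """s_est from the sum-based formula, with n - 2 in the denominator."""
    n = len(xs)
    sum_y = sum(ys)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return math.sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))


s_est = standard_error_of_estimate(ages, costs, a, b)
print(round(s_est, 2))  # 6.48
```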
11-5 Prediction Interval
- A prediction interval is an interval constructed
about a predicted y value, $y'$, for a specified
x value.
11-5 Prediction Interval
- For a given $\alpha$ value, we can state with $(1 - \alpha) \cdot 100\%$
confidence that the interval will contain the
actual mean of the y values that correspond to
the given value of x.
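11-5 Formula for the Prediction Interval about a Value $y'$
$$y' - t_{\alpha/2}\, s_{est}\sqrt{1 + \frac{1}{n} + \frac{n\left(x - \bar{X}\right)^2}{n\sum x^2 - \left(\sum x\right)^2}} < y < y' + t_{\alpha/2}\, s_{est}\sqrt{1 + \frac{1}{n} + \frac{n\left(x - \bar{X}\right)^2}{n\sum x^2 - \left(\sum x\right)^2}}$$
with d.f. $= n - 2$.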
11-5 Prediction Interval - Example
- A researcher collects the data shown on the next
slide and determines that there is a significant
relationship between the age of a copy machine
and its monthly maintenance cost. The regression
equation is $y' = 55.57 + 8.13x$. Find the 95%
prediction interval for the monthly maintenance
cost of a machine that is 3 years old.
11-5 Prediction Interval - Example
Machine   Age, x (years)   Monthly cost, y
A         1                62
B         2                78
C         3                70
D         4                90
E         4                93
F         6                103
11-5 Prediction Interval - Example
- Step 1: Find $\sum x$, $\sum x^2$, and $\bar{X}$. $\sum x = 20$, $\sum x^2 = 82$, $\bar{X} = 20/6 \approx 3.33$.
- Step 2: Find $y'$ for $x = 3$. $y' = 55.57 + 8.13(3) = 79.96$.
- Step 3: Find $s_{est}$. $s_{est} = 6.48$, as shown in the previous example.
11-5 Prediction Interval - Example
- Step 4: Substitute in the formula and solve.
$t_{\alpha/2} = 2.776$, d.f. $= 6 - 2 = 4$ for 95%.
$60.53 < y < 99.39$ (verify).
- Hence, one can be 95% confident that the interval
$60.53 < y < 99.39$ contains the actual value of y.
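The four steps can be sketched in Python as below. The values a = 55.57, b = 8.13, s_est = 6.48, and t = 2.776 are taken from the example, and the function name is an illustrative assumption. Because the sketch keeps full precision in the square-root factor, its bounds land within a few hundredths of the 60.53 and 99.39 shown above. The small gap presumably comes from rounding intermediate values by hand.

```python
import math

ages = [1, 2, 3, 4, 4, 6]  # copy-machine ages in years (x)
a, b = 55.57, 8.13         # regression coefficients from the example
s_est = 6.48               # standard error of estimate from the example
t_critical = 2.776         # t for alpha/2 with d.f. = 4, from the example


def prediction_interval(x, xs, a, b, s_est, t_critical):
    """Return (lower, upper) bounds of the prediction interval at x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(v * v for v in xs)
    x_bar = sum_x / n
    y_pred = a + b * x
    spread = math.sqrt(
        1 + 1 / n + n * (x - x_bar) ** 2 / (n * sum_x2 - sum_x ** 2)
    )
    margin = t_critical * s_est * spread
    return y_pred - margin, y_pred + margin


lower, upper = prediction_interval(3, ages, a, b, s_est, t_critical)
print(round(lower, 2), round(upper, 2))
```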