Correlation and Regression - PowerPoint PPT Presentation

About This Presentation
Title:

Correlation and Regression

Description:

Chapter 11 Correlation and Regression Outline 11-1 Introduction 11-2 Scatter Plots 11-3 Correlation 11-4 Regression Outline 11-5 Coefficient of Determination ... – PowerPoint PPT presentation

Slides: 40
Provided by: NewM157

Transcript and Presenter's Notes

Title: Correlation and Regression


1
Chapter 11
  • Correlation and Regression

2
Outline
  • 11-1 Introduction
  • 11-2 Scatter Plots
  • 11-3 Correlation
  • 11-4 Regression

3
Outline
  • 11-5 Coefficient of Determination and
    Standard Error of Estimate

4
Objectives
  • Draw a scatter plot for a set of ordered pairs.
  • Find the correlation coefficient.
  • Test the hypothesis H0: ρ = 0.
  • Find the equation of the regression line.

5
Objectives
  • Find the coefficient of determination.
  • Find the standard error of estimate.
  • Find a prediction interval.

6
11-2 Scatter Plots
  • A scatter plot is a graph of the ordered pairs
    (x, y) of numbers consisting of the independent
    variable, x, and the dependent variable, y.

7
11-2 Scatter Plots - Example
  • Construct a scatter plot for the data obtained in
    a study of age and systolic blood pressure of six
    randomly selected subjects.
  • The data is given on the next slide.

8
11-2 Scatter Plots - Example
Subject   Age, x   Pressure, y
A         43       128
B         48       120
C         56       135
D         61       143
E         67       141
F         70       152
9
11-2 Scatter Plots - Example
Positive Relationship
[Scatter plot: Pressure (vertical axis, 120 to 150) versus Age (horizontal axis, 40 to 70)]
10
11-2 Scatter Plots - Other Examples
Negative Relationship
11
11-2 Scatter Plots - Other Examples
No Relationship
12
11-3 Correlation Coefficient
  • The correlation coefficient computed from the
    sample data measures the strength and direction
    of a relationship between two variables.
  • Sample correlation coefficient, r.
  • Population correlation coefficient, ρ.

13
11-3 Range of Values for the Correlation
Coefficient
  • The correlation coefficient ranges from −1 to +1.
  • Values near −1: strong negative relationship
  • Values near 0: no linear relationship
  • Values near +1: strong positive relationship

14
11-3 Formula for the Correlation Coefficient r
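$$
r = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\left(\sum x^2\right) - \left(\sum x\right)^2\right]\left[n\left(\sum y^2\right) - \left(\sum y\right)^2\right]}}
$$
where n is the number of data pairs.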
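As a check on the formula, here is a minimal Python sketch that evaluates r for the age and blood pressure data from the scatter plot example. The function name `correlation_coefficient` and the variable names are illustrative assumptions, not part of the original slides.

```python
from math import sqrt

# Age (x) and systolic blood pressure (y) for subjects A-F,
# taken from the scatter plot example.
ages = [43, 48, 56, 61, 67, 70]
pressures = [128, 120, 135, 143, 141, 152]


def correlation_coefficient(xs, ys):
    """Sample correlation coefficient r via the computational formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = correlation_coefficient(ages, pressures)
print(f"r = {r:.3f}")  # prints "r = 0.897"
```

The intermediate sums (for example, the sum of the x values is 345 and the sum of the xy products is 47,634) can be printed from inside the function when checking a hand calculation.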
15
11-3 Correlation Coefficient - Example (Verify)
  • Compute the correlation coefficient for the age
    and blood pressure data.

  • Σx = 345, Σy = 819, Σxy = 47,634,
    Σx² = 20,399, Σy² = 112,443.
  • Substituting in the formula for r gives
    r = 0.897.

16
11-3 The Significance of the Correlation
Coefficient
  • The population correlation coefficient, ρ, is the
    correlation between all possible pairs of data
    values (x, y) taken from a population.

17
11-3 The Significance of the Correlation
Coefficient
  • H0: ρ = 0    H1: ρ ≠ 0
  • This tests for a significant correlation between
    the variables in the population.

18
11-3 Formula for the t Test for the
Correlation Coefficient
$$
t = r\sqrt{\frac{n - 2}{1 - r^2}}
$$
with d.f. = n − 2.
19
11-3 Example
  • Test the significance of the correlation
    coefficient for the age and blood pressure data.
    Use α = 0.05 and r = 0.897.
  • Step 1: State the hypotheses.
  • H0: ρ = 0    H1: ρ ≠ 0

20
11-3 Example
  • Step 2: Find the critical values. Since α =
    0.05 and there are 6 − 2 = 4 degrees of freedom,
    the critical values are t = +2.776
    and t = −2.776.
  • Step 3: Compute the test value. t =
    4.059 (verify).

21
11-3 Example
  • Step 4: Make the decision. Reject the null
    hypothesis, since the test value falls in the
    critical region (4.059 > 2.776).
  • Step 5: Summarize the results. There is a
    significant relationship between the variables of
    age and blood pressure.
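The five steps can be sketched in a few lines of Python. This is an illustration under stated assumptions: the helper name `t_statistic` is hypothetical, and the critical value 2.776 is copied from the example rather than computed from a t distribution.

```python
from math import sqrt


def t_statistic(r, n):
    """Test value for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r ** 2))


r, n = 0.897, 6          # values given in the example
critical_value = 2.776   # two-tailed, alpha = 0.05, d.f. = 4 (from the example)

t = t_statistic(r, n)

# Two-tailed decision rule: reject H0 when |t| exceeds the critical value.
reject_h0 = abs(t) > critical_value
print(f"t = {t:.2f}, reject H0: {reject_h0}")
```

With the rounded r = 0.897 the test value agrees with the 4.059 above to about three decimal places; starting from an unrounded r shifts it slightly without changing the decision.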

22
11-4 Regression
  • The scatter plot for the age and blood pressure
    data displays a linear pattern.
  • We can model this relationship with a straight
    line.
  • This line is called the line of best fit, or
    the regression line.
  • The equation of the line is y′ = a + bx.

23
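11-4 Formulas for the Regression Line y′ = a + bx
$$
a = \frac{\left(\sum y\right)\left(\sum x^2\right) - \left(\sum x\right)\left(\sum xy\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2}
$$
$$
b = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2}
$$
where a is the y intercept and b is the slope of the line.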
24
11-4 Example
  • Find the equation of the regression line for the
    age and the blood pressure data.
  • Substituting into the formulas gives a =
    81.048 and b = 0.964 (verify).
  • Hence, y′ = 81.048 + 0.964x.
  • Note, a represents the intercept and b the slope
    of the line.
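To verify the intercept and slope, the two formulas can be evaluated directly. The sketch below is a minimal illustration; `regression_line` is a hypothetical helper name, and the data are the six age and blood pressure pairs from the scatter plot example.

```python
# Age (x) and systolic blood pressure (y) for subjects A-F.
ages = [43, 48, 56, 61, 67, 70]
pressures = [128, 120, 135, 143, 141, 152]


def regression_line(xs, ys):
    """Return (a, b), the intercept and slope of the least-squares line."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2  # shared by both formulas
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    b = (n * sum_xy - sum_x * sum_y) / denominator
    return a, b


a, b = regression_line(ages, pressures)
print(f"y' = {a:.3f} + {b:.3f}x")  # prints "y' = 81.048 + 0.964x"
```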

25
11-4 Example
y′ = 81.048 + 0.964x
26
11-4 Using the Regression Line to
Predict
  • The regression line can be used to predict a
    value for the dependent variable (y) for a given
    value of the independent variable (x).
  • Caution: Use x values within the experimental
    region when predicting y values.

27
11-4 Example
  • Use the equation of the regression line to
    predict the blood pressure for a person who is 50
    years old.
  • Since y′ = 81.048 + 0.964x, then y′ = 81.048 +
    0.964(50) = 129.248 ≈ 129.
  • Note that the value of 50 is within the range of
    x values.
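The prediction, together with the caution about staying inside the experimental region, might be coded as follows. The function `predict` and its range guard are assumptions made for this sketch; raising an error is only one possible way to handle an out-of-range x.

```python
# Ages observed in the study; they define the experimental region for x.
ages = [43, 48, 56, 61, 67, 70]

a, b = 81.048, 0.964  # intercept and slope from the regression example


def predict(x, a, b, x_min, x_max):
    """Predict y' = a + bx, refusing to extrapolate outside [x_min, x_max]."""
    if not x_min <= x <= x_max:
        raise ValueError(f"x = {x} is outside the observed range {x_min} to {x_max}")
    return a + b * x


y_pred = predict(50, a, b, min(ages), max(ages))
print(f"Predicted pressure at age 50: {y_pred:.3f}")  # prints 129.248
```

Calling `predict(90, a, b, min(ages), max(ages))` raises `ValueError`, because 90 lies beyond the oldest subject in the sample.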

28
11-5 Coefficient of Determination and Standard
Error of Estimate
  • The coefficient of determination, denoted by r²,
    is a measure of the variation of the dependent
    variable that is explained by the regression line
    and the independent variable.

29
11-5 Coefficient of Determination and Standard
Error of Estimate
  • r² is the square of the correlation coefficient.
  • The coefficient of nondetermination is (1 − r²).
  • Example: If r = 0.90, then r² = 0.81.
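A short sketch of the same arithmetic, with `determination` as a hypothetical helper name:

```python
def determination(r):
    """Return (r squared, 1 - r squared) for a correlation coefficient r."""
    r_squared = r ** 2
    return r_squared, 1 - r_squared


r_squared, unexplained = determination(0.90)
print(f"r^2 = {r_squared:.2f}, 1 - r^2 = {unexplained:.2f}")
# prints "r^2 = 0.81, 1 - r^2 = 0.19"
```

Read as percentages, 81% of the variation in y is explained by the regression line and the remaining 19% is not.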

30
11-5 Coefficient of Determination and Standard
Error of Estimate
  • The standard error of estimate, denoted by s_est,
    is the standard deviation of the observed y
    values about the predicted y values.
  • The formula is given on the next slide.

31
11-5 Formula for the Standard Error of
Estimate
$$
s_{est} = \sqrt{\frac{\sum \left(y - y'\right)^2}{n - 2}}
$$
or
$$
s_{est} = \sqrt{\frac{\sum y^2 - a\sum y - b\sum xy}{n - 2}}
$$
32
11-5 Standard Error of Estimate - Example
  • From the regression equation, y′ =
    55.57 + 8.13x and n = 6, find s_est.
  • Here, a = 55.57, b = 8.13, and n = 6.
  • Substituting into the formula gives s_est =
    6.48 (verify).
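Both forms of the formula can be checked numerically. The sketch below assumes the six copy machine data pairs given later in the prediction interval example and uses the rounded coefficients a = 55.57 and b = 8.13; the variable names are illustrative.

```python
from math import sqrt

# Copy machine data: age in years (x) and monthly maintenance cost (y).
ages = [1, 2, 3, 4, 4, 6]
costs = [62, 78, 70, 90, 93, 103]
a, b = 55.57, 8.13  # rounded coefficients from the regression equation
n = len(ages)

# Shortcut form: sqrt((sum y^2 - a*sum y - b*sum xy) / (n - 2)).
sum_y = sum(costs)
sum_y2 = sum(y * y for y in costs)
sum_xy = sum(x * y for x, y in zip(ages, costs))
s_shortcut = sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))

# Residual form: sqrt(sum (y - y')^2 / (n - 2)).
sum_sq_residuals = sum((y - (a + b * x)) ** 2 for x, y in zip(ages, costs))
s_residual = sqrt(sum_sq_residuals / (n - 2))

print(f"shortcut form: {s_shortcut:.2f}")
print(f"residual form: {s_residual:.2f}")
```

With the rounded coefficients, the shortcut form reproduces the 6.48 above, while the residual form comes out near 6.51. The two forms agree only when a and b are carried at full precision, because the shortcut form subtracts large, nearly equal quantities and so magnifies rounding in the coefficients.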

33
11-5 Prediction Interval
  • A prediction interval is an interval constructed
    about a predicted y value, y′, for a specified
    x value.

34
11-5 Prediction Interval
  • For a given α value, we can state with (1 − α)·100%
    confidence that the interval will contain the
    actual mean of the y values that correspond to
    the given value of x.

35
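11-5 Formula for the Prediction Interval about a
Value y′
$$
y' - t_{\alpha/2}\, s_{est} \sqrt{1 + \frac{1}{n} + \frac{n\left(x - \bar{X}\right)^2}{n\sum x^2 - \left(\sum x\right)^2}} < y < y' + t_{\alpha/2}\, s_{est} \sqrt{1 + \frac{1}{n} + \frac{n\left(x - \bar{X}\right)^2}{n\sum x^2 - \left(\sum x\right)^2}}
$$
with d.f. = n − 2.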
36
11-5 Prediction interval - Example
  • A researcher collects the data shown on the next
    slide and determines that there is a significant
    relationship between the age of a copy machine
    and its monthly maintenance cost. The regression
    equation is y′ = 55.57 + 8.13x. Find the 95%
    prediction interval for the monthly maintenance
    cost of a machine that is 3 years old.

37
11-5 Prediction Interval - Example
Machine   Age, x (years)   Monthly cost, y
A         1                62
B         2                78
C         3                70
D         4                90
E         4                93
F         6                103
38
11-5 Prediction Interval - Example
  • Step 1: Find Σx, Σx², and X̄. Σx = 20, Σx² =
    82, X̄ = 20/6 ≈ 3.3.
  • Step 2: Find y′ for x = 3. y′ = 55.57 +
    8.13(3) = 79.96.
  • Step 3: Find s_est. s_est = 6.48, as shown in the
    previous example.

39
11-5 Prediction Interval - Example
  • Step 4: Substitute in the formula and solve.
    tα/2 = 2.776, d.f. = 6 − 2 = 4 for 95%.
    60.53 < y < 99.39 (verify). Hence, one can be 95%
    confident that the interval 60.53 < y < 99.39
    contains the actual value of y.
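The substitution in Step 4 can be reproduced with a short Python sketch. The function name `prediction_interval` is hypothetical, and the inputs s_est = 6.48 and t = 2.776 are taken from the example rather than recomputed.

```python
from math import sqrt

ages = [1, 2, 3, 4, 4, 6]  # machine ages (x) from the data table


def prediction_interval(x0, xs, y_pred, s_est, t_crit):
    """Return (lower, upper) bounds of the prediction interval at x = x0."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    x_bar = sum_x / n
    radical = sqrt(1 + 1 / n + n * (x0 - x_bar) ** 2 / (n * sum_x2 - sum_x ** 2))
    margin = t_crit * s_est * radical
    return y_pred - margin, y_pred + margin


y_pred = 55.57 + 8.13 * 3  # predicted cost for a 3-year-old machine
low, high = prediction_interval(3, ages, y_pred, s_est=6.48, t_crit=2.776)
print(f"{low:.2f} < y < {high:.2f}")
```

Carried at full precision, the bounds come out a few hundredths wider than 60.53 and 99.39. That difference is what one would expect if the radical was rounded to 1.08 in the hand calculation, though the slides do not show that intermediate step.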