Title: Chapter 9 Correlation and Regression
1 STATISTICS
Chapter 5 Correlation/Regression
MVS 250 V. Katch
2 Overview
- Paired Data
- is there a relationship
- if so, what is the equation
- use the equation for prediction
4 Definition
- Correlation
- exists between two variables when one of them is related to the other in some way
5 Assumptions
- 1. The sample of paired data (x, y) is a random sample.
- 2. The pairs of (x, y) data have a bivariate normal distribution.
6 Definition
- Scatterplot (or scatter diagram)
- is a graph in which the paired (x, y) sample data are plotted with a horizontal x axis and a vertical y axis. Each individual (x, y) pair is plotted as a single point.
7 Scatter Diagram of Paired Data
8 Scatter Diagram of Paired Data
9 Positive Linear Correlation
[Scatter plots of y against x: (a) Positive, (b) Strong positive, (c) Perfect positive]
10 Negative Linear Correlation
[Scatter plots of y against x: (d) Negative, (e) Strong negative, (f) Perfect negative]
11 No Linear Correlation
[Scatter plots of y against x: (g) No Correlation, (h) Nonlinear Correlation]
12 Definition
- Linear Correlation Coefficient r
- measures the strength of the linear relationship between paired x and y values in a sample
$r = \frac{\sum xy / n - (\sum x / n)(\sum y / n)}{SD_x \cdot SD_y}$
where Σxy/n is the mean of the cross products, Σx/n is the mean of the x variable, Σy/n is the mean of the y variable, SDx is the standard deviation of the x variable, and SDy is the standard deviation of the y variable (both standard deviations computed with divisor n).
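The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: the function name `pearson_r` and the four data pairs are assumptions chosen for the example, and the standard deviations use divisor n, which is what the mean-of-cross-products form requires.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r as (mean of cross products - product of means) / (SDx * SDy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    # Standard deviations with divisor n (population form)
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return (mean_xy - mean_x * mean_y) / (sd_x * sd_y)

# Hypothetical data: y is exactly 2x, a perfect positive linear relationship
print(round(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]), 3))  # 1.0
```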
13 Notation for the Linear Correlation Coefficient
- n: number of pairs of data presented
- Σ: denotes the addition of the items indicated
- Σx/n: denotes the mean of all x values
- Σy/n: denotes the mean of all y values
- Σxy/n: denotes the mean of the cross products (x times y, summed, then divided by n)
- r: linear correlation coefficient for a sample
- ρ: linear correlation coefficient for a population
14 Rounding the Linear Correlation Coefficient r
- Round to three decimal places
- Use calculator or computer if possible
15 Properties of the Linear Correlation Coefficient r
- 1. −1 ≤ r ≤ 1
- 2. The value of r does not change if all values of either variable are converted to a different scale.
- 3. The value of r is not affected by the choice of x and y. Interchange x and y and the value of r will not change.
- 4. r measures the strength of a linear relationship.
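Properties 1 to 3 can be checked numerically. The sketch below is illustrative only: the helper `pearson_r` and the height and weight values are hypothetical, made up for this example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from sums of deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

heights_in = [60, 62, 65, 68, 70]        # hypothetical heights (inches)
weights_lb = [110, 125, 130, 160, 155]   # hypothetical weights (pounds)

r = pearson_r(heights_in, weights_lb)
# Property 2: converting inches to centimeters leaves r unchanged
r_rescaled = pearson_r([h * 2.54 for h in heights_in], weights_lb)
# Property 3: interchanging x and y leaves r unchanged
r_swapped = pearson_r(weights_lb, heights_in)

print(-1 <= r <= 1, abs(r - r_rescaled) < 1e-9, abs(r - r_swapped) < 1e-9)
```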
16 Interpreting the Linear Correlation Coefficient
- If the absolute value of r exceeds the value in the Sig. Table, conclude that there is a significant linear correlation.
- Otherwise, there is not sufficient evidence to support the conclusion of significant linear correlation.
- Remember to use n − 2
17 Common Errors Involving Correlation
- 1. Causation: It is wrong to conclude that correlation implies causality.
- 2. Averages: Averages suppress individual variation and may inflate the correlation coefficient.
- 3. Linearity: There may be some relationship between x and y even when there is no significant linear correlation.
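The linearity error can be illustrated with a small made-up data set in which y is completely determined by x, yet r is zero because the relationship is not linear. The helper name and the data are assumptions for this sketch.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from sums of deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]   # hypothetical data: y = x squared, a perfect nonlinear relationship

r_nonlinear = pearson_r(xs, ys)
print(abs(r_nonlinear) < 1e-12)  # the linear correlation is zero
```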
18 Common Errors Involving Correlation
19 Correlation is Not Causation
[Diagram relating three variables: A, B, C]
20 Correlation Calculations
- Rank Order Correlation: Rho
- Pearson's: r
21 Rank Order Correlation
Hits  Rank  HR  Rank   D   D²
  1    10    3    8    2    4
  2     9    4    7    2    4
  3     8    5    6    2    4
  4     7    1   10   −3    9
  5     6    7    4    2    4
  6     5    6    5    0    0
  7     4    2    9   −5   25
  8     3   10    1    2    4
  9     2    9    2    0    0
 10     1    8    3   −2    4
22 Rank Order Correlation, cont.
$\text{Rho} = 1 - \frac{6 \sum D^2}{N (N^2 - 1)}$
ΣD² = 58, N = 10
Rho = 1 − 6(58) / [10(10² − 1)]
Rho = 1 − 348 / [10(100 − 1)]
Rho = 1 − 348 / 990
Rho = 1 − 0.352
Rho = 0.648
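The rank-order calculation can be reproduced with a short sketch. The two lists are the Rank columns from the slide; the function name `rank_order_rho` is an assumption for this example, and the shortcut formula applies when there are no tied ranks.

```python
hits_rank = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]   # Rank column for Hits
hr_rank = [8, 7, 6, 10, 4, 5, 9, 1, 2, 3]     # Rank column for HR

def rank_order_rho(rank_a, rank_b):
    """Rho = 1 - 6 * sum(D^2) / (N * (N^2 - 1)), for untied ranks."""
    n = len(rank_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

rho = rank_order_rho(hits_rank, hr_rank)
print(round(rho, 3))  # 0.648
```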
23 Pearson's r
Hits  HR   xy
  1    3    3
  2    4    8
  3    5   15
  4    1    4
  5    7   35
  6    6   36
  7    2   14
  8   10   80
  9    9   81
 10    8   80
Σx/n = 5.5, Σy/n = 5.5, Σxy/n = 356/10 = 35.6
SDx = SDy = √8.25 ≈ 2.87 (divisor n)
r = [35.6 − (5.5)(5.5)] / [(2.87)(2.87)]
r = (35.6 − 30.25) / 8.25
r = 5.35 / 8.25
r = 0.648
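The Hits and HR columns can be run through the mean-of-cross-products formula directly. This is a sketch under one stated assumption: the standard deviations are taken with divisor n (`statistics.pstdev`), which is what this form of the formula requires. Because both columns are the untied values 1 to 10, the result should coincide with the rank-order Rho for the same data.

```python
from statistics import mean, pstdev

hits = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
hr = [3, 4, 5, 1, 7, 6, 2, 10, 9, 8]

mean_xy = mean(x * y for x, y in zip(hits, hr))   # mean of the cross products: 35.6
# pstdev divides by n, matching the mean-of-cross-products form
r_hits_hr = (mean_xy - mean(hits) * mean(hr)) / (pstdev(hits) * pstdev(hr))

print(round(r_hits_hr, 3))  # 0.648
```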
24 Pearson's r
Excel Demonstration
25 Is there a significant linear correlation?
26 Is there a significant linear correlation?
27 Is there a significant linear correlation?
28 Is there a significant linear correlation?
n = 8, α = 0.05
H0: ρ = 0
H1: ρ ≠ 0
Test statistic is r = 0.842
Critical values are r = −0.707 and r = 0.707 (Table R with n = 8 and α = 0.05)
TABLE R Critical Values of the Pearson Correlation Coefficient r
29 Is there a significant linear correlation?
0.842 > 0.707; that is, the test statistic does fall within the critical region.
Therefore, we REJECT H0: ρ = 0 (no correlation) and conclude there is a significant linear correlation between the weights of discarded plastic and household size.
[Figure: r scale from −1 to 1. Reject ρ = 0 below r = −0.707 and above r = 0.707; fail to reject ρ = 0 in between. Sample data: r = 0.842.]
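The decision rule used here (compare |r| with the table's critical value) is simple enough to sketch. The function name is an assumption, and the critical value must still be read from the table; it is passed in rather than computed.

```python
def significant_by_r(r, critical_r):
    """Reject H0 (rho = 0) when |r| exceeds the critical value from the table."""
    return abs(r) > critical_r

# Values from the slide: r = 0.842, critical value 0.707 (n = 8, alpha = 0.05)
print(significant_by_r(0.842, 0.707))  # True
```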
30 Method 1: Test Statistic is t (follows format of earlier chapters)
31 Formal Hypothesis Test
- To determine whether there is a significant linear correlation between two variables
- Two methods
- Both methods let H0: ρ = 0 (no significant linear correlation)
- H1: ρ ≠ 0 (significant linear correlation)
32 Method 2: Test Statistic is r (uses fewer calculations)
- Test statistic: r
- Critical values: Refer to Table R
- (no degrees of freedom)
33 Method 2: Test Statistic is r (uses fewer calculations)
- Test statistic: r
- Critical values: Refer to Table A-6
- (no degrees of freedom)
[Figure: r scale from −1 to 1. Reject ρ = 0 below r = −0.811 and above r = 0.811; fail to reject ρ = 0 in between. Sample data: r = 0.828.]
34 Method 1: Test Statistic is t (follows format of earlier chapters)
Test statistic:
$t = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}}$
Critical values: use Table T with degrees of freedom = n − 2
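As a worked instance of Method 1, the sketch below applies the t formula to the earlier example (r = 0.842, n = 8). The function name is an assumption, and so is the critical value: 2.447 is the standard two-tailed t value for 6 degrees of freedom at α = 0.05, supplied here by hand because Table T is not reproduced in these slides.

```python
from math import sqrt

def t_statistic(r, n):
    """t = r / sqrt((1 - r^2) / (n - 2)), with n - 2 degrees of freedom."""
    return r / sqrt((1 - r ** 2) / (n - 2))

t = t_statistic(0.842, 8)
print(round(t, 3))  # 3.823

# Assumed critical value: two-tailed t for df = 6 at alpha = 0.05
critical_t = 2.447
print(abs(t) > critical_t)  # True, so H0 (rho = 0) is rejected
```

Both methods lead to the same decision for this example.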
35 Testing for a Linear Correlation
[Flowchart: Start → Let H0: ρ = 0, H1: ρ ≠ 0 → METHOD 1 or METHOD 2]
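36 Why does the critical value of r increase as sample size decreases?
With fewer pairs, a large correlation is more likely to occur by chance alone, so a larger |r| is required before it counts as significant.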
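A small simulation can make this concrete. The sketch below draws x and y independently (so the population correlation is zero) and averages |r| over many samples. Everything in it is an assumption for illustration: the helper names, the sample sizes 5 and 50, the number of trials, and the fixed seed.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from sums of deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def mean_abs_r(n, trials=1000, seed=0):
    """Average |r| over many samples of n unrelated (x, y) pairs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]
        total += abs(pearson_r(xs, ys))
    return total / trials

# Unrelated variables still produce sizable |r| when n is small
print(mean_abs_r(5), mean_abs_r(50))
```

The average chance correlation for n = 5 should come out well above that for n = 50, which is why small samples need a larger critical value.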
37 Coefficient of Determination (Effect Size)
r²
The proportion of the variance of one variable that can be explained by the variance of a related variable.
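For a quick worked value, the sketch below squares the r from the earlier plastic and household size example (r = 0.842). The variable names are assumptions for this illustration.

```python
r = 0.842                      # sample correlation from the earlier example
r_squared = r ** 2             # coefficient of determination

print(round(r_squared, 3))     # 0.709
# About 70.9% of the variance in one variable is explained by the other
print(f"{r_squared:.1%}")      # 70.9%
```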
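38 Justification for r Formula
$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, S_x S_y}$
(x̄, ȳ) = centroid of sample points
[Figure: scatterplot with x axis from 0 to 7 and y axis from 0 to 24, divided into Quadrants 1 to 4 by the lines x̄ = 3 and ȳ = 11 through the centroid (x̄, ȳ). For the point (7, 23): x − x̄ = 7 − 3 = 4 and y − ȳ = 23 − 11 = 12.]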
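The deviation form of the formula can be sketched as follows. The five data pairs are hypothetical: they were chosen only so that the centroid is (3, 11) and the point (7, 23) is included, matching the figure. They are not the data behind the original slide. Here `statistics.stdev` uses divisor n − 1, consistent with the (n − 1) in the denominator.

```python
from statistics import mean, stdev

def r_deviation_form(xs, ys):
    """r = sum((x - x_bar)(y - y_bar)) / ((n - 1) * Sx * Sy)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # stdev divides by n - 1, matching the (n - 1) factor
    return cross / ((n - 1) * stdev(xs) * stdev(ys))

# Hypothetical data with centroid (3, 11)
xs = [1, 1, 2, 4, 7]
ys = [4, 8, 8, 12, 23]

# Contribution of the point (7, 23): (7 - 3) * (23 - 11) = 4 * 12
contribution = (7 - mean(xs)) * (23 - mean(ys))
print(contribution)  # 48

print(r_deviation_form(xs, ys))
```

Points in Quadrants 1 and 3 (both deviations with the same sign) add positive products to the sum, while points in Quadrants 2 and 4 add negative ones, so the sign of r follows where most of the points fall.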