Chapter 9 Correlation and Regression - PowerPoint PPT Presentation

Slides: 39
Provided by: Addi50
Transcript and Presenter's Notes

1
STATISTICS
Chapter 5 Correlation/Regression
MVS 250 V. Katch
2
Overview
  • Paired Data
  • Is there a relationship?
  • If so, what is the equation?
  • Use the equation for prediction.

3
  • Correlation

4
Definition
  • Correlation
  • exists between two variables when one of them is
    related to the other in some way

5
Assumptions
  • 1. The sample of paired data (x,y) is a
    random sample.
  • 2. The pairs of (x,y) data have a
    bivariate normal distribution.

6
Definition
  • Scatterplot (or scatter diagram)
  • is a graph in which the paired (x,y) sample data
    are plotted with a horizontal x axis and a
    vertical y axis. Each individual (x,y) pair is
    plotted as a single point.

7
Scatter Diagram of Paired Data
8
Scatter Diagram of Paired Data
9
Positive Linear Correlation
(Figure: three scatter plots of y against x: (a) Positive, (b) Strong positive, (c) Perfect positive)
10
Negative Linear Correlation
(Figure: three scatter plots of y against x: (d) Negative, (e) Strong negative, (f) Perfect negative)
11
No Linear Correlation
(Figure: two scatter plots of y against x: (g) No Correlation, (h) Nonlinear Correlation)
12
Definition
  • Linear Correlation Coefficient r
  • measures the strength of the linear relationship
    between paired x and y values in a sample

$$ r = \frac{\dfrac{\sum xy}{n} - \left(\dfrac{\sum x}{n}\right)\left(\dfrac{\sum y}{n}\right)}{SD_x \, SD_y} $$

where Σxy/n is the mean of the cross products,
Σx/n is the mean of the x variable, Σy/n is
the mean of the y variable, SDx is the standard
deviation of the x variable, and SDy is the
standard deviation of the y variable.
13
Notation for the Linear Correlation Coefficient
  • n = number of pairs of data present
  • Σ denotes the addition of the items indicated.
  • Σx/n denotes the mean of all x values.
  • Σy/n denotes the mean of all y values.
  • Σxy/n denotes the mean of the cross products:
    x times y, summed, then divided by n
  • r = linear correlation coefficient for a sample
  • ρ = linear correlation coefficient for a
    population

14
Rounding the Linear Correlation Coefficient r
  • Round to three decimal places
  • Use calculator or computer if possible

15
Properties of the Linear Correlation Coefficient r
  • 1. -1 ≤ r ≤ 1
  • 2. The value of r does not change if all values of
    either variable are converted to a different
    scale.
  • 3. The value of r is not affected by the choice of
    x and y. Interchange x and y and the value of r
    will not change.
  • 4. r measures the strength of a linear relationship.
16
Interpreting the Linear Correlation Coefficient
  • If the absolute value of r exceeds the value in
    Sig. Table, conclude that there is a significant
    linear correlation.
  • Otherwise, there is not sufficient evidence to
    support the conclusion of significant linear
    correlation.
  • Remember to use n-2

17
Common Errors Involving Correlation
  • 1. Causation: It is wrong to conclude that
    correlation implies causality.
  • 2. Averages: Averages suppress individual
    variation and may inflate the correlation
    coefficient.
  • 3. Linearity: There may be some relationship
    between x and y even when there is no
    significant linear correlation.
18
Common Errors Involving Correlation
19
Correlation is Not Causation
(Diagram with elements labeled A, B, and C)
20
Correlation Calculations
Rank Order Correlation - Rho
Pearson's - r
21
Rank Order Correlation
Hits  Rank  HR  Rank   D  D²
   1    10   3     8   2   4
   2     9   4     7   2   4
   3     8   5     6   2   4
   4     7   1    10  -3   9
   5     6   7     4   2   4
   6     5   6     5   0   0
   7     4   2     9  -5  25
   8     3  10     1   2   4
   9     2   9     2   0   0
  10     1   8     3  -2   4
22
Rank Order Correlation, cont
$$ \text{Rho} = 1 - \frac{6 \sum D^2}{N (N^2 - 1)} $$
With ΣD² = 58 and N = 10:
Rho = 1 - 6(58) / [10(10² - 1)]
    = 1 - 348 / [10(100 - 1)]
    = 1 - 348 / 990
    = 1 - 0.352
    = 0.648
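
The rank-order calculation can be reproduced in a few lines. This is a minimal sketch using the Hits and HR columns from the table; the function names are hypothetical, and the ranking helper assumes there are no tied values (true for these data), with the largest value receiving rank 1 as in the table.

```python
def ranks_descending(values):
    """Rank values so that the largest gets rank 1 (assumes no ties)."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

def rank_order_rho(x, y):
    """Rho = 1 - 6 * sum(D^2) / (N * (N^2 - 1)), D = difference in ranks."""
    rank_x = ranks_descending(x)
    rank_y = ranks_descending(y)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

hits = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
hr = [3, 4, 5, 1, 7, 6, 2, 10, 9, 8]

rho = rank_order_rho(hits, hr)
print(round(rho, 3))  # 0.648
```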
23
Pearson's r
Hits (x)  HR (y)   xy
    1        3      3
    2        4      8
    3        5     15
    4        1      4
    5        7     35
    6        6     36
    7        2     14
    8       10     80
    9        9     81
   10        8     80
Σx/n = 5.5, Σy/n = 5.5, Σxy/n = 356/10 = 35.6
SDx = SDy = 2.872 (standard deviations computed with divisor n)
r = [35.6 - (5.5)(5.5)] / [(2.872)(2.872)]
  = (35.6 - 30.25) / 8.25
  = 5.35 / 8.25
  = 0.648
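
The same calculation can be checked in code. This is a minimal sketch, assuming the mean-of-cross-products formula with standard deviations computed using divisor n; under that assumption the formula equals the usual definition of r. Because the Hits and HR values here are the integers 1 to 10 with no ties, the result coincides with the rank-order Rho.

```python
from math import sqrt

hits = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   # x
hr = [3, 4, 5, 1, 7, 6, 2, 10, 9, 8]     # y
n = len(hits)

mean_x = sum(hits) / n
mean_y = sum(hr) / n
mean_xy = sum(x * y for x, y in zip(hits, hr)) / n  # mean of the cross products

# Standard deviations with divisor n (assumption stated in the lead-in).
sd_x = sqrt(sum((x - mean_x) ** 2 for x in hits) / n)
sd_y = sqrt(sum((y - mean_y) ** 2 for y in hr) / n)

r = (mean_xy - mean_x * mean_y) / (sd_x * sd_y)
print(mean_xy, round(r, 3))  # 35.6 0.648
```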
24
Pearson's r
Excel Demonstration
25
Is there a significant linear correlation?
26
Is there a significant linear correlation?
27
Is there a significant linear correlation?
28
Is there a significant linear correlation?
n = 8, α = 0.05
H0: ρ = 0
H1: ρ ≠ 0
Test statistic is r = 0.842
Critical values are r = -0.707 and r = 0.707 (Table
R with n = 8 and α = 0.05)
TABLE R: Critical Values of the Pearson
Correlation Coefficient r
29
Is there a significant linear correlation?
0.842 > 0.707; that is, the test statistic does
fall within the critical region.
Therefore, we REJECT H0: ρ = 0 (no correlation)
and conclude there is a significant linear
correlation between the weights of discarded
plastic and household size.
(Figure: number line for r from -1 to 1. Reject ρ = 0 below r = -0.707 and above r = 0.707; fail to reject ρ = 0 in between. Sample data: r = 0.842.)
30
Method 1: Test Statistic is t (follows format
of earlier chapters)
31
Formal Hypothesis Test
  • To determine whether there is a
    significant linear correlation between two
    variables
  • Two methods
  • Both methods let H0: ρ = 0
    (no significant linear correlation)
  • H1: ρ ≠ 0
    (significant linear correlation)

32
Method 2: Test Statistic is r (uses fewer
calculations)
  • Test statistic: r
  • Critical values: Refer to Table R
    (no degrees of freedom)

33
Method 2: Test Statistic is r (uses fewer
calculations)
  • Test statistic: r
  • Critical values: Refer to Table A-6
    (no degrees of freedom)

(Figure: number line for r from -1 to 1. Reject ρ = 0 below r = -0.811 and above r = 0.811; fail to reject ρ = 0 in between. Sample data: r = 0.828.)
34
Method 1: Test Statistic is t (follows format
of earlier chapters)
Test statistic:
$$ t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} $$
Critical values: use Table T with
degrees of freedom = n - 2
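
As a sketch of how the two methods compare, the snippet below applies both to the earlier example (r = 0.842, n = 8, α = 0.05). The critical r of 0.707 is the Table R value quoted in the slides; the critical t of 2.447 (two-tailed, 6 degrees of freedom) is taken from a standard t table and is an assumption here, since the slides do not list it. The function name is hypothetical.

```python
from math import sqrt

def t_statistic(r, n):
    """Method 1 test statistic: t = r / sqrt((1 - r^2) / (n - 2))."""
    return r / sqrt((1 - r ** 2) / (n - 2))

r, n = 0.842, 8

# Method 1: compare |t| with the critical t for n - 2 = 6 degrees of freedom.
T_CRITICAL = 2.447   # assumption: standard two-tailed t value, alpha = 0.05, df = 6
t = t_statistic(r, n)
reject_by_t = abs(t) > T_CRITICAL

# Method 2: compare |r| directly with the critical r from Table R.
R_CRITICAL = 0.707   # from the slides, n = 8 and alpha = 0.05
reject_by_r = abs(r) > R_CRITICAL

print(round(t, 3), reject_by_t, reject_by_r)
```

Both comparisons lead to rejecting H0: ρ = 0 for this example.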
35
Testing for a Linear Correlation
(Flowchart: Start; let H0: ρ = 0 and H1: ρ ≠ 0; then proceed by METHOD 1 or METHOD 2.)
36
Why does the critical value of r increase as
sample size decreases?
A correlation by chance is more likely.
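
A small simulation can make this concrete. The sketch below is an illustration under stated assumptions, not part of the slides: it draws x and y independently from a normal distribution (so the population correlation is zero) and counts how often the sample |r| still exceeds 0.707 for several sample sizes. The function names, trial count, and seed are arbitrary choices.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Sample linear correlation coefficient r for paired lists x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def share_of_large_r(n, threshold=0.707, trials=2000, seed=0):
    """Fraction of unrelated samples of size n whose |r| exceeds threshold."""
    rng = random.Random(seed)
    count = 0
    for _ in range(trials):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]
        if abs(pearson_r(x, y)) > threshold:
            count += 1
    return count / trials

for n in (5, 8, 30):
    print(n, share_of_large_r(n))
```

The printed fraction shrinks as n grows, which is why a larger critical value is needed for small samples to keep the chance of a false rejection at the same level.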
37
Coefficient of Determination (Effect Size)
r²
The proportion of the variance of one variable that
can be explained by the variance of a related variable.
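
One way to see the "explained variance" reading is to fit a straight line and compare variations. The sketch below uses hypothetical data and assumes an ordinary least-squares line, which this part of the slides does not cover; it checks that the explained share of the variation in y equals r².

```python
from math import sqrt

def r_squared_two_ways(x, y):
    """Return (r squared, explained variation / total variation in y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)   # total variation in y
    r = sxy / sqrt(sxx * syy)

    # Least-squares line (assumption: ordinary least squares of y on x).
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    predicted = [intercept + slope * a for a in x]
    explained = sum((p - mean_y) ** 2 for p in predicted)
    return r ** 2, explained / syy

# Hypothetical paired data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r2, explained_share = r_squared_two_ways(x, y)
print(round(r2, 3), round(explained_share, 3))  # 0.6 0.6
```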
38
Justification for r Formula
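$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, S_x S_y} $$
(x̄, ȳ) = centroid of sample points
(Figure: scatterplot with the x axis running from 0 to 7 and the y axis from 0 to 24, divided into Quadrants 1 to 4 by the lines x̄ = 3 and ȳ = 11 through the centroid (x̄, ȳ). For the point (7, 23): x - x̄ = 7 - 3 = 4 and y - ȳ = 23 - 11 = 12.)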
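
The deviation form of r can be tried on a small data set. The values below are hypothetical, chosen only so that the centroid is (3, 11) and the point (7, 23) is included, as in the figure; the other points are not from the slides. The sketch also checks that this form, which uses sample standard deviations with divisor n - 1, agrees with the mean-of-cross-products form when that form uses standard deviations with divisor n.

```python
from statistics import mean, pstdev, stdev

# Hypothetical data with centroid (3, 11) that includes the point (7, 23).
x = [1, 1, 2, 4, 7]
y = [4, 6, 9, 13, 23]
n = len(x)

x_bar, y_bar = mean(x), mean(y)

# Each point contributes (x - x_bar) * (y - y_bar); points up-and-right or
# down-and-left of the centroid contribute positive products.
products = [(a - x_bar) * (b - y_bar) for a, b in zip(x, y)]

# Deviation form: sample standard deviations (divisor n - 1).
r_deviation = sum(products) / ((n - 1) * stdev(x) * stdev(y))

# Mean-of-cross-products form: standard deviations with divisor n.
mean_xy = sum(a * b for a, b in zip(x, y)) / n
r_cross = (mean_xy - x_bar * y_bar) / (pstdev(x) * pstdev(y))

print(products[-1], round(r_deviation, 3), round(r_cross, 3))
```

The last product, for the point (7, 23), is 4 × 12 = 48, matching the deviations shown in the figure.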