Title: Chapter 7 Part 1
1Chapter 7 - Part 1
2Correlation Topics
- Correlational research: what is it, and how do
you do correlational research?
- The three questions:
- Is it a linear or curvilinear correlation?
- Is it a positive or negative relationship?
- How strong is the relationship?
- Answering these questions with t scores and r, the
estimated correlation coefficient derived from
the tX and tY scores of individuals in a random
sample.
3Correlational research: how to start
- To begin a correlational study, we select a
population or, far more frequently, a random
sample from a population.
- (Since we use samples most of the time, we will
mostly use the formulae and symbols for computing
a correlation from a sample.)
- We then obtain two scores from each individual,
one score on each of two variables. (These are
usually variables that we think might be related
to each other for interesting reasons.) We call
one variable X and the other Y.
4Correlational research: comparing tX and tY scores
- We translate the raw scores on the X variable to
t scores (called tX scores) and the raw scores on
the Y variable to tY scores.
- So each individual has a pair of scores, a tX
score and a tY score.
- You determine how similar or different the tX and
tY scores in the pairs are, on the average, by
subtracting tY from tX, then squaring, summing,
and averaging the (tX - tY) differences.
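The first steps on this slide can be sketched in Python. This is a minimal sketch, assuming (as the worked example later in the chapter does) that a t score is the raw score minus the sample mean, divided by the sample standard deviation computed with n - 1. The function names and the sample scores are made up for illustration.

```python
from statistics import mean, stdev


def t_scores(raw_scores):
    """Convert raw sample scores to t scores: (score - mean) / s."""
    m = mean(raw_scores)
    s = stdev(raw_scores)  # sample standard deviation (divides by n - 1)
    return [(score - m) / s for score in raw_scores]


def squared_differences(tx, ty):
    """Subtract tY from tX in each pair, then square the difference."""
    return [(a - b) ** 2 for a, b in zip(tx, ty)]


# Hypothetical raw scores for five individuals (not from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

tx = t_scores(x)
ty = t_scores(y)
total = sum(squared_differences(tx, ty))  # the summed squared differences
print(round(total, 3))
```

The sketch stops at the sum; the averaging step, covered later in the chapter, divides this sum by one less than the number of pairs.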
5The estimated correlation coefficient, Pearson's r
- With a simple formula, you transform the average
squared difference between the t scores into
Pearson's correlation coefficient, r.
- Pearson's r indicates, with a single number,
both the direction and the strength of the
relationship between the two variables in your
sample.
- r also estimates the correlation in the
population from which the sample was drawn.
- In Ch. 8, you will learn when you can use r that
way.
6Going from pairs of raw scores to r: linearity,
a preliminary question
- Once you have scores on two variables, you
ask: Is this a linear or curvilinear
relationship?
- Psychology is a relatively new science, and this
is an intro stat course.
- For both reasons, you will only learn how to deal
with linear relationships between two variables.
Correlation with three or more variables and
curvilinear relationships are saved for grad school.
- BUT YOU MUST KNOW WHAT A LINEAR RELATIONSHIP IS,
AND HOW TO RECOGNIZE A NONLINEAR (CURVILINEAR)
CORRELATION.
7Linearity vs. Curvilinearity
- In a linear relationship, as scores on one
variable go from low to high, scores on the
other variable either generally increase or
generally decrease.
- In a curvilinear relationship, as scores on one
variable go from low to high, scores on the
other variable change direction. They can go
1) down and then up, 2) up and then down, 3) up
and down and then up again, 4) up or down and
then flat, etc.
8Examples of linear relationships
- For example, think of the relationship between
the size of a pleasure boat (X) and its cost (Y).
- As scores on one variable (boat size) increase,
scores on the other variable (cost) also increase.
- Another example of a linear relationship is the
relationship between the size of a car and the
number of miles per gallon it gets.
- In general, as cars get gradually larger (X),
they tend to get fewer miles per gallon (Y).
9A curvilinear relationship
- In a curvilinear relationship, as scores on the X
variable go gradually from low to high, the Y
variable changes direction.
- For example, think of the relationship between
age (X) and height (Y).
- As age increases from 0 to 14 or so, height
increases also.
- But then people stop growing. As age increases
further, height stays the same.
- Thus the Y variable, height, changes direction.
It goes from gradually rising to flat.
- If you graph age and height, the best fitting
line is a curved line.
10Correlation characteristics: Which line best
shows the relationship between age (X) and height
(Y)?
[Figure: linear vs. curvilinear best fitting lines]
11Another nonlinear relationship: shortstops and
linemen. Great shortstops may be too small to be
great football linemen.
Football potential: Terrible, Average, Average,
Very Good, Excellent, Good, Poor
Is this a linear relationship?
12Plot the dots!
- To check whether a relationship is linear, make a
graph and place the scores on it.
- That's what I mean by "Plot the dots."
- If you really want to know what is going on with
data, plot the dots!
- Here is a graph for the baseball skills and
football potential data.
13When you plot the dots, is this linear?
[Scatterplot of Football Skill against Baseball
Skill, with one dot each for Al, Ben, Chuck, David,
Ed, Frank, and George]
NO! It is best described by a curved line. It is
a curvilinear relationship!
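Plotting the dots does not require special software. The following is a rough sketch of a text-only scatterplot; the function name `plot_the_dots` and the skill ratings are hypothetical (the actual scores behind the slide are not given), chosen so that the scores rise and then fall.

```python
def plot_the_dots(xs, ys, width=21, height=11):
    """Return a text scatterplot as a list of rows (top row = highest Y).

    Assumes the X scores are not all equal and the Y scores are not all
    equal, so both ranges are nonzero.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"
    return ["".join(cells) for cells in grid]


# Hypothetical skill ratings on a 1-7 scale (not the data from the slides).
baseball = [1, 2, 3, 4, 5, 6, 7]
football = [1, 3, 5, 7, 5, 3, 1]

for line in plot_the_dots(baseball, football):
    print(line)
```

With scores like these, the dots climb and then drop, which is the kind of pattern a straight line cannot describe.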
14After you know a correlation is linear, there are
two other questions: the direction and strength of
the correlation. But first, a definition of high
and low scores.
- Definition of high and low scores:
- High scores are scores above the mean of each
variable. They are represented by positive t scores.
- Low scores are scores below the mean of each
variable. They are represented by negative t scores.
15Positive relationships
- In a positive relationship, as X scores gradually
increase, Y scores tend to increase as well.
Example: The longer a sailboat is, the more it
tends to cost. As length goes up, price tends to
go up.
- In a positive correlation, X and Y scores tend to
be on the same side of their respective means.
- As a result, the tX and tY scores tend to be
similar and the difference between them (tX - tY)
tends to be small.
- Since (tX - tY) is small, the squared difference
between them, (tX - tY)², also tends to be small.
17Graphing a positive relationship
- In a positive correlation, high scores on X tend
to go with high scores on Y. On a graph, as the
line runs from left to right, scores increase on
the X axis. At the same time, Y scores also
generally get higher. So, the line will tend to
rise as it runs.
- Remember from math: slope equals how far a line
rises on the Y axis for each unit it runs from
left to right along the X axis.
- If a line rises from left to right, rise is
positive. Run is always positive. So a positive
rise divided by an (always) positive run results
in a positive slope. (That's why we call it a
positive correlation.)
18Positive vs Negative scatterplot
19Graphic display of a strong POSITIVE correlation.
20Negative relationships
- In a negative relationship, as X scores gradually
increase, Y scores tend to decrease. Example:
The more years a sailboat is used, the less it
tends to cost. As use goes up, price tends to go
down.
- In a negative correlation, X and Y scores tend to
be on opposite sides of their respective means.
- As a result, the tX and tY scores tend to be
dissimilar and the difference between them
(tX - tY) tends to be large.
- Since (tX - tY) is large, the squared difference
between them, (tX - tY)², also tends to be large.
22Graphing a negative relationship
- In a negative correlation, high scores on X tend
to go with low scores on Y. On a graph, as the
line runs from left to right, scores increase on
the X axis. At the same time, Y scores get lower.
So, the line will tend to fall as it runs.
- Remember from math: slope equals how far a line
rises on the Y axis for each unit it runs from
left to right along the X axis.
- If a line falls from left to right, rise is
negative. Run is always positive. So a negative
rise divided by an (always) positive run results
in a negative slope. (That's why we call it a
negative correlation.)
23Positive vs Negative scatterplot
24Summary
- When tX and tY scores are consistently more
similar than different, we have a positive
correlation. On a graph, the dots will rise from
your left to your right.
- When tX and tY scores are consistently more
different than similar, we have a negative
correlation. On a graph, the dots will fall from
your left to your right.
25Positive vs Negative scatterplot
26How strong is the relationship between the tX and
tY scores?
- Here the question is about the consistency with
which tX and tY scores are either similar or
dissimilar.
27t scores: sign and size
- There are two aspects to the consistency of the
relationship between tX and tY scores.
- First, are the t scores consistently of the same
sign (positive correlation) or of opposite signs
(negative correlation)?
- If they are almost always one way or the other,
you have at least a moderately strong
relationship.
- On the other hand, if you sometimes see t scores
on the same side of the mean and sometimes on
opposite sides, you have a relatively weak
correlation.
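The sign check described on this slide can be sketched as a simple count. This is an illustrative sketch only: the function name, the decision to skip pairs that contain a t score of exactly zero, and the t score pairs themselves are assumptions, not part of the slides.

```python
def same_sign_proportion(tx, ty):
    """Proportion of pairs whose tX and tY scores have the same sign.

    Assumption: a pair with a t score of exactly zero (a score at the
    mean) is skipped, because it is on neither side of the mean.
    """
    pairs = [(a, b) for a, b in zip(tx, ty) if a != 0 and b != 0]
    if not pairs:
        raise ValueError("no pairs with nonzero t scores")
    same = sum(1 for a, b in pairs if (a > 0) == (b > 0))
    return same / len(pairs)


# Hypothetical t score pairs, for illustration only.
tx = [-1.2, -0.4, 0.3, 1.3]
consistent_ty = [-1.0, -0.6, 0.5, 1.1]  # always the same sign as tX
mixed_ty = [-1.0, 0.6, -0.5, 1.1]       # same sign only half the time

print(same_sign_proportion(tx, consistent_ty))
print(same_sign_proportion(tx, mixed_ty))
```

A proportion near 1 points toward a positive relationship, a proportion near 0 toward a negative one, and a proportion near one half toward a weak relationship.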
28t scores: sign and size
- Second, if there is a consistent pattern of same
signed t scores (positive correlation) or a
consistent pattern of opposite signed t scores
(negative correlation), then whether the tX and tY
scores are about the same distance from the mean
comes into play.
- The large majority of t scores (usually well over
95%) range from -2.50 to 2.50.
- Given a consistent positive or negative
correlation, the more similar in size the t
scores, the stronger the correlation.
29Positive correlations
- Perfect: tX and tY scores are all the same sign
and are identical in size.
- Strong: tX and tY scores are almost all the same
sign and are fairly similar in size.
- Moderate: tX and tY scores are predominantly the
same sign. This is especially true for pairs in
which one of the values is one or more standard
deviations from the mean. Size may be fairly
dissimilar.
- Weak: tX and tY scores are a little more often
the same sign than opposite in sign. Nothing can
be said about size.
30Negative correlations
- Perfect: tX and tY scores are all of opposite
sign and are identical in size.
- Strong: tX and tY scores are almost all of
opposite sign and are fairly similar in size.
- Moderate: tX and tY scores are predominantly
opposite in sign. This is especially true for
pairs in which one of the values is one or more
standard deviations from the mean. Size may be
fairly dissimilar.
- Weak: tX and tY scores are a little more often of
opposite sign than of the same sign. Nothing can
be said about size.
31Unrelated (independent) variables
- When the size and sign of the tX scores bear no
relationship to the size and sign of the tY
scores, the variables are unrelated.
- We can also call the variables independent of
or orthogonal to each other. The three terms,
unrelated, independent, and orthogonal, are
synonymous in this context.
32Graphing it on t axes: The strength of a
relationship tells us approximately how the dots
representing pairs of t scores will fall around a
best fitting line.
- Perfect: scores fall exactly on a straight line
whose slope will be 1.00 or -1.00.
- Strong: most scores fall near the line, whose
slope will be close to .750 or -.750.
- Moderate: some scores are near the line, some are
not. The slope of the line will be close to .500 or
-.500.
33Graphing it on t axes: The strength of a
relationship tells us approximately how the dots
representing pairs of t scores will fall around a
best fitting line.
- Weak: some scores fall fairly close to the line,
but others fall quite far from it. The slope of
the line will be close to .250 or -.250.
- Independent: the scores are not close to the
line and form a circular or square pattern. The
best fitting line will be the X axis, a line with
a slope of 0.000.
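The link between strength and slope on t axes can be checked numerically. The sketch below assumes that "best fitting line" means the ordinary least squares line, which the slides do not state explicitly; the function names are hypothetical. The scores are the ones used in the worked example later in this chapter.

```python
from statistics import mean, stdev


def t_scores(raw_scores):
    """Convert raw sample scores to t scores: (score - mean) / s."""
    m = mean(raw_scores)
    s = stdev(raw_scores)  # sample standard deviation (divides by n - 1)
    return [(score - m) / s for score in raw_scores]


def best_fit_slope(tx, ty):
    """Least squares slope for predicting tY from tX.

    Both sets of t scores have a mean of zero, so the slope reduces to
    sum(tX * tY) / sum(tX squared).
    """
    return sum(a * b for a, b in zip(tx, ty)) / sum(a * a for a in tx)


# Sample scores from the worked example later in this chapter.
x = [2, 4, 6, 8, 10]
y = [9, 11, 10, 12, 13]

slope = best_fit_slope(t_scores(x), t_scores(y))
print(round(slope, 3))
```

For these scores the slope on t axes works out to about 0.9, which matches the value of r that the worked example reports for the same data.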
34Strength of a relationship
35Strength of a relationship
36Strength of a relationship
Moderate
37Strength of a relationship
38What is this relationship?
39What is this?
40What is this?
41What is this?
42Computing the correlation coefficient.
43Comparing apples to oranges? Use Z or t scores!
- You can use correlation to look for a
relationship between ANY two values that you can
measure on a single subject.
- However, there may not be any relationship (the
variables may be independent).
- A correlation tells us whether scores on two
measures are consistently similar, consistently
different from each other, or show no real pattern.
44Comparing apples to oranges? Use t scores!
- To compare scores on two different variables, you
transform them into ZX and ZY scores if you are
studying a population, or into tX and tY scores if
you have a sample.
- ZX and ZY scores (or tX and tY scores) can be
directly compared to each other to see whether
they are consistently similar, consistently quite
different, or show no consistent pattern of
similarity or difference.
45Comparing variables
- Anxiety symptoms (e.g., heartbeat) with number of
hours driving to class.
- Hat size with drawing ability.
- Math ability with verbal ability.
- Number of children with IQ.
- Turn them all into Z or t scores.
46Pearson's Correlation Coefficient
- coefficient: noun, a number that serves as a
measure of some property.
- The correlation coefficient indexes BOTH the
consistency and the direction of a correlation
with a single number.
47Pearson's rho
- Pearson's rho (ρ) is the parameter that
characterizes the strength and direction of a
linear relationship (and only a linear
relationship) between two variables. To compute
rho, you must have the entire population. Then
you can compute sigma, mu, Z scores, and rho.
- The formula: rho = 1 - (1/2)(Σ(ZX - ZY)² / NP),
where NP is the number of pairs of Z scores in
the population.
- In English: The correlation coefficient equals 1
minus half the average squared distance between
the Z scores.
48Pearson's rho
- When you have a perfect positive correlation, the
Z scores will be identical in size and sign. So
the average squared distance will be zero and
rho = 1.000 - (1/2)(0.000) = 1.000.
- When you have a perfect negative correlation, the
Z scores will be identical in size and opposite
in sign. It can be proven algebraically that the
average squared distance in that case will be
4.000, so rho = 1.000 - (1/2)(4.000) = -1.000.
- When you have two totally independent variables,
the average squared distance will be 2.000
(halfway between 0.000 and 4.000). Thus,
rho = 1.000 - (1/2)(2.000) = 0.000.
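These three landmark cases can be checked with a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, and the tiny four-member "populations" are made up so that the arithmetic is easy to follow.

```python
from statistics import mean, pstdev


def z_scores(population):
    """Convert every score in a population to a Z score using mu and sigma."""
    mu = mean(population)
    sigma = pstdev(population)  # population standard deviation (divides by N)
    return [(score - mu) / sigma for score in population]


def rho_from_z(zx, zy):
    """rho = 1 - half the average squared distance between paired Z scores."""
    average_squared_distance = sum((a - b) ** 2 for a, b in zip(zx, zy)) / len(zx)
    return 1 - 0.5 * average_squared_distance


# Tiny made-up populations, for illustration only.
zx = z_scores([1, 1, 2, 2])
identical = zx                      # perfect positive: same size and sign
reversed_sign = [-z for z in zx]    # perfect negative: same size, opposite sign
unrelated = z_scores([1, 2, 1, 2])  # every X value pairs with every Y value

print(rho_from_z(zx, identical))
print(rho_from_z(zx, reversed_sign))
print(rho_from_z(zx, unrelated))
```

In the reversed-sign case each distance is twice the Z score, so each squared distance is four times the squared Z score; since squared Z scores average 1 in a population, the average squared distance is 4.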
49Pearson's Correlation Coefficient
- Thus, rho varies from -1.000 (perfect negative
correlation) to 0.000 (independent variables) to
1.000 (perfect positive correlation).
- A negative value indicates a negative
relationship; a positive value indicates a
positive relationship.
- Values of rho close to 1.000 or -1.000 indicate a
strong (consistent) relationship; values close
to 0.000 indicate a weak (inconsistent) or
independent relationship.
50Estimating rho with r
- Computing rho involves finding the actual average
squared distance between the ZX and ZY scores in
the whole population.
- In computing r, we are estimating rho.
51The formula for r
- Pearson's r is a least squares, unbiased estimate
of rho, based on the relationships found between
tX and tY scores in a random sample.
- r = 1 - (1/2)(Σ(tX - tY)² / (nP - 1)), where
nP - 1 equals one less than the number of pairs of
t scores in the sample.
- In English: Pearson's r equals 1.000 minus half
the estimated average squared difference between
the Z scores in the population, based on the
squared differences between the t scores in the
sample.
52Look at those formulae again.
- rho = 1 - (1/2)(Σ(ZX - ZY)² / NP), where NP is the
number of pairs of Z scores in the population.
- Σ(ZX - ZY)² / NP is the average squared
distance between the Z scores.
- The rest of the formula simply transforms the
average squared distance between the Z scores
into a variable that goes from -1.000 to 1.000.
53Look at those formulae again.
- r = 1 - (1/2)(Σ(tX - tY)² / (nP - 1)), where
nP - 1 equals one less than the number of pairs of
t scores in the sample.
- REMEMBER, t scores are estimated Z scores.
- Σ(tX - tY)² / (nP - 1) is a least squares,
unbiased estimate of the average squared
difference between the Z scores in the population,
based on the differences between the tX and tY
scores in a random sample.
- The rest of the formula simply transforms the
estimated average squared distance between the Z
scores into a variable that goes from -1.000 to
1.000.
54Thus, r, the least squares, unbiased estimate of
rho, is basically an estimate of the average
squared difference between the ZX and ZY scores
in the population, transformed into a variable
that goes from -1.00 to 1.00.
55Similarities of r and rho
- r and rho vary from -1.000 to 1.000.
- For both r and rho, a negative value indicates a
negative relationship; a positive value indicates
a positive relationship.
- Values of r or rho close to 1.000 or -1.000
indicate a strong (consistent) relationship;
values close to 0.000 indicate a weak
(inconsistent) or independent relationship.
56Since we are almost always studying random
samples, not populations, we almost always
compute Pearson's r, not Pearson's rho.
57r, strength and direction
Perfect, positive     1.00
Strong, positive       .75
Moderate, positive     .50
Weak, positive         .25
Independent            .00
Weak, negative        -.25
Moderate, negative    -.50
Strong, negative      -.75
Perfect, negative    -1.00
58Calculating Pearson's r
- Select a random sample from a population and
obtain scores on two variables, which we will call
X and Y.
- Convert all the scores into t scores.
59Calculating Pearson's r
- First, subtract the tY score from the tX score in
each pair.
- Then square all of the differences and add them
up, that is, compute Σ(tX - tY)².
60Calculating Pearson's r
- Estimate the average squared distance between ZX
and ZY by dividing the sum of squared differences
between the t scores by (nP - 1):
Σ(tX - tY)² / (nP - 1).
- To turn this estimate into Pearson's r, use the
formula r = 1 - (1/2)(Σ(tX - tY)² / (nP - 1)).
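The steps on the last three slides can be collected into one function. This is a minimal sketch rather than a definitive implementation: the name `pearson_r` and the sample scores are hypothetical, and the code assumes each variable has at least two different scores (otherwise the standard deviation is zero and the t scores are undefined).

```python
from statistics import mean, stdev


def pearson_r(x, y):
    """Estimate rho from paired sample scores using the t score method."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two pairs of scores")
    # Step 1: convert all the raw scores to t scores.
    mx, sx = mean(x), stdev(x)
    my, sy = mean(y), stdev(y)
    tx = [(score - mx) / sx for score in x]
    ty = [(score - my) / sy for score in y]
    # Step 2: subtract tY from tX in each pair, square, and add up.
    sum_sq = sum((a - b) ** 2 for a, b in zip(tx, ty))
    # Step 3: divide by (number of pairs - 1) to estimate the average
    # squared distance between the Z scores.
    estimated_avg = sum_sq / (len(x) - 1)
    # Step 4: r equals 1 minus half of that estimate.
    return 1 - 0.5 * estimated_avg


# Hypothetical samples, for illustration only.
print(round(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]), 3))
print(round(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]), 3))
```

The first sample rises in lockstep and the second falls in lockstep, so the two calls illustrate the perfect positive and perfect negative ends of the scale.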
61Example: Calculate t scores for X
DATA: 2 4 6 8 10
MSW = 40.00/(5-1) = 10.00
sX = 3.16
62Calculate t scores for Y
DATA: 9 11 10 12 13
MSW = 10.00/(5-1) = 2.50
sY = 1.58
63Calculate r
tX          -1.26  -0.63   0.00   0.63   1.26
tY          -1.26   0.00  -0.63   0.63   1.26
tX - tY      0.00  -0.63   0.63   0.00   0.00
(tX - tY)²   0.00   0.40   0.40   0.00   0.00
Σ(tX - tY)² / (nP - 1) = 0.80 / 4 = 0.200
r = 1.000 - (1/2)(Σ(tX - tY)² / (nP - 1))
r = 1.000 - (1/2)(0.200)
r = 1.000 - 0.100 = 0.900
This is a very strong, positive relationship.
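The worked example can be reproduced in Python and cross-checked against a second formula. The cross-check (the deviation-score form of Pearson's r, the sum of cross products divided by the square root of the product of the two sums of squares) is not taken from these slides; it is included only as an independent check on the t score method. Variable names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

x = [2, 4, 6, 8, 10]
y = [9, 11, 10, 12, 13]

# Method from the slides: 1 minus half the estimated average squared
# difference between the t scores.
mx, sx = mean(x), stdev(x)
my, sy = mean(y), stdev(y)
tx = [(score - mx) / sx for score in x]
ty = [(score - my) / sy for score in y]
r_from_t = 1 - 0.5 * sum((a - b) ** 2 for a, b in zip(tx, ty)) / (len(x) - 1)

# Cross-check (not in the slides): the deviation-score form of Pearson's r.
sum_products = sum((a - mx) * (b - my) for a, b in zip(x, y))
ss_x = sum((a - mx) ** 2 for a in x)
ss_y = sum((b - my) ** 2 for b in y)
r_from_deviations = sum_products / sqrt(ss_x * ss_y)

print(round(r_from_t, 3), round(r_from_deviations, 3))
```

Both routes should agree to within rounding, which is a useful sanity check when computing r by hand.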
64By the way: true graphs
- Ch. 7 has true graphs, displays in which each dot
stands for a score on two (in this case) or more
(in more advanced cases) variables.
- In Ch. 1 through Ch. 6, most of the figures have
represented the frequency of scores on a single
variable.
- Formally, displays of frequencies are figures,
but they are not graphs.