Title: Describing Relationships: Scatterplots and Correlation
1. Chapter 14
- Describing Relationships: Scatterplots and Correlation
2. Correlation
Objective: Analyze a collection of paired data (sometimes called bivariate data).
- A correlation exists between two variables when there is a relationship (or an association) between them.
- We will consider only linear relationships:
  - when graphed, the points approximate a straight-line pattern.
3. Scatterplot
- A scatterplot is a graph in which paired (x, y) data (usually collected on the same individuals) are plotted with one variable represented on a horizontal (x-) axis and the other variable represented on a vertical (y-) axis. Each individual pair (x, y) is plotted as a single point.
Example
4. Examining a Scatterplot
- You can describe the overall pattern of a scatterplot by the:
  - Form: linear or non-linear (quadratic, exponential, no correlation, etc.)
  - Direction: negative, positive.
  - Strength: strong, very strong, moderately strong, weak, etc.
- Look for outliers and how they affect the correlation.
5. Scatterplot
Example: Draw a scatterplot for the data below.
What is the nature of the relationship between X and Y?
Strong, positive, and linear.
6. Examining a Scatterplot
- Two variables are positively correlated when high values of the variables tend to occur together and low values of the variables tend to occur together.
  - The scatterplot slopes upwards from left to right.
- Two variables are negatively correlated when high values of one of the variables tend to occur with low values of the other and vice versa.
  - The scatterplot slopes downwards from left to right.
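As a rough illustration of these two definitions, the sign of the summed products of deviations from the means shows whether high values tend to pair with high values (positive) or with low values (negative). The Python sketch below is only an illustration: the helper name `direction` and the study-hours, absences, and exam-score figures are made up, not taken from the slides.

```python
def direction(xs, ys):
    """Classify the direction of association between paired data.

    Hypothetical helper: sums (x - mean_x) * (y - mean_y) over all pairs.
    A positive sum means high x tends to pair with high y; a negative
    sum means high x tends to pair with low y.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "none"


# Made-up data for illustration only.
hours_studied = [1, 2, 3, 4, 5]
exam_score = [55, 61, 70, 74, 88]
absences = [0, 2, 4, 6, 8]
score_by_absences = [90, 84, 71, 65, 52]

print(direction(hours_studied, exam_score))    # high with high, low with low
print(direction(absences, score_by_absences))  # high with low, low with high
```

The sign only gives the direction; it says nothing yet about how strong the relationship is.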
7. Types of Correlation
- Negative Linear Correlation: As x increases, y tends to decrease.
- Positive Linear Correlation: As x increases, y tends to increase.
- No Correlation
- Non-linear Correlation
8. Examples of Relationships
9. Thought Question 1
What type of association would the following pairs of variables have: positive, negative, or none?
- Temperature during the summer and electricity bills
- Temperature during the winter and heating costs
- Number of years of education and height
- Frequency of brushing and number of cavities
- Number of churches and number of bars in cities
- Height of husband and height of wife
10. Thought Question 2
- Consider the two scatterplots below. How does the outlier impact the correlation for each plot?
  - Does the outlier increase the correlation, decrease the correlation, or have no impact?
11. Measuring Strength and Direction of a Linear Relationship
- How closely does a non-horizontal straight line fit the points of a scatterplot?
- The correlation coefficient (often referred to as just correlation): r
  - measure of the strength of the relationship: the stronger the relationship, the larger the magnitude of r.
  - measure of the direction of the relationship: positive r indicates a positive relationship, negative r indicates a negative relationship.
12. Correlation Coefficient
The Greek capital letter sigma (Σ) denotes summation or addition.
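One standard way to write r with the summation sign, consistent with the later remark in these slides that r uses the z-scores of the observations, is sketched below. The symbols are assumptions rather than notation confirmed by the slides: n is the number of pairs, x̄ and ȳ are the sample means, and s_x and s_y are the sample standard deviations.

```latex
r = \frac{1}{n-1} \sum_{i=1}^{n}
    \left( \frac{x_i - \bar{x}}{s_x} \right)
    \left( \frac{y_i - \bar{y}}{s_y} \right)
```

Each factor in the product is a z-score, so r is the sum of the products of paired z-scores divided by n - 1.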
13. Correlation Coefficient
- The range of the correlation coefficient is -1 to 1.
If r = -1, there is a perfect negative correlation.
If r = 1, there is a perfect positive correlation.
If r is close to 0, there is no linear correlation.
14. Linear Correlation
- r = -0.91: Strong negative correlation
- r = 0.88: Strong positive correlation
- r = 0.42: Weak positive correlation
- r = 0.07: Non-linear correlation
15. Correlation Coefficient
- special values for r
  - a perfect positive linear relationship would have r = 1
  - a perfect negative linear relationship would have r = -1
  - if there is no linear relationship, or if the scatterplot points are best fit by a horizontal line, then r = 0
  - Note: r must be between -1 and 1, inclusive
- r > 0: as one variable changes, the other variable tends to change in the same direction
- r < 0: as one variable changes, the other variable tends to change in the opposite direction
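A minimal Python sketch of these special values, assuming r is computed as the sum of the products of paired z-scores divided by n - 1. The function name `correlation` and the data are illustrative only and do not come from the slides.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Correlation coefficient r from z-scores (illustrative helper)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    mean_x, mean_y = mean(xs), mean(ys)
    sd_x, sd_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1)
    if sd_x == 0 or sd_y == 0:
        # z-scores cannot be formed when a variable never varies.
        raise ValueError("r is undefined when a variable is constant")
    z_x = [(x - mean_x) / sd_x for x in xs]
    z_y = [(y - mean_y) / sd_y for y in ys]
    return sum(a * b for a, b in zip(z_x, z_y)) / (n - 1)


x = [1, 2, 3, 4, 5]
r_up = correlation(x, [2, 4, 6, 8, 10])    # points exactly on a rising line
r_down = correlation(x, [10, 8, 6, 4, 2])  # points exactly on a falling line
r_flat = correlation(x, [2, 1, 0, 1, 2])   # symmetric V shape, no linear trend

print(round(r_up, 6), round(r_down, 6), round(r_flat, 6))
```

When every y value is identical, its standard deviation is zero and this formula cannot be evaluated, so the sketch raises an error in that case rather than returning 0.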
16. Examples of Correlations
- Husbands' versus wives' ages
  - r = .94
- Husbands' versus wives' heights
  - r = .36
- Professional golfers' putting success: distance of putt in feet versus percent success
  - r = -.94
17. Correlation Coefficient
- Because r uses the z-scores for the observations, it does not change when we change the units of measurement of x, y, or both.
- Correlation ignores the distinction between explanatory and response variables.
- r measures the strength of only linear association between variables.
- A large value of r does not necessarily mean that there is a strong linear relationship between the variables; the relationship might not be linear. Always look at the scatterplot.
- When r is close to 0, it does not mean that there is no relationship between the variables; it means there is no linear relationship.
- Outliers can inflate or deflate correlations.
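Three of these properties can be checked numerically. The Python sketch below uses a hypothetical helper `pearson_r` and made-up numbers (none of them from the slides) to show that r is unchanged by a change of units, unchanged when x and y are swapped, and sensitive to a single outlier.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient r (illustrative helper, not from the slides)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up ages (years) and incomes (thousands), for illustration only.
age_years = [20, 30, 40, 50, 60]
income = [25, 32, 41, 48, 60]
age_months = [12 * a for a in age_years]

r_years = pearson_r(age_years, income)
r_months = pearson_r(age_months, income)  # same r after changing units
r_swapped = pearson_r(income, age_years)  # same r with x and y exchanged

# One outlier far along the trend direction inflates a weak correlation.
x = [1, 2, 3, 4, 5]
weak_y = [3, 1, 2, 3, 1]
r_weak = pearson_r(x, weak_y)
r_inflated = pearson_r(x + [20], weak_y + [20])

# One outlier far off the line deflates a perfect correlation.
line_y = [1, 2, 3, 4, 5]
r_line = pearson_r(x, line_y)
r_deflated = pearson_r(x + [6], line_y + [0])

print(round(r_years, 3), round(r_months, 3), round(r_swapped, 3))
print(round(r_weak, 3), round(r_inflated, 3))
print(round(r_line, 3), round(r_deflated, 3))
```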
18. Not all Relationships are Linear: Miles per Gallon versus Speed
- Curved relationship (r is misleading)
- Speed chosen for each subject varies from 20 mph to 60 mph
- MPG varies from trial to trial, even at the same speed
- Statistical relationship
r = -0.06
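To see how a clear curved relationship can produce an r near 0, consider the Python sketch below. The speeds and MPG values are invented to form a symmetric curve peaking at 40 mph; they are not the data behind the slide, and `pearson_r` is a hypothetical helper.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient r (illustrative helper, not from the slides)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented data: MPG rises up to 40 mph, then falls symmetrically.
speed = [20, 30, 40, 50, 60]
mpg = [22, 28, 30, 28, 22]

r_all = pearson_r(speed, mpg)             # whole curve: rise and fall cancel
r_rising = pearson_r(speed[:3], mpg[:3])  # only the rising half, 20 to 40 mph

print(round(r_all, 3), round(r_rising, 3))
```

MPG clearly depends on speed in this invented data, yet r over the full range is essentially 0 because the rising and falling halves cancel. This is why the scatterplot should be checked before interpreting r.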
19. Common Errors Involving Correlation
- 1. Causation: It is wrong to conclude that correlation implies causality.
- 2. Averages: Averages suppress individual variation and may inflate the correlation coefficient.
- 3. Linearity: There may be some relationship between x and y even when there is no linear correlation.
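The second error, averages inflating r, can be illustrated with a small Python sketch. The nine (x, y) pairs are invented, and `pearson_r` is a hypothetical helper. Within each x value the individual y values vary widely; replacing them with one average per x value removes that variation.

```python
import math
from collections import defaultdict


def pearson_r(xs, ys):
    """Correlation coefficient r (illustrative helper, not from the slides)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented individual observations: three y values for each x.
individual = [(1, 0), (1, 2), (1, 4),
              (2, 1), (2, 3), (2, 5),
              (3, 2), (3, 4), (3, 6)]
xs = [x for x, _ in individual]
ys = [y for _, y in individual]
r_individual = pearson_r(xs, ys)

# Collapse to one average y per x value.
groups = defaultdict(list)
for x, y in individual:
    groups[x].append(y)
avg_x = sorted(groups)
avg_y = [sum(groups[x]) / len(groups[x]) for x in avg_x]
r_averages = pearson_r(avg_x, avg_y)

print(round(r_individual, 3), round(r_averages, 3))
```

The individual data give a modest r (about 0.45), while the three averages lie exactly on a line and give r = 1.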
20. Correlation and Causation
- The fact that two variables are strongly correlated does not in itself imply a cause-and-effect relationship between the variables.
- If there is a significant correlation between two variables, you should consider the following possibilities.
  - Is there a direct cause-and-effect relationship between the variables?
    - Does x cause y?
21. Correlation and Causation
- Is there a reverse cause-and-effect relationship between the variables?
  - Does y cause x?
- Is it possible that the relationship between the variables can be caused by a third variable or by a combination of several other variables?
- Is it possible that the relationship between two variables may be a coincidence?
22. Example
- A survey of the world's nations in 2004 shows a strong positive correlation between the percentage of each country's population using cell phones and life expectancy in years at birth.
- Does this mean that cell phones are good for your health?
  - No. It simply means that in countries where cell phone use is high, the life expectancy tends to be high as well.
- What might explain the strong correlation?
  - The economy could be a lurking variable. Richer countries generally have more cell phone use and better health care.
23. Example
- The correlation between Age and Income as measured on 100 people is r = 0.75. Explain whether or not each of these conclusions is justified.
  - When Age increases, Income increases as well.
  - The form of the relationship between Age and Income is linear.
  - There are no outliers in the scatterplot of Income vs. Age.
  - Whether we measure Age in years or months, the correlation will still be 0.75.
24. Example
- Explain the mistakes in the statements below:
  - My correlation of -0.772 between GDP and Infant Mortality Rate shows that there is almost no association between GDP and Infant Mortality Rate.
  - There was a correlation of 0.44 between GDP and Continent.
  - There was a very strong correlation of 1.22 between Life Expectancy and GDP.
25. Warnings about Statistical Significance
- Statistical significance does not imply the relationship is strong enough to be considered practically important.
- Even weak relationships may be labeled statistically significant if the sample size is very large.
- Even very strong relationships may not be labeled statistically significant if the sample size is very small.
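The slides do not say which significance test is meant. One common choice, assumed here, is the t test for a correlation coefficient, with t = r * sqrt((n - 2) / (1 - r^2)) and n - 2 degrees of freedom. The Python sketch below applies it to two invented cases; the function name `t_statistic` is hypothetical, and the critical values are rounded two-sided 5% values from a standard t table.

```python
import math


def t_statistic(r, n):
    """t statistic for testing whether a correlation differs from 0.

    Assumes the usual t test for r, with n - 2 degrees of freedom.
    """
    if n < 3:
        raise ValueError("need at least 3 pairs")
    if not -1 < r < 1:
        raise ValueError("r must be strictly between -1 and 1")
    return r * math.sqrt((n - 2) / (1 - r * r))


# Rounded two-sided 5% critical values from a standard t table.
CRITICAL_T_DF_998 = 1.962  # degrees of freedom = 1000 - 2
CRITICAL_T_DF_3 = 3.182    # degrees of freedom = 5 - 2

# Invented cases: a weak r with many pairs, a strong r with few pairs.
t_weak_large = t_statistic(0.10, 1000)
t_strong_small = t_statistic(0.80, 5)

print(round(t_weak_large, 2), abs(t_weak_large) > CRITICAL_T_DF_998)
print(round(t_strong_small, 2), abs(t_strong_small) > CRITICAL_T_DF_3)
```

Under these assumptions the weak correlation of 0.10 passes the 5% threshold with 1000 pairs, while the strong correlation of 0.80 does not with only 5 pairs.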
26. Key Concepts
- Strength of Linear Relationship
- Direction of Linear Relationship
- Correlation Coefficient
- Problems with Correlations
- r can only be calculated for quantitative data.