Describing Relationships: Scatterplots and Correlation - PowerPoint PPT Presentation

Slides: 27
Provided by: vcujemaysw
Transcript and Presenter's Notes



1
Chapter 14
  • Describing Relationships: Scatterplots and
    Correlation

2
Correlation
Objective: Analyze a collection of paired data
(sometimes called bivariate data). A correlation
exists between two variables when there is a
relationship (or an association) between them.
  • We will consider only linear relationships:
    when graphed, the points approximate a
    straight-line pattern.

3
Scatterplot
  • A scatterplot is a graph in which paired (x, y)
    data (usually collected on the same individuals)
    are plotted with one variable represented on a
    horizontal (x-) axis and the other variable
    represented on a vertical (y-) axis. Each
    individual pair (x, y) is plotted as a single
    point.

Example
4
Examining a Scatterplot
  • You can describe the overall pattern of a
    scatterplot by its
  • Form: linear or non-linear (quadratic,
    exponential, no correlation, etc.)
  • Direction: negative or positive.
  • Strength: strong, very strong, moderately
    strong, weak, etc.
  • Look for outliers and how they affect the
    correlation.

5
Scatterplot
Example: Draw a scatterplot for the data below.
What is the nature of the
relationship between X and Y?
Answer: Strong, positive, and linear.
6
Examining a Scatterplot
  • Two variables are positively correlated when high
    values of the variables tend to occur together
    and low values of the variables tend to occur
    together.
  • The scatterplot slopes upwards from left to
    right.
  • Two variables are negatively correlated when
    high values of one of the variables tend to occur
    with low values of the other and vice versa.
  • The scatterplot slopes downwards from left to
    right.
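
One way to make "high values tend to occur together" concrete is to multiply each point's deviation from the mean of x by its deviation from the mean of y and add up the products. A positive total suggests a positive association, a negative total a negative one. The sketch below is only an illustration: the function name `direction` and the data are made up and do not come from the presentation.

```python
def direction(xs, ys):
    """Classify the direction of association from the sign of the
    summed products of deviations from the two means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "none"


# Made-up illustrative data (an assumption, not from the slides).
xs = [1, 2, 3, 4, 5]
ys_rising = [52, 60, 61, 70, 78]   # high x tends to go with high y
ys_falling = [78, 70, 61, 60, 52]  # high x tends to go with low y

print(direction(xs, ys_rising))   # positive
print(direction(xs, ys_falling))  # negative
```

This total is the numerator of the correlation coefficient, so its sign is also the sign of r.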

7
Types of Correlation
  • Negative Linear Correlation: As x increases, y tends to decrease.
  • Positive Linear Correlation: As x increases, y tends to increase.
  • No Correlation
  • Non-linear Correlation
8
Examples of Relationships
9
Thought Question 1
What type of association would the following
pairs of variables have: positive, negative, or
none?
  • Temperature during the summer and electricity
    bills
  • Temperature during the winter and heating costs
  • Number of years of education and height
  • Frequency of brushing and number of cavities
  • Number of churches and number of bars in cities
  • Height of husband and height of wife

10
Thought Question 2
  • Consider the two scatterplots below. How does
    the outlier impact the correlation for each plot?
  • Does the outlier increase the correlation,
    decrease the correlation, or have no impact?

11
Measuring Strength and Direction of a Linear
Relationship
  • How closely does a non-horizontal straight line
    fit the points of a scatterplot?
  • The correlation coefficient (often referred to as
    just correlation), r, is a
  • measure of the strength of the relationship:
    the stronger the relationship, the larger the
    magnitude of r.
  • measure of the direction of the relationship:
    positive r indicates a positive relationship,
    negative r indicates a negative relationship.

12
Correlation Coefficient
The Greek capital letter sigma (Σ) denotes summation or
addition.
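
The formula shown on this slide is not preserved in the transcript. One common textbook definition, consistent with the later remark that r uses z-scores, is r = (1 / (n - 1)) Σ z_x z_y, where z_x = (x - mean of x) / s_x and z_y = (y - mean of y) / s_y. Whether the slide used this form or an algebraically equivalent one is an assumption. A minimal Python sketch with made-up data:

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Pearson correlation as the sum of products of z-scores,
    divided by n - 1 (sample standard deviations are used)."""
    n = len(xs)
    mean_x, mean_y = mean(xs), mean(ys)
    sd_x, sd_y = stdev(xs), stdev(ys)
    z_x = [(x - mean_x) / sd_x for x in xs]
    z_y = [(y - mean_y) / sd_y for y in ys]
    return sum(a * b for a, b in zip(z_x, z_y)) / (n - 1)


# Made-up illustrative data (an assumption, not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
print(round(r, 4))  # 0.7746
```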
13
Correlation Coefficient
  • The range of the correlation coefficient is -1 to
    1.

If r = -1, there is a perfect negative correlation
If r = 1, there is a perfect positive correlation
If r is close to 0, there is no linear correlation
14
Linear Correlation
r = -0.91: Strong negative correlation
r = 0.88: Strong positive correlation
r = 0.42: Weak positive correlation
r = 0.07: Non-linear correlation
15
Correlation Coefficient
  • Special values for r:
  • a perfect positive linear relationship would have
    r = 1
  • a perfect negative linear relationship would have
    r = -1
  • if there is no linear relationship, or if the
    scatterplot points are best fit by a horizontal
    line, then r = 0
  • Note: r must be between -1 and 1, inclusive
  • r > 0: as one variable changes, the other
    variable tends to change in the same direction
  • r < 0: as one variable changes, the other
    variable tends to change in the opposite direction
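
The special values can be checked on small made-up data sets. The sketch below assumes the usual sums-of-squares form of the Pearson coefficient. The helper name `pearson_r` and the data are illustrative and not taken from the slides.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation from sums of squares and cross-products."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


xs = [1, 2, 3, 4]

r_perfect_pos = pearson_r(xs, [3, 5, 7, 9])  # points exactly on y = 2x + 1
r_perfect_neg = pearson_r(xs, [9, 7, 5, 3])  # points exactly on y = -2x + 11
r_flat_fit = pearson_r(xs, [1, 2, 2, 1])     # best-fitting line is horizontal

print(r_perfect_pos)  # 1.0
print(r_perfect_neg)  # -1.0
print(r_flat_fit)     # 0.0
```

If every y value is identical, the denominator is zero and r is undefined. This sketch would then raise a ZeroDivisionError rather than return 0.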

16
Examples of Correlations
  • Husbands' versus wives' ages
  • r = 0.94
  • Husbands' versus wives' heights
  • r = 0.36
  • Professional golfers' putting success: distance
    of putt in feet versus percent success
  • r = -0.94

17
Correlation Coefficient
  • Because r uses the z-scores for the observations,
    it does not change when we change the units of
    measurement of x, y, or both.
  • Correlation ignores the distinction between
    explanatory and response variables.
  • r measures the strength of only linear
    association between variables.
  • A large value of r does not necessarily mean that
    there is a strong linear relationship between the
    variables; the relationship might not be linear.
    Always look at the scatterplot.
  • When r is close to 0, it does not mean that there
    is no relationship between the variables; it
    means there is no linear relationship.
  • Outliers can inflate or deflate correlations.
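
Several of these properties can be checked numerically. The sketch below uses made-up data and a hypothetical helper `pearson_r`. It shows that changing units leaves r unchanged, that swapping x and y leaves r unchanged, and that a single outlier can either inflate or deflate r.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation from sums of squares and cross-products."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up illustrative data (an assumption, not from the slides).
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

r = pearson_r(x, y)

# Changing units (for example years to months) does not change r.
r_rescaled = pearson_r([12 * v for v in x], y)

# Swapping the roles of x and y does not change r.
r_swapped = pearson_r(y, x)

# One point far below the trend deflates a strong correlation.
r_deflated = pearson_r(x + [7], y + [0.0])

# One point far from an uncorrelated cluster inflates the correlation.
cluster_x = [1, 1, 2, 2]
cluster_y = [1, 2, 1, 2]
r_cluster = pearson_r(cluster_x, cluster_y)
r_inflated = pearson_r(cluster_x + [10], cluster_y + [10])

print(round(r, 3), round(r_rescaled, 3), round(r_swapped, 3))
print(round(r_deflated, 3))
print(r_cluster, round(r_inflated, 3))  # 0.0 0.983
```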

18
Not All Relationships Are Linear: Miles per Gallon
versus Speed
  • Curved relationship (r is misleading)
  • Speed chosen for each subject varies from 20 mph
    to 60 mph
  • MPG varies from trial to trial, even at the same
    speed
  • Statistical relationship

r = -0.06
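
A small sketch of why r can be near zero for a clearly curved relationship. The MPG values below are invented for illustration (the slide's actual data are not in the transcript) and follow an exact curve that peaks at 40 mph, so r comes out at exactly 0 here rather than the value reported on the slide.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation from sums of squares and cross-products."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: mpg = 32 - 0.02 * (speed - 40)^2, rounded to integers.
speed = [20, 30, 40, 50, 60]
mpg = [24, 30, 32, 30, 24]

r_curved = pearson_r(speed, mpg)
print(r_curved)  # 0.0
```

MPG is completely determined by speed in this example, yet the rising and falling halves of the curve cancel in the calculation of r.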
19
Common Errors Involving Correlation
  • 1. Causation: It is wrong to conclude that
    correlation implies causality.
  • 2. Averages: Averages suppress individual
    variation and may inflate the correlation
    coefficient.
  • 3. Linearity: There may be some relationship
    between x and y even when there is no linear
    correlation.
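
The point about averages can be illustrated with a small constructed data set. All values below are hypothetical: five groups at x = 1 to 5, each with two individuals whose y values lie 5 above and 5 below the group level. Correlating the individual values gives a weak r, while correlating the group averages gives a perfect one.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation from sums of squares and cross-products."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical individual-level data with large within-group variation.
individual_x = []
individual_y = []
for g in range(1, 6):
    individual_x += [g, g]
    individual_y += [g + 5, g - 5]

# Replace each group's two y values by their average.
group_x = [1, 2, 3, 4, 5]
group_mean_y = [
    (individual_y[2 * i] + individual_y[2 * i + 1]) / 2 for i in range(5)
]

r_individual = pearson_r(individual_x, individual_y)
r_means = pearson_r(group_x, group_mean_y)

print(round(r_individual, 3))  # 0.272
print(r_means)                 # 1.0
```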

20
Correlation and Causation
  • The fact that two variables are strongly
    correlated does not in itself imply a
    cause-and-effect relationship between the
    variables.
  • If there is a significant correlation between two
    variables, you should consider the following
    possibilities.
  • Is there a direct cause-and-effect relationship
    between the variables?
  • Does x cause y?

21
Correlation and Causation
  • Is there a reverse cause-and-effect relationship
    between the variables?
  • Does y cause x?
  • Is it possible that the relationship between the
    variables can be caused by a third variable or
    by a combination of several other variables?
  • Is it possible that the relationship between two
    variables may be a coincidence?

22
Example
  • A survey of the world's nations in 2004 shows a
    strong positive correlation between the percentage
    of people in a country using cell phones and life
    expectancy in years at birth.
  • Does this mean that cell phones are good for your
    health?
  • No. It simply means that in countries where cell
    phone use is high, the life expectancy tends to
    be high as well.
  • What might explain the strong correlation?
  • The economy could be a lurking variable. Richer
    countries generally have more cell phone use and
    better health care.
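
A lurking variable can be sketched with constructed numbers. Everything below is invented for illustration and is not the 2004 survey data: a hypothetical wealth level drives both cell phone use and life expectancy, while within any one wealth level the two are unrelated.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation from sums of squares and cross-products."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical countries: wealth level 1, 2 or 3 sets the baseline for both
# variables; the offsets are balanced so they are unrelated within a level.
offsets = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
wealth, phones, life = [], [], []
for w in (1, 2, 3):
    for d_phone, d_life in offsets:
        wealth.append(w)
        phones.append(10 * w + d_phone)
        life.append(60 + 5 * w + d_life)

r_overall = pearson_r(phones, life)

# Correlation among countries that share the same wealth level.
r_within = {}
for w in (1, 2, 3):
    p = [v for v, level in zip(phones, wealth) if level == w]
    e = [v for v, level in zip(life, wealth) if level == w]
    r_within[w] = pearson_r(p, e)

print(round(r_overall, 3))  # 0.964
print(r_within)             # {1: 0.0, 2: 0.0, 3: 0.0}
```

The strong overall correlation here comes entirely from the third variable; it disappears once that variable is held fixed.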

23
Example
  • The correlation between Age and Income as
    measured on 100 people is r = 0.75. Explain
    whether or not each of these conclusions is
    justified.
  • When Age increases, Income increases as well.
  • The form of the relationship between Age and
    Income is linear.
  • There are no outliers in the scatterplot of
    Income vs. Age.
  • Whether we measure Age in years or months, the
    correlation will still be 0.75.

24
Example
  • Explain the mistakes in the statements below:
  • My correlation of -0.772 between GDP and Infant
    Mortality Rate shows that there is almost no
    association between GDP and Infant Mortality
    Rate.
  • There was a correlation of 0.44 between GDP and
    Continent.
  • There was a very strong correlation of 1.22
    between Life Expectancy and GDP.

25
Warnings about Statistical Significance
  • Statistical significance does not imply the
    relationship is strong enough to be considered
    practically important.
  • Even weak relationships may be labeled
    statistically significant if the sample size is
    very large.
  • Even very strong relationships may not be labeled
    statistically significant if the sample size is
    very small.
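
The slides do not say which significance test they have in mind. Assuming the usual t test of the hypothesis of zero correlation, with t = r * sqrt((n - 2) / (1 - r^2)) on n - 2 degrees of freedom, the sketch below shows both warnings. The r and n values are made up, and the critical values are two-sided 5% values from standard t tables.

```python
from math import sqrt


def t_statistic(r, n):
    """t statistic for testing zero correlation, with n - 2 degrees of
    freedom (assumed test; the slides do not name one)."""
    return r * sqrt((n - 2) / (1 - r * r))


# Two-sided 5% critical values of Student's t from standard tables.
CRITICAL_T_DF_2 = 4.303       # n = 4, so df = 2
CRITICAL_T_LARGE_DF = 1.960   # df in the thousands

# Weak relationship, very large sample: statistically significant.
t_weak_large = t_statistic(0.10, 10_000)
print(round(t_weak_large, 2), t_weak_large > CRITICAL_T_LARGE_DF)   # about 10.05, True

# Very strong relationship, very small sample: not significant.
t_strong_small = t_statistic(0.90, 4)
print(round(t_strong_small, 2), t_strong_small > CRITICAL_T_DF_2)   # about 2.92, False
```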

26
Key Concepts
  • Strength of Linear Relationship
  • Direction of Linear Relationship
  • Correlation Coefficient
  • Problems with Correlations
  • r can only be calculated for quantitative data.