Describing Relationships: Scatterplots and Correlation - PowerPoint PPT Presentation

About This Presentation

Slides: 33
Provided by: VCU1

Transcript and Presenter's Notes



1
Chapter 4
  • Describing Relationships: Scatterplots and
    Correlation

2
Objectives (BPS chapter 4)
  • Relationships: scatterplots and correlation
  • Explanatory and response variables
  • Displaying relationships: scatterplots
  • Interpreting scatterplots
  • Adding categorical variables to scatterplots
  • Measuring linear association (correlation)
  • Facts about correlation

3
Scatterplot
  • A scatterplot is a graph in which paired (x, y)
    data (usually collected on the same individuals)
    are plotted with one variable on the horizontal
    (x-) axis and the other variable on the vertical
    (y-) axis. Each individual pair (x, y) is plotted
    as a single point.

4
Example
Student   Number of Beers   Blood Alcohol Level (BAC)
1         5                 0.1
2         2                 0.03
3         9                 0.19
6         7                 0.095
7         3                 0.07
9         3                 0.02
11        4                 0.07
13        5                 0.085
4         8                 0.12
5         3                 0.04
8         5                 0.06
10        5                 0.05
12        6                 0.1
14        7                 0.09
15        1                 0.01
16        4                 0.05
Here we have two quantitative variables for each
of 16 students: (1) how many beers they drank and
(2) their blood alcohol level (BAC). We are
interested in the relationship between the two
variables: how is one affected by changes in the
other?
5
Scatterplots
  • In a scatterplot one axis is used to represent
    each of the variables, and the data are plotted
    as points on the graph.

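A scatterplot can be sketched even without a graphics library. The minimal Python sketch below uses a hypothetical helper, `text_scatterplot` (not from the slides), to place number of beers on the horizontal (x) axis and BAC on the vertical (y) axis for the 16 students in the slide 4 table, drawing each (x, y) pair as one `*`. It assumes each variable has at least two distinct values; nearby points can land in the same character cell and overlap.

```python
# Data from the slide 4 table, in the same row order.
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.1, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.1, 0.09, 0.01, 0.05]


def text_scatterplot(xs, ys, width=40, height=15):
    """Return the rows of a character-grid scatterplot (hypothetical helper).

    x runs along the horizontal axis and y up the vertical axis. Each (x, y)
    pair becomes one '*'; pairs that land in the same cell overlap.
    Assumes xs and ys each contain at least two distinct values.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column/row index inside the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        # Row 0 is printed first (top), so flip it: larger y sits higher up.
        grid[height - 1 - row][col] = "*"
    # The left border stands in for the y axis, the bottom line for the x axis.
    return ["|" + "".join(cells) for cells in grid] + ["+" + "-" * width]


for line in text_scatterplot(beers, bac):
    print(line)
```

The student with 9 beers and BAC 0.19 lands in the top-right corner and the student with 1 beer and BAC 0.01 in the bottom-left, so the upward drift of the points is visible even at this resolution.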
6
Explanatory and response variables
  • A response variable measures or records an
    outcome of a study. An explanatory variable
    explains changes in the response variable.
  • Typically, the explanatory or independent
    variable is plotted on the x axis and the
    response or dependent variable is plotted on the
    y axis.

7
Some plots don't have clear explanatory and
response variables.
Do calories explain sodium amounts?
Does percent return on Treasury bills explain
percent return on common stocks?
8
Examining a Scatterplot
  • You can describe the overall pattern of a
    scatterplot by its:
  • Form: linear or non-linear (quadratic,
    exponential, no correlation, etc.)
  • Direction: negative or positive.
  • Strength: strong, very strong, moderately
    strong, weak, etc.
  • Look for outliers and how they affect the
    correlation.

9
Scatterplot
Example: Draw a scatterplot for the data below.
What is the nature of the relationship between
x and y?
x 1 2 3 4 5
y -4 -2 1 0 2
Strong, positive and linear.
10
Examining a Scatterplot
  • Two variables are positively correlated when high
    values of the variables tend to occur together
    and low values of the variables tend to occur
    together.
  • The scatterplot slopes upwards from left to
    right.
  • Two variables are negatively correlated when
    high values of one of the variables tend to occur
    with low values of the other and vice versa.
  • The scatterplot slopes downwards from left to
    right.
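The "high values occur together" description can be checked numerically by comparing each value with its variable's mean: for a positive association, most pairs have both values above their means or both below. A minimal Python sketch using the beer/BAC data from slide 4 (the helper name `same_side_counts` is hypothetical):

```python
# Beers and BAC for the 16 students in the slide 4 table.
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.1, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.1, 0.09, 0.01, 0.05]


def same_side_counts(xs, ys):
    """Count pairs on the same side of both means vs. on opposite sides."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    same = opposite = 0
    for x, y in zip(xs, ys):
        product = (x - x_bar) * (y - y_bar)
        if product > 0:      # both above or both below their means
            same += 1
        elif product < 0:    # one above its mean, the other below
            opposite += 1
    return same, opposite


print(same_side_counts(beers, bac))
```

It prints `(14, 2)`: 14 of the 16 students are on the same side of both means, matching the upward-sloping pattern; a negative association would show the reverse split. A pair lying exactly at a mean would count toward neither total.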

11
Types of Correlation
  • Negative linear correlation: as x increases, y
    tends to decrease.
  • Positive linear correlation: as x increases, y
    tends to increase.
  • No correlation
  • Non-linear correlation
12
Examples of Relationships
13
Caution
  • Describing a relationship by its form and
    direction requires that both variables be
    quantitative (so the order of the data points
    is determined entirely by their values).
  • Correspondingly, the direction or form of a
    "relationship" involving categorical data is
    meaningless: categories have no natural order.
  • Example: beetles trapped on boards of different
    colors
  • What association? What relationship?

14
Thought Question 1
What type of association would the following
pairs of variables have: positive, negative, or
none?
  1. Temperature during the summer and electricity
    bills
  2. Temperature during the winter and heating costs
  3. Number of years of education and height
    (Elementary School)
  4. Frequency of brushing and number of cavities
  5. Number of churches and number of bars in cities
  6. Height of husband and height of wife

15
Thought Question 2
  • Consider the two scatterplots below. How does
    the outlier affect the correlation in each plot?
  • Does the outlier increase the correlation,
    decrease the correlation, or have no effect?

16
Strength of the association
  • The strength of the relationship between the two
    variables can be seen by how much variation, or
    scatter, there is around the main form.

With a strong relationship, you can get a pretty
good estimate of y if you know x.
With a weak relationship, for any x you might get
a wide range of y values.
17
How to scale a scatterplot
Same data in all four plots
  • Using an inappropriate scale for a scatterplot
    can give an incorrect impression.
  • Both variables should be given a similar amount
    of space.
  • The plot should be roughly square.
  • Points should fill the whole plot area (no large
    blank regions).

18
Adding categorical variables to scatterplots
  • Often, things are not simple and one-dimensional.
    We need to group the data into categories to
    reveal trends.

What may look like a positive linear relationship
is in fact a series of negative linear
associations. Plotting different habitats in
different colors allowed us to make that
important distinction.
19
Comparison of men's and women's racing records
over time. Each group shows a very strong
negative linear relationship that would not be
apparent without the gender categorization.
Relationship between lean body mass and metabolic
rate in men and women. While both men and women
follow the same positive linear trend, women show
a stronger association. As a group, males
typically have larger values for both variables.
20
Measuring Strength and Direction of a Linear
Relationship
  • How closely does a non-horizontal straight line
    fit the points of a scatterplot?
  • The correlation coefficient r (often referred to
    as just the correlation) is a:
  • measure of the strength of the relationship:
    the stronger the relationship, the larger the
    magnitude of r.
  • measure of the direction of the relationship:
    positive r indicates a positive relationship,
    negative r indicates a negative relationship.

21
Correlation Coefficient
The Greek capital letter Σ (sigma) denotes
summation.
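One standard way to write r, consistent with the z-score description on slide 26 and the deviation columns in the slide 22 example, is given below (n is the number of pairs, x̄ and ȳ are the means, and s_x, s_y are the standard deviations, s_y being defined like s_x):

```latex
r = \frac{1}{n-1}\sum_{i=1}^{n}
    \left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right)
  = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}
         {\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2 \, \sum_{i=1}^{n}(y_i-\bar{y})^2}},
\qquad
s_x = \sqrt{\frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}}
```

Substituting s_x and s_y into the first form makes the factors of n - 1 cancel, which gives the second form.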
22
Example: Find the correlation between x and y.
x 1 2 3 4 5
y -4 -2 1 0 2
Here x̄ = 3 and ȳ = -0.6.
x   x - x̄   y    y - ȳ   (x - x̄)(y - ȳ)
1   -2       -4   -3.4     6.8
2   -1       -2   -1.4     1.4
3    0        1    1.6     0
4    1        0    0.6     0.6
5    2        2    2.6     5.2
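The column sums from the table can be checked with a short Python sketch. `pearson_r` is a hypothetical helper name; it uses the deviation-product form of r, r = Σ(x - x̄)(y - ȳ) / √(Σ(x - x̄)² · Σ(y - ȳ)²).

```python
from math import sqrt

x = [1, 2, 3, 4, 5]
y = [-4, -2, 1, 0, 2]


def pearson_r(xs, ys):
    """Correlation via deviation products: r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    dx = [xi - x_bar for xi in xs]              # the "x - x_bar" column
    dy = [yi - y_bar for yi in ys]              # the "y - y_bar" column
    s_xy = sum(a * b for a, b in zip(dx, dy))   # sum of the last column
    s_xx = sum(a * a for a in dx)
    s_yy = sum(b * b for b in dy)
    return s_xy / sqrt(s_xx * s_yy)


print(round(pearson_r(x, y), 3))  # 0.919
```

The sums are Σ(x - x̄)(y - ȳ) = 14, Σ(x - x̄)² = 10 and Σ(y - ȳ)² = 23.2, so r = 14/√232 ≈ 0.919: a strong positive linear association, in line with the answer to the slide 9 example.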
23
Correlation Coefficient
  • The range of the correlation coefficient is -1 to
    1.

If r = -1, there is a perfect negative correlation.
If r = 1, there is a perfect positive correlation.
If r is close to 0, there is no linear correlation.
24
Linear Correlation
  • r = -0.91: strong negative correlation
  • r = 0.88: strong positive correlation
  • r = 0.42: weak positive correlation
  • r = 0.07: non-linear correlation
25
Correlation Coefficient
  • Special values for r:
  • a perfect positive linear relationship would have
    r = 1
  • a perfect negative linear relationship would have
    r = -1
  • if there is no linear relationship, or if the
    scatterplot points are best fit by a horizontal
    line, then r = 0
  • Note: r must be between -1 and 1, inclusive.
  • r > 0: as one variable changes, the other
    variable tends to change in the same direction.
  • r < 0: as one variable changes, the other
    variable tends to change in the opposite direction.

26
Correlation Coefficient
  • Because r uses the z-scores (standardized values)
    of the observations, it does not change when we
    change the units of measurement of x, y, or both.
  • Correlation ignores the distinction between
    explanatory and response variables.
  • r measures the strength of only linear
    association between variables.
  • A large value of r does not necessarily mean that
    there is a strong linear relationship between the
    variables: the relationship might not be linear.
    Always look at the scatterplot.
  • When r is close to 0, it does not mean that there
    is no relationship between the variables; it
    means there is no linear relationship.
  • Outliers can inflate or deflate correlations.

27
Not all Relationships are Linear: Miles per Gallon
versus Speed
  • Curved relationship (r is misleading)
  • Speed chosen for each subject varies from 20 mph
    to 60 mph
  • MPG varies from trial to trial, even at the same
    speed
  • Statistical relationship

r = -0.06
28
Common Errors Involving Correlation
  • 1. Causation: It is wrong to conclude that
    correlation implies causality.
  • 2. Averages: Averages suppress individual
    variation and may inflate the correlation
    coefficient.
  • 3. Linearity: There may be some relationship
    between x and y even when there is no linear
    correlation.

29
Example
  • A survey of the world's nations in 2004 shows a
    strong positive correlation between the percentage
    of a country's population using cell phones and
    life expectancy in years at birth.
  • Does this mean that cell phones are good for your
    health?
  • No. It simply means that in countries where cell
    phone use is high, life expectancy tends to be
    high as well.
  • What might explain the strong correlation?
  • The economy could be a lurking variable: richer
    countries generally have more cell phone use and
    better health care.

30
Example
  • The correlation between Age and Income as
    measured on 100 people is r = 0.75. Explain
    whether or not each of these conclusions is
    justified.
  • When Age increases, Income increases as well.
  • The form of the relationship between Age and
    Income is linear.
  • There are no outliers in the scatterplot of
    Income vs. Age.
  • Whether we measure Age in years or months, the
    correlation will still be 0.75.

31
Example
  • Explain the mistakes in the statements below:
  • My correlation of -0.772 between GDP and Infant
    Mortality Rate shows that there is almost no
    association between GDP and Infant Mortality
    Rate.
  • There was a correlation of 0.44 between GDP and
    Continent.
  • There was a very strong correlation of 1.22
    between Life Expectancy and GDP.

32
Key Concepts
  • Strength of Linear Relationship
  • Direction of Linear Relationship
  • Correlation Coefficient
  • Common Problems with Correlations
  • r can only be calculated for quantitative data.