BIVARIATE DATA - PowerPoint PPT Presentation

Slides: 59
Provided by: itd98
Transcript and Presenter's Notes

1
Chapter 5
  • BIVARIATE DATA

2
BIVARIATE DATA
  • The study designs considered so far investigated
    only one characteristic of a population. They are
    single variable studies.
  • Many study designs aim to look for an association
    between two quantitative variables measured on
    the same subject. These are bivariate study
    designs. The prefix bi means two.

3
Scatterplots
  • Scatterplots are the most useful graphical device
    to examine the possible association between two
    quantitative variables.
  • Scatterplots help us identify:
      • Trends
      • Outliers

4
Linear Relationship
  • A scatterplot of systolic blood pressure and age
    for 29 subjects.

5
Constructing a Scatterplot
  • A scatterplot is a two-dimensional display.
  • The data for one variable is plotted on the
    horizontal axis and the data for the other
    variable is plotted on the vertical axis.
  • The convention is to place the studied variable
    (response variable) on the vertical (y-axis). The
    variable that is used to do the predicting (the
    explanatory variable) is placed on the horizontal
    (x-axis).

6
Response and Explanatory Variables
  • Some studies are conducted to predict the value
    of one variable using the value of another
    associated variable. In these studies we can
    identify response and explanatory variables.
  • Others are conducted simply to look for potential
    associations between two variables. In
    these studies the choice of response and
    explanatory variables is arbitrary.

7
Examples
  • Decide which variable is the response and which
    is the explanatory variable
  • Serving size of an ice cream cone and the
    calories of the ice cream cone
      • Explanatory: size. Response: calories.
  • The gas mileage of an automobile and the weight
    of the automobile
      • Explanatory: weight. Response: mpg.
  • The price of a theatre ticket and the number of
    ticket sales
      • Explanatory: price. Response: tickets sold.
  • The age at marriage of the woman and the age at
    marriage of the man
      • No clear choice; just an association.

8
Linear Relationship
  • A scatterplot of systolic blood pressure and age
    for 29 subjects.
  • Points rise to the right
  • Positive association

9
Example: Car Fuel Efficiency
Fuel_efficiency.xls
10
Scatterplot for Fuel Efficiency
  • Points fall to the right
  • Negative association

11
Positive and Negative Trends
  • Two variables are said to have a positive (or
    direct) association if larger values of one
    variable occur with larger values of the other
    variable
  • They are said to have a negative (or inverse)
    association if smaller values of one variable
    occur with larger values of the other variable

12
Examples
  • Classify each association in the following slides
    as a strong, weak or no association
  • If there is an association, go on to classify
    it as either a positive or a negative
    association

13
Classify Graph 1
  • Moderate, positive association

14
Classify Graph 2
  • Moderate, negative association

15
Classify Graph 3
  • Strong, positive association

16
Classify Graph 4
  • Strong, negative association

17
Classify Graph 5
  • No association

18
Non-linear Relationship
  • Monthly temperatures in Raleigh, N.C.

19
Measuring the Strength of a Linear Relationship
  • We have seen that associations can be
  • Strong, weak or non-existent
  • Positive or negative
  • Linear or non-linear
  • Now we want to numerically measure the
    strength of linear relationships

20
Correlation Coefficient
  • The correlation coefficient, r, measures the
    strength of a linear association
  • It is always the case that \(-1 \le r \le 1\)
  • If r = 1 then there is perfect positive
    association
  • If r = -1 then there is perfect negative
    association
  • We only quote a correlation coefficient if
    there is a linear relationship!

21
Correlation Guessing Game
  • http://www.stat.uiuc.edu/courses/stat100/java/GCApplet/GCAppletFrame.html

22
Computational Formula
  • The computational formula for the sample
    correlation coefficient is
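
The formula shown on this slide did not survive in the transcript. As a rough sketch, one common computational form is r = (nΣxy - ΣxΣy) / sqrt[(nΣx² - (Σx)²)(nΣy² - (Σy)²)]; this may not be the exact form the slide used. The function name `correlation` and the data are hypothetical and are not taken from Fuel_efficiency.xls.

```python
import math

def correlation(xs, ys):
    """Sample correlation coefficient r from running sums (computational form)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator

# Hypothetical data (not from the slides): weight in 1000 lb vs. mpg
weights = [2.0, 2.5, 3.0, 3.5, 4.0]
mpg = [36, 31, 28, 23, 20]

r = correlation(weights, mpg)
print(round(r, 3))  # negative: mpg falls as weight rises
```

Only sums of x, y, xy, x² and y² are needed, which is why this form suits hand or calculator work.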

23
Good News
  • Graphical and statistical calculators as well as
    programs like Excel and StatCrunch provide the
    correlation coefficient as one of their options
    when doing bivariate analysis

24
Example: Car Fuel Efficiency
Fuel_efficiency.xls
25
Scatterplot for Fuel Efficiency
  • Describe the association and estimate r

26
Results
r = -0.816
27
Association Does Not Mean Causation!
Life_expectancy.xls
28
Results
r = -0.789
29
Question
  • The Number of People per TV has a strong
    negative correlation with Life Expectancy
  • Does this mean buying more TVs will increase
    life expectancy?
  • Association does not mean causation!

30
Linear Models
  • The correlation coefficient measures the strength
    of the linear relationship between two
    quantitative variables x and y.
  • A linear equation describing how a response
    variable, y, is associated with an explanatory
    variable, x, looks like
  • y = b + mx

31
Example
  • A college charges a basic fee of $100 a semester
    for a meal plan plus $2 a meal. The linear
    equation describing the association between the
    cost of the meal plan, y, and the number of meals
    purchased, x, is
  • y = 100 + 2x
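
The meal plan equation can be written as a small function. This is only an illustrative sketch: the function name `meal_plan_cost` is made up, and the fee and per-meal price are the values from the slide.

```python
def meal_plan_cost(meals):
    """Semester cost in dollars for the slide's meal plan: y = 100 + 2x."""
    base_fee = 100  # y-intercept: cost when x = 0 meals are purchased
    per_meal = 2    # slope: each extra meal adds 2 dollars
    return base_fee + per_meal * meals

for meals in (0, 10, 50):
    print(meals, meal_plan_cost(meals))
# 0 100
# 10 120
# 50 200
```

Increasing x by one meal always raises y by the slope, 2, and x = 0 returns the y-intercept, 100.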

32
Linear Equations
  • A linear equation takes the form
  • y = b + mx
  • m = slope
  • b = y-intercept
  • The slope measures the rate of change of y
    with respect to x
  • The y-intercept measures the initial value of y
    (value of y when x = 0)

33
Linear Modeling
  • Rarely does an exact linear relationship exist
    between two studied variables.
  • The correlation coefficient and the scatter plot
    help us decide if there is a reasonably strong
    linear relationship between two studied variables.

34
Airfare from Baltimore, MD (1995 data)
Airfare.xls
35
What is the Line of Best Fit?
  • What properties do we want the line we fit to the
    data to have?
  • It should be as close as possible to the data
  • To decide if one line is better than another we
    could measure how far data values are from the
    line
  • We will use the symbol \(\hat{y}\) (read "y-hat") to
    designate a predicted y-value

36
Scatter Plot
[Figure: airfare scatterplot; the Dallas point is labeled Y = 278 and its residual is marked.]
  • The airfare for Dallas is $52 higher than
    predicted
37
Residuals
  • To find a residual we take the observed y-value
    and subtract the predicted y-value:
    residual = \(y - \hat{y}\)
  • Positive residuals imply the observed value is
    higher than predicted
  • Negative residuals imply the observed value is
    less than predicted

38
Minimize the Residuals
  • It would seem that a good criterion would be to
    have the line of best fit be the one that
    minimizes the residuals.
  • Since residuals can be both positive and negative,
    we don't want to just add them up. Cancellation
    of positive and negative residuals would occur.

39
Remember the Standard Deviation?
  • Remember how when we defined the standard
    deviation we squared the difference between a
    data value and the mean to create a non-negative
    quantity?

40
Do it Again
  • We apply the same process to create non-negative
    quantities from the residuals: we square them
  • Then we look for the line that minimizes the sum
    of these squared residuals
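
As a rough illustration of this criterion, the sketch below scores two candidate lines by their sum of squared residuals. The data, the two candidate lines and the function name `sum_squared_residuals` are all hypothetical.

```python
def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of (observed y - predicted y)^2 for the line y = intercept + slope*x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data (not from the slides)
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

# Two candidate lines; the smaller sum marks the better fit
sse_a = sum_squared_residuals(xs, ys, intercept=0.0, slope=2.0)
sse_b = sum_squared_residuals(xs, ys, intercept=1.0, slope=1.5)
print(sse_a < sse_b)  # True: the first line is closer to the data
```

Because every squared residual is non-negative, positive and negative residuals can no longer cancel.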

41
How it Works
  • Minimize \(\sum (y - \hat{y})^2\), the sum of the
    squared residuals
  • It's a good plan but you need calculus to carry
    it out.
  • The end results are actually quite easy. All
    that's important is that you realize the formulas
    that will be presented minimize the sum of the
    squared residuals

42
The Formulas
  • The methods of calculus can be used to find
    equations for the slope and y-intercept of the
    least squares line. Here are the results.
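
The formulas on this slide were not captured in the transcript. A minimal sketch, assuming the standard least squares results slope = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and y-intercept = ȳ - slope·x̄; the slide may have shown an equivalent form. The function name and data are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least squares line y = intercept + slope*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through (mean_x, mean_y)
    return intercept, slope

# Hypothetical data (not from the slides)
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = least_squares_line(xs, ys)
print(round(intercept, 2), round(slope, 2))
```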

43
Least Squares Line for Airfare Data
  • Distance (x)
  • Airfare (y)
  • r = 0.795

44
Prediction Equation
  • Airfare = 83.53 + (0.117)(distance)
  • Each additional 100 miles costs an additional
    $11.70

45
A Few Residuals
Pittsburgh
  • Distance from Baltimore: 210 miles
  • Airfare: $138
  • Predicted airfare = 0.117(210) + 83.53 = 107.92
  • Residual = 138 - 107.92 = 30.08
  • The airfare from Pittsburgh to Baltimore is
    $30.08 more than you would expect based on
    the distance between these cities
St. Louis
  • Distance from Baltimore: 737 miles
  • Airfare: $98
  • Predicted airfare = 0.117(737) + 83.53 = 169.77
  • Residual = 98 - 169.77 = -71.77
  • The airfare from St. Louis to Baltimore is
    $71.77 less than you would expect based on the
    distance between these cities
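
The two residual calculations can be reproduced from the prediction equation. This is a hedged sketch using the rounded coefficients quoted on the slides (83.53 and 0.117). With those values the results land within about 20 cents of the slide's figures; the gap is presumably due to the slide using unrounded coefficients, which is an assumption. Function and variable names are illustrative.

```python
INTERCEPT = 83.53  # dollars, rounded value from the prediction equation
SLOPE = 0.117      # dollars per mile, rounded value from the prediction equation

def predicted_airfare(distance_miles):
    """Predicted airfare from the fitted line (rounded coefficients)."""
    return INTERCEPT + SLOPE * distance_miles

def residual(observed, predicted):
    """Residual = observed y-value minus predicted y-value."""
    return observed - predicted

# (distance from Baltimore in miles, observed airfare in dollars), from the slide
cities = {"Pittsburgh": (210, 138), "St. Louis": (737, 98)}

for city, (distance, fare) in cities.items():
    fitted = predicted_airfare(distance)
    print(city, round(fitted, 2), round(residual(fare, fitted), 2))
```

A positive residual (Pittsburgh) means the observed fare sits above the line; a negative one (St. Louis) means it sits below.
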
46
Coefficient of Determination
  • The coefficient of determination is the
    correlation coefficient squared
  • It can help us determine how good the least
    squares line is as a prediction equation
  • It is the fractional amount of total variation in
    y that can be explained by the linear
    relationship with x
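
To illustrate the "fraction of variation explained" reading, the sketch below fits a least squares line to hypothetical data and checks that 1 - (sum of squared residuals)/(total variation in y) matches the squared correlation coefficient. The data and the function name `explained_fraction_and_r` are made up for illustration.

```python
import math

def explained_fraction_and_r(xs, ys):
    """Return (fraction of y-variation explained by the fitted line, r)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)  # total variation in y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variation left unexplained by the line: sum of squared residuals
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)
    return 1 - sse / syy, r

# Hypothetical data (not from the slides)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

fraction, r = explained_fraction_and_r(xs, ys)
print(round(fraction, 3), round(r ** 2, 3))  # the two values agree
```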

47
Airfare Data
  • r = 0.795
  • r² = 0.63
  • 63% of the variability in airfare cost can be
    explained by a linear relationship with distance
  • 37% of the variability in airfare cost is due to
    factors other than distance

48
Age and Systolic Blood Pressure for 30 Adults
SBP.xls
49
Approximate Positive Linear Relationship
50
Equation of Fitted Line
SBP = 98.7 + 0.97(AGE)
y = 98.7 + 0.97x
51
Interpretation of the Slope
  • The slope of the SBP vs Age fitted equation is
    0.97
  • 0.97 = rate of change of SBP with respect to age
  • For each additional year of age, predicted SBP
    rises by approximately 0.97 units.

52
Analysis of Variability
  • r = 0.657 and r² = 0.43
  • 43% of the variability in SBP can be explained by
    a linear relationship with age
  • 57% of the variability in SBP is due to factors
    other than age

53
Interpreting Residuals
Subject 2
  • Age: 47
  • SBP: 220
  • Predicted SBP = 98.71 + 0.97(47) = 144.35
  • Residual = 220 - 144.35 = 75.65
  • This subject's SBP is 75.65 units higher
    than would be expected for his age
Subject 23
  • Age: 39
  • SBP: 120
  • Predicted SBP = 98.71 + 0.97(39) = 136.58
  • Residual = 120 - 136.58 = -16.58
  • This subject's SBP is 16.58 units lower
    than would be expected for his age
54
More Good News
  • Many computer programs, including Excel and
    StatCrunch, as well as graphing calculators
    provide the slope and y-intercept of the least
    squares line

57
Only selected items are relevant
  • Relevant: regression equation, correlation
    coefficient, coefficient of determination
  • The other items in the output are not relevant
  • Look for these in the data table
58
  • Notice that the dialog box that presents the
    summary statistics has a Next button. Press Next
    to get the scatterplot.
  • Look back at the data table and notice that
    the predicted (fitted) and residual values
    are now included.