ScatterPlots, Correlation, Regression - PowerPoint PPT Presentation

Provided by: jinb
Transcript and Presenter's Notes


1
ScatterPlots, Correlation, Regression
2
(No Transcript)
3
  • Annual wine consumption (liters of alcohol from
    drinking wine/person) and deaths/100,000

4
More Examples
  • Whenever one variable (characteristic) of the
    subjects can reasonably be used to explain
    another, the first is called the explanatory
    variable and the second the response variable.
  • However, this does NOT imply that the explanatory
    variable is the actual cause of the response
    variable, even though the two may have a strong
    relationship.
  • IQ scores and school grades (strong relationship,
    not cause).
  • Family income level and education level (either
    could be the explanatory variable, with the other
    as the response variable).
  • Someone's height and grades (not reasonable?).
  • Husband's age and wife's age (strong
    relationship, not cause).

5
More Examples
  • A study of the effect of alcohol on the change
    in body temperature.
  • Carbohydrate intake and weight.
  • Overweight and mortality rate.
  • A study of antibiotics and breast cancer.
  • Heights of parents and heights of children.
  • Explanatory and response variables don't have to
    be quantitative, but they often are.

6
More Examples
  • Observed meditation practice and age-related
    enzyme level (general concern for one's
    well-being may also be affecting the response and
    the decision to try meditation). Noetic Sciences
    Review, Summer 1993, page 28.
  • Lack of social relationships and frequency of
    illness (is some other factor predisposing people
    both to have lower social activity and to become
    ill?). Science, Vol. 241 (1988), pp. 540-545.

7
ScatterPlots
  • If we have two quantitative variables
    (characteristics) of the subjects, we plot one
    variable against the other. The result is called
    a scatterplot (a graphical representation of the
    two quantitative variables).
  • If one of the two quantitative variables is
    explanatory and the other is the response, the
    explanatory variable goes on the horizontal axis
    and the response variable on the vertical axis.

8
How to draw a scatterplot: an example
A car's speed (miles/hr) and fuel efficiency
(miles/gal) at different times
[Scatterplot. Horizontal axis (explanatory): Speed.
Vertical axis (response): Miles/gal.]
9
One of the reasons we care about the overall pattern
is that we want to make predictions! (Example:
previous slide.)
10
Page 85
11
Page 95
12
If the overall pattern of the scatterplot
approaches a straight line, the two variables are
linearly associated; otherwise they are not
linearly associated. The strength of the
association, whether linear or nonlinear, can be
roughly described and compared. Linear association
is the simplest case.
A relationship is strong if the points lie close
to the overall pattern curve, and weak if they
are widely scattered about it.
13
More Examples of Relationships
14
(No Transcript)
15
Introducing a categorical variable in scatterplots
South Rising Page 82
16
Linear Association: Correlation
  • Correlation is a number that measures the
    direction and strength of linear association.
    Notation: r.
  • The number r is always between −1 and 1.
  • If r is negative, the two quantitative variables
    are negatively associated; if r is positive, they
    are positively associated.
  • The association is strong if r is close to 1 or
    −1 (that is, |r| is close to 1) and weak if r is
    close to 0.
  • r measures only linear association. In a nonlinear
    situation, even if there is a strong association,
    the correlation r can be 0. We study linear
    association simply because it's simple.
  • r doesn't depend on the units used.
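The claim that r does not depend on units can be checked numerically. The following is a minimal sketch, assuming the standard Pearson formula for r; the height and weight values and the helper name `pearson_r` are hypothetical and not taken from the slides.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation from sums of deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: heights in inches, weights in pounds.
heights_in = [60, 62, 65, 68, 70, 72]
weights_lb = [115, 120, 140, 155, 160, 180]

# The same measurements expressed in metric units.
heights_cm = [h * 2.54 for h in heights_in]
weights_kg = [w * 0.4536 for w in weights_lb]

r_imperial = pearson_r(heights_in, weights_lb)
r_metric = pearson_r(heights_cm, weights_kg)

# Both values agree up to floating-point rounding.
print(r_imperial, r_metric)
```

Rescaling either variable by a positive constant rescales the numerator and denominator of r by the same factor, which is why the two results match.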

17
Correlation Calculation
  • Suppose we have data on variables X and Y for n
    individuals:
  • x1, x2, ..., xn and y1, y2, ..., yn
  • Each variable has a mean and a standard deviation.
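The slide's formula did not survive in the transcript. As a hedged sketch, the code below assumes the common textbook definition: standardize each x and y using its mean and standard deviation, multiply the standardized pairs, sum, and divide by n − 1. The data values and the function name `correlation` are hypothetical.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """r = (1 / (n - 1)) * sum of products of standardized x and y values."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

# Hypothetical data for five individuals.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = correlation(x, y)
print(round(r, 4))  # 0.7746
```

Here the sum of cross-deviations is 6, and the sums of squared deviations are 10 and 6, so r = 6 / sqrt(60), about 0.7746.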

18
Correlation Close to 0: Nonlinear Relationship
Miles/gallon versus speed
  • Linear relationship?
  • Correlation is close to zero.
  • Strong association!

19
Correlation Close to 0: Nonlinear Relationship
Miles/gallon versus speed
  • Curved (nonlinear) relationship.
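A small numerical sketch of the same idea: a strong curved relationship can still give a correlation of zero. The speed and miles/gallon figures below are hypothetical, chosen to be symmetric around 40 miles/hr, and are not the data plotted on the slide.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation from sums of deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: fuel efficiency rises, peaks at 40 miles/hr, then falls.
speed = [20, 30, 40, 50, 60]
mpg = [24, 28, 30, 28, 24]

r = pearson_r(speed, mpg)

# The rising half and the falling half cancel, so r is zero up to
# floating-point rounding even though mpg is fully determined by speed.
print(abs(r) < 1e-9)
```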

20
A Simple Example: Some students' midterm grades
and their final grades
It's a linear relationship!
21
A Simple Example: Some students' midterm grades
and their final grades
Sum 85
22
(No Transcript)
23
Which Relation Is Highly Correlated?
  • Husbands' versus wives' ages.
  • Husbands' versus wives' heights.
  • Professional golfers' putting success: distance
    of putt in feet versus percent success.

24
Problems With Correlations
  • Outliers can inflate or deflate correlations (see
    next slide).
  • Groups combined inappropriately may mask
    relationships (a third variable).
  • Groups may have different relationships when
    separated.

25
Outliers and Correlation
[Two scatterplots, A and B, each containing one outlier.]
For each scatterplot above, how does the outlier
affect the correlation?
A: the outlier decreases the correlation (weakens
the trend).
B: the outlier increases the correlation
(strengthens the trend).
Example: 2004 election, Kerry-Edwards.
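Both effects can be reproduced with a few numbers. This is an illustrative sketch only: the two small data sets are hypothetical stand-ins for scatterplots A and B, not the data shown on the slide.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation from sums of deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Case A (hypothetical): a perfect linear trend, then one point off the trend.
xa = [1, 2, 3, 4, 5, 6]
ya = [2, 4, 6, 8, 10, 12]
r_a_without = pearson_r(xa, ya)
r_a_with = pearson_r(xa + [7], ya + [0])

# Case B (hypothetical): a patternless cluster, then one distant point.
xb = [1, 2, 3, 1, 2, 3]
yb = [1, 3, 2, 3, 1, 2]
r_b_without = pearson_r(xb, yb)
r_b_with = pearson_r(xb + [10], yb + [10])

print(r_a_without, r_a_with)  # the outlier weakens the trend
print(r_b_without, r_b_with)  # the outlier creates an apparent trend
```

In case A the single off-trend point pulls r from 1 down to 0.25; in case B the distant point lifts r from 0 to above 0.9.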
26
Linear Regression and Prediction
  • If two quantitative variables are linearly
    associated (the overall pattern is close to a
    straight line), which line best describes the
    data? (We want to make predictions!)
  • The best line is called the least-squares
    regression line. This line helps predict the
    value of the response variable (y) for a given
    value of the explanatory variable (x).
  • The slope of the regression line and the
    correlation r always have the same sign.

27
page 105
28
Page 106
29
Least-squares Regression Line
  • Regression equation: y-hat = bx + a

  • x is the value of the explanatory variable
  • y-hat is the average value of the response
    variable (predicted response for a value of x)
  • a is the intercept
  • b is the slope of the straight line
  • note that r and b are not the same thing, but
    their signs will agree

sx and sy are the standard deviations of the two
variables, and r is their correlation.
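The formulas that this sentence refers to did not survive in the transcript. The sketch below assumes the standard least-squares results, slope b = r · sy / sx and intercept a = y-bar − b · x-bar, and checks that b and r share a sign. The data and the function names are hypothetical.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """r as the averaged product of standardized values (divisor n - 1)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

def least_squares_line(xs, ys):
    """Return (a, b, r) for the line y-hat = b*x + a.

    Assumed formulas: b = r * s_y / s_x and a = y_bar - b * x_bar.
    """
    r = correlation(xs, ys)
    b = r * stdev(ys) / stdev(xs)
    a = mean(ys) - b * mean(xs)
    return a, b, r

# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b, r = least_squares_line(x, y)
print(round(a, 4), round(b, 4), round(r, 4))  # 2.2 0.6 0.7746
```

Because sx and sy are always positive, b = r · sy / sx necessarily has the same sign as r, even though the two numbers generally differ.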
30
Prediction Via Regression Line: Number of New
Birds and Percent Returning
  • The regression equation is y-hat =
    31.9343 − 0.3040x
  • y-hat is the average number of new birds for all
    colonies with x percent returning
  • For all colonies with 60% returning, we predict
    the average number of new birds to be 13.69
  • 31.9343 − (0.3040)(60) = 13.69 birds
  • Suppose we know that an individual colony has 60%
    returning. What would we predict the number of
    new birds to be for just that colony?
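The arithmetic on this slide can be written as a short function. The coefficients come from the slide, with the sign of the slope inferred from the stated result of 13.69; the function name `predict_new_birds` is a hypothetical label.

```python
# Coefficients from the slide's regression equation.
INTERCEPT = 31.9343
SLOPE = -0.3040

def predict_new_birds(percent_returning):
    """Predicted average number of new birds for a given percent returning."""
    return INTERCEPT + SLOPE * percent_returning

prediction = predict_new_birds(60)
print(round(prediction, 2))  # 13.69
```

The line gives the same point prediction for a single colony with 60% returning as for the average of all such colonies; what differs is how much uncertainty surrounds that prediction.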

31
Midterm Grades and Final Grades
The slope of the regression line is
The y-intercept is
The regression line is
The predicted final grade for Peter (if his
midterm is 70)?
32
Residuals
  • A residual is the difference between an observed
    value of the response variable and the value
    predicted by the regression line
  • residual = y − y-hat (observed minus predicted)
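A minimal sketch of the definition, using hypothetical data and a least-squares line fitted with the standard formulas (slope = sum of cross-deviations divided by sum of squared x-deviations). It also shows a known property of least-squares fits: the residuals sum to zero.

```python
from statistics import mean

# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Fit the least-squares line y-hat = b*x + a.
x_bar, y_bar = mean(x), mean(y)
b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
     / sum((xi - x_bar) ** 2 for xi in x))
a = y_bar - b * x_bar

# residual = observed y minus predicted y-hat
predicted = [b * xi + a for xi in x]
residuals = [yi - yh for yi, yh in zip(y, predicted)]

print([round(e, 2) for e in residuals])  # [-0.8, 0.6, 1.0, -0.6, -0.2]
```

A positive residual means the point lies above the line, and a negative residual means it lies below.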


33
Case Study: Residuals
Gesell Adaptive Score and Age at First Word
Draper, N. R. and John, J. A., "Influential
observations and outliers in regression,"
Technometrics, Vol. 23 (1981), pp. 21-26.
34
Cautions About Correlation and Regression
  • both describe only linear relationships
  • both are affected by outliers
  • always plot the data before interpreting
  • beware of extrapolation (predicting outside of
    the range of x)
  • beware of lurking variables (variables that have
    an important effect on the relationship among
    the variables in a study, but are not included in
    the study)
  • association does not imply causation

35
Excel Instructions
  • Download/open the Excel file GDPnLifeE.xls
  • Scatterplot only: Select Data Block (B3:C12) →
    Insert → Chart → XY (Scatter) → Choose Scatter
    subtype → Next → Series in Column → Next → Change
    Title etc. → Next and Finish → Enhancements
  • Correlation only: Enter =correl(Xarray,Yarray)
    (=correl(B3:B12,C3:C12))
  • Scatter and Regression Line: Tools → Data
    Analysis → Regression → Next → Enter Y data
    (C3:C12) → Enter X data (B3:B12) → Enter output
    position → Check Line Fit Plot (or residual
    plots) → OK → Cursor on predicted value → right
    click → Add Trendline (no trendline for residual
    plots) → Enhancements