Correlation and Regression Wisdom - PowerPoint PPT Presentation

Slides: 14
Provided by: ChrisH110

Transcript and Presenter's Notes


1
Lesson 3 - 3
  • Correlation and Regression Wisdom

2
Knowledge Objectives
  • Recall the three limitations on the use of
    correlation and regression.
  • Explain what is meant by an outlier in bivariate
    data.
  • Explain what is meant by an influential
    observation and how it relates to regression.
  • Define a lurking variable.
  • Give an example of what it means to say
    association does not imply causation.

3
Construction Objectives
  • Given a scatterplot in a regression setting,
    identify outliers and influential observations
  • Explain how correlations based on averages differ
    from correlations based on individuals

4
Vocabulary
  • Influential observation: an observation that, if
    removed, would markedly change the result of the
    regression calculation

5
Limitations
  • Correlation and regression describe only linear
    relationships
  • Extrapolation (using the model outside the range
    of the data) often produces unreliable predictions
  • Correlation is not resistant (to outliers!)
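
A minimal sketch of the first and third limitations, using small made-up data sets (none of these numbers come from the lesson). The helper name pearson_r is hypothetical; it computes r directly from its definition.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: a perfect but curved relationship, y = x^2.
xs_curve = [-3, -2, -1, 0, 1, 2, 3]
ys_curve = [x ** 2 for x in xs_curve]
r_curve = pearson_r(xs_curve, ys_curve)

# Made-up data: five points exactly on a line, then one added outlier.
xs_line = [1, 2, 3, 4, 5]
ys_line = [1, 2, 3, 4, 5]
r_line = pearson_r(xs_line, ys_line)
r_with_outlier = pearson_r(xs_line + [6], ys_line + [0])

print(f"curved pattern:   r = {r_curve:.3f}")
print(f"straight line:    r = {r_line:.3f}")
print(f"line + 1 outlier: r = {r_with_outlier:.3f}")
```

The curved pattern gives r = 0 even though y is completely determined by x, and a single outlier pulls r from 1 down to about 0.14.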

6
Outliers vs Influential Observation
  • An outlier is an observation that lies outside the
    overall pattern of the other observations
  • Outliers in the Y direction have large residuals
    but may not influence the slope of the
    regression line
  • Outliers in the X direction are often influential
    observations
  • An influential observation is one whose removal
    would markedly change the result of the
    regression calculation

7
Example 1
  • Does the age at which a child begins to talk
    predict a later score on a test of mental ability?
    A study of the development of 21 children
    recorded the age in months at which they spoke
    their first word and their later Gesell Adaptive
    Score (GAS).

Child  Age  GAS    Child  Age  GAS    Child  Age  GAS
    1   15   95        8   11  100       15   11  102
    2   26   71        9    8  104       16   10  100
    3   10   83       10   20   94       17   12  105
    4    9   91       11    7  113       18   42   57
    5   15  102       12    9   96       19   17  121
    6   20   87       13   10   83       20   11   86
    7   18   93       14   11   84       21   10  100

8
Example 1 cont
  1. What is the equation of the LS regression line
    used to model this data?
  2. What is the interpretation of this data?

y-hat = 109.8738 - 1.127x,  r = -0.64
The scatterplot and the slope of the regression
line indicate a negative association: children
who begin to speak later tend to have lower test
scores than early talkers. The slope suggests
that for each additional month of age at first
word, the predicted Gesell score decreases by
about 1.13 points. The y-intercept has no real
meaning in this case, because an age of 0 months
lies far outside the range of the data (7 to 42
months).
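
The regression line and r quoted above can be checked directly from the 21 data pairs. The sketch below uses the textbook least-squares formulas in plain Python; the function name least_squares and the 100-month extrapolation are illustrative additions, not part of the original slides.

```python
import math

# Age at first word (months) and Gesell Adaptive Score for children 1-21,
# copied from the table in Example 1.
age = [15, 26, 10, 9, 15, 20, 18, 11, 8, 20, 7,
       9, 10, 11, 11, 10, 12, 42, 17, 11, 10]
gas = [95, 71, 83, 91, 102, 87, 93, 100, 104, 94, 113,
       96, 83, 84, 102, 100, 105, 57, 121, 86, 100]

def least_squares(xs, ys):
    """Return (intercept, slope, r) for the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r

intercept, slope, r = least_squares(age, gas)
print(f"y-hat = {intercept:.4f} + ({slope:.4f})x,  r = {r:.2f}")

# Illustrative extrapolation far beyond the observed ages (7 to 42 months):
# the line predicts a negative score, which is not a meaningful GAS value.
predicted_at_100 = intercept + slope * 100
print(f"predicted GAS at 100 months: {predicted_at_100:.1f}")
```

The negative prediction at 100 months is a reminder of the extrapolation limitation: the line is only trustworthy within the range of ages actually observed.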

9
Example 1 cont
  1. Are there any outliers?
  2. Are there any influential observations?

Child 19 is an outlier in the Y-direction (a large
residual). Child 18 is an outlier in the
X-direction and also an influential observation
because it has a strong influence on the position
of the regression line.
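
One way to check these answers numerically is to compute the residuals and then refit the line with each suspect child left out. This is a sketch under the same least-squares formulas; the helper names least_squares and drop_child are hypothetical.

```python
import math

# Data for children 1-21, copied from the table in Example 1.
age = [15, 26, 10, 9, 15, 20, 18, 11, 8, 20, 7,
       9, 10, 11, 11, 10, 12, 42, 17, 11, 10]
gas = [95, 71, 83, 91, 102, 87, 93, 100, 104, 94, 113,
       96, 83, 84, 102, 100, 105, 57, 121, 86, 100]

def least_squares(xs, ys):
    """Return (intercept, slope, r) for the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope, sxy / math.sqrt(sxx * syy)

def drop_child(child, xs, ys):
    """Return copies of xs and ys with one child (numbered from 1) removed."""
    keep = [i for i in range(len(xs)) if i + 1 != child]
    return [xs[i] for i in keep], [ys[i] for i in keep]

intercept, slope, r = least_squares(age, gas)

# Residual = observed GAS minus predicted GAS; a large residual marks an
# outlier in the Y direction.
residuals = [y - (intercept + slope * x) for x, y in zip(age, gas)]
largest = max(range(len(residuals)), key=lambda i: abs(residuals[i])) + 1
print(f"all 21 children:  slope = {slope:.3f}, r = {r:.2f}")
print(f"largest residual: child {largest} ({residuals[largest - 1]:.1f} points)")

# Refit without each suspect child to see how far the line moves.
refits = {}
for child in (18, 19):
    xs, ys = drop_child(child, age, gas)
    refits[child] = least_squares(xs, ys)
    _, slope_drop, r_drop = refits[child]
    print(f"without child {child}: slope = {slope_drop:.3f}, r = {r_drop:.2f}")
```

With these data, dropping child 18 moves the slope from about -1.13 to about -0.78, while dropping child 19 moves it only to about -1.19. That is the sense in which child 18 is influential and child 19 is merely an outlier.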

10
Example 1 cont
[Figure: scatterplot with regression line]
[Figure: residual plot]

11
Lurking or Extraneous Variable
  • The relationship between two variables can often
    be misunderstood unless you take other variables
    into account
  • Association does not imply causation!
  • Cases of Rocky Mountain spotted fever and
    drownings reported per month are highly
    correlated, but neither causes the other (a
    lurking variable, such as the season, can
    drive both)
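
A small sketch of how a lurking variable can produce a strong correlation between two quantities that never influence each other. All numbers are made up for illustration (they are not real case or drowning counts), and the names warmth, fever_cases and drownings are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical "warmth" index for January through December (the lurking variable).
warmth = [0, 1, 3, 5, 8, 10, 10, 9, 6, 4, 1, 0]

# Made-up monthly counts: each depends on warmth plus its own small wobble.
# Neither count is computed from the other.
fever_wobble = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
drowning_wobble = [0, 0, 2, 2, 0, 0, 2, 2, 0, 0, 2, 2]
fever_cases = [w + d for w, d in zip(warmth, fever_wobble)]
drownings = [3 * w + e for w, e in zip(warmth, drowning_wobble)]

r = pearson_r(fever_cases, drownings)
print(f"r(fever cases, drownings) = {r:.2f}")
```

The two series are strongly correlated only because both follow the same seasonal pattern; changing one would do nothing to the other.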

12
Remember Sampling Distributions
  • When we looked at individual values, they had
    much broader spreads (variances) than when we
    looked at the distributions of x-bar
  • The same is true of correlations based on
    averaged data: strong correlations may exist
    between averages, but individuals will show
    much more variation
  • Correlations based on averages are usually too
    high when applied to individuals.
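
To see why, here is a sketch with made-up data: five groups whose averages fall exactly on a line, with individuals scattered widely around each group average. The data and names are hypothetical and chosen only to make the effect easy to see.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: 5 groups whose centers lie exactly on the line y = 2x,
# with 4 individuals scattered widely around each center.
offsets = [(-3, 3), (3, -3), (-3, -3), (3, 3)]
groups = []
for g in range(5):
    groups.append([(g + dx, 2 * g + dy) for dx, dy in offsets])

# Correlation computed from every individual.
ind_x = [x for grp in groups for x, _ in grp]
ind_y = [y for grp in groups for _, y in grp]
r_individuals = pearson_r(ind_x, ind_y)

# Correlation computed from the group averages only.
avg_x = [sum(x for x, _ in grp) / len(grp) for grp in groups]
avg_y = [sum(y for _, y in grp) / len(grp) for grp in groups]
r_averages = pearson_r(avg_x, avg_y)

print(f"r based on individuals: {r_individuals:.2f}")
print(f"r based on averages:    {r_averages:.2f}")
```

Averaging removes the within-group scatter, so the averages line up almost perfectly even though the individual-level correlation is weak (about 0.29 here).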

13
Summary and Homework
  • Summary
      • Correlation and regression must be interpreted
        with caution
      • Plot data to be sure that the relationship is
        roughly linear and to detect outliers
      • Check for influential observations that
        substantially change the regression line
      • Lurking variables may explain the relationship
        between the explanatory and response variables
  • Homework
      • pg 242-3, 3.63-67