Regression - PowerPoint PPT Presentation

Slides: 28
Provided by: lisamc9
Learn more at: http://www.cs.gsu.edu

Transcript and Presenter's Notes

Title: Regression


1
Chapter 5
  • Regression

2
Chapter outline
  • The least-squares regression line
  • Facts about least-squares regression
  • Residuals
  • Influential observations
  • Cautions about correlation and regression
  • Association does not imply causation

3
Correlation and Regression
  • Regression effects are depicted by the slope of
    the line.
  • Correlation can be seen as the spread of points
    around the regression line. The greater the
    spread of points around the regression line, the
    less predictive X is of Y and, consequently, the
    weaker the correlation.
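A minimal Python sketch of this distinction, using made-up data (the numbers and the helper name `slope_and_correlation` are illustrative assumptions, not from the slides): two data sets share the same least-squares slope, but the one with more spread around the line has the weaker correlation.

```python
import math

def slope_and_correlation(xs, ys):
    """Return (least-squares slope, correlation r) for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sxx, sxy / math.sqrt(sxx * syy)

# Hypothetical data: both sets scatter around the line y = 2x.
xs = [1, 2, 3, 4, 5]
ys_tight = [2.1, 3.8, 6.0, 8.2, 9.9]  # small spread around the line
ys_wide = [4, 0, 6, 12, 8]            # large spread around the line

slope_tight, r_tight = slope_and_correlation(xs, ys_tight)
slope_wide, r_wide = slope_and_correlation(xs, ys_wide)

print(f"tight: slope={slope_tight:.2f}, r={r_tight:.3f}")
print(f"wide:  slope={slope_wide:.2f}, r={r_wide:.3f}")
```

Both slopes come out as 2, while r drops from about 0.999 to about 0.707 as the spread grows.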

4
Correlation r = 1
5
(No Transcript)
6
Imperfect Correlation and Relationships
  • We rarely see perfect correlation.
  • Even when correlation is not perfect, we can draw a
    line to summarize the trend in the data points.
    This is the regression line.

7
Regression Line
  • Regression line: a straight line that describes
    how a response variable y changes as an
    explanatory variable x changes.
  • It can sometimes be used to predict the value of
    y for a given value of x.

8
Making Predictions
9
Where do we Draw the Line?
10
Minimize the sum of the distances between the
points and the line.
[Figure: distances from the points to the line,
labeled -0.25, 2, 2, -3.5, -0.25]
Square the distances.
11
(No Transcript)
12
The best-fitting line minimizes the sum of the
squared distances of every point in the
scatterplot from the regression line:
Minimize $\sum (y - \hat{y})^2$, where $\hat{y}$
is the value predicted by the line.
This line, the best-fitting line, is the one
that, compared to any other line you could plot
through the points, produces the smallest sum of
squared deviations.
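The least-squares idea can be sketched in a few lines of Python. The data and function names below are illustrative assumptions; the formulas for the intercept a and slope b are the standard closed-form least-squares solution, which the transcript itself does not spell out.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line y_hat = a + b*x that minimizes
    the sum of squared vertical distances to the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    return a, b

def sum_squared_deviations(xs, ys, a, b):
    """Sum of squared vertical distances from the points to y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
best = sum_squared_deviations(xs, ys, a, b)

# Any other line does worse; here the intercept is nudged by 0.1.
nudged = sum_squared_deviations(xs, ys, a + 0.1, b)

print(f"a={a:.2f}, b={b:.2f}, best={best:.2f}, nudged={nudged:.2f}")
```

For this data the fitted line is y = 2.2 + 0.6x, and shifting the line in any way increases the sum of squared deviations.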

13
(No Transcript)
14
  • The slope b is the change in y when x increases
    by 1.
  • The intercept a is the predicted value of y when
    x = 0.

15
Finding the equation of the regression line
  • Exercise 5.16 (Page 125)

16
Facts about least-squares regression line
  • Fact 1: It is a mathematical model for the data.
  • Fact 2: The distinction between explanatory and
    response variables is essential in regression.
  • Fact 3: There is a close connection between
    correlation and the slope of the least-squares
    line.
  • Fact 4: The least-squares regression line always
    passes through the point $(\bar{x}, \bar{y})$,
    where $\bar{x}$ is the mean of the x values and
    $\bar{y}$ is the mean of the y values.
  • Fact 5: The correlation r describes the strength
    of a straight-line relationship. In the
    regression setting, this description takes a
    specific form: the square of the correlation,
    $r^2$, is the fraction of the variation in the
    values of y that is explained by the
    least-squares regression of y on x.
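Facts 3 to 5 can be checked numerically. The sketch below uses made-up data and assumes the standard identity b = r · s_y / s_x for Fact 3 (the slides only say the connection is "close"); all variable names are illustrative.

```python
import math

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

b = sxy / sxx                    # least-squares slope
a = y_bar - b * x_bar            # least-squares intercept
r = sxy / math.sqrt(sxx * syy)   # correlation

# Fact 3 (assumed identity): slope = r * s_y / s_x
s_x = math.sqrt(sxx / (n - 1))
s_y = math.sqrt(syy / (n - 1))
slope_from_r = r * s_y / s_x

# Fact 4: the line passes through (x_bar, y_bar)
prediction_at_mean = a + b * x_bar

# Fact 5: r^2 is the fraction of the variation in y explained by the line
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
fraction_explained = 1 - sse / syy

print(f"b={b:.3f}, r*s_y/s_x={slope_from_r:.3f}")
print(f"prediction at x_bar={prediction_at_mean:.3f}, y_bar={y_bar:.3f}")
print(f"r^2={r ** 2:.3f}, fraction explained={fraction_explained:.3f}")
```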

17
(No Transcript)
18
Residual plots
  • A residual plot is a scatterplot of the
    regression residuals against the explanatory
    variable. Residual plots help us assess the fit
    of a regression line.
  • A residual is the difference between an observed
    value of the response variable and the value
    predicted by the regression line. That is,
  • residual = observed y - predicted y
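As a rough illustration, the residuals can be computed directly from a fitted line. The data and the helper `fit_line` below are assumptions for this sketch; the (x, residual) pairs it prints are what a residual plot would display. For a least-squares line the residuals always sum to zero, apart from rounding error.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y_hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)

# residual = observed y - predicted y
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# The points of the residual plot: residual against explanatory variable.
for x, res in zip(xs, residuals):
    print(f"x={x}, residual={res:+.2f}")

residual_sum = sum(residuals)
```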

19
(No Transcript)
20
(No Transcript)
21
(No Transcript)
22
Outliers and Influential Observations
  • An outlier is an observation that lies outside
    the overall pattern of the other observations.
  • An observation is influential for a statistical
    calculation if removing it would markedly change
    the result of the calculation.
  • Points that are outliers in the x direction of a
    scatterplot are often influential for the
    least-squares regression line. Influential
    observations can also be described as outliers.
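A small sketch of an influential observation, using made-up data (the points and the helper name are illustrative assumptions): adding a single point that is an outlier in the x direction changes the least-squares slope markedly, here even flipping its sign.

```python
def least_squares_slope(xs, ys):
    """Slope b of the least-squares line y_hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# One extra observation, far from the others in the x direction.
outlier_x, outlier_y = 15, 2

slope_without = least_squares_slope(xs, ys)
slope_with = least_squares_slope(xs + [outlier_x], ys + [outlier_y])

print(f"slope without the outlier: {slope_without:+.3f}")
print(f"slope with the outlier:    {slope_with:+.3f}")
```

Removing the point at x = 15 moves the slope from about -0.108 back to +0.600, which is exactly the "marked change" that makes an observation influential.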

23
Outliers and Influential Observations
24
(No Transcript)
25
Beware extrapolation
  • Extrapolation is the use of a regression line for
    prediction far outside the range of values of the
    explanatory variable x that you used to obtain
    the line. Such predictions are often not
    accurate.
  • Example
  • Suppose Angela was 1.20m tall on January 1st
    1975, and 1.40m tall on January 1st 1976. By
    extrapolation, estimate her height on January 1st
    1977.
  • By extrapolation, it could be estimated that by
    January 1st 1977 she would have grown another
    0.20m to be 1.60m tall. This however assumes that
    she continued to grow at the same rate. This must
    eventually become a false assumption, otherwise
    by January 1st 1980, she would be a giantess.
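The arithmetic in this example can be written out as a short Python sketch. The helper `line_through` is a hypothetical name introduced here; it simply builds the straight line through the two measured heights and evaluates it at later dates.

```python
def line_through(p1, p2):
    """Return a function giving y on the straight line through p1 and p2."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    return lambda x: y1 + slope * (x - x1)

# Angela's measured heights in metres, from the example above.
predict_height = line_through((1975, 1.20), (1976, 1.40))

height_1977 = predict_height(1977)  # one year past the data
height_1980 = predict_height(1980)  # four years past the data

print(f"1977: {height_1977:.2f} m")
print(f"1980: {height_1980:.2f} m")
```

The line gives 1.60 m for 1977 and 2.20 m for 1980. The second figure shows why predictions far outside the observed range should not be trusted.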

26
Lurking variable
  • A lurking variable is a variable that has an
    important effect on the relationship among the
    variables in a study but is not included among
    the variables studied.
  • Example: Studies of the relationship between
    treatment of heart disease and the patient's
    gender show that women are in general treated
    less aggressively than men with similar symptoms.
    Women are less likely to undergo bypass
    operations.
  • Question: Might this be discrimination? Answer:
    No. Be aware of the lurking variable: although
    half of heart disease victims are women, they are
    on average much older than male victims.

27
Association does not imply causation
Example: Sales of rum and the number of Methodist
ministers are positively correlated, but a large
number of ministers does not encourage rum
drinking. Is there a lurking variable that
influences both rum sales and Methodist ministers?
In the previous example, both the sales of rum
and the number of Methodist ministers were
correlated with the number of people in the U.S.
As the number of people increases, demand
increases for both Methodist ministers and rum.
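A toy Python sketch of this effect, with entirely invented numbers (not real population, clergy, or sales figures): both series are generated from the same hypothetical population values plus a small wobble, with no link between them other than population, yet their correlation is close to 1.

```python
import math

def correlation(xs, ys):
    """Correlation r between two paired lists of numbers."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented values for the lurking variable (population, arbitrary units).
population = [100, 150, 200, 250, 300]

# Each series depends only on population plus a small invented wobble.
ministers = [0.2 * p + d for p, d in zip(population, [1, -1, 0, 1, -1])]
rum_sales = [3 * p + d for p, d in zip(population, [-5, 5, 0, 5, -5])]

r = correlation(ministers, rum_sales)
print(f"correlation between ministers and rum sales: {r:.3f}")
```

Neither series enters the formula for the other, so the strong association here comes entirely from the shared dependence on population.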