Sections 2.3 and 2.4

Slides: 21
Provided by: math53

Transcript and Presenter's Notes


1
Sections 2.3 and 2.4
  • Regression Lines

2
Review of Lines
  • The equation y = a + bx describes a line.
  • a is the y-intercept, the value of y that goes
    with x = 0.
  • b is the slope of the line.
  • a + b is the value of y that goes with x = 1.
  • a + 2b is the value of y that goes with x = 2.
  • When the value of x goes up by one, the value of
    y goes up by b.
  • Plot a line by first plotting two points.
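As a small illustration of the review above, the sketch below evaluates y = a + bx at a few values of x. The intercept and slope are hypothetical values chosen only for the example, and `line_value` is an illustrative helper name, not something from the slides.

```python
def line_value(a, b, x):
    """Return y = a + b*x for intercept a and slope b."""
    return a + b * x

# Hypothetical intercept and slope, chosen only for illustration.
a, b = 3.0, 2.0

# Two points are enough to plot the line: use x = 0 and x = 1.
p0 = (0, line_value(a, b, 0))  # y equals a, the y-intercept
p1 = (1, line_value(a, b, 1))  # y equals a + b

# Each one-unit increase in x raises y by exactly b.
step = line_value(a, b, 2) - line_value(a, b, 1)

print(p0, p1, step)
```

Plotting p0 and p1 and joining them with a straight edge gives the line.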

3
Used Honda Civics
4
Can mileage be used to predict price?
  • Sketch the line.
  • Estimate the slope.
  • Estimate the y-intercept.
  • Give the equation of the line.
  • Estimate the price of a car with 100,000 miles.

5
Regression Lines
  • Can we agree on the best line?
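  • The least squares regression line is the line
    that best fits the data.
  • Goodness of fit is measured by the sum of the
    squared vertical distances from the data points
    to the line. The smaller this sum, the better
    the fit.
  • The regression line is given by y = a + bx, where
    b = r (s_y / s_x) and a = ȳ - b x̄. Here x̄ and ȳ
    are the means, s_x and s_y the standard
    deviations, and r the correlation.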
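The sketch below shows one way to compute a least squares line from summary statistics. It assumes the standard textbook formulas b = r (s_y / s_x) and a = ȳ - b x̄. The data set is hypothetical, and `correlation`, `regression_from_summary`, and `sum_squared_distances` are illustrative names.

```python
import statistics

def correlation(xs, ys):
    """Sample correlation r between two equal-length lists."""
    n = len(xs)
    x_mean, y_mean = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    cross = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return cross / ((n - 1) * s_x * s_y)

def regression_from_summary(x_mean, y_mean, s_x, s_y, r):
    """Intercept a and slope b from means, standard deviations, and r."""
    b = r * s_y / s_x
    a = y_mean - b * x_mean
    return a, b

def sum_squared_distances(xs, ys, a, b):
    """Sum of squared vertical distances from the points to y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, used only to exercise the formulas.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

a, b = regression_from_summary(
    statistics.mean(xs), statistics.mean(ys),
    statistics.stdev(xs), statistics.stdev(ys),
    correlation(xs, ys),
)
best = sum_squared_distances(xs, ys, a, b)

# Nudging the intercept or slope should never reduce the sum of squares.
nudged = [sum_squared_distances(xs, ys, a + da, b + db)
          for da, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]]
print(a, b, best, min(nudged))
```

Comparing `best` with the nudged lines is a quick check that the formulas do give the smallest sum of squared vertical distances.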

6
Back to Civics
  • Use the formulas to find the least squares
    regression line from the descriptive statistics
    for our data.
  • Describe what the slope and y-intercept mean in
    the vocabulary of the problem.
  • Does the y-intercept make sense in the context of
    the data?

7
Extrapolation
  • Extrapolation: using the regression line for
    prediction far outside the range of values of the
    explanatory variable used to obtain the line.
    Such predictions are often not accurate.
  • In regression, x = 0 is usually outside the range
    of the data, so the y-intercept often does not
    make sense in context.
  • Extrapolation would also occur if we tried to
    predict the price of a used Civic with 300,000
    miles.
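To see why extrapolation is risky, the sketch below plugs 300,000 miles into the first of the two price equations quoted on the Influential Observations slide (PRICE = 15589.6 - 0.0697811 MILEAGE). Treating that equation as the full-data fit is an assumption, and `predicted_price` is an illustrative name.

```python
def predicted_price(mileage):
    """Predicted price from the fitted line quoted later in the deck."""
    return 15589.6 - 0.0697811 * mileage

# 30,000 and 70,000 miles are the mileages the deck itself predicts for;
# 300,000 miles is the extrapolation example from this slide.
for miles in (30_000, 70_000, 300_000):
    print(miles, round(predicted_price(miles)))
```

At 300,000 miles the line predicts a negative price, which cannot be right for any real car.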

8
How successful is the regression in explaining
the response?
  • r^2: the proportion of the variation in the
    response variable that is explained by the
    regression line.

Observed Prices (stem-and-leaf, leaf unit = 100)
     1     7 | 3
     1     8 |
     2     9 | 9
     2    10 |
    (2)   11 | 99
     2    12 | 9
     1    13 | 9
Mean = 11379, S = 2347

Predicted Prices (stem-and-leaf, leaf unit = 100)
     1     7 | 2
     1     8 |
     1     9 |
     2    10 | 1
     2    11 |
    (2)   12 | 25
     2    13 | 00
Mean = 11379, S = 2295
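The two standard deviations reported with the stem-and-leaf displays (2347 for observed prices, 2295 for predicted prices) can be turned into r^2. The sketch below assumes the standard identity for a least squares line with an intercept: r^2 equals the variance of the predicted values divided by the variance of the observed values. The variable names are illustrative.

```python
# Standard deviations as reported with the two displays.
s_observed = 2347.0
s_predicted = 2295.0

# Variance is the square of the standard deviation, so the ratio of
# variances is the squared ratio of standard deviations.
r_squared = (s_predicted / s_observed) ** 2

print(round(r_squared, 3))  # → 0.956
```

Under that assumption, roughly 96% of the variation in price is explained by the regression on mileage.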
9
Residuals
  • The residual is the vertical distance from a data
    point to the line (positive if the point is above
    the line, negative if it is below).
  • Find the residual for the car with 43,903 miles.

10
How well does line fit data?
  • Look at a residual plot.
  • residual = observed y - predicted y
  • Plot the explanatory variable on the x-axis and
    the residuals on the y-axis.
  • If the regression captured the overall pattern,
    then there should be no pattern remaining in the
    residual plot.
  • How well does the least squares regression line
    fit the relationship between mileage and price?
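The sketch below computes the points of a residual plot, using residual = observed y - predicted y. The mileage and price figures are hypothetical (the deck's actual data table is not in the transcript), and `least_squares` and `residuals` are illustrative names.

```python
def least_squares(xs, ys):
    """Fit y = a + b*x by least squares and return (a, b)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    b = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
         / sum((x - x_mean) ** 2 for x in xs))
    a = y_mean - b * x_mean
    return a, b

def residuals(xs, ys, a, b):
    """Observed y minus predicted y for each data point."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Hypothetical data: mileage and price, both in thousands.
miles = [10, 25, 40, 55, 70, 85]
price = [15.0, 13.6, 12.9, 11.8, 10.4, 9.9]

a, b = least_squares(miles, price)
res = residuals(miles, price, a, b)

# Each (x, residual) pair is one point of the residual plot.
for x, e in zip(miles, res):
    print(x, round(e, 3))
```

For a least squares line with an intercept, the residuals always sum to zero. What matters in the plot is whether they show a leftover pattern, such as a curve.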

11
Outliers
  • An outlier is a data point that falls outside the
    overall pattern. Outliers are most obvious in the
    residual plot.
  • Are there any outliers in the previous example?

12
Influential Observations
  • An observation is influential if removing it
    markedly changes the calculated regression line.
  • Example on Web
  • Points that are separated from the rest of the
    data in the x direction of a scatterplot tend to
    be influential. (These points may not be outliers
    as defined on the previous slide.)
  • In the previous example, which point may be
    influential?

13
Influential Observations, Continued
  • The equations with and without this point are
      PRICE = 15589.6 - 0.0697811 MILEAGE
      PRICE = 15990.5 - 0.0786652 MILEAGE
  • How far apart are these?
  • We would predict the price of a car with 30,000
    miles to be 13496 vs. 13631.
  • We would predict the price of a car with 70,000
    miles to be 10705 vs. 10484.
  • Sketch the lines.
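The predictions on this slide can be checked directly from the two equations. The sketch below assumes the first equation is the fit with the influential point and the second is the fit without it, following the order of the slide's wording. The function names are illustrative.

```python
def price_with_point(mileage):
    """Fit that includes the possibly influential observation (assumed)."""
    return 15589.6 - 0.0697811 * mileage

def price_without_point(mileage):
    """Fit after removing that observation (assumed)."""
    return 15990.5 - 0.0786652 * mileage

for miles in (30_000, 70_000):
    with_pt = price_with_point(miles)
    without_pt = price_without_point(miles)
    print(miles, round(with_pt), round(without_pt), round(without_pt - with_pt))

# Mileage at which the two lines give the same predicted price.
crossover = (15990.5 - 15589.6) / (0.0786652 - 0.0697811)
print(round(crossover))
```

The two lines cross between 30,000 and 70,000 miles, so their predictions agree closely in the middle of that range and drift apart toward either end.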

14
Summary
  • Get the equation of the regression line.
  • If data are given, use a calculator or Minitab.
  • If summary statistics are given, use the formulas.
  • Interpret the slope and y-intercept in the context
    of the problem.
  • r^2 gives the proportion of variation in the
    response variable that is explained by the
    explanatory variable via the line.
  • Residual plots are used to check for outliers and
    for any pattern that is not picked up by the line.
  • Outliers vs. influential data points.

15
Year as an Explanatory Variable
16
Residual Plot
17
Fitting Quadratic
18
Example. SAT Math vs SAT Verbal
  • States' average SAT math scores plotted against
    states' average SAT verbal scores.
  • verbal = 34.2 + 0.940 math
  • r = 0.970 (r^2 = 0.941).
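The sketch below reads the slide's equation as verbal = 34.2 + 0.940 × math. The plus sign is an assumption, since the operator is missing from the transcript, but it is consistent with the positive correlation. The state average of 500 is a hypothetical input and `predicted_verbal` is an illustrative name.

```python
def predicted_verbal(math_avg):
    """Predicted state-average verbal score from state-average math score."""
    return 34.2 + 0.940 * math_avg

r = 0.970
r_squared = r ** 2

# Hypothetical state with an average math score of 500.
example = predicted_verbal(500)

print(round(r_squared, 3), round(example, 1))
```

Because the fit is to state averages, the high r^2 describes how well state averages line up, not how well an individual student's verbal score can be predicted.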

19
Example. SAT Math vs SAT Verbal
  • Describe what the slope and y-intercept mean in
    the vocabulary of the problem.
  • How well does the line fit the data?
  • How successful is the regression in explaining
    the response?
  • How well would this line predict individual
    verbal scores?

20
Words of Caution
  • Extrapolation
  • Using Averaged Data
  • Lurking Variables
  • Association Does Not Imply Causation
  • Example. Among elementary-school-aged children
    there is a strong correlation between shoe size
    and reading aptitude. What explains this
    correlation?