Chapters 8 - PowerPoint PPT Presentation

Provided by: amyfro

Transcript and Presenter's Notes


1
Chapters 8 & 9
  • Linear Regression & Regression Wisdom

2
Price of Homes Based on Size (in Square Feet)
Sold in Ames between Sep. 2004 and Oct. 2005
r = 0.8718945
3
Statistical Modeling
  • Statistical Model: An equation that fits the
    pattern between a response variable and possible
    explanatory variables, accounting for deviations
    from the model. (Simplest case: one quantitative
    response variable and one quantitative
    explanatory variable.)
  • Response Variable (Y): The quantitative outcome
    of a study.
  • Explanatory Variable (X): A quantitative variable
    that may explain or predict the response variable.
  • What is the best model for predicting weight
    (Y) from height (X)?
  • What is the best model for predicting blood
    pressure (Y) from age (X)?

4
Correlation and the Line
Price of Homes Based on Square Feet
Price = -90.2458 + 0.1598 SQFT, r = 0.8718945
5
Regression line
  • Explains how the response variable (y) changes in
    relation to the explanatory variable (x)
  • Use the line to predict value of y for a given
    value of x

6
Regression line
  • Need a mathematical formula
  • We want to predict y from x
  • The predicted values are called ŷ (y-hat).
  • The observed values are called y.

7
Which Line is Best?
  • What are some ways we can determine which model
    out of all the possible models is the best one?
  • What are some ways that we can numerically rank
    the different models (i.e., the different lines)?

8
Which Model is Best?
Price = -90.2458 + 0.1598 SQFT (red)
Price = -300 + 0.3 SQFT (blue)
Price = 0 + 0.1 SQFT (green)
9
Regression line
  • Putting a hat on it means we have predicted
    something from the model
  • Look at vertical distance
  • Amount of error in the regression line
  • The goal is to find the line so that these errors
    are minimized.

10
Least squares regression
  • Most commonly used regression line
  • Makes the sum of the squared errors as small as
    possible
  • Based on the statistics x̄, ȳ, sx, sy, and r
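
One way to make "sum of the squared errors" concrete is to score each candidate line from the "Which Model is Best?" slide against a few homes. The sketch below assumes five made-up (size, price) pairs, with price in thousands of dollars. These values are illustrative only and are not the Ames data.

```python
# Hypothetical (square feet, price in $1000s) pairs; NOT the Ames data.
homes = [(1000, 75), (1500, 140), (2000, 235), (2500, 300), (3000, 395)]

# Candidate lines from the slides as (intercept, slope).
candidates = {
    "red": (-90.2458, 0.1598),
    "blue": (-300.0, 0.3),
    "green": (0.0, 0.1),
}

def sum_squared_errors(intercept, slope, data):
    """Add up (observed - predicted)^2 over every data point."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in data)

sse = {name: sum_squared_errors(b0, b1, homes)
       for name, (b0, b1) in candidates.items()}
best = min(sse, key=sse.get)

for name, value in sse.items():
    print(f"{name}: SSE = {value:.1f}")
print("smallest SSE:", best)
```

On this made-up data the red line has the smallest sum of squared errors, which is the sense in which a least squares line is "best".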

11
Regression line equation
  • ŷ = b0 + b1·x
  • where b1 = r(sy / sx) and b0 = ȳ - b1·x̄

12
Regression line equation
  • b1 = slope of the line. For every unit increase in
    x, y changes by the amount of the slope.
  • Interpreting b1 (slope)
  • For every one unit increase in the explanatory
    variable, there will be, on average, a b1 unit(s)
    increase/decrease in the response variable.
  • For example: For every one square foot increase
    in size, on average, there will be a $159.80
    increase in home price.
  • MEMORIZE THIS!!!!!

13
Regression line equation
  • b0 = y-intercept of the line. The value of y when
    x = 0.
  • Interpreting b0 (y-intercept)
  • When the explanatory variable = 0, on average,
    the value of the response variable = b0.
  • For example: When the sq. ft. of a home is 0,
    the price of the home will be -$90,245.80 on
    average.
  • MEMORIZE THIS!!!!!
  • BE CAREFUL. The interpretation of the intercept
    does not always make sense. When interpreting,
    be sure to mention if the interpretation does not
    make sense.
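
The home price interpretations can be checked with a little arithmetic. This is a minimal sketch that assumes price in the fitted equation is measured in thousands of dollars (inferred from the $159.80 and -$90,245.80 figures). The function and constant names are illustrative.

```python
# Coefficients from the slides; price assumed to be in thousands of dollars.
B0 = -90.2458   # intercept
B1 = 0.1598     # slope per square foot

def predicted_price_dollars(sqft):
    """Predicted price in dollars for a home of the given size."""
    return (B0 + B1 * sqft) * 1000

slope_in_dollars = B1 * 1000                        # change per extra square foot
intercept_in_dollars = predicted_price_dollars(0)   # prediction at x = 0

print(f"slope: {slope_in_dollars:,.2f} dollars per sq. ft.")
print(f"intercept: {intercept_in_dollars:,.2f} dollars")
```

The negative intercept is a reminder that x = 0 is far outside the sizes of real homes, so this interpretation does not make sense in context.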

14
Example: Kobe's Shooting
  • I visited CNNSI's website and checked out some of
    Kobe Bryant's personal scoring numbers. I looked
    at the number of times he shot the ball and his
    point total for each game so far this year.
  • Let's come up with the regression equation for
    this data.

15
Kobe's Shooting
r = 0.7293762
Form: Linear
Strength: Moderate to Strong
Direction: Positive
16
Calculating the regression line
  • Remember that ŷ = b0 + b1·x, where b1 = r(sy / sx)
    and b0 = ȳ - b1·x̄
  • Our explanatory variable (x) is the number of
    shots
  • Our response variable (y) is the number of points
  • So the five numbers needed are x̄, ȳ, sx, sy, and r

17
Calculating the Regression Line
  • Find the slope: b1 = r(sy / sx)
  • Find the intercept: b0 = ȳ - b1·x̄
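
The two steps can be sketched as a small function that takes the five summary statistics. The function name and the input values below are hypothetical, chosen only to keep the arithmetic easy to follow. They are not the statistics for Kobe's data.

```python
def least_squares_line(x_bar, y_bar, s_x, s_y, r):
    """Return (b0, b1) for y-hat = b0 + b1 * x from five summary statistics."""
    b1 = r * s_y / s_x        # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1

# Hypothetical summary statistics, for illustration only.
b0, b1 = least_squares_line(x_bar=10.0, y_bar=20.0, s_x=2.0, s_y=4.0, r=0.5)
print(f"y-hat = {b0} + {b1} * x")
```

The slope has to be found first because the intercept formula uses it.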

18
Calculating the regression line
  • Don't forget to write the equation.
  • DON'T FORGET TO WRITE THE EQUATION IN THE CONTEXT
    OF THE PROBLEM.
  • Predicted points = 3.436 + 1.19 × (number of shots)

19
Interpretation
  • How would we interpret b1?
  • For each additional shot Kobe Bryant takes, on
    average we would expect him to score 1.19 more
    points.
  • How would we interpret b0?
  • If Kobe Bryant did not take a single shot, then on
    average we would expect him to score 3.436 points.

20
Prediction
  • Use the regression equation to predict y from x.
  • Ex. What is the predicted number of points when
    Kobe shoots 30 times in a game?
  • Ex. What is the predicted number of points when
    Kobe shoots 55 times in a game?
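
Both predictions come from plugging x into the fitted line. A minimal sketch, assuming the rounded slope (1.19) and intercept (3.436) from the Interpretation slide; results from unrounded coefficients could differ slightly.

```python
# Rounded coefficients from the Interpretation slide.
B0 = 3.436   # intercept (points)
B1 = 1.19    # slope (points per shot)

def predicted_points(shots):
    """Predicted points for a game with the given number of shots."""
    return B0 + B1 * shots

for shots in (30, 55):
    print(f"{shots} shots -> about {predicted_points(shots):.1f} points")
```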

21
Plotting the regression line
  • Find two points on the line
  • Ex. x = 30, ŷ ≈ 39 and x = 55, ŷ ≈ 69
  • If you are plotting by hand, it is OK to round
    values
  • Plot these two points on the graph
  • Connect the points
  • This is the regression line

22
Plotting the Regression Line
23
Properties of regression line
  • r is related to the value of b1
  • r has the same sign as b1
  • One standard deviation change in x corresponds to
    r times one standard deviation change in y
  • The regression line always goes through the point
    (x̄, ȳ)
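
These properties can be checked numerically. The sketch below uses a small made-up data set (not from the slides) and computes r, the slope, and the intercept from their definitions.

```python
from statistics import mean, stdev

# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

x_bar, y_bar = mean(xs), mean(ys)
s_x, s_y = stdev(xs), stdev(ys)   # sample standard deviations
n = len(xs)

# Correlation: sum of products of deviations, scaled by (n - 1) * s_x * s_y.
r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)

b1 = r * s_y / s_x        # slope
b0 = y_bar - b1 * x_bar   # intercept

print("r and b1 share a sign:", (r > 0) == (b1 > 0))
print("line passes through (x-bar, y-bar):", abs((b0 + b1 * x_bar) - y_bar) < 1e-12)
print("change in y-hat for a one-SD change in x:", b1 * s_x)
print("r times one SD of y:", r * s_y)
```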

24
Properties of regression line
  • r²
  • Percent of variation in y that is explained by
    the least squares regression of y on x
  • The higher the value of r², the more the
    regression line explains the changes that occur
    in the y variable
  • The higher the value of r², the better the
    regression line fits the data

25
Properties of regression line
  • r²
  • 0 ≤ r² ≤ 1 since -1 ≤ r ≤ 1
  • Interpretation of r²
  • r² is the percent of variation in the response
    variable that can be explained by the least
    squares regression of the response variable on
    the explanatory variable.
  • For Kobe's example: 53.1% of the variability in
    the number of points Kobe Bryant scores in a game
    can be explained by the LS regression of points
    per game on number of shots per game.
  • MEMORIZE THIS!!!!
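
Since r² is just the correlation squared, it can be computed directly from any reported r. A minimal sketch; the function name is illustrative, and the third value is an arbitrary negative correlation included to show that r² is never negative.

```python
def r_squared(r):
    """Square the correlation to get the fraction of variation explained."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return r * r

# Correlations quoted on the scatterplot slides, plus one negative value.
for label, r in [("home prices", 0.8718945), ("Kobe", 0.7293762), ("negative r", -0.64)]:
    print(f"{label}: r = {r}, r^2 is about {r_squared(r):.0%}")
```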

26
Residuals
  • Amount of variation in y not taken into account
    by regression line
  • Formula: residual = observed - predicted = y - ŷ
  • There is a residual for each data point
  • Mean of the residuals is zero
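
The claim that the residuals average to zero can be checked on any least squares fit. The sketch below uses made-up data (not from the slides) and computes the slope as the sum of products of deviations divided by the sum of squared x deviations, which is algebraically the same as r(sy / sx).

```python
from statistics import mean

# Hypothetical data, for illustration only.
xs = [2.0, 4.0, 6.0, 8.0, 10.0]
ys = [3.0, 7.0, 8.0, 13.0, 14.0]

x_bar, y_bar = mean(xs), mean(ys)

# Least squares slope and intercept computed directly from the data.
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar

# One residual per data point: observed y minus predicted y-hat.
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

print("residuals:", [round(e, 3) for e in residuals])
print("mean of residuals:", round(mean(residuals), 10))
```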

27
Calculating Residuals: Kobe
  • Find the residual for the point (46,81)
  • First find the predicted number of points for a
    game with 46 shots
  • Now find the residual

28
Calculating Residuals: Kobe
  • Find the residual for the point (26,35)
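
Both residuals follow the same two steps: predict, then subtract. A minimal sketch, assuming the rounded coefficients from the Interpretation slide (intercept 3.436, slope 1.19), so the results are approximate.

```python
# Rounded coefficients from the Interpretation slide.
B0 = 3.436
B1 = 1.19

def residual(shots, points):
    """Observed points minus the points predicted from the number of shots."""
    predicted = B0 + B1 * shots
    return points - predicted

for shots, points in [(46, 81), (26, 35)]:
    print(f"({shots},{points}): residual is about {residual(shots, points):.2f}")
```

A positive residual means the observed point lies above the regression line.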

29
Residual Plots
  • Scatterplot of Residuals
  • Explanatory variable on horizontal axis
  • Residuals on vertical axis
  • Horizontal line at residual = 0

30
Residual Plots
31
Interpreting Residual Plots
  • Is there a curved pattern?
  • This could mean a non-linear relationship
  • Is there increasing spread about the line as x
    increases?
  • This could mean non-constant variance
  • Is there decreasing spread about the line as x
    increases?
  • This could mean non-constant variance

32
Interpreting Residual Plots
  • Points with large residuals
  • These are probably outliers in the y direction
  • These will pull the regression line in the
    direction of the outlier (up or down)
  • Extreme points in the x direction
  • These are called influential points
  • They do not always show up in residuals because
    the residual could be small
  • Removing the outlier could markedly change the
    regression line

33
Reading JMP Data
  • Bivariate Fit of BAC by # of Beers

34
Reading JMP Data
  • Linear Fit
  • BAC = -0.011654 + 0.0180112 × # of Beers

This is the regression line for the data. The slope
is 0.0180112. The y-intercept is -0.011654. The
response variable is BAC. The explanatory
variable is # of Beers.
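
Once the line is read off the JMP output, it can be used for prediction like any other regression line. A minimal sketch; the function name and the input of 5 beers are illustrative choices, not values from the slides.

```python
# Coefficients from the JMP Linear Fit output.
INTERCEPT = -0.011654
SLOPE = 0.0180112

def predicted_bac(beers):
    """Predicted blood alcohol content after the given number of beers."""
    return INTERCEPT + SLOPE * beers

print(f"predicted BAC after 5 beers: {predicted_bac(5):.4f}")
```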
35
Reading JMP Data
  • Summary of Fit
  • RSquare 0.803536
  • RSquare Adj 0.788424
  • Root Mean Square Error 0.020920
  • Mean of Response 0.076000
  • Observations (or Sum Wgts) 15

This gives some summary of the data.
RSquare = r² = (r)² = (correlation)²
Root Mean Square Error = s
Mean of Response = ȳ
Observations = n
36
Reading JMP Data
  • Analysis of Variance
    Source     DF   Sum of Squares   Mean Square   F Ratio
    Model       1   0.02327041       0.023270      53.1700
    Error      13   0.00568959       0.000438      Prob > F
    C. Total   14   0.02896000

This is called the ANOVA Table. This is another
way to analyze the data. We aren't going to
discuss this in this class.
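
Although the ANOVA table is not covered in this class, its entries tie back to the Summary of Fit numbers. The sketch below assumes the standard definitions (RSquare as model sum of squares over total, root mean square error as the square root of the error mean square) and recomputes those values from the table.

```python
from math import sqrt

# Values copied from the ANOVA table.
ss_model, df_model = 0.02327041, 1
ss_error, df_error = 0.00568959, 13
ss_total, df_total = 0.02896000, 14

r_square = ss_model / ss_total                                     # RSquare
r_square_adj = 1 - (ss_error / df_error) / (ss_total / df_total)  # RSquare Adj
rmse = sqrt(ss_error / df_error)                                   # Root Mean Square Error
f_ratio = (ss_model / df_model) / (ss_error / df_error)            # F Ratio

print(f"RSquare      {r_square:.6f}")
print(f"RSquare Adj  {r_square_adj:.6f}")
print(f"RMSE         {rmse:.6f}")
print(f"F Ratio      {f_ratio:.2f}")
```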
37
Reading JMP Data
  • Parameter Estimates
    Term        Estimate    Std Error   t Ratio   Prob>|t|
    Intercept   -0.011654   0.013179    -0.88     0.3926
    beers       0.0180112   0.002470    7.29

This tells you what the y-intercept and slope
are. It also gives the standard error for each
of the estimates. If you were to form confidence
intervals for the parameter estimates, you would
need these values. We won't discuss that in this
class.
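
For readers curious where the t Ratio column comes from, each value is the estimate divided by its standard error. A minimal sketch using the numbers in the table; the dictionary layout is just an illustrative choice.

```python
# (estimate, standard error) pairs from the Parameter Estimates table.
estimates = {
    "Intercept": (-0.011654, 0.013179),
    "beers": (0.0180112, 0.002470),
}

# t ratio = estimate / standard error
t_ratios = {term: est / se for term, (est, se) in estimates.items()}

for term, t in t_ratios.items():
    print(f"{term}: t ratio is about {t:.2f}")
```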
38
Reading JMP Data
Here is your residual plot. Check it to see if
there are any problems with linearity of the data
and constant variance.
39
Example
40
Example
  • Age at first word vs. Gesell score.
  • Scatterplot: Weak negative linear relationship
    between the two variables. Possible outliers at
    (42,57) and (17,121).
  • Regression: r = -0.64, r² = 40.96%.

41
Example
42
Example
  • Age at first word vs. Gesell score.
  • Prediction
  • When x = 17
  • When x = 42
  • Residuals
  • point (17,121)
  • point (42,57)

43
Example
44
Example
  • Residual Plot
  • Outliers at x = 17 and x = 42
  • Small residual for x = 42
  • Could be influential
  • Remove (42,57) from data.
  • Regression line changes markedly.
  • r = -0.33, r² = 10.89%.
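
The effect of an influential point can be imitated with a toy data set. In the sketch below, the five "ordinary" points are made up and are not the Gesell data; only the extreme point (42,57) is borrowed from the slide. The line is fitted with and without the extreme point.

```python
def fit_line(points):
    """Least squares (intercept, slope) for a list of (x, y) pairs."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in points)
             / sum((x - x_bar) ** 2 for x, _ in points))
    return y_bar - slope * x_bar, slope

# Hypothetical data: five ordinary points plus one far out in the x direction.
ordinary = [(10, 100), (12, 104), (14, 98), (16, 102), (18, 99)]
extreme = (42, 57)

b0_all, b1_all = fit_line(ordinary + [extreme])
b0_without, b1_without = fit_line(ordinary)

# Residual of the extreme point under the line fitted to all six points.
extreme_residual = extreme[1] - (b0_all + b1_all * extreme[0])
largest_other = max(abs(y - (b0_all + b1_all * x)) for x, y in ordinary)

print(f"slope with the extreme point:    {b1_all:.2f}")
print(f"slope without the extreme point: {b1_without:.2f}")
print(f"residual of the extreme point:   {extreme_residual:.2f}")
print(f"largest other residual (abs):    {largest_other:.2f}")
```

In this toy data the extreme point drags the line toward itself, so its own residual is small even though removing it changes the slope markedly.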

45
Example
46
Outliers--What should you do?
  • Make sure data points have been recorded
    correctly
  • Collect more data
  • Remove the outlier
  • Examine collection techniques
  • Examine outside influences

47
Cautions about regression
  • Linear relationship only
  • Not resistant
  • Using averaged data
  • Makes relationship appear stronger
  • Taking average removes variation
  • Extrapolation
  • Predicting y when x value is outside the original
    data

48
Cautions about Regression
  • Extrapolation
  • Remember the data about home prices vs. the
    amount of sq. footage in the home.
  • The regression line we found based on data
    collected from homes with 900 to 3,000 sq. ft. is
  • This would mean that if my home has no square
    footage, then I pay -$75,470.
  • If you must extrapolate, at least don't expect
    that your prediction will come true.
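
A simple guard against accidental extrapolation is to compare x with the range of the original data before trusting a prediction. A minimal sketch; the function name is hypothetical, and the 900 to 3,000 sq. ft. range is the one quoted on the slide.

```python
def is_extrapolation(x, x_min, x_max):
    """True when x lies outside the range of x values used to fit the line."""
    return x < x_min or x > x_max

# Range of home sizes quoted on the slide: 900 to 3,000 sq. ft.
SQFT_MIN, SQFT_MAX = 900, 3000

for sqft in (0, 1500, 5000):
    label = "extrapolation" if is_extrapolation(sqft, SQFT_MIN, SQFT_MAX) else "within the data"
    print(f"{sqft} sq. ft.: {label}")
```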

49
Cautions about regression
  • ASSOCIATION IS NOT CAUSATION!
  • Strong association between explanatory and
    response variables does not mean that the
    explanatory variable causes the response
    variable.

50
Proving Causation
  • Experiment
  • Change the values of x and control for lurking
    variables.
  • Not all problems can be solved by experiment
  • Smoking causes lung cancer.
  • Living near power lines causes leukemia.

51
Proving Causation
  • Lurking variable
  • Important effect on variables, but not included
    in study.
  • Example
  • Do taller people make more money? What do you
    think a lurking variable might be?

52
Proving Causation
  • Proving smoking causes lung cancer
  • Association is strong
  • Association is consistent
  • High doses are associated with stronger response
  • Cause precedes the effect in time
  • Cause is plausible

53
Review
Number of Calories by Sugar Content (g) for 13
Cereals
Let's calculate the formula for this regression
line
54
Review
  • Let's review all the formulas we need

55
Review
  • Here are all the numbers you need

56
Review
  • First, calculate sx and sy

57
Review
  • Second, calculate r
  • Third, calculate b1

58
Review
  • Fourth, calculate x̄ and ȳ
  • Fifth, calculate b0 (we're almost done!!)

59
Review
  • Last, but definitely the most important, WRITE
    DOWN THE EQUATION IN THE CONTEXT OF THE PROBLEM

60
Review
  • Interpret b1
  • For every one gram increase in sugar, the number
    of calories will increase by 3.36, on average.
  • Interpret r²
  • About 55% of the variability in the number of
    calories in cereal can be explained by the LS
    regression of calories on sugar content.