Regression - PowerPoint PPT Presentation

About This Presentation
Title:

Regression

Description:

Social Relationships and Health. Strong correlation between lack of social relationships and illness. Does lack of social relationships cause people to become ill? ... – PowerPoint PPT presentation

Slides: 33
Provided by: jamesmaysm2
Learn more at: http://www.sjsu.edu
Tags: regression

Transcript and Presenter's Notes

Title: Regression


1
Chapter 5
  • Regression

2
Objectives of Regression
  • To describe the change in Y per unit X
  • To predict the average level of Y at a given
    level of X

3
Returning Birds Example
  • Plot data first to see if relation can be
    described by straight line (important!)
  • Illustrative data from Exercise 4.4
  • Y = adult birds joining colony
  • X = percent of birds returning, prior year

4
If data can be described by straight line
  • describe relationship with equation
  • Y = (intercept) + (slope)(X)
  • May also be written
  • Y = (slope)(X) + (intercept)

Intercept = where line crosses Y axis
Slope = steepness (angle) of line
5
Linear Regression
  • Algebraic line: every point falls on the line
  • exact y = intercept + (slope)(X)
  • Statistical line: scatter cloud suggests a
    linear trend
  • predicted y = intercept + (slope)(X)

6
Regression Equation
  • ŷ = a + bx, where
  • ŷ (y-hat) is the predicted value of Y
  • a is the intercept
  • b is the slope
  • x is a value for X
  • Determine a and b for best fitting line

The TI calculators reverse a and b!
7
What Line Fits Best?
  • If we try to draw the line by eye, different
    people will draw different lines
  • We need a method to draw the best line
  • This method is called least squares

8
The least squares regression line
  • Each point has a residual
  • Residual = observed y - predicted y
  • = vertical distance of point from prediction line

The least squares line minimizes the sum of the
squared residuals
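As a minimal sketch of the least squares idea, the Python below computes residuals and the sum of squared residuals for the GDP and life expectancy data from the Western Europe case study in this deck. The two "by eye" lines are hypothetical guesses introduced only for comparison, and the function names are illustrative.

```python
# Per capita GDP (x) and life expectancy (y) from the case study slides.
gdp = [21.4, 23.2, 20.0, 22.7, 20.8, 18.6, 21.5, 22.0, 23.8, 21.2]
life = [77.48, 77.53, 77.32, 78.63, 77.17, 76.39, 78.51, 78.15, 78.99, 77.37]


def residuals(a, b, xs, ys):
    """Residual = observed y - predicted y, where predicted y = a + b*x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def sum_squared_residuals(a, b, xs, ys):
    """The quantity that the least squares line minimizes."""
    return sum(e ** 2 for e in residuals(a, b, xs, ys))


def least_squares(xs, ys):
    """Return (a, b) for the least squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


a_ls, b_ls = least_squares(gdp, life)
ssr_ls = sum_squared_residuals(a_ls, b_ls, gdp, life)

# Hypothetical lines "drawn by eye" (not from the slides).
by_eye_lines = [(70.0, 0.36), (67.0, 0.50)]
ssr_by_eye = [sum_squared_residuals(a, b, gdp, life) for a, b in by_eye_lines]

print(f"least squares: a={a_ls:.3f}, b={b_ls:.3f}, SSR={ssr_ls:.3f}")
for (a, b), ssr in zip(by_eye_lines, ssr_by_eye):
    print(f"by eye:        a={a:.3f}, b={b:.3f}, SSR={ssr:.3f}")
```

Any line other than the least squares line should give a larger sum of squared residuals on the same data, which is what the comparison illustrates.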
9
Calculating Least Squares Regression Coefficients
  • Formula (next slide)
  • Technology
  • TI-30XIIS
  • Two variable Applet
  • Other

10
Formulas
  • b = slope coefficient: b = r (sy / sx)
  • a = intercept coefficient: a = ȳ - b x̄

where sx and sy are the standard deviations of
the two variables, r is their correlation, and
x̄ (x-bar) and ȳ (y-bar) are their means
11
Technology Calculator
BEWARE! TI calculators label the slope and
intercept backwards!
12
Regression Line
  • For the bird data
  • a = 31.9343
  • b = -0.3040
  • The linear regression equation is
  • ŷ = 31.9343 - 0.3040x

The slope (-0.3040) represents the average change
in Y per unit X
13
Use of Regression for Prediction
  • Suppose an individual colony has 60% of its birds returning
    (x = 60). What is the predicted number of new birds
    for this colony?
  • Answer
  • ŷ = a + bx = 31.9343 - (0.3040)(60) = 13.69
  • Interpretation: the regression model predicts
    13.69 new birds (ŷ) for a colony with x = 60.
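The prediction step can be sketched in a few lines of Python. The function name `predict` is illustrative; the coefficients are the ones given for the bird data.

```python
def predict(x, a=31.9343, b=-0.3040):
    """Predicted y (y-hat) from the regression line y-hat = a + b*x."""
    return a + b * x


new_birds = predict(60)
print(round(new_birds, 2))  # 13.69

# The slope is the change in predicted y per one-unit change in x.
change_per_unit = predict(61) - predict(60)
print(round(change_per_unit, 4))  # -0.304
```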

14
Prediction via Regression Line: Number of new
birds and Percent returning
When X = 60, the regression model predicts Y =
13.69
15
Case Study
Per Capita Gross Domestic Product and Average
Life Expectancy for Countries in Western Europe
16
Regression Calculation: Case Study
Country          Per Capita GDP (x)   Life Expectancy (y)
Austria          21.4                 77.48
Belgium          23.2                 77.53
Finland          20.0                 77.32
France           22.7                 78.63
Germany          20.8                 77.17
Ireland          18.6                 76.39
Italy            21.5                 78.51
Netherlands      22.0                 78.15
Switzerland      23.8                 78.99
United Kingdom   21.2                 77.37
17
Life Expectancy and GDP (Europe)
18
Regression Calculation by Hand (Life Expectancy
Study)
Calculations
ŷ = 68.716 + 0.420x
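The by-hand calculation can be reproduced with a short Python sketch. It assumes the standard formulas b = r(sy/sx) and a = ȳ - b x̄, with sample standard deviations (dividing by n - 1); the helper names are illustrative, not from the slides.

```python
import math

# Per capita GDP (x) and life expectancy (y) from the case study table.
gdp = [21.4, 23.2, 20.0, 22.7, 20.8, 18.6, 21.5, 22.0, 23.8, 21.2]
life = [77.48, 77.53, 77.32, 78.63, 77.17, 76.39, 78.51, 78.15, 78.99, 77.37]


def mean(values):
    return sum(values) / len(values)


def std_dev(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """r = sum of products of standardized values, divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    sx, sy = std_dev(xs), std_dev(ys)
    products = [((x - x_bar) / sx) * ((y - y_bar) / sy) for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)


r = correlation(gdp, life)
b = r * std_dev(life) / std_dev(gdp)  # slope
a = mean(life) - b * mean(gdp)        # intercept

print(f"r = {r:.3f}, b = {b:.3f}, a = {a:.3f}")
```

The slope should agree with the 0.420 on the slide. The intercept may differ from 68.716 in the last digit, which would be consistent with the by-hand work rounding b before computing a.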
19
BPS/3e Two Variable Applet
20
Applet Data Entry
21
Applet Calculations
22
Applet Scatterplot
23
Applet least squares line
24
Interpretation: Life Expectancy Case Study
  • Model: ŷ = 68.716 + (0.420)X
  • Slope: each one-unit increase in GDP is associated with a
    0.420-year increase in predicted life expectancy
  • Prediction example: What is the predicted life expectancy
    in a country with a GDP of 20.0?
  • Answer: ŷ = 68.716 + (0.420)(20.0) = 77.12

25
Coefficient of Determination (R²) (Fact 4 on p. 111)
  • Coefficient of determination (R²)
  • Quantifies the fraction of the variation in Y mathematically
    explained by X
  • Examples
  • r = 1: R² = 1, so the regression line explains all (100%)
    of the variation in Y
  • r = .7: R² = .49, so the regression line explains almost
    half (49%) of the variation in Y
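To make "fraction of variation explained" concrete, the sketch below computes R² two ways for the GDP and life expectancy case study data: as the square of r, and as 1 minus the ratio of residual variation to total variation in Y. The variable names are illustrative; the two results should agree for a least squares line.

```python
import math

# Per capita GDP (x) and life expectancy (y) from the case study table.
gdp = [21.4, 23.2, 20.0, 22.7, 20.8, 18.6, 21.5, 22.0, 23.8, 21.2]
life = [77.48, 77.53, 77.32, 78.63, 77.17, 76.39, 78.51, 78.15, 78.99, 77.37]

n = len(gdp)
x_bar = sum(gdp) / n
y_bar = sum(life) / n

sxx = sum((x - x_bar) ** 2 for x in gdp)
syy = sum((y - y_bar) ** 2 for y in life)  # total variation in Y
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(gdp, life))

# Least squares line and correlation.
b = sxy / sxx
a = y_bar - b * x_bar
r = sxy / math.sqrt(sxx * syy)

# Variation left over after fitting the line (sum of squared residuals).
ss_resid = sum((y - (a + b * x)) ** 2 for x, y in zip(gdp, life))

r_squared_from_r = r ** 2
r_squared_from_fit = 1 - ss_resid / syy

print(f"r = {r:.3f}")
print(f"R^2 from r:        {r_squared_from_r:.3f}")
print(f"R^2 from the fit:  {r_squared_from_fit:.3f}")
```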

26
We are NOT going to cover the analysis of
residual plots (pp. 113-116)
27
Outliers and Influential Points
  • An outlier is an observation that lies far from
    the regression line
  • Outliers in the y direction have large residuals
  • Outliers in the x direction are often influential
  • Removal of an influential point would markedly
    change the regression and correlation values
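A small sketch of what "influential" means in practice: fit the least squares line to the GDP and life expectancy case study data, then refit after adding one extra point far out in the x direction. The extra point (x = 40.0, y = 76.0) is hypothetical, invented purely for illustration, and is not part of the case study.

```python
# Per capita GDP (x) and life expectancy (y) from the case study table.
gdp = [21.4, 23.2, 20.0, 22.7, 20.8, 18.6, 21.5, 22.0, 23.8, 21.2]
life = [77.48, 77.53, 77.32, 78.63, 77.17, 76.39, 78.51, 78.15, 78.99, 77.37]


def least_squares(xs, ys):
    """Return (a, b) for the least squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


# Hypothetical outlier in the x direction (NOT real data).
outlier_x, outlier_y = 40.0, 76.0

a_without, b_without = least_squares(gdp, life)
a_with, b_with = least_squares(gdp + [outlier_x], life + [outlier_y])

print(f"without outlier: y-hat = {a_without:.3f} + {b_without:.3f}x")
print(f"with outlier:    y-hat = {a_with:.3f} + {b_with:.3f}x")
```

In this sketch a single added point is enough to change the sign of the slope, which is why such points deserve a close look before the fitted line is interpreted.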

28
Outliers: Case Study
Gesell Adaptive Score and Age at First Word
r² = 11%
r² = 41%
29
Cautions About Correlation and Regression
  • Describe only linear relationships
  • Are influenced by outliers
  • Cannot be used to predict beyond the range of X
    (do not extrapolate)
  • Beware of lurking variables (variables other than
    X and Y)
  • Association does not always equal causation!

30
Do not extrapolate (Sarah's height)
  • Sarah's height is plotted against her age
  • Can you predict her height at age 42 months?
  • Can you predict her height at age 30 years (360
    months)?

31
Do not extrapolate (Sarah's height)
  • Regression equation: ŷ = 71.95 + .383(X)
  • At age 42 months: ŷ = 71.95 + .383(42) = 88
  • (Reasonable)
  • At age 360 months:
  • ŷ = 71.95 + .383(360) = 209.8
  • (With height in centimeters, that's 209.8 cm, nearly 7 feet tall!)
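One way to guard against extrapolation in code is to refuse predictions outside the range of X used to fit the line. The sketch below uses the regression equation from this slide. The observed age range of 36 to 60 months is an assumption made for illustration (the slides do not state it), and the function name is hypothetical.

```python
A, B = 71.95, 0.383  # intercept and slope from the slide

# ASSUMPTION: ages (in months) covered by the data; not stated in the slides.
ASSUMED_AGE_RANGE = (36, 60)


def predict_height(age_months, observed_range=None):
    """Predict height from age; refuse to extrapolate if a range is given."""
    if observed_range is not None:
        low, high = observed_range
        if not low <= age_months <= high:
            raise ValueError(
                f"age {age_months} is outside the observed range {low}-{high}; "
                "do not extrapolate"
            )
    return A + B * age_months


print(round(predict_height(42, ASSUMED_AGE_RANGE), 1))  # 88.0
print(round(predict_height(360), 1))                    # 209.8 (unchecked)

try:
    predict_height(360, ASSUMED_AGE_RANGE)
except ValueError as err:
    print(err)
```

Raising an error is only one design choice; returning the prediction with a warning would also work when an out-of-range estimate is still wanted for inspection.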

32
Caution: Correlation does not always mean
causation
  • Even very strong correlations may not correspond
    to a causal relationship between x and y
  • (Beware of the lurking variable!)

33
Caution: Correlation Does Not Imply Causation
Social Relationships and Health
House, J., Landis, K., and Umberson, D. "Social
Relationships and Health," Science, Vol. 241
(1988), pp. 540-545.
  • Strong correlation between lack of social
    relationships and illness
  • Does lack of social relationships cause people to
    become ill?
  • Maybe(?)
  • but perhaps unhealthy people are less likely to
    establish and maintain social relationships
    (reversed relationship)
  • Or, does some other factor (lurking variable)
    predispose people both to have lower social
    activity and to become ill?

34
Criteria for causation (skip)
  • Do not rely on statistical evidence alone for
    causal inference
  • Here are some criteria to consider when trying to
    determine causality
  • Strong relationships more likely to be causal
  • Properly executed experiments needed (chapter 8)
  • Replication under varying conditions needed
  • Dose-response relationship found
  • Cause precedes effect in time
  • Plausible biological explanations needed