Title: Regression
1. Chapter 5
2. Objectives of Regression
- To describe the change in Y per unit X
- To predict the average level of Y at a given level of X
3. Returning Birds Example
- Plot data first to see if the relation can be described by a straight line (important!)
- Illustrative data from Exercise 4.4
- Y = adult birds joining colony
- X = percent of birds returning, prior year
4. If data can be described by a straight line
- describe the relationship with an equation
- Y = (intercept) + (slope)(X)
- May also be written
- Y = (slope)(X) + (intercept)
Intercept: where the line crosses the Y axis
Slope: steepness of the line (change in Y per unit X)
5. Linear Regression
- Algebraic line: every point falls on the line
- exact y = intercept + (slope)(X)
- Statistical line: scatter cloud suggests a linear trend
- predicted y = intercept + (slope)(X)
6. Regression Equation
- ŷ = a + bx, where
- ŷ (y-hat) is the predicted value of Y
- a is the intercept
- b is the slope
- x is a value for X
- Determine a and b for the best-fitting line
The TI calculators reverse a and b!
7. What Line Fits Best?
- If we try to draw the line by eye, different people will draw different lines
- We need a method to draw the best line
- This method is called least squares
8. The least squares regression line
- Each point has a residual
- Residual = observed y - predicted y
- = vertical distance of the point from the prediction line
The least squares line minimizes the sum of the squared residuals
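To make "minimizes the sum of the squared residuals" concrete, here is a minimal sketch in Python. It tries many candidate lines on a coarse grid and keeps the one with the smallest sum of squared residuals. The toy data, the grid limits, and the "by eye" line are all hypothetical and are not taken from the slides. In practice the coefficients come from a formula or from technology, not from a grid search.

```python
# Hypothetical toy data (not from the slides).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

def sum_squared_residuals(a, b):
    """Sum of (observed y - predicted y)^2 for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Candidate lines: intercept a in [-1, 1] in steps of 0.05,
# slope b in [1, 3] in steps of 0.01 (an arbitrary, assumed grid).
candidates = [(i / 20, j / 100) for i in range(-20, 21) for j in range(100, 301)]
best_a, best_b = min(candidates, key=lambda ab: sum_squared_residuals(*ab))

# A line someone might draw "by eye" (also hypothetical).
eye_a, eye_b = 0.5, 1.8

print("best grid line:", best_a, best_b, sum_squared_residuals(best_a, best_b))
print("by-eye line:   ", eye_a, eye_b, sum_squared_residuals(eye_a, eye_b))
```

The by-eye line looks plausible on a plot, but its sum of squared residuals is larger than that of the best grid line, which is the sense in which least squares picks the "best" line.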
9. Calculating Least Squares Regression Coefficients
- Formula (next slide)
- Technology
- TI-30XIIS
- Two variable Applet
- Other
10. Formulas
- b = r(sy / sx) (slope coefficient)
- a = ȳ - b x̄ (intercept coefficient)
where x̄ and ȳ are the means and sx and sy are the standard deviations of the two variables, and r is their correlation
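The formulas can be sketched in Python as follows, assuming they are the standard ones: slope b = r(sy / sx) and intercept a = (mean of y) - b(mean of x). The data are hypothetical toy values and the function names are illustrative only.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r: average product of standardized x and y values."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

def slope_intercept(xs, ys):
    """Return (a, b) using b = r * sy / sx and a = y_bar - b * x_bar."""
    r = correlation(xs, ys)
    b = r * stdev(ys) / stdev(xs)
    a = mean(ys) - b * mean(xs)
    return a, b

# Hypothetical toy data (not from the slides).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = slope_intercept(xs, ys)
print(f"intercept a = {a:.3f}, slope b = {b:.3f}")
```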
11. Technology: Calculator
BEWARE! TI calculators label the slope and intercept backwards!
12. Regression Line
- For the bird data
- a = 31.9343
- b = -0.3040
- The linear regression equation is
- ŷ = 31.9343 - 0.3040x
The slope (-0.3040) represents the average change in Y per unit X
13. Use of Regression for Prediction
- Suppose an individual colony has 60% returning (x = 60). What is the predicted number of new birds for this colony?
- Answer
- ŷ = a + bx = 31.9343 - (0.3040)(60) = 13.69
- Interpretation: the regression model predicts 13.69 new birds (ŷ) for a colony with x = 60.
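The same prediction can be checked with a few lines of Python. This is a minimal sketch that hard-codes the slide's coefficients. The function name is illustrative only.

```python
# Coefficients for the bird data, as given on the slides.
A_BIRDS = 31.9343   # intercept
B_BIRDS = -0.3040   # slope

def predict_new_birds(percent_returning):
    """Predicted number of new adult birds: y-hat = a + b*x."""
    return A_BIRDS + B_BIRDS * percent_returning

y_hat = predict_new_birds(60)
print(round(y_hat, 2))  # 13.69
```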
14. Prediction via Regression Line: Number of New Birds and Percent Returning
When X = 60, the regression model predicts Y = 13.69
15. Case Study
Per Capita Gross Domestic Product and Average Life Expectancy for Countries in Western Europe
16. Regression Calculation: Case Study
Country          Per Capita GDP (x)   Life Expectancy (y)
Austria          21.4                 77.48
Belgium          23.2                 77.53
Finland          20.0                 77.32
France           22.7                 78.63
Germany          20.8                 77.17
Ireland          18.6                 76.39
Italy            21.5                 78.51
Netherlands      22.0                 78.15
Switzerland      23.8                 78.99
United Kingdom   21.2                 77.37
17. Life Expectancy and GDP (Europe)
18. Regression Calculation by Hand (Life Expectancy Study)
Calculations
ŷ = 68.716 + 0.420x
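The hand calculation can be reproduced from the case-study table. This sketch uses the column-sums form of the least squares formulas, b = (nΣxy - ΣxΣy) / (nΣx² - (Σx)²) and a = ȳ - b x̄. That form is an assumption about how the slide's hand calculation was organized; it gives the same line as any other least squares formula. The result should agree with the slide's equation up to rounding in the last digit.

```python
# Case-study data from the table: per capita GDP (x) and life expectancy (y).
gdp = [21.4, 23.2, 20.0, 22.7, 20.8, 18.6, 21.5, 22.0, 23.8, 21.2]
life = [77.48, 77.53, 77.32, 78.63, 77.17, 76.39, 78.51, 78.15, 78.99, 77.37]

n = len(gdp)
sum_x = sum(gdp)
sum_y = sum(life)
sum_xy = sum(x * y for x, y in zip(gdp, life))
sum_xx = sum(x * x for x in gdp)

# Slope and intercept from the column sums.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
a = sum_y / n - b * sum_x / n

print(f"y-hat = {a:.3f} + {b:.3f}x")
```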
19. BPS/3e Two Variable Applet
20. Applet: Data Entry
21. Applet: Calculations
22. Applet: Scatterplot
23. Applet: Least Squares Line
24. Interpretation: Life Expectancy Case Study
- Model: ŷ = 68.716 + (0.420)X
- Slope: for each one-unit increase in GDP, the model predicts a 0.420-year increase in life expectancy
- Prediction example: What is the predicted life expectancy in a country with a GDP of 20.0?
- Answer: ŷ = 68.716 + (0.420)(20.0) = 77.12
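The slope interpretation can be illustrated by comparing two predictions one GDP unit apart. The difference between them is the slope. This is a minimal sketch using the slide's coefficients. The function name is illustrative only.

```python
# Coefficients from the life expectancy model on the slide.
A_LIFE = 68.716   # intercept
B_LIFE = 0.420    # slope

def predict_life_expectancy(gdp):
    """Predicted life expectancy: y-hat = a + b*x."""
    return A_LIFE + B_LIFE * gdp

at_20 = predict_life_expectancy(20.0)
at_21 = predict_life_expectancy(21.0)

print(round(at_20, 2))           # prediction at GDP = 20.0
print(round(at_21 - at_20, 3))   # change per one-unit increase in GDP
```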
25. Coefficient of Determination (R²) (Fact 4 on p. 111)
- Coefficient of determination (R²)
- Quantifies the fraction of the variation in Y mathematically explained by X
- Examples
- r = 1: R² = 1, regression line explains all (100%) of the variation in Y
- r = 0.7: R² = 0.49, regression line explains almost half (49%) of the variation in Y
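As a sketch of what "fraction of the variation explained" means, the code below computes R² two ways for the case-study data: as the square of the correlation r, and as 1 - SSE/SST. Here SSE is the sum of squared residuals around the least squares line and SST is the total squared variation of y around its mean. The variable names are illustrative. For simple linear regression the two quantities should agree.

```python
from statistics import mean

# Case-study data: per capita GDP (x) and life expectancy (y).
gdp = [21.4, 23.2, 20.0, 22.7, 20.8, 18.6, 21.5, 22.0, 23.8, 21.2]
life = [77.48, 77.53, 77.32, 78.63, 77.17, 76.39, 78.51, 78.15, 78.99, 77.37]

x_bar, y_bar = mean(gdp), mean(life)
sxx = sum((x - x_bar) ** 2 for x in gdp)
syy = sum((y - y_bar) ** 2 for y in life)          # SST: total variation in y
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(gdp, life))

r = sxy / (sxx * syy) ** 0.5                        # correlation
b = sxy / sxx                                       # least squares slope
a = y_bar - b * x_bar                               # least squares intercept

# SSE: variation in y left unexplained by the line.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(gdp, life))
r_squared_from_fit = 1 - sse / syy

print(f"r = {r:.3f}, r^2 = {r * r:.3f}, 1 - SSE/SST = {r_squared_from_fit:.3f}")
```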
26. We are NOT going to cover the analysis of residual plots (pp. 113-116)
27. Outliers and Influential Points
- An outlier is an observation that lies far from the regression line
- Outliers in the y direction have large residuals
- Outliers in the x direction are often influential
- Removal of an influential point would markedly change the regression and correlation values
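A small sketch of influence: fit the least squares line with and without a single point that is far out in the x direction, and compare the slopes. The data are hypothetical, chosen only to make the effect visible. The helper name is illustrative.

```python
def fit(xs, ys):
    """Least squares intercept and slope for y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b

# Hypothetical data: five points with an upward trend, plus one point
# (x = 15) far from the others in the x direction.
xs = [1, 2, 3, 4, 5, 15]
ys = [2, 4, 5, 4, 6, 3]

a_all, b_all = fit(xs, ys)                 # all six points
a_trim, b_trim = fit(xs[:-1], ys[:-1])     # x-outlier removed

print(f"with the point:    slope = {b_all:.3f}")
print(f"without the point: slope = {b_trim:.3f}")
```

With the far-out point included the fitted slope is close to zero. Without it the slope is clearly positive, so this single observation is influential.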
28. Outliers: Case Study
Gesell Adaptive Score and Age at First Word
r² = 11%
r² = 41%
29. Cautions About Correlation and Regression
- Describe only linear relationships
- Are influenced by outliers
- Cannot be used to predict beyond the range of X (do not extrapolate)
- Beware of lurking variables (variables other than X and Y)
- Association does not always equal causation!
30. Do not extrapolate (Sarah's height)
- Sarah's height is plotted against her age
- Can you predict her height at age 42 months?
- Can you predict her height at age 30 years (360 months)?
31. Do not extrapolate (Sarah's height)
- Regression equation: ŷ = 71.95 + 0.383(X)
- At age 42 months: ŷ = 71.95 + 0.383(42) = 88
- (Reasonable)
- At age 360 months
- ŷ = 71.95 + 0.383(360) = 209.8
- (Unreasonable: if height is in centimeters, that is nearly 7 feet tall!)
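One way to guard against extrapolation in practice is to flag any prediction whose x falls outside the range of the data used to fit the line. The sketch below uses the slide's coefficients. The observed age range of 36 to 60 months is a placeholder assumption, since the slides do not state it, and the function name is illustrative.

```python
# Coefficients from the slide's regression of height on age (months).
A_HEIGHT = 71.95
B_HEIGHT = 0.383

# ASSUMED range of ages in the fitted data; not given on the slides.
OBSERVED_AGES = (36, 60)

def predict_height(age_months, observed_range=OBSERVED_AGES):
    """Return (predicted height, True if the age is outside the observed range)."""
    low, high = observed_range
    y_hat = A_HEIGHT + B_HEIGHT * age_months
    is_extrapolation = not (low <= age_months <= high)
    return y_hat, is_extrapolation

for age in (42, 360):
    y_hat, flagged = predict_height(age)
    note = "EXTRAPOLATION, do not trust" if flagged else "within observed range"
    print(f"age {age} months: predicted height {y_hat:.1f} ({note})")
```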
32. Caution: Correlation does not always mean causation
- Even very strong correlations may not correspond to a causal relationship between x and y
- (Beware of the lurking variable!)
33. Caution: Correlation Does Not Imply Causation
Social Relationships and Health
House, J., Landis, K., and Umberson, D. "Social Relationships and Health," Science, Vol. 241 (1988), pp. 540-545.
- Strong correlation between lack of social relationships and illness
- Does lack of social relationships cause people to become ill?
- Maybe(?)
- But perhaps unhealthy people are less likely to establish and maintain social relationships (reversed relationship)
- Or does some other factor (lurking variable) predispose people both to have lower social activity and to become ill?
34. Criteria for causation (skip)
- Do not rely on statistical evidence alone for causal inference
- Here are some criteria to consider when trying to determine causality
- Strong relationships more likely to be causal
- Properly executed experiments needed (chapter 8)
- Replication under varying conditions needed
- Dose-response relationship found
- Cause precedes effect in time
- Plausible biological explanations needed