Title: Regression
1Regression
- Review of Simple linear regression
- Introduction to multiple linear regression
ESM 206A 10 March 2009
2Three major purposes of linear regression
- To describe the linear relationship between X and Y
- To determine how much of the variation (uncertainty) in Y can be explained by the linear relationship with X, and how much of this variation remains unexplained
- To predict new values of Y from new values of X
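The three purposes map onto three numbers from a least-squares fit: the slope and intercept (describe), R-squared (explain), and a fitted value at a new X (predict). A minimal sketch in plain Python follows; the data and the function names `fit_line` and `predict` are made up for illustration and are not from the lecture or from JMP.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = b0 + b1*x. Returns (b0, b1, r2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                 # slope: describes the linear relationship
    b0 = y_bar - b1 * x_bar        # intercept
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_resid = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    r2 = 1 - ss_resid / ss_total   # proportion of variation in y explained
    return b0, b1, r2


def predict(b0, b1, x_new):
    """Predicted y for a new x value."""
    return b0 + b1 * x_new


# Made-up illustrative data (not from the lecture)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1, r2 = fit_line(x, y)
y_new = predict(b0, b1, 6.0)
print("intercept:", round(b0, 3), "slope:", round(b1, 3), "R2:", round(r2, 4))
print("predicted y at x = 6:", round(y_new, 3))
```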
3(Figure: plot with axes X and Y)
4(Figure: plot with axes Y and X)
5Results from JMP
6Results from JMP
- H0: the regression slope equals zero
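The null hypothesis of a zero slope is usually tested with a t statistic: the estimated slope divided by its standard error, on n - 2 degrees of freedom. The sketch below computes that statistic in plain Python on made-up data; the function name `slope_t_statistic` is hypothetical, and no P-value is computed here (the statistic would be compared with a t distribution, which is what a package such as JMP reports).

```python
import math


def slope_t_statistic(x, y):
    """Return (b1, se_b1, t, df) for testing H0: slope = 0 in y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    ss_resid = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    df = n - 2                      # residual degrees of freedom
    ms_resid = ss_resid / df        # residual mean square
    se_b1 = math.sqrt(ms_resid / sxx)
    return b1, se_b1, b1 / se_b1, df


# Made-up illustrative data (not from the lecture)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b1, se_b1, t, df = slope_t_statistic(x, y)
print("slope:", round(b1, 3), "SE:", round(se_b1, 4), "t:", round(t, 2), "df:", df)
```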
7Results from JMP
8Assumptions of simple linear regression
Model: $Y_i = \beta_0 + \beta_1 X_i + e_i$
- Normality: the population of Y-values and the error terms ($e_i$) are normally distributed for each level of the predictor variable $X_i$
- (Test for normality, and transform the data if needed)
10Assumptions of simple linear regression
Model: $Y_i = \beta_0 + \beta_1 X_i + e_i$
- Normality: the population of Y-values and the error terms ($e_i$) are normally distributed for each level of the predictor variable $X_i$
- (Test for normality, and transform the data if needed)
- Homogeneity of variance: the population of Y-values and the error terms ($e_i$) have the same variance for each $X_i$:
$\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \dots = \sigma_e^2$ (for $i = 1$ to $n$)
(Test for homogeneity of variance with a graph of residuals vs. predicted values)
11Do you have outliers that are influencing your results?
If so, should you remove them?
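One common way to put a number on "influencing your results" is Cook's distance, which combines each point's residual with its leverage. The lecture does not say which diagnostic was used, so the following is only a sketch of that technique for simple regression, on made-up data with one deliberately odd point. The cut-off of 1 is a rule of thumb, not a formal test.

```python
def cooks_distance(x, y):
    """Cook's distance for each observation in a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    p = 2                                    # parameters estimated: intercept and slope
    ms_resid = sum(e * e for e in resid) / (n - p)
    # Leverage: how far each x value lies from the mean of x
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return [e * e * h / (p * ms_resid * (1 - h) ** 2)
            for e, h in zip(resid, leverage)]


# Made-up data: the last point has an extreme x value and lies off the trend
x = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 2.0]

d = cooks_distance(x, y)
for xi, yi, di in zip(x, y, d):
    flag = "  <- check this point" if di > 1 else ""
    print(xi, yi, round(di, 3), flag)
```

Whether a flagged point should be removed is still a judgment call; a typical approach is to refit without it and report how much the conclusions change.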
12Assumptions of simple linear regression
Model: $Y_i = \beta_0 + \beta_1 X_i + e_i$
- Normality: the population of Y-values and the error terms ($e_i$) are normally distributed for each level of the predictor variable $X_i$
- (Test for normality, e.g., using box plots, and transform the data if needed)
- Homogeneity of variance: the population of Y-values and the error terms ($e_i$) have the same variance for each $X_i$:
$\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \dots = \sigma_e^2$ (for $i = 1$ to $n$)
(Test for homogeneity of variance with a graph of residuals vs. predicted values)
- Independence: the population of Y-values and the error terms ($e_i$) are independent of each other, i.e., the Y-values for any $X_i$ do not influence the Y-values of any other $X_i$.
(Test independence with a graph of residuals vs. predicted values)
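The residuals-vs-predicted graph needs two columns: the fitted values and the residuals. A minimal sketch of how they could be computed in plain Python is given below, on made-up data whose scatter grows with X. The `spread_ratio` helper is a hypothetical, crude numeric stand-in for eyeballing the graph (it compares the mean squared residual in the upper and lower halves of the fitted values); it is not a formal test and not something the lecture prescribes.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def residuals_and_fitted(x, y):
    """Return (residuals, fitted values), the two axes of the diagnostic graph."""
    b0, b1 = fit_line(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    return resid, fitted


def spread_ratio(resid, fitted):
    """Mean squared residual in the upper half of fitted values over the lower half."""
    ordered = [r for _, r in sorted(zip(fitted, resid))]
    half = len(ordered) // 2
    low = sum(r * r for r in ordered[:half]) / half
    high = sum(r * r for r in ordered[half:]) / (len(ordered) - half)
    return high / low


# Made-up data in which the scatter increases with x
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.1, 1.9, 3.2, 3.8, 5.5, 5.2, 8.0, 6.8]

resid, fitted = residuals_and_fitted(x, y)
ratio = spread_ratio(resid, fitted)
for f, r in zip(fitted, resid):
    print(round(f, 2), round(r, 2))
print("spread ratio (upper half / lower half):", round(ratio, 1))
```

A ratio far from 1, or a fan or curve in the plotted points, suggests the equal-variance assumption deserves a closer look, for example after transforming the data.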
14Multiple regression
- Linear model with multiple predictor variables
- When all the predictor variables are continuous: multiple regression model
- When all are categorical: ?
16Example
- Question: which aspects of habitat and human activity affect the biodiversity and abundance of organisms? (An important aim of modern conservation biology.)
Loyn (1987)
- What characteristics of forest habitat were related to the abundance of birds?
- 56 forest patches in southern Australia
17
- Patch area (ha)
- No. of years since isolated by clearing (yrs)
- Distance from nearest patch (km)
- Distance to nearest larger patch (km)
- Index of cow grazing intensity (1-5)
- Mean altitude (m)
18Correlation matrix
              Log10 dist  Log10 L dist  Log10 area  Grazing  Altitude  Years
Log10 dist         1.000
Log10 L dist       0.604         1.000
Log10 area         0.302         0.382       1.000
Grazing           -0.143        -0.034      -0.599    1.000
Altitude          -0.219        -0.274       0.275   -0.407     1.000
Years             -0.020         0.161      -0.278    0.636    -0.233  1.000
19Assumptions
- No outlier data points or influential values
- Response variable not skewed
- Some heterogeneity of spread of residuals
21Results
Model: $(\text{bird abundance})_i = \beta_0 + \beta_1(\log_{10}\text{area})_i + \beta_2(\log_{10}\text{dist})_i + \beta_3(\log_{10}\text{Ldist})_i + \beta_4(\text{grazing})_i + \beta_5(\text{altitude})_i + \beta_6(\text{years})_i + e_i$
- Additive model with no interactions (multiplicative effects) between predictor variables, although such interactions are possible (even likely, so hold on and see below)
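An additive model like this is fitted by least squares: add a column of ones for the intercept, then solve the normal equations for the partial regression slopes. The sketch below does this in plain Python with a small Gaussian-elimination solver. It uses two made-up predictors rather than the six in the bird model, and the data are invented and noiseless so the known coefficients are recovered; the function names `solve` and `fit_multiple` are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a * coef = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        known = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - known) / m[r][r]
    return coef


def fit_multiple(predictors, y):
    """Least-squares fit. predictors: one row of predictor values per observation.
    Returns [b0, b1, ..., bp] with the intercept first."""
    rows = [[1.0] + list(row) for row in predictors]  # prepend intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Made-up, noiseless data generated from y = 1 + 2*x1 - 3*x2
predictors = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [1, 3]]
y = [1, 3, -2, 0, 2, -6]

coef = fit_multiple(predictors, y)
print([round(c, 6) for c in coef])
```

Each slope is a partial slope: the change in the response per unit change in that predictor with the other predictors held constant.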
22Results from JMP
- $R^2 = 0.685$: what does this mean?
- Adjusted $R^2 = 0.609$
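R-squared is the proportion of the total variation in the response that the fitted model explains; the adjusted version penalises it for the number of predictors. A minimal sketch of both definitions follows, using small made-up numbers rather than the JMP output; the function names are hypothetical.

```python
def r_squared(y, fitted):
    """Proportion of variation in y explained by the fitted values."""
    y_bar = sum(y) / len(y)
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return 1 - ss_resid / ss_total


def adjusted_r_squared(r2, n, p):
    """Adjust R-squared for p predictors fitted to n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Made-up observed and fitted values (not from the lecture)
y = [1.0, 2.0, 3.0, 4.0]
fitted = [1.5, 1.5, 3.5, 3.5]

r2 = r_squared(y, fitted)
r2_adj = adjusted_r_squared(r2, n=len(y), p=1)
print("R2:", round(r2, 3), "adjusted R2:", round(r2_adj, 3))
```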
23Results from JMP
- H0: all partial regression slopes equal zero
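The overall test of this null hypothesis is an F-ratio, which can be written in terms of R-squared and the degrees of freedom. The sketch below assumes the values given elsewhere in these slides (R-squared of 0.685, 56 forest patches, 6 predictors); the function name `overall_f` is hypothetical, and the P-value, which requires the F distribution, is left to the statistics package.

```python
def overall_f(r2, n, p):
    """F-ratio for H0: all p partial regression slopes equal zero."""
    df_model = p
    df_resid = n - p - 1
    f = (r2 / df_model) / ((1 - r2) / df_resid)
    return f, df_model, df_resid


# Assumed inputs: R2 = 0.685, n = 56 patches, p = 6 predictors
f, df_model, df_resid = overall_f(0.685, 56, 6)
print("F =", round(f, 2), "on", df_model, "and", df_resid, "degrees of freedom")
```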
24Next step
- Now fit a second model to investigate interactions between predictor variables
- A model with 6 predictor variables is unwieldy, so we simplify the model first by omitting those predictors that contributed little to the original model
25Log10 dist, Log10 Ldist, and altitude: lose 'em!
26Model and Results
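- Model: $(\text{bird abundance})_i = \beta_0 + \beta_1(\log_{10}\text{area})_i + \beta_2(\text{grazing})_i + \beta_3(\text{years})_i + \beta_4(\log_{10}\text{area} \times \text{grazing})_i + \beta_5(\log_{10}\text{area} \times \text{years})_i + \beta_6(\text{grazing} \times \text{years})_i + \beta_7(\log_{10}\text{area} \times \text{grazing} \times \text{years})_i + e_i$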
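In practice each interaction term is just a new column formed by multiplying the predictors involved. The sketch below builds one row of such a design matrix, in the order of the coefficients in the model; it assumes the sixth term is the grazing by years product (the remaining two-way interaction), and the function name and input values are made up.

```python
def interaction_row(log_area, grazing, years):
    """One design-matrix row for the three-predictor model with all interactions."""
    return [
        1.0,                           # intercept
        log_area,                      # log10 area
        grazing,                       # grazing
        years,                         # years since isolation
        log_area * grazing,            # log10 area x grazing
        log_area * years,              # log10 area x years
        grazing * years,               # grazing x years (assumed sixth term)
        log_area * grazing * years,    # three-way interaction
    ]


# Made-up values for a single forest patch
row = interaction_row(2.0, 3.0, 10.0)
print(row)
```

Stacking one such row per patch gives the matrix that a least-squares routine fits, exactly as for the additive model.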
27Interpreting interaction terms
- The log10 area x grazing term indicates how much the effect of grazing on bird abundance depends on log10 area.
- This interaction is significant, so let's look at the effects of grazing on bird abundance for different values of log10 area.
- We choose mean log10 area (0.932) ± one standard deviation (0.120, 1.744). Because the three-way interaction was not significant, we simply set years since isolation to its mean (33.25). We then compute the simple slopes of bird abundance against grazing for these log10 area values at mean years since isolation.
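A simple slope follows from collecting every term of the interaction model that contains grazing: the slope of abundance on grazing is $\beta_2 + \beta_4(\log_{10}\text{area}) + \beta_6(\text{years}) + \beta_7(\log_{10}\text{area} \times \text{years})$, assuming $\beta_6$ multiplies the grazing by years product. The sketch below evaluates that expression at the three log10 area values and the mean years given above. The coefficients are hypothetical placeholders, not the fitted JMP estimates, so only the mechanics carry over.

```python
def grazing_slope(coef, log_area, years):
    """Slope of bird abundance on grazing at fixed log10 area and years.
    coef is [b0, b1, ..., b7] in the order of the interaction model."""
    return (coef[2]
            + coef[4] * log_area
            + coef[6] * years
            + coef[7] * log_area * years)


# Hypothetical coefficients for illustration only (not the fitted values)
hypothetical_coef = [0.0, 0.0, -2.0, 0.0, 1.0, 0.0, 0.0, 0.0]

mean_years = 33.25                   # mean years since isolation
log_areas = (0.120, 0.932, 1.744)    # mean log10 area minus, at, and plus one SD

slopes = [grazing_slope(hypothetical_coef, a, mean_years) for a in log_areas]
for a, s in zip(log_areas, slopes):
    print("log10 area", a, "-> slope on grazing", round(s, 3))
```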
28Interpreting interaction terms
- Which means: the negative effect of cow grazing on bird abundance is stronger in small fragments, and there is no relationship between bird abundance and grazing in large fragments.
29Assumptions of multiple regression
- Same as simple regression, plus:
- Predictor variables must be uncorrelated with each other. Multicollinearity is critical!!
- Number of observations must exceed the number of predictor variables
30Correlation matrix
              Log10 dist  Log10 L dist  Log10 area  Grazing  Altitude  Years
Log10 dist         1.000
Log10 L dist       0.604         1.000
Log10 area         0.302         0.382       1.000
Grazing           -0.143        -0.034      -0.599    1.000
Altitude          -0.219        -0.274       0.275   -0.407     1.000
Years             -0.020         0.161      -0.278    0.636    -0.233  1.000
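A quick screen for multicollinearity is to scan the correlation matrix for the strongest pairwise correlations among predictors. The sketch below does this for the values in the matrix above, entered as a lower triangle. The row and column placement of each value is an assumption about the table layout (in particular, 0.275 is taken as the altitude by log10 area correlation), and the 0.5 threshold and function name are hypothetical choices for illustration.

```python
names = ["Log10 dist", "Log10 L dist", "Log10 area", "Grazing", "Altitude", "Years"]

# Lower triangle without the diagonal: row i holds correlations with names[0..i-1]
lower = [
    [],
    [0.604],
    [0.302, 0.382],
    [-0.143, -0.034, -0.599],
    [-0.219, -0.274, 0.275, -0.407],
    [-0.020, 0.161, -0.278, 0.636, -0.233],
]


def strongest_pairs(names, lower, threshold):
    """Predictor pairs with |r| at or above threshold, strongest first."""
    pairs = []
    for i, row in enumerate(lower):
        for j, r in enumerate(row):
            if abs(r) >= threshold:
                pairs.append((names[i], names[j], r))
    return sorted(pairs, key=lambda pair: -abs(pair[2]))


# Hypothetical screening threshold
flagged = strongest_pairs(names, lower, threshold=0.5)
for first, second, r in flagged:
    print(first, "vs", second, ":", r)
```

Pairwise correlations only catch collinearity between two predictors at a time; diagnostics such as tolerance or variance inflation factors are usually checked as well.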