Title: Regression
1Regression
1. Simplest case: least squares regression
Fit a line to data, to model that data
2. Test hypotheses about the parameters of
the fitted model
3. Understand assumptions of the model
4. Describe diagnostic tests to evaluate the fit
of the data to the model
5. Explain how to use the model to make predictions
- ESM 206A
- 25 February 2009
2Regression
6. Learn other models: logistic regression,
probit regression, multiple regression,
non-linear regression, robust regression,
quantile regression
7. Model selection: how to choose an appropriate
subset of predictor variables
8. How to compare the relative fit of different
models to the same data set
3Basic idea of regression
4Linear regression
- State a hypothesis about cause and effect:
the value of the X variable causes, either
directly or indirectly, the value of the Y variable
- In some cases the cause and effect is
straightforward: the area of rocky reef influences the
number of lobsters, but the number of lobsters does not influence the
area of rocky reef
- Other cases are not so straightforward
- Do predators control the abundance of the
prey, or does the number of prey control the number of
predators?
5Linear regression
- Once a decision is made about the direction of
the cause and effect, the next step is to describe the relationship
as a mathematical function
- Y = f(X)
- We apply the function f to each value of variable
X (the input) to generate the corresponding value of Y (the
output)
- Many interesting and complex functions can
describe the relationship between 2 variables, but the
simplest one is that Y is a linear function of X
- Y = β0 + β1X
- This equation describes the graph of a straight line
6Most basic form
[Figure: Number of lobsters per trap (y-axis, 0 to 16) against an x-axis running from 0 to 600, with the line Y = β0 + β1X]
7Y = β0 + β1X
- Has 2 parameters, β0 and β1, which are the intercept and the slope of the line
- β0, the intercept, is the predicted value from the equation when X = 0
8Y = β0 + β1X
9Y = β0 + β1X
- β1, the slope, measures the change in the Y
variable for each unit change in the X variable
- The slope therefore is a rate, measured in units
of ΔY/ΔX
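To make the meaning of the two parameters concrete, here is a minimal Python sketch that evaluates a linear function at a few X values. The parameter values and the X values are made up for illustration and are not taken from the lobster data; the names predict, beta0 and beta1 are likewise hypothetical.

```python
def predict(x, beta0, beta1):
    """Return the predicted Y for a given X under Y = beta0 + beta1 * X."""
    return beta0 + beta1 * x

# Hypothetical parameter values, chosen only for illustration.
beta0 = 2.0   # intercept
beta1 = 0.02  # slope

xs = [0, 100, 200, 300]
ys = [predict(x, beta0, beta1) for x in xs]

# The intercept is the prediction at X = 0.
intercept_check = predict(0, beta0, beta1)

# The slope is the change in Y per unit change in X,
# measured between any two points on the line.
slope_check = (ys[2] - ys[1]) / (xs[2] - xs[1])

print(ys)
print(intercept_check, slope_check)
```

Any pair of points on the line gives the same ratio of change in Y to change in X, which is why the slope can be read as a rate.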
10
- Nothing says that nature has to obey a linear
equation
- Many economic, ecological, and social
relationships are inherently non-linear
- The linear model is the simplest starting place for
fitting functions to data
- Even complex, non-linear functions may be
approximately linear over a limited range of the X variable. If we
restrict our conclusions to that range of X, a linear model may be a valid
approximation of the function.
11
- Interpolation within the limits of our data may
be acceptably accurate, even though the linear model (green line) does
not describe the true functional relationship between Y and X (the
black curve)
- Extrapolation will become increasingly
inaccurate as the forecasts move farther away from the range of collected data
- A very important assumption is that the
relationship between X and Y (or transformations of these variables)
is linear.
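The difference between interpolation and extrapolation can be sketched numerically. The Python example below assumes a hypothetical saturating curve as the "true" relationship, samples it over a limited range of X, fits a straight line with the standard least squares formulas, and compares the line with the curve inside and outside the sampled range. The curve, the sampled range and the function names are all assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of Y = b0 + b1 * X; returns (b0, b1)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1

def true_curve(x):
    """Hypothetical non-linear relationship (a saturating curve)."""
    return 16.0 * x / (200.0 + x)

# Sample only a limited range of X, then fit a straight line to it.
xs = list(range(100, 301, 20))
ys = [true_curve(x) for x in xs]
b0, b1 = fit_line(xs, ys)

def abs_error(x):
    """Absolute gap between the fitted line and the true curve at x."""
    return abs((b0 + b1 * x) - true_curve(x))

inside = abs_error(200)  # interpolation, within the sampled range
near = abs_error(400)    # mild extrapolation
far = abs_error(600)     # stronger extrapolation
print(inside, near, far)
```

With this particular curve the error is small inside the sampled range and grows as the forecast moves away from it; a different true function would give different numbers but the same qualitative pattern is the concern.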
12
[Figure: Number of lobsters per trap (y-axis, 0 to 16) against an x-axis running from 0 to 600]
13Fitting data to a linear model
- The data for a regression analysis consist of a
series of paired observations
- Each observation includes an X value (Xi) and a
corresponding Y value (Yi) that both have been measured for the same
replicate.
14Fitting data to a linear model
- But most data sets exhibit more variation than
this: a single variable rarely will account for most of the variation in the
data, so the data points will fall within a fuzzy band rather than on a sharp line.
- The bigger the σ², the more noise, or error,
there will be around the regression line
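A small simulation can illustrate how the amount of noise widens the band of points around a line. The Python sketch below assumes that each Y value is a straight-line prediction plus a normally distributed error with standard deviation sigma; that error model, the parameter values and the function names are assumptions for illustration, not something stated on the slides.

```python
import random

def simulate(xs, beta0, beta1, sigma, rng):
    """Generate Y values as a straight line plus normally distributed noise."""
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]

def mean_squared_deviation(xs, ys, beta0, beta1):
    """Average squared vertical distance of the points from the given line."""
    return sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(1)        # fixed seed so the run is repeatable
xs = list(range(0, 600, 3))   # 200 hypothetical X values
beta0, beta1 = 2.0, 0.02      # hypothetical parameters

low_noise = simulate(xs, beta0, beta1, sigma=0.5, rng=rng)
high_noise = simulate(xs, beta0, beta1, sigma=2.0, rng=rng)

# Deviations are measured from the line used to generate the data.
msd_low = mean_squared_deviation(xs, low_noise, beta0, beta1)
msd_high = mean_squared_deviation(xs, high_noise, beta0, beta1)
print(msd_low, msd_high)
```

The mean squared deviation should land near sigma squared in each case, so the data set generated with the larger sigma forms the fuzzier band.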
15Adding some data to the story
Species-area relationship:
the relationship between the number of species and
the area of an island (or a sample)
See data: Data_6_Galapagos.xls
- Number of species seems to follow a power
relationship
- Island areas range over 3 orders of magnitude
(1-7500 km²)
- Species richness spans two orders of magnitude
(7-325)
- So the data follow a power function, S = cA^z
16Adding some data to the story
Species-area relationship
17Transforming data
S = cA^z
log(S) = log(cA^z)
log(S) = log(c) + z log(A)
S' = c' + zA', where S' = log(S), c' = log(c), A' = log(A)
So we plot logarithms of the data.
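The effect of the transformation can be checked numerically. The Python sketch below generates species counts from a power function with made-up values of c and z (these are not estimates from the Galapagos data), takes base-10 logarithms of both variables, and confirms that the transformed points lie on a line with slope z and intercept log10(c).

```python
import math

# Hypothetical power-function parameters, not estimates from the Galapagos data.
c = 10.0
z = 0.3

areas = [1.0, 10.0, 100.0, 1000.0]
species = [c * a ** z for a in areas]

# Log-transform both variables.
log_area = [math.log10(a) for a in areas]
log_species = [math.log10(s) for s in species]

# On the log-log scale the points lie on a straight line:
# the slope is z and the intercept is log10(c).
slope = (log_species[-1] - log_species[0]) / (log_area[-1] - log_area[0])
intercept = log_species[0]  # value at log10(A) = 0, that is, A = 1
print(slope, intercept, math.log10(c))
```

Because the transformed relationship is linear, the tools of linear regression can be applied to the logged data.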
18Adding some data to the story
Species-area relationship
[Figure: log10(Number of species) against log10(Island area)]
But how do we define the best fit for the line?
22Adding some data to the story
Species-area relationship
[Figure: log10(Number of species) against log10(Island area)]
For any Yi, we could pass the regression line through
the point, so that di = 0
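The deviations di can be computed directly. The Python sketch below uses a handful of made-up (log10 area, log10 species) pairs, not the Galapagos values. It first forces a line through one point, so that this point's deviation is zero, and then compares it with the line given by the standard least squares formulas, the approach named in the lecture outline. The slope guess and the function names are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of Y = b0 + b1 * X; returns (b0, b1)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_mean - b1 * x_mean, b1

def deviations(xs, ys, b0, b1):
    """Vertical deviations d_i = Y_i - (b0 + b1 * X_i)."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

def sum_of_squares(values):
    """Sum of squared values."""
    return sum(v * v for v in values)

# Hypothetical log10(area), log10(species) pairs, for illustration only.
xs = [-1.0, 0.0, 1.0, 2.0, 3.0]
ys = [0.9, 1.0, 1.6, 1.7, 2.3]

# A line forced through the first point, with an arbitrary slope guess.
slope_guess = 0.5
b0_through = ys[0] - slope_guess * xs[0]
d_through = deviations(xs, ys, b0_through, slope_guess)

# The least squares line for the same data.
b0_ls, b1_ls = fit_line(xs, ys)
d_ls = deviations(xs, ys, b0_ls, b1_ls)

ss_through = sum_of_squares(d_through)
ss_ls = sum_of_squares(d_ls)
print(d_through[0], ss_through, ss_ls)
```

Making one deviation zero does nothing for the other points; the least squares line instead keeps the sum of squared deviations over all points as small as possible.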