Title: Lake Eutrophication and a Golf Course
1. Lake Eutrophication and a Golf Course
- Chlorophyll-a (C): a widely used indicator of eutrophication
- Nitrogen (N): associated with eutrophication
- Q: Golf course development. Nitrogen is expected to increase. By how much will C increase or decrease in the local lake?
- Let's look at data from other lakes and fit a linear relation between C and N
- The slope of the relationship will give us the expected effect on C of a unit increase in N
2. Ordinary Least Squares (OLS) Regression
- Estimators have many properties.
- The constant 6 is an estimator, but not a very good one.
- Two main properties we care about:
- Unbiased: the expected difference between the estimator and the thing it is estimating is 0.
- Efficient: small variance (uncertainty).
- 6 is biased, but has a very small variance (zero).
- Also called the Classical Linear Regression Model (CLRM)
- Find the intercept and slope parameters such that the sum of squared residuals is as small as possible
- OLS is an estimator for the parameters of the model
Given that certain assumptions are satisfied, the OLS estimator is unbiased and has minimum variance of all unbiased estimators.
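The least-squares idea above can be sketched in a few lines of plain Python. The data are made up for illustration (they are not the lake data behind these slides), and the function names `fit_line` and `sum_squared_residuals` are hypothetical. For a single predictor, the slope and intercept that minimize the sum of squared residuals have a closed form:

```python
def fit_line(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sum_squared_residuals(x, y, intercept, slope):
    """Sum of squared differences between observed and fitted values."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data: nitrogen (N) and chlorophyll-a (C) in five lakes.
nitrogen = [1.0, 2.0, 3.0, 4.0, 5.0]
chlorophyll = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(nitrogen, chlorophyll)
ssr_best = sum_squared_residuals(nitrogen, chlorophyll, b0, b1)

# Any other line (here, a slightly steeper one) has a larger SSR.
ssr_other = sum_squared_residuals(nitrogen, chlorophyll, b0, b1 + 0.1)

print(f"intercept={b0:.3f}, slope={b1:.3f}")
print(f"SSR at OLS fit={ssr_best:.4f}, SSR at perturbed fit={ssr_other:.4f}")
```

The fitted slope is the quantity the golf-course question asks about: the expected change in C per unit change in N, under the assumed linear model.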
3. [Annotated regression output. Callout labels:]
- Fraction of variance in Chlorophyll-a explained by model
- Standard deviation of residuals
- P-value for model as a whole
- 1-sample t-tests
- P-value for intercept
- P-value for slope parameter
- Parameter estimates
4. Is something wrong here?
8. Multiple regression
- Eutrophication may be affected by both nitrogen and phosphorus
- We are interested in the effects of N on C, while holding P constant
- We can't get that independence directly from the data: N and P are correlated
- Multiple regression is the key
- Equation on board
- Slopes are partial coefficients
- In simple regression, the slope is a marginal coefficient
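The difference between a marginal and a partial coefficient can be illustrated with a small sketch. Everything here is an assumption made for illustration: the data are synthetic (C is generated exactly as 1 + 2·P + 0.5·N, with N and P correlated), and the helpers `solve` and `ols` are hypothetical names for a plain normal-equations fit, not the method used for the slides.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ols(columns, y):
    """OLS coefficients [intercept, slope_1, ...] via the normal equations."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(X[0])
    XtX = [[sum(r[a] * r[b] for r in X) for b in range(k)] for a in range(k)]
    Xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(k)]
    return solve(XtX, Xty)


# Synthetic lakes: P and N are correlated but not perfectly collinear.
P = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
N = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
C = [1.0 + 2.0 * p + 0.5 * n for p, n in zip(P, N)]  # true N effect is 0.5

# Simple regression of C on N alone: the marginal slope absorbs part of P's effect.
_, marginal_n = ols([N], C)

# Multiple regression of C on P and N: the partial slope holds P constant.
_, partial_p, partial_n = ols([P, N], C)

print(f"marginal slope on N: {marginal_n:.3f}")
print(f"partial slope on N:  {partial_n:.3f}")
```

In this constructed example the marginal slope on N is several times larger than the partial slope, because N stands in for the omitted, correlated P.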
10. Back to the golf course
- Use data to estimate parameter values that give the best fit: b0 = -9.4, b1 = 0.3, b2 = 1.2
- Answer: a one-unit increase in N results in about a 1.2-unit increase in C.
- Importance: omitting phosphorus from the model introduced significant bias!
- But there's a lot of uncertainty in the estimate of the effect of N
- The 95% CI ranges from about -1.2 to about 3.6
- Question: does nitrogen have any effect on chlorophyll-a in these lakes?
12. Does nitrogen have an effect?
- In multiple regression, we can't tell just from looking at the P values of the individual coefficients
- If two independent variables are collinear (correlated), then the P values will be inflated or deflated
- Removing one may decrease the P value of the other
- Instead, look at the effect of removing each variable, one at a time, from the model
- This uses an F-test to test the null hypothesis that the increased goodness of fit from that variable is just due to chance
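The "remove one variable at a time" comparison can be sketched as a partial F statistic: fit the full model and the reduced model, then compare their residual sums of squares. This is a minimal illustration under stated assumptions: the data are synthetic, and `solve`, `ols`, `rss`, and `partial_f` are hypothetical helper names. Turning the statistic into a P value requires the F distribution with (q, n - k) degrees of freedom, which is left out here.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ols(columns, y):
    """OLS coefficients [intercept, slope_1, ...] via the normal equations."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(X[0])
    XtX = [[sum(r[a] * r[b] for r in X) for b in range(k)] for a in range(k)]
    Xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(k)]
    return solve(XtX, Xty)


def rss(columns, y):
    """Residual sum of squares of the OLS fit of y on the given columns."""
    coef = ols(columns, y)
    total = 0.0
    for i, yi in enumerate(y):
        fitted = coef[0] + sum(c * col[i] for c, col in zip(coef[1:], columns))
        total += (yi - fitted) ** 2
    return total


def partial_f(full_cols, reduced_cols, y):
    """F statistic for the variables in the full model but not the reduced one."""
    n = len(y)
    k_full = len(full_cols) + 1               # parameters, including intercept
    q = len(full_cols) - len(reduced_cols)    # number of variables removed
    rss_full = rss(full_cols, y)
    rss_reduced = rss(reduced_cols, y)
    return ((rss_reduced - rss_full) / q) / (rss_full / (n - k_full))


# Synthetic lakes: correlated P and N, noisy C (values are illustrative only).
P = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
N = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0]
C = [4.3, 5.3, 9.1, 10.1, 14.2, 15.8, 18.9, 20.3]

f_drop_n = partial_f([P, N], [P], C)  # does N add explanatory power beyond P?
f_drop_p = partial_f([P, N], [N], C)  # does P add explanatory power beyond N?
print(f"F for removing N: {f_drop_n:.2f}")
print(f"F for removing P: {f_drop_p:.2f}")
```

A large F statistic means the removed variable improved the fit by more than chance alone would be expected to; each statistic here has 1 and 5 degrees of freedom.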
13. Enhance understanding
Make predictions
Estimate parameters
Describe patterns and relationships in data
Select models
Test statistical hypotheses
Test theories
Make decisions
14. Assumptions of OLS regression
- Model is linear in parameters
- The data are a random sample of the population
- The residuals are statistically independent from one another
- The expected value of the residuals is always zero
- The independent variables are not too strongly collinear
- The residuals have constant variance
- If assumptions 1-4 are satisfied, then the OLS estimator is unbiased
- If assumption 5 is also satisfied, then the OLS estimator has minimum variance of all unbiased estimators.
- How can we test these assumptions?
- If assumptions are violated,
- what does this do to our conclusions?
- how do we fix the problem?
15. What makes it linear regression?
- Model is linear in parameters
- Parameters can't occur inside a nonlinear function
- Residuals are additive
- Nonlinearity in variables is OK
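As a small illustration of "nonlinearity in variables is OK", the sketch below fits y = b0 + b1·ln(x). The model is nonlinear in x but linear in b0 and b1, so ordinary least squares applies once the variable is transformed. The data are generated for the example and `fit_line` is a hypothetical helper name.

```python
import math


def fit_line(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Synthetic data generated exactly from y = 2 + 3*ln(x).
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [2.0 + 3.0 * math.log(xi) for xi in x]

# Transform the variable, then fit an ordinary straight line.
log_x = [math.log(xi) for xi in x]
b0, b1 = fit_line(log_x, y)
print(f"b0={b0:.3f}, b1={b1:.3f}")
```

By contrast, a model such as y = b0·exp(b1·x) + e puts the parameter b1 inside a nonlinear function, so it is not linear in parameters and cannot be fitted this way as written.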
16. Dummy variables
- How can we handle categorical explanatory (independent) variables in a regression?
- Dichotomous
- Male/Female
- Pre-regulation/Post-regulation
- Island/Mainland
- Polytomous
- Continent
- Political party
- Soil type
17. Alien Species
- Exotic species cause economic and ecological damage
- Not all countries are equally invaded
- We want to understand the characteristics of a country that make it more likely to be invaded.
- We'll measure invasiveness as the fraction of species that are alien
- Two hypotheses:
- Human population density plays a role in a country's invasiveness.
- Island nations are more invaded than mainland nations.
18. [Figure; panel labels: Island, Mainland]
19. A Simple Model
- ISL is a dummy variable, coded 0 if mainland, 1 if island
- Dummy changes intercept (explain).
- Interaction dummy variable?
- E.g. invasions of island nations are more strongly affected by population density.
20. A model with interactions
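The slide's equation did not survive in this transcript. One common way to write a model with an intercept dummy and an interaction dummy is ALIEN = b0 + b1·ISL + b2·POP + b3·(ISL·POP); that form, the variable names ALIEN and POP, the helper names, and the data below are all assumptions made for illustration. Under this form, b1 shifts the intercept for islands and b3 changes the slope on population density for islands.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ols(columns, y):
    """OLS coefficients [intercept, slope_1, ...] via the normal equations."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(X[0])
    XtX = [[sum(r[a] * r[b] for r in X) for b in range(k)] for a in range(k)]
    Xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(k)]
    return solve(XtX, Xty)


# Synthetic countries: four mainland (ISL = 0) and four island (ISL = 1).
pop = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]   # population density, arbitrary units
isl = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]

# Mainland line: 0.05 + 0.01*pop. Island line: 0.15 + 0.03*pop.
alien = [0.05 + 0.01 * p if i == 0.0 else 0.15 + 0.03 * p
         for p, i in zip(pop, isl)]

# The interaction column is the product of the dummy and the continuous variable.
isl_x_pop = [i * p for i, p in zip(isl, pop)]

b0, b_isl, b_pop, b_int = ols([isl, pop, isl_x_pop], alien)
print(f"mainland intercept b0 = {b0:.3f}, mainland slope = {b_pop:.3f}")
print(f"island intercept shift = {b_isl:.3f}, island slope shift = {b_int:.3f}")
```

The island line is recovered as intercept b0 + b1 and slope b2 + b3, so a nonzero b3 corresponds to the hypothesis that invasions of island nations are more strongly affected by population density.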
21. What about a polytomous variable (e.g., Continent)?
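The slide's answer is not preserved in this transcript. A standard approach is to code a variable with k categories as k - 1 dummy variables, with the remaining category serving as the reference level absorbed by the intercept. The sketch below assumes that approach; `dummy_code` is a hypothetical helper and the continent values are illustrative.

```python
def dummy_code(values, reference):
    """Code a categorical variable as k-1 dummy (0/1) columns.

    The reference level is represented by all zeros; every other level
    gets its own column, keyed by the level name.
    """
    levels = sorted(set(values) - {reference})
    return {level: [1 if v == level else 0 for v in values] for level in levels}


# Illustrative data: the continent of five countries.
continents = ["Africa", "Asia", "Europe", "Asia", "Africa"]
dummies = dummy_code(continents, reference="Africa")

for level, column in dummies.items():
    print(level, column)
```

Using all k dummies together with an intercept would make the columns perfectly collinear, which is why one level is dropped. Each dummy's coefficient is then read as the difference from the reference level.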