Title: Lake Eutrophication and a Golf Course
1. Lake Eutrophication and a Golf Course
- Chlorophyll-a (C): a widely used indicator of eutrophication
- Nitrogen (N): associated with eutrophication
- Question: a golf course is being developed, and nitrogen is expected to increase. By how much will C increase or decrease in the local lake?
- Let's look at data from other lakes and fit a linear relation between C and N
- The slope of the relationship gives us the expected effect on C of a unit increase in N
2. Ordinary Least Squares (OLS) Regression
- Estimators have many properties.
- The constant 6 is an estimator, but not a very good one.
- Two main properties we care about:
- Unbiased: the expected difference between the estimator and the thing it is estimating is 0.
- Efficient: small variance (uncertainty).
- 6 is biased, but has a very small variance (zero).
- OLS regression is also called the Classical Linear Regression Model (CLRM)
- Find the intercept and slope parameters such that the sum of squared residuals is as small as possible
- OLS is an estimator for the parameters of the model
Given that certain assumptions are satisfied, the OLS estimator is unbiased and has the minimum variance of all linear unbiased estimators.
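The ideas on this slide can be tried out on simulated data. The sketch below is only an illustration: the true intercept (2), true slope (0.5), sample size, and noise level are assumptions, not values from the lake data. It computes the OLS estimates from the closed-form formulas for a single predictor, checks them against lm(), and then compares the OLS slope with the constant "estimator" 6 over repeated samples.

```r
# Simulated data; the true intercept and slope are assumptions for illustration
set.seed(1)
true_intercept <- 2
true_slope <- 0.5
x <- runif(50, 0, 10)
y <- true_intercept + true_slope * x + rnorm(50, sd = 1)

# Closed-form OLS estimates for a single predictor
slope_hat <- cov(x, y) / var(x)
intercept_hat <- mean(y) - slope_hat * mean(x)

# Sum of squared residuals for any candidate intercept and slope
ssr <- function(b0, b1) sum((y - b0 - b1 * x)^2)

# lm() should return the same estimates
fit <- lm(y ~ x)
print(c(intercept_hat, slope_hat))
print(coef(fit))

# Moving away from the OLS estimates increases the sum of squared residuals
print(ssr(intercept_hat, slope_hat))
print(ssr(intercept_hat, slope_hat + 0.1))

# Repeated sampling: compare the OLS slope with the constant estimator 6
slopes <- replicate(2000, {
  xs <- runif(50, 0, 10)
  ys <- true_intercept + true_slope * xs + rnorm(50, sd = 1)
  cov(xs, ys) / var(xs)
})
const_est <- rep(6, 2000)

# OLS: average close to the true slope, with some spread
print(c(mean = mean(slopes), variance = var(slopes)))
# Constant 6: zero variance, but always off by the same amount
print(c(mean = mean(const_est), variance = var(const_est)))
```

In this setup the OLS slopes scatter around 0.5, while the constant 6 never varies and never gets closer to 0.5.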
3.
    > chlor <- read.csv("Chlorophyll.csv")
    > c1.lm <- lm(Chlorophyll.a ~ Nitrogen, data = chlor)
    > summary(c1.lm)

    Call:
    lm(formula = Chlorophyll.a ~ Nitrogen, data = chlor)

    Residuals:
       Min     1Q Median     3Q    Max
    -58.73 -34.13 -10.73  30.97  92.77

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  110.337     23.997   4.598 0.000127 ***
    Nitrogen      -4.300      1.596  -2.694 0.012946 *
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 44.55 on 23 degrees of freedom
    Multiple R-Squared: 0.2399,  Adjusted R-squared: 0.2068
    F-statistic: 7.259 on 1 and 23 DF,  p-value: 0.01295
4. But there's a problem...
5. (No Transcript)
6.
    Call:
    lm(formula = Chlorophyll.a ~ Phosphorus, data = chlor)

    Residuals:
        Min      1Q  Median      3Q     Max
    -36.148 -13.901  -5.022   5.254  61.037

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept) 11.34093    6.72380   1.687    0.105
    Phosphorus   0.30241    0.03512   8.610 1.19e-08 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 24.86 on 23 degrees of freedom
    Multiple R-Squared: 0.7632,  Adjusted R-squared: 0.7529
    F-statistic: 74.13 on 1 and 23 DF,  p-value: 1.189e-08
7.
    Call:
    lm(formula = Chlorophyll.a ~ Phosphorus + Nitrogen, data = chlor)

    Residuals:
        Min      1Q  Median      3Q     Max
    -37.008 -14.115  -7.214   7.675  61.875

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept) -9.38601   21.32504  -0.440    0.664
    Phosphorus   0.33321    0.04622   7.210 3.17e-07 ***
    Nitrogen     1.20043    1.17221   1.024    0.317
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 24.84 on 22 degrees of freedom
    Multiple R-Squared: 0.774,  Adjusted R-squared: 0.7534
    F-statistic: 37.67 on 2 and 22 DF,  p-value: 7.867e-08
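The Nitrogen coefficient changes sign between the Nitrogen-only fit (-4.300) and the fit that also includes Phosphorus (1.20043). One pattern that would produce such a flip is a second predictor that drives the response and is negatively correlated with the first. The sketch below simulates that situation; every number in it (sample size, ranges, coefficients, noise levels) is an assumption made for illustration, not an estimate from the lake data.

```r
# Simulated lakes; all values are assumptions for illustration only
set.seed(42)
n <- 500
P <- runif(n, 50, 400)                         # hypothetical phosphorus levels
N <- 25 - 0.04 * P + rnorm(n, sd = 2)          # N negatively correlated with P
C <- 10 + 0.3 * P + 1.0 * N + rnorm(n, sd = 5) # true effect of N on C is +1

short <- lm(C ~ N)       # omits P
full  <- lm(C ~ P + N)   # includes P

# With P omitted, the N slope also absorbs the effect of P
print(coef(short)["N"])
# With P included, the N slope is close to the true value of +1
print(coef(full)["N"])
```

In this simulation the short regression reports a negative slope for N even though N raises C by construction, because low-N lakes tend to be high-P lakes.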
8. (No Transcript)
9. Back to the golf course
- Use data to estimate the parameter values that give the best fit: b0 = -9.4, b1 = 0.3, b2 = 1.2
- Answer: a one-unit increase in N results in about a 1.2-unit increase in C.
- Importance: omitting phosphorus from the model introduced significant bias!!! (The Nitrogen-only regression gave a slope of -4.3.)
- But there's a lot of uncertainty in the estimate of the effect of N
- The 95% CI ranges from about -1.2 to about 3.6
- Question: does nitrogen have any effect on chlorophyll-a in these lakes?
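The confidence interval for the effect of N can be reproduced from the numbers in the Phosphorus + Nitrogen regression output. This is a minimal sketch of the arithmetic, assuming the usual t-based interval; the variable names are made up for the example.

```r
# Estimate and standard error for Nitrogen, copied from the regression
# output for Chlorophyll.a ~ Phosphorus + Nitrogen (22 residual df)
b2 <- 1.20043
se_b2 <- 1.17221
df_resid <- 22

# Two-sided 95% critical value of the t distribution
t_crit <- qt(0.975, df_resid)

# Interval: estimate plus or minus critical value times standard error
ci <- b2 + c(-1, 1) * t_crit * se_b2
print(round(ci, 2))
```

The interval runs from roughly -1.2 to 3.6, so it includes zero. With the fitted model object available, confint(c3.lm) should give the same interval.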
10. Does nitrogen have an effect?
- In multiple regression, you can't tell just from looking at the P values of the individual coefficients
- If two independent variables are collinear (correlated), then the P values will be inflated or deflated
- Instead, look at the effect of removing each variable, one at a time, from the model
- This uses an F-test to test the null hypothesis that the increased goodness of fit from that variable is just due to chance
    > Anova(c3.lm)
    Anova Table (Type II tests)

    Response: Chlorophyll.a
               Sum Sq Df F value    Pr(>F)
    Phosphorus  32070  1 51.9830 3.171e-07 ***
    Nitrogen      647  1  1.0487    0.3169
    Residuals   13572 22
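The "remove one variable and compare" idea can be sketched with base R alone, without the car package. The example below uses simulated data (all values are assumptions for illustration) and compares a full model against the same model with one predictor dropped, first with anova() and then by hand from the residual sums of squares. For a model with no interaction terms, this comparison should correspond to that variable's row in the Type II table.

```r
# Simulated data with two correlated predictors; all values are assumptions
set.seed(7)
n <- 60
x1 <- rnorm(n)
x2 <- 0.7 * x1 + rnorm(n, sd = 0.7)
y  <- 1 + 2 * x1 + 0.3 * x2 + rnorm(n)

full    <- lm(y ~ x1 + x2)
reduced <- lm(y ~ x1)          # the same model with x2 removed

# F-test of the null hypothesis that x2 improves the fit only by chance
cmp <- anova(reduced, full)
print(cmp)

# The same F statistic from the two residual sums of squares:
# (drop in RSS per dropped parameter) / (residual mean square of full model)
rss_full    <- sum(resid(full)^2)
rss_reduced <- sum(resid(reduced)^2)
f_by_hand <- (rss_reduced - rss_full) / (rss_full / df.residual(full))
print(f_by_hand)
```

A small F value (large P value) means the reduced model fits about as well as the full one.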
11.
    Call:
    lm(formula = Chlorophyll.a ~ Phosphorus * Nitrogen, data = chlor)

    Residuals:
        Min      1Q  Median      3Q     Max
    -22.193 -11.292  -3.648   4.538  47.546

    Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
    (Intercept)         -4.883608  15.889700  -0.307 0.761609
    Phosphorus           0.161319   0.052467   3.075 0.005748 **
    Nitrogen             0.241565   0.899195   0.269 0.790824
    Phosphorus:Nitrogen  0.024162   0.005573   4.335 0.000291 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 18.47 on 21 degrees of freedom
    Multiple R-Squared: 0.8807,  Adjusted R-squared: 0.8637
    F-statistic: 51.69 on 3 and 21 DF,  p-value: 7.192e-10

    Anova Table (Type II tests)

    Response: Chlorophyll.a
                        Sum Sq Df F value    Pr(>F)
    Phosphorus           32070  1  94.031 3.312e-09 ***
    Nitrogen               647  1   1.897 0.1829167
    Phosphorus:Nitrogen   6410  1  18.795 0.0002914 ***
    Residuals             7162 21
12.
    Call:
    lm(formula = Chlorophyll.a ~ Phosphorus + Phosphorus:Nitrogen, data = chlor)

    Residuals:
        Min      1Q  Median      3Q     Max
    -23.415 -11.417  -3.248   3.648  47.170

    Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
    (Intercept)         -0.896421   5.553804  -0.161 0.873246
    Phosphorus           0.152876   0.041115   3.718 0.001196 **
    Phosphorus:Nitrogen  0.024530   0.005287   4.640 0.000126 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 18.07 on 22 degrees of freedom
    Multiple R-Squared: 0.8803,  Adjusted R-squared: 0.8694
    F-statistic: 80.91 on 2 and 22 DF,  p-value: 7.219e-11

    Anova Table (Type II tests)

    Response: Chlorophyll.a
                        Sum Sq Df F value    Pr(>F)
    Phosphorus           45828  1 140.287 5.107e-11 ***
    Phosphorus:Nitrogen   7033  1  21.528 0.0001264 ***
    Residuals             7187 22
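In the last model, Nitrogen enters only through the Phosphorus:Nitrogen term, so the fitted effect of a unit increase in N is no longer a single number: it equals the interaction coefficient times the phosphorus level. The sketch below works this out from the coefficients printed above. The phosphorus levels used are hypothetical values picked for illustration, and the function names are made up for the example.

```r
# Coefficients copied from the Chlorophyll.a ~ Phosphorus + Phosphorus:Nitrogen fit
b0   <- -0.896421   # intercept
b_p  <- 0.152876    # Phosphorus
b_pn <- 0.024530    # Phosphorus:Nitrogen

# Predicted chlorophyll-a under this fitted model
predict_c <- function(P, N) b0 + b_p * P + b_pn * P * N

# Effect on C of a one-unit increase in N at a given phosphorus level
effect_of_n <- function(P) b_pn * P

# Hypothetical phosphorus levels, chosen only for illustration
for (P in c(10, 50, 100)) {
  print(c(P = P, effect_of_N = effect_of_n(P)))
}
```

Under this model, answering the golf course question for a particular lake would require that lake's phosphorus level.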
13. R code, part 1
    # Read in the data
    chlor <- read.csv("Chlorophyll.csv")
    # Perform OLS regression of C on N, and look at the results
    c1.lm <- lm(Chlorophyll.a ~ Nitrogen, data = chlor)
    summary(c1.lm)
    # Plot the data. Note that when both variables are in the data frame,
    # you can use the formula notation
    plot(Chlorophyll.a ~ Nitrogen, data = chlor)
    # Add the OLS regression line to the plot
    abline(c1.lm)
    # Look at scatterplots of all the variables (the first column is just the
    # lake ID number, so we don't include it). This is in the car library.
    # (Older versions of car call this function scatterplot.matrix.)
    library(car)
    scatterplotMatrix(chlor[, 2:4])
    # OLS regression of C on P
    c2.lm <- lm(Chlorophyll.a ~ Phosphorus, data = chlor)
    summary(c2.lm)
    # OLS regression of C on N and P
    c3.lm <- lm(Chlorophyll.a ~ Phosphorus + Nitrogen, data = chlor)
14. R code, part 2
    # Plot the actual values vs. the fitted values
    plot(fitted(c3.lm), chlor$Chlorophyll.a)
    # Show the line of equality
    abline(0, 1)
    # Look at the significance of individual terms in the previous regression.
    # Notice the capital A. This is in the car library.
    Anova(c3.lm)
    # OLS regression of C on N, P, and their interaction
    c4.lm <- lm(Chlorophyll.a ~ Phosphorus * Nitrogen, data = chlor)
    summary(c4.lm)
    Anova(c4.lm)
    # Drop the N term from the previous regression
    c5.lm <- lm(Chlorophyll.a ~ Phosphorus + Phosphorus:Nitrogen, data = chlor)
    summary(c5.lm)
    Anova(c5.lm)