Title: Assumptions of OLS regression
2. Assumptions of OLS regression
- Model is linear in parameters
- The residuals are normally distributed
- The residuals have constant variance
- The expected value of the residuals is always zero
- The residuals are independent of one another
- The X values are precise
- The independent variables are not too strongly collinear
- If these assumptions are satisfied, then the OLS estimator is unbiased and has the minimum variance of all unbiased estimators.
- How can we test these assumptions? (A minimal fitting sketch follows this list.)
- If the assumptions are violated,
  - what does this do to our conclusions?
  - how do we fix the problem?
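The deck does not name any particular software, so here is a minimal sketch of fitting an OLS model in Python with statsmodels, using simulated data and hypothetical variable names; the later sketches follow the same pattern.

```python
# Minimal OLS fit with statsmodels; the data are simulated stand-ins
# (the deck's chlorophyll data are not reproduced here).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
y = 2.0 + 0.8 * x1 - 0.5 * x2 + rng.normal(0, 1, n)

X = sm.add_constant(np.column_stack([x1, x2]))  # design matrix with intercept
ols_fit = sm.OLS(y, X).fit()
print(ols_fit.summary())  # coefficients, p-values, R-squared
```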
3. Model not linear in parameters
- Problem: can't fit the model!
- Diagnosis: look at the model
- Solutions
- Re-frame the model
- Use nonlinear least squares (NLS) regression
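If the model really is nonlinear in its parameters, nonlinear least squares can fit it directly. A minimal sketch with SciPy, assuming an exponential model and simulated data (the functional form and names are illustrative, not from the deck):

```python
# Sketch of nonlinear least squares for a model that is not linear in its
# parameters, e.g. y = a * exp(b * x). Data are simulated.
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b):
    return a * np.exp(b * x)

rng = np.random.default_rng(1)
x = np.linspace(0, 4, 50)
y = model(x, 2.0, 0.5) + rng.normal(0, 0.5, x.size)

params, cov = curve_fit(model, x, y, p0=[1.0, 0.1])  # p0 = starting guesses
print(params)  # estimated a, b
```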
4. Residuals not normally distributed
- Problem
  - Parameter estimates are unbiased
  - P-values are unreliable
  - Regression fits the mean; with skewed residuals, the mean is not a good measure of central tendency
- Diagnosis: examine a QQ plot of Studentized residuals (a plotting sketch follows this slide)
  - Studentization corrects for bias in estimates of the residual variance
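A hedged sketch of the QQ-plot diagnostic using externally Studentized residuals; the skewed simulated data stand in for the deck's chlorophyll example.

```python
# Sketch: QQ plot of externally Studentized residuals.
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 2.0 + 0.8 * x + rng.standard_gamma(2, 100)  # skewed errors on purpose

fit = sm.OLS(y, sm.add_constant(x)).fit()
student_resid = fit.get_influence().resid_studentized_external

sm.qqplot(student_resid, line="45")  # systematic departures from the line flag non-normality
plt.show()
```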
5. (Figure only; no transcript available)
6. Residuals not normally distributed
- Problem
  - Parameter estimates are unbiased
  - P-values are unreliable
  - Regression fits the mean; with skewed residuals, the mean is not a good measure of central tendency
- Diagnosis: examine a QQ plot of Studentized residuals
  - Studentization corrects for bias in estimates of the residual variance
- Solutions
  - Transform the dependent variable
    - May create nonlinearity in the model
7. Try transforming the response variable
Box-Cox Transformations
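A Box-Cox fit can suggest which power transform to try. A minimal sketch with SciPy, using a positive, right-skewed simulated response in place of the chlorophyll data:

```python
# Sketch: Box-Cox transformation of a (strictly positive) response.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
y = rng.lognormal(mean=1.0, sigma=0.6, size=100)  # positive, right-skewed response

y_bc, lam = stats.boxcox(y)  # maximum-likelihood estimate of lambda
print(lam)                   # lambda near 0.5 would suggest a square-root transform
```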
8. But we've introduced nonlinearity
(Figure: Actual by Predicted plot, Chlorophyll)
(Figure: Actual by Predicted plot, sqrt(Chlorophyll))
9. Residuals not normally distributed
- Problem
  - Parameter estimates are unbiased
  - P-values are unreliable
  - Regression fits the mean; with skewed residuals, the mean is not a good measure of central tendency
- Diagnosis: examine a QQ plot of Studentized residuals
  - Studentization corrects for bias in estimates of the residual variance
- Solutions
  - Transform the dependent variable
    - May create nonlinearity in the model
  - Fit a generalized linear model (GLM)
    - Allows us to assume the residuals follow a different distribution (binomial, gamma, etc.); a GLM sketch follows this slide
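A hedged sketch of the GLM route, assuming a Gamma family with a log link for a positive, right-skewed response; the family, link, and data are illustrative choices, not the deck's.

```python
# Sketch: a Gamma GLM with a log link as an alternative to transforming y.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 200)
mu = np.exp(0.2 + 0.15 * x)               # mean increases with x
y = rng.gamma(shape=2.0, scale=mu / 2.0)  # Gamma response with that mean

X = sm.add_constant(x)
glm_fit = sm.GLM(y, X, family=sm.families.Gamma(link=sm.families.links.Log())).fit()
print(glm_fit.summary())
```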
10. Residuals have non-constant variance (heteroskedasticity)
- Problem
  - Parameter estimates are unbiased
  - P-values are unreliable
- Diagnosis: plot Studentized residuals against fitted values (a plotting sketch follows this slide)
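A sketch of the residuals-versus-fitted diagnostic; the simulated data are built with an error variance that grows with x, so the plot should fan out.

```python
# Sketch: Studentized residuals vs. fitted values to look for a funnel shape.
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, 150)
y = 1.0 + 0.6 * x + rng.normal(0, 0.2 + 0.3 * x)  # error SD grows with x

fit = sm.OLS(y, sm.add_constant(x)).fit()
resid = fit.get_influence().resid_studentized_external

plt.scatter(fit.fittedvalues, resid)
plt.axhline(0, color="grey")
plt.xlabel("Fitted values")
plt.ylabel("Studentized residuals")
plt.show()
```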
11. (Figure only; no transcript available)
12. Residuals have non-constant variance (heteroskedasticity)
- Problem
  - Parameter estimates are unbiased
  - P-values are unreliable
- Diagnosis: plot Studentized residuals against fitted values
- Solutions
  - Transform the dependent variable
    - May create nonlinearity in the model
13. Try our square-root transform
14. (Figure only; no transcript available)
15. Residuals have non-constant variance (heteroskedasticity)
- Problem
  - Parameter estimates are unbiased
  - P-values are unreliable
- Diagnosis: plot Studentized residuals against fitted values
- Solutions
  - Transform the dependent variable
    - May create nonlinearity in the model
  - Fit a generalized linear model (GLM)
    - For some distributions, the variance changes with the mean in predictable ways
  - Fit a weighted least squares (WLS) regression (a WLS sketch follows this slide)
    - Also good when data points have differing amounts of precision
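A minimal WLS sketch, assuming the error variance is known up to a function of x (here simulated); the weights are the usual inverse-variance choice.

```python
# Sketch: weighted least squares when the error variance differs across observations.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, 150)
sd = 0.2 + 0.3 * x                     # assume the error SD grows with x
y = 1.0 + 0.6 * x + rng.normal(0, sd)

X = sm.add_constant(x)
wls_fit = sm.WLS(y, X, weights=1.0 / sd**2).fit()  # weight = inverse variance
print(wls_fit.params)
```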
16. Average error not everywhere zero (nonlinearity)
- Problem: indicates that the model is wrong
- Diagnosis: look for curvature in component+residual plots (C+R plots, also called partial-residual plots); a plotting sketch follows this slide
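A sketch of component + residual (partial-residual) plots with statsmodels; the data are simulated so that one predictor enters nonlinearly, which should show up as curvature in its panel.

```python
# Sketch: component + residual (partial-residual) plots via statsmodels.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt

rng = np.random.default_rng(6)
df = pd.DataFrame({"x1": rng.uniform(0, 10, 150), "x2": rng.uniform(0, 5, 150)})
df["y"] = 1.0 + 0.5 * df.x1 + 0.8 * df.x2**2 + rng.normal(0, 1, 150)  # x2 enters nonlinearly

fit = smf.ols("y ~ x1 + x2", data=df).fit()
sm.graphics.plot_ccpr_grid(fit)  # curvature in the x2 panel suggests a missing term
plt.show()
```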
17. (Figure only; no transcript available)
18. Average error not everywhere zero (nonlinearity)
- Problem: indicates that the model is wrong
- Diagnosis: look for curvature in component+residual plots (C+R plots, also called partial-residual plots)
- Solutions
  - If the pattern is monotonic, try transforming the independent variable
  - If not, try adding additional terms (e.g., a quadratic; a sketch follows this slide)
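A small sketch of adding a quadratic term through a model formula; variable names and data are hypothetical.

```python
# Sketch: adding a quadratic term with a formula.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
df = pd.DataFrame({"x": rng.uniform(0, 10, 150)})
df["y"] = 1.0 + 0.5 * df.x + 0.2 * df.x**2 + rng.normal(0, 1, 150)

quad_fit = smf.ols("y ~ x + I(x**2)", data=df).fit()  # I() protects the square inside the formula
print(quad_fit.params)
```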
19. Residuals not independent (autocorrelation)
- Problem: parameter estimates are biased
- Diagnosis: look at the autocorrelation function to find patterns in
  - time
  - space
  - sample number
- Solutions: fit the model using generalized least squares (GLS); a sketch follows this slide
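A sketch of the diagnose-then-refit workflow for serially correlated residuals: an ACF plot of the OLS residuals, then GLSAR (statsmodels' GLS with autoregressive errors) as one concrete GLS option. The AR(1) error structure and the data are illustrative assumptions.

```python
# Sketch: residual ACF, then GLS with AR(1) errors via GLSAR.
import numpy as np
import statsmodels.api as sm
from statsmodels.graphics.tsaplots import plot_acf
import matplotlib.pyplot as plt

rng = np.random.default_rng(8)
n = 200
x = np.linspace(0, 10, n)
e = np.zeros(n)
for t in range(1, n):                  # AR(1) errors with rho = 0.7
    e[t] = 0.7 * e[t - 1] + rng.normal(0, 1)
y = 1.0 + 0.5 * x + e

X = sm.add_constant(x)
ols_fit = sm.OLS(y, X).fit()
plot_acf(ols_fit.resid)                # spikes beyond lag 0 indicate autocorrelation
plt.show()

gls_fit = sm.GLSAR(y, X, rho=1).iterative_fit(maxiter=10)  # estimates rho and refits
print(gls_fit.params)
```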
20. X values not precise (measurement error)
- Problem: parameter estimates are biased
- Diagnosis: know how your data were collected!
- Solutions: very hard
  - State-space models
  - Restricted maximum likelihood (REML)
  - Use simulations to estimate the bias
  - Consult a professional!
21. Independent variables are collinear
- Problem: parameter estimates are imprecise
- Diagnosis (a correlation/VIF sketch follows this slide)
  - Look for correlations among the independent variables
  - In the regression output, none of the individual terms is significant, even though the model as a whole is
- Solutions
  - Live with it
  - Remove statistically redundant variables
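One way to act on the diagnosis bullets: inspect the correlation matrix of the predictors. The variance inflation factor (VIF) printed alongside is a closely related diagnostic that the slide itself does not mention; data and names are simulated.

```python
# Sketch: correlation among predictors, plus variance inflation factors
# (VIF, a related diagnostic not on the slide). x2 is built to be collinear with x1.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(9)
x1 = rng.uniform(0, 10, 150)
x2 = 0.9 * x1 + rng.normal(0, 0.5, 150)  # strongly collinear with x1
X = pd.DataFrame({"x1": x1, "x2": x2})

print(X.corr())                          # high off-diagonal correlation

X_const = sm.add_constant(X)
for i, name in enumerate(X.columns, start=1):  # skip the constant column
    print(name, variance_inflation_factor(X_const.values, i))  # large VIFs flag trouble
```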
22. Summary of OLS assumptions
23. What can we do about the chlorophyll regression?
- The square-root transform helps a little with non-normality and a lot with heteroskedasticity
- But it creates nonlinearity
24. A new model… it's linear
25. …it's normal (sort of) and homoskedastic
26. …and it fits well!