Title: Heteroskedasticity
1. Heteroskedasticity
- Outline
- 1) What is heteroskedasticity?
- 2) What are the consequences for our Least Squares estimator when we have heteroskedasticity?
- 3) How do we test for heteroskedasticity?
- 4) How do we correct a model that has heteroskedasticity?
2. What is Heteroskedasticity?
Review the Gauss-Markov assumptions:
- Linear regression model: y = β1 + β2x + e
- The error term has a mean of zero: E(e) = 0, so E(y) = β1 + β2x
- The error term has constant variance: Var(e) = E(e²) = σ²
- The error term is not correlated with itself (no serial correlation): Cov(e_i, e_j) = E(e_i e_j) = 0 for i ≠ j
- The data on X are not random and thus are uncorrelated with the error term: Cov(X, e) = E(Xe) = 0
The constant-variance assumption is the assumption of a homoskedastic error.
A homoskedastic error is one that has constant
variance. A heteroskedastic error is one that has
a nonconstant variance.
Heteroskedasticity is more commonly a problem for
cross-section data sets, although a time-series
model can also have a non-constant variance.
3. This diagram shows a non-constant variance for
the error term that appears to increase as X
increases. There are other possibilities. In
general, any error that has a non-constant
variance is heteroskedastic.
[Figure: conditional densities f(y|x) of y at x = x1, x2, x3; the spread of each density about the regression line increases as x increases.]
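To see this pattern concretely, here is a minimal SAS sketch that simulates an error whose variance grows with x; the data set name hetsim and all parameter values are purely illustrative, not taken from the slides.

/* Illustrative simulation: y = 40 + 0.1*x + e with Var(e) = x,
   so the spread of y about the line widens as x increases */
data hetsim;
  call streaminit(12345);
  do t = 1 to 100;
    x = 100 + 900*rand('uniform');   /* income-style regressor */
    e = sqrt(x)*rand('normal');      /* standard deviation of e is sqrt(x) */
    y = 40 + 0.1*x + e;
    output;
  end;
run;

proc sgplot data=hetsim;
  scatter x=x y=y;                   /* the scatter fans out for large x */
run;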
4. What are the Implications for Least Squares?
- We have to ask: where did we use the assumption? Or why was the assumption needed in the first place?
- We used the assumption in the derivation of the variance formulas for the least squares estimators, b1 and b2.
- For b2 it was
  Var(b2) = [Σ(x_t - x̄)² Var(e_t)] / [Σ(x_t - x̄)²]² = σ² / Σ(x_t - x̄)²
  This last step uses the assumption that σ_t² is a constant σ².
5. If we don't make this assumption, then the formula is
  Var(b2) = [Σ(x_t - x̄)² σ_t²] / [Σ(x_t - x̄)²]²
Remember that Var(e_t) = E(e_t²) = σ_t² now differs across observations.
Therefore, if we ignore the problem of a heteroskedastic error and estimate the variance of b2 using the formula on the previous slide, when in fact we should have used the formula directly on this slide, then our estimates of the variance of b2 are wrong. Any hypothesis tests or confidence intervals based on them will be invalid. However, E(b2) = β2 still holds (verify that the proof of unbiasedness did not use the assumption of a homoskedastic error).
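For reference, here is a compact sketch of the standard simple-regression algebra behind these two slides, with w_t denoting the usual least squares weights:

b_2 = \beta_2 + \sum_t w_t e_t, \qquad w_t = \frac{x_t - \bar{x}}{\sum_s (x_s - \bar{x})^2}

E(b_2) = \beta_2 \quad \text{(uses only } E(e_t) = 0 \text{ and nonrandom } x_t\text{, not constant variance)}

\operatorname{Var}(b_2) = \sum_t w_t^2 \sigma_t^2
  = \frac{\sum_t (x_t - \bar{x})^2 \sigma_t^2}{\left[\sum_t (x_t - \bar{x})^2\right]^2},
\quad \text{which collapses to } \frac{\sigma^2}{\sum_t (x_t - \bar{x})^2} \text{ only if } \sigma_t^2 = \sigma^2 .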
6. How do We Test for a Heteroskedastic Error?
- 1) Visual Inspection of the residuals
- Because we never observe actual values for the
error term, we never know for sure whether it is
heteroskedastic or not. However, we can run a
least squares regression and examine the
residuals to see if they show a pattern
consistent with a non-constant variance.
7. This regression resulted in the following
residuals plotted against the variable X (weekly
income). What do you see?
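A minimal SAS sketch of how such a residual plot can be produced, assuming the food data set and the variables y and x from the food expenditure example; the output data set name foodres is illustrative.

proc reg data=food;
  model y = x;
  output out=foodres r=ehat;   /* save the least squares residuals */
run;

proc sgplot data=foodres;
  scatter x=x y=ehat;          /* residuals plotted against weekly income */
  refline 0 / axis=y;
run;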
8. 2) Formal Tests for Heteroskedasticity (the Goldfeld-Quandt Test)
- There are many different tests; we will study the Goldfeld-Quandt test.
- a) Examine the residuals and notice that the variance in the residuals appears to be larger for larger values of x_t.
- We must make some assumption about the form of the heteroskedasticity (how the variance of e_t changes).
- For the food expenditure problem, the residuals tell us that an increasing function of x_t (weekly income) is a good candidate. Other models may have a variance that is a decreasing function of x_t or is a function of some variable other than x_t.
9. The Goldfeld-Quandt Test
- Sort the data in descending order of x_t, then split the data in half.
- Run the regression on each half of the data.
- Use the SSE from each regression to conduct a formal hypothesis test for heteroskedasticity.
- If the error is heteroskedastic with a larger variance for the larger values of x_t, then we should find
  GQ = σ̂1² / σ̂2² > 1
  where
  σ̂1² = SSE1 / (t1 - k)  and  σ̂2² = SSE2 / (t2 - k)
  and where SSE1 comes from the regression using the subset of large values of x_t, which has t1 observations, and SSE2 comes from the regression using the subset of small values of x_t, which has t2 observations.
10. Ho: the error is homoskedastic, so that σ1² = σ2²
    H1: the error is heteroskedastic, so that σ1² > σ2²
It can be shown that the GQ statistic has an F-distribution with (t1 - k) d.o.f. in the numerator and (t2 - k) d.o.f. in the denominator. If GQ > Fc, we reject Ho and conclude that the error is heteroskedastic.
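Written in one line, the statistic and decision rule from these two slides are:

GQ = \frac{\hat{\sigma}_1^2}{\hat{\sigma}_2^2}
   = \frac{SSE_1/(t_1 - k)}{SSE_2/(t_2 - k)}
   \;\sim\; F_{(t_1 - k,\; t_2 - k)} \text{ under } H_0,
\qquad \text{reject } H_0 \text{ if } GQ > F_c .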
11. Food Expenditure Example
This code sorts the data according to X because we believe that the error variance is increasing in x_t:

proc sort data=food;
  by descending x;
run;

data food1;
  set food;
  if _n_ <= 20;
run;
proc reg data=food1;
  bigvalues: model y = x;
run;

data food2;
  set food;
  if _n_ >= 21;
run;
proc reg data=food2;
  littlevalues: model y = x;
run;

The first DATA step and PROC REG estimate the model for the first 20 observations, which are the observations with large values of x_t. The second DATA step and PROC REG estimate the model for the second 20 observations, which are the observations with small values of x_t.
12. The REG Procedure -- Model: bigvalues (Dependent Variable: y)

Analysis of Variance
Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              1        4756.81422     4756.81422       2.08    0.1663
Error             18             41147     2285.93938
Corrected Total   19             45904

Root MSE            47.81150    R-Square    0.1036
Dependent Mean     148.32250    Adj R-Sq    0.0538
Coeff Var           32.23483

Parameter Estimates
Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept     1              48.17674          70.24191       0.69      0.5015
x             1               0.11767           0.08157       1.44      0.1663

The REG Procedure -- Model: littlevalues (Dependent Variable: y)

Analysis of Variance
Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              1        8370.95124     8370.95124      12.27    0.0025
Error             18             12284      682.45537
Corrected Total   19             20655

Root MSE            26.12385    R-Square    0.4053
Dependent Mean     112.30350    Adj R-Sq    0.3722
Coeff Var           23.26183

Parameter Estimates
Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept     1              12.93884          28.96658       0.45      0.6604
x             1               0.18234           0.05206       3.50      0.0025
GQ = 2285.93938 / 682.45537 ≈ 3.35 > Fc = 2.22 (see SAS) ⇒ reject Ho and conclude that the error is heteroskedastic.
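The same comparison can be scripted directly; this is a minimal sketch, assuming the two mean square errors reported above and SAS's FINV function for the F critical value (the data set name gqcheck is illustrative).

data gqcheck;
  sighat1_sq = 2285.93938;        /* MSE from the bigvalues regression    */
  sighat2_sq = 682.45537;         /* MSE from the littlevalues regression */
  gq = sighat1_sq / sighat2_sq;   /* GQ statistic, about 3.35 */
  fc = finv(0.95, 18, 18);        /* 5% critical value of F(18,18), about 2.22 */
  reject = (gq > fc);             /* 1 = reject Ho */
  put gq= fc= reject=;
run;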
13. How Do We Correct for a Heteroskedastic Error?
- 1) White standard errors: the correct formula for the variance of b2 is
  Var(b2) = [Σ(x_t - x̄)² σ_t²] / [Σ(x_t - x̄)²]²
- Estimate σ_t² in the above formula using the squared residual for each observation as the estimate of its variance.
- This gives us what is called White's estimator of the variance of b2.
- In SAS: PROC REG; MODEL y = x / ACOV; RUN;
- Food expenditure example:
  - White standard error: se(b2) = 0.0382
  - Typical least squares: se(b2) = 0.0305
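Spelled out as a complete call, a minimal sketch for the food expenditure model follows; the HCC option, which prints heteroskedasticity-consistent standard errors next to the usual ones, is assumed to be available in the SAS release at hand and is not mentioned on the slide.

proc reg data=food;
  model y = x / acov hcc;   /* acov: White covariance matrix;
                               hcc: heteroskedasticity-consistent std errors */
run;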
14. 2) Generalized Least Squares
- Idea: transform the model with a heteroskedastic error into a model with a homoskedastic error. Then do least squares.
  y_t = β1 + β2 x_t + e_t,  where Var(e_t) = σ_t²
Suppose we knew σ_t. Transform the model by dividing every piece of it by the standard deviation of the error:
  y_t/σ_t = β1 (1/σ_t) + β2 (x_t/σ_t) + e_t/σ_t
This model has an error with a constant variance:
  Var(e_t/σ_t) = (1/σ_t²) Var(e_t) = 1
15. 2) Generalized Least Squares (cont.)
- Problem: we don't know σ_t. This requires us to assume a specification for the error variance. Let's assume that the variance increases linearly with x_t:
  Var(e_t) = σ_t² = σ² x_t
Transform the model by dividing every piece of it by √x_t, which is proportional to the standard deviation of the error:
  y_t/√x_t = β1 (1/√x_t) + β2 (x_t/√x_t) + e_t/√x_t
16. This new model has an error term that is the original error term divided by the square root of x_t. Its variance is constant:
  Var(e_t/√x_t) = (1/x_t) Var(e_t) = (1/x_t) σ² x_t = σ²
- This method is called Weighted Least Squares.
- It is more efficient than Least Squares.
- Least Squares gives equal weight to all observations.
- Weighted Least Squares gives each observation a weight that is inversely related to √x_t, so large values of x_t, which we have assumed have a large error variance, get less weight than smaller values of x_t when estimating the intercept and slope of the regression line.
17. We need to estimate this model:
  y_t/√x_t = β1 (1/√x_t) + β2 (x_t/√x_t) + e_t/√x_t
This requires us to construct 3 new variables:
  y_t* = y_t/√x_t,  x_t1* = 1/√x_t,  x_t2* = x_t/√x_t
We estimate this model:
  y_t* = β1 x_t1* + β2 x_t2* + e_t*
Notice that it doesn't have an intercept.
18. SAS code to do Weighted Least Squares

data foodgls;    /* a DATA step like this is assumed; it builds the transformed variables from the food data */
  set food;
  ystar  = y/sqrt(x);
  x1star = 1/sqrt(x);
  x2star = x/sqrt(x);
run;
proc reg data=foodgls;
  foodgls: model ystar = x1star x2star / noint;
run;
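An equivalent route, sketched here as an alternative rather than the method on the slide, is PROC REG's WEIGHT statement: minimizing Σ w_t e_t² with w_t = 1/x_t reproduces the same estimates as the transformed regression above (the data set name foodw is illustrative).

data foodw;
  set food;
  w = 1/x;        /* weight inversely proportional to the assumed error variance */
run;

proc reg data=foodw;
  weight w;
  model y = x;    /* same b1 and b2 as the ystar/x1star/x2star regression */
run;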