Title: Chapter 4: Finite Sample Properties of Least Squares
- Assumptions from previous chapters:
- Linearity
- Full rank
- Exogeneity of the independent variables
- Homoscedasticity and nonautocorrelation (spherical disturbances)
- Exogenously generated data
- Normal distribution
4.2.1 Motivating Least Squares
- By assumption 3, E[ε|x] = 0, so Cov[x, ε] = 0.
- => E_x[E[xε|x]] = E_x[E[x(y - x'β)|x]] = 0
- => E_x[E_{y|x}[xy]] = E[xx']β
- Knowing that X'y = X'Xb and dividing by n yields
- (1/n) Σ_i x_i y_i = [(1/n) Σ_i x_i x_i'] b
- So by using least squares, the relationship that holds in the population is imitated in the sample.
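As a quick numerical illustration (a minimal sketch with simulated data; the variable names and the use of NumPy are my own assumptions, not part of the slides), the normal equations X'y = X'Xb imply that the sample moments (1/n) Σ x_i y_i and (1/n) Σ x_i x_i' reproduce the population moment condition:

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 1_000, 3
beta = np.array([1.0, 0.5, -2.0])

X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])  # constant + regressors
eps = rng.normal(size=n)                                        # disturbances, E[eps|x] = 0
y = X @ beta + eps

b = np.linalg.solve(X.T @ X, X.T @ y)   # least squares: solves X'Xb = X'y

lhs = (X.T @ y) / n          # (1/n) sum x_i y_i
rhs = (X.T @ X) / n @ b      # [(1/n) sum x_i x_i'] b
print(np.allclose(lhs, rhs))  # True: the sample imitates the population relationship
```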
4.2.2 Motivating Least Squares
- Another way to obtain the same result is to look for the coefficient vector γ that minimizes the expected squared error of a linear predictor; this minimizer is estimated by the LSE (see the worked derivation below).
- MSE = E_y E_x[(y - x'γ)²]
- where x'γ is the minimum mean squared error linear predictor of y.
- ∂MSE/∂γ = -2 E_y E_x[x(y - x'γ)] = 0
- => E_x[E_{y|x}[xy]] = E[xx']γ
- which is the same as the equation on the previous slide with γ = β, so it yields the same conclusion.
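A worked version of the first-order condition (my own expansion of the step sketched above; notation follows the slide, with γ the candidate coefficient vector):

```latex
\begin{aligned}
\text{MSE}(\gamma) &= \mathbb{E}\big[(y - \mathbf{x}'\gamma)^2\big]
  = \mathbb{E}[y^2] - 2\,\gamma'\mathbb{E}[\mathbf{x}y] + \gamma'\mathbb{E}[\mathbf{x}\mathbf{x}']\gamma \\
\frac{\partial\,\text{MSE}}{\partial \gamma}
  &= -2\,\mathbb{E}[\mathbf{x}y] + 2\,\mathbb{E}[\mathbf{x}\mathbf{x}']\gamma = 0
  \;\Longrightarrow\; \mathbb{E}[\mathbf{x}\mathbf{x}']\gamma = \mathbb{E}[\mathbf{x}y]
\end{aligned}
```

so the minimizer is γ = (E[xx'])^{-1} E[xy] = β, the same moment condition as on the previous slide.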
4.2.2 Motivating Least Squares (continued)
- Theorem 4.1
- The minimum expected squared error linear predictor of y_i is estimated by the least squares regression line, provided the law of large numbers can be applied to the sample moments
- (1/n) Σ_i x_i y_i and (1/n) Σ_i x_i x_i'
- which are the estimators of the moments in E_x[E_{y|x}[xy]] = E[xx']β.
- LLN: the sample mean converges to the population mean if the population variance is finite.
4.3 Unbiased Estimation
- We know b = (X'X)^{-1}X'y and y = Xβ + ε
- So b = β + (X'X)^{-1}X'ε
- Then b is an unbiased estimator of β:
- E[b] = E_x[E[b|X]]
- = E_x[E[β + (X'X)^{-1}X'ε | X]]
- = E_x[β + (X'X)^{-1}X' E[ε|X]]
- By assumption 3, E[ε_i | x_{j1}, ..., x_{jK}] = 0, so
- E[b] = E_x[β] = β
- This holds for any sample size n and any distribution of ε!
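A small Monte Carlo sketch of this result (my own illustration with simulated data; nothing here is prescribed by the slides): across repeated samples the average of b is close to β even when the disturbances are not normal.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 200, 5_000
beta = np.array([2.0, -1.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # regressors held fixed across replications

draws = np.empty((reps, 2))
for r in range(reps):
    eps = rng.exponential(1.0, size=n) - 1.0   # non-normal disturbances with E[eps] = 0
    y = X @ beta + eps
    draws[r] = np.linalg.solve(X.T @ X, X.T @ y)

print(draws.mean(axis=0))   # close to [2.0, -1.0]: b is unbiased whatever the error distribution
```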
4.4.1 Variance of the LSE
- If the regressors are nonstochastic:
- => the sampling variance of the LSE can be derived by treating X as a matrix of constants.
- If the regressors are stochastic:
- => the sampling variance of the LSE can be derived by taking the conditional variance Var[b|X] and then averaging over X.
4.4.1 Variance of the LSE (continued)
- b = β + (X'X)^{-1}X'ε = β + Aε
- Var[b|X] = E[(b - β)(b - β)'|X]
- = E[(X'X)^{-1}X'εε'X(X'X)^{-1}|X]
- = (X'X)^{-1}X' E[εε'|X] X(X'X)^{-1}
- = (X'X)^{-1}X'(σ²I)X(X'X)^{-1}
- Var[b|X] = σ²(X'X)^{-1}
- As a result, b is a best linear unbiased estimator of β.
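A quick simulation check of this formula (my own sketch, with made-up dimensions and σ): holding X fixed, the Monte Carlo covariance of b matches σ²(X'X)^{-1}.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps, sigma = 100, 20_000, 0.5
beta = np.array([1.0, 3.0])
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, size=n)])

XtX_inv = np.linalg.inv(X.T @ X)
bs = np.empty((reps, 2))
for r in range(reps):
    y = X @ beta + sigma * rng.normal(size=n)
    bs[r] = XtX_inv @ X.T @ y

print(np.cov(bs, rowvar=False))    # Monte Carlo Var[b|X]
print(sigma**2 * XtX_inv)          # theoretical sigma^2 (X'X)^{-1}: the two agree closely
```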
4.4.2 Gauss-Markov Theorem
- Theorem 4.2
- In the classical regression model the LSE b is the best linear unbiased estimator (BLUE) of β: b has the minimum variance among all linear unbiased estimators of β.
4.4.2 Gauss-Markov Theorem: Proof of Theorem 4.2
- Consider any other linear estimator w of β, where w = Cy, and suppose w is unbiased.
- Then E[Cy|X] = E[CXβ + Cε|X] = CXβ = β, because y = Xβ + ε and E[ε|X] = 0.
- So unbiasedness means CX = I.
- Var[w|X] = E[(w - β)(w - β)'|X]
- = E[Cεε'C'|X]
- = C(σ²I)C' = σ²CC'
4.4.2 Gauss-Markov Theorem: Proof of Theorem 4.2 (continued)
- Let's define D = C - (X'X)^{-1}X'
- This means DX = CX - I = I - I = 0
- Then Dy = Cy - (X'X)^{-1}X'y = w - b
- Now we can show that the conditional variance of w is larger than or equal to that of b:
- Var[w|X] = σ²CC'
- = σ²(D + (X'X)^{-1}X')(D + (X'X)^{-1}X')'
- = σ²DD' + σ²(X'X)^{-1} (because DX = 0)
- ≥ σ²(X'X)^{-1} = Var[b|X]
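To make the theorem concrete, here is a small numerical sketch (my own construction, not from the slides): a weighted estimator w = (X'WX)^{-1}X'Wy with an arbitrary positive weight matrix W is still linear and unbiased (its C matrix satisfies CX = I), but under spherical disturbances its conditional variance exceeds that of OLS, i.e. Var[w|X] - Var[b|X] is positive semidefinite.

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma = 80, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=n)])

C_ols = np.linalg.inv(X.T @ X) @ X.T            # OLS: C = (X'X)^{-1} X'
W = np.diag(rng.uniform(0.2, 5.0, size=n))      # arbitrary positive weights, W != I
C_w = np.linalg.inv(X.T @ W @ X) @ X.T @ W      # another linear unbiased estimator

print(np.allclose(C_w @ X, np.eye(2)))          # True: CX = I, so w is unbiased

var_ols = sigma**2 * C_ols @ C_ols.T            # sigma^2 (X'X)^{-1}
var_w = sigma**2 * C_w @ C_w.T                  # sigma^2 C C'
diff = var_w - var_ols
print(np.linalg.eigvalsh(diff) >= -1e-10)       # all True: the difference is positive semidefinite
```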
4.5 The Implications of Stochastic Regressors
- Theorem 4.3
- In the classical linear regression model, the LSE b is the minimum variance linear unbiased estimator of β. This holds as long as the six assumptions hold, whether X is stochastic or nonstochastic.
- To prove the theorem we must show that
- b is unbiased unconditionally, and
- b has minimum unconditional variance among linear unbiased estimators.
4.5 The Implications of Stochastic Regressors: Proof of Theorem 4.3
- We already proved b is unbiased:
- E[b] = E_x[E[b|X]] = E_x[β + (X'X)^{-1}X' E[ε|X]] = β
- We already know Var[b|X] = σ²(X'X)^{-1}
- E[b|X] = β is a constant
- So now we can determine the unconditional variance of b:
- Var[b] = E_x[Var[b|X]] + Var_x[E[b|X]]
- = E_x[σ²(X'X)^{-1}] + Var_x[β]
- = σ² E_x[(X'X)^{-1}] + 0
4.5 The Implications of Stochastic Regressors: Proof of Theorem 4.3 (continued)
- To prove that b has the minimum variance among all linear unbiased estimators, consider any other linear unbiased estimator w of β:
- Var[b] = E_x[Var[b|X]]
- ≤ E_x[Var[w|X]] = Var[w]
- The inequality holds because Var[w|X] ≥ Var[b|X], which was proved earlier.
- So the LSE is a minimum variance linear unbiased estimator of β.
4.6 Estimating σ²
- To test hypotheses about β or to form confidence intervals we need a sample estimate of Var[b|X] = σ²(X'X)^{-1}.
- We have to find an unbiased estimator of σ² = Var[ε_i] = E[ε_i²].
- We can write the residual vector e (the estimator of ε) as
- e = My = M(Xβ + ε) = Mε (because MX = 0), where M = I_n - X(X'X)^{-1}X'
- => e'e = ε'Mε, which is a scalar (1x1).
4.6 Estimating σ² (continued)
- The trace of a K×K matrix A is tr(A) = Σ_{i=1}^{K} a_ii (the sum of the diagonal elements).
- Since ε'Mε is a scalar, ε'Mε = tr(ε'Mε) = tr(Mεε').
- E[e'e|X] = E[ε'Mε|X] = E[tr(Mεε')|X]
- = tr(M E[εε'|X]) = tr(Mσ²I) = σ² tr(M)
- tr(M) = tr(I_n - X(X'X)^{-1}X')
- = tr(I_n) - tr(X(X'X)^{-1}X') = tr(I_n) - tr(I_K) = n - K
- (X is n×K and M is n×n)
4.6 Estimating σ² (continued)
- But this means that E[e'e|X] = (n - K)σ², so e'e itself is not an unbiased estimator of σ².
- So we construct an unbiased estimator:
- s² = e'e / (n - K) => E[s²|X] = σ²
- We call s = √s² the standard error of the regression.
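A minimal sketch of the estimator in code (my own example with simulated data): the residual sum of squares divided by n - K gives s², and s²(X'X)^{-1} gives the estimated covariance matrix of b.

```python
import numpy as np

rng = np.random.default_rng(4)
n, K, sigma = 120, 3, 2.0
beta = np.array([1.0, 0.5, -0.5])
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ beta + sigma * rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b                        # residuals e = My
s2 = e @ e / (n - K)                 # unbiased estimator of sigma^2
cov_b = s2 * np.linalg.inv(X.T @ X)  # estimated Var[b|X]
se_b = np.sqrt(np.diag(cov_b))       # standard errors of the coefficients

print(s2, np.sqrt(s2))               # s^2 and the standard error of the regression
print(se_b)
```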
4.7 Normality Assumption and Statistical Inference
- Earlier we defined b as a linear function of ε:
- b = β + (X'X)^{-1}X'ε, where E[b] = β and Var[b|X] = σ²(X'X)^{-1},
- and we assumed that ε has a normal distribution.
- => b|X ~ N[β, σ²(X'X)^{-1}] and b_k|X ~ N[β_k, σ²S^{kk}]
- where S^{kk} denotes the k-th diagonal element of (X'X)^{-1}. We will base statistical inference about β_k on the statistic z_k, obtained by standardizing b_k.
4.7.1 Testing Hypotheses about Coefficients
- We get z_k = (b_k - β_k) / √(σ²S^{kk}) ~ N[0, 1]
- But this holds only when σ² is known. If σ² is not known, we must use s² = e'e/(n - K) instead of σ².
- What is the distribution of
- t_k = (b_k - β_k) / √(s²S^{kk}) ?
- To find the distribution of this statistic we note that z_k ~ N[0, 1] and that s²/σ² = [e'e/(n - K)] / σ².
4.7.1 Testing Hypotheses about Coefficients (continued)
- Now we can write e'e/σ² = (ε/σ)'M(ε/σ) (because e'e = ε'Mε).
- Since ε/σ ~ N[0, I] and M is idempotent with rank n - K, we see that e'e/σ² ~ χ²[n - K].
- This means that s²/σ² = [e'e/(n - K)]/σ² is a χ²[n - K] variate divided by its degrees of freedom, and that t_k ~ t[n - K] (the ratio of an N[0, 1] variate to the square root of an independent χ²[n - K] variate divided by n - K).
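A short sketch of the test in practice (my own example; the null value β_k = 0 and the data are assumptions for illustration): compute t_k and compare it with the t[n - K] distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, K = 120, 3
beta = np.array([1.0, 0.5, 0.0])              # third coefficient is truly zero
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ beta + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b
s2 = e @ e / (n - K)
se = np.sqrt(s2 * np.diag(XtX_inv))           # sqrt(s^2 S^kk)

t_stats = b / se                              # t_k for H0: beta_k = 0
p_values = 2 * stats.t.sf(np.abs(t_stats), df=n - K)
print(t_stats, p_values)                      # the third p-value is typically large
```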
4.7.2 Confidence Intervals
- We base confidence intervals for β_k on t_k and define them as
- P(b_k - t_{α/2} s_{b_k} ≤ β_k ≤ b_k + t_{α/2} s_{b_k}) = 1 - α
- where s_{b_k} = √(s²S^{kk}), 1 - α is the desired confidence level, and t_{α/2} is the appropriate critical value from the t distribution with (n - K) degrees of freedom.
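Continuing the same kind of sketch (simulated data; the 95% level is an assumed choice), the interval for each coefficient follows directly from the formula above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, K = 120, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b
s2 = e @ e / (n - K)
se = np.sqrt(s2 * np.diag(XtX_inv))            # s_{b_k} = sqrt(s^2 S^kk)

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - K)  # critical value t_{alpha/2} with n - K d.o.f.
ci = np.column_stack([b - t_crit * se, b + t_crit * se])
print(ci)                                      # each row: [lower, upper] bound for beta_k
```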
4.7.4 Testing the Significance of the Regression
- When we test whether the regression as a whole is significant, we test the hypothesis that all coefficients except the constant term are zero.
- We can use R² for this:
- F[K - 1, n - K] = [R²/(K - 1)] / [(1 - R²)/(n - K)]
- This statistic has an F distribution under the null hypothesis.
4.7.4 Testing the Significance of the Regression (continued)
- This means that large values of F give evidence against the validity of the hypothesis.
- Large values of F correspond to large values of R².
- So the F statistic measures the loss of fit that results from setting all coefficients except the constant equal to zero; when F is large, the loss of fit is large and the regression is significant.
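A compact sketch of the overall F test (my own example with simulated data): compute R² from the fitted regression and plug it into the formula above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, K = 150, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 0.8, 0.0, -0.3]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
R2 = 1 - (e @ e) / np.sum((y - y.mean()) ** 2)   # coefficient of determination

F = (R2 / (K - 1)) / ((1 - R2) / (n - K))        # F[K-1, n-K] under H0: all slopes are zero
p_value = stats.f.sf(F, K - 1, n - K)
print(R2, F, p_value)                            # small p-value: the regression is significant
```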
4.9 Data Problems
- Multicollinearity: the variables in the data are (highly) correlated.
- Missing observations: the data set is incomplete.
- Influential data points: data points that are inconsistent with the rest of the data can bias the least squares estimator.
4.9.1 Effects of Multicollinearity
- Small changes in the data produce wide swings in the parameter estimates.
- Coefficients have high standard errors and low significance levels, even though the F statistic for the regression as a whole is significant.
- Coefficients have implausible signs.
- How to solve multicollinearity (see the ridge sketch below):
- Drop the regressor that has the highest correlation with the other regressors (this might cause other problems).
- Use the ridge estimator (it is biased, but it has a smaller covariance matrix).
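A minimal sketch of the ridge estimator mentioned above (my own illustration; the penalty value λ is an arbitrary choice, not a recommendation): b_ridge = (X'X + λI)^{-1}X'y shrinks the coefficients and stabilizes them when X'X is nearly singular.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)              # nearly collinear with x1
X = np.column_stack([np.ones(n), x1, x2])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

b_ols = np.linalg.solve(X.T @ X, X.T @ y)        # unstable under near-collinearity

lam = 1.0                                        # ridge penalty (arbitrary illustrative value)
b_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

print(b_ols)     # typically large, offsetting coefficients on x1 and x2
print(b_ridge)   # biased toward zero, but much less sensitive to small changes in the data
```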
4.9.2 Missing Observations
- Two cases:
- Data is unavailable.
- There are gaps in the data set.
- Data is unavailable: the data is representative of the population, but there is a frequency mismatch (e.g. the data is available quarterly when monthly data is needed).
- Gaps in the data set: the data is not representative of the population (some groups of the population are missing from the data).
- How to solve (a small sketch follows this list):
- Replace missing observations with an imputed (fill-in) value.
- Add a variable that has the value 1 when the data point is missing (and 0 otherwise).
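A small sketch of the fill-in-plus-dummy idea (my own construction; filling with the observed mean is one common but assumed choice): impute the missing regressor values and add an indicator that flags the imputed rows.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

missing = rng.random(n) < 0.15                  # 15% of x is missing (illustrative)
x_obs = np.where(missing, np.nan, x)

x_filled = np.where(missing, np.nanmean(x_obs), x_obs)   # impute with the observed mean
d_missing = missing.astype(float)                        # dummy = 1 where x was imputed

X = np.column_stack([np.ones(n), x_filled, d_missing])
b = np.linalg.solve(X.T @ X, X.T @ y)
print(b)    # coefficients on the constant, the (partly imputed) regressor, and the missing dummy
```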
4.9.3 Influential Data Points
- Identification of outliers: data points that seem inconsistent with the rest of the data.
- How to solve:
- Outliers can be considered for removal from the data, but the consequences of doing so should be considered carefully.
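One simple way to flag candidate outliers (my own sketch; the residual-based rule and the threshold of 3 are assumptions for illustration, not the slides' prescription): look for observations with unusually large standardized residuals, then judge case by case whether removal is justified.

```python
import numpy as np

rng = np.random.default_rng(10)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
y[5] += 10.0                                     # plant one outlying observation

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
s = np.sqrt(e @ e / (n - 2))                     # standard error of the regression
z = e / s                                        # crude standardized residuals

flagged = np.where(np.abs(z) > 3)[0]             # threshold of 3 is an arbitrary rule of thumb
print(flagged)                                   # index 5 should appear here for closer inspection
```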