Title: Chapter 4: Finite Sample Properties of Least Squares
- Assumptions from previous chapters:
- Linearity
- Full rank
- Exogeneity of the independent variables
- Homoscedasticity and nonautocorrelation (spherical disturbances)
- Exogenously generated data
- Normal distribution
4.2.1 Motivating Least Squares
- By assumption 3, E[ε|x] = 0, so Cov[x, ε] = 0.
- => E_x[E[xε|x]] = E_x[E[x(y - x'β)|x]] = 0
- => E_x[E_{y|x}[xy]] = E[xx']β
- Knowing that X'y = X'Xb and dividing by n yields
- (1/n) Σ_i x_i y_i = [(1/n) Σ_i x_i x_i'] b
- So by using least squares, the relationship that holds in the population is imitated in the sample.
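As a quick numerical illustration (a minimal sketch with simulated data; the variable names and the use of NumPy are my own assumptions, not part of the slides), the normal equations X'y = X'Xb imply that the sample moments (1/n) Σ x_i y_i and (1/n) Σ x_i x_i' reproduce the population moment condition:

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 1_000, 3
beta = np.array([1.0, 0.5, -2.0])

X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])  # constant + regressors
eps = rng.normal(size=n)                                        # disturbances, E[eps|x] = 0
y = X @ beta + eps

b = np.linalg.solve(X.T @ X, X.T @ y)   # least squares: solves X'Xb = X'y

lhs = (X.T @ y) / n          # (1/n) sum x_i y_i
rhs = (X.T @ X) / n @ b      # [(1/n) sum x_i x_i'] b
print(np.allclose(lhs, rhs))  # True: the sample imitates the population relationship
```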
4.2.2 Motivating Least Squares
- Another way to obtain the same result is to look for the coefficient vector γ that minimizes the expected squared error of a linear predictor; this minimizer is estimated by the LSE (see the worked derivation below).
- MSE = E_y E_x[(y - x'γ)²]
- where x'γ is the minimum mean squared error linear predictor of y.
- ∂MSE/∂γ = -2 E_y E_x[x(y - x'γ)] = 0
- => E_x[E_{y|x}[xy]] = E[xx']γ
- which is the same as the equation on the previous slide with γ = β, so it yields the same conclusion.
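A worked version of the first-order condition (my own expansion of the step sketched above; notation follows the slide, with γ the candidate coefficient vector):

```latex
\begin{aligned}
\text{MSE}(\gamma) &= \mathbb{E}\big[(y - \mathbf{x}'\gamma)^2\big]
  = \mathbb{E}[y^2] - 2\,\gamma'\mathbb{E}[\mathbf{x}y] + \gamma'\mathbb{E}[\mathbf{x}\mathbf{x}']\gamma \\
\frac{\partial\,\text{MSE}}{\partial \gamma}
  &= -2\,\mathbb{E}[\mathbf{x}y] + 2\,\mathbb{E}[\mathbf{x}\mathbf{x}']\gamma = 0
  \;\Longrightarrow\; \mathbb{E}[\mathbf{x}\mathbf{x}']\gamma = \mathbb{E}[\mathbf{x}y]
\end{aligned}
```

so the minimizer is γ = (E[xx'])^{-1} E[xy] = β, the same moment condition as on the previous slide.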
4.2.2 Motivating Least Squares (continued)
- Theorem 4.1
- The minimum expected squared error linear predictor of y_i is estimated by the least squares regression line, provided the law of large numbers can be applied to the sample moments
- (1/n) Σ_i x_i y_i and (1/n) Σ_i x_i x_i'
- which are the estimators of the moments in E_x[E_{y|x}[xy]] = E[xx']β.
- LLN: the sample mean converges to the population mean if the population variance is finite.
4.3 Unbiased Estimation
- We know b = (X'X)^{-1}X'y and y = Xβ + ε
- So b = β + (X'X)^{-1}X'ε
- Then b is an unbiased estimator of β:
- E[b] = E_x[E[b|X]]
- = E_x[E[β + (X'X)^{-1}X'ε | X]]
- = E_x[β + (X'X)^{-1}X' E[ε|X]]
- By assumption 3, E[ε_i | x_{j1}, ..., x_{jK}] = 0, so
- E[b] = E_x[β] = β
- This holds for any sample size n and any distribution of ε!
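A small Monte Carlo sketch of this result (my own illustration with simulated data; nothing here is prescribed by the slides): across repeated samples the average of b is close to β even when the disturbances are not normal.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 200, 5_000
beta = np.array([2.0, -1.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # regressors held fixed across replications

draws = np.empty((reps, 2))
for r in range(reps):
    eps = rng.exponential(1.0, size=n) - 1.0   # non-normal disturbances with E[eps] = 0
    y = X @ beta + eps
    draws[r] = np.linalg.solve(X.T @ X, X.T @ y)

print(draws.mean(axis=0))   # close to [2.0, -1.0]: b is unbiased whatever the error distribution
```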
4.4.1 Variance of the LSE
- If the regressors are nonstochastic:
- => the sampling variance of the LSE can be derived by treating X as a matrix of constants.
- If the regressors are stochastic:
- => the sampling variance of the LSE can be derived by taking the conditional variance Var[b|X] and then averaging over X.
4.4.1 Variance of the LSE (continued)
- b = β + (X'X)^{-1}X'ε = β + Aε
- Var[b|X] = E[(b - β)(b - β)'|X]
- = E[(X'X)^{-1}X'εε'X(X'X)^{-1}|X]
- = (X'X)^{-1}X' E[εε'|X] X(X'X)^{-1}
- = (X'X)^{-1}X'(σ²I)X(X'X)^{-1}
- Var[b|X] = σ²(X'X)^{-1}
- As a result, b is a best linear unbiased estimator of β.
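A quick simulation check of this formula (my own sketch, with made-up dimensions and σ): holding X fixed, the Monte Carlo covariance of b matches σ²(X'X)^{-1}.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps, sigma = 100, 20_000, 0.5
beta = np.array([1.0, 3.0])
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, size=n)])

XtX_inv = np.linalg.inv(X.T @ X)
bs = np.empty((reps, 2))
for r in range(reps):
    y = X @ beta + sigma * rng.normal(size=n)
    bs[r] = XtX_inv @ X.T @ y

print(np.cov(bs, rowvar=False))    # Monte Carlo Var[b|X]
print(sigma**2 * XtX_inv)          # theoretical sigma^2 (X'X)^{-1}: the two agree closely
```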
4.4.2 Gauss-Markov Theorem
- Theorem 4.2
- In the classical regression model the LSE b is the best linear unbiased estimator (BLUE) of β: b has the minimum variance among all linear unbiased estimators of β.
4.4.2 Gauss-Markov Theorem: Proof of Theorem 4.2
- Consider any other linear estimator w of β, where w = Cy, and suppose w is unbiased.
- Then E[Cy|X] = E[CXβ + Cε|X] = CXβ = β, because y = Xβ + ε and E[ε|X] = 0.
- So unbiasedness means CX = I.
- Var[w|X] = E[(w - β)(w - β)'|X]
- = E[Cεε'C'|X]
- = C(σ²I)C' = σ²CC'
4.4.2 Gauss-Markov Theorem: Proof of Theorem 4.2 (continued)
- Let's define D = C - (X'X)^{-1}X'
- This means DX = CX - I = I - I = 0
- Then Dy = Cy - (X'X)^{-1}X'y = w - b
- Now we can show that the conditional variance of w is larger than or equal to that of b:
- Var[w|X] = σ²CC'
- = σ²(D + (X'X)^{-1}X')(D + (X'X)^{-1}X')'
- = σ²DD' + σ²(X'X)^{-1} (because DX = 0)
- ≥ σ²(X'X)^{-1} = Var[b|X]
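To make the theorem concrete, here is a small numerical sketch (my own construction, not from the slides): a weighted estimator w = (X'WX)^{-1}X'Wy with an arbitrary positive weight matrix W is still linear and unbiased (its C matrix satisfies CX = I), but under spherical disturbances its conditional variance exceeds that of OLS, i.e. Var[w|X] - Var[b|X] is positive semidefinite.

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma = 80, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=n)])

C_ols = np.linalg.inv(X.T @ X) @ X.T            # OLS: C = (X'X)^{-1} X'
W = np.diag(rng.uniform(0.2, 5.0, size=n))      # arbitrary positive weights, W != I
C_w = np.linalg.inv(X.T @ W @ X) @ X.T @ W      # another linear unbiased estimator

print(np.allclose(C_w @ X, np.eye(2)))          # True: CX = I, so w is unbiased

var_ols = sigma**2 * C_ols @ C_ols.T            # sigma^2 (X'X)^{-1}
var_w = sigma**2 * C_w @ C_w.T                  # sigma^2 C C'
diff = var_w - var_ols
print(np.linalg.eigvalsh(diff) >= -1e-10)       # all True: the difference is positive semidefinite
```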
4.5 The Implications of Stochastic Regressors
- Theorem 4.3
- In the classical linear regression model, the LSE b is the minimum variance linear unbiased estimator of β. This holds as long as the six assumptions hold, whether X is stochastic or nonstochastic.
- To prove the theorem we must show that
- b is unbiased unconditionally, and
- b has minimum unconditional variance among linear unbiased estimators.
4.5 The Implications of Stochastic Regressors: Proof of Theorem 4.3
- We already proved b is unbiased:
- E[b] = E_x[E[b|X]] = E_x[β + (X'X)^{-1}X' E[ε|X]] = β
- We already know Var[b|X] = σ²(X'X)^{-1}
- E[b|X] = β is a constant
- So now we can determine the unconditional variance of b:
- Var[b] = E_x[Var[b|X]] + Var_x[E[b|X]]
- = E_x[σ²(X'X)^{-1}] + Var_x[β]
- = σ² E_x[(X'X)^{-1}] + 0
4.5 The Implications of Stochastic Regressors: Proof of Theorem 4.3 (continued)
- To prove that b has the minimum variance among all linear unbiased estimators, consider any other linear unbiased estimator w of β:
- Var[b] = E_x[Var[b|X]]
- ≤ E_x[Var[w|X]] = Var[w]
- The inequality holds because Var[w|X] ≥ Var[b|X], which was proved earlier.
- So the LSE is a minimum variance linear unbiased estimator of β.
4.6 Estimating σ²
- To test hypotheses about β or to form confidence intervals we need a sample estimate of Var[b|X] = σ²(X'X)^{-1}.
- We have to find an unbiased estimator of σ² = Var[ε_i] = E[ε_i²].
- We can write the residual vector e (the estimator of ε) as
- e = My = M(Xβ + ε) = Mε (because MX = 0), where M = I_n - X(X'X)^{-1}X'
- => e'e = ε'Mε, which is a scalar (1x1).
4.6 Estimating σ² (continued)
- The trace of a K×K matrix A is tr(A) = Σ_{i=1}^{K} a_ii (the sum of the diagonal elements).
- Since ε'Mε is a scalar, ε'Mε = tr(ε'Mε) = tr(Mεε').
- E[e'e|X] = E[ε'Mε|X] = E[tr(Mεε')|X]
- = tr(M E[εε'|X]) = tr(Mσ²I) = σ² tr(M)
- tr(M) = tr(I_n - X(X'X)^{-1}X')
- = tr(I_n) - tr(X(X'X)^{-1}X') = tr(I_n) - tr(I_K) = n - K
- (X is n×K and M is n×n)
4.6 Estimating σ² (continued)
- But this means that E[e'e|X] = (n - K)σ², so e'e itself is not an unbiased estimator of σ².
- So we construct an unbiased estimator:
- s² = e'e / (n - K) => E[s²|X] = σ²
- We call s = √s² the standard error of the regression.
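A minimal sketch of the estimator in code (my own example with simulated data): the residual sum of squares divided by n - K gives s², and s²(X'X)^{-1} gives the estimated covariance matrix of b.

```python
import numpy as np

rng = np.random.default_rng(4)
n, K, sigma = 120, 3, 2.0
beta = np.array([1.0, 0.5, -0.5])
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ beta + sigma * rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b                        # residuals e = My
s2 = e @ e / (n - K)                 # unbiased estimator of sigma^2
cov_b = s2 * np.linalg.inv(X.T @ X)  # estimated Var[b|X]
se_b = np.sqrt(np.diag(cov_b))       # standard errors of the coefficients

print(s2, np.sqrt(s2))               # s^2 and the standard error of the regression
print(se_b)
```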
4.7 Normality Assumption and Statistical Inference
- Earlier we defined b as a linear function of ε:
- b = β + (X'X)^{-1}X'ε, where E[b] = β and Var[b|X] = σ²(X'X)^{-1},
- and we assumed that ε has a normal distribution.
- => b|X ~ N[β, σ²(X'X)^{-1}] and b_k|X ~ N[β_k, σ²S^{kk}]
- where S^{kk} denotes the k-th diagonal element of (X'X)^{-1}. We will base statistical inference about β_k on the statistic z_k, obtained by standardizing b_k.
4.7.1 Testing Hypotheses about Coefficients
- We get z_k = (b_k - β_k) / √(σ²S^{kk}) ~ N[0, 1]
- But this holds only when σ² is known. If σ² is not known, we must use s² = e'e/(n - K) instead of σ².
- What is the distribution of
- t_k = (b_k - β_k) / √(s²S^{kk}) ?
- To find the distribution of this statistic we note that z_k ~ N[0, 1] and that s²/σ² = [e'e/(n - K)] / σ².
4.7.1 Testing Hypotheses about Coefficients (continued)
- Now we can write e'e/σ² = (ε/σ)'M(ε/σ) (because e'e = ε'Mε).
- Since ε/σ ~ N[0, I] and M is idempotent with rank n - K, we see that e'e/σ² ~ χ²[n - K].
- This means that s²/σ² = [e'e/(n - K)]/σ² is a χ²[n - K] variate divided by its degrees of freedom, and that t_k ~ t[n - K] (the ratio of an N[0, 1] variate to the square root of an independent χ²[n - K] variate divided by n - K).
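A short sketch of the test in practice (my own example; the null value β_k = 0 and the data are assumptions for illustration): compute t_k and compare it with the t[n - K] distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, K = 120, 3
beta = np.array([1.0, 0.5, 0.0])              # third coefficient is truly zero
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ beta + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b
s2 = e @ e / (n - K)
se = np.sqrt(s2 * np.diag(XtX_inv))           # sqrt(s^2 S^kk)

t_stats = b / se                              # t_k for H0: beta_k = 0
p_values = 2 * stats.t.sf(np.abs(t_stats), df=n - K)
print(t_stats, p_values)                      # the third p-value is typically large
```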
4.7.2 Confidence Intervals
- We base confidence intervals for β_k on t_k and define them as
- P(b_k - t_{α/2} s_{b_k} ≤ β_k ≤ b_k + t_{α/2} s_{b_k}) = 1 - α
- where s_{b_k} = √(s²S^{kk}), 1 - α is the desired confidence level, and t_{α/2} is the appropriate critical value from the t distribution with (n - K) degrees of freedom.
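Continuing the same kind of sketch (simulated data; the 95% level is an assumed choice), the interval for each coefficient follows directly from the formula above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, K = 120, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b
s2 = e @ e / (n - K)
se = np.sqrt(s2 * np.diag(XtX_inv))            # s_{b_k} = sqrt(s^2 S^kk)

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - K)  # critical value t_{alpha/2} with n - K d.o.f.
ci = np.column_stack([b - t_crit * se, b + t_crit * se])
print(ci)                                      # each row: [lower, upper] bound for beta_k
```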
4.7.4 Testing the Significance of the Regression
- When we test whether the regression as a whole is significant, we test the hypothesis that all coefficients except the constant term are zero.
- We can use R² for this:
- F[K - 1, n - K] = [R²/(K - 1)] / [(1 - R²)/(n - K)]
- This statistic has an F distribution under the null hypothesis.
4.7.4 Testing the Significance of the Regression (continued)
- This means that large values of F give evidence against the validity of the hypothesis.
- Large values of F correspond to large values of R².
- So the F statistic measures the loss of fit that results from setting all coefficients except the constant equal to zero; when F is large, the loss of fit is large and the regression is significant.
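A compact sketch of the overall F test (my own example with simulated data): compute R² from the fitted regression and plug it into the formula above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, K = 150, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 0.8, 0.0, -0.3]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
R2 = 1 - (e @ e) / np.sum((y - y.mean()) ** 2)   # coefficient of determination

F = (R2 / (K - 1)) / ((1 - R2) / (n - K))        # F[K-1, n-K] under H0: all slopes are zero
p_value = stats.f.sf(F, K - 1, n - K)
print(R2, F, p_value)                            # small p-value: the regression is significant
```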
4.9 Data Problems
- Multicollinearity: the variables in the data are (highly) correlated.
- Missing observations: the data set is incomplete.
- Influential data points: data points that are inconsistent with the rest of the data can bias the least squares estimator.
4.9.1 Effects of Multicollinearity
- Small changes in the data produce wide swings in the parameter estimates.
- Coefficients have high standard errors and low significance levels, even though the F statistic for the regression as a whole is significant.
- Coefficients have implausible signs.
- How to solve multicollinearity (see the ridge sketch below):
- Drop the regressor that has the highest correlation with the other regressors (this might cause other problems).
- Use the ridge estimator (it is biased, but it has a smaller covariance matrix).
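A minimal sketch of the ridge estimator mentioned above (my own illustration; the penalty value λ is an arbitrary choice, not a recommendation): b_ridge = (X'X + λI)^{-1}X'y shrinks the coefficients and stabilizes them when X'X is nearly singular.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)              # nearly collinear with x1
X = np.column_stack([np.ones(n), x1, x2])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

b_ols = np.linalg.solve(X.T @ X, X.T @ y)        # unstable under near-collinearity

lam = 1.0                                        # ridge penalty (arbitrary illustrative value)
b_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

print(b_ols)     # typically large, offsetting coefficients on x1 and x2
print(b_ridge)   # biased toward zero, but much less sensitive to small changes in the data
```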
4.9.2 Missing Observations
- Two cases:
- Data is unavailable.
- There are gaps in the data set.
- Data is unavailable: the data is representative of the population, but there is a frequency mismatch (e.g. the data is available quarterly when monthly data is needed).
- Gaps in the data set: the data is not representative of the population (some groups of the population are missing from the data).
- How to solve (a small sketch follows this list):
- Replace missing observations with an imputed (fill-in) value.
- Add a variable that has the value 1 when the data point is missing (and 0 otherwise).
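A small sketch of the fill-in-plus-dummy idea (my own construction; filling with the observed mean is one common but assumed choice): impute the missing regressor values and add an indicator that flags the imputed rows.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

missing = rng.random(n) < 0.15                  # 15% of x is missing (illustrative)
x_obs = np.where(missing, np.nan, x)

x_filled = np.where(missing, np.nanmean(x_obs), x_obs)   # impute with the observed mean
d_missing = missing.astype(float)                        # dummy = 1 where x was imputed

X = np.column_stack([np.ones(n), x_filled, d_missing])
b = np.linalg.solve(X.T @ X, X.T @ y)
print(b)    # coefficients on the constant, the (partly imputed) regressor, and the missing dummy
```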
4.9.3 Influential Data Points
- Identification of outliers: data points that seem inconsistent with the rest of the data.
- How to solve:
- Outliers can be considered for removal from the data, but the consequences of doing so should be considered carefully.
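One simple way to flag candidate outliers (my own sketch; the residual-based rule and the threshold of 3 are assumptions for illustration, not the slides' prescription): look for observations with unusually large standardized residuals, then judge case by case whether removal is justified.

```python
import numpy as np

rng = np.random.default_rng(10)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
y[5] += 10.0                                     # plant one outlying observation

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
s = np.sqrt(e @ e / (n - 2))                     # standard error of the regression
z = e / s                                        # crude standardized residuals

flagged = np.where(np.abs(z) > 3)[0]             # threshold of 3 is an arbitrary rule of thumb
print(flagged)                                   # index 5 should appear here for closer inspection
```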