Chapter 4: Finite Sample Properties of Least Squares

1
Chapter 4: Finite Sample Properties of Least Squares
  • Assumptions from previous chapters
  • Linearity
  • Full rank
  • Exogeneity of independent variables
  • Homoscedasticity and nonautocorrelation
    (spherical disturbances)
  • Exogenously generated data
  • Normal distribution

2
4.2.1 Motivating Least Squares
  • By Assumption 3, E[ε|x] = 0, so Cov[x, ε] = 0
  • ⇒ E_x[ E[xε|x] ] = E_x[ E[x(y − x'β)|x] ] = 0
  • ⇒ E_x[ E[xy|x] ] = E[xx']β
  • Knowing X'y = X'Xb and dividing by n yields
  • (1/n) Σ x_i y_i = ( (1/n) Σ x_i x_i' ) b
  • So by using least squares, the relationship in the population is
    imitated in the sample (see the numpy sketch below).

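The sample moment condition can be checked numerically. Below is a minimal numpy sketch, using simulated data and assumed coefficient values (both hypothetical), showing that the least squares vector b satisfies (1/n) Σ x_i y_i = ((1/n) Σ x_i x_i') b, i.e. X'y = X'Xb, exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])  # constant + 2 regressors
beta = np.array([1.0, 0.5, -2.0])                               # assumed "true" coefficients
y = X @ beta + rng.normal(size=n)                               # y = X beta + eps

b = np.linalg.solve(X.T @ X, X.T @ y)   # least squares: solves X'X b = X'y

lhs = X.T @ y / n                       # (1/n) * sum of x_i * y_i
rhs = (X.T @ X / n) @ b                 # ((1/n) * sum of x_i x_i') b
print(np.allclose(lhs, rhs))            # True: the population relation is imitated in the sample
```
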
3
4.2.2 Motivating Least Squares
  • Another way to obtain the same result is to look for the coefficient
    vector γ of the minimum mean squared error linear predictor x'γ of y.
    This estimator turns out to be the LSE.
  • MSE = E_y E_x[ (y − x'γ)² ]
  • where x'γ is the minimum mean squared error linear predictor of y
  • ∂MSE/∂γ = −2 E_x[ x (E[y|x] − x'γ) ] = 0
  • ⇒ E_x[ E[xy|x] ] = E[xx']γ
  • which is the same as the equation on the previous slide with γ in
    place of β, so it yields the same conclusion.

4
4.2.2 Motivating Least Squares
  • Theorem 4.1
  • The minimum expected squared error linear predictor of y_i can be
    estimated by the least squares regression line if the law of large
    numbers can be applied to the sample moments
  • (1/n) Σ x_i y_i and (1/n) Σ x_i x_i'
  • as estimators of the population moments in E_x[ E[xy|x] ] = E[xx']β.
  • LLN: the sample mean converges to the population mean if the
    population variance is finite.

5
4.3 Unbiased Estimation
  • We know b = (X'X)⁻¹X'y and y = Xβ + ε
  • So b = β + (X'X)⁻¹X'ε
  • Then b will be an unbiased estimator of β:
  • E[b] = E_x[ E[b|X] ]
  •      = E_x[ E[β + (X'X)⁻¹X'ε | X] ]
  •      = E_x[ β + E[(X'X)⁻¹X'ε | X] ]
  • By Assumption 3, E[ε_i | x_j1, ..., x_jK] = 0, so
  • E[b] = E_x[β] = β
  • This holds for any sample size n and any distribution of ε!
    (A Monte Carlo illustration follows below.)

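A small Monte Carlo sketch of the unbiasedness result: with X held fixed and fresh disturbances drawn in every replication, the average of b across replications is close to β regardless of the disturbance distribution. The design, the true β, and the Student-t disturbances are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
n, R = 100, 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # fixed design: constant + one regressor
beta = np.array([2.0, -1.0])                             # assumed "true" coefficients

draws = np.empty((R, 2))
for r in range(R):
    eps = rng.standard_t(df=5, size=n)                   # non-normal, mean-zero disturbances
    y = X @ beta + eps
    draws[r] = np.linalg.solve(X.T @ X, X.T @ y)

print(draws.mean(axis=0))   # close to [2.0, -1.0]: E[b] = beta for any mean-zero disturbance
```
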
6
4.4.1 Variance of the LSE
  • If the regressors are nonstochastic:
  • ⇒ the sampling variance of the LSE can be derived by treating X as a
    matrix of constants.
  • If the regressors are stochastic:
  • ⇒ the sampling variance of the LSE can be derived by taking the
    conditional variance Var[b|X] and then averaging over X.

7
4.4.1 Variance of the LSE
  • b = β + (X'X)⁻¹X'ε = β + Aε
  • Var[b|X] = E[ (b − β)(b − β)' | X ]
  •          = E[ (X'X)⁻¹X'εε'X(X'X)⁻¹ | X ]
  •          = (X'X)⁻¹X' E[εε'|X] X(X'X)⁻¹
  •          = (X'X)⁻¹X' (σ²I) X(X'X)⁻¹
  • Var[b|X] = σ²(X'X)⁻¹
  • As a result, b will be a best linear unbiased estimator of β
    (see the numerical check below).

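As a rough numerical check (simulated data; σ and β are assumed values), the covariance matrix of b across Monte Carlo replications should approximate the theoretical sampling variance σ²(X'X)⁻¹ for fixed X.

```python
import numpy as np

rng = np.random.default_rng(2)
n, R, sigma = 100, 10000, 1.5
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 0.3])

theoretical = sigma**2 * np.linalg.inv(X.T @ X)          # Var[b|X] = sigma^2 (X'X)^-1

draws = np.empty((R, 2))
for r in range(R):
    y = X @ beta + rng.normal(scale=sigma, size=n)
    draws[r] = np.linalg.solve(X.T @ X, X.T @ y)

print(theoretical)
print(np.cov(draws, rowvar=False))                       # approximately equal to the line above
```
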
8
4.4.2 Gauss Markov Theorem
  • Theorem 4.2
    In the classical linear regression model, the LSE b is the best linear
    unbiased estimator (BLUE) of β: among all linear unbiased estimators
    of β, b has the minimum variance.

9
4.4.2 Gauss-Markov Theorem: Proof of Theorem 4.2
  • Consider any other linear estimator w of β,
  • where w = Cy, and suppose w is unbiased.
  • Then E[Cy|X] = E[CXβ + Cε | X] = β
  • because y = Xβ + ε
  • So this means CX = I
  • Var[w|X] = E[ (w − β)(w − β)' | X ]
  •          = E[ Cεε'C' | X ]
  •          = C (σ²I) C' = σ² CC'

10
4.4.2 Gauss-Markov Theorem: Proof of Theorem 4.2 (continued)
  • Let's define D = C − (X'X)⁻¹X'
  • This means DX = CX − I = I − I = 0
  • Then Dy = Cy − (X'X)⁻¹X'y = w − b
  • Now we can prove that the conditional variance of w is larger than or
    equal to that of b:
  • Var[w|X] = σ² CC'
  •          = σ² (D + (X'X)⁻¹X')(D + (X'X)⁻¹X')'
  •          = σ² DD' + σ²(X'X)⁻¹   (because DX = 0)
  •          ≥ σ²(X'X)⁻¹ = Var[b|X]
  • The inequality holds because DD' is positive semidefinite.

11
4.5 The implications of stochastic regressors
  • Theorem 4.3
    The LSE b of β is the minimum variance linear unbiased estimator of β
    in the classical linear regression model. This holds as long as the
    six assumptions hold, whether X is stochastic or non-stochastic.
  • To prove the theorem we must show that, unconditionally,
  • b is unbiased and
  • b has minimum variance among linear unbiased estimators.

12
4.5 The implications of stochastic regressors
Proof of Th.4.3
  • We already proved that b is unbiased:
  • E[b] = E_x[ E[b|X] ] = E_x[ β + E[(X'X)⁻¹X'ε | X] ] = β
  • We already know Var[b|X] = σ²(X'X)⁻¹
  • E[b|X] = β is a constant
  • So now we can determine the unconditional variance of b:
  • Var[b] = E_x[ Var[b|X] ] + Var_x[ E[b|X] ]
  •        = E_x[ σ²(X'X)⁻¹ ] + Var_x[β]
  •        = σ² E_x[ (X'X)⁻¹ ] + 0

13
4.5 The implications of stochastic regressors
Proof of Th.4.3
  • To prove that b has the minimum variance of all linear unbiased
    estimators, consider any other linear unbiased estimator w of β:
  • Var[b] = E_x[ Var[b|X] ]
  •        ≤ E_x[ Var[w|X] ] = Var[w]
  • The inequality holds because
  • Var[w|X] ≥ Var[b|X], which was proved earlier.
  • So the LSE is the minimum variance linear unbiased estimator of β.

14
4.6 Estimating σ²
  • To test hypotheses about β or to form confidence intervals we require
    a sample estimate of Var[b|X] = σ²(X'X)⁻¹
  • We have to find an unbiased estimator of σ² = Var[ε_i] = E[ε_i²]
  • We can write the residual vector e (the estimator of ε) as
  • e = My = M(Xβ + ε) = Mε   (since MX = 0)
  • ⇒ e'e = ε'Mε, which is a 1×1 matrix (a scalar).

15
4.6 Estimating σ²
  • The trace of a K×K matrix A is
  • tr(A) = Σ_{i=1..K} a_ii   (the sum of the diagonal elements)
  • A scalar equals its trace, so ε'Mε = tr(ε'Mε)
  • e'e = ε'Mε ⇒ E[e'e|X] = E[ε'Mε|X] = E[tr(ε'Mε)|X]
  •   = tr(M E[εε'|X]) = tr(Mσ²I) = σ² tr(M)
  • tr(M) = tr(I_n − X(X'X)⁻¹X')
  •   = tr(I_n) − tr(X(X'X)⁻¹X') = tr(I_n) − tr(I_K) = n − K
  •   (using tr(ABC) = tr(CAB); X is n×K and M is n×n)

16
4.6 Estimating σ²
  • But this means that E[e'e|X] = (n − K)σ², so e'e itself is not an
    unbiased estimator of σ²
  • So we construct an unbiased estimator:
  • s² = e'e / (n − K)  ⇒  E[s²|X] = σ²
  • We call √s² = s the standard error of the regression
    (a numerical sketch follows below).

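A minimal sketch of this estimator on simulated data (design and coefficients assumed for illustration): it builds the residual maker M, confirms tr(M) = n − K, and computes s² = e'e/(n − K) and the standard errors √(s² S_kk).

```python
import numpy as np

rng = np.random.default_rng(3)
n, K = 120, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b                                   # residual vector e = My = M eps

M = np.eye(n) - X @ XtX_inv @ X.T               # residual maker M = I - X(X'X)^-1 X'
print(round(np.trace(M)))                       # n - K = 117

s2 = e @ e / (n - K)                            # unbiased estimator of sigma^2
se = np.sqrt(s2 * np.diag(XtX_inv))             # standard errors sqrt(s^2 * S_kk)
print(np.sqrt(s2), se)                          # s (standard error of regression) and se(b_k)
```
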
17
4.7 Normality Assumption and Statistical Inference
  • Earlier we defined b as a linear function of ε:
    b = β + (X'X)⁻¹X'ε, where E[b|X] = β and Var[b|X] = σ²(X'X)⁻¹,
    and we assumed that ε has a normal distribution
  • ⇒ b|X ~ N[β, σ²(X'X)⁻¹] and b_k|X ~ N[β_k, σ²S_kk],
    where S_kk denotes the k-th diagonal element of (X'X)⁻¹
  • We base statistical inference about β_k on the statistic z_k (which
    is obtained by standardizing b_k)

18
4.7.1 Testing hypotheses about coefficients
  • We get z_k = (b_k − β_k) / √(σ²S_kk) ~ N[0, 1]
  • But this holds only when σ² is known. So if σ² is not known we must
    use s² = e'e/(n − K) instead of σ²
  • What is the distribution of
  • t_k = (b_k − β_k) / √(s²S_kk) ?
  • In order to find the distribution of this statistic we note that
    z_k ~ N[0, 1] and that s²/σ² = [e'e/(n − K)]/σ²

19
4.7.1 Testing hypotheses about coefficients
  • Now we can write e'e/σ² = (ε/σ)'M(ε/σ)   (since e'e = ε'Mε)
  • Since ε/σ ~ N[0, I], we see that e'e/σ² ~ χ²[n − K]
  • This means that s²/σ² = [e'e/(n − K)]/σ² ~ χ²[n − K]/(n − K), and
    that t_k ~ t[n − K] (the ratio of an N[0, 1] variable to the square
    root of an independent χ²[n − K] variable divided by its degrees of
    freedom)

20
4.7.2 Confidence Intervals
  • We base confidence intervals for β_k on t_k and define them as
  • P(b_k − t_{α/2}·s_bk ≤ β_k ≤ b_k + t_{α/2}·s_bk) = 1 − α
  • where s_bk = √(s²S_kk), 1 − α is the desired confidence level and
    t_{α/2} is the appropriate critical value from the t distribution
    with (n − K) degrees of freedom (see the sketch below)

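A self-contained sketch (simulated data; the null H0: β_k = 0 and the 95% level are assumed choices) of the t statistic t_k = (b_k − β_k⁰)/√(s²S_kk), its p-value from the t[n − K] distribution, and the confidence intervals b_k ± t_{α/2}·s_bk.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, K = 80, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)   # last coefficient truly zero

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b
s2 = e @ e / (n - K)
se = np.sqrt(s2 * np.diag(XtX_inv))                      # s_bk = sqrt(s^2 * S_kk)

t_stats = b / se                                         # t_k under H0: beta_k = 0
p_values = 2 * stats.t.sf(np.abs(t_stats), df=n - K)     # two-sided p-values

t_crit = stats.t.ppf(0.975, df=n - K)                    # alpha = 0.05
ci = np.column_stack([b - t_crit * se, b + t_crit * se]) # b_k +/- t_{alpha/2} * s_bk
print(t_stats, p_values, ci, sep="\n")
```
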
21
4.7.4 Testing the Significance of the Regression
  • When we test whether the regression as a whole is significant, we
    test the hypothesis that all coefficients except the constant term
    are zero.
  • So we can use R² for this, and we use
  • F[K − 1, n − K] = [R²/(K − 1)] / [(1 − R²)/(n − K)]
  • This statistic has an F distribution under the null hypothesis.

22
4.7.4 Testing the Significance of the Regression
  • This means that large values of F give evidence against the validity
    of the hypothesis
  • Large values of F imply large values of R²
  • So the F statistic measures the loss of fit when we set all
    coefficients (except the constant) equal to zero; when F is large,
    the loss of fit is large and the regression is significant
    (a numerical sketch follows below)

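A short sketch (simulated data, assumed coefficients) of the overall significance test: compute the centered R² and F[K − 1, n − K] = [R²/(K − 1)]/[(1 − R²)/(n − K)], with the p-value taken from the F distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, K = 80, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
R2 = 1 - (e @ e) / np.sum((y - y.mean()) ** 2)           # centered R^2 (model has a constant)

F = (R2 / (K - 1)) / ((1 - R2) / (n - K))                # F[K-1, n-K] statistic
p_value = stats.f.sf(F, K - 1, n - K)                    # reject for large F
print(R2, F, p_value)
```
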
23
4.9 Data Problems
  • Multicollinearity: the regressors in the data are (highly) correlated
  • Missing observations: the data set is incomplete
  • Influential data points: data points that are inconsistent with the
    rest of the data can bias the least squares estimator

24
4.9.1 Effects of Multicollinearity
  • Small changes in the data produce wide swings in the parameter
    estimates
  • Coefficients have high standard errors and low significance levels,
    yet the F statistic for the regression as a whole is significant
  • Coefficients have implausible signs
  • How to deal with multicollinearity:
  • Drop the regressor that has the highest correlation with the other
    regressors (this might cause other problems)
  • Use the ridge estimator (it is biased but has a smaller covariance
    matrix; see the sketch below)

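A hedged sketch of the ridge remedy mentioned above: b(λ) = (X'X + λI)⁻¹X'y. The nearly collinear design and the penalty λ = 1 are illustrative assumptions; in practice λ is a tuning choice and the constant term is often left unpenalized.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)                 # nearly collinear with x1
X = np.column_stack([np.ones(n), x1, x2])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

lam = 1.0                                                # illustrative ridge penalty
K = X.shape[1]
b_ols = np.linalg.solve(X.T @ X, X.T @ y)                        # unbiased but unstable here
b_ridge = np.linalg.solve(X.T @ X + lam * np.eye(K), X.T @ y)    # biased, smaller variance

print(b_ols)
print(b_ridge)
```
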
25
4.9.2 Missing Observations
  • Two cases:
  • The data are unavailable
  • There are gaps in the data set
  • Data unavailable: the data are representative of the population, but
    there is a frequency mismatch (e.g. data are observed quarterly when
    monthly data are needed)
  • Gaps in the data set: the data are not representative of the
    population (some groups of the population are missing from the data)
  • How to solve:
  • Replace missing observations with the mean of the observed values
  • Add a variable that takes the value 1 when the data point is missing

26
4.9.3 Influential Data Points
  • Identification of outliers: data points that seem inconsistent with
    the rest of the data
  • How to solve:
  • Outliers can be considered for removal from the data, but the
    consequences of this should be carefully weighed