Review III: Failure of Assumptions - PowerPoint PPT Presentation

Provided by: zhiga9

1
Review III: Failure of Assumptions
  • Data Problems
  • Misspecification
  • Multicollinearity
  • Heteroskedasticity

2
Main Points from Last Class
  • $Y_i = \beta_0 + \beta_1 X_{1,i} + \beta_2 X_{2,i} + \beta_3 X_{3,i} + \dots + \beta_k X_{k,i} + u_i$
  • A causal relationship is more informative than a
    correlation, but also harder to estimate.
  • Roughly speaking, identifying a causal
    relationship is equivalent to addressing
    endogeneity (a violation of assumption MLR.3).
  • Another important assumption is random sampling.

3
Omitted variables problem (Underspecification)
  • Some variables that should be in the model are
    not included (sometimes called the omitted
    variable problem)
  • If the omitted variables are correlated with any
    explanatory variable $X_j$, then $X_j$ will be
    correlated with $u$. We call $X_j$ endogenous, and
    the estimate $\hat{\beta}_j$ will be inconsistent.
  • If $X_j$ is further correlated with other
    explanatory variables, then their coefficient
    estimates also become inconsistent even if they
    are exogenous.

4
The Omitted Variables Bias
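
A minimal simulation sketch of the bias, assuming a hypothetical two-regressor model y = 1 + 2*x1 + 3*x2 + u in which x2 = 0.5*x1 + v. All coefficients, the seed, and the helper names are made up for illustration. In the standard textbook result, omitting x2 makes the slope on x1 converge to 2 + 3*0.5 = 3.5 rather than 2. The coefficient of the full regression is obtained by partialling x2 out of x1 first (Frisch-Waugh), so only simple regressions are needed.

```python
import random

def slope(x, y):
    """OLS slope from a simple regression of y on x (intercept included)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(x, y):
    """Residuals from a simple regression of y on x (intercept included)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = slope(x, y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]

rng = random.Random(0)  # arbitrary fixed seed so the run is repeatable
n = 20000

# Hypothetical data-generating process (an assumption, not from the slides)
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [0.5 * a + rng.gauss(0, 1) for a in x1]   # the variable we will omit
y = [1 + 2 * a + 3 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

# Short regression: y on x1 only, x2 is left in the error term
short_b1 = slope(x1, y)

# Long regression coefficient on x1: remove the part of x1 explained by x2,
# then regress y on what is left
long_b1 = slope(residuals(x2, x1), y)

print(f"x2 omitted : slope on x1 = {short_b1:.2f}")
print(f"x2 included: slope on x1 = {long_b1:.2f}")
```

The gap between the two slopes is the omitted-variable bias: the effect of the omitted variable (3) times the slope from regressing it on the included one (0.5).
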
6
How to address the omitted-variable bias?
  • Add in the omitted variables. Hardly possible
    unless we control the whole process.
  • Random assignment of the independent variable of
    interest, conditional on observed control
    variables. Occasionally possible.
  • The independent variable of interest is affected
    by some exogenous variables not in the system.
    Often possible. The focus of this course.

7
Class quiz
  • What are the possible omitted variables in the
    following cases?
  • Regress wage on education
  • Regress the price of an apartment on its
    distance to the Central
  • Regress GDP on capital and labor force

8
Overspecification
  • When redundant variables are added to the
    regression, they do not affect the consistency
    of the estimates if they are not related to any
    of the variables in the regression. They only
    affect the precision of the estimates.
  • Otherwise, the redundant variables may affect the
    estimates.
  • Bad controls

9
Bad control 1
  • The control variable is an outcome of the
    independent variable of interest
  • e.g. occupations in the wage model with education
    as the variable of interest
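
A rough simulation of this case, under made-up numbers: education raises wages directly (coefficient 1) and also through occupation (occupation = 0.5*education + noise, with a wage effect of 2), so the total effect of education is 1 + 2*0.5 = 2. The variable names, coefficients, and seed are hypothetical. Controlling for occupation removes the part of the effect that runs through it.

```python
import random

def slope(x, y):
    """OLS slope from a simple regression of y on x (intercept included)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(x, y):
    """Residuals from a simple regression of y on x (intercept included)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = slope(x, y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]

rng = random.Random(4)  # arbitrary fixed seed
n = 20000

# Hypothetical process: occupation is an OUTCOME of education
educ = [rng.gauss(0, 1) for _ in range(n)]
occ = [0.5 * e + rng.gauss(0, 1) for e in educ]
wage = [1.0 * e + 2.0 * o + rng.gauss(0, 1) for e, o in zip(educ, occ)]

# Without the bad control: picks up the total effect of education
total_effect = slope(educ, wage)

# With occupation as a control (via partialling out): only the direct effect
controlled = slope(residuals(occ, educ), wage)

print(f"wage on educ            : {total_effect:.2f}")
print(f"wage on educ, given occ : {controlled:.2f}")
```

If the question is how much an extra unit of education changes wages overall, the regression with the occupation control understates it in this setup.
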

10
Bad control 2
  • Proxy variables that are the outcome of the
    independent variable of interest
  • Note that a proxy variable is a variable that is
    not in the system (i.e. not directly affecting
    the dependent variable)

11
A suggestion for choosing control variables
  • Pay particular attention to control variables
    that are the outcome of the independent variables
    of interest.
  • How to judge whether a variable is the outcome of
    another variable? Timing may be useful. Or use
    economic reasoning.

12
Class quiz
  • Think of some regressions you want to run to
    assess the relationship between some variables.
    Do any of the problems described above arise?

13
Nonrandom Sampling (p. 310 or 326)
  • Exogenous Sample Selection
  • Sample selection is based on the observed
    independent variables.
  • Estimates are not affected.
  • Endogenous Sample Selection
  • Sample selection is based on the dependent
    variable.
  • The error term is correlated with independent
    variables, violating assumption MLR.3.
    Generally all estimates are inconsistent.
  • $y_j = a X_j + e_j$
  • If $y_j > c$, then $a X_j + e_j > c$, or $a X_j > c - e_j$.
  • Hence, $X_j$ and $e_j$ are correlated in the selected
    sample.
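
The two kinds of selection can be compared in a small simulation. This is only a sketch under an assumed model y = 1 + 2*x + e with standard normal x and e; the cutoffs (x > 0 and y > 1) and the seed are arbitrary choices, not taken from the slides.

```python
import random

def slope(x, y):
    """OLS slope from a simple regression of y on x (intercept included)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def slope_on(rows):
    """Slope computed on a list of (x, y) pairs."""
    xs, ys = zip(*rows)
    return slope(xs, ys)

rng = random.Random(1)  # arbitrary fixed seed
n = 20000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [1 + 2 * xi + rng.gauss(0, 1) for xi in x]   # true slope is 2
data = list(zip(x, y))

full = slope_on(data)
# Exogenous selection: keep observations based on x only
exog = slope_on([(xi, yi) for xi, yi in data if xi > 0])
# Endogenous selection: keep observations based on y (here y > c with c = 1)
endog = slope_on([(xi, yi) for xi, yi in data if yi > 1])

print(f"full sample        : {full:.2f}")
print(f"selected on x > 0  : {exog:.2f}")
print(f"selected on y > 1  : {endog:.2f}")
```

Selecting on x leaves the slope close to 2, while selecting on y pulls it well below 2, because a low x only survives the cutoff when e is large.
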

14
Nonrandom Sampling
  • Stratified sampling
  • The population is divided into nonoverlapping,
    exhaustive groups. Sampling frequency differs
    across groups.
  • Whether estimates are affected depends on whether
    the stratification is decided by the dependent or
    independent variables.
  • Homogeneity assumption across strata?

15
Measurement Errors (p. 302-309 or 318-325)
  • If the measurement error is in the dependent
    variable and is unrelated to the independent
    variables, then the estimates are not affected
    (except for the intercept), but the variances of
    the estimates are larger.
  • Measurement error in independent variables
  • Classical errors-in-variables (CEV) assumption:
    if the error is uncorrelated with the true
    variable, then the measured variable is
    correlated with the error, making it an
    endogeneity problem. The magnitude and direction
    of biases are hard to determine in reality.
    (The instrumental variable approach can solve
    the endogeneity problem.)
  • If the error is uncorrelated with the measured
    variable, the estimates are not affected, but
    their variances are.
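
A simulation sketch of the first two cases, assuming a single regressor, a true model y = 1 + 2*x + u, and measurement errors with variance 1 (all hypothetical values). In this simple one-regressor CEV case the slope is known to shrink toward zero by the factor Var(x) / (Var(x) + Var(error)) = 0.5; with several regressors the direction is harder to pin down, as noted above.

```python
import random

def slope(x, y):
    """OLS slope from a simple regression of y on x (intercept included)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(2)  # arbitrary fixed seed
n = 20000

x_true = [rng.gauss(0, 1) for _ in range(n)]
y_true = [1 + 2 * xi + rng.gauss(0, 1) for xi in x_true]   # true slope is 2

# Measurement error in the dependent variable, unrelated to x
y_noisy = [yi + rng.gauss(0, 1) for yi in y_true]
# Classical measurement error in the independent variable
x_noisy = [xi + rng.gauss(0, 1) for xi in x_true]

b_clean = slope(x_true, y_true)
b_y_err = slope(x_true, y_noisy)   # still close to 2, just noisier
b_x_err = slope(x_noisy, y_true)   # pulled toward 2 * 0.5 = 1

print(f"no measurement error : {b_clean:.2f}")
print(f"error in y           : {b_y_err:.2f}")
print(f"CEV error in x       : {b_x_err:.2f}")
```
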

16
Outliers (p. 312-317)
  • Outliers are observations that can significantly
    affect OLS estimates when they are added to the
    sample. They can arise because of
  • Typos (data-entry errors)
  • The nature of some variables
  • To address this problem, we may
  • Manually pick outliers and test their influence
  • Use a statistics package to identify outliers
  • Use log specification that is less sensitive to
    outliers
  • Use Least Absolute Deviations (LAD) estimators
    that put less weight on outliers.
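
To see why LAD is less sensitive, consider the simplest possible model with only an intercept: there the OLS estimate is the sample mean and the LAD estimate is the sample median. The sketch below uses made-up wage figures and a hypothetical typo (12.0 entered as 1200.0).

```python
from statistics import mean, median

wages = [10.0, 11.0, 12.0, 13.0, 14.0]   # made-up data
with_typo = wages + [1200.0]             # hypothetical data-entry error

def sum_sq(c, data):
    """OLS criterion for an intercept-only model: sum of squared deviations."""
    return sum((v - c) ** 2 for v in data)

def sum_abs(c, data):
    """LAD criterion for an intercept-only model: sum of absolute deviations."""
    return sum(abs(v - c) for v in data)

ols_fit = mean(with_typo)     # minimizes sum_sq
lad_fit = median(with_typo)   # minimizes sum_abs

print(f"clean data : mean = {mean(wages)}, median = {median(wages)}")
print(f"with typo  : mean = {ols_fit}, median = {lad_fit}")
```

One bad observation moves the OLS fit from 12 to 210, while the LAD fit moves from 12 to 12.5.
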

17
Functional Form Misspecification (pp. 289-295 or 304-310)
  • If the adopted functional form is not the same as
    the true functional form, of course the estimates
    can be inconsistent.
  • Is there really a true model and who knows what
    it is?
  • Reduced-form approach: try different functional
    forms until one fits the data best.
  • Structural-form approach: use the functional form
    suggested by economic theory (and under some
    assumptions).

18
Multicollinearity
  • Multicollinearity is not a violation of
    assumptions, so it does not affect the
    consistency of estimates.
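  • However, coefficient estimates become less
    precise (p. 96-98):
    $\mathrm{Var}(\hat{\beta}_j) = \dfrac{\sigma^2}{SST_j (1 - R_j^2)}$
  • $\sigma^2$: variance of the error term
  • $SST_j$: the sample variation in $X_j$
  • $R_j^2$: the strength of the linear relationship
    between $X_j$ and the other explanatory variables
    (the R-squared from regressing $X_j$ on them)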
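
The three quantities above enter the standard homoskedastic variance formula sigma^2 / (SST_j * (1 - R_j^2)). A small sketch with made-up inputs (sigma^2 = 1, SST_j = 100) shows how the variance grows as Xj becomes more collinear with the other regressors; the function name is hypothetical.

```python
def var_beta_j(sigma2, sst_j, r2_j):
    """Sampling variance of the OLS estimate of beta_j:
    sigma^2 / (SST_j * (1 - R_j^2)), valid under homoskedasticity."""
    if not 0.0 <= r2_j < 1.0:
        raise ValueError("R_j^2 must lie in [0, 1)")
    return sigma2 / (sst_j * (1.0 - r2_j))

# Made-up inputs: only R_j^2 changes from row to row
for r2 in (0.0, 0.5, 0.9, 0.99):
    v = var_beta_j(sigma2=1.0, sst_j=100.0, r2_j=r2)
    inflation = 1.0 / (1.0 - r2)   # variance relative to the R_j^2 = 0 case
    print(f"R_j^2 = {r2:4.2f}  variance = {v:.4f}  inflation = {inflation:6.1f}x")
```

The factor 1 / (1 - R_j^2) is commonly called the variance inflation factor. Note that the estimates stay consistent throughout; only their precision suffers.
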

19
Heteroskedasticity
  • Heteroskedasticity is present if the variance of
    u differs for different segments of the
    population, where the segments are determined by
    the different values of the Xs. (p. 54-55 and
    chapter 8). This is a violation of assumption
    MLR.5.
  • Coefficient estimates are still consistent, but
    estimated variances of the estimates are
    inconsistent.

20
Testing for Heteroskedasticity
  • Breusch-Pagan test (BP test)
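  • $u^2 = \delta_0 + \delta_1 X_1 + \delta_2 X_2 + \dots + \delta_k X_k + \text{error}$
  • Test $H_0: \delta_1 = \delta_2 = \dots = \delta_k = 0$
  • An F test provided by most statistics packages (a
    p-value is reported).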
  • Reject the null when p is small.
  • White Test and its extensions
  • Adding the squares and cross products of
    independent variables to the BP test.
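
A hand-rolled sketch of the BP test for the one-regressor case (k = 1), on simulated data. Since u is not observed, the squared OLS residuals stand in for it. The data-generating processes, seed, and function names are assumptions for illustration; in practice a statistics package would be used. With k = 1 and a large sample, the 5% critical value of the F statistic is about 3.84.

```python
import random

def fit(x, y):
    """OLS of y on x with an intercept. Returns (residuals, R-squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    sst = sum((yi - my) ** 2 for yi in y)
    ssr = sum(r * r for r in resid)
    return resid, 1.0 - ssr / sst

def bp_f_stat(x, y):
    """F statistic for H0: delta_1 = 0 in the regression of the
    squared residuals on x (Breusch-Pagan, single regressor)."""
    n, k = len(x), 1
    resid, _ = fit(x, y)
    u2 = [r * r for r in resid]
    _, r2 = fit(x, u2)
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))

rng = random.Random(3)  # arbitrary fixed seed
n = 2000
x = [rng.uniform(1, 5) for _ in range(n)]
# Heteroskedastic errors: standard deviation proportional to x
y_het = [1 + 2 * xi + xi * rng.gauss(0, 1) for xi in x]
# Homoskedastic errors: constant standard deviation
y_hom = [1 + 2 * xi + 3 * rng.gauss(0, 1) for xi in x]

f_het = bp_f_stat(x, y_het)
f_hom = bp_f_stat(x, y_hom)
print(f"F statistic, heteroskedastic data: {f_het:.1f}")
print(f"F statistic, homoskedastic data  : {f_hom:.1f}")
```

The first statistic should be far above 3.84, so the null of homoskedasticity is rejected. The second is a draw from the null distribution and will exceed 3.84 only about 5% of the time.
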

21
Heteroskedasticity-Robust Standard Error (p.260)
  • When heteroskedasticity is present, the usual
    OLS variance estimator is no longer correct, and
    a heteroskedasticity-robust estimator of the
    variance of a coefficient should be used.
  • The resulting heteroskedasticity-robust standard
    error (or Huber-White standard error) is reported
    by most statistics packages.
  • Note that this is a large-sample result, and it
    also requires some assumptions.
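
For a simple regression, one common form of the robust variance estimator of the slope is sum((x_i - xbar)^2 * uhat_i^2) / SST_x^2, without any degrees-of-freedom correction. The sketch below compares it with the usual standard error on simulated data; the formula variant, the data-generating process, the seed, and the function name are all assumptions for illustration and may differ from what a given package reports.

```python
import math
import random

def ols_with_ses(x, y):
    """Simple-regression slope with the usual and the robust standard error."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    sst_x = sum(d * d for d in dx)
    b1 = sum(d * (yi - my) for d, yi in zip(dx, y)) / sst_x
    b0 = my - b1 * mx
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

    # Usual OLS standard error: assumes a constant error variance
    sigma2_hat = sum(r * r for r in resid) / (n - 2)
    usual_se = math.sqrt(sigma2_hat / sst_x)

    # Robust standard error: each squared residual is weighted by
    # its own squared deviation of x from the mean
    robust_se = math.sqrt(sum((d * r) ** 2 for d, r in zip(dx, resid))) / sst_x
    return b1, usual_se, robust_se

rng = random.Random(5)  # arbitrary fixed seed
n = 5000
x = [rng.gauss(0, 1) for _ in range(n)]
# Hypothetical heteroskedastic process: error spread grows with |x|
y = [1 + 2 * xi + xi * rng.gauss(0, 1) for xi in x]

b1, usual_se, robust_se = ols_with_ses(x, y)
print(f"slope     : {b1:.3f}")
print(f"usual SE  : {usual_se:.4f}")
print(f"robust SE : {robust_se:.4f}")
```

In this setup the usual standard error clearly understates the sampling variability of the slope, while the slope estimate itself remains close to 2.
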