Title: Specification Error I
1. Specification Error I
2. Aims and Learning Objectives
By the end of this session students should be able to:
- Understand the causes and consequences of specification error
- Analyse regression results for possible specification error
- Undertake appropriate remedies for specification error
3. Introduction
Before any equation can be estimated, it must be completely specified. Broadly speaking, specifying an econometric equation consists of the following:
- Choosing the correct explanatory variables
- Choosing the correct functional form
- Choosing the correct form of the error term
Specification error can arise in a number of ways:
(i) Omission of a relevant explanatory variable
(ii) Inclusion of an irrelevant explanatory variable
(iii) Adopting the wrong functional form
(iv) Endogenous explanatory variable(s)
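4. Recap: Assumptions of the Multivariate Regression Model
A1. $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \dots + \beta_k X_{ki} + U_i$
A2. $E(U_i) = 0$
A3. $\mathrm{Cov}(U_i, X_{2i}) = \mathrm{Cov}(U_i, X_{3i}) = \dots = \mathrm{Cov}(U_i, X_{ki}) = 0$
A4. $\mathrm{Var}(U_i) = \sigma^2$
A5. $\mathrm{Cov}(U_i, U_j) = 0$ for $i \neq j$
A6.
A7. No exact collinearity or perfect multicollinearity among the explanatory variables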
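5. Specification Error: Omission of a Relevant Explanatory Variable
True model: $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + U_i$
Under-fitted model (as estimated): $\hat{Y}_i = b_1 + b_2 X_{2i}$
$X_3$ is omitted from the under-fitted model.
In general, $E(b_2) \neq \beta_2$.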
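6. WHY?
It can be shown that $E(b_2) = \beta_2 + \beta_3 b_{32}$,
where $b_{32}$ is the slope coefficient from a regression of the omitted variable $X_3$ on the included variable $X_2$.
If we estimate the true model, the coefficient on $X_2$ measures the net effect of $X_2$ on $Y$ (since the influence of $X_3$ is included in the model). When we omit a relevant variable ($X_3$), $b_2$ picks up both this net effect and the impact of the omitted variable ($X_3$) on $Y$ that works through its relationship with $X_2$ (we call this the indirect effect). As a result, $b_2$ is a biased estimator of $\beta_2$.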
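A short derivation of the result $E(b_2) = \beta_2 + \beta_3 b_{32}$ may help. The sketch below assumes the true model $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + U_i$, treats the regressors as fixed in repeated samples, and uses lower-case letters for deviations from sample means (notation introduced here for illustration only).

```latex
% Lower-case letters are deviations from sample means, e.g. x_{2i} = X_{2i} - \bar{X}_2.
\begin{align*}
b_2 &= \frac{\sum x_{2i} y_i}{\sum x_{2i}^2}
  && \text{(OLS slope in the under-fitted model)} \\
    &= \frac{\sum x_{2i}\,(\beta_2 x_{2i} + \beta_3 x_{3i} + U_i - \bar{U})}{\sum x_{2i}^2}
  && \text{(substitute the true model)} \\
    &= \beta_2 + \beta_3 \frac{\sum x_{2i} x_{3i}}{\sum x_{2i}^2}
       + \frac{\sum x_{2i} U_i}{\sum x_{2i}^2}
  && \text{(since } \textstyle\sum x_{2i} = 0\text{)} \\
E(b_2) &= \beta_2 + \beta_3 b_{32},
  \qquad b_{32} = \frac{\sum x_{2i} x_{3i}}{\sum x_{2i}^2}
  && \text{(since } E(U_i) = 0\text{)}
\end{align*}
```

The bias term $\beta_3 b_{32}$ is the product of the omitted variable's own effect on $Y$ and its sample relationship with the included variable, so it vanishes only if one of the two is zero.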
If there is no relationship between $X_2$ and $X_3$ then $b_{32}$ is zero and there is no bias.
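The bias from omitting a relevant variable can be illustrated with a small simulation. The sketch below uses only the Python standard library; the data-generating process, the parameter values, and the names `simple_slope`, `simulate_underfit` and `gamma` are hypothetical choices made for illustration, not part of the lecture.

```python
import random

def simple_slope(x, y):
    """OLS slope from regressing y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate_underfit(beta2=1.0, beta3=2.0, gamma=0.5, n=200, reps=500, seed=0):
    """Average under-fitted slope b2 over repeated samples.

    Assumed data-generating process (illustrative only):
        X3 = gamma * X2 + noise
        Y  = 1 + beta2 * X2 + beta3 * X3 + U
    so the population slope of X3 on X2 (b32 in the slides) equals gamma.
    """
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        x2 = [rng.gauss(0, 1) for _ in range(n)]
        x3 = [gamma * a + rng.gauss(0, 1) for a in x2]
        y = [1 + beta2 * a + beta3 * b + rng.gauss(0, 1) for a, b in zip(x2, x3)]
        slopes.append(simple_slope(x2, y))  # X3 is omitted from this regression
    return sum(slopes) / reps

# X2 and X3 related: the average b2 should sit near beta2 + beta3 * gamma = 2.0
mean_b2_related = simulate_underfit(gamma=0.5)
# X2 and X3 unrelated: the average b2 should sit near beta2 = 1.0
mean_b2_unrelated = simulate_underfit(gamma=0.0)

print(f"average b2, X3 related to X2:   {mean_b2_related:.3f}")
print(f"average b2, X3 unrelated to X2: {mean_b2_unrelated:.3f}")
```

Averaging over many samples approximates $E(b_2)$, which is why the comparison is made on the mean of the estimates rather than on a single regression.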
7. Example
Possible Omitted Variable Bias in a Wage Equation
Dependent variable: wage or earnings
Explanatory variable: education
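8. Var($b_2$) will be smaller (in general)
Variance of $\hat{\beta}_2$ in the true model is $\mathrm{Var}(\hat{\beta}_2) = \dfrac{\sigma^2}{\sum x_{2i}^2 (1 - r_{23}^2)}$
Variance of $b_2$ in the under-fitted model is $\mathrm{Var}(b_2) = \dfrac{\sigma^2}{\sum x_{2i}^2}$
Here $x_{2i} = X_{2i} - \bar{X}_2$ and $r_{23}$ is the sample correlation between $X_2$ and $X_3$.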
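As a worked illustration of how much smaller the under-fitted variance can be, the ratio of the two variances can be written out. This assumes the standard textbook formulas for a model with two regressors that are fixed in repeated samples; the value $r_{23} = 0.6$ is a made-up number used only for the arithmetic.

```latex
% Assumed textbook formulas (two regressors, fixed in repeated samples):
%   x_{2i} = X_{2i} - \bar{X}_2,  r_{23} = sample correlation between X_2 and X_3
\[
\frac{\mathrm{Var}(\hat{\beta}_2)}{\mathrm{Var}(b_2)}
  = \frac{\sigma^2 \big/ \bigl[\sum x_{2i}^2 \,(1 - r_{23}^2)\bigr]}
         {\sigma^2 \big/ \sum x_{2i}^2}
  = \frac{1}{1 - r_{23}^2} \;\ge\; 1
\]
% Hypothetical example: r_{23} = 0.6 gives 1 / (1 - 0.36) = 1.5625
```

The two variances coincide only when $r_{23} = 0$, which is also the case in which omitting $X_3$ causes no bias in $b_2$.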
9. Recap
1. If the left-out variable is correlated with the included variable, the coefficient attached to the included variable is biased.
2. The variance of the coefficient on the included variable is generally smaller.
In addition:
3. The coefficient is also inconsistent: the bias does not disappear as the sample size gets bigger.
4. Confidence intervals and hypothesis tests may be misleading.
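10. Specification Error: Inclusion of an Irrelevant Explanatory Variable
True model: $Y_i = \beta_1 + \beta_2 X_{2i} + U_i$
Over-fitted model (as estimated): $\hat{Y}_i = b_1 + b_2 X_{2i} + b_3 X_{3i}$
$X_{3i}$ is included in the over-fitted model.
$b_2$ is still unbiased, $E(b_2) = \beta_2$, and the usual estimator of its variance is still valid. However, the estimates are now inefficient (variances are generally larger).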
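11. $b_2$ is inefficient
Variance of $\hat{\beta}_2$ in the true model is $\mathrm{Var}(\hat{\beta}_2) = \dfrac{\sigma^2}{\sum x_{2i}^2}$
Variance of $b_2$ in the over-fitted model is $\mathrm{Var}(b_2) = \dfrac{\sigma^2}{\sum x_{2i}^2 (1 - r_{23}^2)}$
Here $x_{2i} = X_{2i} - \bar{X}_2$ and $r_{23}$ is the sample correlation between $X_2$ and $X_3$.
As a result, confidence intervals will be wider and we run the risk of not rejecting a false null hypothesis.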
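The loss of efficiency from over-fitting can also be seen in a small simulation. The sketch below uses only the Python standard library; the data-generating process (an irrelevant $X_3$ with population correlation 0.8 with $X_2$), the parameter values, and the function names are assumptions made for illustration.

```python
import random
import statistics

def slope_simple(x2, y):
    """Slope from regressing y on x2 with an intercept (the true model)."""
    m2, my = statistics.fmean(x2), statistics.fmean(y)
    s2y = sum((a - m2) * (b - my) for a, b in zip(x2, y))
    s22 = sum((a - m2) ** 2 for a in x2)
    return s2y / s22

def slope_overfitted(x2, x3, y):
    """Coefficient on x2 from regressing y on x2 and x3 with an intercept."""
    m2, m3, my = statistics.fmean(x2), statistics.fmean(x3), statistics.fmean(y)
    d2 = [a - m2 for a in x2]
    d3 = [a - m3 for a in x3]
    dy = [a - my for a in y]
    s22 = sum(a * a for a in d2)
    s33 = sum(a * a for a in d3)
    s23 = sum(a * b for a, b in zip(d2, d3))
    s2y = sum(a * b for a, b in zip(d2, dy))
    s3y = sum(a * b for a, b in zip(d3, dy))
    # Two-regressor OLS formula written in deviations from means
    return (s2y * s33 - s3y * s23) / (s22 * s33 - s23 ** 2)

def compare(beta2=1.0, n=100, reps=1000, seed=0):
    """Collect the slope on X2 from both models over repeated samples."""
    rng = random.Random(seed)
    true_fit, over_fit = [], []
    for _ in range(reps):
        x2 = [rng.gauss(0, 1) for _ in range(n)]
        # X3 is correlated with X2 but does not enter the equation for Y
        x3 = [0.8 * a + 0.6 * rng.gauss(0, 1) for a in x2]
        y = [2.0 + beta2 * a + rng.gauss(0, 1) for a in x2]
        true_fit.append(slope_simple(x2, y))
        over_fit.append(slope_overfitted(x2, x3, y))
    return true_fit, over_fit

true_fit, over_fit = compare()
print(f"mean slope: true {statistics.fmean(true_fit):.3f}, "
      f"over-fitted {statistics.fmean(over_fit):.3f}")
print(f"variance:   true {statistics.variance(true_fit):.4f}, "
      f"over-fitted {statistics.variance(over_fit):.4f}")
```

Under these assumptions both sets of estimates should centre on the true value of 1.0, while the over-fitted estimates should be noticeably more dispersed, in line with the factor $1/(1 - r_{23}^2)$.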
12. Does this suggest, therefore, that it is better to include irrelevant variables than to exclude relevant ones?
No, because as well as a loss of efficiency of the estimators, including irrelevant variables will also result in
- Loss of degrees of freedom
and may result in
- Problems of multicollinearity (more on this in lecture 10)
13. Functional Form Mis-specification
Adopting an incorrect functional form
For example, suppose we estimate a linear model but the true model is a log-linear model. The mis-specification then arises because we have estimated the wrong functional form.
14. Mis-specification Tests
Mis-specification generally occurs when:
- We omit a relevant variable, or
- We include an irrelevant variable, or
- We use an incorrect functional form
In most circumstances we do not know what the true model is. How can we determine, therefore, whether the model we estimate is correctly specified?
15. Mis-specification Tests
Preliminary Analysis (Informal Tests)
- Variables based on economic theory (if possible)
- Observe sign and significance of coefficients: what happens when an additional variable is added or deleted?
- Does adjusted $R^2$ increase when more variables are added?
- Look at the pattern of the residuals (if there are noticeable patterns then it is possible that the model has been mis-specified)
16. Ramsey's RESET Test
A more formal test of mis-specification
The test adds proxy variables for the possible mis-specification to the model.
The RESET test proxies are based on the predicted values of $Y$.
17. Example
Suppose we estimate a regression model and want to test it for mis-specification.
The RESET test takes the predicted values $\hat{Y}_i$ from the original model and creates various powers of them (for example $\hat{Y}_i^2$ and $\hat{Y}_i^3$).
Adding these powers to the original model, we then estimate a new model.
18. Example
Perform an F-test on the joint significance of the additional variables.
If the additional variables are significant, there is evidence of mis-specification.
Cautionary note: RESET is easy to apply but cannot tell us the reason for the mis-specification (i.e. omitted variable or functional form).
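The RESET steps described above can be sketched in code. This is a minimal illustration using only the Python standard library, not a reference implementation: the helper names `solve`, `ols` and `reset_f`, the choice of the second and third powers of the fitted values, and the simulated data (a quadratic relationship fitted with a linear model) are all assumptions made for the example.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(regressors, y):
    """OLS of y on an intercept plus the given regressor columns.

    Returns (coefficients, fitted values, residual sum of squares).
    """
    rows = [[1.0] + list(vals) for vals in zip(*regressors)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations (X'X) beta = X'y
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return beta, fitted, rss

def reset_f(x, y, powers=(2, 3)):
    """RESET F statistic for the linear model y = b1 + b2 * x + error."""
    n = len(y)
    _, fitted, rss_r = ols([x], y)                      # original (restricted) model
    extra = [[f ** p for f in fitted] for p in powers]  # powers of the fitted values
    _, _, rss_u = ols([x] + extra, y)                   # augmented (unrestricted) model
    q = len(powers)                                     # number of added regressors
    k = 2 + q                                           # parameters in the augmented model
    return ((rss_r - rss_u) / q) / (rss_u / (n - k))

rng = random.Random(1)
x = [rng.uniform(0, 4) for _ in range(200)]
y_quadratic = [1 + xi ** 2 + rng.gauss(0, 1) for xi in x]  # linear model is mis-specified
y_linear = [1 + 2 * xi + rng.gauss(0, 1) for xi in x]      # linear model is correct

f_misspecified = reset_f(x, y_quadratic)
f_correct = reset_f(x, y_linear)
print(f"RESET F, quadratic truth: {f_misspecified:.1f}")
print(f"RESET F, linear truth:    {f_correct:.2f}")
```

With two added powers and 200 observations, each statistic would be compared with an F(2, 196) critical value, roughly 3.0 at the 5% level. A large value for the quadratic data signals mis-specification, but, as the cautionary note says, it does not reveal whether the cause is functional form or an omitted variable.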
19. Summary
In this lecture we have:
1. Started to look at regression models which violate the CLRM assumptions
2. Outlined the theoretical and practical consequences of under-fitting and over-fitting regression models and choosing an incorrect functional form
3. Outlined a number of procedures for detecting possible specification error