Title: Violation of Assumptions
1. Violation of Assumptions

2. Omitted Variables
- This problem occurs when a variable is omitted from the specification, either through an error by the researcher or a lack of data.
- If the omitted variable is uncorrelated with the included variables:
  - The estimated slopes are inefficient (their variance is too large).
  - The estimated slopes are unbiased.
3. Omitted Variables
- If the omitted variable is correlated with the included variables:
  - The t-tests are biased (the estimated variance of the slopes is too small).
  - The estimated slopes are biased.
- This is a serious problem because it leads us to reject true null hypotheses too often.
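A small simulation sketches the bias. The coefficients (2 and 3), the 0.8 correlation structure, and the sample size are all hypothetical, chosen only for illustration: when x2 is left out of the model, the slope on x1 absorbs part of x2's effect.

```python
import random

random.seed(0)

# Hypothetical true model: y = 2*x1 + 3*x2 + noise, with x2 correlated
# with x1.  Regressing y on x1 alone biases the x1 slope away from 2.
n = 5000
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [0.8 * a + random.gauss(0, 1) for a in x1]   # correlated with x1
y = [2 * a + 3 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]

def slope(x, y):
    """Simple-regression slope: cov(x, y) / var(x)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var

b1_hat = slope(x1, y)   # omits x2
print(round(b1_hat, 2))  # well above the true value 2 (approx 2 + 3*0.8 = 4.4)
```

Had x2 been uncorrelated with x1 (drop the `0.8 * a` term), the estimated slope would center on 2, matching the unbiasedness claim above.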
4. Omitted Variables
- This suggests that great care should be taken in model building. It is generally not good procedure to allow the sample to dictate the model. It is better to include a variable that should not be there than to exclude a variable that should.
5. Example

6. Example

7. Effect of Omitted Variable
8. Measurement Error
- In the dependent variable:
  - Slopes are biased toward zero, so null hypotheses that are false are more difficult to reject. Measurement error makes it more difficult to reject null hypotheses.
- In an independent variable:
  - The slope is biased toward zero. Slopes of other variables that are correlated with this variable can also be biased. Measurement error can lead to rejecting true nulls.
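The attenuation in an error-ridden independent variable can be sketched in a short simulation. The true slope of 2, the unit noise variances, and the sample size are assumed purely for illustration; the estimated slope shrinks by roughly the factor var(x) / (var(x) + var(measurement error)).

```python
import random

random.seed(1)

# Hypothetical attenuation-bias illustration: the true regressor x is
# observed with noise of the same variance, halving the estimated slope.
n = 5000
x = [random.gauss(0, 1) for _ in range(n)]
x_obs = [a + random.gauss(0, 1) for a in x]        # measured with error
y = [2 * a + random.gauss(0, 0.5) for a in x]      # true slope = 2

def slope(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

b_true = slope(x, y)        # close to 2 (no measurement error)
b_noisy = slope(x_obs, y)   # attenuated toward 1 = 2 * 1/(1 + 1)
print(round(b_true, 2), round(b_noisy, 2))
```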
9. Measurement Error
- Implications
  - Suppose your dependent variable is hard to measure (e.g., product satisfaction or quality of work). If you do find results, they would be even stronger if you could measure the variable accurately. A significant result with a variable that is difficult to measure should not be dismissed!
10. Measurement Error
- Implications
  - Suppose your independent variable is hard to measure (e.g., product satisfaction or quality of work). The same logic applies as for the dependent variable (a significant result would be even more significant). HOWEVER, poor measurement can lead you to give MORE credit than is due to another variable.
11. Measurement Error
- Conclusions
  - Measure your variables as accurately as possible to improve the power of your tests.
  - If an independent variable is difficult to measure, you must worry about the results for the other variables in the model.
12. Heteroscedasticity
- Typically a problem in cross-sectional data.
- Slopes are unbiased, but inefficient.
- However, heteroscedasticity is often an indication of an omitted variable problem, in which case the slopes are potentially biased.
13. Heteroscedasticity
- Usually occurs due to a few outliers. Possible cures:
  - Drop the outliers.
  - Use a transformation, such as a log transformation, that eliminates the problem.
  - Use advanced procedures to correct the problem (Weighted Least Squares, Generalized Least Squares).
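The Weighted Least Squares cure can be sketched in a few lines: each observation is weighted by the inverse of its (estimated) error variance, so noisier observations count for less. The data and weights below are made up for illustration; on noiseless data any positive weights recover the true slope exactly.

```python
# Minimal weighted least squares sketch, assuming weights w_i = 1 / var_i.
# Downweighting high-variance observations restores efficiency when the
# error variance differs across observations.
def wls_slope(x, y, w):
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    return (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)

# Tiny worked example: points lying exactly on y = 1 + 2x.
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
w = [4, 2, 1, 1]           # e.g. 1 / estimated error variance
print(wls_slope(x, y, w))  # 2.0
```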
14. Heteroscedasticity
- Examples
  - Data on firms of different sizes: there is likely to be more heterogeneity in management for small firms.
    - Small firms -> big errors
    - Large firms -> small errors
  - Data on proportions gathered from groups of different sizes: large groups are likely to give better estimates.
    - Example: college graduation rates.
15. Autocorrelation
- This occurs when the error in one observation is correlated with the error in another observation.
- This is generally a time series problem.
- The correlation can be quite simple, or very complicated.
- If the correlation is with the previous observation's error, this is called 1st-order autocorrelation.
16. Example Plots of Residuals
(Figure: residual plots illustrating positive autocorrelation, negative autocorrelation, and none.)
17. The Durbin-Watson Statistic
- Used when data are collected over time to detect autocorrelation (residuals in one time period are related to residuals in another period).
- Measures violation of the independence assumption.
  - Approximately 0: positive autocorrelation.
  - Approximately 2: none.
  - Approximately 4: negative autocorrelation.
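The statistic itself is d = sum of squared changes in successive residuals divided by the sum of squared residuals. A minimal sketch, with two made-up residual series chosen to land near each end of the 0-to-4 range:

```python
# Durbin-Watson statistic from a series of residuals e_1, ..., e_n:
#   d = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2,  0 <= d <= 4.
def durbin_watson(e):
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(r ** 2 for r in e)
    return num / den

d_neg = durbin_watson([1, -1, 1, -1, 1, -1])  # alternating signs
d_pos = durbin_watson([1, 1, 1, -1, -1, -1])  # long runs of one sign
print(round(d_neg, 2))  # 3.33, toward 4: negative autocorrelation
print(round(d_pos, 2))  # 0.67, toward 0: positive autocorrelation
```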
18. The Durbin-Watson Statistic
Durbin-Watson table (one-tailed critical values), alpha = .05.

19. The Durbin-Watson Statistic
- d > dH indicates ACCEPT NULL.

20. The Durbin-Watson Statistic
- d < dL indicates REJECT NULL.

21. The Durbin-Watson Statistic
- dL < d < dH is inconclusive.

22. The Durbin-Watson Statistic
- To test for NEGATIVE autocorrelation, use 4 - d.
- Example: d = 3.5, n = 15, p = 2. Use 4 - d = .5: reject the null.
23. Durbin-Watson Example
Relationship between sales and customers.

24. Durbin-Watson Example
- DW = .88, p = 1 (number of independent variables), n = 15 (number of observations), dL = 1.08, dH = 1.36.
- Conclusion: reject the null of no positive autocorrelation (DW < dL).
25. Problem and Cure
- Autocorrelation present:
  - t-tests are biased; the estimated standard error is too small.
- Degree of autocorrelation known (or estimated):
  - Remove it by differencing the data.
  - Special case: correlation = 1 -> first difference the data.
  - We then run the regression using the differenced series Y' and X' instead of Y and X.
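The differencing step can be sketched as quasi-differencing with the estimated correlation rho: Y'_t = Y_t - rho * Y_{t-1}, and likewise for X. The series below is made up for illustration; with rho = 1 this reduces to an ordinary first difference.

```python
# Quasi-differencing to remove 1st-order autocorrelation:
#   Y'_t = Y_t - rho * Y_{t-1}  (the regression is then run on Y', X').
def quasi_difference(series, rho):
    return [series[t] - rho * series[t - 1] for t in range(1, len(series))]

y = [10, 12, 15, 19, 24]
print(quasi_difference(y, 1.0))    # first differences: [2.0, 3.0, 4.0, 5.0]
print(quasi_difference(y, 0.725))  # rho estimated as 1 - d/2, e.g. .725
```

Note that one observation is lost at the start of each differenced series.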
26. Example
How is the birth rate related to wars and women in the labor force?
- WLF = labor force participation of women
- Divorce = divorce rate
- returnyr = 3 years following a war
27. Results

28. Residuals
- DW = .55: reject the null of no autocorrelation.
- Estimate rho as r = 1 - d/2 = 1 - .55/2 = .725.

29. Differencing Data

30. Results
31. Multicollinearity
- Does not violate the assumptions of least squares (unless the collinearity is perfect).
- Estimates have low ability to reject false null hypotheses (low power).
- A post hoc problem.
- Little can be done: eliminating a variable could cause omitted variable bias.
32. Multicollinearity
- May require testing groups of variables instead of individual slopes.
- Use an F-test for a group of variables that measure a similar idea rather than testing the idea by looking at individual t-tests.
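A common diagnostic (not shown on the slides, added as an illustrative sketch) is the variance inflation factor, VIF = 1 / (1 - R^2), where R^2 comes from regressing one independent variable on the others. With just two regressors, R^2 is simply their squared correlation. The car data below are hypothetical.

```python
# Collinearity diagnostic for the two-regressor case: VIF = 1 / (1 - r^2),
# where r is the correlation between the two independent variables.
def correlation(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

weight = [2.0, 2.5, 3.0, 3.5, 4.0, 3.2]   # hypothetical car weights (1000 lb)
disp = [1.6, 2.0, 2.6, 3.1, 3.8, 2.7]     # hypothetical engine sizes (liters)
r = correlation(weight, disp)
vif = 1 / (1 - r ** 2)
print(round(vif, 1))  # a VIF well above 10 is a common red flag
```

A large VIF signals that the individual t-tests will have little power, which is exactly when the group F-test above is the better tool.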
33. Example of Collinearity
- How is MPG influenced by car characteristics?

34. Regression Results

35. Correlations of Independent Variables
36. Qualitative Dependent Variables
- The dependent variable is a category or membership in a group:
  - Are you a union member?
  - Which type of transportation do you use?
  - Did you graduate from college?
  - Type of worker (good / average / poor)?
37. Bivariate Dependent Variable
- Only two values (union vs. nonunion; graduate or not).
- Values can be coded as zero or 1.
- What is the probability that the person is a graduate?
- It is likely that this probability changes in a nonlinear way.
38. Probability Changing with Independent Variable Values
39. Transforming the Dependent Variable
- The two most popular transformations are LOGIT and PROBIT.
- LOGIT is the log of the odds:
  log(p / (1 - p)) = b0 + b1*x
  or, equivalently, p = e^(b0 + b1*x) / (1 + e^(b0 + b1*x)), where e is the base of the natural log.
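The pair of formulas above can be sketched directly. The coefficients b0 and b1 below are assumed, illustrative values, not estimates from any real data:

```python
import math

# Logit transformation and its inverse: the log-odds are linear in x,
# so the predicted probability always stays inside (0, 1).
def logit(p):
    return math.log(p / (1 - p))

def inverse_logit(z):
    return math.exp(z) / (1 + math.exp(z))

b0, b1 = -1.0, 0.5   # hypothetical coefficients
x = 2.0
p = inverse_logit(b0 + b1 * x)  # predicted probability at x
print(round(p, 3))              # 0.5, since b0 + b1*x = 0 here
print(round(logit(0.5), 3))     # 0.0: the log-odds of an even chance
```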
40. LOGIT
- Around p = .5, estimates from the linear and nonlinear models yield similar results.
- LOGIT or PROBIT constrain forecasts to the zero-to-1 range, making them more efficient.
- LOGIT or PROBIT yield coefficients that, like regression slopes, are interpreted in terms of changes in probabilities.
41. LOGIT
- Coefficients are distributed as Z (normal).
- Since this is a multiplicative model, the actual change in probability depends on the levels of the other independent variables.
42. Example Logit Application: Measuring Bank Discrimination

43. Example Logit Application: Measuring Bank Discrimination

44. Example: Predicting Promotion
45. More Than Two Categories
- If the categories are ORDERED (1 to 3, with three being best), you can use ORDERED LOGIT or PROBIT.
- If the categories are unordered (e.g., type of transportation used), you need a multinomial LOGIT or PROBIT.
46. Chapter Summary
- Presented the multiple regression model.
- Considered the contribution of individual independent variables.
- Discussed the coefficient of determination.
- Addressed categorical explanatory variables.
- Considered transformation of variables.