1
Violation of Assumptions
2
Omitted Variables
  • This problem occurs when a variable is omitted
    from the specification, either through an error
    by the researcher or a lack of data.
  • If the omitted variable is uncorrelated with the
    included variables:
    • The estimated slopes are unbiased.
    • The estimated slopes are inefficient (their
      variance is too large).

3
Omitted Variables
  • If the omitted variable is correlated with the
    included variables:
    • The estimated slopes are biased.
    • The t-tests are biased (the estimated variance
      of the slopes is too small).
  • This is a serious problem because it leads us to
    reject true null hypotheses too often.
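
The bias from omitting a correlated variable can be illustrated with a small simulation (a sketch with made-up coefficients, not from the slides): the true model has slopes of 1 on both x1 and x2; dropping x2 pushes the estimated slope on x1 toward 1 + 0.8 = 1.8.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)        # correlated with x1
y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)

# Full model: the slope on x1 is estimated near its true value of 1.
X_full = np.column_stack([np.ones(n), x1, x2])
b_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)

# Omitting x2: the slope on x1 absorbs part of x2's effect (near 1.8).
X_short = np.column_stack([np.ones(n), x1])
b_short, *_ = np.linalg.lstsq(X_short, y, rcond=None)

print(b_full[1], b_short[1])
```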

4
Omitted Variables
  • This suggests that great care be taken in model
    building. It is generally poor practice to let
    the sample dictate the model: it is better to
    include a variable that does not belong than to
    exclude one that does.

5
EXAMPLE
6
EXAMPLE
7
Effect of Omitted Variable
8
Measurement Error
  • In the dependent variable:
    • Slopes remain unbiased, but their standard
      errors grow, so null hypotheses that are false
      become more difficult to reject.
  • In an independent variable:
    • The slope on that variable is biased toward
      zero (attenuation). Slopes of other variables
      correlated with it can also be biased, which
      can lead to rejecting true nulls.
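
A quick sketch of the attenuation effect (hypothetical data, not from the slides): adding measurement noise to x with the same variance as x itself cuts the estimated slope roughly in half.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10_000
x_true = rng.normal(size=n)
y = 2.0 * x_true + rng.normal(size=n)     # true slope is 2

# Measurement error with the same variance as x_true.
x_noisy = x_true + rng.normal(size=n)

# Simple regression slope = cov(x, y) / var(x).
slope_true = np.cov(x_true, y)[0, 1] / np.var(x_true, ddof=1)
slope_noisy = np.cov(x_noisy, y)[0, 1] / np.var(x_noisy, ddof=1)

print(slope_true, slope_noisy)  # roughly 2 vs roughly 1
```

With equal signal and noise variance the slope is attenuated by the factor var(x) / (var(x) + var(error)) = 1/2.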

9
Measurement Error
  • Implications
  • Suppose your dependent variable is hard to
    measure (e.g., product satisfaction or quality
    of work). If you do find results, they would be
    even stronger if you could measure the variable
    accurately. A significant result with a variable
    that is difficult to measure should not be
    dismissed!

10
Measurement Error
  • Implications
  • Suppose your independent variable is hard to
    measure (e.g., product satisfaction or quality
    of work). The same logic applies (a significant
    result would be even more significant with
    accurate measurement). HOWEVER, poor measurement
    can lead you to give MORE credit than is due to
    another, correlated variable.

11
Measurement Error
  • Conclusions
  • Measure your variables as accurately as possible
    to improve the power of your tests.
  • If an independent variable is difficult to
    measure, you must worry about the results for
    other variables in the model.

12
Heteroscedasticity
  • Typically a problem in cross-sectional data.
  • Slopes are unbiased, but inefficient.
  • However, this is often an indication of an
    omitted variable problem, in which case the
    slopes are potentially biased.

13
Heteroscedasticity
  • Usually occurs due to a few outliers. Possible
    cures:
  • Drop the outliers.
  • Use a transformation, such as a log
    transformation, that eliminates the problem.
  • Use advanced procedures that correct the problem
    (Weighted Least Squares, Generalized Least
    Squares).
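
A minimal sketch of the Weighted Least Squares cure, under the assumption (made up for illustration) that the error standard deviation is proportional to x. Each observation is weighted by the inverse of its error variance:

```python
import numpy as np

# Hypothetical data: true model y = 2 + 3x, noise sd proportional to x.
rng = np.random.default_rng(0)
n = 200
x = np.linspace(1, 10, n)
y = 2.0 + 3.0 * x + rng.normal(0, x)      # heteroscedastic errors

X = np.column_stack([np.ones(n), x])

# OLS: unbiased but inefficient under heteroscedasticity.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# WLS: weight by 1/variance (here variance ~ x**2), i.e. scale each
# row by sqrt(weight) and run OLS on the transformed data.
w = 1.0 / x**2
Xw = X * np.sqrt(w)[:, None]
yw = y * np.sqrt(w)
beta_wls, *_ = np.linalg.lstsq(Xw, yw, rcond=None)

print(beta_ols, beta_wls)
```

Both estimators recover a slope near 3; the WLS estimate simply has a smaller variance when the weights reflect the true error structure.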

14
Heteroscedasticity
  • Examples
  • Data on firms of different sizes: there is
    likely to be more heterogeneity in management
    among small firms.
  • Small firms -> big errors
  • Large firms -> small errors
  • Data on proportions gathered from groups of
    different sizes.
  • Large groups are likely to give better estimates.
  • Example: college graduation rates.

15
Autocorrelation
  • This occurs when the error in one observation is
    correlated with the error in another observation.
  • This is generally a time series problem.
  • This correlation can be quite simple, or very
    complicated.
  • If the correlation is with the previous
    observation's error, this is called first-order
    autocorrelation.

16
Example Plots of Residuals
[Residual plots: positive autocorrelation,
negative autocorrelation, none]
17
The Durbin-Watson Statistic
  • Used when data are collected over time to detect
    autocorrelation (residuals in one time period
    are related to residuals in another period).
  • Measures violation of the independence
    assumption:
  • Approximately 0 -> positive autocorrelation
  • Approximately 2 -> none
  • Approximately 4 -> negative autocorrelation
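
The statistic can be computed directly from the residuals as d = Σ(e_t − e_{t−1})² / Σe_t². A minimal sketch (simulated residuals, not the slides' data):

```python
import numpy as np

def durbin_watson(residuals):
    """d = sum((e_t - e_{t-1})^2) / sum(e_t^2); near 2 means no autocorrelation."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(1)
e_none = rng.normal(size=500)             # independent errors: d near 2
e_pos = np.cumsum(rng.normal(size=500))   # strong positive autocorrelation: d near 0

print(durbin_watson(e_none), durbin_watson(e_pos))
```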

18
The Durbin-Watson Statistic
Durbin-Watson table (one-tailed critical values), α = .05
19
The Durbin-Watson Statistic
Durbin-Watson table (one-tailed critical values), α = .05
d > dH indicates ACCEPT NULL
20
The Durbin-Watson Statistic
Durbin-Watson table (one-tailed critical values), α = .05
d < dL indicates REJECT NULL
21
The Durbin-Watson Statistic
Durbin-Watson table (one-tailed critical values), α = .05
dL < d < dH is inconclusive
22
The Durbin-Watson Statistic
Durbin-Watson table (one-tailed critical values), α = .05
Test for NEGATIVE autocorrelation: use 4 - d.
Example: d = 3.5, n = 15, p = 2; use 4 - d = .5 ->
reject null.
23
Durbin-Watson Example
Relationship between sales and customers
24
Durbin-Watson Example
DW = .88, p = 1 (# of variables), n = 15 (# of
observations), dL = 1.08, dH = 1.36
Conclusion: reject the null of no positive
autocorrelation (DW < dL)
25
Problem and Cure
  • Autocorrelation present:
  • t-tests are biased - the estimated standard
    error is too small.
  • Degree of autocorrelation known (or estimated):
  • Remove it by differencing the data.
  • Special case: correlation = 1 -> first-difference
    the data.

We then run the regression using the differenced
series Y* and X* instead of Y and X.
26
EXAMPLE
How is the birth rate related to wars and women in
the labor force? WLF = labor force participation
of women; Divorce = divorce rate; returnyr = the 3
years following a war
27
Results
28
Residuals
DW = .55: reject the null of no autocorrelation.
Estimate rho as r = 1 - d/2 = 1 - .55/2 = .725
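
The quasi-differencing step can be sketched as follows, using the slide's values d = .55 and rho = 1 − d/2 = .725 (the toy y and x series below are made up for illustration):

```python
import numpy as np

def quasi_difference(series, rho):
    """Transform z_t -> z_t - rho * z_{t-1} (drops the first observation)."""
    z = np.asarray(series, dtype=float)
    return z[1:] - rho * z[:-1]

# rho estimated from the Durbin-Watson statistic, as on the slide.
d = 0.55
rho = 1 - d / 2  # 0.725

# Toy series (made up for illustration).
y = np.array([10.0, 12.0, 15.0, 14.0, 18.0])
x = np.array([1.0, 2.0, 3.0, 2.5, 4.0])
y_star = quasi_difference(y, rho)
x_star = quasi_difference(x, rho)
# One would then regress y_star on x_star instead of y on x.
print(y_star)
```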
29
Differencing Data
30
Results
31
Multicollinearity
  • Does not violate the assumptions of least squares
    (unless the variables are perfectly collinear).
  • Estimates have low ability to reject false null
    hypotheses (low power).
  • A post hoc problem.
  • Little can be done - eliminating a variable
    could cause omitted variable bias.

32
Multicollinearity
  • May require testing groups of variables instead
    of individual slopes.
  • Use an F-test for a group of variables that
    measure a similar idea, rather than testing the
    idea with individual t-tests.
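
One common diagnostic for multicollinearity (not shown on the slides, added here as a sketch) is the variance inflation factor, VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing variable j on the other independent variables:

```python
import numpy as np

def vif(X, j):
    """Variance inflation factor of column j of X: regress column j on
    the remaining columns (with intercept) and return 1 / (1 - R^2)."""
    X = np.asarray(X, dtype=float)
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    Z = np.column_stack([np.ones(len(y)), others])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return 1.0 / (1.0 - r2)

# Simulated example: x1 and x2 are nearly collinear, x3 is independent.
rng = np.random.default_rng(2)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.1, size=100)
x3 = rng.normal(size=100)
X = np.column_stack([x1, x2, x3])
print(vif(X, 0), vif(X, 1), vif(X, 2))  # first two large, last near 1
```

A large VIF signals the low-power problem described above: the slope is estimable, but its variance is inflated.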

33
Example of Collinearity
  • How is MPG influenced by car characteristics?

34
Regression Results
35
Correlations of Independent Variables
36
Qualitative Dependent Variables
  • The dependent variable is a category or
    membership in a group:
  • Are you a union member?
  • Which type of transportation do you use?
  • Did you graduate from college?
  • Type of worker (Good / Average / Poor)

37
Bivariate Dependent Variable
  • Only two values (union/nonunion; graduate or
    not).
  • Values can be coded as zero or one.
  • What is the probability that the person is a
    graduate?
  • It is likely that this probability changes in a
    nonlinear way.

38
Probability changing with independent variable
values
39
Transforming the dependent variable
  • The two most popular transformations are LOGIT
    and PROBIT.
  • LOGIT is the log of the odds:
  • log(p / (1 - p)) = b0 + b1*x,
  • or equivalently p = 1 / (1 + e^-(b0 + b1*x)),
    where e is the base of the natural logarithm.
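
The logit and its inverse (the logistic function) can be sketched directly; this also shows why forecasts stay inside the zero-to-one range:

```python
import math

def logit(p):
    """Log odds: log(p / (1 - p)); defined for 0 < p < 1."""
    return math.log(p / (1 - p))

def inv_logit(z):
    """Logistic function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# The two transformations are inverses of each other.
p = 0.8
z = logit(p)
print(z, inv_logit(z))
```

Because inv_logit is bounded between 0 and 1, a LOGIT model's fitted probabilities can never fall outside the probability range, unlike a linear probability model.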

40
LOGIT
  • Around p = .5, estimates from the linear and
    nonlinear models yield similar results.
  • LOGIT and PROBIT constrain forecasts to the
    zero-to-one range, making them more efficient.
  • LOGIT and PROBIT yield coefficients that, like
    regression coefficients, can be interpreted in
    terms of changes in probability.

41
LOGIT
  • Coefficients are distributed as Z (normal)
  • Since this is a multiplicative model, the actual
    change in probability depends on the level of the
    other independent variables

42
Example Logit Application Measuring Bank
Discrimination
43
Example Logit Application Measuring Bank
Discrimination
44
Example Predicting Promotion
45
More than two categories
  • If the categories are ORDERED (1 to 3, three
    being best), you can use ORDERED LOGIT or PROBIT.
  • If the categories are unordered (e.g., type of
    transportation used), you need a multinomial
    LOGIT or PROBIT.

46
Chapter Summary
  • Presented The Multiple Regression Model
  • Considered Contribution of Individual Independent
    Variables
  • Discussed Coefficient of Determination
  • Addressed Categorical Explanatory Variables
  • Considered Transformation of Variables