Dummy Variables - PowerPoint PPT Presentation

About This Presentation
Title:

Dummy Variables

Description:

b3 = change in income for a 1-year increase in education ... Combining cross-section and time series data. Dropping variable and specification bias ...

Slides: 10
Provided by: Mrinal
Tags: dummy | study | time | variables


Transcript and Presenter's Notes


1
Dummy Variables
  • So far we have used only quantitative variables
    in regression
  • Qualitative variables such as gender, race,
    location and season are also an important part
    of the analysis
  • Dummy variables are used to include such
    qualitative variables as explanatory variables in
    a regression model
  • Logit and probit models handle a qualitative
    dependent variable; the tobit model handles a
    censored (limited) dependent variable

2
  • A dummy variable assigns 1 to observations with
    the chosen characteristic and 0 to all the rest.
  • For gender, assign 1 to female observations and 0
    to male.
  • For race (4 categories), we need to create more
    than one variable. 1st variable: 1 if African, 0
    for all other races.
  • 2nd variable: 1 if White, 0 for all other races.
  • 3rd variable: 1 if Asian, 0 for all other races.
  • 4th variable: 1 if Coloured, 0 for all other
    races.
  • Important: not all 4 variables are included in
    the regression; only 4 - 1 = 3 are. The omitted
    category is the benchmark.
  • The constant gives the mean of Y for the omitted
    benchmark category.
  • The coefficient of each included dummy is
    interpreted relative to the constant (the
    benchmark).
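
The coding rule on this slide can be sketched in Python. This is a minimal illustration, not the presenter's code: the function name `make_dummies` and the six-observation sample are hypothetical.

```python
def make_dummies(values, benchmark):
    """Return {category: [0/1, ...]} for every category except the benchmark."""
    categories = sorted(set(values))
    if benchmark not in categories:
        raise ValueError("benchmark must be one of the observed categories")
    return {
        cat: [1 if v == cat else 0 for v in values]
        for cat in categories
        if cat != benchmark
    }

# Hypothetical sample of six observations (not from the source)
race = ["African", "White", "Asian", "Coloured", "White", "African"]
dummies = make_dummies(race, benchmark="Coloured")

for name, column in dummies.items():
    print(name, column)
```

Four categories yield three dummy columns. The benchmark observation (the fourth one here) has 0 in every column, so its mean is picked up by the constant.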

3
ANOVA Models
  • Regression models whose regressors are
    exclusively qualitative in nature
  • Example: sales explained only by seasonality
  • With four seasons, include 3 dummy variables.
  • The excluded season is the benchmark; its mean
    is the constant
  • Y = 234 + 35 D1 - 21 D2 - 67 D3
  • D1 = Jan-March, D2 = Apr-June, D3 = July-Sep
  • D4 = Oct-Dec (benchmark category)
  • Jan-March is the season of highest sales
  • July-Sep is the season of lowest sales
  • Another way is to include ALL 4 dummies but
    exclude the constant. Including all 4 dummies
    together with the constant leads to perfect
    multicollinearity (the dummy variable trap),
    violating an assumption of OLS
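
To see how the seasonal means follow from the coefficients, here is a small sketch. It assumes the slide's fitted equation reads Y = 234 + 35 D1 - 21 D2 - 67 D3 (the operators were lost in the transcript); the names `CONSTANT`, `COEFFICIENTS` and `predicted_sales` are hypothetical.

```python
# Coefficients as read from the slide (assumed): Y = 234 + 35*D1 - 21*D2 - 67*D3
CONSTANT = 234
# The benchmark season (Oct-Dec) has no dummy, so its coefficient is 0.
COEFFICIENTS = {"Jan-Mar": 35, "Apr-Jun": -21, "Jul-Sep": -67, "Oct-Dec": 0}

def predicted_sales(season):
    """Mean sales for a season: constant plus that season's dummy coefficient."""
    return CONSTANT + COEFFICIENTS[season]

means = {season: predicted_sales(season) for season in COEFFICIENTS}
highest = max(means, key=means.get)
lowest = min(means, key=means.get)
print(means)
print("highest:", highest, "lowest:", lowest)

# Dummy variable trap: with all 4 dummies, each observation's dummies sum to 1,
# which reproduces the constant column exactly (perfect multicollinearity).
seasons = list(COEFFICIENTS)
dummy_rows = [[1 if s == t else 0 for t in seasons] for s in seasons]
row_sums = [sum(row) for row in dummy_rows]
print(row_sums)
```

Each coefficient is a difference from the Oct-Dec mean of 234, which is why the largest positive coefficient marks the season of highest sales and the most negative one the season of lowest sales.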

4
Interaction effects
  • Dummy variables can be used to study interaction
    effects between qualitative variables such as
    race and gender.
  • Y = b1 + b2 D1 + b3 D2 + b4 D3 + b5 D4
  • D1 = gender dummy; D2-D4 = race dummies (b, w, i)
  • Constant = the omitted gender and race categories
    (female, coloured)
  • What is the income of a Black male? Introduce an
    interaction dummy D1D2
  • Y = b1 + b2 D1 + b3 D2 + b4 D3 + b5 D4 + b6 D1D2
  • b1 + b2 + b3 + b6 = expected income of a Black
    male; b6 is the additional effect of being both
    male and Black
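
A short numerical sketch of the interaction model may help. The coefficient values are hypothetical (not from the source) and serve only to show which terms switch on for each group. With D1 = 1 and D2 = 1 the fitted value is b1 + b2 + b3 + b6.

```python
# Hypothetical coefficients in arbitrary income units (not from the source)
b1, b2, b3, b4, b5, b6 = 100, 20, -10, 30, 15, -5

def expected_income(d1, d2=0, d3=0, d4=0):
    """Y = b1 + b2*D1 + b3*D2 + b4*D3 + b5*D4 + b6*D1*D2."""
    return b1 + b2 * d1 + b3 * d2 + b4 * d3 + b5 * d4 + b6 * d1 * d2

benchmark = expected_income(d1=0)        # all dummies 0: the constant b1
male_benchmark = expected_income(d1=1)   # b1 + b2
female_d2 = expected_income(d1=0, d2=1)  # b1 + b3
male_d2 = expected_income(d1=1, d2=1)    # b1 + b2 + b3 + b6

# The interaction coefficient is a difference in differences:
# (gender gap within the D2 group) minus (gender gap within the benchmark group).
interaction = (male_d2 - female_d2) - (male_benchmark - benchmark)
print(benchmark, male_benchmark, female_d2, male_d2, interaction)
```

The last line recovers b6, which shows that the interaction term measures how the gender gap in the D2 group differs from the gender gap in the benchmark group.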

5
ANCOVA
  • Regression model with both qualitative and
    quantitative regressors: Inc = f(sex, education)
  • Y = b1 + b2 D1 + b3 X1
  • Here the constant differs between male and female
  • Constant = b1 (D1 = 0, female)
  • Constant = b1 + b2 (D1 = 1, male)
  • b3 = change in income for a 1-year increase in
    education
  • To see whether the return to education differs
    for males, add an interaction term D1X1
  • Y = b1 + b2 D1 + b3 X1 + b4 D1X1
  • If b4 is positive and significant, an extra year
    of education raises male income by more than
    female income. The education coefficient for
    males is (b3 + b4)
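
The group-specific intercepts and slopes can be checked with a small sketch. It assumes the model Y = b1 + b2 D1 + b3 X1 + b4 D1X1 with D1 = 1 for male; the coefficient values and the helper names are hypothetical.

```python
# Hypothetical coefficients (not from the source); income in arbitrary units
b1, b2, b3, b4 = 50.0, 10.0, 5.0, 2.0

def income(d1, x1):
    """Y = b1 + b2*D1 + b3*X1 + b4*D1*X1, with D1 = 1 for male, 0 for female."""
    return b1 + b2 * d1 + b3 * x1 + b4 * d1 * x1

def intercept(d1):
    """Fitted income at zero years of education for the given group."""
    return income(d1, 0)

def slope(d1):
    """Change in fitted income for one extra year of education."""
    return income(d1, 1) - income(d1, 0)

print("female:", intercept(0), slope(0))  # b1 and b3
print("male:  ", intercept(1), slope(1))  # b1 + b2 and b3 + b4
```

The dummy alone shifts the intercept, while the interaction term D1X1 shifts the slope.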

6
Other uses of dummy variable
  • Checking for structural breaks; unlike the Chow
    test, the dummy approach shows whether the
    intercept, the slope, or both have changed
  • Deseasonalisation of data series
  • Piecewise linear regression (spline functions)
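
As an illustration of the last item, a piecewise linear regression with one known threshold X* is commonly written as Y = a + b1 X + b2 (X - X*) D, where D = 1 if X > X* and 0 otherwise. The sketch below evaluates that form; the threshold and coefficient values are hypothetical and the slide itself does not give a formula.

```python
KNOT = 10.0                # hypothetical threshold X*
a, b1, b2 = 2.0, 1.5, 0.5  # hypothetical coefficients

def piecewise(x):
    """Y = a + b1*X + b2*(X - X*)*D, with D = 1 only above the threshold."""
    d = 1 if x > KNOT else 0
    return a + b1 * x + b2 * (x - KNOT) * d

slope_below = piecewise(9.0) - piecewise(8.0)    # b1
slope_above = piecewise(12.0) - piecewise(11.0)  # b1 + b2
print(slope_below, slope_above, piecewise(KNOT))
```

Because the dummy multiplies (X - X*), the two segments meet at the threshold: the line changes slope there without jumping.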

7
Multicollinearity
  • Violation of the assumption of no exact linear
    relationship among the regressors
  • The slope b2 in a multiple regression is the
    change in Y for a 1-unit change in X1, holding
    the other variables constant
  • This matters because the impact of X1 on Y cannot
    be isolated if X1 is related to X2, which is also
    included in the model
  • Under perfect multicollinearity, the slopes are
    indeterminate
  • If the linear relationship is strong but not
    perfect, the slopes can still be estimated
  • However, their variances are very high in such
    cases, which distorts inference
  • t-values are low and confidence intervals wide
    because of the high standard errors
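
Why the slopes become indeterminate can be shown with a two-regressor sketch. In deviation form, the OLS slope estimators share the denominator (sum x1^2)(sum x2^2) - (sum x1 x2)^2, which is zero when one regressor is an exact multiple of the other. The data and the function name `slope_denominator` are hypothetical.

```python
def slope_denominator(x1, x2):
    """(sum x1^2)(sum x2^2) - (sum x1*x2)^2, with x1, x2 in deviations from means."""
    m1, m2 = sum(x1) / len(x1), sum(x2) / len(x2)
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    s11 = sum(u * u for u in d1)
    s22 = sum(v * v for v in d2)
    s12 = sum(u * v for u, v in zip(d1, d2))
    return s11 * s22 - s12 ** 2

x1 = [1, 2, 3, 4, 5]
x2_perfect = [2 * v for v in x1]  # exact linear function of x1
x2_near = [2, 4, 6, 8, 11]        # almost, but not exactly, 2 * x1

den_perfect = slope_denominator(x1, x2_perfect)
den_near = slope_denominator(x1, x2_near)
print(den_perfect, den_near)
```

With perfect collinearity the denominator is zero, so the slope formulas reduce to 0/0. With near collinearity it is small but nonzero: the slopes exist, but dividing by a small number inflates their variances.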

8
Detection of multicollinearity
  • High R-sq but few significant t-values
  • High pairwise and partial correlations among the
    regressors
  • Auxiliary regression: regress each X variable on
    the other X variables; multicollinearity is a
    problem if the resulting R-sq (Ra-sq) is higher
    than the R-sq of Y on the X variables
  • VIF (Variance Inflating Factor) = 1/(1 - Ra-sq)
  • If VIF > 10, very high multicollinearity
  • If VIF is close to 1, no multicollinearity
  • TOL (Tolerance factor) = 1/VIF
  • TOL close to 0, high multicollinearity
  • TOL close to 1, no multicollinearity
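
The two diagnostics can be computed directly from the auxiliary R-sq. This is a minimal sketch of the formulas VIF = 1/(1 - Ra-sq) and TOL = 1/VIF; the function names and the sample R-sq values are hypothetical.

```python
def vif(r2_aux):
    """Variance inflating factor from the R-sq of an auxiliary regression."""
    if not 0 <= r2_aux < 1:
        raise ValueError("auxiliary R-sq must lie in [0, 1)")
    return 1 / (1 - r2_aux)

def tolerance(r2_aux):
    """Tolerance is the reciprocal of the VIF, which equals 1 - R-sq."""
    return 1 / vif(r2_aux)

# Hypothetical auxiliary R-sq values
for r2 in (0.0, 0.5, 0.9, 0.99):
    print(f"Ra-sq={r2:.2f}  VIF={vif(r2):7.2f}  TOL={tolerance(r2):.2f}")
```

An auxiliary R-sq of 0.9 corresponds to a VIF of about 10, the rule-of-thumb threshold quoted on this slide.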

9
Remedial measures
  • A priori information
  • Combining cross-section and time series data
  • Dropping a variable, at the risk of specification
    bias
  • Transforming the variables: first differences,
    ratios, or deviations from the mean
  • Additional data: multicollinearity is a sample
    (SRF) problem, usually caused by micronumerosity
    (small sample size)
  • Factor analysis / PCA
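
The first-difference remedy can be illustrated with two trending series. The data below are hypothetical and constructed so that the levels move together while the period-to-period changes do not; real data will rarely behave this cleanly.

```python
from math import sqrt

def correlation(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def first_difference(series):
    """Change from each observation to the next."""
    return [curr - prev for prev, curr in zip(series, series[1:])]

# Hypothetical regressors that both trend upward over time
x1 = [10, 12, 15, 17, 20, 22]
x2 = [5, 6, 9, 12, 13, 15]

r_levels = correlation(x1, x2)
r_diffs = correlation(first_difference(x1), first_difference(x2))
print(round(r_levels, 3), round(r_diffs, 3))
```

The common trend drives the high correlation in levels; differencing removes it. The transformation costs one observation and changes the error structure, so it is a trade-off, not a free fix.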