Regression Models with Categorical Variables

Provided by: hrcla

Transcript and Presenter's Notes

1
Regression Models with Categorical Variables
Topics
  • Motivational Example
  • Dummy Variables
  • Interpreting Dummy Variable Slope Coefficients
  • Implementing Dummy Variables in StatTools
2
Problem Scenario
  • The Fifth National Bank is facing a
    gender-discrimination suit. The charge is that
    its female employees receive substantially
    smaller salaries than its male employees.

3
Categorical Variables
  • The bank's employee database contains categorical
    variables for Level of Education and Gender
  • Gender: Female and Male
  • How best to represent these categories with numbers?

4
Categorical Variables
  • Female = 1 and Male = 2?
  • Such a scale assigns males the larger value, an
    arbitrary ordering that can distort the model and
    its interpretation
  • Better to use dummy (binary) variables.

5
Dummy Variables for 2 Categories
  • Female = 1 and Male = 0
  • The reference category is assigned 0
  • The choice of reference category is arbitrary
  • It depends on the intended interpretation
  • How different is a female's salary compared to a
    male's?
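
As a minimal sketch of the 0/1 coding described above, the following Python snippet turns a gender column into a Female dummy, with Male as the reference category. The sample labels and the helper name female_dummy are hypothetical and are not taken from the bank's data.

```python
# Hypothetical sample of gender labels; not taken from the bank's database.
genders = ["Female", "Male", "Male", "Female"]

def female_dummy(gender):
    """Return 1 for the category of interest (Female), 0 for the reference (Male)."""
    return 1 if gender == "Female" else 0

# One dummy value per employee.
female = [female_dummy(g) for g in genders]
print(female)  # [1, 0, 0, 1]
```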

6
Rules for forming Dummy Variables
  • We should use one fewer dummy than the number of
    categories for any categorical variable.
  • We should not use any of the original categorical
    variables that the dummies are based on.
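
One way to see why the first rule matters is the small sketch below, which assumes the model includes an intercept. With one dummy per category, the dummies add up to the intercept's column of ones, so the columns are linearly dependent and the coefficients cannot be determined uniquely. The sample labels are hypothetical.

```python
# Hypothetical gender labels for four employees.
genders = ["Female", "Male", "Male", "Female"]

# One dummy per category (two dummies for two categories): one too many.
female = [1 if g == "Female" else 0 for g in genders]
male = [1 if g == "Male" else 0 for g in genders]

# The intercept corresponds to a column of ones.
intercept = [1] * len(genders)

# Female + Male reproduces the intercept column exactly, so the three
# columns are linearly dependent and least squares has no unique solution.
redundant = [f + m for f, m in zip(female, male)] == intercept
print(redundant)  # True
```

Dropping either dummy removes the dependency, which is why a two-category variable needs only one dummy.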

7
Rules for forming Dummy Variables
  • Since gender has only 2 categories, male/female,
    we need just one dummy variable
  • Name the dummy variable Female to indicate impact
    on Salary of being female (compared to male)

8
Dummy Variables for More than 2 Categories
  • EducLev: 1 (finished high school), 2 (finished
    some college courses), 3 (obtained a bachelor's
    degree), 4 (took some graduate courses), and 5
    (obtained a graduate degree)
  • With five categories, we need four dummy variables

9
Categorical Variables for Education Level
  • Any four of Ed_1 to Ed_5 can be used
  • Select the lowest level as the reference category to
    model the impact of higher levels of education on
    salary
  • This should lead to positive coefficients for the
    dummies, which are easier to interpret.
  • Use Ed_2 to Ed_5.
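
The choice above can be sketched in Python as follows. The snippet maps each EducLev code to the four dummies Ed_2 to Ed_5 and leaves out Ed_1 as the reference. The sample codes and the helper name education_dummies are hypothetical.

```python
# Hypothetical EducLev codes (1 = high school ... 5 = graduate degree).
educ_lev = [1, 3, 5, 2, 4]

REFERENCE_LEVEL = 1  # lowest level used as the reference category
LEVELS = [1, 2, 3, 4, 5]

def education_dummies(level):
    """Map one EducLev code to a dict of Ed_2..Ed_5 dummies (Ed_1 is the reference)."""
    return {f"Ed_{k}": int(level == k) for k in LEVELS if k != REFERENCE_LEVEL}

rows = [education_dummies(lv) for lv in educ_lev]
for lv, row in zip(educ_lev, rows):
    print(lv, row)
```

An employee at the reference level gets zeros in all four dummies; every other employee has exactly one dummy equal to 1.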

10
Quantitative Variables in the Regression Model
  • YrsPrior: number of years worked at another bank
    prior to hire at Fifth National
  • YrsExper: years of experience working at Fifth
    National

11
Regression Model with Dummy Variables Included
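  • Predicted Salary = 26.613
    + 0.362 YrsPrior - 4.501 Female
    + 1.033 YrsExper + 0.160 Ed_2
    + 4.765 Ed_3 + 7.320 Ed_4 + 11.770 Ed_5
  • Salary is expressed in thousands of dollars, so a
    coefficient of 0.362 corresponds to $362 per year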
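
To make the fitted equation concrete, here is a minimal Python sketch that evaluates it for one employee. The coefficients are the ones reported on the slide. The function name predicted_salary and the example employee are hypothetical, and the salary unit (thousands) is inferred from the interpretation slides.

```python
# Coefficients as reported on the slide; salary appears to be in thousands.
COEF = {
    "const": 26.613,
    "YrsPrior": 0.362,
    "Female": -4.501,
    "YrsExper": 1.033,
    "Ed_2": 0.160,
    "Ed_3": 4.765,
    "Ed_4": 7.320,
    "Ed_5": 11.770,
}

def predicted_salary(yrs_prior, yrs_exper, female, educ_lev):
    """Predicted salary (thousands) for one employee.

    female is 1 or 0; educ_lev is the EducLev code 1-5 (1 is the reference).
    """
    salary = COEF["const"]
    salary += COEF["YrsPrior"] * yrs_prior
    salary += COEF["YrsExper"] * yrs_exper
    salary += COEF["Female"] * female
    if educ_lev != 1:  # the reference level adds nothing
        salary += COEF[f"Ed_{educ_lev}"]
    return salary

# Hypothetical employee: female, bachelor's degree, 2 years prior, 5 years at the bank.
print(round(predicted_salary(2, 5, 1, 3), 3))  # 32.766
```

Setting every input to its baseline (no experience, male, EducLev 1) returns the intercept, 26.613.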

12
Interpretation of Regression Slope Coefficients
YrsPrior
  • Slope coeff. 0.362
  • For either gender, any education level, and a
    given number of years of experience with Fifth
    National, the expected increase in salary for one
    extra year of prior experience with another bank
    is $362

13
Interpretation of Regression Slope Coefficients
YrsExper
  • Slope coeff. 1.033
  • For either gender, any education level, and a
    given number of years of prior experience, the
    expected increase in salary for one extra year of
    experience with Fifth National is $1,033

14
Interpretation of Regression Slope Coefficients
Female
  • Slope coeff. -4.501
  • For any education level and a given number of years
    of prior experience and of experience with Fifth
    National, a female employee can expect to earn
    $4,501 less per year than a male counterpart

15
Interpretation of Regression Slope Coefficients
Ed_2
  • Slope coeff. 0.160
  • For a given gender, years of prior experience, and
    years of experience with Fifth National, an
    employee who has completed some college courses
    can expect to earn $160 more per year than an
    employee with just a high school education

16
Interpretation of Regression Slope Coefficients
Ed_3
  • Slope coeff. 4.765
  • For a given gender, years of prior experience, and
    years of experience with Fifth National, an
    employee with a bachelor's degree can expect to
    earn $4,765 more per year than an employee with
    just a high school education

17
Interpretation of Regression Slope Coefficients
Ed_4
  • Slope coeff. 7.320
  • For a given gender, years of prior experience, and
    years of experience with Fifth National, an
    employee who has completed some graduate courses
    can expect to earn $7,320 more per year than an
    employee with just a high school education

18
Interpretation of Regression Slope Coefficients
Ed_5
  • Slope coeff. 11.770
  • For a given gender, years of prior experience, and
    years of experience with Fifth National, an
    employee with a graduate degree can expect to
    earn $11,770 more per year than an employee with
    just a high school education

19
Interpretation of Regression Intercept Term
constant
  • Constant 26.613
  • A freshly hired male employee with no prior work
    experience and just a high school education can
    expect to earn $26,613 per year.
  • Assumption: the data include at least one such
    observation.

20
Effect of Changing Dummy Variable Reference
Category
  • The regression coefficient for each category will
    change, but the estimated differences between
    categories, and hence the interpretation, will
    remain the same
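
The point can be illustrated with the education coefficients reported earlier. The sketch below assumes the same data and an ordinary least squares fit with an intercept, in which case changing the reference category only re-expresses the model: the new reference's coefficient is absorbed into the intercept and subtracted from every category. The helper name change_reference is hypothetical.

```python
# Education coefficients with Ed_1 (high school) as reference, in thousands.
# The reference category has an implicit coefficient of 0.
const_ref1 = 26.613
ed_ref1 = {1: 0.0, 2: 0.160, 3: 4.765, 4: 7.320, 5: 11.770}

def change_reference(const, coefs, new_ref):
    """Re-express the intercept and category coefficients relative to new_ref."""
    shift = coefs[new_ref]
    new_const = const + shift
    new_coefs = {k: v - shift for k, v in coefs.items()}
    return new_const, new_coefs

# Switch the reference to the highest level (graduate degree).
const_ref5, ed_ref5 = change_reference(const_ref1, ed_ref1, new_ref=5)

# The coefficients change, but the gap between any two categories does not.
gap_before = ed_ref1[5] - ed_ref1[3]
gap_after = ed_ref5[5] - ed_ref5[3]
print(round(const_ref5, 3), {k: round(v, 3) for k, v in ed_ref5.items()})
print(round(gap_before, 3), round(gap_after, 3))
```

With the graduate degree as reference, every education coefficient becomes zero or negative, yet a graduate degree still sits the same distance above a bachelor's degree.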

21
Effect of Changing Dummy Variable Reference
Category
22
Conclusion from Regression Analysis
  • One explanation for gender differences in salary
    might be job grade. Perhaps females tend to be in
    lower job grades.
  • Why are females predominantly in the low job
    grades?
  • Is management not advancing females as quickly as
    it should?

23
Creating Dummy Variables in StatTools
  • Name the data set in the usual way
  • Place the cursor anywhere in the spreadsheet and
    click on the Data Utilities icon (3rd from left)
  • Select Dummy, then click to place a check mark
    in the box next to the variable(s) for which
    you want dummies

24
Creating Dummy Variables in StatTools
  • Accept the default radio button, "Create one dummy
    variable for each distinct category", then click
    OK.
  • Click Yes when StatTools asks whether you wish to
    continue and insert a new column
  • StatTools will insert the new column(s) with the
    dummy variable(s) next to your data