Title: Regression Models with Categorical Variables
Topics
- Motivational example
- Dummy variables
- Interpreting dummy variable slope coefficients
- Implementing dummy variables in StatTools
Problem Scenario
- The Fifth National Bank is facing a
gender-discrimination suit. The charge is that
its female employees receive substantially
smaller salaries than its male employees.
Categorical Variables
- The bank's employee database contains categorical variables for Level of Education and Gender
- Gender: Female and Male
- How best to represent these categories with numbers?
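Categorical Variables
- Female = 1 and Male = 2?
- This numerical scale is arbitrary: the codes have no quantitative meaning, and with more than two categories such codes would force an artificial ordering and equal spacing on the model
- Better to use dummy (binary 0/1) variables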
Dummy Variables for 2 Categories
- Female = 1 and Male = 0
- The reference category is assigned 0
- The choice of reference category is arbitrary; it depends on the interpretation you want
- Here the question is: how different is a female's salary from a male's?
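Rules for Forming Dummy Variables
- Use one fewer dummy than the number of categories for any categorical variable; with an intercept in the model, including all of them makes the dummies sum to the intercept column (perfect multicollinearity, the "dummy variable trap")
- Do not also use the original categorical variables that the dummies are based on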
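The one-fewer-dummy rule can be checked directly. The Python sketch below uses a few made-up gender values (hypothetical, not the bank's data) and shows that with both a Female and a Male dummy plus the intercept column of 1s, the dummies always add up to the intercept column, which is the exact linear dependency that prevents the coefficients from being estimated uniquely.

```python
# Hypothetical Gender values, for illustration only.
genders = ["Female", "Male", "Male", "Female", "Male"]

# Build design-matrix rows using BOTH dummies plus the intercept column of 1s.
rows = []
for g in genders:
    female = 1 if g == "Female" else 0
    male = 1 if g == "Male" else 0
    rows.append((1, female, male))  # (intercept, Female, Male)

# Every row satisfies intercept = Female + Male, so the three columns are
# linearly dependent (the "dummy variable trap").
trap = all(const == f + m for const, f, m in rows)
print(trap)  # True

# Dropping Male (the reference category) removes the dependency: two columns
# are dependent only if one is a multiple of the other, and a Female column
# that takes both values 0 and 1 is not a multiple of a column of 1s.
reduced = [(const, f) for const, f, _ in rows]
print(len({f for _, f in reduced}) == 2)  # True
```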
Rules for Forming Dummy Variables
- Since gender has only 2 categories, male/female, we need just one dummy variable
- Name the dummy variable Female to indicate the impact on Salary of being female (compared to male)
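Coding the single Female dummy is a simple mapping. A minimal Python sketch, assuming the Gender column holds the strings "Female" and "Male" (the function name and sample values are hypothetical):

```python
def female_dummy(gender: str) -> int:
    """Return 1 for a female employee and 0 for a male (the reference category)."""
    if gender == "Female":
        return 1
    if gender == "Male":
        return 0
    raise ValueError(f"unexpected gender value: {gender!r}")


# Hypothetical Gender column (not the bank's data).
genders = ["Male", "Female", "Female", "Male"]
female = [female_dummy(g) for g in genders]
print(female)  # [0, 1, 1, 0]
```

Raising an error on unexpected values, rather than silently coding them as 0, keeps a typo in the data from being counted as "male".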
Dummy Variables for More than 2 Categories
- EducLev = 1 (finished high school), 2 (finished some college courses), 3 (obtained a bachelor's degree), 4 (took some graduate courses), and 5 (obtained a graduate degree)
- We need four dummy variables
Categorical Variables for Education Level
- Any four of Ed_1 to Ed_5 can be used
- Select the lowest level as the reference category to model the impact of higher levels of education on salary
- This should lead to positive coefficients for the dummies, which are easier to interpret
- Use Ed_2 to Ed_5 (Ed_1, high school, is the reference)
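The mapping from EducLev to the four dummies Ed_2 to Ed_5 can be sketched in Python as follows, assuming EducLev is stored as an integer from 1 to 5 (the function name is hypothetical):

```python
def education_dummies(educ_lev: int) -> dict[str, int]:
    """Map EducLev (1-5) to Ed_2..Ed_5, with level 1 (high school) as the reference."""
    if educ_lev not in range(1, 6):
        raise ValueError(f"EducLev must be 1-5, got {educ_lev}")
    # Ed_k is 1 only when the employee's level equals k; level 1 gives all zeros.
    return {f"Ed_{k}": int(educ_lev == k) for k in range(2, 6)}


for level in range(1, 6):
    print(level, education_dummies(level))
```

An employee at the reference level (high school) has all four dummies equal to 0; every other employee has exactly one dummy equal to 1.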
Quantitative Variables in the Regression Model
- YrsPrior = number of years worked at another bank prior to being hired at Fifth National
- YrsExper = number of years of experience working at Fifth National
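Regression Model with Dummy Variables Included
- Predicted Salary (in $1000s):
$$\widehat{\text{Salary}} = 26.613 + 0.362\,\text{YrsPrior} - 4.501\,\text{Female} + 1.033\,\text{YrsExper} + 0.160\,\text{Ed}_2 + 4.765\,\text{Ed}_3 + 7.320\,\text{Ed}_4 + 11.770\,\text{Ed}_5$$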
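The fitted equation can be turned into a prediction function. This Python sketch assumes Salary is measured in thousands of dollars (consistent with reading the 0.362 coefficient as $362); the function name and the example employee are hypothetical.

```python
# Coefficients from the fitted model (Salary in $1000s).
COEF = {
    "const": 26.613,
    "YrsPrior": 0.362,
    "Female": -4.501,
    "YrsExper": 1.033,
    "Ed_2": 0.160,
    "Ed_3": 4.765,
    "Ed_4": 7.320,
    "Ed_5": 11.770,
}


def predicted_salary(yrs_prior: float, yrs_exper: float, female: int, educ_lev: int) -> float:
    """Predicted salary in $1000s; EducLev 1 (high school) is the reference level."""
    salary = COEF["const"] + COEF["YrsPrior"] * yrs_prior + COEF["YrsExper"] * yrs_exper
    salary += COEF["Female"] * female
    if educ_lev != 1:
        salary += COEF[f"Ed_{educ_lev}"]  # only the matching education dummy equals 1
    return salary


# Hypothetical employee: female, bachelor's degree (EducLev 3),
# 2 years at another bank, 5 years at Fifth National.
print(round(predicted_salary(2, 5, 1, 3), 3))  # 32.766, i.e. about $32,766
```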
Interpretation of Regression Slope Coefficients: YrsPrior
- Slope coefficient = 0.362
- For either gender, any education level, and a given number of years of experience with Fifth National, the expected increase in salary for one extra year of prior experience with another bank is $362
Interpretation of Regression Slope Coefficients: YrsExper
- Slope coefficient = 1.033
- For either gender, any education level, and a given number of years of prior experience, the expected increase in salary for one extra year of experience with Fifth National is $1,033
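Each quantitative slope is the change in predicted salary when that variable rises by one year while everything else is held fixed. A small Python check of this reading, using the fitted coefficients (Salary in $1000s) and a hypothetical employee profile:

```python
# Fitted model from the slides (Salary in $1000s; EducLev 1 = high school is the reference).
ED_COEF = {1: 0.0, 2: 0.160, 3: 4.765, 4: 7.320, 5: 11.770}


def predicted_salary(yrs_prior, yrs_exper, female, educ_lev):
    return (26.613 + 0.362 * yrs_prior + 1.033 * yrs_exper
            - 4.501 * female + ED_COEF[educ_lev])


# Hypothetical employee: 3 years at another bank, 4 years at Fifth National,
# female, bachelor's degree. Add one year to one variable, holding the rest fixed.
base = predicted_salary(3, 4, 1, 3)
extra_prior = predicted_salary(4, 4, 1, 3) - base
extra_exper = predicted_salary(3, 5, 1, 3) - base

print(f"One more year at another bank:   ${extra_prior * 1000:,.0f}")  # $362
print(f"One more year at Fifth National: ${extra_exper * 1000:,.0f}")  # $1,033
```

Because the model is linear with no interaction terms, the same differences come out for any starting profile, which is why the slides can say "for either gender, any education level".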
Interpretation of Regression Slope Coefficients: Female
- Slope coefficient = -4.501
- For any education level and given numbers of years of prior experience and experience with Fifth National, a female employee can expect to earn $4,501 less per year than a comparable male employee
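The Female coefficient is the predicted gap between a woman and a man with identical experience and education. This Python sketch compares such matched pairs at every education level; the experience values are hypothetical and Salary is in $1000s.

```python
def predicted_salary(yrs_prior, yrs_exper, female, educ_lev):
    """Fitted model from the slides (Salary in $1000s, EducLev 1 = reference)."""
    ed_coef = {1: 0.0, 2: 0.160, 3: 4.765, 4: 7.320, 5: 11.770}
    return 26.613 + 0.362 * yrs_prior + 1.033 * yrs_exper - 4.501 * female + ed_coef[educ_lev]


# Compare a female and a male with identical (hypothetical) experience
# at each education level.
gaps = []
for level in range(1, 6):
    gap = predicted_salary(2, 6, 1, level) - predicted_salary(2, 6, 0, level)
    gaps.append(round(gap, 3))
print(gaps)  # [-4.501, -4.501, -4.501, -4.501, -4.501]
```

The gap is identical at every level because the model contains no gender-by-education interaction terms; a model with such interactions could allow the gap to vary by education.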
Interpretation of Regression Slope Coefficients: Ed_2
- Slope coefficient = 0.160
- An employee of either gender with given years of prior experience and experience with Fifth National who has completed some college courses can expect to earn $160 more per year than a comparable employee with just a high school education
Interpretation of Regression Slope Coefficients: Ed_3
- Slope coefficient = 4.765
- An employee of either gender with given years of prior experience and experience with Fifth National who has a bachelor's degree can expect to earn $4,765 more per year than a comparable employee with just a high school education
Interpretation of Regression Slope Coefficients: Ed_4
- Slope coefficient = 7.320
- An employee of either gender with given years of prior experience and experience with Fifth National who has completed some graduate courses can expect to earn $7,320 more per year than a comparable employee with just a high school education
Interpretation of Regression Slope Coefficients: Ed_5
- Slope coefficient = 11.770
- An employee of either gender with given years of prior experience and experience with Fifth National who has a graduate degree can expect to earn $11,770 more per year than a comparable employee with just a high school education
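Each education coefficient is a premium over the reference level (high school). Comparing two non-reference levels just means subtracting their coefficients. A short Python sketch using the fitted values (Salary in $1000s):

```python
# Education dummy coefficients from the fitted model (Salary in $1000s),
# each measured relative to the reference level, high school (EducLev 1).
ED_COEF = {"Ed_2": 0.160, "Ed_3": 4.765, "Ed_4": 7.320, "Ed_5": 11.770}

for name, coef in ED_COEF.items():
    print(f"{name}: ${coef * 1000:,.0f} more per year than high school only")

# The difference between two non-reference levels is the difference of their
# coefficients, e.g. graduate degree (Ed_5) versus bachelor's degree (Ed_3).
grad_vs_bachelor = ED_COEF["Ed_5"] - ED_COEF["Ed_3"]
print(f"Ed_5 vs Ed_3: ${grad_vs_bachelor * 1000:,.0f}")  # $7,005
```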
Interpretation of Regression Intercept Term (constant)
- Constant = 26.613
- A newly hired male employee with no prior experience at another bank and just a high school education can expect to earn $26,613 per year
- Assumption: the data include at least one such observation; otherwise the intercept is an extrapolation beyond the data
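The intercept is the prediction for the baseline profile (YrsPrior = 0, YrsExper = 0, Female = 0, all education dummies 0). The assumption on this slide can be checked by looking for that profile in the data. A Python sketch with hypothetical records (not the bank's data):

```python
# Hypothetical employee records: (YrsPrior, YrsExper, Female, EducLev).
records = [(0, 0, 0, 1), (2, 5, 1, 3), (0, 3, 0, 1), (4, 0, 1, 5)]

# Baseline profile: no prior or Fifth National experience, male,
# high school only (EducLev 1, so every Ed dummy is 0).
BASELINE = (0, 0, 0, 1)
INTERCEPT = 26.613  # fitted constant, Salary in $1000s

has_baseline = BASELINE in records
if has_baseline:
    print(f"Intercept ${INTERCEPT * 1000:,.0f} describes an observed profile")
else:
    print("No baseline employee in the data: the intercept is an extrapolation")
```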
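Effect of Changing Dummy Variable Reference Category
- The regression coefficients for each category will be different, but the interpretation relative to other categories remains the same: the difference between any two categories' coefficients, and every predicted salary, is unchanged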
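Switching the reference category is a reparametrization of the same fitted model. Assuming the model is refit by least squares with the same predictors, the new intercept absorbs the coefficient of the new reference level and every education coefficient shifts by that amount. This Python sketch re-expresses the education part of the fitted model with EducLev 5 (graduate degree) as the hypothetical new reference:

```python
# Education coefficients with EducLev 1 (high school) as the reference.
old_const = 26.613
old_ed = {1: 0.0, 2: 0.160, 3: 4.765, 4: 7.320, 5: 11.770}

# Re-express with EducLev 5 as the reference: add its coefficient to the
# intercept and subtract it from every level's coefficient.
new_ref = 5
new_const = old_const + old_ed[new_ref]
new_ed = {level: coef - old_ed[new_ref] for level, coef in old_ed.items()}

for level in range(1, 6):
    # Intercept + education effect (other predictors at zero) is unchanged.
    assert abs((old_const + old_ed[level]) - (new_const + new_ed[level])) < 1e-9

print(round(new_const, 3))                          # 38.383
print({k: round(v, 3) for k, v in new_ed.items()})  # {1: -11.77, 2: -11.61, 3: -7.005, 4: -4.45, 5: 0.0}
```

The coefficients now come out negative (every level earns less than the graduate-degree reference), which illustrates why choosing the lowest level as the reference gives positive coefficients that are easier to read.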
Conclusion from Regression Analysis
- One explanation for gender differences in salary might be job grade: perhaps females tend to be in lower job grades
- Why are females predominantly in the low job grades?
- Is management not advancing females as quickly as it should?
Creating Dummy Variables in StatTools
- Name the data set in the usual way
- Place the cursor anywhere in the spreadsheet and click the Data Utilities icon (3rd from left)
- Select Dummy, then click the box next to each variable for which you want dummies to insert a check mark
Creating Dummy Variables in StatTools
- Accept the default option, "Create one dummy variable for each distinct category", then click OK
- When StatTools asks whether you wish to continue inserting a new column, click Yes
- StatTools inserts the new dummy variable column(s) next to your data
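Outside StatTools, the same default behavior (one dummy per distinct category) can be reproduced in a few lines. This is a stdlib Python sketch, not StatTools' own method; the function name, column naming scheme, and sample values are hypothetical. Because it creates a dummy for every category, one column must still be dropped before regressing.

```python
def make_dummies(values, prefix):
    """Create one 0/1 column per distinct category (in sorted category order).

    Returns a dict mapping column name -> list of 0/1 values.
    """
    categories = sorted(set(values))
    return {f"{prefix}_{c}": [int(v == c) for v in values] for c in categories}


# Hypothetical EducLev column (not the bank's data).
educ = [1, 3, 5, 3, 2, 4]
dummies = make_dummies(educ, "Ed")

# One dummy per category is one too many for a model with an intercept:
# drop the reference category (Ed_1, high school) before regressing.
dummies.pop("Ed_1")
for name, column in dummies.items():
    print(name, column)
```

In pandas, `pandas.get_dummies` offers the same conversion, and its `drop_first=True` option drops the first category automatically.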