Title: Claudio LUCIFORA
Università Cattolica del Sacro Cuore
Istituto di Economia dell'Impresa e del Lavoro
APPLIED ECONOMETRICS
Module 2: Multiple Regression Analysis with Dummy Variables
Main references
- Brucchi Luchino (2000), Manuale di economia del lavoro, Ch. 19-20, Il Mulino.
- Wooldridge, J.M. (2000), Introductory Econometrics: A Modern Approach, Second edition, Ch. 19, MIT Press.
- Altonji, J. and R. Blank (1999), "Race and Gender in the Labor Market", in O. Ashenfelter and D. Card (eds.), Handbook of Labor Economics, vol. 3C, North-Holland.
Structure of the Presentation
- Multiple Regression Analysis with dummy variables
- Defining the dummies
- Interpreting coefficients
- Dummy variable trap
- Dummy interactions
- Spline functions
- An empirical application: gender pay differentials (Oaxaca decomposition)
Structure of the Presentation (continued)
- Problems
- Some caveats (selection, endogeneity, heterogeneity)
- Self-selection
- The index number problem (or the discrimination-free wage structure)
- Differences in distributions: the Juhn, Murphy and Pierce decomposition
Using dummy variables
- The purpose of this module is to show how to define, construct, specify, estimate and interpret dummy variables in empirical analysis.
- Dummies are among the most common variables found in the empirical analysis of survey data.
- We use dummies to account for qualitative factors, such as membership in a group (e.g. gender), a selected time period (e.g. 2001), or a specific threshold (e.g. the highest level of education achieved).
- Dummies make it possible to specify an impressive variety of models.
Dummy variable/1
- A dummy variable is a variable that takes on the value 1 or 0. It can be:
- the recoding of a binary attribute such as gender (male = 1, female = 0);
- the recoding of a multilevel attribute such as region (1-20), e.g. north = 1 if region 1-5, 0 otherwise; south = 1 if region 15-20, 0 otherwise;
- the recoding of a continuous variable such as age, e.g. young = 1 if age 1-15, 0 otherwise; old = 1 if age 55 or over, 0 otherwise.
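The recodings listed above can be sketched as a small function. This is a minimal illustration, not the course's own code: the record field names (`sex`, `region`, `age`) and the sample records are hypothetical, and the cut-offs are the ones given on the slide.

```python
# Minimal sketch: recoding raw survey variables into 0/1 dummies.
# Cut-offs follow the slide: north = regions 1-5, south = regions 15-20,
# young = ages 1-15, old = ages 55 and over. Field names are hypothetical.

def make_dummies(record):
    """Return a dict of 0/1 dummies for one survey record."""
    region, age = record["region"], record["age"]
    return {
        "male": 1 if record["sex"] == "M" else 0,   # binary attribute
        "north": 1 if 1 <= region <= 5 else 0,      # multilevel attribute
        "south": 1 if 15 <= region <= 20 else 0,
        "young": 1 if 1 <= age <= 15 else 0,        # continuous variable
        "old": 1 if age >= 55 else 0,
    }

people = [
    {"sex": "F", "region": 3, "age": 40},
    {"sex": "M", "region": 18, "age": 60},
    {"sex": "M", "region": 10, "age": 12},
]
dummies = [make_dummies(p) for p in people]
for d in dummies:
    print(d)
```

Note that a region such as 10 gets north = 0 and south = 0: it belongs to the omitted (base) category.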
A Dummy Independent Variable
- Consider a simple model with one continuous variable (x) and one dummy (d): $y = \beta_0 + \delta_0 d + \beta_1 x + u$
- This can be interpreted as an intercept shift:
- If d = 0, then $y = \beta_0 + \beta_1 x + u$
- If d = 1, then $y = (\beta_0 + \delta_0) + \beta_1 x + u$
- The case d = 0 is the base group.
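Comparing two means
- Suppose, for example, that y is income and d indicates whether or not the individual attended college (i.e. ignore the continuous variable x):
- $E(income \mid \text{did not attend college}) = \beta_0$
- $E(income \mid \text{attended college}) = \beta_0 + \delta_0$
- So $\delta_0$ is the difference in mean income between the two groups.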
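To see that the regression of y on a constant and a single dummy just reproduces the two group means, here is a small sketch. The income figures are made up for illustration. OLS is computed with the simple-regression formulas (slope = cov/var, intercept = mean of y minus slope times mean of d).

```python
# Sketch: with only a constant and a dummy, OLS reproduces the two group means.
# Incomes below are made-up illustration data, not survey results.
from statistics import mean

income = [20.0, 24.0, 22.0, 30.0, 34.0, 38.0]
college = [0, 0, 0, 1, 1, 1]                  # d = 1 if attended college

# Simple-regression OLS: slope = cov(d, y) / var(d), intercept = ybar - slope * dbar
dbar, ybar = mean(college), mean(income)
cov = sum((d - dbar) * (y - ybar) for d, y in zip(college, income))
var = sum((d - dbar) ** 2 for d in college)
delta0 = cov / var
beta0 = ybar - delta0 * dbar

mean_no = mean(y for y, d in zip(income, college) if d == 0)    # 22.0
mean_yes = mean(y for y, d in zip(income, college) if d == 1)   # 34.0
print(beta0, delta0)                  # intercept and dummy coefficient
print(mean_no, mean_yes - mean_no)    # base-group mean and the gap in means
```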
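Example of $\delta_0 > 0$
[Figure: two parallel lines in the (x, y) plane with common slope $\beta_1$: $y = \beta_0 + \beta_1 x$ for d = 0 (intercept $\beta_0$) and $y = (\beta_0 + \delta_0) + \beta_1 x$ for d = 1, shifted up by $\delta_0$.]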
Dummies for Multiple Categories
- We can use dummy variables to control for something with multiple categories.
- Suppose you want to analyse education, and everyone in your data is either a HS dropout (1), HS grad only (2), or college grad (3).
- To compare HS grads and college grads to HS dropouts, you need 2 dummy variables:
- hsgrad = 1 if HS grad only, 0 otherwise; colgrad = 1 if college grad, 0 otherwise.
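A minimal sketch of building the two education dummies from the 1/2/3 code used on the slide. The function name `education_dummies` and the sample codes are hypothetical. HS dropouts map to (0, 0) and therefore form the base group.

```python
# Sketch: turning a 3-level education code into n - 1 = 2 dummies.
# Codes follow the slide: 1 = HS dropout (base), 2 = HS grad only, 3 = college grad.

def education_dummies(code):
    """Return (hsgrad, colgrad); HS dropouts get (0, 0) and act as the base group."""
    if code not in (1, 2, 3):
        raise ValueError(f"unknown education code: {code}")
    return (1 if code == 2 else 0, 1 if code == 3 else 0)

educ = [1, 2, 3, 2, 1]
rows = [education_dummies(c) for c in educ]
print(rows)  # [(0, 0), (1, 0), (0, 1), (1, 0), (0, 0)]
```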
Dummy variable trap
- Because the base group is represented by the intercept, if there are n categories there should be n - 1 dummy variables. Including all n dummies together with the intercept makes them perfectly collinear, so the coefficients cannot be estimated (the dummy variable trap).
- Industry differentials (20 industries): include 20 - 1 = 19 industry dummies.
- If there are many categories (or a continuous variable), it may make sense to group some together (e.g. regions 1-20 into north, centre, south).
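A quick way to see the trap: if every category gets a dummy, the dummy columns add up to the constant column in every row. That is an exact linear dependence, so X'X is singular. The sketch below uses made-up education codes. It checks the dependence row by row rather than inverting a matrix.

```python
# Sketch of the dummy variable trap: with an intercept plus one dummy for EVERY
# category, the dummy columns sum to the constant column in every row.

educ = [1, 2, 3, 2, 1, 3]                  # 1 dropout, 2 HS grad, 3 college (made up)
const = [1] * len(educ)
hsdrop = [1 if e == 1 else 0 for e in educ]
hsgrad = [1 if e == 2 else 0 for e in educ]
colgrad = [1 if e == 3 else 0 for e in educ]

# Exact linear dependence: hsdrop + hsgrad + colgrad - const = 0 in every row
residual = [a + b + c - k for a, b, c, k in zip(hsdrop, hsgrad, colgrad, const)]
trapped = all(r == 0 for r in residual)
print(trapped)  # True -> perfect collinearity, OLS coefficients not identified

# Dropping one category (here hsdrop, the base group) removes the dependence:
# with n = 3 categories we keep n - 1 = 2 dummies next to the intercept.
```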
Interactions Among Dummies
- Interacting dummy variables is like subdividing the group.
- Example: we have dummies for gender (male = 1), as well as hsgrad and colgrad.
- Add gender*hsgrad and gender*colgrad, for a total of 5 dummy variables and 6 categories.
- The base group is female HS dropouts.
- hsgrad is for female HS grads and colgrad is for female college grads (each relative to the base group).
- The interactions capture the extra difference for male HS grads and male college grads beyond the separate gender and education effects.
Dummy Interactions/1
- Formally, the model is
- $y = \beta_0 + \delta_1 gender + \delta_2 hsgrad + \delta_3 colgrad + \delta_4 gender \cdot hsgrad + \delta_5 gender \cdot colgrad + \beta_1 x + u$
- If gender = 0, hsgrad = 0 and colgrad = 0: $y = \beta_0 + \beta_1 x + u$ (base group: female HS dropouts)
- If gender = 0, hsgrad = 1 and colgrad = 0: $y = (\beta_0 + \delta_2) + \beta_1 x + u$ (female HS grads)
- If gender = 1, hsgrad = 0 and colgrad = 1: $y = (\beta_0 + \delta_1 + \delta_3 + \delta_5) + \beta_1 x + u$ (male college grads)
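The bookkeeping for all six gender-by-education cells can be sketched as below. The coefficient values are hypothetical and serve only to make the arithmetic concrete. The sketch also shows that $\delta_5$ is a difference in differences: it equals the college premium (relative to dropouts) for men minus the same premium for women.

```python
# Sketch: intercept implied for each of the 6 gender x education cells.
# Coefficient values are hypothetical, chosen only to illustrate the bookkeeping.
b0, d1, d2, d3, d4, d5 = 1.0, 0.20, 0.30, 0.60, -0.05, 0.10

def cell_intercept(gender, hsgrad, colgrad):
    """Intercept of y = b0 + d1*gender + d2*hsgrad + d3*colgrad
    + d4*gender*hsgrad + d5*gender*colgrad (b1*x is common to all cells)."""
    return (b0 + d1 * gender + d2 * hsgrad + d3 * colgrad
            + d4 * gender * hsgrad + d5 * gender * colgrad)

for gender in (0, 1):
    for hsgrad, colgrad in ((0, 0), (1, 0), (0, 1)):
        print(gender, hsgrad, colgrad, round(cell_intercept(gender, hsgrad, colgrad), 2))

# d5 = (male college - male dropout) - (female college - female dropout)
did = ((cell_intercept(1, 0, 1) - cell_intercept(1, 0, 0))
       - (cell_intercept(0, 0, 1) - cell_intercept(0, 0, 0)))
print(round(did, 2))
```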
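Dummy Interactions/2
- We can also interact a dummy variable, d, with a continuous variable, x: $y = \beta_0 + \delta_0 d + \beta_1 x + \delta_1 d \cdot x + u$
- If d = 0, then $y = \beta_0 + \beta_1 x + u$
- If d = 1, then $y = (\beta_0 + \delta_0) + (\beta_1 + \delta_1) x + u$
- N.B. $\delta_0$ is a shift in the intercept, while $\delta_1$ is interpreted as a change in the slope.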
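With a dummy-by-continuous interaction, each group gets its own intercept and slope. Here is a small sketch with hypothetical coefficients, chosen so that $\delta_0 > 0$ and $\delta_1 < 0$. It computes both lines and the point where they cross, $x^* = -\delta_0 / \delta_1$, beyond which the d = 1 line lies below the d = 0 line.

```python
# Sketch: group-specific intercept and slope implied by
# y = b0 + d0*d + b1*x + d1*d*x + u. Coefficient values are hypothetical.
b0, d0, b1, d1 = 2.0, 1.0, 0.5, -0.25   # d0 > 0 and d1 < 0

def line(d):
    """Return (intercept, slope) of the regression line for group d (0 or 1)."""
    return b0 + d0 * d, b1 + d1 * d

(a0, s0), (a1, s1) = line(0), line(1)
# The two lines cross where a0 + s0*x = a1 + s1*x, i.e. x* = -d0 / d1
x_star = (a1 - a0) / (s0 - s1)
print(line(0), line(1), x_star)   # (2.0, 0.5) (3.0, 0.25) 4.0
```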
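Example of $\delta_0 > 0$ and $\delta_1 < 0$
[Figure: the line $y = \beta_0 + \beta_1 x$ for d = 0 and the line $y = (\beta_0 + \delta_0) + (\beta_1 + \delta_1) x$ for d = 1, which has a higher intercept and a smaller slope.]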
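Spline regression
- The function we wish to estimate:
- $y = \beta_0 + \delta_0\, age + u$ if $age \le 18$
- $y = \beta_1 + \delta_1\, age + u$ if $18 < age \le 22$
- $y = \beta_2 + \delta_2\, age + u$ if $age > 22$
- Specify using dummy variables:
- $D_1 = 1$ if $age > 18$, 0 otherwise
- $D_2 = 1$ if $age > 22$, 0 otherwise
- $y = \beta + \delta\, age + \gamma D_1 + \lambda D_1 \cdot age + \tau D_2 + \sigma D_2 \cdot age + u$
- Slope of age in each segment: (1.) $\delta$, (2.) $(\delta + \lambda)$, (3.) $(\delta + \lambda + \sigma)$
- Specified in this way, the function is discontinuous: nothing forces the segments to meet at the knots.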
It is discontinuous
[Figure: y against x (age), showing three line segments that break at the knots 18 and 22 without joining.]
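Spline regression/2
- To make it continuous we require the segments to join at the knots:
- at 18: $\beta + 18\delta = (\beta + \gamma) + 18(\delta + \lambda)$, i.e. $\gamma = -18\lambda$
- at 22: $(\beta + \gamma) + 22(\delta + \lambda) = (\beta + \gamma + \tau) + 22(\delta + \lambda + \sigma)$, i.e. $\tau = -22\sigma$
- These are linear restrictions on the parameters. Substituting them into the unrestricted model gives
- $y = \beta + \delta\, age + \lambda D_1 (age - 18) + \sigma D_2 (age - 22) + u$
- Estimate with constrained least squares, which here amounts to OLS on the transformed regressors:
- $X_1 = age$; $X_2 = age - 18$ if $age > 18$, 0 otherwise; $X_3 = age - 22$ if $age > 22$, 0 otherwise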
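The restricted spline can be fitted by ordinary least squares on a constant, age, $\max(age - 18, 0)$ and $\max(age - 22, 0)$, which are the transformed regressors $X_1$, $X_2$, $X_3$ on this slide. In the sketch below, the parameter values are hypothetical. The data are simulated without noise so that OLS should recover them exactly. `numpy.linalg.lstsq` is used as the least-squares solver.

```python
# Sketch: continuous linear spline in age with knots at 18 and 22, fitted by OLS.
# The data are simulated (noise-free) so that the true coefficients are known.
import numpy as np

def spline_design(age, knots=(18.0, 22.0)):
    """Columns: constant, age, D1*(age - 18), D2*(age - 22),
    where Dk = 1 if age > knot k (equivalently max(age - knot, 0))."""
    age = np.asarray(age, dtype=float)
    cols = [np.ones_like(age), age] + [np.maximum(age - k, 0.0) for k in knots]
    return np.column_stack(cols)

# Hypothetical true parameters: beta, delta (slope up to 18),
# lam (slope change after 18), sigma (slope change after 22)
true = np.array([1.0, 0.10, 0.30, -0.35])
age = np.arange(14, 31, dtype=float)          # ages 14..30
y = spline_design(age) @ true

coef, *_ = np.linalg.lstsq(spline_design(age), y, rcond=None)
beta, delta, lam, sigma = coef
print(np.round(coef, 6))
print("slopes by segment:", delta, delta + lam, delta + lam + sigma)
```

Because the kinked regressors are zero at their own knot, the fitted function is continuous by construction. Only the slopes change at 18 and 22.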
Testing for Differences
- Testing whether a regression function differs between two groups amounts to testing the joint significance of the group dummy and its interactions with all the other x variables.
- We can estimate the model with and without all the interactions and form an F statistic.
- Alternatively, we can perform a Chow test (a simple F test for exclusion restrictions).
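The F statistic for exclusion restrictions, and the Chow test as a special case, can be sketched as follows. The pooled regression imposes q = k + 1 restrictions (equal intercept and k equal slopes). The unrestricted, fully interacted model has the same SSR as two separate group regressions, so SSR_ur = SSR_1 + SSR_2. The SSR values and sample size below are hypothetical. In practice the statistic would be compared with an F(q, n - 2(k + 1)) critical value.

```python
# Sketch: F statistic for exclusion restrictions, and the Chow test as a
# special case. The SSR values and sample sizes are hypothetical; in practice
# they come from the restricted and unrestricted OLS fits.

def f_stat(ssr_r, ssr_ur, q, df_ur):
    """F = [(SSR_r - SSR_ur) / q] / [SSR_ur / df_ur]."""
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)

def chow_test(ssr_pooled, ssr_group1, ssr_group2, n, k):
    """Chow test with k regressors plus an intercept in each group:
    q = k + 1 restrictions, df = n - 2(k + 1)."""
    return f_stat(ssr_pooled, ssr_group1 + ssr_group2, k + 1, n - 2 * (k + 1))

F = chow_test(ssr_pooled=120.0, ssr_group1=50.0, ssr_group2=55.0, n=200, k=3)
print(round(F, 3))   # 6.857; compare with the F(4, 192) critical value
```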
Caveats on Program Evaluation
- A typical use of a dummy variable is to estimate a treatment effect.
- For example, we may have individuals who received job training, welfare, etc.
- We need to remember that individuals usually choose whether to participate in a program, which may lead to a self-selection problem.
(Self-)selection Problems
- If we can control for everything that is correlated with both participation and the outcome of interest, then it's not a problem.
- Often, though, there are unobservables that are correlated with participation.
- In this case, the estimate of the program effect is biased, and we don't want to set policy based on it!
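A small simulation can make the self-selection bias concrete. It is a sketch under made-up assumptions: the true program effect is 1.0, and an unobserved "ability" both raises earnings and makes participation more likely. The naive treated-minus-untreated difference (the OLS coefficient on the treatment dummy with no controls) then overstates the effect substantially.

```python
# Sketch: simulated self-selection bias in a treatment-dummy regression.
# All numbers are made up. Unobserved "ability" raises earnings AND makes
# participation more likely, so the naive treated-minus-untreated gap is biased.
import random

random.seed(0)
TRUE_EFFECT = 1.0
treated, untreated = [], []
for _ in range(100_000):
    ability = random.gauss(0.0, 1.0)                        # unobserved by the analyst
    d = 1 if ability + random.gauss(0.0, 1.0) > 0 else 0    # self-selection
    y = 10.0 + TRUE_EFFECT * d + 2.0 * ability + random.gauss(0.0, 1.0)
    (treated if d else untreated).append(y)

naive = sum(treated) / len(treated) - sum(untreated) / len(untreated)
print(round(naive, 2))   # well above TRUE_EFFECT = 1.0
```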
An empirical application
- The analysis of the gender pay differential
Pay gap/1
- The crudest approach consists in including a gender dummy in a single wage regression for women and men: $y = \beta_0 + \delta_0 gender + \beta_1 x + u$
- The coefficient $\delta_0$ is often interpreted as an estimate of the standardised gender pay gap.
- The underlying assumption is that female and male wages differ by a fixed amount (a shift parameter), while human capital characteristics and other explanatory variables (x) have the same impact on women's and men's wages.
Pay gap/2
- The assumption of equal returns might not hold, as gender differences in pay may operate through several explanatory variables (x).
- Use gender (1, 0) to split the sample:
- $y_M = \beta_{M0} + \beta_{M1} x_M + u_M$ if gender = 1 (male)
- $y_F = \beta_{F0} + \beta_{F1} x_F + u_F$ if gender = 0 (female)
- In this case, differences in pay depend on differences in the intercepts, as well as in the slope coefficients and the error terms.
Pay gap/3
- The wage differential between two groups of people defined by gender, race, ethnicity etc. can be decomposed into two parts:
- the first is explained by differences in the human capital endowments of the two groups;
- the second reflects differences in prices, that is, in the remuneration of these endowments (often interpreted as wage discrimination).
Decomposition of the gender wage gap
- Define the male-female wage differential as the difference in mean log wages.
- Decompose it into an explained part, reflecting productivity differences (endowment effect), and an unexplained part, reflecting differences in the remuneration of those characteristics (remuneration effect), which is often referred to as a measure of discrimination.
Pay gap: two equations
- The wage equations for men and women are specified as follows:
- $\ln W_i^M = X_i^M \beta^M + \varepsilon_i^M$ and $\ln W_i^F = X_i^F \beta^F + \varepsilon_i^F$
- where i indexes individuals within the male and female samples, $\ln W_i$ is the log wage, and the vector $X_i$ contains all explanatory variables (including a constant). The error term $\varepsilon_i$ is an i.i.d. idiosyncratic error with mean zero and constant variance $\sigma^2$.
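Pay gap: two equations/2
- The estimated price vectors $\hat\beta$ and the average human capital and job characteristics $\bar X$ of males and females are used to compute weighted differences in mean characteristics. Because an OLS regression with a constant fits the sample mean exactly, $\overline{\ln W}^M - \overline{\ln W}^F = \bar X^M \hat\beta^M - \bar X^F \hat\beta^F$.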
Counterfactuals
- To decompose the raw wage gap it is further necessary to make an assumption about a competitive price vector, which serves as the standard for valuing the different characteristics. This price vector should reflect the remuneration of human capital characteristics in the absence of discrimination.
- The predicted (counterfactual) mean wage for women is computed with the coefficient estimates from the male wage regression and the average characteristics of females, $\bar X^F \hat\beta^M$: what women would earn on average if their characteristics were remunerated as men's are.
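Oaxaca decomposition
- $\overline{\ln W}^M - \overline{\ln W}^F = (\bar X^M - \bar X^F)\hat\beta^M + \bar X^F(\hat\beta^M - \hat\beta^F)$
- The first term is the endowment effect (explained part); the second is the remuneration effect (unexplained part).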
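The two-fold Oaxaca decomposition at the means, with male coefficients as the reference prices, can be sketched in a few lines. The raw gap is $\bar X^M \hat\beta^M - \bar X^F \hat\beta^F$ (valid because OLS with an intercept fits the group means exactly). It splits into an endowment effect $(\bar X^M - \bar X^F)\hat\beta^M$ and a remuneration effect $\bar X^F(\hat\beta^M - \hat\beta^F)$. The means and coefficients below are hypothetical, not estimates from any dataset.

```python
# Sketch: two-fold Oaxaca decomposition at the means, using the male
# coefficients as the non-discriminatory price vector.
# Means and coefficients are hypothetical; element 0 is the intercept.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def oaxaca(xbar_m, xbar_f, beta_m, beta_f):
    """Return (raw gap, endowment effect, remuneration effect)."""
    raw = dot(xbar_m, beta_m) - dot(xbar_f, beta_f)                      # mean log-wage gap
    endow = dot([m - f for m, f in zip(xbar_m, xbar_f)], beta_m)         # explained
    remun = dot(xbar_f, [bm - bf for bm, bf in zip(beta_m, beta_f)])     # unexplained
    return raw, endow, remun

xbar_m = [1.0, 12.0, 15.0]      # constant, years of schooling, experience
xbar_f = [1.0, 12.5, 11.0]
beta_m = [1.50, 0.070, 0.020]
beta_f = [1.40, 0.065, 0.018]

raw, endow, remun = oaxaca(xbar_m, xbar_f, beta_m, beta_f)
print(round(raw, 4), round(endow, 4), round(remun, 4))
```

In this made-up example, women have more schooling but less experience. The endowment effect is small, and most of the gap falls in the remuneration term.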
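Non-discriminatory wage structure
- Suppose there is a vector $\beta^*$ which reflects the rates of return on human capital characteristics in the absence of discrimination. Then
$\overline{\ln W}^M - \overline{\ln W}^F = \underbrace{(\bar X^M - \bar X^F)\beta^*}_{(1)} + \underbrace{\bar X^M(\hat\beta^M - \beta^*)}_{(2)} + \underbrace{\bar X^F(\beta^* - \hat\beta^F)}_{(3)}$
1. differences in endowments
2. discrimination in favour of males
3. discrimination against females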
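The three-fold split with a non-discriminatory price vector $\beta^*$ can be sketched as follows. How to choose $\beta^*$ is the index number problem. As an assumption for illustration only, the sketch builds $\beta^* = w\hat\beta^M + (1 - w)\hat\beta^F$: w = 1 reproduces the Oaxaca case with male prices (term 2 vanishes), and w = 0.5 is a simple average. All inputs are hypothetical.

```python
# Sketch: three-fold decomposition with a non-discriminatory price vector
# beta_star = w * beta_m + (1 - w) * beta_f. The weight w is an assumption:
# w = 1 reproduces Oaxaca with male prices, w = 0.5 is a simple average.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def threefold(xbar_m, xbar_f, beta_m, beta_f, w=0.5):
    beta_star = [w * bm + (1 - w) * bf for bm, bf in zip(beta_m, beta_f)]
    endow = dot([m - f for m, f in zip(xbar_m, xbar_f)], beta_star)          # (1)
    favour_m = dot(xbar_m, [bm - bs for bm, bs in zip(beta_m, beta_star)])   # (2)
    against_f = dot(xbar_f, [bs - bf for bs, bf in zip(beta_star, beta_f)])  # (3)
    return endow, favour_m, against_f

xbar_m, xbar_f = [1.0, 12.0, 15.0], [1.0, 12.5, 11.0]     # hypothetical means
beta_m, beta_f = [1.50, 0.070, 0.020], [1.40, 0.065, 0.018]

parts = threefold(xbar_m, xbar_f, beta_m, beta_f, w=0.5)
raw = dot(xbar_m, beta_m) - dot(xbar_f, beta_f)
print([round(p, 5) for p in parts], round(raw, 5))
```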
Some caveats/1
- This simple wage model is often estimated by ordinary least squares. Yet OLS only provides consistent estimates if the following orthogonality conditions are fulfilled:
- $E(\varepsilon_i^g \mid X_i^g, I_i^g > 0) = 0$, for g = M, F
- where $I_i$ denotes a latent index variable which is positive if individual i is employed and non-positive otherwise.
Some caveats/2
- Sample selection is a source of violation of the orthogonality condition. The sample of working people excludes, by definition, those who do not participate in the labour market, and therefore may not be a random sample of the overall population. If the participation decision is correlated with the error term of the earnings function, the expected value of that error term among workers may not be zero.
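A simulation can show why the conditional mean of the wage error among workers need not be zero. It is a sketch under stated assumptions: the wage error $\varepsilon$ and the participation error v are standard bivariate normal with correlation ρ = 0.5, and a person works if the latent index $I = v - c > 0$ with c = 0.3 (all values made up). Under these assumptions $E(\varepsilon \mid I > 0) = \rho\,\phi(c)/(1 - \Phi(c))$, where the ratio is the inverse Mills ratio. The sketch compares this with the simulated mean.

```python
# Sketch: why OLS on workers only can be biased. The wage error eps and the
# participation error v are bivariate standard normal with correlation RHO;
# a person works if the latent index I = v - C > 0. Values are made up.
import math
import random
from statistics import NormalDist, fmean

random.seed(1)
RHO, C = 0.5, 0.3
eps_workers = []
for _ in range(200_000):
    v = random.gauss(0.0, 1.0)
    eps = RHO * v + math.sqrt(1 - RHO**2) * random.gauss(0.0, 1.0)  # corr(eps, v) = RHO
    if v - C > 0:                        # employed: wage observed
        eps_workers.append(eps)

std = NormalDist()
mills = std.pdf(C) / (1 - std.cdf(C))    # inverse Mills ratio, E[v | v > C]
print(round(fmean(eps_workers), 3), round(RHO * mills, 3))  # both clearly above 0
```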
Overview of methodological problems in wage equations
Pay gap refinements
- More complex decomposition methods have been developed, for example:
- taking the residual wage distribution into account as well (Juhn, Murphy and Pierce 1993);
- treating occupational or sector segregation as endogenous (Brown, Moon and Zoloth 1980).
Decomposition by Juhn, Murphy and Pierce (JMP)
- It extends the Oaxaca-Blinder (O-B) approach by decomposing the pay gap not only at the mean but over the whole wage distribution, thereby accounting for the residual (unexplained) wage distribution.
- While O-B can be applied to a single cross-section, JMP needs two samples to compare: two time periods, two countries, etc.
- Predicted wages are then used to derive hypothetical wage distributions, which extend the decomposition of the unadjusted wage gap with a wage structure effect.
- The decomposition is used to distinguish the effects of gender-specific factors from those associated with the underlying wage structures of the two economies (i.e. wage inequality).
- The decomposition of the raw wage gap then includes three components, related to differences in endowments, in estimated coefficients and in the residual wage distribution.
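- Define $\varepsilon_{ij} = \sigma_j \theta_{ij}$, where $\sigma_j$ is country j's residual standard deviation of wages and $\theta_{ij}$ is the standardised unobservable productivity.
- The wage equation for a male worker i from country j is $\ln W_{ijM} = X_{ijM}\beta_j + \sigma_j \theta_{ijM}$
- The male-female log wage gap for country j is then given by $D_j = \Delta X_j \beta_j + \sigma_j \Delta\theta_j$, where $\Delta$ denotes the male-female difference in means.
- NB: we assume $\beta_M = \beta_F = \beta$ (i.e. non-discriminatory returns).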
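JMP Decomposition
- Gender pay difference between two countries j and k:
- $D_j - D_k = \underbrace{(\Delta X_j - \Delta X_k)\beta_j}_{1} + \underbrace{\Delta X_k(\beta_j - \beta_k)}_{2} + \underbrace{(\Delta\theta_j - \Delta\theta_k)\sigma_j}_{3} + \underbrace{\Delta\theta_k(\sigma_j - \sigma_k)}_{4}$
- 1. and 2. as usual (but by country): differences in observed endowments and in their prices.
- 3. effect of cross-country differences in the relative wage position of males and females after controlling for observed human capital characteristics (differences in the level of unobservables).
- 4. differences in the returns to unobservable skills (inequality).
- 1 and 3 are gender specific; 2 and 4 reflect inter-country differences in the underlying wage structure.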
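The four JMP components can be computed from a handful of summary quantities, as in this sketch. It uses the decomposition $D_j - D_k = (\Delta X_j - \Delta X_k)\beta_j + \Delta X_k(\beta_j - \beta_k) + (\Delta\theta_j - \Delta\theta_k)\sigma_j + \Delta\theta_k(\sigma_j - \sigma_k)$, where $D_j = \Delta X_j\beta_j + \sigma_j\Delta\theta_j$ and $\Delta$ is a male-minus-female mean difference. The country-level inputs below are hypothetical. The point is only that the four terms add up exactly to the difference in gaps.

```python
# Sketch: four-way JMP decomposition of the difference D_j - D_k between two
# countries' gender gaps, with D = dX . beta + sigma * dtheta.
# All inputs are hypothetical.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def jmp(dX_j, dX_k, beta_j, beta_k, dtheta_j, dtheta_k, sigma_j, sigma_k):
    """dX_*: male-minus-female mean characteristics; dtheta_*: male-minus-female
    mean standardised residual; sigma_*: residual std. dev. of log wages."""
    observed_x = dot([a - b for a, b in zip(dX_j, dX_k)], beta_j)          # 1. gender specific
    observed_prices = dot(dX_k, [a - b for a, b in zip(beta_j, beta_k)])   # 2. wage structure
    gap_effect = (dtheta_j - dtheta_k) * sigma_j                           # 3. gender specific
    unobserved_prices = dtheta_k * (sigma_j - sigma_k)                     # 4. wage structure
    return observed_x, observed_prices, gap_effect, unobserved_prices

beta_j, beta_k = [0.08, 0.03], [0.06, 0.02]     # returns to schooling, experience
dX_j, dX_k = [0.5, 3.0], [0.2, 2.0]
dtheta_j, dtheta_k, sigma_j, sigma_k = 0.40, 0.55, 0.50, 0.35

parts = jmp(dX_j, dX_k, beta_j, beta_k, dtheta_j, dtheta_k, sigma_j, sigma_k)
D_j = dot(dX_j, beta_j) + sigma_j * dtheta_j
D_k = dot(dX_k, beta_k) + sigma_k * dtheta_k
print([round(p, 4) for p in parts], round(D_j - D_k, 4))
```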
An empirical application/2
- The analysis of the public-private pay differential