Claudio LUCIFORA - PowerPoint PPT Presentation

About This Presentation

Title:

Claudio LUCIFORA

Slides: 47

Provided by: ucscm

Transcript and Presenter's Notes

Title: Claudio LUCIFORA

1
Università Cattolica del Sacro Cuore
Istituto di Economia dell'Impresa e del Lavoro
APPLIED ECONOMETRICS
Module 2 - Multiple Regression Analysis with dummy variables

Claudio LUCIFORA

2
Main references

- Brucchi Luchino (2000), Manuale di economia del lavoro, ch. 19-20, Il Mulino.
- Wooldridge, J.M. (2000), Introductory Econometrics: A Modern Approach, second edition, ch. 19, MIT Press.
- Altonji, J. and Blank, R. (1999), "Race and Gender in the Labor Market", in O. Ashenfelter and D. Card (eds.), Handbook of Labor Economics, vol. 3(c), North-Holland.

3
Structure of the Presentation

- Multiple regression analysis with dummy variables
- Defining the dummies
- Interpreting coefficients
- Dummy variable trap
- Dummy interactions
- Spline functions
- An empirical application: gender pay differentials (Oaxaca decomposition)

4
Structure of the Presentation (continued)

- Problems
- Some caveats (selection, endogeneity, heterogeneity)
- Self-selection
- The index number (or discrimination free)
- Differences in distributions: the Juhn, Murphy and Pierce decomposition

5
Using dummy variables

- The purpose of this module is to show how to define, construct, specify, estimate and interpret dummy variables in empirical analysis.
- Dummies are the most common variables found in the empirical analysis of survey data.
- We use dummies to account for qualitative factors, such as membership in a group (e.g. gender), a selected time period (e.g. 2001), a specific threshold (e.g. highest level of education achieved), etc.
- The use of dummies can produce an impressive variety of models.

6
Dummy variable/1

- A dummy variable is a variable that takes the value 1 or 0.
- It can be the recoding of a binary attribute such as gender (male = 1, female = 0).
- It can be the recoding of a multilevel characteristic such as region (1-20), e.g. north = 1 if region is 1-5, 0 otherwise; south = 1 if region is 15-20, 0 otherwise; etc.
- It can be the recoding of a continuous variable such as age, e.g. young = 1 if age is 1-15, 0 otherwise; old = 1 if age is 55 or over, 0 otherwise; etc.
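
The three recodings above can be sketched in a few lines of NumPy. This is only an illustration: the sample values and the variable names are made up, and the cut-offs are the ones quoted on the slide.

```python
import numpy as np

# Hypothetical sample: these values are invented for illustration only.
gender = np.array(["male", "female", "female", "male"])
region = np.array([3, 17, 9, 20])        # region codes 1-20
age = np.array([14, 60, 35, 55])

# Binary attribute -> one dummy
male = (gender == "male").astype(int)

# Multilevel attribute -> dummies for selected groups of codes
north = ((region >= 1) & (region <= 5)).astype(int)
south = ((region >= 15) & (region <= 20)).astype(int)

# Continuous variable -> dummies for selected ranges
young = ((age >= 1) & (age <= 15)).astype(int)
old = (age >= 55).astype(int)

print(male, north, south, young, old)
```

Each dummy is simply a 0/1 indicator of a logical condition, so any condition on the survey variables can be turned into a dummy in the same way.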

7
A Dummy Independent Variable

- Consider a simple model with one continuous variable (x) and one dummy (d):
  $y = \beta_0 + \delta_0 d + \beta_1 x + u$
- This can be interpreted as an intercept shift:
  If $d = 0$, then $y = \beta_0 + \beta_1 x + u$
  If $d = 1$, then $y = (\beta_0 + \delta_0) + \beta_1 x + u$
- The case $d = 0$ is the base group.
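
A minimal sketch of the intercept shift, assuming simulated data: the parameter values (1.0, 2.0, 0.5) are arbitrary, and the data are generated without noise so that least squares recovers them exactly. The names `b0`, `d0`, `b1` stand for the intercept, the dummy coefficient and the slope of the model above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 10, n)
d = rng.integers(0, 2, n)             # dummy: 0 = base group, 1 = other group
b0, d0, b1 = 1.0, 2.0, 0.5            # assumed "true" parameters
y = b0 + d0 * d + b1 * x              # noise-free, so OLS recovers them exactly

# Regress y on a constant, the dummy and x
X = np.column_stack([np.ones(n), d, x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

intercept_base = coef[0]              # intercept when d = 0
intercept_other = coef[0] + coef[1]   # intercept when d = 1
slope = coef[2]                       # common slope
```

Both groups share the same slope; only the intercept differs, by the coefficient on the dummy.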

8
Comparing two means

If, for example, y is income and d indicates whether or not the individual attended college (i.e. ignoring the continuous variable x):
$E[\text{income} \mid \text{did not attend college}] = \beta_0$
$E[\text{income} \mid \text{attended college}] = \beta_0 + \delta_0$
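
As a quick check of this interpretation, the sketch below regresses income on a constant and a college dummy and compares the result with the two group means. The six income figures are invented for illustration.

```python
import numpy as np

# Hypothetical incomes (arbitrary units) and college attendance
income = np.array([20.0, 24.0, 22.0, 30.0, 34.0, 35.0])
college = np.array([0, 0, 0, 1, 1, 1])

X = np.column_stack([np.ones(len(income)), college])
b0_hat, d0_hat = np.linalg.lstsq(X, income, rcond=None)[0]

mean_no_college = income[college == 0].mean()
mean_college = income[college == 1].mean()
```

The estimated intercept equals the mean income of the base group, and the dummy coefficient equals the difference between the two group means.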

9
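Example of $\delta_0 > 0$

[Figure: two parallel lines with slope $\beta_1$ plotted against x. The upper line, $y = (\beta_0 + \delta_0) + \beta_1 x$, is for d = 1; the lower line, $y = \beta_0 + \beta_1 x$, is for d = 0. The lower intercept is $\beta_0$ and the vertical distance between the lines is $\delta_0$.]
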
10
Dummies for Multiple Categories

- We can use dummy variables to control for something with multiple categories.
- Suppose you want to analyse education, and everyone in your data is either a HS dropout (1), a HS grad only (2), or a college grad (3).
- To compare HS grads and college grads to HS dropouts, you need 2 dummy variables: hsgrad = 1 if HS grad only, 0 otherwise; and colgrad = 1 if college grad, 0 otherwise.

11
Dummy variable trap

- Because the base group is represented by the intercept, if there are n categories there should be n - 1 dummy variables. Including all n dummies together with the intercept makes the regressors perfectly collinear.
- Industry differentials (20 industries): include (20 - 1) = 19 industry dummies.
- If there are a lot of categories (or continuous variables), it may make sense to group some together (such as regions 1-20 into north, centre, south).
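
The trap can be seen directly from the rank of the design matrix. The sketch below uses a made-up three-category variable: with an intercept and all three dummies the matrix has four columns but only rank three, while dropping one dummy restores full column rank.

```python
import numpy as np

# Hypothetical category codes (e.g. 1 = HS dropout, 2 = HS grad, 3 = college grad)
category = np.array([1, 2, 3, 1, 2, 3, 1, 3])

# One 0/1 column per category
dummies = (category[:, None] == np.array([1, 2, 3])).astype(float)
const = np.ones((len(category), 1))

X_trap = np.hstack([const, dummies])        # intercept + all 3 dummies
X_ok = np.hstack([const, dummies[:, 1:]])   # intercept + 2 dummies (category 1 is the base)

rank_trap = np.linalg.matrix_rank(X_trap)   # less than the number of columns
rank_ok = np.linalg.matrix_rank(X_ok)       # equal to the number of columns
```

The three dummies always sum to one, which is exactly the constant column, so the coefficients cannot be separately identified when all of them are included.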

12
Interactions Among Dummies

- Interacting dummy variables is like subdividing the group.
- Example: have dummies for gender, as well as hsgrad and colgrad.
- Add gender*hsgrad and gender*colgrad, for a total of 5 dummy variables, giving 6 categories.
- Base group is female HS dropouts.
- hsgrad is for female HS grads, colgrad is for female college grads.
- The interactions reflect male HS grads and male college grads.

13
Dummy Interactions/1

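Formally, the model is
$y = \beta_0 + \delta_1\,\text{gender} + \delta_2\,\text{hsgrad} + \delta_3\,\text{colgrad} + \delta_4\,\text{gender}\cdot\text{hsgrad} + \delta_5\,\text{gender}\cdot\text{colgrad} + \beta_1 x + u$
- If gender = 0, hsgrad = 0 and colgrad = 0:
  $y = \beta_0 + \beta_1 x + u$ (base group: female HS dropouts)
- If gender = 0, hsgrad = 1 and colgrad = 0:
  $y = \beta_0 + \delta_2\,\text{hsgrad} + \beta_1 x + u$ (female HS grads)
- If gender = 1, hsgrad = 0 and colgrad = 1:
  $y = \beta_0 + \delta_1\,\text{gender} + \delta_3\,\text{colgrad} + \delta_5\,\text{gender}\cdot\text{colgrad} + \beta_1 x + u$ (male college grads)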
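
To see how the five dummy coefficients map into six group intercepts, the sketch below enumerates the groups. The coefficient values are hypothetical and chosen only to make the arithmetic easy to follow; `intercept` is an illustrative helper, not part of any library.

```python
from itertools import product

# Hypothetical coefficient values, for illustration only
b0, d1, d2, d3, d4, d5 = 1.0, 0.5, 0.3, 0.8, 0.1, 0.2

def intercept(gender, hsgrad, colgrad):
    """Intercept of the group defined by the three dummies (gender = 1 for males)."""
    return (b0 + d1 * gender + d2 * hsgrad + d3 * colgrad
            + d4 * gender * hsgrad + d5 * gender * colgrad)

# hsgrad and colgrad are mutually exclusive, so there are 2 x 3 = 6 groups
groups = {}
for gender, (hsgrad, colgrad) in product([0, 1], [(0, 0), (1, 0), (0, 1)]):
    groups[(gender, hsgrad, colgrad)] = intercept(gender, hsgrad, colgrad)

for key, value in groups.items():
    print(key, round(value, 2))
```

With these values the base group (female HS dropouts) has intercept 1.0, and male college grads have intercept 1.0 + 0.5 + 0.8 + 0.2 = 2.5.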

14
Dummy Interactions/2

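- Can also consider interacting a dummy variable, d, with a continuous variable, x:
  $y = \beta_0 + \delta_1 d + \beta_1 x + \delta_2 d \cdot x + u$
- If $d = 0$, then $y = \beta_0 + \beta_1 x + u$
- If $d = 1$, then $y = (\beta_0 + \delta_1) + (\beta_1 + \delta_2) x + u$
- N.B. This is interpreted as a change in the slope.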
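
One way to read the interacted model is that it fits a separate line for each group. The sketch below, on simulated data with arbitrary parameter values, checks that the pooled regression with the dummy and its interaction reproduces the intercepts and slopes of the two group-by-group regressions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
x = rng.normal(size=n)
d = (np.arange(n) % 2).astype(float)        # alternating 0/1 dummy
# Assumed data-generating process, for illustration only
y = 1.0 + 0.5 * d + 2.0 * x - 0.7 * d * x + rng.normal(scale=0.1, size=n)

def ols(X, y):
    """OLS coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
pooled = ols(np.column_stack([ones, d, x, d * x]), y)   # [b0, d1, b1, d2]

Z = np.column_stack([ones, x])
group0 = ols(Z[d == 0], y[d == 0])          # [intercept, slope] for d = 0
group1 = ols(Z[d == 1], y[d == 1])          # [intercept, slope] for d = 1
```

The d = 1 line has intercept `pooled[0] + pooled[1]` and slope `pooled[2] + pooled[3]`, matching `group1`.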

15
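Example of $\delta_0 > 0$ and $\delta_1 < 0$

[Figure: two lines plotted against x. The line for d = 0 is $y = \beta_0 + \beta_1 x$; the line for d = 1 is $y = (\beta_0 + \delta_0) + (\beta_1 + \delta_1) x$, with a higher intercept and a flatter slope.]
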
16
Spline regression

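The function we wish to estimate:
$y = \beta_0 + \delta_0\,\text{age} + u$ if age < 18
$y = \beta_1 + \delta_1\,\text{age} + u$ if 18 ≤ age < 22
$y = \beta_2 + \delta_2\,\text{age} + u$ if age ≥ 22
Specify using dummy variables:
$d_1 = 1$ if age ≥ 18, 0 otherwise
$d_2 = 1$ if age ≥ 22, 0 otherwise
$y = \beta + \delta\,\text{age} + \gamma d_1 + \lambda d_1\,\text{age} + \tau d_2 + \sigma d_2\,\text{age} + u$
The slopes in the three segments are (1) $\delta$, (2) $\delta + \lambda$, (3) $\delta + \lambda + \sigma$.
Specified in this way, the function is discontinuous at the knots.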

17
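It is discontinuous

[Figure: piecewise-linear function of y against x, with breaks at the knots 18 and 22 on the horizontal axis.]
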
18
Spline regression/2

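To make it continuous we require the segments to join at the knots:
$\beta + \delta \cdot 18 = (\beta + \gamma) + (\delta + \lambda) \cdot 18$
$(\beta + \gamma) + (\delta + \lambda) \cdot 22 = (\beta + \gamma + \tau) + (\delta + \lambda + \sigma) \cdot 22$
These are linear restrictions on the parameters ($\gamma = -18\lambda$ and $\tau = -22\sigma$). Substituting them in gives
$y = \beta + \delta\,\text{age} + \lambda d_1 (\text{age} - 18) + \sigma d_2 (\text{age} - 22) + u$
Estimate with constrained least squares, using the regressors:
$X_1 = \text{age}$
$X_2 = \text{age} - 18$ if age ≥ 18, 0 otherwise
$X_3 = \text{age} - 22$ if age ≥ 22, 0 otherwise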
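
A minimal sketch of the continuous spline, assuming knots at 18 and 22 as on the slide. Once the restrictions are substituted in, the model is linear in the three regressors, so plain least squares on them imposes continuity automatically. The helper name `spline_design` and the coefficient values are made up for illustration, and the data are noise-free so the fit is exact.

```python
import numpy as np

def spline_design(age, knots=(18, 22)):
    """Columns: constant, age, then (age - knot) where age >= knot and 0 otherwise."""
    age = np.asarray(age, dtype=float)
    cols = [np.ones_like(age), age]
    cols += [np.maximum(age - k, 0.0) for k in knots]
    return np.column_stack(cols)

age = np.linspace(10, 30, 81)
true = np.array([1.0, 0.5, -0.3, 0.4])      # assumed values, for illustration
y = spline_design(age) @ true               # noise-free illustration

coef = np.linalg.lstsq(spline_design(age), y, rcond=None)[0]

# Slope in each of the three segments: cumulative sums of the slope terms
slopes = np.cumsum(coef[1:])
```

The fitted function changes slope at each knot but does not jump there, because each added regressor is zero at its own knot.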

19
Testing for Differences

- Testing whether a regression function is different for one group versus another can be thought of as simply testing for the joint significance of the dummy and its interactions with all other x variables.
- So you can estimate the model with all the interactions and without them, and form an F statistic.
- Alternatively, you can perform a Chow test (a simple F test for exclusion restrictions).
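
The two routes described above can be sketched as follows, on simulated data with arbitrary parameter values. The F statistic compares the sum of squared residuals (SSR) of the restricted and unrestricted models; the Chow form uses the fact that the unrestricted SSR equals the sum of the SSRs from the two group-specific regressions. The helper `ssr` is illustrative.

```python
import numpy as np

def ssr(X, y):
    """Sum of squared residuals from an OLS fit of y on X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return float(resid @ resid)

rng = np.random.default_rng(2)
n = 400
x = rng.normal(size=n)
d = (rng.uniform(size=n) < 0.5).astype(float)
# Assumed data-generating process, for illustration only
y = 1.0 + 0.4 * d + 1.5 * x + 0.3 * d * x + rng.normal(size=n)

ones = np.ones(n)
X_r = np.column_stack([ones, x])              # restricted: same function for both groups
X_ur = np.column_stack([ones, x, d, d * x])   # unrestricted: dummy and its interaction

q = X_ur.shape[1] - X_r.shape[1]              # number of restrictions tested
df = n - X_ur.shape[1]                        # residual degrees of freedom
F = ((ssr(X_r, y) - ssr(X_ur, y)) / q) / (ssr(X_ur, y) / df)

# Chow form: unrestricted SSR = SSR of group 0 + SSR of group 1
ssr_groups = ssr(X_r[d == 0], y[d == 0]) + ssr(X_r[d == 1], y[d == 1])
```

The statistic `F` would then be compared with the critical value of an F distribution with `q` and `df` degrees of freedom.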

20
Caveats on Program Evaluation

- A typical use of a dummy variable is when we are looking for a treatment effect.
- For example, we may have individuals that received job training, or welfare, etc.
- We need to remember that usually individuals choose whether to participate in a program, which may lead to a self-selection problem.

21
(Self-)selection Problems

- If we can control for everything that is correlated with both participation and the outcome of interest, then it's not a problem.
- Often, though, there are unobservables that are correlated with participation.
- In this case, the estimate of the program effect is biased, and we don't want to set policy based on it!
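
A small simulation can illustrate the point. Everything here is an assumption made for the sketch: an unobserved "ability" raises both the chance of taking training and the outcome, and the true program effect is set to 1.0.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000
ability = rng.normal(size=n)                                 # unobserved in practice
train = (ability + rng.normal(size=n) > 0).astype(float)     # abler people opt in more often
true_effect = 1.0
y = true_effect * train + 2.0 * ability + rng.normal(size=n)

def ols(X, y):
    """OLS coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
naive = ols(np.column_stack([ones, train]), y)[1]                 # ability omitted
controlled = ols(np.column_stack([ones, train, ability]), y)[1]   # ability included

print(f"naive: {naive:.2f}, controlled: {controlled:.2f}, true: {true_effect}")
```

When ability is left out, the training dummy picks up part of its effect and the estimate is well above the true value; including it brings the estimate back close to 1.0. In real data the relevant unobservable is, by definition, not available as a control.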

22
An empirical application

The analysis of the gender pay differential

25
Pay gap/1

- The crudest approach consists of including a gender dummy in a single wage regression for women and men:
  $y = \beta_0 + \delta_0\,\text{gender} + \beta_1 x + u$
- The coefficient $\delta_0$ is often interpreted as an estimate of the standardised gender pay gap.
- The underlying assumption here is that female and male wages differ by a fixed amount (shift parameter), but that human capital characteristics and other explanatory variables (x) have the same impact on women's and men's wages.

26
Pay gap/2

- The assumption of similar returns might not hold, as gender differences in pay may work through several explanatory variables (x).
- Use gender (1, 0) to split the sample:
  $y^M = \beta^M_0 + \beta^M_1 x^M + u^M$ if gender = 1 (male)
  $y^F = \beta^F_0 + \beta^F_1 x^F + u^F$ if gender = 0 (female)
- In this case, differences in pay depend on the intercept, as well as on the beta coefficients and the error terms.

27
Pay gap/3

- The wage differential between two groups of people defined by gender, race, ethnicity, etc. can be decomposed into two parts.
- The first is explained by differences in the human capital endowments of the two groups.
- The second reflects differences in prices, that is, in the remuneration of these endowments (i.e. wage discrimination).

28
Decomposition of the gender wage gap

- The male-female wage differential is measured as the difference in mean log wages.
- Decompose it into an explained part, reflecting productivity differences (endowment effect), and an unexplained part, reflecting differences in the remuneration of those characteristics (remuneration effect), often referred to as a measure of discrimination.

29
Pay gap: two equations

The wage equations for men and women are specified as follows:
$\ln W_i^M = X_i^M \beta^M + \varepsilon_i^M$
$\ln W_i^F = X_i^F \beta^F + \varepsilon_i^F$
where i indexes individuals within the male and female samples, $\ln W_i$ is the log wage and the vector $X_i$ contains all explanatory variables. $\varepsilon_i$ is an iid idiosyncratic error term with mean zero and constant variance $\sigma^2$.

30
Pay gap: two equations/2

The estimated price vector $\beta$ and the average human capital and job characteristics (the X's) for males and females are used to compute weighted differences in mean characteristics.

31
Counterfactuals

- To decompose the raw wage gap it is further necessary to make assumptions about a competitive price vector, which serves as the standard for valuing the different characteristics. This price vector should reflect the remuneration of human capital characteristics in the absence of discrimination.
- The predicted mean wage for women is computed with the coefficient estimates from the male wage regression and the average characteristics of females.
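
The counterfactual described above can be sketched on simulated data. All numbers are hypothetical, the helper `simulate` is illustrative, and the male coefficients are taken as the reference price vector, which is one common choice rather than the only one.

```python
import numpy as np

rng = np.random.default_rng(4)

def simulate(n, mean_x, beta):
    """Hypothetical log-wage data: a constant plus one characteristic (e.g. schooling)."""
    X = np.column_stack([np.ones(n), rng.normal(mean_x, 2.0, n)])
    return X, X @ beta + rng.normal(scale=0.3, size=n)

X_m, lnw_m = simulate(500, 12.0, np.array([1.0, 0.10]))
X_f, lnw_f = simulate(500, 11.0, np.array([0.9, 0.09]))

# Separate wage regressions for men and women
b_m = np.linalg.lstsq(X_m, lnw_m, rcond=None)[0]
b_f = np.linalg.lstsq(X_f, lnw_f, rcond=None)[0]
xbar_m, xbar_f = X_m.mean(axis=0), X_f.mean(axis=0)

raw_gap = lnw_m.mean() - lnw_f.mean()
counterfactual_f = xbar_f @ b_m            # women's predicted mean wage at male prices
explained = (xbar_m - xbar_f) @ b_m        # endowment effect
unexplained = xbar_f @ (b_m - b_f)         # remuneration effect
```

Because each regression includes a constant, the fitted mean equals the sample mean, so `explained + unexplained` adds up exactly to `raw_gap`.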

32
Oaxaca decomposition
33
Non-discriminatory wage structure

There is a vector $\beta$ which reflects the rates of return on human capital characteristics in the absence of discrimination. The wage gap then splits into three terms:

1. differences in endowments
2. discrimination in favour of males
3. discrimination against females

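
The equation for this slide did not survive in the transcript. A standard way of writing a three-term decomposition of this kind, given here as an assumption rather than as the slide's exact formula, uses $\bar{X}^M$, $\bar{X}^F$ for mean characteristics, $\hat{\beta}^M$, $\hat{\beta}^F$ for the estimated coefficients and $\beta$ for the non-discriminatory price vector:

$$
\overline{\ln W}^M - \overline{\ln W}^F
= \underbrace{(\bar{X}^M - \bar{X}^F)\,\beta}_{(1)}
+ \underbrace{\bar{X}^M (\hat{\beta}^M - \beta)}_{(2)}
+ \underbrace{\bar{X}^F (\beta - \hat{\beta}^F)}_{(3)}
$$

Expanding the right-hand side, the terms in $\beta$ cancel and what remains is $\bar{X}^M \hat{\beta}^M - \bar{X}^F \hat{\beta}^F$, the difference in predicted mean log wages. Setting $\beta = \hat{\beta}^M$ makes term (2) vanish and gives the two-part decomposition with male prices as the reference.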
34
Some caveats/1

This simple wage model is often estimated by ordinary least squares. Yet this method only provides consistent estimates if the following orthogonality condition is fulfilled:
$E[\varepsilon_i \mid X_i, I_i > 0] = 0$
where $I_i$ denotes a latent index variable which is positive if individual i is employed and non-positive otherwise.

35
Some caveats/2

Sample selection is a source of violation of the
orthogonality condition. The sample of working
people excludes, by definition, those who do not
participate in the labour market and therefore
may not be a random selection of the overall
population. If the participation decision is
correlated with the earnings function, the
expected value of the error term of the latter
may not be zero.
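
A short simulation shows the mechanism. The setup is an assumption made for the sketch: the wage-equation error has mean zero in the whole population, and the latent participation index is positively correlated with it.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50_000
u = rng.normal(size=n)               # wage-equation error, mean zero in the population
index = u + rng.normal(size=n)       # latent participation index, correlated with u
employed = index > 0                 # wages are observed only for these people

mean_u_all = u.mean()                # close to zero
mean_u_employed = u[employed].mean() # clearly positive

print(f"all: {mean_u_all:.3f}, employed only: {mean_u_employed:.3f}")
```

Among the employed the error term no longer averages to zero, so a regression run on workers alone does not satisfy the condition needed for consistency.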

36
Overview of methodological problems: wage equations

37
Pay gap refinements

More complex decomposition methods have been developed:
- taking also the residual wage distribution into account (Juhn, Murphy and Pierce 1993)
- treating occupational or sector segregation as endogenous (Brown, Moon and Zoloth 1980).

38
Decomposition by Juhn, Murphy and Pierce (JMP)

- It extends the approach of Oaxaca and Blinder (O-B) by decomposing the pay gap not only at the mean but over the whole wage distribution, thereby accounting for the residual (unexplained) wage distribution.
- While O-B can be applied with cross-section data, JMP needs two time periods, two countries, etc.
- Predicted wages are then used to derive hypothetical wage distributions that serve to extend the decomposition of the unadjusted wage gap by a wage structure effect.
- The decomposition technique is employed to distinguish the effects of gender-specific factors from those associated with the underlying wage structures of both economies (i.e. wage inequality).

The decomposition of the raw wage gap then includes three components, related to differences in endowments, in estimated coefficients and in the residual wage distribution.

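Define $\varepsilon_{ij} = \sigma_j \theta_{ij}$, where $\sigma_j$ is country j's residual standard deviation of wages, and $\theta_{ij}$ is the standardised unobservable productivity.
The wage equation for a male worker from country j is
$\ln W_{ij} = X_{ij} \beta_j + \sigma_j \theta_{ij}$
The male-female log wage gap for country j is then given by
$D_j = \Delta X_j \beta_j + \sigma_j \Delta\theta_j$
where $\Delta$ denotes the male-female difference in means.
N.B. We assume $\beta^M = \beta^F = \beta$ (i.e. non-discriminatory returns).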

41
JMP Decomposition

Gender pay difference between two countries j and k:
1. and 2. as usual (but by country).
3. effect of cross-country differences in the relative wage position of males and females after controlling for observed human capital characteristics: differences in the level of unobservables.
4. differences in the returns to unobservable skills: inequality.
1 and 3 are gender specific; 2 and 4 reflect inter-country differences in the underlying wage structure.
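
The formula behind these four terms is missing from the transcript. A common way of writing a cross-country decomposition of this type is sketched below. The notation is an assumption: $D_j$ is the male-female log wage gap in country j, $\Delta X_j$ and $\Delta\theta_j$ are male-female differences in mean observed characteristics and in mean standardised residuals, $\beta_j$ is the vector of returns and $\sigma_j$ the residual standard deviation, so that $D_j = \Delta X_j \beta_j + \sigma_j \Delta\theta_j$.

$$
D_j - D_k
= \underbrace{(\Delta X_j - \Delta X_k)\,\beta_k}_{1}
+ \underbrace{\Delta X_j\,(\beta_j - \beta_k)}_{2}
+ \underbrace{(\Delta\theta_j - \Delta\theta_k)\,\sigma_k}_{3}
+ \underbrace{\Delta\theta_j\,(\sigma_j - \sigma_k)}_{4}
$$

Expanding the right-hand side gives $\Delta X_j \beta_j - \Delta X_k \beta_k + \sigma_j \Delta\theta_j - \sigma_k \Delta\theta_k$, which is $D_j - D_k$. Terms 1 and 3 compare the relative position of men and women across the two countries, while terms 2 and 4 compare the countries' prices for observed and unobserved skills.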

42
An empirical application/2