1
SAMPLE SELECTION

Cheti Nicoletti
ISER, University of Essex
2009

2
Wage equation and labour participation for women

Gourieroux, C. (2000), Econometrics of
Qualitative Dependent Variables, Cambridge
University Press, Cambridge.
Let y* be the potential offered wage and let w be
the reservation wage; then the observed wage y is
given by
Let us consider the following very simple
earnings profile equation
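
The two formulas announced on this slide can be sketched as follows. The notation (y* for the offered wage, w for the reservation wage, x, β, ε) is an assumption borrowed from the later slides, and the linear form of the earnings profile is the usual textbook choice rather than something the transcript confirms.

```latex
% Observation rule: the wage is observed only when the woman works,
% i.e. when the offered wage is at least the reservation wage.
\[
y =
\begin{cases}
  y^{*}             & \text{if } y^{*} \ge w, \\
  \text{unobserved} & \text{if } y^{*} < w .
\end{cases}
\]

% A simple earnings profile for the offered wage
% (y^{*} is measured in logs in the later slides).
\[
y^{*} = x\beta + \varepsilon
\]
```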

3
Women in the labour force are not a random sample

Women's labour force participation rates are
highly dependent on age (Gourieroux, 2000).
Labour participation is in general lower for
women aged:
16-20, because some women are still studying;
25-44, because of work interruptions linked to children;
55-60, because some women prefer to retire early.
Presumably the earnings observed for women aged:
16-20 are lower than if all women worked;
25-44 are higher, because women with higher
earnings are less inclined to interrupt work;
55-60 are higher, because women with higher
earnings are less inclined to retire early.

4
(No Transcript)
5
Sample selection model: labour participation
equation

Probit model for labour participation
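
A minimal sketch of the usual latent-variable probit for participation. It assumes the notation of the later slides (z for the regressors, u for the error, d for the participation indicator) and writes δ for the coefficient vector; the unit variance of u is the standard probit normalisation.

```latex
\begin{align*}
d^{*} &= z\delta + u, \qquad u \mid z \sim N(0, 1) \\
d     &= \mathbf{1}\{\, d^{*} > 0 \,\} \\
\Pr(d = 1 \mid z) &= \Pr(u > -z\delta) = \Phi(z\delta)
\end{align*}
```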

6
Joint model for the log-earnings and the labour
participation equations: Generalized Tobit model

Possible candidates for x: education dummies,
age, work experience.
Possible candidates for z: age, education, number
of children, dummies for the presence of children
<5, for cohabiting and for being a widow, regional
unemployment rate.
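
The joint model can be written compactly as below. This is a sketch under assumptions: the symbols β, δ, σ_ε and σ_εu are chosen to match the later slides, and Var(u) = 1 is the usual normalisation for the participation equation.

```latex
\begin{align*}
y^{*} &= x\beta + \varepsilon   && \text{(log-earnings equation)} \\
d^{*} &= z\delta + u            && \text{(participation equation)} \\
d &= \mathbf{1}\{ d^{*} > 0 \}, \qquad y = y^{*} \text{ observed only if } d = 1 \\
\begin{pmatrix} \varepsilon \\ u \end{pmatrix} \Big|\; x, z
  &\sim N\!\left(
     \begin{pmatrix} 0 \\ 0 \end{pmatrix},
     \begin{pmatrix}
       \sigma_{\varepsilon}^{2} & \sigma_{\varepsilon u} \\
       \sigma_{\varepsilon u}   & 1
     \end{pmatrix}
   \right)
\end{align*}
```

Variables that appear in z but not in x act as the exclusion restrictions used later in the two-step estimation.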

7
Bivariate normal
8
Truncated Normal

Suggestions for the proof
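
A sketch of the standard argument behind the truncated-normal result, assuming (ε, u) are jointly normal with Var(u) = 1 and Cov(ε, u) = σ_εu. The symbols v and c are introduced here only for the derivation.

```latex
\begin{align*}
% Mean of a standard normal truncated from below at c,
% using \phi'(t) = -t\,\phi(t):
E(u \mid u > c)
  &= \frac{1}{1 - \Phi(c)} \int_{c}^{\infty} t\,\phi(t)\,dt
   = \frac{\phi(c)}{1 - \Phi(c)} \\
% Under joint normality, epsilon is linear in u plus independent noise:
\varepsilon &= \sigma_{\varepsilon u}\, u + v,
  \qquad v \text{ independent of } u,\quad E(v) = 0 \\
% Set c = -z\delta and use the symmetry of \phi:
E(\varepsilon \mid u > -z\delta)
  &= \sigma_{\varepsilon u}\, \frac{\phi(z\delta)}{\Phi(z\delta)}
\end{align*}
```

The ratio φ(zδ)/Φ(zδ) is the inverse Mills ratio.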

9
Sample selection problem

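$E(y \mid d=1, x, z) = x\beta + E(\varepsilon \mid d=1, x, z)$
$E(\varepsilon \mid d=1, x, z) = E(\varepsilon \mid u > -z\delta)$
$E(y \mid d=1, x, z) \neq x\beta$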

10
Two-step estimation

Step 1: estimation of a probit model for the
probability of being in the labour market, with likelihood
$\prod_i \Pr(d_i=1 \mid z_i)^{d_i}\, \Pr(d_i=0 \mid z_i)^{1-d_i} = \prod_i \Phi(z_i\delta)^{d_i}\, \Phi(-z_i\delta)^{1-d_i}$
Step 2: estimation of the regression model with
an additional variable (the inverse Mills
ratio), using the subsample of individuals with
$d_i = 1$ (and using some IV restrictions).
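
The two steps can be illustrated on simulated data. The sketch below is not the Stata implementation: it uses only the Python standard library, fits the probit by Fisher scoring, and all parameter values (BETA, DELTA, RHO, the sample size) are hypothetical. The variable z enters the participation equation only, which plays the role of the exclusion restriction.

```python
import math
import random


def norm_pdf(c):
    return math.exp(-0.5 * c * c) / math.sqrt(2.0 * math.pi)


def norm_cdf(c):
    return 0.5 * (1.0 + math.erf(c / math.sqrt(2.0)))


def inverse_mills(c):
    # lambda(c) = phi(c) / Phi(c) = E(u | u > -c) for a standard normal u
    return norm_pdf(c) / max(norm_cdf(c), 1e-300)


def solve(a, b):
    # Gauss-Jordan elimination with partial pivoting for a small system
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    # Coefficients from the normal equations (X'X) b = X'y
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def probit(rows, d, iterations=15):
    # Probit maximum likelihood by Fisher scoring, starting from zero
    k = len(rows[0])
    coef = [0.0] * k
    for _ in range(iterations):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for r, di in zip(rows, d):
            q = sum(c * v for c, v in zip(coef, r))
            p = min(max(norm_cdf(q), 1e-10), 1.0 - 1e-10)
            f = norm_pdf(q)
            w = f * (di - p) / (p * (1.0 - p))
            h = f * f / (p * (1.0 - p))
            for i in range(k):
                score[i] += w * r[i]
                for j in range(k):
                    info[i][j] += h * r[i] * r[j]
        coef = [c + s for c, s in zip(coef, solve(info, score))]
    return coef


# Simulated data; every number here is an illustrative assumption
random.seed(0)
BETA, DELTA, RHO = [1.0, 2.0], [0.3, 0.8, 1.0], 0.7
z_rows, d, x_sel, z_sel, y_sel = [], [], [], [], []
for _ in range(10000):
    x, z, u = random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1)
    eps = RHO * u + math.sqrt(1.0 - RHO ** 2) * random.gauss(0, 1)
    row = [1.0, x, z]
    di = 1 if sum(c * v for c, v in zip(DELTA, row)) + u > 0 else 0
    z_rows.append(row)
    d.append(di)
    if di:  # y is observed only for participants
        x_sel.append(x)
        z_sel.append(row)
        y_sel.append(BETA[0] + BETA[1] * x + eps)

# Step 1: probit of d on (1, x, z), using the full sample
delta_hat = probit(z_rows, d)
# Step 2: OLS of y on (1, x, inverse Mills ratio), using d = 1 only
mills = [inverse_mills(sum(c * v for c, v in zip(delta_hat, r))) for r in z_sel]
two_step = ols([[1.0, x, m] for x, m in zip(x_sel, mills)], y_sel)
# For comparison: OLS on the selected sample without the correction term
naive = ols([[1.0, x] for x in x_sel], y_sel)

print("probit coefficients:", delta_hat)
print("naive OLS (const, x):", naive)
print("two-step (const, x, Mills ratio):", two_step)
```

In this design the coefficient on the inverse Mills ratio estimates σ_εu (0.7 here, because Var(ε) = 1), and the naive slope on x is pulled below its true value. The sketch reports point estimates only; it does not correct the step-2 standard errors.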

11
Testing selectivity

If the error terms $\varepsilon$ and u are uncorrelated, then
the selection problem is ignorable.
$H_0: \sigma_{\varepsilon u} = 0$
Testing H0 is equivalent to testing whether the
coefficient of the additional variable in the
equation is zero (using, for example, a Wald test).
Note that the errors are heteroskedastic, so an
appropriate estimator should be used for
the standard errors.

12
Generalized Tobit Maximum Likelihood Estimation
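
For reference, a sketch of the log-likelihood that is usually maximised for this model. It assumes the bivariate normal specification with Var(u) = 1 and uses ρ for the correlation between ε and u; the symbols are chosen to match the other slides and are not taken from the slide itself.

```latex
\begin{align*}
\ln L(\beta, \delta, \sigma_{\varepsilon}, \rho)
  &= \sum_{i:\, d_i = 0} \ln \Phi(-z_i\delta) \\
  &\quad + \sum_{i:\, d_i = 1} \left[
       \ln \phi\!\left( \frac{y_i - x_i\beta}{\sigma_{\varepsilon}} \right)
       - \ln \sigma_{\varepsilon}
       + \ln \Phi\!\left(
           \frac{ z_i\delta + \rho\,(y_i - x_i\beta)/\sigma_{\varepsilon} }
                { \sqrt{1 - \rho^{2}} }
         \right)
     \right],
  \qquad \rho = \frac{\sigma_{\varepsilon u}}{\sigma_{\varepsilon}}
\end{align*}
```

The first sum is the contribution of non-participants; the second combines the density of the observed y with the probability of participation conditional on y.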
13
heckman

The heckman command estimates the
Generalized Tobit (Tobit of the 2nd type) model by
ML (the default) or by two-step
estimation (option twostep):
heckman y x1 x2 xk, select(z1 z2 zs)
heckman y x1 x2 xk, select(d = z1 z2 zs)
heckman y x1 x2 xk, select(z1 z2 zs) twostep
In the first form y is taken as unobserved wherever
it is missing; the second form names the selection
indicator d explicitly.

14
Generalized Tobit Maximum Likelihood Estimation
15
(No Transcript)
16
Joint model for log-income and response
probability

Possible candidates for x: education dummies,
age, work experience.
d* is the propensity to respond to the earnings
question.
z: mode of interview, education, gender, age,
etc.

17
Item nonresponse for income equation or poverty
model in cross-section sample surveys

Potential explanatory variables:
Socio-demographic variables: age, gender, level
of education, number of adults, number of
children.
Situational economic circumstances: labour status
activity.
Data collection characteristics: mode of the
interview, number of visits, duration of the
interview. (These are plausible IVs.)

18
(No Transcript)
19
Attrition in panel surveys has two possible
causes: failed contact and refusal

The potential variables explaining attrition
(contact and cooperation) are lagged variables
observed in the last wave.
The equation of interest has to use lagged
variables (otherwise we have missing explanatory
variables too).
Socio-demographic variables: age, gender, level
of education, number of adults, number of
children.
Social integration: talking often to neighbours,
cohabitation, house ownership.
Situational economic circumstances: labour status
activity, household equivalised income.
Data collection characteristics: mode of the
interview, number of visits, duration of the
interview, same interviewer across waves, duration
of the panel, length of the fieldwork. (These are
plausible IVs.)

20
Attrition due to lack of cooperation (BHPS
1994-96)
21
Weighted estimation
22
Weighted estimation
23

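Conditioning and integrating out (marginalizing)
with respect to z, where $p = \Pr(d=1 \mid x, z)$:
$E_Z\big( E[\, x(y - x\beta)\, d\, p^{-1} \mid x, z \,] \big)$
$= E_Z\big( E[\, x(y - x\beta) \mid x, z, d=1 \,]\, \Pr(d=1 \mid x, z)\, p^{-1} \big)$
$= E_Z\big( E[\, x(y - x\beta) \mid x, z \,] \big) = E[\, x(y - x\beta) \mid x \,] = 0$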
24
How to use weights in Stata

Most Stata commands can deal with weighted data.
Stata allows four kinds of weights:
fweights, or frequency weights, are weights that
indicate the number of duplicated observations.
pweights, or sampling weights, are weights that
denote the inverse of the probability that the
observation is included due to the sampling
design, nonresponse or sample selection.
aweights, or analytic weights, are weights that
are inversely proportional to the variance of an
observation; i.e., the variance of the j-th
observation is assumed to be sigma^2/w_j, where
w_j are the weights.
iweights, or importance weights, are weights that
indicate the "importance" of the observation in
some vague sense.

25
Option pweights

Usually sample surveys provide weights to take
account of the sampling design and nonresponse.
Let p be the individual weight.
Then we can run a regression with weighted
observations:
regress y x1 x2 xk [pweight=p]
Suppose we have a random sample affected by
nonresponse, but weights to take account of unit
nonresponse are not available.
One possible way to estimate your own weights is
the following:
probit d z1 z2 zs
predict prop
gen invprop = 1/prop
reg y x1 x2 xk [pweight=invprop]
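
The logic of these weighting steps can be checked outside Stata on simulated data. The sketch below is a simplification under stated assumptions: it uses a single binary z, replaces the probit with observed response rates within the two cells of z, and estimates a mean rather than a regression. The names z, y, d and prop follow the slide; all numerical values are hypothetical.

```python
import random

random.seed(1)
n = 50000

# Simulate a population where both y and the response probability depend on z
sample = []
for _ in range(n):
    z = 1 if random.random() < 0.5 else 0
    y = 10.0 + 5.0 * z + random.gauss(0, 1)
    p_resp = 0.9 if z == 1 else 0.4   # assumed response probabilities
    d = 1 if random.random() < p_resp else 0
    sample.append((z, y, d))

# Estimated response propensity: share of respondents within each cell of z
prop = {}
for cell in (0, 1):
    members = [d for z, _, d in sample if z == cell]
    prop[cell] = sum(members) / len(members)

# Only respondents (d = 1) have an observed y
respondents = [(z, y) for z, y, d in sample if d == 1]

# Unweighted respondent mean: over-represents the high-response cell
naive_mean = sum(y for _, y in respondents) / len(respondents)

# Inverse-probability-weighted mean: weight each respondent by 1 / prop
weights = [1.0 / prop[z] for z, _ in respondents]
ipw_mean = sum(w * y for w, (_, y) in zip(weights, respondents)) / sum(weights)

# Benchmark available only in a simulation: the mean over the full sample
full_mean = sum(y for _, y, _ in sample) / n

print("full-sample mean:     ", round(full_mean, 3))
print("respondent mean:      ", round(naive_mean, 3))
print("weighted (IPW) mean:  ", round(ipw_mean, 3))
```

The weighted mean tracks the full-sample mean because response depends on y only through z, which is the condition the weighting argument relies on.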

26
For complex survey designs it is better to use:

svyset [pweight=p]
svy: regress y x1 x2 xk
svyset has options for cluster sampling designs
and other complex designs.
To declare a survey design with strata:
svyset [pweight=p], strata(stratid)

27
Stata propensity score methods for evaluation of
treatment

Abadie A., Drukker D., Herr J.L., Imbens G.W.
(2001), Implementing Matching Estimators for
Average Treatment Effects in Stata, The Stata
Journal, 1, 1-18.
http://ksghome.harvard.edu/.aabadie.academic.ksg/software.html
Becker S.O., Ichino A. (2002), Estimation of
average treatment effects based on propensity
scores, The Stata Journal, 2, 358-377.
http://www.lrz-muenchen.de/sobecker/pscore.html
Sianesi B. (2001), Implementing Propensity Score
Matching Estimators with STATA, UK Stata Users
Group, VII Meeting, London.
http://ideas.repec.org/c/boc/bocode/s432001.html

28
Some references for regressions with sample
selection

Buchinsky, M. (2001), Quantile regression with
sample selection: Estimating women's return to
education in the U.S., Empirical Economics, 26,
86-113.
Ibrahim, J.G., Chen, M.-H., Lipsitz, S.R.,
Herring, A.H. (2005), Missing-data methods for
generalized linear models: A comparative review,
Journal of the American Statistical Association,
100, 469, 332-346.
Lipsitz, S.R., Fitzmaurice, G.M., Molenberghs,
G., Zhao, L.P. (1997), Quantile regression
methods for longitudinal data with drop-outs,
Applied Statistics, 46, 463-476.
Robins, J.M., Rotnitzky, A. (1995),
Semiparametric Efficiency in Multivariate
Regression Models With Missing Data, Journal of
the American Statistical Association, 90,
122-129.
Vella, F. (1998), Estimating models with sample
selection bias: a survey, The Journal of Human
Resources, 33, 127-169.
Wooldridge, J.M. (2007), Inverse probability
weighted M-Estimation for General missing data
problems, Journal of Econometrics, 141, 2,
1281-1301.