Title: SAMPLE SELECTION


1
SAMPLE SELECTION
  • Cheti Nicoletti
  • ISER, University of Essex
  • 2009

2
Wage equation and labour participation for women
  • Gourieroux C. (2000), Econometrics of
    Qualitative Dependent Variables, Cambridge
    University Press, Cambridge
  • Let $y^*$ be the potential offered wage and let $w$ be
    the reservation wage; then the wage $y$ is observed
    only for women who participate, i.e. $y = y^*$ if $y^* \ge w$
  • Let us consider the following very simple
    earnings profile equation

3
Women in the labour force are not a random sample
  • Women's labour force participation rates are
    highly dependent on age (Gourieroux, 2000).
  • Labour participation is in general lower for
    women aged:
    • 16-20, because some women are still studying
    • 25-44, because of work interruptions linked to children
    • 55-60, because some women prefer to retire early
  • Presumably the earnings observed for women aged:
    • 16-20 are lower than if all women worked
    • 25-44 are higher, because women with higher
      earnings are less inclined to interrupt work
    • 55-60 are higher, because women with higher
      earnings are less inclined to retire early

4
(No Transcript)
5
Sample selection model: labour participation
equation
  • Probit model for labour participation
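
The slide's formula is not in the transcript. A standard way to write the probit participation equation, using the symbols that appear on the later slides ($z$, $\delta$, $u$, $d$), is sketched below. The latent index $d^*$ and the unit-variance normalization are assumptions, not taken from the transcript.

```latex
d^* = z\delta + u, \qquad u \mid z \sim N(0, 1), \qquad
d = \mathbf{1}(d^* > 0), \qquad
\Pr(d = 1 \mid z) = \Phi(z\delta)
```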

6
Joint model for the log-earnings and the labour
participation equations: generalized Tobit model
  • Possible candidates for x: education dummies,
    age, work experience
  • Possible candidates for z: age, education, number
    of children, dummies for the presence of children
    aged under 5, for cohabiting, for being widowed, regional
    unemployment rate.
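
The model itself is not in the transcript. A conventional statement of the generalized Tobit (Tobit type 2) model, consistent with the notation used on the later slides, might read as follows. The latent variables $y^*$ and $d^*$ and the normalization $\mathrm{Var}(u) = 1$ are assumptions introduced here.

```latex
\begin{aligned}
y^* &= x\beta + \varepsilon, \\
d^* &= z\delta + u, \qquad d = \mathbf{1}(d^* > 0), \\
y &= y^* \ \text{observed only if } d = 1, \\
\begin{pmatrix} \varepsilon \\ u \end{pmatrix} \Big|\, x, z
&\sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} \sigma_\varepsilon^2 & \sigma_{\varepsilon u} \\
\sigma_{\varepsilon u} & 1 \end{pmatrix} \right)
\end{aligned}
```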

7
Bivariate normal
8
Truncated Normal
  • Suggestions for the proof
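
The truncated-normal result that sample selection models rely on is $E(u \mid u > -c) = \phi(c)/\Phi(c)$ for $u \sim N(0,1)$, the inverse Mills ratio. The Python sketch below (standard library only; the function names are illustrative and not from the slides) computes it and checks it against a simulated truncated mean.

```python
import math
import random


def normal_pdf(c):
    """Standard normal density phi(c)."""
    return math.exp(-0.5 * c * c) / math.sqrt(2.0 * math.pi)


def normal_cdf(c):
    """Standard normal distribution function Phi(c)."""
    return 0.5 * (1.0 + math.erf(c / math.sqrt(2.0)))


def inverse_mills(c):
    """lambda(c) = phi(c) / Phi(c) = E(u | u > -c) for u ~ N(0, 1)."""
    return normal_pdf(c) / normal_cdf(c)


def simulated_truncated_mean(c, draws=200_000, seed=0):
    """Monte Carlo estimate of E(u | u > -c) from standard normal draws."""
    rng = random.Random(seed)
    kept = [u for u in (rng.gauss(0.0, 1.0) for _ in range(draws)) if u > -c]
    return sum(kept) / len(kept)


if __name__ == "__main__":
    for c in (-1.0, 0.0, 0.5, 2.0):
        print(c, inverse_mills(c), simulated_truncated_mean(c))
```

The ratio falls towards zero as $c$ grows: when almost everyone is selected, truncation barely shifts the mean.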

9
Sample selection problem
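  • $E(y \mid d=1, x, z) = x\beta + E(\varepsilon \mid d=1, x, z)$
  • $E(\varepsilon \mid d=1, x, z) = E(\varepsilon \mid u > -z\delta)$
  • $E(y \mid d=1, x, z) \neq x\beta$ in general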
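
The size of the selection term can be made explicit under the usual assumption, not stated in the transcript, that $(\varepsilon, u)$ are jointly normal with $\mathrm{Var}(u) = 1$ and covariance $\sigma_{\varepsilon u}$. Writing $\varepsilon = \sigma_{\varepsilon u} u + v$ with $v$ independent of $u$, a sketch of the derivation is:

```latex
\begin{aligned}
E(\varepsilon \mid u > -z\delta)
  &= \sigma_{\varepsilon u}\, E(u \mid u > -z\delta)
   = \sigma_{\varepsilon u}\, \frac{\phi(z\delta)}{\Phi(z\delta)}
   = \sigma_{\varepsilon u}\, \lambda(z\delta), \\
E(y \mid d = 1, x, z) &= x\beta + \sigma_{\varepsilon u}\, \lambda(z\delta)
\end{aligned}
```

Here $\lambda(\cdot)$ is the inverse Mills ratio, so the bias disappears only when $\sigma_{\varepsilon u} = 0$.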

10
Two-step estimation
  • Step 1: estimation of a probit model for the
    probability of being in the labour market, with likelihood
  • $L(\delta) = \prod_i \Pr(d_i=1 \mid z_i)^{d_i} \Pr(d_i=0 \mid z_i)^{1-d_i} = \prod_i \Phi(z_i\delta)^{d_i} \Phi(-z_i\delta)^{1-d_i}$
  • Step 2: estimation of the regression model with
    an additional variable (the inverse Mills
    ratio) using the subsample of individuals with
    $d_i = 1$ (and using some IV restrictions)
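
To make the step-1 objective concrete, the sketch below evaluates the probit log-likelihood, $\sum_i [d_i \log \Phi(z_i\delta) + (1-d_i)\log \Phi(-z_i\delta)]$, in plain Python. The function names and the four-row data set are hypothetical. A real analysis would maximize this function numerically, which is what Stata's `probit` does internally.

```python
import math


def normal_cdf(c):
    """Standard normal distribution function Phi(c)."""
    return 0.5 * (1.0 + math.erf(c / math.sqrt(2.0)))


def probit_loglik(delta, z_rows, d):
    """Sum over i of d_i*log Phi(z_i delta) + (1 - d_i)*log Phi(-z_i delta)."""
    total = 0.0
    for z_i, d_i in zip(z_rows, d):
        # Linear index z_i * delta for observation i.
        index = sum(z_ij * delta_j for z_ij, delta_j in zip(z_i, delta))
        # Participants contribute Phi(index), non-participants Phi(-index).
        total += math.log(normal_cdf(index if d_i == 1 else -index))
    return total


# Hypothetical data: a constant and one covariate; d = 1 marks participation.
z_rows = [(1.0, 0.2), (1.0, -1.0), (1.0, 1.5), (1.0, 0.3)]
d = [1, 0, 1, 0]

if __name__ == "__main__":
    print(probit_loglik((0.0, 0.0), z_rows, d))
    print(probit_loglik((0.0, 1.0), z_rows, d))
```

With all coefficients at zero every observation contributes $\log 0.5$, which gives a convenient baseline when checking an optimizer.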

11
Testing selectivity
  • If the error terms $\varepsilon$ and $u$ are uncorrelated, then
    the selection problem is ignorable.
  • $H_0: \sigma_{\varepsilon u} = 0$
  • Testing $H_0$ is equivalent to testing whether the
    coefficient of the additional variable (the inverse
    Mills ratio) in the equation is zero (using, e.g., a Wald test)
  • Notice that the errors are heteroskedastic, so the
    standard errors should be estimated with a method
    that allows for heteroskedasticity
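
The heteroskedasticity can be seen from the conditional variance of the wage-equation error in the selected sample. Under the assumption of joint normality with $\mathrm{Var}(u) = 1$ (an assumption made here, not stated in the transcript), and writing $\lambda_i = \phi(z_i\delta)/\Phi(z_i\delta)$, a sketch of the standard result is:

```latex
\mathrm{Var}(\varepsilon_i \mid d_i = 1, x_i, z_i)
  = \sigma_\varepsilon^2 - \sigma_{\varepsilon u}^2\, \lambda_i \,(\lambda_i + z_i\delta)
```

The variance depends on $z_i$ whenever $\sigma_{\varepsilon u} \neq 0$, and it reduces to the constant $\sigma_\varepsilon^2$ under $H_0$.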

12
Generalized Tobit Maximum Likelihood Estimation
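
The slide content is not in the transcript. For orientation, the log-likelihood usually maximized for the generalized Tobit model under joint normality is sketched below. The notation $\rho = \sigma_{\varepsilon u}/\sigma_\varepsilon$ and the normalization $\mathrm{Var}(u) = 1$ are assumptions introduced here.

```latex
\log L = \sum_{i:\, d_i = 0} \log \Phi(-z_i\delta)
  + \sum_{i:\, d_i = 1} \left[
      \log \Phi\!\left( \frac{z_i\delta + \rho\,(y_i - x_i\beta)/\sigma_\varepsilon}
                             {\sqrt{1 - \rho^2}} \right)
      + \log \phi\!\left( \frac{y_i - x_i\beta}{\sigma_\varepsilon} \right)
      - \log \sigma_\varepsilon
    \right]
```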
13
heckman
  • The heckman command is used to estimate the
    generalized Tobit model (Tobit of the 2nd type) using
    ML estimation (default option) or two-step
    estimation (option twostep)
  • heckman y x1 x2 xk, select(z1 z2 zs)
  • heckman y x1 x2 xk, select(d = z1 z2 zs)
  • heckman y x1 x2 xk, select(z1 z2 zs) twostep

14
Generalized Tobit Maximum Likelihood Estimation
15
(No Transcript)
16
Joint model for log-income and response
probability
  • Possible candidates for x: education dummies,
    age, work experience
  • $d^*$ is the propensity to respond to the earnings
    question
  • Possible candidates for z: mode of interview, education, gender, age,
    etc.

17
Item nonresponse for income equation or poverty
model in cross-section sample surveys
  • Potential explanatory variables:
  • Socio-demographic variables: age, gender, level
    of education, number of adults, number of
    children.
  • Situational economic circumstances: labour status,
    activity.
  • Data collection characteristics: mode of the
    interview, number of visits, duration of the
    interview. (These are plausible instrumental variables.)

18
(No Transcript)
19
Attrition in panel surveys has two possible
causes: failed contact and refusal
  • The potential variables explaining attrition
    (contact and cooperation) are lagged variables
    observed in the last wave.
  • The equation of interest has to use lagged
    variables (otherwise we have missing explanatory
    variables too)
  • Socio-demographic variables: age, gender, level
    of education, number of adults, number of
    children.
  • Social integration: talking often to neighbours,
    cohabitation, house ownership.
  • Situational economic circumstances: labour status,
    activity, household equivalised income.
  • Data collection characteristics: mode of the
    interview, number of visits, duration of the
    interview, same interviewer across waves, duration
    of the panel, length of the fieldwork. (These are
    plausible instrumental variables.)

20
Attrition due to lack of cooperation (BHPS
1994-96)
21
Weighted estimation
22
Weighted estimation
23
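  • Conditioning on z and then integrating it out (marginalizing
    with respect to z), with $p = \Pr(d=1 \mid x, z)$:
  • $E_z\big( E[x'(y - x\beta)\, d\, p^{-1} \mid x, z] \big)$
  • $= E_z\big( E[x'(y - x\beta) \mid x, z, d=1]\, \Pr(d=1 \mid x, z)\, p^{-1} \big)$
  • $= E_z\big( E[x'(y - x\beta) \mid x, z] \big) = E[x'(y - x\beta) \mid x] = 0$
  • The last line uses the assumption that, given x and z,
    selection is ignorable, so conditioning on d = 1 can be dropped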

24
How to use weights in Stata
  • Most Stata commands can deal with weighted data.
    Stata allows four kinds of weights:
  • fweights, or frequency weights, are weights that
    indicate the number of duplicated observations.
  • pweights, or sampling weights, are weights that
    denote the inverse of the probability that the
    observation is included due to the sampling
    design, nonresponse or sample selection.
  • aweights, or analytic weights, are weights that
    are inversely proportional to the variance of an
    observation; i.e., the variance of the j-th
    observation is assumed to be sigma^2/w_j, where
    w_j are the weights.
  • iweights, or importance weights, are weights that
    indicate the "importance" of the observation in
    some vague sense.

25
Option pweights
  • Usually sample surveys provide weights to take
    account of the sampling design and nonresponse.
  • Let p be the individual weight
  • Then we can run a regression with weighted
    observations
  • regress y x1 x2 xk [pweight=p]
  • Let us assume we have a random sample affected by
    nonresponse, but weights to take account of unit
    nonresponse are not available
  • A possible way to estimate your own weights is
    described in the following
  • probit d z1 z2 zs
  • predict prop
  • gen invprop = 1/prop
  • reg y x1 x2 xk [pweight=invprop]
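
The logic of the inverse-propensity weights above can be illustrated outside Stata. The Python sketch below (standard library only) uses simulated, hypothetical data with a single binary z and swaps the probit for a simpler estimator, the response rate within each z cell. It then compares the unweighted respondent mean with the weighted one. A weighted mean corresponds to a weighted regression on a constant only. All names and parameter values are illustrative assumptions.

```python
import random


def simulate(n=20_000, seed=1):
    """Hypothetical data: y depends on binary z, and so does response d."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0
        y = 1.0 + 2.0 * z + rng.gauss(0.0, 1.0)  # population mean of y is 2.0
        response_prob = 0.8 if z == 1 else 0.3   # z = 1 responds more often
        d = 1 if rng.random() < response_prob else 0
        rows.append((y, z, d))
    return rows


def cell_propensities(rows):
    """Response rate within each z cell (stands in for probit + predict)."""
    counts, responders = {}, {}
    for _, z, d in rows:
        counts[z] = counts.get(z, 0) + 1
        responders[z] = responders.get(z, 0) + d
    return {z: responders[z] / counts[z] for z in counts}


def naive_and_ipw_mean(rows):
    """Unweighted and inverse-propensity-weighted means of y among respondents."""
    prop = cell_propensities(rows)
    # Keep respondents only, each with weight 1 / estimated propensity.
    resp = [(y, 1.0 / prop[z]) for y, z, d in rows if d == 1]
    naive = sum(y for y, _ in resp) / len(resp)
    ipw = sum(y * w for y, w in resp) / sum(w for _, w in resp)
    return naive, ipw


if __name__ == "__main__":
    naive, ipw = naive_and_ipw_mean(simulate())
    print("naive respondent mean:", naive)
    print("weighted mean:", ipw)
```

In this setup the unweighted mean overstates the population mean because high-y units respond more often, while the weighted mean should land close to 2.0. This works only because response depends on z alone, which is the ignorability assumption behind the weights.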

26
For complex survey design it is better to use
  • svyset [pweight=p]
  • svy: regress y x1 x2 xk
  • svyset has options for cluster sampling designs
    or other complex designs
  • To declare a survey design with strata
  • svyset [pweight=p], strata(stratid)

27
Stata: propensity score methods for evaluation of
treatment
  • Abadie A., Drukker D., Herr J.L., Imbens G.W.
    (2001), Implementing Matching Estimators for
    Average Treatment Effects in Stata, The Stata
    Journal, 1, 1-18.
    http://ksghome.harvard.edu/.aabadie.academic.ksg/software.html
  • Becker S.O., Ichino A. (2002), Estimation of
    average treatment effects based on propensity
    scores, The Stata Journal, 2, 358-377.
    http://www.lrz-muenchen.de/sobecker/pscore.html
  • Sianesi B. (2001), Implementing Propensity Score
    Matching Estimators with STATA, UK Stata Users
    Group, VII Meeting, London.
    http://ideas.repec.org/c/boc/bocode/s432001.html

28
Some references for regressions with sample
selection
  • Buchinsky, M. (2001), Quantile regression with
    sample selection: estimating women's return to
    education in the U.S., Empirical Economics, 26,
    86-113.
  • Ibrahim, J.G., Chen, M.-H., Lipsitz, S.R.,
    Herring, A.H. (2005), Missing-data methods for
    generalized linear models: a comparative review,
    Journal of the American Statistical Association,
    100, 469, 332-346.
  • Lipsitz, S.R., Fitzmaurice, G.M., Molenberghs,
    G., Zhao, L.P. (1997), Quantile regression
    methods for longitudinal data with drop-outs,
    Applied Statistics, 46, 463-476.
  • Robins, J.M., Rotnitzky, A. (1995),
    Semiparametric efficiency in multivariate
    regression models with missing data, Journal of
    the American Statistical Association, 90,
    122-129.
  • Vella, F. (1998), Estimating models with sample
    selection bias: a survey, The Journal of Human
    Resources, 33, 127-169.
  • Wooldridge, J.M. (2007), Inverse probability
    weighted M-estimation for general missing data
    problems, Journal of Econometrics, 141, 2,
    1281-1301.