A Stata program for calibration weighting


1
A Stata program for calibration weighting
  • John D'Souza
  • National Centre for Social Research

2
Outline
  • Description of calibration
  • Adjust selection weights so that a weighted
    sample exactly matches the population
  • Generalizes post-stratification
  • Several methods: linear, logistic
  • SAS, GenStat
  • A new Stata program
  • Limitations and extensions

3
Sampling
  • Selection weights: $d_k = 1 / P(\text{person } k \text{ is chosen})$
  • Sample frame variables $X_{k1}, \dots, X_{kJ}$ with known
    population totals $P_1, \dots, P_J$.
  • Horvitz-Thompson estimator of $P_i$:
  • $\sum_k d_k X_{ki} \approx P_i$ for $i = 1, 2, \dots, J$.
  • Calibration: adjust $d_k$ to get calibration
    weights $w_k$, giving exact equality:
  • $\sum_k w_k X_{ki} = P_i$ for $i = 1, 2, \dots, J$.
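The quantities on this slide can be illustrated with a small sketch. The sample below is entirely hypothetical (four people, two calibration variables); it only shows how the selection weights and the Horvitz-Thompson totals are formed, not how the program computes them.

```python
# Toy sample: selection probabilities and two calibration variables (all values hypothetical).
prob = [0.10, 0.20, 0.25, 0.50]           # P(person k is chosen)
X = [[1, 0], [0, 1], [1, 1], [0, 1]]      # X[k][i]: value of variable i for person k

d = [1.0 / p for p in prob]               # selection weights d_k = 1 / P(person k is chosen)

def weighted_totals(weights, X):
    """Return sum_k weights[k] * X[k][i] for each variable i."""
    n_vars = len(X[0])
    return [sum(w * row[i] for w, row in zip(weights, X)) for i in range(n_vars)]

ht = weighted_totals(d, X)                # Horvitz-Thompson estimates of the totals P_i
```

Calibration would replace `d` by weights `w` for which `weighted_totals(w, X)` equals the known population totals exactly rather than approximately.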

4
Example: School Census
  • Variables include:
  • Age, Gender, Ethnic Group, Exam results
  • Type of School, Region
  • Pupils' Free School Meal (FSM) eligibility
  • We calibrate to J variables, e.g.:
  • Boy (binary)
  • Girl (binary)
  • Region (e.g. four categories)
  • FSM eligibility (binary)
  • J = 1 + 1 + (4 - 1) + 1 = 6

5
Special case: post-stratification
  • Simplest case
  • One categorical variable
  • Easy to deal with (post-stratification)
  • svyset , poststrata() postweight()
  • More general case
  • Several variables (categorical and numerical)
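For the simplest case, post-stratification on one categorical variable, the adjustment has a closed form: within each category, every selection weight is scaled by the ratio of the population total to the weighted sample count. A minimal sketch, with a hypothetical function name and made-up data (this is not Stata's implementation):

```python
def poststratify(d, stratum, pop_totals):
    """Scale weights within each post-stratum so they sum to that stratum's population total."""
    weighted_count = {}
    for dk, g in zip(d, stratum):
        weighted_count[g] = weighted_count.get(g, 0.0) + dk
    return [dk * pop_totals[g] / weighted_count[g] for dk, g in zip(d, stratum)]

# Hypothetical sample of five people in two regions.
d = [10.0, 10.0, 20.0, 20.0, 20.0]
region = ["N", "N", "S", "S", "S"]
w = poststratify(d, region, {"N": 30.0, "S": 45.0})
```

With several calibration variables there is no such per-category scaling, which is why the general case needs the methods described on the following slides.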

6
Deville and Särndal (1992)
  • Minimize the distance between w and d subject
    to the J calibration constraints.
  • Linear calibration: minimize
  • $\sum_k (w_k - d_k)^2 / d_k$
  • Involves solving J simultaneous linear equations
  • Logistic calibration: minimize
  • $\sum_k \left( w_k \log(w_k / d_k) - w_k + d_k \right)$
  • Involves solving J simultaneous non-linear
    equations
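As an illustration of the linear case: setting the derivative of the Lagrangian to zero gives weights of the form w_k = d_k (1 + x_k · λ), where λ solves the J linear equations (Σ_k d_k x_k x_k') λ = P − Σ_k d_k x_k. The sketch below solves them in plain Python. The function names and the four-person data set are hypothetical, and this is not a description of how the Stata program is coded.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]      # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: check for collinear calibration variables")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def linear_calibrate(d, X, P):
    """Linear calibration: w_k = d_k * (1 + x_k . lam), with lam from J linear equations."""
    J = len(P)
    # T[i][j] = sum_k d_k X_ki X_kj
    T = [[sum(dk * row[i] * row[j] for dk, row in zip(d, X)) for j in range(J)]
         for i in range(J)]
    # Gap between the population totals and the Horvitz-Thompson estimates.
    gap = [P[i] - sum(dk * row[i] for dk, row in zip(d, X)) for i in range(J)]
    lam = solve(T, gap)
    return [dk * (1.0 + sum(l * x for l, x in zip(lam, row))) for dk, row in zip(d, X)]

# Hypothetical data: columns are boy, girl, FSM.
d = [10.0, 10.0, 10.0, 10.0]
X = [[1, 0, 1], [1, 0, 0], [0, 1, 1], [0, 1, 0]]
P = [22.0, 20.0, 18.0]
w = linear_calibrate(d, X, P)
```

Note that the linear method places no bounds on the weights, so with less well-behaved data some w_k can come out negative; the bounded methods mentioned later address this.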

7
GenStat, SAS, Stata
  • GenStat and SAS
  • Methods: linear, logistic and bounded.
  • Estimation: GenStat gives SEs.
  • SAS handles categorical variables directly. Enter
    as indicator variables in GenStat.
  • Stata
  • Post-stratification (calibration to one
    categorical variable). Gives SEs.
  • No routine for general calibration.

8
A new Stata program
  • Typical syntax:
  • matrix M = (10000, 10000, 3000, 4000, 3000, 8000)
  • calibrate , entrywt(w1) exitwt(w2) poptot(M) ///
        marginals(boy girl FSM ireg1-ireg3) ///
        method(linear) print(final)
  • Population totals: 10,000 boys, 10,000 girls, 3,000 FSM
  • Variables boy, girl and FSM are binary
  • Categorical variable region (4 categories) is turned
    into 4 binary indicator variables. Only 3 are
    entered in the syntax (collinearity)
9
Output
Variable   Pop total   Weighted (entrywt)   Weighted (exitwt)            R
boy            10000            9619.7188               10000    .21373408
girl           10000            10380.281               10000    .13733883
FSM             3000            2915.4929                3000    .04710333
ireg1           4000            4056.3379                4000   -.19511394
ireg2           3000            3197.1831                3000   -.24808005
ireg3           8000             8507.042                8000    -.2391432

10
Options
  • Options are available to:
  • Control the amount of output/graphs
  • Set the maximum number of iterations and the tolerance
  • Methods:
  • linear, logistic, bounded linear (blinear) and nonresp
  • (blinear sets bounds for $w_k/d_k$; GenStat and SAS
    have something very similar)
  • (nonresp adjusts for non-response; see below)

11
Limitations (1)
  1. Solves the equations by finding a matrix inverse
     • Won't work if J is large
  2. Can have problems with singular or nearly
     singular matrices
  3. Iterative methods (logistic, blinear) won't
     always converge
  • No obvious solution to 1. Problems 2 and 3 are
    usually down to problems with the data
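To make the role of the iteration limit and tolerance concrete, here is a sketch of a Newton iteration for the distance the slides call logistic, Σ_k (w_k log(w_k/d_k) − w_k + d_k). Minimizing it under the constraints gives weights of the form w_k = d_k exp(x_k · λ), and λ has to be found iteratively. The function names, defaults (`max_iter`, `tol`) and data are assumptions for illustration, not the program's actual algorithm or option values.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular or nearly singular matrix")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def logistic_calibrate(d, X, P, max_iter=50, tol=1e-10):
    """Newton iteration for w_k = d_k * exp(x_k . lam) subject to sum_k w_k x_k = P."""
    J = len(P)
    lam = [0.0] * J
    for _ in range(max_iter):
        w = [dk * math.exp(sum(l * x for l, x in zip(lam, row))) for dk, row in zip(d, X)]
        gap = [P[i] - sum(wk * row[i] for wk, row in zip(w, X)) for i in range(J)]
        if max(abs(g) for g in gap) < tol:
            return w
        # Jacobian of the weighted totals with respect to lam: sum_k w_k x_k x_k'
        H = [[sum(wk * row[i] * row[j] for wk, row in zip(w, X)) for j in range(J)]
             for i in range(J)]
        step = solve(H, gap)
        lam = [l + s for l, s in zip(lam, step)]
    raise RuntimeError("no convergence within max_iter iterations")

# Hypothetical data: columns are boy, girl, FSM.
d = [10.0, 10.0, 10.0, 10.0]
X = [[1, 0, 1], [1, 0, 0], [0, 1, 1], [0, 1, 0]]
P = [22.0, 20.0, 18.0]
w = logistic_calibrate(d, X, P)
```

Both failure modes from the list show up in this sketch: a singular Jacobian stops the linear solve, and targets that no set of positive weights can reach prevent the iteration from converging.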

12
Limitations (2)
  • We need to recode categorical variables (SAS
    doesn't)
  • Stata: tab region, gen(ireg)
  • More complicated (e.g. two-phase) problems aren't
    handled directly
  • Need a bit of syntax to handle this
  • Other packages can handle this directly

13
Extensions: standard errors
  • Calibration weights are often incorrectly treated
    as selection weights.
  • calibrate , entrywt(w1) exitwt(w2) poptot(M) ///
        marginals(boy girl FSM ireg1-ireg3)
  • calibmean , selwt(w1) calibwt(w2) yvar(y) ///
        marginals(boy girl FSM ireg1-ireg3) ///
        psu(school) designops(strata(region))
  • This generalizes Stata's poststrata() option
14
Extension: method nonresp (1)
  • Example:
  • Select schools, then classes, then pupils
  • Assume all schools respond; pupils might not
  • Variables available on responders (pop totals
    available):
  • Gender, Exam results, FSM, Region
  • Variables on non-responders (pop totals not
    available):
  • PTratio: pupil-teacher ratio
  • topset: is the pupil in the top set?

15
Extension: method nonresp (2)
       serial  region  topset  outc  sex  FSM
  --------------------------------------------
  1.     1001       1       1     0    .    .
  2.     1002       1       0     1    1    0
  3.     1003       2       0     0    .    .
  4.     1004       1       0     1    1    1
  5.     1005       3       1     0    .    .
  --------------------------------------------
  6.     1006       1       0     1    0    1
  7.     1007       3       1     1    1    0
  8.     1008       2       1     0    .    .
  9.     1009       1       0     1    1    0

16
Extension: method nonresp (3)
  • Population totals unknown, but variables are
    available on all the sample (including
    non-responders)
  • calibrate , entrywt(w1) exitwt(w2) poptot(M) ///
        marginals(boy girl FSM ireg1-ireg3) ///
        method(nonresp) outc(outc) ///
        svars(PTratio topset)
  • Responders are weighted to pop totals on marginals
    and to selected sample totals on svars
    (Lundström and Särndal, 2005)
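One way to read the last bullet is as a two-part target vector: known population totals for the marginals, followed by totals for the svars estimated from the whole selected sample. The sketch below builds that vector and picks out the responders, reusing the topset and outc columns from the nine-pupil listing. It assumes that "selected sample totals" means selection-weighted totals over responders and non-responders alike; the function name, the weights and the population totals are hypothetical.

```python
def nonresp_targets(d, outc, svars, pop_totals):
    """Targets for responder calibration: population totals, then full-sample weighted svar totals."""
    n_svars = len(svars[0])
    # Totals over the whole selected sample, responders and non-responders alike.
    sample_totals = [sum(dk * row[j] for dk, row in zip(d, svars)) for j in range(n_svars)]
    responders = [k for k, r in enumerate(outc) if r == 1]
    return list(pop_totals) + sample_totals, responders

# topset and outc as in the nine-pupil listing; weights and pop totals are made up.
topset = [1, 0, 0, 0, 1, 0, 1, 1, 0]
outc = [0, 1, 0, 1, 0, 1, 1, 0, 1]
d = [100.0] * 9                                  # hypothetical selection weights
pop_totals = [450.0, 450.0]                      # hypothetical totals for boy, girl

targets, responders = nonresp_targets(d, outc, [[t] for t in topset], pop_totals)
```

A calibration method such as linear or logistic would then be applied to the responders only, using `targets` as the totals to match.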

17
Conclusions
  • We've found the program can handle many practical
    problems
  • Easy to calculate SEs (but theory assumes no
    non-response)
  • Method nonresp isn't available in many packages
  • We don't have to calibrate to population totals
  • E.g., calibrate Wave n+1 of a survey to totals from
    Wave n
  • Calibrate one sample to look like another

18
Questions

19
References
  • Deville, J.-C. and Särndal, C.-E. 1992.
    Calibration estimators in survey sampling.
    Journal of the American Statistical Association
    87: 376-382.
  • Background and theory behind calibration
  • Lundström, S. and Särndal, C.-E. 2005. Estimation
    in Surveys with Nonresponse. Wiley.
  • Deals with non-response
  • Singh, A.C. and Mohl, C.A. 1996. Understanding
    calibration estimators in survey sampling. Survey
    Methodology 22: 107-115.
  • Discusses several methods of doing bounded
    calibration