Title: A new version of the CALMAR calibration adjustment program
1. A new version of the CALMAR calibration adjustment program
2. The CALMAR2 macros
3. I.1. Background
CALMAR = CALibration on MARgins
- CALMAR 1: SAS macro program, written in 1992-1993 at France's INSEE by Sautory
  - Scope: implementing the calibration methods developed by Deville and Särndal (JASA, 1992)
- CALMAR 2: SAS macro, written in 2000 at France's INSEE
  - Scope: implementing the generalized calibration method for handling total non-response (Deville, 1998)
4. I.2. What's new in CALMAR 2
- Simultaneous calibration with 2 or 3 levels
- Total non-response adjustment using generalized calibration
- Handling collinearities between auxiliary variables
- A 5th distance function: generalized hyperbolic sine
- Interactive screens to enter parameters, thanks to CALMAR2_GUIDE
5. Simultaneous calibration
6. II.1. The method
- Information is collected at several levels of observation:
  - households and every household member, or firms and every establishment of the firm, i.e. a cluster sampling survey, including questions about the clusters
  - households and some of their members (Kish individuals), i.e. two-stage sampling, including questions about the primary units (P.U.)
  - households, every household member and Kish units
- Auxiliary information is available at every level
7. How to perform the calibration?
- Independent calibration at every level of observation
- Simultaneous (or "integrated") calibration:
  - same weights for all members of a household
  - consistency between statistics obtained from the various data files
Simultaneous calibration method: a single calibration is performed at the P.U. level, after having computed, for each P.U., the totals of the calibration variables defined at the secondary levels (Sautory, 1996).
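The aggregation step described above (computing, for each P.U., the totals of the variables defined at the secondary level) can be sketched as follows. This is only an illustration under stated assumptions: the function name `household_totals` and the toy data are hypothetical, and CALMAR 2 itself is a SAS macro whose internal code is not shown in these slides.

```python
from collections import defaultdict


def household_totals(individuals, categories):
    """Count, within each household, the members falling in each category.

    individuals: iterable of (household_id, category) pairs
    categories:  ordered list of the categories to count
    Returns {household_id: [count for each category, in order]}.
    """
    index = {c: j for j, c in enumerate(categories)}
    totals = defaultdict(lambda: [0] * len(categories))
    for household_id, category in individuals:
        totals[household_id][index[category]] += 1
    return dict(totals)


# Hypothetical data: two households, individual variable "sex" coded 1 / 2.
people = [("A", 1), ("A", 2), ("A", 2), ("B", 1)]
z = household_totals(people, categories=[1, 2])
```

Each household then carries its own variables plus these counts, so that one calibration at household level also constrains the individual-level margins.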
8. II.2. An example
- households (sM sample)
- all the members of the selected households (sI sample)
- one member (Kish individual) in each selected household m, chosen by simple random sampling among the eligible members of the household (sK sample)
Weights:
- weight of the household m
- weight of the member i of the household m
- weight of the Kish individual of the household m
9. Auxiliary information
- Households: auxiliary variables vector for each household m in sM; vector X of the known auxiliary variables totals in the households population
- Individuals: auxiliary variables vector for each individual (m,i) in sI; vector Z of the known totals in the individuals population
- Kish individuals: auxiliary variables vector for the Kish individual in sK; vector V of the known totals in the Kish-units population
10. For each household m we compute:
- the totals of the individual variables
- the estimated totals of the Kish-individual variables
Then:
- Vector of the calibration variables for the household m
- Vector of the totals: (X, Z, V)
- Calibration equations
11. Resulting weights:
- weight of the household m in sM
- weight of the individual (m,i) of the household m in sI
- weight of the Kish individual of the household m in sK
The 3 samples are correctly calibrated on the totals X, Z and V.
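The formulas of this example did not survive in the transcript. The block below is a hedged reconstruction of the usual form of the method, not the slides' own notation: the symbols $d_m$ (initial household weight), $x_m$, $z_{m,i}$, $v_{k_m}$ (auxiliary vectors of the household, of its member $i$ and of its Kish individual $k_m$), $e_m$ (number of eligible members of household $m$) and $w_m$ (calibrated household weight) are all assumptions introduced here.

```latex
% Household-level totals of the secondary-level variables
z_m = \sum_{i \in m} z_{m,i}, \qquad \hat{v}_m = e_m \, v_{k_m}

% Single calibration at household level on the stacked vector
\sum_{m \in s_M} w_m \begin{pmatrix} x_m \\ z_m \\ \hat{v}_m \end{pmatrix}
  = \begin{pmatrix} X \\ Z \\ V \end{pmatrix}

% Weights derived for the other two samples
w_{m,i} = w_m \quad \text{for } (m,i) \in s_I, \qquad
w_{k_m} = e_m \, w_m \quad \text{for } k_m \in s_K
```

With these derived weights, $\sum_{s_I} w_{m,i}\, z_{m,i} = Z$ and $\sum_{s_K} w_{k_m} v_{k_m} = V$ follow directly from the household-level equations, which is why the three samples end up calibrated together.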
12. Calmar 2 performs such simultaneous calibrations. The user must provide the input tables for the various levels (sample data files and calibration variables totals files); the program performs all the operations required to reduce the process to a single calibration, and creates the various calibrated weights files.
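Once everything has been reduced to household level, the single calibration can be sketched as below for the linear method, where each weight is `d_k * (1 + x_k' lambda)`. This is a minimal illustration and not CALMAR 2's code: the names `solve` and `linear_calibration` and the toy data are hypothetical, and bounded or other distance functions are not covered.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def linear_calibration(d, x, totals):
    """Return weights w_k = d_k * (1 + x_k' lam) whose totals match `totals`."""
    p = len(totals)
    # t = sum_k d_k x_k x_k'
    t = [[sum(dk * xk[i] * xk[j] for dk, xk in zip(d, x)) for j in range(p)]
         for i in range(p)]
    # gap between the known totals and the initial weighted totals
    gap = [totals[i] - sum(dk * xk[i] for dk, xk in zip(d, x)) for i in range(p)]
    lam = solve(t, gap)
    return [dk * (1 + sum(l * v for l, v in zip(lam, xk))) for dk, xk in zip(d, x)]


# Hypothetical households: (1, number of men, number of women) per household.
d = [10.0, 10.0, 10.0, 10.0]
x = [[1, 1, 1], [1, 0, 1], [1, 2, 1], [1, 1, 0]]
targets = [42.0, 45.0, 30.0]
w = linear_calibration(d, x, targets)
calibrated = [sum(wk * xk[i] for wk, xk in zip(w, x)) for i in range(3)]
```

The linear method can produce negative weights, since nothing constrains `1 + x_k' lambda` to stay positive.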
13. An example of simultaneous calibration
14. The survey
- Sampling design: two-stage sampling
  - primary units: households, selected by stratified sampling with S.R.S. within each stratum
  - secondary units (Kish units): one member per selected household, drawn by S.R.S. among the members more than 14 years old
- Questionnaire
  - variables of interest are measured on Kish units
  - questions about the dwelling and the whole family
  - questions about each member of the household (age, sex, profession)
- Calibration variables (xk)
  - Households: household size, professional group of the head of household, strata (agglomeration size)
  - All individuals: sex, age group
  - Kish individuals: sex, age group
- Population totals (X) come from the sampling frame
15. The program
16.
%CALMAR2 (datamen=base.echant_menages,
          marmen=base.marge_men,
          poids=poids1,
          ident=ident,
          dataind=base.echant_indiv,
          marind=base.marge_ind,
          ident2=id,
          datakish=base.echant_kish,
          markish=base.marge_kish,
          poidkish=nbelig,
          m=1,
          datapoi=poidsmen,
          datapoi2=poidsind,
          datapoi3=poidskish,
          poidsfin=w3,
          labelpoi=calage 3 niveaux,
          poidskishfin=w3k,
          labelpoikish=poids kish total,
17. The output
18. MACRO PARAMETERS
INPUT TABLE(S)
  LEVEL 1 DATA TABLE                  DATAMEN      = BASE.ECHANT_MENAGES
  LEVEL 1 IDENTIFIER                  IDENT        = IDENT
  LEVEL 2 DATA TABLE                  DATAIND      = BASE.ECHANT_INDIV
  LEVEL 2 IDENTIFIER                  IDENT2       = ID
  KISH INDIVIDUALS TABLE              DATAKISH     = BASE.ECHANT_KISH
  INITIAL WEIGHT                      POIDS        = POIDS1
  SCALE FACTOR                        ECHELLE      = 1
  QK WEIGHT                           PONDQK       = __UN
  KISH WEIGHT                         POIDKISH     = NBELIG
MARGINS TABLE(S)
  LEVEL 1                             MARMEN       = BASE.MARGE_MEN
  LEVEL 2                             MARIND       = BASE.MARGE_IND
19. MACRO PARAMETERS (continued)
  METHOD USED                         M            = 1
  LOWER BOUND                         LO           =
  UPPER BOUND                         UP           =
  HYPERBOLIC SINE COEFFICIENT         ALPHA        = 1
  STOPPING THRESHOLD                  SEUIL        = 0.0001
  MAXIMUM NUMBER OF ITERATIONS        MAXITER      = 15
  COLLINEARITY HANDLING               COLIN        = NON
TABLE(S) CONTAINING THE FINAL WEIGHT
  LEVEL 1                             DATAPOI      = POIDSMEN
  LEVEL 2                             DATAPOI2     = POIDSIND
  KISH LEVEL                          DATAPOI3     = POIDSKISH
  UPDATE OF TABLE(S) DATAPOI(2)(3)    MISAJOUR     = OUI
  FINAL WEIGHT                        POIDSFIN     = W3
  LABEL OF THE FINAL WEIGHT           LABELPOI     = CALAGE 3 NIVEAUX
  FINAL WEIGHT OF THE KISH UNITS      POIDSKISHFIN = W3K
  LABEL OF THE KISH WEIGHT            LABELPOIKISH = POIDS KISH TOTAL
  CONTENTS OF TABLE(S) DATAPOI(2)(3)  CONTPOI      = OUI
20. COMPARISON BETWEEN THE MARGINS DRAWN FROM THE SAMPLE (INITIAL WEIGHTS) AND THE MARGINS IN THE POPULATION (CALIBRATION MARGINS)
VARIABLE  CATEGORY  SAMPLE MARGIN  POPULATION MARGIN  SAMPLE %  POPULATION %
NBIND     01              1525.60               1539     26.30         26.53
          02              1914.37               1860     33.00         32.06
          03               797.71               1000     13.75         17.24
          04               930.78                885     16.05         15.26
          05               365.18                361      6.30          6.22
          06               267.36                156      4.61          2.69
PCSPR     1                 80.70                124      1.39          2.14
          2                191.78                290      3.31          5.00
          3                822.81                624     14.18         10.76
          4                832.34                870     14.35         15.00
          5                569.41                682      9.82         11.76
          6               1279.53               1237     22.06         21.32
21. COMPARISON BETWEEN SAMPLE AND POPULATION MARGINS (continued)
VARIABLE  CATEGORY   SAMPLE MARGIN  POPULATION MARGIN  SAMPLE %  POPULATION %
AGE       00-14 ans        3245.32               2857     21.46         19.52
          15-24 ans        2217.86               2044     14.67         13.96
          25-59 ans        6699.70               6800     44.31         46.45
          60- ? ans        2957.50               2939     19.56         20.08
SEXE      1                7546.69               7108     49.91         48.55
          2                7573.69               7532     50.09         51.45
AGEK      A15              2155.94               2044     18.28         17.35
          A25              6752.61               6800     57.25         57.71
          A60              2885.84               2939     24.47         24.94
SEXEK     1                5596.30               5673     47.45         48.15
          2                6198.09               6110     52.55         51.85
22. LINEAR METHOD
FIRST SUMMARY TABLE OF THE ALGORITHM: VALUE OF THE STOPPING CRITERION AND NUMBER OF NEGATIVE WEIGHTS AFTER EACH ITERATION
ITERATION  STOPPING CRITERION  NEGATIVE WEIGHTS
1                     1.31960                 1
2                     0.00000                 1
23. LINEAR METHOD
SECOND SUMMARY TABLE OF THE ALGORITHM: COEFFICIENTS OF THE VECTOR LAMBDA OF LAGRANGE MULTIPLIERS AFTER EACH ITERATION
VARIABLE  CATEGORY   LAMBDA1   LAMBDA2
NBIND     01        -0.15325  -0.15325
NBIND     02        -0.24295  -0.24295
NBIND     03         0.00562   0.00562
NBIND     04        -0.17355  -0.17355
NBIND     05        -0.00502  -0.00502
NBIND     06        -0.44773  -0.44773
PCSPR     1          0.92036   0.92036
PCSPR     2          0.50376   0.50376
PCSPR     3         -0.18514  -0.18514
PCSPR     4          0.15354   0.15354
PCSPR     5          0.36019   0.36019
PCSPR     6          0.08424   0.08424
PCSPR     7          0.16042   0.16042
PCSPR     8                .         .
24. LAMBDA COEFFICIENTS (continued)
VARIABLE  CATEGORY    LAMBDA1   LAMBDA2
STRATE    0          -0.14172  -0.14172
STRATE    1          -0.07338  -0.07338
STRATE    2          -0.12634  -0.12634
STRATE    3          -0.03106  -0.03106
STRATE    4                 .         .
AGE       00-14 ans  -0.03549  -0.03549
AGE       15-24 ans  -0.65576  -0.65576
AGE       25-59 ans  -0.52872  -0.52872
AGE       60- ? ans  -0.64430  -0.64430
SEXE      1          -0.08395  -0.08395
SEXE      2                 .         .
AGEK      A15         0.67198   0.67198
AGEK      A25         0.68366   0.68366
AGEK      A60         0.74262   0.74262
SEXEK     1           0.01727   0.01727
SEXEK     2                 .         .
25. COMPARISON BETWEEN THE FINAL MARGINS IN THE SAMPLE (WITH THE FINAL WEIGHTS) AND THE MARGINS IN THE POPULATION (CALIBRATION MARGINS)
VARIABLE  CATEGORY  SAMPLE MARGIN  POPULATION MARGIN  SAMPLE %  POPULATION %
NBIND     01                 1539               1539     26.53         26.53
          02                 1860               1860     32.06         32.06
          03                 1000               1000     17.24         17.24
          04                  885                885     15.26         15.26
          05                  361                361      6.22          6.22
          06                  156                156      2.69          2.69
PCSPR     1                   124                124      2.14          2.14
          2                   290                290      5.00          5.00
          3                   624                624     10.76         10.76
          4                   870                870     15.00         15.00
          5                   682                682     11.76         11.76
          6                  1237               1237     21.32         21.32
          7                  1831               1831     31.56         31.56
          8                   143                143      2.47          2.47
STRATE    0                  1453               1453     25.05         25.05
          1                   966                966     16.65         16.65
          2                   805                805     13.88         13.88
          3                  1689               1689     29.12         29.12
          4                   888                888     15.31         15.31
26. COMPARISON BETWEEN FINAL SAMPLE MARGINS AND POPULATION MARGINS (continued)
VARIABLE  CATEGORY   SAMPLE MARGIN  POPULATION MARGIN  SAMPLE %  POPULATION %
AGE       00-14 ans           2857               2857     19.52         19.52
          15-24 ans           2044               2044     13.96         13.96
          25-59 ans           6800               6800     46.45         46.45
          60- ? ans           2939               2939     20.08         20.08
SEXE      1                   7108               7108     48.55         48.55
          2                   7532               7532     51.45         51.45
AGEK      A15                 2044               2044     17.35         17.35
          A25                 6800               6800     57.71         57.71
          A60                 2939               2939     24.94         24.94
SEXEK     1                   5673               5673     48.15         48.15
          2                   6110               6110     51.85         51.85
27. STATISTICS ON THE WEIGHT RATIOS (FINAL WEIGHTS / INITIAL WEIGHTS) AND ON THE FINAL WEIGHTS
The UNIVARIATE Procedure
Variable: _F_ (WEIGHT RATIO)
Basic Statistical Measures
  Location             Variability
  Mean    1.000000     Std Deviation        0.24564
  Median  0.996533     Variance             0.06034
  Mode    0.991339     Range                2.32886
                       Interquartile Range  0.21258
Quantiles (Definition 5)
  Quantile     Estimate
  100% Max     2.009262
  99%          1.745002
  95%          1.377982
  90%          1.278637
  75% Q3       1.105492
  50% Median   0.996533
  25% Q1       0.892917
  10%          0.749877
  5%           0.613091
  1%           0.251528
  0% Min      -0.319601
Extreme Observations
  -------------Lowest-------------    ------------Highest-----------
  Value       IDENT        Obs        Value     IDENT        Obs
  -0.3196012  1163032100    27        1.76397   5363019600   293
   0.0374385  7363016270   365        1.79618   7463000450   381
   0.1498661  1169040310    73        1.85813   2369004180   129
   0.1872096  7269001420   348        1.97094   5463007950   326
   0.2314417  7363017990   366        2.00926   5263016110   268
28. STATISTICS ON THE WEIGHT RATIOS (FINAL WEIGHTS / INITIAL WEIGHTS) AND ON THE FINAL WEIGHTS
The UNIVARIATE Procedure
Variable: _F_ (WEIGHT RATIO)
[Histogram and boxplot of _F_, with bins running from -0.35 to 2.05; each * may represent up to 3 counts]
29. STATISTICS ON THE WEIGHT RATIOS (FINAL WEIGHTS / INITIAL WEIGHTS) AND ON THE FINAL WEIGHTS
The UNIVARIATE Procedure
Variable: __WFIN (FINAL WEIGHT)
Basic Statistical Measures
  Location             Variability
  Mean    11.60200     Std Deviation         4.62597
  Median  10.11949     Variance             21.39957
  Mode     9.57633     Range                32.03263
                       Interquartile Range   5.70090
Quantiles (Definition 5)
  Quantile     Estimate
  100% Max     29.19457
  99%          25.69548
  95%          20.11085
  90%          18.04434
  75% Q3       13.98763
  50% Median   10.11949
  25% Q1        8.28672
  10%           7.15056
  5%            6.41373
  1%            2.50660
  0% Min       -2.83806
Extreme Observations
  -------------Lowest------------    ------------Highest-----------
  Value      IDENT        Obs        Value     IDENT        Obs
  -2.838058  1163032100    27        25.7604   5369016540   317
   0.543982  7363016270   365        26.0985   7463000450   381
   1.330811  1169040310    73        28.6378   5463007950   326
   1.808444  7269001420   348        28.6643   8269018030   421
   2.235727  7363017990   366        29.1946   5263016110   268
30. STATISTICS ON THE WEIGHT RATIOS (FINAL WEIGHTS / INITIAL WEIGHTS) AND ON THE FINAL WEIGHTS
The UNIVARIATE Procedure
Variable: __WFIN (FINAL WEIGHT)
[Histogram and boxplot of __WFIN, with bins running from -3 to 29; each * may represent up to 3 counts]
31. LINEAR METHOD
MEAN WEIGHT RATIOS (FINAL WEIGHTS / INITIAL WEIGHTS) FOR EACH VALUE OF THE VARIABLES
VARIABLE  CATEGORY  NUMBER OF LEVEL 1 OBSERVATIONS  WEIGHT RATIO
NBIND     01                                   133       1.00152
NBIND     02                                   167       0.97304
NBIND     03                                    69       1.24647
NBIND     04                                    79       0.95151
NBIND     05                                    31       0.99271
NBIND     06                                    21       0.58818
PCSPR     1                                      6       1.55064
PCSPR     2                                     15       1.52001
PCSPR     3                                     73       0.76645
PCSPR     4                                     73       1.04281
PCSPR     5                                     51       1.20566
PCSPR     6                                    111       0.96902
PCSPR     7                                    157       0.99429
PCSPR     8                                     14       0.76191
STRATE    0                                    100       1.00000
STRATE    1                                    100       1.00000
STRATE    2                                    100       1.00000
STRATE    3                                    100       1.00000
STRATE    4                                    100       1.00000
ALL                                            500       1.00000
32. LINEAR METHOD
MEAN WEIGHT RATIOS (FINAL WEIGHTS / INITIAL WEIGHTS) FOR EACH VALUE OF THE VARIABLES
VARIABLE  CATEGORY  NUMBER OF LEVEL 2 OBSERVATIONS  WEIGHT RATIO
AGE       00-14 an                             274       0.88664
AGE       15-24 an                             184       0.93210
AGE       25-59 an                             581       1.01758
AGE       60- ? an                             249       0.99088
SEXE      1                                    640       0.94443
SEXE      2                                    648       0.99993
ALL                                           1288       0.97235
VARIABLE  CATEGORY  NUMBER OF KISH INDIVIDUALS      WEIGHT RATIO
AGEK      A15                                   66       0.95043
AGEK      A25                                  283       1.01108
AGEK      A60                                  151       1.00090
SEXEK     1                                    232       0.98540
SEXEK     2                                    268       1.01264
ALL                                            500       1.00000
33. LINEAR METHOD
CONTENTS OF THE TABLE poidsmen CONTAINING THE NEW WEIGHT w3
The CONTENTS Procedure
  #  Variable  Type  Len  Pos  Label
  1  IDENT     Char   10    8
  2  w3        Num     8    0  calage 3 niveaux
CONTENTS OF THE TABLE poidsind CONTAINING THE NEW WEIGHT w3
  #  Variable  Type  Len  Pos  Label
  2  IDENT     Char   10   20
  1  id        Char   12    8
  3  w3        Num     8    0  calage 3 niveaux
CONTENTS OF THE TABLE poidskish CONTAINING THE NEW WEIGHT w3
  #  Variable  Type  Len  Pos  Label
  2  ID        Char   12   26
  1  IDENT     Char   10   16
  3  w3        Num     8    0  calage 3 niveaux
  4  w3k       Num     8    8  poids kish total
34. SUMMARY
DATE: 24 AUGUST 2005    TIME: 11:12
INPUT TABLE: BASE.ECHANT_MENAGES
  NUMBER OF OBSERVATIONS IN THE INPUT TABLE:  500
  NUMBER OF OBSERVATIONS DISCARDED:           0
  NUMBER OF OBSERVATIONS KEPT:                500
  WEIGHT VARIABLE:                            POIDS1
  NUMBER OF CATEGORICAL VARIABLES:            3
  CATEGORICAL VARIABLES AND THEIR NUMBERS OF CATEGORIES: nbind (6) pcspr (8) strate (5)
  SUM OF THE INITIAL WEIGHTS:                 5801
  POPULATION SIZE:                            5801
INPUT TABLE: BASE.ECHANT_INDIV
  NUMBER OF OBSERVATIONS IN THE INPUT TABLE:  1288
  NUMBER OF OBSERVATIONS DISCARDED:           0
  NUMBER OF OBSERVATIONS KEPT:                1288
  NUMBER OF CATEGORICAL VARIABLES:            2
  CATEGORICAL VARIABLES AND THEIR NUMBERS OF CATEGORIES: age (4) sexe (2)
  SUM OF THE INITIAL WEIGHTS:                 15120
  POPULATION SIZE:                            14640
35. SUMMARY (continued)
INPUT TABLE: BASE.ECHANT_KISH
  NUMBER OF OBSERVATIONS IN THE INPUT TABLE:  500
  NUMBER OF OBSERVATIONS DISCARDED:           0
  NUMBER OF OBSERVATIONS KEPT:                500
  CONDITIONAL WEIGHT VARIABLE:                NBELIG
  MAXIMUM NUMBER OF SECONDARY UNITS PER P.U.: 1
  NUMBER OF CATEGORICAL VARIABLES:            2
  CATEGORICAL VARIABLES AND THEIR NUMBERS OF CATEGORIES: agek (3) sexek (2)
  SUM OF THE INITIAL WEIGHTS:                 11794
  POPULATION SIZE:                            11783
METHOD USED: LINEAR
THE CALIBRATION WAS COMPLETED IN 2 ITERATIONS
THERE IS 1 NEGATIVE WEIGHT
THE WEIGHTS HAVE BEEN STORED IN THE VARIABLE W3 OF THE TABLES POIDSMEN, POIDSIND AND POIDSKISH
THE WEIGHTS OF THE KISH UNITS HAVE BEEN STORED IN THE VARIABLE W3K OF THE TABLE POIDSKISH
36. Handling total non-response with generalized calibration
37. III.1. Generalized calibration
- Calibration functions, where λ is a vector of p adjustment parameters
- Calibration equations
- Solving for λ gives the calibrated weights
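The formulas of this slide are missing from the transcript. As a hedged reminder of the usual form of generalized calibration, and not necessarily the slide's exact notation, the equations can be written as below. The symbols are assumptions introduced here: $d_k$ is the initial weight, $x_k$ the vector of calibration variables with known total $X$, $z_k$ a vector of instruments of the same dimension $p$, and $F$ a calibration function.

```latex
% Calibrated weights: a function F of a linear score in the instruments
w_k = d_k \, F\!\left(z_k^{\top} \lambda\right), \qquad F(0) = 1

% Calibration equations: p equations in the p unknowns \lambda
\sum_{k \in s} d_k \, F\!\left(z_k^{\top} \lambda\right) x_k = X
```

Conventional calibration is the special case $z_k = x_k$; with $F(u) = 1 + u$ the system is linear in $\lambda$.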
38. Basic result
The result involves the parameter estimates of the instrumental regression of Y on X, with Z as instrumental variables, weighted by the sampling weights.
39. Precision
The precision is based on the residual of the regression of Y on X in U with the instrumental variables Z.
Note: the instruments are equal to [formula missing].
40. III.2. Calibration in case of total non-response
Method 1: Calibration after adjustment for non-response
1.a. Adjustment for non-response
- The response probabilities (conditionally on s) are estimated by referring to a response model and an estimation method
- Expansion estimator
41. Examples
- Uniform response model
- Homogeneous response groups
- Generalized linear model, based on a vector of explanatory non-response variables
- Note: for estimating the response probabilities, this vector must be known both for respondents AND NON-RESPONDENTS
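The homogeneous response groups model can be illustrated as follows. This is a sketch under stated assumptions, not the slides' own procedure: the response probability of a group is estimated here by its weighted response rate (one common choice), and the function name `hrg_adjusted_weights` and the data layout are hypothetical.

```python
from collections import defaultdict


def hrg_adjusted_weights(units):
    """Adjust sampling weights for non-response within response groups.

    units: list of dicts with keys 'group', 'd' (sampling weight)
           and 'responded' (bool).
    Returns one value per unit: d / p_hat for respondents, None otherwise,
    where p_hat is the weighted response rate of the unit's group.
    """
    sampled = defaultdict(float)
    responding = defaultdict(float)
    for u in units:
        sampled[u["group"]] += u["d"]
        if u["responded"]:
            responding[u["group"]] += u["d"]
    adjusted = []
    for u in units:
        if u["responded"]:
            p_hat = responding[u["group"]] / sampled[u["group"]]
            adjusted.append(u["d"] / p_hat)
        else:
            adjusted.append(None)
    return adjusted


# Hypothetical sample: the group must be known for non-respondents too.
sample = [
    {"group": "g1", "d": 10.0, "responded": True},
    {"group": "g1", "d": 10.0, "responded": True},
    {"group": "g1", "d": 10.0, "responded": False},
    {"group": "g2", "d": 20.0, "responded": True},
    {"group": "g2", "d": 20.0, "responded": False},
]
weights = hrg_adjusted_weights(sample)
```

The grouping variable is read for every sampled unit, which is exactly the requirement stated in the note above.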
42. 1.b. Calibration
We start from the corrected weights and apply a conventional calibration.
43. Method 2: Direct conventional calibration
- Method 2 is equivalent to method 1 with a uniform non-response model.
- Comparison between methods 1 and 2 (Dupont, 1993). Let us suppose that:
  - N.R. is corrected by a GLM, in which H is one of the usual calibration functions F
  - the non-response variables are included in the set of calibration variables
  Then methods 1 and 2 are "similar".
44. Methods 1 and 2 are identical when:
(b)
- N.R. is corrected by an HRG model based on a categorical variable X
- the sample is calibrated on the number of units in U for each level of X
Both methods then amount to a formal post-stratification on U.
45. Method 3: Direct generalized calibration
- Calibration equations (E)
- Interpretation: with a response model, (E) can be written [formula missing]
46. So, if the [symbol missing] were known, (E) would be a generalized calibration equation, with F defined as [formula missing] and such that [formula missing].
47. Precision
- [formula missing] uses the residuals in the population
- [formula missing] uses the residuals of the instrumental regression in r, weighted by the [symbol missing]
- [formula missing] is the estimator for [symbol missing] if the response probabilities were known
48. The response probabilities are unknown, so we "estimate" [symbol missing] and the residuals, i.e. the instrumental regression is weighted by the final weights.
Note: the result looks like an estimated variance for the 1st phase (selection of the sample s) plus an estimated variance for the 2nd phase ("selection" of the respondents r).
49. Properties of the method
- Allows non-response correction even when the explanatory variables are known only for respondents
- Handles the particular situation in which the explanatory variables of non-response are variables of interest (non-ignorable response mechanism)
- Reduces the bias produced by non-response thanks to the variables zk, and reduces the variance thanks to the variables xk
This method is implemented in Calmar 2.
50. An example of generalized calibration
51. The survey
- Sampling frame: population census (1990)
- Sampling design: cluster sampling
  - clusters: households
  - secondary units: all members of the selected households
- Response model
  - H.R.G.
  - response variables: household size (alone or not), profession of the head of household (6 levels), strata (agglomeration size)
- Calibration variables (xk)
  - Households: the same as before (in the sampling frame)
  - Individuals: sex, age group (in the sampling frame)
  - Simultaneous calibration with two levels
- Instrumental variables (zk)
  - Response variables as they are measured in the survey, that is in 1996
52. The population totals data
- Constraint: the xk and zk vectors must have the same dimension
- Primary units (households)
  var       n  R  mar1  mar2  mar3  mar4  mar5  mar6
  strate90  5  0  1314   833   704  1477   777     .
  seul90    2  0  3933  1172     .     .     .     .
  cs90      6  0   457   470   537   435  1254  1952
  strate96  5  1     .     .     .     .     .     .
  seul96    2  1     .     .     .     .     .     .
  cs96      6  1     .     .     .     .     .     .
- Secondary units (individuals)
  var       n  R  mar1  mar2  mar3  mar4
  sexe      2  0  6255  6628     .     .
  age       4  0  2514  1799  5984  2586
  sexe_bis  2  1     .     .     .     .
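The constraint that xk and zk have the same dimension can be seen in a small sketch of generalized calibration with the linear function F(u) = 1 + u: the system to solve for lambda is square only when both vectors have p components. This is an illustration with hypothetical names (`solve`, `generalized_linear_calibration`) and toy data, not CALMAR 2's own implementation.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def generalized_linear_calibration(d, x, z, totals):
    """Weights w_k = d_k * (1 + z_k' lam) such that sum_k w_k x_k = totals."""
    p = len(totals)
    if any(len(xk) != p or len(zk) != p for xk, zk in zip(x, z)):
        raise ValueError("x_k and z_k must both have dimension p")
    # Square p x p matrix: sum_k d_k x_k z_k'
    a = [[sum(dk * xk[i] * zk[j] for dk, xk, zk in zip(d, x, z))
          for j in range(p)] for i in range(p)]
    gap = [totals[i] - sum(dk * xk[i] for dk, xk in zip(d, x)) for i in range(p)]
    lam = solve(a, gap)
    return [dk * (1 + sum(l * v for l, v in zip(lam, zk))) for dk, zk in zip(d, z)]


# Hypothetical respondents: x = status in the frame, z = status in the survey.
d = [10.0, 10.0, 10.0, 10.0]
x = [[1, 0], [1, 0], [0, 1], [0, 1]]
z = [[1, 0], [0, 1], [0, 1], [0, 1]]
targets = [26.0, 32.0]
w = generalized_linear_calibration(d, x, z, targets)
calibrated = [sum(wk * xk[i] for wk, xk in zip(w, x)) for i in range(2)]
```

The adjustment factor depends on z (observed on respondents only), while the calibration constraints are written on x, whose totals are known.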
53. calmar2_guide
68. Thank you for your attention!