Title: OUTLINE OF PRESENTATION
International Conference, Edinburgh, 22-24 September 1999
Analysis Visual/Spatial
RDD TELEPHONE SURVEYS WEIGHTING ADJUSTMENT USING A GIS
Antonio Giusti, Alessandra Petrucci, Monica Pratesi
Dipartimento di Statistica G. Parenti
Università degli Studi di Firenze - Italy
OUTLINE
- POST SAMPLING WEIGHTING PROBLEM
- CALIBRATION PROCEDURE IN RDD SURVEY
- RDD AND GIS
- CASE STUDY
POST SAMPLING WEIGHTING PROBLEM
Post-sampling weighting in an RDD survey usually tries to correct the traditional sources of bias: the population of subscribers is not equal to the population of households.
There are households without a telephone and households that can be reached through more than one telephone number.
The proposed weighting procedure allows for the traditional weights and for simultaneous weighting with auxiliary variables (census data and administrative data) linked through the Geographical Information System (GIS).
THE CALIBRATION PROCEDURE
As is well known, in many surveys the biases can be ascribed to the following factors: i) multiple telephones in the household, ii) households without a telephone, iii) household non-response.
If the random generation of telephone numbers follows the Mitofsky-Waksberg two-stage procedure (Waksberg, 1978), the sample of numbers should be a self-weighting sample of households, and the basic probability of selection should then equal the proportion of telephone households sampled.
This is acceptable under the assumptions that every household with a telephone has only one telephone number and that there are enough residential telephone numbers in the randomly generated groups of telephone numbers.
The omission of non-telephone households has a different impact depending on the level of the estimates and on the extent of non-coverage. Several studies have demonstrated a low impact on estimates at national or regional level.
Telephone coverage in Italy could be low for persons with low incomes in rural areas, while it is generally high (close to 100%) for persons with a medium level of education and medium income in urban areas.
In deciding which variables should be used to adjust for non-coverage, it is recommended that the chosen variable be correlated with telephone non-coverage, routinely estimated by the National Statistical Institute or other data-producing institutes, reasonably stable over time, not subject to large response error, and affected by a low item non-response rate.
In Italy, there appear to be significant differences in response rate by age, education and income. Non-response can also be divided into non-response due to refusal to participate in the survey and non-response due to a missed telephone contact with the person. Contacted and non-contacted persons differ mainly by job position (worker, retired from work) and educational level. The probability of response can be estimated conditionally on the contact process, and the basic household weight can be modified to take into account the effect of both missed contacts and non-response.
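The adjustments for factors i) and iii) can be illustrated with a minimal sketch. The slides do not give the formulas, so the rules used here are assumptions based on common practice: the basic weight is divided by the number of telephone lines reaching the household, and then by the estimated probability of contact times the estimated probability of response given contact. The function name and its arguments are hypothetical.

```python
def adjusted_weight(base_weight, n_lines, p_contact, p_response_given_contact):
    """Hypothetical household weight adjusted for multiplicity and non-response.

    Assumptions (not taken from the slides):
    - a household reachable through n_lines numbers is n_lines times more
      likely to be selected, so its weight is divided by n_lines;
    - non-response is corrected by the inverse of the estimated probability
      of contact times the probability of response given contact.
    """
    if n_lines < 1:
        raise ValueError("a sampled household has at least one telephone line")
    if not (0 < p_contact <= 1 and 0 < p_response_given_contact <= 1):
        raise ValueError("probabilities must lie in (0, 1]")
    multiplicity_adjusted = base_weight / n_lines
    return multiplicity_adjusted / (p_contact * p_response_given_contact)


# Illustrative values only: a household with two lines, an estimated contact
# probability of 0.8 and a response probability of 0.5 once contacted.
example = adjusted_weight(250.0, 2, 0.8, 0.5)
print(example)
```

The result plays the role of the "intermediate" weight that the calibration step then modifies.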
The previous adjustments are often done routinely after the survey, also using data collected for this purpose before the survey. Before the survey, the domain of study is also identified and population figures for the sub-domains are obtained from independent data sources.
The final type of adjustment, also commonly used in face-to-face surveys, is the calculation of weights that bring the survey estimates into line with independent population figures for the sub-domains. Realigning the RDD sample distribution of persons to a known distribution is not an easy task.
There are two main difficulties in the post-stratification process:
- official data sources often do not tabulate the distribution needed at the geographical level that defines the target population
- after the adjustment of the estimates following the recommendations in points i), ii), iii), we have an "intermediate" weight that must be modified to be consistent with the independent auxiliary information.
CALIBRATED WEIGHTS
Constrained weighting estimators (calibration on auxiliary variables): the weights $w_k$ satisfy the condition $\sum_{k \in s} w_k x_k = t_x$, with $t_x$ known.
[Formula not transcribed; its labels read: weight, correction, constraint(s).]
$D$: distance function between the initial and final weights.
Application with linear distance.
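Since the formula on this slide was not transcribed, the following is only a sketch of the standard linear-distance solution from the calibration literature, not necessarily the authors' implementation. It assumes initial weights d_k, a vector x_k of auxiliary values for each respondent and known totals t_x, and computes w_k = d_k (1 + x_k' lambda), where lambda solves (sum_k d_k x_k x_k') lambda = t_x - sum_k d_k x_k. All names and data are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.

    A singular system (e.g. collinear auxiliary variables) ends in a
    ZeroDivisionError.
    """
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def linear_calibration(d, X, t_x):
    """Return calibrated weights w_k = d_k * (1 + x_k' lambda)."""
    p = len(t_x)
    # T[i][j] = sum_k d_k * x_ki * x_kj
    T = [[sum(dk * xk[i] * xk[j] for dk, xk in zip(d, X)) for j in range(p)]
         for i in range(p)]
    # Gap between the known totals and the estimates under the initial weights
    gap = [t_x[i] - sum(dk * xk[i] for dk, xk in zip(d, X)) for i in range(p)]
    lam = solve(T, gap)
    return [dk * (1.0 + sum(l * x for l, x in zip(lam, xk)))
            for dk, xk in zip(d, X)]


# Hypothetical data: six respondents with indicators (male, female, worker).
d = [100.0, 120.0, 90.0, 110.0, 95.0, 105.0]
X = [[1, 0, 1], [1, 0, 0], [1, 0, 1], [0, 1, 0], [0, 1, 1], [0, 1, 0]]
t_x = [330.0, 290.0, 300.0]  # assumed known totals for the three groups
w = linear_calibration(d, X, t_x)
```

With these weights the weighted totals of the three indicators reproduce t_x, which is the calibration condition stated above. Note that the linear distance does not prevent negative weights; bounded distance functions are the usual remedy.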
RDD and GIS
For the provision of the telephone service, the Italian territory is subdivided into urban telephone areas with a well-defined geographical coverage. The urban areas are grouped into sectors, the sectors into districts and, finally, the districts into the largest areas, called compartimenti, whose geographical extension is similar to that of an administrative region. The telephone system is therefore based on a hierarchical numbering system that uses the telephone area as the smallest geographical unit.
Generally, the telephone number consists of 10 digits, where the first two digits (XX) are the district code and the remaining 8 digits make up the subscriber telephone number. The first 2, 3 or 4 digits of this number identify the telephone area code.
XX              YY(YY)       (ZZ)ZZZZ
district code   area code    number
In this way every telephone area is associated with one or more sequences of 2 to 4 digits (YY(YY)). The last part of the number ((ZZ)ZZZZ) can be filled with a random sequence.
Using the numbering sequences and the geographical coverage of the telephone areas, it is possible to geocode the randomly generated telephone numbers.
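The generation and geocoding of numbers under this numbering scheme can be sketched as follows. The prefix table, the district code and the area names are invented for illustration; a real application would use the telephone company's numbering plan. The sketch also assumes that no area prefix is the beginning of another one.

```python
import random

# Hypothetical table: (district code XX, area prefix YY(YY)) -> telephone area.
AREA_PREFIXES = {
    ("12", "21"): "area A",
    ("12", "432"): "area B",
    ("12", "6541"): "area C",
}


def generate_number(district, prefix, rng):
    """Build a 10-digit number: district code, area prefix, random tail."""
    n_random = 8 - len(prefix)  # digits left in the 8-digit subscriber number
    tail = "".join(str(rng.randrange(10)) for _ in range(n_random))
    return district + prefix + tail


def geocode(number):
    """Return the telephone area of a 10-digit number, or None if unknown."""
    district, subscriber = number[:2], number[2:]
    for length in (4, 3, 2):  # try the longest prefix first
        area = AREA_PREFIXES.get((district, subscriber[:length]))
        if area is not None:
            return area
    return None


rng = random.Random(1998)  # fixed seed so the draw can be reproduced
sample = [generate_number("12", "432", rng) for _ in range(3)]
areas = [geocode(n) for n in sample]  # each number maps back to "area B"
```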
The National Statistical Institute (ISTAT) disseminates some data files that can be linked directly to the (geographical) enumeration districts by means of their unique identification code. To preserve confidentiality, only a few socio-demographic variables and indicators are released by ISTAT. The auxiliary information that can be used for the calibration procedure is mainly the population by sex, age and job position.
The matrix of the auxiliary variables is composed of the membership indicators obtained by partitioning the population into the above groups.
The sum of the weights is equal to the sub-domain size, which can be computed when the topological relationship between the enumeration districts and the telephone areas is known.
A CASE STUDY: THE SURVEY ON FLORENCE'S BUS SERVICE QUALITY
One of the main purposes of this survey was to evaluate customer satisfaction with the service and the most important expectations for its future. We refer to the first CATI (Computer Assisted Telephone Interviewing) survey conducted for this purpose, from 12 to 24 October 1998.
We decided to select the sample without using a frame, considering instead all people reachable through an incoming telephone line in the Florence telephone area (a part of the Florence telephone district). The Florence telephone area covers 4 municipalities completely and 3 municipalities partially. The Florence telephone area was chosen for the survey because it approximates well the area served by ATAF. At ATAF's request this area was divided into five zones.
SAMPLE SELECTION
The RDD method can be used in different ways:
- to generate a telephone number before each new call
- to make the sample selection before the execution of the survey software
In any case these procedures give each incoming line a non-zero probability of being selected, even if it is not included in the telephone directories.
We decided to select the sample before starting the interviews (using a Matlab ver. 5 program). This procedure allowed the construction of a random sample stratified by zone. The sample design was based on the census population aged over 14 years in the five zones (506622 people in total).
The telephone numbers, randomly generated according to the numbering plan for the Florence telephone area, contained about 8.1% of disconnected lines (the Telephone Company declared a smaller percentage, 7%).
AUXILIARY INFORMATION
The census data are released by ISTAT (the Italian National Statistical Institute) in a file containing the totals of some census variables for each enumeration district. The census data were loaded into the GIS used to manage the territorial information.
Problems
- we had to use the borders of the telephone exchange areas to define the five zones
- each telephone exchange area covers a group of enumeration districts, and some of these can span two or more telephone exchange areas (next figure).
The data of the spanning enumeration districts were assigned proportionally to the telephone exchange areas they span, according to the surface covered.
The estimated number of eligible people in the five zones was used to set the sample size for each zone.
To define the number of interviews per zone by sex, the data contained in the released ISTAT database were used. For job position it was necessary to estimate the target population from the table released at municipality level.
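The proportional assignment of a spanning enumeration district by surface covered can be sketched as a simple areal weighting. This is an illustration under the assumption that the population is spread evenly over the district; the function name, the zone labels and the figures are hypothetical, and in practice the overlap surfaces would come from a GIS overlay.

```python
def allocate_by_area(count, overlap_areas):
    """Split a district's count across zones in proportion to overlap surface.

    count         -- census total for the enumeration district
    overlap_areas -- mapping zone -> surface of the district inside that zone
    """
    total = sum(overlap_areas.values())
    if total <= 0:
        raise ValueError("the district must overlap at least one zone")
    return {zone: count * area / total for zone, area in overlap_areas.items()}


# Illustrative values: a district of 1200 eligible people with three quarters
# of its surface in zone 1 and one quarter in zone 2.
shares = allocate_by_area(1200, {"zone 1": 3.0, "zone 2": 1.0})
print(shares)
```

Summing these shares over all enumeration districts gives the estimated number of eligible people per zone.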
CALLING RULES
The electronic questionnaire was designed to manage the cases through automated scheduling. For example, a limit of 4 calls to each telephone number that was not busy was set, to avoid loss of time. In this way, reaching 2012 completed interviews required 7070 telephone calls to 5769 different telephone numbers. The next table shows the outcome of the last call for each of the 5769 telephone numbers used in the survey.
Result of the last call for each telephone number used
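As a quick check on the fieldwork effort, the three totals reported in the calling rules can be turned into simple ratios. The variable names are ours, and the ratios are plain arithmetic on the reported figures, not results taken from the presentation.

```python
completed = 2012   # completed interviews
calls = 7070       # telephone calls made
numbers = 5769     # different telephone numbers used

calls_per_number = calls / numbers        # about 1.23 calls per number
calls_per_interview = calls / completed   # about 3.51 calls per interview
completion_share = completed / numbers    # about 34.9% of numbers completed

print(round(calls_per_number, 2),
      round(calls_per_interview, 2),
      round(100 * completion_share, 1))
```

The average of about 1.23 calls per number is well below the limit of 4, which suggests that most numbers were resolved on the first attempt.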
POST STRATIFICATION
After the survey, post-stratification was carried out through the calibration method described above. The sub-domains of interest are sex and job position. The procedure was implemented with ad hoc software. The results are displayed in the following tables.
Sample and Population size
Distribution of the Population by sex per zone
Distribution of the Population by job position per zone
CONCLUDING REMARKS