OUTLINE OF PRESENTATION - PowerPoint PPT Presentation

1
International Conference, Edinburgh, 22-24 September 1999
Analysis Visual/Spatial
RDD TELEPHONE SURVEYS: WEIGHTING ADJUSTMENT USING A GIS
Antonio Giusti, Alessandra Petrucci, Monica Pratesi
Dipartimento di Statistica G. Parenti
Università degli Studi di Firenze - Italy
2
OUTLINE
  • POST SAMPLING WEIGHTING PROBLEM
  • CALIBRATION PROCEDURE IN RDD SURVEY
  • RDD AND GIS
  • CASE STUDY

3
POST SAMPLING WEIGHTING PROBLEM
Post-sampling weighting in an RDD survey usually
tries to correct the traditional sources of bias:
the population of telephone subscribers is not the
same as the population of households.
6
There are households without a telephone and
households that can be reached through more than
one telephone number.
The proposed weighting procedure combines the
traditional weights with simultaneous weighting
on auxiliary variables (census data and
administrative data) linked through the
Geographical Information System (GIS).
9
THE CALIBRATION PROCEDURE
As is well known, in many surveys the biases can
be ascribed to the following factors:
  i) multiple telephones in a household,
  ii) households without a telephone,
  iii) household non-response.
11
If the random generation of telephone numbers
follows the Mitofsky-Waksberg two-stage
procedure (Waksberg, 1978), the sample of
numbers should be a self-weighting sample of
households, and the basic probability of
selection should then be equal to the proportion
of telephone households sampled.
This is acceptable under the assumptions that all
households with a telephone have only one
telephone number and that there are enough
residential telephone numbers in the groups of
randomly generated telephone numbers.
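
When the one-number-per-household assumption does not hold, a household reachable on k numbers is k times as likely to be drawn, and a common correction is a weight proportional to 1/k. The Python sketch below illustrates that idea only; the function name and the sample values are hypothetical and are not taken from the survey.

```python
def multiplicity_weights(lines_per_household, base_weight=1.0):
    """Return base_weight / k for each household reachable on k lines."""
    weights = []
    for k in lines_per_household:
        if k < 1:
            raise ValueError("a sampled telephone household has at least one line")
        weights.append(base_weight / k)
    return weights


# Hypothetical sample: three households with 1, 2 and 1 residential lines.
weights = multiplicity_weights([1, 2, 1])
print(weights)  # [1.0, 0.5, 1.0]
```

The number of lines per household would have to be asked during the interview, since it cannot be derived from the generated number.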
13
The omission of non-telephone households has a
different impact depending on the level of the
estimates and on the extent of non-coverage.
Several studies have demonstrated a low impact on
estimates at national or regional level.
Telephone coverage in Italy could be low for
persons with low incomes in rural areas, while
it is generally high (close to 100%) for persons
with a medium level of education and medium
income in urban areas.
14
In deciding which variables should be used to
adjust for non-coverage, it is recommended that
the chosen variable be correlated with telephone
non-coverage, routinely estimated by the National
Statistical Institute or other data-producing
institutes, reasonably stable over time, not
subject to large response error, and with a low
item non-response rate.
15
In Italy, there appear to be significant
differences in response rate by age, education
and income. Non-response can also be divided
into non-response due to refusal to participate
in the survey and non-response due to missed
telephone contact with the person.
Contacted and non-contacted persons differ
mainly by job position (worker, retired from
work) and educational level. The probability of
response can be estimated conditionally on the
contact process, and the basic household weight
can be modified to take into account the effect
of both missed contacts and non-response.
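
One simple way to read this two-step adjustment is to estimate, within an adjustment class, the contact rate and the response rate among those contacted, and to divide the basic weight by their product. The sketch below assumes that reading; the class counts are invented for illustration and are not survey results.

```python
def adjusted_weight(base_weight, n_sampled, n_contacted, n_responded):
    """Divide the base weight by P(contact) * P(response | contact)."""
    if not (0 < n_responded <= n_contacted <= n_sampled):
        raise ValueError("need 0 < responded <= contacted <= sampled")
    p_contact = n_contacted / n_sampled
    p_response_given_contact = n_responded / n_contacted
    return base_weight / (p_contact * p_response_given_contact)


# Hypothetical class: 200 sampled, 160 contacted, 100 interviewed.
w = adjusted_weight(1.0, 200, 160, 100)
print(round(w, 3))  # 2.0
```

Keeping the two rates separate makes it possible to model contact and cooperation with different class variables, for example job position for contact and education for cooperation.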
17
The previous adjustments are often done routinely
after the survey, also using data collected for
this purpose before the survey. Before the survey
the domain of study is also identified, and
population figures for the sub-domains are
obtained from independent data sources.
The final type of adjustment, also commonly used
in face-to-face surveys, is the calculation of
weights to bring the survey estimates into line
with independent population figures for the
sub-domains. Realigning the RDD sample
distribution of persons to a known distribution
is not an easy task.
20
There are two main difficulties in the
post-stratification process:
  • official data sources often do not tabulate the
    distribution needed at the geographical level
    that defines the target population;
  • after the adjustment of estimates following the
    recommendations in points i), ii), iii), we have
    an "intermediate" weight to be modified to be
    consistent with the independent auxiliary
    information.

21
CALIBRATED WEIGHTS
Constrained weighting estimators (calibration on
auxiliary variables): the weights $w_k$ satisfy
the condition $\sum_{k \in s} w_k x_k = t_x$,
with $t_x$ known.
22

(Equations not transcribed; the slide labels the
weight, the correction and the constraint(s).)
D: distance function between initial and final
weights
Application with linear distance
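
The slide's equations are missing from the transcript. As a hedged illustration, the sketch below uses the standard textbook form of calibration with a linear (chi-square) distance: final weights w_k = d_k (1 + x_k · λ), with λ chosen so that the weighted totals of the auxiliary variables match the known totals. All names and sample values are assumptions for illustration, and this is not the authors' ad hoc software.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: auxiliary variables are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def linear_calibration(d, x, totals):
    """Return w_k = d_k * (1 + x_k . lam) such that sum_k w_k x_k = totals."""
    p = len(totals)
    # T = sum_k d_k x_k x_k', the weighted cross-product matrix.
    t = [[sum(dk * xk[i] * xk[j] for dk, xk in zip(d, x)) for j in range(p)]
         for i in range(p)]
    # Gap between the known totals and the initial weighted totals.
    gap = [totals[i] - sum(dk * xk[i] for dk, xk in zip(d, x)) for i in range(p)]
    lam = solve(t, gap)
    return [dk * (1 + sum(l * v for l, v in zip(lam, xk))) for dk, xk in zip(d, x)]


# Hypothetical sample of four persons; columns: female, male, employed.
d = [10.0, 10.0, 10.0, 10.0]
x = [[1, 0, 1], [1, 0, 0], [0, 1, 1], [0, 1, 0]]
totals = [24.0, 20.0, 22.0]  # invented population totals
w = linear_calibration(d, x, totals)
print([round(v, 6) for v in w])  # [12.0, 12.0, 10.0, 10.0]
```

The linear distance has a closed-form solution, which is why it is a common first choice, but it can produce negative weights when the initial totals are far from the known ones.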
23
RDD and GIS
For the provision of the telephone service, the
Italian territory is subdivided into urban
telephone areas with a well-defined geographical
coverage. Urban areas are grouped into sectors,
sectors into districts and, finally, districts
into the largest areas, called compartimenti,
whose geographical extent is similar to that of
an administrative region.
The telephone system is therefore based on a
hierarchical numbering system that uses the
telephone area as the smallest geographical
unit.
24
Generally, the telephone number consists of 10
digits, where the first two digits (XX) are the
district code and the remaining 8 digits make up
the subscriber telephone number. The first 2, 3
or 4 digits of this number identify the telephone
area code.
  XX              YY(YY)      (ZZ)ZZZZ
  district code   area code   number
In this way every telephone area is associated
with one or more sequences of 2 to 4 digits
(YY(YY)). The last part of the number ((ZZ)ZZZZ)
can be filled with a random sequence.
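
The number layout described above can be sketched as follows. This is an illustration only: the district code and area prefix in the example are hypothetical placeholders rather than real Italian codes, and the original selection program (written in Matlab) is not reproduced here.

```python
import random


def generate_number(district, area_prefix, rng):
    """Build a 10-digit number: 2-digit district + area prefix + random suffix."""
    if len(district) != 2 or not district.isdigit():
        raise ValueError("district code must be 2 digits")
    if not (2 <= len(area_prefix) <= 4) or not area_prefix.isdigit():
        raise ValueError("area prefix must be 2 to 4 digits")
    suffix_len = 8 - len(area_prefix)  # the subscriber number has 8 digits
    suffix = "".join(rng.choice("0123456789") for _ in range(suffix_len))
    return district + area_prefix + suffix


rng = random.Random(1999)  # fixed seed so the draw is reproducible
number = generate_number("99", "123", rng)  # hypothetical codes
print(number, len(number))
```

Because every digit of the suffix is drawn uniformly, each possible line within a prefix has the same chance of being generated, whether or not it is listed in a directory.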
25
Using the numbering sequences and the
geographical coverage of the telephone areas, it
is possible to geocode the randomly generated
telephone numbers.
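
A minimal sketch of this geocoding step, assuming a lookup table from area prefixes to telephone areas and a longest-prefix-first match. The table, the zone labels and the matching rule are illustrative assumptions; a real table would come from the numbering plan.

```python
def geocode(number, prefix_to_area):
    """Return the telephone area whose prefix matches the number, or None."""
    subscriber = number[2:]  # drop the 2-digit district code
    for length in (4, 3, 2):  # try the longest prefix first
        area = prefix_to_area.get(subscriber[:length])
        if area is not None:
            return area
    return None


# Hypothetical lookup table.
prefix_to_area = {"12": "Zone A", "123": "Zone B", "4567": "Zone C"}
print(geocode("9912345678", prefix_to_area))  # Zone B
```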
30
The National Statistical Institute (ISTAT)
disseminates some data files that can be directly
linked to the (geographical) enumeration
districts by means of their unique identification
code. To preserve confidentiality, only a few
socio-demographic variables and indicators are
released by ISTAT. The auxiliary information
that can be used for the calibration procedure is
mainly the population's sex, age and job position.
32
The matrix of the auxiliary variables is
composed of the membership indicators obtained by
partitioning the population into the above
groups.
The sum of the weights is equal to the sub-domain
size, which can be computed when the topological
relationship between the enumeration districts
and the telephone areas is known.
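
When the auxiliary matrix consists of membership indicators for a single partition, calibration reduces to scaling the initial weights within each group so that they sum to the known group size. The sketch below shows the indicator rows and that special case; group labels and sizes are hypothetical.

```python
def indicator_row(group, groups):
    """One row of the auxiliary matrix: 1 for the unit's group, 0 elsewhere."""
    return [1 if group == g else 0 for g in groups]


def poststratify(d, unit_groups, sizes):
    """Scale initial weights so they sum to the known size of each group."""
    sums = {}
    for dk, g in zip(d, unit_groups):
        sums[g] = sums.get(g, 0.0) + dk
    return [dk * sizes[g] / sums[g] for dk, g in zip(d, unit_groups)]


groups = ["female", "male"]
unit_groups = ["female", "female", "male"]
x = [indicator_row(g, groups) for g in unit_groups]
w = poststratify([10.0, 10.0, 10.0], unit_groups, {"female": 30.0, "male": 15.0})
print(x)  # [[1, 0], [1, 0], [0, 1]]
print(w)  # [15.0, 15.0, 15.0]
```

With two crossed classifications, such as sex and job position, the indicator columns of the two partitions overlap and a general calibration solver is needed instead of this per-group scaling.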
33
A CASE STUDY: THE SURVEY ON THE QUALITY OF
FLORENCE'S BUS SERVICE
One of the main purposes of this survey was to
evaluate customer satisfaction with the service
and the most important expectations for the
future of the service. We refer to the first
CATI (Computer Assisted Telephone Interviewing)
survey conducted for this purpose, from 12 to 24
October 1998.
35
We decided to select the sample without using a
frame, considering instead all people reachable
through an incoming telephone line in the
Florence telephone area (a part of the Florence
telephone district). The Florence telephone area
covers 4 municipalities completely and 3
municipalities partially. The Florence telephone
area was chosen for the survey because it closely
approximates the area served by ATAF. At ATAF's
request, this area was divided into five zones.
42
SAMPLE SELECTION
The RDD method can be used in different ways:
  • to generate a telephone number before each new
    call;
  • to make the sample selection before the
    execution of the survey software.
In any case these procedures give each incoming
line a non-zero probability of being selected,
even if it is not included in the telephone
directories.

46
We decided to select the sample before starting
the interviews (using a Matlab ver. 5 program).
This procedure allowed the construction of a
random sample stratified by zone. The sample
design was based on the census population aged
over 14 years in the five zones (506622 people
in total).
The telephone numbers, randomly generated
according to the numbering plan for the Florence
telephone area, contained about 8.1% disconnected
lines (the Telephone Company declared a smaller
percentage, 7%).
47
AUXILIARY INFORMATION
The census data are released by ISTAT (the
Italian National Statistical Institute) in a
file containing the totals of some census
variables for each enumeration district. The
census data were loaded into the GIS used to
manage the territorial information.
49
Problems
  • we had to use the borders of the telephone
    exchange areas to define the five zones;
  • each telephone exchange area covers a group of
    enumeration districts, and some of these can
    span two or more telephone exchange areas
    (next figure).

50
(No Transcript)
53
The data of the spanning enumeration districts
were assigned to the neighbouring districts in
proportion to the surface covered.
The estimated number of eligible people in the
five zones was used to set the sample size for
each zone.
To define the number of interviews per zone by
sex, the data contained in the released ISTAT
database were used. For job position it was
necessary to estimate the target population from
the table released at municipality level.
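
The proportional assignment by surface can be sketched as simple areal weighting: a district's count is split across the areas it overlaps in proportion to the overlapping surface. The figures below are hypothetical, and the sketch assumes the population is spread evenly within the district.

```python
def apportion(count, overlap_areas):
    """Split a district count across areas in proportion to overlapping surface."""
    total = sum(overlap_areas.values())
    if total <= 0:
        raise ValueError("the district must overlap at least one area")
    return {area: count * surface / total for area, surface in overlap_areas.items()}


# Hypothetical district: 400 residents, 3 ha inside area A and 1 ha inside area B.
shares = apportion(400, {"A": 3.0, "B": 1.0})
print(shares)  # {'A': 300.0, 'B': 100.0}
```

In a GIS the overlapping surfaces would come from intersecting the enumeration district polygons with the telephone area polygons.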
54
CALLING RULES
The electronic questionnaire was designed to
manage the cases using automated call
scheduling. For example, a limit of 4 calls to
each telephone number that was not busy was set,
to avoid loss of time. In this way, to reach
2012 completed interviews it was necessary to
make 7070 telephone calls to 5769 different
telephone numbers. The next table shows the
outcome of the last call for each of the 5769
telephone numbers used in the survey.
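
The effort implied by these figures can be worked out directly. The short calculation below uses only the three counts reported above; the derived ratios are simple arithmetic, not figures taken from the presentation.

```python
completed, calls, numbers = 2012, 7070, 5769  # counts reported above

calls_per_number = round(calls / numbers, 2)
completion_pct = round(100 * completed / numbers, 1)
calls_per_interview = round(calls / completed, 2)

print(calls_per_number)     # 1.23 calls per telephone number
print(completion_pct)       # 34.9 percent of numbers gave an interview
print(calls_per_interview)  # 3.51 calls per completed interview
```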
55
Result of the last call for each telephone number
used
56
POST STRATIFICATION
After the survey, post-stratification was carried
out using the calibration method described
above. The sub-domains of interest are sex and
job position. The procedure was implemented with
ad hoc software. The results are displayed in the
following tables.
57
Sample and Population size
Distribution of the Population by sex per zone
58
Distribution of the Population by job position
per zone
59
CONCLUDING REMARKS