Title: Rudi Seljak
1 Estimation of Standard Errors of Indices in the
Sampling Business Surveys
- Rudi Seljak
- Statistical Office of the Republic of Slovenia
2 Overview of the presentation
- Introduction of the problem
- Description of the simulation study
- Results of the study
- Conclusions
3 Concept of index numbers
- The concept of index numbers is widely used in
economic statistics, especially in the area of
short-term statistics.
- Generally, an index number is a ratio of two or
more quantities measured in the same unit.
- For the purposes of this study, we consider
the index as a ratio of two or more quantities
measured at two different time points.
4 Different types of indices
- Aggregate approach
- Weighted average approach
- Chained index
5 Value index
- The value index is derived from the general form
of the aggregate index.
- If the values V_t, V_0 are estimated on the basis
of a random sample, we can write the value
index as a statistical estimator.
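In standard notation, the value index and its sample-based estimator can be written as below. This is a sketch of the usual textbook form: the price and quantity symbols p and q, the item index k and the hats are notation assumed here, not taken from the slides.

```latex
I_{t,0} = \frac{V_t}{V_0}
        = \frac{\sum_k p_{k,t}\, q_{k,t}}{\sum_k p_{k,0}\, q_{k,0}},
\qquad
\hat{I}_{t,0} = \frac{\hat{V}_t}{\hat{V}_0}
```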
6 Value index cont.
- The two values are generally estimated from two
different samples, which were selected from two
different populations.
- This fact causes a departure from the classical
problem of estimating a ratio.
7 Target population of the study
- The basis for the simulation study was monthly
data from the tax authorities for enterprises in
sections G-K of the NACE classification for
2004-2005.
- From these data we were able to get a quite good
approximation of turnover, which is often the
target variable in short-term surveys.
- The target population consisted of approximately
18 000 units in each month.
8 Sampling design
- For the first month, a stratified simple random
sample was selected.
- The second sample was selected by rotating part
of the units out of the first sample and
replacing them with new units.
- For the rotation procedure a system of
permanent random numbers was used.
- The stratification was done according to the
2-digit NACE group and size class, determined on
the basis of estimated turnover.
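The rotation with permanent random numbers (PRN) can be sketched for a single stratum as follows. This is a minimal illustration, not the procedure used in the study: the function names `prn_sample` and `rotated_start`, the stratum size of 200, the sample size of 40 and the rotation rate of 0.2 are all assumptions chosen for the example.

```python
import numpy as np

def prn_sample(prn, n, start):
    """Indices of the n units whose PRNs come first after `start` (wrapping at 1)."""
    dist = (prn - start) % 1.0
    return np.sort(np.argsort(dist)[:n])

def rotated_start(prn, sample, start, rotation_rate):
    """New starting point that pushes a share `rotation_rate` of `sample` out."""
    # At least one unit is rotated out in this sketch.
    k = max(1, int(round(rotation_rate * len(sample))))
    dist = np.sort((prn[sample] - start) % 1.0)
    # Step just past the k-th selected unit so that exactly k units drop out.
    return (start + dist[k - 1] + 1e-12) % 1.0

rng = np.random.default_rng(2005)
prn = rng.random(200)      # one stratum of 200 units; PRNs are assigned once

first = prn_sample(prn, n=40, start=0.0)
start2 = rotated_start(prn, first, start=0.0, rotation_rate=0.2)
second = prn_sample(prn, n=40, start=start2)
overlap = np.intersect1d(first, second)
print(len(first), len(second), len(overlap))
```

Because each unit keeps its random number, moving the starting point drops the units with the smallest numbers and brings in the next ones, so the overlap between the two samples is controlled by the rotation rate.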
9 Sampling design cont.
- All the large enterprises were selected with
certainty.
- The target statistic was the estimator of the
turnover index: the ratio of the weighted turnover
total of the current month to that of the base month.
- The estimator uses the sampling weights for the
current and base month
- and the turnover of the selected units in the
current and base month.
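The estimator of the turnover index can be sketched as a ratio of two weighted totals. The function name `turnover_index`, the array names and the toy numbers are hypothetical and serve only as an illustration; the first unit has weight 1 to mimic an enterprise selected with certainty.

```python
import numpy as np

def turnover_index(w_t, y_t, w_0, y_0):
    """Ratio of the weighted turnover totals of the current and base month."""
    total_t = np.dot(w_t, y_t)   # estimated total, current month
    total_0 = np.dot(w_0, y_0)   # estimated total, base month
    return total_t / total_0

# Toy samples (made-up values): weights and turnover per selected unit.
w_0 = np.array([1.0, 5.0, 5.0, 20.0])
y_0 = np.array([900.0, 120.0, 80.0, 10.0])
w_t = np.array([1.0, 5.0, 5.0, 20.0])
y_t = np.array([990.0, 110.0, 95.0, 12.0])

index = turnover_index(w_t, y_t, w_0, y_0)
print(round(index, 4))
```

The two samples need not contain the same units or have the same size; each total uses only its own month's weights and values.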
10 Simulation
- The objective of the simulation study was to explore
the sampling distribution of the estimator and to
compare three different methods for variance
estimation.
- The parameters in the simulation study were:
- size of the samples
- time gap between the two selected months
- the rotation rate of the second sample
11 Methods for variance estimation
- Three methods for variance estimation were tested
and compared. Two of them are based on the
analytical approach. The third one uses the
re-sampling approach.
- The basis for the analytical approaches was the
Taylor linearization formula for the variance
of the estimator of a ratio.
- Using the SAS procedure SURVEYMEANS we were able to
estimate the variances of the two estimated totals,
but the procedure does not provide an estimate of
the sampling covariance.
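For reference, the usual first-order (Taylor linearization) approximation for the variance of a ratio of two estimated totals can be written as below. The symbols follow the V_t, V_0 notation of the value index slide; the hats and the covariance term are standard notation assumed here rather than copied from the slides.

```latex
\operatorname{Var}(\hat{I}) \approx \frac{1}{V_0^{2}}
\left[ \operatorname{Var}(\hat{V}_t)
     + I^{2}\,\operatorname{Var}(\hat{V}_0)
     - 2\,I\,\operatorname{Cov}(\hat{V}_t, \hat{V}_0) \right],
\qquad I = \frac{V_t}{V_0}
```

In practice the unknown V_0 and I are replaced by their estimates, which is why an estimate of the covariance term is needed in addition to the two variances.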
12 Methods for variance estimation cont.
- With the first approach we estimated the sampling
covariance by using the formula for the variance
of the sum of two random variables.
- With this approach we also had to construct the
common weight for the sum. We used equation
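The idea behind the first approach can be sketched as follows: since Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y), the covariance follows from three variance estimates, and it can then be plugged into the linearized variance of the ratio. The function names and the numbers are hypothetical; the construction of the common weight for the sum is not shown.

```python
def cov_from_sum(var_sum, var_t, var_0):
    """Covariance recovered from Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)."""
    return 0.5 * (var_sum - var_t - var_0)

def ratio_variance(total_t, total_0, var_t, var_0, cov_t0):
    """First-order (Taylor) variance of the ratio total_t / total_0."""
    r = total_t / total_0
    return (var_t + r**2 * var_0 - 2.0 * r * cov_t0) / total_0**2

# Made-up variance estimates for the two totals and for their sum.
var_t, var_0, var_sum = 400.0, 360.0, 1400.0
cov_t0 = cov_from_sum(var_sum, var_t, var_0)
var_index = ratio_variance(2255.0, 2100.0, var_t, var_0, cov_t0)
print(cov_t0, var_index)
```

A strong positive covariance between the two totals, as expected with a high overlap of the samples, reduces the variance of the index considerably.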
13 Methods for variance estimation cont.
- With the second approach the sampling covariance
was calculated directly by using the formula for
the sampling covariance of estimates from two partly
overlapping samples.
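As an illustration only, one well-known form of such a covariance applies to two simple random samples of sizes n_1 and n_2, drawn without replacement from the same population (or stratum) of N units and sharing n_c units, where the retained units are a simple random subsample of the first sample and the new units are drawn at random from the rest. The symbols and the fixed-population assumption are made for this sketch; the formula used in the study may differ, for example to account for the population changing between the two months.

```latex
\operatorname{Cov}(\bar{y}_{1}, \bar{x}_{2})
  = \left( \frac{n_c}{n_1 n_2} - \frac{1}{N} \right) S_{xy},
\qquad
S_{xy} = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{X})(y_i - \bar{Y})
```

With full overlap (n_c = n_1 = n_2 = n) this reduces to the familiar (1/n - 1/N) S_xy, and with no overlap it becomes -S_xy / N. For estimated totals the expression is multiplied by N^2.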
14 Methods for variance estimation cont.
- With the third approach we used the well-known
jackknife replication method.
- To use the existing software, we had
to merge the two samples into one data set and then
adjust the data accordingly.
- The weights from both samples had to be
combined into a common weight.
- Zero values were inserted for the units that
were not in the sample at a given time point.
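A delete-one jackknife on such a merged data set can be sketched as below. This is a simplified illustration under stated assumptions, not the adjustment used in the study: it ignores stratification, takes the common weight `w` as given, and the function name and toy data are hypothetical.

```python
import numpy as np

def jackknife_ratio_variance(w, y_t, y_0):
    """Delete-one jackknife variance of sum(w * y_t) / sum(w * y_0)."""
    w, y_t, y_0 = (np.asarray(a, dtype=float) for a in (w, y_t, y_0))
    n = len(w)
    num, den = w * y_t, w * y_0
    # Leave-one-out ratios; the reweighting factor n / (n - 1) cancels out.
    loo = (num.sum() - num) / (den.sum() - den)
    return (n - 1) / n * np.sum((loo - loo.mean()) ** 2)

# Merged toy data set: the last two units are in only one of the samples,
# so a zero is inserted for the month in which they were not selected.
w   = [5.0, 5.0, 5.0, 5.0, 5.0]
y_t = [990.0, 110.0, 95.0, 0.0, 120.0]
y_0 = [900.0, 120.0, 80.0, 100.0, 0.0]

var_jk = jackknife_ratio_variance(w, y_t, y_0)
print(var_jk)
```

The inserted zeros make the replicate ratios more variable than they would be with fully overlapping samples, which is one way such an adjustment can inflate the variance estimate.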
15 Sampling distribution
- To explore the sampling distribution of the
estimated index, we selected 10,000 partly
overlapping pairs of samples, using the same
sampling design.
- The changing parameters were the size of the
samples, the time lag between the two time points and
the rate of rotation in the second sample.
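The repeated selection of partly overlapping pairs of samples can be sketched as a small Monte Carlo experiment. The sketch makes simplifying assumptions that are not from the study: an artificial lognormal population of 2000 units that does not change between the months, unstratified simple random sampling, and 2000 instead of 10,000 replications; `simulate_index` is a hypothetical name.

```python
import numpy as np

def simulate_index(y_0, y_t, n, rotation_rate, reps, seed=0):
    """Index estimates from `reps` pairs of partly overlapping samples."""
    rng = np.random.default_rng(seed)
    N = len(y_0)
    k = int(round(rotation_rate * n))      # number of units rotated out
    estimates = np.empty(reps)
    for r in range(reps):
        perm = rng.permutation(N)
        first = perm[:n]                   # simple random sample of size n
        second = perm[k:n + k]             # keeps n - k units, adds k new ones
        # Equal weights N / n cancel in the ratio of the two estimated totals.
        estimates[r] = y_t[second].sum() / y_0[first].sum()
    return estimates

rng = np.random.default_rng(1)
y_0 = rng.lognormal(mean=3.0, sigma=1.0, size=2000)   # artificial base turnover
y_t = y_0 * rng.normal(1.05, 0.1, size=2000)          # artificial current turnover

est = simulate_index(y_0, y_t, n=400, rotation_rate=0.2, reps=2000)
true_index = y_t.sum() / y_0.sum()
print(true_index, est.mean(), est.std(ddof=1))
```

A histogram of `est` then approximates the sampling distribution of the index, and its empirical variance is the benchmark against which variance estimators can be compared.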
16 Sampling distribution cont.
- We expected to get a distribution that would be
at least approximately normal.
- In most cases this was indeed so, and the
histogram looked like the following one.
18 Sampling distribution cont.
- In some cases, however, the normal shape
disappeared and we were faced with a clearly
bimodal distribution.
20 The problem of bimodality
- The source of the bimodality is the distribution
of the estimator of the total, either in the
denominator or in the numerator.
- In all the cases where the bimodality appeared, at
least one of the estimators of the totals was
bimodally distributed.
- So far we have not found the exact reason for the
bimodality. It should be a subject of further
investigation.
21 Comparison of the methods: The case of normal
distribution
- In the case when the sampling distribution was
(approx.) normal, all the methods worked quite
well.
- In the picture we show the case where we
fixed the months and the sample size and
changed only the rotation rate.
27 Comparison of the methods: The case of bimodal
distribution
- In the case of a bimodal sampling distribution the
sampling variance was significantly higher.
- In the pictures we show the case where the
sample size was fixed at 4000 and the rotation
rate at 0.2, and we varied the time lag
between the months.
33 Conclusions
- The second analytical method performs slightly
better, but the advantage of the first method is
that it requires less tailor-made programming.
- The jackknife method slightly overestimates the
variance, but we believe this bias is due to the
technical adjustments needed to apply the method
and that it could be reduced.
- The variability of the estimates is the lowest in
the case of the jackknife method.
34 Conclusions cont.
- The problem of bimodality should be further
investigated in the future.
- A bimodal sampling distribution can cause serious
instability in the procedure of variance
estimation.
- Bimodality of the distribution calls for a
different interpretation of the sampling
variation.