Title: Rudi Seljak, Metka Zaletel
1 TAX DATA AS A MEANS FOR THE ESSENTIAL REDUCTION
OF THE SHORT-TERM SURVEYS RESPONSE BURDEN
- Rudi Seljak, Metka Zaletel
- Statistical Office of the Republic of Slovenia
2 Introduction
- In recent years, national statistical institutes have constantly faced two challenges, which are especially pronounced in short-term business surveys:
  - How to improve the timeliness of the published data
  - How to decrease the response burden and the survey costs
- One of the most frequently used ways of meeting at least some of these demands is the use of different types of administrative data.
- In recent years many offices have been exploring the possibility of using tax data, originally collected for the monthly settlement of value added tax (VAT), to estimate turnover indices.
3 Introduction (contd.)
- The Statistical Office of the Republic of Slovenia (SORS) began its first systematic studies in this area in 2005.
- In 2005 a feasibility study was carried out, which explored the possibility of using VAT data to estimate turnover indices in the wholesale trade activity.
- On the basis of the results of this study, the foundations of the new methodology were laid.
- This methodology was then adopted and applied to some other areas.
4 Main features of the new methodology
- One of the significant changes in the new methodology was the move from random sampling to a cut-off sampling procedure.
- The sampling error is replaced by the bias due to the omission of part of the population. One of the goals of the feasibility study was to estimate the size of this bias.
- The new methodology combines two types of data. For a small number of the largest units, the classical postal survey is carried out. For the majority of the units, VAT data are used. The statistical data processing is therefore significantly changed.
5 Feasibility study
- In the feasibility study we simulated the data collection process under the new methodology for all months in 2003-2005 and then compared the new results with the originally published results.
- The level of turnover sometimes differed substantially, but the movement, expressed in the form of indices, was in most cases quite coherent.
- As the main indicator of the coherence of the index time series we used the correlation coefficient. With the exception of some smaller domains, the coefficient was around 0.9.
- For the problematic domains we increased the number of units to be surveyed.
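The coherence check described above can be illustrated with a minimal sketch. It assumes the coefficient in question is the ordinary Pearson correlation between the published and the simulated index series; the slides do not name the variant. The function name and the sample series are hypothetical and are not results from the study.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Illustrative month-to-month indices (made-up numbers, not study results).
published_indices = [101.2, 98.7, 104.5, 99.1, 102.3, 97.8]
simulated_indices = [100.8, 99.0, 103.9, 99.6, 101.7, 98.4]
r = pearson_correlation(published_indices, simulated_indices)
```

In practice the coefficient would be computed separately for each domain, so that domains with a low value can be identified for an enlarged survey part.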
6 Comparison of time series obtained by two different methodologies
[Chart: month-to-month indices in the wholesale trade activity]
7 Main steps of the process
- Selection of the set of observational units
- Selection of the set of units to be surveyed
- Collection and editing of survey data
- Merging of survey and administrative data
- Detection of outlying values using the Hidiroglou-Berthelot method
- Imputation for non-response
- Aggregation and calculation of processing quality indicators
8 Selection process
- The whole procedure is carried out in two steps. In the first step the units of the target population are determined; in the second step the units for which data will still be obtained by the classical survey are selected.
- In the first step, units that fulfil at least one of the following criteria are selected:
  - The semi-annual turnover of the unit is more than 100,000 EUR.
  - The semi-annual turnover of the unit is more than 50,000 EUR and the unit has at least 3 employees.
  - The unit has at least 6 employees.
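The three criteria for the target population can be written as a single predicate. The sketch below is only an illustration: the `Unit` record and its field names are hypothetical, and only the thresholds are taken from the slide.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    # Hypothetical record layout; field names are assumptions.
    unit_id: str
    semiannual_turnover_eur: float
    employees: int


def in_target_population(unit: Unit) -> bool:
    """True if the unit meets at least one of the three slide criteria."""
    return (
        unit.semiannual_turnover_eur > 100_000
        or (unit.semiannual_turnover_eur > 50_000 and unit.employees >= 3)
        or unit.employees >= 6
    )


# Made-up example units.
units = [
    Unit("A", 150_000, 1),  # qualifies by turnover alone
    Unit("B", 60_000, 3),   # qualifies by turnover and employees
    Unit("C", 10_000, 6),   # qualifies by employees alone
    Unit("D", 60_000, 2),   # meets none of the criteria
]
target_population = [u.unit_id for u in units if in_target_population(u)]
```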
9 Selection process (contd.)
- For a small part of the units, the data are still obtained by the postal survey.
- To select the units to be surveyed, the target population is first sorted by descending turnover within each activity group.
- Then the largest units of the group are selected, as many as needed for the share of turnover of the selected units to exceed the target share of total turnover. The target share differs slightly between activity groups, but it is generally between 50% and 60%.
- The number of units to be surveyed is approximately 2% of the whole target population.
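The cut-off selection within one activity group can be sketched as follows. This is a minimal illustration under stated assumptions: the target share is read as a fraction of the group's total turnover (for example 0.55), the function and variable names are hypothetical, and tie handling is not specified in the slides.

```python
def select_units_to_survey(turnover_by_unit, target_share):
    """Pick the largest units of one activity group until their combined
    share of the group's turnover exceeds target_share (a fraction)."""
    total = sum(turnover_by_unit.values())
    if total <= 0:
        return []
    # Sort by descending turnover, as described on the slide.
    ranked = sorted(turnover_by_unit.items(), key=lambda kv: kv[1], reverse=True)
    selected = []
    cumulative = 0.0
    for unit_id, turnover in ranked:
        if cumulative / total > target_share:
            break  # target share already exceeded
        selected.append(unit_id)
        cumulative += turnover
    return selected


# Made-up turnover values for one activity group.
group = {"A": 500.0, "B": 300.0, "C": 100.0, "D": 60.0, "E": 40.0}
surveyed = select_units_to_survey(group, target_share=0.55)
```

The function would be applied once per activity group, each with its own target share.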
10 Selection process - schematic presentation
[Diagram with the following elements: Business register, Admin. data, Target population, Sorted data, Selected units, Units to be surveyed, Units for the admin. data, Survey data]
11 Merging data from different sources
- The data enter the process through two different channels. Each data set is first edited separately using consistency checks.
- Data from the different sources are merged into one table, and each turnover value is assigned a suitable status.
- This status records the data collection method and whether the value was corrected during editing.
- The status values are assigned according to the standard 4-digit classification used at SORS.
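The merging step can be pictured with a small sketch. It does not reproduce the SORS 4-digit status classification, which is not given in the slides; instead the status is modelled by two hypothetical fields, `source` and `edited`. The rule that a survey value overrides an administrative value for the same unit is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TurnoverRecord:
    unit_id: str
    turnover: float
    source: str   # "survey" or "admin" (hypothetical labels)
    edited: bool  # True if the value was corrected during editing


def merge_sources(survey, admin):
    """Merge two dicts of unit_id -> (turnover, edited) into one table."""
    merged = {}
    for unit_id, (value, edited) in admin.items():
        merged[unit_id] = TurnoverRecord(unit_id, value, "admin", edited)
    for unit_id, (value, edited) in survey.items():
        # Assumption: survey data take precedence if a unit is in both sets.
        merged[unit_id] = TurnoverRecord(unit_id, value, "survey", edited)
    return [merged[unit_id] for unit_id in sorted(merged)]


# Made-up input data.
survey_data = {"A": (500.0, False), "B": (300.0, True)}
admin_data = {"B": (290.0, False), "C": (100.0, False)}
table = merge_sources(survey_data, admin_data)
```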
12 Merging data from different sources - schematic presentation
[Diagram; the only recoverable label is "Survey data"]
13 Statistical editing
- Once the data are merged, we use the Hidiroglou-Berthelot method to detect outliers.
- The method explores the distribution of month-to-month growth rates to find extreme values.
- The main goal is to detect extreme jumps in turnover estimated from the VAT data. These jumps are usually the consequence of methodological differences between administrative and statistical data.
- Such problems mostly occur when an enterprise sells real property. The proceeds of the sale are reported to the tax authorities, but they should not be included in turnover.
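A minimal sketch of the Hidiroglou-Berthelot procedure is given below, following the commonly cited formulation: ratios are centred on their median with a symmetric transformation, scaled by unit size, and compared with quartile-based bounds. The parameter defaults (`U=0.5`, `A=0.05`, `C=4.0`), the quartile definition, and the function name are assumptions; the slides do not state the settings used at SORS.

```python
import statistics


def hb_outliers(previous, current, U=0.5, A=0.05, C=4.0):
    """Return indices of units flagged by the Hidiroglou-Berthelot method.

    previous, current: positive turnover values of the same units in two
    consecutive months. U, A, C are tuning parameters (assumed defaults).
    """
    if len(previous) != len(current) or len(previous) < 2:
        raise ValueError("need two equally long series with at least 2 units")
    if any(v <= 0 for v in previous) or any(v <= 0 for v in current):
        raise ValueError("this sketch handles positive values only")

    ratios = [c / p for p, c in zip(previous, current)]
    r_med = statistics.median(ratios)

    # Symmetric transformation centred on the median ratio.
    s = [r / r_med - 1 if r >= r_med else 1 - r_med / r for r in ratios]

    # Size effect: a given relative change matters more for larger units.
    effects = [s_i * max(p, c) ** U for s_i, p, c in zip(s, previous, current)]

    q1, e_med, q3 = statistics.quantiles(effects, n=4, method="inclusive")
    d_q1 = max(e_med - q1, abs(A * e_med))
    d_q3 = max(q3 - e_med, abs(A * e_med))
    lower = e_med - C * d_q1
    upper = e_med + C * d_q3
    return [i for i, e in enumerate(effects) if e < lower or e > upper]


# Made-up data: the unit at index 7 shows a fivefold jump in turnover.
prev_month = [100, 200, 300, 400, 500, 600, 700, 800, 900]
curr_month = [102, 198, 309, 404, 495, 618, 707, 4000, 891]
flagged = hb_outliers(prev_month, curr_month)
```

A larger `C` makes the bounds wider and flags fewer units; `U` controls how strongly unit size is taken into account.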
14 Imputation procedures
- In the imputation process we impute the missing values as well as the values flagged as extreme in the statistical editing process.
- Three different imputation methods are used:
  - Estimation of monthly data from quarterly data (only at the end of each quarter).
  - Historical trend method (only for units with data from the previous month).
  - Mean value method.
- For each imputed value, the reason for imputation and the imputation method are recorded in the status values.
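Two of the three methods can be sketched under common textbook definitions, which are assumptions here because the slides do not give the formulas. The historical trend method multiplies the unit's previous value by the growth observed among units that reported in both months; the mean value method uses the mean of the reported values. The quarterly-to-monthly estimation is left out because its details are not described. All names are hypothetical.

```python
def historical_trend_impute(prev_value, donors):
    """Previous value times the aggregate growth of the donor units.

    donors: list of (previous, current) pairs for units of the same group
    that reported in both months.
    """
    sum_prev = sum(p for p, _ in donors)
    sum_curr = sum(c for _, c in donors)
    if sum_prev <= 0:
        raise ValueError("donor units need positive previous turnover")
    return prev_value * sum_curr / sum_prev


def mean_value_impute(reported_values):
    """Mean of the values reported by units of the same group."""
    if not reported_values:
        raise ValueError("no reported values available")
    return sum(reported_values) / len(reported_values)


def impute(prev_value, donors):
    """Use the historical trend if a previous value exists, else the mean.

    Returns the imputed value together with a label of the method used,
    so that the method can be recorded alongside the value.
    """
    if prev_value is not None:
        return historical_trend_impute(prev_value, donors), "historical_trend"
    return mean_value_impute([c for _, c in donors]), "mean_value"


# Made-up donor units whose turnover grew by 10 percent.
donor_units = [(100.0, 110.0), (200.0, 220.0)]
with_history = impute(50.0, donor_units)
without_history = impute(None, donor_units)
```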
15 Editing and imputation - schematic presentation
[Diagram with the following elements: Detection of outliers, H-B method, Imputations, Imputed data]
16 Quality indicators
- Using the status values, in which all the changes made during the process are recorded, a set of quality indicators is calculated automatically.
- Two types of quality indicators are calculated: micro and macro indicators.
- An example of a micro indicator is the imputation rate, defined as the share of the data that were imputed during the process.
- An example of a macro indicator is the relative difference between the index calculated from all the data and the index calculated from the non-imputed data.
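The two example indicators can be computed as in the sketch below. Several details are assumptions: the index is taken as 100 times the ratio of summed current to summed previous turnover, the imputation rate is unweighted, and the relative difference is expressed relative to the index from non-imputed data. The record layout and function names are hypothetical.

```python
def imputation_rate(records):
    """Unweighted share of records whose current value was imputed."""
    if not records:
        raise ValueError("no records")
    return sum(1 for _, _, imputed in records if imputed) / len(records)


def turnover_index(records):
    """Index of summed current turnover over summed previous turnover."""
    sum_prev = sum(p for p, _, _ in records)
    sum_curr = sum(c for _, c, _ in records)
    if sum_prev <= 0:
        raise ValueError("previous turnover must be positive")
    return 100.0 * sum_curr / sum_prev


def relative_index_difference(records):
    """Relative difference between the index from all data and the index
    from non-imputed data only (denominator choice is an assumption)."""
    index_all = turnover_index(records)
    index_clean = turnover_index([r for r in records if not r[2]])
    return (index_all - index_clean) / index_clean


# Made-up records: (previous turnover, current turnover, imputed flag).
data = [(100.0, 110.0, False), (200.0, 210.0, False), (100.0, 130.0, True)]
rate = imputation_rate(data)
rel_diff = relative_index_difference(data)
```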
17 Quality indicators (contd.)
- All the quality indicators are calculated automatically and inserted into an Excel spreadsheet template. The indicators for the last 13 months can also be presented graphically.
18 Quality indicators (contd.)
- One of the macro indicators compares the indices calculated from the whole data set with the indices calculated from the survey data alone and the indices calculated from the admin data alone.
19 Benefits of the new system
- The new methodology represents a radical change in the production of short-term indices.
- Although the new system has some deficiencies, the benefits far outweigh them.
- The largest benefit of the new methodology is the substantial reduction of the response burden, as well as the reduction of survey costs.
- To quantify the benefits of the new methodology, we estimated the burden and cost reduction, both expressed in man-days.
- The estimation was done for two areas: Hotels and restaurants, and Services.
- In the chart we present the cost and burden for 2006, when the old methodology was still used, compared with 2007, when we launched the new methodology.
20 Response burden and cost reduction
21 Conclusions
- SORS started to implement the new methodology for the estimation of monthly turnover indices in 2006.
- The new methodology combines two different sources: survey data for a small part of the units and administrative data for the larger part.
- Although there are differences in the methodological definitions of turnover, all the studies showed that the admin data can be used well for the purposes of short-term statistics.
- The new methodology means a substantial decrease in costs and response burden.
- The new methodology is planned to be extended to the retail trade activity in 2008.