Title: Sin t
1Using Selective Editing Combined with an
Automatic System in the FSS of SpainDolores
LorcaNational Statistical Institute of Spain
2Summary
- An integrated editing process that combines
selective editing and the generalized edit and
imputation system Banff - We use Banff to detect the suspicious units and a
score function for selective editing - Spanish FSS the different types of data (crop,
livestock, employment) contribute to the
complexity of this process - Some results obtained from the traditional
microediting approach and from selective editing
are compared
3Traditional microediting approach
- The subject matter expert specifies the edits
- The processing department makes tailored-made
programs for each survey to detect the edit
failures - The edit failures are manually reviewed
4New integrated edit and imputation process
1) Initial editing prior to selective
editing 2) Selective editing procedure 3)
Automatic system process (BANFF)
51) Initial editing prior to selective editing
Controls of consistence are established in the
data collection phase carried out by interviewers
62) Selective editing procedure
Score functions are built to determine and
prioritize the survey suspect units to be
reviewed manually due to their significant weight
on the final estimates
73) Automatic system process (BANFF).
The automatic system process is carried out
using the generalized system Banff, developed by
Statistics Canada
8Study caseFarm Structure Survey (FSS)
- The Spanish FSS collects different types of data
such as - Utilised agricultural land
- Cultivated Land by kind of crop
- Types of livestock
- The structure and the amount of farm employment
- Machinery and equipment
9The main characteristics of the FSS
- FSS is carried out every 2 years
- It consists of a farm panel drawn from the last
Agrarian Census - The sample design is a single stage design with
stratification of the farms according to
geographical area, type of farming (TF) and size - Data collection is carried out by interviewers
-
10FSS Estimators total estimate of the jth
variable in stratum h
Fh is the sample weight for the stratum
h nh is the sample size in stratum h Xhji
denotes the jth variable value for the sampled
unit i in stratum h.
11Initial editing prior to selective editing
-
- Initial editing is carried out by interviewers in
the NSIs provincial offices - In this phase, all fatal errors are corrected
Most of these fatal errors come from balance
edits
12Selective editing procedure
- The goal To select the survey units with
suspicious values that may have a significant
effect on survey estimates - Key variable chosen
- Utilized Agricultural Land (UAL), Cultivated
Land (CL), - Woody Crops (WCs), Olive Grove (OG),Vineyard
(VY), - Animal Units (AU), Annual Labour Units (ALU)
13Selective editing crop variables
- Relative stability over time
- Anomalous variations, from the previous year to
the current one, can be a sign of data errors - We determine the units with anomalous and
significant variations of the selected crop
variables
14Steps of selective editing procedure crop
variables
- 1) In each stratum, we obtain the units with
anomalous variations with respect to the previous
period of the analyzed variables, using the
Hidiroglou-Berthelot (1986) method of outlier
detection (PROC OUTLIER of Banff system) - 2)The units for manual editing are selected among
- the outliers identified previously having a
- significant weight on the population total
estimates - using a score function
-
15(1) step Hidiroglou-Berthelot method PROC
OUTLIER of BANFF system
16(1) step Hidiroglou-Berthelot method
17(1) step Hidiroglou-Berthelot method Effect
ehji for each unit i ehjishji(max(Fht-1 xhjit
, Fht-1 xhjit-1 ))exp exp1
18(1) step Hidiroglou-Berthelot method M, Q1,Q3
median, the first quartile and the third quartile
of the transformed ehji values of the variable
being processed dQ1max(M-Q1,AM) dQ3max(Q3-
M,AM)
19(1) step Hidiroglou-Berthelot method (M-C dQ1
,MCdQ3) C5
20(2) step scaled local score function (Latouche
and Berthelot 1992)
21Setting threshold value (Lawrence and Mckenzie
2000)
ahj is the threshold value of the jth variable in
stratum h, SE(Xhj) is the standard error of the
jth variable in stratum h, nh is the sample size
in stratum h and k is a value such as
22- Using the Lawrence and Mckenzie formula ensures
that the bias due to not editing some of the
survey units is less than k of the variance of
the estimate. The value of k is set to 10
23- Within eachs stratum,the values ?hji are sorted
in descending order - Then, the outliers with score ?hji gt ahj are
selected for manual editing
24Selective editing employment variable
- ALU variable
- One ALU is equivalent to the work carried out by
one person on a full-time basis over one year - Using auxiliary information to estimate the
expected amended value the ratio between the
employment number in agriculture obtained in t
and t-2 through the Force labour Survey (FLS)
25Selective editing score function
26Selective Editing livestock variables
- The FSS collects the existing livestock in
- the farm on the day of the interview
- A farm can have a strong livestock variation
depending on the interview date - The selective editing procedure for livestock
- is different to the rest of variables
-
27Animal Units (AU)
- Livestock data are expressed in AUs which
- are obtained by applying a coefficient to
- each species and type in order to group
- different species in one common unit
-
28Steps of Selective Editing procedure livestock
1) Units that fail some of the edits, which are
specified in the traditional microediting
approach, are selected as suspicious units 2) For
each suspicious unit or edit failure, an estimate
of the expected amended response of AU variable
is calculated 3) We determine, among the
suspicious units detected at the previous step,
those units with a significant weight on the
total estimate of the AU variable
29Edits specified in the traditional microediting
approach
yhji lt chj yhji is the jth variable
(types of livestock) for the unit i in stratum h
chj is a constant determined by the historical
empirical distributions
30- Estimate of the expected amended response of AU
variable - chj expressed in AU, i.e. xhji
- Magnitude of failure for the suspicious unit i
- ehjixhji-xhji
31Selective editing score function
32- The threshold is calculated using the Lawrence
and Mckenzie formula as in previous cases - Within each stratum,the values ?hji are sorted
in descending order - Then, the edit failures with score ?hji gt ahj are
selected for manual editing
33Global score function
G?himax j(?hji )
34Macroediting and selective editing approach
- In first place, a selection of the strata with
the largest variation with respect to the
previous period of the analysed variables is
carried out - After, the steps of selective editing procedure
are applied only to the farms of the selected
strata
35Macro-editing approach
36threshold value for the strata
- In each region, the ?hj values are sorted in
descending order. - We determine a threshold value, ?j and strata
with ?hj gt?j are selected - This threshold value is set to 3.
37Results
- Farm number 3690
- We compare the results obtained for the following
editing procedures - (A) Traditional microediting approach
- (B) Selective editing procedure
- (C) Macroediting and selective editing
- approach
38Table 1
39Table 2Change rates of total estimate
for the CL variable ()
40Table 3
41Further research
- Banff will be applied to the rest of units that
have not been edited in the selective editing
procedure - Different methods of imputation will be tested
42Final remarks
- Integrating the PROC OUTLIER of Banff to detect
suspicious units and a score function to select
units for manually editing has been useful in
the Spanish FSS -
- Reduction in cost and processing time would be
attained using this approach - Response burden is reduced from carrying out less
number of recontacts