Sin t - PowerPoint PPT Presentation

Slides: 43
Provided by: INE64
Learn more at: https://unece.org
1
Using Selective Editing Combined with an
Automatic System in the FSS of Spain
Dolores Lorca
National Statistical Institute of Spain
2
Summary
  • An integrated editing process that combines
    selective editing and the generalized edit and
    imputation system Banff
  • We use Banff to detect the suspicious units and a
    score function for selective editing
  • In the Spanish FSS, the different types of data (crop,
    livestock, employment) contribute to the
    complexity of this process
  • Some results obtained from the traditional
    microediting approach and from selective editing
    are compared

3
Traditional microediting approach
  • The subject matter expert specifies the edits
  • The processing department writes tailor-made
    programs for each survey to detect the edit
    failures
  • The edit failures are manually reviewed

4
New integrated edit and imputation process
1) Initial editing prior to selective editing
2) Selective editing procedure
3) Automatic system process (BANFF)
5
1) Initial editing prior to selective editing
Consistency checks are established in the
data collection phase, which is carried out by interviewers

6
2) Selective editing procedure
Score functions are built to determine and
prioritize the survey suspect units to be
reviewed manually due to their significant weight
on the final estimates
7
3) Automatic system process (BANFF).
The automatic system process is carried out
using the generalized system Banff, developed by
Statistics Canada
8
Case study: Farm Structure Survey (FSS)
  • The Spanish FSS collects different types of data,
    such as:
  • Utilised agricultural land
  • Cultivated Land by kind of crop
  • Types of livestock
  • The structure and the amount of farm employment
  • Machinery and equipment

9
The main characteristics of the FSS
  • FSS is carried out every 2 years
  • It consists of a farm panel drawn from the last
    Agrarian Census
  • The sample design is a single stage design with
    stratification of the farms according to
    geographical area, type of farming (TF) and size
  • Data collection is carried out by interviewers

10
FSS estimators
Total estimate of the jth variable in stratum h
  • $F_h$ is the sample weight for stratum h
  • $n_h$ is the sample size in stratum h
  • $X_{hji}$ denotes the value of the jth variable for
    sampled unit i in stratum h
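The estimator formula itself is not reproduced in this transcript. A minimal reconstruction, assuming a standard stratified expansion estimator built only from the symbols defined on this slide (the hat notation and the summation form are assumptions, not taken from the original slide):

```latex
\hat{X}_{hj} = F_h \sum_{i=1}^{n_h} X_{hji}
```

Under that assumption, a stratum with $F_h = 10$ and sampled values 2, 3 and 5 would give a total estimate of $10 \times (2 + 3 + 5) = 100$.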
11
Initial editing prior to selective editing
  • Initial editing is carried out by interviewers in
    the NSI's provincial offices
  • In this phase, all fatal errors are corrected.
    Most of these fatal errors come from balance
    edits

12
Selective editing procedure
  • The goal: to select the survey units with
    suspicious values that may have a significant
    effect on survey estimates
  • Key variables chosen:
  • Utilized Agricultural Land (UAL), Cultivated
    Land (CL),
  • Woody Crops (WCs), Olive Grove (OG), Vineyard
    (VY),
  • Animal Units (AU), Annual Labour Units (ALU)

13
Selective editing crop variables
  • Relative stability over time
  • Anomalous variations, from the previous year to
    the current one, can be a sign of data errors
  • We determine the units with anomalous and
    significant variations of the selected crop
    variables

14
Steps of selective editing procedure: crop
variables
  • 1) In each stratum, we obtain the units with
    anomalous variations with respect to the previous
    period of the analyzed variables, using the
    Hidiroglou-Berthelot (1986) method of outlier
    detection (PROC OUTLIER of the Banff system)
  • 2) Using a score function, the units for manual
    editing are selected among the outliers identified
    previously that have a significant weight on the
    population total estimates

15
Step (1): Hidiroglou-Berthelot method, PROC
OUTLIER of the Banff system
16
Step (1): Hidiroglou-Berthelot method
17
Step (1): Hidiroglou-Berthelot method
Effect $e_{hji}$ for each unit i:
$e_{hji} = s_{hji} \cdot \left(\max\left(F_h^{t-1} x_{hji}^{t},\; F_h^{t-1} x_{hji}^{t-1}\right)\right)^{exp}$, with $exp = 1$
18
Step (1): Hidiroglou-Berthelot method
$M$, $Q_1$, $Q_3$: the median, the first quartile and the third quartile
of the transformed $e_{hji}$ values of the variable
being processed
$d_{Q1} = \max(M - Q_1,\; A M)$
$d_{Q3} = \max(Q_3 - M,\; A M)$
19
Step (1): Hidiroglou-Berthelot method
Interval: $(M - C\, d_{Q1},\; M + C\, d_{Q3})$, with $C = 5$
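The outlier detection step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Banff implementation: the transformation that produces the s values is not reproduced in this transcript, so the usual Hidiroglou-Berthelot form (ratios centred on their median) is assumed; a single weight is applied to both periods; the value of `a` is illustrative only; and `abs()` is added as a guard for a negative median. The function name and its parameters are hypothetical.

```python
from statistics import median, quantiles

def hb_outliers(prev, curr, weight=1.0, exponent=1.0, a=0.05, c=5.0):
    """Flag units with an anomalous period-to-period change (HB-style sketch).

    prev, curr: positive values of one variable at t-1 and t for the
    units of one stratum, in the same order.
    """
    ratios = [x_t / x_p for x_p, x_t in zip(prev, curr)]
    r_med = median(ratios)
    # Assumed standard HB transformation: symmetric around the median ratio.
    s = [1 - r_med / r if r < r_med else r / r_med - 1 for r in ratios]
    # Effect: transformed ratio scaled by the weighted size of the unit.
    e = [s_i * max(weight * x_t, weight * x_p) ** exponent
         for s_i, x_p, x_t in zip(s, prev, curr)]
    q1, m, q3 = quantiles(e, n=4, method="inclusive")
    d_q1 = max(m - q1, abs(a * m))  # abs() is an added guard (assumption)
    d_q3 = max(q3 - m, abs(a * m))
    lower, upper = m - c * d_q1, m + c * d_q3
    # A unit is an outlier when its effect falls outside the interval.
    return [not (lower <= e_i <= upper) for e_i in e]
```

Called with two equally long lists of positive values, the function returns one boolean per unit, where `True` marks a unit whose effect lies outside the interval built with C = 5.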
20
Step (2): scaled local score function (Latouche
and Berthelot 1992)
21
Setting the threshold value (Lawrence and McKenzie
2000)

$a_{hj}$ is the threshold value of the jth variable in
stratum h, $SE(X_{hj})$ is the standard error of the
jth variable in stratum h, $n_h$ is the sample size
in stratum h, and k is a value such that
22
  • Using the Lawrence and McKenzie formula ensures
    that the bias due to not editing some of the
    survey units is less than k% of the variance of
    the estimate. The value of k is set to 10

23
  • Within each stratum, the score values are sorted
    in descending order
  • Then, the outliers with a score greater than $a_{hj}$ are
    selected for manual editing
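The selection rule can be sketched as follows. The score function and the threshold formula are not reproduced in this transcript, so both the scores and the threshold are taken as given inputs here; the function name and the unit identifiers are hypothetical.

```python
def select_for_manual_editing(scores, threshold):
    """Return (unit_id, score) pairs above the threshold, largest score first.

    scores: dict mapping unit id -> local score of an outlier in one stratum.
    threshold: the threshold value for that variable and stratum.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(unit, score) for unit, score in ranked if score > threshold]
```

Sorting first keeps the output in review-priority order, so the units with the largest potential effect on the estimate come first.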

24
Selective editing: employment variable
  • ALU variable
  • One ALU is equivalent to the work carried out by
    one person on a full-time basis over one year
  • Auxiliary information is used to estimate the
    expected amended value: the ratio between the
    agricultural employment figures obtained for t
    and t-2 from the Labour Force Survey (LFS)

25
Selective editing score function

26
Selective editing: livestock variables
  • The FSS collects the livestock present on
    the farm on the day of the interview
  • A farm can show a strong livestock variation
    depending on the interview date
  • The selective editing procedure for livestock
    therefore differs from that used for the other variables

27
Animal Units (AU)
  • Livestock data are expressed in AUs, which
    are obtained by applying a coefficient to
    each species and type in order to group
    different species in one common unit
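As an illustration of the conversion, a minimal sketch follows. The coefficient values and livestock type names are hypothetical and chosen only to make the arithmetic easy to follow; they are not the official FSS coefficients.

```python
# Hypothetical coefficients for illustration only; not the official FSS values.
AU_COEFFICIENTS = {"dairy_cow": 1.0, "pig": 0.25, "sheep": 0.125}

def animal_units(headcounts, coefficients=AU_COEFFICIENTS):
    """Convert headcounts per livestock type into a single AU total."""
    return sum(coefficients[kind] * n for kind, n in headcounts.items())
```

With these made-up coefficients, a farm with 10 dairy cows and 40 sheep would count as 10 × 1.0 + 40 × 0.125 = 15 AU.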

28
Steps of selective editing procedure: livestock
1) Units that fail some of the edits specified in
the traditional microediting approach are selected
as suspicious units
2) For each suspicious unit or edit failure, an estimate
of the expected amended response of the AU variable
is calculated
3) We determine, among the suspicious units detected
at the previous step, those units with a significant
weight on the total estimate of the AU variable
29
Edits specified in the traditional microediting
approach
$y_{hji} < c_{hj}$
  • $y_{hji}$ is the jth variable (types of livestock)
    for unit i in stratum h
  • $c_{hj}$ is a constant determined by the historical
    empirical distributions
30
  • Estimate of the expected amended response of the AU
    variable: $c_{hj}$ expressed in AU, i.e. $\hat{x}_{hji}$
  • Magnitude of failure for the suspicious unit i:
    $e_{hji} = x_{hji} - \hat{x}_{hji}$
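The livestock steps can be sketched for a single livestock type as below. This assumes that the magnitude of failure is the reported AU value minus the expected amended value, and that the expected amended value is the edit constant converted to AU; the function name, unit identifiers and coefficient value are hypothetical.

```python
def livestock_edit_failures(reported, limit, au_coefficient):
    """Flag units failing the edit y < c and measure the failure in AU.

    reported: dict unit id -> reported headcount y for one livestock type.
    limit: the constant c for that type and stratum.
    au_coefficient: AU per head for that type (hypothetical in the example).
    """
    expected_au = limit * au_coefficient  # expected amended response: c in AU
    failures = {}
    for unit, y in reported.items():
        if not y < limit:  # the edit y < c fails
            # Assumed sign convention: reported minus expected.
            failures[unit] = y * au_coefficient - expected_au
    return failures
```

For instance, with a limit of 200 head and a coefficient of 0.5 AU per head, a farm reporting 400 head fails the edit with a magnitude of 400 × 0.5 − 200 × 0.5 = 100 AU, while farms reporting fewer than 200 head are not flagged.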

31
Selective editing score function
32
  • The threshold is calculated using the Lawrence
    and McKenzie formula, as in the previous cases
  • Within each stratum, the score values are sorted
    in descending order
  • Then, the edit failures with a score greater than $a_{hj}$ are
    selected for manual editing

33
Global score function
The global score of unit i in stratum h is the maximum,
over the variables j, of its local scores
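A minimal sketch of the global score, taking the local scores as given inputs (their formula is not reproduced in this transcript); the function name, unit identifiers and score values are hypothetical.

```python
def global_scores(local_scores):
    """Global score per unit: the maximum of its local scores over variables.

    local_scores: dict unit id -> dict variable name -> local score.
    """
    return {unit: max(by_var.values()) for unit, by_var in local_scores.items()}
```

Taking the maximum means a unit is prioritised as soon as any one of its key variables looks influential, even if the others are unremarkable.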
34
Macroediting and selective editing approach
  • First, the strata with the largest variation
    with respect to the previous period in the
    analysed variables are selected
  • Then, the steps of the selective editing procedure
    are applied only to the farms in the selected
    strata

35
Macro-editing approach

36
Threshold value for the strata
  • In each region, the stratum-level values are sorted in
    descending order.
  • We determine a threshold value for each variable j, and
    strata whose value exceeds this threshold are selected
  • This threshold value is set to 3.

37
Results
  • Number of farms: 3690
  • We compare the results obtained for the following
    editing procedures:
  • (A) Traditional microediting approach
  • (B) Selective editing procedure
  • (C) Macroediting and selective editing
    approach

38
Table 1

39
Table 2: Change rates of total estimate
for the CL variable (%)
40
Table 3

41
Further research
  • Banff will be applied to the rest of units that
    have not been edited in the selective editing
    procedure
  • Different methods of imputation will be tested

42
Final remarks
  • Integrating the PROC OUTLIER of Banff to detect
    suspicious units and a score function to select
    units for manual editing has been useful in
    the Spanish FSS
  • A reduction in cost and processing time would be
    attained using this approach
  • Response burden is reduced because fewer
    recontacts are carried out