SiStaN in Brief - PowerPoint PPT Presentation

Slides: 29
Provided by: IST80

1
A balanced sampling approach for multiway
stratification design for small area estimation
Piero Demetrio Falorsi, Paolo Righi (ISTAT)
2
Index
  • The issue of multivariate-multidomain sampling
    strategy
  • The proposed sampling strategy
  • Balanced sample for multiway stratification
  • Modified GREG estimator
  • The algorithm for the sample size definition
  • Application fields and experiments

3
1. The issue of multivariate-multidomain sampling
strategy
When planning a sampling strategy for a survey
that must produce estimates for several domains
(defined by non-nested partitions of the
population), one issue is how to set the sample
size so that the sampling errors of the domain
estimates of several parameters stay below given
thresholds. A sampling strategy is proposed here
for multivariate-multidomain surveys in which
the overall sample size must satisfy budget
constraints. The standard solution, a
stratification given by the cross-classification
of the domain variables, is often not feasible
because the number of strata can be larger than
the overall sample size. Moreover, even if the
overall sample size allows all the strata to be
covered, the resulting allocation could lead to
an inefficient design.
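To make the size problem concrete, the short sketch below compares the number of cells produced by cross-classifying the partitions with the number of marginal domains that a multiway design has to control. The function name is hypothetical; the partition sizes (20 and 24 domains) are the ones used in the real-data experiment reported later in this presentation.

```python
from math import prod


def design_sizes(domains_per_partition):
    """Return (number of cross-classification cells, number of marginal domains).

    domains_per_partition: list with the number of domains in each partition.
    """
    cross_cells = prod(domains_per_partition)      # one stratum per combination
    marginal_domains = sum(domains_per_partition)  # one size constraint per domain
    return cross_cells, marginal_domains


cross_cells, marginal_domains = design_sizes([20, 24])
print(cross_cells, marginal_domains)
```

With two partitions the cross-classification already has hundreds of potential cells, while the marginal domains number only a few dozen, which is why controlling the margins is feasible with a much smaller sample.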
4
1. The issue of multivariate-multidomain sampling
strategy
  • Population
  • Planned and actual sample with
    cross-classification stratification

5
1. The issue of multivariate-multidomain sampling
strategy
  • Example: Business Structural Statistics
  • 36,000 cross-classification strata

6
Standard strategy
1. The issue of multivariate-multidomain sampling
strategy
  • The standard solution for obtaining planned
    domains is a cross-stratified sampling design
    that combines the domains
  • Consequences:
  • when the population size in many strata is small,
    the stratification scheme can be inefficient
  • if the partitions into domains of interest
    are not nested, the allocation of the sample to
    the cross-classified strata may differ
    substantially from the optimal allocation for the
    domains of a given partition
  • the sample size needed to cover all strata can be
    too large for the survey's budget constraints
  • in surveys repeated over time, the
    statistical burden may become a problem if some
    strata contain only a few units in the
    population.

7

1. The issue of multivariate-multidomain sampling
strategy
  • One possible solution is multi-way
    stratification
  • Several sophisticated solutions have been
    proposed to keep the sample size under control in
    all the categories of the stratifying variables
    without using a cross-classification design. These
    methods are generally referred to as multi-way
    stratification techniques, and have been
    developed under two main approaches:
  • Latin squares or Latin lattices schemes (Bryant
    et al., 1960; Jessen, 1970): independence
    among rows and columns is assumed, and these
    methods work only if all the cross-strata exist
    in the population.
  • Controlled rounding problems solved via linear
    programming (Causey et al., 1985; Sitter and
    Skinner, 1994): these methods are computationally
    very complex, do not always reach a solution, and
    the inclusion probabilities (both simple and
    joint) cannot be computed immediately.
  • The main weaknesses of these approaches derive
    from their computational complexity and from the
    fact that a solution is not always reached.

8
2. The proposed sampling strategy
  • The aim of this work is to define a sampling
    strategy that is optimal with regard to both the
    sampling scheme and the estimator used, by
    exploiting the available auxiliary information
    in both phases:
  • Define a probabilistic sampling method
  • Implement a multiway stratification based on
    balanced sampling, controlling the sample sizes of
    the marginal domains
  • Use a modified GREG estimator
  • Define the sample allocation so as to
    control the sampling errors on the margins, using
    a variance estimator that jointly takes into
    account the regression model underlying the GREG
    estimator and the balanced sampling design
  • The strategy may take into account a simple
    (Fay-Herriot) small area estimator
  • The proposed overall sampling strategy is easy to
    implement, and software has been developed for
    each phase
  • It can be extended to different contexts
    (considering the anticipated variance or the use
    of indirect small area estimators)
  • It makes it possible to develop a sampling
    strategy for small area estimation that considers
    the sampling and estimation phases jointly

9
2. The proposed sampling strategy
  • Notation
  • Denote by
  • U: the population, of size N
  • U_b: the b-th partition of U into M_b domains
    U_bd (b = 1, ..., B; d = 1, ..., M_b)
  • the value of the r-th variable of interest
    (r = 1, ..., R) in the k-th population unit
  • the domain membership indicator
  • n: the overall fixed sample size
  • the r-th parameter of interest
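Several symbols on this slide did not survive in the transcript. A conventional notation consistent with the surviving text is sketched below; the letters y, δ and t are assumptions chosen here, and taking the parameter of interest to be a domain total is likewise an assumption rather than something the slide states.

```latex
\begin{align*}
y_{rk} &: \text{value of the $r$-th variable of interest for unit } k
          \quad (r = 1, \dots, R) \\
\delta_k(bd) &=
  \begin{cases}
    1 & \text{if } k \in U_{bd} \\
    0 & \text{otherwise}
  \end{cases} \\
t_{(bd)r} &= \sum_{k \in U} y_{rk}\, \delta_k(bd)
\end{align*}
```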

10
3. Balanced sampling and multi-way stratification
  • Balanced sampling is a class of designs using
    auxiliary information.
  • Its properties have been studied under
  • the model-based approach (Royall and Herson, 1973;
    Valliant et al., 2000)
  • the design-based approach (Deville and Tillé, 2004,
    2005).
  • In the following we consider the design-based or
    model-assisted approach

11
3. Balanced sampling and multi-way stratification
  • A sampling design p(.) with given inclusion
    probabilities is a design which assigns a
    probability p(s) to each sample s such that the
    expected value of the vector of sample indicators
    equals the vector of inclusion probabilities.
  • Let z be a vector of Q auxiliary variables known
    for each unit in the population. The sampling
    design p(s) is said to be balanced with respect
    to the Q auxiliary variables if and only if it
    satisfies the balancing equations: for every
    selected sample, the weighted sample total of z
    equals its population total,
  • the sample weight of a unit being the inverse of
    its inclusion probability.
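The formulas on this slide were not preserved. In the usual notation for balanced sampling (the symbols π_k, s_k and z_k are assumptions chosen here, not recovered from the slide), the two conditions can be written as:

```latex
\begin{align*}
E_p(\mathbf{s}) &= \boldsymbol{\pi},
  \qquad \mathbf{s} = (s_1, \dots, s_N)', \quad s_k \in \{0, 1\} \\
\sum_{k \in U} \frac{\mathbf{z}_k\, s_k}{\pi_k} &= \sum_{k \in U} \mathbf{z}_k
\end{align*}
```

Here 1/π_k is the sample weight of unit k, so the left-hand side of the second line is the Horvitz-Thompson estimate of the population total of z computed on the selected sample.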

12
3. Balanced sampling and multi-way stratification
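  • A multi-way stratification design can be
    represented as a special case of balanced design
    in which, for unit k, the auxiliary variable
    vector is the vector of indicators of membership
    in the domains of the different partitions,
    multiplied by the unit's inclusion probability
  • The z vector for unit k is therefore its
    inclusion probability times its vector of domain
    membership indicators
  • the balancing equations then ensure that, for each
    selected sample s, the size of the subsample
    falling in each domain is a non-random quantity,
    equal to the sum of the inclusion probabilities
    of the population units in that domain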
13
3. Balanced sampling and multi-way stratification
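  • For multiway stratification the balancing
    equations state that the number of sampled units
    in each domain equals the planned sample size
    for the d-th domain of the b-th partition
  • and, within each partition, the planned domain
    sample sizes sum to the overall sample size n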
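As an illustration, the sketch below checks the balancing equations for a toy population with two partitions (three regions and two size classes). All names and data are hypothetical. It uses z_k equal to the inclusion probability of unit k times its domain indicators, so the weighted sample total in each domain is simply the number of sampled units there.

```python
def balancing_sides(pi, domains, sample):
    """Return both sides of the balancing equations, one entry per domain.

    pi:      inclusion probability of each unit
    domains: for each unit, the tuple of domains it belongs to (one per partition)
    sample:  set of selected unit indices
    """
    labels = sorted({d for unit in domains for d in unit})
    lhs = dict.fromkeys(labels, 0.0)  # weighted sample totals of z
    rhs = dict.fromkeys(labels, 0.0)  # population totals of z
    for k, unit_domains in enumerate(domains):
        for d in unit_domains:
            z_kd = pi[k]              # component of z_k for domain d: pi_k * 1
            rhs[d] += z_kd
            if k in sample:
                lhs[d] += z_kd / pi[k]  # adds 1 for each sampled unit in d
    return lhs, rhs


def is_balanced(pi, domains, sample, tol=1e-9):
    lhs, rhs = balancing_sides(pi, domains, sample)
    return all(abs(lhs[d] - rhs[d]) < tol for d in lhs)


# Toy population: 3 regions x 2 classes, 2 units per cross cell, pi_k = 0.5.
domains = [(f"region{r}", f"class{c}")
           for r in range(3) for c in range(2) for _ in range(2)]
pi = [0.5] * len(domains)

one_per_cell = {0, 2, 4, 6, 8, 10}   # one unit from every cross cell
margins_only = {0, 1, 6, 7, 8, 10}   # two cross cells left empty
unbalanced = {0, 1, 2, 3, 4, 5}      # too many units from the first regions

print(is_balanced(pi, domains, one_per_cell),
      is_balanced(pi, domains, margins_only),
      is_balanced(pi, domains, unbalanced))
```

The second sample leaves two cross cells empty yet still satisfies every marginal constraint, which is the point of multiway stratification: only the margins are controlled, not the cross-classification cells.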

14
3. Balanced sampling and multi-way stratification
  • A major obstacle to balanced sampling has long
    been the lack of a general procedure for
    selecting a multivariate balanced random sample.
  • Deville and Tillé (2004) proposed a sample
    selection method (the cube method) that draws
    balanced samples for a large set of auxiliary
    variables and with respect to different vectors
    of inclusion probabilities.
  • A free macro for the selection of balanced
    samples from large data sets may be downloaded
    (SAS or R routine):
  • http://www.insee.fr/fr/nom_df_met/outils_stat/cube/accueil_cube.htm
  • Deville and Tillé (2000) show that with our
    specification of the auxiliary vectors the
    balancing equations can be satisfied exactly,
    while in general the balancing equations are
    only approximately respected

15
4. Modified GREG estimator
  • In the context of multivariate estimation, the
    r-th parameter of interest is
    [equation missing in source]
  • The modified GREG estimator is (through a
    specific domain weight)
    [equation missing in source]
  • The superpopulation working model is
    [equation missing in source]
16
Variance of the Horvitz-Thompson estimator with
the balanced sampling
4. Modified GREG estimator variance
  • Deville and Tillé (2005) proposed an
    approximation of the variance of the HT
    estimator for the overall domain
  • [variance approximation and the definitions of
    its terms: equations missing in source]
17
4. Modified GREG estimator variance
  • Starting from the result by Deville (2005) it is
    possible to derive the approximate expression of
    the variance of the modified GREG estimator
    under balanced sampling
  • [variance expression and the definitions of its
    terms: equations missing in source]

18
5. The algorithm for the sample size definition
  • In order to calculate the inclusion probabilities
    it is necessary to fix the sample size for each
    domain so that the constraints on the sampling
    errors are met
  • Considering each marginal partition separately
    would give a different set of inclusion
    probabilities for each of them
  • In our methodology we calculate a single set of
    inclusion probabilities through a two-step
    procedure:
  • Optimisation (calculation of the optimal
    probabilities)
  • Calibration (calculation of the working
    probabilities)

19
5. The algorithm for the sample size definition
  • Optimisation: the calculation of the inclusion
    probabilities (sample size and domain allocation)
    is carried out with the aim of minimizing the
    expected sampling errors over several domains and
    estimates:
  • Multi-domain
  • Multi-variable
  • The problem is solved through the system
    [equation missing in source]

The solution can be obtained through the Chromy
algorithm (the one used in the allocation software
MAUSS, which can be downloaded
from www.istat.it)
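The system itself is missing from the transcript. A plausible formulation, consistent with the stated goal of keeping domain sampling errors below given thresholds, is sketched below; the objective (expected sample size), the estimator symbol and the thresholds V* are all assumptions, not the authors' recovered equations.

```latex
\begin{align*}
\min_{\pi_1, \dots, \pi_N} \quad & \sum_{k \in U} \pi_k \\
\text{subject to} \quad
  & V\bigl(\hat{t}_{(bd)r}\bigr) \le V^{*}_{(bd)r}
    \qquad b = 1, \dots, B;\; d = 1, \dots, M_b;\; r = 1, \dots, R \\
  & 0 < \pi_k \le 1 \qquad k \in U
\end{align*}
```

Written this way there is one variance constraint per domain and variable, which is what makes the problem both multi-domain and multi-variable.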
20
5. The algorithm for the sample size definition
  • Calibration: the optimal inclusion probabilities
    lead to non-integer values for the domain sample
    sizes
  • Rounding of the expected domain sample sizes to
    the next integer
  • Calculation of the working probabilities nearest
    to the optimal ones
  • The problem is defined through the system
    [equation missing in source]

The solution is obtained by means of the Newton
algorithm (with some changes), the same one used in
the calibration software Genesees, which can
be downloaded from www.istat.it
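A minimal sketch of the calibration idea follows. It does not reproduce the Newton-type algorithm used in Genesees; it swaps in plain raking (iterative proportional fitting), which rescales the probabilities domain by domain until every domain sum equals its rounded target. All names and data are hypothetical, and the sketch assumes that the rounded targets add up to the same total in every partition and that no probability is pushed above 1.

```python
import math


def domain_sums(pi, dom_of, n_domains):
    """Sum of the probabilities in each domain of one partition."""
    sums = [0.0] * n_domains
    for p, d in zip(pi, dom_of):
        sums[d] += p
    return sums


def rake_probabilities(pi, memberships, targets, tol=1e-10, max_iter=1000):
    """Rescale pi until each domain sum matches its integer target.

    memberships[b][k]: domain index of unit k in partition b
    targets[b][d]:     integer sample size for domain d of partition b
    """
    pi = list(pi)
    for _ in range(max_iter):
        worst = 0.0
        for dom_of, target in zip(memberships, targets):
            sums = domain_sums(pi, dom_of, len(target))
            worst = max(worst, max(abs(s - t) for s, t in zip(sums, target)))
            pi = [p * target[d] / sums[d] for p, d in zip(pi, dom_of)]
        if worst < tol:  # every partition was already on target in this sweep
            break
    return pi


# Toy population: 3 regions x 2 size classes, 2 units per cross cell.
units = [(r, c) for r in range(3) for c in range(2) for _ in range(2)]
optimal = [0.45 if (r, c) == (2, 1) else 0.40 for r, c in units]
memberships = [[r for r, _ in units], [c for _, c in units]]

# Round each expected domain size up to the next integer.
targets = [[math.ceil(s) for s in domain_sums(optimal, m, size)]
           for m, size in zip(memberships, (3, 2))]

working = rake_probabilities(optimal, memberships, targets)
print(targets)
print([round(p, 4) for p in working])
```

Raking keeps the working probabilities proportional to the optimal ones within each cross cell, which is one way of staying close to them; a production routine would also need to cap probabilities at 1 and to reconcile rounded targets whose totals disagree across partitions.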
21
6. Application fields and experiments: Artificial
data
  • Population: contingency table
  • Variable for the allocation and estimation model
22
6. Application fields and experiments: Artificial
data
  • Compared sampling designs and expected CV (%)
23
6. Application fields and experiments
  • Real data
  • A simulation on real enterprise data (N = 10,392)
    was carried out to evaluate the effects of the
    planned sample sizes for small domains of
    estimation (Falorsi et al., 2006)
  • Partition U_1: regions (20 domains)
  • Partition U_2: economic activity by size class
    (24 domains)
  • Cross-classification strata containing
    population units: 360
  • Variables of interest: value added and labour
    cost
  • the sample sizes of the U_1 and U_2 partitions
    were planned separately by means of a compromise
    allocation
  • the two allocations guarantee a CV of 34.5% for
    U_1 and 8.7% for U_2 with regard to the variable
    number of employers (assumed known at the
    sampling stage)
  • the overall sample size is n = 360

24
6. Application fields and experiments: Real data
  • The experiment examines a situation typical of
    many real survey contexts, in which
    the overall sample size n is fixed and the
    marginal sample sizes are determined by a fairly
    simple rule: a compromise between the
    Allocation Proportional to Population size (APP)
    and the uniform allocation across the domains of
    a given partition
  • The probabilities of both designs for the U_1 and
    U_2 partitions were obtained as the solution of
    the calibration problem, with the initial
    probabilities set uniformly equal to
    [value missing in source]
25
6. Application fields and experiments: Real data
26
7. Extension to the Fay-Herriot model
27
7. Extension to the Fay-Herriot model
28
7. Extension to the Fay-Herriot model