Multistage Sample Design - PowerPoint PPT Presentation

Slides: 26
Provided by: barbara177
Transcript and Presenter's Notes

Title: Multistage Sample Design


1
  • Multistage Sample Design
  • Basic ideas
  • Two-stage cluster sampling can be extended to
    multistage sampling to accommodate sampling
    needs and requirements.
  • Different frames are used at different stages.
    It is possible to use area frames at certain
    stages and time frames at other stages. It is
    also possible to use more than one frame at any
    stage.

2
  • Different sampling procedures can be used at
    different stages, and stratification can be
    introduced at any stage.
  • Sampling rates at different stages are often
    set to accomplish certain design objectives,
    such as self-weighting, oversampling of certain
    units, reduction of the design effect, and
    accommodation of fieldwork requirements.
  • Two examples (the Current Population Survey and
    the Third National Health and Nutrition
    Examination Survey) are reviewed below.

3
  • I. Current Population Survey (CPS)
  • Background
  • The first attempt to design a large-scale
    probability sample survey was made in 1937 by
    the Works Progress Administration (WPA), and the
    first survey was conducted in 1940 to estimate
    the unemployment rate.
  • In 1942 this survey function was transferred
    to the Bureau of the Census. The design
    included 68 PSUs (primary sampling units), which
    covered 125 counties and cities. In 1945 it was
    expanded to include 25,000 households, of which
    about 21,000 were usually interviewed.

4
  • In 1954, CPS was expanded to include 230 PSUs
    covering 25,000 households, and the overall
    sampling ratio was 1/2245. In 1967, it was
    redesigned to include 449 PSUs covering nearly
    60,000 households, or over 100,000 persons 16
    years old and over. It had at least some
    coverage in every state, and the overall
    sampling ratio was 1/1170.
  • The current design, introduced in 1996,
    includes about 60,000 households from 754
    primary sampling areas.

5
  • Design objectives of CPS
  • To produce monthly unemployment estimates for
    the nation and for states. This survey is also
    used to collect current demographic data such as
    migration, school enrollment, and family size.

6
  • It is designed to maintain a CV (coefficient
    of variation) of 1.9 percent on national
    monthly estimates of the unemployment rate.
    This translates into a change of 0.2 percentage
    point in the unemployment rate being
    significant at a 90 percent confidence level.
    For each state, the design maintains a CV of at
    most 8 percent on the annual average estimate
    of unemployment level, assuming a 6 percent
    unemployment rate.
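
The link between a CV and a precision statement in percentage points can be sketched with a little arithmetic. The snippet below is only a rough illustration: it borrows the 6 percent unemployment rate quoted for the state requirement as an assumed national rate, and the function name is hypothetical. It computes the standard error implied by a 1.9 percent CV and the half-width of a 90 percent confidence interval. The threshold for a month-to-month change also depends on the correlation between monthly estimates, which the slides do not give, so this is not a derivation of the 0.2 figure.

```python
from statistics import NormalDist


def standard_error(rate_pct: float, cv_pct: float) -> float:
    """Standard error, in percentage points, implied by a CV on a rate."""
    return rate_pct * cv_pct / 100.0


# Assumed inputs: 6 percent unemployment rate, 1.9 percent CV.
se_points = standard_error(rate_pct=6.0, cv_pct=1.9)

# Two-sided 90 percent critical value of the standard normal.
z90 = NormalDist().inv_cdf(0.95)

# Half-width of a 90 percent confidence interval around the monthly rate.
half_width = z90 * se_points

print(f"SE = {se_points:.3f} points, 90% half-width = {half_width:.3f} points")
```

Under these assumptions the half-width comes out a little below 0.2 percentage point, which is the same order of magnitude as the figure quoted on the slide.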

7
  • Sample design of CPS
  • The entire area of the US, consisting of 3,141
    counties and independent cities, is divided into
    2,007 PSUs. The PSUs are grouped into 754
    strata. Of these, 428 each contain a single
    populous PSU that is selected automatically;
    these are the self-representing strata (or
    certainty strata). The remaining PSUs are
    grouped into 326 non-self-representing strata
    that are similar in several population
    characteristics. One PSU is selected from each
    non-self-representing stratum by a PPS
    (probability proportional to size) sampling
    procedure.
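
Selecting one PSU per stratum with probability proportional to size can be sketched as follows. This is a minimal illustration, not the Census Bureau's procedure: the PSU names, population sizes, and the helper `select_psu_pps` are all made up for the example.

```python
import random


def select_psu_pps(psu_sizes: dict, rng: random.Random) -> tuple:
    """Draw one PSU with probability proportional to its size.

    Returns the selected PSU name and its selection probability.
    """
    names = list(psu_sizes)
    total = sum(psu_sizes.values())
    weights = [psu_sizes[name] for name in names]
    chosen = rng.choices(names, weights=weights, k=1)[0]
    return chosen, psu_sizes[chosen] / total


# Hypothetical non-self-representing stratum with invented population sizes.
stratum = {"PSU-A": 50_000, "PSU-B": 30_000, "PSU-C": 20_000}

psu, prob = select_psu_pps(stratum, random.Random(2024))
print(psu, prob)
```

The returned probability is what later stages need: it is the PSU-level selection probability that the within-PSU sampling rate has to compensate for.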

8
  • Within-PSU sampling is based on census blocks
    that are bounded primarily by streets in urban
    areas and by other prominent physical features
    such as rivers or railroad tracks in rural
    areas. Census blocks are grouped into three
    strata: Unit (regular housing units), Group
    (group living quarters), and Area (open
    country). Blocks are sorted within strata by one
    or two census characteristics, such as the
    proportion of female heads of household and/or
    the proportion of owner-occupied households.
    Within blocks, housing units (1990 census data
    updated by building permits issued since 1990)
    are sorted geographically and grouped into
    clusters of four households.

9
  • The state-level sampling rates are determined
    based on population size and the reliability
    requirements described above. They range
    roughly from 1 in every 100 households to 1 in
    every 3,000 households. The within-PSU sampling
    rate is then determined from the PSU-level
    selection probability and the overall
    state-level sampling rate. For example, for a
    PSU with a probability of selection of 1 in 5
    and a state sampling ratio of 1/500, the
    within-PSU sampling ratio is 1 in 100.
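
The arithmetic behind this example is that the overall rate is the product of the PSU selection probability and the within-PSU rate, so the within-PSU rate is the overall rate divided by the PSU probability. A small sketch, with a hypothetical function name:

```python
from fractions import Fraction


def within_psu_rate(state_rate: Fraction, psu_prob: Fraction) -> Fraction:
    """Within-PSU rate that makes the overall rate equal to state_rate."""
    return state_rate / psu_prob


# The example from the slide: PSU selected with probability 1/5,
# state sampling ratio 1/500.
rate = within_psu_rate(Fraction(1, 500), Fraction(1, 5))
print(rate)  # 1/100

# A self-representing PSU is selected with certainty, so its
# within-PSU rate equals the state rate.
print(within_psu_rate(Fraction(1, 500), Fraction(1, 1)))  # 1/500
```

Because every household in the state ends up with the same overall probability, the state sample is self-weighting.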

10
  • A systematic sample of these clusters is
    selected using the within-PSU sampling ratio.
    Then all eligible persons in the selected
    clusters are interviewed.
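
Systematic selection of clusters from a sorted list can be sketched as below. The cluster labels, the list length, and the helper `systematic_sample` are hypothetical; the interval of 100 corresponds to a within-PSU ratio of 1 in 100.

```python
import random


def systematic_sample(units: list, interval: int, rng: random.Random) -> list:
    """Take every `interval`-th unit after a random start in [0, interval)."""
    start = rng.randrange(interval)
    return units[start::interval]


# Hypothetical PSU with 1,000 geographically sorted clusters.
clusters = [f"cluster-{i:04d}" for i in range(1000)]

sample = systematic_sample(clusters, interval=100, rng=random.Random(7))
print(len(sample), sample[:3])
```

Because the list is sorted before selection, a systematic sample spreads the selected clusters across the sort order, which acts as an implicit stratification.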

11
  • Rotation Scheme in CPS
  • The sample within each PSU is divided into 8
    subsamples or rotation groups. One subsample is
    replaced each month by taking a new sample.
  • A given subsample remains in the sample for 4
    consecutive months, leaves the sample during the
    following 8 months, and returns to the sample
    for another 4 consecutive months.
  • Under this system of rotation, 75 percent of
    the sample is common from month to month and 50
    percent from year to year for the same month.
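
The overlap percentages follow directly from the 4-8-4 pattern. The sketch below labels each rotation group by the month in which it first enters the sample (a labelling chosen only for this illustration) and counts the groups shared between two months.

```python
def groups_in_sample(month: int) -> set:
    """Rotation groups (labelled by entry month) interviewed in `month`.

    A group that enters in month m is in sample for months m..m+3,
    out for months m+4..m+11, and back in for months m+12..m+15.
    """
    first_spell = {month - k for k in range(4)}
    second_spell = {month - 12 - k for k in range(4)}
    return first_spell | second_spell


def overlap(month_a: int, month_b: int) -> float:
    """Share of month_a's rotation groups that are also in month_b."""
    a, b = groups_in_sample(month_a), groups_in_sample(month_b)
    return len(a & b) / len(a)


t = 100  # arbitrary month index
print(overlap(t, t + 1))   # 0.75
print(overlap(t, t + 12))  # 0.5
```

Six of the eight groups carry over to the next month, and four of the eight return in the same month a year later.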

12
  • Estimation in CPS
  • The non-interview adjustment is made
    separately for clusters of similar sample areas.
  • Post-stratification adjustment is made in two
    stages (at the PSU level and the household
    level) based on population census counts updated
    with postcensal births and deaths.

13
  • A composite estimation procedure is used to
    produce estimates. The composite estimate is a
    weighted average of two quantities: the current
    estimate based on the entire sample, and the
    previous month's composite estimate plus the
    monthly change estimated from the 6 rotation
    groups common to both months. A bias adjustment
    term is also added to the composite estimate to
    correct for somewhat high unemployment
    estimates for persons in their first and fifth
    months of interview. The year-to-year overlap
    in the sample would also stabilize the
    estimates, although this change is not included
    in the composite estimation procedure.
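
The structure of such a composite estimate can be sketched as a simple recursion. The function below is illustrative only: the weight `k`, the numeric inputs, and the additive form of the bias adjustment are assumptions made for the example, since the slides do not state the values or the exact formula used in CPS.

```python
def composite_estimate(
    current: float,
    prev_composite: float,
    change_common: float,
    k: float,
    bias_adjustment: float = 0.0,
) -> float:
    """Weighted average of two estimates of the current month's value.

    current         -- estimate from the entire current-month sample
    prev_composite  -- last month's composite estimate
    change_common   -- month-to-month change from the common rotation groups
    k               -- weight on the carried-forward estimate (assumed value)
    bias_adjustment -- additive correction term (assumed form)
    """
    carried_forward = prev_composite + change_common
    return (1.0 - k) * current + k * carried_forward + bias_adjustment


# Hypothetical unemployment rates in percent.
est = composite_estimate(current=6.1, prev_composite=5.9, change_common=0.1, k=0.4)
print(round(est, 2))  # 6.06
```

The carried-forward term benefits from the high month-to-month overlap, because a change measured on the same households is less variable than the difference of two independent estimates.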

14
  • References for CPS
  • Bureau of Labor Statistics,
    URL http://stats.bls.gov/cpstn.htm
  • Bureau of the Census, The Current Population
    Survey: Design and Methodology, Technical Paper
    No. 40.

15
  • II. National Health and Nutrition Examination
    Survey (NHANES)
  • Background
  • Three rounds of health examination surveys
    were conducted in the 1960s to produce
    information on the Nation's health status.
  • Beginning in 1970 a large nutrition component
    was added to the basic design, and the name was
    changed to NHANES.

16
  • A special survey of the Hispanic population
    (HHANES) was conducted in 1982-84, covering
    Mexican Americans in the Southwest, Cuban
    Americans in Florida, and Puerto Rican Americans
    in New York.
  • NHANES III was conducted in 1988-94 as the
    seventh in a series of national examination
    studies.

17
  • Survey objectives of NHANES III
  • A prevalence statistic of 10 percent should
    have a relative standard error (RSE) of less
    than 30 percent.
  • Differences of at least 10 percent in health
    or nutrition statistics between any two
    sub-domains should be detected with a type I
    error of no more than 0.05 and a type II error
    of no more than 0.10.
  • A set of 52 sub-domains is defined based on
    gender, age group, and three race-ethnic groups
    (Black, White and all other, and Hispanic).

18
  • To meet the above precision requirements, the
    sample size for each of the defined sub-domains
    was determined to be 560 or greater. Adjusting
    for the design effect, the required sample size
    is 9,000 for both black and Hispanic persons,
    and 12,000 for white and all other persons. To
    yield the required sample size, a total of
    about 40,000 persons are sampled.
  • The number of sample persons selected at each
    survey site turned out to be somewhere between
    300 and 600, with an average of approximately
    450, yielding an expected 340 examined persons.
  • The minimum time to complete fieldwork at any
    site is 4 weeks.

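
Precision requirements of this kind are usually turned into sample sizes with standard formulas. The sketch below is not the NCHS derivation: it assumes simple random sampling, a two-sided test, a worst-case proportion of 0.5, and reads "10 percent" as 10 percentage points. The function names and the `deff` parameter are introduced only for this illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(diff: float, p: float = 0.5,
                alpha: float = 0.05, beta: float = 0.10) -> int:
    """Per-group sample size to detect `diff` between two proportions near p."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided type I error
    z_beta = NormalDist().inv_cdf(1 - beta)        # power = 1 - type II error
    return ceil((z_alpha + z_beta) ** 2 * 2 * p * (1 - p) / diff ** 2)


def relative_standard_error(p: float, n: int, deff: float = 1.0) -> float:
    """RSE of an estimated proportion p from n cases with design effect deff."""
    return sqrt(deff * (1 - p) / (n * p))


n_srs = n_per_group(diff=0.10)
rse = relative_standard_error(p=0.10, n=560)
print(n_srs, round(rse, 3))
```

Under these assumptions the per-group size comes out in the low 500s, the same order as the 560 quoted on the slide, and a 10 percent prevalence estimated from 560 cases has an RSE well below the 30 percent limit before any design effect is applied.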
19
  • Sample design of NHANES III
  • The target population is the total civilian
    noninstitutionalized population, 2 months of
    age or over, in the 50 states of the US.

20
  • In the 1st stage, 81 PSUs are selected from
    2,812 PSUs defined as counties or combined
    counties. The PSUs are divided into 47 strata,
    of which 13 are certainty strata, each
    containing 1 large urban county. From each of
    the remaining 34 strata, 2 PSUs are selected by
    PPS sampling without replacement. The 13 large
    counties are rearranged into 21 survey sites by
    subdividing some large counties.
  • The 89 sample areas are randomly divided into 2
    sets. The 44 sites in the first set were
    surveyed in 1988-91 (Phase I), and the 45 sites
    in the second set were surveyed in 1991-94
    (Phase II). Each phase forms an independent
    sample.
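
The counts on this slide fit together as simple arithmetic, which the sketch below checks. The random split at the end is a plain shuffle used only for illustration; the slides say the areas were divided randomly but do not describe the actual procedure.

```python
import random

certainty_strata = 13        # each holds one large urban county
other_strata = 34            # two PSUs drawn from each
psus_per_other_stratum = 2

# 13 certainty PSUs plus 2 x 34 sampled PSUs.
sample_psus = certainty_strata + other_strata * psus_per_other_stratum

# The 13 large counties are rearranged into 21 survey sites.
sites_from_certainty = 21
survey_sites = sites_from_certainty + other_strata * psus_per_other_stratum

print(sample_psus, survey_sites)  # 81 89

# Illustrative random split into Phase I (44 sites) and Phase II (45 sites).
rng = random.Random(0)
sites = list(range(survey_sites))
rng.shuffle(sites)
phase1, phase2 = sites[:44], sites[44:]
print(len(phase1), len(phase2))  # 44 45
```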

21
  • In the 2nd stage, within each selected PSU, a
    sample of area segments (census blocks) is
    selected within strata defined by population
    density and the percent of Hispanic population.
    Various controlled selection procedures are
    used to provide a self-weighting sample for all
    sex-age sub-domains of both black and white and
    all other persons.
  • In Phase I, segments are defined based on 1980
    census data supplemented with building permits
    issued since 1980. In Phase II, segments are
    defined based on 1990 census data. There are 24
    segments selected in most sample areas.

22
  • In the 3rd stage, households and certain types
    of group quarters are selected. All households
    in the sample segments are listed, and a
    subsample of households and group quarters is
    designated for screening to identify potential
    sample persons for interviews and examinations.
    The subsampling rates are designed to produce a
    national, approximately equal-probability
    sample of households, with higher rates for the
    geographical strata with high minority
    concentrations.

23
  • In the 4th stage, eligible persons are selected
    within households. All eligible persons within
    the screened households are listed, and a
    subsample of individuals is selected based on
    sex, age, and race-ethnicity. Oversampled
    population groups include young persons,
    elderly persons, blacks, and Hispanics.

24
  • Estimation procedures in NHANES III
  • A sample weight is calculated for each
    individual in the sample as the product of
    three components: the inverse of the
    probability of selection, a nonresponse
    adjustment, and a poststratification ratio
    adjustment.
  • Separate weights are developed for the
    interviewed persons, examined persons, and
    special subsampled persons. Weights are also
    available for Phases I and II separately.
  • To facilitate variance estimation, data are
    restructured to have two PSUs in each stratum.

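
The three-component weight can be sketched as a simple product. All numbers below are invented for illustration, and the helper names are hypothetical; the only part taken from the slides is the structure (inverse selection probability times nonresponse adjustment times poststratification ratio).

```python
from math import prod


def selection_probability(stage_probs: list) -> float:
    """Overall selection probability as the product of per-stage probabilities."""
    return prod(stage_probs)


def sample_weight(selection_prob: float, nonresponse_adj: float,
                  poststrat_ratio: float) -> float:
    """Final weight: base weight times the two adjustment factors."""
    if not 0.0 < selection_prob <= 1.0:
        raise ValueError("selection_prob must be in (0, 1]")
    base_weight = 1.0 / selection_prob
    return base_weight * nonresponse_adj * poststrat_ratio


# Hypothetical probabilities for the four stages: PSU, segment,
# household, person.
pi = selection_probability([0.25, 0.1, 0.5, 0.5])

# Hypothetical adjustment factors.
weight = sample_weight(pi, nonresponse_adj=1.2, poststrat_ratio=0.95)
print(round(weight, 1))  # 182.4
```

Oversampled groups have larger selection probabilities and therefore smaller base weights, which is how the weighting undoes the deliberate oversampling at the analysis stage.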
25
  • References for NHANES III
  • NCHS, Sample Design: Third National Health and
    Nutrition Examination Survey, Vital and
    Health Statistics, Series 2, No. 113, 1992.
  • NCHS, Plan and Operation of the Third National
    Health and Nutrition Examination Survey, Vital
    and Health Statistics, Series 1, No. 32, 1994.