Survey Sampling 2 - PowerPoint PPT Presentation

Provided by: MCSY4

1
Survey Sampling (2)
  • Spring 2007
  • Shuaizhang Feng

2
  • Cluster Sampling

3
Cluster Sampling
  • Methods of selection in which the sampling unit,
    the unit of selection, contains more than one
    population element.
  • In this part, we discuss only simple cluster
    sampling: each cluster contains the same number of
    elements; clusters are chosen randomly; and all
    elements of the selected clusters are included in
    the sample.

4
  • Suppose there are A clusters in the population,
    of which a clusters are selected.
  • Each cluster contains B elements.
  • Thus, the sample size is n = aB.
  • The population size is N = AB.

5
  • Sample mean is also the mean of the a cluster
    means.
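A minimal sketch of this identity, assuming equal cluster sizes as in simple cluster sampling. The function names and the data are made up for illustration.

```python
from statistics import mean


def cluster_sample_mean(clusters):
    """Element-level mean over all n = a*B sampled elements."""
    a = len(clusters)
    B = len(clusters[0])
    if any(len(c) != B for c in clusters):
        raise ValueError("simple cluster sampling assumes equal cluster sizes")
    n = a * B  # sample size n = aB
    return sum(sum(c) for c in clusters) / n


def mean_of_cluster_means(clusters):
    """Average of the a cluster means."""
    return mean(mean(c) for c in clusters)


# Hypothetical sample: a = 3 selected clusters, B = 3 elements each.
sample = [[2, 4, 6], [1, 1, 4], [5, 7, 9]]
```

Because every cluster has the same size B, both functions return the same number for any such sample.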

6
More sample stats
  • Cluster means
  • Cluster totals
  • Between-cluster variance
  • Within-cluster variance

7
Total variance can be decomposed as a weighted
average of the between and within variances
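A minimal sketch of the decomposition in terms of sums of squares, which holds whatever divisors are used to turn them into variances. The divisor conventions in the comments are an assumption, not taken from the slide, and the function name is hypothetical.

```python
def sums_of_squares(clusters):
    """Return (total, between, within) sums of squares for equal-size clusters."""
    A = len(clusters)
    B = len(clusters[0])
    N = A * B
    grand_mean = sum(sum(c) for c in clusters) / N
    cluster_means = [sum(c) / B for c in clusters]

    total = sum((y - grand_mean) ** 2 for c in clusters for y in c)
    # Each cluster mean stands for B elements, hence the factor B.
    between = B * sum((m - grand_mean) ** 2 for m in cluster_means)
    within = sum((y - m) ** 2
                 for c, m in zip(clusters, cluster_means) for y in c)
    return total, between, within


# Made-up data: A = 3 clusters, B = 3 elements each.
data = [[2, 4, 6], [1, 1, 4], [5, 7, 9]]
total, between, within = sums_of_squares(data)
# Assumed conventions: dividing by N - 1, A - 1 and A(B - 1) respectively
# gives (N - 1) S^2 = B (A - 1) S_a^2 + A (B - 1) S_b^2.
```

The identity total = between + within is what makes the total variance a weighted combination of the between and within variances.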
8
  • Population stats can be defined similarly.
  • Between variance
  • Within variance

9
  • Property 1: The sample mean is an unbiased
    estimator of the population mean.

10
In terms of the variance of the estimator, the
situation is exactly the same as in srs of a units
out of A, with cluster means in place of element
values. Thus, we have
11
  • Property 2: An unbiased estimator of the variance
    of the sample mean is
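A minimal sketch that checks Properties 1 and 2 by brute force on a tiny made-up population. It assumes the standard srs formulas applied to cluster means: var(ybar) = (1 - a/A) S_a^2 / a with S_a^2 = sum (Ybar_alpha - Ybar)^2 / (A - 1), estimated by replacing S_a^2 with its sample analogue s_a^2. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def check_cluster_properties(population, a):
    """Enumerate every sample of a clusters (a >= 2) and compare with the formulas."""
    A = len(population)
    B = len(population[0])
    cluster_means = [Fraction(sum(c), B) for c in population]
    pop_mean = sum(cluster_means) / A
    f = Fraction(a, A)  # sampling fraction

    S_a2 = sum((m - pop_mean) ** 2 for m in cluster_means) / (A - 1)
    formula_var = (1 - f) * S_a2 / a

    means, var_estimates = [], []
    for chosen in combinations(cluster_means, a):
        ybar = sum(chosen) / a
        s_a2 = sum((m - ybar) ** 2 for m in chosen) / (a - 1)
        means.append(ybar)
        var_estimates.append((1 - f) * s_a2 / a)

    k = len(means)
    expected_mean = sum(means) / k
    true_var = sum((m - expected_mean) ** 2 for m in means) / k
    expected_var_estimate = sum(var_estimates) / k
    return pop_mean, expected_mean, true_var, formula_var, expected_var_estimate


# Made-up population: A = 4 clusters of B = 2 integer values.
population = [[1, 3], [2, 6], [5, 5], [0, 4]]
result = check_cluster_properties(population, a=2)
```

With exact fractions, the expected sample mean equals the population mean, and both the true variance and the average variance estimate equal the formula value.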

12
Note
  • This simple case of cluster sampling is very
    similar to srs of elements.
  • The precision of the estimator depends on the
    between-cluster variance only. Thus, when forming
    clusters, we want to minimize the between variance
    or, equivalently, maximize the within variance.
  • Unfortunately, in many cases clusters are
    naturally formed, for example counties, classes,
    etc.

13
Comparing cluster sampling with element sampling
  • The cost per element is lower
  • The element variance is higher
  • The costs and problems of statistical analysis
    are greater

14
Coefficient of intraclass correlation, Roh (rate
of homogeneity)
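A minimal sketch of one common definition of the intraclass correlation, assumed here because it may differ in its divisors from the one on the slide: the average product of deviations over all ordered pairs of distinct elements in the same cluster, divided by the population variance with divisor N. Function names are hypothetical.

```python
from fractions import Fraction


def roh(population):
    """Pairwise intraclass correlation for A clusters of B integer values."""
    A = len(population)
    B = len(population[0])
    N = A * B
    grand_mean = Fraction(sum(sum(c) for c in population), N)
    sigma2 = sum((y - grand_mean) ** 2 for c in population for y in c) / N
    # Products of deviations for ordered pairs of distinct elements
    # within the same cluster; sigma2 must be nonzero.
    cross = sum((c[i] - grand_mean) * (c[j] - grand_mean)
                for c in population
                for i in range(B) for j in range(B) if i != j)
    return cross / (A * B * (B - 1) * sigma2)


def between_variance(population):
    """Variance of the cluster means around the grand mean, divisor A."""
    A = len(population)
    B = len(population[0])
    means = [Fraction(sum(c), B) for c in population]
    grand_mean = sum(means) / A
    return sum((m - grand_mean) ** 2 for m in means) / A
```

Under this definition the between variance equals (sigma^2 / B) * [1 + (B - 1) * roh] exactly. Perfectly homogeneous clusters such as [[1, 1], [3, 3], [5, 5]] give roh = 1, and clusters with identical means such as [[1, 3], [1, 3]] give the lower bound -1/(B - 1).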
16
  • Note that the variance of the estimator can be
    expressed in terms of Roh. If A is large, then

17
Design effect of cluster sampling
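A minimal sketch of the design effect, assuming it is defined as the variance of the cluster-sample mean divided by the variance of the mean of an srs of the same size n = aB. Under that assumption the finite population corrections cancel and deff = B * S_a^2 / S^2, which is close to 1 + (B - 1) * roh when A is large. Function names are hypothetical.

```python
from fractions import Fraction


def deff_exact(population):
    """deff = B * S_a^2 / S^2 for A clusters of B integer values."""
    A = len(population)
    B = len(population[0])
    N = A * B
    grand_mean = Fraction(sum(sum(c) for c in population), N)
    means = [Fraction(sum(c), B) for c in population]
    S_a2 = sum((m - grand_mean) ** 2 for m in means) / (A - 1)
    S2 = sum((y - grand_mean) ** 2 for c in population for y in c) / (N - 1)
    return B * S_a2 / S2


def deff_approx(roh, B):
    """Large-A approximation: deff is about 1 + (B - 1) * roh."""
    return 1 + (B - 1) * roh


# Made-up example: perfectly homogeneous clusters, so roh = 1.
homogeneous = [[1, 1], [3, 3], [5, 5]]
```

For example, `deff_approx(Fraction(1, 20), 20)` is 39/20: with 20 elements per cluster, even a roh of 0.05 nearly doubles the variance relative to srs.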
18
Discussions
  • What does deff = 0 mean?
  • Show that roh can only take values in the range
    [-1/(B-1), 1].

19
Systematic Sampling
20
What is systematic sampling?
  • Taking every k-th sampling unit after a random
    start
  • Most widely known selection procedure
  • Sometimes called Pseudo-random selection
  • Easy to use in practice

21
Let the population size be N. It can be arranged
as follows (A rows, K columns)
22
Choose a random number r from 1 to K (in this
example, r = 2). Starting from unit r, select
every K-th sampling unit.

23
  • Population size N = AK
  • Sample size n = A
  • Sampling properties depend on how population
    units are listed.
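A minimal sketch of the selection procedure, assuming the list length is an exact multiple of K. The function name and the `rng` parameter are hypothetical.

```python
import random


def systematic_sample(units, K, rng=random):
    """Pick a random start r in 1..K, then every K-th unit after it."""
    if len(units) % K != 0:
        raise ValueError("this sketch assumes N = A * K exactly")
    r = rng.randint(1, K)        # random start, 1-based, both ends inclusive
    sample = units[r - 1::K]     # units r, r + K, r + 2K, ...
    return r, sample


# Usage: N = 12 units listed in order, interval K = 4, so n = A = 3.
frame = list(range(1, 13))
start, chosen = systematic_sample(frame, K=4, rng=random.Random(0))
```

Only the start is random. Once r is drawn, the whole sample is determined by how the units are listed.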

24
Special Case: Simple Random Model
  • Population units are listed randomly.
  • Think of playing poker games. Why do you always
    insist on shuffling the cards thoroughly?
  • In this case, we can treat systematic sampling as
    stratified sampling if we treat each row as a
    stratum.
  • Alternatively, we can treat it as cluster
    sampling if we treat each column as a cluster.

25
The simple way to go
  • Since units are randomly distributed in the
    population, any selection rule will result in an
    srs; hence all the srs results apply.
  • Alternatively, we could do it the hard way.

26
Treating it as stratified sampling
  • There are A strata, from each stratum we take
    only one sampling unit (element).
  • The sample mean is an unbiased estimator of the
    population mean.

27
Note that the result is the same as srs! (Why?
The within-stratum variance is the same as the
total variance.)
Note that the within variance cannot be estimated
directly, since there is only one element per
stratum. This is not a problem here under the
simple random model.
28
Treating it as cluster sampling
  • There are K clusters. We choose only one cluster
    (cluster r) from them.
  • The sample mean is an unbiased estimator of the
    population mean.

29
(Why?) What is the relationship between the
between-cluster variance and the total variance?
Note again that the between-cluster variance
cannot be estimated directly from the sample.
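A minimal sketch of the cluster view: it lists all K possible systematic samples of a made-up ordered population and computes the exact sampling distribution of the sample mean. The function name is hypothetical.

```python
from fractions import Fraction


def systematic_distribution(units, K):
    """Means of the K possible systematic samples, their expectation and variance."""
    N = len(units)
    if N % K != 0:
        raise ValueError("this sketch assumes N = A * K exactly")
    A = N // K
    pop_mean = Fraction(sum(units), N)
    # Column r (0-based start) is one cluster of A integer-valued elements.
    sample_means = [Fraction(sum(units[r::K]), A) for r in range(K)]
    expected = sum(sample_means) / K
    # Each cluster is selected with probability 1/K, so the variance of the
    # sample mean is the between-cluster variance with divisor K.
    variance = sum((m - pop_mean) ** 2 for m in sample_means) / K
    return pop_mean, sample_means, expected, variance


pop_mean, sample_means, expected, variance = systematic_distribution(
    list(range(1, 13)), K=4)
```

The variance depends on all K cluster means, but a single systematic sample shows only one of them, which is why it cannot be estimated from the sample without a model.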
30
For simple random model
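A minimal sketch of what the simple random model implies, on a tiny made-up population: averaging the systematic-sampling variance over every possible listing order reproduces the srs variance (1 - n/N) S^2 / n. Function names are hypothetical, and the full enumeration is only feasible for very small N.

```python
from fractions import Fraction
from itertools import permutations


def average_systematic_variance(values, K):
    """Average, over all N! listing orders, of the variance of the systematic mean."""
    N = len(values)
    A = N // K
    pop_mean = Fraction(sum(values), N)
    total = Fraction(0)
    count = 0
    for order in permutations(values):
        means = [Fraction(sum(order[r::K]), A) for r in range(K)]
        total += sum((m - pop_mean) ** 2 for m in means) / K
        count += 1
    return total / count


def srs_variance(values, n):
    """Variance of the mean of an srs of n elements drawn without replacement."""
    N = len(values)
    pop_mean = Fraction(sum(values), N)
    S2 = sum((y - pop_mean) ** 2 for y in values) / (N - 1)
    return (1 - Fraction(n, N)) * S2 / n


values = [1, 2, 4, 7, 8, 11]   # N = 6, K = 3, so n = A = 2
```

For a particular ordering the systematic variance can be larger or smaller than the srs value. The equality holds only on average over random orderings.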
31
Paired Selections
  • Choose the interval K/2 carefully, so that two
    random selections are made in each stratum.
  • This allows us to calculate the within-stratum
    variance (or, equivalently, the between-cluster
    variance), and thus to drop the simple random
    model assumption.
  • In the following example, we have A strata (every
    K elements constitute a stratum); within each
    stratum we choose 2 elements. Thus N = AK and
    n = 2A.

33
  • Sample mean
  • If the selection within each stratum can be
    viewed as random, then this is an unbiased
    estimator of population mean.
  • Why?

34
  • Variance of sample mean (following the stratified
    sampling formula).
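A minimal sketch of the calculation, assuming A strata of K elements with two random selections each. The stratified formula with stratum weights 1/A, stratum sample size 2 and s_h^2 = (y_h1 - y_h2)^2 / 2 then reduces to var(ybar) = (1 - f) * sum (y_h1 - y_h2)^2 / n^2, with f = 2/K and n = 2A. The function name and data are made up.

```python
from fractions import Fraction


def paired_selection_estimates(pairs, K):
    """Sample mean and its estimated variance from two selections per stratum."""
    A = len(pairs)          # number of strata
    n = 2 * A               # sample size
    f = Fraction(2, K)      # sampling fraction, the same in every stratum
    ybar = Fraction(sum(y1 + y2 for y1, y2 in pairs), n)
    var_ybar = (1 - f) * sum((y1 - y2) ** 2 for y1, y2 in pairs) / n ** 2
    return ybar, var_ybar


# Made-up sample: A = 3 strata of K = 10 elements, integer observations.
pairs = [(2, 4), (5, 5), (1, 7)]
ybar, var_ybar = paired_selection_estimates(pairs, K=10)
```

Only the within-pair differences enter the variance estimate, so differences between strata do not inflate it.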

35
What happens with blanks?
  • Suppose we have (yx) (xy) (xx) (yy) (xy) (xy)
  • Where y stands for a real value, x stands for a
    blank.
  • Collapsed strata: instead of 6 strata, we can
    have 2
  • (yxxyxx) (yyxyxy)
  • Combined strata: take the first selections as one
    cluster and the second selections as the other
    cluster
  • (y) (x) (x) (y) (x) (x) and (x) (y) (x) (y)
    (y) (y)

36
Replicated Sampling
  • Take m systematic samples. Within each interval
    (stratum), random choice of m numbers.
  • Note that this is different from paired selection
    even when m = 2.
  • Advantages of doing this: think of panel data!
    You have variation in two dimensions, so if one
    does not work well, you still have the other.

37
In class: same example as before, but m random
numbers are selected from 1 to K. Calculate the
sample mean and the variance of the sample mean.
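A minimal sketch of one way to do this calculation, assuming the m random starts are drawn without replacement, so that the m systematic samples form an srs of m clusters out of K. The function name and the `rng` parameter are hypothetical.

```python
import random
from fractions import Fraction


def replicated_systematic(units, K, m, rng=random):
    """m systematic replicates (m >= 2): overall mean and its estimated variance."""
    N = len(units)
    if N % K != 0:
        raise ValueError("this sketch assumes N = A * K exactly")
    A = N // K
    starts = rng.sample(range(K), m)   # 0-based starts, without replacement
    rep_means = [Fraction(sum(units[r::K]), A) for r in starts]
    ybar = sum(rep_means) / m
    # Variation among replicate means estimates the between-cluster variance.
    s2 = sum((x - ybar) ** 2 for x in rep_means) / (m - 1)
    var_ybar = (1 - Fraction(m, K)) * s2 / m
    return starts, ybar, var_ybar


frame = list(range(1, 13))             # made-up population, N = 12, K = 4
starts, ybar, var_ybar = replicated_systematic(
    frame, K=4, m=2, rng=random.Random(1))
```

Unlike a single systematic sample, the replicates give a variance estimate directly from the sample. With m = K every unit is selected and the estimated variance is zero.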
38
Next week
  • Sample design of CPS

39
Homework
  • Sampling design of PSID
  • Sampling design of NLS
  • Sampling design of HRS
  • Three random groups.
  • Due in two weeks. (Class presentation)
  • Each group should present at least 30 mins.

40
References
  • Leslie Kish, Survey Sampling, John Wiley & Sons,
    Inc.