Title: Survey Sampling 2
- Spring 2007
- Shuaizhang Feng
Cluster Sampling
- Methods of selection in which the sampling unit, the unit of selection, contains more than one population element.
- In this part, we only talk about simple cluster sampling (each cluster contains the same number of elements; clusters are chosen at random; all elements of the selected clusters are included in the sample).
- Suppose there are A clusters in the population, of which a clusters are selected.
- Each cluster contains B elements.
- Thus, the sample size is n = aB.
- The population size is N = AB.
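- The sample mean is also the mean of the a cluster means: $\bar y = \frac{1}{aB}\sum_{\alpha=1}^{a}\sum_{\beta=1}^{B} y_{\alpha\beta} = \frac{1}{a}\sum_{\alpha=1}^{a}\bar y_\alpha$, where $y_{\alpha\beta}$ is element $\beta$ of selected cluster $\alpha$ and $\bar y_\alpha$ is the mean of that cluster.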
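More sample statistics
- Cluster means: $\bar y_\alpha = \frac{1}{B}\sum_{\beta=1}^{B} y_{\alpha\beta}$
- Cluster totals: $y_\alpha = \sum_{\beta=1}^{B} y_{\alpha\beta}$
- Between-cluster variance: $s_a^2 = \frac{1}{a-1}\sum_{\alpha=1}^{a}(\bar y_\alpha - \bar y)^2$
- Within-cluster variance: $s_b^2 = \frac{1}{a(B-1)}\sum_{\alpha=1}^{a}\sum_{\beta=1}^{B}(y_{\alpha\beta} - \bar y_\alpha)^2$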
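The total element variance, $s^2 = \frac{1}{aB-1}\sum_{\alpha=1}^{a}\sum_{\beta=1}^{B}(y_{\alpha\beta} - \bar y)^2$, can be decomposed as a weighted average of the between-cluster variance $s_a^2$ (scaled by B) and the within-cluster variance $s_b^2$: $s^2 = \frac{(a-1)\,B s_a^2 + a(B-1)\,s_b^2}{aB-1}$, where the weights $(a-1)/(aB-1)$ and $a(B-1)/(aB-1)$ sum to 1.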
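A minimal sketch of these sample statistics in Python, assuming equal-size clusters stored as a list of lists; the function name `cluster_sample_stats` and the three-cluster data are hypothetical. It also checks numerically that the total variance equals the weighted average of $B s_a^2$ and $s_b^2$.

```python
import math
from statistics import mean


def cluster_sample_stats(clusters):
    """Sample statistics for a simple cluster sample of equal-size clusters.

    `clusters` holds a lists, each with the B observed values of one
    selected cluster.
    """
    a = len(clusters)
    B = len(clusters[0])
    cluster_means = [mean(c) for c in clusters]    # ybar_alpha
    cluster_totals = [sum(c) for c in clusters]    # y_alpha
    ybar = mean(cluster_means)                     # mean of the a cluster means
    # Between-cluster variance s_a^2 (variance of the cluster means)
    s2_a = sum((m - ybar) ** 2 for m in cluster_means) / (a - 1)
    # Within-cluster variance s_b^2, pooled over the a clusters
    s2_b = sum((y - m) ** 2
               for c, m in zip(clusters, cluster_means)
               for y in c) / (a * (B - 1))
    # Total element variance s^2
    s2 = sum((y - ybar) ** 2 for c in clusters for y in c) / (a * B - 1)
    # Weighted-average form of s^2 built from B*s_a^2 and s_b^2
    weighted = ((a - 1) * B * s2_a + a * (B - 1) * s2_b) / (a * B - 1)
    return {
        "cluster_means": cluster_means,
        "cluster_totals": cluster_totals,
        "ybar": ybar,
        "s2_a": s2_a,
        "s2_b": s2_b,
        "s2": s2,
        "decomposition_holds": math.isclose(s2, weighted),
    }


stats = cluster_sample_stats([[2, 4, 6], [1, 3, 5], [7, 8, 9]])
print(stats["ybar"], stats["s2_a"], stats["s2_b"], stats["s2"])
```

For these data the cluster means are 4, 3 and 8, so $\bar y = 5$, $s_a^2 = 7$, $s_b^2 = 3$ and $s^2 = 7.5 = (2\cdot 3\cdot 7 + 3\cdot 2\cdot 3)/8$.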
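- Population statistics are defined similarly, with capital letters (e.g. the population mean $\bar Y$ and the element variance $S^2 = \frac{1}{AB-1}\sum_{\alpha=1}^{A}\sum_{\beta=1}^{B}(Y_{\alpha\beta} - \bar Y)^2$).
- Between variance: $S_a^2 = \frac{1}{A-1}\sum_{\alpha=1}^{A}(\bar Y_\alpha - \bar Y)^2$
- Within variance: $S_b^2 = \frac{1}{A(B-1)}\sum_{\alpha=1}^{A}\sum_{\beta=1}^{B}(Y_{\alpha\beta} - \bar Y_\alpha)^2$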
- Property 1: The sample mean is an unbiased estimator of the population mean, $E(\bar y) = \bar Y$.
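In terms of the variance of the estimator, the situation is exactly the same as in srs, with the a selected cluster means playing the role of a elements drawn from the A cluster means. Thus, we have $\mathrm{Var}(\bar y) = (1-f)\,\frac{S_a^2}{a}$, where $f = a/A = n/N$ and $S_a^2$ is the population between-cluster variance.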
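- Property 2: An unbiased estimator of the variance of the sample mean is $\mathrm{var}(\bar y) = (1-f)\,\frac{s_a^2}{a}$, where $s_a^2$ is the sample between-cluster variance.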
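Both properties can be checked exactly on a small hypothetical population by enumerating every possible sample of a clusters, all equally likely under simple random sampling of clusters without replacement. The population values below are made up for illustration.

```python
import math
from itertools import combinations
from statistics import mean

# Hypothetical population: A = 5 clusters of B = 3 elements each.
population = [[3, 5, 4], [8, 6, 7], [2, 2, 5], [9, 7, 8], [4, 6, 5]]
A, B, a = len(population), len(population[0]), 2
f = a / A

Ybar = mean(y for c in population for y in c)                  # population mean
cluster_means = [mean(c) for c in population]                   # Ybar_alpha
S2_a = sum((m - Ybar) ** 2 for m in cluster_means) / (A - 1)   # between variance
true_var = (1 - f) * S2_a / a                                   # Var(ybar)

# Every possible sample of a clusters.
samples = list(combinations(cluster_means, a))
ybars = [mean(s) for s in samples]
# var(ybar) = (1 - f) * s_a^2 / a, computed for each sample
var_hats = [(1 - f) * (sum((m - mean(s)) ** 2 for m in s) / (a - 1)) / a
            for s in samples]

expected_ybar = mean(ybars)                              # E(ybar)
actual_var = mean((yb - Ybar) ** 2 for yb in ybars)      # exact sampling variance
expected_var_hat = mean(var_hats)                        # E(var(ybar))

print(expected_ybar, Ybar)
print(actual_var, true_var, expected_var_hat)
```

Averaging over all 10 possible samples, the mean of $\bar y$ equals $\bar Y = 5.4$, and both the exact sampling variance and the average of the estimator equal $(1-f)S_a^2/a = 0.6 \cdot 4.3/2 = 1.29$.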
Note
- This simple case of cluster sampling is very similar to srs of elements.
- The precision of the estimator depends only on the between-cluster variance. Thus, when forming clusters, we want to minimize the between variance or, equivalently (since the total variance is fixed), maximize the within variance.
- Unfortunately, in many cases clusters are formed naturally, for example counties, classes, etc.
Comparing cluster sampling with element sampling
- The cost per element is lower
- The element variance is higher
- The costs and problems of statistical analysis
are greater
Coefficient of intraclass correlation, roh (rate of homogeneity)
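One common way to relate roh to the population variances, assumed here under the usual large-A approximations: writing $S^2$ for the element variance, $S_a^2$ for the variance among the A cluster means and $S_b^2$ for the within-cluster variance,

```latex
B\,S_a^2 \approx S^2\,[1 + (B-1)\,\mathrm{roh}],
\qquad
S_b^2 \approx S^2\,(1 - \mathrm{roh}),
\qquad\text{so}\qquad
\mathrm{roh} \approx 1 - \frac{S_b^2}{S^2}.
```

A value near 1 means elements in the same cluster are nearly alike; a value near 0 means clusters behave like random groups of elements.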
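- Note that the variance of the estimator can be expressed in terms of roh. If A is large, then $\mathrm{Var}(\bar y) \approx (1-f)\,\frac{S^2}{aB}\,[1 + (B-1)\,\mathrm{roh}]$, where $S^2$ is the population element variance.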
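Design effect of cluster sampling
- deff is the ratio of $\mathrm{Var}(\bar y)$ to the variance $(1-f)S^2/n$ of an srs of the same size n = aB: $\mathrm{deff} = \frac{B S_a^2}{S^2} \approx 1 + (B-1)\,\mathrm{roh}$ for large A.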
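A sketch comparing the exact design effect $B S_a^2/S^2$ (the ratio of $(1-f)S_a^2/a$ to the srs variance $(1-f)S^2/(aB)$, which share the same f) with $1 + (B-1)\,\mathrm{roh}$, where roh is taken as $1 - S_b^2/S^2$ (the large-A approximation, assumed here). The small population is made up and `design_effect` is a hypothetical helper.

```python
import math
from statistics import mean, variance


def design_effect(population):
    """Exact deff of simple cluster sampling and the large-A roh.

    `population` is a list of A clusters, each a list of B values.
    """
    A, B = len(population), len(population[0])
    values = [y for cluster in population for y in cluster]
    S2 = variance(values)                                # element variance, divisor AB - 1
    S2_a = variance([mean(c) for c in population])       # between variance, divisor A - 1
    S2_b = sum(variance(c) for c in population) / A      # within variance, divisor A(B - 1)
    deff = B * S2_a / S2     # [(1-f) S_a^2 / a] / [(1-f) S^2 / (aB)]
    roh = 1 - S2_b / S2      # approximate rate of homogeneity (large A)
    return A, B, deff, roh


# Hypothetical population: A = 5 clusters of B = 3 elements each.
A, B, deff, roh = design_effect([[3, 5, 4], [8, 6, 7], [2, 2, 5], [9, 7, 8], [4, 6, 5]])
approx = 1 + (B - 1) * roh
print(round(deff, 3), round(approx, 3), round(deff - approx, 3))
# The gap equals (deff - 1) / A, so the approximation improves as A grows.
assert math.isclose(deff - approx, (deff - 1) / A)
```

The gap of exactly $(\mathrm{deff}-1)/A$ follows from the population version of the variance decomposition. Here deff is about 2.75 and roh about 0.70: these clusters are very homogeneous.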
Discussion
- What does deff = 0 mean?
- Show that roh can only take values in the range [-1/(B-1), 1].
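A sketch of both answers, using the exact design effect $\mathrm{deff} = B S_a^2/S^2$ together with the large-A relations $B S_a^2 \approx S^2[1+(B-1)\,\mathrm{roh}]$ and $S_b^2 \approx S^2(1-\mathrm{roh})$, where $S^2$, $S_a^2$ and $S_b^2$ are the population element, between and within variances (so the bounds on roh hold approximately):

```latex
\begin{aligned}
\mathrm{deff} = \frac{B\,S_a^2}{S^2} = 0
  &\iff S_a^2 = 0
  \iff \bar Y_\alpha = \bar Y \ \text{for every cluster } \alpha,\\
S_a^2 \ge 0
  &\;\Rightarrow\; 1 + (B-1)\,\mathrm{roh} \ge 0
  \;\Rightarrow\; \mathrm{roh} \ge -\frac{1}{B-1},\\
S_b^2 \ge 0
  &\;\Rightarrow\; 1 - \mathrm{roh} \ge 0
  \;\Rightarrow\; \mathrm{roh} \le 1.
\end{aligned}
```

So deff = 0 means every cluster has the population mean, and any sample of clusters reproduces $\bar Y$ exactly.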
Systematic Sampling
What is systematic sampling?
- Taking every K-th sampling unit after a random start
- The most widely known selection procedure
- Sometimes called pseudo-random selection
- Easy to use in practice
Let the population size be N. The population can be arranged in A rows and K columns, with units 1 to K in the first row, units K + 1 to 2K in the second row, and so on.
Choose a random number r from 1 to K (in this example, r = 2). Starting from unit r, select every K-th unit: r, r + K, r + 2K, ..., r + (A - 1)K. The sample is column r of the arrangement.
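A minimal sketch of the selection rule, assuming the units are already in list order and that N is an exact multiple of K; `arrange` and `systematic_sample` are hypothetical helper names. It confirms that the systematic sample with start r is simply column r of the A-by-K arrangement.

```python
import random


def arrange(units, K):
    """Lay the N = A*K units out row by row: A rows of K columns."""
    return [units[i:i + K] for i in range(0, len(units), K)]


def systematic_sample(units, K, r):
    """Units r, r + K, r + 2K, ... (r is a 1-based random start in 1..K)."""
    if not 1 <= r <= K:
        raise ValueError("the random start r must lie in 1..K")
    return units[r - 1::K]


units = list(range(1, 21))    # N = 20 hypothetical unit labels
K = 5                         # sampling interval, so A = N / K = 4
r = random.randint(1, K)      # random start
sample = systematic_sample(units, K, r)

# The systematic sample is exactly column r of the A-by-K grid.
assert sample == [row[r - 1] for row in arrange(units, K)]
print(r, sample)
```

With r = 2, as in the example, the sample is units 2, 7, 12 and 17.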
- Population size N = AK
- Sample size n = A
- Sampling properties depend on how the population units are listed.
Special case: the simple random model
- Population units are listed in random order.
- Think of playing poker: why do you always insist on shuffling the cards thoroughly?
- In this case, we can treat systematic sampling as stratified sampling if we treat each row as a stratum.
- Alternatively, we can treat it as cluster sampling if we treat each column as a cluster.
The simple way to go
- Since units are randomly ordered in the population, any selection rule results in an srs, hence all the srs results apply.
- Alternatively, we could do it the hard way.
Treating it as stratified sampling
- There are A strata (the rows), and from each stratum we take only one sampling unit (element).
- The sample mean is an unbiased estimator of the population mean.
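Following the stratified sampling formula with one element per stratum, $\mathrm{Var}(\bar y) = \frac{1-f}{A}\cdot\frac{1}{A}\sum_{h=1}^{A} S_h^2$, where $f = 1/K = n/N$ and $S_h^2$ is the variance within stratum (row) h. Note that the result is the same as srs! (Why? Under the simple random model each within-stratum variance $S_h^2$ is about the same as the total variance $S^2$, so $\mathrm{Var}(\bar y) \approx (1-f)\,S^2/n$.) Note that the within-stratum variance cannot be estimated directly, since there is only one element per stratum. This is not a problem here under the simple random model.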
Treating it as cluster sampling
- There are K clusters (the columns), and we choose only one of them (cluster r).
- The sample mean is an unbiased estimator of the population mean.
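With a single cluster chosen out of K, the cluster sampling formula gives $\mathrm{Var}(\bar y) = \left(1 - \frac{1}{K}\right) S_a^2 = \frac{1}{K}\sum_{r=1}^{K}(\bar Y_r - \bar Y)^2$, where $\bar Y_r$ is the mean of cluster (column) r and $S_a^2$ is the between-cluster variance. (Why?) What is the relationship of the between-cluster variance to the total variance? Note again that the between-cluster variance cannot be estimated directly from the sample.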
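For the simple random model, both treatments give the srs result: $\mathrm{Var}(\bar y) \approx (1-f)\,S^2/n$, with $f = n/N = 1/K$.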
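A small simulation sketch of this claim, assuming the simple random model means the list order is a random permutation of the population. For each shuffled list it computes the exact variance of the systematic-sample mean over the K possible starts, $\frac{1}{K}\sum_r(\bar Y_r - \bar Y)^2$, and averages over many shuffles; the population 1, ..., 60, the interval K = 6 and the seed are arbitrary choices. The last line shows the opposite case, a sorted list, where the variance falls far below the srs value, a reminder that sampling properties depend on how units are listed.

```python
import random
from statistics import mean, pvariance, variance


def systematic_variance(units, K):
    """Exact Var(ybar) of a 1-in-K systematic sample, over the K possible starts."""
    column_means = [mean(units[r::K]) for r in range(K)]   # Ybar_r for each start
    return pvariance(column_means)                          # (1/K) * sum (Ybar_r - Ybar)^2


rng = random.Random(2007)
values = list(range(1, 61))      # hypothetical population, N = 60
K = 6
n = len(values) // K             # n = A = 10
f = n / len(values)              # f = 1/K
srs_var = (1 - f) * variance(values) / n   # (1 - f) S^2 / n

sims = []
for _ in range(5000):
    rng.shuffle(values)          # simple random model: random list order
    sims.append(systematic_variance(values, K))

print(mean(sims), srs_var)                          # close to each other
print(systematic_variance(list(range(1, 61)), K))   # sorted list: far smaller
```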
Paired Selections
- Choose the interval (K/2) carefully, so that two random selections are made in each stratum.
- This allows us to calculate the within-stratum variance (or, equivalently, the between-cluster variance), and thus to drop the simple random model assumption.
- In the following example, we have A strata (each run of K consecutive elements forms a stratum), and within each stratum we choose 2 elements. Thus N = AK and n = 2A.
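A sketch of drawing paired selections, assuming (as the next slide does) that the two selections within each stratum of K consecutive units can be viewed as a simple random sample of 2; this sketch draws them independently at random in each stratum, and `paired_selection` is a hypothetical name.

```python
import random


def paired_selection(units, K, rng=random):
    """Return A pairs (y_h1, y_h2): two distinct random units from each stratum.

    Strata are consecutive runs of K units, so len(units) must equal A * K.
    """
    if len(units) % K:
        raise ValueError("N must be a multiple of K")
    pairs = []
    for start in range(0, len(units), K):
        stratum = units[start:start + K]
        pairs.append(tuple(rng.sample(stratum, 2)))   # srs of 2 within stratum h
    return pairs


pairs = paired_selection(list(range(1, 21)), K=5, rng=random.Random(1))
print(pairs)   # A = 4 pairs, so n = 2A = 8
```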
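- Sample mean: $\bar y = \frac{1}{2A}\sum_{h=1}^{A}(y_{h1} + y_{h2})$, where $y_{h1}$ and $y_{h2}$ are the two selections from stratum h.
- If the selection within each stratum can be viewed as random, then this is an unbiased estimator of the population mean.
- Why?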
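- Variance of the sample mean (following the stratified sampling formula with two selections per stratum): $\mathrm{var}(\bar y) = \frac{1-f}{4A^2}\sum_{h=1}^{A}(y_{h1} - y_{h2})^2$ for the pair $(y_{h1}, y_{h2})$ selected in stratum h, with $f = n/N = 2/K$.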
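These two formulas can be sketched as a small function; `paired_estimates` and the three pairs are hypothetical. Each stratum contributes $W_h^2(1-f)s_h^2/2$ with $W_h = 1/A$ and $s_h^2 = (y_{h1}-y_{h2})^2/2$, which sums to the squared-difference form.

```python
def paired_estimates(pairs, K):
    """Sample mean and its estimated variance under paired selection.

    pairs: one (y_h1, y_h2) tuple per stratum of K units, so f = 2/K.
    """
    A = len(pairs)
    f = 2 / K
    ybar = sum(y1 + y2 for y1, y2 in pairs) / (2 * A)
    var = (1 - f) / (4 * A ** 2) * sum((y1 - y2) ** 2 for y1, y2 in pairs)
    return ybar, var


ybar, var = paired_estimates([(1, 3), (2, 2), (5, 1)], K=10)
print(ybar, var)
```

Here $\bar y = 14/6 \approx 2.33$ and $\mathrm{var}(\bar y) = 0.8 \cdot 20/36 \approx 0.44$.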
What happens with blanks?
- Suppose the six pairs are (y x) (x y) (x x) (y y) (x y) (x y),
- where y stands for a real value and x stands for a blank.
- Collapsed strata: instead of 6 strata, we can have 2: (y x x y x x) (y y x y x y)
- Combined strata: take the first selections as one cluster and the second selections as the other cluster: (y x x y x x) (x y x y y y)
Replicated Sampling
- Take m systematic samples: within each interval (stratum), the selected positions are given by a random choice of m numbers.
- Note that this is different from paired selection, even when m = 2.
- Advantages: think of panel data! You have variation in two dimensions, so if one does not work well, you still have the other.
In class: same example as before, but m random numbers are selected from 1 to K. Calculate the sample mean and the variance of the sample mean.
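A sketch of one way to do the exercise, assuming the m random starts are drawn without replacement from 1 to K, so the m replicates form an srs of m of the K possible systematic samples (clusters). Then $\bar y$ is the mean of the replicate means $\bar y_j$, and $\mathrm{var}(\bar y) = (1-f)\sum_j(\bar y_j - \bar y)^2/[m(m-1)]$ with $f = m/K$. The function name and data are hypothetical.

```python
import random
from statistics import mean, variance


def replicated_estimates(units, K, starts):
    """Mean and variance estimate from m systematic replicates.

    Each start r (1..K) defines one replicate: units r, r + K, r + 2K, ...
    """
    m = len(starts)
    f = m / K
    replicate_means = [mean(units[r - 1::K]) for r in starts]   # ybar_j
    ybar = mean(replicate_means)
    # (1 - f) * sum (ybar_j - ybar)^2 / (m (m - 1))
    var = (1 - f) * variance(replicate_means) / m
    return ybar, var


units = list(range(1, 13))                   # hypothetical population, N = 12
K, m = 4, 2                                  # A = 3 strata of K = 4 units
starts = random.sample(range(1, K + 1), m)   # m distinct random starts
print(starts, replicated_estimates(units, K, starts))
```

With starts 1 and 3, the replicate means are 5 and 7, so $\bar y = 6$ and $\mathrm{var}(\bar y) = 0.5 \cdot 2/2 = 0.5$.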
Next week
Homework
- Sampling design of PSID
- Sampling design of NLS
- Sampling design of HRS
- Three random groups.
- Due in two weeks (class presentation).
- Each group should present for at least 30 minutes.
References
- Leslie Kish, Survey Sampling, John Wiley & Sons, Inc.