Title: Sample Size Considerations
1- Sample Size Considerations
- Basic ideas
- The sample size question is a precise question, but there is no precise answer.
- The choice of sample size is made considering statistical precision, practical issues, and available resources (cost and time).
2- Formal approach
- The formal approach starts with the specification of the precision (reliability) of the estimate.
- The precision can be specified in terms of the standard error of the estimate. But, to facilitate effective communication with non-statisticians, we usually specify the precision in terms of the margin of error to be tolerated. The margin of error is half the width of the confidence interval, as when we say that the estimate is accurate to within ±5%.
3- It may be convenient to determine the sample size ignoring the finite population correction term (use the approximate formula in the text) and to adjust for the finite population correction later if necessary.
- The margin of error can be specified either in absolute terms or in relative terms. The text uses the latter strategy.
4- 1. Absolute error approach
Let the margin of error be d (an absolute amount). In statistical terms,
$d = z\,\sigma/\sqrt{n'}$ ($\sigma^2$ = population variance).
Solving for n', we have $n' = (z\sigma/d)^2$. For a proportion P (with Q = 1 − P), we have $n' = z^2PQ/d^2$.
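A minimal Python sketch of the two absolute-error formulas, assuming z, σ (or P), and d are supplied directly; the function names are illustrative, not from the text. Results are left unrounded so the caller can decide how to round.

```python
def n_mean_absolute(z: float, sigma: float, d: float) -> float:
    """Preliminary size n' = (z * sigma / d)^2 for estimating a mean."""
    return (z * sigma / d) ** 2


def n_prop_absolute(z: float, p: float, d: float) -> float:
    """Preliminary size n' = z^2 * P * Q / d^2 for estimating a proportion."""
    q = 1.0 - p
    return z ** 2 * p * q / d ** 2


# Hypothetical inputs: z = 2, sigma = 100, margin of error d = 10
print(n_mean_absolute(2, 100, 10))  # 400.0
```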
5- For z = 2 and P = 0.5:
- d = 0.01, n = 10,000
- d = 0.02, n = 2,500
- d = 0.03, n = 1,111
- d = 0.04, n = 625
- d = 0.05, n = 400
- Most opinion surveys use n of 1,000 to 1,500, and the margin of error is about 3 percentage points.
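The table can be reproduced with a short Python sketch, assuming z = 2 and P = 0.5 as on the slide. Values are rounded to the nearest integer to match the slide (rounding up would be the conservative choice in practice). The second loop inverts the formula, $d = z\sqrt{PQ/n}$, to check the "about 3 percentage points" figure for n between 1,000 and 1,500.

```python
import math

Z, P = 2.0, 0.5
Q = 1.0 - P


def n_for_margin(d: float) -> float:
    """n = z^2 P Q / d^2 for an absolute margin of error d."""
    return Z ** 2 * P * Q / d ** 2


def margin_for_n(n: float) -> float:
    """Invert the formula: d = z * sqrt(P Q / n)."""
    return Z * math.sqrt(P * Q / n)


for d in (0.01, 0.02, 0.03, 0.04, 0.05):
    print(f"d = {d:.2f}  n = {round(n_for_margin(d)):,}")

for n in (1000, 1500):
    print(f"n = {n:,}  d = {margin_for_n(n):.3f}")
```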
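6- 2. Relative error approach
Let the margin of error be $d = e \cdot (\text{estimate})$; then $e = d/\text{estimate}$ is the margin of error as a fraction of the estimate. Let $V = \sigma/\bar{X}$ be the coefficient of variation.
- Note that the following formulas are the same as the approximate formula in Box 3.5:
$n' = \left(\frac{z\sigma}{e\bar{X}}\right)^2 = \left(\frac{zV}{e}\right)^2$ for the population mean
$n' = \frac{z^2 Q}{e^2 P}$ for the population proportion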
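A Python sketch of the relative-error formulas, assuming V (or P) is known in advance; the function names and the example inputs are illustrative. The last line shows that, for a proportion, a relative margin e gives the same n' as an absolute margin d = eP.

```python
def n_mean_relative(z: float, cv: float, e: float) -> float:
    """n' = (z V / e)^2, where V = sigma / mean is the coefficient of variation."""
    return (z * cv / e) ** 2


def n_prop_relative(z: float, p: float, e: float) -> float:
    """n' = z^2 Q / (e^2 P) for a proportion P, with Q = 1 - P."""
    return z ** 2 * (1.0 - p) / (e ** 2 * p)


def n_prop_absolute(z: float, p: float, d: float) -> float:
    """Absolute-error version, n' = z^2 P Q / d^2, for comparison."""
    return z ** 2 * p * (1.0 - p) / d ** 2


# Hypothetical inputs: z = 2, P = 0.5, relative margin e = 0.1 (10% of P)
z, p, e = 2.0, 0.5, 0.1
print(n_prop_relative(z, p, e), n_prop_absolute(z, p, e * p))  # both about 400
```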
7- How do we get the population variance?
1. Previous survey of the same subject matter
- In some cases, available information may have to be modified. Suppose we want to estimate the average medical expenses among students in a large university. A previous survey in another university showed that 40% ($P_j$) of the students used medical services, and that the mean and variance among those who used medical services were 200 ($\bar{X}_j$) and 900 ($\sigma_j^2$). Since the sampling frame is the entire student body, we need to convert the variance to include the non-users. The variance of medical expenses among the entire student body (including the non-users) can be obtained by
$\sigma^2 = P_j\sigma_j^2 + P_jQ_j\bar{X}_j^2 = 0.4(900) + (0.4)(0.6)(200^2) = 360 + 9600 = 9960$
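A Python sketch of the conversion, with a brute-force check on a tiny hypothetical population (four users spending 170 or 230, six non-users spending 0). That population is made up only so that users have mean 200 and variance 900 and 40% of students are users; population variances use divisor N, which is what `statistics.pvariance` computes.

```python
import math
import statistics


def combined_variance(p_user: float, var_user: float, mean_user: float) -> float:
    """Variance over all units when non-users contribute 0: P*var + P*Q*mean^2."""
    q = 1.0 - p_user
    return p_user * var_user + p_user * q * mean_user ** 2


# Values from the slide: P_j = 0.4, variance among users 900, mean among users 200
print(combined_variance(0.4, 900, 200))  # approximately 9960

# Hypothetical 10-student population with the same P_j, mean, and variance
population = [170, 170, 230, 230, 0, 0, 0, 0, 0, 0]
users = [x for x in population if x > 0]
assert math.isclose(statistics.pvariance(users), 900)
print(statistics.pvariance(population))  # agrees with the formula
```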
8- 2. Two-phase sampling (or double sampling)
- Use a preliminary survey to estimate the variance.
- If the variance is estimated from a preliminary sample, then increase the sample size by a correction factor (see Cochran, p. 79).
3. New material
- If no prior information is available and two-phase sampling is impractical, then:
- For a proportion, we can use P = 0.5, since PQ is at its maximum when P = 0.5.
- For a rough estimate, we can use Deming's geometric approach (handout).
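A quick numerical check of why P = 0.5 is the conservative choice: scanning P on a grid (an illustrative step of 0.01) shows that PQ, and hence $n' = z^2PQ/d^2$, peaks at P = 0.5.

```python
# Grid of candidate proportions 0.00, 0.01, ..., 1.00
grid = [i / 100 for i in range(101)]

# PQ = P(1 - P) drives n' = z^2 P Q / d^2, so its maximizer gives the largest n'
best_p = max(grid, key=lambda p: p * (1 - p))
print(best_p, best_p * (1 - best_p))  # 0.5 0.25
```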
9- Adjustment of the preliminary sample size
- 1. Finite population correction will reduce the size: $n = n'/(1 + n'/N)$, where N is the population size. For a large population, this step is not necessary.
- 2. Imperfection in the frame (say, we expect 10% of the frame to be ineligible elements): $n = n'/(1 - 0.1)$
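The first two adjustments can be sketched in Python as below. The preliminary size n' = 400 and the population size N = 2,000 are hypothetical values chosen for illustration, and the function names are likewise illustrative.

```python
def fpc_adjust(n_prelim: float, pop_size: float) -> float:
    """Finite population correction: n = n' / (1 + n'/N)."""
    return n_prelim / (1 + n_prelim / pop_size)


def frame_adjust(n: float, ineligible_rate: float) -> float:
    """Inflate n so enough eligible units remain: n / (1 - rate)."""
    return n / (1 - ineligible_rate)


n_prelim = 400  # hypothetical n' from the margin-of-error formula
n_fpc = fpc_adjust(n_prelim, 2000)   # about 333.3
n_frame = frame_adjust(n_fpc, 0.10)  # about 370.4, assuming 10% ineligible
print(round(n_fpc, 1), round(n_frame, 1))

# For a very large population the correction barely changes n'
print(round(fpc_adjust(n_prelim, 10_000_000), 1))
```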
10- 3. Analytic requirements (say, we need separate estimates for three ethnic groups, and the population distribution is 50% white, 30% Hispanic, and 20% black and others): $n = n'/0.2$, where n' is the size needed for each group. Under simple random sampling, dividing by the smallest group's share means the smallest group is expected to receive n' cases; a different sample design calls for a different strategy.
11- 4. Nonresponse adjustment (say, we expect 20% nonresponse): $n = n'/(1 - 0.2)$
- 5. Sample design effect (deff): $n = n'(\text{deff})$
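A Python sketch chaining the remaining three adjustments, assuming they compound multiplicatively in the order listed. The per-group requirement n' = 400 and the design effect of 1.5 are hypothetical; the 20% smallest-group share and 20% nonresponse rate come from the examples above. Because each step multiplies n by a constant, the order does not change the result.

```python
def analytic_adjust(n_per_group: float, smallest_share: float) -> float:
    """Under SRS, the smallest group gets about n * share cases, so n = n' / share."""
    return n_per_group / smallest_share


def nonresponse_adjust(n: float, nonresponse_rate: float) -> float:
    """Inflate for expected nonresponse: n / (1 - rate)."""
    return n / (1 - nonresponse_rate)


def deff_adjust(n: float, deff: float) -> float:
    """Multiply by the design effect of a non-SRS design."""
    return n * deff


n = analytic_adjust(400, 0.20)   # about 2,000
n = nonresponse_adjust(n, 0.20)  # about 2,500
n = deff_adjust(n, 1.5)          # about 3,750
print(round(n))
```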
12- Sample size and validity of the normal approximation
Our confidence that the normal approximation is adequate in most practical situations comes from statistical theory based on infinite populations. For a finite population, how large must n be for the normal approximation to be accurate enough? If the population is normally distributed, we can rely on the theory. However, in sampling practice it cannot be assumed that frequency distributions will all be reasonably close to normal. The distributions of many types of socioeconomic and health-related phenomena (for example, income) exhibit marked positive skewness, with a few large units and many small units.
13- There is no safe general rule as to how large n must be to justify the use of the normal approximation in computing confidence limits. For a positively skewed population, a crude rule suggested by Cochran (chapter 2) is $n > 25G_1^2$, where $G_1$ is Fisher's measure of skewness, defined as
$G_1 = \frac{\sum_{i=1}^{N}(y_i - \bar{Y})^3}{N\sigma^3}$
14- For example, $G_1$ for the following positively skewed distribution is 1.9:

Value      -0.9    0    1    2    3    4    5    6    7    8    9   11   13
Frequency    47  143  154   82   62   33   13    6    4    6    2    2    2

The minimum n to assure the validity of the normal approximation is $n > (25)(1.9)^2 \approx 90$.
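The following Python sketch computes $G_1$ from the frequency table using divisor-N moments, $G_1 = \sum f_i(y_i - \bar{Y})^3 / (N\sigma^3)$, and then applies Cochran's crude rule. It assumes the table lists the whole population; the function names are illustrative. The unrounded $G_1$ comes out slightly above 1.9, so the bound computed from it is a little above the 90 obtained with the rounded value.

```python
import math

values = [-0.9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 13]
freqs = [47, 143, 154, 82, 62, 33, 13, 6, 4, 6, 2, 2, 2]


def fisher_g1(values, freqs):
    """Skewness G1 = sum f (y - mean)^3 / (N sigma^3), with divisor-N moments."""
    n_total = sum(freqs)
    mean = sum(f * y for y, f in zip(values, freqs)) / n_total
    m2 = sum(f * (y - mean) ** 2 for y, f in zip(values, freqs)) / n_total
    m3 = sum(f * (y - mean) ** 3 for y, f in zip(values, freqs)) / n_total
    return m3 / m2 ** 1.5


def cochran_min_n(g1: float) -> float:
    """Cochran's crude rule for positively skewed populations: n > 25 * G1^2."""
    return 25 * g1 ** 2


g1 = fisher_g1(values, freqs)
print(round(g1, 2))                   # unrounded G1, a little above 1.9
print(cochran_min_n(1.9))             # about 90 with the rounded G1
print(math.ceil(cochran_min_n(g1)))   # bound using the unrounded G1
```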