Title: Sampling Methods for Rare Events
1- Sampling Methods for Rare Events
- Basic Ideas
- When we survey rare events, the conventional
methods of sampling and estimation may be
unsatisfactory. Even a large sample may not
contain enough rare events.
2- In such cases we may consider using the following
methods:
- 1. Inverse sampling
- 2. Network sampling
- 3. Snowball sampling
- 4. Dual sampling (capture-recapture methods)
3- Inverse Sampling
- In this method the sample size n is not fixed in
advance. Instead, sampling is continued until a
predetermined number of units possessing the
rare attribute have been drawn.
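4- Let P denote the proportion of units in the
population possessing the rare attribute.
Units are drawn one at a time, without replacement,
from the population of N units until m units with
the rare attribute have been selected; n is the
number of draws required. There are NP units with
the rare attribute in the population, m is fixed,
and n is a random variable. The probability
distribution of n is given by

$$\Pr(n) = \frac{\binom{NP}{m-1}\binom{N-NP}{n-m}}{\binom{N}{n-1}} \cdot \frac{NP-m+1}{N-n+1}, \qquad n = m, m+1, \ldots, N-NP+m.$$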
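
A minimal sketch, assuming sampling is without replacement and that NP is an integer: the probability that the m-th rare unit turns up on draw n is the chance that the first n - 1 draws contain exactly m - 1 rare units, times the chance that draw n is itself rare. The names `inverse_sampling_pmf` and `A` (standing for NP) are illustrative, not taken from the slides.

```python
from fractions import Fraction
from math import comb


def inverse_sampling_pmf(n, N, A, m):
    """P(the m-th rare unit appears on draw n), sampling without replacement.

    N = population size, A = N*P rare units (assumed integer, A >= m),
    m = number of rare units required before sampling stops.
    """
    if n < m or n > N - A + m:
        return Fraction(0)
    # The first n-1 draws hold exactly m-1 rare units (hypergeometric term).
    first = Fraction(comb(A, m - 1) * comb(N - A, n - m), comb(N, n - 1))
    # Draw n must be one of the A-m+1 rare units still in the population.
    last = Fraction(A - m + 1, N - n + 1)
    return first * last


# Small illustrative population: 50 units, 5 rare, stop at the 2nd rare unit.
N, A, m = 50, 5, 2
pmf = {n: inverse_sampling_pmf(n, N, A, m) for n in range(m, N - A + m + 1)}
total = sum(pmf.values())
mean_n = sum(n * p for n, p in pmf.items())
print(total, float(mean_n))
```

Exact fractions are used so that the check that the probabilities sum to 1 is not blurred by rounding.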
5- As N becomes large, the probability
distribution of n approaches the well-known
negative binomial distribution.
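
For reference, the limiting negative binomial form can be written with the same symbols (m fixed, n random). This is the standard textbook expression, given here as a sketch:

```latex
\Pr(n) = \binom{n-1}{m-1} P^{m} (1-P)^{n-m}, \qquad n = m, m+1, \ldots
```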
6- Unbiased estimate of P and its sampling
variance are as follows
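
A minimal sketch of these estimates, assuming N is large enough that n is negative binomial. The standard results for that case are p = (m - 1)/(n - 1), which is unbiased for P when m >= 2, and v(p) = p(1 - p)/(n - 2), which is an unbiased estimate of the variance of p when m >= 3. The function name `inverse_sampling_estimate` is hypothetical, and no finite-population correction is applied.

```python
def inverse_sampling_estimate(m, n):
    """Return (p, v) for inverse sampling with large N.

    m = fixed number of rare units required, n = number of draws it took.
    p = (m - 1) / (n - 1) estimates P; v = p * (1 - p) / (n - 2) estimates Var(p).
    """
    if m < 3:
        raise ValueError("m must be at least 3 for the variance estimate")
    if n < m:
        raise ValueError("n cannot be smaller than m")
    p = (m - 1) / (n - 1)
    v = p * (1 - p) / (n - 2)
    return p, v


# Illustrative numbers only: the 5th rare unit was found on draw 101.
p, v = inverse_sampling_estimate(m=5, n=101)
print(p, v, v ** 0.5)
```

Note that the naive proportion m/n is not used: with n random, m/n overestimates P on average.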
7- Network Sampling
- See Section 14.5 on page 439
- See Sudman (Medical Care, October 1988)
8- Snowball Sampling
- Chain referral sampling
- See (Goodman, Annals of Mathematical
Statistics 32(1), March 1961)
- See (Biernacki and Waldorf, Sociological
Methods & Research 10(2), November 1981)
9- Dual Sampling and Capture-Recapture Methods
- There are several variations in dual sampling.
- 1. See Section 14.6 on page 443
- 2. Tagging model under simple random sampling
- Let t denote the number of units tagged in the
initial capture. Then P = t/N and N = t/P. Let
p = s/n be the proportion tagged in the recapture
sample, where s is the number of tagged units among
the n units recaptured. (n is fixed)
10- The estimate of N is t/p = nt/s.
(It is a biased estimate because a random variable,
s, is in the denominator.)
(See Cochran, Chapter 3.)
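
A minimal sketch of the tagging estimate under simple random sampling, assuming the estimate of N is t/p = nt/s, with s the number of tagged units found in the recapture sample of fixed size n. The function name `tagging_estimate` and the numbers are hypothetical.

```python
def tagging_estimate(t, n, s):
    """Estimate the population size N from one tag-recapture study.

    t = units tagged in the initial capture,
    n = size of the recapture sample (fixed),
    s = tagged units found in the recapture sample (random).
    """
    if s <= 0:
        # With no tagged units recaptured, p = 0 and t / p is undefined.
        raise ValueError("s must be positive")
    # t / p with p = s / n, written as t * n / s to avoid rounding in p.
    return t * n / s


# Illustrative numbers: 200 tagged, 100 recaptured, 20 of those tagged.
N_hat = tagging_estimate(t=200, n=100, s=20)
print(N_hat)
```

The guard on s makes the source of the bias visible: s varies from sample to sample and sits in the denominator, so the estimate is not even defined when s = 0.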
11- 3. Tagging model under inverse sampling
- In this model s is fixed and n is random.
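
A minimal numerical sketch of why fixing s helps, assuming recapture draws are independent with tagged probability P = t/N (N large, or sampling with replacement). Then n is negative binomial with mean s/P, so nt/s has expectation N. The function name `expected_N_hat` and the cutoff `n_max` are illustrative; the cutoff must be large enough that the neglected tail is negligible for the chosen P and s.

```python
from math import comb


def expected_N_hat(t, N, s, n_max=2000):
    """Approximate E[n * t / s] when n is negative binomial with P = t / N."""
    P = t / N
    total = 0.0
    for n in range(s, n_max + 1):
        # Probability that the s-th tagged unit is found on draw n.
        prob = comb(n - 1, s - 1) * P**s * (1 - P) ** (n - s)
        total += prob * (n * t / s)
    return total


# Illustrative values: 100 tagged units in a population of 1000, stop at s = 5.
value = expected_N_hat(t=100, N=1000, s=5)
print(value)
```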
12- 4. Tagging model under cluster sampling
- For example, if fish are recaptured at several
randomly selected locations in a lake, you will
have several sets of n and s, and estimation
becomes a ratio estimation problem. Let the number
of clusters (sampling spots) be m.
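
A minimal sketch of one way to set up the ratio estimate, assuming the combined ratio (sum of s over sum of n across the m spots) is the intended estimate of the tagged proportion. The slides do not show the formula, so this form, the function name `cluster_tagging_estimate`, and the numbers are assumptions.

```python
def cluster_tagging_estimate(t, n_list, s_list):
    """Combined ratio estimate from m recapture spots.

    t = units tagged in the initial capture,
    n_list[i] = units recaptured at spot i,
    s_list[i] = tagged units among them.
    Returns (p, N_hat) with p = sum(s) / sum(n) and N_hat = t / p.
    """
    if len(n_list) != len(s_list):
        raise ValueError("need one n and one s per sampling spot")
    total_n = sum(n_list)
    total_s = sum(s_list)
    if total_s == 0:
        raise ValueError("no tagged units recaptured at any spot")
    p = total_s / total_n
    N_hat = t * total_n / total_s
    return p, N_hat


# Illustrative data for m = 3 spots.
p_ratio, N_hat_cluster = cluster_tagging_estimate(
    t=300, n_list=[40, 60, 50], s_list=[4, 9, 2]
)
print(p_ratio, N_hat_cluster)
```

Pooling the totals before dividing, rather than averaging the per-spot ratios s/n, is what makes this a ratio estimate.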