Title: Formalizing the Concepts: Simple Random Sampling
1. Formalizing the Concepts: Simple Random Sampling
2. Purpose of sampling
- To acquire knowledge about an entire population by observing a sample of its units (typically households, persons, institutions, or physical objects) and making quantitative statements about the whole population from those observations
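4. Purpose of sampling
- Why sampling?
  - Saves cost compared to full enumeration
  - Data quality is easier to control in a sample
  - Results are available sooner from sample data
  - Measurement can be destructive (e.g., testing a product to failure), in which case full enumeration is impossible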
5. Unit of analysis
Some concepts used in Sampling
- An object on which a measurement is taken
- Most common units of analysis are persons,
households, farms, and economic establishments
6. Target population or universe
- The complete collection of all the units of analysis under study
- Examples: the population living in households in a country; students in primary schools
7. Sampling frame
- List of all the units of analysis whose characteristics are to be measured
- Must be comprehensive and non-overlapping, and must not contain irrelevant elements
- Should be updated to ensure complete coverage
- Examples: a list of establishments, a census, civil registration
8. Parameter
- Quantity computed from all N values in a population
- Typically a descriptive measure of a population, such as the mean or variance
- Examples: poverty rate, average income
- The objective of sampling is to estimate the parameters of a population
9. Estimation
- Estimator: a mathematical formula or function that uses sample results to produce an estimate for the entire population
- Estimate: a numerical quantity computed from sample observations of a characteristic, intended to provide information about an unknown population value (parameter)
- Examples: mean (average), total, proportion, ratio
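As a minimal Python sketch (the function names and data are hypothetical), the four example estimators can be written as plain formulas over the sample. The total assumes an SRS, where the population total is estimated by N times the sample mean; the ratio is the ratio of two sample sums.

```python
from statistics import fmean

def srs_estimates(y, x, N):
    """Common estimators computed from an SRS.

    y, x : paired sample observations (lists of numbers)
    N    : population size (needed only for the total)
    """
    mean_y = fmean(y)          # estimator of the population mean
    total_y = N * mean_y       # estimator of the population total
    ratio = sum(y) / sum(x)    # estimator of the ratio Y/X
    return {"mean": mean_y, "total": total_y, "ratio": ratio}

def proportion(flags):
    """Share of sampled units having the characteristic (flags are 0/1)."""
    return sum(flags) / len(flags)

y = [10, 20, 30, 40]   # hypothetical incomes of 4 sampled households
x = [1, 2, 2, 3]       # hypothetical household sizes
est = srs_estimates(y, x, N=100)
share = proportion([1, 0, 1, 1])
```

Here `est` gives a mean of 25.0, a total of 2500.0 and a ratio of 12.5 (income per person), and `share` is 0.75.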
10. Unbiased estimator
- When the mean of the individual sample estimates (taken over all possible samples) equals the population parameter, the estimator is unbiased
- Formally, an estimator is unbiased if the expected value of the (sample) estimates equals the (population) parameter being estimated
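A small enumeration can make unbiasedness concrete. The sketch below assumes a hypothetical population of five values and SRS without replacement of size 2; because every possible sample is equally likely, the expected value of the sample mean is simply the average over all possible samples, computed here with exact fractions.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population of N = 5 values
population = [2, 4, 6, 8, 15]
n = 2

pop_mean = Fraction(sum(population), len(population))   # the parameter

# Every SRS of size n is equally likely, so E(estimate) is the plain
# average of the estimates over all C(N, n) possible samples.
sample_means = [Fraction(sum(s), n) for s in combinations(population, n)]
expected = sum(sample_means) / len(sample_means)

print(pop_mean, expected)  # prints: 7 7
```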
11. Random sampling
- Also known as scientific sampling or probability sampling
- Each unit has a known, non-zero probability of selection
- Mathematical theory is available to assess the sampling error (the error caused by observing a sample instead of the whole population)
12. Random sampling techniques
- Single stage, equal probability sampling
  - Simple Random Sampling (SRS)
  - Systematic sampling with equal probability
- Stratified sampling
- Multi-stage sampling
- In practice, these techniques are usually combined in various ways; most sampling designs are complex
13. Single stage, equal probability sampling
- Random selection of n units from a population of N units, so that each unit has an equal probability of selection
- N (population) > n (sample)
- Probability of selection (sampling fraction): f = n/N
- The most basic form of probability sampling; it provides the theoretical basis for more complicated techniques
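A minimal sketch of drawing an SRS, assuming the sampling frame is held as a Python list. `simple_random_sample` is a hypothetical helper built on the standard library's `random.Random.sample`, which draws distinct units without replacement so that each unit's selection probability is f = n/N.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the frame; each has probability n/N."""
    N = len(frame)
    if not 0 < n <= N:
        raise ValueError("sample size must satisfy 0 < n <= N")
    rng = random.Random(seed)
    sample = rng.sample(frame, n)   # SRS without replacement
    f = n / N                       # sampling fraction
    return sample, f

frame = [f"HH{i:03d}" for i in range(1, 201)]   # hypothetical list of 200 households
sample, f = simple_random_sample(frame, 20, seed=1)
```

With 20 units drawn from 200, the sampling fraction `f` is 0.1.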
14. Single stage, equal probability sampling (continued)
- Simple Random Sampling: the investigator mixes up the whole target population before grabbing n units
- Systematic Random Sampling: the N units in the population are ranked 1 to N in some order (e.g., alphabetical). To select a sample of n units, calculate the step k (k = N/n), take a unit at random from the first k units, and then take every kth unit after it
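The systematic procedure above can be sketched as follows. The helper name is hypothetical, and the sketch assumes N is a multiple of n so that k = N/n is an integer; if it is not, k is rounded down here and the last N - n*k units of the frame can never be selected.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Systematic random sample of n units from an ordered frame."""
    N = len(frame)
    if not 0 < n <= N:
        raise ValueError("sample size must satisfy 0 < n <= N")
    k = N // n                                 # sampling step (exact if n divides N)
    start = random.Random(seed).randrange(k)   # random start among the first k units
    # Take the start unit, then every kth unit; the last index is at most N - 1.
    return [frame[start + i * k] for i in range(n)]

frame = list(range(1, 101))     # units ranked 1 to 100
sample = systematic_sample(frame, 10, seed=3)
```

The result contains 10 units spaced exactly 10 apart, the first one falling among units 1 to 10.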
15. Single stage, equal probability sampling (continued)
- Advantage
  - Self-weighting (simplifies the calculation of estimates and variances)
- Disadvantages
  - A sampling frame may not be available
  - May entail high transportation costs
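To illustrate what self-weighting buys, here is a minimal sketch with hypothetical data: in an equal-probability design every sampled unit gets the same design weight N/n, so the general weighted (expansion) estimators collapse to N times the sample mean and to the plain sample mean.

```python
def weighted_total_and_mean(y, weights):
    """Weighted (expansion) estimators of the population total and mean."""
    total = sum(w * v for w, v in zip(weights, y))
    mean = total / sum(weights)
    return total, mean

# Hypothetical SRS of n = 4 units from N = 400: every unit gets the same
# design weight N/n = 100, so the design is self-weighting.
N, y = 400, [3, 5, 4, 8]
weights = [N / len(y)] * len(y)
total, mean = weighted_total_and_mean(y, weights)
# With equal weights these reduce to N * mean(y) and mean(y):
# total == 2000.0, mean == 5.0
```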
16. Stratified sampling
- The population is divided into mutually exclusive subgroups called strata
- A random sample is then selected from each stratum
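A minimal sketch of stratified selection, assuming an SRS is drawn independently within each stratum and that the per-stratum sample sizes are already decided (how to allocate them is not covered here). The function, the frame and the urban/rural strata are hypothetical.

```python
import random

def stratified_srs(frame, stratum_of, n_per_stratum, seed=None):
    """Draw an independent SRS inside each stratum.

    frame         : list of units
    stratum_of    : function returning the stratum label of a unit
    n_per_stratum : dict {stratum label: sample size in that stratum}
    """
    rng = random.Random(seed)
    strata = {}
    for unit in frame:
        # Strata are mutually exclusive: each unit is filed under one label.
        strata.setdefault(stratum_of(unit), []).append(unit)
    return {h: rng.sample(strata[h], n_h) for h, n_h in n_per_stratum.items()}

# Hypothetical frame: 60 urban and 40 rural households
frame = [("urban", i) for i in range(60)] + [("rural", i) for i in range(40)]
sample = stratified_srs(frame, lambda u: u[0], {"urban": 6, "rural": 4}, seed=7)
```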
17. Two-stage sampling
- Units of analysis are divided into groups called Primary Sampling Units (PSUs)
- A sample of PSUs is selected first
- Then a sample of units is chosen in each of the selected PSUs
This technique can be generalized to more stages (multi-stage sampling).
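The two stages can be sketched as below, assuming SRS at both stages (real designs often select PSUs with probability proportional to size instead). The villages-as-PSUs frame and the helper name are hypothetical; a PSU smaller than `n_within` is taken whole.

```python
import random

def two_stage_sample(psus, m, n_within, seed=None):
    """Stage 1: SRS of m PSUs. Stage 2: SRS of up to n_within units in each.

    psus : dict {PSU id: list of units of analysis in that PSU}
    """
    rng = random.Random(seed)
    selected = rng.sample(sorted(psus), m)   # stage 1: select PSUs
    return {
        p: rng.sample(psus[p], min(n_within, len(psus[p])))   # stage 2
        for p in selected
    }

# Hypothetical frame: 10 villages (PSUs) of 20 households each
psus = {f"village{v}": [f"v{v}-hh{h}" for h in range(20)] for v in range(10)}
sample = two_stage_sample(psus, m=3, n_within=5, seed=11)
```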
18. Random sampling
- Estimates obtained from random samples can be accompanied by measures of the uncertainty associated with the estimate
- The uncertainty is measured by the standard error. Confidence intervals around the estimate can be calculated by taking advantage of the Central Limit Theorem
19. Central limit theorem
- The central limit theorem states that, for a variable with mean µ and variance σ², the sampling distribution of the mean of n independent observations approaches a normal distribution with mean µ and variance σ²/n as n grows
- This is true even when the distribution of the variable itself is not normal
- The normal distribution is widely used; part of its appeal is that it is well behaved and mathematically tractable
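A quick simulation can show the theorem at work. The sketch assumes an exponential population with mean 1 and variance 1 (strongly skewed, so far from normal) and independent draws; the mean and variance of many sample means should come out close to µ = 1 and σ²/n = 1/30. The printed values depend on the seed, so only their approximate targets are noted.

```python
import random
from statistics import fmean, pvariance

# Assumed population distribution: exponential with mu = 1 and sigma^2 = 1.
rng = random.Random(2024)
n, reps = 30, 20_000

# Mean of each of `reps` independent samples of size n
means = [fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]

print(round(fmean(means), 3))       # close to mu = 1
print(round(pvariance(means), 4))   # close to sigma^2 / n = 1/30, about 0.0333
```

A histogram of `means` would look close to a bell curve, even though individual exponential draws are heavily right-skewed.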
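20. Sample variance and standard error
- Variance of the sample mean of an SRS of n units from a population of size N: $\mathrm{Var}(\bar{x}) = \left(1 - \frac{n}{N}\right)\frac{\mathrm{Var}(X)}{n}$
- Standard error: $se(\bar{x}) = \sqrt{\mathrm{Var}(\bar{x})}$
- A measure of sampling error. It depends on 3 factors:
  - $(1 - n/N)$: finite population correction (fpc)
  - $n$: sample size
  - $\mathrm{Var}(X) = \frac{1}{N-1}\sum_{i=1}^{N}(X_i - \bar{X})^2$: population variance. Unknown, but can be estimated without bias by $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$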
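The three factors combine as in the sketch below (hypothetical helper and data). It uses `statistics.variance`, whose divisor is n - 1, as the estimate of Var(X), and applies the finite population correction before taking the square root.

```python
from math import sqrt
from statistics import fmean, variance

def srs_mean_and_se(sample, N):
    """Sample mean and its estimated standard error under SRS without replacement."""
    n = len(sample)
    xbar = fmean(sample)
    s2 = variance(sample)     # divisor n - 1: unbiased estimate of Var(X)
    fpc = 1 - n / N           # finite population correction
    var_xbar = fpc * s2 / n
    return xbar, sqrt(var_xbar)

xbar, se = srs_mean_and_se([2, 4, 4, 4, 5, 5, 7, 9], N=80)
```

Here the mean is 5.0, s² = 32/7 and the fpc is 0.9, giving a standard error of about 0.717. When N is very large relative to n, the fpc is close to 1 and can be dropped.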
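21. Proportions
- A proportion P (or prevalence) is equal to the mean of a dummy (0/1) variable
- In this case $\mathrm{Var}(X) \approx P(1 - P)$ (for large N), and the standard error of the sample proportion $p$ is estimated by $se(p) = \sqrt{\left(1 - \frac{n}{N}\right)\frac{p(1 - p)}{n - 1}}$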
22. Confidence intervals
- It is not sufficient to simply report the sample proportion obtained by Mr Green in the sample survey; we also need to give an indication of how accurate the estimate is
- Confidence intervals are used to indicate the accuracy of an estimate
- In other words, instead of estimating the parameter of interest by a single value, an interval of likely values is given
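23. Confidence intervals (continued)
- Confidence interval: $\text{estimate} \pm t_a \times se(\text{estimate})$
- where
  - $t_a = 1.28$ for confidence level $a = 80\%$
  - $t_a = 1.64$ for confidence level $a = 90\%$
  - $t_a = 1.96$ for confidence level $a = 95\%$
  - $t_a = 2.58$ for confidence level $a = 99\%$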
24. Confidence intervals
In a sample of 1,000 electors, 280 of them (28 percent) say they will vote Green. The standard error is 1.42 percent.
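Applying the interval to this example, as a minimal sketch: the coefficient is computed from the standard normal distribution with `statistics.NormalDist().inv_cdf`, which reproduces the tabulated values (1.96 for 95 percent, to two decimals). The helper name is hypothetical.

```python
from statistics import NormalDist

def confidence_interval(estimate, se, level=0.95):
    """Normal-approximation interval: estimate +/- t_a * se."""
    t_a = NormalDist().inv_cdf(0.5 + level / 2)   # two-sided coefficient
    return estimate - t_a * se, estimate + t_a * se

low, high = confidence_interval(28.0, 1.42, level=0.95)
print(f"95% CI: {low:.1f}% to {high:.1f}%")   # prints: 95% CI: 25.2% to 30.8%
```

So, with 95 percent confidence, between about 25.2 and 30.8 percent of electors intend to vote Green.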
26. Sample size
- The required sample size n is determined by:
  - The variability of the characteristic being measured, Var(X)
    - But we don't know it!
  - The maximum margin of error E we are willing to accept
  - How confident we want to be that the error of our estimate will not exceed that maximum
    - For each confidence level a there is a coefficient $t_a$
  - The size of the population
    - But this is not very important!
For a proportion
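As a sketch, one common textbook form for a proportion is n0 = t_a² P(1 - P) / E², optionally corrected for a finite population as n = n0 / (1 + n0/N); this specific form is an assumption here. When P is unknown, P = 0.5 is the conservative choice because it maximizes P(1 - P). The helper is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_proportion(E, level=0.95, P=0.5, N=None):
    """SRS size so that the margin of error is at most E at the given confidence.

    E : maximum margin of error, as a proportion (0.03 = 3 points)
    P : guess of the true proportion; 0.5 maximizes P(1 - P)
    N : population size, or None to ignore the finite population correction
    """
    t_a = NormalDist().inv_cdf(0.5 + level / 2)
    n0 = t_a**2 * P * (1 - P) / E**2     # size for an infinite population
    if N is not None:
        n0 = n0 / (1 + n0 / N)           # finite population correction
    return ceil(n0)

print(sample_size_proportion(0.03))                # prints: 1068
print(sample_size_proportion(0.03, N=1_000_000))   # prints: 1066
```

The two results barely differ, which illustrates why the population size matters little once N is large relative to n.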