Title: Sampling
- Dr. John T. Drea
- Professor of Marketing
- MKTG 329 - Western Illinois University
2Basic Sampling Terms
- Population
  - The entire group under study, as specified by the objectives of the research project.
  - It is the group to which we want to generalize the findings.
- Sample
  - A subset of the population that should represent the entire group.
- Census
  - Different from a sample: it is an accounting of an entire population. The U.S. Census is an example, since everyone is accounted for.
3Basic Sampling Terms
- Sample Unit
  - It is the basic level of investigation.
  - It might be an individual, a household, a business, etc.
  - All of the sample units together comprise the sample.
- Sample Frame
  - A master list of all the sample units in the population
- Sample Frame Error
  - The extent to which the sample frame misses elements of the population
4Example Surveying WIU Students
- The sample unit could be
  - individual students, or
  - student addresses
- The sample frame could be
  - a roster of all students currently enrolled at WIU, or
  - a roster of all current addresses, or
  - a listing of all residence hall rooms.
[Diagram: Population = 13,000; Census = 13,000; Sample = 600; Non-sampled students = 12,400]
5Example Surveying WIU Students
Question: What are the advantages and disadvantages of surveying a sample of this population as opposed to doing a census?
[Diagram: Population = 13,000; Census = 13,000; Sample = 600; Non-sampled students = 12,400]
6Example Surveying Other Populations
Question: To get the sample frame for WIU students, we would go to the WIU registrar and request a roster for the current semester. But if our research project were to survey consumers who have purchased a particular brand of antiperspirant, where would we get that sample frame?
Sample Frame: a master list of all the sample units in the population
7Two Basic Sampling Methodologies
Probability: Each member of the population has an equal and known probability of being included in the sample.
Nonprobability: The probability of selecting members of the population is unknown.
8Probability Samples Simple Random Sampling
Every member of the population has an equal and known probability of being included in the sample, where
$P = \frac{\text{sample size}}{\text{population size}}$
- Blind Draw Method (like drawing numbers out of a hat)
- Table of Random Numbers (assign all units a number, choose a random starting point in the table, and proceed)
Simple random sampling requires unique designations for each population member.
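A minimal sketch of a simple random draw, assuming the sample frame is a list of unique student ID numbers (the IDs, the seed, and the helper name `simple_random_sample` are hypothetical and not from the slides). The standard library's `random.sample` plays the role of the blind draw here.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the frame; every unit is equally likely."""
    rng = random.Random(seed)  # seeded only so the illustration is repeatable
    return rng.sample(frame, n)


# Hypothetical frame: 13,000 unique student IDs (assumed for illustration).
frame = list(range(1, 13001))
sample = simple_random_sample(frame, 600, seed=42)

# P = sample size / population size
inclusion_probability = len(sample) / len(frame)
print(f"P = {inclusion_probability:.4f}")
```

Because each unit needs a unique designation, the sketch would not work as intended on a frame with duplicate entries.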
9Probability Samples Systematic Sampling
Uses a skip interval to draw units from a listing (typically printed), where every nth sample unit is included in the sample.
$\text{Skip interval} = \frac{\text{population size}}{\text{sample size}}$
Systematic Sampling Steps
1. Get the listing (sample frame).
2. Compute the skip interval.
3. Use random numbers to identify a starting point (page, column, and position in column).
4. Apply the skip interval.
5. Treat the list as circular.
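The steps above can be sketched roughly as follows. This is an illustration under stated assumptions: the frame is an in-memory list rather than a printed listing, the skip interval is rounded down to a whole number, and the function name `systematic_sample` is hypothetical.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Pick every k-th unit from a random start, wrapping around the list."""
    rng = random.Random(seed)
    population_size = len(frame)
    skip = population_size // n            # skip interval, rounded down (assumption)
    start = rng.randrange(population_size)  # random starting position
    # Treat the list as circular: indices past the end wrap to the beginning.
    return [frame[(start + i * skip) % population_size] for i in range(n)]


# Hypothetical frame of 13,000 student IDs.
frame = list(range(1, 13001))
sample = systematic_sample(frame, 600, seed=7)
print(len(sample), "units, skip interval", len(frame) // 600)
```

Rounding the interval down guarantees that n steps never lap the list, so no unit is selected twice.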
10Probability Samples Systematic Sampling
Uses a skip interval to draw units from a listing (typically printed), where every nth sample unit is included in the sample.
$\text{Skip interval} = \frac{\text{population size}}{\text{sample size}}$
- Systematic sampling is efficient...
  - it doesn't require a listing of the population
  - it only requires the use of random numbers to get started
- But be careful
  - it requires the population be known
  - it can be troublesome if there are patterns in the data (periodicities)
11Probability Samples Cluster Sampling
Divides the total population into subgroups, then one or more of these subgroups is randomly selected to represent the total population.
Cluster Sampling Steps
1. Determine the area to be surveyed and identify subgroups (clusters).
2. Choose a one-step procedure (performing a census on one random cluster) or a two-step procedure (randomly select multiple clusters, then randomly select sample units from each cluster).
3. Randomly select the subgroup(s).
4. For a two-step procedure, identify the clusters and identify a random starting point in each cluster (similar to systematic sampling).
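A rough sketch of the one-step and two-step procedures, assuming clusters are stored as a dictionary of lists. The residence hall names and unit labels are made up for illustration, and the second stage here uses a simple random draw inside each cluster rather than the random-start approach named in step 4.

```python
import random


def one_step_cluster(clusters, seed=None):
    """Pick one cluster at random and take a census of it."""
    rng = random.Random(seed)
    chosen = rng.choice(sorted(clusters))
    return list(clusters[chosen])


def two_step_cluster(clusters, n_clusters, n_per_cluster, seed=None):
    """Pick several clusters at random, then sample units within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        units = clusters[name]
        # Simple random draw within the cluster (an assumption of this sketch).
        sample.extend(rng.sample(units, min(n_per_cluster, len(units))))
    return sample


# Hypothetical clusters: four residence halls with 50 rooms each.
clusters = {
    hall: [f"{hall}-{room}" for room in range(1, 51)]
    for hall in ("Hall A", "Hall B", "Hall C", "Hall D")
}
census_of_one = one_step_cluster(clusters, seed=3)
two_stage = two_step_cluster(clusters, n_clusters=2, n_per_cluster=10, seed=3)
print(len(census_of_one), len(two_stage))
```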
12Probability Samples Cluster Sampling
Divides the total population into subgroups, then one or more of these subgroups is randomly selected to represent the total population.
- Cluster sampling makes sense when the population is relatively homogeneous (each cluster is likely to also be homogeneous).
- More efficient than systematic sampling.
- One-step procedures can introduce error if clusters are not homogeneous.
13Probability Samples Stratified Sampling
Separates the population into different subgroups (strata), then samples all of these subgroups. It is used when the population is not normally distributed.
Involves the calculation of a weighted mean when the strata sample sizes are disproportionate to the population:
$\text{Mean}_{pop} = (\text{mean}_A)(\text{proportion}_A) + (\text{mean}_B)(\text{proportion}_B) + \dots + (\text{mean}_i)(\text{proportion}_i)$
14Probability Samples Stratified Sampling
Separates the population into different subgroups (strata), then samples all of these subgroups. It is used when the population is not normally distributed.
Stratified Sampling Steps
1. Confirm that the population is not normally distributed.
2. Divide the population into strata.
3. Select a probability sample from each stratum.
4. If strata sample sizes are disproportionate to strata sizes in the population, a weighted mean approach is required.
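The weighted mean in step 4 can be illustrated with a short sketch. The two strata, their means, and their population proportions below are invented for the example; only the weighting rule (each stratum mean multiplied by its share of the population) comes from the slides.

```python
def weighted_mean(strata):
    """strata: list of (stratum_mean, population_proportion) pairs."""
    total_proportion = sum(proportion for _, proportion in strata)
    if abs(total_proportion - 1.0) > 1e-9:
        raise ValueError("population proportions must sum to 1")
    return sum(mean * proportion for mean, proportion in strata)


# Hypothetical strata: stratum A is 70% of the population, stratum B is 30%,
# even if the two were sampled in equal numbers.
strata = [(4.0, 0.70), (6.0, 0.30)]
population_mean_estimate = weighted_mean(strata)
print(round(population_mean_estimate, 2))
```

An unweighted average of the two stratum means would give 5.0 here, which overstates the influence of the smaller stratum.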
15Nonprobability Samples Convenience Sampling
A sample drawn at the convenience of the researcher. Typically done in high-traffic areas. Mall intercepts are an example.
- It's faster and easier to generate large sample sizes with convenience sampling, but it raises potential problems
  - Does the sample represent the population?
  - Likely to miss certain segments of the population who aren't in the high-traffic area.
16Nonprobability Samples Judgment Sampling
An expert decides which sample units will be included in the sample. Example: selecting Intro. to Psychology students at WIU to represent all WIU Fresh/Soph students.
- Judgment sampling is common in focus group research for selecting participants.
- Judgment sampling is dependent upon the judgment of the individual selecting the sample. How will you know if that judgment was incorrect?
17Nonprobability Samples Referral (snowball) Sampling
Respondents provide the names of other potential respondents.
[Diagram: referral chain starting from Respondent 1]
- Useful when there is a limited sample frame and respondents might be otherwise difficult to identify.
- What's the connection between referral sampling and the incidence rate?
18Nonprobability Samples Quota Sampling
Establishes a specific quota for the types of individuals to be contacted. Example: If we know the population is 50% female, 50% male, 70% Caucasian, 14% African-American, 10% Hispanic-American, 5% Asian-American, and 1% other, then we select a sample to fit those parameters (quotas).
- Useful when we need a certain number of subjects in each group for analysis purposes.
- Can be subjective.
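As a small illustration of turning those percentages into quotas, the sketch below allocates a hypothetical sample of 200 respondents. The sample size and the helper name `quota_counts` are assumptions; the percentages are the ones in the example above.

```python
def quota_counts(sample_size, percentages):
    """Convert whole-number percentages into respondent counts per group."""
    if sum(percentages.values()) != 100:
        raise ValueError("percentages must sum to 100")
    return {
        group: round(sample_size * pct / 100)
        for group, pct in percentages.items()
    }


ethnicity_pct = {
    "Caucasian": 70,
    "African-American": 14,
    "Hispanic-American": 10,
    "Asian-American": 5,
    "Other": 1,
}
quotas = quota_counts(200, ethnicity_pct)  # 200 is a hypothetical sample size
print(quotas)
```

With other sample sizes the rounded counts may not add up exactly to the total, so a final manual adjustment could be needed.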
19Developing a Sample Plan
20Some comments on the sampling plan
- Obtaining a list of the population is not always easy
- Examine the fit between the sample frame and the population
  - Who is likely to be missing, and is it systematic?
- Oversampling and resampling are frequently necessary.
  - How many surveys do you need to mail to get 300 responses?
  - Oversampling might select 1,500 names to get 300 responses.
  - Resampling might involve an initial sample of 300 names that elicited only 60 responses, so we draw another 300 names from the sample frame.
- Validate the sample
  - How does the sample compare with known data about the population?
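The oversampling arithmetic can be sketched as below, assuming the response rate seen in a first mailing (60 responses from 300 names, as in the resampling example) carries over to later mailings. That carry-over is an assumption of the sketch, and the function name is hypothetical.

```python
import math


def names_to_mail(target_responses, pilot_mailed, pilot_responses):
    """Estimate how many names to draw, given a response rate from a first wave."""
    if pilot_responses <= 0:
        raise ValueError("need at least one response to estimate a rate")
    # target / (pilot_responses / pilot_mailed), rounded up to a whole name
    return math.ceil(target_responses * pilot_mailed / pilot_responses)


needed = names_to_mail(target_responses=300, pilot_mailed=300, pilot_responses=60)
print(needed)
```

With a 20% response rate this matches the 1,500 names mentioned in the oversampling bullet.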
21Determining Sample Size
- Two appropriate approaches
  - Statistical analysis approach
    - For some tests, you may need to have a minimum number of subjects per cell
  - Confidence interval approach
- The other approaches discussed by Burns & Bush are really more considerations in determining a sample size (e.g., judgment, cost)
  - What is the acceptable level of sampling error?
  - Consider the implications/costs of mistakes
22Sampling and Non-Sampling Errors
- Sampling Error: the difference between the observed value of a measure and the long-run average of the observed values in repetitions of the measurement.
  - The difference between the sample mean and the mean of the population (as estimated by multiple sample means obtained with the same instrument)
  - A key to reducing sampling error is increasing sample size.
- Non-Sampling Errors: types of errors not related to the sample (i.e., all other errors)
  - Sampling errors decrease as sample sizes increase, but non-sampling errors do not.
23Basic statistics underlying sample size
- Variability: the amount of dissimilarity in respondents' answers to a particular question (e.g., quiz or exam scores).
- Standard deviation: an approximation of the average absolute distance away from the mean for all respondents to a particular question. Standard deviation is needed to determine a confidence interval.
- Confidence interval: a range whose endpoints define a certain percentage of responses to a question.
  - $\text{mean} \pm 1.96 \times \text{standard deviation}$ covers 95% of the responses to a given question (assuming the responses are approximately normally distributed)
Example: If we ask a question scaled 1-7, the standard deviation is 0.8, and the mean is 4.5, we know that
$1.96(0.8) = 1.568$
$4.5 - 1.568 = 2.932$
$4.5 + 1.568 = 6.068$
Thus, 95% of the responses will be between 2.932 and 6.068.
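The example's arithmetic can be checked with a few lines of code. This is only a sketch of the mean plus or minus 1.96 standard deviations rule; the function name `response_range` is made up for illustration.

```python
def response_range(mean, sd, z=1.96):
    """Return the (low, high) endpoints of mean +/- z * sd."""
    half_width = z * sd
    return mean - half_width, mean + half_width


low, high = response_range(mean=4.5, sd=0.8)
print(f"{low:.3f} to {high:.3f}")
```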
24Basic statistics underlying sample size
- Central Limit Theorem: As sample size, n, increases, the distribution of the mean, $\bar{X}$, of a random sample taken from most populations approaches a normal distribution.
- Sampling distribution: If you took repeated samples from the same population, the sampling distribution is the distribution of these sample means.
- Standard error: How far from the true population value a typical sample result is expected to fall. The standard error of the mean is the standard deviation of the population divided by the square root of the sample size.
What happens if we increase the sample size from 100 to 300? To 500?
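One way to explore the question is to compute the standard error of the mean at each sample size. The population standard deviation of 1.0 below is a placeholder chosen for illustration, not a value from the slides.

```python
import math


def standard_error(population_sd, n):
    """Standard error of the mean: population SD divided by sqrt(n)."""
    return population_sd / math.sqrt(n)


POPULATION_SD = 1.0  # hypothetical value, for illustration only
for n in (100, 300, 500):
    print(n, round(standard_error(POPULATION_SD, n), 4))
```

Because the sample size sits under a square root, the standard error shrinks more and more slowly: cutting it in half takes four times as many respondents.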
25Estimating sample size
- To estimate a sample size, a researcher must
  - estimate the standard deviation of the population (a good rule of thumb is 1/6th of the range)
  - make a judgment about allowable amounts of error
  - determine a confidence level
- Once these are known, the formula for calculating sample size is
$n = \frac{Z^2 S^2}{E^2}$
where
- Z = standardized value that corresponds to the confidence level (for 95%, Z = 1.96)
- S = sample standard deviation
- E = acceptable magnitude of error
26Estimating sample size
- Suppose a researcher studying annual expenditures on lipstick wishes to have a 95% confidence level (Z = 1.96) and a range of error (E) of less than 2, and an estimate of the standard deviation is 29.
$n = \frac{(1.96^2)(29^2)}{2^2} = \frac{(3.8416)(841)}{4} \approx 808$
If we change the range of acceptable error to 4, sample size falls to 202:
$n = \frac{(1.96^2)(29^2)}{4^2} = \frac{(3.8416)(841)}{16} \approx 202$
If the range of error is 5, n = 129.231.
Thus, if we do the research and determine that the mean is 110 with a standard deviation of 29, we can be 95% confident that the mean for the population is between 108 and 112 (if n = 808), between 106 and 114 (if n = 202), or between 105 and 115 (if n = 129).
Partial source Zikmund (1999), Essentials of
Marketing Research
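The lipstick example can be reproduced with a short sketch of the sample size formula for a mean, n = Z²S²/E². The function name is hypothetical, and results are rounded to the nearest whole respondent to match the slide; rounding up is a common, more conservative alternative.

```python
def sample_size_for_mean(z, s, e):
    """n = (Z^2 * S^2) / E^2, returned unrounded."""
    return (z ** 2 * s ** 2) / e ** 2


for error in (2, 4, 5):
    n = sample_size_for_mean(z=1.96, s=29, e=error)
    print(f"E = {error}: n = {n:.3f} (about {round(n)})")
```

Doubling the acceptable error from 2 to 4 cuts the required sample to roughly a quarter, because E is squared in the denominator.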
27Estimating sample size
- Suppose you wanted to estimate the sample size for a survey which contains the following question:
  - What is your overall attitude towards Hospital X? Very Good 7 6 5 4 3 2 1 Very Poor
- The range of acceptable error is 0.1 points, the confidence level is 95%, and the estimated standard deviation is 1/6 of the range (6/6 = 1).
$n = \frac{Z^2 S^2}{E^2} = \frac{(1.96^2)(1^2)}{0.1^2} \approx 384$, a sample size of 384
If you increase the acceptable error to 0.2, the sample size drops to n = 96!
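A sketch of the same calculation that also applies the rule of thumb for the standard deviation (one sixth of the scale range). The function name and its parameters are illustrative, and the result is rounded to the nearest whole number as on the slide.

```python
def sample_size_from_scale(scale_min, scale_max, e, z=1.96):
    """Estimate S as range / 6, then apply n = Z^2 * S^2 / E^2."""
    s = (scale_max - scale_min) / 6  # rule-of-thumb standard deviation
    return round(z ** 2 * s ** 2 / e ** 2)


n_tight = sample_size_from_scale(1, 7, e=0.1)
n_loose = sample_size_from_scale(1, 7, e=0.2)
print(n_tight, n_loose)
```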
28Sample size determination when a proportion is present
$S_p = \sqrt{\frac{pq}{n}}$
where
- $S_p$ = estimate of the std. error of the proportion
- p = proportion of successes
- q = (1 - p), or the proportion of failures
Suppose that 20% of a sample of 1,200 recall seeing an ad.
$S_p = \sqrt{\frac{(0.2)(0.8)}{1200}} \approx 0.0115$
Confidence interval $= p \pm Z_{cl} S_p = .2 \pm (1.96)(0.0115) \approx .2 \pm .022$
Thus, the population proportion who saw the ad is between 17.8% and 22.2%, with 95% confidence.
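The ad recall example can be checked with a brief sketch. The function name is an assumption; the formula is the standard error of a proportion, the square root of pq/n, followed by p plus or minus Z times that standard error. Carrying full precision gives endpoints that differ slightly in the last digit from the rounded figures above.

```python
import math


def proportion_interval(p, n, z=1.96):
    """Return (standard error, lower bound, upper bound) for a proportion."""
    q = 1 - p
    sp = math.sqrt(p * q / n)
    margin = z * sp
    return sp, p - margin, p + margin


sp, low, high = proportion_interval(p=0.2, n=1200)
print(f"Sp = {sp:.4f}, interval = {low:.3f} to {high:.3f}")
```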
29Sample size determination when a proportion is present
To determine the sample size for a proportion, we need to know or estimate the following:
- $Z_{cl}^2$ = square of the confidence level in standard error units (i.e., typically $1.96^2$, or 3.8416)
- p = estimated proportion of successes
- q = (1 - p), or the proportion of failures
- $E^2$ = square of the maximum allowance for error
We insert this information into the following formula:
$n = \frac{Z_{cl}^2 (pq)}{E^2}$
30Sample size determination when a proportion is present
Example: We estimate that 60% of respondents will prefer to stay in town X rather than drive 20 miles to see a physician. The confidence level is 95%, and the allowable error is +/- 4%.
$n = \frac{Z_{cl}^2 pq}{E^2} = \frac{(1.96)^2(.6)(.4)}{0.04^2} = \frac{(3.8416)(0.24)}{0.0016} \approx 576$
If we assume a 70/30 split and if we increase the maximum allowable error to 5%, what would be n?
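A short sketch of the proportion formula, n = Z²pq/E², which can be used to work through both the example and the question above. The function name is hypothetical, no adjustment for population size is applied, and results are rounded to the nearest whole respondent.

```python
def sample_size_for_proportion(p, e, z=1.96):
    """n = Z^2 * p * q / E^2, rounded to the nearest whole respondent."""
    q = 1 - p
    return round(z ** 2 * p * q / e ** 2)


n_60_40 = sample_size_for_proportion(p=0.6, e=0.04)
n_70_30 = sample_size_for_proportion(p=0.7, e=0.05)
print(n_60_40, n_70_30)
```

A split further from 50/50 makes pq smaller, so the 70/30 case needs fewer respondents than a 60/40 case at the same allowable error.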
31Overall Sample size determination when a proportion is present
- When the split is hypothesized to be 70/30 (95% CI)
  - ±1% error: 7,939
  - ±2% error: 2,009
  - ±3% error: 895
  - ±5% error: 322
- When the split is hypothesized to be 85/15 (95% CI)
  - ±1% error: 4,850
  - ±2% error: 1,222
  - ±3% error: 544
  - ±5% error: 196
Small population sizes typically require a slightly smaller sample size. If the population = 10,000, the 70/30 split sample sizes would be
- ±1% error: 4,465
- ±2% error: 1,678
- ±3% error: 823
- ±5% error: 313
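The slides do not state how the smaller sample sizes were obtained. One common adjustment is the finite population correction, n = n0 / (1 + (n0 - 1) / N), where n0 is the unadjusted sample size and N is the population size. The sketch below assumes that formula; the function names are hypothetical.

```python
def unadjusted_n(p, e, z=1.96):
    """n0 = Z^2 * p * q / E^2, left unrounded."""
    return z ** 2 * p * (1 - p) / e ** 2


def finite_population_n(p, e, population, z=1.96):
    """Apply the finite population correction (assumed, not from the slides)."""
    n0 = unadjusted_n(p, e, z)
    return round(n0 / (1 + (n0 - 1) / population))


for error in (0.01, 0.02, 0.03, 0.05):
    print(f"+/-{error:.0%}: {finite_population_n(0.7, error, 10_000)}")
```

Under that assumption, the results for a 70/30 split and a population of 10,000 agree with the four sample sizes listed above.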