Title: QMS 6351 Statistics and Research Methods Chapter 7 Sampling and Sampling Distributions
Chapter 7 Outline
- Simple random sampling
- Point estimation
- Introduction to sampling distributions
- Sampling distribution of $\bar{x}$
- Sampling distribution of $\bar{p}$
- Other sampling methods
Statistical inference
- The purpose of statistical inference is to obtain information about a population from information contained in a sample.
- A population is the set of all the elements of interest in a study.
- A sample is a subset of the population.
- A parameter is a numerical characteristic of a population.
- A sample statistic is a numerical characteristic of a sample.
- We use a sample statistic to estimate, approximately, the value of the population parameter.
- The sample results provide only estimates (that is, approximate values) of the population characteristics.
- The reason is simply that the sample contains only a portion of the population.
- With proper sampling methods, the sample results will provide good estimates of the population characteristics.
Simple random sampling procedure
Selecting a sample
- Sampling from a finite population. Finite populations are often defined by lists such as organization membership rosters, class rosters, and inventory product numbers.
- Sampling from an infinite population (a process). The population is usually considered infinite if it involves an ongoing process that makes listing or counting every element impossible, for example parts being manufactured on a production line or customers entering a store.
Sampling from a finite population
- A simple random sample of size n from a finite population of size N is a sample selected such that each possible sample of size n has the same probability of being selected.
- Replacing each sampled element before selecting subsequent elements is called sampling with replacement.
- Sampling without replacement is the procedure used most often.
- In large sampling projects, computer-generated random numbers are often used to automate the sample selection process.
Example: St. Andrews College
St. Andrews College received 900 applications for admission in the upcoming year from prospective students. The applicants were numbered from 1 to 900 as their applications arrived. The Director of Admissions would like to select a simple random sample of 30 applicants.
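Sampling from a finite population using Excel
- RAND(): Excel generates a random number greater than or equal to 0 and less than 1.
- RAND()*N: Excel generates a random number greater than or equal to 0 and less than N.
- INT(RAND()*900): Excel generates a random integer from 0 to 899. Adding 1, as in INT(RAND()*900)+1, gives an applicant number from 1 to 900.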
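The same selection can be sketched outside Excel. The snippet below is a minimal illustration in Python, not part of the original slides; the function name `select_applicants` and the seed value are assumptions made for the example. It draws 30 distinct applicant numbers from 1 to 900, which corresponds to sampling without replacement.

```python
import random

def select_applicants(population_size=900, sample_size=30, seed=None):
    """Return a sorted simple random sample of applicant numbers (no repeats)."""
    rng = random.Random(seed)  # seed is optional; fixing it makes the draw repeatable
    # rng.sample draws without replacement, so every subset of size 30 is equally likely
    return sorted(rng.sample(range(1, population_size + 1), sample_size))

sample_ids = select_applicants(seed=7)
print(sample_ids)
```

Unlike repeated calls to INT(RAND()*900)+1, this approach cannot select the same applicant twice.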
Sampling from an infinite population
- In the case of infinite populations, it is impossible to obtain a list of all elements in the population.
- The random number selection procedure cannot be used for infinite populations.
- A simple random sample from an infinite population is a sample selected such that the following conditions are satisfied:
  - Each element selected comes from the same population.
  - Each element is selected independently.
Point estimation
- In point estimation we use the data from the sample to compute a value of a sample statistic that serves as an estimate of a population parameter.
- A point estimate is a statistic computed from a sample that gives a single value for the population parameter.
- An estimator is a rule or strategy for using the data to estimate the parameter.
Terminology of point estimation
- We refer to $\bar{x}$ as the point estimator of the population mean $\mu$.
- We refer to $s$ as the point estimator of the population standard deviation $\sigma$.
- We refer to $\bar{p}$ as the point estimator of the population proportion $p$.
- The actual numerical value obtained for $\bar{x}$, $s$, or $\bar{p}$ in a particular sample is called the point estimate of the parameter.
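As a small illustration of these three estimators, the sketch below computes point estimates in Python. The five records are hypothetical values invented for the example, not data from the St. Andrews sample.

```python
from statistics import mean, stdev

# Hypothetical (SAT score, wants on-campus housing) records for five sampled applicants.
sample = [(1020, True), (950, True), (1100, False), (980, True), (1005, False)]

sat_scores = [score for score, _ in sample]
x_bar = mean(sat_scores)    # point estimate of the population mean
s = stdev(sat_scores)       # point estimate of the population std. deviation (n - 1 divisor)
p_bar = sum(wants for _, wants in sample) / len(sample)   # point estimate of the proportion

print(x_bar, round(s, 2), p_bar)
```

The functions are the estimators (the rules); the numbers they return for this particular sample are the point estimates.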
Example: St. Andrews College
- Recall that St. Andrews College received 900 applications from prospective students. The application form contains a variety of information, including the individual's scholastic aptitude test (SAT) score and whether or not the individual desires on-campus housing.
- At a meeting in a few hours, the Director of Admissions would like to announce the average SAT score and the proportion of applicants who want to live on campus, for the population of 900 applicants.
- However, the necessary data on the applicants have not yet been entered in the college's computerized database. So, the Director decides to estimate the values of the population parameters of interest based on sample statistics. The sample of 30 applicants selected earlier with Excel's RAND() function will be used.
Point estimation using Excel
[Figure: Excel value worksheet. Note: Rows 10-31 are not shown.]
Point estimates
Note: Different random numbers would have identified a different sample, which would have resulted in different point estimates.
Population parameters
Once all the data for the 900 applicants were entered in the college's database, the values of the population parameters of interest were calculated.
Summary of point estimates obtained from a simple random sample

Population parameter | Parameter value | Point estimator | Point estimate
$\mu$ = population mean SAT score | 990 | $\bar{x}$ = sample mean SAT score | 997
$\sigma$ = population std. deviation for SAT score | 80 | $s$ = sample std. deviation for SAT score | 75.2
$p$ = population proportion wanting campus housing | .72 | $\bar{p}$ = sample proportion wanting campus housing | .68
Making inferences about a population mean
- Population with mean $\mu$ = ?
- A simple random sample of n elements is selected from the population.
Population vs. sampling distribution
- The population distribution is the probability distribution derived from the information on all elements of a population.
- The probability distribution of a sample statistic (such as $\bar{x}$ or $\bar{p}$) is called its sampling distribution.
Sampling distribution of $\bar{x}$
- The sampling distribution of the sample mean ($\bar{x}$) is the probability distribution of all possible values of $\bar{x}$.
- We need to know:
  - Expected value of $\bar{x}$
  - Standard deviation of $\bar{x}$
  - Form of the sampling distribution of $\bar{x}$
Mean of the sampling distribution of $\bar{x}$
- The mean of the sampling distribution of $\bar{x}$ is equal to the mean of the population. Thus, $E(\bar{x}) = \mu$.
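Standard deviation of the sampling distribution of $\bar{x}$
- (1) Infinite population (N is unknown), or (2) finite population and $n/N \le 0.05$:
  $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
- Finite population and $n/N > 0.05$:
  $\sigma_{\bar{x}} = \sqrt{\frac{N-n}{N-1}} \cdot \frac{\sigma}{\sqrt{n}}$
- $\sigma_{\bar{x}}$ is referred to as the standard error of the mean.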
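The two cases can be folded into one small function. This is a sketch under the 5% rule stated on this slide; the function name `standard_error_mean` and its parameter names are illustrative assumptions, and the St. Andrews values (standard deviation 80, n = 30, N = 900) are taken from the example in this chapter.

```python
import math

def standard_error_mean(sigma, n, N=None):
    """Standard error of the sample mean.

    N=None treats the population as infinite. For a finite population, the
    correction factor sqrt((N - n) / (N - 1)) is applied only when n/N > 0.05.
    """
    se = sigma / math.sqrt(n)
    if N is not None and n / N > 0.05:
        se *= math.sqrt((N - n) / (N - 1))
    return se

# St. Andrews: n/N = 30/900 is about 0.033, so no correction factor is applied.
print(round(standard_error_mean(80, 30, N=900), 1))
```

With n = 30 the result rounds to 14.6, the standard error used in the St. Andrews probability calculations.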
Two important observations
- 1. The spread of the sampling distribution of $\bar{x}$ is smaller than the spread of the corresponding population distribution. In other words, $\sigma_{\bar{x}} < \sigma$ whenever n > 1.
- 2. The standard deviation of the sampling distribution of $\bar{x}$ decreases as the sample size increases.
Form of the sampling distribution of $\bar{x}$
- 1. The population has a normal distribution.
  - If the population from which the samples are drawn is normally distributed, then the sampling distribution of the sample mean will also be normally distributed for any sample size.
- 2. The population is not normally distributed but the sample size is large ($n \ge 30$).
  - According to the Central Limit Theorem, for a large sample size ($n \ge 30$), the sampling distribution of the sample mean is approximately normal, irrespective of the shape of the population distribution.
  - In cases where the population is highly skewed or outliers are present, samples of size 50 or more may be needed.
- 3. The sample size is small ($n < 30$) and the population is not normally distributed.
  - Use special statistical procedures.
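The Central Limit Theorem case can be checked with a small simulation. The sketch below is an illustration only: the exponential population, the seed, and the number of replications are all assumptions chosen for the example. It draws many samples of size 30 from a skewed population and compares the mean and spread of the sample means with $\mu$ and $\sigma/\sqrt{n}$.

```python
import random
from statistics import mean, stdev

rng = random.Random(42)   # fixed seed so the run is repeatable
n = 30                    # size of each sample
replications = 5000       # number of samples drawn

# Skewed population: exponential with rate 1, so mu = 1 and sigma = 1.
sample_means = [
    sum(rng.expovariate(1.0) for _ in range(n)) / n
    for _ in range(replications)
]

print(round(mean(sample_means), 3))    # should be close to mu = 1
print(round(stdev(sample_means), 3))   # should be close to sigma / sqrt(n), about 0.183
```

A histogram of `sample_means` would look roughly bell-shaped even though the population itself is strongly right-skewed.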
Example: St. Andrews College
- With $\mu = 990$, $\sigma = 80$, and n = 30, the sampling distribution of $\bar{x}$ has $E(\bar{x}) = 990$ and $\sigma_{\bar{x}} = 80/\sqrt{30} \approx 14.6$.
- What is the probability that the sample mean will be between 980 and 1000? In other words, what is the probability that a simple random sample of 30 applicants will provide an estimate of the population mean SAT score that is within ±10 of the actual population mean?
[Figure: sampling distribution of $\bar{x}$ centered at 990; the area between 980 and 1000 is .5034.]
- The probability of 0.5034 means that, for a large number of samples of size 30 selected from the population, we can expect that in 50.34% of all cases the sample mean will be within ±10 of the actual population mean (that is, 980-1000) and in 49.66% of all cases the sample mean will be more than 10 away from the actual population mean (that is, below 980 or above 1000).
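This probability can be reproduced with Python's standard library. The sketch below assumes the sampling distribution is normal with mean 990 and standard error $80/\sqrt{30}$; the variable names are illustrative. Because it does not round z to two decimals as a normal table does, it gives roughly .506 rather than the table-based .5034.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 990, 80, 30                  # population values and sample size from the example
std_error = sigma / sqrt(n)                 # about 14.6
sampling_dist = NormalDist(mu, std_error)   # normal model for the sample mean

prob = sampling_dist.cdf(1000) - sampling_dist.cdf(980)
print(round(prob, 4))
```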
Relationship between the sample size and the sampling distribution of $\bar{x}$
- Example: St. Andrews College
- Suppose we select a simple random sample of 100 applicants instead of the 30 originally considered.
- $E(\bar{x}) = \mu$ regardless of the sample size. In our example, $E(\bar{x})$ remains at 990.
- Whenever the sample size is increased, the standard error of the mean $\sigma_{\bar{x}}$ is decreased. With the increase in the sample size to n = 100, the standard error of the mean is decreased from 14.6 to $\sigma_{\bar{x}} = 80/\sqrt{100} = 8.0$.
Example: St. Andrews College
- Recall that when n = 30, $P(980 < \bar{x} < 1000) = .5034$.
- Now, with n = 100, $P(980 < \bar{x} < 1000) = .7888$.
- Because the sampling distribution with n = 100 has a smaller standard error, the values of $\bar{x}$ have less variability and tend to be closer to the population mean than the values of $\bar{x}$ with n = 30.
[Figure: sampling distribution of $\bar{x}$ with n = 100, centered at 990; the area between 980 and 1000 is .7888.]
- The probability of 0.7888 means that, for a large number of samples of size 100 selected from the population, we can expect that in 78.88% of all cases the sample mean will be within ±10 of the actual population mean (that is, 980-1000) and in 21.12% of all cases the sample mean will be more than 10 away from the actual population mean (that is, below 980 or above 1000).
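The effect of the sample size can be tabulated with a short helper. This is a sketch assuming a normal sampling distribution and an effectively infinite population; the helper name `prob_within` and the extra sample size of 400 are assumptions added for illustration.

```python
from math import sqrt
from statistics import NormalDist

def prob_within(margin, sigma, n):
    """P(|x_bar - mu| <= margin) under a normal sampling distribution."""
    z = margin / (sigma / sqrt(n))
    return 2 * NormalDist().cdf(z) - 1

for n in (30, 100, 400):   # 400 is a hypothetical larger sample, not from the slides
    print(n, round(prob_within(10, 80, n), 4))
```

For n = 100 the result agrees with the slide's .7888 to three decimals; the small gap comes from table rounding.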
Making inferences about a population proportion
- Population with proportion $p$ = ?
- A simple random sample of n elements is selected from the population.
Sampling distribution of $\bar{p}$
- The sampling distribution of the sample proportion ($\bar{p}$) is the probability distribution of all possible values of $\bar{p}$.
- We need to know:
  - Expected value of $\bar{p}$
  - Standard deviation of $\bar{p}$
  - Form of the sampling distribution of $\bar{p}$
Mean of the sampling distribution of $\bar{p}$
- The mean of the sampling distribution of $\bar{p}$ is equal to the population proportion. Thus, $E(\bar{p}) = p$.
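Standard deviation of the sampling distribution of $\bar{p}$
- (1) Infinite population (N is unknown), or (2) finite population and $n/N \le 0.05$:
  $\sigma_{\bar{p}} = \sqrt{\frac{p(1-p)}{n}}$
- Finite population and $n/N > 0.05$:
  $\sigma_{\bar{p}} = \sqrt{\frac{N-n}{N-1}} \cdot \sqrt{\frac{p(1-p)}{n}}$
- $\sigma_{\bar{p}}$ is referred to as the standard error of the proportion.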
Form of the sampling distribution of $\bar{p}$
- The sampling distribution of $\bar{p}$ can be approximated by a normal probability distribution whenever the sample size is large.
- The sample size is considered large whenever the following two conditions are satisfied: $np \ge 5$ and $n(1 - p) \ge 5$.
- For values of p near .50, sample sizes as small as 10 permit a normal approximation.
- With very small (approaching 0) or very large (approaching 1) values of p, much larger samples are needed.
Example: St. Andrews College
- For our example, with n = 30 and p = .72, the normal distribution is an acceptable approximation because
  np = 30(.72) = 21.6 > 5
  and
  n(1 - p) = 30(.28) = 8.4 > 5
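The large-sample check is easy to automate. The sketch below is illustrative; the function name `normal_approx_ok` and the `threshold` parameter are assumptions, with the threshold defaulting to the value 5 used in this example. It also computes the standard error of the proportion for n = 30 and p = .72.

```python
from math import sqrt

def normal_approx_ok(n, p, threshold=5):
    """True when both np and n(1 - p) reach the threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold

n, p = 30, 0.72
print(normal_approx_ok(n, p))             # both 21.6 and 8.4 exceed 5
print(round(sqrt(p * (1 - p) / n), 4))    # standard error of the proportion
```

A value of p close to 0 or 1 fails the check at the same sample size, for example p = .05 gives np = 1.5.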
Sampling distribution of $\bar{p}$
Example: St. Andrews College
- Recall that 72% of the prospective students applying to St. Andrews College desire on-campus housing, so $E(\bar{p}) = .72$ and $\sigma_{\bar{p}} = \sqrt{.72(.28)/30} \approx .082$.
- What is the probability that a simple random sample of 30 applicants will provide an estimate of the population proportion of applicants desiring on-campus housing that is within plus or minus .05 of the actual population proportion?
[Figure: sampling distribution of $\bar{p}$ centered at .72; the area between .67 and .77 is .4582.]
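The area of .4582 can be checked numerically. This sketch assumes the normal approximation to the sampling distribution of $\bar{p}$, with p = .72 and n = 30 from the example; variable names are illustrative. The result is about .458, matching the table-based value up to rounding of z.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.72, 30
std_error = sqrt(p * (1 - p) / n)           # about .082
sampling_dist = NormalDist(p, std_error)    # normal approximation for the sample proportion

prob = sampling_dist.cdf(0.77) - sampling_dist.cdf(0.67)
print(round(prob, 4))
```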
Other sampling methods
- Stratified random sampling
- Cluster sampling
- Systematic sampling
- Convenience sampling
- Judgment sampling
Stratified random sampling
- The population is first divided into groups of elements called strata.
- Each element in the population belongs to one and only one stratum.
- Best results are obtained when the elements within each stratum are as much alike as possible (i.e., a homogeneous group).
- A simple random sample is taken from each stratum.
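The procedure can be sketched in a few lines of Python. Everything here is illustrative: the function name `stratified_sample`, the equal allocation of `per_stratum` elements to each stratum, and the student population are assumptions, not part of the original slides.

```python
import random
from collections import defaultdict

def stratified_sample(elements, stratum_of, per_stratum, seed=None):
    """Take a simple random sample of per_stratum elements from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for element in elements:
        strata[stratum_of(element)].append(element)   # each element lands in exactly one stratum
    return {name: rng.sample(members, per_stratum) for name, members in strata.items()}

# Hypothetical population: 60 students labeled by class year.
years = ("freshman", "sophomore", "junior")
students = [(i, years[i % 3]) for i in range(60)]
picked = stratified_sample(students, stratum_of=lambda s: s[1], per_stratum=4, seed=1)
print(picked)
```

Allocating the sample in proportion to stratum size is a common alternative to the equal allocation used here.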
Cluster sampling
- The population is first divided into separate groups of elements called clusters.
- Ideally, each cluster is a representative small-scale version of the population (i.e., a heterogeneous group).
- A simple random sample of the clusters is then taken.
- All elements within each sampled (chosen) cluster form the sample.
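A minimal Python sketch of this procedure follows. The function name `cluster_sample` and the city-block population are hypothetical, chosen only to show whole clusters being selected at random.

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """Randomly choose whole clusters and return every element in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)   # simple random sample of cluster names
    return chosen, [element for name in chosen for element in clusters[name]]

# Hypothetical population: 5 city blocks with 4 households each.
blocks = {f"block_{b}": [f"household_{b}_{h}" for h in range(4)] for b in range(5)}
chosen_blocks, households = cluster_sample(blocks, num_clusters=2, seed=3)
print(chosen_blocks, len(households))
```

The contrast with stratified sampling is that only some groups are visited, but every element in a visited group is included.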
Systematic sampling
- If a sample size of n is desired from a population containing N elements, we might sample one element for every N/n elements in the population. For example, with N = 900 and n = 30, N/n = 30.
- We randomly select one of the first N/n elements from the population list. We then select every (N/n)th element that follows in the population list.
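The selection rule can be sketched as follows, with the sampling interval equal to N/n. The function name `systematic_sample` is illustrative, and the sketch assumes N is a multiple of n, or close to one, so that integer division gives a sensible interval.

```python
import random

def systematic_sample(population, n, seed=None):
    """Pick a random start among the first k = N // n elements, then take every k-th element."""
    k = len(population) // n                    # sampling interval N/n
    start = random.Random(seed).randrange(k)    # random position within the first k elements
    return population[start::k][:n]

applicants = list(range(1, 901))                # applicant numbers 1..900, as in the St. Andrews list
sample = systematic_sample(applicants, 30, seed=5)
print(sample)
```

With N = 900 and n = 30 the interval is 30, so the sample consists of one applicant from each consecutive block of 30.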
Convenience sampling
- It is a nonprobability sampling technique. Items are included in the sample without known probabilities of being selected. The sample is identified primarily by convenience.
- Example: A professor conducting research might use student volunteers to constitute a sample.
Judgment sampling
- The person most knowledgeable on the subject of the study selects elements of the population that he or she feels are most representative of the population. It is a nonprobability sampling technique.
- Example: A reporter might sample three or four senators, judging them as reflecting the general opinion of the senate.