Title: Sampling Distributions
Chapter 7
Sampling Distributions
Introduction
- In real life, calculating the parameters of a population is often prohibitive (difficult or impossible to do) because populations are very large.
- Rather than investigating the whole population, we take a sample, calculate a statistic related to the parameter of interest, and make an inference.
- The sampling distribution of the statistic is the tool that tells us how close the statistic is likely to be to the parameter.
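[Figure: sampling techniques draw a sample from the population; statistics computed from the sample are used, through statistical procedures, to make inferences about the population parameters.]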
Sampling
- Examples
- A pollster is sure that the responses to his agree/disagree question will follow a binomial distribution, but p, the proportion of those who agree in the population, is unknown.
- An agronomist believes that the yield per acre of a variety of wheat is approximately normally distributed, but the mean μ and the standard deviation σ of the yields are unknown.
- If you want the sample to provide reliable information about the population, you must select your sample in a certain way!
Types of Sampling Methods/Techniques
The sampling plan or experimental design determines the amount of information you can extract, and often allows you to measure the reliability of your inference.
Sampling
- Probability Samples: Simple Random, Stratified, Cluster, Systematic
- Non-Probability Samples: Judgement, Chunk (convenience), Quota
Simple Random Sampling
- Sampling Plan
- Simple random sampling is a method of sampling
that allows each possible sample of size n an
equal probability of being selected.
Example
- There are 89 students in a statistics class. The instructor wants to choose 5 students to form a project group. How should he proceed?
- Give each student a number from 01 to 89.
- Choose 5 pairs of random digits from the random number table.
- If a number from 90 to 99, or 00, is chosen, or a number that has already been chosen comes up again, choose another number.
- The five students with those numbers form the group.
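A minimal Python sketch of the random-number-table procedure, assuming Python's `random` module stands in for a printed table. `choose_group` is a hypothetical helper name; it skips pairs outside 01 to 89 and also skips repeats, so five different students are chosen.

```python
import random

def choose_group(class_size=89, group_size=5, seed=None):
    """Hypothetical helper mimicking the random-number-table procedure:
    read two-digit pairs 00..99, keep those in 01..class_size,
    skip out-of-range pairs and repeats until group_size students are chosen."""
    rng = random.Random(seed)
    chosen = []
    while len(chosen) < group_size:
        pair = rng.randint(0, 99)  # one pair of random digits, 00..99
        if 1 <= pair <= class_size and pair not in chosen:
            chosen.append(pair)
    return chosen

group = choose_group(seed=2024)
print(group)
```

Passing a fixed `seed` makes the draw reproducible. In practice, `random.sample(range(1, 90), 5)` produces a simple random sample of 5 student numbers directly.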
Other Sampling Techniques
- There are several other sampling plans that still involve randomization:
- Stratified random sample: Divide the population into subpopulations, or strata, and select a simple random sample from each stratum.
- Cluster sample: Divide the population into subgroups called clusters, select a simple random sample of clusters, and take a census of every element in each selected cluster.
- 1-in-k systematic sample: Randomly select one of the first k elements in an ordered population, and then select every k-th element thereafter.
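The three plans above can be sketched in Python as follows. This is an illustration only: the helper names and the toy population of four groups (A to D) with ten units each are hypothetical, and `random.Random` stands in for a random number table.

```python
import random

def stratified_sample(strata, n_per_stratum, rng):
    """Simple random sample of n_per_stratum units from every stratum."""
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Simple random sample of whole clusters, then a census of each chosen cluster."""
    picked = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in picked for unit in clusters[name]]

def systematic_sample(units, k, rng):
    """1-in-k systematic sample: random start among the first k units, then every k-th."""
    start = rng.randrange(k)  # index 0..k-1
    return units[start::k]

rng = random.Random(7)
# Hypothetical toy population: 4 groups of 10 numbered units each.
groups = {g: [f"{g}-{i}" for i in range(10)] for g in "ABCD"}
print(stratified_sample(groups, 2, rng))          # 2 units from each of A, B, C, D
print(cluster_sample(groups, 2, rng))             # every unit in 2 randomly chosen groups
print(systematic_sample(list(range(100)), 10, rng))  # 10 units, spaced 10 apart
```

The same `groups` dictionary serves as strata in one function and as clusters in the other; the difference lies only in whether we sample within every group or sample whole groups.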
Examples
- Stratified: Divide West Malaysia into states and take a simple random sample within each state.
- Cluster: Divide West Malaysia into states and take a simple random sample of 5 states.
- Cluster: Divide a city into city blocks, choose a simple random sample of 10 city blocks, and interview all who live there.
- 1-in-50 systematic: Choose an entry at random from the phone book, and select every 50th number thereafter.
Non-Random Sampling Plans
- There are several other sampling plans that do not involve randomization. They should NOT be used for statistical inference!
- Convenience sample: A sample that can be taken easily without random selection (e.g., people walking by on the street).
- Judgment sample: The sampler decides who will and won't be included in the sample.
- Quota sample: The makeup of the sample must reflect the makeup of the population on some selected characteristic (e.g., race, ethnic origin, gender).
Types of Samples
- Sampling can occur in two types of practical situations:
- 1. Observational studies: The data existed before you decided to study it. Watch out for:
- Nonresponse: Are the responses biased because only opinionated people responded?
- Undercoverage: Are certain segments of the population systematically excluded?
- Wording bias: The question may be too complicated or poorly worded.
Types of Samples (continued)
- 2. Experimentation: The data are generated by imposing an experimental condition or treatment on the experimental units.
- Hypothetical populations can make random sampling difficult, if not impossible.
- Samples must sometimes be chosen so that the experimenter believes they are representative of the whole population.
- Samples must behave like random samples!
Sampling Distributions
- Numerical descriptive measures calculated from the sample are called statistics.
- Statistics vary from sample to sample and hence are random variables.
- The probability distributions for statistics are called sampling distributions.
- In repeated sampling, they tell us what values of the statistics can occur and how often each value occurs.
Sampling Distributions
Definition: The sampling distribution of a statistic is the probability distribution for the possible values of the statistic that result when random samples of size n are repeatedly drawn from the population.
Example: Population = 3, 5, 2, 1. Draw samples of size n = 3 without replacement. There are four possible samples, so each value of x-bar is equally likely, with probability 1/4.
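To see where the probability 1/4 comes from, the Python sketch below (standard library only) lists every sample of size 3 drawn without replacement from the population 3, 5, 2, 1, together with its value of x-bar. It also shows that the average of the four x-bar values equals the population mean, 11/4.

```python
from fractions import Fraction
from itertools import combinations

population = [3, 5, 2, 1]
n = 3

# Every sample of size 3 drawn without replacement (order ignored).
samples = list(combinations(population, n))
means = [Fraction(sum(s), n) for s in samples]

for s, m in zip(samples, means):
    print(s, "x-bar =", m, "probability", Fraction(1, len(samples)))

# The mean of the sampling distribution equals the population mean.
print("mean of x-bar:", sum(means) / len(means))
```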
1. Sampling Distribution of the Mean
- A fair die is thrown infinitely many times, with the random variable X = number of spots on any throw.
- The probability distribution of X is

x      1    2    3    4    5    6
P(x)   1/6  1/6  1/6  1/6  1/6  1/6

- The mean and variance of X are
$\mu = \sum x\,P(x) = 3.5$
$\sigma^2 = \sum (x - \mu)^2 P(x) = 35/12 \approx 2.92$
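The mean and variance of X can be checked with exact fractions; a minimal Python sketch:

```python
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)  # fair die: each face equally likely

mu = sum(x * p for x in faces)                # E(X)
var = sum((x - mu) ** 2 * p for x in faces)   # E[(X - mu)^2]
print(mu, var, float(var))  # mu = 7/2, var = 35/12 (about 2.92)
```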
Sampling Distribution of Two Dice
- A sampling distribution is created by looking at all samples of size n = 2 (i.e., two dice) and their means.
- While there are 36 possible samples of size 2, there are only 11 values for x-bar, and some (e.g., 3.5) occur more frequently than others (e.g., 1).
Sampling Distribution of Two Dice
- The sampling distribution of x-bar is shown below:

x-bar     1     1.5   2     2.5   3     3.5   4     4.5   5     5.5   6
P(x-bar)  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
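Example
Two fair dice are thrown. Based on all 36 possible samples, the mean and standard deviation of x-bar can also be calculated directly:
$\mu_{\bar{x}} = \sum \bar{x}\,P(\bar{x}) = 3.5$
$\sigma^2_{\bar{x}} = \sum (\bar{x} - 3.5)^2 P(\bar{x}) = 35/24 \approx 1.46, \qquad \sigma_{\bar{x}} \approx 1.21$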
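The enumeration over all 36 samples can be sketched in Python with exact fractions. It rebuilds the distribution of x-bar for n = 2 and computes its mean and variance, which should match μ = 3.5 and σ²/n = (35/12)/2 = 35/24.

```python
import math
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered outcomes of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))
xbar_counts = Counter(Fraction(a + b, 2) for a, b in outcomes)
dist = {xbar: Fraction(c, 36) for xbar, c in sorted(xbar_counts.items())}

mean_xbar = sum(x * p for x, p in dist.items())
var_xbar = sum((x - mean_xbar) ** 2 * p for x, p in dist.items())
sd_xbar = math.sqrt(var_xbar)

print(len(dist))            # 11 distinct values of x-bar
print(mean_xbar, var_xbar)  # 7/2 and 35/24, i.e. sigma^2 / n with n = 2
print(round(sd_xbar, 4))    # about 1.21
```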
Sampling Distribution of the Mean
Central Limit Theorem
How Large is Large?
If the population is normal, then the sampling distribution of x-bar will also be normal, no matter what the sample size. When the population is approximately symmetric, the distribution of x-bar becomes approximately normal for relatively small values of n. When the population is skewed, the sample size should, as a rule of thumb, be at least 30 before the sampling distribution of x-bar becomes approximately normal.
The Sampling Distribution of the Sample Mean
- A random sample of size n is selected from a population with mean μ and standard deviation σ.
- The sampling distribution of the sample mean x-bar will have mean μ and standard deviation $\sigma/\sqrt{n}$.
- If the original population is normal, the sampling distribution will be normal for any sample size.
- If the original population is nonnormal, the sampling distribution will be approximately normal when n is large.
The standard deviation of x-bar is sometimes called the STANDARD ERROR (SE).
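Sampling Distribution of the Mean
Central Limit Theorem
Given a population with mean μ and variance σ², the sampling distribution of x-bar for samples of size n will have:
- Mean: $\mu_{\bar{x}} = \mu$
- Variance: $\sigma^2_{\bar{x}} = \sigma^2 / n$
- Standard deviation (standard error of the mean): $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
As n increases, the shape of the distribution becomes approximately normal, whatever the shape of the population.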
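A small simulation can illustrate the theorem. The Python sketch below assumes a right-skewed exponential population with mean 1 and standard deviation 1 (a hypothetical choice, not from the slides), draws 5,000 samples of size n = 30, and compares the mean and standard deviation of the sample means with μ and σ/√n.

```python
import math
import random
import statistics

rng = random.Random(42)
n, reps = 30, 5000
mu, sigma = 1.0, 1.0  # exponential(rate = 1) population: right-skewed, mean 1, sd 1

# Draw many samples of size n and keep each sample mean.
xbars = [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

print(statistics.fmean(xbars))   # close to mu
print(statistics.stdev(xbars))   # close to sigma / sqrt(n)
print(sigma / math.sqrt(n))      # about 0.1826
```

Plotting a histogram of `xbars` (for example with a plotting library of your choice) would show a roughly bell-shaped curve even though the population itself is strongly skewed.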
Example
Finding Probabilities for the Sample Mean
- If the sampling distribution of x-bar is normal or approximately normal, standardize or rescale the interval of interest in terms of
$z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
- Find the appropriate area using the Z table.
Example: A random sample of size n = 16 is drawn from a normal distribution with μ = 10 and σ = 8, so $\sigma_{\bar{x}} = 8/\sqrt{16} = 2$.
Example
A soda filling machine is supposed to fill cans
of soda with 12 fluid ounces. Suppose that the
fills are actually normally distributed with a
mean of 12.1 oz and a standard deviation of 0.2
oz. What is the probability that the average fill
for a 6-pack of soda is less than 12 oz?
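The standardization step for this example can be sketched in Python, using `statistics.NormalDist` in place of the Z table; `prob_mean_below` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def prob_mean_below(x, mu, sigma, n):
    """P(x-bar < x) when x-bar ~ Normal(mu, sigma / sqrt(n))."""
    se = sigma / math.sqrt(n)   # standard error of the mean
    z = (x - mu) / se           # standardize
    return NormalDist().cdf(z)  # area to the left of z

p = prob_mean_below(12.0, mu=12.1, sigma=0.2, n=6)
print(round(p, 4))  # about 0.110
```

Here z = (12 − 12.1)/(0.2/√6) ≈ −1.22, so roughly 11% of 6-packs would average less than 12 oz.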
Exercise
- The time that a laptop's battery pack can function before recharging is needed is normally distributed with a mean of 6 hours and a standard deviation of 1.8 hours. A random sample of 25 laptops with this type of battery pack is selected and tested. What is the probability that the mean time until recharging is needed is at least 7 hours?
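One way to check an answer to the exercise, sketched in Python with `statistics.NormalDist` standing in for the Z table:

```python
import math
from statistics import NormalDist

mu, sigma, n = 6.0, 1.8, 25
se = sigma / math.sqrt(n)                # 1.8 / 5 = 0.36 hours
z = (7.0 - mu) / se                      # about 2.78
p_at_least_7 = 1 - NormalDist().cdf(z)   # upper-tail area
print(round(se, 2), round(z, 2), round(p_at_least_7, 4))
```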
The characteristics of the sampling distribution of a statistic
- The distribution of values is obtained by means of repeated sampling.
- The samples are all of size n.
- The samples are drawn from the same population.
2. Sampling Distribution of a Proportion
The proportion in the sample (statistic) is denoted "p-hat", $\hat{p}$.
The proportion in the population (parameter) is denoted p.
Types of response variables
Response type
- Quantitative: sums, averages
- Categorical: counts, proportions
Prior chapters have focused on quantitative response variables. We now focus on categorical response variables.
The Sampling Distribution of the Sample Proportion
Normal Approximation to the Binomial
- Under certain conditions, a binomial random variable has a distribution that is approximately normal.
- When n is large, and p is not too close to zero or one, areas under the normal curve with mean np and standard deviation $\sqrt{np(1-p)}$ can be used to approximate binomial probabilities.
- Make sure that np and n(1 − p) are both greater than 5 to avoid inaccurate approximations!
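The quality of the approximation can be checked numerically. This Python sketch borrows n = 100 and p = 0.4 from the sample-proportion example, uses a hypothetical cutoff k = 45, and compares the exact binomial probability P(X ≤ 45) with the normal approximation. Note that it applies a continuity correction (adding 0.5 to k), which these slides do not mention; treat that as an added assumption.

```python
import math
from statistics import NormalDist

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def normal_approx_cdf(k, n, p):
    """Normal approximation to P(X <= k), with a continuity correction of 0.5 (an assumption)."""
    mean, sd = n * p, math.sqrt(n * p * (1 - p))
    return NormalDist(mean, sd).cdf(k + 0.5)

n, p, k = 100, 0.4, 45                 # k = 45 is a hypothetical cutoff
assert n * p > 5 and n * (1 - p) > 5   # rule of thumb from the slide
print(binom_cdf(k, n, p), normal_approx_cdf(k, n, p))  # the two values are close
```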
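The Sampling Distribution of the Sample Proportion
- If a random sample of n observations is selected from a binomial population with parameter p, the sample proportion $\hat{p} = x/n$, where x is the number of successes in the sample, has mean p and standard deviation $\sqrt{p(1-p)/n}$.
The standard deviation of p-hat is sometimes called the STANDARD ERROR (SE) of p-hat.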
Finding Probabilities for the Sample Proportion
- If the sampling distribution of p-hat is normal or approximately normal, standardize or rescale the interval of interest in terms of
$z = \dfrac{\hat{p} - p}{\sqrt{p(1-p)/n}}$
- Find the appropriate area using the Z table.
This applies if both np > 5 and n(1 − p) > 5.
Example: A random sample of size n = 100 is drawn from a binomial population with p = 0.4.
Example
The soda bottler in the previous example claims that only 5% of the soda cans are underfilled. A quality control technician randomly samples 200 cans of soda. What is the probability that more than 10% of the cans are underfilled?
n = 200, U = underfilled can, p = P(U) = 0.05, q = 0.95
np = 10 and nq = 190, so it is OK to use the normal approximation.
$P(\hat{p} > 0.10) = P\left(z > \dfrac{0.10 - 0.05}{\sqrt{0.05(0.95)/200}}\right) = P(z > 3.24) \approx 0.0006$
This would be very unusual, if indeed p = 0.05!
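The calculation can be sketched in Python, reading the bottler's claim as p = 0.05 (5% underfilled) and the question as P(p-hat > 0.10), with `statistics.NormalDist` in place of the Z table:

```python
import math
from statistics import NormalDist

n, p = 200, 0.05
q = 1 - p
assert n * p > 5 and n * q > 5   # np = 10, nq = 190

se = math.sqrt(p * q / n)        # standard error of p-hat, about 0.0154
z = (0.10 - p) / se              # about 3.24
prob = 1 - NormalDist().cdf(z)   # P(p-hat > 0.10)
print(round(se, 4), round(z, 2), round(prob, 4))
```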
Example
Sampling Distribution of the Difference Between Two Averages
- Theorem: If independent samples of size n1 and n2 are drawn at random from two populations with means μ1 and μ2 and variances σ1² and σ2², respectively, then the sampling distribution of the difference of means, $\bar{x}_1 - \bar{x}_2$, is approximately normally distributed with mean and variance given by
$\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2, \qquad \sigma^2_{\bar{x}_1 - \bar{x}_2} = \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}$
Example
- Starting salaries for MBA grads at two universities are normally distributed with the following means and standard deviations. Samples from each school are taken:

                 University 1     University 2
Mean             RM 62,000/yr     RM 60,000/yr
Std. dev.        RM 14,500/yr     RM 18,300/yr
Sample size, n   50               60

- What is the sampling distribution of $\bar{x}_1 - \bar{x}_2$?
- What is the probability that the sample mean of U1 students will exceed the sample mean of U2 students?
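Example
- Mean: $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2 = 62{,}000 - 60{,}000 = 2{,}000$
- Standard deviation: $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{14{,}500^2}{50} + \dfrac{18{,}300^2}{60}} = 3128.3$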
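A Python sketch that reproduces the mean and standard deviation of the difference and answers the second question, P(x̄1 − x̄2 > 0), with `statistics.NormalDist` in place of the Z table:

```python
import math
from statistics import NormalDist

mu1, sd1, n1 = 62_000, 14_500, 50   # University 1
mu2, sd2, n2 = 60_000, 18_300, 60   # University 2

mean_diff = mu1 - mu2                            # 2000
se_diff = math.sqrt(sd1**2 / n1 + sd2**2 / n2)   # about 3128.3
# P(x-bar_1 > x-bar_2) = P(x-bar_1 - x-bar_2 > 0)
prob = 1 - NormalDist(mean_diff, se_diff).cdf(0)
print(mean_diff, round(se_diff, 1), round(prob, 4))
```

Standardizing by hand gives the same thing: z = (0 − 2000)/3128.3 ≈ −0.64, and the area to the right of −0.64 is about 0.74.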
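Sampling Distribution of the Difference Between Two Sample Proportions
- Theorem: If independent samples of size n1 and n2 are drawn at random from two populations, where the proportions of observations with the characteristic of interest in the two populations are p1 and p2, respectively, then the sampling distribution of the difference between sample proportions, $\hat{p}_1 - \hat{p}_2$, is approximately normally distributed with mean and variance given by
$\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2$ and $\sigma^2_{\hat{p}_1 - \hat{p}_2} = \dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}$, where $q_i = 1 - p_i$.
Then
$z = \dfrac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{p_1 q_1 / n_1 + p_2 q_2 / n_2}}$
is approximately standard normal.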
Sampling Distribution of the Difference Between Two Sample Proportions
- Example: It is known that 16% of the households in Community A and 11% of the households in Community B have Internet access at home. If 200 households and 225 households are selected at random from Community A and Community B, respectively, compute the probability that the difference between the two sample proportions, $\hat{p}_A - \hat{p}_B$, is at least 0.10.
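A Python sketch of the calculation, assuming the question asks for the one-sided probability P(p-hat_A − p-hat_B ≥ 0.10) with p_A = 0.16 and p_B = 0.11, and that the normal approximation from the theorem applies:

```python
import math
from statistics import NormalDist

p1, n1 = 0.16, 200   # Community A
p2, n2 = 0.11, 225   # Community B

mean_diff = p1 - p2                                           # 0.05
sd_diff = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # about 0.0333
z = (0.10 - mean_diff) / sd_diff                              # about 1.50
prob = 1 - NormalDist().cdf(z)                                # P(p-hat_A - p-hat_B >= 0.10)
print(round(sd_diff, 4), round(z, 2), round(prob, 4))
```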
Sampling Distribution of S²
- Theorem: If S² is the variance of a random sample of size n taken from a normal population having the variance σ², then the statistic
$\chi^2 = \dfrac{(n-1)S^2}{\sigma^2}$
has a chi-squared distribution with ν = n − 1 degrees of freedom.
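The theorem can be illustrated by simulation. This Python sketch assumes a hypothetical normal population with σ = 2 and samples of size n = 10. A chi-squared variable with ν = 9 degrees of freedom has mean 9 and variance 18, so the simulated values of (n − 1)S²/σ² should land near those numbers.

```python
import random
import statistics

rng = random.Random(1)
n, sigma, reps = 10, 2.0, 20000   # hypothetical settings: normal population, sigma = 2

def chi2_stat():
    """One draw of (n - 1) * S^2 / sigma^2 from a fresh normal sample."""
    sample = [rng.gauss(0.0, sigma) for _ in range(n)]
    s2 = statistics.variance(sample)   # sample variance, divisor n - 1
    return (n - 1) * s2 / sigma**2

chi2_values = [chi2_stat() for _ in range(reps)]
# A chi-squared variable with v = n - 1 degrees of freedom has mean v and variance 2v.
print(statistics.fmean(chi2_values), statistics.variance(chi2_values))  # near 9 and 18
```

Changing `n` changes the target: the simulated values should then center near n − 1 with variance near 2(n − 1).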