Title: Sampling Distribution of a Sample Proportion
1Sampling Distribution of a Sample Proportion
- Lecture 26
- Sections 8.1 - 8.2
- Wed, Mar 8, 2006
2Preview of the Central Limit Theorem
- We looked at the distribution of the sum of 1, 2,
and 3 uniform random variables U(0, 1).
- We saw that the shapes of their distributions were
moving towards the shape of the normal
distribution.
- If we replace the sum with the average, we will
obtain the same phenomenon, but on the scale from
0 to 1 each time.
3Preview of the Central Limit Theorem
[Figure: plot not recoverable]
4Preview of the Central Limit Theorem
[Figure: plot not recoverable]
5Preview of the Central Limit Theorem
[Figure: plot not recoverable]
6Preview of the Central Limit Theorem
- Some observations:
- Each distribution is centered at the same place,
½.
- The distributions are being drawn in towards
the center.
- That means that their standard deviation is
decreasing.
- Can we quantify this?
7Preview of the Central Limit Theorem
[Figure: plot not recoverable]
8Preview of the Central Limit Theorem
[Figure: plot not recoverable]
9Preview of the Central Limit Theorem
[Figure: plot not recoverable]
10Preview of the Central Limit Theorem
- This tells us that a mean based on three
observations is much more likely to be close to
the population mean than is a mean based on only
one or two observations.
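The shrinking spread can be checked numerically. The following is a minimal simulation sketch, not part of the original lecture: the function names, the seed, and the number of trials are illustrative assumptions. It relies on the standard fact that a U(0, 1) variable has variance 1/12, so the mean of n independent draws has standard deviation sqrt(1/(12n)).

```python
import random
import statistics

def simulate_means(n_obs, n_trials=20000, seed=0):
    """Return n_trials averages, each of n_obs independent U(0, 1) draws."""
    rng = random.Random(seed)
    return [
        sum(rng.random() for _ in range(n_obs)) / n_obs
        for _ in range(n_trials)
    ]

def uniform_mean_sd(n_obs):
    """Theoretical SD of the mean of n_obs U(0, 1) variables."""
    return (1 / (12 * n_obs)) ** 0.5

# Compare the simulated center and spread with the theoretical SD.
for n_obs in (1, 2, 3):
    means = simulate_means(n_obs)
    print(
        n_obs,
        round(statistics.mean(means), 3),    # simulated center, near 0.5
        round(statistics.pstdev(means), 3),  # simulated spread
        round(uniform_mean_sd(n_obs), 3),    # theoretical spread
    )
```

Each row should show a center near ½ and a spread that decreases as the number of observations grows.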
11Parameters and Statistics
- THE PURPOSE OF A STATISTIC IS TO ESTIMATE A
POPULATION PARAMETER.
- A sample mean is used to estimate the population
mean.
- A sample proportion is used to estimate the
population proportion.
- Sample statistics, by their very nature, are
variable.
- Population parameters are fixed.
12Some Questions
- We hope that the sample proportion is close to
the population proportion.
- How close can we expect it to be?
- Would it be worth it to collect a larger sample?
- If the sample were larger, would we expect the
sample proportion to be closer to the population
proportion?
- How much closer?
13The Sampling Distribution of a Statistic
- Sampling Distribution of a Statistic: The
distribution of values of the statistic over all
possible samples of size n from a population.
14The Sample Proportion
- Let p be the population proportion.
- Then p is a fixed value (for a given population).
- Let p̂ (p-hat) be the sample proportion.
- Then p̂ is a random variable; it takes on a new
value every time a sample is collected.
- The sampling distribution of p̂ is the
probability distribution of all the possible
values of p̂.
15Example
- Suppose that this class is 3/4 freshmen.
- Suppose that we take a sample of 2 students,
selected with replacement.
- Find the sampling distribution of p̂.
16Example
17Example
- Let X be the number of freshmen in the sample.
- The probability distribution of X is
x P(x)
0 1/16
1 6/16
2 9/16
18Example
- Let p̂ be the proportion of freshmen in the
sample. (p̂ = X/n.)
- The sampling distribution of p̂ is
x P(p̂ = x)
0 1/16
1/2 6/16
1 9/16
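The table can be reproduced exactly from the binomial formula. This is a small sketch under the example's own assumption of sampling with replacement, so that the count X is binomial; the function name `sampling_distribution` is illustrative, not from the lecture.

```python
from fractions import Fraction
from math import comb

def sampling_distribution(n, p):
    """Exact distribution of p-hat = X/n when X ~ Binomial(n, p)."""
    return {
        Fraction(k, n): comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
    }

# Class that is 3/4 freshmen, sample of 2 students with replacement.
dist = sampling_distribution(2, Fraction(3, 4))
for value, prob in dist.items():
    print(value, prob)
```

Note that `Fraction` reduces to lowest terms, so 6/16 is printed as 3/8. Calling the same function with n = 3 or n = 4 gives the distributions for larger samples.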
19Samples of Size n = 3
- If we sample 3 people (with replacement) from a
population that is 3/4 freshmen, then the
proportion of freshmen in the sample has the
following distribution.
x P(p̂ = x)
0 1/64 ≈ .02
1/3 9/64 ≈ .14
2/3 27/64 ≈ .42
1 27/64 ≈ .42
20Samples of Size n = 4
- If we sample 4 people (with replacement) from a
population that is 3/4 freshmen, then the
proportion of freshmen in the sample has the
following distribution.
x P(p̂ = x)
0 1/256 ≈ .004
1/4 12/256 ≈ .05
2/4 54/256 ≈ .21
3/4 108/256 ≈ .42
1 81/256 ≈ .32
21The Parameters of the Sampling Distributions
- When n = 1, the sampling distribution is
p̂ P(p̂)
0 1/4
1 3/4
- The mean and variance are
- μ = 3/4 = 0.75
- σ² = 3/16 = 0.1875
22The Parameters of the Sampling Distributions
- When n = 2, the sampling distribution is
p̂ P(p̂)
0 1/16
1/2 6/16
1 9/16
- The mean and variance are
- μ = 3/4 = 0.75
- σ² = 3/32 = 0.09375
23The Parameters of the Sampling Distributions
- When n = 3, the sampling distribution is
p̂ P(p̂)
0 1/64 ≈ .02
1/3 9/64 ≈ .14
2/3 27/64 ≈ .42
1 27/64 ≈ .42
- The mean and variance are
- μ = 3/4 = 0.75
- σ² = 3/48 = 0.0625
24The Parameters of the Sampling Distributions
- When n = 4, the sampling distribution is
p̂ P(p̂)
0 1/256 ≈ .004
1/4 12/256 ≈ .05
2/4 54/256 ≈ .21
3/4 108/256 ≈ .42
1 81/256 ≈ .32
- The mean and variance are
- μ = 3/4 = 0.75
- σ² = 3/64 = 0.046875
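The pattern across the four slides (the mean stays at 3/4 while the variance is 3/16 divided by n) can be verified with exact arithmetic. This is a minimal sketch assuming binomial sampling with replacement; `phat_mean_and_variance` is an illustrative name, not something from the lecture.

```python
from fractions import Fraction
from math import comb

def phat_mean_and_variance(n, p):
    """Exact mean and variance of p-hat = X/n for X ~ Binomial(n, p)."""
    values = [Fraction(k, n) for k in range(n + 1)]
    probs = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(v * q for v, q in zip(values, probs))
    variance = sum((v - mean) ** 2 * q for v, q in zip(values, probs))
    return mean, variance

p = Fraction(3, 4)
for n in (1, 2, 3, 4):
    mean, variance = phat_mean_and_variance(n, p)
    # Print n, the exact mean, the exact variance, and its decimal value.
    print(n, mean, variance, float(variance))
```

The variances agree with p(1 - p)/n for every n tried, which is the quantity whose square root gives the standard deviation of the sample proportion.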
25Sampling Distributions
- Run the program
- Central Limit Theorem for Proportions.exe.
- Use n = 30 and p = 0.75; generate 100 samples.
26 100 Samples of Size n = 30
μ = 0.75
σ = 0.079
27Observations and Conclusions
- Observation 1: The values of p̂ are clustered
around p.
- Conclusion 1: p̂ is probably close to p.
28Larger Sample Size
- Now we will select 100 samples of size 120
instead of size 30.
- Run the program
- Central Limit Theorem for Proportions.exe.
- Pay attention to the spread (standard deviation)
of the distribution.
29 100 Samples of Size n = 120
μ = 0.75
σ = 0.0395
30Observations and Conclusions
- Observation 2: As the sample size increases, the
clustering is tighter.
- Conclusion 2A: Larger samples give more reliable
estimates.
- Conclusion 2B: For sample sizes that are large
enough, we can make very good estimates of the
value of p.
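For readers without access to the classroom program, the experiment can be approximated with a short simulation. The code below is a hypothetical stand-in, not the program itself: the function names and the seed are assumptions, and the simulated values will vary slightly with the seed.

```python
import random
import statistics

def simulate_phats(n, p, n_samples, seed=0):
    """Draw n_samples samples of size n and return each sample proportion."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < p for _ in range(n)) / n
        for _ in range(n_samples)
    ]

def phat_sd(n, p):
    """Theoretical standard deviation of p-hat: sqrt(p(1 - p)/n)."""
    return (p * (1 - p) / n) ** 0.5

for n in (30, 120):
    phats = simulate_phats(n, 0.75, n_samples=100)
    print(
        n,
        round(statistics.mean(phats), 3),    # simulated mean of p-hat
        round(statistics.pstdev(phats), 4),  # simulated SD of p-hat
        round(phat_sd(n, 0.75), 4),          # theoretical SD
    )
```

Quadrupling the sample size from 30 to 120 halves the theoretical standard deviation, which is why the clustering around p becomes visibly tighter.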
31Larger Sample Size
- Now we will select 10,000 samples of size 120
instead of only 100 samples.
- Run the program
- Central Limit Theorem for Proportions.exe.
- Pay attention to the shape of the distribution.
32 10,000 Samples of Size n = 120
μ = 0.75
σ = 0.0395
33 10,000 Samples of Size n = 126
34More Observations and Conclusions
- Observation 3: The distribution of p̂ appears to
be approximately normal.
35One More Conclusion
- Conclusion 3: We can use the normal distribution
to calculate just how close to p we can expect p̂
to be.
- However, we must know the values of μ and σ for
the distribution of p̂.
- That is, we have to quantify the sampling
distribution of p̂.
36The Sampling Distribution of p
- It turns out that the sampling distribution of p̂
is approximately normal with the following
parameters.
- μ = p
- σ = √(p(1 − p)/n)
- This is the Central Limit Theorem for
Proportions, summarized on page 519.
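The parameters of this normal approximation can be derived in a few lines. The sketch below assumes, as in the earlier freshmen example, that the sample is drawn with replacement, so the count X of successes is binomial with mean np and variance np(1 − p), and p̂ = X/n. The subscripted symbols are notation introduced here for clarity.

```latex
\begin{align*}
\mu_{\hat p} &= E\!\left[\frac{X}{n}\right] = \frac{E[X]}{n} = \frac{np}{n} = p, \\
\sigma_{\hat p}^{2} &= \operatorname{Var}\!\left(\frac{X}{n}\right)
  = \frac{\operatorname{Var}(X)}{n^{2}} = \frac{np(1-p)}{n^{2}} = \frac{p(1-p)}{n}, \\
\sigma_{\hat p} &= \sqrt{\frac{p(1-p)}{n}}.
\end{align*}
```

With p = 3/4 this gives variances 3/16, 3/32, 3/48, and 3/64 for n = 1, 2, 3, 4, matching the values computed directly from the sampling distributions.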
37The Sampling Distribution of p
- The approximation to the normal distribution is
excellent if np ≥ 5 and n(1 − p) ≥ 5.
38Why Surveys Work
- Suppose 51% of the population plan to vote for
candidate X, i.e., p = 0.51.
- What is the probability that an exit survey of
1000 people would show candidate X with less than
45% support, i.e., p̂ < 0.45?
39Why Surveys Work
- First, describe the sampling distribution of p̂
if the sample size is n = 1000 and p = 0.51.
- Check: np = 510 ≥ 5 and n(1 − p) = 490 ≥ 5.
- p̂ is approximately normal, with mean 0.51 and
standard deviation √(0.51 × 0.49/1000) ≈ 0.01581.
40Why Surveys Work
- The z-score of 0.45 is z = (0.45 − 0.51)/0.01581
= −3.795.
- P(p̂ < 0.45) = P(Z < −3.795)
- = 0.00007385 (not likely!)
- Or use normalcdf(-E99, 0.45, 0.51, 0.01581).
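The same calculation can be done without a calculator. This is a minimal sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later) in place of the calculator's normalcdf; the variable names are illustrative.

```python
from statistics import NormalDist

p, n = 0.51, 1000

# Standard deviation of p-hat under the normal approximation.
sigma = (p * (1 - p) / n) ** 0.5

# z-score of the observed cutoff 0.45.
z = (0.45 - p) / sigma

# P(p-hat < 0.45), the analogue of normalcdf(-E99, 0.45, 0.51, sigma).
prob = NormalDist(mu=p, sigma=sigma).cdf(0.45)

print(round(sigma, 5), round(z, 3), prob)
```

Because the code carries the unrounded standard deviation through the calculation, the last digits of the probability may differ slightly from the value obtained with the rounded 0.01581.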
41Why Surveys Work
- Perform the same calculation, but with a smaller
sample size, say n = 50.
- The probability turns out to be 0.1980, nearly a
20% chance.
- By symmetry, there is also a 20% chance that the
sample proportion is greater than 57%.
- Thus, there is a 40% chance that the sample
proportion is off by at least 6 percentage points.
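To see how quickly the risk of a misleading survey falls as the sample grows, the two-sided calculation can be wrapped in a function. This is a sketch under the normal approximation; `prob_off_by` and the intermediate sample size 200 are illustrative choices, not taken from the lecture.

```python
from statistics import NormalDist

def prob_off_by(p, n, margin):
    """Normal-approximation probability that |p-hat - p| >= margin."""
    sigma = (p * (1 - p) / n) ** 0.5
    dist = NormalDist(mu=p, sigma=sigma)
    lower_tail = dist.cdf(p - margin)        # p-hat at or below p - margin
    upper_tail = 1 - dist.cdf(p + margin)    # p-hat at or above p + margin
    return lower_tail + upper_tail

# Chance of missing p = 0.51 by at least 6 percentage points.
for n in (50, 200, 1000):
    print(n, round(prob_off_by(0.51, n, 0.06), 4))
```

For n = 50 the function returns roughly 0.40, in line with the 40% figure above, and the probability becomes negligible by n = 1000.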