Title: Statistics 221
1Statistics 221
- Chapter 8
- Interval Estimation
2What this chapter is about
- This chapter is about using sample statistics to
infer population parameters.
- We learned in chapter 7 that we can use sample
statistics (such as sample means and proportions)
as point estimates (best guesses) of population
parameters.
- But point estimates should not be treated as
precise values of population parameters.
- Instead, it should be acknowledged that there is
some possibility of inaccuracy when using a
sample statistic to draw inferences about a
population. To compensate for that
possibility, statisticians create interval
estimates.
- In other words, instead of just claiming that μ = x̄,
we state that μ probably falls within an
interval of values around x̄.
- This chapter is about developing such interval
estimates.
3Interval estimates
- We provide interval estimates by taking a point
estimate and adding and subtracting a value
called the margin of error.
- For example, if we take a survey and find that
our sample mean x̄ is 25, we don't claim that
  - μ = 25.
- Instead we compute a margin of error value
(let's say it's 3) and we can claim that
  - The true μ is probably somewhere between 25 - 3
(22) and 25 + 3 (28).
4Developing an interval estimate for a population
mean when σ is known
- When the population standard deviation σ is
known, the formula for developing an interval
estimate is different than if σ were not known.
- Realistically, if σ were known, then so would μ
(since we use μ in the formula to calculate σ)
and it wouldn't be necessary to use a sample
statistic x̄ to estimate μ.
- But the σ known situation is presented first so
that it can be compared to the σ not known
situation, which we will examine next.
5Example 1: Develop Confidence Interval for μ (σ
known)
- A company named CJW, Inc. conducts a customer
survey each month to monitor customer
satisfaction with its products and services.
- They send out surveys on a monthly basis and ask
customers about their satisfaction with such
things as ease of placing orders, timely
delivery, price, etc. They use the questionnaire
results to develop an overall satisfaction
score for each survey participant.
- A mean satisfaction score is derived from the
sample and it provides a point estimate of the
population's satisfaction score. But we use the
point estimate to develop an interval estimate of
what we think is the population's satisfaction
score.
61. Identify/Obtain inputs
- Let's say that based on previous surveys, CJW,
Inc. knows that σ = 20.
- Let's say that a survey is taken of 100 customers
and their mean overall satisfaction score is 82 (out
of 100).
- Now we know that
  - n = 100
  - x̄ = 82
  - σ = 20.
72. Verify assumptions regarding the distribution
of means
- We visualize this one sample of 100 as one of
all the potential samples of size 100 that could
be taken out of this population. From this one
sample we calculate a mean, which is one mean out
of all potential sample means (where n = 100).
- And since n > 30, we know that if we make a
frequency distribution of all potential sample
means, it will be (approximately) normally distributed.
- Further, the mean of this distribution of sample
means is equal to the population mean.
- Further, since we know that σ = 20, the standard
deviation of this distribution of sample means
is
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{20}{\sqrt{100}} = 2$$
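The standard deviation of the distribution of sample means can be checked with a few lines of Python. This is only a minimal sketch; the function name `standard_error` is our own label and does not come from the slides.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard deviation of the distribution of sample means: sigma / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return sigma / math.sqrt(n)

# CJW example: sigma = 20, n = 100
print(standard_error(20, 100))
```

Because n sits under a square root, quadrupling the sample size only halves this value.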
83. Specify a Confidence Coefficient
- We don't actually say
  - The true μ is probably somewhere between 25 - 3
(22) and 25 + 3 (28).
- What we do say
  - We are x% confident that the true μ is somewhere
between 25 - 3 (22) and 25 + 3 (28).
- That x% is called the confidence coefficient and
it is usually set at 99%, 95%, or 90% (you
decide).
- Let's say we use 95% for this example.
9The confidence interval
- The higher the confidence coefficient, the wider
our confidence interval will be.
- For example
  - If the confidence coefficient was 90%, the
confidence interval might be 24-26.
  - If the confidence coefficient was 95%, the
confidence interval might be 23-27.
  - If the confidence coefficient was 99%, the
confidence interval might be 22-28.
10What the confidence coefficient is
- It turns out that this sample had a mean of 82.
But we acknowledge that if we took another sample
of 100, we would probably get a mean of something
different (e.g., 81, 83, 80, etc.).
- An interval estimate can be calculated from each
of these point estimates and each of these
intervals would be slightly different.
- In fact, every one of the potential samples of
100 that could be drawn from the population
produces its own interval estimate.
- The confidence coefficient is the percentage of
those interval estimates that contain the true
population parameter μ.
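One way to see what the confidence coefficient means is to simulate many samples and count how many of the resulting intervals capture the true mean. The sketch below is illustrative only: it assumes a normal population with μ = 82 and σ = 20 (in practice μ is unknown), and the function name `coverage` and the trial count are our own choices.

```python
import math
import random
from statistics import NormalDist, fmean

def coverage(mu, sigma, n, confidence, trials, seed=0):
    """Fraction of simulated interval estimates that contain the true mean mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n and compute its mean
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - margin < mu < xbar + margin:
            hits += 1
    return hits / trials

# Assumed population values, for illustration only
print(coverage(mu=82, sigma=20, n=100, confidence=0.95, trials=2000))
```

The printed fraction should land close to 0.95, though it varies a little with the seed and the number of trials.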
11Confidence Intervals from numerous samples.
We want to specify a width for our confidence
interval so that 95% of all possible confidence
intervals will contain the population's true
value.
[Figure: interval estimates built around sample means ranging from x̄ = 78 to x̄ = 86, compared against the true population value.]
12Calculate a width of the confidence interval
- The next thing we must do is to calculate a
width for our confidence interval so that 95% of
all possible confidence intervals will contain
the population's true value.
- How wide must that be?
- The answer will (at first) be expressed as a
z-value.
[Figure: confidence coefficient = 95%, shown around the sample mean of 82.]
134. Calculate zα/2
- If the confidence coefficient is 95%, we are
leaving out 5% of the samples
  - 2.5% in each of the two tails.
- To express that percentage as distance in z's
  - We define alpha (α) as 1 - .95
  - α = .05
  - α / 2 = .025
  - Using Excel's normsinv(.025), z = -1.96
- The width of the confidence interval should
extend 1.96 standard deviations in each direction
from the mean to encompass 95% of the area of the
distribution.
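The same z-value can be obtained outside Excel. Here is a minimal sketch using Python's standard library; `z_critical` is a name we made up for illustration, and it returns the positive z-value (the magnitude of what normsinv(α/2) gives).

```python
from statistics import NormalDist

def z_critical(confidence: float) -> float:
    """Positive z-value that leaves alpha/2 in each tail of the standard normal."""
    alpha = 1 - confidence
    # inv_cdf(1 - alpha/2) is the mirror image of normsinv(alpha/2)
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(level, round(z_critical(level), 3))
```

For 90%, 95%, and 99% this gives roughly 1.645, 1.960, and 2.576.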
14If Confidence Coefficient = .95, Then α = .05,
z = 1.96
[Figure: the distribution of sample means. The region between z = -1.96 and z = 1.96 encompasses 95% of all sample means, leaving α/2 = .025 in each tail.]
155. Calculate the Margin of Error (E)
- z is then multiplied by the standard deviation
of this distribution of sample means to get an
exact value for the margin of error.
$$E = z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = 1.96 \cdot \frac{20}{\sqrt{100}} = 3.92$$
166. Calculate the confidence interval
- The margin of error (E) is both added and
subtracted from the mean to calculate the
confidence interval.
(82 - 3.92) < μ < (82 + 3.92), that is, 78.08 < μ <
85.92
- Conclusion: we are 95% confident that the true
population mean μ falls between 78.08 and 85.92.
17Example 2: Develop Confidence Interval for μ (σ
known)
- In order to help identify baby growth patterns
that are unusual, we need to construct a
confidence interval estimate of the mean head
circumference of all babies that are two months
old. A random sample of 100 babies is obtained,
and the mean head circumference is found to be
40.6 cm. Assuming that the population σ is known
to be 1.6 cm, find a 99% confidence interval
estimate of the mean head circumference of all
two-month-old babies.
181. Identify/Obtain inputs
- n = 100
- x̄ = 40.6 cm
- σ = 1.6 cm.
192. Verify assumptions regarding the distribution
of means
- Since n > 30, we know that if we make a frequency
distribution of all potential sample means, it
will be (approximately) normally distributed.
- Further, the mean of this distribution of sample
means is equal to the population mean.
- Further, since we know that σ = 1.6, the standard
deviation of this distribution of sample means
is
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{1.6}{\sqrt{100}} = .16$$
203. Specify a Confidence Coefficient
- A 99% confidence coefficient has been set.
214. Calculate zα/2
- α = 1 - .99
- α = .01
- α/2 = .005
- zα/2 = normsinv(.005) = -2.576
225. Calculate the Margin of Error (E)
- z is then multiplied by the standard deviation
of this distribution of sample means to get an
exact value for the margin of error.
$$E = z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = 2.576 \cdot \frac{1.6}{\sqrt{100}} \approx .4$$
236. Calculate the confidence interval
- The margin of error (E) is both added and
subtracted from the mean to calculate the
confidence interval.
(40.6 - .4) < μ < (40.6 + .4), that is, 40.2 < μ
< 41.0
- Conclusion: we are 99% confident that the true
population mean μ falls between 40.2 and 41.0.
24Example 3: Develop a Confidence Interval for μ
(σ known)
- The health of the bear population in Yellowstone
National Park is monitored by periodic
measurements taken from anesthetized bears. A
sample of 54 bears has a mean weight of 182.9 lbs.
Assuming that σ is known to be 121.8 lbs, find a
99% confidence interval estimate of the mean of
the population of all such bear weights.
251. Identify/Obtain inputs
- n = 54
- x̄ = 182.9 lbs
- σ = 121.8 lbs.
262. Verify assumptions regarding the distribution
of means
- Since n = 54 > 30, we know that if we make a
frequency distribution of all potential sample
means, it will be (approximately) normally distributed.
- Further, the mean of this distribution of sample
means is equal to the population mean.
- Further, since we know that σ = 121.8, the
standard deviation of this distribution of sample
means is
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{121.8}{\sqrt{54}} = 16.57$$
273. Specify a Confidence Coefficient
- A 99% confidence coefficient has been set.
284. Calculate zα/2
- α = 1 - .99
- α = .01
- α/2 = .005
- zα/2 = normsinv(.005) = -2.576
295. Calculate the Margin of Error (E)
- z is then multiplied by the standard deviation
of this distribution of sample means to get an
exact value for the margin of error.
$$E = z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = 2.576 \cdot \frac{121.8}{\sqrt{54}} \approx 42.68 \text{ lbs.}$$
306. Calculate the confidence interval
- The margin of error (E) is both added and
subtracted from the mean to calculate the
confidence interval.
(182.9 - 42.68) < μ < (182.9 + 42.68), that is, 140.22
< μ < 225.58
- Conclusion: we are 99% confident that the true
population mean μ (weight of bears) falls between
140.22 lbs. and 225.58 lbs.
31Estimating a Population Mean μ when σ is unknown
- In the previous section, we estimated the
population mean when σ was known. But that
scenario would be unrealistic because we need
μ to calculate σ.
- Again, the scenario involves obtaining one sample
from a population of samples, and the
distribution of those sample means is normal
because either the underlying population
distribution is normal or the sample size (n) is
> 30.
32When σ is unknown
- When σ is not known, we must estimate σ from s
(the sample std deviation) and this introduces an
additional source of error into the calculation
of an interval estimate.
- To compensate for that source of error, we have
to increase our margin of error, which is going to
widen our interval estimate.
- For example, if our margin of error was +/- 3 when
σ was known, it might be +/- 4 when σ is
unknown.
- So if the point estimate of a mean is 25, the
interval estimate (at 95%) might be
  - (25 - 4) 21 < μ < 29 (25 + 4)
33The distribution of sample means when σ is
unknown
- When σ is unknown, we cannot assume our
distribution of sample means resembles the
standardized (z) distribution.
- Instead we compensate for the additional source
of uncertainty by assuming that our distribution
of sample means resembles the (wider)
t-distribution.
34The t distribution wider than z
[Figure: the t-distribution curve drawn against the narrower z curve.]
The way that the t-distribution works is that the
smaller the sample size (n), the wider the
curve.
35The margin of error formula (when using the t
distribution)
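- When σ is unknown, the t-value is used instead of
the z-value.
$$E = z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \;\; (\sigma \text{ known}) \qquad E = t_{\alpha/2} \frac{s}{\sqrt{n}} \;\; (\sigma \text{ unknown})$$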
36Calculating a t-value
- The way that the t-distribution works is that the
smaller the sample size (n), the wider the curve.
- Excel's TINV( ) formula can be used, but it
requires two arguments
  - tinv(α, n-1)
- The first argument is α, which is calculated as
1 - confidence coefficient.
- The second argument is the degrees of freedom,
which is calculated as n-1.
37The bears weight example
- In that example, σ was known to be 121.8. With a
99% confidence coefficient, z was 2.576
  - z = normsinv(α/2)
  - z = normsinv(.005) = -2.576 (we use the magnitude, 2.576)
- If σ had been unknown but it was estimated with s
= 121.8, then t would have been
  - t = tinv(α, n-1)
  - t = tinv(.01, 54-1) = 2.672
- Now if n had been lower (say 30), t would have
been
  - t = tinv(.01, 30-1) = 2.756
Notice that α is not divided by 2 in tinv( ).
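The difference in how the two Excel functions treat α can be written out explicitly. The relations below are a sketch of our understanding that normsinv( ) is a left-tail inverse while tinv( ) is a two-tailed inverse; Z denotes a standard normal variable and T with subscript n-1 a t-distributed variable with n-1 degrees of freedom.

```latex
\begin{align*}
\texttt{normsinv}(p) = z \quad &\Longleftrightarrow \quad P(Z \le z) = p \\
\texttt{tinv}(\alpha,\, n-1) = t \quad &\Longleftrightarrow \quad P(|T_{n-1}| > t) = \alpha
  \quad \Longleftrightarrow \quad P(T_{n-1} > t) = \alpha/2
\end{align*}
```

So tinv(.01, 53) already leaves .005 in each tail, which is why α is passed whole, whereas normsinv( ) must be given α/2 directly.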
38Which distribution to use?
39Which distribution should be used?
- 2. If n = 10, σ is unknown, population appears to
be normally distributed.
- 4. If n = 45, σ is known, population appears to
be very skewed.
- 6. If n = 9, σ is known, population appears to be
very skewed.
- 8. If n = 37, σ is unknown, population appears to
be normally distributed.
40Example 1: Developing a confidence interval for μ
(σ unknown)
- A study was conducted to estimate hospital costs
for accident victims who were wearing seat belts.
Twenty randomly-selected cases have a
distribution that appears to be bell-shaped with
a mean of $9004 and a standard deviation of
$5629.
- A. Construct the 99% confidence interval for the
mean of all such costs.
- B. If you are a manager for an insurance company
that provides lower rates for drivers who wear
seat belts and you want a conservative estimate
for a worst-case scenario, what amount should you
use as the possible hospital cost for an accident
victim who wears seat belts?
411. Identify/Obtain inputs
- n = 20
- x̄ = 9004
- σ is estimated by s = 5629
422a. Which distribution should you use?
- σ is unknown
- n = 20
- But we are told that the population has a
distribution which appears to be bell-shaped, so
we'll assume that the underlying population is
normal, so the distribution of means is normal
even though n < 30.
- So use the t distribution
432b. Verify assumptions regarding the distribution
of means
- We have already concluded that since the
underlying population is normal, the distribution
of means is normally distributed.
- Further, the mean of this distribution of sample
means is equal to the population mean.
- Further, since we know that σ is estimated by s
= 5629, the standard deviation of this
distribution of sample means is
$$\sigma_{\bar{x}} \approx \frac{s}{\sqrt{n}} = \frac{5629}{\sqrt{20}} = 1258.68$$
443. Specify a Confidence Coefficient
- A 99% confidence coefficient has been set.
454. Calculate tα/2
- α = 1 - .99
- α = .01
- tα/2 = tinv(.01, 20-1) = 2.861
- (Notice that the first argument, α, is not divided
by 2 as it is with the normsinv( ) formula. The
tinv( ) formula will automatically divide α by 2.)
465. Calculate the Margin of Error (E)
- t is then multiplied by the standard deviation
of this distribution of sample means to get an
exact value for the margin of error.
$$E = t_{\alpha/2} \frac{s}{\sqrt{n}} = 2.861 \cdot \frac{5629}{\sqrt{20}} \approx 3601 \text{ dollars}$$
476. Calculate the confidence interval
- The margin of error (E) is both added and
subtracted from the mean to calculate the
confidence interval.
(9004 - 3601) < μ < (9004 + 3601), that is, 5,403 <
μ < 12,605
- Conclusion: we are 99% confident that the true
population mean μ (average hospital cost of an
accident victim who was wearing a seat belt)
falls between $5,403 and $12,605.
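The σ-unknown calculation follows the same pattern as before, with t in place of z and s in place of σ. The sketch below takes the t-value as an input, because Python's standard library has no counterpart to tinv( ); the value 2.861 is simply copied from step 4. The name `t_interval` is ours.

```python
import math

def t_interval(xbar, s, n, t_crit):
    """Interval estimate xbar +/- t_crit * s / sqrt(n) for sigma unknown.

    t_crit must be looked up separately, e.g. with Excel's tinv(alpha, n-1).
    """
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin

# Hospital cost example: n = 20, mean 9004, s = 5629, t = tinv(.01, 19) = 2.861
low, high = t_interval(xbar=9004, s=5629, n=20, t_crit=2.861)
print(round(low), round(high))
```

Rounded to whole dollars, the limits come out at 5403 and 12605.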
48Question B
- B. If you are a manager for an insurance company
that provides lower rates for drivers who wear
seat belts and you want a conservative estimate
for a worst-case scenario, what amount should you
use as the possible hospital cost for an accident
victim who wears seat belts?
  - $12,605 (the upper limit of the confidence interval)
49Determining Sample Size
- In the previous examples, we had a pre-determined
sample size and we used the sample size along
with the mean, standard deviation, and confidence
coefficient to calculate a margin of error and
then a confidence interval estimate of a
population parameter (μ) based on a sample
statistic (x̄).
- What if we want to fix the margin of error to be
no more than a certain amount? We can do that by
making the sample size sufficiently large.
- In this section, we calculate how large the
sample size has to be in order to achieve a
certain (minimal) margin of error that we will
use in our interval estimates.
50The sample size formula
- Using this formula, we can achieve a desired
margin of error at a chosen confidence level.
$$n = \left[ \frac{z_{\alpha/2}\, \sigma}{E} \right]^2$$
- E is the margin of error that we are willing to
accept.
- zα/2 will reflect the chosen confidence level.
- σ may or may not be known. If not, estimate it
from s or do a separate pilot study to get an
estimate.
51Example 1: Determine sample size to estimate a
population mean
- Assume that we want to estimate the mean IQ score
for the population of statistics professors. How
many statistics professors must be randomly
selected for IQ tests if we want 95% confidence
that the sample mean is within 2 IQ points of the
population mean?
- Recall that the formula requires the population
σ. We can either use an overly-conservative
guesstimate or use the sample's std deviation (s).
52Determine Sample Size
- The confidence coefficient is set at 95%, so zα/2
is 1.96.
- The margin of error we are willing to accept (E)
= 2 IQ points.
- σ = 15. (We know that 15 is the σ for the
general population and so we use this as our
overly-conservative estimate.)
$$n = \left[ \frac{z_{\alpha/2}\, \sigma}{E} \right]^2 = \left[ \frac{1.96\,(15)}{2} \right]^2 = 216.09 \quad \text{(round up to 217)}$$
53Example 2: Determine sample size to estimate a
population mean
- The Tyco Video Game Corporation finds that it is
losing income because of slugs used in video
games. The machines must be adjusted to accept
coins only if they fall within set limits. In
order to set those limits, the mean weight of
quarters in circulation must be estimated. A
sample of quarters will be weighed in order to
determine the mean. How many quarters must we
randomly select and weigh if we want to be 99%
confident that the sample mean is within .025
grams of the true population mean for all
quarters? Based on results from a sample of
quarters, we estimated the population standard
deviation to be .068 g.
54Determine Sample Size
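- Desired Confidence Coefficient = 99%
- Desired Margin of Error (E) = .025 grams
- Population σ = .068 grams
$$n = \left[ \frac{z_{\alpha/2}\, \sigma}{E} \right]^2 = \left[ \frac{2.575\,(.068)}{.025} \right]^2 = 49.06 \quad \text{(round up to 50)}$$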
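Both sample size examples can be reproduced with a short function. This is a sketch only: `sample_size_for_mean` is a name of our choosing, and it always rounds up, since rounding down would leave the margin of error slightly larger than requested.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma, margin, confidence):
    """Smallest n for which z * sigma / sqrt(n) does not exceed the margin E."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin) ** 2)

# IQ example: sigma = 15, E = 2, 95% confidence
print(sample_size_for_mean(sigma=15, margin=2, confidence=0.95))
# Quarters example: sigma = .068 g, E = .025 g, 99% confidence
print(sample_size_for_mean(sigma=0.068, margin=0.025, confidence=0.99))
```

These give 217 professors and 50 quarters, matching the worked examples.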
55Estimating a population proportion (P)
- A proportion is a percentage expressed as a
decimal value.
- A proportion (P) is a way of referring to the
percentage of cases (e.g., people or subjects)
that have some characteristic or opinion (the
successes).
- For example, we may want to make inferences about
the proportion of people who favor gun control,
support the death penalty, use email, or
watched a particular TV show.
- In these situations, we are trying to discover
what P is, where P is the percentage/proportion
of people who possess the characteristic or
opinion of interest.
56P is the random variable
- In this scenario, P, the proportion of the
population (of size N) that have this
characteristic or opinion (the successes), is
the unknown.
- In the last chapter, P, the probability of having
a girl, was known (50%) and the P(having x girls)
was the unknown. In this scenario, P, the
probability / proportion, is the unknown.
- Theoretically speaking, if we took all possible
samples of size n, and got a p for each sample,
and created a probability distribution of p's,
that distribution would have an average value of
P.
- In reality, we take one sample, and get one p and
then draw inferences about what P is from that
one p.
57Estimating a population proportion
- Here is the scenario: 829 people who live in
Minnesota were asked if they were in favor of
using a photo-cop system that uses cameras to
ticket drivers.
- 51% of the 829 said they were opposed to it. So
51% is what we call the sample proportion (denoted
by p).
- Can we say that the proportion of all Minnesotans
(P) that oppose photo-cop is 51%?
58Formula for the Margin of Error (E) (when
estimating a proportion)
$$E = z_{\alpha/2} \sqrt{\frac{(p)(q)}{n}} = 1.96 \sqrt{\frac{(.51)(.49)}{829}} = .0340$$
59We can now calculate the Confidence Interval
- The confidence interval = .51 +/- .034
- OR
- (p - E) < P < (p + E)
- .476 < P < .544
- We are 95% sure that the true population
parameter (P) is between 47.6% and 54.4%
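The photo-cop calculation can be sketched in Python as follows. The function name `proportion_interval` is ours, and q is taken as 1 - p, as in the margin of error formula for proportions.

```python
import math
from statistics import NormalDist

def proportion_interval(p, n, confidence):
    """Interval estimate of a population proportion P from a sample proportion p.

    Returns (lower limit, upper limit, margin of error).
    """
    q = 1 - p
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p * q / n)
    return p - margin, p + margin, margin

# Photo-cop survey: p = .51, n = 829, 95% confidence
low, high, margin = proportion_interval(p=0.51, n=829, confidence=0.95)
print(f"E = {margin:.3f}, {low:.3f} < P < {high:.3f}")
```

The output agrees with the interval .476 < P < .544. Because that interval includes values below .50, this sample alone cannot show that a majority of Minnesotans oppose photo-cop.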
60Determining Sample Size when estimating a
population proportion
- How do you know how large of a sample you need
when gathering your data?
- It's not just a matter of making it large enough
so that the sampling distribution is normal (the
n > 30 rule).
- It's also a function of what margin of error you
are willing to accept (e.g., .03, .04, etc.)
- And what confidence level you choose (e.g., 90%,
95%, 99%).
61The sample size formula for estimating population
proportions
$$n = \frac{[z_{\alpha/2}]^2\, (p)(q)}{E^2}$$
- As you can see, it incorporates the margin of
error (E), the confidence level (1 - ?), and the
product of (p) (q).
62Calculating Sample Size when we have an estimated
p
- If we have some prior information suggesting what
p is, we use this formula
$$n = \frac{[z_{\alpha/2}]^2\, (p)(q)}{E^2}$$
63Calculating Sample Size when we do NOT have an
estimated p
- If we have NO prior information suggesting what p
is, we use the value that maximizes p × q (.5 ×
.5 = .25)
$$n = \frac{[z_{\alpha/2}]^2\, (.25)}{E^2}$$
64Sample size for an Email Survey
- Suppose you want to do a survey to determine the
current percentage of US households that use
email.
- How many households must be surveyed in order to
be 95% confident that the sample percentage is in
error by no more than 4 percentage points?
- A. In an earlier study (1997), the percentage was
16.9%. Use this for your p.
- B. Assume that we have no prior information
suggesting a possible value for p.
65A. Assume p = .169
$$n = \frac{[z_{\alpha/2}]^2\, (p)(q)}{E^2} = \frac{[1.96]^2\, (.169)(.831)}{.04^2} = 337.194 \quad \text{(round to 338)}$$
66B. Assume p is unknown
$$n = \frac{[z_{\alpha/2}]^2\, (.25)}{E^2} = \frac{[1.96]^2\, (.25)}{.04^2} = 600.25 \quad \text{(round to 601)}$$
When p of .5 is used, the required sample size
almost doubles.
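Parts A and B of the email survey differ only in the value used for p × q, so one function can cover both. In this sketch the name `sample_size_for_proportion` and the convention of passing `p=None` when no prior estimate exists are our own.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(margin, confidence, p=None):
    """Smallest n that keeps the margin of error for a proportion within E.

    If no prior estimate p is supplied, p*q is set to its maximum, .25.
    """
    pq = 0.25 if p is None else p * (1 - p)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * pq / margin ** 2)

# A. Earlier study suggests p = .169
print(sample_size_for_proportion(margin=0.04, confidence=0.95, p=0.169))
# B. No prior information about p
print(sample_size_for_proportion(margin=0.04, confidence=0.95))
```

This reproduces 338 households for part A and 601 for part B.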
67Sample size does not depend on population size
- The population size is not used in the formula to
determine sample size.
- Generally, this is the case unless we are
sampling a small population without replacement.
- Nielsen, for example, surveys 4,000 TV households
from a population of 104 million households
(.004%) and can still be 95% confident that the
sample p will be within about 1 to 2 percentage
points of the population P.
68Use the Confidence Intervals to find the point
estimate p and the margin of error E. Practice
exercise (p. 312)
- 10. (.278 < p < .338)
- 12. (.887 < p < .927)
69Find the point estimate p and the margin of error
E.
$$p = \frac{.278 + .338}{2} = .308$$
$$E = \frac{.338 - .278}{2} = .03$$
70Find the point estimate p and the margin of error
E.
$$p = \frac{.887 + .927}{2} = .907$$
$$E = \frac{.927 - .887}{2} = .02$$
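Recovering p and E from a published interval is just the midpoint and the half-width. A small sketch (the function name `point_and_margin` is ours):

```python
def point_and_margin(lower, upper):
    """Return (point estimate, margin of error) for a confidence interval."""
    point = (lower + upper) / 2   # midpoint of the interval
    margin = (upper - lower) / 2  # half the interval's width
    return point, margin

# Exercises 10 and 12
for lower, upper in [(0.278, 0.338), (0.887, 0.927)]:
    p, e = point_and_margin(lower, upper)
    print(round(p, 3), round(e, 3))
```

The results are rounded before printing only to hide floating-point noise.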
71Find the margin of error E that corresponds to
the given statistics and confidence
level. Practice exercise (p. 312)
- 14. n = 1200, x = 400, CL = 99%
- 16. CL = 95%, sample size = 500, 80% are successes
72Find the margin of error E that corresponds to
the given statistics and confidence level
- 14. n = 1200, x = 400, CL = 99%
- If CL = 99%, then z = 2.575
- If x = 400 and n = 1200, then p = .33, q = .67
$$E = z_{\alpha/2} \sqrt{\frac{(p)(q)}{n}} = 2.575 \sqrt{\frac{(.33)(.67)}{1200}} = .0350$$
73Find the margin of error E that corresponds to
the given statistics and confidence level
- 16. 95% CL, sample size is 500, 80% are
successes.
- If CL = 95%, then z = 1.96
- p = .80, q = .20, n = 500
$$E = z_{\alpha/2} \sqrt{\frac{(p)(q)}{n}} = 1.96 \sqrt{\frac{(.80)(.20)}{500}} = .0351$$
74Construct the confidence interval estimate of the
population proportion P. Practice Exercise (p. 312)
- 18. n = 1200, x = 200, CL = 99%
- 20. n = 2001, x = 1776, CL = 90%
75Construct the confidence interval estimate of the
population proportion P
- 18. n = 1200, x = 200, CL = 99%
- p = 200/1200 or .167, q = .833, z = 2.575
$$E = z_{\alpha/2} \sqrt{\frac{(p)(q)}{n}} = 2.575 \sqrt{\frac{(.167)(.833)}{1200}} = .028$$
(.167 - .028) < P < (.167 + .028), that is, .139 <
P < .194 (using unrounded p and E)
76Construct the confidence interval estimate of the
population proportion P
- 20. n = 2001, x = 1776, CL = 90%
- p = 1776/2001 = .887, q = .113, z = 1.645
$$E = z_{\alpha/2} \sqrt{\frac{(p)(q)}{n}} = 1.645 \sqrt{\frac{(.887)(.113)}{2001}} = .0116$$
(.887 - .0116) < P < (.887 + .0116), that is, .876 <
P < .899 (using unrounded p)
77Use the given data to find the minimum sample
size to estimate a population proportion or
percentage. Practice Exercise (p. 312)
- 22. E = .038, CL = 95%, p and q unknown
- 24. E = .03, CL = 90%, estimated p = .08
78Use the given data to find the minimum sample
size to estimate a population proportion or
percentage
- 22. E = .038, CL = 95%, p and q unknown
$$n = \frac{[z_{\alpha/2}]^2\, (p)(q)}{E^2} = \frac{[1.96]^2\, (.25)}{.038^2} = 665.10 \quad \text{(round to 666)}$$
79Use the given data to find the minimum sample
size to estimate a population proportion or
percentage
- 24. E = .03, CL = 90%, estimated p = .08
$$n = \frac{[z_{\alpha/2}]^2\, (p)(q)}{E^2} = \frac{[1.645]^2\, (.08)(.92)}{.03^2} = 221.29 \quad \text{(round to 222)}$$
80Practice Exercise (p. 313 - 28)
- In a Gallup poll, 491 randomly-selected adults
were asked whether they are in favor of the death
penalty for a person convicted of murder, and 65%
said they were in favor of the death penalty.
- A. Find the point estimate of the percentage of
adults who are in favor of the death penalty.
- B. Find a 95% confidence interval of the
percentage of adults who are in favor of the
death penalty.
- C. Can we safely conclude that the majority of
adults are in favor of this death penalty?
Explain.
81A & B. Find the point estimate and confidence
interval of the percentage of adults who are in
favor of the death penalty.
- A. The point estimate is 65%
- B. The confidence interval is
$$E = z_{\alpha/2} \sqrt{\frac{(p)(q)}{n}} = 1.96 \sqrt{\frac{(.65)(.35)}{491}} = .04$$
(.65 - .04) < P < (.65 + .04), that is, .61 < P <
.69
82C. Can we safely conclude that the majority of
adults are in favor of this death penalty?
Explain.
- Yes, since the entire interval is above 50%.
83Homework 13
- 6 on page 304
- develop confidence interval
- 18 on page 313
- develop confidence interval
- 28 on page 316
- sample size
- 38 on page 321
- confidence interval and sample size