Title: Chapters 23-25
1. Chapters 23-25
- Inferences about Means, and a little on 2-Sample and Paired t-tests.
2. Inference for Means
- We've talked about inference for proportions (CIs and HTs), but what about means?
- What if I want to use a sample of student heights to estimate the true average height of CCSU students? How do we do that?
- The procedures will be very similar to those for proportions!
- Recall what we learned about the sampling distribution for means, though.
3. Standard Error for xbar
- The CLT told us that the sampling distribution of xbar is approximately Normal, with mean mu and standard deviation sigma(xbar) = sigma/sqrt(n).
- But we have a problem, even if we only want to estimate mu!
- In the real world, we don't usually know sigma, the population SD!
- So, how can we use the sampling distribution?
- Answer: The same way we did with proportions, by using a Standard Error in place of sigma(phat) or sigma(xbar)!
4. SE(xbar)
So, if we don't know sigma, what's a reasonable substitute for it? Answer: the only standard deviation we do know, that of the SAMPLE, s. So,
SE(xbar) = s/sqrt(n)
Everything else proceeds the same as it did for proportions with the Confidence Interval and Hypothesis Tests, except for one thing...
5. The problem!
Unlike with proportions, when we are dealing with the sampling distribution of means and use the Standard Error, SE(xbar), the distribution is NOT NORMAL!!! With proportions, we were only estimating one unknown, p. Now, though, we are dealing with TWO unknown parameters, mu and sigma. Instead of sigma(xbar) = sigma/sqrt(n), we're now using SE(xbar) = s/sqrt(n). This is a LOT more variable, since s varies with each sample! So what do we do now???
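A quick simulation can make the problem concrete. The sketch below is an illustration, not part of the original slides: it assumes a Normal population with mu = 0 and sigma = 1 (arbitrary choices), and the function name fraction_beyond is made up for this example. It draws many samples, standardizes each xbar using s instead of sigma, and counts how often the result lands beyond +/- 1.96. A Normal model would put about 5% out there.

```python
import math
import random

def fraction_beyond(n, cutoff=1.96, trials=10000, mu=0.0, sigma=1.0, seed=1):
    """Fraction of simulated samples whose (xbar - mu) / (s / sqrt(n))
    falls beyond +/- cutoff. Population is assumed Normal(mu, sigma)."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    beyond = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        # Sample standard deviation s (divides by n - 1); it changes every sample.
        s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
        t = (xbar - mu) / (s / math.sqrt(n))
        if abs(t) > cutoff:
            beyond += 1
    return beyond / trials

for n in (5, 15, 40):
    print(n, fraction_beyond(n))
```

For small n the fraction should come out well above 5%, and it should drift back toward 5% as n grows.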
6. Student's t Distribution
William Gosset, a Quality Control Engineer for Guinness (yes, the beer), was struggling with this problem. After a lot of research on beer, he figured out what came to be known as Student's t distribution, which is the sampling distribution of the quantity known as Student's t (whereas the Standard Normal Distribution is the distribution of z-scores). Student's t looks a LOT like a z-score (you need to know this!!):
t = (xbar - mu) / SE(xbar), where SE(xbar) = s/sqrt(n)
7. The distribution of t
Unlike the Standard Normal Curve, which is a single curve, the sampling distribution of t is actually a family of distributions. Since s varies from sample to sample, but varies LESS as n increases, there is a slightly different t-distribution for EACH SAMPLE SIZE. We can specify the particular distribution using what is known as the degrees of freedom, df:
df = n - 1
Thankfully, this is easier than it sounds. But first, some notes about the shape of the t-distribution.
8. The t-distribution
The t-distribution:
- Is very similar in shape to the Normal Curve (symmetric, unimodal, essentially bell-shaped)
- Has a spread GREATER than the Normal Distribution, with more probability (more area) in the tails
- Gets closer and closer to Normal as the sample size (n) increases
When should we use the t-distribution? Nearly always when you're dealing with sample means: whenever you don't know sigma, the population SD.
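To see the "more area in the tails" claim in numbers, here is a minimal sketch (not from the slides) that compares the area beyond 2 under several t curves with the same area under the Normal curve. It uses only the Python standard library; the helper names t_pdf and t_upper_tail are made up for this example, and the tail area is approximated with Simpson's rule rather than read from a table.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(x, df, steps=2000):
    """Approximate P(T > x) for x >= 0: 0.5 minus the area from 0 to x
    (Simpson's rule with an even number of steps)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

normal_tail = 1 - NormalDist().cdf(2.0)
for df in (2, 5, 15, 40, 200):
    print("df =", df, "area beyond 2:", round(t_upper_tail(2.0, df), 4))
print("Normal   area beyond 2:", round(normal_tail, 4))
```

The t areas should all be larger than the Normal one, and should shrink toward it as df goes up.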
9. Assumptions for the t-distribution
Assumptions for the t-distribution are very similar to those for using the Normal model for sampling distributions:
1) Random sample
2) n < 10% of population
3) Nearly Normal population
Rule of thumb:
- if n < 15, data should be nearly Normal
- if 15 < n < 40, data should be unimodal and symmetric
- if n > 40, the t-distribution is safe to use even if data is skewed
Thankfully, other than changing from z to t, CIs and HTs work nearly the same way as they did for proportions (easier in some cases!!)
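The rule of thumb for the Nearly Normal condition can be written as a small lookup. This is only a sketch: the function name shape_requirement is made up, and since the rule does not say what to do at exactly n = 15 or n = 40, the code assumes both belong to the middle case.

```python
def shape_requirement(n):
    """Return what the rule of thumb asks of the data at sample size n.

    Assumption: n = 15 and n = 40 are not covered by the rule as stated,
    so this sketch groups both with the middle case.
    """
    if n < 15:
        return "data should be nearly Normal"
    if n <= 40:
        return "data should be unimodal and symmetric"
    return "t is safe to use even if the data are skewed"

for n in (6, 25, 41):
    print(n, "->", shape_requirement(n))
```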
10. One-sample t-interval for Means (Confidence Interval)
This looks nearly the same as for proportions!!
xbar +/- t * SE(xbar), where SE(xbar) = s/sqrt(n)
I.e., an estimate +/- ME.
Where do we get t from? From the t-table in the textbook. Flip to the page after the Z-table; on the right page, you'll see the T-table!
11. An example (#10)
A random sample of 41 days of daily fees for a garage averaged $126, with a standard deviation of $15. Give a 90% confidence interval for the mean daily income this garage will generate.
What are we trying to estimate? Mu, the true mean daily income.
Known: xbar = $126, s = $15, n = 41, confidence level = 90%.
Assumptions:
- Random sample? Yes.
- n < 10% of population? Yes, less than 10% of all days.
- Nearly Normal? Sample size > 40, so ok.
So our CI will be 126 +/- t * (15/sqrt(41)). But what is t?
12. Finding t on the table
In this problem, since n = 41, df = 40 (since df = n - 1). Looking in the t-table, you should see confidence levels written at the bottom. Find the column for 90% (our confidence level). Go up the column until you get to the row for df = 40. There you will find t = 1.684.
So our CI is now 126 +/- 1.684 * (15/sqrt(41)) = 126 +/- 3.94.
This gives a confidence interval of 122.06 < mu < 129.94. So we are 90% confident that the true mean daily fee collected is between $122.06 and $129.94.
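The arithmetic for this interval is easy to check by machine. A minimal sketch, assuming the critical value is still looked up by hand in the t-table and passed in (the function name t_interval and the parameter name t_star are made up for this example):

```python
import math

def t_interval(xbar, s, n, t_star):
    """One-sample t-interval: xbar +/- t_star * s / sqrt(n)."""
    se = s / math.sqrt(n)      # standard error of xbar
    me = t_star * se           # margin of error
    return xbar - me, xbar + me

# Garage example: xbar = 126, s = 15, n = 41, t = 1.684 (df = 40, 90%).
low, high = t_interval(xbar=126, s=15, n=41, t_star=1.684)
print(f"{low:.2f} < mu < {high:.2f}")
```

Keeping the table lookup outside the function mirrors the procedure on the slide: only the critical value changes from problem to problem.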
13. More practice finding t
Say we have n = 6, and want to make a 95% CI. What t do we use? df = 6 - 1 = 5. Using the t-table, we find t = 2.571.
Say now n = 25, and we want a 98% CI. Find t. df = 25 - 1 = 24. Using the t-table, we find t = 2.492.
So the ONLY CHANGE in the CI procedure is how we find the critical value (t in this case). This is easier than it was for proportions!!
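If you want to double-check a table lookup, the critical value can also be approximated numerically. The sketch below is one way to do it with only the Python standard library, not the method the textbook table was built with: it integrates the t density with Simpson's rule and then bisects for the t that leaves the right area in the middle. The names t_pdf, t_cdf and t_star are made up for this example, and the search assumes the answer is below 100.

```python
import math

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """Approximate P(T <= x) for x >= 0 (Simpson's rule on [0, x])."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_star(df, confidence):
    """Critical value with `confidence` of the area between -t and +t.
    Assumes the answer lies between 0 and 100."""
    target = 1 - (1 - confidence) / 2   # cumulative area to the left of t
    lo, hi = 0.0, 100.0
    for _ in range(60):                 # bisection
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for df, conf in ((5, 0.95), (24, 0.98), (40, 0.90)):
    print(df, conf, round(t_star(df, conf), 3))
```

The three cases printed are the ones looked up by hand in these slides, so the output can be compared directly with the table.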
14. On to Hypothesis tests
Name: The One-sample t-test for means!!
Likewise, HTs work the same as before:
- State Ho and Ha
- Check Assumptions
- Calculate t (where we used to calculate z), where t = (xbar - mu0) / SE(xbar) and mu0 is the value of mu stated in Ho
- Find the P-value (this is a little different)
- State your conclusion: reject or fail to reject Ho.
15. An example for HT (#22)
A company with a large fleet of cars hopes to keep gas costs down and set a goal of attaining a fleet average of at least 26 mpg. They take a random sample of 51 trips and find a mean of 25.02 mpg and a standard deviation of 4.83 mpg. Is this strong evidence that they have failed to attain their goal?
Known: xbar = 25.02, s = 4.83, n = 51, df = 50.
We are testing mu, the true mean gas mileage for the company fleet.
State Ho and Ha:
Ho: mu = 26 mpg
Ha: mu < 26 mpg
16. #22 (continued)
Check assumptions:
- Random? Yes.
- n < 10%? Reasonable to assume.
- Nearly Normal? n > 40, so ok.
Test: run a one-sample t-test for the mean.
Calculate t: t = (25.02 - 26) / (4.83/sqrt(51)) = -1.449
Find the P-value: here is where things are a little different.
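The test statistic for the fleet example can be checked with a few lines of code. This is a minimal sketch; the function name t_statistic and the parameter name mu0 (the value of mu under Ho) are made up for this example.

```python
import math

def t_statistic(xbar, mu0, s, n):
    """One-sample t statistic: (xbar - mu0) / (s / sqrt(n))."""
    return (xbar - mu0) / (s / math.sqrt(n))

# Fleet example: xbar = 25.02, Ho value 26, s = 4.83, n = 51.
t = t_statistic(xbar=25.02, mu0=26, s=4.83, n=51)
df = 51 - 1
print("t =", round(t, 3), "with df =", df)
```

A negative t just means the sample mean fell below the hypothesized value, which is the direction Ha is looking in.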
17. #22 (continued)
Here's how your model should look: we're interested in the area to the left of our t-value.
Looking on the t-table, go to the ROW for df = 50. Find the two values of t that our calculated t falls between (use the absolute value of our t, since the curve is symmetric). Then look to the TOP of each column and find the one-tail probability (since this is a one-sided Ha; had it been a 2-sided Ha, use the 2-tailed probabilities).
Looking on the table, I find that our t falls between 1.299 and 1.676, meaning our P-value is between 0.05 and 0.10. At an alpha level of 0.05, we fail to reject the Ho. There is not strong evidence that the company has failed to attain its goal.
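The table-bracketing procedure described above can be sketched in code. Assumptions to note: the row below holds the df = 50 entries 1.299 and 1.676 from the slide, plus three more (2.009, 2.403, 2.678) that are typical printed t-table values and do not appear in the slides; the name bracket_p_value is made up; and the function assumes the observed t points in the direction of Ha.

```python
# (one-tail probability, critical value) pairs for the df = 50 row,
# ordered from largest tail area to smallest.
DF50_ROW = [(0.10, 1.299), (0.05, 1.676), (0.025, 2.009),
            (0.01, 2.403), (0.005, 2.678)]

def bracket_p_value(t, row, two_sided=False):
    """Return (low, high) bounds on the P-value from one t-table row.

    Uses |t| because the curve is symmetric. Assumes t lies in the
    direction of Ha when the test is one-sided.
    """
    x = abs(t)
    factor = 2 if two_sided else 1
    lower, upper = 0.0, 0.5          # a single tail holds at most half the area
    for tail, crit in row:
        if x >= crit:
            upper = tail             # t is at least this extreme
        else:
            lower = tail             # t is not this extreme; stop here
            break
    return factor * lower, factor * upper

low, high = bracket_p_value(-1.449, DF50_ROW)
print(f"{low} < P-value < {high}")
```

With alpha = 0.05, a P-value whose lower bound is already at or above alpha means we fail to reject Ho.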
18.
- A coffee machine dispenses coffee into paper cups. You're supposed to get 10 ounces of coffee, but the amount varies from cup to cup. You take a random sample of 20 cups of coffee, and find a mean of 9.845 oz and a standard deviation of 0.199 oz.
- Is the machine short-changing customers?
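One possible way to set this up, offered as a sketch rather than the official solution: take Ho: mu = 10 and Ha: mu < 10, and assume alpha = 0.05 (the slide does not give an alpha). The critical value 1.729 is the usual t-table entry for df = 19 with a one-tail probability of 0.05, and it is an added assumption here too.

```python
import math

# Coffee machine: claimed 10 oz, sample of 20 cups.
xbar, mu0, s, n = 9.845, 10.0, 0.199, 20

df = n - 1
t = (xbar - mu0) / (s / math.sqrt(n))

# Assumed table value: df = 19, one-tail probability 0.05.
T_CRIT = 1.729

# Left-tailed test: reject Ho when t falls below -T_CRIT.
reject = t < -T_CRIT
print("df =", df, " t =", round(t, 2), " reject Ho:", reject)
```

You would still need to check the assumptions first; with n = 20 the rule of thumb asks for data that are unimodal and symmetric.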
19. CIs and HTs
With sample means, HTs and CIs are related: a CI at confidence level C will contain ALL the Ho values that would not be rejected by a 2-tailed HT at alpha = 1 - C. I.e., if you fail to reject the null in a 2-tailed test, then a CI with the corresponding confidence level would contain that hypothesized value for the mean.
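This link between CIs and 2-tailed HTs can be checked directly. The sketch below reuses the garage numbers (xbar = 126, s = 15, n = 41, t = 1.684 for 90% confidence, so alpha = 0.10) and tries a few hypothesized means; the function names and the list of trial values are made up for this example.

```python
import math

def ci_contains(mu0, xbar, s, n, t_star):
    """True if mu0 lies inside xbar +/- t_star * s / sqrt(n)."""
    me = t_star * s / math.sqrt(n)
    return xbar - me <= mu0 <= xbar + me

def fails_to_reject(mu0, xbar, s, n, t_star):
    """True if a 2-tailed t-test of Ho: mu = mu0 does NOT reject,
    i.e. |t| does not exceed the critical value t_star."""
    t = (xbar - mu0) / (s / math.sqrt(n))
    return abs(t) <= t_star

xbar, s, n, t_star = 126, 15, 41, 1.684   # 90% CI <-> alpha = 0.10
trial_values = (120, 123, 126, 129, 132)
for mu0 in trial_values:
    print(mu0, ci_contains(mu0, xbar, s, n, t_star),
          fails_to_reject(mu0, xbar, s, n, t_star))
```

The two columns should agree for every trial value: inside the interval means not rejected, outside means rejected.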
20. Chapters 24 and 25
When we do experiments, we end up with at least two groups (assuming good randomized comparative design!). And we are usually interested in whether the experiment worked or not. For example, did Drug XXX lower cholesterol levels? In this case, we're not so much interested in the mean cholesterol levels as we are in the DIFFERENCE between the two population means. I.e., now we want to estimate the true difference between means, mu1 - mu2.
Since we are out of time, there are only two things I want you to know from these two chapters: Which test or interval do I use?
21. Independent vs Dependent samples
When we design an experiment, there are typically two ways of doing so:
1) Have two completely independent groups, one receiving each treatment.
2) Do before and after tests on the same individuals (dependent samples). This same effect can be accomplished using twins as well (or otherwise closely paired subjects).
What we saw in our probability section was that probabilities depended on whether or not the outcomes were independent. The same issue arises here, and there are two different tests to use, one for Independent samples and one for Dependent samples.
22. 2-sample test vs Paired t-test
If we have two samples that are independent, we will use the 2-sample t-test. If the two samples are different sizes or otherwise not linked to each other, such as two randomly assigned treatment groups, then we have independent samples.
23. Paired samples
If we have before and after measurements of the same people (or objects), then our two samples (before and after) are LINKED, and therefore DEPENDENT. Also known as PAIRED. Groups of twins (or otherwise carefully matched subjects) where one twin is given treatment A and the other treatment B are also dependent (since the twins are related to each other/linked). In these cases, we use a Paired t-test.
24. Examples: Which test to use?
- We seeded 26 clouds and compared the rainfall to that from 26 unseeded clouds. Are the two groups of clouds related at all? No. Use the 2-sample test.
- We recorded temperatures in January and July for 10 cities. Are the two groups of temperatures related? Yes, they are on the same cities! Use the paired t-test.
- Consumer Reports tests 17 brands of poultry hot dogs and 20 brands of beef hot dogs to see the difference in the amount of sodium. The two groups are not the same size, so we MUST use the 2-sample test.
25. More examples
Having done poorly on their math final exams in May, 6 students repeat the class in summer school and take another exam in July. Did summer school improve their math skills? These are before and after measurements: use a paired t-test.