Title: Testing Multiple Means and the Analysis of Variance
1. Testing Multiple Means and the Analysis of Variance
- Situations where comparing more than two means is important.
- The approach to testing equality of more than two means.
- Introduction to the analysis of variance table, its construction and use.
2. Study Designs and Analysis Approaches
- Simple random sample from a population with known $\sigma$, continuous response: one-sample z-test.
- Simple random sample from a population with unknown $\sigma$, continuous response: one-sample t-test.
- Simple random samples from 2 populations with known $\sigma$: two-sample z-test.
- Simple random samples from 2 populations with unknown $\sigma$: two-sample t-test.
3. Sampling Study with t > 2 Populations
One sample is drawn independently and randomly from each of t > 2 populations.
Objective: to compare the means of the t populations for statistically significant differences in responses.
Initially we will assume all populations have a common variance; later, we will test whether this is indeed true (homogeneity of variance tests).
4. Sampling Study
[Diagram: a random sample is drawn from each of three populations (vegetarians, meat and potato eaters, health eaters), and cholesterol levels are measured on each sample.]
5. Experimental Study with t > 2 Treatments
Experimental units: t samples, of sizes $n_1, \ldots, n_t$, are independently and randomly drawn from one population. Because of this, we can safely assume that each sample has the same mean and variance before treatment.
Separate treatments are applied to each sample.
A treatment is something done to the experimental units which would be expected to change the distribution (usually only the mean) of the response(s).
6. Experimental Study
[Diagram: male college undergraduate students are randomly sampled into three sets of experimental units, which receive the vegetarian diet, the meat and potato (M P) diet, and the health diet respectively; responses are then measured.]
7. Hypothesis
Let $\mu_i$ be the true mean of treatment group i (or population i).
$H_0: \mu_1 = \mu_2 = \cdots = \mu_t$
Hence we are interested in whether all the groups (populations) have exactly the same true mean.
$H_A$: at least one $\mu_i$ differs from the others.
The alternative is that some of the groups (populations) differ from the others in their means.
8. A Simple Model
Let $y_{ij}$ be the response for experimental unit j in group i, where $i = 1, 2, \ldots, t$ and $j = 1, 2, \ldots, n_i$.
The model is $y_{ij} = \mu_i + \epsilon_{ij}$, which is another way of saying that we expect the group mean to be $\mu_i$.
Here $\epsilon_{ij} = y_{ij} - \mu_i$ is the residual, or deviation, from the group mean.
Each population has normally distributed responses around its own mean, but the variance is the same across all populations.
Assuming $y_{ij} \sim N(\mu_i, \sigma^2)$, then $\epsilon_{ij} \sim N(0, \sigma^2)$.
If $H_0$ holds, $y_{ij} = \mu_0 + \epsilon_{ij}$; that is, all groups have the same mean and variance.
9. A Naïve Testing Approach
Test each possible pair of groups by performing all pairwise t-tests. With three groups, for example, there are three such tests.
- Assume each test is performed at the $\alpha = 0.05$ level.
- The probability of not rejecting $H_0$ when $H_0$ is true is $1 - \alpha = 0.95$ for a single test.
- The probability of not rejecting $H_0$ when $H_0$ is true for all three tests is $(0.95)^3 = 0.857$, treating the tests as independent.
- Thus the true significance level for the overall test of no difference in the means will be about $1 - 0.857 = 0.143$, not the $\alpha = 0.05$ level we thought it would be.
This approach has two problems:
1. The overall significance level is inflated, as the calculation above shows.
2. In each individual t-test, only part of the information available to estimate the underlying variance is actually used. This is inefficient; we can do much better.
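The inflation of the overall significance level grows quickly with the number of groups. The following is a minimal Python sketch of the same arithmetic; the function name `familywise_error_rate` is hypothetical, and the calculation treats the pairwise tests as independent, which is a simplifying assumption.

```python
from math import comb


def familywise_error_rate(t, alpha=0.05):
    """Return (number of pairwise tests, overall significance level).

    Assumes the pairwise tests are independent, as in the slide's
    (0.95)^3 calculation; this is only an approximation in practice.
    """
    m = comb(t, 2)                      # number of pairs among t groups
    return m, 1 - (1 - alpha) ** m      # P(at least one false rejection)


for t in (3, 4, 5):
    m, level = familywise_error_rate(t)
    print(f"t = {t}: {m} pairwise tests, overall level about {level:.3f}")
```

With t = 3 this reproduces the 0.143 figure above.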
10. Testing Approaches - Analysis of Variance
The term analysis of variance comes from the
fact that this approach compares the variability
observed among sample means to a pooled estimate
of the variability among observations within each
group.
11. Extreme Situations
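12. Pooled Variance
From the two-sample t-test with assumed equal variance $\sigma^2$, we produced a pooled (within-group) sample variance estimate:
$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
With t groups, the same idea gives
$s_w^2 = \frac{\sum_{i=1}^{t} (n_i - 1)s_i^2}{\sum_{i=1}^{t} (n_i - 1)}$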
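As a rough illustration of pooling across more than two groups, the Python sketch below weights each group's sample variance by its degrees of freedom. The function name `pooled_variance` is hypothetical, and the three groups of values are hypothetical data chosen only so that their means and standard deviations match the summary statistics in the Minitab output shown later in this document; they are not the data from Ott's Table 13.1.

```python
from statistics import variance


def pooled_variance(groups):
    """Pooled within-group variance: sum of (n_i - 1) * s_i^2 over sum of (n_i - 1)."""
    numerator = sum((len(g) - 1) * variance(g) for g in groups)
    denominator = sum(len(g) - 1 for g in groups)
    return numerator / denominator


# Hypothetical observations (assumption, see the note above).
groups = [
    [5.88, 5.89, 5.90, 5.91, 5.92],
    [5.49, 5.50, 5.50, 5.50, 5.51],
    [4.98, 4.99, 5.00, 5.01, 5.02],
]

s2_w = pooled_variance(groups)
print(f"pooled variance = {s2_w:.8f}, pooled std dev = {s2_w ** 0.5:.4f}")
```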
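13. Variance among Group Means
Consider the variance among the group means, computed as
$s_{\bar{y}}^2 = \frac{\sum_{i=1}^{t} (\bar{y}_{i.} - \bar{y}_{..})^2}{t - 1}$
where $\bar{y}_{i.}$ is the mean of group i and $\bar{y}_{..}$ is the overall mean.
If we assume each group is of the same size, say n, then under $H_0$, $s_{\bar{y}}^2$ is an estimate of $\sigma^2 / n$. Hence, n times $s_{\bar{y}}^2$ is an estimate of $\sigma^2$. When the sample sizes are unequal, the estimate is given by
$s_b^2 = \frac{\sum_{i=1}^{t} n_i (\bar{y}_{i.} - \bar{y}_{..})^2}{t - 1}$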
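The between-group estimate can be sketched in Python as follows. The function name `between_group_variance` is hypothetical, and the data are hypothetical values chosen to match the group means and sizes in the Minitab output shown later in this document. With equal group sizes, the general formula should agree with n times the variance of the group means.

```python
from statistics import mean, variance


def between_group_variance(groups):
    """Between-group estimate of sigma^2 that allows unequal group sizes."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    t = len(groups)
    return sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups) / (t - 1)


# Hypothetical observations (assumption, see the note above).
groups = [
    [5.88, 5.89, 5.90, 5.91, 5.92],
    [5.49, 5.50, 5.50, 5.50, 5.51],
    [4.98, 4.99, 5.00, 5.01, 5.02],
]

s2_b = between_group_variance(groups)

# Equal-size shortcut: n times the sample variance of the group means.
n = len(groups[0])
shortcut = n * variance([mean(g) for g in groups])

print(f"general formula: {s2_b:.6f}, equal-n shortcut: {shortcut:.6f}")
```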
14. F-test
Now we have two estimates of $\sigma^2$: $s_b^2$ (between groups) and $s_w^2$ (within groups). An F-test can be used to determine whether the two are estimating the same quantity. Note that if the groups truly have different means, $s_b^2$ will tend to be greater than $s_w^2$. Hence the F-statistic is written as
$F = \frac{s_b^2}{s_w^2}$
with $t - 1$ numerator and $N - t$ denominator degrees of freedom, where $N = \sum_i n_i$ is the total number of observations.
If $H_0$ holds, the computed F-statistic should be close to 1. If $H_A$ holds, the computed F-statistic should be much greater than 1. We use the appropriate critical value from the F-table to help make this decision.
Hence, the F-test is really a test of equality of means under the assumption of normal populations and homogeneous variances.
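A minimal Python sketch of the decision rule follows. The function name `f_statistic` is hypothetical, the data are hypothetical values chosen to match the summary statistics in the Minitab output later in this document, and the critical value 3.89 is the tabled 5% point of the F distribution with (2, 12) degrees of freedom, entered by hand rather than computed.

```python
from statistics import mean, variance

F_CRIT_05_2_12 = 3.89  # tabled critical value, alpha = 0.05, df = (2, 12)


def f_statistic(groups):
    """Return (F, numerator df, denominator df) for a one-way layout."""
    t = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group estimate of sigma^2.
    s2_b = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups) / (t - 1)
    # Pooled within-group estimate of sigma^2.
    s2_w = sum((len(g) - 1) * variance(g) for g in groups) / (n_total - t)
    return s2_b / s2_w, t - 1, n_total - t


# Hypothetical observations (assumption, see the note above).
groups = [
    [5.88, 5.89, 5.90, 5.91, 5.92],
    [5.49, 5.50, 5.50, 5.50, 5.51],
    [4.98, 4.99, 5.00, 5.01, 5.02],
]

f_value, df1, df2 = f_statistic(groups)
reject = f_value > F_CRIT_05_2_12
print(f"F = {f_value:.2f} on ({df1}, {df2}) df; reject H0: {reject}")
```

For other group counts or sample sizes the critical value would have to be looked up again for the matching degrees of freedom.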
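15. Partition of Sums of Squares
$TSS = SSB + SSW$
- TSS, total sum of squares: $\sum_{i}\sum_{j} (y_{ij} - \bar{y}_{..})^2$
- SSB, sum of squares between means: $\sum_{i} n_i (\bar{y}_{i.} - \bar{y}_{..})^2$
- SSW, sum of squares within groups: $\sum_{i}\sum_{j} (y_{ij} - \bar{y}_{i.})^2$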
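The partition can be checked numerically. The sketch below computes the three sums of squares directly from their definitions and confirms that the total equals between plus within. The function name `sums_of_squares` is hypothetical, and the data are hypothetical values chosen to match the summary statistics in the Minitab output later in this document.

```python
from statistics import mean


def sums_of_squares(groups):
    """Return (TSS, SSB, SSW) computed from their definitions."""
    all_obs = [y for g in groups for y in g]
    grand_mean = mean(all_obs)
    tss = sum((y - grand_mean) ** 2 for y in all_obs)
    ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ssw = sum((y - mean(g)) ** 2 for g in groups for y in g)
    return tss, ssb, ssw


# Hypothetical observations (assumption, see the note above).
groups = [
    [5.88, 5.89, 5.90, 5.91, 5.92],
    [5.49, 5.50, 5.50, 5.50, 5.51],
    [4.98, 4.99, 5.00, 5.01, 5.02],
]

tss, ssb, ssw = sums_of_squares(groups)
print(f"TSS = {tss:.6f}, SSB = {ssb:.6f}, SSW = {ssw:.6f}")
print(f"SSB + SSW = {ssb + ssw:.6f}")
```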
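16. The AOV (Analysis of Variance) Table
The computations needed to perform the F-test, which compares the between-group and within-group variance estimates, are organized into a table.

Source     df       SS     MS                   F
Between    t - 1    SSB    MSB = SSB/(t - 1)    MSB/MSW
Within     N - t    SSW    MSW = SSW/(N - t)
Total      N - 1    TSS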
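To show how the table is filled in, the Python sketch below starts from the two sums of squares reported in the SAS and Minitab output later in this document (2.03333333 between, 0.0022 within, with 3 groups and 15 observations) and derives the remaining entries. The function name `aov_table` and the row labels are hypothetical.

```python
def aov_table(ssb, ssw, t, n_total):
    """Return one-way AOV rows as (source, df, SS, MS, F); None marks a blank cell."""
    df_b, df_w = t - 1, n_total - t
    msb, msw = ssb / df_b, ssw / df_w
    return [
        ("Between", df_b, ssb, msb, msb / msw),
        ("Within", df_w, ssw, msw, None),
        ("Total", df_b + df_w, ssb + ssw, None, None),
    ]


rows = aov_table(ssb=2.03333333, ssw=0.0022, t=3, n_total=15)

print(f"{'Source':<8}{'DF':>4}{'SS':>12}{'MS':>12}{'F':>10}")
for source, df, ss, ms, f in rows:
    ms_txt = f"{ms:.8f}" if ms is not None else ""
    f_txt = f"{f:.2f}" if f is not None else ""
    print(f"{source:<8}{df:>4}{ss:>12.8f}{ms_txt:>12}{f_txt:>10}")
```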
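17. Example: Excel
Group summaries (formulas shown for column B):
=AVERAGE(B6:B10)    group mean
=VAR(B6:B10)        group variance (referenced below as B13)
=SQRT(B13)          group standard deviation
=COUNT(B6:B10)      group size (referenced below as B15)
=(B15-1)*B13        within-group sum of squares for the group (referenced below as B16)
Sums of squares:
=(SUM(B15:D15)-1)*VAR(B6:D10)    total sum of squares (referenced below as B18)
=SUM(B16:D16)                    sum of squares within groups (referenced below as B19)
=B18-B19                         sum of squares between means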
18. Excel Analysis ToolPak
19. Example: SAS
proc anova;
  class popn;
  model resp = popn;
  title 'Table 13.1 in Ott - Analysis of Variance';
run;
Table 13.1 in Ott - Analysis of Variance

Analysis of Variance Procedure
Dependent Variable: RESP

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               2        2.03333333     1.01666667    5545.45    0.0001
Error              12        0.00220000     0.00018333
Corrected Total    14        2.03553333

R-Square        C.V.    Root MSE    RESP Mean
0.998919    0.247684    0.013540     5.466667

Source    DF      Anova SS    Mean Square    F Value    Pr > F
POPN       2    2.03333333     1.01666667    5545.45    0.0001
20. GLM in SAS
General Linear Models Procedure
Dependent Variable: RESP

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               2        2.03333333     1.01666667    5545.45    0.0001
Error              12        0.00220000     0.00018333
Corrected Total    14        2.03553333

R-Square        C.V.    Root MSE    RESP Mean
0.998919    0.247684    0.013540     5.466667

Source    DF     Type I SS    Mean Square    F Value    Pr > F
POPN       2    2.03333333     1.01666667    5545.45    0.0001

Source    DF   Type III SS    Mean Square    F Value    Pr > F
POPN       2    2.03333333     1.01666667    5545.45    0.0001

                                 T for H0:                Std Error of
Parameter         Estimate     Parameter=0    Pr > |T|      Estimate
INTERCEPT    5.000000000 B          825.72      0.0001    0.00605530
POPN    1    0.900000000 B          105.10      0.0001    0.00856349
        2    0.500000000 B           58.39      0.0001    0.00856349
        3    0.000000000 B             .           .          .

NOTE: The X'X matrix has been found to be singular and a generalized inverse was used to solve the normal equations. Estimates followed by the letter 'B' are biased, and are not unique estimators of the parameters.
proc glm;
  class popn;
  model resp = popn / solution;
  title 'Table 13.1 in Ott';
run;
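The parameter estimates in the GLM output can be read as follows: the effect of the last class level (POPN 3) is set to zero, so the intercept is that group's mean and the other estimates are differences from it. The Python sketch below rebuilds the estimates and standard errors under that reading. It assumes the group means (5.9, 5.5, 5.0) and group size (5) reported in the Minitab output later in this document, and takes the error mean square as 0.0022 / 12 from the AOV table; the variable names are hypothetical.

```python
from math import sqrt

# Inputs taken from the outputs in this document (see the note above).
group_means = {1: 5.9, 2: 5.5, 3: 5.0}   # POPN level -> sample mean
n = 5                                     # observations per group
mse = 0.0022 / 12                         # error SS / error df

# Reference-level coding: the last level's effect is fixed at zero.
reference = max(group_means)
intercept = group_means[reference]
effects = {level: m - intercept for level, m in group_means.items()}

se_intercept = sqrt(mse / n)        # std error of one group mean
se_effect = sqrt(2 * mse / n)       # std error of a difference of two means

print(f"INTERCEPT {intercept:.4f}  t = {intercept / se_intercept:.2f}  se = {se_intercept:.8f}")
for level in sorted(effects):
    if level == reference:
        continue  # fixed at zero, so no test is reported
    est = effects[level]
    print(f"POPN {level}    {est:.4f}  t = {est / se_effect:.2f}  se = {se_effect:.8f}")
```

A different choice of reference level would change the individual estimates but not the fitted group means, which is the sense in which the output calls them non-unique.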
21. Minitab Example
STAT > ANOVA > OneWay (Unstacked)

One-way Analysis of Variance

Analysis of Variance
Source    DF          SS          MS          F        P
Factor     2    2.033333    1.016667    5545.45    0.000
Error     12    0.002200    0.000183
Total     14    2.035533

Individual 95% CIs For Mean, Based on Pooled StDev
Level    N      Mean     StDev
EG1      5    5.9000    0.0158
EG2      5    5.5000    0.0071
EG3      5    5.0000    0.0158

Pooled StDev = 0.0135
[Character plot of the individual 95% confidence intervals; axis marks at 5.10, 5.40, 5.70, 6.00.]
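The individual intervals in the Minitab display use the pooled standard deviation rather than each group's own. A minimal Python sketch of that calculation follows, assuming the usual form mean ± t × (pooled StDev) / sqrt(N) with the error degrees of freedom (12). The t value 2.179 is the tabled two-sided 95% point for 12 df, entered by hand, and the function name `pooled_ci` is hypothetical.

```python
from math import sqrt

T_CRIT_95_12 = 2.179   # tabled t value, two-sided 95%, 12 df
POOLED_SD = 0.0135     # pooled StDev from the Minitab output

# Level -> (N, Mean), copied from the Minitab output.
levels = {"EG1": (5, 5.9), "EG2": (5, 5.5), "EG3": (5, 5.0)}


def pooled_ci(n, mean_value, sd=POOLED_SD, t_crit=T_CRIT_95_12):
    """95% interval for one group mean using the pooled standard deviation."""
    half_width = t_crit * sd / sqrt(n)
    return mean_value - half_width, mean_value + half_width


intervals = {name: pooled_ci(n, m) for name, (n, m) in levels.items()}
for name, (low, high) in intervals.items():
    print(f"{name}: ({low:.4f}, {high:.4f})")
```

The three intervals are far apart relative to their width, which matches the very large F value in the table.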