Title: Factorial Experiments
Chapter 9
Section 9.1 One-Factor Experiments
- In general, a factorial experiment involves several variables.
- One variable is the response variable, which is sometimes called the outcome variable or the dependent variable.
- The other variables are called factors.
- The question addressed in a factorial experiment is whether varying the levels of the factors produces a difference in the mean of the response variable.
More Basics
- If there is just a single factor, then we say that it is a one-factor experiment.
- The different values of the factor are called the levels of the factor and can also be called treatments.
- The objects upon which measurements are made are called experimental units.
- The units assigned to a given treatment are called replicates.
Fixed Effects or Random Effects
- If particular treatments are chosen deliberately by the experimenter, rather than at random from a larger population of treatments, then we say that the experiment follows a fixed effects model. For example, if we are testing four tomato fertilizers, then we have four treatments that we have chosen to test.
- In some experiments, treatments are chosen at random from a population of possible treatments. In this case, the experiment is said to follow a random effects model. An example is an experiment to determine whether or not different flavors of ice cream melt at different speeds: we test a random sample of three flavors from the large population of flavors offered to customers by a single manufacturer.
- The methods of analysis for these two models are essentially the same, although the conclusions to be drawn from them differ.
Completely Randomized Experiment
- Definition: A factorial experiment in which experimental units are assigned to treatments at random, with all possible assignments being equally likely, is called a completely randomized experiment.
- In many situations, the results of an experiment can be affected by the order in which the observations are taken.
- The ideal procedure is to take the observations in random order.
- In a completely randomized experiment, it is appropriate to think of each treatment as representing a population, and the responses observed for the units assigned to that treatment as a simple random sample from that population.
Treatment Means
- The data from the experiment thus consist of several random samples, each from a different population.
- The population means are called treatment means.
- The questions of interest concern the treatment means: whether they are all equal, and if not, which ones are different, how big the differences are, and so on.
- To make a formal determination as to whether the treatment means differ, a hypothesis test is needed.
One-Way Analysis of Variance
- We have I samples, each from a different treatment.
- The treatment means are denoted μ1, …, μI.
- The sample sizes are denoted J1, …, JI.
- The total number in all the samples combined is denoted by N, where N = J1 + … + JI.
- The hypothesis that we wish to test is H0: μ1 = … = μI versus H1: two or more of the μi are different.
- If there were only two samples, we could use the two-sample t test to test the null hypothesis.
- Since there are more than two samples, we use a method known as one-way analysis of variance (ANOVA).
Notation Needed
- Since there are several samples, we use a double subscript to denote the observations. Specifically, we let Xij denote the jth observation in the ith sample.
- The sample mean of the ith sample is X̄i. = (Σj Xij)/Ji, where the sum is over j = 1, …, Ji.
- The sample grand mean is X̄.. = (Σi Σj Xij)/N, the average of all N observations.
Example
- Question: For the data in Table 9.1, find I, J1, …, JI, N, X23, X̄3., and X̄..
- Insert Table 9.1 here.
Example (cont.)
- Answer: There are four samples, so I = 4. Each sample contains five observations, so J1 = J2 = J3 = J4 = 5. The total number of observations is N = 20. The quantity X23 is the third observation in the second sample, which is 267. The quantity X̄3. is the sample mean of the third sample; this value is presented in Table 9.1 and is 271.0. We can use the equation on the previous slide to compute the sample grand mean X̄..
Treatment Sum of Squares
- The variation of the sample means around the sample grand mean is measured by a quantity called the treatment sum of squares (SSTr), which is given by
SSTr = Σi Ji(X̄i. - X̄..)², where the sum is over i = 1, …, I.
- Note that each squared distance is multiplied by the sample size corresponding to its sample mean, so that the means for the larger samples count more.
- SSTr provides an indication of how different the treatment means are from each other.
- If SSTr is large, then the sample means are spread widely, and it is reasonable to conclude that the treatment means differ and to reject H0.
- If SSTr is small, then the sample means are all close to the sample grand mean and therefore to each other, so it is plausible that the treatment means are equal.
Error Sum of Squares
- In order to determine whether SSTr is large enough to reject H0, we compare it to another sum of squares, called the error sum of squares (SSE).
- SSE measures the variation in the individual sample points around their respective sample means.
- This variation is measured by summing the squares of the distances from each point to its own sample mean.
- SSE is given by
SSE = Σi Σj (Xij - X̄i.)², where the inner sum is over j = 1, …, Ji and the outer sum is over i = 1, …, I.
Comments
- The term Xij - X̄i. that is squared in the formula for SSE is called a residual. Therefore, SSE is the sum of the squared residuals.
- SSE depends only on the distances of the sample points from their own means and is not affected by the location of the treatment means relative to one another.
- So, SSE measures only the underlying random variation in the process being studied.
- An easier computational formula is
SSE = Σi Σj Xij² - Σi Ji X̄i.².
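These sums of squares are easy to compute directly. Below is a minimal sketch in Python with numpy; the sample values are hypothetical, not the data of Table 9.1:

```python
import numpy as np

# Hypothetical one-factor data: I = 3 treatments, with unequal sample sizes.
samples = [np.array([250.0, 264.0, 256.0]),
           np.array([263.0, 254.0, 267.0, 265.0]),
           np.array([257.0, 279.0, 269.0])]

I = len(samples)
J = np.array([len(s) for s in samples])          # sample sizes J1, ..., JI
N = J.sum()                                      # total number of observations
means = np.array([s.mean() for s in samples])    # sample means Xbar_i.
grand_mean = sum(s.sum() for s in samples) / N   # sample grand mean Xbar_..

# SSTr: variation of the sample means around the grand mean,
# weighted by sample size.
SSTr = np.sum(J * (means - grand_mean) ** 2)

# SSE: variation of each point around its own sample mean
# (the sum of the squared residuals).
SSE = sum(((s - s.mean()) ** 2).sum() for s in samples)

print(f"SSTr = {SSTr:.2f}, SSE = {SSE:.2f}")
```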
Assumptions for the One-Way ANOVA
- The standard one-way ANOVA hypothesis tests are valid under the following conditions:
- The treatment populations must be normal.
- The treatment populations must all have the same variance, which we will denote by σ².
- To check:
- Look at a normal probability plot for each sample and see if the assumption of normality is violated.
- The spreads of the observations within the various samples can be checked visually by making a residual plot.
The F Test for One-Way ANOVA
- To test H0: μ1 = … = μI versus H1: two or more of the μi are different:
- Compute SSTr.
- Compute SSE.
- Compute MSTr = SSTr/(I - 1) and MSE = SSE/(N - I).
- Compute the test statistic F = MSTr/MSE.
- Find the P-value by consulting the F table (Table A.7 in Appendix A) with I - 1 and N - I degrees of freedom.
- Note: the total sum of squares is SST = SSTr + SSE.
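Continuing the sketch above (reusing samples, I, J, N, SSTr, and SSE), the test can be carried out as follows; scipy.stats.f.sf gives the upper-tail F probability, and scipy.stats.f_oneway serves as a cross-check:

```python
from scipy import stats

MSTr = SSTr / (I - 1)                   # mean square for treatments
MSE = SSE / (N - I)                     # mean square for error
F = MSTr / MSE                          # test statistic
p_value = stats.f.sf(F, I - 1, N - I)   # P-value from the F_{I-1, N-I} distribution
print(f"F = {F:.3f}, P = {p_value:.4f}")

# Cross-check against scipy's built-in one-way ANOVA.
F_check, p_check = stats.f_oneway(*samples)
assert np.isclose(F, F_check) and np.isclose(p_value, p_check)
```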
Confidence Intervals for the Treatment Means
- A level 100(1 - α)% confidence interval for μi is given by
X̄i. ± t_{N-I, α/2} √(MSE/Ji).
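A short continuation of the same sketch computes these intervals (with α = 0.05):

```python
alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, N - I)   # t quantile with N - I df
for i in range(I):
    half_width = t_crit * np.sqrt(MSE / J[i])
    print(f"mu_{i+1}: {means[i]:.1f} +/- {half_width:.2f}")
```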
Computer Output
- Insert output on p.629.
- The fifth column is the one titled F. This gives the test statistic that we just discussed.
- The column P presents the P-value for the F test.
- Below the ANOVA table, the value S is the pooled estimate of the error standard deviation, σ.
- Sample means and standard deviations are presented for each treatment group, as well as a graphic that illustrates a 95% CI for each treatment mean.
Balanced versus Unbalanced Designs
- When equal numbers of units are assigned to each treatment, the design is said to be balanced.
- With a balanced design, the effect of unequal variances is generally not great.
- With an unbalanced design, the effect of unequal variances can be substantial.
- The more unbalanced the design, the greater the effect of unequal variances.
Random Effects Model
- If the treatments are chosen at random from a population of possible treatments, then the experiment is said to follow a random effects model.
- In a random effects model, the interest is in the whole population of possible treatments, and there is no particular interest in the ones that happened to be chosen for the experiment.
- There is an important difference in interpretation between the results of a fixed effects model and those of a random effects model.
- In a fixed effects model, the only conclusions that can be drawn are conclusions about the treatments actually used in the experiment.
Conclusions with Random Effects Model
- In a random effects model, however, since the treatments are a simple random sample from a population of treatments, conclusions can be drawn concerning the whole population, including treatments not actually used in the experiment.
- In the random effects model, the null hypothesis of interest is H0: the treatment means are equal for every level in the population.
- Although the null hypothesis for the random effects model differs from that of the fixed effects model, the hypothesis test is exactly the same.
Section 9.2 Pairwise Comparisons in One-Factor Experiments
- In a one-way ANOVA, an F test is used to test the null hypothesis that all the treatment means are equal.
- If this hypothesis is rejected, we can conclude that the treatment means are not all the same.
- But the test does not tell us which ones are different from the rest.
- Sometimes an experimenter has in mind two specific treatments, i and j, and wants to study the difference μi - μj.
- In this case, a method known as Fisher's least significant difference (LSD) method is appropriate and can be used to construct confidence intervals for μi - μj or to test the null hypothesis that μi - μj = 0.
- At other times, the experimenter may want to determine all the pairs of means that can be concluded to differ from each other.
- In this case a type of procedure called a multiple comparisons method must be used. We will discuss two such methods, the Bonferroni method and the Tukey-Kramer method.
Fisher's Least Significant Difference Method for Confidence Intervals and Hypothesis Tests
- The Fisher's least significant difference confidence interval, at level 100(1 - α)%, for the difference μi - μj is
X̄i. - X̄j. ± t_{N-I, α/2} √(MSE(1/Ji + 1/Jj)).
- To test the null hypothesis H0: μi - μj = 0, the test statistic is
t = (X̄i. - X̄j.) / √(MSE(1/Ji + 1/Jj)).
- If H0 is true, this statistic has a Student's t distribution with N - I degrees of freedom. Specifically, if
|X̄i. - X̄j.| > t_{N-I, α/2} √(MSE(1/Ji + 1/Jj)),
then H0 is rejected at level α.
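As a sketch, the LSD intervals and tests for all pairs can be computed from the one-way quantities defined earlier (means, J, MSE, N, I):

```python
from itertools import combinations

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, N - I)
for i, j in combinations(range(I), 2):
    diff = means[i] - means[j]
    se = np.sqrt(MSE * (1 / J[i] + 1 / J[j]))   # standard error of the difference
    t_stat = diff / se
    p = 2 * stats.t.sf(abs(t_stat), N - I)      # two-sided P-value
    print(f"mu_{i+1} - mu_{j+1}: CI ({diff - t_crit*se:.2f}, {diff + t_crit*se:.2f}), "
          f"t = {t_stat:.2f}, P = {p:.4f}")
```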
Output
- Insert output from p.646.
- The output from Minitab presents the 95% Fisher LSD CIs for each difference between treatment means.
- The values labeled "Center" are the differences between pairs of treatment means.
- The quantities labeled "Lower" and "Upper" are the lower and upper bounds of the confidence interval.
Simultaneous Tests
- The simultaneous confidence level of 81.11% in the previous output indicates that although we are 95% confident that any given confidence interval contains its true difference in means, we are only 81.11% confident that all the confidence intervals contain their true differences.
- When several confidence intervals or hypothesis tests are to be considered simultaneously, the confidence intervals must be wider, and the criterion for rejecting the null hypothesis stricter, than in situations where only a single interval or test is involved.
- In these situations, multiple comparison methods are used to produce simultaneous confidence intervals or simultaneous hypothesis tests.
- If level 100(1 - α)% simultaneous confidence intervals are constructed for differences between every pair of means, then we are confident at the 100(1 - α)% level that every confidence interval contains the true difference.
- If simultaneous hypothesis tests are conducted for all null hypotheses of the form H0: μi - μj = 0, then we may reject, at level α, every null hypothesis whose P-value is less than α.
The Bonferroni Method for Simultaneous Confidence Intervals
- Assume that C differences of the form μi - μj are to be considered. The Bonferroni simultaneous confidence intervals, at level 100(1 - α)%, for the C differences μi - μj are
X̄i. - X̄j. ± t_{N-I, α/(2C)} √(MSE(1/Ji + 1/Jj)).
- We are 100(1 - α)% confident that the Bonferroni confidence intervals contain the true value of the difference μi - μj for all C pairs under consideration.
Bonferroni Simultaneous Hypothesis Tests
- To test the C null hypotheses of the form H0: μi - μj = 0, the test statistics are
t = (X̄i. - X̄j.) / √(MSE(1/Ji + 1/Jj)).
- To find the P-value for each test, consult the Student's t table with N - I degrees of freedom, and multiply the P-value found there by C.
- Specifically, if
|X̄i. - X̄j.| > t_{N-I, α/(2C)} √(MSE(1/Ji + 1/Jj)),
then H0 is rejected at level α.
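The Bonferroni procedure reuses the LSD statistics, widening the critical value and inflating the P-values by the factor C; a sketch, continuing the example above:

```python
pairs = list(combinations(range(I), 2))
C = len(pairs)                                     # all pairwise comparisons
t_bonf = stats.t.ppf(1 - alpha / (2 * C), N - I)   # wider critical value
for i, j in pairs:
    diff = means[i] - means[j]
    se = np.sqrt(MSE * (1 / J[i] + 1 / J[j]))
    p_adj = min(1.0, C * 2 * stats.t.sf(abs(diff / se), N - I))  # Bonferroni P-value
    print(f"mu_{i+1} - mu_{j+1}: {diff:+.2f} +/- {t_bonf * se:.2f}, "
          f"adjusted P = {p_adj:.4f}")
```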
Disadvantages
- Although easy to use, the Bonferroni method has the disadvantage that as C becomes large, the confidence intervals become very wide, and the hypothesis tests have low power.
- The reason for this is that the Bonferroni method is a general method, not specifically designed for analysis of variance or for normal populations.
- In many cases C is fairly large; in particular, it is often desired to compare all pairs of means.
- In these cases, a method called the Tukey-Kramer method is superior, because it is designed for multiple comparisons of means of normal populations.
- The Tukey-Kramer method is based on a distribution called the Studentized range distribution, rather than on the Student's t distribution.
- The Studentized range distribution has two values for degrees of freedom, which for the Tukey-Kramer method are I and N - I.
- The Tukey-Kramer method uses the 1 - α quantile of the Studentized range distribution with I and N - I degrees of freedom; this quantity is denoted q_{I, N-I, α}.
Tukey-Kramer Method for Simultaneous Confidence Intervals
- The Tukey-Kramer level 100(1 - α)% simultaneous confidence intervals for all differences μi - μj are
X̄i. - X̄j. ± q_{I, N-I, α} √((MSE/2)(1/Ji + 1/Jj)).
- We are 100(1 - α)% confident that the Tukey-Kramer confidence intervals contain the true value of the difference μi - μj for every i and j.
Tukey-Kramer Method for Simultaneous Hypothesis Tests
- To test all the null hypotheses of the form H0: μi - μj = 0 simultaneously, the test statistics are
q = |X̄i. - X̄j.| / √((MSE/2)(1/Ji + 1/Jj)).
- The P-value for each test is found by consulting the Studentized range table (Table A.8) with I and N - I degrees of freedom.
- For every pair of levels i and j for which
|X̄i. - X̄j.| > q_{I, N-I, α} √((MSE/2)(1/Ji + 1/Jj)),
the null hypothesis H0: μi - μj = 0 is rejected at level α.
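scipy exposes the Studentized range distribution directly (scipy.stats.studentized_range, available in scipy 1.7 and later), so the Tukey-Kramer intervals and tests can be sketched as a continuation of the running example:

```python
q_crit = stats.studentized_range.ppf(1 - alpha, I, N - I)
for i, j in combinations(range(I), 2):
    diff = means[i] - means[j]
    se = np.sqrt((MSE / 2) * (1 / J[i] + 1 / J[j]))
    q_stat = abs(diff) / se
    p = stats.studentized_range.sf(q_stat, I, N - I)
    print(f"mu_{i+1} - mu_{j+1}: CI ({diff - q_crit*se:.2f}, {diff + q_crit*se:.2f}), "
          f"q = {q_stat:.2f}, P = {p:.4f}")
```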
Section 9.3 Two-Factor Experiments
- In one-factor experiments, the purpose is to determine whether varying the level of a single factor affects the response.
- Many experiments involve varying several factors, each of which may affect the response.
- In this section, we will discuss the case in which there are two factors. The experiments are called two-factor experiments.
- If one factor is fixed and one is random, then we say that the experiment follows a mixed model.
- In the two-factor case, the tests vary depending on whether the experiment follows a fixed effects model, a random effects model, or a mixed model. Here we discuss methods for experiments that follow a fixed effects model.
Example
- A chemical engineer is studying the effects of various reagents and catalysts on the yield of a certain process.
- Yield is expressed as a percentage of a theoretical maximum.
- Four runs of the process were made for each combination of three reagents and four catalysts.
- In the experiment in Table 9.2, there are two factors, the catalyst and the reagent.
- The catalyst is called the row factor, since its value varies from row to row in the table.
- The reagent is called the column factor.
- We will refer to each combination of factors as a treatment (some call these treatment combinations).
- Recall that the units assigned to a given treatment are called replicates.
- When the number of replicates is the same for each treatment, we will denote this number by K.
- When observations are taken on every possible treatment, the design is called a complete design or a full factorial design.
Notes
- Incomplete designs, in which there are no data for one or more treatments, can be difficult to interpret.
- When possible, complete designs should be used.
- When the number of replicates is the same for each treatment, the design is said to be balanced.
- With two-factor experiments, unbalanced designs are much more difficult to analyze than balanced designs.
- The factors may be fixed or random.
Set-Up
- In a completely randomized design, each treatment represents a population, and the observations on that treatment are a simple random sample from that population.
- We will denote the sample values for the treatment corresponding to the ith level of the row factor and the jth level of the column factor by Xij1, …, XijK.
- We will denote the population mean outcome for this treatment by μij. The values μij are often called the treatment means.
- In general, the purpose of a two-factor experiment is to determine whether the treatment means are affected by varying either the row factor, the column factor, or both.
- The method of analysis appropriate for two-factor experiments is called the two-way analysis of variance.
Parameterization for Two-Way Analysis of Variance
- For any level i of the row factor, the average of all the treatment means μij in the ith row is denoted μ̄i.. We express μ̄i. in terms of the treatment means as follows: μ̄i. = (1/J) Σj μij.
- Similarly, for level j of the column factor, the average of all the treatment means μij in the jth column is denoted μ̄.j. We express μ̄.j in terms of the treatment means as follows: μ̄.j = (1/I) Σi μij.
- We define the population grand mean, denoted by μ, which represents the average of all the treatment means μij. The population grand mean can be expressed in terms of the previous means: μ = (1/I) Σi μ̄i. = (1/J) Σj μ̄.j = (1/IJ) Σi Σj μij.
More Notation
- Using the quantities we just defined, we can decompose the treatment mean μij as follows:
μij = μ + (μ̄i. - μ) + (μ̄.j - μ) + (μij - μ̄i. - μ̄.j + μ).
- This equation expresses the treatment mean μij as a sum of four terms. In practice, simpler notation is used for the three rightmost terms: αi = μ̄i. - μ, βj = μ̄.j - μ, and γij = μij - μ̄i. - μ̄.j + μ.
Interpretations
- The quantity μ is the population grand mean, which is the average of all the treatment means.
- The quantity αi is called the ith row effect. It is the difference between the average treatment mean for the ith level of the row factor and the population grand mean. The value of αi indicates the degree to which the ith level of the row factor tends to produce outcomes that are larger or smaller than the population mean.
- The quantity βj is called the jth column effect. It is the difference between the average treatment mean for the jth level of the column factor and the population grand mean. The value of βj indicates the degree to which the jth level of the column factor tends to produce outcomes that are larger or smaller than the population mean.
- The quantity γij is called the ij interaction. The effect of a level of a row (or column) factor may depend on which level of the column (or row) factor it is paired with. The interaction term measures the degree to which this occurs. For example, assume that level 1 of the row factor tends to produce a large outcome when paired with column level 1, but a small outcome when paired with column level 2. In this case γ11 would be positive, and γ12 would be negative.
More Set-Up
- Both row effects and column effects are called main effects to distinguish them from the interactions.
- Note that there are I row effects, one for each level of the row factor; J column effects, one for each level of the column factor; and IJ interactions, one for each treatment.
- Furthermore, based on the re-parameterization, the row effects, column effects, and interactions must satisfy the following constraints: Σi αi = 0, Σj βj = 0, and Σi γij = Σj γij = 0.
- So, now we can write μij = μ + αi + βj + γij.
- For each observation Xijk, define εijk = Xijk - μij, the difference between the observation and its treatment mean. The quantities εijk are called errors.
- It follows that Xijk = μij + εijk, so Xijk = μ + αi + βj + γij + εijk.
- When the interactions γij are all equal to zero, the additive model is said to apply. Under the additive model, Xijk = μ + αi + βj + εijk.
- When some or all of the interactions are not equal to zero, the additive model does not hold, and the combined effect of a row level and a column level cannot be determined from their individual main effects.
Statistics
- The cell means are given by X̄ij. = (1/K) Σk Xijk.
- The row means are given by X̄i.. = (1/JK) Σj Σk Xijk.
- The column means are given by X̄.j. = (1/IK) Σi Σk Xijk.
- The sample grand mean is given by X̄... = (1/IJK) Σi Σj Σk Xijk.
Estimating Effects
- We estimate the row effects by α̂i = X̄i.. - X̄...
- We estimate the column effects by β̂j = X̄.j. - X̄...
- We estimate the interactions by γ̂ij = X̄ij. - X̄i.. - X̄.j. + X̄...
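A minimal numpy sketch of these estimates, with hypothetical balanced data stored as an I × J × K array:

```python
import numpy as np

# Hypothetical balanced two-way data: I row levels, J column levels,
# K replicates per treatment.
rng = np.random.default_rng(0)
I, J, K = 3, 4, 2
X = 70 + rng.normal(size=(I, J, K))

cell_means = X.mean(axis=2)        # Xbar_ij.
row_means = X.mean(axis=(1, 2))    # Xbar_i..
col_means = X.mean(axis=(0, 2))    # Xbar_.j.
grand = X.mean()                   # Xbar_...

alpha_hat = row_means - grand      # estimated row effects
beta_hat = col_means - grand       # estimated column effects
gamma_hat = (cell_means - row_means[:, None]
             - col_means[None, :] + grand)   # estimated interactions

# The estimates obey the same constraints as the parameters.
assert np.isclose(alpha_hat.sum(), 0) and np.isclose(beta_hat.sum(), 0)
```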
Using the Two-Way ANOVA to Test Hypotheses
- A two-way ANOVA is designed to address three main questions:
- Does the additive model hold? We test the null hypothesis that all the interactions are equal to zero, H0: γ11 = γ12 = … = γIJ = 0. If this null hypothesis is true, the additive model holds.
- If so, is the mean outcome the same for all levels of the row factor? We test the null hypothesis that all the row effects are equal to zero, H0: α1 = α2 = … = αI = 0. If this null hypothesis is true, the mean outcome is the same for all levels of the row factor.
- If so, is the mean outcome the same for all levels of the column factor? We test the null hypothesis that all the column effects are equal to zero, H0: β1 = β2 = … = βJ = 0. If this null hypothesis is true, the mean outcome is the same for all levels of the column factor.
Assumptions
- The standard two-way ANOVA hypothesis tests are valid under the following conditions:
- The design must be complete.
- The design must be balanced.
- The number of replicates per treatment, K, must be at least 2.
- Within each treatment, the observations are a simple random sample from a normal population.
- The population variance is the same for all treatments. We denote this variance by σ².
- Notation: SSA is the sum of squares for the rows, SSB is the sum of squares for the columns, SSAB is the interaction sum of squares, and SSE is the error sum of squares. The sum of all of these is the total sum of squares (SST).
ANOVA Table
- Insert Table 9.5.
- Note that SST = SSA + SSB + SSAB + SSE.
Mean Squares
- The mean square for rows is MSA = SSA/(I - 1).
- The mean square for columns is MSB = SSB/(J - 1).
- The mean square for interaction is MSAB = SSAB/((I - 1)(J - 1)).
- The mean square for error is MSE = SSE/(IJ(K - 1)).
- The test statistics for the three null hypotheses are the quotients of MSA, MSB, and MSAB with MSE.
Test Statistics
- Under H0: α1 = α2 = … = αI = 0, the statistic MSA/MSE has an F_{I-1, IJ(K-1)} distribution.
- Under H0: β1 = β2 = … = βJ = 0, the statistic MSB/MSE has an F_{J-1, IJ(K-1)} distribution.
- Under H0: γ11 = γ12 = … = γIJ = 0, the statistic MSAB/MSE has an F_{(I-1)(J-1), IJ(K-1)} distribution.
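Continuing the two-way sketch (X, the effect estimates, and I, J, K from above), the sums of squares, mean squares, and F tests can be computed as:

```python
from scipy import stats

SSA = J * K * np.sum(alpha_hat ** 2)              # rows
SSB = I * K * np.sum(beta_hat ** 2)               # columns
SSAB = K * np.sum(gamma_hat ** 2)                 # interaction
SSE = np.sum((X - cell_means[:, :, None]) ** 2)   # error

MSE = SSE / (I * J * (K - 1))
df_err = I * J * (K - 1)
tests = [("rows (A)", SSA / (I - 1), I - 1),
         ("columns (B)", SSB / (J - 1), J - 1),
         ("interaction (AB)", SSAB / ((I - 1) * (J - 1)), (I - 1) * (J - 1))]
for name, ms, df in tests:
    F = ms / MSE
    print(f"{name}: F = {F:.3f}, P = {stats.f.sf(F, df, df_err):.4f}")
```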
Output
- Insert output on p.664.
- The labels DF, SS, MS, F, and P refer to degrees of freedom, sum of squares, mean square, F statistic, and P-value, respectively.
- The MSE is an estimate of the error variance.
Comments
- In a two-way analysis of variance, if the additive model is not rejected, then the hypothesis tests for the main effects can be used to determine whether the row and column factors affect the outcome.
- In a two-way analysis of variance, if the additive model is rejected, then the hypothesis tests for the main effects should not be used. Instead, the cell means must be examined to determine how various combinations of row and column levels affect the outcome.
- When there are two factors, a two-factor design must be used. Examining one factor at a time cannot reveal interactions between the factors.
Interaction Plots
- Interaction plots can help to visualize interactions.
- Insert Figure 9.8.
- In this figure, the lines are nowhere near parallel, indicating that there is substantial interaction between the factors.
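An interaction plot is just the cell means plotted against one factor, with one line per level of the other factor; a sketch with matplotlib, using the hypothetical two-way data from before:

```python
import matplotlib.pyplot as plt

# One line per row level; roughly parallel lines suggest little interaction.
for i in range(I):
    plt.plot(range(1, J + 1), cell_means[i], marker="o",
             label=f"row level {i + 1}")
plt.xlabel("Column factor level")
plt.ylabel("Cell mean")
plt.legend()
plt.show()
```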
Tukey's Method for Simultaneous Confidence Intervals
- Let I be the number of levels of the row factor, J be the number of levels of the column factor, and K be the sample size for each treatment. Then, if the additive model is plausible, the Tukey 100(1 - α)% simultaneous confidence intervals for all differences αi - αj are
X̄i.. - X̄j.. ± q_{I, IJ(K-1), α} √(MSE/(JK)),
and for all differences βi - βj are
X̄.i. - X̄.j. ± q_{J, IJ(K-1), α} √(MSE/(IK)).
- We are 100(1 - α)% confident that the Tukey confidence intervals contain the true value of the difference αi - αj (or βi - βj) for every i and j.
Tukey's Method for Simultaneous Hypothesis Tests
- For every pair of levels i and j for which
|X̄i.. - X̄j..| > q_{I, IJ(K-1), α} √(MSE/(JK)),
the null hypothesis H0: αi - αj = 0 is rejected at level α.
- For every pair of levels i and j for which
|X̄.i. - X̄.j.| > q_{J, IJ(K-1), α} √(MSE/(IK)),
the null hypothesis H0: βi - βj = 0 is rejected at level α.
Section 9.4 Randomized Complete Block Designs
- In some experiments, there are factors that vary and may have an effect on the response, but whose effects are not of interest to the experimenter.
- For example, imagine that there are three fertilizers to be evaluated for their effect on yield of fruit in an orange grove, and that three replicates will be performed, for a total of nine observations.
- An area is divided into nine plots, in three rows with three plots each.
- Now assume there is a water gradient along the plot area, so that the rows receive differing amounts of water.
- The amount of water is now a factor in the experiment, even though there is no interest in estimating the effect of water amount on the yield of oranges.
Continued
- If the water factor is ignored, then a one-factor experiment could be carried out with fertilizer as the only factor.
- If the amount of water in fact has a negligible effect on the response, then the completely randomized one-factor design is appropriate.
- Otherwise, different arrangements of the treatments bias the estimates in different directions. If the experiment is repeated several times, the estimates are likely to vary greatly from repetition to repetition.
- For this reason, the completely randomized one-factor design produces estimated effects that have large uncertainties.
- A better design is a two-factor design with water as the second factor. Since the effects of water are not of interest, water is called a blocking factor.
More Information
- In this design, we have treatments randomized within blocks.
- Since every possible combination of treatments and blocks is included in the experiment, the design is complete.
- So, the design is called a randomized complete block design.
- Randomized complete block designs can be constructed with several treatment factors and several blocking factors.
- The only effects of interest are the main effects of the treatment factor.
- In order to interpret these main effects, there must be no interaction between treatment and blocking factors.
- It is possible to construct confidence intervals and test hypotheses as before, but the only ones of interest are the ones that test the treatment effect.
Section 9.5 2^p Factorial Experiments
- When an experimenter wants to study several factors simultaneously, the number of different treatments can become quite large.
- In these cases, preliminary experiments are often performed in which each factor has only two levels.
- One level is designated as the "high" level and the other as the "low" level.
- If there are p factors, there are then 2^p different treatments.
- Such experiments are called 2^p factorial experiments.
2^3 Factorial Experiments
- Here there are three factors and 2^3 = 8 treatments.
- The main effect of a factor is defined to be the difference between the mean response when the factor is at its high level and the mean response when the factor is at its low level.
- There are three main effects, denoted by A, B, and C.
- There are three two-way interactions: AB, AC, and BC.
- There is one three-way interaction: ABC.
- The treatments are denoted with lowercase letters, with a letter indicating that the corresponding factor is at its high level; for example, ab denotes the treatment in which the first two factors are at their high level and the third is at its low level.
- The symbol "1" is used to denote the treatment in which all factors are at their low levels.
Estimating Effects
- A sign table is used to estimate main effects and interactions.
- For the main effects A, B, and C, the sign is + for treatments in which the factor is at its high level, and - for treatments in which it is at its low level.
- For the interactions, the signs are computed by taking the product of the signs in the corresponding main effects columns.
- For example, the estimated mean response for A at its high level is (X̄a + X̄ab + X̄ac + X̄abc)/4, where X̄ denotes the cell mean of the indicated treatment.
Contrasts
- The estimate of the main effect of A is the difference in the estimated mean response between its high and low levels. So, the A effect estimate is
(1/4)(X̄a + X̄ab + X̄ac + X̄abc - X̄1 - X̄b - X̄c - X̄bc).
- The quantity in the parentheses is called the contrast for factor A.
- The contrast for any main effect or interaction is obtained by adding and subtracting the cell means, using the signs in the appropriate column of the sign table. For a 2^3 factorial experiment, effect estimate = contrast/4.
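A sketch of the sign table and the resulting effect estimates, with hypothetical cell means for the eight treatments in standard order (1, a, b, ab, c, ac, bc, abc):

```python
import numpy as np

# Main-effect sign columns: +1 where the factor is high, -1 where it is low.
A = np.array([-1, 1, -1, 1, -1, 1, -1, 1])
B = np.array([-1, -1, 1, 1, -1, -1, 1, 1])
C = np.array([-1, -1, -1, -1, 1, 1, 1, 1])

# Interaction columns are elementwise products of the main-effect columns.
signs = {"A": A, "B": B, "C": C,
         "AB": A * B, "AC": A * C, "BC": B * C, "ABC": A * B * C}

# Hypothetical cell means for treatments 1, a, b, ab, c, ac, bc, abc.
cell_means = np.array([70.1, 74.3, 69.8, 75.0, 71.2, 75.9, 70.4, 76.1])

for effect, col in signs.items():
    contrast = np.sum(col * cell_means)       # add/subtract cell means by sign
    print(f"{effect}: estimate = {contrast / 4:.3f}")   # effect = contrast / 4
```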
Output
- Looking at the output on p.695, we see:
- The column Effect gives the estimate for each of the main effects and interactions.
- A t test is performed, with the t statistics given in the column T.
- The P-values for these tests are given in the column P. From these, we can determine which main effects and interactions are important.
- In the ANOVA table, the second row is the test for whether or not the main effects are all equal to zero.
- The third row is the test for whether or not the two-way interactions are all equal to zero.
- The fourth row is the test for whether or not the three-way interaction is equal to zero.
Using Probability Plots to Detect Large Effects
- An informal method has been suggested to help determine which effects are large.
- The method is to plot the effect and interaction estimates on a normal probability plot.
- If in fact none of the factors affect the outcome, then the effect and interaction estimates form a simple random sample from a normal population and should lie approximately on a straight line.
- The main effects and interactions whose estimates plot far from the line are the ones most likely to be important.
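Continuing the 2^3 sketch above, scipy.stats.probplot produces the normal probability plot of the seven effect estimates; points far from the line flag effects that are likely real:

```python
from scipy import stats
import matplotlib.pyplot as plt

estimates = [np.sum(col * cell_means) / 4 for col in signs.values()]
stats.probplot(estimates, plot=plt)
plt.title("Normal probability plot of effect estimates")
plt.show()
```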
Summary
- We have discussed:
- One-factor experiments
- Pairwise comparisons
- Multiple comparisons
- Two-factor experiments
- Randomized complete block designs
- 2^p factorial experiments