Slide 1: STATISTICS for the Utterly Confused, 2nd ed.
- Slides prepared by Lloyd R. Jaisingh, Ph.D.
- Morehead State University, Morehead, KY

Slide 2: Chapter 15
- One-Way Analysis of Variance
Slide 3: Outline
- Do I Need to Read This Chapter? You should read the chapter if you would like to learn about:
- 15-1 Comparing Population Means Graphically
- 15-2 Some Terminology Associated with Analysis of Variance (ANOVA)
- 15-3 The F Distribution

Slide 4: Outline
- 15-4 One-Way or Single-Factor ANOVA F Tests
- 15-5 Technology Integration for One-Way ANOVA

Slide 5: Objectives
- To graphically compare more than two population means.
- To introduce some terminology associated with Analysis of Variance (ANOVA).

Slide 6: Objectives
- To introduce one-way or single-factor ANOVA F tests.
- To introduce technology integration for one-way ANOVA.
Slide 7: 15-1 Comparing Population Means Graphically
- The objective in comparing several population means is to determine whether there is a statistically significant difference between them.
- When random samples are obtained from these populations, the respective sample means can be computed to help determine whether there is a significant difference between the population means.

Slide 8: 15-1 Comparing Population Means Graphically
- If the sample means are very different, then it is likely that the true (population) means are different.
- We need to determine whether the differences are due to random variation in the sample data or whether there really are differences between the population means.
Slide 9: 15-1 Comparing Population Means Graphically
- One simple way of looking at differences of population means is to display the data through box plots.
- Example: A random sample of students on a college campus was asked to count the number of pennies, nickels, dimes, and quarters they had on their person. The summary information is shown on the next slide.

Slide 10: 15-1 Comparing Population Means Graphically
Note: We can consider each of the four data sets as samples from their respective populations.
Slide 11: 15-1 Comparing Population Means Graphically
- Example (continued): Compute the sample means and display the data using box plots.
- Solution: The sample means for the pennies, nickels, dimes, and quarters are, respectively, 10.36, 4.444, 3.714, and 3.25.
- Observe that the average number of pennies seems to be an outlying value relative to the values of the other means.

Slide 12: 15-1 Comparing Population Means Graphically
- Solution (continued): The box plots can give some insight as to whether these differences are significant. Observe that the box in the box plot for the number of pennies does not overlap with the boxes in the plots for the numbers of nickels, dimes, and quarters.
Slide 13: 15-1 Comparing Population Means Graphically

Slide 14: 15-1 Comparing Population Means Graphically
- Solution (continued): Another way to observe this difference is to compute the one-sample confidence intervals for the means.
- The confidence intervals for the numbers of coins are: pennies (7.5091, 13.2182); nickels (2.95042, 5.93847); dimes (1.40437, 6.02420); quarters (1.85464, 4.64536).
- From the confidence intervals, we see that the interval for the number of pennies does not overlap with the others.

Slide 15: 15-1 Comparing Population Means Graphically
From the confidence intervals, we see that the interval for the number of pennies does not overlap with the others.
Slide 16: 15-1 Comparing Population Means Graphically
- Solution (continued): Both of these observations indicate that the population average for the number of pennies carried by the students is significantly different from those for the other denominations of coins.
- Thus, it is unlikely that the difference in the sample averages is due to sampling variation.
- That is, we can say that the variability between the sample averages is large compared to the variability within the samples.

Slide 17: 15-1 Comparing Population Means Graphically
- Solution (continued): Based on the above discussion, we can safely say that there is no significant difference between the averages for the populations of the numbers of nickels, dimes, and quarters.
- This is further reinforced by observing that both the box plots and the confidence intervals overlap for these variables.
Slide 18: 15-1 Comparing Population Means Graphically
- Example: The figure on the next slide shows random samples obtained from three different normal distributions. Discuss whether you think that the means of these populations are significantly different based on the samples.

Slide 19: 15-1 Comparing Population Means Graphically
Samples from normal populations with almost equal means.

Slide 20: 15-1 Comparing Population Means Graphically
- Solution: Based on the display, we would expect the sample means to be nearly equal.
- We would also expect the variation among the sample means (between samples) to be small relative to the variation around the individual sample means (within samples).
- Thus one may infer that there would not be a significant difference between the population means.
Slide 21: 15-1 Comparing Population Means Graphically
- Example: The figure on the next slide shows random samples obtained from three different normal distributions. Discuss whether you think that the means of these populations are significantly different based on the samples.

Slide 22: 15-1 Comparing Population Means Graphically
Samples from normal populations with significantly different means.

Slide 23: 15-1 Comparing Population Means Graphically
- Solution: Based on the display, we would expect the sample means to be significantly different.
- We would also expect the variation among the sample means (between samples) to be large relative to the variation around the individual sample means (within samples).
- Thus one may infer that there would be a significant difference between the population means.
Slide 24: 15-1 Comparing Population Means Graphically
- These three examples provide us with a sense of whether or not there is a significant difference among the population means.
- However, they cannot help us evaluate how likely it is that any observed difference is due to sampling variation in the data.

Slide 25: 15-1 Comparing Population Means Graphically
- In this chapter, we will present procedures that will help us determine how likely it is that the observed differences among the sample means are due to sampling error. Such procedures are called ANalysis Of VAriance (ANOVA).
Slide 26: 15-2 Some Terminology Associated with ANOVA
- Explanation of the term ANOVA: ANOVA is a statistical method for determining the existence of differences among several population means.
- Explanation of the term experiment: The term experiment in ANOVA refers to a statement of the problem to be solved.

Slide 27: 15-2 Some Terminology Associated with ANOVA
- Example: A researcher would like to determine whether there is a difference in the average mileages for three different brands of gasoline. What is the experiment in this case?
- Solution: The problem to be solved in this example is to determine whether there is a difference in the average mileage for the three different brands of gasoline. Hence, this is the experiment.
Slide 28: 15-2 Some Terminology Associated with ANOVA
- Explanation of the term experimental units: Individuals or objects on which the experiment is performed are called experimental units.
- Explanation of the term response variable: A response variable in an experiment is a characteristic of an experimental unit on which information is to be obtained.

Slide 29: 15-2 Some Terminology Associated with ANOVA
- Example: Suppose a researcher is interested in determining the effectiveness of four teaching methods for a given course. In such an experiment, the researcher would be interested in the final averages for each student in the course under the different methods of teaching.
- Here we refer to the students as the experimental units and the final averages as the values of the response variable.
Slide 30: 15-2 Some Terminology Associated with ANOVA
- Note: The response variable may be qualitative, such as whether or not you suffer from migraine headaches, or quantitative, such as the time it takes for your migraine to subside from a certain pain level.
- Experimental variables that we can control are called independent variables or factors. Values of a factor are called levels of the factor.
- Note: Factors may be qualitative or quantitative.

Slide 31: 15-2 Some Terminology Associated with ANOVA
- Explanation of the term qualitative factor: A qualitative factor is a factor whose levels vary by category rather than by numerical value.
- Explanation of the term quantitative factor: A quantitative factor is a factor whose levels are counts or measurements.
Slide 32: 15-2 Some Terminology Associated with ANOVA
- Note: Sometimes the word treatment is used interchangeably with the term level, or the two may be combined as treatment level.

Slide 33: 15-2 Some Terminology Associated with ANOVA
- Explanation of the term treatment (level) of a factor: An experimental condition that is applied to the experimental units is called a treatment (level) of the factor.

Slide 34: 15-2 Some Terminology Associated with ANOVA
- Note: The term treatment can also refer to the populations being analyzed. For example, if we are comparing the average income for four different counties in a particular state, we may refer to the four populations (counties) as four treatments.
Slide 35: 15-2 Some Terminology Associated with ANOVA
- Example: A farmer would like to determine whether there is a difference in the average yield per acre for his corn crop for equal amounts of five different fertilizers. In this experiment, assume that there were equal numbers of corn plants per acre. Identify the factor, treatment levels, experimental units, and response variable of the experiment.

Slide 36: 15-2 Some Terminology Associated with ANOVA
- Solution:
- Factor: fertilizer.
- Treatment levels: the five different fertilizers.
- Experimental units: the corn plants.
- Response variable: yield per acre.
- Note: So far, all the examples deal with a single factor. In this text, we will restrict our discussion to single-factor analyses; that is, we will only discuss one-factor or one-way ANOVA.
Slide 37: 15-2 Some Terminology Associated with ANOVA
- Explanation of the term one-factor or one-way ANOVA: A one-factor or one-way ANOVA deals with experiments that involve a single factor with different levels. These levels can be quantitative or qualitative.

Slide 38: 15-3 The Hypothesis Test of One-Way ANOVA
- Suppose we have a single-factor experiment in which there are r levels.
- Thus we will be sampling from r populations or treatments.
- We will select an independent random sample from each of these r populations.
- Let the size of the sample from population i, for i = 1, 2, 3, ..., r, be equal to n_i, so that the total sample size is n = n_1 + n_2 + n_3 + ... + n_r.
Slide 39: 15-3 The Hypothesis Test of One-Way ANOVA
- The figure on the next slide shows the r populations from which the independent samples are selected.
- Observe that each population has its own mean, and the respective sample means are computed for the samples.
- Also indicated in the figure are the respective sample sizes.

Slide 40: 15-3 The Hypothesis Test of One-Way ANOVA

Slide 41: 15-3 The Hypotheses Associated with the One-Way ANOVA
Slide 42: QUICK TIPS
- When using the ANOVA technique to test for equality of population means, we usually want r > 2.
- If r = 2, we can use the simpler two-sample t test.
- The null hypothesis is called a joint hypothesis about the equality of several population means (parameters).

Slide 43: QUICK TIPS
- It would not be efficient to compare two population means at a time to achieve what the ANOVA test achieves.
- If we test two population means at a time, we will not be sure of the combined probability of a Type I error for all the tests.
- By using the ANOVA technique to compare several population means at the same time, we retain control of the probability of a Type I error.
Slide 44: Assumptions for the One-Way ANOVA
- The required assumptions of a one-way ANOVA are:
- The random samples from the r populations are independent.
- The r random samples are assumed to be selected from normal populations whose means may or may not be equal, but whose variances are all equal to σ².

Slide 45: Assumptions for the One-Way ANOVA
Three normally distributed populations with different means but with equal variance.
Slide 46: Validating the Assumptions for the One-Way ANOVA
- The assumptions for ANOVA should be validated before any inference is made on the population means.
- If these assumptions are not met, then the inference on the population means may not be reliable.
- The assumptions are necessary in order for the test statistic used in the analysis to follow a certain probability distribution (discussed in the next section).

Slide 47: Validating the Assumptions for the One-Way ANOVA
- If the populations are not exactly normally distributed but are approximately normally distributed, then the ANOVA procedure will still produce reliable results.
- If the distributions are highly skewed or very different from a normal distribution, or the population variances are not equal or approximately equal, then ANOVA will not produce reliable results.
- In such cases, other tests, such as equivalent nonparametric tests, should be employed.
Slide 48: Assumptions for the One-Way ANOVA
- Two simple graphical techniques can be used to check the assumptions for a one-way ANOVA.
- We can use histograms with summary statistics to help assess the normality assumption, and we can use box plots to help assess the equal-variance assumption.
Slide 49: EXAMPLE
- Example: Equal dosages of three drugs were used to ease a certain level of headache. Drug 1 was administered to ten patients, and Drugs 2 and 3 were administered to nine patients each. The times, in minutes, to complete relief of the headache for the drugs are given on the next slide.

Slide 50: EXAMPLE (continued)
Before we use ANOVA to determine whether the average relief times for the three drugs are the same, the validity of the ANOVA assumptions should be checked. This must be done so that the inference made on the population means will be reliable.
Slide 51: EXAMPLE (continued)
- Before we use ANOVA to determine whether the average relief times for the three drugs are the same, the validity of the ANOVA assumptions should be checked.
- This must be done so that the inference made on the population means will be reliable.

Slide 52: EXAMPLE (continued)
- Graphical displays will be used to help check the validity of the ANOVA assumptions.
- From the histograms on the next slide, one can observe that the histograms for Drug 1, Drug 2, and Drug 3 can all be approximated by a normal distribution.
- Thus the assumption of normal populations has not been violated, or at least not severely violated.

Slide 53: EXAMPLE (continued)
Slide 54: EXAMPLE (continued)
- Next, let us determine whether the equal-variance assumption has been violated.
- By looking at the box plots on the next slide, one can observe that the spreads for the data sets are not the same.
- However, the spreads are similar enough for one to infer that the observed difference in spread is likely due to sampling variation.

Slide 55: EXAMPLE (continued)
- Thus, one may assume that the equal-variance assumption has not been violated.
- Here, "similar enough" means that the range of values is approximately the same for the data sets. Also, the ranges of the middle fifty percent (the lengths of the boxes) for the different data sets are approximately the same.

Slide 56: EXAMPLE (continued)
Slide 57: EXAMPLE (continued)
- Since neither the normality assumption nor the constant-variance assumption has been violated (or severely violated), one can now proceed to test for equality of the population means using the analysis of variance procedure.

Slide 58: 15-4 The Test Statistic and the F Distribution
- The F distribution will enable us to statistically compare several (at least three) population means through the ANOVA procedure.
- The F distribution is obtained by taking the ratio of two independent chi-square random variables, each divided by its degrees of freedom, and thus has a numerator as well as a denominator degrees of freedom associated with it.
Slide 59: 15-4 The Test Statistic and the F Distribution
- The numerator degrees of freedom is r - 1 and the denominator degrees of freedom is n - r, where r is the number of populations or treatments and n is the combined sample size from these r populations (the total number of data values).
Slide 60: EXAMPLE (Data on Slide 10)
- Example: What are the numerator and denominator degrees of freedom if a one-way analysis of variance were run on the data?
- Solution: From the information given on Slide 10, the number of populations is r = 4 and the combined sample size is n = 10 + 9 + 7 + 8 = 34. Thus, the numerator degrees of freedom is r - 1 = 4 - 1 = 3, and the denominator degrees of freedom is n - r = 34 - 4 = 30.
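The degrees-of-freedom bookkeeping above can be sketched in a few lines of Python; the group sizes 10, 9, 7, and 8 are the ones quoted from Slide 10:

```python
# Degrees of freedom for a one-way ANOVA, using the Slide 10 group sizes.
sample_sizes = [10, 9, 7, 8]   # pennies, nickels, dimes, quarters

r = len(sample_sizes)          # number of populations (treatments)
n = sum(sample_sizes)          # combined sample size

df_numerator = r - 1           # 4 - 1 = 3
df_denominator = n - r         # 34 - 4 = 30

print(df_numerator, df_denominator)  # 3 30
```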
Slide 61: A NOTE
- The formulas associated with the computations are complex, and it is time-consuming to carry out the calculations by hand.
- Thus, computational technology is indispensable in most situations involving ANOVA.
- Extensive use of technology will be integrated into the computations in these notes. We will assume that the appropriate technology is available to compute the F test statistic value for us.

Slide 62: 15-4 The Test Statistic and the F Distribution
- The one-way ANOVA test statistic is given by the ratio of the between-samples mean square to the within-samples mean square, F = MSTR / MSE.
Slide 63: 15-4 The Test Statistic and the F Distribution
- We then compare this F test statistic value with a critical F value from a table with r - 1 and n - r degrees of freedom at a given level of significance α.

Slide 64: 15-4 The Test Statistic and the F Distribution
- The general decision rule for rejecting the null hypothesis H0: μ1 = μ2 = μ3 = ... = μr at a given significance level α is to reject H0 when the F test statistic value exceeds the critical value F(r - 1, n - r, α).

Slide 65: 15-4 The Test Statistic and the F Distribution
- A general critical (rejection) region for the F test is shown below.
Note: If we use the P-value approach to hypothesis testing for the one-way ANOVA, we reject H0 if the P-value < α.
Slide 66: EXAMPLE
- Example: For the data given on Slide 10, suppose an F test was conducted at the 5 percent significance level to determine whether there was a significant difference between the average numbers of pennies, nickels, dimes, and quarters.
- What will be the F critical value for the test?

Slide 67: EXAMPLE (continued)
- Solution: From the information given, r = 4, n = 34, and α = 0.05. Since the numerator degrees of freedom is r - 1, this value will be 4 - 1 = 3. Also, since the denominator degrees of freedom is n - r, this value will be 34 - 4 = 30.
- From the F table in the Appendix of the text, we have F(3, 30, 0.05) = 2.92.
- Thus, the F critical value for the test will be 2.92.
Slide 68: EXAMPLE (continued)
- At this juncture we may also use an appropriate form of technology to help with the solution.
- We will apply the MINITAB software to help find the F critical value.
- We use the Inverse Cumulative Distribution Function feature for the F distribution in MINITAB to determine the F critical value.
- The result is shown on the next slide.
Slide 69: EXAMPLE (continued)
F critical value for the test using MINITAB.

Slide 70: EXAMPLE
- Example: For the data given on Slide 50, suppose an F test was conducted at the 1 percent significance level to determine whether there was a significant difference between the average times for the headache to subside for the different drugs.
- What will be the F critical value for the test?
Slide 71: EXAMPLE (continued)
- Solution: From the information given, r = 3, n = 28, and α = 0.01. Since the numerator degrees of freedom is r - 1, this value will be 3 - 1 = 2. Also, since the denominator degrees of freedom is n - r, this value will be 28 - 3 = 25.
- From the F table in the Appendix of the text, we have F(2, 25, 0.01) = 5.45.
- Thus, the F critical value for the test will be 5.45.

Slide 72: EXAMPLE (continued)
F critical value for the test using MINITAB.
Slide 73: 15-5 One-Way or Single-Factor ANOVA Tests
- So far, in all the previous examples, we had a single factor with different levels of the treatments.
- In this section, we will present the F test for these single-factor experiments.
- We sometimes refer to this single-factor F test as the one-way ANOVA F test.

Slide 74: Summary of the One-Way ANOVA Hypothesis F Test Using the Classical Approach
Slide 75: EXAMPLE
- Example: Perform a one-way ANOVA F test for the information given on Slide 10. That is, test whether there is a significant difference in the population averages for the numbers of pennies, nickels, dimes, and quarters for the student population at that particular campus. Test at a significance level of 0.05 and use the classical approach to hypothesis testing.

Slide 76: EXAMPLE (continued)
- Solution: As mentioned earlier, because of the complexity of the formulas for the F test, appropriate technology will be integrated into the solution of these problems.
- The MINITAB statistical software was used for the computations, and the output is shown on the next slide.
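MINITAB is not the only option for this computation. As a hedged illustration of the same one-way ANOVA F test in Python, SciPy's f_oneway returns the F statistic and P-value directly; note the four lists below are hypothetical stand-ins chosen for the sketch, not the actual Slide 10 data:

```python
from scipy.stats import f_oneway

# Hypothetical coin counts, for illustration only -- the real data are on Slide 10.
pennies  = [12, 8, 15, 9, 11, 10, 14, 7, 9, 12]
nickels  = [5, 4, 6, 3, 5, 4, 5, 4, 4]
dimes    = [4, 3, 5, 2, 4, 3, 5]
quarters = [3, 4, 2, 3, 4, 3, 2, 5]

# One-way ANOVA F test: H0 is that all four population means are equal.
f_stat, p_value = f_oneway(pennies, nickels, dimes, quarters)
print(f_stat, p_value)
```

With group means this far apart relative to the within-group spread, the test rejects H0, mirroring the conclusion drawn from the MINITAB output.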
Slide 77: EXAMPLE (continued)

Slide 78: EXAMPLE (continued)
- Note: From the MINITAB output:
- The numerator degrees of freedom is the Factor degrees of freedom (DF).
- The denominator degrees of freedom is the Error degrees of freedom.

Slide 79: EXAMPLE (continued)
- Solution: Observe that the F test statistic value from the output is 12.04.
- The numerator degrees of freedom is r - 1 = 4 - 1 = 3, and the denominator degrees of freedom is n - r = 34 - 4 = 30.
- Thus, the F critical value obtained from the F table is F(3, 30, 0.05) = 2.92.
Slide 80: EXAMPLE (continued)

Slide 81: EXAMPLE (continued): Critical Region

Slide 82: From the MINITAB OUTPUT
- The source due to Factor is the between-samples contribution.
- That is, it reflects the variability among the four sample means.
- This is associated with the numerator of the F test statistic.

Slide 83: From the MINITAB OUTPUT
- The source due to Error is the within-samples contribution.
- That is, it reflects the variability within the samples, pooled over all the sample data.
- This is associated with the denominator of the F test statistic.
Slide 84: From the MINITAB OUTPUT
- The first part of the MINITAB output (i.e., excluding the confidence intervals) is usually referred to as the one-way ANOVA table.
- It contains information on the factor (between-samples information) and the error (within-samples information).

Slide 85: MULTIPLE COMPARISONS
- Since the null hypothesis was rejected and we concluded that there is a significant difference between the population averages, the question is: which means are different from which?
- We can use multiple comparisons to answer this question.
Slide 86: MULTIPLE COMPARISONS
- One way to determine which population means are significantly different from which is to compute confidence intervals using the sample information.
- The MINITAB output on Slide 77 shows plots of the 95 percent confidence intervals.

Slide 87: MULTIPLE COMPARISONS
- Observe that the confidence intervals for the average numbers of nickels, dimes, and quarters all overlap.
- This indicates that there is not a significant difference between these averages.

Slide 88: MULTIPLE COMPARISONS
- On the other hand, the confidence interval for the average number of pennies does not overlap with any of the other confidence intervals.
- This indicates that the average number of pennies is significantly different from the average numbers of nickels, dimes, and quarters.
Slide 89: MULTIPLE COMPARISONS
- In particular, since the confidence interval for the average number of pennies lies to the right of the other intervals, one can conclude that this population average is significantly greater than the other population means.

Slide 90: Generally, when performing a one-way ANOVA, you should follow this procedure.

Slide 91: Using the P-value Approach to a One-Way ANOVA Hypothesis Test
Refer to Slide 77 for the P-value.
Slide 92: TI-83 SOLUTION
- Use the TI-83 to help with the computations for the coin example.
- Input the values for pennies, nickels, dimes, and quarters in lists L1, L2, L3, and L4, respectively.
- Select the STAT button and choose TESTS. Scroll down to F:ANOVA( and press ENTER.

Slide 93: TI-83 SOLUTION
- Input the lists L1, L2, L3, L4 and press ENTER.
- The one-way ANOVA computations will be displayed.
- You will need to scroll down to view all of the output. The output is shown on two screens on the next slide.

Slide 94: TI-83 SOLUTION
Observe that the F test statistic value (12.04, to two decimal places) is the same as that produced in the MINITAB output. Also, the P-value produced by the TI-83 is given as P = 2.4089811E-5 ≈ 0.00002 ≈ 0, just as in the MINITAB output.
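As a cross-check on the TI-83 and MINITAB results (again assuming SciPy is available), the P-value can be recovered from the F test statistic and its degrees of freedom with the F distribution's survival function, which gives the right-tail area:

```python
from scipy.stats import f

# Right-tail probability of the F(3, 30) distribution beyond the
# observed test statistic F = 12.04 (the rounded value from the output).
p_value = f.sf(12.04, dfn=3, dfd=30)

print(p_value)  # on the order of 2.4e-05, in line with the TI-83 output
```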
Slide 95: Validating the Assumptions for a One-Way ANOVA (Revisited)
- When validating the one-way ANOVA assumptions previously, we used box plots to help check the constant-variance assumption and histograms to help check the normality assumption.
- Because of the computer and readily available statistical software, it is easy to check these assumptions.
- Following are two MINITAB outputs which we can analyze to help establish these assumptions.

Slide 96: Normality Assumption
- We can use MINITAB (or other technologies) to present a normality plot for the data and observe the P-value for the normality test. The normality test is used to test:
- H0: The distribution from which the sample was drawn is normal
- H1: The distribution from which the sample was drawn is not normal
Slide 97: Normality Assumption

Slide 98: Normality Assumption
- Observe from the probability plots that all the P-values are large (P-value > 0.05).
- Hence the null hypothesis of normality for the sampled populations will not be rejected.
- Hence the normality assumption has not been violated.

Slide 99: Constant Variance Assumption
- Recall that in the ANOVA MINITAB display, 95 percent confidence intervals for the means were displayed.
- We can similarly use MINITAB (or other appropriate technologies) to construct confidence intervals for the standard deviations.
Slide 100: Constant Variance Assumption

Slide 101: Constant Variance Assumption
- Observe that the intervals for the standard deviations overlap, and hence one can assume that the constant-variance assumption has not been violated.
- P-values (0.026 and 0.055) for two separate tests for equal variance are displayed on the output.

Slide 102: Constant Variance Assumption
- The small P-value of 0.026 is due to the fact that the point estimate of the standard deviation for the number of pennies falls just outside the upper limit of the interval for the number of quarters.
- Levene's test is not as sensitive as Bartlett's test and gives a P-value of 0.055 (> 0.05).
Slide 103: Constant Variance Assumption
- So, on the basis of statistical significance, using the P-value for Bartlett's test, one would conclude that the constant-variance assumption has been violated.
- However, since the intervals overlap and the P-value for Levene's test is greater than 0.05, one may conclude, on the basis of practical significance, that the assumption has not been severely violated.

Slide 104: Assumption of Independence
- The test for independence is beyond the scope of this text (see the section Beyond the Text), so we will assume that the data were collected in an independent manner.
Slide 105: FINAL COMMENT
In a more advanced approach to one-way ANOVA, a model is usually presented and the assumption tests are performed on the errors (residuals) for the model.

Slide 106: Beyond the Text
- The mathematical model for a one-way ANOVA experiment may be written as follows:
- Y_ij = μ + τ_j + ε_ij
- where Y_ij represents the response value for the ith row and jth column, μ is an overall mean value, τ_j is the treatment-effect contribution to the response value, and ε_ij is the error in the observed response value.
Slide 107: Beyond the Text
- In the mathematical model, it is assumed that the errors (ε_ij) are independent and normally distributed with a variance equal to the constant variance of the populations.

Slide 108: Beyond the Text
- We will use the MINITAB technology to test these assumptions.
- Other software may be used as well (e.g., SPSS, SAS, etc.).

Slide 109: EXAMPLE
- Example: For the data given on Slide 50 for the time to complete relief of the headache for the three different drugs, perform a one-way ANOVA for the data and validate the assumptions for the model.
Slide 110: EXAMPLE (continued)
The data may be entered into a MINITAB worksheet as shown on the left. The procedure in MINITAB is Stat > ANOVA > One-Way. The resulting dialog box, with appropriate entries, is shown on the next slide.

Slide 111: EXAMPLE (continued)
When the OK button is selected on the dialog box, the results will be displayed in the session window, as shown on the next slide.

Slide 112: EXAMPLE (continued)
Slide 113: EXAMPLE (continued)
- Observe that the P-value for the test is 0.045.
- If we test for equality of the mean times for the headache to subside for the three different drugs at the 5% significance level, we will reject the null hypothesis, since 0.045 < 0.05.

Slide 114: EXAMPLE (continued): Hypothesis Test
- H0: μdrug1 = μdrug2 = μdrug3
- H1: Not all the means are the same.
- T.S.: P-value = 0.045
- D.R.: Reject the null hypothesis if the P-value (0.045) < the significance level (0.05).

Slide 115: EXAMPLE (continued): Hypothesis Test
- Conclusion: Since 0.045 < 0.05, reject the null hypothesis. That is, the average times for the headache to subside for the different drugs are different at the 5% significance level.
Slide 116: EXAMPLE (continued): Multiple Comparisons
- Since we rejected the null hypothesis and concluded that the means are different, the question is: which mean is different from which?
- The confidence interval plots on Slide 112 allow us to answer this question. These are shown on the next slide.

Slide 117: EXAMPLE (continued): Multiple Comparisons
- From the plots, one can observe that although the intervals overlap, the average for Drug 2 falls outside the intervals for Drugs 1 and 3.

Slide 118: EXAMPLE (continued): Multiple Comparisons
- We can infer from the plots that the average time for the headache to subside for Drug 2 is smaller than those for Drugs 1 and 3.
- Also, we can infer that the averages for Drugs 1 and 3 are not significantly different, since their intervals overlap for almost the entire range.
Slide 119: EXAMPLE (continued): Validating the Assumptions for the One-Way ANOVA
- Recall, the assumptions are:
- The errors (or residuals) are independent of each other.
- The errors (or residuals) are normally distributed.
- The variances are equal (constant variance) for the sampled populations.

Slide 120: EXAMPLE (continued): Validating the Normality Assumption
- In the dialog box on Slide 111, observe that the Store residuals (errors) option was checked. This allows MINITAB to compute the errors and store them in the worksheet.
Slide 121: EXAMPLE (continued): Validating the Normality Assumption
- MINITAB allows us to do an Anderson-Darling goodness-of-fit test for normality.
- For this test, H0: The distribution of the residuals is normal, against H1: The distribution of the residuals is not normal.
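The same Anderson-Darling normality test is available in Python via SciPy; the sketch below uses hypothetical residuals for illustration, since the actual residuals live in the MINITAB worksheet:

```python
from scipy.stats import anderson

# Hypothetical residuals for illustration only; the real ones are
# the stored residuals from the MINITAB one-way ANOVA.
residuals = [0.5, -1.2, 0.3, 0.8, -0.4, 1.1, -0.7, 0.2, -0.6, 0.1]

result = anderson(residuals, dist='norm')
print(result.statistic)        # the A-D test statistic
print(result.critical_values)  # critical values at several significance levels
# Reject normality at a given level when the statistic exceeds
# that level's critical value.
```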
Slide 122: EXAMPLE (continued): Validating the Normality Assumption
- The MINITAB procedure for the Anderson-Darling goodness-of-fit test is Stat > Basic Statistics > Normality Test.
- The dialog box with the appropriate entries is shown on the next slide.

Slide 123: EXAMPLE (continued): Validating the Normality Assumption

Slide 124: EXAMPLE (continued): Validating the Normality Assumption

Slide 125: EXAMPLE (continued): Validating the Normality Assumption
- The P-value for the normality test is given as 0.938.
- Thus the null hypothesis of normality will not be rejected.
- Thus, one may infer that the normality assumption has not been violated.
- Note: The plotted points follow a straight line.
Slide 126: EXAMPLE (continued): Validating the Constant Variance Assumption
- Again, MINITAB allows us to test the constant-variance assumption.
- The procedure is Stat > ANOVA > Test for Equal Variances.
- The dialog box with the appropriate entries is shown on the next slide.

Slide 127: EXAMPLE (continued): Validating the Constant Variance Assumption

Slide 128: EXAMPLE (continued): Validating the Constant Variance Assumption
- Click on the OK button, and the test for equal variances will be displayed, along with confidence interval plots for the population standard deviations.
- Observe that there are two tests: Bartlett's and Levene's.
Slide 129: EXAMPLE (continued): Validating the Constant Variance Assumption
- Bartlett's test is used when the samples (errors) are selected from normal distributions.
- Levene's test is used when the samples (errors) are selected from continuous distributions.

Slide 130: EXAMPLE (continued): Validating the Constant Variance Assumption
- Since the errors were established to be normally distributed, we will use Bartlett's test.
- H0: The sampled populations have equal variances, against H1: The sampled populations do not have equal variances.
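Both Bartlett's and Levene's tests are also available in SciPy; as a hedged sketch, the three groups below are hypothetical relief-time samples standing in for the actual Slide 50 data:

```python
from scipy.stats import bartlett, levene

# Hypothetical relief times (minutes) for three drugs -- illustration only.
drug1 = [35, 40, 38, 42, 37, 39, 41, 36, 40, 38]
drug2 = [28, 30, 27, 31, 29, 28, 30, 29, 27]
drug3 = [36, 39, 37, 40, 38, 36, 39, 37, 38]

# H0 for both tests: the sampled populations have equal variances.
stat_b, p_bartlett = bartlett(drug1, drug2, drug3)
stat_l, p_levene = levene(drug1, drug2, drug3)
print(p_bartlett, p_levene)
```

As the slides note, Bartlett's test assumes the samples come from normal populations, while Levene's test is less sensitive to departures from normality.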
Slide 131: EXAMPLE (continued): Validating the Constant Variance Assumption
- The results are shown on the next slide.
- The P-value for Bartlett's test is 0.693.
- Hence the null hypothesis will not be rejected, and one can conclude that the constant-variance assumption has not been violated.

Slide 132: EXAMPLE (continued): Validating the Constant Variance Assumption
- Observe that the confidence intervals for the standard deviations all overlap, which supports the conclusion of Bartlett's test.

Slide 133: EXAMPLE (continued): Validating the Constant Variance Assumption
Slide 134: EXAMPLE (continued): Validating the Independence Assumption
- The independence assumption can be checked using the lag-1 autocorrelation function (ACF).
- An autocorrelation coefficient, which indicates how the errors (residuals) are correlated with themselves, is often used to investigate the independence assumption.

Slide 135: EXAMPLE (continued): Validating the Independence Assumption
- To compute this value, denoted by r1, we correlate the observed residuals (in time-series order) with the same errors shifted one position from the originals.
- Thus, the lag-1 autocorrelation is computed for the pairs (e1, e2), (e2, e3), (e3, e4), ..., (e_{n-1}, e_n), where the e_i are the observed errors.
Slide 136: EXAMPLE (continued): Validating the Independence Assumption
- When the errors in the model equation are normally and independently distributed, the sampling distribution of the lag-1 autocorrelation coefficient for a sample of size n is approximately normal with mean 0 and standard deviation 1/√n.

Slide 137: EXAMPLE (continued): Validating the Independence Assumption
- Thus, independence of the errors should be questioned when the absolute value of r1 is greater than 1.96 × (1/√n); that is, when |r1| > 1.96 × (1/√n).
- MINITAB can be used to compute r1 for the errors.
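The lag-1 autocorrelation is also simple enough to compute directly. Below is a minimal Python sketch using one common sample-ACF formula (MINITAB's exact implementation may differ slightly), together with the decision rule from the slides:

```python
import math

def lag1_autocorrelation(errors):
    """Sample lag-1 autocorrelation r1 of a sequence of residuals."""
    n = len(errors)
    mean = sum(errors) / n
    num = sum((errors[i] - mean) * (errors[i + 1] - mean) for i in range(n - 1))
    den = sum((e - mean) ** 2 for e in errors)
    return num / den

# Decision rule from the slides: question independence when |r1| > 1.96 / sqrt(n).
residuals = [1.0, -1.0, 1.0, -1.0]  # tiny illustrative series, not the real errors
r1 = lag1_autocorrelation(residuals)
bound = 1.96 / math.sqrt(len(residuals))

print(r1, abs(r1) > bound)  # r1 = -0.75; here |r1| = 0.75 < 0.98, so False
```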
Slide 138: EXAMPLE (continued): Validating the Independence Assumption

Slide 139: EXAMPLE (continued): Validating the Independence Assumption

Slide 140: EXAMPLE (continued): Validating the Independence Assumption
- From the previous slide, r1 = 0.1497.
- 1.96 × (1/√n) = 1.96 × (1/√28) = 0.3704.
- Since |r1| = 0.1497 < 0.3704, one can assume that the assumption of independence has not been violated.
Slide 141: EXAMPLE (continued): Validating the Independence Assumption
- H0: The errors are independent of each other.
- H1: The errors are not independent of each other.
- T.S.: |r1| = 0.1497
- D.R.: Reject the null hypothesis if |r1| > 1.96 × (1/√n), i.e., if 0.1497 > 0.3704.
- Conclusion: Do not reject H0, since 0.1497 < 0.3704, and assume the assumption of independence of the errors has not been violated.