Title: SW388R6
1Two-way Analysis of Variance
2Problem 1
- Based on the dataset GSS2000.SAV, is the
following statement true, false, or an incorrect
application of a statistic? Use 0.05 as the level
of significance. Base your answer on the output
for two-way analysis of variance of "NUMBER OF
HOURS WORKED LAST WEEK" with the factors
"RESPONDENTS SEX" and "SUBJECTIVE CLASS
IDENTIFICATION."
- The effect of sex on number of hours worked in
the past week was not the same for all categories
of subjective class identification. For survey
respondents who were female, those who said they
belonged in the middle class worked fewer hours
in the past week than those who said they
belonged in the upper class. For survey
respondents who were male, those who said they
belonged in the middle class worked longer hours
in the past week than those who said they
belonged in the upper class.
- 1. True
- 2. True with caution
- 3. False
- 4. Incorrect application of a statistic
3Request the statistics to evaluate normality
The two-way analysis of variance assumes that the
dependent variable is normally distributed. To
evaluate this assumption, we need to compute the
skewness and kurtosis of the distribution. The
two-way analysis of variance will be computed
using cases that do not have missing data for
both of the independent variables and the
dependent variable. We want our evaluation of
the assumption of normality to be based on the
same cases, so we need to use an SPSS procedure
that allows us to control for missing data in all
variables simultaneously. We will use the
"Means" procedure instead of the "Descriptives"
procedure.
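The idea of basing every statistic on the same cases can be sketched outside SPSS. The Python fragment below is only an illustration: the records are made up, the helper name complete_cases is hypothetical, and None stands in for a missing value. It keeps only the cases that have data for the dependent variable and both factors.

```python
def complete_cases(records, variables):
    """Keep only records with a non-missing value for every listed variable."""
    return [r for r in records if all(r.get(v) is not None for v in variables)]


# Hypothetical records (not taken from GSS2000.SAV); None marks missing data.
records = [
    {"hrs1": 40, "sex": "male", "class": "middle"},
    {"hrs1": None, "sex": "female", "class": "upper"},
    {"hrs1": 35, "sex": "female", "class": None},
    {"hrs1": 50, "sex": "female", "class": "upper"},
]

usable = complete_cases(records, ["hrs1", "sex", "class"])
print(len(usable), "of", len(records), "cases have data for all three variables")
```

Any skewness, kurtosis, or analysis of variance computed from usable would then rest on the same set of cases.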
4Select dependent and first independent variables
First, move the variable "hrs1" to the list of
dependent variables.
Second, move the first independent variable, "sex", to
the list of independent variables.
Third, click on the Next button to add another
layer, or dimension, to the problem. We will
put the second independent variable in this
layer, so that SPSS computes the skewness and
kurtosis for the independent variables
simultaneously, rather than sequentially in
separate tables.
5Select the second independent variable
First, move the variable "class" to the second
layer of the list of independent variables.
Second, click on the Options button to select
the statistics we want to compute.
6Request the skewness and kurtosis
First, in addition to the default Cell Statistics
of Mean, Number of Cases, and Standard Deviation,
add Kurtosis and Skewness to the list.
Second, click on the Continue button to close the
Options dialog box.
7Complete the request to evaluate normality
Click on the OK button to complete the request.
8Statistical output to evaluate normality
The two-way analysis of variance requires that
NUMBER OF HOURS WORKED LAST WEEK be normally
distributed. The skewness of NUMBER OF HOURS
WORKED LAST WEEK for the sample (-0.324) is
within the range for normality (-1.0 to 1.0).
The kurtosis of NUMBER OF HOURS WORKED LAST WEEK
for the sample (0.935) is within the range for
normality (-1.0 to 1.0). The assumption of
normality required by the two-way analysis of
variance is satisfied.
Note that we cannot satisfy the assumption of
normality for the analysis of variance with the
Central Limit Theorem, which applies to a
sampling distribution based on the normal
distribution. The two-way analysis of variance
uses the F distribution to determine the
probability of the test statistic.
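The normality screen above can be sketched in Python. This is a rough illustration, not SPSS's own code: it assumes the bias-adjusted sample skewness and excess kurtosis formulas that statistical packages commonly report, and the function names are hypothetical. The rule of thumb from this slide (both values between -1.0 and 1.0) is applied by within_normal_range.

```python
from math import sqrt


def sample_skewness(xs):
    """Bias-adjusted sample skewness (needs at least 3 values)."""
    n = len(xs)
    m = sum(xs) / n
    s = sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in xs)


def sample_kurtosis(xs):
    """Bias-adjusted sample excess kurtosis (needs at least 4 values)."""
    n = len(xs)
    m = sum(xs) / n
    s2 = sum((x - m) ** 2 for x in xs) / (n - 1)
    fourth = sum((x - m) ** 4 for x in xs) / s2 ** 2
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


def within_normal_range(value, low=-1.0, high=1.0):
    """Rule of thumb used in these slides: -1.0 to 1.0 counts as normal."""
    return low <= value <= high


# Values reported on this slide for NUMBER OF HOURS WORKED LAST WEEK.
normality_ok = within_normal_range(-0.324) and within_normal_range(0.935)
print("Normality assumption satisfied:", normality_ok)
```

With the skewness (-0.324) and kurtosis (0.935) reported above, both checks pass, which matches the conclusion on this slide.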
9Request the two-way analysis of variance
To compute a two-way analysis of variance, select
the General Linear Model > Univariate command
from the Analyze menu. Univariate indicates
that we will analyze a single dependent variable.
10Select the variables for the two-way ANOVA
First, move the dependent variable "hrs1" to the
Dependent Variable text box.
Second, move the independent variables "sex" and
"class" to the Fixed Factor(s) text box. When
the values of the independent variables include
all of the possible categories, they are fixed
factors. If the values of the independent
variable represent a random sample of the
possible categories, they are random factors.
We will generally be working with fixed factors.
Third, click on the Plots button to request a
profile plot which helps us evaluate the presence
of an interaction.
11Set the specifications for the profile plot
First, move the variable for the first factor,
"sex", to the Horizontal Axis text box. The
categories of this variable will be plotted on
the horizontal axis.
Second, move the variable for the second factor,
"class", to the Separate Lines text box. The
mean number of hours worked for the categories of
class will be plotted on different colored lines
in the chart.
Third, click on the Add button to add this plot to
the list of plots. We can create more than one
plot in our output if we think that will be
helpful to the analysis.
12Complete the specifications for the profile plot
First, when we clicked on the Add button in the
previous slide, the plot for sex by class was
added to the list of plots that SPSS will produce
for this analysis.
Second, we click on the Continue button to
complete the specifications for plots.
13Specify the Post Hoc Tests
First, click on the Post Hoc button to set the
specifications for the post hoc tests. If
there is no interaction between the factors, the
analysis becomes equivalent to two one-way
analyses of variance, for which the post hoc
tests are useful for interpretation.
14Select the Post Hoc Test
First, move both factor variables to the list of
variables to do post hoc tests for. A post hoc
test cannot be done for an interaction, only for
individual factors.
Second, mark the checkbox for the Tukey HSD test
as our preferred measure of differences.
Third, click on the Continue button to close
the dialog box.
15Select the Optional Output
In addition to the assumption of normality, the
two-way analysis of variance assumes that the
population variances are equal. Click on the
Options button to add the test of homogeneity of
variances to the analysis.
16Select all means and homogeneity tests
First, request the means for hours worked for all
combinations of the factor variables, to aid in
the interpretation of our results.
Second, mark the Homogeneity tests check box to
request a Levene Test for Homogeneity of
Variance.
Third, click on the Continue button to complete
the Options dialog box.
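The means for all combinations of the factor variables are simply the averages of the dependent variable within each cell of the design. A minimal Python sketch follows, assuming made-up records and a hypothetical helper named cell_means. It is not how SPSS computes its estimated marginal means, only an illustration of what the table contains.

```python
from collections import defaultdict


def cell_means(records, dependent, factor_a, factor_b):
    """Average the dependent variable within each factor combination."""
    groups = defaultdict(list)
    for r in records:
        groups[(r[factor_a], r[factor_b])].append(r[dependent])
    return {cell: sum(values) / len(values) for cell, values in groups.items()}


# Hypothetical complete cases (not taken from GSS2000.SAV).
records = [
    {"hrs1": 40, "sex": "male", "class": "middle"},
    {"hrs1": 50, "sex": "male", "class": "middle"},
    {"hrs1": 30, "sex": "female", "class": "upper"},
]

means = cell_means(records, "hrs1", "sex", "class")
for cell, value in sorted(means.items()):
    print(cell, value)
```

These are the same cell means that the profile plot draws as points on its lines.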
17Complete the request for a two-way ANOVA
Click on the OK button to complete the request
for the two-way ANOVA.
18The homogeneity of variance test
The two-way analysis of variance assumes that the
variances of the groups are equal in the
population. This assumption is evaluated with
Levene's Test for Equality of Variances. The null
hypothesis for this test states that the
variances of all groups are equal. The desired
outcome for this test is to fail to reject the
null hypothesis. Since the probability
associated with the Levene test (0.863) is
greater than the level of significance (0.05),
the null hypothesis is not rejected. The
requirement for equal variances is satisfied.
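To make the Levene test less of a black box, the statistic can be sketched in Python. The version below assumes the classic mean-centered form (absolute deviations from each group mean); in a two-way design each group is one cell, that is, one combination of the two factors. The function names and the sample data are hypothetical, and the probability itself is left to SPSS because it requires the F distribution.

```python
def levene_statistic(groups):
    """Mean-centered Levene W for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of each observation from its own group mean.
    z = []
    for g in groups:
        m = sum(g) / len(g)
        z.append([abs(y - m) for y in g])
    z_means = [sum(zi) / len(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


def equal_variances_assumed(levene_p, alpha):
    """Desired outcome: fail to reject the null hypothesis (p > alpha)."""
    return levene_p > alpha


# Hypothetical groups: same spread versus very different spread.
w_same = levene_statistic([[1, 2, 3], [11, 12, 13]])
w_diff = levene_statistic([[1, 2, 3], [0, 10, 20]])
print(w_same, w_diff)

# Values reported on this slide.
print(equal_variances_assumed(0.863, 0.05))
```

Groups with the same spread give a statistic of zero, and the statistic grows as the spreads diverge. With the reported probability of 0.863 and a significance level of 0.05, the equal-variance requirement is treated as satisfied.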
19Hypotheses tested in a two-way ANOVA
- The two-way analysis of variance tests three
research and null hypotheses. The research
hypotheses are
- the average number of hours worked in the past
week is different for one or more categories of
the first factor "RESPONDENTS SEX"
- the average number of hours worked in the past
week is different for one or more categories of
the second factor "SUBJECTIVE CLASS
IDENTIFICATION"
- the average number of hours worked in the past
week is different for interacting combinations of
the two factors.
- The corresponding null hypotheses are
- the average number of hours worked in the past
week is the same for all categories of the first
factor "RESPONDENTS SEX"
- the average number of hours worked in the past
week is the same for all categories of the second
factor "SUBJECTIVE CLASS IDENTIFICATION"
- the average number of hours worked in the past
week is not affected by interacting combinations
of the two factors.
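The three pairs of hypotheses can be written compactly with the usual textbook two-way model. The notation below is an assumption added for illustration (it does not appear in the SPSS output): $\alpha_i$ is the effect of category $i$ of the first factor, $\beta_j$ the effect of category $j$ of the second factor, and $(\alpha\beta)_{ij}$ the interaction effect for that combination.

```latex
% Two-way fixed-effects model (standard textbook notation, assumed here)
Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}

% Null hypotheses for the two main effects and the interaction
H_0^{(1)}: \alpha_i = 0 \ \text{for all } i
\qquad
H_0^{(2)}: \beta_j = 0 \ \text{for all } j
\qquad
H_0^{(3)}: (\alpha\beta)_{ij} = 0 \ \text{for all } i, j
```

Each research hypothesis states that at least one of the corresponding effects is not zero.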
20The ANOVA table output for hypothesis 1
The two-way analysis of variance tests three
research and null hypotheses. The first research
hypothesis is that the average number of hours worked
in the past week is different for one or more
categories of the first factor "RESPONDENTS
SEX." The corresponding null hypothesis is that the
average number of hours worked in the past week
is the same for all categories of the first
factor "RESPONDENTS SEX." This is the output we
use for testing the first hypothesis.
21The ANOVA table output for hypothesis 2
The two-way analysis of variance tests three
research and null hypotheses. The second
research hypothesis is that the average number of
hours worked in the past week is different for
one or more categories of the second factor
"SUBJECTIVE CLASS IDENTIFICATION." The
corresponding null hypothesis is that the average
number of hours worked in the past week is the
same for all categories of the second factor
"SUBJECTIVE CLASS IDENTIFICATION." This is the
output we use for testing the second hypothesis.
22The ANOVA table output for hypothesis 3
The two-way analysis of variance tests three
research and null hypotheses. The third research
hypothesis is that the average number of hours worked
in the past week is different for interacting
combinations of the two factors. The
corresponding null hypothesis is that the average
number of hours worked in the past week is not
affected by interacting combinations of the two
factors. This is the output we use for testing
the third hypothesis.
23The ANOVA table
The research question in this problem is a test
of the third research hypothesis that there is an
interaction between "RESPONDENTS SEX" and
"SUBJECTIVE CLASS IDENTIFICATION."
In the ANOVA table, the probability of the
interaction, SEX * CLASS, is 0.035, which is less
than or equal to the level of significance of
0.05. We reject the null hypothesis and
conclude that the analysis supports the research
hypothesis that the average number of hours
worked in the past week is different for
interacting combinations of the factors
"RESPONDENTS SEX" and "SUBJECTIVE CLASS
IDENTIFICATION."
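The F statistics in the ANOVA table can be sketched by hand for the simplest case. The Python fragment below assumes a balanced design (every cell present, same number of cases per cell), which is not true of the GSS data, so it will not reproduce the SPSS table for this problem. The data and the function name are hypothetical, and the probabilities are omitted because they require the F distribution.

```python
def _mean(values):
    values = list(values)
    return sum(values) / len(values)


def two_way_anova_balanced(cells):
    """F statistics for a balanced two-way design.

    cells maps (level_a, level_b) to a list of observations.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))
    if len(cells) != len(a_levels) * len(b_levels) or any(
        len(v) != n for v in cells.values()
    ):
        raise ValueError("this sketch needs every cell present with the same n")
    if n < 2:
        raise ValueError("each cell needs at least two observations")

    grand = _mean(y for v in cells.values() for y in v)
    cell = {k: _mean(v) for k, v in cells.items()}
    a_mean = {a: _mean(cell[(a, b)] for b in b_levels) for a in a_levels}
    b_mean = {b: _mean(cell[(a, b)] for a in a_levels) for b in b_levels}

    # Sums of squares for the two main effects, the interaction, and error.
    ss_a = n * len(b_levels) * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = n * len(a_levels) * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    ss_ab = n * sum(
        (cell[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
        for a in a_levels
        for b in b_levels
    )
    ss_error = sum((y - cell[k]) ** 2 for k, v in cells.items() for y in v)

    df_a, df_b = len(a_levels) - 1, len(b_levels) - 1
    ms_error = ss_error / (len(cells) * (n - 1))
    return {
        "A": (ss_a / df_a) / ms_error,
        "B": (ss_b / df_b) / ms_error,
        "A*B": (ss_ab / (df_a * df_b)) / ms_error,
    }


# Hypothetical balanced data, two cases per cell.
f_stats = two_way_anova_balanced({
    ("male", "middle"): [40, 44],
    ("male", "upper"): [20, 24],
    ("female", "middle"): [30, 34],
    ("female", "upper"): [44, 48],
})
print(f_stats)
```

In this made-up example the interaction F is by far the largest, because the middle-versus-upper difference reverses direction between the two sexes.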
24The Profile Plot
If the lines in a profile plot are parallel,
there is no interaction. When the lines
intersect and slope in different directions, it is
a visual indication that there is an
interaction. In this example, we will focus on
the magenta line for the upper class and the blue
line for the middle class. The pattern of mean
hours worked by class differs for males and
females.
For female survey respondents, those who were in
the middle class worked fewer hours in the past
week than female survey respondents who said they
belonged in the upper class. Male survey
respondents who were in the middle class worked
more hours in the past week than male survey
respondents who said they belonged in the upper
class.
25Table of means showing opposite changes
The mean number of hours worked in the past week
for survey respondents who were male and who said
they belonged in the middle class (44.125) was
higher than the mean number of hours worked in
the past week for survey respondents who were
male and who said they belonged in the upper
class (20.000).
The mean number of hours worked in the past week
for survey respondents who were female and who
said they belonged in the middle class (36.775)
was lower than the mean number of hours worked in
the past week for survey respondents who were
female and who said they belonged in the upper
class (46.600).
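The opposite changes can be checked with a little arithmetic on the four means quoted above. The Python fragment below is only an illustration; the variable and function names are hypothetical, and the numbers are the ones reported on this slide.

```python
# Cell means reported on this slide (hours worked in the past week).
table_of_means = {
    ("male", "middle"): 44.125,
    ("male", "upper"): 20.000,
    ("female", "middle"): 36.775,
    ("female", "upper"): 46.600,
}


def middle_minus_upper(sex):
    """Difference in mean hours: middle class minus upper class."""
    return table_of_means[(sex, "middle")] - table_of_means[(sex, "upper")]


male_gap = middle_minus_upper("male")      # positive: middle class worked more
female_gap = middle_minus_upper("female")  # negative: middle class worked less

# Opposite signs mean the two lines in the profile plot cross.
opposite_directions = male_gap * female_gap < 0
print(round(male_gap, 3), round(female_gap, 3), opposite_directions)
```

The male difference is positive and the female difference is negative, which is the pattern of opposite changes described in the problem statement.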
26The answer to the question
The original question stated that the effect of
sex on number of hours worked in the past week
was not the same for all categories of subjective
class identification. For survey respondents who
were female, those who said they belonged in the
middle class worked fewer hours in the past week
than those who said they belonged in the upper
class. For survey respondents who were male,
those who said they belonged in the middle class
worked longer hours in the past week than those
who said they belonged in the upper class. The
answer to the question is true.
27Problem 2
- Based on the dataset GSS2000.SAV, is the
following statement true, false, or an incorrect
application of a statistic? Use 0.05 as the level
of significance. Base your answer on the output
for two-way analysis of variance of "NUMBER OF
HOURS WORKED LAST WEEK" with the factors "IS LIFE
EXCITING OR DULL" and "DOES R OR SPOUSE BELONG TO
UNION".
- For the population represented by this sample,
there are differences in average number of hours
worked in the past week among groups defined by
the variable "IS LIFE EXCITING OR DULL".
Specifically, survey respondents who said that
they generally find life exciting worked longer
hours in the past week than survey respondents
who said that they generally find life pretty
routine.
- 1. True
- 2. True with caution
- 3. False
- 4. Incorrect application of a statistic
28Solution 2 - output to evaluate normality
The two-way analysis of variance requires that
NUMBER OF HOURS WORKED LAST WEEK be normally
distributed. The skewness of NUMBER OF HOURS
WORKED LAST WEEK for the sample (0.756) is within
the range for normality (-1.0 to 1.0). The
kurtosis of NUMBER OF HOURS WORKED LAST WEEK for
the sample (-0.266) is within the range for
normality (-1.0 to 1.0). The assumption of
normality required by the two-way analysis of
variance is satisfied.
29Solution 2 - the homogeneity of variance test
The two-way analysis of variance assumes that the
variances of the groups are equal in the
population. This assumption is evaluated with
Levene's Test for Equality of Variances. The null
hypothesis for this test states that the
variances of all groups are equal. The desired
outcome for this test is to fail to reject the
null hypothesis. Since the probability
associated with the Levene test (0.852) is
greater than the level of significance (0.05),
the null hypothesis is not rejected. The
requirement for equal variances is satisfied.
30Solution 2 - the ANOVA table
The research question in this problem is a test
of the first research hypothesis that there is a
main effect for the first factor "IS LIFE
EXCITING OR DULL."
However, in the ANOVA table, we see that the
probability of the interaction, LIFE * UNION, is
0.019, which is less than or equal to the level
of significance of 0.05. It is not appropriate
to do a test of a main effect when the
interaction is statistically significant because
the interaction alters the interpretation of the
main effects. The answer to the question is an
incorrect application of a statistic.
31Problem 3
- Based on the dataset GSS2000.SAV, is the
following statement true, false, or an incorrect
application of a statistic? Use 0.05 as the level
of significance. Base your answer on the output
for two-way analysis of variance of "NUMBER OF
HOURS WORKED LAST WEEK" with the factors
"SUBJECTIVE CLASS IDENTIFICATION" and "R USE
COMPUTER."
- For the population represented by this sample,
there are differences in average number of hours
worked in the past week among groups defined by
the variable "R USE COMPUTER". Specifically,
survey respondents who said they used a computer
worked longer hours in the past week than survey
respondents who said they didn't use a computer.
- 1. True
- 2. True with caution
- 3. False
- 4. Incorrect application of a statistic
32Solution 3 - output to evaluate normality
The two-way analysis of variance requires that
NUMBER OF HOURS WORKED LAST WEEK be normally
distributed. The skewness of NUMBER OF HOURS
WORKED LAST WEEK for the sample (-0.127) is
within the range for normality (-1.0 to 1.0).
The kurtosis of NUMBER OF HOURS WORKED LAST WEEK
for the sample (1.173) is outside the range for
normality (-1.0 to 1.0).
This condition violates the assumption of
normality required by the two-way analysis of
variance. A note of caution should be added to
any findings based on this analysis.
33Solution 3 - the homogeneity of variance test
The two-way analysis of variance assumes that the
variances of the groups are equal in the
population. This assumption is evaluated with
Levene's Test for Equality of Variances. The null
hypothesis for this test states that the
variances of all groups are equal. The desired
outcome for this test is to fail to reject the
null hypothesis. Since the probability
associated with the Levene test (0.775) is
greater than the level of significance (0.05),
the null hypothesis is not rejected. The
requirement for equal variances is satisfied.
34Solution 3 - the ANOVA table
The research question in this problem is a test
of the second research hypothesis that there is a
main effect for the second factor "R USE
COMPUTER."
The probability of the F statistic is 0.019,
which is less than or equal to the level of
significance of 0.05. We reject the null
hypothesis and conclude that the analysis
supports the research hypothesis that the average
number of hours worked in the past week is
different for one or more categories of the
factor "R USE COMPUTER".
35Solution 3 - post hoc tests not available
Since there are only two groups for the factor
"R USE COMPUTER", the Tukey HSD post
hoc test is not computed by SPSS. The tests are
not necessary because there are only two groups
in the effect found to be significant. These two
groups must be the ones that have different
means, and the probability for finding a
difference this large must be equal to the
probability for the main effect.
36Solution 3 - table of means
The mean number of hours worked in the past week
for survey respondents who said they used a
computer (44.847) is significantly higher than
the mean for survey respondents who said they
didn't use a computer (34.139). The probability
for the difference in means (10.708) is 0.019,
the same as the probability for the main effect.
37Solution 3 - the answer to the question
The original question stated that, for the population
represented by this sample, there are differences
in average number of hours worked in the past
week among groups defined by the variable "R USE
COMPUTER". Specifically, survey respondents who
said they used a computer worked longer hours in
the past week than survey respondents who said
they didn't use a computer. The answer to the
question is true with caution, due to the
violation of normality.
38Problem 4
- Based on the dataset GSS2000.SAV, is the
following statement true, false, or an incorrect
application of a statistic? Use 0.01 as the level
of significance. Base your answer on the output
for two-way analysis of variance of "NUMBER OF
CHILDREN" with the factors "RESPONDENTS SEX" and
"R USE COMPUTER."
- The effect of sex on number of children was not
the same for all categories of computer use. For
survey respondents who were female, those who
said they used a computer had fewer children than
those who said they didn't use a computer. For
survey respondents who were male, those who said
they used a computer had more children than those
who said they didn't use a computer.
- 1. True
- 2. True with caution
- 3. False
- 4. Incorrect application of a statistic
39Solution 4 - output to evaluate normality
The two-way analysis of variance requires that
NUMBER OF CHILDREN be normally distributed. The
skewness of NUMBER OF CHILDREN for the sample
(0.707) is within the range for normality (-1.0
to 1.0). The kurtosis of NUMBER OF CHILDREN for
the sample (0.268) is within the range for
normality (-1.0 to 1.0). The assumption of
normality required by the two-way analysis of
variance is satisfied.
Note that we cannot satisfy the assumption of
normality for the analysis of variance with the
Central Limit Theorem, which applies to a
sampling distribution based on the normal
distribution. The two-way analysis of variance
uses the F distribution to determine the
probability of the test statistic.
40Solution 4 - the homogeneity of variance test
The two-way analysis of variance assumes that the
variances of the groups are equal in the
population. This assumption is evaluated with
Levene's Test for Equality of Variances. The null
hypothesis for this test states that the
variances of all groups are equal. The desired
outcome for this test is to fail to reject the
null hypothesis. Since the probability
associated with the Levene test (0.312) is
greater than the level of significance (0.01),
the null hypothesis is not rejected. The
requirement for equal variances is satisfied.
41Solution 4 - the ANOVA table
The research question in this problem is a test
of the third research hypothesis that there is an
interaction between "RESPONDENTS SEX" and "R USE
COMPUTER."
In the ANOVA table, the probability of the
interaction, SEX * COMPUSE, is 0.678, which is
greater than the level of significance of 0.01.
We fail to reject the null hypothesis and
conclude that the analysis does not support the
research hypothesis that the average number of
children is different for interacting
combinations of the factors "RESPONDENTS SEX" and
"R USE COMPUTER." The answer to the question is
false.
42Solution 4 - the Profile Plot
When there is no interaction, the lines in the
profile plot are very close to being parallel to
one another.
43Solving two-way analysis of variance problems - 1
The following is a guide to the decision process
for answering two-way analysis of variance
homework problems.
- Is the level of measurement satisfied?
  - Dependent variable: interval level
  - Independent variables: any level, two or more
    groups
  - No: Incorrect application of a statistic
  - Yes: continue to the next question
- Is the assumption of normality satisfied?
  - Skewness and kurtosis of the dependent variable
    between -1.0 and 1.0
  - No: Add caution if the question turns out to be true
  - Yes: continue to the next question
44Solving two-way analysis of variance problems - 2
- Does the Levene test of equality of population
  variances indicate that the group variances are
  equal?
  - No: Add caution if the question turns out to be true
  - Yes: continue to the next question
- Does the problem question test a hypothesis
  about an interaction?
  - Yes: Is the probability of the F for the interaction
    less than or equal to the level of significance?
    - No: False
    - Yes: continue with the profile plot and table of means
  - No: Does the problem question test a hypothesis
    about a main effect?
    - Yes: Is the probability of the F for the interaction
      less than or equal to the level of significance?
      - Yes: Incorrect application of a statistic
      - No: continue with the F for the main effect
45Solving two-way analysis of variance problems - 3
- For an interaction: Do the profile plot and table of
  means support the proposed relationship?
  - No: False
  - Yes: True
- For a main effect: Is the probability of the F for the
  main effect less than or equal to the level of
  significance?
  - No: False
  - Yes: Is the probability of the Tukey HSD Post Hoc Test
    statistic less than the level of significance?
    - No: False
    - Yes: True
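The decision process on the last three slides can be sketched as a single Python function. This is one reading of the flowchart, not an official scoring key: the function and parameter names are hypothetical, pattern_supported stands for the reader's judgment that the profile plot and table of means match the stated direction, and a factor with only two groups is handled as in Solution 3 (no Tukey HSD test, so the main effect probability decides).

```python
def answer_problem(*, level_ok, skewness, kurtosis, levene_p, alpha,
                   tests_interaction, interaction_p, main_effect_p=None,
                   post_hoc_p=None, pattern_supported=True):
    """Follow the decision guide and return one of the four answers."""
    if not level_ok:
        return "Incorrect application of a statistic"

    # Caution is added for non-normality or unequal variances.
    normal = -1.0 <= skewness <= 1.0 and -1.0 <= kurtosis <= 1.0
    caution = (not normal) or levene_p <= alpha

    if tests_interaction:
        if interaction_p > alpha:
            return "False"
    else:
        # A main effect is not interpreted when the interaction is significant.
        if interaction_p <= alpha:
            return "Incorrect application of a statistic"
        if main_effect_p > alpha:
            return "False"
        # post_hoc_p is None when the factor has only two groups.
        if post_hoc_p is not None and post_hoc_p >= alpha:
            return "False"

    if not pattern_supported:
        return "False"
    return "True with caution" if caution else "True"


problem_1 = answer_problem(level_ok=True, skewness=-0.324, kurtosis=0.935,
                           levene_p=0.863, alpha=0.05,
                           tests_interaction=True, interaction_p=0.035)
problem_2 = answer_problem(level_ok=True, skewness=0.756, kurtosis=-0.266,
                           levene_p=0.852, alpha=0.05,
                           tests_interaction=False, interaction_p=0.019)
# The interaction probability for Problem 3 is not reported in the slides;
# 0.50 is a placeholder assumption for a non-significant interaction.
problem_3 = answer_problem(level_ok=True, skewness=-0.127, kurtosis=1.173,
                           levene_p=0.775, alpha=0.05,
                           tests_interaction=False, interaction_p=0.50,
                           main_effect_p=0.019)
problem_4 = answer_problem(level_ok=True, skewness=0.707, kurtosis=0.268,
                           levene_p=0.312, alpha=0.01,
                           tests_interaction=True, interaction_p=0.678)
print(problem_1, problem_2, problem_3, problem_4, sep="\n")
```

Fed the statistics from the four solutions, the sketch returns the same answers the slides reach: true, incorrect application of a statistic, true with caution, and false.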