Title: SW388R7
1. Assumptions of multiple regression
- Assumption of normality
- Assumption of linearity
- Assumption of homoscedasticity
- Script for testing assumptions
- Practice problems
2. Assumptions of Normality, Linearity, and Homoscedasticity
- Multiple regression assumes that the variables in the analysis satisfy the assumptions of normality, linearity, and homoscedasticity. (There is also an assumption of independence of errors, but that cannot be evaluated until the regression is run.)
- There are two general strategies for checking conformity to assumptions: pre-analysis and post-analysis. In pre-analysis, the variables are checked prior to running the regression. In post-analysis, the assumptions are evaluated by looking at the pattern of residuals (the errors, or variability, that the regression was unable to predict accurately).
- The text recommends pre-analysis, the strategy we will follow.
3. Assumption of Normality
- The assumption of normality prescribes that the distribution of cases fit the pattern of a normal curve.
- It is evaluated for all metric variables included in the analysis, independent variables as well as the dependent variable.
- With multivariate statistics, the assumption is that the combination of variables follows a multivariate normal distribution.
- Since there is not a direct test for multivariate normality, we generally test each variable individually and assume that they are multivariate normal if they are individually normal, though this is not necessarily the case.
4. Assumption of Normality: Evaluating Normality
- There are both graphical and statistical methods for evaluating normality.
- Graphical methods include the histogram and normality plot.
- Statistical methods include diagnostic hypothesis tests for normality, and a rule of thumb that says a variable is reasonably close to normal if its skewness and kurtosis have values between -1.0 and +1.0.
- None of the methods is absolutely definitive.
- We will use the criterion that the skewness and kurtosis of the distribution both fall between -1.0 and +1.0.
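The slides apply this rule of thumb in SPSS, but it is easy to state in code. The sketch below uses Python and scipy with hypothetical sample data (not the GSS2000R variables); note that scipy's `kurtosis()` reports excess kurtosis (normal = 0), matching the convention the SPSS output uses.

```python
# Sketch of the -1.0 to +1.0 rule of thumb for normality.
from scipy.stats import skew, kurtosis

def roughly_normal(values, bound=1.0):
    """Return True if skewness and excess kurtosis both fall in [-bound, +bound]."""
    s = skew(values)
    k = kurtosis(values)  # fisher=True by default: excess kurtosis, normal = 0
    return bool(-bound <= s <= bound and -bound <= k <= bound)

# A right-skewed variable fails the rule; a symmetric one passes.
skewed = [1, 1, 1, 2, 2, 3, 5, 9, 20, 60]      # hypothetical, like time on email
symmetric = [2, 3, 4, 4, 5, 5, 5, 6, 6, 7, 8]  # hypothetical, like prestige
print(roughly_normal(skewed), roughly_normal(symmetric))
```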
5. Assumption of Normality: Histograms and Normality Plots
On the left side of the slide is the histogram and normality plot for occupational prestige, a variable that could reasonably be characterized as normal. Time using email, on the right, is not normally distributed.
6. Assumption of Normality: Hypothesis test of normality
The hypothesis test for normality tests the null hypothesis that the variable is normal, i.e. the actual distribution of the variable fits the pattern we would expect if it is normal. If we fail to reject the null hypothesis, we conclude that the distribution is normal. The distributions for both of the variables depicted on the previous slide are associated with low significance values that lead to rejecting the null hypothesis and concluding that neither occupational prestige nor time using email is normally distributed.
7. Assumption of Normality: Skewness, kurtosis, and normality
Using the rule of thumb that says a variable is reasonably close to normal if its skewness and kurtosis have values between -1.0 and +1.0, we would decide that occupational prestige is normally distributed and time using email is not. We will use this rule of thumb for normality in our strategy for solving problems.
8. Assumption of Normality: Transformations
- When a variable is not normally distributed, we can create a transformed variable and test it for normality. If the transformed variable is normally distributed, we can substitute it in our analysis.
- Three common transformations are the logarithmic transformation, the square root transformation, and the inverse transformation.
- All of these change the measuring scale on the horizontal axis of a histogram to produce a transformed variable that is mathematically equivalent to the original variable.
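In SPSS these are created with COMPUTE statements; the same idea can be sketched in Python with numpy. The shift logic below, which adds a constant when a variable contains zeros or negatives so the log and inverse are defined, is an illustrative assumption, not the deck's exact formulas:

```python
# Minimal sketch of the three transformations named above.
import numpy as np

def transform(values, kind):
    x = np.asarray(values, dtype=float)
    if kind == "log":
        # log10 needs positive values; shift so the minimum is 1 if needed.
        shift = 1.0 - x.min() if x.min() < 1 else 0.0
        return np.log10(x + shift)
    if kind == "sqrt":
        # sqrt needs non-negative values.
        shift = -x.min() if x.min() < 0 else 0.0
        return np.sqrt(x + shift)
    if kind == "inverse":
        # 1/x needs non-zero values; shift so the minimum is 1 if needed.
        shift = 1.0 - x.min() if x.min() < 1 else 0.0
        return 1.0 / (x + shift)
    raise ValueError(kind)

hours = [1, 2, 2, 3, 5, 8, 20, 60]  # hypothetical right-skewed data
log_hours = transform(hours, "log")
```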
9. Assumption of Normality: Computing Transformations
- We will use SPSS scripts as described below to test assumptions and compute transformations.
- For additional details on the mechanics of computing transformations, see Computing Transformations.
10. Assumption of Normality: When transformations do not work
- When none of the transformations induces normality in a variable, including that variable in the analysis will reduce our effectiveness at identifying statistical relationships, i.e. we lose power.
- We do have the option of changing the way the information in the variable is represented, e.g. substituting several dichotomous variables for a single metric variable.
11. Assumption of Normality: Computing Explore descriptive statistics
To compute the statistics needed for evaluating
the normality of a variable, select the Explore
command from the Descriptive Statistics menu.
12. Assumption of Normality: Adding the variable to be evaluated
First, click on the variable to be included in the analysis to highlight it.
Second, click on the right arrow button to move the highlighted variable to the Dependent List.
13. Assumption of Normality: Selecting statistics to be computed
To select the statistics for the output, click on
the Statistics command button.
14. Assumption of Normality: Including descriptive statistics
First, click on the Descriptives checkbox to
select it. Clear the other checkboxes.
Second, click on the Continue button to complete
the request for statistics.
15. Assumption of Normality: Selecting charts for the output
To select the diagnostic charts for the output,
click on the Plots command button.
16. Assumption of Normality: Including diagnostic plots and statistics
First, click on the None option button on the Boxplots panel, since boxplots are not as helpful as other charts in assessing normality.
Second, click on the Normality plots with tests checkbox to include normality plots and the hypothesis tests for normality.
Third, click on the Histogram checkbox to include a histogram in the output. You may want to examine the stem-and-leaf plot as well, though I find it less useful.
Finally, click on the Continue button to complete the request.
17. Assumption of Normality: Completing the specifications for the analysis
Click on the OK button to complete the
specifications for the analysis and request SPSS
to produce the output.
18. Assumption of Normality: The histogram
An initial impression of the normality of the distribution can be gained by examining the histogram. In this example, the histogram shows a substantial violation of normality caused by an extremely large value in the distribution.
19. Assumption of Normality: The normality plot
The problem with the normality of this variable's distribution is reinforced by the normality plot. If the variable were normally distributed, the red dots would fit the green line very closely. In this case, the red points in the upper right of the chart indicate the severe skewing caused by the extremely large data values.
20. Assumption of Normality: The test of normality
- Since the sample size is larger than 50, we use the Kolmogorov-Smirnov test. If the sample size were 50 or less, we would use the Shapiro-Wilk statistic instead.
- The null hypothesis for the test of normality states that the actual distribution of the variable is equal to the expected distribution, i.e., the variable is normally distributed. Since the probability associated with the test of normality (< 0.001) is less than or equal to the level of significance (0.01), we reject the null hypothesis and conclude that total hours spent on the Internet is not normally distributed. (Note: we report the probability as < 0.001 instead of .000 to be clear that the probability is not really zero.)
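The Shapiro-Wilk test is available directly in scipy; a caveat if reproducing the Kolmogorov-Smirnov result outside SPSS is that SPSS applies the Lilliefors correction when the mean and standard deviation are estimated from the data, so a plain `scipy.stats.kstest` will not match its p-values (statsmodels provides a `lilliefors` function). A hedged sketch using Shapiro-Wilk on hypothetical samples of fewer than 50 cases:

```python
# Shapiro-Wilk normality test on two hypothetical samples.
from scipy.stats import shapiro
import numpy as np

rng = np.random.default_rng(0)
normal_sample = rng.normal(loc=50, scale=10, size=40)  # drawn from a normal
skewed_sample = rng.exponential(scale=5, size=40)      # strongly right-skewed

for name, sample in [("normal", normal_sample), ("skewed", skewed_sample)]:
    stat, p = shapiro(sample)
    # Fail to reject the null at alpha = 0.01 -> treat the variable as normal.
    verdict = "normal" if p > 0.01 else "not normal"
    print(name, verdict)
```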
21. Assumption of Normality: The rule of thumb for skewness and kurtosis
Using the rule of thumb for evaluating normality with the skewness and kurtosis statistics, we look at the table of descriptive statistics. The skewness and kurtosis for the variable both fall outside the rule-of-thumb range of -1.0 to +1.0. The variable is not normally distributed.
22. Assumption of Linearity
- Linearity means that the amount of change, or rate of change, between scores on two variables is constant for the entire range of scores for the variables.
- Linearity characterizes the relationship between two metric variables. It is tested for the pairs formed by the dependent variable and each metric independent variable in the analysis.
- There are relationships that are not linear.
- The relationship between learning and time may not be linear. Learning a new subject shows rapid gains at first, then the pace slows down over time. This is often referred to as a learning curve.
- Population growth may not be linear. The pattern often shows growth at increasing rates over time.
23. Assumption of Linearity: Evaluating linearity
- There are both graphical and statistical methods for evaluating linearity.
- Graphical methods include the examination of scatterplots, often overlaid with a trendline. While commonly recommended, this strategy is difficult to implement.
- Statistical methods include diagnostic hypothesis tests for linearity, a rule of thumb that says a relationship is linear if the difference between the linear correlation coefficient (r) and the nonlinear correlation coefficient (eta) is small, and examining patterns of correlation coefficients.
24. Assumption of Linearity: Interpreting scatterplots
The advice for interpreting linearity is often
phrased as looking for a cigar-shaped band, which
is very evident in this plot.
25. Assumption of Linearity: Interpreting scatterplots
Sometimes, a scatterplot shows a clearly
nonlinear pattern that requires transformation,
like the one shown in the scatterplot.
26. Assumption of Linearity: Scatterplots that are difficult to interpret
The correlations for both of these relationships are low. The linearity of the relationship on the right can be improved with a transformation; the plot on the left cannot. However, this is not necessarily obvious from the scatterplots.
27. Assumption of Linearity: Using correlation matrices
Creating a correlation matrix for the dependent variable and the original and transformed variations of the independent variable provides us with a pattern that is easier to interpret. The information that we need is in the first column of the matrix, which shows the correlation and significance for the dependent variable and all forms of the independent variable.
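The same first-column check can be sketched outside SPSS. In this hypothetical Python example (scipy), the dependent variable is constructed to follow the log of the independent variable, so the logarithmic transformation should show the strongest correlation; the variable names and data are illustrative assumptions, not the GSS2000R data:

```python
# Correlate the DV with the original IV and each transformation of it.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
x = rng.uniform(1, 100, size=60)             # hypothetical independent variable
y = np.log10(x) + rng.normal(0, 0.2, 60)     # DV built to track log10(x)

candidates = {
    "original": x,
    "log": np.log10(x),
    "sqrt": np.sqrt(x),
    "inverse": 1.0 / x,
}
for name, values in candidates.items():
    r, p = pearsonr(values, y)
    print(f"{name:8s} r = {r:+.3f}  p = {p:.4f}")
```

As the slides note for SPSS output, the row with the largest correlation (and a significant p-value) points to the form of the independent variable worth considering.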
28. Assumption of Linearity: The pattern of correlations for no relationship
The correlation between the two variables is very weak and statistically non-significant. If we viewed this as a hypothesis test for the significance of r, we would conclude that there is no relationship between these variables. Moreover, none of the significance tests for the correlations involving the transformed variables are statistically significant. There is no relationship between these variables; it is not a problem of non-linearity.
29. Assumption of Linearity: Correlation pattern suggesting transformation
The correlation between the two variables is very
weak and statistically non-significant. If we
viewed this as a hypothesis test for the
significance of r, we would conclude that there
is no relationship between these variables.
However, the probability associated with the
larger correlation for the logarithmic
transformation is statistically significant,
suggesting that this is a transformation we might
want to use in our analysis.
30. Assumption of Linearity: Correlation pattern suggesting substitution
- Should it happen that the correlation between a transformed independent variable and the dependent variable is substantially stronger than the relationship between the untransformed independent variable and the dependent variable, the transformation should be considered even if the relationship involving the untransformed independent variable is statistically significant.
- A difference of 0.20 (or -0.20) or more would be considered substantial enough, since a change of this size would alter our interpretation of the relationship.
31. Assumption of Linearity: Transformations
- When a relationship is not linear, we can transform one or both variables to achieve a relationship that is linear.
- Three common transformations to induce linearity are the logarithmic transformation, the square root transformation, and the inverse transformation.
- All of these transformations produce a new variable that is mathematically equivalent to the original variable, but expressed in different measurement units, e.g. logarithmic units instead of decimal units.
32. Assumption of Linearity: When transformations do not work
- When none of the transformations induces linearity in a relationship, our statistical analysis will underestimate the presence and strength of the relationship, i.e. we lose power.
- We do have the option of changing the way the information in the variables is represented, e.g. substituting several dichotomous variables for a single metric variable. This bypasses the assumption of linearity while still attempting to incorporate the information about the relationship in the analysis.
33. Assumption of Linearity: Creating the scatterplot
Suppose we are interested in the linearity of
the relationship between "hours per day watching
TV" and "total hours spent on the
Internet". The most commonly recommended
strategy for evaluating linearity is visual
examination of a scatter plot.
To obtain a scatter plot in SPSS, select the
Scatter command from the Graphs menu.
34. Assumption of Linearity: Selecting the type of scatterplot
First, click on the thumbnail sketch of a simple scatterplot to highlight it.
Second, click on the Define button to specify the
variables to be included in the scatterplot.
35. Assumption of Linearity: Selecting the variables
First, move the dependent variable netime to the Y Axis text box.
Second, move the independent variable tvhours to the X Axis text box.
Third, click on the OK button to complete the specifications for the scatterplot.
If a problem statement mentions a relationship between two variables without clearly indicating which is the independent variable and which is the dependent variable, the first mentioned variable is taken to be the independent variable.
36. Assumption of Linearity: The scatterplot
The scatterplot is produced in the SPSS output
viewer. The points in a scatterplot are
considered linear if they form a cigar-shaped
elliptical band. The pattern in this scatterplot
is not really clear.
37. Assumption of Linearity: Adding a trendline
To try to determine if the relationship is
linear, we can add a trendline to the chart.
To add a trendline to the chart, we need to open
the chart for editing. To open the chart for
editing, double click on it.
38. Assumption of Linearity: The scatterplot in the SPSS Chart Editor
The chart that we double clicked on is opened for
editing in the SPSS Chart Editor.
To add the trend line, select the Options
command from the Chart menu.
39. Assumption of Linearity: Requesting the fit line
In the Scatterplot Options dialog box, we click
on the Total checkbox in the Fit Line panel in
order to request the trend line.
Click on the Fit Options button to request the
r² coefficient of determination as a measure of
the strength of the relationship.
40. Assumption of Linearity: Requesting r²
First, the Linear regression thumbnail sketch should be highlighted as the type of fit line to be added to the chart.
Second, click on the Display R-square in Legend checkbox to add this item to our output.
Third, click on the Continue button to complete the options request.
41. Assumption of Linearity: Completing the request for the fit line
Click on the OK button to complete the request
for the fit line.
42. Assumption of Linearity: The fit line and r²
The red fit line is added to the chart.
The value of r² (0.0460) suggests that the
relationship is weak.
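The r² shown in the chart legend is just the squared Pearson correlation between the two plotted variables. A quick sketch with scipy's `linregress`; the tvhours/netime values here are hypothetical, not the GSS2000R data:

```python
# Fit a line and report r-squared, as the SPSS chart legend does.
from scipy.stats import linregress

tvhours = [0, 1, 1, 2, 2, 3, 3, 4, 5, 6]   # hypothetical hours watching TV
netime  = [1, 3, 2, 2, 5, 4, 7, 5, 6, 9]   # hypothetical hours on the Internet

fit = linregress(tvhours, netime)
r_squared = fit.rvalue ** 2
print(round(r_squared, 4))  # → 0.7644
```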
43. Assumption of Linearity: Computing the transformations
There are four transformations that we can use to
achieve or improve linearity. The compute
dialogs for these four transformations for
linearity are shown.
44. Assumption of Linearity: Creating the scatterplot matrix
To create the scatterplot matrix, select the
Scatter command in the Graphs menu.
45. Assumption of Linearity: Selecting the type of scatterplot
First, click on the Matrix thumbnail sketch to
indicate which type of scatterplot we want.
Second, click on the Define button to select the
variables for the scatterplot.
46. Assumption of Linearity: Specifications for the scatterplot matrix
First, move the dependent variable, the
independent variable and all of the
transformations to the Matrix Variables list box.
Second, click on the OK button to produce the
scatterplot.
47. Assumption of Linearity: The scatterplot matrix
The scatterplot matrix shows a thumbnail sketch
of scatterplots for each independent variable or
transformation with the dependent variable. The
scatterplot matrix may suggest which
transformations might be useful.
48. Assumption of Linearity: Creating the correlation matrix
To create the correlation matrix, select the Correlate | Bivariate command in the Analyze menu.
49. Assumption of Linearity: Specifications for the correlation matrix
First, move the dependent variable, the
independent variable and all of the
transformations to the Variables list box.
Second, click on the OK button to produce the
correlation matrix.
50. Assumption of Linearity: The correlation matrix
The answers to the problems are based on the
correlation matrix. Before we answer the
question in this problem, we will use a script to
produce the output.
51. Assumption of Homoscedasticity
- Homoscedasticity refers to the assumption that the dependent variable exhibits similar amounts of variance across the range of values for an independent variable.
- While it applies to independent variables at all three measurement levels, the methods that we will use to evaluate homoscedasticity require that the independent variable be non-metric (nominal or ordinal) and the dependent variable be metric (ordinal or interval). When both variables are metric, the assumption is evaluated as part of the residual analysis in multiple regression.
52. Assumption of Homoscedasticity: Evaluating homoscedasticity
- Homoscedasticity is evaluated for pairs of variables.
- There are both graphical and statistical methods for evaluating homoscedasticity.
- The graphical method is called a boxplot.
- The statistical method is the Levene statistic, which SPSS computes for the test of homogeneity of variances.
- Neither of the methods is absolutely definitive.
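The Levene statistic is also available in scipy. One detail worth noting: scipy's default centers each group on its median (the Brown-Forsythe variant), while the classic Levene statistic SPSS reports centers on the mean, so `center="mean"` is used below. The group data are hypothetical:

```python
# Levene test for homogeneity of variance across groups.
from scipy.stats import levene

# Hypothetical "highest degree" scores grouped by marital status.
married  = [0, 1, 1, 2, 4, 4, 4, 4]   # widely spread
single   = [2, 2, 3, 3, 3, 3, 4, 4]   # tightly clustered
divorced = [2, 3, 3, 3, 3, 3, 4, 4]

# center="mean" mirrors the classic Levene statistic; the scipy default,
# center="median", is the Brown-Forsythe variant.
stat, p = levene(married, single, divorced, center="mean")
if p <= 0.01:
    print("reject the null: variance is not homogeneous")
else:
    print("fail to reject: no evidence against homogeneity")
```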
53. Assumption of Homoscedasticity: The boxplot
Each red box shows the middle 50% of the cases for the group, indicating how spread out the group of scores is.
If the variance across the groups is equal, the height of the red boxes will be similar across the groups. If the heights of the red boxes are different, the plot suggests that the variance across groups is not homogeneous. The married group is more spread out than the other groups, suggesting unequal variance.
54. Assumption of Homoscedasticity: Levene test of the homogeneity of variance
The null hypothesis for the test of homogeneity of variance states that the variance of the dependent variable is equal across groups defined by the independent variable, i.e., the variance is homogeneous. Since the probability associated with the Levene statistic (< 0.001) is less than or equal to the level of significance, we reject the null hypothesis and conclude that the variance is not homogeneous.
55. Assumption of Homoscedasticity: Transformations
- When the assumption of homoscedasticity is not supported, we can transform the dependent variable and test it for homoscedasticity. If the transformed variable demonstrates homoscedasticity, we can substitute it in our analysis.
- We use the same three common transformations that we used for normality: the logarithmic transformation, the square root transformation, and the inverse transformation.
- All of these change the measuring scale on the horizontal axis of a histogram to produce a transformed variable that is mathematically equivalent to the original variable.
56. Assumption of Homoscedasticity: When transformations do not work
- When none of the transformations results in
homoscedasticity for the variables in the
relationship, including that variable in the
analysis will reduce our effectiveness at
identifying statistical relationships, i.e. we
lose power.
57. Assumption of Homoscedasticity: Request a boxplot
Suppose we want to test whether the variance in "highest academic degree" is homogeneous across the categories of "marital status."
The boxplot provides a visual image of the distribution of the dependent variable for the groups defined by the independent variable. To request a boxplot, choose the Boxplot command from the Graphs menu.
58. Assumption of Homoscedasticity: Specify the type of boxplot
First, click on the Simple style of boxplot to
highlight it with a rectangle around the
thumbnail drawing.
Second, click on the Define button to specify the
variables to be plotted.
59. Assumption of Homoscedasticity: Specify the dependent variable
First, click on the dependent variable to
highlight it.
Second, click on the right arrow button to move
the dependent variable to the Variable text box.
60. Assumption of Homoscedasticity: Specify the independent variable
First, click on the independent variable to highlight it.
Second, click on the right arrow button to move the independent variable to the Category Axis text box.
61. Assumption of Homoscedasticity: Complete the request for the boxplot
To complete the request for the boxplot, click on
the OK button.
62. Assumption of Homoscedasticity: The boxplot
Each red box shows the middle 50% of the cases for the group, indicating how spread out the group of scores is.
If the variance across the groups is equal, the height of the red boxes will be similar across the groups. If the heights of the red boxes are different, the plot suggests that the variance across groups is not homogeneous. The married group is more spread out than the other groups, suggesting unequal variance.
63. Assumption of Homoscedasticity: Request the test for homogeneity of variance
To compute the Levene test for homogeneity of variance, select the Compare Means | One-Way ANOVA command from the Analyze menu.
64. Assumption of Homoscedasticity: Specify the independent variable
First, click on the independent variable to
highlight it.
Second, click on the right arrow button to move
the independent variable to the Factor text box.
65. Assumption of Homoscedasticity: Specify the dependent variable
First, click on the dependent variable to highlight it.
Second, click on the right arrow button to move the dependent variable to the Dependent List text box.
66. Assumption of Homoscedasticity: The homogeneity of variance test is an option
Click on the Options button to open the options
dialog box.
67. Assumption of Homoscedasticity: Specify the homogeneity of variance test
First, mark the checkbox for the Homogeneity of
variance test. All of the other checkboxes can
be cleared.
Second, click on the Continue button to close the
options dialog box.
68. Assumption of Homoscedasticity: Complete the request for output
Click on the OK button to complete the request for the homogeneity of variance test through the One-Way ANOVA procedure.
69. Assumption of Homoscedasticity: Interpreting the homogeneity of variance test
The null hypothesis for the test of homogeneity of variance states that the variance of the dependent variable is equal across groups defined by the independent variable, i.e., the variance is homogeneous. Since the probability associated with the Levene statistic (< 0.001) is less than or equal to the level of significance, we reject the null hypothesis and conclude that the variance is not homogeneous.
70. Using scripts
- The process of evaluating assumptions requires numerous SPSS procedures and outputs that are time consuming to produce.
- These procedures can be automated by creating an SPSS script. A script is a program that executes a sequence of SPSS commands.
- Though writing scripts is not part of this course, we can take advantage of scripts that I use to reduce the burdensome tasks of evaluating assumptions.
71. Using a script for evaluating assumptions
- The script EvaluatingAssumptionsAndMissingData.exe will produce all of the output we have used for evaluating assumptions.
- Navigate to the link SPSS Scripts and Syntax on the course web page.
- Download the script file EvaluatingAssumptionsAndMissingData.exe to your computer and install it, following the directions on the web page.
72. Open the data set in SPSS
Before using a script, a data set should be open
in the SPSS data editor.
73. Invoke the script in SPSS
To invoke the script, select the Run Script
command in the Utilities menu.
74. Select the script
First, navigate to the folder where you put the script. If you followed the directions, you will have a file with an ".SBS" extension in the C:\SW388R7 folder. If you only see a file with an .EXE extension in the folder, you should double click on that file to extract the script file to the C:\SW388R7 folder.
Second, click on the script name to highlight it.
Third, click on the Run button to start the script.
75. The script dialog
The script dialog box acts similarly to SPSS
dialog boxes. You select the variables to
include in the analysis and choose options for
the output.
76. Complete the specifications - 1
Move the dependent and independent variables from the list of variables to the list boxes. Metric and nonmetric variables are moved to separate lists so the computer knows how you want them treated.
You must also indicate the level of measurement for the dependent variable. By default the metric option button is marked.
77. Complete the specifications - 2
Mark the option button for the type of output you want the script to compute.
Select the transformations to be tested.
Click on the OK button to produce the output.
78. The script finishes
If your SPSS output viewer is open, you will see
the output produced in that window.
Since it may take a while to produce the output,
and since there are times when it appears that
nothing is happening, there is an alert to tell
you when the script is finished. Unless you
are absolutely sure something has gone wrong, let
the script run until you see this alert. When
you see this alert, click on the OK button.
79. Output from the script - 1
The script will produce lots of output. Additional descriptive material in the titles should help link specific outputs to specific tasks. Scroll through the output to locate the items needed to answer the question.
80. Closing the script dialog box
The script dialog box does not close
automatically because we often want to run
another test right away. There are two methods
for closing the dialog box.
Click on the X close box to close the script.
Click on the Cancel button to close the script.
81. Problem 1
- In the dataset GSS2000R, is the following statement true, false, or an incorrect application of a statistic? Use a level of significance of 0.01 for evaluating missing data and assumptions.
- In pre-screening the data for use in a multiple regression of the dependent variable "total hours spent on the Internet" [netime] with the independent variables "age" [age], "sex" [sex], and "income" [rincom98], the evaluation of the assumptions of normality, linearity, and homogeneity of variance did not indicate any need for a caution to be added to the interpretation of the analysis.
- 1. True
- 2. True with caution
- 3. False
- 4. Inappropriate application of a statistic
82. Level of measurement
Since we are pre-screening for a multiple regression problem, we should make sure we satisfy the level of measurement requirement before proceeding.
"Total hours spent on the Internet" [netime] is interval, satisfying the metric level of measurement requirement for the dependent variable.
"Age" age and "highest year of school
completed" educ are interval, satisfying the
metric or dichotomous level of measurement
requirement for independent variables. "Sex"
sex is dichotomous, satisfying the metric or
dichotomous level of measurement requirement for
independent variables. "Income" rincom98 is
ordinal, satisfying the metric or dichotomous
level of measurement requirement for independent
variables. Since some data analysts do not agree
with this convention of treating an ordinal
variable as metric, a note of caution should be
included in our interpretation.
83. Run the script to test normality - 1
To run the script to test assumptions, choose the
Run Script command from the Utilities menu.
84. Run the script to test normality - 2
First, navigate to the SW388R7 folder on your computer.
Second, click on the script name to select it: EvaluatingAssumptionsAndMissingData.SBS.
Third, click on the Run button to open the script.
85. Run the script to test normality - 3
First, move the variables to the list boxes based on the role that the variable plays in the analysis and its level of measurement.
Second, click on the Normality option button to request that SPSS produce the output needed to evaluate the assumption of normality.
Third, mark the checkboxes for the transformations that we want to test in evaluating the assumption.
Fourth, click on the OK button to produce the output.
86. Normality of the dependent variable
The dependent variable "total hours spent on the Internet" [netime] did not satisfy the criteria for a normal distribution. Both the skewness (3.532) and kurtosis (15.614) fell outside the range from -1.0 to +1.0.
87. Normality of the transformed dependent variable
Since "total hours spent on the Internet" [netime] did not satisfy the criteria for normality, we examine the skewness and kurtosis of each of the transformations to see if any of them satisfy the criteria.
The "log of total hours spent on the Internet" [LGNETIME = LG10(NETIME)] satisfied the criteria for a normal distribution. The skewness of the distribution (-0.150) was between -1.0 and +1.0, and the kurtosis of the distribution (0.127) was between -1.0 and +1.0. The "log of total hours spent on the Internet" [LGNETIME = LG10(NETIME)] was substituted for "total hours spent on the Internet" [netime] in the analysis.
88. Normality of the independent variables - 1
The independent variable "age" [age] satisfied the criteria for a normal distribution. The skewness of the distribution (0.595) was between -1.0 and +1.0, and the kurtosis of the distribution (-0.351) was between -1.0 and +1.0.
89. Normality of the independent variables - 2
The independent variable "income" [rincom98] satisfied the criteria for a normal distribution. The skewness of the distribution (-0.686) was between -1.0 and +1.0, and the kurtosis of the distribution (-0.253) was between -1.0 and +1.0.
90. Run the script to test linearity - 1
If the script was not closed after it was used
for normality, we can take advantage of the
specifications already entered. If the script
was closed, re-open it as you would for normality.
First, click on the Linearity option button to
request that SPSS produce the output needed to
evaluate the assumption of linearity.
When the linearity option is selected, a default
set of transformations to test is marked.
91. Run the script to test linearity - 2
Since we have already decided to use the log of
the dependent variable to satisfy normality, that
is the form of the dependent variable we want to
evaluate with the independent variables. Mark
this checkbox for the dependent variable and
clear the others.
Click on the OK button to produce the output.
92Linearity test with age of respondent
The assessment of the linear relationship between
"log of total hours spent on the Internet"
LGNETIME = LG10(NETIME) and "age" age
indicated that the relationship was weak, rather
than nonlinear. The statistical probabilities
associated with the correlation coefficients
measuring the relationship with the untransformed
independent variable (r = 0.074, p = 0.483), the
logarithmic transformation (r = 0.119, p = 0.257),
the square root transformation (r = 0.096,
p = 0.362), and the inverse transformation
(r = 0.164, p = 0.116) were all greater than the
level of significance for testing assumptions
(0.01). There was no evidence that the
assumption of linearity was violated.
93Linearity test with respondent's income
The assessment of the linear relationship between
"log of total hours spent on the Internet"
LGNETIME = LG10(NETIME) and "income" rincom98
indicated that the relationship was weak, rather
than nonlinear. The statistical probabilities
associated with the correlation coefficients
measuring the relationship with the untransformed
independent variable (r = -0.053, p = 0.658), the
logarithmic transformation (r = 0.063, p = 0.600),
the square root transformation (r = 0.060,
p = 0.617), and the inverse transformation
(r = 0.073, p = 0.540) were all greater than the
level of significance for testing assumptions
(0.01). There was no evidence that the
assumption of linearity was violated.
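The linearity screen described above compares the dependent variable's correlation with the raw independent variable and with its log, square root, and inverse transformations. A rough Python sketch of that comparison, on hypothetical simulated data rather than the actual GSS variables, might look like this:

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
# Hypothetical data: an age-like IV and a log-scale DV with a weak relationship.
age = rng.uniform(20, 80, size=150)
lg_dv = 1.5 - 0.005 * age + rng.normal(0, 0.4, size=150)

# The same transformations the script tests for each metric IV.
forms = {
    "untransformed": age,
    "logarithmic": np.log10(age),
    "square root": np.sqrt(age),
    "inverse": 1.0 / age,
}
results = {name: pearsonr(iv, lg_dv) for name, iv in forms.items()}
for name, (r, p) in results.items():
    print(f"{name}: r={r:.3f}, p={p:.3f}")
```

If every p-value exceeds the 0.01 level used for testing assumptions, the relationship is interpreted as weak rather than nonlinear, exactly as in the slides.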
94Run the script to test homogeneity of variance -
1
If the script was not closed after it was used
for normality, we can take advantage of the
specifications already entered. If the script
was closed, re-open it as you would for normality.
First, click on the Homogeneity of variance
option button to request that SPSS produce the
output needed to evaluate the assumption of
homogeneity.
When the homogeneity of variance option is
selected, a default set of transformations to
test is marked.
95Run the script to test homogeneity of variance -
2
In this problem, we have already decided to use
the log transformation for the dependent
variable, so we only need test it. Next, clear
all of the transformation checkboxes except for
Logarithmic.
Finally, click on the OK button to produce the
output.
96Levene test of homogeneity of variance
Based on the Levene test, the variance in "log of
total hours spent on the Internet"
LGNETIME = LG10(NETIME) was homogeneous for the
categories of "sex" sex. The probability
associated with the Levene statistic (0.166) was
p = 0.685, greater than the level of significance
for testing assumptions (0.01). The null
hypothesis that the group variances were equal was
not rejected. The homogeneity of variance
assumption was satisfied.
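The Levene test reported above can be reproduced in outline with scipy. The two groups below are hypothetical stand-ins for the male and female values of the log-transformed dependent variable, not the actual GSS data.

```python
import numpy as np
from scipy.stats import levene

rng = np.random.default_rng(2)
# Hypothetical log-hours for two sex categories with equal spread.
male = rng.normal(1.0, 0.5, size=60)
female = rng.normal(1.1, 0.5, size=60)

# scipy's levene defaults to center='median' (the Brown-Forsythe variant).
stat, p = levene(male, female)
homogeneous = p > 0.01   # level of significance for testing assumptions
print(f"Levene statistic={stat:.3f}, p={p:.3f}, homogeneous={homogeneous}")
```

A p-value above 0.01 means the null hypothesis of equal group variances is not rejected, so the homogeneity of variance assumption is considered satisfied.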
97Answer 1
- In pre-screening the data for use in a multiple
regression of the dependent variable "total hours
spent on the Internet" netime with the
independent variables "age" age, "sex" sex,
and "income" rincom98, the evaluation of the
assumptions of normality, linearity, and
homogeneity of variance did not indicate any need
for a caution to be added to the interpretation
of the analysis.
- 1. True
- 2. True with caution
- 3. False
- 4. Inappropriate application of a statistic
The logarithmic transformation of the dependent
variable LGNETIME = LG10(NETIME) solved the only
problem with normality that we encountered. In
that form, the relationships with the metric
independent variables were weak, but there was no
evidence of nonlinearity. The variance of the log
transform of the dependent variable was
homogeneous for the categories of the nonmetric
variable sex. No cautions were needed because of
a violation of assumptions. A caution was needed
because respondent's income was ordinal
level. The answer to the problem is true with
caution.
98Problem 2
- In the dataset 2001WorldFactbook, is the
following statement true, false, or an incorrect
application of a statistic? Use a level of
significance of 0.01 for evaluating missing data
and assumptions.
- In pre-screening the data for use in a multiple
regression of the dependent variable "life
expectancy at birth" lifeexp with the
independent variables "population growth rate"
pgrowth, "percent of the total population who
was literate" literacy, and "per capita GDP"
gdp, the evaluation of the assumptions of
normality, linearity, and homogeneity of variance
did not indicate any need for a caution to be
added to the interpretation of the analysis.
- 1. True
- 2. True with caution
- 3. False
- 4. Inappropriate application of a statistic
99Level of measurement
Since we are pre-screening for a multiple
regression problem, we should make sure we
satisfy the level of measurement before
proceeding.
"Life expectancy at birth" lifeexp is interval,
satisfying the metric level of measurement
requirement for the dependent variable.
"Population growth rate" pgrowth, "percent of
the total population who was literate" literacy,
and "per capita GDP" gdp are interval,
satisfying the metric or dichotomous level of
measurement requirement for independent
variables.
100Run the script to test normality - 1
To run the script to test assumptions, choose the
Run Script command from the Utilities menu.
101Run the script to test normality - 2
First, navigate to the SW388R7 folder on your
computer.
Second, click on the script name to select it:
EvaluatingAssumptionsAndMissingData.SBS.
Third, click on the Run button to open the script.
102Run the script to test normality - 3
First, move the variables to the list boxes based
on the role that the variable plays in the
analysis and its level of measurement.
Second, click on the Normality option button to
request that SPSS produce the output needed to
evaluate the assumption of normality.
Third, mark the checkboxes for the
transformations that we want to test in
evaluating the assumption.
Fourth, click on the OK button to produce the
output.
103Normality of the dependent variable
The dependent variable "life expectancy at birth"
lifeexp satisfied the criteria for a normal
distribution. The skewness of the distribution
(-0.997) was between -1.0 and 1.0 and the
kurtosis of the distribution (0.005) was between
-1.0 and 1.0.
104Normality of the first independent variable
The independent variable "population growth rate"
pgrowth did not satisfy the criteria for a
normal distribution. Both the skewness (2.885)
and kurtosis (22.665) fell outside the range from
-1.0 to 1.0.
105Normality of transformed independent variable
Neither the logarithmic (skew = -0.218,
kurtosis = 1.277), the square root (skew = 0.873,
kurtosis = 5.273), nor the inverse transformation
(skew = -1.836, kurtosis = 5.763) induced normality
in the variable "population growth rate"
pgrowth. A caution was added to the findings.
106Normality of the second independent variable
The independent variable "percent of the total
population who was literate" literacy did not
satisfy the criteria for a normal distribution.
The kurtosis of the distribution (0.081) was
between -1.0 and 1.0, but the skewness of the
distribution (-1.112) fell outside the range from
-1.0 to 1.0.
107Normality of transformed independent variable
Since the distribution was skewed to the left, it
was necessary to reflect, or reverse code, the
values for the variable before computing the
transformation.
The "square root of percent of the total
population who was literate (using reflected
values)" SQLITERA = SQRT(101-LITERACY) satisfied
the criteria for a normal distribution. The
skewness of the distribution (0.567) was between
-1.0 and 1.0 and the kurtosis of the
distribution (-0.964) was between -1.0 and 1.0.
The "square root of percent of the total
population who was literate (using reflected
values)" SQLITERA = SQRT(101-LITERACY) was
substituted for "percent of the total population
who was literate" literacy in the analysis.
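The reflect-then-transform step above (reflecting a left-skewed variable about its maximum plus one, then taking the square root, mirroring SQLITERA = SQRT(101-LITERACY) for a 0-100 percent scale) can be sketched as follows; the literacy values are simulated, not the 2001WorldFactbook data.

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(3)
# Hypothetical literacy percentages: most countries near 100, a long left tail.
literacy = np.clip(100 - rng.lognormal(mean=2.5, sigma=0.8, size=150), 0, 100)

# Reflect about (max + 1) so the tail points right, then take the square root.
sqlitera = np.sqrt(101 - literacy)

print(f"skew before: {skew(literacy):.3f}, after: {skew(sqlitera):.3f}")
```

Reflection converts the negative skew into positive skew that the square root can compress, which is why the transformation succeeds where a direct square root of a left-skewed variable would not.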
108Normality of the third independent variable
The independent variable "per capita GDP" gdp
did not satisfy the criteria for a normal
distribution. The kurtosis of the distribution
(0.475) was between -1.0 and 1.0, but the
skewness of the distribution (1.207) fell outside
the range from -1.0 to 1.0.
109Normality of transformed independent variable
The "square root of per capita GDP"
SQGDP = SQRT(GDP) satisfied the criteria for a
normal distribution. The skewness of the
distribution (0.614) was between -1.0 and 1.0
and the kurtosis of the distribution (-0.773) was
between -1.0 and 1.0. The "square root of per
capita GDP" SQGDP = SQRT(GDP) was substituted
for "per capita GDP" gdp in the analysis.
110Run the script to test linearity - 1
If the script was not closed after it was used
for normality, we can take advantage of the
specifications already entered. If the script
was closed, re-open it as you would for normality.
First, click on the Linearity option button to
request that SPSS produce the output needed to
evaluate the assumption of linearity.
When the linearity option is selected, a default
set of transformations to test is marked.
Click on the OK button to produce the output.
111Linearity test with population growth rate
The assessment of the linearity of the
relationship between "life expectancy at birth"
lifeexp and "population growth rate" pgrowth
indicated that the relationship could be
considered linear because the probability
associated with the correlation coefficient for
the relationship (r = -0.262) was statistically
significant (p < 0.001) and none of the
statistically significant transformations for
population growth rate had a relationship that
was substantially stronger. The relationship
between the untransformed variables was assumed
to satisfy the assumption of linearity.
112Linearity test with population literacy
The transformation "square root of percent of the
total population who was literate (using
reflected values)" SQLITERA = SQRT(101-LITERACY)
was incorporated in the analysis in the
evaluation of normality. Additional
transformations for linearity were not
considered.
113Linearity test with per capita GDP
The transformation "square root of per capita
GDP" SQGDP = SQRT(GDP) was incorporated in the
analysis in the evaluation of normality.
Additional transformations for linearity were not
considered.
114Run the script to test homogeneity of variance -
1
There were no nonmetric variables in this
analysis, so the test of homogeneity of variance
was not conducted.
115Answer 2
- In pre-screening the data for use in a multiple
regression of the dependent variable "life
expectancy at birth" lifeexp with the
independent variables "population growth rate"
pgrowth, "percent of the total population who
was literate" literacy, and "per capita GDP"
gdp, the evaluation of the assumptions of
normality, linearity, and homogeneity of variance
did not indicate any need for a caution to be
added to the interpretation of the analysis.
- 1. True
- 2. True with caution
- 3. False
- 4. Inappropriate application of a statistic
Two transformations were substituted to satisfy
the assumption of normality: the "square root of
percent of the total population who was literate
(using reflected values)" SQLITERA = SQRT(101-LITERACY)
and the "square root of per capita GDP"
SQGDP = SQRT(GDP). However, none of the
transformations induced normality in the variable
"population growth rate" pgrowth, and a caution
was added to the findings. The answer to the
problem is false: a caution was needed because
"population growth rate" pgrowth did not
satisfy the assumption of normality and none of
the transformations were successful in inducing
normality.
116Steps in evaluating assumptions: level of
measurement
The following is a guide to the decision process
for answering problems about assumptions for
multiple regression:
Is the dependent variable metric and the
independent variables metric or dichotomous?
- No: incorrect application of a statistic.
- Yes: proceed to the checks of normality,
linearity, and homogeneity of variance.
117Steps in evaluating assumptions: assumption of
normality for metric variables
Does the dependent variable satisfy the criteria
for a normal distribution?
- Yes: assumption satisfied; use the untransformed
variable in the analysis.
- No: does one or more of the transformations
satisfy the criteria for a normal distribution?
  - Yes: assumption satisfied; use the transformed
variable with the smallest skew.
  - No: assumption not satisfied; use the
untransformed variable in the analysis and add a
caution to the interpretation.
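The decision steps above can be encoded as a small function. This is a sketch of the rule as stated (skewness and kurtosis both within [-1, 1]; among passing transformations, keep the one with the smallest absolute skew), applied to simulated positive data; the function name and return convention are illustrative, not part of the SPSS script.

```python
import numpy as np
from scipy.stats import skew, kurtosis

def choose_form(x):
    """Return (form to use, caution flag) per the normality decision steps."""
    def ok(v):
        return abs(skew(v)) <= 1.0 and abs(kurtosis(v)) <= 1.0
    if ok(x):
        return "untransformed", False
    # Transformations tested by the script; all require positive values here.
    candidates = {"logarithmic": np.log10(x),
                  "square root": np.sqrt(x),
                  "inverse": 1.0 / x}
    passing = {n: v for n, v in candidates.items() if ok(v)}
    if passing:
        best = min(passing, key=lambda n: abs(skew(passing[n])))
        return best, False
    return "untransformed", True   # assumption not satisfied: add caution

rng = np.random.default_rng(4)
x = rng.lognormal(1.0, 1.0, size=300)   # strongly right-skewed variable
print(choose_form(x))
```

For right-skewed data like these, the untransformed variable fails and the logarithm passes, so the function selects the log form with no caution, matching the flowchart's middle branch.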
118Steps in evaluating assumptions: assumption of
linearity for metric variables
If the dependent variable was transformed for
normality, substitute the transformed dependent
variable in the test for the assumption of
linearity.
Was the independent variable transformed for
normality?
- Yes: skip the test.
- No: is the probability of the correlation (r)
for the relationship between the IV and the DV
< the level of significance?
  - Yes: is the correlation (r) for any
transformed IV statistically significant AND
greater than the r for the untransformed IV by
0.20?
    - No: assumption satisfied; use the
untransformed independent variable.
    - Yes: assumption satisfied; use the
transformed variable with the highest r.
  - No: is the probability of the correlation (r)
for the relationship between any transformed IV
and the DV < the level of significance?
    - Yes: assumption satisfied; use the
transformed variable with the highest r.
    - No: interpret the relationship as weak, not
nonlinear. No caution needed.
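As a sketch, the branching above can be written as a function over precomputed correlations. The 0.20 threshold and the outcome labels follow the flowchart; the function name and dictionary layout are illustrative, and the example inputs are the age correlations reported earlier (slide 92).

```python
def linearity_decision(r_raw, p_raw, transformed, alpha=0.01):
    """transformed maps a transformation name to its (r, p) with the DV.

    Returns the form of the IV to use, or the weak-relationship interpretation.
    """
    # Transformations whose correlation with the DV is statistically significant.
    sig = {name: r for name, (r, p) in transformed.items() if p < alpha}
    if p_raw < alpha:
        # Raw relationship is significant; switch only for a substantially
        # stronger transformation (r larger by more than 0.20).
        better = {n: r for n, r in sig.items() if abs(r) > abs(r_raw) + 0.20}
        if better:
            return max(better, key=lambda n: abs(better[n]))
        return "untransformed"
    if sig:
        # Raw not significant but a transformation is: evidence of nonlinearity.
        return max(sig, key=lambda n: abs(sig[n]))
    return "weak, not nonlinear (no caution)"

# Age example from slide 92: nothing is significant at the 0.01 level.
print(linearity_decision(0.074, 0.483,
      {"log": (0.119, 0.257), "sqrt": (0.096, 0.362), "inverse": (0.164, 0.116)}))
```

With the age correlations, every p-value exceeds 0.01, so the function lands on the weak-not-nonlinear interpretation, the same conclusion the slides reach.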
119Steps in evaluating assumptions: homogeneity of
variance for nonmetric variables
If the dependent variable was transformed for
normality, substitute the transformed dependent
variable in the test for the assumption of
homogeneity of variance.
Is the probability of the Levene statistic < the
level of significance?
- No: assumption satisfied.
- Yes: assumption not satisfied; add a caution to
the interpretation.