SESSION 2 ANOVA and regression

About This Presentation

Title:

SESSION 2 ANOVA and regression

Description:

In ANOVA, the rejection of the null hypothesis leaves many questions unanswered. ... A second-order polynomial (parabola) changes direction ONCE and can be fitted to ... – PowerPoint PPT presentation

Number of Views:106

Avg rating:3.0/5.0

Slides: 155

Provided by: colin111

Category:

more less

Transcript and Presenter's Notes

Title: SESSION 2 ANOVA and regression

1
SESSION 2 ANOVA and regression
2
Only the starting point

In ANOVA, the rejection of the null hypothesis
leaves many questions unanswered.
Further analysis is needed to pinpoint the
crucial patterns in the data.
So, unlike the t test, the ANOVA is often just
the first step in what may be quite an extensive
statistical analysis.

3
Comparisons among the five treatment means
4
Simple and complex comparisons

You might want to make SIMPLE COMPARISONS between
the mean for each of the four drug conditions and
the Placebo mean.
Or you might want to compare the Placebo mean
with the mean of the four drug means. This is a
COMPLEX COMPARISON.

5
(No Transcript)
6
Non-independence of comparisons

The simple comparison of M5 with M1 and the
complex comparison are not independent.
The value of M5 feeds into the value of the
average of the means for the drug groups.

7
Systems of comparisons

With a complex experiment, interest centres on
SYSTEMS of comparisons.
Which comparisons are independent or ORTHOGONAL?
What is the probability, under the null
hypothesis, that at least one comparison will
show significance?
How much variance can we attribute to different
comparisons?

8
The crumpled paper fallacy

We owe this to Thouless.
Uncrumple a piece of paper.
The wrinkles are unique.
Therefore, they are statistically significant.
Data sets from complex experiments may, ex post
facto, show all manner of interesting patterns.
Inferences from such patterns are dangerous.

9
Over-analysis?

You have run a complex experiment and submitted a
paper to a journal.
Your reviewers will need to be convinced that
what you are reporting isnt just a chance
pattern thrown up by sampling error.
You may well be asked to specify orthogonal
comparisons and test them for significance.

10
Linear functions

Y is a linear function of X if the graph of Y
upon X is a straight line.
For example, temperature in degrees Fahrenheit is
a linear function of temperature in degrees
Celsius.

11
F is a linear function of C
Degrees Fahrenheit
P
Q
Intercept ? 32
(0, 0)
Degrees Celsius
12
(No Transcript)
13
(No Transcript)
14
Linear contrasts

Any comparison can be expressed as a sum of
terms, each of which is a product of a treatment
mean and a coefficient such that the coefficients
sum to zero.
When so expressed, the comparison is a LINEAR
CONTRAST, because it has the form of a linear
function.
It looks artificial at first, but this notation
enables us to study the properties of systems of
comparisons among the treatment means.

15
(No Transcript)
16
(No Transcript)
17
More compactly, if there are k treatment
groups, we can write
18
(No Transcript)
19
(No Transcript)
20
(No Transcript)
21
(No Transcript)
22
Helmert contrasts

Compare the first mean with the mean of the other
means.
Drop the first mean and compare the second mean
with the mean of the remaining means. Drop the
second mean.
Continue until you arrive at a comparison between
the last two means.

23
Helmert contrasts

Our first contrast is
1, -¼, -¼, -¼, -¼
Our second contrast is
0, 1, -? , -?, -?
Our third contrast is
0, 0, 1, -½, -½
Our fourth is
0, 0, 0, 1, -1

24
(No Transcript)
25
Orthogonal contrasts

The first contrast in no way constrains the value
of the second, because the first mean has been
dropped.
The first two contrasts do not affect the third,
because the first two means have been dropped.
This is a set of four independent or ORTHOGONAL
contrasts.

26
The orthogonal property

The sum of the products of corresponding
coefficients in any pair of rows is zero.
This means that we have an ORTHOGONAL contrast
set.

27
Size of an orthogonal set

In our example, with five treatment means, there
are four orthogonal contrasts.
In general, for an array of k means, you can
construct a set of, at most, k-1 orthogonal
contrasts.
In the present ANOVA example, k 5, so the rule
tells us that there can be no more than 4
orthogonal contrasts in the set.
Several different orthogonal sets, however, can
often be constructed for the same set of means.

28
Accounting for variability
grand mean

The building block for any variance estimate is a
DEVIATION of some sort.
The TOTAL DEVIATION of any score from the grand
mean (GM) can be divided into 2 components 1. a
BETWEEN GROUPS component 2. a WITHIN GROUPS
component.

29
Breakdown (partition) of the total sum of squares

If you sum the squares of the deviations over all
50 scores, you obtain an expression which breaks
down the total variability in the scores into
BETWEEN GROUPS and WITHIN GROUPS components.

30
Contrast sums of squares

We have seen that in the one-way ANOVA, the value
of SSbetween reflects the sizes of the
differences among the treatment means.
In the same way, it is possible to measure the
importance of a contrast by calculating a sum of
squares which reflects the variation attributable
to that contrast alone
We can use an F statistic to test each contrast
for significance.

31
Formula for a contrast sum of squares
32
(No Transcript)
33
Here, once again, is our set of Helmert
contrasts, to which I have added the values of
the five treatment means
34
(No Transcript)
35
(No Transcript)
36
(No Transcript)
37
(No Transcript)
38
(No Transcript)
39
(No Transcript)
40
Testing a contrast sum of squares for
significance
41
Two approaches

A contrast is a comparison between two means.
You can therefore run a one-way, 2-group ANOVA.
Or you can use a t-test.
The tests are equivalent.

42
Degrees of freedom of a contrast sum of squares

A contrast sum of squares compares two means.
A contrast sum of squares, therefore, has ONE
degree of freedom, because the two deviations
from the grand mean sum to zero.

43
(No Transcript)
44
(No Transcript)
45
(No Transcript)
46
(No Transcript)
47
(No Transcript)
48
Contrasts with SPSS

Two approaches
The simpler is through the One-Way option in the
Compare Means menu.
The General Linear Model, however, provides many
more useful statistics.
I suggest you begin by exploring contrasts with
the One-Way procedure first, then move on to the
General Linear Model menu.

49
(No Transcript)
50
(No Transcript)
51
Contrasts with SPSS
The coefficients must be integers
52
(No Transcript)
53
Our Helmert contrasts

Each ringed item is a MEAN.
In the top row, the Placebo mean is compared with
the mean of the drug means.
In the third row, the mean for Drug B is compared
with the mean of the means for Drug C and Drug D.

54
Summary

A contrast is a comparison between two means.
The contrasts can therefore be tested with either
F or t. (F t2.)
The contrast sums of squares sum to the value of
SSbetween.

55
(No Transcript)
56
Heterogeneity of variance

The lower part of the table shows the results of
tests of the same contrasts when homogeneity of
variance is not assumed.
Notice that the degrees of freedom have lower
values.

57
Non-orthogonal contrasts

Contrasts dont have to be independent.
For example, you might wish to compare each of
the four drug groups with the Placebo group.
What you want are SIMPLE CONTRASTS.

58
Simple contrasts

These are linear contrasts each row sums to
zero.
But they are not orthogonal with some pairings,
the sum of products of corresponding coefficients
is not zero.

59
Simple contrasts with SPSS

Here are the entries for the first contrast,
which is between the Placebo and Drug A groups.
Below that are the entries for the final contrast
between the Placebo and Drug D groups.

60
The results

In the column headed Value of Contrast, are the
differences between pairs of treatment means.
For example, Drug A mean minus Placebo mean
7.90 - 8.00 -.10. Drug D Placebo 13.00
8.00 5.00.

61
Trend analysis

Sometimes the factor (independent variable) may
be quantitative and continuous.
The theory of contrasts can be extended to study
trends in the relationship between the factor and
the dependent variable.
The following slides outline the procedure.

62
Polynomials

A POLYNOMIAL is a sum of terms, each of which is
a product of a constant and a power of the same
variable.
The highest power n is the DEGREE of the
polynomial.

63
Graphs of some polynomials
QUADRATIC
LINEAR
QUARTIC
CUBIC
64
Fitting points with polynomials

A first-order polynomial (line) does not change
direction at all. But you can adjust the
constants to fit any TWO points.
A second-order polynomial (parabola) changes
direction ONCE and can be fitted to any THREE
points.
A third-order polynomial changes direction TWICE
and can be fitted to any FOUR points.

65
Fitting points with polynomials

In general, any k points can be fitted perfectly
by a polynomial of order k 1.

66
QUADRATIC
LINEAR
CUBIC
67
Another drug experiment

In the drug experiment, the independent variable
(or factor) comprised a set of five qualitatively
different conditions.
There was no intrinsic ordering of the
categories. The order in which the variables
appeared in Data View was entirely arbitrary.
Now suppose that the five groups vary in the
extent to which the same drug was present.
The Placebo, A, B, C and D groups have dosages of
0, 10, 20, 30 and 40 units of the drug,
respectively.
The five groups are now ordered with respect to a
CONTINUOUS INDEPENDENT VARIABLE.

68
A linear trend

There is evidence of a linear TREND in these
data.
The pattern, however, is imperfect other trends
(e.g. quadratic) may be present as well. On the
other hand, the irregularity may reflect random
error.

69
Capturing the linear trend

Consider the linear contrast
-2 -1 0 1 2
If we plot these values against X (the
concentration of the drug), we shall have the
graph of a straight line.

LINEAR
70
Polynomial coefficients

The coefficients in this contrast are actually
values of the polynomial
y x 3
The sum of squares of this contrast captures or
reflects the linear trend in the data.

71
Orthogonal polynomial contrasts

Here is a set of orthogonal contrasts.
The values in each row are values of one
polynomial for various values of X, the
continuous independent variable.
The top row is a first degree (linear)
polynomial, the next row is a second degree
(quadratic) polynomial and so on.

72
Trend analysis

Although the entries in a row are values of the
same polynomial (whether linear or not), they are
still the coefficients of a linear contrast
they sum to zero moreover, the products of the
corresponding coefficients also sum to zero. We
have an ORTHOGONAL SET of contrasts.
Associated with each contrast is a sum of squares
which captures that particular trend in the data.
The contrasts are tested in the usual way.

73
Ordering a linear polynomial contrast
Specify a linear (1st degree) polynomial

You must check the Polynomial box and specify the
order of the polynomial.
Orthogonal polynomial sets are obtainable from
tables in statistics books, such as Howell
(2007), which provide orthogonal sets for sets of
means of various sizes.

You must check the Polynomial box
74
Ordering a quadratic polynomial contrast
Specify a 2nd degree (quadratic) polynomial

You must now specify a Quadratic (2nd degree)
polynomial.
The coefficients are entered in the usual way.

75
A trend analysis

The relevant results are ringed.
You can see that only the linear trend is
significant.
This formal analysis confirms the appearance of
the profile plot.

76
Partition of the between groups sum of squares

Since we have an orthogonal set of contrasts,
their sums of squares sum to the ANOVA between
groups sum of squares.

77
Deviations in the ANOVA table

The DEVIATION sum of squares is what remains of
SSbetween when the last contrast sum of squares
has been subtracted.
Each deviation has one degree of freedom fewer
than the previous deviation (if there is one).

78
The deviations
The first deviation SS (with df 3) is obtained
by subtracting the linear SS from SSbetween
The second deviation has df 2. Both the linear
and the quadratic trends have now been removed.
79
The deviation terms
80
The t tests

The t tests produce exactly the same p-values as
the F tests.
As usual, F t2

81
Equivalence of F and t
82
Alternative analyses

As usual, t-tests are also made without assuming
homogeneity of variance (lower half).
The values of df are markedly lower, suggesting
that we should go by the tests in the lower part
of the table.

83
A useful question

Are you making comparisons or measuring
association?
If youre making comparisons, you may need
statistics such as the t-test and ANOVA
If youre investigating associations, you will
need techniques such as correlation and
regression.

84
Purpose of this section

Today I intend to build some bridges between the
statistics of comparison and association.
I hope to show that in some circumstances, the
making of a comparison and the investigation of
an association are equivalent.

85
Some regression fundamentals
86
A scatterplot
87
A strong linear association

A narrowly elliptical scatterplot like this
indicates a strong positive linear association
between the two variables.

88
The Pearson correlation
89
(No Transcript)
90
(No Transcript)
91
Warning!

This significance test presupposes that the
distribution is BIVARIATE NORMAL, which implies
that the scatterplot is elliptical (or circular)
in shape.
ALWAYS CHECK THIS OUT BY INSPECTING THE
SCATTERPLOT.

92
Independence

Select a large sample at random from a population
and array the values in a column.
Select another sample from the same population at
random and array those values alongside the
values of the first sample.
The two samples are independent, because the data
are not paired in any meaningful sense.
The correlation between the two columns of values
should be approximately zero.

93
Scatterplot indicating no association
94
Regression

Regression is a set of techniques for exploiting
the presence of statistical association among
variables to make predictions of values of one
variable (the DV or CRITERION) from knowledge of
the values of other variables (the IVs or
REGRESSORS).

95
Simple and multiple regression

In the simplest case, there is just one IV. This
is known as SIMPLE regression.
In MULTIPLE regression, there are two or more IVs.

96
The regression line of actual violence upon film
preference
97
The regression line of Violence upon Preference

The REGRESSION LINE is the line that fits the
points best from the point of view of predicting
Actual Violence from Preference.
(A different line would be drawn were we to try
to predict Preference from Actual Violence.)

98
(No Transcript)
99
Here is the equation of the regression line
100
(No Transcript)
101
(No Transcript)
102
Residual scores

Suppose we use the regression line of Y upon X to
predict the value of a persons score Y from a
particular value of X.
A RESIDUAL (e) is the difference between a
persons true score on Y and the point on the
regression line.

103
(No Transcript)
104
The residuals are shown in the next
picture
105
(No Transcript)
106
Summary

B1 is the slope and B0 is the intercept.
Y/ is the Y-coordinate of the point on the line
above the value X.
An increase of one unit on variable X will result
in an estimated increase of (B1) units on
variable Y.
A NEGATIVE value of B1 means that an increase of
one unit on variable X will result in an
estimated REDUCTION of B1 units on Y.

regression coefficient (slope)
regression constant (intercept)
107
The least-squares criterion

The regression line is the best-fitting line
in the sense that it minimises the sum of the
squares of the residuals.

108
(No Transcript)
109
Breakdown of the total sum of squares
110
Coefficient of determination
111
Explanation
112
The coefficient of determination (r2)

The COEFFICIENT OF DETERMINATION (r2) is the
proportion of the variance of the predicted
variable accounted for by regression.
The coefficient of determination can take values
within the range from 0 to 1, inclusive.

113
Range of r
114
(No Transcript)
115
Positive bias

The coefficient of determination is positively
biased as an estimator.
The statistic known as adjusted R2 attempts to
correct this bias.

116
(No Transcript)
117
(No Transcript)
118
Using more than one regressor

By analogous methods, we could try to predict a
persons actual violence from exposure to screen
violence and number of years of education.
This is a problem in MULTIPLE REGRESSION.

119
Multiple regression
120
Geometrical interpretation

This is the equation of a plane (or hyperplane)
with slopes B1, B2, ,Bp with respect to axes X1,
X2, , Xp and intercept B0.
The slopes are the PARTIAL REGRESSION
COEFFICIENTS and the intercept is the CONSTANT.

121
Regression coefficients

In simple regression the REGRESSION COEFFICIENT
(B1 ) is the estimated change in units of the DV
that would result from an increase of one unit in
the IV.
In multiple regression, a PARTIAL REGRESSION
COEFFICIENT such as B1 is the estimated change in
the DV resulting from an increase of one unit in
the IV X1 with ALL OTHER IVs HELD CONSTANT.

122
The multiple correlation coefficient R

The MULTIPLE CORRELATION COEFFICIENT is the
correlation between the estimates Y/ and the
actual values of the DV (Y).
The COEFFICIENT OF DETERMINATION (R2) is the
proportion of the variance of Y that is accounted
for by regression.

123
Range of R

The multiple correlation coefficient R can only
have non-negative values
0 R 1
This is because the regression line (or plane)
cannot have a slope of opposite sign to that of
the elliptical (or hyperelliptical) scatterplot.

124
Attribution of variance to regressors

If the IVs are uncorrelated, it is easy to
attribute variance in Y to each of the
independent variables X.

125
(No Transcript)
126
Correlated IVs

When the IVs are measured, they always correlate
to at least some extent.
It is then impossible to attribute variance
unequivocally to any particular IV.

127
(No Transcript)
128
(No Transcript)
129
Dummy variables

Information about group membership is carried by
a grouping variable.
A DUMMY VARIABLE has only two values 0 and 1,
where 0 usually denotes the control or comparison
condition in this case the Placebo.

130
Point-biserial correlation

If we correlate the scores in the Group column
with the dummy variable in the Score column, we
obtain what is known as a POINT-BISERIAL
CORRELATION.
The meaning of point-biserial is lost in the
mists of antiquity.
The point is that we are correlating a measured
variable with code numbers for category
membership.

131
(No Transcript)
132
A link

The point biserial correlation is of limited
value as a descriptive statistic.
However, it forms a useful conceptual bridge
between the statistics of comparison (t-test) and
association (correlation).

133
Regression upon dummy variables

We shall now regress the scores that people
achieved in the Caffeine experiment against those
of the dummy variable carrying group membership.

134
The regression line will pass through the group
means
0
1
X
135
Why?

OLS regression minimises the sums of the squares
of the residuals.
In either group of scores, the sum of the squared
deviations about the mean is a minimum.

136
The sum of squares of deviations about the mean
is a minimum
137
The regression statistics

When we regress the Score variable against the
dummy variable, the intercept of the regression
line is the mean score of the Placebo group.
The slope of the regression line is the
difference between the means of the Caffeine and
Placebo groups.

138
The regression statistics
139
(No Transcript)
140
(No Transcript)
141
Significance tests

The intercept (Constant) is 9.25, the value of
the Placebo mean.
The slope is 2.65, which is 11.90 9.25, the
difference between the Caffeine and Placebo
means.
t(38) 2.604 p .013. This is exactly the
result we obtained with the independent samples t
test.

142
Equivalence of ANOVA and regression

When we test the slope of the regression line for
significance, we are also testing the difference
between the Caffeine and Placebo means for
significance.
Since (in the 2-group case) the F and t tests are
equivalent, the regression ANOVA table is
identical with the one-way ANOVA table we
obtained before.

143
Dummy coding for the k-group case

Since MSbetween has only four degrees of freedom,
regression will predict the treatment means
perfectly if the Score variable is regressed upon
four dummy variables X1, X2, X3 and X4.
As with the two-group example, an interesting
equivalence emerges.

144
Dummy coding for the k-group case
145
The one-way ANOVA statistics
146
The regression statistics
Same as the ANOVA value of F.
We see that B0 is the Placebo mean and B1, B2, B3
and B4 are the differences between the means for
the 4 drug conditions and the Placebo mean.
147
In summary

When the scores in the five-group drug experiment
are regressed upon 4 dummy variables,
The regression constant or intercept B0 is the
Placebo mean.
The partial regression coefficients are the
differences between the drug conditions and the
Placebo mean.
The regression sum of squares is equal to the
ANOVA between groups sum of squares.

148
In summary

The t - tests of the regression coefficients are
equivalent to the t-tests of the sums of squares
associated with the four contrasts.

149
Eta squared

Returning to the one-way ANOVA, recall that eta
squared (also known as the CORRELATION RATIO) is
defined as the ratio of the between groups and
within groups mean squares.
Its theoretical range of variation is from zero
(no differences among the means) to unity (no
variance in the scores of any group, but
different values in different groups).
In our example, ?2 .447

150
Eta squared revisited

If the scores from a k group experiment are
regressed upon k 1 dummy variables, the
square of the multiple correlation coefficient R
is the proportion of variance of the scores
accounted for by differences among the treatment
means.
Eta squared is R2, which I think is why it is
also termed the correlation ratio.

151
Formula for SS?

We can think of a contrast sum of squares as the
between treatments variability that is accounted
for by a particular contrast.
The sums of squares for orthogonal contrasts add
up to the ANOVA between groups sum of squares.

152
The contrast sum of squares revisited
153
Building bridges

In these two sessions, in addition to revising
(and adding to) some material with which you are
already familiar, I have tried to demonstrate
some striking equivalences between techniques
which many think of as having quite different
contexts and purposes.