Title: SESSION 2 ANOVA and regression
1SESSION 2 ANOVA and regression
2Only the starting point
- In ANOVA, the rejection of the null hypothesis
leaves many questions unanswered. - Further analysis is needed to pinpoint the
crucial patterns in the data. - So, unlike the t test, the ANOVA is often just
the first step in what may be quite an extensive
statistical analysis.
3Comparisons among the five treatment means
4Simple and complex comparisons
- You might want to make SIMPLE COMPARISONS between
the mean for each of the four drug conditions and
the Placebo mean. - Or you might want to compare the Placebo mean
with the mean of the four drug means. This is a
COMPLEX COMPARISON.
5(No Transcript)
6Non-independence of comparisons
- The simple comparison of M5 with M1 and the
complex comparison are not independent. - The value of M5 feeds into the value of the
average of the means for the drug groups.
7Systems of comparisons
- With a complex experiment, interest centres on
SYSTEMS of comparisons. - Which comparisons are independent or ORTHOGONAL?
- What is the probability, under the null
hypothesis, that at least one comparison will
show significance? - How much variance can we attribute to different
comparisons?
8The crumpled paper fallacy
- We owe this to Thouless.
- Uncrumple a piece of paper.
- The wrinkles are unique.
- Therefore, they are statistically significant.
- Data sets from complex experiments may, ex post
facto, show all manner of interesting patterns. - Inferences from such patterns are dangerous.
9Over-analysis?
- You have run a complex experiment and submitted a
paper to a journal. - Your reviewers will need to be convinced that
what you are reporting isn't just a chance
pattern thrown up by sampling error. - You may well be asked to specify orthogonal
comparisons and test them for significance.
10Linear functions
- Y is a linear function of X if the graph of Y
upon X is a straight line. - For example, temperature in degrees Fahrenheit is
a linear function of temperature in degrees
Celsius.
11F is a linear function of C
(Figure: a straight-line graph of degrees Fahrenheit against degrees Celsius, passing through two points P and Q, with intercept = 32 where C = 0)
12(No Transcript)
13(No Transcript)
14Linear contrasts
- Any comparison can be expressed as a sum of
terms, each of which is a product of a treatment
mean and a coefficient such that the coefficients
sum to zero. - When so expressed, the comparison is a LINEAR
CONTRAST, because it has the form of a linear
function. - It looks artificial at first, but this notation
enables us to study the properties of systems of
comparisons among the treatment means.
15(No Transcript)
16(No Transcript)
17More compactly, if there are k treatment
groups, we can write
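In symbols, a linear contrast on k treatment means has the standard form

    \psi = c_1 M_1 + c_2 M_2 + \dots + c_k M_k = \sum_{j=1}^{k} c_j M_j ,
    \qquad \text{where } \sum_{j=1}^{k} c_j = 0 .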
18(No Transcript)
19(No Transcript)
20(No Transcript)
21(No Transcript)
22Helmert contrasts
- Compare the first mean with the mean of the other
means. - Drop the first mean and compare the second mean
with the mean of the remaining means. Drop the
second mean. - Continue until you arrive at a comparison between
the last two means.
23Helmert contrasts
- Our first contrast is
- 1, -¼, -¼, -¼, -¼
- Our second contrast is
- 0, 1, -⅓, -⅓, -⅓
- Our third contrast is
- 0, 0, 1, -½, -½
- Our fourth is
- 0, 0, 0, 1, -1
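A minimal Python sketch that checks these properties of the Helmert set (each row sums to zero; any pair of rows is orthogonal):

    import numpy as np

    # Helmert contrast coefficients for five treatment means (rows = contrasts)
    helmert = np.array([
        [1, -1/4, -1/4, -1/4, -1/4],
        [0,  1,   -1/3, -1/3, -1/3],
        [0,  0,    1,   -1/2, -1/2],
        [0,  0,    0,    1,   -1  ],
    ])

    # Each row sums to zero, so each row is a linear contrast
    print(helmert.sum(axis=1))                 # [0. 0. 0. 0.]

    # The sum of products of corresponding coefficients in any pair of
    # rows is zero, so the contrasts are orthogonal
    print(np.round(helmert @ helmert.T, 10))   # off-diagonal entries are all 0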
24(No Transcript)
25Orthogonal contrasts
- The first contrast in no way constrains the value
of the second, because the first mean has been
dropped. - The first two contrasts do not affect the third,
because the first two means have been dropped. - This is a set of four independent or ORTHOGONAL
contrasts.
26The orthogonal property
- The sum of the products of corresponding
coefficients in any pair of rows is zero. - This means that we have an ORTHOGONAL contrast
set.
27Size of an orthogonal set
- In our example, with five treatment means, there
are four orthogonal contrasts. - In general, for an array of k means, you can
construct a set of, at most, k-1 orthogonal
contrasts. - In the present ANOVA example, k = 5, so the rule
tells us that there can be no more than 4
orthogonal contrasts in the set. - Several different orthogonal sets, however, can
often be constructed for the same set of means.
28Accounting for variability
- The building block for any variance estimate is a DEVIATION of some sort.
- The TOTAL DEVIATION of any score from the grand mean (GM) can be divided into 2 components: 1. a BETWEEN GROUPS component; 2. a WITHIN GROUPS component.
29Breakdown (partition) of the total sum of squares
- If you sum the squares of the deviations over all
50 scores, you obtain an expression which breaks
down the total variability in the scores into
BETWEEN GROUPS and WITHIN GROUPS components.
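In symbols (X_ij is the i-th score in group j, M_j the group mean, GM the grand mean and n_j the group size):

    X_{ij} - GM = (M_j - GM) + (X_{ij} - M_j)
    \sum_j \sum_i (X_{ij} - GM)^2 = \sum_j n_j (M_j - GM)^2 + \sum_j \sum_i (X_{ij} - M_j)^2
    SS_{total} = SS_{between} + SS_{within}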
30Contrast sums of squares
- We have seen that in the one-way ANOVA, the value
of SSbetween reflects the sizes of the
differences among the treatment means. - In the same way, it is possible to measure the
importance of a contrast by calculating a sum of
squares which reflects the variation attributable
to that contrast alone. - We can use an F statistic to test each contrast
for significance.
31Formula for a contrast sum of squares
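With equal group sizes n, the standard formula is

    SS_\psi = \frac{n \, \hat{\psi}^2}{\sum_j c_j^2} ,
    \qquad \hat{\psi} = \sum_j c_j M_j .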
32(No Transcript)
33Here, once again, is our set of Helmert
contrasts, to which I have added the values of
the five treatment means
34(No Transcript)
35(No Transcript)
36(No Transcript)
37(No Transcript)
38(No Transcript)
39(No Transcript)
40Testing a contrast sum of squares for
significance
41Two approaches
- A contrast is a comparison between two means.
- You can therefore run a one-way, 2-group ANOVA.
- Or you can use a t-test.
- The tests are equivalent.
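A minimal Python/SciPy sketch of the equivalence, using two made-up groups of scores:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    group1 = rng.normal(8, 2, size=10)     # made-up scores
    group2 = rng.normal(11, 2, size=10)

    t, p_t = stats.ttest_ind(group1, group2)   # independent-samples t test
    F, p_F = stats.f_oneway(group1, group2)    # one-way ANOVA with two groups

    print(np.isclose(t**2, F))     # True: F = t squared
    print(np.isclose(p_t, p_F))    # True: the p-values are identical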
42Degrees of freedom of a contrast sum of squares
- A contrast sum of squares compares two means.
- A contrast sum of squares, therefore, has ONE
degree of freedom, because the two deviations
from the grand mean sum to zero.
43(No Transcript)
44(No Transcript)
45(No Transcript)
46(No Transcript)
47(No Transcript)
48Contrasts with SPSS
- Two approaches
- The simpler is through the One-Way option in the
Compare Means menu. - The General Linear Model, however, provides many
more useful statistics. - I suggest you begin by exploring contrasts with
the One-Way procedure first, then move on to the
General Linear Model menu.
49(No Transcript)
50(No Transcript)
51Contrasts with SPSS
The coefficients must be integers
52(No Transcript)
53Our Helmert contrasts
- Each ringed item is a MEAN.
- In the top row, the Placebo mean is compared with
the mean of the drug means. - In the third row, the mean for Drug B is compared
with the mean of the means for Drug C and Drug D.
54Summary
- A contrast is a comparison between two means.
- The contrasts can therefore be tested with either
F or t. (F = t².) - The contrast sums of squares sum to the value of
SSbetween.
55(No Transcript)
56Heterogeneity of variance
- The lower part of the table shows the results of
tests of the same contrasts when homogeneity of
variance is not assumed. - Notice that the degrees of freedom have lower
values.
57Non-orthogonal contrasts
- Contrasts don't have to be independent.
- For example, you might wish to compare each of
the four drug groups with the Placebo group. - What you want are SIMPLE CONTRASTS.
58Simple contrasts
- These are linear contrasts: each row sums to zero.
- But they are not orthogonal: for some pairings, the sum of products of corresponding coefficients is not zero.
59Simple contrasts with SPSS
- Here are the entries for the first contrast,
which is between the Placebo and Drug A groups. - Below that are the entries for the final contrast
between the Placebo and Drug D groups.
60The results
- In the column headed Value of Contrast are the differences between pairs of treatment means.
- For example, Drug A mean minus Placebo mean = 7.90 - 8.00 = -.10; Drug D mean minus Placebo mean = 13.00 - 8.00 = 5.00.
61Trend analysis
- Sometimes the factor (independent variable) may
be quantitative and continuous. - The theory of contrasts can be extended to study
trends in the relationship between the factor and
the dependent variable. - The following slides outline the procedure.
62Polynomials
- A POLYNOMIAL is a sum of terms, each of which is
a product of a constant and a power of the same
variable. - The highest power n is the DEGREE of the
polynomial.
63Graphs of some polynomials
(Figure: example graphs of linear, quadratic, cubic and quartic polynomials)
64Fitting points with polynomials
- A first-order polynomial (line) does not change
direction at all. But you can adjust the
constants to fit any TWO points. - A second-order polynomial (parabola) changes
direction ONCE and can be fitted to any THREE
points. - A third-order polynomial changes direction TWICE
and can be fitted to any FOUR points.
65Fitting points with polynomials
- In general, any k points can be fitted perfectly
by a polynomial of order k - 1.
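A minimal NumPy sketch with five made-up points, showing that a polynomial of degree k - 1 = 4 fits them exactly:

    import numpy as np

    x = np.array([1., 2., 3., 4., 5.])           # five arbitrary x values
    y = np.array([8., 7.9, 10.5, 11.6, 13.])     # five arbitrary y values

    coeffs = np.polyfit(x, y, deg=len(x) - 1)    # polynomial of degree k - 1 = 4
    fitted = np.polyval(coeffs, x)

    print(np.allclose(fitted, y))                # True: the fit is exact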
66(Figure: linear, quadratic and cubic polynomials fitted to sets of points)
67Another drug experiment
- In the drug experiment, the independent variable
(or factor) comprised a set of five qualitatively
different conditions. - There was no intrinsic ordering of the
categories. The order in which the variables
appeared in Data View was entirely arbitrary. - Now suppose that the five groups vary in the
extent to which the same drug was present. - The Placebo, A, B, C and D groups have dosages of
0, 10, 20, 30 and 40 units of the drug,
respectively. - The five groups are now ordered with respect to a
CONTINUOUS INDEPENDENT VARIABLE.
68A linear trend
- There is evidence of a linear TREND in these
data. - The pattern, however, is imperfect other trends
(e.g. quadratic) may be present as well. On the
other hand, the irregularity may reflect random
error.
69Capturing the linear trend
- Consider the linear contrast
- -2 -1 0 1 2
- If we plot these values against X (the
concentration of the drug), we shall have the
graph of a straight line.
70Polynomial coefficients
- The coefficients in this contrast are actually
values of the polynomial - y = x - 3
- The sum of squares of this contrast captures or
reflects the linear trend in the data.
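A minimal Python sketch of the calculation, using hypothetical treatment means and a hypothetical equal group size n:

    import numpy as np

    c = np.array([-2., -1., 0., 1., 2.])              # linear-trend coefficients
    means = np.array([8.0, 7.9, 10.5, 11.6, 13.0])    # hypothetical treatment means
    n = 10                                            # hypothetical group size (equal n)

    print(c.sum())                                    # 0.0: the row is a linear contrast
    psi = (c * means).sum()                           # value of the contrast
    ss_linear = n * psi ** 2 / (c ** 2).sum()         # linear-trend sum of squares (equal-n formula)
    print(psi, ss_linear)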
71Orthogonal polynomial contrasts
- Here is a set of orthogonal contrasts.
- The values in each row are values of one
polynomial for various values of X, the
continuous independent variable. - The top row is a first degree (linear)
polynomial, the next row is a second degree
(quadratic) polynomial and so on.
72Trend analysis
- Although the entries in a row are values of the same polynomial (whether linear or not), they are still the coefficients of a linear contrast: they sum to zero; moreover, for any pair of rows, the products of corresponding coefficients sum to zero. We have an ORTHOGONAL SET of contrasts.
- Associated with each contrast is a sum of squares
which captures that particular trend in the data.
- The contrasts are tested in the usual way.
73Ordering a linear polynomial contrast
Specify a linear (1st degree) polynomial
- You must check the Polynomial box and specify the
order of the polynomial. - Orthogonal polynomial sets are obtainable from
tables in statistics books, such as Howell
(2007), which provide orthogonal sets for sets of
means of various sizes.
74Ordering a quadratic polynomial contrast
Specify a 2nd degree (quadratic) polynomial
- You must now specify a Quadratic (2nd degree)
polynomial. - The coefficients are entered in the usual way.
75A trend analysis
- The relevant results are ringed.
- You can see that only the linear trend is
significant. - This formal analysis confirms the appearance of
the profile plot.
76Partition of the between groups sum of squares
- Since we have an orthogonal set of contrasts,
their sums of squares sum to the ANOVA between
groups sum of squares.
77Deviations in the ANOVA table
- The DEVIATION sum of squares is what remains of
SSbetween when the last contrast sum of squares
has been subtracted. - Each deviation has one degree of freedom fewer
than the previous deviation (if there is one).
78The deviations
The first deviation SS (with df = 3) is obtained by subtracting the linear SS from SSbetween.
The second deviation has df = 2. Both the linear and the quadratic trends have now been removed.
79The deviation terms
80The t tests
- The t tests produce exactly the same p-values as
the F tests. - As usual, F = t².
81Equivalence of F and t
82Alternative analyses
- As usual, t-tests are also made without assuming
homogeneity of variance (lower half). - The values of df are markedly lower, suggesting
that we should go by the tests in the lower part
of the table.
83A useful question
- Are you making comparisons or measuring
association? - If you're making comparisons, you may need
statistics such as the t-test and ANOVA. - If you're investigating associations, you will
need techniques such as correlation and
regression.
84Purpose of this section
- Today I intend to build some bridges between the
statistics of comparison and association. - I hope to show that in some circumstances, the
making of a comparison and the investigation of
an association are equivalent.
85Some regression fundamentals
86A scatterplot
87A strong linear association
- A narrowly elliptical scatterplot like this
indicates a strong positive linear association
between the two variables.
88The Pearson correlation
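The Pearson correlation between two variables X and Y is

    r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}
             {\sqrt{\sum (X - \bar{X})^2 \; \sum (Y - \bar{Y})^2}}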
89(No Transcript)
90(No Transcript)
91Warning!
- This significance test presupposes that the
distribution is BIVARIATE NORMAL, which implies
that the scatterplot is elliptical (or circular)
in shape. - ALWAYS CHECK THIS OUT BY INSPECTING THE
SCATTERPLOT.
92Independence
- Select a large sample at random from a population
and array the values in a column. - Select another sample from the same population at
random and array those values alongside the
values of the first sample. - The two samples are independent, because the data
are not paired in any meaningful sense. - The correlation between the two columns of values
should be approximately zero.
93Scatterplot indicating no association
94Regression
- Regression is a set of techniques for exploiting
the presence of statistical association among
variables to make predictions of values of one
variable (the DV or CRITERION) from knowledge of
the values of other variables (the IVs or
REGRESSORS).
95Simple and multiple regression
- In the simplest case, there is just one IV. This
is known as SIMPLE regression. - In MULTIPLE regression, there are two or more IVs.
96The regression line of actual violence upon film
preference
97The regression line of Violence upon Preference
- The REGRESSION LINE is the line that fits the
points best from the point of view of predicting
Actual Violence from Preference. - (A different line would be drawn were we to try
to predict Preference from Actual Violence.)
98(No Transcript)
99Here is the equation of the regression line
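In the notation used on the later slides, the regression equation has the form

    Y' = B_0 + B_1 X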
100(No Transcript)
101(No Transcript)
102Residual scores
- Suppose we use the regression line of Y upon X to predict the value of a person's score Y from a particular value of X.
- A RESIDUAL (e) is the difference between a person's actual score on Y and the predicted value Y' on the regression line: e = Y - Y'.
103(No Transcript)
104The residuals are shown in the next
picture
105(No Transcript)
106Summary
- B1 is the slope and B0 is the intercept.
- Y' is the Y-coordinate of the point on the line above the value X.
- An increase of one unit on variable X will result in an estimated increase of B1 units on variable Y.
- A NEGATIVE value of B1 means that an increase of one unit on variable X will result in an estimated REDUCTION of |B1| units on Y.
regression coefficient (slope)
regression constant (intercept)
107The least-squares criterion
- The regression line is the best-fitting line
in the sense that it minimises the sum of the
squares of the residuals.
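A minimal NumPy sketch with made-up data: the least-squares line has a smaller residual sum of squares than any perturbed line.

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.normal(size=50)
    y = 1.0 + 2.0 * x + rng.normal(size=50)            # made-up linear data with noise

    b1, b0 = np.polyfit(x, y, 1)                       # least-squares slope and intercept
    ss_fit = np.sum((y - (b0 + b1 * x)) ** 2)          # residual sum of squares at the fit
    ss_alt = np.sum((y - (b0 + (b1 + 0.1) * x)) ** 2)  # the same with a perturbed slope

    print(ss_fit < ss_alt)                             # True: the fitted line does better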
108(No Transcript)
109Breakdown of the total sum of squares
110Coefficient of determination
111Explanation
112The coefficient of determination (r2)
- The COEFFICIENT OF DETERMINATION (r2) is the
proportion of the variance of the predicted
variable accounted for by regression. - The coefficient of determination can take values
within the range from 0 to 1, inclusive.
113Range of r
114(No Transcript)
115Positive bias
- The coefficient of determination is positively
biased as an estimator. - The statistic known as adjusted R2 attempts to
correct this bias.
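A common form of the adjustment, with n cases and p regressors, is

    R^2_{adj} = 1 - (1 - R^2) \, \frac{n - 1}{n - p - 1}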
116(No Transcript)
117(No Transcript)
118Using more than one regressor
- By analogous methods, we could try to predict a
person's actual violence from exposure to screen
violence and number of years of education. - This is a problem in MULTIPLE REGRESSION.
119Multiple regression
120Geometrical interpretation
- This is the equation of a plane (or hyperplane)
with slopes B1, B2, …, Bp with respect to axes X1, X2, …, Xp and intercept B0. - The slopes are the PARTIAL REGRESSION
COEFFICIENTS and the intercept is the CONSTANT.
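Written out in this notation, the prediction equation is

    Y' = B_0 + B_1 X_1 + B_2 X_2 + \dots + B_p X_p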
121Regression coefficients
- In simple regression the REGRESSION COEFFICIENT
(B1 ) is the estimated change in units of the DV
that would result from an increase of one unit in
the IV. - In multiple regression, a PARTIAL REGRESSION
COEFFICIENT such as B1 is the estimated change in
the DV resulting from an increase of one unit in
the IV X1 with ALL OTHER IVs HELD CONSTANT.
122The multiple correlation coefficient R
- The MULTIPLE CORRELATION COEFFICIENT is the
correlation between the estimates Y' and the
actual values of the DV (Y). - The COEFFICIENT OF DETERMINATION (R2) is the
proportion of the variance of Y that is accounted
for by regression.
123Range of R
- The multiple correlation coefficient R can only
have non-negative values - 0 R 1
- This is because the regression line (or plane)
cannot have a slope of opposite sign to that of
the elliptical (or hyperelliptical) scatterplot.
124Attribution of variance to regressors
- If the IVs are uncorrelated, it is easy to
attribute variance in Y to each of the
independent variables X.
125(No Transcript)
126Correlated IVs
- When the IVs are measured, they always correlate
to at least some extent. - It is then impossible to attribute variance
unequivocally to any particular IV.
127(No Transcript)
128(No Transcript)
129Dummy variables
- Information about group membership is carried by
a grouping variable. - A DUMMY VARIABLE has only two values 0 and 1,
where 0 usually denotes the control or comparison
condition in this case the Placebo.
130Point-biserial correlation
- If we correlate the scores in the Score column with the dummy variable in the Group column, we
obtain what is known as a POINT-BISERIAL
CORRELATION. - The meaning of point-biserial is lost in the
mists of antiquity. - The point is that we are correlating a measured
variable with code numbers for category
membership.
131(No Transcript)
132A link
- The point biserial correlation is of limited
value as a descriptive statistic. - However, it forms a useful conceptual bridge
between the statistics of comparison (t-test) and
association (correlation).
133Regression upon dummy variables
- We shall now regress the scores that people
achieved in the Caffeine experiment upon the dummy variable carrying group membership.
134The regression line will pass through the group
means
(Figure: scores plotted against the dummy variable X, which takes the values 0 and 1; the regression line passes through the two group means)
135Why?
- OLS regression minimises the sums of the squares
of the residuals. - In either group of scores, the sum of the squared
deviations about the mean is a minimum.
136The sum of squares of deviations about the mean
is a minimum
137The regression statistics
- When we regress the Score variable against the
dummy variable, the intercept of the regression
line is the mean score of the Placebo group. - The slope of the regression line is the
difference between the means of the Caffeine and
Placebo groups.
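A minimal Python sketch with made-up Caffeine and Placebo scores, showing that regression on a 0/1 dummy reproduces the Placebo mean as the intercept and the difference between the group means as the slope:

    import numpy as np

    placebo  = np.array([9.0, 8.0, 10.0, 11.0, 9.5, 8.5, 9.0, 10.0])      # made-up scores
    caffeine = np.array([12.0, 11.0, 13.0, 11.5, 12.5, 10.0, 12.0, 13.0])

    y = np.concatenate([placebo, caffeine])
    x = np.concatenate([np.zeros(len(placebo)), np.ones(len(caffeine))])  # 0/1 dummy

    b1, b0 = np.polyfit(x, y, 1)      # least-squares slope and intercept

    print(np.isclose(b0, placebo.mean()))                      # True: intercept = Placebo mean
    print(np.isclose(b1, caffeine.mean() - placebo.mean()))    # True: slope = difference of means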
138The regression statistics
139(No Transcript)
140(No Transcript)
141Significance tests
- The intercept (Constant) is 9.25, the value of
the Placebo mean. - The slope is 2.65, which is 11.90 - 9.25, the
difference between the Caffeine and Placebo
means. - t(38) = 2.604, p = .013. This is exactly the
result we obtained with the independent samples t
test.
142Equivalence of ANOVA and regression
- When we test the slope of the regression line for
significance, we are also testing the difference
between the Caffeine and Placebo means for
significance. - Since (in the 2-group case) the F and t tests are
equivalent, the regression ANOVA table is
identical with the one-way ANOVA table we
obtained before.
143Dummy coding for the k-group case
- Since MSbetween has only four degrees of freedom,
regression will predict the treatment means
perfectly if the Score variable is regressed upon
four dummy variables X1, X2, X3 and X4. - As with the two-group example, an interesting
equivalence emerges.
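A minimal Python sketch of the five-group case with hypothetical scores: a design matrix of a constant plus four 0/1 dummies returns the Placebo mean and the four mean differences as coefficients.

    import numpy as np

    rng = np.random.default_rng(3)
    true_means = [8.0, 7.9, 10.5, 11.6, 13.0]     # hypothetical Placebo, A, B, C, D means
    n = 10                                        # hypothetical group size

    # Scores stacked group by group, Placebo first (the reference category)
    y = np.concatenate([rng.normal(m, 2.0, n) for m in true_means])

    # Design matrix: a column of 1s, then one 0/1 dummy for each drug condition
    X = np.zeros((5 * n, 5))
    X[:, 0] = 1.0
    for j in range(1, 5):
        X[j * n:(j + 1) * n, j] = 1.0

    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    group_means = np.array([y[j * n:(j + 1) * n].mean() for j in range(5)])

    print(np.isclose(b[0], group_means[0]))                        # B0 = Placebo mean
    print(np.allclose(b[1:], group_means[1:] - group_means[0]))    # Bj = drug mean - Placebo mean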
144Dummy coding for the k-group case
145The one-way ANOVA statistics
146The regression statistics
Same as the ANOVA value of F.
We see that B0 is the Placebo mean and B1, B2, B3
and B4 are the differences between the means for
the 4 drug conditions and the Placebo mean.
147In summary
- When the scores in the five-group drug experiment
are regressed upon 4 dummy variables, - The regression constant or intercept B0 is the
Placebo mean. - The partial regression coefficients are the
differences between the drug condition means and the
Placebo mean. - The regression sum of squares is equal to the
ANOVA between groups sum of squares.
148In summary
- The t-tests of the regression coefficients are
equivalent to the t-tests of the sums of squares
associated with the four contrasts.
149Eta squared
- Returning to the one-way ANOVA, recall that eta squared (also known as the CORRELATION RATIO) is defined as the ratio of the between groups sum of squares to the total sum of squares.
- Its theoretical range of variation is from zero (no differences among the means) to unity (no variance in the scores within any group, but different values in different groups).
- In our example, η² = .447
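In symbols:

    \eta^2 = \frac{SS_{between}}{SS_{total}}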
150Eta squared revisited
- If the scores from a k group experiment are
regressed upon k - 1 dummy variables, the
square of the multiple correlation coefficient R
is the proportion of variance of the scores
accounted for by differences among the treatment
means. - Eta squared is R2, which I think is why it is
also termed the correlation ratio.
151Formula for SSψ
- We can think of a contrast sum of squares as the
between treatments variability that is accounted
for by a particular contrast. - The sums of squares for orthogonal contrasts add
up to the ANOVA between groups sum of squares.
152The contrast sum of squares revisited
153Building bridges
- In these two sessions, in addition to revising
(and adding to) some material with which you are
already familiar, I have tried to demonstrate
some striking equivalences between techniques
which many think of as having quite different
contexts and purposes.
154Assignment
- Please complete the project and hand it in to Anne before noon on Wednesday 31st October.
- I shall return your answers (with comments) by Wednesday 7th November.