Title: QUANTITATIVE AND QUALITATIVE METHODS
1QUANTITATIVE AND QUALITATIVE METHODS
- ANOVA and Subsidiary Analyses
2SCIENTIFIC METHOD
- Scientific Theory -> Predictions -> Experiment
- Reminder: there is a crucial distinction between a SCIENTIFIC hypothesis (e.g. about mental processes) and the extremely restricted null and alternative hypotheses of the Neyman-Pearson approach to statistical inference
3SCIENTIFIC METHOD
- Although you should design experiments to answer the scientific questions you want to answer, it is useful to bear in mind the analytic tools (statistical tests) that will be available to process your results
- Why statistical tests?
  - because of noise in most psychological data
4MEASUREMENT (Recap)
- In any experiment, the experimenter will be "measuring" something or other - usually several things.
- Need to consider (for each thing "measured") the type of scale of measurement
  - nominal
  - ordinal
  - interval
  - ratio
5EXPERIMENTAL DESIGN
- In Experimental Psychology (though not necessarily other branches of psychology) the most common design is one that can be analysed with ANOVA
- Independent variable(s)
  - experimenter controlled/selected
- vs Dependent variable(s)
  - vary as a result of the experimenter's manipulations
- Traditionally more than one dependent variable -> more than one ANOVA
  - (though possibility of MANOVA - see later in course)
6EXPERIMENTAL DESIGN
- independent variables
- treatment
- classification
- usually considered as nominal, though with some
independent variables (e.g. age, dose) it is
possible to do post hoc trend tests that treat
levels of independent variables as
ordinal/interval
7FACTORS AND COUNTERBALANCING
- independent variables regarded as factors with levels
- what counts as a factor - to some extent depends on choice of experimenter
- Experimental factor vs counterbalancing
8OTHER ISSUES
- fixed vs random (independent) variables
- crossed and nested designs
- factorial designs (all fixed factors crossed)
- systematic subset designs
  - only Latin square common, where only some orders, say, are chosen
- Latin squares are often used for counterbalancing
9WITHIN- and BETWEEN FACTORS
- Terminology
    between              within
    unrelated            related
                         matched samples
                         correlated
    independent groups   repeated measures
- matched samples (e.g. when the experimental group
is hard to construct, and differs from general
population on possibly relevant measures e.g.
age, IQ)
10ADVANTAGES AND DISADVANTAGES OF BETWEEN-SUBJECT
DESIGNS
- necessary if subjects can't do one condition after the other (e.g. unexpected memory test)
- necessary if one S can't be in both conditions (e.g. male/female, good reader/bad reader)
- more subjects needed (to get same number of observations in all conditions, and because between-subject variance can't be partitioned out)
- must allocate Ss randomly to conditions (problem of Ss who are rejected on basis of experimental performance - may not be randomly distributed between conditions)
11ADVANTAGES AND DISADVANTAGES OF WITHIN-SUBJECT
DESIGNS
- "each subject acts as own control"
  - computationally equivalent to taking out differences between subject means (cf. related groups t-test - is the difference different from 0)
- in some types of experiment a between-subjects design can make test conditions too homogeneous (can get around with filler items)
- additional assumptions in within-subjects ANOVA (homogeneous covariance for each subject)
- order (sequence), practice, fatigue, boredom effects
- carry over effects
- need to make a decision about blocked vs mixed
designs
12ANALYSIS OF VARIANCE BEFORE THE ANOVA
- Before carrying out statistical tests
  - Think about results in simple numerical terms
  - Produce simple plots (histograms, line graphs, whatever is appropriate) with some indication of range/variability (e.g. standard error)
- THEN do the stats
13ANOVA
- USE Regimented Experimental Designs
- What it tells you
  - are there differences between MEAN values for conditions in your experiment?
- So WHY is it called ANOVA?
  - BECAUSE it partitions (analyses) overall variance into various components, which can then be used to construct tests (F-tests) of the (statistical) (null) hypotheses that some set of means are really all the same.
- Details of how later.
- First, a more intuitive approach.
14INDEPENDENT TESTS
- Given a completely factorial design (all fixed factors crossed) it is possible to construct INDEPENDENT tests for all main effects and interactions
- Statisticians can provide a mathematical proof that the tests are independent
- However, some grasp of why they are independent can be gained by simpler means (tabular, graphical).
15WHY THE TESTS ARE INDEPENDENT
- THINKING ABOUT NUMBERS IN TABLES
- Experiment
  - Good readers and poor readers answering questions about a text
  - Two types of questions - those about material explicit in the text and those based on inferences
16MAIN EFFECT - GOOD vs POOR READERS
               good   poor   mean
  literal        60     30     45
  inference      60     30     45
  mean           60     30
17MAIN EFFECT - LITERAL vs INFERENCE QUESTIONS
               good   poor   mean
  literal        60     60     60
  inference      30     30     30
  mean           45     45
18INTERACTION BUT NO MAIN EFFECTS
               good   poor   mean
  literal        30     60     45
  inference      60     30     45
  mean           45     45
19TWO MAIN EFFECTS BUT NO INTERACTION
               good   poor   mean
  literal        60     30     45
  inference      50     20     35
  mean           55     25
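
The four tables of cell means above can be checked numerically. The sketch below is only an illustration: the function name `effects_2x2` and the sign conventions (good minus poor, literal minus inference) are assumptions made here, not part of the slides. It computes each main effect as a difference of marginal means and the interaction as a difference of differences.

```python
def effects_2x2(cells):
    """cells[row][col]: rows = (literal, inference), cols = (good, poor)."""
    (a, b), (c, d) = cells
    reader_effect = (a + c) / 2 - (b + d) / 2    # good minus poor (column means)
    question_effect = (a + b) / 2 - (c + d) / 2  # literal minus inference (row means)
    interaction = (a - b) - (c - d)              # difference of differences
    return reader_effect, question_effect, interaction

# Cell means copied from the four example tables.
tables = {
    "reader main effect only": [[60, 30], [60, 30]],
    "question main effect only": [[60, 60], [30, 30]],
    "interaction only": [[30, 60], [60, 30]],
    "two main effects, no interaction": [[60, 30], [50, 20]],
}

results = {name: effects_2x2(cells) for name, cells in tables.items()}
for name, (reader, question, inter) in results.items():
    print(f"{name}: reader={reader}, question={question}, interaction={inter}")
```

Each of the three quantities can be non-zero while the other two are zero, which is the tabular sense in which the tests are independent.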
20WHY THE TESTS ARE INDEPENDENT
- GRAPHICAL INTERPRETATION (FOR A SIMPLE TWO x TWO INTERACTION)
  - one main effect is (average) slope of lines
  - other main effect is separation of lines
  - interaction is angle between lines
- which you can see, in principle, vary independently of one another.
21GRAPH TWO MAIN EFFECTS AND AN INTERACTION
22HOW ANOVA WORKS
- Consider one of the main effects or interactions in your design, say the main effect of good vs poor readers.
- Some of the variability (i.e. numerical differences) in the data you collect will arise because you are collecting data at different levels of this variable (good or poor).
- In general, differences between data collected at different levels of such a factor arise from two sources
  - random noise in the data
  - any effect of the variable itself
23HOW ANOVA WORKS
- With a within-subjects factor (which literal vs inference MIGHT be in the above design), there is also the possibility that individual subjects are affected differently by the manipulation being made (e.g. some subjects may show a large effect and others a small effect).
- To put this another way there may be an interaction between subjects and conditions.
- In a between subjects design this issue does not arise.
- Subjects are assumed to be assigned at random to conditions, and any difference in performance between conditions can only be attributed to a condition effect.
24HOW ANOVA WORKS
- So, to test whether the manipulation in question has any effect, we need to calculate, from the data, an estimate of the variability in the data from level to level of a factor and compare that with an estimate of the general noisiness of the data. Then we can write
  F = (estimate which includes noise + effect of factor) / (estimate which includes noise alone)
- This ratio will be reliably bigger than one only if the factor itself has an effect. Chance alone can push it somewhat above one, which is why it is compared with the F distribution.
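
As a rough illustration of this ratio for a single between-subjects factor, the sketch below computes the two estimates as mean squares. The scores are made-up numbers for five good and five poor readers, and the function name `one_way_f` is a label chosen here, not something from the course materials.

```python
def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way between-subjects design."""
    scores = [x for g in groups for x in g]
    grand_mean = sum(scores) / len(scores)
    means = [sum(g) / len(g) for g in groups]
    # Level-to-level variability: includes noise + any effect of the factor.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variability within each level, pooled over groups: noise alone.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical comprehension scores, invented for illustration only.
good = [62, 58, 65, 55, 60]
poor = [33, 28, 31, 27, 31]

f_ratio, df1, df2 = one_way_f([good, poor])
print(f"F({df1}, {df2}) = {f_ratio:.2f}")
```

The resulting F would then be compared with the tabled critical value for the same degrees of freedom.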
25HOW ANOVA WORKS
- For a completely between-subjects design the noise is estimated by pooling estimates from within each cell, and every main effect or interaction is tested against that so-called error term.
- For a within-subjects factor we have the complication that any estimate from the data of the variability from level to level of a factor includes a component attributable to the different performance of the subjects in the different conditions. We, therefore, need
  F = (estimate which includes noise + effect of factor + interaction of factor with subjects) / (estimate which includes noise + interaction of factor with subjects)
26HOW ANOVA WORKS
- Fortunately we can get the bottom estimate from the factor X subject cells, but we will have different "error" terms for different main effects and interactions.
- In a completely within-subjects design there is a different error term for each effect and interaction tested.
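
For a single within-subjects factor with one score per subject per condition, the factor x subject error term can be sketched as below. The data are invented (four subjects, literal vs inference questions) and the function name is an assumption made for this example.

```python
def within_subjects_f(scores):
    """scores[subject][condition]; returns (F, df_effect, df_error)."""
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    subj_means = [sum(row) / k for row in scores]
    cond_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Level-to-level variability of the factor.
    ss_effect = n * sum((m - grand) ** 2 for m in cond_means)
    # Factor x subject interaction: what is left in each cell after removing
    # the subject mean and the condition mean. This is the error term.
    ss_error = sum(
        (scores[i][j] - subj_means[i] - cond_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    df_effect = k - 1
    df_error = (k - 1) * (n - 1)
    return (ss_effect / df_effect) / (ss_error / df_error), df_effect, df_error

# Hypothetical scores: columns are (literal, inference).
scores = [[70, 60], [55, 50], [65, 50], [50, 40]]
f_within, df_effect, df_error = within_subjects_f(scores)
print(f"F({df_effect}, {df_error}) = {f_within:.2f}")
```

With only two conditions this F equals the square of the related-groups t computed on the difference scores, which matches the remark that each subject acts as their own control.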
27ASSUMPTIONS OF ANOVA
- Population of scores in any group is normally distributed.
- Variance is the same for each group (homoscedasticity).
- Each observation sampled randomly and independently from a normal distribution.
- (For within-subjects designs) - some assumption about the covariance between measures at different levels of the same factor.
28ASSUMPTIONS OF ANOVA
- "Compound symmetry" (i.e. all covariances equal)
is a sufficient, but not necessary, assumption.
- Necessary and sufficient condition is equality of variances of differences between all treatment pairs (Huynh and Feldt, 1970).
- For between designs these covariances are assumed to be zero (follows from the independence assumption), but they cannot be for within
  - e.g. a fast subject in an RT experiment is likely to be fast in all conditions.
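
One informal way to look at the Huynh and Feldt condition is to compute the variance of the difference scores for every pair of conditions and see whether they are roughly equal. The sketch below does only that descriptive step. It is not a formal test, and the RT values are invented for illustration.

```python
from itertools import combinations
from statistics import variance

def difference_variances(scores):
    """scores[subject][condition] -> {(i, j): variance of (cond i - cond j)}."""
    k = len(scores[0])
    return {
        (i, j): variance([row[i] - row[j] for row in scores])
        for i, j in combinations(range(k), 2)
    }

# Hypothetical RTs (ms): four subjects, three conditions.
rts = [
    [400, 420, 450],
    [500, 530, 545],
    [450, 460, 500],
    [600, 630, 640],
]

diff_vars = difference_variances(rts)
for pair, v in sorted(diff_vars.items()):
    print(pair, round(v, 1))
```

Note that the fastest subject is fastest in every condition, so the raw scores covary strongly even though the difference scores vary little.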
29ASSUMPTIONS OF ANOVA
- If these assumptions are met, then the ratios described above will have the distribution known as F, and we can calculate absolute probabilities (or use tables).
- ANOVA is "robust" with respect to the first two assumptions (but not the third, independence).
  - E.g. ratios of 4 or 5 between variances are not usually a problem.
  - skewed distributions are usually ok, particularly if all groups show similar skew (e.g. RT data).
30ASSUMPTIONS OF ANOVA
- As a rule of thumb, the simpler the design the more robust it is with respect to violations of the assumptions
  - (e.g. robustness with respect to assumption 2, equal variances, is weaker for unequal N designs).
- Ways of testing robustness
  - analytic (not usually straightforward)
  - randomisation/Monte Carlo
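
A Monte Carlo check of robustness might look like the sketch below: simulate many experiments in which the null hypothesis is true, and count how often F exceeds the nominal .05 critical value. Everything specific here is an assumption for illustration: two groups of 10, normal data, a variance ratio of 4, and a critical value of 4.41 taken from F tables for (1, 18) df.

```python
import random

F_CRIT_1_18 = 4.41  # assumed: tabled .05 critical value of F for (1, 18) df

def one_way_f(groups):
    """F ratio for a one-way between-subjects design."""
    scores = [x for g in groups for x in g]
    grand = sum(scores) / len(scores)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

def type1_rate(sd_a, sd_b, n=10, n_sims=2000, seed=1):
    """Proportion of null-true experiments in which F exceeds the critical value."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        a = [rng.gauss(0, sd_a) for _ in range(n)]  # both population means are 0
        b = [rng.gauss(0, sd_b) for _ in range(n)]
        if one_way_f([a, b]) > F_CRIT_1_18:
            rejections += 1
    return rejections / n_sims

rate_equal = type1_rate(1.0, 1.0)    # assumptions met
rate_unequal = type1_rate(1.0, 2.0)  # variance ratio of 4
print(rate_equal, rate_unequal)
```

If the test is robust to this violation, the second rate should stay close to the nominal .05. Changing the group sizes to be unequal is one way to explore the weaker robustness mentioned above.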
31ANOVA - SUBSIDIARY ANALYSES
- Statistical significance vs importance (extra-statistical) of differences.
- Which of the differences between means are significant?
- In addition you need to think not only about errors on a single test.
  - You may be making several comparisons
- Reminder
  - Type I - reject a true null hypothesis
  - Type II - accept a false null hypothesis (fail to reject)
32ANOVA - SUBSIDIARY ANALYSES
- In an ANOVA with lots of different Fs you might need to think about
  - probability of error on each test
  - number of errors per experiment (e.g. on average 1 Type I error per 20 comparisons at alpha = .05)
  - probability of at least one error.
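
These three quantities are easy to compute under the simplifying assumptions that every null hypothesis is true and the tests are independent. The function names below are chosen for this sketch only.

```python
def expected_type1_errors(alpha, n_tests):
    """Expected number of Type I errors per experiment."""
    return alpha * n_tests

def prob_at_least_one_error(alpha, n_tests):
    """Probability of at least one Type I error, assuming independent tests."""
    return 1 - (1 - alpha) ** n_tests

alpha = 0.05       # probability of error on each test
n_tests = 20
per_experiment = expected_type1_errors(alpha, n_tests)
familywise = prob_at_least_one_error(alpha, n_tests)
print(f"expected errors: {per_experiment:.2f}, P(at least one): {familywise:.3f}")
```

With 20 tests at .05 the expected number of errors is 1, while the probability of at least one error is about .64, so the two measures are not interchangeable.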
33A PRIORI vs POST HOC (A POSTERIORI) COMPARISONS
- Not really before or after data are collected, but depends on whether the differences can be predicted from the (scientific) hypothesis under investigation.
- A priori comparisons are more powerful and are preferred for this reason (and others).
- But a posteriori tests are necessary when the results of a study are clearly interesting, but different from those predicted by any theory that has been considered.
- An a priori test can be significant even if the overall F of which it is part is not.
34A PRIORI TESTS AND OVERALL F-RATIOS
- There has been some disagreement in the literature about whether a priori comparisons should be made when the overall F is not significant.
- A sensible view is that if the (scientific) hypothesis makes a specific prediction, then the comparison is legitimate.
35A PRIORI TESTS
- Comparisons between pairs of groups are sometimes made using t-tests.
- For a two group ANOVA, t-squared = F
- With three or more groups, a different method is usually preferred in which the same (overall) estimate of the background noise in the data (error term) is used.
- With a t-test, you would make this estimate from just the data in the two groups being compared (and have reduced denominator df).
- Numerically, the procedure is very simple and produces a new F for each comparison.
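
A minimal sketch of such a comparison, assuming equal n per group and using the standard contrast sum of squares, n x (sum of c x mean)^2 / (sum of c^2), divided by the overall error mean square. The group means, n and error term below are invented, and `contrast_f` is a name chosen for this example.

```python
def contrast_f(means, coeffs, n_per_group, ms_error):
    """F (1 numerator df) for a planned comparison, using the overall error term."""
    if abs(sum(coeffs)) > 1e-9:
        raise ValueError("contrast coefficients must sum to zero")
    psi = sum(c * m for c, m in zip(coeffs, means))  # value of the contrast
    ss_contrast = n_per_group * psi ** 2 / sum(c * c for c in coeffs)
    return ss_contrast / ms_error

# Hypothetical three-group experiment: means, n per group, overall MS error.
means = [60, 50, 40]
n_per_group = 5
ms_error = 25.0

f_first_vs_last = contrast_f(means, [1, 0, -1], n_per_group, ms_error)
print(f_first_vs_last)
```

The denominator df are those of the overall error term, which is why this is usually more powerful than a t-test restricted to the two groups involved.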
36A PRIORI TESTS - Cont.
- Independence of contrasts
  - remember that in the overall ANOVA all main effects and interactions are independent - if the design is completely factorial.
- Independent tests are useful because they are straightforward to interpret, but each main effect or interaction can only be broken down into (numerator) df independent contrasts.
- e.g.
  - three level main effect -> two tests
  - 2x2 -> 1 test, i.e. the F itself
  - 3x2 -> 2, etc.
  - cf. the number of possible pairwise comparisons.
37A PRIORI TESTS - Cont.
- Criterion - assign coefficients that
  - sum to zero for any one comparison
  - have products that sum to zero for any pair of comparisons.
- Coefficients can be taken from tables, or see a text such as Howell.
- Linear (or other) trends can be investigated using planned comparison technique
- E.g. with 2 df can make orthogonal tests for linear and quadratic trends - more for more df
- Note trends require treating independent variable as being measured on interval/ratio scale.
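
The two criteria can be checked mechanically. The sketch below assumes equal n per group and uses the usual linear and quadratic trend coefficients for three equally spaced levels. The helper names are invented for this example.

```python
def is_contrast(coeffs, tol=1e-9):
    """Coefficients for a single comparison must sum to zero."""
    return abs(sum(coeffs)) < tol

def are_orthogonal(c1, c2, tol=1e-9):
    """Products of corresponding coefficients must sum to zero (equal n assumed)."""
    return abs(sum(a * b for a, b in zip(c1, c2))) < tol

linear = [-1, 0, 1]      # linear trend over three equally spaced levels
quadratic = [1, -2, 1]   # quadratic trend over the same levels
pair_a = [1, -1, 0]      # level 1 vs level 2
pair_b = [0, 1, -1]      # level 2 vs level 3

print(is_contrast(linear), is_contrast(quadratic))
print(are_orthogonal(linear, quadratic))   # orthogonal pair
print(are_orthogonal(pair_a, pair_b))      # pairwise comparisons sharing a level
```

The two pairwise comparisons are each valid contrasts but are not orthogonal to one another, which illustrates why a 2 df effect yields only two independent tests.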
38A PRIORI TESTS - Cont.
- May want to make alpha smaller, e.g.
  alpha per comparison = (alpha for the effect/interaction) / (number of comparisons)
- Nonorthogonal comparisons - may be suggested by (scientific) hypothesis for a priori comparisons.
- Generally acceptable, if the number of such comparisons is small
  - best if less than or equal to total number of orthogonal comparisons for the effect/interaction
  - beware of dangers of overinterpretation!
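
The alpha adjustment above is a one-line calculation. The sketch below also shows that, with independent tests, the resulting probability of at least one Type I error stays at or below the original alpha. The function name and the choice of three comparisons are illustrative assumptions.

```python
def adjusted_alpha(alpha, n_comparisons):
    """Divide the alpha for the effect/interaction by the number of comparisons."""
    if n_comparisons < 1:
        raise ValueError("need at least one comparison")
    return alpha / n_comparisons

alpha = 0.05
n_comparisons = 3
per_comparison = adjusted_alpha(alpha, n_comparisons)

# Probability of at least one Type I error if all nulls are true and the
# comparisons are independent (an assumption; the bound holds more generally).
familywise = 1 - (1 - per_comparison) ** n_comparisons
print(f"alpha per comparison = {per_comparison:.4f}, familywise = {familywise:.4f}")
```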
39A PRIORI TESTS - Cont.
- Can modify alpha via the Dunn/Bonferroni procedure - setting an error rate per experiment.
- A very common practice, which leads to nonorthogonal comparisons, is testing for simple main effects ("carrying out subsidiary analyses").
40A POSTERIORI TESTS
- Used when data do not fit any (scientific) hypothesis or only a hypothesis suggested by the data themselves (hence post hoc)
- Fisher's Least Significant Difference - requires a significant overall F before multiple t tests are carried out. This restricts the familywise error rate (probability of at least one type I error) to alpha, but only if the complete null hypothesis is true (!! i.e. no differences between any means !!)
41A POSTERIORI TESTS - Cont.
- Newman-Keuls - arrange levels in order of value of dependent variable and calculate different critical values for different step lengths along the series (by definition, adjacent means have step 2).
  - Again, holds experimentwise error rate to alpha (but only if complete null hypothesis is true)
- Duncan - similar to Newman-Keuls, but sets error rate per degree of freedom to alpha (too liberal).
- Tukey HSD (Honestly Significant Difference - Tukey's Test) - as Newman-Keuls, but uses biggest critical value. Holds experimentwise error rate to alpha (for all possible null hypotheses, all means equal, some equal)
- Ryan REGWQ - holds experimentwise error rate to alpha for all null hypotheses (as Tukey), but varies the critical value with the number of means in the set.
42A POSTERIORI TESTS - Cont.
- Last 4 tests are variants, and are based on a statistic called the Studentised Range
  q = (largest treatment mean - smallest treatment mean) / sqrt(mean square error / n per group)
  (for two means, q = t x sqrt(2))
- Scheffé - Holds experimentwise error rate to alpha for all linear contrasts (i.e. all contrasts for which the sum of the coefficients = 0), not just pairwise.
  - Very conservative, Scheffé himself suggests setting alpha = 0.1.
  - Not recommended for pairwise comparisons only.
- Dunnett - for comparing all treatments against a control (more powerful than the Dunn/Bonferroni procedure for this case)
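
The Studentised Range statistic on this slide, and its relation to t for a pair of means, can be sketched as follows. The means, error mean square and n are invented, equal n per group is assumed, and the t here uses the overall error term rather than one estimated from the two groups alone.

```python
import math

def studentised_range(means, ms_error, n_per_group):
    """q = (largest mean - smallest mean) / sqrt(MS error / n per group)."""
    return (max(means) - min(means)) / math.sqrt(ms_error / n_per_group)

def t_between_two_means(mean_a, mean_b, ms_error, n_per_group):
    """t for two means, with the standard error built from the overall MS error."""
    return (mean_a - mean_b) / math.sqrt(2 * ms_error / n_per_group)

# Hypothetical values for illustration.
ms_error = 10.25
n_per_group = 5

q = studentised_range([60, 30], ms_error, n_per_group)
t = t_between_two_means(60, 30, ms_error, n_per_group)
print(f"q = {q:.3f}, t x sqrt(2) = {t * math.sqrt(2):.3f}")
```

The obtained q is compared with tabled critical values of the Studentised Range, which depend on the number of means spanned and the error df. Those tables are not reproduced here.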
43MULTIPLE COMPARISONS WITH REPEATED MEASURES
- If you look at introductory texts you will notice that the procedures described are usually for between-subject comparisons.
- The issue of multiple comparisons with repeated measures is a complicated one and I can do no better than refer you to Howell's discussion at
  - http://www.uvm.edu/dhowell/StatPages/More_Stuff/RepMeasMultComp/RepMeasMultComp.html
44TESTS OF CHOICE
- In much of the cognitive psychology literature (at least) are
  - Dunn/Bonferroni - often called Bonferroni t - for a priori
  - Newman-Keuls used to be recommended for post hoc comparisons. Howell now recommends Ryan, which can be readily calculated in SAS using the REGWQ procedure (not so easy with SPSS).