Title: Paired Difference Experiments
1Paired Difference Experiments
- Rationale for using a paired groups design
- The paired groups design
- A problem
- Two distinct ways to estimate $\mu_1 - \mu_2$
- Formal statement: large-sample test
- Formal statement: small-sample test
- Examples
2Rationale for using a paired groups design
- The basic problem
- When you measure two samples of cases that have
been treated differently, the differences between
the two resulting sets of scores will be produced
by either or both of two types of effect:
  - the treatment
  - everything else that matters
3Rationale for using a paired groups design
- We're not interested in the effect on performance
of everything else that matters. We want to
know whether the treatment effect is real.
- But suppose that the variability due to
everything else that matters is much larger
than the variability due to the treatment. In
that case, we may not be able to detect the
signal (treatment effect) because of the noise
(error variability, that is, variability due to
everything else that matters).
4Rationale for using a paired groups design
- We have to do something to reduce that part of
the difference between the groups that is due to
everything else that matters.
- We do this by matching or by testing the same people
twice. Both approaches remove the effects of
nuisance variables.
5Rationale for using a paired groups design
- If the two samples of cases are more alike in
things that matter, then the contribution of the
treatment to any difference between the means is
proportionally larger.
- That is, if the contribution of the treatment
(to the difference between group means) stays the
same, but the contribution of other differences
between the groups goes down, then we have a more
sensitive test.
6Total variability in the data set
Variability due to all other causes
Variability due to the treatment effect
Here, most of the variability in the data set is
produced by things other than the treatment
effect.
7Total variability in the data set
- Variability due to all other causes: denominator of Z or t-test
- Variability due to the treatment effect: numerator of Z or t-test
8Total variability in the data set
Variability due to all other causes
Variability due to the treatment effect
Here, variability due to treatment effect is the
same, but variability due to other causes has
decreased.
9Total variability in the data set
- Variability due to all other causes: denominator of Z or t-test
- Variability due to the treatment effect: numerator of Z or t-test
10The paired groups design
- One way to reduce variability due to everything
else that matters is to use the paired groups
design.
- Matched pairs: select people in pairs matched on
some relevant variable (e.g., IQ), then randomly
assign one member of each pair to each condition.
- Repeated measures: every person gets both
treatments, so each person acts as their own control.
11The paired groups design
- Suppose that for each person with an IQ of 110 in the
treatment condition we have a person with an IQ of
110 in the control condition. Similarly with all
other IQs represented in the treatment condition:
each has a matched-IQ case in the control
condition.
- Now, if we subtract the score for one member of a pair
from the score for the other member, the effect of IQ
cannot contribute to that difference.
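A minimal sketch of why the subtraction works, under the simplifying assumption (not stated in the slides) that each score is just an IQ contribution plus a treatment effect. The model, the 0.5 weight, the treatment effect of 5, and the IQ values are all hypothetical.

```python
# Hypothetical additive model: score = IQ contribution + treatment effect.
# Real scores would also contain random error, omitted here for clarity.
TREATMENT_EFFECT = 5          # assumed for illustration only
matched_iqs = [95, 110, 125]  # one matched pair per IQ value

def score(iq, treated):
    """Score under the assumed additive model."""
    iq_contribution = 0.5 * iq  # hypothetical effect of IQ on performance
    return iq_contribution + (TREATMENT_EFFECT if treated else 0)

treatment_scores = [score(iq, True) for iq in matched_iqs]
control_scores = [score(iq, False) for iq in matched_iqs]

# Within each pair the IQ contribution is identical, so it cancels.
differences = [t - c for t, c in zip(treatment_scores, control_scores)]
print(differences)
```

The raw scores vary with IQ, but every difference equals the assumed treatment effect, because the IQ term appears in both members of a pair.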
12A problem
- When we match pairs or use repeated measures on
the same people, we violate one of the
assumptions of the independent groups tests of
difference between two population means.
- The statistical test is based on the assumption
that the observations in one group are
independent of the observations in the other
group.
13A problem
- Why is that assumption a problem?
- Because here, once we have selected Group 1, we
do not then independently select Group 2.
- As a direct result, the sample mean difference
$\bar{X}_1 - \bar{X}_2$ is not a good estimator of the population
mean difference $\mu_1 - \mu_2$.
14Two distinct ways to estimate $\mu_1 - \mu_2$
- 1. Choose a random sample from Population A.
Independently choose a random sample from
Population B. Compute the means for each sample
and find the difference between these means.
- 2. Choose a random sample from Population A and a
matching sample from Population B. Find the
difference between each score in sample A and its
matched score in sample B. Compute the mean of
these differences.
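As a rough illustration, the sketch below applies both computations to the matched smoking data from Example 1 later in this deck. Treating the matched samples as independent is done here only for comparison. With paired data the two point estimates coincide numerically; what changes is the standard error attached to them. The variable names are illustrative.

```python
from math import sqrt
from statistics import mean, variance

# Matched-pairs smoking data from Example 1 (hours without a cigarette).
lecture = [105, 98, 121, 99, 65, 130, 108, 57]   # LS column
phone = [122, 86, 127, 92, 85, 152, 92, 63]      # DE column
n = len(lecture)

# Way 1: difference between the two sample means.
diff_of_means = mean(phone) - mean(lecture)
# Standard error if the samples are treated as independent.
se_independent = sqrt(variance(lecture) / n + variance(phone) / n)

# Way 2: mean of the pairwise differences.
diffs = [p - l for p, l in zip(phone, lecture)]
mean_of_diffs = mean(diffs)
se_paired = sqrt(variance(diffs) / n)

print(diff_of_means, mean_of_diffs)
print(round(se_independent, 3), round(se_paired, 3))
```

For these data the paired standard error is well under half of the independent-samples one, which is the gain in sensitivity the paired design is meant to deliver.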
15Two distinct ways to estimate $\mu_1 - \mu_2$
- In the first case, we work with a sampling
distribution based on differences between
independent sample means.
- Anything that could make one sample mean
different from the other will contribute to the
variability ($s_{\bar{X}_1 - \bar{X}_2}$) of that sampling
distribution.
- With a more variable sampling distribution, we
need a larger sample difference to be confident
that the inferred population difference is real.
16Two distinct ways to estimate $\mu_1 - \mu_2$
- In the second case, because of matching, many of
the (random) things that could drive sample means
apart are eliminated from the differences
$X_{1i} - X_{2i}$.
- As a result, the variability in the sampling
distribution ($s_D$) has fewer sources.
- So we can infer a real population difference
with a smaller sample difference (samples are
less likely to be different just by chance).
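The reduction in variability can be written out explicitly. As a sketch, assume $n$ pairs, population standard deviations $\sigma_1$ and $\sigma_2$, and a correlation $\rho$ between the two scores in a pair. These symbols are not in the slides and are introduced only for illustration.

$$
\operatorname{Var}(\bar{X}_1 - \bar{X}_2) = \frac{\sigma_1^2 + \sigma_2^2}{n} \quad \text{(independent samples)}
$$

$$
\operatorname{Var}(\bar{X}_D) = \frac{\sigma_1^2 + \sigma_2^2 - 2\rho\,\sigma_1\sigma_2}{n} \quad \text{(paired samples)}
$$

When matching is effective, the paired scores are positively correlated ($\rho > 0$), so the second variance is smaller than the first. If $\rho = 0$, pairing gains nothing in variability.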
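17Paired groups test, large samples
- Two-tailed test: $H_0: \mu_D = D_0$ versus $H_A: \mu_D \ne D_0$
- One-tailed test: $H_0: \mu_D = D_0$ versus $H_A: \mu_D < D_0$
  or $H_A: \mu_D > D_0$
- ($D_0$ = historical value of the difference between
the population means.)
- Test statistic: $Z = \dfrac{\bar{X}_D - D_0}{s_D / \sqrt{n_D}}$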
18Paired groups test, large samples
- Rejection region:
  - $|Z| > Z_{\alpha/2}$ (two-tailed test)
  - or $Z < -Z_{\alpha}$ or $Z > Z_{\alpha}$ (one-tailed test, depending on the direction of $H_A$)
- Assumptions:
  1. Distribution of differences is normal.
  2. Difference scores are randomly
  selected from the population of differences
  (between matched pairs or repeated measures).
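A minimal sketch of the large-sample procedure, using only the Python standard library. The function name `paired_z_test`, its parameters, and the 36 difference scores are hypothetical and serve only to show the mechanics. The one-tailed branch handles the upper-tail alternative only.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def paired_z_test(diffs, d0=0.0, alpha=0.05, two_tailed=True):
    """Large-sample paired test on a list of difference scores."""
    n = len(diffs)
    z = (mean(diffs) - d0) / (stdev(diffs) / sqrt(n))
    # Upper-tail critical value from the standard normal distribution.
    tail_area = alpha / 2 if two_tailed else alpha
    z_crit = NormalDist().inv_cdf(1 - tail_area)
    # The one-tailed case here tests the upper tail only (H_A: mu_D > D_0).
    reject = abs(z) > z_crit if two_tailed else z > z_crit
    return z, z_crit, reject

# Hypothetical difference scores for 36 pairs (made up for illustration).
example_diffs = [3, -1, 4, 2, 0, 5, 1, -2, 3, 2, 4, 1] * 3
z, z_crit, reject = paired_z_test(example_diffs, d0=0, alpha=0.05)
print(round(z, 3), round(z_crit, 3), reject)
```

Note that the function receives only the difference scores. Once the differences are formed, the original two columns play no further role in the test.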
19Important note
- $Z = \dfrac{\bar{X}_D - D_0}{s_D / \sqrt{n_D}}$
- Notice that the numerator does not have an $\bar{X}_1$ and
an $\bar{X}_2$. It just has $\bar{X}_D$.
- We begin by finding the differences between each
pair of observations. From then on, we work only
with these difference scores.
20Paired groups test, small samples
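- Two-tailed test: $H_0: \mu_D = D_0$ versus $H_A: \mu_D \ne D_0$
- One-tailed test: $H_0: \mu_D = D_0$ versus $H_A: \mu_D < D_0$
  or $H_A: \mu_D > D_0$
- ($D_0$ = historical value of the difference between
the population means.)
- Test statistic: $t = \dfrac{\bar{X}_D - D_0}{s_D / \sqrt{n_D}}$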
21Paired groups test, small samples
- Rejection region:
  - $|t| > t_{\alpha/2}$ (two-tailed test)
  - or $t < -t_{\alpha}$ or $t > t_{\alpha}$ (one-tailed test, depending on the direction of $H_A$)
- Assumptions:
  1. Distribution of differences is normal.
  2. Difference scores are randomly
  selected from the population of differences
  (between matched pairs or repeated measures).
22Example 1
- A psychologist was studying the effectiveness of
several treatments to help people quit smoking.
In one treatment, smokers heard a lecture about
the effects of cigarette smoke on the human body,
accompanied by graphic slides of those effects.
In the other treatment, smokers had daily phone
conversations with a therapist who encouraged
them not to smoke that day. To control for
effects of age and sex, the psychologist assigned
people to the experimental groups in pairs
matched on those variables. The data, in the form
of number of hours without a cigarette, appear on
the next slide.
23Example 1
Notice the negative signs!! (Diff = DE − LS)

Pair | LS  | DE  | Diff | Diff²
-----|-----|-----|------|------
1    | 105 | 122 |   17 |  289
2    |  98 |  86 |  -12 |  144
3    | 121 | 127 |    6 |   36
4    |  99 |  92 |   -7 |   49
5    |  65 |  85 |   20 |  400
6    | 130 | 152 |   22 |  484
7    | 108 |  92 |  -16 |  256
8    |  57 |  63 |    6 |   36
Σ    |     |     |   36 | 1694

Without the negative signs, Σ Diff would be 106.
24Example 1
- $s_D^2 = \dfrac{1694 - \dfrac{(36)^2}{8}}{7} = 218.857$
- $s_D = \sqrt{218.857} = 14.79$
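The arithmetic above uses the computational (shortcut) formula for the sample variance. As a quick check, the sketch below recomputes it from the Diff column and compares it with the definitional formula as implemented by `statistics.variance`. The variable names are illustrative.

```python
from math import sqrt
from statistics import variance

diffs = [17, -12, 6, -7, 20, 22, -16, 6]  # Diff column from Example 1
n = len(diffs)

sum_d = sum(diffs)                  # sum of the differences
sum_d2 = sum(d * d for d in diffs)  # sum of the squared differences

# Computational (shortcut) formula used on the slide.
var_shortcut = (sum_d2 - sum_d ** 2 / n) / (n - 1)
# Definitional formula, as implemented by statistics.variance.
var_definitional = variance(diffs)

print(sum_d, sum_d2)
print(round(var_shortcut, 3), round(sqrt(var_shortcut), 2))
```

Both routes give the same variance. The shortcut form is convenient by hand because it needs only the two column totals.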
25Example 1
- $H_0: \mu_D = D_0$
- $H_A: \mu_D \ne D_0$
- Test statistic: $t = \dfrac{\bar{X}_D - D_0}{s_D / \sqrt{n_D}}$
26Example 1
- Rejection region: $t_{crit} = t_{7,.025} = 2.365$
- $t_{obt} = \dfrac{4.5 - 0}{14.79 / \sqrt{8}} = \dfrac{4.5}{5.229} = 0.861$
- Decision: do not reject $H_0$; there is no evidence that the
treatment effects differ.
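The whole test for Example 1 can be sketched in a few lines. The critical value is taken from a t table with $n_D - 1 = 7$ degrees of freedom, as quoted on the slide, rather than computed. Results may differ from the slide in the third decimal place because the slide rounds $s_D$ to 14.79 before dividing.

```python
from math import sqrt
from statistics import mean, stdev

diffs = [17, -12, 6, -7, 20, 22, -16, 6]  # Diff column from Example 1
d0 = 0           # hypothesized mean difference under H0
t_crit = 2.365   # t(7, .025) from a t table, as quoted on the slide

n = len(diffs)
standard_error = stdev(diffs) / sqrt(n)
t_obt = (mean(diffs) - d0) / standard_error

# Two-tailed decision rule: reject H0 when |t| exceeds the critical value.
reject_h0 = abs(t_obt) > t_crit
print(round(standard_error, 3), round(t_obt, 3), reject_h0)
```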
27Example 2
- Tetris is a computer game requiring some spatial
information-processing skills and good eye-hand
coordination, either or both of which may improve
with practice. Six people who had never
previously played Tetris were tested on the game
at the beginning (Test 1) and at the end (Test 2)
of a 2-week period during which they played
Tetris for one hour each day. Their Tetris scores
on the two testing sessions appear on the next
slide.
28Example 2
- a. Did the subjects' Tetris scores improve
significantly from Test 1 to Test 2 ($\alpha = .05$)?
- b. Is the variance of the subjects' Test 2 scores
significantly different from 400,000, the
variance among the population of experts at
Tetris ($\alpha = .05$)?
29Example 2a

Subject | Test 1 | Test 2 | Diff (D) | D²
--------|--------|--------|----------|---------
1       |   3025 |   5642 |     2617 |  6848689
2       |   4120 |   5117 |      997 |   994009
3       |   2675 |   4333 |     1658 |  2748964
4       |   6715 |   6026 |     -689 |   474721
5       |   1997 |   5429 |     3432 | 11778624
6       |   4807 |   4807 |        0 |        0
Σ       |        |        |     8015 | 22845007

$s_D = 1558.095$
30Example 2a
- Rejection region: $t_{crit} = t_{5,.05} = 2.015$
- $t_{obt} = \dfrac{1335.83 - 0}{1558.095 / \sqrt{6}} = 2.100$
- Decision: reject $H_0$; scores improved
significantly.
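A sketch that recomputes Example 2a from the raw scores. The slides do not state the hypotheses for this example, so the code assumes $H_0: \mu_D = 0$ against the one-tailed alternative $H_A: \mu_D > 0$, which is what the one-tailed critical value implies. With six subjects the standard error divides by $\sqrt{6}$.

```python
from math import sqrt
from statistics import mean, stdev

test1 = [3025, 4120, 2675, 6715, 1997, 4807]
test2 = [5642, 5117, 4333, 6026, 5429, 4807]
diffs = [b - a for a, b in zip(test1, test2)]  # Test 2 minus Test 1

n = len(diffs)   # 6 subjects, so 5 degrees of freedom
t_crit = 2.015   # t(5, .05), one-tailed, from a t table

t_obt = (mean(diffs) - 0) / (stdev(diffs) / sqrt(n))
# One-tailed rule for the assumed H_A: mu_D > 0 (scores improved).
reject_h0 = t_obt > t_crit
print(sum(diffs), round(stdev(diffs), 3), round(t_obt, 3), reject_h0)
```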