Title: Statistical Inference: Two-Sample Case
Statistical Comparison of Two Independent Samples
- Often, in psychology, researchers wish to compare two samples to determine whether they were drawn from different populations (i.e., to see whether the two samples are significantly different).
- E.g., doctors wanted to see which of two drugs should be given to patients after an organ transplant in order to prevent transplant rejection. One sample was given cyclosporine, whereas the other was given FK506. The samples were then compared.
- E.g., let's imagine we are investigating the drinking habits of university students in North America. Specifically, we want to compare students who live in dorms to those who live off-campus.
- We will draw every possible pair of samples (N = 10) from the population. One sample will be 10 students who live in a dorm, whereas the other will be 10 students who live off-campus. We will calculate the mean number of alcoholic drinks consumed per week by each sample and compare them.
Pair      Dorm mean   Off-campus mean   Difference
First     10          6                  4
Second    8           5                  3
Third     5           8                 -3
We continue this process until every possible pair of samples has been drawn from the population. For each pair we obtain a difference score (the dorm mean minus the off-campus mean) and plot these scores on a distribution called the sampling distribution of the difference between means.
Sampling Distribution of the Difference Between Means
Population in which $\mu_D = \mu_{OC}$
[Figure: normal curve centred on 0. Positive numbers: $\bar{X}_D > \bar{X}_{OC}$. Negative numbers: $\bar{X}_D < \bar{X}_{OC}$.]
We end up with a normal distribution with a mean of 0.
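Drawing every possible pair of samples is not practical by hand, but the idea can be approximated by simulation. The sketch below is only an illustration: it assumes a hypothetical population in which dorm and off-campus students both average 6 drinks per week with a standard deviation of 3 (these values are not from the lecture), and it draws many random pairs rather than every possible pair. A normal population is a simplification, since it allows impossible negative drink counts.

```python
import random
import statistics


def simulate_mean_differences(n_pairs=5000, n=10, seed=0):
    """Draw pairs of samples from the SAME population and return the
    difference between the two sample means for each pair."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    pop_mean, pop_sd = 6.0, 3.0  # hypothetical population values
    diffs = []
    for _ in range(n_pairs):
        dorm = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
        off_campus = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
        diffs.append(statistics.mean(dorm) - statistics.mean(off_campus))
    return diffs


diffs = simulate_mean_differences()
print("mean of differences:", round(statistics.mean(diffs), 2))
print("SD of differences:  ", round(statistics.stdev(diffs), 2))
```

Under these assumptions the differences should centre close to 0, and their standard deviation approximates the standard error of the difference between means.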
Going a Step Further
We have a normal distribution with a mean (the difference between the two population means) and a standard deviation. What statistic can we use to carry out two-sample statistics?
Z-scores
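For reference, the z statistic for the difference between two independent sample means is usually written as below. The notation is an assumption chosen to match the rest of these slides: $\sigma_1, \sigma_2$ are the population standard deviations and $n_1, n_2$ the sample sizes.

```latex
z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{X}_1 - \bar{X}_2}},
\qquad
\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}
```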
Note that the z formula assumes that we know the population means and population standard deviations. However, this is very rarely the case, so the formula is hardly ever used.
The Independent t-test
- Thus, if we do not know the population means or standard deviations, we test hypotheses with the t formula:
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_{\bar{X}_1 - \bar{X}_2}}$$
$s_{\bar{X}_1 - \bar{X}_2}$ is the standard error of the difference between the means.
Usually, according to our H0, we assume that the samples are drawn from populations with the same mean (i.e., $\mu_1 = \mu_2$), so we state that $\mu_1 - \mu_2 = 0$.
Independent t-test (Equal N)
- E.g., a psychologist believes short-term memory capacity is reduced by sleep loss. Twelve subjects are randomly assigned either to a group that receives a normal amount of sleep or to a group that is sleep deprived for 24 hours. The subjects are then presented with a list of nine numbers to be remembered over a short period. The percentage of numbers correctly recalled by each subject is provided below. Is the researcher correct? Use $\alpha = 0.01$.
- Note, we will use the standard error formula for equal N.
- H0: $\mu_1 = \mu_2$
- H1: $\mu_1 > \mu_2$ (one-tailed, directional hypothesis)
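This is the formula for standard error when we have equal N:
$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{SS_1 + SS_2}{n(n - 1)}}, \qquad SS = \sum X^2 - \frac{(\sum X)^2}{n}$$
$$SS_1 = 29{,}591 - \frac{(421)^2}{6} = 50.833$$
$$SS_2 = 25{,}698 - \frac{(392)^2}{6} = 87.333$$
$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{50.833 + 87.333}{6(6 - 1)}} = 2.146$$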
Remember, according to H0, $\mu_1 - \mu_2 = 0$.
$$t = \frac{(70.167 - 65.333) - 0}{2.146} = 2.25$$
$$df = N_1 + N_2 - 2 = 6 + 6 - 2 = 10$$
$$t_{crit}(10) = 2.764$$
Since $t_{obs} = 2.25 < t_{crit} = 2.764$ at $\alpha = 0.01$, we do not reject H0. We cannot conclude that sleep deprivation has an effect on short-term memory ($t(10) = 2.25$, $p > 0.01$).
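The arithmetic in the sleep-deprivation example can be checked from the summary values that appear in the calculation ($\sum X = 421$, $\sum X^2 = 29{,}591$ for one group; $\sum X = 392$, $\sum X^2 = 25{,}698$ for the other; six subjects each). The sketch below assumes the first set belongs to the normal-sleep group (group 1) and that the equal-N standard error is $\sqrt{(SS_1 + SS_2) / (n(n - 1))}$, which reproduces the 2.146 on the slide. The function names are illustrative.

```python
import math


def sum_of_squares(sum_x, sum_x_sq, n):
    """SS = sum(X^2) - (sum(X))^2 / n."""
    return sum_x_sq - sum_x ** 2 / n


def t_equal_n(sum_x1, sum_x1_sq, sum_x2, sum_x2_sq, n):
    """Independent t-test from summary values, both groups of size n.
    Returns (t, standard error, degrees of freedom)."""
    ss1 = sum_of_squares(sum_x1, sum_x1_sq, n)
    ss2 = sum_of_squares(sum_x2, sum_x2_sq, n)
    se = math.sqrt((ss1 + ss2) / (n * (n - 1)))
    # Under H0 the population mean difference is 0.
    t = (sum_x1 / n - sum_x2 / n - 0) / se
    return t, se, 2 * n - 2


t_obs, se, df = t_equal_n(421, 29591, 392, 25698, n=6)
print(f"SE = {se:.3f}, t({df}) = {t_obs:.2f}")  # SE = 2.146, t(10) = 2.25

T_CRIT = 2.764  # critical value used on the slide (df = 10, alpha = 0.01)
print("reject H0:", t_obs > T_CRIT)  # reject H0: False
```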
Another Example
- A clinician believes that depression may affect sleep. She decides to test the idea. The sleep of nine depressed patients and eight nondepressed subjects is monitored for three nights. The average number of hours slept by each subject is provided below. Is the clinician correct? Set $\alpha$ at 0.05.
- We will use the standard error formula for unequal N.
- H0: $\mu_D = \mu_N$
- H1: $\mu_D \neq \mu_N$
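This is the formula for standard error when we have unequal N:
$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\left(\frac{SS_1 + SS_2}{n_1 + n_2 - 2}\right)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
$$SS_1 = 431.22 - \frac{(62.2)^2}{9} = 1.349$$
$$SS_2 = 457.92 - \frac{(60.4)^2}{8} = 1.9$$
Independent t-test (Unequal N)
$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\left(\frac{1.349 + 1.9}{9 + 8 - 2}\right)\left(\frac{1}{9} + \frac{1}{8}\right)} = 0.226$$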
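$$t = \frac{(6.91 - 7.55) - 0}{0.226} = -2.83$$
$$df = N_1 + N_2 - 2 = 9 + 8 - 2 = 15$$
$$t_{crit}(15) = \pm 2.131 \quad (\alpha = 0.05)$$
Since $t_{obs} = -2.83 < -2.131$ (i.e., $|t_{obs}| > t_{crit}$), we reject H0. Depression does have an effect on sleep ($t(15) = -2.83$, $p < 0.05$).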
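The depression example can be checked the same way from its summary values ($\sum X_1 = 62.2$, $\sum X_1^2 = 431.22$, $n_1 = 9$; $\sum X_2 = 60.4$, $\sum X_2^2 = 457.92$, $n_2 = 8$). This is a minimal sketch using the pooled standard error for unequal N; the function names are illustrative, not part of the lecture.

```python
import math


def sum_of_squares(sum_x, sum_x_sq, n):
    """SS = sum(X^2) - (sum(X))^2 / n."""
    return sum_x_sq - sum_x ** 2 / n


def t_unequal_n(sum_x1, sum_x1_sq, n1, sum_x2, sum_x2_sq, n2):
    """Independent t-test from summary values with a pooled variance.
    Returns (t, standard error, degrees of freedom)."""
    ss1 = sum_of_squares(sum_x1, sum_x1_sq, n1)
    ss2 = sum_of_squares(sum_x2, sum_x2_sq, n2)
    df = n1 + n2 - 2
    pooled_variance = (ss1 + ss2) / df
    se = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    # Under H0 the population mean difference is 0.
    t = (sum_x1 / n1 - sum_x2 / n2 - 0) / se
    return t, se, df


t_obs, se, df = t_unequal_n(62.2, 431.22, 9, 60.4, 457.92, 8)
print(f"SE = {se:.3f}, t({df}) = {t_obs:.2f}")  # SE = 0.226, t(15) = -2.83

T_CRIT = 2.131  # two-tailed critical value used on the slide (df = 15, alpha = 0.05)
print("reject H0:", abs(t_obs) > T_CRIT)  # reject H0: True
```

Comparing the absolute value of t with the critical value avoids the sign confusion that is easy to fall into when t is negative.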
Assumptions of the t-test
- The t-distribution is used in two-sample statistics under three assumptions.
- 1) The sampling distribution of the difference between the means is normally distributed.
- 2) We can get an unbiased estimate of variance using the sample (i.e., $s_{\bar{X}_1 - \bar{X}_2}$).
- 3) Both samples are drawn from populations with equal variances. This is referred to as homogeneity of variance.
The F-distribution
- Sometimes the variances of the populations are not equal. We can test for this using the F distribution, for which we calculate an F ratio:
$$F = \frac{s^2_{larger}}{s^2_{smaller}}$$
where $s^2$ is the estimated variance of the population based on the sample and is equal to
$$s^2 = \frac{SS}{n - 1}$$
The F-ratio is also useful in studies where we expect dual effects. That is, the independent variable causes some subjects' scores to increase while it causes others to decrease.
- E.g., a researcher believes that drug X will increase heart rate in some patients while decreasing it in other patients (hint: dual effects). He compares the heart rate of 5 patients given drug X to that of patients who receive no drug. The data are provided below. Is he correct? Use normal decision rules.
$\sum X_1 = 385$, $\sum X_2 = 366$, $\sum X_1^2 = 29{,}829$, $\sum X_2^2 = 26{,}884$
$$SS_1 = 29{,}829 - \frac{(385)^2}{5} = 184$$
$$SS_2 = 26{,}884 - \frac{(366)^2}{5} = 92.8$$
Now calculate the estimated variances:
$$s_1^2 = \frac{184}{5 - 1} = 46, \qquad s_2^2 = \frac{92.8}{5 - 1} = 23.2$$
Because group 1 has the larger estimate of population variance, it will be in the numerator of the F-ratio formula, whereas the estimate of variance for group 2 will be in the denominator.
$$F = \frac{46}{23.2} = 1.98$$
This F ratio must now be compared to a critical F-ratio. Thus, we need to calculate degrees of freedom.
There are two separate degrees of freedom for an F-ratio: one for the numerator and one for the denominator. Each is the size of that sample minus one, so here both are $5 - 1 = 4$.
- Use Table D on page 330 to determine $F_{crit}$ for df = 4, 4 (note: use Table D1 if performing a two-tailed test).
- The numbers in the top row of the table represent the df of the numerator. The numbers in the column on the left represent the df of the denominator.
- Note, the numbers in bold (bottom) represent $F_{crit}$ where $\alpha = 0.01$, whereas the numbers in normal font (top) represent $F_{crit}$ where $\alpha = 0.05$.
- Go to the column for df = 4 in the numerator and go down until you get to the fourth row, where df = 4 in the denominator.
- Our $F_{crit} = 6.39$.
- Our calculated F ratio must be greater than $F_{crit}$ in order to reject H0 and conclude that the variances are not homogeneous.
- Since $F(4, 4) = 1.98 < F_{crit} = 6.39$, we do not reject H0. We treat the variances as homogeneous ($F(4, 4) = 1.98$, $p > 0.05$).
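The F-ratio steps for the drug X example can be reproduced with a few lines of Python. This is a sketch from the summary values given above, assuming five patients per group (as the divisions by 5 and df = 4, 4 imply) and using the critical value 6.39 quoted from the table. The function names are illustrative.

```python
def variance_estimate(sum_x, sum_x_sq, n):
    """Estimated population variance: s^2 = SS / (n - 1)."""
    ss = sum_x_sq - sum_x ** 2 / n
    return ss / (n - 1)


def f_ratio(var_a, var_b):
    """Larger variance estimate over the smaller, so F is always >= 1."""
    return max(var_a, var_b) / min(var_a, var_b)


var1 = variance_estimate(385, 29829, 5)  # drug X group, about 46
var2 = variance_estimate(366, 26884, 5)  # no-drug group, about 23.2
f_obs = f_ratio(var1, var2)

F_CRIT = 6.39  # critical F for df = 4, 4 as read from the table in the text
print(f"F(4, 4) = {f_obs:.2f}; reject H0: {f_obs > F_CRIT}")
# F(4, 4) = 1.98; reject H0: False
```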
Control and Experimental Groups
- In some cases in psychology, our independent variable is a type of treatment (receiving a drug, some type of training or instruction, etc.). We then compare a group receiving a treatment to a group that receives no treatment.
- E.g., in our previous example, one group received drug X and the other received no drug.
- Control Group: a sample that receives no treatment.
- Experimental Group: a sample that receives treatment.
t-test for Correlated Samples
- Whenever we carry out a two-sample study in psychology, there is a great deal of variability. Because of this variability, the scores in the two samples overlap.
- This variability (overlap) may be due to one of three factors.
- 1) The independent variable.
- In this case, depressed or nondepressed.
- 2) Random error.
- E.g., variation in procedure from subject to subject, external distractions, etc.
- 3) Subject differences.
- E.g., some subjects sleep more or less than other subjects, and this has nothing to do with depression.
- We can minimize variability caused by subject differences by using correlated samples.
t-test for Correlated Samples
- There are two types of correlated sample studies.
- 1) The Before and After Design: a single sample is measured before and after the introduction of some independent variable. The same subjects are in both samples.
- 2) The Matched Group Design: each individual in one sample is matched with a subject in the other sample. The matching is done so that the two individuals are equivalent (or nearly equivalent) with respect to a specific variable the researcher would like to control.
- In these cases, we must calculate the difference between the scores of the two samples. We use these differences in our calculation of t.
- Note, we will also use a slightly different estimate of the standard error of the difference between the means, and a slightly different t-formula.
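The correlated-samples formula is not given in this section. As an illustration only, the sketch below uses the common textbook form, $t = \bar{D} / (s_D / \sqrt{N})$ with $df = N - 1$, where $D$ is each subject's difference score. Both that formula and the before/after scores are assumptions made for the example, not material from the lecture.

```python
import math
import statistics


def paired_t(before, after):
    """t-test on difference scores for a before-and-after design.
    Returns (t, degrees of freedom)."""
    if len(before) != len(after):
        raise ValueError("each subject needs a before and an after score")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD, divides by n - 1
    se_d = sd_d / math.sqrt(n)      # standard error of the mean difference
    return mean_d / se_d, n - 1


# Hypothetical hours of sleep for five subjects, before and after a treatment.
before = [6, 7, 5, 8, 6]
after = [7, 9, 6, 8, 8]

t_obs, df = paired_t(before, after)
print(f"t({df}) = {t_obs:.2f}")  # t(4) = 3.21
```

Because each subject serves as their own comparison, stable differences between subjects cancel out of the difference scores, which is the reduction in variability described above.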