Title: Comparisons between two means:
1Lecture 7
- Comparisons between two means
- Research questions about two separate or independent groups
- Research questions about two dependent or correlated groups
2Univariate vs. Multivariate
- Univariate analysis usually refers to one predictor variable and one outcome variable.
  - Is gender a predictor of pneumonia?
- Multivariate analysis usually refers to more than one predictor variable or more than one outcome variable being evaluated simultaneously.
  - After adjusting for age, is gender a predictor of pneumonia?
3Difference vs. Association
- Some tests are designed to assess whether there are statistically significant differences between groups.
  - Is there a statistically significant difference between the age of patients with and without pneumonia?
- Some tests are designed to assess whether there are statistically significant associations between variables.
  - Is the age of the patient associated with the number of days in the hospital?
4Unmatched vs. Matched
- Some statistical tests are designed to assess groups that are unmatched or independent.
  - Is the admission systolic blood pressure different between men and women?
- Some statistical tests are designed to assess groups that are matched or data that are paired.
  - Is the systolic blood pressure different between admission and discharge?
5Hypothesis testing (Review)
- In any hypothesis testing situation, we first need to define the null hypothesis. It is under the null hypothesis that we work out how our test statistic is distributed.
- Knowing how our test statistic is distributed allows us to use the corresponding probability distribution.
- We start by assuming that the null hypothesis is true. This gives us the acceptance and rejection regions, defined at our chosen significance level, for the distribution "under the null hypothesis".
6Hypothesis testing (Review)
- Note: The null hypothesis, H0, and the alternative hypothesis, Ha, should be mutually exclusive and exhaustive.
- There are 2 types of errors.
  - 1. Type I error: rejecting a true null hypothesis. (Its probability is commonly called α.)
  - 2. Type II error: failing to reject a false null hypothesis. (Its probability is commonly called β.)
- Consider a jury's hypotheses:
  - H0: The defendant is innocent.
  - Ha: The defendant is guilty.
- Therefore:
  - A Type I error would entail a false conviction of an innocent person.
  - A Type II error would entail letting a guilty person go free.
7Hypothesis Testing (Review)
- Significance tests are used to decide whether to reject the null hypothesis.
- This is done by studying the sampling distribution for a statistic.
- If the probability of observing a result at least as extreme as yours, assuming the null is true, is < .05, reject the null.
- If that probability is > .05, fail to reject (loosely, "accept") the null.
- There are many kinds of significance tests for different kinds of statistics. Today we're going to discuss t-tests.
82-Sample T-Tests
- Independent t-test
- Dependent t-test
- Picking the correct test
9t-test example
- We are interested in whether caffeine consumption improves people's happiness.
- We randomly assign 25 people to drink decaf and 25 people to drink regular coffee.
- Subsequently, we measure how happy people are.
- Note: The independent variable is categorical (you're in one group or the other), and there are only two groups.
- The dependent variable is continuous: we measure how happy people are on a continuous metric.
10t-test example (cont)
- Let's say we find that the control group has a mean score of 3 (SD = 1) and the experimental group has a mean score of 3.2 (SD = .9).
- Thus, there is a .2 difference between the two groups: 3.2 - 3.0 = .2
- Two possibilities:
  - The .2 difference between groups is due to sampling error, not a real effect of caffeine. In other words, the two samples are drawn from populations with identical means and variances.
  - The .2 difference between groups is due to the effect of caffeine, not sampling error. In other words, the two samples are drawn from populations with different means (and maybe different variances).
11Population for control group
Population for experimental group
These two populations have identical means and variances.
These two samples may or may not have identical means and variances because of sampling error; hence, one sample mean might be .2 points higher than the other.
12t-test example (cont)
- We need to know how likely it is that we would observe a difference of .20 or higher if the null hypothesis is true.
- How can we do this?
- We can construct a sampling distribution of mean differences, assuming the null hypothesis is true.
- We can use this distribution to determine how large a mean difference we will observe on average when the population mean difference is zero.
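The idea of a sampling distribution of mean differences can be sketched by simulation. This is only an illustration under stated assumptions: both groups are drawn from the same normal population with mean 3 and SD 1 (values borrowed from the control group in the caffeine example), with 25 people per group. The function name, the number of simulations, and the seed are hypothetical choices.

```python
import random
import statistics

def simulate_mean_differences(n_per_group=25, pop_mean=3.0, pop_sd=1.0,
                              n_sims=5000, seed=0):
    """Repeatedly draw two samples from the SAME normal population (the null
    hypothesis) and return the differences between their sample means."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_sims):
        control = [rng.gauss(pop_mean, pop_sd) for _ in range(n_per_group)]
        experimental = [rng.gauss(pop_mean, pop_sd) for _ in range(n_per_group)]
        diffs.append(statistics.mean(experimental) - statistics.mean(control))
    return diffs

diffs = simulate_mean_differences()

# Share of simulated differences at least as large as the observed .20
prop_at_least_observed = sum(d >= 0.20 for d in diffs) / len(diffs)

print(f"mean of simulated differences: {statistics.mean(diffs):.3f}")
print(f"SD of simulated differences:   {statistics.stdev(diffs):.3f}")
print(f"proportion >= .20 under null:  {prop_at_least_observed:.3f}")
```

The SD of the simulated differences is an empirical estimate of the standard error of the difference, and the last proportion approximates a one-sided probability of seeing a difference of .20 or more by chance alone.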
13Assumptions 2-Sample T-Test
- Data in each group follow a normal distribution.
- For the pooled test, the variances for each group are equal.
- The samples are independent. That is, who is in the second sample doesn't depend on who is in the first sample (and vice versa).
14Indep t-test formula
- (µ1 - µ2): hypothesized difference between the population means. (For our purposes, always zero.)
- (xbar1 - xbar2): actual difference observed.
- Standard Error of the Difference (between the means)
  - difference expected between sample means
  - how much we expect the sample means to differ purely by chance
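The labels on this slide describe the pieces of the independent-samples t statistic. A sketch of the standard pooled-variance form follows; the symbols s_p (pooled standard deviation) and s_{\bar{x}_1 - \bar{x}_2} (standard error of the difference) are notation assumed here, and the pooled form assumes equal group variances.

```latex
t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}},
\qquad
s_{\bar{x}_1 - \bar{x}_2} = \sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)},
\qquad
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}
```

Under the null hypothesis, \mu_1 - \mu_2 = 0 and the statistic has n_1 + n_2 - 2 degrees of freedom.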
15Ind. t-test Example
16Hypothesis Testing Steps (Ind. t)
- 1. Comparing xbar1 and xbar2; µ and σ unknown.
- 2. H0: µ1 - µ2 = 0; HA: µ1 - µ2 ≠ 0
- 3. α = .05, df = n1 + n2 - 2 = 5 + 5 - 2 = 8
  - tcritical = 2.306
- 4. tcalculated = -1.947
- 5. Accept (fail to reject) the H0.
  - The research hypothesis was not supported.
- The weight of women in sororities (xbar = 111) does not differ significantly from that of other women (xbar = 127), t(8) = -1.947, n.s.
(not needed if using SPSS)
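The critical-value lookup and the decision in the steps above can be sketched in a few lines. This is a standard-library-only illustration, not the SPSS procedure: the t distribution is integrated numerically, and the helper names (`t_pdf`, `t_cdf`, `t_critical_two_tailed`) are hypothetical. The group sizes and tcalculated are taken from the slide.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] and symmetry about zero."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical_two_tailed(alpha, df):
    """Bisection search for the t value with P(T <= t) = 1 - alpha/2."""
    lo, hi = 0.0, 100.0
    target = 1 - alpha / 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

n1, n2 = 5, 5        # group sizes from the slide
alpha = 0.05
t_calc = -1.947      # tcalculated as reported on the slide

df = n1 + n2 - 2
t_crit = t_critical_two_tailed(alpha, df)
p_two_tailed = 2 * (1 - t_cdf(abs(t_calc), df))
reject_null = abs(t_calc) > t_crit

print(f"df = {df}, t critical = {t_crit:.3f}")
print(f"two-tailed p = {p_two_tailed:.3f}, reject H0: {reject_null}")
```

Comparing |tcalculated| with tcritical and comparing the p-value with α lead to the same decision, which is why the table lookup is not needed when software reports the p-value.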
17Ind. t-test Example (SPSS)
18Steps of Hypothesis Testing
- In an ideal research setting, we define a strategy that follows this order:
  - 1. Formulate the hypothesis.
  - 2. Figure out what test statistic will test this hypothesis.
  - 3. Collect data.
  - 4. Perform the statistical test.
  - 5. Accept or reject the hypothesis.
- There are 4 elements in any hypothesis test:
  - 1. A null hypothesis, H0
  - 2. An alternative hypothesis, Ha
  - 3. A test statistic (how to calculate it, and its distribution)
  - 4. A rejection region (you want to be in the rejection region!)
19Caution on hypothesis testing
- In the jury example, the social choice is what constitutes an acceptable risk, that is, deciding the probability of a Type I error.
- Generally, we refer to alpha (α) as the significance level, which is the upper limit we set on the probability of a Type I error.
- In most statistical situations, by convention we set α = 0.05. But in the criminal case, society may choose α = 0.001 or even α = 0.0001, such that we increase the probability of letting the guilty go free so as not to falsely convict an innocent person.
- If one reduces Type I error, one by necessity increases Type II error (for a fixed sample size and effect size).
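The trade-off between the two error types can be illustrated numerically. The sketch below swaps in a simpler setting than the t-test: a one-sided, one-sample z-test with known standard deviation. The effect size (0.5 SD) and sample size (25) are hypothetical values chosen only for illustration, and `type_ii_error` is a hypothetical helper name.

```python
from statistics import NormalDist

def type_ii_error(alpha, effect_size, n):
    """Beta for a one-sided, one-sample z-test (known SD) when the true mean
    lies effect_size standard deviations above the null value."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    return NormalDist().cdf(z_crit - effect_size * n ** 0.5)

alphas = (0.05, 0.001, 0.0001)
betas = [type_ii_error(a, effect_size=0.5, n=25) for a in alphas]

for a, b in zip(alphas, betas):
    print(f"alpha = {a:<7} beta = {b:.3f}")
```

With the sample size and effect held fixed, each stricter α pushes the rejection cutoff further out, so more true effects fall short of it and β rises.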
20What happens if samples aren't independent?
- That is, what if they are dependent or correlated?
21Ways Pairing Can Occur
- When each subject in one group is matched with a similar subject in the second group.
- When subjects serve as their own control by receiving both of two different treatments.
- When, in before-and-after studies, the same subjects are measured twice.
22What is the effect of alcohol on useful
consciousness?
- Ten male subjects were taken to a simulated altitude of 25,000 ft and given tasks to perform.
- For each, the time (in seconds) at which useful consciousness ended was recorded.
- 3 days later, the experiment was repeated one hour after the subjects ingested 0.5 cm³ of 100-proof whiskey per pound of body weight.
23What is the effect of alcohol on useful
consciousness?
H0: µD = 0 vs. Ha: µD > 0

Paired T for NoAlcohol - Alcohol

             N    Mean   StDev   SE Mean
NoAlcohol   10   546.6   238.8      75.5
Alcohol     10   351.0   210.9      66.7
Difference  10   195.6   230.5      72.9

95% CI for mean difference: (30.7, 360.5)
T-Test of mean difference = 0 (vs > 0): T-Value = 2.68  P-Value = 0.013
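The SE Mean, T-Value, and confidence interval in this output follow directly from the Difference row. A minimal sketch, assuming only the summary numbers shown above plus the two-sided 95% critical value for 9 degrees of freedom (2.262, from a standard t table):

```python
import math

n = 10
mean_diff = 195.6    # mean of (NoAlcohol - Alcohol), in seconds
sd_diff = 230.5      # standard deviation of the 10 differences
t_crit_df9 = 2.262   # two-sided 95% critical value for df = 9 (t table)

se = sd_diff / math.sqrt(n)     # standard error of the mean difference
t_value = mean_diff / se        # test statistic for H0: mean difference = 0
ci = (mean_diff - t_crit_df9 * se, mean_diff + t_crit_df9 * se)

print(f"SE Mean = {se:.1f}")
print(f"T-Value = {t_value:.2f}")
print(f"95% CI  = ({ci[0]:.1f}, {ci[1]:.1f})")
```

Only the differences enter the calculation; the separate NoAlcohol and Alcohol standard deviations are not used.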
24What is the effect of time on memory recall?
- 8 people were given 10 minutes to memorize a list of 20 nonsense words.
- Each was asked to list as many words as he or she could remember after 1 hour and again after 24 hours.
25What is the effect of time on memory recall?
Paired T for 1hour - 24hour

            N    Mean   StDev   SE Mean
1hour       8   12.75    3.69      1.31
24hour      8    9.13    3.52      1.25
Difference  8   3.625   2.066     0.730

95% CI for mean difference: (1.897, 5.353)
T-Test of mean difference = 0 (vs > 0): T-Value = 4.96  P-Value = 0.001
26Do males earn higher average starting salaries
than females?
(in $1,000s)
                 Males   Females
                 22      20
                 29      28
                 80      78
                 35      32
Sample Average   41.5    39.5

The real question is whether males and females in the same job earn different average salaries. It is better to compare the difference in salaries within pairs of males and females.
27Paired Study
Salaries (in $1,000s)
Job          Males   Females   Difference (M - F)
Non-Profit   22      20        2.0
Education    29      28        1.0
Doctor       80      78        2.0
Scientist    35      32        3.0
Averages     41.5    39.5      2.0

P-value: How likely is it that a paired sample would have a difference as large as $2,000 if the true difference were 0?
The problem reduces to a one-sample t-test on the differences!
28The Paired-T Test Statistic
- If
  - there are n pairs
  - and the differences are normally distributed
- Then the test statistic, which follows a t-distribution with n-1 degrees of freedom, gives us our p-value.
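For reference, a sketch of the standard paired-t test statistic that matches this description, where \bar{d} is the mean of the n paired differences and s_d is their sample standard deviation (notation assumed here):

```latex
t = \frac{\bar{d} - 0}{s_d / \sqrt{n}},
\qquad
\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i,
\qquad
s_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (d_i - \bar{d})^2},
\qquad
\text{df} = n - 1
```

This is the one-sample t statistic applied to the differences d_i, with a hypothesized mean difference of zero.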
29The Paired-T Confidence Interval
- If
  - there are n pairs
  - and the differences are normally distributed
- Then the confidence interval, with t following a t-distribution with n-1 d.f., estimates the actual population difference.
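A sketch of the usual form of this interval, using the same assumed notation (\bar{d} for the mean difference, s_d for the standard deviation of the differences, and t^{*}_{n-1} for the critical value at the chosen confidence level):

```latex
\bar{d} \;\pm\; t^{*}_{n-1}\,\frac{s_d}{\sqrt{n}}
```

For a 95% interval, t^{*}_{n-1} is the value that leaves 2.5% in each tail of the t-distribution with n-1 degrees of freedom.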
30Data analyzed as Paired T
Paired T for M - F

            N    Mean   StDev   SE Mean
M           4    41.5    26.2      13.1
F           4    39.5    26.1      13.1
Difference  4   2.000   0.816     0.408

95% CI for mean difference: (0.701, 3.299)
T-Test of mean difference = 0 (vs not = 0): T-Value = 4.90  P-Value = 0.016

P = 0.016. Reject null. Sufficient evidence to conclude that average starting salaries differ between males and females.
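The paired output can be reproduced from the four salary pairs on the earlier slide. A minimal sketch, assuming the two-sided 95% critical value for 3 degrees of freedom (3.182, from a standard t table); variable names are illustrative.

```python
import math
import statistics

# Salaries in $1,000s, one male and one female per job
males = [22, 29, 80, 35]
females = [20, 28, 78, 32]

diffs = [m - f for m, f in zip(males, females)]   # M - F within each job
n = len(diffs)

mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)                 # sample SD (n - 1 denominator)
se_diff = sd_diff / math.sqrt(n)
t_value = mean_diff / se_diff

t_crit_df3 = 3.182                                # two-sided 95%, df = 3 (t table)
ci = (mean_diff - t_crit_df3 * se_diff, mean_diff + t_crit_df3 * se_diff)

print(f"mean = {mean_diff:.3f}, StDev = {sd_diff:.3f}, SE = {se_diff:.3f}")
print(f"T-Value = {t_value:.2f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```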
31Now, Data analyzed as 2-Sample T
Two sample T for M vs F

     N   Mean   StDev   SE Mean
M    4   41.5    26.2        13
F    4   39.5    26.1        13

95% CI for µ M - µ F: (-43, 47)
T-Test µ M = µ F (vs not =): T = 0.11  P = 0.92  DF = 6

P = 0.92. Do not reject null. Insufficient evidence to conclude that average starting salaries differ between males and females.
32What happened?
- The P-value from the two-sample t-test is just plain wrong. (Assumptions not met: the samples are not independent.)
- We removed or blocked out the extra variability in the data due to differences in jobs, thereby focusing directly on the differences in salaries.
- The paired t-test is more powerful because the paired design reduces the variability in the data.
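The reduction in variability can be seen by computing both standard errors from the same four salary pairs. A minimal sketch, assuming the pooled-variance form for the two-sample test; variable names are illustrative.

```python
import math
import statistics

males = [22, 29, 80, 35]      # salaries in $1,000s
females = [20, 28, 78, 32]
n1, n2 = len(males), len(females)
mean_gap = statistics.mean(males) - statistics.mean(females)

# Two-sample (pooled) analysis: job-to-job variability stays in the error term
pooled_var = ((n1 - 1) * statistics.variance(males)
              + (n2 - 1) * statistics.variance(females)) / (n1 + n2 - 2)
se_two_sample = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_two_sample = mean_gap / se_two_sample

# Paired analysis: differencing within each job removes that variability
diffs = [m - f for m, f in zip(males, females)]
se_paired = statistics.stdev(diffs) / math.sqrt(len(diffs))
t_paired = statistics.mean(diffs) / se_paired

print(f"two-sample: SE = {se_two_sample:.2f}, t = {t_two_sample:.2f}")
print(f"paired:     SE = {se_paired:.3f}, t = {t_paired:.2f}")
```

Both analyses use the same numerator (a mean difference of 2.0); only the standard error in the denominator changes.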
33Example 3
- You have a very important psychological question: Is apple pie preferred over pecan pie? You give 9 people a slice of apple pie and a slice of pecan pie. Since you are such a skilled researcher, you present the slices of pie in a counterbalanced order across subjects. You measure the number of grams of apple and pecan pie that each person eats.
34- What test? Related Samples t-test
- Level of significance?
- 2. State IV, levels of IV, and DV?
- 3. One-tailed or two-tailed test?
- 4. Hypotheses?
- 5. Critical value?
  - df = 8, tcrit = 1.860
35Example Cont.
36Steps 6 & 7
t obs = -1.095. Since t obs < t crit, we fail to reject the null hypothesis. Apple pie is not preferred over pecan pie.