Title: Comparison of 2 Population Means
1. Comparison of 2 Population Means
- Goal: To compare 2 populations/treatments with respect to a numeric outcome
- Sampling Design: Independent Samples (Parallel Groups) vs Paired Samples (Crossover Design)
- Data Structure: Normal vs Non-normal
- Sample Sizes: Large (n1, n2 > 20) vs Small
2. Independent Samples
- Units in the two samples are different
- Sample sizes may or may not be equal
- Large-sample inference based on Normal Distribution (Central Limit Theorem)
- Small-sample inference depends on distribution of individual outcomes (Normal vs non-Normal)
3. Parameters/Estimates (Independent Samples)
- Parameter
- Estimator
- Estimated standard error
- Shape of sampling distribution
  - Normal if data are normal
  - Approximately normal if n1, n2 > 20
  - Non-normal otherwise (typically)
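In standard notation, the three quantities listed above can be written as follows. This is a sketch that assumes \bar{y}_i, s_i and n_i denote the sample mean, sample standard deviation and sample size of group i (symbols chosen here for illustration) and that the standard error is the unpooled large-sample form.

```latex
\begin{align*}
\text{Parameter:} \quad & \mu_1 - \mu_2 \\
\text{Estimator:} \quad & \bar{y}_1 - \bar{y}_2 \\
\text{Estimated standard error:} \quad & \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}
\end{align*}
```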
4. Large-Sample Test of μ1 − μ2
- Null hypothesis: The population means differ by Δ0 (which is typically 0)
- Alternative Hypotheses
  - 1-Sided: HA: μ1 − μ2 > Δ0
  - 2-Sided: HA: μ1 − μ2 ≠ Δ0
- Test Statistic
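The usual large-sample test statistic for this null hypothesis is sketched below, assuming \bar{y}_i, s_i and n_i are the sample mean, standard deviation and size of group i (notation assumed here, not taken from the slide).

```latex
z_{obs} = \frac{(\bar{y}_1 - \bar{y}_2) - \Delta_0}
               {\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}
```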
5. Large-Sample Test of μ1 − μ2
- Decision Rule
  - 1-sided alternative
    - If z_obs ≥ z_α ⇒ Conclude μ1 − μ2 > Δ0
    - If z_obs < z_α ⇒ Do not reject μ1 − μ2 = Δ0
  - 2-sided alternative
    - If z_obs ≥ z_{α/2} ⇒ Conclude μ1 − μ2 > Δ0
    - If z_obs ≤ −z_{α/2} ⇒ Conclude μ1 − μ2 < Δ0
    - If −z_{α/2} < z_obs < z_{α/2} ⇒ Do not reject μ1 − μ2 = Δ0
6. Large-Sample Test of μ1 − μ2
- Observed Significance Level (P-Value)
  - 1-sided alternative
    - P = P(z ≥ z_obs) (From the std. Normal distribution)
  - 2-sided alternative
    - P = 2P(z ≥ |z_obs|) (From the std. Normal distribution)
- If P-Value ≤ α, then reject the null hypothesis
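The 2-sided large-sample procedure (test statistic, decision rule and P-value) can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: the function name `large_sample_z_test` and the summary statistics are hypothetical, and the standard error is assumed to be the unpooled form sqrt(s1²/n1 + s2²/n2).

```python
from math import sqrt
from statistics import NormalDist

def large_sample_z_test(mean1, sd1, n1, mean2, sd2, n2, delta0=0.0, alpha=0.05):
    """2-sided large-sample z-test of H0: mu1 - mu2 = delta0."""
    se = sqrt(sd1**2 / n1 + sd2**2 / n2)        # estimated standard error
    z_obs = (mean1 - mean2 - delta0) / se
    std_normal = NormalDist()                   # standard Normal distribution
    p_two_sided = 2 * (1 - std_normal.cdf(abs(z_obs)))
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    if z_obs >= z_crit:
        decision = "conclude mu1 - mu2 > delta0"
    elif z_obs <= -z_crit:
        decision = "conclude mu1 - mu2 < delta0"
    else:
        decision = "do not reject mu1 - mu2 = delta0"
    return z_obs, p_two_sided, decision

# Hypothetical summary statistics, for illustration only
z_obs, p_value, decision = large_sample_z_test(10.0, 2.0, 100, 9.0, 2.0, 100)
print(round(z_obs, 3), round(p_value, 4), decision)
```

Comparing `p_value` with α and comparing `z_obs` with ±z_{α/2} always lead to the same 2-sided conclusion.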
7. Large-Sample (1−α)100% Confidence Interval for μ1 − μ2
- Confidence Coefficient (1−α) refers to the proportion of times this rule would provide an interval that contains the true parameter value μ1 − μ2 if it were applied over all possible samples
- Rule
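The standard form of the large-sample interval is sketched below, assuming \bar{y}_i, s_i and n_i are the sample mean, standard deviation and size of group i (notation assumed here).

```latex
(\bar{y}_1 - \bar{y}_2) \;\pm\; z_{\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}
```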
8. Large-Sample (1−α)100% Confidence Interval for μ1 − μ2
- For 95% Confidence Intervals, z_.025 = 1.96
- Confidence Intervals and 2-sided tests give identical conclusions at same α-level
  - If entire interval is above Δ0, conclude μ1 − μ2 > Δ0
  - If entire interval is below Δ0, conclude μ1 − μ2 < Δ0
  - If interval contains Δ0, do not reject μ1 − μ2 = Δ0
9. Example: Vitamin C for Common Cold
- Outcome: Number of Colds During Study Period for Each Student
- Group 1: Given Placebo
- Group 2: Given Ascorbic Acid (Vitamin C)
Source: Pauling (1971)
10. 2-Sided Test to Compare Groups
- H0: μ1 − μ2 = 0 (No difference in treatment effects)
- HA: μ1 − μ2 ≠ 0 (Difference in treatment effects)
- Test Statistic
- Decision Rule (α = 0.05)
  - Conclude μ1 − μ2 > 0 since z_obs = 25.3 > z_.025 = 1.96
11. 95% Confidence Interval for μ1 − μ2
- Point Estimate
- Estimated Std. Error
- Critical Value: z_.025 = 1.96
- 95% CI: 0.30 ± 1.96(0.0119) = 0.30 ± 0.023 = (0.277, 0.323). Entire interval > 0
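The interval arithmetic can be checked with a short Python sketch. It uses only the point estimate (0.30) and estimated standard error (0.0119) that appear in the interval calculation; the helper name `normal_ci` is hypothetical.

```python
from statistics import NormalDist

def normal_ci(point_estimate, std_error, confidence=0.95):
    """Large-sample confidence interval: estimate +/- z_{alpha/2} * SE."""
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for 95%
    margin = z_crit * std_error
    return point_estimate - margin, point_estimate + margin

# Values taken from the 95% CI calculation for the Vitamin C example
lower, upper = normal_ci(0.30, 0.0119)
print(round(lower, 3), round(upper, 3))   # 0.277 0.323
```

Because the whole interval lies above 0, it agrees with the 2-sided test's conclusion that μ1 − μ2 > 0.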
12. Small-Sample Test for μ1 − μ2: Normal Populations (P. 538)
- Case 1: Common Variances (σ1² = σ2² = σ²)
- Null Hypothesis: H0: μ1 − μ2 = Δ0
- Alternative Hypotheses
  - 1-Sided: HA: μ1 − μ2 > Δ0
  - 2-Sided: HA: μ1 − μ2 ≠ Δ0
- Test Statistic (where Sp² is a pooled estimate of σ²)
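The standard pooled-variance test statistic is sketched below, assuming \bar{y}_i, s_i^2 and n_i are the sample mean, variance and size of group i (notation assumed here).

```latex
t_{obs} = \frac{(\bar{y}_1 - \bar{y}_2) - \Delta_0}
               {\sqrt{S_p^2 \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}},
\qquad
S_p^2 = \frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2}
```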
13. Small-Sample Test for μ1 − μ2: Normal Populations
- Decision Rule (Based on t-distribution with ν = n1 + n2 − 2 df)
  - 1-sided alternative
    - If t_obs ≥ t_{α,ν} ⇒ Conclude μ1 − μ2 > Δ0
    - If t_obs < t_{α,ν} ⇒ Do not reject μ1 − μ2 = Δ0
  - 2-sided alternative
    - If t_obs ≥ t_{α/2,ν} ⇒ Conclude μ1 − μ2 > Δ0
    - If t_obs ≤ −t_{α/2,ν} ⇒ Conclude μ1 − μ2 < Δ0
    - If −t_{α/2,ν} < t_obs < t_{α/2,ν} ⇒ Do not reject μ1 − μ2 = Δ0
14. Small-Sample Test for μ1 − μ2: Normal Populations
- Observed Significance Level (P-Value)
- Special tables needed; P-values are printed by statistical software packages
  - 1-sided alternative
    - P = P(t ≥ t_obs) (From the t_ν distribution)
  - 2-sided alternative
    - P = 2P(t ≥ |t_obs|) (From the t_ν distribution)
- If P-Value ≤ α, then reject the null hypothesis
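A minimal Python sketch of the equal-variance (pooled) t-test follows. The data and the function name `pooled_t_statistic` are hypothetical. Python's standard library has no t-distribution, so the critical value is taken from a t table rather than computed, and no P-value is produced.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t_statistic(sample1, sample2, delta0=0.0):
    """Return (t_obs, df) for the pooled-variance two-sample t-test."""
    n1, n2 = len(sample1), len(sample2)
    # Pooled variance: df-weighted average of the two sample variances
    sp2 = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    t_obs = (mean(sample1) - mean(sample2) - delta0) / se
    return t_obs, n1 + n2 - 2

# Hypothetical data, for illustration only
group1 = [12.0, 15.0, 11.0, 14.0, 13.0]
group2 = [10.0, 9.0, 12.0, 11.0, 8.0]
t_obs, df = pooled_t_statistic(group1, group2)

t_crit = 2.306                     # t_{.025, 8} from a standard t table
reject = abs(t_obs) >= t_crit      # 2-sided decision rule at alpha = 0.05
print(round(t_obs, 3), df, reject)
```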
15. Small-Sample (1−α)100% Confidence Interval for μ1 − μ2 - Normal Populations
- Confidence Coefficient (1−α) refers to the proportion of times this rule would provide an interval that contains the true parameter value μ1 − μ2 if it were applied over all possible samples
- Rule
- Interpretations same as for large-sample CIs
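For the equal-variance case, the standard form of the small-sample interval is sketched below, assuming \bar{y}_i and n_i are the sample mean and size of group i and S_p^2 is the pooled variance estimate (notation assumed here).

```latex
(\bar{y}_1 - \bar{y}_2) \;\pm\; t_{\alpha/2,\,\nu}
\sqrt{S_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)},
\qquad \nu = n_1 + n_2 - 2
```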
16. Small-Sample Inference for μ1 − μ2: Normal Populations (P. 529)
- Case 2: σ1² ≠ σ2²
- Don't pool variances
- Use adjusted degrees of freedom (Satterthwaite's Approximation)
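The unequal-variance statistic and Satterthwaite's adjusted degrees of freedom can be sketched as below. The function name `welch_t_statistic` and the data are hypothetical. The df formula is the standard Satterthwaite form, (v1 + v2)² / (v1²/(n1−1) + v2²/(n2−1)) with v_i = s_i²/n_i.

```python
from math import sqrt
from statistics import mean, variance

def welch_t_statistic(sample1, sample2, delta0=0.0):
    """Return (t_obs, df) without pooling the variances."""
    n1, n2 = len(sample1), len(sample2)
    v1 = variance(sample1) / n1    # s1^2 / n1
    v2 = variance(sample2) / n2    # s2^2 / n2
    t_obs = (mean(sample1) - mean(sample2) - delta0) / sqrt(v1 + v2)
    # Satterthwaite's approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_obs, df

# Hypothetical data with visibly different spreads
group1 = [12.0, 15.0, 11.0, 14.0, 13.0]
group2 = [10.0, 4.0, 16.0, 7.0, 13.0]
t_obs, df = welch_t_statistic(group1, group2)
print(round(t_obs, 3), round(df, 2))
```

The adjusted df is generally not an integer and never exceeds n1 + n2 − 2. With a printed t table it is commonly rounded down.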
17. Example - Maze Learning (Adults/Children)
- Groups: Adults (n1 = 14) / Children (n2 = 10)
- Outcome: Average Number of Errors in Maze Learning Task
- Raw Data on next slide
- Conduct a 2-sided test of whether mean scores differ
- Construct a 95% Confidence Interval for true difference
Source: Gould and Perrin (1916)
18. Example - Maze Learning (Adults/Children)
19. Example - Maze Learning: Case 1 - Equal Variances
H0: μ1 − μ2 = 0   HA: μ1 − μ2 ≠ 0   (α = 0.05)
No significant difference between 2 age groups
20. Example - Maze Learning: Case 2 - Unequal Variances
H0: μ1 − μ2 = 0   HA: μ1 − μ2 ≠ 0   (α = 0.05)
No significant difference between 2 age groups
21. SPSS Output
22. Small Sample Test to Compare Two Medians - Nonnormal Populations
- Two Independent Samples (Parallel Groups)
- Procedure (Wilcoxon Rank-Sum Test)
  - Rank measurements across samples from smallest (1) to largest (n1 + n2). Ties take average ranks.
  - Obtain the rank sum for each group (W1, W2)
  - 1-sided tests: Conclude HA: M1 > M2 if W2 ≤ W0
  - 2-sided tests: Conclude HA: M1 ≠ M2 if min(W1, W2) ≤ W0
  - Values of W0 are given in many texts for various sample sizes and significance levels. P-values are printed by statistical software packages.
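The ranking and rank-sum steps of the procedure can be sketched in plain Python. The helper names `average_ranks` and `rank_sums` and the data are hypothetical. The critical value W0 still has to come from a table, so the sketch stops at W1 and W2.

```python
def average_ranks(values):
    """Ranks from 1 (smallest) to len(values); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + 1 + end + 1) / 2     # average of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def rank_sums(sample1, sample2):
    """Rank the combined samples and return the rank sums (W1, W2)."""
    ranks = average_ranks(list(sample1) + list(sample2))
    w1 = sum(ranks[:len(sample1)])
    w2 = sum(ranks[len(sample1):])
    return w1, w2

# Hypothetical data containing a three-way tie at 4.5
w1, w2 = rank_sums([3.1, 4.5, 2.2, 4.5], [5.0, 6.3, 4.5])
print(w1, w2)   # 11.0 17.0
```

A quick check on any result is that W1 + W2 equals (n1 + n2)(n1 + n2 + 1)/2, here 7·8/2 = 28.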
23. Normal Approximation (Supp PP5-7)
- Under the null hypothesis of no difference in the two groups (let W = W1 from last slide)
- A z-statistic can be computed and P-value (approximate) can be obtained from Z-distribution
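The standard null mean and standard deviation of the rank sum W = W1, and the resulting z-statistic, are sketched below. This version has no correction for ties or continuity, which is an assumption here.

```latex
\mu_W = \frac{n_1 (n_1 + n_2 + 1)}{2},
\qquad
\sigma_W = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}},
\qquad
z_{obs} = \frac{W - \mu_W}{\sigma_W}
```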
24. Example - Maze Learning
25. Example - Maze Learning
As with the t-test, no evidence of population group differences
26. Computer Output - SPSS
27. Inference Based on Paired Samples (Crossover Designs)
- Setting: Each treatment is applied to each subject or pair (preferably in random order)
- Data: d_i is the difference in scores (Trt1 − Trt2) for subject (pair) i
- Parameter: μ_D - Population mean difference
- Sample Statistics
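The usual sample statistics for paired data are the mean and standard deviation of the n differences. The symbols \bar{d} and s_d are assumed here for illustration.

```latex
\bar{d} = \frac{1}{n} \sum_{i=1}^{n} d_i,
\qquad
s_d = \sqrt{\frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n - 1}}
```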
28. Test Concerning μ_D
- Null Hypothesis: H0: μ_D = Δ0 (almost always 0)
- Alternative Hypotheses
  - 1-Sided: HA: μ_D > Δ0
  - 2-Sided: HA: μ_D ≠ Δ0
- Test Statistic
29. Test Concerning μ_D
- Decision Rule (Based on t-distribution with ν = n − 1 df)
  - 1-sided alternative
    - If t_obs ≥ t_{α,ν} ⇒ Conclude μ_D > Δ0
    - If t_obs < t_{α,ν} ⇒ Do not reject μ_D = Δ0
  - 2-sided alternative
    - If t_obs ≥ t_{α/2,ν} ⇒ Conclude μ_D > Δ0
    - If t_obs ≤ −t_{α/2,ν} ⇒ Conclude μ_D < Δ0
    - If −t_{α/2,ν} < t_obs < t_{α/2,ν} ⇒ Do not reject μ_D = Δ0
- Confidence Interval for μ_D
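A minimal Python sketch of the paired analysis follows, assuming the standard paired-t forms t_obs = (d̄ − Δ0)/(s_d/√n) and d̄ ± t_{α/2,ν}·s_d/√n. The function name `paired_t` and the data are hypothetical, and the critical value is taken from a t table.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(treatment1, treatment2, t_crit, delta0=0.0):
    """Return (t_obs, df, ci) for paired data, using differences Trt1 - Trt2."""
    diffs = [a - b for a, b in zip(treatment1, treatment2)]
    n = len(diffs)
    d_bar = mean(diffs)
    se = stdev(diffs) / sqrt(n)                 # s_d / sqrt(n)
    t_obs = (d_bar - delta0) / se
    ci = (d_bar - t_crit * se, d_bar + t_crit * se)
    return t_obs, n - 1, ci

# Hypothetical scores for 5 subjects under each treatment
trt1 = [5.0, 7.0, 6.0, 8.0, 9.0]
trt2 = [4.0, 5.0, 5.0, 5.0, 6.0]
t_obs, df, ci = paired_t(trt1, trt2, t_crit=2.776)   # t_{.025, 4} from a t table
print(round(t_obs, 3), df, tuple(round(x, 3) for x in ci))
```

When the interval excludes Δ0, the 2-sided test at the same α rejects H0.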
30. Example: Antiperspirant Formulations
- Subjects - 20 Volunteers' armpits
- Treatments - Dry Powder vs Powder-in-Oil
- Measurements - Average Rating by Judges
  - Higher scores imply more disagreeable odor
- Summary Statistics (Raw Data on next slide)
Source: E. Jungermann (1974)
31. Example: Antiperspirant Formulations
32. Example: Antiperspirant Formulations
Evidence that scores are higher (more unpleasant) for the dry powder (formulation 1)
33. Small-Sample Test For Nonnormal Data
- Paired Samples (Crossover Design)
- Procedure (Wilcoxon Signed-Rank Test)
  - Compute Differences d_i (as in the paired t-test) and obtain their absolute values (ignoring 0s)
  - Rank the observations by |d_i| (smallest = 1), averaging ranks for ties
  - Compute W+ and W−, the rank sums for the positive and negative differences, respectively
  - 1-sided tests: Conclude HA: M1 > M2 if W− ≤ T0
  - 2-sided tests: Conclude HA: M1 ≠ M2 if min(W+, W−) ≤ T0
  - Values of T0 are given in many texts for various sample sizes and significance levels. P-values are printed by statistical software packages.
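The signed-rank steps above can be sketched in plain Python. The function name `signed_rank_sums` and the integer-valued data are hypothetical. The critical value T0 still comes from a table.

```python
def signed_rank_sums(treatment1, treatment2):
    """Return (W+, W-) for paired data, using differences Trt1 - Trt2."""
    # Step 1: differences, dropping zeros
    diffs = [a - b for a, b in zip(treatment1, treatment2) if a != b]
    abs_sorted = sorted(abs(d) for d in diffs)
    # Step 2: average rank for each distinct absolute difference (handles ties)
    rank_of = {}
    for value in set(abs_sorted):
        first = abs_sorted.index(value) + 1           # rank of first occurrence
        last = first + abs_sorted.count(value) - 1    # rank of last occurrence
        rank_of[value] = (first + last) / 2
    # Step 3: sum the ranks separately for positive and negative differences
    w_plus = sum(rank_of[abs(d)] for d in diffs if d > 0)
    w_minus = sum(rank_of[abs(d)] for d in diffs if d < 0)
    return w_plus, w_minus

# Hypothetical paired scores; the second pair has a zero difference and is dropped
trt1 = [10, 12, 9, 15, 11, 14]
trt2 = [8, 12, 10, 11, 13, 13]
w_plus, w_minus = signed_rank_sums(trt1, trt2)
print(w_plus, w_minus)   # 10.0 5.0
```

With n nonzero differences, W+ + W− must equal n(n+1)/2, here 5·6/2 = 15.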
34. Normal Approximation (Supp PP18-21)
- Under the null hypothesis of no difference in the two groups
- A z-statistic can be computed and P-value (approximate) can be obtained from Z-distribution
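The standard null mean and standard deviation of the positive rank sum W^+, with n the number of nonzero differences, are sketched below. This version has no correction for ties or continuity, which is an assumption here.

```latex
\mu_{W^+} = \frac{n(n+1)}{4},
\qquad
\sigma_{W^+} = \sqrt{\frac{n(n+1)(2n+1)}{24}},
\qquad
z_{obs} = \frac{W^+ - \mu_{W^+}}{\sigma_{W^+}}
```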
35. Example - Caffeine and Endurance
- Subjects: 9 well-trained cyclists
- Treatments: 13mg Caffeine (Condition 1) vs 5mg (Condition 2)
- Measurements: Minutes Until Exhaustion
- This is a subset of a larger study (we'll see later)
- Step 1: Take absolute values of differences (eliminating 0s)
- Step 2: Rank the absolute differences (averaging ranks for ties)
- Step 3: Sum Ranks for positive and negative true differences
Source: Pasman, et al (1995)
36. Example - Caffeine and Endurance
Original Data
37. Example - Caffeine and Endurance
Absolute Differences
Ranked Absolute Differences
W+ = 1+2+4+6+7+8 = 28   W− = 3+5+9 = 17
38. Example - Caffeine and Endurance
Under the null hypothesis of no difference in the two groups
There is no evidence that endurance times differ for the 2 doses (we will see later that both are higher than no dose)
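The normal approximation for this example can be sketched with the rank sum W+ = 28 from the 9 cyclists. The sketch assumes the standard null mean n(n+1)/4 and variance n(n+1)(2n+1)/24 with no tie or continuity correction, so the figures may differ slightly from the original calculation. The function name `signed_rank_z` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def signed_rank_z(w_plus, n):
    """Normal approximation to the Wilcoxon signed-rank test (2-sided)."""
    mu = n * (n + 1) / 4                         # null mean of W+
    sigma = sqrt(n * (n + 1) * (2 * n + 1) / 24) # null standard deviation of W+
    z = (w_plus - mu) / sigma
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided

# W+ = 28 with n = 9 nonzero differences, from the caffeine example
z_obs, p_value = signed_rank_z(28, 9)
print(round(z_obs, 3), round(p_value, 3))
```

The resulting P-value is far above 0.05, consistent with the conclusion of no evidence of a difference between the two doses.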
39. SPSS Output
Note that SPSS is taking MG5 − MG13, while we used MG13 − MG5
40. Data Sources
- Pauling, L. (1971). The Significance of the Evidence about Ascorbic Acid and the Common Cold, Proceedings of the National Academy of Sciences of the United States of America, 11:2678-2681
- Gould, M.C. and F.A.C. Perrin (1916). A Comparison of the Factors Involved in the Maze Learning of Human Adults and Children, Journal of Experimental Psychology, 1:122-???
- Jungermann, E. (1974). Antiperspirants: New Trends in Formulation and Testing Technology, Journal of the Society of Cosmetic Chemists, 25:621-638
- Pasman, W.J., M.A. van Baak, A.E. Jeukendrup, and A. de Haan (1995). The Effect of Different Dosages of Caffeine on Endurance Performance Time, International Journal of Sports Medicine, 16:225-230