Title: Introduction to choosing the correct statistical test
1Introduction to choosing the correct statistical test
- Tests for Continuous Outcomes I
2Questions to ask yourself
- What is the outcome (dependent) variable?
- Is the outcome variable continuous, binary/categorical, or time-to-event?
- What is the unit of observation?
  - person (most common)
  - lesion
  - half a face
  - physician
  - clinical center
- Are the observations independent or correlated?
  - Independent: observations are unrelated (usually different, unrelated people)
  - Correlated: some observations are related to one another, for example the same person over time (repeated measures), lesions within a person, halves of a face, hands within a person, controls who have each been selected for a particular case, sibling pairs, husband-wife pairs, mother-infant pairs
3Correlated data example
- Split-face trial
  - Researchers assigned 56 subjects to apply SPF 85 sunscreen to one side of their faces and SPF 50 to the other prior to engaging in 5 hours of outdoor sports during mid-day.
  - Sides of the face were randomly assigned; subjects were blinded to SPF strength.
  - Outcome: sunburn
Russak JE et al. JAAD 2010; 62: 348-349.
4Results
Table I. Dermatologist grading of sunburn after an average of 5 hours of skiing/snowboarding (P = .03; Fisher's exact test)
Sun protection factor    Sunburned    Not sunburned
85                       1            55
50                       8            48
Fisher's exact test compares the following proportions: 1/56 versus 8/56. Note that individuals are being counted twice!
5Correct analysis of data
Table 1. Correct presentation of the data from Russak JE et al. JAAD 2010; 62: 348-349 (P = .016; McNemar's test).
                     SPF-50 side
SPF-85 side          Sunburned    Not sunburned
Sunburned            1            0
Not sunburned        7            48
McNemar's test evaluates the probability of the following: in all 7 out of 7 cases where the sides of the face were discordant (i.e., one side burned and the other side did not), the SPF 50 side sustained the burn.
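The contrast between the two analyses can be checked by hand. The following is a minimal sketch using only the Python standard library; the function names are illustrative, and the exact binomial form of McNemar's test is assumed here (the slides do not say which variant was used). It computes a two-sided Fisher's exact p-value that ignores the pairing, and an exact McNemar p-value from the 7 discordant pairs.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k events in row 1 with all margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / total

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(col1, row1)
    # Add up every possible table that is no more likely than the observed one.
    return sum(prob(k) for k in range(lo, hi + 1)
               if prob(k) <= observed * (1 + 1e-9))


def mcnemar_exact_two_sided(n10, n01):
    """Exact (binomial) McNemar p-value from the two discordant-pair counts."""
    n = n10 + n01
    k = min(n10, n01)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Unpaired view: 1/56 burned on SPF 85 sides versus 8/56 on SPF 50 sides.
p_unpaired = fisher_exact_two_sided(1, 55, 8, 48)
# Paired view: 0 pairs where only the SPF 85 side burned, 7 where only SPF 50 did.
p_paired = mcnemar_exact_two_sided(0, 7)
print(round(p_unpaired, 3), round(p_paired, 3))
```

Both values should land close to the p-values quoted on the slides, but only the paired calculation respects the fact that each subject contributes two sides of one face.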
6Overview of common statistical tests
Continuous outcome (e.g. blood pressure, age, pain score)
- Independent observations: t-test; ANOVA; linear correlation; linear regression
- Correlated observations: paired t-test; repeated-measures ANOVA; mixed models/GEE modeling
- Assumptions: outcome is normally distributed (important for small samples); outcome and predictor have a linear relationship
Binary or categorical outcome (e.g. breast cancer yes/no)
- Independent observations: chi-square test; relative risks; logistic regression
- Correlated observations: McNemar's test; conditional logistic regression; GEE modeling
- Assumptions: chi-square test assumes sufficient numbers in each cell (>5)
Time-to-event outcome (e.g. time-to-death, time-to-fracture)
- Independent observations: Kaplan-Meier statistics; Cox regression
- Correlated observations: n/a
- Assumptions: Cox regression assumes proportional hazards between groups
8Continuous outcome (means)
Outcome variable: continuous (e.g. blood pressure, age, pain score)
Independent observations:
- t-test: compares means between two independent groups
- ANOVA: compares means between more than two independent groups
- Pearson's correlation coefficient (linear correlation): shows linear correlation between two continuous variables
- Linear regression: multivariate regression technique when the outcome is continuous; gives slopes or adjusted means
Correlated observations:
- Paired t-test: compares means between two related groups (e.g., the same subjects before and after)
- Repeated-measures ANOVA: compares changes over time in the means of two or more groups (repeated measurements)
- Mixed models/GEE modeling: multivariate regression techniques to compare changes over time between two or more groups
Alternatives if the normality assumption is violated (and small n): non-parametric statistics
- Wilcoxon sign-rank test: non-parametric alternative to the paired t-test
- Wilcoxon sum-rank test (Mann-Whitney U test): non-parametric alternative to the t-test
- Kruskal-Wallis test: non-parametric alternative to ANOVA
- Spearman rank correlation coefficient: non-parametric alternative to Pearson's correlation coefficient
10Example: two-sample t-test
- In 1980, some researchers reported that men have more mathematical ability than women, as evidenced by the 1979 SATs, where a sample of 30 random male adolescents had a mean score ± 1 standard deviation of 436 ± 77 and 30 random female adolescents scored lower: 416 ± 81 (genders were similar in educational backgrounds, socio-economic status, and age). Do you agree with the authors' conclusions?
11Two-sample t-test
- Statistical question: Is there a difference in SAT math scores between men and women?
- What is the outcome variable? Math SAT scores
- What type of variable is it? Continuous
- Is it normally distributed? Yes
- Are the observations correlated? No
- Are groups being compared, and if so, how many? Yes, two
- → two-sample t-test
12Two-sample t-test mechanics
13Data Summary
                  n     Sample Mean    Sample Standard Deviation
Group 1: women    30    416            81
Group 2: men      30    436            77
14Two-sample t-test
- 1. Define your hypotheses (null, alternative)
  - H0: $\mu_{\text{men}} - \mu_{\text{women}} = 0$ (math SAT)
  - Ha: $\mu_{\text{men}} - \mu_{\text{women}} \neq 0$ (math SAT; two-sided)
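15Two-sample t-test
- 2. Specify your null distribution
- Women (F) and men (M) have approximately equal standard deviations/variances, so make a pooled estimate of standard deviation/variance:
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
The standard error of a difference of two means is
$$SE(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}$$
Differences in means follow a t-distribution.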
16T distribution
- A t-distribution is like a Z distribution, except that it has slightly fatter tails to reflect the uncertainty added by estimating the standard deviation.
- The bigger the sample size (i.e., the bigger the sample size used to estimate σ), the closer t becomes to Z.
- If n > 100, t approaches Z.
17Student's t Distribution
[Figure: density curves centered at 0 for the standard normal (t with df = ∞), t (df = 13), and t (df = 5)]
t-distributions are bell-shaped and symmetric, but have fatter tails than the normal.
Note: t → Z as n increases.
from Statistics for Managers Using Microsoft Excel 4th Edition, Prentice-Hall 2004
18Student's t Table
Upper Tail Area
df    .25      .10      .05
1     1.000    3.078    6.314
2     0.817    1.886    2.920
3     0.765    1.638    2.353
The body of the table contains t values, not probabilities.
Example: let n = 3, so df = n - 1 = 2; with α = .10, α/2 = .05 in each tail, and the tabled value is t = 2.920.
from Statistics for Managers Using Microsoft Excel 4th Edition, Prentice-Hall 2004
19t distribution values
With comparison to the Z value
Confidence Level    t (10 d.f.)    t (20 d.f.)    t (30 d.f.)    Z
.80                 1.372          1.325          1.310          1.28
.90                 1.812          1.725          1.697          1.64
.95                 2.228          2.086          2.042          1.96
.99                 3.169          2.845          2.750          2.58
Note: t → Z as n increases.
from Statistics for Managers Using Microsoft Excel 4th Edition, Prentice-Hall 2004
20Two-sample t-test
- 2. Specify your null distribution
- Women and men have approximately equal standard deviations/variances, so make a pooled estimate of the variance: $s_p^2 = (29 \cdot 81^2 + 29 \cdot 77^2)/58 = 6245$
- The standard error of a difference of two means is $\sqrt{6245/30 + 6245/30} \approx 20.4$
- Differences in means follow a t-distribution; here we have a t-distribution with 58 degrees of freedom (60 observations - 2 means)
21Two-sample t-test
- 3. Observed difference in our experiment = 20 points
22Two-sample t-test
- 4. Calculate the p-value of what you observed
The T statistic is the observed difference divided by its standard error: T = 20/20.4 = 0.98.
Critical value for a two-tailed p-value of .05 for T with 58 df = 2.000; 0.98 < 2.000, so p > .05
- 5. Do not reject null! No evidence that men are better in math.
23Corresponding confidence interval
Note that the 95% confidence interval crosses 0 (the null value).
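The arithmetic behind steps 2 through 5 and the confidence interval can be reproduced from the summary statistics alone. This is a minimal sketch, assuming the pooled-variance form of the test described on the slides; the variable names are illustrative, and the critical value 2.000 is copied from the slides rather than computed.

```python
import math

# Summary statistics from the Data Summary slide.
n_women, mean_women, sd_women = 30, 416, 81
n_men, mean_men, sd_men = 30, 436, 77

# Pooled variance: the two sample variances weighted by their degrees of freedom.
df = n_women + n_men - 2
pooled_var = ((n_women - 1) * sd_women ** 2 + (n_men - 1) * sd_men ** 2) / df

# Standard error of the difference between the two means.
se_diff = math.sqrt(pooled_var / n_women + pooled_var / n_men)

diff = mean_men - mean_women
t_stat = diff / se_diff

# Two-tailed .05 critical value for 58 df, taken from the slides.
T_CRIT = 2.000
ci_low = diff - T_CRIT * se_diff
ci_high = diff + T_CRIT * se_diff

print(f"df={df}, SE={se_diff:.1f}, T={t_stat:.2f}")
print(f"95% CI: ({ci_low:.1f}, {ci_high:.1f})")
```

Because the interval runs from a negative to a positive value, it contains the null value of 0, which agrees with T falling below the critical value.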
24Review Question 1
- A t-distribution:
  - Is approximately a normal distribution if n > 100.
  - Can be used interchangeably with a normal distribution as long as the sample size is large enough.
  - Reflects the uncertainty introduced when using the sample, rather than population, standard deviation.
  - All of the above.
26Review Question 2
- In a medical student class, the 6 people born on odd days had heights of 64.6 ± 4 inches; the 10 people born on even days had heights of 71.1 ± 5 inches. Height is roughly normally distributed. Which of the following best represents the correct statistical test for these data?
  - a.
  - b.
  - c.
  - d.
29Example: paired t-test
TABLE 1. Difference between Means of "Before" and "After" Botulinum Toxin A Treatment
                        Before BTxnA    After BTxnA    Difference    Significance
Social skills           5.90            5.84           NS            .293
Academic performance    5.86            5.78           .08           .068
Date success            5.17            5.30           .13           .014
Occupational success    6.08            5.97           .11           .013
Attractiveness          4.94            5.07           .13           .030
Financial success       5.67            5.61           NS            .230
Relationship success    5.68            5.68           NS            .967
Athletic success        5.15            5.38           .23           .000
* Significant at 5% level. ** Significant at 1% level.
30Paired t-test
- Statistical question: Is there a difference in date success after BoTox?
- What is the outcome variable? Date success
- What type of variable is it? Continuous
- Is it normally distributed? Yes
- Are the observations correlated? Yes, it's the same patients before and after
- How many time points are being compared? Two
- → paired t-test
31Paired t-test mechanics
- Calculate the change in date success score for each person.
- Calculate the average change in date success for the sample. (.13)
- Calculate the standard error of the change in date success. (.05)
- Calculate a T-statistic by dividing the mean change by the standard error (T = .13/.05 = 2.6).
- Look up the corresponding p-value. (T = 2.6 corresponds to p = .014.)
- Significant p-values indicate that the average change is significantly different from 0.
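The steps above can be sketched in a few lines. The before/after scores below are hypothetical (the per-patient data are not in the slides), and the variable names are illustrative; only the last two lines use the slide's own summary numbers.

```python
import math
import statistics

# Hypothetical ratings for six people, before and after treatment.
before = [5.0, 4.5, 6.0, 5.5, 4.0, 5.0]
after = [5.5, 4.5, 6.5, 6.0, 4.5, 5.0]

# Step 1: change for each person (the pairing is what makes the test "paired").
changes = [a - b for a, b in zip(after, before)]

# Steps 2 and 3: mean change and its standard error.
mean_change = statistics.mean(changes)
se_change = statistics.stdev(changes) / math.sqrt(len(changes))

# Step 4: T statistic, with n - 1 degrees of freedom.
t_stat = mean_change / se_change
df = len(changes) - 1
print(f"mean change={mean_change:.3f}, SE={se_change:.3f}, T={t_stat:.2f}, df={df}")

# The same final step using the summary values quoted on the slide.
slide_t = 0.13 / 0.05
print(f"slide T={slide_t:.1f}")
```

For these hypothetical data T exceeds 2.571, the two-sided .05 critical value for 5 degrees of freedom, so the average change would be judged significantly different from 0.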
32Paired t-test example 2
33Example problem: paired t-test
Null hypothesis: average change = 0
34Example problem: paired t-test
With 5 df, T > 2.571 corresponds to p < .05 (two-sided test)
35Example problem: paired t-test
Note: the confidence interval does not include 0.
37Using our class data
- Hypothesis: Students who consider themselves street smart drink more alcohol than students who consider themselves book smart.
- Null hypothesis: no difference in alcohol drinking between street smart and book smart students.
38Non-normal class data: alcohol
39Wilcoxon sum-rank test
- Statistical question: Is there a difference in alcohol drinking between street smart and book smart students?
- What is the outcome variable? Weekly alcohol intake (drinks/week)
- What type of variable is it? Continuous
- Is it normally distributed? No (and small n)
- Are the observations correlated? No
- Are groups being compared, and if so, how many? Two
- → Wilcoxon sum-rank test
40Results
Book smart: mean = 1.6 drinks/week; median = 1.5
Street smart: mean = 2.7 drinks/week; median = 3.0
41Wilcoxon rank-sum test mechanics
- Book smart values (n = 13): 0 0 0 0 1 1 2 2 2 3 3 4 5
- Street smart values (n = 7): 0 0 2 3 3 5 6
- Combined groups (n = 20): 0 0 0 0 0 0 1 1 2 2 2 2 3 3 3 3 4 5 5 6
- Corresponding ranks: 3.5 3.5 3.5 3.5 3.5 3.5 7.5 7.5 10.5 10.5 10.5 10.5 14.5 14.5 14.5 14.5 17 18.5 18.5 20
- Ties are assigned average ranks; e.g., there are 6 zeros, so zeros get the average of the ranks 1 through 6.
42Wilcoxon rank-sum test
- Ranks, book smart: 3.5 3.5 3.5 3.5 7.5 7.5 10.5 10.5 10.5 14.5 14.5 17 18.5
- Ranks, street smart: 3.5 3.5 10.5 14.5 14.5 18.5 20
- Sum of ranks, book smart: 3.5+3.5+3.5+3.5+7.5+7.5+10.5+10.5+10.5+14.5+14.5+17+18.5 = 125
- Sum of ranks, street smart: 3.5+3.5+10.5+14.5+14.5+18.5+20 = 85
- Wilcoxon sum-rank test compares these numbers, accounting for the differences in sample size in the two groups.
- Resulting p-value (from computer) = 0.24
- Not significantly different!
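The ranking step, including the averaging of tied ranks, can be sketched as follows. This is only an illustration of the mechanics using the class data listed above; the helper name `average_ranks` is made up for this example, and the p-value itself is not reproduced here because the slides take it from statistical software.

```python
def average_ranks(values):
    """Map each distinct value to its average rank (rank 1 = smallest value)."""
    ordered = sorted(values)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # Tied values occupy ranks i+1 through j and all share the mean of those ranks.
        ranks[ordered[i]] = (i + 1 + j) / 2
        i = j
    return ranks


book_smart = [0, 0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 4, 5]
street_smart = [0, 0, 2, 3, 3, 5, 6]

# Rank the combined sample, then add up the ranks within each group.
rank_of = average_ranks(book_smart + street_smart)
book_sum = sum(rank_of[v] for v in book_smart)
street_sum = sum(rank_of[v] for v in street_smart)
print(book_sum, street_sum)
```

The two sums always add up to n(n+1)/2 for the combined sample (here 20 × 21 / 2 = 210), which is a quick check that no rank was dropped.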
43Example 2, Wilcoxon sum-rank test
10 dieters following Atkins diet vs. 10 dieters following Jenny Craig (hypothetical)
RESULTS: Atkins group loses an average of 34.5 lbs. J. Craig group loses an average of 18.5 lbs.
Conclusion: Atkins is better?
44Example: non-parametric tests
BUT, take a closer look at the individual data:
Atkins, change in weight (lbs): 4, 3, 0, -3, -4, -5, -11, -14, -15, -300
J. Craig, change in weight (lbs): -8, -10, -12, -16, -18, -20, -21, -24, -26, -30
45Jenny Craig
[Histogram: percent of dieters (y-axis, 0 to 30) by weight change (x-axis, -30 to 20)]
46Atkins
[Histogram: percent of dieters (y-axis, 0 to 30) by weight change (x-axis, -300 to 20)]
47Wilcoxon Rank-Sum test
- RANK the values, 1 being the least weight loss and 20 being the most weight loss.
- Atkins
  - Changes: 4, 3, 0, -3, -4, -5, -11, -14, -15, -300
  - Ranks: 1, 2, 3, 4, 5, 6, 9, 11, 12, 20
- J. Craig
  - Changes: -8, -10, -12, -16, -18, -20, -21, -24, -26, -30
  - Ranks: 7, 8, 10, 13, 14, 15, 16, 17, 18, 19
48Wilcoxon Rank-Sum test
- Sum of Atkins ranks:
  - 1 + 2 + 3 + 4 + 5 + 6 + 9 + 11 + 12 + 20 = 73
- Sum of Jenny Craig's ranks:
  - 7 + 8 + 10 + 13 + 14 + 15 + 16 + 17 + 18 + 19 = 137
- Jenny Craig clearly ranked higher!
- P-value (from computer) = .018
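To see why the means and the ranks tell different stories, the sketch below recomputes the group summaries and rank sums and then applies a large-sample normal approximation to the rank sum. The normal approximation (without continuity correction) is an assumption made for this illustration; the slides do not say how the software computed its p-value, so the result is expected to be close to, not identical to, the quoted .018.

```python
import math
from statistics import NormalDist, mean, median

atkins = [4, 3, 0, -3, -4, -5, -11, -14, -15, -300]
jenny = [-8, -10, -12, -16, -18, -20, -21, -24, -26, -30]

# The single -300 outlier drags the Atkins mean down; the median barely notices it.
print(mean(atkins), median(atkins))
print(mean(jenny), median(jenny))

# Rank 1 = least weight loss (largest change value). There are no ties in these data.
ordered = sorted(atkins + jenny, reverse=True)
rank_of = {value: position + 1 for position, value in enumerate(ordered)}
atkins_sum = sum(rank_of[v] for v in atkins)
jenny_sum = sum(rank_of[v] for v in jenny)

# Under the null hypothesis, the Atkins rank sum has this mean and standard deviation.
n1, n2 = len(atkins), len(jenny)
expected = n1 * (n1 + n2 + 1) / 2
sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (atkins_sum - expected) / sd
p_approx = 2 * NormalDist().cdf(-abs(z))
print(atkins_sum, jenny_sum, round(z, 2), round(p_approx, 3))
```

The rank-based comparison favors Jenny Craig because nine of the ten Atkins dieters lost little or no weight, a pattern that the mean alone hides.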
49Review Question 3
- When you want to compare mean blood pressure between two groups, you should:
  - Use a t-test.
  - Use a nonparametric test.
  - Use a t-test if blood pressure is normally distributed.
  - Use a two-sample proportions test.
  - Use a two-sample proportions test only if blood pressure is normally distributed.
52DHA and eczema
Figure 3 from Koch C, Dölle S, Metzger M, Rasche C, Jungclas H, Rühl R, Renz H, Worm M. Docosahexaenoic acid (DHA) supplementation in atopic eczema: a randomized, double-blind, controlled trial. Br J Dermatol. 2008 Apr;158(4):786-92. Epub 2008 Jan 30.
53Wilcoxon sign-rank test
- Statistical question: Did patients improve in SCORAD score from baseline to 8 weeks?
- What is the outcome variable? SCORAD
- What type of variable is it? Continuous
- Is it normally distributed? No (and small numbers)
- Are the observations correlated? Yes, it's the same people before and after
- How many time points are being compared? Two
- → Wilcoxon sign-rank test
54Wilcoxon sign-rank test mechanics
- 1. Calculate the change in SCORAD score for each participant.
- 2. Rank the absolute values of the changes in SCORAD score from smallest to largest.
- 3. Add up the ranks from the people who improved and, separately, the ranks from the people who got worse.
- 4. The Wilcoxon sign-rank test compares these values to determine whether improvements significantly exceed declines (or vice versa).
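Steps 1 through 3 can be sketched as below. The change scores are hypothetical (the trial's individual data are not in the slides), a negative change is assumed to mean improvement (a lower SCORAD score), and dropping zero changes before ranking is a common convention assumed here rather than something the slides specify.

```python
def signed_rank_sums(changes):
    """Return (rank sum for improvements, rank sum for declines).

    Negative changes count as improvements; zero changes are dropped.
    """
    nonzero = [c for c in changes if c != 0]
    ordered = sorted(abs(c) for c in nonzero)

    # Average rank for each absolute value, so tied magnitudes share a rank.
    rank_of = {}
    for value in set(ordered):
        first = ordered.index(value) + 1
        count = ordered.count(value)
        rank_of[value] = first + (count - 1) / 2

    improved = sum(rank_of[abs(c)] for c in nonzero if c < 0)
    worsened = sum(rank_of[abs(c)] for c in nonzero if c > 0)
    return improved, worsened


# Hypothetical changes in SCORAD (week 8 minus baseline) for eight participants.
scorad_changes = [-12, -8, -5, 3, -15, -2, 6, -9]
improved_sum, worsened_sum = signed_rank_sums(scorad_changes)
print(improved_sum, worsened_sum)
```

Step 4, turning the two rank sums into a p-value, is left to statistical software, as it is in the slides.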
56ANOVA example
Mean micronutrient intake from the school lunch by school
a School 1 (most deprived; 40% subsidized lunches).
b School 2 (medium deprived; <10% subsidized).
c School 3 (least deprived; no subsidization, private school).
d ANOVA; significant differences are highlighted in bold (P < 0.05).
FROM Gould R, Russell J, Barker ME. School lunch menus and 11 to 12 year old children's food choice in three secondary schools in England: are the nutritional standards being met? Appetite. 2006 Jan;46(1):86-92.
57ANOVA
- Statistical question: Does calcium content of school lunches differ by school type (privileged, average, deprived)?
- What is the outcome variable? Calcium
- What type of variable is it? Continuous
- Is it normally distributed? Yes
- Are the observations correlated? No
- Are groups being compared and, if so, how many? Yes, three
- → ANOVA
58ANOVA (ANalysis Of VAriance)
- Idea: For two or more groups, test difference between means, for normally distributed variables.
- Just an extension of the t-test (an ANOVA with only two groups is mathematically equivalent to a t-test).
59One-Way Analysis of Variance
- Assumptions, same as t-test
  - Normally distributed outcome
  - Equal variances between the groups
  - Groups are independent
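60Hypotheses of One-Way ANOVA
- H0: $\mu_1 = \mu_2 = \dots = \mu_k$ (all group means are equal)
- Ha: not all of the group means are equal (at least one differs)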
61ANOVA
- It's like this: if I have three groups to compare:
  - I could do three pair-wise t-tests, but this would increase my type I error.
  - So, instead I want to look at the pairwise differences all at once.
  - To do this, I can recognize that variance is a statistic that lets me look at more than one difference at a time.
62The F-test
Is the difference in the means of the groups more
than background noise (variability within
groups)?
63The F-distribution
- A ratio of variances follows an F-distribution.
- The F-test tests the hypothesis that two variances are equal.
- F will be close to 1 if sample variances are equal.
64ANOVA example 2
- Randomize 33 subjects to three groups: 800 mg calcium supplement vs. 1500 mg calcium supplement vs. placebo.
- Compare the spine bone density of all 3 groups after 1 year.
65Spine bone density vs. treatment
[Plot: spine bone density (y-axis, 0.7 to 1.2) by treatment group: placebo, 800 mg calcium, 1500 mg calcium]
66Group means and standard deviations
- Placebo group (n = 11)
  - Mean spine BMD = .92 g/cm2
  - Standard deviation = .10 g/cm2
- 800 mg calcium supplement group (n = 11)
  - Mean spine BMD = .94 g/cm2
  - Standard deviation = .08 g/cm2
- 1500 mg calcium supplement group (n = 11)
  - Mean spine BMD = 1.06 g/cm2
  - Standard deviation = .11 g/cm2
67The F-Test
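As a rough illustration, the F statistic for the calcium example can be rebuilt from the group means and standard deviations alone: the between-group mean square (variability of the group means) is divided by the within-group mean square (the pooled within-group variance). This is a sketch under the assumption that the rounded summary values on the previous slide are all that is available, so the result is approximate; the variable names are illustrative, and 3.32 is the standard tabled .05 critical value for F with 2 and 30 degrees of freedom.

```python
# (n, mean spine BMD, standard deviation) for each treatment group.
groups = {
    "placebo": (11, 0.92, 0.10),
    "800 mg calcium": (11, 0.94, 0.08),
    "1500 mg calcium": (11, 1.06, 0.11),
}

k = len(groups)
n_total = sum(n for n, _, _ in groups.values())
grand_mean = sum(n * m for n, m, _ in groups.values()) / n_total

# Between-group variability: how far each group mean sits from the grand mean.
ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups.values())
ms_between = ss_between / (k - 1)

# Within-group variability ("background noise"): pooled variance inside the groups.
ss_within = sum((n - 1) * s ** 2 for n, _, s in groups.values())
ms_within = ss_within / (n_total - k)

f_stat = ms_between / ms_within
F_CRIT = 3.32  # tabled .05 critical value for F(2, 30)
print(f"F({k - 1}, {n_total - k}) = {f_stat:.2f}; exceeds critical value: {f_stat > F_CRIT}")
```

An F well above 1 indicates that the group means differ by more than the within-group noise would explain, but it does not say which groups differ.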
68Review Question 4
- Which of the following is an assumption of ANOVA?
  - The outcome variable is normally distributed.
  - The variance of the outcome variable is the same in all groups.
  - The groups are independent.
  - All of the above.
  - None of the above.
70ANOVA summary
- A statistically significant ANOVA (F-test) only tells you that at least two of the groups differ, but not which ones differ.
- Determining which groups differ (when it's unclear) requires more sophisticated analyses to correct for the problem of multiple comparisons.
71Question: Why not just do 3 pairwise t-tests?
- Answer: because, at an error rate of 5% each test, this means you have an overall chance of up to 1 - (.95)^3 = 14% of making a type-I error (if all 3 comparisons were independent).
- If you wanted to compare 6 groups, you'd have to do 15 pairwise t-tests, which would give you a high chance of finding something significant just by chance.
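The inflation of the type-I error rate is easy to tabulate. The short sketch below assumes, as the slide does, that the comparisons are independent; the function name is illustrative.

```python
from math import comb


def familywise_error(n_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests


# Three groups give 3 pairwise comparisons.
three_groups = familywise_error(comb(3, 2))

# Six groups give 15 pairwise comparisons.
pairs_for_six = comb(6, 2)
six_groups = familywise_error(pairs_for_six)

print(f"3 tests: {three_groups:.0%}; {pairs_for_six} tests: {six_groups:.0%}")
```

With 15 comparisons the chance of at least one spurious "significant" result is already above one half.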
72Multiple comparisons
73Correction for multiple comparisons
- How to correct for multiple comparisons post-hoc
  - Bonferroni correction (adjusts by the most conservative amount, treating all tests as independent: divide the alpha cut-off by the number of tests, or equivalently multiply each p-value by the number of tests)
  - Tukey (adjusts p)
  - Scheffé (adjusts p)
74Method 1: Bonferroni
For example, to make a Bonferroni correction, divide your desired alpha cut-off level (usually .05) by the number of comparisons you are making. This treats the comparisons as if they were completely independent, which is often too conservative.
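A minimal sketch of the Bonferroni rule follows. The three p-values are hypothetical, and the function name and return layout are choices made for this illustration only.

```python
def bonferroni(p_values, alpha=0.05):
    """Apply a Bonferroni correction to a list of raw p-values.

    Returns the per-test threshold, the adjusted p-values, and a list of
    booleans marking which comparisons remain significant.
    """
    m = len(p_values)
    threshold = alpha / m
    # Multiplying each p-value by m is equivalent to dividing alpha by m.
    adjusted = [min(1.0, p * m) for p in p_values]
    significant = [p <= threshold for p in p_values]
    return threshold, adjusted, significant


# Hypothetical raw p-values from three pairwise comparisons.
raw_p = [0.010, 0.030, 0.200]
threshold, adjusted, significant = bonferroni(raw_p)
print(round(threshold, 4), [round(p, 2) for p in adjusted], significant)
```

Note that the middle comparison (raw p = .03) would pass an uncorrected .05 cut-off but does not survive the correction.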
75Methods 2 and 3: Tukey and Scheffé
- Both methods increase your p-values to account for the fact that you've done multiple comparisons, but are less conservative than Bonferroni (let the computer calculate for you!).
76Review Question 5
- I am doing an RCT of 4 treatment regimens for blood pressure. At the end of the day, I compare blood pressures in the 4 groups using ANOVA. My p-value is .03. I conclude:
  - All of the treatment regimens differ.
  - I need to use a Bonferroni correction.
  - One treatment is better than all the rest.
  - At least one treatment is different from the others.
  - In pairwise comparisons, no treatment will be different.
79Non-parametric ANOVA (Kruskal-Wallis test)
- Statistical question: Do nevi counts differ by training velocity (slow, medium, fast) group in marathon runners?
- What is the outcome variable? Nevi count
- What type of variable is it? Continuous
- Is it normally distributed? No (and small sample size)
- Are the observations correlated? No
- Are groups being compared and, if so, how many? Yes, three
- → non-parametric ANOVA
80Example: Nevi counts and marathon runners
Richtig et al. Melanoma Markers in Marathon Runners: Increase with Sun Exposure and Physical Strain. Dermatology 2008;217:38-44.
81Non-parametric ANOVA
- Kruskal-Wallis one-way ANOVA
- (just an extension of the Wilcoxon sum-rank test for 2 groups; based on ranks)
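To show how the rank-sum idea extends to three groups, here is a small sketch of the Kruskal-Wallis H statistic. The nevi counts are hypothetical (the study's raw data are not in the slides), the helper assumes no tied values so that no tie correction is needed, and the p-value uses the chi-square approximation with 2 degrees of freedom, which is only rough for samples this small.

```python
import math


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic; assumes all values are distinct (no ties)."""
    pooled = sorted(value for group in groups for value in group)
    rank_of = {value: position + 1 for position, value in enumerate(pooled)}
    n_total = len(pooled)
    # For each group: (sum of its ranks) squared, divided by its size.
    rank_term = sum(sum(rank_of[v] for v in group) ** 2 / len(group)
                    for group in groups)
    return 12 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)


# Hypothetical nevi counts for three training-velocity groups.
slow = [10, 12, 15, 30]
medium = [8, 9, 14, 20]
fast = [3, 5, 6, 7]

h_stat = kruskal_wallis_h([slow, medium, fast])

# With 3 groups, H is compared to a chi-square with 2 df, whose upper tail is exp(-H/2).
p_approx = math.exp(-h_stat / 2)
print(round(h_stat, 2), round(p_approx, 3))
```

As with ANOVA, a significant H only indicates that the groups differ overall; pairwise rank-sum tests with a multiple-comparison adjustment are still needed to say which ones.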
82Example: Nevi counts and marathon runners
By non-parametric ANOVA, the groups differ significantly in nevi count (p < .05) overall. By Wilcoxon sum-rank test (adjusted for multiple comparisons), the lowest velocity group differs significantly from the highest velocity group (p < .05).
Richtig et al. Melanoma Markers in Marathon Runners: Increase with Sun Exposure and Physical Strain. Dermatology 2008;217:38-44.
83Review Question 6
- I want to compare depression scores between three groups, but I'm not sure if depression is normally distributed. What should I do?
  - Don't worry about it; run an ANOVA anyway.
  - Test depression for normality.
  - Use a Kruskal-Wallis (non-parametric) ANOVA.
  - Nothing, I can't do anything with these data.
  - Run 3 nonparametric t-tests.
85Review Question 7
- If depression score turns out to be very non-normal, then what should I do?
  - Don't worry about it; run an ANOVA anyway.
  - Test depression for normality.
  - Use a Kruskal-Wallis (non-parametric) ANOVA.
  - Nothing, I can't do anything with these data.
  - Run 3 nonparametric t-tests.
87Review Question 8
- I measure blood pressure in a cohort of elderly men yearly for 3 years. To test whether or not their blood pressure changed over time, I compare the mean blood pressures in each time period using a one-way ANOVA. This strategy is:
  - Correct. I have three means, so I have to use ANOVA.
  - Wrong. Blood pressure is unlikely to be normally distributed.
  - Wrong. The variance in BP is likely to greatly differ at the three time points.
  - Correct. It would also be OK to use three t-tests.
  - Wrong. The samples are not independent.