Introduction to choosing the correct statistical test presentation

About This Presentation

Title:

Introduction to choosing the correct statistical test

Slides: 89

Provided by: stanfordE4

Learn more at: https://web.stanford.edu

Transcript and Presenter's Notes

Title: Introduction to choosing the correct statistical test

1
Introduction to choosing the correct statistical test

Tests for Continuous Outcomes I

2
Questions to ask yourself

What is the outcome (dependent) variable?
Is the outcome variable continuous,
binary/categorical, or time-to-event?
What is the unit of observation?
person (most common)
lesion
half a face
physician
clinical center
Are the observations independent or correlated?
Independent: observations are unrelated (usually different, unrelated people)
Correlated: some observations are related to one another, for example the same person over time (repeated measures), lesions within a person, halves of a face, hands within a person, controls who have each been matched to a particular case, sibling pairs, husband-wife pairs, mother-infant pairs

3
Correlated data example

Split-face trial:
Researchers assigned 56 subjects to apply SPF 85 sunscreen to one side of their faces and SPF 50 to the other prior to engaging in 5 hours of outdoor sports during mid-day.
Sides of the face were randomly assigned; subjects were blinded to SPF strength.
Outcome: sunburn

Russak JE et al. JAAD 2010;62:348-349.
4
Results
Table I -- Dermatologist grading of sunburn after an average of 5 hours of skiing/snowboarding (P = .03, Fisher's exact test)
Sun protection factor   Sunburned   Not sunburned
85                      1           55
50                      8           48

Fisher's exact test compares the following proportions: 1/56 versus 8/56. Note that individuals are being counted twice!
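
To see where the reported P = .03 comes from, the sketch below recomputes a two-sided Fisher's exact test on the table exactly as published, treating the 112 face-sides as if they were independent. This reproduces the flawed analysis, not a recommended one. The function name `fisher_exact_two_sided` is a helper written for this illustration, and the two-sided rule used (sum all tables no more probable than the observed one) is one common convention, assumed here.

```python
from fractions import Fraction
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k,
        # holding all row and column totals fixed.
        return Fraction(comb(row1, k) * comb(row2, col1 - k), comb(total, col1))

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    p_observed = prob(a)
    # Add up every table that is no more likely than the observed one.
    tail = sum(prob(k) for k in range(k_min, k_max + 1) if prob(k) <= p_observed)
    return float(tail)


# Rows: SPF 85, SPF 50; columns: sunburned, not sunburned (as in Table I).
p_value = fisher_exact_two_sided(1, 55, 8, 48)
print(f"Fisher's exact p = {p_value:.3f}")
```

Exact fractions are used so that the "no more likely than observed" comparison is not affected by floating-point rounding.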
5
Correct analysis of data
Table 1. Correct presentation of the data from Russak JE et al. JAAD 2010;62:348-349. (P = .016, McNemar's test)
                              SPF-50 side: Sunburned   SPF-50 side: Not sunburned
SPF-85 side: Sunburned        1                        0
SPF-85 side: Not sunburned    7                        48
McNemar's test evaluates the probability of the following: in all 7 out of 7 cases where the sides of the face were discordant (i.e., one side burned and the other side did not), the SPF 50 side sustained the burn.
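
The McNemar calculation can be sketched with the standard library alone. The version below is the exact (binomial) form of the test, which is an assumption about which variant the slide used; it happens to give the reported .016. `mcnemar_exact` is a name chosen for this illustration.

```python
from math import comb


def mcnemar_exact(b, c):
    """Exact two-sided McNemar p-value from the two discordant counts."""
    n = b + c
    k = min(b, c)
    # Under the null, each discordant pair is equally likely to go either way,
    # so the smaller count follows a Binomial(n, 0.5) distribution.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Discordant pairs from Table 1: 7 subjects burned only on the SPF-50 side,
# 0 subjects burned only on the SPF-85 side.
p_value = mcnemar_exact(7, 0)
print(f"McNemar exact p = {p_value:.3f}")  # prints "McNemar exact p = 0.016"
```

Only the 7 discordant pairs enter the calculation; the 49 concordant pairs carry no information about which SPF protects better.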
6
Overview of common statistical tests

Outcome variable: Continuous (e.g. blood pressure, age, pain score)
Independent observations: t-test; ANOVA; linear correlation; linear regression
Correlated observations: paired t-test; repeated-measures ANOVA; mixed models/GEE modeling
Assumptions: outcome is normally distributed (important for small samples); outcome and predictor have a linear relationship

Outcome variable: Binary or categorical (e.g. breast cancer yes/no)
Independent observations: chi-square test; relative risks; logistic regression
Correlated observations: McNemar's test; conditional logistic regression; GEE modeling
Assumptions: chi-square test assumes sufficient numbers in each cell (> 5)

Outcome variable: Time-to-event (e.g. time-to-death, time-to-fracture)
Independent observations: Kaplan-Meier statistics; Cox regression
Correlated observations: n/a
Assumptions: Cox regression assumes proportional hazards between groups
8
Continuous outcome (means)

Outcome variable: Continuous (e.g. blood pressure, age, pain score)

Independent observations:
T-test: compares means between two independent groups
ANOVA: compares means between more than two independent groups
Pearson's correlation coefficient (linear correlation): shows linear correlation between two continuous variables
Linear regression: multivariate regression technique when the outcome is continuous; gives slopes or adjusted means

Correlated observations:
Paired t-test: compares means between two related groups (e.g., the same subjects before and after)
Repeated-measures ANOVA: compares changes over time in the means of two or more groups (repeated measurements)
Mixed models/GEE modeling: multivariate regression techniques to compare changes over time between two or more groups

Alternatives if the normality assumption is violated (and small n), i.e. non-parametric statistics:
Wilcoxon sign-rank test: non-parametric alternative to the paired t-test
Wilcoxon sum-rank test (Mann-Whitney U test): non-parametric alternative to the t-test
Kruskal-Wallis test: non-parametric alternative to ANOVA
Spearman rank correlation coefficient: non-parametric alternative to Pearson's correlation coefficient
10
Example: two-sample t-test

In 1980, some researchers reported that men have more mathematical ability than women, as evidenced by the 1979 SATs, where a sample of 30 random male adolescents had a mean score ± 1 standard deviation of 436 ± 77 and 30 random female adolescents scored lower: 416 ± 81 (genders were similar in educational backgrounds, socio-economic status, and age). Do you agree with the authors' conclusions?

11
Two-sample t-test

Statistical question: Is there a difference in SAT math scores between men and women?
What is the outcome variable? Math SAT scores
What type of variable is it? Continuous
Is it normally distributed? Yes
Are the observations correlated? No
Are groups being compared, and if so, how many? Yes, two
→ two-sample t-test

12
Two-sample t-test mechanics
13
Data Summary
                  n     Sample Mean   Sample Standard Deviation
Group 1: women    30    416           81
Group 2: men      30    436           77
14
Two-sample t-test

1. Define your hypotheses (null, alternative)
$H_0: \mu_{\text{men}} - \mu_{\text{women}} = 0$ (math SAT)
$H_a: \mu_{\text{men}} - \mu_{\text{women}} \neq 0$ (two-sided)

15
Two-sample t-test

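2. Specify your null distribution
F and M have approximately equal standard deviations/variances, so make a pooled estimate of standard deviation/variance.

The standard error of a difference of two means is
$SE(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}$, where $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$ is the pooled variance.
Differences in means follow a T-distribution.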
16
T distribution

A t-distribution is like a Z distribution, except it has slightly fatter tails to reflect the uncertainty added by estimating the standard deviation.
The bigger the sample size (i.e., the bigger the sample size used to estimate σ), the closer t becomes to Z.
If n > 100, t approaches Z.

17
Student's t Distribution
Note: t → Z as n increases.
[Figure: density curves for the standard normal (t with df = ∞), t (df = 13), and t (df = 5), plotted against t and centered at 0]
t-distributions are bell-shaped and symmetric, but have fatter tails than the normal.
From Statistics for Managers Using Microsoft Excel, 4th Edition, Prentice-Hall 2004.
18
Student's t Table
Upper tail area
df     .25     .10     .05
1      1.000   3.078   6.314
2      0.817   1.886   2.920
3      0.765   1.638   2.353
Let n = 3, so df = n − 1 = 2. With α = .10, α/2 = .05, and the critical value is t = 2.920.
The body of the table contains t values, not probabilities.
[Figure: t density centered at 0 with the upper tail area α/2 = .05 shaded beyond t = 2.920]
From Statistics for Managers Using Microsoft Excel, 4th Edition, Prentice-Hall 2004.
19
t distribution values
With comparison to the Z value
Confidence level   t (10 d.f.)   t (20 d.f.)   t (30 d.f.)   Z
.80                1.372         1.325         1.310         1.28
.90                1.812         1.725         1.697         1.64
.95                2.228         2.086         2.042         1.96
.99                3.169         2.845         2.750         2.58
Note: t → Z as n increases.
From Statistics for Managers Using Microsoft Excel, 4th Edition, Prentice-Hall 2004.
20
Two-sample t-test

2. Specify your null distribution

Differences in means follow a T-distribution; here we have a T-distribution with 58 degrees of freedom (60 observations − 2 means).
21
Two-sample t-test

3. Observed difference in our experiment = 20 points

22
Two-sample t-test

4. Calculate the p-value of what you observed

The critical value for a two-tailed p-value of .05 with 58 degrees of freedom is T = 2.000. The observed T-statistic is 0.98 < 2.000, so p > .05.
5. Do not reject the null! No evidence that men are better in math.
23
Corresponding confidence interval
Note that the 95% confidence interval crosses 0 (the null value).
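
The arithmetic behind steps 2 to 4 and the confidence interval can be checked from the summary data alone. This is a minimal sketch assuming the usual pooled-variance two-sample t-test; `pooled_two_sample_t` is a helper name invented for the illustration, and the critical value 2.000 is taken from the slide rather than computed.

```python
from math import sqrt


def pooled_two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t statistic, standard error, degrees of freedom)."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = sqrt(pooled_var / n1 + pooled_var / n2)
    return (mean1 - mean2) / se, se, df


# Summary data from the slides: men 436 (SD 77), women 416 (SD 81), 30 each.
t_stat, se, df = pooled_two_sample_t(436, 77, 30, 416, 81, 30)

T_CRITICAL = 2.000  # two-tailed .05 critical value for 58 df, as quoted on the slide
diff = 436 - 416
ci_low = diff - T_CRITICAL * se
ci_high = diff + T_CRITICAL * se

print(f"t = {t_stat:.2f} on {df} df, SE = {se:.1f}")
print(f"95% CI for the difference: ({ci_low:.1f}, {ci_high:.1f})")
```

The interval runs from below zero to above zero, which is the same conclusion as failing to reject the null at the .05 level.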
24
Review Question 1

A t-distribution:
a. Is approximately a normal distribution if n > 100.
b. Can be used interchangeably with a normal distribution as long as the sample size is large enough.
c. Reflects the uncertainty introduced when using the sample, rather than population, standard deviation.
d. All of the above.

26
Review Question 2

In a medical student class, the 6 people born on odd days had heights of 64.6 ± 4 inches; the 10 people born on even days had heights of 71.1 ± 5 inches. Height is roughly normally distributed. Which of the following best represents the correct statistical test for these data?
a.
b.
c.
d.

29
Example: paired t-test
TABLE 1. Difference between Means of "Before" and "After" Botulinum Toxin A Treatment
                       Before BTxnA   After BTxnA   Difference   Significance
Social skills          5.90           5.84          NS           .293
Academic performance   5.86           5.78          .08          .068
Date success           5.17           5.30          .13          .014
Occupational success   6.08           5.97          .11          .013
Attractiveness         4.94           5.07          .13          .030
Financial success      5.67           5.61          NS           .230
Relationship success   5.68           5.68          NS           .967
Athletic success       5.15           5.38          .23          .000

Significant at 5% level. Significant at 1% level.

30
Paired t-test

Statistical question: Is there a difference in date success after BoTox?
What is the outcome variable? Date success
What type of variable is it? Continuous
Is it normally distributed? Yes
Are the observations correlated? Yes, it's the same patients before and after
How many time points are being compared? Two
→ paired t-test

31
Paired t-test mechanics

Calculate the change in date success score for each person.
Calculate the average change in date success for the sample. (.13)
Calculate the standard error of the change in date success. (.05)
Calculate a T-statistic by dividing the mean change by the standard error (T = .13/.05 = 2.6).
Look up the corresponding p-value. (T = 2.6 corresponds to p = .014.)
Significant p-values indicate that the average change is significantly different from 0.
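
The mechanics above can be sketched in a few lines. The individual BTxnA scores are not in the transcript, so the `before` and `after` lists below are hypothetical values for six people, chosen only to make the steps visible; `paired_t` is an illustrative helper name.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Return (mean change, standard error, t statistic) for paired data."""
    # Step 1: one change score per person.
    changes = [a - b for b, a in zip(before, after)]
    # Step 2: average change in the sample.
    mean_change = mean(changes)
    # Step 3: standard error of the change (sample SD over sqrt(n)).
    se = stdev(changes) / sqrt(len(changes))
    # Step 4: t statistic with n - 1 degrees of freedom.
    return mean_change, se, mean_change / se


# Hypothetical scores for six people (NOT the BTxnA study data).
before = [5, 4, 6, 5, 7, 4]
after = [6, 6, 9, 7, 8, 7]

mean_change, se, t_stat = paired_t(before, after)
print(f"mean change = {mean_change}, SE = {se:.3f}, t = {t_stat:.2f} on {len(before) - 1} df")
```

With the study's own summary numbers the last step is simply .13 / .05 = 2.6; the p-value lookup still requires a t table or statistical software.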

32
Paired t-test example 2
33
Example problem: paired t-test
Null hypothesis: average change = 0
34
Example problem: paired t-test
With 5 df, T > 2.571 corresponds to p < .05 (two-sided test).
35
Example problem: paired t-test
Note: the confidence interval does not include 0.
37
Using our class data

Hypothesis: Students who consider themselves street smart drink more alcohol than students who consider themselves book smart.
Null hypothesis: no difference in alcohol drinking between street smart and book smart students.

38
Non-normal class data: alcohol

39
Wilcoxon sum-rank test

Statistical question: Is there a difference in alcohol drinking between street smart and book smart students?
What is the outcome variable? Weekly alcohol intake (drinks/week)
What type of variable is it? Continuous
Is it normally distributed? No (and small n)
Are the observations correlated? No
Are groups being compared, and if so, how many? Two
→ Wilcoxon sum-rank test

40
Results

Book smart: mean = 1.6 drinks/week; median = 1.5
Street smart: mean = 2.7 drinks/week; median = 3.0
41
Wilcoxon rank-sum test mechanics

Book smart values (n = 13): 0 0 0 0 1 1 2 2 2 3 3 4 5
Street smart values (n = 7): 0 0 2 3 3 5 6
Combined groups (n = 20): 0 0 0 0 0 0 1 1 2 2 2 2 3 3 3 3 4 5 5 6
Corresponding ranks: 3.5 3.5 3.5 3.5 3.5 3.5 7.5 7.5 10.5 10.5 10.5 10.5 14.5 14.5 14.5 14.5 17 18.5 18.5 20
Ties are assigned average ranks; e.g., there are 6 zeros, so the zeros get the average of the ranks 1 through 6.

42
Wilcoxon rank-sum test

Ranks, book smart: 3.5 3.5 3.5 3.5 7.5 7.5 10.5 10.5 10.5 14.5 14.5 17 18.5
Ranks, street smart: 3.5 3.5 10.5 14.5 14.5 18.5 20
Sum of ranks, book smart: 3.5 + 3.5 + 3.5 + 3.5 + 7.5 + 7.5 + 10.5 + 10.5 + 10.5 + 14.5 + 14.5 + 17 + 18.5 = 125
Sum of ranks, street smart: 3.5 + 3.5 + 10.5 + 14.5 + 14.5 + 18.5 + 20 = 85
The Wilcoxon sum-rank test compares these numbers, accounting for the differences in sample size in the two groups.
Resulting p-value (from computer): 0.24
Not significantly different!
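
The ranking and summing steps can be reproduced directly from the class data listed above. This sketch covers only the rank sums, not the p-value; `average_ranks` is a helper written for this illustration.

```python
def average_ranks(values):
    """Map each distinct value to its rank, giving ties their average rank."""
    ordered = sorted(values)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # Sorted positions i+1 .. j are tied, so they share the rank (i + 1 + j) / 2.
        rank_of[ordered[i]] = (i + 1 + j) / 2
        i = j
    return rank_of


book = [0, 0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 4, 5]
street = [0, 0, 2, 3, 3, 5, 6]

# Rank the pooled sample, then add up the ranks within each group.
rank_of = average_ranks(book + street)
book_sum = sum(rank_of[v] for v in book)
street_sum = sum(rank_of[v] for v in street)
print(book_sum, street_sum)  # prints "125.0 85.0"
```

The two sums always add to n(n + 1)/2 for the pooled sample (here 20 × 21 / 2 = 210), which is a quick check on the ranking.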

43
Example 2, Wilcoxon sum-rank test
10 dieters following the Atkins diet vs. 10 dieters following Jenny Craig (hypothetical).
RESULTS: The Atkins group loses an average of 34.5 lbs. The J. Craig group loses an average of 18.5 lbs.
Conclusion: Atkins is better?
44
Example: non-parametric tests
BUT, take a closer look at the individual data:
Atkins, change in weight (lbs): 4, 3, 0, -3, -4, -5, -11, -14, -15, -300
J. Craig, change in weight (lbs): -8, -10, -12, -16, -18, -20, -21, -24, -26, -30
45
Jenny Craig
[Histogram: percent of dieters (y-axis, 0 to 30) by weight change in lbs (x-axis, -30 to 20)]
46
Atkins
[Histogram: percent of dieters (y-axis, 0 to 30) by weight change in lbs (x-axis, -300 to 20)]
47
Wilcoxon Rank-Sum test

RANK the values, 1 being the least weight loss and 20 being the most weight loss.
Atkins
Change (lbs): 4, 3, 0, -3, -4, -5, -11, -14, -15, -300
Rank:         1, 2, 3, 4, 5, 6, 9, 11, 12, 20
J. Craig
Change (lbs): -8, -10, -12, -16, -18, -20, -21, -24, -26, -30
Rank:         7, 8, 10, 13, 14, 15, 16, 17, 18, 19

48
Wilcoxon Rank-Sum test

Sum of Atkins ranks:
1 + 2 + 3 + 4 + 5 + 6 + 9 + 11 + 12 + 20 = 73
Sum of Jenny Craig's ranks:
7 + 8 + 10 + 13 + 14 + 15 + 16 + 17 + 18 + 19 = 137
Jenny Craig clearly ranked higher!
P-value (from computer): .018
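
The slide does not say how the software obtained its p-value. One common route, assumed here purely for illustration, is the normal approximation to the rank-sum distribution with a continuity correction; it lands close to, but not exactly on, the reported .018. `rank_sum_normal_p` is a name invented for this sketch, and the formula below is the no-ties version.

```python
from math import sqrt
from statistics import NormalDist


def rank_sum_normal_p(rank_sum, n1, n2):
    """Two-sided p-value for one group's rank sum via the normal approximation."""
    # Mean and SD of the rank sum of a group of size n1 under the null (no ties).
    expected = n1 * (n1 + n2 + 1) / 2
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    # Continuity correction: shrink the distance from the mean by 0.5.
    z = (abs(rank_sum - expected) - 0.5) / sd
    return 2 * (1 - NormalDist().cdf(z))


# Atkins rank sum from the slide: 73, with 10 dieters per group.
p_value = rank_sum_normal_p(73, 10, 10)
print(f"approximate two-sided p = {p_value:.3f}")
```

Because the ranks ignore how extreme the -300 lb outlier is, the test reflects that most Jenny Craig dieters lost more than most Atkins dieters, which the comparison of means hides.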

49
Review Question 3

When you want to compare mean blood pressure between two groups, you should:
a. Use a t-test.
b. Use a nonparametric test.
c. Use a t-test if blood pressure is normally distributed.
d. Use a two-sample proportions test.
e. Use a two-sample proportions test only if blood pressure is normally distributed.

52
DHA and eczema
Figure 3 from Koch C, Dölle S, Metzger M, Rasche C, Jungclas H, Rühl R, Renz H, Worm M. Docosahexaenoic acid (DHA) supplementation in atopic eczema: a randomized, double-blind, controlled trial. Br J Dermatol. 2008 Apr;158(4):786-92. Epub 2008 Jan 30.
53
Wilcoxon sign-rank test

Statistical question: Did patients improve in SCORAD score from baseline to 8 weeks?
What is the outcome variable? SCORAD
What type of variable is it? Continuous
Is it normally distributed? No (and small numbers)
Are the observations correlated? Yes, it's the same people before and after
How many time points are being compared? Two
→ Wilcoxon sign-rank test

54
Wilcoxon sign-rank test mechanics

1. Calculate the change in SCORAD score for each participant.
2. Rank the absolute values of the changes in SCORAD score from smallest to largest.
3. Add up the ranks from the people who improved and, separately, the ranks from the people who got worse.
4. The Wilcoxon sign-rank test compares these values to determine whether improvements significantly exceed declines (or vice versa).
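
Steps 1 to 3 can be sketched as follows. The per-patient SCORAD values are not in the transcript, so the `changes` list is hypothetical, and treating a negative change as an improvement is an assumption of this example. `signed_rank_sums` is an illustrative helper; it drops zero changes, which is one common convention.

```python
def signed_rank_sums(changes):
    """Return (rank sum for decreases, rank sum for increases)."""
    nonzero = [c for c in changes if c != 0]  # a zero change has no sign
    ordered = sorted(abs(c) for c in nonzero)

    def avg_rank(x):
        # Average of the first and last sorted positions holding x (handles ties).
        first = ordered.index(x) + 1
        last = first + ordered.count(x) - 1
        return (first + last) / 2

    down = sum(avg_rank(abs(c)) for c in nonzero if c < 0)
    up = sum(avg_rank(abs(c)) for c in nonzero if c > 0)
    return down, up


# Hypothetical SCORAD changes (week 8 minus baseline); negative = improvement.
changes = [-12, -8, 5, -15, -3, 7, -20, -10]
improved, worsened = signed_rank_sums(changes)
print(improved, worsened)  # prints "31.0 5.0"
```

Step 4, converting the two sums into a p-value, is left to statistical software or published tables.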

56
ANOVA example
Mean micronutrient intake from the school lunch by school
a: School 1 (most deprived; 40% subsidized lunches).
b: School 2 (medium deprived; < 10% subsidized).
c: School 3 (least deprived; no subsidization, private school).
d: ANOVA; significant differences are highlighted in bold (P < 0.05).
FROM: Gould R, Russell J, Barker ME. School lunch menus and 11 to 12 year old children's food choice in three secondary schools in England: are the nutritional standards being met? Appetite. 2006 Jan;46(1):86-92.
57
ANOVA

Statistical question: Does calcium content of school lunches differ by school type (privileged, average, deprived)?
What is the outcome variable? Calcium
What type of variable is it? Continuous
Is it normally distributed? Yes
Are the observations correlated? No
Are groups being compared and, if so, how many? Yes, three
→ ANOVA

58
ANOVA (ANalysis Of VAriance)

Idea: For two or more groups, test the difference between means, for normally distributed variables.
Just an extension of the t-test (an ANOVA with only two groups is mathematically equivalent to a t-test).

59
One-Way Analysis of Variance

Assumptions (same as the t-test):
Normally distributed outcome
Equal variances between the groups
Groups are independent

60
Hypotheses of One-Way ANOVA
61
ANOVA

It's like this: if I have three groups to compare,
I could do three pairwise t-tests, but this would increase my type I error.
So, instead I want to look at the pairwise differences all at once.
To do this, I can recognize that variance is a statistic that lets me look at more than one difference at a time.

62
The F-test
Is the difference in the means of the groups more
than background noise (variability within
groups)?
63
The F-distribution

A ratio of variances follows an F-distribution

The F-test tests the hypothesis that two
variances are equal.
F will be close to 1 if sample variances are
equal.

64
ANOVA example 2

Randomize 33 subjects to three groups: 800 mg calcium supplement vs. 1500 mg calcium supplement vs. placebo.
Compare the spine bone density of all 3 groups after 1 year.

65
Spine bone density vs. treatment
[Figure: spine bone density (y-axis, 0.7 to 1.2) by treatment group: placebo, 800 mg calcium, 1500 mg calcium]
66
Group means and standard deviations

Placebo group (n = 11):
Mean spine BMD = .92 g/cm²
Standard deviation = .10 g/cm²
800 mg calcium supplement group (n = 11):
Mean spine BMD = .94 g/cm²
Standard deviation = .08 g/cm²
1500 mg calcium supplement group (n = 11):
Mean spine BMD = 1.06 g/cm²
Standard deviation = .11 g/cm²

67
The F-Test
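
The slide's own calculation did not survive in the transcript. As a rough sketch, the F statistic can be rebuilt from the group means and standard deviations listed above; because those summaries are rounded, the result may differ slightly from the value on the original slide. `one_way_f` is a helper name chosen for this illustration.

```python
def one_way_f(means, sds, ns):
    """One-way ANOVA F statistic from per-group means, SDs, and sample sizes."""
    k = len(means)
    total_n = sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / total_n
    # Between-group variability: how far the group means sit from the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    # Within-group variability: the "background noise" inside each group.
    ss_within = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))
    df_between, df_within = k - 1, total_n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Placebo, 800 mg, 1500 mg groups (spine BMD in g/cm^2, n = 11 each).
f_stat, df_between, df_within = one_way_f(
    means=[0.92, 0.94, 1.06],
    sds=[0.10, 0.08, 0.11],
    ns=[11, 11, 11],
)
print(f"F = {f_stat:.2f} on ({df_between}, {df_within}) df")
```

An F well above 1 says the spread of the group means is large relative to the variability within groups; the p-value comes from the F-distribution with these two degrees of freedom.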
68
Review Question 4

Which of the following is an assumption of ANOVA?
a. The outcome variable is normally distributed.
b. The variance of the outcome variable is the same in all groups.
c. The groups are independent.
d. All of the above.
e. None of the above.

70
ANOVA summary

A statistically significant ANOVA (F-test) only tells you that at least two of the groups differ, but not which ones differ.
Determining which groups differ (when it's unclear) requires more sophisticated analyses to correct for the problem of multiple comparisons.

71
Question: Why not just do 3 pairwise t-tests?

Answer: because, at an error rate of 5% for each test, you have an overall chance of up to 1 − (.95)^3 = 14% of making a type-I error (if all 3 comparisons were independent).
If you wanted to compare 6 groups, you'd have to do 15 pairwise t-tests, which would give you a high chance of finding something significant just by chance.
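
The two numbers in this answer follow from short formulas, sketched below under the same independence assumption the slide states. The function names are illustrative.

```python
from math import comb


def pairwise_tests(groups):
    """Number of distinct pairwise comparisons among the groups."""
    return comb(groups, 2)


def familywise_error(tests, alpha=0.05):
    """Chance of at least one type-I error across independent tests."""
    return 1 - (1 - alpha) ** tests


for groups in (3, 6):
    tests = pairwise_tests(groups)
    print(f"{groups} groups: {tests} tests, familywise error = {familywise_error(tests):.0%}")
```

For 6 groups the same formula gives a familywise error rate above 50%, which is what "a high chance of finding something significant just by chance" amounts to.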

72
Multiple comparisons
73
Correction for multiple comparisons

How to correct for multiple comparisons post-hoc:
Bonferroni correction (the most conservative adjustment; it ignores any correlation between tests; divide the alpha cut-off by the number of tests)
Tukey (adjusts p)
Scheffé (adjusts p)

74
1. Bonferroni
To make a Bonferroni correction, divide your desired alpha cut-off level (usually .05) by the number of comparisons you are making.
It makes no allowance for correlation between comparisons, which often makes it too conservative.
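
As a small sketch of the correction, the code below applies a Bonferroni threshold to three p-values. The p-values are hypothetical, made up for the example, and `bonferroni` is an illustrative helper name.

```python
def bonferroni(p_values, alpha=0.05):
    """Return the per-test threshold and which p-values pass it."""
    threshold = alpha / len(p_values)
    return threshold, [p <= threshold for p in p_values]


# Hypothetical p-values from three pairwise comparisons.
p_values = [0.010, 0.020, 0.040]
threshold, significant = bonferroni(p_values)
print(f"per-test threshold = {threshold:.4f}")
print(significant)  # prints "[True, False, False]"
```

All three hypothetical p-values are below .05, but only the first survives the corrected threshold of .05 / 3.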
75
2/3. Tukey and Scheffé

Both methods increase your p-values to account for the fact that you've done multiple comparisons, but are less conservative than Bonferroni (let the computer calculate them for you!).

76
Review Question 5

I am doing an RCT of 4 treatment regimens for blood pressure. At the end of the day, I compare blood pressures in the 4 groups using ANOVA. My p-value is .03. I conclude:
a. All of the treatment regimens differ.
b. I need to use a Bonferroni correction.
c. One treatment is better than all the rest.
d. At least one treatment is different from the others.
e. In pairwise comparisons, no treatment will be different.

79
Non-parametric ANOVA (Kruskal-Wallis test)

Statistical question: Do nevi counts differ by training velocity (slow, medium, fast) group in marathon runners?
What is the outcome variable? Nevi count
What type of variable is it? Continuous
Is it normally distributed? No (and small sample size)
Are the observations correlated? No
Are groups being compared and, if so, how many? Yes, three
→ non-parametric ANOVA

80
Example: Nevi counts and marathon runners
Richtig et al. Melanoma Markers in Marathon Runners: Increase with Sun Exposure and Physical Strain. Dermatology 2008;217:38-44.
81
Non-parametric ANOVA

Kruskal-Wallis one-way ANOVA
(just an extension of the Wilcoxon Sum-Rank test
for 2 groups based on ranks)
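
To make the "extension based on ranks" concrete, here is a minimal sketch of the Kruskal-Wallis H statistic. The nevi counts below are hypothetical (the study's raw data are not in the transcript), the function omits the tie correction, and the p-value uses the chi-square approximation, which is rough with only four runners per group.

```python
from math import exp


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction)."""
    pooled = sorted(v for g in groups for v in g)

    def avg_rank(x):
        # Average of the first and last sorted positions holding x.
        first = pooled.index(x) + 1
        last = first + pooled.count(x) - 1
        return (first + last) / 2

    n_total = len(pooled)
    # Squared rank sum of each group, scaled by that group's size.
    weighted = sum(sum(avg_rank(v) for v in g) ** 2 / len(g) for g in groups)
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Hypothetical nevi counts by training velocity (NOT the Richtig et al. data).
slow = [5, 8, 12, 3]
medium = [15, 9, 20, 11]
fast = [25, 30, 18, 22]

h = kruskal_wallis_h([slow, medium, fast])
# With 3 groups, H is referred to a chi-square distribution with 2 df,
# whose upper-tail probability has the closed form exp(-H / 2).
p_approx = exp(-h / 2)
print(f"H = {h:.2f}, approximate p = {p_approx:.3f}")
```

As with ANOVA, a small p-value here says only that the groups differ somewhere; pairwise rank-sum tests with a multiple-comparison adjustment are needed to say which.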

82
Example: Nevi counts and marathon runners
By non-parametric ANOVA, the groups differ significantly in nevi count (p < .05) overall. By Wilcoxon sum-rank test (adjusted for multiple comparisons), the lowest velocity group differs significantly from the highest velocity group (p < .05).
Richtig et al. Melanoma Markers in Marathon Runners: Increase with Sun Exposure and Physical Strain. Dermatology 2008;217:38-44.
83
Review Question 6

I want to compare depression scores between three groups, but I'm not sure if depression is normally distributed. What should I do?
a. Don't worry about it; run an ANOVA anyway.
b. Test depression for normality.
c. Use a Kruskal-Wallis (non-parametric) ANOVA.
d. Nothing, I can't do anything with these data.
e. Run 3 nonparametric t-tests.
84
Review Question 6

I want to compare depression scores between
three groups, but Im not sure if depression is
normally distributed. What should I do?
Dont worry about itrun an ANOVA anyway.
Test depression for normality.
Use a Kruskal-Wallis (non-parametric) ANOVA.
Nothing, I cant do anything with these data.
Run 3 nonparametric ttests.

85
Review Question 7

If depression score turns out to be very non-normal, then what should I do?
a. Don't worry about it; run an ANOVA anyway.
b. Test depression for normality.
c. Use a Kruskal-Wallis (non-parametric) ANOVA.
d. Nothing, I can't do anything with these data.
e. Run 3 nonparametric t-tests.

87
Review Question 8

I measure blood pressure in a cohort of elderly men yearly for 3 years. To test whether or not their blood pressure changed over time, I compare the mean blood pressures in each time period using a one-way ANOVA. This strategy is:
a. Correct. I have three means, so I have to use ANOVA.
b. Wrong. Blood pressure is unlikely to be normally distributed.
c. Wrong. The variance in BP is likely to greatly differ at the three time points.
d. Correct. It would also be OK to use three t-tests.
e. Wrong. The samples are not independent.
