Title: Nonparametric Inference
1Nonparametric Inference
2Why Nonparametric Tests?
- We have been primarily discussing parametric tests, i.e. tests that are valid only under certain assumptions. For example, t-tests and ANOVA both make assumptions about the shape of the distribution (normality) and about the similarity of the groups (homogeneity of variance).
- When these assumptions hold, we can use standard sampling distributions (e.g. t-distribution, F-distribution) to find p-values.
3Why Nonparametric Tests?
- When these assumptions are violated, it is necessary to turn to tests that do not have such stringent assumptions: nonparametric or "distribution-free" tests.
- Specifically, there are three cases which necessitate the use of nonparametric tests:
1) The data for the response is not at least interval scale, i.e. not a true measurement. For example, the response might be ordinal.
2) The distribution of the data for the response is not normal. Recall that a relatively normal distribution is assumed for parametric tests.
3) There are severely unequal variances between groups, i.e. there is an obvious violation of the homogeneity of variance assumption required for parametric tests.
In the last two cases, we have interval-level data, but it violates our parametric assumptions. Therefore, we no longer treat this data as interval, but as ordinal. In a sense, we demote it because it fails to meet specific assumptions.
4Table of Parametric and Nonparametric Tests
Parametric Test                  Nonparametric Test                      Purpose of Test
Two-Sample t-Test (either case)  Mann-Whitney/Wilcoxon Rank Sum Test     Compare two independent samples
Paired t-Test                    Sign Test or Wilcoxon Signed-Rank Test  Compare dependent samples
One-way ANOVA                    Kruskal-Wallis Test                     Compare k independent samples
5Independent Samples
- For two populations we use
- Mann-Whitney/Wilcoxon Rank Sum Test
- For three or more populations we use
- Kruskal-Wallis Test (at the end)
6Mann-Whitney/Wilcoxon Rank Sum Test
- Alternative to the two-sample t-test
- Use when
- - populations being sampled are not normally distributed.
- - sample sizes are small, so assessing normality is not possible (ni < 20).
- - response is ordinal.
7Mann-Whitney/Wilcoxon Rank Sum Test
- General Hypotheses
- Ho: the distributions of pop. A and pop. B are the same, i.e. A = B
- HA: the distributions of pop. A and pop. B are NOT the same, i.e. A ≠ B
- HA: the distribution of pop. A is shifted to the right of pop. B, i.e. A > B
- HA: the distribution of pop. A is shifted to the left of pop. B, i.e. A < B
8Mann-Whitney/Wilcoxon Rank Sum Test
Q: Is there evidence that the values in population A are generally larger than those in population B?
9Mann-Whitney/Wilcoxon Rank Sum Test (Test Procedure)
- Rank all N = nA + nB observations in the combined sample from both populations in ascending order. Assign the average rank to tied observations.
- Sum the ranks of the observations from populations A and B separately and denote the sums wA and wB.
- For HA: A < B, reject Ho if wA is small or wB is big. For HA: A > B, reject Ho if wA is big or wB is small.
- Use tables to determine how big or small the rank sums must be in order to reject Ho, or use software to conduct the test.
10Mann-Whitney/Wilcoxon Rank Sum Test (Critical Value Table)
This table contains the value the smaller rank sum must be less than in order to reject Ho for a one-tailed test at two significance levels (α = .05 and .01). Tables exist for the two-tailed tests as well.
n is the sample size of the group with the smaller rank sum.
11Example Huntington's Disease and Fasting Glucose Levels
- Davidson et al. studied the responses to oral glucose in patients with Huntington's disease and in a group of control subjects. The five-hour responses are shown below. Is there evidence to suggest the five-hour glucose (mg percent) is greater for patients with Huntington's disease?
Ho: Control = Huntington's, i.e. C = H
HA: Control < Huntington's, i.e. C < H
12Example Observations and Ranks
Control Group (nA = 10)    Huntington's Disease (nB = 11)
Obs.   Rank                Obs.   Rank
83     9                   85     10.5
73     3                   89     15
65     1.5                 86     13
65     1.5                 91     17
90     16                  77     5.5
77     5.5                 93     19
78     7                   100    21
97     20                  82     8
85     10.5                92     18
75     4                   86     13
                           86     13
wA = 78                    wB = 153
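The rank sums for this example can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the helper name `average_ranks` is an illustrative choice, not something from the slides. It pools the two samples, assigns average ranks to ties, and sums the ranks by group.

```python
def average_ranks(values):
    """Return 1-based ranks for values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # average of the 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks

# Five-hour glucose responses from the slide.
control = [83, 73, 65, 65, 90, 77, 78, 97, 85, 75]
huntington = [85, 89, 86, 91, 77, 93, 100, 82, 92, 86, 86]

ranks = average_ranks(control + huntington)
w_a = sum(ranks[:len(control)])   # rank sum for the control group
w_b = sum(ranks[len(control):])   # rank sum for the Huntington's group
print(w_a, w_b)
```

The two sums should add up to N(N + 1)/2 = 231, which is a quick sanity check on any hand ranking.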
13Example Critical Value Table
Here, nC = 10 (control) and nH = 11 (Huntington's). We will reject Ho: C = H in favor of HA: C < H if the rank sum for the control group is less than 86 at the α = .05 level and less than 77 at the α = .01 level.
14Example Decision/Conclusion
- Using the Wilcoxon Rank Sum Test, we have evidence to suggest that the five-hour glucose level for individuals with Huntington's disease is greater than that for healthy controls (p < .05).
- Note: p < .05 because the observed rank sum for the control group (78) is less than 86, which is the critical value for α = .05.
15Rank Sum Test in JMP
The p-values reported are based upon large-sample approximations, which generally should not be used when sample sizes are small. Here the conclusion reached is the same, but in general we should use tables if they are available.
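To illustrate what a large-sample approximation looks like, here is a hedged sketch of the usual normal approximation to the rank sum, applied to the control group's rank sum of 78 with group sizes 10 and 11. It assumes the standard null mean n(N + 1)/2 and variance nA·nB·(N + 1)/12, and it deliberately omits tie and continuity corrections, so the result need not match JMP's reported figure exactly. The function name is illustrative.

```python
import math

def rank_sum_normal_lower_tail(w, n_this, n_other):
    """Normal approximation to P(W <= w) for one group's rank sum under Ho.

    Assumption: no tie correction and no continuity correction are applied.
    """
    n_total = n_this + n_other
    mean = n_this * (n_total + 1) / 2
    sd = math.sqrt(n_this * n_other * (n_total + 1) / 12)
    z = (w - mean) / sd
    p = 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF at z
    return z, p

z, p = rank_sum_normal_lower_tail(78, 10, 11)
print(round(z, 2), round(p, 3))
```

With samples this small the approximation is only a rough guide, which is the reason for preferring tables or exact methods.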
16Rank Sum Test in SPSS
Exact one-tailed p-value = .024/2 = .012
17Dependent Samples
- Sign Test
- Wilcoxon Signed-Rank Test
18Sign Test
- The sign test can be used in place of the paired t-test when we have evidence that the paired differences are NOT normally distributed.
- It can be used when the response is ordinal.
- Best used when the response is difficult to quantify and only improvement can be measured, i.e. subject got better, got worse, or no change.
- Magnitude of the paired difference is lost when using this test.
19Sign Test
- The sign test looks at the number of (+) and (-) differences amongst the nonzero paired differences.
- A preponderance of +'s or -'s can indicate that some type of change has occurred.
- If the null hypothesis of no change is true, we expect +'s and -'s to be equally likely to occur, i.e. P(+) = P(-) = .50, and the number of each observed follows a binomial distribution.
20Example Sign Test
- A study evaluated hepatic arterial infusion of floxuridine and cisplatin for the treatment of liver metastases of colorectal cancer.
- Performance scores for 29 patients were recorded before and after infusion. Is there evidence that patients had a better performance score after infusion?
21Example Sign Test
Patient  Before (B)  After (A)  Difference (A - B)
1        2           1          -1
2        0           0          0
3        0           0          0
4        1           0          -1
5        3           3          0
6        1           0          -1
7        1           3          2
8        0           0          0
9        0           0          0
10       0           0          0
11       1           0          -1
12       1           1          0
13       2           1          -1
14       3           1          -2
15       0           0          0
16       0           0          0
17       0           3          3
18       2           3          1
19       2           3          1
20       3           2          -1
21       0           4          4
22       0           3          3
23       1           2          1
24       0           3          3
25       0           2          2
26       1           1          0
27       3           3          0
28       1           2          1
29       0           2          2
22Example Sign Test
- Ho: No change in performance score following infusion, or more specifically, the median change in performance score is 0.
- HA: Performance scores improve following infusion, or more specifically, the median change in performance score > 0.
- Intuitively, we will reject Ho if there is a large number of +'s.
23Example Sign Test
17 nonzero differences: 11 +'s and 6 -'s
24Example Sign Test
- If Ho is true, X = the number of +'s has a binomial distribution with n = 17 and p = P(+) = .50.
- Therefore the p-value is simply P(X ≥ 11 | n = 17, p = .50) = .166 > α.
- We fail to reject Ho; there is insufficient evidence to conclude that the performance score improves following infusion (p = .166).
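The binomial tail probability quoted on this slide can be reproduced directly. The sketch below uses the counts as stated on the slide (17 nonzero differences, 11 of them positive) and sums the binomial probabilities for X = 11, ..., 17; the function name `binomial_upper_tail` is illustrative.

```python
from math import comb

def binomial_upper_tail(k, n, p=0.5):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k, n + 1))

# Counts taken from the slide: n = 17 nonzero differences, 11 positive.
p_value = binomial_upper_tail(11, 17)
print(round(p_value, 3))
```

Because the test is one-sided (improvement only), just the upper tail is summed; a two-sided test would double this probability.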
25Wilcoxon Signed-Rank Test
- The problem with the sign test is that the magnitude or size of the paired differences is lost.
- The Wilcoxon Signed-Rank Test uses ranks of the paired differences to retain some sense of their size.
- Use when the distribution of the paired differences is NOT normal or when the sample size is small.
- Can be used with an ordinal response.
26Wilcoxon Signed Rank Test (Test Procedure)
- Exclude any differences which are zero.
- Put the rest of the differences in ascending order, ignoring their signs.
- Assign them ranks.
- If any differences are equal, average their ranks.
27Example Wilcoxon Signed Rank Test
- Resting Energy Expenditure (REE) for Patients with Cystic Fibrosis
- A researcher believes that patients with cystic fibrosis (CF) expend greater energy during resting than those without CF. To obtain a fair comparison, she matches 13 patients with CF to 13 patients without CF on the basis of age, sex, height, and weight.
28Example Wilcoxon Signed Rank Test
Pair  CF (C)  Healthy (H)  d = C - H  Sign  |d|  Rank of |d|  Signed Rank
1     1153    996          157        +     157  6            6
2     1132    1080         52         +     52   3            3
3     1165    1182         -17        -     17   2            -2
4     1460    1452         8          +     8    1            1
5     1634    1162         472        +     472  13           13
6     1493    1619         -126       -     126  5            -5
7     1358    1140         218        +     218  8            8
8     1453    1123         330        +     330  11           11
9     1185    1113         72         +     72   4            4
10    1824    1463         361        +     361  12           12
11    1793    1632         161        +     161  7            7
12    1930    1614         316        +     316  10           10
13    2075    1836         239        +     239  9            9
29Example Wilcoxon Signed Rank Test
Pair  CF (C)  Healthy (H)  d = C - H  Signed Rank
1     1153    996          157        6
2     1132    1080         52         3
3     1165    1182         -17        -2
4     1460    1452         8          1
5     1634    1162         472        13
6     1493    1619         -126       -5
7     1358    1140         218        8
8     1453    1123         330        11
9     1185    1113         72         4
10    1824    1463         361        12
11    1793    1632         161        7
12    1930    1614         316        10
13    2075    1836         239        9
We then calculate the sum of the positive ranks (T+) and the sum of the negative ranks (T-). Here we have T+ = 6 + 3 + 1 + 13 + 8 + 11 + 4 + 12 + 7 + 10 + 9 = 84 and T- = 2 + 5 = 7.
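The two rank sums can be recomputed from the raw CF and Healthy columns. The sketch below is a minimal illustration that assumes there are no tied absolute differences (true for these 13 pairs), so plain positions in the sorted list serve as ranks; variable names are illustrative.

```python
# REE values for the 13 matched pairs, in pair order, from the table.
cf = [1153, 1132, 1165, 1460, 1634, 1493, 1358, 1453, 1185, 1824, 1793, 1930, 2075]
healthy = [996, 1080, 1182, 1452, 1162, 1619, 1140, 1123, 1113, 1463, 1632, 1614, 1836]

# Step 1: paired differences, dropping any zeros.
diffs = [c - h for c, h in zip(cf, healthy) if c != h]

# Step 2: order by absolute value; position in this order is the rank.
# Assumption: no ties among the absolute differences.
ordered = sorted(diffs, key=abs)
assert len({abs(d) for d in ordered}) == len(ordered)

# Step 3: sum the ranks of the positive and negative differences separately.
t_plus = sum(rank for rank, d in enumerate(ordered, start=1) if d > 0)
t_minus = sum(rank for rank, d in enumerate(ordered, start=1) if d < 0)
print(t_plus, t_minus)
```

As a check, T+ and T- together must equal n(n + 1)/2 = 91 for n = 13 nonzero differences.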
30Wilcoxon Signed Rank Test (Test Statistic)
- Intuitively, we will reject Ho, which states that there is no difference between the populations, if either one of these rank sums is large and the other is small.
- The Wilcoxon Signed Rank Test uses the smaller rank sum, T = min(T+, T-), as the test statistic.
31Example Wilcoxon Signed Rank Test
- For the cystic fibrosis example we have the following hypotheses:
- Ho: there is no difference in the resting energy expenditure of individuals with CF and healthy controls who are the same gender, age, height, and weight (MEDIAN PAIRED DIFFERENCE = 0).
- HA: the resting energy expenditure of individuals with CF is greater than that of healthy individuals who are the same gender, age, height, and weight (MEDIAN PAIRED DIFFERENCE > 0).
32Example Wilcoxon Signed Rank Test
- HA: the resting energy expenditure of individuals with CF is greater than that of healthy individuals who are the same gender, age, height, and weight.
- The alternative is clearly supported if T+ is large or T- is small.
- The test statistic is T = min(T+, T-) = 7.
- Is T = 7 considered small, i.e. what is the corresponding p-value?
- To answer this question we need a Wilcoxon Signed Rank Test table or statistical software.
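For a sample this small, the exact null distribution can be enumerated rather than looked up. Under Ho each rank 1, ..., n is equally likely to carry a + or a - sign, so P(T- ≤ t) is the number of subsets of {1, ..., n} with sum at most t, divided by 2^n. The sketch below counts those subsets with a small dynamic program; it assumes no zero or tied differences, and the function name is illustrative.

```python
def signed_rank_exact_p(t, n):
    """Exact one-tailed P(T- <= t) under Ho for n untied, nonzero differences."""
    max_sum = n * (n + 1) // 2
    # counts[s] = number of subsets of the ranks seen so far whose sum is s
    counts = [0] * (max_sum + 1)
    counts[0] = 1  # the empty subset
    for rank in range(1, n + 1):
        # Go downward so each rank is used at most once per subset.
        for s in range(max_sum, rank - 1, -1):
            counts[s] += counts[s - rank]
    return sum(counts[: t + 1]) / 2 ** n

p_exact = signed_rank_exact_p(7, 13)
print(round(p_exact, 4))
```

For n = 13 and T = 7 there are only 19 such subsets out of 8192 sign patterns, so T = 7 is indeed very small.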
33Example Wilcoxon Signed Rank Test
This table gives the value that our observed T = min(T+, T-) must be less than in order to reject Ho for both two- and one-tailed tests. Here we have n = 13 and T = 7. We can see that our test statistic is less than 21 (α = .05) and 12 (α = .01), so we will reject Ho, and we also estimate that our p-value < .01.
34Example Wilcoxon Signed Rank Test
- We conclude that individuals with cystic fibrosis (CF) have a larger resting energy expenditure when compared to healthy individuals who are the same gender, age, height, and weight (p < .01).
35Analysis in JMP
The test statistic is reported as (T+ - T-)/2 = (84 - 7)/2 = 38.50, but we only need the p-value = .0023.
36Analysis in SPSS
Click on CF first and then Healthy to specify that the paired difference will be defined as CF - Healthy, then specify which tests to conduct.
Note: the Difference column is not actually used in the SPSS analysis.
37Analysis in SPSS
For the one-tailed Wilcoxon Signed Rank Test our p-value = .007/2 = .0035 (not exact!).
For the Sign Test we have a one-tailed p-value = .022/2 = .011.
38Independent Samples
- If we have three or more populations to compare we use
- Kruskal-Wallis Test
39Kruskal-Wallis Test
- One-way ANOVA for a completely randomized design is based on the assumptions of normality and equality of variance.
- The nonparametric alternative not relying on these assumptions is called the Kruskal-Wallis Test.
- Like the Mann-Whitney/Wilcoxon Rank Sum Test, we use the sum of the ranks assigned to each group when considering the combined sample as the basis for our test statistic.
40Kruskal-Wallis Test
- Basic Idea
- 1) Looking at all observations together, rank them.
- 2) Let R1, R2, ..., Rk be the sums of the ranks of each group.
- 3) If some Ri's are much larger than others, it indicates the response values in different groups come from different populations.
41Kruskal-Wallis Test
- The test statistic is
$$H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1)$$
- where
- N = total sample size = n1 + n2 + ... + nk
42Kruskal-Wallis Test
- The test statistic is
$$H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1)$$
- Under the null hypothesis, this has an approximate chi-square distribution with df = k - 1, i.e. $H \sim \chi^2_{k-1}$.
- The approximation is OK when each group contains at least 5 observations.
- N = total sample size = n1 + n2 + ... + nk
43Chi-squared Distribution and p-value
Area = p-value
44Example Kruskal-Wallis Test
- A clinical trial evaluating the fever reducing
effects of aspirin, ibuprofen, and acetaminophen
was conducted. Study subjects were adults seen
in an ER with diagnoses of flu with body
temperatures between 100°F and 100.9°F.
Subjects were randomly assigned to treatment.
Changes in body temperature were recorded 2 hrs.
after administration of treatments.
45Example Kruskal-Wallis Test
- Resulting Data: Temperature Decrease (deg. F)
Aspirin  Rank    Ibuprofen  Rank    Acetaminophen  Rank
.95      8       .39        5       .19            4
1.48     14      .44        6       1.02           9
1.33     12      1.31       11      .07            3
1.28     10      2.48       15      .01            2
                 1.39       13      .62            7
                                    -.39           1
(-.39 is a temperature increase.)
N = 15
R1 = 44, n1 = 4
R2 = 50, n2 = 5
R3 = 26, n3 = 6
46Example Kruskal-Wallis Test
N = 15
R1 = 44, n1 = 4
R2 = 50, n2 = 5
R3 = 26, n3 = 6
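The rank sums and the test statistic for this example can be reproduced from the raw temperature decreases. The sketch below assumes the usual Kruskal-Wallis form H = 12/(N(N + 1)) · Σ Ri²/ni - 3(N + 1), which gives the H = 6.833 reported by the software output in this deck. Because there are k = 3 groups, df = 2, and for 2 degrees of freedom the chi-square upper-tail area has the closed form exp(-H/2), so no statistics library is needed. The data contain no ties, so plain positions in the sorted pooled sample serve as ranks.

```python
import math

# Temperature decreases (deg. F) by treatment, from the slide.
groups = {
    "aspirin": [0.95, 1.48, 1.33, 1.28],
    "ibuprofen": [0.39, 0.44, 1.31, 2.48, 1.39],
    "acetaminophen": [0.19, 1.02, 0.07, 0.01, 0.62, -0.39],
}

# Rank the combined sample (assumption: no tied values, as is the case here).
pooled = sorted(x for xs in groups.values() for x in xs)
rank_of = {x: r for r, x in enumerate(pooled, start=1)}
rank_sums = {name: sum(rank_of[x] for x in xs) for name, xs in groups.items()}

n_total = len(pooled)
h = 12 / (n_total * (n_total + 1)) * sum(
    rank_sums[name] ** 2 / len(xs) for name, xs in groups.items()
) - 3 * (n_total + 1)

# Chi-square upper-tail area for df = 2 only: P(X > h) = exp(-h / 2).
p_value = math.exp(-h / 2)
print(rank_sums, round(h, 3), round(p_value, 3))
```

For other numbers of groups the closed form no longer applies, and a chi-square table or statistical software is needed for the p-value.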
47Chi-squared Distribution and p-value
Area = .033
48Kruskal-Wallis in JMP (Demo)
Analyze > Fit Y by X
RESULTS: R1 = 44, n1 = 4; R2 = 50, n2 = 5; R3 = 26, n3 = 6; H = 6.833, df = 2, p = .033
49Kruskal-Wallis in SPSS (Demo)
RESULTS: R1/n1 = 11.00, R2/n2 = 10.00, R3/n3 = 4.33; H = 6.833, df = 2, p = .033
50Decision/Conclusion
- Using the Kruskal-Wallis test, we have evidence to suggest that the temperature changes after taking the different drugs are not the same (p = .033).
- Now we might like to know which drugs significantly differ from one another.
51Multiple Comparisons for Kruskal-Wallis Test
- If we decide at least two populations differ in terms of what is typical of their values, we can use multiple comparisons to determine which populations differ.
- To do this we calculate an approximate p-value for each pair-wise comparison and then compare that p-value to a Bonferroni-corrected significance level (α).
52Multiple Comparisons for Kruskal-Wallis Test
To determine if group i significantly differs from group j, we compute a test statistic for the pair, then compute the corresponding p-value and compare it to α/(2m), where m is the number of possible pair-wise comparisons, m = k(k - 1)/2.
53Multiple Comparisons for Kruskal-Wallis Test
- Comparing Aspirin to Acetaminophen
N = 15
Aspirin: R1 = 44, n1 = 4
Acetaminophen: R3 = 26, n3 = 6
Computing the Bonferroni-corrected significance level, we have α/(2m) = .05/(2 · 3) = .00833.
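The formula for the pair-wise statistic is not reproduced in this text, so the sketch below rests on an assumption: it uses the commonly taught Dunn-type statistic, z = (Ri/ni - Rj/nj) / sqrt(N(N + 1)/12 · (1/ni + 1/nj)), with a one-tailed normal p-value compared to α/(2m). The function name `pairwise_z` is illustrative, and the numbers it produces are not taken from the slides.

```python
import math

def pairwise_z(r_i, n_i, r_j, n_j, n_total):
    """Dunn-type z for comparing the mean ranks of two groups (assumed form)."""
    diff = r_i / n_i - r_j / n_j
    se = math.sqrt(n_total * (n_total + 1) / 12 * (1 / n_i + 1 / n_j))
    return diff / se

# Aspirin (R1 = 44, n1 = 4) vs. Acetaminophen (R3 = 26, n3 = 6), N = 15.
z = pairwise_z(44, 4, 26, 6, 15)
p_value = 0.5 * math.erfc(abs(z) / math.sqrt(2))  # P(Z > |z|), standard normal

k = 3                      # number of groups
m = k * (k - 1) // 2       # number of pair-wise comparisons
alpha_corrected = 0.05 / (2 * m)

print(round(z, 2), round(p_value, 4), round(alpha_corrected, 5))
print("significant" if p_value < alpha_corrected else "not significant")
```

Under this assumed form the p-value falls just above the corrected level of .00833, which agrees with the deck's statement that the comparison is not significant.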
54Multiple Comparisons for Kruskal-Wallis Test
- As this comparison is not significant, no others will be either, so how can this be?
- The problem is that the Bonferroni correction is too conservative, and the approximate normality of the multiple comparison procedure is valid only when sample sizes are large; the sample sizes here are quite small.
- Thus the comparison shown is fine for a demonstration of the procedure, but the results cannot be trusted.
55Nonparametric Multiple Comparisons in JMP
56Nonparametric Multiple Comparisons in JMP
57Nonparametric Tests in R