Title: Non-Parametric Methods
1Statistics for Health Research
Non-Parametric Methods
Peter T. Donnan, Professor of Epidemiology and Biostatistics
2Objectives of Presentation
- Introduction
- Ranks
- Median
- Wilcoxon Signed Rank Test
- Paired Wilcoxon Signed Rank
- Mann-Whitney test
- Spearman's Rank Correlation Coefficient
- Others.
3What are non-parametric tests?
- Parametric tests involve estimating parameters such as the mean, and assume that the distribution of sample means is Normal
- Often data do not follow a Normal distribution, e.g. number of cigarettes smoked, cost to the NHS, etc.
- Positively skewed distributions
4A positively skewed distribution
5What are non-parametric tests?
- Non-parametric (NP) tests were developed for situations where fewer assumptions have to be made
- NP tests STILL have assumptions, but these are less stringent
- NP tests can be applied to Normal data, but parametric tests have greater power IF their assumptions are met
6Ranks
- The practical difference between parametric and NP methods is that NP methods use the ranks of the values rather than the actual values
- E.g.
- 1, 2, 3, 4, 5, 7, 13, 22, 38, 45 - actual
- 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 - rank
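The ranking step above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the helper name `average_ranks` is an assumption, and it gives tied values the average of the ranks they occupy, which is the usual convention in rank-based tests.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    ordered = sorted(values)
    # First 1-based position of v, plus half the number of extra tied copies.
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]


actual = [1, 2, 3, 4, 5, 7, 13, 22, 38, 45]
ranks = average_ranks(actual)
print(ranks)  # the ranks 1 to 10, as on the slide

# With ties, the tied values share an averaged rank.
tied_ranks = average_ranks([1, 1, 2])
print(tied_ranks)
```

Note that the extreme values 38 and 45 receive ranks 9 and 10, so their size no longer dominates the analysis.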
7Median
- The median is the value above and below which 50% of the data lie.
- If the data are ranked in order, it is the middle value
- In symmetric distributions the mean and median are the same
- In skewed distributions, the median is the more appropriate measure
10Median
- BPs
- 135, 138, 140, 140, 141, 142, 143
- Median = 140
- No. of cigarettes smoked
- 0, 1, 2, 2, 2, 3, 5, 5, 8, 10
- Median = 2.5
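The two medians can be checked with Python's standard library. This is only an illustrative sketch; the variable names are assumptions. With an odd number of values the median is the single middle value, and with an even number it is the mean of the two middle values.

```python
from statistics import median

blood_pressures = [135, 138, 140, 140, 141, 142, 143]  # n = 7 (odd)
cigarettes = [0, 1, 2, 2, 2, 3, 5, 5, 8, 10]           # n = 10 (even)

bp_median = median(blood_pressures)   # middle (4th) value
cig_median = median(cigarettes)       # mean of the 5th and 6th values: (2 + 3) / 2

print(bp_median, cig_median)
```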
11T-test
- The t-test is used to test whether the mean of a sample is significantly different from a hypothesised population mean
- The t-test relies on the sample being drawn from a Normally distributed population
- If the sample is not Normal, use the Wilcoxon Signed Rank Test as an alternative
12Wilcoxon Signed Rank Test
- NP test relating to the median as the measure of central tendency
- The absolute differences between the data and the hypothesised median are calculated and ranked
- The ranks for the negative and the positive differences are then summed separately (W- and W+ respectively)
- The minimum of these is the test statistic, W
13Wilcoxon Signed Rank Test: Example
- The median heart rate for an 18-year-old girl is supposed to be 82 bpm. A student takes the pulse rates of 8 female students (all aged 18)
- 83, 90, 96, 82, 85, 80, 81, 87
- Do these results suggest that the median might not be 82?
17Wilcoxon Signed Rank Test: Example
- H0: median = 82
- H1: median ≠ 82
- Two-tailed test
- Because one result equals 82, it cannot be used in the analysis
18Wilcoxon Signed Rank Test: Example
Result  Above (+) or below (-) median  Absolute difference from median (82)  Rank of difference
83      +                              1                                     1.5
90      +                              8                                     6
96      +                              14                                    7
85      +                              3                                     4
80      -                              2                                     3
81      -                              1                                     1.5
87      +                              5                                     5
W+ = 1.5 + 6 + 7 + 4 + 5 = 23.5
W- = 3 + 1.5 = 4.5
So W = 4.5 with n = 7. W is greater than the tabulated critical value of 2, so p > 0.05
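The rank sums for the heart-rate example can be reproduced with a short Python sketch. The function names `average_ranks` and `signed_rank_sums` are illustrative assumptions, not from the slides. The code drops any value equal to the hypothesised median and gives tied absolute differences an averaged rank. It computes only the statistic; the comparison with the tabulated critical value is still done by hand.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]


def signed_rank_sums(data, hypothesised_median):
    """Return (W+, W-, n) after excluding values equal to the median."""
    diffs = [x - hypothesised_median for x in data if x != hypothesised_median]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus, len(diffs)


pulse = [83, 90, 96, 82, 85, 80, 81, 87]
w_plus, w_minus, n = signed_rank_sums(pulse, 82)
w = min(w_plus, w_minus)  # the test statistic is the smaller rank sum

print(w_plus, w_minus, n, w)
```

As a check, W+ and W- should always add up to n(n + 1)/2, which is 28 for n = 7.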
19Wilcoxon Signed Rank Test: Example
- Therefore, the student should conclude that these
results could have come from a population which
had a median of 82 as the result is not
significantly different to the null hypothesis
value.
20Wilcoxon Signed Rank Test: Normal Approximation
- As the number of ranks (n) becomes larger, the distribution of W becomes approximately Normal
- Generally, if n > 20
- $\mathrm{Mean}(W) = n(n+1)/4$
- $\mathrm{Variance}(W) = n(n+1)(2n+1)/24$
- $Z = (W - \mathrm{Mean}(W))/\mathrm{SD}(W)$
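The Normal approximation can be sketched as a small Python function. The function name `signed_rank_z` and the inputs W = 100, n = 25 are hypothetical, chosen only to illustrate the arithmetic. The sketch applies the formulas exactly as stated, with no correction for ties or continuity.

```python
import math


def signed_rank_z(w, n):
    """Z score for a Wilcoxon signed rank statistic W based on n non-zero differences."""
    mean_w = n * (n + 1) / 4
    var_w = n * (n + 1) * (2 * n + 1) / 24
    return (w - mean_w) / math.sqrt(var_w)


# Hypothetical illustration: n = 25 gives Mean(W) = 162.5 and Variance(W) = 1381.25.
z = signed_rank_z(w=100, n=25)
print(round(z, 3))
```

The resulting Z is then referred to the standard Normal distribution in the usual way.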
21Wilcoxon Signed Rank Test Assumptions
- The population should be approximately symmetrical but need not be Normal
- Results must be classified as either greater than or less than the median, i.e. exclude results equal to the median
- Can be used for small or large samples
22Paired samples t-test
- Disadvantage: Assumes the data are a random sample from a Normally distributed population
- Advantage: Uses all the detail of the available data, and if the data are Normally distributed it is the most powerful test
23The Wilcoxon Signed Rank Test for Paired
Comparisons
- Disadvantage: Only the sign (+ or -) and rank of each change are analysed, not its actual size
- Advantage: Easy to carry out, and data from any distribution or population can be analysed
24Paired And Not Paired Comparisons
- If you have the same sample measured on two separate occasions, this is a paired comparison
- Two independent samples are not a paired comparison
- Different samples that are matched by age and gender are paired
25The Wilcoxon Signed Rank Test for Paired
Comparisons
- Similar calculation to the Wilcoxon Signed Rank test, except that the differences between the paired results are ranked
- Example using SPSS
- A group of 10 patients with chronic anxiety receive sessions of cognitive therapy. Quality of Life scores are measured before and after therapy.
26Wilcoxon Signed Rank Test example
QoL score before  QoL score after
6                 9
5                 12
3                 9
4                 9
2                 3
1                 1
3                 2
8                 12
6                 9
12                10
31SPSS Output
p < 0.05
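The SPSS screens are not reproduced in this transcript, but the paired statistic can be recomputed by hand. The sketch below is an illustration under stated assumptions: the helper `average_ranks` and the variable names are not from the slides, differences are taken as after minus before, and the one patient with no change is dropped.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]


before = [6, 5, 3, 4, 2, 1, 3, 8, 6, 12]
after = [9, 12, 9, 9, 3, 1, 2, 12, 9, 10]

# Paired differences, excluding pairs with no change.
diffs = [a - b for a, b in zip(after, before) if a != b]
ranks = average_ranks([abs(d) for d in diffs])

w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
n = len(diffs)
w = min(w_plus, w_minus)

print(w_plus, w_minus, n, w)
```

This gives W = 4.5 with n = 9. The commonly tabulated two-tailed 5% critical value for n = 9 is 5, so W falls at or below it, which is consistent with the p < 0.05 reported from SPSS.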
32Mann-Whitney test
- Used when we want to compare two unrelated or INDEPENDENT groups
- For parametric data you would use the unpaired (independent) samples t-test
- The assumptions of the t-test were:
- The distribution of the measure in each group is approximately Normal
- The variances are similar
33Example (1)
- The following data show the number of alcohol units per week collected in a survey
- Men (n = 13): 0, 0, 1, 5, 10, 30, 45, 5, 5, 1, 0, 0, 0
- Women (n = 14): 0, 0, 0, 0, 1, 5, 4, 1, 0, 0, 3, 20, 0, 0
- Is the amount greater in men compared to women?
35Example (2)
- How would you test whether the distributions in both groups are approximately Normally distributed?
- Plot histograms
- Stem and leaf plot
- Box-plot
- Q-Q or P-P plot
36Boxplots of alcohol units per week by gender
38Example (3)
- Are those distributions symmetrical?
- Definitely not!
- They are both highly skewed, so not Normal. If the data are still not Normal after transformation, use a non-parametric test: Mann-Whitney
- Suggests perhaps that males tend to have a higher intake than women.
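A quick numerical check of the skewness is to compare the mean and median in each group, since the two coincide in a symmetric distribution. The sketch below uses only the survey data given earlier; the variable names are illustrative assumptions.

```python
from statistics import mean, median

men = [0, 0, 1, 5, 10, 30, 45, 5, 5, 1, 0, 0, 0]
women = [0, 0, 0, 0, 1, 5, 4, 1, 0, 0, 3, 20, 0, 0]

men_mean, men_median = mean(men), median(men)
women_mean, women_median = mean(women), median(women)

# In both groups the mean is pulled well above the median by a few large values,
# which is the signature of a positively skewed distribution.
print(round(men_mean, 2), men_median)
print(round(women_mean, 2), women_median)
```

The men's median (1 unit) is higher than the women's (0 units), in line with the suggestion that males tend to have a higher intake.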
39Mann-Whitney on SPSS
44Normal approx (NS)
Mann-Whitney (NS)
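The slides run this test in SPSS and do not show the calculation, so the following is only a sketch of how the Mann-Whitney U statistic can be obtained for the alcohol data. The function name `mann_whitney_u` is an assumption. It uses the pair-counting definition of U: each man-woman pair scores 1 if the man's value is higher, 0.5 if the values are tied, and 0 otherwise.

```python
def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b) by counting pairs; tied pairs contribute 0.5."""
    u_a = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in group_a
        for b in group_b
    )
    u_b = len(group_a) * len(group_b) - u_a  # the two U values sum to n_a * n_b
    return u_a, u_b


men = [0, 0, 1, 5, 10, 30, 45, 5, 5, 1, 0, 0, 0]
women = [0, 0, 0, 0, 1, 5, 4, 1, 0, 0, 3, 20, 0, 0]

u_men, u_women = mann_whitney_u(men, women)
u = min(u_men, u_women)  # the smaller value is compared with tables

print(u_men, u_women, u)
```

The sketch stops at the statistic. Judging significance requires tables or a Normal approximation, ideally one that corrects for the many tied zeros in these data.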
45Spearman Rank Correlation
- Method for investigating the relationship between 2 measured variables
- Non-parametric equivalent to Pearson correlation
- Variables are either non-Normal or measured on an ordinal scale
46Spearman Rank Correlation Example
- A researcher wishes to assess whether the distance to general practice influences the time of diagnosis of colorectal cancer.
- The null hypothesis would be that distance is not associated with time to diagnosis. Data collected for 7 patients
47Distance from GP and time to diagnosis
Distance (km) Time to diagnosis (weeks)
5 6
2 4
4 3
8 4
20 5
45 5
10 4
48Scatterplot
49Distance from GP and time to diagnosis
Distance (km)  Time (weeks)  Rank for distance  Rank for time  Difference in ranks (d)  d^2
2              4             1                  3              -2                       4
4              3             2                  1              1                        1
5              6             3                  7              -4                       16
8              4             4                  3              1                        1
10             4             5                  3              2                        4
20             5             6                  5.5            0.5                      0.25
45             5             7                  5.5            1.5                      2.25
Total                                                          0                        Σd^2 = 28.5
50Spearman Rank Correlation Example
- The formula for Spearman's rank correlation is
  $r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$
- where d is the difference in ranks for each pair and n is the number of pairs
51Spearman's on SPSS
55Spearman Rank Correlation Example
- In our example, r_s = 0.468
- In SPSS we can see that this value is not significant, i.e. p = 0.29
- Therefore there is no significant relationship between the distance to a GP and the time to diagnosis, but note that the correlation is quite high!
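The coefficient can be recomputed from the seven data pairs. The sketch below is illustrative, with assumed helper names `average_ranks` and `pearson`. It calculates the coefficient in two ways: the familiar shortcut 1 - 6Σd²/(n(n² - 1)), and a Pearson correlation applied to the ranks. The two agree only when there are no ties. Here the times contain ties, and it is the Pearson-on-ranks value that reproduces the 0.468 quoted above.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


distance_km = [2, 4, 5, 8, 10, 20, 45]
weeks = [4, 3, 6, 4, 4, 5, 5]

rank_distance = average_ranks(distance_km)
rank_weeks = average_ranks(weeks)
n = len(distance_km)

sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_distance, rank_weeks))
rs_shortcut = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))  # ignores the effect of ties
rs_on_ranks = pearson(rank_distance, rank_weeks)   # handles ties

print(sum_d2, round(rs_shortcut, 3), round(rs_on_ranks, 3))
```

The shortcut gives about 0.491, slightly higher than 0.468, because it does not adjust for the tied times.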
56Spearman Rank Correlation
- Correlations lie between -1 and +1
- A correlation coefficient close to zero indicates weak or no correlation
- Whether an r_s value is significant depends on sample size; a significant value tells you that it is unlikely these results have arisen by chance
- Correlation does NOT measure causality, only association
57Chi-squared test
- Used when comparing 2 or more groups of categorical or nominal data (as opposed to measured data)
- Already covered!
- In SPSS, the Chi-squared test is a test of observed vs. expected in a single categorical variable
58More than 2 groups
- So far we have been comparing 2 groups
- If we have 3 or more independent groups and the data are not Normal, we need an NP equivalent to ANOVA
- If independent samples, use Kruskal-Wallis
- If related samples, use Friedman
- Same assumptions as before
65Parametric / Non-parametric
Parametric Tests               Non-parametric Tests
Single sample t-test           Wilcoxon signed rank test
Paired sample t-test           Paired Wilcoxon signed rank test
2 independent samples t-test   Mann-Whitney test (Note: sometimes called the Wilcoxon Rank Sum test!)
One-way Analysis of Variance   Kruskal-Wallis
Pearson's correlation          Spearman Rank
66Summary Non-parametric
- Non-parametric methods have fewer assumptions than parametric tests
- So useful when these assumptions are not met
- Often used when the sample size is small and it is difficult to tell whether the data are Normally distributed
- Non-parametric methods are a ragbag of tests developed over time with no consistent framework
- Read in datasets LDL, etc. and carry out appropriate Non-Parametric tests