Title: Using Statistics To Make Inferences 6
1Using Statistics To Make Inferences 6
- Summary
- Non-parametric tests.
- Wilcoxon Signed Ranks Test.
- Wilcoxon Matched Pairs Signed Ranks Test.
- Wilcoxon Rank Sum Test / Mann-Whitney Test.
2Goals
- Perform and interpret Wilcoxon Signed Ranks Test.
- Perform and interpret Wilcoxon Matched Pairs Signed Ranks Test.
- Perform and interpret Wilcoxon Rank Sum Test / Mann-Whitney Test.
- Know when each test is appropriate.
3Practical
- Perform a series of Mann-Whitney tests and
compare the results to those obtained from
t-tests.
4Recall
- Which tests might be appropriate when testing the mean of normal data?
- What criteria would you employ to select the appropriate test?
z or t
For z we need σ (the population standard deviation); for t we calculate s from the sample.
5Today
- What if we cannot assume normality?
- We shall develop a more powerful test than the
sign test for the median (see lecture 2).
6Non-parametric Tests
- A single sample test
- Wilcoxon Signed Ranks Test
7Wilcoxon Signed Ranks Test Procedure
- Take the difference between each observation and the hypothesised median (η, eta).
- Rank the absolute differences from 1 to n, allowing for ties (1 smallest, n largest).
- Sum the rank values for those observations above η; let this be $W^+$.
- Sum the rank values for those observations below η; let this be $W^-$.
- Use the smaller of $W^+$ and $W^-$ as the test statistic. Call this Wcalc (or simply W).
Critical values of the test statistic are given in tables for various significance levels.
8Wilcoxon Signed Ranks Test Notes
- If two or more differences are equal (tied) they are assigned the average of the ranks.
- If a difference is zero, it is omitted.
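The procedure and the two notes on ties and zeros can be sketched in code. This is a minimal illustration, not a reference implementation: the function names `average_ranks` and `signed_rank_sums` are invented here, and differences are rounded to 10 decimal places (an assumption) so that equal differences tie exactly despite floating-point noise.

```python
def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + 1 + j + 1) / 2  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def signed_rank_sums(data, median):
    """Return (W_plus, W_minus) for a hypothesised median."""
    # Rounding is an assumption made here to avoid spurious non-ties.
    diffs = [round(x - median, 10) for x in data]
    diffs = [d for d in diffs if d != 0]  # zero differences are omitted
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus


# Small made-up illustration: one zero difference, two pairs of ties.
print(average_ranks([10, 20, 20, 30]))        # [1.0, 2.5, 2.5, 4.0]
print(signed_rank_sums([7, 3, 5, 9, 1], 5))   # (5.0, 5.0)
```

The test statistic is then the smaller of the two returned sums.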
9Example
- It has been established that an individual's
median reaction time is 0.250 seconds. Twelve
trials are conducted after the individual has
consumed alcohol. The measured times are
0.235 0.252 0.312 0.264 0.323 0.241 0.284 0.306
0.248 0.284 0.298 0.320
Test whether the data are consistent with the
median value.
10Step 1
- The first step is to subtract the median (0.25)
from every value.
11Step 1
Median = 0.250; for the first observation, 0.235 - 0.250 = -0.015
12Step 2
- Now calculate and rank on the absolute
differences
13Step 2
Now rank the data
14Step 2
Is the ranking unambiguous?
15Step 2
Tied values, so average the ranks
17Step 3/4
- Now separate the contributions for positive
(negative) differences
18Step 3/4
Now sum the contributions for positive and
negative differences
20Step 5
$W^+ = 68.5$, $W^- = 9.5$
As a cross-check, note that $W^+ + W^- = \tfrac{1}{2} n(n+1)$. Here $n = 12$ and $\tfrac{1}{2} n(n+1) = \tfrac{1}{2} \times 12 \times (12+1) = 78$, as expected.
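The rank sums and the cross-check can be reproduced from the twelve reaction times. The sketch below works in whole milliseconds (a choice made here to avoid floating-point ties going astray); the variable names are illustrative only.

```python
from collections import defaultdict

# Reaction times from the example, converted to milliseconds.
times_ms = [235, 252, 312, 264, 323, 241, 284, 306, 248, 284, 298, 320]
median_ms = 250

# Differences from the hypothesised median; zero differences would be dropped.
diffs = [t - median_ms for t in times_ms if t != median_ms]
abs_sorted = sorted(abs(d) for d in diffs)

# Average rank for each distinct absolute difference (handles ties).
positions = defaultdict(list)
for rank, a in enumerate(abs_sorted, start=1):
    positions[a].append(rank)
avg_rank = {a: sum(r) / len(r) for a, r in positions.items()}

w_plus = sum(avg_rank[abs(d)] for d in diffs if d > 0)
w_minus = sum(avg_rank[abs(d)] for d in diffs if d < 0)
n = len(diffs)

print(w_plus, w_minus)                        # 68.5 9.5
print(w_plus + w_minus == n * (n + 1) / 2)    # True (both equal 78)
```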
21Step 5
$W^+ = 68.5$, $W^- = 9.5$
Therefore the result is significant at the 5% level (14 > 9.5), so the null hypothesis can be rejected. The data are apparently not consistent with a median of 0.250 seconds.
22Normal approximation
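- $W = 9.5$, $n = 12$
- Employing the normal approximation,
$z = \dfrac{W + \tfrac{1}{2} - \tfrac{n(n+1)}{4}}{\sqrt{n(n+1)(2n+1)/24}} = \dfrac{9.5 + 0.5 - 39}{\sqrt{162.5}} \approx -2.27$
In this case the continuity correction is added since a lower tail is being considered.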
23Normal approximation
The p-value is 2 × 0.0115 ≈ 0.02, remarkably close to the exact value
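The quoted tail area can be checked with the standard normal approximation to the signed-rank statistic (mean n(n+1)/4, variance n(n+1)(2n+1)/24). This is a sketch under that assumption, with no correction for ties; the function name is invented for illustration.

```python
from math import sqrt
from statistics import NormalDist


def signed_rank_normal_approx(w, n):
    """z and two-sided p for the smaller rank sum w, with continuity correction."""
    mean = n * (n + 1) / 4
    sd = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w + 0.5 - mean) / sd      # +0.5 because w lies in the lower tail
    p_one_tail = NormalDist().cdf(z)
    return z, 2 * p_one_tail


z, p = signed_rank_normal_approx(9.5, 12)
print(round(z, 2), round(p, 3))    # -2.27 0.023
```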
24Wilcoxon Matched Pairs Signed Ranks Test - Example
- Certain mental tasks are performed before and
after exercise. The scores were recorded.
Is there any evidence of a significant difference
in the levels of performance under the two
conditions?
Still effectively a single sample since we seek a
change.
25Find Differences
Find differences
26Find Absolute Differences
Find absolute differences
27Find Absolute Differences
Now rank the absolute differences
28Rank the Absolute Differences
Now deal with ties
29Rank the Absolute Differences
Now find the true ranks.
30Rank the Absolute Differences
Now separate positive and negative values
31Separate the Contributions for Positive (Negative) Differences
Now form the totals
33Conclusion
$W^+ = 50$, $W^- = 5$
As a cross-check, note that $W^+ + W^- = \tfrac{1}{2} n(n+1)$. Here $n = 10$ and $\tfrac{1}{2} n(n+1) = \tfrac{1}{2} \times 10 \times (10+1) = 55$, as expected.
34Conclusion
$W^+ = 50$, $W^- = 5$
Therefore the result is significant at the 5% level (8 > 5), so the null hypothesis can be rejected. The scores appear to differ.
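The matched pairs steps (differences, absolute differences, ranks with ties, separate sums) can be sketched as follows. The before/after scores of the lecture example are not reproduced in this transcript, so the data below are hypothetical and chosen only to show a zero difference and a tie; they do not give the lecture's 50 and 5.

```python
# HYPOTHETICAL scores for illustration only, not the lecture's data.
before = [10, 12, 9, 14, 11, 8, 13, 10]
after = [13, 11, 13, 16, 9, 13, 13, 16]

# Paired differences; pairs with zero difference are omitted.
diffs = [a - b for a, b in zip(after, before) if a != b]
abs_sorted = sorted(abs(d) for d in diffs)


def avg_rank(a):
    """Average of the 1-based positions that value a occupies in abs_sorted."""
    first = abs_sorted.index(a) + 1
    last = first + abs_sorted.count(a) - 1
    return (first + last) / 2


w_plus = sum(avg_rank(abs(d)) for d in diffs if d > 0)
w_minus = sum(avg_rank(abs(d)) for d in diffs if d < 0)
n = len(diffs)

print(n, w_plus, w_minus)                    # 7 24.5 3.5
print(w_plus + w_minus == n * (n + 1) / 2)   # True
```

The smaller sum would then be compared with the tabulated critical value for the reduced sample size n.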
35Aside
- It is claimed that it takes 15 minutes to mark an examination question.
- Test this claim using the following timings (x) for marking 10 questions.
- 12 15 14 16 12 15 14 16 11 15
- You might find the following sums useful
- $\sum x = 140$ and $\sum x^2 = 1988$
36Aside
- What are we testing?
µ, the population mean
- Which test is appropriate?
t, since we don't know σ
37Aside
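$n = 10$, $\sum x = 140$, $\sum x^2 = 1988$
mean: $\bar{x} = \sum x / n = 140 / 10 = 14$
variance: $s^2 = \left(\sum x^2 - (\sum x)^2 / n\right) / (n - 1) = (1988 - 1960) / 9 \approx 3.11$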
38Aside
$n = 10$, $\sum x = 140$, $\sum x^2 = 1988$
39Aside
$n = 10$, $\mu = 15$
$\nu = n - 1 = 9$
Highlighting the values for 95% and 99% confidence for a two-tail test.
40Aside
- Since tcalc < tcrit (1.79 < 2.262) the experiment is consistent with a population mean of 15 at 95% confidence.
- This is supported by software which has a 95% confidence interval that includes 15 and a p-value of 0.107 (> 0.05).
41Aside
The software output shows the t value, the p-value and the confidence interval.
Confidence interval: (15 - 2.26, 15 + 0.26) = (12.74, 15.26)
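The t statistic and the confidence interval in this aside follow directly from the supplied sums. A minimal sketch: the critical value 2.262 is taken from the slides rather than computed, and the variable names are illustrative.

```python
from math import sqrt

n, sum_x, sum_x2 = 10, 140, 1988
mu0 = 15          # claimed mean marking time in minutes
t_crit = 2.262    # from the slides: two-tail 95% value for 9 degrees of freedom

mean = sum_x / n
variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)   # sample variance, n - 1 divisor
std_err = sqrt(variance / n)

t_calc = (mean - mu0) / std_err
ci = (mean - t_crit * std_err, mean + t_crit * std_err)

print(round(mean, 2), round(variance, 2), round(t_calc, 2))   # 14.0 3.11 -1.79
print(tuple(round(v, 2) for v in ci))                         # (12.74, 15.26)
```

The computed t is negative because the sample mean lies below 15; the slides quote its magnitude, 1.79.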
42Now for Two Samples
- For normal data we have the two sample t test of lecture 4.
- What if the data are not normal?
43Wilcoxon Rank Sum Test / Mann-Whitney Test
- Combine the observations from the two samples (sizes $n_1$ and $n_2$).
- Rank the sorted data from 1 to ($n_1 + n_2$).
- Calculate $R_1$ as the sum of the ranks of the first sample and $R_2$ for the second.
- Form the Mann-Whitney test statistic.
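The formula for the statistic did not survive in this transcript. One standard form, which reproduces the values U1 = 21.5 and U2 = 90.5 quoted in the worked example that follows, is given here as a reconstruction rather than as the original slide content:

```latex
U_1 = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - R_1, \qquad
U_2 = n_1 n_2 + \frac{n_2 (n_2 + 1)}{2} - R_2, \qquad
U_1 + U_2 = n_1 n_2 .
```

Because the two values always sum to $n_1 n_2$, only one needs to be calculated; the smaller is compared with the tabulated critical value.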
44Normal Approximation
- If $n_1$ and $n_2$ are greater than 8 a normal approximation may be employed where
$z = \dfrac{U + \tfrac{1}{2} - \tfrac{n_1 n_2}{2}}{\sqrt{n_1 n_2 (n_1 + n_2 + 1)/12}}$
In this case the continuity correction is added since a lower tail is being considered. The approximation is particularly relevant if the tables are not extensive enough.
45Wilcoxon Rank Sum Test Example
- A study of 22 patients suffering from Parkinson's disease was conducted. An operation was performed on 8 of them; while it improved their general condition, it might adversely affect their speech. In the data a higher value indicates a greater difficulty in speaking.
46Wilcoxon Rank Sum Test Data
Operated: 2.6 2.0 1.7 2.7 2.5 2.6 2.5 3.0
Others: 1.2 1.8 1.9 2.3 1.3 3.0 2.2 1.3 1.5 1.6 1.3 1.5 2.7 2.0
47Ranked Data
Speech  Source    Rank
1.2     Others    1
1.3     Others    2
1.3     Others    3
1.3     Others    4
1.5     Others    5
1.5     Others    6
1.6     Others    7
1.7     Operated  8
1.8     Others    9
1.9     Others    10
2.0     Operated  11
2.0     Others    12
2.2     Others    13
2.3     Others    14
2.5     Operated  15
2.5     Operated  16
2.6     Operated  17
2.6     Operated  18
2.7     Operated  19
2.7     Others    20
3.0     Operated  21
3.0     Others    22
Note the ties
48Ranked Data
Now find the true ranks
49Ranked Data
Speech  Source    Rank  True Rank
1.2     Others    1     1
1.3     Others    2     3
1.3     Others    3     3
1.3     Others    4     3
1.5     Others    5     5.5
1.5     Others    6     5.5
1.6     Others    7     7
1.7     Operated  8     8
1.8     Others    9     9
1.9     Others    10    10
2.0     Operated  11    11.5
2.0     Others    12    11.5
2.2     Others    13    13
2.3     Others    14    14
2.5     Operated  15    15.5
2.5     Operated  16    15.5
2.6     Operated  17    17.5
2.6     Operated  18    17.5
2.7     Operated  19    19.5
2.7     Others    20    19.5
3.0     Operated  21    21.5
3.0     Others    22    21.5
Now separate the contributions for each source.
50Ranked Data
Speech  Source    Rank  True Rank  Others  Operated
1.2     Others    1     1          1       -
1.3     Others    2     3          3       -
1.3     Others    3     3          3       -
1.3     Others    4     3          3       -
1.5     Others    5     5.5        5.5     -
1.5     Others    6     5.5        5.5     -
1.6     Others    7     7          7       -
1.7     Operated  8     8          -       8
1.8     Others    9     9          9       -
1.9     Others    10    10         10      -
2.0     Operated  11    11.5       -       11.5
2.0     Others    12    11.5       11.5    -
2.2     Others    13    13         13      -
2.3     Others    14    14         14      -
2.5     Operated  15    15.5       -       15.5
2.5     Operated  16    15.5       -       15.5
2.6     Operated  17    17.5       -       17.5
2.6     Operated  18    17.5       -       17.5
2.7     Operated  19    19.5       -       19.5
2.7     Others    20    19.5       19.5    -
3.0     Operated  21    21.5       -       21.5
3.0     Others    22    21.5       21.5    -
Now sum the contributions for each source
51Ranked Data
Total of the true ranks: Others 126.5, Operated 126.5
It is pure chance that the two sums are equal
52Calculations
$R_1 = 126.5$, $R_2 = 126.5$, $n_1 = 8$, $n_2 = 14$
Note, only need to evaluate one.
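The rank sums and both U values can be reproduced from the raw speech scores. This sketch treats the operated group as sample 1 and uses the form U = n1 n2 + n(n+1)/2 - R, an assumption that matches the values quoted on the slides; `true_rank` is an illustrative helper name.

```python
operated = [2.6, 2.0, 1.7, 2.7, 2.5, 2.6, 2.5, 3.0]
others = [1.2, 1.8, 1.9, 2.3, 1.3, 3.0, 2.2, 1.3, 1.5, 1.6, 1.3, 1.5, 2.7, 2.0]

combined = sorted(operated + others)


def true_rank(x):
    """Average of the 1-based positions that x occupies in the combined sample."""
    first = combined.index(x) + 1
    last = first + combined.count(x) - 1
    return (first + last) / 2


n1, n2 = len(operated), len(others)
r1 = sum(true_rank(x) for x in operated)
r2 = sum(true_rank(x) for x in others)

# Assumed form of the statistic, consistent with the slide values.
u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2

print(r1, r2)               # 126.5 126.5
print(u1, u2)               # 21.5 90.5
print(u1 + u2 == n1 * n2)   # True, so only one needs evaluating
```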
53Conclusion
$U_1 = 21.5$, $U_2 = 90.5$
(mid-point $\tfrac{1}{2} n_1 n_2 = 56$, so only need calculate one)
For $n_1 = 8$, $n_2 = 14$, the critical value from the tables for p = 0.05 is 26. The result is significant at the 5% level (26 > 21.5); the two samples appear to differ.
54Normal Approximation
- Employing the normal approximation for $U_1 = 21.5$
$z = \dfrac{21.5 + 0.5 - 56}{\sqrt{8 \times 14 \times 23 / 12}} \approx -2.32$
The p-value is 2 × 0.0102 ≈ 0.02, remarkably close to the exact value
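The tail area can be checked with the usual normal approximation to U (mean n1 n2 / 2, variance n1 n2 (n1 + n2 + 1) / 12). This is a sketch under that assumption and applies no correction for ties; the function name is invented for illustration.

```python
from math import sqrt
from statistics import NormalDist


def mann_whitney_normal_approx(u, n1, n2):
    """z and two-sided p for the smaller U, with continuity correction."""
    mean = n1 * n2 / 2
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u + 0.5 - mean) / sd      # +0.5 because u lies in the lower tail
    return z, 2 * NormalDist().cdf(z)


z, p = mann_whitney_normal_approx(21.5, 8, 14)
print(round(z, 2), round(p, 2))    # -2.32 0.02
```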
55SPSS Verification
- Note that the groups must have numerical identifiers: 1 = operated, 2 = others.
- Analyze > Nonparametric Tests > Legacy Dialogs > 2 Independent Samples
56SPSS Verification
Note the need to define group variables (1/2)
57SPSS Verification
The previous rank sums are reproduced.
58SPSS Verification
The previous U and Z values are reproduced.
The p value is less than .05
59Parametric vs Non-Parametric Tests
- Parametric Tests
- They are robust with respect to violations of their assumptions.
- They are more powerful: more likely to detect an effect when one is present.
- They are more versatile: there are tests for every experimental design.
60Parametric vs Non-Parametric Tests
- Non-Parametric Tests
- They make fewer assumptions.
- They are ideal for ordinal data, which is common in Psychology, whereas parametric tests require interval or ratio data.
61Light Background Reading
Last year, the BBC ran a six-part primer by Michael Blastland on understanding statistics in the news. Blastland takes on the media's handling of surveys/polls, counting, percentages, averages, causation and doubt. "Wouldn't it be good," Blastland said, "to have the mental agility to separate the wheat from the chaff?" He then proceeds, in six weekly articles, to point out the obvious vs. the correct ways to interpret the data. Topics covered are: Surveys, Counting, Percentages, Averages, Causation, Doubt.
A Statistical Primer from the BBC
62Read
- Read Howitt and Cramer pages 168-177
- Read Howitt and Cramer (e-text) pages 154-164, 167-173
- Read Russo (e-text) pages 168-175
- Read Davis and Smith pages 448-459
63General Reading
- The Mann-Whitney U: A Test for Assessing Whether Two Independent Samples Come from the Same Distribution
- N. Nachar
- Tutorials in Quantitative Methods for Psychology, 2008, vol. 4(1), p. 13-20.
- Paper
64Practical 6
- This material is available from the module web page.
- http://www.staff.ncl.ac.uk/mike.cox
Module Web Page
65Practical 6
- This material for the practical is available.
Instructions for the practical: Practical 6
Material for the practical: Practical 6
66Whoops!
"Be under no illusion: the £139.50 for colour licence (£47 for black and white) will be infinitely more affordable than the maximum £1,000 fine for avoidance." The Guardian, 4 November 2008