Title: Wilcoxon
1. Wilcoxon's Rank-Sum Test (two independent samples), n1, n2 ≤ 25: Same Distributions

Labor data. Ranks are assigned over the pooled sample; tied values share the average of their ranks.

Run   Naïve Bayes Acc (n1)   Rank    Naïve Bayes Acc (n2)   Rank
1     80.0                   1.0     84.2                   2.0
2     88.89                  4.5     85.0                   3.0
3     90.0                   7.0     88.89                  4.5
4     94.44                  8.0     89.47                  6.0
5     94.74                  10.0    94.74                  10.0
6     94.74                  10.0    100.0                  15.0
7     95.0                   12.5    100.0                  15.0
8     95.0                   12.5
9     100.0                  15.0

Sample Size    9       7
Mean           92.53   91.76
Rank Sum (W)   80.5    55.5 (accept)

H0: mean(Acc1) = mean(Acc2)
Critical Values (Wilcoxon table):

Significance, test type   0.05, two-tailed   0.01, two-tailed   0.05, one-tailed   0.01, one-tailed
V                         40                 35                 43                 37

The rank sum of the smaller sample, W = 55.5, is above every critical value V, so H0 is accepted.
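The ranking and rank sums on this slide can be reproduced with a short script. This is a minimal sketch in plain Python, not the tool used to prepare the slides; the helper names `average_ranks` and `rank_sums` are illustrative. The decision rule assumes that V is a lower-tail critical value for the rank sum of the smaller sample, with the upper bound obtained by symmetry.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def rank_sums(sample1, sample2):
    """Rank the pooled sample and return the rank sum of each sample."""
    ranks = average_ranks(list(sample1) + list(sample2))
    n1 = len(sample1)
    return sum(ranks[:n1]), sum(ranks[n1:])


# Accuracy values from the slide (Labor data).
acc1 = [80.0, 88.89, 90.0, 94.44, 94.74, 94.74, 95.0, 95.0, 100.0]
acc2 = [84.2, 85.0, 88.89, 89.47, 94.74, 100.0, 100.0]

w1, w2 = rank_sums(acc1, acc2)
print(w1, w2)  # 80.5 55.5

# Assumption: V = 40 (0.05, two-tailed) is the lower critical value for the
# smaller sample; the upper one follows from symmetry of the null distribution.
lower = 40
upper = len(acc2) * (len(acc1) + len(acc2) + 1) - lower
reject_h0 = w2 <= lower or w2 >= upper
print(reject_h0)  # False
```

Replacing `acc2` with a second classifier's accuracies gives the rank sums for any other pair of small samples in the same way.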
2. Wilcoxon's Rank-Sum Test (two independent samples), n1, n2 ≤ 25: Different Distributions

Labor data.

Run   Naïve Bayes Acc (n1)   Rank    J48 Acc (n2)   Rank
1     80.0                   3.5     65.0           1.0
2     88.89                  7.5     70.0           2.0
3     90.0                   9.5     80.0           3.5
4     94.44                  11.0    84.21          5.0
5     94.74                  12.5    85.0           6.0
6     94.74                  12.5    88.89          7.5
7     95.0                   14.5    90.0           9.5
8     95.0                   14.5
9     100.0                  16.0

Sample Size    9       7
Mean           92.53   80.44
Rank Sum (W)   101.5   34.5 (reject)

H0: mean(Acc1) = mean(Acc2)
Critical Values (Wilcoxon table):

Significance, test type   0.05, two-tailed   0.01, two-tailed   0.05, one-tailed   0.01, one-tailed
V                         40                 35                 43                 37

The rank sum of the smaller sample, W = 34.5, is below every critical value V, so H0 is rejected.
3. Wilcoxon's Rank-Sum Test (two independent samples), n1, n2 > 25: Different Distributions

Adult data. Each entry is an accuracy followed by its rank in the pooled sample.

n1: Naïve Bayes Acc (rank)         n2: J48 Acc (rank)
runs 1 - 15     runs 16 - 30       runs 1 - 15     runs 16 - 30
82.66 (1.0)     83.39 (16.0)       85.7  (31.0)    86.04 (46.5)
82.86 (2.0)     83.4  (17.0)       85.73 (32.0)    86.04 (46.5)
82.99 (3.0)     83.42 (18.0)       85.82 (33.0)    86.1  (48.5)
83.06 (4.0)     83.43 (19.5)       85.83 (34.0)    86.1  (48.5)
83.07 (5.0)     83.43 (19.5)       85.87 (35.0)    86.12 (50.5)
83.08 (6.0)     83.44 (21.0)       85.91 (36.5)    86.12 (50.5)
83.1  (7.0)     83.45 (22.0)       85.91 (36.5)    86.2  (52.0)
83.14 (8.0)     83.52 (23.0)       85.93 (38.0)    86.25 (53.0)
83.16 (9.0)     83.57 (24.0)       85.94 (39.0)    86.26 (54.0)
83.21 (10.0)    83.61 (25.0)       85.95 (40.0)    86.27 (55.0)
83.24 (11.0)    83.63 (26.0)       85.96 (41.0)    86.28 (56.0)
83.28 (12.0)    83.69 (27.0)       85.98 (42.0)    86.31 (57.0)
83.31 (13.0)    83.71 (28.0)       85.99 (43.0)    86.36 (58.0)
83.34 (14.0)    83.78 (29.0)       86.03 (44.5)    86.42 (59.0)
83.38 (15.0)    83.81 (30.0)       86.03 (44.5)    86.7  (60.0)

               Naïve Bayes (n1)   J48 (n2)
Sample Size    30                 30
Mean           83.339             86.072
Rank Sum (W)   465.0              1365.0

Mean(W) = 915, STD(W) = 67.6387
Z statistic = -6.653 < -1.96 (z at alpha = 0.05) → reject H0: mean(Acc1) = mean(Acc2)
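For large samples the rank sum is compared against a normal approximation instead of a table. The sketch below uses the standard formulas Mean(W) = n(n + m + 1)/2 and STD(W) = sqrt(n·m·(n + m + 1)/12), which reproduce the slide's figures; it applies no tie correction, and the function name `rank_sum_z` is illustrative.

```python
import math


def rank_sum_z(w, n, m):
    """Normal approximation for the rank sum w of a sample of size n,
    when the other sample has size m (no tie correction)."""
    mean_w = n * (n + m + 1) / 2
    std_w = math.sqrt(n * m * (n + m + 1) / 12)
    z = (w - mean_w) / std_w
    return mean_w, std_w, z


# Naive Bayes rank sum from the slide: W = 465 with n1 = n2 = 30.
mean_w, std_w, z = rank_sum_z(465.0, 30, 30)

# Two-sided tail probability of a standard normal at |z|.
p_two_sided = math.erfc(abs(z) / math.sqrt(2))

print(round(mean_w, 1), round(std_w, 4), round(z, 3))  # 915.0 67.6387 -6.653
print(abs(z) > 1.96)  # True, so H0 is rejected at alpha = 0.05
```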
4. Wilcoxon's Matched Pairs Signed Ranks Test (for paired scores), n ≤ 50

Pairs with A - B = 0 are removed; the remaining differences are ranked by absolute value, and each rank takes the sign of its difference.

Data Example   Classifier 1 scores (A)   Classifier 2 scores (B)   A-B    Rank(|A-B|)   Signed Rank(A-B)
1              78                        78                        0      remove        remove
2              24                        24                        0      remove        remove
3              64                        62                        +2     1             +1
4              45                        48                        -3     2             -2
5              64                        68                        -4     3.5           -3.5
6              52                        56                        -4     3.5           -3.5
7              30                        25                        +5     5             +5
8              50                        44                        +6     6             +6
9              64                        56                        +8     7             +7
10             50                        40                        +10    8.5           +8.5
11             78                        68                        +10    8.5           +8.5
12             22                        36                        -14    10            -10
13             84                        68                        +16    11            +11
14             40                        20                        +20    12            +12
15             90                        58                        +32    13            +13
16             72                        32                        +40    14            +14

Sum of Signed Ranks: W+ = 86, W- = -19. Select W = 19, the smaller in absolute value (reject H0).

H0: mean(signed_rank(A-B)) = 0
Critical Values (Wilcoxon table):

Significance, test type   0.05, two-tailed   0.01, two-tailed   0.05, one-tailed   0.01, one-tailed
V                         40                 35                 43                 37
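The signed-rank sums on this slide can be checked with a few lines of plain Python. This is a sketch under the slide's own procedure (drop zero differences, rank by absolute difference, average tied ranks); the function name `signed_rank_sums` is illustrative, and W- is reported here as a positive magnitude.

```python
def signed_rank_sums(a, b):
    """Return (W+, W-, n) for paired scores, with zero differences removed.

    W- is returned as a positive magnitude; n is the number of pairs kept.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    ordered = sorted(diffs, key=abs)
    w_plus = 0.0
    w_minus = 0.0
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of differences tied in absolute value.
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        rank = (i + j) / 2 + 1  # average rank shared by the tied run
        for d in ordered[i:j + 1]:
            if d > 0:
                w_plus += rank
            else:
                w_minus += rank
        i = j + 1
    return w_plus, w_minus, len(diffs)


# Scores from the slide.
a = [78, 24, 64, 45, 64, 52, 30, 50, 64, 50, 78, 22, 84, 40, 90, 72]
b = [78, 24, 62, 48, 68, 56, 25, 44, 56, 40, 68, 36, 68, 20, 58, 32]

w_plus, w_minus, n = signed_rank_sums(a, b)
w = min(w_plus, w_minus)
print(w_plus, w_minus, n, w)  # 86.0 19.0 14 19.0
```

If SciPy is available, `scipy.stats.wilcoxon(a, b)` could serve as a cross-check, although its handling of zeros and ties depends on the options chosen.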
5. Wilcoxon's Matched Pairs Signed Ranks Test (for paired scores), n > 50

- Randomly split the Adult data set at 50%, 100 times.
- For each training/testing data set, run Naïve Bayes and J48 and record their accuracy values as a pair, for which we compute the difference in accuracy.
- Determine the signed ranks of the differences for each pair (as in the previous example; data is omitted due to space constraints).
- We get W+ = 0 and W- = 5050 (J48 always produces higher accuracy), N = 100.
- We get mean(W) = 2525, STD(W) = 290.84.
- Z = (0 - 2525) / 290.84 = -8.6818 < -1.96 (at alpha = 0.05), so H0 is rejected.
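The mean and standard deviation quoted above follow from the usual large-sample formulas for the signed-rank statistic, mean(W) = N(N + 1)/4 and STD(W) = sqrt(N(N + 1)(2N + 1)/24). A minimal sketch that reproduces the arithmetic, with the illustrative function name `signed_rank_z` and no tie correction:

```python
import math


def signed_rank_z(w, n):
    """Normal approximation for a signed-rank sum w over n non-zero pairs."""
    mean_w = n * (n + 1) / 4
    std_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean_w) / std_w
    return mean_w, std_w, z


# Values from the slide: W+ = 0 with N = 100 pairs.
mean_w, std_w, z = signed_rank_z(0, 100)
print(mean_w, round(std_w, 2), round(z, 3))  # 2525.0 290.84 -8.682
print(abs(z) > 1.96)  # True, so H0 is rejected at alpha = 0.05
```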
6. What is the Effect Size? (The effect of using Laplace smoothing on accuracy of J48)

Run on Adult data   Acc J48 (no Laplace)   Acc J48 (Laplace)
1                   85.83                  85.83
2                   85.91                  85.91
3                   86.12                  86.12
4                   85.82                  85.82
5                   86.28                  86.28
6                   86.42                  86.42
7                   85.91                  85.90
8                   86.10                  86.10
9                   85.95                  85.95
10                  86.12                  86.11

Mean                 86.05     86.04
Standard Deviation   0.18585   0.196002

Sp^2 = (9 × (0.18585)^2 + 9 × (0.196002)^2) / 18 = 0.0365
Sp = sqrt(0.0365) = 0.1910
d = (86.05 - 86.04) / 0.1910 = 0.0524

This is less than 0.2 → d indicates a very small effect to no effect.
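The effect-size arithmetic can be replayed from the summary statistics on the slide. The sketch below computes the pooled standard deviation and Cohen's d from the reported means and standard deviations (not from the raw runs); the function names are illustrative.

```python
import math


def pooled_std(s1, n1, s2, n2):
    """Pooled standard deviation of two samples from their standard deviations."""
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return math.sqrt(pooled_var)


def cohens_d(mean1, mean2, sp):
    """Cohen's d: difference of means in units of the pooled standard deviation."""
    return (mean1 - mean2) / sp


# Summary statistics as reported on the slide (10 runs per setting).
sp = pooled_std(0.18585, 10, 0.196002, 10)
d = cohens_d(86.05, 86.04, sp)

print(round(sp, 4), round(d, 4))  # 0.191 0.0524
print(d < 0.2)  # True: below the conventional threshold for a small effect
```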
7. One-Way ANOVA (J48 on three domains)

Run   J48 Acc Adult   J48 Acc Pima   J48 Acc Credit
1     85.83           75.86          84.19
2     85.91           73.18          85.90
3     86.12           69.08          83.83
4     85.82           74.05          85.11
5     86.28           74.71          86.38
6     86.42           65.90          81.20
7     85.91           76.25          86.38
8     86.10           75.10          86.75
9     85.95           70.50          88.03
10    86.12           73.95          87.18

Results: high F and very low p → the groups are significantly different (see plot).

Source of Variability   Sum Squares   Degrees of Freedom   Mean Squares   F Statistic (MSG/MSE)   Prob. > F (p-value)
Groups                  1113.2        2                    556.598        110.56                  9.9E-14
Error                   135.92        27                   5.034
Total                   1249.12       29
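The sums of squares and the F statistic in this table can be recomputed from the runs listed on the slide. The following is a plain-Python sketch with an illustrative function name, `one_way_anova`; because the slide's accuracies are rounded to two decimals, the results agree with the table only to within rounding. The p-value is not computed here, since it requires an F distribution (for example `scipy.stats.f.sf`, if SciPy is available).

```python
def one_way_anova(groups):
    """Return sums of squares, degrees of freedom and F for a one-way ANOVA."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Between-group variability: group means around the grand mean.
    ss_groups = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group variability: observations around their own group mean.
    ss_error = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_groups = len(groups) - 1
    df_error = n_total - len(groups)
    f_stat = (ss_groups / df_groups) / (ss_error / df_error)
    return {
        "ss_groups": ss_groups,
        "ss_error": ss_error,
        "df_groups": df_groups,
        "df_error": df_error,
        "f": f_stat,
    }


# J48 accuracies from the slide.
adult = [85.83, 85.91, 86.12, 85.82, 86.28, 86.42, 85.91, 86.10, 85.95, 86.12]
pima = [75.86, 73.18, 69.08, 74.05, 74.71, 65.90, 76.25, 75.10, 70.50, 73.95]
credit = [84.19, 85.90, 83.83, 85.11, 86.38, 81.20, 86.38, 86.75, 88.03, 87.18]

result = one_way_anova([adult, pima, credit])
for name, value in result.items():
    print(name, round(value, 3))
```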
8. One-Way ANOVA (J48 on three domains)
9. Two-Way ANOVA (J48 & N.B. on 3 domains)

Classifier   Run   Acc Adult   Acc Pima   Acc Credit
J48 (A)      1     85.83       75.86      84.19
J48 (A)      2     85.91       73.18      85.90
J48 (A)      3     86.12       69.08      83.83
J48 (A)      4     85.82       74.05      85.11
J48 (A)      5     86.28       74.71      86.38
NB (B)       1     83.08       78.54      74.36
NB (B)       2     83.07       74.33      76.07
NB (B)       3     83.63       71.37      78.30
NB (B)       4     83.16       76.72      79.57
NB (B)       5     83.71       78.93      80.00

The p-values are low → Columns (H0A) and Interactions (H0AB) are significantly different, but Rows (H0B) are the least different.

Source of Variability   Sum Squares   Degrees of Freedom   Mean Squares   F Statistic (MSG/MSE)   Prob. > F (p-value)
Columns (H0A)           517.7133      2                    258.8567       65.4636                 1.9099E-10
Rows (H0B)              46.6503       1                    46.6503        11.7976                 0.0021643
Interactions (H0AB)     125.7066      2                    62.8533        15.8953                 4.0161E-05
Error                   94.901        24                   3.9542
Total                   784.9711      29
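The two-way table can be recomputed from the runs above. The sketch below assumes a balanced design with replication (5 runs per classifier and domain), with classifiers as rows and domains as columns; the function name `two_way_anova` and the nested-list layout are illustrative choices, and p-values are again left out because they need an F distribution.

```python
def mean(xs):
    return sum(xs) / len(xs)


def two_way_anova(cells):
    """Balanced two-way ANOVA with replication.

    cells[i][j] is the list of replicate scores for row level i and
    column level j; every cell must hold the same number of scores.
    """
    r, c, n = len(cells), len(cells[0]), len(cells[0][0])
    grand = mean([x for row in cells for cell in row for x in cell])
    row_means = [mean([x for cell in row for x in cell]) for row in cells]
    col_means = [mean([x for row in cells for x in row[j]]) for j in range(c)]
    cell_means = [[mean(cell) for cell in row] for row in cells]

    ss_rows = c * n * sum((m - grand) ** 2 for m in row_means)
    ss_cols = r * n * sum((m - grand) ** 2 for m in col_means)
    ss_cells = n * sum((cell_means[i][j] - grand) ** 2
                       for i in range(r) for j in range(c))
    ss_inter = ss_cells - ss_rows - ss_cols
    ss_error = sum((x - cell_means[i][j]) ** 2
                   for i in range(r) for j in range(c) for x in cells[i][j])

    ms_error = ss_error / (r * c * (n - 1))
    return {
        "ss_cols": ss_cols, "f_cols": ss_cols / (c - 1) / ms_error,
        "ss_rows": ss_rows, "f_rows": ss_rows / (r - 1) / ms_error,
        "ss_inter": ss_inter,
        "f_inter": ss_inter / ((r - 1) * (c - 1)) / ms_error,
        "ss_error": ss_error,
        "ss_total": ss_cells + ss_error,
    }


# Rows: classifier (J48, NB); columns: domain (Adult, Pima, Credit).
j48 = [[85.83, 85.91, 86.12, 85.82, 86.28],
       [75.86, 73.18, 69.08, 74.05, 74.71],
       [84.19, 85.90, 83.83, 85.11, 86.38]]
nb = [[83.08, 83.07, 83.63, 83.16, 83.71],
      [78.54, 74.33, 71.37, 76.72, 78.93],
      [74.36, 76.07, 78.30, 79.57, 80.00]]

table = two_way_anova([j48, nb])
for name, value in table.items():
    print(name, round(value, 4))
```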