Title: Similarity-based Classifier Combination for Decision Making
Similarity-based Classifier Combination for Decision Making
- Authors: Gongde Guo, Daniel Neagu
- Department of Computing, University of Bradford
Outline of Presentation
- Background
- Classification process
- Drawbacks of A Single Classifier
- Solutions
- Approaches for Multiple Classifier Systems
- Explanation of the Four Approaches
- An Architecture of Multiple Classifier System
- Involved Classifiers for Combination
- K-Nearest Neighbour Method (kNN)
- Weighted k-Nearest Neighbour Method (wkNN)
- Contextual Probability-based Classification (CPC)
- kNN Model-based Method (kNNModel)
- Combination Strategies
- Majority voting based combination
- Maximal Similarity-based Combination
- Average Similarity-based Combination
- Weighted Similarity-based Combination
- Experimental Results
- Conclusions
Background - Classification Process
- Classification occurs in a wide range of human
activities. At its broadest, the term could cover
any activity in which some decision or forecast
is made on the basis of currently available
information, and a classifier is then some formal
method for repeatedly making such judgments in
new situations (Michie et al. 1994).
- Various approaches to classification have
been developed and applied to real-world
applications for decision making. Examples
include probabilistic decision theory,
discriminant analysis, fuzzy-neural networks,
belief networks, non-parametric methods,
tree-structured classifiers, and rough sets.
Background - Drawbacks of a Single Classifier
- Unfortunately, no dominant classifier exists
for all the data distributions, and the data
distribution of the task at hand is usually
unknown. A single classifier cannot be
discriminative enough if the number of classes is
huge. For applications where the classes of
content are numerous, unlimited, and
unpredictable, one specific classifier cannot
solve the problem with good accuracy.
Background - Solutions
- A Multiple Classifier System (MCS) is a
powerful solution to difficult decision making
problems involving large data sets and noisy input,
because it allows the simultaneous use of arbitrary
feature descriptors and classification
procedures.
- The ultimate goal of designing such a
multiple classifier system is to achieve the best
possible classification performance for the task
at hand. Empirical studies have observed that
different classifier designs potentially offer
complementary information about the patterns to
be classified, which could be harnessed to
improve the performance of the selected
classifier.
Architecture of Multiple Classification Systems
Given a set of classifiers C = {C1, C2, ..., CL} and
a dataset D, each instance x in D is represented as a
feature vector x = (x1, x2, ..., xn)^T, x ∈ R^n.
A classifier takes x as its input and assigns it a
class label from O = {o1, o2, ..., om}, i.e. Ci: R^n -> O.
Four approaches are generally used to design a
classifier combination system (Kuncheva, 2003).
Explanation of the Four Approaches
- Approach 1: Pick a combination scheme for the L
classifiers C1, C2, ..., CL under study to form a combiner.
- Approach 2: Choose the individual classifiers by
considering issues such as similarity/diversity and
homogeneity/heterogeneity.
- Approach 3: Build each Ci on an individual subset of
features (a subspace of R^n).
- Approach 4: Select training subsets D1, D2, ..., Dm of
the dataset D so as to lead to a team of diverse
classifiers.
An Architecture of Multiple Classifier System

[Architecture diagram: data sets pass through data pre-processing
(feature selection by GR, IG and CFS); the four classifiers kNN,
kNNModel, wkNN and CPC each produce an output (Output1-Output4);
a classifier combination module (MSC, ASC or WSC) merges these
outputs into the final output.]
Involved Classifiers for Combination - kNN
Given an instance x, the k-nearest neighbour
classifier finds its k nearest instances, and
traditionally uses the majority rule (or majority
voting rule) to determine its class, i.e.
assigning the single most frequent class label
associated with the k nearest neighbours to x.
This is illustrated in Figure 3, where the two
classes are depicted by two different markers,
with ten instances for each class. Each instance is
represented by a two-dimensional point within a
continuous-valued Euclidean space, and the query
instance x to be classified is also shown.
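As a concrete illustration (not the authors' code), a minimal kNN classifier with majority voting, assuming numeric feature vectors and Euclidean distance, might look like this:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=5):
    """Classify x by majority vote among its k nearest neighbours."""
    # Euclidean distances from x to every training instance
    dists = np.linalg.norm(X_train - x, axis=1)
    # indices of the k closest training instances
    nn_idx = np.argsort(dists)[:k]
    # majority vote over their class labels
    votes = Counter(y_train[i] for i in nn_idx)
    return votes.most_common(1)[0][0]
```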
Involved Classifiers for Combination - wkNN
In wkNN, the k nearest neighbours are assigned
different weights. Let d be a distance measure,
and x1, x2, ..., xk be the k nearest neighbours of
x arranged in increasing order of d(xi, x), so that
x1 is the first nearest neighbour of x. The
distance weight wi for the i-th neighbour xi is
a decreasing function of d(xi, x).
Instance x is assigned to the class for which the
weights of the representatives among the k
nearest neighbours sum to the greatest value.
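The slide's weighting formula is not reproduced above; the sketch below uses Dudani's linear distance weights as a stand-in assumption, simply to show how the weighted vote is taken:

```python
import numpy as np

def wknn_predict(X_train, y_train, x, k=5):
    """Distance-weighted kNN; the weights here follow Dudani's linear
    scheme, used as a stand-in for the weighting on the slide."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nn_idx = np.argsort(dists)[:k]
    d = dists[nn_idx]
    d1, dk = d[0], d[-1]
    # linear distance weights: the nearest neighbour gets weight 1,
    # the k-th neighbour gets weight 0 (all 1s if distances tie)
    w = np.ones(k) if dk == d1 else (dk - d) / (dk - d1)
    # sum the weights per class and pick the class with the largest sum
    scores = {}
    for wi, idx in zip(w, nn_idx):
        label = y_train[idx]
        scores[label] = scores.get(label, 0.0) + wi
    return max(scores, key=scores.get)
```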
Involved Classifiers for Combination - CPC
- Contextual probability-based classification (CPC)
(Guo et al., 2004) is based on a new function G,
a probability function used to calculate the
support of overlapping or non-overlapping
neighbourhoods. The idea of CPC is to aggregate
the support of multiple sets of nearest
neighbours of a new instance for various classes
to give a more reliable support value, which
better reveals the true class of this instance.
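The slide does not give the G function itself; purely as an illustration of aggregating support over several neighbourhood sets, one might accumulate class proportions over nested neighbourhoods (an illustrative simplification, not the CPC formulation of Guo et al., 2004):

```python
import numpy as np

def cpc_style_support(X_train, y_train, x, max_k=9):
    """Aggregate class support over nested neighbourhoods of sizes
    1..max_k (illustrative only; not the exact CPC formulation)."""
    dists = np.linalg.norm(X_train - x, axis=1)
    order = np.argsort(dists)
    classes = sorted(set(y_train))
    support = {c: 0.0 for c in classes}
    for k in range(1, max_k + 1):
        labels = [y_train[i] for i in order[:k]]
        for c in classes:
            # proportion of class c within the k-neighbourhood
            support[c] += labels.count(c) / k
    total = sum(support.values())
    # normalise so that the supports over all classes sum to one
    return {c: s / total for c, s in support.items()}
```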
Involved Classifiers for Combination - kNNModel
The basic idea of the kNN model-based classification
method (kNNModel) (Guo et al. 2003) is to find a
set of more meaningful representatives of the
complete data set to serve as the basis for
further classification. Each chosen
representative xi is represented in the form of
<Cls(xi), Sim(xi), Num(xi), Rep(xi)>, which
respectively represent: the class label of xi;
the similarity of xi to the furthest instance
among the instances covered by Ni; the number of
instances covered by Ni; and a representation of
instance xi. The symbol Ni denotes the area
within which the distance to xi is less than or
equal to Sim(xi). kNNModel can generate a set of
optimal representatives via inductive learning
from the dataset.
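A minimal sketch of how such representatives could be stored and used for prediction, assuming Euclidean distance and a simple nearest-covering-representative rule (a simplification for illustration, not the published kNNModel algorithm):

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Representative:
    cls: str         # Cls(xi): class label of the representative
    sim: float       # Sim(xi): radius, distance to the furthest covered instance
    num: int         # Num(xi): number of instances covered by Ni
    rep: np.ndarray  # Rep(xi): the representative instance itself

def knnmodel_predict(representatives, x):
    """Assign x the class of the nearest representative, preferring
    representatives whose neighbourhood Ni actually covers x."""
    covering = [r for r in representatives
                if np.linalg.norm(r.rep - x) <= r.sim]
    candidates = covering if covering else representatives
    nearest = min(candidates, key=lambda r: np.linalg.norm(r.rep - x))
    return nearest.cls
```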
Combination Strategy - Majority Voting-based Combination
Given a new instance x to be classified, whose
true class label is t_x, and k predefined classifiers
denoted as A1, A2, ..., Ak respectively, each
classifier Ai approximates a discrete-valued
function Ai: R^n -> O.
The final class label of x, obtained by using
majority voting-based classifier combination, is
described as follows:

    f(x) = \arg\max_{o \in O} \sum_{i=1}^{k} \delta(A_i(x), o)

where \delta(a, b) = 1 if a = b, and \delta(a, b) = 0
otherwise.
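A small sketch of this combiner, assuming each base classifier exposes a predict(x) method returning a crisp label (names are illustrative):

```python
from collections import Counter

def majority_vote(classifiers, x):
    """Combine crisp label predictions by majority voting."""
    labels = [clf.predict(x) for clf in classifiers]
    # the most frequent predicted label wins; ties are broken by the
    # order in which labels were first seen
    return Counter(labels).most_common(1)[0][0]
```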
Combination Strategy - Class-wise Similarity-based Classifier Combination
The classification result of x classified by Aj
is given by a vector of normalized similarity
values of x to each class, represented by
Sj = <Sj1, Sj2, ..., Sjm>, where j = 1, 2, ..., k.
The final class label of x can be obtained in three
different ways:
a) Maximal Similarity-based Combination (MSC)
b) Average Similarity-based Combination (ASC)
c) Weighted Similarity-based Combination (WSC),
where a control parameter α sets the relative
importance of local optimization and global
optimization in the combination.
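The slide's formulas for the three rules are not reproduced above. The sketch below shows one plausible reading consistent with their names and with the role of α: MSC takes, per class, the maximal similarity over all classifiers; ASC takes the average; and WSC blends the two with α. Treat these as assumptions for illustration, not the authors' exact definitions:

```python
import numpy as np

def combine(sim_matrix, classes, method="WSC", alpha=0.7):
    """sim_matrix: k x m array, where row j holds classifier Aj's
    normalized similarities of x to each of the m classes."""
    S = np.asarray(sim_matrix)
    if method == "MSC":    # maximal similarity over all classifiers
        scores = S.max(axis=0)
    elif method == "ASC":  # average similarity over all classifiers
        scores = S.mean(axis=0)
    elif method == "WSC":  # weighted blend of local max and global average
        scores = alpha * S.max(axis=0) + (1 - alpha) * S.mean(axis=0)
    else:
        raise ValueError(f"unknown method: {method}")
    return classes[int(np.argmax(scores))]
```

For example, with the four base classifiers and alpha = 0.7 (the value used for most data sets in Table 3), combine(S, class_list, "WSC") returns the predicted label.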
Experimental Results
This study mainly focuses on Approach 1. Given
four classifiers, kNN, kNNModel, CPC and wkNN, we
empirically propose three similarity-based
classifier combination schemes. After evaluating
them on fifteen public datasets from the UCI
machine learning repository, we apply the best
approach to a real-world application, toxicity
prediction of the environmental effects of
chemicals, in order to obtain better
classification performance.
Fifteen public data sets from the UCI machine
learning repository and one data set (Phenols)
from real-world applications (toxicity prediction
of chemical compounds) have been collected for
training and testing. Some information about
these data sets is given in Table 1.
Table 1. Some information about the data sets

Data set     NF   NN   NO   NB   NI   CD
Australian   14    4    6    4   690  383/307
Colic        23   16    7    0   368  232/136
Diabetes      8    0    8    0   768  268/500
Glass         9    0    9    0   214  70/17/76/0/13/9/29
HCleveland   13    3    7    3   303  164/139
Heart        13    3    7    3   270  120/150
Hepatitis    19    6    1   12   155  32/123
Ionosphere   34    0   34    0   351  126/225
Iris          4    0    4    0   150  50/50/50
LiverBupa     6    0    6    0   345  145/200
Sonar        60    0   60    0   208  97/111
Vehicle      18    0   18    0   846  212/217/218/199
Vote         16    0    0   16   435  267/168
Wine         13    0   13    0   178  59/71/48
Zoo          16   16    0    0    90  37/18/3/12/4/7/9
P_MOA       173    0  173    0   250  173/27/4/19/27
P_MOA_FS     20    0   20    0   250  173/27/4/19/27
P_T         173    0  173    0   250  37/152/61
P_T_FS       20    0   20    0   250  37/152/61
In Table 1, NF-Number of Features, NN-Number of
Nominal features, NO-Number of Ordinal features,
NB-Number of Binary features, NI-Number of
Instances, CD-Class Distribution. Four Phenols
data sets are used in the experiment: Phenols_M
(P_MOA in Table 1) is the phenols data set with
MOA (Mechanism of Action) as the endpoint for
prediction; Phenols_M_FS (P_MOA_FS) is the
Phenols_M data set after feature selection;
Phenols_T (P_T) is the Phenols data set with
toxicity as the endpoint for prediction; and
Phenols_T_FS (P_T_FS) is the Phenols_T data set
after feature selection.
Table 2. A comparison of four individual algorithms and MV in classification performance.

Data set     kNNModel   e   N   kNN     wkNN    CPC     MV
Australian    86.09     2   5   85.22   82.46   84.64   85.65
Colic         83.61     1   4   83.06   81.94   83.61   83.33
Diabetes      75.78     1   5   74.21   72.37   72.63   74.87
Glass         69.52     3   3   67.62   67.42   68.57   69.52
HCleveland    82.67     0   1   81.00   81.33   82.67   83.00
Heart         81.85     1   3   80.37   77.41   81.48   81.85
Hepatitis     89.33     1   2   83.33   83.33   82.67   85.33
Ionosphere    94.29     0   1   84.00   87.14   84.86   88.57
Iris          96.00     0   2   96.67   95.33   96.00   96.00
LiverBupa     68.53     2   2   66.47   66.47   65.88   68.82
Sonar         84.00     0   3   85.00   86.50   87.50   87.00
Vehicle       66.55     2   3   69.29   71.43   70.12   71.43
Vote          91.74     4   5   92.17   90.87   91.74   91.74
Wine          95.29     0   1   94.71   95.29   95.88   95.29
Zoo           92.22     0   2   95.56   95.56   96.67   95.56
P_MOA         83.20     0   0   87.20   86.80   87.60   87.60
P_MOA_FS      89.20     0   0   88.80   92.80   91.20   91.60
P_T           71.60     2   4   74.40   74.40   74.80   74.00
P_T_FS        75.60     2   4   73.60   72.40   77.20   76.00
Average       83.00     /   /   82.25   82.17   82.93   83.53
Table 3. A comparison of different combination schemes (α is the control parameter of WSC).

Data set     SVM     C5.0   MV      MSC     ASC     WSC     α     e   N
Australian   81.45   85.5   85.65   86.52   86.23   86.52   0.7   2   5
Colic        83.89   80.9   83.33   84.72   84.17   84.72   0.7   3   0
Diabetes     77.11   76.6   74.87   75.13   75.13   75.13   0.7   4   0
Glass        62.86   66.3   69.52   70.95   70.95   70.95   0.7   2   0
HCleveland   83.67   74.9   83.00   82.33   82.33   82.67   0.7   4   0
Heart        84.07   75.6   81.85   81.85   81.48   81.85   0.7   2   4
Hepatitis    82.67   80.7   85.33   87.33   86.67   87.33   0.6   1   5
Ionosphere   87.14   84.5   88.57   89.43   88.86   89.43   0.7   1   0
Iris         98.67   92.0   96.00   96.67   96.67   96.67   0.7   1   4
LiverBupa    69.71   65.8   68.82   70.59   71.18   71.18   0.8   3   5
Sonar        74.00   69.4   87.00   88.50   88.50   89.00   0.7   2   0
Vehicle      77.50   67.9   71.43   70.83   71.90   71.90   0.8   2   5
Vote         96.96   96.1   91.74   92.61   92.61   92.61   0.7   2   0
Wine         95.29   92.1   95.29   96.47   96.47   96.47   0.7   1   0
Zoo          97.79   91.1   95.56   95.56   96.67   96.67   0.7   0   0
P_MOA        84.40   90.0   87.60   88.40   88.20   88.80   0.8   0   0
P_MOA_FS     89.20   89.2   91.60   92.40   92.40   92.40   0.7   4   0
P_T          65.60   72.8   74.00   76.40   76.00   76.40   0.7   2   0
P_T_FS       76.00   74.0   76.00   77.20   76.40   77.20   0.7   3   0
Average      82.53   80.28  83.53   84.42   84.39   84.64   /     /   /
Table 4. The signed test of different classifiers

       SVM         C5.0        kNNModel    kNN         wkNN        CPC         MV
MV     -0.69 (-)   2.98 (+)    -0.33 (-)   2.52 (+)    2.07 (+)    0.23 (-)    /
WSC     1.15 (-)   2.98 (+)     2.07 (+)   3.44 (+)    2.98 (+)    2.98 (+)    2.52 (+)
In Table 4, the item 2.07 (+) in cell (3, 4), for
example, means WSC is better than kNNModel in
terms of performance over the nineteen data sets,
as the corresponding Z > Z_0.95 = 1.729. The
item 1.15 (-) in cell (3, 2) means there is no
significant difference in terms of performance
between WSC and SVM over the nineteen data sets,
as the corresponding Z < Z_0.95 = 1.729.
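The slides do not spell out the statistic behind Table 4; as a rough illustration only, a one-sided paired t-type statistic over per-data-set accuracies (an assumption, not necessarily the "signed test" actually used) can be computed as:

```python
import numpy as np

def paired_z(acc_a, acc_b):
    """One-sided paired statistic comparing classifier A against B over
    the same data sets (illustrative; the slide's exact test is not
    specified here)."""
    d = np.asarray(acc_a) - np.asarray(acc_b)  # per-data-set differences
    n = len(d)
    return d.mean() / (d.std(ddof=1) / np.sqrt(n))

# Example: compare WSC against kNNModel accuracies from Tables 2 and 3;
# the slide treats a value above 1.729 as significant at the 0.05 level.
# z = paired_z(wsc_accuracies, knnmodel_accuracies)
```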
Conclusions
- The proposed methods directly employ the
class-wise similarity measures used in each
individual classifier for combination, without
changing the representation from similarity to
probability.
- The combination significantly improves the
average classification accuracy over the nineteen
data sets. The average classification accuracy of
WSC is better than that of any individual
classifier and of the majority voting-based
combination method.
- The statistical test also shows that the
proposed combination method WSC is better than
any individual classifier, with the exception of
SVM.
- The average classification accuracy of WSC is
still better than that of SVM, with a 2.49%
improvement.
- Further research is required into how to
combine heterogeneous classifiers using
class-wise similarity-based combination methods.
References
- (Michie et al. 1994)
  - D. Michie, D.J. Spiegelhalter, and C.C. Taylor.
    Machine Learning, Neural and Statistical
    Classification. Ellis Horwood, 1994.
- (Guo et al. 2003)
  - G. Guo, H. Wang, D. Bell, Y. Bi, K. Greer.
    kNN Model-Based Approach in Classification.
    In Proc. of ODBASE 2003, LNCS 2888/2003,
    pp. 986-996, 2003.
- (Guo et al. 2004)
  - G. Guo, H. Wang, D. Bell, Z. Liao.
    Contextual Probability-Based Classification.
    In Proc. of ER 2004, LNCS 3288/2004,
    pp. 313-326, Springer-Verlag, 2004.
- (Kuncheva, 2003)
  - L.I. Kuncheva. Combining Classifiers:
    Soft Computing Solutions. In S.K. Pal (Ed.),
    Pattern Recognition: From Classical to Modern
    Approaches, pp. 427-452, World Scientific,
    Singapore, 2003.
Thank you very much!