Similarity-based Classifier Combination for Decision Making - PowerPoint PPT Presentation




Title: data preprocessing and classification Author: gongde guo Last modified by: gguo Created Date: 10/5/2004 11:01:49 AM Document presentation format – PowerPoint PPT presentation




Similarity-based Classifier Combination for
Decision Making
  • Authors: Gongde Guo, Daniel Neagu
  • Department of Computing, University of Bradford

Outline of Presentation
  • Background
    - Classification Process
    - Drawbacks of a Single Classifier
    - Solutions
  • Approaches for Multiple Classifier Systems
    - Explanation of the Four Approaches
    - An Architecture of a Multiple Classifier System
  • Involved Classifiers for Combination
    - k-Nearest Neighbour Method (kNN)
    - Weighted k-Nearest Neighbour Method (wkNN)
    - Contextual Probability-based Classification (CPC)
    - kNN Model-based Method (kNNModel)
  • Combination Strategies
    - Majority Voting-based Combination
    - Maximal Similarity-based Combination
    - Average Similarity-based Combination
    - Weighted Similarity-based Combination
  • Experimental Results
  • Conclusions

Background - Classification Process
  • Classification occurs in a wide range of human
    activities. At its broadest, the term could cover
    any activity in which some decision or forecast
    is made on the basis of currently available
    information, and a classifier is then some formal
    method for repeatedly making such judgments in
    new situations (Michie et al. 1994).
  • Various approaches to classification have
    been developed and applied to real-world
    decision-making problems. Examples
    include probabilistic decision theory,
    discriminant analysis, fuzzy-neural networks,
    belief networks, non-parametric methods,
    tree-structured classifiers, and rough sets.

Background - Drawbacks of a Single Classifier
  • Unfortunately, no dominant classifier exists
    for all data distributions, and the data
    distribution of the task at hand is usually
    unknown. A single classifier cannot be
    discriminative enough if the number of classes is
    huge. For applications where the classes of
    content are numerous, unlimited and
    unpredictable, no single classifier can
    solve the problem with good accuracy.

Background - Solutions
  • A Multiple Classifier System (MCS) is a
    powerful solution to difficult decision-making
    problems involving large sets and noisy input,
    because it allows the simultaneous use of arbitrary
    feature descriptors and classification procedures.
  • The ultimate goal of designing such a
    multiple classifier system is to achieve the best
    possible classification performance for the task
    at hand. Empirical studies have observed that
    different classifier designs potentially offer
    complementary information about the patterns to
    be classified, which could be harnessed to
    improve the performance of the selected classifier.

Architecture of Multiple Classification Systems
Given a set of classifiers $C = \{C_1, C_2, \ldots, C_L\}$ and
a dataset D, each instance x in D is represented as a
feature vector $x = [x_1, x_2, \ldots, x_n]^T$, $x \in \mathbb{R}^n$.
A classifier takes x as its input and
assigns it a class label from $\Omega$, i.e. $C_i: \mathbb{R}^n \to \Omega$.
Four approaches are generally used to design a
classifier combination system (Kuncheva, 2003).
Explanation of the Four Approaches
  • Approach 1: The problem is to pick a combination
    scheme for the L classifiers $C_1, C_2, \ldots, C_L$
    studied, to form a combiner.
  • Approach 2: The problem is to choose the individual
    classifiers by considering issues such as
    similarity/diversity and homogeneity/heterogeneity.
  • Approach 3: The problem is to build each $C_i$ on an
    individual subset of features (a subspace of $\mathbb{R}^n$).
  • Approach 4: The problem is to select training subsets
    $D_1, D_2, \ldots, D_m$ of the dataset D that lead to a
    team of diverse classifiers.
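Approaches 3 and 4 both create diversity by varying what each classifier sees. As a minimal sketch, assuming the common choices of random feature subspaces for Approach 3 and bootstrap resampling (as in bagging) for Approach 4, the subsets could be drawn as below. The function names are hypothetical, and neither sampling scheme is prescribed by the slides.

```python
import random

def random_feature_subsets(n_features, subset_size, n_classifiers, seed=0):
    """Approach 3 (illustrative): draw one random feature subspace per classifier.

    Returns one sorted list of feature indices for each classifier C_i.
    """
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n_features), subset_size))
            for _ in range(n_classifiers)]

def bootstrap_training_subsets(dataset, n_subsets, seed=0):
    """Approach 4 (illustrative): draw bootstrap samples D_1, ..., D_m of D.

    Each subset has |D| instances drawn with replacement, as in bagging.
    """
    rng = random.Random(seed)
    return [[rng.choice(dataset) for _ in dataset] for _ in range(n_subsets)]

# Hypothetical usage: 4 classifiers on 3 of 10 features; 5 resamples of 20 instances.
subspaces = random_feature_subsets(n_features=10, subset_size=3, n_classifiers=4)
resamples = bootstrap_training_subsets(list(range(20)), n_subsets=5)
print(subspaces)
print([len(d) for d in resamples])
```

Each $C_i$ would then be trained either on its own feature subset or on its own $D_i$.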
An Architecture of a Multiple Classifier System
[Figure: architecture diagram with components labelled Data Sets, Data Pre-processing and Classifier Combination]
Involved Classifiers for Combination - kNN
Given an instance x, the k-nearest neighbour
classifier finds its k nearest instances and
traditionally uses the majority rule (or majority
voting rule) to determine its class, i.e. it
assigns to x the single most frequent class label
among its k nearest neighbours.
This is illustrated in Figure 3. The two classes
here are depicted by ? and o, with ten
instances for each class. Each instance is
represented by a two-dimensional point within a
continuous-valued Euclidean space. The instance
x, represented as .
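The majority rule above can be sketched in a few lines of Python. This is a minimal illustration, assuming Euclidean distance and training data given as (features, label) pairs; the function name knn_classify and the toy data are hypothetical, and "+" merely stands in for the class symbol lost from the figure.

```python
import math
from collections import Counter

def knn_classify(train, x, k):
    """Classify x by majority vote among its k nearest training instances.

    train: list of (feature_tuple, label) pairs; x: feature tuple.
    Distance is Euclidean, as in the two-dimensional example of Figure 3.
    """
    # Sort training instances by distance to x and keep the k closest.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    # Majority rule: the most frequent label among the k neighbours wins.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two classes, "+" and "o", in 2-D.
train = [((1.0, 1.0), "+"), ((1.5, 2.0), "+"), ((2.0, 1.0), "+"),
         ((5.0, 5.0), "o"), ((6.0, 5.5), "o"), ((5.5, 6.0), "o")]
print(knn_classify(train, (2.0, 2.0), k=3))
```

Sorting the whole training set is O(N log N) per query, which is fine for illustration; a spatial index would be the usual choice for large data sets.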

Involved Classifiers for Combination - wkNN
In wkNN, the k nearest neighbours are assigned
different weights. Let $d$ be a distance measure,
and $x_1, x_2, \ldots, x_k$ be the k nearest neighbours of
x arranged in increasing order of $d(x_i, x)$, so that
$x_1$ is the first nearest neighbour of x. The
distance weight $w_i$ for the i-th neighbour $x_i$ is
defined as follows:

Instance x is assigned to the class whose members
among the k nearest neighbours have weights that
sum to the greatest value.
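The weight formula itself did not survive in this transcript. As a sketch, assuming the classic distance-weighting rule of Dudani (1976), $w_i = \frac{d(x_k, x) - d(x_i, x)}{d(x_k, x) - d(x_1, x)}$ (with all weights equal to 1 when $d(x_k, x) = d(x_1, x)$), wkNN could look like this. The rule and the function names dudani_weights and wknn_classify are assumptions, not taken from the slides.

```python
import math
from collections import defaultdict

def dudani_weights(distances):
    """Weights for neighbour distances sorted in increasing order.

    ASSUMED rule (Dudani, 1976): w_i = (d_k - d_i) / (d_k - d_1),
    and w_i = 1 for every neighbour when d_k == d_1.
    """
    d1, dk = distances[0], distances[-1]
    if dk == d1:
        return [1.0] * len(distances)
    return [(dk - di) / (dk - d1) for di in distances]

def wknn_classify(train, x, k):
    """Assign x to the class whose neighbours' weights sum to the largest value."""
    # k nearest neighbours, in increasing order of Euclidean distance to x.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    distances = [math.dist(features, x) for features, _ in neighbours]
    scores = defaultdict(float)
    for (_, label), w in zip(neighbours, dudani_weights(distances)):
        scores[label] += w
    return max(scores, key=scores.get)

# Hypothetical data: one close "+" neighbour versus two farther "o" neighbours.
train = [((1.0, 0.0), "+"), ((0.0, 2.0), "o"), ((3.0, 0.0), "o")]
print(wknn_classify(train, (0.0, 0.0), k=3))
```

With k = 3, plain majority voting would choose "o" (two votes to one), whereas the weights here are 1.0, 0.5 and 0.0, so "+" wins with 1.0 against 0.5. Under this rule the k-th neighbour always receives weight 0.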
Involved Classifiers for Combination - CPC
  • The contextual probability-based classifier (CPC)
    (Guo et al., 2004) is based on a new function G,
    a probability function used to calculate the
    support of overlapping or non-overlapping
    neighbourhoods. The idea of CPC is to aggregate
    the support of multiple sets of nearest
    neighbours of a new instance for various classes
    to give a more reliable support value, which
    better reveals the true class of this instance.
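The function G is not defined in these slides, so the following is only a simplified stand-in that illustrates the idea of aggregating support from several neighbourhoods. It assumes, hypothetically, that the support of a class in the size-k neighbourhood is the fraction of those k neighbours carrying that label, and averages this over k = 1, ..., max_k. It is not the CPC method of Guo et al. (2004).

```python
import math
from collections import Counter

def multi_neighbourhood_support(train, x, max_k):
    """Average class support over nested neighbourhoods of sizes 1..max_k.

    Simplified stand-in, NOT CPC's function G: the support of class c in the
    size-k neighbourhood is the fraction of those k neighbours labelled c,
    and the supports from all neighbourhood sizes are averaged.
    """
    ranked = sorted(train, key=lambda item: math.dist(item[0], x))[:max_k]
    labels = [label for _, label in ranked]
    totals = Counter()
    for k in range(1, len(labels) + 1):
        for label, count in Counter(labels[:k]).items():
            totals[label] += count / k
    return {label: total / len(labels) for label, total in totals.items()}

# Hypothetical data: neighbours of (0, 0) in order "+", "o", "o".
train = [((1.0, 0.0), "+"), ((0.0, 2.0), "o"), ((3.0, 0.0), "o")]
support = multi_neighbourhood_support(train, (0.0, 0.0), max_k=3)
print(max(support, key=support.get), support)
```

Here "+" receives support 11/18 and "o" 7/18: the single nearest neighbour dominates the small neighbourhoods, even though "o" wins the size-3 vote.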


Involved Classifiers for Combination - kNNModel
The basic idea of the kNN model-based classification
method (kNNModel) (Guo et al. 2003) is to find a
set of more meaningful representatives of the
complete data set to serve as the basis for
further classification. Each chosen
representative $x_i$ is represented in the form
$\langle Cls(x_i), Sim(x_i), Num(x_i), Rep(x_i) \rangle$,
whose components respectively represent the class
label of $x_i$; the similarity of $x_i$ to the furthest
instance among the instances covered by $N_i$; the
number of instances covered by $N_i$; and a
representation of instance $x_i$. The symbol $N_i$
represents the area within which the distance to $x_i$
is less than or equal to $Sim(x_i)$. kNNModel can
generate a set of optimal representatives via
inductive learning from the data set.
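The representative tuple and the coverage test for $N_i$ can be sketched as follows. How kNNModel learns the representatives is not shown, and the classification rule in knnmodel_classify is an assumption (largest Num among covering representatives, otherwise the nearest boundary); the class and function names are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Representative:
    """One kNNModel representative <Cls(x_i), Sim(x_i), Num(x_i), Rep(x_i)>."""
    cls: str     # Cls(x_i): class label
    sim: float   # Sim(x_i): distance from x_i to the furthest instance it covers
    num: int     # Num(x_i): number of instances covered by N_i
    rep: tuple   # Rep(x_i): representation of x_i (here simply its feature vector)

def covers(r, x):
    """True if x falls inside N_i, i.e. its distance to x_i is at most Sim(x_i)."""
    return math.dist(r.rep, x) <= r.sim

def knnmodel_classify(model, x):
    """ASSUMED rule: among representatives covering x, take the largest Num;
    if none covers x, take the representative whose boundary is nearest."""
    covering = [r for r in model if covers(r, x)]
    if covering:
        return max(covering, key=lambda r: r.num).cls
    return min(model, key=lambda r: math.dist(r.rep, x) - r.sim).cls

# Hypothetical learned model with two representatives.
model = [Representative("+", 1.0, 8, (0.0, 0.0)),
         Representative("o", 2.0, 12, (4.0, 0.0))]
print(knnmodel_classify(model, (2.5, 0.0)))
```

The model replaces the full training set with a handful of regions, so classifying x needs one distance per representative rather than one per training instance.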

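Combination Strategy - Majority Voting-based
Given a new instance x to be classified, whose
true class label is $t_x$, and k predefined classifiers
denoted as $A_1, A_2, \ldots, A_k$ respectively, the
classifier $A_i$ approximates a discrete-valued
function $f_i: \mathbb{R}^n \to \Omega$.
The final class label of x, obtained by using
majority voting-based classifier combination, is
described as follows:
$$f(x) = \arg\max_{\omega \in \Omega} \sum_{i=1}^{k} \delta(\omega, f_i(x)),$$
where $\delta(a, b) = 1$ if $a = b$, and $\delta(a, b) = 0$ otherwise.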
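The majority voting combiner counts, for each candidate label, how many of the k classifiers output it. A minimal sketch mirroring the argmax over the sum of $\delta$ terms follows; the tie-breaking rule (first label seen wins) is an assumption, since the slides do not specify one, and the function names are hypothetical.

```python
def delta(a, b):
    """delta(a, b) = 1 if a = b, and 0 otherwise."""
    return 1 if a == b else 0

def majority_vote(predictions):
    """f(x) = argmax over labels w of sum_i delta(w, f_i(x)).

    predictions holds the labels f_1(x), ..., f_k(x) output by A_1, ..., A_k.
    Ties go to the label seen first (an arbitrary choice, not from the slides).
    """
    candidates = list(dict.fromkeys(predictions))  # distinct labels, first-seen order
    return max(candidates, key=lambda w: sum(delta(w, p) for p in predictions))

# Hypothetical outputs of kNN, kNNModel, CPC and wkNN for one instance x.
print(majority_vote(["A", "B", "A", "C"]))
```

Only the crisp labels are used here, so any class-wise similarity information produced by the individual classifiers is discarded.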
Combination Strategy - Class-wise
Similarity-based Classifier Combination
The classification result of x classified by $A_j$
is given by a vector of normalized similarity
values of x to each class, represented by
$S_j = \langle S_{j1}, S_{j2}, \ldots, S_{jm} \rangle$, where $j = 1, 2, \ldots, k$.
The final class label of x can be obtained in three
different ways:
a) Maximal Similarity-based Combination (MSC)
b) Average Similarity-based Combination (ASC)
c) Weighted Similarity-based Combination (WSC)
In WSC, α is a control parameter used for setting the
relative importance of local optimization and
global optimization of the combination.
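The MSC, ASC and WSC formulas did not survive in this transcript. As a sketch, assuming MSC picks the class with the single largest $S_{ji}$, ASC the class with the largest mean $S_{ji}$ over the k classifiers, and WSC the convex mix $\alpha \max_j S_{ji} + (1 - \alpha) \frac{1}{k} \sum_j S_{ji}$ (max as the "local" term, mean as the "global" term), the three strategies could look like this. The WSC form and all function names are assumptions.

```python
def _column(S, i):
    """Similarities S_1i, ..., S_ki of class i across the k classifiers."""
    return [row[i] for row in S]

def msc(S):
    """Maximal Similarity-based Combination: class with the largest single S_ji."""
    return max(range(len(S[0])), key=lambda i: max(_column(S, i)))

def asc(S):
    """Average Similarity-based Combination: class with the largest mean S_ji."""
    return max(range(len(S[0])), key=lambda i: sum(_column(S, i)) / len(S))

def wsc(S, alpha):
    """Weighted Similarity-based Combination, ASSUMED form:
    score_i = alpha * max_j S_ji + (1 - alpha) * mean_j S_ji."""
    def score(i):
        col = _column(S, i)
        return alpha * max(col) + (1 - alpha) * sum(col) / len(col)
    return max(range(len(S[0])), key=score)

# Hypothetical normalized similarity vectors S_j from k = 3 classifiers, m = 2 classes.
S = [[0.9, 0.1],
     [0.2, 0.8],
     [0.3, 0.7]]
print(msc(S), asc(S), wsc(S, alpha=0.7), wsc(S, alpha=0.3))
```

Under this assumed form, α = 1 reduces WSC to MSC and α = 0 reduces it to ASC. In the example, MSC (class 0) and ASC (class 1) disagree, and α decides which view prevails.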
Experimental Results
This study mainly focuses on Approach 1. Given
four classifiers, kNN, kNNModel, CPC and wkNN, we
propose three similarity-based classifier
combination schemes. After evaluating them
empirically on fifteen public data sets from the UCI
Machine Learning Repository, we apply the best
approach to a real-world application, toxicity
prediction of the environmental effects of
chemicals, in order to obtain better
classification performance.
Fifteen public data sets from the UCI Machine
Learning Repository and one data set (Phenols)
from a real-world application (toxicity prediction
of chemical compounds) have been collected for
training and testing. Some information about
these data sets is given in Table 1.
Table 1. Some information about the data sets
Data set       NF   NN   NO   NB    NI   CD
Australian     14    4    6    4   690   383:307
Colic          23   16    7    0   368   232:136
Diabetes        8    0    8    0   768   268:500
Glass           9    0    9    0   214   70:17:76:13:9:29
HCleveland     13    3    7    3   303   164:139
Heart          13    3    7    3   270   120:150
Hepatitis      19    6    1   12   155   32:123
Ionosphere     34    0   34    0   351   126:225
Iris            4    0    4    0   150   50:50:50
LiverBupa       6    0    6    0   345   145:200
Sonar          60    0   60    0   208   97:111
Vehicle        18    0   18    0   846   212:217:218:199
Vote           16    0    0   16   435   267:168
Wine           13    0   13    0   178   59:71:48
Zoo            16   16    0    0    90   37:18:3:12:4:7:9
P_MOA         173    0  173    0   250   173:27:4:19:27
P_MOA_FS       20    0   20    0   250   173:27:4:19:27
P_T           173    0  173    0   250   37:152:61
P_T_FS         20    0   20    0   250   37:152:61
In Table 1, NF = Number of Features, NN = Number of
Nominal features, NO = Number of Ordinal features,
NB = Number of Binary features, NI = Number of
Instances, CD = Class Distribution. Four Phenols
data sets are used in the experiment:
Phenols_M (P_MOA in the tables) represents the Phenols
data set with MOA (Mechanism of Action) as the endpoint
for prediction; Phenols_M_FS (P_MOA_FS) represents the
Phenols_M data set after feature selection; Phenols_T
(P_T) represents the Phenols data set with toxicity as
the endpoint for prediction; and Phenols_T_FS (P_T_FS)
represents the Phenols_T data set after feature selection.
Table 2. A comparison of four individual
algorithms and MV in classification performance.
Data set     kNNModel   ε   N     vkNN     wkNN      CPC       MV
Australian      86.09   2   5    85.22    82.46    84.64    85.65
Colic           83.61   1   4    83.06    81.94    83.61    83.33
Diabetes        75.78   1   5    74.21    72.37    72.63    74.87
Glass           69.52   3   3    67.62    67.42    68.57    69.52
HCleveland      82.67   0   1    81.00    81.33    82.67    83.00
Heart           81.85   1   3    80.37    77.41    81.48    81.85
Hepatitis       89.33   1   2    83.33    83.33    82.67    85.33
Ionosphere      94.29   0   1    84.00    87.14    84.86    88.57
Iris            96.00   0   2    96.67    95.33    96.00    96.00
LiverBupa       68.53   2   2    66.47    66.47    65.88    68.82
Sonar           84.00   0   3    85.00    86.50    87.50    87.00
Vehicle         66.55   2   3    69.29    71.43    70.12    71.43
Vote            91.74   4   5    92.17    90.87    91.74    91.74
Wine            95.29   0   1    94.71    95.29    95.88    95.29
Zoo             92.22   0   2    95.56    95.56    96.67    95.56
P_MOA           83.20   0   0    87.20    86.80    87.60    87.60
P_MOA_FS        89.20   0   0    88.80    92.80    91.20    91.60
P_T             71.60   2   4    74.40    74.40    74.80    74.00
P_T_FS          75.60   2   4    73.60    72.40    77.20    76.00
Average         83.00   /   /    82.25    82.17    82.93    83.53
Table 3. A comparison of different combination
Data set          SVM   C5.0       MV      MSC      ASC      WSC     α   ε   N
Australian      81.45   85.5    85.65    86.52    86.23    86.52   0.7   2   5
Colic           83.89   80.9    83.33    84.72    84.17    84.72   0.7   3   0
Diabetes        77.11   76.6    74.87    75.13    75.13    75.13   0.7   4   0
Glass           62.86   66.3    69.52    70.95    70.95    70.95   0.7   2   0
HCleveland      83.67   74.9    83.00    82.33    82.33    82.67   0.7   4   0
Heart           84.07   75.6    81.85    81.85    81.48    81.85   0.7   2   4
Hepatitis       82.67   80.7    85.33    87.33    86.67    87.33   0.6   1   5
Ionosphere      87.14   84.5    88.57    89.43    88.86    89.43   0.7   1   0
Iris            98.67   92.0    96.00    96.67    96.67    96.67   0.7   1   4
LiverBupa       69.71   65.8    68.82    70.59    71.18    71.18   0.8   3   5
Sonar           74.00   69.4    87.00    88.50    88.50    89.00   0.7   2   0
Vehicle         77.50   67.9    71.43    70.83    71.90    71.90   0.8   2   5
Vote            96.96   96.1    91.74    92.61    92.61    92.61   0.7   2   0
Wine            95.29   92.1    95.29    96.47    96.47    96.47   0.7   1   0
Zoo             97.79   91.1    95.56    95.56    96.67    96.67   0.7   0   0
P_MOA           84.40   90.0    87.60    88.40    88.20    88.80   0.8   0   0
P_MOA_FS        89.20   89.2    91.60    92.40    92.40    92.40   0.7   4   0
P_T             65.60   72.8    74.00    76.40    76.00    76.40   0.7   2   0
P_T_FS          76.00   74.0    76.00    77.20    76.40    77.20   0.7   3   0
Average         82.53  80.28    83.53    84.42    84.39    84.64     /   /   /
Table 4. The sign test of different classifiers
            SVM        C5.0       kNNModel   vkNN       wkNN       CPC        MV
MV          -0.69 (-)  2.98 (+)   -0.33 (-)  2.52 (+)   2.07 (+)   0.23 (-)   /
WSC         1.15 (-)   2.98 (+)   2.07 (+)   3.44 (+)   2.98 (+)   2.98 (+)   2.52 (+)
In Table 4, the entry 2.07 (+) in cell (3, 4), for
example, means that WSC is significantly better than
kNNModel in terms of performance over the nineteen
data sets, i.e. the corresponding $Z > Z_{0.95} = 1.729$. The
entry 1.15 (-) in cell (3, 2) means that there is no
significant difference in terms of performance
between WSC and SVM over the nineteen data sets, as
the corresponding $Z < Z_{0.95} = 1.729$.
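The slides do not state how the Z values in Table 4 are computed. A minimal sketch, assuming a normal-approximation sign test $z = (W - n/2) / (\sqrt{n}/2)$, where W counts the data sets on which the first method is strictly more accurate (ties count as non-wins), reproduces the 1.15 entry for WSC against SVM and the 2.98 entry for WSC against C5.0 from the Table 3 accuracies. The function name sign_test_z is hypothetical.

```python
import math

def sign_test_z(acc_a, acc_b):
    """Normal-approximation sign test of method A against method B.

    z = (wins - n/2) / (sqrt(n)/2), where wins counts the data sets on which
    A is strictly more accurate; ties count as non-wins (an assumption).
    """
    if len(acc_a) != len(acc_b) or not acc_a:
        raise ValueError("need two non-empty accuracy lists of equal length")
    n = len(acc_a)
    wins = sum(1 for a, b in zip(acc_a, acc_b) if a > b)
    return (wins - n / 2) / (math.sqrt(n) / 2)

# Accuracies (%) of WSC and SVM from Table 3, in table order (nineteen data sets).
wsc_acc = [86.52, 84.72, 75.13, 70.95, 82.67, 81.85, 87.33, 89.43, 96.67, 71.18,
           89.00, 71.90, 92.61, 96.47, 96.67, 88.80, 92.40, 76.40, 77.20]
svm_acc = [81.45, 83.89, 77.11, 62.86, 83.67, 84.07, 82.67, 87.14, 98.67, 69.71,
           74.00, 77.50, 96.96, 95.29, 97.79, 84.40, 89.20, 65.60, 76.00]

z = sign_test_z(wsc_acc, svm_acc)
print(f"z = {z:.2f}; significant at Z_0.95 = 1.729: {z > 1.729}")  # z = 1.15; significant at Z_0.95 = 1.729: False
```

WSC wins on 12 of the 19 data sets against SVM, which gives z of about 1.15, below the 1.729 threshold.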
Conclusions
  • The proposed methods directly employ the
    class-wise similarity measure used in each
    individual classifier for combination, without
    changing the representation from similarity to
  • The proposed combination significantly improves
    the average classification accuracy over the
    nineteen data sets. The average classification
    accuracy of WSC is better than that of any
    individual classifier and of the majority
    voting-based combination method.
  • The statistical test also shows that the
    proposed combination method WSC is significantly
    better than any individual classifier, with the
    exception of SVM.
  • The average classification accuracy of WSC is
    still better than that of SVM with a 2.49
  • Further research is required into how to
    combine heterogeneous classifiers using
    class-wise similarity-based combination methods.

References
  • (Michie et al. 1994) D. Michie, D.J. Spiegelhalter
    and C.C. Taylor. Machine Learning, Neural and
    Statistical Classification. Ellis Horwood, 1994.
  • (Guo et al. 2003) G. Guo, H. Wang, D. Bell, Y. Bi
    and K. Greer. kNN Model-Based Approach in
    Classification. In Proc. of ODBASE 2003, LNCS
    2888/2003, pp. 986-996, 2003.
  • (Guo et al. 2004) G. Guo, H. Wang, D. Bell and
    Z. Liao. Contextual Probability-Based
    Classification. In Proc. of ER 2004, LNCS
    3288/2004, pp. 313-326, Springer-Verlag, 2004.
  • (Kuncheva, 2003) L.I. Kuncheva. Combining
    Classifiers: Soft Computing Solutions. In S.K. Pal
    (Eds.), Pattern Recognition: From Classical to
    Modern Approaches, pp. 427-452, World Scientific,
    Singapore, 2003.

Thank you very much!