Title: The use of Boolean concepts in general classification contexts
Slide 1: The use of Boolean concepts in general classification contexts
- Miguel Moreira
- December 2000
Slide 2: Data classification?

Slide 3: Data classification?
[Figure: an instance x to be classified, described by attributes (attr1, attr2, attr3, ...) and carrying a class label (e.g. B).]

Slide 4: Data classification?
Goal of a classification system: given new examples, determine their class label.
[Figure: a classification system assigning a class label (e.g. B) to a new instance.]
Slide 5: Classification model
At the core of the classification system is a classification model:
- it is an approximation of the target concept F
- it is built using a set of training data, representative of F
- it must generalize to new (previously unseen) data
The model can be of various types:
- neural networks
- SVM
- decision trees
- LAD
Slide 6: Boolean classification model
The target concept is modeled by a Boolean function: Boolean vectors of size B at the input, Boolean values at the output (B = number of Boolean attributes).
[Figure: sample Boolean input vectors mapped to 0/1 outputs.]
Slide 7: Boolean classification model
Why use a Boolean-based model?
- the classification of a new instance is based on whether it is covered by positive or negative patterns
[Figure: positive and negative data in a space of Boolean attributes b; patterns group data of one class.]
Slide 8: Logical Analysis of Data (LAD)
- the LAD classification model is based on patterns
Example pattern: "if it is a holiday travel, and the train is less expensive, and the destination is not overseas, then we take the train" (applies to 40% of the negative cases).
- positive: we go by plane; negative: we go by train
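The pattern-based decision rule can be sketched as follows; the dict encoding of patterns and the simple vote used for tie-breaking are illustrative assumptions, not the thesis's exact LAD formulation.

```python
# Sketch: classify by pattern coverage. A pattern is a conjunction of
# literals over Boolean attributes, encoded as {attribute_index: value}.

def covers(pattern, x):
    """True if the Boolean vector x satisfies every literal of the pattern."""
    return all(x[i] == v for i, v in pattern.items())

def classify(x, pos_patterns, neg_patterns):
    """Simple vote: count the covering positive vs. negative patterns."""
    pos = sum(covers(p, x) for p in pos_patterns)
    neg = sum(covers(p, x) for p in neg_patterns)
    if pos > neg:
        return "positive"    # e.g. "we go by plane"
    if neg > pos:
        return "negative"    # e.g. "we go by train"
    return "unclassified"    # tie: no covering-pattern majority

# Toy attributes: (holiday, train_cheaper, overseas)
neg_patterns = [{0: 1, 1: 1, 2: 0}]  # holiday & cheaper train & not overseas
pos_patterns = [{2: 1}]              # overseas

print(classify([1, 1, 0], pos_patterns, neg_patterns))  # negative
```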
Slide 9: Adapting the Boolean model
[Figure: the general classification framework mapped onto the Boolean framework.]
Slide 10: The thesis, part 1: input mapping
- Goal: transform arbitrary input data (in ℝ^A) into Boolean format

Slide 11: The thesis, part 2: output mapping
- Goal: allow models with two-class output to be applied to multi-class problems

Slide 12: The thesis, part 3: multi-class Boolean model
- Goal: make the Boolean model directly applicable to multi-class problems
[Figure: the classification system with a multi-class Boolean model at its core.]
Slides 13-14: The thesis, part 1: input mapping
Slide 15: Boolean transformation
Construct a mapping m to transform arbitrary input data into Boolean format.

  original data            x1           x2        x3
  gender                   F            F         M
  marital_status           married      single    married
  education                high_school  bachelor  bachelor
  age                      56           26        40

  Boolean image (via m)    x1  x2  x3
  gender = female          1   1   0
  marital_status = single  0   1   0
  education > masters      0   0   0
  age > 30                 1   0   1
  age > 50                 1   0   0

- the input data is represented in numeric format
- m: ℝ^A → {0,1}^B, where A is the number of regular attributes and B the number of Boolean attributes
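The slide's example mapping can be written out directly; the attribute encodings (notably the hypothetical ordinal scale used for "education > masters") are assumptions for illustration.

```python
# Sketch of the slide's mapping m. Each Boolean attribute is a predicate
# over the regular attributes; the ordinal education scale below is a
# hypothetical assumption made to express "education > masters".

EDU_RANK = {"high_school": 0, "bachelor": 1, "masters": 2, "phd": 3}

def m(instance):
    """Map one instance (a dict of regular attributes) to a Boolean vector."""
    return [
        int(instance["gender"] == "F"),                              # gender = female
        int(instance["marital_status"] == "single"),                 # marital_status = single
        int(EDU_RANK[instance["education"]] > EDU_RANK["masters"]),  # education > masters
        int(instance["age"] > 30),                                   # age > 30
        int(instance["age"] > 50),                                   # age > 50
    ]

x1 = {"gender": "F", "marital_status": "married",
      "education": "high_school", "age": 56}
print(m(x1))  # [1, 0, 0, 1, 1]
```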
Slide 16: Boolean transformation
Consistency constraint: instances of different classes must have different Boolean images:
  F(x) ≠ F(x') ⇒ m(x) ≠ m(x')
In some cases, a higher consistency level may be better:
  F(x) ≠ F(x') ⇒ distance(m(x), m(x')) ≥ c
Desired properties:
- minimize the number of Boolean attributes B (it influences the model generation complexity)
- a fast procedure
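A minimal sketch of checking the (strengthened) consistency constraint, assuming Hamming distance as the distance between Boolean images:

```python
# Sketch: check the consistency constraint for a mapping m, using
# Hamming distance between Boolean images; c = 1 recovers the plain
# "different classes => different images" requirement.

from itertools import combinations

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

def is_consistent(instances, labels, m, c=1):
    images = [m(x) for x in instances]
    return all(hamming(images[i], images[j]) >= c
               for i, j in combinations(range(len(instances)), 2)
               if labels[i] != labels[j])

# Toy check on already-Boolean data with the identity mapping:
X = [[0, 0], [0, 1], [1, 1]]
y = ["neg", "pos", "pos"]
print(is_consistent(X, y, m=lambda x: x))       # True
print(is_consistent(X, y, m=lambda x: x, c=2))  # False
```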
Slide 17: Eliminative approach
[Figure: a two-dimensional data set over attributes a1 and a2; for each discriminant, ask "is it redundant?" and, if yes, eliminate it.]

Slide 18: Eliminative approach
1. project the data along each attribute
2. insert an exhaustive set of discriminants
3. eliminate redundant discriminants iteratively
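The three steps can be sketched as follows, under the assumption that discriminants are threshold cuts on numeric attributes and that "redundant" means removable without breaking consistency:

```python
# Sketch of the eliminative approach (assumed formulation): start from an
# exhaustive set of cut-point discriminants and greedily drop any
# discriminant whose removal keeps instances of different classes
# distinguishable in the Boolean image.

def exhaustive_cuts(X):
    """Steps 1-2: project along each attribute, one cut between values."""
    cuts = []
    for a in range(len(X[0])):
        vals = sorted({x[a] for x in X})
        cuts += [(a, (lo + hi) / 2) for lo, hi in zip(vals, vals[1:])]
    return cuts

def image(x, cuts):
    return tuple(int(x[a] > t) for a, t in cuts)

def consistent(X, y, cuts):
    seen = {}
    for x, label in zip(X, y):
        if seen.setdefault(image(x, cuts), label) != label:
            return False
    return True

def eliminate(X, y, cuts):
    """Step 3: iteratively drop redundant discriminants."""
    kept = list(cuts)
    for c in list(cuts):
        trial = [k for k in kept if k != c]
        if consistent(X, y, trial):
            kept = trial
    return kept

X = [[1.0], [2.0], [3.0], [4.0]]
y = ["A", "A", "B", "B"]
print(eliminate(X, y, exhaustive_cuts(X)))  # [(0, 2.5)] -- one cut suffices
```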
Slide 19: Experimental setup
(described in Appendices A and B)
- 22 data sets from the UCI Machine Learning repository: abalone, adult, breast-cancer, credit, dermatology, ecoli, glass, heart-disease, hepatitis, ionosphere, letter, mushroom, optdigits, pendigits, pi-diabetes, segmentation, soybean, spambase, voting, vowel, wine, yeast
- 155 to 48842 instances, 2 to 26 classes, 7 to 64 attributes per data set
- performance is measured using repetitions of 50% training / 50% testing data splits: the 5x2 cross-validation scheme with a statistical test [Dietterich, 1998; Alpaydin, 1999]
- the C4.5 decision tree algorithm is often used as base learner [Quinlan, 1993]
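The 5x2 cross-validation protocol can be sketched as follows; the dummy majority-class learner is only there to make the example runnable.

```python
# Sketch of the 5x2 cross-validation protocol: five repetitions of a
# 50/50 split; each half serves once for training and once for testing,
# giving ten accuracy estimates per learner.

import random

def five_by_two_cv(X, y, train_and_score, seed=0):
    rng = random.Random(seed)
    idx = list(range(len(X)))
    accs = []
    for _ in range(5):
        rng.shuffle(idx)
        half = len(idx) // 2
        a, b = idx[:half], idx[half:]
        for tr, te in ((a, b), (b, a)):
            accs.append(train_and_score(
                [X[i] for i in tr], [y[i] for i in tr],
                [X[i] for i in te], [y[i] for i in te]))
    return accs  # ten scores; feed these to the 5x2cv statistical test

# Dummy learner that predicts the majority training class:
def majority(Xtr, ytr, Xte, yte):
    guess = max(set(ytr), key=ytr.count)
    return sum(lab == guess for lab in yte) / len(yte)

scores = five_by_two_cv([[i] for i in range(20)], ["A"] * 10 + ["B"] * 10, majority)
print(len(scores))  # 10
```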
Slide 20: Results
Results averaged over the 22 data sets.
[Charts: average execution time (sec.), maximum execution time (minutes), final number of discriminants (out of the total), and classification accuracy, each plotted against the consistency level (in IDEAL).]
Simple-Greedy is an incremental approach [Almuallim and Dietterich, 1991, 1994].
Slides 21-22: The thesis, part 2: output mapping
Slide 23: Solving multi-class problems using two-class classifiers
Motivation:
- a binary classifier can only take two-class decisions
- some interesting classification algorithms (other than LAD) are binary (e.g. SVMs are binary classifiers, and so are ANNs, in essence)
Solution:
- decompose the original problem into several two-class sub-problems
- apply a binary classifier to each sub-problem
- combine the answers to the sub-problems in order to generate the final class decision (reconstruction)
Slide 24: Decomposition scheme
A dichotomy is a classification problem involving two classes. Each dichotomy makes a positive/negative re-labeling of the original classes.
[Figure: original classes A-F re-labeled by several dichotomies f_q; applying the dichotomies to a data instance x produces a binary output code.]
Slide 25: Decomposition matrix
The decomposition matrix D has one row per class c_k (k = 1...K) and one column per dichotomy (q = 1...Q).
Ternary logic is useful: entries take values in {-1, 0, +1}.
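A decomposition matrix and its reconstruction step can be sketched as follows, using a one-per-class scheme as the a priori example; the disagreement-count decoder is an assumption for illustration.

```python
# Sketch: a ternary decomposition matrix D (rows = classes, columns =
# dichotomies; +1 / -1 re-label a class, 0 leaves it out) and a simple
# reconstruction step that picks the closest row.

# One-per-class decomposition for K = 3 classes (an a priori scheme):
D = {
    "A": (+1, -1, -1),
    "B": (-1, +1, -1),
    "C": (-1, -1, +1),
}

def decode(code, D):
    """Pick the class whose row best matches the binary output code;
    0 entries ('class not used by this dichotomy') never disagree."""
    def disagreement(row):
        return sum(r != 0 and r != c for r, c in zip(row, code))
    return min(D, key=lambda k: disagreement(D[k]))

print(decode((+1, -1, -1), D))  # A (exact match)
print(decode((+1, +1, -1), D))  # A (ties with B; broken by class order)
```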
Slide 26: Some existing decomposition schemes
[Table: existing a priori decomposition schemes, with sizes expressed in terms of K, the number of classes.]
Slides 27-29: A priori / a posteriori schemes
- all existing schemes are defined a priori
- the decomposition matrix is generated independently of the data
- this may create complex dichotomies (awkward class groupings)
[Figure: an ECOC decomposition matrix D; successive builds highlight awkward class groupings.]
Slides 30-36: Pertinent dichotomies
- dichotomies are defined a posteriori (depending on the data)
- iteratively
[Figures: step-by-step construction of the decomposition matrix PD for classes A-D, adding one dichotomy at a time and updating the pairwise separation counts.]
algorithm PertinentDichotomies (Chap. 4)
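The a posteriori, iterative flavor of the approach can be sketched as follows; the candidate pool, the pairwise-separation objective, and the stopping rule are simplified stand-ins for the actual PertinentDichotomies algorithm of Chap. 4.

```python
# Sketch of an iterative, a posteriori construction in the spirit of
# PertinentDichotomies: keep adding dichotomies (ternary class
# re-labelings) until every pair of classes is separated at least s
# times. The pool is assumed rich enough to separate every class pair.

from itertools import combinations

def separations(classes, dichotomies):
    """Count, for each class pair, the dichotomies labeling them with
    opposite signs (a 0 entry leaves the class out)."""
    return {(a, b): sum(d[a] * d[b] == -1 for d in dichotomies)
            for a, b in combinations(classes, 2)}

def pertinent_dichotomies(classes, candidates, s=1):
    chosen = []
    while min(separations(classes, chosen).values()) < s:
        sep = separations(classes, chosen)
        weakest = min(sep, key=sep.get)            # least-separated pair
        best = max(candidates,                      # candidate fixing it
                   key=lambda d: d[weakest[0]] * d[weakest[1]] == -1)
        chosen.append(best)
        candidates = [d for d in candidates if d is not best]
    return chosen

classes = ["A", "B", "C"]
candidates = [
    {"A": +1, "B": -1, "C": 0},
    {"A": 0, "B": +1, "C": -1},
    {"A": +1, "B": 0, "C": -1},
]
pd = pertinent_dichotomies(classes, candidates)
print(len(pd))  # 3 -- each candidate separates exactly one pair
```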
Slide 37: Results
Results averaged over 12 data sets.
[Charts: model complexity (number of DT nodes), number of dichotomies, and classification accuracy (%).]
Slides 38-39: The thesis, part 3: multi-class Boolean model
[Figure: the classification system with a multi-class Boolean model at its core.]
Slide 40: A Boolean multi-class model
Motivation:
- the challenge of creating a multi-class version of LAD
- one possibility: use of decomposition schemes
- an alternative: integrate multi-class mechanisms inside the method
Procedure:
- patterns are generated iteratively
- a single, common pattern set is shared by all classes (instead of a pattern set per class)
- inspired by the algorithm of pertinent dichotomies
Slides 41-47: A Boolean multi-class model
[Figures: step-by-step pattern generation on an example with instances 1-14 and Boolean attributes b1-b5; each candidate pattern (e.g. b4, its complement, b2) is evaluated on a separation vs. coverage trade-off, and class multiplicities are updated as patterns are added.]
algorithm MultiClassLAD (Chap. 5)
Slide 48: Model generation and use
algorithm GeneratePattern (Chap. 6)
- the search for each pattern p is formulated as an optimization problem
- Tabu search is used to find a solution
- each feasible solution is a pattern
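The optimization view can be sketched with a small Tabu search over candidate patterns; the neighborhood (set/clear one literal), the coverage-based objective, and the tabu tenure are illustrative assumptions, not the thesis's GeneratePattern.

```python
# Sketch: search for one pattern with Tabu search. A pattern is a
# conjunction of literals {attribute_index: value}; the objective favors
# covering many positive and no negative instances.

def covers(p, x):
    return all(x[i] == v for i, v in p.items())

def score(p, pos, neg):
    return (sum(covers(p, x) for x in pos)
            - 10 * sum(covers(p, x) for x in neg))

def generate_pattern(pos, neg, n_attrs, iters=50, tenure=3):
    current = {}
    best, best_score = dict(current), score(current, pos, neg)
    tabu = {}  # attribute -> first iteration at which it may move again
    for it in range(iters):
        candidates = []
        for i in range(n_attrs):
            for v in (None, 0, 1):          # None = drop the literal on i
                if current.get(i) == v:
                    continue
                nb = {k: w for k, w in current.items() if k != i}
                if v is not None:
                    nb[i] = v
                s = score(nb, pos, neg)
                # tabu unless it improves the best found (aspiration)
                if tabu.get(i, -1) <= it or s > best_score:
                    candidates.append((s, i, nb))
        if not candidates:
            break
        s, i, nb = max(candidates, key=lambda t: t[0])
        current, tabu[i] = nb, it + tenure
        if s > best_score:
            best, best_score = dict(nb), s
    return best

pos = [[1, 0, 0], [1, 0, 1]]
neg = [[0, 0, 0], [1, 1, 0]]
best_p = generate_pattern(pos, neg, n_attrs=3)
print(score(best_p, pos, neg))  # covers both positives, no negatives
```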
Slide 49: Results
Results averaged over the 22 data sets.
[Charts: execution time (seconds), model size (number of patterns), and classification accuracy (%).]
Compared methods include:
- CN2 [Clark and Niblett, 1989]
- Tabata [Brézellec and Soldano, 1998]
Slide 50: Model interpretability (two classes)
Examples from the spambase data set, with two classes: the instances are e-mail messages and the classes are "contains spam" / "does not contain spam".
[Example patterns shown; one of them applies to no messages without spam.]
Slide 51: Model interpretability (multi-class)
Examples from the dermatology data set, with six classes: the instances are patients suffering from a skin disease (erythemato-squamous), and the classes are six different varieties of that disease.
Slides 52-53: Conclusions
Slide 54: Future directions
Boolean mapping:
- handle noise conveniently
- relax the consistency constraint; use an unsupervised approach
Two-class → multi-class:
- make a similar analysis using different types of classifiers
Boolean multi-class model:
- further explore the possibilities of knowledge extraction
- comparison with a decomposed model (regarding both classification and knowledge extraction)
- use cross-validation techniques to improve the model