Title: Bayesian network classifiers versus selective k-NN classifier
1. Bayesian network classifiers versus selective k-NN classifier
- Franz Pernkopf
- Pattern Recognition, pp. 1-10, 2005
- Presented by ???
- Group 6
- ??? ??? ??? ??? ???
2. Outline
- Introduction
- Bayesian network classifier
- Feature selection algorithms
- Experiment
- Conclusion
3. Introduction (1/2)
- Compares two differently structured classifiers: Bayesian network classifiers and the k-nearest neighbor (k-NN) classifier.
- The k-NN classifier uses a feature subset established by means of sequential feature selection methods.
- A Bayesian network B = <G, Θ> is a directed acyclic graph G.
- G models the probabilistic relationships among the random variables U = {X1, ..., Xn, O} = {U1, ..., U_{n+1}}.
- Θ represents the set of parameters which quantify the network.
- Each node Ui (an attribute Xi or the class O) holds a local conditional probability distribution P(Ui | Π_Ui) given its parents Π_Ui; the joint probability distribution is P(U) = ∏_{i=1}^{n+1} P(Ui | Π_Ui).
4. Introduction (2/2)
- Classification performance is a key concern for pattern recognition and data analysis methods.
- Feature selection: reducing the feature set may even improve the classification rate.
- The feature selection algorithms considered here are sequential feature selection algorithms.
- Bayesian networks model the statistical dependencies between attributes.
5. Bayesian network classifier (1/8)
- Three types of Bayesian network classifiers:
- Naïve Bayesian classifier (NB)
- Tree Augmented Naïve Bayesian network (TAN)
- Selective Unrestricted Bayesian network (SUN)
- Two techniques for learning the parameters Θ:
- Maximum likelihood estimation
- The Bayesian approach
6. Bayesian network classifier (2/8)
- Naïve Bayesian classifier (NB)
- Xi: an attribute; O: the class variable, which is the only parent of each Xi (Π_Ui = {O})
7. Bayesian network classifier (3/8)
- The naïve Bayesian decision rule assumes that:
- All attributes Xi are conditionally independent given the class label of node O.
- The Selective Naïve Bayesian classifier (SNB) is an extension of the naïve Bayesian decision rule.
- The joint probability distribution for this network is P(U) = P(X1, ..., Xn, O) = P(O) ∏_i P(Xi | O).
- The conditional probability of the classes O given the values of the attributes is P(O | X1, ..., Xn) = α P(O) ∏_i P(Xi | O).
- α is a normalization constant.
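The decision rule above can be sketched in a few lines of Python. The maximum-likelihood counting and the toy data below are illustrative assumptions, not taken from the paper:

```python
from collections import Counter, defaultdict

def train_nb(samples, labels):
    """Estimate P(O) and P(Xi | O) by maximum-likelihood counting on discrete
    data; return a function computing the posterior
    P(O | X1, ..., Xn) = alpha * P(O) * prod_i P(Xi | O)."""
    n_features = len(samples[0])
    class_count = Counter(labels)
    total = len(labels)
    # cond[i][(o, x)] = number of training samples of class o with Xi = x
    cond = [defaultdict(int) for _ in range(n_features)]
    for x_vec, o in zip(samples, labels):
        for i, x in enumerate(x_vec):
            cond[i][(o, x)] += 1

    def posterior(x_vec):
        scores = {}
        for o, c in class_count.items():
            p = c / total                    # P(O = o)
            for i, x in enumerate(x_vec):
                p *= cond[i][(o, x)] / c     # P(Xi = x | O = o)
            scores[o] = p
        z = sum(scores.values()) or 1.0      # normalization constant alpha
        return {o: p / z for o, p in scores.items()}

    return posterior

# toy training set: two binary attributes, two classes
post = train_nb([(0, 0), (0, 1), (1, 1), (1, 0)], ["a", "a", "b", "b"])
```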
8. Bayesian network classifier (4/8)
- Tree Augmented Naïve Bayesian network (TAN)
9. Bayesian network classifier (5/8)
- The independence assumption of the NB classifier is often unrealistic; the TAN allows additional edges (arcs) between two attributes Xi and Xj that are not independent.
- The posterior probability P(O | X1, ..., Xn) takes all the attributes into account; at most n - 1 arcs between attributes are allowed.
- The feature selection and the augmenting arcs are found by means of a search algorithm.
10. Bayesian network classifier (6/8)
- Selective Unrestricted Bayesian network (SUN)
11. Bayesian network classifier (7/8)
- Selective Unrestricted Bayesian network:
- A generalization of the Tree Augmented naïve Bayesian network.
- The class node is treated like an attribute node and may have attribute nodes as parents.
- The classifier is based on a subset of selected features.
- The size of the conditional probability table of a node increases exponentially with its number of parents.
- The posterior probability distribution of O given the values of all attributes is only sensitive to those attributes which form the Markov blanket of O (found by the search algorithm).
12. Bayesian network classifier (8/8)
- Hill Climbing Search (HCS): learns the structure of the Bayesian network.
- Classical Floating Search (CFS) algorithm:
- Used for feature selection for the TAN and SUN.
- Main disadvantage of the hill climbing search:
- Once an arc has been added to the network structure, the algorithm has no mechanism for removing it at a later stage.
- It therefore suffers from the nesting effect.
- The floating search method is used to overcome this drawback.
- It needs more evaluations to obtain the network structure and is computationally less efficient than the hill climbing search.
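The greedy behavior criticized above can be sketched as follows. The candidate arcs, the score function, and the stopping rule are illustrative assumptions; the paper's actual scoring criterion is not shown on the slide:

```python
def hill_climb(candidate_arcs, score):
    """Greedy hill climbing over network structure: repeatedly add the single
    arc that most improves the score. Added arcs are never removed again,
    which is exactly the nesting effect the floating search avoids."""
    chosen, rest = [], list(candidate_arcs)
    best = score(chosen)
    while rest:
        trial = max(rest, key=lambda a: score(chosen + [a]))
        if score(chosen + [trial]) <= best:
            break                 # no single arc improves the structure score
        best = score(chosen + [trial])
        chosen.append(trial)
        rest.remove(trial)
    return chosen

# illustrative score: reward arcs of a hypothetical "true" structure,
# with a small penalty per arc to keep the network sparse
TRUE = {("X1", "X2"), ("X2", "X3")}
score = lambda arcs: sum(1 for a in arcs if a in TRUE) - 0.1 * len(arcs)
result = hill_climb([("X1", "X2"), ("X2", "X3"), ("X1", "X3")], score)
```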
13. Feature selection algorithms (1/5)
- Two major groups: the filter approach and the wrapper approach.
- Filter approach:
- Assesses features from the data set; the selection is mainly based on statistical measures. Suited for applications where huge data sets are considered.
- Wrapper approach:
- Often more appropriate: the classifier's performance is used for evaluating the feature subset. Achieves a high predictive accuracy, but at high computational cost.
- A taxonomy of feature selection algorithms is shown in Fig. 4.
14. Feature selection algorithms (2/5)
15. Feature selection algorithms (3/5): Optimal methods
- Exhaustive search:
- The total number of competing subsets is 2^n - 1, where n is the number of extracted features.
- If the size d of the final feature subset is given, the total number of subsets is q = n! / ((n - d)! d!).
- Branch-and-bound:
- Advantage: faster than exhaustive search.
- Drawback: requires a monotonic feature selection criterion (adding a feature cannot decrease the evaluation function), which is not fulfilled by every evaluation criterion, e.g. the k-NN classifier.
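The subset counts above can be checked directly; the numbers below are plain arithmetic, not results from the paper:

```python
from math import comb

def n_subsets(n):
    """Total number of non-empty feature subsets of n extracted features."""
    return 2 ** n - 1

def n_subsets_of_size(n, d):
    """Number of subsets of exactly d out of n features: n! / ((n - d)! d!)."""
    return comb(n, d)

print(n_subsets(10))             # candidate subsets for just 10 features
print(n_subsets_of_size(42, 5))  # fixed-size-5 subsets of 42 features
```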
16. Feature selection algorithms (4/5): Suboptimal methods
- Genetic algorithms
- Sequential feature selection algorithms:
- Sequential forward selection (SFS):
- A bottom-up search method.
- In each iteration one feature is added to the subset, chosen so that the new subset maximizes the evaluation criterion J.
- Drawback: no mechanism for rejecting an already selected feature, even if it becomes superfluous. This effect is called nesting.
- Sequential backward selection (SBS):
- The top-down counterpart of the SFS.
- In each iteration one feature is rejected so that the remaining subset gives the best value of the evaluation criterion J.
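SFS can be sketched for a generic wrapper criterion J. The additive J in the example is a stand-in; in the paper J is the classification performance of the wrapped classifier:

```python
def sfs(features, J, d):
    """Sequential forward selection (bottom-up): in each iteration add the
    feature that maximizes the criterion J for the enlarged subset.
    There is no mechanism to drop a feature again (nesting effect)."""
    subset, remaining = [], list(features)
    while len(subset) < d and remaining:
        best = max(remaining, key=lambda f: J(subset + [f]))
        subset.append(best)
        remaining.remove(best)
    return subset

# illustrative criterion: each feature contributes a fixed score
weights = {"a": 3.0, "b": 1.0, "c": 2.0}
J = lambda s: sum(weights[f] for f in s)
picked = sfs(["a", "b", "c"], J, 2)
```

SBS is the mirror image: start from the full set and repeatedly drop the feature whose removal keeps J highest.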
17. Feature selection algorithms (5/5): Suboptimal methods
- Sequential forward floating selection (SFFS):
- Adapted for learning the structure of the Bayesian network classifiers.
- Drawback: more time consuming than the SFS, especially when the data are of great complexity.
- Adaptive Sequential Forward Floating Selection (ASFFS(rmax, b, d)) [33]:
- Similar to the SFFS procedure, but the number of backward steps r is determined dynamically; its maximum is restricted by a user-defined bound rmax.
- r depends on the current subset size k and the desired subset size d.
- The parameter b defines a neighborhood around d in which the search is more thorough.
- This algorithm is initialized with the empty subset.
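The floating idea amounts to SFS plus conditional backward steps. This is a simplified sketch; the termination details of the published SFFS/ASFFS procedures are omitted:

```python
def sffs(features, J, d):
    """Sequential forward floating selection: after every forward step,
    conditionally remove features again as long as a removal improves the
    criterion J -- this is what breaks the nesting effect of plain SFS."""
    subset, remaining = [], list(features)
    while len(subset) < d and remaining:
        # forward step (as in SFS)
        f = max(remaining, key=lambda x: J(subset + [x]))
        subset.append(f)
        remaining.remove(f)
        # floating backward steps
        while len(subset) > 2:
            # never undo the forward step just made (standard SFFS guard)
            candidates = [x for x in subset if x != f]
            worst = max(candidates,
                        key=lambda x: J([g for g in subset if g != x]))
            reduced = [g for g in subset if g != worst]
            if J(reduced) <= J(subset):
                break           # no removal improves J: stop backtracking
            subset = reduced
            remaining.append(worst)
    return subset

weights = {"a": 3.0, "b": 1.0, "c": 2.0, "e": 5.0}
J = lambda s: sum(weights[f] for f in s)
chosen = sffs(["a", "b", "c", "e"], J, 3)
```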
18. Experiments
- The Bayesian network classifiers use discretized features (recursive minimal entropy partitioning); zero entries in the conditional probability tables are replaced with ε = 0.00001.
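The zero-probability patch can be sketched as follows; the table layout is a hypothetical example, and only the value ε = 0.00001 comes from the slide:

```python
EPS = 0.00001

def patch_zeros(cpt):
    """Replace zero entries of a conditional probability table with a small
    epsilon, so that a single unseen attribute value cannot force the whole
    product P(O) * prod_i P(Xi | O) to zero."""
    return {key: (p if p > 0 else EPS) for key, p in cpt.items()}

# hypothetical CPT rows P(X = x | O = o)
cpt = {("x=0", "o=a"): 0.8, ("x=1", "o=a"): 0.2, ("x=2", "o=a"): 0.0}
patched = patch_zeros(cpt)
```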
- Compared classifiers:
- NB: Naive Bayes classifier
- CFS-SNB: Selective Naive Bayesian classifier (Classical Floating Search)
- HCS-TAN: Tree Augmented Naive Bayesian classifier (Hill-Climbing Search)
- CFS-TAN: Tree Augmented Naive Bayesian classifier (Classical Floating Search)
- CFS-SUN: Selective Unrestricted Bayesian network (Classical Floating Search)
- SFFS-k-NN-C: k-NN classifier (k ∈ {1, 3, 5, 9}), continuous-valued data and the SFFS method
- SFFS-k-NN-D: k-NN classifier (k ∈ {1, 3, 5, 9}), discrete-valued data and the SFFS method
19. Experiments: First experiment (1/8)
- The data set consists of 516 surface segments (each sample: 42 features per surface segment).
- The data set is divided into six subsets for finding the optimal classifier (five-fold cross-validation; each part comprises 90 samples).
20. Experiments: First experiment (2/8)
- The floating algorithms SFFS and ASFFS perform best.
- (With ASFFS(3, 4, 5), a subset size of 5 within a neighborhood of 4 is optimized more thoroughly.)
- The number of classifier evaluations is only 5086 for the SFFS,
- compared to 7768 for the ASFFS.
- GSFS and GPTA need 6391 and 13201 classifier evaluations and perform worse than the SFFS method.
- PTA and SFS achieve the lowest scores for different subset sizes; 2623 and 903 evaluations are necessary.
- Due to the floating property, the computational costs depend on the characteristics of the data set.
- The SFFS achieves a good trade-off between computational demands and classification rate, so the further feature selection results consider only this method.
21. Experiments: First experiment (3/8)
- 230 parameters; the structure has 12 features and 14 arcs.
22. Experiments: First experiment (4/8)
23. Experiments: First experiment (5/8)
- Compares the SFFS approach to five Bayesian network methods:
- (CV5): accuracy estimate
- (H): performance
- (Evaluations): number of classifier evaluations
- (Parameters): number of independent probabilities
- (Features): number of features
- (Arcs): number of arcs
- The CFS-SUN achieves the best accuracy estimate among the five Bayesian networks.
- Additionally, the number of evaluations of the TAN is high compared to the CFS-SUN, since the Markov blanket is used for the SUN.
24. Experiments: First experiment (6/8)
- In terms of the accuracy estimate, the k-NN slightly outperforms the CFS-SUN.
- The CFS-SUN is simple to evaluate but still maintains a high predictive accuracy.
- The Bayesian networks outperform the k-NN methods in terms of memory requirements and computational demands.
- The k-NN is time consuming for a large number of samples, and a large amount of memory might be required.
- The optimal size of the feature subset has to be considered:
- Discriminatory information may be lost if too few features are used.
- A smaller feature set results in lower computational costs, since fewer features have to be extracted and the dimensionality of the feature space is lower.
- Additionally, a small set of features used for classification may perform better on new data samples.
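The memory and time costs discussed above follow directly from how k-NN works; here is a minimal sketch of k-NN on a selected feature subset (the data and the feature indices are illustrative):

```python
from collections import Counter

def knn_predict(train_x, train_y, query, k, feat_idx):
    """k-NN on a selected feature subset: the classifier keeps ALL training
    samples in memory and computes one distance per stored sample for every
    query -- the costs contrasted with the Bayesian network classifiers."""
    def sq_dist(a, b):
        return sum((a[i] - b[i]) ** 2 for i in feat_idx)
    nearest = sorted(range(len(train_x)),
                     key=lambda j: sq_dist(train_x[j], query))[:k]
    votes = Counter(train_y[j] for j in nearest)
    return votes.most_common(1)[0][0]

X = [(0.0, 5.0), (0.1, -3.0), (1.0, 9.0), (1.1, 2.0)]
y = ["a", "a", "b", "b"]
label = knn_predict(X, y, (0.05, 100.0), k=3, feat_idx=[0])  # feature 1 ignored
```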
25. Experiments: Second experiment (7/8)
26. Experiments: Second experiment (8/8)
27. Conclusions
- The Bayesian network classifiers more often achieve a better classification rate on different data sets than the selective k-NN classifiers.
- The Bayesian networks outperform the k-NN methods in terms of memory requirements and computational demands.
28. Discussion
- How do "accuracy" and "rate" differ as used here, and should "sequential feature selection methods" read "sequential feature selection algorithms"?
- How were the memory requirements and computational demands measured and compared?
- How do the results depend on the characteristics of the data set and the chosen model?
- Could these methods be applied to our own project, and what would have to change?