1
Bayesian network classifiers versus selective
k-NN classifier
  • Franz Pernkopf
  • Pattern Recognition, pp.1-10, 2005
  • Presented by Group 6

2
Outline
  • Introduction
  • Bayesian network classifier
  • Feature selection algorithms
  • Experiment
  • Conclusion

3
Introduction (1/2)
  • Compares different structures of Bayesian
    network classifiers with the k-nearest
    neighbor (k-NN) classifier.
  • The k-NN classifier uses a feature subset
    established by means of sequential feature
    selection methods.
  • A Bayesian network B = &lt;G, Θ&gt; is a directed
    acyclic graph G.
  • G models probabilistic relationships among the
    random variables U = {X1, . . . , Xn, O} =
    {U1, . . . , Un+1}.
  • Θ represents the set of parameters which
    quantify the network.
  • Each node Ui (an attribute Xi or the class O)
    holds a local conditional probability
    distribution P(Ui | ΠUi) given its parents ΠUi;
    the joint probability distribution is
    P(U) = P(X1, . . . , Xn, O) = ∏ P(Ui | ΠUi),
    with the product over i = 1, . . . , n+1
    (see the sketch below).
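The factorization above can be illustrated with a minimal sketch; this is not from the paper, and the network structure and CPT values are made up for illustration:

```python
# Minimal sketch (not from the paper): the joint probability of a Bayesian
# network factorizes into the local conditional probabilities of each node
# given its parents.  The structure and CPT values below are made up.

# Network over U = {X1, X2, O}: O has no parents, X1 and X2 have parent O.
parents = {"O": [], "X1": ["O"], "X2": ["O"]}

# Conditional probability tables, indexed by (node value, parent values).
cpt = {
    "O":  {(1, ()): 0.6, (0, ()): 0.4},
    "X1": {(1, (1,)): 0.8, (0, (1,)): 0.2, (1, (0,)): 0.3, (0, (0,)): 0.7},
    "X2": {(1, (1,)): 0.5, (0, (1,)): 0.5, (1, (0,)): 0.9, (0, (0,)): 0.1},
}

def joint_probability(assignment):
    """P(U) = product over i of P(Ui | parents(Ui)) for a full assignment."""
    p = 1.0
    for node, pa in parents.items():
        pa_values = tuple(assignment[q] for q in pa)
        p *= cpt[node][(assignment[node], pa_values)]
    return p

print(joint_probability({"O": 1, "X1": 1, "X2": 0}))  # 0.6 * 0.8 * 0.5 = 0.24
```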

4
Introduction (2/2)
  • Classification performance is a central concern
    in pattern recognition and data analysis
    methods.
  • Feature selection: a reduction of the feature
    set may even improve the classification rate.
  • Feature selection algorithms used here:
    sequential feature selection algorithms.
  • Bayesian network classifiers model statistical
    dependencies between the attributes.

5
Bayesian network classifier (1/8)
  • Three types of Bayesian network classifiers
  • Naïve Bayesian classifier (NB)
  • Tree Augmented Naïve Bayesian classifier (TAN)
  • Selective Unrestricted Bayesian Network (SUN)
  • Two techniques for parameter (Θ) learning
  • Maximum likelihood estimation
  • Bayesian approach

6
Bayesian network classifier (2/8)
  • Naïve Bayesian classifier (NB)
  • Xi: each attribute; O: class variable (the only
    parent of every attribute, ΠXi = {O})

7
Bayesian network classifier (3/8)
  • The naïve Bayesian decision rule assumes that
    all the attributes Xi are conditionally
    independent given the class label of node O.
  • Selective Naïve Bayesian classifier (SNB): an
    extension of the naïve Bayesian decision rule
    that uses only a subset of the attributes.
  • The joint probability distribution for this
    network is P(U) = P(X1, . . . , Xn, O) =
    P(O) ∏ P(Xi | O).
  • The conditional probability for the classes in
    O given the values of the attributes is
    P(O | X1, . . . , Xn) = α P(O) ∏ P(Xi | O)
    (sketched below).
  • α is a normalization constant.
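A minimal sketch (not from the paper) of this decision rule; the class priors and class-conditional probabilities below are made up:

```python
# Naive Bayesian decision rule: P(O | X1..Xn) = alpha * P(O) * prod_i P(Xi | O).
# The class priors and conditional probabilities below are illustrative only.

priors = {"good": 0.7, "bad": 0.3}                       # P(O)
conditionals = {                                         # P(Xi = value | O)
    "good": [{"low": 0.6, "high": 0.4}, {"low": 0.8, "high": 0.2}],
    "bad":  [{"low": 0.1, "high": 0.9}, {"low": 0.5, "high": 0.5}],
}

def posterior(x):
    """Return P(O | x) for an attribute vector x."""
    unnormalized = {}
    for o, prior in priors.items():
        p = prior
        for i, value in enumerate(x):
            p *= conditionals[o][i][value]
        unnormalized[o] = p
    alpha = 1.0 / sum(unnormalized.values())             # normalization constant
    return {o: alpha * p for o, p in unnormalized.items()}

# The class with the largest posterior probability is chosen.
print(posterior(["high", "low"]))
```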

8
Bayesian network classifier (4/8)
  • Tree Augmented Naïve Bayesian classifier (TAN)

9
Bayesian network classifier (5/8)
  • The conditional independence assumption of NB
    is unrealistic; additional edges (arcs) are
    allowed between two attributes Xi and Xj that
    are not independent.
  • The posterior probability P(O | X1, . . . , Xn)
    takes all the attributes into account. The
    maximum number of arcs between attributes is
    n-1 (tree structure).
  • The feature subset and the arcs are found by
    means of a search algorithm.

10
Bayesian network classifier (6/8)
  • Selective Unrestricted Bayesian Network (SUN)

11
Bayesian network classifier (7/8)
  • Selective Unrestricted Bayesian Network (SUN)
  • Generalization of the Tree Augmented Naïve
    Bayesian network
  • The class node is treated like an attribute
    node and may itself have attribute nodes as
    parents
  • The classifier is based on a subset of selected
    features
  • The size of the conditional probability tables
    of the nodes grows exponentially with the
    number of parents
  • The posterior probability distribution of O
    given the values of all attributes is only
    sensitive to those attributes which form the
    Markov blanket of O (determined by the search
    algorithm); a small sketch follows below.
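A minimal sketch (not from the paper) of how the Markov blanket can be read off a given structure; the example graph is made up:

```python
# The Markov blanket of the class node O consists of its parents, its
# children, and its children's other parents; only these attributes
# influence P(O | X1, ..., Xn).  The example structure below is made up.

def markov_blanket(node, parents):
    """`parents` maps every node to the list of its parents in the DAG."""
    children = [v for v, pa in parents.items() if node in pa]
    spouses = {p for c in children for p in parents[c] if p != node}
    return set(parents[node]) | set(children) | spouses

# Illustrative structure: X1 -> O, O -> X2, X3 -> X2, X4 not connected to O.
structure = {"O": ["X1"], "X1": [], "X2": ["O", "X3"], "X3": [], "X4": []}
print(markov_blanket("O", structure))   # the blanket is {X1, X2, X3}
```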

12
Bayesian network classifier (8/8)
  • Hill Climbing Search (HCS): learns the
    structure of the Bayesian network (sketched
    below)
  • Classical Floating Search (CFS) algorithm:
    feature selection for TAN and SUN
  • Main disadvantage of the hill climbing search:
    once an arc has been added to the network
    structure, the algorithm has no mechanism for
    removing the arc at a later stage.
  • It therefore suffers from the nesting effect.
  • The floating search method is used to overcome
    this drawback.
  • It needs more evaluations to obtain the network
    structure and is computationally less efficient
    than the hill climbing search.
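A minimal greedy hill-climbing sketch (not the paper's implementation); `score` is an assumed user-supplied evaluation function, e.g. a cross-validated classification rate:

```python
# Greedy hill-climbing structure search: in each step, add the single arc
# that improves the network score the most; stop when no arc helps.
# Acyclicity checking is omitted in this sketch.

def hill_climbing(nodes, score):
    arcs = set()
    current = score(arcs)
    while True:
        candidates = [(u, v) for u in nodes for v in nodes
                      if u != v and (u, v) not in arcs]
        best_arc, best_score = None, current
        for arc in candidates:
            s = score(arcs | {arc})        # evaluate the network with one extra arc
            if s > best_score:
                best_arc, best_score = arc, s
        if best_arc is None:               # no arc improves the score: stop
            return arcs
        arcs.add(best_arc)                 # added arcs are never removed (nesting)
        current = best_score
```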

13
Feature selection algorithms (1/5)
  • Two major groups: the filter approach and the
    wrapper approach
  • Filter approach
  • Assesses features from the data set alone; the
    selection is mainly based on statistical
    measures; suited to applications where huge
    data sets are considered.
  • Wrapper approach
  • More appropriate here; the classifier's
    performance is used for evaluating the feature
    subset; achieves a high predictive accuracy at
    high computational cost.
  • A taxonomy of feature selection algorithms is
    shown in Fig. 4.

14
Feature selection algorithms (2/5)
15
Feature selection algorithms (3/5): Optimal methods
  • Exhaustive search
  • The total number of competing subsets is given
    by 2^n - 1, where n is the number of extracted
    features.
  • If the size of the final feature subset d is
    given, the total number of subsets is
    q = n! / ((n-d)! d!) (see the small check below).
  • Branch-and-bound
  • Advantage: faster than exhaustive search.
  • Drawback: requires a monotonic feature selection
    criterion, i.e. adding a new feature cannot
    decrease the evaluation function; this is not
    fulfilled by every evaluation criterion, e.g.
    the k-NN classifier.
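A small numeric check of these counts (illustrative values: n = 42 extracted features as in the first experiment, d = 5 selected features):

```python
# Number of candidate feature subsets for exhaustive search.
from math import comb

n, d = 42, 5
print(2**n - 1)    # all non-empty subsets: 4,398,046,511,103
print(comb(n, d))  # subsets of fixed size d = n!/((n-d)!*d!): 850,668
```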

16
Feature selection algorithms (4/5): Suboptimal methods
  • Genetic algorithms
  • Sequential feature selection algorithms
  • Sequential forward selection (SFS)
  • Bottom-up search method (sketched below)
  • In each iteration one feature is added to the
    subset, so that the new subset maximizes the
    evaluation criterion J.
  • Drawback: no mechanism for rejecting an already
    selected feature, even if it becomes
    superfluous. This effect is called nesting.
  • Sequential backward selection (SBS)
  • Counterpart of the SFS (top-down)
  • In each iteration one feature is rejected so
    that the remaining subset gives the best value
    of the evaluation criterion J.
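A minimal sketch (not the paper's implementation) of SFS; `J` is an assumed user-supplied evaluation criterion, e.g. the cross-validated k-NN classification rate:

```python
# Sequential forward selection: start from the empty set and greedily add
# the feature that maximizes the criterion J until size d is reached.

def sfs(features, J, d):
    subset = []
    while len(subset) < d:
        remaining = [f for f in features if f not in subset]
        best = max(remaining, key=lambda f: J(subset + [f]))
        subset.append(best)        # once added, never removed again -> nesting
    return subset
```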

17
Feature selection algorithms (5/5): Suboptimal methods
  • Sequential forward floating selection (SFFS)
  • Adapted for learning the structure of Bayesian
    network classifiers (sketched below)
  • Drawback: more time consuming than SFS,
    especially when the data set is of great
    complexity.
  • Adaptive Sequential Forward Floating Selection
    (ASFFS(rmax, b, d)) [33]
  • Similar to the SFFS procedure; the number of
    backward steps r is determined dynamically, its
    maximum restricted by a user-defined bound rmax.
  • r depends on the parameter d and on the current
    subset size k.
  • Parameter b specifies a neighborhood of d in
    which the search is performed more thoroughly.
  • The algorithm is initialized with an empty
    subset.
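A minimal sketch (not the paper's implementation) of the floating idea behind SFFS; `J` is again an assumed user-supplied criterion:

```python
# Sequential forward floating selection: after every forward step,
# conditionally remove features again as long as the reduced subset beats
# the best subset of that size found so far; this avoids the nesting of SFS.

def sffs(features, J, d):
    subset, best_score = [], {}            # best criterion value per subset size
    while len(subset) < d:
        # Inclusion: add the feature that maximizes J.
        remaining = [f for f in features if f not in subset]
        subset = max((subset + [f] for f in remaining), key=J)
        best_score[len(subset)] = max(best_score.get(len(subset), float("-inf")),
                                      J(subset))
        # Conditional exclusion (the "floating" part).
        while len(subset) > 2:
            reduced = max(([g for g in subset if g != f] for f in subset), key=J)
            if J(reduced) > best_score[len(reduced)]:
                subset = reduced
                best_score[len(subset)] = J(reduced)
            else:
                break
    return subset
```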

18
Experiments
  • The Bayesian network classifiers use discretized
    features (recursive minimal-entropy
    partitioning); zero entries in the conditional
    probability tables are replaced with
    ε = 0.00001.
  • NB: Naive Bayes classifier
  • CFS-SNB: Selective Naive Bayesian classifier
    (Classical Floating Search)
  • HCS-TAN: Tree Augmented Naive Bayesian classifier
    (Hill-Climbing Search)
  • CFS-TAN: Tree Augmented Naive Bayesian classifier
    (Classical Floating Search)
  • CFS-SUN: Selective Unrestricted Bayesian network
    (Classical Floating Search)
  • SFFS-k-NN-C: k-NN classifier (k ∈ {1, 3, 5, 9})
    (continuous-valued data and the SFFS method)
  • SFFS-k-NN-D: k-NN classifier (k ∈ {1, 3, 5, 9})
    (discrete-valued data and the SFFS method)

19
Experiments - First experiment (1/8)
  • The data set consists of 516 surface segments
  • (42 features per surface segment, i.e. per
    sample)
  • The data set is divided into six subsets for
    finding the optimal classifier (five-fold
    cross-validation, each part comprising 90
    samples).

20
Experiments - First experiment (2/8)
  • The floating algorithms SFFS and ASFFS perform
    better.
  • (With ASFFS(3,4,5), a subset size of 5 within a
    neighborhood of 4 is optimized more thoroughly.)
  • The number of classifier evaluations is only
    5086 for the SFFS,
  • compared to the ASFFS with 7768.
  • GSFS and GPTA need 6391 and 13201 classifier
    evaluations (worse than the SFFS method).
  • PTA and SFS achieve the lowest scores for
    different sizes of subsets; 2623 and 903
    evaluations are necessary.
  • The computational costs depend on the
    characteristics of the data set due to the
    floating property.
  • SFFS achieves a good trade-off between
    computational demands and classification rate,
    so further feature selection results consider
    only this method.

21
Experiments - First experiment (3/8)
  • 230 parameters and a structure with 12 selected
    features and 14 arcs

22
Experiments - First experiment (4/8)
23
Experiments - First experiment (5/8)
  • Compares the SFFS approach to five Bayesian
    network methods
  • CV5: accuracy estimate (five-fold
    cross-validation)
  • H: performance
  • Evaluations: number of classifier evaluations
  • Parameters: number of independent probabilities
  • Features: number of features
  • Arcs: number of arcs
  • CFS-SUN achieves the best accuracy estimate
    among the five Bayesian network classifiers.
  • Additionally, the number of evaluations of the
    TAN is high compared to CFS-SUN, since the
    Markov blanket is used for the SUN.

24
Experiments - First experiment (6/8)
  • In terms of the accuracy estimate, the k-NN
    classifier slightly outperforms the CFS-SUN.
  • The CFS-SUN is simple to evaluate but still
    maintains a high predictive accuracy.
  • The Bayesian network classifiers outperform the
    k-NN methods in terms of memory requirements
    and computational demands.
  • The k-NN classifier is time consuming in case
    of a large number of samples, and a large
    amount of memory might be required.
  • When deciding on the optimal size of the
    feature subset, the following should be
    considered:
  • Discriminatory information may be lost (for too
    few features).
  • A smaller feature set results in lower
    computational costs, since fewer features have
    to be extracted and the dimensionality of the
    feature space is lower.
  • Additionally, a small set of features used for
    classification may perform better on new data
    samples.

25
Experiments - Second experiment (7/8)
26
Experiments - Second experiment (8/8)
27
5. Conclusions
  • The Bayesian network classifiers more often
    achieve a better classification rate on the
    different data sets than the selective k-NN
    classifiers.
  • The Bayesian network classifiers outperform the
    k-NN methods in terms of memory requirements
    and computational demands.
