1
Bayesian network classifiers versus selective
k-NN classifier
  • Franz Pernkopf
  • Pattern Recognition, pp.1-10, 2005
  • Presented by Group 6

2
Outline
  • Introduction
  • Bayesian network classifier
  • Feature selection algorithms
  • Experiment
  • Conclusion

3
Introduction (1/2)
  • Compares different structures of Bayesian
    network classifiers with the k-nearest
    neighbor (k-NN) classifier.
  • The k-NN classifier uses a feature subset
    established by means of sequential feature
    selection methods.
  • A Bayesian network B = &lt;G, Θ&gt; is a directed
    acyclic graph G.
  • G models probabilistic relationships among the
    random variables U = {X1, . . . , Xn, O} =
    {U1, . . . , Un+1}.
  • Θ represents the set of parameters which
    quantify the network.
  • Each node Ui (an attribute Xi or the class O)
    holds a local conditional probability
    distribution P(Ui | ΠUi) given its parents ΠUi;
    the joint probability distribution is
    P(U) = P(X1, . . . , Xn, O) = ∏ P(Ui | ΠUi),
    with the product over i = 1, . . . , n+1
    (see the sketch below).
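The factorization above can be illustrated with a minimal sketch; this is not from the paper, and the network structure and CPT values are made up for illustration:

```python
# Minimal sketch (not from the paper): the joint probability of a Bayesian
# network factorizes into the local conditional probabilities of each node
# given its parents.  The structure and CPT values below are made up.

# Network over U = {X1, X2, O}: O has no parents, X1 and X2 have parent O.
parents = {"O": [], "X1": ["O"], "X2": ["O"]}

# Conditional probability tables, indexed by (node value, parent values).
cpt = {
    "O":  {(1, ()): 0.6, (0, ()): 0.4},
    "X1": {(1, (1,)): 0.8, (0, (1,)): 0.2, (1, (0,)): 0.3, (0, (0,)): 0.7},
    "X2": {(1, (1,)): 0.5, (0, (1,)): 0.5, (1, (0,)): 0.9, (0, (0,)): 0.1},
}

def joint_probability(assignment):
    """P(U) = product over i of P(Ui | parents(Ui)) for a full assignment."""
    p = 1.0
    for node, pa in parents.items():
        pa_values = tuple(assignment[q] for q in pa)
        p *= cpt[node][(assignment[node], pa_values)]
    return p

print(joint_probability({"O": 1, "X1": 1, "X2": 0}))  # 0.6 * 0.8 * 0.5 = 0.24
```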

4
Introduction (2/2)
  • Classification performance is a central concern
    in pattern recognition and data analysis
    methods.
  • Feature selection: a reduction of the feature
    set may even improve the classification rate.
  • Feature selection algorithms used here:
    sequential feature selection algorithms.
  • Bayesian network classifiers model statistical
    dependencies between the attributes.

5
Bayesian network classifier (1/8)
  • Three types of Bayesian network classifiers
  • Naïve Bayesian classifier (NB)
  • Tree Augmented Naïve Bayesian classifier (TAN)
  • Selective Unrestricted Bayesian Network (SUN)
  • Two techniques for parameter (Θ) learning
  • Maximum likelihood estimation
  • Bayesian approach

6
Bayesian network classifier (2/8)
  • Naïve Bayesian classifier (NB)
  • Xi: each attribute; O: class variable (the only
    parent of every attribute, ΠXi = {O})

7
Bayesian network classifier (3/8)
  • The naïve Bayesian decision rule assumes that
    all the attributes Xi are conditionally
    independent given the class label of node O.
  • Selective Naïve Bayesian classifier (SNB): an
    extension of the naïve Bayesian decision rule
    that uses only a subset of the attributes.
  • The joint probability distribution for this
    network is P(U) = P(X1, . . . , Xn, O) =
    P(O) ∏ P(Xi | O).
  • The conditional probability for the classes in
    O given the values of the attributes is
    P(O | X1, . . . , Xn) = α P(O) ∏ P(Xi | O)
    (sketched below).
  • α is a normalization constant.
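A minimal sketch (not from the paper) of this decision rule; the class priors and class-conditional probabilities below are made up:

```python
# Naive Bayesian decision rule: P(O | X1..Xn) = alpha * P(O) * prod_i P(Xi | O).
# The class priors and conditional probabilities below are illustrative only.

priors = {"good": 0.7, "bad": 0.3}                       # P(O)
conditionals = {                                         # P(Xi = value | O)
    "good": [{"low": 0.6, "high": 0.4}, {"low": 0.8, "high": 0.2}],
    "bad":  [{"low": 0.1, "high": 0.9}, {"low": 0.5, "high": 0.5}],
}

def posterior(x):
    """Return P(O | x) for an attribute vector x."""
    unnormalized = {}
    for o, prior in priors.items():
        p = prior
        for i, value in enumerate(x):
            p *= conditionals[o][i][value]
        unnormalized[o] = p
    alpha = 1.0 / sum(unnormalized.values())             # normalization constant
    return {o: alpha * p for o, p in unnormalized.items()}

# The class with the largest posterior probability is chosen.
print(posterior(["high", "low"]))
```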

8
Bayesian network classifier (4/8)
  • Tree Augmented Naïve Bayesian classifier (TAN)

9
Bayesian network classifier (5/8)
  • The conditional independence assumption of NB
    is unrealistic; additional edges (arcs) are
    allowed between two attributes Xi and Xj that
    are not independent.
  • The posterior probability P(O | X1, . . . , Xn)
    takes all the attributes into account. The
    maximum number of arcs between attributes is
    n-1 (tree structure).
  • The feature subset and the arcs are found by
    means of a search algorithm.

10
Bayesian network classifier (6/8)
  • Selective Unrestricted Bayesian Network (SUN)

11
Bayesian network classifier (7/8)
  • Selective Unrestricted Bayesian Network (SUN)
  • Generalization of the Tree Augmented Naïve
    Bayesian network
  • The class node is treated like an attribute
    node and may itself have attribute nodes as
    parents
  • The classifier is based on a subset of selected
    features
  • The size of the conditional probability tables
    of the nodes grows exponentially with the
    number of parents
  • The posterior probability distribution of O
    given the values of all attributes is only
    sensitive to those attributes which form the
    Markov blanket of O (determined by the search
    algorithm); a small sketch follows below.
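A minimal sketch (not from the paper) of how the Markov blanket can be read off a given structure; the example graph is made up:

```python
# The Markov blanket of the class node O consists of its parents, its
# children, and its children's other parents; only these attributes
# influence P(O | X1, ..., Xn).  The example structure below is made up.

def markov_blanket(node, parents):
    """`parents` maps every node to the list of its parents in the DAG."""
    children = [v for v, pa in parents.items() if node in pa]
    spouses = {p for c in children for p in parents[c] if p != node}
    return set(parents[node]) | set(children) | spouses

# Illustrative structure: X1 -> O, O -> X2, X3 -> X2, X4 not connected to O.
structure = {"O": ["X1"], "X1": [], "X2": ["O", "X3"], "X3": [], "X4": []}
print(markov_blanket("O", structure))   # the blanket is {X1, X2, X3}
```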

12
Bayesian network classifier (8/8)
  • Hill Climbing Search (HCS): learns the
    structure of the Bayesian network (sketched
    below)
  • Classical Floating Search (CFS) algorithm:
    feature selection for TAN and SUN
  • Main disadvantage of the hill climbing search:
    once an arc has been added to the network
    structure, the algorithm has no mechanism for
    removing the arc at a later stage.
  • It therefore suffers from the nesting effect.
  • The floating search method is used to overcome
    this drawback.
  • It needs more evaluations to obtain the network
    structure and is computationally less efficient
    than the hill climbing search.
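A minimal greedy hill-climbing sketch (not the paper's implementation); `score` is an assumed user-supplied evaluation function, e.g. a cross-validated classification rate:

```python
# Greedy hill-climbing structure search: in each step, add the single arc
# that improves the network score the most; stop when no arc helps.
# Acyclicity checking is omitted in this sketch.

def hill_climbing(nodes, score):
    arcs = set()
    current = score(arcs)
    while True:
        candidates = [(u, v) for u in nodes for v in nodes
                      if u != v and (u, v) not in arcs]
        best_arc, best_score = None, current
        for arc in candidates:
            s = score(arcs | {arc})        # evaluate the network with one extra arc
            if s > best_score:
                best_arc, best_score = arc, s
        if best_arc is None:               # no arc improves the score: stop
            return arcs
        arcs.add(best_arc)                 # added arcs are never removed (nesting)
        current = best_score
```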

13
Feature selection algorithms (1/5)
  • Two major groups: the filter approach and the
    wrapper approach
  • Filter approach
  • Assesses features from the data set alone; the
    selection is mainly based on statistical
    measures; suited to applications where huge
    data sets are considered.
  • Wrapper approach
  • More appropriate here; the classifier's
    performance is used for evaluating the feature
    subset; achieves a high predictive accuracy at
    high computational cost.
  • A taxonomy of feature selection algorithms is
    shown in Fig. 4.

14
Feature selection algorithms (2/5)
15
Feature selection algorithms (3/5): Optimal methods
  • Exhaustive search
  • The total number of competing subsets is given
    by 2^n - 1, where n is the number of extracted
    features.
  • If the size of the final feature subset d is
    given, the total number of subsets is
    q = n! / ((n-d)! d!) (see the small check below).
  • Branch-and-bound
  • Advantage: faster than exhaustive search.
  • Drawback: requires a monotonic feature selection
    criterion, i.e. adding a new feature cannot
    decrease the evaluation function; this is not
    fulfilled by every evaluation criterion, e.g.
    the k-NN classifier.
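A small numeric check of these counts (illustrative values: n = 42 extracted features as in the first experiment, d = 5 selected features):

```python
# Number of candidate feature subsets for exhaustive search.
from math import comb

n, d = 42, 5
print(2**n - 1)    # all non-empty subsets: 4,398,046,511,103
print(comb(n, d))  # subsets of fixed size d = n!/((n-d)!*d!): 850,668
```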

16
Feature selection algorithms (4/5): Suboptimal methods
  • Genetic algorithms
  • Sequential feature selection algorithms
  • Sequential forward selection (SFS)
  • Bottom-up search method (sketched below)
  • In each iteration one feature is added to the
    subset, so that the new subset maximizes the
    evaluation criterion J.
  • Drawback: no mechanism for rejecting an already
    selected feature, even if it becomes
    superfluous. This effect is called nesting.
  • Sequential backward selection (SBS)
  • Counterpart of the SFS (top-down)
  • In each iteration one feature is rejected so
    that the remaining subset gives the best value
    of the evaluation criterion J.
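A minimal sketch (not the paper's implementation) of SFS; `J` is an assumed user-supplied evaluation criterion, e.g. the cross-validated k-NN classification rate:

```python
# Sequential forward selection: start from the empty set and greedily add
# the feature that maximizes the criterion J until size d is reached.

def sfs(features, J, d):
    subset = []
    while len(subset) < d:
        remaining = [f for f in features if f not in subset]
        best = max(remaining, key=lambda f: J(subset + [f]))
        subset.append(best)        # once added, never removed again -> nesting
    return subset
```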

17
Feature selection algorithms (5/5): Suboptimal methods
  • Sequential forward floating selection (SFFS)
  • Adapted for learning the structure of Bayesian
    network classifiers (sketched below)
  • Drawback: more time consuming than SFS,
    especially when the data set is of great
    complexity.
  • Adaptive Sequential Forward Floating Selection
    (ASFFS(rmax, b, d)) [33]
  • Similar to the SFFS procedure; the number of
    backward steps r is determined dynamically, its
    maximum restricted by a user-defined bound rmax.
  • r depends on the parameter d and on the current
    subset size k.
  • Parameter b specifies a neighborhood of d in
    which the search is performed more thoroughly.
  • The algorithm is initialized with an empty
    subset.
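A minimal sketch (not the paper's implementation) of the floating idea behind SFFS; `J` is again an assumed user-supplied criterion:

```python
# Sequential forward floating selection: after every forward step,
# conditionally remove features again as long as the reduced subset beats
# the best subset of that size found so far; this avoids the nesting of SFS.

def sffs(features, J, d):
    subset, best_score = [], {}            # best criterion value per subset size
    while len(subset) < d:
        # Inclusion: add the feature that maximizes J.
        remaining = [f for f in features if f not in subset]
        subset = max((subset + [f] for f in remaining), key=J)
        best_score[len(subset)] = max(best_score.get(len(subset), float("-inf")),
                                      J(subset))
        # Conditional exclusion (the "floating" part).
        while len(subset) > 2:
            reduced = max(([g for g in subset if g != f] for f in subset), key=J)
            if J(reduced) > best_score[len(reduced)]:
                subset = reduced
                best_score[len(subset)] = J(reduced)
            else:
                break
    return subset
```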

18
Experiments
  • The Bayesian network classifiers use discretized
    features (recursive minimal-entropy
    partitioning); zero entries in the conditional
    probability tables are replaced with
    ε = 0.00001.
  • NB: Naive Bayes classifier
  • CFS-SNB: Selective Naive Bayesian classifier
    (Classical Floating Search)
  • HCS-TAN: Tree Augmented Naive Bayesian classifier
    (Hill-Climbing Search)
  • CFS-TAN: Tree Augmented Naive Bayesian classifier
    (Classical Floating Search)
  • CFS-SUN: Selective Unrestricted Bayesian network
    (Classical Floating Search)
  • SFFS-k-NN-C: k-NN classifier (k ∈ {1, 3, 5, 9})
    (continuous-valued data and the SFFS method)
  • SFFS-k-NN-D: k-NN classifier (k ∈ {1, 3, 5, 9})
    (discrete-valued data and the SFFS method)

19
Experiments - First experiment (1/8)
  • The data set consists of 516 surface segments
  • (42 features per surface segment, i.e. per
    sample)
  • The data set is divided into six subsets for
    finding the optimal classifier (five-fold
    cross-validation, each part comprising 90
    samples).

20
Experiments - First experiment (2/8)
  • The floating algorithms SFFS and ASFFS perform
    better.
  • (With ASFFS(3,4,5), a subset size of 5 within a
    neighborhood of 4 is optimized more thoroughly.)
  • The number of classifier evaluations is only
    5086 for the SFFS,
  • compared to the ASFFS with 7768.
  • GSFS and GPTA need 6391 and 13201 classifier
    evaluations (worse than the SFFS method).
  • PTA and SFS achieve the lowest scores for
    different sizes of subsets; 2623 and 903
    evaluations are necessary.
  • The computational costs depend on the
    characteristics of the data set due to the
    floating property.
  • SFFS achieves a good trade-off between
    computational demands and classification rate,
    so further feature selection results consider
    only this method.

21
Experiments - First experiment (3/8)
  • 230 parameters and a structure with 12 selected
    features and 14 arcs

22
Experiments - First experiment (4/8)
23
Experiments - First experiment (5/8)
  • Compares the SFFS approach to five Bayesian
    network methods
  • CV5: accuracy estimate (five-fold
    cross-validation)
  • H: performance
  • Evaluations: number of classifier evaluations
  • Parameters: number of independent probabilities
  • Features: number of features
  • Arcs: number of arcs
  • CFS-SUN achieves the best accuracy estimate
    among the five Bayesian network classifiers.
  • Additionally, the number of evaluations of the
    TAN is high compared to CFS-SUN, since the
    Markov blanket is used for the SUN.

24
Experiments - First experiment (6/8)
  • In terms of the accuracy estimate, the k-NN
    classifier slightly outperforms the CFS-SUN.
  • The CFS-SUN is simple to evaluate but still
    maintains a high predictive accuracy.
  • The Bayesian network classifiers outperform the
    k-NN methods in terms of memory requirements
    and computational demands.
  • The k-NN classifier is time consuming in case
    of a large number of samples, and a large
    amount of memory might be required.
  • When deciding on the optimal size of the
    feature subset, the following should be
    considered:
  • Discriminatory information may be lost (for too
    few features).
  • A smaller feature set results in lower
    computational costs, since fewer features have
    to be extracted and the dimensionality of the
    feature space is lower.
  • Additionally, a small set of features used for
    classification may perform better on new data
    samples.

25
Experiments - Second experiment (7/8)
26
Experiments - Second experiment (8/8)
27
5. Conclusions
  • The Bayesian network classifiers more often
    achieve a better classification rate on the
    different data sets than the selective k-NN
    classifiers.
  • The Bayesian network classifiers outperform the
    k-NN methods in terms of memory requirements
    and computational demands.
