Bayesian network classifiers - PowerPoint PPT Presentation

About This Presentation
Title:

Bayesian network classifiers


Slides: 32
Provided by: yur96

Transcript and Presenter's Notes



1
Bayesian network classifiers
2
Classification
  • A basic task in
    • Data analysis
    • Pattern recognition
  • Requires a classifier

3
Classifier
  • Naive Bayes
  • C4.5
    • A state-of-the-art decision-tree classifier

4
Naive Bayes
  • Simple Bayesian classifier
  • Strong assumption that features are independent given the class
  • Competitive with state-of-the-art classifiers
  • Requires a small amount of training data

5
Naive Bayes
  • A fruit may be considered an apple if it is
    • 4 inches in diameter
    • Round
    • Red
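The apple example relies on each feature contributing to the class probability independently of the others once the class is known. Below is a minimal Python sketch of that rule, assuming categorical features, Laplace smoothing (`alpha`), and a made-up toy fruit data set; the names `train_naive_bayes` and `predict` are hypothetical. It scores each class by log P(c) + Σ_j log P(a_j | c) and returns the highest-scoring class.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels, alpha=1.0):
    """Count class frequencies and per-class attribute-value frequencies."""
    n_attrs = len(rows[0])
    class_counts = Counter(labels)
    # value_counts[j][c][v]: number of class-c rows whose attribute j equals v
    value_counts = [defaultdict(Counter) for _ in range(n_attrs)]
    domains = [set() for _ in range(n_attrs)]
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            value_counts[j][c][v] += 1
            domains[j].add(v)
    return class_counts, value_counts, domains, alpha

def predict(model, row):
    """Return argmax_c of log P(c) + sum_j log P(a_j | c)."""
    class_counts, value_counts, domains, alpha = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)
        for j, v in enumerate(row):
            # Laplace-smoothed estimate of P(a_j = v | c)
            p = (value_counts[j][c][v] + alpha) / (n_c + alpha * len(domains[j]))
            score += math.log(p)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Hypothetical toy data: (size, shape, color)
rows = [
    ("medium", "round", "red"), ("medium", "round", "green"),
    ("medium", "round", "red"), ("small", "round", "red"),
    ("small", "round", "red"), ("large", "long", "yellow"),
]
labels = ["apple", "apple", "apple", "cherry", "cherry", "banana"]
model = train_naive_bayes(rows, labels)
print(predict(model, ("medium", "round", "red")))  # apple
```

Smoothing keeps a single unseen value (for example, a "small" apple) from driving a class probability to zero.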

6
Naive Bayes
  • Can a classifier with less restrictive
    assumptions perform better?

7
Bayesian networks
  • Can represent and manipulate independence
    assertions
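To illustrate what these assertions buy, the snippet below encodes a hypothetical three-variable network (C → A1, C → A2, A1 → A2) with made-up conditional probability tables. It computes the joint distribution as the product of each variable's probability given its parents. Summing over all assignments then checks that the factorization is a proper distribution.

```python
import itertools

# Hypothetical network over binary variables: C -> A1, C -> A2, A1 -> A2
parents = {"C": (), "A1": ("C",), "A2": ("C", "A1")}
# cpt[var][parent_values] = P(var = 1 | parents = parent_values); made-up numbers
cpt = {
    "C": {(): 0.3},
    "A1": {(0,): 0.2, (1,): 0.7},
    "A2": {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9},
}

def joint_probability(assignment):
    """P(x_1, ..., x_n) = product over variables of P(x_i | parents(x_i))."""
    p = 1.0
    for var, pa in parents.items():
        p_one = cpt[var][tuple(assignment[q] for q in pa)]
        p *= p_one if assignment[var] == 1 else 1.0 - p_one
    return p

variables = list(parents)
total = sum(
    joint_probability(dict(zip(variables, values)))
    for values in itertools.product((0, 1), repeat=len(variables))
)
print(round(total, 10))  # 1.0
```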

8
Learning Bayesian networks
  • A form of unsupervised learning
    • Learner is given only unlabeled examples
  • Goal
    • Given training set D, find the network B that best
      matches D

9
Scoring function
  • Evaluates each network with respect to the
    training data
  • Learner searches for the network with the best score

10
Scoring function
  • Commonly used scoring functions for BNs
    • Bayesian scoring function
    • A function based on minimal description length (MDL)
  • Paper uses MDL

11
Minimal Description Length
  • Learner finds the model with the shortest description
    of the original data
  • Length of the description depends on
    • Description of the model (a network)
    • Description of the data using the model
  • MDL is asymptotically correct

12
Minimal Description Length
  • $\mathrm{MDL}(B \mid D) = \frac{\log N}{2}\,|B| - LL(B \mid D)$
  • $B$: Bayesian network
  • $D = \{u_1, \ldots, u_N\}$: training set of $N$ instances
  • $|B|$: number of parameters in $B$

13
Minimal Description Length
  • $\mathrm{MDL}(B \mid D) = \frac{\log N}{2}\,|B| - LL(B \mid D)$
  • $\frac{\log N}{2}\,|B|$
    • Representation length of network $B$
    • Counts the bits needed to encode $B$
    • $\frac{\log N}{2}$ bits are used for each parameter
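The penalty term needs the parameter count $|B|$. For discrete variables with unrestricted (multinomial) conditional probability tables, the usual count of free parameters is given below. Here $r_i$ is the number of values of variable $X_i$, and $q_i$ is the number of joint configurations of its parents ($q_i = 1$ for a variable with no parents). The worked numbers assume a hypothetical naive Bayes network over binary variables and base-2 logarithms, so the penalty is in bits.

```latex
|B| = \sum_{i} q_i\,(r_i - 1)

% Naive Bayes: binary class C, 3 binary attributes, each with parent C
|B| = \underbrace{1 \cdot (2 - 1)}_{C} + 3 \cdot \underbrace{2 \cdot (2 - 1)}_{A_j} = 7

% With N = 1024 training instances
\frac{\log_2 1024}{2}\,|B| = \frac{10}{2} \cdot 7 = 35 \text{ bits}
```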

14
Minimal Description Length
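  • $\mathrm{MDL}(B \mid D) = \frac{\log N}{2}\,|B| - LL(B \mid D)$
  • $LL(B \mid D) = \sum_{i=1}^{N} \log P_B(u_i)$
    • $-LL(B \mid D)$ is the negation of the log likelihood of $B$ given $D$
    • Measures the number of bits needed to describe $D$
      based on the probability distribution $P_B$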
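The sketch below makes the two terms concrete by computing the MDL score of a discrete network in Python. It assumes maximum-likelihood parameters (frequency counts), base-2 logarithms so both terms are in bits, and the free-parameter count $\sum_i q_i (r_i - 1)$ for $|B|$. The function name `mdl_score` and the data layout (rows as dicts, structure as a parents dict) are illustrative choices, not taken from the slides.

```python
import math
from collections import Counter

def mdl_score(data, parents, domains):
    """Return (MDL(B|D), LL(B|D), |B|) for structure `parents` with ML parameters.

    data:    list of dicts mapping variable name -> value
    parents: dict mapping variable -> tuple of parent variables
    domains: dict mapping variable -> list of possible values
    """
    n_rows = len(data)
    log_likelihood = 0.0
    n_params = 0
    for var, pa in parents.items():
        # Family counts N(x_i, pa_i) and parent-configuration counts N(pa_i)
        family = Counter((tuple(row[p] for p in pa), row[var]) for row in data)
        parent_counts = Counter(tuple(row[p] for p in pa) for row in data)
        # Plug the ML estimate N(x_i, pa_i) / N(pa_i) into the log likelihood
        for (pa_vals, _value), count in family.items():
            log_likelihood += count * math.log2(count / parent_counts[pa_vals])
        # q_i * (r_i - 1) free parameters in this conditional table
        q_i = math.prod(len(domains[p]) for p in pa)
        n_params += q_i * (len(domains[var]) - 1)
    penalty = math.log2(n_rows) / 2 * n_params
    return penalty - log_likelihood, log_likelihood, n_params

domains = {"X": [0, 1], "Y": [0, 1]}
data = [{"X": x, "Y": y} for x in (0, 1) for y in (0, 1)] * 2
empty = {"X": (), "Y": ()}
edge = {"X": (), "Y": ("X",)}
print(mdl_score(data, empty, domains))  # (19.0, -16.0, 2)
print(mdl_score(data, edge, domains))   # (20.5, -16.0, 3)
```

In this toy data X and Y are independent. Adding the edge X → Y therefore leaves LL unchanged (with ML parameters, LL never decreases when a parent is added), while |B| grows from 2 to 3. As a result, the empty network gets the lower MDL score.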

15
Log likelihood
  • Statistical interpretation
  • The higher the LL, the more closely B models the
    probability distribution in D

16
Log likelihood
  • Problem with LL
    • Favors fully connected graphs
    • Results in overfitting
  • Overfitting avoided by MDL
    • First term regulates complexity
    • Penalizes networks containing many parameters

17
Bayesian networks as classifiers
  • With MDL, attributes are no longer assumed to be independent
  • Problem in practice
    • A network with a relatively good MDL score
      can still be a poor classifier

18
Bayesian networks as classifiers
  • $LL(B \mid D) = \sum_{i=1}^{N} \log P_B(c^i \mid a_1^i, \ldots, a_n^i) + \sum_{i=1}^{N} \log P_B(a_1^i, \ldots, a_n^i)$
  • Rewritten version of the log likelihood function
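The rewrite applies the chain rule $P_B(c, a_1, \ldots, a_n) = P_B(c \mid a_1, \ldots, a_n)\,P_B(a_1, \ldots, a_n)$ to every instance. The Python check below uses a hypothetical joint table over a binary class and two binary attributes. It computes the two sums separately and confirms that they add up to LL.

```python
import math

# Hypothetical joint distribution P_B(c, a1, a2) over binary variables
joint = {
    (0, 0, 0): 0.20, (0, 0, 1): 0.10, (0, 1, 0): 0.05, (0, 1, 1): 0.15,
    (1, 0, 0): 0.05, (1, 0, 1): 0.10, (1, 1, 0): 0.20, (1, 1, 1): 0.15,
}
classes = (0, 1)

def marginal_attrs(attrs):
    """P_B(a1, ..., an) = sum over classes c of P_B(c, a1, ..., an)."""
    return sum(joint[(c,) + attrs] for c in classes)

def ll_terms(data):
    """Split LL(B|D) into sum_i log P_B(c|a) and sum_i log P_B(a)."""
    cond, attr = 0.0, 0.0
    for c, *attrs in data:
        attrs = tuple(attrs)
        p_a = marginal_attrs(attrs)
        cond += math.log(joint[(c,) + attrs] / p_a)
        attr += math.log(p_a)
    return cond, attr

data = [(0, 0, 0), (1, 1, 0), (0, 1, 1), (1, 0, 1), (0, 0, 0)]
cond, attr = ll_terms(data)
ll = sum(math.log(joint[u]) for u in data)
print(abs(ll - (cond + attr)) < 1e-12)  # True
```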

19
Bayesian networks as classifiers
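  • $LL(B \mid D) = \sum_{i=1}^{N} \log P_B(c^i \mid a_1^i, \ldots, a_n^i) + \sum_{i=1}^{N} \log P_B(a_1^i, \ldots, a_n^i)$
  • $\sum_{i=1}^{N} \log P_B(c^i \mid a_1^i, \ldots, a_n^i)$
    • Measures how well B estimates the probability of
      the class given the attributes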

20
Bayesian networks as classifiers
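  • $LL(B \mid D) = \sum_{i=1}^{N} \log P_B(c^i \mid a_1^i, \ldots, a_n^i) + \sum_{i=1}^{N} \log P_B(a_1^i, \ldots, a_n^i)$
  • $\sum_{i=1}^{N} \log P_B(a_1^i, \ldots, a_n^i)$
    • Measures how well B estimates the joint
      distribution of the attributes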

21
Bayesian networks as classifiers
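  • $LL(B \mid D) = \sum_{i=1}^{N} \log P_B(c^i \mid a_1^i, \ldots, a_n^i) + \sum_{i=1}^{N} \log P_B(a_1^i, \ldots, a_n^i)$
  • Problem
    • Only the first term is related to the score of the
      network as a classifier
    • The second term dominates when the number of attributes $n$ is large
    • Results in a poor classifier when $n$ is large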

22
Experiment BN vs NB
  • Observations
    • Unrestricted networks perform poorly on sets with
      5 attributes
    • Unrestricted networks perform significantly worse
      on sets with few relevant attributes

23
Experiment BN vs NB
  • Relevant attributes
    • Attributes contained in the Markov blanket
      of C
  • Bayesian networks
    • Only some attributes are in the Markov blanket of C (feature
      selection)
  • Naive Bayes
    • All attributes are in the Markov blanket of C
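Which attributes count as relevant can be read directly off the graph: the Markov blanket of C consists of its parents, its children, and its children's other parents. The short Python sketch below uses a hypothetical graph encoding in which each node maps to its set of parents. It contrasts a naive Bayes structure with an unrestricted one in which A4 is disconnected from C.

```python
def markov_blanket(node, parents):
    """Parents, children, and children's other parents of `node`.

    `parents` maps every node of the DAG to the set of its parents.
    """
    blanket = set(parents.get(node, ()))
    for child, pa in parents.items():
        if node in pa:
            blanket.add(child)   # a child of `node`
            blanket.update(pa)   # the child's other parents (spouses)
    blanket.discard(node)
    return blanket

naive = {"C": set(), "A1": {"C"}, "A2": {"C"}, "A3": {"C"}, "A4": {"C"}}
unrestricted = {"C": {"A1"}, "A1": set(), "A2": {"C", "A3"}, "A3": set(), "A4": set()}
print(sorted(markov_blanket("C", naive)))         # ['A1', 'A2', 'A3', 'A4']
print(sorted(markov_blanket("C", unrestricted)))  # ['A1', 'A2', 'A3']
```

In the unrestricted network, A4 drops out of C's blanket and is ignored when classifying. This is the implicit feature selection described above.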

24
Experiment BN vs NB
  • Feature selection
    • Can discard irrelevant attributes
    • Might discard crucial attributes
  • Result
    • A network with a better score is not necessarily a better
      classifier

25
Experiment BN vs NB
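  • Solution
    • Instead of LL, use the conditional log likelihood
      (CLL)
  • Problem
    • CLL does not decompose over the network's families, so the
      parameters that maximize it can no longer be computed in closed form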
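The sketch below evaluates CLL in Python for any model that can return $\log P_B(c, a)$. Each instance contributes $\log P_B(c^i, a^i) - \log \sum_{c} P_B(c, a^i)$, computed with a log-sum-exp for numerical stability. The naive Bayes parameters are made up. The normalizing sum over classes couples all parameters, so CLL does not split into independent per-family counts the way LL does.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def conditional_log_likelihood(data, log_joint, classes):
    """CLL = sum_i [log P_B(c_i, a_i) - log sum_c P_B(c, a_i)] = sum_i log P_B(c_i | a_i)."""
    total = 0.0
    for c, attrs in data:
        normalizer = log_sum_exp([log_joint(k, attrs) for k in classes])
        total += log_joint(c, attrs) - normalizer
    return total

# Made-up naive Bayes parameters: P(c) and P(a_j = 1 | c) for two binary attributes
prior = {0: 0.6, 1: 0.4}
p_one = {0: (0.2, 0.3), 1: (0.7, 0.8)}

def nb_log_joint(c, attrs):
    """log P(c) + sum_j log P(a_j | c) under the made-up parameters."""
    lp = math.log(prior[c])
    for p, a in zip(p_one[c], attrs):
        lp += math.log(p if a == 1 else 1.0 - p)
    return lp

data = [(0, (0, 0)), (1, (1, 1)), (1, (0, 1)), (0, (1, 0))]
cll = conditional_log_likelihood(data, nb_log_joint, (0, 1))
print(cll)
```

Maximizing this quantity would need an iterative optimizer over the parameters rather than frequency counts.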

26
Extensions to NB classifier
  • Maintain the basic structure of NB
    • All attributes are part of the class variable's Markov
      blanket
  • Relax the strong independence assumptions
  • Approaches
    • Augmented naive Bayesian networks
    • Bayesian multinets

27
Augmented NB
  • Connect attributes when needed (augmenting edges)
  • Use a tree-augmented naïve Bayesian network (TAN)
    • Every attribute has the class variable as a parent
    • At most one other attribute points to each attribute
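The usual way to pick the augmenting edges in TAN is a Chow-Liu style step. Each attribute pair is weighted by the conditional mutual information $I(A_i; A_j \mid C)$, a maximum-weight spanning tree is taken, and the tree is oriented away from a chosen root. The Python sketch below assumes categorical data, empirical frequency estimates, and attribute 0 as the root (an arbitrary choice); the function names are hypothetical. Each attribute's full parent set is C plus the tree parent returned here (none for the root).

```python
import math
from collections import Counter

def conditional_mutual_information(rows, labels, i, j):
    """Empirical I(A_i; A_j | C)."""
    n = len(rows)
    n_cxy = Counter((c, r[i], r[j]) for r, c in zip(rows, labels))
    n_cx = Counter((c, r[i]) for r, c in zip(rows, labels))
    n_cy = Counter((c, r[j]) for r, c in zip(rows, labels))
    n_c = Counter(labels)
    total = 0.0
    for (c, x, y), count in n_cxy.items():
        # P(x, y, c) * log[P(x, y | c) / (P(x | c) P(y | c))]
        total += count / n * math.log(count * n_c[c] / (n_cx[(c, x)] * n_cy[(c, y)]))
    return total

def tan_structure(rows, labels):
    """Map each attribute index to its tree parent (None for the root, attribute 0)."""
    n_attrs = len(rows[0])
    weight = {(i, j): conditional_mutual_information(rows, labels, i, j)
              for i in range(n_attrs) for j in range(i + 1, n_attrs)}
    # Prim's algorithm for a maximum-weight spanning tree grown from the root
    parent = {0: None}
    best = {j: (weight[(0, j)], 0) for j in range(1, n_attrs)}
    while best:
        j = max(best, key=lambda k: best[k][0])
        _, p = best.pop(j)
        parent[j] = p  # edge p -> j points away from the root
        for k in best:
            w = weight[(min(j, k), max(j, k))]
            if w > best[k][0]:
                best[k] = (w, j)
    return parent

# Hypothetical data: A1 copies A0; A2 is independent of both given the class
rows = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1),
        (0, 0, 1), (1, 1, 0), (0, 0, 0), (1, 1, 1)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
print(tan_structure(rows, labels))  # {0: None, 1: 0, 2: 0}
```

Only the edge A0 → A1 carries information here. A2 is attached with zero weight, which adds a parameter without changing the likelihood.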

28
Bayesian multinets
  • Partition the training data set by class
  • For each class, construct a BN (local network)
  • Bayesian multinet = set of local networks + prior
    on C
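Classification with a multinet picks $\arg\max_c \left[\log P(c) + \log P_{B_c}(a_1, \ldots, a_n)\right]$, where $B_c$ is the local network learned from class c's partition. In the Python sketch below, the per-class structures are supplied by hand (hypothetical; in practice each would be learned from its own partition, for example as a Chow-Liu tree), and the CPTs use Laplace smoothing. The two classes use different edges, which a single augmented naive Bayes network cannot express.

```python
import math
from collections import Counter

def fit_local_network(rows, structure, domains, alpha=1.0):
    """Count families for P(a_j | parents(a_j)) on one class's partition."""
    cpts = {}
    for j, pa in structure.items():
        family = Counter((tuple(r[p] for p in pa), r[j]) for r in rows)
        parent_counts = Counter(tuple(r[p] for p in pa) for r in rows)
        cpts[j] = (family, parent_counts)
    return cpts, structure, domains, alpha

def local_log_prob(network, row):
    """log P_{B_c}(a_1, ..., a_n) with Laplace-smoothed CPT entries."""
    cpts, structure, domains, alpha = network
    lp = 0.0
    for j, pa in structure.items():
        family, parent_counts = cpts[j]
        key = tuple(row[p] for p in pa)
        lp += math.log((family[(key, row[j])] + alpha) /
                       (parent_counts[key] + alpha * len(domains[j])))
    return lp

def fit_multinet(rows, labels, structures, domains):
    """Partition the data by class; fit a prior on C and one local network per class."""
    prior = {c: n / len(labels) for c, n in Counter(labels).items()}
    nets = {c: fit_local_network([r for r, l in zip(rows, labels) if l == c],
                                 structures[c], domains) for c in prior}
    return prior, nets

def classify(multinet, row):
    prior, nets = multinet
    return max(prior, key=lambda c: math.log(prior[c]) + local_log_prob(nets[c], row))

domains = {0: [0, 1], 1: [0, 1]}
structures = {"x": {0: (), 1: (0,)},  # class x: attribute 1 depends on attribute 0
              "y": {0: (), 1: ()}}    # class y: attributes independent
rows = [(0, 0), (1, 1), (0, 0), (1, 1), (0, 1), (1, 0), (0, 1), (1, 0)]
labels = ["x", "x", "x", "x", "y", "y", "y", "y"]
model = fit_multinet(rows, labels, structures, domains)
print(classify(model, (0, 0)), classify(model, (0, 1)))  # x y
```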

29
Comparison
  • Bayesian multinets
    • Further partition the data
    • Higher risk of inaccurately estimating edge weights
  • Augmented naive Bayesian networks
    • Force the same augmenting edges for all classes
  • Result
    • Overall performance is roughly equal

30
Overall results
  • BN can lead to significant improvement over NB
  • Unrestricted BNs can perform poorly when there are many attributes
  • TAN and CL (Chow-Liu multinets)
    • Roughly equal in terms of accuracy
    • Dominate NB

31
(No Transcript)