Bayesian network classifiers - PowerPoint PPT Presentation

About This Presentation

Title:

Bayesian network classifiers

Description:

Learner finds a model with shortest description of original data ... All attributes have class variable as parent. At most 1 attribute pointing to it ... – PowerPoint PPT presentation

Slides: 32

Provided by: yur96

Transcript and Presenter's Notes

Title: Bayesian network classifiers

1
Bayesian network classifiers
2
Classification

Basic task for
Data analysis
Pattern recognition
Requires a classifier

3
Classifier

Naive Bayesian
C4.5
A state-of-the-art classifier

4
Naive Bayes

Simple Bayesian classifier
Strong assumptions of independence among features
Competitive with state-of-the-art classifiers
Requires a small amount of training data

5
Naive Bayes

A fruit may be considered to be an apple if it is
About 4 inches in diameter
Round
Red
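
The apple example can be sketched in code. This is a minimal illustration, not the presenter's implementation: the toy fruit data, the function names `train_naive_bayes` and `predict`, and the use of add-one (Laplace) smoothing are all assumptions made for the sketch.

```python
from collections import Counter, defaultdict
import math

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    # feature_counts[(feature_index, class)][value] -> count
    feature_counts = defaultdict(Counter)
    values = defaultdict(set)  # distinct values seen for each feature
    for row, label in zip(rows, labels):
        for j, v in enumerate(row):
            feature_counts[(j, label)][v] += 1
            values[j].add(v)
    return class_counts, feature_counts, values

def predict(model, row):
    """Return the class maximizing log P(c) + sum_j log P(a_j | c)."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)
        for j, v in enumerate(row):
            # Add-one smoothing is an assumption of this sketch.
            num = feature_counts[(j, c)][v] + 1
            den = n_c + len(values[j])
            score += math.log(num / den)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Hypothetical toy data: (diameter, shape, color)
rows = [
    ("4in", "round", "red"),
    ("4in", "round", "green"),
    ("3in", "round", "red"),
    ("7in", "long", "yellow"),
    ("6in", "long", "yellow"),
    ("7in", "long", "green"),
]
labels = ["apple", "apple", "apple", "banana", "banana", "banana"]
model = train_naive_bayes(rows, labels)
prediction = predict(model, ("4in", "round", "red"))
```

Each feature contributes its own factor to the score, which is the independence assumption in action: no term looks at two features jointly.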

6
Naive Bayes

Can a classifier with less restrictive
assumptions perform better?

7
Bayesian networks

Can represent and manipulate independence
assertions

8
Learning Bayesian networks

Form of unsupervised learning
Learner does not distinguish the class variable from the attribute variables
Goal
Given training set D, find network B that best
matches D

9
Scoring function

Evaluates each network with respect to the
training data
Searches for best network

10
Scoring function

Commonly used scoring functions for Bayesian networks
Bayesian scoring function
Minimal description length (MDL)
A function based on MDL
Paper uses MDL

11
Minimal Description Length

Learner finds a model with shortest description
of original data
Length of description depends on
Description of the model (a network)
Description of data using the model
MDL is asymptotically correct

12
Minimal Description Length

$MDL(B \mid D) = \frac{\log N}{2}\,|B| - LL(B \mid D)$
$B$: Bayesian network
$D$: training set
$|B|$: number of parameters in $B$

13
Minimal Description Length

$MDL(B \mid D) = \frac{\log N}{2}\,|B| - LL(B \mid D)$
First term: $\frac{\log N}{2}\,|B|$
Length of the description of network $B$
Counts the bits needed to encode network $B$
$\frac{\log N}{2}$ bits are used for each parameter
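
A minimal sketch of how the first term could be computed. The slides do not say how parameters are counted or which log base is used, so this assumes the common convention of counting free parameters (one fewer than the number of states, per parent configuration) and log base 2 so the result is in bits. The network and the names `num_parameters` and `description_length_penalty` are hypothetical.

```python
import math

def num_parameters(cardinality, parents):
    """Free parameters: (states - 1) for each variable, per parent configuration."""
    total = 0
    for var, r in cardinality.items():
        q = 1  # number of joint parent configurations
        for p in parents.get(var, ()):
            q *= cardinality[p]
        total += (r - 1) * q
    return total

def description_length_penalty(n_params, n_samples):
    """First MDL term: (log N / 2) * |B|, in bits (log base 2 assumed)."""
    return 0.5 * math.log2(n_samples) * n_params

# Hypothetical network: class C (2 states) is the parent of
# A1 (3 states) and A2 (2 states).
cardinality = {"C": 2, "A1": 3, "A2": 2}
parents = {"A1": ["C"], "A2": ["C"]}
k = num_parameters(cardinality, parents)              # 1 + 4 + 2 = 7
penalty = description_length_penalty(k, n_samples=1024)  # 0.5 * 10 * 7 = 35 bits
```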

14
Minimal Description Length

$MDL(B \mid D) = \frac{\log N}{2}\,|B| - LL(B \mid D)$
$LL(B \mid D) = \sum_{i=1}^{N} \log P_B(u_i)$
Second term: negation of the log likelihood of $B$ given $D$
Measures the number of bits needed to describe $D$
based on the probability distribution $P_B$
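
As an illustration of the log likelihood sum, here is a small sketch for a two-variable network C -> A. The probability tables and the four data instances are made-up values chosen for the example, and log base 2 is assumed so the result is in bits.

```python
import math

# Hypothetical network C -> A with hand-picked parameters.
p_c = {0: 0.5, 1: 0.5}
p_a_given_c = {0: {0: 0.75, 1: 0.25}, 1: {0: 0.25, 1: 0.75}}

def joint(c, a):
    """P_B(c, a) = P(c) * P(a | c), the factorization given by the network."""
    return p_c[c] * p_a_given_c[c][a]

def log_likelihood(data):
    """LL(B|D): sum of log2 P_B(u_i) over all instances u_i = (c, a)."""
    return sum(math.log2(joint(c, a)) for c, a in data)

data = [(0, 0), (0, 0), (1, 1), (0, 1)]
ll = log_likelihood(data)
bits_for_data = -ll  # the second MDL term, in bits
```

Instances the network considers likely cost few bits; the unlikely instance `(0, 1)` alone costs 3 bits here.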

15
Log likelihood

Statistical interpretation
The higher the LL, the closer B is to modeling the
probability distribution in D

16
Log likelihood

Problem with LL
Favors fully connected graphs
Results in overfitting
Overfitting is avoided by MDL
First term regulates complexity
Penalizes networks containing many parameters
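
The effect can be checked on a small example. The sketch below assumes maximum-likelihood parameter estimates, log base 2, and a made-up data set of two nearly independent binary variables; the function names are hypothetical. Adding the edge X -> Y never lowers the log likelihood, but the MDL penalty for the extra parameter outweighs the tiny gain.

```python
import math
from collections import Counter

def ll_independent(data):
    """Max-likelihood LL (bits) for the empty graph: P(x, y) = P(x) P(y)."""
    n = len(data)
    cx = Counter(x for x, _ in data)
    cy = Counter(y for _, y in data)
    return sum(math.log2(cx[x] / n) + math.log2(cy[y] / n) for x, y in data)

def ll_with_edge(data):
    """Max-likelihood LL (bits) for X -> Y: P(x, y) = P(x) P(y | x)."""
    n = len(data)
    cx = Counter(x for x, _ in data)
    cxy = Counter(data)
    return sum(
        math.log2(cx[x] / n) + math.log2(cxy[(x, y)] / cx[x]) for x, y in data
    )

def mdl(ll, n_params, n):
    """MDL(B|D) = (log N / 2) |B| - LL(B|D); lower is better."""
    return 0.5 * math.log2(n) * n_params - ll

# Hypothetical data: two binary variables that are almost independent.
data = [(0, 0)] * 26 + [(0, 1)] * 24 + [(1, 0)] * 24 + [(1, 1)] * 26
n = len(data)
ll_empty, ll_edge = ll_independent(data), ll_with_edge(data)
mdl_empty = mdl(ll_empty, n_params=2, n=n)  # P(x), P(y)
mdl_edge = mdl(ll_edge, n_params=3, n=n)    # P(x), P(y|x=0), P(y|x=1)
```

Here `ll_edge` is slightly higher than `ll_empty`, yet `mdl_empty` is lower than `mdl_edge`, so MDL selects the simpler graph.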

17
Bayesian networks as classifiers

In networks learned with MDL, attributes are no longer assumed independent
Problem in practice
Network can have a relatively good MDL score
but still be a poor classifier

18
Bayesian networks as classifiers

$LL(B \mid D) = \sum_{i=1}^{N} \log P_B(c^i \mid a_1^i, \dots, a_n^i) + \sum_{i=1}^{N} \log P_B(a_1^i, \dots, a_n^i)$
Rewritten version of the log likelihood function
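
The rewriting follows from the chain rule of probability. A short derivation, assuming each training instance is $u_i = (c^i, a_1^i, \dots, a_n^i)$, that is, a class value together with its $n$ attribute values:

```latex
\begin{align*}
LL(B \mid D)
  &= \sum_{i=1}^{N} \log P_B(c^i, a_1^i, \dots, a_n^i) \\
  &= \sum_{i=1}^{N} \log \bigl[ P_B(c^i \mid a_1^i, \dots, a_n^i)\,
       P_B(a_1^i, \dots, a_n^i) \bigr] \\
  &= \sum_{i=1}^{N} \log P_B(c^i \mid a_1^i, \dots, a_n^i)
   + \sum_{i=1}^{N} \log P_B(a_1^i, \dots, a_n^i)
\end{align*}
```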

19
Bayesian networks as classifiers

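$LL(B \mid D) = \sum_{i=1}^{N} \log P_B(c^i \mid a_1^i, \dots, a_n^i) + \sum_{i=1}^{N} \log P_B(a_1^i, \dots, a_n^i)$
First term: $\sum_{i=1}^{N} \log P_B(c^i \mid a_1^i, \dots, a_n^i)$
Measures how well B estimates the probability of
the class given the attributes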

20
Bayesian networks as classifiers

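$LL(B \mid D) = \sum_{i=1}^{N} \log P_B(c^i \mid a_1^i, \dots, a_n^i) + \sum_{i=1}^{N} \log P_B(a_1^i, \dots, a_n^i)$
Second term: $\sum_{i=1}^{N} \log P_B(a_1^i, \dots, a_n^i)$
Measures how well B estimates the joint
distribution of the attributes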

21
Bayesian networks as classifiers

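$LL(B \mid D) = \sum_{i=1}^{N} \log P_B(c^i \mid a_1^i, \dots, a_n^i) + \sum_{i=1}^{N} \log P_B(a_1^i, \dots, a_n^i)$
Problem
Only the first term is related to the score of the
network as a classifier
Second term dominates when the number of attributes n is large
Results in a poor classifier when there are many attributes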
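
A small numeric illustration of how the attribute term can swamp the class term as attributes are added. The model is hypothetical: a binary class with P(c) = 0.5 and binary attributes that each match the class with probability 0.9, independently given the class. The function name `split_log_terms` and log base 2 are also assumptions of the sketch.

```python
import math

def split_log_terms(n_attrs, p_match=0.9):
    """Return (log2 P(c|a), log2 P(a)) for one instance whose attributes
    all equal its class, under the hypothetical model described above."""
    joint_true = 0.5 * p_match ** n_attrs          # P(c, a) for the true class
    joint_other = 0.5 * (1 - p_match) ** n_attrs   # P(c', a) for the other class
    p_a = joint_true + joint_other                 # marginal over the class
    return math.log2(joint_true / p_a), math.log2(p_a)

# Map each attribute count to its (class term, attribute term) pair.
results = {n: split_log_terms(n) for n in (2, 10, 50)}
```

The class term stays close to zero, while the magnitude of the attribute term grows roughly linearly with the number of attributes, so it accounts for almost all of the per-instance log likelihood.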

22
Experiment BN vs NB

Observations
Unrestricted networks perform poorly on sets with
5 attributes
Unrestricted networks perform significantly worse
on sets with few relevant attributes

23
Experiment BN vs NB