Title: Bayesian classifiers
1 Bayesian classifiers
2 Bayesian Classification: Why?
- Probabilistic learning: calculates explicit probabilities for hypotheses; among the most practical approaches to certain types of learning problems
- Incremental: each training example can incrementally increase or decrease the probability that a hypothesis is correct. Prior knowledge can be combined with observed data.
- Probabilistic prediction: predicts multiple hypotheses, weighted by their probabilities
- Standard: even when Bayesian methods are computationally intractable, they can provide a standard of optimal decision making against which other methods can be measured
3 Bayes Theorem
- Given training data D, the posterior probability of a hypothesis h, P(h|D), follows Bayes theorem
- MAP (maximum a posteriori) hypothesis: the hypothesis h that maximizes P(h|D)
- Practical difficulty: requires initial knowledge of many probabilities and has significant computational cost
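Written out, the theorem and the MAP hypothesis are as follows; the evidence P(D) can be dropped in the maximization because it does not depend on h:

    P(h \mid D) = \frac{P(D \mid h)\, P(h)}{P(D)}

    h_{MAP} = \arg\max_{h \in H} P(h \mid D) = \arg\max_{h \in H} P(D \mid h)\, P(h)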
4 Naïve Bayes Classifier (I)
- A simplifying assumption: attributes are conditionally independent given the class
- Greatly reduces the computation cost: only the class distributions need to be counted
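Under this assumption, the posterior of a class C given attribute values d_1, ..., d_n factorizes into one term per attribute:

    P(C \mid d_1, \ldots, d_n) \propto P(C) \prod_{i=1}^{n} P(d_i \mid C)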
5 Naïve Bayesian Classification
- If the i-th attribute is categorical: P(d_i|C) is estimated as the relative frequency of samples having value d_i for the i-th attribute in class C
- If the i-th attribute is continuous: P(d_i|C) is estimated through a Gaussian density function
- Computationally easy in both cases
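The continuous case commonly uses the Gaussian form below, with the mean \mu_C and standard deviation \sigma_C of the i-th attribute estimated from the training samples of class C:

    P(d_i \mid C) = \frac{1}{\sqrt{2\pi}\,\sigma_C} \exp\!\left(-\frac{(d_i - \mu_C)^2}{2\sigma_C^2}\right)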
6 Play-tennis example: estimating P(x_i|C)
7 Naïve Bayesian Classifier (II)
- Given a training set, we can compute these probabilities by simple counting (a sketch follows below)
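A minimal sketch of that counting step, assuming the training set is a list of (attribute-tuple, class) pairs; the dataset and the helper names (p_class, p_value_given_class) are illustrative, not from the slides:

    from collections import Counter, defaultdict

    # Hypothetical training pairs: (outlook, temperature, humidity, windy) -> class
    train = [
        (("sunny", "hot", "high", "false"), "n"),
        (("rain", "mild", "high", "false"), "p"),
        (("rain", "cool", "normal", "false"), "p"),
        (("overcast", "hot", "high", "false"), "p"),
        (("sunny", "mild", "high", "true"), "n"),
    ]

    class_counts = Counter(label for _, label in train)
    value_counts = defaultdict(Counter)   # (attribute index, class) -> value counts

    for attrs, label in train:
        for i, value in enumerate(attrs):
            value_counts[(i, label)][value] += 1

    def p_class(c):
        # Prior P(C): relative frequency of class c in the training set
        return class_counts[c] / len(train)

    def p_value_given_class(i, value, c):
        # P(x_i = value | C = c): relative frequency within class c
        return value_counts[(i, c)][value] / class_counts[c]

    print(p_class("p"), p_value_given_class(0, "rain", "p"))   # 0.6 and 2/3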
8 Play-tennis example: classifying X
- An unseen sample X = <rain, hot, high, false>
- P(X|p)·P(p) = P(rain|p)·P(hot|p)·P(high|p)·P(false|p)·P(p) = 3/9 · 2/9 · 3/9 · 6/9 · 9/14 = 0.010582
- P(X|n)·P(n) = P(rain|n)·P(hot|n)·P(high|n)·P(false|n)·P(n) = 2/5 · 2/5 · 4/5 · 2/5 · 5/14 = 0.018286
- Sample X is classified in class n (don't play)
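The same arithmetic as a quick check in Python, with the probabilities copied from the estimates above:

    # X = <rain, hot, high, false>; probabilities from the play-tennis estimates
    score_p = (3/9) * (2/9) * (3/9) * (6/9) * (9/14)   # P(X|p) * P(p)
    score_n = (2/5) * (2/5) * (4/5) * (2/5) * (5/14)   # P(X|n) * P(n)

    print(round(score_p, 6))   # 0.010582
    print(round(score_n, 6))   # 0.018286
    print("n (don't play)" if score_n > score_p else "p (play)")   # n (don't play)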
9 The independence hypothesis
- makes computation possible
- yields optimal classifiers when satisfied
- but is seldom satisfied in practice, as attributes (variables) are often correlated
- Attempts to overcome this limitation:
- Bayesian networks, which combine Bayesian reasoning with causal relationships between attributes
10 Bayesian Belief Networks (I)
(Figure: a Bayesian belief network over the variables Age, FamilyH, Diabetes, Mass, Insulin, and Glucose)
The conditional probability table for the variable Mass, whose parents are FamilyH (FH) and Age (A); columns are the four (FH, A) combinations:

            (FH, A)   (FH, ~A)   (~FH, A)   (~FH, ~A)
      M       0.7       0.8        0.5        0.1
     ~M       0.3       0.2        0.5        0.9
11 Applying Bayesian nets
- When all but one variable is known
- P(D|A,F,M,G,I)
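With D the only unknown, its posterior follows from the joint distribution encoded by the network, normalizing over the possible values of D:

    P(D \mid A, F, M, G, I) = \frac{P(A, F, M, G, I, D)}{\sum_{d} P(A, F, M, G, I, D = d)}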
12 Bayesian belief network
- Find the joint probability over a set of variables, making use of conditional independence whenever it is known (see the factorization below the figure)
(Figure: an example network over variables a, b, c, d, e with conditional probability tables; variable e is independent of d given b)
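The general form of this factorization, with each variable conditioned only on its parents in the network:

    P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P\bigl(x_i \mid \mathrm{Parents}(x_i)\bigr)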
13 Bayesian Belief Networks (II)
- A Bayesian belief network allows a subset of the variables to be conditionally independent
- A graphical model of causal relationships
- Several cases of learning Bayesian belief networks:
- Given both the network structure and all variables: easy
- Given the network structure but only some variables: use gradient descent / EM algorithms
- When the network structure is not known in advance: learning the structure of the network is harder
14 The k-Nearest Neighbor Algorithm
- All instances correspond to points in the n-dimensional space
- The nearest neighbors are defined in terms of Euclidean distance
- The target function can be discrete- or real-valued
- For discrete-valued targets, k-NN returns the most common value among the k training examples nearest to xq (a sketch follows below)
- Voronoi diagram: the decision surface induced by 1-NN for a typical set of training examples
(Figure: positive and negative training examples in the plane, with a query point xq)
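A minimal sketch of the discrete-valued case under those definitions; the data, function names, and choice of k are illustrative, not from the slides:

    import math
    from collections import Counter

    def euclidean(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def knn_classify(train, xq, k=3):
        """Return the most common label among the k training examples nearest to xq.
        `train` is a list of (feature_vector, label) pairs."""
        neighbors = sorted(train, key=lambda ex: euclidean(ex[0], xq))[:k]
        return Counter(label for _, label in neighbors).most_common(1)[0][0]

    # Tiny hypothetical data set
    train = [((1.0, 1.0), "+"), ((1.2, 0.9), "+"), ((4.0, 4.2), "-"), ((4.1, 3.9), "-")]
    print(knn_classify(train, (1.1, 1.0), k=3))   # "+"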
15 Discussion on the k-NN Algorithm
- The k-NN algorithm for continuous-valued target functions: calculate the mean value of the k nearest neighbors
- Distance-weighted nearest neighbor algorithm:
- Weight the contribution of each of the k neighbors according to its distance to the query point xq, giving greater weight to closer neighbors (see the sketch after this list)
- Similarly for real-valued target functions
- Robust to noisy data by averaging over the k nearest neighbors
- Curse of dimensionality: the distance between neighbors can be dominated by irrelevant attributes
- To overcome it, stretch the axes or eliminate the least relevant attributes
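A sketch of the distance-weighted variant for a real-valued target. The 1/d^2 weighting is a common choice but an assumption here, since the slide does not specify the weighting scheme; data and names are illustrative:

    import math

    def euclidean(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def knn_regress_weighted(train, xq, k=3, eps=1e-9):
        """Distance-weighted mean of the k nearest neighbors' real-valued targets;
        closer neighbors get larger weights (w = 1/d^2, eps avoids division by zero).
        `train` is a list of (feature_vector, target_value) pairs."""
        neighbors = sorted(train, key=lambda ex: euclidean(ex[0], xq))[:k]
        weights = [1.0 / (euclidean(x, xq) ** 2 + eps) for x, _ in neighbors]
        return sum(w * y for w, (_, y) in zip(weights, neighbors)) / sum(weights)

    # Hypothetical example: predict a value at xq = (2.0,)
    train = [((1.0,), 10.0), ((2.1,), 20.0), ((3.0,), 30.0), ((5.0,), 50.0)]
    print(round(knn_regress_weighted(train, (2.0,), k=3), 2))   # 20.0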