Transcript and Presenter's Notes

Title: Bayesian classifiers


1
Bayesian classifiers
2
Bayesian Classification: Why?
  • Probabilistic learning: calculates explicit
    probabilities for hypotheses; among the most
    practical approaches to certain types of learning
    problems
  • Incremental: each training example can
    incrementally increase or decrease the probability
    that a hypothesis is correct. Prior knowledge
    can be combined with observed data.
  • Probabilistic prediction: predicts multiple
    hypotheses, weighted by their probabilities
  • Standard: even when Bayesian methods are
    computationally intractable, they provide a
    standard of optimal decision making against which
    other methods can be measured

3
Bayes' Theorem
  • Given training data D, the posterior probability of
    a hypothesis h, P(h|D), follows Bayes' theorem
  • MAP (maximum a posteriori) hypothesis
  • Practical difficulty: requires initial knowledge
    of many probabilities and significant computational
    cost
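
For reference, the two formulas this slide relies on, in their standard form
(h ranges over the hypothesis space H):

  P(h|D) = P(D|h) · P(h) / P(D)

  h_MAP = argmax over h in H of P(h|D) = argmax over h in H of P(D|h) · P(h)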

4
Naïve Bayes Classifier (I)
  • A simplifying assumption: attributes are
    conditionally independent given the class
    (written out as a formula below)
  • Greatly reduces the computation cost: only the
    class distribution needs to be counted.
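
Written out (a restatement of the assumption above, with X = (x_1, ..., x_n)
an instance and C_j a class):

  P(X|C_j) = P(x_1|C_j) · P(x_2|C_j) · ... · P(x_n|C_j)

and X is assigned to the class C_j that maximizes P(X|C_j) · P(C_j).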

5
Naïve Bayesian Classification
  • If the i-th attribute is categorical: P(d_i|C) is
    estimated as the relative frequency of samples
    having value d_i for the i-th attribute in class C
  • If the i-th attribute is continuous: P(d_i|C) is
    estimated through a Gaussian density function
  • Computationally easy in both cases (a sketch
    follows below)
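
A minimal Python sketch of the two estimates just described, assuming the
per-class attribute values have already been collected into lists (the
function names and sample data are illustrative, not from the slides):

import math

def categorical_likelihood(class_values, value):
    # P(d_i | C): relative frequency of `value` among the class-C samples of this attribute
    return class_values.count(value) / len(class_values)

def gaussian_likelihood(class_values, value):
    # P(d_i | C): Gaussian density fitted to the class-C samples of this attribute
    n = len(class_values)
    mean = sum(class_values) / n
    var = sum((v - mean) ** 2 for v in class_values) / n
    return math.exp(-(value - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Illustrative usage
print(categorical_likelihood(["sunny", "rain", "rain", "overcast"], "rain"))   # 0.5
print(gaussian_likelihood([70.0, 65.0, 80.0, 75.0], 72.0))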

6
Play-tennis example: estimating P(x_i|C)
7
Naïve Bayes Classifier (II)
  • Given a training set, we can compute the
    probabilities

8
Play-tennis example: classifying X
  • An unseen sample X = <rain, hot, high, false>
  • P(X|p) · P(p) = P(rain|p) · P(hot|p) · P(high|p) · P(false|p) · P(p)
    = 3/9 · 2/9 · 3/9 · 6/9 · 9/14 = 0.010582
  • P(X|n) · P(n) = P(rain|n) · P(hot|n) · P(high|n) · P(false|n) · P(n)
    = 2/5 · 2/5 · 4/5 · 2/5 · 5/14 = 0.018286
  • Sample X is classified in class n (don't play)
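
A small Python check of the arithmetic above, using only the class priors and
conditional probabilities quoted on this slide:

# P(p) * P(rain|p) * P(hot|p) * P(high|p) * P(false|p)
p_play = 9/14 * 3/9 * 2/9 * 3/9 * 6/9
# P(n) * P(rain|n) * P(hot|n) * P(high|n) * P(false|n)
p_noplay = 5/14 * 2/5 * 2/5 * 4/5 * 2/5

print(round(p_play, 6), round(p_noplay, 6))            # 0.010582 0.018286
print("play" if p_play > p_noplay else "don't play")   # don't play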

9
The independence hypothesis
  • makes computation possible
  • yields optimal classifiers when satisfied
  • but is seldom satisfied in practice, as
    attributes (variables) are often correlated.
  • Attempts to overcome this limitation:
  • Bayesian networks, which combine Bayesian
    reasoning with causal relationships between
    attributes

10
Bayesian Belief Networks (I)
[Figure: a Bayesian belief network over the variables FamilyH, Age, Diabetes,
Mass, Insulin, and Glucose.]

The conditional probability table for the variable Mass (M), given FamilyH (FH)
and Age (A):

           (FH, A)   (FH, ¬A)   (¬FH, A)   (¬FH, ¬A)
   M         0.7       0.8        0.5        0.1
  ¬M         0.3       0.2        0.5        0.9
11
Applying Bayesian nets
  • When all variables but one are known
  • e.g., compute P(D | A, F, M, G, I)
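
In general (a restatement of Bayes' rule over the network's joint distribution,
not text from the slide), the posterior of the single unknown variable is the
joint probability normalized over that variable's values:

  P(D | A, F, M, G, I) = P(A, F, M, G, I, D) / sum over d of P(A, F, M, G, I, D = d)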

12
Bayesian belief network
  • Find the joint probability over a set of variables,
    making use of conditional independence whenever it
    is known (a sketch follows the figure below)

[Figure: a small belief network over the variables a, b, c, d, e, with its
conditional probability tables; the annotation notes that variable e is
independent of d given b.]
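
A minimal Python sketch of the idea (a hypothetical three-variable chain
a -> b -> e, not the network in the figure): the joint probability is the
product of each variable's conditional probability given its parents, so a
known independence such as "e is independent of a given b" keeps the tables
small.

# Hypothetical chain a -> b -> e, so the joint factors as P(a) * P(b|a) * P(e|b)
P_a = {True: 0.6, False: 0.4}
P_b_given_a = {True: {True: 0.7, False: 0.3},    # P(b | a=True)
               False: {True: 0.2, False: 0.8}}   # P(b | a=False)
P_e_given_b = {True: {True: 0.9, False: 0.1},    # P(e | b=True)
               False: {True: 0.5, False: 0.5}}   # P(e | b=False)

def joint(a, b, e):
    return P_a[a] * P_b_given_a[a][b] * P_e_given_b[b][e]

# The eight assignments sum to 1, as a valid joint distribution must
total = sum(joint(a, b, e) for a in (True, False) for b in (True, False) for e in (True, False))
print(joint(True, True, False), total)   # approx. 0.042 and 1.0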
13
Bayesian Belief Networks (II)
  • A Bayesian belief network allows a subset of the
    variables to be conditionally independent
  • A graphical model of causal relationships
  • Several cases of learning Bayesian belief
    networks:
  • Given both the network structure and all the
    variables: easy
  • Given the network structure but only some of the
    variables: use gradient descent / EM algorithms
  • When the network structure is not known in
    advance:
  • Learning the structure of the network is harder

14
The k-Nearest Neighbor Algorithm
  • All instances correspond to points in the
    n-dimensional space.
  • The nearest neighbors are defined in terms of
    Euclidean distance.
  • The target function may be discrete- or real-
    valued.
  • For discrete-valued targets, k-NN returns the most
    common value among the k training examples
    nearest to xq (a code sketch follows).
  • Voronoi diagram: the decision surface induced by
    1-NN for a typical set of training examples.
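
A minimal Python sketch of the discrete-valued case described above (majority
vote among the k Euclidean-nearest training examples; the data is illustrative):

import math
from collections import Counter

def knn_classify(training, xq, k=3):
    # training: list of (point, label) pairs; xq: query point as a tuple of numbers
    nearest = sorted(training, key=lambda example: math.dist(example[0], xq))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]   # most common label among the k nearest

data = [((1.0, 1.0), "+"), ((1.2, 0.8), "+"), ((0.9, 1.3), "+"),
        ((3.0, 3.2), "-"), ((2.9, 3.1), "-")]
print(knn_classify(data, (1.1, 1.0), k=3))   # "+"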

[Figure: training examples of two classes scattered around the query point xq;
the Voronoi diagram gives the decision surface induced by 1-NN.]

15
Discussion on the k-NN Algorithm
  • The k-NN algorithm for continuous-valued target
    functions:
  • Calculate the mean value of the k nearest
    neighbors
  • Distance-weighted nearest neighbor algorithm:
  • Weight the contribution of each of the k
    neighbors according to its distance to the
    query point xq,
  • giving greater weight to closer neighbors
    (a sketch follows below)
  • Similarly, for real-valued target functions
  • Robust to noisy data by averaging the k nearest
    neighbors
  • Curse of dimensionality: the distance between
    neighbors can be dominated by irrelevant
    attributes.
  • To overcome it, stretch the axes or eliminate
    the least relevant attributes.
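
A sketch of the distance-weighted variant for a real-valued target, assuming
the common 1/d^2 weighting (the slide does not fix a particular weighting
function):

import math

def weighted_knn_predict(training, xq, k=3):
    # training: list of (point, real-valued target); weight each of the k nearest by 1/d^2
    nearest = sorted(training, key=lambda example: math.dist(example[0], xq))[:k]
    num, den = 0.0, 0.0
    for point, target in nearest:
        d = math.dist(point, xq)
        if d == 0.0:
            return target          # query coincides with a training point
        w = 1.0 / d ** 2
        num += w * target
        den += w
    return num / den

data = [((1.0,), 2.0), ((2.0,), 4.0), ((3.0,), 6.0), ((10.0,), 20.0)]
print(weighted_knn_predict(data, (2.5,), k=3))   # roughly 4.8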