Bayesian Decision Theory (Classification)

Transcript and Presenter's Notes
1
Bayesian Decision Theory (Classification)

2
Contents
  • Introduction
  • Generalized Bayesian Decision Rule
  • Discriminant Functions
  • The Normal Distribution
  • Discriminant Functions for the Normal
    Populations.
  • Minimax Criterion
  • Neyman-Pearson Criterion

3
Bayesian Decision Theory (Classification)
  • Introduction

4
What is Bayesian Decision Theory?
  • Mathematical foundation for decision making.
  • Uses a probabilistic approach to decision making (e.g., classification) so as to minimize the risk (cost).

5
Preliminaries and Notations
ω: a state of nature
P(ωi): prior probability
x: feature vector
p(x|ωi): class-conditional density
P(ωi|x): posterior probability
6
Bayesian Rule
P(ωj|x) = p(x|ωj) P(ωj) / p(x), where the evidence p(x) = Σj p(x|ωj) P(ωj)
7
Decision
The evidence p(x) is a common scale factor, unimportant in making the decision.
8
Decision
Decide ωi if P(ωi|x) > P(ωj|x) ∀ j ≠ i
Decide ωi if p(x|ωi)P(ωi) > p(x|ωj)P(ωj) ∀ j ≠ i
  • Special cases
  • 1. Equal priors: P(ω1) = P(ω2) = ··· = P(ωc)
  • 2. Equal likelihoods: p(x|ω1) = p(x|ω2) = ··· = p(x|ωc)
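A minimal sketch of this rule in Python (the two Gaussian class-conditional densities and the priors below are illustrative assumptions, not values from the slides):

```python
# Minimum-error-rate rule: decide the class with the largest p(x|wi) P(wi).
import numpy as np
from scipy.stats import norm

priors = np.array([2/3, 1/3])                 # assumed P(w1), P(w2)
densities = [norm(0.0, 1.0), norm(2.0, 1.0)]  # assumed p(x|w1), p(x|w2)

def decide(x):
    """Return the 0-based index of the class maximizing p(x|wi) * P(wi)."""
    scores = [prior * d.pdf(x) for prior, d in zip(priors, densities)]
    return int(np.argmax(scores))

print(decide(0.5))   # near the w1 mean -> 0
print(decide(2.5))   # near the w2 mean -> 1
```

With these numbers the decision boundary sits where (2/3)p(x|ω1) = (1/3)p(x|ω2), i.e., at x = 1 + (ln 2)/2 ≈ 1.35.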

9
Two Categories
Decide ωi if P(ωi|x) > P(ωj|x) ∀ j ≠ i
Decide ωi if p(x|ωi)P(ωi) > p(x|ωj)P(ωj) ∀ j ≠ i
Decide ω1 if P(ω1|x) > P(ω2|x); otherwise decide ω2
Decide ω1 if p(x|ω1)P(ω1) > p(x|ω2)P(ω2); otherwise decide ω2
  • Special cases
  • 1. P(ω1) = P(ω2): Decide ω1 if p(x|ω1) > p(x|ω2); otherwise decide ω2
  • 2. p(x|ω1) = p(x|ω2): Decide ω1 if P(ω1) > P(ω2); otherwise decide ω2

10
Example
(Figure: class-conditional densities and the resulting decision regions R1 and R2, assuming equal priors P(ω1) = P(ω2).)
11
Example
P(ω1) = 2/3, P(ω2) = 1/3
Decide ω1 if p(x|ω1)P(ω1) > p(x|ω2)P(ω2); otherwise decide ω2
12
Classification Error
Consider two categories:
Decide ω1 if P(ω1|x) > P(ω2|x); otherwise decide ω2
P(error|x) = min[P(ω1|x), P(ω2|x)]
13
Classification Error
Consider two categories:
Decide ω1 if P(ω1|x) > P(ω2|x); otherwise decide ω2
P(error) = ∫ P(error|x) p(x) dx, which the rule above minimizes at every x.
14
Bayesian Decision Theory (Classification)
  • Generalized Bayesian Decision Rule

15
The Generalization
Ω = {ω1, …, ωc}: a set of c states of nature
A = {α1, …, αa}: a set of a possible actions
λ(αi|ωj): the loss incurred for taking action αi when the true state of nature is ωj
Risk: the expected loss; the loss for a correct action can be zero.
We want to minimize the expected loss in making decisions.
16
Conditional Risk
Given x, the expected loss (risk) associated with taking action αi:
R(αi|x) = Σj λ(αi|ωj) P(ωj|x)
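As a sketch, the conditional risk is a matrix-vector product over the posteriors, and the Bayes rule picks the action with minimum risk; the 2x2 loss matrix and posteriors below are assumed examples:

```python
# Conditional risk R(a_i|x) = sum_j lambda(a_i|w_j) P(w_j|x).
import numpy as np

# loss[i, j] = loss for taking action a_i when the true state is w_j (assumed)
loss = np.array([[0.0, 2.0],
                 [1.0, 0.0]])

def bayes_action(posteriors):
    """posteriors: array of P(w_j|x); returns (argmin_i R(a_i|x), all risks)."""
    risks = loss @ posteriors
    return int(np.argmin(risks)), risks

action, risks = bayes_action(np.array([0.4, 0.6]))
print(action, risks)   # R(a1|x)=1.2, R(a2|x)=0.4 -> choose a2 (index 1)
```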
17
0/1 Loss Function
λ(αi|ωj) = 0 if i = j and 1 otherwise, giving R(αi|x) = 1 − P(ωi|x): minimizing the risk means maximizing the posterior.
18
Decision
Bayesian Decision Rule: take the action that minimizes the conditional risk, α* = argmin_i R(αi|x).
19
Overall Risk
Decision function: α(x), mapping each observation x to an action.
Overall risk: R = ∫ R(α(x)|x) p(x) dx.
The Bayesian decision rule is the optimal one to minimize the overall risk; its resulting overall risk is called the Bayesian risk.
20
Two-Category Classification
21
Two-Category Classification
Perform α1 if R(α2|x) > R(α1|x); otherwise perform α2
22
Two-Category Classification
Perform α1 if R(α2|x) > R(α1|x); otherwise perform α2
Writing λij = λ(αi|ωj): decide ω1 if (λ21 − λ11) p(x|ω1) P(ω1) > (λ12 − λ22) p(x|ω2) P(ω2), where both factors (λ21 − λ11) and (λ12 − λ22) are positive.
Posterior probabilities are scaled before comparison.
23
Two-Category Classification
Perform α1 if R(α2|x) > R(α1|x); otherwise perform α2
The evidence p(x) is irrelevant and can be dropped from both sides.
24
Two-Category Classification
This slide will be recalled later.
Perform α1 (decide ω1) if the likelihood ratio p(x|ω1)/p(x|ω2) exceeds the threshold θ = [(λ12 − λ22) P(ω2)] / [(λ21 − λ11) P(ω1)].
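A sketch of the resulting likelihood-ratio test; the densities, priors, and losses below are assumed examples:

```python
# Decide w1 when p(x|w1)/p(x|w2) > theta, with
# theta = (l12 - l22) P(w2) / ((l21 - l11) P(w1)).
from scipy.stats import norm

p1, p2 = norm(0.0, 1.0), norm(2.0, 1.0)   # assumed p(x|w1), p(x|w2)
P1, P2 = 0.5, 0.5                          # assumed priors
l11, l12, l21, l22 = 0.0, 2.0, 1.0, 0.0    # assumed losses l_ij = loss(a_i|w_j)

theta = (l12 - l22) * P2 / ((l21 - l11) * P1)

def decide(x):
    return 1 if p1.pdf(x) / p2.pdf(x) > theta else 2

print(theta, decide(0.0), decide(2.0))     # -> 2.0 1 2
```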
25
Bayesian Decision Theory (Classification)
  • Discriminant Functions

26
The Multicategory Classification
How to define discriminant functions?
The gi(x)'s are called the discriminant functions.
(Figure: a network computes g1(x), g2(x), …, gc(x) from the input x, and the action α(x) picks the maximum.)
Assign x to ωi if gi(x) > gj(x) for all j ≠ i.
27
Simple Discriminant Functions
If f(·) is a monotonically increasing function, then the f(gi(·))'s are also discriminant functions.
Minimum-risk case: gi(x) = −R(αi|x)
Minimum-error-rate case: gi(x) = P(ωi|x), or equivalently gi(x) = p(x|ωi) P(ωi) or gi(x) = ln p(x|ωi) + ln P(ωi)
28
Decision Regions
Two-category example
Decision regions are separated by decision
boundaries.
29
Bayesian Decision Theory (Classification)
  • The Normal Distribution

30
Basics of Probability
Discrete random variable (X), assumed integer-valued:
Probability mass function (pmf): P(X = x)
Cumulative distribution function (cdf): F(x) = P(X ≤ x)
Continuous random variable (X):
Probability density function (pdf): p(x); a density value is not a probability.
Cumulative distribution function (cdf): F(x) = ∫_{−∞}^{x} p(t) dt
31
Expectations
Let g be a function of the random variable X: E[g(X)] = Σx g(x) P(x) (discrete) or ∫ g(x) p(x) dx (continuous).
The kth moment: E[X^k]
The 1st moment: E[X] (the mean)
The kth central moment: E[(X − E[X])^k]
32
Important Expectations
Fact: Var[X] = E[X²] − (E[X])²
Mean: µ = E[X]
Variance: σ² = Var[X] = E[(X − µ)²]
33
Entropy
The entropy H(X) = −∫ p(x) ln p(x) dx (measured in nats) quantifies the fundamental uncertainty in the value of points selected randomly from a distribution.
34
Univariate Gaussian Distribution
  • Properties
  • Maximizes the entropy (among densities with a given variance)
  • Central limit theorem

X ~ N(µ, σ²): p(x) = (1/√(2πσ²)) exp(−(x − µ)² / (2σ²))
E[X] = µ
Var[X] = σ²
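For reference, the entropy of the univariate Gaussian works out as follows; this is the standard computation behind the maximum-entropy bullet above:

```latex
% With p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x-\mu)^2/2\sigma^2}:
\begin{align*}
H(X) &= -\int p(x)\ln p(x)\,dx \\
     &= \int p(x)\Bigl[\ln\bigl(\sqrt{2\pi}\,\sigma\bigr)
        + \frac{(x-\mu)^2}{2\sigma^2}\Bigr]\,dx \\
     &= \ln\bigl(\sqrt{2\pi}\,\sigma\bigr) + \frac{1}{2}
      = \frac{1}{2}\ln\bigl(2\pi e\,\sigma^2\bigr).
\end{align*}
```

Among all densities with variance σ², this value is the largest possible entropy.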
35
Random Vectors
A d-dimensional random vector x = (x1, …, xd)ᵀ
Vector mean: µ = E[x]
Covariance matrix: Σ = E[(x − µ)(x − µ)ᵀ]
36
Multivariate Gaussian Distribution
X ~ N(µ, Σ), a d-dimensional random vector:
p(x) = (2π)^(−d/2) |Σ|^(−1/2) exp(−½ (x − µ)ᵀ Σ⁻¹ (x − µ))
E[X] = µ
E[(X − µ)(X − µ)ᵀ] = Σ
37
Properties of N(µ, Σ)
X ~ N(µ, Σ), a d-dimensional random vector.
Let Y = AᵀX, where A is a d × k matrix. Then Y ~ N(Aᵀµ, AᵀΣA).
38
Properties of N(µ, Σ)
X ~ N(µ, Σ), a d-dimensional random vector.
Let Y = AᵀX, where A is a d × k matrix. Then Y ~ N(Aᵀµ, AᵀΣA).
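A quick numerical sanity check of this property by sampling; µ, Σ, and A below are assumed example values:

```python
# Verify empirically that Y = A^T X ~ N(A^T mu, A^T Sigma A) for X ~ N(mu, Sigma).
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.3, 0.0],
                  [0.3, 1.0, 0.2],
                  [0.0, 0.2, 0.5]])
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])     # d x k with d = 3, k = 2

X = rng.multivariate_normal(mu, Sigma, size=200_000)  # rows are samples
Y = X @ A                                             # (A^T x)^T applied per row

print(np.allclose(Y.mean(axis=0), A.T @ mu, atol=0.02))        # ~ A^T mu
print(np.allclose(np.cov(Y.T), A.T @ Sigma @ A, atol=0.05))    # ~ A^T Sigma A
```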
39
On Parameters of N(µ, Σ)
X ~ N(µ, Σ): µi = E[Xi] and Σij = σij = E[(Xi − µi)(Xj − µj)], so the diagonal entries are the component variances.
40
More On Covariance Matrix
Σ is symmetric and positive semidefinite, so Σ = ΦΛΦᵀ, where
Φ: orthonormal matrix whose columns are eigenvectors of Σ
Λ: diagonal matrix of the eigenvalues
41
Whitening Transform
X ~ N(µ, Σ); Y = AᵀX gives Y ~ N(Aᵀµ, AᵀΣA).
Let Aw = ΦΛ^(−1/2): then AwᵀΣAw = I, so the transformed covariance is the identity.
42
Whitening Transform
Whitening: X ~ N(µ, Σ); the linear transform Y = AᵀX gives Y ~ N(Aᵀµ, AᵀΣA). Let Aw = ΦΛ^(−1/2).
(Figure: projection onto the eigenvector axes followed by rescaling turns the elliptical density contours into circular ones.)
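A sketch of the whitening transform built from the eigendecomposition above; the covariance matrix is an assumed example:

```python
# Whitening: A_w = Phi Lambda^{-1/2}, so that A_w^T Sigma A_w = I.
import numpy as np

Sigma = np.array([[4.0, 1.2],
                  [1.2, 1.0]])

eigvals, Phi = np.linalg.eigh(Sigma)      # Sigma = Phi diag(eigvals) Phi^T
A_w = Phi @ np.diag(eigvals ** -0.5)      # whitening matrix

print(np.round(A_w.T @ Sigma @ A_w, 10))  # prints the identity matrix
```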
43
Mahalanobis Distance
X ~ N(µ, Σ): the squared Mahalanobis distance from x to µ is r² = (x − µ)ᵀΣ⁻¹(x − µ).
The density is constant on each ellipsoid r² = constant, whose size depends on the value of r².
44
Mahalanobis Distance
X ~ N(µ, Σ): the squared Mahalanobis distance from x to µ is r² = (x − µ)ᵀΣ⁻¹(x − µ).
The density is constant on each ellipsoid r² = constant, whose size depends on the value of r².
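A sketch of computing r²; µ and Σ are assumed example values:

```python
# Squared Mahalanobis distance r^2 = (x - mu)^T Sigma^{-1} (x - mu).
import numpy as np

mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

def mahalanobis_sq(x):
    d = x - mu
    return float(d @ np.linalg.solve(Sigma, d))  # avoids forming Sigma^{-1}

print(mahalanobis_sq(np.array([2.0, 3.0])))  # displacement (+1, +1)
print(mahalanobis_sq(np.array([0.0, 1.0])))  # displacement (-1, -1): same r^2
```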
45
Bayesian Decision Theory (Classification)
  • Discriminant Functions for the Normal Populations

46
Minimum-Error-Rate Classification
Xi ~ N(µi, Σi): gi(x) = ln p(x|ωi) + ln P(ωi) = −½(x − µi)ᵀΣi⁻¹(x − µi) − (d/2) ln 2π − ½ ln|Σi| + ln P(ωi)
47
Minimum-Error-Rate Classification
Three Cases
Case 1: Σi = σ²I
Classes are centered at different means, and their feature components are pairwise independent with the same variance.
Case 2: Σi = Σ
Classes are centered at different means but share the same covariance matrix.
Case 3: Σi arbitrary.
48
Case 1: Σi = σ²I
gi(x) = −‖x − µi‖² / (2σ²) + ln P(ωi); the −(d/2) ln 2π and −½ ln|Σi| terms are the same for every class and hence irrelevant.
49
Case 1: Σi = σ²I. Expanding the squared norm gives the linear form gi(x) = wiᵀx + wi0 with wi = µi/σ² and wi0 = −µiᵀµi/(2σ²) + ln P(ωi); the xᵀx term is class-independent and drops out.
50
Case 1: Σi = σ²I
Boundary between ωi and ωj: the set where gi(x) = gj(x).
51
Case 1: Σi = σ²I
The decision boundary between ωi and ωj is a hyperplane wᵀ(x − x0) = 0 perpendicular to the line between the means, with
w = µi − µj and x0 = ½(µi + µj) − [σ² / ‖µi − µj‖²] ln[P(ωi)/P(ωj)] (µi − µj).
The second term of x0 is 0 if P(ωi) = P(ωj); x0 is then the midpoint of the two means.
52
Case 1: Σi = σ²I
With equal priors, the rule reduces to a minimum distance classifier (template matching): assign x to the class with the nearest mean.
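A sketch of the resulting nearest-mean classifier; the class means are assumed examples:

```python
# Case 1 with equal priors: assign x to the class whose mean is closest.
import numpy as np

means = np.array([[0.0, 0.0],
                  [3.0, 0.0],
                  [0.0, 3.0]])   # mu_1, mu_2, mu_3 (assumed)

def nearest_mean(x):
    """Return the 1-based index of the class with the nearest mean."""
    dists = np.linalg.norm(means - x, axis=1)
    return int(np.argmin(dists)) + 1

print(nearest_mean(np.array([2.5, 0.5])))  # -> 2
print(nearest_mean(np.array([0.5, 2.0])))  # -> 3
```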
53
Case 1: Σi = σ²I
54
Case 1: Σi = σ²I
55
Case 1: Σi = σ²I
Demo
56
Case 2: Σi = Σ
gi(x) = −½(x − µi)ᵀΣ⁻¹(x − µi) + ln P(ωi): classify by the Mahalanobis distance to each mean.
The prior term is irrelevant if P(ωi) = P(ωj) ∀ i, j; the −(d/2) ln 2π and −½ ln|Σ| terms are always irrelevant here.
57
Case 2: Σi = Σ
Expanding the quadratic form gives linear discriminants gi(x) = wiᵀx + wi0 with wi = Σ⁻¹µi and wi0 = −½µiᵀΣ⁻¹µi + ln P(ωi); the xᵀΣ⁻¹x term is class-independent and hence irrelevant.
58
Case 2: Σi = Σ
59
Case 2: Σi = Σ
60
Case 2: Σi = Σ
Demo
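A sketch of the Case 2 linear discriminants gi(x) = wiᵀx + wi0; the shared covariance, means, and priors are assumed examples:

```python
# Case 2 (shared Sigma): w_i = Sigma^{-1} mu_i,
# w_i0 = -0.5 mu_i^T Sigma^{-1} mu_i + ln P(w_i).
import numpy as np

Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])
means = [np.array([0.0, 0.0]), np.array([2.0, 1.0])]
priors = [0.5, 0.5]

Sigma_inv = np.linalg.inv(Sigma)
ws  = [Sigma_inv @ m for m in means]
w0s = [-0.5 * m @ Sigma_inv @ m + np.log(p) for m, p in zip(means, priors)]

def decide(x):
    g = [w @ x + w0 for w, w0 in zip(ws, w0s)]
    return int(np.argmax(g)) + 1   # 1-based class index

print(decide(np.array([0.2, 0.1])), decide(np.array([1.8, 1.0])))  # -> 1 2
```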
61
Case 3: Σi ≠ Σj
  • Decision surfaces are hyperquadrics, e.g.,
  • hyperplanes
  • hyperspheres
  • hyperellipsoids
  • hyperhyperboloids

gi(x) = xᵀWix + wiᵀx + wi0 with Wi = −½Σi⁻¹; the quadratic term, which was class-independent (hence irrelevant) in Cases 1 and 2, remains here.
62
Case 3. ?i ? ? j
Non-simply connected decision regions can arise in one dimension for Gaussians having unequal variance.
63
Case 3: Σi ≠ Σj
64
Case 3: Σi ≠ Σj
65
Case 3: Σi ≠ Σj
Demo
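A sketch of the full quadratic discriminant for Case 3; the means, covariances, and priors are assumed examples:

```python
# Case 3 (arbitrary Sigma_i):
# g_i(x) = -0.5 (x-mu_i)^T Sigma_i^{-1} (x-mu_i) - 0.5 ln|Sigma_i| + ln P(w_i).
import numpy as np

params = [  # (mu, Sigma, prior), all assumed
    (np.array([0.0, 0.0]), np.array([[1.0, 0.0], [0.0, 1.0]]), 0.5),
    (np.array([2.0, 2.0]), np.array([[3.0, 0.5], [0.5, 2.0]]), 0.5),
]

def g(x, mu, Sigma, prior):
    d = x - mu
    return (-0.5 * d @ np.linalg.solve(Sigma, d)
            - 0.5 * np.log(np.linalg.det(Sigma))
            + np.log(prior))

def decide(x):
    return int(np.argmax([g(x, *p) for p in params])) + 1

print(decide(np.array([0.1, 0.2])), decide(np.array([2.5, 2.0])))  # -> 1 2
```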
66
Multi-Category Classification
67
Bayesian Decision Theory (Classification)
  • Minimax Criterion

68
Bayesian Decision Rule: Two-Category Classification
Decide ω1 if the likelihood ratio p(x|ω1)/p(x|ω2) exceeds the threshold θ = [(λ12 − λ22) P(ω2)] / [(λ21 − λ11) P(ω1)] (recalled from the earlier slide).
The minimax criterion deals with the case in which the prior probabilities are unknown.
69
Basic Concept on Minimax
Choose the worst-case prior probabilities (those giving the maximum loss), and then pick the decision rule that minimizes the overall risk under them.
That is, minimize the maximum possible overall risk.
70
Overall Risk
71
Overall Risk
72
Overall Risk
73
Overall Risk
74
Overall Risk
For a fixed decision boundary, the overall risk is linear in the prior: R = a·P(ω1) + b.
Both coefficients a and b depend on the setting of the decision boundary.
The line gives the overall risk for a particular P(ω1).
75
Overall Risk
R = a·P(ω1) + b; the minimax solution sets the slope a to 0.
Rmm, the minimax risk, is then independent of the value of P(ωi).
76
Minimax Risk
Rmm = λ22 + (λ12 − λ22) ∫R1 p(x|ω2) dx = λ11 + (λ21 − λ11) ∫R2 p(x|ω1) dx
77
Error Probability
Use the 0/1 loss function; the overall risk is then the probability of error:
P(error) = P(ω1) ∫R2 p(x|ω1) dx + P(ω2) ∫R1 p(x|ω2) dx
78
Minimax Error-Probability
Use the 0/1 loss function. The two conditional errors are
P(ω1|ω2) = ∫R1 p(x|ω2) dx (deciding ω1 when ω2 is true)
P(ω2|ω1) = ∫R2 p(x|ω1) dx (deciding ω2 when ω1 is true)
79
Minimax Error-Probability
The minimax boundary equalizes the two conditional errors, P(ω1|ω2) = P(ω2|ω1), so the error probability no longer depends on the priors.
80
Minimax Error-Probability
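A numeric sketch of finding the minimax boundary for 0/1 loss: scan thresholds until the two conditional errors are equal; the Gaussian densities are assumed examples:

```python
# Find t where P(decide w1|w2) = P(decide w2|w1), so the error
# probability no longer depends on the prior.
import numpy as np
from scipy.stats import norm

p1, p2 = norm(0.0, 1.0), norm(2.0, 1.5)   # assumed p(x|w1), p(x|w2); decide w1 for x < t

ts = np.linspace(-2.0, 4.0, 10_001)
e21 = p1.sf(ts)    # P(x > t | w1): deciding w2 when w1 is true
e12 = p2.cdf(ts)   # P(x < t | w2): deciding w1 when w2 is true

t_star = ts[np.argmin(np.abs(e21 - e12))]
print(t_star, p1.sf(t_star))   # minimax threshold and the common error value
```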
81
Bayesian Decision Theory (Classification)
  • Neyman-Pearson Criterion

82
Bayesian Decision Rule: Two-Category Classification
Decide ω1 if the likelihood ratio p(x|ω1)/p(x|ω2) exceeds the threshold θ (recalled from the earlier slide).
The Neyman-Pearson criterion deals with the case in which both the loss functions and the prior probabilities are unknown.
83
Signal Detection Theory
  • Signal detection theory evolved from the development of communications and radar equipment in the first half of the last century.
  • It migrated to psychology, initially as part of sensation and perception, in the '50s and '60s, as an attempt to understand some features of human behavior when detecting very faint stimuli that were not explained by traditional theories of thresholds.

84
The situation of interest
  • A person is faced with a stimulus (signal) that is very faint or confusing.
  • The person must make a decision: is the signal there or not?
  • What makes this situation confusing and difficult is the presence of other activity that is similar to the signal. Let us call this activity noise.

85
Example
Noise is present both in the environment and in the sensory system of the observer. The observer reacts to the momentary total activation of the sensory system, which fluctuates from moment to moment, while also responding to environmental stimuli, which may include a signal.
86
Example
  • A radiologist is examining a CT scan, looking for
    evidence of a tumor.
  • A hard job, because there is always some
    uncertainty.
  • There are four possible outcomes:
  • hit (tumor present and doctor says "yes")
  • miss (tumor present and doctor says "no")
  • false alarm (tumor absent and doctor says "yes")
  • correct rejection (tumor absent and doctor says "no")

The miss and the false alarm are the two types of error.
87
The Four Cases
Signal detection theory was developed to help us understand how a continuous and ambiguous signal can lead to a binary yes/no decision.

                 ω2 true (signal present)   ω1 true (signal absent)
Say "yes" (ω2)   Hit: P(ω2|ω2)              False alarm: P(ω2|ω1)
Say "no" (ω1)    Miss: P(ω1|ω2)             Correct rejection: P(ω1|ω1)
88
Decision Making
Discriminability: d′ = |µ2 − µ1| / σ
Criterion: based on expectancy (decision bias)
Hit: P(ω2|ω2)
False Alarm: P(ω2|ω1)
89
ROC Curve (Receiver Operating Characteristic)
Hit: PH = P(ω2|ω2)
False Alarm: PFA = P(ω2|ω1)
The ROC curve plots PH against PFA as the decision criterion varies.
90
Neyman-Pearson Criterion
Hit: PH = P(ω2|ω2)
False Alarm: PFA = P(ω2|ω1)
NP: maximize PH subject to PFA ≤ α
91
Likelihood Ratio Test
Decide ω2 if the likelihood ratio p(x|ω2)/p(x|ω1) > T, where T is a threshold that meets the PFA constraint (PFA ≤ α).
How to determine T?
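One way to determine T, sketched under the assumption of Gaussian densities whose likelihood ratio is monotone in x: pick the x-threshold from the PFA constraint, then convert it into the corresponding likelihood-ratio threshold:

```python
# Neyman-Pearson threshold for assumed Gaussian noise/signal densities.
from scipy.stats import norm

p1, p2 = norm(0.0, 1.0), norm(2.0, 1.0)   # assumed p(x|w1) noise, p(x|w2) signal
alpha = 0.05                               # allowed false-alarm rate

x_t = p1.isf(alpha)                # x-threshold: P(x > x_t | w1) = alpha
T = p2.pdf(x_t) / p1.pdf(x_t)      # matching likelihood-ratio threshold
P_H = p2.sf(x_t)                   # hit rate achieved at PFA = alpha
print(x_t, T, P_H)                 # ~1.645, ~3.6, ~0.64
```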
92
Likelihood Ratio Test
93
Neyman-Pearson Lemma
Consider the aforementioned rule with T chosen to give PFA(φ*) = α. There is no decision rule φ such that PFA(φ) ≤ α and PH(φ) > PH(φ*).
Pf) Let φ be a decision rule with PFA(φ) ≤ α.
(The comparison of φ against φ* proceeds through equations shown on the slide.)
94
Neyman-Pearson Lemma
Consider the aforementioned rule with T chosen to give PFA(φ*) = α. There is no decision rule φ such that PFA(φ) ≤ α and PH(φ) > PH(φ*).
Pf) Let φ be a decision rule with PFA(φ) ≤ α.
(Continuation of the inequality chain shown on the slide.)
95
Neyman-Pearson Lemma


Consider the aforementioned rule with T chosen to give PFA(φ*) = α. There is no decision rule φ such that PFA(φ) ≤ α and PH(φ) > PH(φ*).
Pf) Let φ be a decision rule with PFA(φ) ≤ α.
(The final steps on the slide conclude that PH(φ) ≤ PH(φ*).)