240-650 Principles of Pattern Recognition (Transcript)
1
240-650 Principles of Pattern Recognition
Montri Karnjanadecha
montri@coe.psu.ac.th
http://fivedots.coe.psu.ac.th/montri
2
Chapter 3
  • Maximum-Likelihood and Bayesian Parameter
    Estimation

3
Introduction
  • We could design an optimal classifier if we knew
    P(ωi) and p(x|ωi)
  • We rarely have knowledge about the probabilistic
    structure of the problem
  • We often estimate P(ωi) and p(x|ωi) from training
    data or design samples

4
Maximum-Likelihood Estimation
  • ML estimation
  • Nearly always has good convergence properties as
    the number of training samples increases
  • Simpler than other methods

5
The General Principle
  • Suppose we separate a collection of samples
    according to class, so that we have c data sets,
    D1, …, Dc, with the samples in Dj having been
    drawn independently according to the probability
    law p(x|ωj)
  • We say such samples are i.i.d. (independent and
    identically distributed) random variables

6
The General Principle
  • We assume that p(x|ωj) has a known parametric
    form and is determined uniquely by the value of
    a parameter vector θj
  • For example, p(x|ωj) might be a normal density
    N(μj, Σj), in which case θj consists of the
    components of μj and Σj
  • We explicitly write p(x|ωj) as p(x|ωj, θj)

7
Problem Statement
  • To use the information provided by the training
    samples to obtain good estimates for the unknown
    parameter vectors θ1, …, θc associated with each
    category

8
Simplified Problem Statement
  • We assume that samples in Di give no information
    about θj if i ≠ j
  • We then have c separate problems of the following
    form
  • To use a set D of training samples drawn
    independently from the probability density p(x|θ)
    to estimate the unknown parameter vector θ

9
  • Suppose that D contains n samples, x1, …, xn
  • Then we have
  • The maximum-likelihood estimate of θ is the value
    θ̂ that maximizes p(D|θ)

Likelihood of θ with respect to the set of samples
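The likelihood equation itself did not come through in the transcript; a reconstruction, assuming the slide followed the standard ML setup (the samples are i.i.d., so the joint density factors):

    \[
      p(D \mid \theta) \;=\; \prod_{k=1}^{n} p(x_k \mid \theta)
    \]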
10
(No Transcript)
11
  • Let θ = (θ1, …, θp)^t
  • Let ∇θ be the gradient operator
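The gradient operator was rendered as an image on the slide; the standard definition it refers to is:

    \[
      \nabla_{\theta} \;=\; \left( \frac{\partial}{\partial\theta_1}, \;\ldots,\; \frac{\partial}{\partial\theta_p} \right)^{t}
    \]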

12
Log-Likelihood Function
  • We define l(θ) as the log-likelihood function
  • We can write our solution as
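The defining equations are missing from the transcript; reconstructed from the standard formulation:

    \[
      l(\theta) \;=\; \ln p(D \mid \theta),
      \qquad
      \hat{\theta} \;=\; \arg\max_{\theta}\, l(\theta)
    \]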

13
MLE
  • From
  • We have
  • And
  • Necessary condition for MLE
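The chain of equations on this slide is missing; a reconstruction of the standard steps (with i.i.d. samples, the log of the product becomes a sum):

    \[
      l(\theta) \;=\; \sum_{k=1}^{n} \ln p(x_k \mid \theta),
      \qquad
      \nabla_{\theta}\, l \;=\; \sum_{k=1}^{n} \nabla_{\theta} \ln p(x_k \mid \theta)
    \]

and the necessary condition for the MLE is \( \nabla_{\theta}\, l = 0 \).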

14
The Gaussian Case: Unknown μ
  • Suppose that the samples are drawn from a
    multivariate normal population with mean μ and
    covariance matrix Σ
  • Assume that μ is the only unknown
  • Consider a sample point xk and find ln p(xk|μ)
    and its gradient, both reconstructed below
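The two expressions the bullets point to are missing from the transcript; for a Gaussian with known Σ, the standard derivation gives:

    \[
      \ln p(x_k \mid \mu) \;=\; -\tfrac{1}{2} \ln\!\left[ (2\pi)^d \lvert \Sigma \rvert \right]
        \;-\; \tfrac{1}{2} (x_k - \mu)^t \, \Sigma^{-1} (x_k - \mu)
    \]

    \[
      \nabla_{\mu} \ln p(x_k \mid \mu) \;=\; \Sigma^{-1} (x_k - \mu)
    \]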

15
  • The MLE of μ must satisfy the condition shown below
  • After rearranging, the estimate is the sample mean
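Reconstructed equations for this slide (setting the gradient of the log-likelihood to zero and solving):

    \[
      \sum_{k=1}^{n} \Sigma^{-1} (x_k - \hat{\mu}) \;=\; 0
      \qquad\Longrightarrow\qquad
      \hat{\mu} \;=\; \frac{1}{n} \sum_{k=1}^{n} x_k
    \]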

16
Sample Mean
  • The MLE for the unknown population mean is just
    the arithmetic average of the training samples
    (or sample mean)
  • If we think of the n samples as a cloud of
    points, then the sample mean is the centroid of
    the cloud
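A minimal numerical sketch of this slide in Python (NumPy assumed available; the data values are made up for illustration):

    import numpy as np

    # Five 2-D training samples, one per row (made-up values)
    X = np.array([[1.0, 2.0],
                  [2.0, 0.5],
                  [0.0, 1.0],
                  [3.0, 2.5],
                  [1.5, 1.0]])

    # MLE of the mean: the arithmetic average of the samples,
    # i.e. the centroid of the cloud of points
    mu_hat = X.mean(axis=0)
    print(mu_hat)  # [1.5 1.4]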

17
The Gaussian Case: Unknown μ and Σ
  • This is the more typical case, where both the mean
    and the covariance matrix are unknown
  • Consider the univariate case with θ1 = μ and θ2 = σ²

18
  • Its derivative is set to 0, giving the two
    conditions shown below
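The derivative and the two resulting conditions are missing from the transcript; in the standard univariate derivation, with θ1 = μ and θ2 = σ²:

    \[
      \ln p(x_k \mid \theta) \;=\; -\tfrac{1}{2} \ln 2\pi\theta_2 \;-\; \frac{(x_k - \theta_1)^2}{2\theta_2}
    \]

and setting the summed derivatives to zero gives

    \[
      \sum_{k=1}^{n} \frac{x_k - \hat{\theta}_1}{\hat{\theta}_2} \;=\; 0,
      \qquad
      -\sum_{k=1}^{n} \frac{1}{\hat{\theta}_2} \;+\; \sum_{k=1}^{n} \frac{(x_k - \hat{\theta}_1)^2}{\hat{\theta}_2^{\,2}} \;=\; 0
    \]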

19
  • With a little rearranging, we have
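The rearranged results (missing from the transcript) are the familiar sample mean and sample variance:

    \[
      \hat{\mu} \;=\; \frac{1}{n} \sum_{k=1}^{n} x_k,
      \qquad
      \hat{\sigma}^2 \;=\; \frac{1}{n} \sum_{k=1}^{n} (x_k - \hat{\mu})^2
    \]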

20
MLE for the Multivariate Case
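The multivariate results were on the slide as an image; assuming the standard forms, they are:

    \[
      \hat{\mu} \;=\; \frac{1}{n} \sum_{k=1}^{n} x_k,
      \qquad
      \hat{\Sigma} \;=\; \frac{1}{n} \sum_{k=1}^{n} (x_k - \hat{\mu})(x_k - \hat{\mu})^t
    \]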
21
Bias
  • The MLE for the variance σ² is biased
  • The expected value, over all data sets of size n,
    of the sample variance is not equal to the true
    variance
  • An unbiased estimator for Σ is given by the sample
    covariance shown below
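The missing equations: the expectation of the ML variance estimate, and the unbiased (sample) covariance that the last bullet introduces, reconstructed from the standard result:

    \[
      E\!\left[ \hat{\sigma}^2 \right] \;=\; \frac{n-1}{n}\,\sigma^2 \;\neq\; \sigma^2,
      \qquad
      C \;=\; \frac{1}{n-1} \sum_{k=1}^{n} (x_k - \hat{\mu})(x_k - \hat{\mu})^t
    \]

A quick numerical check in Python (NumPy assumed; the ddof argument selects the divisor n - ddof):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(loc=0.0, scale=1.0, size=10)  # n = 10 samples, true variance 1.0

    print(np.var(x))          # ML estimate: divides by n (biased low)
    print(np.var(x, ddof=1))  # unbiased estimate: divides by n - 1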