1
4. Maximum Likelihood
  • Prof. A.L. Yuille
  • Stat 231. Fall 2004.

2
Learning Probability Distributions.
  • Learn the likelihood functions and priors from
    datasets.
  • Two main strategies: parametric and
    non-parametric.
  • This lecture and the next concentrate on
    parametric methods, which assume a parametric
    form for the distributions.

3
Maximum Likelihood Estimation.
  • Assume the distribution is of the form
    p(x | θ), with unknown parameters θ.
  • Independent Identically Distributed (I.I.D.)
    samples x_1, ..., x_N, so that
    p(x_1, ..., x_N | θ) = ∏_i p(x_i | θ).
  • Choose the parameters that maximize the
    likelihood of the data:
    θ* = arg max_θ ∏_i p(x_i | θ)
       = arg max_θ Σ_i log p(x_i | θ).
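As a concrete sketch of the arg-max rule (my own illustration, not from the slides; the data and parameter grid are invented), the following Python estimates a Bernoulli parameter by maximizing the summed log-likelihood over a grid of candidate values of θ:

```python
import numpy as np

# I.I.D. coin flips (1 = heads); the true bias 0.7 is an invented example.
rng = np.random.default_rng(0)
x = rng.binomial(1, 0.7, size=100)

# Log-likelihood of a Bernoulli model p(x | theta), summed over the samples.
def log_likelihood(theta, x):
    return np.sum(x * np.log(theta) + (1 - x) * np.log(1 - theta))

# Choose theta* = arg max over a grid of candidate parameter values.
thetas = np.linspace(0.01, 0.99, 99)
theta_star = thetas[np.argmax([log_likelihood(t, x) for t in thetas])]
print(theta_star, x.mean())  # for a Bernoulli, the MLE is the sample mean
```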

4
Supervised versus Unsupervised Learning.
  • Supervised Learning assumes that we know the
    class label for each datapoint.
  • I.e. we are given pairs (x_i, y_i),
    where x_i is the datapoint and
    y_i is the class label.
  • Unsupervised Learning does not assume that the
    class labels are specified. This is a harder
    task.
  • But unsupervised methods can also be used for
    supervised data if the goal is to determine
    structure in the data (e.g. mixture of
    Gaussians).
  • Stat 231 is almost entirely concerned with
    supervised learning.

5
Example of MLE.
  • One-Dimensional Gaussian Distribution
    p(x | μ, σ) = (1 / (√(2π) σ)) exp(−(x − μ)² / (2σ²)).
  • Solve for μ and σ by differentiating the
    log-likelihood and setting the derivatives to zero:
    μ̂ = (1/N) Σ_i x_i,   σ̂² = (1/N) Σ_i (x_i − μ̂)².
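The closed-form estimates can be checked numerically. This sketch (my own, with invented data) maximizes the Gaussian log-likelihood with a generic optimizer and compares the result to the sample mean and standard deviation:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=1.5, size=500)  # invented sample

# Negative Gaussian log-likelihood as a function of (mu, log sigma).
def nll(params, x):
    mu, log_sigma = params
    sigma = np.exp(log_sigma)  # optimize log(sigma) so sigma stays positive
    return np.sum(0.5 * np.log(2 * np.pi) + log_sigma
                  + (x - mu) ** 2 / (2 * sigma ** 2))

res = minimize(nll, x0=[0.0, 0.0], args=(x,))
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])

# Closed-form MLE from the slide: sample mean and (biased) sample variance.
print(mu_hat, x.mean())
print(sigma_hat, x.std())  # np.std uses 1/N by default, matching the MLE
```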

6
MLE
  • The Gaussian is unusual because the parameters
    of the distribution can be expressed as an
    analytic function of the data.
  • More usually, numerical algorithms are required,
    as in the sketch below.
  • Modeling problem: for complicated patterns
    (the shape of fish, natural language, etc.) it
    requires considerable work to find a suitable
    parametric form for the probability
    distributions.
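To illustrate the point that algorithms are usually required (a sketch of mine, not from the slides), the example below fits a Gamma distribution, whose shape parameter has no closed-form MLE, by numerically minimizing the negative log-likelihood on invented data:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

rng = np.random.default_rng(2)
x = rng.gamma(shape=3.0, scale=2.0, size=1000)  # invented sample

# Negative log-likelihood of a Gamma(k, s) model; the shape parameter k
# has no closed-form MLE, so a numerical optimizer is required.
def nll(params, x):
    log_k, log_s = params      # optimize in log-space to keep k, s > 0
    k, s = np.exp(log_k), np.exp(log_s)
    return np.sum(gammaln(k) + k * np.log(s) - (k - 1) * np.log(x) + x / s)

res = minimize(nll, x0=[0.0, 0.0], args=(x,))
print(np.exp(res.x))  # estimated (shape, scale), close to (3.0, 2.0)
```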

7
MLE and Kullback-Leibler
  • What happens if the data is not generated by the
    model that we assume?
  • Suppose the true distribution is p_t(x)
    and our models are of form p(x | θ).
  • The Kullback-Leibler divergence is
    D(p_t || p(· | θ)) = ∫ p_t(x) log [p_t(x) / p(x | θ)] dx.
  • This is ∫ p_t(x) log p_t(x) dx − ∫ p_t(x) log p(x | θ) dx,
    and only the second term depends on θ.
  • K-L is a measure of the difference between
    p_t(x) and p(x | θ): it is non-negative, and zero
    only when the two distributions are equal.

8
MLE and Kullback-Leibler
  • Samples x_1, ..., x_N drawn from p_t(x).
  • Approximate the expectation over p_t(x)
    by the empirical average over the samples,
    giving the empirical KL:
    (1/N) Σ_i log [p_t(x_i) / p(x_i | θ)].
  • Minimizing the empirical KL is equivalent to MLE:
    the p_t(x_i) terms do not depend on θ, so
    minimizing it amounts to maximizing Σ_i log p(x_i | θ).
  • We find the distribution of form p(x | θ) that is
    closest, in the KL sense, to the true distribution.
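A quick numerical check of the equivalence (my own sketch, with an invented true distribution): the empirical KL and the per-sample negative log-likelihood differ only by a term that is constant in θ, so differences across parameter settings coincide.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.laplace(loc=0.0, scale=1.0, size=1000)  # invented true distribution p_t

def log_pt(x):  # log of the true Laplace density
    return -np.log(2.0) - np.abs(x)

def log_model(x, mu, sigma):  # log of the Gaussian model p(x | theta)
    return -0.5 * np.log(2 * np.pi) - np.log(sigma) - (x - mu)**2 / (2 * sigma**2)

def empirical_kl(mu, sigma):
    return np.mean(log_pt(x) - log_model(x, mu, sigma))

def nll(mu, sigma):
    return -np.mean(log_model(x, mu, sigma))

# For any two parameter settings, the difference in empirical KL equals the
# difference in per-sample negative log-likelihood: the p_t terms cancel.
print(empirical_kl(0.0, 1.0) - empirical_kl(1.0, 2.0))
print(nll(0.0, 1.0) - nll(1.0, 2.0))
```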

9
MLE example
We denote the log-likelihood as a function of θ:
L(θ) = Σ_i log p(x_i | θ).
θ is computed by solving the equations ∂L/∂θ = 0.
For example, the Gaussian family gives a closed-form
solution.
10
Learning with a Prior.
  • We can put a prior p(θ) on the parameter values.
  • We can estimate this recursively (if samples are
    i.i.d.):
    p(θ | x_1, ..., x_N) ∝ p(x_N | θ) p(θ | x_1, ..., x_{N-1}).
  • Bayes Learning: estimate a probability
    distribution on θ, rather than a single value.

11
Recursive Bayes Learning
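As an illustration of recursive Bayes learning (a sketch of my own, not from the slides), consider a Bernoulli parameter with a conjugate Beta prior: each i.i.d. sample updates the posterior p(θ | x_1, ..., x_n) in turn. The data and prior below are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.binomial(1, 0.7, size=50)  # invented i.i.d. coin flips

# Beta(a, b) prior on theta; with a Bernoulli likelihood the posterior
# stays Beta, so the recursive update
#   p(theta | x_1..x_n) ∝ p(x_n | theta) p(theta | x_1..x_{n-1})
# reduces to a simple count update.
a, b = 1.0, 1.0  # uniform prior
for xi in x:
    a += xi       # heads increments a
    b += 1 - xi   # tails increments b

print(a / (a + b))  # posterior mean of theta, near the true 0.7
```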