Title: Lecture 8: The Principle of Maximum Likelihood
1. Lecture 8: The Principle of Maximum Likelihood
2. Syllabus
- Lecture 01 Describing Inverse Problems
- Lecture 02 Probability and Measurement Error, Part 1
- Lecture 03 Probability and Measurement Error, Part 2
- Lecture 04 The L2 Norm and Simple Least Squares
- Lecture 05 A Priori Information and Weighted Least Squares
- Lecture 06 Resolution and Generalized Inverses
- Lecture 07 Backus-Gilbert Inverse and the Trade-off of Resolution and Variance
- Lecture 08 The Principle of Maximum Likelihood
- Lecture 09 Inexact Theories
- Lecture 10 Nonuniqueness and Localized Averages
- Lecture 11 Vector Spaces and Singular Value Decomposition
- Lecture 12 Equality and Inequality Constraints
- Lecture 13 L1, L∞ Norm Problems and Linear Programming
- Lecture 14 Nonlinear Problems: Grid and Monte Carlo Searches
- Lecture 15 Nonlinear Problems: Newton's Method
- Lecture 16 Nonlinear Problems: Simulated Annealing and Bootstrap Confidence Intervals
- Lecture 17 Factor Analysis
- Lecture 18 Varimax Factors, Empirical Orthogonal Functions
- Lecture 19 Backus-Gilbert Theory for Continuous Problems; Radon's Problem
- Lecture 20 Linear Operators and Their Adjoints
- Lecture 21 Fréchet Derivatives
- Lecture 22 Exemplary Inverse Problems, incl. Filter Design
- Lecture 23 Exemplary Inverse Problems, incl. Earthquake Location
- Lecture 24 Exemplary Inverse Problems, incl. Vibrational Problems
3. Purpose of the Lecture
- Introduce the spaces of all possible data and all possible models, and the idea of likelihood
- Use maximization of likelihood as a guiding principle for solving inverse problems
4. Part 1: The spaces of all possible data, all possible models, and the idea of likelihood
5. Viewpoint
- the observed data are one point in the space of all possible observations
- or, equivalently, d^obs is a point in S(d)
6. Plot of d^obs [figure]
7. Plot of d^obs [figure, with the point d^obs labeled]
8. Now suppose
- the data are independent
- each is drawn from a Gaussian distribution
- with the same mean m1 and variance σ²
- (but m1 and σ are unknown)
9. Plot of p(d) [figure]
10. Plot of p(d) [figure: a cloud centered on (d1, d2, d3) with radius proportional to σ]
11. Now interpret
- p(d^obs)
- as the probability that the observed data were in fact observed
- L = log p(d^obs) is called the likelihood
12. Find the parameters in the distribution
- maximize
- p(d^obs)
- with respect to m1 and σ
Maximize the probability that the observed data were in fact observed: the Principle of Maximum Likelihood.
13. Example
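The equations on this slide were images and did not transfer; the following is a sketch of the likelihood they presumably showed, for N independent Gaussian data with common mean m1 and variance σ² (as stated on slide 8):

p(d^{obs}) = (2\pi)^{-N/2}\,\sigma^{-N}\exp\!\left[-\frac{1}{2\sigma^{2}}\sum_{i=1}^{N}\left(d_i^{obs}-m_1\right)^{2}\right]

L = \log p(d^{obs}) = -\frac{N}{2}\log(2\pi) - N\log\sigma - \frac{1}{2\sigma^{2}}\sum_{i=1}^{N}\left(d_i^{obs}-m_1\right)^{2}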
14. Solving the two equations ∂L/∂m1 = 0 and ∂L/∂σ = 0 [derivation]
15. Solving the two equations [result]
- the usual formula for the sample mean
- almost the usual formula for the sample standard deviation
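The resulting formulas were also images; the standard maximum-likelihood estimates for this Gaussian case, which the two captions above describe, are:

m_1^{est} = \frac{1}{N}\sum_{i=1}^{N} d_i^{obs}, \qquad (\sigma^{2})^{est} = \frac{1}{N}\sum_{i=1}^{N}\left(d_i^{obs}-m_1^{est}\right)^{2}

The variance estimate divides by N rather than by N−1, which is why it is only "almost" the usual sample formula.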
16. These two estimates are tied to the assumption that the data are Gaussian-distributed; a different p.d.f. might give different formulas.
17. Example of a likelihood surface [figure]
18. The likelihood maximization process will fail if the p.d.f. has no well-defined peak.
19. Part 2: Using the maximization of likelihood as a guiding principle for solving inverse problems
20. Linear inverse problem with Gaussian-distributed data with known covariance [cov d]; assume Gm = d gives the mean d:
p(d) \propto \exp\!\left[-\tfrac{1}{2}\,(d - Gm)^{T}\,[\mathrm{cov}\,d]^{-1}\,(d - Gm)\right]
21. Principle of maximum likelihood: maximize L = log p(d^obs), i.e. minimize
E = \left(d^{obs} - Gm\right)^{T}[\mathrm{cov}\,d]^{-1}\left(d^{obs} - Gm\right)
with respect to m.
22. Principle of maximum likelihood: maximize L = log p(d^obs), i.e. minimize
E = \left(d^{obs} - Gm\right)^{T}[\mathrm{cov}\,d]^{-1}\left(d^{obs} - Gm\right)
This is just weighted least squares.
23. Principle of maximum likelihood: when the data are Gaussian-distributed, solve Gm = d with weighted least squares, with weighting matrix [cov d]^{-1}.
24. Special case of uncorrelated data, each datum with a different variance, [cov d]_{ii} = σ_{di}²: minimize
E = \sum_{i} \sigma_{di}^{-2}\left(d_i^{obs} - (Gm)_i\right)^{2}
25. Special case of uncorrelated data, each datum with a different variance, [cov d]_{ii} = σ_{di}²: minimize the same E
- errors weighted by their certainty (data with small σ_{di} count more)
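As an illustration of slides 23–25, a minimal weighted least-squares sketch in Python; the straight-line data kernel, data values, and per-datum standard deviations below are made-up placeholders, not values from the lecture.

```python
import numpy as np

# Hypothetical example: straight-line fit d_i = m1 + m2*z_i (z, d_obs, sd are made up).
z = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
d_obs = np.array([1.1, 1.9, 3.2, 3.9, 5.1])
sd = np.array([0.1, 0.1, 0.3, 0.1, 0.2])      # per-datum standard deviations sigma_di

G = np.column_stack([np.ones_like(z), z])      # data kernel for the linear model

# Weighted least squares: scale each row of Gm = d_obs by 1/sigma_di,
# so that certain (small-variance) data count more in the misfit E.
W = 1.0 / sd
m_est, *_ = np.linalg.lstsq(G * W[:, None], d_obs * W, rcond=None)
print(m_est)
```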
26. But what about a priori information?
27. Probabilistic representation of a priori information
- the probability that the model parameters are near m is given by the p.d.f. pA(m)
28. Probabilistic representation of a priori information
- the probability that the model parameters are near m is given by the p.d.f. pA(m)
- centered at the a priori value ⟨m⟩
29. Probabilistic representation of a priori information
- the probability that the model parameters are near m is given by the p.d.f. pA(m)
- its variance reflects the uncertainty in the a priori information
30. [Figure: two panels in the (m1, m2) plane comparing an uncertain (broad) and a certain (narrow) pA(m), both centered at (⟨m1⟩, ⟨m2⟩)]
34. Assessing the information content in pA(m)
- Do we know a little about m,
- or a lot about m?
35. Information Gain, S
- −S is called the Relative Entropy
36. Relative Entropy, S, also called Information Gain
- the null p.d.f. represents the state of no knowledge
37. Relative Entropy, S, also called Information Gain
- a uniform p.d.f. might work for the null p.d.f.
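The defining formula on these slides was an image; below is a minimal numerical sketch, assuming the information gain is the Kullback-Leibler divergence of pA(m) relative to the null p.d.f. pN(m). The 1-D grid, the Gaussian prior, and the uniform null are illustrative assumptions, not values from the lecture.

```python
import numpy as np

# Illustrative 1-D model parameter grid (assumption, not from the slides).
m = np.linspace(-10.0, 10.0, 2001)
dm = m[1] - m[0]

# A priori p.d.f. pA(m): Gaussian centered at <m> = 2 with sigma_m = 1 (assumed values).
pA = np.exp(-0.5 * ((m - 2.0) / 1.0) ** 2)
pA /= np.sum(pA) * dm                      # normalize to unit area

# Null p.d.f. pN(m): uniform over the grid, the "state of no knowledge".
pN = np.full_like(m, 1.0 / (m[-1] - m[0]))

# Information gain S = integral of pA * log(pA / pN) dm  (KL divergence).
mask = pA > 0
S = np.sum(pA[mask] * np.log(pA[mask] / pN[mask])) * dm
print("information gain S =", S)           # larger S means a more informative prior
```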
38. Probabilistic representation of data
- the probability that the data are near d is given by the p.d.f. pA(d)
39. Probabilistic representation of data
- the probability that the data are near d is given by the p.d.f. p(d)
- centered at the observed data d^obs
40. Probabilistic representation of data
- the probability that the data are near d is given by the p.d.f. p(d)
- its variance reflects the uncertainty in the measurements
41. Probabilistic representation of both the prior information and the observed data
- assume the observations and the a priori information are uncorrelated
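The formula on this slide was an image; under the stated uncorrelated assumption it presumably showed the combined p.d.f. as the product of the data p.d.f. (slides 38–40) and the prior p.d.f. (slides 27–29):

p(d, m) = p_A(d)\, p_A(m)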
42. Example of the combined p.d.f. [figure]
43. The theory d = g(m) is a surface in the combined space of data and model parameters, on which the estimated model parameters and predicted data must lie.
44. The theory d = g(m) is a surface in the combined space of data and model parameters, on which the estimated model parameters and predicted data must lie; for a linear theory, the surface is planar.
45. The principle of maximum likelihood says: maximize the combined p.d.f. on the surface d = g(m).
46. [Figure]
49. Principle of maximum likelihood with Gaussian-distributed data and Gaussian-distributed a priori information: minimize
E = \left(d^{obs} - Gm\right)^{T}[\mathrm{cov}\,d]^{-1}\left(d^{obs} - Gm\right) + \left(m - \langle m\rangle\right)^{T}[\mathrm{cov}\,m]^{-1}\left(m - \langle m\rangle\right)
50. This is just weighted least squares, with
F = \begin{bmatrix} [\mathrm{cov}\,d]^{-1/2}\,G \\ [\mathrm{cov}\,m]^{-1/2}\,I \end{bmatrix}, \qquad f = \begin{bmatrix} [\mathrm{cov}\,d]^{-1/2}\,d^{obs} \\ [\mathrm{cov}\,m]^{-1/2}\,\langle m\rangle \end{bmatrix}
so we already know the solution.
51. Solve Fm = f with simple least squares.
52. When [cov d] = σ_d²I and [cov m] = σ_m²I
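The simplified objective on this slide was an image; with these covariances the minimization of slide 49 reduces, up to an overall factor of σ_d⁻², to the damped least-squares form:

E \;\propto\; \left\|d^{obs} - Gm\right\|_2^{2} + \varepsilon^{2}\left\|m - \langle m\rangle\right\|_2^{2}, \qquad \varepsilon^{2} = \sigma_d^{2}/\sigma_m^{2}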
53. This provides an answer to the question: what should be the value of ε² in damped least squares? The answer: it should be set to the ratio of the variances of the data and the a priori model parameters, ε² = σ_d²/σ_m².
54. If the a priori information is Hm = h with covariance [cov h]_A, then Fm = f becomes
F = \begin{bmatrix} [\mathrm{cov}\,d]^{-1/2}\,G \\ [\mathrm{cov}\,h]_A^{-1/2}\,H \end{bmatrix}, \qquad f = \begin{bmatrix} [\mathrm{cov}\,d]^{-1/2}\,d^{obs} \\ [\mathrm{cov}\,h]_A^{-1/2}\,h \end{bmatrix}
55. Gm = d^obs with covariance [cov d]; Hm = h with covariance [cov h]_A:
m^{est} = (F^{T}F)^{-1}F^{T}f
the most useful formula in inverse theory, with F and f as defined on the previous slide.
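A minimal numerical sketch of this generalized least-squares recipe in Python. The data kernel G, data d_obs, prior equations Hm = h, and the covariances below are illustrative placeholders; the construction of F and f follows the slides above.

```python
import numpy as np

# Illustrative problem; G, d_obs, H, h and the covariances are made-up placeholders.
G = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])            # data kernel in Gm = d
d_obs = np.array([1.0, 2.1, 2.9])     # observed data
cov_d = 0.01 * np.eye(3)              # data covariance [cov d]

H = np.array([[1.0, -1.0]])           # a priori information Hm = h
h = np.array([0.0])
cov_h = 1.0 * np.eye(1)               # covariance of the a priori information [cov h]_A

# Any matrix W with W^T W = cov^{-1} can serve as the row weighting;
# the inverse of the Cholesky factor is one convenient choice.
Wd = np.linalg.inv(np.linalg.cholesky(cov_d))
Wh = np.linalg.inv(np.linalg.cholesky(cov_h))

# Stack the weighted data equations and prior equations into Fm = f.
F = np.vstack([Wd @ G, Wh @ H])
f = np.concatenate([Wd @ d_obs, Wh @ h])

# "The most useful formula in inverse theory": m_est = (F^T F)^{-1} F^T f,
# computed here with a least-squares solver rather than an explicit inverse.
m_est, *_ = np.linalg.lstsq(F, f, rcond=None)
print(m_est)
```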