Title: Advanced Topics in Least Squares
Slide 1: Lecture 7
- Advanced Topics in Least Squares
Slide 2
- The multivariate normal distribution for data, d:
- $p(d) = (2\pi)^{-N/2}\,|C_d|^{-1/2}\exp[-\tfrac{1}{2}(d-\bar{d})^T C_d^{-1}(d-\bar{d})]$
- Let's assume that the expectation $\bar{d}$ is given by a general linear model, $\bar{d} = Gm$,
- and that the covariance $C_d$ is known (prior covariance).
Slide 3
- Then we have a distribution $p(d; m)$ with unknown parameters m:
- $p(d; m) = (2\pi)^{-N/2}\,|C_d|^{-1/2}\exp[-\tfrac{1}{2}(d-Gm)^T C_d^{-1}(d-Gm)]$
- We can now apply the principle of maximum likelihood to estimate the unknown parameters m.
Slide 4: Principle of Maximum Likelihood
- Last lecture we stated this principle as: maximize $L(m) = \sum_i \ln p(d_i; m)$ with respect to m,
- but in this distribution the whole data vector d is being treated as a single quantity.
- So the principle becomes simply: maximize $L(m) = \ln p(d; m)$, with
- $p(d; m) = (2\pi)^{-N/2}\,|C_d|^{-1/2}\exp[-\tfrac{1}{2}(d-Gm)^T C_d^{-1}(d-Gm)]$
Slide 5
- $L(m) = \ln p(d; m) = -\tfrac{1}{2}N\ln(2\pi) - \tfrac{1}{2}\ln|C_d| - \tfrac{1}{2}(d-Gm)^T C_d^{-1}(d-Gm)$
- The first two terms do not contain m, so the principle of maximum likelihood is:
- maximize $-\tfrac{1}{2}(d-Gm)^T C_d^{-1}(d-Gm)$
- or, equivalently,
- minimize $(d-Gm)^T C_d^{-1}(d-Gm)$
Slide 6
- Minimize $(d-Gm)^T C_d^{-1}(d-Gm)$
- Special case of uncorrelated data with equal variance: $C_d = \sigma_d^2 I$
- Minimize $\sigma_d^{-2}(d-Gm)^T(d-Gm)$ with respect to m,
- which is the same as:
- minimize $(d-Gm)^T(d-Gm)$ with respect to m.
- This is the Principle of Least Squares.
Slide 7
- This is the Principle of Least Squares:
- minimize $E = e^T e = (d-Gm)^T(d-Gm)$ with respect to m.
- It follows from the Principle of Maximum Likelihood in the special case of:
- a multivariate normal distribution, and
- the data being uncorrelated and of equal variance.
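A minimal MatLab sketch of this principle for a straight-line fit; the setup (N, x, the true values a = 1 and b = 2, and the noise level) is an assumption for illustration, not from the lecture:

    % assumed synthetic setup: straight line d = a + b*x plus uncorrelated noise
    N = 101;
    x = (0:N-1)';
    d = 1.0 + 2.0*x + random('Normal',0,5,N,1);
    G = [ones(N,1), x];          % data kernel for d = a + b*x
    m = inv(G'*G)*(G'*d);        % least-squares estimate m = [GTG]^-1 GTd
    e = d - G*m;                 % prediction error
    E = e'*e;                    % total error, minimized by this m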
Slide 8: Corollary
- If your data are
- NOT NORMALLY DISTRIBUTED
- then least-squares is
- not the right method to use!
Slide 9
- What if $C_d = \sigma_d^2 I$ but $\sigma_d$ is unknown?
- Note $|C_d| = \sigma_d^{2N}$, so
- $L(m, \sigma_d) = -\tfrac{1}{2}N\ln(2\pi) - \tfrac{1}{2}\ln|C_d| - \tfrac{1}{2}(d-Gm)^T C_d^{-1}(d-Gm)$
- $\phantom{L(m, \sigma_d)} = -\tfrac{1}{2}N\ln(2\pi) - N\ln\sigma_d - \tfrac{1}{2}\sigma_d^{-2}(d-Gm)^T(d-Gm)$
- The first two terms do not contain m, so the principle of maximum likelihood still implies:
- minimize $(d-Gm)^T(d-Gm) = e^T e = E$
- Then $\partial L/\partial\sigma_d = 0 = -N\sigma_d^{-1} + \sigma_d^{-3}(d-Gm)^T(d-Gm)$
- or, solving for $\sigma_d$:
- $\sigma_d^2 = N^{-1}(d-Gm)^T(d-Gm) = N^{-1}\, e^T e$
Slide 10
- This is the Principle of Maximum Likelihood; it implies that
- $\sigma_d^2 = N^{-1}(d-Gm)^T(d-Gm) = N^{-1}\, e^T e$
- is a good posterior estimate of the variance of the data, when
- the data follow a multivariate normal distribution, and
- the data are uncorrelated and of uniform (but unknown) variance $\sigma_d^2$.
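Continuing the illustrative sketch from slide 7, this posterior variance estimate is one line of MatLab:

    % posterior estimate of the data variance from the residuals e of the fit
    sd2_est = (e'*e)/N;          % sigma_d^2 = N^-1 eTe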
Slide 11
- But back to the general case:
- what formula for m does the rule
- minimize $(d-Gm)^T C_d^{-1}(d-Gm)$
- imply?
Slide 12: Trick
- Minimize $(d-Gm)^T(d-Gm)$ implies $m = [G^T G]^{-1} G^T d$.
- Now write: minimize
- $(d-Gm)^T C_d^{-1}(d-Gm)$
- $= (d-Gm)^T C_d^{-1/2}\, C_d^{-1/2}(d-Gm)$
- $= (C_d^{-1/2}d - C_d^{-1/2}Gm)^T (C_d^{-1/2}d - C_d^{-1/2}Gm)$
- $= (d' - G'm)^T (d' - G'm)$
- with $d' = C_d^{-1/2} d$ and $G' = C_d^{-1/2} G$.
- This is simple least squares, so $m = [G'^T G']^{-1} G'^T d'$
- or
- $m = [G^T C_d^{-1/2} C_d^{-1/2} G]^{-1} G^T C_d^{-1/2} C_d^{-1/2} d = [G^T C_d^{-1} G]^{-1} G^T C_d^{-1} d$
- ($C_d$ is symmetric, so its inverse and square root are symmetric, too.)
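This transformation can be checked numerically; MatLab's sqrtm returns the symmetric matrix square root. A small illustrative sketch (the exponential covariance model here anticipates slide 17 and is an assumption):

    N = 21;
    [XX, YY] = meshgrid(1:N, 1:N);
    Cd = exp(-0.25*abs(XX-YY));     % an assumed symmetric covariance
    W = inv(sqrtm(Cd));             % Cd^(-1/2), symmetric because Cd is
    % dp = W*d and Gp = W*G then reduce the problem to simple least squares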
Slide 13
- So,
- minimize $(d-Gm)^T C_d^{-1}(d-Gm)$
- implies
- $m = [G^T C_d^{-1} G]^{-1} G^T C_d^{-1} d$
- and
- $C_m = \{[G^T C_d^{-1} G]^{-1} G^T C_d^{-1}\}\, C_d\, \{[G^T C_d^{-1} G]^{-1} G^T C_d^{-1}\}^T$
- $\phantom{C_m} = [G^T C_d^{-1} G]^{-1} G^T C_d^{-1} G\, [G^T C_d^{-1} G]^{-1} = [G^T C_d^{-1} G]^{-1}$
- (Remember the formula $C_m = M C_d M^T$ for $m = Md$.)
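A minimal MatLab sketch of both formulas; the covariance model and synthetic test data are assumptions for illustration:

    % assumed setup: straight line, exponential covariance model
    N = 101; x = (0:N-1)';
    G = [ones(N,1), x];
    [XX, YY] = meshgrid(1:N, 1:N);
    Cd = 25*exp(-0.25*abs(XX-YY));               % assumed prior covariance
    d = 1.0 + 2.0*x + random('Normal',0,5,N,1);  % assumed test data
    Cdi = inv(Cd);
    m  = inv(G'*Cdi*G)*(G'*Cdi*d);   % m  = [GT Cd^-1 G]^-1 GT Cd^-1 d
    Cm = inv(G'*Cdi*G);              % Cm = [GT Cd^-1 G]^-1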
Slide 14: Example with Correlated Noise
[Figure: two noise time series, panels "Uncorrelated Noise" and "Correlated Noise"]
Slide 15: Scatter Plots
- $d_i$ vs. $d_{i+1}$: high correlation
- $d_i$ vs. $d_{i+2}$: some correlation
- $d_i$ vs. $d_{i+3}$: little correlation
Slide 16: data = straight line + correlated noise
- $d = a + bx + n$
[Figure: the data d plotted against x]
Slide 17: Model for Cd
- $[C_d]_{ij} = \sigma_d^2 \exp(-c\,|i-j|)$ with $c = 0.25$: exponential falloff from the main diagonal
- MatLab code (N and sd assumed defined earlier):

    c = 0.25;
    [XX, YY] = meshgrid(1:N, 1:N);
    Cd = (sd^2)*exp(-c*abs(XX-YY));
Slide 18: Results

                   Intercept          Slope
    Correlated     10.96 +/- 20.6     1.92 +/- 0.35
    Uncorrelated    8.42 +/-  7.9     1.97 +/- 0.14
    True            1.0               2.0

- Fit of $d = a + bx + n$: both fits are about the same, but
- note that the error estimates are larger (more realistic?) for the correlated case.
[Figure: data and fitted lines vs. x]
Slide 19: How to make correlated noise

MatLab code:

    w = [0.1, 0.3, 0.7, 1.0, 0.7, 0.3, 0.1]';    % define weighting function
    w = w/sum(w);
    Nw = length(w);
    Nw2 = (Nw-1)/2;
    N = 101;
    N2 = (N-1)/2;
    n1 = random('Normal',0,1,N+Nw,1);            % start with uncorrelated noise
    n = zeros(N,1);
    % correlated noise is a weighted average of
    % neighboring uncorrelated noise values
    for i = -Nw2:Nw2
        n = n + w(i+Nw2+1)*n1(i+Nw-Nw2 : i+Nw+N-1-Nw2);
    end
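A quick check (an assumed addition, not part of the lecture code) confirms that neighboring samples of n are now correlated:

    c1 = corr(n(1:end-1), n(2:end));   % clearly positive for this choice of w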
Slide 20
- Let's look at the transformations
- $d' = C_d^{-1/2} d$
- $G' = C_d^{-1/2} G$
- in the special case of uncorrelated data with different variances, $C_d = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_N^2)$:
- $d'_i = \sigma_i^{-1} d_i$: multiply each datum by the reciprocal of its error
- $G'_{ij} = \sigma_i^{-1} G_{ij}$: multiply each row of the data kernel by the same amount
- Then solve by ordinary least squares (a sketch follows below).
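A minimal MatLab sketch of this row-weighting transform; the two-block choice of sigma mirrors the example on the later slides, everything else is assumed:

    % assumed setup: straight line with per-datum standard deviations sigma
    N = 101; x = (0:N-1)';
    G = [ones(N,1), x];
    sigma = [5*ones(50,1); 100*ones(51,1)];
    d = 1.0 + 2.0*x + sigma.*random('Normal',0,1,N,1);
    dp = d./sigma;                     % d'_i = sigma_i^-1 d_i
    Gp = G./repmat(sigma,1,2);         % G'_ij = sigma_i^-1 G_ij
    m = inv(Gp'*Gp)*(Gp'*dp);          % ordinary least squares on (G', d')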
Slide 21
- The transformed system $G'm = d'$, written out:

$\begin{bmatrix} \sigma_1^{-1}G_{11} & \sigma_1^{-1}G_{12} & \sigma_1^{-1}G_{13} \\ \sigma_2^{-1}G_{21} & \sigma_2^{-1}G_{22} & \sigma_2^{-1}G_{23} \\ \sigma_3^{-1}G_{31} & \sigma_3^{-1}G_{32} & \sigma_3^{-1}G_{33} \\ \vdots & \vdots & \vdots \\ \sigma_N^{-1}G_{N1} & \sigma_N^{-1}G_{N2} & \sigma_N^{-1}G_{N3} \end{bmatrix} m = \begin{bmatrix} \sigma_1^{-1}d_1 \\ \sigma_2^{-1}d_2 \\ \sigma_3^{-1}d_3 \\ \vdots \\ \sigma_N^{-1}d_N \end{bmatrix}$

- Rows have been weighted by a factor of $\sigma_i^{-1}$.
Slide 22
- So this special case is often called weighted least squares.
- Note that the total error is $E = e^T C_d^{-1} e = \sum_i \sigma_i^{-2} e_i^2$.
- Each individual error is weighted by the reciprocal of its variance, so errors involving data with SMALL variance get MORE weight.
Slide 23: Example: fitting a straight line
- 100 data; the first 50 have a different $\sigma_d$ than the last 50.
Slide 24: MatLab code (x and d assumed defined earlier)

    N = 101;
    N2 = (N-1)/2;
    sd(1:N2-1) = 5;
    sd(N2:N) = 100;
    sd2i = sd.^(-2);
    Cdi = diag(sd2i);
    G(:,1) = ones(N,1);
    G(:,2) = x;
    GTCdiGI = inv(G'*Cdi*G);
    m = GTCdiGI*G'*Cdi*d;
    d2 = m(1) + m(2) .* x;

- Note that $C_d^{-1}$ is explicitly defined as a diagonal matrix.
Slide 25: Equal variance
- left 50: $\sigma_d = 5$; right 50: $\sigma_d = 5$
[Figure: data and fitted straight line]
Slide 26: Left has smaller variance
- first 50: $\sigma_d = 5$; last 50: $\sigma_d = 100$
[Figure: data and fitted straight line]
Slide 27: Right has smaller variance
- first 50: $\sigma_d = 100$; last 50: $\sigma_d = 5$
[Figure: data and fitted straight line]
Slide 28
- Finally, two miscellaneous comments about least-squares.
Slide 29: Comment 1: Case of fitting functions to a dataset
- $d_i = m_1 f_1(x_i) + m_2 f_2(x_i) + m_3 f_3(x_i)$
- e.g. $d_i = m_1 \sin(x_i) + m_2 \cos(x_i) + m_3 \sin(2x_i)$
Slide 30
- In matrix form, $Gm = d$:

$\begin{bmatrix} f_1(x_1) & f_2(x_1) & f_3(x_1) \\ f_1(x_2) & f_2(x_2) & f_3(x_2) \\ f_1(x_3) & f_2(x_3) & f_3(x_3) \\ \vdots & \vdots & \vdots \\ f_1(x_N) & f_2(x_N) & f_3(x_N) \end{bmatrix} m = \begin{bmatrix} d_1 \\ d_2 \\ d_3 \\ \vdots \\ d_N \end{bmatrix}$
Slide 31
- Note that the matrix $G^T G$ has elements
- $[G^T G]_{ij} = \sum_k f_i(x_k) f_j(x_k) = f_i \cdot f_j$
- and thus is diagonal if the functions are orthogonal.
Slide 32
- If the functions are normalized so $f_i \cdot f_i = 1$, then $G^T G = I$ and the least-squares solution is
- $m = G^T d$ and $C_m = \sigma_d^2 I$
- super-simple formula! $m_i = f_i \cdot d$
- guaranteed uncorrelated errors!
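A quick MatLab illustration; the discrete sine basis used here is an assumption (it happens to be orthonormal on N points), not the lecture's example:

    % f_k(j) = sqrt(2/(N+1))*sin(pi*k*j/(N+1)) is an orthonormal set
    N = 101; j = (1:N)'; K = 3;
    G = zeros(N,K);
    for k = 1:K
        G(:,k) = sqrt(2/(N+1))*sin(pi*k*j/(N+1));
    end
    % G'*G is the identity matrix to rounding error
    d = G*[1; 2; 3] + random('Normal',0,0.1,N,1);  % assumed test data
    m = G'*d;                                      % m_i = f_i . d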
Slide 33: Example: straight line
- $y_i = a + b x_i$ implies $f_1(x) = 1$ and $f_2(x) = x$,
- so the condition $f_1(x) \cdot f_2(x) = 0$ implies $\sum_i x_i = 0$, or $\bar{x} = 0$;
- this happens when the x's straddle the origin.
- The choice $f_1(x) = 1$ and $f_2(x) = x - \bar{x}$, i.e. $y = a + b(x - \bar{x})$,
- leads to uncorrelated errors in $(a, b)$.
[Figure: sketch of a straight-line fit, intercept a marked on the y axis and sample points x1 ... x5 on the x axis]
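A short MatLab check of this claim (an illustrative sketch; with $C_d = \sigma_d^2 I$, slide 13 gives $C_m = \sigma_d^2 [G^T G]^{-1}$):

    % centering x makes the covariance of (a, b) diagonal
    N = 101; x = (0:N-1)'; sd = 5;
    G1 = [ones(N,1), x];             % y = a + b*x
    G2 = [ones(N,1), x - mean(x)];   % y = a + b*(x - xbar)
    Cm1 = (sd^2)*inv(G1'*G1);        % nonzero off-diagonal: correlated errors
    Cm2 = (sd^2)*inv(G2'*G2);        % zero off-diagonal: uncorrelated errors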
Slide 34: Example: wavelet functions
- Localized oscillation with a characteristic frequency.
[Figure: a set of wavelet basis functions]
Slide 35
[Figure: the matrix $G^T G$ for the wavelet basis, which is almost diagonal]
Slide 36: Comment 2
- Sometimes writing least-squares as
- $[G^T G]\, m = G^T d$ or $[G^T C_d^{-1} G]\, m = G^T C_d^{-1} d$
- is more useful than
- $m = [G^T G]^{-1} G^T d$,
- since you can use some method other than a matrix inverse for solving the equation.
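In MatLab, for example, the backslash operator solves the normal equations directly; a sketch under the same assumed straight-line setup as the earlier examples:

    % solve [GTG] m = GTd without forming a matrix inverse
    N = 101; x = (0:N-1)';
    G = [ones(N,1), x];
    d = 1.0 + 2.0*x + random('Normal',0,5,N,1);
    m = (G'*G)\(G'*d);      % Gaussian elimination, no explicit inverse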