Advanced Topics in Least Squares

Transcript and Presenter's Notes
1
Lecture 7
Advanced Topics in Least Squares

2
  • The multivariate normal distribution for the data d:
  • p(d) = (2\pi)^{-N/2} |C_d|^{-1/2} \exp[ -\frac{1}{2} (d - \bar{d})^T C_d^{-1} (d - \bar{d}) ]
  • Let's assume that the expectation \bar{d}
  • is given by a general linear model,
  • \bar{d} = Gm,
  • and that the covariance C_d
  • is known (prior covariance).

3
  • Then we have a distribution p(d|m)
  • with unknown parameters m:
  • p(d|m) = (2\pi)^{-N/2} |C_d|^{-1/2} \exp[ -\frac{1}{2} (d - Gm)^T C_d^{-1} (d - Gm) ]
  • We can now apply the
  • principle of maximum likelihood
  • to estimate the unknown parameters m.

4
Principle of Maximum Likelihood
  • Last lecture we stated this principle as:
  • maximize L(m) = \sum_i \ln p(d_i|m) with respect to m,
  • but in this distribution
  • the whole data vector d is being treated as a
    single quantity.
  • So the principle becomes simply:
  • Maximize L(m) = \ln p(d|m)

p(d|m) = (2\pi)^{-N/2} |C_d|^{-1/2} \exp[ -\frac{1}{2} (d - Gm)^T C_d^{-1} (d - Gm) ]
5
  • L(m) = \ln p(d|m)
  •      = -\frac{N}{2}\ln(2\pi) - \frac{1}{2}\ln|C_d| - \frac{1}{2}(d - Gm)^T C_d^{-1} (d - Gm)
  • The first two terms do not contain m, so the
    principle of maximum likelihood is:
  • Maximize -\frac{1}{2}(d - Gm)^T C_d^{-1} (d - Gm)
  • or
  • Minimize (d - Gm)^T C_d^{-1} (d - Gm)

6
  • Minimize (d - Gm)^T C_d^{-1} (d - Gm)
  • Special case of uncorrelated data with equal
    variance: C_d = \sigma_d^2 I
  • Minimize \sigma_d^{-2} (d - Gm)^T (d - Gm) with respect to m,
  • which is the same as
  • Minimize (d - Gm)^T (d - Gm) with respect to m.
  • This is the Principle of Least Squares.

7
  • This is the Principle of Least Squares:
  • minimize E = e^T e = (d - Gm)^T (d - Gm) with respect to m.
  • It follows from the Principle of Maximum Likelihood
    in the special case of
  • a multivariate normal distribution, with
  • the data being uncorrelated and of equal variance
    (see the sketch below).

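A minimal MatLab sketch of this principle on synthetic straight-line data (the data and the variable names here are illustrative, not from the slides):

% build hypothetical data d = a + b*x + uncorrelated, equal-variance noise
N = 50;
x = (1:N)';
G = [ones(N,1), x];                       % data kernel of the linear model
mtrue = [1; 2];
d = G*mtrue + random('Normal',0,1,N,1);   % Normal noise, as in the slides' code
% least-squares estimate: minimize (d-G*m)'*(d-G*m)
m = (G'*G)\(G'*d);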
8
  • Corollary
  • If your data are
  • NOT NORMALLY DISTRIBUTED
  • Then least-squares is
  • not the right method to use!

9
  • What if C_d = \sigma_d^2 I but \sigma_d is unknown?
  • (note |C_d| = \sigma_d^{2N})
  • L(m, \sigma_d)
  •   = -\frac{N}{2}\ln(2\pi) - \frac{1}{2}\ln|C_d| - \frac{1}{2}(d - Gm)^T C_d^{-1} (d - Gm)
  •   = -\frac{N}{2}\ln(2\pi) - N\ln\sigma_d - \frac{1}{2}\sigma_d^{-2}(d - Gm)^T (d - Gm)
  • The first two terms do not contain m, so the
    principle of maximum likelihood still implies:
  • Minimize (d - Gm)^T (d - Gm) = e^T e = E
  • Then \partial L/\partial\sigma_d = 0 = -N\sigma_d^{-1} + \sigma_d^{-3}(d - Gm)^T (d - Gm)
  • Or, solving for \sigma_d:
  • \sigma_d^2 = N^{-1} (d - Gm)^T (d - Gm) = N^{-1} e^T e

10
  • So the Principle of Maximum Likelihood
  • implies that
  • \sigma_d^2 = N^{-1} (d - Gm)^T (d - Gm) = N^{-1} e^T e
  • is a good posterior estimate of the variance of
    the data, when
  • the data follow a multivariate normal
    distribution, and
  • the data are uncorrelated, with uniform
  • (but unknown) variance \sigma_d^2 (see the sketch below).

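A minimal MatLab sketch of this variance estimate, assuming the kernel G, data d, and estimate m from the earlier sketch:

e = d - G*m;       % residuals
sd2 = (e'*e)/N;    % posterior estimate of the data variance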
11
  • But back to the general case:
  • What formula for m does the rule
  • Minimize (d - Gm)^T C_d^{-1} (d - Gm)
  • imply?

12
  • Trick:
  • Minimize (d - Gm)^T (d - Gm)
  • implies m = [G^T G]^{-1} G^T d
  • Now write: Minimize
  • (d - Gm)^T C_d^{-1} (d - Gm)
  •   = (d - Gm)^T C_d^{-1/2} C_d^{-1/2} (d - Gm)
  •   = (C_d^{-1/2}d - C_d^{-1/2}Gm)^T (C_d^{-1/2}d - C_d^{-1/2}Gm)
  •   = (d' - G'm)^T (d' - G'm)
  • with d' = C_d^{-1/2} d
  • and G' = C_d^{-1/2} G
  • This is simple least squares, so m = [G'^T G']^{-1} G'^T d'
  • or
  • m = [G^T C_d^{-1/2} C_d^{-1/2} G]^{-1} G^T C_d^{-1/2} C_d^{-1/2} d
      = [G^T C_d^{-1} G]^{-1} G^T C_d^{-1} d

(C_d is symmetric, so its inverse and square root are
symmetric, too.)
13
  • So,
  • minimize (d - Gm)^T C_d^{-1} (d - Gm)
  • implies
  • m = [G^T C_d^{-1} G]^{-1} G^T C_d^{-1} d
  • and
  • C_m = [G^T C_d^{-1} G]^{-1} G^T C_d^{-1} C_d \{ [G^T C_d^{-1} G]^{-1} G^T C_d^{-1} \}^T
  •     = [G^T C_d^{-1} G]^{-1} G^T C_d^{-1} G [G^T C_d^{-1} G]^{-1} = [G^T C_d^{-1} G]^{-1}
  • (A MatLab sketch follows below.)

Remember the formula C_m = M C_d M^T
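A minimal MatLab sketch of this generalized least-squares formula, assuming data d, kernel G, and covariance Cd are already defined:

Cdi = inv(Cd);               % C_d^{-1}
GTCdiGI = inv(G'*Cdi*G);     % [G^T C_d^{-1} G]^{-1}
m = GTCdiGI*G'*Cdi*d;        % estimate of the model parameters
Cm = GTCdiGI;                % its covariance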
14
Example with Correlated Noise
[Figure: two noise time series, one uncorrelated and one correlated]
15
Scatter Plots
d_i vs. d_{i+1}: high correlation
d_i vs. d_{i+2}: some correlation
d_i vs. d_{i+3}: little correlation
16
[Figure: data = straight line + correlated noise, d = a + bx + n, plotted against x]
17
Model for C_d: [C_d]_{ij} = \sigma_d^2 \exp(-c|i-j|) with
c = 0.25: exponential falloff from the main diagonal

MatLab Code:

c = 0.25;
[XX, YY] = meshgrid(1:N, 1:N);
Cd = (sd^2)*exp(-c*abs(XX-YY));
18
Results

               Intercept        Slope
Correlated     10.96 ± 20.6     1.92 ± 0.35
Uncorrelated    8.42 ± 7.9      1.97 ± 0.14
True            1.0             2.0

[Figure: fits of d = a + bx + n plotted against x]
Both fits are about the same, but
note that the error estimates are larger (more realistic?)
for the correlated case.
19
How to make correlated noise:

% define the weighting function
w = [0.1, 0.3, 0.7, 1.0, 0.7, 0.3, 0.1]';
w = w/sum(w);
Nw = length(w);
Nw2 = (Nw-1)/2;
% start with uncorrelated noise
N = 101;
N2 = (N-1)/2;
n1 = random('Normal',0,1,N+Nw,1);
% correlated noise is a weighted average of
% neighboring uncorrelated noise values
n = zeros(N,1);
for i = -Nw2:Nw2
    n = n + w(i+Nw2+1)*n1(i+Nw-Nw2:i+Nw+N-1-Nw2);
end
20
  • Let's look at the transformations
  • d' = C_d^{-1/2} d
  • G' = C_d^{-1/2} G
  • in the special case of uncorrelated data with
    different variances: C_d = diag(\sigma_1^2, \sigma_2^2, ..., \sigma_N^2)
  • d_i' = \sigma_i^{-1} d_i: multiply each datum by
  • the reciprocal of its error
  • G_{ij}' = \sigma_i^{-1} G_{ij}: multiply each row of the
  • data kernel by the same
  • amount
  • Then solve by ordinary least squares.
21
[ \sigma_1^{-1}G_{11}  \sigma_1^{-1}G_{12}  \sigma_1^{-1}G_{13} ]       [ \sigma_1^{-1}d_1 ]
[ \sigma_2^{-1}G_{21}  \sigma_2^{-1}G_{22}  \sigma_2^{-1}G_{23} ]       [ \sigma_2^{-1}d_2 ]
[ \sigma_3^{-1}G_{31}  \sigma_3^{-1}G_{32}  \sigma_3^{-1}G_{33} ]  m  = [ \sigma_3^{-1}d_3 ]
[        ...                  ...                  ...          ]       [       ...        ]
[ \sigma_N^{-1}G_{N1}  \sigma_N^{-1}G_{N2}  \sigma_N^{-1}G_{N3} ]       [ \sigma_N^{-1}d_N ]

Rows have been weighted by a factor of \sigma_i^{-1}
22
So this special case is often called
  • Weighted Least Squares

Note that the total error is E = e^T C_d^{-1} e = \sum_i \sigma_i^{-2} e_i^2,
where the factor \sigma_i^{-2} acts as a weight.
Each individual error is weighted by the
reciprocal of its variance, so errors involving
data with SMALL variance get MORE weight
(see the sketch below).
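A minimal MatLab sketch of weighted least squares via the row-weighting transformation, assuming column vectors d and sd (the per-datum standard deviations \sigma_i) and a kernel G already exist:

wt = 1./sd;               % weights sigma_i^{-1}
Gp = diag(wt)*G;          % G' = C_d^{-1/2} G: row i scaled by 1/sigma_i
dp = wt.*d;               % d' = C_d^{-1/2} d
m = (Gp'*Gp)\(Gp'*dp);    % ordinary least squares on the weighted system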
23
Example: fitting a straight line
100 data, first 50 have a different sd than the
last 50
24
MatLab Code:

N = 101;
N2 = (N-1)/2;
sd(1:N2-1) = 5;
sd(N2:N) = 100;
sd2i = sd.^(-2);
Cdi = diag(sd2i);
G(:,1) = ones(N,1);
G(:,2) = x;
GTCdiGI = inv(G'*Cdi*G);
m = GTCdiGI*G'*Cdi*d;
d2 = m(1) + m(2).*x;

Note that C_d^{-1} is explicitly defined as a
diagonal matrix.
25
Equal variance
[Figure: left 50 points with \sigma_d = 5, right 50 with \sigma_d = 5]
26
Left has smaller variance
[Figure: first 50 points with \sigma_d = 5, last 50 with \sigma_d = 100]
27
Right has smaller variance
[Figure: first 50 points with \sigma_d = 100, last 50 with \sigma_d = 5]
28
Finally, two miscellaneous comments about
least-squares
29
Comment 1: Case of fitting functions to a dataset
d_i = m_1 f_1(x_i) + m_2 f_2(x_i) + m_3 f_3(x_i) + ...
e.g. d_i = m_1 \sin(x_i) + m_2 \cos(x_i) + m_3 \sin(2x_i)
30
[ f_1(x_1)  f_2(x_1)  f_3(x_1) ]       [ d_1 ]
[ f_1(x_2)  f_2(x_2)  f_3(x_2) ]       [ d_2 ]
[ f_1(x_3)  f_2(x_3)  f_3(x_3) ]  m  = [ d_3 ]
[    ...       ...       ...   ]       [ ... ]
[ f_1(x_N)  f_2(x_N)  f_3(x_N) ]       [ d_N ]
31
Note that the matrix G^T G has elements
[G^T G]_{ij} = \sum_k f_i(x_k) f_j(x_k) = f_i \cdot f_j
and thus is diagonal if the functions are orthogonal.
32
If the functions are normalized so that f_i \cdot f_i = 1, then
G^T G = I and the least-squares solution is
m = G^T d and C_m = \sigma_d^2 I
Super-simple formula! m_i = f_i \cdot d
Guaranteed uncorrelated errors! (See the sketch below.)
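A minimal MatLab sketch of this case, using sines and cosines sampled over one full period so the columns of G are orthogonal (the data here are synthetic and illustrative):

N = 100;
x = 2*pi*(0:N-1)'/N;                 % one full period, evenly sampled
G = [sin(x), cos(x), sin(2*x)];      % mutually orthogonal columns on this grid
G = G*diag(1./sqrt(sum(G.^2)));      % normalize each column so fi.fi = 1
d = 3*G(:,1) - 2*G(:,3) + random('Normal',0,0.1,N,1);
m = G'*d;                            % least-squares solution, since GTG = I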
33
Example: straight line
y_i = a + b x_i implies f_1(x) = 1 and f_2(x) = x,
so the condition f_1(x) \cdot f_2(x) = 0 implies \sum_i x_i = 0,
or \bar{x} = 0; this happens when the x's straddle the
origin. The choice f_1(x) = 1 and f_2(x) = x - \bar{x},
i.e. y = a + b(x - \bar{x}), leads to
uncorrelated errors in (a, b), as the sketch below shows.

[Figure: straight-line fit y = a + b(x - \bar{x}) through data at x_1, ..., x_5]
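A minimal MatLab sketch of this decorrelation effect, for hypothetical x's that do not straddle the origin:

N = 11;
x = (1:N)';                   % all positive: xbar is not 0
G1 = [ones(N,1), x];          % f2(x) = x
G2 = [ones(N,1), x-mean(x)];  % f2(x) = x - xbar
C1 = inv(G1'*G1);             % off-diagonal nonzero: correlated (a,b) errors
C2 = inv(G2'*G2);             % diagonal: uncorrelated errors in (a,b)
% (C_m is sd^2 times these matrices, so only the pattern matters)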
34
Example: wavelet functions
[Figure: wavelets, localized oscillations each with a characteristic frequency]
35
[Figure: the matrix G^T G for the wavelet basis]
Almost diagonal
36
Comment 2: Sometimes writing least-squares as
[G^T G] m = G^T d
is more useful than m = [G^T G]^{-1} G^T d,
since you can use some method other than a matrix
inverse for solving the equation (see the sketch below).
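A minimal MatLab sketch of this point, assuming G and d exist:

m = (G'*G)\(G'*d);   % solve [G^T G] m = G^T d directly, no explicit inverse
m = G\d;             % equivalently, backslash solves least squares via QR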