Title: QMDA
Things you should remember
1. Probability and Statistics
The Gaussian or normal distribution:
p(x) = (1/(√(2π) σ)) exp( −(x − x̄)² / (2σ²) )
where x̄ is the expected value and σ² is the variance.
Properties of the normal distribution
- Expectation = Median = Mode = x̄
- 95% of the probability lies within 2σ of the expected value.
Multivariate Distributions
The covariance matrix, C, is very important. Its diagonal elements give the variance of each x_i: σ²_{x_i} = C_ii.
The off-diagonal elements of C indicate whether pairs of x's are correlated, e.g. C_12:
- C_12 < 0: negative correlation
- C_12 > 0: positive correlation
The multivariate normal distribution
p(x) = (2π)^(−N/2) |C_x|^(−1/2) exp( −(1/2) (x − x̄)ᵀ C_x⁻¹ (x − x̄) )
- has expectation x̄
- covariance C_x
- and is normalized to unit area.
If y is linearly related to x, y = Mx, then
- ȳ = M x̄ (rule for means)
- C_y = M C_x Mᵀ (rule for propagating error)
These rules work regardless of the distribution of x.
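The two rules above are easy to check numerically. A minimal sketch, with an assumed 2×2 covariance and an assumed linear map M, comparing the propagated covariance M C_x Mᵀ with the sample covariance of the mapped data:

```python
import numpy as np

# Sketch (assumed setup): draw samples of x with a known covariance,
# map them through y = M x, and check C_y = M C_x M^T empirically.
rng = np.random.default_rng(0)

Cx = np.array([[2.0, 0.5],
               [0.5, 1.0]])   # covariance of x (assumed)
M = np.array([[1.0, 2.0],
              [0.0, 3.0]])    # linear map y = M x (assumed)

x = rng.multivariate_normal(mean=[0.0, 0.0], cov=Cx, size=100_000)
y = x @ M.T                   # each row is y = M x

Cy_rule = M @ Cx @ M.T        # rule for propagating error
Cy_empirical = np.cov(y.T)    # sample covariance of the mapped data

print(Cy_rule)
print(Cy_empirical)           # close to Cy_rule for a large sample
```

The agreement does not depend on x being Gaussian; resampling x from, say, a uniform distribution with the same covariance gives the same C_y.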
2. Least Squares
Simple Least Squares
- Linear relationship between data, d, and model, m: d = Gm
- Minimize the prediction error E = eᵀe with e = d_obs − Gm
- m_est = (GᵀG)⁻¹ Gᵀ d
- If the data are uncorrelated with variance σ_d², then
- C_m = σ_d² (GᵀG)⁻¹
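A minimal sketch of these formulas, fitting an assumed straight line d = m₁ + m₂t to synthetic noisy data and forming the model covariance:

```python
import numpy as np

# Sketch (assumed data): fit a straight line d = m1 + m2*t by least
# squares and form the model covariance C_m = sd^2 (G^T G)^-1.
rng = np.random.default_rng(1)

t = np.linspace(0.0, 10.0, 50)
sd = 0.5                                   # data standard deviation (assumed)
d_obs = 2.0 + 0.3 * t + sd * rng.standard_normal(t.size)

G = np.column_stack([np.ones_like(t), t])  # data kernel for a line
m_est = np.linalg.solve(G.T @ G, G.T @ d_obs)   # (G^T G)^-1 G^T d
Cm = sd**2 * np.linalg.inv(G.T @ G)             # model covariance

print(m_est)                  # near the true values (2.0, 0.3)
print(np.sqrt(np.diag(Cm)))   # standard errors of intercept and slope
```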
Least Squares with prior constraints
- Given uncorrelated data with variance σ_d² that satisfy a linear relationship d = Gm
- And prior information with variance σ_m² that satisfies a linear relationship h = Dm
- The best estimate for the model parameters, m_est, solves
  [ G ; εD ] m = [ d ; εh ]
  with ε = σ_d/σ_m.
Previously, we discussed only the special case h = 0.
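A minimal sketch of the row-stacking recipe, with an assumed prior on the slope of a line (the matrices D, h and all numbers here are illustrative):

```python
import numpy as np

# Sketch (assumed numbers): combine data equations d = G m with prior
# equations h = D m by stacking rows, weighting the prior rows by eps.
rng = np.random.default_rng(2)

t = np.linspace(0.0, 1.0, 20)
G = np.column_stack([np.ones_like(t), t])
sd = 0.2
d = 1.0 + 2.0 * t + sd * rng.standard_normal(t.size)

# Prior information: the slope is believed to be 2, with uncertainty sm.
D = np.array([[0.0, 1.0]])
h = np.array([2.0])
sm = 0.1

eps = sd / sm
F = np.vstack([G, eps * D])           # stacked kernel [G; eps*D]
f = np.concatenate([d, eps * h])      # stacked right-hand side [d; eps*h]
m_est = np.linalg.solve(F.T @ F, F.T @ f)
print(m_est)                          # slope pulled toward the prior value 2
```

Making σ_m smaller increases ε and pulls the estimate harder toward the prior.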
Newton's Method for Non-Linear Least-Squares Problems
- Given data that satisfy a non-linear relationship d = g(m)
- Guess a solution m⁽ᵏ⁾ with k = 0 and linearize around it:
- Δm = m − m⁽ᵏ⁾ and Δd = d − g(m⁽ᵏ⁾) and Δd ≈ G Δm
- with G_ij = ∂g_i/∂m_j evaluated at m⁽ᵏ⁾
- Then iterate, m⁽ᵏ⁺¹⁾ = m⁽ᵏ⁾ + Δm with Δm = (GᵀG)⁻¹ Gᵀ Δd, hoping for convergence.
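A minimal sketch of the iteration for an assumed non-linear model g_i(m) = m₁ exp(−m₂ t_i); the model, the data, and the starting guess are all illustrative:

```python
import numpy as np

# Sketch (assumed model): iterate the linearized least-squares update for
# the non-linear relationship d = g(m) with g_i(m) = m1 * exp(-m2 * t_i).
rng = np.random.default_rng(3)

t = np.linspace(0.0, 4.0, 40)
m_true = np.array([3.0, 0.7])
d = m_true[0] * np.exp(-m_true[1] * t) + 0.02 * rng.standard_normal(t.size)

def g(m):
    return m[0] * np.exp(-m[1] * t)

def jacobian(m):
    # G_ij = dg_i/dm_j evaluated at the current guess m^(k)
    e = np.exp(-m[1] * t)
    return np.column_stack([e, -m[0] * t * e])

m = np.array([1.0, 0.1])                    # initial guess m^(0)
for _ in range(20):
    G = jacobian(m)
    dd = d - g(m)                           # Dd = d - g(m^(k))
    dm = np.linalg.solve(G.T @ G, G.T @ dd) # Dm = (G^T G)^-1 G^T Dd
    m = m + dm                              # m^(k+1) = m^(k) + Dm
print(m)                                    # near the true values (3.0, 0.7)
```

As the slide warns, convergence is only hoped for; a poor starting guess can make the iteration diverge.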
3. Bootstraps
Investigate the statistics of y by creating many datasets y′ and examining their statistics. Each y′ is created through random sampling, with replacement, of the original dataset y.
Example: statistics of the mean of y, given N data
- Draw N random integers in the range 1–N
- Use them to select N resampled data y′₁, y′₂, …, y′_N from the N original data y₁, y₂, …, y_N
- Compute the estimate (1/N) Σ_i y′_i
- Now repeat a gazillion times and examine the resulting distribution of estimates.
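The recipe above can be sketched in a few lines; the dataset here is an assumed synthetic sample:

```python
import numpy as np

# Sketch: bootstrap the distribution of the mean by resampling the
# original dataset with replacement many times.
rng = np.random.default_rng(4)

y = rng.normal(loc=5.0, scale=2.0, size=100)   # assumed original dataset
n_boot = 10_000

means = np.empty(n_boot)
for k in range(n_boot):
    # sampling with replacement: the same datum may be picked repeatedly
    resampled = rng.choice(y, size=y.size, replace=True)
    means[k] = resampled.mean()

print(means.mean())   # centers on the sample mean of y
print(means.std())    # approximates the standard error of the mean
```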
4. Interpolation and Splines
Linear splines
In the interval [x_i, x_{i+1}]:
y(x) = y_i + (y_{i+1} − y_i) (x − x_i) / (x_{i+1} − x_i)
The 1st derivative is discontinuous at the knots x_i.
Cubic splines
A cubic, a + bx + cx² + dx³, in each interval, with a different cubic in the next interval. The 1st and 2nd derivatives are continuous at the knots x_i.
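A minimal sketch comparing the two interpolants through the same assumed knots, using scipy:

```python
import numpy as np
from scipy.interpolate import CubicSpline, interp1d

# Sketch (assumed sample points): compare a linear spline with a cubic
# spline through the same knots.
xi = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
yi = np.sin(xi)                            # assumed underlying function

linear = interp1d(xi, yi, kind="linear")   # piecewise-linear interpolant
cubic = CubicSpline(xi, yi)                # 1st and 2nd derivatives continuous

x = 1.5
print(linear(x), cubic(x), np.sin(x))      # the cubic tracks sin(x) closely
```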
5. Hypothesis Testing
- The Null Hypothesis
- always a variant of this theme:
- the result of an experiment differs from the expected value only because of random variation.
- Test of Significance of Results
- say, to 95% significance:
- the Null Hypothesis would generate the observed result less than 5% of the time.
Four important distributions
- Normal distribution: distribution of the x_i
- Chi-squared distribution: distribution of χ² = Σ_{i=1..N} x_i²
- Student's t-distribution: distribution of t = x₀ / √( N⁻¹ Σ_{i=1..N} x_i² )
- F-distribution: distribution of F = ( N⁻¹ Σ_{i=1..N} x_i² ) / ( M⁻¹ Σ_{i=1..M} x′_i² )
5 tests
- m_obs = m_prior when m_prior and s_prior are known: normal distribution
- s_obs = s_prior when m_prior and s_prior are known: chi-squared distribution
- m_obs = m_prior when m_prior is known but s_prior is unknown: t-distribution
- s1_obs = s2_obs when m1_prior and m2_prior are known: F-distribution
- m1_obs = m2_obs when s1_prior and s2_prior are unknown: modified t-distribution
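The third test above (mean against a prior value, variance unknown) is the classic one-sample t-test; a minimal sketch on assumed data:

```python
import numpy as np
from scipy import stats

# Sketch: test m_obs = m_prior when the variance is unknown, using the
# t-distribution (scipy's one-sample t-test).
rng = np.random.default_rng(5)

d = rng.normal(loc=10.3, scale=1.0, size=25)   # assumed observed data
m_prior = 10.0

t_stat, p_value = stats.ttest_1samp(d, popmean=m_prior)
print(t_stat, p_value)
# Reject the Null Hypothesis at 95% significance when p_value < 0.05.
```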
6. Filters
Filtering operation g(t) = f(t)∗h(t) (convolution)
g(t) = ∫_{−∞}^{t} f(t−τ) h(τ) dτ  →  g_k = Δt Σ_{p=−∞..k} f_{k−p} h_p
or alternatively
g(t) = ∫_{0}^{∞} f(τ) h(t−τ) dτ  →  g_k = Δt Σ_{p=0..∞} f_p h_{k−p}
How to do convolution by hand
x = [x₀, x₁, x₂, x₃, x₄, …]ᵀ and y = [y₀, y₁, y₂, y₃, y₄, …]ᵀ
Reverse one time-series, line them up as shown, and multiply rows. This is the first element of x∗y:
     x₀, x₁, x₂, x₃, x₄, …
  …, y₄, y₃, y₂, y₁, y₀
     x₀y₀ = [x∗y]₁
Then slide, multiply rows, and add to get the second element of x∗y:
         x₀, x₁, x₂, x₃, x₄, …
  …, y₄, y₃, y₂, y₁, y₀
     x₀y₁ + x₁y₀ = [x∗y]₂
And so on.
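numpy's convolve implements exactly this reverse-slide-multiply recipe; a quick check on two short assumed series:

```python
import numpy as np

# Sketch: np.convolve implements the slide-and-multiply recipe above.
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

g = np.convolve(x, y)
# first element: x0*y0 = 4; second: x0*y1 + x1*y0 = 13; and so on
print(g)   # [ 4. 13. 28. 27. 18.]
```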
Matrix formulations of g(t) = f(t)∗h(t):
g = F h  and  g = H f
where the matrices F and H are built from the elements of f and h, respectively.
g = H f
Least-squares equation: HᵀH f = Hᵀg, with f = [f₀, f₁, …, f_N]ᵀ
HᵀH is the matrix of the autocorrelation of h:
  [ A(0)  A(1)  A(2)  …  A(N)   ]
  [ A(1)  A(0)  A(1)  …         ]
  [ A(2)  A(1)  A(0)  …         ]
  [  …                          ]
  [ A(N)  A(N−1) A(N−2) …       ]
Hᵀg is the cross-correlation of h and g: [X(0), X(1), X(2), …, X(N)]ᵀ
A_i and X_i
- Auto-correlation of a time-series, T(t):
- A(τ) = ∫_{−∞}^{∞} T(t) T(t−τ) dt
- A_i = Σ_j T_j T_{j−i}
- Cross-correlation of two time-series T⁽¹⁾(t) and T⁽²⁾(t):
- X(τ) = ∫_{−∞}^{∞} T⁽¹⁾(t) T⁽²⁾(t−τ) dt
- X_i = Σ_j T⁽¹⁾_j T⁽²⁾_{j−i}
7. Fourier Transforms and Spectra
- Integral transforms
- C(ω) = ∫_{−∞}^{∞} T(t) exp(−iωt) dt
- T(t) = (1/2π) ∫_{−∞}^{∞} C(ω) exp(iωt) dω
- Discrete transforms (DFT)
- C_k = Σ_{n=0..N−1} T_n exp(−2πikn/N) with k = 0, …, N−1
- T_n = N⁻¹ Σ_{k=0..N−1} C_k exp(2πikn/N) with n = 0, …, N−1
- Frequency step: Δω Δt = 2π/N
- Maximum (Nyquist) frequency: f_max = 1/(2Δt)
Aliasing and cyclicity: in a digital world, ω_{n+N} = ω_n, and since time and frequency play symmetrical roles in exp(−iωt), t_{k+N} = t_k.
One FFT that you should know: the FFT of a spike at t = 0 is a constant
- C(ω) = ∫_{−∞}^{∞} δ(t) exp(−iωt) dt = exp(0) = 1
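The discrete version behaves the same way, which makes it a handy sanity check for any FFT routine:

```python
import numpy as np

# Sketch: the DFT of a unit spike at t = 0 is a constant.
spike = np.zeros(8)
spike[0] = 1.0

C = np.fft.fft(spike)
print(C)   # all ones: [1.+0.j 1.+0.j ... 1.+0.j]
```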
Error Estimates for the DFT
- Assume uncorrelated, normally-distributed data, d_n = T_n, with variance σ_d²
- The matrix G in Gm = d is G_nk = N⁻¹ exp(2πikn/N)
- The problem Gm = d is linear, so the unknowns, m_k = C_k (the coefficients of the complex exponentials), are also normally-distributed.
- Since the exponentials are orthogonal, GᴴG = N⁻¹I is diagonal
- and C_m = σ_d² (GᴴG)⁻¹ = N σ_d² I is diagonal, too
- Apportioning variance equally between the real and imaginary parts of each C_k, each part has variance σ² = N σ_d²/2.
- The spectrum s_m² = C_{r,m}² + C_{i,m}² is the sum of the squares of two uncorrelated, normally-distributed random variables and is thus χ²₂-distributed (after scaling by σ²).
- The 95% value of χ²₂ is about 5.99, so that to be significant, a peak must exceed 5.99 · N σ_d²/2.
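The variance claim is easy to verify by simulation; a sketch with assumed N, σ_d, and frequency index:

```python
import numpy as np

# Sketch: for uncorrelated normal data of variance sd^2, the real and
# imaginary parts of an interior DFT coefficient C_k each have
# variance N*sd^2/2. Check this over many random realizations.
rng = np.random.default_rng(9)

N, sd, trials = 64, 1.5, 20_000
k = 5                                    # an arbitrary interior frequency index
Ck = np.empty(trials, dtype=complex)
for j in range(trials):
    d = sd * rng.standard_normal(N)
    Ck[j] = np.fft.fft(d)[k]

print(Ck.real.var(), Ck.imag.var())      # both near N*sd^2/2 = 72
```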
- Convolution Theorem
- transform{ f(t)∗g(t) }
- = transform{ f(t) } × transform{ g(t) }
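The theorem can be checked numerically with the DFT; the series here are assumed, and zero-padding makes the FFT's circular convolution agree with the linear one:

```python
import numpy as np

# Sketch: check the convolution theorem numerically. Zero-pad so that
# the circular convolution computed via the FFT equals the linear one.
f = np.array([1.0, 2.0, 3.0, 0.0, 0.0])
h = np.array([0.5, -1.0, 0.0, 0.0, 0.0])

direct = np.convolve(f, h)[: f.size]              # time-domain convolution
via_fft = np.fft.ifft(np.fft.fft(f) * np.fft.fft(h)).real

print(direct)
print(via_fft)   # matches direct
```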
Power spectrum of a stationary time-series
- T(t): stationary time series
- C(ω) = ∫_{−T/2}^{T/2} T(t) exp(−iωt) dt
- S(ω) = lim_{T→∞} T⁻¹ |C(ω)|²
- S(ω) is called the power spectral density: the spectrum normalized by the length of the time series.
Relationship of power spectral density to the DFT
- To compute the Fourier transform, C(ω), you multiply the DFT coefficients, C_k, by Δt.
- So to get power spectral density:
- T⁻¹ |C(ω)|²
- = (NΔt)⁻¹ |Δt C_k|²
- = (Δt/N) |C_k|²
- You multiply the DFT spectrum, |C_k|², by Δt/N.
Windowed Timeseries
- The Fourier transform of the windowed time-series
- is the Fourier transform of the long time-series
- convolved with the Fourier transform of the windowing function.
Window Functions
- Boxcar
- its Fourier transform is a sinc function
- which has a narrow central peak
- but large side lobes
- Hanning (cosine) taper
- its Fourier transform
- has a somewhat wider central peak
- but much smaller side lobes
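The trade-off above can be quantified by transforming the windows themselves; a sketch with assumed window length and padding:

```python
import numpy as np

# Sketch: compare the highest side lobe of a boxcar window with that of
# a Hanning taper, via zero-padded FFTs of the windows themselves.
N, pad = 64, 4096

boxcar = np.ones(N)
hanning = np.hanning(N)

def max_sidelobe_db(w):
    spec = np.abs(np.fft.rfft(w, n=pad))
    spec /= spec.max()        # normalize the central peak to 1
    # walk down the main lobe to its first null, then take the largest
    # remaining peak: that is the highest side lobe
    k = 1
    while k < spec.size - 1 and spec[k + 1] < spec[k]:
        k += 1
    return 20 * np.log10(spec[k:].max())

print(max_sidelobe_db(boxcar))    # about -13 dB
print(max_sidelobe_db(hanning))   # about -31 dB: much smaller side lobes
```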
8. EOFs and Factor Analysis
Representation of samples as a linear mixing of factors:
S = C F
- S (samples, N×M): element (i, j) is the amount of element j (A, B, C, …) in sample s_i
- C (coefficients, N×M): element (i, j) is the amount of factor f_j in sample s_i
- F (factors, M×M): element (i, j) is the amount of element j in factor f_i
The data can be approximated with only the p most important factors, those with the biggest coefficients:
S ≈ C_p F_p
- C_p (selected coefficients, N×p): keep only the columns for f₁, …, f_p (e.g. ignore f₃)
- F_p (selected factors, p×M): keep only the rows for f₁, …, f_p
Singular Value Decomposition (SVD)
Any N×M matrix S can be written as the product of three matrices
S = U Λ Vᵀ
where U is N×N and satisfies UᵀU = UUᵀ = I, V is M×M and satisfies VᵀV = VVᵀ = I, and Λ is an N×M diagonal matrix of singular values, λ_i.
SVD decomposition of S: S = U Λ Vᵀ. Write this as
S = U Λ Vᵀ = (U Λ)(Vᵀ) = C F
So the coefficients are C = U Λ and the factors are F = Vᵀ. The factors with the biggest λ_i are the most important.
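A minimal sketch of this factorization on assumed synthetic data that truly mix only two factors:

```python
import numpy as np

# Sketch: build the C and F matrices from the SVD and reconstruct the
# sample matrix S from only the p most important factors.
rng = np.random.default_rng(6)

# Assumed synthetic data: N=50 samples of M=5 elements, mixing 2 factors.
true_factors = rng.standard_normal((2, 5))
true_coeffs = rng.standard_normal((50, 2))
S = true_coeffs @ true_factors

U, lam, VT = np.linalg.svd(S, full_matrices=False)
C = U * lam             # coefficients C = U * Lambda
F = VT                  # factors F = V^T

p = 2
S_approx = C[:, :p] @ F[:p, :]      # keep the p most important factors
print(lam)                          # only 2 singular values are non-negligible
print(np.allclose(S, S_approx))     # rank-2 data is recovered exactly
```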
Transformations of Factors
- If you choose the p most important factors, they define both a subspace in which the samples must lie and a set of coordinate axes for that subspace. The choice of axes is not unique, and could be changed through a transformation, T:
- F_new = T F_old
- A requirement is that T⁻¹ exists, else F_new will not span the same subspace as F_old
- S = C F = C I F = (C T⁻¹)(T F) = C_new F_new
- So you could try to implement the desirable factors by designing an appropriate transformation matrix, T.
9. Metropolis Algorithm and Simulated Annealing
Metropolis Algorithm: a method to generate a vector x of realizations of the distribution p(x).
The process is iterative: start with an x, say x⁽ⁱ⁾; then randomly generate another x in its neighborhood, say x⁽ⁱ⁺¹⁾, using a distribution Q(x⁽ⁱ⁺¹⁾|x⁽ⁱ⁾); then test whether you will accept the new x⁽ⁱ⁺¹⁾. If it passes, you append x⁽ⁱ⁺¹⁾ to the vector x that you are accumulating; if it fails, then you append x⁽ⁱ⁾.
A reasonable choice for Q(x⁽ⁱ⁺¹⁾|x⁽ⁱ⁾) is a normal distribution with mean x⁽ⁱ⁾ and a variance σ_x² that quantifies the sense of neighborhood. The acceptance test is as follows: first compute the quantity a = p(x⁽ⁱ⁺¹⁾) / p(x⁽ⁱ⁾). If a > 1, always accept x⁽ⁱ⁺¹⁾; if a < 1, accept x⁽ⁱ⁺¹⁾ with a probability of a and accept x⁽ⁱ⁾ with a probability of 1 − a.
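A minimal sketch of the algorithm, with an assumed target p(x) (a standard normal, unnormalized) and a normal proposal Q:

```python
import numpy as np

# Sketch: Metropolis sampling of an assumed target p(x), with a normal
# proposal Q centered on the current x.
rng = np.random.default_rng(7)

def p(x):
    return np.exp(-0.5 * x**2)        # unnormalized target (assumed)

sigma_x = 1.0                          # neighborhood size of the proposal Q
x = 0.0                                # starting point x^(0)
samples = []
for _ in range(50_000):
    x_new = rng.normal(loc=x, scale=sigma_x)   # draw from Q(x_new | x)
    a = p(x_new) / p(x)
    if a >= 1 or rng.uniform() < a:    # accept with probability min(1, a)
        x = x_new
    samples.append(x)                  # on failure, x^(i) is appended again

samples = np.array(samples)
print(samples.mean(), samples.std())   # near 0 and 1 for this target
```

Note that p(x) only enters through the ratio a, so the normalization constant is never needed.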
Simulated Annealing: application of Metropolis to non-linear optimization. Find the m that minimizes E(m) = eᵀe, where e = d_obs − g(m).
Based on using the Boltzmann distribution for p(x) in the Metropolis Algorithm:
p(x) ∝ exp( −E(m)/T )
where the temperature, T, is slowly decreased during the iterations.
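A minimal sketch on an assumed one-parameter misfit with several local minima; the cooling schedule and step size are illustrative choices:

```python
import numpy as np

# Sketch: simulated annealing on an assumed one-parameter misfit E(m)
# with local minima, using the Boltzmann acceptance rule.
rng = np.random.default_rng(8)

def E(m):
    # assumed misfit: quadratic bowl plus oscillatory local minima
    return (m - 3.0) ** 2 + 2.0 * np.sin(5.0 * m) ** 2

m = -5.0                        # poor starting guess
T = 10.0                        # initial temperature
for _ in range(20_000):
    m_new = m + rng.normal(scale=0.5)
    a = np.exp(-(E(m_new) - E(m)) / T)   # Boltzmann acceptance ratio
    if a >= 1 or rng.uniform() < a:
        m = m_new
    T = max(T * 0.9995, 1e-3)   # slowly decrease the temperature

print(m)                        # ends in a low-misfit region near m = 3
```

At high T the walk crosses barriers freely; as T drops it settles into a deep minimum.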
10. Some final words
Start Simple!
- Examine a small subset of your data and look it over carefully
- Build processing scripts incrementally, checking intermediate results at each stage
- Make lots of plots and look them over carefully
- Do reality checks