Title: QMDA
Things you should remember
1. Probability and Statistics
The Gaussian or normal distribution:
p(x) = (1/(√(2π) σ)) exp( −(x − x̄)² / (2σ²) )
where x̄ is the expected value and σ² is the variance.
Properties of the normal distribution
- Expectation = Median = Mode = x̄
- 95% of the probability lies within 2σ of the expected value.
Multivariate Distributions
The covariance matrix, C, is very important. Its diagonal elements give the variance of each x_i: σ²_{x_i} = C_ii.
The off-diagonal elements of C indicate whether pairs of x's are correlated, e.g. C_12:
- C_12 < 0: negative correlation
- C_12 > 0: positive correlation
The multivariate normal distribution
p(x) = (2π)^(−N/2) |C_x|^(−1/2) exp( −(1/2) (x − x̄)ᵀ C_x⁻¹ (x − x̄) )
- has expectation x̄
- covariance C_x
- and is normalized to unit area.
If y is linearly related to x, y = Mx, then
- ȳ = M x̄ (rule for means)
- C_y = M C_x Mᵀ (rule for propagating error)
These rules work regardless of the distribution of x.
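The two rules above are easy to check numerically. A minimal sketch, with an assumed 2×2 covariance and an assumed linear map M, comparing the propagated covariance M C_x Mᵀ with the sample covariance of the mapped data:

```python
import numpy as np

# Sketch (assumed setup): draw samples of x with a known covariance,
# map them through y = M x, and check C_y = M C_x M^T empirically.
rng = np.random.default_rng(0)

Cx = np.array([[2.0, 0.5],
               [0.5, 1.0]])   # covariance of x (assumed)
M = np.array([[1.0, 2.0],
              [0.0, 3.0]])    # linear map y = M x (assumed)

x = rng.multivariate_normal(mean=[0.0, 0.0], cov=Cx, size=100_000)
y = x @ M.T                   # each row is y = M x

Cy_rule = M @ Cx @ M.T        # rule for propagating error
Cy_empirical = np.cov(y.T)    # sample covariance of the mapped data

print(Cy_rule)
print(Cy_empirical)           # close to Cy_rule for a large sample
```

The agreement does not depend on x being Gaussian; resampling x from, say, a uniform distribution with the same covariance gives the same C_y.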
2. Least Squares
Simple Least Squares
- Linear relationship between data, d, and model, m: d = Gm
- Minimize the prediction error E = eᵀe with e = d_obs − Gm
- m_est = (GᵀG)⁻¹ Gᵀ d
- If the data are uncorrelated with variance σ_d², then
- C_m = σ_d² (GᵀG)⁻¹
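A minimal sketch of these formulas, fitting an assumed straight line d = m₁ + m₂t to synthetic noisy data and forming the model covariance:

```python
import numpy as np

# Sketch (assumed data): fit a straight line d = m1 + m2*t by least
# squares and form the model covariance C_m = sd^2 (G^T G)^-1.
rng = np.random.default_rng(1)

t = np.linspace(0.0, 10.0, 50)
sd = 0.5                                   # data standard deviation (assumed)
d_obs = 2.0 + 0.3 * t + sd * rng.standard_normal(t.size)

G = np.column_stack([np.ones_like(t), t])  # data kernel for a line
m_est = np.linalg.solve(G.T @ G, G.T @ d_obs)   # (G^T G)^-1 G^T d
Cm = sd**2 * np.linalg.inv(G.T @ G)             # model covariance

print(m_est)                  # near the true values (2.0, 0.3)
print(np.sqrt(np.diag(Cm)))   # standard errors of intercept and slope
```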
Least Squares with prior constraints
- Given uncorrelated data with variance σ_d² that satisfy a linear relationship d = Gm
- And prior information with variance σ_m² that satisfies a linear relationship h = Dm
- The best estimate for the model parameters, m_est, solves
  [ G ; εD ] m = [ d ; εh ]
  with ε = σ_d/σ_m.
Previously, we discussed only the special case h = 0.
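A minimal sketch of the row-stacking recipe, with an assumed prior on the slope of a line (the matrices D, h and all numbers here are illustrative):

```python
import numpy as np

# Sketch (assumed numbers): combine data equations d = G m with prior
# equations h = D m by stacking rows, weighting the prior rows by eps.
rng = np.random.default_rng(2)

t = np.linspace(0.0, 1.0, 20)
G = np.column_stack([np.ones_like(t), t])
sd = 0.2
d = 1.0 + 2.0 * t + sd * rng.standard_normal(t.size)

# Prior information: the slope is believed to be 2, with uncertainty sm.
D = np.array([[0.0, 1.0]])
h = np.array([2.0])
sm = 0.1

eps = sd / sm
F = np.vstack([G, eps * D])           # stacked kernel [G; eps*D]
f = np.concatenate([d, eps * h])      # stacked right-hand side [d; eps*h]
m_est = np.linalg.solve(F.T @ F, F.T @ f)
print(m_est)                          # slope pulled toward the prior value 2
```

Making σ_m smaller increases ε and pulls the estimate harder toward the prior.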
Newton's Method for Non-Linear Least-Squares Problems
- Given data that satisfy a non-linear relationship d = g(m)
- Guess a solution m⁽ᵏ⁾ with k = 0 and linearize around it:
- Δm = m − m⁽ᵏ⁾ and Δd = d − g(m⁽ᵏ⁾) and Δd ≈ G Δm
- with G_ij = ∂g_i/∂m_j evaluated at m⁽ᵏ⁾
- Then iterate, m⁽ᵏ⁺¹⁾ = m⁽ᵏ⁾ + Δm with Δm = (GᵀG)⁻¹ Gᵀ Δd, hoping for convergence.
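A minimal sketch of the iteration for an assumed non-linear model g_i(m) = m₁ exp(−m₂ t_i); the model, the data, and the starting guess are all illustrative:

```python
import numpy as np

# Sketch (assumed model): iterate the linearized least-squares update for
# the non-linear relationship d = g(m) with g_i(m) = m1 * exp(-m2 * t_i).
rng = np.random.default_rng(3)

t = np.linspace(0.0, 4.0, 40)
m_true = np.array([3.0, 0.7])
d = m_true[0] * np.exp(-m_true[1] * t) + 0.02 * rng.standard_normal(t.size)

def g(m):
    return m[0] * np.exp(-m[1] * t)

def jacobian(m):
    # G_ij = dg_i/dm_j evaluated at the current guess m^(k)
    e = np.exp(-m[1] * t)
    return np.column_stack([e, -m[0] * t * e])

m = np.array([1.0, 0.1])                    # initial guess m^(0)
for _ in range(20):
    G = jacobian(m)
    dd = d - g(m)                           # Dd = d - g(m^(k))
    dm = np.linalg.solve(G.T @ G, G.T @ dd) # Dm = (G^T G)^-1 G^T Dd
    m = m + dm                              # m^(k+1) = m^(k) + Dm
print(m)                                    # near the true values (3.0, 0.7)
```

As the slide warns, convergence is only hoped for; a poor starting guess can make the iteration diverge.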
3. Bootstraps
Investigate the statistics of y by creating many datasets y′ and examining their statistics. Each y′ is created through random sampling, with replacement, of the original dataset y.
Example: statistics of the mean of y, given N data
- Draw N random integers in the range 1–N
- Use them to select N resampled data y′₁, y′₂, …, y′_N from the N original data y₁, y₂, …, y_N
- Compute the estimate (1/N) Σ_i y′_i
- Now repeat a gazillion times and examine the resulting distribution of estimates.
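The recipe above can be sketched in a few lines; the dataset here is an assumed synthetic sample:

```python
import numpy as np

# Sketch: bootstrap the distribution of the mean by resampling the
# original dataset with replacement many times.
rng = np.random.default_rng(4)

y = rng.normal(loc=5.0, scale=2.0, size=100)   # assumed original dataset
n_boot = 10_000

means = np.empty(n_boot)
for k in range(n_boot):
    # sampling with replacement: the same datum may be picked repeatedly
    resampled = rng.choice(y, size=y.size, replace=True)
    means[k] = resampled.mean()

print(means.mean())   # centers on the sample mean of y
print(means.std())    # approximates the standard error of the mean
```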
4. Interpolation and Splines
Linear splines
In the interval [x_i, x_{i+1}]:
y(x) = y_i + (y_{i+1} − y_i) (x − x_i) / (x_{i+1} − x_i)
The 1st derivative is discontinuous at the knots x_i.
Cubic splines
A cubic, a + bx + cx² + dx³, in each interval, with a different cubic in the next interval. The 1st and 2nd derivatives are continuous at the knots x_i.
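A minimal sketch comparing the two interpolants through the same assumed knots, using scipy:

```python
import numpy as np
from scipy.interpolate import CubicSpline, interp1d

# Sketch (assumed sample points): compare a linear spline with a cubic
# spline through the same knots.
xi = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
yi = np.sin(xi)                            # assumed underlying function

linear = interp1d(xi, yi, kind="linear")   # piecewise-linear interpolant
cubic = CubicSpline(xi, yi)                # 1st and 2nd derivatives continuous

x = 1.5
print(linear(x), cubic(x), np.sin(x))      # the cubic tracks sin(x) closely
```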
5. Hypothesis Testing
- The Null Hypothesis
- always a variant of this theme:
- the result of an experiment differs from the expected value only because of random variation.
- Test of Significance of Results
- say, to 95% significance:
- the Null Hypothesis would generate the observed result less than 5% of the time.
Four important distributions
- Normal distribution: distribution of the x_i
- Chi-squared distribution: distribution of χ² = Σ_{i=1..N} x_i²
- Student's t-distribution: distribution of t = x₀ / √( N⁻¹ Σ_{i=1..N} x_i² )
- F-distribution: distribution of F = ( N⁻¹ Σ_{i=1..N} x_i² ) / ( M⁻¹ Σ_{i=1..M} x′_i² )
5 tests
- m_obs = m_prior when m_prior and s_prior are known: normal distribution
- s_obs = s_prior when m_prior and s_prior are known: chi-squared distribution
- m_obs = m_prior when m_prior is known but s_prior is unknown: t-distribution
- s1_obs = s2_obs when m1_prior and m2_prior are known: F-distribution
- m1_obs = m2_obs when s1_prior and s2_prior are unknown: modified t-distribution
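The third test above (mean against a prior value, variance unknown) is the classic one-sample t-test; a minimal sketch on assumed data:

```python
import numpy as np
from scipy import stats

# Sketch: test m_obs = m_prior when the variance is unknown, using the
# t-distribution (scipy's one-sample t-test).
rng = np.random.default_rng(5)

d = rng.normal(loc=10.3, scale=1.0, size=25)   # assumed observed data
m_prior = 10.0

t_stat, p_value = stats.ttest_1samp(d, popmean=m_prior)
print(t_stat, p_value)
# Reject the Null Hypothesis at 95% significance when p_value < 0.05.
```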
6. Filters
Filtering operation g(t) = f(t)∗h(t) (convolution)
g(t) = ∫_{−∞}^{t} f(t−τ) h(τ) dτ  →  g_k = Δt Σ_{p=−∞..k} f_{k−p} h_p
or alternatively
g(t) = ∫_{0}^{∞} f(τ) h(t−τ) dτ  →  g_k = Δt Σ_{p=0..∞} f_p h_{k−p}
How to do convolution by hand
x = [x₀, x₁, x₂, x₃, x₄, …]ᵀ and y = [y₀, y₁, y₂, y₃, y₄, …]ᵀ
Reverse one time-series, line them up as shown, and multiply rows. This is the first element of x∗y:
     x₀, x₁, x₂, x₃, x₄, …
  …, y₄, y₃, y₂, y₁, y₀
     x₀y₀ = [x∗y]₁
Then slide, multiply rows, and add to get the second element of x∗y:
         x₀, x₁, x₂, x₃, x₄, …
  …, y₄, y₃, y₂, y₁, y₀
     x₀y₁ + x₁y₀ = [x∗y]₂
And so on.
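numpy's convolve implements exactly this reverse-slide-multiply recipe; a quick check on two short assumed series:

```python
import numpy as np

# Sketch: np.convolve implements the slide-and-multiply recipe above.
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

g = np.convolve(x, y)
# first element: x0*y0 = 4; second: x0*y1 + x1*y0 = 13; and so on
print(g)   # [ 4. 13. 28. 27. 18.]
```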
Matrix formulations of g(t) = f(t)∗h(t):
g = F h  and  g = H f
where the matrices F and H are built from the elements of f and h, respectively.
g = H f
Least-squares equation: HᵀH f = Hᵀg, with f = [f₀, f₁, …, f_N]ᵀ
HᵀH is the matrix of the autocorrelation of h:
  [ A(0)  A(1)  A(2)  …  A(N)   ]
  [ A(1)  A(0)  A(1)  …         ]
  [ A(2)  A(1)  A(0)  …         ]
  [  …                          ]
  [ A(N)  A(N−1) A(N−2) …       ]
Hᵀg is the cross-correlation of h and g: [X(0), X(1), X(2), …, X(N)]ᵀ
A_i and X_i
- Auto-correlation of a time-series, T(t):
- A(τ) = ∫_{−∞}^{∞} T(t) T(t−τ) dt
- A_i = Σ_j T_j T_{j−i}
- Cross-correlation of two time-series T⁽¹⁾(t) and T⁽²⁾(t):
- X(τ) = ∫_{−∞}^{∞} T⁽¹⁾(t) T⁽²⁾(t−τ) dt
- X_i = Σ_j T⁽¹⁾_j T⁽²⁾_{j−i}
7. Fourier Transforms and Spectra
- Integral transforms
- C(ω) = ∫_{−∞}^{∞} T(t) exp(−iωt) dt
- T(t) = (1/2π) ∫_{−∞}^{∞} C(ω) exp(iωt) dω
- Discrete transforms (DFT)
- C_k = Σ_{n=0..N−1} T_n exp(−2πikn/N) with k = 0, …, N−1
- T_n = N⁻¹ Σ_{k=0..N−1} C_k exp(2πikn/N) with n = 0, …, N−1
- Frequency step: Δω Δt = 2π/N
- Maximum (Nyquist) frequency: f_max = 1/(2Δt)
Aliasing and cyclicity: in a digital world, ω_{n+N} = ω_n, and since time and frequency play symmetrical roles in exp(−iωt), t_{k+N} = t_k.
One FFT that you should know: the FFT of a spike at t = 0 is a constant
- C(ω) = ∫_{−∞}^{∞} δ(t) exp(−iωt) dt = exp(0) = 1
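The discrete version behaves the same way, which makes it a handy sanity check for any FFT routine:

```python
import numpy as np

# Sketch: the DFT of a unit spike at t = 0 is a constant.
spike = np.zeros(8)
spike[0] = 1.0

C = np.fft.fft(spike)
print(C)   # all ones: [1.+0.j 1.+0.j ... 1.+0.j]
```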
Error Estimates for the DFT
- Assume uncorrelated, normally-distributed data, d_n = T_n, with variance σ_d²
- The matrix G in Gm = d is G_nk = N⁻¹ exp(2πikn/N)
- The problem Gm = d is linear, so the unknowns, m_k = C_k (the coefficients of the complex exponentials), are also normally-distributed.
- Since the exponentials are orthogonal, GᴴG = N⁻¹I is diagonal
- and C_m = σ_d² (GᴴG)⁻¹ = N σ_d² I is diagonal, too
- Apportioning variance equally between the real and imaginary parts of each C_k, each part has variance σ² = N σ_d²/2.
- The spectrum s_m² = C_{r,m}² + C_{i,m}² is the sum of the squares of two uncorrelated, normally-distributed random variables and is thus χ²₂-distributed (after scaling by σ²).
- The 95% value of χ²₂ is about 5.99, so that to be significant, a peak must exceed 5.99 · N σ_d²/2.
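The variance claim is easy to verify by simulation; a sketch with assumed N, σ_d, and frequency index:

```python
import numpy as np

# Sketch: for uncorrelated normal data of variance sd^2, the real and
# imaginary parts of an interior DFT coefficient C_k each have
# variance N*sd^2/2. Check this over many random realizations.
rng = np.random.default_rng(9)

N, sd, trials = 64, 1.5, 20_000
k = 5                                    # an arbitrary interior frequency index
Ck = np.empty(trials, dtype=complex)
for j in range(trials):
    d = sd * rng.standard_normal(N)
    Ck[j] = np.fft.fft(d)[k]

print(Ck.real.var(), Ck.imag.var())      # both near N*sd^2/2 = 72
```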
- Convolution Theorem
- transform{ f(t)∗g(t) }
- = transform{ f(t) } × transform{ g(t) }
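The theorem can be checked numerically with the DFT; the series here are assumed, and zero-padding makes the FFT's circular convolution agree with the linear one:

```python
import numpy as np

# Sketch: check the convolution theorem numerically. Zero-pad so that
# the circular convolution computed via the FFT equals the linear one.
f = np.array([1.0, 2.0, 3.0, 0.0, 0.0])
h = np.array([0.5, -1.0, 0.0, 0.0, 0.0])

direct = np.convolve(f, h)[: f.size]              # time-domain convolution
via_fft = np.fft.ifft(np.fft.fft(f) * np.fft.fft(h)).real

print(direct)
print(via_fft)   # matches direct
```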
Power spectrum of a stationary time-series
- T(t): stationary time series
- C(ω) = ∫_{−T/2}^{T/2} T(t) exp(−iωt) dt
- S(ω) = lim_{T→∞} T⁻¹ |C(ω)|²
- S(ω) is called the power spectral density: the spectrum normalized by the length of the time series.
Relationship of power spectral density to the DFT
- To compute the Fourier transform, C(ω), you multiply the DFT coefficients, C_k, by Δt.
- So to get power spectral density:
- T⁻¹ |C(ω)|²
- = (NΔt)⁻¹ |Δt C_k|²
- = (Δt/N) |C_k|²
- You multiply the DFT spectrum, |C_k|², by Δt/N.
Windowed Timeseries
- The Fourier transform of the windowed time-series
- is the Fourier transform of the long time-series
- convolved with the Fourier transform of the windowing function.
Window Functions
- Boxcar
- its Fourier transform is a sinc function
- which has a narrow central peak
- but large side lobes
- Hanning (cosine) taper
- its Fourier transform
- has a somewhat wider central peak
- but much smaller side lobes
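The trade-off above can be quantified by transforming the windows themselves; a sketch with assumed window length and padding:

```python
import numpy as np

# Sketch: compare the highest side lobe of a boxcar window with that of
# a Hanning taper, via zero-padded FFTs of the windows themselves.
N, pad = 64, 4096

boxcar = np.ones(N)
hanning = np.hanning(N)

def max_sidelobe_db(w):
    spec = np.abs(np.fft.rfft(w, n=pad))
    spec /= spec.max()        # normalize the central peak to 1
    # walk down the main lobe to its first null, then take the largest
    # remaining peak: that is the highest side lobe
    k = 1
    while k < spec.size - 1 and spec[k + 1] < spec[k]:
        k += 1
    return 20 * np.log10(spec[k:].max())

print(max_sidelobe_db(boxcar))    # about -13 dB
print(max_sidelobe_db(hanning))   # about -31 dB: much smaller side lobes
```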
8. EOFs and Factor Analysis
Representation of samples as a linear mixing of factors:
S = C F
- S (samples, N×M): element (i, j) is the amount of element j (A, B, C, …) in sample s_i
- C (coefficients, N×M): element (i, j) is the amount of factor f_j in sample s_i
- F (factors, M×M): element (i, j) is the amount of element j in factor f_i
The data can be approximated with only the p most important factors, those with the biggest coefficients:
S ≈ C_p F_p
- C_p (selected coefficients, N×p): keep only the columns for f₁, …, f_p (e.g. ignore f₃)
- F_p (selected factors, p×M): keep only the rows for f₁, …, f_p
Singular Value Decomposition (SVD)
Any N×M matrix S can be written as the product of three matrices
S = U Λ Vᵀ
where U is N×N and satisfies UᵀU = UUᵀ = I, V is M×M and satisfies VᵀV = VVᵀ = I, and Λ is an N×M diagonal matrix of singular values, λ_i.
SVD decomposition of S: S = U Λ Vᵀ. Write this as
S = U Λ Vᵀ = (U Λ)(Vᵀ) = C F
So the coefficients are C = U Λ and the factors are F = Vᵀ. The factors with the biggest λ_i are the most important.
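A minimal sketch of this factorization on assumed synthetic data that truly mix only two factors:

```python
import numpy as np

# Sketch: build the C and F matrices from the SVD and reconstruct the
# sample matrix S from only the p most important factors.
rng = np.random.default_rng(6)

# Assumed synthetic data: N=50 samples of M=5 elements, mixing 2 factors.
true_factors = rng.standard_normal((2, 5))
true_coeffs = rng.standard_normal((50, 2))
S = true_coeffs @ true_factors

U, lam, VT = np.linalg.svd(S, full_matrices=False)
C = U * lam             # coefficients C = U * Lambda
F = VT                  # factors F = V^T

p = 2
S_approx = C[:, :p] @ F[:p, :]      # keep the p most important factors
print(lam)                          # only 2 singular values are non-negligible
print(np.allclose(S, S_approx))     # rank-2 data is recovered exactly
```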
Transformations of Factors
- If you choose the p most important factors, they define both a subspace in which the samples must lie and a set of coordinate axes for that subspace. The choice of axes is not unique, and could be changed through a transformation, T:
- F_new = T F_old
- A requirement is that T⁻¹ exists, else F_new will not span the same subspace as F_old
- S = C F = C I F = (C T⁻¹)(T F) = C_new F_new
- So you could try to implement the desirable factors by designing an appropriate transformation matrix, T.
9. Metropolis Algorithm and Simulated Annealing
Metropolis Algorithm: a method to generate a vector x of realizations of the distribution p(x).
The process is iterative: start with an x, say x⁽ⁱ⁾; then randomly generate another x in its neighborhood, say x⁽ⁱ⁺¹⁾, using a distribution Q(x⁽ⁱ⁺¹⁾|x⁽ⁱ⁾); then test whether you will accept the new x⁽ⁱ⁺¹⁾. If it passes, you append x⁽ⁱ⁺¹⁾ to the vector x that you are accumulating; if it fails, then you append x⁽ⁱ⁾.
A reasonable choice for Q(x⁽ⁱ⁺¹⁾|x⁽ⁱ⁾) is a normal distribution with mean x⁽ⁱ⁾ and a variance σ_x² that quantifies the sense of neighborhood. The acceptance test is as follows: first compute the quantity a = p(x⁽ⁱ⁺¹⁾) / p(x⁽ⁱ⁾). If a > 1, always accept x⁽ⁱ⁺¹⁾; if a < 1, accept x⁽ⁱ⁺¹⁾ with a probability of a and accept x⁽ⁱ⁾ with a probability of 1 − a.
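A minimal sketch of the algorithm, with an assumed target p(x) (a standard normal, unnormalized) and a normal proposal Q:

```python
import numpy as np

# Sketch: Metropolis sampling of an assumed target p(x), with a normal
# proposal Q centered on the current x.
rng = np.random.default_rng(7)

def p(x):
    return np.exp(-0.5 * x**2)        # unnormalized target (assumed)

sigma_x = 1.0                          # neighborhood size of the proposal Q
x = 0.0                                # starting point x^(0)
samples = []
for _ in range(50_000):
    x_new = rng.normal(loc=x, scale=sigma_x)   # draw from Q(x_new | x)
    a = p(x_new) / p(x)
    if a >= 1 or rng.uniform() < a:    # accept with probability min(1, a)
        x = x_new
    samples.append(x)                  # on failure, x^(i) is appended again

samples = np.array(samples)
print(samples.mean(), samples.std())   # near 0 and 1 for this target
```

Note that p(x) only enters through the ratio a, so the normalization constant is never needed.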
Simulated Annealing: application of Metropolis to non-linear optimization. Find the m that minimizes E(m) = eᵀe, where e = d_obs − g(m).
Based on using the Boltzmann distribution for p(x) in the Metropolis Algorithm:
p(x) ∝ exp( −E(m)/T )
where the temperature, T, is slowly decreased during the iterations.
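A minimal sketch on an assumed one-parameter misfit with several local minima; the cooling schedule and step size are illustrative choices:

```python
import numpy as np

# Sketch: simulated annealing on an assumed one-parameter misfit E(m)
# with local minima, using the Boltzmann acceptance rule.
rng = np.random.default_rng(8)

def E(m):
    # assumed misfit: quadratic bowl plus oscillatory local minima
    return (m - 3.0) ** 2 + 2.0 * np.sin(5.0 * m) ** 2

m = -5.0                        # poor starting guess
T = 10.0                        # initial temperature
for _ in range(20_000):
    m_new = m + rng.normal(scale=0.5)
    a = np.exp(-(E(m_new) - E(m)) / T)   # Boltzmann acceptance ratio
    if a >= 1 or rng.uniform() < a:
        m = m_new
    T = max(T * 0.9995, 1e-3)   # slowly decrease the temperature

print(m)                        # ends in a low-misfit region near m = 3
```

At high T the walk crosses barriers freely; as T drops it settles into a deep minimum.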
10. Some final words
Start Simple!
- Examine a small subset of your data and look it over carefully
- Build processing scripts incrementally, checking intermediate results at each stage
- Make lots of plots and look them over carefully
- Do reality checks