1. The General Linear Model, Part IV
- Generalized least squares
- Autocorrelation
- Time series models
- Variance estimation and ReML
2. Time Series Analysis
A time series is an ordered sequence of observations. We are interested in discrete, equally spaced time series.
3. Stationary Time Series
- Same population mean and variance across time
Let Xt be a time series. The mean function of Xt is μX(t) = E[Xt]. The covariance function of Xt is γX(r, s) = Cov(Xr, Xs) for all r and s. Xt is (weakly) stationary if μX(t) does not depend on t and γX(r, s) depends on r and s only through the lag h = s − r.
4. Autocovariance and Autocorrelation
Let Xt be a stationary time series.
The autocovariance function (ACVF) of Xt at lag h is γX(h) = Cov(Xt, Xt+h).
The autocorrelation function (ACF) of Xt at lag h is ρX(h) = γX(h) / γX(0).
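To make the ACF concrete, here is a minimal numpy sketch of the standard sample estimator (the helper name sample_acf is my own, not from any particular package):

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation of a 1-D series at lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    # Biased ACVF estimate (divide by n), the usual time-series convention
    acvf = np.array([np.sum(xc[:n - h] * xc[h:]) / n for h in range(max_lag + 1)])
    return acvf / acvf[0]  # normalize by the lag-0 variance to get the ACF
```

For white noise (next slide), sample_acf should be near zero at every nonzero lag.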
5. White Noise
A sequence of uncorrelated random variables, each with mean 0 and variance σ². We write Zt ~ WN(0, σ²).
Gaussian white noise: additionally, Zt ~ N(0, σ²), independent across t.
6. Temporal Noise Modeling with the GLM
- White noise implies independent errors
- Cov(εi, εj) = 0 for i ≠ j
- Not that data point i tells you nothing about point j; rather, the error at i tells you nothing about the error at j
- fMRI data exhibit temporal autocorrelation
- Attributed to physiology and scanner instability
- What happens if we assume a white-noise model when the noise isn't white?
7. Temporal Autocorrelation
- Example of an autocorrelated time series
[Figure: example time series with its power spectrum and ACF; the low-frequency ("red") part of the spectrum is highlighted.]
8. Temporal Autocorrelation
- Characteristics
- White noise? No...
[Figure: power spectrum and autocorrelation function (ACF).]
White noise would mean a flat power spectrum and an ACF of zero at all nonzero lags.
9. fMRI Temporal Autocorrelation
- Characteristics
- 1/f profile in the frequency domain
- Nearby time points exhibit positive correlation
[Figure: power spectrum showing excess low-frequency variability; ACF showing positive correlation between nearby time points.]
10. Implications of fMRI Noise Structure
- 1) Lots of slow drift that will add noise to your estimates unless you remove it
- 2) The independence assumption is violated in fMRI data. Single-subject statistics are not valid without an accurate model of the noise!
11. Drift Modeling Issues
- fMRI signal drifts slowly over time: 1/f noise (Zarahn, 1997)
- Experimental conditions that vary slowly will be very confusable with drift
- Experimental designs should use high frequencies (more rapid alternations of stimulus on/off states)
- A horrible design is shown below
[Figure: typical drift-pattern magnitude vs. typical signal magnitude for a slowly varying design.]
... you'll never detect this signal, due to the drift. Even if you did get a Task vs. Rest difference, would you believe it wasn't just due to slowly drifting noise?
12. Accounting for Drift in the GLM
- Drift
- Slowly varying
- Nuisance variability
- Models
- Any slowly varying curve
- Splines
- Linear, quadratic
- Discrete Cosine Transform (DCT; a sketch of building this basis follows below)
[Figure: Discrete Cosine Transform basis functions.]
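As an illustration, here is a minimal numpy sketch of a DCT drift basis; the helper name and the 128 s default cutoff are assumptions for this example, not prescribed by the slides:

```python
import numpy as np

def dct_drift_basis(n_scans, tr, cutoff=128.0):
    """DCT basis columns with periods longer than `cutoff` seconds.

    n_scans: number of time points; tr: repetition time in seconds.
    Returns an (n_scans, k) matrix of slow cosine regressors.
    """
    t = np.arange(n_scans)
    # Component k has period 2 * n_scans * tr / k seconds; keep only slow ones
    k_max = int(np.floor(2.0 * n_scans * tr / cutoff))
    basis = [np.cos(np.pi * k * (t + 0.5) / n_scans) for k in range(1, k_max + 1)]
    return np.column_stack(basis) if basis else np.empty((n_scans, 0))
```

Appending these columns to the design matrix lets the GLM absorb slow drift as nuisance variability.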
13. A New Model
[Figure: the GLM Y = Xb + e, with design matrix X and parameters b1-b9. From T. Nichols, 2005.]
14. High-Pass Filtering Example
15. High-Pass Filtering
[Figure: fMRI noise shown in the time and frequency domains, unfiltered vs. filtered.]
16. Temporal Autocorrelation: Modeling Approaches
- Independence
- Precoloring
- Prewhitening
17. Autocorrelation: Independence Model
- Ignore autocorrelation
- OK for a multi-subject approach
- Results are still valid at the group level if using the summary-statistic approach
- Bad for single-subject statistics
- Under-estimation of variance
- Over-estimation of significance
- T's too big, P's too small
- Too many false positives at the individual-subject level (see the simulation sketch below)
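A quick way to see the effect is a toy simulation (added here for illustration, not from the original slides): fit OLS with a white-noise variance estimate to pure AR(1) noise under a slow block design, and count how often |t| exceeds the nominal 5% threshold:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, phi, n_sims = 200, 0.5, 2000
x = np.tile(np.repeat([1.0, -1.0], 20), n // 40)  # slow block design, 20 scans per block
t_crit = stats.t.ppf(0.975, n - 2)
false_pos = 0
for _ in range(n_sims):
    # AR(1) noise, no true effect: e_t = phi * e_{t-1} + z_t
    z = rng.standard_normal(n)
    e = np.zeros(n)
    for t in range(1, n):
        e[t] = phi * e[t - 1] + z[t]
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, e, rcond=None)
    resid = e - X @ beta
    s2 = resid @ resid / (n - 2)                  # white-noise variance estimate
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    if abs(beta[1] / se) > t_crit:
        false_pos += 1
print(false_pos / n_sims)  # well above the nominal 0.05
```

Positively autocorrelated noise inflates the true variance of slow contrasts, so the naive white-noise standard error is too small.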
18. Autocorrelation: Independence Model
- ... and yes, it really does make a difference ...
19. Autocorrelation: Precoloring (Historical)
- Temporally blur (smooth) your data
- This induces more dependence!
- But we know exactly the form of the dependence induced
- Assume that the intrinsic autocorrelation is negligible relative to the smoothing
- Correct GLM inferences based on the known autocorrelation
- Used in SPM99... now not advisable
- Especially bad for event-related designs
Friston et al., "To smooth or not to smooth?", NeuroImage 12:196-208, 2000
20. Modeling Autocorrelated Noise
- Generalized least squares
- Estimate the form of the noise and whiten it: remove the estimated correlation using linear algebra
- Models
- AR(1): SPM5; AR(1)+WN, ARMA(1,1)
- Arbitrary autocorrelation function (ACF): FSL
21. Autocorrelation: Prewhitening
- Statistically optimal solution
- If you know the true autocorrelation exactly, you can undo the dependence
- De-correlate your data and your model
- Then proceed as with independent data
- The problem is obtaining accurate estimates of the autocorrelation
- Time series models + generalized least squares
- Some sort of regularization is helpful
- Spatial smoothing of some sort
22. Autocorrelation Models
- Autoregressive
- The error is a fraction of the previous error plus new error
- AR(1): εi = φ·εi−1 + zi
- Software: fmristat, SPM99, SPM2, SPM5
- AR + White Noise, or ARMA(1,1)
- AR plus an independent WN series
- Software: SPM2b
- Arbitrary autocorrelation function
- ρk = corr(εi, εi−k)
- Software: FSL's FEAT
- Also: moving average (MA), autoregressive moving average (ARMA)
23. AR Process
New error (innovation, Z) is added at each time point; the total signal (X) is propagated in time.
Xt is an autoregressive process of order p, AR(p), if
Xt = φ1·Xt−1 + ... + φp·Xt−p + Zt,
where Zt ~ WN(0, σ²) and φ1, ..., φp are constants.
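A minimal sketch simulating an AR(p) process directly from this recursion (the helper is my own, for illustration; a longer warm-up would reduce start-up transients):

```python
import numpy as np

def simulate_ar(phis, n, sigma=1.0, seed=0):
    """Simulate X_t = sum_k phi_k * X_{t-k} + Z_t with Z_t ~ N(0, sigma^2)."""
    rng = np.random.default_rng(seed)
    p = len(phis)
    x = np.zeros(n + p)                 # zero-padded warm-up start
    z = sigma * rng.standard_normal(n + p)
    for t in range(p, n + p):
        x[t] = sum(phi * x[t - k - 1] for k, phi in enumerate(phis)) + z[t]
    return x[p:]

x = simulate_ar([0.8], 500)             # AR(1) with phi = 0.8
```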
24. AR Process
So a higher-order AR model is more flexible, and thus better... right?
25. Why Bigger Is Not Always Better
- I generated the data below using AR(1) with φ = 0.8, so I know the truth (black line below)
[Figure: blue = estimated ACF; black = true ACF; red = AR(1) fit.]
26. MA Process
New error (innovation, Z) is added at each time point; prior innovations (Z) are propagated in time.
Xt is called a moving-average process of order q, or MA(q), if
Xt = Zt + θ1·Zt−1 + ... + θq·Zt−q,
where Zt ~ WN(0, σ²) and θ1, ..., θq are constants.
27. Example: MA(1)
The ACF of an MA(1) process is ρ(1) = θ / (1 + θ²), with ρ(h) = 0 for h > 1.
[Figure: ACF for θ = −0.8 and θ = 0.8.]
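A quick numerical check of the MA(1) ACF formula (a sketch I'm adding for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n = 0.8, 200_000
z = rng.standard_normal(n + 1)
x = z[1:] + theta * z[:-1]              # MA(1): X_t = Z_t + theta * Z_{t-1}
xc = x - x.mean()
rho1 = np.sum(xc[:-1] * xc[1:]) / np.sum(xc * xc)
print(rho1, theta / (1 + theta**2))     # both close to 0.488
```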
28. Generalized Least Squares
- First, let's assume we know the autocorrelation function (ACF) exactly
- Let V be a t × t matrix that describes the time series autocorrelation for each of the t observations
The form of V depends on the noise model used (a sketch constructing both cases follows below).
IID case: V = I, the identity matrix.
AR(1) case: Vij = φ^|i−j|.
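A minimal numpy sketch constructing these two V matrices (the helper name is my own):

```python
import numpy as np

def ar1_corr_matrix(n, phi):
    """AR(1) correlation matrix: V[i, j] = phi ** |i - j|."""
    idx = np.arange(n)
    return phi ** np.abs(idx[:, None] - idx[None, :])

V_iid = np.eye(5)                # IID case: identity
V_ar1 = ar1_corr_matrix(5, 0.8)  # AR(1) case: geometric decay off the diagonal
```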
29. Optimal Estimates with Prewhitening
Since V describes the error covariance, we can whiten by pre-multiplying all parts of the linear model by V^(−1/2) and then use OLS:
V^(−1/2)·Y = V^(−1/2)·X·β + V^(−1/2)·ε.
This is equivalent to the generalized least squares (GLS) estimator
β̂ = (X′V^(−1)X)^(−1)·X′V^(−1)·Y.
- If Var(ε) = Vσ² ≠ Iσ², then generalized least squares (GLS) is optimal
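A minimal sketch of GLS via Cholesky-based prewhitening (helper names are my own):

```python
import numpy as np

def gls_via_whitening(y, X, V):
    """GLS by prewhitening: multiply the model by W = V^{-1/2}, then run OLS.

    Equivalent to beta = (X' V^{-1} X)^{-1} X' V^{-1} y.
    """
    # Cholesky factor: V = L L', so W = inv(L) satisfies W V W' = I
    L = np.linalg.cholesky(V)
    Wy = np.linalg.solve(L, y)
    WX = np.linalg.solve(L, X)
    beta, *_ = np.linalg.lstsq(WX, Wy, rcond=None)
    return beta
```

Using triangular solves against the Cholesky factor avoids forming V^(−1/2) explicitly.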
30. Estimating the Variance
Even if we assume we know V, we still need to estimate the residual variance σ².
Our estimate of the variance is based on the residuals.
Calculate the residuals: find the residual-inducing matrix R such that r = RY, i.e. R = I − X(X′X)^(−1)X′.
Then Cov(r) = RVR′·σ².
Note: for OLS (V = I), σ̂² = r′r / (N − p).
31. Estimating the Variance
The expected value of the sum of squared residuals is E[r′r] = trace(RV)·σ².
An unbiased estimate of the variance is therefore σ̂² = r′r / trace(RV).
32. Estimating the Variance
Note that if V is the identity matrix, trace(RV) = trace(R) = N − p, and σ̂² = r′r / (N − p).
The distribution of r′r / σ² is then χ² with N − p degrees of freedom.
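Putting the last two slides together, a minimal sketch of the unbiased variance estimate (helper name is my own):

```python
import numpy as np

def residual_variance(y, X, V):
    """Unbiased residual variance estimate: sigma2 = r'r / trace(R V)."""
    n = len(y)
    R = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T  # residual-inducing matrix
    r = R @ y
    return (r @ r) / np.trace(R @ V)  # reduces to r'r / (n - p) when V = I
```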
33. Estimating the Degrees of Freedom
For IID noise (spherical error covariance), the degrees of freedom are N − p (observations minus parameters). If not, there are fewer effective degrees of freedom, which we can estimate using the Satterthwaite approximation.
The degrees of freedom are df = trace(RV)² / trace(RVRV).
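A direct translation of the Satterthwaite formula into numpy (a sketch; helper name is my own):

```python
import numpy as np

def satterthwaite_df(X, V):
    """Effective degrees of freedom: trace(RV)^2 / trace(RVRV)."""
    n = V.shape[0]
    R = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T
    RV = R @ V
    return np.trace(RV) ** 2 / np.trace(RV @ RV)
```

With V = I this returns exactly N − p; with autocorrelated V it returns something smaller.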
34. How to Use an AR Model in the GLS
- AR models assume a specific form for the autocorrelation: errors are propagated in time
- AR(1): εi = φ·εi−1 + zi
- Software: fmristat, SPM99, SPM2, SPM5
- Need to estimate the variance, the autocovariance magnitude (the AR φ parameter), and the degrees of freedom
- Need an iterative estimation procedure
35. GLS: Estimating V
Typically, the betas are unknown. Because of whitening, the betas depend on the error covariance (σ²V). But the form of the covariance matrix V is also unknown, and depends on the betas.
- Pre-coloring (SPM99)
- Smooth with V and then assume you know V
- Iterative methods (a sketch of one appears after this list)
- Cochrane-Orcutt procedure
- Fisher scoring
- E-M algorithm
- Iterative generalized least squares (IGLS)
- Markov chain Monte Carlo (MCMC)
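As one concrete instance, a minimal sketch of a Cochrane-Orcutt-style iteration for an AR(1) noise model (my own simplified version, reusing the ar1_corr_matrix and gls_via_whitening helpers sketched earlier):

```python
import numpy as np

def cochrane_orcutt(y, X, n_iter=10):
    """Alternate between estimating beta (GLS) and the AR(1) phi (from residuals)."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # step 0: OLS start
    phi = 0.0
    for _ in range(n_iter):
        r = y - X @ beta
        # Lag-1 autocorrelation of the residuals -> AR(1) parameter
        phi = (r[:-1] @ r[1:]) / (r @ r)
        V = ar1_corr_matrix(n, phi)
        beta = gls_via_whitening(y, X, V)          # refit with the updated V
    return beta, phi
```

This makes the chicken-and-egg structure explicit: beta needs V, and V needs the residuals from beta.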
36. What is ReML?
- What is the right estimate of V and σ²?
- A simple estimate of variance is the average squared deviation, σ̂² = Σ(yi − ȳ)² / (N − 1)
This is the ReML estimator!
37. What is ReML?
- ReML, restricted maximum likelihood, is an unbiased method of estimating the variance from the residuals
- Several algorithms can be used to iteratively provide estimates of the betas and the ReML variances
- E-M is the one that SPM uses
38. ReML
The parameters βi and σ are estimated using a restricted maximum likelihood (ReML) procedure, in which the model parameters and the variance components are re-estimated iteratively.
It can be shown (Harville, 1974) that, up to a constant, the restricted log-likelihood is given by
lR(λ) = −(1/2)·log|V| − (1/2)·log|X′V^(−1)X| − (1/2)·(y − Xβ̂)′V^(−1)(y − Xβ̂),
where λ are the parameters associated with V.
39. ML vs. ReML
- Maximum likelihood
- Maximizes the likelihood of the data y
- Used to estimate the mean parameters β
- But can produce biased estimates of variance
- E.g., the ML one-sample variance estimate is σ̂² = Σ(yi − ȳ)² / N
- Restricted maximum likelihood
- Maximizes the likelihood of the residuals e = y − Xβ̂
- Used to estimate the variance parameters σ², λ
- E.g., the ReML one-sample variance estimate is σ̂² = Σ(yi − ȳ)² / (N − 1); a quick numerical check follows below
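The bias is easy to demonstrate numerically (a toy check I'm adding for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
true_var, n, n_sims = 4.0, 10, 50_000
ml, reml = [], []
for _ in range(n_sims):
    y = rng.normal(0.0, np.sqrt(true_var), n)
    ss = np.sum((y - y.mean()) ** 2)
    ml.append(ss / n)          # ML: divides by N, biased low
    reml.append(ss / (n - 1))  # ReML: divides by N - 1, unbiased
print(np.mean(ml), np.mean(reml))  # about 3.6 vs. about 4.0
```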
40. SPM Approach
In SPM, the covariance matrix is estimated using two global parameters, λ1 and λ2, and one local variance parameter, σ².
The ReML estimates are calculated using only data from responsive voxels, i.e. voxels with large F-statistics (p < .001). The data from these voxels are pooled, and V is estimated from the pooled data.
41. SPM2/5's AR Model
- AR(1), exactly: Cor(εi, εj) = ρ^|i−j|
- Problem: this is non-linear in ρ
- Time consuming to compute
Instead, the correlation matrix is expressed as a linear combination of known matrices,
V = Σj λj·Qj,
where the Qj form a basis set for the covariance matrices.
In single-subject fMRI studies, V is typically expressed as a linear combination of two matrices.
42. SPM2/5's AR Model
- AR(1), approximated by a first-order expansion around a fixed ρ0: Cor(εi, εj) ≈ ρ0^|i−j| + |i−j|·ρ0^(|i−j|−1)·(ρ − ρ0)
- Assume a fixed scalar ρ0 (0.2) that describes the shape of the ACF
- ρ0 is scaled linearly by λ
- This approximation is now linear in λ
- The λk's are estimated with ReML (a sketch of the two basis matrices follows below)
[Figure: the two basis matrices, ρ0^|i−j| and |i−j|·ρ0^(|i−j|−1).]
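A minimal sketch of the two basis matrices, following the Taylor-expansion idea above (my own construction for illustration, not SPM's actual code):

```python
import numpy as np

def taylor_ar1_bases(n, rho0=0.2):
    """Basis pair from the first-order Taylor expansion of rho**d around rho0:
    rho**d  ~  rho0**d + d * rho0**(d-1) * (rho - rho0)."""
    d = np.abs(np.subtract.outer(np.arange(n), np.arange(n))).astype(float)
    Q1 = rho0 ** d                 # ACF shape evaluated at rho0
    Q2 = d * rho0 ** (d - 1.0)     # derivative of the shape with respect to rho
    return Q1, Q2

# V is then modeled as the linear combination lambda1 * Q1 + lambda2 * Q2,
# with the lambdas estimated by ReML
```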
43. SPM2 Autocorrelation Model: Key Details
- Only the most significant voxels are used to estimate Cov(ε)
- F-stat p < 0.001
- Once Cov(ε) is estimated, the same estimate is used over the whole brain
- Autocorrelation pooling: assumes the same autocorrelation over the whole brain!
- c.f. FSL, which estimates the ACF locally
- Requires two passes
- 1st pass: OLS, then estimate Cov(ε) and V, via ReML, at the selected voxels
- 2nd pass: GLS/whitening using V
44. ReML
- Maximum likelihood methods seek the parameters that maximize the likelihood of observing the data y
- They can produce biased estimates of variance
- Restricted maximum likelihood methods seek the parameters that maximize the likelihood of observing the residuals
- They tend to be more appropriate for estimating variance parameters such as σ² and the λi
45. ReML
- Why are maximum likelihood estimates of variance biased?
- Say we have N data points and k predictors in our model (e.g., a mean, or regressors in a multiple regression)
- Once you fit the model, the residuals live in an (N − k)-dimensional space
- You should divide the sum of squared residuals by N − k to estimate the residual variance
- MLE divides by N
46. Estimating Covariance with Restricted Maximum Likelihood
[Figure: a covariance model and the observed data feed into ReML/EM, which outputs the estimated correlation matrix.]
47. Should You Model Autocorrelation in SPM?
- Necessary for single-subject analysis
- Optimally efficient in group analysis IF the assumptions are met:
- V can be captured by a sum of two basis matrices with ρ0 = 0.2
- This holds over the whole brain
- The estimation algorithm converges on the correct solution
- More likely than not, these assumptions are not met.
48. Autocorrelation Summary
49. Modeling Autocorrelation: Key Points
- You do not need to model autocorrelation if you're doing a group analysis
- You do need to for individual-subject analyses, which are often dicey anyway due to artifacts
- SPM2/5 uses an approximation of AR(1) and also pools estimates across the brain. This will only improve your stats if the AR model fits your data very well.
50. What Are the Problems of Using the GLM for fMRI Data?
- 1. Unknown neural response shape. Solution: assume one (epoch or event).
- 2. BOLD responses have a delayed and dispersed form. Solution: use an HRF.
- 3. The hemodynamic response may differ in shape across the brain. Solution: use basis functions.
- 4. The BOLD signal includes substantial amounts of low-frequency noise. Solution: use high-pass filtering.
- 5. The data are temporally autocorrelated; this violates the assumptions of the GLM. Solution: use generalized least squares.
51. End of this section.
52. Algorithm
Repeat steps 1-6 until convergence:
1. Fit the model by OLS to obtain initial betas.
2. Compute the residuals r = Y − Xβ̂.
3. Estimate the autocorrelation parameters (e.g., the AR φ, or the λ's via ReML) from the residuals.
4. Build the covariance matrix V.
5. Whiten the data and design with V^(−1/2) and refit by GLS.
6. Re-estimate the residual variance and the effective degrees of freedom.