Title: DCM
1Classical (frequentist) inference
Klaas Enno Stephan
Laboratory for Social and Neural Systems Research, Institute for Empirical Research in Economics, University of Zurich
Functional Imaging Laboratory (FIL), Wellcome Trust Centre for Neuroimaging, University College London
With many thanks for slides & images to: FIL Methods group
Methods & models for fMRI data analysis in neuroeconomics, 14 October 2009
2Overview of SPM
[Figure: overview of the SPM pipeline, with panels labelled image time-series, realignment, normalisation (template), smoothing (kernel), general linear model (design matrix), parameter estimates, statistical inference (Gaussian field theory, p < 0.05) and statistical parametric map (SPM)]
3Voxel-wise time series analysis
[Figure: single voxel time series (BOLD signal against time), analysed voxel by voxel to give the SPM]
4Overview
- A recap of model specification and parameter estimation
- Hypothesis testing
- Contrasts and estimability
- T-tests
- F-tests
- Design orthogonality
- Design efficiency
5Mass-univariate analysis voxel-wise GLM
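$y = X\beta + e$
- Model is specified by
  - Design matrix $X$
  - Assumptions about $e$
$N$: number of scans; $p$: number of regressors ($y$ and $e$ are $N \times 1$, $X$ is $N \times p$, $\beta$ is $p \times 1$)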
The design matrix embodies all available
knowledge about experimentally controlled factors
and potential confounds.
6Parameter estimation
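Objective: estimate the parameters $\beta$ to minimize the sum of squared errors
$\sum_{t=1}^{N} e_t^2 = (y - X\beta)^T (y - X\beta)$
Ordinary least squares (OLS) estimation (assuming i.i.d. error):
$\hat{\beta} = (X^T X)^{-1} X^T y$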
7OLS parameter estimation
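The Ordinary Least Squares (OLS) estimators are
$\hat{\beta} = (X^T X)^{-1} X^T y$
These estimators minimise $\sum_t e_t^2 = e^T e$, where $e = y - X\beta$. They are found by solving either
$\frac{\partial}{\partial \beta} \sum_t e_t^2 = 0$
or
$X^T e = 0$
Under i.i.d. assumptions, $e \sim N(0, \sigma^2 I)$, the OLS estimates correspond to ML estimates, with
$\hat{\beta} \sim N(\beta, \sigma^2 (X^T X)^{-1})$
NB: the precision of our estimates depends on the design matrix!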
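A minimal NumPy sketch of the OLS estimate, assuming a toy design that is not in the original slides (one box-car regressor plus a constant, with made-up parameter values). It solves the normal equations $(X^T X)\hat{\beta} = X^T y$ and checks that the residuals are orthogonal to the columns of $X$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy design: N scans, p = 2 regressors (box-car + constant).
N = 40
boxcar = np.tile(np.r_[np.ones(5), np.zeros(5)], N // 10)
X = np.column_stack([boxcar, np.ones(N)])

# Simulate y = X beta + e with i.i.d. Gaussian error (assumed values).
beta_true = np.array([2.0, 10.0])
y = X @ beta_true + rng.normal(scale=1.0, size=N)

# OLS via the normal equations: solve (X'X) beta = X'y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# The same estimate from a generic least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# Residuals are orthogonal to every column of X (X'e = 0).
e = y - X @ beta_hat
print("beta_hat:", beta_hat)
print("X'e:", X.T @ e)
```

In practice a least-squares solver is usually preferred over explicitly inverting $X^T X$, because it behaves better when regressors are nearly collinear.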
8Maximum likelihood (ML) estimation
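Probability density function ($\theta$ fixed!): $p(y \mid \theta)$
Likelihood function ($y$ fixed!): $L(\theta) = p(y \mid \theta)$
ML estimator: $\hat{\theta} = \arg\max_{\theta} p(y \mid \theta)$
For $\mathrm{cov}(e) = \sigma^2 I$, the ML estimator is equivalent to the OLS estimator:
OLS: $\hat{\beta} = (X^T X)^{-1} X^T y$
For $\mathrm{cov}(e) = \sigma^2 V$, the ML estimator is equivalent to a weighted least squares (WLS) estimate (with $W = V^{-1/2}$):
WLS: $\hat{\beta} = (X^T W^T W X)^{-1} X^T W^T W y = (X^T V^{-1} X)^{-1} X^T V^{-1} y$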
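The equivalence between WLS and OLS on whitened data can be sketched as follows. The AR(1)-like correlation matrix $V$, the value of rho and the simulated data are assumptions made for illustration only; in SPM the covariance structure is estimated rather than known.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 30
X = np.column_stack([np.linspace(-1, 1, N), np.ones(N)])

# Assumed error correlation V with AR(1)-like structure (rho is illustrative).
rho = 0.4
idx = np.arange(N)
V = rho ** np.abs(idx[:, None] - idx[None, :])

# Simulate correlated errors e ~ N(0, V) using the Cholesky factor of V.
L = np.linalg.cholesky(V)
y = X @ np.array([1.5, 3.0]) + L @ rng.normal(size=N)

# Whitening matrix W = V^(-1/2) via the eigendecomposition of V.
evals, evecs = np.linalg.eigh(V)
W = evecs @ np.diag(evals ** -0.5) @ evecs.T

# WLS is OLS applied to the whitened model W y = W X beta + W e.
beta_wls, *_ = np.linalg.lstsq(W @ X, W @ y, rcond=None)

# Closed form: (X' V^-1 X)^-1 X' V^-1 y.
V_inv = np.linalg.inv(V)
beta_closed = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y)
print(beta_wls, beta_closed)
```

After whitening, the errors $We$ have covariance $W V W^T = I$, which is why the i.i.d. results apply to the transformed model.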
9SPM t-statistic based on ML estimates
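Example contrast: $c^T = [1\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0]$
$t = \frac{c^T \hat{\beta}}{\widehat{\mathrm{std}}(c^T \hat{\beta})}$, with $\widehat{\mathrm{std}}(c^T \hat{\beta}) = \sqrt{\hat{\sigma}^2\, c^T (WX)^{+} ((WX)^{+})^T c}$
where $\hat{\beta} = (WX)^{+} W y$, $W = V^{-1/2}$, and $\hat{\sigma}^2 = \frac{\lVert Wy - WX\hat{\beta} \rVert^2}{\mathrm{tr}(R)}$ with $R = I - WX (WX)^{+}$
For brevity: $(WX)^{+} = (X^T W^T W X)^{-1} X^T W^T$
The error covariance ($\sigma^2 V$) is obtained from ReML estimates.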
10Statistic
- A statistic is the result of applying a function to a sample (set of data).
- More formally, statistical theory defines a statistic as a function of a sample where the function itself is independent of the sample's distribution. The term is used both for the function and for the value of the function on a given sample.
- A statistic is distinct from an unknown statistical parameter, which is a population property and can only be estimated approximately from a sample. A statistic used to estimate a parameter is called an estimator. For example, the sample mean is a statistic and an estimator for the population mean, which is a parameter.
11Hypothesis testing
To test a hypothesis, we construct a test statistic.
- Null hypothesis H0 = "there is no effect" ⇒ $c^T \beta = 0$
  - This is what we want to disprove.
  - ⇒ The alternative hypothesis H1 represents the outcome of interest.
- The test statistic T
  - The test statistic summarises the evidence about H0.
  - Typically, the test statistic is small in magnitude when H0 is true and large when H0 is false.
  - ⇒ We need to know the distribution of T under the null hypothesis.
12Hypothesis testing
- Type I error $\alpha$
  - Acceptable false positive rate $\alpha$.
  - Threshold $u_\alpha$ controls the false positive rate: $\alpha = p(T > u_\alpha \mid H_0)$
- Observation of test statistic t, a realisation of T
  - A p-value summarises evidence against H0.
  - This is the probability of observing t, or a more extreme value, under the null hypothesis: $p(T \geq t \mid H_0)$
- The conclusion about the hypothesis
  - We reject H0 in favour of H1 if $t > u_\alpha$
[Figure: null distribution of T, showing the threshold $u_\alpha$ with tail area $\alpha$ and the observed value t with tail area p]
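The roles of the null distribution, the threshold $u_\alpha$ and the p-value can be illustrated with a small Monte Carlo sketch. Everything here is an assumption for illustration: a one-sample t-statistic on n = 20 noise values stands in for T, and the observed value t_obs = 2.5 is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n, n_sim, alpha = 20, 100_000, 0.05

# Null distribution of T: one-sample t-statistics computed on pure noise,
# i.e. data generated with H0 true.
null_data = rng.normal(size=(n_sim, n))
T_null = null_data.mean(axis=1) / (null_data.std(axis=1, ddof=1) / np.sqrt(n))

# Threshold u_alpha: the value exceeded with probability alpha under H0.
u_alpha = np.quantile(T_null, 1 - alpha)

# p-value of a hypothetical observed statistic: P(T >= t_obs | H0).
t_obs = 2.5
p_value = np.mean(T_null >= t_obs)

# Decision rule: reject H0 in favour of H1 if t_obs > u_alpha.
reject = t_obs > u_alpha
print(f"u_alpha = {u_alpha:.2f}, p = {p_value:.4f}, reject H0: {reject}")
```

The simulated threshold should lie close to the tabulated upper 5% point of a t-distribution with 19 degrees of freedom (about 1.73). In a real analysis the analytic distribution would be used instead of simulation.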
13Types of error
                     Actual condition
Test result          H0 true                          H0 false
Reject H0            False positive (type I error)    True positive
Fail to reject H0    True negative                    False negative (type II error)
14One cannot accept the null hypothesis (one can just fail to reject it)
- Absence of evidence is not evidence of absence!
- If we do not reject H0, then all we can say is that there is not enough evidence in the data to reject H0. This does not mean that we can accept H0.
- What does this mean for neuroimaging results based on classical statistics?
- A failure to find an activation in a particular area does not mean we can conclude that this area is not involved in the process of interest.
15Contrasts
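- We are usually not interested in the whole $\beta$ vector.
- A contrast $c^T \beta$ selects a specific effect of interest ⇒ a contrast vector $c$ is a vector of length $p$
- ⇒ $c^T \beta$ is a linear combination of the regression coefficients $\beta$
$c^T = [1\ 0\ 0\ 0\ 0\ \dots]$: $c^T \beta = 1 \times \beta_1 + 0 \times \beta_2 + 0 \times \beta_3 + 0 \times \beta_4 + 0 \times \beta_5 + \dots$
$c^T = [0\ {-1}\ 1\ 0\ 0\ \dots]$: $c^T \beta = 0 \times \beta_1 - 1 \times \beta_2 + 1 \times \beta_3 + 0 \times \beta_4 + 0 \times \beta_5 + \dots$
NB: the precision of our estimates depends on the design matrix and the chosen contrast!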
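As a small worked example of the two contrasts above, the following sketch applies them to a vector of parameter estimates. The numerical values of beta_hat are hypothetical.

```python
import numpy as np

# Hypothetical parameter estimates for p = 5 regressors.
beta_hat = np.array([2.0, 0.5, 1.25, -0.3, 4.0])

# c1 picks out the first parameter; c2 compares the third with the second.
c1 = np.array([1, 0, 0, 0, 0])
c2 = np.array([0, -1, 1, 0, 0])

effect1 = c1 @ beta_hat   # equals beta_1
effect2 = c2 @ beta_hat   # equals beta_3 - beta_2
print(effect1, effect2)
```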
16Estimability of a contrast
- If X is not of full rank then different parameters can give identical predictions.
- The parameters are therefore non-unique, non-identifiable or non-estimable.
- For such models, $X^T X$ is not invertible so we must resort to generalised inverses (SPM uses the Moore-Penrose pseudo-inverse).
- This gives a parameter vector that has the smallest norm of all possible solutions.
One-way ANOVA (unpaired two-sample t-test), design matrix X (columns: group 1, group 2, constant):
1 0 1
1 0 1
1 0 1
1 0 1
0 1 1
0 1 1
0 1 1
0 1 1
Rank(X) = 2
- Even if parameters are non-estimable, certain contrasts may well be! Examples:
  - [1 0 0], [0 1 0], [0 0 1] are not estimable.
  - [1 0 1], [0 1 1], [1 -1 0], [0.5 0.5 1] are estimable.
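One way to check estimability numerically is to test whether a contrast lies in the row space of X, i.e. whether $c^T X^{+} X = c^T$. The sketch below uses this criterion on the rank-deficient two-group design; the helper name is_estimable and the tolerance are illustrative choices, not SPM's implementation.

```python
import numpy as np

# Two-group design: 4 scans per group; columns = group 1, group 2, constant.
X = np.vstack([np.tile([1, 0, 1], (4, 1)),
               np.tile([0, 1, 1], (4, 1))]).astype(float)
rank = np.linalg.matrix_rank(X)

def is_estimable(c, X, tol=1e-10):
    """True if c lies in the row space of X, i.e. c' pinv(X) X == c'."""
    c = np.asarray(c, dtype=float)
    row_space_projector = np.linalg.pinv(X) @ X
    return bool(np.allclose(c @ row_space_projector, c, atol=tol))

contrasts = [[1, 0, 0], [0, 1, 0], [0, 0, 1],
             [1, 0, 1], [0, 1, 1], [1, -1, 0], [0.5, 0.5, 1]]
print("rank(X) =", rank)
for c in contrasts:
    print(c, "estimable" if is_estimable(c, X) else "not estimable")
```

The estimable contrasts are exactly those orthogonal to the null space of X, which here is spanned by [1 1 -1] (the two group columns sum to the constant).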
17Student's t-distribution
- first described by William Sealy Gosset, a statistician at the Guinness brewery in Dublin
- t-statistic is a signal-to-noise measure: t = effect / standard deviation
- t-distribution is an approximation to the normal distribution for small samples
- t-contrasts are simply combinations of the betas ⇒ the t-statistic does not depend on the scaling of the regressors or on the scaling of the contrast
- Unilateral test: $H_0: c^T \beta = 0$ vs. $H_1: c^T \beta > 0$
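18t-contrasts: SPM{t}
Question: box-car amplitude > 0? ⇔ $H_1: c^T \beta > 0$?
$c^T = [1\ 0\ 0\ 0\ 0\ 0\ 0\ 0]$, applied to $\beta = [\beta_1\ \beta_2\ \beta_3\ \beta_4\ \beta_5\ \dots]^T$
Null hypothesis: $H_0: c^T \beta = 0$
Test statistic: $t = \frac{c^T \hat{\beta}}{\sqrt{\widehat{\mathrm{var}}(c^T \hat{\beta})}} = \frac{c^T \hat{\beta}}{\sqrt{\hat{\sigma}^2\, c^T (X^T X)^{-1} c}} \sim t_{N-p}$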
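A minimal sketch of a t-contrast under i.i.d. errors, assuming the usual form $t = c^T \hat{\beta} / \sqrt{\hat{\sigma}^2\, c^T (X^T X)^{-1} c}$ with $\hat{\sigma}^2 = RSS / (N - \mathrm{rank}(X))$. The function name t_contrast and the simulated box-car data are hypothetical. The last lines check that the statistic is unchanged when the contrast or a regressor is rescaled by a positive factor.

```python
import numpy as np

def t_contrast(X, y, c):
    """Return the t-statistic for contrast c and its degrees of freedom."""
    n_scans = X.shape[0]
    XtX_inv = np.linalg.pinv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    dof = n_scans - np.linalg.matrix_rank(X)
    sigma2 = resid @ resid / dof              # residual mean square
    t = (c @ beta) / np.sqrt(sigma2 * (c @ XtX_inv @ c))
    return t, dof

rng = np.random.default_rng(2)
N = 40
boxcar = np.tile(np.r_[np.ones(5), np.zeros(5)], N // 10)
X = np.column_stack([boxcar, np.ones(N)])
y = X @ np.array([1.0, 5.0]) + rng.normal(size=N)   # assumed true effects

c = np.array([1.0, 0.0])                  # box-car amplitude > 0?
t, dof = t_contrast(X, y, c)

# Rescaling the contrast or the regressor leaves t unchanged.
t_scaled_c, _ = t_contrast(X, y, 3.0 * c)
t_scaled_X, _ = t_contrast(X * np.array([10.0, 1.0]), y, c)
print(t, t_scaled_c, t_scaled_X, dof)
```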
19t-contrasts in SPM
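For a given contrast c:
- beta_???? images ($\hat{\beta}$) ⇒ con_???? image ($c^T \hat{\beta}$)
- ResMS image ($\hat{\sigma}^2$)
- con_???? image and ResMS image ⇒ spmT_???? image: SPM{t}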
20t-contrast a simple example
Passive word listening versus rest
$c^T = [1\ 0]$
Q: activation during listening?
Null hypothesis: $\beta_1 = 0$
21F-test the extra-sum-of-squares principle
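- Model comparison: full vs. reduced model
Full model $X = [X_0\ X_1]$, or reduced model $X_0$?
Null hypothesis H0: the true model is $X_0$ (reduced model)
F-statistic: ratio of the additional variance explained by the full model (relative to $X_0$) to the unexplained variance under the full model
$F = \frac{(RSS_0 - RSS)/\nu_1}{RSS/\nu_2} \sim F_{\nu_1, \nu_2}$
where $RSS_0$ and $RSS$ are the residual sums of squares of the reduced and the full model, $\nu_1 = \mathrm{rank}(X) - \mathrm{rank}(X_0)$ and $\nu_2 = N - \mathrm{rank}(X)$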
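The extra-sum-of-squares principle can be sketched by fitting the reduced and the full model and comparing their residual sums of squares, assuming $F = [(RSS_0 - RSS)/\nu_1] / [RSS/\nu_2]$. The function names and the simulated data (a constant as $X_0$, two random regressors as $X_1$) are illustrative assumptions.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares after a least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

def f_statistic(X0, X1, y):
    """Extra-sum-of-squares F for the full model [X0 X1] vs the reduced X0."""
    X = np.column_stack([X0, X1])
    nu1 = np.linalg.matrix_rank(X) - np.linalg.matrix_rank(X0)
    nu2 = len(y) - np.linalg.matrix_rank(X)
    F = ((rss(X0, y) - rss(X, y)) / nu1) / (rss(X, y) / nu2)
    return F, nu1, nu2

rng = np.random.default_rng(4)
N = 60
X0 = np.ones((N, 1))                      # reduced model: constant only
X1 = rng.normal(size=(N, 2))              # two extra regressors
y = 3.0 + X1 @ np.array([1.0, -1.0]) + rng.normal(size=N)

F, nu1, nu2 = f_statistic(X0, X1, y)
print(f"F({nu1},{nu2}) = {F:.1f}")
```

Because the reduced model is nested in the full model, $RSS_0 \geq RSS$, so F is never negative.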
22F-test: multidimensional contrasts, SPM{F}
- Tests multiple linear hypotheses
H0: the true model is $X_0$
H0: $\beta_3 = \beta_4 = \dots = \beta_9 = 0$
Test H0: $c^T \beta = 0$?
Full model $[X_0\ X_1]$ (with $X_1$ carrying $\beta_{3-9}$) or reduced model $X_0$?
SPM{F6,322}
23F-contrast in SPM
- beta_???? images ⇒ ess_???? images ($RSS_0 - RSS$)
- ResMS image
- ess_???? images and ResMS image ⇒ spmF_???? images: SPM{F}
24F-test example movement related effects
To assess movement-related activation: there is a lot of residual movement-related artifact in the data (despite spatial realignment), which tends to be concentrated near the boundaries of tissue types. By including the realignment parameters in our design matrix, we can regress out linear components of subject movement, reducing the residual error and hence improving our statistics for the effects of interest.
25Differential F-contrasts
Think of it as constructing 3 regressors from the 3 differences and complementing this new design matrix such that the data can be fitted in exactly the same way (same error, same fitted data).
26F-test a few remarks
- F-tests can be viewed as testing for the additional variance explained by a larger model with respect to a simpler (nested) model ⇒ model comparison
- F tests a weighted sum of squares of one or several combinations of the regression coefficients $\beta$.
- In practice, partitioning of X into $[X_0\ X_1]$ is done by multidimensional contrasts.
  Null hypothesis H0: $\beta_1 = \beta_2 = \dots = \beta_p = 0$
  Alternative hypothesis H1: at least one $\beta_k \neq 0$
- F-tests are not directional: when testing a uni-dimensional contrast with an F-test, for example $\beta_1 - \beta_2$, the result will be the same as testing $\beta_2 - \beta_1$.
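The non-directional nature of the F-test can be checked numerically. For a one-dimensional contrast, the F-statistic written in contrast form, $F = (C\hat{\beta})^T [C (X^T X)^{-1} C^T]^{-1} (C\hat{\beta}) / (\nu_1 \hat{\sigma}^2)$, equals the square of the t-statistic, so flipping the sign of the contrast changes t but not F. The data and helper names below are assumptions for illustration, and i.i.d. errors are assumed.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 50
X = np.column_stack([rng.normal(size=N), rng.normal(size=N), np.ones(N)])
y = X @ np.array([1.0, 0.4, 2.0]) + rng.normal(size=N)   # assumed effects

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta
sigma2 = resid @ resid / (N - X.shape[1])

def t_stat(c):
    return (c @ beta) / np.sqrt(sigma2 * (c @ XtX_inv @ c))

def f_stat(C):
    C = np.atleast_2d(C)                  # each row is one contrast
    est = C @ beta
    cov = C @ XtX_inv @ C.T
    # Number of rows = nu1, assuming the rows are linearly independent.
    return est @ np.linalg.solve(cov, est) / (C.shape[0] * sigma2)

c = np.array([1.0, -1.0, 0.0])            # beta_1 - beta_2
print(t_stat(c), t_stat(-c), f_stat(c), f_stat(-c))
```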
27Example a suboptimal model
⇒ Test for the green regressor not significant
28Example a suboptimal model
$Y = X\beta + e$
$\beta_1 = 0.22$, $\beta_2 = 0.11$
Residual var. = 0.3
$p(Y \mid \beta_1 = 0)$ ⇒ p-value = 0.1 (t-test)
$p(Y \mid \beta_1 = 0)$ ⇒ p-value = 0.2 (F-test)
29A better model
⇒ t-test of the green regressor almost significant
⇒ F-test very significant
⇒ t-test of the red regressor very significant
30A better model
$Y = X\beta + e$
$\beta_1 = 0.22$, $\beta_2 = 2.15$, $\beta_3 = 0.11$
Residual var. = 0.2
$p(Y \mid \beta_1 = 0)$ ⇒ p-value = 0.07 (t-test)
$p(Y \mid \beta_1 = 0, \beta_2 = 0)$ ⇒ p-value = 0.000001 (F-test)
31Correlation among regressors
[Figure: geometric view of the data y and the regressors x1 and x2]
Correlated regressors: explained variance is shared between regressors.
When x2 is orthogonalized with regard to x1, only the parameter estimate for x1 changes, not that for x2!
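The effect of orthogonalisation can be verified with a small simulation. The regressors, their correlation and the true parameter values below are assumptions chosen for illustration. Writing x2 = x2_orth + k·x1 shows why: the fit b1·x1 + b2·x2 equals (b1 + k·b2)·x1 + b2·x2_orth, so only the coefficient of x1 changes.

```python
import numpy as np

rng = np.random.default_rng(7)
N = 100
x1 = rng.normal(size=N)
x2 = 0.7 * x1 + rng.normal(size=N)        # x2 is correlated with x1
y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=N)

def ols(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Original (correlated) design.
X = np.column_stack([x1, x2])
beta = ols(X, y)

# Orthogonalise x2 with respect to x1: remove its projection onto x1.
x2_orth = x2 - x1 * (x1 @ x2) / (x1 @ x1)
X_orth = np.column_stack([x1, x2_orth])
beta_orth = ols(X_orth, y)

print("before:", beta)
print("after: ", beta_orth)
```

The fitted data and the residuals are identical for both designs, because the two design matrices span the same space.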
32Design orthogonality
- For each pair of columns of the design matrix, the orthogonality matrix depicts the magnitude of the cosine of the angle between them, with the range 0 to 1 mapped from white to black.
- The cosine of the angle between two vectors a and b is obtained by
  $\cos \alpha = \frac{a^T b}{\lVert a \rVert\, \lVert b \rVert}$
- If both vectors have zero mean then the cosine of the angle between the vectors is the same as the correlation between the two variates.
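A short sketch of the cosine measure and of an orthogonality matrix built from it, assuming $\cos \alpha = a^T b / (\lVert a \rVert \lVert b \rVert)$. The vectors a and b are simulated with a non-zero mean so that the difference between the raw cosine and the correlation is visible; all names and values are illustrative.

```python
import numpy as np

def cosine(a, b):
    """Cosine of the angle between vectors a and b."""
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(6)
a = rng.normal(size=100) + 2.0            # deliberately non-zero mean
b = 0.5 * a + rng.normal(size=100)

cos_raw = cosine(a, b)
cos_centred = cosine(a - a.mean(), b - b.mean())
corr = np.corrcoef(a, b)[0, 1]            # equals cos_centred

# Orthogonality matrix of a design: |cosine| for every pair of columns.
X = np.column_stack([a, b, np.ones(100)])
X_unit = X / np.linalg.norm(X, axis=0)    # scale columns to unit length
ortho = np.abs(X_unit.T @ X_unit)
print(cos_raw, cos_centred, corr)
print(ortho.round(2))
```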
33Correlated regressors
[Figure: true signal; model (green and red regressors); fit (blue: total fit); residual]
34Correlated regressors
$Y = X\beta + e$
$\beta_1 = 0.79$, $\beta_2 = 0.85$, $\beta_3 = 0.06$
Residual var. = 0.3
$p(Y \mid \beta_1 = 0)$ ⇒ p-value = 0.08 (t-test)
$p(Y \mid \beta_2 = 0)$ ⇒ p-value = 0.07 (t-test)
$p(Y \mid \beta_1 = 0, \beta_2 = 0)$ ⇒ p-value = 0.002 (F-test)
35After orthogonalisation
[Figure: true signal; fit (does not change); residuals (do not change)]
36After orthogonalisation
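$Y = X\beta + e$
$\beta_1 = 1.47$ (before: 0.79), $\beta_2 = 0.85$ (before: 0.85), $\beta_3 = 0.06$ (before: 0.06)
Residual var. = 0.3
$p(Y \mid \beta_1 = 0)$ ⇒ p-value = 0.0003 (t-test): does change
$p(Y \mid \beta_2 = 0)$ ⇒ p-value = 0.07 (t-test): does not change
$p(Y \mid \beta_1 = 0, \beta_2 = 0)$ ⇒ p-value = 0.002 (F-test): does not change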
37Design efficiency
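- The aim is to minimize the standard error of a t-contrast (i.e. the denominator of a t-statistic):
  $\widehat{\mathrm{var}}(c^T \hat{\beta}) = \hat{\sigma}^2\, c^T (X^T X)^{-1} c$
- This is equivalent to maximizing the efficiency e:
  $e(\hat{\sigma}^2, c, X) = (\hat{\sigma}^2\, c^T (X^T X)^{-1} c)^{-1}$
  Noise variance: $\hat{\sigma}^2$; design variance: $c^T (X^T X)^{-1} c$
- If we assume that the noise variance is independent of the specific design:
  $e(c, X) = (c^T (X^T X)^{-1} c)^{-1}$
  NB: efficiency depends on the design matrix and the chosen contrast!
- This is a relative measure: all we can say is that one design is more efficient than another (for a given contrast).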
38Design efficiency
- $X^T X$ is the covariance matrix of the regressors in the design matrix (up to scaling, for mean-centred regressors)
- efficiency decreases with increasing covariance
- but note that efficiency differs across contrasts:
  $c^T = [1\ 0]$ ⇒ e = 0.19
  $c^T = [1\ 1]$ ⇒ e = 0.05
  $c^T = [1\ {-1}]$ ⇒ e = 0.95
[Figure: blue dots show noise with the covariance structure of $X^T X$]
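The three efficiency values can be reproduced with a short calculation, assuming $e(c, X) = (c^T (X^T X)^{-1} c)^{-1}$. The matrix $X^T X$ used here is an assumption, not taken from the slides: two unit-norm regressors with correlation -0.9, chosen because it yields exactly the quoted numbers.

```python
import numpy as np

# Assumed X'X for two unit-norm regressors with correlation -0.9
# (not given in the source; chosen because it reproduces the quoted values).
XtX = np.array([[1.0, -0.9],
                [-0.9, 1.0]])

def efficiency(c, XtX):
    """Efficiency e(c, X) = 1 / (c' (X'X)^-1 c), ignoring noise variance."""
    c = np.asarray(c, dtype=float)
    return 1.0 / (c @ np.linalg.inv(XtX) @ c)

for c in ([1, 0], [1, 1], [1, -1]):
    print(c, round(efficiency(c, XtX), 2))
```

With negatively correlated regressors, the sum [1 1] is estimated poorly while the difference [1 -1] is estimated well; for positively correlated regressors the pattern reverses.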
39Example working memory
[Figure: three designs A, B and C, each showing stimulus and response events over time (s)]
A: Correlation = -.65, Efficiency ([1 0]) = 29
B: Correlation = .33, Efficiency ([1 0]) = 40
C: Correlation = -.24, Efficiency ([1 0]) = 47
- A: Response follows each stimulus with (short) fixed delay.
- B: Jittering the delay between stimuli and responses.
- C: Requiring a response only for half of all trials (randomly chosen).