Title: Single session analysis using FEAT
1Single session analysis using FEAT
- David Field
- Thanks to.
- Tom Johnstone, Jason Gledhill, FMRIB
2Single session or first level FMRI analysis
- In FMRI you begin with 1000s of time courses,
one for each voxel location - Lots of preprocessing is applied to maximise
the quality of the data - Then, each voxel time course is modelled
independently - the model is a set of regressors (EVs) that vary
over time - the same model is applied to every voxel time
course - if (part of) the model fits the voxel time course
well the voxel is said to be active
241
256
250
234
260
242
254
3Plan for today
- Detailed look at the process of modelling a
single voxel time course - The input is a voxel time course of image
intensity values and the design matrix, which is
made up of multiple EVs - The GLM is used to find the linear combination of
EVs that explains the most variance in the voxel
time course - The output is a PE for each EV in the design
matrix - Preprocessing of voxel time courses
- this topic probably only makes much sense if you
understand 1) above, which is why I am breaking
with tradition and covering this topic second
rather than first - Implementing a single session analysis in FSL
using FEAT (workshop) - Note There is no formal meeting in week 3 of the
course, but the room will be open for you to
complete worksheets from today and last week - at least one experienced FSL user will be here to
help
4Preprocessing.
5(No Transcript)
6- The BET brain extraction option refers to the 4D
functional series - This will not run BET on the structural
- Normally turn this on
- Brain extraction for the structural image that
will be used as the target for registration has
to be performed by you, before you use FEAT to
set up the processing pipeline
7- On the Misc tab you can toggle balloon help
- Balloon help will tell you what most of the
options mean if you hover the mouse over a button
or tickbox - But it gets annoying after a while
- If progress watcher is selected then FEAT opens a
web browser that shows regular updates of the
stage your analysis has reached
8Motion Correction MCFLIRT
- Aims to make sure that there is a consistent
mapping between voxel X,Y,Z position and actual
anatomical locations - Each image in the 4D series is registered using a
6 DOF rigid body spatial transform to the
reference image (by default, the series mid
point) - This means every image apart from the reference
image is moved slightly - MCFLIRT plots the movements that were made as
output
9MCFLIRT output
Head rotation
Head translation
Total displacement
10The total displacement plot
displacement is calculated relative to the volume
in the middle of the time series
Relative displacement is head position at each
time point relative to the previous time point.
Absolute displacement is relative to the
reference image.
11The total displacement plot
Why should you be particularly concerned about
high values in the relative motion plot?
The first thing to do is look at the range of
values plotted on the y axis, because MCFLIRT
auto-scales the y axis to the data range
12Slice timing correction
Each slice in a functional image is acquired
separately. Acquisition is normally interleaved,
which prevents blending of signal from adjacent
slices Assuming a TR of 3 seconds and, what is
the time difference between the acquisition of
adjacent slices?
A single functional brain area may span two or
more slices
13Why are the different slice timings a problem?
- The same temporal model (design matrix) is fitted
to every voxel time course - therefore, the model assumes that all the voxel
values in a single functional volume were
measured simultaneously - Given that they are not measured simultaneously,
what are the implications? - Consider two voxels from adjacent slices
- both voxels are from the same functional brain
area - this time the TR is 1.5, but the slice order
differed from the standard interleaved procedure,
so there is a 1 sec time gap between acquisition
of two voxels in the adjacent slices that cover
the functional brain area
14- Blue line real HRF in response to a brief event
at time 0
Blue squares intensity values at a voxel first
sampled at 0.5 sec, then every 1.5 sec thereafter
(TR 1.5)
Red circles intensity values at a voxel from an
adjacent slice first sampled at 1.5 sec, then
every 1.5 sec thereafter (TR 1.5)
15These are the two voxel time courses that are
submitted to the model
16The model time course is yoked to the mid point
of the volume acquisition (TR), so there will be
a better fit for voxels in slices acquired at or
near that time.
These are the two voxel time courses that are
submitted to the model
17Slice timing solutions
- Any ideas based on what was covered earlier?
- Including temporal derivatives of the main
regressors in the model allows the model to be
shifted in time to fit voxels in slices acquired
far away from the mid point of the TR - But,
- this makes it difficult to interpret the PEs for
the derivatives do they represent slice timing,
or do they represent variations in the underlying
HRF? - and, you end up having to use F contrasts instead
of t contrasts (see later for why this is to be
avoided if possible)
18Slice timing solutions
- Any ideas based on what was covered earlier?
- Use a block design with blocks long enough that
the BOLD response is summated over a long time
period - But,
- Not all experimental stimuli / tasks can be
presented in a block design - So,
- another option is to shift the data from
different slices in time by small amounts so that
it is as if all slices were acquired at once - this is what the preprocessing option in FEAT
does
19Shifting the data in time - caveats
- But if the TR is 3 seconds, and you need to move
the data for a given slice by, e.g., 1 sec, you
dont have a real data point to use - get round this by interpolating the missing time
points to allow whole voxel time courses to be
shifted in time so effectively they were all
sampled at once at the middle of the TR - But
- Interpolation works OK for short TR, but it does
not work well for long TR (gt 3 sec?) - So this solution only works well when slice
timing issues are relatively minor - There is a debate about whether to do slice
timing correction before or after motion
correction - FSL does motion correction first
- some people advise against any slice timing
correction - Generally, if you have an event related design
then use it, but make sure you check carefully
with the scanner technician what order your
slices were acquired in!
20Temporal filtering
- Filtering in time and/or space is a
long-established method in any signal detection
process to help "clean up" your signal - The idea is if your signal and noise are present
at separable frequencies in the data, you can
attenuate the noise frequencies and thus increase
your signal to noise ratio
21(No Transcript)
22AAAAAAAVVVVVVVRRRRRRRAAAAAAAVVVVVVVRRRRRRR
23AAAAAAAVVVVVVVRRRRRRRAAAAAAAVVVVVVVRRRRRRR
Time
24Raw data
25After low pass filter
26Very low frequency component, suggests that high
pass filter also needed
27Low and high frequencies removed
28Setting the high pass filter for FMRI
- The rule recommended by FSL is that the lowest
setting for the high pass filter should be equal
to the duration of a single cycle of the design - In an ABCABCABC block design, it will be equal to
ABC - If you set a shorter duration than this you will
remove signal that is associated with the
experiment - If you set a higher duration than this then any
unsystematic variation (noise) in the data with a
periodic structure lower in frequency than the
experimental cycle time will remain in voxel time
courses - In complicated event related designs lacking a
strongly periodic structure there is a subjective
element to the setting of the high pass filter
29Setting the high pass filter for FMRI
- Do not use experimental designs with many
conditions where the duration of a single
experimental cycle is very long - e.g. ABCDEABCDE, where ABCDE 300 seconds
- setting the high pass to 300 sec will allow a lot
of the low frequency FMRI noise to pass through
to the modelling stage - Furthermore, the noise can easily become
correlated with the experimental time course
because you are using an experimental time course
that has a similar frequency to that of FMRI
noise - In any signal detection experiment, not just
FMRI, you need to ensure that the signal of
interest and noise are present at different
temporal frequencies
30Low pass filter?
- As well as removing oscillations with a longer
cycle time than the experiment, you can also
elect to remove oscillations with a higher cycle
time than the experiment - high frequency noise
- In theory this should enhance signal and reduce
noise, and it was practiced in the early days of
FMRI - However, it has now been demonstrated that
because FMRI noise has temporal structure (i.e.
it is not white noise), the low pass filter can
actually enhance noise relative to signal - The temporal structure in the noise is called
temporal autocorrelation and is dealt with in
FSL using FILM prewhitening instead of low pass
filtering - in another break with the traditional structure
of FMRI courses temporal autocorrelation and
spatial smoothing will be covered after t and F
contrasts
31After preprocessing and model fitting
- You can now begin to answer the questions you set
out to answer.
32Which voxels activated in each experimental
condition?
- In the auditory / visual stimulus experiment, how
do you decide if a voxel was more activated
during the visual stimulus than the baseline? - If the visual condition PE is gt 0 then the voxel
is active - but gt 0 has to take into account the noise in the
data - We need to be confident that if you repeated the
experiment many times the PE would nearly always
be gt 0 - How can you take the noise into account and
quantify confidence that PE gt 0? - PE / residual variation in the voxel time course
- this is a t statistic, which can be converted to
a p value by taking into account the degrees of
freedom - The p value is the probability of a PE as large
or larger than the observed PE if the true value
of the PE was 0 (null hypothesis) and the only
variation present in the data was random
variation - FSL converts t statistics to z scores,
simplifying interpretation because z can be
converted to p without using deg of freedom
33Why include effects of no interest in the model?
- Also called nuisance regressors
- Imagine an experiment about visual processing of
faces versus other classes of object - Why add EVs to the design matrix based on the
time course of - head motion parameters
- image intensity spikes
- physiological variables
- galvanic skin response, heart rate, pupil
diameter - Answer to reduce the size of the residual error
term - t will be bigger when PE is bigger, but t will
also be bigger when error is smaller - You can model residual variation that is
systematic in some way, but some of the residual
variation is truly random in nature, e.g. thermal
noise from the scanner, and cannot be modelled out
34t contrasts
- Contrast is short for Contrast of Parameter
Estimates (COPE) - it means a linear sum of PEs. The simplest
examples are implicit contrasts of individual
PEs with baseline. Using the example from the
interactive spreadsheet - visual PE 1 auditory PE 0
- visual PE 0 auditory PE 1
- To locate voxels where the visual PE is larger
than the auditory PE - visual PE 1 auditory PE -1
- To locate voxels where the auditory PE is larger
than the visual PE - visual PE -1 auditory PE 1
35t contrasts
- The value of the contrast in each voxel is
divided by an estimate of the residual variation
in the voxel time course (standard error) - produces a t statistic of the contrast
- residual variation is based on the raw time
course values minus the predicted values from the
fitted model - Activation maps (red and yellow blobs
superimposed on anatomical images) are produced
by mapping the t value at each voxel to a colour - crudely, thresholding is just setting a value of
t below which no colour is assigned.
36What will the 1 1 contrast give you?
37F contrasts
- These can be used to find voxels that are active
in any one or more of a set of t contrasts - Visual Auditory Tactile
- 1 -1 0
- 1 0 -1
- F contrasts are bidirectional (1 -1 also implies
-1 1) - rarely a good thing in practice..
- If the response to an event was modelled with the
standard HRF regressor plus time derivative then
you can use an F contrast to view both components
of model of the response to the event on a single
activation map - If you are relying on the time derivative to deal
with slice timing correction then you are
strongly advised to do this
38Thresholding / multiple comparisons problem
39Temporal autocorrelation
- The image intensity value in a voxel at time X is
partially predictable from the same voxel at
times X-1, X-2, X-3 etc (even in baseline scans
with no experiment) - why is this a problem?
- it makes it hard to know the number of
statistically independent observations in a voxel
time course - life would be simple if the number of
observations was equal to the number of
functional volumes - temporal autocorrelation results in a true N is
lower than this - Why do we need to know the number of independent
observations? - because calculating a t statistic requires
dividing by the standard error, and the standard
error is SD / square root (N-1) - Degrees of freedom are needed to convert t stats
to p values - If you use the number of time points in the voxel
time course as N then p values will be too small
(false positive)
40Measuring autocorrelation
41Measuring autocorrelation
42Measuring autocorrelation
43Measuring autocorrelation (SPSS style)
44Plot the correlation against the degree of shift
45Temporal autocorrelation
- Generally, the value of a voxel at time t is
partially predictable from nearby time points
about 3-6 seconds in the past - So, if you use a very long TR, e.g. 6, then you
mostly avoid the problem as the original time
points will have sufficient independence from
each other - Voxel values are also predictable from more
distant time points due to low frequency noise
with periodic structure - but the high pass filter should deal with this
problem
46Autocorrelation FILM prewhitening
- First, fit the model (regressors of interest and
no interest) to the voxel time course using the
GLM - (ignoring the autocorrelation for the moment)
- Estimate the temporal autocorrelation structure
in the residuals - note if model is good residuals noise?
- The estimated structure can be inverted and used
as a temporal filter to undo the autocorrelation
structure in the original data - the time points are now independent and so N
the number of time points (volumes) - the filter is also applied to the design matrix
- Refit the GLM
- Run t and F tests with valid standard error and
degrees of freedom - Prewhitening is selected on the stats tab in FEAT
- it is computationally intensive, but with a
modern PC it is manageable and there are almost
no circumstances where you would turn this option
off
47Spatial smoothing
- FMRI noise varies across space as well as time
- smoothing is a way of reducing spatial noise and
thereby increasing the ratio of signal to noise
(SNR) in the data - Unlike FMRI temporal noise, FMRI spatial noise is
more like white noise, making it easier to deal
with - it is essentially random, essentially independent
from voxel to voxel, and has as mean of about
zero - therefore if you average image intensity across
several voxels, noise tends to average towards
zero, whereas signal that is common to the voxels
you are averaging across will remain unchanged,
dramatically improving the signal to noise ratio
(SNR) - A secondary benefit of smoothing is to reduce
anatomical variation between participants that
remains after registration to the template image - this is because smoothing blurs the images
48Spatial smoothing FWHM
- FSL asks you to specify a Gaussian smoothing
kernel defined by its Full Width at Half Maximum
(FWHM) in mm - To find the FWHM of a Gaussian
- Find the point on the y axis where the function
attains half its maximum value - Then read off the corresponding x axis values
49Spatial smoothing FWHM
- The Gaussian is centred on a voxel, and the value
of the voxel is averaged with that of adjacent
voxels that fall under the Gaussian - The averaging is weighted by the y axis value of
the Gaussian at the appropriate distance
50No smoothing
4 mm
9 mm
51Effects of Smoothing on activations
Unsmoothed Data
Smoothed Data (kernel width 5 voxels)
52When should you smooth? When should you not?
- Smoothing is a good idea if
- You're not particularly concerned with
voxel-by-voxel resolution. - You're not particularly concerned with finding
small (less than a handful of voxels) clusters - You want (or need) to improve your
signal-to-noise ratio - You're averaging results over a group, in a brain
region where functional anatomy and organization
isn't precisely known - You want to use p-values corrected for multiple
comparisons with Gaussian field theory (as
opposed to False Discovery Rate) - this is the Voxel option in FSL and the FWE
option in SPM - Smoothing kernel should be small (or no
smoothing) if - You need voxel-by-voxel resolution
- You believe your activations of interest will
only be a few voxels in size - You're confident your task will generate large
amounts of signal relative to noise - You're working primarily with single-subject
results - You're mainly interested in getting
region-of-interest data from very specific
structures that you've drawn with high resolution
on single subjects
53How do you determine the size of the kernel?
- Based on functional voxel size? Or brain
structure size? - A little of both, it seems.
- The matched filter theorem, from the signal
processing field, tells us that if we're trying
to recover a signal (like an activation) in noisy
data (like FMRI), we can best do it by smoothing
our data with a kernel that's about the same size
as our activation. - Trouble is, though, most of us don't know how big
our activations are going to be before we run our
experiment - Even if you have a particular structure of
interest (say, the hippocampus), you may not get
activation over the whole region - only a part - A lot of people set FWHM to functional voxel size
2
54(No Transcript)
55Old slides beyond this point
56Slice timing correction
- Each functional volume that forms part of the 4D
time series is made up of slices - Each slice is acquired at a different point in
time relative to the start of the TR - e.g., slice 1 at 100 msec, slice 2 at 200 msec,
etc - For each slice, its the same time point relative
to the start of the TR in every volume - So, the interval between successive acquisitions
is constant for every voxel - But the actual time of acquisition is different
for every slice - The model of the time course assumes that within
each volume, every slice was acquired
simultaneously at the mid point of the TR - so, the model is likely to fit better for one
slice than all the others (bad) - To use slice timing correction, you will need to
tell FSL the order your slices were acquired in - interleaved is the most common, but ask your
scanner technician! - Adjustment is to the middle of the TR period
57Slice timing correction
- For each voxel, slice-timing correction examines
the time course and shifts it by a small amount - This requires interpolating between the time
points you actually sampled to infer a more
detailed version of the time course - The more detailed time course can have small
shifts applied to it that are slightly different
for each voxel, depending on the actual order the
slices were acquired in - This allows you to make the assumption in your
modelling that every voxel in each volume was
acquired simultaneously
58Slice timing correction
- The problem this tries to solve is more severe if
you have a longer TR (e.g. 4 seconds) - two adjacent slices in an interleaved sequence
could be sampled almost 2 seconds apart - But temporal interpolation also becomes dodgy
with longer TRs ? - For block designs (stimuli that are long relative
to the TR, e.g. TR 2 sec, stimulus lasts 16
sec) slice timing errors are not a significant
factor influencing the fitting of a model to the
data - For event related designs (brief stimuli
separated by variable pauses), slice timing
correction is important - People argue about whether to do slice timing
correction before or after motion correction - FSL does motion correction first
- some people advise against any slice timing
correction
59Temporal derivatives
- In the FEAT practical you will add temporal
derivatives of the HRF convolved experimental
time courses to the design matrix - what is the purpose of this?
- Each experimental time course is convolved with a
model of the HRF - this is to build the delay and blurring of the
blood flow response relative to the neural
response into the model - but the delay varies between brain areas and
between people
60Temporal Derivatives
- The green line is the first temporal derivative
of the blue line - its rate of change
- the positive max of the derivative is earlier
than the normal HRF peak - the negative max of the derivative is later than
the normal HRF peak - If fitting the model results in a positive beta
weight on a derivative this implies that the HRF
peak is earlier in that voxel - A negative beta weight for the derivative implies
a later peak than typical
61Temporal derivatives
- The basic HRF shape (blue on the previous slide)
has some physiological underpinning (in visual
cortex) - But the use of the derivative to model faster /
slower responses is just a mathematical
convenience - The second temporal derivative (dispersion in
time) can be used to model haemodynamic responses
that are thinner or fatter in time than the
basic shape - The three functions together are sometimes called
the informed basis set by SPM users - the blue line is referred to as canonical, but
in fact it is only canonical for primary visual
cortex - The informed basis set can only model slight
departures from the canonical response shape - If you are interested in the prefrontal cortex of
the elderly youll need to use a more flexible
basis set to model the temporal dynamics of the
response - or use a block design where timing issues are
less severe
62(No Transcript)
63(No Transcript)
64(No Transcript)
65(No Transcript)
66Cluster size based thresholding
- Intuitively, if a voxel with a Z statistic of
1.96 for a particular COPE is surrounded by other
voxels with very low Z values this looks
suspicious - unless you are looking for a very small brain
area - Consider a voxel with a Z statistic of 1.96 is
surrounded by many other voxels with similar Z
values, forming a large blob - Intuitively, for such a voxel the Z of 1.96 (p
0.05) is an overestimate of the probability of
the model fit to this voxel being a result of
random, stimulus unrelated, fluctuation in the
time course - The p value we want to calculate is the
probability of obtaining one or more clusters of
this size or larger under a suitable null
hypothesis - one or more gives us control over the multiple
comparisons problem by setting the family wise
error rate - p value will be low for big clusters
- p value will be high for small clusters
67Comparison of voxel (height based) thresholding
and cluster thresholding
?
space
Significant Voxels
No significant Voxels
? is the height threshold, e.g. 0.001 applied
voxelwise (will be Z about 3)
68Comparison of voxel (height based) thresholding
and cluster thresholding
?
space
Cluster significant
Cluster not significant
k?
k?
K? is the probability of the image containing 1
or more blobs with k or more voxels (and you can
control is at 0.05) The cluster size, in voxels,
that corresponds to a particular value of K?
depends upon the initial value of height
threshold ? used to define the number of clusters
in the image and their size It is usual to set
height ? quite low when using cluster level
thresholding, but this arbitrary choice will
influence the outcome
69Dependency of number of clusters on choice of
height threshold
The number and size of clusters also depends upon
the amount of smoothing that took place in
preprocessing
70(No Transcript)
71- Nyquist frequency is important to know about
- Half the sampling rate (e.g. TR 2 sec is 0.5 Hz,
so Nyquist is 0.25 hz, or 4 seconds) - No signal higher frequency than Nyquist can be
present in the data (important for experimental
design) - But such signal could appear as an aliasing
artefact at a lower frequency
72(No Transcript)
73Overview
- Todays practical session will cover processing
of a single functional session from a single
participant using FEAT - FEAT is an umbrella program that brings together
various other FSL programs into a customisable
processing pipeline - for example, it makes use of BET and FLIRT, which
were programs covered in week 1 - Definitions of single session and multi
session - We will also make use of an interactive
spreadsheet that demonstrates how the general
linear model (GLM) can be used to locate active
regions of the brain given your predictions about
the time course of activation - The lecture will provide theoretical background
for each processing step - There is no formal meeting in week 3 of the
course, but the room will be open for you to
complete worksheets from today and last week - at least one experienced FSL user will be here to
help
74Overview of single session FMRI
- The data is a 4D functional time series
- Many thousands of spatial locations (voxels)
- Each voxel has a time course defined by a single
intensity value per TR ( per volume acquired) - The task is to model the changes in image
intensity over time separately for each voxel - mass univariate approach
- begin with a set of regressors (design matrix /
model) - regressors usually reflect the time course of
experimental conditions - find the best linear combination of regressors to
explain each voxel time course (basically,
multiple regression) - Before modelling the 4D time series a number of
preprocessing steps are applied to the data - remove unwanted sources of variation from the
time series - increase the signal to noise ratio
75Voxel-wise single session modelling
- After the data has been optimised by
preprocessing you search for voxels where the
time course of image intensity changes is
correlated with the experimental time course - activation
- This is achieved using the General Linear Model
(GLM) - similar to multiple regression
- The input to the GLM is the data, plus a set of
explanatory variables called the Design Matrix - sometimes EVs are included to model sources of
variation that are of no interest to the
experimenter - this is to reduce the residual (error) variance
- The GLM is fitted independently for each voxel
timecourse - ignores the spatial structure in the brain
76(No Transcript)
77Regressor 0 (stimulus on)
Regressor 1 (stimulus off)
78(No Transcript)
79(No Transcript)
80(No Transcript)
81(No Transcript)
82The voxel time courses are standardised so that
beta weights are comparable between voxels
83If there is structure in the residual time
courses something important has not been modelled
Experimental time course regressors no longer
square wave because convolved with HRF model
84Autocorrelation FILM prewhitening
- First, fit the GLM
- Estimate the temporal autocorrelation structure
in the residuals - The estimated structure can be inverted and used
as a temporal filter to undo the autocorrelation
structure in the data - the filter is also applied to the design matrix
- Refit the GLM
- DOF n-1 will now correctly reflect what is really
free to vary in the timecourse - Prewhitening is selected on the stats tab in FEAT
- it is computationally intensive, but with a
modern PC it is manageable and there are almost
no circumstances where you would turn this option
off
85(No Transcript)
86(No Transcript)
87(No Transcript)
88(No Transcript)
89(No Transcript)
90(No Transcript)
91(No Transcript)
92(No Transcript)
93(No Transcript)
94(No Transcript)
95(No Transcript)
96(No Transcript)
97(No Transcript)
98(No Transcript)
99(No Transcript)
100Temporal filtering
- Filtering in time and/or space is a
long-established method in any signal detection
process to help "clean up" your signal - The idea is if your signal and noise are present
at separable frequencies in the data, you can
attenuate the noise frequencies and thus increase
your signal to noise ratio
I could illustrate this by drawing a low
frequency sinusoid called noise on the board, or
with matlab. Then draw a high frequency one
called signal underneath. Draw a third where they
are added together, and point out that the two
sinusoids could be seperated mathematically, even
if you did not know apriori their amplitudes and
frequencies. In a second example I make noise and
signal have similar frequency and show that when
added together they are inseperable. This is
key point of FMRI data analysis and guiding
principle in experimental design.