Single session analysis using FEAT

About This Presentation

Title:

Single session analysis using FEAT

Description:

Single session analysis using FEAT David Field Thanks to . Tom Johnstone, Jason Gledhill, FMRIB Single session or first level FMRI analysis In FMRI you ... – PowerPoint PPT presentation

Number of Views:141

Avg rating:3.0/5.0

Slides: 101

Provided by: DavidT109

Category:

more less

Transcript and Presenter's Notes

Title: Single session analysis using FEAT

1
Single session analysis using FEAT

David Field
Thanks to.
Tom Johnstone, Jason Gledhill, FMRIB

2
Single session or first level FMRI analysis

In FMRI you begin with 1000s of time courses,
one for each voxel location
Lots of preprocessing is applied to maximise
the quality of the data
Then, each voxel time course is modelled
independently
the model is a set of regressors (EVs) that vary
over time
the same model is applied to every voxel time
course
if (part of) the model fits the voxel time course
well the voxel is said to be active

241
256
250
234
260
242
254
3
Plan for today

Detailed look at the process of modelling a
single voxel time course
The input is a voxel time course of image
intensity values and the design matrix, which is
made up of multiple EVs
The GLM is used to find the linear combination of
EVs that explains the most variance in the voxel
time course
The output is a PE for each EV in the design
matrix
Preprocessing of voxel time courses
this topic probably only makes much sense if you
understand 1) above, which is why I am breaking
with tradition and covering this topic second
rather than first
Implementing a single session analysis in FSL
using FEAT (workshop)
Note There is no formal meeting in week 3 of the
course, but the room will be open for you to
complete worksheets from today and last week
at least one experienced FSL user will be here to
help

4
Preprocessing.
5
(No Transcript)
6

The BET brain extraction option refers to the 4D
functional series
This will not run BET on the structural
Normally turn this on
Brain extraction for the structural image that
will be used as the target for registration has
to be performed by you, before you use FEAT to
set up the processing pipeline

On the Misc tab you can toggle balloon help
Balloon help will tell you what most of the
options mean if you hover the mouse over a button
or tickbox
But it gets annoying after a while
If progress watcher is selected then FEAT opens a
web browser that shows regular updates of the
stage your analysis has reached

8
Motion Correction MCFLIRT

Aims to make sure that there is a consistent
mapping between voxel X,Y,Z position and actual
anatomical locations
Each image in the 4D series is registered using a
6 DOF rigid body spatial transform to the
reference image (by default, the series mid
point)
This means every image apart from the reference
image is moved slightly
MCFLIRT plots the movements that were made as
output

9
MCFLIRT output
Head rotation
Head translation
Total displacement
10
The total displacement plot
displacement is calculated relative to the volume
in the middle of the time series
Relative displacement is head position at each
time point relative to the previous time point.
Absolute displacement is relative to the
reference image.
11
The total displacement plot
Why should you be particularly concerned about
high values in the relative motion plot?
The first thing to do is look at the range of
values plotted on the y axis, because MCFLIRT
auto-scales the y axis to the data range
12
Slice timing correction
Each slice in a functional image is acquired
separately. Acquisition is normally interleaved,
which prevents blending of signal from adjacent
slices Assuming a TR of 3 seconds and, what is
the time difference between the acquisition of
adjacent slices?
A single functional brain area may span two or
more slices
13
Why are the different slice timings a problem?

The same temporal model (design matrix) is fitted
to every voxel time course
therefore, the model assumes that all the voxel
values in a single functional volume were
measured simultaneously
Given that they are not measured simultaneously,
what are the implications?
Consider two voxels from adjacent slices
both voxels are from the same functional brain
area
this time the TR is 1.5, but the slice order
differed from the standard interleaved procedure,
so there is a 1 sec time gap between acquisition
of two voxels in the adjacent slices that cover
the functional brain area

Blue line real HRF in response to a brief event
at time 0

Blue squares intensity values at a voxel first
sampled at 0.5 sec, then every 1.5 sec thereafter
(TR 1.5)
Red circles intensity values at a voxel from an
adjacent slice first sampled at 1.5 sec, then
every 1.5 sec thereafter (TR 1.5)
15
These are the two voxel time courses that are
submitted to the model
16
The model time course is yoked to the mid point
of the volume acquisition (TR), so there will be
a better fit for voxels in slices acquired at or
near that time.
These are the two voxel time courses that are
submitted to the model
17
Slice timing solutions

Any ideas based on what was covered earlier?
Including temporal derivatives of the main
regressors in the model allows the model to be
shifted in time to fit voxels in slices acquired
far away from the mid point of the TR
But,
this makes it difficult to interpret the PEs for
the derivatives do they represent slice timing,
or do they represent variations in the underlying
HRF?
and, you end up having to use F contrasts instead
of t contrasts (see later for why this is to be
avoided if possible)

18
Slice timing solutions

Any ideas based on what was covered earlier?
Use a block design with blocks long enough that
the BOLD response is summated over a long time
period
But,
Not all experimental stimuli / tasks can be
presented in a block design
So,
another option is to shift the data from
different slices in time by small amounts so that
it is as if all slices were acquired at once
this is what the preprocessing option in FEAT
does

19
Shifting the data in time - caveats

But if the TR is 3 seconds, and you need to move
the data for a given slice by, e.g., 1 sec, you
dont have a real data point to use
get round this by interpolating the missing time
points to allow whole voxel time courses to be
shifted in time so effectively they were all
sampled at once at the middle of the TR
But
Interpolation works OK for short TR, but it does
not work well for long TR (gt 3 sec?)
So this solution only works well when slice
timing issues are relatively minor
There is a debate about whether to do slice
timing correction before or after motion
correction
FSL does motion correction first
some people advise against any slice timing
correction
Generally, if you have an event related design
then use it, but make sure you check carefully
with the scanner technician what order your
slices were acquired in!

20
Temporal filtering

Filtering in time and/or space is a
long-established method in any signal detection
process to help "clean up" your signal
The idea is if your signal and noise are present
at separable frequencies in the data, you can
attenuate the noise frequencies and thus increase
your signal to noise ratio

21
(No Transcript)
22
AAAAAAAVVVVVVVRRRRRRRAAAAAAAVVVVVVVRRRRRRR
23
AAAAAAAVVVVVVVRRRRRRRAAAAAAAVVVVVVVRRRRRRR
Time
24
Raw data
25
After low pass filter
26
Very low frequency component, suggests that high
pass filter also needed
27
Low and high frequencies removed
28
Setting the high pass filter for FMRI

The rule recommended by FSL is that the lowest
setting for the high pass filter should be equal
to the duration of a single cycle of the design
In an ABCABCABC block design, it will be equal to
ABC
If you set a shorter duration than this you will
remove signal that is associated with the
experiment
If you set a higher duration than this then any
unsystematic variation (noise) in the data with a
periodic structure lower in frequency than the
experimental cycle time will remain in voxel time
courses
In complicated event related designs lacking a
strongly periodic structure there is a subjective
element to the setting of the high pass filter

29
Setting the high pass filter for FMRI

Do not use experimental designs with many
conditions where the duration of a single
experimental cycle is very long
e.g. ABCDEABCDE, where ABCDE 300 seconds
setting the high pass to 300 sec will allow a lot
of the low frequency FMRI noise to pass through
to the modelling stage
Furthermore, the noise can easily become
correlated with the experimental time course
because you are using an experimental time course
that has a similar frequency to that of FMRI
noise
In any signal detection experiment, not just
FMRI, you need to ensure that the signal of
interest and noise are present at different
temporal frequencies

30
Low pass filter?

As well as removing oscillations with a longer
cycle time than the experiment, you can also
elect to remove oscillations with a higher cycle
time than the experiment
high frequency noise
In theory this should enhance signal and reduce
noise, and it was practiced in the early days of
FMRI
However, it has now been demonstrated that
because FMRI noise has temporal structure (i.e.
it is not white noise), the low pass filter can
actually enhance noise relative to signal
The temporal structure in the noise is called
temporal autocorrelation and is dealt with in
FSL using FILM prewhitening instead of low pass
filtering
in another break with the traditional structure
of FMRI courses temporal autocorrelation and
spatial smoothing will be covered after t and F
contrasts

31
After preprocessing and model fitting

You can now begin to answer the questions you set
out to answer.

32
Which voxels activated in each experimental
condition?

In the auditory / visual stimulus experiment, how
do you decide if a voxel was more activated
during the visual stimulus than the baseline?
If the visual condition PE is gt 0 then the voxel
is active
but gt 0 has to take into account the noise in the
data
We need to be confident that if you repeated the
experiment many times the PE would nearly always
be gt 0
How can you take the noise into account and
quantify confidence that PE gt 0?
PE / residual variation in the voxel time course
this is a t statistic, which can be converted to
a p value by taking into account the degrees of
freedom
The p value is the probability of a PE as large
or larger than the observed PE if the true value
of the PE was 0 (null hypothesis) and the only
variation present in the data was random
variation
FSL converts t statistics to z scores,
simplifying interpretation because z can be
converted to p without using deg of freedom

33
Why include effects of no interest in the model?

Also called nuisance regressors
Imagine an experiment about visual processing of
faces versus other classes of object
Why add EVs to the design matrix based on the
time course of
head motion parameters
image intensity spikes
physiological variables
galvanic skin response, heart rate, pupil
diameter
Answer to reduce the size of the residual error
term
t will be bigger when PE is bigger, but t will
also be bigger when error is smaller
You can model residual variation that is
systematic in some way, but some of the residual
variation is truly random in nature, e.g. thermal
noise from the scanner, and cannot be modelled out

34
t contrasts

Contrast is short for Contrast of Parameter
Estimates (COPE)
it means a linear sum of PEs. The simplest
examples are implicit contrasts of individual
PEs with baseline. Using the example from the
interactive spreadsheet
visual PE 1 auditory PE 0
visual PE 0 auditory PE 1
To locate voxels where the visual PE is larger
than the auditory PE
visual PE 1 auditory PE -1
To locate voxels where the auditory PE is larger
than the visual PE
visual PE -1 auditory PE 1

35
t contrasts

The value of the contrast in each voxel is
divided by an estimate of the residual variation
in the voxel time course (standard error)
produces a t statistic of the contrast
residual variation is based on the raw time
course values minus the predicted values from the
fitted model
Activation maps (red and yellow blobs
superimposed on anatomical images) are produced
by mapping the t value at each voxel to a colour
crudely, thresholding is just setting a value of
t below which no colour is assigned.

36
What will the 1 1 contrast give you?
37
F contrasts

These can be used to find voxels that are active
in any one or more of a set of t contrasts
Visual Auditory Tactile
1 -1 0
1 0 -1
F contrasts are bidirectional (1 -1 also implies
-1 1)
rarely a good thing in practice..
If the response to an event was modelled with the
standard HRF regressor plus time derivative then
you can use an F contrast to view both components
of model of the response to the event on a single
activation map
If you are relying on the time derivative to deal
with slice timing correction then you are
strongly advised to do this

38
Thresholding / multiple comparisons problem
39
Temporal autocorrelation

The image intensity value in a voxel at time X is
partially predictable from the same voxel at
times X-1, X-2, X-3 etc (even in baseline scans
with no experiment)
why is this a problem?
it makes it hard to know the number of
statistically independent observations in a voxel
time course
life would be simple if the number of
observations was equal to the number of
functional volumes
temporal autocorrelation results in a true N is
lower than this
Why do we need to know the number of independent
observations?
because calculating a t statistic requires
dividing by the standard error, and the standard
error is SD / square root (N-1)
Degrees of freedom are needed to convert t stats
to p values
If you use the number of time points in the voxel
time course as N then p values will be too small
(false positive)

40
Measuring autocorrelation
41
Measuring autocorrelation
42
Measuring autocorrelation
43
Measuring autocorrelation (SPSS style)
44
Plot the correlation against the degree of shift
45
Temporal autocorrelation

Generally, the value of a voxel at time t is
partially predictable from nearby time points
about 3-6 seconds in the past
So, if you use a very long TR, e.g. 6, then you
mostly avoid the problem as the original time
points will have sufficient independence from
each other
Voxel values are also predictable from more
distant time points due to low frequency noise
with periodic structure
but the high pass filter should deal with this
problem

46
Autocorrelation FILM prewhitening

First, fit the model (regressors of interest and
no interest) to the voxel time course using the
GLM
(ignoring the autocorrelation for the moment)
Estimate the temporal autocorrelation structure
in the residuals
note if model is good residuals noise?
The estimated structure can be inverted and used
as a temporal filter to undo the autocorrelation
structure in the original data
the time points are now independent and so N
the number of time points (volumes)
the filter is also applied to the design matrix
Refit the GLM
Run t and F tests with valid standard error and
degrees of freedom
Prewhitening is selected on the stats tab in FEAT
it is computationally intensive, but with a
modern PC it is manageable and there are almost
no circumstances where you would turn this option
off

47
Spatial smoothing

FMRI noise varies across space as well as time
smoothing is a way of reducing spatial noise and
thereby increasing the ratio of signal to noise
(SNR) in the data
Unlike FMRI temporal noise, FMRI spatial noise is
more like white noise, making it easier to deal
with
it is essentially random, essentially independent
from voxel to voxel, and has as mean of about
zero
therefore if you average image intensity across
several voxels, noise tends to average towards
zero, whereas signal that is common to the voxels
you are averaging across will remain unchanged,
dramatically improving the signal to noise ratio
(SNR)
A secondary benefit of smoothing is to reduce
anatomical variation between participants that
remains after registration to the template image
this is because smoothing blurs the images

48
Spatial smoothing FWHM

FSL asks you to specify a Gaussian smoothing
kernel defined by its Full Width at Half Maximum
(FWHM) in mm
To find the FWHM of a Gaussian
Find the point on the y axis where the function
attains half its maximum value
Then read off the corresponding x axis values

49
Spatial smoothing FWHM

The Gaussian is centred on a voxel, and the value
of the voxel is averaged with that of adjacent
voxels that fall under the Gaussian
The averaging is weighted by the y axis value of
the Gaussian at the appropriate distance

50
No smoothing
4 mm
9 mm
51
Effects of Smoothing on activations
Unsmoothed Data
Smoothed Data (kernel width 5 voxels)
52
When should you smooth? When should you not?

Smoothing is a good idea if
You're not particularly concerned with
voxel-by-voxel resolution.
You're not particularly concerned with finding
small (less than a handful of voxels) clusters
You want (or need) to improve your
signal-to-noise ratio
You're averaging results over a group, in a brain
region where functional anatomy and organization
isn't precisely known
You want to use p-values corrected for multiple
comparisons with Gaussian field theory (as
opposed to False Discovery Rate)
this is the Voxel option in FSL and the FWE
option in SPM
Smoothing kernel should be small (or no
smoothing) if
You need voxel-by-voxel resolution
You believe your activations of interest will
only be a few voxels in size
You're confident your task will generate large
amounts of signal relative to noise
You're working primarily with single-subject
results
You're mainly interested in getting
region-of-interest data from very specific
structures that you've drawn with high resolution
on single subjects

53
How do you determine the size of the kernel?

Based on functional voxel size? Or brain
structure size?
A little of both, it seems.
The matched filter theorem, from the signal
processing field, tells us that if we're trying
to recover a signal (like an activation) in noisy
data (like FMRI), we can best do it by smoothing
our data with a kernel that's about the same size
as our activation.
Trouble is, though, most of us don't know how big
our activations are going to be before we run our
experiment
Even if you have a particular structure of
interest (say, the hippocampus), you may not get
activation over the whole region - only a part
A lot of people set FWHM to functional voxel size
2

54
(No Transcript)
55
Old slides beyond this point
56
Slice timing correction

Each functional volume that forms part of the 4D
time series is made up of slices
Each slice is acquired at a different point in
time relative to the start of the TR
e.g., slice 1 at 100 msec, slice 2 at 200 msec,
etc
For each slice, its the same time point relative
to the start of the TR in every volume
So, the interval between successive acquisitions
is constant for every voxel
But the actual time of acquisition is different
for every slice
The model of the time course assumes that within
each volume, every slice was acquired
simultaneously at the mid point of the TR
so, the model is likely to fit better for one
slice than all the others (bad)
To use slice timing correction, you will need to
tell FSL the order your slices were acquired in
interleaved is the most common, but ask your
scanner technician!
Adjustment is to the middle of the TR period

57
Slice timing correction

For each voxel, slice-timing correction examines
the time course and shifts it by a small amount
This requires interpolating between the time
points you actually sampled to infer a more
detailed version of the time course
The more detailed time course can have small
shifts applied to it that are slightly different
for each voxel, depending on the actual order the
slices were acquired in
This allows you to make the assumption in your
modelling that every voxel in each volume was
acquired simultaneously

58
Slice timing correction

The problem this tries to solve is more severe if
you have a longer TR (e.g. 4 seconds)
two adjacent slices in an interleaved sequence
could be sampled almost 2 seconds apart
But temporal interpolation also becomes dodgy
with longer TRs ?
For block designs (stimuli that are long relative
to the TR, e.g. TR 2 sec, stimulus lasts 16
sec) slice timing errors are not a significant
factor influencing the fitting of a model to the
data
For event related designs (brief stimuli
separated by variable pauses), slice timing
correction is important
People argue about whether to do slice timing
correction before or after motion correction
FSL does motion correction first
some people advise against any slice timing
correction

59
Temporal derivatives

In the FEAT practical you will add temporal
derivatives of the HRF convolved experimental
time courses to the design matrix
what is the purpose of this?

Each experimental time course is convolved with a
model of the HRF
this is to build the delay and blurring of the
blood flow response relative to the neural
response into the model
but the delay varies between brain areas and
between people

60
Temporal Derivatives

The green line is the first temporal derivative
of the blue line
its rate of change
the positive max of the derivative is earlier
than the normal HRF peak
the negative max of the derivative is later than
the normal HRF peak
If fitting the model results in a positive beta
weight on a derivative this implies that the HRF
peak is earlier in that voxel
A negative beta weight for the derivative implies
a later peak than typical

61
Temporal derivatives

The basic HRF shape (blue on the previous slide)
has some physiological underpinning (in visual
cortex)
But the use of the derivative to model faster /
slower responses is just a mathematical
convenience
The second temporal derivative (dispersion in
time) can be used to model haemodynamic responses
that are thinner or fatter in time than the
basic shape
The three functions together are sometimes called
the informed basis set by SPM users
the blue line is referred to as canonical, but
in fact it is only canonical for primary visual
cortex
The informed basis set can only model slight
departures from the canonical response shape
If you are interested in the prefrontal cortex of
the elderly youll need to use a more flexible
basis set to model the temporal dynamics of the
response
or use a block design where timing issues are
less severe

62
(No Transcript)
63
(No Transcript)
64
(No Transcript)
65
(No Transcript)
66
Cluster size based thresholding

Intuitively, if a voxel with a Z statistic of
1.96 for a particular COPE is surrounded by other
voxels with very low Z values this looks
suspicious
unless you are looking for a very small brain
area
Consider a voxel with a Z statistic of 1.96 is
surrounded by many other voxels with similar Z
values, forming a large blob
Intuitively, for such a voxel the Z of 1.96 (p
0.05) is an overestimate of the probability of
the model fit to this voxel being a result of
random, stimulus unrelated, fluctuation in the
time course
The p value we want to calculate is the
probability of obtaining one or more clusters of
this size or larger under a suitable null
hypothesis
one or more gives us control over the multiple
comparisons problem by setting the family wise
error rate
p value will be low for big clusters
p value will be high for small clusters

67
Comparison of voxel (height based) thresholding
and cluster thresholding
?
space
Significant Voxels
No significant Voxels
? is the height threshold, e.g. 0.001 applied
voxelwise (will be Z about 3)
68
Comparison of voxel (height based) thresholding
and cluster thresholding
?
space
Cluster significant
Cluster not significant
k?
k?
K? is the probability of the image containing 1
or more blobs with k or more voxels (and you can
control is at 0.05) The cluster size, in voxels,
that corresponds to a particular value of K?
depends upon the initial value of height
threshold ? used to define the number of clusters
in the image and their size It is usual to set
height ? quite low when using cluster level
thresholding, but this arbitrary choice will
influence the outcome
69
Dependency of number of clusters on choice of
height threshold
The number and size of clusters also depends upon
the amount of smoothing that took place in
preprocessing
70
(No Transcript)
71

Nyquist frequency is important to know about
Half the sampling rate (e.g. TR 2 sec is 0.5 Hz,
so Nyquist is 0.25 hz, or 4 seconds)
No signal higher frequency than Nyquist can be
present in the data (important for experimental
design)
But such signal could appear as an aliasing
artefact at a lower frequency

72
(No Transcript)
73
Overview

Todays practical session will cover processing
of a single functional session from a single
participant using FEAT
FEAT is an umbrella program that brings together
various other FSL programs into a customisable
processing pipeline
for example, it makes use of BET and FLIRT, which
were programs covered in week 1
Definitions of single session and multi
session
We will also make use of an interactive
spreadsheet that demonstrates how the general
linear model (GLM) can be used to locate active
regions of the brain given your predictions about
the time course of activation
The lecture will provide theoretical background
for each processing step
There is no formal meeting in week 3 of the
course, but the room will be open for you to
complete worksheets from today and last week
at least one experienced FSL user will be here to
help

74
Overview of single session FMRI

The data is a 4D functional time series
Many thousands of spatial locations (voxels)
Each voxel has a time course defined by a single
intensity value per TR ( per volume acquired)
The task is to model the changes in image
intensity over time separately for each voxel
mass univariate approach
begin with a set of regressors (design matrix /
model)
regressors usually reflect the time course of
experimental conditions
find the best linear combination of regressors to
explain each voxel time course (basically,
multiple regression)
Before modelling the 4D time series a number of
preprocessing steps are applied to the data
remove unwanted sources of variation from the
time series
increase the signal to noise ratio

75
Voxel-wise single session modelling

After the data has been optimised by
preprocessing you search for voxels where the
time course of image intensity changes is
correlated with the experimental time course
activation
This is achieved using the General Linear Model
(GLM)
similar to multiple regression
The input to the GLM is the data, plus a set of
explanatory variables called the Design Matrix
sometimes EVs are included to model sources of
variation that are of no interest to the
experimenter
this is to reduce the residual (error) variance
The GLM is fitted independently for each voxel
timecourse
ignores the spatial structure in the brain

76
(No Transcript)
77
Regressor 0 (stimulus on)
Regressor 1 (stimulus off)
78
(No Transcript)
79
(No Transcript)
80
(No Transcript)
81
(No Transcript)
82
The voxel time courses are standardised so that
beta weights are comparable between voxels
83
If there is structure in the residual time
courses something important has not been modelled
Experimental time course regressors no longer
square wave because convolved with HRF model
84
Autocorrelation FILM prewhitening

First, fit the GLM
Estimate the temporal autocorrelation structure
in the residuals
The estimated structure can be inverted and used
as a temporal filter to undo the autocorrelation
structure in the data
the filter is also applied to the design matrix
Refit the GLM
DOF n-1 will now correctly reflect what is really
free to vary in the timecourse
Prewhitening is selected on the stats tab in FEAT
it is computationally intensive, but with a
modern PC it is manageable and there are almost
no circumstances where you would turn this option
off

85
(No Transcript)
86
(No Transcript)
87
(No Transcript)
88
(No Transcript)
89
(No Transcript)
90
(No Transcript)
91
(No Transcript)
92
(No Transcript)
93
(No Transcript)
94
(No Transcript)
95
(No Transcript)
96
(No Transcript)
97
(No Transcript)
98
(No Transcript)
99
(No Transcript)
100
Temporal filtering

Filtering in time and/or space is a
long-established method in any signal detection
process to help "clean up" your signal
The idea is if your signal and noise are present
at separable frequencies in the data, you can
attenuate the noise frequencies and thus increase
your signal to noise ratio

I could illustrate this by drawing a low
frequency sinusoid called noise on the board, or
with matlab. Then draw a high frequency one
called signal underneath. Draw a third where they
are added together, and point out that the two
sinusoids could be seperated mathematically, even
if you did not know apriori their amplitudes and
frequencies. In a second example I make noise and
signal have similar frequency and show that when
added together they are inseperable. This is
key point of FMRI data analysis and guiding
principle in experimental design.

Write a Comment

User Comments (0)