Title: BCS547 Neural Decoding
Slide 1: BCS547 Neural Decoding
Slide 2: Nature of the problem
In response to a stimulus with unknown orientation θ, you observe a pattern of activity A. What can you say about θ given A?
Bayesian approach: recover P(θ|A), the posterior distribution.
Slide 3: Population Code
Tuning curves
Pattern of activity (A)
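As an illustration (not from the slides), here is a minimal Python sketch of a population code: circular-Gaussian tuning curves tiling the orientation space, and one noisy pattern of activity A generated for a single stimulus orientation. All parameter values (64 neurons, peak rate 20, width 0.5, baseline 1) are arbitrary choices for the example; later sketches reuse these definitions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_neurons = 64
theta_pref = np.linspace(0.0, 2 * np.pi, n_neurons, endpoint=False)  # preferred orientations

def tuning_curve(theta, peak=20.0, width=0.5, baseline=1.0):
    """Mean firing rate f_i(theta) of every neuron (circular-Gaussian tuning)."""
    return baseline + peak * np.exp((np.cos(theta - theta_pref) - 1.0) / width**2)

theta_true = np.pi / 3                       # the unknown stimulus orientation
A = rng.poisson(tuning_curve(theta_true))    # observed pattern of activity (spike counts)
```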
Slide 7: Estimation theory
- A common measure of decoding performance is the mean squared error between the estimate θ̂ and the true value θ, E[(θ̂ - θ)²].
- This error can be decomposed into a bias term and a variance term: E[(θ̂ - θ)²] = (E[θ̂] - θ)² + Var(θ̂).
Slide 8: Efficient Estimators
- The smallest achievable variance for an unbiased estimator is known as the Cramér-Rao bound, σ²_CR.
- An efficient estimator is one that attains this bound: σ²_θ̂ = σ²_CR.
- In general, σ²_θ̂ ≥ σ²_CR.
Slide 9: Fisher Information
Fisher information is defined as
I(θ) = E[(∂ ln P(A|θ)/∂θ)²]
and it is equal to
I(θ) = -E[∂² ln P(A|θ)/∂θ²],
where P(A|θ) is the distribution of the neuronal noise. The Cramér-Rao bound is its inverse: σ²_CR = 1/I(θ).
Slide 11: Fisher Information
- For one neuron with Poisson noise: I(θ) = f'(θ)²/f(θ), where f(θ) is the tuning curve.
- For n independent neurons: I(θ) = Σᵢ f'ᵢ(θ)²/fᵢ(θ) (a sketch follows below).
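Continuing the sketch above, a hedged illustration of this additivity for independent Poisson neurons with the circular-Gaussian tuning curves already defined: each neuron contributes f'ᵢ(θ)²/fᵢ(θ), and the population information is the sum.

```python
def fisher_information(theta, peak=20.0, width=0.5, baseline=1.0):
    """I(theta) = sum_i f_i'(theta)^2 / f_i(theta) for independent Poisson neurons."""
    f = tuning_curve(theta, peak, width, baseline)
    # analytic derivative of the circular-Gaussian tuning curve with respect to theta
    df = peak * np.exp((np.cos(theta - theta_pref) - 1.0) / width**2) * (
        -np.sin(theta - theta_pref) / width**2)
    return np.sum(df**2 / f)

I_true = fisher_information(theta_true)   # information available at the true orientation
```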
Slide 12: Fisher Information and Tuning Curves
- Fisher information is maximum where the slope of the tuning curve is maximum.
- This is consistent with adaptation experiments.
Slide 13: Fisher Information
- In 1D, Fisher information decreases with the width of the tuning curves.
- In 2D, Fisher information does not depend on the width of the tuning curves.
- In 3D and above, Fisher information increases with the width of the tuning curves.
- Caution: this is true for independent Gaussian noise.
Slide 14: Ideal observer
- The discrimination threshold of an ideal observer, δθ, is set by the Cramér-Rao bound: δθ ∝ σ_CR.
- In other words, an efficient estimator is an ideal observer.
Slide 15: Ideal observer
- An ideal observer is an observer that can recover all the Fisher information in the activity (an easy link between Fisher information and behavioral performance).
- If all distributions are Gaussian, Fisher information is the same as Shannon information.
Slide 19: Voting Methods
- Optimal Linear Estimator: θ̂_OLE = Σᵢ wᵢ aᵢ, with the weights wᵢ chosen to minimize the mean squared error.
- Center of Mass: θ̂_CM = Σᵢ aᵢ θᵢ / Σᵢ aᵢ, where θᵢ is the preferred orientation of neuron i.
Slide 20: Center of Mass / Population Vector
- The center of mass is optimal (unbiased and efficient) iff the tuning curves are Gaussian with a zero baseline and uniformly distributed, and the noise follows a Poisson distribution.
- In general, the center of mass has a large bias and a large variance (a sketch follows below).
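Continuing the running example, a minimal sketch of the center-of-mass estimator. Note that it treats orientation as a linear variable, so this naive version only makes sense away from the wrap-around point of the circle.

```python
def center_of_mass(A, theta_pref):
    """Center of mass: activity-weighted average of the preferred orientations."""
    return np.sum(A * theta_pref) / np.sum(A)

theta_com = center_of_mass(A, theta_pref)
```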
Slide 21: Voting Methods
- Optimal Linear Estimator
- Center of Mass
- Population Vector
Slide 23: Population Vector
Typically, the population vector is not the optimal linear estimator.
Slide 24: Population Vector
- The population vector is optimal iff the tuning curves are cosine and uniformly distributed, and the noise follows a normal distribution with fixed variance.
- In most cases, the population vector is biased and has a large variance.
- The variance of the population vector estimate does not reflect Fisher information (a sketch follows below).
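Continuing the running example, a minimal sketch of the population vector: the activity-weighted sum of unit vectors pointing at each neuron's preferred orientation, read out as an angle.

```python
def population_vector(A, theta_pref):
    """Population vector: angle of the activity-weighted sum of preferred-direction vectors."""
    x = np.sum(A * np.cos(theta_pref))
    y = np.sum(A * np.sin(theta_pref))
    return np.arctan2(y, x) % (2 * np.pi)

theta_pv = population_vector(A, theta_pref)
```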
Slide 25: Population Vector
(Figure: the variance of the population vector estimate compared with Fisher information.)
The population vector should NEVER be used to estimate information content!
Slide 28: Maximum Likelihood
- The estimate θ̂_ML is the value of θ that maximizes the likelihood P(A|θ). Therefore, we seek θ̂_ML such that ∂ ln P(A|θ)/∂θ = 0 at θ = θ̂_ML.
Slide 29: Maximum Likelihood
- If the noise is Gaussian and independent with fixed variance, P(A|θ) ∝ exp(-Σᵢ (aᵢ - fᵢ(θ))²/2σ²).
- Therefore, maximizing the likelihood amounts to minimizing the Euclidean distance between A and the tuning-curve template f(θ).
- The estimate is given by θ̂_ML = argmin_θ Σᵢ (aᵢ - fᵢ(θ))² (a sketch follows below).
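Continuing the running example, a hedged sketch of ML decoding under independent, fixed-variance Gaussian noise: evaluate the Euclidean distance between A and the template f(θ) on a grid of orientations and keep the best one. The grid resolution is an arbitrary choice.

```python
theta_grid = np.linspace(0.0, 2 * np.pi, 720, endpoint=False)

def ml_gaussian(A, theta_grid):
    """ML under independent, fixed-variance Gaussian noise:
    minimize the Euclidean distance between A and the template f(theta)."""
    errors = [np.sum((A - tuning_curve(th))**2) for th in theta_grid]
    return theta_grid[int(np.argmin(errors))]

theta_ml = ml_gaussian(A, theta_grid)
```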
Slide 31: Gradient descent for ML
- To maximize the likelihood (equivalently, minimize the error function E(θ) = Σᵢ (aᵢ - fᵢ(θ))²) with respect to θ, one can use a gradient descent technique in which θ is updated according to θ ← θ - ε ∂E/∂θ (a sketch follows below).
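A minimal sketch of this update for the Gaussian case, using a finite-difference estimate of the gradient; the step size, number of steps, and the choice of starting point (the population vector estimate from earlier) are arbitrary choices for the example.

```python
def ml_gradient_descent(A, theta_init, lr=1e-4, n_steps=2000, h=1e-4):
    """Gradient descent on E(theta) = sum_i (a_i - f_i(theta))^2,
    with a finite-difference estimate of dE/dtheta."""
    def error(th):
        return np.sum((A - tuning_curve(th))**2)
    th = theta_init
    for _ in range(n_steps):
        grad = (error(th + h) - error(th - h)) / (2 * h)
        th -= lr * grad
    return th % (2 * np.pi)

theta_gd = ml_gradient_descent(A, theta_init=theta_pv)
```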
Slide 32: Gaussian noise with variance proportional to the mean
- If the noise is Gaussian with variance proportional to the mean, the distance being minimized changes to Σᵢ (aᵢ - fᵢ(θ))²/fᵢ(θ).
Slide 33: Poisson noise
If the noise is Poisson, then P(A|θ) = Πᵢ e^(-fᵢ(θ)) fᵢ(θ)^aᵢ / aᵢ!, and ln P(A|θ) = Σᵢ [aᵢ ln fᵢ(θ) - fᵢ(θ)] + const.
Slide 34: ML and template matching
- Maximum likelihood is a template-matching procedure, BUT the metric used is not always the Euclidean distance: it depends on the noise distribution (a sketch follows below).
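Continuing the running example, the same grid search with the Poisson metric instead of the Euclidean one: under independent Poisson noise, the quantity to maximize is Σᵢ [aᵢ ln fᵢ(θ) - fᵢ(θ)].

```python
def ml_poisson(A, theta_grid):
    """ML under independent Poisson noise: maximize the Poisson log-likelihood
    rather than minimizing the Euclidean distance."""
    logliks = [np.sum(A * np.log(tuning_curve(th)) - tuning_curve(th)) for th in theta_grid]
    return theta_grid[int(np.argmax(logliks))]

theta_ml_poisson = ml_poisson(A, theta_grid)
```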
Slide 35: Bayesian approach
- We want to recover P(θ|A). Using Bayes' theorem, we have P(θ|A) = P(A|θ) P(θ) / P(A).
Slide 36: Bayesian approach
What is the likelihood of θ, P(A|θ)? It is the distribution of the noise, the same distribution we used for maximum likelihood.
Slide 37: Bayesian approach
- The prior P(θ) corresponds to any knowledge we may have about θ before we get to see any activity.
- Example: Zhang et al.
Slide 38: Bayesian approach
Once we have P(θ|A), we can proceed in two different ways. We can keep this distribution for Bayesian inferences (as we would do in a Bayesian network), or we can make a decision about θ. For instance, we can estimate θ as the value that maximizes P(θ|A). This is known as the maximum a posteriori (MAP) estimate. For a flat prior, ML and MAP are equivalent (a sketch follows below).
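Continuing the running example, a hedged sketch of the posterior evaluated on a grid with a Poisson likelihood. With a flat prior the MAP estimate coincides with ML; with an informative prior the two can differ.

```python
def posterior(A, theta_grid, prior=None):
    """P(theta | A) evaluated on a grid, normalized to sum to one."""
    if prior is None:
        prior = np.ones_like(theta_grid)           # flat prior
    log_post = np.array([np.sum(A * np.log(tuning_curve(th)) - tuning_curve(th))
                         for th in theta_grid]) + np.log(prior)
    log_post -= log_post.max()                     # subtract the max for numerical stability
    post = np.exp(log_post)
    return post / post.sum()

post = posterior(A, theta_grid)
theta_map = theta_grid[int(np.argmax(post))]       # MAP estimate
```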
Slide 39: Using the prior (Zhang et al.)
- For a time-varying variable, one can use the distribution over the previous estimate as a prior for the next one (a sketch follows below).
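A hedged sketch of this idea, reusing the posterior() sketch above: at each time step the previous posterior, blurred by an assumed circular-Gaussian drift kernel, serves as the prior for the next step. The drift width and the use of the per-step MAP estimate are assumptions made for the example, not taken from the slides.

```python
def decode_sequence(spike_counts, theta_grid, drift_width=0.1):
    """Recursive Bayesian decoding of a time-varying orientation.
    spike_counts: array of shape (n_timesteps, n_neurons)."""
    # circular-Gaussian transition kernel between grid points (assumed dynamics)
    d = theta_grid[:, None] - theta_grid[None, :]
    kernel = np.exp((np.cos(d) - 1.0) / drift_width**2)
    kernel /= kernel.sum(axis=1, keepdims=True)

    prior = np.ones_like(theta_grid) / len(theta_grid)
    estimates = []
    for counts in spike_counts:
        post = posterior(counts, theta_grid, prior=prior)
        estimates.append(theta_grid[int(np.argmax(post))])   # MAP estimate at this step
        prior = kernel @ post                                 # propagate the posterior forward
    return np.array(estimates)
```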
Slide 40: Bayesian approach
Limitation: the Bayesian approach and ML require a lot of data.
Alternative: estimate P(θ|A) directly using a nonlinear estimator.
Slide 41: Bayesian approach: logistic regression
Example: decoding finger movements in M1. On each trial, we observe 100 cells and we want to know which one of the 5 fingers is being moved (a sketch follows below).
(Network diagram: 100 input units carrying the activity A project to 5 output units, one per category, each reporting P(Fᵢ|A), e.g. P(F5|A).)
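A minimal, self-contained sketch of the softmax (multinomial logistic) decoder suggested by the slide: 100 input units, 5 output categories, trained by gradient descent on the cross-entropy. The synthetic data, learning rate, and number of iterations are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n_cells, n_fingers, n_trials = 100, 5, 1000

# synthetic training data: spike counts X and the finger moved on each trial y
X = rng.poisson(5.0, size=(n_trials, n_cells)).astype(float)
X = (X - X.mean(axis=0)) / X.std(axis=0)                            # standardize the counts
y = np.argmax(X @ rng.normal(size=(n_cells, n_fingers)), axis=1)    # linearly decodable labels

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

W = np.zeros((n_cells, n_fingers))
b = np.zeros(n_fingers)
Y = np.eye(n_fingers)[y]                          # one-hot targets

for _ in range(500):                              # gradient descent on the cross-entropy
    P = softmax(X @ W + b)                        # P(F_k | A) for every trial
    W -= 0.1 * X.T @ (P - Y) / n_trials
    b -= 0.1 * (P - Y).mean(axis=0)

accuracy = np.mean(np.argmax(X @ W + b, axis=1) == y)
```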
Slide 42: Bayesian approach: multinomial distributions
Example: decoding finger movements in M1. Each finger can take 3 mutually exclusive states: no movement, flexion, extension.
Slide 43: Decoding time-varying signals
(Figure: a time-varying stimulus s(t) and the corresponding response r(t).)
Slides 44-47: Decoding time-varying signals (figures only; no transcript).