Title: BCS547 Neural Decoding
Slide 1: BCS547 Neural Decoding
Slide 2: Nature of the problem
In response to a stimulus with unknown orientation θ, you observe a pattern of activity A. What can you say about θ given A?
Bayesian approach: recover P(θ|A), the posterior distribution.
Slide 3: Population Code
Tuning curves
Pattern of activity (A)
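As an illustration (not from the slides), here is a minimal Python sketch of a population code: circular-Gaussian tuning curves tiling the orientation space, and one noisy pattern of activity A generated for a single stimulus orientation. All parameter values (64 neurons, peak rate 20, width 0.5, baseline 1) are arbitrary choices for the example; later sketches reuse these definitions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_neurons = 64
theta_pref = np.linspace(0.0, 2 * np.pi, n_neurons, endpoint=False)  # preferred orientations

def tuning_curve(theta, peak=20.0, width=0.5, baseline=1.0):
    """Mean firing rate f_i(theta) of every neuron (circular-Gaussian tuning)."""
    return baseline + peak * np.exp((np.cos(theta - theta_pref) - 1.0) / width**2)

theta_true = np.pi / 3                       # the unknown stimulus orientation
A = rng.poisson(tuning_curve(theta_true))    # observed pattern of activity (spike counts)
```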
Slide 7: Estimation theory
- A common measure of decoding performance is the mean squared error between the estimate θ̂ and the true value θ, E[(θ̂ - θ)²].
- This error can be decomposed into a bias term and a variance term: E[(θ̂ - θ)²] = (E[θ̂] - θ)² + Var(θ̂).
Slide 8: Efficient Estimators
- The smallest achievable variance for an unbiased estimator is known as the Cramér-Rao bound, σ²_CR.
- An efficient estimator is one that attains this bound: σ²_θ̂ = σ²_CR.
- In general, σ²_θ̂ ≥ σ²_CR.
Slide 9: Fisher Information
Fisher information is defined as
I(θ) = E[(∂ ln P(A|θ)/∂θ)²]
and it is equal to
I(θ) = -E[∂² ln P(A|θ)/∂θ²],
where P(A|θ) is the distribution of the neuronal noise. The Cramér-Rao bound is its inverse: σ²_CR = 1/I(θ).
Slide 11: Fisher Information
- For one neuron with Poisson noise: I(θ) = f'(θ)²/f(θ), where f(θ) is the tuning curve.
- For n independent neurons: I(θ) = Σᵢ f'ᵢ(θ)²/fᵢ(θ) (a sketch follows below).
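Continuing the sketch above, a hedged illustration of this additivity for independent Poisson neurons with the circular-Gaussian tuning curves already defined: each neuron contributes f'ᵢ(θ)²/fᵢ(θ), and the population information is the sum.

```python
def fisher_information(theta, peak=20.0, width=0.5, baseline=1.0):
    """I(theta) = sum_i f_i'(theta)^2 / f_i(theta) for independent Poisson neurons."""
    f = tuning_curve(theta, peak, width, baseline)
    # analytic derivative of the circular-Gaussian tuning curve with respect to theta
    df = peak * np.exp((np.cos(theta - theta_pref) - 1.0) / width**2) * (
        -np.sin(theta - theta_pref) / width**2)
    return np.sum(df**2 / f)

I_true = fisher_information(theta_true)   # information available at the true orientation
```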
Slide 12: Fisher Information and Tuning Curves
- Fisher information is maximum where the slope of the tuning curve is maximum.
- This is consistent with adaptation experiments.
Slide 13: Fisher Information
- In 1D, Fisher information decreases with the width of the tuning curves.
- In 2D, Fisher information does not depend on the width of the tuning curves.
- In 3D and above, Fisher information increases with the width of the tuning curves.
- Caution: this is true for independent Gaussian noise.
Slide 14: Ideal observer
- The discrimination threshold of an ideal observer, δθ, is set by the Cramér-Rao bound: δθ ∝ σ_CR.
- In other words, an efficient estimator is an ideal observer.
Slide 15: Ideal observer
- An ideal observer is an observer that can recover all the Fisher information in the activity (an easy link between Fisher information and behavioral performance).
- If all distributions are Gaussian, Fisher information is the same as Shannon information.
Slide 19: Voting Methods
- Optimal Linear Estimator: θ̂_OLE = Σᵢ wᵢ aᵢ, with the weights wᵢ chosen to minimize the mean squared error.
- Center of Mass: θ̂_CM = Σᵢ aᵢ θᵢ / Σᵢ aᵢ, where θᵢ is the preferred orientation of neuron i.
Slide 20: Center of Mass / Population Vector
- The center of mass is optimal (unbiased and efficient) iff the tuning curves are Gaussian with a zero baseline and uniformly distributed, and the noise follows a Poisson distribution.
- In general, the center of mass has a large bias and a large variance (a sketch follows below).
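Continuing the running example, a minimal sketch of the center-of-mass estimator. Note that it treats orientation as a linear variable, so this naive version only makes sense away from the wrap-around point of the circle.

```python
def center_of_mass(A, theta_pref):
    """Center of mass: activity-weighted average of the preferred orientations."""
    return np.sum(A * theta_pref) / np.sum(A)

theta_com = center_of_mass(A, theta_pref)
```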
Slide 21: Voting Methods
- Optimal Linear Estimator
- Center of Mass
- Population Vector
Slide 23: Population Vector
Typically, the population vector is not the optimal linear estimator.
Slide 24: Population Vector
- The population vector is optimal iff the tuning curves are cosine and uniformly distributed, and the noise follows a normal distribution with fixed variance.
- In most cases, the population vector is biased and has a large variance.
- The variance of the population vector estimate does not reflect Fisher information (a sketch follows below).
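Continuing the running example, a minimal sketch of the population vector: the activity-weighted sum of unit vectors pointing at each neuron's preferred orientation, read out as an angle.

```python
def population_vector(A, theta_pref):
    """Population vector: angle of the activity-weighted sum of preferred-direction vectors."""
    x = np.sum(A * np.cos(theta_pref))
    y = np.sum(A * np.sin(theta_pref))
    return np.arctan2(y, x) % (2 * np.pi)

theta_pv = population_vector(A, theta_pref)
```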
Slide 25: Population Vector
(Figure: the variance of the population vector estimate compared with Fisher information.)
The population vector should NEVER be used to estimate information content!
Slide 28: Maximum Likelihood
- The estimate θ̂_ML is the value of θ that maximizes the likelihood P(A|θ). Therefore, we seek θ̂_ML such that ∂ ln P(A|θ)/∂θ = 0 at θ = θ̂_ML.
Slide 29: Maximum Likelihood
- If the noise is Gaussian and independent with fixed variance, P(A|θ) ∝ exp(-Σᵢ (aᵢ - fᵢ(θ))²/2σ²).
- Therefore, maximizing the likelihood amounts to minimizing the Euclidean distance between A and the tuning-curve template f(θ).
- The estimate is given by θ̂_ML = argmin_θ Σᵢ (aᵢ - fᵢ(θ))² (a sketch follows below).
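Continuing the running example, a hedged sketch of ML decoding under independent, fixed-variance Gaussian noise: evaluate the Euclidean distance between A and the template f(θ) on a grid of orientations and keep the best one. The grid resolution is an arbitrary choice.

```python
theta_grid = np.linspace(0.0, 2 * np.pi, 720, endpoint=False)

def ml_gaussian(A, theta_grid):
    """ML under independent, fixed-variance Gaussian noise:
    minimize the Euclidean distance between A and the template f(theta)."""
    errors = [np.sum((A - tuning_curve(th))**2) for th in theta_grid]
    return theta_grid[int(np.argmin(errors))]

theta_ml = ml_gaussian(A, theta_grid)
```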
Slide 31: Gradient descent for ML
- To maximize the likelihood (equivalently, minimize the error function E(θ) = Σᵢ (aᵢ - fᵢ(θ))²) with respect to θ, one can use a gradient descent technique in which θ is updated according to θ ← θ - ε ∂E/∂θ (a sketch follows below).
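A minimal sketch of this update for the Gaussian case, using a finite-difference estimate of the gradient; the step size, number of steps, and the choice of starting point (the population vector estimate from earlier) are arbitrary choices for the example.

```python
def ml_gradient_descent(A, theta_init, lr=1e-4, n_steps=2000, h=1e-4):
    """Gradient descent on E(theta) = sum_i (a_i - f_i(theta))^2,
    with a finite-difference estimate of dE/dtheta."""
    def error(th):
        return np.sum((A - tuning_curve(th))**2)
    th = theta_init
    for _ in range(n_steps):
        grad = (error(th + h) - error(th - h)) / (2 * h)
        th -= lr * grad
    return th % (2 * np.pi)

theta_gd = ml_gradient_descent(A, theta_init=theta_pv)
```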
Slide 32: Gaussian noise with variance proportional to the mean
- If the noise is Gaussian with variance proportional to the mean, the distance being minimized changes to Σᵢ (aᵢ - fᵢ(θ))²/fᵢ(θ).
Slide 33: Poisson noise
If the noise is Poisson, then P(A|θ) = Πᵢ e^(-fᵢ(θ)) fᵢ(θ)^aᵢ / aᵢ!, and ln P(A|θ) = Σᵢ [aᵢ ln fᵢ(θ) - fᵢ(θ)] + const.
Slide 34: ML and template matching
- Maximum likelihood is a template-matching procedure, BUT the metric used is not always the Euclidean distance: it depends on the noise distribution (a sketch follows below).
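Continuing the running example, the same grid search with the Poisson metric instead of the Euclidean one: under independent Poisson noise, the quantity to maximize is Σᵢ [aᵢ ln fᵢ(θ) - fᵢ(θ)].

```python
def ml_poisson(A, theta_grid):
    """ML under independent Poisson noise: maximize the Poisson log-likelihood
    rather than minimizing the Euclidean distance."""
    logliks = [np.sum(A * np.log(tuning_curve(th)) - tuning_curve(th)) for th in theta_grid]
    return theta_grid[int(np.argmax(logliks))]

theta_ml_poisson = ml_poisson(A, theta_grid)
```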
Slide 35: Bayesian approach
- We want to recover P(θ|A). Using Bayes' theorem, we have P(θ|A) = P(A|θ) P(θ) / P(A).
Slide 36: Bayesian approach
What is the likelihood of θ, P(A|θ)? It is the distribution of the noise, the same distribution we used for maximum likelihood.
Slide 37: Bayesian approach
- The prior P(θ) corresponds to any knowledge we may have about θ before we get to see any activity.
- Example: Zhang et al.
Slide 38: Bayesian approach
Once we have P(θ|A), we can proceed in two different ways. We can keep this distribution for Bayesian inferences (as we would do in a Bayesian network), or we can make a decision about θ. For instance, we can estimate θ as the value that maximizes P(θ|A). This is known as the maximum a posteriori (MAP) estimate. For a flat prior, ML and MAP are equivalent (a sketch follows below).
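Continuing the running example, a hedged sketch of the posterior evaluated on a grid with a Poisson likelihood. With a flat prior the MAP estimate coincides with ML; with an informative prior the two can differ.

```python
def posterior(A, theta_grid, prior=None):
    """P(theta | A) evaluated on a grid, normalized to sum to one."""
    if prior is None:
        prior = np.ones_like(theta_grid)           # flat prior
    log_post = np.array([np.sum(A * np.log(tuning_curve(th)) - tuning_curve(th))
                         for th in theta_grid]) + np.log(prior)
    log_post -= log_post.max()                     # subtract the max for numerical stability
    post = np.exp(log_post)
    return post / post.sum()

post = posterior(A, theta_grid)
theta_map = theta_grid[int(np.argmax(post))]       # MAP estimate
```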
Slide 39: Using the prior (Zhang et al.)
- For a time-varying variable, one can use the distribution over the previous estimate as a prior for the next one (a sketch follows below).
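A hedged sketch of this idea, reusing the posterior() sketch above: at each time step the previous posterior, blurred by an assumed circular-Gaussian drift kernel, serves as the prior for the next step. The drift width and the use of the per-step MAP estimate are assumptions made for the example, not taken from the slides.

```python
def decode_sequence(spike_counts, theta_grid, drift_width=0.1):
    """Recursive Bayesian decoding of a time-varying orientation.
    spike_counts: array of shape (n_timesteps, n_neurons)."""
    # circular-Gaussian transition kernel between grid points (assumed dynamics)
    d = theta_grid[:, None] - theta_grid[None, :]
    kernel = np.exp((np.cos(d) - 1.0) / drift_width**2)
    kernel /= kernel.sum(axis=1, keepdims=True)

    prior = np.ones_like(theta_grid) / len(theta_grid)
    estimates = []
    for counts in spike_counts:
        post = posterior(counts, theta_grid, prior=prior)
        estimates.append(theta_grid[int(np.argmax(post))])   # MAP estimate at this step
        prior = kernel @ post                                 # propagate the posterior forward
    return np.array(estimates)
```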
Slide 40: Bayesian approach
Limitation: the Bayesian approach and ML require a lot of data.
Alternative: estimate P(θ|A) directly using a nonlinear estimator.
Slide 41: Bayesian approach: logistic regression
Example: decoding finger movements in M1. On each trial, we observe 100 cells and we want to know which one of the 5 fingers is being moved (a sketch follows below).
(Network diagram: 100 input units carrying the activity A project to 5 output units, one per category, each reporting P(Fᵢ|A), e.g. P(F5|A).)
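A minimal, self-contained sketch of the softmax (multinomial logistic) decoder suggested by the slide: 100 input units, 5 output categories, trained by gradient descent on the cross-entropy. The synthetic data, learning rate, and number of iterations are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n_cells, n_fingers, n_trials = 100, 5, 1000

# synthetic training data: spike counts X and the finger moved on each trial y
X = rng.poisson(5.0, size=(n_trials, n_cells)).astype(float)
X = (X - X.mean(axis=0)) / X.std(axis=0)                            # standardize the counts
y = np.argmax(X @ rng.normal(size=(n_cells, n_fingers)), axis=1)    # linearly decodable labels

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

W = np.zeros((n_cells, n_fingers))
b = np.zeros(n_fingers)
Y = np.eye(n_fingers)[y]                          # one-hot targets

for _ in range(500):                              # gradient descent on the cross-entropy
    P = softmax(X @ W + b)                        # P(F_k | A) for every trial
    W -= 0.1 * X.T @ (P - Y) / n_trials
    b -= 0.1 * (P - Y).mean(axis=0)

accuracy = np.mean(np.argmax(X @ W + b, axis=1) == y)
```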
Slide 42: Bayesian approach: multinomial distributions
Example: decoding finger movements in M1. Each finger can take 3 mutually exclusive states: no movement, flexion, extension.
Slide 43: Decoding time-varying signals
(Figure: a time-varying stimulus s(t) and the corresponding response r(t).)
Slides 44-47: Decoding time-varying signals (figures only; no transcript).