Title: Neural Implementations of Bayesian Inference
Slide 1: Neural Implementations of Bayesian Inference
Alexandre Pouget, Department of Brain and Cognitive Sciences, University of Rochester
Slide 2: Outline
- Encoding probability distributions with spikes
- Bayesian inference with spikes: multisensory integration
- Bayesian inference with spikes: decision making
- Alternative schemes
- Maximum likelihood estimation
Slide 3: Visuo-Tactile Integration
(Ernst and Banks, Nature, 2002)
Slide 4: Visuo-Tactile Integration
Bimodal: p(s|Vision,Touch) = α p(s|Vision) p(s|Touch)
[Figure: probability vs. S (width), showing p(s|Vision) and the bimodal posterior p(s|Vision,Touch)]
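Illustrative sketch (not from the original slides): a minimal Python example of this product rule, assuming the two cue likelihoods are Gaussian as in the Ernst and Banks setup. The means, standard deviations, and grid are made-up values.

```python
import numpy as np

# Hypothetical numbers chosen for illustration only.
s = np.linspace(40.0, 60.0, 1001)            # grid over the stimulus (width)
ds = s[1] - s[0]

def gaussian(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

p_vision = gaussian(s, 52.0, 2.0)            # p(s|Vision), assumed Gaussian
p_touch  = gaussian(s, 49.0, 1.0)            # p(s|Touch), assumed Gaussian

# Product rule: p(s|Vision,Touch) = alpha * p(s|Vision) * p(s|Touch)
p_bimodal = p_vision * p_touch
p_bimodal /= p_bimodal.sum() * ds            # normalization constant alpha

mean_vt = np.sum(s * p_bimodal) * ds
var_vt = np.sum((s - mean_vt) ** 2 * p_bimodal) * ds
# mean_vt ~ 49.6 and var_vt ~ 0.8 under these assumed parameters: the combined
# estimate is pulled toward the more reliable cue and is more precise than
# either cue alone (1/sigma_VT^2 = 1/sigma_V^2 + 1/sigma_T^2).
print(mean_vt, var_vt)
```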
Slide 5: Main Issues
- How do cortical neurons represent probability distributions?
- How do they take products of distributions?
- How do we make optimal decisions? How do neurons collapse distributions onto maximum likelihood estimates?
Slide 6: Main Issues
- And how do they do so given the high level of variability in neuronal responses in cortex?
Slide 7: Poisson Variability in Cortex
The variability is Poisson-like: p(r|s) (r: spike count) is bell shaped, with variance proportional to the mean (Fano factors within 0.3-1.8; the Fano factor for a Poisson process is 1).
[Figure: spike rasters for four trials with the same stimulus]
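Illustrative sketch (not from the slides) of what "Poisson-like" means in practice: simulate repeated trials and compute the Fano factor, the variance of the spike count divided by its mean. The rate and trial count are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

mean_rate = 12.0                             # assumed mean spike count per trial
counts = rng.poisson(mean_rate, size=1000)   # 1000 simulated trials

fano = counts.var() / counts.mean()          # Fano factor = variance / mean
print(fano)                                  # close to 1 for a Poisson process
```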
Slide 8: Probabilistic population code
- As an example, we consider a population of neurons with Gaussian tuning curves and independent Poisson variability.
r = (r1, r2, ..., rn)
[Figure: left, Gaussian tuning curves (activity vs. stimulus); right, population pattern of activity on a single trial (spike count vs. preferred stimulus)]
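A minimal Python sketch (an illustration, not the authors' code) of such a single-trial population pattern: Gaussian tuning curves scaled by a gain, with independent Poisson spike counts. The preferred stimuli, gain, and tuning width are made-up values.

```python
import numpy as np

rng = np.random.default_rng(1)

prefs = np.linspace(-45, 45, 61)        # preferred stimuli of the population
sigma_tc = 10.0                         # assumed tuning-curve width (deg)
gain = 20.0                             # assumed gain (sets the peak mean count)

def tuning(s):
    """Mean spike counts f_i(s): Gaussian tuning curves scaled by the gain."""
    return gain * np.exp(-0.5 * ((s - prefs) / sigma_tc) ** 2)

s_true = 5.0                            # stimulus on this trial
r = rng.poisson(tuning(s_true))         # population pattern of activity on a single trial
print(r)
```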
Slide 9: Population codes
- Standard approach: estimate a single value of s from r (e.g., with the population vector).
[Figure: population pattern of activity (spike count vs. preferred stimulus) with the population vector estimate marked]
Underlying assumption: population codes encode single values.
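A minimal sketch of this standard single-value readout (illustrative only): a population-vector estimate, here written for a periodic variable, with a tiny made-up example.

```python
import numpy as np

def population_vector(r, prefs_deg, period_deg=360.0):
    """Population-vector readout: activity-weighted sum of unit vectors at each
    preferred value; returns the angle of the resultant (a single estimate)."""
    angles = 2 * np.pi * np.asarray(prefs_deg) / period_deg
    x = np.sum(r * np.cos(angles))
    y = np.sum(r * np.sin(angles))
    return np.arctan2(y, x) * period_deg / (2 * np.pi)

# Tiny made-up example: three neurons preferring -10, 0 and 10 deg.
print(population_vector(np.array([2, 8, 4]), np.array([-10.0, 0.0, 10.0])))
# Returns a single value of s; all information about uncertainty is discarded.
```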
Slide 10: Probabilistic population codes
- Alternative: compute a posterior distribution p(s|r) from r (Foldiak, 1993; Sanger, 1996).
[Figure: population pattern of activity (spike count vs. preferred stimulus)]
Variability in neural responses for a constant stimulus: Poisson-like.
Slide 11: Probabilistic population codes
- For independent Poisson noise, the posterior takes a product-of-experts form:
  p(s|r) ∝ p(s) ∏_i e^(-f_i(s)) f_i(s)^(r_i) / r_i!
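Illustrative sketch (not the original code): evaluating this posterior on a grid under a flat prior, reusing the made-up Gaussian-tuning, Poisson population from the earlier sketch.

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(1)
prefs = np.linspace(-45, 45, 61)             # preferred stimuli (made up)
sigma_tc, gain = 10.0, 20.0                  # assumed tuning width and gain

def tuning(s):
    return gain * np.exp(-0.5 * ((s - prefs) / sigma_tc) ** 2)

r = rng.poisson(tuning(5.0))                 # a single-trial pattern at s = 5

# Posterior on a grid: product over neurons ("experts") of Poisson terms, flat prior.
s_grid = np.linspace(-45, 45, 181)
loglik = np.array([poisson.logpmf(r, tuning(s)).sum() for s in s_grid])
post = np.exp(loglik - loglik.max())
post /= post.sum()                           # normalized p(s|r) on the grid
# The posterior peaks near 5 and is narrower when the gain is higher.
```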
Slide 12:
For independent Poisson noise, the width of the posterior is set by the gain of the population activity. Therefore, the gain encodes the certainty associated with the encoded variable.
Slide 13: Gain and variance
- For independent Poisson noise, the variance of the posterior is inversely proportional to the total spike count, and hence, on average, to the gain g: the higher the gain, the narrower the posterior.
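A hedged derivation sketch (my reconstruction, not reproduced from the slides), assuming Gaussian tuning curves f_i(s) = g exp(-(s - s_i)^2 / 2σ_a^2) that tile the stimulus densely, so that Σ_i f_i(s) is approximately constant in s, and a flat prior:

\log p(s \mid \mathbf{r}) = \sum_i r_i \log f_i(s) - \sum_i f_i(s) + \text{const}
\approx -\frac{1}{2\sigma_a^2} \sum_i r_i (s - s_i)^2 + \text{const},

so the posterior is approximately Gaussian with

\sigma_{\text{post}}^2 \approx \frac{\sigma_a^2}{\sum_i r_i},
\qquad
\mathbb{E}\Big[\sum_i r_i\Big] \propto g,

which is why the gain controls the (average) width of the posterior.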
Slide 14: Experimental Evidence
Anderson et al., Nature, 2000
Slide 15: Experimental Evidence
- Contrast
- Motion coherence
- Retinal eccentricity
Slide 16: Outline
- Bayesian inference: multisensory integration
Slide 17: Inferences with probabilistic population codes
[Figure: two input populations (activity vs. preferred s). Cue 1 (Vision) has gain g1; Cue 2 (Touch) has gain g2.]
Slide 18:
[Figure: adding the two population patterns (gains g1 and g2) yields a combined pattern with gain g = g1 + g2.]
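A minimal numerical check of this claim (my illustration, building on the earlier sketches): with independent Poisson noise and the same Gaussian tuning profile for both cues, decoding the summed pattern r3 = r1 + r2 gives the same posterior as multiplying the two individual posteriors. Gains, stimulus values, and population parameters are made up.

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(2)
prefs = np.linspace(-60, 60, 121)          # preferred stimuli (made-up population)
s_grid = np.linspace(-45, 45, 181)
sigma_tc = 10.0                            # assumed tuning width

def tuning(s, gain):
    return gain * np.exp(-0.5 * ((s - prefs) / sigma_tc) ** 2)

def post(r, gain):
    """Posterior over s under independent Poisson noise and a flat prior."""
    ll = np.array([poisson.logpmf(r, tuning(s, gain)).sum() for s in s_grid])
    p = np.exp(ll - ll.max())
    return p / p.sum()

g1, g2 = 10.0, 25.0                        # made-up gains (cue reliabilities)
r1 = rng.poisson(tuning(0.0, g1))          # cue 1 pattern
r2 = rng.poisson(tuning(3.0, g2))          # cue 2 pattern, slightly different mean

# Sum of independent Poisson counts is Poisson with the summed mean,
# so the summed pattern is decoded with tuning curves of gain g1 + g2.
p3 = post(r1 + r2, g1 + g2)                # posterior decoded from the summed pattern
p12 = post(r1, g1) * post(r2, g2)          # product of the individual posteriors
p12 /= p12.sum()

print(np.max(np.abs(p3 - p12)))            # ~0: adding patterns multiplies posteriors
```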
Slide 19: Visuo-Tactile Integration
Bimodal: p(s|Vision,Touch) = α p(s|Vision) p(s|Touch)
[Figure: probability vs. S (width), showing p(s|Vision) and the bimodal posterior p(s|Vision,Touch)]
Slide 20: Normalization
- Divisive normalization can be used to keep neurons within their firing range.
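For illustration only (the slides do not give the exact expression used), one common form of divisive normalization rescales each neuron's drive by the summed activity of the population:

r_i = \frac{u_i^2}{\sigma^2 + \sum_j u_j^2},

where u_i is the neuron's feedforward drive and σ is a constant. Because every neuron is divided by the same factor, the overall gain is kept bounded while the shape of the population pattern is preserved.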
Slide 21: Assumptions
- Neural noise: independent Poisson
- Gaussian tuning curves
- Unimodal Gaussian probability distributions over the stimulus
- Is this result more general?
Slide 22: Bayesian decoder
[Figure: two input patterns r1 (Cue 1) and r2 (Cue 2) are summed into r1 + r2; applying a Bayesian decoder to the sum should be equivalent to decoding each cue separately and combining the results by Bayesian inference.]
Slide 23: Variability requirements
- Exponential-family distributions with linear sufficient statistics: p(r|s) ∝ φ(r) exp(h(s)·r), where the kernel h(s) is determined by Σ(s), the covariance matrix of r, and f'(s), the derivative of the tuning curves.
Slide 24: Kernel h(s)
- The kernel satisfies h'(s) = Σ(s)^-1 f'(s): the inverse covariance matrix of r times the derivative of the tuning curves (which captures the covariance between r and s).
- This is the local optimal linear estimator!
Slide 25: Covariance requirements
- This family includes any distribution in which the covariance matrix is proportional to the mean, regardless of the form of the correlations.
- Any exponential-family distribution with a fixed Fano factor works.
Slide 26: Tuning curve requirements
- The tuning curve f(s) can take any shape. However, h(s) has to be the same in all populations. What if it's not the same?
Slide 27: Tuning curves: Identical Gaussians
[Figure: Cue 1 and Cue 2 populations with identical Gaussian tuning curves (activity vs. preferred s)]
Slide 28: Tuning curves: Gaussians with different widths
[Figure: Cue 1 and Cue 2 populations with Gaussian tuning curves of different widths]
Slide 29: Tuning curves: Gaussians vs. Sigmoids
[Figure: Cue 1 with Gaussian tuning curves and Cue 2 with sigmoidal tuning curves (activity vs. preferred s)]
Slide 30: Tuning curve requirements
- Let's say r1 has Gaussian tuning curves and r2 has sigmoidal tuning curves. Then the optimal combination is still a linear combination, now through a matrix A.
- The matrix A exists if the tuning curves form basis sets.
Slide 31: Distribution over s
- p(s|r) does not have to be a normal distribution over s.
Slide 32: Prior Distributions
- Priors are easily incorporated.
- Prediction: baseline activity in cortex (e.g., before the start of a trial) should encode the prior distribution.
- There is evidence for this idea in LIP (Glimcher and Platt) and the superior colliculus (Basso and Wurtz).
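One way to see why this works (my sketch in the exponential-family notation above, not an equation from the slides): if the prior itself is written in the same form, p(s) ∝ exp(h(s)·r0) for some baseline pattern r0, then

p(s \mid \mathbf{r}) \propto p(\mathbf{r} \mid s)\, p(s) \propto \exp\!\big(\mathbf{h}(s)\cdot(\mathbf{r} + \mathbf{r}_0)\big),

so adding the baseline pattern r0 to the evoked activity multiplies the likelihood by the prior.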
Slide 33: Summary
- Linear combinations of PPCs are equivalent to optimal Bayesian inference when the variability follows an exponential-family distribution. This works for:
  - all covariance matrices that are proportional to the mean (fixed Fano factor)
  - any set of tuning curves that forms a basis set
  - any probability distribution over s
  - any prior distribution over s
Slide 34: Integrate-and-fire neurons
- Can we get a similar result with realistic networks of spiking neurons, such as integrate-and-fire neurons?
Slide 35: Integrate-and-fire neurons
- Output layer
  - 1200 conductance-based integrate-and-fire neurons, 1000 excitatory, 200 inhibitory
  - Lateral connections
  - High Fano factors (0.3 to 1)
  - Correlated activity
  - Linear in rates
- Input: near-Poisson correlated spike trains with different gains and slightly different means
[Figure: Cue 1 and Cue 2 input patterns (activity vs. preferred s) with gains g1 and g2]
Slide 36: Test cue 1 alone
[Figure: input pattern for Cue 1 and the resulting output pattern r1 (activity vs. preferred s)]
Slide 37: Test cue 2 alone
[Figure: input pattern for Cue 2 and the resulting output pattern r2 (activity vs. preferred s)]
Slide 38: Test cue 1 and cue 2 together
[Figure: input patterns for Cue 1 and Cue 2 and the resulting output pattern r3 (activity vs. preferred s)]
Slide 39: Compare the distributions
How does p(r3|s) compare to p(r1|s) p(r2|s)?
[Figure: output pattern r3 and the two input patterns for Cue 1 and Cue 2 (activity vs. preferred s)]
Slide 40: p(r3|s) versus p(r1|s) p(r2|s)
Identical tuning curves.
[Figure: Cue 1 and Cue 2 tuning curves (activity vs. preferred s)]
Slide 41: p(r3|s) versus p(r1|s) p(r2|s)
Different tuning curves and different correlations.
[Figure: Cue 1 and Cue 2 tuning curves; scatter plots comparing the mean (approx. 89-96) and the variance (approx. 0-3) of p(r3|s) against those of p(r1|s) p(r2|s)]
Slide 42: p(r3|s) versus p(r1|s) p(r2|s)
[Figure: Cue 1 and Cue 2 tuning curves (activity vs. preferred s)]
Slide 43: Experimental prediction
- Multisensory neurons should be linear on average.
Slide 44: Experimental prediction
- The main results in the literature are nonlinear combinations (superadditivity)!
Wallace, Meredith, and Stein, J Neurophys 1998
Slide 45: Experimental prediction
- The main results in the literature are nonlinear combinations (superadditivity)!
- In fact, nonlinearity is the criterion used to define multisensory areas in fMRI.
- Are we already proven wrong?
Slide 46: Experimental prediction
Perrault, Vaughan, Stein, and Wallace, J Neurophys 2005
Slide 47: Inference over time
- Can we generalize this approach to inference over time, and more generally to time-varying signals?
Slide 48: Outline
- Bayesian inference: decision making
Slide 49: Binary Decision Making
Shadlen et al.
Slide 50: Binary Decision Making
- The Bayesian strategy involves computing the posterior distribution given all activity patterns from MT up to the current time.
- Therefore, all we need to do is add the activity patterns over time.
- This predicts that decision neurons act like integrators.
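A minimal sketch of this temporal accumulation (my illustration, reusing the made-up Poisson population above, not the model in the slides): summing the population patterns across time bins and decoding the running sum sharpens the posterior as evidence accumulates, which is what an integrator-like decision neuron would implement.

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(3)
prefs = np.linspace(-90, 90, 91)           # preferred directions (made up)
s_grid = np.linspace(-90, 90, 181)
sigma_tc, gain = 20.0, 2.0                 # assumed tuning width and per-bin gain

def tuning(s):
    return gain * np.exp(-0.5 * ((s - prefs) / sigma_tc) ** 2)

def posterior_from_sum(r_sum, n_bins):
    """Decode the accumulated counts: the sum of n_bins Poisson patterns is
    Poisson with mean n_bins * f(s), so the decoder just rescales the tuning curves."""
    ll = np.array([poisson.logpmf(r_sum, n_bins * tuning(s)).sum() for s in s_grid])
    p = np.exp(ll - ll.max())
    return p / p.sum()

s_true = 10.0
n_bins = 20
r_sum = np.zeros(len(prefs), dtype=int)
for t in range(n_bins):                    # 20 time bins of MT-like activity
    r_sum = r_sum + rng.poisson(tuning(s_true))   # integration: add patterns over time

post = posterior_from_sum(r_sum, n_bins)
# The posterior over direction narrows as more time bins are added.
```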
Slide 51: Bayesian decoder
[Figure: accumulated population pattern (activity vs. preferred s) fed to a Bayesian decoder]
Slide 52: LIP
Roitman and Shadlen, 2002, J. Neurosci.
Slide 53: Outline
- Alternative schemes
Slide 54: Alternative schemes
- Log likelihood ratio (Shadlen et al.; Deneve)
Slide 55: Log Likelihood
- Race models and the Bayesian approach: both amount to a temporal sum over the incoming evidence (log likelihood).
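For concreteness (my notation, not reproduced from the slides), the log-likelihood-ratio scheme for a binary choice between directions s_A and s_B accumulates, over time bins t,

L_T = \sum_{t=1}^{T} \log \frac{p(\mathbf{r}_t \mid s_A)}{p(\mathbf{r}_t \mid s_B)},

and a decision is made when L_T crosses a bound, as in race and diffusion models.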
Slide 56: Differences between Log odds and PPCs
- With PPCs, LIP neurons do not compute the activity difference between MT neurons with opposite direction preferences.
- PPCs and log odds both turn products into sums, but for log odds, sums are products regardless of the noise distribution. Not so for PPCs.
- At the end of the integration, LIP encodes the posterior distribution over direction, i.e., LIP knows how much it can trust its choice.
Slide 57: Alternative schemes
- Log likelihood ratio (Shadlen et al.; Deneve)
- Log probability (Barlow; Rao; Jazayeri and Movshon)
- Probability (Anastasio et al.; Simoncelli; Hoyer and Hyvarinen; Rao; Koechlin et al.)
- Convolution codes (Anderson; Zemel, Dayan, and Pouget)
Slides 58-60: Alternative schemes
(The same list as slide 57, repeated with a population-activity figure: activity vs. stimulus, -90 to 90 deg, asking which quantity over s the pattern encodes under each scheme.)
Slide 61: Alternative schemes
The convolution codes and the log likelihood fail to account for contrast invariance.
[Figure: log p(s|r) vs. orientation (deg, -45 to 45); contrast invariance compared with the predictions for convolution codes and for the log likelihood code]
Slide 62: Outline
- Maximum likelihood estimation
Slide 63: Decision Making
[Diagram: LIP and the superior colliculus]
Slide 64: Maximum Likelihood
[Figure: population activity vs. preferred direction (deg)]
Slide 65: Neural implementation
Slide 66: Optimal decision making
[Figure: LIP and superior colliculus population patterns (activity vs. preferred saccade direction)]
Slide 67: Nonlinear Networks
- Networks in which the activity at time t+1 is a nonlinear function of the activity at the previous time step.
Slide 68: Line Attractor Networks
- Attractor network with population code
- Periodic variable
- Translation-invariant weights
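A minimal sketch of such a network (illustrative only; not the network used in the slides, and with made-up parameters): a ring of units with translation-invariant weights and a simple rectification-plus-normalization nonlinearity. A noisy hill of activity relaxes to a smooth, stereotyped hill whose position serves as the estimate.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
theta = np.linspace(-np.pi, np.pi, n, endpoint=False)   # preferred directions (periodic)

# Translation-invariant weights: each connection depends only on the difference
# in preferred direction (a cosine profile with uniform inhibition, made up here).
W = 0.9 * np.cos(theta[:, None] - theta[None, :]) - 0.2

def step(a, total=20.0):
    """One update: linear recurrence, rectification, and divisive normalization
    that keeps the total activity fixed."""
    u = np.maximum(W @ a, 0.0)
    return total * u / u.sum()

# Noisy initial hill of activity centered near 0.3 rad.
a = np.exp(-0.5 * ((theta - 0.3) / 0.4) ** 2) + 0.2 * rng.standard_normal(n)
a = np.maximum(a, 0.0)

for _ in range(30):
    a = step(a)            # the activity relaxes to a smooth hill of fixed shape

# Read out the position of the hill (phase of its first Fourier component):
s_hat = np.arctan2(np.sum(a * np.sin(theta)), np.sum(a * np.cos(theta)))
print(s_hat)               # roughly 0.3 rad under these made-up parameters
```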
Slide 69: Line Attractor Networks
Slide 70: Line Attractor Networks
[Figure: desired activity profile, and the desired profile over the net input u]
Slide 71: Line Attractor Networks
- The problem with the previous approach is that the weights tend to oscillate. Instead, we minimize a regularized cost.
- The solution is a smooth weight pattern (next slide).
Slide 72: Weight Pattern
[Figure: weight amplitude (roughly -2 to 5) as a function of the difference in preferred orientation]
Slide 73: Optimal decision making
[Figure: LIP and the superior colliculus; population pattern of activity vs. preferred saccade direction]
Slide 74: Optimal decision making
A maximum likelihood estimate minimizes this variance.
[Figure: LIP and superior colliculus population patterns (activity vs. preferred saccade direction)]
Slide 75: Is the network an ML estimator?
[Figure: variances of the estimates, shown as the amount above the maximum likelihood variance, for the population vector and for the network]
Slide 76: Optimality constraint
- For the network to be optimal, the direction along the attractor (the eigenvector with eigenvalue equal to 0) must match Σ^-1 f'(s): the inverse covariance matrix of r times the derivative of the tuning curves (which captures the covariance between r and s). This is the local optimal linear estimator!
- This network is effectively projecting its input onto the LOLE.
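For reference (my formulation; the slide only labeled the terms), the locally optimal linear estimator around a point s0 can be written

\hat{s} = s_0 + \frac{\mathbf{f}'(s_0)^{\top} \Sigma^{-1} \big(\mathbf{r} - \mathbf{f}(s_0)\big)}{\mathbf{f}'(s_0)^{\top} \Sigma^{-1} \mathbf{f}'(s_0)},

whose variance, 1 / (f'(s0)ᵀ Σ⁻¹ f'(s0)), attains the local Cramér-Rao bound for additive Gaussian noise with stimulus-independent covariance. The network is optimal to the extent that its dynamics project r onto this same direction, Σ⁻¹ f'(s0).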
Slide 77: General Results
- Line attractor networks (stable smooth hills) are equivalent to maximum likelihood estimators.
- This result holds regardless of the exact form of the nonlinear activation function.
Slide 78: Performance Over Time
[Figure: standard deviation of the estimate (deg, 0-6) as a function of time (number of iterations, 0-15)]
Slide 79: Optimal decision making
(See the sensorimotor transformation lecture.)
[Figure: two population patterns of activity over S, one mapped through f(S)]
Slide 80: Kalman and Particle Filters