Title: It's raining outside
1. Typical exchanges in London

"It's raining outside; want to go to the pub?"  "Sure, I'll grab the umbrella."
"It's dry outside; want to go to the pub?"  "What, are you insane? I'll grab the umbrella."
2.
- The present state of a network depends on past input.
- For many tasks, "past" means tens of seconds.
- Goal: understand how a single network can do this.
- Use an idea suggested by Jaeger (2001) and Maass et al. (2002).
3. The idea

- Time-varying input drives a randomly connected recurrent network.
- Output is a linear combination of activity; many linear combinations are possible in the same network.
- A particular input, and only that input, strongly activates an output unit.

[Figure: schematic of the network, with an output unit's activity plotted against time.]

Can randomly connected networks like this one do a good job of classifying input? In other words, can randomly connected networks tell that two different inputs really are different?
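A minimal sketch of the readout idea. The network below, the input statistics, and the least-squares fit are illustrative assumptions; the talk does not specify a training procedure.

    import numpy as np

    # Drive a random recurrent network with two different inputs and fit a
    # linear readout that responds only to the first one.
    N, T, sigma = 500, 200, 1.0
    rng = np.random.default_rng(0)
    w = rng.normal(0.0, sigma / np.sqrt(N), (N, N))

    def run(u):
        """Return the T x N matrix of network states driven by input sequence u."""
        x = np.sign(rng.standard_normal(N))
        states = np.empty((len(u), N))
        for t, u_t in enumerate(u):
            x = np.sign(w @ x + u_t)
            states[t] = x
        return states

    u1 = rng.normal(0.0, 0.5, (T, N))                     # input 1
    u2 = rng.normal(0.0, 0.5, (T, N))                     # input 2
    A = np.vstack([run(u1), run(u2)])                     # activity under both inputs
    target = np.concatenate([np.ones(T), np.zeros(T)])    # output unit should fire only for input 1

    readout, *_ = np.linalg.lstsq(A, target, rcond=None)  # one linear combination of activity
    print("mean output, input 1:", (A[:T] @ readout).mean())
    print("mean output, input 2:", (A[T:] @ readout).mean())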
4. Answer can be visualized by looking at trajectories in activity space.

[Figure: trajectories driven by input 1, input 2, and input 3 evolving over time in the N-dimensional activity space, with axes r1, r2, and r3.]

There is a subtlety involving time.
5.

[Figure: trajectories in activity space (axes r1, r2, r3). The two inputs are the same starting at a marked time; in one case the trajectories remain distinguishable, in the other they become indistinguishable. The elapsed time since the inputs became the same is t.]

How big can we make t before the inputs are indistinguishable?
6. Three main regimes: converging, neutral, and diverging.

[Figure: the input is shown on a time axis running from -T to 0. For each of the three regimes, the separation between the two trajectories is sketched at t = -T, t = 0, and a later time t.]

Can we build a network that operates here, i.e., in the neutral regime?
7. Reduced model

    x_i(t+1) = sign( \sum_j w_{ij} x_j(t) + u_i(t) )

where u_i(t) is temporally uncorrelated input, w is a random matrix with mean 0 and variance \sigma^2 / N, and N is the number of neurons.

Question: what happens to nearby trajectories?

Bertschinger and Natschläger (2004) considered low connectivity; our network has high connectivity. The analysis is virtually identical.
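A minimal numpy sketch of this reduced model. The Gaussian form of the weights (beyond mean 0 and variance \sigma^2 / N) and the input amplitude are assumptions made for illustration.

    import numpy as np

    # Reduced model: x_i(t+1) = sign( sum_j w_ij x_j(t) + u_i(t) )
    N = 1000          # number of neurons
    sigma = 1.0       # weights have mean 0 and variance sigma^2 / N
    T = 100           # number of time steps

    rng = np.random.default_rng(0)
    w = rng.normal(0.0, sigma / np.sqrt(N), size=(N, N))   # Gaussian weights (distribution assumed)
    u = rng.normal(0.0, 0.5, size=(T, N))                  # temporally uncorrelated input (amplitude assumed)

    x = np.sign(rng.standard_normal(N))                    # random initial state in {-1, +1}
    for t in range(T):
        x = np.sign(w @ x + u[t])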
8. Analysis

- Two trajectories, x_{1,i}(t) and x_{2,i}(t) (different initial conditions).
- Normalized Hamming distance: d(t) = (1/N) \sum_i |x_{1,i}(t) - x_{2,i}(t)| / 2.
- How does d(t) evolve in time? For small d, d(t+1) \propto d(t)^{1/2}.
- This leads to very rapid growth of small separations: d(t) \propto d(0)^{1/2^t}, so d(t) ~ 1 when t ~ log log(1/d(0)).
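A sketch of this growth, under the same assumed model as above: start two trajectories one neuron apart and track their normalized Hamming distance; the comparison column is only a rough guide since the proportionality constant is not specified.

    import numpy as np

    N, sigma, T = 4000, 1.0, 10
    rng = np.random.default_rng(1)
    w = rng.normal(0.0, sigma / np.sqrt(N), size=(N, N))
    u = rng.normal(0.0, 0.5, size=(T, N))      # the same input drives both trajectories

    x1 = np.sign(rng.standard_normal(N))
    x2 = x1.copy()
    x2[0] *= -1                                # trajectories differ in one neuron: d(0) = 1/N

    for t in range(T):
        d = np.mean(x1 != x2)                  # normalized Hamming distance d(t)
        print(f"t={t}  d(t)={d:.4f}   d(0)^(1/2^t)={(1.0 / N) ** (0.5 ** t):.4f}  (rough comparison)")
        x1 = np.sign(w @ x1 + u[t])
        x2 = np.sign(w @ x2 + u[t])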
[Figure: the map d(t) -> d(t+1), with the iteration d(t), d(t+1), d(t+2), ... indicated; simulation points for d(t+1) versus d(t) on [0, 1], and d versus t.]
9. Derivation

    x_i(t+1) = sign( h_i(t) + u ),    h_i(t) = \sum_j w_{ij} x_j(t)

- What happens if one neuron (neuron k) is different between the two trajectories?
- x_{1,k} = -x_{2,k}
- h_{1,i} - h_{2,i} = 2 w_{ik} x_{1,k} = O(\sigma / N^{1/2})
- Only neurons whose input h_i lies within O(\sigma / N^{1/2}) of the threshold -u can change sign, and h_i is spread over a range of order \sigma.
- => N x O(\sigma / N^{1/2}) / \sigma = O(N^{1/2}) neurons are different on the next time step.
- In other words, d(0) = 1/N and d(1) = N^{1/2} / N = N^{-1/2} = d(0)^{1/2}.
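A quick numerical check of this counting argument (network sizes and input amplitude are assumed): flip a single neuron and count how many units differ on the next step; the count should grow roughly as N^{1/2}.

    import numpy as np

    rng = np.random.default_rng(2)
    sigma, u_amp = 1.0, 0.5          # weight scale and input amplitude (assumed)

    for N in (1000, 4000):           # larger N sharpens the scaling but needs much more memory for w
        w = rng.normal(0.0, sigma / np.sqrt(N), size=(N, N))
        u = rng.normal(0.0, u_amp, size=N)
        x1 = np.sign(rng.standard_normal(N))
        x2 = x1.copy()
        x2[0] *= -1                  # one neuron differs, so h_{1,i} - h_{2,i} = 2 w_{i0} x_{1,0}
        flips = int(np.sum(np.sign(w @ x1 + u) != np.sign(w @ x2 + u)))
        print(f"N={N:5d}  neurons that differ on the next step={flips:4d}  sqrt(N)={np.sqrt(N):.0f}")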
[Figure: the distribution P(h), a Gaussian of width \sigma, with the threshold at h = -u; only the strip of width \sigma / N^{1/2} around the threshold can change sign.]
10. Real neurons

Spike-generation surface: small differences in initial conditions are strongly amplified (=> chaos). van Vreeswijk and Sompolinsky (1996); Banerjee (2001).
[Figure: the spike-generation surface in the space of the neuronal state variables V, w, and m.]
Operation in the neutral regime (on the edge of
chaos) is not an option in realistic networks.
11. Implications

[Figure: the two inputs differ during the interval from -T to 0; the network states are shown as blobs at t = -1, t = 0, and a later time t.]

- Trajectories evolve onto chaotic attractors (blobs).
- Different initial conditions will lead to different points on the attractor.
- What is the typical distance between points on an attractor?
- How does that compare with the typical distance between attractors?
12. Typical distance between points on an attractor

[Figure: the map f(d) on [0, 1] with its stable fixed point d_\infty, together with simulation points for d(t+1) versus d(t).]

Stable equilibrium at d_\infty. Near the attractor,

    d(t+1) - d_\infty \approx f'(d_\infty) ( d(t) - d_\infty )
    =>  d(t) - d_\infty \propto exp[ t log f'(d_\infty) ].

Typical distance between attractors: d_0 at time 0, d_\infty at long times.

[Figure: the two blobs (attractors) at t = -1, t = 0, and a later time t; their separation starts at d_0 and shrinks toward d_\infty.]

After a long time, the distance between attractors decays to d_\infty. At that point, inputs are no longer distinguishable (with a caveat).
13.

All points on the attractor are a distance d_\infty + O(1/N^{1/2}) apart. The distance between attractors is

    d_\infty + ( d(0) - d_\infty ) exp[ t log f'(d_\infty) ] + O(1/N^{1/2}).

The state of the network no longer provides reliable information about the input when exp[ t log f'(d_\infty) ] ~ 1/N^{1/2}, i.e. when t ~ log N / ( 2 |log f'(d_\infty)| ).
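Spelling out that last step (at a stable fixed point, 0 < f'(d_\infty) < 1, so log f'(d_\infty) < 0):

    e^{t \log f'(d_\infty)} \sim N^{-1/2}
    \quad\Longrightarrow\quad
    t \, |\log f'(d_\infty)| \sim \tfrac{1}{2} \log N
    \quad\Longrightarrow\quad
    t \sim \frac{\log N}{2 \, |\log f'(d_\infty)|},

which is the log N scaling of the memory quoted in the conclusions.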
[Figure: linear readout, predictions and simulations. Predicted distance between attractors decays from d_0 toward d_\infty, while the distance within an attractor stays near d_\infty; the two become indistinguishable when they differ by O(1/N^{1/2}). Simulations show the fraction correct of a linear readout at reporting whether the inputs were different or the same, as a function of t (0 to 15), for N = 1000, 4000, and 16000.]
14. Conclusions

- Expanding on a very simple model proposed by Bertschinger and Natschläger (2004), we found that randomly connected networks cannot exhibit a temporal memory that extends much beyond the time constants of the individual neurons.
- Scaling with the size of the network is not favorable: memory scales as log N.
- Our arguments were based on the observation that high-connectivity recurrent networks are chaotic (Banerjee, 2001), and so our conclusions should be very general.
15. Technical details

Mean-field limit:

    d(t+1) = Prob[ sign( \sum_j w_{ij} x_{1,j}(t) + u_i(t) ) \neq sign( \sum_j w_{ij} x_{2,j}(t) + u_i(t) ) ]

Define h_{k,i} = \sum_j w_{ij} x_{k,j}(t), k = 1, 2. Each h_{k,i} is a zero-mean Gaussian random variable with covariance matrix

    R_{kl} = <h_k h_l> = (1/N) \sum_i \sum_{j,j'} w_{ij} x_{k,j}(t) w_{ij'} x_{l,j'}(t)
           = (\sigma^2 / N) \sum_j x_{k,j}(t) x_{l,j}(t)        (since <w_{ij} w_{ij'}> = (\sigma^2 / N) \delta_{jj'})
           = \sigma^2 [ 1 - 2 d(t) (1 - \delta_{kl}) ],

where the last line uses the fact that a fraction d(t) of the products x_{1,j}(t) x_{2,j}(t) equal -1 and the rest equal +1.
16. More succinctly,

    R = \sigma^2 \begin{pmatrix} 1 & 1 - 2 d(t) \\ 1 - 2 d(t) & 1 \end{pmatrix}

We can compute d(t+1) as a function of d(t) by doing Gaussian integrals. The d^{1/2} scaling is generic: it comes from the fact that the Gaussian ellipse has width d^{1/2} in the narrow direction.

[Figure: the bivariate Gaussian ellipse of (h_1, h_2) relative to the point (-u, -u); the integral is over the regions where the two signs disagree, and the ellipse has width \propto d^{1/2} in its narrow direction.]
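A sketch of the resulting map, using Monte Carlo sampling instead of the analytic Gaussian integral; treating the input as a fixed threshold u is an assumption made for illustration.

    import numpy as np

    def mean_field_map(d, sigma=1.0, u=0.5, n_samples=200_000, seed=0):
        """Estimate d(t+1) = P[ sign(h1 + u) != sign(h2 + u) ] with (h1, h2)
        zero-mean Gaussian, covariance R = sigma^2 [[1, 1-2d], [1-2d, 1]]."""
        rng = np.random.default_rng(seed)
        R = sigma ** 2 * np.array([[1.0, 1.0 - 2.0 * d],
                                   [1.0 - 2.0 * d, 1.0]])
        h = rng.multivariate_normal([0.0, 0.0], R, size=n_samples)
        return np.mean(np.sign(h[:, 0] + u) != np.sign(h[:, 1] + u))

    # Iterating the map from a tiny initial separation shows the d^{1/2}-driven
    # growth followed by saturation at the fixed point d_infinity.
    d = 1e-4
    for t in range(8):
        print(f"t={t}  d={d:.4f}")
        d = mean_field_map(d, seed=t)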
17. This scaling also holds for more realistic reduced models with excitatory and inhibitory cells and synaptic and cellular time constants:

    x_i(t+1) = sign( \sum_j w_{xx,ij} z_{x,j}(t) - \sum_j w_{xy,ij} z_{y,j}(t) + u_i(t) ) + (1 - \alpha) x_i(t)
    y_i(t+1) = sign( \sum_j w_{yx,ij} z_{x,j}(t) - \sum_j w_{yy,ij} z_{y,j}(t) + u_i(t) ) + (1 - \beta) y_i(t)
    z_{x,i}(t+1) = x_i(t) + (1 - \gamma) z_{x,i}(t)
    z_{y,i}(t+1) = y_i(t) + (1 - \gamma) z_{y,i}(t)

The (1 - \alpha) and (1 - \beta) terms make the cells leaky integrators; the z variables are synapses with temporal dynamics.
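A sketch of this excitatory/inhibitory model; all parameter values, the Gaussian weight statistics, and the restriction to positive weight magnitudes are assumptions for illustration.

    import numpy as np

    N, sigma = 1000, 1.0
    alpha, beta, gamma = 0.1, 0.1, 0.2     # cellular and synaptic time constants (assumed)
    T = 200

    rng = np.random.default_rng(3)
    scale = sigma / np.sqrt(N)
    w_xx = np.abs(rng.normal(0.0, scale, (N, N)))   # excitatory -> excitatory
    w_xy = np.abs(rng.normal(0.0, scale, (N, N)))   # inhibitory -> excitatory
    w_yx = np.abs(rng.normal(0.0, scale, (N, N)))   # excitatory -> inhibitory
    w_yy = np.abs(rng.normal(0.0, scale, (N, N)))   # inhibitory -> inhibitory
    u = rng.normal(0.0, 0.5, (T, N))

    x = np.sign(rng.standard_normal(N))             # excitatory cells
    y = np.sign(rng.standard_normal(N))             # inhibitory cells
    zx = np.zeros(N)                                # excitatory synaptic variables
    zy = np.zeros(N)                                # inhibitory synaptic variables

    for t in range(T):
        x_new = np.sign(w_xx @ zx - w_xy @ zy + u[t]) + (1 - alpha) * x
        y_new = np.sign(w_yx @ zx - w_yy @ zy + u[t]) + (1 - beta) * y
        zx = x + (1 - gamma) * zx                   # synapses driven by the time-t activity
        zy = y + (1 - gamma) * zy
        x, y = x_new, y_new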
References

H. Jaeger, German National Research Center for Information Technology, GMD Report 148 (2001).
W. Maass, T. Natschläger, and H. Markram, Neural Computation 14:2531-2560 (2002).
N. Bertschinger and T. Natschläger, Neural Computation 16:1413-1436 (2004).
C. van Vreeswijk and H. Sompolinsky, Science 274:1724-1726 (1996).
A. Banerjee, Neural Computation 13:161-193 and 195-225 (2001).