Title: New developments for recurrent neural systems
1 New developments for recurrent neural systems
Barbara Hammer, Institute of Informatics, TU Clausthal, hammer_at_in.tu-clausthal.de
2 Overview
- Recurrent neural networks
  - Definition
  - Architectural bias
- Recurrent self-organizing maps
  - Definition
  - Capacity
- Contextual models
  - Background
  - Approximation capability
3 Recurrent neural networks - Definition
4 Recurrent neural networks
- Feedforward processing
- Recurrent processing
5 Recurrent neural networks
- Application areas
  - Hawkins, Boden: The applicability of recurrent neural networks for biological sequence analysis. IEEE/ACM TCBB, 2005
  - Xu, Hu, Wunsch: Inference of genetic regulatory networks with recurrent neural network models. IEMBS 2004
  - Pollastri, Baldi: Prediction of contact maps by GIOHMMs and recurrent neural networks using lateral propagation from all four cardinal corners. Bioinformatics, 2002
  - Bonet et al.: Predicting Human Immunodeficiency Virus (HIV) drug resistance using recurrent neural networks. Proceedings of the 10th International Electronic Conference on Synthetic Organic Chemistry, 2006
  - Reczko et al.: Finding signal peptides in human protein sequences using recurrent neural networks. WABI 2002
  - Chen, Chaudhari: Bidirectional segmented-memory recurrent neural network for protein secondary structure prediction. Soft Computing - A Fusion of Foundations, Methodologies and Applications, 2006
  - Bates et al.: Detection of seizure foci by recurrent neural networks. Proceedings of the 22nd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2000
  - Güler, Übeyli, Güler: Recurrent neural networks employing Lyapunov exponents for EEG signals classification. Expert Systems with Applications, 2005
  - Petrosian, Prokhorov, Schiffer: Early recognition of Alzheimer's disease in EEG using recurrent neural network and wavelet transform. Proc. SPIE, 2000
6 Recurrent neural networks
7 Recurrent neural networks
- Feedforward neural network
  - neurons connected in an acyclic graph
  - every neuron computes x -> sgd(w^T x - b)
  - the network computes a function on vector spaces
- Recurrent neural network
  - a feedforward network enriched with recurrent connections which set a temporal context
  - the recurrent connections use the output of the previous time step
  - the network computes a function on time series
8Recurrent neural networks
x(t)
o(t)
z(t) f(x(t),z(t-1)) o(t) g(z(t))
z(t)
z(t-1)
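A minimal sketch of this recurrence in Python; the tanh transition and the linear readout are illustrative assumptions, not the specific network of the slides:

```python
import numpy as np

def rnn_forward(xs, W_in, W_rec, W_out, b, c):
    """Run the recurrence z(t) = f(x(t), z(t-1)), o(t) = g(z(t)) over a sequence.

    xs    : array of shape (T, n_in), the input time series
    W_in  : (n_hidden, n_in) input weights
    W_rec : (n_hidden, n_hidden) recurrent weights
    W_out : (n_out, n_hidden) readout weights
    b, c  : biases of the hidden layer and the readout
    """
    z = np.zeros(W_rec.shape[0])                # initial context z(0)
    outputs = []
    for x in xs:
        z = np.tanh(W_in @ x + W_rec @ z + b)   # f(x(t), z(t-1))
        outputs.append(W_out @ z + c)           # o(t) = g(z(t))
    return np.array(outputs)
```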
9 Recurrent neural networks
- well established; training by minimizing the quadratic error -> backpropagation through time, real-time recurrent learning, Kalman filtering, ...
- long-term dependencies cannot be captured due to vanishing gradients: the derivative vanishes if propagated through several time steps! (see the derivation below)
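To make the vanishing-gradient argument explicit, a standard chain-rule computation under the recurrence z(t) = f(x(t), z(t-1)) of the previous slide, with sigmoidal transfer function sgd; a(tau) denotes the pre-activation and W_rec the recurrent weight matrix (notation assumed here):

```latex
\frac{\partial z(t)}{\partial z(t-k)}
  = \prod_{\tau = t-k+1}^{t} \frac{\partial z(\tau)}{\partial z(\tau-1)}
  = \prod_{\tau = t-k+1}^{t} \operatorname{diag}\bigl(\mathrm{sgd}'(a(\tau))\bigr)\, W_{\mathrm{rec}}
```

Since the derivative of the logistic function is bounded by 1/4, this product shrinks geometrically in k whenever the recurrent weights are moderate, so error signals from many steps in the past barely influence the weight update.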
10Recurrent neural networks
fixed recurrent part based on universal pinciples
readout trained by means of simple gradient
mechansim
11 Recurrent neural networks
- Fractal prediction machines
  - input alphabet T, C, G, A; the context is two-dimensional (each symbol is associated with a corner of the unit square, see the sketch below)
  - the resulting points constitute a fractal
  - Markovian property: emphasis on one part (the recent suffix) of the sequence
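A sketch of the chaos-game style context computation behind fractal prediction machines; the corner assignment and the contraction factor k = 0.5 are illustrative choices:

```python
import numpy as np

# illustrative corner assignment for the four-letter alphabet
CORNERS = {"T": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 0.0), "A": (1.0, 1.0)}

def fractal_encode(sequence, k=0.5, start=(0.5, 0.5)):
    """Map a symbol sequence to points in the unit square.

    Each step contracts the previous context towards the corner of the
    current symbol: s_t = k * s_{t-1} + (1 - k) * corner(x_t).
    Sequences sharing a long suffix end up close together (Markovian bias).
    """
    s = np.array(start)
    points = []
    for symbol in sequence:
        s = k * s + (1.0 - k) * np.array(CORNERS[symbol])
        points.append(s.copy())
    return np.array(points)

# sequences with the same recent history map to nearby points
print(fractal_encode("TCGA")[-1], fractal_encode("ACGA")[-1])
```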
12 Recurrent neural networks
- Fractal prediction machine demo
  - daily volatility change of the Dow Jones Industrial Average, 2/1918-4/1997; predict the direction of the volatility move for the next day
  - Tino, Dorffner: Predicting the future from fractal representations of the past. Machine Learning, 2001
13 Recurrent neural networks
- echo state networks: recurrent part of very high dimension with random connections
- echo state property
  - in the limit, the context does not depend on the initialization
  - e.g. spectral radius of the recurrent weight matrix smaller than one
  - activations initialized by a long enough recurrence
  (a minimal sketch follows)
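A minimal echo state network sketch along these lines, assuming the usual recipe of random recurrent weights rescaled to a spectral radius below one, a washout phase, and a ridge-regression readout; all sizes and constants are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_reservoir(n_in, n_res, spectral_radius=0.9):
    """Random input and recurrent weights; the recurrent part is rescaled so
    that its spectral radius stays below one (a sufficient condition for the
    echo state property in the contracting case)."""
    W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
    W = rng.uniform(-0.5, 0.5, (n_res, n_res))
    W *= spectral_radius / max(abs(np.linalg.eigvals(W)))
    return W_in, W

def run_reservoir(xs, W_in, W, washout=100):
    """Collect reservoir states; the first `washout` states are discarded so
    that the context no longer depends on the arbitrary initialization."""
    z = np.zeros(W.shape[0])
    states = []
    for x in np.asarray(xs, float).reshape(len(xs), -1):
        z = np.tanh(W_in @ x + W @ z)
        states.append(z.copy())
    return np.array(states)[washout:]

def train_readout(states, targets, ridge=1e-6):
    """Only the readout is trained: ridge regression from states to targets
    (targets must be aligned with the states remaining after the washout)."""
    A = states.T @ states + ridge * np.eye(states.shape[1])
    return np.linalg.solve(A, states.T @ targets)
```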
14 Recurrent neural networks
- Mackey-Glass time series
- laser data
- Lorenz attractor
- Jaeger, Haas: Harnessing nonlinearity: predicting chaotic systems and saving energy in wireless communication. Science, 2004
15 Recurrent neural networks - Architectural bias
16 Recurrent neural networks
- Approximation completeness: RNNs can approximate every recursive system with continuous transition function and finite time horizon
- connection to recursive symbolic computation mechanisms? Can RNNs realize the dynamics of a symbolic formalism?
17 Recurrent neural networks
- Symbolic mechanisms
  - finite memory models: look only at a finite time window, i.e. f(x1, x2, ...) = f(x1, ..., xL) for a fixed L
  - finite state automata: computation based on a finite internal state
  - pushdown automata: computation based on an internal stack
  - context sensitive languages: computation in linear space
  - Turing machines
  - beyond (-> computation with real numbers!)
18 Recurrent neural networks
- RNNs with arbitrary weights: non-uniform Boolean circuits (super-Turing capability) [Siegelmann/Sontag]
- RNNs with rational weights: Turing machines [Siegelmann/Sontag]
- RNNs with limited noise: finite state automata [Omlin/Giles, Maass/Orponen]
- RNNs with Gaussian noise: finite memory models [Maass/Sontag]
19 Recurrent neural networks
- Motivation: architectural bias
  - easy: divide this form into two parts with the same size and form
  - difficult: divide this form into four parts with the same size and form
  - extremely difficult: divide this form into six parts with the same size and form
20 Recurrent neural networks
- RNNs are initialized with small weights: what is the bias?
- It holds [Hammer/Tino]:
  - small-weight RNNs -> FMMs: for every RNN with small weights one can find a finite memory length L such that the RNN can be approximated by a FMM with memory length L
  - FMMs -> small-weight RNNs: for every FMM, an RNN with randomly initialized small weights exists which approximates the FMM
  - small-weight RNNs have excellent generalization ability (distribution-independent UCED property): for RNNs with small weights, the empirical error represents the real error independently of the underlying distribution
21 Recurrent neural networks
- RNNs with arbitrary weights: non-uniform Boolean circuits (super-Turing capability) [Siegelmann/Sontag]
- RNNs with rational weights: Turing machines [Siegelmann/Sontag]
- RNNs with limited noise: finite state automata [Omlin/Giles, Maass/Orponen]
- RNNs with Gaussian noise: finite memory models [Maass/Sontag]
22 Recurrent self-organizing maps - Definition
23 Recurrent self-organizing maps
- Supervised learning
- Unsupervised learning
24 Recurrent self-organizing maps
- Self-organizing map (SOM) [Kohonen]: popular unsupervised self-organizing neural method for data mining and visualization
- network given by prototypes w_j in R^n arranged in a lattice of positions j = (j1, j2)
- mapping: x in R^n -> the position j in the lattice for which ||x - w_j|| is minimal
- Hebbian learning based on examples x_i and neighborhood cooperation:
  choose x_i, determine the winner j0 (||x_i - w_j0|| minimal), and adapt all prototypes
  w_j <- w_j + eta * nhd(j, j0) * (x_i - w_j)
  (a code sketch follows)
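A minimal sketch of one SOM update step; the Gaussian neighborhood on a rectangular lattice is an illustrative choice:

```python
import numpy as np

def som_step(prototypes, positions, x, eta=0.1, sigma=1.0):
    """One Hebbian SOM update: find the winner for x and pull all prototypes
    towards x, weighted by lattice distance to the winner.

    prototypes : (N, n) array of w_j
    positions  : (N, 2) array of lattice coordinates j = (j1, j2)
    """
    winner = np.argmin(np.linalg.norm(prototypes - x, axis=1))
    # neighborhood cooperation: Gaussian of the lattice distance to the winner
    lattice_dist = np.linalg.norm(positions - positions[winner], axis=1)
    nhd = np.exp(-lattice_dist**2 / (2 * sigma**2))
    prototypes += eta * nhd[:, None] * (x - prototypes)
    return winner
```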
25 Recurrent self-organizing maps
- Neural gas (NG) [Martinetz]: no prior lattice, adaptation according to the rank
  w_j <- w_j + eta * h(rk(w_j, x_i)) * (x_i - w_j), with h a decreasing function of the rank
- HSOM [Ritter]: hyperbolic lattice structure
  w_j <- w_j + eta * nhd_H(j, j0) * (x_i - w_j)
- but: all of these work for real vectors of fixed dimension only!
- How to deal with time series and recurrence?
26 Recurrent self-organizing maps
- Temporal Kohonen map (TKM) [Chappell/Taylor] and recurrent SOM (RSOM) [Varsta/Heikkonen] process a sequence x1, x2, x3, x4, ..., xt, ...
- TKM: d(x_t, w_i) = ||x_t - w_i||^2 + a * d(x_{t-1}, w_i), training w_i -> x_t
- RSOM: d(x_t, w_i) = ||y_t||^2 where y_t = (x_t - w_i) + a * y_{t-1}, training w_i -> y_t
27 Recurrent self-organizing maps
- TKM/RSOM compute a leaky average of the time series
- it is not clear how they can differentiate various contexts - there is no explicit context!
- (example: two different input histories yield the same leaky average)
28 Recurrent self-organizing maps
- Merge SOM (MSOM) [Hammer/Strickert, 2003]: explicit notion of context
  - every neuron carries a pair (w_j, c_j) in R^n x R^n
  - w_j represents the current entry x_t; c_j represents the context, i.e. the content of the winner of the last step
  - d(x_t, w_j) = a * ||x_t - w_j||^2 + (1 - a) * ||C_t - c_j||^2
    where C_t = beta * w_{I(t-1)} + (1 - beta) * c_{I(t-1)} ("merge"), and I(t-1) is the winner in step t-1
29 Recurrent self-organizing maps
- example (merge parameter 1/2):
  C1 = (42 + 50)/2 = 46
  C2 = (33 + 45)/2 = 39
  C3 = (33 + 38)/2 = 35.5
30 Recurrent self-organizing maps
- Training (Hebbian; a code sketch follows)
  - MSOM:
    w_j <- w_j + eta * nhd(j, j0) * (x_t - w_j)
    c_j <- c_j + eta * nhd(j, j0) * (C_t - c_j)
  - MNG:
    w_j <- w_j + eta * h(rk(w_j, x_t)) * (x_t - w_j)
    c_j <- c_j + eta * h(rk(w_j, x_t)) * (C_t - c_j)
  - MHSOM:
    w_j <- w_j + eta * nhd_H(j, j0) * (x_t - w_j)
    c_j <- c_j + eta * nhd_H(j, j0) * (C_t - c_j)
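A sketch of one merge-SOM training step without neighborhood cooperation; parameter names follow the reconstruction above and are assumptions where the slides are terse:

```python
import numpy as np

def msom_step(W, C, x, ctx, alpha=0.5, beta=0.5, eta=0.3):
    """One merge-SOM step: find the winner by the combined distance and move
    its weight towards the input and its context towards the merged context.

    W, C : (N, n) arrays of weights w_j and contexts c_j
    x    : current input x_t
    ctx  : merged context C_t computed from the previous winner
    """
    d = alpha * np.sum((W - x) ** 2, axis=1) + (1 - alpha) * np.sum((C - ctx) ** 2, axis=1)
    i = int(np.argmin(d))                      # winner I(t)
    W[i] += eta * (x - W[i])                   # w_j -> x_t
    C[i] += eta * (ctx - C[i])                 # c_j -> C_t
    # merged context for the next step: C_{t+1} = beta * w_I + (1 - beta) * c_I
    return i, beta * W[i] + (1 - beta) * C[i]

# usage over a sequence xs of shape (T, n):
# ctx = np.zeros(n)
# for x in xs:
#     winner, ctx = msom_step(W, C, x, ctx)
```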
31 Recurrent self-organizing maps
- Experiment
  - speaker identification, Japanese vowel "ae"
  - 9 speakers, 30 articulations per speaker in the training set
  - separate test set
  - http://kdd.ics.uci.edu/databases/JapaneseVowels/JapaneseVowels.html
  - each articulation is a time series of 12-dim. cepstrum vectors
32 Merge SOM
- MNG with posterior labeling
- merge parameter 0.5, a annealed 0.99 -> 0.63, learning rate 0.3
- 150 neurons: 0% training error, 2.7% test error
- 1000 neurons: 0% training error, 1.6% test error
- for comparison: rule based 5.9%, HMM 3.8% [Kudo et al.]
33 Merge SOM
- Experiment
  - Reber grammar
  - 3 * 10^6 input vectors for training, 10^6 vectors for testing
  - MNG, 617 neurons, merge parameter 0.5, a annealed 1 -> 0.57
  - evaluation on the test data
    - attach the longest unique sequence to each winner
    - 428 distinct words, average length 8.902
  - reconstruction from the map
    - backtracking of the best matching predecessor in (w, c) space
    - triplets: only valid Reber words
    - unlimited backtracking: average length 13.78
    - e.g. TVPXTTVVEBTSXXTVPSEBPVPXTVVEBPVVEB, BTXXVPXVPXVPSE, BTXXVPXVPSE
34 Merge SOM
- Experiment
  - classification of donor sites for C. elegans
  - 5 settings with 10000 training data and 10000 test data; windows of 50 nucleotides (T, C, G, A) embedded in 3 dimensions, 38% donor sites [Sonnenburg, Rätsch et al.]
  - MNG with posterior labeling
  - 512 neurons, merge parameter 0.25, learning rate 0.075, a annealed 0.999 -> 0.4 ... 0.7
  - 14.06 +/- 0.66% training error, 14.26 +/- 0.39% test error
  - sparse representation: 512 neurons x 6 dimensions
35 Recurrent self-organizing maps - Capacity
36 Recurrent self-organizing maps
- Theorem (context representation)
  - Assume
    - a SOM with merge context is given (no neighborhood)
    - a sequence x0, x1, x2, x3, ... is given
    - enough neurons are available
  - Then
    - the optimum weight/context pair for x_t is
      w = x_t,  c = sum_{i=0..t-1} beta * (1 - beta)^(t-i-1) * x_i
    - Hebbian training converges to this setting as a stable fixed point
  - Compare to the TKM
    - the optimum weights are
      w = sum_{i=0..t} (1 - a)^i * x_{t-i} / sum_{i=0..t} (1 - a)^i
    - but this is not a fixed point of TKM training
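The closed form of the optimal context follows by unrolling the merge recursion, assuming that the winner at every previous step already carries its own optimal pair and that the initial context is zero (beta denotes the merge parameter as above):

```latex
c_{\mathrm{opt}}(t) = C_t
  = \beta\, x_{t-1} + (1-\beta)\, C_{t-1}
  = \beta\, x_{t-1} + (1-\beta)\bigl(\beta\, x_{t-2} + (1-\beta)\, C_{t-2}\bigr)
  = \dots
  = \sum_{i=0}^{t-1} \beta\,(1-\beta)^{\,t-i-1}\, x_i
```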
37 Recurrent self-organizing maps
- Theorem (capacity)
  - MSOM can simulate finite automata
  - TKM cannot
  - => MSOM is strictly more powerful than TKM/RSOM!
- (figure: simulation of the transition function d on state and input, with inputs given in unary encoding, e.g. (1,0,0,0))
38 Recurrent self-organizing maps
- General recursive maps: a sequence x_t, x_{t-1}, x_{t-2}, ..., x_0 is processed entry by entry; every neuron (w, c) compares
  - the current entry via ||x_t - w||^2
  - the context via ||C_t - c||^2, where C_t summarizes the processing of x_{t-1}, x_{t-2}, ..., x_0
- The methods differ in the choice of the context!
- Hebbian learning: w -> x_t, c -> C_t
39Recurrent self-organizing maps
xt,xt-1,xt-2,,x0
(w,c)
xt w2
Ct - c2
xt
MSOM Ct merged content of the winner in the
previous time step TKM/RSOM Ct activation of
the current neuron (implicit c)
Ct
xt-1,xt-2,,x0
40 Recurrent self-organizing maps
- MSOM
  - C_t = merged content of the winner in the previous time step
- TKM/RSOM
  - C_t = activation of the current neuron (implicit c)
- Recursive SOM (RecSOM) [Voegtlin]
  - C_t = exponential transformation of the activation of all neurons:
    (exp(-d(x_{t-1}, w_1)), ..., exp(-d(x_{t-1}, w_N)))
- Feedback SOM (FSOM) [Horio/Yamakawa]
  - C_t = leaky integrated activation of all neurons:
    (d(x_{t-1}, w_1), ..., d(x_{t-1}, w_N)) + lambda * C_{t-1}
- SOM for structured data (SOMSD) [Hagenbuchner/Sperduti/Tsoi]
  - C_t = index of the winner in the previous step
- Supervised recurrent networks
  - C_t = sgd(activation), with the metric given by the dot product
41 Recurrent self-organizing maps
- ... for normalized or WTA semilinear context
42 Recurrent self-organizing maps
- Experiment
  - Mackey-Glass time series
  - 100 neurons, different lattices, different contexts
  - evaluation by the temporal quantization error:
    average over the map of (mean activity k steps into the past - observed activity k steps into the past)^2
    (a code sketch of this measure follows)
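A sketch of how this temporal quantization error can be computed from a sequence of winner indices; the exact reading of the definition (per-unit variance of the inputs observed k steps before the unit wins) is an assumption where the slide is terse:

```python
import numpy as np

def temporal_quantization_error(xs, winners, max_lag):
    """Temporal quantization error for lags k = 0..max_lag.

    xs      : (T,) or (T, n) array of inputs
    winners : (T,) array of winner indices, winners[t] belongs to xs[t]
    For each unit j and lag k, collect the inputs xs[t-k] over all t where
    unit j wins and average the squared deviation from their mean.
    """
    xs = np.asarray(xs, dtype=float)
    winners = np.asarray(winners)
    errors = []
    for k in range(max_lag + 1):
        per_unit = []
        for j in np.unique(winners):
            t_idx = np.where(winners == j)[0]
            t_idx = t_idx[t_idx >= k]          # need k steps of history
            if len(t_idx) == 0:
                continue
            past = xs[t_idx - k]               # inputs k steps into the past
            per_unit.append(np.mean((past - past.mean(axis=0)) ** 2))
        errors.append(np.mean(per_unit))
    return np.array(errors)
```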
43Recurrent self-organizing maps
SOM
quantization error
RSOM
NG
RecSOM
SOMSD
HSOMSD
MNG
now
past
44 Contextual models - Background
45 Contextual models
46 Contextual models
- time series -> sensor signals, spoken language, ...
- sequences -> text, DNA, ...
- tree structures -> terms, formulas, logic, ...
- graph structures -> chemical molecules, graphics, networks, ...
- neural networks for structures:
  - kernel methods [Haussler, Watkins et al.]
  - recursive networks [Küchler et al.]
47 Contextual models
- Recursive network
  - training: given patterns (x_i, f(x_i))
    - selection of the architecture
    - optimization of the weights
    - evaluation of the test error
  - (recursive unit with input, output, and two context inputs)
  - inputs: directed acyclic graphs over R^n with one supersource and fan-out 2
  - recursive encoding f_rec into R^c:
    f_rec(empty tree) = 0,  f_rec(a(l, r)) = f(a, f_rec(l), f_rec(r))
  - the network computes g o f_rec, mapping input structures to R^o
  (a code sketch follows)
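A sketch of the recursive encoding over binary trees; the single-layer tanh transition for f and the linear readout for g are illustrative choices:

```python
import numpy as np

class Node:
    """Binary tree node with a label in R^n and optional children."""
    def __init__(self, label, left=None, right=None):
        self.label, self.left, self.right = np.asarray(label, float), left, right

def f_rec(node, V, W_l, W_r, b, c_dim):
    """Recursive encoding: f_rec(empty) = 0, f_rec(a(l, r)) = f(a, f_rec(l), f_rec(r))."""
    if node is None:
        return np.zeros(c_dim)                       # initial context for the empty tree
    hl = f_rec(node.left, V, W_l, W_r, b, c_dim)     # encode left subtree
    hr = f_rec(node.right, V, W_l, W_r, b, c_dim)    # encode right subtree
    return np.tanh(V @ node.label + W_l @ hl + W_r @ hr + b)

def transduce(root, V, W_l, W_r, b, W_out, c_dim):
    """Supersource transduction: apply the readout g to the encoding of the root."""
    return W_out @ f_rec(root, V, W_l, W_r, b, c_dim)
```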
48 Contextual models
- Cascade correlation [Fahlman/Lebiere]
  - given data (x, y) in R^n x R, find f such that f(x) = y
  - the output weights minimize the error on the given data
  - each new hidden unit maximizes the correlation of the unit's output and the current error -> the unit can serve for error correction in subsequent steps
  - hidden units are cascaded: h_i(x) = f_i(x, h_1(x), ..., h_{i-1}(x)), etc. (sketch below)
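A sketch of the cascaded forward computation; training of the individual units (error minimization for the output, correlation maximization for new hidden units) is omitted, and the sigmoid units and parameter layout are illustrative assumptions:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def cascade_forward(x, hidden_params, output_params):
    """Cascade correlation forward pass.

    hidden_params : list of (w, b); unit i sees the input x and the outputs
                    of all previously frozen units h_1, ..., h_{i-1}
    output_params : (w_out, b_out) for a linear output on x and all h_i
    """
    features = list(np.atleast_1d(x))
    for w, b in hidden_params:             # h_i(x) = f_i(x, h_1(x), ..., h_{i-1}(x))
        h = sigmoid(np.dot(w, features) + b)
        features.append(h)                 # the frozen unit feeds all later units
    w_out, b_out = output_params
    return np.dot(w_out, features) + b_out
```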
49 Contextual models
- few, cascaded, separately optimized neurons
- => efficient training
- => excellent generalization
- ... as shown e.g. for the two-spirals problem
50 Contextual models
- For trees: recursive processing of the structure, starting at the leaves and moving towards the root
- q^{-1}(h_i(v)) = (h_i(ch_1(v)), ..., h_i(ch_k(v))) gives the context (well defined since the structure is acyclic!)
- h_1(v) = f_1(l(v), h_1(ch_1(v)), ..., h_1(ch_k(v)))
- ... arbitrary recurrence between the hidden units is not possible, since the weights are frozen after adding a neuron!
- h_2(v) = f_2(l(v), h_2(ch_1(v)), ..., h_2(ch_k(v)), h_1(v), h_1(ch_1(v)), ..., h_1(ch_k(v)))
- etc.: no problem!
51 Contextual models
- Recursive cascade correlation
  - init
  - repeat
    - add h_i
    - train f_i(l(v), q^{-1}(h_i(v)), h_1(v), q^{-1}(h_1(v)), ..., h_{i-1}(v), q^{-1}(h_{i-1}(v))) on the correlation
    - train the output on the error
52 Contextual models
- Restricted recurrence allows us to also look at the parents:
  q^{-1}(h_i(v)) = (h_i(ch_1(v)), ..., h_i(ch_k(v)))
  q^{+1}(h_i(v)) = (h_i(pa_1(v)), ..., h_i(pa_k(v)))
- with full recurrence this would yield cycles; it is possible due to the restricted recurrence!
- Contextual cascade correlation:
  h_i(v) = f_i(l(v), q^{-1}(h_i(v)), h_1(v), q^{-1}(h_1(v)), q^{+1}(h_1(v)), ..., h_{i-1}(v), q^{-1}(h_{i-1}(v)), q^{+1}(h_{i-1}(v)))
53 Contextual models
- q^{+1} extends the context of h_i: with each added unit (i = 1, 2, 3, ...) the context of a vertex covers a larger part of the surrounding structure
54 Contextual models
- Experiment: QSPR problem [Micheli, Sperduti, Sona]: predict the boiling point of alkanes (in °C)
- alkanes C_nH_{2n+2}: methane, ethane, propane, butane, pentane, ...
  (example structures: hexane, 2-methylpentane)
- the boiling point grows with n and decreases with branching -> excellent benchmark
55 Contextual models
- representation of an alkane as a term (tree) over its CH3/CH2/CH groups:
  CH3(CH2(CH2(CH(CH2(CH3),CH(CH3,CH2(CH3))))))
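A small helper (hypothetical, limited to the functor(arg, ..., arg) syntax shown) that parses such a term into a nested (label, children) structure, from which a tree for a recursive network can be built:

```python
def parse_term(s):
    """Parse a term like 'CH3(CH2(CH3))' into nested (label, children) tuples."""
    s = s.strip()
    i = s.find("(")
    if i == -1:
        return (s, [])
    label, body = s[:i], s[i + 1:-1]          # strip the outer parentheses
    children, depth, start = [], 0, 0
    for j, ch in enumerate(body):             # split arguments at top-level commas
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            children.append(parse_term(body[start:j]))
            start = j + 1
    children.append(parse_term(body[start:]))
    return (label, children)

print(parse_term("CH3(CH2(CH2(CH(CH2(CH3),CH(CH3,CH2(CH3))))))"))
```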
56 Contextual models
- alkane data: 150 examples, bipolar encoding of the symbols
- 10-fold cross-validation
- number of neurons: 137 (CRecCC), 110/140 (RecCC)
- comparison to a FNN with direct encoding for restricted length [Cherqaoui/Villemin]
- codomain: [-164, 174] °C
57 Contextual models
58 Contextual models - Approximation capability
59 Contextual models
- Major problem [Giles]: RCC cannot represent all finite automata!
- (inclusion diagram of the model classes: NN, CC, RCC, RecCC, CRecCC, RNN, RecNN, and FSA)
60 Contextual models
- => RCC is strictly less powerful than RNNs, due to the restricted recurrence, when considering approximation for inputs of arbitrary size/length
- => it is not clear what we get for restricted size/length, resp. for approximation in the L1-norm
- => the restricted recurrence enables us to integrate the parents into the context, i.e. to deal with a larger set of inputs (acyclic graphs instead of trees)
61 Contextual models
- supersource transductions: the whole structure is mapped to a single real value at the supersource
- IO-isomorphic transductions: an output is produced for every vertex of the structure
62 Contextual models
- ... for L1-approximation we get [Hammer/Micheli/Sperduti]:
  - RCC is approximation complete for sequences and supersource transductions (required: a squashing function which is C1 and non-vanishing at one point)
  - RecCC with multiplicative neurons is approximation complete for tree structures and supersource transductions (required: a squashing function which is C2 and non-vanishing at one point)
  - contextual cascade correlation with multiplicative neurons is approximation complete for acyclic graphs and IO-isomorphic transductions (required: the graphs possess one supersource and satisfy a mild structural condition, and a squashing function which is C2 and non-vanishing at one point)
63 Conclusions
64 Conclusions
- Recurrent networks
  - FMM and learning bias -> alternative training mechanisms
- Recurrent self-organizing maps
  - the context defines the function and the capacity
- Contextual processing
  - general forms of recurrence open the way towards structures