Transcript and Presenter's Notes

Title: New developments for recurrent neural systems


1
New developments for recurrent neural systems
Barbara Hammer, Institute of Informatics, TU Clausthal
hammer@in.tu-clausthal.de
2
  • Recurrent neural networks
  • Definition
  • Architectural bias
  • Recurrent self-organizing maps
  • Definition
  • Capacity
  • Contextual models
  • Background
  • Approximation capability

3
Recurrent neural networks - Definition
4
Recurrent neural networks
  • Feedforward processing
  • Recurrent processing

5
Recurrent neural networks
  • Application areas
  • Hawkins, Boden, The Applicability of Recurrent
    Neural Networks for Biological Sequence Analysis,
    IEEE/ACM TCBB, 2005
  • Xu, Hu, Wunsch, Inference of genetic regulatory
    networks with recurrent neural network models,
    IEMBS 2004
  • Pollastri, Baldi, Prediction of contact maps by
    GIOHMMs and recurrent neural networks using
    lateral propagation from all four cardinal
    corners, Bioinformatics, 2002
  • Bonet et al., Predicting Human Immunodeficiency
    Virus (HIV) Drug Resistance using Recurrent
    Neural Networks. Proceedings of the 10th
    International Electronic Conference on Synthetic
    Organic Chemistry, 2006
  • Reczko et al., Finding signal peptides in human
    protein sequences using recurrent neural
    networks, WABI 2002
  • Chen, Chaudhari, Bidirectional segmented-memory
    recurrent neural network for protein secondary
    structure prediction, Soft Computing - A Fusion
    of Foundations, Methodologies and Applications,
    2006
  • Bates et al., Detection of seizure foci by
    recurrent neural networks, Engineering in
    Medicine and Biology Society, 2000. Proceedings
    of the 22nd Annual International Conference of
    the IEEE
  • Güler, Übeyli, Güler, Recurrent neural networks
    employing Lyapunov exponents for EEG signals
    classification, Expert Systems with Applications,
    2005
  • Petrosian, Prokhorov, Schiffer, Early
    recognition of Alzheimer's disease in EEG using
    recurrent neural network and wavelet transform,
    Proc. SPIE, 2000

6
Recurrent neural networks
7
Recurrent neural networks
  • Feedforward neural network
  • neurons connected in an acyclic graph
  • every neuron computes x ↦ sgd(wᵀx − b)
  • network computes function on vector spaces
  • Recurrent neural network
  • feedforward network enriched with recurrent
    connections which set a temporal context
  • recurrent connections use the output of the
    previous time step
  • network computes function on time series

8
Recurrent neural networks
z(t) = f(x(t), z(t−1)),   o(t) = g(z(t))
(diagram: input x(t), internal context z(t) with feedback of z(t−1), output o(t))
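As an illustration, one step of this recurrence could be sketched as follows (Python/NumPy; the tanh transfer function, the linear readout, and all dimensions are assumptions, and the weights are random and untrained):

    import numpy as np

    def rnn_step(x_t, z_prev, W_in, W_rec, W_out, b_z, b_o):
        # z(t) = f(x(t), z(t-1)): new context from input and previous context
        z_t = np.tanh(W_in @ x_t + W_rec @ z_prev + b_z)
        # o(t) = g(z(t)): readout from the context
        o_t = W_out @ z_t + b_o
        return z_t, o_t

    rng = np.random.default_rng(0)
    n_in, n_hid, n_out = 3, 5, 1
    W_in, W_rec = rng.normal(size=(n_hid, n_in)), rng.normal(size=(n_hid, n_hid))
    W_out = rng.normal(size=(n_out, n_hid))
    b_z, b_o = np.zeros(n_hid), np.zeros(n_out)
    z = np.zeros(n_hid)
    for x_t in rng.normal(size=(10, n_in)):            # process a toy time series
        z, o = rnn_step(x_t, z, W_in, W_rec, W_out, b_z, b_o)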
9
Recurrent neural networks
  • well established; training by minimizing the
    quadratic error → backpropagation through time,
    real time recurrent learning, Kalman filtering, …
  • long term dependencies cannot be captured due to
    vanishing gradients

The derivative of the state with respect to states
several time steps earlier is a product of one factor
per intermediate step; since these factors are
typically small, the derivative vanishes if propagated
through several time steps!
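A tiny numerical illustration of this effect (Python/NumPy; the scalar recurrence and the weight values are assumptions): the derivative of the state with respect to an early state is a product of per-step factors and shrinks quickly.

    import numpy as np

    rng = np.random.default_rng(0)
    w, u = 0.9, 1.0                      # assumed recurrent and input weights
    x = rng.normal(size=50)
    z, grad = 0.0, 1.0                   # grad accumulates dz(t)/dz(0)
    for t, x_t in enumerate(x, 1):
        z = np.tanh(w * z + u * x_t)     # z(t) = f(x(t), z(t-1))
        grad *= w * (1 - z ** 2)         # chain-rule factor of this time step
        if t % 10 == 0:
            print(f"t={t:3d}  |dz(t)/dz(0)| = {abs(grad):.3e}")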
10
Recurrent neural networks
  • Recent trend

fixed recurrent part based on universal principles
readout trained by means of a simple gradient
mechanism
11
Recurrent neural networks
  • Fractal prediction machines

input alphabet T, C, G, A; the context is
two-dimensional: each symbol is assigned to one corner
of the unit square, and every input moves the context
a fixed fraction toward the corresponding corner
  • resulting points constitute a fractal
  • Markovian property: emphasis on the most recent
    part of the sequence
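As an illustration, such a chaos-game style encoding could be sketched as follows (Python/NumPy; the corner assignment and the contraction factor alpha are assumptions):

    import numpy as np

    CORNERS = {"T": np.array([0.0, 0.0]), "C": np.array([0.0, 1.0]),
               "G": np.array([1.0, 0.0]), "A": np.array([1.0, 1.0])}

    def fractal_encode(sequence, alpha=0.5, start=(0.5, 0.5)):
        # move the two-dimensional context a fixed fraction toward the corner of
        # the current symbol; recent symbols dominate the position (Markovian bias)
        ctx, points = np.array(start), []
        for s in sequence:
            ctx = (1 - alpha) * ctx + alpha * CORNERS[s]
            points.append(ctx.copy())
        return np.array(points)

    print(fractal_encode("ACGATTGCA"))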
12
Recurrent neural networks
  • Fractal prediction machine demo
  • daily volatility change of the Dow Jones
    Industrial Average 2/1918-4/1997, predict the
    direction of volatility move for the next day

Tino,Dorffner, Predicting the future from
fractal representations of the past, Machine
Learning, 2001
13
Recurrent neural networks
  • Echo state networks

very high dimension, random connections
  • echo state property
  • in the limit, the context does not depend on the
    initialization
  • e.g. spectral radius smaller than one
  • activation initialized by long enough recurrence
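A minimal echo state network sketch (Python/NumPy; reservoir size, scaling, washout length, and the sine toy task are illustrative assumptions; only the linear readout is trained):

    import numpy as np

    rng = np.random.default_rng(0)

    def make_reservoir(n_res=200, spectral_radius=0.9, density=0.1):
        # random sparse recurrent weights, rescaled so that the spectral radius
        # is below one (a common sufficient condition for the echo state property)
        W = rng.normal(size=(n_res, n_res)) * (rng.random((n_res, n_res)) < density)
        return W * spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))

    def run_reservoir(u, W, W_in, washout=50):
        # drive the fixed reservoir and collect states after a washout phase
        x, states = np.zeros(W.shape[0]), []
        for t, u_t in enumerate(u):
            x = np.tanh(W @ x + W_in[:, 0] * u_t)
            if t >= washout:
                states.append(x.copy())
        return np.array(states)

    # toy task: one-step-ahead prediction of a sine wave
    u = np.sin(0.2 * np.arange(1000))
    W, W_in = make_reservoir(), 0.5 * rng.normal(size=(200, 1))
    X = run_reservoir(u[:-1], W, W_in)                 # states for t = 50 .. 998
    y = u[51:]                                         # targets one step ahead
    W_out = np.linalg.lstsq(X, y, rcond=None)[0]       # train only the readout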

14
Recurrent neural networks
  • Echo state networks demo

Mackey-Glass time series
Laser data
Lorenz attractor
Jaeger/Haas, Harnessing nonlinearity: predicting
chaotic systems and saving energy in wireless
communication, Science, 2004
15
Recurrent neural networks - Architectural bias
16
Recurrent neural networks
  • Approximation completeness: RNNs can approximate
    every recursive system with continuous transition
    and finite time horizon
  • connection to recursive symbolic computation
    mechanisms?

dynamics of a symbolic formalism?
17
Recurrent neural networks
  • Symbolic mechanisms
  • finite memory models: look only at a finite time
    window, f(x1, x2, …) = f(x1, …, xL) for fixed L
  • finite state automata: computation based on a
    finite internal state
  • pushdown automata: computation based on an
    internal stack
  • context sensitive languages: computation in
    linear space
  • Turing machines
  • beyond (→ computation with real numbers!)

18
Recurrent neural networks
RNNs with arbitrary weights = non-uniform Boolean
circuits (super-Turing capability) [Siegelmann/Sontag]
RNNs with rational weights = Turing machines
[Siegelmann/Sontag]
RNNs with limited noise = finite state automata
[Omlin/Giles, Maass/Orponen]
RNNs with Gaussian noise = finite memory models
[Maass/Sontag]
19
Recurrent neural networks
  • Motivation: architectural bias

easy: divide this form into two parts with the
same size and form
difficult: divide this form into four parts with
the same size and form
extremely difficult: divide this form into six
parts with the same size and form
20
Recurrent neural networks
  • RNNs are initialized with small weights: what is
    the bias?
  • It holds [Hammer/Tino]:
  • small-weight RNNs → FMMs: for every RNN with
    small weights one can find a finite memory length
    L such that the RNN can be approximated by an FMM
    with memory length L.
  • FMMs → small-weight RNNs: for every FMM, an RNN
    with randomly initialized small weights exists
    which approximates the FMM.
  • small-weight RNNs have excellent generalization
    ability (distribution-independent UCED property):
    for RNNs with small weights, the empirical error
    represents the real error independently of the
    underlying distribution

21
Recurrent neural networks
RNNs with arbitrary weights = non-uniform Boolean
circuits (super-Turing capability) [Siegelmann/Sontag]
RNNs with rational weights = Turing machines
[Siegelmann/Sontag]
RNNs with limited noise = finite state automata
[Omlin/Giles, Maass/Orponen]
RNNs with Gaussian noise = finite memory models
[Maass/Sontag]
22
Recurrent self-organizing maps - Definition
23
Recurrent self-organizing maps
  • Supervised learning
  • Unsupervised learning

24
Recurrent self-organizing maps
Self-organizing map (SOM) [Kohonen]: popular
unsupervised self-organizing neural method for
data mining and visualization
network given by prototypes wj ∈ Rn in a lattice,
j = (j1, j2)
mapping: Rn ∋ x ↦ position j in the lattice for
which ‖x − wj‖ is minimal
Hebbian learning based on examples xi and
neighborhood cooperation: choose xi, determine the
winner j0 with ‖xi − wj0‖ minimal, and adapt all
prototypes: wj := wj + η · nhd(j, j0) · (xi − wj)
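As an illustration, a minimal SOM adaptation step could look as follows (Python/NumPy; the Gaussian neighborhood, the lattice layout, and the parameters eta and sigma are assumptions, not the exact choices of the slides):

    import numpy as np

    def som_step(x, W, grid, eta=0.1, sigma=1.0):
        # winner: prototype with minimal distance to x
        j0 = int(np.argmin(np.linalg.norm(W - x, axis=1)))
        # neighborhood cooperation measured on the lattice, not in data space
        lattice_dist = np.linalg.norm(grid - grid[j0], axis=1)
        nhd = np.exp(-lattice_dist ** 2 / (2 * sigma ** 2))
        # Hebbian update: pull every prototype toward x, weighted by the neighborhood
        W += eta * nhd[:, None] * (x - W)
        return j0

    # 5x5 lattice of prototypes for 2-D data
    rng = np.random.default_rng(0)
    grid = np.array([(i, j) for i in range(5) for j in range(5)], dtype=float)
    W = rng.random((25, 2))
    for x in rng.random((1000, 2)):
        som_step(x, W, grid)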
25
Recurrent self-organizing maps
  • Neural gas (NG) [Martinetz]: no prior lattice,
    adaptation according to the rank
  • wj := wj + η · rk(wj, xi) · (xi − wj)
  • HSOM [Ritter]: hyperbolic lattice structure
  • wj := wj + η · nhdH(j, j0) · (xi − wj)
  • but for real vectors of fixed size only!
  • Time series and recurrence?

26
Recurrent self-organizing maps
  • Temporal Kohonen map (TKM) [Chappell/Taylor]
  • Recurrent SOM (RSOM) [Varsta/Heikkonen]

x1, x2, x3, x4, …, xt, …
TKM: d(xt, wi) = ‖xt − wi‖² + α · d(xt−1, wi),
training wi → xt
RSOM: d(xt, wi) = ‖yt‖² where yt = (xt − wi) + α · yt−1,
training wi → yt
27
Recurrent self-organizing maps
  • TKM/RSOM compute a leaky average of time series
  • It is not clear how they can differentiate
    various contexts
  • no explicit context!

(figure: two different input sequences whose leaky averages coincide)
28
Recurrent self-organizing maps
  • Merge SOM (MSOM) [Hammer/Strickert, 2003]: explicit
    notion of context

(wj, cj) ∈ Rn × Rn
wj represents the current entry xt; cj represents
the context, i.e. the content of the winner of the
last step
d(xt, wj) = α · ‖xt − wj‖² + (1 − α) · ‖Ct − cj‖²
where Ct = β · wI(t−1) + (1 − β) · cI(t−1) is the
merge of weight and context of I(t−1), the winner
in step t−1
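A minimal sketch of this recursive comparison (Python/NumPy; illustrative only: the helper names, the toy 1-D prototypes, and the fixed parameters alpha and beta are assumptions matching the reconstruction above, and no training is performed):

    import numpy as np

    def msom_winner(x_t, C_t, W, C, alpha=0.5):
        # recursive distance: alpha*||x_t - w_j||^2 + (1-alpha)*||C_t - c_j||^2
        d = alpha * np.sum((W - x_t) ** 2, axis=1) \
            + (1 - alpha) * np.sum((C - C_t) ** 2, axis=1)
        return int(np.argmin(d))

    def merge_context(winner, W, C, beta=0.5):
        # new context descriptor: merge of the previous winner's weight and context
        return beta * W[winner] + (1 - beta) * C[winner]

    # process a toy 1-D sequence with random, untrained prototypes
    rng = np.random.default_rng(0)
    W, C = rng.uniform(0, 100, (20, 1)), rng.uniform(0, 100, (20, 1))
    C_t = np.zeros(1)
    for x in [42.0, 33.0, 33.0, 34.0]:
        j = msom_winner(np.array([x]), C_t, W, C)
        C_t = merge_context(j, W, C)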
29
Recurrent self-organizing maps
  • Example: 42 → 33 → 33 → 34

C1 = (42 + 50)/2 = 46
C2 = (33 + 45)/2 = 39
C3 = (33 + 38)/2 = 35.5
30
Recurrent self-organizing maps
  • Training
  • MSOM: wj := wj + η · nhd(j, j0) · (xt − wj),
    cj := cj + η · nhd(j, j0) · (Ct − cj)
  • MNG: wj := wj + η · rk(wj, xt) · (xt − wj),
    cj := cj + η · rk(wj, xt) · (Ct − cj)
  • MHSOM: wj := wj + η · nhdH(j, j0) · (xt − wj),
    cj := cj + η · nhdH(j, j0) · (Ct − cj)

31
Recurrent self-organizing maps
  • Experiment
  • speaker identification, Japanese vowel ae
  • 9 speakers, 30 articulations per speaker in
    training set
  • separate test set
  • http://kdd.ics.uci.edu/databases/JapaneseVowels/JapaneseVowels.html

(data: sequences of 12-dim. cepstrum vectors over time)
32
Merge SOM
  • MNG with posterior labeling
  • β = 0.5, α = 0.99 → 0.63, η = 0.3
  • 150 neurons:
  • 0% training error
  • 2.7% test error
  • 1000 neurons:
  • 0% training error
  • 1.6% test error
  • rule based: 5.9%, HMM: 3.8% [Kudo et al.]

33
Merge SOM
  • Experiment
  • Reber grammar
  • 3·10^6 input vectors for training
  • 10^6 vectors for testing
  • MNG, 617 neurons, β = 0.5, α = 1 → 0.57
  • evaluation by the test data:
  • attach the longest unique sequence to each winner
  • 428 distinct words
  • average length 8.902
  • reconstruction from the map:
  • backtracking of the best matching predecessor
  • triplets: only valid Reber words
  • unlimited: average length 13.78, e.g.
  • TVPXTTVVEBTSXXTVPSEBPVPXTVVEBPVVEB

BTXXVPXVPXVPSE
BTXXVPXVPSE
(W,C)
34
Merge SOM
  • Experiment
  • classification of donor sites for C. elegans
  • 5 settings with 10000 training data, 10000 test
    data, 50 nucleotides TCGA embedded in 3 dim, 38%
    donor sites [Sonnenburg, Rätsch et al.]
  • MNG with posterior labeling
  • 512 neurons, parameters 0.25 / 0.075,
    α = 0.999 → 0.4, 0.7
  • 14.06% ± 0.66% training error, 14.26% ± 0.39%
    test error
  • sparse representation: 512 × 6 dim

35
Recurrent self-organizing maps - Capacity
36
Recurrent self-organizing maps
  • Theorem (context representation)
  • Assume:
  • a SOM with merge context is given (no
    neighborhood)
  • a sequence x0, x1, x2, x3, … is given
  • enough neurons are available
  • Then:
  • the optimum weight/context pair for xt is
    w = xt, c = Σ_{i=0..t−1} β(1−β)^(t−i−1) · xi
  • Hebbian training converges to this setting as a
    stable fixed point
  • Compare to TKM:
  • optimum weights are
    w = Σ_{i=0..t} (1−α)^i x_{t−i} / Σ_{i=0..t} (1−α)^i
  • but no fixed point for TKM
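A quick numerical check of the closed form (Python/NumPy; scalar toy sequence, merge parameter beta = 0.5 as in the earlier example): iterating the merge recursion under the optimal assignments reproduces the sum above.

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.normal(size=20)                       # x_0, ..., x_19
    beta = 0.5

    # iterate C_t = beta * x_{t-1} + (1 - beta) * C_{t-1}, starting from C = 0
    C = 0.0
    for t in range(1, len(x)):
        C = beta * x[t - 1] + (1 - beta) * C

    # closed form of the theorem for t = len(x) - 1
    t = len(x) - 1
    closed = sum(beta * (1 - beta) ** (t - i - 1) * x[i] for i in range(t))
    print(np.isclose(C, closed))                  # True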

37
Recurrent self-organizing maps
  • Theorem (capacity)
  • MSOM can simulate finite automata
  • TKM cannot
  • ⇒ MSOM is strictly more powerful than TKM/RSOM!

(diagram: simulation of an automaton transition
δ(state, input) → state, with inputs encoded as
unit vectors, e.g. (1, 0, 0, 0))
38
Recurrent self-organizing maps
General recursive maps
Each neuron is a pair (w, c); for a sequence
xt, xt−1, xt−2, …, x0 it computes the distance
‖xt − w‖² + ‖Ct − c‖²,
where the context Ct encodes the already processed
part xt−1, xt−2, …, x0.
The methods differ in the choice of context!
Hebbian learning: w → xt, c → Ct
39
Recurrent self-organizing maps
Again, a neuron (w, c) processes xt, xt−1, xt−2, …, x0
via ‖xt − w‖² + ‖Ct − c‖², with Ct computed from
xt−1, xt−2, …, x0.
MSOM: Ct = merged content of the winner in the
previous time step
TKM/RSOM: Ct = activation of the current neuron
(implicit c)
40
Recurrent self-organizing maps
  • MSOM
  • Ct = merged content of the winner in the
    previous time step
  • TKM/RSOM
  • Ct = activation of the current neuron
    (implicit c)
  • Recursive SOM (RecSOM) [Voegtlin]
  • Ct = exponential transformation of the
    activation of all neurons:
    (exp(−d(xt−1, w1)), …, exp(−d(xt−1, wN)))
  • Feedback SOM (FSOM) [Horio/Yamakawa]
  • Ct = leaky integrated activation of all
    neurons:
    (d(xt−1, w1), …, d(xt−1, wN)) + λ · Ct−1
  • SOM for structured data (SOMSD)
    [Hagenbuchner/Sperduti/Tsoi]
  • Ct = index of the winner in the previous
    step
  • Supervised recurrent networks
  • Ct = sgd(activation), metric as dot product
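The common scheme behind these models can be sketched as follows (Python/NumPy, schematic only: the helper names are hypothetical, the balance parameter alpha is assumed, and each context function must produce vectors of the dimension stored in the c-part of the neurons):

    import numpy as np

    def recursive_map_distances(x_t, C_t, W, C, alpha=0.5):
        # every neuron (w_j, c_j) compares the entry x_t with w_j
        # and the context descriptor C_t with c_j
        return (alpha * np.sum((W - x_t) ** 2, axis=1)
                + (1 - alpha) * np.sum((C - C_t) ** 2, axis=1))

    # the models differ only in how C_t is computed from the previous step:
    def context_msom(W, C, prev_winner, beta=0.5):
        # MSOM: merged content of the previous winner
        return beta * W[prev_winner] + (1 - beta) * C[prev_winner]

    def context_recsom(prev_distances):
        # RecSOM: exponential transform of the previous activation of all neurons
        return np.exp(-prev_distances)

    def context_somsd(grid, prev_winner):
        # SOMSD: lattice index (position) of the previous winner
        return grid[prev_winner]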

41
Recurrent self-organizing maps
for normalized or WTA semilinear context
42
Recurrent self-organizing maps
  • Experiment
  • Mackey-Glass time series
  • 100 neurons
  • different lattices
  • different contexts
  • evaluation by the temporal quantization error:

average over time of (mean activity k steps into
the past − observed activity k steps into the past)²
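A sketch of this measure for a scalar signal (Python/NumPy; the helper is hypothetical, and the exact averaging and normalization used for the plots on the next slide may differ):

    import numpy as np

    def temporal_quantization_error(signal, winners, n_neurons, max_lag):
        # receptive field of neuron j at lag k: mean signal value k steps before
        # the time points at which j was the winner
        signal = np.asarray(signal, dtype=float)
        T = len(signal)
        rf = np.zeros((n_neurons, max_lag + 1))
        for j in range(n_neurons):
            ts = [t for t in range(max_lag, T) if winners[t] == j]
            for k in range(max_lag + 1):
                rf[j, k] = np.mean([signal[t - k] for t in ts]) if ts else 0.0
        # error at lag k: mean squared deviation between the observed signal
        # k steps into the past and the winner's receptive field at lag k
        return np.array([
            np.mean([(signal[t - k] - rf[winners[t], k]) ** 2
                     for t in range(max_lag, T)])
            for k in range(max_lag + 1)])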
43
Recurrent self-organizing maps
(plot: temporal quantization error as a function of
the number of steps into the past, from now into the
past, for SOM, RSOM, NG, RecSOM, SOMSD, HSOMSD, and
MNG)
44
Contextual models - Background
45
Contextual models
46
Contextual models
  • time series → sensor signals, spoken language, …
  • sequences → text, DNA, …
  • tree structures → terms, formulas, logic, …
  • graph structures → chemical molecules, graphics,
    networks, …
  • neural networks for structures:
  • kernel methods [Haussler, Watkins et al.]
  • recursive networks [Küchler et al.]

47
Contextual models
Recursive network
  • training: given patterns (xi, f(xi))
  • selection of the architecture
  • optimization of the weights
  • evaluation of the test error

inputs: directed acyclic graphs over Rn with one
supersource and fan-out 2
where frec: trees over Rn → Rc with
frec(empty tree) = 0,
frec(a(l, r)) = f(a, frec(l), frec(r)),
and the network computes g ∘ frec: trees over Rn → Ro
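A minimal sketch of such a recursive encoding for binary trees (Python/NumPy; the dimensions, the tanh transfer function, and the random, untrained weights are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(0)
    N_LABEL, N_CONTEXT = 3, 8          # label dimension n, context dimension c
    Wf = rng.normal(scale=0.3, size=(N_CONTEXT, N_LABEL + 2 * N_CONTEXT))
    Wg = rng.normal(scale=0.3, size=(1, N_CONTEXT))   # readout at the supersource

    def f_rec(tree):
        # f_rec(empty) = 0, f_rec(a(l, r)) = f(a, f_rec(l), f_rec(r))
        if tree is None:
            return np.zeros(N_CONTEXT)
        label, left, right = tree
        return np.tanh(Wf @ np.concatenate([label, f_rec(left), f_rec(right)]))

    def network(tree):
        # supersource transduction: readout g applied to the code of the root
        return Wg @ f_rec(tree)

    leaf = lambda: (rng.normal(size=N_LABEL), None, None)
    print(network((rng.normal(size=N_LABEL), leaf(), leaf())))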
48
Contextual models
  • Cascade Correlation [Fahlman/Lebiere]
  • given data (x, y) in Rn × R, find f such that
    f(x) ≈ y

minimize the error on the given data
maximize the correlation of the unit's output and
the current error → the unit can serve for error
correction in subsequent steps
hi(x) = fi(x, h1(x), …, hi−1(x))
etc.
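A compact sketch of the cascade idea on a toy regression task (Python/NumPy; training candidates by gradient ascent on the covariance with the residual, the network sizes, and the learning rates are simplifying assumptions, not Fahlman and Lebiere's exact procedure):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(200, 2))
    y = np.sin(3 * X[:, 0]) * X[:, 1]                  # toy regression target

    def train_candidate(H, err, steps=500, lr=0.1):
        # maximize the correlation (here: covariance) between the candidate's
        # output and the current residual error; inputs H are frozen features
        w = rng.normal(scale=0.1, size=H.shape[1])
        for _ in range(steps):
            h = np.tanh(H @ w)
            cov = np.mean((h - h.mean()) * (err - err.mean()))
            grad = ((err - err.mean()) * (1 - h ** 2)) @ H / len(err)
            w += lr * np.sign(cov) * grad
        return w

    H = np.hstack([X, np.ones((len(X), 1))])           # inputs plus bias
    out_w = np.linalg.lstsq(H, y, rcond=None)[0]       # linear output layer
    for _ in range(5):                                  # cascade five hidden units
        err = y - H @ out_w
        w = train_candidate(H, err)
        H = np.hstack([H, np.tanh(H @ w)[:, None]])     # new unit, then frozen
        out_w = np.linalg.lstsq(H, y, rcond=None)[0]    # retrain the readout
    print("MSE:", np.mean((y - H @ out_w) ** 2))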
49
Contextual models
  • few, cascaded, separately optimized neurons
  • ⇒ efficient training
  • ⇒ excellent generalization
  • … as shown e.g. for the two-spirals benchmark

50
Contextual models
For trees: recursive processing of the structure
starting at the leaves towards the root (acyclic!)
q⁻¹(hi(v)) = (hi(ch1(v)), …, hi(chk(v))) gives
the context
h1(v) = f1(l(v), h1(ch1(v)), …, h1(chk(v)))
… not possible since weights are frozen after
adding a neuron!
h2(v) = f2(l(v), h2(ch1(v)), …, h2(chk(v)),
h1(v), h1(ch1(v)), …, h1(chk(v)))
etc.: no problem!
51
Contextual models
  • Recursive cascade correlation
  • init
  • repeat
  • add hi
  • train fi(l(v), q⁻¹(hi(v)),
    h1(v), q⁻¹(h1(v)),
    …, hi−1(v), q⁻¹(hi−1(v)))
    on the correlation
  • train the output on the error

52
Contextual models
Restricted recurrence allows us to look at parents:
q⁻¹(hi(v)) = (hi(ch1(v)), …, hi(chk(v)))
q⁺¹(hi(v)) = (hi(pa1(v)), …, hi(pak(v)))
using q⁺¹ of the current unit would yield cycles;
using q⁺¹ of the already frozen units is possible
due to the restricted recurrence!
Contextual cascade correlation:
hi(v) = fi(l(v), q⁻¹(hi(v)),
h1(v), q⁻¹(h1(v)), q⁺¹(h1(v)),
…, hi−1(v), q⁻¹(hi−1(v)), q⁺¹(hi−1(v)))
53
Contextual models
  • q⁺¹ extends the context of hi

(figure: the context visible to unit hi grows with
i = 1, 2, 3)
54
Contextual models
  • Experiment: QSPR problem [Micheli, Sperduti, Sona]:
    predict the boiling point of alkanes (in °C).
    Alkanes CnH2n+2: methane, ethane, propane, butane,
    pentane, …

hexane
2-methyl-pentane
the boiling point grows with n and decreases with
the number of side chains → excellent benchmark
55
Contextual models
  • Structure for alkanes

CH3(CH2(CH2(CH(CH2(CH3),CH(CH3,CH2(CH3))))))
56
Contextual models
  • Alkanes: 150 examples, bipolar encoding of
    symbols
  • 10-fold cross-validation
  • number of neurons: 137 (CRecCC), 110/140 (RecCC)
  • compared to an FNN with direct encoding for
    restricted length [Cherqaoui/Villemin]
  • codomain [−164, 174]

57
Contextual models
  • Context PCA of hi(v)

58
Contextual models - Approximation capability
59
Contextual models
  • Major problem [Giles]: RCC cannot represent all
    finite automata!

(diagram: inclusion relations between the model
classes NN, CC, RCC, RecCC, CRecCC, RNN, RecNN,
and FSA)
60
Contextual models
  • ⇒ RCC is strictly less powerful than RNN due to
    the restricted recurrence if considering
    approximation for inputs of arbitrary size/length
  • ⇒ It is not clear what we get for restricted
    size/length resp. approximation in the L1-norm
  • ⇒ The restricted recurrence enables us to
    integrate the parents into the context, i.e. to
    deal with a larger set of inputs (acyclic graphs
    instead of trees)

61
Contextual models
  • Supersource transductions: the whole structure is
    mapped to a single value (e.g. a real number)
  • IO-isomorphic transductions: a value is computed
    for every vertex of the structure
62
Contextual models
  • … for L1-approximation, we get
    [Hammer/Micheli/Sperduti]:
  • ⇒ RCC is approximation complete for sequences and
    supersource transduction (required: a squashing
    function which is C¹ and non-vanishing at one
    point)
  • ⇒ RecCC with multiplicative neurons is
    approximation complete for tree structures and
    supersource transduction (required: a squashing
    function which is C² and non-vanishing at one
    point)
  • ⇒ Contextual Cascade Correlation with
    multiplicative neurons is approximation complete
    for acyclic graphs and IO-isomorphic transduction
    (required: the graphs possess one supersource and
    a mild structural condition; a squashing function
    which is C² and non-vanishing at one point)

63
Conclusions.
64
Conclusions
  • Recurrent networks
  • FMM and learning bias → alternative training
    mechanisms
  • Recurrent self-organizing maps
  • context defines function and capacity
  • Contextual processing
  • general forms of recurrence open the way towards
    structures
