Title: Recursive Self-Organizing Networks
1. Recursive Self-Organizing Networks
- Barbara Hammer,
- AG LNM, Universität Osnabrück,
- and Alessio Micheli,
- Alessandro Sperduti, Marc Strickert
2. History
Symbolic systems:
if red then apple otherwise pear
if (on(apple1,pear) and free(apple2)) moveto(apple1,apple2)
Neural networks:
finite-dimensional vectors
3. History
Recurrent networks for sequences:
f_rec(x1, x2, x3, ...) = f(x1, f_rec(x2, x3, ...)) = ...
Recursive networks for trees:
f_rec(a(t1, t2)) = f(a, f_rec(t1), f_rec(t2))
4. History
- well established models
- Training: gradient based learning, ...
- Theory: representation/approximation capability, learnability, complexity, ...
- Applications: for RNNs too many to be mentioned, for RecNNs:
  - term classification (Goller, Küchler, 1996)
  - automated theorem proving (Goller, 1997)
  - learning tree automata (Küchler, 1998)
  - QSAR/QSPR problems (Schmitt, Goller, 1998; Bianucci, Micheli, Sperduti, Starita, 2000; Vullo, Frasconi, 2003)
  - logo and image recognition (Costa, Frasconi, Soda, 1999)
  - natural language parsing (Costa, Frasconi, Sturt, Lombardo, Soda, 2000)
  - document classification (Diligenti, Frasconi, Gori, 2001)
  - fingerprint classification (Yao, Marcialis, Roli, Frasconi, Pontil, 2001)
  - prediction of contact maps (Baldi, Frasconi, Pollastri, Vullo, 2002)
5. History
- unsupervised methods
- Visualize (noisy) fruits:
  representation (Øx, Øy, Øx/Øy, curvature, color, hardness, weight, ...) in R^n
6. History
Unsupervised networks for data representation, visualization, clustering, preprocessing, data mining, ...
e.g. the self-organizing map (SOM): given a lattice of neurons i = (i1, i2),
represent data via winner-takes-all classification f: R^n → I, x ↦ i where ||x − w_i||² is minimal;
Hebbian learning based on examples x_i;
topological mapping because of neighborhood cooperation:
w_j ← w_j + η · exp(−||j − j_winner||² / σ²) · (x_i − w_j)
[figure: a data point x is mapped to the winner i with ||x − w_i||² minimal]
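As a concrete illustration of the winner-takes-all mapping and the neighborhood-cooperative Hebbian update sketched above, here is a minimal NumPy sketch (the lattice size, learning rate η, and width σ are illustrative choices, not values from the slides):

```python
import numpy as np

def som_step(W, lattice, x, eta=0.1, sigma=1.0):
    """One Hebbian SOM update.
    W: (N, n) weight vectors w_i, lattice: (N, 2) lattice coordinates i = (i1, i2)."""
    # winner-takes-all: the neuron i with ||x - w_i||^2 minimal
    winner = np.argmin(np.sum((W - x) ** 2, axis=1))
    # neighborhood cooperation on the lattice: exp(-||j - j_winner||^2 / sigma^2)
    nhd = np.exp(-np.sum((lattice - lattice[winner]) ** 2, axis=1) / sigma ** 2)
    # Hebbian update: w_j <- w_j + eta * nhd * (x - w_j)
    W += eta * nhd[:, None] * (x - W)
    return winner

# usage: a 5x5 lattice representing 3-dimensional fruit features
lattice = np.array([(i, j) for i in range(5) for j in range(5)], dtype=float)
W = np.random.rand(25, 3)
for x in np.random.rand(200, 3):
    som_step(W, lattice, x)
```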
7. History
8. History
representation (Øx, Øy, Øx/Øy, curvature, color, hardness, weight, ...) in R^n
... but what should we do with fruit salad?
9. Recursive self-organizing networks
- Outline
- Various approaches
- A unifying framework
- More approaches
- Experiments
- Theory
- Conclusions
10. Various approaches
11. Various approaches
[figure: SOM with lattice neurons i = (i1, i2); a point x is mapped to the winner i with ||x − w_i||² minimal]
sequence or structure:
flatten it, or use an alternative metric?
12. Various approaches
- Temporal Kohonen Map (Chappell, Taylor)
lattice of neurons i = (i1, i2), standard Hebbian learning for w_i
sequence x1, x2, x3, x4, ...
i1: d1(i) = ||x1 − w_i||² minimal
i2: d2(i) = ||x2 − w_i||² + α·d1(i) minimal
i3: d3(i) = ||x3 − w_i||² + α·d2(i) minimal
leaky integration (also recurrent SOM, Varsta, Heikkonen, Millán)
recurrence
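A minimal sketch of this TKM-style winner selection with leaky integration, d_t(i) = ||x_t − w_i||² + α·d_{t−1}(i) (function and parameter names are assumptions for illustration):

```python
import numpy as np

def tkm_winners(W, seq, alpha=0.5):
    """Temporal Kohonen Map style winner selection over a sequence.
    W: (N, n) weight vectors, seq: iterable of inputs x_t."""
    d = np.zeros(len(W))                                 # d_0(i) = 0
    winners = []
    for x in seq:
        d = np.sum((W - x) ** 2, axis=1) + alpha * d     # leaky integration
        winners.append(int(np.argmin(d)))                # i_t: d_t(i) minimal
    return winners
```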
13. Various approaches
- RecSOM (Voegtlin): Hebbian learning for w_i and c_i
lattice of neurons i = (i1, i2) with (w_i, c_i), c_i in R^N
sequence x1, x2, x3, x4, ...
i1: d1(i) = ||x1 − w_i||² minimal
i2: d2(i) = ||x2 − w_i||² + α·||(d1(1), ..., d1(N)) − c_i||² minimal
i3: d3(i) = ||x3 − w_i||² + α·||(d2(1), ..., d2(N)) − c_i||² minimal
... Voegtlin uses exp(−d_i(j)) ...
recurrence
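A corresponding sketch of this recursion, where each neuron additionally stores a context vector c_i in R^N that is matched against the previous time step's map state; the exp(−d) representation follows the Voegtlin remark above (the flag name is an assumption):

```python
import numpy as np

def recsom_winners(W, C, seq, alpha=0.5, use_exp=True):
    """RecSOM-style winner selection: neuron i stores a weight w_i and a context
    c_i in R^N, matched against the previous time step's map activation.
    W: (N, n) weights, C: (N, N) contexts, seq: iterable of inputs x_t."""
    d_prev, winners = None, []
    for x in seq:
        d = np.sum((W - x) ** 2, axis=1)
        if d_prev is not None:
            # previous map state; Voegtlin's choice is exp(-d_prev)
            state = np.exp(-d_prev) if use_exp else d_prev
            d = d + alpha * np.sum((C - state) ** 2, axis=1)
        winners.append(int(np.argmin(d)))
        d_prev = d
    return winners
```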
14. Various approaches
- SOMSD (Hagenbuchner, Sperduti, Tsoi)
Hebbian learning for w_i, c_i1, and c_i2
lattice of neurons i = (i1, i2) with (w_i, c_i1, c_i2), where c_i1, c_i2 in R²
tree with labels x1 (children x2 and x3) and x3 (children x4 and x5)
i2: ||x2 − w_i||² minimal; i4: ||x4 − w_i||² minimal; i5: ||x5 − w_i||² minimal
i3: ||x3 − w_i||² + α·||i4 − c_i1||² + α·||i5 − c_i2||² minimal
i1: ||x1 − w_i||² + α·||i2 − c_i1||² + α·||i3 − c_i2||² minimal
recurrence
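A minimal sketch of the SOMSD-style bottom-up processing of a binary tree, with the winners' lattice coordinates serving as context; the encoding of the empty tree (R_EMPTY) is an assumed placeholder:

```python
import numpy as np

# A binary tree is (label, left_subtree, right_subtree); None is the empty tree.
R_EMPTY = np.array([-1.0, -1.0])   # assumed default code for a missing child

def somsd_winner(tree, W, C1, C2, lattice, alpha=0.5):
    """SOMSD-style bottom-up winner computation for a binary tree.
    W: (N, n) label weights, C1/C2: (N, 2) context weights, lattice: (N, 2) coordinates.
    Returns the lattice coordinates of the winner for the whole tree."""
    if tree is None:
        return R_EMPTY
    label, left, right = tree
    r1 = somsd_winner(left, W, C1, C2, lattice, alpha)    # winner coordinates of child 1
    r2 = somsd_winner(right, W, C1, C2, lattice, alpha)   # winner coordinates of child 2
    d = (np.sum((W - label) ** 2, axis=1)
         + alpha * np.sum((C1 - r1) ** 2, axis=1)
         + alpha * np.sum((C2 - r2) ** 2, axis=1))
    return lattice[np.argmin(d)]

# usage: the tree x1(x2, x3(x4, x5)) from the slide, with one-dimensional labels
x = [np.array([float(v)]) for v in range(6)]
tree = (x[1], (x[2], None, None), (x[3], (x[4], None, None), (x[5], None, None)))
# coords = somsd_winner(tree, W, C1, C2, lattice)
```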
15. Various approaches
[figure: the winners are referred to by their lattice coordinates, e.g. (1,1), (1,3), (2,3), (3,3), (3,1)]
16. A unifying framework
17. A unifying framework
[figure: a tree a(t, t') is compared with a neuron (w, r, r') via the distance
||w − a||² + ||rep(t) − r||² + ||rep(t') − r'||²,
where rep(t) and rep(t') are the representations of the subtrees]
18. A unifying framework
- ingredients
  - binary trees with labels in (W, dW)
  - formal representation of trees (R, dR) → context!
  - neurons n with labeling (L0(n), L1(n), L2(n)) in W x R x R
  - a function rep: R^N → R
- recursive distance of a(t1, t2) from n
  - drec(a(t1,t2), n) = dW(a, L0(n)) + α·dR(R1, L1(n)) + α·dR(R2, L2(n)), where
  - Ri = r_∅ if ti = ∅, and
  - Ri = rep(drec(ti, n1), ..., drec(ti, nN)) otherwise
- training
  - L0(n) ← L0(n) + η·nhd(n, n_winner)·(a − L0(n))
  - L1(n) ← L1(n) + η·nhd(n, n_winner)·(R1 − L1(n))
  - L2(n) ← L2(n) + η·nhd(n, n_winner)·(R2 − L2(n))
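The ingredients above can be turned into a small generic sketch: the recursive distance with a pluggable rep function, so that SOMSD, RecSOM, and TKM differ only in the choice of rep (all names and the choice dW = dR = squared Euclidean distance are illustrative assumptions):

```python
import numpy as np

def drec(tree, L0, L1, L2, rep, r_empty, alpha=0.5):
    """Recursive distance of a binary tree (label, left, right) from all neurons.
    L0: (N, n) label weights, L1/L2: (N, dim_R) context weights; rep maps the
    distance vector (drec(t, n_1), ..., drec(t, n_N)) to a context in R.
    dW and dR are taken to be the squared Euclidean distance."""
    a, t1, t2 = tree
    R1 = r_empty if t1 is None else rep(drec(t1, L0, L1, L2, rep, r_empty, alpha))
    R2 = r_empty if t2 is None else rep(drec(t2, L0, L1, L2, rep, r_empty, alpha))
    return (np.sum((L0 - a) ** 2, axis=1)
            + alpha * np.sum((L1 - R1) ** 2, axis=1)
            + alpha * np.sum((L2 - R2) ** 2, axis=1))

# the approaches differ only in rep and in the context space, e.g.:
# SOMSD:  rep = lambda d: lattice[np.argmin(d)]   (index of the winner)
# RecSOM: rep = lambda d: np.exp(-d)              (whole map's activation)
# TKM:    rep = lambda d: d                       (identity, projected per neuron)
```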
19. A unifying framework
SOMSD: context = index of the winner
neurons (w, r, r') in W x R^d x R^d
distance: ||w − a||² + ||rep(t) − r||² + ||rep(t') − r'||²
rep(t) = index of the winner of t, rep(t') = index of the winner of t'
20. A unifying framework
RecSOM: context = the whole map's activation
neurons (w, r) in W x R^N
distance: ||w − a||² + ||rep(t) − r||²
rep(t) = (exp(−drec(t, n1)), ..., exp(−drec(t, nN)))
21. A unifying framework
TKM: context = the neuron's own activation
neurons (w, (0, ..., 1, ..., 0))
distance: ||w − a||² + α·drec(t, ni), where drec(t, ni) = (0, ..., 1, ..., 0)^T · (drec(t, n1), ..., drec(t, nN))
rep = id: rep(t) = (drec(t, n1), ..., drec(t, nN))
22. A unifying framework
recursive NN: context = activation of all neurons
neurons (w, w1, w2)
activation: w·a + w1·rep(t) + w2·rep(t')
rep(t) = sgd(activation of t), rep(t') = sgd(activation of t')
23. More approaches
24. More approaches
context model (winner, content, structure):
- TKM: no explicit context, hence restricted
- RecSOM: very high dimensional context
- SOMSD: compressed information (winner index)
- MSOM: winner content; storing the winner's (w, r) directly would make the dimensionality too high, hence merge: (1−γ)·w + γ·r
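A hedged sketch of the MSOM-style winner selection, where the context for the next step is the merge (1−γ)·w + γ·c of the previous winner's weight and context (the zero initial context and parameter names are assumptions):

```python
import numpy as np

def msom_winners(W, C, seq, alpha=0.5, gamma=0.5):
    """Merge-SOM style winner selection: the context is the merged content of the
    previous winner. W, C: (N, n) weight and context vectors, seq: inputs x_t."""
    context = np.zeros(W.shape[1])        # assumed initial context
    winners = []
    for x in seq:
        d = np.sum((W - x) ** 2, axis=1) + alpha * np.sum((C - context) ** 2, axis=1)
        i = int(np.argmin(d))
        winners.append(i)
        context = (1.0 - gamma) * W[i] + gamma * C[i]   # merge for the next step
    return winners
```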
25. More approaches
lattice:
- VQ: adapt only the winner
- NG: adapt every neuron according to its rank, i.e. nhd(n, n_w) = h(rk(n, x)), no prior lattice!
- HSOM: hyperbolic lattice structure
→ MNG, HSOMS
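A minimal sketch of the rank-based neural gas update nhd(n, n_w) = h(rk(n, x)) mentioned above, with h(k) = exp(−k/λ) as an assumed neighborhood function:

```python
import numpy as np

def ng_step(W, x, eta=0.1, lam=2.0):
    """One neural gas update: every neuron is adapted according to its
    distance rank rk(n, x) instead of a prior lattice neighborhood."""
    dists = np.sum((W - x) ** 2, axis=1)
    ranks = np.argsort(np.argsort(dists))      # rk(n, x): 0 for the winner
    h = np.exp(-ranks / lam)                   # nhd(n, n_w) = h(rk(n, x))
    W += eta * h[:, None] * (x - W)
```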
26. Experiments
27. Experiments
Mackey-Glass time series
temporal quantization error: for the winner at time t, take the mean value at each lag t−j over all patterns mapped to this winner and compare it to the actual pattern
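One plausible reading of this error measure as code: for every lag j, compare each past value x_{t−j} with the mean of that lag over all time steps won by the same neuron (details such as the root-mean-square averaging are assumptions):

```python
import numpy as np

def temporal_quantization_error(seq, winners, max_lag=30):
    """For each lag j: deviation of the past value x_{t-j} from the mean past
    value of all time steps mapped to the same winner neuron.
    seq: 1D array of inputs, winners: winner index per time step."""
    seq, winners = np.asarray(seq, dtype=float), np.asarray(winners)
    errors = np.zeros(max_lag + 1)
    for j in range(max_lag + 1):
        t = np.arange(j, len(seq))          # time steps that have a value at lag j
        past, win = seq[t - j], winners[t]
        sq_err = 0.0
        for n in np.unique(win):
            vals = past[win == n]
            sq_err += np.sum((vals - vals.mean()) ** 2)
        errors[j] = np.sqrt(sq_err / len(t))
    return errors
```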
28. Experiments
[figure: temporal quantization error of SOMSD (y-axis 0 to 0.2) vs. index of past inputs 0 to 30 (index 0 = present)]
29. Experiments
[figures: temporal quantization error for HSOMS and MNG]
30. Experiments
reconstruct 3-gram probabilities from the neurons of a HSOMS by counting
31. Experiments
[figure: SOMSD and HSOMS maps; neurons labeled with the symbols B, E, P, S, T, V, X]
32. Theory
33Theory
- Cost function of training?
- (approximate) SOM, VQ, NG training is a
stochastic gradient descent on (some f) -
?if(xi-w12,...,xi-wN2) -
- recursive (approximate) SOM, VQ, NG is in general
no stochastic gradient descent, but it can be
interpreted as a truncated stochastic gradient
descent on -
?if(drec(ti,n1),..., drec(ti,nN)) - (the same f) if dW,dR is the squared
Euclidean distance
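For concreteness, standard instantiations of such a cost function f (well-known approximate cost functions of VQ, NG, and SOM; they are not spelled out on the slide):

```latex
E_{\mathrm{VQ}}  = \sum_i \min_j \|x_i - w_j\|^2, \qquad
E_{\mathrm{NG}}  = \sum_i \sum_j h_\lambda\bigl(\mathrm{rk}_j(x_i)\bigr)\,\|x_i - w_j\|^2, \qquad
E_{\mathrm{SOM}} \approx \sum_i \sum_j \mathrm{nhd}\bigl(n_j, n_{I(x_i)}\bigr)\,\|x_i - w_j\|^2
```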
34Theory
(w,r,r)
w a2
R(t) - r2
a
R(t) r2
35Theory
- Representation of data?
- SOMSD, RecSOM given a finite set, given enough
neurons, every sequence/tree can be represented
by a neuron, - context location in the lattice
- TKM, MSOM
- trees cannot be represented (commutativity),
- the range of representation is restricted by the
weight space, - optimum codes for a sequence (a0,a1,a2,...) are
- TKM w ?tatat / ?tat
and - MSOM w a0, c ?t?t-1at / ?t?t-1
- context fractal encoding
- for MSOM this is a fixed point, uses additional
?
36Theory
induces d1(t,t) ? distance of their winner
indices d2(t,t) ? distance of their
representations
?
?
tripels (w,c1,c2)
trees
37Theory
- Explicit metric for SOMSD
- Assume granularity triples (w,c1,c2) form
e-cover of the space w.r.t. dR, dW, - topological matching for all neurons i1, i2 with
weights (w1,c11,c12), (w2,c21,c22) holds
dR(i1,i2) (adW(w1,w2)bdR(c11,c21)bdR(c12,c2
2)) lt e - then
- d1(t,t) Drec(t,t) (ee(ab)const)
(1-(2b)H1) / (1-2b), Hheight - where
- Drec(t,?) Drec(?,t) dR(winner(t),r?)
- Drec(x(t1,t2),x(t1,t2)) adW(x,x)
bDrec(t1,t1) bDrec(t2,t2)
Markovian
38. Conclusions
39. Conclusions
- recursive self-organizing models expand supervised RecNNs; they differ with respect to the choice of context (= activation for RecNNs)
- context, e.g.
  - neuron, map activation, winner index, winner content
- complexity
- lattice model
- way of representation
- models make sense, possibly (locally) Markovian
- ... but many topics of ongoing research ...
42. Various approaches
[figure: SOMSD for sequences, adaptation of the standard SOM]
43. Experiments
physiological data (heart rate, chest volume, blood oxygen concentration, preprocessed), 617 neurons
45. Examples
[figure: two-state generator over the symbols −1 and 1 with transition probabilities 0.4, 0.6, 0.7, 0.3; stationary probabilities P(−1) = 4/7, P(1) = 3/7]
generator for words with two discrete states
specialization: unambiguous temporal context
46. Experiments
[figure: the 100 most probable binary words (all nodes except for the root node) and the MNG receptive fields for 100 neurons (bullets indicate specialized neurons); symbols 1 and −1]
47. Experiments
reconstruction by counting for HSOMS
48. Experiments
probabilistic 2-gram model, symbols a, b, c, subject to noise
SOMSD with 100 neurons: probability extraction by counting symbols
49. Experiments
U-matrix on the weights; U-value = mean distance from the neighbors; valleys = symbols
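A minimal sketch of this U-matrix computation, with the U-value of a neuron taken as the mean distance of its weight vector to its lattice neighbors (the rectangular 4-neighborhood is an assumption):

```python
import numpy as np

def u_matrix(W, rows, cols):
    """U-matrix of a rectangular SOM: U-value of a neuron = mean distance of its
    weight vector from the weights of its lattice neighbors (valleys = clusters)."""
    Wg = W.reshape(rows, cols, -1)
    U = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            nbrs = [Wg[rr, cc]
                    for rr, cc in [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
                    if 0 <= rr < rows and 0 <= cc < cols]
            U[r, c] = np.mean([np.linalg.norm(Wg[r, c] - v) for v in nbrs])
    return U
```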
51. Experiments
U-matrix, context U-matrix, and mean previous contexts
52. Speaker Identification for a Japanese Vowel
Three exemplary patterns of /ae/ articulations from different speakers
53. More info on the data
- 9 different speakers.
- Training set: 30 articulations from each speaker.
- Test set: varying number of articulations.
- available from UCI Knowledge Discovery in Databases at http://kdd.ics.uci.edu/databases/JapaneseVowels/JapaneseVowels.html.
Aim: unsupervised speaker identification.
Class assignment to neurons a posteriori by using the training set: the speaker for which a neuron is most active is considered as its target class.
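A small sketch of this a posteriori labeling: each neuron receives the speaker for which it is selected most often on the training set (array shapes and the −1 marker for idle neurons are assumptions):

```python
import numpy as np

def label_neurons(winners, speakers, n_neurons, n_classes):
    """A posteriori class assignment: each neuron gets the speaker for which
    it is most often the winner on the training set.
    winners: winner neuron per training articulation, speakers: its speaker id."""
    counts = np.zeros((n_neurons, n_classes))
    for w, s in zip(winners, speakers):
        counts[w, s] += 1
    labels = counts.argmax(axis=1)
    labels[counts.sum(axis=1) == 0] = -1      # idle neurons, never selected
    return labels
```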
54. Linear discriminant analysis (LDA) for MNG
24D → 2D projection of the 2x12D weight-context tuples of MNG neurons
All 150 neurons are displayed with a posteriori labels (colors).
- Neurons separate well, already in this crude (!) 2D projection.
- Neurons specialize on speakers.
55. Speaker Identification Results
Classification errors:
- Training set: 2.96% (all 150 neurons used)
- Test set: 4.86% (1 idle neuron, never selected)
Reference error: 5.9% (supervised, rule based, Kudo et al. 1999)