Title: Recursive Self-Organizing Networks
1. Recursive Self-Organizing Networks
- Barbara Hammer,
- AG LNM, Universität Osnabrück,
- and Alessio Micheli,
- Alessandro Sperduti, Marc Strickert
2. History
Symbolic systems:
if red then apple otherwise pear
if (on(apple1,pear) and free(apple2)) moveto(apple1,apple2)
Neural networks:
finite-dimensional vectors
3. History
Recurrent networks for sequences:
f_rec(x1, x2, x3, ...) = f(x1, f_rec(x2, x3, ...)) = ...
Recursive networks for trees:
f_rec(a(t1, t2)) = f(a, f_rec(t1), f_rec(t2))
4. History
- well established models
- Training: gradient based learning, ...
- Theory: representation/approximation capability, learnability, complexity, ...
- Applications: for RNNs too many to be mentioned, for RecNNs:
  - term classification (Goller, Küchler, 1996)
  - automated theorem proving (Goller, 1997)
  - learning tree automata (Küchler, 1998)
  - QSAR/QSPR problems (Schmitt, Goller, 1998; Bianucci, Micheli, Sperduti, Starita, 2000; Vullo, Frasconi, 2003)
  - logo and image recognition (Costa, Frasconi, Soda, 1999)
  - natural language parsing (Costa, Frasconi, Sturt, Lombardo, Soda, 2000)
  - document classification (Diligenti, Frasconi, Gori, 2001)
  - fingerprint classification (Yao, Marcialis, Roli, Frasconi, Pontil, 2001)
  - prediction of contact maps (Baldi, Frasconi, Pollastri, Vullo, 2002)
5. History
- unsupervised methods
- Visualize (noisy) fruits:
  representation (Øx, Øy, Øx/Øy, curvature, color, hardness, weight, ...) in R^n
6. History
Unsupervised networks for data representation, visualization, clustering, preprocessing, data mining, ...
e.g. the self-organizing map (SOM): given a lattice of neurons i = (i1, i2),
represent data via winner-takes-all classification f: R^n → I, x ↦ i where ||x − w_i||² is minimal;
Hebbian learning based on examples x_i;
topological mapping because of neighborhood cooperation:
w_j ← w_j + η · exp(−||j − j_winner||² / σ²) · (x_i − w_j)
[figure: a data point x is mapped to the winner i with ||x − w_i||² minimal]
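As a concrete illustration of the winner-takes-all mapping and the neighborhood-cooperative Hebbian update sketched above, here is a minimal NumPy sketch (the lattice size, learning rate η, and width σ are illustrative choices, not values from the slides):

```python
import numpy as np

def som_step(W, lattice, x, eta=0.1, sigma=1.0):
    """One Hebbian SOM update.
    W: (N, n) weight vectors w_i, lattice: (N, 2) lattice coordinates i = (i1, i2)."""
    # winner-takes-all: the neuron i with ||x - w_i||^2 minimal
    winner = np.argmin(np.sum((W - x) ** 2, axis=1))
    # neighborhood cooperation on the lattice: exp(-||j - j_winner||^2 / sigma^2)
    nhd = np.exp(-np.sum((lattice - lattice[winner]) ** 2, axis=1) / sigma ** 2)
    # Hebbian update: w_j <- w_j + eta * nhd * (x - w_j)
    W += eta * nhd[:, None] * (x - W)
    return winner

# usage: a 5x5 lattice representing 3-dimensional fruit features
lattice = np.array([(i, j) for i in range(5) for j in range(5)], dtype=float)
W = np.random.rand(25, 3)
for x in np.random.rand(200, 3):
    som_step(W, lattice, x)
```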
7. History
8. History
representation (Øx, Øy, Øx/Øy, curvature, color, hardness, weight, ...) in R^n
... but what should we do with fruit salad?
9. Recursive self-organizing networks
- Outline
- Various approaches
- A unifying framework
- More approaches
- Experiments
- Theory
- Conclusions
10. Various approaches
11. Various approaches
[figure: SOM with lattice neurons i = (i1, i2); a point x is mapped to the winner i with ||x − w_i||² minimal]
sequence or structure:
flatten it, or use an alternative metric?
12. Various approaches
- Temporal Kohonen Map (Chappell, Taylor)
lattice of neurons i = (i1, i2), standard Hebbian learning for w_i
sequence x1, x2, x3, x4, ...
i1: d1(i) = ||x1 − w_i||² minimal
i2: d2(i) = ||x2 − w_i||² + α·d1(i) minimal
i3: d3(i) = ||x3 − w_i||² + α·d2(i) minimal
leaky integration (also recurrent SOM, Varsta, Heikkonen, Millán)
recurrence
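A minimal sketch of this TKM-style winner selection with leaky integration, d_t(i) = ||x_t − w_i||² + α·d_{t−1}(i) (function and parameter names are assumptions for illustration):

```python
import numpy as np

def tkm_winners(W, seq, alpha=0.5):
    """Temporal Kohonen Map style winner selection over a sequence.
    W: (N, n) weight vectors, seq: iterable of inputs x_t."""
    d = np.zeros(len(W))                                 # d_0(i) = 0
    winners = []
    for x in seq:
        d = np.sum((W - x) ** 2, axis=1) + alpha * d     # leaky integration
        winners.append(int(np.argmin(d)))                # i_t: d_t(i) minimal
    return winners
```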
13. Various approaches
- RecSOM (Voegtlin): Hebbian learning for w_i and c_i
lattice of neurons i = (i1, i2) with (w_i, c_i), c_i in R^N
sequence x1, x2, x3, x4, ...
i1: d1(i) = ||x1 − w_i||² minimal
i2: d2(i) = ||x2 − w_i||² + α·||(d1(1), ..., d1(N)) − c_i||² minimal
i3: d3(i) = ||x3 − w_i||² + α·||(d2(1), ..., d2(N)) − c_i||² minimal
... Voegtlin uses exp(−d_i(j)) ...
recurrence
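A corresponding sketch of this recursion, where each neuron additionally stores a context vector c_i in R^N that is matched against the previous time step's map state; the exp(−d) representation follows the Voegtlin remark above (the flag name is an assumption):

```python
import numpy as np

def recsom_winners(W, C, seq, alpha=0.5, use_exp=True):
    """RecSOM-style winner selection: neuron i stores a weight w_i and a context
    c_i in R^N, matched against the previous time step's map activation.
    W: (N, n) weights, C: (N, N) contexts, seq: iterable of inputs x_t."""
    d_prev, winners = None, []
    for x in seq:
        d = np.sum((W - x) ** 2, axis=1)
        if d_prev is not None:
            # previous map state; Voegtlin's choice is exp(-d_prev)
            state = np.exp(-d_prev) if use_exp else d_prev
            d = d + alpha * np.sum((C - state) ** 2, axis=1)
        winners.append(int(np.argmin(d)))
        d_prev = d
    return winners
```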
14. Various approaches
- SOMSD (Hagenbuchner, Sperduti, Tsoi)
Hebbian learning for w_i, c_i1, and c_i2
lattice of neurons i = (i1, i2) with (w_i, c_i1, c_i2), where c_i1, c_i2 in R²
tree with labels x1 (children x2 and x3) and x3 (children x4 and x5)
i2: ||x2 − w_i||² minimal; i4: ||x4 − w_i||² minimal; i5: ||x5 − w_i||² minimal
i3: ||x3 − w_i||² + α·||i4 − c_i1||² + α·||i5 − c_i2||² minimal
i1: ||x1 − w_i||² + α·||i2 − c_i1||² + α·||i3 − c_i2||² minimal
recurrence
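A minimal sketch of the SOMSD-style bottom-up processing of a binary tree, with the winners' lattice coordinates serving as context; the encoding of the empty tree (R_EMPTY) is an assumed placeholder:

```python
import numpy as np

# A binary tree is (label, left_subtree, right_subtree); None is the empty tree.
R_EMPTY = np.array([-1.0, -1.0])   # assumed default code for a missing child

def somsd_winner(tree, W, C1, C2, lattice, alpha=0.5):
    """SOMSD-style bottom-up winner computation for a binary tree.
    W: (N, n) label weights, C1/C2: (N, 2) context weights, lattice: (N, 2) coordinates.
    Returns the lattice coordinates of the winner for the whole tree."""
    if tree is None:
        return R_EMPTY
    label, left, right = tree
    r1 = somsd_winner(left, W, C1, C2, lattice, alpha)    # winner coordinates of child 1
    r2 = somsd_winner(right, W, C1, C2, lattice, alpha)   # winner coordinates of child 2
    d = (np.sum((W - label) ** 2, axis=1)
         + alpha * np.sum((C1 - r1) ** 2, axis=1)
         + alpha * np.sum((C2 - r2) ** 2, axis=1))
    return lattice[np.argmin(d)]

# usage: the tree x1(x2, x3(x4, x5)) from the slide, with one-dimensional labels
x = [np.array([float(v)]) for v in range(6)]
tree = (x[1], (x[2], None, None), (x[3], (x[4], None, None), (x[5], None, None)))
# coords = somsd_winner(tree, W, C1, C2, lattice)
```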
15. Various approaches
[figure: the winners are referred to by their lattice coordinates, e.g. (1,1), (1,3), (2,3), (3,3), (3,1)]
16. A unifying framework
17. A unifying framework
[figure: a tree a(t, t') is compared with a neuron (w, r, r') via the distance
||w − a||² + ||rep(t) − r||² + ||rep(t') − r'||²,
where rep(t) and rep(t') are the representations of the subtrees]
18. A unifying framework
- ingredients
  - binary trees with labels in (W, dW)
  - formal representation of trees (R, dR) → context!
  - neurons n with labeling (L0(n), L1(n), L2(n)) in W x R x R
  - a function rep: R^N → R
- recursive distance of a(t1, t2) from n
  - drec(a(t1,t2), n) = dW(a, L0(n)) + α·dR(R1, L1(n)) + α·dR(R2, L2(n)), where
  - Ri = r_∅ if ti = ∅, and
  - Ri = rep(drec(ti, n1), ..., drec(ti, nN)) otherwise
- training
  - L0(n) ← L0(n) + η·nhd(n, n_winner)·(a − L0(n))
  - L1(n) ← L1(n) + η·nhd(n, n_winner)·(R1 − L1(n))
  - L2(n) ← L2(n) + η·nhd(n, n_winner)·(R2 − L2(n))
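The ingredients above can be turned into a small generic sketch: the recursive distance with a pluggable rep function, so that SOMSD, RecSOM, and TKM differ only in the choice of rep (all names and the choice dW = dR = squared Euclidean distance are illustrative assumptions):

```python
import numpy as np

def drec(tree, L0, L1, L2, rep, r_empty, alpha=0.5):
    """Recursive distance of a binary tree (label, left, right) from all neurons.
    L0: (N, n) label weights, L1/L2: (N, dim_R) context weights; rep maps the
    distance vector (drec(t, n_1), ..., drec(t, n_N)) to a context in R.
    dW and dR are taken to be the squared Euclidean distance."""
    a, t1, t2 = tree
    R1 = r_empty if t1 is None else rep(drec(t1, L0, L1, L2, rep, r_empty, alpha))
    R2 = r_empty if t2 is None else rep(drec(t2, L0, L1, L2, rep, r_empty, alpha))
    return (np.sum((L0 - a) ** 2, axis=1)
            + alpha * np.sum((L1 - R1) ** 2, axis=1)
            + alpha * np.sum((L2 - R2) ** 2, axis=1))

# the approaches differ only in rep and in the context space, e.g.:
# SOMSD:  rep = lambda d: lattice[np.argmin(d)]   (index of the winner)
# RecSOM: rep = lambda d: np.exp(-d)              (whole map's activation)
# TKM:    rep = lambda d: d                       (identity, projected per neuron)
```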
19. A unifying framework
SOMSD: context = index of the winner
neurons (w, r, r') in W x R^d x R^d
distance: ||w − a||² + ||rep(t) − r||² + ||rep(t') − r'||²
rep(t) = index of the winner of t, rep(t') = index of the winner of t'
20. A unifying framework
RecSOM: context = the whole map's activation
neurons (w, r) in W x R^N
distance: ||w − a||² + ||rep(t) − r||²
rep(t) = (exp(−drec(t, n1)), ..., exp(−drec(t, nN)))
21. A unifying framework
TKM: context = the neuron's own activation
neurons (w, (0, ..., 1, ..., 0))
distance: ||w − a||² + α·drec(t, ni), where drec(t, ni) = (0, ..., 1, ..., 0)^T · (drec(t, n1), ..., drec(t, nN))
rep = id: rep(t) = (drec(t, n1), ..., drec(t, nN))
22. A unifying framework
recursive NN: context = activation of all neurons
neurons (w, w1, w2)
activation: w·a + w1·rep(t) + w2·rep(t')
rep(t) = sgd(activation of t), rep(t') = sgd(activation of t')
23. More approaches
24. More approaches
context model (winner, content, structure):
- TKM: no explicit context, hence restricted
- RecSOM: very high dimensional context
- SOMSD: compressed information (winner index)
- MSOM: winner content; storing the winner's (w, r) directly would make the dimensionality too high, hence merge: (1−γ)·w + γ·r
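A hedged sketch of the MSOM-style winner selection, where the context for the next step is the merge (1−γ)·w + γ·c of the previous winner's weight and context (the zero initial context and parameter names are assumptions):

```python
import numpy as np

def msom_winners(W, C, seq, alpha=0.5, gamma=0.5):
    """Merge-SOM style winner selection: the context is the merged content of the
    previous winner. W, C: (N, n) weight and context vectors, seq: inputs x_t."""
    context = np.zeros(W.shape[1])        # assumed initial context
    winners = []
    for x in seq:
        d = np.sum((W - x) ** 2, axis=1) + alpha * np.sum((C - context) ** 2, axis=1)
        i = int(np.argmin(d))
        winners.append(i)
        context = (1.0 - gamma) * W[i] + gamma * C[i]   # merge for the next step
    return winners
```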
25. More approaches
lattice:
- VQ: adapt only the winner
- NG: adapt every neuron according to its rank, i.e. nhd(n, n_w) = h(rk(n, x)), no prior lattice!
- HSOM: hyperbolic lattice structure
→ MNG, HSOMS
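A minimal sketch of the rank-based neural gas update nhd(n, n_w) = h(rk(n, x)) mentioned above, with h(k) = exp(−k/λ) as an assumed neighborhood function:

```python
import numpy as np

def ng_step(W, x, eta=0.1, lam=2.0):
    """One neural gas update: every neuron is adapted according to its
    distance rank rk(n, x) instead of a prior lattice neighborhood."""
    dists = np.sum((W - x) ** 2, axis=1)
    ranks = np.argsort(np.argsort(dists))      # rk(n, x): 0 for the winner
    h = np.exp(-ranks / lam)                   # nhd(n, n_w) = h(rk(n, x))
    W += eta * h[:, None] * (x - W)
```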
26. Experiments
27. Experiments
Mackey-Glass time series
temporal quantization error: for the winner at time t, take the mean value at each lag t−j over all patterns mapped to this winner and compare it to the actual pattern
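One plausible reading of this error measure as code: for every lag j, compare each past value x_{t−j} with the mean of that lag over all time steps won by the same neuron (details such as the root-mean-square averaging are assumptions):

```python
import numpy as np

def temporal_quantization_error(seq, winners, max_lag=30):
    """For each lag j: deviation of the past value x_{t-j} from the mean past
    value of all time steps mapped to the same winner neuron.
    seq: 1D array of inputs, winners: winner index per time step."""
    seq, winners = np.asarray(seq, dtype=float), np.asarray(winners)
    errors = np.zeros(max_lag + 1)
    for j in range(max_lag + 1):
        t = np.arange(j, len(seq))          # time steps that have a value at lag j
        past, win = seq[t - j], winners[t]
        sq_err = 0.0
        for n in np.unique(win):
            vals = past[win == n]
            sq_err += np.sum((vals - vals.mean()) ** 2)
        errors[j] = np.sqrt(sq_err / len(t))
    return errors
```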
28. Experiments
[figure: temporal quantization error of SOMSD (y-axis 0 to 0.2) vs. index of past inputs 0 to 30 (index 0 = present)]
29. Experiments
[figures: temporal quantization error for HSOMS and MNG]
30. Experiments
reconstruct 3-gram probabilities from the neurons of a HSOMS by counting
31. Experiments
[figure: SOMSD and HSOMS maps; neurons labeled with the symbols B, E, P, S, T, V, X]
32. Theory
33Theory
- Cost function of training?
- (approximate) SOM, VQ, NG training is a
stochastic gradient descent on (some f) -
?if(xi-w12,...,xi-wN2) -
- recursive (approximate) SOM, VQ, NG is in general
no stochastic gradient descent, but it can be
interpreted as a truncated stochastic gradient
descent on -
?if(drec(ti,n1),..., drec(ti,nN)) - (the same f) if dW,dR is the squared
Euclidean distance
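For concreteness, standard instantiations of such a cost function f (well-known approximate cost functions of VQ, NG, and SOM; they are not spelled out on the slide):

```latex
E_{\mathrm{VQ}}  = \sum_i \min_j \|x_i - w_j\|^2, \qquad
E_{\mathrm{NG}}  = \sum_i \sum_j h_\lambda\bigl(\mathrm{rk}_j(x_i)\bigr)\,\|x_i - w_j\|^2, \qquad
E_{\mathrm{SOM}} \approx \sum_i \sum_j \mathrm{nhd}\bigl(n_j, n_{I(x_i)}\bigr)\,\|x_i - w_j\|^2
```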
34Theory
(w,r,r)
w a2
R(t) - r2
a
R(t) r2
35Theory
- Representation of data?
- SOMSD, RecSOM given a finite set, given enough
neurons, every sequence/tree can be represented
by a neuron, - context location in the lattice
- TKM, MSOM
- trees cannot be represented (commutativity),
- the range of representation is restricted by the
weight space, - optimum codes for a sequence (a0,a1,a2,...) are
- TKM w ?tatat / ?tat
and - MSOM w a0, c ?t?t-1at / ?t?t-1
- context fractal encoding
- for MSOM this is a fixed point, uses additional
?
36Theory
induces d1(t,t) ? distance of their winner
indices d2(t,t) ? distance of their
representations
?
?
tripels (w,c1,c2)
trees
37Theory
- Explicit metric for SOMSD
- Assume granularity triples (w,c1,c2) form
e-cover of the space w.r.t. dR, dW, - topological matching for all neurons i1, i2 with
weights (w1,c11,c12), (w2,c21,c22) holds
dR(i1,i2) (adW(w1,w2)bdR(c11,c21)bdR(c12,c2
2)) lt e - then
- d1(t,t) Drec(t,t) (ee(ab)const)
(1-(2b)H1) / (1-2b), Hheight - where
- Drec(t,?) Drec(?,t) dR(winner(t),r?)
- Drec(x(t1,t2),x(t1,t2)) adW(x,x)
bDrec(t1,t1) bDrec(t2,t2)
Markovian
38. Conclusions
39. Conclusions
- recursive self-organizing models expand supervised RecNNs; they differ with respect to the choice of context (= activation for RecNNs)
- context, e.g.
  - neuron, map activation, winner index, winner content
- complexity
- lattice model
- way of representation
- models make sense, possibly (locally) Markovian
- ... but many topics of ongoing research ...
42. Various approaches
[figure: SOMSD for sequences, adaptation of the standard SOM]
43. Experiments
physiological data (heart rate, chest volume, blood oxygen concentration, preprocessed), 617 neurons
45. Examples
[figure: two-state generator over the symbols −1 and 1 with transition probabilities 0.4, 0.6, 0.7, 0.3; stationary probabilities P(−1) = 4/7, P(1) = 3/7]
generator for words with two discrete states
specialization: unambiguous temporal context
46. Experiments
[figure: the 100 most probable binary words (all nodes except for the root node) and the MNG receptive fields for 100 neurons (bullets indicate specialized neurons); symbols 1 and −1]
47. Experiments
reconstruction by counting for HSOMS
48. Experiments
probabilistic 2-gram model, symbols a, b, c, subject to noise
SOMSD with 100 neurons: probability extraction by counting symbols
49. Experiments
U-matrix on the weights; U-value = mean distance from the neighbors; valleys = symbols
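A minimal sketch of this U-matrix computation, with the U-value of a neuron taken as the mean distance of its weight vector to its lattice neighbors (the rectangular 4-neighborhood is an assumption):

```python
import numpy as np

def u_matrix(W, rows, cols):
    """U-matrix of a rectangular SOM: U-value of a neuron = mean distance of its
    weight vector from the weights of its lattice neighbors (valleys = clusters)."""
    Wg = W.reshape(rows, cols, -1)
    U = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            nbrs = [Wg[rr, cc]
                    for rr, cc in [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
                    if 0 <= rr < rows and 0 <= cc < cols]
            U[r, c] = np.mean([np.linalg.norm(Wg[r, c] - v) for v in nbrs])
    return U
```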
51. Experiments
U-matrix, context U-matrix, and mean previous contexts
52. Speaker Identification for a Japanese Vowel
Three exemplary patterns of /ae/ articulations from different speakers
53. More info on the data
- 9 different speakers.
- Training set: 30 articulations from each speaker.
- Test set: varying number of articulations.
- available from UCI Knowledge Discovery in Databases at http://kdd.ics.uci.edu/databases/JapaneseVowels/JapaneseVowels.html.
Aim: unsupervised speaker identification.
Class assignment to neurons a posteriori by using the training set: the speaker for which a neuron is most active is considered as its target class.
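A small sketch of this a posteriori labeling: each neuron receives the speaker for which it is selected most often on the training set (array shapes and the −1 marker for idle neurons are assumptions):

```python
import numpy as np

def label_neurons(winners, speakers, n_neurons, n_classes):
    """A posteriori class assignment: each neuron gets the speaker for which
    it is most often the winner on the training set.
    winners: winner neuron per training articulation, speakers: its speaker id."""
    counts = np.zeros((n_neurons, n_classes))
    for w, s in zip(winners, speakers):
        counts[w, s] += 1
    labels = counts.argmax(axis=1)
    labels[counts.sum(axis=1) == 0] = -1      # idle neurons, never selected
    return labels
```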
54. Linear discriminant analysis (LDA) for MNG
24D → 2D projection of the 2x12D weight-context tuples of MNG neurons
All 150 neurons are displayed with a posteriori labels (colors).
- Neurons separate well, already in this crude (!) 2D projection.
- Neurons specialize on speakers.
55. Speaker Identification Results
Classification errors:
- Training set: 2.96% (all 150 neurons used)
- Test set: 4.86% (1 idle neuron, never selected)
Reference error: 5.9% (supervised, rule based, Kudo et al. 1999)