1
Unsupervised recurrent networks
  • Barbara Hammer, Institute of Informatics,
  • Clausthal University of Technology

2
Brocken
6
Prototype-based clustering
7
Prototype-based clustering
  • data contained in a real-vector space
  • prototypes characterized by their locations in the data space
  • clustering induced by the receptive fields with respect to the Euclidean metric

8
Vector quantization
  • init prototypes
  • repeat
  • present a data point
  • adapt the winner towards the data point

9
Cost function
  • vector quantization minimizes the quantization error, i.e. the summed squared distances of the data points to their winner prototypes
  • online training: stochastic gradient descent on this cost
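A minimal sketch of this update loop (not from the slides; numpy only, with illustrative data and learning rate):

```python
import numpy as np

def online_vq(data, n_prototypes=5, eta=0.05, epochs=10, seed=0):
    """Online vector quantization: stochastic gradient descent on the
    quantization error, i.e. the squared distance to the winner prototype."""
    rng = np.random.default_rng(seed)
    # init prototypes at randomly chosen data points
    w = data[rng.choice(len(data), n_prototypes, replace=False)].copy()
    for _ in range(epochs):
        for x in rng.permutation(data):
            # winner = closest prototype (Euclidean distance)
            winner = np.argmin(np.linalg.norm(w - x, axis=1))
            # adapt the winner towards the data point
            w[winner] += eta * (x - w[winner])
    return w

# illustrative data: three Gaussian clusters in the plane
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(c, 0.3, size=(100, 2)) for c in ([0, 0], [3, 0], [0, 3])])
print(online_vq(data, n_prototypes=3))
```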

10
Neighborhood cooperation
Self-Organizing Map: neighborhood defined on a fixed regular lattice, neurons indexed by lattice positions j = (j1, j2)
Neural gas: neighborhood induced by the data itself, i.e. a data-optimal topology
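The two neighborhood schemes can be contrasted in a short sketch; the Gaussian lattice neighborhood for the SOM and the rank-based neighborhood for neural gas are standard choices, and the parameter values are illustrative, not taken from the slides:

```python
import numpy as np

def som_step(w, grid, x, eta=0.1, sigma=1.0):
    """SOM: neighborhood lives on a fixed regular lattice.
    grid[j] holds the lattice coordinates (j1, j2) of neuron j."""
    winner = np.argmin(np.linalg.norm(w - x, axis=1))
    # Gaussian neighborhood in lattice space around the winner
    h = np.exp(-np.linalg.norm(grid - grid[winner], axis=1) ** 2 / (2 * sigma ** 2))
    w += eta * h[:, None] * (x - w)

def ng_step(w, x, eta=0.1, lam=1.0):
    """Neural gas: neighborhood given by distance ranks in data space,
    so the topology adapts to the data."""
    ranks = np.argsort(np.argsort(np.linalg.norm(w - x, axis=1)))
    h = np.exp(-ranks / lam)
    w += eta * h[:, None] * (x - w)

# illustrative usage: a 3x3 SOM lattice and 9 neural gas prototypes in 2D
rng = np.random.default_rng(0)
w_som, w_ng = rng.random((9, 2)), rng.random((9, 2))
grid = np.array([(i, j) for i in range(3) for j in range(3)], dtype=float)
for x in rng.random((500, 2)):
    som_step(w_som, grid, x)
    ng_step(w_ng, x)
```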
11
Clustering recurrent data
13
Old models
14
Old models
  • Temporal Kohonen Map
    leaky integration of distances
    input sequence x1, x2, x3, x4, …, xt, …
    d(xt, wi) = α·‖xt − wi‖² + (1−α)·d(xt−1, wi)
    training: wi → xt
  • Recurrent SOM
    leaky integration of directions
    d(xt, wi) = ‖yt‖² where yt = α·(xt − wi) + (1−α)·yt−1
    training: wi → yt
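A small sketch of both leaky-integration recursions, assuming scalar inputs and the formulas above (prototype values and parameters are illustrative):

```python
import numpy as np

def tkm_winners(x_seq, w, alpha=0.5):
    """Temporal Kohonen Map: d_i(t) = alpha*(x_t - w_i)^2 + (1 - alpha)*d_i(t-1)."""
    d = np.zeros_like(w, dtype=float)
    winners = []
    for x in x_seq:
        d = alpha * (x - w) ** 2 + (1 - alpha) * d
        winners.append(int(np.argmin(d)))      # training would move w[winner] towards x
    return winners

def rsom_winners(x_seq, w, alpha=0.5):
    """Recurrent SOM: y_i(t) = alpha*(x_t - w_i) + (1 - alpha)*y_i(t-1), d_i(t) = y_i(t)^2."""
    y = np.zeros_like(w, dtype=float)
    winners = []
    for x in x_seq:
        y = alpha * (x - w) + (1 - alpha) * y
        winners.append(int(np.argmin(y ** 2)))  # training would move w[winner] towards y[winner]
    return winners

w = np.array([0.0, 0.5, 1.0])          # three scalar prototypes
x_seq = [0.9, 1.0, 0.1, 0.0, 0.05]     # illustrative input sequence
print(tkm_winners(x_seq, w), rsom_winners(x_seq, w))
```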
16
Our model
17
Merge neural gas/SOM
explicit temporal context: each neuron carries a pair (w, c)
  • the weight w is compared to the current entry xt via ‖xt − w‖²
  • the context c is compared to the merge context Ct via ‖Ct − c‖²
  • Ct = merged content of the winner of the previous step, standing for the history xt−1, xt−2, …, x0
  • training: w → xt, c → Ct
18
Merge neural gas/SOM
(wj, cj) in Rn × Rn
  • explicit context, global recurrence
  • wj represents the current entry xt
  • cj represents the context, which equals the merged content of the winner of the last time step
  • distance d(xt, wj) = α·‖xt − wj‖² + (1−α)·‖Ct − cj‖²
    where Ct = γ·wI(t−1) + (1−γ)·cI(t−1), I(t−1) = winner in step t−1 (merge)
  • training: wj → xt, cj → Ct

19
Merge neural gas/SOM
  • Example: input sequence 42 → 33 → 33 → 34, merge with equal weights (γ = 0.5)

    neuron (weight, context) pairs:
    (42, 50)  (33, 45)  (32, 42)
    (41, 40)  (34, 39)  (33, 38)
    (40, 37)  (35, 36)  (34, 35)

    C1 = (42 + 50)/2 = 46
    C2 = (33 + 45)/2 = 39
    C3 = (33 + 38)/2 = 35.5
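A short trace of the merge context through this example; the merge parameter γ = 0.5 follows from the averaging on the slide, while choosing the first winner by weight alone and the distance weighting α = 0.6 are assumptions made only for illustration:

```python
import numpy as np

# (weight, context) pairs of the neurons from the example
neurons = np.array([(42, 50), (33, 45), (32, 42),
                    (41, 40), (34, 39), (33, 38),
                    (40, 37), (35, 36), (34, 35)], dtype=float)
w, c = neurons[:, 0], neurons[:, 1]

x_seq = [42, 33, 33, 34]   # input sequence 42 -> 33 -> 33 -> 34
alpha, gamma = 0.6, 0.5    # alpha: weight/context mixing (assumed), gamma: merge parameter

C = None
for t, x in enumerate(x_seq):
    if C is None:
        # first step: no context yet, winner chosen by weight only (assumption)
        i = int(np.argmin((x - w) ** 2))
    else:
        i = int(np.argmin(alpha * (x - w) ** 2 + (1 - alpha) * (C - c) ** 2))
    # merge context for the next step: mix of the winner's weight and context
    C = gamma * w[i] + (1 - gamma) * c[i]
    print(f"t={t}: winner (w={w[i]:.0f}, c={c[i]:.0f}), next context C={C}")
# prints C = 46.0, 39.0, 35.5 as in the example (and 34.5 after the last input)
```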
20
Merge neural gas/SOM
  • speaker identification, Japanese vowel 'ae',
    UCI KDD archive
  • 9 speakers, 30 articulations each
  • data: time series of 12-dim. cepstrum vectors

MNG, 150 neurons: 2.7% test error
MNG, 1000 neurons: 1.6% test error
for comparison: rule-based 5.9%, HMM 3.8%
21
Merge neural gas/SOM
  • Experiment
  • classification of donor sites for C. elegans
  • 5 settings with 10000 training data, 10000 test
    data; 50 nucleotides (T, C, G, A embedded in 3 dim),
    38% donor sites [Sonnenburg, Rätsch et al.]
  • MNG with posterior labeling
  • 512 neurons; parameters 0.25, 0.075, α 0.999, 0.4, 0.7
  • 14.06 ± 0.66% training error, 14.26 ± 0.39% test error
  • sparse representation: 512 neurons × 6 dimensions
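The slide does not spell out "posterior labeling"; one common reading, assumed here, is to label each neuron after unsupervised training by the majority class of the training samples it wins, and to classify a test sample by the label of its winner:

```python
import numpy as np
from collections import Counter

def posterior_labels(winners_train, y_train, n_neurons):
    """Assign to each neuron the majority class of the training samples it wins."""
    labels = np.zeros(n_neurons, dtype=int)
    for j in range(n_neurons):
        classes = [y for i, y in zip(winners_train, y_train) if i == j]
        labels[j] = Counter(classes).most_common(1)[0][0] if classes else -1
    return labels

def classify(winners_test, labels):
    """Classify each test sample by the label of its winner neuron."""
    return labels[np.asarray(winners_test)]

# illustrative usage with made-up winner indices and binary labels
winners_train, y_train = [0, 0, 1, 2, 2, 2], [1, 1, 0, 0, 0, 1]
labels = posterior_labels(winners_train, y_train, n_neurons=3)
print(labels, classify([1, 2, 0], labels))
```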

22
Merge neural gas/SOM
  • Theorem (context representation)
  • Assume
    • a map with merge context is given (no neighborhood)
    • a sequence x0, x1, x2, x3, … is given
    • enough neurons are available
  • Then
    • the optimum weight/context pair for xt is
      w = xt,  c = Σi=0..t−1 γ·(1−γ)^(t−i−1)·xi
    • Hebbian training converges to this setting as a stable fixed point
  • Compare to TKM
    • the optimum weights are w = Σi=0..t (1−α)^i·xt−i / Σi=0..t (1−α)^i
    • but this is not a fixed point of TKM training
  • MSOM is the correct implementation of TKM
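A quick numerical check of the stated optimum: if the winner for xt−1 carries the optimal pair (w = xt−1, c = optimal context of xt−1), the merge recursion Ct = γ·wI(t−1) + (1−γ)·cI(t−1) reproduces the closed form (γ and the test sequence below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(8)          # an arbitrary scalar sequence x_0 .. x_7
gamma = 0.3

def c_opt(t):
    """Closed form: optimal context for x_t."""
    return sum(gamma * (1 - gamma) ** (t - i - 1) * x[i] for i in range(t))

# merge recursion, plugging in the optimal winner pair (w = x_{t-1}, c = c_opt(t-1))
for t in range(1, len(x)):
    C = gamma * x[t - 1] + (1 - gamma) * c_opt(t - 1)
    assert np.isclose(C, c_opt(t)), (t, C, c_opt(t))
print("merge recursion reproduces the closed-form optimal context")
```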

23
More models
24
More models
what is the correct temporal context?

each neuron carries a pair (w, c); the weight w is compared to the current entry xt via ‖xt − w‖², the context c is compared to a context descriptor Ct via ‖Ct − c‖²; training: w → xt, c → Ct

the models differ in the choice of Ct:
  • TKM / RSOM: the neuron itself
  • MSOM: the content of the winner
  • SOMSD: the index of the winner
  • RecSOM: the activations of all neurons
25
More models
            TKM             RSOM            MSOM             SOMSD                  RecSOM
context     neuron itself   neuron itself   winner content   winner index           activation of all neurons
encoding    input space     input space     input space      lattice space          activation space
memory      n·N             n·N             2·n·N            (d+n)·N                (N+n)·N
lattice     all             all             all              regular / hyperbolic   all
capacity    < FSA           < FSA           FSA              FSA                    PDA*
(* for normalised WTA context; n = input dimension, N = number of neurons, d = lattice dimension)
26
More models
  • Experiment
  • Mackey-Glass time series
  • 100 neurons
  • different lattices
  • different contexts
  • evaluation by the temporal quantization error

for each lag k: average over all time steps of (mean entry k steps into the past, taken over the winner's receptive field, − observed entry k steps into the past)²
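A sketch of this measure under one reading (assumed here): for each lag k, the input k steps before each time step is compared with the mean of that quantity over all time steps mapped to the same winner neuron:

```python
import numpy as np

def temporal_quantization_error(signal, winners, max_lag):
    """For each lag k: average over t of
    (signal[t-k] - mean of signal[.-k] over all times with the same winner)^2."""
    signal, winners = np.asarray(signal, float), np.asarray(winners)
    errors = []
    for k in range(max_lag + 1):
        t_idx = np.arange(k, len(signal))        # times with a valid k-step history
        past = signal[t_idx - k]                 # observed value k steps into the past
        err = 0.0
        for j in np.unique(winners[t_idx]):
            mask = winners[t_idx] == j
            err += np.sum((past[mask] - past[mask].mean()) ** 2)
        errors.append(err / len(t_idx))
    return errors

# illustrative usage with a toy signal and made-up winner indices
rng = np.random.default_rng(0)
signal = np.sin(np.linspace(0, 20, 200))
winners = rng.integers(0, 10, size=200)
print(temporal_quantization_error(signal, winners, max_lag=5))
```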
27
More models
[Plot: temporal quantization error as a function of the number of steps into the past (now → past) for SOM, RSOM, NG, RecSOM, SOMSD, HSOMSD, MNG]
28
So what?
29
So what?
  • inspection / clustering of high-dimensional
    events within their temporal context could be
    possible
  • strong regularization as for standard SOM / NG
  • possible training methods for reservoirs
  • some theory
  • some examples
  • no supervision
  • the representation of context is critical and
    not clear at all
  • training is critical and not clear at all
