Title: Neuroinformatics
1 Neuroinformatics
- Barbara Hammer, Institute of Informatics, Clausthal University of Technology
2 Problems in winter
3 Hartelijk bedankt (Dutch: thank you very much)
% meinauto = my car, kennzeichen = license plate, enthält = contains,
% holländisch = Dutch, rot = red
meinauto(X) :- left(Y,X), holländisch(Y), rot(Y).
meinauto(X) :- kennzeichen(X,Y), enthält(Y,Z), enthält(Y,14).
enthält([X|_],X).
license plate: GS-Z-14
- pattern recognition → vectors
- speech recognition, time series processing → sequences
- symbolic, general data → tree structures, graphs
4 Part 5: Symbolic structures and neural networks
- Feedforward networks
  - The good old days: KBANN and co.
  - Useful neurofuzzy systems, data mining pipeline
  - State of the art: structure kernels
- Recursive neural networks
  - The general idea: recursive distributed representations
  - One breakthrough: recursive networks
  - Going on: towards more complex structures
- Unsupervised methods
  - Recursive SOM/NG
5 The good old days: KBANN and co.
Feedforward neural network f_w: R^n → R^o
- black box
- distributed representation
- connection to rules for symbolic I/O?
6 The good old days: KBANN and co.
- Knowledge Based Artificial Neural Networks (Towell/Shavlik, AIJ 1994)
  - start with a network which represents known rules (see the sketch below)
  - train using additional data
  - extract a set of symbolic rules after training
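To make the rules-to-network step concrete, here is a minimal sketch in the spirit of KBANN (an illustration, not Towell/Shavlik's original algorithm): each propositional rule becomes a sigmoidal neuron with weights +W for positive antecedents, -W for negated ones, and a bias chosen so that the neuron fires exactly when the rule body is satisfied. The names and the value W = 4.0 are assumptions made for this sketch.

import math

W = 4.0  # large weight so the sigmoid saturates close to 0/1

def rule_to_neuron(antecedents):
    """antecedents: list of (name, positive?) pairs forming the rule body."""
    weights = {name: (W if pos else -W) for name, pos in antecedents}
    n_pos = sum(1 for _, pos in antecedents if pos)
    bias = -(n_pos - 0.5) * W   # fires only if all positive literals are on and no negated one is on
    return weights, bias

def activate(weights, bias, assignment):
    net = bias + sum(w * assignment.get(name, 0) for name, w in weights.items())
    return 1.0 / (1.0 + math.exp(-net))

# rule: output :- a, b, not c.
w, b = rule_to_neuron([("a", True), ("b", True), ("c", False)])
print(activate(w, b, {"a": 1, "b": 1, "c": 0}))  # close to 1
print(activate(w, b, {"a": 1, "b": 1, "c": 1}))  # close to 0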
7 The good old days: KBANN and co.
8 The good old days: KBANN and co.
Train with data: use some form of backpropagation, adding a penalty to the error, e.g. for changing the weights (see the sketch below).
- The initial network biases the training result, but
- there is no guarantee that the initial rules are preserved
- there is no guarantee that the hidden neurons maintain their semantics
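A minimal sketch of such a penalized error (my own illustration; the one-layer network, the squared penalty and the trade-off parameter lam are assumptions): the first term is the usual squared error, the second term punishes deviations from the rule-derived initial weights w_init.

import numpy as np

def net(X, w):
    # a single sigmoidal layer, just to keep the example self-contained
    return 1.0 / (1.0 + np.exp(-(X @ w)))

def penalized_error(w, w_init, X, y, lam=0.1):
    # squared error on the data plus a penalty for changing the inserted weights
    pred = net(X, w)
    return np.mean((pred - y) ** 2) + lam * np.sum((w - w_init) ** 2)

# the gradient of this error (derived by hand or via autograd) then drives
# the usual backpropagation / gradient descent loop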
9 The good old days: KBANN and co.
Extraction of (complete) rules:
- There is no exact, direct correspondence between a neuron and a single rule/Boolean variable.
- It is NP-complete to find a minimum logical description for a trained network (Golea, AISB'96).
- Therefore, a number of different rule extraction algorithms have been proposed, and this is still a topic of ongoing research.
10 The good old days: KBANN and co.
Extraction of (complete) rules:
- decompositional approach
- pedagogical approach
11 The good old days: KBANN and co.
- Decompositional approaches
  - subset algorithm, MofN algorithm: describe single neurons by sets of active predecessors (Craven/Shavlik, 1994); see the sketch after this list
  - local activation functions (RBF-like) allow an approximate direct description of single neurons (Andrews/Geva, 1996)
  - MLP2LN biases the weights towards 0/-1/1 during training and can then extract exact rules (Duch et al., 2001)
  - prototype-based networks can be decomposed along relevant input dimensions by decision tree nodes (Hammer et al., 2002)
- Observation
  - usually some variation of if-then rules is achieved
  - small rule sets are only achieved if further constraints guarantee that single weights/neurons have a meaning
  - tradeoff between accuracy and size of the description
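For the subset-style algorithms mentioned above, a minimal sketch for a single thresholded neuron over Boolean inputs (illustrative only; the function names are invented here, and the search is exponential in general, which is exactly why the NP-completeness remark on slide 9 matters):

from itertools import combinations

def subset_rules(weights, bias):
    """Find minimal sets S of positively weighted inputs such that
    'all of S on and all negatively weighted inputs off' forces the
    neuron to fire (net input > 0)."""
    pos = [i for i, w in enumerate(weights) if w > 0]
    neg = [i for i, w in enumerate(weights) if w < 0]
    rules = []
    for size in range(1, len(pos) + 1):
        for S in combinations(pos, size):
            if any(set(prev) <= set(S) for prev, _ in rules):
                continue                       # not minimal, skip
            if bias + sum(weights[i] for i in S) > 0:
                rules.append((S, tuple(neg)))  # antecedent: S true, neg false
    return rules

# neuron encoding roughly "x0 and x1 and not x2"
print(subset_rules([4.0, 4.0, -4.0], -6.0))    # -> [((0, 1), (2,))]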
12 The good old days: KBANN and co.
- Pedagogical approaches
  - extraction of conjunctive rules by extensive search (Saito/Nakano, 1988)
  - interval propagation (Gallant, 1993; Thrun, 1995)
  - extraction by minimum separation (Tickle/Diederich, 1994)
  - extraction of decision trees (Craven/Shavlik, 1994); see the sketch below
  - evolutionary approaches (Markovska, 2005)
- Observation
  - usually some variation of if-then rules is achieved
  - symbolic rule induction with a little (or a bit more) help from a neural network
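As an illustration of the pedagogical idea (in the spirit of decision-tree extraction à la Craven/Shavlik, but much simplified and not their TREPAN code): the trained network is queried only as a black box, and a symbolic learner is fitted to its answers. The name net_predict and the toy "network" below are hypothetical.

import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

def pedagogical_extract(net_predict, X, max_depth=3):
    """Query the trained network as an oracle and fit a decision tree to
    its answers; the tree can then be read off as if-then rules."""
    y_net = net_predict(X)            # labels produced by the network
    return DecisionTreeClassifier(max_depth=max_depth).fit(X, y_net)

# toy usage with a hypothetical 'network' that fires iff x0 + x1 > 1
X = np.random.rand(500, 3)
net_predict = lambda X: (X[:, 0] + X[:, 1] > 1).astype(int)
print(export_text(pedagogical_extract(net_predict, X)))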
13 The good old days: KBANN and co.
- What is this good for?
  - Nobody (inside neuroinformatics) uses FNNs these days?
  - Insertion of prior knowledge might be valuable, but efficient training algorithms allow one to substitute it by additional training data (generated via the rules)?
  - Validation of the network output might be valuable, but there exist alternative (good) guarantees from statistical learning theory?
  - If-then rules are not very interesting, since there exist good symbolic learners for learning propositional classification rules?
  - Propositional rule insertion/extraction is often an essential part of more complex rule insertion/extraction mechanisms.
  - One can extend it to automaton insertion/extraction for recurrent networks (basically, the transition function of the automaton is inserted/extracted this way).
  - People, e.g. in the medical domain, also want an explanation for a classification.
  - Extensions beyond propositional rules are interesting.
  - There is at least one application domain (inside neuroinformatics) where if-then rules are very interesting and not so easy to learn: fuzzy control!
14 Useful neurofuzzy systems
(figure: control loop with process, input, observation and control signal)
Fuzzy control: if (observation ∈ FM_I) then (control ∈ FM_O)
15 Useful neurofuzzy systems
Fuzzy control: if (observation ∈ FM_I) then (control ∈ FM_O)
Neurofuzzy control
Benefit: the form of the fuzzy rules (i.e. the neural architecture) and the shape of the fuzzy sets (i.e. the neural weights) can be learned from data!
16 Useful neurofuzzy systems
- NEFCON implements Mamdani control (Nauck/Klawonn/Kruse, 1994)
- ANFIS implements Takagi-Sugeno control (Jang, 1993); see the sketch below
- and many others
- Learning
  - of the rules: evolutionary or clustering
  - of the fuzzy set parameters: backpropagation or some form of Hebbian learning
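A minimal sketch of a zero-order Takagi-Sugeno system with Gaussian fuzzy sets (illustrative, not the ANFIS implementation; all parameter values are made up): rule firing strengths weight the crisp rule consequents, and since everything is differentiable, the Gaussian parameters and the consequents can be tuned by backpropagation as mentioned above.

import numpy as np

def takagi_sugeno(x, centers, widths, consequents):
    """x: input vector; one Gaussian membership per rule and input dimension;
    consequents: one crisp output value per rule (zero-order TS)."""
    memb = np.exp(-((x - centers) ** 2) / (2 * widths ** 2))  # (rules, dims)
    strength = memb.prod(axis=1)                              # rule firing strengths
    return np.dot(strength, consequents) / strength.sum()     # weighted average

# two rules over a 1-d observation, hypothetical parameters
centers = np.array([[0.0], [1.0]])
widths = np.array([[0.3], [0.3]])
consequents = np.array([-1.0, 1.0])   # control values of the two rules
print(takagi_sugeno(np.array([0.8]), centers, widths, consequents))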
17 State of the art: structure kernels
Kernel k(x, x′): just compute pairwise similarities for this complex data, using the structure information.
Data: sets, sequences, tree structures, graph structures.
18 State of the art: structure kernels
- Closure properties of kernels (Haussler; Watkins)
- Principled problems for complex structures: computing informative graph kernels is at least as hard as deciding graph isomorphism (Gärtner)
- Several promising proposals; taxonomy (Gärtner), ordered roughly from syntactic to semantic:
  - count common substructures
  - kernels derived from a probabilistic model
  - kernels derived from local transformations
19 State of the art: structure kernels
- Count common substructures (see the k-mer sketch below)
  - example with the 2-mers GA, AG, AT: GAGAGA → counts (3, 2, 0), GAT → counts (1, 0, 1), kernel value k = 3·1 + 2·0 + 0·1 = 3
  - efficient computation: dynamic programming, suffix trees
  - locality improved kernel (Sonnenburg et al.), bag of words (Joachims), string kernel (Lodhi et al.), spectrum kernel (Leslie et al.), word-sequence kernel (Cancedda et al.)
  - convolution kernels for language (Collins/Duffy; Kashima/Koyanagi; Suzuki et al.), kernels for relational learning (Zelenko et al.; Cumby/Roth; Gärtner et al.)
  - graph kernels based on paths or subtrees (Gärtner et al.; Kashima et al.), kernels for Prolog trees based on similar symbols (Passerini/Frasconi/De Raedt)
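A minimal sketch of a k-spectrum kernel as in the GA/AG/AT example above (the names are invented here; efficient implementations use suffix trees or tries rather than dictionaries):

from collections import Counter

def kmer_counts(s, k=2):
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))

def spectrum_kernel(s, t, k=2):
    """Inner product of the k-mer count vectors of the two strings."""
    cs, ct = kmer_counts(s, k), kmer_counts(t, k)
    return sum(cs[m] * ct[m] for m in cs if m in ct)

print(spectrum_kernel("GAGAGA", "GAT"))   # 3, matching the example above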
20 State of the art: structure kernels
- Derived from a probabilistic model
  - describe the data by a probabilistic model P(x), compare characteristics of P(x)
  - Fisher kernel (Jaakkola et al.; Karchin et al.; Pavlidis et al.; Smith/Gales; Sonnenburg et al.; Siolas et al.), tangent vector of log odds (Tsuda et al.), marginalized kernels (Tsuda et al.; Kashima et al.)
  - kernels of Gaussian models (Moreno et al.; Kondor/Jebara)
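For concreteness, the Fisher kernel mentioned above compares two examples via the gradient of the log-likelihood of a fitted model P(x|θ) (standard definition, recalled here rather than taken from the slide):

U_x = \nabla_\theta \log P(x \mid \theta), \qquad k(x, x') = U_x^{\top} \, I(\theta)^{-1} \, U_{x'},

where I(θ) is the Fisher information matrix (often approximated by the identity).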
21 State of the art: structure kernels
- Derived from local transformations
  - start from a local neighborhood ("is similar to"), described by a generator H
  - expand it to a global kernel
  - diffusion kernel (Kondor/Lafferty; Lafferty/Lebanon; Vert/Kanehisa)
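As a reminder (standard definition, not from the slide): the diffusion kernel turns the local generator H, e.g. the negative graph Laplacian, into a global kernel via the matrix exponential,

K_\beta = e^{\beta H} = \lim_{n \to \infty} \left( I + \tfrac{\beta H}{n} \right)^{n}, \qquad \beta > 0.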
22 State of the art: structure kernels
- Intelligent preprocessing (kernel extraction) allows an adequate integration of semantic/syntactic structure information
- This can be combined with state-of-the-art neural methods such as the SVM
- Very promising results for
  - classification of documents and text (Duffy; Leslie; Lodhi; ...)
  - detecting remote homologies for genomic sequences and further problems in genome analysis (Haussler; Sonnenburg; Vert; ...)
  - quantitative structure-activity relationships in chemistry (Baldi et al.)
23 Conclusions: feedforward networks
- propositional rule insertion and extraction are possible (to some extent)
- useful for neurofuzzy systems
- structure-based kernel extraction followed by learning with an SVM yields state-of-the-art results
- but: this is a sequential rather than a fully integrated neuro-symbolic approach
- FNNs themselves are restricted to flat data which can be processed in one shot; no recurrence
24 Recursive networks: the general idea of recursive distributed representations
- How to turn tree structures/acyclic graphs into a connectionist representation?
25 The general idea: recursive distributed representations
Recursion! A map f: R^i × R^c × R^c → R^c (input label plus two contexts) yields the encoding f_enc, where
  f_enc(ε) = 0 for the empty tree ε,
  f_enc(a(l,r)) = f(a, f_enc(l), f_enc(r)).
26 The general idea: recursive distributed representations
- encoding f_enc: (binary trees with labels in R^n) → R^c
  f_enc(ε) = 0, f_enc(a(l,r)) = f(a, f_enc(l), f_enc(r))
- decoding h_dec: R^o → (binary trees with labels in R^n)
  h_dec(0) = ε, h_dec(x) = h_0(x)( h_dec(h_1(x)), h_dec(h_2(x)) )
- based on maps f: R^{n+2c} → R^c, g: R^c → R^o, h: R^o → R^{n+2o} with h(x) = (h_0(x), h_1(x), h_2(x))
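A minimal numpy sketch of the encoding direction defined above (my own illustration; the single weight matrix and the tanh nonlinearity are assumptions): a binary tree is encoded bottom-up by repeatedly applying one map f to the label and the codes of the two subtrees.

import numpy as np

rng = np.random.default_rng(0)
n, c = 3, 5                                      # label and code dimensionality
W = rng.normal(scale=0.5, size=(c, n + 2 * c))   # parameters of f

def f(label, left_code, right_code):
    return np.tanh(W @ np.concatenate([label, left_code, right_code]))

def f_enc(tree):
    """tree: None for the empty tree, else (label, left_subtree, right_subtree)."""
    if tree is None:
        return np.zeros(c)            # f_enc(empty) = 0
    label, left, right = tree
    return f(label, f_enc(left), f_enc(right))

# encode the tree a(b, d) with random labels
a, b, d = (rng.normal(size=n) for _ in range(3))
code = f_enc((a, (b, None, None), (d, None, None)))
print(code.shape)                     # (5,)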
27 The general idea: recursive distributed representations
- recursive distributed description (Hinton, 1990)
  - general idea without a concrete implementation
- tensor construction (Smolensky, 1990)
  - encoding/decoding given by (a,b,c) → a⊗b⊗c
  - increasing dimensionality
- holographic reduced representation (Plate, 1995)
  - circular correlation/convolution
  - fixed encoding/decoding with fixed dimensionality (but potential loss of information)
  - necessity of chunking or clean-up for decoding
- binary spatter codes (Kanerva, 1996)
  - binary operations, fixed dimensionality, potential loss of information
  - necessity of chunking or clean-up for decoding
- RAAM (Pollack, 1990), LRAAM (Sperduti, 1994)
  - trainable networks, trained for the identity, fixed dimensionality
  - encoding optimized for the given training set
28 The general idea: recursive distributed representations
- Nevertheless, the results are not promising.
- Theorem (Hammer)
  - There exists a fixed-size neural network which can uniquely encode tree structures of arbitrary depth with discrete labels.
  - For every code, decoding of all trees up to height T requires O(2^T) neurons for sigmoidal networks.
  - ⇒ encoding seems possible, but no fixed-size architecture exists for decoding
29 One breakthrough: recursive networks
- Recursive networks (Goller/Küchler, 1996)
  - do not use decoding
  - combine encoding and mapping (encoding → transformation → output y)
  - train this combination directly for the given task with backpropagation through structure (see the sketch below)
  - ⇒ an efficient, data- and problem-adapted encoding is learned
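A hedged sketch of this combination using PyTorch autograd (an assumption of this illustration; the original work used explicitly derived backpropagation through structure): the encoder f and the output map g are composed and trained end-to-end on the task loss, so the gradient flows through the tree recursion.

import torch
import torch.nn as nn

n, c = 3, 8                                              # label and code dimensionality
f = nn.Sequential(nn.Linear(n + 2 * c, c), nn.Tanh())    # recursive encoder
g = nn.Linear(c, 1)                                      # output mapping

def encode(tree):
    if tree is None:
        return torch.zeros(c)
    label, left, right = tree
    return f(torch.cat([label, encode(left), encode(right)]))

# toy tree and target; autograd realizes backpropagation through structure
tree = (torch.randn(n), (torch.randn(n), None, None), None)
target = torch.tensor([1.0])
opt = torch.optim.SGD(list(f.parameters()) + list(g.parameters()), lr=0.1)

for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(g(encode(tree)), target)
    loss.backward()
    opt.step()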
30 One breakthrough: recursive networks
- Applications
  - term classification (Goller/Küchler, 1996)
  - automated theorem proving (Goller, 1997)
  - learning tree automata (Küchler, 1998)
  - QSAR/QSPR problems (Schmitt/Goller, 1998; Bianucci/Micheli/Sperduti/Starita, 2000; Vullo/Frasconi, 2003)
  - logo recognition, image processing (Costa/Frasconi/Soda, 1999; Bianchini et al., 2005)
  - natural language parsing (Costa/Frasconi/Sturt/Lombardo/Soda, 2000, 2005)
  - document classification (Diligenti/Frasconi/Gori, 2001)
  - fingerprint classification (Yao/Marcialis/Roli/Frasconi/Pontil, 2001)
  - prediction of contact maps (Baldi/Frasconi/Pollastri/Vullo, 2002)
  - protein secondary structure prediction (Frasconi et al., 2005)
31 One breakthrough: recursive networks
- application: prediction of contact maps of proteins
(figure: amino acid sequence x1 ... x10 and its contact map; an entry such as (x2, x3) indicates whether the two residues are in contact)
32 One breakthrough: recursive networks
Results on PDB / PDBselect (Pollastri/Baldi/Vullo/Frasconi, NIPS 2002), reported as (Ct, nCt, dist. true pos.):
- threshold 6 Å: (0.71, 0.998, 0.59)
- threshold 12 Å: (0.43, 0.987, 0.55)
33 One breakthrough: recursive networks
Theory (approximation completeness): for every (reasonable) function f and every ε > 0 there exists a RecNN which approximates f up to ε (with an appropriate distance measure) (Hammer).
34 Going on: towards more complex structures
(Baldi/Frasconi, 2002)
35 Going on: towards more complex structures
Contextual cascade correlation (Micheli/Sperduti, 2003)
36 Contextual cascade correlation
Cascade correlation (Fahlman/Lebiere, 1990): given data (x, y) in R^n × R, find f such that f(x) ≈ y.
- minimize the error on the given data
- maximize the correlation of the new unit's output and the current error ⇒ the unit can serve for error correction in subsequent steps (see the sketch below)
- hidden units: h_i(x) = f_i(x, h_1(x), ..., h_{i-1}(x)), etc.
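A minimal sketch of the candidate unit's objective in standard cascade correlation (recalled here for clarity; the variable names are invented): the covariance between the candidate's output and the residual errors of the current network is maximized before the candidate is frozen and inserted.

import numpy as np

def candidate_correlation(v, E):
    """v: candidate outputs, shape (patterns,);
    E: residual errors of the current network, shape (patterns, outputs).
    Returns S = sum_o | sum_p (v_p - mean(v)) * (E_po - mean_o(E)) |."""
    v_centered = v - v.mean()
    E_centered = E - E.mean(axis=0)
    return np.abs(v_centered @ E_centered).sum()

# the candidate is trained (e.g. by gradient ascent on S) before being
# frozen and inserted; afterwards the output weights are retrained on the error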
37 Contextual cascade correlation
- few, cascaded, separately optimized neurons
- ⇒ efficient training
- ⇒ excellent generalization
- ... as shown e.g. for the two-spirals problem
38 Contextual cascade correlation
For trees: recursive processing of the structure, starting at the leaves and moving towards the root (acyclic!).
- q^{-1}(h_i(v)) = (h_i(ch_1(v)), ..., h_i(ch_k(v))) gives the context (the unit's values at the children)
- h_1(v) = f_1(l(v), h_1(ch_1(v)), ..., h_1(ch_k(v)))
- ... not possible, since weights are frozen after adding a neuron!
- h_2(v) = f_2(l(v), h_2(ch_1(v)), ..., h_2(ch_k(v)), h_1(v), h_1(ch_1(v)), ..., h_1(ch_k(v)))
- etc.: no problem!
39 Contextual cascade correlation
- cascade correlation for trees
  - init
  - repeat
    - add h_i
    - train f_i(l(v), q^{-1}(h_i(v)), h_1(v), q^{-1}(h_1(v)), ..., h_{i-1}(v), q^{-1}(h_{i-1}(v))) on the correlation
    - train the output on the error
40 Contextual cascade correlation
Restricted recurrence allows the units to also look at the parents:
- q^{-1}(h_i(v)) = (h_i(ch_1(v)), ..., h_i(ch_k(v)))
- q^{+1}(h_i(v)) = (h_i(pa_1(v)), ..., h_i(pa_k(v)))
- applying q^{+1} to the unit currently being trained would yield cycles
- applying q^{+1} to the earlier, frozen units is possible due to the restricted recurrence!
- contextual cascade correlation:
  h_i(v) = f_i(l(v), q^{-1}(h_i(v)), h_1(v), q^{-1}(h_1(v)), q^{+1}(h_1(v)), ..., h_{i-1}(v), q^{-1}(h_{i-1}(v)), q^{+1}(h_{i-1}(v)))
41 Contextual cascade correlation
q^{+1} extends the context of h_i
(figure: the context grows with i = 1, 2, 3)
42 Contextual cascade correlation
QSPR problem (Micheli/Sperduti/Sona): predict the boiling point of alkanes (in °C).
Alkanes C_nH_{2n+2}: methane, ethane, propane, butane, pentane, hexane, ..., and branched isomers such as 2-methyl-pentane.
The boiling point grows with n and depends systematically on the branching (side chains) ⇒ an excellent benchmark.
43 Contextual cascade correlation
Tree structure for alkanes, e.g.:
CH3(CH2(CH2(CH(CH2(CH3),CH(CH3,CH2(CH3))))))
44 Contextual cascade correlation
Alkane dataset: 150 examples, bipolar encoding of the symbols, 10-fold cross-validation; number of neurons 137 (CRecCC) vs. 110/140 (RecCC); compared to an FNN with direct encoding for restricted length (Cherqaoui/Villemin); codomain [-164, 174] °C.
45 Contextual cascade correlation
(figure: visualization of the contexts)
46 Contextual cascade correlation
QSAR problem (Micheli/Sperduti/Sona): predict the activity (IC50) of benzodiazepines.
- tranquillizers/analgesics: Valiquid, Rohypnol, Noctamid, ...
- function: they potentiate the inhibiting neurotransmitter GABA
- problem: cross tolerance, cross dependencies ⇒ very important to take an accurate dose
47 Contextual cascade correlation
- structure for benzodiazepines, e.g.:
  bdz(h,f,h,ph,h,h,f,h,h)
  bdz(h,h,h,ph,c3(h,h,h),h,c3(h,h,h),h,h)
  bdz(c3(h,h,o1(c3(h,h,h))),h,h,ph,h,h,n2(o,o),h,h)
48 Contextual cascade correlation
- Benzodiazepine dataset: 79 examples, bipolar encoding
- 5 test points taken from Hadjipavlou/Hansch plus additional folds
- max. number of neurons: 13-40
49 Contextual cascade correlation
50 Contextual cascade correlation
- Supersource transductions: the whole structure is mapped to a single output (e.g. a real number)
- IO-isomorphic transductions: one output per vertex
51 Contextual cascade correlation
- Approximation completeness
  - recursive cascade correlation is approximation complete for trees and supersource transductions
  - contextual recursive cascade correlation is approximation complete for acyclic graph structures (with a mild structural condition) and IO-isomorphic transductions
52 Conclusions: recursive networks
- very promising neural architectures for the direct processing of tree structures
- successful applications and mathematical background
- connections to symbolic mechanisms (tree automata)
- extensions to more complex structures (graphs) are under development
- a few approaches which achieve structured outputs
53 Unsupervised models
Self-organizing map: a network given by prototypes w_i ∈ R^n arranged on a lattice (index j = (j1, j2)); Hebbian learning based on examples x_i and neighborhood cooperation.
Online learning: iterate
- choose an example x_i
- determine the winner j0 with |x_i − w_{j0}| minimal
- adapt all w_j: w_j ← w_j + η · exp(−n(j, j0)/σ²) · (x_i − w_j)
(a minimal sketch follows below)
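A minimal numpy sketch of this online update (the learning rate, neighborhood range and map size are made-up values):

import numpy as np

def som_step(W, grid, x, eta=0.1, sigma=1.0):
    """W: prototypes, shape (neurons, dim); grid: lattice coordinates,
    shape (neurons, 2). One online SOM update for a single example x."""
    winner = np.argmin(np.linalg.norm(W - x, axis=1))
    lattice_dist = np.linalg.norm(grid - grid[winner], axis=1) ** 2
    h = np.exp(-lattice_dist / sigma ** 2)       # neighborhood cooperation
    W += eta * h[:, None] * (x - W)
    return W

# 5x5 map of 3-dimensional prototypes
grid = np.array([(i, j) for i in range(5) for j in range(5)], dtype=float)
W = np.random.rand(25, 3)
W = som_step(W, grid, np.random.rand(3))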
54 Unsupervised models
- Represent time series, e.g. the drinks of the last night
55 Old recursive SOM models
Sequence x1, x2, x3, x4, ..., xt, ...
- Temporal Kohonen map (TKM): d(x_t, w_i) = |x_t − w_i| + α · d(x_{t−1}, w_i); training: adapt w_i towards x_t
- Recurrent SOM (RSOM): d(x_t, w_i) = |y_t| where y_t = (x_t − w_i) + α · y_{t−1}; training: adapt w_i in the direction of y_t
56 Old recursive SOM models
- TKM/RSOM compute a leaky average of the time series, i.e. they are robust w.r.t. outliers
57 Old recursive SOM models
- TKM/RSOM compute a leaky average of the time series
- It is not clear how they can differentiate various contexts: there is no explicit context!
(figure: two different sequences whose leaky averages coincide, so one "is the same as" the other)
58 Merge neural gas / merge SOM
Neuron (w_j, c_j) ∈ R^n × R^n:
- explicit context, global recurrence
- w_j represents the current entry x_t
- c_j represents the context, which equals the (merged) winner content of the last time step
- distance: d(x_t, w_j) = α·‖x_t − w_j‖² + (1−α)·‖C_t − c_j‖²
- where C_t = β·w_{I(t−1)} + (1−β)·c_{I(t−1)} and I(t−1) is the winner in step t−1 (merge)
- training: adapt w_j towards x_t and c_j towards C_t (see the sketch below)
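A minimal numpy sketch of one merge-SOM step following the formulas above (parameter values are made up, neighborhood cooperation is omitted for brevity, and whether the merged context is computed before or after adapting the winner is a detail glossed over here):

import numpy as np

def msom_step(W, C, x, Ct, alpha=0.5, beta=0.5, eta=0.1):
    """W, C: weights and contexts, shape (neurons, dim); x: current entry;
    Ct: merged context from the previous winner. Returns the new merged context."""
    d = alpha * np.sum((x - W) ** 2, axis=1) + (1 - alpha) * np.sum((Ct - C) ** 2, axis=1)
    i = np.argmin(d)                        # winner
    W[i] += eta * (x - W[i])                # adapt weight towards the entry
    C[i] += eta * (Ct - C[i])               # adapt context towards the merged context
    return beta * W[i] + (1 - beta) * C[i]  # merged context for the next step

# usage over a sequence
dim, neurons = 2, 50
W, C = np.random.rand(neurons, dim), np.random.rand(neurons, dim)
Ct = np.zeros(dim)
for x in np.random.rand(200, dim):
    Ct = msom_step(W, C, x, Ct)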
59 Merge SOM
Example of the merged contexts (equal weighting):
C1 = (42 + 50)/2 = 46
C2 = (33 + 45)/2 = 39
C3 = (33 + 38)/2 = 35.5
60 Merge neural gas / SOM
- Speaker identification, Japanese vowel "ae" (UCI KDD archive)
- 9 speakers, 30 articulations each; each articulation is a time series of 12-dimensional cepstrum vectors
- MNG with 150 neurons: 2.7% test error; MNG with 1000 neurons: 1.6%; rule-based: 5.9%; HMM: 3.8%
61 Merge neural gas / SOM
Neuron (w, c) processing the sequence x_t, x_{t-1}, x_{t-2}, ..., x_0:
- w is compared to the current entry: ‖x_t − w‖²
- c is compared to the merge context C_t, the content of the winner for the past x_{t-1}, x_{t-2}, ..., x_0: ‖C_t − c‖²
- training: adapt w towards x_t, adapt c towards C_t
62 General recursive SOM
Neuron (w, c) processing x_t, x_{t-1}, x_{t-2}, ..., x_0:
- w is compared to the current entry x_t: ‖x_t − w‖²
- c is compared to a context C_t computed from the past x_{t-1}, x_{t-2}, ..., x_0: ‖C_t − c‖²
- choice of the context: RSOM/TKM: the neuron itself; MSOM: winner content; SOMSD: winner index; RecSOM: all activations
- training: adapt w towards x_t, adapt c towards C_t
63 General recursive SOM
- Experiment
  - Mackey-Glass time series
  - 100 neurons
  - different lattices
  - different contexts
  - evaluation by the temporal quantization error: average of (mean activity represented k steps into the past − observed activity k steps into the past)², taken over k
64 General recursive SOM
(figure: temporal quantization error, plotted from "now" back into the past, for SOM, NG, RSOM, RecSOM, SOMSD, HSOMSD and MNG)
65 Merge SOM
- Theory (capacity)
  - MSOM can simulate finite automata
  - TKM/RSOM cannot
  - ⇒ MSOM is strictly more powerful than TKM/RSOM!
(figure: simulation of an automaton transition δ(state, input), with inputs encoded as unit vectors such as (1,0,0,0))
66 General recursive SOM
(capacity results for the normalised WTA context)
67 Conclusions: unsupervised networks
- Recurrence for unsupervised networks
  - allows mining/visualization of temporal contexts
  - different choices of the context yield different capacity/efficiency
  - ongoing topic of research
68 Structures and neural networks
- Feedforward networks
  - rule insertion/extraction to link symbolic descriptions and neural networks
  - kernels for structures to process structured data
- Recursive networks for structure processing
  - standard recursive networks for dealing with tree-structured inputs
  - contextual recursive cascade correlation for IO-isomorphic transductions on acyclic graphs
- Unsupervised networks
  - variety of recursive SOM models with different capacity
  - relevant choice: the context representation