Transcript and Presenter's Notes

Title: Neuroinformatics


1
Neuroinformatics
  • Barbara Hammer, Institute of Informatics,
  • Clausthal University of Technology

2
problems in winter
3
hartelijk bedankt (Dutch: "thank you very much")

meinauto(X) :- left(Y,X), holländisch(Y), rot(Y).
meinauto(X) :- kennzeichen(X,Y), enthält(Y,Z), enthält(Y,14).
enthält(X_,X).
...

(German predicate names: meinauto = my car, holländisch = Dutch, rot = red, kennzeichen = license plate, enthält = contains)

GS-Z-14
pattern recognition → vectors
speech recognition, time series processing → sequences
symbolic, general data → tree structures, graphs
4
Part 5: Symbolic structures and neural networks
  • Feedforward networks
  • The good old days: KBANN and co.
  • Useful neurofuzzy systems, data mining pipeline
  • State of the art: structure kernels
  • Recursive neural networks
  • The general idea: recursive distributed representations
  • One breakthrough: recursive networks
  • Going on: towards more complex structures
  • Unsupervised methods
  • Recursive SOM/NG

5
The good old days: KBANN and co.
feedforward neural network
  • black box
  • distributed representation
  • connection to rules for symbolic I/O ?

(diagram: a feedforward network of neurons computing f_w: R^n → R^o, mapping input x to output y)
6
The good old days: KBANN and co.
  • Knowledge Based Artificial Neural Networks [Towell/Shavlik, AI 94]
  • start with a network which represents known rules
  • train using additional data
  • extract a set of symbolic rules after training

7
The good old days: KBANN and co.
8
The good old days: KBANN and co.
train on the data: use some form of backpropagation, and add a penalty to the error, e.g. for changing the weights (a minimal sketch follows below)
  • The initial network biases the training result, but
  • There is no guarantee that the initial rules are preserved
  • There is no guarantee that the hidden neurons maintain their semantics
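For illustration, this training step can be sketched as plain gradient descent on the error plus a penalty term that discourages moving away from the rule-derived initial weights. This is a minimal sketch, not the original KBANN procedure; the single-layer sigmoid network, the data, and all parameter names are illustrative.

  import numpy as np

  def train_kbann_like(X, y, W_init, lr=0.1, lam=0.01, epochs=200):
      """Backpropagation on a single sigmoid layer, with a penalty that
      keeps the weights close to the rule-derived initialization W_init."""
      W = W_init.copy()
      for _ in range(epochs):
          out = 1.0 / (1.0 + np.exp(-X @ W))                 # sigmoid predictions
          grad_err = X.T @ ((out - y) * out * (1.0 - out))   # error gradient
          grad_pen = lam * (W - W_init)                      # penalty for changing the weights
          W -= lr * (grad_err + grad_pen)
      return W

  # illustrative use: W0 stands in for a rule-derived initialization
  X = np.random.rand(50, 3)
  y = (X[:, 0] > 0.5).astype(float)
  W0 = np.array([5.0, 0.0, 0.0])
  W = train_kbann_like(X, y, W0)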

9
The good old days: KBANN and co.
(complete) rules
  • There is no exact direct correspondence between a neuron and a single rule/boolean variable
  • It is NP-complete to find a minimum logical description for a trained network [Golea, AISB'96]
  • Therefore, a couple of different rule extraction algorithms have been proposed, and this is still a topic of ongoing research

10
The good old days: KBANN and co.
(complete) rules
decompositional approach
pedagogical approach
11
The good old days: KBANN and co.
  • Decompositional approaches
  • subset algorithm, MofN algorithm describe single
    neurons by sets of active predecessors
    Craven/Shavlik, 94
  • local activation functions (RBF like) allow an
    approximate direct description of single neurons
    Andrews/Geva, 96
  • MLP2LN biases the weights towards 0/-1/1 during
    training and can then extract exact rules Duch
    et al., 01
  • prototype based networks can be decomposed along
    relevant input dimensions by decision tree nodes
    Hammer et al., 02
  • Observation
  • usually some variation of if-then rules is
    achieved
  • small rule sets are only achieved if further
    constraints guarantee that single weights/neurons
    have a meaning
  • tradeoff between accuracy and size of the
    description

12
The good old days: KBANN and co.
  • Pedagogical approaches
  • extraction of conjunctive rules by extensive
    search Saito/Nakano 88
  • interval propagation Gallant 93, Thrun 95
  • extraction by minimum separation
    Tickle/Diderich, 94
  • extraction of decision trees [Craven/Shavlik, 94] (a sketch of this pedagogical idea follows below)
  • evolutionary approaches [Markovska, 05]
  • Observation
  • usually some variation of if-then rules is achieved
  • symbolic rule induction is required, with a little (or a bit more) help from a neural network
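To make the pedagogical idea concrete, here is a hedged sketch in the spirit of Craven/Shavlik's approach (not their original algorithm): the trained network is treated as a black box, the data are relabeled with its predictions, and a small decision tree is induced on those labels and read off as if-then rules. scikit-learn is used for the tree; trained_net is an assumed callable, and the stand-in "network" below is purely illustrative.

  import numpy as np
  from sklearn.tree import DecisionTreeClassifier, export_text

  def extract_rules_pedagogically(trained_net, X, max_depth=3):
      """Query the black-box network on X and fit a small decision tree
      to its answers; the tree is returned as a readable rule set."""
      y_net = trained_net(X)                               # network predictions (class labels)
      tree = DecisionTreeClassifier(max_depth=max_depth).fit(X, y_net)
      return export_text(tree)

  # illustrative usage with a stand-in "network"
  X = np.random.rand(200, 4)
  fake_net = lambda X: (X[:, 0] + X[:, 2] > 1.0).astype(int)
  print(extract_rules_pedagogically(fake_net, X))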

13
The good old days: KBANN and co.
  • What is this good for?
  • Nobody (inside neuroinformatics) uses FNNs these days ?
  • Insertion of prior knowledge might be valuable, but efficient training algorithms allow one to substitute it with additional training data (generated via rules) ?
  • Validation of the network output might be
    valuable, but there exist alternative (good)
    guarantees from statistical learning theory ?
  • If-then rules are not very interesting since
    there exist good symbolic learners for learning
    propositional rules for classification ?
  • Propositional rule insertion/extraction is often
    an essential part of more complex rule
    insertion/extraction mechanisms ?
  • one can extend it to automaton insertion/extraction for recurrent networks (basically, the transition function of the automaton is inserted/extracted this way) ?
  • People e.g. in the medical domain also want an
    explanation for a classification ?
  • extension beyond propositional rules interesting
    ?
  • There is at least one application domain (inside
    neuroinformatics) where if-then rules are very
    interesting and not so easy to learn
    fuzzy-control ?

14
Useful neurofuzzy systems
(diagram: a process with input, observation, and control)
Fuzzy control:
if (observation ∈ FM_I) then (control ∈ FM_O)
15
Useful neurofuzzy systems
Fuzzy control:
if (observation ∈ FM_I) then (control ∈ FM_O)
Neurofuzzy control
Benefit: the form of the fuzzy rules (i.e. the neural architecture) and the shape of the fuzzy sets (i.e. the neural weights) can be learned from data!
16
Useful neurofuzzy systems
  • NEFCON implements Mamdani control [Nauck/Klawonn/Kruse, 94]
  • ANFIS implements Takagi-Sugeno control [Jang, 93]
  • and many others
  • Learning
  • of rules: evolutionary methods or clustering
  • of fuzzy set parameters: backpropagation or some form of Hebbian learning (a minimal sketch follows below)
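As an illustration, a zero-order Takagi-Sugeno inference step in the spirit of ANFIS can be written down directly: Gaussian membership functions play the role of the adaptive weights, firing strengths are normalized, and the output is the weighted sum of the rule consequents. In a neurofuzzy system the centers, widths and consequents would be tuned, e.g. by backpropagation; all names and values below are illustrative.

  import numpy as np

  def takagi_sugeno(x, centers, widths, consequents):
      """Zero-order Takagi-Sugeno inference for a scalar observation x.
      Rule i: 'if x is Gaussian(centers[i], widths[i]) then y = consequents[i]'."""
      firing = np.exp(-((x - centers) ** 2) / (2.0 * widths ** 2))  # rule firing strengths
      weights = firing / firing.sum()                               # normalized strengths
      return float(np.dot(weights, consequents))                    # weighted consequents

  # three illustrative rules: "low", "medium", "high" observation -> control value
  centers = np.array([0.0, 0.5, 1.0])
  widths = np.array([0.2, 0.2, 0.2])
  consequents = np.array([-1.0, 0.0, 1.0])
  print(takagi_sugeno(0.3, centers, widths, consequents))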

17
State of the art: structure kernels
kernel k(x, x′)
data
just compute pairwise distances for this complex
data using structure information
sets, sequences, tree structures, graph
structures
18
State of the art: structure kernels
  • Closure properties of kernels [Haussler, Watkins]
  • Principled problems for complex structures: computing informative graph kernels is at least as hard as graph isomorphism [Gärtner]
  • Several promising proposals - taxonomy [Gärtner]:

(taxonomy, ranging from syntax to semantics: count common substructures, kernels derived from a probabilistic model, kernels derived from local transformations)
19
State of the art: structure kernels
  • Count common substructures

Efficient computation: dynamic programming, suffix trees

           GA  AG  AT
  GAGAGA    3   2   0
  GAT       1   0   1

  k(GAGAGA, GAT) = 3·1 + 2·0 + 0·1 = 3

locality improved kernel [Sonnenburg et al.], bag of words [Joachims], string kernel [Lodhi et al.], spectrum kernel [Leslie et al.], word-sequence kernel [Cancedda et al.]
convolution kernels for language [Collins/Duffy, Kashima/Koyanagi, Suzuki et al.], kernels for relational learning [Zelenko et al., Cumby/Roth, Gärtner et al.]
graph kernels based on paths or subtrees [Gärtner et al., Kashima et al.], kernels for prolog trees based on similar symbols [Passerini/Frasconi/deRaedt]
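The counting behind the table above can be sketched in a few lines (a naive 2-mer spectrum kernel; the slide's point is that dynamic programming or suffix trees make this efficient for long sequences):

  from collections import Counter

  def spectrum_kernel(s, t, k=2):
      """k(s, t) = sum over all k-mers of (count in s) * (count in t)."""
      cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
      ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
      return sum(cs[m] * ct[m] for m in cs)

  print(spectrum_kernel("GAGAGA", "GAT"))   # 3*1 (GA) + 2*0 (AG) + 0*1 (AT) = 3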
20
State of the art: structure kernels
  • Derived from a probabilistic model

describe the data by a probabilistic model P(x)
compare characteristics of P(x)

Fisher kernel [Jaakkola et al., Karchin et al., Pavlidis et al., Smith/Gales, Sonnenburg et al., Siolas et al.], tangent vector of log odds [Tsuda et al.], marginalized kernels [Tsuda et al., Kashima et al.], kernel of Gaussian models [Moreno et al., Kondor/Jebara]
21
State of the art: structure kernels
  • Derived from local transformations

local neighborhood ("is similar to"), generator H
expand to a global kernel (a small numeric sketch follows below)

diffusion kernel [Kondor/Lafferty, Lafferty/Lebanon, Vert/Kanehisa]
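A small numeric sketch of the diffusion kernel idea: take a local generator H (here the negative graph Laplacian of a toy path graph) and expand it into a global kernel K = exp(βH) via the matrix exponential. The graph and β are illustrative, not taken from the cited papers.

  import numpy as np

  # adjacency matrix of a small path graph 0 - 1 - 2 - 3 (illustrative)
  A = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
  H = A - np.diag(A.sum(axis=1))        # generator: negative graph Laplacian
  beta = 0.5

  # K = exp(beta * H) via the eigendecomposition of the symmetric generator
  vals, vecs = np.linalg.eigh(H)
  K = vecs @ np.diag(np.exp(beta * vals)) @ vecs.T
  print(K)                              # positive semidefinite similarity between the nodes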
22
State of the art: structure kernels
  • Intelligent preprocessing (kernel extraction)
    allows an adequate integration of
    semantic/syntactic structure information
  • This can be combined with state of the art neural
    methods such as SVM
  • Very promising results for
  • Classification of documents, text Duffy, Leslie,
    Lodhi,
  • Detecting remote homologies for genomic sequences
    and further problems in genome analysis
    Haussler, Sonnenburg, Vert,
  • Quantitative structure activity relationship in
    chemistry Baldi et al.

23
Conclusions: feedforward networks
  • propositional rule insertion and extraction (to some extent) are possible ?
  • useful for neurofuzzy systems ?
  • structure-based kernel extraction followed by learning with SVM yields state of the art results ?
  • but sequential instead of fully integrated neuro-symbolic approach ?
  • FNNs themselves are restricted to flat data which can be processed in one shot. No recurrence ?

24
Recursive networks
The general idea: recursive distributed representations
  • How to turn tree structures/acyclic graphs into a
    connectionist representation?

25
The general idea: recursive distributed representations

recursion!

(diagram: a network f: R^i x R^c x R^c → R^c with one input field and two context fields; its output is reused as context)

f yields f_enc, where f_enc(ξ) = 0 (ξ the empty tree) and f_enc(a(l,r)) = f(a, f_enc(l), f_enc(r))
26
The general idea: recursive distributed representations

encoding f_enc: (binary trees with labels in R^n) → R^c
  f_enc(ξ) = 0
  f_enc(a(l,r)) = f(a, f_enc(l), f_enc(r))

decoding h_dec: R^o → (binary trees with labels in R^n)
  h_dec(0) = ξ
  h_dec(x) = h_0(x)(h_dec(h_1(x)), h_dec(h_2(x)))

with networks f: R^(n+2c) → R^c, g: R^c → R^o, h: R^o → R^(n+2o)
27
The general idea: recursive distributed representations
  • recursive distributed description [Hinton, 90]
  • general idea without a concrete implementation ?
  • tensor construction [Smolensky, 90]
  • encoding/decoding given by (a,b,c) → a⊗b⊗c
  • increasing dimensionality ?
  • Holographic reduced representation [Plate, 95]
  • circular correlation/convolution (sketched below)
  • fixed encoding/decoding with fixed dimensionality (but potential loss of information) ?
  • necessity of chunking or clean-up for decoding ?
  • Binary spatter codes [Kanerva, 96]
  • binary operations, fixed dimensionality, potential loss
  • necessity of chunking or clean-up for decoding ?
  • RAAM [Pollack, 90], LRAAM [Sperduti, 94]
  • trainable networks, trained for the identity, fixed dimensionality
  • encoding optimized for the given training set ?
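For illustration, the binding and approximate unbinding of holographic reduced representations via circular convolution and correlation can be sketched with the FFT. The dimensionality and vectors below are arbitrary, and the noisy decoding shows why a clean-up memory is needed.

  import numpy as np

  def cconv(a, b):      # circular convolution: binding
      return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

  def ccorr(a, b):      # circular correlation: approximate unbinding
      return np.real(np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(b)))

  n = 512
  rng = np.random.default_rng(0)
  role = rng.standard_normal(n) / np.sqrt(n)
  filler = rng.standard_normal(n) / np.sqrt(n)
  trace = cconv(role, filler)           # encode the (role, filler) pair in fixed dimension
  decoded = ccorr(role, trace)          # noisy reconstruction of the filler
  print(np.dot(decoded, filler))        # close to 1: a clean-up memory would select 'filler'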

28
The general idea: recursive distributed representations
  • Nevertheless, the results were not promising ?
  • Theorem [Hammer]
  • There exists a fixed size neural network which can uniquely encode tree structures of arbitrary depth with discrete labels ?
  • For every code, decoding of all trees up to height T requires on the order of 2^T neurons for sigmoidal networks ?
  • ⇒ encoding seems possible, but no fixed size architecture exists for decoding

29
One breakthrough: recursive networks
  • Recursive networks [Goller/Küchler, 96]
  • do not use decoding
  • combine encoding and mapping
  • train this combination directly for the given task with backpropagation through structure
  • ⇒ an efficient, data- and problem-adapted encoding is learned (a minimal sketch follows below)

(diagram: tree → encoding → transformation → output y)
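A minimal sketch of the encoding part of such a network: a single layer f is applied bottom-up along the tree, f_enc(a(l,r)) = f(a, f_enc(l), f_enc(r)), and the root code is mapped to the output. The weights below are random and untrained; in a real recursive network W and V are learned with backpropagation through structure, and all dimensions are illustrative.

  import numpy as np

  rng = np.random.default_rng(0)
  n, c = 3, 8                                       # label dimension, code dimension
  W = rng.standard_normal((c, n + 2 * c)) * 0.1     # shared encoding weights (f)
  V = rng.standard_normal(c) * 0.1                  # output weights (transformation)

  def f_enc(tree):
      """tree is None (empty) or a triple (label, left, right); returns a code in R^c."""
      if tree is None:
          return np.zeros(c)
      label, left, right = tree
      x = np.concatenate([label, f_enc(left), f_enc(right)])
      return np.tanh(W @ x)

  def predict(tree):
      return float(V @ f_enc(tree))                 # map the root code to the output y

  leaf = (np.ones(n), None, None)
  print(predict((np.zeros(n), leaf, leaf)))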
30
One breakthrough: recursive networks
  • Applications
  • term classification Goller, Küchler, 1996
  • automated theorem proving Goller, 1997
  • learning tree automata Küchler, 1998
  • QSAR/QSPR problems Schmitt, Goller, 1998
    Bianucci, Micheli, Sperduti, Starita, 2000
    Vullo, Frasconi, 2003
  • logo recognition, image processing Costa,
    Frasconi, Soda, 1999, Bianchini et al. 2005
  • natural language parsing Costa, Frasconi, Sturt,
    Lombardo, Soda, 2000,2005
  • document classification Diligenti, Frasconi,
    Gori, 2001
  • fingerprint classification Yao, Marcialis, Roli,
    Frasconi, Pontil, 2001
  • prediction of contact maps Baldi, Frasconi,
    Pollastri, Vullo, 2002
  • protein secondary structure prediction Frasconi
    et al., 2005

31
One breakthrough: recursive networks
  • application: prediction of contact maps of proteins

(diagram: amino acid sequence x1 x2 ... x10; each residue pair such as (x2, x3) is classified as contact or no contact (0/1))
32
One breakthrough: recursive networks

(input: amino acid sequences x1 ... x10 from the PDB)

PDBselect (Ct, nCt, dist. true pos.):
  6  → 0.71, 0.998, 0.59
  12 → 0.43, 0.987, 0.55
[Pollastri, Baldi, Vullo, Frasconi, NIPS 2002]

33
One breakthrough: recursive networks
Theory: approximation completeness - for every (reasonable) function f and every ε > 0 there exists a RecNN which approximates f up to ε (with an appropriate distance measure) [Hammer]
34
Going on: towards more complex structures
  • Planar graphs

[Baldi, Frasconi, 2002]
35
Going on: towards more complex structures
  • Acyclic graphs

Contextual cascade correlation [Micheli/Sperduti, 03]
36
Contextual cascade correlation
Cascade Correlation [Fahlman/Lebiere, 1990]: given data (x,y) in R^n x R, find f such that f(x) ≈ y
minimize the error on the given data
maximize the correlation of the unit's output and the current error ⇒ the unit can serve for error correction in subsequent steps (a sketch of this candidate-training step follows below)
h_i(x) = f_i(x, h_1(x), ..., h_{i-1}(x))
etc.
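A hedged sketch of that candidate-training step, using the plain covariance instead of the full correlation measure and simple gradient ascent instead of quickprop; the data and parameters are illustrative.

  import numpy as np

  def train_candidate(X, residual, lr=0.5, epochs=500):
      """Maximize |cov(candidate unit output, current residual error)| by gradient ascent."""
      w = np.zeros(X.shape[1])
      for _ in range(epochs):
          h = np.tanh(X @ w)                                  # candidate unit output
          hc, rc = h - h.mean(), residual - residual.mean()
          sign = np.sign(np.dot(hc, rc) + 1e-12)              # maximize the magnitude
          grad = X.T @ (sign * rc * (1.0 - h ** 2)) / len(h)  # covariance gradient through tanh
          w += lr * grad
      return w

  # illustrative data: the residual depends on a direction not yet captured by the output layer
  X = np.random.randn(100, 3)
  residual = X[:, 0] - X[:, 1]
  w = train_candidate(X, residual)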
37
Contextual cascade correlation
  • few, cascaded, separately optimized neurons
  • ⇒ efficient training
  • ⇒ excellent generalization
  • ... as shown e.g. for two spirals

38
Contextual cascade correlation
for trees: recursive processing of the structure, starting at the leaves and moving towards the root

q^(-1)(h_i(v)) = (h_i(ch_1(v)), ..., h_i(ch_k(v))) gives the context (acyclic!)

h_1(v) = f_1(l(v), h_1(ch_1(v)), ..., h_1(ch_k(v)))

... not possible since the weights are frozen after adding a neuron!

h_2(v) = f_2(l(v), h_2(ch_1(v)), ..., h_2(ch_k(v)), h_1(v), h_1(ch_1(v)), ..., h_1(ch_k(v)))

etc. - no problem!
39
Contextual cascade correlation
  • cascade correlation for trees
  • init
  • repeat
  •   add h_i
  •   train f_i(l(v), q^(-1)(h_i(v)), h_1(v), q^(-1)(h_1(v)), ..., h_{i-1}(v), q^(-1)(h_{i-1}(v))) on the correlation
  •   train the output on the error

40
Contextual cascade correlation
restricted recurrence allows the units to also look at parents:
  q^(-1)(h_i(v)) = (h_i(ch_1(v)), ..., h_i(ch_k(v)))
  q^(+1)(h_i(v)) = (h_i(pa_1(v)), ..., h_i(pa_k(v)))

... would yield cycles
possible due to the restricted recurrence!

contextual cascade correlation:
  h_i(v) = f_i(l(v), q^(-1)(h_i(v)), h_1(v), q^(-1)(h_1(v)), q^(+1)(h_1(v)), ..., h_{i-1}(v), q^(-1)(h_{i-1}(v)), q^(+1)(h_{i-1}(v)))
41
Contextual cascade correlation
q^(+1) extends the context of h_i

(diagram: the context grows with i = 1, 2, 3)
42
Contextual cascade correlation
QSPR problem [Micheli/Sperduti/Sona]: predict the boiling point of alkanes (in °C). Alkanes C_n H_(2n+2): methane, ethane, propane, butane, pentane, hexane, ...
2-methyl-pentane
the boiling point depends both on n and on the number of side chains ⇒ excellent benchmark
43
Contextual cascade correlation
structure for alkanes
CH3(CH2(CH2(CH(CH2(CH3),CH(CH3,CH2(CH3))))))
44
Contextual cascade correlation
Alkanes: 150 examples, bipolar encoding of symbols, 10-fold cross-validation, number of neurons 137 (CRecCC) resp. 110/140 (RecCC); compared to an FNN with direct encoding for restricted length [Cherqaoui/Vilemin]; codomain [-164, 174]
45
Contextual cascade correlation
visualization of the contexts
46
Contextual cascade correlation
QSAR problem [Micheli/Sperduti/Sona]: predict the activity (IC50) of benzodiazepines

tranquillizers/analgesics (Valiquid, Rohypnol, Noctamid, ...)
function: potentiates the inhibitory neurotransmitter GABA
problem: cross tolerance, cross dependencies ⇒ very important to take an accurate dose
47
Contextual cascade correlation
  • structure for benzodiazepines

e.g.
bdz(h,f,h,ph,h,h,f,h,h)
bdz(h,h,h,ph,c3(h,h,h),h,c3(h,h,h),h,h)
bdz(c3(h,h,o1(c3(h,h,h))),h,h,ph,h,h,n2(o,o),h,h)
48
Contextual cascade correlation
  • Benzodiazepine 79 examples, bipolar encoding,
  • 5 test points taken from Hadjipavlou/Hansch and
    additional folds,
  • max number of neurons 13-40

49
Contextual cascade correlation
50
Contextual cascade correlation
  • Supersource transductions (the whole structure is mapped to a real number)
  • IO-isomorphic transductions
51
Contextual cascade correlation
  • Approximation completeness
  • recursive cascade correlation is approximation
    complete for trees and supersource transductions
  • contextual recursive cascade correlation is
    approximation complete for acyclic graph
    structures (with a mild structural condition) and
    IO-isomorphic transduction

52
Conclusions: recursive networks
  • Very promising neural architectures for direct
    processing of tree structures ?
  • Successful applications and mathematical
    background ?
  • Connections to symbolic mechanisms (tree
    automata) ?
  • Extensions to more complex structures (graphs)
    are under development ?
  • A few approaches which achieve structured outputs
    ?

53
Unsupervised models

self-organizing map: network given by prototypes w_i ∈ R^n placed in a lattice with positions j = (j_1, j_2)
Hebbian learning based on examples x^i and neighborhood cooperation

online learning: iterate
  choose x^i
  determine the winner j: |x^i - w_j| minimal
  adapt all w_j: w_j ← w_j + η · exp(-n(j, j_0)/σ²) · (x^i - w_j)

(a toy implementation is sketched below)
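The update above can be turned into a toy implementation directly (a 1D chain as the lattice; the learning rate, neighborhood range and data are illustrative):

  import numpy as np

  def som_online(X, n_prototypes=10, eta=0.1, sigma=2.0, epochs=20, seed=0):
      """Online SOM: for each sample, find the winner and pull every prototype
      towards the sample, weighted by its lattice distance to the winner."""
      rng = np.random.default_rng(seed)
      W = rng.uniform(X.min(), X.max(), size=(n_prototypes, X.shape[1]))
      lattice = np.arange(n_prototypes)                        # 1D chain of lattice positions
      for _ in range(epochs):
          for x in rng.permutation(X):
              j0 = np.argmin(np.linalg.norm(x - W, axis=1))    # winner
              nh = np.exp(-((lattice - j0) ** 2) / sigma ** 2) # neighborhood cooperation
              W += eta * nh[:, None] * (x - W)                 # Hebbian update
      return W

  print(som_online(np.random.rand(100, 2)))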
54
Unsupervised models
  • Represent time series of drinks of the last night

55
Old recursive SOM models
  • Temporal Kohonen Map

x_1, x_2, x_3, x_4, ..., x_t, ...
d(x_t, w_i) = |x_t - w_i| + α · d(x_{t-1}, w_i)
training: w_i → x_t

Recurrent SOM
d(x_t, w_i) = |y_t| where y_t = (x_t - w_i) + α · y_{t-1}
training: w_i → y_t
56
Old recursive SOM models
  • TKM/RSOM compute a leaky average of time series,
    i.e. they are robust w.r.t. outliers

57
Old recursive SOM models
  • TKM/RSOM compute a leaky average of time series
  • It is not clear how they can differentiate
    various contexts
  • no explicit context!

(figure: two different input sequences that are mapped to the same leaky average)
58
Merge neural gas / Merge SOM
(w_j, c_j) in R^n x R^n
  • explicit context, global recurrence
  • w_j represents the entry x_t
  • c_j represents the context, which equals the winner content of the last time step
  • distance d(x_t, w_j) = α · |x_t - w_j| + (1-α) · |C_t - c_j|
  • where C_t = γ · w_{I(t-1)} + (1-γ) · c_{I(t-1)}, I(t-1) = winner in step t-1 (merge)
  • training: w_j → x_t, c_j → C_t (a one-step sketch follows below)
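A one-step sketch of this update for scalar sequences (only the winner is adapted here, omitting neighborhood cooperation; α, γ, the learning rate and the initial map are illustrative):

  import numpy as np

  def msom_step(x_t, C_t, W, C, alpha=0.5, gamma=0.5, eta=0.1):
      """One merge-SOM step: return the winner index and the next merge context."""
      d = alpha * np.abs(x_t - W) + (1.0 - alpha) * np.abs(C_t - C)   # mixed distance
      i = int(np.argmin(d))                                           # winner I(t)
      W[i] += eta * (x_t - W[i])                                       # w_I moves towards x_t
      C[i] += eta * (C_t - C[i])                                       # c_I moves towards C_t
      C_next = gamma * W[i] + (1.0 - gamma) * C[i]                     # merged winner content
      return i, C_next

  # illustrative run on the sequence 42, 33, 33, 34 with a tiny map
  W = np.array([40.0, 35.0, 30.0])
  C = np.array([40.0, 35.0, 30.0])
  C_t = 0.0
  for x in [42.0, 33.0, 33.0, 34.0]:
      winner, C_t = msom_step(x, C_t, W, C)
  print(W, C, C_t)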

59
Merge SOM
  • Example: 42 → 33 → 33 → 34

C_1 = (42 + 50)/2 = 46
C_2 = (33 + 45)/2 = 39
C_3 = (33 + 38)/2 = 35.5
60
Merge neural gas / SOM
  • speaker identification, Japanese vowel 'ae' (UCI KDD archive)
  • 9 speakers, 30 articulations each

(input: 12-dim. cepstrum coefficients over time)

MNG, 150 neurons: 2.7% test error; MNG, 1000 neurons: 1.6% test error; rule based: 5.9%, HMM: 3.8%
61
Merge neural gas / SOM

(diagram: sequence x_t, x_{t-1}, ..., x_0; each neuron (w, c) measures |x_t - w|² and |C_t - c|²)

merge context C_t = content of the winner
training: w → x_t, c → C_t
62
General recursive SOM

(diagram: sequence x_t, x_{t-1}, ..., x_0; each neuron (w, c) measures |x_t - w|² and |C_t - c|²)

Context C_t:
  RSOM/TKM: the neuron itself
  MSOM: winner content
  SOMSD: winner index
  RecSOM: all activations

training: w → x_t, c → C_t
63
General recursive SOM
  • Experiment
  • Mackey-Glass time series
  • 100 neurons
  • different lattices
  • different contexts
  • evaluation by the temporal quantization error

average of (mean activity k steps into the past - observed activity k steps into the past)²
64
General recursive SOM
(plot: temporal quantization error, from now into the past, for SOM, NG, RSOM, RecSOM, SOMSD, HSOMSD, MNG)
65
Merge SOM
  • Theory (capacity)
  • MSOM can simulate finite automata
  • TKM/RSOM cannot
  • ⇒ MSOM is strictly more powerful than TKM/RSOM!

(diagram: an automaton transition δ(state, input) → state, realized via the merge context; inputs encoded as unit vectors, e.g. (1,0,0,0))
66
General recursive SOM
for normalised WTA context
67
Conclusions: unsupervised networks
  • Recurrence for unsupervised networks
  • allows mining/visualization of temporal contexts
  • different choices yield different
    capacity/efficiency
  • ongoing topic of research

68
Structures and neural networks
  • Feedforward
  • rule insertion/extraction to link symbolic descriptions and neural networks
  • kernels for structures to process structured data
  • Recursive networks for structure processing
  • standard recursive networks for dealing with tree-structured inputs
  • contextual recursive cascade correlation for IO-isomorphic transductions on acyclic graphs
  • Unsupervised networks
  • variety of recursive SOM models with different capacity
  • relevant choice: context representation