Title: Tutorial: Neural methods for nonstandard data
1. Tutorial: Neural methods for non-standard data
- Barbara Hammer, University of Osnabrück
- Brijnesh J. Jain, Technical University of Berlin
2. Non-standard data
3. What are standard data?
- real vectors (x1, x2, …, xn) with appropriate scaling and low dimensionality n
- examples: the mushroom dataset, the wine data, and many others
4. Why are standard data not sufficient?
Well, sometimes they are; let's consider one real-life example.
5. Ouch! Rrrrrring!
6. Standard data
[Diagram: apples and pears are described by feature encoding (shape, softness); a labelled training set (apple/pear encoded as 1/0) is used for training and prediction.]
In this case, we're happy with standard data.
7. Why are standard data not sufficient?
But sometimes they are not; let's consider another real-life example.
8. (No transcript)
9. Non-standard data
- standard real vectors with high dimensionality, missing entries, or inappropriate scaling
- sets
- functions: Rossi/Conan-Guez/El Golli, "Clustering functional data with the SOM algorithm"; Delannay/Rossi/Conan-Guez, "Functional radial basis function network"; Rossi/Conan-Guez, "Functional preprocessing for multilayer perceptrons"
- sequences
- tree structures: Micheli/Portera/Sperduti, "A preliminary experimental comparison of recursive neural networks and a tree kernel method for QSAR/QSPR regression tasks"
- graph structures: Bianchini/Maggini/Sarti/Scarselli, "Recursive networks for processing graphs with labelled edges"; Geibel/Jain/Wysotzki, "SVM learning with the SH inner product"; Jain/Wysotzki, "The maximum weighted clique problem and Hopfield networks"
10. Non-standard data
- an a priori unlimited number of basic constituents
- constituents often described by real vectors
- relations between constituents which include important information
11. Methods to deal with non-standard data
12. Methods to deal with non-standard data
[Diagram: the two challenges of non-standard data, an a priori unlimited number of basic constituents and the relations between them, are addressed by three families of methods: standard feature encoding, similarity-based approaches, and recursive processing.]
13. Standard feature encoding
Encode each object as a vector (x1, x2, …, xn).
Pros and cons:
+ standard ESANN tools apply: dimensionality reduction, data normalization, independent component analysis, feature ranking, …
- the encoding depends on the application area
14Similarity-based approaches
wx k(x,x) d(w,x)
The similarity measure (dot product, kernel,
distance measure, ) is expanded to non-standard
data.
define the similarity measure
define, how to represent/adapt w
wx k(x,x) d(w,x)
x
real number
think about an efficient compu- tation
- functional networks
- unsupervised models
- kernel methods and SVM
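The three design steps above can be sketched for one concrete non-standard domain: strings compared by the edit distance, plugged into a nearest-neighbour classifier. The dataset, labels, and helper names here are purely illustrative, not from the tutorial.

```python
# Illustrative sketch: a similarity-based classifier for non-standard data
# (strings). Step 1: define a dissimilarity measure (edit distance).
# Step 2: "prototypes" are simply the stored training strings.
# Step 3: an efficient computation via dynamic programming.

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def nearest_neighbour(x: str, training_set):
    """Classify x by the label of its closest training string."""
    return min(training_set, key=lambda pair: edit_distance(x, pair[0]))[1]

data = [("GAGAGA", "family A"), ("GAT", "family B")]
print(nearest_neighbour("GAGAG", data))  # → family A
```

Any dissimilarity computable on the structured objects themselves can be swapped in for the edit distance without touching the classifier.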
15. Recursive processing
Instead of processing one vector (x1, x2, …, xn) at once, each constituent of the structure is processed separately, within the context set by the relations (recurrence). Two design steps:
- define how context is represented
- define the order of processing (for training and output)
Instances:
- partially recurrent systems
- fully recurrent systems
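For sequences, this recursive scheme reduces to updating a context as each element is read. A minimal sketch follows; the transfer function (a tanh of the input plus weighted context) is an illustrative assumption, not from the tutorial.

```python
# Minimal sketch of recursive processing for a sequence:
# each element is processed in the context of everything seen so far,
# c_t = f(x_t, c_{t-1}). The transfer function f is illustrative.
import math

def f(x, context, alpha=0.5):
    # combine the current input with the previous context
    return math.tanh(x + alpha * context)

def process_sequence(xs, c0=0.0):
    c = c0
    for x in xs:          # order of processing: left to right
        c = f(x, c)
    return c              # the final context summarises the whole sequence

print(process_sequence([0.1, -0.2, 0.3]))
```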
16. Similarity-based approaches
1. functional networks
2. unsupervised models
3. kernel methods and SVM
17. Functional networks
For functional data f, given as samples (x1, f(x1)), (x2, f(x2)), …, (xt, f(xt)), possibly high-dimensional, with missing values or different sampling, a functional neuron computes the dot product w · f = ∫ w(x) f(x) dx.
→ embed into the vector space of square-integrable functions
References:
- functional data analysis: Ramsay/Silverman
- linear models: Hastie/Mallows, Marx/Eilers, Cardot/Ferraty/Sarda, James/Hastie
- non-parametric models: Ferraty/Vieu
18. Functional networks
The functional dot product w · f = ∫ w(x) f(x) dx over the samples (x1, f(x1)), …, (xt, f(xt)) is approximated by a finite sum.
Neural networks for functions (this session):
- functional multilayer perceptrons and their applications: Rossi/Conan-Guez
- functional RBFs: Delannay/Rossi/Conan-Guez/Verleysen
- functional SOMs: Rossi/Conan-Guez/El Golli
- operator networks for time-dependent functions: Chen/Chen, Back/Chen
- approximation completeness of MLPs for general input spaces: Stinchcombe
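The finite-sum approximation of the functional dot product can be sketched numerically; the weight function, test function, and sampling grid below are illustrative assumptions.

```python
# Sketch of the functional-network idea: the linear functional
# w(f) = integral of w(x) f(x) dx is approximated by a finite sum
# over the sampling points (here a midpoint Riemann sum).
import math

def functional_dot(w, f, xs):
    """Approximate the integral of w(x) f(x) over the grid xs."""
    total = 0.0
    for a, b in zip(xs, xs[1:]):
        mid = 0.5 * (a + b)
        total += w(mid) * f(mid) * (b - a)
    return total

# example: w(x) = 1 on [0, 1], f(x) = sin(pi x)  →  integral is 2/pi
xs = [i / 1000 for i in range(1001)]
print(functional_dot(lambda x: 1.0, lambda x: math.sin(math.pi * x), xs))
```

With irregular or partly missing sampling, the same sum runs over whatever grid points are available, which is exactly why the functional view copes with different samplings of f.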
19. Unsupervised models
For general data x with a distance measure d(w, x), or only a distance matrix d(x, x′):
- methods based on d(x, x′) only: MDS, ISOMAP, ISODATA, …
- auto-encoder cost function
- SOM for proximity data: Graepel/Obermayer
- SOM for general distance metrics: Kohonen
- SOM for graph structures via the edit distance: Bunke et al.
- batch-SOM and the generalized mean; adaptation in discrete steps
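Prototype adaptation in discrete steps from a distance matrix alone can be sketched as follows, in the spirit of the median/batch variants above; the data, the number of prototypes, and the update schedule are illustrative assumptions.

```python
# Sketch of prototype-based clustering when only a distance matrix is
# available: prototypes are restricted to data points and adapted in
# discrete steps via the generalised median of their cluster.

def median_clustering(D, prototypes, steps=10):
    """D: symmetric distance matrix; prototypes: list of data indices."""
    n = len(D)
    for _ in range(steps):
        # assignment: each point goes to its closest prototype
        clusters = {p: [] for p in prototypes}
        for i in range(n):
            closest = min(prototypes, key=lambda p: D[i][p])
            clusters[closest].append(i)
        # discrete adaptation: each prototype jumps to the generalised
        # median of its cluster (the member minimising summed distances)
        prototypes = [
            min(members, key=lambda m: sum(D[m][j] for j in members))
            if members else p
            for p, members in clusters.items()
        ]
    return prototypes

D = [[0, 1, 5, 6],
     [1, 0, 5, 6],
     [5, 5, 0, 1],
     [6, 6, 1, 0]]
print(median_clustering(D, [0, 3]))  # → [0, 2]
```

A full median SOM would add neighbourhood cooperation between prototypes; this sketch keeps only the assignment/median-update loop.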
20. Kernel methods
For general data x with a kernel matrix k(x, x′); k must be positive semi-definite.
- design/closure properties of kernels: Haussler, Watkins
- taxonomy: Gärtner
- concrete kernels for bioinformatics and text processing, e.g. for sequences, trees, and graphs, ranging from syntax to semantics:
  - count common substructures
  - derived from local transformations
  - derived from a probabilistic model
21Kernel methods
GA AG AT
efficiency DP suffix trees
GAGAGA
3 2 0
3
GAT
1 0 1
kernel methods - common substructures
which substructures, partial matches
locality improved kernel Sonnenburg et al., bow
Joachims string kernel Lodhi et al., spectrum
kernel Leslie et al. word-sequence kernel
Cancedda et al.
strings
trees
convolution kernels for language Collins/Duffy,
Kashima/Koyanagi, Suzuki et al. kernels for
relational learning Zelenko et al.,Cumby/Roth,
Gärtner et al.
Micheli/Portera/Sperduti
tree kernels in chemistry
graphs
graph kernels based on paths Gärtner et
al.,Kashima et al.
Geibel/Jain/Wysotzki
Schur/Hadamard product
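The counting example above is exactly a spectrum-style kernel. A minimal sketch, assuming substring length k = 2 and illustrative function names:

```python
# Sketch of a substructure-counting kernel on strings: represent each
# string by the counts of its contiguous length-k substrings and take
# the dot product of the two count vectors.
from collections import Counter

def spectrum(s: str, k: int = 2) -> Counter:
    """Count all contiguous substrings of length k."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))

def spectrum_kernel(s: str, t: str, k: int = 2) -> int:
    cs, ct = spectrum(s, k), spectrum(t, k)
    return sum(cs[sub] * ct[sub] for sub in cs)

# the example from the slide: GA occurs 3 times in GAGAGA and once in
# GAT, AG twice vs. never, AT never vs. once  →  3·1 + 2·0 + 0·1 = 3
print(spectrum_kernel("GAGAGA", "GAT"))  # → 3
```

The naive version above enumerates substrings explicitly; the dynamic-programming and suffix-tree techniques mentioned on the slide compute the same value without materialising the count vectors.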
22Kernel methods
describe by probabilistic model P(x)
compare vectors from P(x)
kernel methods probabilistic models
Fisher kernel Jaakkola et al., Karchin et al.,
Pavlidis et al., Smith/Gales, Sonnenburg et al.,
Siolas et al. tangent vector of log odds Tsuda
et al. marginalized kernels Tsuda et al.,
Kashima et al.
vector derived from one model, e.g. gradient of
log-likelihood
kernel of Gaussian models Moreno et al.,
Kondor/Jebara
kernel derived from a separate model for each
data point
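The "gradient of the log-likelihood" construction can be illustrated with a deliberately simple toy model, a one-dimensional Gaussian with unit variance; everything below is an illustrative assumption, not the form used in the cited papers.

```python
# Toy sketch of the Fisher-kernel idea: fit one probabilistic model,
# map each data point to the gradient of its log-likelihood w.r.t. the
# model parameters (the Fisher score), and compare those score vectors
# with an ordinary dot product.

def fisher_score(x: float, mu: float) -> float:
    # d/dmu log N(x | mu, 1) = x - mu
    return x - mu

def fisher_kernel(x: float, y: float, mu: float) -> float:
    return fisher_score(x, mu) * fisher_score(y, mu)

data = [0.0, 1.0, 2.0]
mu = sum(data) / len(data)          # fitted model parameter
print(fisher_kernel(0.0, 2.0, mu))  # → (0-1)·(2-1) = -1.0
```

The same recipe carries over to structured data whenever a parametric model P(x) (e.g. an HMM over sequences) is available: the score vector turns each structured object into an ordinary real vector.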
23Kernel methods
is similar to
expand to a global kernel
kernel methods local transformations
local neighborhood, generator H
diffusion kernel Kondor/Lafferty,
Lafferty/Lebanon, Vert/Kanehisa
expansion via matrix exponentiation
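The expansion step can be sketched concretely, assuming the generator H is the negative graph Laplacian of a small illustrative graph (one standard choice; the graph and beta are made up for the example).

```python
# Sketch of the diffusion-kernel construction: a local generator H is
# expanded to a global kernel via the matrix exponential K = exp(beta*H).
import numpy as np

def diffusion_kernel(adjacency: np.ndarray, beta: float = 1.0) -> np.ndarray:
    degrees = np.diag(adjacency.sum(axis=1))
    H = adjacency - degrees                 # negative graph Laplacian
    # matrix exponential via eigendecomposition (H is symmetric)
    eigvals, eigvecs = np.linalg.eigh(H)
    return eigvecs @ np.diag(np.exp(beta * eigvals)) @ eigvecs.T

# path graph 1 - 2 - 3
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
K = diffusion_kernel(A, beta=0.5)
print(K)  # symmetric positive definite; nearby nodes get larger entries
```

Because K is a matrix exponential of a symmetric generator, positive definiteness comes for free, which is the point of the construction.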
24. Recursive models
1. partial recurrence
2. full recurrence
25Partial recurrence
current input
sequence
context
for recursive structures
tree structure
recurrent networks for sequences ? ESANN02
ct f(xt,ct-1)
recursive networks for tree structures
ct f(xt,cleft,cright)
- principled dynamics RAAM, etc. Pollack, Plate,
Sperduti, - recursive networks including training
Goller/Küchler, Frasconi/Gori/Sperduti,
Sperduti/Starita - applications logic, pictures, documents,
chemistry, bioinformatics, fingerprints, parsing,
Baldi, Bianchini, Bianucci, Brunak, Costa,
Diligenti, Frasconi, Goller, Gori, Hagenbuchner,
Küchler, Maggini, Micheli, Pollastri, Scarselli,
Schmitt, Sperduti, Starita, Soda, Vullo, - theory Bianchini, Frasconi, Gori, Hammer,
Küchler, Micheli, Scarselli, Sperduti,
Micheli/Portera/Sperduti
tree kernels in chemistry compared to recursive
networks
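The tree recursion ct = f(xt, cleft, cright) can be sketched as a bottom-up pass over a binary tree; the transfer function and the example tree are illustrative assumptions.

```python
# Minimal sketch of recursive processing on binary trees: the code of a
# node combines its label with the codes of its children,
# c = f(x, c_left, c_right). Empty children get a fixed initial context.
import math

def f(x, c_left, c_right):
    return math.tanh(x + 0.5 * c_left + 0.5 * c_right)

def encode(tree, c0=0.0):
    """tree is (label, left_subtree, right_subtree) or None."""
    if tree is None:
        return c0
    x, left, right = tree
    # children first: the order of processing follows the tree structure
    return f(x, encode(left, c0), encode(right, c0))

tree = (0.2, (0.1, None, None), (-0.3, None, None))
print(encode(tree))  # a single real-valued code for the whole tree
```

In a trained recursive network, f would carry adaptable weights and the code would be a vector rather than a scalar; the processing order is identical.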
26Partial recurrence
direct. acyclic graphs
spatial data
undirected graphs
GAGAGA
for almost recursive structures
unlimited fan-out, no positioning
bicausal networks Baldi,Brunak,Frasconi,Pollastri
,Soda
generalized recursive networks Pollastri,Baldi,Vu
llo,Frasconi
Bianchini/Maggini/ Sarti/Scarselli
contextual cascade correlation Micheli,Sperduti,S
ona
extensions to labelled directed acyclic graphs
with unlimited fan-out and no positioning of the
children
27Partial recurrence
current input
distance w-x ???
context
recursive structures - unsupervised
which context?
leaky integration TKM, RSOM, SARDNET,
Euliano/Principe,Farkas/Miikkulainen,James/Miikku
lainen, Kangas, Koskela/Varsta/Heikkonen/Kaski,
Varsta/Heikkone/Lampinen, Chappell/Taylor,
Wiemer, recursive SOM Voegtlin SOMSD
Hagenbuchner/Sperduti/Tsoi MSOM
Strickert/Hammer
no explicit context, sequences
net activation, seq.
winner index, trees
winner content, seq.
overview Barretto/Araujo/Kremer,
Hammer/Micheli/Sperduti/Strickert general
framework theory Hammer/Micheli/Sperduti/Strick
ert
28Full recurrence
synchronous/ asynchronous update until convergence
for graph structures
Hopfield networks optimize an energy function ?
solve difficult problems Hopfield/Tank
graph matching problem
find structure preserving permutation matrix
solve a maximum clique problem in an association
graph
complexity unknown
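The Hopfield-style energy descent for the clique problem can be sketched as follows. The energy weights, update schedule, and example graph are illustrative assumptions, and a descent of this kind only guarantees a maximal clique, not necessarily the maximum one.

```python
# Illustrative sketch of a Hopfield-style network for the clique problem:
# one binary neuron per vertex; the energy rewards switched-on vertices
# and penalises selecting two non-adjacent vertices. Asynchronous updates
# run until convergence; the stable state encodes a maximal clique.

def hopfield_max_clique(adjacency, beta=2.0, sweeps=20):
    n = len(adjacency)
    state = [0] * n                      # illustrative initial state
    for _ in range(sweeps):
        changed = False
        for i in range(n):               # asynchronous update
            conflicts = sum(state[j] for j in range(n)
                            if j != i and not adjacency[i][j])
            new = 1 if 1.0 - beta * conflicts > 0 else 0
            if new != state[i]:
                state[i], changed = new, True
        if not changed:                  # converged to a stable state
            break
    return [i for i in range(n) if state[i]]

# triangle 0-1-2 plus a pendant vertex 3 attached to 2
A = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
print(hopfield_max_clique(A))  # → [0, 1, 2]
```

With beta > 1, a stable state can contain no selected non-adjacent pair (a clique) and no unselected vertex compatible with the whole selection (maximality); on this particular graph the descent happens to find the maximum clique.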
29Full recurrence
max clique
structure match permutation
association graph
for graph structures
direct formulations Li/Nasrabadi,Lin et al.
self-amplification, deterministic annealing,
softmax for penalty terms Gold/Rangarajan/Mjolnes
s, Suganthan et al.
Jain/Wysotzki
solution via classical (advanced)
Hopfield-network, various applications, noise
Jain/Wysotzki
maximum weighted clique
solution via replicator dynamics Pelillo et al.
Geibel/Jain/Wysotzki
compute Schur- Hadamard product
30. Conclusions
31. Neural methods for non-standard data
- standard feature encoding
- similarity-based approaches: functional networks, unsupervised models, kernel methods and SVM
- recursive processing: partial recurrence, full recurrence
- combinations thereof, such as Geibel/Jain/Wysotzki
32. (No transcript)