Tutorial: Neural methods for non-standard data

1
Tutorial: Neural methods for non-standard data
  • Barbara Hammer, University of Osnabrück
  • Brijnesh J. Jain, Technical University of Berlin

2
Non-standard data
3
What are standard data?
real vectors (x1, x2, …, xn) with appropriate scaling and low dimensionality n
examples: the mushroom dataset, the wine data, and many others
4
Why are standard data not sufficient?
well, sometimes they are;
let's consider one real-life example
5
[Image slide: "ouch!!!", "rrrrrringgggg!!!"]
6
Standard data
[Diagram: apples and pears are measured, feature encoding (shape, softness) yields vectors, the training set is labeled 1/0 (apple/pear), a network is trained, and prediction follows]
In this case, we're happy with standard data.
7
Why are standard data not sufficient?
but sometimes they are not;
let's consider another real-life example
8
(No Transcript)
9
Non-standard data
real vectors with high dimensionality, missing entries, inappropriate scaling
sets
functions
  • Rossi, Conan-Guez, El Golli: Clustering functional data with the SOM algorithm
  • Delannay, Rossi, Conan-Guez: Functional radial basis function network
  • Rossi, Conan-Guez: Functional preprocessing for multilayer perceptrons
sequences
tree structures
  • Micheli, Portera, Sperduti: A preliminary experimental comparison of recursive neural networks and a tree kernel method for QSAR/QSPR regression tasks
graph structures
  • Bianchini, Maggini, Sarti, Scarselli: Recursive networks for processing graphs with labelled edges
  • Geibel, Jain, Wysotzki: SVM learning with the SH inner product
  • Jain, Wysotzki: The maximum weighted clique problem and Hopfield networks
10
Non-standard data
a priori unlimited number of basic constituents, often described by real vectors
relations between the constituents which carry important information
11
Methods to deal with non-standard data
12
Methods to deal with non-standard data
a priori unlimited number of basic constituents, relations →
  • standard feature encoding
  • similarity based approaches
  • recursive processing
13
Standard feature encoding
encode the data as a fixed vector (x1, x2, …, xn)
pros:
  • fast
  • standard neural methods apply
cons:
  • depends on the application area
  • information loss
  • high dimensional data
→ ESANN: dimensionality reduction, data normalization, independent component analysis, feature ranking, …
14
Similarity-based approaches
w·x, k(x, x'), d(w, x): the similarity measure (dot product, kernel, distance measure, …) is expanded to non-standard data; it maps a structured input x to a real number.
  • define the similarity measure
  • define how to represent/adapt w
  • think about an efficient computation (see the sketch after this list)
methods:
  • functional networks
  • unsupervised models
  • kernel methods and SVM
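To make this recipe concrete, here is a minimal sketch (an illustration, not from the tutorial): a nearest-prototype classifier in which the similarity measure is a pluggable function, so the same code serves any non-standard data type once a distance is defined; the edit distance shown is one such plug-in for strings, and all names are illustrative.

```python
import numpy as np

def nearest_prototype(x, prototypes, labels, d):
    """Classify x by its closest prototype under a user-supplied distance d."""
    dists = [d(w, x) for w in prototypes]
    return labels[int(np.argmin(dists))]

# one possible plug-in for strings: the Levenshtein edit distance,
# computed by a small dynamic program
def edit_distance(a, b):
    D = np.zeros((len(a) + 1, len(b) + 1), dtype=int)
    D[:, 0] = np.arange(len(a) + 1)
    D[0, :] = np.arange(len(b) + 1)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            D[i, j] = min(D[i - 1, j] + 1, D[i, j - 1] + 1,
                          D[i - 1, j - 1] + (a[i - 1] != b[j - 1]))
    return D[len(a), len(b)]

print(nearest_prototype("GAT", ["GAGAGA", "TTT"], ["class A", "class B"],
                        edit_distance))
```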

15
Recursive processing
recurrence: each constituent of the structure (x1, x2, …, xn) is processed separately within the context set by the relations; training and output attach to this recursive encoding.
  • define how context is represented
  • define the order of processing
methods:
  • partially recurrent systems
  • fully recurrent systems
16
Similarity-based approaches
1. functional networks
2. unsupervised models
3. kernel methods and SVM
17
Functional networks
for functional data f, observed as samples (x1, f(x1)), (x2, f(x2)), …, (xt, f(xt)): possibly high dimensional, missing values, different sampling
→ embed into the vector space of square integrable functions; a linear neuron then computes w·f = ∫ w(x) f(x) dx
  • functional data analysis: Ramsay/Silverman
  • linear models: Hastie/Mallows, Marx/Eilers, Cardot/Ferraty/Sarda, James/Hastie
  • non-parametric models: Ferraty/Vieu
18
Functional networks
for functional data f, sampled as (x1, f(x1)), …, (xt, f(xt)): approximate w·f = ∫ w(x) f(x) dx by a finite sum (a numerical sketch follows below)
neural networks for functions:
  • operator networks for time dependent functions: Chen/Chen, Back/Chen
  • approximation completeness of MLPs for general input spaces: Stinchcombe
… this session:
  • Rossi/Conan-Guez: functional multilayer perceptron, application of functional MLPs
  • Delannay/Rossi/Conan-Guez/Verleysen: functional RBFs
  • Rossi/Conan-Guez/El Golli: functional SOMs
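A minimal numerical sketch of this approximation (an illustration under assumed choices, not the session code): a functional neuron evaluates w·f = ∫ w(x) f(x) dx as a finite quadrature sum over the sampling points, with the weight function w parametrized by adaptable coefficients on a fixed Gaussian basis.

```python
import numpy as np

# functional neuron: w·f = ∫ w(x) f(x) dx, approximated by a finite sum
# (trapezoidal rule) over the sampling points; the weight function w is
# a linear combination of fixed Gaussian bumps (an assumed basis choice)
def basis(x, centers, width=0.1):
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * width ** 2))

def functional_neuron(xs, fs, coeffs, centers):
    w = basis(xs, centers) @ coeffs   # w(x) evaluated at the sampling points
    return np.trapz(w * fs, xs)       # the finite-sum approximation

xs = np.linspace(0.0, 1.0, 50)        # sampling grid (may differ per function)
fs = np.sin(2 * np.pi * xs)           # one observed function f
centers = np.linspace(0.0, 1.0, 10)   # fixed basis centers for w
coeffs = 0.1 * np.random.randn(10)    # adaptable parameters
print(functional_neuron(xs, fs, coeffs, centers))
```

Because only the sampled values of f enter the sum, functions observed on different grids can feed the same neuron.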
19
Unsupervised models
for general data with a distance measure d(w, x) or distance matrix d(x, x'):
  • methods based on d(x, x') only: MDS, ISOMAP, ISODATA, …
  • SOM for proximity data, via an auto-encoder cost function: Graepel/Obermeyer
  • SOM for general distance metric, via batch-SOM and the generalized mean: Kohonen
  • SOM for graph structures via edit distance, with adaptation in discrete steps: Bunke et al.
(a sketch of distance-matrix-based adaptation follows below)
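A minimal sketch of such adaptation in discrete steps (an illustration; the neighborhood cooperation of a full batch SOM is omitted for brevity): prototypes are restricted to data points and updated to the generalized median of their receptive field, so only the distance matrix is needed.

```python
import numpy as np

# median-SOM-style sketch operating on a distance matrix D alone:
# each prototype is a data point, adapted in discrete steps to the
# generalized median of the points it wins
def median_som(D, n_prototypes, n_iter=10, seed=0):
    rng = np.random.default_rng(seed)
    protos = rng.choice(D.shape[0], size=n_prototypes, replace=False)
    for _ in range(n_iter):
        winners = np.argmin(D[:, protos], axis=1)  # assign points to prototypes
        for j in range(n_prototypes):
            members = np.where(winners == j)[0]
            if len(members):
                # the member minimizing the summed distance to its
                # receptive field is the generalized median
                sums = D[np.ix_(members, members)].sum(axis=0)
                protos[j] = members[int(np.argmin(sums))]
    return protos, winners
```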
20
Kernel methods
for general data with a kernel matrix k(x, x'):
  • k must be positive semi-definite (a numerical check follows below)
  • design/closure properties of kernels: Haussler, Watkins
  • taxonomy: Gärtner
concrete kernels for bioinformatics, text processing, etc., i.e. for sequences, trees, graphs:
  • count common substructures (syntax)
  • derived from local transformations
  • derived from a probabilistic model (semantic)
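Since k must be positive semi-definite, a quick numerical sanity check for a candidate kernel matrix (an illustration) is symmetry plus the sign of the smallest eigenvalue:

```python
import numpy as np

def is_psd(K, tol=1e-10):
    """Return True if the kernel matrix K is symmetric positive semi-definite."""
    K = np.asarray(K, dtype=float)
    if not np.allclose(K, K.T):
        return False
    return bool(np.linalg.eigvalsh(K).min() >= -tol)
```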
21
Kernel methods
kernel methods: common substructures
example, 2-mer counts:
           GA  AG  AT
  GAGAGA    3   2   0
  GAT       1   0   1
  → k(GAGAGA, GAT) = 3·1 + 2·0 + 0·1 = 3
design issues: which substructures, partial matches; efficiency via dynamic programming, suffix trees
strings: locality improved kernel (Sonnenburg et al.), bag of words (Joachims), string kernel (Lodhi et al.), spectrum kernel (Leslie et al.), word-sequence kernel (Cancedda et al.)
trees: convolution kernels for language (Collins/Duffy, Kashima/Koyanagi, Suzuki et al.), kernels for relational learning (Zelenko et al., Cumby/Roth, Gärtner et al.), and, in this session, tree kernels in chemistry (Micheli/Portera/Sperduti)
graphs: graph kernels based on paths (Gärtner et al., Kashima et al.), Schur/Hadamard product (Geibel/Jain/Wysotzki)
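The counting table above is exactly the 2-spectrum kernel of Leslie et al.; a naive sketch follows (real implementations gain efficiency from suffix trees or dynamic programming, as the slide notes):

```python
from collections import Counter

# k-spectrum kernel: count common length-k substrings of two strings;
# with k = 2 this reproduces the slide example k(GAGAGA, GAT) = 3
def spectrum_kernel(s, t, k=2):
    cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return sum(cs[m] * ct[m] for m in cs if m in ct)

print(spectrum_kernel("GAGAGA", "GAT"))  # 3*1 (GA) + 2*0 (AG) + 0*1 (AT) = 3
```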
22
Kernel methods
describe the data by a probabilistic model P(x), then compare vectors derived from P(x)
vector derived from one model, e.g. the gradient of the log-likelihood:
  • Fisher kernel: Jaakkola et al., Karchin et al., Pavlidis et al., Smith/Gales, Sonnenburg et al., Siolas et al.
  • tangent vector of log odds: Tsuda et al.
  • marginalized kernels: Tsuda et al., Kashima et al.
kernel derived from a separate model for each data point:
  • kernel of Gaussian models: Moreno et al., Kondor/Jebara
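A minimal Fisher-kernel sketch (an illustration): fit one generative model to the data, represent each item by the gradient of its log-likelihood with respect to the model parameters (the Fisher score), and compare these vectors. The model here is an i.i.d. categorical distribution; the simplex constraint and the Fisher-information normalization are omitted for brevity.

```python
import numpy as np

# Fisher score of a string under an i.i.d. categorical model P(x | theta):
# d/d theta_s of sum_i log theta_{x_i} = count_s / theta_s
def fisher_score(x, symbols, theta):
    counts = np.array([x.count(s) for s in symbols], dtype=float)
    return counts / theta

symbols = ["A", "C", "G", "T"]
theta = np.array([0.3, 0.2, 0.3, 0.2])  # parameters of the fitted model
u = fisher_score("GAGAGA", symbols, theta)
v = fisher_score("GAT", symbols, theta)
print(float(u @ v))                     # Fisher kernel value k(x, x')
```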
23
Kernel methods
kernel methods: local transformations
start from a local neighborhood relation ("x is similar to x'"), encoded by a generator H, and expand it to a global kernel
diffusion kernel (Kondor/Lafferty, Lafferty/Lebanon, Vert/Kanehisa): expansion via matrix exponentiation, K = exp(βH)
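A minimal sketch of the expansion (the 4-node path graph is a made-up example): take the generator H = -L, the negative graph Laplacian, so the diffusion kernel is K = exp(βH).

```python
import numpy as np
from scipy.linalg import expm

# diffusion kernel on a small path graph: K = expm(beta * H) with H = -L
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)  # adjacency matrix
L = np.diag(A.sum(axis=1)) - A             # graph Laplacian
K = expm(-1.0 * L)                         # beta = 1.0
# K is symmetric positive definite, hence a valid kernel matrix
print(np.allclose(K, K.T), np.linalg.eigvalsh(K).min() > 0)
```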
24
Recursive models
1. partial recurrence
2. full recurrence
25
Partial recurrence
for recursive structures: each step combines the current input with the context
  • recurrent networks for sequences (→ ESANN'02): ct = f(xt, ct-1)
  • recursive networks for tree structures: ct = f(xt, cleft, cright) (see the sketch after this list)
  • principled dynamics, RAAM, etc.: Pollack, Plate, Sperduti, …
  • recursive networks including training: Goller/Küchler, Frasconi/Gori/Sperduti, Sperduti/Starita
  • applications in logic, pictures, documents, chemistry, bioinformatics, fingerprints, parsing, …: Baldi, Bianchini, Bianucci, Brunak, Costa, Diligenti, Frasconi, Goller, Gori, Hagenbuchner, Küchler, Maggini, Micheli, Pollastri, Scarselli, Schmitt, Sperduti, Starita, Soda, Vullo, …
  • theory: Bianchini, Frasconi, Gori, Hammer, Küchler, Micheli, Scarselli, Sperduti, …
this session: Micheli/Portera/Sperduti, tree kernels in chemistry compared to recursive networks
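A minimal sketch of the tree recursion ct = f(xt, cleft, cright) (dimensions and weights are illustrative assumptions; training is omitted): a labelled binary tree is encoded bottom-up into a fixed-length vector, and the sequence case ct = f(xt, ct-1) is the special case of degenerate trees.

```python
import numpy as np

# recursive network sketch: c(v) = tanh(W x_v + Vl c_left + Vr c_right + b),
# with the initial context c0 = 0 for empty children
rng = np.random.default_rng(0)
DIM_X, DIM_C = 3, 4                       # assumed label/context dimensions
W = 0.3 * rng.normal(size=(DIM_C, DIM_X))
Vl = 0.3 * rng.normal(size=(DIM_C, DIM_C))
Vr = 0.3 * rng.normal(size=(DIM_C, DIM_C))
b = np.zeros(DIM_C)

def encode(tree):
    """tree = (label_vector, left_subtree_or_None, right_subtree_or_None)."""
    if tree is None:
        return np.zeros(DIM_C)            # initial context
    x, left, right = tree
    return np.tanh(W @ x + Vl @ encode(left) + Vr @ encode(right) + b)

leaf = (np.ones(DIM_X), None, None)
root = (np.zeros(DIM_X), leaf, leaf)      # a tiny example tree
print(encode(root))                       # fixed-length code of the whole tree
```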
26
Partial recurrence
for almost recursive structures: directed acyclic graphs, spatial data, undirected graphs; unlimited fan-out, no positioning
  • bicausal networks: Baldi/Brunak/Frasconi/Pollastri/Soda
  • generalized recursive networks: Pollastri/Baldi/Vullo/Frasconi
  • contextual cascade correlation: Micheli/Sperduti/Sona
this session: Bianchini/Maggini/Sarti/Scarselli, extensions to labelled directed acyclic graphs with unlimited fan-out and no positioning of the children
27
Partial recurrence
recursive structures, unsupervised: each step compares the current input to a prototype (distance ||w - x||) within a context; which context?
  • leaky integration (no explicit context; sequences): TKM, RSOM, SARDNET; Euliano/Principe, Farkas/Miikkulainen, James/Miikkulainen, Kangas, Koskela/Varsta/Heikkonen/Kaski, Varsta/Heikkonen/Lampinen, Chappell/Taylor, Wiemer, …
  • recursive SOM (context = net activation; sequences): Voegtlin
  • SOMSD (context = winner index; trees): Hagenbuchner/Sperduti/Tsoi
  • MSOM (context = winner content; sequences): Strickert/Hammer (see the sketch after this list)
overview: Barretto/Araujo/Kremer, Hammer/Micheli/Sperduti/Strickert (general framework); theory: Hammer/Micheli/Sperduti/Strickert
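A minimal sketch of the winner-content context of MSOM (constants and dimensions are assumptions; the weight and context update rules are omitted): the distance of neuron i to sequence entry xt mixes an input term and a context term, and the global context merges the previous winner's weight and context.

```python
import numpy as np

# MSOM-style recurrent winner selection:
# d_i = alpha*||x_t - w_i||^2 + beta*||C_t - c_i||^2,
# C_{t+1} = gamma*w_winner + (1-gamma)*c_winner  (the "merge" context)
alpha, beta, gamma = 0.7, 0.3, 0.5
rng = np.random.default_rng(1)
w = rng.normal(size=(10, 2))      # prototype weights
c = np.zeros((10, 2))             # prototype contexts

def process_sequence(xs):
    C = np.zeros(2)               # initial context
    winners = []
    for x in xs:
        d = alpha * ((x - w) ** 2).sum(axis=1) \
            + beta * ((C - c) ** 2).sum(axis=1)
        i = int(np.argmin(d))
        winners.append(i)
        C = gamma * w[i] + (1 - gamma) * c[i]
    return winners

print(process_sequence(rng.normal(size=(5, 2))))
```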
28
Full recurrence
for graph structures: synchronous/asynchronous update until convergence
Hopfield networks optimize an energy function → solve difficult problems (Hopfield/Tank)
graph matching problem: find a structure preserving permutation matrix, or solve a maximum clique problem in an association graph (complexity unknown)
29
Full recurrence
for graph structures: structure match ↔ permutation ↔ max clique in the association graph
  • direct formulations: Li/Nasrabadi, Lin et al.
  • self-amplification, deterministic annealing, softmax for penalty terms: Gold/Rangarajan/Mjolsness, Suganthan et al.
  • solution via replicator dynamics: Pelillo et al. (see the sketch after this list)
this session:
  • Jain/Wysotzki: maximum weighted clique, solution via a classical (advanced) Hopfield network, various applications, noise
  • Geibel/Jain/Wysotzki: compute the Schur/Hadamard product
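A minimal sketch of the replicator-dynamics route to maximum clique (via the Motzkin-Straus program; the toy graph and the support threshold are assumptions): iterate x ← x ∘ (Ax) / (xᵀAx) on the simplex, and read a maximal clique off the support of the limit point.

```python
import numpy as np

# replicator dynamics on the adjacency matrix A: local maxima of x^T A x
# over the simplex correspond to maximal cliques (Motzkin-Straus)
def replicator_clique(A, n_iter=200):
    n = A.shape[0]
    x = np.full(n, 1.0 / n)                  # start at the simplex barycenter
    for _ in range(n_iter):
        Ax = A @ x
        x = x * Ax / (x @ Ax)                # replicator update
    return np.where(x > 1.0 / (2 * n))[0]    # support of the limit point

# toy graph: nodes 0, 1, 2 form a triangle; node 3 hangs off node 2
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(replicator_clique(A))                  # -> [0 1 2], the max clique
```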
30
Conclusions
31
Neural methods for non-standard data:
  • standard feature encoding
  • recursive processing: partial recurrence, full recurrence
  • similarity based approaches: functional networks, unsupervised models, kernel methods and SVM
  • combinations thereof, such as Geibel/Jain/Wysotzki
32
(No Transcript)