Title: Kohonen Artificial Neural Networks in Analytical Chemistry
Kohonen Artificial Neural Networks in Analytical Chemistry
Mahdi Vasighi
Institute for Advanced Studies in Basic Sciences, Zanjan
Contents
- Introduction to Artificial Neural Networks (ANN)
- Self-Organizing Map ANNs
- Kohonen ANN
- Applications
Introduction
An artificial neural network (ANN) is a mathematical model based on biological neural networks.
In more practical terms, neural networks are non-linear statistical data modeling tools. They can be used to model complex relationships between inputs and outputs or to find patterns in data.
The basic types of goals or problems in analytical chemistry for which ANNs can be used are the following:
- Selection of samples from a large quantity of existing ones for further handling.
- Classification of an unknown sample into one of several pre-defined (known in advance) classes.
- Clustering of objects, i.e., finding the inner structure of the measurement space to which the samples belong.
- Making models for predicting behaviors or effects of unknown samples in a quantitative manner.
The first thing to be aware of when considering the use of ANNs is the nature of the problem we are trying to solve:
Supervised or Unsupervised
Supervised Learning
A supervised problem means that the chemist already has at hand a set of experiments with known outcomes for specific inputs.
In these networks, the structure consists of an interconnected group of artificial neurons that processes information using a connectionist approach to computation.
Unsupervised Learning
An unsupervised problem means that one deals with a set of experimental data which have no specific associated answers (or supplemental information) attached.
In unsupervised problems (like clustering) it is not necessary to know in advance to which cluster or group the training objects Xs belong. The network automatically adapts itself in such a way that similar input objects are associated with topologically close neurons in the ANN.
Kohonen Artificial Neural Networks
The Kohonen ANN offers a considerably different approach to ANNs, mainly because it is a self-organizing system capable of solving unsupervised rather than supervised problems.
The Kohonen network is probably the closest of all artificial neural network architectures and learning schemes to the biological neural network.
As a rule, the Kohonen type of network is based on a single layer of neurons arranged in a two-dimensional plane with a well-defined topology.
A defined topology means that each neuron has a defined number of neurons as nearest neighbors, second-nearest neighbors, etc.
The neighborhood of a neuron is usually arranged either in squares or in hexagons.
In the Kohonen conception of neural networks, signal similarity is related to the spatial (topological) relation among neurons in the network.
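As a rough illustration (not from the slides), the neighbor shells of a square-topology map can be computed with the Chebyshev distance, so the first shell holds 8 neurons, the second 16, and so on; the function name is hypothetical:

```python
# Sketch: neighbor shells on a square-topology map, assuming the
# Chebyshev (chessboard) distance defines the shells.
def neighbor_order(a, b):
    """Shell index of neuron b around neuron a on a square grid."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

print(neighbor_order((3, 3), (4, 4)))  # 1 -> nearest neighbor
print(neighbor_order((3, 3), (5, 3)))  # 2 -> second-nearest shell
```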
11Competitive Learning
The Kohonen learning concept tries to map the
input so that similar signals excite neurons that
are very close together.
1st step: An m-dimensional object Xs enters the network, and only one neuron from those in the output layer is selected. After the input occurs, the network selects the winner c (central neuron) according to some criterion: c is the neuron having either the largest output in the entire network or the weight vector most similar to the input (e.g., the smallest Euclidean distance, the criterion used in the applications below).
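A minimal sketch of this step, assuming the weight vectors are stored as a (rows, cols, m) NumPy array; the function names are illustrative, and both selection criteria are shown:

```python
import numpy as np

def find_winner(weights, x):
    """Winner c = neuron with the largest output (dot product)."""
    outputs = weights @ x                        # one output per neuron
    return np.unravel_index(np.argmax(outputs), outputs.shape)

def find_winner_euclidean(weights, x):
    """Winner c = neuron whose weight vector is closest to the input."""
    dists = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(dists), dists.shape)
```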
2nd step: After finding the neuron c, its weight vector is corrected to make its response closer to the input.
3rd step: The weights of the neighboring neurons must be corrected as well. These corrections are usually scaled down depending on the distance from c. Besides decreasing with increasing distance from c, the correction also decreases with each iteration step (learning rate).
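A sketch of the 2nd and 3rd steps under the same assumptions (square topology; a triangular neighborhood function, which the application slides later name explicitly):

```python
def update(weights, x, c, eta, max_shell):
    """Correct the winner c and, scaled down, its neighbors."""
    rows, cols, _ = weights.shape
    for i in range(rows):
        for j in range(cols):
            d = max(abs(i - c[0]), abs(j - c[1]))        # shell distance to c
            if d <= max_shell:
                scale = eta * (1.0 - d / (max_shell + 1.0))  # triangular decay
                weights[i, j] += scale * (x - weights[i, j])
```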
(Figure: an input object Xs entering the network.)
4th step: After the corrections have been made, the weights should be normalized to a constant value, usually 1.
5th step: The next object Xs is input and the process is repeated. After all objects have been input once, one epoch is completed.
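Putting the five steps together, a hedged sketch of the whole training loop, reusing find_winner and update from the sketches above; the linear decrease of the learning rate and neighborhood size is an assumption consistent with the application slides:

```python
import numpy as np

def train(weights, X, n_epochs=100, eta0=0.1, max_shell0=3):
    for epoch in range(n_epochs):
        frac = 1.0 - epoch / n_epochs                    # decays toward 0
        eta = eta0 * frac                                # shrinking learning rate
        max_shell = max(1, round(max_shell0 * frac))     # shrinking neighborhood
        for x in X:                                      # 5th step: next object
            c = find_winner(weights, x)                  # 1st step
            update(weights, x, c, eta, max_shell)        # 2nd and 3rd steps
            weights /= np.linalg.norm(weights, axis=-1, keepdims=True)  # 4th step
    return weights
```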
Example: a 4x4 Kohonen map with three weights per neuron.

Weights (each triple is one neuron's weight vector):
(0.2 0.4 0.1) (0.4 0.5 0.5) (0.1 0.3 0.6) (0.6 0.8 0.0)
(0.7 0.2 0.9) (0.2 0.4 0.3) (0.3 0.1 0.8) (0.9 0.2 0.4)
(0.5 0.1 0.5) (0.0 0.6 0.3) (0.7 0.0 0.1) (0.2 0.9 0.1)
(1.0 0.0 0.1) (0.1 0.2 0.3) (0.8 0.7 0.4) (0.7 0.2 0.7)

Input vector: (1.0 0.2 0.6)

Output (dot product of each neuron's weight vector with the input):
0.34 0.80 0.52 0.76
1.28 0.46 0.80 1.18
0.82 0.30 0.76 0.44
1.06 0.32 1.18 1.16

Winner: the neuron in row 2, column 1, with the largest output (1.28).

Correction terms (Xs - W) for each neuron:
( 0.8 -0.2  0.5) ( 0.6 -0.3  0.1) ( 0.9 -0.1  0.0) ( 0.4 -0.6  0.6)
( 0.3  0.0 -0.3) ( 0.8 -0.2  0.3) ( 0.7  0.1 -0.2) ( 0.1  0.0  0.2)
( 0.5  0.1  0.1) ( 1.0 -0.4  0.3) ( 0.3  0.2  0.5) ( 0.8 -0.7  0.5)
( 0.0  0.2  0.5) ( 0.9  0.0  0.3) ( 0.2 -0.5  0.2) ( 0.3  0.0 -0.1)

Each correction is multiplied by the learning rate (0.9 here) and by a neighborhood factor that decreases with distance from the winner (1, 0.8, 0.6, 0.4 in the slide figure).
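The numbers above can be checked with a few lines of NumPy (a verification sketch, not part of the slides):

```python
import numpy as np

W = np.array([  # 4x4 map, three weights per neuron, as on the slide
    [[0.2, 0.4, 0.1], [0.4, 0.5, 0.5], [0.1, 0.3, 0.6], [0.6, 0.8, 0.0]],
    [[0.7, 0.2, 0.9], [0.2, 0.4, 0.3], [0.3, 0.1, 0.8], [0.9, 0.2, 0.4]],
    [[0.5, 0.1, 0.5], [0.0, 0.6, 0.3], [0.7, 0.0, 0.1], [0.2, 0.9, 0.1]],
    [[1.0, 0.0, 0.1], [0.1, 0.2, 0.3], [0.8, 0.7, 0.4], [0.7, 0.2, 0.7]],
])
x = np.array([1.0, 0.2, 0.6])

out = W @ x                      # dot-product outputs, e.g. out[0, 0] == 0.34
c = np.unravel_index(np.argmax(out), out.shape)
print(c, out[c])                 # (1, 0) 1.28 -> the winner
print(x - W)                     # the correction terms shown above
```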
Top Map
After the training process is accomplished, the complete set of training vectors is run through the KANN once more. In this last run, the label of the neuron excited by each input vector is entered into a table called the top map.
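A hedged sketch of how such a top map could be filled, reusing find_winner from the sketches above; the label handling is illustrative:

```python
def top_map(weights, X, labels):
    """Run all training objects through the trained map and write the
    label of each object into the cell of its winning neuron."""
    rows, cols, _ = weights.shape
    tmap = [["" for _ in range(cols)] for _ in range(rows)]
    for x, label in zip(X, labels):
        i, j = find_winner(weights, x)
        tmap[i][j] += label      # a neuron hit by several objects collects labels
    return tmap
```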
Weight Map
The number of weights in each neuron is equal to the dimension m of the input vector. Hence, each level of weights handles the data of only one specific variable.
(Figure: the weight levels of a trained KANN.)
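With the (rows, cols, m) weight array assumed above, extracting one weight level is a single slice (a sketch, not a prescribed API):

```python
# Level j of the trained weights is a (rows x cols) map that carries
# only the j-th input variable.
def weight_map(weights, j):
    return weights[:, :, j]
```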
Toroidal Topology
(Figure: a toroidal map showing the 3rd layer of neighbor neurons.)
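A sketch of the shell distance on a toroidal map, where opposite edges are joined so no neuron sits on a boundary (same square-shell assumptions as in the earlier sketches):

```python
def toroidal_shell(a, b, rows, cols):
    """Shell distance between neurons a and b with wrap-around edges."""
    di = abs(a[0] - b[0]); di = min(di, rows - di)
    dj = abs(a[1] - b[1]); dj = min(dj, cols - dj)
    return max(di, dj)

print(toroidal_shell((0, 0), (12, 12), 13, 13))  # 1: corner neurons are neighbors
```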
Analytical Applications
- Classification and reaction monitoring
- Classification of photochemical and metabolic reactions by Kohonen self-organizing maps is demonstrated.
- Changes in the 1H NMR spectrum of a mixture are interpreted in terms of the chemical reactions taking place.
- The difference between the 1H NMR spectra of the products and the reactants was introduced as a reaction descriptor, forming the input vector to the Kohonen self-organizing map.
Dataset: photochemical cycloadditions, partitioned into a training set of 147 reactions and a test set of 42 reactions, all manually classified into seven classes. The 1H NMR spectra were simulated from the molecular structures by SPINUS.
- Input variables: reaction descriptors derived from 1H NMR spectra.
- Topology: toroidal, 13×13 and 15×15 for photochemical reactions, 29×29 for metabolic reactions.
- Neighborhood scaling function: linearly decreasing triangular function, with a learning rate from 0.1 to 0 over 50-100 epochs.
- Winning neuron selection criterion: Euclidean distance.
A training sketch with these settings is given below.
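A hedged sketch of this configuration, reusing train() from the earlier slides; the descriptor length and random data are placeholders, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
rows = cols = 15                       # toroidal 15×15 map (13×13 also used)
m = 128                                # illustrative descriptor length
weights = rng.random((rows, cols, m))
X_train = rng.random((147, m))         # placeholder for the 147 training reactions

# Learning rate decreasing linearly from 0.1 to 0 over the epochs, as
# listed above; a faithful run would also use the Euclidean winner and
# the toroidal shell distance instead of the plain square shells.
weights = train(weights, X_train, n_epochs=100, eta0=0.1)
```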
After the predictive models for the classification of chemical reactions were established on the basis of simulated NMR data, their applicability to reaction data from mixed sources (experimental and simulated) was evaluated.
A second dataset: 911 metabolic reactions catalyzed by transferases, classified into eight subclasses according to the Enzyme Commission (E.C.) system.
In the resulting surface for such a SOM, each neuron is colored according to the Enzyme Commission subclass of the reactions activating it, that is, the second digit of the EC number.
For photochemical reactions, the percentages of correct classifications obtained by SOMs were 94-99% for the training set and 81-88% for the test set.
For metabolic reactions, SOMs gave 94-96% correct predictions; the test set was predicted with 66-67% accuracy by individual SOMs.
Analytical Applications
A general problem in QSAR modeling is the selection of the most relevant descriptors.
Calibration and Test Set Selection
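One common SOM-based way to do this (a sketch of a plausible approach, not necessarily the one intended on the slide) is to keep one sample per occupied neuron for the calibration set so that it spans the whole measurement space, reusing find_winner from above:

```python
def split_by_som(weights, X):
    """Calibration set: first sample hitting each neuron; test set: the rest."""
    calibration, test, seen = [], [], set()
    for idx, x in enumerate(X):
        c = find_winner(weights, x)
        if c not in seen:
            seen.add(c)
            calibration.append(idx)
        else:
            test.append(idx)
    return calibration, test
```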
References
- Chemometrics and Intelligent Laboratory Systems 38 (1997) 1-23
- J. Zupan, J. Gasteiger, Neural Networks for Chemists: An Introduction, VCH Publishers, Weinheim
- Anal. Chem. 2007, 79, 854-862
- Current Computer-Aided Drug Design, 2005, 1, 73-78
- Acta Chimica Slovenica 1994, 327-352
Thanks