Title: DATA-MINING
1. DATA-MINING: Artificial Neural Networks
Alexey Minin, Jass 2006
2. Learning without a teacher: introduction
An ANN forms its output by itself, according to the information presented at its input. We have to define some functional and then minimize it; this is the main task, and the coding of the input vector is changed according to this functional.
In practice, adaptive networks code the input information in the most compact way, subject to some predefined requirements.
3. Learning without a teacher: redundancy of data
The length of the data description:
Dimension of data: the number of components of the input vector.
Capacity of data: the number of bits defining the possible variety of all values.
Two ways of coding (reducing) the information:
Reducing the dimension of the data with minimal loss: finding independent features.
Reducing the variety of the data by detecting prototypes: clustering and quantization.
4. Two ways to reduce the data
Reducing the dimension allows us to describe the data with fewer components.
Clustering allows us to reduce the variety of the data, reducing the number of bits we need to describe it.
NB
We can combine both types of algorithms, for example in Kohonen maps, where the prototypes are arranged in a space of low dimension. In particular, the input data can be mapped onto a 2-dimensional grid of prototypes in such a way that you can visualize the data you have.
5. Main idea: the neuron as an indicator
The neuron has one output and is trained on d-dimensional data.
Let us say that the activation function is linear. The output is therefore a linear combination of its inputs.
After training is finished, the output amplitude can serve as an indicator for the data, showing whether or not the data correspond to the training patterns.
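A minimal sketch of such a linear neuron in Python (the names x, w and the toy values are illustrative, not from the slides): its output is just the dot product of the weight vector and the input vector.

    import numpy as np

    def linear_neuron(x, w):
        """Output of a linear neuron: y = w . x (projection of the input onto the weights)."""
        return np.dot(w, x)

    # Toy check with a 3-dimensional input and weight vector (illustrative values).
    x = np.array([0.5, -1.0, 2.0])
    w = np.array([0.1, 0.4, 0.2])
    print(linear_neuron(x, w))   # amplitude of the response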
6. Hebb training algorithm
According to Hebb, each weight changes in proportion to the product of the neuron's input and output.
If we reformulate the task as an optimization task, we obtain the property of such a neuron and a rule for defining the functional we have to minimize.
NB! At the minimum of E the output amplitude becomes infinite: the weights grow without bound.
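A hedged sketch of the plain Hebbian update, assuming the standard form dw = eta * y * x (learning rate, data and seed are illustrative); running it shows the weight norm growing without bound, which is exactly the problem noted above.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 3))          # illustrative 3-dimensional training data
    w = rng.normal(size=3)
    eta = 0.01

    for x in X:
        y = w @ x                           # linear neuron output
        w += eta * y * x                    # Hebb: strengthen weights of co-active inputs

    print(np.linalg.norm(w))                # keeps growing with more epochs (no bound)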
7Oja training rule
The member interfering was added to stop
unlimited growth of weights
Rule Oja maximizes sensitivity of an output
neuron at the limited amplitude of weights. It is
easy to be convinced of it, having equated
average change of weights to zero. Having
increased then the right part of equality on w.
We are convinced, that in balance
Thus, weights of trained neuron are located on
hyper sphere
At training on Oja, a vector of weights settles
down on hyper sphere, In a direction maximizing
Projection of input vectors.
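A minimal sketch of Oja's update, assuming the standard form dw = eta * y * (x - y * w); the data, learning rate and seed are illustrative. The extra -y^2 * w term keeps the weight norm near one, so the vector ends up on the unit hypersphere pointing along the direction of maximal variance.

    import numpy as np

    rng = np.random.default_rng(1)
    # Illustrative correlated 2-D data.
    X = rng.normal(size=(5000, 2)) @ np.array([[1.0, 0.8], [0.0, 0.3]])
    w = rng.normal(size=2)
    eta = 0.005

    for x in X:
        y = w @ x
        w += eta * y * (x - y * w)          # Hebb term plus decay proportional to y^2

    print(np.linalg.norm(w))                # ~1: the weights settle on the unit hypersphere
    print(w / np.linalg.norm(w))            # direction of maximal variance (first principal component)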
8. Oja training rule
SUMMARY: the neuron tries to reproduce the value of its input for the known output. That means it tries to maximize the sensitivity of its output; such neuron-indicators compress the high-dimensional input information in this way.
NB! The output of the Oja layer is a linear combination of the principal components. If you want to obtain the principal components themselves, the summation over all outputs has to be modified.
9. Principal component analysis
Let us say that we have d-dimensional data and we are training m linear neurons.
THE TASK IS:
We want the amplitudes of all output neurons to be independent indicators that fully reflect the information about the high-dimensional data we have.
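For reference, a short numpy sketch (data and m are illustrative) of what these m neurons are supposed to extract: the m leading eigenvectors of the data covariance, i.e. the principal components.

    import numpy as np

    def principal_components(X, m):
        """Return the m leading principal components (rows) of the data X."""
        Xc = X - X.mean(axis=0)                      # center the data
        cov = Xc.T @ Xc / len(Xc)                    # sample covariance matrix
        eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
        return eigvecs[:, ::-1][:, :m].T             # m directions of largest variance

    X = np.random.default_rng(2).normal(size=(500, 5))   # illustrative 5-dimensional data
    print(principal_components(X, m=2).shape)             # (2, 5)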
10The requirement
- Neurons must interact somehow (if we will train
them independently we will receive the same
result for all of them)
In simple case
Lets take perceptron with linear neuron for
hidden layer, in which the number of inputs and
outputs equals, and the weights with the same
indexes in both layers are the same. Lets try to
teach ANN to reproduce the input on the output.
Training rule therefore
Looks like Oya training rule!
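A hedged sketch of that auto-association step for a single hidden neuron with tied encoding and decoding weights; taking the reconstruction-error gradient through the decoding copy of the weights (a common simplification) gives exactly the Oja-like update. Names and values are illustrative.

    import numpy as np

    def autoassoc_step(w, x, eta=0.01):
        """One training step of a tied-weight linear auto-associator.

        Encode:  y = w . x          (hidden activity)
        Decode:  x_hat = y * w      (reconstruction through the same weights)
        The reconstruction-error gradient through the decoding copy of w
        gives an update of the same form as Oja's rule.
        """
        y = w @ x
        x_hat = y * w
        return w + eta * y * (x - x_hat)    # = w + eta * y * (x - y * w)

    w = np.array([0.2, -0.1, 0.3])
    x = np.array([1.0, 0.5, -0.2])          # illustrative input pattern
    print(autoassoc_step(w, x))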
11Self training layer
In our formulation the training of separate
neuron, is trying to reproduce the inputs
according to its outputs. Generalizing this note,
it is logical to suggest a rule,according to
which the value of outputs restoring according
to whole output information. Doing this way we
can get Oja training rule for one layer network
The hidden layer of such ANN, the same as Oya
layer,makes optimal coding of input data, and
contains maximum variety of data according to
existing restrictions.
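One hedged way to write such a layer-wide rule is the subspace form of Oja's rule below, where every weight row is corrected using the response of the whole output vector; the shapes and data are illustrative.

    import numpy as np

    def oja_layer_step(W, x, eta=0.01):
        """One step of the layer-wide (subspace) form of Oja's rule.

        y = W x is the whole output vector; the input is 'restored' from the
        whole output, and the weights move to reduce that residual.
        """
        y = W @ x
        return W + eta * (np.outer(y, x) - np.outer(y, y) @ W)

    rng = np.random.default_rng(3)
    W = rng.normal(size=(2, 4)) * 0.1       # 2 output neurons, 4 inputs (illustrative)
    x = rng.normal(size=4)
    print(oja_layer_step(W, x).shape)       # (2, 4)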
12Example
Lets change activation function on the sigmoid in
the training rule
Brings new property (Oja, et al, 1991). Such
algorithm, in particular, was used for the
decomposition of mixed signals with an unknown
way (i.e. blind signal separation). For
example this task we have when we want to
separate human voice and noise.
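A hedged sketch of an Oja-style update with a sigmoidal nonlinearity in the rule (tanh here); the exact form used in the cited work may differ, so treat this only as an illustration of the family of rules.

    import numpy as np

    def nonlinear_oja_step(w, x, eta=0.01):
        """Oja-style update with a sigmoidal activation in the rule.

        Assumed form: y = tanh(w . x), dw = eta * y * (x - y * w).
        With a nonlinearity the extracted directions are no longer plain principal
        components, which is what makes rules of this family applicable to
        blind source separation.
        """
        y = np.tanh(w @ x)
        return w + eta * y * (x - y * w)

    w = np.array([0.3, -0.2])
    x = np.array([0.8, 1.1])                # illustrative mixed-signal sample
    print(nonlinear_oja_step(w, x))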
13. Competition of neurons: the winner takes all
Basic algorithm: when training a competitive layer, the weight norm of each neuron is kept constant and only the winning neuron is trained.
The winner is the neuron that has the maximum response to the given input.
Training of the winner: its weight vector is pulled toward the input (see the sketch below).
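A minimal sketch of the winner-take-all step (learning rate, prototypes and input are illustrative): the winner is the neuron with the maximum response, and only its weights are pulled toward the input.

    import numpy as np

    def wta_step(W, x, eta=0.1):
        """Winner-take-all: only the prototype with the maximum response is trained."""
        winner = np.argmax(W @ x)                 # neuron with the maximum response
        W[winner] += eta * (x - W[winner])        # pull the winner toward the input
        return W

    rng = np.random.default_rng(4)
    W = rng.normal(size=(3, 2))                   # 3 prototypes in 2-D (illustrative)
    x = np.array([1.0, 0.0])
    print(wta_step(W, x))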
14The winner takes away not all
One of variants of updating of a base rule of
training of a competitive layer Consists in
training not only the neuron-winner, but also its
"neighbors", though and with In the smaller
speed. Such approach - "pulling up" of the
nearest to the winner neuron- It is applied in
topographical Kohonen cards
Function of the neighborhood is equal to unit for
the neuron- -winner with an index And
gradually falls down at removal from the
neuron-winner
Training on Kohonen reminds stretching an elastic
grid of prototypes on Data file from training
sample
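A hedged sketch of that neighborhood update on a 1-D chain of prototypes, with a Gaussian neighborhood function that equals one for the winner and falls off with grid distance; the grid shape, width sigma and data are illustrative.

    import numpy as np

    def som_step(W, x, eta=0.1, sigma=1.0):
        """One Kohonen step: the winner and (more weakly) its grid neighbors move toward x."""
        winner = np.argmin(np.linalg.norm(W - x, axis=1))          # closest prototype
        grid_dist = np.abs(np.arange(len(W)) - winner)             # distance along the 1-D grid
        h = np.exp(-grid_dist**2 / (2 * sigma**2))                 # neighborhood: 1 for the winner
        return W + eta * h[:, None] * (x - W)

    rng = np.random.default_rng(5)
    W = rng.normal(size=(10, 2))        # 10 prototypes arranged on a 1-D chain (illustrative)
    x = np.array([0.5, -0.5])
    print(som_step(W, x).shape)         # (10, 2)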
15. Methodology of self-organizing maps
Schematic representation of a self-organizing network.
Neurons in the output layer are ordered and correspond to the cells of a two-dimensional map, which can be colored according to the affinity of the attributes.
16. Visualization: a topographic map induced by the i-th component of the input data
A convenient tool for visualizing data is the coloring of topographic maps, similar to the way it is done on ordinary geographical maps.
Each attribute of the data generates its own coloring of the map's cells, according to the average value of that attribute over the data points that fell into a given cell (see the sketch below).
Collecting the maps of all the attributes of interest, we obtain a topographic atlas that gives an integrated view of the structure of the multivariate data.
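A hedged sketch of how one such component map can be colored: each cell gets the average value of the chosen attribute over the training vectors mapped to it. The map weights, data and attribute index are illustrative.

    import numpy as np

    def component_plane(W, X, attribute):
        """Average value of one attribute over the data mapped to each SOM cell."""
        cells = np.argmin(((X[:, None, :] - W[None, :, :])**2).sum(-1), axis=1)  # winner per sample
        colors = np.full(len(W), np.nan)
        for c in range(len(W)):
            hits = X[cells == c, attribute]
            if len(hits):
                colors[c] = hits.mean()       # cell color = mean attribute value of its data
        return colors

    rng = np.random.default_rng(6)
    W = rng.normal(size=(9, 3))               # 9 cells, 3 attributes (illustrative trained map)
    X = rng.normal(size=(100, 3))
    print(component_plane(W, X, attribute=0))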
17. Methodology of self-organizing maps
Classified SOM for the NASDAQ100 index for the period from 10-Nov-1997 to 27-Aug-2001.
18. Complexity of the algorithm
When is it better to reduce the dimension of the input information, and when to quantize it?
P: number of training patterns.
Number of synaptic weights of a 1-layer ANN with d inputs and m output neurons.
Quantization and dimension reduction are compared, at the same compression coefficient, by the number of operations, the compression coefficient (with b the capacity of the data in bits), and the resulting complexity.
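Since the formulas from the slide are not reproduced here, the sketch below uses the standard expressions for the compression coefficient under stated assumptions: keeping m of d components gives K = d/m, while replacing a d-vector of b-bit components by the index of one of m prototypes gives K = d*b/log2(m). The numbers are illustrative.

    import numpy as np

    def compression_dim_reduction(d, m):
        """Keep m of d components: compression coefficient K = d / m."""
        return d / m

    def compression_quantization(d, m, b):
        """Replace a d-vector of b-bit components by the index of one of m prototypes:
        K = d * b / log2(m)."""
        return d * b / np.log2(m)

    # Illustrative numbers: 64-dimensional vectors of 8-bit values.
    print(compression_dim_reduction(d=64, m=8))          # 8.0
    print(compression_quantization(d=64, m=256, b=8))    # 64.0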
19. JPEG example
The image is divided into 8x8-pixel blocks, which become the input vectors we want to compress. In our case the gradations of gray set the accuracy of the represented data.
Let us suppose that the image contains such blocks. But if d = 64x64, then K > 10^3.
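A minimal sketch (image size and values are illustrative) of cutting a grayscale image into 8x8 blocks, which then play the role of the input vectors to be compressed.

    import numpy as np

    def image_blocks(img, size=8):
        """Cut a grayscale image into flattened size x size blocks (the input vectors)."""
        h, w = img.shape
        blocks = [img[i:i+size, j:j+size].ravel()
                  for i in range(0, h - size + 1, size)
                  for j in range(0, w - size + 1, size)]
        return np.array(blocks)

    img = np.random.default_rng(7).integers(0, 256, size=(64, 64))  # illustrative 8-bit image
    X = image_blocks(img)            # each row is a 64-dimensional vector (one 8x8 block)
    print(X.shape)                   # (64, 64): 64 blocks of dimension 64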
20. Any questions?