Title: Unsupervised Competitive Learning
1 Unsupervised Competitive Learning
- WTA, SOM
- Hertz Ch. 9
- Transparencies based on Jianfeng's course at Sussex, and others
2 Unsupervised learning: simple competitive learning
Biological background: neurons are wired topographically; nearby neurons connect to nearby neurons. In visual cortex, neurons are organized in functional columns. Ocular dominance columns: one region responds to input from one eye. Orientation columns: one region responds to one orientation.
3 Self-organization as a principle of neural development
4 Competitive learning (finite resources)
-- outputs compete to see which will win, via inhibitory connections between them
-- aim is to automatically discover statistically salient features of the pattern vectors in the training set (feature detectors)
-- can find clusters in training-data pattern space, which can be used to classify new patterns
-- basic structure:
5 Input layer fully connected to output layer; input-to-output connections are feedforward. The output layer compares the activations of its units following presentation of a pattern vector x, via (sometimes virtual) inhibitory lateral connections; the winner is selected on the basis of the largest activation: winner-takes-all (WTA). Linear or binary activation functions of output units. Very different from previous (supervised) learning, where we pay attention to the input-output relationship; here we will look at the pattern of connections (weights).
7 Simple competitive learning algorithm
Initialise all weights to random values and normalise (so that ||w|| = 1).
Loop until stopping criterion satisfied:
- choose a pattern vector x from the training set
- compute the distance between the pattern and each weight vector: ||x - w_i||
- find the output unit with the largest activation, i.e. the winner i* with the property that ||x - w_i*|| <= ||x - w_i|| for all i
- update the weight vector of the winning unit only: w_i*(t+1) = w_i*(t) + η(t)(x - w_i*(t))
end loop
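A minimal sketch of this loop in Python/NumPy (the array shapes, the epoch structure and the decaying learning-rate schedule are assumptions, not part of the slide):

```python
import numpy as np

def competitive_learning(X, n_units, n_epochs=50, eta0=0.1, seed=0):
    """Simple winner-takes-all competitive learning.

    X       : (n_patterns, n_features) training patterns
    n_units : number of output units / weight vectors
    """
    rng = np.random.default_rng(seed)
    # initialise weights randomly and normalise to unit length
    W = rng.normal(size=(n_units, X.shape[1]))
    W /= np.linalg.norm(W, axis=1, keepdims=True)

    for epoch in range(n_epochs):
        eta = eta0 * (1.0 - epoch / n_epochs)       # assumed decay schedule
        for x in rng.permutation(X):
            dists = np.linalg.norm(x - W, axis=1)   # distance to every weight vector
            winner = np.argmin(dists)               # the unit nearest to x wins
            W[winner] += eta * (x - W[winner])      # move only the winner towards x
    return W
```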
8 NB: choosing the largest output is the same as choosing the vector w that is nearest to x, since
- a) w·x = wᵀx = ||w|| ||x|| cos(angle between x and w) = ||w|| ||x|| if the angle is 0
- b) ||w - x||² = (w1 - x1)² + (w2 - x2)² = w1² + w2² + x1² + x2² - 2(x1w1 + x2w2) = ||w||² + ||x||² - 2wᵀx
- Since ||w|| = 1 and x is fixed, minimising ||w - x||² is equivalent to maximising wᵀx
- Therefore, if we only really want the angle, we can limit all inputs to ||x|| = 1 (e.g. points on a sphere)
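A quick numerical check of this equivalence (purely illustrative; random data and sizes are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=5)
W = rng.normal(size=(10, 5))
W /= np.linalg.norm(W, axis=1, keepdims=True)   # enforce ||w|| = 1 for every unit

# with unit-length weight vectors, the largest dot product picks
# the same unit as the smallest Euclidean distance
assert np.argmax(W @ x) == np.argmin(np.linalg.norm(W - x, axis=1))
```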
9 How does competitive learning work? We can view the points of all (normalised) input vectors as lying on the surface of a hypersphere (in 2D, a circle). Distance between points on the surface of the hypersphere = degree of similarity between patterns.
10 With respect to the incoming signal, the weight of the yellow line is updated so that the weight vector is rotated towards the incoming signal:
w(t+1) = w(t) + η(t)(x - w(t))
Thus the weight vector becomes more and more similar to the input, i.e. it is a feature detector for that input.
11 When there are more input patterns, we can see each weight vector migrate from its initial position (determined randomly) to the centre of gravity of a cluster of the input vectors. Thus competitive learning discovers clusters.
E.g. on-line k-means: the nearest centre is updated by
centre_i(t+1) = centre_i(t) + η(t)(x - centre_i(t))
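A small demonstration of this behaviour on synthetic 2-D data (a sketch; the cluster positions, learning rate and number of epochs are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
# two Gaussian clusters of points around (0, 0) and (5, 5)
X = np.vstack([rng.normal(0, 0.3, size=(200, 2)),
               rng.normal(5, 0.3, size=(200, 2))])

# on-line k-means: initialise the centres at randomly chosen data points,
# then repeatedly move the nearest centre a little towards each pattern
centres = X[rng.choice(len(X), size=2, replace=False)].copy()
eta = 0.05
for epoch in range(100):
    for x in rng.permutation(X):
        i = np.argmin(np.linalg.norm(x - centres, axis=1))   # nearest centre wins
        centres[i] += eta * (x - centres[i])                 # move it towards x

print(np.round(centres, 1))   # each centre ends up near a cluster's centre of gravity
```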
13 To compare with: the K-means algorithm
- (from Wikipedia)
- The K-means algorithm is an algorithm to cluster objects, based on attributes, into k partitions. The objective it tries to achieve is to minimise total intra-cluster variance, i.e. the function
  V = Σ_i Σ_{x_j ∈ S_i} (x_j - µ_i)²
- where there are k clusters S_i, i = 1, 2, ..., k, and µ_i is the centroid, or mean point, of all the points in S_i.
- The algorithm starts by partitioning the input points into k initial sets, either at random or using some heuristic. It then calculates the mean point, or centroid, of each set. It constructs a new partition by associating each point with the closest centroid. The centroids are then recalculated for the new clusters, and the algorithm is repeated by alternating these two steps until convergence, which is reached when the points no longer switch clusters (or, alternatively, when the centroids no longer change).
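A minimal batch (Lloyd-style) k-means sketch corresponding to this description (initialising the centroids at randomly chosen data points is one common heuristic and is an assumption here, as is the iteration cap):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Batch k-means: alternate assignment and centroid-update steps until convergence."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    assign = None
    for _ in range(n_iters):
        # assignment step: attach each point to its closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_assign = dists.argmin(axis=1)
        if assign is not None and np.array_equal(new_assign, assign):
            break                                   # no point switched cluster: converged
        assign = new_assign
        # update step: recompute each centroid as the mean of its assigned points
        for j in range(k):
            if np.any(assign == j):                 # leave empty clusters untouched
                centroids[j] = X[assign == j].mean(axis=0)
    return centroids, assign
```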
14 Error-minimization viewpoint
Consider error minimization across all N patterns in the training set. The aim is to decrease the error E:
E = Σ_i ||x^i - w_{k(i)}(t)||²   (where k(i) is the unit that wins pattern x^i)
For the winning unit k, when the pattern is x^i, the direction in which the weights need to change (so as to perform gradient descent) is determined by (from previous lectures)
w(t+1) = w(t) + η(t)(x^i - w(t))
which is the update rule for supervised learning (remembering that in supervised learning w plays the role of O, the output of the neuron), i.e. replace w by O and we recover the adaline/simple gradient-descent learning rule.
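The gradient step itself, written out (a sketch of the omitted calculation, using the notation above):

```latex
E = \sum_i \left\| \mathbf{x}^i - \mathbf{w}_{k(i)} \right\|^2 ,
\qquad
\frac{\partial E}{\partial \mathbf{w}_k}
   = -2 \sum_{i \,:\, k(i)=k} \left( \mathbf{x}^i - \mathbf{w}_k \right)
```

so a single on-line gradient-descent step for the unit that wins pattern x^i is Δw_k = η(t)(x^i - w_k(t)), i.e. the competitive learning rule above (the factor of 2 is absorbed into η).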
15 Leaky learning: modify the weights of both winning and losing units, but at different learning rates, where η_w(t) >> η_L(t). This has the effect of slowly moving losing units towards denser regions of pattern space. There are many other ways, as we will discuss later on.
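A one-step sketch of the leaky update (the two learning-rate values are arbitrary illustrations of η_w >> η_L):

```python
import numpy as np

def leaky_update(W, x, eta_w=0.1, eta_l=0.001):
    """One leaky-learning step: the winner learns at rate eta_w, the losers at eta_l."""
    winner = np.argmin(np.linalg.norm(x - W, axis=1))
    for i in range(len(W)):
        rate = eta_w if i == winner else eta_l
        W[i] += rate * (x - W[i])    # losers also drift slowly towards denser regions
    return W
```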
16 Vector quantization: an application of competitive learning
Idea: categorize a given set of input vectors into M classes using competitive learning algorithms, and then represent any vector just by the class into which it falls.
An important use of competitive learning (esp. in data compression): it divides the entire pattern space into a number of separate subspaces; the set of M units represents the set of prototype vectors; a new pattern x is assigned to a class based on its closeness to a prototype vector, using Euclidean distance.
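A minimal sketch of using the learned prototypes as a codebook for compression (the function and variable names are illustrative, not from the slides):

```python
import numpy as np

def vq_encode(X, codebook):
    """Replace each input vector by the index of its nearest prototype (Euclidean distance)."""
    dists = np.linalg.norm(X[:, None, :] - codebook[None, :, :], axis=2)
    return dists.argmin(axis=1)           # one small integer (class label) per input vector

def vq_decode(codes, codebook):
    """Reconstruct each vector as its class prototype (lossy reconstruction)."""
    return codebook[codes]
```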
18 Topographic maps
Extend the ideas of competitive learning to incorporate the neighbourhood around inputs and neurons. We want a nonlinear transformation of input pattern space onto output feature space which preserves the neighbourhood relationships between the inputs -- a feature map, where nearby neurons respond to similar inputs (e.g. place cells, orientation columns, somatosensory cells, etc.). The idea is that neurons selectively tune to particular input patterns in such a way that the neurons become ordered with respect to each other, so that a meaningful coordinate system for different input features is created.
19 Known as a topographic map: spatial locations are indicative of the intrinsic statistical features of the input patterns, i.e. close in the input => close in the output.
When the yellow input is active, the yellow neuron is the winner. When the orange input is active, we want the orange neuron to be the winner.
20 E.g. activity-based self-organization (von der Malsburg, 1973): incorporation of competitive and cooperative mechanisms to generate feature maps using unsupervised learning networks. Biologically motivated: how can activity-based learning using highly interconnected circuits lead to an orderly mapping of visual stimulus space onto the cortical surface? (visual space to tectum map)
[Figure: mapping from visual space onto cortical units]
21 Kohonen's self-organizing map (SOM) algorithm: it can perform dimensionality reduction; the SOM can be viewed as a vector-quantization-type algorithm.
22 Kohonen's self-organizing map (SOM) algorithm
- set time t = 0
- initialise learning rate η(0)
- initialise the size of the neighbourhood function
- initialise all weights to small random values
23 Loop until stopping criterion satisfied:
- choose a pattern vector x from the training set
- compute the distance between the pattern and the weight vector of each output unit: ||x - w_i(t)||
- find the winning unit i* from the minimum distance: ||x - w_i*(t)|| = min_i ||x - w_i(t)||
- update the weights of the winning and neighbouring units using the neighbourhood function:
  w_ij(t+1) = w_ij(t) + η(t) h(i, i*, t) (x_j - w_ij(t))
- note that when h(i, i*) = 1 if i = i* and 0 otherwise, we recover simple competitive learning
24
- decrease the size of the neighbourhood: when t is large, h(i, i*, t) = 1 if i = i* and 0 otherwise
- decrease the learning rate η(t): when t is large, η(t) ≈ 0
- increment time: t = t + 1
end loop
Generally, a LARGE number of iterations is needed.
25 Neighbourhood function: relates the degree of weight update to the distance from the winning unit i* to the other units in the lattice. Typically a Gaussian function. When i = i*, the distance is zero, so h = 1. Note that h decreases monotonically with distance from the winning unit and is symmetrical about the winning unit. It is important that the size (width) of the neighbourhood decreases over time to stabilize the mapping; otherwise we would get a noisy version of competitive learning.
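A compact sketch of the whole SOM loop on a 2-D lattice of output units, with a Gaussian neighbourhood and exponentially decaying width and learning rate (the decay schedules, lattice shape and constants are assumptions; the slides do not fix them):

```python
import numpy as np

def som(X, grid=(10, 10), n_iters=5000, eta0=0.5, sigma0=5.0, seed=0):
    """Kohonen SOM: winner-takes-most update weighted by a shrinking Gaussian neighbourhood."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    W = rng.uniform(size=(rows * cols, X.shape[1])) * 0.1   # small random initial weights
    # fixed positions of the units in the output lattice
    pos = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    tau = n_iters / np.log(sigma0)                           # time constant for sigma decay

    for t in range(n_iters):
        x = X[rng.integers(len(X))]                          # pick a training pattern
        winner = np.argmin(np.linalg.norm(x - W, axis=1))    # best-matching unit
        sigma = sigma0 * np.exp(-t / tau)                    # neighbourhood width shrinks
        eta = eta0 * np.exp(-t / n_iters)                    # learning rate decays
        # Gaussian neighbourhood: h = 1 at the winner, falls off with lattice distance
        d2 = np.sum((pos - pos[winner]) ** 2, axis=1)
        h = np.exp(-d2 / (2 * sigma ** 2))
        W += eta * h[:, None] * (x - W)                      # update winner and its neighbours
    return W.reshape(rows, cols, -1)
```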
26 Biologically, development (self-organization) is governed by chemical gradients, which generally preserve topological relations. Kohonen (and others) have shown that the neighbourhood function could be implemented by a diffusing neuromodulatory gas such as nitric oxide (which has been shown to be necessary for learning, esp. spatial learning). Thus there is some biological motivation for the SOM.
27 Examples of mapping linear Kohonen maps onto 1- and 2-dimensional structures
28 Mapping a 2-d Kohonen map onto a 2-d square
29 Another 2-d map
37
- The elastic ring can also be derived from a cost function.
- A gradient-descent algorithm leads to the elastic-ring rules.