Title: Connectionist Networks
Perceptron Learning
- For an input point (x1, x2), the perceptron's decision boundary is the line w0 + w1x1 + w2x2 = 0.
- The unit's output is given by the Heaviside step function: output 1 if the weighted sum w0 + w1x1 + w2x2 is greater than 0, and 0 otherwise.
- Example: let w0 = -14.8, w1 = 2.7, w2 = -2.9.
- For the point (7, -5): -14.8 + 2.7(7) - 2.9(-5) = 18.6 > 0, so the output is 1.
- For the point (5, 4): -14.8 + 2.7(5) - 2.9(4) = -12.9 < 0, so the output is 0.
- In summary: (5, 4) -> 0 and (7, -5) -> 1 (see the sketch below).
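A minimal sketch of this check (the weights and test points are the ones above; numpy and the helper name heaviside_output are my own choices):

    import numpy as np

    def heaviside_output(w, x):
        # Heaviside threshold: 1 if w0 + w1*x1 + w2*x2 > 0, else 0.
        net = w[0] + np.dot(w[1:], x)
        return 1 if net > 0 else 0

    w = np.array([-14.8, 2.7, -2.9])               # w0, w1, w2
    print(heaviside_output(w, np.array([5, 4])))   # 0
    print(heaviside_output(w, np.array([7, -5])))  # 1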
- Assume we have a set of data inputs x(n) with binary labels t(n), and a neuron whose output f(x; w) is bounded between 0 and 1.
- Data points with a binary label of 1 can be viewed as having a probability of occurrence of f(x; w), and data points with a binary label of 0 can be viewed as having a probability of occurrence of 1 - f(x; w), under our model sigmoid distribution.
- Since our binary labels are all zeros or ones, a clever way to express the error in our classifications is the cross-entropy G(w) = -Σn [ t(n) ln f(n) + (1 - t(n)) ln(1 - f(n)) ], i.e., minus the log-probability the model assigns to the observed labels.
- The Algorithm so far:
- For each input/target pair (x(n), t(n)) (n = 1, ..., N), compute f(n) = f(x(n); w), where f is the sigmoid of the weighted sum of the inputs.
- Define e(n) = t(n) - f(n), and compute for each weight wi the change Δwi = Σn e(n) xi(n).
- Then let wi_new = wi_old + Δwi.
- The learning rate:
- Recall that we previously said the weight updates proceed using Δwi = Σn e(n) xi(n). The new weights, then, are wi_new = wi_old + Δwi.
- Sometimes, however, this update step can produce weight changes that jump around too much and jump right past the optimal decision boundary. To avoid this problem we specify a learning rate η that makes each weight change a little smaller: wi_new = wi_old + η Δwi (see the sketch below).
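A minimal sketch of this batch training loop, assuming a sigmoid unit and the cross-entropy error above; the toy data set, the number of iterations, and the learning rate eta = 0.01 are illustrative choices of mine:

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    # Toy data: each row is (1, x1, x2) -- the leading 1 multiplies the bias weight w0.
    X = np.array([[1.0, 7.0, -5.0],
                  [1.0, 5.0,  4.0],
                  [1.0, 6.0, -4.0],
                  [1.0, 4.0,  5.0]])
    t = np.array([1.0, 0.0, 1.0, 0.0])   # binary targets t(n)

    w = np.zeros(3)   # w0, w1, w2
    eta = 0.01        # learning rate: keeps each weight change small

    for _ in range(1000):
        f = sigmoid(X @ w)    # f(n) = f(x(n); w)
        e = t - f             # e(n) = t(n) - f(n)
        dw = X.T @ e          # for each wi: sum over n of e(n) * xi(n)
        w = w + eta * dw      # scaled update
    print(w)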
[Figure: a single neuron with inputs X1, X2, ..., XD plus a bias input, producing output f(net).]
Backpropagation Learning
[Figure: a two-layer network with inputs X1, ..., Xk, ..., XD, hidden units H1, ..., Hi, ..., HI, and outputs V1, ..., Vj, ..., VJ; wki labels an input-to-hidden weight and wij a hidden-to-output weight.]
[Figure: a small worked example network with inputs X1 and X2, a hidden layer, and a single output V1, with numeric weights (including 1, 2, and -2) labeled on the connections.]
- Again assume we have a set of data inputs with binary labels.
- Denote the inputs X1, ..., Xk, ..., XD; the hidden units H1, ..., Hi, ..., HI; and the output units V1, ..., Vj, ..., VJ.
- Denote the weight between input Xk and hidden unit Hi by wki, and the weight between hidden unit Hi and visible output unit Vj by wij (a sketch of one backpropagation step follows).
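The update equations on the following diagrams did not survive extraction; the sketch below shows one backpropagation step for this architecture, assuming sigmoid hidden and output units and the same error measure as in the perceptron section (sizes, data, and the learning rate are made up):

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    rng = np.random.default_rng(0)
    D, I, J = 4, 3, 2                  # inputs X1..XD, hidden units H1..HI, outputs V1..VJ
    w_ki = rng.normal(size=(I, D))     # weights from input Xk to hidden unit Hi
    w_ij = rng.normal(size=(J, I))     # weights from hidden unit Hi to output Vj
    eta = 0.1

    x = rng.normal(size=D)             # one input pattern
    t = np.array([1.0, 0.0])           # its binary target

    # Forward pass
    h = sigmoid(w_ki @ x)              # hidden activities
    v = sigmoid(w_ij @ h)              # output activities

    # Backward pass: output error, then error propagated back through w_ij
    delta_v = t - v
    delta_h = h * (1.0 - h) * (w_ij.T @ delta_v)

    # Weight updates (same sign convention as the single-neuron rule above)
    w_ij += eta * np.outer(delta_v, h)
    w_ki += eta * np.outer(delta_h, x)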
Competitive Learning
- The Algorithm (K-means):
- Randomly pick K means.
- Calculate the distance from each point to each mean. The mean that is closest to a given point wins that point.
- Re-calculate each mean as the mean of the points it has just won.
- Iterate until the total distance moved by all of the means is a very small number (like zero). A sketch follows.
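A minimal sketch of this procedure, assuming 2-D points and Euclidean distance (the data and all names are illustrative):

    import numpy as np

    def kmeans(points, K, tol=1e-9, max_iters=100):
        rng = np.random.default_rng(0)
        # Randomly pick K means from the data points.
        means = points[rng.choice(len(points), size=K, replace=False)]
        for _ in range(max_iters):
            # The mean closest to a given point wins that point.
            dists = np.linalg.norm(points[:, None, :] - means[None, :, :], axis=2)
            owner = np.argmin(dists, axis=1)
            # Re-calculate each mean as the mean of the points it has just won.
            new_means = np.array([points[owner == k].mean(axis=0) if np.any(owner == k)
                                  else means[k] for k in range(K)])
            # Stop once the total distance moved by all of the means is (nearly) zero.
            if np.linalg.norm(new_means - means, axis=1).sum() < tol:
                return new_means, owner
            means = new_means
        return means, owner

    pts = np.vstack([np.random.randn(20, 2), np.random.randn(20, 2) + 5.0])
    centers, labels = kmeans(pts, K=2)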
Hebbian Coincidence Learning
[Figure: a neuron with visual and auditory inputs whose bipolar threshold function f outputs +1 when the net input exceeds 0 and -1 otherwise.]
- Example (learning constant 0.2): start with weights w1 = 1, w2 = 1, w3 = 0 and present the input x1 = 1, x2 = 1, x3 = -1.
- The net input is (1)(1) + (1)(1) + (0)(-1) = 2, so the thresholded output is +1.
- The Hebbian update is wnew = (1, 1, 0) + 0.2 × (+1) × (1, 1, -1) = (1.2, 1.2, -0.2).
- Presenting the same input with the new weights w1 = 1.2, w2 = 1.2, w3 = -0.2 and x1 = 1, x2 = 1, x3 = -1:
- The net input is (1.2)(1) + (1.2)(1) + (-0.2)(-1) = 2.6, so the output is again +1.
- The update is wnew = (1.2, 1.2, -0.2) + 0.2 × (+1) × (1, 1, -1) = (1.4, 1.4, -0.4), as in the sketch below.
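A short sketch reproducing the two updates above (the 0.2 learning constant and the sign threshold come from the example; the function name is mine):

    import numpy as np

    def hebbian_step(w, x, c=0.2):
        # Output of the bipolar threshold unit, then a Hebbian weight update.
        out = 1 if np.dot(w, x) > 0 else -1
        return w + c * out * x

    w = np.array([1.0, 1.0, 0.0])
    x = np.array([1.0, 1.0, -1.0])
    w = hebbian_step(w, x)   # (1.2, 1.2, -0.2)
    w = hebbian_step(w, x)   # (1.4, 1.4, -0.4)
    print(w)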
Attractor Networks or Memories
- For the weights, wij denotes the weight from neuron j to neuron i.
- A Hopfield network consists of I neurons. They are fully connected through symmetric, bidirectional connections with weights wij = wji. There are no self-connections, so wii = 0 for all i. Biases wi0 may be included as weights coming from an input x0 which is permanently set to x0 = 1.
- The output of neuron i is denoted by xi.
- The activity at neuron i is the weighted sum of the inputs from all the other neurons: ai = Σj wij xj.
- The threshold function used is the hyperbolic tangent: xi = tanh(ai).
- The learning rule is intended to make a set of desired memories x(n) be stable states of the Hopfield network's activity rule. Each memory is a binary pattern, with xi ∈ {-1, 1}.
- The weights are set using the Hebb rule: wij = Σn xi(n) xj(n).
[Figure: the weight matrix among the network's 25 units (weight vectors w1 through w25), with every diagonal entry fixed at 0.]
- An example memory, a 5 x 5 binary pattern written as 25 values:
   1  1 -1  1  1
  -1 -1  1 -1 -1
   1  1 -1  1 -1
  -1  1  1  1 -1
   1 -1 -1 -1  1
- In this example each memory is a 25 x 1 vector, so each outer product X(n)X(n)T and the weight matrix W are 25 x 25.
- The Algorithm:
- For a given set of n memories X(n) (say each memory is 5 x 5, so it has 25 units), compute the weights between all nodes using Hebbian learning, W = Σn X(n) X(n)T, setting all diagonal weights to zero.
- For a new presentation of a corrupted memory X, initialize Xold to be a vector of ones (25 x 1 in this case).
- Compute the activations using activation = WX.
- Compute the thresholded outputs using Xnew = tanh(activation).
- Compute the change in X from one iteration to the next: change = Xnew - Xold.
- Compute the gradient on the weights:
  - gw = Xnew changeT
  - gw = gw + gwT
- Update the weights: Wnew = Wold + η(gw - αWold), where η is the learning rate and α a small decay constant.
- Stop if Xold = Xnew; otherwise set Xold = Xnew and iterate again, computing the new activations with Xnew, etc. A sketch of the whole procedure follows.
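A minimal sketch of the procedure (Hebbian storage followed by the recall loop above), on a small made-up pattern; eta and alpha stand in for the learning-rate and decay symbols, and all other names are mine:

    import numpy as np

    def store(memories):
        # Hebbian weights: sum of outer products X(n) X(n)^T, with the diagonal set to zero.
        W = sum(np.outer(m, m) for m in memories).astype(float)
        np.fill_diagonal(W, 0.0)
        return W

    def recall(W, x, eta=0.01, alpha=0.0, max_iters=100):
        x_old = np.ones_like(x, dtype=float)   # initialize Xold to a vector of ones
        x_cur = x.astype(float)                # the corrupted memory being presented
        for _ in range(max_iters):
            activation = W @ x_cur             # activation = W X
            x_new = np.tanh(activation)        # threshold with tanh
            change = x_new - x_old             # change in X from one iteration to the next
            gw = np.outer(x_new, change)       # gradient on the weights
            gw = gw + gw.T
            W = W + eta * (gw - alpha * W)     # weight update
            if np.allclose(x_new, x_old):      # stop once X no longer changes
                break
            x_old = x_cur = x_new
        return np.sign(x_new), W

    memory = np.array([1, -1, 1, -1, 1])
    W = store([memory])
    recovered, W = recall(W, np.array([-1, -1, 1, -1, 1]))   # first unit corrupted
    print(recovered)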