Title: Neural Networks: how they learn
1. Neural Networks: how they learn
- Course 11
- Alexandra Cristea
- USI
2. Perceptron, discrete neuron
- You have seen how a neuron (and an NN) can represent information
- First, a simple case:
  - no hidden layers
  - only one neuron
  - get rid of the threshold: (−t) becomes w0
3. Threshold function f
- Bias trick: w0 = −t, with a constant input x0 = 1
[Figure: graph of the threshold (step) function f]
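A minimal sketch (mine, not from the slides) of this bias trick in Python: the comparison w1·x1 + … + wn·xn > t becomes wᵀx' > 0 once w0 = −t and the constant input x0 = 1 are prepended.

```python
import numpy as np

def step(z):
    """Threshold function f: the neuron fires iff its net input is positive."""
    return 1 if z > 0 else 0

def discrete_neuron(w, x):
    """w = (w0, w1, ..., wn) with w0 = -t; x' = (1, x1, ..., xn).
    w @ x' > 0 is equivalent to w1*x1 + ... + wn*xn > t."""
    x_ext = np.concatenate(([1.0], np.asarray(x, dtype=float)))
    return step(w @ x_ext)
```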
4. O = A or B
5. O = A and B
6. What is learning for a computer?
7. Learning = weight computation
For O = A and B with threshold t (= 1), the weights must satisfy:
- w1·1 + w2·1 > t
- w1·0 + w2·1 < t
- w1·1 + w2·0 < t
- w1·0 + w2·0 < t
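As a quick illustration (the concrete numbers are mine; the slide leaves them open), w1 = w2 = 0.6 with t = 1 satisfies all four constraints:

```python
# Hypothetical weights for O = A and B with threshold t = 1
w1, w2, t = 0.6, 0.6, 1.0
for A in (0, 1):
    for B in (0, 1):
        print(A, B, "->", int(w1 * A + w2 * B > t))  # fires only for A = B = 1
```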
8. Linearly Separable Set
[Figure: points in the (x1, x2) plane separated by the line w0 + w1·x1 + w2·x2 = 0, with weights w0, w1, w2]

9. Linearly Separable Set
[Figure: another linearly separable set with its separating line w0 + w1·x1 + w2·x2 = 0]

10. Linearly Separable Set
[Figure: another linearly separable set with its separating line w0 + w1·x1 + w2·x2 = 0]

11. Non Linearly Separable Set
[Figure: a set of points in the (x1, x2) plane that no single line w0 + w1·x1 + w2·x2 = 0 can separate]
12. Perceptron Learning Rule (incremental version)
ROSENBLATT (1962)
- FOR i = 0 TO n DO wi := random initial value ENDFOR
- REPEAT
  - select a pair (x, t) in X (each pair must have a positive probability of being selected)
  - IF wᵀx' > 0 THEN y := 1 ELSE y := 0 ENDIF
  - IF y ≠ t THEN FOR i = 0 TO n DO wi := wi + η·(t − y)·xi' ENDFOR ENDIF
- UNTIL X is correctly classified
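A runnable sketch of the incremental rule in Python (numpy; the function names are mine, and a per-epoch random permutation stands in for "each pair has a positive probability of being selected"):

```python
import numpy as np

def perceptron_learning(X, T, eta=0.1, max_epochs=1000, rng=None):
    """Incremental perceptron learning (Rosenblatt). X: inputs (m x n),
    T: targets in {0, 1}; x' is the extended input (1, x1, ..., xn)."""
    rng = rng or np.random.default_rng()
    w = rng.uniform(-1, 1, X.shape[1] + 1)        # random initial weights, w0 = bias
    X_ext = np.hstack([np.ones((len(X), 1)), X])  # prepend the constant x0 = 1
    for _ in range(max_epochs):
        errors = 0
        for i in rng.permutation(len(X)):
            y = 1 if w @ X_ext[i] > 0 else 0
            if y != T[i]:
                w += eta * (T[i] - y) * X_ext[i]  # wi := wi + eta*(t - y)*xi'
                errors += 1
        if errors == 0:                           # X is correctly classified
            break
    return w

# Learning AND: converges, because the set is linearly separable
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
T = np.array([0, 0, 0, 1])
w = perceptron_learning(X, T)
```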
13. Idea: Perceptron Learning Rule
wi := wi + η·(t − y)·xi'
- t = 1, y = 0 (i.e., wᵀx ≤ 0): wnew = w + η·x
- t = 0, y = 1 (i.e., wᵀx > 0): wnew = w − η·x
14. Perceptron Convergence Theorem
- Let X be a finite, linearly separable training set. Let the initial weight vector and the learning parameter be chosen arbitrarily.
- Then for each infinite sequence of training pairs from X, the sequence of weight vectors obtained by applying the perceptron learning rule converges in a finite number of steps.
15. How much can one neuron learn?
16. O = or(x1, …, xn)
- wi > t for all i, e.g., wi = i or wi = 3, etc.
17. O = and(x1, …, xn)
- wi > t/n for i = 1..n (so the full sum exceeds t), while Σ_{i∈S} wi ≤ t for every proper subset S of {1, …, n}
- e.g., wi = 1/n + 1/n² (with t = 1)
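A quick check (my own illustration, with t = 1) that the weight choices on the last two slides implement n-input OR and AND:

```python
from itertools import product

n, t = 4, 1.0
w_or  = [3.0] * n              # each wi > t: any single active input fires the neuron
w_and = [1/n + 1/n**2] * n     # full sum > t; every proper subset sums to <= t

for x in product((0, 1), repeat=n):
    s_or  = sum(w * xi for w, xi in zip(w_or, x))
    s_and = sum(w * xi for w, xi in zip(w_and, x))
    assert (s_or  > t) == any(x)   # neuron output matches or(x1..xn)
    assert (s_and > t) == all(x)   # neuron output matches and(x1..xn)
```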
18. O = or(x1, and(x2, x3))
- w1 = 7, w2 = 0.8, w3 = 0.7
19. O = or(and(x1, …, xk), and(xk+1, …, xn))
- w1 = … = wk = 1/k + 1/k², wk+1 = … = wn = 1/(n−k) + 1/(n−k)²
- Any problem? Inputs mixed from both groups can sum above the threshold even though neither AND is satisfied (see the sketch below).
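The problem can be made concrete (my example, with k = 3, n = 5, t = 1): two active inputs from the first group plus one from the second already push the sum over the threshold, although neither AND holds.

```python
k, n, t = 3, 5, 1.0
w = [1/k + 1/k**2] * k + [1/(n - k) + 1/(n - k)**2] * (n - k)

x = [1, 1, 0, 1, 0]   # neither and(x1..x3) nor and(x4, x5) is satisfied
s = sum(wi * xi for wi, xi in zip(w, x))
print(s, s > t)       # 8/9 + 3/4 ~ 1.64 > 1: the neuron wrongly fires
```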
20. Non Linearly Separable Set
[Figure: a set in the (x1, x2) plane that the line w0 + w1·x1 + w2·x2 = 0 cannot separate]

21. Non Linearly Separable Set
[Figure: another candidate line w0 + w1·x1 + w2·x2 = 0; some points remain misclassified]

22. Non Linearly Separable Set
[Figure: another candidate line w0 + w1·x1 + w2·x2 = 0; some points remain misclassified]
23. Linearly Separable Set: Definition
- Consider a finite set X = {(x(i), t(i)) | x(i) ∈ Rⁿ, t(i) ∈ {0, 1}}.
- The set X is called linearly separable if there exists a vector w = (w0, w1, …, wn) ∈ Rⁿ⁺¹ such that for each pair (x, t) ∈ X:
  - if t = 1 then w0 + Σ_{j=1..n} wj·xj > 0
  - if t = 0 then w0 + Σ_{j=1..n} wj·xj < 0.
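The definition translates directly into a check of a candidate witness w (a sketch of mine, not from the slides):

```python
def separates(w, X):
    """w = (w0, w1, ..., wn); X is a list of pairs (x, t) with t in {0, 1}.
    Returns True iff w witnesses the linear separability of X."""
    for x, t in X:
        s = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
        if t == 1 and not s > 0:
            return False
        if t == 0 and not s < 0:
            return False
    return True
```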
24. Intro BP
- Disadvantage of the discrete MLP: lack of a simple learning algorithm
- Continuous MLP: several
- Most of them are variants of one basic learning algorithm: error backpropagation
25. Backpropagation
- Most famous learning algorithm
- Uses a rule similar to Widrow-Hoff
- (slightly more complicated)
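For a single continuous (sigmoid) neuron, the Widrow-Hoff-style update that backpropagation builds on looks like this (a sketch under common conventions; the sigmoid derivative y·(1 − y) reappears on the algorithm-overview slide at the end):

```python
import numpy as np

def delta_rule_step(w, x, t, eta=0.5):
    """One update for a sigmoid neuron: w += eta * (t - y) * f'(net) * x,
    with f'(net) = y * (1 - y) for the logistic sigmoid."""
    y = 1.0 / (1.0 + np.exp(-(w @ x)))
    return w + eta * (t - y) * y * (1 - y) * x
```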
26. BP: Error
[Figure: network with the output error y1 − t1 known at the output neuron]
- Hidden layer error?
27. Synapse
[Figure: neuron1 → neuron2, connected by weight W]
- W = weight
- The weight serves as an amplifier!
- Value (v1, v2): internal activation
28. Inverse Synapse
[Figure: the same connection traversed backwards, neuron2 → neuron1, weight W]
- W = weight
- The weight serves as an amplifier!
- Value (v1, v2): error
30. BP: Error
[Figure: two-layer network with outputs O1, O2 and inputs I1, I2; the output error y1 − t1 is known at O1]
- Hidden layer error?
31. Backpropagation to hidden layer
[Figure: the output error flows backwards through the weights to the hidden layer (O2, I2)]
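In code, this backwards pass through the weights is one line (a sketch in my notation: delta_out holds the output-layer errors, W the hidden-to-output weight matrix, y_hidden the sigmoid activations of the hidden layer):

```python
import numpy as np

def hidden_deltas(delta_out, W, y_hidden):
    """Each hidden unit collects the output errors through the same weights
    that carried its activation forward, scaled by its own sigmoid slope:
    delta_hidden = f'(net_hidden) * (W^T @ delta_out)."""
    return y_hidden * (1 - y_hidden) * (W.T @ delta_out)
```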
32. Algorithms and their relations
- Discrete neuron → Perceptron Learning: Δwi = η·(t − y)·xi
- Continuous neuron → Gradient Descent / Delta Rule: Δwi = η·(t − y)·f′·xi = η·(t − y)·y·(1 − y)·xi
- Continuous neurons (multiple layers) → BP:
  - δr = F′r·(t − yr)
  - δs−1 = F′s−1·Wsᵀ·δs
  - ΔWs = η·δs·ys−1ᵀ
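Putting the three BP equations together gives a minimal two-layer trainer (a sketch of mine with sigmoid units and biases folded in as extra weights; names and hyperparameters are illustrative). Trained on XOR, it learns the non-linearly separable set that defeated the single neuron:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_bp(X, T, hidden=2, eta=0.5, epochs=10000, rng=None):
    """Two-layer backpropagation following the scheme above:
    delta_r = F'_r (t - y_r); delta_{s-1} = F'_{s-1} W_s^T delta_s;
    dW_s = eta * delta_s * y_{s-1}^T."""
    rng = rng or np.random.default_rng(0)
    W1 = rng.uniform(-1, 1, (hidden, X.shape[1] + 1))   # hidden layer (incl. bias)
    W2 = rng.uniform(-1, 1, (T.shape[1], hidden + 1))   # output layer (incl. bias)
    for _ in range(epochs):
        for x, t in zip(X, T):
            x0 = np.concatenate(([1.0], x))             # extended input x'
            h = sigmoid(W1 @ x0)
            h0 = np.concatenate(([1.0], h))
            y = sigmoid(W2 @ h0)
            d2 = y * (1 - y) * (t - y)                  # output delta: F'(t - y)
            d1 = h * (1 - h) * (W2[:, 1:].T @ d2)       # hidden delta: F' W^T delta
            W2 += eta * np.outer(d2, h0)                # dW = eta * delta * y^T
            W1 += eta * np.outer(d1, x0)
    return W1, W2

# XOR: non-linearly separable, yet learnable by the two-layer network
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)
W1, W2 = train_bp(X, T)
```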
33. More Demos
- http://wwwis.win.tue.nl/acristea/HTML/NN/tutorial/