Title: Neural Networks and Backpropagation
Slide 1: Neural Networks and Backpropagation
- Sebastian Thrun
- 15-781, Fall 2000
Slide 2: Outline
- Perceptrons
- Learning
- Hidden Layer Representations
- Speeding Up Training
- Bias, Overfitting and Early Stopping
- (Example: Face Recognition)
Slide 3: ALVINN drives 70 mph on highways
Dean Pomerleau, CMU
Slide 4: ALVINN drives 70 mph on highways
Slide 5: Human Brain
Slide 6: Neurons
Slide 7: Human Learning
- Number of neurons: ~10^10
- Connections per neuron: 10^4 to 10^5
- Neuron switching time: 0.001 second
- Scene recognition time: 0.1 second
- That leaves time for only about 100 sequential inference steps (0.1 s / 0.001 s), which doesn't seem like much, so much of the brain's computation must run in parallel
Slide 8: The Bible (1986)
Slide 9: Perceptron
Slide 10: Inverter

input x1 | output
       0 | 1
       1 | 0

[Figure: a single perceptron computing NOT x1]
Slide 11: Boolean OR

input x1 | input x2 | output
       0 |        0 | 0
       0 |        1 | 1
       1 |        0 | 1
       1 |        1 | 1

[Figure: OR decision boundary in the (x1, x2) plane]
Slide 12: Boolean AND

input x1 | input x2 | output
       0 |        0 | 0
       0 |        1 | 0
       1 |        0 | 0
       1 |        1 | 1

[Figure: AND decision boundary in the (x1, x2) plane]
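Each of these gates is a single threshold unit; a minimal sketch with one concrete weight assignment (the weights are an illustrative choice, not unique):

# Single-perceptron implementations of NOT, OR, AND (illustrative weights;
# any weights defining the same separating line work).

def perceptron(w0, w1, w2=0.0):
    # Returns a threshold unit computing step(w0 + w1*x1 + w2*x2)
    def unit(x1, x2=0):
        return 1 if w0 + w1 * x1 + w2 * x2 > 0 else 0
    return unit

NOT = perceptron( 0.5, -1.0)        # 0 -> 1, 1 -> 0
OR  = perceptron(-0.5,  1.0, 1.0)   # fires if at least one input is 1
AND = perceptron(-1.5,  1.0, 1.0)   # fires only if both inputs are 1

assert [NOT(x) for x in (0, 1)] == [1, 0]
assert [OR(a, b) for a, b in ((0,0),(0,1),(1,0),(1,1))] == [0, 1, 1, 1]
assert [AND(a, b) for a, b in ((0,0),(0,1),(1,0),(1,1))] == [0, 0, 0, 1]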
Slide 13: Boolean XOR

input x1 | input x2 | output
       0 |        0 | 0
       0 |        1 | 1
       1 |        0 | 1
       1 |        1 | 0

Eeek!

[Figure: XOR points in the (x1, x2) plane]
Slide 14: Linear Separability

[Figure: linearly separable point sets in the (x1, x2) plane]
Slide 15: Linear Separability

[Figure: AND is linearly separable in the (x1, x2) plane; a single line separates the one positive point from the three negative points]
Slide 16: Linear Separability

[Figure: XOR is not linearly separable in the (x1, x2) plane; no single line separates the positives from the negatives]
Slide 17: Boolean XOR

input x1 | input x2 | output
       0 |        0 | 0
       0 |        1 | 1
       1 |        0 | 1
       1 |        1 | 0

[Figure: a two-layer network with hidden units h1, h2 between inputs x1, x2 and output o; XOR becomes representable with a hidden layer]
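One way to realize this with a hidden layer (an illustrative decomposition, not necessarily the one drawn on the slide): h1 computes OR, h2 computes AND, and the output fires when h1 is on but h2 is off:

# Sketch: XOR via one hidden layer of threshold units.
# Weights are hand-picked for illustration (h1 = OR, h2 = AND, o = h1 AND NOT h2).

def step(z):
    return 1 if z > 0 else 0

def xor(x1, x2):
    h1 = step(x1 + x2 - 0.5)        # OR:  fires if at least one input is 1
    h2 = step(x1 + x2 - 1.5)        # AND: fires only if both inputs are 1
    return step(h1 - h2 - 0.5)      # o = h1 AND NOT h2

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor(x1, x2))  # prints the XOR truth table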
Slide 18: Perceptron Training Rule

w_i ← w_i + Δw_i, with Δw_i = η (t − o) x_i

where t is the target output, o the perceptron output, and η the step size (learning rate).
Slide 19: Converges, if
- the training data is linearly separable
- the step size η is sufficiently small
- there are no hidden units
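A minimal runnable sketch of the training rule from slide 18, assuming a threshold unit; eta, the epoch count, and the OR dataset are illustrative choices:

def step(z):
    return 1 if z > 0 else 0

def train_perceptron(data, eta=0.1, epochs=50):
    # data: list of (inputs, target); w[0] is the bias weight (constant input x0 = 1)
    n = len(data[0][0])
    w = [0.0] * (n + 1)
    for _ in range(epochs):
        for x, t in data:
            o = step(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))
            # Perceptron training rule: w_i <- w_i + eta * (t - o) * x_i
            w[0] += eta * (t - o)              # bias input is always 1
            for i, xi in enumerate(x):
                w[i + 1] += eta * (t - o) * xi
    return w

# OR is linearly separable, so the rule converges:
or_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w = train_perceptron(or_data)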
Slide 20: How To Train Multi-Layer Perceptrons?

[Figure: the two-layer network from slide 17: inputs x1, x2; hidden units h1, h2; output o]
Slide 21: Sigmoid Squashing Function

[Figure: a sigmoid unit: inputs x1, x2, ..., xn plus a constant input x0 = 1, weights w0, w1, w2, ..., wn, and output o = σ(Σ_i w_i x_i)]
Slide 22: Sigmoid Squashing Function

σ(x) = 1 / (1 + e^(−x))

[Figure: plot of σ(x) against x]
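The o (1 − o) factor in the update rules on the later slides is the sigmoid's derivative; the one-line derivation:

\sigma'(x) = \frac{e^{-x}}{(1+e^{-x})^{2}} = \sigma(x)\,\bigl(1 - \sigma(x)\bigr)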
Slide 23: Gradient Descent
- Learn the weights w_i that minimize the squared error E(w) = ½ Σ_d (t_d − o_d)², summed over the training examples d
Slide 24: Gradient Descent
Slide 25: Gradient Descent (single layer)
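A sketch of the single-layer gradient, assuming the squared error from slide 23 and o_d = σ(Σ_i w_i x_{i,d}):

\frac{\partial E}{\partial w_i}
  = \frac{\partial}{\partial w_i}\,\frac{1}{2}\sum_d (t_d - o_d)^2
  = -\sum_d (t_d - o_d)\, o_d (1 - o_d)\, x_{i,d}

so stepping downhill gives

\Delta w_i = -\eta \,\frac{\partial E}{\partial w_i}
  = \eta \sum_d (t_d - o_d)\, o_d (1 - o_d)\, x_{i,d}

which is exactly the accumulation step in the pseudocode on the next slide.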
Slide 26: Batch Learning
- Initialize each w_i to a small random value
- Repeat until termination:
  - Δw_i ← 0
  - For each training example d do
    - o_d ← σ(Σ_i w_i x_i,d)
    - Δw_i ← Δw_i + η (t_d − o_d) o_d (1 − o_d) x_i,d
  - w_i ← w_i + Δw_i
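A runnable sketch of this loop for a single sigmoid unit; the AND dataset, eta, and the epoch count are illustrative choices:

import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_batch(data, eta=0.5, epochs=1000):
    # data: list of (inputs, target); a constant input 1.0 is prepended for the bias w0
    n = len(data[0][0]) + 1
    w = [random.uniform(-0.05, 0.05) for _ in range(n)]
    for _ in range(epochs):
        dw = [0.0] * n                              # Δw_i ← 0
        for x, t in data:
            xd = (1.0,) + tuple(x)
            o = sigmoid(sum(wi * xi for wi, xi in zip(w, xd)))
            for i in range(n):                      # Δw_i ← Δw_i + η (t−o) o (1−o) x_i
                dw[i] += eta * (t - o) * o * (1 - o) * xd[i]
        w = [wi + dwi for wi, dwi in zip(w, dw)]    # w_i ← w_i + Δw_i
    return w

and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = train_batch(and_data)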
Slide 27: Incremental (Online) Learning
- Initialize each w_i to a small random value
- Repeat until termination:
  - For each training example d do
    - Δw_i ← 0
    - o_d ← σ(Σ_i w_i x_i,d)
    - Δw_i ← Δw_i + η (t_d − o_d) o_d (1 − o_d) x_i,d
    - w_i ← w_i + Δw_i
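The only difference from the batch sketch: the weights move inside the example loop. Reusing sigmoid and the imports from the batch sketch above:

def train_online(data, eta=0.5, epochs=1000):
    # Same setup as train_batch, but w is updated after every example
    n = len(data[0][0]) + 1
    w = [random.uniform(-0.05, 0.05) for _ in range(n)]
    for _ in range(epochs):
        for x, t in data:
            xd = (1.0,) + tuple(x)
            o = sigmoid(sum(wi * xi for wi, xi in zip(w, xd)))
            for i in range(n):                  # w_i ← w_i + η (t−o) o (1−o) x_i
                w[i] += eta * (t - o) * o * (1 - o) * xd[i]
    return w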
Slide 28: Backpropagation Algorithm
- Generalization to multiple layers and multiple output units
Slide 29: Backpropagation Algorithm
- Initialize all weights to small random numbers
- For each training example do
  - For each hidden unit h: o_h ← σ(Σ_i w_ih x_i)   (forward pass)
  - For each output unit k: o_k ← σ(Σ_h w_hk o_h)
  - For each output unit k: δ_k ← o_k (1 − o_k) (t_k − o_k)   (backward pass)
  - For each hidden unit h: δ_h ← o_h (1 − o_h) Σ_k w_hk δ_k
  - Update each network weight w_ij with w_ij ← w_ij + η δ_j x_ij, where x_ij is the input from node i into node j
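A compact runnable sketch of these steps for one hidden layer, in NumPy; the XOR task, layer sizes, eta, and the iteration count are illustrative choices:

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, t, W1, W2, eta=0.3):
    # Forward pass; x and the hidden vector both get a constant bias input 1
    h = sigmoid(W1 @ x)                      # hidden unit outputs o_h
    hb = np.concatenate(([1.0], h))
    o = sigmoid(W2 @ hb)                     # output unit outputs o_k
    # Backward pass (deltas use the pre-update weights)
    delta_k = o * (1 - o) * (t - o)                    # δ_k = o_k(1−o_k)(t_k−o_k)
    delta_h = h * (1 - h) * (W2[:, 1:].T @ delta_k)    # δ_h = o_h(1−o_h) Σ_k w_hk δ_k
    # Updates: w_ij ← w_ij + η δ_j x_ij
    W2 += eta * np.outer(delta_k, hb)
    W1 += eta * np.outer(delta_h, x)
    return o

# Illustrative run: learn XOR with 2 hidden units
# (may need more iterations or a different seed to converge)
W1 = rng.uniform(-0.5, 0.5, (2, 3))          # hidden weights: 2 units, bias + 2 inputs
W2 = rng.uniform(-0.5, 0.5, (1, 3))          # output weights: 1 unit, bias + 2 hidden
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
for _ in range(10000):
    for xs, t in data:
        backprop_step(np.array([1.0] + xs), np.array([float(t)]), W1, W2)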
Slide 30: Backpropagation Algorithm
Slide 31: Can This Be Learned?

Input        Output
10000000  →  10000000
01000000  →  01000000
00100000  →  00100000
00010000  →  00010000
00001000  →  00001000
00000100  →  00000100
00000010  →  00000010
00000001  →  00000001
Slide 32: Learned Hidden Layer Representation

Input        Hidden Values        Output
10000000  →  .89 .04 .08  →  10000000
01000000  →  .01 .11 .88  →  01000000
00100000  →  .01 .97 .27  →  00100000
00010000  →  .99 .97 .71  →  00010000
00001000  →  .03 .05 .02  →  00001000
00000100  →  .22 .99 .99  →  00000100
00000010  →  .80 .01 .98  →  00000010
00000001  →  .60 .94 .01  →  00000001
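The 8-3-8 identity task is easy to reproduce with the backprop sketch above; a hedged example (the exact hidden values will differ from the table, since they depend on the random initialization):

# 8-3-8 identity network, reusing backprop_step, sigmoid, rng, and numpy from above.
# The hidden activations tend toward a 3-bit-like code.
W1 = rng.uniform(-0.5, 0.5, (3, 9))     # 3 hidden units, bias + 8 inputs
W2 = rng.uniform(-0.5, 0.5, (8, 4))     # 8 output units, bias + 3 hidden
eye = np.eye(8)
for _ in range(5000):
    for row in eye:
        backprop_step(np.concatenate(([1.0], row)), row, W1, W2, eta=0.3)

for row in eye:
    h = sigmoid(W1 @ np.concatenate(([1.0], row)))
    print(row.astype(int), np.round(h, 2))   # inspect the learned hidden code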
Slide 33: Training: Internal Representation
Slide 34: Training: Error
Slide 35: Training: Weights
Slide 36: ANNs in Speech Recognition
Huang & Lippmann, 1988
Slide 37: Speeding It Up: Momentum

[Figure: error E plotted against weight w_ij, with flat regions and local dips along the error surface]
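The usual form of the momentum trick, with a momentum constant 0 ≤ α < 1 (α is an assumption here, not named on the slide):

Δw_ij(n) = η δ_j x_ij + α Δw_ij(n − 1)

Each update keeps a fraction α of the previous step, so successive steps in the same direction accelerate, and the search can coast across plateaus and small local dips of the error surface.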
Slide 38: Convergence
- May get stuck in local minima
- Weights may diverge
- ...but works well in practice
Slide 39: Overfitting in ANNs
Slide 40: Early Stopping (Important!!!)
- Stop training when the error on a held-out validation set starts to go up
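A minimal sketch of the recipe, assuming hypothetical train_step() and val_error() callbacks:

import copy

def train_with_early_stopping(train_step, val_error, patience=10, max_epochs=1000):
    # train_step(): runs one training epoch and returns the current weights (hypothetical)
    # val_error(): returns the current error on a held-out validation set (hypothetical)
    best_err, best_w, bad = float("inf"), None, 0
    for _ in range(max_epochs):
        w = train_step()
        err = val_error()
        if err < best_err:                   # validation error still improving
            best_err, best_w, bad = err, copy.deepcopy(w), 0
        else:                                # validation error went up
            bad += 1
            if bad >= patience:
                break                        # stop and keep the best weights seen
    return best_w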
Slide 41: Sigmoid Squashing Function

[Figure: plot of σ(x) against x, repeated from slide 22]
Slide 42: ANNs for Face Recognition
- Head pose (1-of-4): 90% accuracy
- Face recognition (1-of-20): 90% accuracy
Slide 43: ANNs for Face Recognition
Slide 44: Recurrent Networks