Title: Neural Networks and Backpropagation
Slide 1: Neural Networks and Backpropagation
- Sebastian Thrun
- 15-781, Fall 2000
Slide 2: Outline
- Perceptrons
- Learning
- Hidden Layer Representations
- Speeding Up Training
- Bias, Overfitting and Early Stopping
- (Example: Face Recognition)
Slide 3: ALVINN drives 70 mph on highways
Dean Pomerleau, CMU
Slide 4: ALVINN drives 70 mph on highways
Slide 5: Human Brain
Slide 6: Neurons
Slide 7: Human Learning
- Number of neurons: ~10^10
- Connections per neuron: 10^4 to 10^5
- Neuron switching time: 0.001 second
- Scene recognition time: 0.1 second
- That leaves time for only about 100 sequential inference steps (0.1 s / 0.001 s), which doesn't seem like much, so much of the brain's computation must run in parallel
Slide 8: The Bible (1986)
Slide 9: Perceptron
Slide 10: Inverter

input x1 | output
       0 | 1
       1 | 0

[Figure: a single perceptron computing NOT x1]
Slide 11: Boolean OR

input x1 | input x2 | output
       0 |        0 | 0
       0 |        1 | 1
       1 |        0 | 1
       1 |        1 | 1

[Figure: OR decision boundary in the (x1, x2) plane]
Slide 12: Boolean AND

input x1 | input x2 | output
       0 |        0 | 0
       0 |        1 | 0
       1 |        0 | 0
       1 |        1 | 1

[Figure: AND decision boundary in the (x1, x2) plane]
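Each of these gates is a single threshold unit; a minimal sketch with one concrete weight assignment (the weights are an illustrative choice, not unique):

# Single-perceptron implementations of NOT, OR, AND (illustrative weights;
# any weights defining the same separating line work).

def perceptron(w0, w1, w2=0.0):
    # Returns a threshold unit computing step(w0 + w1*x1 + w2*x2)
    def unit(x1, x2=0):
        return 1 if w0 + w1 * x1 + w2 * x2 > 0 else 0
    return unit

NOT = perceptron( 0.5, -1.0)        # 0 -> 1, 1 -> 0
OR  = perceptron(-0.5,  1.0, 1.0)   # fires if at least one input is 1
AND = perceptron(-1.5,  1.0, 1.0)   # fires only if both inputs are 1

assert [NOT(x) for x in (0, 1)] == [1, 0]
assert [OR(a, b) for a, b in ((0,0),(0,1),(1,0),(1,1))] == [0, 1, 1, 1]
assert [AND(a, b) for a, b in ((0,0),(0,1),(1,0),(1,1))] == [0, 0, 0, 1]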
Slide 13: Boolean XOR

input x1 | input x2 | output
       0 |        0 | 0
       0 |        1 | 1
       1 |        0 | 1
       1 |        1 | 0

Eeek!

[Figure: XOR points in the (x1, x2) plane]
Slide 14: Linear Separability

[Figure: linearly separable point sets in the (x1, x2) plane]
Slide 15: Linear Separability

[Figure: AND is linearly separable in the (x1, x2) plane; a single line separates the one positive point from the three negative points]
Slide 16: Linear Separability

[Figure: XOR is not linearly separable in the (x1, x2) plane; no single line separates the positives from the negatives]
Slide 17: Boolean XOR

input x1 | input x2 | output
       0 |        0 | 0
       0 |        1 | 1
       1 |        0 | 1
       1 |        1 | 0

[Figure: a two-layer network with hidden units h1, h2 between inputs x1, x2 and output o; XOR becomes representable with a hidden layer]
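One way to realize this with a hidden layer (an illustrative decomposition, not necessarily the one drawn on the slide): h1 computes OR, h2 computes AND, and the output fires when h1 is on but h2 is off:

# Sketch: XOR via one hidden layer of threshold units.
# Weights are hand-picked for illustration (h1 = OR, h2 = AND, o = h1 AND NOT h2).

def step(z):
    return 1 if z > 0 else 0

def xor(x1, x2):
    h1 = step(x1 + x2 - 0.5)        # OR:  fires if at least one input is 1
    h2 = step(x1 + x2 - 1.5)        # AND: fires only if both inputs are 1
    return step(h1 - h2 - 0.5)      # o = h1 AND NOT h2

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor(x1, x2))  # prints the XOR truth table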
Slide 18: Perceptron Training Rule

w_i ← w_i + Δw_i, with Δw_i = η (t − o) x_i

where t is the target output, o the perceptron output, and η the step size (learning rate).
Slide 19: Converges, if
- the training data is linearly separable
- the step size η is sufficiently small
- there are no hidden units
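A minimal runnable sketch of the training rule from slide 18, assuming a threshold unit; eta, the epoch count, and the OR dataset are illustrative choices:

def step(z):
    return 1 if z > 0 else 0

def train_perceptron(data, eta=0.1, epochs=50):
    # data: list of (inputs, target); w[0] is the bias weight (constant input x0 = 1)
    n = len(data[0][0])
    w = [0.0] * (n + 1)
    for _ in range(epochs):
        for x, t in data:
            o = step(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))
            # Perceptron training rule: w_i <- w_i + eta * (t - o) * x_i
            w[0] += eta * (t - o)              # bias input is always 1
            for i, xi in enumerate(x):
                w[i + 1] += eta * (t - o) * xi
    return w

# OR is linearly separable, so the rule converges:
or_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w = train_perceptron(or_data)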
Slide 20: How To Train Multi-Layer Perceptrons?

[Figure: the two-layer network from slide 17: inputs x1, x2; hidden units h1, h2; output o]
Slide 21: Sigmoid Squashing Function

[Figure: a sigmoid unit: inputs x1, x2, ..., xn plus a constant input x0 = 1, weights w0, w1, w2, ..., wn, and output o = σ(Σ_i w_i x_i)]
Slide 22: Sigmoid Squashing Function

σ(x) = 1 / (1 + e^(−x))

[Figure: plot of σ(x) against x]
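The o (1 − o) factor in the update rules on the later slides is the sigmoid's derivative; the one-line derivation:

\sigma'(x) = \frac{e^{-x}}{(1+e^{-x})^{2}} = \sigma(x)\,\bigl(1 - \sigma(x)\bigr)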
Slide 23: Gradient Descent
- Learn the weights w_i that minimize the squared error E(w) = ½ Σ_d (t_d − o_d)², summed over the training examples d
Slide 24: Gradient Descent
Slide 25: Gradient Descent (single layer)
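A sketch of the single-layer gradient, assuming the squared error from slide 23 and o_d = σ(Σ_i w_i x_{i,d}):

\frac{\partial E}{\partial w_i}
  = \frac{\partial}{\partial w_i}\,\frac{1}{2}\sum_d (t_d - o_d)^2
  = -\sum_d (t_d - o_d)\, o_d (1 - o_d)\, x_{i,d}

so stepping downhill gives

\Delta w_i = -\eta \,\frac{\partial E}{\partial w_i}
  = \eta \sum_d (t_d - o_d)\, o_d (1 - o_d)\, x_{i,d}

which is exactly the accumulation step in the pseudocode on the next slide.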
Slide 26: Batch Learning
- Initialize each w_i to a small random value
- Repeat until termination:
  - Δw_i ← 0
  - For each training example d do
    - o_d ← σ(Σ_i w_i x_i,d)
    - Δw_i ← Δw_i + η (t_d − o_d) o_d (1 − o_d) x_i,d
  - w_i ← w_i + Δw_i
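A runnable sketch of this loop for a single sigmoid unit; the AND dataset, eta, and the epoch count are illustrative choices:

import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_batch(data, eta=0.5, epochs=1000):
    # data: list of (inputs, target); a constant input 1.0 is prepended for the bias w0
    n = len(data[0][0]) + 1
    w = [random.uniform(-0.05, 0.05) for _ in range(n)]
    for _ in range(epochs):
        dw = [0.0] * n                              # Δw_i ← 0
        for x, t in data:
            xd = (1.0,) + tuple(x)
            o = sigmoid(sum(wi * xi for wi, xi in zip(w, xd)))
            for i in range(n):                      # Δw_i ← Δw_i + η (t−o) o (1−o) x_i
                dw[i] += eta * (t - o) * o * (1 - o) * xd[i]
        w = [wi + dwi for wi, dwi in zip(w, dw)]    # w_i ← w_i + Δw_i
    return w

and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = train_batch(and_data)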
Slide 27: Incremental (Online) Learning
- Initialize each w_i to a small random value
- Repeat until termination:
  - For each training example d do
    - Δw_i ← 0
    - o_d ← σ(Σ_i w_i x_i,d)
    - Δw_i ← Δw_i + η (t_d − o_d) o_d (1 − o_d) x_i,d
    - w_i ← w_i + Δw_i
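The only difference from the batch sketch: the weights move inside the example loop. Reusing sigmoid and the imports from the batch sketch above:

def train_online(data, eta=0.5, epochs=1000):
    # Same setup as train_batch, but w is updated after every example
    n = len(data[0][0]) + 1
    w = [random.uniform(-0.05, 0.05) for _ in range(n)]
    for _ in range(epochs):
        for x, t in data:
            xd = (1.0,) + tuple(x)
            o = sigmoid(sum(wi * xi for wi, xi in zip(w, xd)))
            for i in range(n):                  # w_i ← w_i + η (t−o) o (1−o) x_i
                w[i] += eta * (t - o) * o * (1 - o) * xd[i]
    return w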
Slide 28: Backpropagation Algorithm
- Generalization to multiple layers and multiple output units
Slide 29: Backpropagation Algorithm
- Initialize all weights to small random numbers
- For each training example do
  - For each hidden unit h: o_h ← σ(Σ_i w_ih x_i)   (forward pass)
  - For each output unit k: o_k ← σ(Σ_h w_hk o_h)
  - For each output unit k: δ_k ← o_k (1 − o_k) (t_k − o_k)   (backward pass)
  - For each hidden unit h: δ_h ← o_h (1 − o_h) Σ_k w_hk δ_k
  - Update each network weight w_ij with w_ij ← w_ij + η δ_j x_ij, where x_ij is the input from node i into node j
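A compact runnable sketch of these steps for one hidden layer, in NumPy; the XOR task, layer sizes, eta, and the iteration count are illustrative choices:

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, t, W1, W2, eta=0.3):
    # Forward pass; x and the hidden vector both get a constant bias input 1
    h = sigmoid(W1 @ x)                      # hidden unit outputs o_h
    hb = np.concatenate(([1.0], h))
    o = sigmoid(W2 @ hb)                     # output unit outputs o_k
    # Backward pass (deltas use the pre-update weights)
    delta_k = o * (1 - o) * (t - o)                    # δ_k = o_k(1−o_k)(t_k−o_k)
    delta_h = h * (1 - h) * (W2[:, 1:].T @ delta_k)    # δ_h = o_h(1−o_h) Σ_k w_hk δ_k
    # Updates: w_ij ← w_ij + η δ_j x_ij
    W2 += eta * np.outer(delta_k, hb)
    W1 += eta * np.outer(delta_h, x)
    return o

# Illustrative run: learn XOR with 2 hidden units
# (may need more iterations or a different seed to converge)
W1 = rng.uniform(-0.5, 0.5, (2, 3))          # hidden weights: 2 units, bias + 2 inputs
W2 = rng.uniform(-0.5, 0.5, (1, 3))          # output weights: 1 unit, bias + 2 hidden
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
for _ in range(10000):
    for xs, t in data:
        backprop_step(np.array([1.0] + xs), np.array([float(t)]), W1, W2)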
Slide 30: Backpropagation Algorithm
Slide 31: Can This Be Learned?

Input        Output
10000000  →  10000000
01000000  →  01000000
00100000  →  00100000
00010000  →  00010000
00001000  →  00001000
00000100  →  00000100
00000010  →  00000010
00000001  →  00000001
Slide 32: Learned Hidden Layer Representation

Input        Hidden Values        Output
10000000  →  .89 .04 .08  →  10000000
01000000  →  .01 .11 .88  →  01000000
00100000  →  .01 .97 .27  →  00100000
00010000  →  .99 .97 .71  →  00010000
00001000  →  .03 .05 .02  →  00001000
00000100  →  .22 .99 .99  →  00000100
00000010  →  .80 .01 .98  →  00000010
00000001  →  .60 .94 .01  →  00000001
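The 8-3-8 identity task is easy to reproduce with the backprop sketch above; a hedged example (the exact hidden values will differ from the table, since they depend on the random initialization):

# 8-3-8 identity network, reusing backprop_step, sigmoid, rng, and numpy from above.
# The hidden activations tend toward a 3-bit-like code.
W1 = rng.uniform(-0.5, 0.5, (3, 9))     # 3 hidden units, bias + 8 inputs
W2 = rng.uniform(-0.5, 0.5, (8, 4))     # 8 output units, bias + 3 hidden
eye = np.eye(8)
for _ in range(5000):
    for row in eye:
        backprop_step(np.concatenate(([1.0], row)), row, W1, W2, eta=0.3)

for row in eye:
    h = sigmoid(W1 @ np.concatenate(([1.0], row)))
    print(row.astype(int), np.round(h, 2))   # inspect the learned hidden code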
Slide 33: Training: Internal Representation
Slide 34: Training: Error
Slide 35: Training: Weights
Slide 36: ANNs in Speech Recognition
Huang & Lippmann, 1988
Slide 37: Speeding It Up: Momentum

[Figure: error E plotted against weight w_ij, with flat regions and local dips along the error surface]
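The usual form of the momentum trick, with a momentum constant 0 ≤ α < 1 (α is an assumption here, not named on the slide):

Δw_ij(n) = η δ_j x_ij + α Δw_ij(n − 1)

Each update keeps a fraction α of the previous step, so successive steps in the same direction accelerate, and the search can coast across plateaus and small local dips of the error surface.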
Slide 38: Convergence
- May get stuck in local minima
- Weights may diverge
- ...but works well in practice
Slide 39: Overfitting in ANNs
Slide 40: Early Stopping (Important!!!)
- Stop training when the error on a held-out validation set starts to go up
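A minimal sketch of the recipe, assuming hypothetical train_step() and val_error() callbacks:

import copy

def train_with_early_stopping(train_step, val_error, patience=10, max_epochs=1000):
    # train_step(): runs one training epoch and returns the current weights (hypothetical)
    # val_error(): returns the current error on a held-out validation set (hypothetical)
    best_err, best_w, bad = float("inf"), None, 0
    for _ in range(max_epochs):
        w = train_step()
        err = val_error()
        if err < best_err:                   # validation error still improving
            best_err, best_w, bad = err, copy.deepcopy(w), 0
        else:                                # validation error went up
            bad += 1
            if bad >= patience:
                break                        # stop and keep the best weights seen
    return best_w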
Slide 41: Sigmoid Squashing Function

[Figure: plot of σ(x) against x, repeated from slide 22]
Slide 42: ANNs for Face Recognition
- Head pose (1-of-4): 90% accuracy
- Face recognition (1-of-20): 90% accuracy
Slide 43: ANNs for Face Recognition
Slide 44: Recurrent Networks