Neural Networks and Backpropagation - PowerPoint PPT Presentation

Neural Networks and Backpropagation Sebastian Thrun 15-781, Fall 2000 Outline Perceptrons Learning Hidden Layer Representations Speeding Up Training Bias, Overfitting ... – PowerPoint PPT presentation

1
Neural Networks and Backpropagation
  • Sebastian Thrun
  • 15-781, Fall 2000

2
Outline
  • Perceptrons
  • Learning
  • Hidden Layer Representations
  • Speeding Up Training
  • Bias, Overfitting and Early Stopping
  • (Example: Face Recognition)

3
ALVINN drives 70 mph on highways
Dean Pomerleau, CMU
4
ALVINN drives 70 mph on highways
5
Human Brain
6
Neurons
7
Human Learning
  • Number of neurons: 10^10
  • Connections per neuron: 10^4 to 10^5
  • Neuron switching time: 0.001 second
  • Scene recognition time: 0.1 second
  • 100 inference steps (0.1 s / 0.001 s) doesn't seem like much

8
The Bible (1986)
9
Perceptron
10
Inverter
input x1   output
0          1
1          0
(Diagram: unit with input x1)
11
Boolean OR
input x1   input x2   output
0          0          0
0          1          1
1          0          1
1          1          1
(Plot: axes x1 and x2)
12
Boolean AND
input x1   input x2   output
0          0          0
0          1          0
1          0          0
1          1          1
(Plot: axes x1 and x2)
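Each of the three gates above can be computed by a single threshold unit. The following is a minimal sketch, assuming a unit that outputs 1 when w0 + Σ_i w_i x_i > 0 and 0 otherwise. The weight values are one hand-picked choice for illustration and are not taken from the slides.

```python
def perceptron(weights, inputs):
    """Threshold unit: weights[0] is the bias w0, the rest pair with inputs."""
    total = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1 if total > 0 else 0

# Hand-picked (hypothetical) weights [w0, w1, ...] for each gate.
NOT_W = [0.5, -1.0]
OR_W = [-0.5, 1.0, 1.0]
AND_W = [-1.5, 1.0, 1.0]

def truth_table(weights, n_inputs):
    """Evaluate the unit on every binary input, in counting order (00, 01, 10, 11)."""
    rows = []
    for code in range(2 ** n_inputs):
        bits = [(code >> (n_inputs - 1 - i)) & 1 for i in range(n_inputs)]
        rows.append(perceptron(weights, bits))
    return rows

print(truth_table(NOT_W, 1))  # [1, 0]
print(truth_table(OR_W, 2))   # [0, 1, 1, 1]
print(truth_table(AND_W, 2))  # [0, 0, 0, 1]
```

Many other weight settings give the same truth tables; only the position of the separating line relative to the four input points matters.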
13
Boolean XOR
input x1   input x2   output
0          0          0
0          1          1
1          0          1
1          1          0
Eeek!
(Plot: axes x1 and x2)
14
Linear Separability
(Plot: axes x1 and x2)
15
Linear Separability
(Plot of AND over axes x1 and x2, with points marked + and -)
16
Linear Separability
(Plot of XOR over axes x1 and x2, with points marked + and -)
17
Boolean XOR
input x1   input x2   output
0          0          0
0          1          1
1          0          1
1          1          0
(Diagram: network with input units x, hidden units h, and output unit o)
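With a layer of hidden units, XOR becomes representable. The following is a minimal sketch, assuming two hidden threshold units and hand-picked weights. The roles given to h1 and h2 (OR and AND) are an illustrative assumption, not the weights from the slide's diagram.

```python
def step(total):
    """Threshold activation: 1 if the weighted sum is positive, else 0."""
    return 1 if total > 0 else 0

def xor_net(x1, x2):
    # Hidden unit h1 acts as OR, hidden unit h2 acts as AND (hypothetical choice).
    h1 = step(-0.5 + x1 + x2)
    h2 = step(-1.5 + x1 + x2)
    # The output fires when OR is on but AND is off.
    return step(-0.5 + h1 - h2)

print([xor_net(a, b) for a in (0, 1) for b in (0, 1)])  # [0, 1, 1, 0]
```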
18
Perceptron Training Rule
19
Converges, if
  • training data linearly separable
  • step size η sufficiently small
  • no hidden units
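The rule itself did not survive in this transcript. The following sketch assumes the standard perceptron training rule, w_i ← w_i + η (t - o) x_i, applied to a threshold unit with a constant bias input x0 = 1. The function names, the zero initialization and the epoch cap are illustrative choices.

```python
def predict(weights, x):
    """Threshold unit; weights[0] is the bias weight for the constant input x0 = 1."""
    total = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return 1 if total > 0 else 0

def train_perceptron(examples, eta=0.1, max_epochs=100):
    """Apply w_i <- w_i + eta * (t - o) * x_i until an epoch makes no mistakes."""
    n = len(examples[0][0])
    weights = [0.0] * (n + 1)
    for _ in range(max_epochs):
        mistakes = 0
        for x, t in examples:
            o = predict(weights, x)
            if o != t:  # when o == t the update term (t - o) is zero anyway
                mistakes += 1
                weights[0] += eta * (t - o)  # bias input x0 = 1
                for i, xi in enumerate(x):
                    weights[i + 1] += eta * (t - o) * xi
        if mistakes == 0:
            break
    return weights

# AND is linearly separable, so the conditions listed above are met.
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = train_perceptron(AND)
print([predict(w, x) for x, _ in AND])
```

Running the same function on the XOR table would exhaust `max_epochs` without reaching zero mistakes, because no single line separates its positive and negative points.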

20
How To Train Multi-Layer Perceptrons?
  • Gradient descent

(Diagram: network with input units x, hidden units h, and output unit o)
21
Sigmoid Squashing Function
(Diagram: a unit with inputs x0 = 1, x1, x2, ..., xn, weights w0, w1, w2, ..., wn, and one output)
22
Sigmoid Squashing Function
(Plot of σ(x) against x)
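The plotted function can be sketched in a few lines, assuming the squashing function is the logistic sigmoid σ(x) = 1 / (1 + e^(-x)). Its derivative, σ(x)(1 - σ(x)), is the o(1 - o) factor that appears in the weight updates later in the deck.

```python
import math

def sigmoid(x):
    """Logistic squashing function; maps any real x into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    """d(sigma)/dx = sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

print(sigmoid(0.0))             # 0.5
print(sigmoid_derivative(0.0))  # 0.25
```

For very large negative x (below roughly -709), `math.exp(-x)` overflows, so a production version would need to guard against that case.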
23
Gradient Descent
  • Learn the w_i that minimize squared error

24
Gradient Descent
25
Gradient Descent (single layer)
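The equations on these slides were lost in extraction. The following sketch assumes the usual squared error E(w) = 1/2 Σ_d (t_d - o_d)^2 for a single sigmoid unit, whose gradient is ∂E/∂w_i = -Σ_d (t_d - o_d) o_d (1 - o_d) x_{i,d}. It checks that formula against a finite-difference estimate. The example data and starting weights are made up.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def squared_error(weights, examples):
    """E(w) = 1/2 * sum over examples of (t - o)^2."""
    total = 0.0
    for x, t in examples:
        o = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        total += 0.5 * (t - o) ** 2
    return total

def gradient(weights, examples):
    """dE/dw_i = -sum over examples of (t - o) * o * (1 - o) * x_i."""
    grad = [0.0] * len(weights)
    for x, t in examples:
        o = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        for i, xi in enumerate(x):
            grad[i] -= (t - o) * o * (1.0 - o) * xi
    return grad

# Each input starts with the constant x0 = 1 so that weights[0] acts as the bias.
examples = [((1, 0, 0), 0), ((1, 0, 1), 1), ((1, 1, 0), 1), ((1, 1, 1), 1)]
w = [0.1, -0.2, 0.3]

analytic = gradient(w, examples)
eps = 1e-6
numeric = []
for i in range(len(w)):
    up, down = list(w), list(w)
    up[i] += eps
    down[i] -= eps
    numeric.append((squared_error(up, examples) - squared_error(down, examples)) / (2 * eps))

print(analytic)
print(numeric)
```

Gradient descent then moves each weight a small step against this gradient, w_i ← w_i - η ∂E/∂w_i.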
26
Batch Learning
  • Initialize each w_i to a small random value
  • Repeat until termination
    • Δw_i ← 0
    • For each training example d do
      • o_d ← σ(Σ_i w_i x_{i,d})
      • Δw_i ← Δw_i + η (t_d - o_d) o_d (1 - o_d) x_{i,d}
    • w_i ← w_i + Δw_i
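The batch procedure above can be sketched as follows for a single sigmoid unit. The learning rate, epoch count, initialization range and the OR training set are illustrative assumptions. A fixed number of epochs stands in for the unspecified termination test.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def output(weights, x):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))

def train_batch(examples, eta=0.5, epochs=2000, seed=0):
    rng = random.Random(seed)
    # Initialize each w_i to a small random value.
    weights = [rng.uniform(-0.05, 0.05) for _ in examples[0][0]]
    for _ in range(epochs):
        delta = [0.0] * len(weights)  # accumulated over the whole training set
        for x, t in examples:
            o = output(weights, x)
            for i, xi in enumerate(x):
                delta[i] += eta * (t - o) * o * (1.0 - o) * xi
        # The weights change only once per pass through the data.
        weights = [w + d for w, d in zip(weights, delta)]
    return weights

# OR, with a leading constant input x0 = 1 for the bias weight.
OR = [((1, 0, 0), 0), ((1, 0, 1), 1), ((1, 1, 0), 1), ((1, 1, 1), 1)]
w = train_batch(OR)
print([round(output(w, x)) for x, _ in OR])
```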

27
Incremental (Online) Learning
  • Initialize each w_i to a small random value
  • Repeat until termination
    • For each training example d do
      • Δw_i ← 0
      • o_d ← σ(Σ_i w_i x_{i,d})
      • Δw_i ← Δw_i + η (t_d - o_d) o_d (1 - o_d) x_{i,d}
      • w_i ← w_i + Δw_i
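The incremental variant differs from batch learning only in when the weights change: after every example rather than once per pass. The following is a minimal sketch under the same assumptions as before (single sigmoid unit, fixed epoch count, made-up hyperparameters), here trained on AND.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def output(weights, x):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))

def train_online(examples, eta=0.5, epochs=2000, seed=0):
    rng = random.Random(seed)
    # Initialize each w_i to a small random value.
    weights = [rng.uniform(-0.05, 0.05) for _ in examples[0][0]]
    for _ in range(epochs):
        for x, t in examples:
            o = output(weights, x)
            # Update immediately, using only this example's error.
            for i, xi in enumerate(x):
                weights[i] += eta * (t - o) * o * (1.0 - o) * xi
    return weights

# AND, with a leading constant input x0 = 1 for the bias weight.
AND = [((1, 0, 0), 0), ((1, 0, 1), 0), ((1, 1, 0), 0), ((1, 1, 1), 1)]
w = train_online(AND)
print([round(output(w, x)) for x, _ in AND])
```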

28
Backpropagation Algorithm
  • Generalization to multiple layers and multiple
    output units

29
Backpropagation Algorithm
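  • Initialize all weights to small random numbers
  • For each training example do
    • For each hidden unit h: compute its output o_h
    • For each output unit k: compute its output o_k
    • For each output unit k: compute its error term δ_k
    • For each hidden unit h: compute its error term δ_h
    • Update each network weight w_ij: w_ij ← w_ij + Δw_ij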
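The formulas for each step are missing from this transcript. The following sketch assumes the standard backpropagation equations for sigmoid units and squared error: δ_k = o_k (1 - o_k)(t_k - o_k) for output units, δ_h = o_h (1 - o_h) Σ_k w_kh δ_k for hidden units, and Δw_ji = η δ_j x_ji for every weight. The network layout, initialization range and hyperparameters are illustrative assumptions.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_network(n_in, n_hidden, n_out, seed=0):
    rng = random.Random(seed)
    def layer(rows, cols):
        # cols + 1 because index 0 of each row holds the bias weight.
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols + 1)] for _ in range(rows)]
    return {"hidden": layer(n_hidden, n_in), "output": layer(n_out, n_hidden)}

def forward(net, x):
    """Forward pass: hidden outputs first, then output-unit outputs."""
    xb = [1.0] + list(x)
    hb = [1.0] + [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in net["hidden"]]
    out = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in net["output"]]
    return xb, hb, out

def backprop_step(net, x, t, eta):
    xb, hb, out = forward(net, x)
    # Output error terms: delta_k = o_k (1 - o_k) (t_k - o_k).
    delta_out = [o * (1 - o) * (tk - o) for o, tk in zip(out, t)]
    # Hidden error terms: delta_h = o_h (1 - o_h) * sum_k w_kh * delta_k.
    delta_hid = []
    for h in range(len(net["hidden"])):
        o_h = hb[h + 1]
        back = sum(net["output"][k][h + 1] * delta_out[k] for k in range(len(out)))
        delta_hid.append(o_h * (1 - o_h) * back)
    # Both sets of error terms are computed before any weight changes.
    for k, row in enumerate(net["output"]):
        for i in range(len(row)):
            row[i] += eta * delta_out[k] * hb[i]
    for h, row in enumerate(net["hidden"]):
        for i in range(len(row)):
            row[i] += eta * delta_hid[h] * xb[i]

def error(net, examples):
    """Sum over examples and output units of 1/2 (t - o)^2."""
    return sum(0.5 * (tk - o) ** 2
               for x, t in examples
               for o, tk in zip(forward(net, x)[2], t))

XOR = [((0, 0), (0,)), ((0, 1), (1,)), ((1, 0), (1,)), ((1, 1), (0,))]
net = init_network(2, 2, 1)
print("error before:", error(net, XOR))
for _ in range(5000):
    for x, t in XOR:
        backprop_step(net, x, t, eta=0.5)
print("error after:", error(net, XOR))
```

Whether this particular run ends at a good XOR solution depends on the random seed, since gradient descent on a multi-layer network can settle in a local minimum.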
30
Backpropagation Algorithm
31
Can This Be Learned?
Input      Output
10000000 → 10000000
01000000 → 01000000
00100000 → 00100000
00010000 → 00010000
00001000 → 00001000
00000100 → 00000100
00000010 → 00000010
00000001 → 00000001
32
Learned Hidden Layer Representation
Input      Hidden values   Output
10000000 → .89 .04 .08 → 10000000
01000000 → .01 .11 .88 → 01000000
00100000 → .01 .97 .27 → 00100000
00010000 → .99 .97 .71 → 00010000
00001000 → .03 .05 .02 → 00001000
00000100 → .22 .99 .99 → 00000100
00000010 → .80 .01 .98 → 00000010
00000001 → .60 .94 .01 → 00000001
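An experiment of this kind can be sketched as an 8-3-8 network trained to reproduce its input, assuming the standard backpropagation updates for sigmoid units. The learning rate, epoch count, seed and initialization range are guesses rather than the settings behind the table above, so the hidden values will differ and the decoded outputs are not guaranteed to be correct for every setting.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_encoder(n=8, n_hidden=3, eta=0.3, epochs=2000, seed=0):
    rng = random.Random(seed)
    # Row j holds the weights into unit j; index 0 is the bias weight.
    w_hid = [[rng.uniform(-0.1, 0.1) for _ in range(n + 1)] for _ in range(n_hidden)]
    w_out = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden + 1)] for _ in range(n)]
    patterns = [[1.0 if i == j else 0.0 for i in range(n)] for j in range(n)]

    def forward(x):
        xb = [1.0] + x
        hb = [1.0] + [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_hid]
        out = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in w_out]
        return xb, hb, out

    for _ in range(epochs):
        for x in patterns:  # the target is the input itself
            xb, hb, out = forward(x)
            d_out = [o * (1 - o) * (t - o) for o, t in zip(out, x)]
            d_hid = [hb[h + 1] * (1 - hb[h + 1]) *
                     sum(w_out[k][h + 1] * d_out[k] for k in range(n))
                     for h in range(n_hidden)]
            for k in range(n):
                for i in range(n_hidden + 1):
                    w_out[k][i] += eta * d_out[k] * hb[i]
            for h in range(n_hidden):
                for i in range(n + 1):
                    w_hid[h][i] += eta * d_hid[h] * xb[i]

    results = []
    for x in patterns:
        _, hb, out = forward(x)
        results.append((x, hb[1:], out))
    return results

rows = train_encoder()
for x, hidden, out in rows:
    print("".join(str(int(v)) for v in x),
          " ".join(f"{h:.2f}" for h in hidden),
          "".join(str(round(o)) for o in out))
```

Three hidden units are just enough to give each of the eight inputs a distinct code, which is why the learned hidden values in the table resemble a 3-bit binary encoding.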
33
Training Internal Representation
34
Training Error
35
Training Weights
36
ANNs in Speech Recognition
Huang/Lippmann 1988
37
Speeding It Up: Momentum
(Plot of error E against weight w_ij)
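The momentum formula is not preserved in this transcript. The following sketch assumes the common form in which each weight step adds a fraction α of the previous step, Δw(n) = -η ∂E/∂w + α Δw(n-1). The one-dimensional error surface E(w) = 1/2 w^2 and the values of η and α are made up for illustration.

```python
def descend(grad, w0, eta, alpha, steps):
    """Gradient descent where each step adds alpha times the previous step."""
    w = w0
    step = 0.0
    for _ in range(steps):
        step = -eta * grad(w) + alpha * step
        w += step
    return w

def grad(w):
    """Gradient of the hypothetical error surface E(w) = 0.5 * w**2."""
    return w

plain = descend(grad, w0=1.0, eta=0.01, alpha=0.0, steps=200)
with_momentum = descend(grad, w0=1.0, eta=0.01, alpha=0.9, steps=200)
print(abs(plain), abs(with_momentum))
```

With the same small learning rate and the same number of steps, the run with momentum ends much closer to the minimum at w = 0, because consecutive steps in the same direction build up speed.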
38
Convergence
  • May get stuck in local minima
  • Weights may diverge
  • but works well in practice

39
Overfitting in ANNs
40
Early Stopping (Important!!!)
  • Stop training when error goes up on validation set
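The stopping rule can be sketched as a scan over per-epoch validation errors. The `patience` parameter (how many non-improving epochs to tolerate before stopping) and the error values are assumptions added for illustration. The slide only states that training stops when the validation error goes up.

```python
def early_stopping_epoch(validation_errors, patience=3):
    """Return (best_epoch, best_error), scanning until the validation error
    has failed to improve for `patience` consecutive epochs."""
    best_epoch, best_error = 0, float("inf")
    for epoch, err in enumerate(validation_errors):
        if err < best_error:
            best_epoch, best_error = epoch, err
        elif epoch - best_epoch >= patience:
            break  # stop training; keep the weights saved at best_epoch
    return best_epoch, best_error

# Made-up validation curve: falls, then rises as the network overfits.
curve = [0.90, 0.60, 0.45, 0.40, 0.38, 0.39, 0.41, 0.44, 0.50, 0.37]
print(early_stopping_epoch(curve))  # (4, 0.38)
```

In a real training loop the weights would be saved whenever the validation error improves, so that the network from the best epoch can be restored after stopping. The final value 0.37 in the made-up curve is never reached, which shows the trade-off of a small patience.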

41
Sigmoid Squashing Function
(Plot of σ(x) against x)
42
ANNs for Face Recognition
Head pose (1-of-4): 90% accuracy
Face recognition (1-of-20): 90% accuracy
43
ANNs for Face Recognition
44
Recurrent Networks