Title: Example of Backpropagation
ANN Illustrative Example: Face Recognition
- Many target functions can be learned from the image data:
  - Identity of the person
  - Direction the person is facing (left, right, straight ahead, or upward)
  - Gender of the person
  - Whether or not the person is wearing sunglasses
- Specific task considered: learning the direction in which the person is facing (to their left, right, straight ahead, or upward)
ANN Illustrative Example: Face Recognition
- Practical design choices in applying backpropagation
- The learning task: classifying camera images of the faces of various people in various poses
- Image database:
  - 624 grayscale images: 20 different people, approx. 32 images per person
  - Various expressions (happy, sad, angry, neutral)
  - Different directions (left, right, straight ahead, up)
  - Resolution of 120 × 128 pixels
ANN Illustrative Example: Face Recognition
- Specific task considered: learning the direction in which the person is facing (to their left, right, straight ahead, or upward)
- Even without optimizing the design choices, the design described here learns the target function quite well
- After training on a set of 260 images, classification accuracy over a separate test set is 90%
- Contrast this with the 25% default accuracy obtained by randomly guessing one of the four face directions
Design Choices
- 1. Input encoding
  - How to encode the image: raw image vs. extracted features
- 2. Output encoding
  - Number of output units and target values for the output units
- 3. Network graph structure
  - Number of units in the network and their interconnections
- 4. Other learning algorithm parameters
  - Learning rate eta
  - Momentum alpha
1. Input Encoding
- Design choices:
  - Preprocess the image to extract edges, regions of uniform intensity, or other local image features
    - Difficulty: the number of edges is variable, whereas the ANN has a fixed number of input units
  - Encode the image as a fixed set of 30 × 32 pixel intensity values (a coarse-resolution summary of the original), each ranging from 0 to 255 (a sketch of this encoding follows below)
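A minimal NumPy sketch of the chosen encoding, assuming a (120, 128) input array; mean-pooling 4 × 4 blocks to build the coarse summary, and scaling the 0 to 255 intensities down to [0, 1], are both assumptions:

```python
import numpy as np

def encode_image(img):
    """Coarsen a 120x128 grayscale image into the 30x32 summary fed
    to the network. Mean-pooling 4x4 blocks is one plausible way to
    produce the "coarse resolution summary" described above."""
    img = np.asarray(img, dtype=float)
    coarse = img.reshape(30, 4, 32, 4).mean(axis=(1, 3))  # 4x4 block means
    return (coarse / 255.0).ravel()  # 960 inputs; [0, 1] scaling is assumed
```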
2. Output Encoding
- Design choices:
  - 1-of-n output encoding: four output units indicating the direction in which the person is looking (left, right, up, straight)
  - Single unit: classification using a single output unit, assigning the values 0.2, 0.4, 0.6, and 0.8 to the four directions
- Choice: 1-of-n output encoding
  - Provides more degrees of freedom for representing the target function (n times as many weights available in the output layer)
  - The difference between the highest and second-highest outputs can be used as a measure of confidence (see the sketch below)
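A small sketch of 1-of-n decoding together with the confidence measure just described (the ordering of the four directions is an assumption):

```python
import numpy as np

DIRECTIONS = ["left", "straight", "right", "up"]  # ordering assumed

def decode_output(outputs):
    """Return the predicted direction and a rough confidence score:
    the gap between the highest and second-highest output units."""
    outputs = np.asarray(outputs)
    order = np.argsort(outputs)[::-1]  # unit indices, highest first
    confidence = outputs[order[0]] - outputs[order[1]]
    return DIRECTIONS[order[0]], confidence
```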
Network Graph Structure
[Figure: the 960 × 3 × 4 network, with 30 × 32 pixel inputs, 3 hidden units, and 4 output units]
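A minimal forward pass for this 960 × 3 × 4 architecture might look as follows; the small random initialization is an assumption, and each unit's bias is folded in as an extra weight:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W_hidden = rng.uniform(-0.05, 0.05, size=(3, 961))  # 960 pixels + 1 bias
W_output = rng.uniform(-0.05, 0.05, size=(4, 4))    # 3 hidden + 1 bias

def forward(x):
    """x: 960 pixel intensities. Returns hidden and output activations."""
    h = sigmoid(W_hidden @ np.append(x, 1.0))  # 3 hidden sigmoid units
    o = sigmoid(W_output @ np.append(h, 1.0))  # 4 output sigmoid units
    return h, o
```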
2. Output Encoding (2)
- Target values for the output units:
  - The obvious choice is (1, 0, 0, 0) to encode a face looking to the left, (0, 1, 0, 0) to encode a face looking straight, etc.
  - Instead of 0 and 1, the values 0.1 and 0.9 are used, since sigmoid units cannot produce outputs of exactly 0 and 1 given finite weights:
    - Gradient descent would force the weights to grow without bound
    - 0.1 and 0.9 are achievable by sigmoid units with finite weights
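A short illustration of these target values (direction ordering assumed, as before):

```python
import numpy as np

DIRECTIONS = ["left", "straight", "right", "up"]  # ordering assumed

def target_vector(direction):
    """1-of-n target using 0.1/0.9 rather than 0/1, since a sigmoid
    unit can only approach 0 and 1 asymptotically."""
    t = np.full(4, 0.1)
    t[DIRECTIONS.index(direction)] = 0.9
    return t

# target_vector("straight") -> array([0.1, 0.9, 0.1, 0.1])
```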
Input-to-Hidden Network Weights
[Figure: the weights from the image pixels into each hidden unit, each weight plotted in the position of the corresponding pixel; the weights are sensitive to the pixels in which the face and body appear]
Hidden-to-Output Network Weights
[Figure: the 16 hidden-to-output weights for the four output units (left, straight, right, up), with w0 leftmost in each rectangle; white is high]
3. Network Graph Structure
- Backpropagation can be applied to any acyclic directed graph of sigmoid units (a sketch of one backpropagation pass follows below)
- Standard structure: two layers of sigmoid units (one hidden layer and one output layer)
- Since training times become longer with larger networks:
  - Only 3 hidden units were used, yielding 90% accuracy
  - With 30 hidden units, test set accuracy increased by only 1 to 2 percent
  - Training time on a Sparc 5 was 1 hour for 30 hidden units versus only 5 minutes for 3 hidden units
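A sketch of one backpropagation pass for this two-layer sigmoid network, using the usual squared-error delta rules and the weight shapes from the 960 × 3 × 4 sketch earlier:

```python
import numpy as np

def backprop_gradients(x, t, W_hidden, W_output):
    """Compute weight-update directions for one (input, target) pair.
    Shapes: W_hidden (3, 961), W_output (4, 4); biases are the last
    column of each matrix."""
    xb = np.append(x, 1.0)                      # input + bias
    h = 1.0 / (1.0 + np.exp(-(W_hidden @ xb)))  # hidden activations
    hb = np.append(h, 1.0)                      # hidden + bias
    o = 1.0 / (1.0 + np.exp(-(W_output @ hb)))  # output activations

    # Sigmoid error terms: delta = y * (1 - y) * (downstream error)
    delta_o = o * (1.0 - o) * (t - o)
    delta_h = h * (1.0 - h) * (W_output[:, :3].T @ delta_o)

    grad_output = np.outer(delta_o, hb)  # one row per output unit
    grad_hidden = np.outer(delta_h, xb)  # one row per hidden unit
    return grad_hidden, grad_output
```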
4. Other Learning Algorithm Parameters
- Learning rate eta was set to 0.3
- Momentum alpha was set to 0.3
- Lower values yielded equivalent generalization accuracy but longer training times
- With higher values, training fails to converge to an acceptable error over the training set
- Full gradient descent was used, rather than the stochastic approximation (see the sketch below)
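Building on the backprop_gradients sketch above, full (batch) gradient descent with momentum might look like this, with eta = 0.3 and alpha = 0.3 as on this slide; the number of steps is illustrative:

```python
import numpy as np

def train(examples, W_hidden, W_output, eta=0.3, alpha=0.3, steps=100):
    """Batch gradient descent with momentum: sum the per-example
    gradients over the whole training set before each update."""
    vel_h = np.zeros_like(W_hidden)
    vel_o = np.zeros_like(W_output)
    for _ in range(steps):
        gh = np.zeros_like(W_hidden)
        go = np.zeros_like(W_output)
        for x, t in examples:
            dh, do = backprop_gradients(x, t, W_hidden, W_output)
            gh += dh
            go += do
        vel_h = eta * gh + alpha * vel_h  # momentum keeps a fraction of
        vel_o = eta * go + alpha * vel_o  # the previous weight update
        W_hidden = W_hidden + vel_h
        W_output = W_output + vel_o
    return W_hidden, W_output
```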
4. Other Learning Algorithm Parameters (2)
- Input unit weights were initialized to zero (this yields more intelligible visualizations of the learned weights)
- After every 50 gradient steps, performance was evaluated over the validation set (see the sketch below)
- The final selected network was the one with the highest accuracy over the validation set
- The final reported accuracy was measured over a third set of test examples
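A sketch of this model-selection scheme, reusing the train helper above; the accuracy function and the number of 50-step rounds are illustrative:

```python
import numpy as np

def accuracy(examples, W_hidden, W_output):
    """Fraction of examples whose largest output unit matches the
    largest target value (the 1-of-n winner is correct)."""
    hits = 0
    for x, t in examples:
        h = 1.0 / (1.0 + np.exp(-(W_hidden @ np.append(x, 1.0))))
        o = 1.0 / (1.0 + np.exp(-(W_output @ np.append(h, 1.0))))
        hits += int(np.argmax(o) == np.argmax(t))
    return hits / len(examples)

def select_best(train_set, val_set, W_hidden, W_output, rounds=40):
    """Every 50 gradient steps, score the validation set and keep the
    weights with the highest validation accuracy so far."""
    best_acc, best_weights = -1.0, None
    for _ in range(rounds):
        W_hidden, W_output = train(train_set, W_hidden, W_output, steps=50)
        acc = accuracy(val_set, W_hidden, W_output)
        if acc > best_acc:
            best_acc, best_weights = acc, (W_hidden.copy(), W_output.copy())
    return best_acc, best_weights
```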
Learned Hidden Representations
- It is useful to examine the learned values of the network's 2899 weights: (960 + 1) × 3 input-to-hidden weights plus (3 + 1) × 4 hidden-to-output weights
Network Weights after 100 Iterations
[Figure: the 16 hidden-to-output weights for the four output units (left, straight, right, up), with w0 leftmost in each rectangle (white is high), together with the weights from the 30 × 32 image pixels into each hidden unit, each weight plotted in the position of the corresponding pixel; the weights are sensitive to the features in which the face and body appear]
Network Behavior for "right" Input
[Figure: network activations for an image of a person facing right. The input-to-hidden weights match for the middle hidden unit, and the weight w2 from the middle hidden unit to the "right" output unit is high, so the "right" output unit fires]
Character Recognition
- http://yann.lecun.com/exdb/lenet/index.html