Title: Example of Backpropagation
ANN Illustrative Example: Face Recognition
- Many target functions can be learned from the image data:
  - Identity of the person
  - Direction the person is facing (left, right, straight ahead, or upward)
  - Gender of the person
  - Whether or not the person is wearing sunglasses
- Specific task considered: learning the direction in which the person is facing (to their left, right, straight ahead, or upward)
ANN Illustrative Example: Face Recognition
- Practical design choices in applying backpropagation
- The learning task: classifying camera images of the faces of various people in various poses
- Image database:
  - 624 grayscale images: 20 different people, approx. 32 images per person
  - Various expressions (happy, sad, angry, neutral)
  - Different directions (left, right, straight ahead, up)
  - Resolution of 120 × 128 pixels
ANN Illustrative Example: Face Recognition
- Specific task considered: learning the direction in which the person is facing (to their left, right, straight ahead, or upward)
- Even without optimizing the design choices, the design described here learns the target function quite well
- After training on a set of 260 images, classification accuracy over a separate test set is 90%
- Contrast this with the 25% default accuracy obtained by randomly guessing one of the four face directions
Design Choices
- 1. Input encoding
  - How to encode the image: raw image vs. extracted features
- 2. Output encoding
  - Number of output units and target values for the output units
- 3. Network graph structure
  - Number of units in the network and their interconnections
- 4. Other learning algorithm parameters
  - Learning rate eta
  - Momentum alpha
1. Input Encoding
- Design choices:
  - Preprocess the image to extract edges, regions of uniform intensity, or other local image features
    - Difficulty: the number of edges is variable, whereas the ANN has a fixed number of input units
  - Encode the image as a fixed set of 30 × 32 pixel intensity values (a coarse-resolution summary of the original), each ranging from 0 to 255 (a sketch of this encoding follows below)
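A minimal NumPy sketch of the chosen encoding, assuming a (120, 128) input array; mean-pooling 4 × 4 blocks to build the coarse summary, and scaling the 0 to 255 intensities down to [0, 1], are both assumptions:

```python
import numpy as np

def encode_image(img):
    """Coarsen a 120x128 grayscale image into the 30x32 summary fed
    to the network. Mean-pooling 4x4 blocks is one plausible way to
    produce the "coarse resolution summary" described above."""
    img = np.asarray(img, dtype=float)
    coarse = img.reshape(30, 4, 32, 4).mean(axis=(1, 3))  # 4x4 block means
    return (coarse / 255.0).ravel()  # 960 inputs; [0, 1] scaling is assumed
```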
2. Output Encoding
- Design choices:
  - 1-of-n output encoding: four output units indicating the direction in which the person is looking (left, right, up, straight)
  - Single unit: classification using a single output unit, assigning the values 0.2, 0.4, 0.6, and 0.8 to the four directions
- Choice: 1-of-n output encoding
  - Provides more degrees of freedom for representing the target function (n times as many weights available in the output layer)
  - The difference between the highest and second-highest outputs can be used as a measure of confidence (see the sketch below)
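A small sketch of 1-of-n decoding together with the confidence measure just described (the ordering of the four directions is an assumption):

```python
import numpy as np

DIRECTIONS = ["left", "straight", "right", "up"]  # ordering assumed

def decode_output(outputs):
    """Return the predicted direction and a rough confidence score:
    the gap between the highest and second-highest output units."""
    outputs = np.asarray(outputs)
    order = np.argsort(outputs)[::-1]  # unit indices, highest first
    confidence = outputs[order[0]] - outputs[order[1]]
    return DIRECTIONS[order[0]], confidence
```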
Network Graph Structure
[Figure: the 960 × 3 × 4 network, with 30 × 32 pixel inputs, 3 hidden units, and 4 output units]
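A minimal forward pass for this 960 × 3 × 4 architecture might look as follows; the small random initialization is an assumption, and each unit's bias is folded in as an extra weight:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W_hidden = rng.uniform(-0.05, 0.05, size=(3, 961))  # 960 pixels + 1 bias
W_output = rng.uniform(-0.05, 0.05, size=(4, 4))    # 3 hidden + 1 bias

def forward(x):
    """x: 960 pixel intensities. Returns hidden and output activations."""
    h = sigmoid(W_hidden @ np.append(x, 1.0))  # 3 hidden sigmoid units
    o = sigmoid(W_output @ np.append(h, 1.0))  # 4 output sigmoid units
    return h, o
```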
2. Output Encoding (2)
- Target values for the output units:
  - The obvious choice is (1, 0, 0, 0) to encode a face looking to the left, (0, 1, 0, 0) to encode a face looking straight, etc.
  - Instead of 0 and 1, the values 0.1 and 0.9 are used, since sigmoid units cannot produce outputs of exactly 0 and 1 given finite weights:
    - Gradient descent would force the weights to grow without bound
    - 0.1 and 0.9 are achievable by sigmoid units with finite weights
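A short illustration of these target values (direction ordering assumed, as before):

```python
import numpy as np

DIRECTIONS = ["left", "straight", "right", "up"]  # ordering assumed

def target_vector(direction):
    """1-of-n target using 0.1/0.9 rather than 0/1, since a sigmoid
    unit can only approach 0 and 1 asymptotically."""
    t = np.full(4, 0.1)
    t[DIRECTIONS.index(direction)] = 0.9
    return t

# target_vector("straight") -> array([0.1, 0.9, 0.1, 0.1])
```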
Input-to-Hidden Network Weights
[Figure: the weights from the image pixels into each hidden unit, each weight plotted in the position of the corresponding pixel; the weights are sensitive to the pixels in which the face and body appear]
Hidden-to-Output Network Weights
[Figure: the 16 hidden-to-output weights for the four output units (left, straight, right, up), with w0 leftmost in each rectangle; white is high]
3. Network Graph Structure
- Backpropagation can be applied to any acyclic directed graph of sigmoid units (a sketch of one backpropagation pass follows below)
- Standard structure: two layers of sigmoid units (one hidden layer and one output layer)
- Since training times become longer with larger networks:
  - Only 3 hidden units were used, yielding 90% accuracy
  - With 30 hidden units, test set accuracy increased by only 1 to 2 percent
  - Training time on a Sparc 5 was 1 hour for 30 hidden units versus only 5 minutes for 3 hidden units
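A sketch of one backpropagation pass for this two-layer sigmoid network, using the usual squared-error delta rules and the weight shapes from the 960 × 3 × 4 sketch earlier:

```python
import numpy as np

def backprop_gradients(x, t, W_hidden, W_output):
    """Compute weight-update directions for one (input, target) pair.
    Shapes: W_hidden (3, 961), W_output (4, 4); biases are the last
    column of each matrix."""
    xb = np.append(x, 1.0)                      # input + bias
    h = 1.0 / (1.0 + np.exp(-(W_hidden @ xb)))  # hidden activations
    hb = np.append(h, 1.0)                      # hidden + bias
    o = 1.0 / (1.0 + np.exp(-(W_output @ hb)))  # output activations

    # Sigmoid error terms: delta = y * (1 - y) * (downstream error)
    delta_o = o * (1.0 - o) * (t - o)
    delta_h = h * (1.0 - h) * (W_output[:, :3].T @ delta_o)

    grad_output = np.outer(delta_o, hb)  # one row per output unit
    grad_hidden = np.outer(delta_h, xb)  # one row per hidden unit
    return grad_hidden, grad_output
```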
4. Other Learning Algorithm Parameters
- Learning rate eta was set to 0.3
- Momentum alpha was set to 0.3
- Lower values yielded equivalent generalization accuracy but longer training times
- With higher values, training fails to converge to an acceptable error over the training set
- Full gradient descent was used, rather than the stochastic approximation (see the sketch below)
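Building on the backprop_gradients sketch above, full (batch) gradient descent with momentum might look like this, with eta = 0.3 and alpha = 0.3 as on this slide; the number of steps is illustrative:

```python
import numpy as np

def train(examples, W_hidden, W_output, eta=0.3, alpha=0.3, steps=100):
    """Batch gradient descent with momentum: sum the per-example
    gradients over the whole training set before each update."""
    vel_h = np.zeros_like(W_hidden)
    vel_o = np.zeros_like(W_output)
    for _ in range(steps):
        gh = np.zeros_like(W_hidden)
        go = np.zeros_like(W_output)
        for x, t in examples:
            dh, do = backprop_gradients(x, t, W_hidden, W_output)
            gh += dh
            go += do
        vel_h = eta * gh + alpha * vel_h  # momentum keeps a fraction of
        vel_o = eta * go + alpha * vel_o  # the previous weight update
        W_hidden = W_hidden + vel_h
        W_output = W_output + vel_o
    return W_hidden, W_output
```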
4. Other Learning Algorithm Parameters (2)
- Input unit weights were initialized to zero (this yields more intelligible visualizations of the learned weights)
- After every 50 gradient steps, performance was evaluated over the validation set (see the sketch below)
- The final selected network was the one with the highest accuracy over the validation set
- The final reported accuracy was measured over a third set of test examples
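A sketch of this model-selection scheme, reusing the train helper above; the accuracy function and the number of 50-step rounds are illustrative:

```python
import numpy as np

def accuracy(examples, W_hidden, W_output):
    """Fraction of examples whose largest output unit matches the
    largest target value (the 1-of-n winner is correct)."""
    hits = 0
    for x, t in examples:
        h = 1.0 / (1.0 + np.exp(-(W_hidden @ np.append(x, 1.0))))
        o = 1.0 / (1.0 + np.exp(-(W_output @ np.append(h, 1.0))))
        hits += int(np.argmax(o) == np.argmax(t))
    return hits / len(examples)

def select_best(train_set, val_set, W_hidden, W_output, rounds=40):
    """Every 50 gradient steps, score the validation set and keep the
    weights with the highest validation accuracy so far."""
    best_acc, best_weights = -1.0, None
    for _ in range(rounds):
        W_hidden, W_output = train(train_set, W_hidden, W_output, steps=50)
        acc = accuracy(val_set, W_hidden, W_output)
        if acc > best_acc:
            best_acc, best_weights = acc, (W_hidden.copy(), W_output.copy())
    return best_acc, best_weights
```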
Learned Hidden Representations
- It is useful to examine the learned values of the network's 2899 weights: (960 + 1) × 3 input-to-hidden weights plus (3 + 1) × 4 hidden-to-output weights
Network Weights after 100 Iterations
[Figure: the 16 hidden-to-output weights for the four output units (left, straight, right, up), with w0 leftmost in each rectangle (white is high), together with the weights from the 30 × 32 image pixels into each hidden unit, each weight plotted in the position of the corresponding pixel; the weights are sensitive to the features in which the face and body appear]
Network Behavior for "right" Input
[Figure: network activations for an image of a person facing right. The input-to-hidden weights match for the middle hidden unit, and the weight w2 from the middle hidden unit to the "right" output unit is high, so the "right" output unit fires]
Character Recognition
- http://yann.lecun.com/exdb/lenet/index.html