Title: Learning to perceive how handwritten digits were drawn

- Geoffrey Hinton
- Canadian Institute for Advanced Research and University of Toronto
Good old-fashioned neural networks

- Compare outputs with the correct answer to get an error signal.
- Back-propagate the error signal to get derivatives for learning.

[Figure: a feed-forward net mapping an input vector through hidden layers (with biases) to outputs]
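The two bullets above can be sketched as a minimal backprop loop. This is a hedged illustration, not the talk's code: the layer sizes, random data, and learning rate are all assumptions.

```python
import numpy as np

# Minimal sketch of "good old-fashioned" backpropagation: a 2-layer net
# with a sigmoid hidden layer, linear outputs, and squared error.
# Sizes, data, and learning rate are illustrative assumptions.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 16))            # input vectors (e.g. pixels)
T = rng.normal(size=(64, 10))            # "correct answers" (targets)

W1 = rng.normal(scale=0.1, size=(16, 32)); b1 = np.zeros(32)
W2 = rng.normal(scale=0.1, size=(32, 10)); b2 = np.zeros(10)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

losses = []
for step in range(300):
    # Forward pass: input vector -> hidden layer -> outputs.
    H = sigmoid(X @ W1 + b1)
    Y = H @ W2 + b2
    E = Y - T                            # error signal at the outputs
    losses.append(0.5 * np.mean(np.sum(E ** 2, axis=1)))
    # Backward pass: propagate the error signal to get derivatives.
    dY = E / len(X)
    dW2 = H.T @ dY; db2 = dY.sum(0)
    dH = (dY @ W2.T) * H * (1 - H)       # through the sigmoid
    dW1 = X.T @ dH; db1 = dH.sum(0)
    for p, g in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        p -= 0.1 * g                     # plain gradient descent
```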
Two different ways to use backpropagation for handwritten digit recognition

- Standard method: Train a neural network with 10 output units to map from pixel intensities to class labels.
  - This works well, but it does not make use of a lot of prior knowledge that we have about how the images were generated.
- The generative approach: First write a graphics program that converts a motor program (a sequence of muscle commands) into an image.
  - Then learn to map pixel intensities to motor programs.
  - Digit classes are far more natural in the space of motor programs.

[Figure: two handwritten 2s whose motor programs are very similar]
A variation

- It is very difficult to train a single net to recognize very different motor programs, so train a separate network for each digit class.
- When given a test image, get each of the 10 networks to extract a motor program.
- Then see which of the 10 motor programs is best at reconstructing the test image.
- Also consider how similar that motor program is to the other motor programs for that class.
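The selection rule above can be sketched as follows, assuming a hypothetical per-class interface of (recognize, generate) callables; only the reconstruction-error term is shown, and the similarity-to-other-programs term is omitted.

```python
import numpy as np

# Sketch of classification by reconstruction: one (hypothetical)
# recognizer/generator pair per digit class; pick the class whose
# extracted motor program best reconstructs the test image.
def classify(image, models):
    """models: dict mapping class label -> (recognize, generate)."""
    best, best_err = None, np.inf
    for label, (recognize, generate) in models.items():
        code = recognize(image)              # extract a motor program
        err = np.mean((generate(code) - image) ** 2)
        if err < best_err:                   # best reconstruction wins
            best, best_err = label, err
    return best
```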
A simple generative model

- We can generate digit images by simulating the physics of drawing.
- A pen is controlled by four springs.
- The trajectory of the pen is determined by the 4 spring stiffnesses at 17 time steps.
- The ink is produced from 60 evenly spaced points along the trajectory.
- Use bilinear interpolation to distribute the ink at each point to the 4 closest pixels.
- Use time-invariant parameters for the ink intensity and the width of a convolution kernel. Then clip intensities above 1.
- We can also learn to set the mass, viscosity, and positions of the four endpoints for each image.
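The model above can be sketched in code. The image size, mass, viscosity, endpoint positions, time step, and ink intensity below are illustrative assumptions, and the convolution-kernel blur is omitted for brevity.

```python
import numpy as np

# Hedged sketch of the drawing model: a pen mass pulled by four springs
# whose stiffnesses change over 17 time steps; ink from 60 points along
# the trajectory is spread to the 4 nearest pixels by bilinear
# interpolation and clipped at 1.
SIZE = 28
ENDPOINTS = np.array([[0, 0], [0, SIZE - 1], [SIZE - 1, 0],
                      [SIZE - 1, SIZE - 1]], dtype=float)

def draw(stiffnesses, mass=1.0, viscosity=0.5, substeps=6, dt=0.15):
    """stiffnesses: (17, 4) spring constants -> (SIZE, SIZE) image."""
    pos = np.array([SIZE / 2, SIZE / 2]); vel = np.zeros(2)
    traj = [pos.copy()]
    for k in stiffnesses:                  # 17 control time steps
        for _ in range(substeps):          # integrate the spring physics
            force = (k[:, None] * (ENDPOINTS - pos)).sum(0) - viscosity * vel
            vel += dt * force / mass       # symplectic Euler step
            pos += dt * vel
            traj.append(pos.copy())
    traj = np.array(traj)
    # 60 evenly spaced ink points along the trajectory.
    idx = np.linspace(0, len(traj) - 1, 60)
    lo = idx.astype(int); frac = (idx - lo)[:, None]
    pts = traj[lo] * (1 - frac) + traj[np.minimum(lo + 1, len(traj) - 1)] * frac
    img = np.zeros((SIZE, SIZE))
    for y, x in pts:
        y0, x0 = int(np.floor(y)), int(np.floor(x))
        fy, fx = y - y0, x - x0
        # Bilinear interpolation: split the ink over the 4 closest pixels.
        for dy, dx, w in ((0, 0, (1 - fy) * (1 - fx)), (0, 1, (1 - fy) * fx),
                          (1, 0, fy * (1 - fx)), (1, 1, fy * fx)):
            yy, xx = y0 + dy, x0 + dx
            if 0 <= yy < SIZE and 0 <= xx < SIZE:
                img[yy, xx] += 0.3 * w     # time-invariant ink intensity
    return np.clip(img, 0.0, 1.0)          # clip intensities above 1
```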
Some ways to invert a generator

- Look inside the generator to see how it works and try to invert each step of the generative process.
  - It's hard to invert processes that lose information:
    - the third dimension
    - the correspondence between model-parts and image-parts
- Define a prior distribution over codes and generate lots of (code, image) pairs. Then train a recognition neural network that maps image → code.
  - But where do we get the prior over codes? The distribution of codes is exactly what we want to learn from the data!
  - Is there any way to do without the prior over codes?
A way to train a class-specific model from a single prototype

- Start with a single prototype code.
- Learn to invert the generator in the vicinity of the prototype by adding noise to the code and generating (code, image) pairs for training a neural net.
- Then use the learned model to get codes for real images that are similar to the prototype. Add noise to these codes and generate more training data.
- This extends the region that can be inverted along the data manifold (with genetic jumps).

[Figure: the prototype, a nearby datapoint, and the manifold of the digit class in code space]
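A toy, fully runnable sketch of this loop, with a linear "generator" standing in for the drawing model and a least-squares fit standing in for the recognition net. Everything here is an illustrative assumption, not the talk's code.

```python
import numpy as np

# Growing an inverse model outward from a single prototype code.
rng = np.random.default_rng(0)
A = rng.normal(size=(8, 3))              # toy generator: image = A @ code

def generate(code):
    return A @ code

class LinearRecognizer:                  # stand-in for the neural net
    def fit(self, images, targets):
        self.W, *_ = np.linalg.lstsq(images, targets, rcond=None)
    def predict(self, image):
        return image @ self.W

prototype = np.array([1.0, -0.5, 0.3])
# "Real" images lie near the prototype on the class manifold.
real_codes = prototype + rng.normal(scale=0.3, size=(50, 3))
real_images = real_codes @ A.T

codes = [prototype]
recognizer = LinearRecognizer()
for _ in range(4):
    # 1. Add noise to known codes and render (code, image) pairs.
    base = np.array(codes)[rng.integers(len(codes), size=200)]
    noisy = base + rng.normal(scale=0.1, size=base.shape)
    recognizer.fit(np.array([generate(c) for c in noisy]), noisy)
    # 2. Recognize real images; keep codes that reconstruct them well,
    #    extending the invertible region along the data manifold.
    for img in real_images:
        code = recognizer.predict(img)
        if np.mean((generate(code) - img) ** 2) < 1e-3:
            codes.append(code)
```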
An example during the later stages of training

[Movie; caption: About 2 ms]
How training examples are created

[Diagram: the prototype (plus biases) gives a clean code; adding noise gives a noisy code, which the generator turns into a training image. The recognition net's hidden units map that image to a predicted code, and the code error between the predicted code and the noisy code is used for training. Passing the predicted code back through the generator gives a reconstructed image, and the reconstruction error against the training image is used for testing.]
How the perceived trajectory changes at the early stages of learning

Typical fits to the training data at the end of training
Performance on test data

[Figure: blue = 2, red = 3]
The five errors
Performance on test data if we do not use an extra trick
On test data, the model often gets the registration slightly wrong

The neural net has solved the difficult global search problem, but it has got the fine details wrong. So we need to perform a local search to fix up the details.
Local search

- Use the trajectory produced by the neural network as an initial guess.
- Then use the difference between the data and the reconstruction to adjust the guess.
- This means we need to convert residual errors in pixel space into gradients in trajectory space.
- But how do we backpropagate the pixel residuals through the generative graphics model? Make a neural network version of the generative model.
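The local search can be sketched with a toy linear generative model whose gradients we can write by hand; in the real system the pixel residuals are backpropagated through the learned generative net instead. The sizes and learning rate are assumptions.

```python
import numpy as np

# Local search: start from the recognition net's rough code, render it,
# and follow pixel-error gradients back into code (trajectory) space.
rng = np.random.default_rng(1)
A = rng.normal(size=(16, 4))             # toy differentiable generator

def refine(code, target, lr=0.01, steps=200):
    code = code.copy()
    for _ in range(steps):
        residual = A @ code - target     # residual error in pixel space
        grad = A.T @ residual            # backpropagated into code space
        code -= lr * grad                # adjust the guess
    return code

true_code = rng.normal(size=4)
target = A @ true_code                   # the observed "image"
init = true_code + rng.normal(scale=0.5, size=4)  # net's initial guess
refined = refine(init, target)
```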
How the generative neural net is trained

We use the same training pairs as for the recognition net.

[Diagram: a clean code plus noise gives a noisy code. The noisy code goes both through the true generator to produce the training image and through the generative net's hidden units to produce a reconstructed image; the reconstruction error between the two images is used for training the generative net.]
An example during the later stages of training the generative neural network
How the generative neural net is used

[Diagram: the recognition net's hidden units initialize a code from the real image. The current code goes through the generative net's hidden units to produce a current image; the reconstruction error between the current image and the real image gives pixel error gradients, which are backpropagated through the generative net to get code gradients that update the current code.]
The 2 model gets better at fitting 2s

But it also gets better at fitting 3s.
Improvement from local search

- Local search reduces the squared pixel error of the correct model by 20-50%.
  - It depends on how well the networks are trained.
- The squared pixel error of the wrong model is reduced by a much smaller fraction.
  - It often increases, because the pixel residuals are converted to motor program adjustments using a generative net that has not been trained in that part of the space.
- The classification error improves by 25-40%.
Why spring stiffnesses are a good language

- If the mass is not where we thought it was during planning, the force that is generated is the planned force plus a feedback term that pulls the mass back towards where it should have been.
- The four spring stiffnesses define a quadratic energy function. This function generates an acceleration. The acceleration depends on where the mass is.

[Figure: an iso-energy contour around the planned position, showing the planned force and the feedback term]
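The feedback property can be checked numerically. With a quadratic energy E(x) = ½ Σᵢ kᵢ|x − aᵢ|², the force at a displaced position is exactly the planned force plus a restoring term −(Σᵢ kᵢ)·displacement. The endpoints and stiffnesses below are illustrative assumptions.

```python
import numpy as np

# Spring stiffnesses as a motor language: the force is the negative
# gradient of a quadratic energy, so any displacement of the mass
# automatically adds a feedback term pulling it back toward the plan.
endpoints = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
k = np.array([2.0, 0.5, 1.0, 0.25])      # the four spring stiffnesses

def force(x):
    # -grad of E(x) = 0.5 * sum_i k_i * |x - a_i|^2, i.e. sum_i k_i (a_i - x).
    return (k[:, None] * (endpoints - x)).sum(axis=0)

planned = np.array([0.4, 0.6])
displacement = np.array([0.1, -0.05])    # mass is not where we planned
actual = force(planned + displacement)
# Exact for a quadratic energy: planned force plus a feedback term.
feedback = -k.sum() * displacement
```

Because the energy is quadratic, `actual` equals `force(planned) + feedback` exactly, and the feedback always opposes the displacement since the stiffnesses are positive.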
A more ambitious application

- Suppose we have a way to generate realistic images of faces from underlying variables that describe emotional expression, lighting, pose, and individual 3-D face shape.
- Given a big database of unlabeled faces, we should be able to recover the state of the generator.
Conclusions so far

- Computers are becoming fast enough to allow adaptive networks to be used in all sorts of novel ways.
- The most principled way to do perception is by inverting a generative model.
THE END
An even more ambitious application

- Suppose we have a model of a person in which the muscles are springs.
- We are also given a prototype motor program that makes the model walk for a few steps.
- Can we train a neural network to convert the current dynamic state plus a partial description of a desired walk into a motor program that produces the right behaviour for the next 100 milliseconds?
Some other generative models

- We could use a B-spline with 8 control points to generate the curve (Williams et al.). This does not learn to fit the images nearly as well.
- If we use 16 control points it ties itself in knots. Momentum is useful.