Title: Introduction to Neural Networks. Backpropagation Algorithm
1 Lecture 4b
COMP4044 Data Mining and Machine Learning
COMP5318 Knowledge Discovery and Data Mining
- Introduction to Neural Networks. Backpropagation algorithm.
- Reference: Dunham 61-66, 103-114
2 Outline
- Introduction to Neural Networks
- What is an artificial neural network?
- Human nervous system
- Taxonomy of neural networks
- Backpropagation algorithm
- Example
- Error space
- Universality of backpropagation
- Generalization and overfitting
- Heuristic modifications of backpropagation
- Convergence example
- Momentum
- Learning rate
- Limitations and capabilities
- Interesting applications
3 What is an Artificial Neural Network (NN)?
- A network of many simple units (neurons, nodes)
- The units are connected by connections
- Each connection has a numeric weight associated with it
- Units receive inputs (from the environment or other units) via the connections. They produce output using their weights and the inputs (i.e. they operate locally)
- A NN can be represented as a directed graph
- NNs learn from examples and exhibit some capability for generalization beyond the training data
- Knowledge is acquired by the network from its environment via learning and is stored in the weights of the connections
- The training (learning) rule is a procedure for modifying the weights of the connections in order to perform a certain task
4 Neuron Model
- Each connection from unit i to unit j has a numeric weight wij associated with it, which determines the strength and the sign of the connection
- Each neuron first computes the weighted sum of its inputs, wp, and then applies an activation function f to derive the output (activation) a
- A neuron may have a special weight called a bias weight b. It is connected to a fixed input of 1.
- NNs represent a function of their weights (parameters). By adjusting the weights, we change this function. This is done by using a learning rule.
- Example: if there are 2 inputs p1 = 2 and p2 = 3, and if w11 = 3, w12 = 1, b = -1.5, then a = f(2*3 + 3*1 - 1.5) = f(7.5)
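- A minimal Python sketch of this neuron computation (it assumes a logistic sigmoid for f, which the lecture introduces later; the variable names are ours, not from the slides):

import math

def sigmoid(x):
    # logistic sigmoid transfer function f(x) = 1 / (1 + e^-x)
    return 1.0 / (1.0 + math.exp(-x))

# inputs, weights and bias weight from the example above
p = [2.0, 3.0]   # p1 = 2, p2 = 3
w = [3.0, 1.0]   # w11 = 3, w12 = 1
b = -1.5         # bias weight, connected to a fixed input of 1

net = sum(wi * pi for wi, pi in zip(w, p)) + b   # weighted sum: 2*3 + 3*1 - 1.5 = 7.5
a = sigmoid(net)                                 # output (activation) of the neuron
print(net, a)    # 7.5, ~0.9994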
5 Artificial NNs vs. Biological NNs?
- Artificial neurons are
- (extremely) simple abstractions of biological
neurons - realized as a computer program or specialized
hardware - Networks of artificial neurons
- do not have a fraction of the power of the human
brain but can be trained to perform useful
functions
- Some of the artificial NNs are models of
biological NNs, some are not - Computational Neuroscience deals with creating
realistic models of biological neurons and brain - The inspiration for the field of NNs came from
the desire to produce artificial systems capable
of sophisticated, perhaps "intelligent",
computations similar to those that the human
brain routinely performs, and thereby also to
enhance our understanding of the brain
6 Human Nervous System
- We have only just begun to understand how our neural system operates
- A huge number of neurons and interconnections between them
- About 100 billion (i.e. ~10^11) neurons in the brain
- a full Olympic-sized swimming pool contains on the order of 10^10 raindrops; the number of stars in the Milky Way is of the same order of magnitude
- About 10^4 connections per neuron
- Biological neurons are slower than computers
- Neurons operate in about 10^-3 seconds, computers in 10^-9 seconds
- The brain makes up for the slow rate of operation by the large number of neurons and connections
7 Efficiency of Biological Neural Systems
For interested students, not examinable
- The brain performs tasks like pattern
recognition, perception, motor control many times
faster than the fastest digital computers
- efficiency of the sonar system of a bat
- sonar is an active echo-location system
- a bat's sonar provides information about the distance from a target, its relative velocity, size, azimuth, elevation, and the size of various features of the target
- the complex neural computations needed to extract all this information from the target echo occur within a brain which has the size of a plum!
- the precision and success rate of the target location achieved by the echo-locating bat are practically impossible for radar or sonar engineers to match
- How does a human brain, or the brain of a bat, do it?
8 Biological Neurons
- Purpose of neurons: to transmit information in the form of electrical signals
- A neuron accepts many inputs, which are all added up in some way
- If enough active inputs are received at once, the neuron will be activated and fire; if not, it will remain in its inactive state
- Structure of a neuron:
- body (soma), containing the nucleus with the chromosomes
- dendrites
- axon
- synapse - a narrow gap
- couples the axon with the dendrite of another cell
- no direct linkage across the junction; it is a chemical one
- information is passed from one neuron to another through synapses
9 Different types of biological neurons
10 Operation of biological neurons
- Signals are transmitted between neurons by
electrical pulses (action potentials, AP)
traveling along the axon - When the potential at the synapse is raised
sufficiently by the AP, it releases chemicals
called neurotransmitters - it may take the arrival of more than one AP
before the synapse is triggered
- The neurotransmitters diffuse across the gap and chemically activate gates on the dendrites, which allow charged ions to flow
- The flow of ions alters the potential of the
dendrite and provides a voltage pulse on the
dendrite (post-synaptic-potential, PSP) - some synapses excite the dendrite they affect,
while others inhibit it - the synapses also determine the strength of the
new input signal - each PSP travels along its dendrite and spreads
over the soma
- the soma sums the effects of thousands of PSPs; if the resulting potential exceeds a threshold, the neuron fires and generates an AP
11 Learning in Biological NNs
- We were born with some of our neural structures; others have been established by experience
- At the early stage of human brain development
(first 2 years) about 1 million synapses are
formed per second - Synapses are then modified through the learning
process - Learning is achieved by
- creation of new synaptic connections between
neurons - modification of existing synapses
- The synapses are thought to be mainly responsible
for learning
- In 1949, Hebb proposed his famous learning rule
- The strength of a synapse between 2 neurons is
increased by the repeated activation of one
neuron by the other across the synapse
12 Correspondence Between Artificial and Biological Neurons
- How does this artificial neuron relate to the biological one?
- input p (or input vector p) - input signal (or signals) at the dendrites
- weight w (or weight vector w) - strength of the synapse (or synapses)
- summer and transfer function - cell body
- neuron output a - signal at the axon
13 Taxonomy of NNs
- Active phase: feedforward (acyclic) and recurrent (cyclic, feedback)
- Learning phase: supervised and unsupervised
- Feedforward supervised networks
- typically used for classification and function approximation
- perceptrons, ADALINEs, backpropagation networks, RBF networks, Learning Vector Quantization (LVQ) networks
- Feedforward unsupervised networks
- Hebbian networks - used for associative learning
- Competitive networks - performing clustering and visualization, e.g. Self-Organizing Kohonen Feature Maps (SOM)
- Recurrent networks - temporal data processing
- recurrent backpropagation, associative memories, adaptive resonance networks
14 Backpropagation algorithm
15 Neural Network (NN) Model
- Computational model consisting of 3 parts
- 1) Architecture: neurons and connections
- input, hidden, output neurons
- fully or partially connected
- neuron model: computation performed by each neuron; type of transfer function
- initialization of the weights
- 2) Learning algorithm
- how the weights of the connections are changed in order to facilitate learning
- Goal for classification tasks: mapping between the input examples and the classes
- 3) Recall technique: how the information is obtained from the NN
- for classification tasks: how the class of a new example is determined
16 Backpropagation Network - Architecture
- 1) A network with 1 or more hidden layers
- (figure: a NN for the iris data - 1 input neuron for each attribute, 1 hidden layer of hidden neurons, 1 output neuron for each class)
- 2) Feedforward network - each neuron receives input only from the neurons in the previous layer
- 3) Typically fully connected - all neurons in a layer are connected with all neurons in the next layer
- 4) Weights initialization: small random values, e.g. in [-1, 1]
17 Backpropagation Network - Architecture 2
- 5) Neuron model - weighted sum of the input signals followed by a differentiable transfer function:
a = f(wp + b)
- any differentiable transfer function f can be used; most frequently the sigmoid and tan-sigmoid (hyperbolic tangent sigmoid) functions are used
18 Architecture - Number of Input Units
- Numerical data - typically 1 input unit for each attribute
- Categorical data - 1 input unit for each attribute value
- How many input units for the weather data?
- (figure: input units grouped by attribute - outlook: sunny, overcast, rainy; temperature: hot, mild, cool; humidity: high, normal; windy: false, true)
- Encoding of the input examples: typically binary, depending on the value of the attribute (on and off)
- e.g. ex. 1 -> 100 100 10 01
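- A small sketch of this binary ("on/off", one-hot) encoding; the attribute order follows the figure above, and the example values are an assumption chosen to reproduce the slide's vector:

# one-hot encoding of categorical attributes for the NN inputs
attributes = {
    "outlook":     ["sunny", "overcast", "rainy"],
    "temperature": ["hot", "mild", "cool"],
    "humidity":    ["high", "normal"],
    "windy":       ["false", "true"],
}

def encode(example):
    # example: dict mapping attribute name -> value; returns the binary input vector
    vector = []
    for attr, values in attributes.items():
        vector += [1 if example[attr] == v else 0 for v in values]
    return vector

ex1 = {"outlook": "sunny", "temperature": "hot", "humidity": "high", "windy": "true"}
print(encode(ex1))   # [1,0,0, 1,0,0, 1,0, 0,1]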
19 Number of Output Units
- Typically 1 neuron for each class
- (figure: an encoded training example ex. 1 together with its binary target class)
- Encoding of the targets (classes): typically binary
- e.g. class 1 (no) = 1 0, class 2 (yes) = 0 1
20 Number of Hidden Layers and Units in Them
- An art! Typically determined by trial and error
- The task constrains the number of input and output units, but not the number of hidden layers and the neurons in them
- Too many hidden layers and units (i.e. too many weights) -> overfitting
- Too few -> underfitting, i.e. the NN is not able to learn the input-output mapping
- A heuristic to start with: 1 hidden layer with n hidden neurons, n = (inputs + output_neurons)/2
- (figure: the weather-data example encoding from the previous two slides)
21 Learning in Backpropagation NNs
- Supervised learning - the training data
- consists of labeled examples (p, d), i.e. the desired output d for them is given (p - input vector, d - desired output)
- can be viewed as a teacher during the training process
- error - difference between the desired output d and the actual network output a
- Idea of backpropagation learning
- For each training example p:
- Propagate p through the network and calculate the output a. Compare the desired d with the actual output a and calculate the error
- Update the weights of the network to reduce the error
- Until the error over all examples < threshold
- Why "backpropagation"? It adjusts the weights backwards (from the output to the input units) by propagating the weight change
22 Backpropagation Learning - 2
- Sum of Squared Errors (E) is a classical measure of error
- E is computed for a single training example, over all output neurons
- di - desired, ai - actual network output for output neuron i
- Thus, backpropagation learning can be viewed as an optimization search in the weight space
- Goal state: the set of weights for which the performance index (error) is minimum
- Search method: hill climbing (here, descending the error surface)
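- Written out, with di the desired and ai the actual output of output neuron i (the 1/2 factor is a common convention that simplifies the derivative and may not appear on the original slide):

% Sum of Squared Errors for a single training example, over all output neurons i
E = \tfrac{1}{2} \sum_i (d_i - a_i)^2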
23 Error Landscape in Weight Space
- E is a function of the weights
- Several local minima and one global minimum
- (figure: E as a function of w1 and w2)
- How to minimize the error? Take steps downhill
- Not guaranteed to find the global minimum, except in the (glorious) situation where there is only one minimum
- How to get to the bottom as fast as possible? (i.e. we need to know what direction of movement will make the largest reduction in error)
24 Steepest Gradient Descent
- The gradient (dE/dw) can be computed; it gives the direction of steepest ascent, and its negative gives the direction of steepest descent
- A function decreases most rapidly when the direction of movement is the direction of the negative of the gradient
- Hence, we want to adjust the weights so that the change moves the system down the error surface in the direction of the locally steepest descent, given by the negative of the gradient
- η - learning rate, defines the step size; typically in the range (0, 1)
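- The resulting update rule, with η (eta) the learning rate:

% steepest (gradient) descent: step against the error gradient
\Delta w = -\eta \, \frac{\partial E}{\partial w}, \qquad w \leftarrow w + \Delta w, \qquad 0 < \eta < 1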
25 Backpropagation Algorithm - Idea
- The backpropagation algorithm adjusts the weights by working backward from the output layer to the input layer
- Calculate the error and propagate this error from layer to layer
- 2 approaches
- Incremental: the weights are adjusted after each training example is applied
- Also called approximate steepest descent
- Preferred, as it requires less space
- Batch: the weights are adjusted once, after all training examples have been applied and a total error has been calculated
- (figure) Solid lines - forward propagation of signals; dashed lines - backward propagation of error
26 For interested students, not examinable
Backpropagation - Derivation
- a neural network with one hidden layer
- indexes: i over output neurons, j over hidden neurons, k over inputs
- E (over all output neurons, for the current input vector p, i.e. incremental mode)
- di - target output of neuron i for p; oi - actual output of neuron i for p
- Express E in terms of the weights and input signals
1. Input for the hidden neuron j for p
2. Activation of neuron j as a function of its input
27 For interested students, not examinable
Backpropagation Derivation - 2
3. Input for the output neuron i
4. Output for the output neuron i
5. Substituting 4 into E
28 For interested students, not examinable
Backpropagation Derivation - 3
6. Steepest gradient descent: adjust the weights proportionally to the negative of the error gradient
For a weight wji to an output neuron (<- chain rule)
For a weight wkj to a hidden neuron
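- The slide's own equations for steps 1-6 are not preserved in this text version; the following is the standard derivation they correspond to, under the notation above (w_kj connects input k to hidden neuron j, w_ji connects hidden neuron j to output neuron i; bias terms included for completeness):

% 1. input (net) of hidden neuron j:   net_j = \sum_k w_{kj} p_k + b_j
% 2. activation of hidden neuron j:    o_j = f(net_j)
% 3. input (net) of output neuron i:   net_i = \sum_j w_{ji} o_j + b_i
% 4. output of output neuron i:        o_i = f(net_i)
% 5. substituting 4 into E:
E = \tfrac{1}{2} \sum_i \Big( d_i - f\Big( \sum_j w_{ji}\, f\Big(\sum_k w_{kj}\, p_k + b_j\Big) + b_i \Big) \Big)^2
% 6. steepest descent with the chain rule gives:
\Delta w_{ji} = -\eta \frac{\partial E}{\partial w_{ji}} = \eta\, \delta_i\, o_j, \qquad \delta_i = (d_i - o_i)\, f'(net_i)
\Delta w_{kj} = -\eta \frac{\partial E}{\partial w_{kj}} = \eta\, \delta_j\, p_k, \qquad \delta_j = f'(net_j) \sum_i \delta_i\, w_{ji}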
29 Backpropagation Rule - Summary
(i is over the nodes in the layer above neuron q)
30 Derivative of the Sigmoid Activation Function
- From the formulas for δ above, we must be able to calculate the derivative of f. For a sigmoid transfer function:
- Thus, the backpropagation errors for a network with a sigmoid transfer function are:
- q is an output neuron
- q is a hidden neuron
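- For the logistic sigmoid the derivative has a convenient closed form, giving the two δ formulas referred to above (a standard reconstruction of the slide's equations):

% derivative of the logistic sigmoid f(x) = 1/(1 + e^{-x}):
f'(x) = f(x)\,\big(1 - f(x)\big)
% backpropagation errors (deltas) with sigmoid transfer functions:
\delta_q = (d_q - o_q)\, o_q\, (1 - o_q)              \quad \text{(q is an output neuron)}
\delta_q = o_q\, (1 - o_q) \sum_i w_{qi}\, \delta_i    \quad \text{(q is a hidden neuron, i over the layer above)}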
31 Backpropagation Algorithm - Summary
- 1. Determine the architecture of the network
- how many input and output neurons; what output encoding
- hidden neurons and layers
- 2. Initialize all weights (biases incl.) to small random values, typically in [-1, 1]
- 3. Repeat until the termination criterion is satisfied
- (forward pass) Present a training example and propagate it through the network to calculate the actual output
- (backward pass) Compute the error (the δ values for the output neurons)
- Starting with the output layer, repeat for each layer in the network:
- propagate the δ values back to the previous layer
- update the weights between the two layers
- The stopping criterion is checked at the end of each epoch:
- The error (mean absolute or mean square) is below a threshold
- All training examples are propagated and the total error is calculated
- The threshold is determined heuristically, e.g. 0.3
- Maximum number of epochs is reached
- Early stopping using a validation set
- It typically takes hundreds or thousands of epochs for a NN to converge
- Try MATLAB's demo nnd11bc!
- epoch - 1 pass through the training set
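- A compact Python sketch of this summary for one hidden layer, assuming numpy, sigmoid units, sum-of-squared errors and incremental updates (function and variable names are ours, not from the slides):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_backprop(X, D, n_hidden, eta=0.1, max_epochs=1000, err_threshold=0.3, seed=0):
    # X: (n_examples, n_inputs) inputs, D: (n_examples, n_outputs) binary targets
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], D.shape[1]
    # 2. initialize weights and biases to small random values in [-1, 1]
    W1 = rng.uniform(-1, 1, (n_in, n_hidden)); b1 = rng.uniform(-1, 1, n_hidden)
    W2 = rng.uniform(-1, 1, (n_hidden, n_out)); b2 = rng.uniform(-1, 1, n_out)
    for epoch in range(max_epochs):
        # 3. one epoch = one pass through the training set (incremental updates)
        for p, d in zip(X, D):
            # forward pass
            o_hidden = sigmoid(p @ W1 + b1)
            o_out = sigmoid(o_hidden @ W2 + b2)
            # backward pass: delta values for the output and hidden neurons
            delta_out = (d - o_out) * o_out * (1 - o_out)
            delta_hidden = o_hidden * (1 - o_hidden) * (W2 @ delta_out)
            # weight updates (steepest descent)
            W2 += eta * np.outer(o_hidden, delta_out); b2 += eta * delta_out
            W1 += eta * np.outer(p, delta_hidden);     b1 += eta * delta_hidden
        # stopping criterion: mean squared error over all examples below a threshold
        O = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
        if np.mean(np.sum((D - O) ** 2, axis=1)) < err_threshold:
            break
    return W1, b1, W2, b2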
32 How to Determine if an Example is Correctly Classified?
- Accuracy may be used to evaluate performance once training has finished, or as a stopping criterion checked at the end of each epoch
- Binary encoding
- apply each example and get the resulting activations of the output neurons; the example will belong to the class corresponding to the output neuron with the highest activation
- Example: 3 classes; the outputs for ex. X are 0.3, 0.7, 0.5 => ex. X belongs to class 2
- i.e. each output value is regarded as the probability of the example belonging to the class corresponding to this output
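- The check described above amounts to comparing argmax positions; a minimal sketch:

import numpy as np

def is_correct(output_activations, target):
    # predicted class = output neuron with the highest activation;
    # target class = position of the 1 in the binary target encoding
    return int(np.argmax(output_activations)) == int(np.argmax(target))

print(is_correct([0.3, 0.7, 0.5], [0, 1, 0]))   # True: ex. X is assigned to class 2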
33 Backpropagation - Example
- 2 classes, 2-d input data
- training set:
- ex. 1: (0.6, 0.1), class 1 (banana)
- ex. 2: (0.2, 0.3), class 2 (orange)
- ...
- Network architecture
- How many inputs?
- How many hidden neurons?
- Heuristic: n = (inputs + output_neurons)/2
- How many output neurons?
- What encoding of the outputs?
- 1 0 for class 1, 0 1 for class 2
- Initial weights and learning rate
- Let η = 0.1 and the weights be set as in the picture
34 Backpropagation Example (cont. 1)
- 1. Forward pass for ex. 1 - calculate the outputs o6 and o7
- o1 = 0.6, o2 = 0.1, target output 1 0, i.e. class 1
- Activations of the hidden units:
- net3 = o1*w13 + o2*w23 + b3 = 0.6*0.1 + 0.1*(-0.2) + 0.1 = 0.14
- o3 = 1/(1 + e^-net3) = 0.53
- net4 = o1*w14 + o2*w24 + b4 = 0.6*0 + 0.1*0.2 + 0.2 = 0.22
- o4 = 1/(1 + e^-net4) = 0.55
- net5 = o1*w15 + o2*w25 + b5 = 0.6*0.3 + 0.1*(-0.4) + 0.5 = 0.64
- o5 = 1/(1 + e^-net5) = 0.65
- Activations of the output units:
- net6 = o3*w36 + o4*w46 + o5*w56 + b6 = 0.53*(-0.4) + 0.55*0.1 + 0.65*0.6 - 0.1 = 0.13
- o6 = 1/(1 + e^-net6) = 0.53
- net7 = o3*w37 + o4*w47 + o5*w57 + b7 = 0.53*0.2 + 0.55*(-0.1) + 0.65*(-0.2) + 0.6 = 0.52
- o7 = 1/(1 + e^-net7) = 0.63
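- The forward-pass arithmetic above can be checked with a few lines of Python (weights taken from the worked example; the code uses unrounded intermediate values, so the last digits may differ slightly from the slide, but the rounded outputs match):

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

o1, o2 = 0.6, 0.1   # inputs of ex. 1
# hidden layer
net3 = o1*0.1 + o2*(-0.2) + 0.1;  o3 = sigmoid(net3)   # 0.14 -> ~0.53
net4 = o1*0.0 + o2*0.2 + 0.2;     o4 = sigmoid(net4)   # 0.22 -> ~0.55
net5 = o1*0.3 + o2*(-0.4) + 0.5;  o5 = sigmoid(net5)   # 0.64 -> ~0.65
# output layer
net6 = o3*(-0.4) + o4*0.1 + o5*0.6 - 0.1;     o6 = sigmoid(net6)   # ~0.13 -> ~0.53
net7 = o3*0.2 + o4*(-0.1) + o5*(-0.2) + 0.6;  o7 = sigmoid(net7)   # ~0.52 -> ~0.63
print(round(o6, 2), round(o7, 2))   # 0.53 0.63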
35 Backpropagation Example (cont. 2)
- 2. Backward pass for ex. 1
- Calculate the output errors δ6 and δ7 (note that d6 = 1, d7 = 0 for class 1)
- δ6 = (d6 - o6)*o6*(1 - o6) = (1 - 0.53)*0.53*(1 - 0.53) = 0.12
- δ7 = (d7 - o7)*o7*(1 - o7) = (0 - 0.63)*0.63*(1 - 0.63) = -0.15
- Calculate the new weights between the hidden and output units (η = 0.1):
- Δw36 = η*δ6*o3 = 0.1*0.12*0.53 = 0.006
- w36new = w36old + Δw36 = -0.4 + 0.006 = -0.394
- Δw37 = η*δ7*o3 = 0.1*(-0.15)*0.53 = -0.008
- w37new = w37old + Δw37 = 0.2 - 0.008 = 0.19
- Similarly for w46new, w47new, w56new and w57new
- For the biases b6 and b7 (remember: biases are weights with input 1):
- Δb6 = η*δ6*1 = 0.1*0.12 = 0.012
- b6new = b6old + Δb6 = -0.1 + 0.012 = -0.088
- Similarly for b7
36 Backpropagation Example (cont. 3)
- Calculate the errors of the hidden units δ3, δ4 and δ5:
- δ3 = o3*(1 - o3)*(w36*δ6 + w37*δ7) = 0.53*(1 - 0.53)*((-0.4)*0.12 + 0.2*(-0.15)) = -0.019
- Similarly for δ4 and δ5
- Calculate the new weights between the input and hidden units (η = 0.1):
- Δw13 = η*δ3*o1 = 0.1*(-0.019)*0.6 = -0.0011
- w13new = w13old + Δw13 = 0.1 - 0.0011 = 0.0989
- Similarly for w23new, w14new, w24new, w15new and w25new; and for b3, b4 and b5
- 3. Repeat the same procedure for the other training examples
- Forward pass for ex. 2, backward pass for ex. 2
- Forward pass for ex. 3, backward pass for ex. 3
- ...
- Note: it is better to apply the input examples in random order
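- The backward pass for the same example, again as a quick Python check (it uses the rounded activations from the slide so that the numbers match):

eta = 0.1
o1, o3, o6, o7 = 0.6, 0.53, 0.53, 0.63   # rounded activations from the forward pass
d6, d7 = 1, 0                             # target encoding of class 1

# output deltas
delta6 = (d6 - o6) * o6 * (1 - o6)        # ~0.12
delta7 = (d7 - o7) * o7 * (1 - o7)        # ~-0.15
# hidden delta for neuron 3 (w36 = -0.4, w37 = 0.2)
delta3 = o3 * (1 - o3) * (-0.4 * delta6 + 0.2 * delta7)   # ~-0.019

# example weight updates
w36_new = -0.4 + eta * delta6 * o3        # ~-0.394
w13_new = 0.1 + eta * delta3 * o1         # ~0.0989
print(round(delta6, 2), round(delta7, 2), round(delta3, 3),
      round(w36_new, 3), round(w13_new, 4))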
37 Backpropagation Example (cont. 4)
- 4. At the end of the epoch, check if the stopping criterion is satisfied:
- if yes, stop training
- if not, continue training:
- epoch = epoch + 1
- go to step 1
38 Steepest Gradient Descent
- Does gradient descent guarantee that after each adjustment the error will be reduced? No!
- Not optimal - it is guaranteed to find a minimum, but it might be a local minimum!
- a local minimum may be a good enough solution
- (figure: backpropagation's error space - many local minima and 1 global minimum)
39 Universality of Backpropagation
- Boolean functions
- Every Boolean function of the inputs can be represented by a network with a single hidden layer
- Continuous functions - universal approximation theorems
- Any continuous function can be approximated with arbitrarily small error by a network with one hidden layer (Cybenko 1989, Hornik et al. 1989)
- Any function (incl. discontinuous) can be approximated to arbitrarily small error by a network with two hidden layers (Cybenko 1988)
- These are existence theorems - they say the solution exists but don't say how to choose the number of hidden neurons!
- For a given network it is hard to say exactly which functions can be represented and which ones cannot
40 Overfitting
- Occurs when:
- Training examples are noisy
- The number of free (trainable) parameters is bigger than the number of training examples
- The network has been trained for too long
- Preventing overtraining:
- Use a network that is just large enough to provide an adequate fit
- Ockham's Razor: don't use a bigger net when a smaller one will work
- The network should not have more free parameters than there are training examples!
- However, it is difficult to know beforehand how large a network should be for a specific application!
41 Preventing Overtraining - Validation Set Approach
- Also called an early stopping method
- Available data is divided into 3 subsets
- Training set
- Used for computing the gradient and updating the
weights - Validation set
- The error on the validation set is monitored
during the training - This error will normally decrease during the
initial phase of training (as does the training
error) - However, when the network begins to overfit the
data, the error on the validation set will
typically begin to rise - Training is stopped when the error on the
validation set increases for a pres-specified
number of iterations and the weights and biases
at the minimum of the validation set error are
returned - Testing set
- Not used during training but to compare different
algorithms once training has completed
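- A sketch of the early stopping procedure described above, assuming generic helper functions train_one_epoch and error (these names and the "patience" parameter are ours, standing in for the pre-specified number of iterations):

import copy

def train_with_early_stopping(net, train_one_epoch, error, train_set, val_set,
                              patience=10, max_epochs=10000):
    # train_one_epoch(net, data) updates the weights in place for one epoch;
    # error(net, data) returns e.g. the mean squared error on a data set
    best_err = float("inf")
    best_net = copy.deepcopy(net)
    worse_epochs = 0
    for epoch in range(max_epochs):
        train_one_epoch(net, train_set)
        val_err = error(net, val_set)
        if val_err < best_err:
            best_err, best_net, worse_epochs = val_err, copy.deepcopy(net), 0
        else:
            worse_epochs += 1
            if worse_epochs >= patience:   # validation error rose for `patience` epochs
                break
    return best_net   # weights/biases at the minimum of the validation set error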
42 Error Surface and Convergence - Example
- Path b gets trapped in a local minimum
- What can be done? Try different initializations
- Path a converges to the optimum solution but is
very slow - What can we do?
Try nnd12sd1!
43 Speeding up the Convergence
- Solution 1: Increase the learning rate
- Faster on the flat part, but unstable when falling into the steep valley that contains the minimum point - overshooting the minimum
- Try nnd12sd2!
- Solution 2: Smooth out the trajectory by averaging the weight updates, e.g. make the current update dependent on the previous one (see the formula below)
- The use of momentum might smooth out the oscillations and produce a stable trajectory
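- One common way to write the momentum modification, with α the momentum term (conventions vary; some formulations scale the gradient term by 1 - α instead):

\Delta w(t) = \alpha\, \Delta w(t-1) \;-\; \eta\, \frac{\partial E}{\partial w}, \qquad 0 \le \alpha < 1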
44 Backpropagation with Momentum - Example
- Example: the same learning rate and initial position
- Smooth and faster convergence
- Stable algorithm
- By the use of momentum we can use a larger learning rate while maintaining the stability of the algorithm
- (figure: squared error vs. training iteration)
- Typical momentum values used in practice: 0.6-0.9
45 More on the Learning Rate
- Constant throughout training (standard steepest descent)
- The performance is very sensitive to the proper setting of the learning rate
- Too small: slow convergence
- Too big: oscillation, overshooting of the minimum
- It is not possible to determine the optimum learning rate before training, as it changes during training and depends on the error surface
- Variable learning rate
- goal: keep the learning rate as large as possible while keeping learning stable
- Several algorithms have been proposed
46 Limitations and Capabilities
- Multilayer perceptrons (MLPs) trained with backpropagation can perform function approximation and pattern classification
- Theoretically they can:
- Perform any linear and non-linear computation
- Approximate any reasonable function arbitrarily well
- => they are able to overcome the limitations of earlier NNs (perceptrons and ADALINEs)
- In practice:
- May not always find a solution - can be trapped in a local minimum
- Performance is sensitive to the starting conditions (weight initialization)
- Sensitive to the number of hidden layers and neurons
- Too few neurons: underfitting, unable to learn what you want it to learn
- Too many: overfitting, learns slowly
- => the architecture of an MLP network is not completely constrained by the problem to be solved, as the number of hidden layers and neurons is left to the designer
47 Limitations and Capabilities cont.
- Sensitive to the value of the learning rate
- Too small: slow learning
- Too big: instability or poor performance
- The proper choice depends on the nature of the examples
- Trial and error
- Refer to the choices that have worked well in similar problems
- => successful application of NNs requires time and experience
- Backpropagation - summary
- uses the steepest descent algorithm for minimizing the mean square error
- Gradient descent (GD)
- Standard GD is slow, as it requires a small learning rate for stable learning
- GD with momentum is faster, as it allows a higher learning rate while maintaining stability
- There are several variations of the backpropagation algorithm
48 Some Interesting NN Applications
- A few examples of the many significant applications of NNs
- You can use them for the paper presentation in weeks 12 and 13!
- Network design was the result of several months of trial-and-error experimentation
- Moral: NNs are widely applicable, but they cannot magically solve problems; wrong choices lead to poor performance
- "NNs are the second best way of doing just about anything" - John Denker
- NNs provide passable performance on many tasks that would be difficult to solve explicitly with other techniques
49 For interested students only, not examinable
NETtalk
- Sejnowski and Rosenberg, 1987
- Pronunciation of written English
- Fascinating problem in linguistics
- Task with high commercial profit
- How?
- Mapping the text stream to phonemes
- Passing the phonemes to a speech generator
- Task for the NN: learning to map the text to phonemes
- Good task for a NN, as most of the rules are approximately correct
- E.g. the "c" in "cat" maps to /k/, while in "century" it maps to /s/
50 For interested students only, not examinable
NETtalk - Architecture
- 203 input neurons: 7 (sliding window - the character to be pronounced and the 3 characters before and after it) x 29 possible characters (26 letters + blank, period, other punctuation)
- 80 hidden neurons
- 26 output neurons corresponding to the phonemes
51 For interested students only, not examinable
NETtalk - Performance
- Training set
- 1024 words hand-transcribed into phonemes
- Accuracy on the training set: 90% after 50 epochs
- Why not 100%?
- A few dozen hours of training time; a few months of experimentation with different architectures
- Testing
- Accuracy: 78%
- Importance
- A good showpiece for the philosophy of NNs
- The network appears to mimic the speech patterns of young children: incorrect babble at first (as the weights are random), then gradually improving to become understandable
52 For interested students only, not examinable
Handwritten Character Recognition
- Le Cun et al., 1989
- Reading zip codes on hand-addressed envelopes
- Task for the NN
- A preprocessor is used to recognize the segments in the individual digits
- Based on the segments, the network has to identify the digits
- Network architecture
- 256 input neurons: 16x16 array of pixels
- 3 hidden layers: 768, 192, 30 neurons respectively
- 10 output neurons: digits 0-9
- Not a fully connected network
- If it were fully connected: ~200,000 connections (impossible to train); instead, only 9,760 connections
- Units in the hidden layers act as feature detectors, e.g. each unit in the 1st hidden layer is connected with 25 input neurons (a 5x5 pixel region)
53 For interested students only, not examinable
Handwritten Character Recognition cont.
- Training: 7,300 examples
- Testing: 2,000 examples
- Accuracy: 99%
- Hardware implementation (in VLSI)
- enables letters to be sorted at high speed by zip code
- One of the largest applications of NNs
54 For interested students only, not examinable
Driving Motor Vehicles
- Pomerleau et al., 1993
- ALVINN (Autonomous Land Vehicle In a Neural Network)
- Learns to drive a van along a single lane on a highway
- Once trained on a particular road, ALVINN can drive at speeds > 40 miles per hour
- Chevy van and US Army HMMWV personnel carrier
- computer-controlled steering, acceleration and braking
- sensors: color stereo video camera, radar, positioning system, scanning laser range finders
55 For interested students only, not examinable
ALVINN - Architecture
- Fully connected backpropagation NN with 1 hidden layer
- 960 input neurons: the signal from the camera is preprocessed to yield a 30x32 image intensity grid
- 5 hidden neurons
- 32 output neurons corresponding to steering directions
- If the output node with the highest activation is:
- The leftmost, then ALVINN turns sharply left
- The rightmost, then ALVINN turns sharply right
- A node between them, then ALVINN steers the van in a proportionally intermediate direction
- Smoothing the direction: it is calculated as an average suggested not only by the output node with the highest activation but also by that node's immediate neighbours
- Training examples (image-direction pairs)
- Recording such pairs while a human drives the vehicle
- After collecting 5 minutes of such data and 10 minutes of training, ALVINN can drive on its own
56 For interested students only, not examinable
ALVINN - Training
- Training examples (image-direction pairs)
- Recording such pairs while a human drives the vehicle
- After collecting 5 minutes of such data and 10 minutes of training, ALVINN can drive on its own
- Potential problem: as the human is too good and (typically) does not stray from the lane, there are no training examples that show how to recover when the van is misaligned with the road
- Solution: ALVINN corrects this by creating synthetic training examples - it rotates each video image to create additional views of what the road would look like if the van were a little off course to the left or right
57 For interested students only, not examinable
ALVINN - Results
- Impressive results
- ALVINN has driven at speeds up to 70 miles per hour for up to 90 miles on public highways near Pittsburgh
- Also at normal speeds on single-lane dirt roads, paved bike paths, and two-lane suburban streets
- Limitations
- Unable to drive on a road type for which it hasn't been trained
- Not very robust to changes in lighting conditions and the presence of other vehicles
- Comparison with traditional vision algorithms
- They use image processing to analyse the scene, find the road and then follow it
- Most of them achieve 3-4 miles per hour
58 For interested students only, not examinable
ALVINN - Discussion
- Why is ALVINN so successful?
- Fast computation - once trained, the NN is able to compute a new steering direction 10 times a second => the computed direction can be off by 10 degrees from the ideal, as long as the system is able to make a correction in a few tenths of a second
- Learning from examples is very appropriate
- There is no good theory of driving, but it is easy to collect examples. This motivated the use of a learning algorithm (though not necessarily NNs)
- Driving is a continuous, noisy domain in which almost all features contribute some information => NNs are a better choice than some other learning algorithms (e.g. DTs)