1Announcements
- Homework 5 due today, October 30
- Book Review due today, October 30
- Lab 3 due Thursday, November 1
- Homework 6 due Tuesday, November 6
- Current Event
- Kay - today
- Chelsea - Thursday, November 1
2Neural Networks
3Artificial Neural Networks
- Artificial neural networks (ANNs) provide a practical method for learning
  - real-valued functions
  - discrete-valued functions
  - vector-valued functions
- Robust to errors in training data
- Successfully applied to such problems as
  - interpreting visual scenes
  - speech recognition
  - learning robot control strategies
4Biological Neurons
- The human brain is made up of billions of simple processing units called neurons.
- Inputs are received on dendrites, and if the input levels exceed a threshold, the neuron fires, passing a signal through the axon to the synapse, which then connects to another neuron.
5Neural Network Representation
- ALVINN uses a learned ANN to steer an autonomous vehicle driving at normal speeds on public highways
- Input to the network: a 30x32 grid of pixel intensities obtained from a forward-pointing camera mounted on the vehicle
- Output: the direction in which the vehicle is steered
- Trained to mimic the observed steering commands of a human driving the vehicle for approximately 5 minutes
6ALVINN
7Appropriate problems
- ANN learning is well-suited to problems in which the training data is noisy and complex (e.g., inputs from cameras or microphones)
- Can also be used for problems with symbolic representations
- Most appropriate for problems where
  - Instances have many attribute-value pairs
  - The target function output may be discrete-valued, real-valued, or a vector of several real- or discrete-valued attributes
  - Training examples may contain errors
  - Long training times are acceptable
  - Fast evaluation of the learned target function may be required
  - The ability for humans to understand the learned target function is not important
8Artificial Neurons (1)
- Artificial neurons are based on biological neurons.
- Each neuron in the network receives one or more inputs.
- An activation function is applied to the inputs, which determines the output of the neuron, known as its activation level.
- The charts on the right show three typical activation functions.
9Artificial Neurons (2)
- A typical activation function works as follows
  - Each node i has a weight w_i associated with it; the input to node i is x_i.
  - t is the threshold.
  - If the weighted sum of the inputs to the neuron is above the threshold, i.e. if sum_i w_i x_i > t, then the neuron fires.
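A minimal Python sketch of such a threshold unit (the weights, inputs, and threshold values below are illustrative, not taken from the slides):

def threshold_unit(inputs, weights, t):
    # Fire (output 1) when the weighted sum of the inputs exceeds the threshold t.
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum > t else 0

# Illustrative values only: 0.4*1 + 0.3*0 + 0.5*1 = 0.9 > 0.6, so the unit fires
print(threshold_unit([1, 0, 1], [0.4, 0.3, 0.5], t=0.6))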
10Perceptrons
- A perceptron is a single neuron that classifies a set of inputs into one of two categories (usually 1 or -1).
- If the inputs are in the form of a grid, a perceptron can be used to recognize visual images of shapes.
- The perceptron usually uses a step function, which returns 1 if the weighted sum of inputs exceeds a threshold, and 0 otherwise.
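A small sketch of a perceptron with a step activation; the weights and threshold are chosen for illustration and happen to compute the OR of two inputs (which slide 12 notes is linearly separable):

def perceptron(inputs, weights, threshold):
    # Step function: 1 if the weighted sum exceeds the threshold, 0 otherwise.
    s = sum(w * x for w, x in zip(weights, inputs))
    return 1 if s > threshold else 0

# Illustrative weights realizing OR(x1, x2)
or_weights, or_threshold = [0.6, 0.6], 0.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron(x, or_weights, or_threshold))   # 0, 1, 1, 1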
11Training Perceptrons
- Learning involves choosing values for the weights
- The perceptron is trained as follows (a code sketch follows this list)
  - First, the inputs are given small random weights (usually between -0.5 and 0.5).
  - An item of training data is presented. If the perceptron misclassifies it, each weight is modified according to the rule w_i <- w_i + a(t - o)x_i, where t is the target output for the training example, o is the output generated by the perceptron, and a is the learning rate, between 0 and 1 (usually small, such as 0.1).
  - Cycle through the training examples until all of them are classified correctly.
  - Each cycle is known as an epoch.
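A minimal Python sketch of this training procedure, folding the threshold into a bias weight w[0] (a common formulation; the learning rate and initial weight range follow the slide, while the OR training data is only an illustration):

import random

def train_perceptron(examples, n_inputs, a=0.1, max_epochs=100):
    # Start with small random weights in [-0.5, 0.5]; w[0] is the bias weight.
    w = [random.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
    for epoch in range(max_epochs):
        all_correct = True
        for x, t in examples:
            xs = [1.0] + list(x)                         # constant input 1 for the bias
            o = 1 if sum(wi * xi for wi, xi in zip(w, xs)) > 0 else 0
            if o != t:                                   # misclassified: apply w_i <- w_i + a(t - o)x_i
                all_correct = False
                w = [wi + a * (t - o) * xi for wi, xi in zip(w, xs)]
        if all_correct:                                  # one clean epoch: every example classified
            return w, epoch
    return w, max_epochs

# Illustrative data: OR is linearly separable, so training converges
or_examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, epochs = train_perceptron(or_examples, n_inputs=2)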
12Bias of Perceptrons
- Perceptrons can only classify linearly separable functions.
- The first of the following graphs shows a linearly separable function (OR).
- The second is not linearly separable (Exclusive-OR).
13Convergence
- The perceptron training rule only converges when the training examples are linearly separable and a sufficiently small learning constant is used
- Another approach uses the delta rule and gradient descent (see the sketch after this list)
  - Same basic rule for finding the update value
  - Changes
    - Do not incorporate the threshold in the output value (unthresholded perceptron)
    - Wait to update the weights until a full cycle through the examples is complete
- Converges asymptotically toward the minimum-error hypothesis, possibly requiring unbounded time, but converges regardless of whether the training data are linearly separable
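A minimal Python sketch of the delta rule with gradient descent under these changes: the output is the unthresholded weighted sum, and weight updates are accumulated over a full cycle before being applied (all names and data are illustrative):

def delta_rule_epoch(examples, w, a=0.05):
    # Accumulate the updates over the whole cycle, then apply them once (batch update).
    delta_w = [0.0] * len(w)
    for x, t in examples:
        xs = [1.0] + list(x)                             # constant input 1 for the bias weight
        o = sum(wi * xi for wi, xi in zip(w, xs))        # unthresholded (linear) output
        for i, xi in enumerate(xs):
            delta_w[i] += a * (t - o) * xi               # from the gradient of 1/2 (t - o)^2
    return [wi + dwi for wi, dwi in zip(w, delta_w)]

# Repeated over many epochs, the error moves toward its minimum even when the
# data is not linearly separable (XOR is used here for illustration).
w = [0.0, 0.0, 0.0]
examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
for _ in range(200):
    w = delta_rule_epoch(examples, w)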
14Multilayer Neural Networks
- Multilayer neural networks can classify a range of functions, including ones that are not linearly separable.
- Each input-layer neuron connects to all neurons in the hidden layer.
- The neurons in the hidden layer connect to all neurons in the output layer.
15Speech Recognition ANN
16Sigmoid Unit
- σ(x) is the sigmoid function: σ(x) = 1 / (1 + e^(-x))
- Nice property: it is differentiable, with dσ(x)/dx = σ(x)(1 - σ(x))
- Derive gradient descent rules to train
  - one sigmoid unit (node)
  - multilayer networks of sigmoid units
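A short Python sketch of a sigmoid unit and the derivative used later in backpropagation (function names are illustrative):

import math

def sigmoid(x):
    # sigma(x) = 1 / (1 + e^(-x))
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    # The nice property: d sigma(x)/dx = sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1.0 - s)

def sigmoid_unit(inputs, weights):
    # One sigmoid unit: squash the weighted sum of its inputs.
    net = sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(net)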
17Backpropagation
- Multilayer neural networks learn in the same way as perceptrons.
- However, there are many more weights, and it is important to assign credit (or blame) correctly when changing weights.
- E sums the errors over all of the network output units (and all training examples): E = 1/2 * sum over examples d and output units k of (t_kd - o_kd)^2, where t_kd and o_kd are the target and actual outputs of unit k on example d.
18Backpropagation Algorithm
- Create a feed-forward network with n_in inputs, n_hidden hidden units, and n_out output units (a code sketch follows this list)
- Initialize all network weights to small random numbers
- Until the termination condition is met, Do
  - For each training example <x, t>, Do
    - Propagate the input forward through the network
      - Input the instance x to the network and compute the output o_u of every unit u in the network
    - Propagate the errors backward through the network
      - For each network output unit k, calculate its error term d_k = o_k (1 - o_k)(t_k - o_k)
      - For each hidden unit h, calculate its error term d_h = o_h (1 - o_h) * (sum over output units k of w_kh d_k)
      - Update each network weight: w_ji <- w_ji + a * d_j * x_ji, where x_ji is the input from unit i into unit j
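A compact Python sketch of this algorithm for a network with one layer of sigmoid hidden units, using the error terms above and stochastic (per-example) weight updates; the network sizes, names, and data are illustrative:

import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, hidden_w, output_w):
    # Each row of hidden_w / output_w holds a unit's bias weight followed by its input weights.
    xs = [1.0] + list(x)
    hidden_out = [sigmoid(sum(w * v for w, v in zip(ws, xs))) for ws in hidden_w]
    hs = [1.0] + hidden_out
    output_out = [sigmoid(sum(w * v for w, v in zip(ws, hs))) for ws in output_w]
    return hidden_out, output_out

def backprop_step(x, t, hidden_w, output_w, a=0.1):
    hidden_out, output_out = forward(x, hidden_w, output_w)
    xs = [1.0] + list(x)
    hs = [1.0] + hidden_out
    # Error term for each output unit k: d_k = o_k (1 - o_k)(t_k - o_k)
    d_out = [o * (1 - o) * (tk - o) for o, tk in zip(output_out, t)]
    # Error term for each hidden unit h: d_h = o_h (1 - o_h) * sum_k w_kh d_k
    d_hid = [oh * (1 - oh) * sum(output_w[k][h + 1] * d_out[k]
                                 for k in range(len(output_w)))
             for h, oh in enumerate(hidden_out)]
    # Update each weight: w_ji <- w_ji + a * d_j * x_ji
    for k, ws in enumerate(output_w):
        for i in range(len(ws)):
            ws[i] += a * d_out[k] * hs[i]
    for h, ws in enumerate(hidden_w):
        for i in range(len(ws)):
            ws[i] += a * d_hid[h] * xs[i]

# Illustrative setup: 2 inputs, 2 hidden units, 1 output unit, small random weights.
random.seed(0)
hidden_w = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(2)]
output_w = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(1)]
backprop_step((1, 0), (0,), hidden_w, output_w)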
19Example Learning AND
Network: inputs a, b, c; hidden units d, e; output unit f.
Initial weights:
w_da = .2, w_db = .1, w_dc = -.1, w_d0 = .1
w_ea = -.5, w_eb = .3, w_ec = -.2, w_e0 = 0
w_fd = .4, w_fe = -.2, w_f0 = -.1
Training data: AND(1,0,1) = 0, AND(1,1,1) = 1. Alpha (learning rate) = 0.1.
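Reusing the backprop_step sketch from slide 18, this example could be set up as follows; the weight layout (bias weight first in each row) is my assumption about how the slide's diagram maps onto code:

# Hidden units d and e: [w_d0, w_da, w_db, w_dc] and [w_e0, w_ea, w_eb, w_ec]
hidden_w = [[0.1, 0.2, 0.1, -0.1],
            [0.0, -0.5, 0.3, -0.2]]
# Output unit f: [w_f0, w_fd, w_fe]
output_w = [[-0.1, 0.4, -0.2]]

# Training data: AND(1,0,1) = 0 and AND(1,1,1) = 1, with learning rate 0.1
training_data = [((1, 0, 1), (0,)), ((1, 1, 1), (1,))]
for epoch in range(1000):
    for x, t in training_data:
        backprop_step(x, t, hidden_w, output_w, a=0.1)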
20Hidden Layer representation
Target Function
Can this be learned?
21Yes
22Plots of Squared Error
23Hidden Unit
(.15 .99 .99)
24Evolving weights
25Momentum
- One of many variations
- Modify the update rule so that the weight update on the nth iteration depends partially on the update from the (n-1)th iteration: Delta w_ji(n) = a * d_j * x_ji + m * Delta w_ji(n-1), where 0 <= m < 1 is the momentum (see the sketch after this list)
- Still minimizes the error over the training examples
- Speeds up training, which can otherwise take thousands of iterations
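A small Python sketch of the momentum-modified update, assuming the plain backpropagation term a * d_j * x_ji for each weight has already been computed (names, including the momentum coefficient m, are illustrative):

def momentum_update(w, grad_terms, prev_delta, m=0.9):
    # grad_terms[i] is the plain backpropagation update a * d_j * x_ji for weight i;
    # the new update adds a fraction m of the previous iteration's update.
    delta = [g + m * p for g, p in zip(grad_terms, prev_delta)]
    new_w = [wi + d for wi, d in zip(w, delta)]
    return new_w, delta        # return delta so it can serve as prev_delta next time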
26When to stop training
- Continue until the error falls below some predefined threshold
  - A bad choice, because backpropagation is susceptible to overfitting
  - The network won't be able to generalize as well over unseen data
27Cross Validation
- A common approach to avoid overfitting
- Reserve part of the training data for validation (see the sketch after this list)
  - The m examples are partitioned into k disjoint subsets
  - Run the training procedure k times, each time using a different one of these subsets as the validation set
  - Determine the number of iterations that yields the best performance on the validation set
  - The mean of these iteration counts is then used to train on all m examples
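A minimal Python sketch of this procedure, assuming placeholder routines train(examples, n_iterations) and error(model, examples) exist (neither is from the slides):

def iterations_by_cross_validation(examples, k, candidate_iterations, train, error):
    # Partition the m examples into k disjoint subsets (folds).
    folds = [examples[i::k] for i in range(k)]
    chosen = []
    for i in range(k):
        validation = folds[i]
        training = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        # On this fold, pick the iteration count with the lowest validation error.
        best = min(candidate_iterations,
                   key=lambda n: error(train(training, n), validation))
        chosen.append(best)
    # Train on all m examples using the mean of the chosen iteration counts.
    mean_iters = round(sum(chosen) / k)
    return train(examples, mean_iters)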
28Neural Nets for Face Recognition
29Hidden Unit Weights
Output classes of the face-pose network: left, straight, right, up.
30Error gradient for the sigmoid function
31Error gradient for the sigmoid function
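The figures on these slides are not reproduced; a sketch of the standard gradient derivation for a single sigmoid unit, using the squared error from slide 17 and the sigmoid derivative from slide 16 (with o_d = \sigma(net_d) and \partial net_d / \partial w_i = x_{id}):

\frac{\partial E}{\partial w_i}
  = \frac{\partial}{\partial w_i}\,\frac{1}{2}\sum_d (t_d - o_d)^2
  = -\sum_d (t_d - o_d)\,\frac{\partial o_d}{\partial w_i}
  = -\sum_d (t_d - o_d)\, o_d (1 - o_d)\, x_{id}

\Delta w_i = -a\,\frac{\partial E}{\partial w_i}
  = a \sum_d (t_d - o_d)\, o_d (1 - o_d)\, x_{id}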