Learning: Neural Networks
Transcript and Presenter's Notes

1
Learning: Neural Networks
  • Artificial Intelligence
  • CMSC 25000
  • February 5, 2004

2
Roadmap
  • Neural Networks
  • Motivation: Overcoming perceptron limitations
  • Motivation: ALVINN
  • Heuristic Training
  • Backpropagation: Gradient descent
  • Avoiding overfitting
  • Avoiding local minima
  • Conclusion: Teaching a Net to talk

3
Perceptron Summary
  • Motivated by neuron activation
  • Simple training procedure
  • Guaranteed to converge
  • IF linearly separable

4
Neural Nets
  • Multi-layer perceptrons
  • Inputs: real-valued
  • Intermediate hidden nodes
  • Output(s): one (or more) discrete-valued

[Diagram: feed-forward network with inputs X1-X4 feeding hidden nodes, which feed outputs Y1 and Y2]
5
Neural Nets
  • Pro: More general than perceptrons
  • Not restricted to linear discriminants
  • Multiple outputs: one classification each
  • Con: No simple, guaranteed training procedure
  • Use greedy, hill-climbing procedure to train
  • Gradient descent: Backpropagation

6
Solving the XOR Problem
[Diagram: network topology with 2 hidden nodes and 1 output. Inputs x1, x2 and a bias input of -1 feed hidden node o1 via weights w11, w21, w01, and hidden node o2 via w12, w22, w02; o1, o2 and a bias input of -1 feed the output y via w13, w23, w03.]

Desired behavior:

x1 x2 | o1 o2 | y
 0  0 |  0  0 | 0
 1  0 |  0  1 | 1
 0  1 |  0  1 | 1
 1  1 |  1  1 | 0

Weights: w11 = w12 = 1, w21 = w22 = 1, w01 = 3/2, w02 = 1/2,
w03 = 1/2, w13 = -1, w23 = 1

With these weights, o1 computes AND, o2 computes OR, and y fires
when o2 fires but o1 does not, i.e., XOR (verified in the sketch below).
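
A minimal sketch (not from the slides) that plugs these weights into step-threshold units and prints the truth table; the helper name step is my own:

def step(z):
    # Fire (output 1) once the weighted sum reaches the threshold.
    return 1 if z >= 0 else 0

# Weights from the slide; each node also gets a constant bias input of -1.
w11 = w12 = 1.0
w21 = w22 = 1.0
w01, w02, w03 = 1.5, 0.5, 0.5
w13, w23 = -1.0, 1.0

for x1 in (0, 1):
    for x2 in (0, 1):
        o1 = step(w11 * x1 + w21 * x2 - w01)  # behaves as AND(x1, x2)
        o2 = step(w12 * x1 + w22 * x2 - w02)  # behaves as OR(x1, x2)
        y = step(w13 * o1 + w23 * o2 - w03)   # OR but not AND, i.e. XOR
        print(x1, x2, o1, o2, y)
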
7
Neural Net Applications
  • Speech recognition
  • Handwriting recognition
  • NETtalk: Letter-to-sound rules
  • ALVINN: Autonomous driving

8
ALVINN
  • Driving as a neural network
  • Inputs:
  • Image pixel intensities
  • i.e., lane lines
  • 5 hidden nodes
  • Outputs:
  • Steering actions
  • e.g., turn left/right, and how far
  • Training:
  • Observe human behavior: sample images + steering

9
Backpropagation
  • Greedy, hill-climbing procedure
  • Weights are parameters to change
  • Original hill-climbing changes one parameter per step
  • Slow
  • If smooth function, change all parameters per step
  • Gradient descent
  • Backpropagation: Computes current output, works
    backward to correct the error

10
Producing a Smooth Function
  • Key problem:
  • Pure step threshold is discontinuous
  • Not differentiable
  • Solution:
  • Sigmoid ("squashed S" function): the logistic
    function s(z) = 1 / (1 + e^(-z)) (sketch below)
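
A short sketch of the logistic function and its derivative (the derivative reappears later in the gradient computation); the function names are my own:

import math

def sigmoid(z):
    # Smooth, differentiable "squashed S" replacement for the step threshold.
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_deriv(z):
    # Convenient closed form: s'(z) = s(z) * (1 - s(z)).
    sz = sigmoid(z)
    return sz * (1.0 - sz)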

11
Neural Net Training
  • Goal:
  • Determine how to change weights to get correct
    output
  • Large change in weight to produce large reduction
    in error
  • Approach:
  • Compute actual output o
  • Compare to desired output d
  • Determine the effect of each weight w on the error (d - o)
  • Adjust weights

12
Neural Net Example
x^i: the i-th sample input vector; w: the weight vector;
y^i: the desired output for the i-th sample

Sum-of-squares error over the training samples (sketched in code below):

E(w) = Σ_i (y^i - o(x^i, w))²

where o(x^i, w) is the full expression of the network output in
terms of the inputs and weights (nested sigmoids, one per layer)

(From 6.034 notes, Lozano-Pérez)
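
A small sketch of this error measure, assuming a hypothetical net(x, w) that returns the network output for input x under weights w:

def sum_squares_error(w, samples, net):
    # samples: list of (input vector x, desired output y) pairs
    return sum((y - net(x, w)) ** 2 for x, y in samples)
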
13
Gradient Descent
  • Error: Sum-of-squares error of inputs with
    current weights
  • Compute rate of change of error w.r.t. each weight
  • Which weights have greatest effect on error?
  • Effectively, partial derivatives of error w.r.t.
    weights
  • In turn, these depend on other weights → chain rule

14
Gradient Descent
  • E = G(w): Error as a function of the weights
  • Find the rate of change of the error, dG/dw
  • Follow the steepest rate of change
  • Change weights so that error is minimized

[Figure: error curve G(w) over weight w; a step from w0 to w1 follows the slope downhill, but the curve also has local minima that can trap descent. A one-dimensional sketch follows.]
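
A one-dimensional sketch of the strategy, assuming the derivative dG/dw can be evaluated directly; the rate and step count are illustrative:

def gradient_descent(dG_dw, w, r=0.1, steps=100):
    for _ in range(steps):
        w -= r * dG_dw(w)  # step against the gradient, downhill in error
    return w

# Example: G(w) = (w - 3)**2 has dG/dw = 2*(w - 3) and its minimum at w = 3.
print(gradient_descent(lambda w: 2 * (w - 3), w=0.0))  # converges toward 3.0
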
15
Gradient of Error
[Equation: the error gradient ∂E/∂w, expanded through the network by the chain rule]

Note: Derivative of sigmoid: ds(z1)/dz1 = s(z1)(1 - s(z1))
(checked numerically below)

(From 6.034 notes, Lozano-Pérez)
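
A quick numerical check of that identity by central finite differences (a sketch, not from the slides; the test point is arbitrary):

import math

def s(z):
    return 1.0 / (1.0 + math.exp(-z))

z, h = 0.7, 1e-6
numeric = (s(z + h) - s(z - h)) / (2 * h)  # finite-difference slope
analytic = s(z) * (1 - s(z))               # closed-form derivative
print(abs(numeric - analytic) < 1e-8)      # True
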
16
From Effect to Update
  • Gradient computation:
  • How each weight contributes to performance
  • To train:
  • Need to determine how to CHANGE weight based on
    contribution to performance
  • Need to determine how MUCH change to make per
    iteration
  • Rate parameter r
  • Large enough to learn quickly
  • Small enough to reach but not overshoot target
    values (see the sketch below)
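
A toy illustration of the trade-off on G(w) = w**2, where dG/dw = 2w and the minimum is at w = 0 (rates chosen for illustration):

def descend(r, w=1.0, steps=25):
    for _ in range(steps):
        w -= r * 2 * w  # gradient step with rate r
    return w

print(descend(r=0.01))  # too small: still far from 0 after 25 steps
print(descend(r=0.4))   # large enough: reaches ~0 quickly
print(descend(r=1.1))   # too large: overshoots and diverges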

17
Backpropagation Procedure
[Diagram: a path of nodes i → j → k running from the inputs toward the output]

  • Pick rate parameter r
  • Until performance is good enough:
  • Do forward computation to calculate output
  • Compute β in the output node: β_z = d - o_z
  • Compute β in all other nodes: β_j = Σ_k w_{j→k} o_k (1 - o_k) β_k
  • Compute change for all weights: Δw_{i→j} = r o_i o_j (1 - o_j) β_j
    (β update rules as in the 6.034 notes; a sketch follows)
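
A minimal sketch of the whole procedure on the earlier 2-2-1 XOR network, with sigmoid units and a constant -1 bias input as on the XOR slide; the initialization, rate, and epoch count are illustrative:

import math
import random

random.seed(0)

def s(z):
    return 1.0 / (1.0 + math.exp(-z))

# w_h[j][i]: input i (x1, x2, bias) -> hidden node j
# w_o[j]: hidden node j (o1, o2, bias) -> output node
w_h = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(2)]
w_o = [random.uniform(-0.5, 0.5) for _ in range(3)]
samples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
r = 0.5  # rate parameter

for epoch in range(20000):
    for x, d in samples:
        xs = x + [-1]  # append the bias input
        # Forward computation to calculate the output
        o_h = [s(sum(w * v for w, v in zip(ws, xs))) for ws in w_h]
        hs = o_h + [-1]
        y = s(sum(w * v for w, v in zip(w_o, hs)))
        # Beta in the output node, then in the hidden nodes
        beta_y = d - y
        beta_h = [w_o[j] * y * (1 - y) * beta_y for j in range(2)]
        # Change for all weights
        for j in range(3):
            w_o[j] += r * hs[j] * y * (1 - y) * beta_y
        for j in range(2):
            for i in range(3):
                w_h[j][i] += r * xs[i] * o_h[j] * (1 - o_h[j]) * beta_h[j]

for x, d in samples:
    xs = x + [-1]
    o_h = [s(sum(w * v for w, v in zip(ws, xs))) for ws in w_h]
    y = s(sum(w * v for w, v in zip(w_o, o_h + [-1])))
    # Outputs typically land within ~0.1 of the 0/1 targets,
    # though XOR training can occasionally stall in a local minimum.
    print(x, d, round(y, 2))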

18
Backprop Example
Forward prop: Compute z_i and y_i given x_k, w_l
19
Backpropagation Observations
  • Procedure is (relatively) efficient
  • All computations are local
  • Use inputs and outputs of current node
  • What is good enough?
  • Rarely reach target (0 or 1) outputs
  • Typically, train until within 0.1 of target

20
Neural Net Summary
  • Training:
  • Backpropagation procedure
  • Gradient descent strategy (usual problems)
  • Prediction:
  • Compute outputs based on input vector and weights
  • Pros: Very general, fast prediction
  • Cons: Training can be VERY slow (1000s of
    epochs), overfitting

21
Training Strategies
  • Online training:
  • Update weights after each sample
  • Offline (batch) training:
  • Compute error over all samples
  • Then update weights
  • Online training is noisy
  • Sensitive to individual instances
  • However, may escape local minima (contrast sketched below)
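
A schematic contrast of the two schedules, assuming a hypothetical grad(w, x, d) that returns the error gradient for a single sample:

def online_epoch(w, samples, grad, r):
    for x, d in samples:
        w = w - r * grad(w, x, d)  # update after EACH sample: noisy steps
    return w

def batch_epoch(w, samples, grad, r):
    total = sum(grad(w, x, d) for x, d in samples)  # error over ALL samples
    return w - r * total  # then one weight update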

22
Training Strategy
  • To avoid overfitting:
  • Split data into training, validation, and test sets
  • Also, avoid excess weights (fewer weights than samples)
  • Initialize with small random weights
  • Small changes have noticeable effect
  • Use offline training
  • Until validation set error reaches its minimum
  • Evaluate on test set
  • No more weight changes (strategy sketched below)
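
A sketch of this strategy with hypothetical train_epoch and val_error helpers; the patience counter is one illustrative way to detect the validation minimum:

def train_early_stopping(w, train_epoch, val_error, patience=5):
    best_w, best_err, stalls = w, float('inf'), 0
    while stalls < patience:
        w = train_epoch(w)  # one offline (batch) pass over the training set
        err = val_error(w)  # monitor error on the held-out validation set
        if err < best_err:
            best_w, best_err, stalls = w, err, 0  # new validation minimum
        else:
            stalls += 1  # validation error no longer improving
    return best_w  # evaluate this on the test set; no more weight changes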

23
Classification
  • Neural networks are best for classification tasks
  • Single output → binary classifier
  • Multiple outputs → multiway classification
  • Applied successfully to learning pronunciation
  • Sigmoid pushes outputs toward binary classification
  • Not good for regression

24
Neural Net Example
  • NETtalk: Letter-to-sound by net
  • Inputs:
  • Need context to pronounce
  • 7-letter window: predict sound of middle letter
  • 29 possible characters: alphabet + space + punctuation (, .)
  • 7 × 29 = 203 inputs
  • 80 hidden nodes
  • Output: Generate 60 phones
  • Nodes map to 26 units: 21 articulatory, 5
    stress/syllable
  • Vector quantization of acoustic space

25
Neural Net Example NETtalk
  • Learning to talk
  • 5 iterations / 1024 training words: word boundaries and stress
  • 10 iterations: intelligible
  • 400 new test words: 80% correct
  • Not as good as DECtalk, but automatic

26
Neural Net Conclusions
  • Simulation based on neurons in brain
  • Perceptrons (single neuron)
  • Guaranteed to find linear discriminant
  • IF one exists → problem: XOR
  • Neural nets (multi-layer perceptrons)
  • Very general
  • Backpropagation training procedure
  • Gradient descent: local minima, overfitting issues