Title: Neural Network Architecture and Learning
1 Neural Network Architecture and Learning
- Guest Lecture by
- Some slides by Jim Rehg
2 Recursive Error Propagation
Now we can recursively compute the errors for nodes
farther from the output, and use the results to
compute gradients for the intermediate weights.
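For reference, a standard way to write this recursion in the notation of Bishop (whose figures these slides cite); the symbols a_j, z_i, h, and w_kj are assumed here, since the slide itself gives no formulas:

```latex
% error (delta) at hidden node j, combining the deltas of the nodes k it feeds
\delta_j = h'(a_j) \sum_k w_{kj}\,\delta_k
% gradient for the weight from node i to node j:
% (backwards) error times (forwards) node activation
\frac{\partial E}{\partial w_{ji}} = \delta_j \, z_i
```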
3 Summary: Calculating the Gradient Using Backprop
- Do a forward pass with the current parameters to obtain the node activations
- Compute the errors for the output nodes
- Recursively backpropagate the errors from output to input
- The weight gradient is given by the (backwards) error and the (forwards) node activation
- Reduce the error with gradient-based methods, as sketched below
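A minimal numpy sketch of these steps for a single training pattern, assuming one tanh hidden layer, a linear output, sum-of-squares error, and no bias terms; all of these choices, and the numbers, are illustrative rather than taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(0, 0.1, (3, 1))        # hidden-layer weights (3 hidden units, 1 input)
W2 = rng.normal(0, 0.1, (1, 3))        # output-layer weights (1 output)
x = np.array([[0.5]]); t = np.array([[0.2]]); eta = 0.1

for _ in range(100):
    # 1. forward pass with the current parameters
    a1 = W1 @ x; z1 = np.tanh(a1)      # hidden activations
    y = W2 @ z1                        # linear output
    # 2. error (delta) at the output node
    d2 = y - t
    # 3. recursion: backpropagate the error toward the input
    d1 = (1.0 - z1**2) * (W2.T @ d2)   # tanh'(a1) times the backpropagated delta
    # 4. weight gradient = (backwards) error times (forwards) activation
    g2 = d2 @ z1.T; g1 = d1 @ x.T
    # 5. reduce the error with a gradient step
    W2 -= eta * g2; W1 -= eta * g1
```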
4 Properties of Neural Networks
- A fixed number of basis functions that adapt to the data
- Universal function approximator
- Wide range of architectural possibilities
- Trivially easy to handle very large datasets (out-of-memory training: the data need not fit in memory)
- Patterns are presented to the network sequentially and the weights updated as each arrives (see the sketch below)
- Backprop is efficient: O(W) in the number of weights W
- Many ways to make it faster
- Hessian updates, conjugate gradient, quickprop, etc.
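A toy sketch of the sequential, pattern-by-pattern update for a single linear unit with squared error; the dataset, target weights, and learning rate are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))             # 1000 patterns, 4 inputs each
t = X @ np.array([1.0, -2.0, 0.5, 0.0])    # targets from a known linear map
w, eta = np.zeros(4), 0.01

for epoch in range(5):
    for x_n, t_n in zip(X, t):             # patterns presented sequentially
        y_n = w @ x_n                      # forward pass for this pattern
        w -= eta * (y_n - t_n) * x_n       # immediate weight update, O(W) work
```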
5 Adaptive Bases
6 Construction of Input-Output Mapping
7 Neural Net as Universal Function Approximator
Fig. 5.3 from Bishop. Inputs are all one-dimensional in these examples; each panel shows the training data and the learned function. Neural nets are powerful.
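In the spirit of that figure, a small self-contained sketch that fits a three-hidden-unit tanh network to samples of a one-dimensional function; the target sin(pi*x), the sample count, and the training schedule are illustrative choices, not taken from Bishop:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 50).reshape(1, -1)       # 50 one-dimensional inputs
t = np.sin(np.pi * x)                           # values of the target function
W1, b1 = rng.normal(0, 1, (3, 1)), np.zeros((3, 1))
W2, b2 = rng.normal(0, 1, (1, 3)), np.zeros((1, 1))
eta = 0.1

for _ in range(20000):                          # full-batch gradient descent
    z = np.tanh(W1 @ x + b1)                    # hidden layer (3 tanh units)
    y = W2 @ z + b2                             # linear output
    d2 = (y - t) / x.shape[1]                   # output error (mean squared error)
    d1 = (1.0 - z**2) * (W2.T @ d2)             # backpropagated hidden error
    W2 -= eta * d2 @ z.T; b2 -= eta * d2.sum(axis=1, keepdims=True)
    W1 -= eta * d1 @ x.T; b1 -= eta * d1.sum(axis=1, keepdims=True)
# after training, y should roughly follow sin(pi * x) on [-1, 1]
```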
8 Modular Training Via Jacobian
Fig. 5.8 from Bishop
- Given a pre-trained model
- How to update a weight w in the blue module efficiently?
- The green module has no effect
- The red module participates in learning via its Jacobian (see the chain rule below)
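One way to write this with the chain rule (the symbols are assumed, since the figure is not reproduced here): if the blue-module weight w affects that module's outputs z_j, which feed the red module with outputs y_k, then

```latex
\frac{\partial E}{\partial w}
  \;=\; \sum_{k}\sum_{j}
        \frac{\partial E}{\partial y_k}\;
        \frac{\partial y_k}{\partial z_j}\;
        \frac{\partial z_j}{\partial w},
\qquad
J_{kj} \;=\; \frac{\partial y_k}{\partial z_j}
\quad\text{(the red module's Jacobian)}
```

The green module never appears because none of its outputs lie on a path from w to the error E.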
9 Challenges in Neural Net Training
- Objective function is nonlinear, nonconvex
- Local minima are a significant problem
- How to control capacity?
10 Capacity Control
- Capacity of the network is roughly the number of hidden units
- Many schemes for determining the number of hidden units
- Standard approach to capacity control is regularization via early stopping
11 Early Stopping for Regularization
Fig. 5.13 from Bishop
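A generic sketch of early stopping as a training loop; step() (one epoch of training, returning the current parameters) and val_error() (error on a held-out validation set) are hypothetical callables, not names from the lecture:

```python
def train_with_early_stopping(step, val_error, max_epochs=1000, patience=20):
    # step():            run one epoch of training, return the current parameters
    # val_error(params): error of those parameters on a held-out validation set
    best_err, best_params, since_best = float("inf"), None, 0
    for _ in range(max_epochs):
        params = step()
        err = val_error(params)
        if err < best_err:              # validation error still improving
            best_err, best_params, since_best = err, params, 0
        else:                           # validation error flat or rising
            since_best += 1
            if since_best >= patience:  # stop before overfitting gets worse
                break
    return best_params, best_err        # weights from the best validation point
```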
12 Numerical Optimization
- Training is a local, gradient-based method
- Various techniques for avoiding local minima
- Momentum, stochastic gradient, etc.
- Initialization procedure must be well-designed
- Suppose weights are chosen to saturate the activation function outputs?
- Suppose weights are initialized to zero?
- Solution: initialize weights to small nonzero values (on the linear part of the activation function); see the sketch below
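A minimal sketch of such an initialization; the 1/sqrt(fan_in) scaling is one common heuristic and is an assumption here, since the slide only asks for small nonzero values:

```python
import numpy as np

def init_small_weights(fan_in, fan_out, rng=None):
    # Small nonzero values break the symmetry that an all-zero initialization
    # would leave in place, and they keep pre-activations near zero, i.e. on
    # the roughly linear part of tanh/sigmoid, so units do not start out
    # saturated with near-zero gradients.
    rng = rng if rng is not None else np.random.default_rng(0)
    return rng.normal(0.0, 1.0 / np.sqrt(fan_in), size=(fan_out, fan_in))
```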
13 Invariance
- How to handle invariance to nuisance parameters?
- Rotation, position, and scale of patterns such as handwritten digits
- Solution 1: augment the training data set (see the sketch below)
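A sketch of Solution 1 for digit images, assuming scipy is available; the transformation ranges are illustrative, not taken from the lecture:

```python
import numpy as np
from scipy.ndimage import rotate, shift

def augment_digit(img, rng):
    # Produce a randomly transformed copy of a digit image so the network sees
    # the rotation/position variation it should be invariant to.  Scale could
    # be perturbed similarly with scipy.ndimage.zoom.
    img = rotate(img, angle=rng.uniform(-15, 15), reshape=False, mode="nearest")
    img = shift(img, shift=rng.uniform(-2, 2, size=2), mode="nearest")
    return img

# e.g. train on the originals plus several augmented copies of each digit:
# extra = [augment_digit(d, np.random.default_rng(i)) for i, d in enumerate(digits)]
```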
14 Invariance
- Solution 2: Tangent Propagation
15 Tangent Propagation
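A sketch of the tangent propagation objective, following Bishop Section 5.5.4 (the notation is assumed, since the slide shows no formula): the usual error E is augmented with a penalty on how much the outputs y change along the tangent direction tau_n of the nuisance transformation (e.g., a small rotation) at each input x_n,

```latex
\tilde{E} \;=\; E \;+\; \lambda\,\Omega,
\qquad
\Omega \;=\; \tfrac{1}{2}\sum_{n}\sum_{k}
  \Bigl(\sum_{i} J_{nki}\,\tau_{ni}\Bigr)^{2},
\qquad
J_{nki} \;=\; \frac{\partial y_{nk}}{\partial x_{ni}}
```

so the network is encouraged to be invariant to small versions of the transformation without explicitly enlarging the training set.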
16 Convolutional Networks