Title: RPROP: Resilient Propagation
1. RPROP: Resilient Propagation
- Students: Novica Zarvic, Roxana Grigoras
- Term: Winter semester 2003/2004
- Lecture: Machine Learning / Neural Networks
- Course: Information Engineering
- Date: 2004-01-06
2. Content
- Part I
  - General remarks
  - Foundations (MLP, supervised learning, backpropagation and its problems)
- Part II
  - Description of the RProp algorithm
  - Example cases
- Part III
  - Visualization with SNNS
  - Discussion
3. General remarks
(Part I)
- Basis for this talk:
- "Rprop: Description and Implementation Details" (technical report by Martin Riedmiller, January 1994)
- URL: http://lrb.cs.uni-dortmund.de/riedmill/publications/rprop.details.ps.Z
4. MLP: Multi-Layer Perceptron
(Part I)
[Figure: feed-forward network with an input layer, hidden layer(s), and an output layer]
Topology of a typical feed-forward network with two hidden layers. The external input is presented to the input layer, propagated forward through the hidden layers, and yields an output activation vector in the output layer.
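A minimal sketch of such a forward pass, assuming sigmoid activations and omitting bias terms for brevity (layer sizes and weight values are illustrative only):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, weights):
    """Propagate the input activation x layer by layer to the output."""
    activation = x
    for W in weights:                       # one weight matrix per layer
        activation = sigmoid(W @ activation)
    return activation                       # output activation vector

# Illustrative topology: 3 inputs, two hidden layers of 4 units, 2 outputs.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 3)),
           rng.standard_normal((4, 4)),
           rng.standard_normal((2, 4))]
print(forward(np.array([0.1, 0.5, -0.3]), weights))
```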
5. Supervised Learning
(Part I)
- Objective: tune the weights in the network such that the network performs a desired mapping of input to output activations.
6. Principle of supervised learning (like BP or one of its derivatives)
(Part I)
- Presentation of the input pattern through activation of the input units. The pattern set P consists of pairs of an input activation vector x_p and a target vector t_p.
- Feedforward computation to obtain the resulting output vector s_p.
- Comparison of s_p with t_p. The distance between the two vectors is measured by the error function E = \frac{1}{2} \sum_{p \in P} \sum_{n} (t_{pn} - s_{pn})^2, where n runs over the units of the output layer and p over the pattern pairs of the pattern set P (see the sketch after this list).
- Backpropagation of the errors from the output layer towards the input layer, yielding the partial derivatives of E with respect to the connection weights.
- Changing the weights of all connections with the previously calculated values, which reduces the error.
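A minimal sketch of this error computation, assuming the outputs and targets of the pattern set are stacked into NumPy arrays (the values below are illustrative only):

```python
import numpy as np

def sum_of_squares_error(outputs, targets):
    """E = 1/2 * sum over patterns p and output units n of (t_pn - s_pn)^2."""
    return 0.5 * np.sum((targets - outputs) ** 2)

# Two patterns with three output units each (illustrative values).
s = np.array([[0.9, 0.1, 0.2],
              [0.3, 0.8, 0.4]])
t = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
print(sum_of_squares_error(s, t))
```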
7. Problems of Backpropagation
(Part I)
- No information about the complete error function is available, so it is difficult to choose a good learning rate. Typical consequences are:
  - a. local minima of E
  - b. plateaus
  - c. oscillation
  - d. leaving good minima
- It uses only weight-specific information (the partial derivative) to adapt weight-specific parameters.
8. RPROP: Resilient Propagation
(Part II)
- What is the traditional backpropagation algorithm doing?
- It modifies the weights in proportion to the partial derivatives ∂E/∂w_ij.
- Problem: the size of this derivative does not really represent the size of the weight change that is actually needed.
- Solution: RProp does not rely on the value of the partial derivative. It considers only the sign of the derivative to indicate the direction of the weight update.
9. RPROP: Description
(Part II)
- Effective learning scheme.
- It performs a direct adaptation of the weight step based on local gradient information.
- The basic principle of RProp is to eliminate the harmful influence of the size of the partial derivative on the weight step.
- It considers only the sign of the derivative to indicate the direction of the weight update.
10. RPROP: Resilient Propagation
(Part II)
11. RPROP: What is Δ_ij?
(Part II)
- Δ_ij is an update value.
- The size of the weight change is exclusively determined by this weight-specific update value.
- Δ_ij evolves during the learning process based on its local view of the error function E, according to the following learning rule.
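The rule reads (with η⁻ and η⁺ the decrease and increase factors listed on the settings slide, 0 < η⁻ < 1 < η⁺):

```latex
\Delta_{ij}^{(t)} =
\begin{cases}
  \eta^{+} \cdot \Delta_{ij}^{(t-1)} & \text{if } \frac{\partial E}{\partial w_{ij}}^{(t-1)} \cdot \frac{\partial E}{\partial w_{ij}}^{(t)} > 0 \\[4pt]
  \eta^{-} \cdot \Delta_{ij}^{(t-1)} & \text{if } \frac{\partial E}{\partial w_{ij}}^{(t-1)} \cdot \frac{\partial E}{\partial w_{ij}}^{(t)} < 0 \\[4pt]
  \Delta_{ij}^{(t-1)} & \text{otherwise}
\end{cases}
```

Whenever the partial derivative keeps its sign, the update value grows to accelerate convergence; whenever it changes sign, the update value shrinks because the last step was too large.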
12. RPROP
(Part II)
- The weight update Δw_ij follows a simple rule:
- If the derivative is positive (increasing error), the weight is decreased by its update value.
- If the derivative is negative, the update value is added.
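Written out, with w_ij^(t+1) = w_ij^(t) + Δw_ij^(t):

```latex
\Delta w_{ij}^{(t)} =
\begin{cases}
  -\Delta_{ij}^{(t)} & \text{if } \frac{\partial E}{\partial w_{ij}}^{(t)} > 0 \\[4pt]
  +\Delta_{ij}^{(t)} & \text{if } \frac{\partial E}{\partial w_{ij}}^{(t)} < 0 \\[4pt]
  0 & \text{otherwise}
\end{cases}
```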
13. RPROP: One exception (take bad steps back!)
(Part II)
- If the partial derivative changes sign, i.e. the previous step was too large and the minimum was missed, the previous weight update is reverted.
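Formally, this weight backtracking is:

```latex
\Delta w_{ij}^{(t)} = -\Delta w_{ij}^{(t-1)},
\quad \text{if } \frac{\partial E}{\partial w_{ij}}^{(t-1)} \cdot \frac{\partial E}{\partial w_{ij}}^{(t)} < 0
```

After such a reversal, the stored derivative is set to zero so that the update value is not decreased a second time in the following step.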
14. RPROP: The pseudo code
(Part II)
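A runnable sketch of one RPROP iteration, combining the learning rule, the weight update, and the backtracking exception from the previous slides. The names (rprop_update, grads, deltas, prev_steps) and the vectorized NumPy formulation are illustrative, not the notation of Riedmiller's report:

```python
import numpy as np

ETA_PLUS, ETA_MINUS = 1.2, 0.5       # increase / decrease factors
DELTA_MAX, DELTA_MIN = 50.0, 1e-6    # upper / lower limits on the update values
DELTA_ZERO = 0.1                     # initial update value

def rprop_update(weights, grads, prev_grads, deltas, prev_steps):
    """Apply one RPROP step to `weights` in place.

    Returns the gradients and steps to remember for the next iteration.
    """
    sign_change = grads * prev_grads
    grow = sign_change > 0           # same sign: accelerate
    shrink = sign_change < 0         # sign flipped: the minimum was overshot
    deltas[grow] = np.minimum(deltas[grow] * ETA_PLUS, DELTA_MAX)
    deltas[shrink] = np.maximum(deltas[shrink] * ETA_MINUS, DELTA_MIN)
    # Step against the sign of the current derivative (slide 12).
    steps = -np.sign(grads) * deltas
    # Exception (slide 13): where the sign flipped, revert the previous step.
    steps[shrink] = -prev_steps[shrink]
    weights += steps
    # Forget the derivative on reverted weights so the update value is not
    # decreased again in the next iteration.
    remembered = grads.copy()
    remembered[shrink] = 0.0
    return remembered, steps

# Usage on a toy error E(w) = sum(w**2), whose gradient is 2*w.
w = np.array([2.0, -3.0])
deltas = np.full_like(w, DELTA_ZERO)
prev_grads = np.zeros_like(w)
prev_steps = np.zeros_like(w)
for _ in range(50):
    grads = 2.0 * w
    prev_grads, prev_steps = rprop_update(w, grads, prev_grads, deltas, prev_steps)
print(w)                              # converges towards the minimum at [0, 0]
```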
15. RPROP: Settings
(Part II)
- Increasing and decreasing factors:
- η⁻ = 0.5 (decrease factor)
- η⁺ = 1.2 (increase factor)
- Limits:
- Δ_max = 50.0 (upper limit)
- Δ_min = 1e-6 (lower limit)
- Initial value:
- Δ₀ = 0.1 (default setting)
16. RPROP: Backprop vs. RProp
(Part III)
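The essential difference between the two update rules, sketched for a single weight (names are illustrative; the adaptation of the update value Δ_ij is omitted here):

```python
import numpy as np

def backprop_step(w, grad, lr=0.05):
    # Plain backpropagation: the step size scales with the magnitude of
    # the partial derivative and with a global learning rate.
    return w - lr * grad

def rprop_step(w, grad, delta):
    # RProp: only the sign of the derivative is used; the step size is
    # the weight-specific update value delta.
    return w - np.sign(grad) * delta
```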
17. RPROP: Discussion
(Part III)
- Compared to all other algorithms, only the sign of the derivative is used to perform learning and adaptation.
- In backpropagation, the size of the derivative decreases exponentially with the distance between the weight and the output layer.
- With RProp, the size of the weight step depends only on the sequence of signs, so learning is spread equally over the entire network.
18. RPROP: Further material
(Part III)
- Advanced Supervised Learning in Multi-layer Perceptrons: From Backpropagation to Adaptive Learning Algorithms (Martin Riedmiller)
- A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP Algorithm (Martin Riedmiller)
- Rprop: Description and Implementation Details (Martin Riedmiller)
19. RPROP: Resilient Propagation
(Part III)
- Thank you for listening!