The delta rule

Provided by: sebasti67

1
The delta rule
2
Learn from your mistakes
3
If it ain't broke, don't fix it.
4
Outline

Supervised learning problem
Delta rule
Delta rule as gradient descent
Hebb rule

5
Supervised learning

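Given examples $(\mathbf{x}^a, y^a)$, $a = 1, \ldots, m$, with desired outputs $y^a \in \{0, 1\}$

Find perceptron $\mathbf{w}$ such that $y^a = H(\mathbf{w}^T\mathbf{x}^a)$ for every example, where $H$ is the Heaviside step function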

6
Example handwritten digits

Find a perceptron that detects twos.

7
Delta rule

Learning from mistakes.
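Delta: the difference between desired output $y$ and actual output $\hat{y}$, $\delta = y - \hat{y}$.
Update: $\Delta\mathbf{w} = \eta\,\delta\,\mathbf{x}$, with learning rate $\eta$.
Also called the perceptron learning rule.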
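A minimal sketch of a single delta-rule step, assuming binary outputs, a step activation that returns 1 only for a strictly positive argument (the tie convention at 0 is an assumption), and a hypothetical learning rate `eta`. The names `heaviside`, `dot`, and `delta_rule_update` are illustrative, not from the slides.

```python
def heaviside(u):
    """Step activation: 1 if u > 0, else 0 (tie convention at u == 0 is assumed)."""
    return 1 if u > 0 else 0


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def delta_rule_update(w, x, y, eta=1.0):
    """One delta-rule step: w <- w + eta * (y - y_hat) * x."""
    y_hat = heaviside(dot(w, x))  # actual output
    delta = y - y_hat             # desired minus actual
    return [wi + eta * delta * xi for wi, xi in zip(w, x)]


# w outputs 0 for this x (w.x = -1); if the desired output is 1, w moves toward x.
w = [0.0, -1.0]
x = [1.0, 1.0]
print(delta_rule_update(w, x, y=1))
# If the desired output is 0 there is no mistake, so w is left unchanged.
print(delta_rule_update(w, x, y=0))
```

When the output is already correct, delta is zero and nothing changes, which is the "if it ain't broke, don't fix it" behavior.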

8
Two types of mistakes

False positive (actual output 1, desired output 0)
Make w less like x.
False negative (actual output 0, desired output 1)
Make w more like x.
The update is always proportional to x.
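To see what "less like x" and "more like x" mean numerically, here is a sketch assuming the delta rule $\Delta\mathbf{w} = \eta(y - \hat{y})\mathbf{x}$ with a step activation: each corrective update shifts $\mathbf{w}^T\mathbf{x}$ by exactly $\pm\eta\lVert\mathbf{x}\rVert^2$, pushing the output toward the desired value. The weight vectors and `eta` below are arbitrary illustrative values.

```python
def heaviside(u):
    return 1 if u > 0 else 0


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def delta_rule_update(w, x, y, eta):
    """Delta rule: the change eta * (y - y_hat) * x is always proportional to x."""
    y_hat = heaviside(dot(w, x))
    return [wi + eta * (y - y_hat) * xi for wi, xi in zip(w, x)]


x = [1.0, 2.0]
eta = 0.1  # |x|^2 = 5, so each corrective update shifts w.x by eta * 5 = 0.5

w_fp = [0.5, 0.5]    # w.x = 1.5 > 0: outputs 1, desired 0 (false positive)
w_fn = [-0.5, -0.5]  # w.x = -1.5: outputs 0, desired 1 (false negative)

w_fp_new = delta_rule_update(w_fp, x, y=0, eta=eta)  # subtracts eta * x
w_fn_new = delta_rule_update(w_fn, x, y=1, eta=eta)  # adds eta * x

print(dot(w_fp, x), "->", dot(w_fp_new, x))  # w.x decreases
print(dot(w_fn, x), "->", dot(w_fn_new, x))  # w.x increases
```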

9
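Objective function
$E(\mathbf{w}) = \sum_a e^a, \qquad e^a = \left[H(\mathbf{w}^T\mathbf{x}^a) - y^a\right]\mathbf{w}^T\mathbf{x}^a$, with $H$ the step function
Gradient update ($H$ is flat away from 0, so $\partial e^a/\partial\mathbf{w} = [H(\mathbf{w}^T\mathbf{x}^a) - y^a]\,\mathbf{x}^a$)
$\Delta\mathbf{w} = -\eta\,\dfrac{\partial e^a}{\partial\mathbf{w}} = \eta\left[y^a - H(\mathbf{w}^T\mathbf{x}^a)\right]\mathbf{x}^a$
Stochastic gradient descent on $E$, one example at a time
$E = 0$ means no mistakes.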
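A sketch assuming the objective is the perceptron criterion $E(\mathbf{w}) = \sum_a [H(\mathbf{w}^T\mathbf{x}^a) - y^a]\,\mathbf{w}^T\mathbf{x}^a$: every term is nonnegative, a correctly classified example contributes 0, and a mistake contributes $|\mathbf{w}^T\mathbf{x}^a|$. A stochastic gradient step on one term reproduces the delta rule. The toy data and the constant-1 bias input are illustrative assumptions.

```python
def heaviside(u):
    return 1 if u > 0 else 0


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def perceptron_objective(w, examples):
    """E(w) = sum of (H(w.x) - y) * (w.x); each term is >= 0."""
    return sum((heaviside(dot(w, x)) - y) * dot(w, x) for x, y in examples)


def sgd_step(w, x, y, eta):
    """Gradient of one term is (H(w.x) - y) * x because H is flat away from 0,
    so stepping against it gives w + eta * (y - H(w.x)) * x: the delta rule."""
    grad = [(heaviside(dot(w, x)) - y) * xi for xi in x]
    return [wi - eta * gi for wi, gi in zip(w, grad)]


# Toy data: the last input component is a constant 1 acting as a bias.
examples = [([2.0, 1.0], 1), ([-1.0, 1.0], 0)]
print(perceptron_objective([1.0, 0.0], examples))   # classifies both correctly
print(perceptron_objective([-1.0, 0.0], examples))  # gets both wrong
print(sgd_step([-1.0, 0.0], [2.0, 1.0], 1, eta=1.0))
```

Note that $E$ weights each mistake by how far $\mathbf{w}^T\mathbf{x}$ is from the threshold, which is why it is not the same as a count of mistakes.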

10
Perceptron convergence theorem

Cycle through a set of examples.
Suppose a solution with zero error exists.
The perceptron learning rule finds a solution in
finite time.
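A small illustration of the theorem's setting, assuming a linearly separable problem (logical AND, with a constant-1 input standing in for the threshold): cycle through the examples applying the delta rule until a full pass makes no mistakes. The `max_epochs` cap is a hypothetical safeguard, not part of the theorem.

```python
def heaviside(u):
    return 1 if u > 0 else 0


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def train_perceptron(examples, eta=1.0, max_epochs=100):
    """Cycle through the examples with the delta rule until one full pass is
    mistake-free. Returns (w, epochs), or (w, None) if the cap is reached."""
    w = [0.0] * len(examples[0][0])
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for x, y in examples:
            delta = y - heaviside(dot(w, x))
            if delta != 0:
                mistakes += 1
                w = [wi + eta * delta * xi for wi, xi in zip(w, x)]
        if mistakes == 0:
            return w, epoch
    return w, None


# Logical AND is linearly separable; the third input is a constant 1 (bias).
AND = [([0, 0, 1], 0), ([0, 1, 1], 0), ([1, 0, 1], 0), ([1, 1, 1], 1)]
w, epochs = train_perceptron(AND)
print(w, epochs)
print([heaviside(dot(w, x)) for x, _ in AND])
```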

11
If examples are nonseparable

The delta rule does not converge.
The objective function is not equal to the number of
mistakes.
No reason to believe that the delta rule
minimizes the number of mistakes.
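A sketch of the nonseparable case, assuming XOR with a constant-1 bias input: no weight vector classifies all four examples, so every pass of the delta rule must make at least one mistake and the weights never settle. The epoch count is arbitrary.

```python
def heaviside(u):
    return 1 if u > 0 else 0


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def mistakes_per_epoch(examples, epochs, eta=1.0):
    """Run the delta rule for a fixed number of passes and record the number
    of mistakes made in each pass."""
    w = [0.0] * len(examples[0][0])
    history = []
    for _ in range(epochs):
        mistakes = 0
        for x, y in examples:
            delta = y - heaviside(dot(w, x))
            if delta != 0:
                mistakes += 1
                w = [wi + eta * delta * xi for wi, xi in zip(w, x)]
        history.append(mistakes)
    return history


# XOR is not linearly separable (the third input is a constant 1 bias).
XOR = [([0, 0, 1], 0), ([0, 1, 1], 1), ([1, 0, 1], 1), ([1, 1, 1], 0)]
history = mistakes_per_epoch(XOR, epochs=1000)
print(min(history), history[-10:])
```

A mistake-free pass would leave $\mathbf{w}$ unchanged while classifying all four points correctly, which would make XOR separable; so the minimum over passes is at least 1.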

12
Memorization vs. generalization

Prescription: minimize error on the training set of examples.
What is the error on a test set of examples?
Vapnik-Chervonenkis theory
Assumption: examples are drawn from a probability distribution.
Conditions for generalization

13
Contrast with Hebb rule

Assume that the teacher can drive the perceptron
to produce the desired output.
What are the objective functions?

14
Is the delta rule biological?

Actual output: anti-Hebbian
Desired output: Hebbian
Contrastive
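The contrastive reading can be checked directly, assuming the delta rule $\Delta\mathbf{w} = \eta(y - \hat{y})\mathbf{x}$: expanding gives a Hebbian term $\eta\,y\,\mathbf{x}$ driven by the desired output (as if a teacher clamped the output) and an anti-Hebbian term $-\eta\,\hat{y}\,\mathbf{x}$ driven by the actual output. When the two outputs agree, the terms cancel. The weights and `eta` below are illustrative.

```python
def heaviside(u):
    return 1 if u > 0 else 0


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def contrastive_update(w, x, y, eta):
    """Delta rule written as a Hebbian term minus an anti-Hebbian term."""
    y_hat = heaviside(dot(w, x))
    hebbian = [eta * y * xi for xi in x]            # driven by the desired output
    anti_hebbian = [-eta * y_hat * xi for xi in x]  # driven by the actual output
    return [h + a for h, a in zip(hebbian, anti_hebbian)]


def delta_update(w, x, y, eta):
    """Delta rule in its usual form: eta * (y - y_hat) * x."""
    y_hat = heaviside(dot(w, x))
    return [eta * (y - y_hat) * xi for xi in x]


x, eta = [1.0, 1.0], 0.5
for w in ([0.2, -0.4], [0.4, 0.2]):  # actual outputs 0 and 1 respectively
    for y in (0, 1):
        print(w, y, contrastive_update(w, x, y, eta), delta_update(w, x, y, eta))
```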

15
Objective function

Hebb rule: distance from inputs
Delta rule: error in reproducing the output

16
Supervised vs. unsupervised

Classification vs. generation
"I shall not today attempt further to define the
kinds of material [pornography] ... but I know it
when I see it."
Justice Potter Stewart

17
Smooth activation function

Same update as the delta rule, except for an extra factor of the slope f' of f.
The update is small when the argument of f has large
magnitude, where a sigmoidal f saturates.
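A sketch of the saturation effect, assuming $f$ is the logistic sigmoid (the slides only say "smooth") and the update $\Delta\mathbf{w} = \eta\,[y - f(u)]\,f'(u)\,\mathbf{x}$ with $u = \mathbf{w}^T\mathbf{x}$. The weights are made increasingly wrong, yet the update shrinks because $f'(u)$ vanishes for large $|u|$.

```python
import math


def sigmoid(u):
    """Logistic sigmoid, assumed here as the smooth activation f."""
    return 1.0 / (1.0 + math.exp(-u))


def sigmoid_slope(u):
    """f'(u) = f(u) * (1 - f(u)) for the logistic sigmoid."""
    s = sigmoid(u)
    return s * (1.0 - s)


def smooth_delta_update(w, x, y, eta):
    """Delta rule with a smooth f: eta * (y - f(u)) * f'(u) * x, u = w.x."""
    u = sum(wi * xi for wi, xi in zip(w, x))
    scale = eta * (y - sigmoid(u)) * sigmoid_slope(u)
    return [scale * xi for xi in x]


# Same input and desired output; weights increasingly wrong in sign.
x, y, eta = [1.0], 1, 1.0
for w0 in (0.0, -2.0, -5.0, -10.0):
    dw = smooth_delta_update([w0], x, y, eta)
    print(w0, dw[0])
```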

18
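Objective function
$E(\mathbf{w}) = \dfrac{1}{2}\sum_a \left[y^a - f(\mathbf{w}^T\mathbf{x}^a)\right]^2$
Gradient update
$\Delta\mathbf{w} = \eta\left[y^a - f(\mathbf{w}^T\mathbf{x}^a)\right] f'(\mathbf{w}^T\mathbf{x}^a)\,\mathbf{x}^a$
Stochastic gradient descent on $E$, one example at a time
$E = 0$ means zero error.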
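A quick check that the smooth update is gradient descent, assuming squared error $e = \tfrac{1}{2}[y - f(\mathbf{w}^T\mathbf{x})]^2$ per example and a logistic-sigmoid $f$: the analytic direction $[y - f(u)]\,f'(u)\,\mathbf{x}$ should match a central finite-difference estimate of $-\partial e/\partial\mathbf{w}$. The weights, input, and step size `h` are arbitrary illustrative choices.

```python
import math


def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def example_error(w, x, y):
    """Squared error for one example: e = 1/2 * (y - f(w.x))^2."""
    return 0.5 * (y - sigmoid(dot(w, x))) ** 2


def smooth_delta_direction(w, x, y):
    """Analytic -de/dw = (y - f(u)) * f'(u) * x, with f'(u) = f(u) * (1 - f(u))."""
    s = sigmoid(dot(w, x))
    return [(y - s) * s * (1.0 - s) * xi for xi in x]


def numerical_neg_gradient(w, x, y, h=1e-6):
    """Central finite differences of -e with respect to each weight."""
    grad = []
    for i in range(len(w)):
        w_plus = list(w)
        w_plus[i] += h
        w_minus = list(w)
        w_minus[i] -= h
        diff = example_error(w_plus, x, y) - example_error(w_minus, x, y)
        grad.append(-diff / (2 * h))
    return grad


w, x, y = [0.3, -0.7], [1.5, 0.5], 1
print(smooth_delta_direction(w, x, y))
print(numerical_neg_gradient(w, x, y))
```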

19
Smooth activation functions are important for
generalizing the delta rule to multilayer
perceptrons.
