Title: FeedForward Neural Networks
1Feed-Forward Neural Networks
2Content
- Introduction
- Single-Layer Perceptron Networks
- Learning Rules for Single-Layer Perceptron Networks
- Perceptron Learning Rule
- Adaline Learning Rule
3Feed-Forward Neural Networks
4Historical Background
- 1943: McCulloch and Pitts proposed the first computational model of a neuron.
- 1949: Hebb proposed the first learning rule.
- 1958: Rosenblatt's work on perceptrons.
- 1969: Minsky and Papert exposed limitations of the theory.
- 1970s: Decade of dormancy for neural networks.
- 1980s-90s: Neural networks return (self-organization, back-propagation algorithms, etc.).
5Nervous Systems
- The human brain contains about $10^{11}$ neurons.
- Each neuron is connected to about $10^{4}$ others.
- Some scientists have compared the brain to a complex, nonlinear, parallel computer.
- The largest modern neural networks achieve a complexity comparable to the nervous system of a fly.
6Neurons
- The main purpose of neurons is to receive, analyze, and transmit information in the form of signals (electric pulses).
- When a neuron sends information, we say that the neuron fires.
7Neurons
Acting through specialized projections known as
dendrites and axons, neurons carry information
throughout the neural network.
[Animation: the firing of a synapse between the pre-synaptic terminal of one neuron and the soma (cell body) of another neuron.]
8A Model of Artificial Neuron
9A Model of Artificial Neuron
10Feed-Forward Neural Networks
- Graph representation
- nodes: neurons
- arrows: signal flow directions
- A neural network that does not contain cycles (feedback loops) is called a feed-forward network (or perceptron).
11Layered Structure
Hidden Layer(s)
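The layered, cycle-free structure can be sketched as a forward pass in which each layer only feeds the next one. This is a minimal illustration, not the slides' own model: the `tanh` activation, the 2-3-1 layer sizes, and all weight values are assumptions chosen for the example.

```python
import math

def forward(layers, x):
    """Propagate input x through the layers in order (no feedback loops).

    Each layer is a pair (weights, biases); weights holds one row per neuron.
    """
    activation = list(x)
    for weights, biases in layers:
        activation = [
            math.tanh(sum(w * a for w, a in zip(row, activation)) + b)
            for row, b in zip(weights, biases)
        ]
    return activation

# Hypothetical 2-3-1 network: two inputs, one hidden layer of three neurons, one output.
layers = [
    ([[0.5, -0.4], [0.3, 0.8], [-0.6, 0.1]], [0.0, 0.1, -0.1]),  # hidden layer
    ([[1.0, -1.0, 0.5]], [0.2]),                                  # output layer
]
y = forward(layers, [1.0, 2.0])
```

Because the signal only flows from one layer to the next, the output is computed in a single sweep over the layers.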
12Knowledge and Memory
- The output behavior of a network is determined by the weights.
- Weights: the memory of an NN.
- Knowledge: distributed across the network.
- A large number of nodes
- increases the storage capacity
- ensures that the knowledge is robust
- provides fault tolerance.
- New information is stored by changing the weights.
13Pattern Classification
- Function: $x \to y$ (input pattern $x$, output pattern $y$)
- The NN's output is used to distinguish between and recognize different input patterns.
- Different output patterns correspond to particular classes of input patterns.
- Networks with hidden layers can be used for solving more complex problems than just linear pattern classification.
14Training
[Figure: training set and goal]
15Generalization
- A properly trained neural network may produce reasonable answers for input patterns not seen during training (generalization).
- Generalization is particularly useful for the analysis of noisy data (e.g. time series).
17Applications
- Pattern classification
- Object recognition
- Function approximation
- Data compression
- Time series analysis and forecasting
- . . .
18Feed-Forward Neural Networks
- Single-Layer Perceptron Networks
19The Single-Layered Perceptron
20The Single-Layered Perceptron
21Training a Single-Layered Perceptron
[Figure: training set and goal]
22Learning Rules
- Linear Threshold Units (LTUs): Perceptron Learning Rule
- Linearly Graded Units (LGUs): Widrow-Hoff Learning Rule
[Figure: training set and goal]
23Feed-Forward Neural Networks
- Learning Rules for Single-Layered Perceptron Networks
- Perceptron Learning Rule
- Adaline Learning Rule
24Perceptron
[Figure: a linear threshold unit with the sgn activation function, and the goal]
26Example
[Figure: Class 1 and Class 2 separated by the line $g(x) = -2x_1 + 2x_2 + 2 = 0$]
27Augmented input vector
[Figure: the goal restated with an augmented input vector; Class 1 has desired output $+1$ and Class 2 has desired output $-1$]
In the augmented input space, the decision boundary is a plane that passes through the origin.
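A small sketch of a linear threshold unit with an augmented input may make this concrete. It assumes the augmented component is the constant $-1$, so the threshold becomes the last weight, and it uses a hypothetical boundary $-2x_1 + 2x_2 + 2 = 0$; the function names and the convention $\mathrm{sgn}(0) = -1$ are also assumptions.

```python
def sgn(v):
    """Sign function; mapping 0 to -1 is a convention assumed for this sketch."""
    return 1 if v > 0 else -1

def augment(x):
    """Append the constant component -1 so the threshold becomes a weight."""
    return list(x) + [-1.0]

def ltu(w, x_aug):
    """Linear threshold unit: sgn of the inner product w . x."""
    return sgn(sum(wi * xi for wi, xi in zip(w, x_aug)))

# Hypothetical boundary -2*x1 + 2*x2 + 2 = 0.
# With the last input fixed at -1, the constant +2 is carried by the weight -2.
w = [-2.0, 2.0, -2.0]

side_a = ltu(w, augment([0.0, 0.0]))  # g = 2, positive side
side_b = ltu(w, augment([3.0, 0.0]))  # g = -4, negative side
```

In the augmented space the boundary is $w^T x = 0$ with no separate constant term, which is why it passes through the origin.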
30Linearly Separable vs. Linearly Non-Separable
- AND: linearly separable
- OR: linearly separable
- XOR: linearly non-separable
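The difference can be illustrated with a brute-force search for a separating line $w_1 x_1 + w_2 x_2 - \theta = 0$ over a coarse grid of weights. This is only a sketch: the grid, the $\pm 1$ target encoding, and the name `find_separator` are assumptions, and an empty search over a finite grid illustrates, but does not prove, that XOR is non-separable.

```python
from itertools import product

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
TARGETS = {
    "AND": [-1, -1, -1, 1],
    "OR": [-1, 1, 1, 1],
    "XOR": [-1, 1, 1, -1],
}

def find_separator(targets, grid):
    """Return some (w1, w2, theta) that classifies all four inputs, or None."""
    for w1, w2, theta in product(grid, repeat=3):
        if all(
            (w1 * x1 + w2 * x2 - theta > 0) == (d > 0)
            for (x1, x2), d in zip(INPUTS, targets)
        ):
            return (w1, w2, theta)
    return None

grid = [i / 2 for i in range(-8, 9)]  # -4.0 to 4.0 in steps of 0.5
results = {name: find_separator(t, grid) for name, t in TARGETS.items()}
```

For XOR the required inequalities contradict each other: $\theta \ge 0$, $w_1 > \theta$ and $w_2 > \theta$ give $w_1 + w_2 > \theta$, while the input $(1, 1)$ needs $w_1 + w_2 \le \theta$.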
31Goal
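- Given training sets $T_1 \subset C_1$ and $T_2 \subset C_2$ with elements of the form $x = (x_1, x_2, \ldots, x_{m-1}, x_m)^T$, where $x_1, x_2, \ldots, x_{m-1} \in \mathbb{R}$ and $x_m = -1$.
- Assume $T_1$ and $T_2$ are linearly separable.
- Find $w = (w_1, w_2, \ldots, w_m)^T$ such that $\mathrm{sgn}(w^T x) = +1$ for $x \in T_1$ and $\mathrm{sgn}(w^T x) = -1$ for $x \in T_2$.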
32Goal
$w^T x = 0$ is a hyperplane that passes through the origin of the augmented input space.
33Observation
Which $w$'s correctly classify $x$? What trick can be used?
[Figure: a pattern $x$ and the decision line $w_1 x_1 + w_2 x_2 = 0$ for a candidate weight vector $w$, with the two sides of the line marked $> 0$ and $< 0$.]
Is this $w$ ok? If not, how should $w$ be adjusted: $\Delta w = ?$
- $\Delta w = -\eta x$: is it reasonable?
- $\Delta w = \eta x$: is it reasonable?
- The adjustment is either $\Delta w = \eta x$ or $\Delta w = -\eta x$.
40Perceptron Learning Rule
[Figure: the weight update upon misclassification and the definition of the error]
43Summary: Perceptron Learning Rule
Based on the general weight learning rule.
[Figure: the update for correctly and incorrectly classified patterns]
Does it converge?
45Perceptron Convergence Theorem
If the given training set is linearly separable, the learning process will converge in a finite number of steps.
- Exercise: Consult papers or textbooks to prove the theorem.
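The theorem can be tried out on a small linearly separable set. The sketch below assumes the common form of the rule, $\Delta w = \eta\, r\, x$ with error $r = d - y$ and $y = \mathrm{sgn}(w^T x)$; the learning rate, the zero initial weights, the convention $\mathrm{sgn}(0) = -1$, and the OR training set are assumptions made for illustration.

```python
def sgn(v):
    return 1 if v > 0 else -1

def train_perceptron(samples, eta=0.5, max_epochs=100):
    """samples: list of (augmented input x, desired output d in {+1, -1}).

    Returns (weights, epoch of the first error-free pass), or (weights, None)
    if no error-free pass occurs within max_epochs.
    """
    w = [0.0] * len(samples[0][0])
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for x, d in samples:
            y = sgn(sum(wi * xi for wi, xi in zip(w, x)))
            r = d - y  # error: 0 if correct, +2 or -2 if misclassified
            if r != 0:
                w = [wi + eta * r * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:
            return w, epoch
    return w, None

# OR function with augmented inputs (last component fixed at -1).
samples = [([0, 0, -1], -1), ([0, 1, -1], 1), ([1, 0, -1], 1), ([1, 1, -1], 1)]
w, epochs = train_perceptron(samples)
```

Training stops at the first pass over the data with no misclassification; for a set that is not linearly separable, such as XOR, this loop would run until `max_epochs` and return `None` for the epoch.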
46The Learning Scenario
[Figure: a linearly separable training set and the weight vector $w$ after successive updates (labels $w_3$, $w_4$).]
The demonstration is in augmented space. Conceptually, in augmented space, we adjust the weight vector to fit the data.
54Weight Space
- Positive example: a weight in the shaded area gives the correct classification; the adjustment is $\Delta w = \eta x$.
- Negative example: a weight not in the shaded area gives the correct classification; the adjustment is $\Delta w = -\eta x$.
58The Learning Scenario in Weight Space
To correctly classify the training set, the weight must move into the shaded area.
[Figure: the $(w_1, w_2)$ weight space with the trajectory of successive weights $w_0, w_1, \ldots, w_{11}$ moving toward the shaded area.]
Conceptually, in weight space, we move the weight into the feasible region.
72Feed-Forward Neural Networks
- Learning Rules for Single-Layered Perceptron Networks
- Perceptron Learning Rule
- Adaline Learning Rule
73Adaline (Adaptive Linear Element)
Widrow, 1962
[Figure: the Adaline unit and the goal]
Under what condition is the goal reachable?
75LMS (Least Mean Square)
Minimize the cost function (error function)
76Gradient Descent Algorithm
Our goal is to go downhill.
How to find the steepest descent direction?
[Figure: contour map over $(w_1, w_2)$ with a step $\Delta w$]
78Gradient Operator
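Let $f(w) = f(w_1, w_2, \ldots, w_m)$ be a function over $\mathbb{R}^m$.
Define $\nabla f = \left( \frac{\partial f}{\partial w_1}, \frac{\partial f}{\partial w_2}, \ldots, \frac{\partial f}{\partial w_m} \right)^T$, so that a small step $\Delta w$ changes $f$ by $df = (\nabla f)^T \Delta w$.
- $df$ positive: go uphill
- $df$ zero: stay level
- $df$ negative: go downhill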
80The Steepest Descent Direction
To minimize $f$, we choose $\Delta w = -\eta \nabla f$.
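Repeatedly stepping against the gradient can be sketched in a few lines. The cost function below is a hypothetical bowl-shaped example with its minimum at $(1, -3)$; the step size, the step count, and the function names are assumptions.

```python
def gradient_descent(grad, w, eta=0.1, steps=200):
    """Repeatedly move w against the gradient: w <- w - eta * grad(w)."""
    for _ in range(steps):
        g = grad(w)
        w = [wi - eta * gi for wi, gi in zip(w, g)]
    return w

# Hypothetical cost f(w1, w2) = (w1 - 1)^2 + 2 * (w2 + 3)^2.
def grad_f(w):
    return [2 * (w[0] - 1), 4 * (w[1] + 3)]

w_min = gradient_descent(grad_f, [0.0, 0.0])
```

With this step size each coordinate's distance to the minimum shrinks by a constant factor per step, so `w_min` ends up very close to $(1, -3)$. A step size that is too large would make the iteration overshoot and diverge.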
81LMS (Least Mean Square)
Minimize the cost function (error function)
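A worked derivation may help here. It assumes the usual LMS cost for a linear unit with output $w^T x$, a training set of $p$ pairs $(x^{(k)}, d^{(k)})$, and a learning rate $\eta$; these symbols are assumptions, not taken from the slides.

```latex
E(w) = \frac{1}{2} \sum_{k=1}^{p} \left( d^{(k)} - w^T x^{(k)} \right)^2

\frac{\partial E}{\partial w_j}
  = -\sum_{k=1}^{p} \left( d^{(k)} - w^T x^{(k)} \right) x_j^{(k)}

\Delta w = -\eta \nabla E
  = \eta \sum_{k=1}^{p} \left( d^{(k)} - w^T x^{(k)} \right) x^{(k)}
```

Under these assumptions, steepest descent on the cost moves the weights in proportion to the error on each pattern times that pattern's input.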
82Adaline Learning Rule
Minimize the cost function (error function)
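A single update step can be sketched as follows, assuming the usual form of the rule for a linear unit, $\Delta w = \eta\,(d - w^T x)\,x$. The function name, the learning rate, and the sample values are hypothetical.

```python
def adaline_update(w, x, d, eta=0.1):
    """One LMS step; the unit is linear, so its output is just w . x."""
    y = sum(wi * xi for wi, xi in zip(w, x))
    return [wi + eta * (d - y) * xi for wi, xi in zip(w, x)]

w = [0.0, 0.0, 0.0]
x, d = [1.0, 2.0, -1.0], 1.0  # augmented input and desired output

err_before = d - sum(wi * xi for wi, xi in zip(w, x))
w = adaline_update(w, x, d)
err_after = d - sum(wi * xi for wi, xi in zip(w, x))
```

After one step the error on this pattern is multiplied by $1 - \eta \lVert x \rVert^2$, so it shrinks as long as $\eta \lVert x \rVert^2 < 2$. Unlike the perceptron rule, the weights change even when the thresholded output would already be correct.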
83Learning Modes
- Batch Learning Mode
- Incremental Learning Mode
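The two modes differ only in when the weights are changed. The sketch below assumes the LMS update for a linear unit and uses hypothetical noise-free data generated by $d = 2x_1 + 1$; the function names, learning rate, and epoch count are assumptions.

```python
def output(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def batch_epoch(w, samples, eta):
    """Batch mode: accumulate the change over the whole set, then update once."""
    delta = [0.0] * len(w)
    for x, d in samples:
        err = d - output(w, x)
        delta = [dj + err * xj for dj, xj in zip(delta, x)]
    return [wi + eta * dj for wi, dj in zip(w, delta)]

def incremental_epoch(w, samples, eta):
    """Incremental mode: update the weights right after each sample."""
    for x, d in samples:
        err = d - output(w, x)
        w = [wi + eta * err * xi for wi, xi in zip(w, x)]
    return w

# Augmented inputs [x1, -1]; the exact solution is w = [2, -1].
samples = [([x1, -1.0], 2 * x1 + 1) for x1 in (0.0, 1.0, 2.0, 3.0)]

w_batch = [0.0, 0.0]
w_incr = [0.0, 0.0]
for _ in range(2000):
    w_batch = batch_epoch(w_batch, samples, eta=0.05)
    w_incr = incremental_epoch(w_incr, samples, eta=0.05)
```

On this data both modes approach the same weights. Batch mode follows the gradient of the total cost exactly, while incremental mode uses each sample's own error as soon as it is seen.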
84Comparisons
             Perceptron Learning Rule   Adaline Learning Rule
Fundamental  Hebbian assumption         Gradient descent
Convergence  In finite steps            Converges asymptotically
Constraint   Linearly separable         Linear independence