Title: CS451/CS551/EE565 ARTIFICIAL INTELLIGENCE
1CS451/CS551/EE565 ARTIFICIAL INTELLIGENCE
- Neural Networks
- 12-06-2006
- Prof. Janice T. Searleman
- jets_at_clarkson.edu, jetsza
2Outline
- Neural Nets
- Reading Assignment: AIMA
- Chapter 20, section 20.5, Neural Networks
- Final Exam: Mon, 12/11/06, 8:00 am, SC342
- HW7 posted, due Wed. 12/06/06
3Connectionist models
- Key intuition: Much of intelligence is in the
connections between the 10 billion neurons in the
human brain.
- Neuron switching time is roughly 0.001 second;
scene recognition time is about 0.1 second. This
suggests that the brain is massively parallel,
because 100 computational steps are simply not
sufficient to accomplish scene recognition.
- Development: Formation of basic connection topology
- Learning: Fine-tuning of topology; major
synaptic-efficiency changes.
- The matrix IS the intelligence!
4Artificial Neural Networks (ANN)
- Distributed representational and computational
mechanism based (very roughly) on neurophysiology.
- A collection of simple interconnected processors
(neurons) that can learn complex behaviors and
solve difficult problems.
- Wide range of applications:
- Supervised Learning
- Function Learning (correct mapping from inputs to outputs)
- Time-Series Analysis, Forecasting, Controller Design
- Concept Learning
- Standard Machine Learning classification tasks:
Features → Class
- Unsupervised Learning
- Pattern Recognition (Associative Memory models)
- Words, Sounds, Faces, etc.
- Data Clustering
- Unsupervised Concept Learning
5NeuroComputing
- Nodes fire when sum(weighted inputs) > threshold.
- Other varieties are common: unthresholded linear,
sigmoidal, etc.
- Connection topologies vary widely across
applications.
- Weights vary in magnitude and sign (stimulate or
inhibit).
- Learning: finding the proper topology and weights
- A search process in the space of possible
topologies and weights
- Most ANN applications assume a fixed topology.
- The matrix IS the learning machine!
6Properties of connectionist models
- Many neuron-like threshold switching units.
- Many weighted interconnections between units.
- Highly parallel, distributed computation.
- Weights are tuned automatically.
- Especially useful for learning complex functions
with continuous-valued outputs and large numbers
of noisy inputs, which is the type that
logic-based techniques have difficulty with.
- Fault-tolerant.
- Degrades gracefully.
7Neural Networks
[Figure: a single node (unit). Input links deliver activations aj along weighted links Wj,i to an input function that computes ini; an activation function g then produces the output ai = g(ini), which is sent along the output links.]
8Simple Computing Elements
- Each unit (node) receives signals from its input
links and computes a new activation level that it
sends along all output links.
- Computation is split into two steps (see the sketch below):
- ini = Σj Wj,i aj, the linear step, and then
- ai = g(ini), the nonlinear step.
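A minimal Python sketch of this two-step computation (the function names and example numbers are illustrative, not from the slides):

```python
def unit_output(weights, activations, g):
    """One unit: linear step in_i = sum_j W_j,i * a_j,
    then nonlinear step a_i = g(in_i)."""
    in_i = sum(w * a for w, a in zip(weights, activations))
    return g(in_i)

# Example with a step activation at threshold 0 (illustrative numbers):
step = lambda x: 1 if x > 0 else 0
# 0.5*1.0 - 1.0*0.3 + 0.25*0.8 = 0.4 > 0, so the unit fires (prints 1)
print(unit_output([0.5, -1.0, 0.25], [1.0, 0.3, 0.8], step))
```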
9Possibilities for g
Step function
Sign function
Sigmoid (logistic) function
sign(x) = 1 if x > 0; -1 if x < 0
step(x) = 1 if x > threshold; 0 if x < threshold
(in the picture above, threshold = 0)
sigmoid(x) = 1 / (1 + e^-x)
Adding an extra input with activation a0 = -1
and weight W0,j = t is equivalent to having a
threshold at t. This way we can always assume a
0 threshold.
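A short Python sketch of these three choices of g (the threshold is a parameter here, defaulting to 0 as in the picture; function names are mine):

```python
import math

def step(x, threshold=0.0):
    """step(x) = 1 if x > threshold, else 0."""
    return 1 if x > threshold else 0

def sign(x):
    """sign(x) = 1 if x > 0, else -1."""
    return 1 if x > 0 else -1

def sigmoid(x):
    """sigmoid(x) = 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))
```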
10Real vs artificial neurons
11Similarities with neurons
12Differences
13Neural Nets A Brief History
- McCulloch and Pitts, 1943: Showed how neural-like
networks could compute
- Rosenblatt, 1950s: Perceptrons
- Minsky and Papert, 1969: Perceptron deficiencies
- Hopfield, 1982: Hopfield Nets
- Hinton and Sejnowski, 1986: Boltzmann Machines
- Rumelhart et al., 1986: Multilayer nets with
backpropagation
14Universal computing elements
- In 1943, McCulloch and Pitts showed that a
synchronous assembly of such neurons is a
universal computing machine. That is, any Boolean
function can be implemented with threshold (step
function) units.
15Implementing AND
[Figure: a single threshold unit computing AND. Inputs x1 and x2 each have weight 1; an extra input fixed at -1 carries weight W = 1.5, so o(x1,x2) = 1 exactly when x1 + x2 - 1.5 > 0.]
16Implementing OR
[Figure: a single threshold unit computing OR. Inputs x1 and x2 each have weight 1; an extra input fixed at -1 carries weight W = 0.5.]
o(x1,x2) = 1 if -0.5 + x1 + x2 > 0
0 otherwise
17Implementing NOT
[Figure: a single threshold unit computing NOT. Input x1 has weight -1; an extra input fixed at -1 carries weight W = -0.5, so o(x1) = 1 exactly when 0.5 - x1 > 0, i.e., when x1 = 0.]
18Implementing more complex Boolean functions
[Figure: a two-unit network. The first threshold unit computes x1 or x2 (inputs x1 and x2 with weight 1 each; extra input -1 with weight 0.5). Its output and x3, each with weight 1, together with an extra input -1 with weight 1.5, feed a second threshold unit that outputs (x1 or x2) and x3.]
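A brief Python sketch of these gates, using the weights shown in the figures above (the helper name threshold_unit is my own):

```python
def threshold_unit(weights, inputs):
    """Fire (output 1) when the weighted sum of inputs exceeds 0.
    The threshold is folded in as an extra input fixed at -1."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0

def AND(x1, x2):
    return threshold_unit([1.5, 1, 1], [-1, x1, x2])   # fires only when x1 + x2 > 1.5

def OR(x1, x2):
    return threshold_unit([0.5, 1, 1], [-1, x1, x2])   # fires when x1 + x2 > 0.5

def NOT(x1):
    return threshold_unit([-0.5, -1], [-1, x1])        # fires only when x1 = 0

def or_then_and(x1, x2, x3):
    """(x1 or x2) and x3, built from two threshold units."""
    return AND(OR(x1, x2), x3)

# Quick check over all Boolean inputs
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            assert or_then_and(a, b, c) == ((a or b) and c)
```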
19Types of Neural Networks
- Feedforward: Links are unidirectional, and there
are no cycles, i.e., the network is a directed
acyclic graph (DAG). Units are arranged in
layers, and each unit is linked only to units in
the next layer. There is no internal state other
than the weights.
- Recurrent: Links can form arbitrary topologies,
which can implement memory. Behavior can become
unstable, oscillatory, or chaotic.
20Feedforward Neural Net
21A recurrent network topology
- Hopfield net: every unit i is connected to every
other unit j by a weight Wij.
Weights are assumed to be symmetric: Wij = Wji.
Useful for associative memory: after training on
a set of examples, a new stimulus will cause the
network to settle into an activation pattern
corresponding to the example in the training set
that most closely resembles the new stimulus.
22Hopfield Net
23- Perceptrons
- Hopfield Nets
- Multilayer Feedforward Nets
24Perceptrons
- Perceptrons are single-layer feedforward networks
- Each output unit is independent of the others
- Can assume a single output unit
- Activation of the output unit is calculated by
- O = Step0( Σj Wj xj )
- where xj is the activation of input unit j, and
we assume an additional weight and input to
represent the threshold
25Perceptron
26Multiple Perceptrons
27How can perceptrons be designed?
- The Perceptron Learning Theorem (Rosenblatt,
1960): Given enough training examples, there is
an algorithm that will learn any linearly
separable function.
- Learning algorithm:
- If the perceptron fires when it should not, make
each weight wi smaller by an amount proportional
to xi
- If it fails to fire when it should, make each wi
proportionally larger
28The perceptron learning algorithm
- Inputs: training set (x1, x2, ..., xn, t)
- Method:
- Randomly initialize weights w(i), -0.5 < w(i) < 0.5
- Repeat for several epochs until convergence:
- for each example:
- Calculate network output o.
- Adjust weights (see the sketch below)
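A minimal Python sketch of this procedure, using the error-driven update from the previous slide (the learning rate, epoch count, function name, and example data are my choices; targets are assumed to be 0/1, and the threshold is folded in as a weight on a fixed input of -1):

```python
import random

def train_perceptron(examples, n_inputs, lr=0.1, epochs=100):
    """Perceptron learning: examples are (inputs, target) pairs with target in {0, 1}."""
    # Randomly initialize weights in (-0.5, 0.5); weight 0 plays the role of the threshold.
    w = [random.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
    for _ in range(epochs):
        converged = True
        for x, t in examples:
            x = [-1] + list(x)                      # prepend the fixed threshold input
            o = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
            if o != t:                              # adjust weights only on errors
                converged = False
                for j in range(len(w)):
                    w[j] += lr * (t - o) * x[j]     # raise/lower each wj in proportion to xj
        if converged:
            break
    return w

# Example: learn OR, which is linearly separable (illustrative data)
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights = train_perceptron(data, n_inputs=2)
```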
29Expressive limits of perceptrons
- Can the XOR function be represented by a
perceptron (a network without a hidden layer)?
- Requiring o(0,0) = 0, o(1,0) = 1, o(0,1) = 1, and
o(1,1) = 0 of a unit o(x1,x2) = Step0(w1 x1 + w2 x2 - w0)
gives the inequalities w0 >= 0, w1 > w0, w2 > w0, and
w1 + w2 <= w0. There is no assignment of values to
w0, w1 and w2 that satisfies these inequalities. XOR
cannot be represented!
30So what can be represented using perceptrons?
[Figure: linearly separable decision boundaries for "and" and "or".]
Representation theorem: 1-layer feedforward
networks can only represent linearly separable
functions. That is, the decision surface
separating positive from negative examples has to
be a plane.
31Why does the method work?
- The perceptron learning rule performs gradient
descent in weight space.
- Error surface: The surface that describes the
error on each example as a function of all the
weights in the network. A set of weights defines
a point on this surface.
- We look at the partial derivative of the surface
with respect to each weight (i.e., the gradient --
how much the error would change if we made a
small change in each weight). Then the weights
are altered by an amount proportional to the
slope in each direction (corresponding to a
weight). Thus the network as a whole moves
in the direction of steepest descent on the error
surface.
- The error surface in weight space has a single
global minimum and no local minima. Gradient
descent is guaranteed to find the global minimum,
provided the learning rate is not so big that
you overshoot it.
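To make this concrete, a brief sketch of the gradient descent update for an unthresholded linear unit with squared error (the symbols E, t, o, and η below are my notation, not from the slides):

```latex
E = \tfrac{1}{2}(t - o)^2, \qquad o = \sum_j w_j x_j
\frac{\partial E}{\partial w_j} = -(t - o)\, x_j
\quad\Longrightarrow\quad
\Delta w_j = -\eta \frac{\partial E}{\partial w_j} = \eta\,(t - o)\, x_j
```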
32- Perceptrons
- Hopfield Nets
- Multilayer Feedforward Nets
33Hopfield Nets
- John Hopfield, 1982
- distributed representation
- memory is stored as a pattern of activation
- different memories are different patterns on the
SAME PEs
- distributed, asynchronous control
- each processor makes decisions based on the local
situation
- content-addressable memory
- a number of patterns can be stored in a net
- to retrieve a pattern, specify some (or all) of
it; the net will find the closest match
- fault tolerance
- the network works even if a few PEs misbehave or
fail (graceful degradation)
- also handles novel inputs well (robust)
34Distributed Information Storage and Processing
- Information is stored in the weights, with:
- Concepts/Patterns spread over many weights and nodes.
- Individual weights can hold info for many
different concepts.
35Parallel Relaxation
- choose an arbitrary unit; if any neighbors are
active, compute the sum
- if the sum is positive, then activate the unit;
else, deactivate it
- continue until a stable state is achieved (all
units have been considered and no more units can
change)
- Hopfield showed that given any set of weights and
any initial state, the parallel relaxation
algorithm would eventually settle into a stable
state
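A small Python sketch of this relaxation loop (the function and variable names are mine; weights are assumed symmetric with no self-connections, as on the earlier Hopfield slide):

```python
import random

def parallel_relaxation(weights, state):
    """Hopfield-style relaxation: weights[i][j] is the (symmetric) connection
    between units i and j; state[i] is 1 (active) or 0 (inactive).
    Repeatedly visit units, activating a unit when the weighted sum of its
    active neighbors is positive and deactivating it otherwise, until no
    unit changes any more."""
    n = len(state)
    changed = True
    while changed:
        changed = False
        for i in random.sample(range(n), n):        # visit units in arbitrary order
            total = sum(weights[i][j] * state[j] for j in range(n) if j != i)
            new = 1 if total > 0 else 0
            if new != state[i]:
                state[i] = new
                changed = True
    return state
```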
36Example Hopfield Net
Note that this is a stable state
37Test Input
What steady state does this converge to?
38Another Test Input
What steady state does this converge to?
39Four Stable States
40- Perceptrons
- Hopfield Nets
- Multilayer Feedforward Nets
41Multilayer Feedforward Net
42Multi-layer networks
- Multi-layer feedforward networks are trainable by
backpropagation provided the activation function
g is a differentiable function.
- Threshold units don't qualify, but the logistic
function does.
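A small Python sketch of why the logistic function qualifies (function names are mine): its derivative exists everywhere and can be computed from its own output, which is exactly what backpropagation needs at each unit.

```python
import math

def sigmoid(x):
    """Logistic function: g(x) = 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    """Derivative of the logistic function: g'(x) = g(x) * (1 - g(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)
```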
43Sigmoid units