Title: CS451/CS551/EE565 ARTIFICIAL INTELLIGENCE
1CS451/CS551/EE565 ARTIFICIAL INTELLIGENCE
- Neural Networks
- 12-06-2006
- Prof. Janice T. Searleman
- jets_at_clarkson.edu, jetsza
2Outline
- Neural Nets
- Reading Assignment: AIMA
- Chapter 20, section 20.5, Neural Networks
- Final Exam: Mon, 12/11/06, 8:00 am, SC342
- HW7 posted, due Wed. 12/06/06
3Connectionist models
- Key intuition: Much of intelligence is in the
connections between the 10 billion neurons in the
human brain.
- Neuron switching time is roughly 0.001 second;
scene recognition time is about 0.1 second. This
suggests that the brain is massively parallel,
because 100 computational steps are simply not
sufficient to accomplish scene recognition.
- Development: Formation of basic connection topology
- Learning: Fine-tuning of topology; major
synaptic-efficiency changes.
- The matrix IS the intelligence!
4Artificial Neural Networks (ANN)
- Distributed representational and computational
mechanism based (very roughly) on neurophysiology.
- A collection of simple interconnected processors
(neurons) that can learn complex behaviors and
solve difficult problems.
- Wide range of applications:
- Supervised Learning
- Function Learning (correct mapping from inputs to outputs)
- Time-Series Analysis, Forecasting, Controller Design
- Concept Learning
- Standard Machine Learning classification tasks:
Features → Class
- Unsupervised Learning
- Pattern Recognition (Associative Memory models)
- Words, Sounds, Faces, etc.
- Data Clustering
- Unsupervised Concept Learning
5NeuroComputing
- Nodes fire when sum(weighted inputs) > threshold.
- Other varieties are common: unthresholded linear,
sigmoidal, etc.
- Connection topologies vary widely across
applications.
- Weights vary in magnitude and sign (stimulate or
inhibit).
- Learning: finding the proper topology and weights
- A search process in the space of possible
topologies and weights
- Most ANN applications assume a fixed topology.
- The matrix IS the learning machine!
6Properties of connectionist models
- Many neuron-like threshold switching units.
- Many weighted interconnections between units.
- Highly parallel, distributed computation.
- Weights are tuned automatically.
- Especially useful for learning complex functions
with continuous-valued outputs and large numbers
of noisy inputs, which is the type that
logic-based techniques have difficulty with.
- Fault-tolerant.
- Degrades gracefully.
7Neural Networks
[Figure: a single node (unit). Input links deliver activations aj along weighted links Wj,i to an input function that computes ini; an activation function g then produces the output ai = g(ini), which is sent along the output links.]
8Simple Computing Elements
- Each unit (node) receives signals from its input
links and computes a new activation level that it
sends along all output links.
- Computation is split into two steps (see the sketch below):
- ini = Σj Wj,i aj, the linear step, and then
- ai = g(ini), the nonlinear step.
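A minimal Python sketch of this two-step computation (the function names and example numbers are illustrative, not from the slides):

```python
def unit_output(weights, activations, g):
    """One unit: linear step in_i = sum_j W_j,i * a_j,
    then nonlinear step a_i = g(in_i)."""
    in_i = sum(w * a for w, a in zip(weights, activations))
    return g(in_i)

# Example with a step activation at threshold 0 (illustrative numbers):
step = lambda x: 1 if x > 0 else 0
# 0.5*1.0 - 1.0*0.3 + 0.25*0.8 = 0.4 > 0, so the unit fires (prints 1)
print(unit_output([0.5, -1.0, 0.25], [1.0, 0.3, 0.8], step))
```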
9Possibilities for g
Step function
Sign function
Sigmoid (logistic) function
sign(x) = 1 if x > 0; -1 if x < 0
step(x) = 1 if x > threshold; 0 if x < threshold
(in the picture above, threshold = 0)
sigmoid(x) = 1 / (1 + e^-x)
Adding an extra input with activation a0 = -1
and weight W0,j = t is equivalent to having a
threshold at t. This way we can always assume a
0 threshold.
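A short Python sketch of these three choices of g (the threshold is a parameter here, defaulting to 0 as in the picture; function names are mine):

```python
import math

def step(x, threshold=0.0):
    """step(x) = 1 if x > threshold, else 0."""
    return 1 if x > threshold else 0

def sign(x):
    """sign(x) = 1 if x > 0, else -1."""
    return 1 if x > 0 else -1

def sigmoid(x):
    """sigmoid(x) = 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))
```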
10Real vs artificial neurons
11Similarities with neurons
12Differences
13Neural Nets A Brief History
- McCulloch and Pitts, 1943: Showed how neural-like
networks could compute
- Rosenblatt, 1950s: Perceptrons
- Minsky and Papert, 1969: Perceptron deficiencies
- Hopfield, 1982: Hopfield Nets
- Hinton and Sejnowski, 1986: Boltzmann Machines
- Rumelhart et al., 1986: Multilayer nets with
backpropagation
14Universal computing elements
- In 1943, McCulloch and Pitts showed that a
synchronous assembly of such neurons is a
universal computing machine. That is, any Boolean
function can be implemented with threshold (step
function) units.
15Implementing AND
[Figure: a single threshold unit computing AND. Inputs x1 and x2 each have weight 1; an extra input fixed at -1 carries weight W = 1.5, so o(x1,x2) = 1 exactly when x1 + x2 - 1.5 > 0.]
16Implementing OR
[Figure: a single threshold unit computing OR. Inputs x1 and x2 each have weight 1; an extra input fixed at -1 carries weight W = 0.5.]
o(x1,x2) = 1 if -0.5 + x1 + x2 > 0
0 otherwise
17Implementing NOT
[Figure: a single threshold unit computing NOT. Input x1 has weight -1; an extra input fixed at -1 carries weight W = -0.5, so o(x1) = 1 exactly when 0.5 - x1 > 0, i.e., when x1 = 0.]
18Implementing more complex Boolean functions
[Figure: a two-unit network. The first threshold unit computes x1 or x2 (inputs x1 and x2 with weight 1 each; extra input -1 with weight 0.5). Its output and x3, each with weight 1, together with an extra input -1 with weight 1.5, feed a second threshold unit that outputs (x1 or x2) and x3.]
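A brief Python sketch of these gates, using the weights shown in the figures above (the helper name threshold_unit is my own):

```python
def threshold_unit(weights, inputs):
    """Fire (output 1) when the weighted sum of inputs exceeds 0.
    The threshold is folded in as an extra input fixed at -1."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0

def AND(x1, x2):
    return threshold_unit([1.5, 1, 1], [-1, x1, x2])   # fires only when x1 + x2 > 1.5

def OR(x1, x2):
    return threshold_unit([0.5, 1, 1], [-1, x1, x2])   # fires when x1 + x2 > 0.5

def NOT(x1):
    return threshold_unit([-0.5, -1], [-1, x1])        # fires only when x1 = 0

def or_then_and(x1, x2, x3):
    """(x1 or x2) and x3, built from two threshold units."""
    return AND(OR(x1, x2), x3)

# Quick check over all Boolean inputs
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            assert or_then_and(a, b, c) == ((a or b) and c)
```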
19Types of Neural Networks
- Feedforward: Links are unidirectional, and there
are no cycles, i.e., the network is a directed
acyclic graph (DAG). Units are arranged in
layers, and each unit is linked only to units in
the next layer. There is no internal state other
than the weights.
- Recurrent: Links can form arbitrary topologies,
which can implement memory. Behavior can become
unstable, oscillatory, or chaotic.
20Feedforward Neural Net
21A recurrent network topology
- Hopfield net: every unit i is connected to every
other unit j by a weight Wij.
Weights are assumed to be symmetric: Wij = Wji.
Useful for associative memory: after training on
a set of examples, a new stimulus will cause the
network to settle into an activation pattern
corresponding to the example in the training set
that most closely resembles the new stimulus.
22Hopfield Net
23- Perceptrons
- Hopfield Nets
- Multilayer Feedforward Nets
24Perceptrons
- Perceptrons are single-layer feedforward networks
- Each output unit is independent of the others
- Can assume a single output unit
- Activation of the output unit is calculated by
- O = Step0( Σj Wj xj )
- where xj is the activation of input unit j, and
we assume an additional weight and input to
represent the threshold
25Perceptron
26Multiple Perceptrons
27How can perceptrons be designed?
- The Perceptron Learning Theorem (Rosenblatt,
1960): Given enough training examples, there is
an algorithm that will learn any linearly
separable function.
- Learning algorithm:
- If the perceptron fires when it should not, make
each weight wi smaller by an amount proportional
to xi
- If it fails to fire when it should, make each wi
proportionally larger
28The perceptron learning algorithm
- Inputs: training set (x1, x2, ..., xn, t)
- Method:
- Randomly initialize weights w(i), -0.5 < w(i) < 0.5
- Repeat for several epochs until convergence:
- for each example:
- Calculate network output o.
- Adjust weights (see the sketch below)
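A minimal Python sketch of this procedure, using the error-driven update from the previous slide (the learning rate, epoch count, function name, and example data are my choices; targets are assumed to be 0/1, and the threshold is folded in as a weight on a fixed input of -1):

```python
import random

def train_perceptron(examples, n_inputs, lr=0.1, epochs=100):
    """Perceptron learning: examples are (inputs, target) pairs with target in {0, 1}."""
    # Randomly initialize weights in (-0.5, 0.5); weight 0 plays the role of the threshold.
    w = [random.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
    for _ in range(epochs):
        converged = True
        for x, t in examples:
            x = [-1] + list(x)                      # prepend the fixed threshold input
            o = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
            if o != t:                              # adjust weights only on errors
                converged = False
                for j in range(len(w)):
                    w[j] += lr * (t - o) * x[j]     # raise/lower each wj in proportion to xj
        if converged:
            break
    return w

# Example: learn OR, which is linearly separable (illustrative data)
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights = train_perceptron(data, n_inputs=2)
```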
29Expressive limits of perceptrons
- Can the XOR function be represented by a
perceptron (a network without a hidden layer)?
- Requiring o(0,0) = 0, o(1,0) = 1, o(0,1) = 1, and
o(1,1) = 0 of a unit o(x1,x2) = Step0(w1 x1 + w2 x2 - w0)
gives the inequalities w0 >= 0, w1 > w0, w2 > w0, and
w1 + w2 <= w0. There is no assignment of values to
w0, w1 and w2 that satisfies these inequalities. XOR
cannot be represented!
30So what can be represented using perceptrons?
[Figure: linearly separable decision boundaries for "and" and "or".]
Representation theorem: 1-layer feedforward
networks can only represent linearly separable
functions. That is, the decision surface
separating positive from negative examples has to
be a plane.
31Why does the method work?
- The perceptron learning rule performs gradient
descent in weight space.
- Error surface: The surface that describes the
error on each example as a function of all the
weights in the network. A set of weights defines
a point on this surface.
- We look at the partial derivative of the surface
with respect to each weight (i.e., the gradient --
how much the error would change if we made a
small change in each weight). Then the weights
are altered by an amount proportional to the
slope in each direction (corresponding to a
weight). Thus the network as a whole moves
in the direction of steepest descent on the error
surface.
- The error surface in weight space has a single
global minimum and no local minima. Gradient
descent is guaranteed to find the global minimum,
provided the learning rate is not so big that
you overshoot it.
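To make this concrete, a brief sketch of the gradient descent update for an unthresholded linear unit with squared error (the symbols E, t, o, and η below are my notation, not from the slides):

```latex
E = \tfrac{1}{2}(t - o)^2, \qquad o = \sum_j w_j x_j
\frac{\partial E}{\partial w_j} = -(t - o)\, x_j
\quad\Longrightarrow\quad
\Delta w_j = -\eta \frac{\partial E}{\partial w_j} = \eta\,(t - o)\, x_j
```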
32- Perceptrons
- Hopfield Nets
- Multilayer Feedforward Nets
33Hopfield Nets
- John Hopfield, 1982
- distributed representation
- memory is stored as a pattern of activation
- different memories are different patterns on the
SAME PEs
- distributed, asynchronous control
- each processor makes decisions based on the local
situation
- content-addressable memory
- a number of patterns can be stored in a net
- to retrieve a pattern, specify some (or all) of
it; the net will find the closest match
- fault tolerance
- the network works even if a few PEs misbehave or
fail (graceful degradation)
- also handles novel inputs well (robust)
34Distributed Information Storage and Processing
- Information is stored in the weights, with:
- Concepts/Patterns spread over many weights and nodes.
- Individual weights can hold info for many
different concepts.
35Parallel Relaxation
- choose an arbitrary unit; if any neighbors are
active, compute the sum
- if the sum is positive, then activate the unit;
else, deactivate it
- continue until a stable state is achieved (all
units have been considered and no more units can
change)
- Hopfield showed that given any set of weights and
any initial state, the parallel relaxation
algorithm would eventually settle into a stable
state
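A small Python sketch of this relaxation loop (the function and variable names are mine; weights are assumed symmetric with no self-connections, as on the earlier Hopfield slide):

```python
import random

def parallel_relaxation(weights, state):
    """Hopfield-style relaxation: weights[i][j] is the (symmetric) connection
    between units i and j; state[i] is 1 (active) or 0 (inactive).
    Repeatedly visit units, activating a unit when the weighted sum of its
    active neighbors is positive and deactivating it otherwise, until no
    unit changes any more."""
    n = len(state)
    changed = True
    while changed:
        changed = False
        for i in random.sample(range(n), n):        # visit units in arbitrary order
            total = sum(weights[i][j] * state[j] for j in range(n) if j != i)
            new = 1 if total > 0 else 0
            if new != state[i]:
                state[i] = new
                changed = True
    return state
```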
36Example Hopfield Net
Note that this is a stable state
37Test Input
What steady state does this converge to?
38Another Test Input
What steady state does this converge to?
39Four Stable States
40- Perceptrons
- Hopfield Nets
- Multilayer Feedforward Nets
41Multilayer Feedforward Net
42Multi-layer networks
- Multi-layer feedforward networks are trainable by
backpropagation provided the activation function
g is a differentiable function.
- Threshold units don't qualify, but the logistic
function does.
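A small Python sketch of why the logistic function qualifies (function names are mine): its derivative exists everywhere and can be computed from its own output, which is exactly what backpropagation needs at each unit.

```python
import math

def sigmoid(x):
    """Logistic function: g(x) = 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    """Derivative of the logistic function: g'(x) = g(x) * (1 - g(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)
```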
43Sigmoid units