Title: CITS7212 Computational Intelligence
1. CITS7212 Computational Intelligence
2. Neural Networks: Nature-Inspired
[Figure: a neural network mapping inputs to outputs]
3. Neural Networks: Nature-Inspired
- Inspired by the brain
- Simple animal brains are still capable of functions that are impossible for computers
- Computers are strong at
  - Performing complex math
  - Maintaining data
- Traditional computers struggle to recognize and generalize patterns of the past for future actions
- Neural networks offer a different way to analyze and recognize patterns within data
4. Neural Networks: A Brief History
- 1940s
  - McCulloch and Pitts wrote a paper on neurons and modelled a simple neural network
  - Reinforced in The Organization of Behavior (Hebb, 1949)
- 1950s
  - Took a backseat as traditional computing took main stage
- 1959
  - Multiple ADAptive LINear Elements (MADALINE)
  - Widrow and Hoff of Stanford
  - Removed echoes from telephone lines
  - First neural network used commercially
  - Still in use
- Results
  - Too much hype
  - Unfulfilled promises
  - Fear of "thinking machines"
  - Halted funding until 1981
5. Neural Networks: A Brief History
- 1982
  - John Hopfield presented a paper to the National Academy of Sciences
  - Neural networks are not simply used to model brains
  - They can create useful devices
  - Used charisma and mathematical analysis to champion the technology
  - Japan announced a 5th Generation effort into neural networks
  - Led the US to fear falling behind
  - Funding began to flow again
- Post 1985
  - Annual meetings hosted by the American Institute of Physics: Neural Networks for Computing
  - IEEE's first International Conference on Neural Networks in 1987 drew 1,800 people
  - Discussions ongoing everywhere
6. Neural Networks: The Brain
- The Brain
  - An interconnected network of neurons that collect, process and disseminate electrical signals via synapses
  - Neurons
  - Synapses
- Neuron
  - A cell in the brain that performs aggregation and dissemination of electrical signals
  - Interconnected in vast networks via synapses to provide computational power and intelligence
7. Neural Networks: Representation of the Brain
- Brain
  - An interconnected network of neurons that collect, process and disseminate electrical signals via synapses
  - Neurons
  - Synapses
- Neural Network
  - An interconnected network of units (or nodes) that collect, process and disseminate values via links
  - Nodes
  - Links
8. Neural Networks: Units (or Nodes)
- A unit represents a neuron in the brain
- Units are the building blocks of neural networks
- A unit collects values via its input links
- Determines its output value through an activation function
- Disseminates values via its output links
9. Units: Input Function
- Each input link has two values
  - a_j, the value received as input from node j
  - W_j,i, a numeric weight associated with the link connecting node j to node i
    - Determines the strength and sign of the connection
- Bias link
  - a_0 is reserved for a fixed input of -1
  - Given a bias weight W_0,i
  - Determines the threshold needed for a positive response
- Node i computes the weighted sum of its inputs (in_i), as sketched below
  - in_i = Σ_j W_j,i a_j
10. Units: Activation Function
- An activation function (g) is applied to the input function (in_i) to produce the output (a_i) of node i
  - a_i = g(in_i) = g( Σ_{j=0..n} W_j,i a_j )
11. Units: Activation Functions
- A variety of activation functions exist
- An activation function needs to meet two (2) desiderata
  - The unit should be active (near 1) when given the right inputs, and inactive (near 0) when given the wrong inputs
  - The activation needs to be nonlinear
    - Prevents the neural network from collapsing into a simple linear function
- Two commonly used functions
  - Threshold function
  - Sigmoid function
12. Activation Functions: Threshold Function
- Function
  - g(in_i) = 1, if in_i > 0
  - g(in_i) = 0, otherwise
- Useful for classifying inputs into two groups
- Used to build networks that function as
  - Feature identifiers
  - Boolean input computers
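For concreteness, a one-line Python version of the threshold function as defined above (the name `threshold` is illustrative):

```python
def threshold(in_i):
    """Threshold activation: 1 if the weighted input is positive, else 0."""
    return 1 if in_i > 0 else 0
```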
13. Activation Functions: Sigmoid Function
- Function
  - g(in_i) = 1 / (1 + e^(-in_i))
- Also known as the logistic function
- Main advantage is that it has a nice derivative
  - g'(in_i) = g(in_i)(1 - g(in_i))
  - Helpful for the weight-learning algorithm to be seen later
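A minimal Python sketch of the sigmoid and its derivative, matching the formulas above (function names are illustrative):

```python
import math

def sigmoid(in_i):
    """Logistic (sigmoid) activation: g(in_i) = 1 / (1 + e^(-in_i))."""
    return 1.0 / (1.0 + math.exp(-in_i))

def sigmoid_derivative(in_i):
    """g'(in_i) = g(in_i) * (1 - g(in_i)), used later by the weight-learning rule."""
    g = sigmoid(in_i)
    return g * (1.0 - g)
```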
14. Units: Output
- The output of node i will be a_i
- The output (a_i) is the direct result of the activation function (g) applied to the input function (in_i)
  - a_i = g(in_i) = g( Σ_{j=0..n} W_j,i a_j )
15. Neural Networks: Network Structures
- Units (nodes) are the building blocks of neural networks
- The power of neural networks comes from their structure
  - How units are linked together
- Two main types
  - Feed-forward networks (acyclic)
  - Recurrent networks (cyclic)
16. Feed-forward Networks
- Units (nodes) are usually arranged in layers
- Each unit receives input only from units in the preceding layer
- The network represents a function of its current inputs
  - No internal state other than the weights on the links
- Two types of feed-forward networks
  - Single layer
  - Multilayer
17. Single-layer Feed-forward Networks
- Also called a perceptron network
- All inputs are connected directly to the outputs
- Input units typically disseminate 1 (on) or 0 (off)
- Output units use the threshold activation function
  - threshold(in_i) = 1 if in_i > 0, 0 otherwise
- Perceptron networks can represent some Boolean functions
  - e.g. the majority function with 7 input units, which fires if more than half of the n inputs are 1 (see the sketch below)
    - Bias contribution: W_0,i × a_0 = (n/2) × (-1) = -3.5
    - Therefore, to reach the threshold, 4 or more input units must have a_j = 1
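As an illustration, a minimal Python sketch of the 7-input majority perceptron described above, assuming each input weight is 1 and the bias weight is W_0,i = n/2 (variable names are illustrative):

```python
def threshold(in_i):
    return 1 if in_i > 0 else 0

def majority_perceptron(inputs):
    """Fires (returns 1) iff more than half of the binary inputs are 1.

    inputs: list of n binary values (0 or 1); here n = 7.
    The bias input a_0 = -1 with weight n/2 shifts the threshold to n/2.
    """
    n = len(inputs)
    bias_weight = n / 2                               # W_0,i = 3.5 for n = 7
    in_i = (-1) * bias_weight + sum(1 * a for a in inputs)
    return threshold(in_i)

print(majority_perceptron([1, 1, 1, 1, 0, 0, 0]))     # 4 of 7 inputs on -> 1
print(majority_perceptron([1, 1, 1, 0, 0, 0, 0]))     # 3 of 7 inputs on -> 0
```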
18. Single-layer Feed-forward Networks: Representation
- Cannot represent all Boolean functions
- A threshold perceptron only returns 1 if its weighted sum of inputs is greater than 0
  - Σ_{j=0..n} W_j x_j > 0
- W · x = 0 defines a hyperplane in the input space
  - One side of it returns 0, the other side returns 1
- A threshold perceptron can only solve functions that are linearly separable
  - e.g. it cannot represent XOR, which is not linearly separable
19. Neural Network Learning Approaches
- Hebb's Rule
  - Strengthen weights between highly active neurons
- Hopfield Law
  - Extension of Hebb's rule that increments or decrements by a learning rate
- Kohonen's Learning Law
  - Units compete for the opportunity to learn; the winner can inhibit its competitors or excite its neighbors
- The Delta Rule
  - Adjust weights to minimise the difference between expected output and actual output for the training set
- Gradient Descent Rule
  - Extension of the Delta rule that also implements a learning rate
20. Single-layer Feed-forward Networks: Learning
- Two approaches to training
  - Unsupervised
  - Supervised
- The aim is to minimise a measure of error on the training set, which becomes an optimization search in weight space
- The measure of error is the sum of squared errors (E), based on
  - Err = y - h_w(x)
  - where
    - x is the input
    - h_w(x) is the output of the perceptron on the example
    - y is the true output
- Weights for the network are adjusted using an update rule (a sketch follows below)
  - W_j ← W_j + α × Err × g'(in) × x_j   (sigmoid)
  - W_j ← W_j + α × Err × x_j            (threshold)
  - where
    - α is the learning rate
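A minimal Python sketch of one weight update for a single sigmoid unit, following the rule above and using plain lists (all names are illustrative, not from the source):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def update_weights(weights, x, y, alpha=0.1):
    """One learning update for a single sigmoid unit.

    weights[j] is W_j, x[j] is the j-th input (x[0] = -1 is the bias input),
    y is the true output, alpha is the learning rate.
    Implements W_j <- W_j + alpha * Err * g'(in) * x_j.
    """
    in_i = sum(w * xj for w, xj in zip(weights, x))
    g = sigmoid(in_i)
    err = y - g                      # Err = y - h_w(x)
    g_prime = g * (1.0 - g)          # g'(in) for the sigmoid
    return [w + alpha * err * g_prime * xj for w, xj in zip(weights, x)]
```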
21. Single-layer Feed-forward Networks: Learning
- Training examples are run through the network one at a time; a complete pass through all examples is a cycle or epoch
- For each epoch, the weights are adjusted to reduce the error
- If Err = y - h_w(x) is positive, then the network output is too small
  - Weights of positive inputs are increased
  - Weights of negative inputs are decreased
  - The opposite happens when Err is negative
- This continues until a stopping criterion has been reached
  - Weight changes have become very small
  - Run out of time
- Pseudocode (a sketch of the full training loop is given below)
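The referenced pseudocode (taken from Russell and Norvig, per the sources slide) is not reproduced here; in its place, a minimal Python sketch of the same training loop for a single threshold unit, assuming a fixed number of epochs as the stopping criterion (all names are illustrative):

```python
def threshold(z):
    return 1 if z > 0 else 0

def train_perceptron(examples, n_inputs, alpha=0.1, epochs=100):
    """Train a single threshold unit on (x, y) examples.

    examples: list of (x, y) pairs, where x is a list of n_inputs values
              and y is 0 or 1. A bias input of -1 is prepended internally.
    Returns the learned weight list (bias weight first).
    """
    weights = [0.0] * (n_inputs + 1)
    for _ in range(epochs):
        for x, y in examples:
            x = [-1.0] + list(x)                      # bias input a_0 = -1
            in_i = sum(w * xj for w, xj in zip(weights, x))
            err = y - threshold(in_i)                 # Err = y - h_w(x)
            weights = [w + alpha * err * xj for w, xj in zip(weights, x)]
    return weights

# Example: learn the two-input OR function
examples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
print(train_perceptron(examples, n_inputs=2))
```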
22. Single-layer Feed-forward Networks: Performance
- Better at solving linearly separable functions than decision-tree learning
- Struggles to solve the restaurant example, which is not linearly separable
  - The best plane through the data correctly classifies only 65%
23. Multilayer Feed-forward Networks
- Most applications require at least three layers
  - Input layer (e.g. reads from files or electronic sensors)
  - Hidden layer(s)
  - Output layer (e.g. sends to another process or device)
- Enlarges the space of hypotheses the network can represent
  - A large enough hidden layer can represent any continuous function of the inputs
  - TWO hidden layers can represent discontinuous functions
- The problem of choosing the correct number of hidden units in advance is still a difficult task
24. Multilayer Feed-forward Networks: Learning
- Learning is similar to that in a single-layer FFN
- Differences
  - The output is a vector h_w(x) rather than a single value
  - Each example has an output vector y
- Major difference from the single-layer case
  - Calculation of the error at the hidden layers
  - There is no training data to guide the values in the hidden layers
- Solution
  - Back-propagate the error from the output layer to the hidden layers
25. Multilayer Feed-forward Networks: Back-propagation
- An extension of the perceptron learning algorithm
  - W_j ← W_j + α × Err × g'(in) × x_j
- To simplify the algorithm, define
  - Δ_i = Err × g'(in)
- New representation
  - W_j ← W_j + α × x_j × Δ_i
- Hidden node j is responsible for a fraction of the error Δ_i in each of the output nodes to which it connects
- The Δ_i values are divided among the connections based on the link weighting (W_j,i) between the hidden node and the output node
26. Multilayer Feed-forward Networks: Back-propagation
- These values are propagated back to provide the Δ_j values for the hidden layer
  - Δ_j = g'(in_j) Σ_i W_j,i Δ_i
- Weight update rule for connections from the input units to the hidden layer
  - W_k,j ← W_k,j + α × a_k × Δ_j
- Follow the same process backwards for networks with more layers; a combined sketch follows below
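A minimal Python sketch of one back-propagation update for a network with a single hidden layer of sigmoid units, following the update rules above (plain lists, no bias inputs for brevity; all names are illustrative):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def backprop_step(W_in_hid, W_hid_out, x, y, alpha=0.1):
    """One back-propagation update on a single example (x, y).

    W_in_hid[k][j]  is W_k,j (input k  -> hidden j)
    W_hid_out[j][i] is W_j,i (hidden j -> output i)
    """
    # Forward pass
    in_hid = [sum(x[k] * W_in_hid[k][j] for k in range(len(x)))
              for j in range(len(W_hid_out))]
    a_hid = [sigmoid(v) for v in in_hid]
    in_out = [sum(a_hid[j] * W_hid_out[j][i] for j in range(len(a_hid)))
              for i in range(len(y))]
    a_out = [sigmoid(v) for v in in_out]

    # Delta_i = Err_i * g'(in_i) at the output layer
    delta_out = [(y[i] - a_out[i]) * a_out[i] * (1 - a_out[i])
                 for i in range(len(y))]
    # Delta_j = g'(in_j) * sum_i W_j,i * Delta_i at the hidden layer
    delta_hid = [a_hid[j] * (1 - a_hid[j]) *
                 sum(W_hid_out[j][i] * delta_out[i] for i in range(len(y)))
                 for j in range(len(a_hid))]

    # Weight updates: W_j,i += alpha * a_j * Delta_i ; W_k,j += alpha * a_k * Delta_j
    for j in range(len(a_hid)):
        for i in range(len(y)):
            W_hid_out[j][i] += alpha * a_hid[j] * delta_out[i]
    for k in range(len(x)):
        for j in range(len(a_hid)):
            W_in_hid[k][j] += alpha * x[k] * delta_hid[j]
    return W_in_hid, W_hid_out
```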
27. Multilayer Feed-forward Networks: Back-propagation
28. Multilayer Feed-forward Networks: Back-propagation
- Summary
  - Compute the Δ values for the output units, using the observed error
  - Starting with the output layer, repeat the following for each layer in the network until the earliest layer is reached
    - Propagate the Δ values back to the previous layer
    - Update the weights between the two layers
29. Multilayer Feed-forward Networks: Performance
- Aims
  - Convergence to something close to the global optimum in weight space
  - A network that gives the highest prediction accuracy on validation sets
- From the restaurant example
  - (a) The training curve converges to a perfect fit for the training data
  - (b) The network learns well
    - Not as fast as decision-tree learning, for obvious reasons
    - Much improved over the single-layer network
- Can handle complexity well but requires the correct network structure
  - Number of hidden layers and hidden units
30. Neural Networks: Construction Guidelines
- There is no quantifiable approach to the layout of a network for a particular application
- Three rules (guidelines) followed by designers
  - More complexity in the relationship between the inputs and outputs should lead to an increase in the number of units in the hidden layer(s)
  - If the process being modeled has multiple stages, it may require multiple hidden layers; if not, multiple layers will merely enable memorization (undesirable)
  - The amount of training data sets an upper bound on the number of units in the hidden layer, roughly (a worked example follows below)
    - (Number of input-output pairs in the training set) / (Scaling factor × Number of input and output units in the network)
    - The scaling factor is usually between five (5) and ten (10); use larger values for noisy data
- A tradeoff exists: the more units in the hidden layer(s), the more likely memorisation of the training data becomes, but more units also allow for more complexity in the relationship between inputs and outputs
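A small worked example of that upper bound, under the assumption that the guideline divides the number of training pairs by the scaling factor times the total number of input and output units (the numbers are hypothetical):

```python
def max_hidden_units(n_training_pairs, n_inputs, n_outputs, scaling_factor=5):
    """Rule-of-thumb upper bound on hidden-layer units.

    scaling_factor is typically 5-10; use larger values for noisy data.
    """
    return n_training_pairs // (scaling_factor * (n_inputs + n_outputs))

# e.g. 1000 training pairs, 10 inputs, 2 outputs, scaling factor 5
print(max_hidden_units(1000, 10, 2, 5))   # -> 16
```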
31. Recurrent Networks
- Recurrent neural networks are an extension of feed-forward networks
- Feed-forward networks operate on an input space
- Recurrent networks operate on an input space AND an internal state space
  - The state space is a trace of what has already been processed by the network
- Allows for a more dynamic system, as its response depends on its initial state, which may depend on previous inputs (the state space)
- Allows for short-term memory, which brings new possibilities
- Allows functionality that more closely resembles a brain
- Learning can be very slow: back-propagation through time (BPTT)
32. Recurrent Networks
- Simple recurrent network (Elman network)
  - Context units in the input layer
  - Connections from the hidden layer to the input (context) layer with a fixed weight of 1
- Fully recurrent network
  - All units are connected to all other units
[Figure: simple recurrent network]
33. Other Types of Neural Networks
- Kohonen self-organizing network
- Recurrent Networks
- Hopfield Network
- Echo state network
- Stochastic neural networks
- Modular neural networks
- Committee of machines (CoM)
- Cascading neural networks
34. Neuroevolution
- The use of evolutionary algorithms for training the network
- Two methods available
  - Update the weights on connections within a fixed network topology
  - Update the weights AND the topology of the network itself
    - Add a link - complexification
    - Remove a link - simplification
- Software packages are available to support development
  - NeuroEvolution of Augmenting Topologies (NEAT)
35. Neural Networks: Limitations
- Not the solution for all computing problems
- For a neural network to be developed, a number of requirements need to be met
  - A data set that can characterize the problem
  - A large set of data for training and testing the network
  - An implementer who understands the problem and can decide on the activation functions and learning methods to be used
  - Adequate hardware to support the high demands for processing power
- Development of neural networks can be very difficult
  - Neural architects are emerging
  - There is an art to the development process
36. Neural Networks: Limitations
- Neural networks will continue to make mistakes
  - Hard to ensure it is the optimal network
- Not used for applications that require error-free results
- Neural networks frequently appear in applications where humans also struggle to be right all the time
  - Used where an accuracy below 100% is still a better result than the alternative system
- Some examples
  - Picking stocks
  - Approving or denying loans
37. Neural Networks: Application Areas
- Sensor processing
- System identification and control
- Vehicle control
- Process control
- Game-playing
- Backgammon
- Chess
- Racing
- Pattern recognition
- Radar systems
- Face identification
- Object recognition
38. Neural Networks: Application Areas
- Sequence recognition
- Gesture
- Speech
- Handwritten text
- Medical diagnosis
- Financial
- Automated trading systems
- Data mining
39. Neural Networks: Commercial Packages
- All commercial packages claim to be
  - Easy to use
  - Powerful
  - Customizable
- Packages available
  - NeuroSolutions - http://www.neurosolutions.com/products/ns/
  - Peltarion Synapse - http://www.peltarion.com/
40. Neural Networks: The Future
- Hybrid systems
  - Greater integration of fuzzy logic into neural networks
- Hardware specialized for neural networks
  - Greater speed
  - Neural networks will use many more neurons
  - Need more advanced, high-performing hardware
  - Allows greater functionality and performance
- New applications will emerge
- Current technologies will be improved
  - Greater sophistication and accuracy with better training methods and network architectures
41. Neural Networks: The Future
- Neural networks might, in the future, allow
  - Robots that can see, feel, and predict the world around them
  - Widespread use of self-driving cars
  - Composition of music
  - Conversion of handwritten documents to word-processing documents
  - Discovery of trends in the human genome to aid in the understanding of the data compiled by the Human Genome Project
  - Self-diagnosis of medical problems
  - and much more!
42. Neural Networks: The Future
43. Neural Networks: Sources
- Diagrams and pseudocode taken from
  - S. Russell and P. Norvig, Section 20.5, Artificial Intelligence: A Modern Approach, Prentice Hall, 2002
- Example perceptron algorithm
  - http://lcn.epfl.ch/tutorial/english/perceptron/html/index.html
44. Neural Networks: The End