Title: ICT619 Intelligent Systems Topic 4: Artificial Neural Networks
1 ICT619 Intelligent Systems Topic 4: Artificial Neural Networks
2 Artificial Neural Networks
- PART A
- Introduction
- An overview of the biological neuron
- The synthetic neuron
- Structure and operation of an ANN
- Problem solving by an ANN
- Learning in ANNs
- ANN models
- Applications
- PART B
- Developing neural network applications
- Design of the network
- Training issues
- A comparison of ANN and ES
- Hybrid ANN systems
- Case Studies
3 Introduction
- Artificial Neural Networks (ANN)
- Also known as
- Neural networks
- Neural computing (or neuro-computing) systems
- Connectionist models
- ANNs simulate the biological brain for problem
solving - This represents a totally different approach to
machine intelligence from the symbolic logic
approach - The biological brain is a massively parallel
system of interconnected processing elements - ANNs simulate a similar network of simple
processing elements at a greatly reduced scale
4 Introduction
- ANNs adapt themselves using data to learn problem
solutions - ANNs can be particularly effective for problems
that are hard to solve using conventional
computing methods - First developed in the 1950s; interest slumped in the 1970s
- Great upsurge in interest in the mid 1980s
- Both ANNs and expert systems are non-algorithmic
tools for problem solving - ES rely on the solution being expressed as a set
of heuristics by an expert - ANNs learn solely from data.
6 An overview of the biological neuron
- Estimated 1000 billion neurons in the human
brain, with each connected to up to 10,000 others - Electrical impulses produced by a neuron travel
along the axon - The axon connects to dendrites through synaptic
junctions
7 An overview of the biological neuron [figure only]
8 An overview of the biological neuron
- A neuron collects the excitation of its inputs
and "fires" (produces a burst of activity) when
the sum of its inputs exceeds a certain threshold - The strengths of a neuron's inputs are modified
(enhanced or inhibited) by the synaptic junctions - Learning in our brains occurs through a
continuous process of new interconnections
forming between neurons, and adjustments at the
synaptic junctions
9 The synthetic neuron
- A simple model of the biological neuron, first proposed in 1943 by McCulloch and Pitts, consists of a summing function with an internal threshold and "weighted" inputs, as shown below.
10 The synthetic neuron (contd)
- For a neuron receiving n inputs, each input xi (i ranging from 1 to n) is weighted by multiplying it with a weight wi
- The sum of the products wi xi gives the net activation value of the neuron
- The activation value is subjected to a transfer function to produce the neuron's output
- The weight value of the connection carrying signals from a neuron i to a neuron j is termed wij
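As a concrete illustration of this weighted-sum-and-threshold behaviour, here is a minimal Python sketch of a McCulloch-Pitts-style neuron; the inputs, weights and threshold values are made up for the example:

```python
# Minimal synthetic neuron: weighted sum followed by a step transfer function.
def neuron_output(inputs, weights, threshold):
    # Net activation: sum of wi * xi over all inputs
    activation = sum(w * x for w, x in zip(weights, inputs))
    # Step transfer function: fire (output 1) only if activation reaches the threshold
    return 1 if activation >= threshold else 0

# Example with three inputs and illustrative weights: activation = 1.3, so the neuron fires
print(neuron_output([1, 0, 1], [0.5, -0.3, 0.8], threshold=1.0))  # -> 1
```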
11 Transfer functions
- These compute the output of a node from its net
activation. Among the popular transfer functions
are - Step function
- Signum (or sign) function
- Sigmoid function
- Hyperbolic tangent function
- In the step function, the neuron produces an output only when its net activation reaches a minimum value known as the threshold
- For a binary neuron i, whose output is a 0 or 1 value, the step function can be summarised as: output_i = 1 if activation_i >= threshold_i, and output_i = 0 otherwise
12 Transfer functions (contd)
- The sign function outputs either -1 or +1. To avoid confusion with 'sine' it is often called signum: output_i = +1 if activation_i >= 0, and output_i = -1 otherwise
13 Transfer functions (contd)
- The sigmoid
- The sigmoid transfer function produces a continuous value in the range 0 to 1: output_i = 1 / (1 + e^(-gain * activation_i))
- The parameter gain affects the slope of the function around zero
14 Transfer functions (contd)
- The hyperbolic tangent
- A variant of the sigmoid transfer function
- Has a shape similar to the sigmoid (like an S), with the difference being that the value of output_i ranges between -1 and +1
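For reference, the four transfer functions above can be written out directly. A minimal Python sketch follows; the gain default and the sample activation values are illustrative:

```python
import math

# The four popular transfer functions named on the preceding slides.
def step(activation, threshold=0.0):
    return 1 if activation >= threshold else 0          # binary 0/1 output

def signum(activation):
    return 1 if activation >= 0 else -1                 # binary -1/+1 output

def sigmoid(activation, gain=1.0):
    return 1.0 / (1.0 + math.exp(-gain * activation))   # continuous, 0 to 1

def hyperbolic_tangent(activation):
    return math.tanh(activation)                        # continuous, -1 to +1

for a in (-2.0, 0.0, 2.0):
    print(a, step(a), signum(a), round(sigmoid(a), 3), round(hyperbolic_tangent(a), 3))
```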
15 Structure and operation of an ANN
- The building block of an ANN is the artificial
neuron. It is characterised by - weighted inputs
- summing and transfer function
- The most common architecture of an ANN consists
of two or more layers of artificial neurons or
nodes, with each node in a layer connected to
every node in the following layer - Signals usually flow from the input layer, which
is directly subjected to an input pattern, across
one or more hidden layers towards the output
layer.
16 Structure and operation of an ANN (contd)
- The most popular ANN architecture, known as the
multilayer perceptron (shown in diagram above),
follows this model. - In some models of the ANN, such as the
self-organising map (SOM) or Kohonen net, nodes
in the same layer may have interconnections among
them - In recurrent networks, connections can even go
backwards to nodes closer to the input
17 Problem solving by an ANN
- The inputs of an ANN are data values grouped
together to form a pattern - Each data value (component of the pattern vector)
is applied to one neuron in the input layer - The output value(s) of node(s) in the output
layer represent some function of the input
pattern
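To make this mapping concrete, the sketch below pushes one input pattern through a small fully connected network with one hidden layer; the layer sizes, weight values and 0.5 decision threshold are illustrative assumptions:

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def layer(inputs, weights):
    # One output per node: weighted sum of the inputs passed through the sigmoid
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]

pattern = [0.8, 0.2]                                   # one value per input-layer node
hidden = layer(pattern, [[0.5, -0.6], [0.3, 0.9], [-0.4, 0.7]])
output = layer(hidden, [[0.8, -0.5, 0.6]])
print("class A" if output[0] >= 0.5 else "class B")    # two-class decision
```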
18 Problem solving by an ANN (contd)
- In the example above, the ANN maps the input pattern to one of two classes
- The ANN produces the output for an accurate prediction only if the functional relationships between the relevant variables, namely the components of the input pattern, and the corresponding output have been learned by the ANN
- Any three-layer ANN can (at least in theory) represent the functional relationship between an input pattern and its class
- It may be difficult in practice for the ANN to learn a given relationship
19 Learning in ANNs
- Common human learning behaviour: repeatedly going through the same material, making mistakes and learning, until able to carry out a given task successfully
- Learning by most ANNs is modelled after this type of human learning
- Learned knowledge to solve a given problem is stored in the interconnection weights of an ANN
- The process by which an ANN arrives at the right values of these weights is known as learning or training
20 Learning in ANNs (contd)
- Learning in ANNs takes place through an iterative training process during which node interconnection weight values are adjusted
- Initial weights, usually small random values, are assigned to the interconnections between the ANN nodes
- Like knowledge acquisition in ES, learning in ANNs can be the most time consuming phase in their development
21 Learning in ANNs (contd)
- ANN learning (or training) can be supervised or unsupervised
- In supervised training,
- data sets consisting of pairs, each pair an input pattern and its expected correct output value, are used
- The weight adjustments during each iteration aim to reduce the error (difference between the ANN's actual output and the expected correct output)
- E.g., a node producing a small negative output when it is expected to produce a large positive one has its positive weight values increased and the negative weight values decreased, as in the sketch below
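A minimal sketch of this kind of error-driven weight adjustment for a single linear node; the inputs, target and learning rate are invented, and this is the simple delta rule rather than the full multilayer procedure described on the following slides:

```python
# One supervised training step for a single linear node (illustrative values).
def train_step(weights, inputs, target, learning_rate=0.1):
    actual = sum(w * x for w, x in zip(weights, inputs))
    error = target - actual                     # expected output minus actual output
    # Each weight moves in the direction that reduces the error for this pattern
    return [w + learning_rate * error * x for w, x in zip(weights, inputs)]

weights = [0.1, -0.2]
for _ in range(50):                             # repeated presentation of one training pair
    weights = train_step(weights, inputs=[1.0, 0.5], target=1.0)
print(weights)                                  # the node's output is now close to 1.0
```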
22 Learning in ANNs
- In supervised training,
- pairs of sample input values and corresponding output values are used to train the net repeatedly until the output becomes satisfactorily accurate
- In unsupervised training,
- there is no known expected output used for guiding the weight adjustments
- The function to be optimised can be any function of the inputs and outputs, usually set by the application
- the net adapts itself to align its weight values with training patterns
- This results in groups of nodes responding strongly to specific groups of similar input patterns
23 The two states of an ANN
- A neural network can be in one of two states: training mode or operation mode
- Most ANNs learn off-line and do not change their weights once training is finished and they are in operation
- In an ANN capable of on-line learning, training and operation continue together
- ANN training can be time consuming, but once trained, the resulting network can be made to run very efficiently, providing fast responses
24 ANN models
- ANNs are supposed to model the structure and operation of the biological brain
- But there are different types of neural networks depending on the architecture, learning strategy and operation
- Three of the most well-known models are
- The multilayer perceptron
- The Kohonen network (the Self-Organising Map)
- The Hopfield net
- The Multilayer Perceptron (MLP) is the most popular ANN architecture
25 The Multilayer Perceptron
- Nodes are arranged into an input layer, an output
layer and one or more hidden layers - Also known as the backpropagation network
because of the use of error values from the
output layer in the layers before it to calculate
weight adjustments during training. - Another name for the MLP is the feedforward
network.
26 MLP learning algorithm
- The learning rule for the multilayer perceptron
is known as "the generalised delta rule" or the
"backpropagation rule" - The generalised delta rule repeatedly calculates
an error value for each input, which is a
function of the squared difference between the
expected correct output and the actual output - The calculated error is backpropagated from one
layer to the previous one, and is used to adjust
the weights between connecting layers
27 MLP learning algorithm (contd)
- New weight = old weight + change calculated from square of error; Error = difference between desired output and actual output
- Training stops when error becomes acceptable, or
after a predetermined number of iterations - After training, the modified interconnection
weights form a sort of internal representation
that enables the ANN to generate desired outputs
when given the training inputs or even new
inputs that are similar to training inputs - This generalisation is a very important property
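The sketch below condenses slides 26-27 into a runnable training loop for a tiny 2-2-1 MLP. It is a minimal illustration rather than a reference implementation: the XOR training pairs, learning rate, epoch count and bias handling (an extra input fixed at 1) are all assumptions made for the example:

```python
import math, random

random.seed(1)

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward(x, w_hidden, w_out):
    xb = x + [1.0]                                  # extra input fixed at 1 acts as a bias
    h = [sigmoid(sum(w * xi for w, xi in zip(row, xb))) for row in w_hidden]
    hb = h + [1.0]
    o = sigmoid(sum(w * hi for w, hi in zip(w_out, hb)))
    return xb, h, hb, o

# 2-2-1 network; initial weights are small random values (see slide 20)
w_hidden = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
w_out = [random.uniform(-1, 1) for _ in range(3)]
rate = 0.5
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]

for epoch in range(10000):                          # stop after a set number of iterations
    for x, target in data:
        xb, h, hb, o = forward(x, w_hidden, w_out)
        # Error term for the output node, then backpropagated terms for hidden nodes
        d_out = (target - o) * o * (1 - o)
        d_hid = [d_out * w_out[j] * h[j] * (1 - h[j]) for j in range(2)]
        # New weight = old weight + rate * error term * incoming signal
        w_out = [w + rate * d_out * hi for w, hi in zip(w_out, hb)]
        for j in range(2):
            w_hidden[j] = [w + rate * d_hid[j] * xi for w, xi in zip(w_hidden[j], xb)]

for x, t in data:
    print(x, t, round(forward(x, w_hidden, w_out)[3], 2))
# With these settings the net usually learns XOR, though gradient descent
# can occasionally settle in a local minimum (see the slides that follow).
```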
28 The error landscape in a multilayer perceptron
- For a given pattern p, the error Ep can be plotted against the weights to give the so-called error surface
- The error surface is a landscape of hills and
valleys, with points of minimum error
corresponding to wells and maximum error found on
peaks. - The generalised delta rule aims to minimise Ep by
adjusting weights so that they correspond to
points of lowest error - It follows the method of gradient descent where
the changes are made in the steepest downward
direction - All possible solutions are depressions in the
error surface, known as basins of attraction
29 The error landscape in a multilayer perceptron
[figure: the error surface, Ep plotted against two weight axes i and j]
30 Learning difficulties in multilayer perceptrons - local minima
- The MLP may fail to settle into the global
minimum of the error surface and instead find
itself in one of the local minima - This is due to the gradient descent strategy
followed - A number of alternative approaches can be taken
to reduce this possibility - Lowering the gain term progressively
- Used to influence rate at which weight changes
are made during training - Value by default is 1, but it may be gradually
reduced to reduce the rate of change as training
progresses
31 Learning difficulties in multilayer perceptrons (contd)
- Addition of more nodes for better representation
of patterns - Too few nodes (and consequently not enough
weights) can cause failure of the ANN to learn a
pattern - Introduction of a momentum term
- Determines effect of past weight changes on current direction of movement in weight space (see the sketch below)
- Momentum term is also a small numerical value in the range 0 to 1
- Addition of random noise to perturb the ANN out of local minima
- Usually done by adding small random values to weights
- Takes the net to a different point on the error surface, hopefully out of a local minimum
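A minimal sketch of a weight update combining the gain term with a momentum term, as described above; the parameter values and gradient sequence are made up, with the gradient standing in for whatever the backpropagation pass computes:

```python
# Gradient-descent weight update with a momentum term (illustrative values).
def update(weight, gradient, prev_change, gain=0.25, momentum=0.9):
    # The new change blends the current downhill step with the previous change,
    # which helps the search roll through shallow local minima
    change = -gain * gradient + momentum * prev_change
    return weight + change, change

w, prev = 0.5, 0.0
for g in [0.4, 0.3, -0.1, 0.05]:        # made-up gradient sequence
    w, prev = update(w, g, prev)
    print(round(w, 4))
```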
32 The Kohonen network (the self-organising map)
- Biological systems display both supervised and
unsupervised learning behaviour - A neural network with unsupervised learning
capability is said to be self-organising - During training, the Kohonen net changes its
weights to learn appropriate associations,
without any right answers being provided
33 The Kohonen network (contd)
- The Kohonen net consists of an input layer that distributes the inputs to every node in a second layer, known as the competitive layer
- The competitive (output) layer is usually organised into some 2-D or 3-D surface (feature map)
34 Operation of the Kohonen Net
- Each neuron in the competitive layer is connected
to other neurons in its neighbourhood - Neurons in the competitive layer have excitatory
(positively weighted) connections to immediate
neighbours and inhibitory (negatively weighted)
connections to more distant neurons. - As an input pattern is presented, some of the
neurons in the competitive layer are sufficiently
activated to produce outputs, which are fed to
other neurons in their neighbourhoods - The node with the set of input weights closest to
the input pattern component values produces the
largest output. This node is termed the best
matching (or winning) node
35 Operation of the Kohonen Net (contd)
- During training, input weights of the best
matching node and its neighbours are adjusted to
make them resemble the input pattern even more
closely - At the completion of training, the best matching
node ends up with its input weight values aligned
with the input pattern and produces the strongest
output whenever that particular pattern is
presented - The nodes in the winning node's neighbourhood
also have their weights modified to settle down
to an average representation of that pattern
class - As a result, the net is able to represent clusters of similar input patterns - a feature found useful for data mining applications, for example (see the sketch below).
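The sketch below condenses slides 34-35 into one Kohonen training step: locate the best matching node, then adjust it and its neighbourhood towards the input pattern. The grid size, neighbourhood radius and learning rate are illustrative assumptions:

```python
import random

random.seed(0)
GRID, DIM = 5, 3          # 5x5 competitive layer; patterns have 3 components
weights = [[[random.random() for _ in range(DIM)]
            for _ in range(GRID)] for _ in range(GRID)]

def train_step(pattern, rate=0.3, radius=1):
    # Best matching (winning) node: input weights closest to the pattern
    bmu = min(((r, c) for r in range(GRID) for c in range(GRID)),
              key=lambda rc: sum((w - p) ** 2
                                 for w, p in zip(weights[rc[0]][rc[1]], pattern)))
    # Move the winner and its immediate neighbourhood closer to the pattern
    for r in range(GRID):
        for c in range(GRID):
            if abs(r - bmu[0]) <= radius and abs(c - bmu[1]) <= radius:
                weights[r][c] = [w + rate * (p - w)
                                 for w, p in zip(weights[r][c], pattern)]
    return bmu

print(train_step([0.9, 0.1, 0.4]))   # grid position of the winning node
```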
36 The Hopfield Model
- The Hopfield net is the most widely known of all the autoassociative (pattern completing) ANNs
- In autoassociation, a noisy or partially incomplete input pattern causes the network to stabilise to a state corresponding to the original pattern
- It is also useful for optimisation tasks
- The Hopfield net is a recurrent ANN in which the output produced by each neuron is fed back as input to all other neurons
- Neurons compute a weighted sum with a step transfer function
37 The Hopfield Model (contd)
- The Hopfield net has no iterative learning algorithm as such. Patterns (or facts) are simply stored by adjusting the weights to lower a term called network energy
- During operation, an input pattern is applied to all neurons simultaneously and the network is left to stabilise
- Outputs from the neurons in the stable state form the output of the network
- When presented with an input pattern, the net outputs the stored pattern nearest to the presented pattern (see the sketch below)
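A hedged sketch of Hopfield-style storage and recall for +1/-1 patterns. The Hebbian outer-product storage rule and synchronous signum updates used here are one standard formulation, not necessarily the exact one intended by the slides; the stored patterns and the noisy probe are made up:

```python
# Sketch of Hopfield storage and recall for +1/-1 patterns (illustrative data).

def store(patterns, n):
    # Hebbian outer-product rule; no self-connections
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w

def recall(w, state, steps=5):
    n = len(state)
    for _ in range(steps):               # iterate until the state (hopefully) stabilises
        state = [1 if sum(w[i][j] * state[j] for j in range(n)) >= 0 else -1
                 for i in range(n)]
    return state

stored = [[1, -1, 1, -1, 1, -1], [1, 1, 1, -1, -1, -1]]
w = store(stored, 6)
noisy = [1, -1, 1, -1, 1, 1]             # first pattern with one flipped bit
print(recall(w, noisy))                  # -> settles back to [1, -1, 1, -1, 1, -1]
```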
38 When ANNs should be applied
- Difficulties with some real-life problems
- Solutions are difficult, if not impossible, to define algorithmically, due mainly to their unstructured nature
- Too many variables, and/or the interactions of relevant variables not understood well
- Input data may be partially corrupt or missing, making it difficult for a logical sequence of solution steps to function effectively
39 When ANNs should be applied (contd)
- The typical ANN attempts to arrive at an answer
by learning to identify the right answer through
an iterative process of self-adaptation or
training - If there are many factors, with complex
interactions among them, the usual "linear"
statistical techniques may be inappropriate - If sufficient data is available, an ANN can find
the relevant functional relationship by means of
an adaptive learning procedure from the data
40 Current applications of ANNs
- ANNs are good at recognition and classification
tasks - Due to their ability to recognise complex
patterns, ANNs have been widely applied in
character, handwritten text and signature
recognition, as well as more complex images such
as faces - They have also been used successfully for speech
recognition and synthesis - ANNs are being used in an increasing number of
applications where high-speed computation of
functions is important, eg, in industrial robotics
41 Current applications of ANNs (contd)
- One of the more successful applications of ANNs
has been as a decision support tool in the area
of finance and banking - Some examples of commercial applications of ANN
are - Financial market analysis for investment decision
making - Sales support - targeting customers for
telemarketing - Bankruptcy prediction
- Intelligent flexible manufacturing systems
- Stock market prediction
- Resource allocation - scheduling and management of personnel and equipment
42 ANN applications - broad categories
- According to a survey (Quaddus & Khan, 2002) covering the period 1988 up to mid 1998, the main business application areas of ANNs are
- Production (36%)
- Information systems (20%)
- Finance (18%)
- Marketing & distribution (14.5%)
- Accounting/Auditing (5%)
- Others (6.5%)
43 ANN applications - broad categories (contd)
- The levelling off of publications on ANN
applications may be attributed to the ANN moving
from the research to the commercial application
domain - The emergence of other intelligent system tools
may be another factor
44 Some advantages of ANNs
- Able to take incomplete or corrupt data and provide approximate results
- Good at generalisation, that is, recognising patterns similar to those learned during training
- Inherent parallelism makes them fault-tolerant: loss of a few interconnections or nodes leaves the system relatively unaffected
- Parallelism also makes ANNs fast and efficient for handling large amounts of data
45 ANN state-of-the-art overview
- Currently neural network systems are available as
- Software simulation on conventional computers (the prevalent form)
- Special purpose hardware that models the parallelism of neurons
- ANN-based systems are not likely to replace conventional computing systems, but they are an established alternative to the symbolic logic approach to information processing
- A new computing paradigm in the form of hybrid intelligent systems has emerged, often involving ANNs with other intelligent system tools
46 REFERENCES
- AI Expert (special issue on ANN), June 1990.
- BYTE (special issue on ANN), Aug. 1989.
- Caudill, M., "The View from Now", AI Expert, June 1992, pp. 27-31.
- Dhar, V. and Stein, R., Seven Methods for Transforming Corporate Data into Business Intelligence, Prentice Hall, 1997.
- Kirrmann, H., "Neural Computing: The new gold rush in informatics", IEEE Micro, June 1989, pp. 7-9.
- Lippmann, R.P., "An Introduction to Computing with Neural Nets", IEEE ASSP Magazine, April 1987, pp. 4-21.
- Lisboa, P. (Ed.), Neural Networks: Current Applications, Chapman & Hall, 1992.
- Negnevitsky, M., Artificial Intelligence: A Guide to Intelligent Systems, Addison-Wesley, 2005.
47 REFERENCES (contd)
- Quaddus, M.A. and Khan, M.S., "Evolution of Artificial Neural Networks in Business Applications: An Empirical Investigation Using a Growth Model", International Journal of Management and Decision Making, Vol. 3, No. 1, March 2002, pp. 19-34. (See also the ANN application publications EndNote library files, ICT619 ftp site.)
- Wasserman, P.D., Neural Computing: Theory and Practice, Van Nostrand Reinhold, New York, 1989.
- Wong, B.K., Bodnovich, T.A. and Selvi, Y., "Neural network applications in business: A review and analysis of the literature (1988-95)", Decision Support Systems, 19, 1997, pp. 301-320.
- Zahedi, F., Intelligent Systems for Business, Wadsworth Publishing, Belmont, California, 1993.
- http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html