Title: Introduction to Artificial Neural Networks
1 Introduction to Artificial Neural Networks
- Dinesh P Mital
- UMDNJ-SHRP
2 What is a Neural Network? An Artificial Neural
Network (ANN) is an information-processing
paradigm inspired by the way biological
nervous systems, such as the brain, process
information. The key element of this paradigm
is the novel structure of the information
processing system. It is composed of a large
number of highly interconnected processing
elements (neurons) working together to solve specific
problems.
3 Why use neural networks? Neural networks have a
remarkable ability to derive meaning from
complicated or imprecise data. These networks
can be used to extract patterns and detect trends
that are too complex to be noticed by either
humans or other computer techniques.
Adaptive learning: an ability to learn how to do
tasks based on the data given for training or
initial experience.
Self-organization: an ANN can create its own
organization during learning.
Real-time operation.
Ideal for imprecise and noisy data.
4 Von Neumann Computer Versus Biological Neural System
(first item in each pair: Von Neumann computer; second: biological neural system)
- Processor: complex vs. simple; high speed vs. low speed; one or a few vs. a large number
- Memory: separate from processor vs. integrated into the processor; non-content addressable vs. content addressable
- Computing: centralized vs. decentralized; sequential vs. parallel; stored program vs. self-learning
- Reliability: very vulnerable vs. robust
- Expertise: numerical and symbolic vs. perceptual problems
- Operative environment: well defined and constrained vs. poorly defined and unconstrained
5 Comparison between Neural Networks, Expert
Systems, and Conventional Programming
6 Introduction
- 1943- The emergence of Connectionist AI
- Artificial Neural Network
- A Massive Interconnection of Parallel Distributed
Computing Elements - Inspired by the Human Brain
- Massive Parallelism
- Simple Computing Elements
- Distribution of Knowledge
7 Biological Neuron
(figure: biological neuron, with the axon labeled)
8 Artificial Neural Networks
- An ANN is a model that emulates the biological
neural network.
- Biological neurons receive inputs through
dendrites and pass signals to other neurons
through the axon.
- The nucleus is the processing element of the neuron.
- An ANN is composed of artificial neurons; these are
the processing elements (PE). Each neuron
receives input(s), processes the inputs, and delivers a
single output.
- The synapse decides whether to amplify or
attenuate the signal.
- A single output signal from an axon can go to
multiple dendrites.
9
- The basic element of a neural network is the
perceptron.
- First proposed by Frank Rosenblatt in 1958 at
Cornell University, the perceptron has five basic
elements: an n-vector input, weights, a summing
function, a threshold device, and an output.
- Outputs are -1 and/or 1. The threshold has a
setting which governs the output based on the
summation of the weighted inputs. If the
summation falls below the threshold setting, the
output is -1; if the summation exceeds the
threshold setting, the output is 1 (see the sketch below).
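As an illustration (not from the slides), a minimal Python sketch of this perceptron forward pass, with made-up inputs, weights, and a zero threshold:

```python
# Perceptron forward pass: weighted sum of the inputs, then a hard
# threshold that outputs -1 or +1, as described above.
def perceptron(inputs, weights, threshold=0.0):
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total > threshold else -1

# Illustrative 3-element input vector and weights (not from the slides)
print(perceptron([1.0, -2.0, 0.5], [0.4, 0.1, 0.7]))
```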
11 Perceptron
(figure: perceptron with bias b)
12 Mathematical Model
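The equation on this slide is not transcribed; a standard form of the model, consistent with the summing function and threshold device described above, is:

```latex
% Weighted sum of the n inputs plus a bias b, passed through a threshold f
Y = \sum_{i=1}^{n} w_i x_i + b,
\qquad
Y_T = f(Y) =
\begin{cases}
+1 & \text{if } Y \ge \theta \\
-1 & \text{if } Y < \theta
\end{cases}
```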
13 Perceptron
(figure: perceptron with inputs and output)
14 Artificial Neural Networks
- Inputs to the perceptron can be raw data or
outputs from other processing elements.
- Outputs of the perceptron can be the final
product or the input to another neuron.
The Network
- An ANN is composed of a collection of neurons
that are grouped in layers, with a minimum of two
layers (input and output). The other layers are
known as hidden layers.
- A typical structure is shown on the next page.
- The processing of information is massively
parallel, as it is in our brain.
16 Multilayered Perceptron Model
(figure: multilayer perceptron using a sigmoid function)
17 Processing of Information
Inputs
- Each input corresponds to a single attribute of
the problem.
- For example, for the diagnosis of a disease, each
symptom could represent an input to one node.
- The input could be an image (pattern) of skin texture
if we are looking to diagnose normal versus
cancerous cells.
Outputs
- The outputs of the network represent the
solution to the problem.
- For diagnosis of a disease, the answer could be
yes or no.
Weights
- A key element of an ANN is the weight.
- A weight expresses the relative strength of the
entering signal from the various connections that
transfer data from the input point to the output
point.
18 (figure: a weighted connection, with weight W13 applied to input x1, giving x1*W13)
19 Processing of Information
Summation Function
- Finds the weighted sum of all input elements
entering the PE.
- If there are several output neurons, the output
at the jth neuron is
Yj = Σi (xi · Wij)
where Wij is the weight from the ith input node to the jth
output node (see the sketch below).
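In matrix form this is simply an input-weight product. A small numpy sketch (the numbers and shapes are illustrative, not from the slides):

```python
import numpy as np

x = np.array([3.0, 1.0, 2.0])        # inputs x_i
W = np.array([[0.2, 0.5],            # W[i, j] = weight from input i
              [0.4, 0.3],            #           to output neuron j
              [0.1, 0.8]])

Y = x @ W                            # Y_j = sum_i x_i * W_ij
print(Y)                             # one summed value per output neuron
```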
20 Processing of Information
Transfer Function
- The summation function computes the internal
cumulative signal value.
- There is also an activation level of the neuron.
Based on the cumulative value of the signal received,
the neuron may or may not produce an output.
- The relationship between the activation level and
the output of the neuron may be linear or
non-linear.
- The selection of the specific activation
function determines the network's operation.
- One popular function is the sigmoid function
YT = 1 / (1 + e^(-Y))
where YT is the transformed value of Y.
21 Processing of Information
Transfer Function
- The purpose of the transfer function is to modify the
output level to a reasonable value (between 0 and
1). This transformation is done before the output
reaches the next level.
Example (a code sketch follows this list)
- x1 = 3, w1 = 0.2
- x2 = 1, w2 = 0.4    =>  Y = 1.2
- x3 = 2, w3 = 0.1    =>  YT = f(Y)
- You can also use a simple threshold value.
(figure: processing element with inputs X1, X2, X3, weights w1, w2, w3, a summation Σ producing Y, and a transfer function producing YT)
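The same computation in Python, reproducing the slide's numbers and using the sigmoid of the previous slide as f:

```python
import math

x = [3, 1, 2]          # inputs x1, x2, x3
w = [0.2, 0.4, 0.1]    # weights w1, w2, w3

Y = sum(xi * wi for xi, wi in zip(x, w))    # 3*0.2 + 1*0.4 + 2*0.1 = 1.2
YT = 1.0 / (1.0 + math.exp(-Y))             # sigmoid transform, about 0.77

print(Y, YT)
```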
22 Neuron Transfer Functions
1. Pure Linear Transfer Function
2. Hard Limit Transfer Function
3. Log Sigmoid Transfer Function
(figure: plots of the three transfer functions; the hard limit output switches between -1 and 1)
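Simple Python sketches of the three transfer functions named above (the threshold of 0 and the symmetric -1/+1 hard limit follow the figure and are otherwise assumptions):

```python
import math

def purelin(y):     # 1. pure linear: output equals the summed input
    return y

def hardlim(y):     # 2. hard limit: switches between -1 and +1 at 0
    return 1 if y >= 0 else -1

def logsig(y):      # 3. log sigmoid: squashes y into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-y))
```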
23 A Multi-layered Network Function
24 Processing of Information
Learning
- An ANN learns from its experience. The usual
process of learning involves three tasks:
- Compute the output(s).
- Compare the outputs with the desired patterns and
feed back the error.
- Adjust the weights and repeat the process.
- The learning process starts by setting the
weights by some rule (or randomly). The
difference between the actual output (y) and the
desired output (z) is called the error (delta).
- The objective is to minimize the delta (error) to
zero. The reduction in error is done by changing
the weights.
25
- A formal definition of learning in the context
of the network model is: the process
of updating network connection weights so that
the network can perform a specific task
efficiently.
- The network learns (or modifies) the connection
weights from the available training patterns
(available data).
- The performance of the network improves over time
as the weights are iteratively updated.
- The ability of ANNs to learn automatically from examples
(data), instead of following a set of rules specified by
human experts, makes them attractive and exciting.
- This is one of the major advantages of neural
networks over traditional expert systems.
26
- The key to adaptive learning is to change the
weights in the right direction, so as to reduce the
error.
- There are various algorithms for adjusting
weights; a few will be introduced later.
- A procedure for developing NN-based applications
(sketched in code after this list) is:
- 1. Collect data.
- 2. Separate the data into training and test sets.
- 3. Define (select) a network structure.
- 4. Select a learning algorithm.
- 5. Transform the data to network inputs (training
data).
- 6. Start training and revise the weights until the
error criterion is satisfied.
- 7. Stop and test the results with the test data.
- 8. Implementation: use the network for testing new
cases.
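A hedged sketch of these eight steps using scikit-learn; the data, network size, and stopping criterion are placeholders, not values from the slides:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# 1. Collect data (random placeholder data here).
X = np.random.rand(200, 4)              # 200 cases, 4 input attributes
y = (X.sum(axis=1) > 2.0).astype(int)   # a made-up binary outcome

# 2. Separate the data into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

# 3-4. Define a network structure and select a learning algorithm.
net = MLPClassifier(hidden_layer_sizes=(5,), solver="sgd",
                    learning_rate_init=0.1, max_iter=2000, tol=1e-4)

# 5-6. Present the training inputs and revise weights until the error
#      criterion (tol) is satisfied or max_iter is reached.
net.fit(X_train, y_train)

# 7. Test the results with the test data.
print("test accuracy:", net.score(X_test, y_test))

# 8. Use the trained network on new cases.
print(net.predict(X_test[:3]))
```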
27 General Considerations for Network Design
- Which input attributes will be used to build and
train the network? Use data points with minimum
correlation (independent).
- Which network architecture will be suitable for
the study?
- How many hidden layers should the network
contain?
- How many nodes should there be in each hidden
layer?
- What conditions will terminate the network
training?
28 Strengths of the Network
- Neural networks are very suitable for noisy or
partial data sets. Transfer functions such as
sigmoid functions normally smooth out the
variations.
- ANNs can process and predict numeric as well as
categorical outcomes.
- ANNs can be used for applications that require a
time element to be included in the data set.
- Neural networks have performed well in certain
domains where rules are not defined and there is
no structure.
- The network can be trained for supervised and
unsupervised clustering.
29 Weaknesses
- The biggest weakness is that they lack a
criterion (reason) for the decision. This is
important at times.
- The learning algorithms are not guaranteed to
converge to an optimal solution. However, you can
experiment with various learning parameters.
- Neural networks can easily be over-trained
(memorize) to the point of working well with the
training data but performing poorly on test data.
You have to monitor this problem carefully.
30 Forward and Backward Propagation
31 Developing NN Models
- One of the important steps is the selection of the
network structure. We will discuss detailed
structures at a later stage.
Associative Memory Systems
- This refers to the ability to recall complete
situations from partial information. Such systems
correlate input data with information stored in
memory.
- Information can be recalled even from incomplete
or noisy inputs.
- Associative memory systems can detect
similarities between new inputs and stored input
patterns, using a distance criterion.
Hidden Layer Systems
- Complex practical applications require one or
more hidden layers between the inputs and outputs,
and a correspondingly large number of weights.
32 Hidden Layer Systems (contd.)
- Using more than three layers is rare.
- The amount of computation involved is enormous.
Double-Layered Networks
- This structure does not require knowledge of the
precise number of classes in the training data
(unsupervised). It is normally used in cases
where the output is not given and only input data
are available.
- Instead, it uses a feed-forward and feed-backward
approach to adjust parameters/weights as data
are analyzed, to establish an arbitrary
(required) number of categories that represent
the data presented to the system.
33 Back-propagation Network
- It is the most widely used architecture. It is a
very popular technique that is relatively easy to
implement. It requires a large amount of training
data for conditioning the network before using it
to predict the outcome.
- A back-propagation network includes at least one
hidden layer.
- The approach is considered a feed-forward/
back-propagation approach.
Limitations
- NNs do not do well at tasks that are not performed
well by people.
- They lack an explanation facility.
- Training time can be excessive.
34 Back Propagation Algorithm
- The most popular and successful method.
- Steps to be followed for the training:
- Select the next training pair from the training
set (input vector and desired output).
- Present the input vector to the network.
- The network calculates its output.
- The network calculates the error between the network
output and the desired output.
- The network back-propagates the error.
- Adjust the weights of the network in a way that
minimizes the error.
- Repeat the above steps for each vector in the
training set until the error is acceptable.
35 Network Training
Supervised Learning
- The network is presented with the input and the
desired output.
- Uses a set of inputs for which the desired
outputs/classes are known. The
difference between the desired and actual output
is used to calculate adjustments to the weights of the
NN structure.
Unsupervised Learning
- The network is not shown the desired output.
- The concept is similar to clustering.
- It tries to create classifications in the outcome.
36 Unsupervised Learning
- Only input stimuli (parameters) are presented to
the network. The network is self-organizing; that
is, it organizes itself internally so that each
hidden processing element and its weights respond
appropriately to a different set of input
stimuli.
- No knowledge is supplied about the classification
of outputs. However, the number of categories
into which the network classifies the inputs can
be controlled by varying certain parameters in
the model. In any case, a human expert must
examine the final classifications to assign
meaning and judge the usefulness of the results.
Reinforcement Learning
- In between supervised and unsupervised learning.
- The network gets feedback from the environment.
37 Learning (Training) Algorithms
The training
process requires a set of properly selected data
in the form of network inputs and target outputs.
During training, the weights and biases are
iteratively adjusted to minimize the network
performance function (error). The default
performance function is the mean square error. Input
data should be independent.
Back-propagation learning algorithm: there are many
variations. The commonly used one is the gradient descent
algorithm
x(k+1) = x(k) - α(k) g(k)
where x(k) is the vector of current weights and biases, g(k) is
the current gradient, and α(k) is the chosen learning
rate.
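A one-step sketch of this update rule in Python; the error function and starting point are illustrative, not from the slides:

```python
import numpy as np

def gradient_descent_step(x_k, g_k, alpha_k):
    # x(k+1) = x(k) - alpha(k) * g(k): move the weight/bias vector
    # a small step against the current gradient.
    return x_k - alpha_k * g_k

# Illustrative quadratic error E(x) = x1^2 + x2^2, whose gradient is 2x.
x = np.array([1.0, -2.0])
for _ in range(5):
    x = gradient_descent_step(x, 2 * x, alpha_k=0.1)
print(x)   # moves toward the minimum at the origin
```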
38 Back Propagation Learning Algorithm
- It is the most commonly used generalization of
the delta rule. The procedure involves two
phases (sketched in code after this list):
- Forward phase: when the input is presented, it
propagates forward through the network to compute
output values for each processing element. The
current outputs are compared with
the desired outputs and the error is computed.
- Backward phase: the calculated error is now fed
backward and the weights are adjusted.
- After completing both phases, a new input is
presented for further training.
- This technique is slow, can cause instability,
and has a tendency to get stuck in local minima, but
it is still very popular.
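A minimal numpy sketch of the two phases for one hidden layer, using a sigmoid activation and squared error; the data, layer sizes, and learning rate are placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Toy data: 4 patterns, 2 inputs, 1 target output (XOR used as a placeholder).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(size=(2, 3))   # input -> hidden weights
W2 = rng.normal(size=(3, 1))   # hidden -> output weights
eta = 0.5                      # learning rate

for epoch in range(5000):
    # Forward phase: compute the output of every processing element.
    H = sigmoid(X @ W1)
    Y = sigmoid(H @ W2)

    # Compare with the desired outputs to get the error.
    E = T - Y

    # Backward phase: feed the error backward and adjust the weights.
    dY = E * Y * (1 - Y)              # output-layer delta
    dH = (dY @ W2.T) * H * (1 - H)    # hidden-layer delta
    W2 += eta * H.T @ dY
    W1 += eta * X.T @ dH

print(np.round(Y, 2))   # should move toward the targets
```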
39 Gradient Descent Algorithm
The idea is to
calculate an error each time the network is
presented with a training vector (given that we
have supervised learning, where there is a target
vector) and to perform gradient descent on the
error, considered as a function of the weights.
There will be a gradient, or slope, for each
weight. Thus, we find the weights which give the
minimal error. Typically the error criterion is
defined by the squared difference between the
pattern output and the target output (least
squared error). The total error E is then just
the sum of the squared pattern errors.
40 Error Function (LMS)
(figure: equation relating the target output and the network output)
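The equation itself is not transcribed; based on the description on the previous slide (the total error is the sum of the squared differences between the target output and the network output over all patterns), a standard form is:

```latex
% t_p = target output, y_p = network output for pattern p
E = \sum_{p} \left( t_p - y_p \right)^{2}
```

A factor of 1/2 is often placed in front of the sum to simplify the derivative.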
42 This method of weight adjustment is also known
as the steepest gradient descent technique, or the
Widrow-Hoff rule, and is the most common type. It is
also known as the Delta rule.
43 Network Learning Rules
Hebbian Rule: the first
and best known learning rule was introduced
by Donald Hebb. The basic rule is: if a neuron
receives an input from another neuron, and if
both are highly active (mathematically, have the
same sign), the weight between the neurons should
be strengthened,
Δwij(t) = η xi(t) yj(t)
where xi(t) and yj(t) are the outputs at nodes i
and j, and wij is the weight between nodes i
and j.
44 Hopfield Rule
This law is similar to Hebb's
Rule, with the exception that it specifies the
magnitude of the strengthening or weakening. It
states: "if the desired output and the input are
both active or both inactive, increment the
connection weight by the learning rate; otherwise
decrement the weight by the learning
rate." (Most learning functions have some
provision for a learning rate, or learning
constant. Usually this term is positive and
between zero and one.)
45 Error Correction Rule
In supervised learning,
the network is given a desired output (d) for each
input pattern. During the training period, the
actual output (y) may not be equal to the desired
output. The basic principle of the error-correction
learning rule is to use the error
signal (d - y) to modify the connection weights and
gradually reduce the error.
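A common way of writing this rule (a standard form, not transcribed from the slide), with learning rate η, input xi, and output node j, is:

```latex
% Weight change driven by the error signal (d_j - y_j) and the input x_i
\Delta w_{ij} = \eta \, (d_j - y_j) \, x_i
```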
46 The Delta Rule
The Delta Rule is a further
variation of Hebb's Rule, and it is one of the
most commonly used. This rule is based on the
idea of continuously modifying the strengths of
the input connections to reduce the difference
(the delta) between the desired output value and
the actual output of a neuron. The rule changes
the connection weights in the way that minimizes
the mean squared error of the network. The error
is back-propagated into previous layers, one
layer at a time.
47 Application of Delta Rule
(figure: delta-rule weight update driven by the difference between the target output and the network output)
48 The process of back-propagating the network
errors continues until the first layer is
reached. Networks of this type are called feed-forward,
back-propagation networks; the name derives
from this method of computing the error term.
This rule is also referred to as the
Widrow-Hoff Learning Rule and the Least Mean
Square Learning Rule.
Kohonen's Learning Law
This procedure, developed by Teuvo Kohonen,
was inspired by learning in biological systems.
In this procedure, the neurons compete for the
opportunity to learn, or to update their weights.
49 The processing neuron with the largest output is
declared the winner and has the capability of
inhibiting its competitors as well as exciting
its neighbors. Only the winner is permitted an
output, and only the winner plus its neighbors
are allowed to update their connection weights.
The Kohonen rule does not require a desired
output; therefore it is used in
unsupervised methods of learning. The size of
the neighborhood may vary during the training
period; it narrows as the training proceeds.
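A minimal Python sketch of this competitive update; the map size, data, and the use of a distance-based winner (equivalent to the largest response when weights and inputs are normalized) are illustrative simplifications:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.random((10, 2))        # 10 output neurons with 2-dimensional weights
eta = 0.3                      # learning rate

def kohonen_step(x, W, eta, radius=1):
    # Winner: the neuron whose weight vector is closest to the input.
    winner = int(np.argmin(np.linalg.norm(W - x, axis=1)))
    # Only the winner and its neighbors move their weights toward x.
    for j in range(max(0, winner - radius), min(len(W), winner + radius + 1)):
        W[j] += eta * (x - W[j])
    return winner

for x in rng.random((200, 2)): # present the input stimuli one at a time
    kohonen_step(x, W, eta)
print(np.round(W, 2))
```

In a full Kohonen map the learning rate and neighborhood radius shrink as training proceeds, matching the narrowing neighborhood described above.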
50 Other Numerical Algorithms Used
- Conjugate gradient algorithms
- Resilient back propagation
- Fletcher-Reeves algorithm
- Polak-Ribiere algorithm
- Powell-Beale algorithm
- Quasi-Newton algorithms
- Levenberg-Marquardt
- These algorithms have been designed to optimize
network performance. However, the training
time depends upon the chosen algorithm.
51 Network Structures
- Feed Forward Networks
- Feed Forward Back Propagation Networks
- Recurrent (or feedback) Networks
- Self Organizing Networks
- Adaptive Resonance Learning Network Model
- Hopfield Network
52 Network Architectures
Feed Forward Networks:
there are no feedback paths, and you cannot
modify the outputs based on the errors between the
actual outputs and the desired outputs.
Feed Forward Back Propagation Networks: these types of
networks are the most common, and enough time has
already been devoted to this network architecture.
53
- In the most common family of feed forward networks,
called multi-layered perceptrons, neurons are
organized into layers that have unidirectional
connections between them.
- Feed forward networks are memory-less, in the
sense that their output is independent of the
previous network state.
- Recurrent or feedback networks, on the other hand,
are dynamic.
- In dynamic systems, when a new input pattern is
presented, the neuron outputs are computed.
Because of the feedback paths, the inputs to each
neuron are modified, which leads the network to
enter a new state.
- Different network architectures require
appropriate learning algorithms.
54 Recurrent (or feedback) Networks
- In these networks, feedback loops are
present. These networks can learn from their
mistakes and are highly adaptive in nature.
These kinds of networks train slowly and work well
with noisy inputs.
(figure: recurrent network)
55 Recurrent Network
(figure)
56 Self Organizing Networks
This class of networks
learns without being given the correct output
(classification) for the input pattern. It models
neuro-biological systems fairly closely. This
auto-associative network is single-layer,
recurrent, and highly connected. All the weights
must be initialized, and both the weights and
inputs must be normalized or adjusted. Processing
elements compete for the privilege of learning.
It uses a "winner takes all" learning rule. The
node with the highest response and its neighbors
are allowed to adjust their weights. As the
training proceeds, the neighborhood becomes more
focused. The network trains very fast and works
well in real-time mode. The network learns
continuously and adapts to changes very quickly.
57 Self Organizing Network
(figure: self organizing network with inputs and outputs)
58 Adaptive Resonance Theory (ART) Learning Network Model
- Stability and plasticity is an important issue in
competitive learning: how do we learn new things
(plasticity) and yet retain stability to ensure
that the existing knowledge is not erased (or
corrupted)?
- ART models were developed for this purpose. The
network has a sufficient supply of output units,
but they are not used until deemed necessary.
- A unit is said to be committed if it is being
used.
- An input vector and a stored pattern are said
to resonate when they are sufficiently
similar.
- When the input vector is not sufficiently close
to any of the existing prototypes, a new category
is created.
59
- The required similarity is pre-defined, and an uncommitted
unit is assigned to the new category.
(figure: ART network with inputs and outputs)
60 Hopfield Network
- Hopfield networks use an energy function as a tool
for designing recurrent networks.
- It is a single-layered network. The property of
this network is that it produces a content-addressable
memory, which correctly yields the
output from any sub-part (data) of sufficient
size (a recall sketch follows below). However, the storage capacity for patterns
is limited to about 15% of the number of nodes.
This is an important limitation.
- This architecture is used for identification from
partially visible and/or noisy information, e.g.,
military target identification and robotic control
systems.
- The output of each PE is normally binary. The
final output can be binary or continuous.
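A minimal sketch of content-addressable recall (Python/numpy, Hebbian storage of ±1 patterns and synchronous sign updates; the pattern is illustrative):

```python
import numpy as np

def train_hopfield(patterns):
    # Hebbian storage: sum of outer products of the stored +/-1 patterns.
    n = patterns.shape[1]
    W = patterns.T @ patterns / n
    np.fill_diagonal(W, 0)          # no self-connections
    return W

def recall(W, state, steps=10):
    # Repeatedly threshold the weighted sums until the state settles.
    for _ in range(steps):
        state = np.where(W @ state >= 0, 1, -1)
    return state

stored = np.array([[1, -1, 1, -1, 1, -1, 1, -1]])
W = train_hopfield(stored)
noisy = np.array([1, -1, 1, 1, 1, -1, 1, -1])   # one element corrupted
print(recall(W, noisy))                          # recovers the stored pattern
```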
61 Hopfield Network
(figure: Hopfield network with inputs and outputs)
62 Pitfalls
- ANNs are not a panacea.
- Long training period:
may take several hours or even days.
- Local minima:
the network may get trapped, resulting in an inferior
solution.
- Complex input modeling:
may not be suitable for all problems.
63 Advantages of ANNs
- They learn over a period of time, from their mistakes;
learning continues while making decisions.
- Ability to generalize patterns.
- Abstraction of essential features.
- Ability to produce correct output even in the
presence of noise or imprecise data.
64 Applications
- Basically, most applications of neural networks
fall into the following categories:
Prediction
- Uses input values to predict some output, e.g.
pick the best stocks in the market, predict
the weather, identify people at risk of cancer.
Classification
- Uses input values to determine the
classification, e.g. is the input the letter A;
is the blob in the video data a plane, and what
kind of plane is it.
Data Association
- Like classification, but it also recognizes
data that contain errors, e.g. not only identify
the characters that
65
- were scanned, but also identify when the scanner is
not working properly.
Data Conceptualization
- Analyzes inputs so that grouping relationships
can be inferred, e.g. extract from a database the
names of those most likely to buy a particular
product from the existing products.
Data Filtering
- Smooths an input signal, e.g. takes the noise
out of a telephone voice signal.
66 Other Possible Applications
- Signal processing: suppress line noise, with
adaptive echo canceling.
- Robotics: navigation, vision recognition.
- Pattern recognition, i.e. recognizing handwritten
characters, e.g. the current version of Apple's
Newton uses a neural net.
- Medical diagnosis: ADEs, selection of drugs,
diagnosis of diseases from given symptoms, etc.
- Speech production: reading text aloud (NETtalk).
- Speech recognition.
67
- Vision: face recognition, edge detection, visual
search engines.
- Business: e.g., rules for mortgage decisions are
extracted from past decisions made by experienced
evaluators, resulting in a network that has a
high level of agreement with human experts.
- Financial applications: time series analysis,
stock market prediction.
- Data compression: speech signal, image, e.g.
faces.
- Game playing: backgammon, chess, go, ...