Title: Introduction to Artificial Neural Networks
1 Introduction to Artificial Neural Networks
- Dinesh P Mital
- UMDNJ-SHRP
2 What is a Neural Network? An Artificial Neural
Network (ANN) is an information-processing
paradigm inspired by the way biological
nervous systems, such as the brain, process
information. The key element of this paradigm
is the novel structure of the information
processing system. It is composed of a large
number of highly interconnected processing
elements (neurons) working together to solve specific
problems.
3 Why use neural networks? Neural networks have a
remarkable ability to derive meaning from
complicated or imprecise data. These networks
can be used to extract patterns and detect trends
that are too complex to be noticed by either
humans or other computer techniques.
Adaptive learning: an ability to learn how to do
tasks based on the data given for training or
initial experience.
Self-organization: an ANN can create its own
organization during learning.
Real-time operation.
Ideal for imprecise and noisy data.
4 Von Neumann Computer Versus Biological Neural System
(first item in each pair: Von Neumann computer; second: biological neural system)
- Processor: complex vs. simple; high speed vs. low speed; one or a few vs. a large number
- Memory: separate from processor vs. integrated into the processor; non-content addressable vs. content addressable
- Computing: centralized vs. decentralized; sequential vs. parallel; stored program vs. self-learning
- Reliability: very vulnerable vs. robust
- Expertise: numerical and symbolic vs. perceptual problems
- Operative environment: well defined and constrained vs. poorly defined and unconstrained
5 Comparison between Neural Networks, Expert
Systems, and Conventional Programming
6 Introduction
- 1943- The emergence of Connectionist AI
- Artificial Neural Network
- A Massive Interconnection of Parallel Distributed
Computing Elements - Inspired by the Human Brain
- Massive Parallelism
- Simple Computing Elements
- Distribution of Knowledge
7 Biological Neuron
(figure: biological neuron, with the axon labeled)
8 Artificial Neural Networks
- An ANN is a model that emulates the biological
neural network.
- Biological neurons receive inputs through
dendrites and pass signals to other neurons
through the axon.
- The nucleus is the processing element of the neuron.
- An ANN is composed of artificial neurons; these are
the processing elements (PE). Each neuron
receives input(s), processes the inputs, and delivers a
single output.
- The synapse decides whether to amplify or
attenuate the signal.
- A single output signal from an axon can go to
multiple dendrites.
9
- The basic element of a neural network is the
perceptron.
- First proposed by Frank Rosenblatt in 1958 at
Cornell University, the perceptron has five basic
elements: an n-vector input, weights, a summing
function, a threshold device, and an output.
- Outputs are -1 and/or 1. The threshold has a
setting which governs the output based on the
summation of the weighted inputs. If the
summation falls below the threshold setting, the
output is -1; if the summation exceeds the
threshold setting, the output is 1 (see the sketch below).
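As an illustration (not from the slides), a minimal Python sketch of this perceptron forward pass, with made-up inputs, weights, and a zero threshold:

```python
# Perceptron forward pass: weighted sum of the inputs, then a hard
# threshold that outputs -1 or +1, as described above.
def perceptron(inputs, weights, threshold=0.0):
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total > threshold else -1

# Illustrative 3-element input vector and weights (not from the slides)
print(perceptron([1.0, -2.0, 0.5], [0.4, 0.1, 0.7]))
```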
11 Perceptron
(figure: perceptron with bias b)
12 Mathematical Model
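The equation on this slide is not transcribed; a standard form of the model, consistent with the summing function and threshold device described above, is:

```latex
% Weighted sum of the n inputs plus a bias b, passed through a threshold f
Y = \sum_{i=1}^{n} w_i x_i + b,
\qquad
Y_T = f(Y) =
\begin{cases}
+1 & \text{if } Y \ge \theta \\
-1 & \text{if } Y < \theta
\end{cases}
```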
13 Perceptron
(figure: perceptron with inputs and output)
14 Artificial Neural Networks
- Inputs to the perceptron can be raw data or
outputs from other processing elements.
- Outputs of the perceptron can be the final
product or the input to another neuron.
The Network
- An ANN is composed of a collection of neurons
that are grouped in layers, with a minimum of two
layers (input and output). The other layers are
known as hidden layers.
- A typical structure is shown on the next page.
- The processing of information is massively
parallel, as it is in our brain.
16 Multilayered Perceptron Model
(figure: multilayer perceptron using a sigmoid function)
17 Processing of Information
Inputs
- Each input corresponds to a single attribute of
the problem.
- For example, for the diagnosis of a disease, each
symptom could represent an input to one node.
- The input could be an image (pattern) of skin texture
if we are looking to diagnose normal versus
cancerous cells.
Outputs
- The outputs of the network represent the
solution to the problem.
- For diagnosis of a disease, the answer could be
yes or no.
Weights
- A key element of an ANN is the weight.
- A weight expresses the relative strength of the
entering signal from the various connections that
transfer data from the input point to the output
point.
18 (figure: a weighted connection, with weight W13 applied to input x1, giving x1*W13)
19 Processing of Information
Summation Function
- Finds the weighted sum of all input elements
entering the PE.
- If there are several output neurons, the output
at the jth neuron is
Yj = Σi (xi · Wij)
where Wij is the weight from the ith input node to the jth
output node (see the sketch below).
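In matrix form this is simply an input-weight product. A small numpy sketch (the numbers and shapes are illustrative, not from the slides):

```python
import numpy as np

x = np.array([3.0, 1.0, 2.0])        # inputs x_i
W = np.array([[0.2, 0.5],            # W[i, j] = weight from input i
              [0.4, 0.3],            #           to output neuron j
              [0.1, 0.8]])

Y = x @ W                            # Y_j = sum_i x_i * W_ij
print(Y)                             # one summed value per output neuron
```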
20 Processing of Information
Transfer Function
- The summation function computes the internal
cumulative signal value.
- There is also an activation level of the neuron.
Based on the cumulative value of the signal received,
the neuron may or may not produce an output.
- The relationship between the activation level and
the output of the neuron may be linear or
non-linear.
- The selection of the specific activation
function determines the network's operation.
- One popular function is the sigmoid function
YT = 1 / (1 + e^(-Y))
where YT is the transformed value of Y.
21 Processing of Information
Transfer Function
- The purpose of the transfer function is to modify the
output level to a reasonable value (between 0 and
1). This transformation is done before the output
reaches the next level.
Example (a code sketch follows this list)
- x1 = 3, w1 = 0.2
- x2 = 1, w2 = 0.4    =>  Y = 1.2
- x3 = 2, w3 = 0.1    =>  YT = f(Y)
- You can also use a simple threshold value.
(figure: processing element with inputs X1, X2, X3, weights w1, w2, w3, a summation Σ producing Y, and a transfer function producing YT)
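The same computation in Python, reproducing the slide's numbers and using the sigmoid of the previous slide as f:

```python
import math

x = [3, 1, 2]          # inputs x1, x2, x3
w = [0.2, 0.4, 0.1]    # weights w1, w2, w3

Y = sum(xi * wi for xi, wi in zip(x, w))    # 3*0.2 + 1*0.4 + 2*0.1 = 1.2
YT = 1.0 / (1.0 + math.exp(-Y))             # sigmoid transform, about 0.77

print(Y, YT)
```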
22 Neuron Transfer Functions
1. Pure Linear Transfer Function
2. Hard Limit Transfer Function
3. Log Sigmoid Transfer Function
(figure: plots of the three transfer functions; the hard limit output switches between -1 and 1)
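Simple Python sketches of the three transfer functions named above (the threshold of 0 and the symmetric -1/+1 hard limit follow the figure and are otherwise assumptions):

```python
import math

def purelin(y):     # 1. pure linear: output equals the summed input
    return y

def hardlim(y):     # 2. hard limit: switches between -1 and +1 at 0
    return 1 if y >= 0 else -1

def logsig(y):      # 3. log sigmoid: squashes y into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-y))
```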
23 A Multi-layered Network Function
24 Processing of Information
Learning
- An ANN learns from its experience. The usual
process of learning involves three tasks:
- Compute the output(s).
- Compare the outputs with the desired patterns and
feed back the error.
- Adjust the weights and repeat the process.
- The learning process starts by setting the
weights by some rule (or randomly). The
difference between the actual output (y) and the
desired output (z) is called the error (delta).
- The objective is to minimize the delta (error) to
zero. The reduction in error is done by changing
the weights.
25
- A formal definition of learning in the context
of the network model is: the process
of updating network connection weights so that
the network can perform a specific task
efficiently.
- The network learns (or modifies) the connection
weights from the available training patterns
(available data).
- The performance of the network improves over time
as the weights are iteratively updated.
- The ability of ANNs to learn automatically from examples
(data), instead of following a set of rules specified by
human experts, makes them attractive and exciting.
- This is one of the major advantages of neural
networks over traditional expert systems.
26
- The key to adaptive learning is to change the
weights in the right direction, so as to reduce the
error.
- There are various algorithms for adjusting
weights; a few will be introduced later.
- A procedure for developing NN-based applications
(sketched in code after this list) is:
- 1. Collect data.
- 2. Separate the data into training and test sets.
- 3. Define (select) a network structure.
- 4. Select a learning algorithm.
- 5. Transform the data to network inputs (training
data).
- 6. Start training and revise the weights until the
error criterion is satisfied.
- 7. Stop and test the results with the test data.
- 8. Implementation: use the network for testing new
cases.
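A hedged sketch of these eight steps using scikit-learn; the data, network size, and stopping criterion are placeholders, not values from the slides:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# 1. Collect data (random placeholder data here).
X = np.random.rand(200, 4)              # 200 cases, 4 input attributes
y = (X.sum(axis=1) > 2.0).astype(int)   # a made-up binary outcome

# 2. Separate the data into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

# 3-4. Define a network structure and select a learning algorithm.
net = MLPClassifier(hidden_layer_sizes=(5,), solver="sgd",
                    learning_rate_init=0.1, max_iter=2000, tol=1e-4)

# 5-6. Present the training inputs and revise weights until the error
#      criterion (tol) is satisfied or max_iter is reached.
net.fit(X_train, y_train)

# 7. Test the results with the test data.
print("test accuracy:", net.score(X_test, y_test))

# 8. Use the trained network on new cases.
print(net.predict(X_test[:3]))
```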
27 General Considerations for Network Design
- Which input attributes will be used to build and
train the network? Use data points with minimum
correlation (independent).
- Which network architecture will be suitable for
the study?
- How many hidden layers should the network
contain?
- How many nodes should there be in each hidden
layer?
- What conditions will terminate the network
training?
28 Strengths of the Network
- Neural networks are very suitable for noisy or
partial data sets. Transfer functions such as
sigmoid functions normally smooth out the
variations.
- ANNs can process and predict numeric as well as
categorical outcomes.
- ANNs can be used for applications that require a
time element to be included in the data set.
- Neural networks have performed well in certain
domains where rules are not defined and there is
no structure.
- The network can be trained for supervised and
unsupervised clustering.
29 Weaknesses
- The biggest weakness is that they lack a
criterion (reason) for the decision. This is
important at times.
- The learning algorithms are not guaranteed to
converge to an optimal solution. However, you can
experiment with various learning parameters.
- Neural networks can easily be over-trained
(memorize) to the point of working well with the
training data but performing poorly on test data.
You have to monitor this problem carefully.
30 Forward and Backward Propagation
31 Developing NN Models
- One of the important steps is the selection of the
network structure. We will discuss detailed
structures at a later stage.
Associative Memory Systems
- This refers to the ability to recall complete
situations from partial information. Such systems
correlate input data with information stored in
memory.
- Information can be recalled even from incomplete
or noisy inputs.
- Associative memory systems can detect
similarities between new inputs and stored input
patterns, using a distance criterion.
Hidden Layer Systems
- Complex practical applications require one or
more hidden layers between the inputs and outputs,
and a correspondingly large number of weights.
32 Hidden Layer Systems (contd.)
- Using more than three layers is rare.
- The amount of computation involved is enormous.
Double-Layered Networks
- This structure does not require knowledge of the
precise number of classes in the training data
(unsupervised). It is normally used in cases
where the output is not given and only input data
are available.
- Instead, it uses a feed-forward and feed-backward
approach to adjust parameters/weights as data
are analyzed, to establish an arbitrary
(required) number of categories that represent
the data presented to the system.
33 Back-propagation Network
- It is the most widely used architecture. It is a
very popular technique that is relatively easy to
implement. It requires a large amount of training
data for conditioning the network before using it
to predict the outcome.
- A back-propagation network includes at least one
hidden layer.
- The approach is considered a feed-forward/
back-propagation approach.
Limitations
- NNs do not do well at tasks that are not performed
well by people.
- They lack an explanation facility.
- Training time can be excessive.
34 Back Propagation Algorithm
- The most popular and successful method.
- Steps to be followed for the training:
- Select the next training pair from the training
set (input vector and desired output).
- Present the input vector to the network.
- The network calculates its output.
- The network calculates the error between the network
output and the desired output.
- The network back-propagates the error.
- Adjust the weights of the network in a way that
minimizes the error.
- Repeat the above steps for each vector in the
training set until the error is acceptable.
35 Network Training
Supervised Learning
- The network is presented with the input and the
desired output.
- Uses a set of inputs for which the desired
outputs/classes are known. The
difference between the desired and actual output
is used to calculate adjustments to the weights of the
NN structure.
Unsupervised Learning
- The network is not shown the desired output.
- The concept is similar to clustering.
- It tries to create classifications in the outcome.
36 Unsupervised Learning
- Only input stimuli (parameters) are presented to
the network. The network is self-organizing; that
is, it organizes itself internally so that each
hidden processing element and its weights respond
appropriately to a different set of input
stimuli.
- No knowledge is supplied about the classification
of outputs. However, the number of categories
into which the network classifies the inputs can
be controlled by varying certain parameters in
the model. In any case, a human expert must
examine the final classifications to assign
meaning and judge the usefulness of the results.
Reinforcement Learning
- In between supervised and unsupervised learning.
- The network gets feedback from the environment.
37 Learning (Training) Algorithms
The training
process requires a set of properly selected data
in the form of network inputs and target outputs.
During training, the weights and biases are
iteratively adjusted to minimize the network
performance function (error). The default
performance function is the mean square error. Input
data should be independent.
Back-propagation learning algorithm: there are many
variations. The commonly used one is the gradient descent
algorithm
x(k+1) = x(k) - α(k) g(k)
where x(k) is the vector of current weights and biases, g(k) is
the current gradient, and α(k) is the chosen learning
rate.
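A one-step sketch of this update rule in Python; the error function and starting point are illustrative, not from the slides:

```python
import numpy as np

def gradient_descent_step(x_k, g_k, alpha_k):
    # x(k+1) = x(k) - alpha(k) * g(k): move the weight/bias vector
    # a small step against the current gradient.
    return x_k - alpha_k * g_k

# Illustrative quadratic error E(x) = x1^2 + x2^2, whose gradient is 2x.
x = np.array([1.0, -2.0])
for _ in range(5):
    x = gradient_descent_step(x, 2 * x, alpha_k=0.1)
print(x)   # moves toward the minimum at the origin
```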
38 Back Propagation Learning Algorithm
- It is the most commonly used generalization of
the delta rule. The procedure involves two
phases (sketched in code after this list):
- Forward phase: when the input is presented, it
propagates forward through the network to compute
output values for each processing element. The
current outputs are compared with
the desired outputs and the error is computed.
- Backward phase: the calculated error is now fed
backward and the weights are adjusted.
- After completing both phases, a new input is
presented for further training.
- This technique is slow, can cause instability,
and has a tendency to get stuck in local minima, but
it is still very popular.
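A minimal numpy sketch of the two phases for one hidden layer, using a sigmoid activation and squared error; the data, layer sizes, and learning rate are placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Toy data: 4 patterns, 2 inputs, 1 target output (XOR used as a placeholder).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(size=(2, 3))   # input -> hidden weights
W2 = rng.normal(size=(3, 1))   # hidden -> output weights
eta = 0.5                      # learning rate

for epoch in range(5000):
    # Forward phase: compute the output of every processing element.
    H = sigmoid(X @ W1)
    Y = sigmoid(H @ W2)

    # Compare with the desired outputs to get the error.
    E = T - Y

    # Backward phase: feed the error backward and adjust the weights.
    dY = E * Y * (1 - Y)              # output-layer delta
    dH = (dY @ W2.T) * H * (1 - H)    # hidden-layer delta
    W2 += eta * H.T @ dY
    W1 += eta * X.T @ dH

print(np.round(Y, 2))   # should move toward the targets
```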
39 Gradient Descent Algorithm
The idea is to
calculate an error each time the network is
presented with a training vector (given that we
have supervised learning, where there is a target
vector) and to perform gradient descent on the
error, considered as a function of the weights.
There will be a gradient, or slope, for each
weight. Thus, we find the weights which give the
minimal error. Typically the error criterion is
defined by the squared difference between the
pattern output and the target output (least
squared error). The total error E is then just
the sum of the squared pattern errors.
40 Error Function (LMS)
(figure: equation relating the target output and the network output)
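The equation itself is not transcribed; based on the description on the previous slide (the total error is the sum of the squared differences between the target output and the network output over all patterns), a standard form is:

```latex
% t_p = target output, y_p = network output for pattern p
E = \sum_{p} \left( t_p - y_p \right)^{2}
```

A factor of 1/2 is often placed in front of the sum to simplify the derivative.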
42 This method of weight adjustment is also known
as the steepest gradient descent technique, or the
Widrow-Hoff rule, and is the most common type. It is
also known as the Delta rule.
43 Network Learning Rules
Hebbian Rule: the first
and best known learning rule was introduced
by Donald Hebb. The basic rule is: if a neuron
receives an input from another neuron, and if
both are highly active (mathematically, have the
same sign), the weight between the neurons should
be strengthened,
Δwij(t) = η xi(t) yj(t)
where xi(t) and yj(t) are the outputs at nodes i
and j, and wij is the weight between nodes i
and j.
44 Hopfield Rule
This law is similar to Hebb's
Rule, with the exception that it specifies the
magnitude of the strengthening or weakening. It
states: "if the desired output and the input are
both active or both inactive, increment the
connection weight by the learning rate; otherwise
decrement the weight by the learning
rate." (Most learning functions have some
provision for a learning rate, or learning
constant. Usually this term is positive and
between zero and one.)
45 Error Correction Rule
In supervised learning,
the network is given a desired output (d) for each
input pattern. During the training period, the
actual output (y) may not be equal to the desired
output. The basic principle of the error-correction
learning rule is to use the error
signal (d - y) to modify the connection weights and
gradually reduce the error.
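A common way of writing this rule (a standard form, not transcribed from the slide), with learning rate η, input xi, and output node j, is:

```latex
% Weight change driven by the error signal (d_j - y_j) and the input x_i
\Delta w_{ij} = \eta \, (d_j - y_j) \, x_i
```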
46 The Delta Rule
The Delta Rule is a further
variation of Hebb's Rule, and it is one of the
most commonly used. This rule is based on the
idea of continuously modifying the strengths of
the input connections to reduce the difference
(the delta) between the desired output value and
the actual output of a neuron. The rule changes
the connection weights in the way that minimizes
the mean squared error of the network. The error
is back-propagated into previous layers, one
layer at a time.
47 Application of Delta Rule
(figure: delta-rule weight update driven by the difference between the target output and the network output)
48 The process of back-propagating the network
errors continues until the first layer is
reached. Networks of this type are called feed-forward,
back-propagation networks; the name derives
from this method of computing the error term.
This rule is also referred to as the
Widrow-Hoff Learning Rule and the Least Mean
Square Learning Rule.
Kohonen's Learning Law
This procedure, developed by Teuvo Kohonen,
was inspired by learning in biological systems.
In this procedure, the neurons compete for the
opportunity to learn, or to update their weights.
49 The processing neuron with the largest output is
declared the winner and has the capability of
inhibiting its competitors as well as exciting
its neighbors. Only the winner is permitted an
output, and only the winner plus its neighbors
are allowed to update their connection weights.
The Kohonen rule does not require a desired
output; therefore it is used in
unsupervised methods of learning. The size of
the neighborhood may vary during the training
period; it narrows as the training proceeds.
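A minimal Python sketch of this competitive update; the map size, data, and the use of a distance-based winner (equivalent to the largest response when weights and inputs are normalized) are illustrative simplifications:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.random((10, 2))        # 10 output neurons with 2-dimensional weights
eta = 0.3                      # learning rate

def kohonen_step(x, W, eta, radius=1):
    # Winner: the neuron whose weight vector is closest to the input.
    winner = int(np.argmin(np.linalg.norm(W - x, axis=1)))
    # Only the winner and its neighbors move their weights toward x.
    for j in range(max(0, winner - radius), min(len(W), winner + radius + 1)):
        W[j] += eta * (x - W[j])
    return winner

for x in rng.random((200, 2)): # present the input stimuli one at a time
    kohonen_step(x, W, eta)
print(np.round(W, 2))
```

In a full Kohonen map the learning rate and neighborhood radius shrink as training proceeds, matching the narrowing neighborhood described above.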
50 Other Numerical Algorithms Used
- Conjugate gradient algorithms
- Resilient back propagation
- Fletcher-Reeves algorithm
- Polak-Ribiere algorithm
- Powell-Beale algorithm
- Quasi-Newton algorithms
- Levenberg-Marquardt
- These algorithms have been designed to optimize
network performance. However, the training
time depends upon the chosen algorithm.
51 Network Structures
- Feed Forward Networks
- Feed Forward Back Propagation Networks
- Recurrent (or feedback) Networks
- Self Organizing Networks
- Adaptive Resonance Learning Network Model
- Hopfield Network
52 Network Architectures
Feed Forward Networks:
there are no feedback paths, and you cannot
modify the outputs based on the errors between the
actual outputs and the desired outputs.
Feed Forward Back Propagation Networks: these types of
networks are the most common, and enough time has
already been devoted to this network architecture.
53
- In the most common family of feed forward networks,
called multi-layered perceptrons, neurons are
organized into layers that have unidirectional
connections between them.
- Feed forward networks are memory-less, in the
sense that their output is independent of the
previous network state.
- Recurrent or feedback networks, on the other hand,
are dynamic.
- In dynamic systems, when a new input pattern is
presented, the neuron outputs are computed.
Because of the feedback paths, the inputs to each
neuron are modified, which leads the network to
enter a new state.
- Different network architectures require
appropriate learning algorithms.
54 Recurrent (or feedback) Networks
- In these networks, feedback loops are
present. These networks can learn from their
mistakes and are highly adaptive in nature.
These kinds of networks train slowly and work well
with noisy inputs.
(figure: recurrent network)
55 Recurrent Network
(figure)
56 Self Organizing Networks
This class of networks
learns without being given the correct output
(classification) for the input pattern. It models
neuro-biological systems fairly closely. This
auto-associative network is single-layer,
recurrent, and highly connected. All the weights
must be initialized, and both the weights and
inputs must be normalized or adjusted. Processing
elements compete for the privilege of learning.
It uses a "winner takes all" learning rule. The
node with the highest response and its neighbors
are allowed to adjust their weights. As the
training proceeds, the neighborhood becomes more
focused. The network trains very fast and works
well in real-time mode. The network learns
continuously and adapts to changes very quickly.
57 Self Organizing Network
(figure: self organizing network with inputs and outputs)
58 Adaptive Resonance Theory (ART) Learning Network Model
- Stability and plasticity is an important issue in
competitive learning: how do we learn new things
(plasticity) and yet retain stability to ensure
that the existing knowledge is not erased (or
corrupted)?
- ART models were developed for this purpose. The
network has a sufficient supply of output units,
but they are not used until deemed necessary.
- A unit is said to be committed if it is being
used.
- An input vector and a stored pattern are said
to resonate when they are sufficiently
similar.
- When the input vector is not sufficiently close
to any of the existing prototypes, a new category
is created.
59
- The required similarity is pre-defined, and an uncommitted
unit is assigned to the new category.
(figure: ART network with inputs and outputs)
60 Hopfield Network
- Hopfield networks use an energy function as a tool
for designing recurrent networks.
- It is a single-layered network. The property of
this network is that it produces a content-addressable
memory, which correctly yields the
output from any sub-part (data) of sufficient
size (a recall sketch follows below). However, the storage capacity for patterns
is limited to about 15% of the number of nodes.
This is an important limitation.
- This architecture is used for identification from
partially visible and/or noisy information, e.g.,
military target identification and robotic control
systems.
- The output of each PE is normally binary. The
final output can be binary or continuous.
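A minimal sketch of content-addressable recall (Python/numpy, Hebbian storage of ±1 patterns and synchronous sign updates; the pattern is illustrative):

```python
import numpy as np

def train_hopfield(patterns):
    # Hebbian storage: sum of outer products of the stored +/-1 patterns.
    n = patterns.shape[1]
    W = patterns.T @ patterns / n
    np.fill_diagonal(W, 0)          # no self-connections
    return W

def recall(W, state, steps=10):
    # Repeatedly threshold the weighted sums until the state settles.
    for _ in range(steps):
        state = np.where(W @ state >= 0, 1, -1)
    return state

stored = np.array([[1, -1, 1, -1, 1, -1, 1, -1]])
W = train_hopfield(stored)
noisy = np.array([1, -1, 1, 1, 1, -1, 1, -1])   # one element corrupted
print(recall(W, noisy))                          # recovers the stored pattern
```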
61 Hopfield Network
(figure: Hopfield network with inputs and outputs)
62 Pitfalls
- ANNs are not a panacea.
- Long training period:
may take several hours or even days.
- Local minima:
the network may get trapped, resulting in an inferior
solution.
- Complex input modeling:
may not be suitable for all problems.
63 Advantages of ANNs
- They learn over a period of time, from their mistakes;
learning continues while making decisions.
- Ability to generalize patterns.
- Abstraction of essential features.
- Ability to produce correct output even in the
presence of noise or imprecise data.
64 Applications
- Basically, most applications of neural networks
fall into the following categories:
Prediction
- Uses input values to predict some output, e.g.
pick the best stocks in the market, predict
the weather, identify people at risk of cancer.
Classification
- Uses input values to determine the
classification, e.g. is the input the letter A;
is the blob in the video data a plane, and what
kind of plane is it.
Data Association
- Like classification, but it also recognizes
data that contain errors, e.g. not only identify
the characters that
65
- were scanned, but also identify when the scanner is
not working properly.
Data Conceptualization
- Analyzes inputs so that grouping relationships
can be inferred, e.g. extract from a database the
names of those most likely to buy a particular
product from the existing products.
Data Filtering
- Smooths an input signal, e.g. takes the noise
out of a telephone voice signal.
66 Other Possible Applications
- Signal processing: suppress line noise, with
adaptive echo canceling.
- Robotics: navigation, vision recognition.
- Pattern recognition, i.e. recognizing handwritten
characters, e.g. the current version of Apple's
Newton uses a neural net.
- Medical diagnosis: ADEs, selection of drugs,
diagnosis of diseases from given symptoms, etc.
- Speech production: reading text aloud (NETtalk).
- Speech recognition.
67
- Vision: face recognition, edge detection, visual
search engines.
- Business: e.g., rules for mortgage decisions are
extracted from past decisions made by experienced
evaluators, resulting in a network that has a
high level of agreement with human experts.
- Financial applications: time series analysis,
stock market prediction.
- Data compression: speech signal, image, e.g.
faces.
- Game playing: backgammon, chess, go, ...