Introduction to Artificial Neural Networks - PowerPoint PPT Presentation

Slides: 76
Provided by: dinesh6

1
Introduction to Artificial Neural Networks
Neural networks do not perform miracles. But if
used sensibly they can produce some amazing
results.
  • Dinesh P Mital
  • UMDNJ-SHRP

2
What is a Neural Network?
An Artificial Neural Network (ANN) is an information
processing paradigm inspired by the way biological
nervous systems, such as the brain, process
information. The key element of this paradigm
is the novel structure of the information
processing system: it is composed of a large
number of highly interconnected processing
elements (neurons) working together to solve
specific problems.
3
Why use neural networks?
Neural networks have a remarkable ability to derive
meaning from complicated or imprecise data. They
can be used to extract patterns and detect trends
that are too complex to be noticed by either
humans or other computer techniques.
  • Adaptive learning: an ability to learn how to do
    tasks based on the data given for training or
    initial experience.
  • Self-organization: an ANN can create its own
    organization during learning time.
  • Real-time operation.
  • Ideal for imprecise and noisy data.
  • Fault tolerance via redundant information coding.
4
Von Neumann Computer Versus Biological Neural System

                       Von Neumann Computer          Biological Neural System
Processor              Complex                       Simple
                       High speed                    Low speed
                       One or a few                  A large number
Memory                 Separate from processor       Integrated into processor
                       Non-content addressable       Content addressable
Computing              Centralized                   Decentralized
                       Sequential                    Parallel
                       Stored program                Self learning
Reliability            Very vulnerable               Robust
Expertise              Numerical and symbolic        Perceptual problems
Operative environment  Well defined and constrained  Poorly defined and unconstrained
5
Comparison between Neural Networks, Expert
Systems, and Conventional Programming
6
Introduction
  • 1943- The emergence of Connectionist AI
  • Artificial Neural Network
  • A Massive Interconnection of Parallel Distributed
    Computing Elements
  • Inspired by the Human Brain
  • Massive Parallelism
  • Simple Computing Elements
  • Distribution of Knowledge

7
Biological Neuron
axon
8
  • Artificial Neural Networks
  • An ANN is a model that emulates a biological
    neural network.
  • Biological neurons receive inputs through
    dendrites and pass signals to other neurons
    through the axon.
  • The nucleus is the processing element in the neuron.
  • An ANN is composed of artificial neurons; these are
    the processing elements (PEs). Each neuron
    receives one or more inputs, processes them, and
    delivers a single output.
  • The synapse decides whether to amplify or
    attenuate the signal.
  • A single output signal from an axon can go to
    multiple dendrites.

9
  • The basic element of a neural network is the
    perceptron.
  • First proposed by Frank Rosenblatt in 1958 at
    Cornell University, the perceptron has 5 basic
    elements:
  • an n-vector input, weights, a summing function,
    a threshold device, and an output.
  • Outputs take the value -1 or 1. The threshold
    setting governs the output based on the weighted
    sum of the inputs. If the sum falls below the
    threshold setting, the output is -1. If the sum
    exceeds the threshold setting, the output is 1.
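
The five elements listed above can be sketched in a few lines of Python. This is only an illustration: the function name, the weights and the threshold of 0.7 are made up for the example, and treating a sum exactly equal to the threshold as -1 is an assumption, since the slide does not cover that case.

```python
def perceptron_output(inputs, weights, threshold):
    """Weighted sum of the inputs followed by a threshold device (-1 or 1)."""
    total = sum(x * w for x, w in zip(inputs, weights))
    # Assumption: a sum exactly equal to the threshold is treated as "not exceeded".
    return 1 if total > threshold else -1


# Hypothetical weights and threshold that make the unit behave like a logical AND.
weights = [0.5, 0.5]
threshold = 0.7
for pattern in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(pattern, perceptron_output(pattern, weights, threshold))
```

With these example values only the pattern (1, 1) pushes the sum above the threshold.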

10
(No Transcript)
11
Perceptron
(Figure: perceptron with bias input b)
12
Mathematical Model
13
Perceptron
(Figure: perceptron with inputs and output)
14
  • Artificial Neural Networks
  • Inputs to the perceptron can be raw data or
    outputs from other processing elements.
  • Outputs of the perceptron can be the final
    product or input to another neuron.
  • The Network
  • An ANN is composed of a collection of neurons
    grouped in layers, with a minimum of two layers
    (input and output layers). Other layers are
    known as hidden layers.
  • A typical structure is shown on the next slide.
  • The processing of information is massively
    parallel, as it is in our brain.

15
(No Transcript)
16
Multilayered Perceptron Model
Sigmoid function
17
  • Processing of Information
  • Inputs
  • Each input corresponds to a single attribute of
    the problem.
  • For example, in the diagnosis of a disease, each
    symptom could be an input to one node.
  • An input could also be an image (pattern) of skin
    texture, if we are trying to distinguish normal
    from cancerous cells.
  • Outputs
  • The outputs of the network represent the
    solution to a problem.
  • For diagnosis of a disease, the answer could be
    yes or no.
  • Weights
  • A key element of an ANN is the weight.
  • A weight expresses the relative strength of the
    signal entering through a connection that
    transfers data from an input point to an output
    point.

18
(Figure: input x1 passes through weight W13, giving the weighted signal x1W13)
19
  • Processing of Information
  • Summation Function
  • Finds the weighted sum of all input elements
    entering the PE.
  • If there are several output neurons, the output
    at the jth neuron is

Y_j = \sum_i X_i W_{ij}

where W_{ij} is the weight from the ith input node to the jth
output node.
20
  • Processing of Information
  • Transfer Function
  • The summation function computes the internal
    cumulative signal value.
  • There is also an activation level of the neuron.
    Based on the cumulative value of the signal
    received, the neuron may or may not produce an
    output.
  • The relationship between the activation level and
    the output of the neuron may be linear or
    non-linear.
  • The selection of the specific activation
    function determines the network's operation.
  • One popular function is the sigmoid function
    YT = 1 / (1 + e^{-Y}),
    where YT is the transformed value of Y.

21
  • Processing of Information
  • Transfer Function
  • The purpose of the transfer function is to bring
    the output level into a reasonable range (between
    0 and 1). This transformation is done before the
    output reaches the next level.
  • Example
  • x1 = 3, w1 = 0.2
  • x2 = 1, w2 = 0.4
  • x3 = 2, w3 = 0.1
  • PE: Y = 3(0.2) + 1(0.4) + 2(0.1) = 1.2, YT = f(Y)
  • Alternatively, a simple threshold value can be used.
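
The example on this slide can be checked numerically. The sketch below assumes that f is the sigmoid function mentioned on the previous slide; the helper names are illustrative only.

```python
import math


def weighted_sum(inputs, weights):
    """Summation function: Y = sum of x_i * w_i."""
    return sum(x * w for x, w in zip(inputs, weights))


def sigmoid(y):
    """Transfer function: maps any Y into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-y))


x = [3, 1, 2]
w = [0.2, 0.4, 0.1]
Y = weighted_sum(x, w)   # 0.6 + 0.4 + 0.2, the slide's 1.2 (up to rounding)
YT = sigmoid(Y)          # roughly 0.77
print(round(Y, 4), round(YT, 4))
```

The raw sum 1.2 lies outside the 0 to 1 range, while the transformed value does not, which is the point the slide makes about the transfer function.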

(Figure: inputs X1, X2, X3, weighted by w1, w2, w3, enter the PE, which computes Y and then YT)
22
Neurons Transfer Functions
  • 1. Pure Linear Transfer Function
  • 2. Hard Limit Transfer Function
  • 3. Log Sigmoid Transfer Function
(Figure: plots of the three transfer functions, with axis marks at 1 and -1)
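
A rough sketch of the three transfer functions named on this slide follows. The function names are chosen for the example, and the hard limit is assumed to switch between -1 and 1 at zero (some formulations use 0 and 1 instead).

```python
import math


def pure_linear(y):
    """Pure linear: the output equals the summed input."""
    return y


def hard_limit(y, threshold=0.0):
    """Hard limit: assumed here to output 1 at or above the threshold, else -1."""
    return 1 if y >= threshold else -1


def log_sigmoid(y):
    """Log sigmoid: smooth squashing of the input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-y))


for y in (-2.0, 0.0, 2.0):
    print(y, pure_linear(y), hard_limit(y), round(log_sigmoid(y), 3))
```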
23
A Multi-layered Network Function
24
  • Processing of Information
  • Learning
  • An ANN learns from experience. The usual
    learning process involves three tasks:
  • Compute output(s).
  • Compare outputs with desired patterns and
    feed back the error.
  • Adjust the weights and repeat the process.
  • The learning process starts by setting the
    weights by some rule (or randomly). The
    difference between the actual output (y) and the
    desired output (z) is called the error (delta).
  • The objective is to reduce delta (the error)
    toward zero. The error is reduced by changing
    the weights.

25
  • Learning, in the context of the network model,
    can be formally defined as the process of
    updating network connection weights so that
    the network can perform a specific task
    efficiently.
  • The network learns (or modifies) the connection
    weights from the available training patterns
    (available data).
  • The performance of the network improves over time
    by iteratively updating weights in the network.
  • The ability of ANNs to learn automatically from
    examples (data), instead of following a set of
    rules specified by human experts, makes them
    attractive and exciting.
  • This is one of the major advantages of neural
    networks over traditional expert systems.

26
  • The key to adaptive learning is to change the
    weights in the right direction, so as to reduce
    the error.
  • There are various algorithms for adjusting
    weights. A few will be introduced later.
  • A procedure for developing NN-based applications
    is:
  • 1. Collect Data.
  • 2. Separate the Data into Training and Test Sets.
  • 3. Define (select) a Network Structure.
  • 4. Select a Learning Algorithm.
  • 5. Transform Data to Network Inputs (training
    data).
  • 6. Start Training and Revise Weights until the
    Error Criterion is Satisfied.
  • 7. Stop and Test the Results with Test Data.
  • 8. Implementation: Use the Network on New
    Cases.

27
  • General Considerations for Network Design
  • Which input attributes will be used to build and
    study the network? Use attributes with minimal
    correlation (independent).
  • Which network architecture will be suitable for
    the study?
  • How many hidden layers should the network
    contain?
  • How many nodes should there be in each hidden
    layer?
  • What conditions will terminate the network
    training?

28
Firing Rules
A firing rule determines how one calculates
whether a neuron should fire for any input
pattern. It relates to all the input patterns,
not only the ones on which the node was trained.
A simple firing rule can be implemented using the
Hamming distance technique. The rule goes as
follows: Take a collection of training patterns
for a node, some of which cause it to fire (the
1-taught set of patterns) and others which
prevent it from doing so (the 0-taught set).
29
Then the patterns not in the collection cause the
node to fire if, on comparison, they have more
input elements in common with the 'nearest'
pattern in the 1-taught set than with the
'nearest' pattern in the 0-taught set. If there
is a tie, then the pattern remains in the
undefined state. For example, a 3-input neuron
is taught to output 1 when the input (X1, X2 and
X3) is 111 or 101 and to output 0 when the input
is 000 or 001. Then, before applying the firing
rule, the truth table is:
30
X1    0  0  0    0    1    1  1    1
X2    0  0  1    1    0    0  1    1
X3    0  1  0    1    0    1  0    1
OUT   0  0  0/1  0/1  0/1  1  0/1  1

As an example of the way the firing rule is applied,
let us take the pattern 010. It differs from
000 in 1 element, from 001 in 2 elements,
from 101 in 3 elements and from 111 in 2
elements.
31
Therefore, the 'nearest' pattern is 000, which
belongs to the 0-taught set. Thus the firing rule
requires that the neuron should not fire when the
input is 010. On the other hand, 011 is
equally distant from two taught patterns that
have different outputs, and thus its output stays
undefined (0/1).
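
The Hamming-distance firing rule from this example can be sketched as follows. The function names and the use of bit strings are choices made for the illustration; the taught sets are the ones given in the example above.

```python
from itertools import product


def hamming(a, b):
    """Number of positions in which two equal-length bit strings differ."""
    return sum(p != q for p, q in zip(a, b))


def fire(pattern, one_taught, zero_taught):
    """Return '1', '0', or '0/1' (undefined) according to the nearest taught pattern."""
    d1 = min(hamming(pattern, t) for t in one_taught)
    d0 = min(hamming(pattern, t) for t in zero_taught)
    if d1 < d0:
        return "1"
    if d0 < d1:
        return "0"
    return "0/1"  # tie: the output stays undefined


one_taught = ["111", "101"]
zero_taught = ["000", "001"]
table = {
    "".join(bits): fire("".join(bits), one_taught, zero_taught)
    for bits in product("01", repeat=3)
}
for pattern, out in table.items():
    print(pattern, out)
```

After the rule is applied, 010 resolves to 0 and 110 resolves to 1, while 011 and 100 remain undefined because each is equally close to both taught sets.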
32
  • Strengths of the network
  • Neural networks are well suited to noisy or
    partial data sets. Transfer functions such as
    the sigmoid function normally smooth out the
    variations.
  • ANNs can process and predict numeric as well as
    categorical outcomes.
  • ANNs can be used for applications that require a
    time element to be included in the data set.
  • Neural networks have performed well in certain
    domains where rules are not defined and there is
    no structure.
  • The network can be trained for supervised
    learning and for unsupervised clustering.

33
  • Weaknesses
  • The biggest weakness is that they cannot explain
    the reasoning behind a decision. This is
    important at times.
  • The learning algorithms are not guaranteed to
    converge to an optimal solution. However, you can
    adjust various learning parameters.
  • Neural networks are easily over-trained
    (they memorize) to the point of working well with
    training data but performing poorly on test data.
    You have to monitor this problem carefully.

34
Forward and Backward Propagation
35
  • Developing NN Models
  • One of the important steps is the selection of
    the network structure. We will discuss the
    detailed structures at a later stage.
  • Associative Memory Systems
  • This refers to the ability to recall complete
    situations from partial information. Such systems
    correlate input data with information stored in
    memory.
  • Information can be recalled even from incomplete
    or noisy inputs.
  • Associative memory systems can detect
    similarities between new inputs and stored input
    patterns, using a distance criterion.
  • Hidden Layer Systems
  • Complex practical applications require one or
    more hidden layers between inputs and outputs
    and a correspondingly large number of weights.

36
  • Hidden Layer Systems (contd.)
  • Using more than three layers is rare.
  • The amount of computation involved is enormous.
  • Double Layered Networks
  • This structure does not require knowledge of the
    precise number of classes in the training data
    (unsupervised). It is normally used in cases
    where the output is not given and only input data
    are available.
  • Instead, it uses a feed-forward and feed-backward
    approach to adjust parameters/weights as data
    are analyzed, establishing an arbitrary
    (required) number of categories that represent
    the data presented to the system.

37
  • Back Propagation Network
  • It is the most widely used architecture. It is a
    very popular technique that is relatively easy to
    implement. It requires a large amount of training
    data for conditioning the network before using it
    to predict the outcome.
  • A back-propagation network includes at least one
    hidden layer.
  • The approach is considered a feed-forward/back-
    propagation approach.
  • Limitations
  • NNs do not do well at tasks that are not done
    well by people.
  • They lack an explanation facility.
  • Training time can be excessive.

38
Back Propagation Algorithm
  • The most popular and successful method.
  • Steps to be followed for training:
  • Select the next training pair from the training
    set (input vector and desired output).
  • Present the input vector to the network.
  • The network calculates its output.
  • The network calculates the error between the
    network output and the desired output.
  • The network back-propagates the error.
  • Adjust the weights of the network in a way that
    minimizes the error.
  • Repeat the above steps for each vector in the
    training set until the error is acceptable for
    the whole training set.

39
Network Training
  • Supervised Learning
  • The network is presented with the input and the
    desired output.
  • Uses a set of inputs for which the desired
    outputs (results/classes) are known. The
    difference between the desired and actual output
    is used to calculate the adjustment to the
    weights of the NN structure.
  • Unsupervised Learning
  • The network is not shown the desired output.
  • The concept is similar to clustering.
  • It tries to create a classification of the
    outcomes.

40
  • Unsupervised Learning
  • Only input stimuli (parameters) are presented to
    the network. The network is self-organizing, that
    is, it organizes itself internally so that each
    hidden processing element and its weights respond
    appropriately to a different set of input
    stimuli.
  • No knowledge is supplied about the classification
    of outputs. However, the number of categories
    into which the network classifies the inputs can
    be controlled by varying certain parameters in
    the model. In any case, a human expert must
    examine the final classifications to assign
    meaning and judge the usefulness of the results.
  • Reinforcement Learning
  • In between supervised and unsupervised learning.
  • The network gets feedback from the environment.

41
Learning (Training) Algorithms
The training process requires a set of properly
selected data in the form of network inputs and
target outputs. During training, the weights and
biases are iteratively adjusted to minimize the
network performance function (error). The default
performance function is mean square error. Input
data should be independent.
Back-Propagation learning algorithm
There are many variations. The commonly used one
is the gradient descent algorithm:

x_{k+1} = x_k - \alpha_k g_k

where x_k is the vector of current weights and
biases, g_k is the current gradient, and \alpha_k
is the chosen learning rate.
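
A minimal sketch of this update rule follows, applied to a made-up quadratic error surface so that the gradient is easy to write down. The error function, its minimum at (3, -1), and the constant learning rate of 0.1 are all assumptions for the illustration; in a network the gradient would come from back-propagation.

```python
def gradient_descent_step(x, g, rate):
    """One update: x_{k+1} = x_k - rate * g_k, applied element by element."""
    return [xi - rate * gi for xi, gi in zip(x, g)]


def grad(x):
    """Gradient of the toy error E(x) = (x0 - 3)^2 + (x1 + 1)^2."""
    return [2.0 * (x[0] - 3.0), 2.0 * (x[1] + 1.0)]


x = [0.0, 0.0]            # stands in for the vector of weights and biases
for k in range(100):
    x = gradient_descent_step(x, grad(x), rate=0.1)
print(x)                  # approaches the minimum at (3, -1)
```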
42
Back Propagation Learning Algorithm
  • It is the most commonly used generalization of
    the delta rule. The procedure involves two
    phases:
  • Forward phase: when the input is presented, it
    propagates forward through the network to compute
    output values for each processing element. The
    current outputs are then compared with the
    desired outputs and the error is computed.
  • Backward phase: the calculated error is now fed
    backward and the weights are adjusted.
  • After completing both phases, a new input is
    presented for further training.
  • This technique is slow, can cause instability,
    and has a tendency to get stuck in local minima,
    but it is still very popular.

43
Gradient Descent Algorithm
The idea is to calculate an error each time the
network is presented with a training vector (given
that we have supervised learning where there is a
target vector) and to perform a gradient descent on
the error, considered as a function of the weights.
There will be a gradient or slope for each
weight. Thus, we find the weights which give the
minimal error. Typically the error criterion is
defined by the square of the difference between
the pattern output and the target output (least
squared error). The total error E is then just
the sum of the squared pattern errors.
44
Error function (LMS)
E = \sum_p (t_p - o_p)^2
where t_p is the target output and o_p is the network output for pattern p.
45
(No Transcript)
46
This method of weight adjustment is also known
as the steepest gradient descent technique or the
Widrow-Hoff rule, and it is the most common type.
It is also known as the Delta rule.
47
Network Learning Rules
Hebbian Rule
The first and best known learning rule was
introduced by Donald Hebb. The basic rule is: if a
neuron receives an input from another neuron, and
if both are highly active (mathematically, have the
same sign), the weight between the neurons should
be strengthened.

w_{ij}(t+1) = w_{ij}(t) + \eta x_i(t) y_j(t)

where x_i(t) and y_j(t) are the outputs at nodes i
and j, w_{ij} is the weight between nodes i
and j, and \eta is the learning rate.
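
As a rough illustration of the Hebbian rule, the sketch below uses the common form in which the weight change is the learning rate times the product of the two node outputs. The function name and the learning rate of 0.1 are assumptions for the example.

```python
def hebbian_update(w_ij, x_i, y_j, rate=0.1):
    """Strengthen the weight when x_i and y_j have the same sign, weaken it otherwise."""
    return w_ij + rate * x_i * y_j


w = 0.5
print(hebbian_update(w, 1.0, 1.0))    # both active: weight grows
print(hebbian_update(w, 1.0, -1.0))   # opposite signs: weight shrinks
```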
48
Hopfield Rule
This law is similar to Hebb's Rule with the
exception that it specifies the magnitude of the
strengthening or weakening. It states: "If the
desired output and the input are both active or
both inactive, increment the connection weight by
the learning rate; otherwise decrement the weight
by the learning rate." (Most learning functions
have some provision for a learning rate, or
learning constant. Usually this term is positive
and between zero and one.)
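
Read literally, the Hopfield rule changes the weight by a fixed step in either direction. A minimal sketch, assuming boolean activity flags and a learning rate of 0.1 (both are choices made for the example):

```python
def hopfield_rule_update(w, input_active, desired_active, rate=0.1):
    """Increment w by the learning rate if input and desired output agree, else decrement."""
    if input_active == desired_active:
        return w + rate
    return w - rate


w = 0.0
w = hopfield_rule_update(w, True, True)     # both active: +rate
w = hopfield_rule_update(w, False, False)   # both inactive: +rate
w = hopfield_rule_update(w, True, False)    # disagreement: -rate
print(w)
```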
49
Error Correction Rule
In supervised learning, the network is given a
desired output (d) for each input pattern. During
the training period, the actual output (y) may not
be equal to the desired output. The basic principle
of the Error Correction learning rule is to use the
error signal (d-y) to modify the connection weights
so as to gradually reduce the error.
50
The Delta Rule
The Delta Rule is a further variation of Hebb's
Rule, and it is one of the most commonly used. This
rule is based on the idea of continuously modifying
the strengths of the input connections to reduce
the difference (the delta) between the desired
output value and the actual output of a neuron.
This rule changes the connection weights in the way
that minimizes the mean squared error of the
network. The error is back-propagated into previous
layers one layer at a time.
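
For a single neuron, the delta rule can be sketched as below. The sketch assumes a linear output unit with no bias, a learning rate of 0.1, and three made-up training samples generated from the target relationship d = 2*x1 - x2; none of these come from the slides.

```python
def delta_rule_train(samples, n_inputs, rate=0.1, epochs=200):
    """Repeatedly nudge each weight by rate * (desired - actual) * input."""
    w = [0.0] * n_inputs
    for _ in range(epochs):
        for x, d in samples:
            y = sum(wi * xi for wi, xi in zip(w, x))   # actual output
            delta = d - y                              # the "delta" to be reduced
            w = [wi + rate * delta * xi for wi, xi in zip(w, x)]
    return w


# Hypothetical samples consistent with d = 2*x1 - 1*x2.
samples = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
w = delta_rule_train(samples, n_inputs=2)
print(w)   # approaches [2, -1]
```

In a multi-layer network the same idea is applied layer by layer, with the error term for hidden units obtained by back-propagation.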
51
Application of Delta Rule
(Figure: the delta between the target output and the network output)
52
The process of back-propagating the network
errors continues until the first layer is
reached. Networks of this type are called
feed-forward, back-propagation networks; the name
derives from this method of computing the error
term. This rule is also referred to as the
Widrow-Hoff Learning Rule and the Least Mean
Square Learning Rule.
Kohonen's Learning Law
This procedure, developed by Teuvo Kohonen,
was inspired by learning in biological systems.
In this procedure, the neurons compete for the
opportunity to learn, or to update their weights.
53
The processing neuron with the largest output is
declared the winner and has the capability of
inhibiting its competitors as well as exciting
its neighbors. Only the winner is permitted
output, and only the winner plus its neighbors
are allowed to update their connection weights.
The Kohonen rule does not require desired
output. Therefore it is implemented in the
unsupervised methods of learning. The size of
the neighborhood may vary during the training
period. It narrows as the training proceeds.
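
The competitive update described above can be sketched as follows. Several simplifications are assumed: the units sit on a one-dimensional line, the winner is the unit whose weight vector is closest to the input (equivalent to the largest output when weights and inputs are normalized), and the neighborhood radius and learning rate follow a simple made-up schedule.

```python
import random


def winner_index(weights, x):
    """Index of the unit whose weight vector is closest to the input x."""
    return min(range(len(weights)),
               key=lambda j: sum((w - v) ** 2 for w, v in zip(weights[j], x)))


def kohonen_step(weights, x, rate, radius):
    """Move the winner and its neighbors (within radius) toward the input."""
    win = winner_index(weights, x)
    for j in range(len(weights)):
        if abs(j - win) <= radius:
            weights[j] = [w + rate * (v - w) for w, v in zip(weights[j], x)]
    return win


def train_som(data, n_units=4, epochs=50, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        radius = 1 if epoch < epochs // 2 else 0   # the neighborhood narrows
        rate = 0.5 * (1 - epoch / epochs)          # the learning rate decays
        for x in data:
            kohonen_step(weights, x, rate, radius)
    return weights


data = [[0.1, 0.1], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
for unit in train_som(data):
    print([round(v, 2) for v in unit])
```

No desired output appears anywhere in the update, which is why the rule suits unsupervised learning.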
54
  • Other numerical algorithms used are:
  • Conjugate Gradient algorithm
  • Resilient Back Propagation
  • Fletcher-Reeves algorithm
  • Polak-Ribiere algorithm
  • Powell-Beale algorithm
  • Quasi-Newton algorithms
  • Levenberg-Marquardt
  • These algorithms have been designed to optimize
    the network performance. However, the training
    time depends on the chosen algorithm.

55
Network Structures
  • Feed Forward Networks
  • Feed Forward Back Propagation Networks
  • Recurrent ( or feedback) Networks
  • Self Organizing Networks
  • Adaptive Resonance Learning Network Model
  • Hopfield Network

56
Network Architectures
Feed Forward Networks
There are no feedback paths, and you cannot
modify outputs based on the errors between the
actual output and the desired output.
Feed Forward Back Propagation Networks
These types of networks are the most common, and
enough time has already been devoted to this
network architecture.
57
  • In the most common family of feed forward
    networks, called multi-layered perceptrons,
    neurons are organized into layers that have
    unidirectional connections between them.
  • Feed forward networks are memory-less, in the
    sense that their output is independent of the
    previous network state.
  • Recurrent or feedback networks, on the other
    hand, are dynamic.
  • In dynamic systems, when a new input pattern is
    presented, the neuron outputs are computed.
    Because of the feedback paths, the inputs to each
    neuron are modified, which leads the network to
    enter a new state.
  • Different network architectures require
    appropriate learning algorithms.

58
  • Recurrent ( or feedback) Networks
  • In these networks there are feedback loops
    present. These networks can learn from their
    mistakes and are highly adaptive in nature.
    These kinds of networks train slowly and work
    well with noisy inputs.

Recurrent Network
59
Recurrent Network
60
Self Organizing Networks
This class of networks learns without being given
the correct output (classification) for the input
pattern. It models neuro-biological systems fairly
closely. This auto-associative network is
single-layer, recurrent and highly connected. All
the weights must be initialized, and both the
weights and the inputs must be normalized or
adjusted. Processing elements compete for the
privilege of learning. It uses the 'winner takes
all' learning rule: the node with the highest
response and its neighbors are allowed to adjust
their weights. As the training proceeds, the
neighborhood becomes more focused. The network
trains very fast and works well in real-time mode.
The network learns continuously and adapts to
changes very fast.
61
outputs
inputs
Self Organizing Network
62
  • Adaptive Resonance Theory - Learning Network
    Model
  • Balancing stability and plasticity is an
    important issue in competitive learning: how do
    we learn new things (plasticity) and yet retain
    stability, to ensure that existing knowledge is
    not erased (or corrupted)?
  • ART models were developed for this purpose. The
    network has a sufficient supply of output units,
    but they are not used until deemed necessary.
  • A unit is said to be committed if it is being
    used.
  • An input vector and a stored pattern are said
    to resonate when they are sufficiently
    similar.
  • When the input vector is not sufficiently close
    to any of the existing prototypes, a new category
    is created.

63
  • The similarity threshold is predefined, and an
    uncommitted unit is assigned to the new category.

outputs
inputs
ART Network
64
  • Hopfield Network
  • Hopfield networks use an energy function as a
    tool for designing recurrent networks.
  • It is a single-layered network. Its key property
    is that it produces a content-addressable memory,
    which correctly yields the stored output from any
    sub-part (data) of sufficient size. However, the
    storage capacity for patterns is limited to about
    15% of the number of nodes. This is an important
    limitation.
  • This architecture is used for identification from
    partially visible and/or noisy information, e.g.
    military target identification and robotic
    control systems.
  • The output of each PE is normally binary. The
    final output can be binary or continuous.
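
The content-addressable behaviour can be illustrated with a very small Hopfield-style network. This is a sketch under stated assumptions: bipolar (+1/-1) states, Hebbian outer-product weights with no self-connections, asynchronous updates in a fixed order, and a single made-up stored pattern of eight elements.

```python
def train_hopfield(patterns):
    """Build the weight matrix from the stored patterns (outer-product rule)."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:                 # no self-connections
                    w[i][j] += p[i] * p[j] / n
    return w


def recall(w, state, sweeps=5):
    """Update units one at a time until the state settles (fixed number of sweeps)."""
    state = list(state)
    for _ in range(sweeps):
        for i in range(len(state)):
            net = sum(w[i][j] * state[j] for j in range(len(state)))
            state[i] = 1 if net >= 0 else -1
    return state


stored = [1, -1, 1, -1, 1, -1, 1, -1]
w = train_hopfield([stored])
noisy = [-1, -1, 1, -1, 1, -1, 1, -1]      # first element corrupted
print(recall(w, noisy))                     # settles back to the stored pattern
```

Storing many more patterns than the capacity limit mentioned above would cause recall errors; the sketch stores only one to keep the behaviour easy to follow.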

65
outputs
inputs
Hopfield Network
66
Pitfalls
  • ANNs are not a panacea.
  • Long training period
  • May take several hours or even days.
  • Local Minima
  • Network may get trapped, resulting in an inferior
    solution.
  • Complex Input Modeling
  • May not be suitable for all problems.

67
Advantages of ANNs
  • They learn over a period of time from their
    mistakes, and keep learning while making
    decisions.
  • Ability to generalize patterns.
  • Abstraction of essential features.
  • Ability to produce correct output even in the
    presence of noise or imprecise data.

68
  • Applications
  • Basically, most applications of neural networks
    fall into the following categories
  • Prediction
  • Uses input values to predict some output. e.g.
    pick the best stocks in the market, predict
    weather, identify people with cancer risk.
  • Classification
  • Use input values to determine the
    classification. e.g. is the input the letter A,
    is the blob of the video data a plane and what
    kind of plane is it.
  • Data association
  • Like the classification, but it also recognizes
    data that contains errors. e.g. not only identify
    the characters that

69
  • were scanned but identify when the scanner is
    not working properly.
  • Data Conceptualization
  • Analyze inputs so that grouping relationships
    can be inferred, e.g. extract from a database the
    names of those most likely to buy a particular
    product from the existing products.
  • Data Filtering
  • Smooth an input signal, e.g. take the noise
    out of a telephone voice signal.

70
Neural networks in medicine Artificial Neural
Networks (ANN) are currently a 'hot' research
area in medicine and it is believed that they
will receive extensive application to biomedical
systems in the next few years. At the moment, the
research is mostly on modelling parts of the
human body and recognising diseases from various
scans (e.g. cardiograms, CAT scans, ultrasonic
scans, etc.). Neural networks are ideal in
recognising diseases using scans since there is
no need to provide a specific algorithm on how to
identify the disease.
71
Neural networks learn by example so the details
of how to recognise the disease are not needed.
What is needed is a set of examples that are
representative of all the variations of the
disease. The examples need to be selected very
carefully if the system is to perform reliably
and efficiently. Modelling and Diagnosing the
Cardiovascular System Neural Networks are used
experimentally to model the human cardiovascular
system. Diagnosis can be achieved by building a
model of the cardiovascular system of an
individual and comparing it with the real time
physiological measurements taken from the
patient.
72
A model of an individual's cardiovascular system
must mimic the relationship among physiological
variables (i.e., heart rate, systolic and
diastolic blood pressures, and breathing rate) at
different physical activity levels. If a model
is adapted to an individual, then it becomes a
model of the physical condition of that
individual. The simulator will have to be able to
adapt to the features of any individual without
the supervision of an expert. This calls for a
neural network.
73
Another reason that justifies the use of ANN
technology is the ability of ANNs to provide
sensor fusion, which is the combining of values
from several different sensors. Sensor fusion
enables the ANNs to learn complex relationships
among the individual sensor values, which would
otherwise be lost if the values were individually
analyzed. In medical modeling and diagnosis, this
implies that even though each sensor in a set may
be sensitive only to a specific physiological
variable, ANNs are capable of detecting complex
medical conditions by fusing the data from the
individual biomedical sensors.
74
  • Other Possible Applications
  • Signal processing: suppressing line noise with
    adaptive echo canceling.
  • Robotics: navigation, vision recognition.
  • Pattern recognition, e.g. recognizing handwritten
    characters; the current version of Apple's
    Newton uses a neural net.
  • Medical diagnosis: ADEs, selection of drugs,
    diagnosis of diseases from given symptoms, etc.
  • Speech production: reading text aloud (NETtalk).
  • Speech recognition.

75
  • Vision: face recognition, edge detection, visual
    search engines.
  • Business, e.g. rules for mortgage decisions are
    extracted from past decisions made by experienced
    evaluators, resulting in a network that has a
    high level of agreement with human experts.
  • Financial applications: time series analysis,
    stock market prediction.
  • Data compression: speech signals, images, e.g.
    faces.
  • Game playing: backgammon, chess, go, ...