Title: CENG 569 Spring 2006 NEUROCOMPUTING
1 CENG 569 Spring 2006 NEUROCOMPUTING
Erol Sahin
Dept. of Computer Engineering
Middle East Technical University
Inonu Bulvari, 06531, Ankara, TURKEY
- Week 1
- Introduction to neural nets: the beginnings and the basics
- Course objectives
2 What's our motivation?
- Science: Model how biological neural systems, like the human brain, work.
- How do we see?
- How is information stored in/retrieved from memory?
- How do you learn not to touch fire?
- How do your eyes adapt to the amount of light in the environment?
- Related fields: Neuroscience, Computational Neuroscience, Psychology, Psychophysiology, Cognitive Science, Medicine, Math, Physics.
3 What's our motivation?
- Engineering: Design information processing systems that are as good as the biological ones.
- How can we design a vision system that can see?
- How can we store data such that it can be retrieved fast?
- How can we make a robot not repeat an action that burned it before?
- How can we make an artificial retina that automatically adapts to the amount of light?
- Related fields: Computer Science, Statistics, Electronics Engineering, Mechanical Engineering.
4 The biological neuron - simplified
- The basic information processing element of neural systems. The neuron
- receives input signals generated by other neurons through its dendrites,
- integrates these signals in its body,
- then generates its own signal (a series of electric pulses) that travels along the axon, which in turn makes contact with the dendrites of other neurons.
- The points of contact between neurons are called synapses.
5 The biological neuron - 1
- The pulses generated by the neuron travel along the axon as an electrical wave.
- Once these pulses reach the synapses at the end of the axon, they open chemical vesicles that excite the other neuron.
6 The biological neuron - 2
7 What is this course about? - 1
- Introduce and review computational models that are often classified as neural networks.
- The first part of the course will cover
- Perceptron, Adaline
- Multi-layer perceptrons, and the back-propagation learning algorithm
- Hopfield model, Boltzmann machine
- Unsupervised learning models
- Kohonen's self-organizing maps
- Radial basis functions
- Adaptive Resonance Theory models
- Support vector machines
- Properties and applications of these models will be reviewed.
- Several programming projects will be given.
8 What is this course about? - 2
- In the second part of the course, we will cover more biologically plausible models of neural circuits
- Hodgkin-Huxley model of the biological neuron
- Feed-forward shunting networks and their properties
- Recurrent shunting networks and their properties
- Classical and operant conditioning and their neural models
9 Today's topics
- Brief history
- McCulloch-Pitts neuron
- Perceptron
- Adaline
10 Brief History
- Old Ages
- Association (William James 1890)
- McCulloch-Pitts Neuron (1943, 1947)
- Perceptrons (Rosenblatt 1958, 1962)
- Adaline/LMS (Widrow and Hoff 1960)
- Perceptrons book (Minsky and Papert 1969)
- Dark Ages
- Self-organization in visual cortex (von der Malsburg 1973)
- Backpropagation (Werbos 1974)
- Foundations of Adaptive Resonance Theory (Grossberg 1976)
- Neural Theory of Association (Amari 1977)
11 Brief History (cont'd)
- Modern Ages
- Adaptive Resonance Theory (Grossberg 1980)
- Hopfield model (Hopfield 1982, 1984)
- Self-organizing maps (Kohonen 1982)
- Reinforcement learning (Sutton and Barto 1983)
- Simulated Annealing (Kirkpatrick et al. 1983)
- Boltzmann machines (Ackley, Hinton, Sejnowski 1985)
- Backpropagation (Rumelhart, Hinton, Williams 1986)
- ART networks (Carpenter, Grossberg 1992)
- Support Vector Machines
12 William James
- William James (1890): Association
- "Mental facts cannot properly be studied apart from the physical environment of which they take cognizance."
- Principle of association: "When two brain processes are active together or in immediate succession, one of them, on reoccurring, tends to propagate its excitement into the other."
13 McCulloch-Pitts neuron model (1943)
- f() is called the activation function.
- θ is also called the bias.
- x1, x2, ..., xN are the inputs at time t-1.
- Inputs are binary.
- At each interval the neuron can fire at most once.
- Positive weights (wi) correspond to excitatory synapses and negative weights correspond to inhibitory synapses.
- θ is the threshold for the neuron to fire.
- f() is a non-linear hard-limiting function in the original McCulloch-Pitts model.
- No learning mechanism!
- Later f() was replaced by other continuous squashing functions.
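In symbols, the unit computes y(t) = f(Σi wi xi(t-1) - θ), with f a hard-limiting step function. A minimal Python sketch of such a unit follows; the function name, the specific weights, and the threshold value are illustrative choices, not taken from the slides.

    def mp_neuron(inputs, weights, theta):
        # Weighted sum of binary inputs; the unit fires (1) only if the
        # sum reaches the threshold theta (hard-limiting activation).
        s = sum(w * x for w, x in zip(weights, inputs))
        return 1 if s >= theta else 0

    # Illustrative unit: two excitatory inputs (+1) and one inhibitory input (-2).
    # An active inhibitory input keeps the unit from firing.
    print(mp_neuron([1, 1, 0], weights=[1, 1, -2], theta=2))  # fires: 1
    print(mp_neuron([1, 1, 1], weights=[1, 1, -2], theta=2))  # inhibited: 0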
14 McCulloch-Pitts
- McCulloch and Pitts (1943) showed that networks made from these neurons can implement logic functions, such as AND, OR, and XOR. Therefore these networks are universal computation devices.
- Homework
- Build AND, OR and INVERT gates using McCulloch-Pitts neurons.
- What aspects of the McCulloch-Pitts neuron are different from the biological neuron?
15 McCulloch-Pitts
- McCulloch-Pitts (1943): the first computational neuron model.
- Showed that networks made from these neurons can implement logic functions, such as AND, OR, and XOR. Therefore these networks are universal computation devices.
- Assumptions made
- Neuron activation is binary.
- At least a certain number of excitatory inputs are needed to excite the neuron.
- Even a single inhibitory input can inhibit the neuron.
- No delay.
- Network structure is fixed.
- No adaptation!
16 Hebb's Learning Law
- In 1949, Donald Hebb formulated William James' principle of association into a mathematical form.
- If the activations of two neurons, y1 and y2, are both on (1), then the weight between the two neurons grows (off = 0).
- Otherwise the weight between them remains the same.
- However, when a bipolar activation scheme {-1, 1} is used, the weights can also decrease when the activations of the two neurons do not match.
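In symbols (a standard reconstruction consistent with the description above; η denotes a learning rate that the slide does not name):

    Δw12 = η · y1 · y2

With binary {0, 1} activations the product y1·y2 is 1 only when both neurons are on, so the weight can only grow; with bipolar {-1, 1} activations the product is negative when the two activations disagree, so the weight can also decrease.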
17 Perceptron - history
- Proposed by Rosenblatt et al. (1958-1962).
- A large class of neural models that incorporate learning.
- The Mark I perceptron was built with a retina of 20x20 receptors.
- Learned to recognize letters.
- Created excitement, and hype. (ENIAC was built in 1945.)
18 Perceptron - structure
- "A perceptron is a network of S, A and R units with a variable interaction which depends on the sequence of the past activity states of the network" (Rosenblatt 1962).
- S: Sensory unit
- A: Association unit
- V: Variable interaction matrix
- R: The learning unit, a.k.a. the perceptron
- Learning: Setting the weights of V such that the network correctly classifies the input patterns.
19 Perceptron - clearer structure
[Figure: retina feeding associative units through fixed weights, which feed a response unit through variable weights; the response unit uses a step activation function.]
Slide adapted from Dr. Nigel Crook, Oxford Brookes University.
20 Perceptron - neuron
- The perceptron's activation y depends on the (linear) sum of inputs (including a bias) converging on the neuron through weighted pathways.
- x1, x2, ..., xN can be continuous.
- It can learn!
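Written out (a standard reconstruction; b denotes the bias, f the hard-limiting activation function of the earlier slides, so b plays the role of -θ):

    y = f(w1·x1 + w2·x2 + ... + wN·xN + b),  with f(a) = 1 if a > 0, else 0.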
21 Perceptron - activation
[Figure: two input units X1, X2 connected to two output units Y1, Y2 through weights w1,1, w1,2, w2,1, w2,2.]
Slide adapted from Dr. Nigel Crook, Oxford Brookes University.
22 Classification problem
- Imagine that there are two groups of patterns: one group is classified as A, whereas the other is classified as B.
- From a given set of example patterns whose categories are known a priori, how can one learn to correctly classify input patterns, both those that were seen before and those that were not?
- Not an easy problem!
23 Input space representation
Slide adapted from Dr. Nigel Crook, Oxford Brookes University.
24 Perceptron learning
- If the perceptron classified the input correctly, then do nothing.
- If not, the weights of the active units are adjusted (incremented or decremented, depending on the direction of the error).
- Learning is guaranteed for all the problems that the perceptron can classify!
- For each input pattern X, compute the (binary) output y, and compare it against the desired output d.
- Activation
- If X·W - θ > 0, then y = 1, else y = 0.
- Learning
- If y = d, that is, the output is correct,
- then W(t+1) = W(t).
- If not, that is, the output is incorrect,
- then W(t+1) = W(t) + ΔW(t), where ΔW(t) = η (d - y) X.
- (A minimal code sketch of this procedure follows below.)
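The sketch below is one way this rule can be written in Python; the function name, the learning rate, the toy data, and the choice to fold the bias into the weight vector as an extra input are all illustrative, not prescribed by the slide.

    def train_perceptron(patterns, targets, n_inputs, eta=0.1, epochs=100):
        # One extra weight acts as the bias (the threshold folded into the weights).
        w = [0.0] * (n_inputs + 1)
        for _ in range(epochs):
            errors = 0
            for x, d in zip(patterns, targets):
                xb = list(x) + [1.0]                      # append constant bias input
                y = 1 if sum(wi * xi for wi, xi in zip(w, xb)) > 0 else 0
                if y != d:                                # update only on mistakes
                    w = [wi + eta * (d - y) * xi for wi, xi in zip(w, xb)]
                    errors += 1
            if errors == 0:                               # no mistakes: stop early
                break
        return w

    # Illustrative use on a small linearly separable problem (made-up data).
    w = train_perceptron([(0, 0), (0, 1), (1, 0), (1, 1)], [0, 0, 0, 1], n_inputs=2)
    print(w)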
25 Perceptron - intuition
- A perceptron defines a hyperplane in the input space: a line in 2-D (two inputs), a plane in 3-D (three inputs), and, in general, an (N-1)-dimensional hyperplane for N inputs.
- The perceptron is a linear classifier: its output is 0 on one side of the hyperplane, and 1 on the other.
- Given a linearly separable problem, the perceptron learning rule guarantees convergence.
26 Adaline - history
- "An adaptive pattern classification machine (called Adaline, for adaptive linear) . . ."
- Proposed by Widrow and Hoff.
- "During a training phase, crude geometric patterns are fed to the machine by setting the toggle switches in the 4x4 input switch array. Setting another toggle switch (the reference switch) tells the machine whether the desired output for the particular input pattern is +1 or -1. The system learns something from each pattern and accordingly experiences a design change." (Widrow and Hoff 1960)
27 Adaline - Widrow-Hoff Learning
- The learning idea is as follows:
- Define an error function that measures the performance of the network in terms of the weights, input, output, and desired output.
- Take the derivative of this function with respect to the weights, and modify the weights such that the error is decreased.
- Also known as the Least Mean Square (LMS) error algorithm, the Widrow-Hoff rule, or the Delta rule.
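Concretely, for a single output unit and a single pattern this works out as follows (a standard reconstruction; the slide itself does not write the derivative out):

    E = (t - y_in)²,  with  y_in = b + Σj xj wj
    ∂E/∂wj = -2 (t - y_in) xj
    Δwj = η (t - y_in) xj   (a step down the gradient, with the constant factor absorbed into the learning rate η)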
28 The ADALINE
- The Widrow-Hoff rule (also known as the Delta Rule)
- Minimizes the error between the desired output t and the net input y_in.
- Minimizes the squared error for each pattern: E = (t - y_in)²
- Example: if s = 1 and t = 0.5, then the graph of E against w1,1 is a parabola with its minimum at w1,1 = 0.5.
- Gradient descent: wij(new) = wij(old) + η (ti - y_in_i) xj
[Figure: the squared error E = (t - y_in)² plotted against w1,1, with tick marks at 0, 0.1, 0.25, 0.5, 0.9 and 1.]
Slide adapted from Dr. Nigel Crook, Oxford Brookes University.
29 The ADALINE learning algorithm
Step 0: Initialize all weights and set the learning rate:
        wij = (small random values), η = 0.2 (for example)
Step 1: While the stopping condition is false:
  Step 1.1: For each training pair s:t:
    Step 1.1.1: Set activations on the input units: xj = sj
    Step 1.1.2: Compute the net input to the output units: y_in_i = bi + Σj xj wij
    Step 1.1.3: Update bias and weights:
                bi(new) = bi(old) + η (ti - y_in_i)
                wij(new) = wij(old) + η (ti - y_in_i) xj
Slide adapted from Dr. Nigel Crook, Oxford Brookes University.
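A minimal Python sketch of this loop for a single output unit; the learning rate, the fixed number of sweeps used as a stopping condition, the zero initialization, and the example data are illustrative choices, not prescribed by the slide.

    def train_adaline(patterns, targets, n_inputs, eta=0.2, epochs=50):
        w = [0.0] * n_inputs   # the slide suggests small random values; zeros keep the sketch short
        b = 0.0
        for _ in range(epochs):                     # crude stopping condition: fixed number of sweeps
            for s, t in zip(patterns, targets):     # Step 1.1: each training pair s:t
                x = list(s)                         # Step 1.1.1: input activations xj = sj
                y_in = b + sum(xj * wj for xj, wj in zip(x, w))               # Step 1.1.2: net input
                b += eta * (t - y_in)                                         # Step 1.1.3: update bias
                w = [wj + eta * (t - y_in) * xj for wj, xj in zip(w, x)]      # ...and weights
        return w, b

    # Illustrative use (made-up data): learn t = 0.5*x1 + 0.25*x2.
    w, b = train_adaline([(1, 0), (0, 1), (1, 1)], [0.5, 0.25, 0.75], n_inputs=2)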
30 Adaptive filter
- F1 registers the input pattern.
- Signals Si are modulated through weighted connections.
- F2 computes the pattern match between the input and the weights.
- Σi xi wij = X · Wj = |X| |Wj| cos(X, Wj)
31 Adaptive filter elements
- The dot product computes the projection of one vector on another. The term |X| |Wj| denotes the energy, whereas cos(X, Wj) denotes the pattern.
- If both vectors are normalized (|X| = |Wj| = 1), then X · Wj = cos(X, Wj). This indicates how well the weight vector of the neuron matches the input vector.
- The neuron with the largest activity at F2 has the weights that are closest to the input. This property is inherent in computational models of neurons, and can be considered a good model for biological neurons.
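A small Python illustration of this matching step; the function names and the vector values are made up, and the winner is simply the F2 unit whose normalized weight vector has the largest dot product (i.e., cosine) with the normalized input.

    import math

    def normalize(v):
        n = math.sqrt(sum(vi * vi for vi in v))
        return [vi / n for vi in v]

    def best_match(x, weight_vectors):
        # Activity of each F2 unit is the dot product of the normalized input
        # with its normalized weight vector, i.e. cos(X, Wj).
        xn = normalize(x)
        activities = [sum(a * b for a, b in zip(xn, normalize(w))) for w in weight_vectors]
        return max(range(len(activities)), key=lambda j: activities[j])

    # Illustrative input and two weight vectors; unit 0 matches better.
    print(best_match([1.0, 0.2], [[0.9, 0.1], [0.1, 0.9]]))  # 0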
32 Course Organization
33 Teaching staff
- Asst. Prof. Erol Sahin (erol_at_ceng.metu.edu.tr)
- Location: B-106, Tel: 210 5539
- E-mail: erol_at_ceng.metu.edu.tr
- Office hours: By appointment.
- Lectures: Monday 13:40-16:30 (BMB4)
34 Books
- Introduction to the Theory of Neural Computation, by John Hertz, Anders Krogh, and Richard G. Palmer, Santa Fe Institute Studies in the Sciences of Complexity.
- Neurocomputing: Foundations of Research, edited by J. A. Anderson and E. Rosenfeld, MIT Press, Cambridge, 1988.
- Self-Organization and Associative Memory, by T. Kohonen, Springer-Verlag, 1988. Available at the Reserve section of the library.
- Parallel Distributed Processing I and II, by J. McClelland and D. Rumelhart, MIT Press, Cambridge, MA, 1986.
- Pattern Recognition by Self-Organizing Neural Networks, edited by G. A. Carpenter and S. Grossberg, MIT Press, Cambridge, MA, 1994.
- Principles of Neural Science, by E. R. Kandel, J. H. Schwartz, and T. M. Jessell, Appleton & Lange, 1991.
- Also other complementary articles that will be made available.
35 Workload and Grading
36 Weekly reading assignments
- You will be given weekly readings to be read before the next class.
- A one- to two-page summary of these readings will be required. Sometimes, you will be handed questions to answer, or problems to solve.
37 Projects
- You will be asked to simulate four neural network models and apply them to a given problem.
- For each project, you will be asked to submit a 4-5 page report, at conference-paper quality.
- Your project report will be graded based on its
- Style,
- Writing,
- Results and their analysis,
- Discussion of results.
38 Presentation
- You will be asked to review one or more papers and make a 15-minute presentation on them.
- If you already have a topic you are interested in, you can propose it in advance.
39 Communication
- These slides will be available at the course webpage: http://kovan.ceng.metu.edu.tr/erol/Courses/CENG569/
- Announcements about the course will be made on the web site. (The CENG569 newsgroup at news://metu.ceng.course.569 can be used for other discussions regarding the course.)
- If you have a specific question, you can send an e-mail to me. However, make sure that the subject line starts with CENG569 (capital letters, no spaces) to get a faster reply.
40 Policies
- Late assignments
- Reading assignments are due within the first 15 minutes of each class. Late submissions are not accepted.
- Project reports submitted within 1 week of the due time will get 80%, within 2 weeks 60%. Reports will not be accepted afterwards.
- Academic dishonesty
- All assignments submitted should be fully your own. We have a zero tolerance policy on cheating and plagiarism. Your work will be regularly checked for such misconduct, and the following punishment policy will be applied.
41 Cheating
- What is cheating?
- Sharing code: either by copying, retyping, looking at, or supplying a copy of a file.
- What is NOT cheating?
- Helping others use systems or tools.
- Helping others with high-level design issues.
- Helping others debug their code.
42 Good Luck!
43 Homework - 1
- In half a page, describe the information processing aspects of the biological neuron.
- Build AND, OR and INVERT gates using McCulloch-Pitts neurons.
- What aspects of the McCulloch-Pitts neuron are different from the biological neuron?