Title: From Neurons to Neural Networks
Slide 1: From Neurons to Neural Networks
- Jeff Knisley
- East Tennessee State University
- Mathematics of Molecular and Cellular Biology Seminar, Institute for Mathematics and its Applications, April 2, 2008
Slide 2: Outline of the Talk
- Brief Description of the Neuron
- A Hot-Spot Dendritic Model
- Classical Hodgkin-Huxley (HH) Model
- A Recent Approach to HH Nonlinearity
- Artificial Neural Nets (ANNs)
- 1957-1969: Perceptron Models
- 1980s to the present: MLPs and Others
- 1990s: Neuromimetic (Spiking) Neurons
Slide 3: Components of a Neuron
Slide 4: Pre-Synaptic to Post-Synaptic
If the threshold is exceeded, the neuron fires, sending a signal along its axon.
Slide 5: Signal Propagation along Axon
- Signal is electrical
- Membrane depolarization from resting -70 mV
- Myelin acts as an insulator
- Propagation is electro-chemical
- Sodium channels open at breaks in myelin
- Much higher external Sodium ion concentrations
- Potassium ions work against sodium
- Chloride, other influences also very important
- Rapid depolarization at these breaks
- Signal travels faster than if only electrical
Slide 6: Signal Propagation along Axon
(Figure: charge distribution along the myelinated axon membrane)
Slide 7: Action Potentials
- Sodium ion channels open and close, which causes potassium ion channels to open and close.
Slide 8: Action Potentials
(Figures: a model spike and an actual spike train)
Slide 9: Post-Synaptic Input May Be Subthreshold
Signals decay at the soma if they are below a certain threshold.
Slide 10: Derivation of the Model
- Some assumptions:
- Assume the neuron separates R^3 into 3 regions: interior (i), exterior (e), and the boundary membrane surface (m).
- Assume Ee, Ei are the electric fields and Be, Bi the magnetic flux densities in the exterior and interior (Maxwell's equations).
- Assume magnetic induction is negligible.
- Then Ee = -∇Ve and Ei = -∇Vi for the potentials Ve, Vi.
Slide 11: Current Densities ji and je
- Let σl be the conductivity 2-tensor, l = i, e.
- Intracellular: homogeneous, small radius.
- Extracellular: ion populations!
- Ohm's law (local): jl = σl El = -σl ∇Vl.
Slide 12: Assume Circular Cross-Sections
- Let V = Vi - Ve - Vrest be the membrane potential difference, and let Rm, Ri, C be the membrane resistance, intracellular resistance, and membrane capacitance, respectively. Let Isyn be a catch-all for ion channel activity.
- The result is the cable equation (see the sketch below).
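For reference, a minimal sketch of the cable equation under these definitions, assuming d denotes the cross-sectional diameter and Ri the intracellular (axial) resistivity; the talk's exact normalization may differ.

```latex
% Sketch of the linear cable equation under the definitions above.
% Assumptions: d = cable diameter, Ri = intracellular (axial) resistivity,
% Rm = membrane resistance, C = membrane capacitance, Isyn = catch-all
% ion channel current. The talk's exact normalization may differ.
\[
  \frac{d}{4 R_i}\,\frac{\partial^{2} V}{\partial x^{2}}
  \;=\; C\,\frac{\partial V}{\partial t} \;+\; \frac{V}{R_m} \;-\; I_{\mathrm{syn}}
\]
```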
Slide 13: Dimensionless Cables
- Let X = x/λ and T = t/τm, where λ is the electrotonic length constant and τm = RmC is constant; the dimensionless cable equation (with Iion) follows, as sketched below.
- Tapered cylinders: Z instead of X and a taper constant K, again with an Iion term.
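A hedged sketch of the dimensionless form; the tapered version is written assuming an exponential taper of rate K, which is an assumption about the talk's taper convention.

```latex
% Dimensionless cable (sketch): X = x/lambda, T = t/tau_m, tau_m = Rm C.
\[
  \frac{\partial V}{\partial T} \;=\; \frac{\partial^{2} V}{\partial X^{2}} \;-\; V \;+\; I_{\mathrm{ion}}
\]
% Tapered cylinder (assuming an exponential taper with constant K, written
% in the tapered coordinate Z): the taper contributes a first-order term.
\[
  \frac{\partial V}{\partial T} \;=\; \frac{\partial^{2} V}{\partial Z^{2}} \;+\; K\,\frac{\partial V}{\partial Z} \;-\; V \;+\; I_{\mathrm{ion}}
\]
```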
Slide 14: Rall's Theorem for Untapered Daughters
- If at each branching the parent diameter d and the daughter cylinder diameters d1, ..., dk satisfy d^(3/2) = d1^(3/2) + ... + dk^(3/2), then the dendritic tree can be reduced to a single equivalent cylinder.
(Figure: a branched parent tree and its equivalent cylinder)
Slide 15: Dendritic Models
(Figures: a soma with a tapered equivalent cylinder, and a full arbor model)
Slide 16: Tapered Equivalent Cylinder
- Rall's theorem (modified for taper) allows us to collapse the tree to an equivalent cylinder.
- Assume hot spots at x0, x1, ..., xm.
(Figure: soma at x = 0 with hot spots at x0, x1, ..., xm along a cylinder of length l)
Slide 17: Ion Channel Hot Spots
- (Poznanski) Ij is due to the ion channel(s) at the jth hot spot.
- The Green's function G(x, xj, t) is the solution to the hot-spot equation with Ij as a point source and the other currents set to 0,
- plus boundary conditions and initial conditions.
- The Green's function is the solution to the equivalent cylinder model.
Slide 18: Equivalent Cylinder Model (Iion = 0)
(Figure: equivalent cylinder attached to the soma; see the sketch below)
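A sketch of the passive boundary value problem this slide refers to, assuming a sealed distal end and a lumped RC soma at X = 0; the specific boundary conditions used in the talk may differ.

```latex
% Sketch of the passive equivalent-cylinder problem with I_ion = 0
% (assumed boundary conditions: sealed distal end, lumped RC soma at X = 0;
% the talk's exact conditions may differ).
\[
  \frac{\partial V}{\partial T} \;=\; \frac{\partial^{2} V}{\partial X^{2}} \;-\; V,
  \qquad 0 < X < L,\; T > 0,
\]
\[
  \left.\frac{\partial V}{\partial X}\right|_{X = L} = 0,
  \qquad \text{plus a lumped (RC) soma condition at } X = 0 .
\]
```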
Slide 19: Properties
- The spectrum consists solely of non-negative eigenvalues.
- The eigenvectors are orthogonal under voltage clamp.
- The eigenvectors are not orthogonal in the original problem.
- Solutions are multi-exponential decays.
- Linear models are useful for subthreshold activation, assuming the nonlinearities (Iion) are not arbitrarily close to the soma (and there are no electric field (ephaptic) effects).
Slide 20: Somatic Voltage Recording
(Figure: somatic voltage trace over roughly 10 ms, annotated with saturation to steady state, an experimental artifact, and ionic channel effects)
Slide 21: Hodgkin-Huxley Ionic Currents
- 1963 Nobel Prize in Medicine
- Cable equation plus ionic currents (Isyn)
- From numerous voltage clamp experiments with the squid giant axon (0.5-1.0 mm in diameter)
- Produces action potentials
- Ionic channels:
- n = potassium activation variable
- m = sodium activation variable
- h = sodium inactivation variable
Slide 22: Hodgkin-Huxley Equations
- C dV/dt = -ḡNa m³h (V - VNa) - ḡK n⁴ (V - VK) - ḡL (V - VL) + I
- dn/dt = αn(V)(1 - n) - βn(V) n, and similarly for m and h,
- where any V with a subscript is constant, any g with a bar is constant, and each of the α's and β's is of a similar form.
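As an illustration of how these equations produce action potentials, here is a minimal space-clamped Hodgkin-Huxley integration in Python using the textbook squid-axon parameter set (standard published values, not parameters taken from the talk); forward Euler is used only for brevity.

```python
# Minimal space-clamped Hodgkin-Huxley sketch (textbook parameters,
# forward Euler). Illustration only; not the talk's implementation.
import numpy as np

# Standard squid-axon constants (mV, mS/cm^2, uF/cm^2).
C = 1.0
g_Na, g_K, g_L = 120.0, 36.0, 0.3
E_Na, E_K, E_L = 50.0, -77.0, -54.4

# Voltage-dependent rate functions (the alphas and betas of the slide).
def a_n(V): return 0.01 * (V + 55.0) / (1.0 - np.exp(-(V + 55.0) / 10.0))
def b_n(V): return 0.125 * np.exp(-(V + 65.0) / 80.0)
def a_m(V): return 0.1 * (V + 40.0) / (1.0 - np.exp(-(V + 40.0) / 10.0))
def b_m(V): return 4.0 * np.exp(-(V + 65.0) / 18.0)
def a_h(V): return 0.07 * np.exp(-(V + 65.0) / 20.0)
def b_h(V): return 1.0 / (1.0 + np.exp(-(V + 35.0) / 10.0))

dt, T = 0.01, 50.0                          # time step and duration (ms)
V, n, m, h = -65.0, 0.317, 0.053, 0.596     # resting values
for step in range(int(T / dt)):
    t = step * dt
    I_app = 10.0 if t >= 5.0 else 0.0       # sustained suprathreshold current
    I_ion = (g_Na * m**3 * h * (V - E_Na)
             + g_K * n**4 * (V - E_K)
             + g_L * (V - E_L))
    V += dt * (I_app - I_ion) / C           # membrane equation
    n += dt * (a_n(V) * (1.0 - n) - b_n(V) * n)
    m += dt * (a_m(V) * (1.0 - m) - b_m(V) * m)
    h += dt * (a_h(V) * (1.0 - h) - b_h(V) * h)
    if step % 500 == 0:
        print(f"t = {t:5.1f} ms, V = {V:7.2f} mV")
```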
Slide 23: HH Combined with Hot Spots
- The solution to the equivalent cylinder with hot spots can be written in terms of the Green's function and the hot-spot currents (see the sketch below),
- where Ij is the restriction of V to the jth hot spot.
- At a hot spot, V satisfies an ODE of Hodgkin-Huxley form,
- where m, n, and h are functions of V.
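A hedged sketch of this representation, following the Green's function setup of Slide 17; the exact form and normalization in the talk may differ.

```latex
% Hedged sketch of the hot-spot representation: V0 is the response of the
% passive equivalent cylinder to the synaptic input alone, G is its
% Green's function, and Ij is the ionic current generated at hot spot xj.
\[
  V(x, t) \;=\; V_0(x, t) \;+\; \sum_{j=0}^{m} \int_{0}^{t} G(x, x_j, t - s)\, I_j(s)\, ds
\]
% Each Ij depends on V at its own hot spot through the Hodgkin-Huxley
% gating variables, so the representation must be solved self-consistently.
```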
Slide 24: Brief Description of an Approach to HH Ion Channel Nonlinearities
- Goal: accessible approximations that still produce action potentials.
- Can be addressed using linear embedding, which is closely related to the method of turning variables (a toy example is sketched below).
- This maps a finite-degree, polynomially nonlinear dynamical system into an infinite-degree linear system.
- The result is an infinite-dimensional linear system which is as unmanageable as the original nonlinear equation:
- non-normal operators with continua of eigenvalues;
- difficult to project back to the nonlinear system (convergence and stability are thorny).
- But the approach still has some value (action potentials).
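To make the embedding idea concrete, here is a generic toy example (not from the talk): applying the substitution y_k = V^k to a cubic scalar ODE turns it into an infinite linear system.

```latex
% Toy illustration of linear embedding (not the talk's derivation):
% start from a cubic scalar ODE and substitute y_k = V^k.
\[
  \frac{dV}{dt} = V - V^{3}, \qquad y_k := V^{k}, \; k = 1, 2, 3, \dots
\]
\[
  \frac{dy_k}{dt} \;=\; k\,V^{k-1}\,\frac{dV}{dt} \;=\; k\,\bigl(y_k - y_{k+2}\bigr)
\]
% Each y_k satisfies a linear equation, but only by coupling to the higher
% moment y_{k+2}: the embedded system is linear and infinite-dimensional,
% and truncating it raises the convergence and stability issues noted above.
```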
Slide 25: The Hot-Spot Model, Qualitatively
Key features: summation of synaptic inputs. If V(0, t) is large, an action potential travels down the axon.
Slide 26: Artificial Neural Network (ANN)
- Made of artificial neurons, each of which
- Sums inputs xi from other neurons
- Compares sum to threshold
- Sends signal to other neurons if above threshold
- Synapses have weights
- Model relative ion collections
- Model efficacy (strength) of synapse
Slide 27: Artificial Neuron
(Figure: inputs x1, ..., xn with synaptic weights feeding a summation and a nonlinear firing function; in symbols, see the sketch below)
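The standard artificial-neuron output, written here in generic notation since the slide's figure is not reproduced: firing function σ, inputs x_i, weights w_i, and threshold θ.

```latex
% Standard artificial-neuron output (generic notation): a weighted sum
% of the inputs x_i is passed through a nonlinear firing function sigma,
% with firing threshold theta.
\[
  y \;=\; \sigma\!\Bigl(\sum_{i=1}^{n} w_i\, x_i \;-\; \theta\Bigr)
\]
```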
Slide 28: First Generation, 1957-1969
- Best understood in terms of classifiers:
- Partition a data space into regions containing data points of the same classification.
- The regions are predictions of the classification of new data points.
Slide 29: Simple Perceptron Model
- Given 2 classes: Reference and Sample.
- The firing function (activation function) has only two values, 0 or 1.
- Learning is by incremental updating of the weights using a linear learning rule (a minimal sketch follows below).
(Figure: perceptron with weights w1, w2, ..., wn)
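A minimal perceptron training sketch in Python, assuming the usual 0/1 step firing function and the classic incremental update; the function names and toy data are illustrative, not from the talk.

```python
# Minimal perceptron sketch (illustration only): 0/1 step firing function,
# incremental weight updates driven by misclassified points.
import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=50):
    """X: (n_samples, n_features); y: labels in {0, 1}."""
    w = np.zeros(X.shape[1])   # synaptic weights
    b = 0.0                    # threshold, folded in as a bias term
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            fired = 1 if xi @ w + b > 0 else 0   # 0/1 firing function
            # Linear learning rule: nudge weights toward the correct output.
            w += lr * (yi - fired) * xi
            b += lr * (yi - fired)
    return w, b

# Toy linearly separable "Reference" vs. "Sample" data (AND function).
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
print(w, b)
```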
Slide 30: Perceptron Limitations
- Cannot do XOR (1969, Minsky and Papert).
- Data must be linearly separable.
- 1970s: the ANNs' "wilderness experience"; only a handful working, and very un-neuron-like.
Slide 31: Support Vector Machine: A Perceptron on a Feature Space
- Data is projected into a high-dimensional feature space and separated with a hyperplane.
- The choice of feature space (kernel) is key.
- Predictions are based on the location of the hyperplane (see the decision function sketched below).
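The standard kernel-machine decision function, included as a generic reference; the α_i, b, and kernel K are the usual SVM quantities, not notation from the talk.

```latex
% Generic kernel-machine decision function (standard SVM form, not
% notation from the talk): the alpha_i and b come from margin
% maximization, and the kernel K implicitly defines the feature space.
\[
  f(x) \;=\; \operatorname{sign}\!\Bigl(\sum_{i=1}^{N} \alpha_i\, y_i\, K(x_i, x) \;+\; b\Bigr),
  \qquad y_i \in \{-1, +1\}
\]
```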
Slide 32: Second Generation, 1981 to the Present
- Big ideas from other fields:
- J. J. Hopfield compares neural networks to Ising spin-glass models and uses statistical mechanics to prove that ANNs minimize a total energy functional.
- Cognitive psychology provides new insights into how neural networks learn.
- Big ideas from math:
- Kolmogorov's Theorem
Slide 33: Firing Functions Are Sigmoidal
Slide 34: 3-Layer Neural Network
The output layer may consist of a single neuron.
(Figure: input, hidden, and output layers; the hidden layer is usually much larger)
Slide 35: Multilayer Network
(Figure: multilayer network diagram)
Slide 36: Hilbert's Thirteenth Problem
- Original: Are there continuous functions of 3 variables that are not representable as a superposition of compositions of functions of 2 variables?
- Modern: Can a continuous function of n variables on a bounded domain of n-space be written as sums of compositions of functions of 1 variable?
Slide 37: Kolmogorov's Theorem
- Modified version: any continuous function f of n variables can be written in the superposition form sketched below,
- where only h and the w's depend on f
- (that is, the g's are fixed).
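A sketch of the superposition form in the slide's h / w / g notation; the indexing and normalization are assumptions, since the slide's displayed formula is not reproduced here.

```latex
% Sketch of the superposition form in the slide's h / w / g notation
% (the indexing below is an assumption about the slide's exact statement):
% only h and the weights w_p depend on f; the inner functions g_q are fixed.
\[
  f(x_1, \dots, x_n) \;=\; \sum_{q=0}^{2n} h\!\Bigl(\sum_{p=1}^{n} w_p \, g_q(x_p)\Bigr)
\]
```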
Slide 38: Cybenko (1989)
- Let σ be any continuous sigmoidal function, and let x = (x1, ..., xn).
- If f is absolutely integrable over the n-dimensional unit cube, then for all ε > 0 there exists a (possibly very large) integer N and vectors w1, ..., wN such that the approximation sketched below holds,
- where a1, ..., aN and θ1, ..., θN are fixed parameters.
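A sketch of the approximation guaranteed by the theorem, in the slide's notation; the error is measured over the unit cube, matching the integrability hypothesis.

```latex
% Cybenko-style approximation by a single hidden layer of N sigmoidal
% units (sketch in the slide's notation; error measured on [0,1]^n).
\[
  \Bigl\|\, f(\mathbf{x}) \;-\; \sum_{j=1}^{N} a_j\, \sigma\!\bigl(\mathbf{w}_j \cdot \mathbf{x} + \theta_j\bigr) \Bigr\| \;<\; \varepsilon
  \qquad \text{on } [0,1]^{n}
\]
```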
Slide 39: Multilayer Networks (MLPs)
(Figure: multilayer network diagram)
Slide 40: ANN as a Universal Classifier
- Designs a function f : Data → Classes
- Example: f(Red) = 1, f(Blue) = 0
- The support of f defines the regions
- Data is used to train (i.e., design) the function f
(Figure: a classified region labeled supp(f))
Slide 41: Example: Predicting Trees That Are or Are Not RNA-like
(Figure: trees labeled "RNA-like" and "not RNA-like")
- Construct graphical invariants
- Train the ANN using known RNA trees
- Predict the others
Slide 42: 2nd Generation: Phenomenal Success
- Data mining of microarray data
- Stock and commodities trading: ANNs are an important part of computerized trading
- Post office mail sorting
Slide 43: The Mars Rovers
- An ANN decides between "rough" and "smooth"
- "rough" and "smooth" are ambiguous
- Learning via many examples
And a neural network can lose up to 10% of its neurons without significant loss in performance!
Slide 44: ANN Limitations
- Overfitting, e.g., if the training set is unbalanced
- Mislabeled data can lead to slow (or no) convergence or incorrect results.
- Hard margins: no fuzzing of the boundary
Slide 45: Problems on the Horizon
- The limitations are becoming very limiting:
- Trained networks are often poor learners (and self-learners are hard to train).
- In real neural networks, more neurons imply better networks (not so in ANNs).
- Temporal data is problematic: ANNs have no concept, or a poor concept, of time.
- Hybridized ANNs are becoming the rule:
- SVMs are probably the tool of choice at present
- SOFMs, fuzzy ANNs, connectionism
Slide 46: Third Generation, 1997 -
- Back to biology: Spiking Neural Networks (SNNs)
- Asynchronous, action-potential-driven ANNs have been around for some time.
- SNNs show promise, but results beyond current ANNs have been elusive.
- Simulating the actual HH equations (neuromimetic) has to date not been enough.
- Time is both a promise and a curse.
- A possible approach: use current dendritic models to modify existing ANNs.
Slide 47: ANNs with Multiple Time Scales
- An SNN that reduces to an ANN preserves the Kolmogorov theorem.
- The solution to the equivalent cylinder with hot spots is as on Slide 23,
- where Ij is the restriction of V to the jth hot spot.
- Equivalent artificial neuron
Slide 48: Incorporating Multi-Exponentials
- G(0, x, t) is often a multi-exponential decay,
- in terms of time constants τk (a sketch follows below).
- The wjk are synaptic weights.
- The τk come from electrotonic and morphometric data:
- rate of taper, length of dendrites,
- branching, capacitance, resistance.
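A hedged sketch of the multi-exponential form; the coefficients below are generic placeholders for whatever the eigenfunction expansion of the equivalent cylinder produces.

```latex
% Multi-exponential sketch of the somatic Green's function. The c_k are
% generic placeholders for the coefficients produced by the eigenfunction
% expansion of the equivalent cylinder; the tau_k are the time constants
% determined by the electrotonic and morphometric data above.
\[
  G(0, x_j, t) \;\approx\; \sum_{k} c_k(x_j)\, e^{-t/\tau_k}
\]
```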
Slide 49: Approximation and Simplification
- If xj(u) ≈ 1 or xj(u) ≈ 0, then the expression simplifies.
- A special case (k is a constant):
- t = 0 yields the standard neural net model.
- Standard neural net as the initial steady state.
- Modify with a time-dependent transient.
Slide 50: Artificial Neuron
- wij, pij are synaptic weights.
(Figure: inputs with weight pairs (wi1, pi1), ..., (win, pin) feeding a nonlinear firing function)
Slide 51: Steady State and Transient
- Sensitivity and soft margins:
- t = 0 is a perceptron with weights wij.
- t = ∞ is a perceptron with weights wij + pij.
- For all t in (0, ∞), a traditional ANN with weights between wij and wij + pij.
- The transient is a perturbation scheme.
- Many predictions over time (soft margins); a toy sketch follows this list.
- Algorithm:
- Partition the training set into subsets.
- Train at t = 0 for the initial subset.
- Train at t > 0 values for the other subsets.
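A toy Python sketch of the steady-state-plus-transient neuron; the single-exponential time course (and τ) is an assumption standing in for the multi-exponential transients above, used only to show how the effective weights move from wij to wij + pij.

```python
# Toy sketch of a "steady state + transient" neuron (illustration only).
# The exponential time course here is an assumption standing in for the
# multi-exponential dendritic transients described in the talk.
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def neuron_output(x, w, p, t, tau=1.0):
    """Effective weights move from w (at t = 0) to w + p (as t -> infinity)."""
    w_eff = w + p * (1.0 - np.exp(-t / tau))
    return sigmoid(w_eff @ x)

x = np.array([0.5, -1.0, 2.0])
w = np.array([0.2, 0.4, -0.1])     # steady-state weights
p = np.array([0.1, -0.2, 0.3])     # transient weights
for t in (0.0, 0.5, 2.0, 10.0):    # many predictions over time -> soft margin
    print(t, neuron_output(x, w, p, t))
```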
Slide 52: Training the Network
- Define an energy function (a standard choice is sketched below).
- The p vectors are the information to be learned.
- Neural networks minimize energy.
- The information in the network is equivalent to the minima of the total squared energy function.
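A standard squared-error energy of the kind this slide describes, written generically since the slide's displayed functional is not reproduced here.

```latex
% A standard squared-error energy over training patterns p^(k) with
% targets t^(k); y(p) denotes the network output. Written generically,
% since the slide's displayed functional is not reproduced here.
\[
  E \;=\; \tfrac{1}{2} \sum_{k} \bigl\|\, t^{(k)} - y\bigl(p^{(k)}\bigr) \bigr\|^{2}
\]
```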
Slide 53: Back Propagation
- Minimize the energy:
- choose the wj and aj so that E is minimized.
- In practice, this is hard.
- Back propagation with a continuous sigmoidal function:
- feed forward, calculate E, modify the weights;
- repeat until E is sufficiently close to 0 (a minimal sketch follows below).
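A minimal backpropagation sketch for a one-hidden-layer sigmoidal network trained on XOR; the layer sizes, learning rate, and stopping rule are arbitrary illustrative choices, not the talk's settings.

```python
# Minimal backpropagation sketch: one hidden layer of sigmoidal units,
# squared-error energy E, trained on XOR. Illustration only.
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([[0.], [1.], [1.], [0.]])               # XOR targets

W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)       # input -> hidden
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)       # hidden -> output
lr = 0.5

for epoch in range(20000):
    # Feed forward
    H = sigmoid(X @ W1 + b1)                          # hidden activations
    Y = sigmoid(H @ W2 + b2)                          # network outputs
    E = 0.5 * np.sum((T - Y) ** 2)                    # squared-error energy
    if E < 1e-3:                                      # E sufficiently close to 0
        break
    # Back propagate: layer deltas, then gradient-descent weight updates
    dY = (Y - T) * Y * (1 - Y)                        # output-layer delta
    dH = (dY @ W2.T) * H * (1 - H)                    # hidden-layer delta
    W2 -= lr * H.T @ dY;  b2 -= lr * dY.sum(axis=0)
    W1 -= lr * X.T @ dH;  b1 -= lr * dH.sum(axis=0)

print(f"epoch {epoch}, E = {E:.4f}, outputs = {Y.ravel().round(2)}")
```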
Slide 54: Back Propagation with the Transient
- Train the network initially (choose the wj and aj).
- Each synapse is given a transient weight pij.
- Algorithm addressing over-fitting/sensitivity:
- The weights must be given random initial values.
- The weights pij are also given random initial values.
- Separate training of the wj and aj and the pij ameliorates over-fitting during the training sequence.
Slide 55: Observations/Results
- Spiking does occur,
- but only if the network is properly initiated.
- The spikes only resemble action potentials.
- This is one approach to SNNs:
- not likely to be the final word;
- other real-neuron features may be necessary (e.g., tapering axons can limit the frequency of action potentials; also branching!).
- This approach does show promise in handling temporal information.
Slide 56: Any Questions?