Title: Neural Networks
1. Neural Networks
2. Learning Objectives
- Understand the principles of neural networks
- Understand the backpropagation algorithm
3. Principles of Neural Networks
- A Neural Network (NN), or Artificial Neural Network (ANN), is based on the analogy of the brain as a network of neurons; a neuron is a brain cell capable of collecting electric signals, processing them, and disseminating them.
- Synonyms: connectionist networks, connectionism, neural computation, parallel distributed processing.
- Among the most effective machine learning methods for interpreting complex real-world sensor data, for example recognizing hand-written characters (LeCun), spoken words (Lang), or faces (Cottrell).
4. Principles of Neural Networks
- Biological background: the human brain contains a network of about 10^11 interconnected neurons, each with a high number of interconnections (about 10^4).
- The fastest neuron response time is about 10^-3 seconds, yet the brain is capable of fast decisions (about 10^-1 s to recognize one's mother); this speed of response can be explained by massively parallel processing.
- ANNs imitate real neurons imperfectly: many characteristics of real neural networks are not, or cannot be, reproduced in ANNs.
5. Principles of Neural Networks
- Mathematical model for a neuron
[Figure: model of unit i - inputs yj, weighted by wj,i, feed the input function xi = sum_j wj,i yj; the activation function f produces the output yi = f(xi); a fixed bias input y0 = -1 enters through the bias weight w0,i.]
6. Principles of Neural Networks
- Bias weight: w0,i is the bias, or threshold, of the unit, and is associated with a fixed input activity (y0 = -1 in the figure above).
- Criteria for the activation function:
- The unit should be active (near 1) when the right inputs arrive and inactive (near 0) when other inputs arrive
- The function should be nonlinear
- Threshold function
- Sigmoid function: 1 / (1 + e^-x) (see the sketch below)
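As a minimal sketch of the two activation functions above (assuming NumPy; the function names are illustrative, not from the slides):

    import numpy as np

    def threshold(x):
        # Hard threshold: active (1) when the input is non-negative, else 0.
        return np.where(x >= 0, 1.0, 0.0)

    def sigmoid(x):
        # Smooth, differentiable alternative: 1 / (1 + e^-x).
        return 1.0 / (1.0 + np.exp(-x))

    print(threshold(np.array([-2.0, 0.5])))  # [0. 1.]
    print(sigmoid(np.array([-2.0, 0.5])))    # approx. [0.12 0.62]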
7. Principles of Neural Networks
[Figure: plots of the threshold function and the sigmoid function.]
8. Principles of Neural Networks
- Types of neural networks
- Feed-forward networks (or acyclic): the output is a function of the current input only
- Recurrent networks (or cyclic): feed their outputs back into their own inputs
- Several layers
- Input layer → input units
- Layer of hidden units
- Output layer → output units
9. Principles of Neural Networks
- x5 = f(w3,5 x3 + w4,5 x4) = f(w3,5 f(w1,3 x1 + w2,3 x2) + w4,5 f(w1,4 x1 + w2,4 x2)); x5 is thus a nonlinear function of x1 and x2 (see the sketch below)
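A hedged sketch of this forward pass, assuming sigmoid activations; the weight values are arbitrary placeholders, not from the slides:

    import numpy as np

    def f(x):
        # Activation function; a sigmoid is assumed here.
        return 1.0 / (1.0 + np.exp(-x))

    def forward(x1, x2, w):
        # Hidden units x3 and x4, then output x5, composing f as above.
        x3 = f(w["w13"] * x1 + w["w23"] * x2)
        x4 = f(w["w14"] * x1 + w["w24"] * x2)
        return f(w["w35"] * x3 + w["w45"] * x4)

    w = {"w13": 0.5, "w23": -0.4, "w14": 0.3,
         "w24": 0.8, "w35": 1.2, "w45": -0.7}
    print(forward(1.0, 0.0, w))  # x5, a nonlinear function of x1 and x2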
10. Principles of Neural Networks
[Figure: two threshold units with inputs x1, x2 and weights w1 = w2 = 1; threshold w0 = 0.5 implements OR, threshold w0 = 1.5 implements AND.]
- ANNs can represent the boolean functions AND, OR, NAND, and NOR; any boolean function can be represented with a network two levels deep (see the sketch below)
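A minimal sketch of the two units from the figure, using the threshold activation and the fixed -1 bias input (weights as given: w1 = w2 = 1, threshold w0 = 0.5 for OR and 1.5 for AND):

    def unit(x1, x2, w0, w1=1.0, w2=1.0):
        # Threshold unit: fires when the weighted input reaches the bias w0.
        return 1 if (w1 * x1 + w2 * x2 - w0) >= 0 else 0

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "OR:", unit(a, b, 0.5), "AND:", unit(a, b, 1.5))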
11. Principles of Neural Networks
- A perceptron is a single-layer feed-forward neural network
- Each output unit is independent of the others
12. Principles of Neural Networks
- A perceptron (single-layer feed-forward neural network) can only represent functions that are linearly separable
13. Neural Networks Principles
- Learn by adjusting weights to reduce the error on the training set
- The squared error for an example with input x and true output y is E = 1/2 Err^2 = 1/2 (y - hw(x))^2, where hw(x) is the network output
- Perform optimization search by gradient descent
- Simple weight update rule: wj ← wj + α · Err · g'(in) · xj, where in is the weighted input to the unit and α the learning rate (see the sketch below)
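A minimal sketch of this gradient-descent rule for a single sigmoid unit; the learning rate, epoch count, and example task are illustrative assumptions:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def train_perceptron(X, y, alpha=0.5, epochs=1000):
        # Prepend the fixed bias input -1 to every example.
        Xb = np.hstack([-np.ones((X.shape[0], 1)), X])
        w = np.zeros(Xb.shape[1])
        for _ in range(epochs):
            for x, target in zip(Xb, y):
                h = sigmoid(np.dot(w, x))
                err = target - h
                # wj <- wj + alpha * Err * g'(in) * xj
                w += alpha * err * h * (1.0 - h) * x
        return w

    # Example: learn OR, which is linearly separable.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([0, 1, 1, 1], dtype=float)
    print(train_perceptron(X, y))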
14. Principles of Neural Networks
- ANNs can be used for
- Classification
- Regression
- Machine learning terminology
- Data: (d1, t1), ..., (dn, tn) (data/target pairs)
- Training set / validation set
- Supervised learning: the model is fitted to the data/target pairs
- Unsupervised learning: the target is not known
- Classification → supervised learning
- Regression → supervised learning as well (clustering, by contrast, is unsupervised)
15. Universal Approximation Properties
- Neural networks can approximate any reasonable real function to any degree of precision (regression) with a 3-layer network
- Any boolean function can be approximated by a multi-layer feed-forward network, since boolean functions are combinations of threshold gates
- The 3-layer network has x as the input, a hidden layer of sigmoid units, and one layer of linear (identity function) output units, the hidden layer being as large as needed
16. Universal Approximation Properties
- Hypothesis
- f is uniformly continuous on [0, 1]
- f can be approximated with a step function g such that
- g(0) = f(0)
- g(x) = f(k/n) for x in ((k-1)/n, k/n], k = 1..n
- The network needs one input unit, one output unit receiving a connection from each hidden unit, and n+1 hidden threshold units (see the sketch below)
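A hedged sketch of this construction (f and n are placeholders): one threshold unit per breakpoint (k-1)/n, each weighted by the jump f(k/n) - f((k-1)/n), plus the constant f(0):

    import math

    def step(z):
        # Threshold unit: fires for strictly positive input.
        return 1.0 if z > 0 else 0.0

    def make_g(f, n):
        jumps = [(f(k / n) - f((k - 1) / n), (k - 1) / n)
                 for k in range(1, n + 1)]
        def g(x):
            # g(0) = f(0); g(x) = f(k/n) on ((k-1)/n, k/n].
            return f(0) + sum(w * step(x - b) for w, b in jumps)
        return g

    g = make_g(math.sin, 100)
    print(math.sin(0.3), g(0.3))  # the two values agree closely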
17. Backpropagation Algorithm
- Layers are usually fully connected; the number of hidden units is typically chosen by hand
18. Backpropagation Algorithm
- Expressiveness of multilayer perceptrons: all continuous functions with 2 layers, all functions with 3 layers
19. Backpropagation Algorithm
- Output layer: the same update rule as for the single-layer perceptron, wj,i ← wj,i + α · aj · Δi, with Δi = Erri · g'(ini)
- Hidden layer: back-propagate the error from the output layer, Δj = g'(inj) · Σi wj,i Δi
- Update rule for weights in the hidden layer: wk,j ← wk,j + α · ak · Δj
20. Backpropagation Algorithm
- The squared error on a single example is defined as E = 1/2 Σi (yi - ai)^2, where the sum is over the nodes in the output layer.
21. Backpropagation Algorithm
22. Backpropagation Algorithm
- At each epoch (one cycle through the examples), sum the gradient updates for all examples and apply them once (see the sketch below)
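A minimal batch-backpropagation sketch for one hidden layer, summing the gradient updates over all examples each epoch; layer sizes, the learning rate, and the XOR task are illustrative assumptions, not from the slides:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def add_bias(A):
        # Append the fixed -1 bias input from the neuron model.
        return np.hstack([A, -np.ones((A.shape[0], 1))])

    def train(X, Y, n_hidden=4, alpha=0.5, epochs=10000, seed=0):
        rng = np.random.default_rng(seed)
        W1 = rng.normal(scale=0.5, size=(X.shape[1] + 1, n_hidden))
        W2 = rng.normal(scale=0.5, size=(n_hidden + 1, Y.shape[1]))
        for _ in range(epochs):
            Xb = add_bias(X)                       # forward pass, all examples
            H = sigmoid(Xb @ W1)
            Hb = add_bias(H)
            O = sigmoid(Hb @ W2)
            d_out = (Y - O) * O * (1 - O)          # output-layer deltas
            d_hid = (d_out @ W2[:-1].T) * H * (1 - H)  # back-propagated error
            W2 += alpha * Hb.T @ d_out             # summed update, applied once
            W1 += alpha * Xb.T @ d_hid
        return W1, W2

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    Y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR needs a hidden layer
    W1, W2 = train(X, Y)
    print(sigmoid(add_bias(sigmoid(add_bias(X) @ W1)) @ W2).round(2))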
23. Backpropagation Algorithm
24. Backpropagation Algorithm
- Handwritten digit recognition error rates:
- 3-nearest-neighbor: 2.4% error
- 400-300-10 unit MLP: 1.6% error
- LeNet 768-192-30-10 unit MLP: 0.9% error
25. Applications
- Data clustering
- Classification
- Gene Reduction
- Gene Regulatory Networks
26. Clustering
- Tamayo et al. (1999) used SOMs to cluster gene expressions of yeast and humans.
- Data: yeast (Saccharomyces cerevisiae) cell cycle data from Spellman et al. (1998), and hematopoietic differentiation data
- SOMs (Self-Organizing Feature Maps, Kohonen) are well suited to grouping complex multidimensional data into clusters (a minimal sketch of the SOM update follows)
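A minimal SOM sketch under stated assumptions: this is the generic Kohonen update, not the GENECLUSTER implementation, and the learning rate, radius, and decay schedule are illustrative:

    import numpy as np

    def train_som(data, rows=6, cols=5, epochs=100, lr=0.5, radius=2.0, seed=0):
        rng = np.random.default_rng(seed)
        W = rng.normal(size=(rows, cols, data.shape[1]))
        grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                    indexing="ij"), axis=-1)
        for t in range(epochs):
            decay = 1.0 - t / epochs
            for x in data:
                # Best-matching unit: node whose weight vector is closest to x.
                d = np.linalg.norm(W - x, axis=-1)
                bmu = np.unravel_index(np.argmin(d), d.shape)
                # Pull the BMU and its grid neighbours toward x, with a
                # Gaussian falloff that shrinks over time.
                g = np.linalg.norm(grid - np.array(bmu), axis=-1)
                h = np.exp(-(g / (radius * decay + 1e-9)) ** 2)
                W += (lr * decay) * h[..., None] * (x - W)
        return W

    profiles = np.random.default_rng(1).normal(size=(416, 16))  # placeholder
    som = train_som(profiles)  # a 6 x 5 map, as in the paper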
27. Clustering
- Gene expression was measured at 10-minute intervals throughout two cell cycles (160 minutes), giving 16 timesteps
- The data were first filtered to find the genes showing significant variation in expression over the time series
- Gene expression levels were normalized across experiments to focus on the shape of the patterns, not their magnitude (a normalization sketch follows)
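One common way to implement such a normalization is to z-score each gene's profile across the 16 timesteps; this is a sketch of the idea, and the exact procedure in the paper may differ:

    import numpy as np

    def normalize_profiles(expr):
        # expr: (n_genes, n_timesteps); give each profile mean 0, variance 1
        # so clustering reflects the shape of the curve, not its magnitude.
        mean = expr.mean(axis=1, keepdims=True)
        std = expr.std(axis=1, keepdims=True)
        return (expr - mean) / (std + 1e-9)

    expr = np.random.default_rng(0).normal(size=(416, 16))  # placeholder
    print(normalize_profiles(expr).std(axis=1)[:3])  # approx. 1 per gene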
28. Clustering
- Self-Organizing Maps are the basis of GENECLUSTER, developed by the authors to cluster and visualize gene expressions
- A 6 x 5 node SOM was trained on 416 genes from yeast cell cycle expression data previously analyzed by hand, to compare the SOM clusters with those found manually.
- This gives 30 clusters, 4 of which replicate the four cell cycle stages. The clusters identified by the SOM match the clusters built by human experts very well, and correspond to the G1, S, G2, and M phases of the cell cycle.
29. Clustering
[Figure: comparison of SOM-derived and human-derived clusters.]
30. Classification
- Cai and Chou (1998) used ANNs to predict HIV protease cleavage sites in proteins.
- Knowing the HIV protease cleavage sites in proteins is helpful for designing specific and efficient HIV protease inhibitors.
- Subject of study: the HIV-1 protease.
- Training set: 299 oligopeptides. Test set: 63 oligopeptides. Result: a high rate of correct prediction (58/63 = 92.06%).
31. Classification
32. Classification
- HIV data: 114 positive sequences, 248 negative sequences, for a total of 362 sequences; 300 training cycles of the ANNs.
- HCV data: 168 positive sequences, 752 negative sequences, for a total of 920 sequences; 500 training cycles of the ANNs.
- 20% of the positives were held out for testing. 10 different training and test sets were created for HCV and HIV using roulette wheel random selection preserving the 20% criterion. Each training/test pair was run three times with random initialization of the network.
33. Gene Expression Data (GED)
- GED measure the relative expression levels of genes at a single timestep using cDNA or Affymetrix chips
- When individuals are measured only once, a gene classificatory network for the population can be extracted (see the myeloma data)
- When individuals are measured more than once across time, a gene regulatory network needs to be reverse engineered
34. Gene Reduction
- Narayanan et al. (2004) used ANNs to analyze myeloma gene expressions
- Goal: by analyzing the genes involved temporally in the development of a disease, identify patterns of genes to better characterize the disease and design efficient drugs.
- Design drugs to target specific genes at important points in time.
35. Gene Reduction
- Two major problems for current gene expression analysis techniques:
- Dimensionality: the sheer volume of data leads to the need for fast analytical tools
- Sparsity: there are many more genes than samples
- G = S + C: gene expression analysis (G) is concerned with selecting a small subset of relevant genes (the S problem) as well as combining individual genes to identify important causal and classificatory relationships (the C problem).
36. Gene Reduction
- Myeloma data: 7129 gene expression values for 105 samples. A one-layer feed-forward backpropagation ANN with 7129 input nodes and one output node (myeloma / normal).
- Trained until the sum of squared errors (SSE) on the output node is less than 0.001 (3000 epochs, 8 minutes on a Pentium laptop).
- Weight values ranged between -0.08196 and 0.07343, with an average of 0.000746; 1443 links had weight 0 across all runs.
37. Gene Reduction
- The top 220 genes were then selected, and the process of training the network was repeated on this subset (see the sketch below).
- The relevant data was extracted from the full dataset, with the class information of each sample. The top 21 genes for myeloma were finally extracted.
- Interesting causal and classificatory rules were learnt.
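A hedged sketch of this reduction loop: train a single-layer network, rank input genes by the magnitude of their trained weights, keep the top ones, and retrain on the subset. The placeholder data and the simple trainer are illustrative; only the thresholds (220, then 21) come from the slides:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def train_weights(X, y, alpha=0.1, epochs=200):
        # Single sigmoid output unit trained by gradient descent,
        # with the fixed -1 bias input in column 0.
        Xb = np.hstack([-np.ones((X.shape[0], 1)), X])
        w = np.zeros(Xb.shape[1])
        for _ in range(epochs):
            h = sigmoid(Xb @ w)
            w += alpha * Xb.T @ ((y - h) * h * (1 - h))
        return w

    def top_genes(X, y, keep):
        w = train_weights(X, y)
        return np.argsort(np.abs(w[1:]))[::-1][:keep]  # skip the bias weight

    # Placeholder standing in for the 105 x 7129 myeloma matrix.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(105, 500))
    y = rng.integers(0, 2, size=105).astype(float)

    idx220 = top_genes(X, y, 220)           # first reduction pass
    idx21 = top_genes(X[:, idx220], y, 21)  # retrain on the reduced subset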
38. Gene Reduction
- If U24685 (weight -1.84127) is absent then myeloma. U24685 corresponds to the anti-B cell autoantibody IgM heavy chain variable V-D-J region (VH4); this rule classified correctly 63 of 75 myeloma cases, with no false positives.
- If L00022 (weight -1.79993) is absent then myeloma. L00022 corresponds to the Ig active heavy chain epsilon-1; this rule classified correctly 68 of 75 myeloma cases, but also misclassified three normal cases.
39. Gene Reduction
- If X57809 (weight 1.58233) is present then myeloma. X57809 corresponds to the rearranged immunoglobulin lambda light chain; this rule classified correctly 51 of 75 myeloma cases, with no false positives.
- If M34516 is present then myeloma. M34516 corresponds to the omega light chain protein 14.1 (Ig lambda chain related); this rule classified correctly 61 of 75 myeloma cases, but also misclassified two normal cases.
40. Gene Regulatory Networks
- Gene network construction
- Requires temporal GED
- Develops relationships between gene expression values across timesteps
- These relationships can then form a gene regulatory network
- This network describes the excitation and inhibition which govern gene expression patterns
41. Gene Regulatory Networks
42. Gene Regulatory Networks
- Boolean network model
- Each gene receives one or several inputs from other genes
- A sigmoid function models the gene as a binary element
- The output at time T+1 is computed from the inputs at time T according to boolean logic
- Time is discretized (a minimal sketch follows)
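A minimal Boolean-network sketch: each gene's state at time T+1 is a boolean function of the gene states at time T. The three-gene rules here are invented purely for illustration:

    def step(state):
        a, b, c = state
        return (b and not c,  # a(T+1)
                a or c,       # b(T+1)
                not a)        # c(T+1)

    state = (True, False, True)
    for t in range(6):
        print(t, state)
        state = step(state)  # the trajectory eventually repeats (an attractor)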
43. Gene Regulatory Networks
- Boolean gene network example
- Input is at time T
- Output is at time T+1
44. Gene Regulatory Networks
- Process to construct Liang networks
- Train the ANN on pairs of gene expression values from the training set
- The network is trained to map the pattern at time T to the pattern at time T+1; the error is the difference between expected and observed patterns
- Train on all pairs in the training set and calculate the percentage of correct values (see the sketch below)
- Single-layer networks reduce complexity and improve transparency
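A hedged sketch of this setup: a single-layer sigmoid network trained on consecutive expression patterns, with the pattern at time T as input and the pattern at time T+1 as target (the tiny series and all sizes are placeholders, not data from the slides):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def train_transition(series, alpha=0.5, epochs=2000, seed=0):
        # series: (timesteps, n_genes) binary expression matrix.
        X, Y = series[:-1], series[1:]        # pairs (pattern at T, at T+1)
        rng = np.random.default_rng(seed)
        W = rng.normal(scale=0.1, size=(series.shape[1], series.shape[1]))
        for _ in range(epochs):
            P = sigmoid(X @ W)
            W += alpha * X.T @ ((Y - P) * P * (1 - P))
        return W

    series = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 1, 1]], float)
    W = train_transition(series)
    pred = (sigmoid(series[:-1] @ W) > 0.5).astype(int)
    print((pred == series[1:]).mean())  # percentage of correct values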
45. Gene Regulatory Networks
- All Boolean network time series terminate in specific, repeating attractor patterns.
- These can be visualized as basin-of-attraction graphs.
- All trajectories are strictly determined, and many states converge on one attractor.
- Stability of gene networks.