Title: More on NNs
1. More on NNs
This is lecture 17 of Biologically Inspired Computing, covering NN applications and overfitting.
2. NN Applications
- ANNs are a mature, tried and tested technology, used for all sorts of things.
- There are countless applications. Maybe ill-applied in many cases, maybe not trained ideally in many cases, and so on. However, the extent to which there are successful applications reveals:
  - How useful it is to have some way of predicting/classifying without needing to know the rules underlying the task, i.e. how much of an advance this BIC technology provides over classical methods.
  - How flexible, reliable and scalable NNs basically are.
- The next few slides contain application examples from just a single site about a single commercial NN package.
3. Stocks, Commodities and Futures
- Currency Price Predictions (James O'Sullivan): controls trading of more than 10 different financial markets with consistent profits.
- Corporate Bond Rating (George Pugh): predicts corporate bond ratings with 100% accuracy for consulting and trading.
- Standard and Poor's 500 Prediction (LBS Capital Management, Inc.): predicts the S&P 500 one day ahead and one week ahead with better accuracy than traditional methods.
- Forecasting Stock Prices (Walkrich Investments): neural networks rate underpriced stock, beating the S&P.
4. Business, Management, and Finance
- Direct Marketing Mail Prediction (Microsoft): improves response rates from 4.9% to 8.2%.
- Credit Scoring (Herbert Jensen): predicts loan application success with 75-80% accuracy.
- Identifying Policemen with Potential for Misconduct (Chicago Police Department): predicts misconduct potential based on employee records.
- Jury Summoning with Neural Networks (Montgomery Court House, Norristown, PA): saves $70 million annually using The Intelligent Summoner from MEA.
- Forecasting Highway Maintenance with Neural Networks (Professor Awad Hanna, University of Wisconsin-Madison): a neural network trained to predict which type of concrete is better than another for a particular highway problem.
5. Medical Applications
- Breast Cancer Cell Analysis (David Weinberg, MD): image analysis ignores benign cells and classifies malignant cells.
- Hospital Expenses Reduced (Anderson Memorial Hospital): improves the quality of care, reduces death rate, and saved $500,000 in the first 15 months of use.
- Diagnosing Heart Attacks (J. Furlong, MD): recognizes Acute Myocardial Infarction from enzyme data.
- Emergency Room Lab Test Ordering (S. Berkov, MD): saves time and money ordering tests using symptoms and demographics.
- Classifying Patients for Psychiatric Care (G. Davis, MD): predicts length of stay for psychiatric patients, saving money.
6. Sports Applications
- Thoroughbred Horse Racing (Don Emmons): 22 races, 17 winning horses.
- Thoroughbred Horse Racing (Rich Janeva): 39% of winners picked at odds better than 4.5 to 1.
- Dog Racing (Derek Anderson): 94% accuracy picking first place.
7. Science
- Solar Flare Prediction (Dr. Henrik Lundstet): predicts the next major solar flare, which helps prevent problems for power plants.
- Mosquito Identification (Aubrey Moore): 100% accuracy distinguishing between male and female of two species.
- Spectroscopy (StellarNet Inc): analyzes spectral data to classify materials.
- Weather Forecasting (Fort Worth National Weather Service): predicts rainfall to 85% accuracy.
- Air Quality Testing (Defense Research Establishment Suffield, Chemical Biological Defense Section, Alberta, Canada): a neural network trained to recognize, classify and characterize aerosols of unknown origin with a high degree of accuracy.
8. Manufacturing
- Plastics Testing (Monsanto): predicts plastics quality, saving research time, processing time, and manufacturing expense.
- Computer Chip Manufacturing Quality (Intel): analyzes chip failures to help improve yields.
- Nondestructive Concrete Testing (Donald G. Pratt): detects the presence and position of flaws in reinforced concrete.
- Beer Testing (Anheuser-Busch): identifies the organic content of competitors' beer vapors with 96% accuracy.
- Steam Quality Testing (AECL Research, Manitoba, Canada): the INSIGHT steam quality monitor, an instrument used to measure steam quality and mass flowrate.
9. Overfitting
Suppose we train an NN to tell the difference between handwritten t and c, using only these examples:
[Image: a few example handwritten t's]
[Image: a few example handwritten c's]
The ANN will learn easily. Either BP or some other method will quickly find weights for the NN which mean it gives 100% correct prediction on these cases.
10. Overfitting
BUT this NN will probably generalise very poorly. E.g. here is potential (very likely) performance on certain unseen cases:
[Image: an unseen handwritten character] It will probably predict that this is a c.
[Image: another unseen handwritten character] It will probably predict that this is a t.
Why?
11. Avoiding Overfitting
1. It can be avoided by using as much training data as possible, ensuring as much diversity as possible in the data. This cuts down on the potential existence of features that might be discriminative in the training data, but are otherwise spurious.
2. It can be avoided by jittering (adding noise). During training, every time an input pattern is presented, it is randomly perturbed. The idea of this is that spurious features will be washed out by the noise, but valid discriminatory features will remain. The problem with this approach is how to correctly choose the level of noise; see the sketch below.
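A minimal sketch of jittering in Python (NumPy assumed; the sigma value is an illustrative choice, not one from the lecture):

    import numpy as np

    def jitter(pattern, sigma=0.05):
        # Add zero-mean Gaussian noise to an input pattern each time it
        # is presented. sigma sets the noise level; choosing it well is
        # the hard part noted above.
        return pattern + np.random.normal(0.0, sigma, size=pattern.shape)

Each epoch would then train on jitter(x) rather than x, so the network never sees exactly the same pattern twice.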
12. Avoiding Overfitting II
[Figure: error plotted against time (BP training, or EA/PSO generations). A typical curve shows error on the training data improving steadily during training, while performance on unseen data, not in the training set, eventually starts to get worse.]
13. Avoiding Overfitting III
3. Another approach is early stopping. During training, keep track of the network's performance on a separate validation set of data. At the point where error continues to improve on the training set, but starts to get worse on the validation set, training should be stopped, since the network is starting to overfit on the training data. The problem here is that this point is far from always clear-cut; see the sketch below.
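A minimal sketch of early stopping, assuming a hypothetical net object with train_one_epoch, error_on, get_weights and set_weights methods (illustrative names, not a real API):

    def train_with_early_stopping(net, train_set, val_set, max_epochs, patience=10):
        # 'patience' softens the fact that the stopping point is rarely
        # clear-cut: tolerate a few epochs of worsening validation error
        # before giving up.
        best_val_error = float("inf")
        best_weights = net.get_weights()
        epochs_without_improvement = 0
        for epoch in range(max_epochs):
            net.train_one_epoch(train_set)     # training error keeps falling...
            val_error = net.error_on(val_set)  # ...but watch the validation set
            if val_error < best_val_error:
                best_val_error = val_error
                best_weights = net.get_weights()
                epochs_without_improvement = 0
            else:
                epochs_without_improvement += 1
                if epochs_without_improvement >= patience:
                    break                      # overfitting has set in
        net.set_weights(best_weights)          # roll back to the best point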
14. Some other important NN points
[Figure: a single-layer network, with square nodes forming the input layer and round nodes forming the output layer.]
Round nodes are proper nodes, which work out a weighted sum of their inputs and send it on. Square input nodes don't really count; they just distribute the inputs.
A NN like the above, with just one layer of processing nodes, is called a perceptron. Perceptrons usually have many inputs and one output, but can have more than one output. They work out one (or more) weighted sums of their inputs. A sketch of a single node follows.
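A minimal sketch of one processing node in Python (NumPy assumed; the hard threshold and the AND example are illustrative choices):

    import numpy as np

    def perceptron_output(x, w, b):
        # A weighted sum of the inputs plus a bias, passed through a
        # threshold. The square input nodes do no computation; they
        # just deliver the components of x.
        return 1 if np.dot(w, x) + b > 0 else 0

    # With suitable weights, a single node computes AND:
    # perceptron_output(np.array([1, 1]), np.array([1.0, 1.0]), -1.5) -> 1
    # perceptron_output(np.array([1, 0]), np.array([1.0, 1.0]), -1.5) -> 0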
15. Linear separability
[Figure: the four points with co-ordinates X=0 or 1 and Y=0 or 1, each labelled with the XOR of its x and y co-ordinates: (0,1) and (1,0) are labelled 1; (0,0) and (1,1) are labelled 0.]
To the left, 1s and 0s are shown; these show the XOR of the x and y co-ordinates. Can you draw a straight line which has the 1s on one side of it and the 0s on the other side?
It can't be done; XOR is therefore not linearly separable. It turns out that perceptrons cannot solve linearly inseparable classification problems. However, with just two layers of processing nodes, all classification problems can be solved. Standard ANNs usually have 3 layers (input, hidden, output), and are sometimes called Multilayer Perceptrons.
16. Perceptron can only draw one (hyper)line
[Figure: the same XOR points; no single straight line can put the 1s on one side and the 0s on the other.]
17. Multilayer perceptron can draw many (hyper)lines
[Figure: the same XOR points, now separable by two lines.]
...but so can a perceptron. The difference is that the extra layer can make decisions based on what side of each line the data are on. A sketch follows.
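To make this concrete, here is a minimal sketch of a two-layer network computing XOR (the weights are hand-set for illustration, not learned):

    import numpy as np

    def step(z):
        # Hard threshold used by each processing node.
        return (np.asarray(z) > 0).astype(int)

    def mlp_xor(x):
        # Each hidden node draws one line in the input plane:
        #   hidden node 1 fires when x1 + x2 > 0.5 (at least one input on)
        #   hidden node 2 fires when x1 + x2 > 1.5 (both inputs on)
        W1 = np.array([[1.0, 1.0],
                       [1.0, 1.0]])
        b1 = np.array([-0.5, -1.5])
        h = step(W1 @ x + b1)
        # The output node decides based on which side of each line x lies:
        # fire when hidden node 1 is on but hidden node 2 is off.
        return int(np.dot([1.0, -2.0], h) - 0.5 > 0)

    for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
        print(x, mlp_xor(np.array(x, dtype=float)))
    # prints 0, 1, 1, 0: XOR, which no single perceptron can compute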
18. Talk to me
...about how you would use an EA to evolve a neural network for a pattern recognition task. Encoding? Operators? Fitness? One possible starting point is sketched below.
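One possible sketch of an answer (every choice here is an illustrative assumption, not the prescribed design): encode all the weights of a fixed-topology net as one flat real-valued vector, use Gaussian mutation as the variation operator, and use classification accuracy as fitness.

    import numpy as np

    # Encoding: all weights of a fixed 3-layer net in one flat genome.
    def decode(genome, n_in, n_hid, n_out):
        split = n_in * n_hid
        W1 = genome[:split].reshape(n_hid, n_in)
        W2 = genome[split:split + n_hid * n_out].reshape(n_out, n_hid)
        return W1, W2

    # Operator: Gaussian perturbation of the weight vector.
    def mutate(genome, sigma=0.1):
        return genome + np.random.normal(0.0, sigma, genome.shape)

    # Fitness: classification accuracy on the training patterns.
    def fitness(genome, X, targets, n_in, n_hid, n_out):
        W1, W2 = decode(genome, n_in, n_hid, n_out)
        hidden = np.tanh(X @ W1.T)      # hidden-layer activations
        outputs = hidden @ W2.T         # output-layer weighted sums
        return np.mean(outputs.argmax(axis=1) == targets)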
19. Next time
- Associative Networks (Hopfield)
- Self-Organising Maps