Title: Business Intelligence Technologies
1. Business Intelligence Technologies: Data Mining
- Lecture 6: Neural Networks
2. Agenda
- Artificial Neural Networks (ANN)
- Case Discussion
- Model Evaluation
- Software Demo
- Exercise
3. The Metaphor
[Diagram of a neuron: input attributes 1 through m feed in, a weighted sum of the input values is taken, and a transfer function produces the output]
4. What Neural Nets Do
- Neural nets learn complex functions Y = f(X) from data.
- The form of the function is not known.
5. Components of Neural Nets
- Neural Nets are composed of
- Nodes, and
- Arcs
- Each arc specifies a weight.
- Each node (other than the input nodes) contains a
Transfer Function which converts its inputs to
outputs. The input to a node is the weighted sum
of the inputs from its arcs.
6. Inside a Node
- Each node contains a transfer function which
converts the sum of the weighted inputs to an
output
7. A Simple NN
Here is a simple neural network with no hidden layers, and with a threshold-based transfer function.
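Such a no-hidden-layer net with a threshold transfer can be written in a few lines; the weights and threshold below are made-up illustrative values, not numbers from the slides.

```python
def threshold_node(inputs, weights, threshold=0.5):
    """One output node: weighted sum of inputs, then a step (threshold) transfer."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0

# Two input attributes feeding one output node (weights are illustrative).
print(threshold_node([1, 0], [0.4, 0.3]))  # 0: weighted sum 0.4 is below threshold
print(threshold_node([1, 1], [0.4, 0.3]))  # 1: weighted sum 0.7 exceeds threshold
```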
8. Structure of Neural Nets
- Nodes are arranged into one input layer, zero or more hidden layers, and one output layer.
- In the input layer, there is one input node for each attribute.
- The number of hidden nodes and layers is configurable.
- In the output layer, there is one output node for each category (class) being predicted, where the output at each node is typically the probability of the item being in that class. For 2-class problems, one output node is sufficient. Neural nets may also predict continuous numeric outputs, in which case there is only one output node.
[Diagram: input layer, hidden layer, output layer]
9. Examples: Different Structures
10. Feed-Forward Neural Net
- Feed the example into the net: the value for each attribute of the example goes into the relevant input node of the net.
- Multiply the value flowing through each arc (x) by the weight (w) on each arc.
- Sum the weighted values flowing into each node.
- Obtain the output (y) from each node by applying the transfer function (f) to the weighted sum of inputs to the node.
- Continue to feed the output through the net.
- This process is known as feed-forward.
11. Neural Network Training
- Training is the process of setting the best weights on the arcs connecting all the nodes in the network.
- The goal is to use the training set to calculate weights where the output of the network is as close to the desired output as possible for as many of the examples in the training set as possible.
- Back-propagation has been used since the 1980s to adjust the weights (there are also other methods):
- It calculates the error by taking the difference between the calculated result and the actual result.
- The error is fed back through the network and the weights are adjusted to minimize the error.
12. How Nets Learn Their Weights
- Training the net begins with assigning each arc a small random positive weight.
- Feed training examples into the net, one by one. (Each training example is marked with the actual output.)
- Compute the output for each example by feeding each attribute value into the relevant node, multiplying by the appropriate arc weights, summing weighted inputs, and applying transfer functions to obtain outputs at each level.
- Compare the final output of the net to the actual output for each example.
- Adjust the weights by a small fraction so as to minimize the error. The error may be the simple squared difference between actual and calculated output, or some other more complex error function.
- Continue feeding examples into the net and adjusting weights (usually feeding the training set multiple times). See later for how you decide when to stop.
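A minimal sketch of this weight-adjustment loop for a single sigmoid node (full back-propagation applies the same gradient idea layer by layer); the learning rate, epoch count, and the OR-function training data are illustrative assumptions.

```python
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_single_node(examples, epochs=1000, rate=0.5):
    """Train one sigmoid node by gradient descent on squared error."""
    n = len(examples[0][0])
    # Start with small random positive weights, as described above.
    weights = [random.uniform(0.01, 0.1) for _ in range(n)]
    for _ in range(epochs):                      # feed the training set many times
        for x, target in examples:
            y = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
            error = target - y                   # actual minus calculated output
            # Adjust each weight by a small fraction, following the error gradient.
            for i in range(n):
                weights[i] += rate * error * y * (1 - y) * x[i]
    return weights

random.seed(0)  # fixed seed so the demo is reproducible
# Learn the OR function (third input is a bias fixed at 1).
data = [([0, 0, 1], 0), ([0, 1, 1], 1), ([1, 0, 1], 1), ([1, 1, 1], 1)]
w = train_single_node(data)
print([round(sigmoid(sum(wi * xi for wi, xi in zip(w, x)))) for x, _ in data])
```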
13. Neural Nets: Advantages
- Can model highly non-linear and complex spaces accurately.
- Handles noisy data well.
- A trained network works just like a math function: computing outputs is quick, and the neural net can therefore be easily embedded into any decision analysis tool.
- Incremental: can simply add new examples to continue learning on new data.
14. Problems with Nets
- Over-training → overfitting. Training should be stopped when the accuracy on the test set starts decreasing markedly.
- Comprehensibility / transparency: while you can read mathematical functions from the neural net, comprehensible rules cannot be directly read off a neural net. It is difficult to verify the plausibility of the model produced by the neural net, as predictions have low explainability.
- Input values need to be numeric (because they need to be multiplied by a weight).
- Convergence to a solution is not guaranteed and training time can be high. Typically a net will stop training after a set number of iterations over the training set, after a set time, when weights start to converge to fixed values, or when the error rate on test data starts increasing.
15. Overtraining
- If you let a neural net run for too long, it can memorize the training data, meaning it doesn't generalize well to new data.
- The usual resolution to this is to continually test the performance of the net against hold-out data (test data) while it is being trained. Training is stopped when the accuracy on the test set starts decreasing markedly.
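The stop-when-test-accuracy-declines rule is often called early stopping; a minimal sketch, where `train_step` and `evaluate` are assumed callbacks and the `patience` threshold is an illustrative choice.

```python
def train_with_early_stopping(train_step, evaluate, max_epochs=100, patience=5):
    """Stop training once hold-out accuracy stops improving.

    `train_step()` runs one pass over the training data;
    `evaluate()` returns accuracy on the hold-out (test) data.
    """
    best_acc, best_epoch, bad_epochs = 0.0, 0, 0
    for epoch in range(max_epochs):
        train_step()
        acc = evaluate()
        if acc > best_acc:
            best_acc, best_epoch, bad_epochs = acc, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:  # accuracy has stopped improving: stop
                break
    return best_epoch, best_acc

# Demo with a stub net whose test accuracy rises, then declines.
accs = iter([0.60, 0.70, 0.65, 0.64, 0.63, 0.62, 0.61, 0.60])
print(train_with_early_stopping(lambda: None, lambda: next(accs)))
```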
16. Example Applications
- Major advantage: learns complex functions.
- Hedge fund pricing models (because of the NN's ability to fit complex functions, it suits financial applications very well).
- ALVINN learnt to keep a car on the road by watching people drive.
- Speech and face recognition.
17. An Example Application for Targeting Decisions
- Problem
- Explaining customers' purchase decisions by means of explanatory variables, or predictors.
- Process
- Learning: calibrate a NN model (configuration, weights) using a training sample drawn from a previous campaign.
- Scoring: apply the resulting network to a new set of observations, and calculate a NN score for each customer.
- Decision: typically, the larger the NN score, the better the customer. Thus, one can sort the customers in descending order of their scores, and apply a cutoff point to separate targets from non-targets.
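The decision step above can be sketched as a sort-and-cutoff; the customer ids, scores, and cutoff fraction are illustrative assumptions.

```python
def select_targets(scores, cutoff_fraction=0.2):
    """Sort customers by NN score (descending) and keep the top fraction as targets."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    k = max(1, int(len(ranked) * cutoff_fraction))
    return ranked[:k]

# Hypothetical NN scores for five customers; target the top 40%.
customers = {"c1": 0.91, "c2": 0.15, "c3": 0.78, "c4": 0.40, "c5": 0.66}
print(select_targets(customers, cutoff_fraction=0.4))  # ['c1', 'c3']
```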
18. Then the NN would look like:
19. Case Discussion
- Neural Fair Value
- What are the inputs and outputs of the neural network?
- How is the neural network trained? How is the trained network used for prediction?
- Describe the entire process of the Neural Fair Value model.
- Why does a neural network work for stock selection? Would a decision tree, KNN, or traditional regression work?
- Can the model be improved by manipulating the time periods used for training?
- SOM
20. Agenda
- Artificial Neural Networks (ANN)
- Case Discussion
- Model Evaluation
- Software Demo
- Exercise
21. Actual vs. Predicted Output

Single | No. of cards | Age | Income>50K | Output (Good/Bad risk) | Model's prediction | Correct?
0 | 1 | 28 | 1 | 1 | 1 | ✓
1 | 2 | 56 | 0 | 0 | 0 | ✓
0 | 5 | 61 | 1 | 0 | 1 | ✗
0 | 1 | 28 | 1 | 1 | 1 | ✓
22. Is Measuring Accuracy on Training Data a Good Performance Indicator?
- Using the same set of examples for training as well as for evaluation results in an overoptimistic evaluation of model performance.
- Need to test performance on data not seen by the modeling algorithm, i.e., data that was not used for model building.
23. Data Partition
- Randomly partition the data into a training set and a test set.
- Training set: data used to train/build the model, e.g., estimate parameters (for a linear regression), build a decision tree, build an artificial neural network, etc.
- Test set: a set of examples not used for model induction. The model's performance is evaluated on unseen data. Also referred to as out-of-sample data.
- Generalization error: the model's error on the test data.
[Diagram: the data is split into a set of training examples and a set of test examples]
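A minimal sketch of such a random partition; the 70/30 split and fixed seed are illustrative choices, not values from the slides.

```python
import random

def partition(examples, test_fraction=0.3, seed=42):
    """Randomly split examples into a training set and a test set."""
    rng = random.Random(seed)      # fixed seed for a reproducible split
    shuffled = examples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

train, test = partition(list(range(10)))
print(len(train), len(test))  # 7 3
```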
24. Training, Validation & Test Sets
- When training multiple model types, out of which one model is selected to be used for prediction:
- Training set: build the models.
- Validation set: compare the models' performances.
- Test set: evaluate the chosen model's error.
25. Model Performance Evaluation: Classification Models
- Classification models predict what class an instance (example) belongs to.
- E.g., good vs. bad credit risk (credit), response vs. no response to a direct marketing campaign, etc.
- Evaluation measure 1: Classification Accuracy Rate
- Proportion of accurate classifications of examples in the test set. E.g., the model predicts the correct class for 70% of test examples.
26. Classification Accuracy Rate

Classification Accuracy Rate = S/N, the proportion of examples accurately classified by the model, where S = number of examples accurately classified by the model and N = total number of examples.

Single | No. of cards | Age | Income>50K | Output (Good/Bad risk) | Model's prediction | Correct?
0 | 1 | 28 | 1 | 1 | 1 | ✓
1 | 2 | 56 | 0 | 0 | 0 | ✓
0 | 5 | 61 | 1 | 0 | 1 | ✗
0 | 1 | 28 | 1 | 1 | 1 | ✓
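The S/N computation on the four test examples from the table can be sketched as:

```python
def accuracy_rate(actual, predicted):
    """Classification accuracy rate: S correct predictions out of N examples."""
    s = sum(1 for a, p in zip(actual, predicted) if a == p)
    return s / len(actual)

# Actual risk vs. the model's prediction for the four examples above.
actual    = [1, 0, 0, 1]
predicted = [1, 0, 1, 1]
print(accuracy_rate(actual, predicted))  # 0.75 — three of four classified correctly
```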
27. A Question
- Assume a model accurately classifies 90% of instances in the test set.
- Is it a good model?
28. Consider the Following
- The response rate for a mailing campaign is 1%.
- We build a classification model to predict whether or not a customer would respond.
- The model's classification accuracy rate is 99%.
- How good is our model?
[Chart: 99% do not respond, 1% respond]
29. Classification Accuracy Rate
- After examining the examples the model misclassified:
- The model always predicts that a customer would not respond (always recommends not to mail).
- The model misclassifies all respondents.
- Conclusion: need to examine the type of errors made by the model, not just the proportion of errors.
30. Confusion Matrix for Classification Accuracy Rate

Classification Confusion Matrix (rows = actual class, columns = predicted class):

Actual \ Predicted | A | B | C
A | 57 | 2 | 0
B | 4 | 61 | 6
C | 6 | 2 | 40

Error Report:

Class | Cases | Errors | Error %
A | 59 | 2 | 3.39%
B | 71 | 10 | 14.08%
C | 48 | 8 | 16.67%
Overall | 178 | 20 | 11.24%
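The error report can be derived from the confusion matrix: each row's off-diagonal entries are that class's misclassifications. A sketch using the matrix above:

```python
def error_report(matrix, classes):
    """Per-class cases, errors, and error % from a confusion matrix.

    matrix[i][j] = number of class-i examples predicted as class j.
    """
    report = []
    for i, cls in enumerate(classes):
        cases = sum(matrix[i])              # row total = actual cases of this class
        errors = cases - matrix[i][i]       # off-diagonal = misclassified
        report.append((cls, cases, errors, 100.0 * errors / cases))
    return report

m = [[57, 2, 0], [4, 61, 6], [6, 2, 40]]
for cls, cases, errors, pct in error_report(m, "ABC"):
    print(cls, cases, errors, round(pct, 2))
```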
31. Evaluating Numerical Prediction
- Assume we build a model to estimate the amount spent on the next catalog offer.
32. Evaluating Numerical Prediction
- Mean Squared Error (MSE)
- a1, a2, …, an: actual amounts spent
- p1, p2, …, pn: predicted amounts
- Error_i = p_i − a_i
- MSE = [(p1 − a1)² + (p2 − a2)² + … + (pn − an)²] / n
- MSE = [(83−80)² + (131.3−140)² + (178−175)² + (166−168)² + (117−120)² + (198−189)²] / 6 ≈ 31.28
- Root Mean Squared Error: RMSE = √MSE ≈ 5.59
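Recomputing MSE and RMSE from the six actual/predicted pairs listed above:

```python
import math

def mse(actual, predicted):
    """Mean squared error: average of squared differences."""
    return sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# The six actual/predicted amount pairs from the slide.
actual    = [80, 140, 175, 168, 120, 189]
predicted = [83, 131.3, 178, 166, 117, 198]
print(round(mse(actual, predicted), 2))              # 31.28
print(round(math.sqrt(mse(actual, predicted)), 2))   # RMSE: 5.59
```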
33. Evaluating Numerical Prediction
- Mean Absolute Error (MAE)
- Does not assign higher weights to large errors; all sizes of error are weighted equally.
- MAE = (|p1 − a1| + |p2 − a2| + … + |pn − an|) / n
- MAE = (|83−80| + |131.3−140| + |178−175| + |166−168| + |117−120| + |198−189|) / 6 = (3 + 8.7 + 3 + 2 + 3 + 9) / 6 ≈ 4.8
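The same six pairs give MAE directly:

```python
def mae(actual, predicted):
    """Mean absolute error: all error sizes weighted equally."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)

# The six actual/predicted amount pairs from the slide.
actual    = [80, 140, 175, 168, 120, 189]
predicted = [83, 131.3, 178, 166, 117, 198]
print(round(mae(actual, predicted), 1))  # 4.8
```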
34. Exercise
- Compare multiple classification models (tree, KNN, ANN).
- SAS: HMEQ data set
- WEKA: bank data set