Title: Business Intelligence Technologies
1. Business Intelligence Technologies: Data Mining
- Lecture 6: Neural Networks
2. Agenda
- Artificial Neural Networks (ANN)
- Case Discussion
- Model Evaluation
- Software Demo
- Exercise
3. The Metaphor
[Diagram of a neuron: input attributes 1 through m feed in, a weighted sum of the input values is taken, and a transfer function produces the output]
4. What Neural Nets Do
- Neural nets learn complex functions Y = f(X) from data.
- The form of the function is not known.
5. Components of Neural Nets
- Neural Nets are composed of
- Nodes, and
- Arcs
- Each arc specifies a weight.
- Each node (other than the input nodes) contains a
Transfer Function which converts its inputs to
outputs. The input to a node is the weighted sum
of the inputs from its arcs.
6. Inside a Node
- Each node contains a transfer function which
converts the sum of the weighted inputs to an
output
7. A Simple NN
Here is a simple neural network with no hidden layers, and with a threshold-based transfer function.
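Such a no-hidden-layer net with a threshold transfer can be written in a few lines; the weights and threshold below are made-up illustrative values, not numbers from the slides.

```python
def threshold_node(inputs, weights, threshold=0.5):
    """One output node: weighted sum of inputs, then a step (threshold) transfer."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0

# Two input attributes feeding one output node (weights are illustrative).
print(threshold_node([1, 0], [0.4, 0.3]))  # 0: weighted sum 0.4 is below threshold
print(threshold_node([1, 1], [0.4, 0.3]))  # 1: weighted sum 0.7 exceeds threshold
```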
8. Structure of Neural Nets
- Nodes are arranged into one input layer, zero or more hidden layers, and one output layer.
- In the input layer, there is one input node for each attribute.
- The number of hidden nodes and layers is configurable.
- In the output layer, there is one output node for each category (class) being predicted, where the output at each node is typically the probability of the item being in that class. For 2-class problems, one output node is sufficient. Neural nets may also predict continuous numeric outputs, in which case there is only one output node.
[Diagram: input layer, hidden layer, output layer]
9. Examples: Different Structures
10. Feed-Forward Neural Net
- Feed the example into the net: the value for each attribute of the example goes into the relevant input node of the net.
- Multiply the value flowing through each arc (x) by the weight (w) on each arc.
- Sum the weighted values flowing into each node.
- Obtain the output (y) from each node by applying the transfer function (f) to the weighted sum of inputs to the node.
- Continue to feed the output through the net.
- This process is known as feed-forward.
11. Neural Network Training
- Training is the process of setting the best weights on the arcs connecting all the nodes in the network.
- The goal is to use the training set to calculate weights where the output of the network is as close to the desired output as possible for as many of the examples in the training set as possible.
- Back-propagation has been used since the 1980s to adjust the weights (there are also other methods):
- It calculates the error by taking the difference between the calculated result and the actual result.
- The error is fed back through the network and the weights are adjusted to minimize the error.
12. How Nets Learn Their Weights
- Training the net begins with assigning each arc a small random positive weight.
- Feed training examples into the net, one by one. (Each training example is marked with the actual output.)
- Compute the output for each example by feeding each attribute value into the relevant node, multiplying by the appropriate arc weights, summing weighted inputs, and applying transfer functions to obtain outputs at each level.
- Compare the final output of the net to the actual output for each example.
- Adjust the weights by a small fraction so as to minimize the error. The error may be the simple squared difference between actual and calculated output, or some other more complex error function.
- Continue feeding examples into the net and adjusting weights (usually feeding the training set multiple times). See later for how you decide when to stop.
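A minimal sketch of this weight-adjustment loop for a single sigmoid node (full back-propagation applies the same gradient idea layer by layer); the learning rate, epoch count, and the OR-function training data are illustrative assumptions.

```python
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_single_node(examples, epochs=1000, rate=0.5):
    """Train one sigmoid node by gradient descent on squared error."""
    n = len(examples[0][0])
    # Start with small random positive weights, as described above.
    weights = [random.uniform(0.01, 0.1) for _ in range(n)]
    for _ in range(epochs):                      # feed the training set many times
        for x, target in examples:
            y = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
            error = target - y                   # actual minus calculated output
            # Adjust each weight by a small fraction, following the error gradient.
            for i in range(n):
                weights[i] += rate * error * y * (1 - y) * x[i]
    return weights

random.seed(0)  # fixed seed so the demo is reproducible
# Learn the OR function (third input is a bias fixed at 1).
data = [([0, 0, 1], 0), ([0, 1, 1], 1), ([1, 0, 1], 1), ([1, 1, 1], 1)]
w = train_single_node(data)
print([round(sigmoid(sum(wi * xi for wi, xi in zip(w, x)))) for x, _ in data])
```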
13. Neural Nets: Advantages
- Can model highly non-linear and complex spaces accurately.
- Handles noisy data well.
- A trained network works just like a math function: computing outputs is quick, and the neural net can therefore be easily embedded into any decision analysis tool.
- Incremental: can simply add new examples to continue learning on new data.
14. Problems with Nets
- Over-training → overfitting. Training should be stopped when the accuracy on the test set starts decreasing markedly.
- Comprehensibility / transparency: while you can read mathematical functions from the neural net, comprehensible rules cannot be directly read off a neural net. It is difficult to verify the plausibility of the model produced by the neural net, as predictions have low explainability.
- Input values need to be numeric (because they need to be multiplied by a weight).
- Convergence to a solution is not guaranteed and training time can be high. Typically a net will stop training after a set number of iterations over the training set, after a set time, when weights start to converge to fixed values, or when the error rate on test data starts increasing.
15. Overtraining
- If you let a neural net run for too long, it can memorize the training data, meaning it doesn't generalize well to new data.
- The usual resolution to this is to continually test the performance of the net against hold-out data (test data) while it is being trained. Training is stopped when the accuracy on the test set starts decreasing markedly.
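The stop-when-test-accuracy-declines rule is often called early stopping; a minimal sketch, where `train_step` and `evaluate` are assumed callbacks and the `patience` threshold is an illustrative choice.

```python
def train_with_early_stopping(train_step, evaluate, max_epochs=100, patience=5):
    """Stop training once hold-out accuracy stops improving.

    `train_step()` runs one pass over the training data;
    `evaluate()` returns accuracy on the hold-out (test) data.
    """
    best_acc, best_epoch, bad_epochs = 0.0, 0, 0
    for epoch in range(max_epochs):
        train_step()
        acc = evaluate()
        if acc > best_acc:
            best_acc, best_epoch, bad_epochs = acc, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:  # accuracy has stopped improving: stop
                break
    return best_epoch, best_acc

# Demo with a stub net whose test accuracy rises, then declines.
accs = iter([0.60, 0.70, 0.65, 0.64, 0.63, 0.62, 0.61, 0.60])
print(train_with_early_stopping(lambda: None, lambda: next(accs)))
```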
16. Example Applications
- Major advantage: learns complex functions.
- Hedge fund pricing models (because of the NN's ability to fit complex functions, it suits financial applications very well).
- ALVINN learnt to keep a car on the road by watching people drive.
- Speech and face recognition.
17. An Example Application for Targeting Decisions
- Problem
- Explaining customers' purchase decisions by means of explanatory variables, or predictors.
- Process
- Learning: calibrate a NN model (configuration, weights) using a training sample drawn from a previous campaign.
- Scoring: apply the resulting network to a new set of observations, and calculate a NN score for each customer.
- Decision: typically, the larger the NN score, the better the customer. Thus, one can sort the customers in descending order of their scores, and apply a cutoff point to separate targets from non-targets.
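The decision step above can be sketched as a sort-and-cutoff; the customer ids, scores, and cutoff fraction are illustrative assumptions.

```python
def select_targets(scores, cutoff_fraction=0.2):
    """Sort customers by NN score (descending) and keep the top fraction as targets."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    k = max(1, int(len(ranked) * cutoff_fraction))
    return ranked[:k]

# Hypothetical NN scores for five customers; target the top 40%.
customers = {"c1": 0.91, "c2": 0.15, "c3": 0.78, "c4": 0.40, "c5": 0.66}
print(select_targets(customers, cutoff_fraction=0.4))  # ['c1', 'c3']
```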
18. Then the NN would look like:
19. Case Discussion
- Neural Fair Value
- What are the inputs and outputs of the neural network?
- How is the neural network trained? How is the trained network used for prediction?
- Describe the entire process of the Neural Fair Value model.
- Why does a neural network work for stock selection? Would a decision tree, KNN, or traditional regression work?
- Can the model be improved by manipulating the time periods used for training?
- SOM
20. Agenda
- Artificial Neural Networks (ANN)
- Case Discussion
- Model Evaluation
- Software Demo
- Exercise
21. Actual vs. Predicted Output

Single | No. of cards | Age | Income>50K | Output (Good/Bad risk) | Model's prediction | Correct?
0 | 1 | 28 | 1 | 1 | 1 | ✓
1 | 2 | 56 | 0 | 0 | 0 | ✓
0 | 5 | 61 | 1 | 0 | 1 | ✗
0 | 1 | 28 | 1 | 1 | 1 | ✓
22. Is Measuring Accuracy on Training Data a Good Performance Indicator?
- Using the same set of examples for training as well as for evaluation results in an overoptimistic evaluation of model performance.
- Need to test performance on data not seen by the modeling algorithm, i.e., data that was not used for model building.
23. Data Partition
- Randomly partition the data into a training set and a test set.
- Training set: data used to train/build the model, e.g., estimate parameters (for a linear regression), build a decision tree, build an artificial neural network, etc.
- Test set: a set of examples not used for model induction. The model's performance is evaluated on unseen data. Also referred to as out-of-sample data.
- Generalization error: the model's error on the test data.
[Diagram: the data is split into a set of training examples and a set of test examples]
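A minimal sketch of such a random partition; the 70/30 split and fixed seed are illustrative choices, not values from the slides.

```python
import random

def partition(examples, test_fraction=0.3, seed=42):
    """Randomly split examples into a training set and a test set."""
    rng = random.Random(seed)      # fixed seed for a reproducible split
    shuffled = examples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

train, test = partition(list(range(10)))
print(len(train), len(test))  # 7 3
```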
24. Training, Validation & Test Sets
- When training multiple model types, out of which one model is selected to be used for prediction:
- Training set: build the models.
- Validation set: compare the models' performances.
- Test set: evaluate the chosen model's error.
25. Model Performance Evaluation: Classification Models
- Classification models predict what class an instance (example) belongs to.
- E.g., good vs. bad credit risk (credit), response vs. no response to a direct marketing campaign, etc.
- Evaluation measure 1: Classification Accuracy Rate
- Proportion of accurate classifications of examples in the test set. E.g., the model predicts the correct class for 70% of test examples.
26. Classification Accuracy Rate

Classification Accuracy Rate = S/N, the proportion of examples accurately classified by the model, where S = number of examples accurately classified by the model and N = total number of examples.

Single | No. of cards | Age | Income>50K | Output (Good/Bad risk) | Model's prediction | Correct?
0 | 1 | 28 | 1 | 1 | 1 | ✓
1 | 2 | 56 | 0 | 0 | 0 | ✓
0 | 5 | 61 | 1 | 0 | 1 | ✗
0 | 1 | 28 | 1 | 1 | 1 | ✓
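The S/N computation on the four test examples from the table can be sketched as:

```python
def accuracy_rate(actual, predicted):
    """Classification accuracy rate: S correct predictions out of N examples."""
    s = sum(1 for a, p in zip(actual, predicted) if a == p)
    return s / len(actual)

# Actual risk vs. the model's prediction for the four examples above.
actual    = [1, 0, 0, 1]
predicted = [1, 0, 1, 1]
print(accuracy_rate(actual, predicted))  # 0.75 — three of four classified correctly
```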
27. A Question
- Assume a model accurately classifies 90% of instances in the test set.
- Is it a good model?
28. Consider the Following
- The response rate for a mailing campaign is 1%.
- We build a classification model to predict whether or not a customer would respond.
- The model's classification accuracy rate is 99%.
- How good is our model?
[Chart: 99% do not respond, 1% respond]
29. Classification Accuracy Rate
- After examining the examples the model misclassified:
- The model always predicts that a customer would not respond (always recommends not to mail).
- The model misclassifies all respondents.
- Conclusion: need to examine the type of errors made by the model, not just the proportion of errors.
30. Confusion Matrix for Classification Accuracy Rate

Classification Confusion Matrix (rows = actual class, columns = predicted class):

Actual \ Predicted | A | B | C
A | 57 | 2 | 0
B | 4 | 61 | 6
C | 6 | 2 | 40

Error Report:

Class | Cases | Errors | Error %
A | 59 | 2 | 3.39%
B | 71 | 10 | 14.08%
C | 48 | 8 | 16.67%
Overall | 178 | 20 | 11.24%
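The error report can be derived from the confusion matrix: each row's off-diagonal entries are that class's misclassifications. A sketch using the matrix above:

```python
def error_report(matrix, classes):
    """Per-class cases, errors, and error % from a confusion matrix.

    matrix[i][j] = number of class-i examples predicted as class j.
    """
    report = []
    for i, cls in enumerate(classes):
        cases = sum(matrix[i])              # row total = actual cases of this class
        errors = cases - matrix[i][i]       # off-diagonal = misclassified
        report.append((cls, cases, errors, 100.0 * errors / cases))
    return report

m = [[57, 2, 0], [4, 61, 6], [6, 2, 40]]
for cls, cases, errors, pct in error_report(m, "ABC"):
    print(cls, cases, errors, round(pct, 2))
```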
31. Evaluating Numerical Prediction
- Assume we build a model to estimate the amount spent on the next catalog offer.
32. Evaluating Numerical Prediction
- Mean Squared Error (MSE)
- a1, a2, …, an: actual amounts spent
- p1, p2, …, pn: predicted amounts
- Error_i = p_i − a_i
- MSE = [(p1 − a1)² + (p2 − a2)² + … + (pn − an)²] / n
- MSE = [(83−80)² + (131.3−140)² + (178−175)² + (166−168)² + (117−120)² + (198−189)²] / 6 ≈ 31.28
- Root Mean Squared Error: RMSE = √MSE ≈ 5.59
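Recomputing MSE and RMSE from the six actual/predicted pairs listed above:

```python
import math

def mse(actual, predicted):
    """Mean squared error: average of squared differences."""
    return sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# The six actual/predicted amount pairs from the slide.
actual    = [80, 140, 175, 168, 120, 189]
predicted = [83, 131.3, 178, 166, 117, 198]
print(round(mse(actual, predicted), 2))              # 31.28
print(round(math.sqrt(mse(actual, predicted)), 2))   # RMSE: 5.59
```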
33. Evaluating Numerical Prediction
- Mean Absolute Error (MAE)
- Does not assign higher weights to large errors; all sizes of error are weighted equally.
- MAE = (|p1 − a1| + |p2 − a2| + … + |pn − an|) / n
- MAE = (|83−80| + |131.3−140| + |178−175| + |166−168| + |117−120| + |198−189|) / 6 = (3 + 8.7 + 3 + 2 + 3 + 9) / 6 ≈ 4.8
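The same six pairs give MAE directly:

```python
def mae(actual, predicted):
    """Mean absolute error: all error sizes weighted equally."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)

# The six actual/predicted amount pairs from the slide.
actual    = [80, 140, 175, 168, 120, 189]
predicted = [83, 131.3, 178, 166, 117, 198]
print(round(mae(actual, predicted), 1))  # 4.8
```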
34. Exercise
- Compare multiple classification models (tree, KNN, ANN).
- SAS: HMEQ data set
- WEKA: bank data set