Business Intelligence Technologies - PowerPoint PPT Presentation

Slides: 35
Provided by: GraduateS62
Transcript and Presenter's Notes

1
Business Intelligence Technologies: Data Mining
  • Lecture 6: Neural Networks

2
Agenda
  • Artificial Neural Networks (ANN)
  • Case Discussion
  • Model Evaluation
  • Software Demo
  • Exercise

3
The Metaphor
[Figure: a single neuron. Input Attribute 1 through Input Attribute m feed a sum of weighted input values, which passes through a transfer function.]
4
What Neural Nets Do
  • Neural nets learn complex functions Y = f(X) from
    data.
  • The form of the function is not known in advance.

5
Components of Neural Nets
  • Neural nets are composed of nodes and arcs.
  • Each arc specifies a weight.
  • Each node (other than the input nodes) contains a
    Transfer Function which converts its inputs to
    outputs. The input to a node is the weighted sum
    of the inputs from its arcs.

6
Inside a Node
  • Each node contains a transfer function which
    converts the sum of the weighted inputs to an
    output

7
A Simple NN
Here is a simple Neural network with no hidden
layers, and with a threshold-based transfer
function
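A minimal sketch of such a network in Python. The slide's figure is not in the transcript, so the two inputs, the weights (0.6 and 0.4), and the threshold (0.5) are hypothetical values chosen only for illustration:

```python
def threshold_unit(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs reaches the threshold, else 0."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum >= threshold else 0

# Hypothetical weights and threshold, not taken from the slide.
weights = [0.6, 0.4]
threshold = 0.5

fires = threshold_unit([1, 0], weights, threshold)   # weighted sum 0.6 >= 0.5
silent = threshold_unit([0, 1], weights, threshold)  # weighted sum 0.4 < 0.5
```

With no hidden layer, the input nodes connect straight to the single output node, so the whole network is one weighted sum followed by one threshold test.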
8
Structure of Neural Nets
  • Nodes are arranged into one input layer, zero or
    more hidden layers, and one output layer.
  • In the input layer, there is one input node for
    each attribute
  • The number of hidden nodes and layers is
    configurable
  • In the output layer, there is one output node for
    each category (class) being predicted, where the
    output at each node is typically the probability
    of the item being in that class. For 2-class
    problems, one output node is sufficient. Neural
    nets may also predict continuous numeric outputs
    in which case there is only one output node.

[Figure: a network drawn as an input layer, a hidden layer, and an output layer.]
9
Examples: Different Structures
10
Feed-Forward Neural Net
  • Feed the example into the net: the value for each
    attribute of the example goes into the relevant
    input node of the net
  • Multiply the value flowing through each arc (x)
    by the weight (w) on each arc.
  • Sum the weighted values flowing into each node
  • Obtain the output (y) from each node by applying
    the transfer function (f) to the weighted sum of
    inputs to the node
  • Continue to feed the output through the net.
  • This process is known as feed-forward.
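The feed-forward steps above can be sketched as follows. This is an illustration under stated assumptions: the sigmoid is assumed as the transfer function f, the net has one hidden layer, all weights are hypothetical, and bias terms are left out to keep it short.

```python
import math

def sigmoid(z):
    """A common transfer function; squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def feed_forward(example, layers):
    """Propagate one example through the net, layer by layer.

    `layers` is a list of layers; each layer is a list of nodes; each node is
    a list of weights, one per incoming arc.
    """
    values = example
    for layer in layers:
        # For every node: multiply each incoming value x by its arc weight w,
        # sum the weighted values, then apply the transfer function.
        values = [sigmoid(sum(x * w for x, w in zip(values, node)))
                  for node in layer]
    return values

# Hypothetical net: 2 input nodes, 2 hidden nodes, 1 output node.
hidden_layer = [[0.5, -0.5], [0.25, 0.75]]
output_layer = [[1.0, -1.0]]

output = feed_forward([1.0, 1.0], [hidden_layer, output_layer])
```

The outputs of one layer become the inputs of the next, which is all that "continue to feed the output through the net" requires.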

11
Neural Network Training
  • Training is the process of setting the best
    weights on the arcs connecting all the nodes in
    the network
  • The goal is to use the training set to calculate
    weights where the output of the network is as
    close to the desired output as possible for as
    many of the examples in the training set as
    possible
  • Back propagation has been used since the 1980s to
    adjust the weights (there are also other
    methods):
  • It calculates the error by taking the difference
    between the calculated result and the actual
    result
  • The error is fed back through the network and the
    weights are adjusted to minimize the error

12
How Nets Learn Their Weights
  • Training the nets begins with assigning each arc
    a small random positive weight.
  • Feed training examples into the net, one by one.
    (Each training example is marked with the actual
    output.)
  • Compute the output for each example by feeding
    each attribute value into the relevant node,
    multiplying by the appropriate arc weights,
    summing weighted inputs, and applying transfer
    functions to obtain outputs at each level.
  • Compare the final output of the net to the
    actual output for each example.
  • Adjust the weights by a small fraction so as to
    minimize the error. Error may be the simple
    squared difference between actual and calculated
    output, or some other more complex error
    function.
  • Continue feeding examples into net and adjusting
    weights (usually feeding the training set
    multiple times). See later for how you decide
    when to stop.
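The loop described above can be sketched for the simplest case: a single sigmoid output node with no hidden layers, trained by gradient descent on the squared error. This is a simplification rather than full back propagation through hidden layers, and the learning rate, number of passes, and the logical-OR training set are all hypothetical choices.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, epochs=5000, learning_rate=0.5, seed=0):
    """Train one sigmoid output node on (inputs, actual_output) pairs."""
    rng = random.Random(seed)
    n_inputs = len(examples[0][0])
    # Start from small random positive weights; the last one is a bias weight.
    weights = [rng.uniform(0.0, 0.1) for _ in range(n_inputs + 1)]
    for _ in range(epochs):                  # feed the training set many times
        for inputs, actual in examples:      # one example at a time
            x = list(inputs) + [1.0]         # constant input for the bias
            predicted = sigmoid(sum(xi * wi for xi, wi in zip(x, weights)))
            # Gradient of the squared error 0.5 * (predicted - actual)**2
            delta = (predicted - actual) * predicted * (1.0 - predicted)
            # Adjust each weight by a small fraction against the gradient.
            weights = [wi - learning_rate * delta * xi
                       for wi, xi in zip(weights, x)]
    return weights

def predict(weights, inputs):
    x = list(inputs) + [1.0]
    return sigmoid(sum(xi * wi for xi, wi in zip(x, weights)))

# Hypothetical training set: the logical OR of two binary attributes.
training_set = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights = train(training_set)
predictions = [round(predict(weights, inputs)) for inputs, _ in training_set]
```

Here training simply stops after a fixed number of passes; other stopping rules are discussed on later slides.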

13
Neural Nets Advantages
  • Can model highly non-linear and complex spaces
    accurately
  • Handles noisy data well.
  • Trained network works just like a math function,
    computing outputs is quick and the neural net can
    therefore be easily embedded into any decision
    analysis tool.
  • Incremental: can simply add new examples to
    continue learning on new data.

14
Problems with Nets
  • Over-training leads to overfitting. Training should be
    stopped when the accuracy on the test set starts
    decreasing markedly.
  • Comprehensibility / Transparency: while you can
    read mathematical functions from the neural net,
    comprehensible rules cannot be read directly off a
    neural net. It is difficult to verify the
    plausibility of the model produced by the neural
    net because its predictions have low explainability.
  • Input values need to be numeric (because they need to
    be multiplied by a weight).
  • Convergence to a solution is not guaranteed and
    training time can be high. Typically a net will
    stop training after a set number of iterations
    over the training set, after a set time, when
    weights start to converge to fixed values, or
    when the error rate on test data starts
    increasing.

15
Overtraining
  • If you let a neural net run for too long, it can
    memorize the training data, meaning it doesn't
    generalize well to new data.
  • The usual resolution to this is to continually
    test the performance of the net against hold-out
    data (test data) while it is being trained.
    Training is stopped when the accuracy on the test
    set starts decreasing markedly.
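One way to express this stopping rule in code, as a sketch: the function name, the `tolerance` parameter (how large a drop counts as "markedly"), and the accuracy history are hypothetical, not part of the lecture.

```python
def early_stopping_epoch(holdout_accuracy, tolerance=0.02):
    """Return the index of the epoch whose weights should be kept.

    Stops scanning as soon as hold-out accuracy falls more than `tolerance`
    below the best value seen so far, and returns the epoch of that best value.
    """
    best_epoch, best_accuracy = 0, holdout_accuracy[0]
    for epoch, accuracy in enumerate(holdout_accuracy):
        if accuracy > best_accuracy:
            best_epoch, best_accuracy = epoch, accuracy
        elif best_accuracy - accuracy > tolerance:
            break  # accuracy has dropped markedly: stop training
    return best_epoch

# Hypothetical hold-out accuracy recorded after each pass over the training set.
history = [0.70, 0.78, 0.83, 0.85, 0.84, 0.80, 0.76]
keep = early_stopping_epoch(history)
```

In this history the small dip from 0.85 to 0.84 is tolerated, but the drop to 0.80 triggers the stop, so the weights from the epoch with accuracy 0.85 are the ones to keep.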

16
Example Applications
  • Major advantage: learn complex functions
  • Hedge fund pricing models (because of NNs' ability
    to fit complex functions, they are well suited to
    financial applications)
  • ALVINN learnt to keep a car on the road by
    watching people drive
  • Speech and face recognition

17
An Example Application for Targeting Decisions
  • Problem: explaining customers' purchase decisions by
    means of explanatory variables, or predictors
  • Process:
  • Learning: calibrate a NN model (configuration,
    weights) using a training sample drawn from a
    previous campaign
  • Scoring: apply the resulting network to a new
    set of observations and calculate a NN score
    for each customer
  • Decision: typically, the larger the NN score,
    the better the customer. Thus, one can sort
    the customers in descending order of their
    scores and apply a cutoff point to separate
    targets from non-targets
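The decision step can be sketched as a sort followed by a cutoff. The customer IDs, scores, and the 0.5 cutoff below are hypothetical; in practice the scores would come from the trained network.

```python
def select_targets(scores, cutoff):
    """Sort customers by descending NN score and keep those at or above the cutoff."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [customer for customer, score in ranked if score >= cutoff]

# Hypothetical scores produced by a trained network for four customers.
nn_scores = {"C001": 0.91, "C002": 0.35, "C003": 0.68, "C004": 0.12}
targets = select_targets(nn_scores, cutoff=0.5)
```

Moving the cutoff up or down trades a smaller, higher-scoring target list against a larger one.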

18
Then the NN would look like
19
Case Discussion
  • Neural Fair Value
  • What are the inputs and outputs of the neural
    network?
  • How is the neural network trained? How is the
    trained network used for prediction?
  • Describe the entire process of the Neural Fair
    Value model.
  • Why does a neural network work for stock selection?
    Would a decision tree, KNN, or traditional
    regression work?
  • Can the model be improved by manipulating the
    time periods used for training?
  • SOM

20
Agenda
  • Artificial Neural Networks (ANN)
  • Case Discussion
  • Model Evaluation
  • Software Demo
  • Exercise

21
Actual Vs. Predicted Output
        Inputs                               Output          Model's prediction   Correct/incorrect
Single  No. of cards  Age  Income>50K        Good/Bad risk   Good/Bad risk        prediction
0       1             28   1                 1               1                    ✓
1       2             56   0                 0               0                    ✓
0       5             61   1                 0               1                    X
0       1             28   1                 1               1                    ✓

22
Is measuring accuracy on training data a good
performance indicator?
  • Using the same set of examples for training as
    well as for evaluation results in an
    overoptimistic evaluation of model performance.
  • Need to test performance on data not seen by
    the modeling algorithm, i.e., data that was not
    used for model building.

23
Data Partition
  • Randomly partition data into training and test
    set
  • Training set: data used to train/build the
    model.
  • Estimate parameters (e.g., for a linear
    regression), build a decision tree, build an
    artificial neural network, etc.
  • Test set: a set of examples not used for model
    induction. The model's performance is evaluated
    on unseen data. Also referred to as out-of-sample
    data.
  • Generalization error: the model's error on the test
    data.

[Figure: the data split into a set of training examples and a set of test examples.]
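A random partition can be sketched in a few lines. The 70/30 split, the seed, and the stand-in data are assumptions for illustration; the slides do not prescribe a ratio.

```python
import random

def partition(examples, test_fraction=0.3, seed=42):
    """Randomly split examples into a training set and a test set."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

examples = list(range(100))          # stand-ins for 100 labelled examples
training_set, test_set = partition(examples)
```

Every example lands in exactly one of the two sets, so the test set really is unseen by whatever model is built on the training set.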
24
Training, Validation & Test Sets
  • Used when training multiple model types, out of which
    one model is selected to be used for prediction

Training Set (build models)
Validation Set (compare models' performance)
Test Set (evaluate the chosen model's error)
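A sketch of how the three sets are used together. The two candidate "models" are hypothetical one-line rules standing in for already-trained models, and the tiny validation and test sets are made up for illustration.

```python
def accuracy(model, examples):
    """Proportion of (inputs, label) examples the model classifies correctly."""
    return sum(model(x) == label for x, label in examples) / len(examples)

# Hypothetical candidate models, standing in for an already-trained tree, KNN, ANN.
candidates = {
    "rule_a": lambda x: int(x > 3),
    "rule_b": lambda x: int(x > 5),
}

# Hypothetical validation and test sets; the true label is 1 when x > 5.
validation_set = [(x, int(x > 5)) for x in (1, 4, 6, 9)]
test_set = [(x, int(x > 5)) for x in (2, 5, 7, 8)]

# Compare models on the validation set, then score only the winner on the test set.
best_name = max(candidates,
                key=lambda name: accuracy(candidates[name], validation_set))
test_accuracy = accuracy(candidates[best_name], test_set)
```

The test set is touched only once, after the choice is made, so its error estimate is not biased by the model selection step.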
25
Model Performance Evaluation: Classification Models
  • Classification models predict what class an
    instance (example) belongs to.
  • E.g., good vs. bad credit risk (Credit), Response
    vs. no response to a direct marketing campaign,
    etc.
  • Evaluation measure 1: Classification Accuracy
    Rate
  • Proportion of accurate classifications of
    examples in the test set. E.g., the model predicts
    the correct class for 70% of test examples.

26
Classification Accuracy Rate
Classification Accuracy Rate = S/N = proportion of examples accurately classified by the model
S = number of examples accurately classified by the model
N = total number of examples

        Inputs                               Output          Model's prediction   Correct/incorrect
Single  No. of cards  Age  Income>50K        Good/Bad risk   Good/Bad risk        prediction
0       1             28   1                 1               1                    ✓
1       2             56   0                 0               0                    ✓
0       5             61   1                 0               1                    X
0       1             28   1                 1               1                    ✓
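Applying S/N to the four rows above gives a quick check of the definition. The variable names are illustrative; the values are the Output and prediction columns of the table.

```python
# Output column and model prediction column for the four rows above.
actual = [1, 0, 0, 1]
predicted = [1, 0, 1, 1]

S = sum(a == p for a, p in zip(actual, predicted))  # accurately classified examples
N = len(actual)                                     # total number of examples
accuracy_rate = S / N
```

Three of the four rows are classified correctly, so the accuracy rate for this small table is 3/4.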

27
A Question
  • Assume a model accurately classifies 90% of
    instances in the test set
  • Is it a good model?

28
Consider the following
  • Response rate for a mailing campaign is 1%
  • We build a classification model to predict
    whether or not a customer would respond.
  • The model's classification accuracy rate is 99%
  • How good is our model?

[Figure: 99% of customers do not respond; 1% respond.]
29
Classification Accuracy Rate
  • After examining the examples the model
    misclassified, we find:
  • The model always predicts that a customer would
    not respond (always recommends not to mail)
  • The model misclassifies all respondents
  • Conclusion: we need to examine the types of errors
    made by the model, not just the proportion of
    errors.
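The mailing example can be reproduced with a hypothetical test set of 1,000 customers (the size is an assumption; only the 1% response rate comes from the slides):

```python
# Hypothetical test set of 1,000 customers with a 1% response rate.
actual = [1] * 10 + [0] * 990          # 1 = responded, 0 = did not respond
predicted = [0] * len(actual)          # the model always predicts "no response"

accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
responders_found = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
```

The accuracy rate is 99%, yet the model identifies none of the ten respondents, which is why the overall proportion of errors alone is not enough.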

30
Confusion Matrix for Classification Accuracy Rate
Classification Confusion Matrix
                Predicted Class
Actual Class    A     B     C
A               57    2     0
B               4     61    6
C               6     2     40

Error Report
Class      Cases   Errors   Error (%)
A          59      2        3.39
B          71      10       14.08
C          48      8        16.67
Overall    178     20       11.24
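The error report follows directly from the confusion matrix: each row total is the number of cases for that actual class, and the off-diagonal entries in the row are its errors. A short sketch (variable names are illustrative):

```python
classes = ["A", "B", "C"]
# Rows are actual classes, columns are predicted classes (from the matrix above).
matrix = [
    [57, 2, 0],
    [4, 61, 6],
    [6, 2, 40],
]

report = {}
for i, name in enumerate(classes):
    cases = sum(matrix[i])
    errors = cases - matrix[i][i]        # off-diagonal entries in the row
    report[name] = (cases, errors, round(100 * errors / cases, 2))

total_cases = sum(c for c, _, _ in report.values())
total_errors = sum(e for _, e, _ in report.values())
report["Overall"] = (total_cases, total_errors,
                     round(100 * total_errors / total_cases, 2))
```

Unlike the single accuracy figure, the per-class rows show where the model goes wrong: class A is rarely misclassified, while classes B and C carry most of the errors.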
31
Evaluating Numerical Prediction
  • Assume we build a model to estimate the amount
    spent on next catalog offer.

32
Evaluating Numerical Prediction
  • Mean-Squared Error (MSE)
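  • $a_1, a_2, \ldots, a_n$: actual amounts spent
  • $p_1, p_2, \ldots, p_n$: predicted amounts
  • $\text{Error}_i = p_i - a_i$
  • $\text{MSE} = \frac{(p_1 - a_1)^2 + (p_2 - a_2)^2 + \cdots + (p_n - a_n)^2}{n}$
  • $\text{MSE} = \frac{(83-80)^2 + (131.1-140)^2 + (178-175)^2 + (166-168)^2 + (117-120)^2 + (198-189)^2}{6} \approx 31.87$
  • Root Mean Squared Error (RMSE): $\sqrt{\text{MSE}} \approx 5.65$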
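The calculation can be checked with a few lines of Python. One assumption is flagged here and in the code: the second predicted amount is taken as 131.1, the value consistent with the absolute error of 8.9 and the totals reported on these slides.

```python
import math

# Actual and predicted amounts; the second prediction is taken as 131.1,
# the value consistent with the totals reported on the slides (an assumption).
actual = [80, 140, 175, 168, 120, 189]
predicted = [83, 131.1, 178, 166, 117, 198]

squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
mse = sum(squared_errors) / len(squared_errors)
rmse = math.sqrt(mse)
```

Taking the square root brings the error back to the same units as the amounts spent, which makes the RMSE easier to interpret than the MSE.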

33
Evaluating Numerical Prediction
  • Mean Absolute Error
  • Does not assign higher weights to large errors.
    All sizes of error are weighted equally
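  • $\text{MAE} = \frac{|p_1 - a_1| + |p_2 - a_2| + \cdots + |p_n - a_n|}{n}$
  • $\text{MAE} = \frac{|83-80| + |131.1-140| + |178-175| + |166-168| + |117-120| + |198-189|}{6} = \frac{3 + 8.9 + 3 + 2 + 3 + 9}{6} \approx 4.8$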
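To illustrate the point about weighting, here is a sketch comparing MAE with RMSE on two hypothetical sets of predictions that have the same total absolute error, one spread evenly and one concentrated in a single large miss. The data are made up for illustration.

```python
import math

def mae(actual, predicted):
    """Mean absolute error: every unit of error counts the same."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: squaring makes large errors count more."""
    return math.sqrt(
        sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

actual = [100, 100, 100, 100]
even_errors = [105, 95, 105, 95]       # four errors of 5
one_big_error = [100, 100, 100, 120]   # one error of 20, same total error

mae_even, mae_big = mae(actual, even_errors), mae(actual, one_big_error)
rmse_even, rmse_big = rmse(actual, even_errors), rmse(actual, one_big_error)
```

MAE is identical for both sets of predictions, while RMSE doubles for the set with the single large error.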

34
Exercise
  • Compare multiple classification models (tree,
    KNN, ANN)
  • SAS: HMEQ data set
  • WEKA: bank data set