Title: Analysis of Classification-based Error Functions
1. Analysis of Classification-based Error Functions
- Mike Rimer
- Dr. Tony Martinez
- BYU Computer Science Dept.
- 18 March 2006
2. Overview
- Machine learning
- Teaching artificial neural networks with an error function
- Problems with conventional error functions
- Classification-based (CB) algorithms
- Experimental results
- Conclusion and future work
3. Machine Learning
- Goal: automating learning of problem domains
- Given a training sample from a problem domain, induce a correct solution-hypothesis over the entire problem population
- The learning model is often used as a black box
4. Teaching ANNs with an Error Function
- Used to train a multi-layer perceptron (MLP)
- Guides the gradient descent learning procedure toward an optimal state
- Conventional error metrics are sum-squared error (SSE) and cross entropy (CE)
- SSE is suited to function approximation
- CE is aimed at classification problems
- CB error functions [Rimer & Martinez 06] work better for classification
5. SSE, CE
- Both attempt to approximate 0-1 targets in order to represent making a decision
(Figure: target vector for a pattern labeled as class 2)
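A minimal numeric sketch of what these two metrics do to a single pattern with a one-hot (0-1) target; the four-class output vector below is hypothetical:

```python
import numpy as np

# Hypothetical 4-class pattern whose label is class 2 (0-indexed here),
# so the 0-1 target puts a 1 on that class and 0 everywhere else.
target = np.array([0.0, 0.0, 1.0, 0.0])
output = np.array([0.20, 0.10, 0.60, 0.10])   # hypothetical network outputs

sse = np.sum((target - output) ** 2)            # sum-squared error = 0.22
ce = -np.sum(target * np.log(output + 1e-12))   # cross entropy ~ 0.51

# The pattern is already classified correctly (class 2 has the highest output),
# yet both error measures keep pushing the outputs toward exact 0/1 values.
print(sse, ce)
```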
6. Issues with approximating hard targets
- Requires weights to be large to achieve optimality
- Leads to premature weight saturation
- Weight decay, etc., can improve the situation (sketched below)
- Learns areas of the problem space unevenly and at different times during training
- Makes global learning problematic
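As a concrete illustration of the weight-decay remedy mentioned above (the notation and coefficient value are assumptions, not from the slides), the gradient step simply adds a penalty proportional to each weight, discouraging weights from growing without bound:

```python
def sgd_step_with_weight_decay(w, grad, lr=0.1, decay=1e-4):
    """One stochastic-gradient update with L2 weight decay (sketch).

    w     : current weight value
    grad  : dE/dw from backpropagation for this pattern
    decay : weight-decay coefficient (hypothetical value)
    """
    return w - lr * (grad + decay * w)
```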
7. Classification-based Error Functions
- Designed to more closely match the goal of learning a classification task (i.e. correct classifications, not low error on 0-1 targets), avoiding premature weight saturation and discouraging overfitting
- CB1 [Rimer & Martinez 02, 06]
- CB2 [Rimer & Martinez 04]
- CB3 (submitted to ICML 06)
8. CB1
- Only backpropagates error on misclassified
training patterns
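A minimal sketch in the spirit of CB1 (not the authors' exact formulation): a pattern contributes an error signal only when some competing output ties or beats the target class's output.

```python
import numpy as np

def cb1_error_signal(outputs, target_idx):
    """CB1-style per-pattern error signal (illustrative sketch only).

    Correctly classified patterns return an all-zero signal, so nothing is
    backpropagated for them; misclassified patterns receive error only on
    the target node and on the competing nodes that tie or beat it.
    """
    error = np.zeros_like(outputs)
    competitors = np.arange(len(outputs)) != target_idx
    violating = competitors & (outputs >= outputs[target_idx])
    if not violating.any():
        return error                      # correct: no error is backpropagated
    # Hypothetical signal: raise the target output, lower the violators.
    error[target_idx] = outputs[violating].max() - outputs[target_idx]
    error[violating] = outputs[target_idx] - outputs[violating]
    return error
```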
9. CB2
- Adds a confidence margin, µ, that is increased
globally as training progresses
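A sketch of how a CB2-style margin could be checked; the linear schedule for µ below is an assumption for illustration, not the paper's actual schedule.

```python
import numpy as np

def cb2_needs_error(outputs, target_idx, mu):
    """True if the pattern should still receive an error signal: the target
    output must beat every competing output by at least the margin mu."""
    competitors = np.delete(outputs, target_idx)
    return outputs[target_idx] < competitors.max() + mu

def margin_at(epoch, num_epochs, mu_max=0.5):
    """Hypothetical global schedule: grow mu linearly as training progresses."""
    return mu_max * epoch / num_epochs
```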
10. CB3
- Learns a confidence Ci for each training pattern i as training progresses
- Patterns often misclassified have low confidence
- Patterns consistently classified correctly gain confidence
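A small sketch of a per-pattern confidence update in the CB3 spirit; the step size, clipping range, and neutral starting value are assumptions.

```python
def update_confidence(confidence, classified_correctly, step=0.01):
    """Nudge pattern i's confidence C_i up after a correct classification and
    down after a misclassification, keeping it within [0, 1] (sketch)."""
    if classified_correctly:
        return min(1.0, confidence + step)
    return max(0.0, confidence - step)

# Usage sketch: confidences start neutral and drift with training history.
num_patterns = 100                                   # hypothetical training-set size
confidences = {i: 0.5 for i in range(num_patterns)}
```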
11. Neural Network Training
- Influenced by
- Initial parameter (weight) settings
- Pattern presentation order (stochastic training)
- Learning rate
- Number of hidden nodes
- Goal of training
- High generalization
- Low bias and variance
12. Experiments
- Empirical comparison of six error functions
- SSE, CE, CE w/ WD, CB1-3
- Used eleven benchmark problems from the UC Irvine Machine Learning Repository
- ann, balance, bcw, derm, ecoli, iono, iris, musk2, pima, sonar, wine
- Testing performed using stratified 10-fold cross-validation
- Model selection by hold-out set
- Results were averaged over ten tests
- Learning rate 0.1, momentum 0.7
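The slides do not show the training code; the sketch below only reproduces the evaluation protocol (stratified 10-fold cross-validation, learning rate 0.1, momentum 0.7) with scikit-learn's stock MLP on the iris dataset, since the CB error functions themselves are not part of that library. The hidden-layer size and iteration cap are hypothetical choices.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)   # iris is one of the eleven UCI benchmarks

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
accuracies = []
for train_idx, test_idx in skf.split(X, y):
    # Stochastic gradient descent with the slide's settings: LR 0.1, momentum 0.7.
    clf = MLPClassifier(hidden_layer_sizes=(10,), solver="sgd",
                        learning_rate_init=0.1, momentum=0.7, max_iter=500)
    clf.fit(X[train_idx], y[train_idx])
    accuracies.append(clf.score(X[test_idx], y[test_idx]))

print(f"mean accuracy over 10 folds: {np.mean(accuracies):.3f}")
```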
13. Classifier output difference (COD)
- Evaluation of behavioral difference of two
hypotheses (e.g. classifiers)
COD(f, g) = (1/|T|) Σ_{x ∈ T} I(f(x) ≠ g(x)), where T is the test set and I is the identity (characteristic) function
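Under that definition, COD reduces to the fraction of test patterns on which two trained classifiers disagree; a minimal sketch:

```python
import numpy as np

def classifier_output_difference(preds_a, preds_b):
    """COD between two hypotheses: the proportion of test-set patterns on
    which their predicted class labels differ."""
    preds_a, preds_b = np.asarray(preds_a), np.asarray(preds_b)
    return float(np.mean(preds_a != preds_b))

# Two classifiers disagreeing on 1 of 5 test patterns -> COD = 0.2
print(classifier_output_difference([0, 1, 2, 1, 0], [0, 1, 2, 2, 0]))
```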
14. Robustness to initial network weights
- Averaged over 30 random runs across all datasets
Algorithm  Test acc (%)  St Dev  Epochs
CB3 93.468 4.7792 200.67
CB2 92.839 4.0800 366.69
CB1 92.828 5.3290 514.14
CE 92.789 5.3937 319.57
CE w/ WD 92.251 5.4735 197.24
SSE 91.951 5.6131 774.70
15. Robustness to initial network weights
Algorithm Test error COD
CB3 0.0653 0.0221
CB2 0.0716 0.0274
CB1 0.0717 0.0244
CE 0.0721 0.0248
CE w/ WD 0.0774 0.0255
SSE 0.0804 0.0368
16. Robustness to pattern presentation order
- Averaged over 30 random runs across all datasets
Algorithm  Test acc (%)  St Dev  Epochs
CB3 93.446 5.0409 200.46
CB2 92.641 5.4197 402.52
CB1 92.542 5.473 560.09
CE 92.290 5.6020 329.65
CE w/ WD 91.818 5.6278 221.21
SSE 91.817 5.6653 593.30
17. Robustness to pattern presentation order
Algorithm Test error COD
CB3 0.0655 0.0259
CB2 0.0736 0.0302
CB1 0.0746 0.0282
CE 0.0771 0.0329
CE w/ WD 0.0818 0.0338
SSE 0.0818 0.0344
18. Robustness to learning rate
- Average over learning rates varied from 0.01 to 0.3
Algorithm  Test acc (%)  St Dev  Epochs
CB3 93.175 3.514 334.8
CB2 92.285 3.437 617.8
SSE 92.211 3.449 525.7
CB1 91.908 3.880 505.4
CE 91.629 3.813 466.2
CE w/ WD 91.330 3.845 234.6
19. Robustness to learning rate
20. Robustness to number of hidden nodes
- Average of varying the number of nodes in the
hidden layer from 1 - 30
Algorithm  Test acc (%)  St Dev  Epochs
CB3 93.026 3.397 303.9
CB1 92.291 3.610 381.0
CB2 92.136 3.410 609.4
SSE 92.066 3.402 623.1
CE 91.956 3.563 397.0
CE w/ WD 91.74 3.493 190.6
21. Robustness to number of hidden nodes
22. Conclusion
- CB1-3 are generally more robust than SSE, CE, and CE w/ WD with respect to:
- Initial weight settings
- Pattern presentation order
- Pattern variance
- Learning rate
- Number of hidden nodes
- CB3 is the most robust, giving the most consistent results
23. Questions?