Title: Efficient Neural Network Training Using Subsets of Very Large Datasets
1. Efficient Neural Network Training Using Subsets of Very Large Datasets
- Srinivas Vadrevu
- University of Minnesota Duluth
2. Overview
- Very large amounts of data
- Knowledge Discovery in Databases (KDD)
- Machine Learning (ML) algorithms
- Neural Networks in KDD
- Training with memory-sized subsets
3. Outline of Talk
- Motivation
- Background
  - Artificial Neural Networks (ANNs)
  - Speeding up ANN training
  - Training with Subsets
- Our Idea: Subset Training
- Experimental Results
- Other Related Work
- Future Work
- Conclusions
4. Classification Tasks: An Example
- (Figure: sample positive and negative example shapes)
- Concept: determine whether an example is positive or negative
- Input features: Color, Sides, Corner
- Output feature: positive/negative
- Example: Color = red, Sides = 8, Corner = sharp
5. Learning
- Supervised
  - Teacher-labeled data
- Unsupervised
  - No teacher labels
- Neural Networks in classification
  - Supervised
  - Accurate
  - Efficient
  - Robust to noise
6. Knowledge Discovery in Databases (KDD)
- Data, Data, Data!!!
- Single-pass learning
  - Read data into memory once
  - Disk references costly
  - In-memory processing cheap
- Our approach
  - Divide the dataset into memory-sized subsets
  - Train the ANN successively with each subset
7. Basic Idea
8. Outline of Talk
- Motivation
- Background
  - Artificial Neural Networks (ANNs)
  - Speeding up ANN training
  - Training with Subsets
- Our Idea: Subset Training
- Experimental Results
- Other Related Work
- Future Work
- Conclusions
9. A Typical Feed-forward Neural Network
- (Diagram: input units for the Color, Sides, and Corner features of an example, plus a bias unit fixed at 1; weighted connections lead through hidden units to an output unit labeled positive/negative. Activation flows forward through the network, error flows backward.)
10. Learning in Neural Networks
- Activation
  - Input units are set by the input features
  - For every other unit k, determine its net input:
    net_k = Σ_{j ∈ LinkedTo(k)} w_{j→k} · a_j
  - Then calculate the activation a_k from net_k, e.g., with the sigmoid function (see the sketch below)
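A minimal Python sketch of this computation; the `weights` and `activation` structures are illustrative stand-ins for the network's actual representation:

```python
import math

def sigmoid(x):
    """Logistic activation function g(net) = 1 / (1 + exp(-net))."""
    return 1.0 / (1.0 + math.exp(-x))

def unit_activation(k, linked_to, weights, activation):
    """Compute a_k = g(net_k), where net_k is the weighted sum of the
    activations of the units j that feed into unit k."""
    net_k = sum(weights[(j, k)] * activation[j] for j in linked_to[k])
    return sigmoid(net_k)
```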
11. Backpropagation
- Learning in multi-layer feed-forward ANNs
- For each example:
  - Propagate activation forward
  - Propagate error backward:
    - Compute error for the output units
    - Backpropagate error to the hidden units
  - Update weights with gradient descent (see the sketch below)
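As an illustration of one such update, here is a minimal numpy sketch for a single-hidden-layer sigmoid network; the matrix names and the omission of bias units are simplifications for this example, not the network used in the experiments:

```python
import numpy as np

def backprop_step(x, target, W_ih, W_ho, eta=0.1):
    """One stochastic backpropagation update (squared error, sigmoid units).
    W_ih: input-to-hidden weights, W_ho: hidden-to-output weights."""
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))

    # Propagate activation forward.
    hidden = sig(W_ih @ x)
    output = sig(W_ho @ hidden)

    # Compute error at the outputs, then backpropagate it to the hidden units.
    delta_o = (target - output) * output * (1 - output)
    delta_h = hidden * (1 - hidden) * (W_ho.T @ delta_o)

    # Update weights with gradient descent.
    W_ho += eta * np.outer(delta_o, hidden)
    W_ih += eta * np.outer(delta_h, x)
    return W_ih, W_ho
```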
12. Other Ideas
- Backprop requires many epochs to converge
- Some ideas to overcome this:
  - Stochastic learning
    - Update weights after each training example
  - Momentum (see the sketch below)
    - Add a fraction of the previous update to the current update
    - Faster convergence
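A one-weight sketch of the momentum rule; the names are illustrative:

```python
def momentum_update(weight, grad, prev_delta, eta=0.1, mu=0.9):
    """Gradient-descent step with momentum: the new change is the usual
    step (-eta * grad) plus a fraction mu of the previous change."""
    delta = -eta * grad + mu * prev_delta
    return weight + delta, delta
```

With mu = 0 this reduces to plain stochastic gradient descent; applied after every training example, it combines both ideas on this slide.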
13. Outline of Talk
- Motivation
- Background
  - Artificial Neural Networks (ANNs)
  - Speeding up ANN training
  - Training with Subsets
- Our Idea: Subset Training
- Experimental Results
- Other Related Work
- Future Work
- Conclusions
14. Several Methods to Speed up Learning in Neural Networks
- QuickProp (Fahlman, 1988)
- RProp (Riedmiller and Braun, 1993)
- Dynamic adaptation of the learning rate and momentum (Salomon and van Hemmen, 1996)
- Exploring the error surface (Schmidhuber, 1989)
- Redefining the error function (Balakrishnan and Honavar, 1992)
15. RProp
- Resilient Propagation
- Variant of backpropagation
- Examines only the sign of the partial derivative of the error for each weight
- Computes a per-weight update amount (see the sketch below):
  - If the sign changes, the previous update is retracted and the update amount is decreased
  - Otherwise, the update amount is slightly increased
- May converge more quickly than backprop
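A per-weight sketch of this rule, using the commonly cited default factors (1.2 to grow, 0.5 to shrink) and the weight-backtracking variant; an illustration, not necessarily the exact implementation used in the experiments:

```python
import math

def rprop_update(w, grad, prev_grad, step, prev_delta_w,
                 eta_plus=1.2, eta_minus=0.5, step_min=1e-6, step_max=50.0):
    """One RProp update for a single weight w, given the current and previous
    partial derivatives of the error and the current per-weight step size."""
    if grad * prev_grad > 0:
        # Sign unchanged: slightly increase the update amount.
        step = min(step * eta_plus, step_max)
        delta_w = -math.copysign(step, grad)
        w += delta_w
    elif grad * prev_grad < 0:
        # Sign changed: decrease the update amount and retract the last step.
        step = max(step * eta_minus, step_min)
        w -= prev_delta_w
        delta_w = 0.0
        grad = 0.0  # avoids a second reversal on the next step
    else:
        # One derivative is zero: take a plain signed step.
        delta_w = -math.copysign(step, grad) if grad != 0 else 0.0
        w += delta_w
    return w, step, delta_w, grad
```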
16. Outline of Talk
- Motivation
- Background
  - Artificial Neural Networks (ANNs)
  - Speeding up ANN training
  - Training with Subsets
- Our Idea: Subset Training
- Experimental Results
- Other Related Work
- Future Work
- Conclusions
17. Training with Subsets
- Catlett (1991) trained with subsets of data
  - Classifiers built from subsets were generally inferior
  - Investigated other sampling methods (e.g., stratified sampling)
18. Other Ideas in Training with Subsets
- Breiman, 1999
  - Ensemble of classifiers trained on subsets of the data
- Street and Kim, 2001
  - Similar to Breiman, but decides whether to add each classifier to the ensemble
- Training on subsets of the input features
19. Outline of Talk
- Motivation
- Background
  - Artificial Neural Networks (ANNs)
  - Speeding up ANN training
  - Training with Subsets
- Our Idea: Subset Training
- Experimental Results
- Other Related Work
- Future Work
- Conclusions
20. Basic Idea
21. NN(Subset) Algorithm
- P = number of pages of memory available
- G = (number of data pages) / P (the number of groups)
- Initialize the ANN
- For each of the G partitions (see the sketch below):
  - Randomly select (without replacement) P data pages
  - Train the ANN for N epochs on the current subset
- Output the resulting ANN
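A minimal sketch of this loop; `init_ann` and `train_epochs` are hypothetical stand-ins for the actual network construction and training routines:

```python
import random

def nn_subset(data_pages, P, N, init_ann, train_epochs):
    """Train one ANN successively on memory-sized groups of data pages."""
    ann = init_ann()
    pages = list(data_pages)
    random.shuffle(pages)                  # pages are then drawn without replacement
    G = (len(pages) + P - 1) // P          # number of memory-sized groups
    for g in range(G):
        subset = pages[g * P:(g + 1) * P]  # the current memory-sized subset
        ann = train_epochs(ann, subset, epochs=N)
    return ann
```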
22. NNGrow(Subset) Algorithm
- P = number of pages of memory available
- G = (number of data pages) / P (the number of groups)
- Initialize the ANN
- For each of the G partitions (see the sketch below):
  - Randomly select P data pages
  - Train the ANN for N epochs on the current subset
  - If this is not the last partition:
    - Lower the learning rate of the current weights
    - Add one or more hidden units that use the standard learning rate
- Output the resulting ANN
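The same loop with the growing step; `lower_learning_rate` and `add_hidden_units` are hypothetical helpers standing in for those two operations:

```python
import random

def nngrow_subset(data_pages, P, N, init_ann, train_epochs,
                  lower_learning_rate, add_hidden_units):
    """NN(Subset) plus network growth between successive subsets."""
    ann = init_ann()
    pages = list(data_pages)
    random.shuffle(pages)
    G = (len(pages) + P - 1) // P
    for g in range(G):
        subset = pages[g * P:(g + 1) * P]
        ann = train_epochs(ann, subset, epochs=N)
        if g < G - 1:                        # not the last partition
            lower_learning_rate(ann)         # existing weights now learn more slowly
            add_hidden_units(ann, count=1)   # new units use the standard learning rate
    return ann
```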
23. Outline of Talk
- Motivation
- Background
  - Artificial Neural Networks (ANNs)
  - Speeding up ANN training
  - Training with Subsets
- Our Idea: Subset Training
- Experimental Results
- Other Related Work
- Future Work
- Conclusions
24. Datasets
25. Error (letter-recognition)
26. Error (splice)
27. Discussion
- Different mechanisms perform well on different datasets
- Decision tree learning generally effective
- ANNs perform well, generally better with a larger number of hidden units
- Naïve Bayes and K-Nearest Neighbor perform well on some problems and poorly on others
28. Convergence Results (letter)
29. Convergence Results (adult)
30. Discussion
- ANNs often converge quickly (in fewer than 10 epochs)
- Accuracy after a few epochs is comparable to the final accuracy
- Question: is it possible to adjust learning to always achieve good accuracy quickly?
31. Varying the Number of Hidden Units (letter)
32. Varying the Number of Hidden Units (splice)
33. Discussion
- Determining the network topology is difficult
- For larger numbers of hidden units, convergence may be delayed
- More hidden units generally produce lower error
34. Varying the Learning Rate (letter)
35. Varying the Learning Rate (splice)
36. Varying the Momentum (shuttle)
37. Varying the Momentum (splice)
38. Discussion
- Choosing a single learning rate that works for all problems is impossible
- For momentum close to one, the learner often does not learn
- But higher momentum may produce lower error rates
- Varying the learning rate and momentum can produce faster results
- No single value of either parameter is effective for all problems
39. NN(Subset) Results
40. NN(Subset) Results
41. NN(Subset) vs. NN(Baseline)
42. NN(Subset) vs. NN(Baseline)
43. NNGrow(Subset) Results
44. Subset vs. Baseline (msweb)
45. Subset vs. Baseline (letter)
46. Subset vs. Baseline (shuttle)
47. Subset vs. Baseline (splice)
48. Discussion
- NN(Subset) and NN(Baseline) results are comparable
- NNGrow(Subset) often produces lower error
- Error of the subset methods is often comparable to an ANN trained on the entire dataset
49. Summary and Conclusions
- ANNs converge quickly (often in fewer than 10 epochs)
- It is difficult to reduce training time by altering the network topology or learning parameters
- NN(Subset) and NNGrow(Subset) often produce results comparable to the baseline
- NNGrow(Subset) often produces lower error than NN(Subset)
50. Outline of Talk
- Motivation
- Background
  - Artificial Neural Networks (ANNs)
  - Speeding up ANN training
  - Training with Subsets
- Our Idea: Subset Training
- Experimental Results
- Other Related Work
- Future Work
- Conclusions
51. Breiman's Method
- Select subsets of the data
- Build a new classifier on each subset
- Aggregate it with the previous classifiers
- Compare the error after adding the classifier
- Repeat as long as the error decreases (see the sketch below)
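A rough sketch of this loop as described above; the helpers (`sample_subset`, `train_classifier`, `ensemble_error`) are hypothetical stand-ins, not Breiman's exact procedure:

```python
def breiman_style_ensemble(data, subset_size, sample_subset,
                           train_classifier, ensemble_error):
    """Grow an ensemble from data subsets as long as its error keeps decreasing."""
    ensemble = []
    best_error = float("inf")
    while True:
        subset = sample_subset(data, subset_size)    # select a subset of the data
        candidate = ensemble + [train_classifier(subset)]
        error = ensemble_error(candidate, data)      # error after adding the classifier
        if error >= best_error:                      # stop once the error no longer decreases
            break
        ensemble, best_error = candidate, error
    return ensemble
```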
52. Cascade Correlation (Fahlman and Lebiere, 1990)
- Start with a perceptron (a network with no hidden units)
- Add hidden units until the error plateaus
- Freeze the older weights in the network
- The network topology is not predetermined
53. QuickProp
- Tries to adapt each weight in the network to its optimum value
- Models the error surface as a parabola, using the gradient in the current and previous steps and the previous weight change
- The new weight for the current step is the minimum point of the parabola (see the sketch below)
- Not guaranteed to converge
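An illustrative per-weight version of this update; the growth limit and the gradient-descent fallback are common simplifications, not necessarily Fahlman's exact formulation:

```python
import math

def quickprop_update(grad, prev_grad, prev_delta_w, eta=0.1, mu_max=1.75):
    """Fit a parabola through the current and previous error gradients for
    one weight and step toward its minimum."""
    if prev_delta_w != 0.0 and prev_grad != grad:
        # Minimum of the parabola defined by the two gradient measurements.
        delta_w = (grad / (prev_grad - grad)) * prev_delta_w
        # Limit how much larger the new step may be than the previous one.
        max_step = mu_max * abs(prev_delta_w)
        if abs(delta_w) > max_step:
            delta_w = math.copysign(max_step, delta_w)
    else:
        delta_w = -eta * grad  # fall back to a plain gradient-descent step
    return delta_w
```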
54. Future Work
- Combine other training methods with our approach
- Use the data to determine when to stop learning
  - Use the next subset as a validation set
- An approach similar to Breiman's method
  - Use overlapping subsets
- Use the data to select learning parameters
  - Use subsets as validation sets and alter the network topology and parameters
55. Conclusions
- ANNs often converge quickly
- It is difficult to reduce training time by altering the topology and learning parameters
- Idea: train on large datasets by looking at memory-sized subsets of the data
- The network can be built in one pass over the data, making the approach applicable to KDD
56. Acknowledgements
- I am grateful to my advisor, Dr. Rich Maclin, for providing me the opportunity to work with him and for his valuable guidance
- I also thank Dr. Taek Kwon and Dr. Tim Colburn for their cooperation