Efficient Neural Network Training Using Subsets of Very Large Datasets

1
Efficient Neural Network Training Using Subsets of Very Large Datasets
  • Srinivas Vadrevu
  • University of Minnesota Duluth

2
Overview
  • Very large amount of data
  • Knowledge Discovery in Databases (KDD)
  • Machine Learning (ML) algorithms
  • Neural Networks in KDD
  • Training with memory sized subsets

3
Outline of Talk
  • Motivation
  • Background
  • Artificial Neural Networks (ANNs)
  • Speeding up ANN training
  • Training with Subsets
  • Our Idea: Subset Training
  • Experimental Results
  • Other Related Work
  • Future Work
  • Conclusions

4
Classification Tasks: An Example
[Figure: groups of positive and negative example shapes]
  • Concept: determine whether an example is Positive or Negative
  • Input features: Color, Sides, Corner
  • Output feature: Positive/Negative
  • Example: Color = red, Sides = 8, Corner = sharp
5
Learning
  • Supervised
  • Teacher-labeled data
  • Unsupervised
  • No teacher labels
  • Neural Networks in classification
  • Supervised
  • Accurate
  • Efficient
  • Immune to noise

6
Knowledge Discovery in Databases (KDD)
  • Data, Data, Data!!!
  • Single pass learning
  • Read data into memory once
  • Disk references costly
  • In memory processing cheap
  • Our approach
  • Divide dataset into memory-sized subsets
  • Train ANN successively with each subset

7
Basic Idea
[Diagram: the dataset is divided into memory-sized subsets, which are used to train the ANN one after another]
8
Outline of Talk
  • Motivation
  • Background
  • Artificial Neural Networks (ANNs)
  • Speeding up ANN training
  • Training with Subsets
  • Our Idea: Subset Training
  • Experimental Results
  • Other Related Work
  • Future Work
  • Conclusions

9
A Typical Feed-forward Neural Network
[Diagram: a feed-forward network for the example task. Input units encode the feature values (Color red/blue, Sides 8/4, Corner sharp/round) plus an always-on bias unit (activation 1); weighted links lead to the output unit (positive/negative). Activation propagates forward, error propagates backward.]
10
Learning in Neural Networks
  • Activation
  • Inputs set by input features
  • For other units determine net input
  • n_k = Σ_{j ∈ LinkedTo(k)} w_{j→k} · a_j
  • Then calculate activation (e.g., with the sigmoid
    function)
  • a_k = sigmoid(n_k) = 1 / (1 + e^{-n_k})
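
A minimal Python sketch of these two steps (illustrative only; the function names are mine, not from the slides):

```python
import math

def sigmoid(net):
    """Sigmoid activation: squashes the net input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-net))

def unit_activation(weights, activations):
    """Activation of unit k: weighted sum of the activations of the
    units linked to k, passed through the sigmoid."""
    net = sum(w * a for w, a in zip(weights, activations))
    return sigmoid(net)

# Example: three incoming links plus the always-on bias unit (activation 1)
print(unit_activation([0.5, -1.2, 0.3, 0.1], [1.0, 0.0, 1.0, 1.0]))
```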
11
Backpropagation
  • Learning in multi-layer feed-forward ANN
  • For each example
  • Propagate activation forward
  • Propagate error backward
  • Compute error for outputs
  • Backpropagate error to hidden units
  • Update weights with gradient descent
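
As an illustration of the per-example procedure above, one backpropagation update for a single-hidden-layer sigmoid network might look like the following sketch (NumPy-based; all names are my own, not the presenter's code):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def backprop_step(x, target, W_ih, W_ho, lr=0.1):
    """One gradient-descent update for a 1-hidden-layer sigmoid network.
    x: input vector, target: desired output vector."""
    # Forward pass: propagate activation
    hidden = sigmoid(W_ih @ x)
    output = sigmoid(W_ho @ hidden)

    # Backward pass: error at the outputs, then backpropagated to hidden units
    delta_out = (target - output) * output * (1 - output)
    delta_hid = (W_ho.T @ delta_out) * hidden * (1 - hidden)

    # Gradient-descent weight updates
    W_ho += lr * np.outer(delta_out, hidden)
    W_ih += lr * np.outer(delta_hid, x)
    return output
```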

12
Other Ideas
  • Backprop requires many epochs to converge
  • Some ideas to overcome this
  • Stochastic learning
  • Update weights after each training example
  • Momentum
  • Add fraction of previous update to current update
  • Faster convergence
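
A rough sketch of stochastic (per-example) updates with momentum, assuming a hypothetical routine `grad(weights, example)` that returns the error gradient for one example and weights stored in a flat list:

```python
def train_stochastic(weights, examples, grad, lr=0.1, momentum=0.9, epochs=10):
    """Stochastic learning with momentum (sketch, not the presenter's code)."""
    velocity = [0.0] * len(weights)
    for _ in range(epochs):
        for ex in examples:                      # update after each example
            g = grad(weights, ex)
            for i in range(len(weights)):
                # add a fraction of the previous update to the current update
                velocity[i] = momentum * velocity[i] - lr * g[i]
                weights[i] += velocity[i]
    return weights
```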

13
Outline of Talk
  • Motivation
  • Background
  • Artificial Neural Networks (ANNs)
  • Speeding up ANN training
  • Training with Subsets
  • Our Idea: Subset Training
  • Experimental Results
  • Other Related Work
  • Future Work
  • Conclusions

14
Several Methods to Speed up Learning in Neural
Networks
  • QuickProp (Fahlman, 1988)
  • RProp (Riedmiller and Braun, 1993)
  • Dynamic adaptation of learning parameters (e.g., the learning rate)
  • (Salomon and van Hemmen, 1996)
  • Exploring error surface
  • (Schmidhuber, 1989)
  • Redefining error function
  • (Balakrishnan and Honavar, 1992)

15
RProp
  • Resilient Propagation
  • Variant of backpropagation
  • Examines only the sign of the partial derivative of
    the error for each weight
  • Maintains a separate update amount per weight
  • If the sign changes, the update amount is decreased
    (the previous step overshot a minimum)
  • Otherwise, the update amount is slightly increased
  • May converge more quickly than backprop
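
A simplified sketch of this sign-based rule (closer to the later iRprop- variant than to the original 1993 formulation; the parameter values are commonly cited defaults, not taken from the slides):

```python
import numpy as np

def rprop_update(w, grad, prev_grad, step,
                 eta_plus=1.2, eta_minus=0.5,
                 step_min=1e-6, step_max=50.0):
    """One RProp-style update: the step size for each weight depends only
    on the sign of its partial derivative, not on its magnitude."""
    same_sign = grad * prev_grad > 0
    flipped   = grad * prev_grad < 0
    step[same_sign] = np.minimum(step[same_sign] * eta_plus, step_max)
    step[flipped]   = np.maximum(step[flipped] * eta_minus, step_min)
    grad = grad.copy()
    grad[flipped] = 0.0              # skip the update where the sign flipped
    w -= np.sign(grad) * step
    return w, grad, step
```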

16
Outline of Talk
  • Motivation
  • Background
  • Artificial Neural Networks (ANNs)
  • Speeding up ANN training
  • Training with Subsets
  • Our Idea: Subset Training
  • Experimental Results
  • Other Related Work
  • Future Work
  • Conclusions

17
Training with Subsets
  • Catlett (1991) trained with subsets of data
  • Classifiers from subsets generally inferior
  • Investigated other sampling methods (e.g.,
    stratified)

18
Other Ideas in Training With Subsets
  • Breiman, 1999
  • Ensemble of classifiers trained on subsets of
    data
  • Street and Kim, 2001
  • Similar to Breiman, but decides whether to add each
    new classifier to the ensemble
  • Training on subsets of input features

19
Outline of Talk
  • Motivation
  • Background
  • Artificial Neural Networks (ANNs)
  • Speeding up ANN training
  • Training with Subsets
  • Our Idea: Subset Training
  • Experimental Results
  • Other Related Work
  • Future Work
  • Conclusions

20
Basic Idea
[Diagram (repeated): the dataset is divided into memory-sized subsets that are used to train the ANN in succession]
21
NN(Subset) Algorithm
  • P = number of pages of memory available
  • G = (number of data pages) / P  (number of groups)
  • Initialize ANN
  • For each of G partitions
  • Randomly select (w/o replacement) P data pages
  • Train ANN for N epochs on current subset
  • Output resulting ANN
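
A sketch of the NN(Subset) loop as described above; `init_ann` and `train_epochs` are hypothetical helpers standing in for the actual network code:

```python
import random

def nn_subset(data_pages, P, init_ann, train_epochs, N=10):
    """Train one ANN successively on memory-sized subsets of the data
    (sketch of the described procedure, not the presenter's implementation)."""
    ann = init_ann()
    G = len(data_pages) // P                 # number of groups of pages
    pages = list(data_pages)
    random.shuffle(pages)                    # select pages w/o replacement
    for g in range(G):
        subset = pages[g * P:(g + 1) * P]    # P pages fit in memory at once
        train_epochs(ann, subset, N)         # N epochs on the current subset
    return ann
```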

22
NNGrow(Subset) Algorithm
  • P = number of pages of memory available
  • G = (number of data pages) / P  (number of groups)
  • Initialize ANN
  • For each of G partitions
  • Randomly select P data pages
  • Train ANN for N epochs on current subset
  • If not the last partition
  • Lower learning rate of current weights
  • Add (1 or more) hidden units with standard
    learning rate
  • Output resulting ANN
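
A corresponding sketch of NNGrow(Subset); `lower_learning_rate` and `add_hidden_units` are hypothetical helpers for the growth step described above:

```python
import random

def nngrow_subset(data_pages, P, init_ann, train_epochs,
                  lower_learning_rate, add_hidden_units, N=10):
    """After each subset except the last, lower the learning rate of the
    existing weights and add new hidden units that learn at the standard
    rate (sketch only; helpers are stand-ins for the real ANN code)."""
    ann = init_ann()
    G = len(data_pages) // P
    pages = list(data_pages)
    random.shuffle(pages)
    for g in range(G):
        subset = pages[g * P:(g + 1) * P]
        train_epochs(ann, subset, N)
        if g < G - 1:                        # not the last partition
            lower_learning_rate(ann)         # old weights change more slowly
            add_hidden_units(ann, count=1)   # new units use the standard rate
    return ann
```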

23
Outline of Talk
  • Motivation
  • Background
  • Artificial Neural Networks (ANNs)
  • Speeding up ANN training
  • Training with Subsets
  • Our Idea: Subset Training
  • Experimental Results
  • Other Related Work
  • Future Work
  • Conclusions

24
Datasets
25
Error (letter-recognition)
26
Error (splice)
27
Discussion
  • Different mechanisms perform well on different
    datasets
  • Decision tree learning generally effective
  • ANNs perform well, generally better with a larger
    number of hidden units
  • Naïve Bayes and K-Nearest Neighbor perform well on
    some problems and poorly on others

28
Convergence Results (letter)
29
Convergence Results (adult)
30
Discussion
  • ANNs often converge quickly (in less than 10
    epochs)
  • Accuracy after a few epochs is comparable to
    final accuracy
  • Question: is it possible to adjust learning to always
    achieve good accuracy quickly?

31
Varying the Number of Hidden Units (letter)
32
Varying the Number of Hidden Units (splice)
33
Discussion
  • Determining the network topology is difficult
  • For larger numbers of hidden units, convergence may
    be delayed
  • More hidden units generally produce lower error

34
Varying the Learning Rate (letter)
35
Varying the Learning Rate (splice)
36
Varying the Momentum (shuttle)
37
Varying the Momentum (splice)
38
Discussion
  • Choosing a single learning rate that works across
    datasets is impossible
  • For momentum close to one, the learner often does not
    learn
  • But higher momentum may produce lower error rates
  • Varying learning rate, momentum can produce
    faster results
  • No single value of either is effective

39
NN(Subset) Results
40
NN(Subset) Results
41
NN(Subset) Vs NN(Baseline)
42
NN(Subset) Vs NN(Baseline)
43
NNGrow(Subset) Results
44
Subset Vs Baseline (msweb)
45
Subset Vs Baseline (letter)
46
Subset Vs Baseline (shuttle)
47
Subset Vs Baseline (splice)
48
Discussion
  • NN(Subset), NN(Baseline) results comparable
  • NNGrow(Subset) often produces lower error
  • Error of subset methods often comparable to ANN
    on entire dataset

49
Summary and Conclusions
  • ANNs converge quickly (often in fewer than 10 epochs)
  • Difficult to reduce training time by altering the
    network topology or learning parameters
  • NN(Subset), NNGrow(Subset) often produce results
    comparable to baseline
  • NNGrow(Subset) often produces lower error than
    NN(Subset)

50
Outline of Talk
  • Motivation
  • Background
  • Artificial Neural Networks (ANNs)
  • Speeding up ANN training
  • Training with Subsets
  • Our Idea: Subset Training
  • Experimental Results
  • Other Related Work
  • Future Work
  • Conclusions

51
Breiman's Method
  • Select subsets of data
  • Build new classifier on subset
  • Aggregate with previous classifiers
  • Compare error after adding classifier
  • Repeat as long as error decreases
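
A rough sketch of the loop described on this slide, with hypothetical helpers `train_classifier` and `ensemble_error`; this follows my reading of the slide, not the details of Breiman's published algorithm:

```python
def breiman_style_ensemble(subsets, train_classifier, ensemble_error):
    """Keep adding classifiers trained on data subsets while the
    aggregated error keeps decreasing (illustrative sketch only)."""
    ensemble = []
    best_error = float("inf")
    for subset in subsets:
        candidate = ensemble + [train_classifier(subset)]
        err = ensemble_error(candidate)      # compare error after adding
        if err >= best_error:
            break                            # stop when error no longer drops
        ensemble, best_error = candidate, err
    return ensemble
```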

52
Cascade Correlation (Fahlman and Lebiere, 1990)
  • Start with a perceptron
  • Add hidden units until error plateaus
  • Freeze older weights in network
  • Network topology not predetermined
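
A high-level sketch of the growth loop described above; all helpers are hypothetical stand-ins, and the real algorithm's step of training candidate units to correlate with the residual error is not shown:

```python
def cascade_correlation(net, train_output_weights, add_frozen_hidden_unit,
                        error, tolerance=1e-3, max_units=50):
    """Grow the network one hidden unit at a time, freezing older weights,
    until the error plateaus (simplified sketch only)."""
    train_output_weights(net)                # start from a perceptron
    prev_err = error(net)
    for _ in range(max_units):
        add_frozen_hidden_unit(net)          # new unit; older weights frozen
        train_output_weights(net)            # only output weights retrained
        err = error(net)
        if prev_err - err < tolerance:       # error has plateaued
            break
        prev_err = err
    return net
```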

53
QuickProp
  • Tries to adapt each weight in the network to its
    optimum value
  • Models the error surface for each weight as a parabola,
    using the gradients from the current and previous steps
    and the previous weight change
  • The new weight at the current step is the minimum point
    of that parabola
  • Not guaranteed to converge
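
A simplified sketch of the parabola-based weight change for a single weight; Fahlman's full method includes additional safeguards not shown, and the fallback learning rate here is an assumption:

```python
def quickprop_step(grad, prev_grad, prev_delta, max_growth=1.75):
    """Fit a parabola through the current and previous gradients and
    jump toward its minimum (simplified QuickProp-style sketch)."""
    denom = prev_grad - grad
    if denom == 0 or prev_delta == 0:
        return -0.1 * grad                   # fall back to a plain gradient step
    delta = prev_delta * grad / denom        # minimum of the fitted parabola
    # clamp the step so it never grows too fast relative to the last one
    limit = max_growth * abs(prev_delta)
    return max(-limit, min(limit, delta))
```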

54
Future Work
  • Combination of other training methods and our
    approach
  • Use data to determine when to stop learning
  • Use next subset as validation set
  • An approach similar to Breiman's method
  • Use overlapping subsets
  • Use data to select learning parameters
  • Use subsets as validation sets and alter network
    topology and parameters

55
Conclusions
  • ANNs often converge quickly
  • Difficult to reduce training time with topology
    and learning parameters
  • Idea: train on large datasets by looking at
    memory-sized subsets of the data
  • Network can be built in one pass, making it
    applicable to KDD

56
Acknowledgements
  • I am grateful to my advisor, Dr. Rich Maclin, for the
    opportunity to work with him and for his valuable
    guidance
  • I also thank Dr. Taek Kwon and Dr. Tim Colburn for
    their cooperation