Real World Machine Learning

Provided by: damianr4

1
Zeus: Genetic Algorithms and Support Vector
Machines for Time Series Classification
Damian Eads(1,2), Daniel Hill(2), Sean Davis(1),
Simon Perkins(1), Junshui Ma(1), Reid Porter(1), and
James Theiler(1)

(1) Nonproliferation and International Security Division, Los Alamos National Laboratory, MS D436, Los Alamos, NM 87545
(2) Department of Computer Science, Rochester Institute of Technology, 102 Lomb Memorial Drive, Rochester, NY 14623
2
What is Zeus?
  • A software system which generates time series
    classifiers.
  • Its principal application is classifying
    lightning events.
  • Named after the supreme ruler of Mount Olympus.

3
FORTE Satellite
  • Equipped with a suite of optical and
    radio-frequency (RF) instruments.
  • Collected 3 million lightning events over the
    22-MHz subband's lifetime.

4
Purpose
  • Develop a more sophisticated weather monitoring
    system.
  • Improve our understanding of storm evolution.
  • Explore the concept of feature selection for
    Support Vector Machines.

5
Data Acquisition
[Figure: data acquisition pipeline; stages: Triggering Event, Transmission, Ground Station, Preprocessing, Zeus Classifier Generator]
6
Preprocessing
  • Load Very High Frequency (VHF) Data
  • Derive Spectrogram via Fourier Transform
  • Produce Power Density Time Series
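The three preprocessing steps can be sketched in Python as follows. This is a minimal illustration, not the Zeus implementation: it assumes the VHF record is already a list of real-valued samples, and the frame length, hop size, and summing of power over every frequency bin are illustrative choices. A naive discrete Fourier transform stands in for the FFT a real system would use.

```python
import cmath
import math


def dft(frame):
    """Naive O(n^2) discrete Fourier transform; an FFT would be used in practice."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def spectrogram(samples, frame_len=64, hop=32):
    """Power per frequency bin for each frame of the record (frame sizes assumed)."""
    columns = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        spectrum = dft(samples[start:start + frame_len])
        # Real-valued input: keep only the non-negative frequency bins.
        columns.append([abs(c) ** 2 for c in spectrum[:frame_len // 2 + 1]])
    return columns


def power_density_series(samples, frame_len=64, hop=32):
    """Sum each spectrogram column, giving one power value per frame."""
    return [sum(column) for column in spectrogram(samples, frame_len, hop)]


# Usage: a silent half followed by a constant half.
power = power_density_series([0.0] * 64 + [1.0] * 64, frame_len=64, hop=64)
```

Each value of the resulting series is the total spectral power of one frame, so the series traces how power evolves over the course of the event.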

[Figure: Example CG event]
7
Classes of Lightning
  • Cloud-to-Ground
      • Positive Initial Return Stroke (CG)
      • Negative Initial Return Stroke (IR)
      • Subsequent Negative Initial Return Stroke (SR)
  • Intra-Cloud
      • Impulsive Event (I)
      • Trans-ionospheric Pulse Pair (TIPP/I2)
      • Gradual Intra-Cloud Stroke (KM)
  • Off Record (O)

8
Examples of Power Densities
9
Zeus Software System
  • Implemented in C.
  • Uses the libsvm support vector machine library.
  • Runs on an Intel processor-based Linux
    workstation.
  • Performance measurement code is written in C,
    MATLAB, and bash.

10
Front/Back-end Architecture
[Figure: a time series enters the Zeus Classifier Generator, whose front end is a genetic algorithm (stochastic search for feature extraction) and whose back end is a support vector machine (classification); the output is a time series classifier]
11
Classifier Architecture
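[Figure: the Zeus Classifier Generator produces a time series classifier consisting of a feature extractor and a model; the feature extractor maps a time series to a feature set, and the model maps the feature set to a prediction result]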
12
Support Vector Machine
  • Maps the n-dimensional feature space into a
    higher-dimensional space.
  • The non-linear mapping is defined implicitly by
    a kernel function.
  • Finds the separating hyperplane that maximizes
    the margin between classes.
  • See Vladimir Vapnik's The Nature of Statistical
    Learning Theory for more information.
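To make the kernel idea concrete, here is a sketch of the decision function of an already-trained binary SVM: the kernel computes inner products in the higher-dimensional space without building the mapping explicitly. The Gaussian (RBF) kernel, the `gamma` value, and the two-support-vector model are assumptions for illustration; the slides do not say which kernel Zeus used with libsvm, and a multi-class lightning classifier combines several binary decisions.

```python
import math


def rbf_kernel(u, v, gamma=0.5):
    """Gaussian kernel K(u, v) = exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def decision_value(x, support_vectors, dual_coefs, bias, gamma=0.5):
    """f(x) = sum_i (alpha_i * y_i) * K(s_i, x) + b."""
    return sum(c * rbf_kernel(s, x, gamma)
               for s, c in zip(support_vectors, dual_coefs)) + bias


def predict(x, support_vectors, dual_coefs, bias, gamma=0.5):
    """Binary prediction: which side of the maximum-margin hyperplane x falls on."""
    f = decision_value(x, support_vectors, dual_coefs, bias, gamma)
    return 1 if f >= 0 else -1


# Hypothetical trained model: one support vector per class.
SVS = [[0.0, 0.0], [2.0, 2.0]]
COEFS = [1.0, -1.0]  # alpha_i * y_i
BIAS = 0.0
```

With this toy model, a point near `[0, 0]` is classified as `1` and a point near `[2, 2]` as `-1`.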

13
Genetic Algorithm
  • A. Produce an initial population.
  • B. Evaluate the chromosomes.
  • C. Perform selection.
  • D. Perform sexual recombination (crossover) of
    parents to produce a new population.
  • E. Mutate offspring with a given probability.
  • F. If the stopping criterion is not met, go to
    step B.
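The loop above can be sketched generically in Python. Binary tournament selection, the mutation rate, and a fixed generation count as the stopping criterion are assumptions (the slides do not specify them); the default population size of 15 and 50 generations follow the settings in the results slides. The toy usage evolves 20-bit strings rather than Zeus chromosomes.

```python
import random


def tournament(scored, rng):
    """Binary tournament (assumed): the fitter of two random individuals wins."""
    (fa, a), (fb, b) = rng.sample(scored, 2)
    return a if fa >= fb else b


def genetic_algorithm(fitness, random_chromosome, crossover, mutate,
                      pop_size=15, generations=50, mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    # A. Produce the initial population.
    population = [random_chromosome(rng) for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):  # F. Stopping criterion: fixed generation count.
        # B. Evaluate the chromosomes.
        scored = [(fitness(c), c) for c in population]
        # C and D. Select parents and recombine them into a new population.
        children = [crossover(tournament(scored, rng), tournament(scored, rng), rng)
                    for _ in range(pop_size)]
        # E. Mutate each child with probability mutation_rate.
        population = [mutate(c, rng) if rng.random() < mutation_rate else c
                      for c in children]
        best = max(population + [best], key=fitness)  # keep the best seen so far
    return best


# Toy usage: maximize the number of 1 bits in a 20-bit string.
def count_ones(bits):
    return sum(bits)


def random_bits(rng):
    return [rng.randint(0, 1) for _ in range(20)]


def uniform_mix(mother, father, rng):
    return [rng.choice(pair) for pair in zip(mother, father)]


def flip_one(bits, rng):
    i = rng.randrange(len(bits))
    return bits[:i] + [1 - bits[i]] + bits[i + 1:]


best = genetic_algorithm(count_ones, random_bits, uniform_mix, flip_one)
```

Passing the operators in as functions keeps the loop independent of the chromosome representation, which is how the same skeleton could drive tree-structured genes.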

14
Chromosome
  • Composed of primitive statistical, arithmetic,
    and signal processing operators.
  • Each gene (or algorithm) is represented as a
    tree, accepts both scalar and series input, and
    outputs scalar features.
  • The chromosome produces a feature vector with
    one feature per gene.
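One way to represent such genes is as nested tuples in the style of an S-expression, evaluated recursively. The operator names echo those in the Zeus example chromosome, but the tables and definitions below (`drv` as a first difference, `mean`, `max`, `sum`, and protected division) are hypothetical; the sketch only shows how a list of scalar-valued gene trees turns one time series into a feature vector.

```python
import statistics

# Hypothetical operator tables; Zeus's real operator set is larger.
SERIES_TO_SERIES = {
    "drv": lambda s: [b - a for a, b in zip(s, s[1:])],  # first difference
}
SERIES_TO_SCALAR = {"mean": statistics.fmean, "max": max, "sum": sum}
SCALAR_BINARY = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if b != 0 else 0.0,  # protected division (assumed)
}


def evaluate(node, series):
    """Recursively evaluate a gene tree; the leaf "s1" is the input series."""
    if node == "s1":
        return series
    if isinstance(node, (int, float)):
        return float(node)
    op, *args = node
    values = [evaluate(arg, series) for arg in args]
    if op in SERIES_TO_SERIES:
        return SERIES_TO_SERIES[op](*values)
    if op in SERIES_TO_SCALAR:
        return float(SERIES_TO_SCALAR[op](*values))
    if op in SCALAR_BINARY:
        return SCALAR_BINARY[op](*values)
    raise ValueError(f"unknown operator: {op!r}")


def feature_vector(chromosome, series):
    """A chromosome is a list of genes; each gene yields one scalar feature."""
    return [evaluate(gene, series) for gene in chromosome]


chromosome = [
    ("sum", "s1"),
    ("max", ("drv", "s1")),
    ("/", ("mean", "s1"), ("sum", ("drv", "s1"))),
]
features = feature_vector(chromosome, [1.0, 3.0, 2.0, 6.0])
```

For the series `[1, 3, 2, 6]` the three genes yield the sum (12), the largest first difference (4), and the mean divided by the summed differences (3 / 5).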

15
Chromosome Representation
[Figure: the genes of a chromosome map a time series (y) to a feature vector (x1, ..., xi, ..., xn)]
16
Example Chromosome
(define-feature-selector
  '((ratio-3 s1 '(0))
    (skew (gs (chunk s1 '(0.33 0.5)) '(15)))
    (int-t s1 '(0.73 0.98))
    (sum s1)
    (kurt s1)
    (kurt (drv (lcomb s1 (drv s1))))
    (skew s1)
    (max (drv (gs s1)))
    (/ (int-t s1) (sum (drv s1)))
    (ratio-3 s1 '( 4))))
Interpretation of the first two features: the first feature is the ratio of the average power over the first 266 microseconds of the signal to the average power over the last 266 microseconds. The second feature is the skewness of the smoothed power density from 266 to 400 microseconds.
17
Primitive Operators
  • Minimum
  • Maximum
  • Ratio of Means
  • Add, Subtract, Multiply, Divide
  • Subseries
  • Subsampling
  • Derivative Approximation
  • Convolution Filtering
  • Mean
  • Standard Deviation
  • Variance
  • Skewness
  • Kurtosis
  • Integral
  • Sum
  • Linear Combination
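Several of these primitives can be sketched as plain Python functions over lists. The definitions are assumptions where the slides leave room: population (not sample) moments, non-excess kurtosis, subseries bounds given as fractions of the series length, "valid"-mode convolution, and a trapezoidal integral. Minimum, maximum, and sum map directly onto Python's built-in `min`, `max`, and `sum`.

```python
import math


def mean(s):
    return sum(s) / len(s)


def variance(s):
    m = mean(s)
    return sum((x - m) ** 2 for x in s) / len(s)  # population variance


def std(s):
    return math.sqrt(variance(s))


def skewness(s):
    m, sd = mean(s), std(s)
    return sum((x - m) ** 3 for x in s) / (len(s) * sd ** 3) if sd else 0.0


def kurtosis(s):
    """Non-excess kurtosis (a normal distribution gives 3)."""
    m, var = mean(s), variance(s)
    return sum((x - m) ** 4 for x in s) / (len(s) * var ** 2) if var else 0.0


def subseries(s, start_frac, end_frac):
    """Slice by fractions of the series length, e.g. (0.33, 0.5)."""
    n = len(s)
    return s[int(start_frac * n):int(end_frac * n)]


def subsample(s, step):
    return s[::step]


def derivative(s):
    """Forward-difference approximation with unit spacing."""
    return [b - a for a, b in zip(s, s[1:])]


def convolve(s, kernel):
    """Convolution filter, keeping only positions where the kernel fully overlaps."""
    k = len(kernel)
    return [sum(s[i + j] * kernel[k - 1 - j] for j in range(k))
            for i in range(len(s) - k + 1)]


def integral(s, dt=1.0):
    """Trapezoidal approximation of the area under the series."""
    return dt * (sum(s) - 0.5 * (s[0] + s[-1]))


def ratio_of_means(a, b):
    mb = mean(b)
    return mean(a) / mb if mb else 0.0


def linear_combination(a, b, wa=1.0, wb=1.0):
    return [wa * x + wb * y for x, y in zip(a, b)]
```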

18
Crossover Operators
  • Uniform
  • Single-point
  • GP (Genetic Programming) Crossover

19
Uniform Crossover
Procedure: for each gene position, randomly select a parent and copy that parent's gene into the child.
[Figure: mother, father, and child chromosomes]
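A sketch of uniform crossover on chromosomes stored as equal-length lists of genes. Each gene is copied whole, and a fair coin (an assumption) picks which parent supplies it.

```python
import random


def uniform_crossover(mother, father, rng=random):
    """For each gene position, copy the gene from a randomly chosen parent."""
    if len(mother) != len(father):
        raise ValueError("parents must have the same number of genes")
    return [m if rng.random() < 0.5 else f for m, f in zip(mother, father)]


mother = ["m0", "m1", "m2", "m3", "m4"]
father = ["f0", "f1", "f2", "f3", "f4"]
child = uniform_crossover(mother, father, random.Random(1))
```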
20
Single-Point Crossover
Procedure: select a cut point. Place the mother's genes before the cut point in the child, and the father's genes after the cut point.
[Figure: mother and father chromosomes divided at a cut point, and the resulting child]
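Single-point crossover under the same list-of-genes assumption. The cut point is drawn uniformly from the interior positions so that the child always receives at least one gene from each parent; that is a design choice, not something the slides specify.

```python
import random


def single_point_crossover(mother, father, rng=random):
    """Mother's genes before the cut point, father's genes from the cut point on."""
    if len(mother) != len(father) or len(mother) < 2:
        raise ValueError("parents need the same length and at least 2 genes")
    cut = rng.randint(1, len(mother) - 1)  # never cut at either end
    return mother[:cut] + father[cut:]


child = single_point_crossover(["M"] * 10, ["F"] * 10, random.Random(7))
```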
21
GP Crossover
  • For each gene, select a compatible branch from
    each parent, and swap them.



[Figure: mother gene, father gene, and child gene trees]
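GP crossover works inside a gene tree rather than across the list of genes. The sketch below assumes genes are nested tuples and treats two branches as compatible when they produce the same kind of output (series or scalar), so every swap yields valid trees; that compatibility rule and the `SERIES_OPS` set are hypothetical.

```python
import random

SERIES_OPS = {"drv", "gs"}  # hypothetical operators that return a series


def output_kind(node):
    """A gene node yields either a series or a scalar."""
    if node == "s1":
        return "series"
    if isinstance(node, tuple):
        return "series" if node[0] in SERIES_OPS else "scalar"
    return "scalar"  # numeric constant


def subtrees(node, path=()):
    """Yield (path, subtree) pairs; a path is a tuple of child indices."""
    yield path, node
    if isinstance(node, tuple):
        for i, child in enumerate(node[1:], start=1):
            yield from subtrees(child, path + (i,))


def replace_at(node, path, new):
    """Return a copy of node with the subtree at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return node[:i] + (replace_at(node[i], path[1:], new),) + node[i + 1:]


def gp_crossover(mother_gene, father_gene, rng=random):
    """Swap one randomly chosen pair of compatible branches between two genes."""
    m_path, m_sub = rng.choice(list(subtrees(mother_gene)))
    compatible = [(p, s) for p, s in subtrees(father_gene)
                  if output_kind(s) == output_kind(m_sub)]
    if not compatible:
        return mother_gene, father_gene
    f_path, f_sub = rng.choice(compatible)
    return (replace_at(mother_gene, m_path, f_sub),
            replace_at(father_gene, f_path, m_sub))


mother = ("max", ("drv", "s1"))
father = ("/", ("sum", "s1"), ("mean", ("gs", "s1")))
```

Because only same-kind branches are exchanged, both children still output a scalar feature, and the two children together contain exactly the nodes of the two parents.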
22
Mutation
  • Algorithm randomization: completely re-randomize
    a specific gene.
  • Hoisting: select a cut point and a grab point.
    Delete the node at the cut point and its
    descendants, and insert the subtree at the grab
    point in their place.

[Figure: gene tree with the cut point and grab point marked]
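Both mutation operators can be sketched on a nested-tuple gene representation. The grammar (hypothetical scalar operators applied to a chain of series operators) is an assumption, as is the reading that the grab point lies below the cut point and must produce the same kind of output, which makes hoisting shorten the gene.

```python
import random

SERIES_OPS = ("drv", "gs")            # hypothetical series -> series operators
SCALAR_OPS = ("kurt", "max", "skew")  # hypothetical series -> scalar operators


def random_series(rng, depth):
    """Random chain of series operators ending at the input series "s1"."""
    if depth == 0 or rng.random() < 0.4:
        return "s1"
    return (rng.choice(SERIES_OPS), random_series(rng, depth - 1))


def randomize(rng, depth=3):
    """Algorithm randomization: discard the gene and grow a brand-new one."""
    return (rng.choice(SCALAR_OPS), random_series(rng, depth))


def subtrees(node, path=()):
    """Yield (path, subtree) pairs; a path is a tuple of child indices."""
    yield path, node
    if isinstance(node, tuple):
        for i, child in enumerate(node[1:], start=1):
            yield from subtrees(child, path + (i,))


def replace_at(node, path, new):
    """Return a copy of node with the subtree at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return node[:i] + (replace_at(node[i], path[1:], new),) + node[i + 1:]


def kind(node):
    return "scalar" if isinstance(node, tuple) and node[0] in SCALAR_OPS else "series"


def hoist(gene, rng):
    """Replace the subtree at a cut point with a same-kind subtree (the grab
    point) taken from below it; the gene never grows."""
    cut_path, cut_sub = rng.choice(list(subtrees(gene)))
    grabs = [sub for path, sub in subtrees(cut_sub)
             if path and kind(sub) == kind(cut_sub)]
    if not grabs:
        return gene
    return replace_at(gene, cut_path, rng.choice(grabs))


gene = ("kurt", ("drv", ("gs", ("drv", "s1"))))
```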
23
Fitness Evaluation
  • In-sample classification rate: the fraction of
    training examples correctly classified by an SVM
    trained on those same examples.
  • 10-fold cross-validation score: estimates how
    well a chromosome will perform on unseen
    (out-of-sample) data.

24
Fitness In-sample Rate
[Figure: the chromosome's feature extractor turns the training set (3181 features) into a processed set (10 features); an SVM trained on the processed set produces a model, and the resulting classifier produces a result set that is scored]
  • Steps
      • Run the feature extractor.
      • Produce the training set.
      • Train the SVM.
      • Produce the model.
      • Run the classifier.
      • Produce the result set.
      • Calculate the score.
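The in-sample fitness procedure can be sketched as a function of a chromosome's feature extractor. To keep the example self-contained, a nearest-centroid classifier stands in for the SVM; it is not what Zeus uses, and the `train`/`predict` parameters are where a libsvm-backed model would plug in.

```python
def centroid_train(X, y):
    """Stand-in for SVM training: store the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, value in enumerate(x):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def centroid_predict(model, x):
    """Stand-in for SVM prediction: the class whose centroid is nearest."""
    return min(model, key=lambda label: sum((c - v) ** 2
                                            for c, v in zip(model[label], x)))


def in_sample_fitness(extract, series_list, labels,
                      train=centroid_train, predict=centroid_predict):
    processed = [extract(s) for s in series_list]     # run the feature extractor
    model = train(processed, labels)                  # train on the processed set
    results = [predict(model, x) for x in processed]  # classify the same examples
    return sum(r == t for r, t in zip(results, labels)) / len(labels)  # score
```

Because the classifier is scored on the examples it was trained on, this fitness rewards chromosomes that fit the training data, which is why an out-of-sample estimate is also used.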
25
Fitness N-Fold Cross Valid.
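[Figure: the chromosome's feature extractor turns the training set (3181 features) into a processed set (10 features), which is split into testing partitions; for each partition, an SVM model is trained on the complement and the classifier predicts the partition's labels, producing a result set that is scored once all partitions are finished]
  • Steps
      A. Run the feature extractor.
      B. Produce the training set.
      C. Produce the testing partitions.
      D. Train on the complement of a testing
         partition.
      E. Produce the model.
      F. Predict the labels of the test set.
      G. If all partitions are finished, calculate the
         score; otherwise, go to step D.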
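A sketch of the cross-validation fitness, with the step letters from the slide in the comments. A 1-nearest-neighbour classifier stands in for the SVM, and the partitions are strided rather than shuffled or stratified; both are simplifications for illustration.

```python
def nn_train(X, y):
    """Stand-in for SVM training: a 1-nearest-neighbour model keeps the data."""
    return list(zip(X, y))


def nn_predict(model, x):
    """Label of the closest stored example (squared Euclidean distance)."""
    return min(model, key=lambda pair: sum((a - b) ** 2
                                           for a, b in zip(pair[0], x)))[1]


def k_fold_partitions(n, k=10):
    """Disjoint, nearly equal testing partitions (strided, not shuffled)."""
    return [list(range(i, n, k)) for i in range(k)]


def cross_validation_fitness(extract, series_list, labels,
                             train=nn_train, predict=nn_predict, k=10):
    processed = [extract(s) for s in series_list]          # A, B
    correct = 0
    for test_idx in k_fold_partitions(len(processed), k):  # C
        held_out = set(test_idx)
        train_idx = [i for i in range(len(processed)) if i not in held_out]
        model = train([processed[i] for i in train_idx],   # D, E
                      [labels[i] for i in train_idx])
        correct += sum(predict(model, processed[i]) == labels[i]  # F
                       for i in test_idx)
    return correct / len(processed)                        # G
```

Every example is predicted exactly once by a model that never saw it, so the score estimates out-of-sample accuracy at the cost of training k models per fitness evaluation.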
26
Performance Testing
  • An outer layer of cross-validation is needed,
    because the fitness function has already seen
    the training data.
  • Equally sized testing partitions are created.
  • An entire Zeus run is performed on the
    complement of each testing partition.
  • After the 10 runs are complete, a final 10-fold
    cross-validation score is calculated.
  • If the in-sample rate is the fitness criterion,
    90% of the training data is used to train each
    SVM; with 10-fold cross-validation as the
    fitness criterion, 81% is used.
  • Results are compared against a raw SVM without
    feature selection.
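The 90% and 81% figures follow from nesting the two cross-validation layers: the outer layer keeps 9/10 of the data, and a 10-fold cross-validation fitness function keeps 9/10 of that. A small worked sketch:

```python
from fractions import Fraction


def svm_training_fraction(outer_folds=10, inner_folds=None):
    """Share of the full data set used to train each SVM during a Zeus run.

    The outer cross-validation holds out one partition; a cross-validation
    fitness function then holds out one inner partition of what remains.
    """
    share = Fraction(outer_folds - 1, outer_folds)
    if inner_folds is not None:  # cross-validation fitness
        share *= Fraction(inner_folds - 1, inner_folds)
    return share
```

`svm_training_fraction()` gives 9/10 and `svm_training_fraction(10, 10)` gives 81/100. If all 143 samples in the current database were used, that would be roughly 129 and 116 samples per SVM.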

27
Results
  • 10 Features, In-sample for Fitness, 50
    Generations, Pop. size of 15

28
Results
  • 10 Features, 10-Fold CV for Fitness, 50
    Generations, Pop. size of 15

29
Results
  • 49 Features, In-sample for Fitness, 50
    Generations, Pop. size of 15

30
Results
  • 49 Features, 10-Fold CV for Fitness, 50
    Generations, Pop. size of 15

31
Results
  • 3181 Features, Raw SVM without Zeus

32
Result Summary
33
Conclusions
  • The 10-fold cross-validation fitness function led
    to higher scores in the outer layer of validation
    than the in-sample fitness function.
  • The raw SVM outperformed Zeus by 3.95%, but it
    used significantly more features (3181 versus
    10 or 49).
  • Fewer features can reduce the strain on satellite
    resources such as bandwidth and payload storage.
  • The raw SVM's parameters may have been
    over-optimized.

34
Future Work
  • Implement better primitive operators.
  • Use another stochastic search technique to select
    SVM parameters.
  • Support control structures and function
    definitions.
  • Add more lightning data to the database
    (currently 143 samples).
  • Explore other data sets.

35
Acknowledgements
  • Special thanks go to FORTE Project Leader Abe
    Jacobson and the ISIS (Intelligent Searching of
    Images and Signals) team, without whose support
    this work would not have been possible.
  • This work was supported by Laboratory Directed
    Research and Development Directed Research
    (LDRD/DR) funding, as well as by funding from
    various government agencies.

36
Questions?