Slide 1: Distributed and Generic Maximum Likelihood Evaluation

Berkeley, 2008

Carlos Varela, Travis Desell, Boleslaw Szymanski, Malik Magdon-Ismail
Department of Computer Science

Nathan Cole, Heidi Newberg
Department of Physics, Applied Physics and Astronomy

- Rensselaer Polytechnic Institute
- http://wcl.cs.rpi.edu/gmle
- http://milkyway.cs.rpi.edu
- March, 2008
Slide 2: Overview

- Introduction
  - Motivation
  - Research Questions and Challenges
  - Enabled Scientific Applications
- GMLE (Generic Maximum Likelihood Evaluator)
  - Approach and Goals
  - Architecture
  - Asynchronous Search Methods
- Performance Evaluation
  - Test Environments
  - Grid and BlueGene Performance
  - Asynchronous Search Performance
- Conclusions & Future Work
Slide 3: Motivation

- "From a theoretical point of view, the most important general method of estimation known so far is the method of maximum likelihood." (H. Cramér, Mathematical Methods of Statistics)
- Distribution is essential for scientific computing
  - Scientific models are becoming increasingly complex
  - Rates of data acquisition are far exceeding increases in computing power
- No Free Lunch in machine learning
  - No single parameter optimization method is the best
  - Different search methods work best for different problems
Slide 4: Research Questions and Challenges

- Distributed Computing
  - Enable the easy use of distributed environments: computational grids, the Internet, supercomputers
  - Maximize the scalability of learning methods and scientific model evaluation
  - Optimize performance: reduce communication times, load-balance distributed computations, handle distributed failures
- Machine Learning
  - Examine the scalability of different search methods: how many model evaluations can be done concurrently?
  - Examine which search methods work best on which computing environments: how can searches be modified for better use on large-scale computing environments? Can the search be done asynchronously?
  - Enable generic use of search methods by different scientific applications
Slide 5: Applications - Astronomy

What is the structure and origin of the Milky Way galaxy?

- All the stars in the sky are being measured by the Sloan Digital Sky Survey (figure at right shows current progress). Over 10 terabytes of data have already been collected.
- Other galaxies are easy to examine because we can look at them from the outside; being inside the Milky Way makes determining its structure and how it formed difficult.
- Evaluating a single model of the Milky Way with a single set of parameters can take over a year on a typical high-end computer.
- Models determine where different star streams are in the Milky Way, which helps us better understand its structure and how it was formed.
Slide 6: Applications - Particle Physics

How can particles that are predicted by theory but not yet observed be found?

- How are missing baryons found?
- A scientific model with 10 to 100 fit parameters is used to calculate the occurrence of missing baryons based on observed data.
- Current data sets involve 10^5 events.
- Future data sets will involve 10^7 events with the next generation of particle detectors.
- Calculating a single set of fit parameters on a single data set takes months to a year on a single high-end computer.
- Finding missing baryons will help verify current models in quantum theory.
Slide 7: GMLE - A Distributed and Generic Maximum Likelihood Evaluator
Slide 8: Approach & Goals

- Separation of Concerns
  - Scientific models, distributed evaluation frameworks, and search methods must be able to be developed independently
  - Simple interfaces are required for interaction between these components
- Goals
  - Plug-and-play scientific models, search methods, and distributed execution environments
  - Determine which applications and search methods work best on which execution environments
  - Develop new search methods that take advantage of large-scale computing environments
  - Enable more effective and efficient research into difficult scientific problems and more complex models
Slide 9: GMLE Architecture (Synchronous) [diagram]

- Scientific Models: Data Initialization, Integral Function, Integral Composition, Likelihood Function, Likelihood Composition
- Search Routines: Gradient Descent, Genetic Search, Simplex; given initial parameters, they produce optimized parameters
- Distributed Evaluation Framework: creates the evaluators, accepts evaluation requests from the search routines, distributes parameters to the evaluators, combines their results, and returns them to the search
- Framework implementations: BOINC (Internet), SALSA/Java (RPI Grid), MPI/C (BlueGene)
- (The boundaries between these components are sketched as interfaces below)
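To make the plug-and-play goal concrete, the boundary between the three layers could be expressed with interfaces roughly like the Java sketch below (Java being one of the two implementation languages named on slide 14). The interface and method names are illustrative assumptions, not the actual GMLE API.

    // Hedged sketch of the component boundaries in the GMLE architecture.
    // All names and signatures are assumptions for illustration only.
    interface ScientificModel {
        void initializeData(String dataFile);                     // Data Initialization
        double integral(double[] parameters, int slice);          // Integral Function (one slice of the work)
        double composeIntegrals(double[] partialIntegrals);       // Integral Composition
        double likelihood(double[] parameters, int slice);        // Likelihood Function (one slice of the data)
        double composeLikelihoods(double[] partialLikelihoods);   // Likelihood Composition
    }

    interface SearchRoutine {
        // Gradient descent, genetic search, and simplex would all implement this.
        double[] search(double[] initialParameters, EvaluationFramework framework);
    }

    interface EvaluationFramework {
        // Implemented over BOINC (Internet), SALSA/Java (RPI Grid), or MPI/C (BlueGene):
        // distributes the parameters to the evaluators and combines their results.
        double evaluate(double[] parameters);
    }

With interfaces like these, a search routine only ever calls evaluate() and never needs to know whether the evaluation ran on the Internet, the RPI Grid, or the BlueGene.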
Slide 10: GMLE Architecture (Asynchronous) [diagram]

- Scientific Models: Data Initialization, Integral Function, Integral Composition, Likelihood Function, Likelihood Composition
- Search Routines: Gradient Descent, Genetic Search, Simplex; given initial parameters, they produce optimized parameters
- Distributed Evaluation Framework: creates evaluators 1..N; each evaluator independently sends work requests to the search, receives work, and reports its results back
- Framework implementations: BOINC (Internet), SALSA/Java (RPI Grid), MPI/C (BlueGene)
- (The evaluator side of this pull-style protocol is sketched below)
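The evaluator's role in this protocol might look like the following hedged sketch: each evaluator loops, requesting work and reporting results independently of the others. SearchServer, requestWork, and reportResult are assumed names, and the expensive scientific-model evaluation is replaced by a toy surrogate.

    // Hedged sketch of one evaluator in the asynchronous architecture.
    // Names are illustrative assumptions, not the actual GMLE API.
    class AsynchronousEvaluator implements Runnable {
        interface SearchServer {
            double[] requestWork();                                    // next parameter set, or null when the search is done
            void reportResult(double[] parameters, double likelihood);
        }

        private final SearchServer server;

        AsynchronousEvaluator(SearchServer server) { this.server = server; }

        @Override
        public void run() {
            while (true) {
                double[] parameters = server.requestWork();
                if (parameters == null) break;                         // search has finished
                double likelihood = evaluateModel(parameters);         // the expensive model evaluation
                server.reportResult(parameters, likelihood);           // results may arrive in any order
            }
        }

        // Stand-in for the scientific model's likelihood calculation.
        private double evaluateModel(double[] parameters) {
            double sum = 0;
            for (double p : parameters) sum -= p * p;                  // toy surrogate objective
            return sum;
        }
    }

Because evaluators never wait for one another, slow or failed workers only delay their own results rather than an entire generation, which is what makes this style attractive on heterogeneous environments such as BOINC.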
Slide 11: Asynchronous Search Methods

- Asynchronous Genetic Search
  - Traditional genetic search works in iterative generations: N individuals are used to generate the next N individuals by selection, crossover, and mutation
  - Asynchronous genetic search instead continuously updates a population (see the sketch below):
    - N individuals are generated randomly for the initial population
    - When an evaluator requests more work, individuals from the population are selected randomly to generate either a crossover or a mutation
    - The population keeps the most fit individuals, discarding the less fit as results arrive
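A minimal Java sketch of this update loop follows. The class and method names, and the choice to treat a higher fitness value as better, are assumptions for illustration rather than the actual GMLE/SALSA code; the operators themselves are left abstract here and sketched after slide 12.

    import java.util.*;

    // Minimal sketch of the asynchronous population update described above.
    public abstract class AsynchronousSearch {
        public static final class Individual {
            public final double[] parameters;
            public final double fitness;       // assumed: the evaluated likelihood, higher is better
            public Individual(double[] p, double f) { parameters = p; fitness = f; }
        }

        protected final List<Individual> population = new ArrayList<>();
        protected final int populationSize;
        protected final Random random = new Random();

        protected AsynchronousSearch(int populationSize) { this.populationSize = populationSize; }

        // Search-specific operators (e.g. those on slide 12).
        protected abstract double[] randomParameters();
        protected abstract double[] mutate(double[] parent);
        protected abstract double[] crossover(Individual a, Individual b);

        // Called whenever an evaluator requests more work.
        public synchronized double[] generateWork() {
            if (population.size() < populationSize) return randomParameters();   // still filling the initial population
            Individual a = population.get(random.nextInt(population.size()));
            Individual b = population.get(random.nextInt(population.size()));
            return random.nextBoolean() ? mutate(a.parameters) : crossover(a, b);
        }

        // Called whenever an evaluator reports a result: keep the most fit
        // individuals and discard the least fit.
        public synchronized void reportResult(double[] parameters, double fitness) {
            population.add(new Individual(parameters, fitness));
            population.sort((x, y) -> Double.compare(y.fitness, x.fitness));     // best first
            while (population.size() > populationSize)
                population.remove(population.size() - 1);
        }
    }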
Slide 12: Asynchronous Genetic Search Operators

- Average
  - Traditional operator for continuous problems
  - Generated parameters are the average of two randomly selected parents
- Double Shot
  - Two parents generate three children:
    - The average of the parents
    - A point outside the less fit parent, at the same distance from that parent as the average
    - A point outside the more fit parent, at the same distance from that parent as the average
- Probabilistic Simplex
  - N parents generate one child
  - Points are generated randomly along the line through the worst parent and the centroid (average) of the remaining parents
  - (All three operators are sketched in code below)
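Continuing the sketch from slide 11, the three operators might be written as follows. The element-wise arithmetic and the -1.5..1.5 simplex range (taken from slide 22) are assumptions for illustration, not the published implementation.

    import java.util.Random;

    // Hedged sketches of the average, double shot, and probabilistic simplex operators.
    public class Operators {

        // Average: the child is the element-wise average of two parents.
        static double[] average(double[] p1, double[] p2) {
            double[] child = new double[p1.length];
            for (int i = 0; i < child.length; i++) child[i] = (p1[i] + p2[i]) / 2.0;
            return child;
        }

        // Double shot: two parents generate three children -- the average, a point
        // outside the less fit parent, and a point outside the more fit parent,
        // each at the same distance from its parent as the average.
        static double[][] doubleShot(double[] lessFit, double[] moreFit) {
            int n = lessFit.length;
            double[] avg = average(lessFit, moreFit);
            double[] outsideLess = new double[n], outsideMore = new double[n];
            for (int i = 0; i < n; i++) {
                outsideLess[i] = lessFit[i] - (avg[i] - lessFit[i]);   // reflect the average through the less fit parent
                outsideMore[i] = moreFit[i] + (moreFit[i] - avg[i]);   // reflect the average through the more fit parent
            }
            return new double[][] { avg, outsideLess, outsideMore };
        }

        // Probabilistic simplex: N parents generate one child on the line through
        // the worst parent and the centroid of the remaining parents. With this
        // assumed parameterization, t = 1 is the worst parent and t = -1 its
        // reflection through the centroid.
        static double[] probabilisticSimplex(double[] worst, double[][] others, Random random) {
            int n = worst.length;
            double[] centroid = new double[n];
            for (double[] parent : others)
                for (int i = 0; i < n; i++) centroid[i] += parent[i] / others.length;
            double t = -1.5 + 3.0 * random.nextDouble();               // uniform in [-1.5, 1.5]
            double[] child = new double[n];
            for (int i = 0; i < n; i++) child[i] = centroid[i] + t * (worst[i] - centroid[i]);
            return child;
        }
    }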
Slide 14: Test Environments

- GMLE is implemented in SALSA/Java and MPI/C
- Used 3 heterogeneous clusters on the RPI Grid:
  - 4 quad-processor PowerPCs (16 processors)
  - 4 quad-processor dual-core Opterons (32 processors)
  - 10 quad-processor Opterons (40 processors)
- Used two BlueGene/L partitions:
  - 128 nodes (128 processors, 256 in virtual mode)
  - 512 nodes (512 processors, 1024 in virtual mode)

Grid Testbed [diagram]:
- PPC: 4x 1.7GHz quad-processor single-core PowerPC
- OPT 4x1: 10x 2.2GHz quad-processor single-core Opteron
- OPT 4x2: 4x 2.2GHz quad-processor dual-core Opteron
- The clusters are connected by LAN and WAN links
Slide 15: Computation Time, Grid & BlueGene/L [chart]

- A single evaluation takes about 2 minutes, and an MLE run requires roughly 10,000 evaluations: about a 15-day runtime on one machine
- 100x speedup: about a 1.5-day runtime
- 230x speedup: under a 1-day runtime
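As a rough check of the serial figure: 10,000 evaluations x 2 minutes = 20,000 minutes, a little under 14 days, consistent with the quoted 15-day single-machine runtime.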
Slide 16: Asynchronous Search Performance

- The performance of iterative and asynchronous genetic search was tested on the BlueGene, and of asynchronous genetic search on BOINC, using the astronomy application
- The average operator was used for iterative GS and asynchronous GS on the BlueGene
- Double shot and probabilistic simplex (N = 2..5) were used on the BlueGene and BOINC
- Note: IGS and AGS (average) on the BlueGene used an older version of the application whose optimum was 3.025, while the double shot and simplex runs had an optimum of 2.987
Slide 17: Iterative Genetic Search (Average) [plot]
Slide 18: Asynchronous Genetic Search (Average) [plot]
Slide 19: Double Shot and Simplex on BlueGene [plot]
Slide 20: Double Shot and Simplex on BOINC [plot]
Slide 21: Performance Conclusions

- Iterative genetic search had the worst convergence rate; asynchronous genetic search (using the average operator) provided a significant improvement
- Using the double shot operator provided even faster convergence times
- Using the probabilistic simplex operator provided the fastest convergence times, which improved as more parents were used to calculate the centroid
- Asynchronous search on BOINC did not converge as quickly as on the BlueGene (due to many individuals being calculated concurrently and highly heterogeneous result reporting times), but it is still competitive considering that more computational power is available
Slide 22: Simplex Operator Utility Evaluation

- The usefulness of the simplex operator was tested on the BlueGene and BOINC
- Utility was measured as the percentage of generated individuals that were inserted into the population
- Points were generated along the line through the worst parent and the centroid, at -1.5 to 1.5 times the distance from the worst to the centroid, measured from the centroid (i.e., 1.0 is the worst parent and -1.0 is its reflection through the centroid)
- For BOINC, the number of updates to the population that occurred while an individual was being evaluated was also taken into consideration
- (A small sketch of this tallying follows)
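A minimal sketch of how such an insert-percentage tally might be kept; the bin layout and class name are assumptions for illustration.

    // Tally, per bin of the line parameter t in [-1.5, 1.5], how many generated
    // children there were and how many of them were inserted into the population.
    public class SimplexUtilityTally {
        private static final double MIN_T = -1.5, MAX_T = 1.5;
        private final int[] generated;   // children generated per bin
        private final int[] inserted;    // children that made it into the population
        private final int bins;

        public SimplexUtilityTally(int bins) {
            this.bins = bins;
            generated = new int[bins];
            inserted = new int[bins];
        }

        // Record one generated child: t is its position on the worst-to-centroid
        // line, wasInserted says whether it displaced a population member.
        public void record(double t, boolean wasInserted) {
            int bin = (int) ((t - MIN_T) / (MAX_T - MIN_T) * bins);
            bin = Math.max(0, Math.min(bins - 1, bin));
            generated[bin]++;
            if (wasInserted) inserted[bin]++;
        }

        public double insertPercentage(int bin) {
            return generated[bin] == 0 ? 0.0 : 100.0 * inserted[bin] / generated[bin];
        }
    }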
Slide 23: BlueGene Insert Percentage Evaluation [plot]
Slide 24: Updated in Less Than 100 Evaluations [plot]
Slide 25: Updated Within 101..200 Evaluations [plot]
Slide 26: Updated Within 201..400 Evaluations [plot]
Slide 27: Updated Within 401..800 Evaluations [plot]
Slide 28: Utility Conclusions

- Points generated between -1.5 and 0.5 had the highest insert percentage
- Points generated closer to the reflection (-0.5 .. -1.5) retained their usefulness better than other points when result reporting times were long
- Even with a long time to report, results still had good chances of improving the population
Slide 29: Insert Position Evaluation

- The positions at which individuals were inserted into the population were examined on the BlueGene and BOINC (a small sketch follows)
- The lower the position, the higher the fitness of the individual and the greater the improvement to the population
- For BOINC, the effect of calculation time (measured as the number of individuals received between an individual's generation time and its result report time) was also considered
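A minimal sketch of how the insert position could be recorded, assuming the population's fitness values are kept sorted best-first (as in the slide-11 sketch); names are illustrative.

    import java.util.List;

    // Returns the position a new result would occupy in a best-first sorted
    // population, or -1 if it is worse than every member and would be discarded.
    public class InsertPosition {
        public static int of(List<Double> sortedFitnesses, double newFitness) {
            for (int i = 0; i < sortedFitnesses.size(); i++) {
                if (newFitness > sortedFitnesses.get(i)) return i;   // would displace position i
            }
            return -1;
        }
    }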
Slide 30: BlueGene Insert Position Evaluation [plot]
Slide 31: Inserted in Less Than 100 Evaluations [plot]
Slide 32: Inserted Within 101..200 Evaluations [plot]
Slide 33: Inserted Within 201..400 Evaluations [plot]
Slide 34: Inserted Within 401..800 Evaluations [plot]
Slide 35: Insert Position Conclusions

- Points generated within 0.5 .. -1.5 again proved to be the best
- Points generated near the centroid (0.5 .. -0.5) tended to provide the greatest improvement for fast result reporting times
- As the result reporting time increased, points generated near the reflection (-0.5 .. -1.5) began to outperform those near the centroid
Slide 36: Conclusions

- The test application used is highly expensive, but incomplete
  - The calculation is only done over a single wedge for a single test model
  - Higher accuracy is required
    - Accuracy can be improved by a more detailed integral calculation, which increases computational time polynomially
    - Calculating the convolution for each point increases computation time by 30x or more
- More computational power is very enabling
  - Faster turn-around times mean models and data can be tested more quickly, streamlining the scientific cycle
  - It also allows for more detailed models and richer research
Slide 37: Future Work

- Evaluating the convergence rates of the different search methods on different architectures and evaluation frameworks with multiple applications
- Expanding the available search methods and testing new genetic search operators
- Continued collaboration with various scientific disciplines to examine how different types of scientific computation will scale on, and utilize, these search methods

http://www.nasa.gov
Slide 38: Contact Information

- Webpages:
  - http://wcl.cs.rpi.edu/gmle/
  - http://milkyway.cs.rpi.edu/