Slide 1: Distributed and Generic Maximum Likelihood Evaluation

Berkeley, 2008

Carlos Varela, Travis Desell, Boleslaw Szymanski, Malik Magdon-Ismail
Department of Computer Science

Nathan Cole, Heidi Newberg
Department of Physics, Applied Physics and Astronomy

- Rensselaer Polytechnic Institute
- http://wcl.cs.rpi.edu/gmle
- http://milkyway.cs.rpi.edu
- March, 2008
Slide 2: Overview

- Introduction
  - Motivation
  - Research Questions and Challenges
  - Enabled Scientific Applications
- GMLE (Generic Maximum Likelihood Evaluator)
  - Approach and Goals
  - Architecture
  - Asynchronous Search Methods
- Performance Evaluation
  - Test Environments
  - Grid and BlueGene Performance
  - Asynchronous Search Performance
- Conclusions & Future Work
Slide 3: Motivation

- "From a theoretical point of view, the most important general method of estimation known so far is the method of maximum likelihood." (H. Cramér, Mathematical Methods of Statistics)
- Distribution is essential for scientific computing
  - Scientific models are becoming increasingly complex
  - Rates of data acquisition are far exceeding increases in computing power
- No Free Lunch in machine learning
  - No single parameter optimization method is the best
  - Different search methods work best for different problems
Slide 4: Research Questions and Challenges

- Distributed Computing
  - Enable the easy use of distributed environments: computational grids, the Internet, supercomputers
  - Maximize the scalability of learning methods and scientific model evaluation
  - Optimize performance: reduce communication times, load-balance distributed computations, handle distributed failures
- Machine Learning
  - Examine the scalability of different search methods: how many model evaluations can be done concurrently?
  - Examine which search methods work best on which computing environments: how can searches be modified for better use on large-scale computing environments? Can the search be done asynchronously?
  - Enable generic use of search methods by different scientific applications
Slide 5: Applications - Astronomy

What is the structure and origin of the Milky Way galaxy?

- All the stars in the sky are being measured by the Sloan Digital Sky Survey (figure at right shows current progress). Over 10 terabytes of data have already been collected.
- Other galaxies are easy to examine because we can look at them from the outside; being inside the Milky Way makes determining its structure and how it formed difficult.
- Evaluating a single model of the Milky Way with a single set of parameters can take over a year on a typical high-end computer.
- Models determine where different star streams are in the Milky Way, which helps us better understand its structure and how it was formed.
Slide 6: Applications - Particle Physics

How can particles that are predicted by theory but not yet observed be found?

- How are missing baryons found?
- A scientific model with 10 to 100 fit parameters is used to calculate the occurrence of missing baryons based on observed data.
- Current data sets involve 10^5 events.
- Future data sets will involve 10^7 events with the next generation of particle detectors.
- Calculating a single set of fit parameters on a single data set takes months to a year on a single high-end computer.
- Finding missing baryons will help verify current models in quantum theory.
Slide 7: GMLE - A Distributed and Generic Maximum Likelihood Evaluator
Slide 8: Approach & Goals

- Separation of Concerns
  - Scientific models, distributed evaluation frameworks, and search methods must be able to be developed independently
  - Simple interfaces are required for interaction between these components
- Goals
  - Plug-and-play scientific models, search methods, and distributed execution environments
  - Determine which applications and search methods work best on which execution environments
  - Develop new search methods that take advantage of large-scale computing environments
  - Enable more effective and efficient research into difficult scientific problems and more complex models
Slide 9: GMLE Architecture (Synchronous) [diagram]

- Scientific Models: Data Initialization, Integral Function, Integral Composition, Likelihood Function, Likelihood Composition
- Search Routines: Gradient Descent, Genetic Search, Simplex; given initial parameters, they produce optimized parameters
- Distributed Evaluation Framework: creates the evaluators, accepts evaluation requests from the search routines, distributes parameters to the evaluators, combines their results, and returns them to the search
- Framework implementations: BOINC (Internet), SALSA/Java (RPI Grid), MPI/C (BlueGene)
- (The boundaries between these components are sketched as interfaces below)
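To make the plug-and-play goal concrete, the boundary between the three layers could be expressed with interfaces roughly like the Java sketch below (Java being one of the two implementation languages named on slide 14). The interface and method names are illustrative assumptions, not the actual GMLE API.

    // Hedged sketch of the component boundaries in the GMLE architecture.
    // All names and signatures are assumptions for illustration only.
    interface ScientificModel {
        void initializeData(String dataFile);                     // Data Initialization
        double integral(double[] parameters, int slice);          // Integral Function (one slice of the work)
        double composeIntegrals(double[] partialIntegrals);       // Integral Composition
        double likelihood(double[] parameters, int slice);        // Likelihood Function (one slice of the data)
        double composeLikelihoods(double[] partialLikelihoods);   // Likelihood Composition
    }

    interface SearchRoutine {
        // Gradient descent, genetic search, and simplex would all implement this.
        double[] search(double[] initialParameters, EvaluationFramework framework);
    }

    interface EvaluationFramework {
        // Implemented over BOINC (Internet), SALSA/Java (RPI Grid), or MPI/C (BlueGene):
        // distributes the parameters to the evaluators and combines their results.
        double evaluate(double[] parameters);
    }

With interfaces like these, a search routine only ever calls evaluate() and never needs to know whether the evaluation ran on the Internet, the RPI Grid, or the BlueGene.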
Slide 10: GMLE Architecture (Asynchronous) [diagram]

- Scientific Models: Data Initialization, Integral Function, Integral Composition, Likelihood Function, Likelihood Composition
- Search Routines: Gradient Descent, Genetic Search, Simplex; given initial parameters, they produce optimized parameters
- Distributed Evaluation Framework: creates evaluators 1..N; each evaluator independently sends work requests to the search, receives work, and reports its results back
- Framework implementations: BOINC (Internet), SALSA/Java (RPI Grid), MPI/C (BlueGene)
- (The evaluator side of this pull-style protocol is sketched below)
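The evaluator's role in this protocol might look like the following hedged sketch: each evaluator loops, requesting work and reporting results independently of the others. SearchServer, requestWork, and reportResult are assumed names, and the expensive scientific-model evaluation is replaced by a toy surrogate.

    // Hedged sketch of one evaluator in the asynchronous architecture.
    // Names are illustrative assumptions, not the actual GMLE API.
    class AsynchronousEvaluator implements Runnable {
        interface SearchServer {
            double[] requestWork();                                    // next parameter set, or null when the search is done
            void reportResult(double[] parameters, double likelihood);
        }

        private final SearchServer server;

        AsynchronousEvaluator(SearchServer server) { this.server = server; }

        @Override
        public void run() {
            while (true) {
                double[] parameters = server.requestWork();
                if (parameters == null) break;                         // search has finished
                double likelihood = evaluateModel(parameters);         // the expensive model evaluation
                server.reportResult(parameters, likelihood);           // results may arrive in any order
            }
        }

        // Stand-in for the scientific model's likelihood calculation.
        private double evaluateModel(double[] parameters) {
            double sum = 0;
            for (double p : parameters) sum -= p * p;                  // toy surrogate objective
            return sum;
        }
    }

Because evaluators never wait for one another, slow or failed workers only delay their own results rather than an entire generation, which is what makes this style attractive on heterogeneous environments such as BOINC.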
Slide 11: Asynchronous Search Methods

- Asynchronous Genetic Search
  - Traditional genetic search works in iterative generations: N individuals are used to generate the next N individuals by selection, crossover, and mutation
  - Asynchronous genetic search instead continuously updates a population (see the sketch below):
    - N individuals are generated randomly for the initial population
    - When an evaluator requests more work, individuals from the population are selected randomly to generate either a crossover or a mutation
    - The population keeps the most fit individuals, discarding the less fit as results arrive
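A minimal Java sketch of this update loop follows. The class and method names, and the choice to treat a higher fitness value as better, are assumptions for illustration rather than the actual GMLE/SALSA code; the operators themselves are left abstract here and sketched after slide 12.

    import java.util.*;

    // Minimal sketch of the asynchronous population update described above.
    public abstract class AsynchronousSearch {
        public static final class Individual {
            public final double[] parameters;
            public final double fitness;       // assumed: the evaluated likelihood, higher is better
            public Individual(double[] p, double f) { parameters = p; fitness = f; }
        }

        protected final List<Individual> population = new ArrayList<>();
        protected final int populationSize;
        protected final Random random = new Random();

        protected AsynchronousSearch(int populationSize) { this.populationSize = populationSize; }

        // Search-specific operators (e.g. those on slide 12).
        protected abstract double[] randomParameters();
        protected abstract double[] mutate(double[] parent);
        protected abstract double[] crossover(Individual a, Individual b);

        // Called whenever an evaluator requests more work.
        public synchronized double[] generateWork() {
            if (population.size() < populationSize) return randomParameters();   // still filling the initial population
            Individual a = population.get(random.nextInt(population.size()));
            Individual b = population.get(random.nextInt(population.size()));
            return random.nextBoolean() ? mutate(a.parameters) : crossover(a, b);
        }

        // Called whenever an evaluator reports a result: keep the most fit
        // individuals and discard the least fit.
        public synchronized void reportResult(double[] parameters, double fitness) {
            population.add(new Individual(parameters, fitness));
            population.sort((x, y) -> Double.compare(y.fitness, x.fitness));     // best first
            while (population.size() > populationSize)
                population.remove(population.size() - 1);
        }
    }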
Slide 12: Asynchronous Genetic Search Operators

- Average
  - Traditional operator for continuous problems
  - Generated parameters are the average of two randomly selected parents
- Double Shot
  - Two parents generate three children:
    - The average of the parents
    - A point outside the less fit parent, at the same distance from that parent as the average
    - A point outside the more fit parent, at the same distance from that parent as the average
- Probabilistic Simplex
  - N parents generate one child
  - Points are generated randomly along the line through the worst parent and the centroid (average) of the remaining parents
  - (All three operators are sketched in code below)
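Continuing the sketch from slide 11, the three operators might be written as follows. The element-wise arithmetic and the -1.5..1.5 simplex range (taken from slide 22) are assumptions for illustration, not the published implementation.

    import java.util.Random;

    // Hedged sketches of the average, double shot, and probabilistic simplex operators.
    public class Operators {

        // Average: the child is the element-wise average of two parents.
        static double[] average(double[] p1, double[] p2) {
            double[] child = new double[p1.length];
            for (int i = 0; i < child.length; i++) child[i] = (p1[i] + p2[i]) / 2.0;
            return child;
        }

        // Double shot: two parents generate three children -- the average, a point
        // outside the less fit parent, and a point outside the more fit parent,
        // each at the same distance from its parent as the average.
        static double[][] doubleShot(double[] lessFit, double[] moreFit) {
            int n = lessFit.length;
            double[] avg = average(lessFit, moreFit);
            double[] outsideLess = new double[n], outsideMore = new double[n];
            for (int i = 0; i < n; i++) {
                outsideLess[i] = lessFit[i] - (avg[i] - lessFit[i]);   // reflect the average through the less fit parent
                outsideMore[i] = moreFit[i] + (moreFit[i] - avg[i]);   // reflect the average through the more fit parent
            }
            return new double[][] { avg, outsideLess, outsideMore };
        }

        // Probabilistic simplex: N parents generate one child on the line through
        // the worst parent and the centroid of the remaining parents. With this
        // assumed parameterization, t = 1 is the worst parent and t = -1 its
        // reflection through the centroid.
        static double[] probabilisticSimplex(double[] worst, double[][] others, Random random) {
            int n = worst.length;
            double[] centroid = new double[n];
            for (double[] parent : others)
                for (int i = 0; i < n; i++) centroid[i] += parent[i] / others.length;
            double t = -1.5 + 3.0 * random.nextDouble();               // uniform in [-1.5, 1.5]
            double[] child = new double[n];
            for (int i = 0; i < n; i++) child[i] = centroid[i] + t * (worst[i] - centroid[i]);
            return child;
        }
    }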
Slide 14: Test Environments

- GMLE is implemented in SALSA/Java and MPI/C
- Used 3 heterogeneous clusters on the RPI Grid:
  - 4 quad-processor PowerPCs (16 processors)
  - 4 quad-processor dual-core Opterons (32 processors)
  - 10 quad-processor Opterons (40 processors)
- Used two BlueGene/L partitions:
  - 128 nodes (128 processors, 256 in virtual mode)
  - 512 nodes (512 processors, 1024 in virtual mode)

Grid Testbed [diagram]:
- PPC: 4x 1.7GHz quad-processor single-core PowerPC
- OPT 4x1: 10x 2.2GHz quad-processor single-core Opteron
- OPT 4x2: 4x 2.2GHz quad-processor dual-core Opteron
- The clusters are connected by LAN and WAN links
Slide 15: Computation Time, Grid & BlueGene/L [chart]

- A single evaluation takes about 2 minutes, and an MLE run requires roughly 10,000 evaluations: about a 15-day runtime on one machine
- 100x speedup: about a 1.5-day runtime
- 230x speedup: under a 1-day runtime
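As a rough check of the serial figure: 10,000 evaluations x 2 minutes = 20,000 minutes, a little under 14 days, consistent with the quoted 15-day single-machine runtime.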
Slide 16: Asynchronous Search Performance

- The performance of iterative and asynchronous genetic search was tested on the BlueGene, and of asynchronous genetic search on BOINC, using the astronomy application
- The average operator was used for iterative GS and asynchronous GS on the BlueGene
- Double shot and probabilistic simplex (N = 2..5) were used on the BlueGene and BOINC
- Note: IGS and AGS (average) on the BlueGene used an older version of the application whose optimum was 3.025, while the double shot and simplex runs had an optimum of 2.987
Slide 17: Iterative Genetic Search (Average) [plot]
Slide 18: Asynchronous Genetic Search (Average) [plot]
Slide 19: Double Shot and Simplex on BlueGene [plot]
Slide 20: Double Shot and Simplex on BOINC [plot]
Slide 21: Performance Conclusions

- Iterative genetic search had the worst convergence rate; asynchronous genetic search (using the average operator) provided a significant improvement
- Using the double shot operator provided even faster convergence times
- Using the probabilistic simplex operator provided the fastest convergence times, which improved as more parents were used to calculate the centroid
- Asynchronous search on BOINC did not converge as quickly as on the BlueGene (due to many individuals being calculated concurrently and highly heterogeneous result reporting times), but it is still competitive considering that more computational power is available
Slide 22: Simplex Operator Utility Evaluation

- The usefulness of the simplex operator was tested on the BlueGene and BOINC
- Utility was measured as the percentage of generated individuals that were inserted into the population
- Points were generated along the line through the worst parent and the centroid, at -1.5 to 1.5 times the distance from the worst to the centroid, measured from the centroid (i.e., 1.0 is the worst parent and -1.0 is its reflection through the centroid)
- For BOINC, the number of updates to the population that occurred while an individual was being evaluated was also taken into consideration
- (A small sketch of this tallying follows)
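A minimal sketch of how such an insert-percentage tally might be kept; the bin layout and class name are assumptions for illustration.

    // Tally, per bin of the line parameter t in [-1.5, 1.5], how many generated
    // children there were and how many of them were inserted into the population.
    public class SimplexUtilityTally {
        private static final double MIN_T = -1.5, MAX_T = 1.5;
        private final int[] generated;   // children generated per bin
        private final int[] inserted;    // children that made it into the population
        private final int bins;

        public SimplexUtilityTally(int bins) {
            this.bins = bins;
            generated = new int[bins];
            inserted = new int[bins];
        }

        // Record one generated child: t is its position on the worst-to-centroid
        // line, wasInserted says whether it displaced a population member.
        public void record(double t, boolean wasInserted) {
            int bin = (int) ((t - MIN_T) / (MAX_T - MIN_T) * bins);
            bin = Math.max(0, Math.min(bins - 1, bin));
            generated[bin]++;
            if (wasInserted) inserted[bin]++;
        }

        public double insertPercentage(int bin) {
            return generated[bin] == 0 ? 0.0 : 100.0 * inserted[bin] / generated[bin];
        }
    }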
Slide 23: BlueGene Insert Percentage Evaluation [plot]
Slide 24: Updated in Less Than 100 Evaluations [plot]
Slide 25: Updated Within 101..200 Evaluations [plot]
Slide 26: Updated Within 201..400 Evaluations [plot]
Slide 27: Updated Within 401..800 Evaluations [plot]
Slide 28: Utility Conclusions

- Points generated between -1.5 and 0.5 had the highest insert percentage
- Points generated closer to the reflection (-0.5 .. -1.5) retained their usefulness better than other points when result reporting times were long
- Even with a long time to report, results still had good chances of improving the population
Slide 29: Insert Position Evaluation

- The positions at which individuals were inserted into the population were examined on the BlueGene and BOINC (a small sketch follows)
- The lower the position, the higher the fitness of the individual and the greater the improvement to the population
- For BOINC, the effect of calculation time (measured as the number of individuals received between an individual's generation time and its result report time) was also considered
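A minimal sketch of how the insert position could be recorded, assuming the population's fitness values are kept sorted best-first (as in the slide-11 sketch); names are illustrative.

    import java.util.List;

    // Returns the position a new result would occupy in a best-first sorted
    // population, or -1 if it is worse than every member and would be discarded.
    public class InsertPosition {
        public static int of(List<Double> sortedFitnesses, double newFitness) {
            for (int i = 0; i < sortedFitnesses.size(); i++) {
                if (newFitness > sortedFitnesses.get(i)) return i;   // would displace position i
            }
            return -1;
        }
    }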
Slide 30: BlueGene Insert Position Evaluation [plot]
Slide 31: Inserted in Less Than 100 Evaluations [plot]
Slide 32: Inserted Within 101..200 Evaluations [plot]
Slide 33: Inserted Within 201..400 Evaluations [plot]
Slide 34: Inserted Within 401..800 Evaluations [plot]
Slide 35: Insert Position Conclusions

- Points generated within 0.5 .. -1.5 again proved to be the best
- Points generated near the centroid (0.5 .. -0.5) tended to provide the greatest improvement for fast result reporting times
- As the result reporting time increased, points generated near the reflection (-0.5 .. -1.5) began to outperform those near the centroid
Slide 36: Conclusions

- The test application used is highly expensive, but incomplete
  - The calculation is only done over a single wedge for a single test model
  - Higher accuracy is required
    - Accuracy can be improved by a more detailed integral calculation, which increases computational time polynomially
    - Calculating the convolution for each point increases computation time by 30x or more
- More computational power is very enabling
  - Faster turn-around times mean models and data can be tested more quickly, streamlining the scientific cycle
  - It also allows for more detailed models and richer research
Slide 37: Future Work

- Evaluating the convergence rates of the different search methods on different architectures and evaluation frameworks with multiple applications
- Expanding the available search methods and testing new genetic search operators
- Continued collaboration with various scientific disciplines to examine how different types of scientific computation will scale on, and utilize, these search methods

http://www.nasa.gov
Slide 38: Contact Information

- Webpages:
  - http://wcl.cs.rpi.edu/gmle/
  - http://milkyway.cs.rpi.edu/