Latency Tolerance Through Parallelization of Time in Scientific Applications

Slides: 21

Provided by: asri9

Learn more at: http://www.cs.fsu.edu

1
Latency Tolerance Through Parallelization of Time
in Scientific Applications

Ashok Srinivasan
Computer Science
Florida State University

Namas Chandra
Mechanical Engineering
Florida State University

Aim: Long time scales on small physical systems
Solution features: Time parallelization to avoid fine granularity
www.cs.fsu.edu/asriniva
2
Outline

Application
Time parallelization
Prediction of a Carbon nanotube state
Experimental results
Conclusions and future work

3
Applications

Small physical systems for long time scales
Class of applications considered
State(T_i) = F(State(T_{i-1}))
Inherently sequential
Example
Molecular dynamics simulations of Carbon
nanotubes
Time step size: 10^-15 second
After a million steps, we are still only in the
nanosecond range
Even that requires about a day of sequential
computing time for around 3000 atoms
Spatial parallelization will lead to too fine a
granularity

4
CNT application

Pull the CNT at a constant velocity
Performed to determine material property
Material response can be used by an FEM
simulation in a multiscale model

5
Time parallelization

Based on a predict-verify approach
Use results of old simulations to speed up the
current simulation
Relationship between different problem parameters
often occurs in engineering
Example Temperature and time, stress and time
Find a relationship and use it to predict the
state at different times
The relationship is determined automatically, and
updated dynamically
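
The predict-verify idea above can be illustrated with a toy sketch. This is not the authors' implementation: the update rule in `advance`, the scalar state, the `predictor` callback, and the tolerance `tol` are all hypothetical stand-ins. Each "processor" simulates its interval from a predicted start state, and an interval is accepted only if the computed end of the previous interval agrees with that prediction.

```python
def advance(state, steps):
    """Toy sequential 'simulation': apply one update F per time step."""
    for _ in range(steps):
        state = 0.5 * state + 1.0  # hypothetical F, with fixed point 2.0
    return state


def predict_verify_round(start, predictor, num_procs, steps, tol):
    """Run one predict-verify round.

    Returns (verified_state, intervals_accepted).
    """
    # Start state for each processor's interval; only the first one is exact.
    starts = [start] + [predictor(start, i) for i in range(1, num_procs)]
    # Conceptually parallel: each processor simulates its own interval.
    ends = [advance(s, steps) for s in starts]
    accepted = 1  # the first interval started from a verified state
    for i in range(1, num_procs):
        # Verify: the computed end of interval i-1 must match the
        # predicted start of interval i, otherwise stop accepting.
        if abs(ends[i - 1] - starts[i]) > tol:
            break
        accepted += 1
    return ends[accepted - 1], accepted
```

With a perfect predictor all intervals are accepted in one round. With a poor predictor only the first interval survives, which is the sequential case.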

6
Guided simulations

Notation
r = Exact time / Parallel overhead
P = # of processors
α = Progress rate
Speedup = P α / (1 + 1/r)
≈ P α, if prediction and communication overheads are
relatively small
P Time steps
α ∈ [1/P, 1]
Requires all-reduce and broadcast
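
As a small worked sketch, the speedup can be read as P α / (1 + 1/r), with the progress rate α between 1/P and 1. The function name and the range check below are illustrative assumptions, not part of the original slides.

```python
def speedup(num_procs, progress_rate, r):
    """Speedup = P * alpha / (1 + 1/r).

    num_procs     -- P, the number of processors
    progress_rate -- alpha, assumed to lie in [1/P, 1]
    r             -- exact computation time / parallel overhead
    """
    if not (1.0 / num_procs <= progress_rate <= 1.0):
        raise ValueError("progress rate is assumed to lie in [1/P, 1]")
    return num_procs * progress_rate / (1.0 + 1.0 / r)
```

When r is large the denominator approaches 1 and the speedup approaches P α. At the lower bound α = 1/P the speedup is at most 1, so nothing is gained over the sequential run.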

7
Fault tolerance too

In case of node failure, another processor fills
in the missing time interval
Other computations need not be discarded
Efficiency close to 1
For large P
Excluding loss in efficiency from errors
If communication cost is negligible
A master-worker design may be useful sometimes

[Figure: master-worker diagram with a Master, time intervals t1 to t4, and processors P1 to P4]
8
Fault tolerance

[Figure: master-worker diagram with a Master, time intervals t2, t5, and t6, and processors P1 to P4]
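
A minimal sketch of how a master could let another processor fill in the time interval of a failed node. The round-robin policy, the function name `run_master`, and the `failed` set are assumptions made for illustration. The slides do not specify a scheduling scheme.

```python
from collections import deque


def run_master(intervals, workers, failed):
    """Assign time intervals to workers; re-queue intervals of failed nodes.

    Returns a dict mapping each interval to the worker that computed it.
    """
    pending = deque(intervals)
    pool = list(workers)
    completed = {}
    while pending:
        if not pool:
            raise RuntimeError("no workers left")
        for worker in list(pool):  # iterate over a copy; pool may shrink
            if not pending:
                break
            interval = pending.popleft()
            if worker in failed:
                pool.remove(worker)       # drop the failed node
                pending.append(interval)  # another processor fills it in
            else:
                completed[interval] = worker
    return completed
```

Results already computed by healthy workers stay in `completed`, so a failure costs only the one interval that has to be recomputed.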
9
Requirements for this approach

Method for predicting a state
Criterion for determining whether two states
(predicted and actual) are similar
Choice of suitable base (old) simulation

10
Prediction of a Carbon nanotube state

Definition of equivalence of two states
Atoms vibrate around their mean position
Consider states equivalent if difference in
position, potential energy, and temperature are
within the normal range of fluctuations

Max displacement 0.211
Mean displacement 0.0789
Potential energy fluctuation 0.35
Temperature fluctuation 12.5 K

[Figure: displacement of an atom from its mean position]
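
The equivalence criterion can be sketched as a simple threshold test. The default values are the fluctuation figures listed on this slide. Their units (other than K for temperature) and the use of absolute differences for energy and temperature are assumptions here, as are the class and function names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    max_disp: float = 0.211     # max displacement (unit not given on the slide)
    mean_disp: float = 0.0789   # mean displacement (unit not given on the slide)
    energy: float = 0.35        # potential energy fluctuation (unit not given)
    temperature: float = 12.5   # temperature fluctuation, in K


def equivalent(pos_a, pos_b, energy_a, energy_b, temp_a, temp_b,
               th=Thresholds()):
    """Return True if two states differ by no more than normal fluctuations."""
    # Per-atom displacement between the two states.
    disps = [math.dist(p, q) for p, q in zip(pos_a, pos_b)]
    return (max(disps) <= th.max_disp
            and sum(disps) / len(disps) <= th.mean_disp
            and abs(energy_a - energy_b) <= th.energy
            and abs(temp_a - temp_b) <= th.temperature)
```

In a predict-verify setting, a check of this kind would compare the predicted state with the state computed by the preceding interval.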
11
Prediction

Predictor
Independently predict change in each coordinate
Normalize coordinates to be in [0, 1]
x_{t+Δt} = x_t + x'_{t+Δt} Δt
x'_{t+Δt} is the rate of change of x in this time
interval
It is unknown and needs to be estimated

12
Predict change in coordinates

Express x' (the rate of change of x) in terms of basis functions
Example
x'_{t+Δt} = a_{0,t+Δt} + a_{1,t+Δt} x_t
a_{0,t+Δt}, a_{1,t+Δt} are unknown
Express changes, y, for the base (old) simulation
similarly, in terms of coefficients b, and perform
a least squares fit
Predict a_{i,t+Δt} as b_{i,t+Δt} + R_{t+Δt}
R_{t+Δt} = (1-β) R_t + β (a_{i,t} - b_{i,t})
Intuitively, the difference between the base
coefficient and the current coefficient is
predicted as a weighted combination of previous
weights
We use β = 0.5
Gives more weight to latest results
Does not let random fluctuations affect the
predictor too much
Velocity estimated as latest accurate results
known
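
The predictor steps can be sketched as follows, assuming the linear basis from the example (a constant term plus a term proportional to x_t) and an exponentially weighted correction R with weight β = 0.5. All function names are hypothetical, and the correction is applied per coefficient as one possible reading of the slide.

```python
def fit_line(xs, ys):
    """Least-squares fit ys ~ c0 + c1 * xs; returns (c0, c1)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    c1 = sxy / sxx
    return my - c1 * mx, c1


def update_correction(r_prev, a_known, b_known, beta=0.5):
    """R_new = (1 - beta) * R_prev + beta * (a - b), per coefficient."""
    return tuple((1 - beta) * r + beta * (a - b)
                 for r, a, b in zip(r_prev, a_known, b_known))


def predict_coeffs(b_next, r_next):
    """Predicted current coefficients: base coefficients plus correction."""
    return tuple(b + r for b, r in zip(b_next, r_next))


def predict_positions(xs, coeffs, dt):
    """x_new = x + (a0 + a1 * x) * dt for each normalized coordinate x."""
    a0, a1 = coeffs
    return [x + (a0 + a1 * x) * dt for x in xs]
```

Here `fit_line` would be applied to the base simulation's changes to obtain the b coefficients, and to the latest verified interval of the current simulation to obtain the known a coefficients used in `update_correction`.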

13
Experimental results

Experimental parameters
Carbon nanotube with 1000 atoms
Around 200 atoms in the beginning fixed
Around 200 atoms at the end moved
deterministically
Time step size 0.5 femtoseconds
Time interval per processor 1000 time steps
Tersoff-Brenner potential for MD
300 K temperature current 10 K base
β = 0.5
Base simulation v = 0.05 Å/1000 time steps
Actual simulation v = 0.0625 Å/1000 time steps
A parallel run was simulated

14
Errors on 50 processors
[Figure: difference between predictor and verifier, shown with the threshold for accepting the results]
15
Errors on 50 processors
[Figure: difference between predictor and verifier, shown with the threshold for accepting the results]
16
Errors on 50 processors
[Figure: energy error; difference between predictor and verifier, shown with the threshold for accepting the results]
17
Errors on 50 processors
[Figure: temperature error; difference between predictor and verifier, shown with the threshold for accepting the results]
18
Speedup
[Figure: speedup expected based on progress rate, and speedup observed in simulations]

Computation time for one time interval: 10 s
Prediction time: 10^-3 s
Broadcast on 100 processors of IBM SP3: 0.005 s
Allreduce on 100 processors of IBM SP3: 0.0005 s

The overhead/computation ratio is negligible, and
speedup is determined only by errors
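
To see why the overhead is negligible, the timings above can be plugged in directly. This sketch assumes that each iteration pays for one prediction, one broadcast, and one all-reduce. That accounting is an assumption, since the slides give only the individual timings.

```python
def overhead_factor(compute_time, *overheads):
    """Fraction of ideal speedup left after overheads: 1 / (1 + 1/r),
    where r = compute_time / total overhead per iteration."""
    r = compute_time / sum(overheads)
    return 1.0 / (1.0 + 1.0 / r)


# Timings from the slide, in seconds: computation, then prediction,
# broadcast, and all-reduce on 100 processors.
factor = overhead_factor(10.0, 1e-3, 0.005, 0.0005)
```

Under this assumption the total overhead is 0.0065 s against 10 s of computation, so `factor` stays above 0.999 and the speedup is governed almost entirely by the progress rate.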
19
Limitations of the experiments

They are simulations of a parallel implementation
But large difference between computation and
communication time suggests efficient
implementation

20
Conclusions and future work

Conclusions
Promises significant improvement in speedup and
efficiency for long-time simulations, through
latency tolerance and fault tolerance
Future work
Implementation on a parallel machine
Base simulations with a smaller time scale
Better predictors
Basis functions corresponding to physical
phenomena likely to be experienced
Use clustering techniques to determine phenomena
experienced by different regions
Automatically and dynamically determine a
suitable base to use from a large set of
existing results