Spatial Decomposition and Pairlist Management for Molecular Dynamics presentation

Anton Arkhipov, Peter L. Freddolino, Lingling Miao, Leonardo Trabuco ...

1
Spatial Decomposition and Pairlist Management for
Molecular Dynamics
2
Molecular Dynamics (MD)
The potential function U_total is derived from quantum
mechanical calculations. It is divided into bonded,
short-range nonbonded, and long-range
electrostatic components.
3
The goal: NAMD-GPU

NAMD is a fast, highly parallel MD program
developed by the Schulten group
Short-range nonbonded interactions already
implemented on GPU
Speedups are limited as other portions of the calculation
become the bottleneck

From J. Stone et al., JCC 28(16):2618-2640 (2007)
4
Pairlists in NAMD

Short-range nonbonded interactions must be
calculated for all pairs within the cutoff
O(n^2) if all distances are calculated every step
Instead, maintain a pairlist of all atoms within
a certain distance: O(n r_p^3)
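
As a point of reference for the O(n^2) cost mentioned above, here is a minimal Python sketch of the brute-force approach, in which every pair of atoms is tested. The function name brute_force_pairlist and the sample coordinates are hypothetical and are not taken from the presentation or from NAMD.

```python
import math

def brute_force_pairlist(coords, r_pairlist):
    """Return, for each atom, the indices of all other atoms within r_pairlist.

    Hypothetical reference implementation: every pair is tested once,
    so the number of distance checks grows as O(n^2).
    """
    n = len(coords)
    pairs = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) < r_pairlist:
                pairs[i].append(j)
                pairs[j].append(i)
    return pairs

# Three atoms on a line; only the first two are closer than r_pairlist.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
pairlist = brute_force_pairlist(coords, r_pairlist=2.0)
```

A version this simple is mainly useful as a correctness check for faster, binned implementations.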

5
Tasks for Pairlists

Spatial binning -- divide atoms into a set of
patches (cubes) of side length rpairlist to limit
the total number of comparisons
Pairlist generation -- for each atom, check the
distance to all atoms in neighboring patches and
make a list of the atoms that are within rpairlist
Pairlist update checking -- at every MD time
step, check whether any atom has moved farther
than (rpairlist - rcut) / 2
Pairlist generation functions can also be used
for MD postprocessing

6
Tasks for Pairlists
7
Spatial Binning

Assign each atom to a patch based on its position
Arrange atoms for efficient memory access during
pairlist generation
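
The patch assignment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the system sits in a rectangular box starting at origin, the patches are cubes of side r_pairlist, and patch IDs are flattened with x varying fastest. The name patch_index and the box parameters are hypothetical, not from the presentation.

```python
import math

def patch_index(position, origin, r_pairlist, dims):
    """Map a 3D position to a flat patch ID on a grid of cubes of side r_pairlist.

    dims = (nx, ny, nz) is the number of patches along each axis (assumed).
    Positions exactly on the upper box face are clamped into the last patch.
    """
    ix, iy, iz = (
        min(math.floor((p - o) / r_pairlist), d - 1)
        for p, o, d in zip(position, origin, dims)
    )
    return (iz * dims[1] + iy) * dims[0] + ix

# Hypothetical 20 x 20 x 20 box split into 2 x 2 x 2 patches of side 10.
origin = (0.0, 0.0, 0.0)
dims = (2, 2, 2)
coords = [(3.0, 4.0, 5.0), (12.0, 4.0, 5.0), (3.0, 14.0, 5.0), (3.0, 4.0, 15.0)]
patch_ids = [patch_index(p, origin, 10.0, dims) for p in coords]
```

Flattening the three grid indices into one integer makes it easy to sort atoms by patch afterwards.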

8
Binning Functions
1. Make a new coordinates array sorted by
increasing atomID
atomID: 11 0 7 1
coords: x11 y11 z11 x0 y0 z0 x7 y7 z7 x1 y1 z1
coords2: x0 y0 z0 x1 y1 z1 x2 y2 z2 x3 y3 z3
2. Bin atoms using atomID and coords (independent
of step 1).
atomID: 11 0 7 1
Patch: 5 1 1 8 (atom k is in patch Patch_k)
3. Sort Patch and atomID based on patchID
Patch: 0 0 0 0 1 1 1 1 2 2
atomID: 20 3 2 17 0 7 5 15 19 14
9
Binning Functions (Cont.)
4. Process Patch array to calculate per-patch
memory offsets
5. Process offsets to calculate
number of atoms in each patch
6. Sort coords2 based on patchID
coords2: x0 y0 z0 x1 y1 z1 x2 y2 z2 x3 y3 z3
Patch: 0 0 0 0 1 1 1 1 2 2
atomID: 20 3 2 17 0 7 5 15 19 14
coords: x20 y20 z20 x3 y3 z3 x2 y2 z2
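
Steps 3 to 6 can be illustrated with a serial Python sketch. It assumes that coords2 is indexed by atomID and that patch IDs run from 0 to n_patches - 1. The function name bin_atoms and the small input arrays are hypothetical, and a GPU version would use parallel sort and scan primitives rather than these loops.

```python
def bin_atoms(atom_ids, patch_of_atom, coords2, n_patches):
    """Sort atoms by patch and derive per-patch offsets and counts.

    atom_ids      : atom IDs in arbitrary order
    patch_of_atom : patch ID for each entry of atom_ids
    coords2       : coords2[k] is the (x, y, z) of atom k
    """
    # Step 3: sort Patch and atomID by patch ID (sorted() is stable).
    order = sorted(range(len(atom_ids)), key=lambda i: patch_of_atom[i])
    sorted_patch = [patch_of_atom[i] for i in order]
    sorted_ids = [atom_ids[i] for i in order]

    # Step 4: offsets[p] is the index of the first atom of patch p;
    # offsets[n_patches] is the total atom count, so empty patches work too.
    offsets = [0] * (n_patches + 1)
    i = 0
    for p in range(n_patches + 1):
        while i < len(sorted_patch) and sorted_patch[i] < p:
            i += 1
        offsets[p] = i

    # Step 5: atom counts are differences of consecutive offsets.
    counts = [offsets[p + 1] - offsets[p] for p in range(n_patches)]

    # Step 6: gather coordinates into patch order.
    sorted_coords = [coords2[k] for k in sorted_ids]
    return sorted_ids, offsets, counts, sorted_coords

coords2 = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (2.0, 2.0, 2.0), (3.0, 3.0, 3.0)]
ids, offsets, counts, coords = bin_atoms([3, 0, 2, 1], [2, 0, 2, 0], coords2, 3)
```

After this step, the atoms of patch p occupy the contiguous range offsets[p] to offsets[p + 1] - 1 in the sorted arrays, which is what allows a whole patch to be read in one sweep.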
10
Coalesced Memory Operations
A common and efficient pattern for operating on
data in R^3.
11
Binning Optimization
12
Binning Timing
13
Pairlist Generation

Any atom within the neighboring patches might be
a pair atom
When the distance between two atoms is < rpairlist,
they are considered to be a pair
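
A minimal serial sketch of patch-based pairlist generation follows. It assumes a non-periodic system and patches of side r_pairlist, so that only an atom's own patch and its 26 neighbors need to be searched. The function name cell_pairlist and the dictionary-based patch storage are hypothetical choices for illustration, not the GPU implementation described in the slides.

```python
import math
from collections import defaultdict
from itertools import product

def cell_pairlist(coords, r_pairlist):
    """Build a pairlist by checking only atoms in neighboring patches."""
    def cell_of(p):
        # Integer patch coordinates for cubes of side r_pairlist.
        return tuple(math.floor(c / r_pairlist) for c in p)

    # Bin atom indices by patch.
    patches = defaultdict(list)
    for i, p in enumerate(coords):
        patches[cell_of(p)].append(i)

    pairlist = [[] for _ in coords]
    for i, p in enumerate(coords):
        cx, cy, cz = cell_of(p)
        # Visit the atom's own patch and its 26 neighbors.
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for j in patches.get((cx + dx, cy + dy, cz + dz), ()):
                if j != i and math.dist(p, coords[j]) < r_pairlist:
                    pairlist[i].append(j)
        pairlist[i].sort()
    return pairlist

coords = [(0.5, 0.5, 0.5), (1.5, 0.5, 0.5), (3.5, 0.5, 0.5), (-0.2, 0.5, 0.5)]
pairs = cell_pairlist(coords, r_pairlist=1.2)
```

Because the patch side equals r_pairlist, any atom closer than r_pairlist must lie in one of the 27 patches visited, so no pair is missed.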

14
Pairlist Generation Implementation

The density of our systems is uniform, so each
atom has about 1300 pair atoms
Preset a fixed pairlist length (PL_SIZE) for
all the atoms in the system, which can be changed
by the user
A flag indicates when the given PL_SIZE is
not big enough
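
The fixed-length pairlist with an overflow flag could look like the Python sketch below. PL_SIZE comes from the slide; everything else is an assumption made for illustration, including the function name, the -1 sentinel for unused slots, the brute-force inner loop, and the tiny PL_SIZE value.

```python
import math

PL_SIZE = 4  # deliberately tiny for this demo; the slide cites ~1300 pair atoms

def fixed_size_pairlist(coords, r_pairlist, pl_size=PL_SIZE):
    """Fill a fixed-width pairlist and report whether any row overflowed."""
    n = len(coords)
    pairlist = [[-1] * pl_size for _ in range(n)]  # -1 marks an unused slot
    counts = [0] * n
    overflow = False  # set when pl_size is too small for some atom
    for i in range(n):
        for j in range(n):
            if i == j or math.dist(coords[i], coords[j]) >= r_pairlist:
                continue
            if counts[i] < pl_size:
                pairlist[i][counts[i]] = j
                counts[i] += 1
            else:
                overflow = True  # pair dropped; caller should raise pl_size
    return pairlist, counts, overflow

coords = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.2, 0.0, 0.0), (0.3, 0.0, 0.0)]
pl_small, counts_small, overflow_small = fixed_size_pairlist(coords, 1.0, pl_size=2)
pl_ok, counts_ok, overflow_ok = fixed_size_pairlist(coords, 1.0, pl_size=3)
```

A fixed row width keeps every atom's list at a predictable memory offset, at the cost of wasted space and the need to rerun with a larger PL_SIZE when the flag is set.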

15
Pairlist Generation Implementation

Use registers
Tile data into shared memory
Use coalesced reads from global memory
Some parameters of the system are also stored in
constant memory

16
Pairlist Generation Optimization
17
Pairlist Generation Timing
18
Pairlist Update Checking

For all atoms, check whether any has moved more than
(rpairlist - rcut)/2 since the last update

If any atom has, run a complete binning and
pairlist cycle

Occurs once per 10 steps in MD
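
The update criterion can be sketched as a short Python check. The reasoning behind the threshold is that if no atom has moved more than (rpairlist - rcut)/2, the distance between any two atoms has changed by less than rpairlist - rcut, so a pair that was beyond rpairlist at the last update cannot have come within rcut. The function name and the numeric values below are hypothetical.

```python
import math

def needs_pairlist_update(coords, coords_at_last_update, r_pairlist, r_cut):
    """Return True if any atom has moved more than (r_pairlist - r_cut) / 2."""
    max_move = (r_pairlist - r_cut) / 2.0
    return any(
        math.dist(now, then) > max_move
        for now, then in zip(coords, coords_at_last_update)
    )

# Hypothetical values: r_pairlist = 13.5, r_cut = 12.0, so the threshold is 0.75.
reference = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
small_move = [(0.5, 0.0, 0.0), (5.0, 0.0, 0.0)]
large_move = [(1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
keep_list = needs_pairlist_update(small_move, reference, 13.5, 12.0)
rebuild = needs_pairlist_update(large_move, reference, 13.5, 12.0)
```

When the check returns True, the reference coordinates would be replaced by the current ones after the binning and pairlist cycle has been rerun.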

19
Plupdate Check Implementation
Coalesced memory reads
20
Plupdate Check Implementation, Multiple Loads
Global -> Registers
Repeat ELEM_PER_BLOCK times
Repeating the loads helps cover global memory
latency
Registers -> Shared
Global -> Registers
Computation
Execution
21
Plupdate Check Optimization
22
Plupdate Check Timing
23
Conclusions

Pairlist generation and pairlist update check
both yielded good speedups (5-10x)
Binning is much faster on the CPU due to cache
efficiency; this requires further study
Functions generated here are fast enough to be
usefully incorporated into NAMD-GPU
Other molecular modeling applications (e.g., VMD)
can also make use of them