Title: Flexibility and Interoperability in a Parallel MD code
1. Flexibility and Interoperability in a Parallel MD code
- Robert Brunner,
- Laxmikant Kale,
- Jim Phillips
- University of Illinois at Urbana-Champaign
2. Contributors
- Principal investigators
- Laxmikant Kale, Klaus Schulten, Robert Skeel
- Development team
- Milind Bhandarkar, Robert Brunner, Attila Gursoy,
Neal Krawetz, Jim Phillips, Ari Shinozaki, ...
3. Middle layers
Applications
Middle layers: languages, tools, libraries
Parallel machines
5. What is needed
- Not application-centered CS research
- Not isolated CS research
- Application-oriented, yet computer-science-centered, research that will enhance the enabling layers in the middle
6. Challenges in Parallel Applications
- Scalable High Performance
- To a small and large number of processors
- Small and large molecular systems
- Modifiable and extensible design
- Ability to incorporate new algorithms
- Reusing new libraries without re-implementation
- Experimenting with alternate strategies
- How to achieve both simultaneously?
7. Suggested OO Approach
- Dynamic irregular applications
- Use multi-domain decomposition
- (multiple tasks assigned to each processor)
- Data driven scheduling
- Migratable objects
- Use registration and callbacks to avoid hardwiring of object connections on a processor
- Measurement-based migration/load balancing
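The registration-and-callback idea above can be sketched as follows. This is an illustrative sketch, not Charm++'s actual API; the names (`CallbackRegistry`, `registerCallback`, `deliver`) are invented for the example.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>

// Illustrative sketch (not Charm++'s actual API; names are invented).
// Objects find each other through a registry of callbacks keyed by name
// instead of holding raw pointers, so a migrated object simply re-registers
// on its new processor and its peers are unaffected.
class CallbackRegistry {
    std::unordered_map<std::string, std::function<void(double)>> table;
public:
    void registerCallback(const std::string& name, std::function<void(double)> cb) {
        table[name] = std::move(cb);   // migration = re-register under the same name
    }
    bool deliver(const std::string& name, double payload) {
        auto it = table.find(name);
        if (it == table.end()) return false;  // target not registered on this PE
        it->second(payload);
        return true;
    }
};
```

Because peers hold only the name, migrating an object does not invalidate any connection; delivery simply resumes once the object re-registers.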
8. Suggested Approach (contd.)
- Use the most appropriate parallel programming paradigm for each module
- Reuse existing libraries, irrespective of the language/paradigm in which they are implemented
- Need support for multi-paradigm interoperability
- (Supported by Converse)
- Careful class design and use of C++ features
9. Molecular Dynamics
- Collection of charged atoms, with bonds
- Newtonian mechanics
- At each time-step
- Calculate forces on each atom
- bonds
- non-bonded electrostatic and van der Waals
- Calculate velocities and advance positions
- 1 femtosecond time-step, millions needed!
- Thousands of atoms (1,000 - 100,000)
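The per-time-step loop described above might look like the following minimal sketch (not NAMD source code): a 1-D velocity-Verlet step, with a hypothetical harmonic force standing in for the full bonded plus non-bonded sum.

```cpp
#include <cassert>

// Illustrative sketch of one MD time-step (not NAMD source code).
// A 1-D atom with a hypothetical harmonic force F = -k*x stands in for
// the full bonded + non-bonded force sum; dt would be ~1 fs in practice.
struct Atom { double x, v, m; };

double force(const Atom& a, double k) { return -k * a.x; }

// One velocity-Verlet step: advance position, recompute force, advance velocity.
void step(Atom& a, double k, double dt) {
    double f = force(a, k);
    a.x += a.v * dt + 0.5 * (f / a.m) * dt * dt;   // advance position
    double fNew = force(a, k);
    a.v += 0.5 * (f + fNew) / a.m * dt;            // advance velocity
}
```

With millions of such steps needed, the force calculation inside `step` is where essentially all the time goes, which motivates the decomposition strategies on the following slides.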
10. Further MD
- Use of cut-off radius to reduce work
- 8-14 Å
- Faraway charges ignored!
- 80-95% of the work is non-bonded force computation
- Some simulations need faraway contributions
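The cut-off idea can be illustrated with a toy pair count (1-D positions for brevity, and the function name is invented): pairs separated by more than the cut-off radius are skipped entirely.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Toy illustration of the cut-off (1-D positions for brevity): pairs
// separated by more than rCut contribute nothing and are skipped, which
// eliminates most of the O(N^2) pair work.
int countInteractingPairs(const std::vector<double>& pos, double rCut) {
    int pairs = 0;
    for (size_t i = 0; i < pos.size(); ++i)
        for (size_t j = i + 1; j < pos.size(); ++j)
            if (std::fabs(pos[i] - pos[j]) <= rCut)
                ++pairs;                      // faraway charges ignored
    return pairs;
}
```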
11. NAMD Design Objectives
- Scalable High Performance
- To a small and large number of processors
- Small and large molecular systems
- Modifiable and extensible design
- Ability to incorporate new algorithms
- Reusing new libraries without re-implementation
- Experimenting with alternate strategies
12. Force Decomposition
- Distribute the force matrix to processors
- Matrix is sparse and non-uniform
- Each processor has one block
- Communication: N/sqrt(P); ratio: sqrt(P)
- Better scalability in practice (can use 100 processors)
- Hwang, Saltz, et al.: speedup of 6 on 32 PEs, 36 on 128 processors
- Not scalable
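As a back-of-envelope illustration of the N/sqrt(P) communication claim: with the force matrix split into sqrt(P) x sqrt(P) blocks, each processor needs the coordinates of two strips of N/sqrt(P) atoms. The factor of 2 (one row strip plus one column strip) is an assumption of this sketch.

```cpp
#include <cassert>
#include <cmath>

// Back-of-envelope sketch of the slide's scaling claim: each processor
// owning one block of a sqrt(P) x sqrt(P) blocked force matrix needs the
// coordinates of two strips of N/sqrt(P) atoms (row strip + column strip;
// the factor of 2 is this sketch's assumption), so per-processor
// communication shrinks as P grows.
double commPerProcessor(double nAtoms, double nProcs) {
    return 2.0 * nAtoms / std::sqrt(nProcs);
}
```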
13. Spatial Decomposition
14. Spatial decomposition, modified
15. Implementation
- Multiple objects per processor
- Different types: patches, pairwise forces, bonded forces, ...
- Each may have its data ready at different times
- Need the ability to map and remap them
- Need prioritized scheduling
- Charm++ supports all of these
16. Charm++
- Data-driven objects
- Object groups
- a global object with a representative on each PE
- Asynchronous method invocation
- Prioritized scheduling
- Mature, robust, portable
- http://charm.cs.uiuc.edu
17. Data-driven execution
[Figure: two processors, each running a scheduler that picks work from its own message queue]
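The picture of data-driven execution, per-processor schedulers pulling messages off prioritized queues, can be sketched roughly as below. The names are illustrative; this is not the Charm++ scheduler.

```cpp
#include <cassert>
#include <functional>
#include <queue>

// Rough sketch of data-driven execution (illustrative; not the Charm++
// scheduler). Each processor's scheduler pulls the highest-priority
// available message off its queue and invokes the bound object method.
struct Message {
    int priority;                          // smaller value runs first
    std::function<void()> handler;         // object method bound to this message
    bool operator<(const Message& o) const { return priority > o.priority; }
};

class Scheduler {
    std::priority_queue<Message> q;        // one queue per processor
public:
    void enqueue(int prio, std::function<void()> h) {
        q.push({prio, std::move(h)});      // a message arrives and is queued
    }
    void run() {                           // execution is driven by available data
        while (!q.empty()) { q.top().handler(); q.pop(); }
    }
};
```

Because objects run only when their message is selected, work proceeds in whatever order data becomes available, rather than in a fixed program order.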
18. Object-oriented design
- Two top-level classes
- Patches: cubes containing atoms
- Computes: force calculations
- Home patches and proxy patches
- A home patch sends coordinates to its proxies, and receives forces from them
- Each compute interacts with local patches only
19. Compute hierarchy
- Many compute subclasses
- Allow reuse of coordination code
- Reuse of bookkeeping tasks
- Easy to add new types of force objects
- Example: steered molecular dynamics
- The implementor focuses on the new force functionality
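The compute hierarchy described above might be sketched as follows. The class names (`Compute`, `SteeredCompute`) and the 1-D force interface are invented for illustration and are not NAMD's actual classes.

```cpp
#include <cassert>
#include <vector>

// Sketch of the compute hierarchy (class names invented for illustration;
// not NAMD's actual classes). The base class owns shared coordination and
// bookkeeping; a subclass supplies only the force kernel.
class Compute {
public:
    virtual ~Compute() = default;
    // Shared coordination code: in NAMD this would gather coordinates from
    // local patches, invoke the kernel, and return forces to the patches.
    std::vector<double> work(const std::vector<double>& coords) {
        return doForce(coords);
    }
protected:
    virtual std::vector<double> doForce(const std::vector<double>& coords) = 0;
};

// Adding a new force type (e.g. steered MD) means writing one small subclass.
class SteeredCompute : public Compute {
public:
    explicit SteeredCompute(double p) : pull(p) {}
protected:
    std::vector<double> doForce(const std::vector<double>& coords) override {
        return std::vector<double>(coords.size(), pull);  // constant pull per atom
    }
private:
    double pull;  // hypothetical steering force
};
```

A new force object inherits all the coordination and bookkeeping for free; only `doForce` is new code.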
20. Multi-paradigm programming
- Long-range electrostatic interactions
- Some simulations require this feature
- Contributions of faraway atoms can be computed infrequently
- PVM-based library: DPMTA
- Developed at Duke by John Board et al.
- Patch life cycle
- better expressed as a thread
21. Converse and Interoperability
- Supports multi-paradigm programming
- Provides portability
- Makes it easy to implement runtime systems for new paradigms
- Several languages/libraries:
- Charm++, threaded MPI, PVM, Java, md-perl, pC++, Nexus, Path, Cid, CC++, ...
22. NAMD2 with Converse
23. Separation of concerns
- Different developers, with different interests and knowledge, can contribute effectively
- Separation of communication and parallel logic
- Threads encapsulate the life cycle of patches
- Adding a new integrator, improving performance, or trying new MD ideas can be done modularly and independently
24. Load balancing
- Collect timing data for several cycles
- Run heuristic load balancer
- Several alternative ones
- Re-map and migrate objects accordingly
- Registration mechanisms facilitate migration
- Needs a separate talk!
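One simple heuristic of the kind the slide alludes to is a greedy mapping: sort objects by measured cost, then place each on the currently least-loaded processor. This sketch is only illustrative; NAMD's actual balancers are more elaborate (and account for communication, not just compute load).

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Illustrative greedy heuristic (NAMD's real balancers are more elaborate):
// sort objects by measured cost, then place each on the currently
// least-loaded processor. Returns, for each object, its assigned processor.
std::vector<int> greedyMap(const std::vector<double>& cost, int nProcs) {
    std::vector<int> order(cost.size());
    for (size_t i = 0; i < order.size(); ++i) order[i] = static_cast<int>(i);
    std::sort(order.begin(), order.end(),
              [&](int a, int b) { return cost[a] > cost[b]; });  // heaviest first
    std::vector<double> load(nProcs, 0.0);
    std::vector<int> placement(cost.size());
    for (int obj : order) {
        int pe = static_cast<int>(
            std::min_element(load.begin(), load.end()) - load.begin());
        placement[obj] = pe;          // migrate object obj to processor pe
        load[pe] += cost[obj];
    }
    return placement;
}
```

The measured timing data from previous cycles plays the role of `cost` here; the registration mechanisms mentioned above are what make the resulting migrations safe to perform.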
25. Performance: size of system
26. Performance: various machines
27. Speedup
28. Conclusion
- Multi-domain decomposition works well for dynamically evolving or irregular apps
- when supported by data-driven objects (Charm++), user-level threads, and callbacks
- Object-oriented parallel programming
- promotes reuse
- gives good performance
- Multi-paradigm programming is effective!
- Measurement-based load balancing
29. What works?
- To effectively parallelize irregular/dynamic applications:
- decompose into multiple entities per processor
- use adaptive scheduling via data-driven objects
- object-migration-based load balancing
- registration and callbacks to support migration