Title: Managing Biomolecular Simulations in a Grid Environment with NAMD-G
1Managing Biomolecular Simulations in a Grid
Environment with NAMD-G
- Michelle Gower, Jordi Cohen,
- James C. Phillips, Rick Kufrin, Klaus Schulten
- University of Illinois at Urbana-Champaign
- National Center for Supercomputing Applications
- Theoretical and Computational Biophysics Group
- NIH Resource for Macromolecular Modeling and Bioinformatics
2Overview
- Scientific Motivation: Hydrogenase O2 Problem
- Introduction to NAMD-G software
- Underlying Grid Middleware
- Technical Challenges/Lessons Learned
- NAMD-G Accomplishments
- Future Work
- Closing Remarks
3The Hydrogenase O2 Problem
- O2 permanently deactivates hydrogenase.
- Can we engineer O2 tolerance?
- We don't know how the O2 gets to the active site.
- This led to the development of a method to study gas migration pathways in proteins.
Image created with VMD (http://www.ks.uiuc.edu/Research/vmd)
4Gas Migration Pathways
- Opportunity to study gas migration pathways in other proteins.
- This means running many biomolecular simulations.
- Managing these simulations becomes a problem.
Image created with VMD (http://www.ks.uiuc.edu/Research/vmd)
Sperm Whale Myoglobin: O2 Accessibility
5NAMD
- Highly scalable, high-performance molecular dynamics code for large biomolecular simulations (typically 8-512 processors)
- Developed by the Theoretical and Computational Biophysics Group (TCBG, University of Illinois at Urbana-Champaign)
- NAMD can be told to output restart files.
6Computation
- A simple simulation consists of the following sequence of NAMD runs:
- 2 pre-equilibration runs
- An equilibration run (1 ns)
- A production run (6 ns)
- A scientist might also want to continue
simulations for more timesteps or restart
simulations from interesting points with
different parameters.
7Typical Tasks for a Run
1. Store input files on MSS
2. Submit remote batch job
3. Retrieve input and restart files
4. Execute NAMD
5. Store output and restart files
6. Retrieve restart and output files
[Diagram: Local Workstation, Mass Storage System, Remote HPC Machine]
- Images of remote HPC machine (Mercury) and mass
storage system (UniTree) courtesy of the National
Center for Supercomputing Applications (NCSA) and
the Board of Trustees of the University of
Illinois
8Nanny for NAMD
NAMD-G
- NAMD-G is a grid-based automation engine for biomolecular simulations.
- Given input files and a description of the simulation, NAMD-G submits remote batch jobs to the specified remote system, handling the transfers of input, output, and restart files.
- If a job dies due to hitting the wallclock limit, NAMD-G automatically submits another job until the run is complete.
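The resubmission behavior described above can be sketched roughly as follows. This is a minimal illustration, not NAMD-G's actual implementation: `run_batch_job`, `steps_per_job`, and the step counts are hypothetical stand-ins for a real batch job that stops at the wallclock limit and leaves restart files behind.

```python
def run_batch_job(start_step, target_step, steps_per_job):
    """Hypothetical stand-in for one batch job.

    Pretends the job advances the simulation until it either reaches
    target_step or hits the wallclock limit (modeled as steps_per_job).
    Returns the step recorded in the last restart file.
    """
    return min(start_step + steps_per_job, target_step)


def run_until_complete(target_step, steps_per_job, max_jobs=100):
    """Resubmit jobs, each continuing from the previous restart point.

    Returns the number of batch jobs that were needed.
    """
    if steps_per_job <= 0:
        raise ValueError("steps_per_job must be positive")
    step = 0
    jobs = 0
    while step < target_step:
        if jobs >= max_jobs:
            raise RuntimeError("too many resubmissions; giving up")
        step = run_batch_job(step, target_step, steps_per_job)
        jobs += 1
    return jobs


if __name__ == "__main__":
    # Illustrative numbers only.
    print(run_until_complete(target_step=1_000_000, steps_per_job=300_000))
```

The `max_jobs` cap is an added safeguard in this sketch so that a job that never makes progress cannot cause endless resubmission.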
9NAMD-G Commands
- NAMD-G is a set of scripts with specific
knowledge of NAMD wrapped around existing generic
grid middleware.
- Submit a simulation: ngsubmit RUNFILE
- Monitor a simulation: ngstat
- Delete a simulation: ngdel ID
- Restart a simulation: ngrestart
10Pre-defined Runs
- There are pre-defined runs that will automatically work for any system.
- This greatly reduces the learning curve for someone to start using NAMD-G.
- They are very modular so the scientist can pick which ones they want to use.
11Underlying Grid Middleware: Authentication
- Globus Toolkit
- An open source set of software developed by the Globus Alliance that can be used to build Grid applications.
- Globus Toolkit Security Component
- GSI-Authentication
- Uses proxy certificates instead of passwords or ssh keys
12Underlying Grid Middleware: Data Transfers
- uberFTP
- GridFTP-enabled interactive client
- Developed by NCSA
- Globus Toolkit
- Pre-WS GridFTP Services
[Diagram: GridFTP Server (Mass Storage), GridFTP Server (Remote HPC Machine)]
13Underlying Grid Middleware: Job Submission and Monitoring
- Condor
- Management system developed by the Condor Team, led by Miron Livny, at the University of Wisconsin-Madison.
- Condor: Condor-G
- Uses Globus Toolkit behind the scenes to submit jobs to remote machines
- Globus Toolkit GRAM component
- Pre-WS GRAM Service
[Diagram: Gatekeeper, Batch Jobmanager, Batch System, Remote HPC Machine]
14Underlying Grid Middleware: Workflow Management
- Condor DAGMan
- Allows the user to specify ordering of jobs
- DAGMan keeps track of which jobs have been successfully completed. Upon failure, it writes a file allowing the user to easily restart it at the failed job.
- DAGMan can be told to repeat a job.
[Diagram: Job A, Job B, Job C]
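DAGMan reads a plain-text description built from `JOB`, `PARENT ... CHILD`, and `RETRY` lines. As a rough sketch, the Python below generates such a description for three jobs, assuming a simple linear ordering A, B, C. The submit file names and the retry count are hypothetical, and this is not taken from NAMD-G itself.

```python
def build_dag(jobs, retries=None):
    """Return DAGMan input text for a linear chain of jobs.

    jobs: list of (name, submit_file) pairs, in execution order.
    retries: optional dict mapping a job name to its RETRY count.
    """
    retries = retries or {}
    lines = [f"JOB {name} {submit_file}" for name, submit_file in jobs]
    # Each job is the parent of the job that follows it.
    for (parent, _), (child, _) in zip(jobs, jobs[1:]):
        lines.append(f"PARENT {parent} CHILD {child}")
    # RETRY tells DAGMan to re-run a node that failed.
    for name, _ in jobs:
        if name in retries:
            lines.append(f"RETRY {name} {retries[name]}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical submit file names.
    chain = [("A", "a.submit"), ("B", "b.submit"), ("C", "c.submit")]
    print(build_dag(chain, retries={"B": 3}), end="")
```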
15Underlying Grid Middleware: DAGMan - Single Run
- Pre
- Copy internal files to remote machine
- NAMD job
- Retrieve input and restart files
- Run NAMD
- Post
- Transfer output files to MSS
- Transfer output files to local machine
- Check whether run has completed
- Notify user via email
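The "check whether run has completed" step could look something like the sketch below. It assumes the simulation log contains lines ending in `AT STEP <n>` whenever restart files are written; that log format, the function names, and the step numbers are assumptions made for illustration. The exit-code convention relies on DAGMan treating a nonzero POST script exit as a failed node, which a `RETRY` setting can then re-run.

```python
import re

# Assumed log format: restart writes are reported as "... AT STEP <n>".
STEP_PATTERN = re.compile(r"AT STEP (\d+)")


def last_restart_step(log_text):
    """Return the highest step mentioned in the log, or 0 if none."""
    steps = [int(m.group(1)) for m in STEP_PATTERN.finditer(log_text)]
    return max(steps) if steps else 0


def post_exit_code(log_text, target_step):
    """Return 0 when the run reached target_step, 1 otherwise.

    A nonzero value marks the node as failed so that it can be retried.
    """
    return 0 if last_restart_step(log_text) >= target_step else 1


if __name__ == "__main__":
    # Hypothetical log excerpt.
    sample = "WRITING RESTART AT STEP 500\nWRITING RESTART AT STEP 1000\n"
    print(post_exit_code(sample, target_step=2000))
```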
16Underlying Grid Middleware: Authentication Part Two
- Globus Toolkit MyProxy
- Open source project started by NCSA to provide an online credential repository
- Condor-G can automatically renew a proxy using MyProxy
17Grid Middleware Summary
- Local Workstation
- Globus Toolkit (no services)
- Condor
- uberFTP
- Remote HPC Machine
- Globus Toolkit Pre-WS services: GridFTP, GRAM
- uberFTP
- Mass Storage System
- Globus Toolkit Pre-WS services: GridFTP
18Technical Challenges/Lessons Learned
- Local machines were behind firewalls with minimal open incoming ports
- Could not use built-in file staging; had to write code to push input files and pull output files
- Messages from the remote batch submission command were not sent back to the local machine
- Discovered jobmanager problems
- Some Globus installations did not correctly use the single jobtype with a processor count greater than 1
- Difficulty distinguishing jobs on remote machine
- Currently cannot set batch jobname through Globus
- Currently cannot get batch jobid through Globus
19Technical Challenges cont.
- NAMD-G portability issues
- Shell script portability was an issue
- Different RSL has to be created depending upon remote machine
- Not all HPC machines have
- remote GridFTP access to home and scratch directories
- a fork jobmanager
- access to external MSS from compute nodes
- uberFTP installed
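To illustrate why different RSL has to be created per remote machine, here is a minimal sketch that assembles a Pre-WS GRAM style RSL string from per-machine settings. The machine names, queue name, and executable path are hypothetical, and which attributes a given site actually requires is an assumption.

```python
def build_rsl(attributes):
    """Format a dict of attributes as an RSL string like &(key=value)..."""
    parts = []
    for key, value in attributes.items():
        text = str(value)
        # Quote values containing spaces; leave simple values bare.
        if " " in text:
            text = f'"{text}"'
        parts.append(f"({key}={text})")
    return "&" + "".join(parts)


# Hypothetical per-machine differences.
MACHINE_PROFILES = {
    "machine_a": {"jobtype": "mpi", "queue": "normal"},
    "machine_b": {"jobtype": "single"},
}


def rsl_for(machine, executable, count, max_wall_minutes):
    """Combine common attributes with the machine-specific ones."""
    attributes = {
        "executable": executable,
        "count": count,
        "maxWallTime": max_wall_minutes,
    }
    attributes.update(MACHINE_PROFILES[machine])
    return build_rsl(attributes)


if __name__ == "__main__":
    # Hypothetical executable path.
    for name in MACHINE_PROFILES:
        print(name, rsl_for(name, "/path/to/namd2", 4, 60))
```

Keeping the machine-specific attributes in one table keeps the rest of the submission logic identical across systems.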
20NAMD-G Accomplishments
- NAMD-G developed hand-in-hand with a pilot science project
- Completed projects on gas conduction
- Comparative O2 pathways in 15 globins, from plants to insects to mammals.
- O2 pathways in two high-profile proteins: hydrogenase and copper amine oxidase.
- Ongoing simulation of ribosome
[Figure labels: hydrogenase, soy leghemoglobin]
Cohen, et al., Biophys. J. 91 (Sept. 2006)
"NAMD-G saves time, especially time spent on mindless, boring, and error-prone tasks. With a minimal initial investment, NAMD-G makes simulations even more convenient than I dared hope to initially." - Dr Emma Falck, Beckman Fellow
21Future Work
- Allow simulations to be easily continued for more
timesteps
- Allow simulations to be easily branched from
other simulations
- Create NAMD-G configuration files at the system
and user levels.
22Closing Remarks
- Using existing grid middleware allowed for rapid
development of a functional system.
- NAMD-G is a perfect example of what can be
accomplished with tight collaboration where both
groups provide ideas, design and implementation
principles.
23Acknowledgements
- This work was supported in part by the National
Science Foundation grants SCI-0451538,
SCI-0504064, and SCI-0438712. Funding for the
Resource for Macromolecular Modeling and
Bioinformatics is provided by the National
Institutes of Health grant NIH P41 RR05969.
24Links
- NAMD - www.ks.uiuc.edu/Research/namd
- Globus Toolkit - www.globus.org/toolkit
- Condor - www.cs.wisc.edu/Condor
- UberFTP - dims.ncsa.uiuc.edu/set/uberftp
- MyProxy - grid.ncsa.uiuc.edu/myproxy
- NAMD-G - www.ks.uiuc.edu/Research/namdg
25Underlying Grid Middleware
- Authentication
- Globus Toolkit Security Component
- MyProxy
- Job Submission and Monitoring
- Condor: Condor-G
- Globus Toolkit Pre-WS GRAM Services
- Workflow Management
- Condor DAGMan
- Data Transfer
- UberFTP
- Globus Toolkit Pre-WS GridFTP Service