1 The NPACI MCell project: applications, software, research, and impact
- Henri Casanova
- Grid Research And Innovation Laboratory (GRAIL)
- San Diego Supercomputer Center
- Computer Science and Engineering Dept.
- University of California, San Diego
2 The MCell Project
- MCell: Monte Carlo Cell simulator
- Developed at Salk and PSC
- Gain knowledge about neurotransmission
- Fundamental for drug design (psychiatry)
- Large user base (yearly MCell workshop)
- Parallel MC simulations at the molecular level
3 Activated receptors
4 The MCell application
5 A General Model
- Flow: Input data -> Tasks -> Raw Output -> Post-processing -> Final Output
6 Grid Computing
- Grid Software Infrastructure
- Enable resource sharing among users, applications, and institutions
7 MCell on the Grid
- Feasible
- Loosely coupled application
- Can exploit enormous amounts of resources
- Challenges
- Scheduling? How to make decisions to assign computation/data to resources?
- Logistics of application deployment: deployment by hand and with ad-hoc scripts will not scale
8 Scheduling MCell
9 Scheduling of PSAs
10 List Scheduling with Dynamic Priorities
- We leverage previous work on list scheduling with dynamic priorities
- We added a notion of adaptivity
- We added a notion of data locality
- We developed a new heuristic (XSufferage); see the sketch after this list
- We evaluated heuristics in simulation
- We demonstrated
- Effective use of data replication/locality for performance
- Robustness to performance prediction errors and performance fluctuations, thanks to adaptivity
- Casanova et al., HCW00
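
Below is a minimal Python sketch of the general sufferage idea with a simple data-locality term, to make the list-scheduling step concrete. It is not the XSufferage heuristic from the paper, and all names (Site, Task, compute_rate, bandwidth, ...) are illustrative assumptions rather than APST internals.

    # Sufferage-style list-scheduling step with a simple data-locality term.
    # Illustrative sketch only; names and the cost model are assumptions.

    from dataclasses import dataclass, field

    @dataclass
    class Site:
        name: str
        compute_rate: float               # work units per second
        bandwidth: float                  # MB/s from the central data repository
        cached_files: set = field(default_factory=set)
        ready_time: float = 0.0           # when the site next becomes free

    @dataclass
    class Task:
        name: str
        work: float                       # work units of computation
        input_file: str
        input_size_mb: float

    def completion_time(task, site):
        # Pay the input transfer only if the site holds no replica of the file.
        transfer = 0.0 if task.input_file in site.cached_files \
                   else task.input_size_mb / site.bandwidth
        return site.ready_time + transfer + task.work / site.compute_rate

    def schedule_one(tasks, sites):
        """Assign the task whose 'sufferage' (gap between its best and
        second-best completion time across sites) is largest."""
        best = None
        for t in tasks:
            times = sorted(completion_time(t, s) for s in sites)
            suff = times[1] - times[0] if len(times) > 1 else times[0]
            if best is None or suff > best[0]:
                site = min(sites, key=lambda s: completion_time(t, s))
                best = (suff, t, site)
        _, task, site = best
        site.ready_time = completion_time(task, site)   # commit the assignment
        site.cached_files.add(task.input_file)          # replica is now local
        tasks.remove(task)
        return task, site

The locality term is what makes replication pay off: once an input file has been copied to a site, later tasks sharing that file see no transfer cost there, so the heuristic naturally steers them to the replica.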
11 Experimental Validation
- Casanova et al., SC01
12 APST Deployment Software
13 Parameter Sweep Applications
- MCell is but a representative
- Large number of computational tasks
- Little synchronization
- High performance
- Potentially large data-sets
- Potentially parallel sub-tasks
- PSAs arise in many fields of Science and Engineering; a minimal sweep sketch follows this list
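
To make the notion concrete, here is a minimal sketch of a parameter-sweep application as a script: many independent runs of one executable over a grid of parameters. The executable name ./simulate and the parameter names are made up for illustration; an actual MCell sweep would run mcell over many .mdl files instead.

    # Hypothetical parameter sweep: independent tasks, no synchronization
    # until the raw outputs are post-processed into a final result.

    import itertools
    import subprocess

    densities = [1e3, 1e4, 1e5]       # made-up sweep dimension 1
    seeds = range(10)                 # made-up sweep dimension 2

    for density, seed in itertools.product(densities, seeds):
        # Each run is a self-contained task with its own input and output.
        subprocess.run(
            ["./simulate", f"--density={density}", f"--seed={seed}",
             f"--out=run_d{density}_s{seed}.out"],
            check=True,
        )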
14 APST Prototype Software
- Transparent deployment: make it easy for users to launch/monitor PSAs over common Grid infrastructure (Globus, GridFTP, MDS, Condor, etc.)
- Automatic scheduling: achieve high performance with available resources
- Simple interface: XML-based description of the application and resources
- APST has been used for MCell in pseudo-production since 2001 (supported by NPACI)
- Casanova et al., IJHPCA01; Casanova et al., Grid02
15 APST Resource Description (1)
- Descriptions of sites and storage:
<storage>
  <disk id="disk1" datadir="/home/data/">
    <gridftp server="storage.site1.edu" />
  </disk>
  <disk id="disk2">
    <cp server="storage.site2.edu" />
  </disk>
</storage>
16 APST Resource Description (2)
- Descriptions of compute hosts:
<compute>
  <host id="host1" disk="disk1">
    <globus server="host1.site1.edu" />
  </host>
  <host id="host2" disk="disk2">
    <globus server="host2.site2.edu" procs="40" />
  </host>
</compute>
17 APST Resource Description (3)
- Descriptions of information sources:
<gridinfo>
  <nws server="nws.site1.edu" />
  <mds server="mds.site2.edu" />
  <mds server="mds.globus.org" />
</gridinfo>
18 APST App. Description
- Description of application tasks:
<tasks>
  <task executable="mcell" arguments="dfp" input="dfp.mdl" output="dfp.out" stderr="dfp.err" cost="10" />
  <task executable="mcell" arguments="hbtx" input="hbtx.mdl" output="hbtx.out" stderr="hbtx.err" cost="2" />
</tasks>
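
Since a real sweep can contain thousands of tasks, a <tasks> section like the one above would typically be generated by a small script rather than written by hand. The sketch below produces entries in the same shape as the example; it is illustrative only and not part of APST's actual tooling.

    # Generate <task> entries for an APST-style application description.
    # Model names and cost values come from the example above; the script
    # itself is a hypothetical helper.

    models = {"dfp": 10, "hbtx": 2}   # model name -> relative cost estimate

    lines = ["<tasks>"]
    for name, cost in models.items():
        lines.append(
            f'  <task executable="mcell" arguments="{name}" '
            f'input="{name}.mdl" output="{name}.out" '
            f'stderr="{name}.err" cost="{cost}" />'
        )
    lines.append("</tasks>")
    print("\n".join(lines))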
19 APST Implementation
20 The Virtual Instrument Project
21 VI Goals
- To build a Grid execution environment for MCell that provides:
- Computational Steering
- Database for managing application data
- User interface / portal
22 VI Software Architecture
- Diagram: Tomography and Electron Microscopy data feed the creation of an MCell/VI project
23 VI Software Architecture
- Diagram: Grid services (compute, process data, storage) layered over Grid storage and compute resources
24 VI Software Architecture
- DReAMM (OpenDX)
25 SC02 Demo
- Blue Horizon SP, SDSC (California)
- My Laptop, SC02 (Baltimore)
- Presto III, TITECH (Japan)
- GRAIL Lab, UCSD (California)
- Meteor Cluster, SDSC (California)
26 APST in use
27 APST Broader Impact
- APST provides an easy way to deploy applications
on the Grid and is being used by an ever-larger
user community
28 New Research Question?
29 Divisible Workload
- Why can't APST partition the application workload by itself?
- Divisible Load Scheduling (Robertazzi 1996): how to partition the workload to maximize performance?
- Trade-off (illustrated in the sketch after this list):
- Large chunks
- low overhead
- low communication/computation overlap
- sensitivity to performance prediction errors
- Small chunks
- high overhead
- high communication/computation overlap
- robustness to performance prediction errors
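
The sketch below illustrates this trade-off with a deliberately simple model: a fixed per-chunk dispatch overhead, and a two-stage pipeline in which the transfer of one chunk can overlap the computation of the previous one. All numbers are made up, and this is not the model used in the papers.

    # Back-of-the-envelope makespan under an assumed overhead + pipeline model.

    def makespan(total_work, chunk, overhead, transfer_rate, compute_rate):
        n = total_work / chunk                  # number of chunks
        transfer = chunk / transfer_rate        # seconds to send one chunk
        compute = chunk / compute_rate          # seconds to compute one chunk
        # First transfer, then (n-1) overlapped steps, then the last
        # computation, plus a fixed overhead paid for every chunk.
        return n * overhead + transfer + (n - 1) * max(transfer, compute) + compute

    for chunk in (1.0, 10.0, 100.0, 500.0, 1000.0):
        t = makespan(total_work=1000.0, chunk=chunk, overhead=0.5,
                     transfer_rate=10.0, compute_rate=10.0)
        print(f"chunk size {chunk:6.1f} -> makespan {t:7.1f} s")

With these made-up numbers, very small chunks lose to per-chunk overhead while a single huge chunk loses all communication/computation overlap; the best makespan sits in between, which is exactly the tension multi-round algorithms try to resolve.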
30 Our Contribution
- UMR algorithm: Yang, Casanova, IPDPS03
- Increases chunk size throughout execution
- Uses a more realistic model than previously proposed algorithms
- Uses a number of restrictions to cope with the model
- And yet outperforms previously proposed algorithms
- Robust UMR (RUMR): Yang, Casanova, HPDC03
- Increases and then decreases chunk size throughout execution
- Outperforms previously proposed algorithms in the presence of performance prediction errors
- Currently being implemented as part of APST; a minimal sketch of the multi-round idea follows
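
The following is a minimal sketch of the multi-round idea behind UMR-style algorithms: dispatch small chunks first so workers start computing quickly, then grow the chunk size each round to amortize per-chunk overhead. The initial size and growth factor below are arbitrary placeholders; UMR derives these quantities from its platform model, and RUMR additionally shrinks chunks toward the end to tolerate prediction errors.

    # Multi-round chunk-size schedule with geometric growth (illustrative).

    def chunk_schedule(total_work, first_chunk, growth, n_workers):
        """Return the chunk size used in each round (one chunk per worker)."""
        rounds, dispatched, chunk = [], 0.0, first_chunk
        while dispatched < total_work - 1e-9:
            # Cap the last round so we do not dispatch more than remains.
            chunk = min(chunk, (total_work - dispatched) / n_workers)
            rounds.append(chunk)
            dispatched += chunk * n_workers
            chunk *= growth
        return rounds

    print(chunk_schedule(total_work=1000.0, first_chunk=5.0,
                         growth=1.5, n_workers=4))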
31 Conclusion and Futures
32 Summary Progress Flow
- Timeline, 1999-2002: Application (MCell); CS Research (Scheduling); Prototype Software (APST 1.0); EOL, Viztools, etc.
- 2002: CS Research (Steering); Production Software (APST 2.0)
- 2003: CS Research (Divisible Load Scheduling); Prototype Software (VI)
33 Future Work
- Deployment of EOL
- Prototype EOL/APST version in place
- Large-scale demo for SC03
- Releases of APST
- v2.0 just released
- NPACKAGE release
- NMI release
http://grail.sdsc.edu