Title: Coupling Parallel Programs via MetaChaos
1 Coupling Parallel Programs via MetaChaos
- Alan Sussman
- Computer Science Dept.
- University of Maryland
With thanks to Mike Wiltberger (Dartmouth/NCAR)
2 What is MetaChaos?
- A runtime meta-library that achieves direct data transfers between data structures managed by different parallel libraries
- Runtime meta-library means that it interacts with the data parallel libraries and languages used for the separate programs (including MPI)
- Can exchange data between separate (sequential or parallel) programs, running on different machines
- Also manages data transfers between different libraries in the same application
- This is often referred to as the MxN problem in parallel programming (e.g. the CCA Forum)
3 How does MetaChaos work?
- It all starts with the Data Descriptor (ESMF state)
  - Information about how the data in each program is distributed across the processors
  - Usually supplied by the library/program developer
  - We are working on generalizing to work with complex data distributions
- MetaChaos then uses a linearization (L_SA) of the data to be moved (the regions) to determine the optimal method to move data from a set of regions in A (S_A) to a set of regions in B (S_B)
- Moving the data is a three-step process (sketched below):
    L_SA = l_ProgX(S_A)        (linearize the source regions)
    L_SB = L_SA                (move the linearized data)
    S_B  = l_ProgY^-1(L_SB)    (delinearize into the destination regions)
- The only constraint on this operation is that each set of regions must have the same number of elements
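A minimal single-process sketch of the linearization idea follows, assuming plain row-major 2-D arrays; the Region type and both function names are illustrative only, not the MetaChaos API:

    #include <cstddef>
    #include <vector>

    struct Region { int x0, y0, nx, ny; };   // one rectangular block of a 2-D array

    // l_ProgX: pack the elements of every source region, in order, into one linear buffer (L_SA)
    std::vector<double> linearize(const double* a, int lda,
                                  const std::vector<Region>& regions) {
      std::vector<double> buf;
      for (const Region& r : regions)
        for (int j = 0; j < r.ny; ++j)
          for (int i = 0; i < r.nx; ++i)
            buf.push_back(a[(r.y0 + j) * lda + (r.x0 + i)]);
      return buf;
    }

    // l_ProgY^-1: unpack the linear buffer (L_SB) into the destination regions
    void delinearize(double* b, int ldb,
                     const std::vector<Region>& regions,
                     const std::vector<double>& buf) {
      std::size_t k = 0;
      for (const Region& r : regions)
        for (int j = 0; j < r.ny; ++j)
          for (int i = 0; i < r.nx; ++i)
            b[(r.y0 + j) * ldb + (r.x0 + i)] = buf[k++];
    }

Because each side traverses its own regions in a fixed order, the two region sets can have completely different shapes and distributions; only the total element count has to agree, which is exactly the constraint above.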
4 MetaChaos goals
- Main goal is minimal modification to existing programs
- To enable a program to be coupled to others, add calls to:
  - describe the data distribution across processors: build a data descriptor
  - describe the data to be moved (imported or exported): build a set of regions
  - move the data: build a communication pattern/schedule, then use it
    - this is the part that requires interaction with the other program
5 MetaChaos goals
- Other main goal is low overhead and efficient data transfers
- Low overhead from building schedules efficiently
  - take advantage of characteristics of the data descriptor
- Efficient data transfers via customized all-to-all message passing between source and destination processes (illustrated below)
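One way to picture the precomputed schedule (purely illustrative; these types and field names are assumptions, not MetaChaos data structures):

    #include <cstddef>
    #include <vector>

    struct Chunk   { std::size_t localOffset; std::size_t count; };  // one contiguous run of local elements
    struct Message { int proc; std::vector<Chunk> chunks; };         // everything exchanged with one process

    struct Schedule {
      std::vector<Message> sends;   // built once from the two data descriptors...
      std::vector<Message> recvs;   // ...then reused for every transfer
    };

With the per-process chunk lists computed up front, each data move reduces to packing buffers and issuing the point-to-point sends and receives directly between the source and destination processes.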
6 More details
- Bindings for C/C++ and Fortran77; Fortran90 coming (data descriptor issues)
- Similar interface to MCEL, but with direct communication (no server)
- Currently message passing and program interconnection via PVM
  - programs/components run on whatever
  - heading towards Globus and other Grid services
- Each model/program can do whatever it wants internally (MPI, pthreads, sockets, ...) and start up by whatever mechanism it wants (CCSM)
7 A Simple Example: Wave Eq Using P++

  #include <A++.h>

  int main(int argc, char **argv)
  {
    Optimization_Manager::Initialize_Virtual_Machine("", iNPES, argc, argv);

    doubleArray daUnm1(iNumX+2, iNumY+2), daUn(iNumX+2, iNumY+2);
    doubleArray daUnp1(iNumX+2, iNumY+2);
    Index I(1, iNumX), J(1, iNumY);    // Indices for computational domain

    // Initial conditions at the first two time levels
    for (int j = 1; j < iNumY+1; j++) {
      daUnm1(I, j) = sin(dW*dTime + (daX(I)*2*dPi)/dLenX);
      daUn(I, j)   = sin(dW*0     + (daX(I)*2*dPi)/dLenX);
    }
    // Apply BC - omitted for space

    // Evolve a step forward in time
    for (int i = 1; i < iNSteps; i++) {
      daUnp1(I, J) = ((dC*dC*dDT*dDT)/(dDX*dDX))
                       * (daUn(I-1, J) - 2*daUn(I, J) + daUn(I+1, J))
                     + 2*daUn(I, J) - daUnm1(I, J);
      // Apply BC - omitted for space
    }

    Optimization_Manager::Exit_Virtual_Machine();
  }
8 Split into two using MetaChaos

  #include <A++.h>

  int main(int argc, char **argv)
  {
    Optimization_Manager::Initialize_Virtual_Machine("", NPES, argc, argv);

    // Register this program and find the other one
    this_pgm  = InitPgm(pgm_name, NPES);
    other_pgm = WaitPgm(other_pgm_name, NPES_other);
    Sync2Pgm(this_pgm, other_pgm);

    // Describe the data to be moved: a set of regions
    BP_set = Alloc_setOfRegion();
    left[0] = 4;  right[0] = 4;  stride[0] = 1;
    left[1] = 5;  right[1] = 5;  stride[1] = 1;
    reg = Alloc_R_Block(DIM, left, right, stride);
    Add_Region_setOfRegion(reg, BP_set);

    // Describe how daUn is distributed, then build the communication schedule
    BP_da = getPartiDescriptor(daUn);
    sched = ComputeScheduleForSender(..., BP_da, BP_set, ...);

    for (int i = 1; i < iNSteps; i++) {
      daUnp1(I, J) = ((dC*dC*dDT*dDT)/(dDX*dDX))
                       * (daUn(I-1, J) - 2*daUn(I, J) + daUn(I+1, J))
                     + 2*daUn(I, J) - daUnm1(I, J);

      // Exchange data with the other program using the precomputed schedule
      iDataMoveSend(other_pgm, sched, daUn.getLocalArray().getDataPointer());
      iDataMoveRecv(other_pgm, sched, daUn.getLocalArray().getDataPointer());
      Sync2Pgm(this_pgm, other_pgm);
    }

    Optimization_Manager::Exit_Virtual_Machine();
  }
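The code above is only the sending side of the coupling; the other program would be structured the same way, building its own data descriptor and region set and computing the matching schedule. A rough sketch under that assumption follows; the name ComputeScheduleForReceiver and the exact send/receive ordering are guesses for illustration, not API taken from these slides.

    // Receiver-side sketch (assumed): mirror of the sender above
    other_pgm = WaitPgm(other_pgm_name, NPES_other);     // find the sending program
    Sync2Pgm(this_pgm, other_pgm);

    BP_set = Alloc_setOfRegion();                        // regions this program imports
    reg = Alloc_R_Block(DIM, left, right, stride);
    Add_Region_setOfRegion(reg, BP_set);

    BP_da = getPartiDescriptor(daUn);                    // this program's own distribution
    sched = ComputeScheduleForReceiver(..., BP_da, BP_set, ...);   // assumed receiver-side analogue

    for (int i = 1; i < iNSteps; i++) {
      // ... advance this program's part of the domain ...
      iDataMoveRecv(other_pgm, sched, daUn.getLocalArray().getDataPointer());
      iDataMoveSend(other_pgm, sched, daUn.getLocalArray().getDataPointer());
      Sync2Pgm(this_pgm, other_pgm);
    }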
9 We are using MetaChaos and Overture for Space Science
10 Space weather framework
- A set of tools/services
  - not an integrated framework
- To allow new models/programs to interoperate (exchange data) with ones that already use the tools/interfaces
- Application builder plugs together various models, specifies how/when they interact (exchange data)
- There are already at least 5 physical models, with more to come
  - from CISM (Center for Integrated Space Weather Modeling, led by Boston U.)
11 What are we working on now?
- Adding generalized block data distributions and completely irregular, explicit distributions (sketched below)
- Infrastructure for controlling interactions between programs
  - the tools for building coupled applications to run in the high performance, distributed, heterogeneous Grid environment, not just a coordination language
  - built on top of basic Grid services (Globus, NWS, resource schedulers/co-schedulers, etc.)
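For concreteness, here is a hypothetical sketch of what those two kinds of distributions might record; the struct names and fields are assumptions for illustration, not the actual MetaChaos data descriptor:

    #include <cstddef>
    #include <vector>

    // Generalized block: the global index space is cut at arbitrary points along
    // each dimension, and each resulting block is owned by one process.
    struct GeneralizedBlock2D {
      std::vector<int> cutsX, cutsY;     // partition points along each dimension
      std::vector<int> blockOwner;       // owning process id for each block
    };

    // Completely irregular, explicit distribution: every element's owner and
    // local location are listed explicitly.
    struct ExplicitDistribution {
      std::vector<int>         elementOwner;   // process id per global element
      std::vector<std::size_t> localIndex;     // position within that process's local storage
    };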
12 (No Transcript)
13 What is Overture?
- A collection of C++ classes that can be used to solve PDEs on overlapping grids
- Key Features
  - High level interface for PDEs on adaptive and curvilinear grids
  - Provides a library of finite difference operators (see the sketch below)
    - Conservative/Nonconservative
    - 2nd and 4th order
  - Uses the A++/P++ array class for serial and parallel array operations
  - Extensive grid generation capabilities
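As an illustration of that array-class style (written for this summary, not taken from Overture itself), a 2nd-order centered difference can be applied to a whole index range at once with the same doubleArray/Index classes used in the wave equation example on slide 7; the sizes and spacing below are made-up values:

    const int    iNumX = 64, iNumY = 64;     // interior grid size (illustrative)
    const double dDX   = 0.1;                // grid spacing (illustrative)

    doubleArray u(iNumX + 2, iNumY + 2), uxx(iNumX + 2, iNumY + 2);
    Index I(1, iNumX), J(1, iNumY);          // interior points only

    // Shifted Index arithmetic applies the stencil to the whole interior in one
    // statement, which runs serially with A++ and data-parallel with P++.
    uxx(I, J) = (u(I-1, J) - 2.0 * u(I, J) + u(I+1, J)) / (dDX * dDX);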
14 Overture: A toolkit for solving PDEs