Title: AstroGrid-D WP 5: Resource Management for Grid Jobs
1. AstroGrid-D WP 5: Resource Management for Grid Jobs
- Report by Rainer Spurzem (ZAH-ARI)
- spurzem_at_ari.uni-heidelberg.de
- with T. Brüsemeister and J. Steinacker
2. Meeting 13:10-14:30, WG5
- Meeting of WG5 and friends: GridWay discussion (together with Ignacio Llorente). Expected list of topics:
- The present GridWay installation in Heidelberg: solutions and problems. Which use cases work? How? Demos or screenshots if available.
- How about more than one GridWay installation in AstroGrid-D simultaneously at different sites?
- Cooperation of information system and job submission (in general, or in the special case of our AstroGrid-D information system and GridWay)?
- Miscellaneous (data staging postponed to the next session)
3. Meeting 13:10-14:30, WG5
- GridWay
- Lightweight metascheduler on top of GT2.4/GT4
- Central server architecture
- Support of the GGF DRMAA standard API for job submission and management
- Simple round-robin/flooding scheduling algorithm, but extensible
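As an illustration of the DRMAA route, a minimal C sketch of submitting one job through a DRMAA 1.0 library (such as the one GridWay ships) is given below; the executable is a placeholder and error handling is abbreviated.

/* Minimal DRMAA 1.0 (C binding) sketch: submit one job and wait for it.
 * The executable is a placeholder; error handling is abbreviated. */
#include <stdio.h>
#include "drmaa.h"

int main(void)
{
    char error[DRMAA_ERROR_STRING_BUFFER];
    char jobid[DRMAA_JOBNAME_BUFFER];
    drmaa_job_template_t *jt = NULL;
    int status = 0;

    /* Connect to the (single, central) metascheduler */
    if (drmaa_init(NULL, error, sizeof(error) - 1) != DRMAA_ERRNO_SUCCESS) {
        fprintf(stderr, "drmaa_init failed: %s\n", error);
        return 1;
    }

    /* Describe the job: here only the remote command is set */
    drmaa_allocate_job_template(&jt, error, sizeof(error) - 1);
    drmaa_set_attribute(jt, DRMAA_REMOTE_COMMAND, "/bin/hostname",
                        error, sizeof(error) - 1);

    /* Submit, then block until the job has finished */
    drmaa_run_job(jobid, sizeof(jobid) - 1, jt, error, sizeof(error) - 1);
    printf("submitted job %s\n", jobid);
    drmaa_wait(jobid, NULL, 0, &status, DRMAA_TIMEOUT_WAIT_FOREVER,
               NULL, error, sizeof(error) - 1);

    drmaa_delete_job_template(jt, error, sizeof(error) - 1);
    drmaa_exit(error, sizeof(error) - 1);
    return 0;
}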
4. Meeting 13:10-14:30, WG5
A practical example with screenshots
[Diagram: GT4 resources feed the information system; GridWay (scheduler/broker) on hydra.ari.uni-heidelberg.de performs the matchmaking; job status is queried with gwps]
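To give a flavour of the command-line workflow behind these screenshots, a GridWay submission might look roughly like the sketch below; the template keywords follow the GridWay job template format, while the file names are placeholders and the exact flags may differ between GridWay versions.

# hostname.jt -- illustrative GridWay job template (placeholder names)
EXECUTABLE  = /bin/hostname
STDOUT_FILE = stdout.${JOB_ID}
STDERR_FILE = stderr.${JOB_ID}

# Submit the template and list job states (flags may vary by version)
$ gwsubmit -t hostname.jt
$ gwps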
5. Meeting 13:10-14:30, WG5
Our View (Thanks Hans-Martin)
6. Meeting 13:10-14:30, WG5
- D5.1: central resource broker with queue
- Present use: GridWay as a simple pass-through, round-robin
- More installations useful?
- Questions:
- Which parameters does GridWay need from the information system (queue status, module availability, data availability, hardware)?
- When is it feasible to have real brokerage? How?
7. Meeting 15:15-17:00, Use Cases
- Porting use cases onto the grid: NBODY6
- Astrophysical case for direct N-body: star clusters, galactic nuclei, black holes, gravitational wave generation
- Special hardware: GRAPE, MPRACE (FPGA), future technologies (HT, Xtoll, GRAPE-DR)
- GRAPE in the grid: AstroGrid-D, international
- DEISA
8. Meeting 15:15-17:00, Use Cases
9. N-Body Gravitational Waves at ARI
- Peter Berczik, Ingo Berentzen, Jonathan Downing, Miguel Preto, Gabor Kupi, Christoph Eichhorn
- David Merritt (RIT, USA), in VESF/LSC collaboration on gravitational wave modelling from dense star clusters
- Pau Amaro-Seoane (AEI, Potsdam, D)
- G. Schäfer, A. Gopakumar (Univ. Jena, D)
- M. Benacquista (UT Brownsville, USA)
- Further collaborations: Sverre Aarseth (IoA Cambridge, UK), Seppo Mikkola (U Turku, FIN)
- Jun Makino and colleagues in Tokyo: support and cooperation over many years
10. Globular Cluster ω Centauri (Central Region), Ground-Based View
11. Detection of Gravitational Waves?
Was Einstein right?
12. Example: VIRGO Detector in Cascina near Pisa, Italy
13. Basic idea of any GRAPE N-body code
[Diagram: the host handles the O(N) bookkeeping, while the GRAPE board evaluates the O(N²) pairwise forces]
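The quantity offloaded to the GRAPE hardware is the direct-summation acceleration on each particle, i.e. the O(N²) pairwise sum (standard Newtonian form, with optional softening ε):

\ddot{\mathbf{r}}_i = -G \sum_{j \neq i}^{N} m_j \,
  \frac{\mathbf{r}_i - \mathbf{r}_j}{\left( |\mathbf{r}_i - \mathbf{r}_j|^2 + \varepsilon^2 \right)^{3/2}},
  \qquad i = 1, \dots, N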
14. Hardware: GRAPE
- GRAPE6a PCI board: 128 Gflops for a price of about 5K USD, memory for up to 128K particles
- GRAPE6a, -BL: PCI boards for PC clusters
- PROGRAPE-4: FPGA-based board from RIKEN (Hamada)
- GRAPE7: new FPGA-based board from Tokyo Univ. (Fukushige)
- GRAPE-DR: new board from Makino et al., NAOJ
- MPRACE1, 2: FPGA boards from Univ. Mannheim / GRACE (Kugel et al.)
16. ARI and RIT 32-node GRAPE6a clusters
RIT cluster:
- 32 dual-Xeon 3.0 GHz nodes
- 32 GRAPE6a
- 14 TB RAID
- Infiniband link (10 Gb/s)
- Speed: 4 Tflops
- N up to 4M
- Cost: 500K USD
- Funding: NSF/NASA/RIT
ARI cluster:
- 32 dual-Xeon 3.2 GHz nodes
- 32 GRAPE6a
- 32 FPGA
- 7 TB RAID
- Dual-port Infiniband link (20 Gb/s)
- Speed: 4 Tflops
- N up to 4M
- Cost: 380K EUR
- Funding: Volkswagen / Baden-Württemberg
17. ARI-ZAH and RIT GRAPE6a clusters
Performance analysis (3.2 Tflop/s), Harfst et al. 2007, New Astron.
19. Hardware
25. Meeting 15:15-17:00, Use Cases
Software: high-accuracy integrators for systems with long-range forces and (gravothermal) relaxation
- S. J. Aarseth, S. Mikkola (ca. 20,000 lines)
- Hierarchical block time steps
- Ahmad-Cohen neighbour scheme
- Kustaanheimo-Stiefel and chain regularization for bound subsystems of N < 6 (quaternions!)
- 4th-order Hermite scheme (predictor/corrector); prediction step sketched below
- Bulirsch-Stoer (for KS)
- NBODY6 (Aarseth 1999)
- NBODY6++ (Spurzem 1999) using MPI/shmem, copy algorithm
- Parallel binary integration in progress
- Parallel GRAPE use (Harfst, Gualandris, Merritt, Spurzem, Berczik, Portegies Zwart 2007)
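For reference, the prediction step of the 4th-order Hermite scheme advances each particle from its last force evaluation, using the stored acceleration \mathbf{a}_0 and its time derivative (jerk) \dot{\mathbf{a}}_0:

\mathbf{x}_p = \mathbf{x}_0 + \mathbf{v}_0\,\Delta t + \tfrac{1}{2}\,\mathbf{a}_0\,\Delta t^2 + \tfrac{1}{6}\,\dot{\mathbf{a}}_0\,\Delta t^3 ,
\qquad
\mathbf{v}_p = \mathbf{v}_0 + \mathbf{a}_0\,\Delta t + \tfrac{1}{2}\,\dot{\mathbf{a}}_0\,\Delta t^2 ;

the corrector then uses the force and jerk evaluated at the predicted positions to refine the step.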
26. Meeting 15:15-17:00, Use Cases
High-accuracy integrators: record with the GRAPE cluster at 2 million particles!
- Harfst, Gualandris, Merritt, Spurzem, Berczik
- Baumgardt, Heggie, Hut; Baumgardt, Makino
[Plot by D. C. Heggie, via www.maths.ed.ac.uk] Larger N needed!
27. Meeting 15:15-17:00, Use Cases
ARI cluster: 3.2 Tflop/s sustained
Harfst, Gualandris, Merritt, Spurzem, Portegies Zwart, Berczik, New Astron. 2007
Parallel PP on the GRAPE6a cluster
28. Visualisation
With S. Dominiczak and W. Frings, John von Neumann Institute for Computing (NIC), FZ Jülich; google for xnbody
29. Meeting 15:15-17:00, Use Cases
- Xnbody visualization with FZ Jülich (UNICORE)
- NBODY6 use case in AstroGrid-D (Globus GT4.0)
- Simple JSDL job: ok (a minimal JSDL sketch follows below)
- Parallel job with GRAPE/MPRACE request: in progress in AstroGrid-D
- Participation in international networks, like MODEST, AGENA (EGEE)
- Goal: share and load-balance GRAPE/MPRACE resources in an international grid-based framework
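For orientation, a "simple JSDL job" of the kind referred to above can be written against the standard JSDL 1.0 schema roughly as follows; the executable and output file are placeholders only.

<!-- Illustrative JSDL 1.0 document; executable and output file are placeholders -->
<jsdl:JobDefinition xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl"
                    xmlns:jsdl-posix="http://schemas.ggf.org/jsdl/2005/11/jsdl-posix">
  <jsdl:JobDescription>
    <jsdl:Application>
      <jsdl-posix:POSIXApplication>
        <jsdl-posix:Executable>/bin/hostname</jsdl-posix:Executable>
        <jsdl-posix:Output>hostname.out</jsdl-posix:Output>
      </jsdl-posix:POSIXApplication>
    </jsdl:Application>
  </jsdl:JobDescription>
</jsdl:JobDefinition>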
30. International GRAPE-Grid Collaboration (Meeting 15:15-17:00, Use Cases)
Members of AstroGrid-D:
- ARI-ZAH, Univ. Heidelberg, D
- Main Astron. Obs. Kiev, UA
Candidates:
- Univ. Amsterdam, NL
- Obs. Astroph. Marseille, F
- Fessenkov Obs., Almaty, KZ
31. Meeting 15:15-17:00, Use Cases
- Fortran 77 with cpp preprocessor and make
- Data access for job chains
- Staging of binary and ASCII input/output
- Optional:
- Parallel runs (PBS, mpich-mpif77, mpirun, others); a batch-script sketch follows below
- GRAPE hardware
- xnbody direct visualization and interaction interface
- Future:
- GridMPI, runs across sites
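As a sketch of how such a parallel run might be launched under PBS with mpich, a minimal batch script is shown below; the binary name, node count and input/output files are hypothetical placeholders.

#!/bin/sh
# Hypothetical PBS script for a parallel NBODY6++ run (all names are placeholders)
#PBS -N nbody6
#PBS -l nodes=8
cd $PBS_O_WORKDIR
# mpich's mpirun; the machine file is provided by PBS
mpirun -np 8 -machinefile $PBS_NODEFILE ./nbody6++ < input.dat > output.log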
32. Meeting 17:30-18:30, WG5 with WG3
- Common workgroup meeting of WG3 (Distributed Data Management) with WG5 (Resource Management for Grid Jobs). Expected list of topics:
- How can we improve data staging together? Which steps, what is needed, action items, people?
- Further interaction with other WGs, e.g. WG7 user interfaces, WG6 data streaming, WG1 system integration
- Next deliverables 5.4-5.8, others...
- Open discussion on sustainability, internationality, EGEE, follow-up project, breakout ideas, guided by last year's goals
33. Meeting 17:30-18:30, WG5 with WG3
- How can we improve data staging together? Which steps, what is needed, action items, people?
- Use the AstroGrid-D file management system?
34. WP5 Resource Management for Grid Jobs: Tasks
- Task V-1: Specification of Requirements and Architecture
- AIP (8), ARI-ZAH (6), ZIB (6), AEI (2), MPE (2), MPA (1)
- Start Sep. 05; Deliverable D5.1 Oct. 2006: COMPLETED
- Task V-2: Development of Grid-Job Management (Feb. 07)
- ZIB (24), ARI-ZAH (12), MPA (5)
- Start June 06; Deliverables D5.2 Feb. 2007, D5.6 June 2008; D5.2 COMPLETED
- Task V-4: Adaptation of User and Programmer Interfaces (May 07)
- AIP (18), ARI-ZAH (12), AEI (5), MPE (4), MPA (1)
- Start Dec. 06; Deliverables D5.4 May 2007, D5.7 Sep. 2008: PENDING
- Task V-3: Development of Link to Robotic Telescopes, Requests (Feb. 07)
- AIP (17), ZIB (6)
- Start Sep. 06; Deliverables D5.3 Feb. 2007, D5.5 Oct. 2007, D5.8 Sep. 2008: IN PROGRESS
35. Meeting 17:30-18:30, WG5 with WG3
Next steps in WG-5 / WG-3
- Short term:
- Improve the deployment by pushing the implementation of modules for at least 2-5 pioneer use cases (this year): D5.4, D5.7
- Demonstrate the ability to deploy and run these use cases on more than one resource using GridWay (this year): D5.4, D5.7
- Use first primitive data staging (handing data through)
- Note: useful document "GridGateWay", 2007-10-05, by HMA et al.
- Middle term:
- Enable GridWay as the AstroGrid-D job manager (May 08): D5.6
- Solve the problem of how to handle data management together with GridWay (Aug 08): TA II-5
- Increase the number of use cases and prospective users: D5.4
- Improve international impact / compatibility issues, e.g. with EGEE
36. WG5 Current Status: Job Management
- Decision in favour of the Job Submission Description Language (JSDL)
- JSDL is supported by the Open Grid Forum (OGF)
[Diagram: GUI → JSDL → jsdlproc → RSL/XML → GT4.0]
- (GT4.2 is currently under development and will support JSDL directly)
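The GT4.0 target of this conversion is a WS-GRAM job description in RSL/XML; a hand-written sketch of what such a document looks like is given below (not actual jsdlproc output; namespace declaration omitted, file names are placeholders).

<!-- Sketch of a GT4.0 WS-GRAM job description (RSL/XML); not actual jsdlproc output -->
<job>
  <executable>/bin/hostname</executable>
  <stdout>${GLOBUS_USER_HOME}/hostname.out</stdout>
  <stderr>${GLOBUS_USER_HOME}/hostname.err</stderr>
  <count>1</count>
</job>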
37. WG5 Current Status: Scheduler/Broker
- GridWay
- Lightweight metascheduler on top of GT2.4/GT4
- Central server architecture
- Support of the GGF DRMAA standard API for job submission and management
- Simple round-robin/flooding scheduling algorithm, but extensible
38. WG5 Current Status: Scheduler/Broker
[Diagram, as on slide 4: GT4 resources feed the information system; GridWay (scheduler/broker) on hydra.ari.uni-heidelberg.de performs the matchmaking; job status is queried with gwps]
39. WG5 Current Status: Robotic Telescopes (STELLA-I)
- First steps accomplished toward the integration into AstroGrid-D
- Adopted the Remote Telescope Markup Language (RTML) and developed a first description of STELLA-I
- This description can contain dynamic information, e.g. about the weather
- Developed a generic transformation from RTML to RDF, which we can upload to the AstroGrid-D information service (for this we modified the program OwlMap from the FRESCO project)
- The user can use SPARQL queries to find appropriate telescopes (an illustrative query follows below)
- SPARQL queries can also be implemented in tools like the Grid-Resource Map
Robotic telescopes STELLA-I and STELLA-II in Tenerife (Canary Islands)
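A telescope-discovery query of the kind mentioned above could look roughly like the following; the RDF prefix and property names are purely hypothetical, since the actual AstroGrid-D telescope vocabulary is not reproduced here.

# Hypothetical vocabulary; the real AstroGrid-D RDF properties may differ
PREFIX tel: <http://example.org/robotic-telescope#>
SELECT ?telescope ?site
WHERE {
  ?telescope tel:acceptsRequestFormat "RTML" .
  ?telescope tel:site ?site .
  ?telescope tel:weatherCondition "clear" .
}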
40. WG5 Next Steps: Robotic Telescopes
- Next steps:
- RTML descriptions of STELLA-II, RoboTel and other robotic telescopes
- Develop a system that adds dynamic weather information
- Develop transformations from RTML to the telescope-specific languages of AIP-operated telescopes, to be able to send observation requests in RTML
- Provide access through AstroGrid-D by applying:
- Grid security mechanisms
- VO management
- Development of a scheduler for a network of robotic telescopes
- A lot of testing
- The AIP has a simulator for STELLA and RoboTel