1. The DEISA HPC Grid for Astrophysical Applications
Claudio Gheller, CINECA (c.gheller_at_cineca.it)
2. Disclaimer
- My background: computer science in astrophysics
- My involvement in DEISA: support to scientific extreme computing projects (DECI)
- I'm not: a systems expert, a networking expert
3. Conclusions
DEISA is not Grid computing. It is (super) supercomputing.
4. The DEISA project: overview
What: DEISA (Distributed European Infrastructure for Supercomputing Applications) is a consortium of leading national EU supercomputing centres.
Goals: deploy and operate a persistent, production-quality, distributed supercomputing environment with continental scope.
When: the project is funded by the European Commission, May 2004 - April 2008. It has been re-funded (DEISA2), May 2008 - April 2010.
5. The DEISA project: drivers
- Support High Performance Computing.
- Integrate Europe's most powerful supercomputing systems.
- Enable scientific discovery across a broad spectrum of science and technology.
- Best exploitation of the resources, both at site level and at European level.
- Promote openness and the usage of standards.
6. The DEISA project: what it is NOT
- DEISA is not a middleware development project.
- DEISA, actually, is not a Grid: it does not support Grid computing. Rather, it supports cooperative computing.
7. The DEISA project: core partners
- BSC, Barcelona Supercomputing Centre, Spain
- CINECA, Consorzio Interuniversitario, Italy
- CSC, Finnish Information Technology Centre for Science, Finland
- EPCC/HPCx, University of Edinburgh and CCLRC, UK
- ECMWF, European Centre for Medium-Range Weather Forecasts, UK
- FZJ, Research Centre Juelich, Germany
- HLRS, High Performance Computing Centre Stuttgart, Germany
- LRZ, Leibniz Rechenzentrum, Munich, Germany
- RZG, Rechenzentrum Garching of the Max Planck Society, Germany
- IDRIS, Institut du Développement et des Ressources en Informatique Scientifique, CNRS, France
- SARA, Dutch National High Performance Computing, Netherlands
8. The DEISA project: project organization
Three activity areas:
- Networking: management, coordination and dissemination
- Service Activities: running the infrastructure
- Joint Research Activities: porting and running scientific applications on the DEISA infrastructure
9. DEISA Activities, some (maybe too many) details (1)
- Service Activities
- Network Operation and Support (FZJ leader). Deployment and operation of a gigabit-per-second network infrastructure for a European distributed supercomputing platform.
- Data Management with Global File Systems (RZG leader). Deployment and operation of global distributed file systems, as basic building blocks of the "inner" super-cluster, and as a way of implementing global data management in a heterogeneous Grid.
- Resource Management (CINECA leader). Deployment and operation of global scheduling services for the European super-cluster, as well as for its heterogeneous Grid extension.
- Applications and User Support (IDRIS leader). Enabling the adoption by the scientific community of the distributed supercomputing infrastructure as an efficient instrument for the production of leading computational science.
- Security (SARA leader). Providing administration, authorization and authentication for a heterogeneous cluster of HPC systems, with special emphasis on single sign-on.
10. DEISA Activities, some (maybe too many) details (2)
- Scientific Applications Activities
- JRA1 Material Science (RZG leader)
- JRA2 Cosmology (EPCC leader)
- JRA3 Plasma Physics (RZG leader)
- JRA4 Life Science (IDRIS leader)
- JRA5 Industry (CINECA leader)
- JRA6 Coupled Applications (IDRIS leader)
- JRA7 Access to Resources in Heterogeneous Environments (EPCC leader)
The DEISA Extreme Computing Initiative (DECI): see http://www.deisa.org/applications
11. JRA2: Cosmological Applications
- Goals:
- to give the Virgo Consortium access to the most advanced features of Grid computing by porting their production applications, GADGET and FLASH
- to make effective use of the DEISA infrastructure
- to lay the foundations of a Theoretical Virtual Observatory
- Led by EPCC, which works in close partnership with the Virgo Consortium
- JRA2 managed jointly by Gavin Pringle (EPCC/DEISA) and Carlos Frenk (co-PI of both Virgo and VirtU)
- work progressed after gathering clear user requirements from the Virgo Consortium
- requirements and results published as public DEISA deliverables
12. Current DEISA status
- variety of systems connected via GEANT/GEANT2 (Premium IP)
- centres contribute 5% to 10% of their CPU cycles to DEISA
- running projects selected from the DEISA Extreme Computing Initiative (DECI) calls
Premium IP is a service that offers network priority over other traffic on GÉANT: Premium IP traffic takes priority over all other services.
13. DEISA HPC systems
14. DEISA technical hints: software stack
- UNICORE is the Grid glue
- not built on Globus
- EPCC is developing a UNICORE command-line interface
- Other components:
- IBM's General Parallel File System (GPFS)
- multicluster GPFS can span different systems over a WAN
- recent developments for Linux as well as AIX
- IBM's Load Leveler for job scheduling
- Multicluster Load Leveler can re-route batch jobs to different machines
- also available on Linux
15. DEISA model
- large parallel jobs running on a single supercomputer
- network latency between machines is not a significant issue
- jobs submitted ideally via UNICORE, in practice via Load Leveler
- re-routed where appropriate to remote resources
- single sign-on access via GSI-SSH
- GPFS absolutely crucial to this model
- jobs have access to data no matter where they run
- no source code changes required
- standard fread/fwrite (or READ/WRITE) calls to Unix files
- also have a Common Production Environment
- defines a common set of environment variables
- defined locally to map to appropriate resources
- e.g. DEISA_WORK will point to the local workspace (a minimal sketch follows)
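As a minimal sketch of this point (DEISA_WORK comes from the slide; the file name and payload are invented for illustration), a job can write its results with ordinary C stdio and still land in the right GPFS workspace at whichever site it runs:

```c
/* Sketch only: plain C I/O on the DEISA global file system.
 * DEISA_WORK is set by the Common Production Environment and maps to
 * the local workspace of the site where the job actually executes. */
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    const char *work = getenv("DEISA_WORK");   /* DCPE-defined variable */
    if (!work) {
        fprintf(stderr, "DEISA_WORK not set (DCPE not loaded?)\n");
        return 1;
    }

    char path[4096];
    snprintf(path, sizeof path, "%s/result.dat", work); /* invented name */

    double data[1024] = {0};                   /* stand-in for real output */
    FILE *f = fopen(path, "wb");
    if (!f) { perror("fopen"); return 1; }
    fwrite(data, sizeof data[0], 1024, f);     /* standard fwrite, no Grid API */
    fclose(f);
    return 0;
}
```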
16. Running ideally on DEISA
- Fill all the gaps:
- restart/continue jobs on any machine from file checkpoints (see the sketch after this list)
- no need to recompile the application program
- no need to manually stage data
- multi-step jobs running on multiple machines
- easy access to data for post-processing after a run
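A hedged sketch of the restart idea (the checkpoint path and record layout are invented; real GADGET or FLASH checkpoint formats differ). Because the same absolute /deisa/... path is visible from every DEISA machine, a re-routed job can resume wherever it lands:

```c
/* Sketch of checkpoint-based "gap filling": look for a checkpoint on
 * the shared GPFS, resume if present, otherwise start from step 0. */
#include <stdio.h>

/* Returns the last completed step, or 0 if no checkpoint exists. */
long load_checkpoint(const char *path, double *state, long n) {
    FILE *f = fopen(path, "rb");
    if (!f) return 0;                            /* no checkpoint: fresh start */
    long step = 0;
    if (fread(&step, sizeof step, 1, f) != 1) step = 0;
    fread(state, sizeof(double), (size_t)n, f);  /* saved simulation state */
    fclose(f);
    return step;
}

int main(void) {
    double state[1024] = {0};
    /* invented path; the same mount point exists at every DEISA site */
    long step = load_checkpoint("/deisa/cne/demo/ckpt.bin", state, 1024);
    printf("continuing from step %ld\n", step);
    /* ... run on from here, periodically rewriting the checkpoint ... */
    return 0;
}
```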
17. Running on DEISA: Load Leveler
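As a hedged illustration of this route (class name, resource counts, executable and paths are assumptions, not actual DEISA settings), a Load Leveler command file has roughly this shape:

```
# job.cmd -- illustrative Load Leveler command file; all values assumed
# @ job_type         = parallel
# @ job_name         = gadget2_run
# @ class            = deisa            # site-specific class name (assumed)
# @ node             = 16
# @ tasks_per_node   = 32
# @ wall_clock_limit = 06:00:00
# @ output           = gadget2.$(jobid).out
# @ error            = gadget2.$(jobid).err
# @ queue
./gadget2 param.txt
```

It is submitted with llsubmit job.cmd and monitored with llq; with Multicluster Load Leveler the same file can be re-routed to a remote DEISA machine.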
18. Running ideally on DEISA: UNICORE
19. GPFS Multicluster
HPC systems mount /deisa/<sitename>; users read/write directly from/to these file systems:
/deisa/idr, /deisa/cne, /deisa/rzg, /deisa/fzj, /deisa/csc
20. DEISA Common Production Environment (DCPE)
- What is the DCPE?
- both a set of software (the software stack) and a generic interface to access the software (based on the Modules tool)
- required both to offer a common interface to the users and to hide the differences between local installations
- an essential feature for job migration inside homogeneous super-clusters
- The DCPE includes:
- shells (Bash and Tcsh),
- compilers (C, C++, Fortran and Java),
- libraries (for numerical analysis, data formatting, etc.),
- tools (debuggers, profilers, editors, development tools),
- applications.
21. Modules Framework
- The Modules tool was chosen because it was well known by many sites and many users
- public domain software
- the Tcl implementation is used
- Modules:
- offer a common interface to different software components on different computers,
- hide different names and configurations,
- manage each software package individually, loading only those required into the user environment,
- let each user change the version of each package independently of the others,
- let each user switch independently from the current default version of a package to another one (older or newer). An illustrative session follows.
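For example (the module names below are assumptions, not the DCPE's actual ones), a user session might look like:

```
module avail                    # list software provided through the DCPE
module load fortran             # pick up the site's default Fortran compiler
module load fftw                # load a numerical library (name assumed)
module switch fftw fftw/2.1.5   # change version independently of other tools
module list                     # inspect the currently loaded environment
```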
22. The HPC user's vision
Initial vision: full distributed computing
(diagram: Task1, Task2 and Task3 running distributed across several sites)
23. The HPC user's vision
Initial vision: full distributed computing. Impossible!!!!
(same diagram as the previous slide)
24. The HPC user's vision
Jump computing
(diagram: a task jumping from one machine to another)
25. The HPC user's vision
Jump computing: difficult! HPC applications are HPC applications!!! They are fine-tuned to the architectures.
(same diagram as the previous slide)
26. So what?
Jump computing is useful to reduce queue waiting times. "Find the gap and fill it" can work, better on homogeneous systems.
27. So what?
A single-image file system is a great solution!!!!! (even if it means moving data)
28. So what?
The usual Grid solution requires learning new stuff, and often scientists are not willing to. DEISA relies on Load Leveler (or other common scheduling systems): the same scripts and the same commands you are used to!!! However, only IBM systems support LL. The Common Production Environment offers a shared (and friendly) set of tools to the users. However, compromises must be accepted.
29. Summing up
Growing up, DEISA is moving away from a Grid. In order to fulfill the needs of HPC users, it is trying to become a huge supercomputer. On the other hand, DEISA2 must lead to a service infrastructure, and users' expectations MUST be matched (no more time for experiments).
30. DECI: enabling science on DEISA
- Identification, deployment and operation of a number of "flagship" applications requiring the infrastructure services, in selected areas of science and technology.
- European call for proposals in May-June every year. Applications are selected on the basis of scientific excellence, innovation potential and relevance criteria, with the collaboration of the HPC national evaluation committees.
- DECI users are supported by the Applications Task Force (ATASKF), whose objective is to enable and deploy the Extreme Computing applications.
31. LFI-SIM: DECI Project (2006)
Principal Investigators: Fabio Pasian (INAF-O.A.T.), Hannu Kurki-Suonio (Univ. of Helsinki)
Leading Institutions: INAF-O.A. Trieste and Univ. of Helsinki
Partner Institutions: INAF-IASF Bologna, Consejo Superior de Investigaciones Cientificas (Instituto de Fisica de Cantabria), Max-Planck-Institut für Astrophysik Garching, SISSA Trieste, University of Milano, University Tor Vergata Rome
DEISA Home Site: CINECA
- Planck (useless) overview
- Planck is the 3rd-generation space mission for the mapping and analysis of the microwave sky. Its unprecedented combination of sky and frequency coverage, accuracy, stability and sensitivity is designed to achieve the most efficient detection of the Cosmic Microwave Background (CMB) in both temperature and polarisation. In order to achieve the ambitious goals of the mission, unanimously acknowledged by the scientific community to be of the highest importance, data processing of extreme accuracy is needed.
32. The need for simulations in Planck
- NOT the typical DECI-HPC project!!!
- Simulations are used to:
- assess likely science outcomes
- set requirements on instruments in order to achieve the expected scientific results
- test the performance of data analysis algorithms and infrastructure
- help understand the instrument and its noise properties
- analyze known and unforeseen systematic effects
- deal with known physics and new physics.
- Predicting the data is fundamental to understanding them.
33. Simulation pipeline
(diagram: the simulation pipeline, driven by the instrument parameters)
HUGE COMPUTATIONAL RESOURCES ARE NEEDED: the GRID can be a solution!!!
34. Planck and DEISA
- DEISA was expected to be used to:
- simulate many times the whole mission of Planck's LFI instrument, on the basis of different scientific and instrumental hypotheses
- reduce, calibrate and analyse the simulated data down to the production of the final products of the mission, in order to evaluate the impact of possible LFI instrumental effects on the quality of the scientific results, and consequently to refine appropriately the data processing algorithms.
35. Outcomes
- Planck simulations are essential to get the best possible understanding of the mission and to have a conscious expectation of the unexpected
- They also allow proper planning of Data Processing Centre resources
- The EGEE Grid turned out to be more suitable for such a project, since it provides fast access to small/medium computing resources. Most of the Planck pipeline is happy with such resources!!!
- However, DEISA was useful to produce massive sets of simulated data and to perform and test the data processing steps which require large computing resources (lots of coupled processors, large memories, large bandwidth)
- Interoperation between the two Grid infrastructures (possibly based on the gLite middleware) is expected in the next years