Title: DEISA
1 DEISA
www.deisa.org
Achim Streit
2 Agenda
- Introduction
- SA3 Resource Management
- DEISA Extreme Computing Initiative
- Conclusion
3 The DEISA Consortium
DEISA is a consortium of leading national supercomputer centers in Europe:
- IDRIS (CNRS), France
- FZJ, Jülich, Germany
- RZG, Garching, Germany
- CINECA, Bologna, Italy
- EPCC, Edinburgh, UK
- CSC, Helsinki, Finland
- SARA, Amsterdam, The Netherlands
- HLRS, Stuttgart, Germany
- BSC, Barcelona, Spain
- LRZ, Munich, Germany
- ECMWF (European organization), Reading, UK
Funded by the European Union under FP6. Grant period: May 1st, 2004 to April 30th, 2008.
4 DEISA objectives
- To enable Europe's terascale science by integrating Europe's most powerful supercomputing systems.
- Enabling scientific discovery across a broad spectrum of science and technology is the only criterion for success.
- DEISA is a European supercomputing service built on top of existing national services.
- DEISA deploys and operates a persistent, production-quality, distributed, heterogeneous supercomputing environment with continental scope.
5 Basic requirements and strategies for the DEISA research infrastructure
- Fast deployment of a persistent, production-quality, Grid-empowered supercomputing infrastructure with continental scope.
- A European supercomputing service built on top of existing national services requires reliability and non-disruptive behavior.
- User and application transparency.
- Top-down approach: technology choices follow from the business and operational models of our virtual organization. DEISA's technology choices are fully open.
6 The DEISA supercomputing Grid: a layered infrastructure
- Inner layer: a distributed super-cluster resulting from the deep integration of similar IBM AIX platforms at IDRIS, FZ-Jülich, RZG-Garching and CINECA (phase 1), then CSC (phase 2). It appears to external users as a single supercomputing platform.
- Outer layer: a heterogeneous supercomputing Grid
  - IBM AIX super-cluster (IDRIS, FZJ, RZG, CINECA, CSC), close to 24 Tf
  - BSC, IBM PowerPC Linux system, 40 Tf
  - LRZ, Linux cluster (2.7 Tf), moving to an SGI Altix system (33 Tf in 2006, 70 Tf in 2007)
  - SARA, SGI Altix Linux cluster, 2.2 Tf
  - ECMWF, IBM AIX system, 32 Tf
  - HLRS, NEC SX-8 vector system, close to 10 Tf
7 Logical view of the phase 2 DEISA network
(Diagram: the national research networks FUnet, SURFnet, DFN, RENATER, UKERNA, GARR and RedIRIS, interconnected through GÉANT.)
8 AIX Super-Cluster, May 2005
Services:
- High-performance data grid via GPFS: access to remote files uses the full available network bandwidth.
- Job migration across sites: used to load-balance the global workflow when a huge partition is allocated to a DEISA project at one site.
- Common Production Environment.
(Site map, including CSC and ECMWF.)
9 Service Activities
- SA1: Network Operation and Support (FZJ)
  - Deployment and operation of a gigabit-per-second network infrastructure for a European distributed supercomputing platform. Network operation and optimization during project activity.
- SA2: Data Management with Global File Systems (RZG)
  - Deployment and operation of global distributed file systems, as basic building blocks of the inner super-cluster and as a way of implementing global data management in a heterogeneous Grid.
- SA3: Resource Management (CINECA)
  - Deployment and operation of global scheduling services for the European super-cluster, as well as for its heterogeneous Grid extension.
- SA4: Applications and User Support (IDRIS)
  - Enabling the adoption by the scientific community of the distributed supercomputing infrastructure as an efficient instrument for the production of leading computational science.
- SA5: Security (SARA)
  - Providing administration, authorization and authentication for a heterogeneous cluster of HPC systems, with special emphasis on single sign-on.
10 SA3: A Three-Layer Architecture
- Basic services
  - located closest to the operating system of the computing platforms
  - enable the operation of a single or multiple clusters through local or extended batch schedulers and other cluster-like features
- Intermediate services
  - first-level Grid services that allow access to an enlarged, Grid-empowered infrastructure
  - deal with resource and network monitoring and information systems
- Advanced services
  - use the previous layers to implement the global management of the distributed resources of the infrastructure (a schematic sketch follows below)
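To make the layering concrete, here is a minimal Python sketch of how an advanced service could sit on top of the intermediate (information) and basic (batch) layers. All class and method names are invented for illustration; they are not part of any DEISA or UNICORE software.

```python
# Illustrative sketch of the SA3 three-layer idea; names are invented for this example.

class BasicServices:
    """Closest to the operating system: wraps the local or extended batch scheduler."""
    def submit(self, site, job):
        print(f"[basic] submitting '{job['name']}' to the batch scheduler at {site}")
        return f"{site}-job-001"  # pretend job identifier


class IntermediateServices:
    """First-level Grid services: resource and network monitoring, information system."""
    def free_cpus(self, site):
        # Hard-coded numbers stand in for a query to the monitoring/information system.
        return {"FZJ": 512, "CINECA": 256, "RZG": 128}.get(site, 0)


class AdvancedServices:
    """Global management of distributed resources, built only on the two lower layers."""
    def __init__(self, basic, info):
        self.basic = basic
        self.info = info

    def broker_and_submit(self, sites, job):
        # Pick the site with the most free CPUs, then hand the job to the basic layer.
        best = max(sites, key=self.info.free_cpus)
        return self.basic.submit(best, job)


if __name__ == "__main__":
    grid = AdvancedServices(BasicServices(), IntermediateServices())
    print(grid.broker_and_submit(["FZJ", "CINECA", "RZG"], {"name": "cpmd-run"}))
```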
11 Logical Layout
- Services
  - access
  - workflow management
  - co-allocation
  - brokering
  - job rerouting
  - multiple accounting
  - data staging
- Policy implementation through the scheduler (workload, advance reservation, accounting)
- Resource manager
- OS and communication
- Hardware
The layers are stacked top to bottom: the services sit on the scheduler policies, which in turn rest on the resource manager, the OS and communication layer, and the hardware. A toy co-allocation example is sketched below.
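Of the services listed above, co-allocation is the least familiar: the infrastructure must find a common time window in which every participating site can reserve the requested resources. The Python sketch below illustrates only that idea; the site names and availability data are invented and this is not DEISA's scheduling logic.

```python
# Toy co-allocation: find the earliest time window that is free at every site.
# Availability data (hours from now) and site names are invented for illustration.

def common_window(free_windows, duration):
    """free_windows maps a site to a list of (start, end) free intervals.
    Returns the earliest (start, start + duration) slot free at all sites, or None."""
    candidate_starts = sorted(s for ws in free_windows.values() for s, _ in ws)
    for t in candidate_starts:
        fits_everywhere = all(
            any(s <= t and t + duration <= e for s, e in windows)
            for windows in free_windows.values()
        )
        if fits_everywhere:
            return (t, t + duration)
    return None


if __name__ == "__main__":
    availability = {
        "IDRIS":  [(0, 4), (10, 24)],
        "FZJ":    [(2, 18), (20, 24)],
        "CINECA": [(3, 24)],
    }
    print(common_window(availability, duration=6))  # -> (10, 16)
```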
12 UNICORE Infrastructure
- Gateway 4.1.0
- NJS (Network Job Supervisor) 4.2.0
- TSI (Target System Interface) 4.1.0
- J2SE 1.4.2
13 Physical Layout: Resource Management
UNICORE provides uniform access on top of each site's local resource management stack
(LL = LoadLeveler, RM = resource manager, SGI PP = SGI ProPack):

Site     Batch scheduler   Resource manager   Operating system   Hardware
IDRIS    LL backfill       LL RM              AIX 5.2            Power 4
FZJ      LL backfill       LL RM              AIX 5.2            Power 4
RZG      LL backfill       LL RM              AIX 5.2            Power 4
CINECA   LL backfill       LL RM              AIX 5.2            Power 4
CSC      LL backfill       LL RM              AIX 5.2            Power 4
ECMWF    LL backfill       LL RM              AIX 5.2            Power 4
SARA     LSF HPC           LSF RM             RHEL + SGI PP      IA64
LRZ      SGE               SGE RM             RHEL + SGI PP      IA64
BSC      LL backfill       LL RM              SUSE               PPC
HLRS     NEC NQE           NEC NQE RM         NEC OS             SX
14 Physical Layout: Data Management
IBM GPFS (General Parallel File System) over WAN, with per-site clients:

Site     GPFS client    Operating system   Hardware
IDRIS    Ad hoc         AIX 5.2            Power 4
FZJ      Ad hoc         AIX 5.2            Power 4
RZG      Ad hoc (??)    AIX 5.2            Power 4
CINECA   Native         AIX 5.2            Power 4
CSC      Native         AIX 5.2            Power 4
ECMWF    Native         AIX 5.2            Power 4
SARA     Native         RHEL + SGI PP      IA64
LRZ      Native         RHEL + SGI PP      IA64
BSC      Native         SUSE               PPC
HLRS     Native         NEC OS             SX
15 DEISA Supercomputing Grid services
- Workflow management: based on UNICORE plus further extensions and services coming from DEISA's JRA7 and other projects (UniGrids, ...).
- Global data management: a well-defined architecture implementing extended global file systems on heterogeneous systems, fast data transfers across sites, and hierarchical data management at a continental scale.
- Co-scheduling: needed to support Grid applications running on the heterogeneous environment.
- Science Gateways and portals: specific Internet interfaces that hide complex supercomputing environments from end users and facilitate access for new, non-traditional scientific communities.
16 Workflow Application with UNICORE, Global Data Management with GPFS
(Diagram: a UNICORE job-workflow spanning FZJ, CINECA, RZG, IDRIS and SARA; each step runs on one site's CPUs and reads and writes its data in the shared GPFS, with traffic carried over the NRENs.)
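What the figure conveys is that, once GPFS is mounted at every site, a workflow step started by UNICORE at one site can simply read the files written by the previous step at another site, with no explicit staging between steps. The Python sketch below imitates that pattern locally; the directory used as a stand-in for the GPFS mount and the per-step processing are hypothetical.

```python
# Sketch of a multi-site workflow exchanging data through a shared file system.
# A local temporary directory stands in for a WAN-wide GPFS mount; in DEISA the
# steps would be batch jobs dispatched by UNICORE to FZJ, CINECA, RZG, IDRIS and SARA.

from pathlib import Path

GPFS = Path("/tmp/deisa-gpfs-demo")  # stand-in for a shared GPFS mount point


def run_step(site, infile, outfile):
    """Pretend to run one workflow step at `site`, consuming its predecessor's output."""
    data = infile.read_text() if infile else "initial input\n"
    outfile.write_text(data + f"processed at {site}\n")  # visible to every other site


if __name__ == "__main__":
    GPFS.mkdir(parents=True, exist_ok=True)
    previous = None
    for site in ["FZJ", "CINECA", "RZG", "IDRIS", "SARA"]:
        current = GPFS / f"step_{site}.dat"
        run_step(site, previous, current)
        previous = current  # the next step reads this file; no explicit transfer needed
    print(previous.read_text())
```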
17 Resource Management Information System (RMIS)
- Deliver up-to-date and complete resource management information about the Grid
- Provide relevant information to system administrators at remote sites and to end users
- Our approach
  - performed an implementation-independent system analysis
  - attempted to model the DEISA distributed supercomputer platform designed to operate the Grid
  - identified the resource management part as a sub-system that needs to interface with other sub-systems to obtain relevant information
  - the other sub-systems use external tools (monitoring tools, databases and batch systems) with which we need to interface
18 Implementation
- Based on the Ganglia monitoring tool coupled with MDS2/Globus (see the polling sketch below)
- The published data fall into two groups
  - static data (MDS2): refresh time of hours or days
  - dynamic data (Ganglia): refresh time of seconds or minutes
- Web server based on the Ganglia web front end
  - allows the display of any relevant data from MDS2 or Ganglia
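The coupling with Ganglia works because gmond, Ganglia's monitoring daemon, publishes its current metrics as an XML document to any client that connects to its TCP port (8649 by default). The sketch below is a minimal Python poller, assuming a reachable gmond on localhost with the default port; host, port and metric names would of course differ per DEISA site.

```python
# Minimal poll of a Ganglia gmond daemon: connect, read the XML dump, list host metrics.
# Assumes gmond is running and reachable on localhost:8649 (Ganglia's default TCP port).

import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host="localhost", port=8649, timeout=5.0):
    """gmond sends its full XML state and then closes the connection."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def print_load(xml_text):
    """Print the one-minute load reported for every host in the cluster."""
    root = ET.fromstring(xml_text)
    for host in root.iter("HOST"):
        for metric in host.iter("METRIC"):
            if metric.get("NAME") == "load_one":
                print(f"{host.get('NAME')}: load_one = {metric.get('VAL')}")


if __name__ == "__main__":
    print_load(fetch_gmond_xml())
```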
19 Portals (Science Gateways)
- Same concept as TeraGrid's Science Gateways
- Needed to enhance the outreach of supercomputing infrastructures
- Hiding complex supercomputing environments from end users, providing discipline-specific tools and support, and moving in some cases towards community allocations
- Work has already been done by DEISA on Genomics and Material Sciences portals
- Intense brainstorming on the design of a global strategy, if possible interoperable with TeraGrid's Science Gateways
20 Enabling science
- Initial, early-users program: a number of Joint Research Activities integrated in the project from the start.
- Moving towards exceptional users: the DEISA Extreme Computing Initiative.

Activity   Scientific program                                           Partners       Leader
JRA1       Enabling Material Science (CPMD codes, portals)              RZG            Hermann Lederer, RZG
JRA2       Computational environment for applications in Cosmology     EPCC           Gavin Pringle, EPCC
JRA3       Enabling the TORB Plasma Physics code                        RZG            Hermann Lederer, RZG
JRA4       Life science: genomics and eHealth applications              IDRIS, (BSC)   Victor Alessandrini, IDRIS/BSC
JRA5       CFD in the automobile industry                               CINECA, CRI    Roberto Tregnago, CRI
JRA6       Coupled applications: Astrophysics, Combustion, Environment  IDRIS (HLRS)   Gilles Grasseau, IDRIS
21 The Extreme Computing Initiative
- Identification, deployment and operation of a number of flagship applications in selected areas of science and technology.
- Applications must rely on the DEISA Supercomputing Grid services (application profiles have been clearly defined). They will benefit from exceptional resources from the DEISA pool.
- Applications are selected on the basis of scientific excellence, innovation potential, and relevance criteria.
- European call for proposals: April 1st to May 30th, 2005.
22 Evaluation and allocation of DEISA resources
- National evaluation committees evaluate the proposals and determine priorities.
- On the basis of this information, the DEISA consortium examines how the applications map to the resources available in the DEISA pool, and negotiates internally how the resources will be allocated and the final priorities for projects.
- Exceptional DEISA resources will be allocated, as with large scientific instruments, in well-defined time windows (to be negotiated with the users).
23 DEISA Extreme Computing Initiative (DECI)
- Call for Expressions of Interest / Proposals in April and May 2005
- 50 proposals submitted
- Requested CPU time: 32 million CPU-hours
- European countries involved
  - Finland, France, Germany, Greece, Hungary, Italy, Netherlands, Russia, Spain, Sweden, Switzerland, UK
- Proposals by area
  - Materials Science, Quantum Chemistry, Quantum Computing: 16
  - Astrophysics (Cosmology, Stars, Solar System): 13
  - Life Sciences, Biophysics, Bioinformatics: 8
  - CFD, Fluid Mechanics, Combustion: 5
  - Earth Sciences, Climate Research: 4
  - Plasma Physics: 2
  - QCD, Particle Physics, Nuclear Physics: 2
24 Conclusions
- DEISA adopts Grid technologies to integrate national supercomputing infrastructures and to provide a European Supercomputing Service.
- Service activities are supported by the coordinated action of the national centers' staff. DEISA operates as a virtual European supercomputing centre.
- The big challenge we are facing is enabling new, first-class computational science.
- Integrating leading supercomputing platforms with Grid technologies creates a new research dimension in Europe.
25 October 11-12, 2005, ETSI Headquarters, Sophia Antipolis, France
http://summit.unicore.org/2005
In conjunction with Grids@work (Middleware, Components, Users, Contest and Plugtests)
http://www.etsi.org/plugtests/GRID.htm
Supported by