Title: Grid Computing at NIC
1 Grid Computing at NIC
Achim Streit and team
a.streit@fz-juelich.de
2 Grid Projects at FZJ
- UNICORE 08/1997-12/1999
- UNICORE Plus 01/2000-12/2002
- EUROGRID 11/2000-01/2004
- GRIP 01/2002-02/2004
- OpenMolGRID 09/2002-02/2005
- VIOLA 05/2004-04/2007
- DEISA 05/2004-04/2009
- UniGrids 07/2004-06/2006
- NextGrid 09/2004-08/2007
- CoreGRID 09/2004-08/2008
- D-Grid 09/2005-02/2008
3 UNICORE
- a vertically integrated Grid middleware system
- provides seamless, secure, and intuitive access to distributed resources and data
- used in production and in projects worldwide
- features: intuitive GUI with single sign-on; X.509 certificates for authentication/authorization and job/data signing; only one open firewall port required; workflow engine for complex multi-site/multi-step workflows; matured job monitoring; extensible application support with plug-ins; interactive access with UNICORE-SSH; integrated secure data transfer; resource management with full control remaining at the site; production quality; ...
4 Architecture
- Client: workflow engine, resource management, job monitoring, file transfer, user management, application support; prepares multi-site jobs and connects via SSL through an optional firewall.
- Gateway (one per Usite): authentication.
- NJS (one per Vsite, behind an optional firewall): authorization against the UUDB (similar to /etc/grid-security/grid-mapfile) and incarnation of the abstract job using the IDB; a minimal sketch of such a mapping follows below.
- TSI (non-abstract layer, similar to the Globus jobmanager): interfaces to the local RMS and disk; supports fork, LoadLeveler, (Open)PBS(Pro), CCS, LSF, NQE/NQS, ..., CONDOR, GT 2.4.
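The UUDB step above maps the distinguished name (DN) of the user's X.509 certificate to a local login, much like a Globus grid-mapfile. The following is a minimal, illustrative Java sketch of such a mapping table; the class and method names (SimpleUudb, addMapping, incarnate) are assumptions for this sketch and not part of the UNICORE code base.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Illustrative sketch only: maps certificate DNs to local logins,
// similar in spirit to /etc/grid-security/grid-mapfile.
public class SimpleUudb {
    private final Map<String, String> dnToLogin = new HashMap<>();

    public void addMapping(String dn, String localLogin) {
        dnToLogin.put(dn, localLogin);
    }

    // Resolve the authenticated DN to the local account under which
    // the NJS/TSI would incarnate and run the job.
    public Optional<String> incarnate(String dn) {
        return Optional.ofNullable(dnToLogin.get(dn));
    }

    public static void main(String[] args) {
        SimpleUudb uudb = new SimpleUudb();
        uudb.addMapping("CN=Jane Doe,O=FZJ,C=DE", "jdoe");   // hypothetical entry
        System.out.println(uudb.incarnate("CN=Jane Doe,O=FZJ,C=DE").orElse("denied"));
    }
}
```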
5 UNICORE Client
6 UNICORE-SSH
- uses the standard UNICORE security mechanisms to open an SSH connection through the standard SSH port
- launched via the UNICORE-SSH button in the client GUI
7 Workflow Automation Speed-up
- Automate, integrate, and speed up drug discovery in the pharmaceutical industry
- University of Ulster: data warehouse
- University of Tartu: compute resources
- FZ Jülich: Grid middleware
- ComGenex Inc.: data, users
- Istituto di Ricerche Farmacologiche Mario Negri: users
8 Workflow Automation Speed-up
- automatic split-up of a data-parallel task (sketched below)
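The automatic split-up can be pictured as partitioning a large input set (for example a molecule library) into chunks that are submitted as independent sub-jobs and merged afterwards. The sketch below is illustrative only; the chunking strategy and the names splitIntoChunks and chunkCount are assumptions, not the project's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: split a data-parallel task into independent chunks
// that could each be incarnated as a separate sub-job.
public class DataParallelSplit {

    static <T> List<List<T>> splitIntoChunks(List<T> items, int chunkCount) {
        List<List<T>> chunks = new ArrayList<>();
        int size = (int) Math.ceil(items.size() / (double) chunkCount);
        for (int start = 0; start < items.size(); start += size) {
            chunks.add(items.subList(start, Math.min(start + size, items.size())));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> molecules = List.of("m1", "m2", "m3", "m4", "m5", "m6", "m7");
        // One chunk per target site or queue; results would be merged afterwards.
        splitIntoChunks(molecules, 3).forEach(chunk ->
                System.out.println("sub-job input: " + chunk));
    }
}
```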
9 UNICORE at SourceForge
- Open Source under BSD license
- Supported by FZJ
- Integration of own results and of results from other projects
- Release management
- Problem tracking
- CVS, mailing lists
- Documentation
- Assistance
- Viable basis for many projects: DEISA, VIOLA, UniGrids, D-Grid, NaReGI
- http://unicore.sourceforge.net
10 From Testbed to Production
From 2002 to 2005:
- Different communities
- Different computing resources (supercomputers, clusters, ...)
- Know-how in Grid middleware
Success factor: vertical integration
11 Production
- National high-performance computing centre: John von Neumann Institute for Computing (NIC)
- About 650 users in 150 research projects
- Access via UNICORE to
  - IBM p690 eSeries cluster (1312 CPUs, 8.9 TFlops)
  - IBM BlueGene/L (2048 CPUs, 5.7 TFlops)
  - Cray XD1 (72 CPUs)
- 116 active UNICORE users (72 external, 44 internal)
- Resource usage (CPU-hours): Dec 18.4, Jan 30.4, Feb 30.5, Mar 27.1, Apr 29.7, May 39.1, Jun 22.3, Jul 20.2, Aug 29.0
12 Grid Interoperability: UNICORE and Globus Toolkit
Uniform Interface to Grid Services (UniGrids): OGSA-based UNICORE/GS, WSRF interoperability
13 Architecture: UNICORE jobs on Globus resources
- UNICORE side: Client, Gateway, NJS (with UUDB and IDB), TSI with Uspace, plus GridFTP and GRAM clients
- Globus 2 side: GRAM Gatekeeper, GRAM Job-Manager, MDS, GridFTP server, local RMS
14 Consortium
- Research Center Jülich (project manager)
- Consorzio Interuniversitario per il Calcolo Automatico dell'Italia Nord Orientale
- Fujitsu Laboratories of Europe
- Intel GmbH
- University of Warsaw
- University of Manchester
- T-Systems SfR
Funded by EU grant IST-2002-004279
15 Web Services
- UNICORE/GS architecture (diagram legend: UNICORE component, new component, Web Services interface)
- Access UNICORE components as Web Services
- Integrate Web Services into the UNICORE workflow
16 Atomic Services
- UNICORE basic functions exposed as services (a client-side sketch follows this list)
  - Site Management (TSF/TSS): compute resource factory, submit, resource information
  - Job Management (JMS): start, hold, abort, resume
  - Storage Management (SMS): list directory, copy, make directory, rename, remove
  - File Transfer (FTS): file import, file export
- Standardization
  - JSDL WG revitalized by UniGrids and NAREGI
  - Atomic Services are input to the OGSA-BES WG
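To make the division of labour concrete, the sketch below models the listed atomic services as plain Java interfaces and walks through a typical submit/start/monitor sequence. All interface and method names here are illustrative assumptions; the real UNICORE/GS atomic services are WSRF web services reached through generated stubs, not local Java objects.

```java
import java.util.List;

// Illustrative sketch of the atomic-service split; names are hypothetical,
// the real services are WSRF endpoints (TSF/TSS, JMS, SMS, FTS).
interface TargetSystemService {          // Site Management (TSF/TSS)
    JobManagement submit(String jobDescription);   // returns a job resource
    String resourceInformation();
}

interface JobManagement {                // Job Management (JMS)
    void start();
    void hold();
    void resume();
    void abort();
    String status();
}

interface StorageManagement {            // Storage Management (SMS)
    List<String> listDirectory(String path);
    void copy(String from, String to);
    void makeDirectory(String path);
    void rename(String from, String to);
    void remove(String path);
}

public class AtomicServicesSketch {
    // Typical client-side sequence against the interfaces above.
    static void runJob(TargetSystemService tss, StorageManagement sms) {
        sms.makeDirectory("/work/job42");            // hypothetical stage-in area
        JobManagement job = tss.submit("<abstract job description>");
        job.start();
        System.out.println("job state: " + job.status());
    }

    public static void main(String[] args) {
        System.out.println("See runJob(...) for the intended call sequence.");
    }
}
```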
17 Three levels of interoperability
- Level 1: interoperability between WSRF services
  - UNICORE/GS passed the official WSRF interop test
  - GPE and JOGSA hosting environments successfully tested against UNICORE/GS and other endpoints
  - WSRF specification will be finalized soon!
  - Currently UNICORE/GS uses WSRF 1.3, GTK uses WSRF 1.2 draft 1
- WSRF hosting environments: JOGSA-HE, GPE-HE, GTK4-HE, UNICORE/GS-HE
18 Three levels of interoperability
- Level 2: interoperability between atomic service implementations
  - Client API hides the details of the WSRF hosting environment
  - Client code works with different WSRF implementations and WSRF versions, although at the moment different stubs still have to be used
- Layer diagram: atomic and advanced services (CGSP, GPE-Workflow, GTK4, UoM-Broker, UNICORE/GS, GPE-Registry) on top of the WSRF service API (JOGSA, GTK4, UNICORE/GS, GPE), on top of the WSRF hosting environments (JOGSA-HE, GPE-HE, GTK4-HE, UNICORE/GS-HE)
19 Three levels of interoperability
- Level 3: GridBeans working on top of different client implementations (a sketch of the GridBean idea follows)
  - independent of atomic service implementations
  - independent of the specification versions being used
  - GridBeans run on GTK or UNICORE/GS without modifications
  - GridBeans survive version changes in the underlying layers and are easy to maintain
- Layer diagram: clients (Portal, Apps, Expert, Visit, GPE) on top of the atomic service client API, which sits on the atomic and advanced services, the WSRF service API, and the WSRF hosting environments shown on the previous slide
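One way to picture the GridBean idea: the application plug-in talks only to an abstract client API and produces a backend-neutral job description, so the same plug-in can be driven by different underlying client implementations. The interface and class names below (GridBean, ExampleSimulationBean, toAbstractJob) are purely illustrative assumptions, not the actual GPE API.

```java
import java.util.Map;

// Illustrative sketch of the GridBean idea: the application plug-in is written
// against an abstract API only, so it does not care whether the underlying
// client talks to UNICORE/GS or GTK4. All names here are hypothetical.
interface GridBean {
    String name();
    Map<String, String> toAbstractJob();   // backend-neutral job description
}

class ExampleSimulationBean implements GridBean {
    private final int steps;

    ExampleSimulationBean(int steps) { this.steps = steps; }

    public String name() { return "example-simulation"; }

    public Map<String, String> toAbstractJob() {
        // No WSRF version, stub class, or endpoint details appear here;
        // translating this map into a concrete job is the client's task.
        return Map.of("executable", "simulate", "arguments", "--steps " + steps);
    }
}

public class GridBeanSketch {
    public static void main(String[] args) {
        GridBean bean = new ExampleSimulationBean(1000);
        System.out.println(bean.name() + " -> " + bean.toAbstractJob());
    }
}
```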
20 Consortium
DEISA is a consortium of leading national supercomputing centres in Europe:
- IDRIS (CNRS), France
- FZJ, Jülich, Germany
- RZG, Garching, Germany
- CINECA, Bologna, Italy
- EPCC, Edinburgh, UK
- CSC, Helsinki, Finland
- SARA, Amsterdam, The Netherlands
- HLRS, Stuttgart, Germany
- BSC, Barcelona, Spain
- LRZ, Munich, Germany
- ECMWF (European organization), Reading, UK
Funded by the European Union under FP6; grant period May 1, 2004 to April 30, 2008
21 DEISA objectives
- To enable Europe's terascale science by integrating Europe's most powerful supercomputing systems.
- Enabling scientific discovery across a broad spectrum of science and technology is the only criterion for success.
- DEISA is a European supercomputing service built on top of existing national services.
- DEISA deploys and operates a persistent, production-quality, distributed, heterogeneous supercomputing environment with continental scope.
22 Basic requirements and strategies for the DEISA research infrastructure
- Fast deployment of a persistent, production-quality, Grid-empowered supercomputing infrastructure with continental scope.
- A European supercomputing service built on top of existing national services requires reliability and non-disruptive behaviour.
- User and application transparency.
- Top-down approach: technology choices result from the business and operational models of our virtual organization. DEISA technology choices are fully open.
23 The DEISA supercomputing Grid: a layered infrastructure
- Inner layer: a distributed super-cluster resulting from the deep integration of similar IBM AIX platforms at IDRIS, FZ Jülich, RZG Garching and CINECA (phase 1), then CSC (phase 2). It looks to external users like a single supercomputing platform.
- Outer layer: a heterogeneous supercomputing Grid
  - IBM AIX super-cluster (IDRIS, FZJ, RZG, CINECA, CSC), close to 24 Tf
  - BSC, IBM PowerPC Linux system, 40 Tf
  - LRZ, Linux cluster (2.7 Tf) moving to an SGI Altix system (33 Tf in 2006, 70 Tf in 2007)
  - SARA, SGI Altix Linux cluster, 2.2 Tf
  - ECMWF, IBM AIX system, 32 Tf
  - HLRS, NEC SX-8 vector system, close to 10 Tf
24 Logical view of the phase 2 DEISA network
Connected NRENs: Funet, SURFnet, DFN, GÉANT, RENATER, UKERNA, GARR, RedIRIS
25 AIX super-cluster (May 2005)
- Services
  - High-performance data Grid via GPFS: access to remote files uses the full available network bandwidth
  - Job migration across sites: used to load-balance the global workflow when a huge partition is allocated to a DEISA project at one site
  - Common Production Environment
- Sites shown include CSC and ECMWF
26 Service Activities
- SA1 Network Operation and Support (FZJ): deployment and operation of a gigabit-per-second network infrastructure for a European distributed supercomputing platform; network operation and optimization during project activity.
- SA2 Data Management with Global File Systems (RZG): deployment and operation of global distributed file systems, as basic building blocks of the inner super-cluster and as a way of implementing global data management in a heterogeneous Grid.
- SA3 Resource Management (CINECA): deployment and operation of global scheduling services for the European super-cluster, as well as for its heterogeneous Grid extension.
- SA4 Applications and User Support (IDRIS): enabling the adoption by the scientific community of the distributed supercomputing infrastructure as an efficient instrument for the production of leading computational science.
- SA5 Security (SARA): providing administration, authorization and authentication for a heterogeneous cluster of HPC systems, with special emphasis on single sign-on.
27 DEISA Supercomputing Grid services
- Workflow management: based on UNICORE, plus further extensions and services coming from DEISA's JRA7 and other projects (UniGrids, ...)
- Global data management: a well-defined architecture implementing extended global file systems on heterogeneous systems, fast data transfers across sites, and hierarchical data management at a continental scale.
- Co-scheduling: needed to support Grid applications running on the heterogeneous environment.
- Science gateways and portals: specific Internet interfaces that hide complex supercomputing environments from end users and facilitate access for new, non-traditional scientific communities.
28 Workflow Application with UNICORE, Global Data Management with GPFS
- A job workflow moves between FZJ, CINECA, RZG, IDRIS and SARA; each step uses the local CPUs while the data stays in the shared GPFS, reached over the NRENs (a sketch of this pattern follows)
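As a thought model, the pattern above can be sketched as a sequence of steps that each run at a different site but read and write the same path in a global GPFS namespace, so no explicit file staging is needed between steps. Everything below (the Step record, the site names as plain strings, the /deisa/gpfs path) is an illustrative assumption, not DEISA configuration.

```java
import java.util.List;

// Illustrative sketch: a multi-site workflow whose steps share data through
// one global GPFS path instead of staging files between sites.
public class GpfsWorkflowSketch {

    record Step(String site, String command) { }

    public static void main(String[] args) {
        String sharedDir = "/deisa/gpfs/project42";   // hypothetical global path

        List<Step> workflow = List.of(
                new Step("FZJ",    "prepare "   + sharedDir + "/input"),
                new Step("CINECA", "simulate "  + sharedDir + "/input > " + sharedDir + "/raw"),
                new Step("RZG",    "analyse "   + sharedDir + "/raw > "   + sharedDir + "/result"),
                new Step("IDRIS",  "visualise " + sharedDir + "/result"));

        // Each step would be incarnated as a UNICORE job at its site;
        // here we only print the plan.
        for (Step step : workflow) {
            System.out.printf("%-7s %s%n", step.site(), step.command());
        }
    }
}
```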
29 Workflow Application with UNICORE, Global Data Management with GPFS (cont.)
30 Usage in other Projects
NaReGI
- UNICORE as basic middleware for research and development
- Development of the UNICONDORE interoperability layer (UNICORE ↔ CONDOR)
- Access to about 3000 CPUs with approx. 17 TFlops peak in the NaReGI testbed
D-Grid Integration Project
- UNICORE is used in the Core D-Grid infrastructure
- Development of tools for (even) easier installation and configuration of client and server components
31 Summary
UNICORE
- establishes seamless access to Grid resources and data
- is designed as a vertically integrated Grid middleware
- provides matured workflow capabilities
- is used in production at NIC and in the DEISA infrastructure
- is available as Open Source from http://unicore.sourceforge.net
- is used in research projects worldwide
- is continuously enhanced by an international team of expert Grid developers
- is currently being transformed into the Web Services world towards OGSA and WSRF compliance
32 UNICORE Summit 2005
October 11-12, 2005, ETSI Headquarters, Sophia Antipolis, France
http://summit.unicore.org/2005
In conjunction with Grids@work: Middleware, Components, Users, Contest and Plugtests
http://www.etsi.org/plugtests/GRID.htm
Supported by