Title: UNICORE and the DEISA supercomputing grid
1. UNICORE and the DEISA supercomputing grid
- Jules Wolfrat
- wolfrat@sara.nl
2. Outline
- DEISA overview
- UNICORE history
- UNICORE architecture
- Demo?
3. THE DEISA SUPERCOMPUTING GRID
[Map: the AIX distributed super-cluster, vector systems (NEC, ...), and Linux systems (SGI, IBM, ...), interconnected via GEANT.]
4. DEISA objectives
- To enable Europe's terascale science by the integration of Europe's most powerful supercomputing systems.
- Enabling scientific discovery across a broad spectrum of science and technology is the only criterion for success.
- DEISA is a European Supercomputing Service built on top of existing national services. This service is based on the deployment and operation of a persistent, production-quality, distributed supercomputing environment with continental scope.
- The integration of national facilities and services, together with innovative operational models, is expected to add substantial value to existing infrastructures.
- The main focus is High Performance Computing (HPC).
5. Participating Sites
- BSC: Barcelona Supercomputing Centre (Spain)
- CINECA: Consorzio Interuniversitario per il Calcolo Automatico (Italy)
- CSC: Finnish Information Technology Centre for Science (Finland)
- EPCC/HPCx: University of Edinburgh and CCLRC (UK)
- ECMWF: European Centre for Medium-Range Weather Forecasts (UK, international)
- FZJ: Research Centre Juelich (Germany)
- HLRS: High Performance Computing Centre Stuttgart (Germany)
- IDRIS: Institut du Développement et des Ressources en Informatique Scientifique, CNRS (France)
- LRZ: Leibniz Rechenzentrum Munich (Germany)
- RZG: Rechenzentrum Garching of the Max Planck Society (Germany)
- SARA: Dutch National High Performance Computing and Networking Centre (The Netherlands)
6. The DEISA supercomputing environment (21,900 processors and 145 Tf in 2006, more than 190 Tf in 2007)
- IBM AIX Super-cluster
  - FZJ Juelich, 1312 processors, 8.9 teraflops peak
  - RZG Garching, 748 processors, 3.8 teraflops peak
  - IDRIS, 1024 processors, 6.7 teraflops peak
  - CINECA, 512 processors, 2.6 teraflops peak
  - CSC, 512 processors, 2.6 teraflops peak
  - ECMWF, 2 systems of 2276 processors each, 33 teraflops peak
  - HPCx, 1536 processors, 11 teraflops peak
- BSC, IBM PowerPC Linux system (MareNostrum), 4864 processors, 40 teraflops peak
- SARA, SGI Altix Linux system, 416 processors, 2.2 teraflops peak
- LRZ, Linux cluster (2.7 teraflops) moving to an SGI Altix system (5120 processors and 33 teraflops peak in 2006, 70 teraflops peak in 2007)
- HLRS, NEC SX8 vector system, 576 processors, 12.7 teraflops peak
- Systems interconnected with a dedicated 1 Gb/s network, currently being upgraded to 10 Gb/s, provided by GEANT and the NRENs
7. The technology cycle
[Diagram: DEISA strategic and technological management drives a cycle of technology pull (service definitions, technology specifications), technology providers (R&D projects), and technology watch.]
- WAN GPFS (IBM): completed
- Multi-cluster batch processing (IBM): completed
- GPFS for non-IBM systems (IBM): ongoing
- Co-scheduling (Platform): in preparation
8. How is DEISA enhancing HPC services in Europe?
- Running larger parallel applications in individual sites, by a cooperative reorganization of the global computational workload on the whole infrastructure, or by the operation of the job migration service inside the AIX super-cluster.
- Enabling workflow applications with UNICORE (complex applications that are pipelined over several computing platforms).
- Enabling coupled multiphysics Grid applications (when it makes sense).
- Providing a global data management service whose primary objectives are:
  - Integrating distributed data with distributed computing platforms.
  - Enabling efficient, high performance access to remote datasets (with Global File Systems and striped GridFTP). We believe that this service is critical for the operation of (possible) future European petascale systems.
  - Integrating hierarchical storage management and databases in the supercomputing Grid.
- Deploying portals as a way to hide complex environments from new user communities, and to interoperate with other existing grid infrastructures.
9. Basic Services: Global File Systems
[Diagram: compute nodes of an HPC system at site A, connected through a network to disk space managed by a global file system.]
A global file system is a sophisticated software environment, necessary to provide a single system image of a clustered computing platform. It provides global data management: data in the GFS is symmetric with respect to all computing nodes.
10. The DEISA integration concept
[Diagram: sites A, B, C and D joined by a network interconnect with reserved bandwidth.]
Global distributed GPFS file system with continental scope. The global resource pool is dynamic: nodes can enter and leave the pool without disrupting the national services.
11. DEISA Global File System integration in 2006 (based on IBM's GPFS)
[Map: GPFS integration between DEISA sites, including CINECA (IT) and FZJ (DE).]
12. Demonstration
Demonstration of transparent data access in a heterogeneous configuration:
(1) A 64-processor job is running at SARA (SGI Altix system).
(2) The input data for this run are read from the Linux GPFS at SARA.
(3) The output data are written into the BSC GPFS system in Spain.
(4) Visualization at the RZG system reads the output data produced by the application from the BSC GPFS.
13. Global File System interoperability demo during the Supercomputing Conference 2005 in Seattle
American and European supercomputing infrastructures linked: bridging communities with scalable, wide-area global file systems.
[Map: participating DEISA sites and TeraGrid sites.]
14. Basic services: workflow simulations using UNICORE
UNICORE supports complex simulations that are pipelined over several heterogeneous platforms (workflows). UNICORE handles a workflow as a single job and transparently moves the output/input data along the pipeline. UNICORE clients that monitor the application can run on laptops. UNICORE has a user-friendly graphical interface; DEISA has also developed a command line interface for UNICORE.
The UNICORE infrastructure, including all sites, has full production status. It has proven to be very stable during the last few months.
15. Other basic services
- Job migration inside the AIX super-cluster. Based on LoadLeveler Multi-Cluster, it allows system administrators to reroute jobs to other sites in a way that is transparent to the end users. Used to move away simple jobs of "implicit" users to make room for a bigger application at a site. Full production status.
- Co-allocation. We are starting to prepare a first generation co-allocation service on the full heterogeneous infrastructure, using LSF Multi-cluster. Important for coupled Grid applications and for data movement. Service in development phase; a prototype is expected in 6-9 months.
- Remote I/O using Global File Systems and fast data transfers. See the next slide.
- Integrating hierarchical data management and databases in the supercomputing Grid. In progress.
16. Accessing remote data: high performance remote I/O and file transfer
- Remote I/O with global file systems implicitly moves data across platforms (in production today).
- DEISA will also deploy explicit high performance data movers, using GridFTP.
[Diagram: co-scheduled, parallel data mover tasks transferring data to a data repository via GridFTP.]
17. Summary
- DEISA provides an integrated supercomputing environment, with efficient data sharing through high performance global file systems. This is highly transparent to end users.
- DEISA enables job migration across sites (also transparent to end users). Exceptional resources for very demanding applications are made available by the operation of the global resource pool. We are load balancing the computational workload at a European scale.
- Huge, demanding applications can be run as such.
- Support of Grid applications (which are distributed by design).
- With this operational model, the DEISA super-cluster is not very different from a true monolithic European supercomputer (which must be partitioned in any case for fault tolerance and QoS).
- The main difference comes from the coexistence of several independent administration domains. This requires, as in TeraGrid, coordinated production environments.
18. UNICORE
- UNICORE: UNiform Interface to COmputer Resources
- The following material is courtesy of the UNICORE team
19. Highlights
- Excellent workflow support
- Transparent data staging / transfer
- Multi-site, multi-step jobs: heterogeneous meta-computing
- Uniform user authentication and security mechanisms
- Sites maintain full control over their resources
- The UNICORE Client offers:
  - A uniform GUI for job creation and monitoring
  - Easy integration of applications through plugins
20. History I
- Development started in 1997
- Projects UNICORE and UNICORE Plus
- Funded by the German Ministry of Education and Research (until 12/2002)
21. History II
- Developments in EC funded projects
- EUROGRID (11/2000 - 01/2004)
  - IST-1999-20247
  - Resource broker, standards-based file transfer (GridFTP)
  - Biomolecular simulations, weather prediction, coupled CAE simulations, structural analysis
- GRIP (01/2002 - 02/2004)
  - IST-2000-32257
  - Interoperability between UNICORE and Globus (integration of Globus-maintained resources as target systems in UNICORE)
- OpenMolGRID (09/2002 - 11/2004)
  - IST-2001-37238
  - Use of UNICORE for molecular engineering
  - Focus on scientific workflows
22. History III
- Collaborators
  - Intel GmbH (formerly Pallas GmbH)
  - Fujitsu Laboratories of Europe (formerly FECIT)
  - Forschungszentrum Jülich
  - Deutscher Wetterdienst
  - Genias, RUS, RUKA, LRZ, PC2, ZHR, ZIB
  - CNRS-IDRIS (F), CSCS (CH), GIE EADS CCR (F), ICM (PL), Parallab (N), Soton (UK), UoM (UK), ANL (US), UT (EE), UU (UK), ComGenex (HU), Negri (I)
23. Features
- Intuitive GUI with single sign-on
- X.509 certificates for AA and job/data signing
- Only one open port in the firewall required
- Workflow engine for complex multi-site, multi-step workflows
- Job monitoring
- Extensible application support
- Integrated secure data transfer
- Resource management
- Easy installation and configuration of client and server components
- Full control over resources remains with the site
- Production quality
24. Software Status I
- Current version: 5.6 (Client) / 4.6 (Server)
- User Client is platform independent (Java)
- Servers (Unix)
- Target systems (Unix, with or without a batch subsystem)
  - T3E, SP3, VPP, hpcLine, SR 8000, SX-5, PC clusters, ..., Globus 2.x as targets
  - NQS, LL, LSF, PBS, CCS, SGE, ...
25. Software Status II
- UNICORE available at SourceForge as open source under the BSD license
  - http://unicore.sourceforge.net
- UNICORE Forum e.V.
  - http://www.unicore.org
- Public test system for testing (standard) client functions available
26. Deployment
- At all project partner sites
- DEISA sites (IDRIS, CINECA, RZ Garching, ...)
- NaReGI project (Japan)
27. UNICORE Architecture
[Diagram: the UNICORE Client connecting to the UNICORE Grid.]
28. ARCHITECTURE
[Diagram: the Client submits multi-site jobs over SSL, through optional firewalls, to the Gateway of each Usite (authentication). Behind a Usite's Gateway sit one or more Vsites, each with an NJS at the abstract level, backed by a UUDB (authorization) and an IDB (incarnation). Each NJS drives a TSI at the non-abstract level, which talks to the local RMS and disk.]
29. Client
[Screenshot: the UNICORE Client with job preparation, workflow management and job monitoring views for the configured Usites and Vsites.]
30. UNICORE Server
- Gateway
- Network Job Supervisor
- Configuration
- UNICORE User Data Base
- Target System Interface
- A demo package containing preconfigured components is available on sourceforge.net/projects/unicore
31. Server Components
[Diagram: Gateway (with its conf), Network Job Supervisor (with its conf), UNICORE User DB (UUDB), and Target System Interface.]
32. Server Prerequisites
- Gateway and NJS
  - Java ≥ 1.4.2
  - X.509 certificates for Gateway and NJS
  - Signer certificate(s)
- TSI
  - Perl (≥ 5.004)
33. Gateway
- Entry point of a UNICORE site
- Accepts SSL connections from Clients and NJSs
- Accepts valid certificates from all signers known to it (authentication)
- Talks the UNICORE Protocol Layer (UPL) on connections to the outside world
- Sends/receives AJOs to/from the NJSs
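The Gateway's role as an SSL endpoint that authenticates callers by their X.509 certificates can be illustrated with plain JSSE. A minimal sketch, not taken from the actual Gateway sources; the keystore path, password and port are invented placeholders:

    import javax.net.ssl.*;
    import java.io.FileInputStream;
    import java.security.KeyStore;

    // Minimal JSSE sketch of a Gateway-style SSL endpoint that demands
    // client certificates. Paths, password and port are placeholders,
    // not real Gateway configuration.
    public class MiniGateway {
        public static void main(String[] args) throws Exception {
            KeyStore ks = KeyStore.getInstance("JKS");
            ks.load(new FileInputStream("gateway.jks"), "secret".toCharArray());

            KeyManagerFactory kmf = KeyManagerFactory.getInstance("SunX509");
            kmf.init(ks, "secret".toCharArray());

            SSLContext ctx = SSLContext.getInstance("TLS");
            // null trust managers fall back to the default trust store;
            // a real setup would load the accepted signer certificates here.
            ctx.init(kmf.getKeyManagers(), null, null);

            SSLServerSocket server = (SSLServerSocket)
                ctx.getServerSocketFactory().createServerSocket(4433);
            server.setNeedClientAuth(true);  // reject clients without a valid certificate

            while (true) {
                try (SSLSocket client = (SSLSocket) server.accept()) {
                    // The peer certificate is available once the handshake completes;
                    // the real Gateway would now forward the AJO to the target NJS.
                    System.out.println("authenticated: "
                        + client.getSession().getPeerPrincipal());
                }
            }
        }
    }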
34. Gateway connections
gateway.properties:
  gw.gateway_host_name=<host name>
  gw.port=<port>
connections:
  <Vsite name> <NJS machine> <NJS port>
[Diagram: the Gateway's conf directory holds these files; the connections file points the Gateway at each Vsite's NJS, which in turn uses the UUDB (UNICORE User DB).]
35. Network Job Supervisor (NJS)
- The UNICORE scheduler
- Receives/sends AJOs from/to the local Gateway
- Translates AJOs into batch jobs for the target
- Maps the user's Ulogin to an Xlogin
- Sends sub-AJOs to the corresponding Gateway according to dependencies
- Polls for status and output of sub-AJOs
- Sends batch jobs and requests to the TSI
- Polls the TSI for job status and output
36. NJS Connections
njs.properties:
  njs.gateway=<host name>
  njs.vsite_name=<name>
  njs.gateway_port=<port>
  njs.admin_port=<port>
[Diagram: the NJS reads its conf, connects to the Gateway, is reachable on an admin port, and uses the UUDB (UNICORE User DB) and the TSI.]
37. Incarnation Data Base
- Static definitions and translation table
- Contains definitions for:
  - GENERAL properties (file spaces, descriptions, ...)
  - EXECUTION_TSI (host ports, resources, batch queues, ...)
  - STORAGE_TSI (for file transfers and management)
  - RUN (translation rules for the target)
  - IMPORT, EXPORT, CLEANUP, LIST_DIRECTORY, RENAME, COPY_FILE, DELETE_FILE, CHANGE_PERMISSIONS
  - FORTRAN, LINK
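The RUN translation rules boil down to template substitution: the NJS rewrites an abstract resource request into a concrete batch script using the site definitions in the IDB. A purely illustrative sketch; the template and placeholder names are invented for this example, not real IDB syntax:

    import java.util.Map;

    // Illustrative only: "incarnates" an abstract resource request into a
    // LoadLeveler-style batch header by simple placeholder substitution.
    public class Incarnate {
        static final String TEMPLATE =
            "# @ job_name = ${JOBNAME}\n" +
            "# @ node = ${NODES}\n" +
            "# @ wall_clock_limit = ${TIME}\n" +
            "# @ queue\n";

        static String incarnate(Map<String, String> request) {
            String script = TEMPLATE;
            for (Map.Entry<String, String> e : request.entrySet()) {
                script = script.replace("${" + e.getKey() + "}", e.getValue());
            }
            return script;
        }

        public static void main(String[] args) {
            // Abstract request (invented values) -> concrete batch header.
            System.out.print(incarnate(
                Map.of("JOBNAME", "demo", "NODES", "4", "TIME", "00:30:00")));
        }
    }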
38. UNICORE User Data Base
- Management of the Ulogin-to-Xlogin mapping information
- The NJS accesses this information
- The basic version maps one certificate to exactly one Xlogin
- The NJS-to-UUDB interface is defined so that it can be adapted to site-specific user databases (e.g. LDAP)
- http://www.unicore.org/downloads.htm (contributions) offers an alternative UUDB with certificate-projectid pairs being mapped to Xlogins
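In its basic form the UUDB therefore behaves like a one-to-one lookup table from certificate subject to Xlogin. A toy model with invented entries, not the real UUDB implementation:

    import java.util.HashMap;
    import java.util.Map;

    // Toy model of the basic UUDB: one certificate subject maps to exactly
    // one Xlogin. Entries are invented examples.
    public class MiniUUDB {
        private final Map<String, String> dnToXlogin = new HashMap<>();

        void add(String subjectDN, String xlogin) {
            dnToXlogin.put(subjectDN, xlogin);
        }

        // Returns the Xlogin, or null: users without a UUDB entry are rejected.
        String authorize(String subjectDN) {
            return dnToXlogin.get(subjectDN);
        }

        public static void main(String[] args) {
            MiniUUDB uudb = new MiniUUDB();
            uudb.add("CN=Jane Doe,O=GridOrg,C=DE", "jdoe");
            System.out.println(uudb.authorize("CN=Jane Doe,O=GridOrg,C=DE")); // jdoe
            System.out.println(uudb.authorize("CN=Mallory,O=Evil,C=XX"));     // null -> no access
        }
    }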
39. NJS connections
njs.properties:
  njs.gateway=<host name>
  njs.vsite_name=<name>
  njs.gateway_port=<port>
  njs.admin_port=<port>
njs.idb:
  SOURCE <TSI machine> <port1> <port2>
[Diagram: as on slide 36, with the njs.idb SOURCE line naming the TSI machine and its two ports.]
40. Target System Interface
- Interface to the target operating and batch system
- Perl scripts and modules
- Needs root privileges to act on behalf of the user (uses setreuid)
- Provides an interface to the local system for:
  - Job submission
  - Status query, job monitoring
  - File handling
41. Example: Submit.pm

  # Pull the job parameters out of the parsed directives
  # ($1 is the keyword, $2 its value):
  $jobname     = $2 if $1 eq "JOBNAME";
  $outcome_dir = $2 if $1 eq "OUTCOME_DIR";
  $uspace_dir  = $2 if $1 eq "USPACE_DIR";
  $time        = $2 if $1 eq "TIME";
  $memory      = $2 if $1 eq "MEMORY";
  $nodes       = $2 if $1 eq "NODES";
  ...
  # Turn the memory request into a batch-system option:
  $memory = "-lM $memory" . "Mb";
  # Assemble the submit command for the local batch system:
  my $command = "$main::submit_cmd $queue $nodes $email $memory $time " .
                "$jobname $stdout_loc $stderr_loc $Submit::tsi_unique_file_name";
42. TSI connections
njs.idb (NJS side):
  SOURCE <TSI machine> <port1> <port2>
tsi.properties (TSI side):
  $main::njs_machine = shift;  # "<NJS host>"
  $main::njs_port    = shift;  # "<port1>"
  $main::my_port     = shift;  # "<port2>"
[Diagram: the NJS and the TSI each read their configuration; the UUDB (UNICORE User DB) sits next to the NJS.]
43. Overview: Server connections
44. Firewall Issues
- Client <-> Gateway (Internet)
  - Allow connections to the Gateway for the https protocol on the port the Gateway is listening on
  - The client side has to allow outgoing traffic on any port
- Gateway <-> NJS (intranet)
  - Allow connections from the Gateway to the NJS system, and from the NJS to the Gateway's port
- NJS <-> TSI (intranet)
  - Allow connections from the NJS to the TSI system and the TSI's NJS port
  - Allow connections from the TSI to the NJS system and the NJS's TSI port
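Because the Gateway's https port is the single port that must be reachable from outside, a plain TCP connect is enough to verify the firewall setup from the client side. A small sketch; host name and port are placeholders:

    import java.net.InetSocketAddress;
    import java.net.Socket;

    // Quick check that the one externally required port (the Gateway's
    // https port) is reachable through the firewall. Host/port are examples.
    public class PortCheck {
        public static void main(String[] args) {
            try (Socket s = new Socket()) {
                s.connect(new InetSocketAddress("gateway.example.org", 4433), 5000);
                System.out.println("Gateway port reachable");
            } catch (Exception e) {
                System.out.println("Blocked or down: " + e.getMessage());
            }
        }
    }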
45. Current trends and a look into the UNICORE future...
- Web services for interoperability
  - Open up the architecture
  - ...but keep the UNICORE strengths:
    - Abstraction and virtualisation
    - Workflows
    - Easy application integration
46. Acronyms I
47. Acronyms II
48. UNICORE meets DEISA
[Diagram: users at each DEISA site (CSC, FZJ, CNE/CINECA, RZG, IDR/IDRIS, SARA, BSC, LRZ) reach a DEISA gateway in the site's DMZ, which forwards to the site's NJS on the intranet.]
49. UNICORE Security
- Security model based on an X.509 public key infrastructure
- A credential consists of a public and a private key
- No userid/password authentication
- Password-protected keystore
- Single sign-on
- UNICORE accepts the following private key formats:
  - RSA (PKCS#12), e.g. from OpenSSL 0.9.7x
  - Java keystore (JKS), from Sun Java
- Certificates provided e.g. by the DFN CA
- Two server-side security entities:
  - Gateway: authentication
  - NJS: authorisation
50. UNICORE Security: Client
- Access to a password-protected keystore
- The encrypted keystore contains all imported certificates and the user's private key(s)
- The UNICORE keystore editor allows the user to:
  - Generate an X.509 certificate request
  - Import/export .p12 or .jks keystores
  - Import public keys
- The user has to import (at least) three certificates into the Client:
  - The plugin signer's certificate (public key)
  - The Gateway signer's certificate (public key)
  - The user's signed public key
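The keystore the Client opens is an ordinary PKCS#12 or JKS file, so the stock java.security API can inspect it. A sketch with placeholder file name and password:

    import java.io.FileInputStream;
    import java.security.KeyStore;
    import java.security.cert.X509Certificate;
    import java.util.Collections;

    // Opens a password-protected keystore like the one the UNICORE client
    // uses and lists the stored certificates. File name and password are
    // placeholders for this sketch.
    public class ListKeystore {
        public static void main(String[] args) throws Exception {
            KeyStore ks = KeyStore.getInstance("PKCS12"); // or "JKS" for Java keystores
            try (FileInputStream in = new FileInputStream("user.p12")) {
                ks.load(in, "keystore-password".toCharArray());
            }
            for (String alias : Collections.list(ks.aliases())) {
                X509Certificate cert = (X509Certificate) ks.getCertificate(alias);
                if (cert != null)
                    System.out.println(alias + ": " + cert.getSubjectDN());
            }
        }
    }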
51. UNICORE Security: Gateway
- The Gateway authenticates the user
- The following checks are performed on certificates presented by a client:
  - The certificate is issued by one of the trusted CAs (e.g. the DFN-CA)
  - The certificate is within its validity period
  - The certificate has not been revoked (if checking of Certificate Revocation Lists (CRLs) is activated)
- The Gateway accepts only SSL connections from Clients and other NJSs (SSL handshake)
- Optional SSL connection between Gateway and NJS
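These three checks map directly onto the standard java.security.cert API. A sketch, not the Gateway's actual code; the certificate and CRL file names are placeholders:

    import java.io.FileInputStream;
    import java.security.cert.*;

    // Sketch of Gateway-style certificate checks: trusted issuer,
    // validity period, revocation. "ca.crt" stands in for one of the
    // trusted signer certificates.
    public class CertChecks {
        public static void main(String[] args) throws Exception {
            CertificateFactory cf = CertificateFactory.getInstance("X.509");
            X509Certificate ca = (X509Certificate)
                cf.generateCertificate(new FileInputStream("ca.crt"));
            X509Certificate user = (X509Certificate)
                cf.generateCertificate(new FileInputStream("user.crt"));

            user.verify(ca.getPublicKey());   // 1. issued by a trusted CA (throws if not)
            user.checkValidity();             // 2. within its validity period (throws if not)

            X509CRL crl = (X509CRL)
                cf.generateCRL(new FileInputStream("ca.crl"));
            if (crl.isRevoked(user))          // 3. not revoked (when CRL checking is active)
                throw new CertificateException("certificate revoked");

            System.out.println("certificate accepted");
        }
    }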
52. Behind the scenes: Authentication
[Diagram: during the SSL handshake the Client presents the user certificate and the Gateway presents the gateway certificate; each side checks whether it trusts the other certificate's issuer.]
53. UNICORE Security: NJS
- The NJS authorizes the user
  - It accesses the UNICORE User Database (UUDB)
  - It maps the user's certificate to the Xlogin on the target system
  - Only users presenting certificates stored in the UUDB can connect to the target system
- The NJS authorises other NJSs
  - Requires an explicit UUDB entry
54. Behind the Scenes: Authorisation
[Diagram: a typical UNICORE user's certificate is looked up in the UUDB, yielding the user login; the IDB supplies the incarnation for the TSI.]
55. UNICORE Job
- A job contains:
  - Sub-jobs and tasks
  - Dependency information
    - Without dependencies, all tasks of a job are executed in parallel
    - Workflow constructs: do-N, loops, if-then-else
  - Target system location
- Tasks are translated into batch jobs for the destination system by the servers (NJSs)
56. Abstract Job Object (AJO)
- Abstract, target-system-independent representation of a job
- Specifies actions to be performed by UNICORE:
  - Execute task
  - File transfer task
  - Control task
- Contains the dependency graph
- Contains resource requests (nodes, memory, time, ...)
- Contains data set descriptions for data to be streamed
- Realised as Java classes
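To make the dependency semantics of slide 55 concrete (tasks without ordering constraints run in parallel, dependencies serialize them), here is a self-contained toy dependency graph; it deliberately does not use the real AJO classes:

    import java.util.*;

    // Toy dependency graph with AJO-like semantics: tasks without
    // dependencies are eligible together (parallel); a task runs only
    // after all its prerequisites. Not the real org.unicore.ajo API.
    public class MiniWorkflow {
        private final Map<String, Set<String>> deps = new HashMap<>();

        void task(String name) { deps.putIfAbsent(name, new HashSet<>()); }

        void dependsOn(String task, String prerequisite) {
            task(task); task(prerequisite);
            deps.get(task).add(prerequisite);
        }

        // Prints tasks in waves: each wave could be dispatched in parallel.
        void run() {
            Set<String> done = new HashSet<>();
            while (done.size() < deps.size()) {
                List<String> wave = new ArrayList<>();
                for (var e : deps.entrySet())
                    if (!done.contains(e.getKey()) && done.containsAll(e.getValue()))
                        wave.add(e.getKey());
                if (wave.isEmpty())
                    throw new IllegalStateException("cyclic dependencies");
                System.out.println("parallel wave: " + wave);
                done.addAll(wave);
            }
        }

        public static void main(String[] args) {
            MiniWorkflow wf = new MiniWorkflow();
            wf.dependsOn("simulate", "stage-in");   // file transfer before execute
            wf.dependsOn("stage-out", "simulate");  // results move after the run
            wf.task("independent-check");           // no deps: runs in the first wave
            wf.run();
        }
    }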
57. UNICORE @ SourceForge
- Open source under the BSD license
- Supported by FZJ
- Integration of own results and results from other projects
- Release management
- Problem tracking
- CVS, mailing lists
- Documentation
- Assistance
- A viable basis for many projects: DEISA, UniGrids, NaReGI, ...
- http://unicore.sourceforge.net