Title: INFN-GRID Globus evaluation (WP 1)
1. INFN-GRID Globus evaluation (WP 1)
- Massimo Sgaravatto
- INFN Padova
- for the INFN Globus group
- globus_at_infn.it
- http://www.infn.it/globus
2. Globus
- Some basic services (security, information service, resource management, ...) must be implemented in order to build and use a Grid for real applications
- Globus identified as a possible Grid framework providing these services
  - but it has been developed mainly for traditional computing, which differs from computing in HEP
    - High performance vs. high throughput
    - Supercomputers vs. PC farms
    - Distributed data-intensive computing not addressed
  - Need to assess what can be used in the HEP environment
- → WP1 "Installation and Evaluation of the Globus Toolkit" of the INFN-GRID Project
  - Goal: evaluation of the Globus toolkit
    - Which services can be useful?
    - What needs to be integrated or modified?
    - What is missing?
3. Globus Architecture
[Figure: layered Globus architecture]
- Applications
- High-level Services and Tools: GlobusView, Testbed Status, DUROC, globusrun, MPI, Nimrod/G, MPI-IO, CC
- Core Services: GRAM, Nexus, Metacomputing Directory Service, Globus Security Interface, Heartbeat Monitor, Gloperf, GASS
4. Proposed work plan
- Security
  - To access GRID resources, mechanisms for user authentication and authorization are needed
  - → Evaluation of the GSI service
- Information Service
  - To discover GRID resources (CPU, storage, network, ...), mechanisms to publish them must be defined
  - → Analysis of the GIS service to publish information through a uniform and standard interface
- Resource Management
  - A uniform interface is needed to submit jobs to GRID resources
    - Uniform standard interface to different resource management systems
    - Uniform standard language for task management
  - → Assessment of Globus services for resource allocation and process management
5. Proposed work plan
- Data Access and Migration
  - High-performance, reliable tools are needed to manage data (access to remote data, data transfers, wide-area replicas, ...)
  - → Assessment of Globus tools for data management (GASS, Globusftp)
- Fault Monitoring
  - Faults in a GRID environment must be promptly detected and recovery mechanisms must be implemented
  - → Evaluation of the HBM service for fault detection
- Execution Environment Management
  - Code migration (moving the application to where the job will actually be executed) as a possible implementation strategy
  - → Evaluation of the GEM service to support code migration
- Globus installation tools
  - Reduce complexity and manpower for Globus installation and maintenance
6. Globus installation tools
- Flavia's presentation
- INFN-GRID installation tool to shorten the installation time of the Globus toolkit, avoid common mistakes, and support specific customisations
- Option to install optional software, to apply INFN-specific customisations (INFN CA, configuration of a hierarchical GIS architecture), and to install and use specific INFN tools
- Proven successful within INFN (used to set up an INFN GRID testbed) and also outside (CERN, FNAL, ...)
7. Security
- Evaluation of Globus GSI
  - User authentication (implementation based on X.509 certificates)
  - User authorization managed by grid-mapfile (mapping between Grid users and local users)
- Some shortcomings, but the GSI security model seems to satisfy our requirements
- Some shortcomings already addressed
  - INFN CA used to sign certificates
  - CRL (issued by INFN CA) distribution
  - Centralized management of grid-mapfiles
8. Security
- Centralized management of the grid-mapfiles
  - Goal: ease the sharing of the same access policies (represented by the grid-mapfiles) among groups of hosts with common purposes
  - Proposed system
    - Central repository (LDAP server) to store user certificates and to define groups of users
    - Certificates published by the CA manager
    - Group manager responsible for editing group memberships (using an LDAP client)
    - Resource owners (Globus administrators) periodically (e.g. via a cron job) connect to this repository, download the subjects of the certificates that meet a specified criterion (e.g. all users of group X), and produce grid-mapfile entries
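
The last step of the proposed system, turning group memberships into grid-mapfile entries, can be sketched roughly as follows. This is only an illustration and not the INFN tool: the LDAP query is replaced by an in-memory dictionary, and `REPOSITORY`, `grid_mapfile_entries`, the example certificate subjects and the local account `cmsuser` are all hypothetical. It assumes the usual grid-mapfile line layout of a quoted certificate subject followed by a local user name.

```python
# Hypothetical stand-in for the central LDAP repository:
# group name -> certificate subjects of its members.
REPOSITORY = {
    "cms": [
        "/C=IT/O=INFN/L=Padova/CN=Example User One",
        "/C=IT/O=INFN/L=Bologna/CN=Example User Two",
    ],
    "test": ["/C=IT/O=INFN/L=Padova/CN=Example User One"],
}


def grid_mapfile_entries(repository, group, local_account):
    """Return one grid-mapfile line per certificate subject in `group`.

    Subjects are de-duplicated and sorted so that repeated runs
    (for example from a cron job) produce identical output.
    """
    subjects = sorted(set(repository.get(group, [])))
    return ['"{}" {}'.format(subject, local_account) for subject in subjects]


if __name__ == "__main__":
    # Map every member of the (hypothetical) group "cms" to one local account.
    for line in grid_mapfile_entries(REPOSITORY, "cms", "cmsuser"):
        print(line)
```

In a real deployment the dictionary lookup would be replaced by a query against the LDAP server, and the resulting lines would be written to the grid-mapfile of each host in the group.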
9. Security
- AFS tests
  - Analysis of what can be done now with the existing tools (quite unfit for any real need)
  - Possible ways to address the existing shortcomings identified
  - New Globus tool (gsiklog) available
10. Information Service
- Alessandro's presentation
- Evaluation of Globus GIS (Grid Information Service)
  - Definition and implementation of a hierarchical architecture of GIS 1.1.3
  - Performance and scalability tests
  - Web interface for browsing
- Various shortcomings must be addressed (to use the GIS in a production environment)
  - Mixed push/pull model more suitable than a pull model
  - Performance
  - Lack of security
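11. INFN GIS Topology
[Figure: hierarchical INFN GIS topology]
- Top Level INFN GIIS: dc=infn, dc=it, o=grid
- INFN CMS GIIS: exp=cms, o=grid
- Site GIIS, Bologna: dc=bo, dc=infn, dc=it, o=grid
- Site GIIS, Padova: dc=pd, dc=infn, dc=it, o=grid
- GRIS servers at the lowest level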
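
Assuming the labels in the topology are LDAP distinguished names (for example `dc=pd, dc=infn, dc=it, o=grid` for Padova), the way a search base selects one branch of the hierarchy can be illustrated with a small suffix check. This is a simplified sketch: the helper names `parse_dn` and `is_under` are hypothetical, and the naive comma split ignores the escaping rules of real LDAP distinguished names.

```python
def parse_dn(dn):
    """Split a distinguished name into normalised components (naive split)."""
    return [part.strip().lower() for part in dn.split(",") if part.strip()]


def is_under(entry_dn, base_dn):
    """Return True if `entry_dn` lies in the subtree rooted at `base_dn`."""
    entry, base = parse_dn(entry_dn), parse_dn(base_dn)
    if len(base) > len(entry):
        return False
    # The base must match the trailing components of the entry.
    return entry[len(entry) - len(base):] == base


if __name__ == "__main__":
    top_level = "dc=infn, dc=it, o=grid"
    for dn in ("dc=pd, dc=infn, dc=it, o=grid",
               "dc=bo, dc=infn, dc=it, o=grid",
               "exp=cms, o=grid"):
        print(dn, "->", is_under(dn, top_level))
```

Under this reading, a query rooted at the top-level INFN name covers both site branches, while the CMS branch is reached through its own, experiment-based name.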
12. Resource Management
- Most of these activities carried out in collaboration with the Grid Workload Management work package
- Evaluation of the Globus resource management architecture
- Evaluation of Globus GRAM
  - Tests with fork, Condor, LSF and PBS as underlying resource management systems
  - The model is fine, but it lacks the robustness needed for real production environments
    - Memory leaks in the Globus job manager (fixed)
    - Scalability (one job manager for each job)
    - Reliability (the job manager is not persistent)
    - ...
13. Globus resource management architecture (simplified design)
[Figure: simplified Globus resource management architecture]
- Submit jobs: job requests expressed in RSL are passed to a Broker
- Resource Discovery: the Broker queries the Grid Information Service (GIS), which holds information on the characteristics and status of local resources
- The Broker chooses the resources to which the jobs are submitted (not implemented in the Globus framework) and forwards the RSL to them
- Globus GRAM acts as a uniform interface to the different local resource management systems (CONDOR, LSF, PBS)
- Site1, Site2, Site3: each site runs Globus GRAM in front of its local resource management system and farms
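
Since the broker is not part of the Globus framework, the selection step in this design has to be supplied separately. A minimal sketch of one possible policy is given below, purely for illustration: the `Resource` record, its fields, the contact strings and the "most free CPUs" rule are all assumptions, and a real broker would take its input from the GIS rather than from a hard-coded list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resource:
    """Hypothetical snapshot of what the information service publishes."""
    contact: str    # where the site's GRAM can be reached (made-up value)
    lrms: str       # local resource management system behind GRAM
    free_cpus: int  # currently idle CPUs in the farm


def choose_resource(resources: List[Resource], cpus_needed: int) -> Optional[Resource]:
    """Pick the resource with the most free CPUs that can still fit the job."""
    candidates = [r for r in resources if r.free_cpus >= cpus_needed]
    if not candidates:
        return None  # no site can run the job right now
    return max(candidates, key=lambda r: r.free_cpus)


if __name__ == "__main__":
    sites = [
        Resource("site1.example.org", "condor", 4),
        Resource("site2.example.org", "lsf", 12),
        Resource("site3.example.org", "pbs", 0),
    ]
    chosen = choose_resource(sites, cpus_needed=8)
    print(chosen.contact if chosen else "no suitable resource")
```

The policy is deliberately naive; it only shows where the information published about local resources enters the decision.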
14. Resource Management
- Evaluation of the GRAM API
- Evaluation of the GRAM Reporter (cooperation between GRAM and GIS), in particular for farms
  - Many useless attributes (at least for our needs), attributes not calculated (always set to 0), and some attributes not properly calculated by the Globus shell scripts
  - Some important information describing the farms and the submitted jobs (necessary, for example, for a resource broker) is missing
  - → Draft proposal for a possible modification of the default schema
- Evaluation of RSL as a uniform language to specify resources
  - More flexibility required
- Submission of Condor jobs to Globus resources
  - Condor-G (useful as a reliable, crash-proof job submission service)
  - GlideIn
- Evaluation of MPICH-G2 vs. MPICH
  - Some shortcomings found (lack of support for shared memory, worse latency than MPICH)
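
To give an idea of what an RSL job description looks like, the sketch below assembles a simple conjunction of `(attribute=value)` relations. The helper names `rsl_value` and `build_rsl` are hypothetical, only flat single-valued attributes are covered, and the set of characters that triggers quoting is a conservative assumption rather than a statement of the RSL grammar.

```python
def rsl_value(value):
    """Render a value, quoting it when it contains spaces or RSL punctuation."""
    text = str(value)
    if '"' in text:
        # Quote escaping is deliberately left out of this sketch.
        raise ValueError("embedded double quotes are not handled here")
    special = set('()&|+=<>!\'^#$')
    needs_quotes = text == "" or any(ch.isspace() or ch in special for ch in text)
    return '"{}"'.format(text) if needs_quotes else text


def build_rsl(**attributes):
    """Build an RSL conjunction such as &(executable=/bin/hostname)(count=2)."""
    relations = "".join(
        "({}={})".format(name, rsl_value(value))
        for name, value in attributes.items()
    )
    return "&" + relations


if __name__ == "__main__":
    print(build_rsl(executable="/bin/hostname", count=2))
    print(build_rsl(executable="/bin/echo", arguments="hello world"))
```

A string of this form is what a client would hand to GRAM when submitting a job; the attributes shown (`executable`, `count`, `arguments`) are among the common GRAM job attributes.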
15. Data management
- Tests with GASS
  - Service to ease access to remote files without having a distributed file system and/or transferring files from/to remote storage systems
  - Tests with command line tools and APIs
  - Problems (huge decrease in transfer rate) when transferring big files
- Tests with Globusftp alpha release 2
  - Collaboration with the INFN-GRID network work package
  - Tests of new features
    - Support for GSI mechanisms
    - Capability of resuming interrupted file transfers
  - Throughput tests using parallel data transfers
  - Antonio's presentation
16. Other services
- Fault Monitoring (HBM)
  - Evaluation of HBM for fault detection (for system and user processes)
  - but the HBM package is not under active development
- Execution Environment Management (GEM)
  - Evaluation of GEM as a service for code migration
  - but the GEM service currently provides only limited capabilities (executable staging)
17. WP 1 Deliverables and Milestones
- Deliverables
  - Tools, documentation and operational procedures for Globus deployment (6 months) ✓
  - Final report on the suitability of the Globus toolkit as basic Grid infrastructure (6 months) ✓
- Milestones
  - Basic deployment of the Grid infrastructure for the INFN GRID (6 months) ✓
    - Globus installed on 40 machines at 10 different sites
18. Conclusions
- The activities of WP 1 are over
- The Globus toolkit can provide basic services useful to create and deploy usable Grids, but many shortcomings and issues must be addressed
  - more details in the report
- Other info: http://www.infn.it/globus