Title: Mirco Mazzucato, INFN Padova
1. The Italian eInfrastructure (Internet and Grids): current achievements and INFN plans for future evolutions
- Comitato Tecnico Scientifico del GARR, 15 July 2004
- Mirco Mazzucato, INFN-Padova
- mirco.mazzucato_at_pd.infn.it
2. Contents
- The Italian eInfrastructure evolution: INFN Grid, Grid.it, S-PACI, ...
- The lessons from the LHC stress test of Grid technologies as implemented in the following Grid software releases:
  - Globus Toolkit v2.x
  - DataGrid v2.x (EU-DataGrid is the European project that has just ended)
  - LCG v2 (LHC Computing Grid project for the HEP experiments at the LHC collider at CERN; a selection of services from Globus, Condor, DataGrid, DataTAG)
  - INFN-Grid/GRID.IT v2 (customization of LCG v2 by INFN within the Italian grid project GRID.IT)
- Issues for the future
- The next steps
3. Early Grid R&D in Italy: the INFN-GRID project
- First national Grid project approved in Europe, at the beginning of 2000; focused on the preparation of the INFN LHC computing infrastructure
- Size of the project: 20 Italian sites, 100 people, 50 FTEs
- Budget devoted to the development of the LHC Regional Computing Centers and the related collaborative Grid infrastructure
- ...but since the beginning the development of the middleware in INFN Grid was conceived as being of general use and has taken into account the requirements of other sciences: Biology (PD) and Earth Observation (ESRIN-ESA-Frascati)
- A successful example of collaboration between physicists, software engineers, computing professionals and computer scientists (CS departments of the Universities of VE, PD, BO, CT, TO, ...) and Italian industry
- Datamat SpA and Nice have been major contributors to the joint development of the Italian DataGrid middleware components
- INFN Grid has been, and is, the national container through which INFN coordinates its contribution to all EU and international Grid projects and to GGF standardization
- Early R&D in Italy also includes work done at ISUFI (University of Lecce): see S-PACI
4. The INFN Grid project and the Italian eInfrastructure
- INFN Condor on WAN (started 1996, operational in 1998): integrated the CPU resources of 20 sites into a national pool with 6 checkpoint domains
- National testbed to evaluate Globus services in 1999
- INFN-GRID, INFN special project (February 2000-): national Grid infrastructure driven by the INFN experiments; 2-3 M€/year, 22 M€ for Tier-1/2
- DataGrid, CERN-coordinated EU project, 3 years, 10 M€ (2001-2003): European integration and new middleware services for HEP, Biology, EO
- DataTAG, CERN-coordinated EU project, 2 years, 4 M€ (2002-2003): optical networking (1 TB in 0.5 hours) and interoperability with US Grids (GLUE)
- Grid.it, national project, 3 years (2003-2005), 8.1 M€ of MIUR funds: towards a national production eInfrastructure
- eBusiness, eIndustry, eGovernment, eScience and Technology (BIGEST) Italian Grid Initiative (2003-): coordination of all national eInfrastructure activities
- The Italian grid infrastructure in the new EU project EGEE (2004-), 32 M€: INFN, S-PACI, ENEA... link with CINECA for DEISA; the new production EU eInfrastructure for all sciences and beyond
- LCG, the world-wide Grid for the LHC experiments (2002-)
5. The national Grid.it eInfrastructure
- In Grid.it, INFN is responsible for the R&D and creation of a national Grid infrastructure and for studying and prototyping a national Grid Operation Service (GOS)
- Generalizing the infrastructure support from INFN to other sciences is a model successfully established in the past with the research network (INFNET - GARR)
- Resources are provided by INFN and major Italian centers (S-PACI, Naples Campus Grid, ...)
- The GOS supports several Italian science applications and the operation of the Italian infrastructure, also in the context of the new European infrastructure project EGEE
- The Italian eScience Grid.it infrastructure currently supports:
  - Astrophysics
  - Biology
  - Computational Chemistry
  - Geophysics
  - Earth Observation
- ...but other sciences are joining thanks to new MIUR funds
6. Grid.it Production Grid Operations Portal
- User documentation
- Site managers' documentation
- Software repository
- Monitoring
- Trouble ticket system
- Knowledge base
http://grid-it.cnaf.infn.it
7. Get your personal certificate
A clear, simple and automated procedure allows all Italian institutions to set up a Registration Authority and obtain INFN certificates.
8. How to register with a VO
9. Grid groups within the Grid.it support system
Trouble Ticketing System: http://helpdesk.oneorzero.com
10. Grid services supported by Grid.it
(Diagram: the central services - User Interface, Grid Monitoring (GridICE), VO servers for CMS, ATLAS and INAF, Information Index, Resource Broker, BDII and RLS - are hosted at INFN-Milano, INFN-Padova, INFN-Napoli and INFN-CNAF. Each site runs a Computing Element (GRAM + GRIS) with its Worker Nodes and a Storage Element (SRM + GRIS), both publishing into a site GIIS.)
12. GridICE
- A monitoring system developed for the INFN Grid infrastructure and adopted by LCG
- Selects grid entities (resources and services) per VO and per site
- Automatic discovery based on the Grid Information Service (Globus MDS 2.x, BDII) and GLUE schema extensions
- Layered architecture:
  - Measurement service (local monitoring interfaced to the GIS)
  - Publisher service
  - Data collector service with auto-discovery feature
  - Data analyzer: detection and notification service (ongoing)
  - Presentation service via web interface
- Modularity, flexibility and interoperability
- Ongoing activities: integration of network resource monitoring
13. GridICE components
(Diagram: a web graphics/presentation layer reads from a GIIS server via LDAP and from a Monitoring Server via SQL; the Monitoring Server in turn queries each GRIS on the Computing/Storage Elements. Steps: 1) LDAP query; 2) list of available CEs/SEs; 3) LDAP query; 4) CE IDs and WNs; steps 3-4 are repeated for every CE/SE.)
14. Data presentation
15. Resources
16. Services
17. Grid service monitoring
18. General User Interface: the GENIUS portal, jointly developed by INFN and Nice
- Based on a web portal architecture
- Support for generic applications
- Basic requirement: transparent access to the Grid
  - It must be accessible from everywhere and from everything (desktop, laptop, PDA, WAP phone)
  - It must be redundantly secure at all levels: 1) secure web transactions, 2) secure user credentials, 3) secure user authentication, 4) secure at the VO level
  - All available grid services must be incorporated in a logical way, just one mouse click away
  - Its layout must be easily understandable and user friendly
19. GENIUS: how it works
(Diagram: a web browser connects to the GENIUS portal - Apache plus the EnginFrame engine on a local workstation - which drives the WMS User Interface. From Roberto Barbera.)
20. GENIUS interfaced to 100 Grid services (Roberto Barbera)
21. Graphic job description, in collaboration with DATAMAT, Italy
22. GENIUS PDA version (1)
23. GENIUS PDA version (2)
24. The Italian Grid now (site/resource map)
- INFN: CMS T2 and T2/3, ATLAS T2 and T2/3, Alice T2 and T2/3, LHCb T2 and T2/3, BaBar, VIRGO; T2 sites have 150 nodes and 50 TB, T3 sites 10-15 nodes; the T1 at CNAF has 800 nodes, 220 TB disk and 1600 TB tape MSS
- Grid.it resources: INFN (15-25 nodes), INAF (5-10 nodes), INGV (NEC computers), BIO (tbd), general-purpose resources (8-15 nodes)
- Sites on the national Grid (Internet): TRENTO, UDINE, MILANO, PADOVA, TORINO, LNL, PAVIA, TRIESTE, FERRARA, GENOVA, PARMA, CNAF, BOLOGNA, PISA, FIRENZE, S.Piero, PERUGIA, LNGS, ROMA, L'AQUILA, ROMA2, LNF, SASSARI, NAPOLI, BARI, LECCE, SALERNO, CAGLIARI, COSENZA, PALERMO, CATANIA, LNS
- Total: 1400 nodes, 2800 processors
25. The new INFN national computing facility at CNAF (Bologna)
- 1250 kVA power generator with a 5,000 l oil tank to be safe against power cuts
- 800 kVA uninterruptible power supply with batteries lasting for 10 at nominal power
- 570 kW cooling system
- 1000 m2 computing room
- GARR Giga-PoP with multiple 2.5 Gbps backbone lines and 1 Gbps wide-area network access
- CPU: 800 1U nodes, 1.6k Intel processors, 1.3 MKSI2K
- Disk: 220 TB
- Tape library: 1.6 PB
- Will grow x4 to meet the LHC experiments' requirements in 2007
26. Global Grid services view in Grid.it/EDG 2.0/LCG-2 (layered, top to bottom)
- Users: WMS-UI, GENIUS; applications and WMS-API
- Cross-layer: CA services, VOMS, policy and accounting, monitoring
- Collective services: Grid scheduler (RB), replica manager (RLS), ...
- Resource/core services: GRAM, GSI, GRIS, basic data access (SRM)
- Grid resources and local services layer: Computing Element, Storage Element, network, local authorization service, local policy service, local scheduler
- Network layer
27. Grid.it services were stressed by INFN in the LHC experiments' 2004 Data Challenges (10^4 simultaneous jobs, 10^6 files, 20 TB of data moved)
- Grid job submission: INFN/EDG WMS/RB
  - RJS (Remote Job Submission) to specific computers: the user submits a job to a WAN computing system by providing its address
  - RJS to Grid domains (sets of computers): via a Grid scheduler, without knowing the destination computers (matchmaking + optimization); needs a Grid Information System (LCG BDII)
- Data management
  - Data replica management
  - Remote data location via file catalogue and metadata catalogue (RLS, Replica Location Service) with a Web Services interface; performance issues
  - Data transfer and access: GridFTP (globus-url-copy) provides high-performance, reliable data transfer; the RM provides optimized transfer
  - Storage Resource Management (SRM) for storage allocation, file pinning, ...
- Grid User Interface
28. Grid services tests
- General services:
  - Security based on Globus GSI (Grid Security Infrastructure); standard protocols: X.509 certificates, PKI, GSS-API, ...
  - Login once (credential delegation)
  - VO-oriented authentication/authorization tools (VOMS)
  - Monitoring system
  - VO-oriented policy and accounting system
  - VO-oriented user support systems
29. Grid Information Service
- The Information Service plays a fundamental role, since resource discovery and decision making are based on the information service infrastructure. Basically, an IS is needed to collect and organize, in a coherent manner, information about grid resources and their status, and to make it available to consumer entities.
- Resource schema: a conceptual model of grid resources, to be used as the base schema of the Grid Information Service for discovery and monitoring purposes.
- The GLUE schema aims to provide standards for:
  - the Computing Service information model
  - the Storage Manager Service information model
  - the Network Service information model (connectivity between Grid domains)
  - a new astronomical catalogue model (just started within Grid.it as a collaboration between INFN and INAF)
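What an information service does for resource discovery can be illustrated with a minimal sketch, assuming a toy in-memory schema. In reality the GIS is an LDAP tree following the GLUE schema (MDS 2.x/BDII); the class and attribute names below are simplified stand-ins, not the GLUE specification.

```python
# Toy sketch of Grid Information Service discovery (illustrative only):
# real deployments publish GLUE-schema entries in an LDAP tree.
from dataclasses import dataclass


@dataclass
class ComputingElement:
    ce_id: str        # e.g. a hypothetical "ce1.pd.infn.it"
    free_cpus: int
    total_cpus: int
    runtime_env: set  # installed software tags


class InformationIndex:
    """Collects CE records and answers discovery queries."""

    def __init__(self):
        self.records = []

    def publish(self, ce):
        self.records.append(ce)

    def discover(self, needs_env, min_free_cpus=1):
        """Return the CEs whose environment and free capacity match the job."""
        return [ce for ce in self.records
                if needs_env <= ce.runtime_env and ce.free_cpus >= min_free_cpus]


ii = InformationIndex()
ii.publish(ComputingElement("ce1.pd.infn.it", 12, 64, {"LCG-2", "CMS-SW"}))
ii.publish(ComputingElement("ce2.ct.infn.it", 0, 32, {"LCG-2"}))
matches = ii.discover(needs_env={"LCG-2", "CMS-SW"})
print([ce.ce_id for ce in matches])  # only ce1 has free CPUs and the CMS software
```

The two-step pattern, publish into an index and query it at scheduling time, is exactly what the consumer entities mentioned above (the RB, monitoring) rely on.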
30. GLUE: who and when
- GLUE (Grid Laboratory Uniform Environment), promoted by EU-DataTAG and US-iVDGL in the context of the High Energy and Nuclear Physics InterGrid Joint Technical Board (HI-JTB, http://www.hicb.org/)
- GLUE is a collaborative joint effort towards a standards-based global Grid infrastructure for the HEP experiments, focusing on interoperability issues between US and EU HEP Grid-related projects
- First results on the Grid resource schema and user authentication/authorization management, with contributions from DataGrid, Globus, PPDG and GriPhyN
- The GLUE schema activity started in April 2002
- Objective: define a common schema to represent Grid resources, in order to support discovery and monitoring
31-33. Included in the GT2 and EDG 2.0 releases (screenshots)
34. INFN/DataGrid Information Directory Information Tree
35. Real Grid job submission via the Workload Management Service (by INFN within EU-DataGrid)
- The user interacts with the Grid via a Workload Management System (not directly with GRAM)
- The goal of the WMS is distributed scheduling and resource management in a Grid environment
- What does it allow Grid users to do?
  - Submit their jobs via a job description language
  - Execute them, either selecting the CE themselves or leaving the WMS to optimize according to data and CPU availability
  - Get information about their status
  - Retrieve their output
- The WMS tries to optimize the usage of resources using matchmaking and re-scheduling
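The matchmaking step can be sketched as follows. This is a toy with plain Python predicates and invented attribute names; the real RB evaluates Condor ClassAd expressions (the JDL Requirements and Rank attributes) against GLUE-published CE attributes.

```python
# Illustrative matchmaking: filter CEs by hard requirements, then rank.
def match_and_rank(job, ces):
    """Return candidate CEs sorted best-first by the job's rank function."""
    acceptable = [ce for ce in ces if job["requirements"](ce)]
    return sorted(acceptable, key=job["rank"], reverse=True)


ces = [
    {"id": "ce1", "free_cpus": 4,  "close_se_has_input": True},
    {"id": "ce2", "free_cpus": 40, "close_se_has_input": False},
    {"id": "ce3", "free_cpus": 10, "close_se_has_input": True},
]
job = {
    # Requirements: hard constraints (like the JDL Requirements attribute)
    "requirements": lambda ce: ce["free_cpus"] > 0,
    # Rank: prefer CEs close to the input data, then more free CPUs
    "rank": lambda ce: (ce["close_se_has_input"], ce["free_cpus"]),
}
best = match_and_rank(job, ces)[0]
print(best["id"])  # ce3: it has the input data nearby and more free CPUs than ce1
```

Splitting requirements (boolean) from rank (ordering) is the design point: requirements prune, rank optimizes, which is how the WMS can trade data locality against CPU availability.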
36. GUI and APIs
37. WMS components
- The WMS is currently composed of the following parts:
  - User Interface (UI): the access point for the user to the WMS
  - Resource Broker (RB): the broker of Grid resources, responsible for finding the best resources to which to submit jobs
  - Job Submission Service (JSS): provides a reliable submission system
  - Information Index (II): a specialized Globus GIIS (LDAP server), used by the Resource Broker as a filter on the Information Service (IS) to select resources
  - Logging and Bookkeeping services (LB): store job information, available for users to query
38. WMS architecture in EDG v2.x
(Diagram: on the RB node sit the Network Server, Matchmaker/Broker, Information Service client, Workload Manager, Job Adapter, RB storage and Job Controller (CondorG), with Logging & Bookkeeping and a Log Monitor alongside. The Matchmaker consults the RLS and the CE and SE characteristics and status.)
39. WMS release 2 functionality
- User APIs
- GUI
- Support for interactive jobs
- Job checkpointing
- Support for parallel jobs
- Support for automatic output data upload and registration
- VOMS (VO Membership Service) support
- Support for job dependencies (via DAGMan)
- Lazy scheduling: a job (node) is bound to a resource (by the RB) just before that job can be submitted (i.e. when it is free of dependencies)
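Lazy scheduling of dependent jobs can be sketched in a few lines, a simplification of what DAGMan-driven submission does; the job names and the trivial `bind` function are invented for illustration.

```python
# Toy DAG scheduler: a node is bound to a resource only when all of its
# dependencies have completed ("lazy" binding), never in advance.
def run_dag(deps, bind):
    """deps: {job: set of prerequisite jobs}; bind(job) picks a resource."""
    done, order = set(), []
    while len(done) < len(deps):
        # A job is ready once every prerequisite has finished
        ready = [j for j, pre in deps.items() if j not in done and pre <= done]
        for job in ready:
            order.append((job, bind(job)))  # resource binding happens only now
            done.add(job)
    return order


deps = {"gen": set(), "sim": {"gen"}, "reco": {"sim"}, "merge": {"reco"}}
schedule = run_dag(deps, bind=lambda job: "ce1")
print([j for j, _ in schedule])  # topological order: gen, sim, reco, merge
```

Deferring the `bind` call until a job is free of dependencies is the whole point: by then the RB can use fresh information about data and CPU availability instead of a stale snapshot taken at submission time.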
40. WMS future activities
- New functionality:
  - Support for job partitioning (available soon), using the job checkpointing and DAGMan mechanisms: the original job is partitioned into sub-jobs which can be executed in parallel; at the end each sub-job saves a final state, later retrieved by a job aggregator responsible for collecting the results of the sub-jobs and producing the overall output
- Grid accounting: DGAS (testing in progress)
  - Based on a computational-economy model: users pay in order to execute their jobs on the resources, and the owners of the resources earn credits by executing the user jobs
  - Start by providing detailed accounting of resource usage
  - Scheduling optimization
- VO-based resource access policy support: PBOX (development in progress). Grid resource sharing requires:
  - deploying VO-wide policies
  - respecting local site policies
  - specifying policies relating to the behavior of the grid as a whole
  - The RB will then take decisions (matchmaking) on the basis of VO/user policies
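The computational-economy idea behind DGAS can be sketched as a toy ledger; the prices, account names and `Bank` class are invented for illustration and have nothing to do with the DGAS implementation.

```python
# Toy economic accounting: users pay credits per CPU-hour consumed,
# resource owners earn them. All names and prices are made up.
class Bank:
    def __init__(self, balances):
        self.balances = dict(balances)

    def charge(self, user, owner, cpu_hours, price_per_hour):
        """Move credits from the job's owner to the resource's owner."""
        cost = cpu_hours * price_per_hour
        if self.balances[user] < cost:
            raise ValueError("insufficient credits")
        self.balances[user] -= cost
        self.balances[owner] += cost
        return cost


bank = Bank({"alice": 100.0, "cnaf": 0.0})
bank.charge("alice", "cnaf", cpu_hours=8, price_per_hour=2.5)
print(bank.balances)  # {'alice': 80.0, 'cnaf': 20.0}
```

Even this toy shows why the model starts from detailed usage accounting: without reliable per-job CPU-hour records there is nothing to price.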
41. Grid Storage Element interfaces
- First version of the SE in the LCG Data Challenges: a disk server with GridFTP and NFS server protocols
- Future SE version: SRM interface
  - Management and control: SRM (with possible evolution)
  - POSIX-like file I/O for file access: open, read, write; not real POSIX (like rfio)
(Diagram: storage management goes through the SRM interface; POSIX-like file I/O goes through rfio, dcap, chirp or aio; backends include dCache, NeST, Castor MSS and plain disk.)
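The management-and-control role of an SRM layer, reserving space and pinning files in front of heterogeneous backends, can be sketched as below. This is a simplification with invented method names; real SRM is a web-service protocol with many more operations.

```python
# Toy storage resource manager: space allocation and file pinning
# in front of an arbitrary backend (disk, dCache, tape MSS, ...).
class ToySRM:
    def __init__(self, capacity_gb):
        self.capacity_gb = capacity_gb
        self.used_gb = 0
        self.pinned = set()

    def allocate(self, size_gb):
        """Reserve space for incoming files; refuse if the store is full."""
        if self.used_gb + size_gb > self.capacity_gb:
            raise ValueError("no space")
        self.used_gb += size_gb

    def pin(self, filename):
        """Keep a file on disk (e.g. staged from tape) until released."""
        self.pinned.add(filename)

    def release(self, filename):
        self.pinned.discard(filename)


srm = ToySRM(capacity_gb=100)
srm.allocate(40)
srm.pin("/castor/run123.dat")
print(srm.used_gb, len(srm.pinned))  # 40 1
```

The separation matters: jobs do POSIX-like reads through rfio or dcap, while allocation and pinning go through the management interface, so the same client code works whether the backend is plain disk or a tape MSS.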
42. Data Management services
- A Replica Location Service (RLS) is a distributed registry service that records the locations of data copies and allows discovery of replicas
- It maintains mappings between logical identifiers and target names:
  - physical targets map to the exact locations of replicated data
  - logical targets map to another layer of logical names, allowing storage systems to move data without informing the RLS
- Provides optimized replica access
- The RLS was designed and implemented in a collaboration between the Globus project and the DataGrid project
- Different interfaces
- The WMS interacts with the RLS to optimize job scheduling
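The two-level mapping described above can be sketched with a toy in-memory catalogue; the real RLS is a distributed registry with local catalogues and indices, and the LFN, GUID and URL strings below are invented examples.

```python
# Toy replica catalogue: logical file names (LFNs) map to a GUID, and the
# GUID maps to physical replica locations (PFNs). Storage can then move or
# copy data by updating only the second mapping.
class ToyRLS:
    def __init__(self):
        self.lfn_to_guid = {}
        self.guid_to_pfns = {}

    def register(self, lfn, guid, pfn):
        self.lfn_to_guid[lfn] = guid
        self.guid_to_pfns.setdefault(guid, []).append(pfn)

    def replicas(self, lfn):
        """Discover every known physical copy of a logical file."""
        return self.guid_to_pfns.get(self.lfn_to_guid.get(lfn), [])


rls = ToyRLS()
rls.register("lfn:/cms/higgs/evt001", "guid-42",
             "gsiftp://se1.pd.infn.it/data/evt001")
rls.register("lfn:/cms/higgs/evt001", "guid-42",
             "gsiftp://se2.cnaf.infn.it/data/evt001")
print(len(rls.replicas("lfn:/cms/higgs/evt001")))  # 2 replicas known
```

This is also the hook the WMS uses for scheduling: given an LFN from the job description, `replicas()` tells the broker which SEs already hold the data, so rank can favor the close CEs.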
43. WMS/DM architecture and interaction
(Diagram: the User Interface talks to the Resource Broker, which uses the Information Service and, through the Replica Manager client, the Replica Location Service and the Replica Optimisation service; the latter draws on the Storage Element Monitor, the Storage Elements and the Network Monitor.)
44. General Grid services
- Security:
  - Login once
  - Based on a Public Key Infrastructure
  - VO-oriented authentication/authorization tools
- VO-oriented policy and accounting
- Monitoring system
- User support system
45. The user interacts with CAs, VOs and Resource Providers
- Certificates are issued by a set of well-defined Certification Authorities (CAs); CA policies and procedures establish mutual trust
- Authorization is granted at the VO level: each VO has its own VOMS server, containing a (group / role / capabilities) triple for each member of the VO
- Resource Providers (RPs) evaluate the authorization granted by the VO to a user and map it into local credentials to access resources
(Diagram: 1) the user sends a cert request to the CA; 2) the CA signs the cert and publishes cert/CRL updates; 3) under an agreement, the user registers with the VO manager, who administers user membership, roles and capabilities; 4) the Resource Provider maps the user into a local credential for access to the RB and services.)
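The RP-side mapping step can be sketched like this, a toy version of a grid-mapfile-style table keyed on VOMS attributes; the group and role strings and the local account names are invented examples.

```python
# Toy mapping of VOMS attributes to a local account: the resource provider
# inspects the user's (group, role) pair and picks a local credential.
# All table entries below are illustrative, not a real site configuration.
MAP = [
    # (VO group prefix, required role) -> local unix account
    (("/cms/production", "sgm"), "cmssgm"),
    (("/cms", None),             "cms001"),
    (("/atlas", None),           "atlas001"),
]


def map_user(group, role):
    """Return the local account for the first matching (group, role) rule."""
    for (g, r), account in MAP:
        if group.startswith(g) and (r is None or r == role):
            return account
    raise PermissionError("no mapping: access denied")


print(map_user("/cms/production", "sgm"))  # cmssgm
print(map_user("/cms/higgs", None))        # falls through to the generic cms001
```

Rule order is the design choice: the most specific (group, role) pairs come first, so a production manager gets a privileged account while everyone else in the VO shares a generic one.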
46. Near future: the VO-oriented policy system PBOX
- Policy examples:
  - Users belonging to group /vo/a may only submit 10 jobs a day
  - Users belonging to group /vo/b should have their jobs submitted to the maximum-priority queue
  - User "some user" is banned from the CNAF site
- Requirements: the system should
  - be VO-based and distributed
  - be highly configurable, able to define and enforce previously unknown types of policies
  - leave total control over local sites to local admins
  - be capable of expressing policies requiring a global view of the grid
  - be compliant with existing protocols and not require their redesign
- Objective: help the Workload Management System in Grid resource selection
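The kinds of rules listed above can be sketched as a tiny policy decision point. This is illustrative only (PBOX itself was still under development); the rule contents mirror the examples on this slide, with invented field names.

```python
# Toy policy decision point: evaluates VO-level rules against a job request.
# Each rule returns "deny"/"permit" if it applies, or None to pass through.
def decide(request, rules):
    """Apply the first rule that matches; the default verdict is permit."""
    for rule in rules:
        verdict = rule(request)
        if verdict is not None:
            return verdict
    return "permit"


rules = [
    # "User some_user is banned from the CNAF site."
    lambda r: "deny" if r["user"] == "some_user" and r["site"] == "CNAF" else None,
    # "Users in group /vo/a may only submit 10 jobs a day."
    lambda r: "deny" if r["group"] == "/vo/a" and r["jobs_today"] >= 10 else None,
]

print(decide({"user": "some_user", "site": "CNAF",
              "group": "/vo/b", "jobs_today": 0}, rules))   # deny
print(decide({"user": "alice", "site": "CNAF",
              "group": "/vo/a", "jobs_today": 3}, rules))   # permit
```

Because rules are opaque callables, previously unknown policy types can be added without changing the engine, which is the configurability requirement above; in the real design the decision would then feed the RB's matchmaking.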
47. PBOX
(Diagram: the user submits through the Resource Broker; a VO PBOX, administered by the VO admin, and a Grid/farm PBOX, administered by the site admin, each contain a Policy Enforcement Point (PEP) and a Policy Decision Point (PDP), and communicate with each other and with the RB through a Policy Communication Interface (PCI).)
48. Conclusions
- A first generation of Grid services is ready for production Grids and already in use in LCG in the EU, Grid.it in Italy and Grid3 in the US
- They are still evolving towards more functionality, robustness and security
- Applications such as the LHC experiments' Data Challenges indicate clear directions for the evolution needed to satisfy those communities
- Major basic services are still completely missing, or lack very important functionality required by the user communities: metadata catalogues, user-defined collections, reliable data and metadata replication services, a policy framework, ...
- The next major step is towards bringing Grid and Web Services together:
  - the Web Services Resource Framework (WSRF) is the newly proposed standard for modeling stateful resources with Web Services
  - WS-Notification provides a publish-subscribe messaging capability for Web Services
  - WS-Resource framework: transactions, ...
  - However, implementations of the basic WSRF services are still missing, and there are no performance measurements yet
- The LHC experiments (10^4 people) need a fully operational infrastructure in place for 2007; we should concentrate on providing the basic services, especially those still missing