1 - Grids and OO: New directions in computing for HEP
Mirco Mazzucato, INFN-Padova
2 - Main conclusions of the LHC Computing Review
- The Panel recommends the multi-tier hierarchical model proposed by MONARC as one key element of the LHC computing model, with the majority of the resources not based at CERN (1/3 in, 2/3 out)
- About equal shares between the Tier0 at CERN, the Tier1s and the lower-level Tiers down to the desktops: Tier0 / Σ(Tier1) / Σ(all Tier2) = 1 / 1 / 1
- All experiments should perform Data Challenges of increasing size and complexity until LHC start-up, also involving the Tier2s; EU testbed of 30-50% of one LHC experiment by 2004
- Limit heterogeneity: OS = Linux, persistency = 2 tools max
- General consensus that the Grid technologies developed by DataGrid can provide the way to efficiently realize this infrastructure
3 - The HEP MONARC Regional Centre hierarchy
[Diagram: the Tier0 at CERN is linked at 2.5 Gbps to the Tier1 centres (UK, France, INFN, Fermilab); the Tier1s are linked at 622 Mbps to the Tier2 centres, which serve the Tier3 site installations at 622 Mbps; desktops form Tier4, connected at 100 Mbps-1 Gbps. INFN-GRID.]
4 - NICE PICTURE...
- BUT WHAT DOES IT MEAN?
5 - The real challenge: the software
- How to put together all these WAN-distributed resources in a way that is transparent for the users
- Transparent means that the user should not notice the network and the many WAN-distributed sources of resources
- As with the WEB, given good network connectivity
- How to group them dynamically to satisfy the tasks of virtual organizations?
- Here comes the Grid paradigm
- End of '99, for EU and LHC computing: start of the DataGrid project
- US: GRIDs "enable communities (virtual organizations) to share geographically distributed resources as they pursue common goals... in the absence of central control, omniscience, trust relationships" (Ian Foster and Carl Kesselman, CERN, January 2001)
- Just in time to answer the question opened by the MONARC model.
6 - Each resource (our "farms", in the language of the '90s) is transformed by the Grid middleware into a GridService which is accessible via the network. A GridService
- speaks a well-defined protocol
- has standard APIs
- contains information on itself, which is made available to an index (itself accessible via the network) when it registers itself
- has a policy which controls its access
- can be used to form more complex GridServices
(a toy sketch of these properties follows below)
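As a purely illustrative aside (not from the slides), the following Python sketch mimics these properties with toy classes: a service that describes itself, registers with an index, and enforces a simple access policy. The class names, attributes and index are invented for illustration and do not correspond to Globus or EDG code.

```python
# Toy sketch of a "GridService": self-description, registration with an index,
# and an access policy.  All names here are hypothetical.

class ServiceIndex:
    """Toy index holding the self-descriptions published by services."""
    def __init__(self):
        self.entries = {}

    def register(self, name, description):
        self.entries[name] = description

    def lookup(self, **required):
        # Return the services whose description matches the requested attributes.
        return [n for n, d in self.entries.items()
                if all(d.get(k) == v for k, v in required.items())]


class GridService:
    def __init__(self, name, protocol, attributes, allowed_vos):
        self.name = name
        self.protocol = protocol              # "speaks a well-defined protocol"
        self.attributes = attributes          # information on itself
        self.allowed_vos = set(allowed_vos)   # policy controlling its access

    def register(self, index):
        # Publish the self-description to the (network-accessible) index.
        index.register(self.name, {"protocol": self.protocol, **self.attributes})

    def authorize(self, vo):
        return vo in self.allowed_vos


if __name__ == "__main__":
    index = ServiceIndex()
    ce = GridService("ce.pd.infn.it", "GRAM",
                     {"type": "ComputingElement", "cpus": 64},
                     allowed_vos={"cms", "alice"})
    ce.register(index)
    print(index.lookup(type="ComputingElement"))       # ['ce.pd.infn.it']
    print(ce.authorize("cms"), ce.authorize("virgo"))  # True False
```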
7 - "The Anatomy of the Grid: Enabling Scalable Virtual Organizations", I. Foster, C. Kesselman, S. Tuecke, Intl. J. Supercomputer Applications, 2001.
www.globus.org/research/papers/anatomy.pdf
- The Globus Team: layered Grid architecture [figure, with the Application layer at the top]
8 - ComputingElement (CE)
- StorageElement (SE)
- GridScheduler
- Information and Monitoring
- ReplicaManager (RM)
- FileMover
- ReplicaCatalog
- But also:
- UserRunTimeEnvironment
- Network
- SecurityPolicyService
- Accounting
- Well-defined interfaces
- Simple dependencies
- Well-defined interactions
9 - EU-DataGrid architecture [layered diagram]
- Local layer: Local Application, Local Database, Local Computing
- Grid Application Layer: Data Management, Metadata Management, Object-to-File Mapping, Job Management
- Collective Services: Information and Monitoring, Replica Manager, Grid Scheduler
- Underlying Grid Services: Computing Element Services, Storage Element Services, Replica Catalog, Authorization, Authentication and Accounting, Service Index, SQL Database Services
- Grid Fabric services: Node Installation Management, Monitoring and Fault Tolerance, Fabric Storage Management, Configuration Management, Resource Management
10 - The available basic services (Globus, EDG, ...)
- Basic and essential services required in a Grid environment:
- Computing and Storage Element Services: the basic and essential services required in a Grid environment. These include the ability to
  - submit jobs on remote clusters (Globus GRAM)
  - transfer files efficiently between sites (Globus GridFTP, GDMP)
  - schedule jobs on Grid services (EDG Broker)
  (a command-line sketch of the first two capabilities follows below)
- The Replica Catalog and Replica Manager (Globus): store information about the physical files held on any given Storage Element and manage the replicas
- The Information Service (Globus MDS2): provides information on the available resources
- SQL Database Service (EDG): provides the ability to store Grid metadata
- Service Index (EDG): stores information on Grid services and their access URLs
- Security: Authentication, Authorization and Accounting (Globus/EDG): all the services concerning security on the Grid
- Fabric (EDG): transforms the hardware into a Grid service
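In practice the first two capabilities are driven from the standard Globus command-line clients. The sketch below (not from the slides) wraps them from Python; the host names and file URLs are hypothetical placeholders.

```python
# Minimal sketch: job submission via Globus GRAM and file transfer via
# GridFTP, using the standard Globus 2 command-line clients.  The hosts and
# URLs below are hypothetical; a valid Grid proxy is assumed to exist.
import subprocess

CE = "ce.example-tier1.org"   # hypothetical Computing Element (GRAM gatekeeper)

# Submit a job on a remote cluster through GRAM.
subprocess.run(["globus-job-run", CE, "/bin/echo", "hello from the Grid"],
               check=True)

# Transfer a file efficiently between two Storage Elements with GridFTP.
subprocess.run(["globus-url-copy",
                "gsiftp://se.example-tier1.org/data/run42/events.root",
                "gsiftp://se.example-tier2.org/data/run42/events.root"],
               check=True)
```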
11 - Interest in Grid technology started to grow significantly in the HENP physics community at the end of 1999
- CHEP2000 (February): Grid technology is launched in HENP; the invited talk by I. Foster at the plenary session introduced the basic Grid concepts
- On the Saturday and Sunday after the end of CHEP2000, 100 people came to Padova for the first Globus tutorial given to the HENP community in Europe
- Summer 2000: the turning point
  - Approval of the HENP Grid projects GriPhyN and DataGrid
  - Many national Grid projects: INFN Grid, UK eScience Grid, ...
  - The HENP Grid community grows significantly
- 2001: approval of PPDG, iVDGL, DataTAG
- Autumn 2001: approval of the LHC Computing Grid Project
- CHEP2001: 50 abstracts on Grids
12 - Grid progress review: experiments
- Experiments are increasingly integrating Grid technology in their core software: Alice, Atlas, CMS, LHCb, D0, Cosmology
- Extensive tests of the available Grid tools using the existing environments: STAR (10-032), GridFTP in production between BNL and LBL
- First modifications of the experiments' application environments to integrate the available Grid software
- Definition of architectures for the experiments' Grid-aware applications
- Definition of requirements for future Grid middleware development
13 - ATLAS ATHENA: Grid-enabled data management using the Globus Replica Catalog
- When an Athena job creates an event collection in a physical database file, it registers the data in a grid-enabled collection:
  - add the filename to the (replica catalog) collection
  - add the filename to the location object describing Site A
  - (the OutputDatabase from the job options can be used as the filename)
- The command-line equivalent of what needs to be done is:
  globus-replica-catalog -collection -add-filenames XXX
  globus-replica-catalog -location "Site A" -add-filenames XXX
  (the elided arguments are the LDAP URL of the collection and the authentication information)
  (a scripted version of these two calls is sketched below)
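The same two calls could be scripted, for example from Python, as sketched below. The LDAP URL of the collection and the authentication options are kept elided, exactly as on the slide; only the flags quoted above are used.

```python
# Sketch only: scripting the two globus-replica-catalog calls quoted on the
# slide.  The collection LDAP URL and the authentication options are left as
# placeholders, since the slide deliberately elides them.
import subprocess

ELIDED = ["<LDAP-URL-of-the-collection>", "<authentication-options>"]  # still elided

def register_file(filename, location):
    # Add the file to the replica-catalog collection ...
    subprocess.run(["globus-replica-catalog", *ELIDED,
                    "-collection", "-add-filenames", filename], check=True)
    # ... and to the location object describing the site that holds it.
    subprocess.run(["globus-replica-catalog", *ELIDED,
                    "-location", location, "-add-filenames", filename], check=True)

register_file("atlas.events.0042.db", "Site A")   # hypothetical file name
```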
14 - ALICE distributed farm (P. Cerello, CHEP2001, Beijing, 3-7/9/2001)
[Diagram: production sites (Catania, CERN, Lyon, Torino, ...) receive their input and return stdout/stderr through Globus; data are moved with bbftp to HPSS at CCIN2P3 and to CASTOR at CERN; the Run DB is a MySQL database at Catania; a monitoring server at Bari is reachable from anywhere.]
15 - ALICE/Grid sites and resources
Dubna, Birmingham, NIKHEF, Saclay, GSI, Padova, CERN, Torino, IRB, Lyon, Bologna, Yerevan, Bari, Cagliari, Columbus (US), Catania, Calcutta (IN), Cape Town (ZA), Mexico City (MX)
16 - G-Tools: integration of GDMP into the CMS environment
[Diagram: at Site A the CMS physics software writes databases into the production federation; a CheckDB script performs the DB completeness check and, through the CMS/GDMP interface, triggers the GDMP system to generate and publish a new export catalog to the sites on its subscribers list. At Site B the GDMP server generates the import catalog, replicates the files over the WAN, updates the catalogs and attaches the transferred databases to the user federation catalog; stage/purge scripts copy the files to the local MSS, optionally stage them back, and purge them from disk.]
(a toy sketch of this publish/subscribe replication pattern follows below)
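Conceptually the flow in this diagram is a publish/subscribe replication loop. The sketch below is a toy illustration of that pattern only; the classes and method names are invented and do not correspond to the real GDMP interface.

```python
# Toy illustration of the publish/subscribe replication pattern: a site
# publishes its export catalog and each subscriber replicates the files it
# is missing.  Not the actual GDMP API.
class Site:
    def __init__(self, name):
        self.name = name
        self.files = set()        # databases held locally
        self.subscribers = []     # sites mirroring this one

    def subscribe(self, other):
        self.subscribers.append(other)

    def publish_export_catalog(self):
        # After the DB-completeness check, tell every subscriber what is available.
        catalog = set(self.files)
        for sub in self.subscribers:
            sub.handle_new_catalog(self, catalog)

    def handle_new_catalog(self, source, export_catalog):
        import_catalog = export_catalog - self.files    # files we are missing
        for f in sorted(import_catalog):
            print(f"{self.name}: replicating {f} from {source.name}")
            self.files.add(f)                           # transfer, attach, update catalog

site_a, site_b = Site("Site A"), Site("Site B")
site_a.subscribe(site_b)
site_a.files.update({"hits.0001.db", "hits.0002.db"})
site_a.publish_export_catalog()
```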
17 - Distributed MC production in the future (using DataGrid middleware), LHCb 10-011
- Submit jobs remotely via the Web (WP1 job submission tools, WP4 environment)
- Execute on a farm (WP1 job submission tools)
- Transfer the data to CASTOR (and HPSS, RAL Datastore) (WP2 data replication, WP5 API for mass storage)
- Update the bookkeeping database (WP2 metadata tools, WP1 tools)
- Monitor the performance of the farm via the Web (WP3 monitoring tools)
- Data quality check online: online histogram production using GRID pipes
18 - Workflow management for Cosmology
- Approach
  - Use the Grid for the coordination of remote facilities, including telescopes, computing and storage
  - Use the Grid directory-based information service to find the needed computing and storage resources and to discover the access methods appropriate to their use
  - The supernova search analysis is now running on the prototype DOE Science Grid based at Berkeley Lab
  - They will implement a set of workflow management services aimed at the DOE Science Grid
- Implementation (a DAGMan sketch follows below)
  - SWAP-based (Simplified Workflow Access Protocol) engine for job submission, tracking and completion notification
  - Condor to manage the analysis and categorization tasks, with ClassAds to match needs to resources
  - DAGMan (Directed Acyclic Graph Manager) to schedule parallel execution constrained by tree-like dependencies
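For concreteness, the sketch below (not from the slides) generates a small DAGMan input file with a tree-like dependency of the kind described above. The job names and submit-description files are hypothetical placeholders.

```python
# Sketch: write a Condor DAGMan input file describing a tree-like dependency
# (find candidates -> classify in parallel -> summarize).  The .sub files are
# hypothetical Condor submit descriptions that would exist alongside it.
dag = """\
JOB find_candidates find_candidates.sub
JOB classify_a classify_a.sub
JOB classify_b classify_b.sub
JOB summarize summarize.sub
PARENT find_candidates CHILD classify_a classify_b
PARENT classify_a classify_b CHILD summarize
"""

with open("supernova_search.dag", "w") as f:
    f.write(dag)

# The DAG would then be submitted with:  condor_submit_dag supernova_search.dag
```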
19 - D0 SAM and PPDG (10-037)
[Diagram of the SAM architecture mapped onto the Grid layers:]
- Client applications: D0 Framework C++ codes, Python codes, Java codes, Web, command line
- Collective services: Request Formulator and Planner, Request Manager, Storage Manager, Job Manager, Cache Manager, Dataset Editor, File Storage Server, Project Master, Station Masters, SAM Resource Management, batch systems (LSF, FBS, PBS, Condor), Job Services, Data Mover, Stager, Optimiser, Significant Event Logger, Naming Service, Database Manager, Catalog Manager
- Connectivity and resource: file transfer protocols (ftp, bbftp, rcp), mass storage system protocols (e.g. encp, HPSS), CORBA, UDP, GridFTP, catalog protocols
- Authentication and security: GSI, SAM-specific user/group/node/station registration, bbftp cookie
- Fabric: Resource and Services Catalog, Meta-data Catalog, Replica Catalog, Code Repository, Tape Storage Elements, Disk Storage Elements, Compute Elements, LANs and WANs
- The original diagram marks the components that will be replaced, added or enhanced using PPDG and Grid tools; names in quotes are SAM-given software component names.
20 - The new DataGrid middleware, to be delivered in October 2001
21 - Status of the Grid middleware
- Software and middleware
  - The evaluation phase is concluded. The basic Grid services (Globus and Condor) are installed in several testbeds: INFN, France, UK, US
  - More robustness, reliability and scalability are needed in general (HEP has hundreds of users, hundreds of jobs, enormous data sets)
  - But the DataGrid and US Testbeds 0 are up and running
  - The problems of multiple CAs and of authorization have been solved
- Release 1 of the DataGrid middleware is expected this week
- Real experiment applications will use Grid software in production (ALICE, ATLAS, CMS, LHCb, but also EO, biology, Virgo/LIGO, ...)
- DataGrid Testbed 1 in November will include the major Tier1..Tiern centres in Europe and will soon be extended to the US.
22 - Summary on Grid developments
- Activities are still mainly concentrated on strategies, architectures and tests
- General adoption of the Globus concept of a layered architecture
- General adoption of the Globus basic services
  - Core Data Grid services: transport (GridFTP), Replica Management and Replica Catalog
  - Resource management (GRAM), information services (MDS)
  - Security and policy for collaborative groups (PKI)
- But new middleware tools start to appear and to be largely used: Broker, GDMP, Condor-G, ...
- In general, good collaboration between the EU and US Grid developers
  - GDMP, Condor-G, improvements in the Globus resource management
  - Progress facilitated by the largely shared Open Source approach
- Experiments are getting on top of the Grid activities
  - CMS requirements for the Grid
  - DataGrid WP8 requirements document (100 pages for the LHC experiments, EO and biology)
- Need to plan carefully the next iteration of Grid middleware development (realistic application requirements, results of the testbeds)
23 - Grids and Mass Storage
- The HENP world has adopted many different MSS solutions: Castor, ADSM/TSM, ENSTORE, Eurostore, HPSS, JASMine
- They all offer the same (good) functionality, but with
  - different client APIs
  - different data handling and distribution
  - different hardware support and monitoring
- ... and many different database solutions: Objectivity (OO DB), Root (file based), Oracle
- Difficult to interoperate. A possible way out:
  - Adopt a neutral database object description that allows movement between platforms and DBs, e.g. the (Atlas) Data Dictionary Description Language (DDDL)
  - Adopt a Grid standard access layer on top of the different native access methods, as GRAM does over LSF, PBS, Condor, ... (a sketch of such a layer follows below)
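The sketch below illustrates, in Python, what such a uniform access layer over different native MSS clients could look like. The class names and operations are invented for illustration; they do not correspond to any real EDG or Globus storage interface.

```python
# Toy sketch of a uniform "Grid storage access layer" wrapping different
# native MSS clients, in the spirit of GRAM wrapping LSF/PBS/Condor.
# All class names are hypothetical.
from abc import ABC, abstractmethod

class MassStorageBackend(ABC):
    """Common interface exposed by the Grid layer, whatever sits behind it."""
    @abstractmethod
    def stage_in(self, mss_path: str, local_path: str) -> None: ...
    @abstractmethod
    def stage_out(self, local_path: str, mss_path: str) -> None: ...

class CastorBackend(MassStorageBackend):
    def stage_in(self, mss_path, local_path):
        print(f"[castor] copy {mss_path} -> {local_path}")   # would call the CASTOR client
    def stage_out(self, local_path, mss_path):
        print(f"[castor] copy {local_path} -> {mss_path}")

class HpssBackend(MassStorageBackend):
    def stage_in(self, mss_path, local_path):
        print(f"[hpss] get {mss_path} -> {local_path}")       # would call the HPSS client
    def stage_out(self, local_path, mss_path):
        print(f"[hpss] put {local_path} -> {mss_path}")

BACKENDS = {"castor": CastorBackend(), "hpss": HpssBackend()}

def grid_stage_in(site_mss: str, mss_path: str, local_path: str) -> None:
    # The Grid layer picks the right native client; the user never sees which one.
    BACKENDS[site_mss].stage_in(mss_path, local_path)

grid_stage_in("castor", "/castor/cern.ch/alice/run42/evt.root", "/tmp/evt.root")
```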
24 - Grids and OO simulation/reconstruction
- Geant4 (the OO simulation toolkit) is slowly reaching the HENP experiments
  - Extensive debugging of the hadronic models with test beams, of the geometry descriptions, of the low-energy e.m. descriptions
  - Expected to be adopted soon as the basic production simulation tool by many experiments: Babar, the LHC experiments, ...
- CMS has the OSCAR (Geant4) simulation and the ORCA reconstruction fully integrated in its framework COBRA
- Preliminary tests of simulation and reconstruction on the Grid have been done by all the LHC experiments, Babar, D0
- Need to plan now a Grid-aware framework to fully profit from the Grid middleware
25 - Large developments of Grid middleware are ongoing in parallel in the EU and the US: workflow and data management, information services
- All adopt the Open Source approach
- Several experiments are developing job and metadata managers: natural and safe
- ... but strong coordination is needed to avoid divergent solutions
  - InterGrid organization (EU-US-Asia) for the HENP world
  - Global Grid Forum for the general standardization of protocols and APIs
- Grid projects should develop a new world-wide standard "engine" to provide transparent access to resources (computing, storage, network, ...)
  - As the WEB engine did for information in the early '90s
  - Since the source codes are available, it is better to improve the existing tools than to start parallel divergent solutions
  - Big Science like HENP owes this to the worldwide tax payers
- The HENP Grid infancy ends with the LHC Computing Grid project and CHEP2001