1 LHC Computing Grid Project
- Status of High Level Planning
- LHCC, 5 July 2002
- Les Robertson, Project Leader
- CERN, IT Division
- les.robertson@cern.ch
2 Fundamental Goal of the LCG
- The experiments' computing projects are fundamental to getting the best, most reliable and accurate physics results from the data coming from the detectors
- The goal of the LCG project is to help the experiments do this
3 High-level Goals
- Phase 1 (2002-05)
  - prepare and deploy the environment for LHC computing
    - development/support for applications - libraries, tools, frameworks, data management (inc. persistency), .. common components
    - develop/acquire the software for managing a distributed computing system on the scale required for LHC - the local computing fabric, integration of the fabrics into a global grid
  - put in place a pilot service
    - proof of concept - the technology and the distributed analysis environment
    - platform for learning how to manage and use the system
    - provide a solid service for physics and computing data challenges
  - produce a TDR describing the distributed LHC computing system for the first years of LHC running
  - maintain opportunities for re-use of developments outside the LHC programme
- Phase 2 (2006-08) - acquire, build and operate the LHC computing service
4 Background Environment
- The project starts in an environment where
  - there is already a great deal of activity
  - requirements are changing as understanding and experience develop
  - some fundamental parts of the environment are evolving more or less independently of the project and LHC
- The scope of the applications component of the project is being defined with the experiments over the next 18 months
- Basic requirements for the computing facility come from the report of the LHC Computing Review (February 2001)
  - changing due to review of trigger rates, event sizes, experience with program prototypes, ..
  - and will continue to change as experience is gained with applications and the analysis model is developed
- Technology is in continuous evolution
  - driven by market forces (processors, storage, networking, ..)
  - and by government-funded research (grid middleware)
  - we have to follow these developments - remain flexible, open to change
- Regional Computing Centres
  - established user communities wider than LHC - many external constraints
  - limited experience of collaborating to provide an integrated service
- Project funding is from many sources, each with its own constraints
5 Funding Sources
- Regional centres providing resources for LHC experiments
  - in many cases the facility is shared between experiments (LHC and non-LHC) and maybe with other sciences
- Grid projects - suppliers and maintainers of middleware
- CERN personnel and materials - including special contributions from member and observer states
  - this is the highest priority for CERN computing staff
- Experiment resources
  - people participating in common applications developments, data challenges, ..
  - computing resources provided through Regional Centres
- Industrial contributions
6 Funding Sources (continued)
- The project has differing degrees of management control and influence over these sources
- Some of the funding has been provided because HEP and LHC are seen as computing ground-breakers for Grid technology
  - so we must deliver for LHC and show the relevance for other sciences
  - we must also be sensitive to potential opportunities for non-HEP funding of Phase 2
7 The LHC Computing Grid Project Organisation
(organisation chart)
- LHCC - reviews
- Common Computing RRB (funding agencies) - resources
- Project Overview Board - reports
- Software and Computing Committee (SC2) - requirements, monitoring
- Project Execution Board
8 SC2 / PEB Roles
- SC2 includes the four experiments and the Tier 1 Regional Centres
- SC2 identifies common solutions and sets requirements for the project
  - may use an RTAG (Requirements and Technical Assessment Group)
    - limited scope, two-month lifetime with intermediate report
    - one member per experiment, plus experts
- PEB manages the implementation
  - organising projects, work packages
  - coordinating between the Regional Centres
  - collaborating with Grid projects
  - organising grid services
- SC2 approves the work plan, monitors progress
9 SC2 Monitors Progress of the Project
- Receives regular status reports
- Written status report every 6 months
  - milestones, performance, resources
  - estimates of time and cost to complete
- Organises a peer review
  - about once a year
  - presentations by the different components of the project
  - review of documents
  - review of planning data
10 RTAG Status
- In the applications software area
  - data persistency - finished 05apr02
  - software support process - finished 06may02
  - mathematical libraries - finished 02may02
  - detector geometry description - started
  - Monte Carlo generators - starting
  - applications architectural blueprint - started
- In the fabric area
  - mass storage requirements - finished 03may02
- In the Grid technology and deployment area
  - Grid technology use cases - finished 07jun02
  - Regional Centre categorisation - finished 07jun02
- Current status of RTAGs (and available reports) on www.cern.ch/lcg/sc2
11 Project Execution Organisation
- Four areas, each with an area project manager
  - Applications
  - Grid Technology
  - Fabrics
  - Grid Deployment
12 Applications Area
- Area manager - Torre Wenaus
- Open weekly applications area meeting
- Software Architects Committee set up
  - process for taking LCG-wide software decisions
- Importance of RTAGs to define scope
- Common projects
  - everything that is not an experiment-specific component is a potential candidate for a common project
  - important changes are under way
    - new persistency strategy
    - integration of Geant4 and FLUKA (ALICE) into the experiments' frameworks
  - a good time to define common solutions, but there will be inevitable delays in agreeing requirements, organising common resources
  - long-term advantages in use of resources, support, maintenance
13 Applications Area
- Key work packages
  - Object persistency system
    - agreement on a hybrid solution (ROOT plus a Relational Database Management System) - see the sketch below
  - Common frameworks for simulation and analysis
    - difficulties in getting agreement on simulation requirements
  - Architectural blueprint RTAG started - opening the way to RTAGs/work on analysis components?
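A note on the hybrid idea: bulk event data sits in object files (ROOT), while a relational database holds the catalogue and metadata. The following is a minimal sketch of that division of labour, with invented names and a file-plus-SQLite layout standing in for the real components; it is not the LCG persistency design.

```python
# Hypothetical sketch of a hybrid event store: bulk event data lives in
# files (standing in for ROOT files), while a relational database holds
# the catalogue/metadata. All names here are illustrative, not LCG APIs.
import pickle
import sqlite3


class HybridEventStore:
    def __init__(self, catalogue_path, data_dir):
        self.data_dir = data_dir
        self.db = sqlite3.connect(catalogue_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS events ("
            "  run INTEGER, event INTEGER, file TEXT, offset INTEGER,"
            "  PRIMARY KEY (run, event))"
        )

    def write(self, run, event, payload, filename):
        # Append the serialised event to a data file (ROOT stand-in) ...
        path = f"{self.data_dir}/{filename}"
        with open(path, "ab") as f:
            offset = f.tell()
            f.write(pickle.dumps(payload))
        # ... and record its location in the relational catalogue.
        self.db.execute("INSERT INTO events VALUES (?, ?, ?, ?)",
                        (run, event, filename, offset))
        self.db.commit()

    def read(self, run, event):
        # The catalogue resolves (run, event) to a file and offset.
        row = self.db.execute(
            "SELECT file, offset FROM events WHERE run=? AND event=?",
            (run, event)).fetchone()
        if row is None:
            raise KeyError((run, event))
        with open(f"{self.data_dir}/{row[0]}", "rb") as f:
            f.seek(row[1])
            return pickle.load(f)


store = HybridEventStore("catalogue.db", ".")
store.write(run=1, event=42, payload={"pt": [12.3, 45.6]}, filename="run1.dat")
print(store.read(1, 42))
```

The point is the division of labour: navigational access to the bulk data goes through files, while bookkeeping and lookup go through the RDBMS.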
14 Candidate RTAGs
- Simulation tools
- Detector description model
- Conditions database
- Data dictionary
- Interactive frameworks
- Statistical analysis
- Detector and event visualization
- Physics packages
- Framework services
- C++ class libraries
- Event processing framework
- Distributed analysis interfaces
- Distributed production systems
- Small-scale persistency
- Software testing
- Software distribution
- OO language usage
- LCG benchmarking suite
- Online notebooks
- Completing the RTAGs - setting the requirements - will take about 2 years
15 (figure)
16 The MONARC Multi-Tier Model (1999)
(figure - Tier 0: recording, reconstruction)
17 Building a Grid
(figure - collaborating computer centres)
18 Building a Grid
(figure - the virtual LHC Computing Centre: a Grid spanning the collaborating computer centres, serving virtual organisations such as the Alice VO and the CMS VO)
19 Virtual Computing Centre
- The user ..
  - sees the image of a single cluster
  - does not need to know
    - where the data is
    - where the processing capacity is
    - how things are interconnected
    - the details of the different hardware
  - and is not concerned by the conflicting policies of the equipment owners and managers
- (a concrete sketch of this transparency follows below)
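To make the transparency concrete, here is a hypothetical sketch of what "image of a single cluster" means in practice: the user names a dataset, and an invented broker consults a replica catalogue and a free-capacity table (both made up here) to pick the execution site. None of this is actual LCG middleware.

```python
# Hypothetical sketch of the "virtual computing centre" idea: the user
# submits work against a dataset name and the grid decides where it
# runs. The catalogue, sites and scheduling policy below are invented
# for illustration; they are not LCG middleware APIs.

# Replica catalogue: dataset name -> sites holding a copy (assumed data).
REPLICA_CATALOGUE = {
    "higgs-candidates-2002": ["cern", "fnal", "in2p3"],
}

# Free CPU slots per site (assumed numbers).
FREE_SLOTS = {"cern": 40, "fnal": 250, "in2p3": 120}


def submit(dataset, job):
    """Run `job` wherever the data is - the user never names a site."""
    sites = REPLICA_CATALOGUE.get(dataset)
    if not sites:
        raise LookupError(f"no replica of {dataset} registered")
    # One possible policy: prefer the replica site with the most free CPU.
    site = max(sites, key=lambda s: FREE_SLOTS.get(s, 0))
    print(f"dispatching to {site} (transparent to the user)")
    return job(dataset, site)


# The user's view: a single virtual cluster.
print(submit("higgs-candidates-2002",
             lambda ds, site: f"analysed {ds} at {site}"))
```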
20 Grid Technology Area
- Area manager - Fabrizio Gagliardi
- Ensures that the appropriate middleware is available
- Dependency on deliverables supplied and maintained by the Grid projects
  - many R&D projects in Europe and the US with strong HEP participation/leadership
- Immature technology - evolving, parallel developments
  - conflict between new functionality and stability
  - scope for divergence, especially trans-Atlantic
- It is proving hard to get the first production grids going - from demonstration to service
- Can these projects provide long-term support and maintenance?
- HICB (High Energy and Nuclear Physics Intergrid Collaboration Board) and GLUE - recommendations for compatible US-European middleware
- LCG will have to make hard decisions on middleware towards the end of this year
21 Fabric Area
- Area manager - Bernd Panzer
- Tier 1, 2 centre collaboration
  - develop/share experience on installing and operating a Grid
  - exchange information on planning and experience of large fabric management
  - look for areas for collaboration and cooperation
- Grid-Fabric integration middleware
- Technology assessment
  - likely evolution, cost estimates
- CERN Tier 0+1 centre
  - automated systems management package
  - evolution and operation of the CERN prototype - integrating the base LHC computing services into the LCG grid
22 Grid Deployment Area
- Grid Deployment Area manager not yet appointed
- The job is to set up and operate a Global Grid Service
  - a stable, reliable, manageable Grid for Data Challenges and regular production work
  - integrating computing fabrics at Regional Centres
  - learn how to provide support, maintenance, operation
- Grid Deployment Board - Mirco Mazzucato
  - Regional Centre senior management
  - Grid deployment standards and policies (illustrated in the sketch below)
    - authentication, authorisation, formal agreements, computing rules, sharing, reporting, accounting, ..
  - first meeting in September
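As an illustration of the kind of standards and policies involved, the sketch below models authentication and authorisation in a deliberately simplified way: certificate subjects map to virtual organisations (VOs), and each site's computing rules say which VOs it accepts. The mapping scheme and all names are assumptions for illustration, not GDB policy.

```python
# Hypothetical sketch of the kind of authentication/authorisation policy
# the Grid Deployment Board would standardise: a certificate subject is
# mapped to a virtual organisation (VO), and sites grant rights per VO.
# All names and the mapping scheme are illustrative assumptions.

# Grid certificate subject -> VO membership (assumed registrations).
VO_MEMBERS = {
    "/C=CH/O=CERN/CN=Some Physicist": "cms",
    "/C=IT/O=INFN/CN=Another Physicist": "alice",
}

# Per-site computing rules: which VOs may run jobs there (assumed).
SITE_POLICY = {
    "regional-centre-a": {"cms", "alice"},
    "regional-centre-b": {"alice"},
}


def authorise(subject, site):
    """Authenticate (known subject), then authorise (VO allowed at site)."""
    vo = VO_MEMBERS.get(subject)
    if vo is None:
        return False  # authentication failed: unknown certificate subject
    return vo in SITE_POLICY.get(site, set())


assert authorise("/C=CH/O=CERN/CN=Some Physicist", "regional-centre-a")
assert not authorise("/C=CH/O=CERN/CN=Some Physicist", "regional-centre-b")
```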
23 Grid Deployment Teams - the plan
(diagram)
- Suppliers - common applications s/w, Trillium (US grid middleware), DataGrid middleware
- Integration teams provide tested releases - certification, build and distribution
- LCG infrastructure - coordination and operation: grid operation, user support, call centre
- Fabric operation at the regional centres (A, B, .., X, Y)
24 Staffing Summary
- Foil to be provided
  - summary of staff available at CERN
  - statement about staff available in Regional Centres
25 Status of Planning
- Launch workshop in March 2002 established broad priorities
- Establishing the high-level goals, deliverables and milestones
- Beginning to build the PBS and WBS as the staff builds up and the detailed requirements and possibilities emerge
- Detailed planning will take some time - end of 2002, beginning of 2003 - many things are not yet clear
  - Applications - requirements need further work by SC2 (RTAGs)
  - Grid Technology - negotiation of deliverables from Grid projects
  - Grid Deployment - agreements with Regional Centres (GDB)
- This is computing - success requires flexibility, getting the right balance between
  - reliable, tested, solid technology
  - exploiting leading-edge developments that give major benefits
  - early recognition of de facto standards
26 Proposed High Level Milestones
(milestone chart)
27 Tactics
- First data is in 2007 - LCG should focus on long-term goals
  - the difficult problems of distributed data analysis - unpredictable (chaotic) usage patterns, masses of data, batch and interactive use
  - reliable, stable, dependable services
- LCG must leverage current solutions and set realistic targets
- Short term (this year)
  - use current (classic) solutions for physics data challenges (event productions)
  - consolidate (stabilise, maintain) middleware and see it used for physics
  - learn what a production grid really means by working with the Grid R&D projects
  - get the new data persistency prototype going
- Medium term (next year)
  - make a first release of the persistency system
  - set up a reliable global grid service (not too many nodes, but on three continents)
  - stabilise it
  - grow it to include all active Tier 2 centres, with support for some Tier 3 centres
28 Proposed Level 1 Milestones
(milestone chart)
29 Proposed Level 1 Milestones
- Applications
  - Hybrid Event Store available for general users
  - Distributed production using grid services
  - Distributed end-user interactive analysis
  - Full Persistency Framework
- Grid
  - First Global Grid Service (LCG-1) available
  - LCG-1 reliability and performance targets
  - 50% prototype (LCG-3) available
  - LHC Global Grid TDR
30 Major Risks
- Complexity of the project - Regional Centres, Grid projects, experiments, funding sources and funding motivation
- Grid technology
  - immaturity
  - number of development projects
  - US-Europe compatibility
- Materials funding at CERN
  - about 60% of Phase 1 funding not yet identified
  - includes the investments to prepare the CERN Computer Centre for the giant computing fabrics needed in Phase 2
  - situation regarding the missing CHF 80M for Phase 2 unclear
    - CHF 60M removed from LHC cost-to-completion
    - CHF 20M already to be found from other sources
31 Materials Budget Evolution (from Computing RRB, April 2002)
(chart)
32 LCG and the LHCC
- LCG Phase 1 was approved by Council
  - deliverables are common applications tools and components, and the TDR for the Phase 2 computing facility
- We do not have an LHCC-approved proposal as a starting point
- LHCC referees have been appointed
- During the rest of this year, while the detailed planning is being done, we need some discussion with the referees to
  - ensure that the LHCC has the background and planning information it needs
  - agree on the Level 1 milestones to be tracked by the LHCC
  - agree on reporting style and frequency