1. LHC Computing Grid Project
- Status Report, LHCC Open Session
- 2 October 2002
- Les Robertson, CERN
- les.robertson_at_cern.ch
- http://www.cern.ch/lcg
2. The Goal of the LCG
- To help the experiments' computing projects prepare, build and operate the computing environment needed to manage and analyse the data coming from the detectors
- Phase 1 (2002-05): prepare and deploy a prototype of the environment for LHC computing
- Phase 2 (2006-08): acquire, build and operate the LHC computing service
3. Background
- Recommendations of the LHC Computing Review (CERN-LHCC-2001-004, 20 February 2001)
  - Common solutions and support for applications
  - Estimates of total requirements and costs
  - Distributed computing environment using Grid technology
  - Data recording and reconstruction at CERN
  - Analysis in Regional Centres and CERN (CERN only 1/8 of total Tier 1 analysis capacity)
  - Simulation in Regional Centres
- Launch Committee (CERN/2379/Rev -> Council, 20 September 2001) - organisation:
  - integrating and coordinating work done by experiments, regional centres, grid projects
  - separating requirements setting (SC2) from implementation (PEB)
  - project reviewed by LHCC
  - Computing Resource Review Board common to the four experiments
4. SC2 (and PEB) Roles
- SC2 brings together the four experiments and major Tier 1 Regional Centres
  - it identifies common domains and sets requirements for the project
  - may use an RTAG (Requirements and Technical Assessment Group)
    - limited scope, two-month lifetime with intermediate report
    - one member per experiment, plus experts
- PEB manages the implementation
  - organizing projects, work packages
  - coordinating between the Regional Centres
  - collaborating with Grid projects
  - organizing grid services
- SC2 approves the work plan, monitors progress
[Organisation chart: LHCC, Computing RRB, Overview Board, Project Execution Board, Software and Computing Committee (SC2)]
5. RTAG Status
- On applications
  - data persistency: finished 05 Apr 02
  - software support process: finished 06 May 02
  - mathematical libraries: finished 02 May 02
  - detector geometry description: to finish in October
  - Monte Carlo generators: to finish in October
  - applications architectural blueprint: to finish in October
  - detector simulation: to finish in October
- On fabrics
  - mass storage requirements: finished 03 May 02
- On Grid technology and deployment area
  - Grid technology use cases: finished 07 Jun 02
  - Regional Centre categorization: finished 07 Jun 02
- Current status of RTAGs (and available reports) on www.cern.ch/lcg/sc2
6. SC2 Monitors Progress of the Project
- Requirements for several key work packages have been defined
  - PEB has turned these into workplans: Data Persistency, Software Support, Mass Storage
  - Other workplans are in preparation: Grid use cases, Mathematical Libraries
- Key requirements are scheduled to finish in October
  - Detector Simulation, detector geometry description, Monte Carlo Generators
  - Blueprint for the LHC applications architecture
- This will trigger important further activity in applications development.
7. Project Execution
- Four areas:
  - Applications
  - Grid Technology
  - Fabric Management
  - Grid Deployment
8. Project Execution Board
- Decision taking: as close as possible to the work, by those who will be responsible for the consequences
- Two bodies set up to coordinate and take decisions:
  - Architects Forum
    - software architect from each experiment and the applications area manager
    - makes common design decisions and agreements between experiments in the applications area
    - supported by a weekly applications area meeting open to all participants
  - Grid Deployment Board
    - representatives from the experiments and from each country with an active Regional Centre taking part in the LCG Grid Service
    - forges the agreements, takes the decisions, and defines the standards and policies needed to set up and manage the LCG Global Grid Service
    - coordinates the planning of resources for physics and computing data challenges
9. Project Planning and Resources - Phase 1
- Launch Workshop at CERN, 11-15 March 2002
  - set the scope and priorities for the project
- High-level planning paper prepared and presented to LHCC in July
  - see www.cern.ch/lcg/peb -> Status of High Level Planning
  - planning evolving rapidly; aim to have a formal WBS plan by end 2003
- Voluntary special funding to fill the gap between the CERN base budget and the estimated costs of Phase 1
  - Doing well with human resources
  - Only 50% of the materials gap has been filled
10. Recruitment of staff from special funding
- 50 people expected to have joined with external funding by end of year
- Some additional commitments: Germany, Italy
[Chart: experience-weighted FTEs]
11. LCG Phase 1 Staffing at CERN (without EP, experiment staff)
[Chart: staffing profile; planned staff level for Phase 2 indicated]
12. Experience-weighted FTEs
Scheduling issues:
- in the applications area: slower-than-expected agreement on requirements; fast ramp-up of qualified staff
- in the grid deployment area: more work than originally anticipated; later arrival of staff
13. Estimated Materials Costs - Phase 1
14. CERN IT Materials
15. Applications Area
- Three active projects:
  - Software Process and Infrastructure (SPI)
  - Persistency Framework (POOL)
  - Math Libraries
- Common staffing
  - Applications projects will integrate staff from the experiments and the IT and EP Divisions
  - Already achieved with the POOL project
  - Migration of key staff to Building 32
- Urgent to get requirements agreed on:
  - simulation
  - architectural blueprint -> analysis
16. POOL Project Status (D. Duellmann)
- First internal release this week
  - First interface definition and design round for core components concluded
  - Project s/w and build infrastructure in place
  - Core component implementation ready
  - Release target: navigation between persistent objects operational (illustrated in the sketch below)
- On track w.r.t. the work plan proposed to SC2/PEB in early August
  - One more internal release in October
  - First external release in November
  - Very tight (but feasible) schedule
- More details available on the POOL web site
  - http://lcgapp.cern.ch/project/persist/
  - planning and design documents for core components
dirk.duellmann_at_cern.ch
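
To make the release target concrete, here is a minimal C++ sketch of what "navigation between persistent objects" looks like from user code. This is not the POOL API: the PersistentRef and readFromStore names, the token format and the load-on-dereference behaviour are all hypothetical, chosen only to illustrate that following a reference transparently brings the target object into memory from the object store.

#include <iostream>
#include <memory>
#include <string>

// Hypothetical store access: a real framework would locate the object by its
// token in a file catalogue and deserialize it; here we just default-construct.
template <typename T>
std::shared_ptr<T> readFromStore(const std::string& token) {
    std::cout << "loading " << token << " from the object store\n";
    return std::make_shared<T>();
}

// Hypothetical persistent reference: stores only a token and loads the target
// object lazily, the first time the reference is followed.
template <typename T>
class PersistentRef {
public:
    explicit PersistentRef(std::string token) : token_(std::move(token)) {}
    T* operator->() {                       // the navigation point
        if (!cached_) cached_ = readFromStore<T>(token_);
        return cached_.get();
    }
private:
    std::string token_;
    std::shared_ptr<T> cached_;
};

// Example event-data classes: the event keeps a reference to its raw data,
// which stays in the store until the reference is followed.
struct RawData { int nChannels = 0; };
struct Event   { PersistentRef<RawData> raw{"rawdata#42"}; };

int main() {
    Event evt;
    std::cout << "channels: " << evt.raw->nChannels << "\n";  // triggers the load
}

In the real framework the token would come from the persistency catalogue and the load would go through the storage back-end; the point here is only that navigation is transparent to the client code.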
17. LCG SPI Status (A. Aimar)
- Services
  - AFS delivery area
  - CVS server
  - Build platforms
- Components
  - Code documentation
  - Software documentation
  - Coding and design guidelines
  - CVS organization
  - Workpackage testing
- Other services and components started
  - Project web
  - Memory testing
  - Configuration management
alberto.aimar_at_cern.ch
18. LCG SPI Status (A. Aimar)
- Everything is done in collaboration with:
  - LCG and the LCG projects (POOL)
  - the LHC experiments
  - big projects (Geant4, ROOT, etc.)
- Services and components are currently available or being developed, and are used by POOL
- Everything is done after assessing what is available in the Laboratory (experiments, IT, etc.) and in free software in general
- Users are involved in defining the solutions; everything is public on the SPI web site
- Development of specific tools is avoided; IT services are used
- A workplan is being prepared for the next LCG SC2 committee meeting
19. Simulation
- First set of formal LCG requirements for MC generators and simulation due in October
- It is expected that there will be a need for both Geant4 and FLUKA
- Geant4
  - independent collaboration, including HEP institutes, LHC and other experiments, other sciences
  - significant CERN and LHC-related resources
  - MoU being re-discussed now
  - proposal to create an HEP User Requirements Committee chaired by an LHC physicist
  - need to ensure long-term support
  - CERN resources will be under the direction of the project
20. Grid Technology in LCG
- This area of the project is concerned with:
  - ensuring that the LCG requirements are known to current and potential Grid projects
  - active lobbying for suitable solutions, influencing plans and priorities
  - negotiating support for tools developed by Grid projects
  - developing a plan to supply solutions that do not emerge from other sources
- BUT this must be done with caution: it is important to avoid HEP-special solutions
21. Grid Technology Status
- A base set of requirements has been defined (HEPCAL)
  - 43 use cases
  - 2/3 of which should be satisfied in 2003 by currently funded projects
- LCG plans to use the technology emerging from some of the many Grid projects receiving substantial national and EU R&D funding, and perhaps later from industry
- Today
  - many of these projects are led by, or strongly influenced by, HEP
  - they are built on the Globus toolkit
  - and form two main groups:
    - around the (European) DataGrid project
    - subscribing to the (US) Virtual Data Toolkit (VDT)
  - rapidly growing interest and investment from other sciences and industry
  - HEP (LHC data challenges, BaBar, LCG, ...) is an early adopter
- Tomorrow
  - must remain in the main line to leverage the massive investments being made
  - increasingly difficult for HEP to influence direction
  - expect several major architectural changes before things mature
22. Grid Technology Deployment: the Strategy
- Next 9 months: acquire experience during physics data challenges using the current versions of the grid packages (DataGrid, VDT, NorduGrid, AliEn, ...)
- Choose a common set of middleware to be used for the first LCG grid service, LCG-1
  - targets: full definition of LCG-1 by the end of the year; LCG-1 in operation mid-2003; LCG-1 in full service by end of 2003
  - this will be conservative (stability before functionality) and will not satisfy all of the HEPCAL requirements
  - but must be sufficient for the data challenges scheduled in 2004
- Actively use the HEPCAL requirements to negotiate/influence future developments by HEP-led and other grid projects
  - this will take time
23. Grid Technology Deployment: the Details
- Close collaboration between LCG and EDG on integration and certification of grid middleware
  - common teams being established
  - prepares the ground for long-term LCG support of grid middleware, integrating and certifying tools from several sources
- GLUE: a common US-European activity to achieve compatible solutions
  - effort provided by DataTAG (Europe) and iVDGL/PPDG (US)
- Grid Deployment Board
  - first task is the detailed definition of LCG-1, the initial LCG Global Grid Service
  - this will include defining the set of grid middleware tools to be deployed
- LCG-1 will provide:
  - a service for data challenges
  - experience in close operational collaboration between the Regional Centres
  - a testbed for learning how to maintain and operate a global grid service
24. Data Challenges in 2002
25. 6 million events, 20 sites
26. Grid tools used at 11 sites
Alois.Putzer_at_cern.ch
27. ADC IV Performance - Period 1
- Event building with flat data traffic
  - no recording; 5 days non-stop
  - 1800 MBytes/s sustained (milestone: 1000 MBytes/s)
- Event building and data recording with ALICE-like data traffic
  - recording to CASTOR; 4.5 days non-stop
  - data to disk
  - total data volume 140 TBytes
  - 350 MBytes/s sustained (milestone: 300 MBytes/s)
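As a quick consistency check on these figures: 350 MBytes/s sustained for 4.5 days is about 350 MB/s × 4.5 × 86400 s ≈ 1.4 × 10^8 MB, i.e. roughly 136 TBytes, in line with the quoted total data volume of 140 TBytes.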
28. Running Physics Data Challenges with AliEn
AliEn production status @GRID (as of 30 September 2002):
- 15100 jobs, 12 CPU-h/job, 1 GB output/job
- up to 450 concurrently running jobs
Predrag.Buncic_at_cern.ch
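Taken together, these figures correspond to roughly 15100 × 12 ≈ 180,000 CPU-hours of processing and about 15 TBytes of output for the production round.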
29. CERN - Computing Challenges (J. Closier)
30. Fabrics Area
- CERN prototype system
  - expanded to 400 systems, 50 TeraBytes of disk
  - mass storage performance being expanded to 350 MB/sec
- Prototype used for:
  - testbeds for Grid middleware
  - computing data challenges, including ATLAS filter farm tests
  - high-performance data-intensive cluster
    - needed for ALICE data recording challenges
    - will be upgraded with an Enterasys 10 Gbit Ethernet switch
  - extension of LXBATCH for physics data challenges
- ALICE data challenges will drive the CASTOR mass storage management developments
- Technical information exchange between Regional Centre staff exploits the HEPiX organisation
  - Large Scale Cluster Workshop being organised at HEPiX in October at FNAL, focusing on operating a fabric within a grid
- Next year: revised costing for Phase 2 at CERN
  - revised trigger and event-size data
  - new technology review nearing completion (PASTA III)
31. Computer Centre Upgrade - Background
- LHC computing requires additional power and air conditioning capacity in B513.
- Following studies in 2000/2001, the following plan was developed:
  - Convert the tape vault to a machine room area in 2002.
  - Use this space from 2003, both for new equipment and to empty part of the existing machine room.
  - Upgrade the electrical distribution in the existing machine room during 2003-2005, using the vault space as a buffer.
  - Create a dedicated substation to meet power needs.
- For air conditioning reasons, the vault should be used for bulky equipment with low heat dissipation, e.g. tape robotics.
32. From tape vault to computer room
33. Computer Centre Upgrade - What Next?
- From October 14th:
  - Migrate equipment from the machine room to the vault.
  - Robots to move from January.
- Subject to funding:
  - Upgrade the machine room electrical distribution infrastructure from June.
  - Start construction of a new substation for the computer centre early in 2003.
34. LCG Level 1 Milestones proposed to LHCC
35. LCG Level 1 Milestones
[Milestone chart covering the applications and grid areas:]
- Hybrid Event Store available for general users
- Distributed production using grid services
- Distributed end-user interactive analysis
- Full Persistency Framework
- LHC Global Grid TDR
- 50% prototype (LCG-3) available
- LCG-1 reliability and performance targets
- First Global Grid Service (LCG-1) available
36. Challenges
- Complexity of the project: Regional Centres, Grid projects, experiments, funding sources and funding motivation
- The project is getting under way in an environment where:
  - there is already a great deal of activity: applications software, data challenges, grid testbeds
  - requirements are changing as understanding and experience develop
  - fundamental technologies are evolving independently of the project and the LHC
- Grid technology
  - immaturity, US-Europe compatibility
  - to what extent will mainline products satisfy LCG requirements?
37. Risks
- Achieving agreement on common projects, requirements, architecture
- Phase 1 funding at CERN
  - about 50% of materials funding not yet identified
  - includes the investments to prepare the CERN Computer Centre for the giant computing fabrics needed in Phase 2
- Phase 2 funding at CERN
  - CHF 20M is NOT included in the LHC cost-to-completion budget
  - a significant reduction in the size of the CERN facility will have a major impact on the assumed computing model
- Grid middleware maintenance
  - by short-lived Grid projects
  - problem beginning to be recognised by funding agencies
  - expect that LCG will have to get involved at some level in middleware maintenance