Title: Software Overview and LCG Project Status
1. Software Overview and LCG Project Status and Plans
- Torre Wenaus
- BNL/CERN
- DOE/NSF Review of US LHC Software and Computing
- NSF, Arlington
- June 20, 2002
2. Outline
- Overview, organization and planning
- Comments on activity areas
- Personnel
- Conclusions
- LCG Project Status and Plans
3. U.S. ATLAS Software Project Overview
- Control framework and architecture
  - Chief Architect, principal development role. Software agreement in place
- Databases and data management
  - Database Leader, primary ATLAS expertise on ROOT/relational baseline
- Software support for development and analysis
  - Software librarian, quality control, software development tools, training
  - Automated build/testing system adopted by and (partly) transferred to Intl ATLAS
- Subsystem software roles complementing hardware responsibilities
  - Muon system software coordinator
- Scope commensurate with the U.S. share in ATLAS (~20% of overall effort)
  - Commensurate representation on steering group
- Strong role and participation in LCG common effort
4. U.S. ATLAS Software Organization
5. U.S. ATLAS - ATLAS Coordination
US roles in Intl ATLAS software:
- D. Quarrie (LBNL), Chief Architect
- D. Malon (ANL), Database Coordinator
- P. Nevski (BNL), Geant3 Simulation Coordinator
- H. Ma (BNL), Raw Data Coordinator
- C. Tull (LBNL), Eurogrid WP8 Liaison
- T. Wenaus (BNL), Planning Officer
6. ATLAS Subsystem/Task Matrix

| Subsystem | Offline Coordinator | Reconstruction | Simulation | Database |
|---|---|---|---|---|
| Chair | N. McCubbin | D. Rousseau | A. Dell'Acqua | D. Malon |
| Inner Detector | D. Barberis | D. Rousseau | F. Luehring | S. Bentvelsen / D. Calvet |
| Liquid Argon | J. Collot | S. Rajagopalan | M. Leltchouk | H. Ma |
| Tile Calorimeter | A. Solodkov | F. Merritt | V. Tsulaya | T. LeCompte |
| Muon | J. Shank | J.F. Laporte | A. Rimoldi | S. Goldfarb |
| LVL2 Trigger / Trigger DAQ | S. George | S. Tapprogge | M. Weilers | A. Amorim / F. Touchard |
| Event Filter | V. Vercesi | F. Touchard | | |

Computing Steering Group members/attendees: 4 of 19 from the US (Malon, Quarrie, Shank, Wenaus)
7. Project Planning Status
- U.S./Intl ATLAS WBS/PBS and schedule fully unified
  - Projected out of common sources (XProject); mostly the same
- US/Intl software planning covered by the same person
  - Synergies outweigh the added burden of the ATLAS Planning Officer role
  - No coordination layer between US and Intl ATLAS planning; direct official interaction with Intl ATLAS computing managers. Much more efficient
  - No more "out of the loop" problems on planning (CSG attendance)
- True because of how the ATLAS Planning Officer role is currently scoped
  - As pointed out by an informal ATLAS computing review in March, ATLAS would benefit from a full FTE devoted to the Planning Officer function
  - I have a standing offer to the Computing Coordinator to willingly step aside if/when a capable person with more time is found
  - Until then, I scope the job to what I have time for and what is highest priority
- ATLAS management sought to impose a different planning regime on computing (PPT) which would have destroyed US/Intl planning commonality; we reached an accommodation which will make my time more rather than less effective, so I remained in the job
8. ATLAS Computing Planning
- US led a comprehensive review and update of the ATLAS computing schedule in Jan-Mar
  - Milestone count increased by ~50%, to ~600; many others updated
  - Milestones and planning coordinated around the DC schedule
  - Reasonably comprehensive and detailed through 2002
- Things are better, but still not great
  - Schedule still lacks detail beyond end 2002
  - Data Challenge schedules and objectives unstable
  - Weak decision making (still a major problem) translates to weak planning
  - Strong recommendation of the March review to fix this; no observable change
- Use of the new reporting tool PPT (standard in the ATLAS construction project) may help improve overall planning
  - Systematic, regular reporting coerced by automated nagging
  - Being introduced so as to integrate with and complement XProject-based planning materials. XProject adapted; waiting on PPT adaptations.
9. Short ATLAS Planning Horizon
[Schedule chart, as of 3/02]
10. Summary Software Milestones
[Milestone chart. Legend: green = done; gray = original date; blue = current date]
11. Data Challenge 1
- DC1 phase 1 (simulation production for the HLT TDR) is ready to start
- Software ready and tested, much of it developed in the US
  - Baseline core software, VDC, Magda, production scripts
- 2M events generated and available for filtering and simulation
- US is providing the first 50K filtered, fully simulated events for QA
  - Results will be reviewed by the QA group before the green light is given for full scale production, in about a week
- During the summer we expect to process 500k fully simulated events at the BNL Tier 1
12. Brief Comments on Activity Areas
- Control Framework and Architecture
- Database
- Software Support and Quality Control
- Grid Software
13. Control Framework and Architecture
- US leadership and principal development roles
  - David Quarrie recently offered and accepted a further 2 year term as Chief Architect
- Athena's role in ATLAS appears well consolidated
  - Basis of post-simulation Data Challenge processing
  - Actively used by end users, with feedback commensurate with Athena's stage of development
  - LHCb collaboration working well
  - FADS/Goofy (simulation framework) issue resolved satisfactorily
  - Regular, well attended tutorials
- Other areas still have to prove themselves
  - ATLAS data definition language
    - Being deployed now in a form capable of describing the ATLAS event model
  - Interactive scripting in Athena
    - Strongly impacted by funding cutbacks
    - New scripting service emerging now
- Tremendous expansion in ATLAS attention to the event model
  - About time! A very positive development
  - Broad, (US-)coordinated effort across the subsystems to develop a coherent ATLAS event model
  - Built around the US-developed StoreGate infrastructure (see the sketch after this list for the flavor of the record/retrieve pattern)
  - Core infrastructure effort receiving useful feedback from the expanded activity
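To give a concrete feel for the pattern referenced above, here is a minimal, hypothetical sketch of a type-and-key transient event store in the spirit of StoreGate; the class and method names are invented for illustration and do not reflect the actual StoreGate/Athena C++ API.

```python
# Illustrative sketch only: a type-and-key keyed transient event store,
# in the spirit of (but not identical to) the StoreGate pattern in Athena.

class TransientEventStore:
    """Holds event data objects for one event, keyed by (type, key)."""

    def __init__(self):
        self._store = {}

    def record(self, obj, key):
        """Register an object under its type and a string key."""
        self._store[(type(obj), key)] = obj

    def retrieve(self, cls, key):
        """Look up an object by the type the client expects and its key."""
        try:
            return self._store[(cls, key)]
        except KeyError:
            raise KeyError(f"No {cls.__name__} recorded with key '{key}'")


# Hypothetical event data classes, for illustration only.
class TrackCollection(list):
    pass


class CaloCellContainer(list):
    pass


if __name__ == "__main__":
    store = TransientEventStore()

    # A "reconstruction algorithm" records its output...
    store.record(TrackCollection(["trk1", "trk2"]), "InDetTracks")
    store.record(CaloCellContainer(["cell1"]), "LArCells")

    # ...and a downstream algorithm retrieves it by type and key,
    # without either algorithm depending on the other directly.
    tracks = store.retrieve(TrackCollection, "InDetTracks")
    print(len(tracks), "tracks retrieved")
```

The point of the pattern is decoupling: producers and consumers of event data share only the transient store and the event model types, which is what lets a broad, subsystem-wide event model effort build on a single piece of core infrastructure.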
14. Database
- US leadership and key technical expertise
  - The ROOT/relational hybrid store, in which the US has unique expertise within ATLAS, is now the baseline and is in active development
  - The early US effort in ROOT and relational approaches (in the face of "dilution of effort" criticisms) was a good investment for the long term as well as the short term
- Event data storage and management now fully aligned with the LCG effort
  - 1 FTE each at ANL and BNL identified to participate and now becoming active
  - Work packages and US roles now being laid out
- ATLAS and US ATLAS have to be proactive and assertive in the common project for the interests of ATLAS (I don't have my LCG hat on here!), and I am pushing this hard
  - Delivering a data management infrastructure that meets the needs of (US) ATLAS and effectively uses our expertise demands it
15. Software Support, Quality Control
- New releases are available in the US 1 day after CERN (with some exceptions when problems arise!)
  - Provided in AFS for use throughout the US
- Librarian receives help requests and queries from 25 people in the US
- US-developed nightly build facility used throughout ATLAS (a simplified sketch of the nightly pattern follows below)
  - Central tool in the day to day work of developers and the release process
  - Recently expanded as a framework for progressively integrating more quality control and testing
    - Testing at component, package and application level
    - Code checking to be integrated
  - CERN support functions being transferred to the new ATLAS librarian
- Plan to resume BNL-based nightlies
  - Much more stable build environment than CERN at the moment
  - Hope to use timely, robust nightlies to attract more usage to the Tier 1
- pacman (Boston U) for remote software installation
  - Adopted by grid projects for VDT, and a central tool in US grid testbed work
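As a rough illustration of what such a nightly facility automates (this is not the actual ATLAS nightly build system; the package names and the checkout/build/test steps below are placeholders), a nightly driver checks out, builds and tests each package and publishes a summary for developers each morning:

```python
# Illustrative sketch of a nightly build/test loop; package names and the
# checkout/build/test steps are placeholders, not the real ATLAS tooling.
import datetime
import json

PACKAGES = ["CoreFramework", "EventModel", "Reconstruction"]  # hypothetical


def checkout(package):
    """Placeholder for fetching the latest source of a package."""
    return True


def build(package):
    """Placeholder for compiling the package; returns True on success."""
    return True


def run_tests(package):
    """Placeholder for component/package/application-level tests."""
    return {"unit": "pass", "integration": "pass"}


def nightly():
    summary = {"date": datetime.date.today().isoformat(), "results": {}}
    for pkg in PACKAGES:
        if not checkout(pkg):
            summary["results"][pkg] = "checkout failed"
            continue
        if not build(pkg):
            summary["results"][pkg] = "build failed"
            continue
        summary["results"][pkg] = run_tests(pkg)
    # Publish the summary where developers can see it each morning.
    print(json.dumps(summary, indent=2))


if __name__ == "__main__":
    nightly()
```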
16. Grid Software
- Software development within the ATLAS complements of the grid projects is being managed as an integral part of the software effort
  - Objective is to integrate grid software activities tightly into the ongoing core software program, for maximal relevance and return
  - Grid project programs consistent with this have been developed
- And this has been successful
  - e.g. the distributed data manager tool (Magda) we developed was adopted ATLAS-wide for data management in the DCs (a toy replica catalog illustrating the idea follows below)
  - Grid goals and schedules integrated with the ATLAS (particularly DC) program
- However we do suffer some program distortion
  - e.g. we have to limit effort on providing ATLAS with event storage capability in order to do work on longer-range, higher-level distributed data management services
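For flavor only, the toy replica catalog below shows the kind of bookkeeping a distributed data manager such as Magda maintains: logical file names mapped to physical replicas at multiple sites. The schema, file names and interface are invented for illustration and are not Magda's actual design.

```python
# Toy replica catalog: logical file name -> physical replicas at sites.
# Invented for illustration; not Magda's actual schema or interface.
import sqlite3


def make_catalog(conn):
    conn.execute(
        """CREATE TABLE IF NOT EXISTS replicas (
               lfn   TEXT,   -- logical file name
               site  TEXT,   -- hosting site (e.g. a Tier 1/2 center)
               pfn   TEXT,   -- physical location at that site
               PRIMARY KEY (lfn, site)
           )"""
    )


def register(conn, lfn, site, pfn):
    conn.execute("INSERT OR REPLACE INTO replicas VALUES (?, ?, ?)",
                 (lfn, site, pfn))


def locate(conn, lfn):
    """Return all known replicas of a logical file."""
    return conn.execute(
        "SELECT site, pfn FROM replicas WHERE lfn = ?", (lfn,)
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    make_catalog(conn)
    register(conn, "dc1.simul.00042.root", "BNL", "/atlas/dc1/simul/00042.root")
    register(conn, "dc1.simul.00042.root", "CERN", "/castor/dc1/simul/00042.root")
    for site, pfn in locate(conn, "dc1.simul.00042.root"):
        print(site, pfn)
```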
17. Effort Level Changes
- ANL/Chicago: loss of 0.5 FTE in DB
  - Ed Frank departure; no resources to replace
- BNL: cancelled 1 FTE new hire in data management
  - Insufficient funding in the project and the base program to sustain the bare-bones plan
  - Results in transfer of DB effort to grid (PPDG) effort, because the latter pays the bills, even if it distorts our program towards lesser priorities
  - As funding looks now, >50% of the FY03 BNL software development effort will be on grid!!
- LBNL: stable FTE count in architecture/framework
  - One expensive/experienced person replaced by a very good postdoc
- It is the DB effort that is most hard-hit, but this is ameliorated by the common project
  - Because the work is now in the context of a broad common project, the US can still sustain our major role in ATLAS DB
  - This is a real, material example of common effort translating into savings (even if we wouldn't have chosen to structure the savings this way!)
18. Personnel Priorities for FY02, FY03
- Priorities are the same as presented last time, and this is how we are doing:
- Sustain LBNL (4.5 FTE) and ANL (3 FTE) support
  - This we are doing so far.
- Add FY02, FY03 1 FTE increments at BNL to reach 3 FTEs
  - Failed. FY02 hire cancelled.
- Restore the 0.5 FTE lost at UC to ANL
  - No resources
- Establish sustained presence at CERN
  - No resources
- As stated last time, we rely on the labs to continue base program and other lab support to sustain the existing complement of developers
  - And they are either failing or predicting failure soon. Lab base programs are being hammered as well.
19. SW Funding Profile Comparisons
[Chart comparing funding profiles: 2000 agency guideline; January 2000 PMP; 11/2001 guideline; current bare bones; compromise profile requested in 2000]
20. Conclusions
- US has consolidated the leading roles in our targeted core software areas
- Architecture/framework effort level being sustained so far
  - And is delivering the baseline core software of ATLAS
- Database effort reduced, but so far preserving our key technical expertise
  - Leveraging that expertise for a strong role in the common project
  - Any further reduction will cut into our expertise base and seriously weaken the US ATLAS role and influence in LHC database work
- US has made major contributions to an effective software development and release infrastructure in ATLAS
  - Plan to give renewed emphasis to leveraging and expanding this work to make the US development and production environment as effective as possible
- Weakening support from the project and base programs, while the emphasis on grids grows, is beginning to distort our program in a troubling way
21. LCG Project Status and Plans (with an emphasis on applications software)
- Torre Wenaus
- BNL/CERN
- DOE/NSF Review of US LHC Software and Computing
- NSF, Arlington
- June 20, 2002
22. The LHC Computing Grid (LCG) Project
Goal: Prepare and deploy the LHC computing environment
- Developed in light of the LHC Computing Review conclusions
- Approved (3 years) by CERN Council, September 2001
- Injecting substantial new facilities and personnel resources
- Activity areas
  - Common software for physics applications
    - Tools, frameworks, analysis environment
  - Computing for the LHC
    - Computing facilities (fabrics)
    - Grid middleware
    - Grid deployment
  - ? Global analysis environment
- Foster collaboration, coherence of LHC computing centers
23. LCG Project Structure
[Organization chart: the LHC Computing Grid Project, with the Project Execution Board led by the Project Manager; the Project Overview Board receiving reports; the Software and Computing Committee providing requirements and monitoring; reviews by the LHCC; resource issues handled by the Resources Board; interfaces to the computing grid projects, HEP grid projects and other labs; work carried out by the project execution teams]
24. Current Status
- High level workplan just written (linked from this review's web page)
- Two main threads to the work
  - Testbed development (Fabrics, Grid Technology and Grid Deployment areas)
    - A combination of primarily in-house CERN facilities work and working with external centers and the grid projects
    - Developing a first distributed testbed for data challenges by mid 2003
  - Applications software (Applications area)
    - The most active and advanced part of the project
- Currently three active projects in applications
  - Software process and infrastructure
  - Mathematical libraries
  - Persistency
- Pressuring the SC2 to open additional project areas ASAP; not enough current scope to put available people to work effectively (new LCG and existing IT people)
25. LHC Manpower Needs for Core Software

From the LHC Computing Review (FTEs; only computing professionals counted):

| Experiment | 2000: have (missing) | 2001 | 2002 | 2003 | 2004 | 2005 |
|---|---|---|---|---|---|---|
| ALICE | 12 (5) | 17.5 | 16.5 | 17 | 17.5 | 16.5 |
| ATLAS | 23 (8) | 36 | 35 | 30 | 28 | 29 |
| CMS | 15 (10) | 27 | 31 | 33 | 33 | 33 |
| LHCb | 14 (5) | 25 | 24 | 23 | 22 | 21 |
| Total | 64 (28) | 105.5 | 106.5 | 103 | 100.5 | 99.5 |

LCG common project activity in applications software:
- Expected number of new LCG-funded people in applications: 23
- Number hired or identified to date: 9 experienced, 3 very junior
- Number working today: 8 LCG (3 in the last 2 weeks), plus 3 existing IT people, plus experiment effort
26. Applications Area Scope
- Application Software Infrastructure
  - Scientific libraries, foundation libraries, software development tools and infrastructure, distribution infrastructure
- Physics Data Management
  - Storing and managing physics data: events, calibrations, analysis objects
- Common Frameworks
  - Common frameworks and toolkits in simulation, reconstruction and analysis (e.g. ROOT, Geant4)
- Support for Physics Applications
  - Grid portals and interfaces to provide distributed functionality to physics applications
  - Integration of physics applications with common software
27. Typical LHC Experiment Software Architecture: a Grid-Enabled View
[Layered architecture diagram; common solutions are being pursued or foreseen at each layer]
- Applications built on top of frameworks: high level triggers, reconstruction, analysis, simulation
- Frameworks and toolkits: one main framework (e.g. ROOT) plus various specialized frameworks for persistency (I/O), visualization, interactivity, simulation, etc., with grid integration through grid interfaces
- Standard libraries: widely used utility libraries (STL, CLHEP)
- Grid services: distributed services
28. Current Major Project: Persistency
- First major common software project begun in April: the Persistency Framework (POOL)
  - To manage locating, reading and writing physics data
  - Moving data around will be handled by the grid, as will the distributed cataloging
  - Will support event data as well as non-event (e.g. conditions) data
- Selected approach: a hybrid store (a toy sketch of the idea follows below)
  - Data objects stored by writing them to ROOT files
    - The bulk of the data
  - Metadata describing the files and enabling lookup are stored in relational databases
    - Small in volume, but with stringent access time and search requirements, well suited to relational databases
  - Successful approach in current experiments, e.g. STAR (RHIC) and CDF (Tevatron)
  - LHC implementation needs to scale to much greater data volumes, provide distributed functionality, and serve the physics data object models of four different experiments
- Early prototype is scheduled for September '02 (likely to be a bit late!)
  - Prototype to serve a scale of 50 TB, O(100k) files, O(10) sites
  - Early milestone driven by CMS, but would have been invented anyway; we need to move development work from abstract discussions to iterating on written software
- Commitments from all four experiments to development participation
  - 3 FTEs each from ATLAS and CMS; in ATLAS, all the participation (2 FTEs) so far is from the US (ANL and BNL); another 1-2 FTE from LHCb/ALICE
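To make the hybrid-store idea above concrete, here is a toy sketch in which bulk objects are streamed to files while the metadata needed to find them lives in a relational catalog. Python's pickle and sqlite3 stand in for ROOT I/O and a production relational database, and none of the class or method names correspond to the actual POOL interfaces.

```python
# Toy hybrid store: bulk data objects go into files, while file/object
# metadata lives in a relational catalog. pickle and sqlite3 stand in for
# ROOT streaming and a production RDBMS; names do not reflect POOL's API.
import pickle
import sqlite3
import uuid


class HybridStore:
    def __init__(self, catalog_path=":memory:"):
        self.catalog = sqlite3.connect(catalog_path)
        self.catalog.execute(
            """CREATE TABLE IF NOT EXISTS objects (
                   token     TEXT PRIMARY KEY,  -- unique object identifier
                   dataset   TEXT,              -- logical dataset name
                   filename  TEXT               -- file holding the object
               )"""
        )

    def write(self, obj, dataset):
        """Stream the object to a file and register it in the catalog."""
        token = str(uuid.uuid4())
        filename = f"{token}.pkl"
        with open(filename, "wb") as f:
            pickle.dump(obj, f)
        self.catalog.execute(
            "INSERT INTO objects VALUES (?, ?, ?)", (token, dataset, filename)
        )
        return token

    def read(self, token):
        """Look up the file in the catalog, then read the object back."""
        row = self.catalog.execute(
            "SELECT filename FROM objects WHERE token = ?", (token,)
        ).fetchone()
        if row is None:
            raise KeyError(f"Unknown token {token}")
        with open(row[0], "rb") as f:
            return pickle.load(f)

    def select(self, dataset):
        """Metadata query: tokens of all objects in a dataset."""
        return [r[0] for r in self.catalog.execute(
            "SELECT token FROM objects WHERE dataset = ?", (dataset,))]


if __name__ == "__main__":
    store = HybridStore()
    t = store.write({"event": 1, "tracks": ["trk1", "trk2"]}, dataset="dc1.simul")
    print(store.read(t))              # round-trip via file + catalog
    print(store.select("dc1.simul"))  # lookup driven by the relational catalog
```

In POOL itself the file side is ROOT-based object streaming and the catalog side must scale to the O(100k) files and O(10) sites mentioned above; the sketch only conveys the division of labor between bulk file storage and relational metadata.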
29. Hybrid Data Store: Schematic View
[Schematic of the hybrid store: a Persistency Manager mediates between the experiment event model and storage. A Locator Service, backed by a dataset DB and a file DB and by a grid-based Distributed Replica Manager, resolves dataset names and object IDs to files. A Storage Manager, Object Streaming Service and Object Dictionary Service (holding experiment-specific object model descriptions) read and write the data objects in the data files]
30. Coming Applications RTAGs
- After a very long dry spell (since Jan), the SC2 has initiated the first stage of setting up additional projects: establishing requirements and technical advisory groups (RTAGs) with 2-3 month durations
  - Detector geometry and materials description
    - To address the high degree of redundant work in this area (in the case of ATLAS, even within the same experiment)
  - Applications architectural blueprint
    - High level architecture for LCG software
- Pending RTAGs in applications
  - Physics generators (launched yesterday)
  - A fourth attempt at a simulation RTAG in the works (politics!)
  - Analysis tools (will follow the blueprint RTAG)
31. Major LCG Milestones
- June 2003: LCG global grid service available (24x7 at 10 centers)
- June 2003: Hybrid event store release
- Nov 2003: Fully operational LCG-1 service and distributed production environment (capacity, throughput, availability sustained for 30 days)
- May 2004: Distributed end-user interactive analysis from Tier 3
- Dec 2004: Fully operational LCG-3 service (all essential functionality required for the initial LHC production service)
- Mar 2005: Full function release of persistency framework
- Jun 2005: Completion of computing service TDR
32. LCG Assessment
- In the computing fabrics (facilities) area, LCG is now the context (and funding/effort source) for CERN Tier 0/1 development
  - But countries have been slow to follow commitments with currency
- In the grid middleware area, the project is still trying to sort out its role as "not just another grid project"; not yet clear how it will achieve the principal mission of ensuring the needed middleware is available
- In the deployment area (integrating the above two), testbed/DC plans are taking shape well, with an aggressive mid '03 production deployment
- In the applications area, the persistency project seems on track, but politics etc. have delayed the initiation of new projects
- The experiments do seem solidly committed to common projects
  - This will change rapidly if LCG hasn't delivered in 1 year
- CMS is most proactive in integrating the LCG in their plans; ATLAS less so to date (and this extends to the US program). I will continue to push (with my ATLAS hat on!) to change this