1  US-CMS User Facilities Project
Vivian O'Dell, Fermilab
US-CMS Software and Computing Oversight Panel Review, October 23, 2000
2  Goals of the US-CMS S&C Project
- To provide the software and computing resources needed to enable US physicists to fully participate in the physics program of CMS
- To allow US physicists to play key roles and exert an appropriate level of leadership in all stages of computing-related activities
  - From software infrastructure to reconstruction to extraction of physics results
  - From their home institutions, as well as at CERN
- This capability should extend to physicists working at their home institutions
3  Data Grid Hierarchy (CMS)
[Diagram: the CMS data grid hierarchy, reconstructed from the figure labels]
- Online System: one bunch crossing every 25 ns; ~100 triggers per second; each event is ~1 MByte. The raw detector output is on the order of PBytes/sec; ~100 MBytes/sec flows to Tier 0/1.
- Tier 0/1 (CERN): ~1 TIPS = 25,000 SpecInt95 (a year-2000 PC is ~20 SpecInt95). Linked to the Tier 1 centers at 0.6-2.5 Gbits/sec, or by air freight.
- Tier 1: Regional Centers at FNAL, France, Italy and the UK. Linked to Tier 2 at 2.4 Gbits/sec.
- Tier 2: linked to Tier 3 at 622 Mbits/sec.
- Tier 3: institute servers with a physics data cache. Physicists work on analysis channels; each institute has ~10 physicists working on one or more channels, and data for these channels should be cached by the institute server.
- Tier 4: workstations, connected at 100-1000 Mbits/sec.
(A small arithmetic sketch of these rates follows below.)
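To make the rates on this slide concrete, here is a minimal Python sketch. The trigger rate, event size and link bandwidths are taken from the figure above; the 1 TB sample size and the assumption that a single transfer gets the full link bandwidth are hypothetical, for illustration only.

```python
# Illustrative arithmetic for the data-grid hierarchy above.
TRIGGER_RATE_HZ = 100   # ~100 triggers per second (from the slide)
EVENT_SIZE_MB = 1.0     # ~1 MByte per event (from the slide)

# Rate out of the online system, in MBytes/sec -- should reproduce the 100 MBytes/sec label.
raw_rate_mb_s = TRIGGER_RATE_HZ * EVENT_SIZE_MB
print(f"Online -> Tier 0/1: {raw_rate_mb_s:.0f} MBytes/sec")

# Link bandwidths quoted on the slide, in Gbits/sec.
links_gbit_s = {
    "Tier 0/1 -> Tier 1": 2.5,    # upper end of the 0.6-2.5 Gbits/sec range
    "Tier 1   -> Tier 2": 2.4,
    "Tier 2   -> Tier 3": 0.622,
    "Tier 3   -> Tier 4": 0.1,    # lower end of the 100-1000 Mbits/sec range
}

# Time to move a hypothetical 1 TB analysis sample over each link.
SAMPLE_TB = 1.0
sample_gbits = SAMPLE_TB * 1000 * 8   # 1 TB expressed in Gbits (decimal units)
for link, bw_gbit_s in links_gbit_s.items():
    hours = sample_gbits / bw_gbit_s / 3600
    print(f"{link}: ~{hours:.1f} hours per {SAMPLE_TB:.0f} TB")
```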
4  Issues
- The scale of LHC computing is greater than anything we've tried in HEP:
  - Number of collaborating physicists
- Number of collaborating institutions
- Number of collaborating nations
- Geographic distribution
- Size and complexity of the data set
- Projected length of the experiment
- We have never had a single site capable of providing resources on the scale of the whole LHC effort -- CERN does not even plan to try.
- There are excellent reasons to believe that the experience at our largest current sites will not scale or provide appropriate resources for the LHC.
5  User Facility Project
- Provide the enabling infrastructure of software and computing for US physicists to fully participate in the CMS physics program
- Acquire, develop, install, integrate, commission and operate the infrastructure
- Includes a Tier 1 Regional Center at Fermilab
- Plans are being developed for lower-level centers and support for collaborators at their home institutions (Tier 2 ~ 10-20% of Tier 1)
- Includes s/w training, workshops and support for US-CMS physicists
- High-speed US and international networks are a major requirement
  - We assume that DOE/NSF want to address the issue of the CERN-US link in a combined way for CMS and ATLAS to get the most cost-effective solution
  - We assume that the issue of connectivity of individual institutions to FNAL has to be worked out on an individual basis, because there are policy issues at each institution
6  Schedule for US-CMS S&C Project
- In order to match the proposed funding profile from DOE/NSF more closely, as well as the LHC schedule, we have stretched out the CMS Software and Computing Project schedule:
  - R&D phase now ends in FY2003
  - FY2004-FY2006: Implementation Phase
  - FY2007 and onwards: Operations Phase
- This required a new hardware cost estimate and resource-loaded WBS.
7  Hoffmann Review
- To address LHC computing and software needs, CERN has set up a review of LHC software and computing
  - Chaired by Hans Hoffmann, Director for Scientific Computing
- Three panels:
  - Software: quantify the software needs of each of the LHC experiments
  - Computing Model: review the computing model of each experiment and make recommendations
  - Resources: review the computing resources required by all LHC experiments
- The resource panel focuses on the amount of computing needed at CERN and at the Regional Centers
  - Eventual result: MOUs between the Tier 1 RCs and Tier 0 (CERN)?
  - This should represent the minimum computing requirements for a Regional Center
8  Hoffmann Review
- Results from the Hoffmann Review
  - Precise numbers are still in flux, but converging rapidly!
  - Tape storage at each Tier 1 for CMS
- Our Tier 1 hardware projections include robotics for 2 PB but media for 1 PB.
9  Hoffmann Review
- Results from the Hoffmann Review
  - Precise numbers are still in flux, but converging rapidly!
  - Disk storage at each Tier 1 for CMS
- Disk is assumed to be 75% efficient (approximately the RAID overhead, or duplicate disk needed for I/O).
10  Hoffmann Review
- Results from the Hoffmann Review
  - Precise numbers are still in flux, but converging rapidly!
  - CPU at each Tier 1 for CMS
- Not included in the Hoffmann Review.
11  Hardware Costing Projections: CPU
- Commodity (cheap) CPU
  - Use Moore's Law to extrapolate from FNAL experience
  - CPU performance doubles every 1.2 years while the cost stays the same
- SMP (expensive) CPU
  - CPU performance/cost doubles every 1.1 years
12  Hardware Costing Projections: Disk
- Commodity (cheap) disks
  - Performance/cost of disk doubles every 1.4 years
- Datacenter (expensive) disks
  - Performance/cost of disk doubles every 1.4 years
13  Hardware Costing Projections: Tape
- Tape media
  - Performance/cost of tape doubles every 2.1 years (? not impressively linear)
- Tape drives
  - Performance/cost of tape drives doubles every 2.1 years
- I believe there is considerable risk in depending on tape storage over the next 15 years. We should be flexible here.
(A sketch of the doubling-time extrapolation used on slides 11-13 follows below.)
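The cost projections on slides 11-13 all follow the same doubling-time rule. Below is a minimal sketch of that extrapolation; only the doubling times come from the slides, while the reference year, price and capacity are made-up placeholders rather than project numbers.

```python
# Doubling-time extrapolation behind the hardware cost projections (slides 11-13).
def per_dollar(base_per_dollar: float, years: float, doubling_years: float) -> float:
    """Capacity or performance per dollar after `years` of exponential improvement."""
    return base_per_dollar * 2.0 ** (years / doubling_years)

DOUBLING_YEARS = {
    "commodity CPU": 1.2,     # performance doubles, cost stays the same
    "SMP CPU": 1.1,           # performance/cost doubles
    "commodity disk": 1.4,
    "datacenter disk": 1.4,
    "tape media": 2.1,
    "tape drives": 2.1,
}

# Example: a hypothetical 0.1 GB/$ of commodity disk in 2001; what does the same
# dollar buy five years later, and how much cheaper is a fixed capacity?
base_gb_per_dollar = 0.1   # placeholder, not a project estimate
gb_later = per_dollar(base_gb_per_dollar, 5, DOUBLING_YEARS["commodity disk"])
cost_factor = 2.0 ** (-5 / DOUBLING_YEARS["commodity disk"])
print(f"Commodity disk five years on: ~{gb_later:.2f} GB/$ "
      f"(a fixed capacity costs ~{cost_factor:.2f}x the starting price)")
```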
14  Hardware Costing for UF w/o T2
Hardware costing for User Facilities, excluding Tier 2.
15  Hardware Costing for UF
Hardware costing for User Facilities, INCLUDING Tier 2 from GriPhyN.
16  Hardware Costing for UF
Total hardware costs including escalation, overhead and a 10% Management Reserve.
17  Personnel for the User Facility Project
- Estimating the number of people needed for the User Facility Project
- We did this in two ways:
  - Top down: using our experience with Run 2
  - Bottoms up: developing a full WBS with experienced FNAL folks
18  Top Down Approach
- What is the appropriate scale for the US-CMS User Facility? We can ask Fermilab because:
  - The scale of US physicist participation in CMS is comparable to CDF or D0 during Run 2, e.g. 500 physicists. This will most likely grow as the LHC comes closer.
  - Software and computing for Run 2 was reviewed separately from the Run 2 detector project, and there are good reasons for this. As a result we have good records of the hardware and personnel spending for Run 2.
- We can start with a top-down approach based on the resources it took for CDF/D0, and assume that supporting roughly the same population with similar needs will take roughly the same amount of computing.
- This top-down approach breaks down with respect to computing hardware, since LHC events are much more complicated (i.e. ~20 interactions/crossing at the LHC vs. ~3 for Run 2).
19  WBS From Top Down

                                        2001  2002  2003  2004  2005  2006  2007
1.1   Tier 1 Regional Center               -     -     -     8    18    20    17
  1.1.1  Hardware Systems                  -     -     -     4     9     9     7
  1.1.2  Software Systems                  -     -     -     4     9    11    10
1.2   System and User Support            1.5   1.5     2   3.5     5     6     8
1.3   Maintenance and Operation            1   1.5   1.5     3     5     5     6
1.4   Support for Tier 2 RCs             0.5     1   1.5   1.5     2     2     2
1.5   Networking                         0.5   0.5     1     1     2     2     2
1.6   R&D and CMS Construction
      Phase Activities                     6     7     7     6     3     0     0
  1.6.1  Hardware                          3     3     3     3     2     0     0
  1.6.2  Software                          3     4     4     3     1     0     0
User Facilities (total)                  9.5  11.5    13    23    35    35    35

Full-time staff working on User Facilities tasks (technicians, support staff, computing professionals).
20  Bottoms Up Approach
- We can also get the experts from Fermilab to help us estimate personnel and hardware costs for User Facility computing.
- We first met to debate the WBS items and discuss the scope and schedule of the project. We then assigned WBS items to experts in each area and had them start from a bottoms-up approach, without worrying about total FTEs for any one WBS item.
- The people working on the WBS were from CDF, D0, Networking, Systems Support and other CD areas of the lab.
21  From Bottoms Up Approach

                                        2001  2002  2003  2004  2005  2006  2007
1.1   Tier 1 Regional Center               -     -     -    13    17    20    18
1.2   System and User Support            1.3  2.25     3     3     5   4.5   4.5
1.3   Operations and Infrastructure        1  1.25  1.25     3     5   4.5   4.5
1.4   Tier 2 RCs                         2.5   3.5     4     5   6.5   7.5   7.5
1.5   Networking                         0.5     1     2   2.5     3     3     3
1.6   Computing and Software R&D           3     4   4.5   1.4   0.6   0.6   0.6
1.7   Construction Phase Computing         2     2     2  0.25     -     -     -
1.8   Support of FNAL-based Computing    0.4   1.4     1   2.6  2.25     2     2
User Facilities (total)                   10    15    18    30    37    43    41
  (Without T2 personnel)                 7.5  11.5    14    25    30    36    34

Full-time staff working on User Facilities tasks (technicians, support staff, computing professionals).
22  Scale of User Facility Project
- Amazingly enough, the bottoms-up approach gave us very similar results!
  - I didn't tell the experts in advance how many FTEs I thought things would take, because I wanted to see whether there was any value to either top down or bottoms up.
  - Considering they landed within 5% of each other, except for some scheduling differences, it gives me confidence that the scale is correct.
- Of course, although through the bottoms-up approach we have assigned each FTE to tasks in detail, the project must maintain flexibility in order to succeed.
  - For example, I assume the problem of large Linux farms for interactive and chaotic analyses will be solved by 2004. If not, we may have to buy different (expensive!) hardware or hire smarter people.
23  User Facility Personnel Needs
24  Personnel Costs for User Facility Project
- Lots of different kinds of people with different skills are included in the WBS.
- We used average Computing Division salaries for Computing Professionals to calculate personnel costs.
- Personnel costs were calculated as follows (a worked sketch appears below):
  - Average CD/CP salary: $78k
  - Benefits: 28.5%
  - FNAL overhead: 30%
  - Add travel, desktop and general infrastructure support
  - Total encumbered salary: $154,000
  - (This gets escalated by 3%/year by the usual DOE rules)
- This average salary in CD includes high-level people. Hopefully we will be hiring more of them in the future!
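As a cross-check of the figures on this slide, a minimal sketch of the encumbered-salary arithmetic. The $78k, 28.5%, 30%, $154,000 and 3%/year values come from the slide; the assumption that benefits and overhead compound multiplicatively, and the implied size of the travel/desktop adder, are mine.

```python
# Rough cross-check of the encumbered-salary arithmetic on this slide.
BASE_SALARY = 78_000        # average CD computing-professional salary (from slide)
BENEFITS = 0.285            # benefits rate (from slide)
OVERHEAD = 0.30             # FNAL overhead (from slide)
TOTAL_ENCUMBERED = 154_000  # quoted total encumbered salary (from slide)
ESCALATION = 0.03           # DOE escalation per year (from slide)

# Assumed: benefits and overhead applied multiplicatively, in that order.
loaded = BASE_SALARY * (1 + BENEFITS) * (1 + OVERHEAD)
print(f"Salary + benefits + overhead: ~${loaded:,.0f}")   # ~ $130,300

# The gap to the quoted total would then be the travel/desktop/infrastructure adder.
print(f"Implied travel/desktop/infrastructure adder: ~${TOTAL_ENCUMBERED - loaded:,.0f}")

# Escalated cost of one FTE in later fiscal years.
for n in range(4):
    print(f"Year +{n}: ~${TOTAL_ENCUMBERED * (1 + ESCALATION) ** n:,.0f}")
```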
25  User Facility Personnel Costs
26  Total User Facility Costs
27  Current Status and Recent Activities
- User Facility
- Current Status
28  User Facility Personnel
- We have hired 4 people this year
  - We continue to leverage additional support from Fermilab when possible
- User Facility people:
  - Joe Kaiser (WBS 1.2/1.7): system installation and commissioning, system administration
  - Yujun Wu (WBS 1.7): computing simulation tools and verification for distributed computing R&D, documentation, distributed databases
  - Michael Zolakar (WBS 1.8/1.7): data storage and data import/export issues
  - Natalia Ratnikova, 100% UF (WBS 1.2/1.7/1.8): will start October 30; code librarian, code distribution and documentation
- Joe Kaiser is the old man on CMS UF; he started in August 2000. Personnel are still getting up to speed. We are leveraging FNAL employees already working on CMS (CAS engineers, Guest Scientists, other CD departments).
29  Current Status of FNAL User Facility
- Current hardware resources
  - CPU
    - Sun E4500 with 8 400-MHz CPUs (CMSUN1)
    - 3 4-CPU 650 MHz Linux systems (Wonder, Gallo, Velveeta)
    - 2 2-CPU 500 MHz Linux systems (Boursin, Gamay)
  - Disk
    - 125 GB local disk space
    - 1 TB RAID on CMSUN1
    - ¼ TB RAID on each of Wonder, Gallo, Velveeta
  - Storage
    - 3 TB in HPSS tape library
    - 20 TB in Enstore tape library
- New systems
  - Linux farm of 40 dual 750-MHz-CPU nodes (popcrn01-40)
  - In burn-in phase; will be released to users at the end of October
30  Current Status of FNAL User Facility
- CMS software support
  - The full CMS software environment is supported on Sun and Linux (although Sun support lags a bit behind)
  - CMSIM120, ORCA 4_3, SCRAM, IGUANA, GEANT4
- CMS user support: software developers
  - Support s/w developers
  - Support the full CMS software installation at FNAL
  - Operate a standard code-build facility for code development and test
  - Support other US institutions in fulfilling their software development responsibilities
31  Current US-CMS User Support
- Monte Carlo studies
  - Currently supporting US-CMS groups studying Higher Level Triggers
  - Provide dedicated effort to build a system for MC production
  - Provide data storage and transport across US-CMS institutions
  - Provide data analysis resources
- Physics studies
- Detector Monte Carlo studies
- Test beam studies
- Support requests
- Miscellaneous
  - Hold biweekly meetings over VRVS to discuss support issues with users
  - Every other Tuesday, 1 pm CST, Saturn virtual room (alternates with the US-CMS physics meeting)
32  Training Plans
- An important part of the User Facility is training, for both users and staff.
- Fermilab offers C++ and OOAD classes regularly. All CMS people are free to register for such classes. In addition, given enough interest, we could organize a special C++ or OOAD class for CMS.
- In addition, we just held a CMS-specific training workshop, October 12-14, with tutorials for beginners and design discussions for the experts.
  - (http://home.fnal.gov/cmsdaq/tutorial.html)
  - 60 attendees, 45 from outside institutions. About 40 were beginners, the other 20 closer to expert.
  - Experts from CERN came to give the tutorials. All material was installed and running on Fermilab CMS machines, where people actually ran the tutorials hands-on.
  - It was a good opportunity to meet CMS colleagues and to discuss s/w design details as well.
- We plan to do this as often as there is interest, hopefully twice a year or so.
33  Summary
- A new bottoms-up hardware cost estimate has been made for the User Facility Project. We have a fledgling cost book to justify these estimates.
- The hardware requirements are coming out of CMS through the Hoffmann review panel. We are tracking these estimates closely.
- A new resource-loaded, bottoms-up estimate of personnel requirements has been made for the User Facility Project.
  - This estimate agrees with the old top-down approach to within 5% or so.
- This talk has been just a brief overview of our current project status, but tomorrow some of you will have the pleasure of going through the cost estimates and WBS in detail.