Title: US Grid Initiatives for Particle Physics
1. US Grid Initiatives for Particle Physics
Richard P. Mount, SLAC
HEPCCC, SLAC, July 7, 2000
2. GriPhyN and PPDG
GriPhyN: Grid Physics Network (NSF)
PPDG: Particle Physics Data Grid (DoE)
GriPhyN:
- Focus on petascale data grid and Tier 2 computing for LHC, LIGO, SDSS
- Components:
  - NSF ITR proposal focused on needed computer science and middleware (Virtual Data Toolkit)
  - Tier 2 hardware and manpower funding for LHC in the context of US ATLAS/CMS computing plans (plus LIGO, SDSS in a separate context).
PPDG:
- Short-term focus on:
  - Making existing middleware useful for Run 2, BaBar, RHIC, etc.
  - High-speed data transfer
  - Cached access to remote data.
- Longer-term focus (was APOGEE):
  - Instrumentation and monitoring
  - Modeling distributed data management systems
  - Agents and virtual data.
- Funding does not include networks or (much) hardware.
3. Collaborators in GriPhyN and PPDG
[Diagram: collaborator make-up. GriPhyN: university scientists, comprising computer scientists, physicists (HEP, gravity wave, astronomy) and computer scientists supporting physics. PPDG: DoE laboratory scientists and scientists with university and lab appointments, comprising computer scientists, HENP physicists and computer scientists supporting physics.]
- Relationship:
  - Significant overlap of GriPhyN and PPDG senior scientists
  - Coordinated R&D planned.
4. PPDG Collaborators
5. PPDG Collaborators
                   Particle   Accelerator   Computer
                   Physics    Laboratory    Science
  ANL                 X                        X
  LBNL                X                        X
  BNL                 X            X           x
  Caltech             X                        X
  Fermilab            X            X           x
  Jefferson Lab       X            X           x
  SLAC                X            X           x
  SDSC                                         X
  Wisconsin                                    X
6. GriPhyN: Some Collaborators
7. Sites Participating in PPDG and GriPhyN/LHC
[Map: participating sites (SLAC, LBNL/UCB, Caltech, SDSC, Fermilab, ANL, Wisconsin, Indiana, BNL, Boston, Florida, JLAB) interconnected by the ESNet, Abilene, CalREN, NTON and MREN networks.]
8. Management Issues
- GriPhyN/PPDG involve many collaborating universities and labs
- The funding per institute is (or will be) modest
- Hence:
  - PPDG has appointed Doug Olson as full-time Project Coordinator
  - GriPhyN plans a full-time project coordinator
  - GriPhyN-PPDG management will be coordinated
  - GriPhyN-PPDG co-PIs include leading members of customer experiments
  - GriPhyN-PPDG deliverables (as a function of time) to be agreed with the management of the customer experiments.
9. Longer-Term Vision Driving PPDG and GriPhyN
- Agent Computing on Virtual Data
10. Why Agent Computing?
- LHC Grid Hierarchy Example:
  - Tier 0: CERN
  - Tier 1: National Regional Center
  - Tier 2: Regional Center
  - Tier 3: Institute Workgroup Server
  - Tier 4: Individual Desktop
  - Total: 5 levels
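To make the hierarchy concrete, the minimal sketch below (my own illustration, not GriPhyN/PPDG code) shows the kind of placement decision an agent might make: run an analysis task at the nearest tier that already holds the required dataset. The tier names follow the list above; the catalog contents and dataset names are invented.

```python
# Illustrative sketch only: an "agent" picks the nearest tier that already
# holds the dataset a job needs, instead of always shipping data to the user.
# Tier names follow the LHC hierarchy above; catalog and datasets are invented.

TIERS = ["Tier4 desktop", "Tier3 institute", "Tier2 regional",
         "Tier1 national", "Tier0 CERN"]   # ordered nearest -> farthest

# Hypothetical view of which tiers hold which datasets.
CATALOG = {
    "AOD-2000": {"Tier2 regional", "Tier1 national", "Tier0 CERN"},
    "ESD-V1.1": {"Tier1 national", "Tier0 CERN"},
    "RawData":  {"Tier0 CERN"},
}

def place_job(dataset: str) -> str:
    """Return the nearest tier that already stores `dataset`."""
    for tier in TIERS:                       # nearest first
        if tier in CATALOG.get(dataset, set()):
            return tier
    raise LookupError(f"{dataset} not registered anywhere")

if __name__ == "__main__":
    for ds in ("AOD-2000", "ESD-V1.1", "RawData"):
        print(f"{ds:10s} -> run analysis at {place_job(ds)}")
```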
11. Why Virtual Data?
Typical particle physics experiment in 2000-2005: one year of acquisition and analysis of data.
- Access rates (aggregate, average): 100 Mbytes/s (2-5 physicists), 1000 Mbytes/s (10-20 physicists), 2000 Mbytes/s (100 physicists), 4000 Mbytes/s (300 physicists)
- Data volumes:
  - Raw Data: 1000 Tbytes
  - Reco-V1, Reco-V2: 1000 Tbytes each
  - ESD-V1.1, ESD-V1.2, ESD-V2.1, ESD-V2.2: 100 Tbytes each
  - AOD: 10 Tbytes each (9 separate sets)
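A quick back-of-envelope reading of these numbers (a sketch using only the figures above; the per-physicist division and the choice to scan an ESD set at the 2000 Mbytes/s aggregate rate are my own illustrative assumptions) shows why analysis must live mostly off small, derived samples:

```python
# Back-of-envelope arithmetic using only the numbers quoted above.
# The per-physicist rates and the scan-time example are illustrative.

access = [  # (aggregate Mbytes/s, approx. number of physicists)
    (100, 5), (1000, 20), (2000, 100), (4000, 300),
]
for rate, n in access:
    print(f"{rate:5d} Mbytes/s shared by ~{n:3d} physicists "
          f"-> ~{rate / n:6.1f} Mbytes/s each")

# Time to read one 100-Tbyte ESD sample at a 2000 Mbytes/s aggregate rate:
esd_tbytes = 100
seconds = esd_tbytes * 1e6 / 2000          # Tbytes -> Mbytes, divide by rate
print(f"Full scan of a {esd_tbytes} Tbyte ESD set at 2000 Mbytes/s: "
      f"~{seconds / 86400:.1f} days")
```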
12. GriPhyN-PPDG Direction-Setting Vision: Agent Computing on Virtual Data
- Perform all tasks at the best place in the Grid
  - "Best" implies optimization based on cost, throughput, scientific policy, local policy (e.g. ownership), etc.
- At least 90% of HENP analysis accesses derived data
  - Derived data may be computed in advance of access, or on the fly
  - Derived data may be stored nowhere, or as one or many distributed copies.
- Maximize analysis capabilities per dollar spent on storage, network and CPU (see the sketch after this list).
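As a minimal sketch of the virtual-data decision (my own illustration, not PPDG/GriPhyN code; the cost figures, dataset and helper names are invented), a virtual-data layer can treat each derived dataset as a recipe plus optional replicas and choose, per request, whether fetching a stored copy or re-deriving the data is cheaper:

```python
# Minimal illustration of the virtual-data idea: derived data is defined by
# its recipe; whether to fetch a stored copy or recompute it is a cost choice.
# All costs, sites and dataset names below are invented for illustration.

from dataclasses import dataclass, field

@dataclass
class DerivedDataset:
    name: str
    size_gbytes: float
    cpu_hours_to_derive: float           # cost to recreate from parent data
    replicas: list = field(default_factory=list)   # sites holding a copy

def access_plan(ds: DerivedDataset, transfer_gb_per_hour: float,
                cpu_hour_cost: float, transfer_hour_cost: float) -> dict:
    """Compare the cost of fetching a replica vs. re-deriving on the fly."""
    fetch = (ds.size_gbytes / transfer_gb_per_hour) * transfer_hour_cost \
            if ds.replicas else float("inf")
    recompute = ds.cpu_hours_to_derive * cpu_hour_cost
    return {"fetch_cost": fetch, "recompute_cost": recompute,
            "choice": "fetch replica" if fetch <= recompute else "recompute"}

if __name__ == "__main__":
    aod = DerivedDataset("AOD-selection-42", size_gbytes=50,
                         cpu_hours_to_derive=200, replicas=["Tier1-US"])
    print(access_plan(aod, transfer_gb_per_hour=100,
                      cpu_hour_cost=1.0, transfer_hour_cost=5.0))
```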
13. Towards the Goals
- Evaluation and exploitation of computer-science and commercial products (Globus, SRB, Grand Challenge, OOFS, ...)
- Instrumentation and monitoring at all levels
- Modeling of distributed data management systems (especially failure modes)
- Testing everything in the environment of real physics experiments
- Major computer-science developments in:
  - Information models
  - Resource management and usage optimization models
  - Workflow management models
  - Distributed service models.
14. Funding Needs and Perspectives
- GriPhyN NSF ITR proposal
  - $2.5M/year for 5 years
  - Status: the proposal appears to have reviewed well; awaiting final decision
- Tier 2 centers and network enhancements
  - Plans being developed (order of magnitude $60M)
  - Discussions with NSF.
- PPDG project
  - Funded at $1.2M in August 1999 (DoE/OASCR/MICS NGI)
  - Plus $1.2M in June 2000 (DoE/OASCR/MICS and DoE/HENP)
  - Heavy leverage of facilities and personnel supporting current HEP experiments.
- PPDG future
  - FY 2001 onwards: needs in the range $3M to $4M per year.
15. PPDG: More Details
16. First Year PPDG Deliverables
- Implement and run two services in support of the major physics experiments at BNL, FNAL, JLAB, SLAC:
  - High-Speed Site-to-Site File Replication Service: data replication at up to 100 Mbytes/s
  - Multi-Site Cached File Access Service: based on deployment of file-cataloging, transparent cache-management and data-movement middleware
    - First year: optimized cached read access to files in the range of 1-10 Gbytes, from a total data set of order one Petabyte
- Using middleware components already developed by the proponents
17. PPDG Site-to-Site Replication Service
[Diagram: primary site (data acquisition, CPU, disk, tape robot) replicating data to a secondary site (CPU, disk, tape robot).]
- Network protocols tuned for high throughput
  - Use of DiffServ for (1) predictable high-priority delivery of high-bandwidth data streams and (2) reliable background transfers (see the DSCP-marking sketch after this list)
- Use of integrated instrumentation to detect/diagnose/correct problems in long-lived, high-speed transfers (NetLogger, DoE/NGI developments)
- Coordinated reservation/allocation techniques for storage-to-storage performance
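As a concrete illustration of the DiffServ point above (a sketch under assumptions: the DSCP values chosen and the use of a plain TCP socket are mine, and real behaviour depends on router policy), a sender can mark replication traffic for high-priority treatment and background copies for a lower class:

```python
# Sketch: marking transfer sockets with DiffServ code points (DSCP) so the
# network can give replication traffic predictable priority.  The DSCP
# choices and the endpoints are illustrative; effects depend on router policy.

import socket

DSCP_EF = 46          # Expedited Forwarding: high-priority data stream
DSCP_BACKGROUND = 8   # e.g. CS1: reliable but low-priority background copy

def marked_socket(dscp: int) -> socket.socket:
    """Return a TCP socket whose IP TOS byte carries the given DSCP value."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The DSCP occupies the upper 6 bits of the 8-bit TOS field.
    s.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp << 2)
    return s

if __name__ == "__main__":
    fast = marked_socket(DSCP_EF)          # would carry the replication stream
    slow = marked_socket(DSCP_BACKGROUND)  # would carry background transfers
    print("TOS bytes:",
          fast.getsockopt(socket.IPPROTO_IP, socket.IP_TOS),
          slow.getsockopt(socket.IPPROTO_IP, socket.IP_TOS))
```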
18. PPDG Multi-Site Cached File Access System
[Diagram: a primary site (data acquisition, tape, CPU, disk, robot), several satellite sites (tape, CPU, disk, robot) and universities (CPU, disk, users) sharing cached file access.]
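A minimal sketch of the cached-access flow in the diagram above (my own illustration; the paths and the stand-in transfer function are invented, and the real service relies on middleware such as SRB/OOFS): a satellite site serves a file from its local cache when present, and otherwise stages it in from the primary site once, so that later reads are local.

```python
# Sketch of the multi-site cached file access idea: read locally if cached,
# otherwise stage the file in from the primary site and keep a copy.
# Paths, the transfer function and the layout are invented for illustration.

import os
import shutil

PRIMARY_STORE = "/primary/datastore"   # stands in for the primary site's store
LOCAL_CACHE = "/satellite/cache"       # disk cache at a satellite site

def fetch_from_primary(logical_name: str, dest: str) -> None:
    """Stand-in for a real file mover (e.g. an SRB or ftp-style transfer)."""
    shutil.copyfile(os.path.join(PRIMARY_STORE, logical_name), dest)

def open_cached(logical_name: str):
    """Open a file via the satellite cache, staging it in on a cache miss."""
    cached = os.path.join(LOCAL_CACHE, logical_name)
    if not os.path.exists(cached):                   # cache miss
        os.makedirs(os.path.dirname(cached), exist_ok=True)
        fetch_from_primary(logical_name, cached)     # stage in once ...
    return open(cached, "rb")                        # ... later reads are local

# Usage (assuming the invented paths exist):
#   with open_cached("run2000/evts-000123.dat") as f:
#       header = f.read(64)
```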
19. PPDG Middleware Components
20. First Year PPDG System Components
Middleware components and initial choices (see PPDG Proposal, page 15):
- Object and file-based application services: Objectivity/DB (SLAC enhanced); GC Query Object, Event Iterator, Query Monitor; FNAL SAM system
- Resource management: start with human intervention (but begin to deploy resource discovery/management tools)
- File access service: components of OOFS (SLAC)
- Cache manager: GC Cache Manager (LBNL)
- Mass storage manager: HPSS, Enstore, OSM (site-dependent)
- Matchmaking service: Condor (U. Wisconsin)
- File replication index: MCAT (SDSC)
- Transfer cost estimation service: Globus (ANL)
- File fetching service: components of OOFS
- File mover(s): SRB (SDSC); site-specific
- End-to-end network services: Globus tools for QoS reservation
- Security and authentication: Globus (ANL)
22. PPDG First Year Progress
- Demonstration of multi-site cached file access based mainly on SRB (LBNL, ANL, U. Wisconsin)
- Evaluation and development of bulk-transfer tools (gsiftp, bbftp, sfcp, ...)
- Modest-speed site-to-site transfer services, e.g. SLAC-Lyon, Fermilab to Indiana
- Valiant attempts (continuing) to establish a multiple-OC12 path between SLAC and Caltech.
- http://www-user.slac.stanford.edu/rmount/public/PPDG_HENP_april00_public.doc
23. Progress: Multi-Site Cached File Access
- Exploratory installations of components of Globus at Fermilab, Wisconsin, ANL, SLAC, Caltech.
- Exploratory installations of SRB at LBNL, Wisconsin, ANL, Fermilab
- SRB used in a successful demonstration of Wisconsin and Fermilab accessing files, via an ANL cache, originating in the LBNL HPSS.
24. Progress: 100 Mbytes/s Site-to-Site
- Focus on SLAC-Caltech over NTON
  - Fibers in place
  - SLAC Cisco 12000 with OC48 and 2 x OC12 in place
  - 300 Mbits/s single-stream throughput achieved recently.
- Lower-speed Fermilab-Indiana trials.
25. PPDG Work at Caltech (High-Speed File Transfer)
- Work on the NTON connections between Caltech and SLAC
  - Tests with 8 OC3 adapters on the Caltech Exemplar, multiplexed across to a SLAC Cisco GSR router; throughput was limited by the small MTU in the GSR (see the back-of-envelope sketch after this list).
  - Purchased a Dell dual Pentium III based server with two OC12 ATM cards, configured to allow aggregate transfers of more than 100 Mbytes/s in both directions between Caltech and SLAC.
- Monitoring tools installed at Caltech/CACR
  - PingER installed to monitor WAN HEP connectivity
  - A Surveyor device will be installed soon, for very precise measurement of network traffic speeds
- Investigations into a distributed resource management architecture that co-manages processors and data
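The MTU point can be made quantitative with the standard Mathis et al. TCP throughput estimate, throughput ~ MSS / (RTT * sqrt(loss)). The sketch below applies it with an assumed 10 ms SLAC-Caltech round-trip time and an assumed 1e-5 loss rate; these are illustrative values, not measurements from this work.

```python
# Back-of-envelope: why a small MTU limits single-stream TCP throughput.
# Uses the Mathis et al. estimate  throughput ~ MSS / (RTT * sqrt(loss)).
# The RTT and loss rate below are assumed for illustration, not measured.

from math import sqrt

def mathis_mbits_per_s(mtu_bytes: int, rtt_s: float, loss: float) -> float:
    mss = mtu_bytes - 40                   # subtract IP + TCP header bytes
    bytes_per_s = mss / (rtt_s * sqrt(loss))
    return bytes_per_s * 8 / 1e6

RTT = 0.010    # assumed ~10 ms round trip
LOSS = 1e-5    # assumed packet loss probability

for mtu in (1500, 4470, 9180):             # Ethernet-, POS- and ATM-class MTUs
    print(f"MTU {mtu:5d} bytes -> "
          f"~{mathis_mbits_per_s(mtu, RTT, LOSS):7.0f} Mbits/s ceiling")
```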
26. Towards Serious Deployment
- Agreement by CDF and D0 to make a serious effort to use PPDG services.
- Rapidly rising enthusiasm in BaBar: the SLAC-CCIN2P3 Grid MUST be made to work.
27. A Global HEP Grid Program?
- HEP Grid people see international collaboration as vital to their mission
- CS Grid people are very enthusiastic about international collaborations
- National funding agencies:
  - Welcome international collaboration
  - Often need to show benefits for national competitiveness.
28. SLAC Computing
July 7, 2000
29. SLAC Computing Services
30. BaBar Support
31. Business Applications
32. Desktop
33. Phones and Security
34. Research and Development