1 LHC Experiments and the PACI: A Partnership for Global Data Analysis
- Harvey B. Newman, Caltech
- Advisory Panel on CyberInfrastructure
- National Science Foundation
- November 29, 2001
- http://l3www.cern.ch/newman/LHCGridsPACI.ppt
2 Global Data Grid Challenge
- Global scientific communities, served by
networks with bandwidths varying by orders of
magnitude, need to perform computationally
demanding analyses of geographically distributed
datasets that will grow by at least 3 orders of
magnitude over the next decade, from the 100
Terabyte to the 100 Petabyte scale from 2000 to
2007
3 The Large Hadron Collider (2006-)
- The Next-Generation Particle Collider
- The largest superconducting installation in the world
- Bunch-bunch collisions at 40 MHz, each generating 20 interactions
- Only one in a trillion may lead to a major physics discovery
- Real-time data filtering: Petabytes per second to Gigabytes per second (rough arithmetic below)
- Accumulated data of many Petabytes/Year
Large data samples explored and analyzed by
thousands of globally dispersed scientists, in
hundreds of teams
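To make the filtering numbers concrete, a rough back-of-the-envelope estimate is sketched below in Python. The 40 MHz crossing rate, the 20 interactions per crossing and the few hundred MBytes/sec recorded rate come from these slides; the ~1 MB of raw data per interaction is an illustrative assumption, not a figure from the talk.

```python
# Rough scale of the LHC real-time filtering problem. The ~1 MB of raw
# data per interaction is an assumed, illustrative figure; the rates are
# taken from the slides.
CROSSING_RATE_HZ = 40e6          # bunch-bunch collisions at 40 MHz
INTERACTIONS_PER_CROSSING = 20   # each crossing generates ~20 interactions
RAW_BYTES_PER_INTERACTION = 1e6  # assumption: ~1 MB of raw data per interaction

raw_rate = CROSSING_RATE_HZ * INTERACTIONS_PER_CROSSING * RAW_BYTES_PER_INTERACTION
stored_rate = 200e6              # ~100-400 MBytes/sec actually recorded

print(f"Raw data rate    : ~{raw_rate / 1e15:.1f} PB/s")
print(f"Recorded rate    : ~{stored_rate / 1e6:.0f} MB/s")
print(f"Reduction factor : ~{raw_rate / stored_rate:.0e}")
```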
4 Four LHC Experiments: The Petabyte to Exabyte Challenge
- ATLAS, CMS, ALICE, LHCb: Higgs and New Particles; Quark-Gluon Plasma; CP Violation
- Data stored: 40 Petabytes/Year and UP
- CPU: 0.30 Petaflops and UP
- 0.1 to 1 Exabyte (1 EB = 10^18 Bytes) (2007) (2012?) for the LHC Experiments
5 Evidence for the Higgs at LEP at M ~ 115 GeV; The LEP Program Has Now Ended
6 LHC: Higgs Decay into 4 Muons; 1000X the LEP Data Rate
10^9 events/sec, selectivity: 1 in 10^13 (1 person in a thousand world populations)
7 LHC Data Grid Hierarchy
CERN/Outside Resource Ratio ~1:2; Tier0/(Σ Tier1)/(Σ Tier2) ~1:1:1
- Online System: ~PByte/sec from the Experiment; 100-400 MBytes/sec to Tier 0
- Tier 0 (+1), CERN: 700k SI95, 1 PB Disk, Tape Robot, HPSS
- Tier 1 (2.5 Gbits/sec links): FNAL (200k SI95, 600 TB), IN2P3 Center, INFN Center, RAL Center
- Tier 2 (2.5 Gbps links)
- Tier 3: Institutes (0.25 TIPS each); physicists work on analysis channels; each institute has 10 physicists working on one or more channels
- Tier 4: Workstations and physics data caches (100-1000 Mbits/sec links)
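As a rough illustration of what these link speeds mean in practice, the sketch below (Python, with the nominal rates read off the hierarchy above and a purely illustrative 1 TB sample) estimates how long a dataset takes to move down one level of the hierarchy.

```python
# Nominal link rates between tiers, read off the hierarchy above, and the
# time a purely illustrative 1 TB sample would take at each rate.
LINK_RATE_BPS = {
    "Online System -> Tier 0 (CERN)": 400e6 * 8,  # 100-400 MBytes/sec (upper end)
    "Tier 0 -> Tier 1":               2.5e9,      # 2.5 Gbits/sec
    "Tier 1 -> Tier 2":               2.5e9,      # 2.5 Gbps
    "Tier 3 -> Tier 4 workstations":  1e9,        # 100-1000 Mbits/sec (upper end)
}

DATASET_BYTES = 1e12  # 1 TB, illustrative

for link, rate_bps in LINK_RATE_BPS.items():
    hours = DATASET_BYTES * 8 / rate_bps / 3600
    print(f"{link:32s}: {hours:4.1f} h per TB at the nominal rate")
```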
8 TeraGrid: NCSA, ANL, SDSC, Caltech
- A Preview of the Grid Hierarchy and Networks of the LHC Era
- StarLight Int'l Optical Peering Point (see www.startap.net)
(Network map: the DTF backplane (40 Gbps) links Pasadena (Caltech), San Diego (SDSC), Urbana (NCSA/UIUC) and the Chicago area (ANL, Starlight/NW Univ, UIC, Univ of Chicago, Ill Inst of Tech), with Abilene OC-48 (2.5 Gb/s) connectivity via Indianapolis (Abilene NOC), multiple 10 GbE links over Qwest and I-WIRE dark fiber, and multiple carrier hubs. Solid lines are in place and/or available in 2001; dashed I-WIRE lines are planned for Summer 2002.)
Source: Charlie Catlett, Argonne
9 Current Grid Challenges: Resource Discovery, Co-Scheduling, Transparency
- Discovery and Efficient Co-Scheduling of Computing, Data Handling, and Network Resources
- Effective, Consistent Replica Management
- Virtual Data: Recomputation Versus Data Transport Decisions (see the sketch after this list)
- Reduction of Complexity in a Petascale World
- GA3: Global Authentication, Authorization, Allocation
- VDT: Transparent Access to Results (and Data When Necessary)
- Location Independence of the User Analysis, Grid, and Grid-Development Environments
- Seamless Multi-Step Data Processing and Analysis: DAGMan (Wisc), MOP/IMPALA (FNAL)
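The recomputation-versus-transport decision referenced above is, at its core, a cost comparison. The toy rule below illustrates the idea; the cost model, function name and example numbers are illustrative, not the actual GriPhyN/Virtual Data Toolkit logic.

```python
# Toy "virtual data" decision: re-derive a data product locally or fetch a
# stored replica, whichever is estimated to be faster. Cost model, names and
# numbers are illustrative only.

def recompute_or_transfer(cpu_seconds_needed, free_local_cpus,
                          replica_size_bytes, path_bandwidth_bps):
    """Return 'recompute' or 'transfer' based on rough time estimates."""
    recompute_time = cpu_seconds_needed / max(free_local_cpus, 1)
    transfer_time = replica_size_bytes * 8 / path_bandwidth_bps
    return "recompute" if recompute_time < transfer_time else "transfer"

# Example: 50 CPU-hours of reprocessing on 20 free CPUs, versus pulling a
# 200 GB replica over a 155 Mbps transatlantic path.
print(recompute_or_transfer(cpu_seconds_needed=50 * 3600,
                            free_local_cpus=20,
                            replica_size_bytes=200e9,
                            path_bandwidth_bps=155e6))
```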
10 CMS Production: Event Simulation and Reconstruction
- Worldwide Production at 12 Sites, using common production tools (IMPALA) and GDMP
(Status table: Simulation and Digitization, with and without pile-up, at CERN, FNAL, Moscow, INFN, Caltech, UCSD, UFL, Imperial College, Bristol, Wisconsin, IN2P3 and Helsinki; sites range from fully operational through in progress to not yet operational, and the production is Grid-enabled and automated.)
11 US CMS TeraGrid Seamless Prototype
- Caltech/Wisconsin Condor/NCSA Production
- Simple Job Launch from Caltech
- Authentication Using Globus Security Infrastructure (GSI)
- Resources Identified Using Globus Information Infrastructure (GIS)
- CMSIM Jobs (Batches of 100, 12-14 Hours, 100 GB Output) Sent to the Wisconsin Condor Flock Using Condor-G (see the submit sketch below)
- Output Files Automatically Stored in NCSA Unitree (GridFTP)
- ORCA Phase: Read-in and Process Jobs at NCSA
- Output Files Automatically Stored in NCSA Unitree
- Future: Multiple CMS Sites; Storage Also in Caltech HPSS, Using GDMP (With LBNL's HRM)
- Animated Flow Diagram of the DTF Prototype: http://cmsdoc.cern.ch/wisniew/infrastructure.html
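For concreteness, a job launch of the kind described above can be sketched with a classic Condor-G "globus universe" submit description; the gatekeeper host, executable and file names below are placeholders rather than the prototype's actual configuration.

```python
# Sketch of a Condor-G job launch in the style described above. GSI
# authentication is assumed (a proxy created with grid-proxy-init); the
# gatekeeper host and file names are placeholders.
import subprocess

submit_description = """\
universe        = globus
globusscheduler = gatekeeper.example.wisc.edu/jobmanager-condor
executable      = run_cmsim.sh
output          = cmsim.out
error           = cmsim.err
log             = cmsim.log
queue
"""

with open("cmsim.sub", "w") as f:
    f.write(submit_description)

subprocess.run(["condor_submit", "cmsim.sub"], check=True)
# Output files would then be copied to mass storage (e.g. NCSA Unitree via
# GridFTP) as in the prototype.
```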
12 Baseline BW for the US-CERN Link: HENP Transatlantic WG (DOE+NSF)
Transoceanic Networking Integrated with the TeraGrid, Abilene, Regional Nets and Continental Network Infrastructures in US, Europe, Asia, South America
US-CERN Plans: 155 Mbps to 2 x 155 Mbps this Year; 622 Mbps in April 2002; DataTAG 2.5 Gbps Research Link in Summer 2002; 10 Gbps Research Link in 2003
13 Transatlantic Net WG (HN, L. Price): Bandwidth Requirements
Installed BW: a Maximum Link Occupancy of 50% is Assumed (worked example below)
The Network Challenge is Shared by Both Next- and Present-Generation Experiments
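The 50% maximum-occupancy assumption maps directly onto installed bandwidth: a link must be provisioned at roughly twice the sustained throughput it is expected to carry. A small worked example (with an illustrative traffic figure) follows.

```python
# Installed bandwidth implied by a sustained-throughput requirement, under
# the working group's assumption of at most 50% link occupancy.
MAX_OCCUPANCY = 0.5

def installed_bw_mbps(required_sustained_mbps):
    return required_sustained_mbps / MAX_OCCUPANCY

# Illustrative figure: sustaining ~300 Mbps of transatlantic traffic
# already calls for a ~622 Mbps (OC-12) installed link.
print(installed_bw_mbps(300))   # -> 600.0
```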
14 Internet2 HENP Networking WG Mission
- To help ensure that the required
  - National and international network infrastructures,
  - Standardized tools and facilities for high performance and end-to-end monitoring and tracking, and
  - Collaborative systems
  are developed and deployed in a timely manner, and used effectively to meet the needs of the US LHC and other major HENP Programs, as well as the general needs of our scientific community.
- To carry out these developments in a way that is broadly applicable across many fields, within and beyond the scientific community
- Co-Chairs: S. McKee (Michigan), H. Newman (Caltech); with thanks to R. Gardner and J. Williams (Indiana)
15 Grid R&D Focal Areas for NPACI/HENP Partnership
- Development of Grid-Enabled User Analysis Environments
  - CLARENS (IGUANA) Project for Portable Grid-Enabled Event Visualization, Data Processing and Analysis
  - Object Integration backed by an ORDBMS, and File-Level Virtual Data Catalogs
- Simulation Toolsets for Systems Modeling, Optimization
  - For example the MONARC System
- Globally Scalable Agent-Based Realtime Information Marshalling Systems
  - To face the next-generation challenge of Dynamic Global Grid design and operations
  - Self-learning (e.g. SONN) optimization
  - Simulation ("Now-Casting") enhanced to monitor, track and forward-predict site, network and global system state
- 1-10 Gbps Networking development and global deployment
  - Work with the TeraGrid, STARLIGHT, Abilene, the iVDGL GGGOC, HENP Internet2 WG, Internet2 E2E, and DataTAG
- Global Collaboratory Development, e.g. VRVS, Access Grid
16 CLARENS: a Data Analysis Portal to the Grid
Steenberg (Caltech)
- A highly functional graphical interface, Grid-enabling the working environment for non-specialist physicists' data analysis
- Clarens consists of a server communicating with various clients via the commodity XML-RPC protocol; this ensures implementation independence (a minimal client sketch follows below)
- The server is implemented in C++ to give access to the CMS OO analysis toolkit
- The server will provide a remote API to Grid tools:
  - Security services provided by the Grid (GSI)
  - The Virtual Data Toolkit: Object collection access
  - Data movement between Tier centers using GSI-FTP
  - CMS analysis software (ORCA/COBRA)
- Current prototype is running on the Caltech Proto-Tier2
- More information at http://heppc22.hep.caltech.edu, along with a web-based demo
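Since Clarens speaks commodity XML-RPC, any language with an XML-RPC library can act as a client. The sketch below uses Python's standard library against a placeholder server URL; apart from the standard introspection call, any analysis-specific method names are assumptions, since the remote API is defined by the Clarens server itself.

```python
# Hypothetical Clarens-style XML-RPC client using Python's standard library.
# The server URL is a placeholder; system.listMethods() is the standard
# XML-RPC introspection call, available if the server implements it.
import xmlrpc.client

server = xmlrpc.client.ServerProxy("http://clarens.example.org:8080/clarens")

print(server.system.listMethods())

# An analysis request would then be an ordinary remote call, e.g. (method
# name assumed, not part of any published Clarens API):
# histogram = server.analysis.plot("muon_pt", "pt > 10")
```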
17 Modeling and Simulation: the MONARC System
- Modelling and understanding current systems, their performance and limitations, is essential for the design of future large-scale distributed processing systems.
- The simulation program developed within the MONARC (Models Of Networked Analysis At Regional Centers) project is based on a process-oriented approach to discrete event simulation. It is based on Java(TM) technology and provides a realistic modelling tool for such large-scale distributed systems (a minimal sketch of the discrete-event approach is given below).
SIMULATION of Complex Distributed Systems
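To illustrate the discrete event simulation idea (generically, in event-scheduling style; the real MONARC tool is a process-oriented Java system), the short Python sketch below simulates jobs arriving at a regional centre with a fixed number of CPUs. All parameters are made up.

```python
# Minimal discrete-event simulation of jobs at a regional centre:
# an event queue ordered by time, with "arrival" and "done" events.
import heapq
import random

random.seed(1)
events = []                              # (time, kind) heap
busy_cpus, max_cpus, waiting = 0, 30, 0
t, completed = 0.0, 0

# Schedule 200 Poisson-like job arrivals (mean 5 time units apart).
arrival = 0.0
for _ in range(200):
    arrival += random.expovariate(1 / 5.0)
    heapq.heappush(events, (arrival, "arrival"))

while events:
    t, kind = heapq.heappop(events)
    if kind == "arrival":
        if busy_cpus < max_cpus:
            busy_cpus += 1
            heapq.heappush(events, (t + random.expovariate(1 / 120.0), "done"))
        else:
            waiting += 1
    else:                                # a job finished
        completed += 1
        if waiting:                      # start a queued job on the freed CPU
            waiting -= 1
            heapq.heappush(events, (t + random.expovariate(1 / 120.0), "done"))
        else:
            busy_cpus -= 1

print(f"completed {completed} jobs by t = {t:.0f}")
```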
18 MONARC SONN: 3 Regional Centres Learning to Export Jobs (Day 9)
(Simulation snapshot, Day 9: CERN (30 CPUs), CALTECH (25 CPUs) and NUST (20 CPUs), with mean efficiencies <E> = 0.73, 0.83 and 0.66, connected by links of 1 MB/s at 150 ms RTT, 1.2 MB/s at 150 ms RTT, and 0.8 MB/s at 200 ms RTT.)
19 Maximizing US-CERN TCP Throughput (S. Ravot, Caltech)
- TCP Protocol Study: Limits
  - We determined precisely the parameters which limit the throughput over a high-bandwidth, long-delay (170 msec) network (the window/RTT arithmetic is sketched below)
  - How to avoid intrinsic limits and unnecessary packet loss
- Methods Used to Improve TCP
  - Linux kernel programming in order to tune TCP parameters
  - We modified the TCP algorithm
  - A Linux patch will soon be available
- Result: the Current State of the Art for Reproducible Throughput
  - 125 Mbps between CERN and Caltech
  - 135 Mbps between CERN and Chicago
- Status: Ready for Tests at Higher BW (622 Mbps) in Spring 2002
Congestion window behavior of a TCP connection over the transatlantic line; reproducible 125 Mbps between CERN and Caltech/CACR
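The limit being tuned around is the classic window/RTT bound: a TCP connection cannot sustain more than (window size)/(round-trip time). The arithmetic below uses the 170 ms RTT and the 622 Mbps target from this slide; the 64 KB default window is a typical value assumed here for illustration.

```python
# TCP throughput is bounded by window / RTT, so the window must cover the
# bandwidth-delay product of the path.
RTT_S = 0.170                      # measured US-CERN round trip (this slide)
DEFAULT_WINDOW_BYTES = 64 * 1024   # typical default without window scaling (assumed)
TARGET_RATE_BPS = 622e6            # the 622 Mbps link planned for Spring 2002

default_throughput = DEFAULT_WINDOW_BYTES * 8 / RTT_S
required_window = TARGET_RATE_BPS * RTT_S / 8

print(f"Default 64 KB window over 170 ms RTT : ~{default_throughput / 1e6:.1f} Mbps ceiling")
print(f"Window needed for 622 Mbps over 170 ms: ~{required_window / 1e6:.1f} MB")
```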
20 Agent-Based Distributed System: JINI Prototype (Caltech/Pakistan)
- Includes Station Servers (static) that host mobile Dynamic Services
- Servers are interconnected dynamically to form a fabric in which mobile agents travel, with a payload of physics analysis tasks
- Prototype is highly flexible and robust against network outages
- Amenable to deployment on leading-edge and future portable devices (WAP, iAppliances, etc.)
- "The system for the travelling physicist"
- The Design and Studies with this prototype use the MONARC Simulator, and build on the SONN studies; see http://home.cern.ch/clegrand/lia/
21 Globally Scalable Monitoring Service
(Architecture diagram: Farm Monitors register with Lookup Services, which clients and other services use for discovery via a Proxy; monitoring data are gathered by push and pull methods (rsh, ssh, existing scripts, SNMP); the RC Monitor Service provides a component factory, GUI marshaling, code transport, and RMI data access.)
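As an illustration only (the actual prototype is Jini/RMI-based and written in Java), the register/discover/pull pattern in the diagram can be sketched as follows; all class and method names here are invented for the sketch.

```python
# Illustrative register/discover/pull monitoring pattern (not the actual
# Jini/RMI prototype): farm monitors register with a lookup service, and a
# client discovers them and pulls current values.
import random
import time

class FarmMonitor:
    def __init__(self, name):
        self.name = name
    def pull(self):
        # Stand-in for rsh/ssh/script/SNMP collection on a real farm.
        return {"cpu_load": round(random.random(), 2), "t": time.time()}

class LookupService:
    def __init__(self):
        self._services = {}
    def register(self, monitor):      # registration (via a proxy, in the real system)
        self._services[monitor.name] = monitor
    def discover(self, name):
        return self._services.get(name)

lookup = LookupService()
for farm in ("CERN", "FNAL", "Caltech"):
    lookup.register(FarmMonitor(farm))

client_view = {name: lookup.discover(name).pull() for name in ("CERN", "FNAL", "Caltech")}
print(client_view)
```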
22 Examples
- GLAST meeting: 10 participants connected via VRVS (and 16 participants in audio only)
- VRVS: 7300 Hosts, 4300 Registered Users in 58 Countries, 34 Reflectors (7 in I2), Annual Growth 250%
- US CMS will use the CDF/KEK remote control room concept for Fermilab Run II as a starting point. However, we will (1) expand the scope to encompass a US-based physics group and US LHC accelerator tasks, and (2) extend the concept to a Global Collaboratory for realtime data acquisition and analysis.
23 Next Round of Grid Challenges: Global Workflow Monitoring, Management, and Optimization
- Workflow Management: Balancing Policy Versus Moment-to-Moment Capability to Complete Tasks
  - Balance High Levels of Usage of Limited Resources Against Better Turnaround Times for Priority Jobs
  - Goal-Oriented, According to (Yet to be Developed) Metrics
- Maintaining a Global View of Resources and System State
  - Global System Monitoring, Modeling, Quasi-Realtime Simulation; Feedback on the Macro- and Micro-Scales
- Adaptive Learning: new paradigms for execution optimization and Decision Support (eventually automated)
- Grid-Enabled User Environments
24 PACI, TeraGrid and HENP
- The scale, complexity and global extent of the LHC Data Analysis problem are unprecedented
- The solution of the problem, using globally distributed Grids, is mission-critical for frontier science and engineering
- HENP has a tradition of deploying new, highly functional systems (and sometimes new technologies) to meet its technical and ultimately its scientific needs
- HENP problems are mostly "embarrassingly parallel", but potentially overwhelming in their data- and network-intensiveness
- HENP/Computer Science synergy has increased dramatically over the last two years, focused on Data Grids
  - Successful collaborations in GriPhyN, PPDG, EU Data Grid
- The TeraGrid (present and future) and its development program are scoped at an appropriate level of depth and diversity
  - to tackle the LHC and other Petascale problems, over a 5-year time span
  - matched to the LHC time schedule, with full operations in 2007
25 Some Extra Slides Follow
26 Computing Challenges: the LHC Example
- Geographical dispersion: of people and resources
- Complexity: the detector and the LHC environment
- Scale: Tens of Petabytes per year of data
5000 Physicists, 250 Institutes, 60 Countries
Major challenges associated with: communication and collaboration at a distance; network-distributed computing and data resources; remote software development and physics analysis; R&D: New Forms of Distributed Systems: Data Grids
27 Why Worldwide Computing? Regional Center Concept Goals
- Managed, fair-shared access for Physicists everywhere
- Maximize total funding resources while meeting the total computing and data handling needs
- Balance proximity of datasets to large central resources, against regional resources under more local control
  - Tier-N Model
- Efficient network use: higher throughput on short paths
  - Local > regional > national > international
- Utilizing all intellectual resources, in several time zones
  - CERN, national labs, universities, remote sites
  - Involving physicists and students at their home institutions
- Greater flexibility to pursue different physics interests, priorities, and resource allocation strategies by region
  - And/or by Common Interests (physics topics, subdetectors, ...)
- Manage the System's Complexity
  - Partitioning facility tasks, to manage and focus resources
28 HENP-Related Data Grid Projects
- Funded Projects
  - PPDG I: USA, DOE, $2M, 1999-2001
  - GriPhyN: USA, NSF, $11.9M + $1.6M, 2000-2005
  - EU DataGrid: EU, EC, 10M, 2001-2004
  - PPDG II (CP): USA, DOE, $9.5M, 2001-2004
  - iVDGL: USA, NSF, $13.7M + $2M, 2001-2006
  - DataTAG: EU, EC, 4M, 2002-2004
- About to be Funded
  - GridPP: UK, PPARC, >15M?, 2001-2004
- Many national projects of interest to HENP
  - Initiatives in US, UK, Italy, France, NL, Germany, Japan, ...
  - EU networking initiatives (Géant, SURFNet)
  - US Distributed Terascale Facility ($53M, 12 TFlops, 40 Gb/s network), in final stages of approval
29 Network Progress and Issues for Major Experiments
- Network backbones are advancing rapidly to the 10 Gbps range; Gbps end-to-end data flows will soon be in demand
- These advances are likely to have a profound impact on the major physics Experiments' Computing Models
- We need to work on the technical and political network issues
  - Share technical knowledge of TCP windows, multiple streams, OS kernel issues; provide a User Toolset
- Getting higher bandwidth to regions outside W. Europe and US: China, Russia, Pakistan, India, Brazil, Chile, Turkey, etc.
  - Even to enable their collaboration
- Advanced integrated applications, such as Data Grids, rely on seamless, transparent operation of our LANs and WANs
  - With reliable, quantifiable (monitored), high performance
- Networks need to become part of the Grid(s) design
- New paradigms of network and system monitoring and use need to be developed, in the Grid context
30 Grid-Related R&D Projects in CMS: Caltech, FNAL, UCSD, UWisc, UFl
- Installation, Configuration and Deployment of Prototype Tier2 Centers at Caltech/UCSD and Florida
- Large-Scale Automated Distributed Simulation Production
- DTF TeraGrid (Micro-)Prototype: CIT, Wisconsin Condor, NCSA
- Distributed Monte Carlo Production (MOP): FNAL
- MONARC Distributed Systems Modeling: Simulation system; applications to Grid Hierarchy management
  - Site configurations, analysis model, workload
  - Applications to strategy development, e.g. inter-site load balancing using a Self-Organizing Neural Net (SONN)
- Agent-based System Architecture for Distributed Dynamic Services
- Grid-Enabled Object-Oriented Data Analysis
31 MONARC Simulation System Validation
(Validation against the CMS Proto-Tier1 Production Farm at FNAL and the CMS Farm at CERN.)
32 MONARC SONN: 3 Regional Centres Learning to Export Jobs (Day 0)
(Simulation snapshot, Day 0: CERN (30 CPUs), CALTECH (25 CPUs) and NUST (20 CPUs), connected by links of 1 MB/s at 150 ms RTT, 1.2 MB/s at 150 ms RTT, and 0.8 MB/s at 200 ms RTT.)
33 US CMS Remote Control Room for LHC
34 Bandwidth-Greedy Grid-Enabled Object Collection Analysis for Particle Physics (SC2001 Demo)
Julian Bunn, Ian Fisk, Koen Holtman, Harvey Newman, James Patton
(Demo diagram: a Denver client holding a Tag database of 140,000 small objects issues requests to two Tier2 servers holding full event databases of 100,000 and 40,000 large objects; database files are returned over parallel tuned GSI FTP.)
The object of this demo is to show grid-supported
interactive physics analysis on a set of 144,000
physics events. Initially we start out with
144,000 small Tag objects, one for each event, on
the Denver client machine. We also have 144,000
LARGE objects, containing full event data,
divided over the two tier2 servers.
- Using the local Tag event database, the user plots event parameters of interest
- The user selects a subset of events to be fetched for further analysis
- Lists of matching events are sent to Caltech and San Diego
- The Tier2 servers begin sorting through their databases, extracting the required events
- For each required event, a new large virtual object is materialized in the server-side cache; this object contains all tracks in the event
- The database files containing the new objects are sent to the client using Globus FTP, and the client adds them to its local cache of large objects
- The user can now plot event parameters not available in the Tag
- Future requests take advantage of previously cached large objects in the client
http://pcbunn.cacr.caltech.edu/Tier2/Tier2_Overall_JJB.htm
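The client-side flow of the demo (select on the local Tag, fetch only the missing large objects, reuse the cache) can be sketched as below; the fetch function is a stand-in for the parallel tuned GSI FTP transfer, and all names and data are illustrative.

```python
# Sketch of the demo's client-side flow: select events from the local Tag
# database, then materialize large event objects, fetching from the Tier2
# servers only what is not already in the local cache. Names and the
# fetch_via_gsiftp() stand-in are illustrative, not the demo's actual code.

local_cache = {}   # event_id -> large event object (tracks etc.)

def fetch_via_gsiftp(server, event_ids):
    """Placeholder for the parallel tuned GSI FTP transfer of database files."""
    return {eid: {"server": server, "tracks": []} for eid in event_ids}

def analyze(tag_db, cut, tier2_servers):
    # 1. Plot/select on the small Tag objects held locally.
    selected = [eid for eid, tag in tag_db.items() if cut(tag)]
    # 2. Fetch only the events not already cached, split across the servers.
    missing = [eid for eid in selected if eid not in local_cache]
    for i, server in enumerate(tier2_servers):
        share = missing[i::len(tier2_servers)]
        local_cache.update(fetch_via_gsiftp(server, share))
    # 3. Full event data is now available locally for detailed plots.
    return [local_cache[eid] for eid in selected]

tag_db = {eid: {"pt": eid % 100} for eid in range(1000)}
events = analyze(tag_db, cut=lambda t: t["pt"] > 90, tier2_servers=["Caltech", "SanDiego"])
print(len(events), "events materialized;", len(local_cache), "cached")
```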