Title: Harvey B Newman, Professor of Physics
1 HENP Grids and Networks: Global Virtual Organizations
- Harvey B Newman, Professor of Physics
- LHCNet PI, US CMS Collaboration Board Chair
- Drivers of the Formation of the Information Society
- April 18, 2003
2 Computing Challenges: Petabytes, Petaflops, Global VOs
- Geographical dispersion of people and resources
- Complexity of the detector and the LHC environment
- Scale: Tens of Petabytes per year of data
5000 Physicists, 250 Institutes, 60 Countries
Major challenges associated with:
- Communication and collaboration at a distance
- Managing globally distributed computing and data resources
- Cooperative software development and physics analysis
New Forms of Distributed Systems: Data Grids
3 Next Generation Networks for Experiments: Goals and Needs
Large data samples explored and analyzed by thousands of globally dispersed scientists, in hundreds of teams
- Providing rapid access to event samples, subsets and analyzed physics results from massive data stores
- From Petabytes in 2003, 100 Petabytes by 2008, to 1 Exabyte by 2013
- Providing analyzed results with rapid turnaround, by coordinating and managing the large but LIMITED computing, data handling and NETWORK resources effectively
- Enabling rapid access to the data and the collaboration
- Across an ensemble of networks of varying capability
- Advanced integrated applications, such as Data Grids, rely on seamless operation of our LANs and WANs
- With reliable, monitored, quantifiable high performance
4 LHC Collaborations
[Maps of the CMS and ATLAS collaborations, with US participation highlighted]
The US provides about 20-25% of the author list in both experiments
5 US LHC Institutions
[Map: US CMS, US ATLAS and Accelerator institutions]
6 Four LHC Experiments: The Petabyte to Exabyte Challenge
- ATLAS, CMS, ALICE, LHCb: Higgs and New particles; Quark-Gluon Plasma; CP Violation
Data stored: 40 Petabytes/Year and UP; CPU: 0.30 Petaflops and UP
0.1 Exabyte (2008) to 1 Exabyte (2013?) for the LHC Experiments (1 EB = 10^18 Bytes)
7 LHC: Higgs Decay into 4 muons (Tracker only)
1000X LEP Data Rate
10^9 events/sec; selectivity 1 in 10^13 (like finding 1 person among a thousand world populations)
8 LHC Data Grid Hierarchy
CERN/Outside Resource Ratio 1:2; Tier0/(Σ Tier1)/(Σ Tier2) 1:1:1
- Experiment -> Online System: PByte/sec
- Online System -> Tier 0 +1 (CERN): 100-1500 MBytes/sec; CERN: 700k SI95, 1 PB Disk, Tape Robot, HPSS
- Tier 0 -> Tier 1 centers (2.5-10 Gbps): FNAL (200k SI95, 600 TB), IN2P3 Center, INFN Center, RAL Center
- Tier 1 -> Tier 2 centers: 2.5-10 Gbps
- Tier 2 -> Tier 3 (Institutes, 0.25 TIPS each): 2.5-10 Gbps
- Tier 3 -> Tier 4 (Physics data cache, Workstations): 0.1-10 Gbps
Physicists work on analysis channels; each institute has 10 physicists working on one or more channels
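As a rough cross-check of the rates above (not on the original slide), the 100-1500 MBytes/sec into Tier 0 can be translated into annual volumes, assuming roughly 10^7 live seconds of data taking per year, a common rule of thumb rather than a figure from the talk:

    # Back-of-envelope only: annual volume implied by the Tier 0 ingest rates
    # above, assuming ~1e7 live seconds of data taking per year (an assumed
    # rule of thumb, not a number from the slide).
    live_seconds = 1e7
    for rate_mb_per_s in (100, 1500):
        petabytes = rate_mb_per_s * 1e6 * live_seconds / 1e15
        print(f"{rate_mb_per_s} MB/s -> ~{petabytes:.0f} PB/year")
    # 100 MB/s -> ~1 PB/year; 1500 MB/s -> ~15 PB/year, i.e. the
    # "Tens of Petabytes per year" scale quoted earlier in the talk.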
9 Transatlantic Net WG (HN, L. Price): Bandwidth Requirements
[Table of installed bandwidth requirements; a maximum link occupancy of 50% is assumed]
See http://gate.hep.anl.gov/lprice/TAN
10 History: One Large Research Site
Much of the Traffic: SLAC -> IN2P3/RAL/INFN, via ESnet, France, Abilene and CERN
Current Traffic 400 Mbps (ESnet Limitation); Projections: 0.5 to 24 Tbps by 2012
11 Progress: Max. Sustained TCP Throughput on Transatlantic and US Links
- 8-9/01: 105 Mbps with 30 Streams SLAC-IN2P3; 102 Mbps in 1 Stream CIT-CERN
- 11/5/01: 125 Mbps in One Stream (modified kernel) CIT-CERN
- 1/09/02: 190 Mbps for One stream shared on 2 x 155 Mbps links
- 3/11/02: 120 Mbps Disk-to-Disk with One Stream on a 155 Mbps link (Chicago-CERN)
- 5/20/02: 450-600 Mbps SLAC-Manchester on OC12 with 100 Streams
- 6/1/02: 290 Mbps Chicago-CERN, One Stream on OC12 (mod. kernel)
- 9/02: 850, 1350, 1900 Mbps Chicago-CERN with 1, 2, 3 GbE Streams on an OC48 Link
- 11-12/02 (FAST): 940 Mbps in 1 Stream SNV-CERN; 9.4 Gbps in 10 Flows SNV-Chicago
Also see http://www-iepm.slac.stanford.edu/monitoring/bulk/ and the Internet2 E2E Initiative: http://www.internet2.edu/e2e
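One way to see why these records required multiple streams, modified kernels and very clean paths is the standard loss-limited TCP throughput approximation (Mathis et al.), throughput ≈ C·MSS/(RTT·√p). The sketch below is illustrative only; the MSS, RTT and loss rate are assumed values, not measurements from the slide.

    from math import sqrt

    def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate, c=1.22):
        """Approximate loss-limited throughput of a single TCP Reno stream."""
        return c * mss_bytes * 8 / (rtt_s * sqrt(loss_rate))

    # Assumed example: 1460-byte segments, 170 ms transatlantic RTT,
    # one packet lost per ten million (p = 1e-7)
    bw = mathis_throughput_bps(1460, 0.170, 1e-7)
    print(f"~{bw / 1e6:.0f} Mbps per stream")   # ~265 Mbps: hence multiple
    # streams, jumbo frames and modified stacks for Gbps-scale transfers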
12 DataTAG Project
[Network map: New York, StarLight, STAR-TAP, Abilene, ESnet, CALREN2 and Geneva, linked by a 2.5 to 10G wave triangle]
- EU-Solicited Project: CERN, PPARC (UK), Amsterdam (NL), and INFN (IT), plus US (DOE/NSF: UIC, NWU and Caltech) partners
- Main Aims:
- Ensure maximum interoperability between US and EU Grid Projects
- Transatlantic Testbed for advanced network research
- 2.5 Gbps Wavelength Triangle from 7/02; to a 10 Gbps Triangle by Early 2003
13 FAST (Caltech): A Scalable, Fair Protocol for Next-Generation Networks, from 0.1 To 100 Gbps (SC2002, 11/02)
Highlights of FAST TCP:
- Standard Packet Size
- 940 Mbps single flow/GE card
- 9.4 petabit-m/sec
- 1.9 times LSR
- 9.4 Gbps with 10 flows
- 37.0 petabit-m/sec
- 6.9 times LSR
- 22 TB in 6 hours with 10 flows
- Implementation
- Sender-side (only) mods
- Delay (RTT) based
- Stabilized Vegas
[Chart: bandwidth-distance products for the SC2002 runs (1, 2 and 10 flows) over Sunnyvale-Geneva, Baltimore-Geneva and Baltimore-Sunnyvale paths, compared with earlier Internet2 LSR marks (29.3.00 multiple streams; 9.4.02 1 flow; 22.8.02 IPv6)]
URL: netlab.caltech.edu/FAST
Next: 10GbE; 1 GB/sec disk to disk
C. Jin, D. Wei, S. Low and the FAST Team and Partners
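For orientation, the sketch below shows the general shape of a delay-based (Vegas-style) window update of the kind FAST uses: the window grows while the measured RTT stays near the propagation delay and backs off as queueing delay appears. It is a simplified illustration with made-up constants, not the published FAST TCP algorithm or its tuned parameters.

    def update_window(w, base_rtt, rtt, alpha=200.0, gamma=0.5):
        """One delay-based window update per RTT (w in packets, RTTs in seconds).
        Equilibrium is reached when roughly alpha packets are queued in the path."""
        target = (base_rtt / rtt) * w + alpha
        return min(2 * w, (1 - gamma) * w + gamma * target)   # at most doubles per update

    # Toy usage: the RTT creeps up as the bottleneck queue builds
    w, base_rtt = 10.0, 0.170
    for rtt in (0.170, 0.172, 0.175, 0.180):
        w = update_window(w, base_rtt, rtt)
    print(round(w, 1))   # window ramps up quickly, capped at doubling per step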
14 FAST TCP: Baltimore/Sunnyvale
- RTT estimation: fine-grain timer
- Fast convergence to equilibrium
- Delay monitoring in equilibrium
- Pacing: reducing burstiness
- Measurements:
- Std Packet Size
- Utilization averaged over > 1 hr
- 3000 km Path
- 8.6 Gbps; 21.6 TB in 6 Hours
- Fair Sharing, Fast Recovery
[Plot: average utilization of 88-95% in runs with 1, 2, 7, 9 and 10 flows]
15 10GigE Data Transfer Trial: Internet2 LSR 2003
On Feb. 27-28, a Terabyte of data was transferred in 3700 seconds by S. Ravot of Caltech, between the Level3 PoP in Sunnyvale (near SLAC) and CERN, through the TeraGrid router at StarLight, from memory to memory, as a single TCP/IP stream at an average rate of 2.38 Gbps (using large windows and 9 kB Jumbo frames). This beat the former record by a factor of 2.5, and used the US-CERN link at 99% efficiency.
[Images: 10GigE NIC; European Commission]
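A quick check of why "large windows" were essential (the RTT below is an assumed typical Sunnyvale-CERN value, not stated on the slide):

    # TCP window needed to keep a single 2.38 Gbps stream full over a long path
    rate_bps = 2.38e9
    rtt_s = 0.180                      # assumed transatlantic round-trip time
    window_bytes = rate_bps * rtt_s / 8
    print(f"Required window ~ {window_bytes / 2**20:.0f} MiB")   # ~51 MiB,
    # far beyond default TCP buffer sizes, hence the tuned "large windows"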
16 TeraGrid (www.teragrid.org): NCSA, ANL, SDSC, Caltech, PSC
[Network map: DTF Backplane of 4 x 10 Gbps linking Caltech, San Diego, Starlight/NW Univ, ANL, UIC, Univ of Chicago, Ill Inst of Tech, NCSA/UIUC, Urbana and Indianapolis (Abilene NOC); OC-48 (2.5 Gb/s, Abilene), multiple 10 GbE (Qwest), multiple 10 GbE (I-WIRE Dark Fiber), and multiple carrier hubs in Chicago]
A Preview of the Grid Hierarchy and Networks of the LHC Era. Higgs Study at Caltech, and Production with FAST TCP, is a Flagship TeraGrid Application.
Source: Charlie Catlett, Argonne
17 National Light Rail Footprint
- NLR:
- Buildout Started November 2002
- Initially 4 10G Wavelengths
- To 40 10G Waves in Future
Transition now to optical, multi-wavelength R&E networks: US, Europe and Intercontinental (US-China-Russia) Initiatives. HEP is the universally recognized leading application for initial NLR use.
18 HENP Major Links: Bandwidth Roadmap (Scenario) in Gbps
Continuing the Trend: 1000 Times Bandwidth Growth Per Decade. We are Learning to Use and Share Multi-Gbps Networks Efficiently. HENP is leading the way towards future networks and dynamic Grids.
19 HENP Lambda Grids: Fibers for Physics
- Problem: Extract Small Data Subsets of 1 to 100 Terabytes from 1 to 1000 Petabyte Data Stores
- Survivability of the HENP Global Grid System, with hundreds of such transactions per day (circa 2007), requires that each transaction be completed in a relatively short time.
- Example: Take 800 secs to complete the transaction. Then:
  Transaction Size (TB)    Net Throughput (Gbps)
  1                        10
  10                       100
  100                      1000 (Capacity of Fiber Today)
- Summary: Providing Switching of 10 Gbps wavelengths within 3-5 years, and Terabit Switching within 5-8 years, would enable Petascale Grids with Terabyte transactions, as required to fully realize the discovery potential of major HENP programs, as well as other data-intensive fields.
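The throughput column follows directly from size over time; a minimal check of the table's arithmetic (decimal terabytes assumed):

    for size_tb in (1, 10, 100):
        gbps = size_tb * 1e12 * 8 / 800 / 1e9   # bits moved in 800 seconds
        print(f"{size_tb} TB in 800 s -> {gbps:.0f} Gbps")
    # 1 TB -> 10 Gbps, 10 TB -> 100 Gbps, 100 TB -> 1000 Gbps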
20 Emerging Data Grid User Communities
- NSF Network for Earthquake Engineering Simulation (NEES)
- Integrated instrumentation, collaboration, simulation
- Grid Physics Network (GriPhyN)
- ATLAS, CMS, LIGO, SDSS
- Access Grid; VRVS: supporting group-based collaboration
- And:
- Genomics, Proteomics, ...
- The Earth System Grid and EOSDIS
- Federating Brain Data
- Computed MicroTomography
- Virtual Observatories
21 Particle Physics Data Grid Collaboratory Pilot (2001-2003)
The Particle Physics Data Grid Collaboratory Pilot will develop, evaluate and deliver vitally needed Grid-enabled tools for data-intensive collaboration in particle and nuclear physics. Novel mechanisms and policies will be integrated with Grid Middleware, experiment-specific applications and computing resources to provide effective end-to-end capability.
- A Strong Computer Science/Physics Partnership
- Reflected in a groundbreaking DOE MICS/HENP Partnership
- Driving Forces: Now-running and future experiments; ongoing CS projects; leading-edge Grid developments
- Focus on End-to-end services
- Integration of Grid Middleware with Experiment-specific Components
- Security: Authenticate, Authorize, Allocate
- Practical Orientation: Monitoring, instrumentation, networks
22 PPDG Mission and Foci Today
- Mission: Enabling new scales of research in experimental physics and experimental computer science
- Advancing Grid Technologies by addressing key issues in architecture, integration, deployment and robustness
- Vertical Integration of Grid Technologies into the Application frameworks of Major Experimental Programs
- Ongoing as Grids, Networks and Applications Progress
- Deployment, hardening and extensions of Common Grid services and standards
- Data replication, storage and job management, monitoring and task execution-planning
- Mission-oriented, Interdisciplinary teams of physicists, software and network engineers, and computer scientists
- Driven by demanding end-to-end applications of experimental physics
23 PPDG Accomplishments (I)
- First-Generation HENP Application Grids and Grid Subsystems
- Production Simulation Grids for ATLAS and CMS; STAR Distributed Analysis Jobs: up to 30 TBytes, 30 Sites
- Data Replication for BaBar: Terabyte Stores systematically replicated from California to France and the UK
- Replication and Storage Management for STAR and JLAB: Development and Deployment of Standard APIs, and Interoperable Implementations
- Data Transfer, Job and Information Management for D0: GridFTP integrated with SAM; Condor-G job scheduler and MDS resource discovery, all integrated with SAM
- Initial Security Infrastructure for Virtual Organizations
- PKI certificate management, policies and trust relationships (using DOE Science Grid and Globus)
- Standardizing Authorization mechanisms: standard callouts for Local Center Authorization for Globus, EDG
- Prototyping secure credential stores
- Engagement of site security teams
24 PPDG Accomplishments (II)
- Data and Storage Management
- Robust data transfer over heterogeneous networks using standard and next-generation protocols: GridFTP, bbcp, GridDT, FAST TCP
- Distributed Data Replica management: SRB, SAM, SRM
- Common Storage Management interface and services across diverse implementations: SRM - HPSS, Jasmine, Enstore
- Object Collection management in diverse RDBMSs: CAIGEE, SOCATS
- Job Planning, Execution and Monitoring
- Job scheduling based on resource discovery and status: Condor-G and extensions for strategy and policy
- Retry and Fault Tolerance in response to error conditions: hardened gram, gass-cache, ftsh, Condor-G
- Distributed monitoring infrastructure for system tracking, resource discovery, resource and job information: MonALISA, MDS, Hawkeye
- Prototypes and Evaluations
- Grid-enabled physics analysis tools and prototypical environments
- End-to-end troubleshooting and fault handling
- Cooperative Monitoring of Grid, Fabric, Applications
25 WorldGrid EU-US Interoperation: One of 24 Demos at SC2002
Collaborating with iVDGL and DataTAG on international grid testbeds will lead to easier deployment of experiment grids across the globe.
26 PPDG Collaborators Participated in 24 SC2002 Demos
27 CAIGEE: Progress in Interfacing Analysis Tools to the Grid
- CAIGEE: CMS Analysis - an Integrated Grid Enabled Environment
- Lightweight, functional, making use of existing software as far as possible (AFAP)
- Plug-in Architecture based on Web Services
- Expose Grid Views of the Global System to physicists at various levels of detail, with Feedback
- Supports Data Requests, Preparation, Production, Movement, and Analysis of Physics Object Collections
- Initial Target: US-CMS physicists in California (CIT, UCSD, Riverside, Davis, UCLA)
- Expand to Include FIU, UF, FSU, UERJ
- Future: the Whole of US CMS, and CMS
28 CAIGEE Draft Architecture
- Multiplatform, Light Client
- Object Collection Access
- Interface to (O)RDBMSs
- Use of Web Services
[Diagram: CAIGEE Architecture]
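Purely as an illustration of the "light client talking to Web Services" idea above (not CAIGEE's actual interfaces), here is a minimal client that asks a hypothetical object-collection service what collections it can serve; the endpoint URL and XML layout are invented for this sketch:

    import urllib.request
    import xml.etree.ElementTree as ET

    def list_collections(endpoint="http://example.org/caigee/collections"):
        """Fetch an XML listing of object collections from a (hypothetical) service."""
        with urllib.request.urlopen(endpoint) as resp:
            tree = ET.parse(resp)
        return [c.get("name") for c in tree.findall(".//collection")]

    # A deployed service would return named physics object collections; this
    # call only succeeds if something is actually listening at the endpoint.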
29 PPDG Common Technologies, Tools and Applications: Who Uses What?
30 PPDG Inter-Project Collaborations and Interactions
- US Physics Grid Projects: Virtual Data Toolkit - GriPhyN, iVDGL, PPDG (the Trillium)
- EU-US Physics Grids: LHC Computing Grid (ATLAS, CMS); European Data Grid (BaBar, D0, ATLAS, CMS); HEP Intergrid Coordination Board
- SciDAC project interactions:
- Earth System Grid II: SRM
- DOE Science Grid: CA, RA
- A High Performance DataGrid Toolkit: Globus Toolkit
- Storage Resource Management for Data Grid Applications: SRM
- Security and Policy for Group Collaborations: GSI, CAS
- Scientific Data Management Center: STAR SDM
- Bandwidth Estimation: Measurement Methodologies and Applications: IEPM-BW
- A National Computational Infrastructure for Lattice Gauge Theory: TJNAF/LQCD
- Distributed Monitoring Framework: NetLogger, Glue Schema
31 PPDG-CP and GriPhyN: Virtual Data Grids
- Users' View of a PVDG (PPDG-CP Proposal, April 2000)
32 HENP Data Grids Versus Classical Grids
- The original Computational and Data Grid concepts are largely stateless, open systems, known to be scalable
- Analogous to the Web
- The classical Grid architecture has a number of implicit assumptions
- The ability to locate and schedule suitable resources within a tolerably short time (i.e. resource richness)
- Short transactions with relatively simple failure modes
- HEP Grids are Data Intensive and Resource-Constrained
- 1000s of users competing for resources at 100s of sites
- Resource usage governed by local and global policies
- Long transactions; some long queues
- Need Real-time Monitoring and Tracking
- Distributed failure modes -> Strategic task management
33 Layered Grid Architecture
- Collective: Coordinating multiple resources - ubiquitous infrastructure services, app-specific distributed services
- Resource: Sharing single resources - negotiating access, controlling use
- Connectivity: Talking to things - communication (Internet protocols) and security
- Fabric: Controlling things locally - access to, and control of, resources
"The Anatomy of the Grid: Enabling Scalable Virtual Organizations", Foster, Kesselman, Tuecke, Intl. J. High Performance Computing Applications, 15(3), 2001
34 Current Grid Challenges: Secure Workflow Management and Optimization
- Maintaining a Global View of Resources and System State
- Coherent end-to-end System Monitoring
- Adaptive Learning: new algorithms and strategies for execution optimization (increasingly automated)
- Workflow: a Strategic Balance of Policy Versus Moment-to-moment Capability to Complete Tasks
- Balance High Levels of Usage of Limited Resources Against Better Turnaround Times for Priority Jobs
- Goal-Oriented Algorithms: Steering Requests According to (Yet to be Developed) Metrics
- Handling User-Grid Interactions: Guidelines; Agents
- Building Higher-Level Services, and an Integrated, Scalable User Environment for the Above
35 HENP Grid Architecture: Layers Above the Collective Layer
- Physicists' Application Codes
- Reconstruction, Calibration, Analysis
- Experiments' Software Framework Layer
- Modular and Grid-aware Architecture, able to interact effectively with the lower layers (above)
- Grid Applications Layer (Parameters and algorithms that govern system operations)
- Policy and priority metrics
- Workflow evaluation metrics
- Task-Site Coupling proximity metrics
- Global End-to-End System Services Layer
- Workflow monitoring and evaluation mechanisms
- Error recovery and long-term redirection mechanisms
- System self-monitoring, steering, evaluation and optimisation mechanisms
- Monitoring and Tracking of Component performance
36 Distributed System Services Architecture (DSSA): CIT/Romania/Pakistan
- Agents: Autonomous, Auto-discovering, self-organizing, collaborative, adaptive
- Station Servers (static) host mobile Dynamic Services
- Servers interconnect dynamically and form a robust fabric in which mobile agents travel, with a payload of (analysis) tasks (sketched below)
- Adaptable to Web services: JINI/JavaSpaces and WSDL/UDDI; OGSA Integration/Migration planned
- Adaptable to Ubiquitous Working Environments
Managing Global Data-Intensive Systems Requires a New Generation of Scalable, Intelligent Software Systems
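A toy sketch of the station-server/agent pattern described above, with hypothetical class and service names (the real DSSA work was Java/JINI-based; this only shows the shape of the idea):

    from dataclasses import dataclass, field
    from typing import Callable, Dict, List

    @dataclass
    class StationServer:
        """Static station server hosting dynamically registered services."""
        name: str
        services: Dict[str, Callable[[str], str]] = field(default_factory=dict)

        def register(self, service_name: str, handler: Callable[[str], str]) -> None:
            self.services[service_name] = handler

    @dataclass
    class Agent:
        """Mobile agent carrying a task payload across the station fabric."""
        payload: str

        def run(self, route: List[StationServer]) -> None:
            for station in route:                        # agent travels the fabric
                for name, handler in station.services.items():
                    print(station.name, name, handler(self.payload))

    caltech = StationServer("CIT")
    caltech.register("echo", lambda task: f"accepted {task}")
    Agent("analysis-task-001").run([caltech])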
37 MonALISA: A Globally Scalable Grid Monitoring System
- Deployed on the US CMS Test Grid: CERN, Bucharest, Taiwan, Pakistan
- Agent-based Dynamic information / resource discovery mechanism
- Talks with Other Monitoring Systems: MDS, Hawkeye
- Implemented in:
- Java/Jini with SNMP
- WSDL / SOAP with UDDI
- For a Global Grid Monitoring Service
38 MONARC SONN: 3 Regional Centres Learning to Export Jobs (Optimized by Day 9)
[Simulation snapshot: CERN (30 CPUs), Caltech (25 CPUs) and NUST (20 CPUs), linked at 1 MB/s (150 ms RTT), 1.2 MB/s (150 ms RTT) and 0.8 MB/s (200 ms RTT), with average efficiencies <E> of 0.73, 0.83 and 0.66]
Simulations for Strategy Development: Self-Learning Algorithms for Optimization are Key Elements of Globally Scalable Grids
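The SONN scheduler itself is not reproduced here; the sketch below is a much simpler illustration of the same idea: a centre keeps learned estimates of turnaround at the other centres and gradually routes exported jobs to whichever looks fastest. The CPU counts come from the slide; the toy load model and learning rule are invented.

    import random

    sites = {"CERN": 30, "CALTECH": 25, "NUST": 20}   # CPUs per centre (from slide)
    est_time = {s: 1.0 for s in sites}                # optimistic initial estimates
    alpha = 0.2                                       # learning rate

    def observed_turnaround(site):
        # Toy model: more CPUs -> faster turnaround, plus random load noise
        return 100.0 / sites[site] * random.uniform(0.8, 1.2)

    for job in range(200):
        target = min(est_time, key=est_time.get)      # export to best-looking site
        t = observed_turnaround(target)
        est_time[target] = (1 - alpha) * est_time[target] + alpha * t

    print({s: round(v, 2) for s, v in est_time.items()})
    # After a few hundred jobs the estimates settle and most exports go to the
    # centre with the most CPUs, loosely mirroring the "learning by Day 9" plot.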
39 Grids and Open Standards: the Move to OGSA
[Chart: increased functionality and standardization over time, from Custom solutions, to the Globus Toolkit (de facto standards: GGF GridFTP, GSI; X.509, LDAP, FTP), to the Open Grid Services Architecture (GGF OGSI, plus OASIS, W3C; multiple implementations, including the Globus Toolkit) built on Web services, with App-specific Services on top]
40 OGSA Example: Reliable File Transfer Service
- A standard substrate: the Grid service
- Standard interfaces and behaviors to address key distributed system issues
- Refactoring and extension of the Globus Toolkit protocol suite
[Diagram: multiple Clients issue requests to manage file transfer operations; the File Transfer service holds Internal State and drives the data transfer operations]
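A hypothetical sketch of the client/service interaction in the diagram (not the actual OGSA RFT interface): the client submits a transfer request, the service keeps internal state, and the client polls until the transfer is done. Class, method and state names, and the file URLs, are invented.

    import time

    class ReliableTransferService:
        """Stand-in for a Grid service that manages third-party file transfers."""
        def __init__(self):
            self._state = {}                      # the service's internal state

        def submit(self, src, dst):
            tid = len(self._state) + 1
            self._state[tid] = "Active"
            return tid

        def status(self, tid):
            self._state[tid] = "Done"             # a real service would track progress
            return self._state[tid]

    svc = ReliableTransferService()
    tid = svc.submit("gsiftp://site-a.example/run1.root",
                     "gsiftp://site-b.example/cache/run1.root")
    while svc.status(tid) != "Done":
        time.sleep(1)
    print("transfer", tid, "complete")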
41 2003 ITR Proposals: Globally Enabled Analysis Communities
- Develop and build Dynamic Workspaces
- Construct Autonomous Communities Operating Within Global Collaborations
- Build Private Grids to support scientific analysis communities
- e.g. Using Agent-Based Peer-to-peer Web Services
- Drive the democratization of science via the deployment of new technologies
- Empower small groups of scientists (Teachers and Students) to profit from and contribute to international big science
42 Private Grids and P2P Sub-Communities in Global CMS
43 GECSR
- Initial targets are the global HENP collaborations, but GECSR is expected to be widely applicable to other large-scale collaborative scientific endeavors
- Giving scientists from all world regions the means to function as full partners in the process of search and discovery
The importance of Collaboration Services is highlighted in the Cyberinfrastructure report of Atkins et al. (2003)
44 A Global Grid Enabled Collaboratory for Scientific Research (GECSR)
- A joint ITR proposal from:
- Caltech (HN PI, JB CoPI)
- Michigan (CoPI, CoPI)
- Maryland (CoPI)
- and Senior Personnel from:
- Lawrence Berkeley Lab
- Oklahoma
- Fermilab
- Arlington (U. Texas)
- Iowa
- Florida State
- The first Grid-enabled Collaboratory: Tight integration between
- Science of Collaboratories
- A Globally scalable working environment
- A Sophisticated Set of Collaborative Tools (VRVS, VNC; Next-Gen)
- An Agent-based monitoring and decision support system (MonALISA)
45 14,000 Hosts; 8,000 Registered Users in 64 Countries; 56 (7 I2) Reflectors; Annual Growth 2 to 3X
46 Building Petascale Global Grids: Implications for Society
- Meeting the challenges of Next Generation Petabyte-to-Exabyte Grids, and Terascale data transactions over Gigabit-to-Terabit Networks, will transform research in science and engineering
- These developments will create the first truly global virtual organizations (GVO)
- They could also be the model for the data-intensive business processes of future corporations
- If these developments are successful, this could lead to profound advances in industry, commerce and society at large
- By changing the relationship between people and persistent information in their daily lives
- Within the next five to ten years
- HENP is leading these developments, together with leading computing scientists
47 Networks, Grids, HENP and WAN-in-Lab
- The current generation of 2.5-10 Gbps network backbones arrived in the last 15 Months in the US, Europe and Japan
- Major transoceanic links also at 2.5-10 Gbps in 2003
- Capability Increased 4 Times, i.e. 2-3 Times Moore's Law
- Reliable, high End-to-end Performance of network applications (large file transfers; Grids) is required. Achieving this requires:
- A Deep understanding of Protocol Issues, for efficient use
- Getting high-performance (TCP) toolkits into users' hands
- End-to-end monitoring: a coherent approach
- Removing Regional and Last Mile Bottlenecks and Compromises in Network Quality: these are now On the critical path, in all regions
- HENP is working in Concert with Internet2, TERENA, AMPATH, APAN, DataTAG, the Grid projects and the Global Grid Forum to solve these problems
48 ICFA Standing Committee on Interregional Connectivity (SCIC)
- Created by ICFA in July 1998 in Vancouver, Following ICFA-NTF
- CHARGE:
- Make recommendations to ICFA concerning the connectivity between the Americas, Asia and Europe (and the network requirements of HENP)
- As part of the process of developing these recommendations, the committee should:
- Monitor traffic
- Keep track of technology developments
- Periodically review forecasts of future bandwidth needs, and
- Provide early warning of potential problems
- Create subcommittees when necessary to meet the charge
- Representatives: Major labs, ECFA, ACFA, NA Users, S. America
- The chair of the committee should report to ICFA once per year, at its joint meeting with laboratory directors (Feb. 2003)
49 SCIC in 2002-3: A Period of Intense Activity
- Formed WGs in March 2002; 9 Meetings in 12 Months
- Strong Focus on the Digital Divide
- Presentations at Meetings and Workshops (e.g. LISHEP, APAN, AMPATH, ICTP and ICFA Seminars)
- HENP more visible to governments in the WSIS Process
- Five Reports Presented to ICFA on Feb. 13, 2003; see http://cern.ch/icfa-scic
- Main Report: Networking for HENP - H. Newman et al.
- Monitoring WG Report - L. Cottrell
- Advanced Technologies WG Report - R. Hughes-Jones, O. Martin et al.
- Digital Divide Report - A. Santoro et al.
- Digital Divide in Russia Report - V. Ilyin
50 SCIC Work in 2003
- Continue Digital Divide Focus
- Improve and Systematize Information in Europe, in Cooperation with TERENA and SERENATE
- More in-depth information on Asia, with APAN
- More in-depth information on South America, with AMPATH
- Begin Work on Africa, with ICTP
- Set Up HENP Networks Web Site and Database
- Share Information on Problems, Pricing and Example Solutions
- Continue and, if Possible, Strengthen Monitoring Work (IEPM)
- Continue Work on Specific Improvements
- Brazil and So. America; Romania; Russia; India; Pakistan; China
- An ICFA-Sponsored Statement at the World Summit on the Information Society (12/03 in Geneva), prepared by SCIC and CERN
- Watch Requirements: the Lambda Grid and Analysis revolutions
- Discuss, and Begin to Create, a New Culture of Collaboration