Title: Issues for Grids and WorldWide Computing
1 - Issues for Grids and WorldWide Computing
- Harvey B. Newman, California Institute of Technology
- ACAT2000, Fermilab, October 19, 2000
2 - LHC Vision: Data Grid Hierarchy
[Diagram: tiered data grid, from the online system down to physicist workstations]
- Online System: 1 bunch crossing = ~17 interactions per 25 nsec; 100 triggers per second; each event is 1 MByte in size (see the arithmetic sketch after this slide)
- Experiment to Online System: PBytes/sec; Online System to offline: 100 MBytes/sec
- Tier 0/1: Offline Farm, CERN Computer Ctr (> 20 TIPS), with HPSS mass storage
- Tier 0/1 to Tier 1: 0.6-2.5 Gbits/sec
- Tier 1: FNAL Center, Italy Center, UK Center, France Center; 2.5 Gbits/sec onward
- Tier 1 to Tier 2: 622 Mbits/sec
- Tier 3: Institutes (0.25 TIPS each); physicists work on analysis channels; each institute has ~10 physicists working on one or more channels
- Tier 3 to Tier 4: 100-1000 Mbits/sec, with a physics data cache
- Tier 4: Workstations
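The rates on this slide are self-consistent; a minimal arithmetic sketch in Python (the variable names are ours, and the ~10^7 live seconds per running year is a common HEP rule of thumb, not stated on the slide):

    # Data rate into the offline farm, from the slide's trigger figures
    TRIGGER_RATE_HZ = 100        # accepted events per second
    EVENT_SIZE_BYTES = 1e6       # ~1 MByte per event

    offline_rate = TRIGGER_RATE_HZ * EVENT_SIZE_BYTES
    print(f"Offline rate: {offline_rate / 1e6:.0f} MBytes/sec")   # -> 100 MBytes/sec

    SECONDS_PER_YEAR = 1e7       # assumed live time per running year
    yearly = offline_rate * SECONDS_PER_YEAR
    print(f"Yearly raw volume: {yearly / 1e15:.0f} PBytes")       # -> ~1 PByte/year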
3 - US-CERN Link BW Requirements Projection (PRELIMINARY)
- Includes 1.5 Gbps each for ATLAS and CMS, plus BaBar, Run2 and others
- D0 and CDF needs at Run2 are presumed to be comparable to BaBar's
4 - Grids: The Broader Issues and Requirements
- A New Level of Intersite Cooperation and Resource Sharing
- Security and Authentication Across World-Region Boundaries
- Start with cooperation among Grid Projects (PPDG, GriPhyN, EU DataGrid, etc.)
- Develop Methods for Effective HEP/CS Collaboration in Grid and VDT Design
- Joint Design and Prototyping Effort, with (Iterative) Design Specifications
- Find an Appropriate Level of Abstraction
- Adapted to > 1 Experiment, > 1 Working Environment
- Be Ready to Adapt to the Coming Revolutions
- In Network, Collaborative, and Internet Information Technologies
5 - The PPDG Collaboration
[Diagram: PPDG at the intersection of the experiments and the computer science teams]
- Experiments and their data management efforts: BaBar, D0, CDF, ATLAS, CMS, Nuclear Physics
- CS/middleware teams and their user communities: Condor, SRB Team, Globus Team, HENP GC
6 - GriPhyN: PetaScale Virtual Data Grids
- Build the Foundation for Petascale Virtual Data Grids (see the sketch after this slide)
[Diagram: layered virtual data grid architecture]
- Users: Production Team, Individual Investigator, Workgroups
- Interactive User Tools
- Request Planning and Scheduling Tools; Request Execution and Management Tools; Virtual Data Tools
- Resource Management Services; Security and Policy Services; Other Grid Services
- Transforms
- Distributed resources (code, storage, computers, and network); Raw data source
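Since "virtual data" is the central abstraction here, a toy sketch of the request path may help: a requested product is returned from a known replica, or else materialized by running its registered transform. All names (VirtualDataCatalog, register/request) are illustrative, not GriPhyN/VDT API:

    # Toy virtual-data catalog: materialize derived products on demand
    class VirtualDataCatalog:
        def __init__(self):
            self.replicas = {}     # product name -> a materialized copy
            self.transforms = {}   # product name -> (function, input names)

        def register(self, name, func, inputs):
            self.transforms[name] = (func, inputs)

        def request(self, name):
            if name in self.replicas:              # already materialized
                return self.replicas[name]
            func, inputs = self.transforms[name]   # otherwise run the transform
            data = func(*[self.request(i) for i in inputs])
            self.replicas[name] = data             # cache for later requests
            return data

    vdc = VirtualDataCatalog()
    vdc.replicas["raw"] = "raw event data"
    vdc.register("esd", lambda raw: f"ESD({raw})", ["raw"])
    vdc.register("aod", lambda esd: f"AOD({esd})", ["esd"])
    print(vdc.request("aod"))   # materializes ESD, then AOD, on demand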
7 - EU DataGrid Project: Work Packages
[List of work packages: not recoverable from the transcript]
8 - Grid Issues: A Short List of Coming Revolutions
- Network Technologies
- Wireless Broadband (from ca. 2003)
- 10 Gigabit Ethernet (from 2002; see www.10gea.org); 10GbE/DWDM-wavelength (OC-192) integration; OXC (optical cross-connects)
- Internet Information Software Technologies
- Global Information Broadcast Architecture
- E.g. the Multipoint Information Distribution Protocol (MIDP; Tie.Liao@inria.fr)
- Programmable Coordinated Agent Architectures
- E.g. Mobile Agent Reactive Spaces (MARS) by Cabri et al., Univ. of Modena
- The Data Grid - Human Interface
- Interactive monitoring and control of Grid resources
- By authorized groups and individuals
- By Autonomous Agents
9 - CA*net 3: National Optical Internet in Canada
- Consortium Partners: Bell Nexxia, Nortel, Cisco, JDS Uniphase, Newbridge
[Map: CA*net 3 primary and diverse routes, GigaPOPs and ORANs, linking Vancouver, Calgary, Regina, Winnipeg, Toronto, Ottawa, Montreal, Fredericton, Charlottetown, Halifax and St. John's, with connections to Seattle, Chicago (STAR TAP), New York and Los Angeles; regional networks include BCnet, Netera, SRnet, MRnet, ONet, RISQ and ACORN]
- 16-channel DWDM: 8 wavelengths @ OC-192 reserved for CANARIE; 8 wavelengths for carrier and other customers
- Deploying a 4-channel CWDM Gigabit Ethernet network (400 km)
- Deploying a 4-channel Gigabit Ethernet transparent optical DWDM network (1500 km)
- Multiple customer-owned dark fiber networks connecting universities and schools
- Condo fiber network linking all universities and hospitals
- Condo dark fiber networks connecting universities and schools
10 - CA*net 4: Possible Architecture
[Map: large-channel WDM system with OBGP switches linking St. John's, Charlottetown, Halifax, Fredericton, Montreal, Ottawa, Toronto, Winnipeg, Regina, Calgary and Vancouver, with dedicated wavelength or SONET channels to Seattle, Chicago, New York, Los Angeles, Miami and Europe]
- Optional Layer 3 aggregation service
- Dedicated wavelength or SONET channel
- Large-channel WDM system; OBGP switches
11 - OBGP Traffic Engineering - Physical
[Diagram: AS 1 through AS 4 connected via intermediate AS 5 to Tier 1 and Tier 2 ISPs; the default wavelength is shown in red; a dual-connected router attaches to AS 5; for simplicity, only data forwarding paths in one direction are shown]
- A router redirects networks with a heavy traffic load to the optical switch, but routing policy is still maintained by the ISP (see the sketch after this slide)
- The optical switch looks like a BGP router: AS 1 is directly connected to the Tier 1 ISP, but still transits AS 5
- The bulk of AS 1's traffic is to the Tier 1 ISP
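A minimal sketch of the redirection decision described above, assuming a simple per-AS load threshold (the threshold value and all names are our illustration; OBGP itself operates on BGP routes and optical cross-connects):

    # Shift heavy AS-to-AS flows onto a direct optical bypass wavelength
    REDIRECT_THRESHOLD_MBPS = 500   # illustrative policy threshold

    def plan_paths(traffic_mbps):
        """traffic_mbps: {(src_as, dst_as): mean load}; returns a path per pair."""
        plan = {}
        for (src, dst), load in traffic_mbps.items():
            if load > REDIRECT_THRESHOLD_MBPS:
                plan[(src, dst)] = "optical bypass wavelength"   # e.g. AS 1 -> Tier 1 ISP
            else:
                plan[(src, dst)] = "default routed path via AS 5"
        return plan

    print(plan_paths({("AS1", "Tier1"): 800, ("AS2", "Tier1"): 50}))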
12 - VRVS Remote Collaboration System: Statistics
- 30 Reflectors, 52 Countries
- Mbone, H.323, MPEG2 Streaming, VNC
13 - VRVS: Mbone/H.323/QT Snapshot
14 - VRVS R&D: Sharing the Desktop
- VNC technology integrated in the upcoming VRVS release
15 - Worldwide Computing Issues
- Beyond Grid Prototype Components: Integration of Grid Prototypes for End-to-end Data Transport
- Particle Physics Data Grid (PPDG): ReqM; SAM in D0
- PPDG/EU DataGrid: GDMP for CMS HLT Productions
- Start Building the Grid System(s): Integration with experiment-specific software frameworks
- Derivation of Strategies (MONARC Simulation System)
- Data caching, query estimation, co-scheduling
- Load balancing and workload management among Tier0/Tier1/Tier2 sites (SONN by Legrand)
- Transaction robustness: simulate and verify
- Transparent Interfaces for Replica Management
- Deep versus shallow copies: thresholds; tracking, monitoring and control (see the sketch after this slide)
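A toy sketch of the deep-versus-shallow distinction from the last bullet, assuming a size threshold decides which form a replica takes (the threshold value and data structures are ours, not from the talk):

    # Deep copy = move the bytes; shallow copy = register a pointer to the master
    DEEP_COPY_THRESHOLD_BYTES = 10 * 1024**3   # replicate outright below 10 GB

    def replicate(entry, catalog, site):
        """entry: dict with 'name', 'size', 'location'; updates catalog in place."""
        if entry["size"] <= DEEP_COPY_THRESHOLD_BYTES:
            # deep copy: transfer the data, register an independent replica
            catalog.setdefault(entry["name"], []).append({"site": site, "kind": "deep"})
        else:
            # shallow copy: register a reference; reads go back to the master
            catalog.setdefault(entry["name"], []).append(
                {"site": site, "kind": "shallow", "master": entry["location"]})

    catalog = {}
    replicate({"name": "aod-2000-10", "size": 2 * 1024**3, "location": "cern"}, catalog, "fnal")
    replicate({"name": "raw-2000-10", "size": 50 * 1024**3, "location": "cern"}, catalog, "fnal")
    print(catalog)   # one deep replica, one shallow reference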
16 - Grid Data Management Prototype (GDMP)
- Distributed Job Execution and Data Handling Goals:
- Transparency
- Performance
- Security
- Fault Tolerance
- Automation
[Diagram: a job submitted at Site A writes its data locally; the data is then replicated to Site B and Site C]
- Jobs are executed locally or remotely
- Data is always written locally
- Data is replicated to remote sites (see the sketch after this slide)
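A minimal sketch of the data flow in the figure: the job writes at its execution site, then a replication step pushes the new data to the other sites. This illustrates the pattern only and is not the GDMP API:

    # Write locally, then replicate to all remote sites
    SITES = {"A": [], "B": [], "C": []}

    def run_job(site, produce):
        data = produce()
        SITES[site].append(data)            # data is always written locally
        for other in SITES:
            if other != site:
                SITES[other].append(data)   # then replicated to remote sites
        return data

    run_job("A", lambda: "run1234.events")
    print(SITES)   # all three sites now hold run1234.events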
17 - MONARC Simulation: Physics Analysis at Regional Centres
- Similar data processing jobs are performed in each of several RCs
- There is a profile of jobs, each submitted to a job scheduler (see the sketch after this slide)
- Each Centre has TAG and AOD databases replicated
- The Main Centre provides ESD and RAW data
- Each job processes AOD data, and also a fraction of the ESD and RAW data
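A small sketch of the workload model described here: each job reads its full AOD sample plus fractions of the ESD and RAW samples. All sizes, fractions and the job rate are placeholder values, not MONARC parameters:

    # Placeholder sample sizes (GB) and access fractions
    AOD_GB, ESD_GB, RAW_GB = 10, 100, 1000
    ESD_FRACTION, RAW_FRACTION = 0.05, 0.01

    def job_io_gb():
        """Data a single analysis job touches, in GB."""
        return AOD_GB + ESD_FRACTION * ESD_GB + RAW_FRACTION * RAW_GB

    jobs_per_day = 200   # the 'profile of jobs' submitted to the scheduler
    print(f"Per job: {job_io_gb():.0f} GB; "
          f"per day: {jobs_per_day * job_io_gb() / 1000:.1f} TB")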
18 - ORCA Production on CERN/IT-Loaned Event Filter Farm Test Facility
[Diagram: HPSS mass storage feeding a total of 24 pile-up servers hosting Pileup DBs (groups of 17 and 9 servers, including SUN machines), plus lock servers and output servers, all serving a farm of 140 processing nodes]
- The strategy is to use many commodity PCs as Database Servers
19 - Network Traffic and Job Efficiency
[Plot: measured versus simulated network traffic and job efficiency; mean measured value 48 MB/s]
20 - From User Federation to Private Copy
[Diagram: copying from the user federation (UF.boot) into a private federation (MyFED.boot) via the AMS server; a User Collection is assembled from the MC, CH/MH/TH and CD/MD/TD objects]
(From the ORCA 4 tutorial, part II, 14 October 2000)
21 - Beyond Traditional Architectures: Mobile Agents
"Agents are objects with rules and legs" -- D. Taylor
[Diagram: an agent moving between an application and a remote service]
- Mobile Agents: (Semi-)Autonomous, Goal Driven, Adaptive
- Execute Asynchronously
- Reduce Network Load: Local Conversations (see the sketch after this slide)
- Overcome Network Latency, Some Outages
- Adaptive -> Robust, Fault Tolerant
- Naturally Heterogeneous
- Extensible Concept: Coordinated Agent Architectures
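A toy sketch of the "local conversations" point: the agent moves to the data, and only the result crosses the network. The classes are illustrative; real mobile-agent systems such as Aglets ship serialized live code:

    # The agent carries its goal to the data instead of querying over the WAN
    class Agent:
        def __init__(self, task):
            self.task = task            # goal the agent carries with it

        def run_at(self, node):
            # executes next to the data: many local reads, one remote reply
            return self.task(node.local_data)

    class Node:
        def __init__(self, local_data):
            self.local_data = local_data

    remote = Node(local_data=list(range(1_000_000)))
    agent = Agent(task=lambda data: sum(data) / len(data))
    print(agent.run_at(remote))    # only the mean crosses the network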
22 - Coordination Architectures for Mobile Java Agents
- A lot of progress since 1998
- Fourth Generation Architecture: Associative Blackboards
- After 1) Client/Server, 2) Meeting-Oriented, 3) Blackboards
- Analogous to CMS ORCA software: Observer-based action on demand
- MARS: Mobile Agent Reactive Spaces (Cabri et al.); see http://sirio.dsi.unimo.it/MOON
- Resilient and Scalable; Simple Implementation
- Works with standard Agent implementations (e.g. Aglets, http://www.trl.ibm.co.jp)
- Data-oriented, to provide temporal and spatial asynchronicity (see JavaSpaces, PageSpaces)
- Programmable, authorized reactions, based on virtual Tuple spaces
23 - Mobile Agent Reactive Spaces (MARS) Architecture
- MARS: Programmed Reactions Based on Metalevel 4-ples (Reaction, Tuple, Operation-Type, Agent-ID)
- Allows Security, Policies
- Allows Production of Tuples on Demand (see the sketch after this slide)
[Diagram: network nodes on the Internet, each with an Agent Server, a Tuple Space, and a MetaLevel Tuple Space. (A) Agents arrive; (B) they get a reference to the local Tuple Space; (C) they access the Tuple Space; (D) the Tuple Space reacts with programmed behavior]
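A toy model of a programmable, reactive tuple space in the MARS style: reactions are keyed on metalevel (tuple key, operation type, agent id) entries, and a read can produce a tuple on demand. This is our illustration of the idea, not the MARS implementation:

    # Reactive tuple space: metalevel entries trigger behavior on access
    class ReactiveTupleSpace:
        def __init__(self):
            self.tuples = []
            self.reactions = {}    # (key, op, agent) -> callable; agent None = any

        def install(self, key, op, agent, reaction):
            self.reactions[(key, op, agent)] = reaction    # metalevel entry

        def read(self, key, agent):
            for t in self.tuples:                          # ordinary lookup first
                if t[0] == key:
                    return t
            reaction = (self.reactions.get((key, "read", agent))
                        or self.reactions.get((key, "read", None)))
            if reaction:
                t = reaction()          # produce the tuple on demand
                self.tuples.append(t)
                return t
            return None

    space = ReactiveTupleSpace()
    space.install("load", "read", None, lambda: ("load", 0.42))
    print(space.read("load", agent="monitor-1"))   # -> ('load', 0.42)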
24 - GRIDs in 2000: Summary
- Grids are (in) our Future... Let's Get to Work
25 - Grid Data Management: Issues
- Data movement and responsibility for updating the Replica Catalog
- Metadata update and replica consistency
- Concurrency and locking
- Performance characteristics of replicas
- Advance Reservation: policy, time-limit
- How to advertise policy and resource availability
- Pull versus push (strategy, security)
- Fault tolerance; recovery procedures
- Queue management
- Access control, both global and local