Title: Towards a US and LHC Grid
1 Towards a US (and LHC) Grid Environment for HENP Experiments
- CHEP 2000 Grid Workshop, Padova
- Harvey B. Newman, Caltech
- February 12, 2000
2 Data Grid Hierarchy: Integration, Collaboration, Marshal Resources
[Diagram: the LHC Data Grid hierarchy, Tier 0 through Tier 4; 1 TIPS = 25,000 SpecInt95, and a PC today is 10-15 SpecInt95]
- Detector: one bunch crossing every 25 nsec; ~100 triggers per second; each event is ~1 MByte in size (see the arithmetic check below)
- Detector -> Online System at ~PBytes/sec; Online System -> Offline Farm (~20 TIPS) at ~100 MBytes/sec
- Offline Farm -> Tier 0 (CERN Computer Center) at ~100 MBytes/sec
- Tier 0 -> Tier 1 at 622 Mbits/sec, or by air freight
- Tier 1 Regional Centers: Fermilab (~4 TIPS), and the France, Italy and Germany Regional Centers
- Tier 1 -> Tier 2 at 2.4 Gbits/sec; Tier 2 -> Tier 3 at 622 Mbits/sec
- Tier 3: Institute servers (~0.25 TIPS each) with a physics data cache. Physicists work on analysis channels; each institute has ~10 physicists working on one or more channels, and data for these channels should be cached by the institute server
- Tier 3 -> Tier 4 (workstations) at 100 - 1000 Mbits/sec
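- The trigger rate and event size above fix the raw data volume entering the hierarchy; a minimal arithmetic check is sketched below (the ~1e7 live seconds per year is an assumed HENP rule of thumb, not a number taken from this slide):

    # Back-of-the-envelope data-volume check for the Tier 0 input rate.
    # Numbers from the slide: ~100 triggers/sec, ~1 MByte/event.
    # The 1e7 live seconds/year is an assumption (a common HENP rule of thumb).
    trigger_rate_hz = 100          # accepted events per second
    event_size_mb = 1.0            # MBytes per event
    live_seconds_per_year = 1e7    # assumed annual live time

    rate_mb_per_s = trigger_rate_hz * event_size_mb        # ~100 MBytes/sec, as on the slide
    volume_pb_per_year = rate_mb_per_s * live_seconds_per_year / 1e9
    print(f"{rate_mb_per_s:.0f} MB/s -> ~{volume_pb_per_year:.1f} PB/year of raw data")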
3 To Solve the LHC Data Problem
- The proposed LHC computing and data handling will not support FREE access, transport or processing for more than a small part of the data
- Balance between proximity to large computational and data handling facilities, and proximity to end users and more local resources for frequently-accessed datasets
- Strategies must be studied and prototyped, to ensure both acceptable turnaround times and efficient resource utilisation
- Problems to be Explored
- How to meet the demands of hundreds of users who need transparent access to local and remote data, in disk caches and tape stores
- Prioritise hundreds of requests from local and remote communities, consistent with local and regional policies
- Ensure that the system is dimensioned/used/managed optimally, for the mixed workload
4 Regional Center Architecture (Example by I. Gaines, MONARC)
[Diagram: a Regional Center linked upward to CERN (tapes), downward to Tier 2 centers, local institutes and desktops, with data import and data export paths]
- Core fabric: tape mass storage, disk servers, database servers
- Production Reconstruction: Raw/Sim -> ESD; scheduled, predictable; for the experiment and physics groups
- Production Analysis: ESD -> AOD, AOD -> DPD; scheduled; for physics groups
- Individual Analysis: AOD -> DPD and plots; chaotic; for individual physicists
- Support services: physics software development, R&D systems and testbeds, info servers, code servers, web servers, telepresence servers, training, consulting, help desk
5 Grid Services Architecture
- Applications: HEP data-analysis related applications
- Application Toolkits: remote visualization toolkit, remote computation toolkit, remote data toolkit, remote sensors toolkit, remote collaboration toolkit, ...
- Grid Services: protocols, authentication, policy, resource management, instrumentation, data discovery, etc.
- Grid Fabric: networks, data stores, computers, display devices, etc., and the associated local services (local implementations)
- Adapted from Ian Foster
6 Grid Hierarchy Goals: Better Resource Use and Faster Turnaround
- Grid integration and (de facto standard) common services to ease development, operation, management and security
- Efficient resource use and improved responsiveness through
- Treatment of the ensemble of site and network resources as an integrated (loosely coupled) system
- Resource discovery, query estimation (redirection), co-scheduling, prioritization, local and global allocations
- Network and site instrumentation: performance tracking, monitoring, forward prediction, problem trapping and handling
7 GriPhyN: First Production-Scale Grid Physics Network
- Develop a New Integrated Distributed System, while Meeting the Primary Goals of the US LIGO, SDSS and LHC Programs
- Unified Grid System Concept; Hierarchical Structure
- Twenty Centers, with Three Sub-Implementations
- 5-6 each in the US for LIGO, CMS, ATLAS; 2-3 for SDSS
- Emphasis on Training, Mentoring and Remote Collaboration
- Focus on LIGO, SDSS (BaBar and Run2) handling of real data, and LHC Mock Data Challenges with simulated data
- Making the Process of Discovery Accessible to Students Worldwide
- GriPhyN Web Site: http://www.phys.ufl.edu/avery/mre/
- White Paper: http://www.phys.ufl.edu/avery/mre/white_paper.html
8 Grid Development Issues
- Integration of applications with Grid middleware
- A performance-oriented user application software architecture is required, to deal with the realities of data access and delivery
- Application frameworks must work with system state and policy information (instructions) from the Grid
- O(R)DBMSs must be extended to work across networks
- E.g. data transport and catalog updates that are invisible to the DBMS
- Interfacility cooperation at a new level, across world regions
- Agreement on choice and implementation of standard Grid components, services, security and authentication
- Interface the common services locally, to match heterogeneous resources, performance levels, and local operational requirements
- Accounting and exchange-of-value software, to enable cooperation
9 Roles of Projects for HENP Distributed Analysis
- RD45, GIOD: Networked Object Databases
- Clipper/GC: High-speed access to Object or File data; FNAL/SAM for processing and analysis
- SLAC/OOFS: Distributed File System with an Objectivity Interface
- NILE, Condor: Fault-Tolerant Distributed Computing with Heterogeneous CPU Resources
- MONARC: LHC Computing Models (Architecture, Simulation, Strategy, Politics)
- PPDG: First Distributed Data Services and Data Grid System Prototype
- ALDAP: OO Database Structures and Access Methods for Astrophysics and HENP Data
- GriPhyN: Production-Scale Data Grid
- APOGEE: Simulation/Modeling, Application and Network Instrumentation, System Optimization/Evaluation
10 Other ODBMS Tests
- Tests with Versant (fallback ODBMS)
- DRO WAN tests with CERN
- Production on CERN's PCSF, and file movement to Caltech
- Objectivity/DB: creation of a 32,000-database federation
11 The China Clipper Project: A Data-Intensive Grid (ANL-SLAC-Berkeley)
- China Clipper Goal
- Develop and demonstrate middleware allowing applications transparent, high-speed access to large data sets distributed over wide-area networks
- Builds on expertise and assets at ANL, LBNL and SLAC
- NERSC, ESnet
- Builds on Globus middleware and a high-performance distributed storage system (DPSS from LBNL)
- Initial focus on large DOE HENP applications
- RHIC/STAR, BaBar
- Demonstrated data rates up to 57 MBytes/sec
12 Grand Challenge Architecture
- An order-optimized prefetch architecture for data retrieval from multilevel storage in a multiuser environment
- Queries select events and specific event components based upon tag attribute ranges
- Query estimates are provided prior to execution
- Queries are monitored for progress and multi-use
- Because event components are distributed over several files, processing an event requires delivery of a bundle of files
- Events are delivered in an order that takes advantage of what is already on disk, with multiuser policy-based prefetching of further data from tertiary storage (see the ordering sketch below)
- GCA intercomponent communication is CORBA-based, but physicists are shielded from this layer
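- A minimal sketch of the order-optimized idea (not the actual GCA code; bundles and the cache are simplified to sets of file names, and the ranking rule is an illustrative stand-in for the policy module):

    # Minimal sketch of order-optimized bundle delivery: serve first the file bundles already
    # fully on disk, then prefetch the rest, preferring files wanted by the most pending bundles.
    from collections import Counter

    def order_bundles(bundles, on_disk):
        """bundles: list of (bundle_id, set_of_files); on_disk: set of cached files."""
        ready = [b for b in bundles if b[1] <= on_disk]
        pending = [b for b in bundles if not b[1] <= on_disk]
        # Rank missing files by how many pending bundles need them.
        demand = Counter(f for _, files in pending for f in files if f not in on_disk)
        fetch_order = [f for f, _ in demand.most_common()]
        # Pending bundles are served in order of how little data they still need.
        pending.sort(key=lambda b: len(b[1] - on_disk))
        return ready, pending, fetch_order

    bundles = [("q1-evt001", {"f1", "f2"}), ("q1-evt002", {"f2", "f3"}), ("q2-evt001", {"f3", "f4"})]
    ready, pending, fetch_order = order_bundles(bundles, on_disk={"f1", "f2"})
    print(ready)        # the bundle already on disk is delivered immediately
    print(fetch_order)  # ['f3', 'f4']: f3 is wanted by two pending bundles, so it is fetched first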
13 GCA System Overview
[Diagram: the GCA client layer working with STACS; components shown include the File Catalog, the Index, Event Tags, staged event files and (other) disk-resident event data, with pftp transfers out of HPSS]
14 STorage Access Coordination System (STACS)
[Diagram: STACS components and their interactions]
- Query Estimator: uses a bit-sliced index over the event tags to return an estimate before a query executes (a toy bitmap-index estimate follows below)
- Query Monitor: tracks query status, the lists of file bundles and events, and the cache map
- Policy Module: turns those lists into requests for file caching and purging, under multiuser policy
- Cache Manager: issues pftp and file-purge commands, and works with the File Catalog
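- A toy bitmap-index estimate, as a stand-in for the bit-sliced index in the Query Estimator (the attribute, binning and values are invented for illustration):

    # Illustrative bitmap-index query estimate. One bitmap per attribute bin; a range query
    # ORs the overlapping bins and counts set bits, giving an event-count estimate without
    # touching the event data itself.
    events_pt = [3.2, 7.8, 12.5, 0.9, 22.1, 15.4, 8.8]     # toy tag attribute (e.g. pT in GeV)
    bins = [(0, 5), (5, 10), (10, 20), (20, 100)]           # illustrative binning

    bitmaps = []
    for lo, hi in bins:
        bm = 0
        for i, v in enumerate(events_pt):
            if lo <= v < hi:
                bm |= 1 << i                                # set bit i if event i falls in this bin
        bitmaps.append(bm)

    def estimate(lo, hi):
        """Upper-bound estimate of events with lo <= pT < hi, using whole bins that overlap the range."""
        mask = 0
        for (blo, bhi), bm in zip(bins, bitmaps):
            if bhi > lo and blo < hi:                       # bin overlaps the query range
                mask |= bm
        return bin(mask).count("1")

    print(estimate(10, 100))   # 3 events fall in the bins covering [10, 100)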
15 The Particle Physics Data Grid (PPDG)
- ANL, BNL, Caltech, FNAL, JLAB, LBNL, SDSC, SLAC, U.Wisc/CS
[Diagram: the two first-year services]
- Site-to-Site Data Replication Service at 100 MBytes/sec, between a PRIMARY SITE (data acquisition, CPU, disk, tape robot) and a SECONDARY SITE (CPU, disk, tape robot)
- Multi-Site Cached File Access Service
- First Year Goal: optimized cached read access to 1-10 GBytes, drawn from a total data set of order one Petabyte
16 The Particle Physics Data Grid (PPDG)
- The ability to query and partially retrieve hundreds of terabytes across Wide Area Networks within seconds
- PPDG uses advanced services in three areas
- Distributed caching, to allow for rapid data delivery in response to multiple requests
- Matchmaking and request/resource co-scheduling, to manage workflow and use computing and network resources efficiently to achieve high throughput
- Differentiated Services, to allow particle-physics bulk data transport to coexist with interactive and real-time remote collaboration sessions, and other network traffic
17 PPDG Architecture for Reliable High-Speed Data Delivery
[Diagram: the service stack, laid out across a site boundary / security domain]
- Object-based and file-based application services, over resource management
- Matchmaking service, cost estimation, file replication index
- File access service, cache manager, file fetching service, mass storage manager, and file movers on each side of the site boundary
- End-to-end network services
- Future: file and object export, cache state tracking, forward prediction
18 First Year PPDG System Components
- Middleware Components (Initial Choice): see the PPDG Proposal
- Object- and File-Based Application Services: Objectivity/DB (SLAC-enhanced); GC Query Object, Event Iterator, Query Monitor; FNAL SAM system
- Resource Management: start with human intervention (but begin to deploy resource discovery and management tools: Condor, SRB)
- File Access Service: components of OOFS (SLAC)
- Cache Manager: GC Cache Manager (LBNL)
- Mass Storage Manager: HPSS, Enstore, OSM (site-dependent)
- Matchmaking Service: Condor (U. Wisconsin)
- File Replication Index: MCAT (SDSC)
- Transfer Cost Estimation Service: Globus (ANL)
- File Fetching Service: components of OOFS
- File Mover(s): SRB (SDSC); site specific
- End-to-End Network Services: Globus tools for QoS reservation
- Security and Authentication: Globus (ANL)
19 CONDOR Matchmaking: A Resource Allocation Paradigm
- Parties use ClassAds to advertise properties, requirements and ranking to a matchmaker (a simplified sketch of the idea follows below)
- ClassAds are self-describing (no separate schema)
- ClassAds combine query and data
- High Throughput Computing: http://www.cs.wisc.edu/condor
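- A minimal Python sketch of the matchmaking idea (this is not the ClassAd language or the Condor matchmaker; the ads are plain dictionaries and the attribute names are invented):

    # Each ad is a self-describing dictionary; Requirements and Rank are expressions evaluated
    # against the other party's ad; the matchmaker pairs ads whose requirements hold both ways.
    job_ad = {
        "Type": "Job", "ImageSize_MB": 300,
        "Requirements": lambda other: other["OpSys"] == "LINUX" and other["Memory_MB"] >= 300,
        "Rank": lambda other: other["Mips"],            # prefer faster machines
    }
    machine_ads = [
        {"Type": "Machine", "Name": "node1", "OpSys": "LINUX", "Memory_MB": 256, "Mips": 350,
         "Requirements": lambda other: other["ImageSize_MB"] <= 256, "Rank": lambda other: 0},
        {"Type": "Machine", "Name": "node2", "OpSys": "LINUX", "Memory_MB": 512, "Mips": 500,
         "Requirements": lambda other: True, "Rank": lambda other: 0},
    ]

    def match(job, machines):
        """Return the machine with the highest job Rank whose requirements are mutually satisfied."""
        candidates = [m for m in machines
                      if job["Requirements"](m) and m["Requirements"](job)]
        return max(candidates, key=job["Rank"], default=None)

    best = match(job_ad, machine_ads)
    print(best["Name"] if best else "no match")   # node2: node1 fails the memory checks both ways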
20 Remote Execution in Condor
[Diagram: agents for remote execution in CONDOR, spanning a submission side and an execution side]
- Submission side: the request queue, the Owner Agent and the Customer Agent, with the job's object files
- Execution side: the Execution Agent and Application Agent, managing the Application Process with its data/object files and checkpoint (Ckpt) files
- Remote I/O and checkpointing link the two sides
21 Beyond Traditional Architectures: Mobile Agents (Java Aglets)
- "Agents are objects with rules and legs" (D. Taylor)
[Diagram: a mobile Agent moving between an Application and a Service]
- Mobile Agents
- Execute Asynchronously
- Reduce Network Load: Local Conversations
- Overcome Network Latency, and Some Outages
- Adaptive -> Robust, Fault Tolerant
- Naturally Heterogeneous
- Extensible Concept: Agent Hierarchies
22 Using the Globus Tools
- Tests with gsiftp, a modified ftp server/client that allows control of the TCP buffer size (see the window-size sketch below)
- Transfers of Objy database files from the Exemplar to
- Itself
- An O2K at Argonne (via CalREN2 and Abilene)
- A Linux machine at INFN (via the US-CERN transatlantic link)
- Target /dev/null, in multiple streams (1 to 16 parallel gsiftp sessions)
- Aggregate throughput measured as a function of the number of streams and the send/receive buffer sizes
- 25 MB/sec on the HiPPI loop-back
- 4 MB/sec to Argonne by tuning the TCP window size: saturating the available bandwidth to Argonne
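- Why the buffer size matters: the TCP window must cover the path's bandwidth-delay product. A back-of-the-envelope sketch follows; the bandwidths and round-trip times in it are assumed, illustrative values, not measurements from these tests:

    # Bandwidth-delay product: the TCP window needed to keep a path full.
    # Bandwidths and RTTs below are illustrative assumptions, not numbers from the slide.
    def window_bytes(bandwidth_mbps, rtt_ms):
        """Minimum TCP window (bytes) to saturate a path with the given bandwidth and round-trip time."""
        return bandwidth_mbps * 1e6 / 8 * rtt_ms / 1e3

    paths = [("HiPPI loop-back (assume 800 Mbps, 1 ms)", 800, 1),
             ("Caltech to Argonne (assume 155 Mbps, 60 ms)", 155, 60),
             ("US to CERN (assume 45 Mbps, 170 ms)", 45, 170)]
    for name, bw, rtt in paths:
        print(f"{name}: ~{window_bytes(bw, rtt) / 1024:.0f} KB window needed")
    # With a default window of ~64 KB, long-RTT paths stall well below the available bandwidth,
    # which is why gsiftp exposes the buffer size and why multiple parallel streams help.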
23 Distributed Data Delivery and LHC Software Architecture
- Software Architectural Choices
- Traditional, single-threaded applications
- Wait for data location, arrival and reassembly, OR
- Performance-Oriented (Complex)
- I/O requests up-front; multi-threaded; data-driven; respond to an ensemble of (changing) cost estimates (see the sketch below)
- Possible code movement as well as data movement
- Loosely coupled, dynamic
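- A small sketch of the data-driven style (illustrative only; the replica names, costs and thread-pool choice are assumptions, not part of any LHC framework):

    # Issue all I/O requests up front, let a thread pool fetch pieces concurrently, and process
    # each piece as it arrives rather than in file order. Replica costs stand in for a Grid
    # cost-estimation service.
    import random, time
    from concurrent.futures import ThreadPoolExecutor, as_completed

    REPLICA_COST = {"local_disk": 0.01, "regional_center": 0.05, "tape_store": 0.3}  # assumed costs (s)

    def fetch(piece):
        """Fetch one event component from the currently cheapest replica (simulated by a sleep)."""
        site, cost = min(REPLICA_COST.items(), key=lambda kv: kv[1])
        time.sleep(cost * random.uniform(0.5, 1.5))        # pretend transfer time
        return piece, site

    pieces = [f"event-{i}" for i in range(8)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(fetch, p) for p in pieces]  # all requests issued up front
        for done in as_completed(futures):                 # process in arrival order, not request order
            piece, site = done.result()
            print(f"processed {piece} (from {site})")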
24 GriPhyN Foundation
- Build on the Distributed System Results of the GIOD, MONARC, NILE, Clipper/GC and PPDG Projects
- Long Term Vision in Three Phases
- 1. Read/write access to high-volume data and processing power
- Condor/Globus/SRB and NetLogger components to manage jobs and resources
- 2. WAN-distributed data-intensive Grid computing system
- Tasks move automatically to the most effective node in the Grid
- Scalable implementation using mobile agent technology
- 3. Virtual Data concept for multi-PB distributed data management, with large-scale Agent Hierarchies
- Transparently match data to sites, manage data replication or transport, co-schedule data and compute resources
- Build on VRVS Developments for Remote Collaboration
25 GriPhyN/APOGEE: Production Design of a Data Analysis Grid
- INSTRUMENTATION, SIMULATION, OPTIMIZATION, COORDINATION
- SIMULATION of a Production-Scale Grid Hierarchy
- Provide a toolset for HENP experiments to test and optimize their data analysis and resource usage strategies
- INSTRUMENTATION of Grid Prototypes
- Characterize the performance of the Grid components under load
- Validate the simulation
- Monitor, track and report system state, trends and events
- OPTIMIZATION of the Data Grid
- Genetic algorithms, or other evolutionary methods
- Deliver an optimization package for HENP distributed systems
- Applications to other experiments, to accelerator and other control systems, and to other fields
- COORDINATE with Experiment-Specific Projects: CMS, ATLAS, BaBar, Run2
26 Grid (IT) Issues to be Addressed
- Dataset compaction; data caching and mirroring strategies
- Using large time-quanta or very high bandwidth bursts for large data transactions
- Query estimators, query monitors (cf. GCA work)
- Enable flexible, resilient prioritisation schemes (marginal utility; see the sketch below)
- Query redirection, fragmentation, priority alteration, etc.
- Pre-emptive and real-time data/resource matchmaking
- Resource discovery
- Data and CPU Location Brokers
- Co-scheduling and queueing processes
- State, workflow and performance-monitoring instrumentation; tracking and forward prediction
- Security: authentication (for resource allocation/usage and priority); running a certificate authority
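- One possible reading of marginal-utility prioritisation, as an illustrative sketch (the utility form and the request weights are invented, not taken from the slide):

    # Each pending request has a concave utility in the resource share it receives; the next
    # unit of resource goes to whichever request gains the most from it.
    import math

    requests = {                      # hypothetical requests: utility weight per community
        "local_analysis": 3.0,
        "regional_reprocessing": 2.0,
        "remote_skim": 1.0,
    }
    share = {name: 1.0 for name in requests}          # start each with one resource unit

    def marginal_gain(weight, current):
        """Gain in weight*log(share) utility from one more unit of resource."""
        return weight * (math.log(current + 1) - math.log(current))

    for _ in range(10):                               # hand out 10 further resource units
        best = max(requests, key=lambda r: marginal_gain(requests[r], share[r]))
        share[best] += 1

    print(share)   # higher-weight communities end up with proportionally larger shares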
27 CMS Example: Data Grid Program of Work (I)
- FY 2000
- Build basic services; 1-million-event samples on proto-Tier2s
- For HLT milestones and detector/physics studies with ORCA
- MONARC Phase 3 simulations for study/optimization
- FY 2001
- Set up the initial Grid system, based on PPDG deliverables, at the first Tier2 centers and Tier1-prototype centers
- High-speed site-to-site file replication service
- Multi-site cached file access
- CMS Data Challenges in support of the DAQ TDR
- Shakedown of preliminary PPDG (plus MONARC and GIOD) system strategies and tools
- FY 2002
- Deploy the Grid system at the second set of Tier2 centers
- CMS Data Challenges for the Software and Computing TDR and the Physics TDR
28 Data Analysis Grid Program of Work (II)
- FY 2003
- Deploy Tier2 centers at the last set of sites
- 5%-Scale Data Challenge in support of the Physics TDR
- Production-prototype test of the Grid Hierarchy System, with the first elements of the production Tier1 Center
- FY 2004
- 20% Production (Online and Offline) CMS Mock Data Challenge, with all Tier2 Centers and the partly completed Tier1 Center
- Build the production-quality Grid System
- FY 2005 (Q1 - Q2)
- Final Production CMS (Online and Offline) Shakedown
- Full distributed-system software and instrumentation
- Using the full capabilities of the Tier2 and Tier1 Centers
29 Summary
- The HENP/LHC data handling problem
- Multi-Petabyte scale, binary pre-filtered data, resources distributed worldwide
- Has no analog now, but will be increasingly prevalent in research, and in industry, by 2005
- Development of a robust PB-scale networked data access and analysis system is mission-critical
- An effective partnership exists, HENP-wide, through many R&D projects: RD45, GIOD, MONARC, Clipper, GLOBUS, CONDOR, ALDAP, PPDG, ...
- An aggressive R&D program is required to develop
- Resilient, self-aware systems, for data access, processing and analysis across a hierarchy of networks
- Solutions that could be widely applicable to data problems in other scientific fields and in industry, by LHC startup
- Focus on Data Grids for Next Generation Physics
30 LHC Data Models: 1994-2000
- HEP data models are complex!
- Rich hierarchy of hundreds of complex data types (classes)
- Many relations between them
- Different access patterns (Multiple Viewpoints)
- OO technology
- OO applications deal with networks of objects (and containers)
- Pointers (or references) are used to describe relations (see the toy illustration below)
- Existing solutions do not scale
- Solution suggested by RD45: an ODBMS coupled to a Mass Storage System
- Construction of Compact Datasets for Analysis: Rapid Access/Navigation/Transport
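- A toy illustration of such an object network (the classes and attributes are invented for illustration, not the actual RD45 or experiment schema):

    # Objects form a network, and analysis code navigates it by following references rather
    # than decoding flat records; different "viewpoints" traverse different relations.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Track:
        pt: float                      # transverse momentum, GeV
        charge: int

    @dataclass
    class Vertex:
        z: float                       # position along the beam line, cm
        tracks: List[Track] = field(default_factory=list)   # references to associated tracks

    @dataclass
    class Event:
        run: int
        number: int
        vertices: List[Vertex] = field(default_factory=list)

    evt = Event(run=1, number=42,
                vertices=[Vertex(z=0.3, tracks=[Track(12.5, +1), Track(8.1, -1)])])

    # Navigation by reference: select high-pT tracks from all vertices of the event.
    hard_tracks = [t for v in evt.vertices for t in v.tracks if t.pt > 10]
    print(len(hard_tracks))            # 1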
31 Content Delivery Networks (CDN)
- Web-Based Server-Farm Networks Circa 2000: Dynamic (Grid-Like) Content Delivery Engines
- Akamai, Adero, Sandpiper
- 1200 -> Thousands of Network-Resident Servers
- 25 -> 60 ISP Networks
- 25 -> 30 Countries
- 40 Corporate Customers
- $25 B Capitalization
- Resource Discovery
- Build a Weathermap of the Server Network (State Tracking)
- Query Estimation; Matchmaking/Optimization; Request Rerouting
- Virtual IP Addressing: One Address per Server Farm
- Mirroring, Caching
- (1200) Autonomous-Agent Implementation
32 Strawman Tier 2 Evolution
-                                 2000                     2005
- Linux Farm                      1,200 SI95               20,000 SI95
- Disks on CPUs                   4 TB                     50 TB
- RAID Array                      1 TB                     30 TB
- Tape Library                    1-2 TB                   50-100 TB
- LAN Speed                       0.1 - 1 Gbps             10 - 100 Gbps
- WAN Speed                       155 - 622 Mbps           2.5 - 10 Gbps
- Collaborative Infrastructure    MPEG2 VGA (1.5-3 Mbps)   Realtime HDTV (10-20 Mbps)
- Reflects lower Tier 2 component costs due to less demanding usage. Some of the CPU will be used for simulation.
33 USCMS S&C Spending Profile
- 2006 is a model year for the operations phase of CMS
34 GriPhyN Cost
- System support       $8.0 M
- R&D                  $15.0 M
- Software             $2.0 M
- Tier 2 networking    $10.0 M
- Tier 2 hardware      $50.0 M
- Total                $85.0 M
35 Grid Hierarchy Concept: Broader Advantages
- Partitioning of users into proximate communities, for support, troubleshooting and mentoring
- Partitioning of facility tasks, to manage and focus resources
- Greater flexibility to pursue different physics interests, priorities, and resource allocation strategies by region
- Lower tiers of the hierarchy -> more local control
36 Storage Request Brokers (SRB)
- Name Transparency: access to data by attributes stored in an RDBMS (MCAT)
- Location Transparency: logical collections (by attributes) spanning multiple physical resources
- Combined location and name transparency means that datasets can be replicated across multiple caches and data archives (PPDG)
- Data Management Protocol Transparency: SRB with custom-built drivers in front of each storage system
- The user does not need to know how the data is accessed; SRB deals with the local file system managers (a toy illustration of logical-to-physical resolution follows below)
- SRBs (agents) authenticate themselves and users, using the Grid Security Infrastructure (GSI)
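- A toy illustration of the transparency idea (this is not the SRB or MCAT API; the catalog contents, driver functions and preference rule are invented):

    # A catalog maps a logical dataset name to physical replicas; per-storage-system "drivers"
    # hide how each copy is actually read, so the caller never sees the protocol or the location.
    CATALOG = {   # logical name -> replicas (storage system, physical path); contents are invented
        "run1/aod/higgs-candidates": [
            ("hpss", "/hpss/cms/run1/aod/higgs.db"),
            ("unix", "/data/cache/run1/higgs.db"),
        ],
    }

    DRIVERS = {   # one access routine per storage-system type
        "unix": lambda path: f"open local file {path}",
        "hpss": lambda path: f"stage {path} from tape, then open",
    }

    PREFERENCE = ["unix", "hpss"]     # prefer disk-resident replicas over tape

    def broker_open(logical_name):
        """Resolve a logical name through the catalog and open the preferred replica."""
        replicas = CATALOG[logical_name]
        system, path = min(replicas, key=lambda r: PREFERENCE.index(r[0]))
        return DRIVERS[system](path)

    print(broker_open("run1/aod/higgs-candidates"))   # uses the disk cache copy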
37 Role of Simulation for Distributed Systems
- Simulations are widely recognized and used as essential tools for the design, performance evaluation and optimisation of complex distributed systems
- From battlefields to agriculture; from the factory floor to telecommunications systems
- Discrete event simulations, with an appropriate and high level of abstraction (a minimal example follows below)
- Just beginning to be part of the HEP culture
- Some experience in trigger, DAQ and tightly coupled computing systems: CERN CS2 models (event-oriented)
- MONARC (process-oriented; Java 2 threads and class library)
- These simulations are very different from HEP Monte Carlos
- Time intervals and interrupts are the essentials
- Simulation is a vital part of the study of site architectures, network behavior, and data access/processing/delivery strategies, for HENP Grid Design and Optimization
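- A minimal discrete-event example, far simpler than the MONARC process-oriented models (the farm parameters are arbitrary): simulated time jumps from event to event, driven by a priority queue of timestamped actions:

    # Minimal discrete-event simulation of a processing farm.
    import heapq

    def simulate(n_jobs=20, n_cpus=4, arrival_gap=1.0, service_time=3.0):
        events = [(i * arrival_gap, "arrive", i) for i in range(n_jobs)]   # job arrivals
        heapq.heapify(events)
        free_cpus, queue, finish_times = n_cpus, [], {}
        while events:
            now, kind, job = heapq.heappop(events)
            if kind == "arrive":
                queue.append((now, job))
            else:                                   # "finish": a CPU becomes free
                free_cpus += 1
                finish_times[job] = now
            while free_cpus and queue:              # start queued jobs on free CPUs
                arrival, j = queue.pop(0)
                free_cpus -= 1
                heapq.heappush(events, (now + service_time, "finish", j))
        avg_turnaround = sum(finish_times[j] - j * arrival_gap for j in finish_times) / n_jobs
        return avg_turnaround

    print(f"average turnaround: {simulate():.1f} time units")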
38 Monitoring Architecture: Use of NetLogger in CLIPPER
- End-to-end monitoring of Grid assets is necessary to
- Resolve network throughput problems
- Dynamically schedule resources
- Add precision-timed event monitor agents to
- ATM switches
- Storage servers
- Testbed computational resources
- Produce trend-analysis modules for the monitor agents
- Make results available to applications (a minimal sketch of timestamped event logging follows below)
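- A minimal sketch of precision-timestamped event monitoring in the spirit of NetLogger (this is not the NetLogger library or its log format; the stage names and timings are invented):

    # Each stage of a transfer emits a timestamped record; differencing the timestamps
    # localizes where the time is going and feeds trend analysis.
    import time

    log = []

    def emit(event, **fields):
        """Record a monitoring event with a high-resolution timestamp."""
        log.append({"ts": time.time(), "event": event, **fields})

    def transfer(block_id, nbytes):
        emit("request.start", block=block_id)
        time.sleep(0.02)                       # pretend disk read
        emit("disk.read.done", block=block_id, bytes=nbytes)
        time.sleep(0.05)                       # pretend network send
        emit("net.send.done", block=block_id, bytes=nbytes)

    transfer("blk-0001", 8 * 1024 * 1024)

    # Trend analysis: time spent per stage, and effective network throughput for the block.
    t0, t1, t2 = (rec["ts"] for rec in log)
    print(f"disk stage: {t1 - t0:.3f} s, network stage: {t2 - t1:.3f} s")
    print(f"network throughput: {8 / (t2 - t1):.1f} MB/s (8 MB block)")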