1. The Emerging Grid Infrastructure: Will It Change the Way We Do Physics?
- Rick Cavanaugh
- GriPhyN/CMS, University of Florida, U.S.A.
- 1. Introduction
- 2. Existing grid technology
- 3. Looking into the future
- Université catholique de Louvain-la-Neuve
- 9 December 2002
2. CMS Data Grid Hierarchy
[Diagram taken from Harvey Newman: the tiered CMS computing model. 1 TIPS = 25,000 SpecInt95; a PC today delivers 10-20 SpecInt95.]
- Online System: one bunch crossing every 25 ns; 100 triggers per second; each event is ~1 MByte in size (PBytes/sec off the detector)
- Offline Farm, 20 TIPS (100 MBytes/sec from the Online System)
- Tier 0: CERN Computer Center (100 MBytes/sec from the Offline Farm)
- Tier 1 (622 Mbits/sec links from CERN): Fermilab (4 TIPS) and the France, Germany, and Italy Regional Centers
- Tier 2 (2.4 Gbits/sec links from Tier 1): Tier 2 centers of ~1 TIPS each
- Tier 3 (622 Mbits/sec links from Tier 2): institute servers of ~0.25 TIPS, each with a physics data cache. Physicists work on analysis channels; each institute has ~10 physicists working on one or more channels, and data for these channels is cached by the institute server
- Tier 4 (1-10 Gbits/sec): physicists' workstations
3. CMS Data Grid Hierarchy (same diagram)
- Florida and Louvain are working here!
4. CMS Data Grid Hierarchy (same diagram)
- The Grid begins here!
5. CMS Data Grid Hierarchy (same diagram)
- In this talk, I will focus on U.S. activities...
6. Grid Middleware
[Layer diagram: Application on top of Middleware on top of Fabric.]
- Higher-level grid middleware (mainly from the E.U.)
  - Resource Broker: load balancing
  - Replica Manager: book-keeping
  - GDMP: data movement
  - Chimera: workflow management
- Lower-level grid middleware (from the U.S.)
  - Globus: grid services/security
  - Condor-G: grid queue system
  - Condor: local queue system
  - GridFTP: data transfer
- Provides the "glue" for the grid
- Still rather rudimentary
7. Packaging the Lower-level Middleware: The Virtual Data Toolkit
- Different flavors, tested/packaged/configured as a whole
  - VDT Server (Globus, Condor, GDMP/GridFTP)
  - VDT Client (Globus, DAGMan/Condor-G)
  - VDT Developer (Globus, ClassAds)
- Continuously evolving
  - Now includes a Virtual Data Catalog (Chimera)
[Diagram: a VDT Client (Virtual Data Catalogue, Abstract Planner, Concrete Planner, DAGMan/Condor-G, Globus) dispatches work to VDT Servers 1..N (Globus, Condor, GridFTP), with a Replica Catalogue alongside.]
8. Globus Toolkit
- Grid infrastructure software
- Tools that simplify working across multiple institutions:
  - Authentication: Grid Security Infrastructure (GSI)
  - Job management:
    - Grid Resource Allocation Manager (GRAM)
    - Global Access to Secondary Storage (GASS)
  - File transfer: GridFTP
  - Resource description / information service:
    - Grid Resource Information Service (GRIS)
    - Grid Information Index Service (GIIS)
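As a flavor of how these pieces are driven in practice, here is a minimal command-line sketch using the standard Globus Toolkit 2 clients (the host name and job-manager choice are hypothetical):

    # Create a short-lived GSI proxy credential from your grid certificate
    $ grid-proxy-init

    # Run a job at a remote gatekeeper via GRAM; "jobmanager-condor"
    # hands the job to the site's local Condor pool
    $ globus-job-run tier2.example.edu/jobmanager-condor /bin/hostname

    # Move a file with GridFTP
    $ globus-url-copy gsiftp://tier2.example.edu/data/run42.dat file:///tmp/run42.dat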
9. Grid Resource Allocation Manager (GRAM) Components
[Diagram: a Client sends a Request across the remote site boundary through the Grid Security Infrastructure to the Gatekeeper; the Gatekeeper creates a Job Manager, which asks the Local Resource Manager to allocate and create processes, then monitors and controls them.]
10. Condor: High Throughput Computing
- Specialised batch system for managing compute-intensive jobs
  - optimised for running over days or months (High Throughput) rather than seconds (High Performance)
  - mature software: very robust and fault tolerant
[Plot taken from Miron Livny: throughput of pre/simulation/post jobs (UW Condor) and ooHits/ooDigis production at NCSA, including a delay due to a script error.]
- Provides
  - a queuing mechanism
  - a scheduling policy
  - a priority scheme and resource classification
- Users submit jobs to Condor
  - the job is put into the queue
  - the job is run
  - the user is informed of the result
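To make this flow concrete, here is a minimal Condor submit description file (the executable and file names are hypothetical; the keywords are standard Condor submit-file syntax):

    # analyze.sub -- submit one job to the local Condor pool
    universe   = vanilla
    executable = analyze
    arguments  = input.dat
    output     = job.out
    error      = job.err
    log        = job.log
    queue

Running condor_submit analyze.sub places the job in the queue; Condor runs it when a matching machine is found and records the outcome in the log file.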
11. Condor Matchmaking with ClassAds
- Analogous to a market
  - machines ("sellers") advertise attributes
    - OS, RAM, CPU, current load, etc.
    - policies
  - users ("buyers") specify job requirements
    - OS, RAM, CPU, etc.
- Condor matches the "buyers" with the "sellers"
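In a submit file, the "buyer" side of the match is expressed with ClassAd expressions; a small sketch (the attribute values are illustrative):

    # The job runs only on machines whose ClassAd satisfies Requirements;
    # among matching machines, Condor prefers the highest Rank
    requirements = (OpSys == "LINUX") && (Memory >= 512)
    rank         = Mips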
12. What Is Condor-G?
- An enhanced version of Condor that uses Globus to manage Grid jobs
- Condor
  - designed to run jobs within a single administrative domain
- Globus
  - designed to run jobs across many administrative domains
- Condor-G
  - combines the strengths of both
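A Condor-G submit file looks like an ordinary Condor one, except the job is routed to a remote Globus gatekeeper; a sketch using the Condor-G keywords of this era (the gatekeeper host is hypothetical):

    # Route the job through Globus GRAM to a remote site's LSF batch system
    universe        = globus
    globusscheduler = grid.example.edu/jobmanager-lsf
    executable      = analyze
    output          = job.out
    log             = job.log
    queue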
13-17. How It Works
[Diagram sequence taken from Miron Livny: a laptop running Condor-G submits 600 Grid jobs to its local Schedd; the Schedd starts a GridManager, which contacts the Grid site running Globus; the site creates a JobManager, which passes the jobs to the local LSF batch system, where the user jobs finally run.]
18. DAGMan
- Directed Acyclic Graph Manager
- DAGMan allows you to specify the dependencies between your Condor jobs, so it can manage them automatically for you
  - e.g., "Don't run job B until job A has completed successfully."
- Designed to be fault tolerant
19. What is a DAG?
- A DAG is the data structure used by DAGMan to represent these dependencies
- Each job is a node in the DAG
- Each node can have any number of parent or child nodes, as long as there are no loops!
- We usually talk in units of "DAGs"
[Picture taken from Peter Couvares.]
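A DAG is described in a plain-text file handed to condor_submit_dag; a minimal sketch (the node names and submit files are hypothetical; JOB and PARENT/CHILD are the standard DAGMan keywords):

    # diamond.dag -- B and C run after A completes; D runs after both
    JOB A a.sub
    JOB B b.sub
    JOB C c.sub
    JOB D d.sub
    PARENT A CHILD B C
    PARENT B C CHILD D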
20. US-CMS Development and Integration Grid Testbeds
- The Development Grid Testbed (DGT) has been functional for 1 year
- VO managed by GroupMAN
- Grid credentials based on the U.S. Department of Energy Science Grid Certificate Authority
- Grid software: VDT 1.1.3
  - Globus 2.0
  - Condor-G 6.4.3
  - Condor 6.4.3
  - ClassAds 0.9
  - GDMP 3.0.7
  - Chimera beta
  - Objectivity 6.1
- CMS grid-related software
  - MOP: distributed CMS MOnte carlo Production
  - CLARENS: distributed CMS physics analysis
  - DAR: Distribution After Release, for CMS applications (RH 6.2)
21. Using MCRunJob and MOP in CMS Production on the IGT
[Diagram: MCRunJob (Config, Master, ScriptGen, Linker, with requirements and self-descriptions) feeds mop-submitter on the VDT Client, which drives DAGMan/Condor-G; jobs are dispatched through Globus to VDT Servers 1..N (Condor), with GridFTP on both client and servers for data movement.]
22. Fresh IGT Results
- Assigned 1.5 million Monte Carlo events from CMS
  - requested real assignments to ensure production quality of the grid software
- Started in early November
- Produced 750,000 events so far
- Discovered/corrected many fundamental core-grid software bugs (Condor and Globus)
  - a huge success from this point of view alone
- Anticipate finishing the full assignment by Christmas
23. Monitoring and Information Services
- MonALISA (Caltech)
  - currently deployed on the testbed
  - dynamic information/resource discovery mechanism using agents
  - implemented in
    - Java / Jini with interfaces to SNMP
    - WSDL / SOAP with UDDI
  - aim is to incorporate it into a Grid Control Room service for the testbed
[Pictures taken from Iosif Legrand.]
24. General Observations
- Easy to say "We have a Grid!"
  - ...more difficult to make it do real work
- A grid is like an "online system" for a physics detector
- A grid is complex, with many modes of failure
  - often difficult to track down simple problems
    - host certificates expired
    - gatekeepers not synchronised
  - sometimes difficult to fix the problem
    - bugs still exist in the core-grid software itself
- Need for multiple monitoring systems (performance, match-making, debugging, heart-beat, etc.)
- Need for more transparent data access (Virtual Data)
- Nevertheless, these are typical "growing pains!"
25. Lessons Learned (so far)
- Test grid commissioning revealed a need for
  - grid-wide debugging
    - the ability to log into a remote site and talk to the system manager over the phone proved vital...
    - but remote logins and telephone calls are not a scalable solution!
  - site configuration monitoring
    - how are Globus, Condor, etc. configured?
    - on what port is a particular grid service listening?
    - should this be monitored by standard monitoring tools?
  - programmers to write very robust code!
26. The Virtual Data Concept
- Track the steps used to create each data product
  - a data product is a file
  - (or a cluster of objects in an OODBMS)
  - (or a set of rows in an RDBMS)
- Track the input products upon which each product depends
- Be able to re-create the same data product at a later time and in a different place (...similar to the lazy OO paradigm...)
27. The Virtual Data Concept
- Data dependencies are tracked via
  - explicit declaration
  - extraction from a job control language
  - generation by higher-level job creation interfaces
  - creation by monitoring and logging job execution facilities
- Dependencies can be tracked before the job is executed
  - a link to the information needed to generate a file
- ...or after the job is executed
  - a record of how to re-generate the file
28. Motivations
- Data trackability and result auditability
  - universally sought by scientists
  - facilitates tool and data sharing and collaboration
  - data can be sent along with its recipe
- Workflow management
  - a new, structured paradigm for organizing, locating, specifying, and requesting data products
- Performance optimizations
  - the ability to re-create data rather than move it
29. Introducing Chimera: The GriPhyN Virtual Data Catalog
- Virtual Data Language
  - textual
  - XML
- Virtual Data Interpreter
  - implemented in Java
- Virtual Data Catalog
  - early implementation uses a PostgreSQL DB
  - a MySQL version will appear in the future
  - to be released independent of any particular DB
30. Virtual Data: Basic Notions
- Transformation
  - abstract description of how a program is invoked
  - similar to a "function declaration" in C/C++
- Derivation
  - an invocation of a transformation
  - similar to a "function call" in C/C++
  - can refer to either the past or the future:
    - a record of how logical files were already produced
    - a recipe for creating logical files sometime in the future
31Virtual Data Language
file1
TR pythia( out a2, in a1 ) Â argument stdin
a1Â argument file a2 TR cmsim(
out a2, in a1 ) argument file
a1 argument file a2 DV
x1-gtpythia( a2_at_outfile2, a1_at_infile1) DV
x2-gtcmsim( a2_at_outfile3, a1_at_infile2)
x1
file2
x2
file3
Picture Taken from Mike Wilde
32. Abstract and Concrete DAGs
- Abstract DAGs (Virtual Data DAGs)
  - resource locations unspecified
  - file names are logical
  - data destinations unspecified
- Concrete DAGs (stuff for DAGMan)
  - resource locations determined
  - physical file names specified
  - data delivered to and returned from physical locations
[Diagram: VDL (XML) populates the Virtual Data Catalog (VDC); the Abstract Planner produces a logical DAX; the Concrete Planner consults the Replica Catalog (RC) and emits a physical DAG for DAGMan.]
33-38. A virtual space of simulated data is created for future use by scientists...
[Diagram sequence: a tree of virtual data products grows from the root derivation "mass 200", branching by parameter: decay mode (bb, ZZ, WW), stability (1, 3), event number (e.g. "mass 200 decay WW stability 1 event 8"), and plot number (e.g. "mass 200 decay WW stability 1 plot 1").]
39-40. Search for WW decays of the Higgs boson where only stable, final-state particles are recorded
[The query selects the "mass 200 decay WW stability 1" branch of the tree.]
41. The scientist discovers an interesting result and wants to know how it was fully derived
42-43. Now the scientist wants to dig deeper... the scientist adds a new derived data branch ("mass 200 decay WW stability 1 LowPt 20 HighPt 10000") ...and continues to investigate!
44-48. [Diagram sequence: the CMS data flow, Generator -> Formator -> Simulator -> Digitiser -> ODBMS, with a Calibration DB feeding in, followed by writeESD / writeAOD / writeTAG steps and Analysis Scripts. Successive slides overlay the teams responsible for each stage: Online Teams, the MC Production Team, the (Re)processing Team, and the Physics Groups.]
49. A Collaborative Data-flow Development Environment: Complex Data Flow and Data Provenance in HEP
[Diagram: real and simulated data flow from Raw through ESD, AOD, and TAG to plots, tables, and fits, which are then compared.]
- Family history of a data analysis
- "Check-point" a data analysis
- An analysis development environment (like CVS)
- Audit a data analysis
50. The Value of Data Provenance
- Allows different teams to collaborate semi-autonomously across many time zones
- Allows individuals to discover other scientists' work and build from it
- Allows physics groups to work in a modular fashion
  - reuse previous results
  - reuse previous code, or the entire analysis chain
- Allows for a systematic way to conduct studies of systematic uncertainties
- Allows a publication review board to audit the integrity of a suspicious analysis result
51. The Value of Virtual Data
- Provides full reproducibility (fault tolerance) of one's results
  - tracks ALL dependencies between transformations and their derived data products
  - something like a "virtual logbook"
- Provides transparency with respect to location and existence; the user need not know
  - the data location
  - how many data files are in a data set
  - whether the requested derived data exists
- Allows for optimal performance in planning: should the derived data be
  - staged in from a remote site?
  - re-created locally on demand?
52. A Possible Look into the Future...
[Picture taken from Mike Wilde.]
53. Summary: Grid Production of CMS Simulated Data
- CMS production of simulated data:
  - O(10) sites
  - O(10^3) CPUs
  - 50 TB of data
  - 10 production managers
- The goal is to double every year, without increasing the number of production managers!
- The US-CMS Integration Grid Testbed has now demonstrated the same production efficiency as normal, non-grid production in the US, but with 1/5 the manpower during normal running
- The EDG is having good success with a similar stress test
- More automation and fault tolerance will be needed for the 5% data challenge DC04, however!
54. Summary: The Grid Is Beginning to Come of Age
- Many technologies are now emerging as mature products
  - Virtual Data Toolkit
  - Globus Toolkit
  - DAGMan / Condor-G
  - EDG middleware
    - Resource Broker (not discussed here)
- Other, higher-level services are beginning to come online
  - Chimera Virtual Data System
  - grid-enabled data analyses
55. Will the Grid Change the Way We Do Physics?
56-58. Will the Grid Change the Way We Do Physics?
- NO! The grid should preserve and enable the way we already do physics...
- BUT the Grid will change the way we collaborate...
  - travel will likely be reduced
  - meeting topics may be raised to the higher levels of physics and sources of systematic uncertainty, rather than the lower levels of verifying mundane computer details
  - more confidence and trust may be placed in our own and other people's scientific work
  - improved collaboration will be a significant intellectual contribution of the Grid!
- (Just my humble opinion!)