Title: The CMS Integration Grid Testbed
1. The CMS Integration Grid Testbed
- Greg Graham
- Fermilab CD/CMS
- 17-January-2003
2. The Integration Grid Testbed
- Controlled environment on which to test the DPE in preparation for release
- Currently, the IGT uses USCMS Tier-1/Tier-2 designated resources
- Soon (end of this month), most of the IGT resources will be turned over to a production grid
- The IGT will retain a small number of resources deemed necessary to do integration testing
- VO management should be flexible enough to allow the production grid (PG) to loan resources to the IGT when needed for scalability tests
- In the meantime, the IGT has been commissioned with real production assignments
- Testing Grid operations, troubleshooting procedures, and scalability issues
3. The Current IGT - Hardware
DGT/IGT sites and hardware:
- CERN LCG: participates with 72 2.4 GHz CPUs at RH7
- Fermilab: 40 dual 750 MHz nodes, 2 servers, RH6
- Florida: 40 dual 1 GHz nodes, 1 server, RH6
- UCSD: 20 dual 800 MHz nodes, 1 server, RH6; new: 20 dual 2.4 GHz nodes, 1 server, RH7
- Caltech: 20 dual 800 MHz nodes, 1 server, RH6; new: 20 dual 2.4 GHz nodes, 1 server, RH7
- UW Madison: not a prototype Tier-2 center; provides support
- Total: 240 RH6 CPUs (0.8 GHz equivalent) and 152 2.4 GHz RH7 CPUs
4. Grid Middleware in the DPE
- Based on the Virtual Data Toolkit 1.1.3
- VDT Client
- Globus Toolkit 2.0
- Condor-G 6.4.3
- VDT Server
- Globus Toolkit 2.0
- mkgridmap
- Condor 6.4.3
- ftsh
- GDMP 3.0.7
- Virtual Organisation Management
- LDAP Server deployed at Fermilab
- Contains the DNs for all US-CMS Grid Users
- GroupMAN (from PPDG, adapted from EDG) is used to manage the VO (a minimal grid-mapfile sketch follows this list)
- Investigating/evaluating the use of VOMS from the EDG
- Use D.O.E. Science Grid certificates
- Accept EDG and Globus certificates
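The end result of the VO machinery is local authorization: the DNs published by the VO server end up in each site's grid-mapfile. The following Python fragment is only a minimal sketch of that last step, not the actual GroupMAN or mkgridmap code; the input file uscms_vo_dns.txt and the pool account uscms01 are illustrative assumptions.

# Minimal sketch: turn a list of member DNs exported from the VO server
# into grid-mapfile entries mapping each DN to a local pool account.
def write_gridmap(dn_file, gridmap_path, local_account="uscms01"):
    with open(dn_file) as f:
        dns = [line.strip() for line in f if line.strip()]
    with open(gridmap_path, "w") as out:
        for dn in dns:
            # grid-mapfile format: "<quoted certificate DN>" <local user>
            out.write('"%s" %s\n' % (dn, local_account))

if __name__ == "__main__":
    write_gridmap("uscms_vo_dns.txt", "grid-mapfile")

In the real deployment the DN list comes from the LDAP server at Fermilab rather than a flat file, and the mapping to local accounts can be richer than this single pool account.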
5. DPE 1.0 Architecture - Layer View
- Value added at each layer
- Job Creation: chains many applications together in a tree-like structure; MCRunJob keeps track of functional dependencies among processing nodes (a small dependency-tree sketch follows this list)
- DAG creation: wraps applications in generic DAGs for co-scheduling; MOP contains the structure of the generic DAGs
- DAGMan/Condor: scheduling layer; on the IGT, scheduling is still a human decision (the DZero version has scheduling)
- Globus and Job Manager: Grid interface with VO services
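To make the "tree-like structure" of the Job Creation layer concrete, here is a toy Python sketch of a processing chain with functional dependencies; the step names are only an example, not the actual IGT configuration.

# Toy sketch: a multi-step processing chain as a dependency tree, ordered so
# that every step runs only after the steps it depends on.
deps = {
    "generation": [],
    "simulation": ["generation"],
    "digitization": ["simulation"],
    "reconstruction": ["digitization"],
}

def topo_order(deps):
    order, done = [], set()
    def visit(step):
        if step in done:
            return
        for parent in deps[step]:
            visit(parent)          # resolve upstream dependencies first
        done.add(step)
        order.append(step)
    for step in deps:
        visit(step)
    return order

print(topo_order(deps))
# ['generation', 'simulation', 'digitization', 'reconstruction']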
6. DPE 1.0 Architecture - Layer View
- Monitoring not shown
- Monitoring information is scattered and unwieldy
- Health monitoring is more or less under control
- Application monitoring can be provided by BOSS
- Configuration monitoring is a new concept
- Monitoring information is not used at any level yet
- Local: Ganglia, MDS, SNMP, Hawkeye (a gmond-polling sketch follows this list)
- Grid: MonALISA, MDS, Hawkeye
- Scheduling not shown
- A roadmap for demonstrating where scheduling decisions can be made
- resource broker vs. scheduler
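As an illustration of what "health monitoring is more or less under control" looks like in practice, the sketch below polls a Ganglia gmond daemon for its XML report and prints one load metric per host. The default TCP port 8649 and the metric name load_one are assumptions about a stock Ganglia setup; host names and ports would differ per site.

# Rough sketch: read the XML report published by a Ganglia gmond daemon and
# print the one-minute load for every host it knows about.
import socket
import xml.etree.ElementTree as ET

def gmond_loads(host="localhost", port=8649):
    with socket.create_connection((host, port)) as s:
        chunks = []
        while True:
            data = s.recv(65536)
            if not data:
                break
            chunks.append(data)
    root = ET.fromstring(b"".join(chunks))
    for node in root.iter("HOST"):
        for metric in node.iter("METRIC"):
            if metric.get("NAME") == "load_one":
                print(node.get("NAME"), metric.get("VAL"))

if __name__ == "__main__":
    gmond_loads()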
7. DPE Architecture - Component View
- MCRunJob uses Configurators to manage
- metadata associated with each production step in a complex tree of multi-step processing
- metadata associated with different runtime environments
- functional dependencies
- Also in use at DZero
- mop_submitter defines generic DAGs
- wraps jobs into DAGs (a DAG-wrapping sketch follows the diagram below)
- submits to DAGMan for execution
- Condor-G
- runs DAG nodes as Globus jobs
- (in lieu of a Condor backend)
- Results are returned to the submit site using the GridFTP protocol
- Could be returned anywhere in practice
- Monitoring (IGT) information is returned using
- Ganglia-to-MonALISA interfaces
- MDS-to-MonALISA interfaces
- Not used in any automatic system
[Diagram: the VDT Client (MCRunJob with Config, Linker, and ScriptGen modules; mop-submitter; DAGMan/Condor-G; GridFTP; Globus) submits requests from the master through Globus to VDT Servers 1..N, each running Condor, Globus, and GridFTP; results return via GridFTP.]
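The following Python sketch shows the flavor of the mop_submitter step: wrap one production job in a generic stage-in / run / stage-out DAG and emit the DAGMan and Condor-G submit files. It is not the real MOP code; the gatekeeper contact string, wrapper script names, and job name are placeholders.

# Sketch in the spirit of mop_submitter: write a generic three-node DAG
# (stage-in -> run -> stage-out) plus one Condor-G submit file per node.
GATEKEEPER = "gatekeeper.example.edu/jobmanager-condor"   # placeholder

SUBMIT_TEMPLATE = """universe        = globus
globusscheduler = {gatekeeper}
executable      = {executable}
output          = {name}.out
error           = {name}.err
log             = {name}.log
queue
"""

def write_generic_dag(job, dag_path="mop.dag"):
    stages = ["stagein", "run", "stageout"]
    with open(dag_path, "w") as dag:
        for stage in stages:
            name = "%s_%s" % (job, stage)
            with open(name + ".sub", "w") as sub:
                sub.write(SUBMIT_TEMPLATE.format(
                    gatekeeper=GATEKEEPER,
                    executable="%s.sh" % stage,   # placeholder wrapper scripts
                    name=name))
            dag.write("JOB %s %s.sub\n" % (name, name))
        # enforce stage-in -> run -> stage-out ordering
        dag.write("PARENT %s_stagein CHILD %s_run\n" % (job, job))
        dag.write("PARENT %s_run CHILD %s_stageout\n" % (job, job))

if __name__ == "__main__":
    write_generic_dag("job0001")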
8. MCRunJob
- MCRunJob was developed at DZero to assist in managing large Monte Carlo productions.
- It has also been used at the tail end of Spring02 Monte Carlo production in CMS, and was used in the IGT production to chain different processing steps together into a complex workflow description.
- It will be used in future CMS productions.
- MCRunJob has a modular architecture which is metadata oriented.
- Implemented in Python
- Receives input from many different sources
- Targets many different processing steps
- Targets many different runtime environments
9. MCRunJob Architecture in Brief
- Metadata containers are called Configurators
- Configurators can communicate with each other in structured ways
- namely, when a dependency relationship is declared
- Configurators can explicitly declare metadata elements to depend on other elements
- All external entities are represented in MCRunJob as Configurators (i.e., as metadata collections)
- The RefDB at CERN, SAM, an application in a processing chain
- Scripts are generated by registered objects with the ScriptGen interface (not shown)
- You guessed it: Configurators! (a toy Configurator sketch follows this list)
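The sketch below is a toy rendering of the Configurator idea, not the real MCRunJob classes: a Configurator is a named bag of metadata, dependencies are declared explicitly, and metadata lookups may follow declared dependencies. The step names and numbers are invented.

# Toy Configurator sketch: metadata container with declared dependencies.
class Configurator:
    def __init__(self, name, **metadata):
        self.name = name
        self.metadata = dict(metadata)
        self.depends_on = []

    def add_dependency(self, other):
        # Communication between Configurators is only along declared dependencies.
        self.depends_on.append(other)

    def lookup(self, key):
        # Resolve a metadata element locally or from declared dependencies.
        if key in self.metadata:
            return self.metadata[key]
        for parent in self.depends_on:
            try:
                return parent.lookup(key)
            except KeyError:
                pass
        raise KeyError(key)

# Example: a simulation step takes its run number from the generator step.
generator = Configurator("generator", run_number=2433, events=250)
simulation = Configurator("simulation", application="simhits")
simulation.add_dependency(generator)
print(simulation.lookup("run_number"))   # 2433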
10. MCRunJob Architecture in Brief
- Special Configurators called ScriptGenerators manage building of concrete executable scripts to implement the workplan (a sketch follows this list)
- Currently targets DZero production, Impala (CMS production), and the Virtual Data Language (Chimera)
- Incidentally, this means that MCRunJob can translate among these environments.
- Users and machines communicate with MCRunJob through a macro language
- In CMS, the macro language has been implemented as registered external functions on top of the Configurator API
- Thus the macro language itself is extensible to meet the needs of different experiments
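To show the shape of the ScriptGenerator and macro-language ideas, here is a small self-contained Python sketch. The command lines it emits are placeholders, not real Impala, DZero, or Chimera/VDL output, and the macro name and metadata values are invented for illustration.

# Toy ScriptGenerator: take per-step metadata and emit one concrete script.
def generate_script(steps, path="run_chain.sh"):
    # steps: list of dicts with 'application', 'run_number', 'events'
    with open(path, "w") as out:
        out.write("#!/bin/sh\n")
        for step in steps:
            out.write("%(application)s --run %(run_number)s --events %(events)s\n" % step)

# A macro language can be layered on top by registering functions by name;
# extensibility then just means registering new commands.
MACROS = {"generate": generate_script}

def run_macro(line, steps):
    parts = line.split()
    MACROS[parts[0]](steps, *parts[1:])

run_macro("generate run_chain.sh", [
    {"application": "generator", "run_number": 2433, "events": 250},
    {"application": "simhits",   "run_number": 2433, "events": 250},
])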
11. IGT Production Results
- The IGT progress was remarkably consistent.
- Compare to Spring 2002 official production
- However, we did not do production with pileup.
- Two flatliners:
- SC2002 conference
- the infamous // bug
- Holidays
- Eid, the end of Ramadan (Anzar Afaq was the man behind the curtain.)
- Word just in from Globusworld:
- The IGT has garnered some attention in plenary sessions there!
12. IGT Production Results
- Efficiency Estimates (a back-of-the-envelope check follows this list)
- Single events took about 430 sec on 750 MHz processors.
- Theoretical IGT maximum throughput was therefore 45K events per day, adjusting all processors to 750 MHz equivalents
- Efficiency was calculated as throughput normalized to the maximum.
- Manpower estimate: 1 FTE, peaking at 2.5 during troubleshooting
- This is above the normal admin support; we did not have a helpdesk, etc.
- Data gathered by informal survey.
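The arithmetic behind the ceiling can be reproduced with a few lines of Python. Only the 430 sec/event and 45K events/day figures come from this talk; the CPU count below is solved from those two numbers and is therefore an inference, and the 30K events/day in the example call is made up.

# Back-of-the-envelope reproduction of the efficiency estimate.
SECONDS_PER_EVENT = 430.0        # per event on a 750 MHz processor
SECONDS_PER_DAY = 86400.0
MAX_EVENTS_PER_DAY = 45000.0     # stated theoretical IGT maximum

# 750 MHz-equivalent processors implied by the stated ceiling:
implied_cpus = MAX_EVENTS_PER_DAY * SECONDS_PER_EVENT / SECONDS_PER_DAY
print("~%.0f equivalent CPUs" % implied_cpus)        # ~224

def efficiency(observed_events_per_day):
    # Throughput normalized to the theoretical maximum.
    return observed_events_per_day / MAX_EVENTS_PER_DAY

print("%.0f%%" % (100 * efficiency(30000)))          # example: 30K/day -> 67%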
13. A Sampler of IGT Lessons
- The infamous (and amusing!) // problem (a toy illustration follows this list)
- UNIX filenames allow // in pathnames; the CMS C++ applications interpret // as a comment. Instant mayhem.
- This bug was first identified, incorrectly, as a middleware problem.
- Symptom: the application dumps 230 MB of binary data to stdout, which caused the GASS cache to fail.
- A production expert identified the problem within 5 minutes.
- But only after days of middleware troubleshooting
- Lessons
- Problems come in all shapes and sizes!!!
- We have learned yet again that there's always something new that can go wrong.
- Need better application monitoring
- Can be provided by BOSS
- Need better error reporting and problem routing
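A toy version of the bug, far removed from the actual CMS application but showing the mechanism: a card-file reader that strips C++-style // comments silently truncates a perfectly legal UNIX path that happens to contain a double slash. The path and parameter name below are invented.

# Toy illustration of the // problem (not the actual CMS parser).
def strip_cpp_comment(line):
    # Treat everything after "//" as a comment, C++-style.
    return line.split("//", 1)[0].rstrip()

card_line = "OutputFile = /data/igt//run2433/hits.root"   # '//' is legal in a path
print(strip_cpp_comment(card_line))
# -> "OutputFile = /data/igt"  ... the rest of the path is silently dropped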
14. A Sampler of IGT Lessons
- Automatic scheduling should be done in an environment with reliable health monitoring information (a sketch of the failure mode follows this list)
- Poor man's script-based queue-length scheduling failed because of particular middleware failures
- Job managers often lose contact with jobs in failure mode
- Condor has no independent way of verifying job status and assumes the job is dead
- Queue-length-based scheduling leads to broken farms!
- New scalability problems always lurk behind the next modest increase in scale
- 200 or so jobs was OK; when the CERN IGT site joined, we discovered a new GAHP server scale limitation of 250 jobs per MOP master site.
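The failure mode is easy to see in a toy scheduler. In the sketch below (site names and numbers are made up), a farm whose job manager has lost track of its jobs reports an empty queue, so a naive queue-length rule keeps feeding the broken site; adding a health check avoids it.

# Why queue-length scheduling breaks without health monitoring.
sites = {
    "site_a": {"queued": 180, "healthy": True},
    "site_b": {"queued": 150, "healthy": True},
    "site_c": {"queued": 0,   "healthy": False},  # jobmanager lost its jobs
}

def pick_site_by_queue_length(sites):
    # Naive rule: send work wherever the queue looks shortest.
    return min(sites, key=lambda s: sites[s]["queued"])

def pick_site_with_health_check(sites):
    # Same rule, restricted to sites whose health monitoring looks sane.
    alive = {s: v for s, v in sites.items() if v["healthy"]}
    return min(alive, key=lambda s: alive[s]["queued"])

print(pick_site_by_queue_length(sites))    # site_c -> the broken farm gets the jobs
print(pick_site_with_health_check(sites))  # site_b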
15. Conclusions
- The Integration Grid Testbed has achieved a measure of success using the USCMS designated DPE.
- 1.5 M events were produced for CMS
- The Grid environment operated in continuous fashion for over 2 months
- Operational experiences of running the Grid were documented
- Functionality was limited
- But it matched the available manpower well
- Much can be addressed in MCRunJob also
- Success is recognized by demos at SC2002 and a poster at Globusworld.
16. Acknowledgements
- Many thanks to Peter Couvares (Condor), Anzar Afaq (PPDG), and Rick Cavanaugh (iVDGL/GriPhyN) for heroic efforts!
- As usual, many more that I am not mentioning here...
- For further reference:
- http://www.uscms.org/scpages/subsystems/DPE/index.html
- http://computing.fnal.gov/cms/Monitor/cms_production.html
- http://home.fnal.gov/ggraham/MCRunJob_Presentations/