The CMS Integration Grid Testbed

Transcript and Presenter's Notes

1
The CMS Integration Grid Testbed
  • Greg Graham
  • Fermilab CD/CMS
  • 17-January-2003

2
The Integration Grid Testbed
  • Controlled environment on which to test the DPE
    in preparation for release
  • Currently, IGT uses USCMS Tier-1/Tier-2
    designated resources.
  • Soon (end of this month), most of the IGT
    resources will be turned over to a production
    grid.
  • IGT will retain a small number of resources
    deemed necessary to do integration testing
  • VO management should be flexible enough to allow
    PG to loan resources to IGT when needed for
    scalability tests
  • In the meantime, IGT has been commissioned with
    real production assignments
  • Testing Grid operations, troubleshooting
    procedures, and scalability issues

3
The Current IGT - Hardware
  Site         DGT Sites (RH6)                      IGT Sites (RH7)
  CERN LCG     -                                    Participates with 72 2.4 GHz CPUs
  Fermilab     40 dual 750 MHz nodes, 2 servers     -
  Florida      40 dual 1 GHz nodes, 1 server        -
  UCSD         20 dual 800 MHz nodes, 1 server      New: 20 dual 2.4 GHz nodes, 1 server
  Caltech      20 dual 800 MHz nodes, 1 server      New: 20 dual 2.4 GHz nodes, 1 server
  UW Madison   Not a prototype Tier-2 center; provides support
  Total        240 x 0.8 GHz equiv. RH6 CPUs        152 x 2.4 GHz RH7 CPUs
4
Grid Middleware in the DPE
  • Based on the Virtual Data Toolkit 1.1.3
  • VDT Client:
  • Globus Toolkit 2.0
  • Condor-G 6.4.3
  • VDT Server:
  • Globus Toolkit 2.0
  • mkgridmap
  • Condor 6.4.3
  • ftsh
  • GDMP 3.0.7
  • Virtual Organisation Management:
  • LDAP Server deployed at Fermilab
  • Contains the DNs for all US-CMS Grid Users
  • GroupMAN (from PPDG and adapted from EDG) is used
    to manage the VO (a toy grid-mapfile sketch
    follows at the end of this slide)
  • Investigating/evaluating the use of VOMS from the
    EDG
  • Use D.O.E. Science Grid certificates
  • Accept EDG and Globus certificates
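
As a toy illustration of what the mkgridmap/GroupMAN step conceptually produces, the sketch below writes Globus grid-mapfile entries that map VO member DNs onto a local account. The DNs, the account name, and the helper function are invented for this sketch; they are not taken from the DPE tools.

    # Toy sketch only: Globus grid-mapfile entries of the form  "DN" account.
    # The DNs, the account name, and this helper are invented, not DPE code.
    vo_members = [
        "/DC=org/DC=doegrids/OU=People/CN=Example User One",
        "/DC=org/DC=doegrids/OU=People/CN=Example User Two",
    ]

    def grid_mapfile_lines(dns, local_account="uscms01"):
        """Map each member DN to one shared local account."""
        return ['"%s" %s' % (dn, local_account) for dn in dns]

    with open("grid-mapfile", "w") as f:
        f.write("\n".join(grid_mapfile_lines(vo_members)) + "\n")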

5
DPE 1.0 Architecture - Layer View
  • Value added at each layer
  • Job Creation: chains many applications together
    in a tree-like structure. MCRunJob keeps track
    of functional dependencies among the processing
    nodes.
  • DAG Creation: wraps applications in generic DAGs
    for co-scheduling. MOP contains the structure of
    the generic DAGs (sketched at the end of this
    slide).
  • DAGMan/Condor: the scheduling layer. On the IGT,
    scheduling is still a human decision; the Dzero
    version has automated scheduling.
  • Globus and Job Manager: Grid interface with VO
    services
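
As a minimal sketch of the "generic DAG" idea, the snippet below writes a three-node stage-in / run / stage-out graph in Condor DAGMan syntax. The node names and submit-file names are invented; the real MOP templates are not reproduced here.

    # Minimal sketch: a generic stage-in -> run -> stage-out DAG in Condor
    # DAGMan syntax. Node and submit-file names are invented, not MOP's.
    dag_lines = [
        "JOB  stagein   stagein.sub",
        "JOB  run       run.sub",
        "JOB  stageout  stageout.sub",
        "PARENT stagein CHILD run",
        "PARENT run     CHILD stageout",
    ]

    with open("generic.dag", "w") as f:
        f.write("\n".join(dag_lines) + "\n")
    # The resulting file would be handed to DAGMan, e.g. via condor_submit_dag.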

6
DPE 1.0 Architecture - Layer View
  • Monitoring not shown
  • Monitoring information is scattered and unwieldy
  • Health monitoring is more or less under control
  • Application monitoring can be provided by BOSS
  • Configuration monitoring is a new concept
  • Monitoring information is not used at any level
    yet.
  • Local: Ganglia, MDS, SNMP, Hawkeye (a polling
    sketch follows at the end of this slide)
  • Grid: MonaLisa, MDS, Hawkeye
  • Scheduling not shown
  • A roadmap for demonstrating where scheduling
    decisions can be made
  • resource broker vs. scheduler
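
As a small sketch of how the local Ganglia feed can be read, the snippet below pulls the XML health snapshot that a gmond daemon serves, assuming gmond's default TCP port 8649 and a placeholder host name; it is not part of the DPE monitoring code.

    # Sketch: read the XML health snapshot served by a Ganglia gmond daemon.
    # Assumes the default gmond TCP port (8649); the host name is a placeholder.
    import socket
    import xml.etree.ElementTree as ET

    def read_gmond_xml(host="gmond.example.org", port=8649):
        """Fetch the raw XML cluster snapshot from gmond."""
        chunks = []
        with socket.create_connection((host, port), timeout=10) as s:
            while True:
                data = s.recv(4096)
                if not data:
                    break
                chunks.append(data)
        return b"".join(chunks)

    if __name__ == "__main__":
        # Assuming the snapshot parses as plain XML, list the reporting hosts.
        root = ET.fromstring(read_gmond_xml())
        for host in root.iter("HOST"):
            print(host.get("NAME"))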

7
DPE Architecture- Component View
  • MCRunJob uses Configurators to manage
  • metadata associated with each production step in
    a complex tree of multi-step processing
  • metadata associated with different runtime
    environments
  • functional dependencies
  • Also in use at DZero
  • mop_submitter defines generic DAGs
  • wraps jobs into DAGs
  • submits to DAGMAN for execution
  • Condor-G
  • runs DAG nodes as Globus jobs
  • (in lieu of Condor backend)
  • Results are returned to the submit site using the
    GridFTP protocol (a submit/transfer sketch follows
    after the diagram below)
  • Could be returned anywhere in practice
  • Monitoring (IGT) information is returned using
  • Ganglia to MonaLisa interfaces
  • MDS to MonaLisa interfaces
  • Not used in any automatic system

[Component diagram: VDT Client (MCRunJob with Config, Linker, ScriptGen, Master, Self Desc. Req.; mop-submitter; DAGMan/Condor-G; GridFTP; Globus) submitting through Globus to VDT Server 1 ... VDT Server N (Condor, Globus, GridFTP).]
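
As a rough sketch of the last two bullets above (Condor-G running a DAG node as a Globus job, and the output coming home over GridFTP), the snippet below writes a Globus-universe submit description and then copies the result back with globus-url-copy. Host names, file names, and the executable are placeholders, not the actual IGT configuration.

    # Rough sketch: submit one node as a Globus-universe Condor-G job, then
    # pull its output back over GridFTP. All names and paths are placeholders.
    import subprocess

    submit_lines = [
        "universe        = globus",
        "globusscheduler = gatekeeper.example.org/jobmanager-condor",
        "executable      = run_step.sh",
        "output          = node.out",
        "error           = node.err",
        "log             = node.log",
        "queue",
    ]
    with open("node.sub", "w") as f:
        f.write("\n".join(submit_lines) + "\n")

    subprocess.run(["condor_submit", "node.sub"], check=True)

    # Once the job has finished, return its output to the submit site:
    subprocess.run(["globus-url-copy",
                    "gsiftp://gatekeeper.example.org/data/node.out",
                    "file:///home/produser/results/node.out"], check=True)
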
8
MCRunJob
  • MCRunJob was developed at Dzero to assist in
    managing large Monte Carlo productions.
  • It has also been used at the tail end of Spring02
    Monte Carlo production in CMS, and was used on
    the IGT production to chain different processing
    steps together into a complex workflow
    description.
  • It will be used in future CMS productions.
  • MCRunJob has a modular, metadata-oriented
    architecture (a toy registry sketch follows at the
    end of this slide).
  • Implemented in Python
  • Receives input from many different sources
  • Targets many different processing steps
  • Targets many different runtime environments
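
As a toy illustration (invented names, not the real MCRunJob code) of the "many input sources, many runtime targets" point, handlers can be pictured as callables registered in small lookup tables:

    # Toy sketch of the plug-in idea: readers and script writers register
    # themselves in lookup tables keyed by name. All names are invented.
    INPUT_SOURCES = {}     # e.g. "RefDB", "SAM"  -> reader callables
    RUNTIME_TARGETS = {}   # e.g. "Impala", "VDL" -> script-writer callables

    def register(table, name):
        """Decorator that records a handler callable under a given name."""
        def wrap(func):
            table[name] = func
            return func
        return wrap

    @register(INPUT_SOURCES, "flatfile")
    def read_flatfile(path):
        """Hypothetical reader: one 'key value' pair per line."""
        entries = {}
        with open(path) as f:
            for line in f:
                parts = line.split(None, 1)
                if len(parts) == 2:
                    entries[parts[0]] = parts[1].strip()
        return entries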

9
MCRunJob Architecture in Brief
  • Metadata containers are called Configurators
  • Configurators can communicate with each other in
    structured ways
  • namely, when a dependency relationship is
    declared
  • Configurators can explicitly declare metadata
    elements to depend on other elements
  • All external entities are represented in MCRunJob
    as Configurators (i.e., as metadata collections)
  • e.g., the RefDB at CERN, SAM, or an application in
    a processing chain
  • Scripts are generated by registered objects
    implementing the ScriptGen interface (not shown).
  • You guessed it: Configurators! (a toy dependency
    sketch follows at the end of this slide)
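
As a toy version of the dependency idea (class, method, and element names are invented, not the real Configurator API), a metadata element of one Configurator can be declared to depend on an element of another and is filled in when the dependency is resolved:

    # Toy sketch of Configurator-style metadata with declared dependencies.
    # Class, method, and element names are invented for illustration.
    class Configurator:
        def __init__(self, name):
            self.name = name
            self.metadata = {}
            self.deps = {}   # element -> (other configurator, other element)

        def declare(self, key, value=None):
            self.metadata[key] = value

        def depends_on(self, key, other, other_key):
            self.deps[key] = (other, other_key)

        def resolve(self):
            for key, (other, other_key) in self.deps.items():
                self.metadata[key] = other.metadata[other_key]

    # Example: a simulation step takes its input dataset from a generation step.
    gen = Configurator("generation")
    gen.declare("output_dataset", "example_dataset")
    sim = Configurator("simulation")
    sim.depends_on("input_dataset", gen, "output_dataset")
    sim.resolve()
    print(sim.metadata["input_dataset"])   # "example_dataset"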

10
MCRunJob Architecture in Brief
  • Special Configurators called ScriptGenerators
    manage building of concrete executable scripts to
    implement the workplan.
  • Currently targets Dzero production, Impala (CMS
    production), and the Virtual Data Language
    (Chimera)
  • Incidentally, this means that MCRunJob can
    translate among these environments (a toy
    generator sketch follows at the end of this slide).
  • Users and machines communicate with MCRunJob
    through a macro language
  • In CMS, the macro language has been implemented
    in registered external functions on top of the
    Configurator API
  • Thus the macro language itself is extensible to
    meet the needs of different experiments
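
As a toy version of the ScriptGenerator idea (all names and output formats below are invented placeholders, not the real Impala or Chimera formats), registered generator functions render the same abstract step for different targets:

    # Toy sketch: registered "script generators" render one abstract workflow
    # step for different targets. Names and output formats are placeholders.
    SCRIPT_GENERATORS = {}

    def script_generator(target):
        def wrap(func):
            SCRIPT_GENERATORS[target] = func
            return func
        return wrap

    @script_generator("shell")
    def gen_shell(step):
        return "#!/bin/sh\n%s %s\n" % (step["executable"], step["arguments"])

    @script_generator("pseudo-vdl")
    def gen_pseudo_vdl(step):
        # Placeholder text only; not actual Virtual Data Language syntax.
        return "TRANSFORMATION %s { exec = %s; args = %s; }\n" % (
            step["name"], step["executable"], step["arguments"])

    step = {"name": "sim_step", "executable": "cmsim", "arguments": "cards.txt"}
    for target, generate in sorted(SCRIPT_GENERATORS.items()):
        print("### target:", target)
        print(generate(step))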

11
IGT Production Results
  • The IGT progress was remarkably consistent.
  • Compare to Spring 2002 official production
  • However, the IGT did not do production with pileup.
  • Two flatliners:
  • SC2002 conference
  • infamous // bug
  • Holidays
  • Eid - the end of Ramadan (Anzar Afaq was the man
    behind the curtain.)
  • Word just in from Globusworld
  • The IGT has garnered some attention in plenary
    sessions there!

12
IGT Production Results
  • Efficiency Estimates
  • Single events took about 430 sec on 750 MHz
    processors.
  • Theoretical IGT max throughput was therefore 45K
    events per day
  • adjusting all processors to 750 MHz equivalents
  • Efficiency was calculated as throughput
    normalized to this maximum (worked example at the
    end of this slide).
  • Manpower estimate: 1 FTE, peaking at 2.5 FTE
    during troubleshooting
  • This is above the normal admin support; there was
    no helpdesk, etc.
  • Data gathered by informal survey.
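
The throughput and efficiency arithmetic above can be written out explicitly. The 430 sec/event figure and the normalization to 750 MHz come from the slide; the CPU count and the observed-throughput value below are assumed example numbers, not the exact IGT figures.

    # Worked example of the efficiency estimate. SEC_PER_EVENT and the 750 MHz
    # normalization follow the slide; the CPU count and observed throughput are
    # assumed example values, not the exact IGT numbers.
    SEC_PER_EVENT = 430.0      # CPU time per event on a 750 MHz processor
    SECONDS_PER_DAY = 86400.0

    def max_events_per_day(n_cpu_750):
        """Theoretical throughput for n 750 MHz-equivalent CPUs."""
        return n_cpu_750 * SECONDS_PER_DAY / SEC_PER_EVENT

    def efficiency(observed_events_per_day, n_cpu_750):
        """Observed throughput normalized to the theoretical maximum."""
        return observed_events_per_day / max_events_per_day(n_cpu_750)

    # About 225 750 MHz-equivalent CPUs give a ceiling near 45K events/day:
    print(round(max_events_per_day(225)))      # ~45209
    print(round(efficiency(30000, 225), 2))    # 0.66 for an assumed 30K events/day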

13
A Sampler of IGT Lessons
  • The infamous (and amusing!) // problem
  • UNIX allows // in pathnames. The CMS C++
    applications interpret // as the start of a
    comment. Instant mayhem. (Illustrated at the end
    of this slide.)
  • This bug was at first incorrectly identified as a
    middleware problem.
  • Symptom: the application dumped 230 MB of binary
    data to stdout, which caused the GASS cache to
    fail.
  • A production expert identified the problem within
    5 minutes.
  • But only after days of middleware troubleshooting
  • Lessons
  • Problems come in all shapes and sizes!!!
  • We have learned yet again that there's always
    something new that can go wrong.
  • Need better application monitoring
  • Can be provided by BOSS
  • Need better error reporting and problem routing
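
To see how a harmless double slash in a pathname turns into "instant mayhem", here is a toy comment-stripping step of the kind the application layer effectively applied; the card name and path are invented, and this is not the actual CMS code.

    # Toy illustration of the "//" bug: a C++-style comment stripper applied to
    # a card-file line silently truncates any UNIX path containing "//".
    def strip_cxx_comment(line):
        """Drop everything from the first '//' onward, as a C++-style parser would."""
        return line.split("//", 1)[0]

    line = "InputFile = /data/igt//run01/events.dat"   # invented card line
    print(strip_cxx_comment(line))   # prints "InputFile = /data/igt"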

14
A Sampler of IGT Lessons
  • Automatic scheduling should be done in an
    environment with reliable health monitoring
    information.
  • Poor man's script-based queue-length scheduling
    failed because of particular middleware failure
    modes (see the sketch at the end of this slide)
  • Job managers often lose contact with jobs in
    failure modes
  • Condor has no independent way of verifying job
    status and assumes the job is dead
  • Queue-length-based scheduling then leads to broken
    farms!
  • New scalability problems always lurk behind the
    next modest increase in scale
  • 200 or so jobs was OK; when the CERN IGT site
    joined, we discovered a new GAHP server scaling
    limitation of 250 jobs per MOP master site.
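
A toy version of the poor man's queue-length scheduler shows why it breaks: when a job manager loses contact with its jobs, that site's reported queue looks short and the naive policy keeps sending work to the broken farm. Site names and numbers below are invented.

    # Toy sketch: queue-length-based site selection. A site whose job manager
    # has lost contact with its jobs reports a short queue, so the naive policy
    # keeps piling work onto the broken farm. Names and numbers are invented.
    reported_queue = {
        "site_a": 180,
        "site_b": 140,
        "site_c": 5,    # really running ~200 jobs, but the job manager lost them
    }

    def pick_site(queues):
        """Naive policy: send the next batch to the shortest reported queue."""
        return min(queues, key=queues.get)

    print(pick_site(reported_queue))   # "site_c": exactly the site that is failing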

15
Conclusions
  • The Integration Grid Testbed has achieved a
    measure of success using the USCMS designated
    DPE.
  • 1.5 M events were produced for CMS
  • Grid environment operated in continuous fashion
    for over 2 months
  • Operational experiences of running the Grid
    documented
  • Functionality was limited
  • But it matched the available manpower well
  • Much can also be addressed in MCRunJob
  • This success was recognized by demos at SC2002 and
    a poster at Globusworld.

16
Acknowledgements
  • Many thanks to Peter Couvares (CONDOR), Anzar
    Afaq (PPDG), and Rick Cavanaugh (iVDGL/GriPhyN)
    for heroic efforts!
  • As usual, many more that I am not mentioning
    here...
  • For further reference:
  • http://www.uscms.org/scpages/subsystems/DPE/index.html
  • http://computing.fnal.gov/cms/Monitor/cms_production.html
  • http://home.fnal.gov/ggraham/MCRunJob_Presentations/