Transcript and Presenter's Notes

Title: The CMS Integration Grid Testbed and Distributed Processing Environment

1
The CMS Integration Grid Testbed and Distributed
Processing Environment
  • Greg Graham
  • Fermilab CD/CMS
  • 16-Jan-2003

2
Goals
  • CMS must have a working distributed system that
    enables effective collaboration between US-based
    physicists and their colleagues worldwide.
  • Issues of scale
  • Thousands of people
  • Petabytes of distributed data
  • Increasing complexity
  • Tools must handle the scale in an integrated
    fashion
  • The Grid has shown promise as a general framework
    in pursuit of this goal.

3
Some Significant Questions
  • How does CMS move Grid Technology effectively
    from the drawing board to real production
    services?
  • How does CMS select from among many possible
    emerging technologies?
  • How does CMS S&C ramp up from the current level
    of effort to that required for a production Grid
  • in time for DC04?
  • in time for real data taking?
  • How can USCMS make an effective contribution?

4
Strategic Focus
  • Maintain production-quality services with maximum
    possible flexibility
  • Because new technology is coming online all the
    time
  • Complement the expected LCG services with an
    effective regime of Grid prototyping.
  • Cooperate with the Trillium groups and directly
    with middleware providers.
  • USCMS has a very successful and strong
    relationship with the Condor Team.
  • We need to stay in close contact with the LCG
  • We aim to provide what the LCG will provide,
    before they announce it.

5
What is Needed
  • A focused, CMS-oriented R&D program is needed (in
    addition to external Grid research projects)
  • Prototyping: a rolling prototype with emphasis on
    high availability (HA)
  • Integration of Grid Tools with existing CMS
    environments
  • Starting with Monte Carlo production
  • Gaining experience with Grid middleware
  • A middleware support plan with required level of
    effort
  • Training and Documentation
  • A guiding management plan and effective WBS
    structure
  • The structure should contain mechanisms for
    change
  • The plan should be comprehensive

6
How the IGT Will Help
  • The Integration Grid Testbed (IGT) complements
    the existing Grids in USCMS
  • DGT: the Development Grid Testbed (the initial
    state)
  • Speculative development
  • New tools, APIs, software layers, etc.
  • PG: the Production Grid (the final state)
  • No middleware development
  • Production quality services
  • The IGT is a Transitional State where new
    technologies are integrated in existing
    environments
  • We expect to integrate Trillium/LCG provided
    software here
  • This is the industry-recognized cycle:
    development, integration, release

7
The Current IGT - Hardware

  Site        DGT Hardware                             IGT Hardware
  CERN (LCG)  -                                        Participates with 72 2.4 GHz CPUs, RH7
  Fermilab    40 dual 750 MHz nodes + 2 servers, RH6   -
  Florida     40 dual 1 GHz nodes + 1 server, RH6      -
  UCSD        20 dual 800 MHz nodes + 1 server, RH6    New: 20 dual 2.4 GHz nodes + 1 server, RH7
  Caltech     20 dual 800 MHz nodes + 1 server, RH6    New: 20 dual 2.4 GHz nodes + 1 server, RH7
  UW Madison  Not a prototype Tier-2 center; provides support
  Total       240 CPUs (0.8 GHz equiv.), RH6           152 CPUs (2.4 GHz), RH7

8
How the DPE Will Help
  • The Distributed Processing Environment (DPE)
    comprises the software that implements the rolling
    prototype
  • DPE is a container for software that is developed
    externally (i.e., we have no developers of our own)
  • DPE is a structure within which we do integration
    testing
  • WBS Structure of DPE is comprehensive
  • Effort is reported from outside where applicable
  • Helps focus attention on areas where further
    effort is needed
  • Rolling Prototype
  • The DPE prototype must never be seriously
    broken.
  • Maximum flexibility to schedule rapid deployment
    of some Grid tools
  • Provides a continual baseline to limit exposure
    to missing Grid tools

9
Major Areas of the DPE
  • 1.3.1 Self Evaluations
  • 1.3.2 Evaluations of External Software
  • This is where Grid software can be explicitly
    tracked
  • 1.3.3 Integration Rolling Prototype
  • 1.3.4 Support and Transitioning
  • 1.3.5 Milestones
  • 1.3.6 Tier-0/Tier-1/Tier-2 Integration
  • The next few slides highlight recent progress in
    DPE.
  • There is not time to show them all...

10
DPE Progress
  • 1.3.1 Evaluations of Current Practice
  • 1.3.1.1 Production Processing Tools Review
    (Biannual)
  • Preliminary draft Nov. 2002
  • 1.3.1.2 Analysis Tools Review (Biannual)
  • First will take place next summer.
  • 1.3.1.3 Domain Analysis (Semi-Annual)
  • First will take place in Spring 2003.
  • 1.3.1.4 CMS Software Tutorial (Annual)
  • UCSD Tutorial Spring 2002
  • Next will coincide with LHC Workshop at FNAL this
    Spring.

11
DPE Progress
  • 1.3.2 Evaluations of External Software
    Developments
  • 1.3.2.1 Grid Integration Task (Annual)
  • Provided by CCS (C. Grandi) - First report out
    Jan 2002
  • 1.3.2.2 Testbed Deployments of Tools and Systems
    (Ongoing)
  • Testbed deployment of Virtual Data Toolkit (VDT)
    in preparation for Integration Grid Testbed and
    Production Grid.
  • 1.3.2.3 LHC Computing Grid (Ongoing)
  • Liaison established; the LCG participates in the
    USCMS-led Integration Grid Testbed (IGT).
  • 1.3.2.4 CHIMERA (Ongoing)
  • Chimera v1.0 deployed and tested successfully on
    DGT.
  • CMS MCRunJob production tool successfully
    integrated with Virtual Data Language (VDL).
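
As an illustration of the VDL integration above: a minimal,
hypothetical Python sketch of how a production tool in the spirit of
MCRunJob might emit a Chimera VDLt transformation/derivation pair for
one simulation job. The transformation name cmsim, the job and file
names, and the exact VDLt details are assumptions for illustration,
not the actual MCRunJob code.

    # Hypothetical sketch: emit Chimera VDLt text describing one Monte
    # Carlo job as a derivation of a declared transformation. Names and
    # VDLt details are illustrative, not the real MCRunJob integration.

    def make_vdl(job_id, cards, fzfile):
        """Return VDLt text for one simulation job."""
        tr = (
            "TR cmsim( input cards, output fzfile ) {\n"
            "  argument = ${cards};\n"
            "  argument = ${fzfile};\n"
            "}\n"
        )
        dv = (
            f"DV {job_id}->cmsim(\n"
            f'  cards=@{{input:"{cards}"}},\n'
            f'  fzfile=@{{output:"{fzfile}"}}\n'
            ");\n"
        )
        return tr + dv

    # One derivation per requested run; Chimera then plans these into a
    # concrete DAG for execution on the testbed.
    print(make_vdl("run_00001", "run_00001.cards", "run_00001.fz"))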

12
DPE Prototype Progress
  • 1.3.3.1 Overall Architecture (SemiAnnual)
  • DPE 1.0 Defined (To be released at end of
    January)
  • 1.3.3.2 Distributed Process Management/Batch Job
    Scheduling
  • Defined Micro/Mini/Group DAG structure of jobs
    (a DAGMan sketch follows this list).
  • 1.3.3.4 Virtual Organization
  • New VO management plan crafted Dec 2002, defining
    what is needed to deploy VOMS and the EDG
    Gatekeeper in the FNAL security environment.
  • 1.3.3.5 Monitoring
  • Interface of MDS and MonALISA
  • Interface of Ganglia and MonALISA
  • Configuration and Application Monitoring Tools
    Needed
  • Application monitoring can be provided by BOSS
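
Returning to 1.3.3.2 above: a minimal sketch of the micro-DAG idea
expressed as a Condor DAGMan input file generated from Python. JOB and
PARENT/CHILD are standard DAGMan syntax; the three-step structure and
the submit-file names are assumptions for illustration only.

    # Hypothetical micro DAG for one production job: generation, then
    # simulation, then stage-out, written as a Condor DAGMan input file.
    # Step names and submit files are illustrative only.

    STEPS = [
        ("gen", "cmkin.sub"),      # event generation
        ("sim", "cmsim.sub"),      # detector simulation
        ("copy", "stageout.sub"),  # copy output to mass storage
    ]

    def write_micro_dag(path):
        with open(path, "w") as dag:
            for name, submit in STEPS:
                dag.write(f"JOB {name} {submit}\n")
            # Linear dependency chain: gen -> sim -> copy
            for (parent, _), (child, _) in zip(STEPS, STEPS[1:]):
                dag.write(f"PARENT {parent} CHILD {child}\n")

    write_micro_dag("micro.dag")  # then: condor_submit_dag micro.dag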

13
DPE Prototype Progress
  • 1.3.3.6 Dataset Tracking
  • 1.3.3.6.1 Metadata Definition
  • Started Nov. 2002
  • 1.3.3.6.2 Replica Catalogues
  • Started investigations into SRB, Nov 2002
  • Plan PACMAN deployment of SRB in Feb. 2003
  • 1.3.3.7 Data Movement
  • 1.3.3.7.1 Storage System Interfaces
  • Collaboration with Fermilab/CCF on dCache with
    the GridFTP protocol
  • Part of work towards a more general MSS-to-MSS
    API
  • 1.3.3.7.3 Performance Optimization
  • TCP window sizes tuned to optimal values in
    globus-url-copy, in the context of investigations
    into many tools (a sketch follows this list)
  • 1.3.3.8 Resource Brokering
  • Discussions underway with Condor Team
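
For 1.3.3.7.3 above, a sketch of the kind of tuned transfer under
discussion, wrapped in Python. The -tcp-bs (TCP buffer size) option of
globus-url-copy is the knob referred to by "window sizes"; the hosts,
paths, and the 2 MB value are assumptions for illustration.

    # Hypothetical wrapper around globus-url-copy with an explicit TCP
    # buffer (window) size, the parameter tuned in 1.3.3.7.3.
    import subprocess

    def grid_copy(src, dst, tcp_buffer=2 * 1024 * 1024):
        """Copy src to dst over GridFTP with a tuned TCP buffer size."""
        subprocess.run(
            ["globus-url-copy",
             "-tcp-bs", str(tcp_buffer),  # TCP buffer size in bytes
             src, dst],
            check=True,
        )

    # Illustrative endpoints only:
    grid_copy("gsiftp://tier1.example.org/data/run_00001.fz",
              "file:///scratch/run_00001.fz")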

14
DPE Prototype Progress
  • 1.3.3.9 Production processing Support Tools
  • 1.3.3.9.1 Request Handlers/Trackers
  • Provided by CCS: the RefDB.
  • 1.3.3.9.2 Job Builders
  • Maintained the Impala bash-script-based tools.
  • Released the MCRunJob Python-based tools, which
    replace Impala.
  • 1.3.3.9.3 User Interfaces
  • Version 0.9 of the MCRunJob GUI
  • 1.3.3.10 Analysis Tools
  • Provided by CAIGEE
  • 1.3.3.11 Internal Milestones
  • SC2002 milestone met (Nov 2002): soup-to-nuts
    production/analysis in a Grid environment
  • Production Grid Milestone (DPE 1.0) release at
    end of January

15
DPE Prototype Progress
  • 1.3.3.13 Software Quality Assurance
  • 1.3.3.13.2 Implementation of Software Quality
    Assurance Tools
  • Started regular test procedures before release of
    production tools.
  • 1.3.3.13.3 Technical Writing Assistance
  • 0.1 FTE assigned in Jan 2003
  • 1.3.3.15 Prototype Evaluations (Ongoing)
  • Problems section written in IGT-long.ps document.

16
DPE Progress
  • 1.3.4 Software Support and Transitioning
  • 1.3.4.1 Software Environment
  • Provided by VDT this year
  • 1.3.4.2 Release Management (SemiAnnual)
  • To begin in Jan 2003
  • 1.3.4.3 Release Notes (SemiAnnual)
  • To begin in Jan 2003
  • 1.3.4.4 Deployment Support
  • Provided by VDT; Tier-1 site support coming soon
  • 1.3.5 External Milestones
  • LCG 24x7 Production Grid
  • DC04 Pre-Challenge Production
  • DC04 itself

17
DPE Planning
  • In addition to the WBS dictionary, there is an
    ambitious meeting schedule
  • Weekly short term focus meetings
  • Monthly Milestone Meetings
  • Semi-Annual WBS and project review, Release
    Status
  • Bi-Annual
  • Production Tools Review
  • Analysis Tools Review
  • More detailed plans are being drawn up for
  • Providing production grid services
  • VOMS-based (and FNAL-compliant) VO structure
  • EDG is producing this, probably(?) to be adopted
    by the LCG

18
SC2002 Highlights
  • SC2002 soup-to-nuts demonstration proposed in
    April 2002.
  • Production phase: Monte Carlo generated with
    MCRunJob on the Grid.
  • Analysis phase: analysis of distributed ROOT
    files using CLARENS, live on the show floor.

19
DPE in Practice on the IGT
  • The story begins before the IGT
  • USMOP site was commissioned in Spring 2002
  • Middleware was found lacking
  • The IGT was commissioned in October 2002
  • September engineering run with 50K events.
  • The middleware was declared to be DPE 0.99
  • 1.5M official CMS events were produced
  • 1 FTE of sustained effort, peaking at 2.5 FTE
    (PPDG)
  • But functionality was light
  • Documentation
  • Lots already written (and being integrated)
  • Papers coming out

20
Conclusions
  • IGT is a necessary addition to the DGT and
    Production Grid services
  • A CMS-oriented layer with a focus on preparing
    releases for production
  • whether coming from the LCG or in addition to the
    LCG!
  • Relies heavily on developers and expertise
    outside of USCMS
  • This is a risk
  • Would have liked to explore configuration
    monitoring, scheduling, and production tools
    development in more detail.
  • The DPE is a necessary structure to allow us to
    track what is installed on the IGT and on the PG.
  • An aid to planning for USCMS, not in competition
    with LCG
  • Though would like to see more cooperation with
    LCG
  • It may be useful as a vehicle to provide support
    for Grid tools

21
Acknowledgements
  • Many thanks to the Condor Team and to the
    Development Grid Testbed
  • Especially Rick Cavanaugh, Anzar Afaq
  • For further reference
  • http://www.uscms.org/scpages/subsystems/DPE/index.html
  • http://computing.fnal.gov/cms/Monitor/cms_production.html