Summary of Category 3 HENP Computing Systems and Infrastructure

1
Summary of Category 3 HENP Computing Systems and Infrastructure
  • Ian Fisk and Michael Ernst
  • CHEP 2003
  • March 28, 2003

2
Introduction
  • We tried to break the week into themes
  • We discussed fabrics and architectures on Monday
  • Heard general talks about building and securing
    large multi-purpose facilities
  • As well as updates from a number of HENP
    computing efforts
  • We discussed emerging hardware and software
    technology on Tuesday
  • Review of the most recent PASTA report and an
    update of commodity disk storage work
  • Software for flexible clusters (MOSIX); advanced
    storage and data serving (CASTOR, Enstore, dCache,
    Data Farm, and ROOT I/O)
  • We discussed Grid and other services on Thursday
  • Grid interfaces and storage management over the
    grid
  • Monitoring services
  • It was a full week with a lot to discuss.
    Special thanks to all those who presented.
  • There is no way to cover very much of what was
    presented in a thirty-minute talk.

3
General Observations
  • Grid functionality is coming quickly
  • Basic underlying concepts of distributed,
    parasitic, and multi-purpose computing are
    already being deployed in running experiments
  • Early implementation of interfaces for grid
    services to fabrics
  • I would expect by the time the LHC experiments
    have real data that the tools and techniques will
    have been well broken-in by experiments running
    today
  • Shift to commodity equipment accelerated since
    the last CHEP
  • I would argue that the shift is nearly complete
  • At least two large computing centers admitted to
    having nothing in their work rooms but Linux
    systems and a few Suns to debug software
  • This has resulted in the development of tools to
    help handle this complicated component
    environment
  • With notable exceptions, high energy physics
    computing efforts do not work well together
  • The individual experiments often have subtly
    different requirements, which results in
    completely independent development efforts

4
Distributed Computing
  • Example from CDF: the Central Analysis Facility
    is very well used
  • Future (very near future) plan is to deploy
    satellite analysis farms to increase the
    computing resources

5
Distributed Computing
  • Peter Elmer presented how the BaBar experiment
    has been able to take advantage of distributed
    computing resources for primary event
    reconstruction
  • By splitting their prompt calibration and event
    reconstruction, they now take advantage of 5
    reconstruction farms at SLAC and 4 in Padova

6
Parasitic Computing
  • Bill Lee presented the CLuED0 work of the D0
    experiment
  • CLuED0 is a cluster of D0 desktop machines which,
    along with some custom management software,
    provides D0 with 50% of their analysis CPU cycles
    parasitically
  • Heterogeneous system with distributed support
  • The US LHC experiments submitted a proposal on
    Monday which, among many other topics, discussed
    the use of economic theories to optimize resource
    allocations.
  • Techniques already used in D0

7
Multipurpose Computing
  • Fundamental to a grid-connected facility is the
    ability to support multiple experiments at a
    minimum, and ideally multiple disciplines
  • The people responsible for computing systems have
    been thinking about how to make this possible,
    because so many regional computing centers have
    to support multiple experiments and user
    communities.
  • John Gordon gave an interesting talk on whether
    it was possible to build a multipurpose center
  • John identified 6 categories of problems and
    discussed possible solutions
  • Software levels
  • Experts
  • Local rules
  • Security
  • Firewalls
  • The accelerator centres

8
Early Interfacing of Grid Services to Fabrics
  • Alex Sim gave a talk on Storage Resource Manager
    (SRM) functionality
  • Manage space
  • Negotiate and assign space to users, manage
    lifetime of spaces
  • Manage files on behalf of a user
  • Pin files in storage until they are released,
    manage lifetime of files
  • Manage action when pins expire (depends on file
    types)
  • Manage file sharing
  • Policies on what should reside on a storage
    resource at any one time
  • Policies on what to evict when space is needed
  • Get files from remote locations when necessary
  • Purpose is to simplify the client's task (a
    client-side sketch follows this list)
  • Manage multi-file requests
  • A brokering function: queue file requests,
    pre-stage when possible
  • Provide grid access to/from mass storage systems
  • HPSS (LBNL, ORNL, BNL), Enstore (Fermi), JasMINE
    (Jlab), Castor (CERN), MSS (NCAR),
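The list above is an interface description rather than an implementation. Purely as an illustration, here is a minimal Python sketch of what an SRM-style client-side interface might look like; the class and method names are hypothetical and are not the actual SRM API.

```python
# Hypothetical sketch of an SRM-style client-side interface.  Class and
# method names are illustrative and do not follow the real SRM specification.
import time
from dataclasses import dataclass


@dataclass
class Pin:
    """A pinned file: kept staged in storage until released or expired."""
    surl: str           # site URL of the file in the storage system
    expires_at: float   # time after which the pin lapses and the file may be evicted


class StorageResourceManagerSketch:
    """Toy model of the management functions listed above."""

    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.pins: dict[str, Pin] = {}

    def reserve_space(self, nbytes: int, lifetime_s: int) -> bool:
        """Negotiate and assign space to a user for a limited lifetime."""
        # A real SRM would apply per-user and per-experiment policies here.
        return nbytes <= self.capacity

    def pin_file(self, surl: str, lifetime_s: int = 3600) -> Pin:
        """Pin a file until it is released or its lifetime expires.
        A real implementation would pre-stage the file from tape or fetch
        it from a remote location before returning."""
        pin = Pin(surl, time.time() + lifetime_s)
        self.pins[surl] = pin
        return pin

    def release(self, surl: str) -> None:
        """Release a pin; the file becomes a candidate for eviction."""
        self.pins.pop(surl, None)


# Usage: a multi-file request that a brokering function could queue and pre-stage.
srm = StorageResourceManagerSketch(capacity_bytes=10 * 2**40)
for surl in ("srm://site.example/dst/run1.root", "srm://site.example/dst/run2.root"):
    srm.pin_file(surl, lifetime_s=7200)
```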

9
Early Implementation
  • The functionality of SRM is impressive and leads
    to interesting analysis scenarios
  • Equally interesting is the number of places that
    are prepared to interface their storage to the
    WAN using SRM
  • Robust file replication between BNL and LBNL

10
Shift to commodity equipment
11
Benefits and Complications
  • The benefit is very substantial computing
    resources at a reasonable hardware cost.
  • The complication is the scale and complexity of
    the commodity computing cluster
  • A reasonably big computing cluster today might be
    1000 systems
  • With all the possible hardware problems
    associated with 1000 systems bought from the
    lowest bidder
  • Considerable amount of deployment, integration,
    and development effort is needed to create tools
    that allow a shelf or rack of Linux boxes to
    behave like a computing resource
  • Configuration Tools
  • Monitoring Tools
  • Tools for systems control
  • Scheduling Tools
  • Security Techniques

12
Configuration Tools
  • We heard an interesting talk from Thorsten
    Kleinwort on installing and running systems at
    CERN
  • Systems are installed with kickstart and RPMs
  • CERN and several other centers are deploying the
    configuration tools from EDG WP4
  • Pan and CDB (Configuration Data Base) for
    describing hosts
  • Pan is a very flexible language for describing
    host configuration information
  • Expressed in templates (ASCII)
  • Allows includes (inheritance)
  • Pan is compiled into XML, stored inside CDB
  • The XML is downloaded and the information is
    provided by CCConfig, the high-level API (see the
    sketch after this list)
  • It is complicated even to track what it is you
    have.
  • We had an interesting presentation from Jens
    Kreutzkamp from DESY about how they track their
    IT assets.
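As a rough illustration of the flow described above (Pan templates compiled into XML, stored in CDB, and read back through a high-level API), here is a minimal Python sketch that parses a downloaded host profile. The XML layout, element names, and example values are invented for the sketch and are not the real Pan/CDB schema or the CCConfig API.

```python
# Minimal sketch of reading a compiled host profile, assuming an invented
# XML layout; the real Pan/CDB schema and the CCConfig API differ.
import xml.etree.ElementTree as ET

PROFILE = """
<profile host="lxbatch001">
  <cluster>lxbatch</cluster>
  <software>
    <rpm name="openssh" version="3.5p1"/>
    <rpm name="castor-client" version="1.7.1"/>
  </software>
</profile>
"""


def host_config(xml_text: str) -> dict:
    """Turn a downloaded XML profile into a simple lookup structure."""
    root = ET.fromstring(xml_text)
    return {
        "host": root.get("host"),
        "cluster": root.findtext("cluster"),
        "rpms": {r.get("name"): r.get("version") for r in root.iter("rpm")},
    }


cfg = host_config(PROFILE)
print(cfg["host"], cfg["rpms"]["openssh"])   # lxbatch001 3.5p1
```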

13
Monitoring Tools
  • Systems are complicated, consisting of many
    components; this has led to the development of
    lots of monitoring tools
  • Tools like NGOP, which Tanya Levshina presented,
    are very functional, complete, and scalable,
    though complicated to extend

14
Monitoring Tools (cont.)
  • On the opposite end were examples of extremely
    lightweight monitoring packages for BaBar,
    presented by Matthias Wittgen
  • Monitors CPU and network usage as well as packets
    sent to disk and the number of processes
  • Writes it to a central server where it is kept in
    a flat file (a minimal sketch of the idea follows
    this list)
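A minimal sketch of this lightweight approach might look like the following: sample a few cheap metrics on each node and append one line per sample to a flat file held centrally. The metric set, record format, and file path are illustrative assumptions, not the BaBar package itself.

```python
# Minimal sketch of a lightweight node monitor in the spirit described above;
# metrics, paths, and the flat-file format are illustrative assumptions.
import os
import socket
import time


def sample() -> dict:
    """Collect a few cheap metrics from /proc (Linux only)."""
    with open("/proc/loadavg") as f:
        load1 = float(f.read().split()[0])
    nprocs = sum(1 for d in os.listdir("/proc") if d.isdigit())
    return {"host": socket.gethostname(),
            "time": int(time.time()),
            "load1": load1,
            "nprocs": nprocs}


def append_record(record: dict, path: str = "/var/monitor/nodes.dat") -> None:
    """Append one whitespace-separated record to the central flat file
    (assumes the path is on a shared filesystem or forwarded to the server)."""
    line = "{host} {time} {load1:.2f} {nprocs}\n".format(**record)
    with open(path, "a") as f:
        f.write(line)


if __name__ == "__main__":
    append_record(sample())
```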

15
Tools for system control
  • Andras Horvath presented a technique for secure
    system control and reset access for a reasonable
    cost
  • This solution doesn't scale to 6000 boxes
  • The system Andras is implementing consists of
    serial connections for console access and relays
    attached to the reset switch on the motherboard
    for resets (a sketch follows this list)
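As a rough sketch of this kind of setup, the following Python fragment reads a node's serial console and pulses a relay wired to its reset switch. It assumes the pyserial library; the device names and the relay controller's "ON n"/"OFF n" command set are invented for the example and do not describe the system that was presented.

```python
# Sketch of remote console capture and reset, assuming pyserial and an
# invented ASCII protocol for the relay controller; device names are examples.
import time
import serial  # pyserial


def capture_console(port: str = "/dev/ttyS0", seconds: float = 5.0) -> bytes:
    """Read whatever the node prints on its serial console for a short while."""
    with serial.Serial(port, baudrate=9600, timeout=1) as console:
        deadline = time.time() + seconds
        data = b""
        while time.time() < deadline:
            data += console.read(256)
        return data


def pulse_reset(relay_port: str = "/dev/ttyS1", channel: int = 3) -> None:
    """Close and reopen the relay wired to the node's reset switch.
    'ON n' / 'OFF n' is a made-up controller command set."""
    with serial.Serial(relay_port, baudrate=9600, timeout=1) as relay:
        relay.write(f"ON {channel}\r\n".encode())
        time.sleep(0.5)                      # hold reset briefly
        relay.write(f"OFF {channel}\r\n".encode())
```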

16
Security Techniques
  • Number of systems in these large commodity
    clusters makes for interesting security work
  • Doubly so when worrying about making grid
    interfaces
  • The work to secure the BNL facility was presented
  • Work prioritizing their assets and forming
    responses for security breaches

17
Field doesn't cooperate well
  • This is not necessarily a problem, nor is it a
    criticism, simply an observation
  • One doesn't see a lot of common detector-building
    projects, so maybe it isn't surprising that there
    aren't a lot of common computing development
    efforts
  • I noticed during the week that there is a lot of
    duplication of effort, even between experiments
    that are geographically close
  • We have forums for exchange like HEPIX and the
    Large Cluster Workshop meetings
  • Even with these, we don't seem to do much
    development in common
  • There are notable exceptions
  • Alan Silverman presented the work to write a
    guide to building and operating a large cluster
  • Their noble if somewhat ambitious goal is to
    produce the definitive guide to building and
    running a cluster: how to choose, acquire, test,
    and operate the hardware; software installation
    and upgrade tools; performance management,
    logging, accounting, alarms, security, etc.

18
Grid Projects
  • The grid projects are another area in which the
    field is working effectively together
  • A number of sites indicated the desire to use
    common tools developed by EDG Work Package 4
  • Good buy-in from fabric managers about the use of
    SRM
  • Software deployment through the VDT

19
Conclusions
  • It was a long and interesting week
  • Apologies for not being able to summarize
    everything
  • We had very interesting discussions and
    presentations yesterday about how to interface
    the fabrics and the grid services
  • I also didn't get a chance to cover some of the
    hardware and software R&D results
  • I encourage people to look at the web page.
    Almost all the talks were posted.