Batch System Operation - PowerPoint PPT Presentation

1
Batch System Operation and Interaction with the Grid
LCG/EGEE Operations Workshop, May 25th 2005
Tony.Cass_at_CERN.ch
2
Why a Batch Workshop at HEPiX?
  • Proposed after the last Operations Workshop.
  • Remember the complaints then?
  • ETT doesn't work
  • ETT is meaningless when fairsharing is in place
  • A queue per VO, while easy to implement now, is
    not a good long-term solution.
  • The ETT algorithm was questioned and other
    proposals were given.
  • Idea was to bring together site managers and grid
    and local scheduler developers.

3
Workshop Aims
  • Understand how different batch scheduling systems
    are used at HEP sites
  • Are there any commonalities?
  • How do sites see the Grid interface?
  • How would sites like to see the Grid interface?
  • What is the impact of the current interface?
  • How do developers of local and Grid level
    schedulers see the future?
  • How/can HEP site managers influence future
    developments?
  • Well attended (70-80)
  • Definite interest in this area from site managers
  • See http://www.fzk.de/hepix

4
Agenda
  • Local Scheduler usage
  • SLAC, RAL, LeSC, JLab, IN2P3, FNAL, DESY, CERN,
    BNL
  • LSF, PBS, Torque/Maui, SGE (N1GE6), BQS, Condor
  • Impact of Grid on sites
  • Jeff Templon overview (c.f. previous talk),
    BQS at IN2P3
  • Local scheduler view
  • LSF, PBS, LoadLeveler, Condor, BQS
  • Grid Developments
  • EGEE/BLAHP, GLUE
  • Common batch environment
  • See earlier.

5
Site Presentations --- I
  • Site reports covered
  • Brief overview of the available computing
    resources, showing (in)homogeneity of resources
  • Queue configuration---what and why
  • How do users select queues---cpu time alone or
    specifying other resources (e.g. memory, local
    disk space availability)
  • Need for, and use of, "special" queues---for
    "production managers", sudden high priority work,
    other reasons.
  • Question from LHCC referee: if there is some
    urgent analysis, how can gLite send this to a
    special queue?
  • Level of resource utilisation

6
Site Presentations --- II
  • Overall, configurations and concerns were broadly
    equivalent across sites.
  • Concerns were around
  • Scheduling
  • Security
  • Interface Scalability
  • Cover these issues in next few slides.

7
  • Scheduling Issues

8
Local Load Scheduling summary
  • Batch schedulers at local sites enable
    fine-grained control over heterogeneous systems
    and are used to enforce local policies on
    resource allocation and provide SLA for users
    (turnround time).
  • Large sites have subdivision of user groups
  • Scheduling is by CPU time; some jobs also need to request
  • minimum CPU capacity for server
  • memory requirement
  • available disk work space (/pool, /scratch, /tmp)
  • Sites want Grid interface to use existing
    queue(s)
  • NOT to create a queue per VO.
  • EMPHATICALLY NOT to replicate queue structure per
    VO

9
Grid/Local interface problems
  • Jeff's presentation!
  • In short
  • Not enough information passed from the site to
    the Grid
  • No information passed from the Grid to the site
  • Result
  • Queues at some sites whilst others sit empty
  • Confused/frustrated site managers
  • Inefficient behaviour as people work the system
  • Tragedy of the commons

10
Should sites (be able to) enforce policies?
  • Sites are funded for particular tasks and need to
    show funding agencies and users that they are
    fulfilling their mission.
  • This is a Grid. Why does it matter if you are
    running jobs for X not Y? Y may be happily
    running jobs at another site.
  • My view
  • Sites need to understand and feel comfortable
    with the way they accept jobs from the Grid.
  • If they are comfortable, account may be taken of
    global activity when setting local priorities.
  • Let's walk before we try to run

11
Can/Should we fix this?
  • or should we wait to see some general standard
    emerge?
  • Strong support from commercial people (especially
    Platform and Sun) for HEP to work out solutions
    to this problem.
  • They are interested in what we do.
  • Standards bodies (GGF, ...) won't come up with any
    common solution soon.
  • But this doesn't mean HEP shouldn't participate
  • Raise profile of problems of interest to us
  • Give practical input based on real-world
    experience.

12
How to fix?
  • Improve information available to Grid scheduler
  • VO information added in GLUE schema (v1.2)
  • Need volunteer per batch system to maintain
    dynamic plug-ins and the job manager.
  • CERN will do this for LSF. Need other volunteers!
  • but still assumption of homogeneous resources at
    a site.
  • There is a plan to start work on GLUE v2 in
    November
  • No requirement for backwards compatibility.
  • Discussion should start NOW!
  • But need to assess impact of v1.2 changes before
    rushing into anything.
  • Grid scheduler should pass job resource
    requirements to the local resource manager.
  • Not yet. When? How?
  • Needs normalisation. Does this need to be per VO?
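
One way to picture the normalisation point: a CPU-time request expressed against a reference machine has to be rescaled before a local resource manager can apply it on a faster or slower node. The sketch below is only an illustration under stated assumptions. The function name, the idea of a single "power" rating per node and the numbers are hypothetical, not taken from GLUE or from any batch system.

```python
def local_cpu_limit(requested_s, reference_power, node_power):
    """Rescale a CPU-time request made in reference units to a given node.

    All arguments are hypothetical: requested_s is the CPU time asked for
    on a reference machine rated at reference_power; node_power is the
    rating of the node that will actually run the job.
    """
    if reference_power <= 0 or node_power <= 0:
        raise ValueError("power ratings must be positive")
    # A node that is 1.5x faster needs only 1/1.5 of the reference time.
    return requested_s * reference_power / node_power


# 10 hours on a reference machine rated 1000, run on a node rated 1500.
print(local_cpu_limit(36000, 1000, 1500))  # 24000.0
```

Whether the reference rating should be agreed globally or per VO is exactly the open question on the slide; the sketch takes no position on it.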

13
  • Security

14
Security Issues
  • Sites are still VERY concerned about traceability
    of users.
  • Mechanisms seem to be in place to allow this, but
    sites have little practical experience.
  • c.f. delays for CERN to block user systematically
    crashing worker nodes.
  • Security group have doubts that sites are
    fulfilling obligations in terms of log retention.
  • Security Challenges mooted; these may help
    increase confidence
  • Whatever, it does NOT seem to be a good idea to
    have a portal handling user job requests and
    passing these on with a common certificate

15
  • Interface Scalability

16
Interface Scalability
  • IN2P3 example: GridJobManager asks for job status
    once per minute (even for 15-hour jobs).
  • 5000 queued jobs + 1000 running jobs = 100
    queries/s
  • Being solved by EGEE BLAHP
  • Caches query response
  • But
  • a further example of the need for discussion
    between sites and developers (IN2P3 fixing this
    issue independently)
  • are there other similar issues out there?
  • c.f. LSF targets
  • Scalability: 5K hosts, 500K active jobs, 100
    concurrent users, 1M completed jobs per day
  • Performance: >90% slot utilisation, 5s max command
    response time, 4kB memory/job, master failover
    in <5 mins
  • What are targets for the CE? RB?
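
The arithmetic behind the IN2P3 figure, and the general idea of caching the query response, can be sketched as follows. This is a minimal illustration, not a description of how BLAHP is implemented: the class name, the bulk_query callable and the 60-second lifetime are assumptions made for the example.

```python
import time


def polling_rate(queued, running, interval_s):
    """Status queries per second if every job is polled once per interval."""
    return (queued + running) / interval_s


class CachedStatus:
    """Answer per-job status requests from one bulk query.

    bulk_query is a hypothetical callable returning {job_id: state} for
    all jobs; it is called at most once per ttl_s seconds.
    """

    def __init__(self, bulk_query, ttl_s=60.0, clock=time.monotonic):
        self._bulk_query = bulk_query
        self._ttl_s = ttl_s
        self._clock = clock
        self._snapshot = {}
        self._fetched_at = None
        self.backend_calls = 0  # how often the batch system was really asked

    def status(self, job_id):
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self._ttl_s:
            self._snapshot = self._bulk_query()
            self._fetched_at = now
            self.backend_calls += 1
        return self._snapshot.get(job_id, "UNKNOWN")


# 6000 jobs polled once a minute give the figure quoted on the slide.
print(polling_rate(5000, 1000, 60))  # 100.0
```

With such a cache the load on the batch system depends on the refresh interval rather than on the number of jobs, at the price of status information that may be up to ttl_s seconds old.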

17
  • Some other Topics

18
End-to-End Guarantees
  • The Condor talk raised many interesting points.
    One in particular was the (in)ability of the
    overall system to offer end-to-end execution
    guarantees to the users.
  • Condor glide-in: a pilot job submitted via the
    Grid which takes a job from a Condor queue.
  • Fair enough (modulo security) for system
    managers, PROVIDED the pilot job expresses the
    same resource requests as it advertises in a
    class-ad when it starts.
  • Shouldn't claim to be maximum possible length
    then run short job.
  • Class ads and GLUE schema are not so different:
    both are ways of saying what a node/site can do
    in a way that can be used to express (and then
    match) requirements.
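
The common idea behind class ads and the GLUE schema, advertising capabilities and matching them against requirements, can be shown in a few lines. This is a toy sketch only: the attribute names are invented for the example and are neither real ClassAd attributes nor GLUE fields, and real matchmaking supports far richer expressions.

```python
def matches(offer, requirements):
    """True if the offer meets every minimum listed in requirements."""
    return all(offer.get(key, 0) >= minimum
               for key, minimum in requirements.items())


def pilot_is_consistent(requested_from_site, advertised_in_classad):
    """True if a pilot advertises exactly what it asked the site for."""
    return requested_from_site == advertised_in_classad


# Hypothetical attribute names and values.
node = {"cpu_minutes": 2880, "memory_mb": 2048, "scratch_gb": 20}
job = {"cpu_minutes": 600, "memory_mb": 1024}

print(matches(node, job))                  # True
print(matches(node, {"memory_mb": 4096}))  # False

# A pilot that asked the site for 48 hours but advertises a 1-hour slot
# is the behaviour the slide objects to.
print(pilot_is_consistent({"cpu_minutes": 2880}, {"cpu_minutes": 60}))  # False
```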

19
Pre-emption and Virtualisation
  • Strong message from batch system developers that
    pre-emption is A GOOD THING. With pre-emption
    schedulers can maximise throughput/resource usage
    by
  • suspending many jobs to allow parallel job to run
  • suspending long running jobs to provide quick
    turnround for priority jobs.
  • Interest in virtualisation as method to ease this
  • Also discussed at last operations workshop as a
    way to ease handling of multiple (conflicting)
    requirements for OS versions.
  • Something to watch.
  • How would (pre-empted) users like this?
  • No guarantee of time to completion once job
    starts
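
The trade-off on this slide, quick turnround for priority work against an unpredictable completion time for the pre-empted job, can be seen in a toy single-slot simulation. It is a sketch under simplifying assumptions (one CPU slot, integer time steps, suspension and resumption are free) and does not model any particular batch system.

```python
def simulate(jobs, preempt):
    """Run jobs on one CPU slot and return {name: completion_time}.

    jobs is a list of (name, arrival, duration, priority) tuples with
    positive integer durations. With preempt=True a higher-priority
    arrival suspends the running job, which keeps its remaining time.
    """
    if any(duration <= 0 for _, _, duration, _ in jobs):
        raise ValueError("durations must be positive integers")
    remaining = {name: duration for name, _, duration, _ in jobs}
    info = {name: (arrival, priority) for name, arrival, _, priority in jobs}
    finished = {}
    running = None
    t = 0
    while remaining:
        ready = [n for n in remaining if info[n][0] <= t]
        if ready:
            # Highest priority first; earlier arrival breaks ties.
            best = max(ready, key=lambda n: (info[n][1], -info[n][0]))
            if running not in remaining:
                running = best  # slot is free
            elif preempt and info[best][1] > info[running][1]:
                running = best  # suspend the current job
            remaining[running] -= 1
            if remaining[running] == 0:
                del remaining[running]
                finished[running] = t + 1
        t += 1
    return finished


jobs = [("long", 0, 10, 1), ("urgent", 2, 2, 5)]
print(simulate(jobs, preempt=False))  # {'long': 10, 'urgent': 12}
print(simulate(jobs, preempt=True))   # {'urgent': 4, 'long': 12}
```

With pre-emption the urgent job finishes at t=4 instead of t=12, while the long job, which would have finished at t=10, now finishes at t=12: no guarantee of time to completion once it has started.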

20
Push vs Pull
  • A false dichotomy
  • Sites can manipulate pull model to create a local
    queue
  • Real issue is early vs. late allocation of task
    to resource
  • Early: site resource utilisation maximised; a
    free CPU resource can be filled immediately with
    a job from the local queue
  • Late: user doesn't see job sent to site A just
    before a CPU becomes free at site B.
  • Questions
  • Long term, will most cpu resources be full?
  • What do people want to maximise? Throughput or ?
  • Efficient scheduling is important anyway;
    transparency of grid/local interface will be key.
  • Pre-emption, anyone?
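
The early vs. late trade-off can be made concrete with a two-site toy model. Everything here is an assumption made for illustration: the site names, the times (in minutes), and the fixed dispatch delay that stands in for the time a free CPU waits while a late-bound job is fetched.

```python
def early_start(estimated_free, actual_free):
    """Bind at submit time to the site that looks best, then wait there.

    Returns (site, start_time). The job sits in the local queue, so the
    CPU is filled the moment it really becomes free.
    """
    site = min(estimated_free, key=estimated_free.get)
    return site, actual_free[site]


def late_start(actual_free, dispatch_delay):
    """Bind only when a CPU really becomes free somewhere.

    Returns (site, start_time). The CPU idles for dispatch_delay while
    the job is sent to it.
    """
    site = min(actual_free, key=actual_free.get)
    return site, actual_free[site] + dispatch_delay


# Site A looked better at submit time, but its CPU frees up late.
estimated = {"A": 10, "B": 12}
actual = {"A": 30, "B": 12}

print(early_start(estimated, actual))          # ('A', 30)
print(late_start(actual, dispatch_delay=2))    # ('B', 14)
```

Early binding keeps the CPU at site A busy without a gap but leaves the user waiting until t=30; late binding starts the job at t=14 at the cost of two idle minutes on site B's CPU.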

21
  • Conclusion

22
  • Conclusion
  • Summary

23
Workshop Summary
  • Useful workshop. IMHO
  • Good that there has been progress since the
    November workshop at CERN (GLUE schema update),
    but much is still to be done.

24
The Service is the Challenge
25
Workshop Summary
  • Still need to increase dialogue between site
    managers and Grid scheduler developers
  • Site managers know a lot about running services.
  • Unfortunate that a meeting change created a clash
    and reduced scope for EGEE developers to
    participate in Karlsruhe discussions.
  • A smaller session is pencilled in for HEPiX at
    SLAC, October 10th-14th. More dialogue then?
  • Not too early to start thinking about GLUE v2!
