Title: Batch System Operation

1
Batch System Operation
Interaction with the Grid
LCG/EGEE Operations Workshop
May 25th 2005
Tony.Cass_at_CERN.ch
2
Why a Batch Workshop at HEPiX?

Proposed after the last Operations Workshop.
Remember the complaints then?
ETT doesn't work
ETT is meaningless when fairsharing is in place
The solution of a queue per VO, while easy to
implement now, is not a good or long-term
solution.
The ETT algorithm was questioned and other
proposals were given.
Idea was to bring together site managers and Grid
and local scheduler developers.

3
Workshop Aims

Understand how different batch scheduling systems
are used at HEP sites
Are there any commonalities?
How do sites see the Grid interface?
How would sites like to see the Grid interface?
What is the impact of the current interface?
How do developers of local and Grid level
schedulers see the future?
How/can HEP site managers influence future
developments?
Well attended (70-80)
Definite interest in this area from site managers
See http://www.fzk.de/hepix

4
Agenda

Local Scheduler usage
SLAC, RAL, LeSC, JLab, IN2P3, FNAL, DESY, CERN,
BNL
LSF, PBS, Torque/Maui, SGE (N1GE6), BQS, Condor
Impact of Grid on sites
Jeff Templon overview (c.f. previous talk),
BQS_at_IN2P3
Local scheduler view
LSF, PBS, LoadLeveler, Condor, BQS
Grid Developments
EGEE/BLAHP, GLUE
Common batch environment
See earlier.

5
Site Presentations --- I

Site reports covered
Brief overview of the available computing
resources, showing (in)homogeneity of resources
Queue configuration---what and why
How do users select queues---cpu time alone or
specifying other resources (e.g. memory, local
disk space availability)
Need for, and use of, "special" queues---for
"production managers", sudden high priority work,
other reasons.
Question from LHCC referee: if there is some
urgent analysis, how can gLite send this to a
special queue?
Level of resource utilisation

6
Site Presentations --- II

Overall, configurations and concerns were broadly
equivalent across sites.
Concerns were around
Scheduling
Security
Interface Scalability
Cover these issues in next few slides.

Scheduling Issues

8
Local Load Scheduling summary

Batch schedulers at local sites enable
fine-grained control over heterogeneous systems
and are used to enforce local policies on
resource allocation and provide SLA for users
(turnround time).
Large sites have subdivision of user groups
Scheduling is by CPU time; some users also need to request:
minimum CPU capacity for server
memory requirement
available disk work space (/pool, /scratch, /tmp)
Sites want Grid interface to use existing
queue(s)
NOT to create a queue per VO.
EMPHATICALLY NOT to replicate queue structure per
VO

9
Grid/Local interface problems

Jeff's presentation!
In short:
Not enough information passed from the site to
the Grid
No information passed from the Grid to the site
Result:
Queues at some sites whilst others sit empty
Confused/frustrated site managers
Inefficient behaviour as people work the system
Tragedy of the commons

10
Should sites (be able to) enforce policies?

Sites are funded for particular tasks and need to
show funding agencies and users that they are
fulfilling their mission.
This is a Grid. Why does it matter if you are
running jobs for X not Y? Y may be happily
running jobs at another site.
My view
Sites need to understand and feel comfortable
with the way they accept jobs from the Grid.
If they are comfortable, account may be taken of
global activity when setting local priorities.
Let's walk before we try to run.

11
Can/Should we fix this?

or should we wait to see some general standard
emerge?
Strong support from commercial people (especially
Platform and Sun) for HEP to work out solutions
to this problem.
They are interested in what we do.
Standards bodies (GGF, ...) won't come up with any
common solution soon.
But this doesn't mean HEP shouldn't participate
Raise profile of problems of interest to us
Give practical input based on real-world
experience.

12
How to fix?

Improve information available to Grid scheduler
VO information added in GLUE schema (v1.2)
Need volunteer per batch system to maintain
dynamic plug-ins and the job manager.
CERN will do this for LSF. Need other volunteers!
But there is still an assumption of homogeneous
resources at a site.
There is a plan to start work on GLUE v2 in
November
No requirement for backwards compatibility.
Discussion should start NOW!
But need to assess impact of v1.2 changes before
rushing into anything.
Grid scheduler should pass job resource
requirements to the local resource manager.
Not yet. When? How?
Needs normalisation. Does this need to be per VO?
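
The normalisation point can be illustrated with a minimal sketch. It assumes each class of worker node has a speed factor relative to a reference machine, so a CPU-time request expressed in reference seconds can be converted into a local limit. The names `SPEED_FACTOR` and `to_local_cpu_seconds`, the node classes and the factor values are all hypothetical; nothing here is taken from GLUE or from any particular batch system.

```python
from typing import Dict

# Hypothetical speed factors relative to a reference machine (factor 1.0).
# A real site would derive these from a benchmark; these values are made up.
SPEED_FACTOR: Dict[str, float] = {
    "old_node": 0.5,
    "reference": 1.0,
    "new_node": 2.0,
}


def to_local_cpu_seconds(reference_seconds: float, node_class: str) -> float:
    """Convert a CPU-time request in reference seconds to seconds on a node class."""
    factor = SPEED_FACTOR[node_class]
    return reference_seconds / factor


if __name__ == "__main__":
    request = 3600.0  # one hour of CPU on the reference machine
    for node_class in SPEED_FACTOR:
        print(node_class, to_local_cpu_seconds(request, node_class))
```

Whether such factors would be agreed per site, per VO or globally is exactly the open question on the slide; the sketch only shows the conversion itself.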

Security

14
Security Issues

Sites are still VERY concerned about traceability
of users.
Mechanisms seem to be in place to allow this, but
sites have little practical experience.
c.f. delays for CERN to block user systematically
crashing worker nodes.
Security group have doubts that sites are
fulfilling obligations in terms of log retention.
Security Challenges mooted; these may help
increase confidence.
Whatever, it does NOT seem to be a good idea to
have a portal handling user job requests and
passing these on with a common certificate

Interface Scalability

16
Interface Scalability

IN2P3 example: GridJobManager asks job status
once per minute (even for 15-hour jobs).
5000 queued jobs + 1000 running jobs = 100
queries/s
Being solved by EGEE BLAHP
Caches query response
But
this is a further example of the need for discussion
between sites and developers (IN2P3 is fixing this
issue independently)
are there other similar issues out there?
c.f. LSF targets
Scalability: 5K hosts, 500K active jobs, 100
concurrent users, 1M completed jobs per day
Performance: >90% slot utilisation, 5s max command
response time, 4kB memory/job, master failover
in <5 mins
What are targets for the CE? RB?
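
The IN2P3 figures on this slide follow from simple arithmetic: 6000 jobs each polled once per 60 s gives 100 queries/s. The sketch below reproduces that number and shows one generic way a cache can absorb such polling, by answering per-job status requests from a single bulk snapshot that is refreshed at most once per interval. This is an illustration only and not a description of how BLAHP is implemented; `query_rate`, `CachedStatus` and `fetch_all` are hypothetical names.

```python
import time
from typing import Callable, Dict, Optional


def query_rate(jobs: int, poll_interval_s: float) -> float:
    """Status queries per second when every job is polled once per interval."""
    return jobs / poll_interval_s


class CachedStatus:
    """Serve job status from a snapshot refreshed at most once per ttl_s seconds."""

    def __init__(
        self,
        fetch_all: Callable[[], Dict[str, str]],
        ttl_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_all = fetch_all  # one bulk query to the batch system
        self._ttl_s = ttl_s
        self._clock = clock
        self._snapshot: Dict[str, str] = {}
        self._fetched_at: Optional[float] = None

    def status(self, job_id: str) -> str:
        now = self._clock()
        # Refresh only if there is no snapshot yet or it is older than the TTL.
        if self._fetched_at is None or now - self._fetched_at >= self._ttl_s:
            self._snapshot = self._fetch_all()
            self._fetched_at = now
        return self._snapshot.get(job_id, "UNKNOWN")


if __name__ == "__main__":
    # 5000 queued + 1000 running jobs, each polled once per minute.
    print(query_rate(5000 + 1000, 60.0))
```

With a cache of this kind the load on the batch system depends on the refresh interval rather than on the number of jobs being polled.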

Some other Topics

18
End-to-End Guarantees

The Condor talk raised many interesting points.
One in particular was the (in)ability of the
overall system to offer end-to-end execution
guarantees to the users.
Condor glide-in: pilot job submitted via the
Grid which takes a job from a Condor queue.
Fair enough (modulo security) for system
managers PROVIDED the pilot job expresses the same
resource requests as it advertises in a class-ad
when it starts.
Shouldn't claim to be maximum possible length
then run a short job.
Class ads and GLUE schema are not so different: both
are ways of saying what a node/site can do in a
way that can be used to express (and then match)
requirements.
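
The idea common to class ads and the GLUE schema (resources advertise attributes, requests are matched against them) can be sketched in a few lines. This is a toy matcher, not the ClassAd language and not the GLUE schema: the attribute names (`max_cpu_minutes`, `memory_mb`), the site names and the functions `satisfies` and `match` are all hypothetical.

```python
from typing import Dict, List, Union

Value = Union[int, str]
Ad = Dict[str, Value]


def satisfies(resource: Ad, minimums: Dict[str, int]) -> bool:
    """True if the resource advertises at least the requested amount of each attribute."""
    for attribute, needed in minimums.items():
        offered = resource.get(attribute)
        if not isinstance(offered, int) or offered < needed:
            return False
    return True


def match(resources: List[Ad], minimums: Dict[str, int]) -> List[str]:
    """Names of all resources whose advertisement satisfies the request."""
    return [str(r["name"]) for r in resources if satisfies(r, minimums)]


if __name__ == "__main__":
    resources: List[Ad] = [
        {"name": "site_a", "max_cpu_minutes": 2880, "memory_mb": 1024},
        {"name": "site_b", "max_cpu_minutes": 60, "memory_mb": 2048},
    ]
    # A request for 600 CPU minutes and 512 MB only fits site_a.
    print(match(resources, {"max_cpu_minutes": 600, "memory_mb": 512}))
```

The same check run in the other direction is what the slide asks of a pilot job: the resources it requests from the site should be the ones it later advertises.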

19
Pre-emption & Virtualisation

Strong message from batch system developers that
pre-emption is A GOOD THING. With pre-emption
schedulers can maximise throughput/resource usage
by
suspending many jobs to allow parallel job to run
suspending long running jobs to provide quick
turnround for priority jobs.
Interest in virtualisation as method to ease this
Also discussed at last operations workshop as a
way to ease handling of multiple (conflicting)
requirements for OS versions.
Something to watch.
How would (pre-empted) users like this?
No guarantee of time to completion once job
starts
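
A minimal sketch of the selection step behind pre-emption, assuming a scheduler that frees slots for an incoming job by suspending the lowest-priority running jobs first. The `RunningJob` type, the priority convention (larger means more important) and `jobs_to_suspend` are hypothetical and not taken from any of the batch systems discussed.

```python
from typing import List, NamedTuple


class RunningJob(NamedTuple):
    job_id: str
    priority: int  # larger means more important
    slots: int


def jobs_to_suspend(
    running: List[RunningJob],
    free_slots: int,
    needed_slots: int,
    new_priority: int,
) -> List[str]:
    """Pick lowest-priority jobs to suspend so that needed_slots become free.

    Returns an empty list if no suspension is needed, or if suspending every
    lower-priority job would still not free enough slots.
    """
    if free_slots >= needed_slots:
        return []
    chosen: List[str] = []
    available = free_slots
    for job in sorted(running, key=lambda j: j.priority):
        if job.priority >= new_priority:
            break  # never suspend jobs at or above the incoming priority
        chosen.append(job.job_id)
        available += job.slots
        if available >= needed_slots:
            return chosen
    return []


if __name__ == "__main__":
    running = [
        RunningJob("a", priority=1, slots=1),
        RunningJob("b", priority=5, slots=1),
        RunningJob("c", priority=2, slots=2),
    ]
    print(jobs_to_suspend(running, free_slots=0, needed_slots=3, new_priority=4))
```

The suspended jobs are the ones that lose any guarantee of time to completion, which is the user-side concern raised on the slide.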

20
Push vs Pull

A false dichotomy
Sites can manipulate pull model to create a local
queue
Real issue is early vs. late allocation of task
to resource
Early: site resource utilisation maximised; a
free CPU can be filled immediately with
a job from the local queue
Late: user doesn't see job sent to site A just
before a CPU becomes free at site B.
Questions
Long term, will most cpu resources be full?
What do people want to maximise? Throughput or ?
Efficient scheduling is important anyway;
transparency of grid/local interface will be key.
Pre-emption, anyone?

Conclusion


23
Workshop Summary

Useful workshop, IMHO.
Good that there has been progress since the
November workshop at CERN (GLUE schema update),
but much is still to be done.

24
The Service is the Challenge
25
Workshop Summary

Still need to increase dialogue between site
managers and Grid scheduler developers
Site managers know a lot about running services.
Unfortunate that a meeting change created a clash
and reduced scope for EGEE developers to
participate in Karlsruhe discussions.
A smaller session is pencilled in for HEPiX at
SLAC, October 10th-14th. More dialogue then?
Not too early to start thinking about GLUE v2!
