Title: Workload Management
1Workload Management
David Colling Imperial College London
2- Release 2 is not based on release 1
- Whole new architecture (pretty much described in D1.4)
- More modular
- I have little practical experience of this new architecture (yet).
3So what is the new architecture?
See D1.4 for details
4The architecture
User Interface: Although there have been several
changes to the architecture, the commands
available at the user end are (almost) the same;
they are now edg-job-submit etc. There are also now APIs.
Network Server: The Network Server is a generic network
daemon, responsible for accepting incoming
requests from the UI (e.g. job submission, job
removal), which, if valid, are then passed to the
Workload Manager.
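
The accept-then-forward role of the Network Server can be sketched as below. This is a minimal illustration in Python, not the WP 1 code: the request kinds, the dictionary layout, and the names WorkloadManagerQueue and accept_request are all assumptions made for the example.

```python
from collections import deque

# Hypothetical request kinds; the slide names job submission and job removal.
VALID_KINDS = {"submit", "remove"}


class WorkloadManagerQueue:
    """Stand-in for the hand-off point to the Workload Manager."""

    def __init__(self):
        self.pending = deque()

    def enqueue(self, request):
        self.pending.append(request)


def accept_request(request, wm_queue):
    """Check an incoming UI request and forward it only if it is valid."""
    kind = request.get("kind")
    if kind not in VALID_KINDS:
        return False
    # A submission needs a job description; a removal needs a job identifier.
    if kind == "submit" and not request.get("jdl"):
        return False
    if kind == "remove" and not request.get("job_id"):
        return False
    wm_queue.enqueue(request)
    return True
```

Invalid requests never reach the queue, which mirrors the "if valid, passed to the Workload Manager" rule on the slide.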
5The architecture
Workload Manager: The Workload Manager is the
core component of the Workload Management System.
Given a valid request, it has to take the
appropriate actions to satisfy it. To do so, it
may need support from other components, which are
specific to the different request types.
6The architecture
- Resource Broker
- This has been turned into one of the modules that help the Workload Manager; actually 3 sub-modules:
- Matchmaking
- Ranking
- Scheduling
- Job Adapter
- The Job Adapter puts the finishing touches to the job's JDL and creates the job wrapper.
7The architecture
Job Controller and CondorG: Actually submit the job
to the resources and track progress.
So how does this all work?
8Job submission example (for a simple job)
[Figure: components in the job submission example: RB node, Network Server, Workload Manager, Job Contr. - CondorG, Replica Catalog, Inform. Service, Computing Element, Storage Element, with CE and SE characteristics and status]
9Job submission
- edg-job-submit myjob.jdl
- myjob.jdl
- JobType = "Normal";
- Executable = "$(CMS)/exe/sum.exe";
- InputData = "LF:testbed0-00019";
- ReplicaCatalog = "ldap://sunlab2g.cnaf.infn.it:2010/rc=WP2 INFN Test Replica Catalog,dc=sunlab2g, dc=cnaf, dc=infn, dc=it";
- DataAccessProtocol = "gridftp";
- InputSandbox = {"/home/user/WP1testC", "/home/file", "/home/user/DATA/"};
- OutputSandbox = {"sim.err", "test.out", "sim.log"};
- Requirements = other.GlueHostOperatingSystemName == "linux" && other.GlueHostOperatingSystemRelease == "Red Hat 6.2" && other.GlueCEPolicyMaxWallClockTime > 10000;
- Rank = other.GlueCEStateFreeCPUs;
Job Status: submitted
UI: allows users to access the functionalities of the WMS
JDL: Job Description Language, used to specify job characteristics and requirements
[Figure: job submission diagram with the same components as on the previous slide]
10NS: network daemon responsible for accepting incoming requests
[Figure: job submission diagram, showing the Job and its Input Sandbox files at the Network Server, and RB storage on the RB node]
11WM: responsible for taking the appropriate actions to satisfy the request
[Figure: job submission diagram, showing the Job at the Workload Manager]
12Where must this job be executed?
[Figure: job submission diagram, showing the Match-maker next to the Workload Manager]
13Matchmaker: responsible for finding the best CE to which to submit a job
[Figure: job submission diagram, showing the Match-Maker/Broker next to the Workload Manager]
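
The matchmaking step can be sketched using the Requirements and Rank expressions of the example myjob.jdl. This is a minimal Python illustration, not the WP 1 matchmaker: the CE records are plain dictionaries, and the names matches, rank and choose_ce are hypothetical. It assumes that the candidate CE with the highest Rank value is the one selected.

```python
def matches(ce):
    """The example job's Requirements, written as a Python predicate."""
    return (
        ce["GlueHostOperatingSystemName"] == "linux"
        and ce["GlueHostOperatingSystemRelease"] == "Red Hat 6.2"
        and ce["GlueCEPolicyMaxWallClockTime"] > 10000
    )


def rank(ce):
    """The example job's Rank: prefer CEs with more free CPUs."""
    return ce["GlueCEStateFreeCPUs"]


def choose_ce(ces, requirements=matches, rank_fn=rank):
    """Filter CEs by the requirements, then pick the highest-ranked one."""
    candidates = [ce for ce in ces if requirements(ce)]
    if not candidates:
        return None  # no CE satisfies the job's requirements
    return max(candidates, key=rank_fn)
```

Filtering first and ranking second keeps the two JDL expressions separate: Requirements decides which CEs are eligible, Rank only orders the eligible ones.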
14Where (on which SEs) are the needed data?
What is the status of the Grid?
[Figure: job submission diagram, showing the Match-Maker/Broker together with the Replica Catalog and the Inform. Service]
15CE choice
[Figure: job submission diagram, showing the CE choice made by the Match-maker]
16JA: responsible for the final touches to the job before performing submission (e.g. creation of wrapper script, etc.)
[Figure: job submission diagram, showing the Job Adapter between the Workload Manager and the Job Contr. - CondorG]
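
As a rough illustration of the "creation of wrapper script" step, the sketch below builds a small shell wrapper around a user executable. Everything here is an assumption made for the example: the function name make_wrapper, the default file names, and the wrapper's contents. The actual wrapper produced by the Job Adapter is not described in these slides.

```python
import shlex


def make_wrapper(executable, arguments=(), stdout="job.out", stderr="job.err"):
    """Return the text of a shell script that runs the user's executable."""
    # Quote every part so that spaces or shell metacharacters stay literal.
    command = " ".join(shlex.quote(part) for part in (executable, *arguments))
    lines = [
        "#!/bin/sh",
        "# Hypothetical wrapper: run the job, capture its output, keep its exit code.",
        f"{command} > {shlex.quote(stdout)} 2> {shlex.quote(stderr)}",
        "status=$?",
        "exit $status",
    ]
    return "\n".join(lines) + "\n"
```

For example, make_wrapper("./sum.exe", ["10"]) yields a script whose command line is `./sum.exe 10 > job.out 2> job.err`.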
17JC: responsible for the actual job management operations (done via CondorG)
[Figure: job submission diagram, showing the Job at the Job Contr. - CondorG]
18[Figure: job submission diagram, showing the Job and its Input Sandbox files at the Computing Element]
19Grid enabled data transfers/accesses
[Figure: job submission diagram, showing the Input Sandbox, the Computing Element and the Storage Element]
20[Figure: job submission diagram, showing the Output Sandbox files]
21Job submission
edg-job-get-output <dg-job-id>
[Figure: job submission diagram, showing the Output Sandbox]
22Job Status: submitted, waiting, ready, scheduled, running, done, cleared
[Figure: job submission diagram, with each status shown next to the component responsible for it, and the Output Sandbox files]
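
The job statuses on this slide can be read as a simple sequence. The sketch below encodes that reading in Python; it assumes the statuses are visited strictly in the order listed for a successful job, and the names ORDER, next_status and can_move are hypothetical. Failure paths are not modelled.

```python
# Statuses in the order they appear on the slide (assumed success path).
ORDER = ["submitted", "waiting", "ready", "scheduled", "running", "done", "cleared"]


def next_status(current):
    """Return the status that follows `current`, or None at the end."""
    index = ORDER.index(current)  # raises ValueError for an unknown status
    if index == len(ORDER) - 1:
        return None
    return ORDER[index + 1]


def can_move(current, target):
    """True only if `target` is the immediate successor of `current`."""
    return next_status(current) == target
```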
23Logging and bookkeeping.
edg-job-status <dg-job-id>
LB: receives and stores job events; processes the corresponding job status
LM: parses the CondorG log file (where CondorG logs info about jobs) and notifies LB
[Figure: Job status, Logging & Bookkeeping, Log Monitor, Log of job events]
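
The idea of storing job events and deriving a status from them can be sketched as follows. This is an illustration only: the event names in EVENT_TO_STATUS are invented for the example, as are the class name Bookkeeping and its methods. The sketch assumes the status is determined by the most recent event.

```python
from collections import defaultdict

# Hypothetical event names, mapped to the job statuses used in this talk.
EVENT_TO_STATUS = {
    "accepted": "submitted",
    "enqueued": "waiting",
    "matched": "ready",
    "transferred": "scheduled",
    "started": "running",
    "finished": "done",
    "output_retrieved": "cleared",
}


class Bookkeeping:
    """Stores events per job and computes the status from the latest one."""

    def __init__(self):
        self._events = defaultdict(list)

    def log_event(self, job_id, event):
        if event not in EVENT_TO_STATUS:
            raise ValueError(f"unknown event: {event}")
        self._events[job_id].append(event)

    def events(self, job_id):
        return list(self._events.get(job_id, []))

    def status(self, job_id):
        history = self._events.get(job_id)
        if not history:
            return None  # nothing has been logged for this job
        return EVENT_TO_STATUS[history[-1]]
```

Keeping the full event list, rather than only the current status, is what allows the history of a job to be inspected after the fact.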
24New functionality
- Release 2 of WP 1 software
- New functionality includes
- MPI job submission
- User APIs
- Accounting infrastructure (Management have decided not to deploy this for testbed 2)
- Interactive job support
- Job logical checkpointing
25New functionality
All these are implemented. Specify which sort of
job using the JobType classad, e.g. JobType = "Checkpointable".
However, they have only been tested on the WP 1
testbed as yet.
I don't have time to go through all of these so
will just go through checkpointing.
26Job checkpointing scenario
[Figure: RB node with Network Server, Workload Manager, Logging & Bookkeeping Server and Job Contr. - CondorG]
27Job Status
- edg-job-submit jobchkpt.jdl
- jobchkpt.jdl
- JobType = "Checkpointable";
- Executable = "hsum.exe";
- StdOutput = "Outfile";
- InputSandbox = "/home/user/hsum.exe";
- OutputSandbox = "Outfile";
- Requirements = member("ROOT", other.GlueHostApplicationSoftwareRunTimeEnvironment) && member("CHKPT", other.GlueHostApplicationSoftwareRunTimeEnvironment);
- Rank = -other.GlueCEStateEstimatedResponseTime;
Job Status: submitted
UI: allows users to access the functionalities of the WMS
JDL: Job Description Language, used to specify job characteristics and requirements
[Figure: checkpointing scenario diagram: RB node, Replica Catalog, Network Server, Workload Manager, Logging & Bookkeeping Server, Job Contr. - CondorG]
28[Figure: checkpointing scenario with numbered steps 1 to 6 involving the Network Server, Match-maker, Workload Manager, RB storage, Job Adapter and Job Contr. - CondorG; the Job and its Input Sandbox files are shown moving between them]
29From time to time the user's job asks to save the intermediate state:
<save intermediate files>
State.saveValue(var1, value1)
...
State.saveValue(varn, valuen)
State.saveState()
[Figure: checkpointing scenario diagram: RB node, Network Server, Workload Manager, Logging & Bookkeeping Server, RB storage, Job Contr. - CondorG]
30Saving of intermediate files
Saving of job state
[Figure: checkpointing scenario diagram, as on the previous slide]
31Job fails (e.g. for a CE problem)
[Figure: checkpointing scenario diagram, showing Computing Element X and Computing Element Y]
32Where must this job be executed? Possibly on a CE different from the one where the job was previously submitted.
Reschedule and resubmit job
[Figure: checkpointing scenario diagram, showing the Job back at the Workload Manager and Match-maker]
33CE choice: CEy
[Figure: checkpointing scenario diagram, showing the CE choice made by the Match-maker]
34[Figure: checkpointing scenario diagram, showing the Job at the Job Adapter and the Job Contr. - CondorG]
35[Figure: checkpointing scenario diagram, showing the Job and its Input Sandbox files]
36Retrieval of last saved state when job starts
Retrieval of intermediate files (previously saved)
Job Status: scheduled, done (failed), waiting, ready, scheduled
[Figure: checkpointing scenario diagram]
37Job keeps running starting from the point corresponding to the retrieved state (doesn't need to start from the beginning)
Job Status: scheduled, done (failed), waiting, ready, scheduled
[Figure: checkpointing scenario diagram, showing the Job]
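
The whole checkpointing scenario (save state from time to time, fail, restart from the last saved state) can be sketched in Python as below. This is not the WP 1 checkpointing API: the State class here is a hypothetical stand-in that persists to a local JSON file, whereas the slides only show calls named State.saveValue and State.saveState. The function checkpointed_sum and its fail_at parameter are likewise invented to simulate a CE failure.

```python
import json
import os


class State:
    """Hypothetical stand-in for the job state object shown on the slides."""

    def __init__(self, path):
        self.path = path
        self.values = {}
        # Retrieve the last saved state, if there is one, when the job starts.
        if os.path.exists(path):
            with open(path) as handle:
                self.values = json.load(handle)

    def save_value(self, name, value):
        self.values[name] = value

    def get_value(self, name, default=None):
        return self.values.get(name, default)

    def save_state(self):
        # Write to a temporary file first so a crash cannot corrupt the state.
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w") as handle:
            json.dump(self.values, handle)
        os.replace(tmp_path, self.path)


def checkpointed_sum(numbers, state, fail_at=None):
    """Sum `numbers`, saving progress after every element."""
    index = state.get_value("index", 0)
    total = state.get_value("total", 0)
    while index < len(numbers):
        if fail_at is not None and index == fail_at:
            raise RuntimeError("simulated CE failure")
        total += numbers[index]
        index += 1
        state.save_value("index", index)
        state.save_value("total", total)
        state.save_state()
    return total
```

A second run that is given a State pointing at the same file resumes from the stored index, so elements already summed are not processed again.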
38Further additional functionality
The order of implementation is not up to WP 1 people.
Dependent jobs: using Condor DAGMan.
For example:
39Further additional functionality
A = [ Executable = "A.sh"; PreScript = "PreA.sh"; PreScriptArguments = { "1" }; Children = { "B", "C" } ]
B = [ Executable = "B.sh"; PostScript = "PostA.sh"; PostScriptArguments = { "RETURN" }; Children = { "D" } ]
C = [ Executable = "C.sh"; Children = { "D" } ]
D = [ Executable = "D.sh"; PreScript = "PreD.sh"; PostScript = "PostD.sh"; PostScriptArguments = { "1", "a" } ]
40Further additional functionality
Job partitioning will be similar to
checkpointing, with the jobs being partitioned
according to some variable. Partitioned jobs
will also have a pre-job and aggregator e.g.
41Further additional functionality
JobType = "Partitionable"; Executable = ...; JobSteps = ...; StepWeight = ...; Requirements = ...; ...
Prejob = [ Executable = ...; Requirements = ...; ... ]
Aggregator = [ Executable = ...; Requirements = ...; ... ]
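
The pre-job, partitioned steps, and aggregator pattern can be sketched as below. This is a loose Python illustration under stated assumptions: the steps are split into contiguous, roughly equal chunks, StepWeight is ignored, and the partitions run one after another rather than on separate CEs. The names partition_bounds and run_partitioned are hypothetical.

```python
def partition_bounds(n_steps, n_parts):
    """Split range(n_steps) into n_parts contiguous (start, stop) chunks."""
    return [
        (part * n_steps // n_parts, (part + 1) * n_steps // n_parts)
        for part in range(n_parts)
    ]


def run_partitioned(prejob, step, aggregator, n_steps, n_parts):
    """Run the pre-job once, each partition's steps, then the aggregator."""
    context = prejob()
    partial_results = []
    for start, stop in partition_bounds(n_steps, n_parts):
        # Each partition handles its own slice of the job steps.
        partial_results.append([step(context, i) for i in range(start, stop)])
    return aggregator(partial_results)
```

For instance, with a pre-job returning 10, a step computing context + i, and an aggregator summing every partial result, five steps in two partitions give 10 + 11 + 12 + 13 + 14 = 60.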
42Further additional functionality
Also planned are advance reservation of resources
and co-location, plus much more monitoring and
performance quantification.
43- Summary
- New architecture has been implemented
- Lots of new functionality but not stress tested
- Further functionality and performance quantification implemented by testbed 3.
44Further into the future
EDG will not use OGSA; however, the future is in
the OGSA grid world. Work is being done at LeSC
(see Steven Newhouse's talk tomorrow) to wrap the
WP 1 components.
- Communication via JDML and LBML
- Virtualisation of RB through OGSA factory
- Use virtualisation to load balance
- Increase interoperability