Title: EDG WP1 Work Load Management System Activities
1EDG WP1 (Work Load Management System)
Activities
- Plans
- elisabetta.ronchieri _at_cnaf.infn.it
2A new era
- Architecture has been revised
- Increase reliability and flexibility of the system
- Simplify the whole system (e.g. minimize duplication of persistent information)
- Make it easy to plug in new components that implement new functionalities
- Address some of the shortcomings that emerged in the first DataGrid testbed
- Favor interoperability with other Grid frameworks, by allowing WP1 modules to be used outside the WP1 WMS as well
- New functionalities are supported
- A coordination between EDG WP1 and PPDG has been established to define common guidelines
3New Functionalities
- Interactive jobs
- Job Checkpointing
- Job Partitioning
- Job Dependencies
- Integration with WP2 Query Optimization Service
- C and Java API, and GUI
- Deployment of Accounting infrastructure over Testbed (HLRs with command line interface)
- Advance reservation API
- Co-allocation API
- RB relying on the GLUE schema
4New features Interactive Jobs
- An interactive job is a job with continuous feedback, i.e. a job for which the user needs the standard streams (stdin, stdout, and stderr) on the UI (submitting) machine.
- The connection between WN and UI is always opened from the job side (we assume outbound IP connectivity is available from WNs).
- We do NOT support
- remote signal sending
- asynchronous interaction with the job
- Possible extensions will be evaluated after the first deployment phase.
- We use an existing tool: Condor Bypass (Grid Console)
- http://www.cs.wisc.edu/condor/bypass
5Bypass What is it? 1/7
- Bypass is a tool for writing interposition agents and split execution systems.
- Most applications communicate with the operating system via a standard library, which converts their procedure calls into appropriate kernel operations.
- An interposition agent is a piece of software that transforms a program's operations by interposing itself between the program and the operating system.
- An interposition agent squeezes itself into an existing program and modifies its behavior.
- So, when the program attempts certain system calls, the agent grabs control and manipulates the results.
- An agent can be used to instrument programs, to attach them to new systems, and to emulate operations that otherwise might not be available.
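The interposition idea can be illustrated outside Bypass with a minimal Python sketch. It is only an analogy: Bypass works at the level of C library calls, whereas here a wrapper is placed in front of os.write. The names make_agent, agent_write and demo are hypothetical.

```python
import os


def make_agent(real_write, trace):
    """Return a stand-in for write() that instruments each call, then delegates."""
    def agent_write(fd, data):
        trace.append((fd, len(data)))   # the agent grabs control first
        return real_write(fd, data)     # then hands over to the real operation
    return agent_write


def demo():
    """Write through the agent into a pipe and read the bytes back."""
    trace = []
    read_end, write_end = os.pipe()
    write = make_agent(os.write, trace)
    written = write(write_end, b"hello")
    os.close(write_end)
    received = os.read(read_end, 16)
    os.close(read_end)
    return written, received, trace
```

The calling code keeps using a function with the same signature as the original one, which is why the program itself does not need to change.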
6Bypass What is it? 2/7
- Bypass allows you to
- Split and dynamically link applications
- Transparently use heterogeneous systems
- Trap calls with minimal overhead
- Control execution paths with plain C
- Combine small agents
- Bypass language
- Declare what procedures to trap in C
- Annotate pointer types with data flow (direction and binary data)
- Give two function bodies: agent_action and shadow_action
- So, for example, the programmer provides a specification listing which system calls are to be trapped and the code that replaces them. Bypass parses the specification and produces C code for an agent.
7Bypass Grid Console (GC) 3/7
- The Grid Console is a system for getting mostly-continuous input/output from remote programs running on an unreliable network
- The GC is robust to many types of failures that can take place in such a context (e.g. crashed machines, partitioned networks, full disks)
- Its first priority is to keep jobs running
- Its second priority is to keep the output moving when conditions permit
- The GC is implemented using Bypass
- GC consists of two software components: an agent and a shadow
- The agent intercepts reads and writes on stdin, stdout and stderr. All other operations are untouched. Reads and writes on these streams are forwarded to the shadow for execution.
8Bypass Example 4/7
- File simple.bypass

    ssize_t write
    (
        int fd,
        in "length" const void *data,
        size_t length
    )
    agent_action
    {{
        if (fd < 3)
            return bypass_shadow_write(fd, data, length);
        else
            return write(fd, data, length);
    }}
    shadow_action
    {{
        return write(fd, data, length);
    }}
9Bypass agent_action and shadow_action 5/7
- An agent action
- Is arbitrary C code
- When a program invokes write(), the agent_action is executed at the foreign machine
- Within the agent_action
- write() invokes the original write() at the foreign machine
- bypass_shadow_write() invokes the shadow action via RPC
- A shadow action
- Is arbitrary C code
- If the agent decides to invoke the RPC to the shadow, the shadow_action is executed at the home machine
- Within the shadow_action
- write() invokes write() at the home machine
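The routing performed by the two actions can be sketched in Python as a rough simulation. It assumes, as in the simple.bypass example, that file descriptors below 3 (the standard streams) go to the shadow and everything else stays local. The RPC is replaced by a plain function call, and the dictionaries home_streams and foreign_files are hypothetical stand-ins for the two machines.

```python
def shadow_action(home_streams, fd, data):
    """Runs at the home machine: append the bytes to the user's stream."""
    home_streams.setdefault(fd, bytearray()).extend(data)
    return len(data)


def agent_action(fd, data, home_streams, foreign_files):
    """Runs next to the job: decide where each write() is executed."""
    if fd < 3:
        # Standard streams: forward to the shadow (an RPC in real Bypass).
        return shadow_action(home_streams, fd, data)
    # Any other descriptor: perform the write where the job runs.
    foreign_files.setdefault(fd, bytearray()).extend(data)
    return len(data)
```

In both branches the caller receives the number of bytes written, so the job cannot tell whether its output went to a local file or to the home machine.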
10Bypass How to use it! 6/7
- Run bypass to read the specification and produce C source code
- bypass agent shadow simple.bypass
- The shadow is compiled into a plain executable
- The agent is compiled into a shared library
- The dynamic linker is used to force the agent into an executable at run-time
- setenv LD_PRELOAD simple_agent.so
- export LD_PRELOAD=simple_agent.so
- Procedure calls are trapped merely by putting the agent first in the link list
- This method can be used on any dynamically-linked program: tcsh, emacs, ...
11Can Bypass be used by a real user? 7/7
- Bypass works on unmodified executables.
- Real users are not willing/able to rewrite/recompile their programs
- Bypass requires no special privileges
- Real users do not have the root password
- So, Bypass allows a real user to make good use of a remote machine without begging the administrator to configure it to his/her needs.
12How to use Bypass GC in WP1 1/2
- A Job Shadow is the Grid Console Shadow running on the UI machine.
- A Pillow process is a process started on the WN just before the job; it intercepts the job's standard streams.
- The Pillow process is linked against a Job Agent, which is a slightly modified Grid Console Interposition Agent.
13How to use Bypass GC in WP1 2/2
- Job submission goes through the usual command (dg-job-submit)
- The attribute JobType is set to Interactive.
- Other attributes are
- ShadowPort (not mandatory)
- ShadowHost (always filled in by the UI)
- UI starts the Job Shadow process on the submitting machine, at the specified port
- UI writes the ShadowPort and ShadowHost values in LB
14In case of crash at the UI side
- dg-job-attach <jobID>
- If the job is still running, reads ShadowPort from LB
- Restarts the shadow on that port
- If the port is not available, starts the shadow on a different port and stores it in LB
- On the WN the agent retries to contact the shadow
- After a number of failures, it queries the LB for the ShadowPort
- If it has changed, it tries to contact the shadow at the new port
- If it fails again, it gives up and the job is aborted
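The agent-side recovery steps listed above can be sketched as follows. This is a minimal illustration, not the WP1 implementation. The callables try_connect and query_lb_port, and the default of three attempts, are assumptions; the slide only says "a number of failures". The sketch also assumes the agent gives up when the port stored in LB has not changed.

```python
def reconnect(try_connect, query_lb_port, known_port, max_failures=3):
    """Return the port on which the shadow was reached, or None to abort the job."""
    # Retry the port the agent already knows.
    for _ in range(max_failures):
        if try_connect(known_port):
            return known_port
    # After repeated failures, ask LB whether the shadow moved.
    current_port = query_lb_port()
    if current_port != known_port and try_connect(current_port):
        return current_port
    # Unchanged port or another failure: give up.
    return None
```

For instance, reconnect(lambda p: p == 9001, lambda: 9001, 9000) models a shadow that was restarted on port 9001 after the original port 9000 became unavailable.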
15New Features Job checkpointing
- Checkpointing a job during its execution means saving its state, so that the job execution can be suspended and resumed later, starting from the same point where it was previously stopped.
- The idea is to provide users with a trivial checkpointing service: through a proper API, a user can save the state of a job at any moment during its execution. The hypothesis is, of course, that the job can be restarted from an intermediate state.
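What "saving the state" and "restarting from an intermediate state" mean for a simple job can be sketched as below. This is not the WP1 checkpointing API: the state layout, the JSON file and the names process and demo are assumptions made for illustration.

```python
import json
import os
import tempfile


def process(items, state_path, stop_after=None):
    """Sum items, saving state after each one; resume from state_path if it exists."""
    state = {"next": 0, "total": 0}
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)            # restart from the saved state
    done = 0
    while state["next"] < len(items):
        if stop_after is not None and done == stop_after:
            return None                     # simulated suspension
        state["total"] += items[state["next"]]
        state["next"] += 1
        with open(state_path, "w") as f:
            json.dump(state, f)             # checkpoint
        done += 1
    return state["total"]


def demo():
    """Suspend after two items, then resume and finish."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "job.state")
        first = process([1, 2, 3, 4], path, stop_after=2)
        second = process([1, 2, 3, 4], path)
    return first, second
```

The second call does not reprocess the first two items. It reads the saved position and running total and continues from there.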
16New features Job Partitioning
- Job partitioning takes place when a job has to process a large set of independent elements.
- In these cases it may be worthwhile to decompose the job into smaller sub-jobs (which can be executed in parallel), in order to reduce the overall time needed to process all these elements and to optimize the usage of all available Grid resources.
- At the end each sub-job must save a final state, which is then retrieved by a job aggregator responsible for collecting the results of the sub-jobs and producing the overall output.
- This problem has been addressed in the context of job checkpointing and makes extensive use of the DAGMan mechanism.
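The decompose-then-aggregate pattern can be sketched in a few lines of Python. The functions partition, run_subjob and aggregate are hypothetical names, and summing squares stands in for whatever per-element processing a real job does.

```python
def partition(elements, n_subjobs):
    """Split independent elements into n_subjobs interleaved chunks."""
    return [elements[i::n_subjobs] for i in range(n_subjobs)]


def run_subjob(chunk):
    """Process one chunk and return its final state (here: a partial sum)."""
    return sum(x * x for x in chunk)


def aggregate(final_states):
    """Job aggregator: combine the sub-job states into the overall output."""
    return sum(final_states)
```

Because the elements are independent, the aggregated result does not depend on how many sub-jobs are used; only the elapsed time does.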
17New features Job Dependencies
- A job dependency exists when the execution of a program Y cannot start before the program X has successfully finished.
- We consider just temporal dependencies (e.g. run job Y only when job X has finished).
- We are investigating whether there are other kinds of dependencies.
- It is based on Condor DAGMan
- http://www.cs.wisc.edu/condor/dagman
18DAGMan Meta-Scheduler
- DAGMan means Directed Acyclic Graph Manager
- DAGMan is an existing solution to handle inter-job dependencies. It handles a set of jobs that must be run in a certain order.
- (e.g., "Don't run job Y until job X has completed successfully", so there is a time order to preserve)
- DAGMan navigates the graph, determines which graph nodes are free of dependencies, and follows the execution of the corresponding jobs.
- DAGMan is a product developed within the Condor project
- A DAGMan process is started by Condor-G for each DAG submitted to it.
19DAGMan What's a DAG? 1/2
- A DAG is the data structure used by DAGMan to represent these dependencies.
- Each job (program) is a node in the DAG.
- Each node can have any number of parent or children nodes, as long as there are no loops!
- Dependencies are represented by contiguous segments called arcs
- The arcs are directed, since there is a clear time order in which jobs should be run.
- Each node consists of three parts
- A PRE-script, which is executed before the user's job is run
- A user's job
- A POST-script, which is executed after the user's job has run
20DAGMan What's a DAG? 2/2
- The jobs (nodes) are independent: each one has its own executable, input, output, running environment, requirements, and so on.
- A DAG node fails if any of these three parts fails
- A whole DAG succeeds if and only if all its member jobs succeed
Job Z is executed only after both Job Y and W are completed. In turn, Y and W both have to wait for X to be completed before being started.
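How a scheduler can derive a run order from the arcs is sketched below for the X, Y, W, Z example. It only illustrates "find the nodes that are free of dependencies"; it is not DAGMan's actual algorithm. The function name run_order and the grouping into waves are assumptions.

```python
def run_order(nodes, arcs):
    """Group nodes into waves; a node is ready once all its parents are done."""
    parents = {n: set() for n in nodes}
    for parent, child in arcs:
        parents[child].add(parent)
    done, waves = set(), []
    while len(done) < len(nodes):
        ready = sorted(n for n in nodes if n not in done and parents[n] <= done)
        if not ready:
            raise ValueError("loop detected: the graph is not a DAG")
        waves.append(ready)
        done.update(ready)
    return waves


# Arcs of the example: X precedes Y and W, which both precede Z.
EXAMPLE_ARCS = [("X", "Y"), ("X", "W"), ("Y", "Z"), ("W", "Z")]
```

Calling run_order(["X", "Y", "W", "Z"], EXAMPLE_ARCS) groups Y and W in the same wave, which reflects that they may run in parallel once X has completed.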
21How a user can define a DAG 1/2
- A DAG is specified via JDL.
- A DAG consists of a ClassAd, where the attribute JobType is set to DAG, containing a set of ClassAd attributes, each one representing a job.
- Arcs <array of pairs of strings> (each pair of strings is an arc)
- PreScript <string> (the script to run before job execution)
- PreScriptArguments <array of strings> (the list of arguments for the PRE-script)
- PostScript <string> (the script to run after the job has completed)
- PostScriptArguments <array of strings> (the arguments for the POST-script)
22Example of DAG 2/2
    [
      JobType = "DAG";
      JA = [
        Executable = "JA.sh";
        PreScript = "PreJA.sh";
        PreScriptArguments = { "1" };
      ];
      JB = [
        Executable = "JB.sh";
        PostScript = "PostJB.sh";
        PostScriptArguments = { "RETURN" };
      ];
      JC = [
        Executable = "JC.sh";
      ];
      JD = [
        Executable = "JD.sh";
        PreScript = "PreJD.sh";
        PostScript = "PostJD.sh";
      ];
    ]
The RETURN macro represents the exit status of JB.sh. In general, an exit status other than zero implies that the node, and hence the whole DAG, has failed.
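The success rule (a node fails if any of its parts fails, and the DAG succeeds only if every node does) can be written down as a small Python check. The tuple layout (PRE-script, job, POST-script exit statuses) and the use of None for a script that is not defined are assumptions of this sketch.

```python
def node_succeeds(pre_status, job_status, post_status):
    """A node succeeds when every part that exists exited with status 0."""
    parts = (pre_status, job_status, post_status)
    return all(status == 0 for status in parts if status is not None)


def dag_succeeds(node_statuses):
    """The whole DAG succeeds if and only if all member nodes succeed."""
    return all(node_succeeds(*parts) for parts in node_statuses.values())
```

With this rule, a single non-zero exit status anywhere in the graph is enough to mark the DAG as failed.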
23What operations can a user perform on DAGs
- dg-job-submit
- Submits a DAG.
- dg-job-cancel
- Kills a previously submitted DAG.
- All the jobs part of the DAG get killed.
- A rescue DAG is produced.
- dg-job-status
- Returns the current status of the DAG.
- dg-job-get-output
- Retrieves the output sandbox for all the DAG
member jobs, assuming that the DAG has completed.
24New features Integration with WP2 Query Optimization Service
- Help the RB find the best CE based on data location.
- The RB will use the access cost estimation APIs provided by WP2
- Trigger of input data transfer
- Up to now, users have had to copy all input data to where it is expected; there is no automatic local fetching of frequently accessed files
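Choosing a CE from access cost estimates could look roughly like the sketch below. The callable access_cost stands in for the WP2 cost estimation API, whose real interface is not described here, and summing the per-file costs is an assumption.

```python
def best_ce(candidate_ces, input_files, access_cost):
    """Pick the CE with the lowest total estimated cost of reaching the input files."""
    def total_cost(ce):
        return sum(access_cost(ce, f) for f in input_files)
    return min(candidate_ces, key=total_cost)
```

A CE that sits close to replicas of the input data gets a low total cost and is therefore preferred.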
25New features C and Java API, and GUI
- The C/Java API provides a series of actions on a job or a collection of jobs, such as submitting, looking for a matching resource, getting the status and the logging info, retrieving the output files, and cancelling a running job. Moreover, the package makes it possible to manage proxy certificates and to create JDL files.
- The GUI allows the user to
- Monitor the status of one or more jobs during its/their life cycle
- Create and manage graphically, step by step, a syntax-error-safe JDL file
- The GUI exploits the Java API package. (There is also one in Python)
26New features Deployment of Accounting infrastructure over Testbed
- Based upon a computational economy model: users pay in order to execute their jobs on the resources, and the owners of the resources earn credits by executing the user jobs.
- There are two reasons for this
- To have a nearly stable equilibrium able to satisfy the needs of both resource providers and consumers
- To credit job resource usage to the resource owner(s) after execution
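The credit flow of such an economy model can be sketched as a simple ledger update. This does not describe the HLR implementation: the function settle, the dictionary of balances and the refusal of jobs that exceed the user's credits are assumptions.

```python
def settle(balances, user, owner, cost):
    """After execution, move `cost` credits from the user to the resource owner."""
    if cost < 0:
        raise ValueError("cost must be non-negative")
    if balances.get(user, 0) < cost:
        raise ValueError("insufficient credits")   # assumed policy
    balances[user] -= cost
    balances[owner] = balances.get(owner, 0) + cost
    return balances
```

The total number of credits is conserved by each settlement; they only move from consumers to providers.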
27New features Advance reservation API
- Advance reservation of resources makes it possible to achieve end-to-end quality of service (QoS) and to reduce competition for resources.
- The approach is based on concepts discussed in the Global Grid Forum.
- A reservation is a promise from the system that an application will receive a certain level of service from a resource (e.g., a reservation may promise a given percentage of a CPU).
- The Advance reservation API is composed of
- The Reservation Agent API, which accepts a generic reservation from a user, maps it into a reservation on a specific resource, matches the requirements and preferences specified by the user, performs the allocation on the specific resource, and allows the user to use a granted reservation for his/her job.
- The Resource-Dependent Reservation Agent API, which creates a reservation for the user's request, binds a reservation to run-time parameters, unbinds a reservation, cancels a reservation, modifies the parameters associated with a reservation, and returns the status of the resource reservation.
28How can a user request a resource reservation? 1/2
- A resource reservation request is specified via JDL.
- The attribute Type is set to Reservation.
- The other attributes are
- ReservationResource (type of underlying resource)
- ReservationType (used in case a resource supports different types of reservation)
- ReservationStart (specifies the time when the reservation may begin)
- ReservationEnd (specifies the time when the reservation can expire)
- ReservationDuration (specifies how long the reservation lasts)
- ReservationParameters (specifies resource-dependent parameters)
- Not all the attributes are mandatory: the default values of ReservationStart and ReservationEnd are now and end time, respectively.
29Example of resource reservation request 2/2
- Reservation request for three nodes for 300 seconds on a CE running Linux, whose architecture is i386

    [
      Type = "Reservation";
      ReservationResource = "computing";
      ReservationStart = 1021539656;
      ReservationEnd = 1021541000;
      ReservationDuration = 300;
      ReservationParameters = [
        nodes = 3;
        ..
      ];
      Requirements = other.Arch == "i386" && other.OpSys == "Linux" && other.SupportReservation;
    ]

- The time is an integer value expressing the number of seconds since the epoch, which corresponds to midnight of the 1st of January 1970 UTC.
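The epoch-based time attributes can be handled with the Python standard library, as in the short sketch below. The consistency rule used here (the duration must fit between ReservationStart and ReservationEnd) is an assumed reading of the attribute descriptions, and the function names are hypothetical.

```python
from datetime import datetime, timezone


def as_utc(seconds_since_epoch):
    """Convert an integer number of seconds since the epoch to a UTC datetime."""
    return datetime.fromtimestamp(seconds_since_epoch, tz=timezone.utc)


def window_is_consistent(start, end, duration):
    """Assumed rule: the reservation must fit between its start and end times."""
    return start < end and 0 < duration <= end - start
```

With the values of the example request, the window between start and end is 1344 seconds long, so a 300-second reservation fits inside it.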
30New features Co-allocation API
- Co-allocation allows the concurrent allocation of multiple resources.
- These resources can be homogeneous or heterogeneous.
- The Co-allocation API is composed of
- The Co-allocation Agent API, which accepts a co-allocation request from a user, discovers resources compatible with the requirements and preferences included in all the resource descriptions, finds compatible combinations of resources that would satisfy the co-allocation request, and tries each combination
- The Application Programming Interface, which creates a co-allocation, cancels a co-allocation (cancelling all the reservations belonging to the specified co-allocation), modifies the allocation, and returns the status of a co-allocation.
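The "discover, combine, try each combination" behaviour of the Co-allocation Agent can be sketched with itertools.product. This is not the WP1 API: resources are modelled as a name-to-attributes dictionary, each request as a predicate, and try_reserve is a hypothetical callable that attempts to reserve a whole combination.

```python
from itertools import product


def coallocate(requests, resources, try_reserve):
    """Return the first combination of resource names that can be reserved, or None."""
    # Discovery: for each request, the resources compatible with it.
    candidates = [
        [name for name, attrs in resources.items() if matches(attrs)]
        for matches in requests
    ]
    # Try each compatible combination in turn.
    for combination in product(*candidates):
        if try_reserve(combination):
            return combination
    return None
```

If any request has no compatible resource, product yields nothing and the function returns None without attempting a reservation.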
31How can a user request a co-allocation? 1/2
- A co-allocation request is specified via JDL.
- The attribute Type is set to coallocation.
- The other attributes are
- ReservationResource (type of underlying resource)
- ReservationType (used in case a resource supports different types of reservation)
- ReservationStart (specifies the time when the reservation may begin)
- ReservationEnd (specifies the time when the reservation can expire)
- ReservationDuration (specifies how long the reservation lasts)
- ReservationParameters (specifies resource-dependent parameters)
- Not all the attributes are mandatory: the default values of ReservationStart and ReservationEnd are now and end time (infinite), respectively.
32Example of co-allocation request 2/2
- Co-allocation request for a computing node, 100 GB of storage in an SE speaking a certain protocol (gridFTP), and a connection between the considered CE and SE of 10 MB/s.

    [
      Type = "coallocation";
      ReservationStart = 102224828;
      ReservationEnd = 1022255428;
      ReservationDuration = 3600;
      Res1 = [
        Type = "Reservation";
        ReservationResource = "computing";
        ReservationParameters = [ nodes = 3 ];
        Requirements = other.Arch == "i386" && other.OpSys == "Linux" && other.SupportReservation;
        InputData = "LF:testbed0-00019";
        ReplicaCatalog = "ldap://sunlab2g.cnaf.infn.it:2010/rc=INFN Test RC, dc=sunlab2g, dc=cnaf, dc=infn, dc=it";
      ];
      Res2 = [
        Type = "Reservation";
        ReservationResource = "storage";
        ReservationParameters = [ space = 100000 ];
        Requirements = other.Protocol == "gridftp" && other.FreeSpace > ReservationParameters.space && other.SupportReservation;
      ];
    ]
33New features RB relying on the GLUE schema
- Use the new CE schema for interoperability
between EU Grid Project and US HEP Grid Projects