Title: Introduction for Jobs Submission
1Introduction for Jobs Submission
- Giuseppe LA ROCCA
- giuseppe.larocca_at_ct.infn.it
- INFN Catania, ITALY
2Outline
- An introduction to the WMS and JDL
- The gLite WMS architecture
- The Command Line Interface (CLI)
- Advanced jobs
- References
- Hands-on
3Overview of gLite Middleware
4Overview
- The Workload Management System (WMS) is the gLite 3 component that allows users to submit jobs, and performs all tasks required to execute them, without exposing the user to the complexity of the Grid.
- It is the responsibility of the user to describe their jobs and their requirements, and to retrieve the output when the jobs are finished.
- In the WLCG/EGEE Grid, two different workload management systems are deployed: the legacy LCG-2 system and the new system from the EGEE project, which is an evolution of the former and therefore offers more functionality.
- In the following sections, we will describe the basic concepts of the language used to describe a job and the basic command line interface to submit and manage simple jobs.
5Workload Management System
- The Workload Management System (WMS) comprises a set of Grid middleware components responsible for the distribution and management of tasks across Grid resources.
- The Workload Manager (WM) aims to accept and satisfy requests for job management coming from its clients.
- The WM will pass the job to an appropriate CE for execution, taking into account the requirements and the preferences expressed in the job description.
- The decision of which resource should be used is the outcome of a matchmaking process.
- The Logging and Bookkeeping service tracks jobs managed by the WMS. It collects events from many WMS components and records the status and history of the job.
6Job Description Language
- The Job Description Language (JDL) is a high-level language based on the Classified Advertisement (ClassAd) language, used to describe jobs and aggregates of jobs with arbitrary dependency relations.
- The JDL is used in WLCG/EGEE to specify the desired job characteristics and constraints, which are taken into account by the WMS to select the best resource to execute the job.
- A job description is a file (called JDL file) consisting of lines having the format:
    attribute = expression;
- Expressions can span several lines, but only the last line must be terminated by a semicolon.
7Job Description Language
- The single-quote character (') cannot be used in the JDL.
- Comments must be preceded by a sharp character (#) or a double slash (//) at the beginning of each line.
- Multi-line comments must be enclosed between /* and */.
Attention! The JDL is sensitive to blank characters and tabs. No blank characters or tabs should follow the semicolon at the end of a line.
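The syntax rules above can be sketched with a small Python helper that renders a dictionary as JDL text, one `attribute = expression;` line per entry. This is only an illustration, not a gLite tool: the function names `jdl_value` and `render_jdl` are hypothetical, the attribute values are sample data, and only literal values are handled (strings in double quotes, lists in braces, numbers and booleans bare). Unquoted expressions such as Requirements are outside its scope.

```python
def jdl_value(value):
    """Render one Python value as a JDL literal (sketch, not a full ClassAd writer)."""
    if isinstance(value, bool):  # test bool before int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, (list, tuple)):
        return "{" + ", ".join(jdl_value(item) for item in value) + "}"
    # Strings are double-quoted; embedded double quotes are backslash-escaped.
    return '"' + str(value).replace('"', '\\"') + '"'


def render_jdl(attributes):
    """Return JDL text with one 'attribute = expression;' line per entry."""
    lines = [f"{name} = {jdl_value(value)};" for name, value in attributes.items()]
    # No blanks or tabs may follow the semicolon, so strip each line defensively.
    return "\n".join(line.rstrip() for line in lines) + "\n"


job = {
    "Executable": "/bin/hostname",
    "StdOutput": "std.out",
    "StdError": "std.err",
    "OutputSandbox": ["std.out", "std.err"],
    "RetryCount": 0,
}
print(render_jdl(job))
```

Generating the file from a data structure keeps the trailing-whitespace rule out of the user's hands, which is the main reason to prefer it over hand-editing when many similar jobs are needed.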
8Simple JDL example
    Executable = "/bin/hostname";
    StdOutput = "std.out";
    StdError = "std.err";
- The Executable attribute specifies the command to be run by the job. If the command is already present on the WN, it must be expressed as an absolute path; if it has to be copied from the UI, only the file name must be specified, and the path of the command on the UI should be given in the InputSandbox attribute.
    Executable = "test.sh";
    InputSandbox = {"/home/doe/test.sh"};
    StdOutput = "std.out";
    StdError = "std.err";
9
- The Arguments attribute can contain a string value, which is taken as the argument list for the executable:
    Arguments = "fileA 10";
- In the Executable and Arguments attributes it may be necessary to use special characters, such as &, \, |, >, <. These characters should be preceded by a triple \ in the JDL, or specified inside quoted strings, e.g.:
    Arguments = "-f file1\\\&file2";
- The attributes StdOutput and StdError define the names of the files containing the standard output and standard error of the executable, once the job output is retrieved.
10
- If files have to be copied from the UI to the execution node, they must be listed in the InputSandbox attribute:
    InputSandbox = {"test.sh", .. ,"fileN"};
- The files to be transferred back to the UI after the job is finished can be specified using the OutputSandbox attribute:
    OutputSandbox = {"std.out","std.err"};
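The sandbox attributes are subject to a few rules: the InputSandbox cannot contain two files with the same name, and the OutputSandbox accepts neither absolute paths nor wildcards. A minimal Python sketch of a pre-submission check is given here; `check_sandboxes` is a hypothetical helper, and the set of characters treated as wildcards (`*`, `?`, `[`) is an assumption rather than something the JDL specification is quoted on.

```python
import os
from collections import Counter


def check_sandboxes(input_sandbox, output_sandbox):
    """Return a list of human-readable problems; an empty list means no rule was violated."""
    problems = []

    # Two InputSandbox entries with the same file name would overwrite each other.
    names = Counter(os.path.basename(path) for path in input_sandbox)
    for name, count in sorted(names.items()):
        if count > 1:
            problems.append(f"InputSandbox: duplicate file name {name!r}")

    for path in output_sandbox:
        # Absolute paths cannot be specified in the OutputSandbox.
        if path.startswith("/"):
            problems.append(f"OutputSandbox: absolute path {path!r}")
        # Wildcards are allowed only in the InputSandbox (wildcard set is assumed).
        if any(ch in path for ch in "*?["):
            problems.append(f"OutputSandbox: wildcard in {path!r}")

    return problems


print(check_sandboxes(["/home/doe/test.sh", "/tmp/test.sh"], ["std.out", "/tmp/std.err"]))
```

The check does not expand wildcards in the InputSandbox, so duplicates that only appear after expansion are not detected.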
11
- Wildcards are allowed only in the InputSandbox attribute.
- Absolute paths cannot be specified in the OutputSandbox attribute.
- The InputSandbox cannot contain two files with the same name, even if they have different absolute paths, because they would overwrite each other when transferred.
- The shell environment of the job can be modified using the Environment attribute:
    Environment = {"CMS_PATH=$HOME/cms", "CMS_DB=$CMS_PATH/cmdb"};
12
- JobType
  - Normal (simple, sequential job), Interactive, MPICH, Checkpointable, Partitionable, Parametric
  - Or a combination of them:
    - Checkpointable, Interactive
    - Checkpointable, MPI
  - Interactive, MPI: not yet permitted
    JobType = "Interactive";
    JobType = {"Interactive","Checkpointable"};
13
- The Requirements attribute can be used to express constraints on the resources where the job should run.
- Its value is a Boolean expression that must evaluate to true for a job to run on that specific CE.
- Note: Only one Requirements attribute can be specified (if there is more than one, only the last one is considered). If several conditions must be applied to the job, they must all be combined in a single Requirements attribute.
- For example, suppose that the user wants to run on a CE using PBS as batch system, and whose WNs have at least two CPUs. The job description file would then contain:
    Requirements = other.GlueCEInfoLRMSType == "PBS" && other.GlueCEInfoTotalCPUs > 1;
14
- The WMS can also be asked to send a job to a particular queue in a CE with the following expression:
    Requirements = other.GlueCEUniqueID == "lxshare0286.cern.ch:2119/jobmanager-pbs-short";
- It is also possible to use regular expressions when expressing a requirement.
- Suppose, for example, that the user wants all jobs to run on any CE in the domain cern.ch. This can be achieved by putting the following expression in the JDL file:
    Requirements = RegExp("cern.ch", other.GlueCEUniqueID);
- The opposite can be required by using:
    Requirements = (!RegExp("cern.ch", other.GlueCEUniqueID));
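To illustrate how a regular-expression requirement selects CEs, here is a short Python sketch. Python's `re` module stands in for the ClassAd regular-expression engine, whose dialect may differ in details; the helper `matches` and the third CE identifier are made up for the example. It also shows a general regex pitfall: an unescaped dot matches any character, so a stricter pattern may be preferable.

```python
import re

ce_ids = [
    "lxshare0286.cern.ch:2119/jobmanager-pbs-short",
    "grid-ce0.desy.de:2119/jobmanager-lcgpbs-cms",
    "ce.cernXch.example:2119/jobmanager-pbs-short",  # made-up ID showing the unescaped dot
]


def matches(pattern, ce_id):
    """Mimic RegExp(pattern, target): true if the pattern occurs anywhere in the target."""
    return re.search(pattern, ce_id) is not None


# Pattern as written in the JDL example: the dot matches any character.
loose = [ce for ce in ce_ids if matches("cern.ch", ce)]
# Stricter variant: literal dot, followed by the port separator.
strict = [ce for ce in ce_ids if matches(r"cern\.ch:", ce)]
# Negated requirement: everything outside the domain.
excluded = [ce for ce in ce_ids if not matches("cern.ch", ce)]

print(loose)
print(strict)
print(excluded)
```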
15
- If the job must run on a CE where a particular experiment software is installed and this information is published by the CE, something like the following must be written:
    Requirements = Member("BLAST-1.0.3", other.GlueHostApplicationSoftwareRunTimeEnvironment);
Note: The Member operator is used to test if its first argument (a scalar value) is a member of its second argument (a list). In fact, the GlueHostApplicationSoftwareRunTimeEnvironment attribute is a list of strings and is used to publish any VO-specific information relative to the CE (typically, information on the VO software available on that CE).
16
- It is possible to have the WMS automatically resubmit jobs which, for some reason, are aborted by the Grid. Two kinds of resubmission are available for the gLite 3 WMS: the deep resubmission and the shallow resubmission (only the former is available in the LCG-2 WMS).
- The resubmission is deep when the job fails after it has started running on the WN, and shallow otherwise.
- The user can limit the number of times the WMS should resubmit a job by using the JDL attributes RetryCount and ShallowRetryCount for the deep and shallow resubmission respectively.
- For example, to disable the deep resubmission and limit the attempts of shallow resubmission to 3:
    RetryCount = 0;
    ShallowRetryCount = 3;
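The interplay of the two counters can be sketched as a simplified Python model. This is not how the WMS is implemented: the function names are hypothetical, the two counters are assumed to be tracked independently, and any limit configured on the WMS side is ignored.

```python
def resubmission_allowed(failed_after_start, deep_used, shallow_used,
                         retry_count, shallow_retry_count):
    """Decide whether one more resubmission fits within the user's limits.

    failed_after_start is True when the job failed after it started running
    on the WN (deep resubmission) and False otherwise (shallow resubmission).
    """
    if failed_after_start:
        return deep_used < retry_count
    return shallow_used < shallow_retry_count


def count_attempts(failures, retry_count, shallow_retry_count):
    """Count how many times a job is tried for a sequence of failure kinds.

    Each item of `failures` is "deep" or "shallow"; the job is assumed to
    succeed once the sequence is exhausted.
    """
    deep_used = shallow_used = 0
    attempts = 1
    for kind in failures:
        deep = kind == "deep"
        if not resubmission_allowed(deep, deep_used, shallow_used,
                                    retry_count, shallow_retry_count):
            break  # limit reached: the job is not resubmitted again
        if deep:
            deep_used += 1
        else:
            shallow_used += 1
        attempts += 1
    return attempts


# RetryCount = 0; ShallowRetryCount = 3;
print(count_attempts(["shallow"] * 5, retry_count=0, shallow_retry_count=3))
print(count_attempts(["deep"], retry_count=0, shallow_retry_count=3))
```

With the settings from the example, a job that keeps failing before it reaches the WN is tried four times in total (one submission plus three shallow resubmissions), while a single failure on the WN ends it immediately.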
17
- The proxy renewal feature of the WMS is automatically enabled, as long as the user has stored a long-term proxy in the default MyProxy server (usually defined in the MYPROXY_SERVER environment variable). However, it is possible to indicate a different MyProxy server to the WMS in the JDL file:
    MyProxyServer = "myproxy.ct.infn.it";
18
- The choice of the CE where the job is executed, among all those satisfying the requirements, is based on the rank of the CE, a quantity expressed as a floating-point number. The CE with the highest rank is the one selected.
- By default, the rank is equal to -other.GlueCEStateEstimatedResponseTime, where the estimated response time is an estimate of the time interval between the job submission and the beginning of the job execution.
- The user can redefine the rank with the Rank attribute, for example:
    Rank = other.GlueCEStateFreeCPUs;
  which will rank best the CE with the most free CPUs.
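Matchmaking as described in this part of the talk (filter the CEs by the Requirements expression, then pick the one with the highest rank) can be sketched in a few lines of Python. The function `matchmake`, the CE identifiers and all attribute values are invented for illustration; the default rank is assumed to be the negated estimated response time, so that a shorter expected wait gives a higher rank.

```python
def matchmake(ces, requirements, rank=None):
    """Return the CEs that satisfy `requirements`, best rank first."""
    if rank is None:
        # Assumed default: negated estimated response time, so the CE expected
        # to start the job soonest gets the highest rank.
        rank = lambda ce: -ce["GlueCEStateEstimatedResponseTime"]
    eligible = [ce for ce in ces if requirements(ce)]
    return sorted(eligible, key=rank, reverse=True)


# Invented CE records using the Glue attribute names from the text.
ces = [
    {"GlueCEUniqueID": "ce-a.example.org:2119/jobmanager-pbs-short",
     "GlueCEInfoLRMSType": "PBS", "GlueCEInfoTotalCPUs": 8,
     "GlueCEStateEstimatedResponseTime": 120, "GlueCEStateFreeCPUs": 2},
    {"GlueCEUniqueID": "ce-b.example.org:2119/jobmanager-pbs-long",
     "GlueCEInfoLRMSType": "PBS", "GlueCEInfoTotalCPUs": 64,
     "GlueCEStateEstimatedResponseTime": 600, "GlueCEStateFreeCPUs": 30},
    {"GlueCEUniqueID": "ce-c.example.org:2119/jobmanager-sge-default",
     "GlueCEInfoLRMSType": "SGE", "GlueCEInfoTotalCPUs": 16,
     "GlueCEStateEstimatedResponseTime": 10, "GlueCEStateFreeCPUs": 16},
]


def requirements(ce):
    """Python counterpart of: LRMSType == "PBS" && TotalCPUs > 1."""
    return ce["GlueCEInfoLRMSType"] == "PBS" and ce["GlueCEInfoTotalCPUs"] > 1


by_default = matchmake(ces, requirements)
by_free_cpus = matchmake(ces, requirements, rank=lambda ce: ce["GlueCEStateFreeCPUs"])

print([ce["GlueCEUniqueID"] for ce in by_default])
print([ce["GlueCEUniqueID"] for ce in by_free_cpus])
```

The SGE-based CE has the shortest response time but never appears in the result, because the Requirements filter is applied before ranking.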
20The WMProxy
- The WMProxy is the service responsible for providing access to the WMS functionality through a Web Service interface.
- The gLite WMProxy server can be accessed directly through the published WSDL, through the C++ command line interface, or through the API.
- It has been designed to efficiently handle a large number of requests for job submission and control to the WMS.
- It provides additional features such as bulk submission and support for shared and compressed sandboxes for compound jobs.
- It is the natural replacement for the Network Server (NS) in the transition to the SOA approach.
21gLite WMS Architecture
22gLite WMS Architecture
Job management requests (submission,
cancellation) expressed via a Job
Description Language (JDL)
23gLite WMS Architecture
Finds an appropriate CE for each submission
request, taking into account job requests and
preferences, Grid status, utilization policies
on resources
24gLite WMS Architecture
Keeps submission requests. Requests are kept for a while if no resources are immediately available.
25gLite WMS Architecture
Repository of resource information available to the matchmaker. Updated via notifications and/or active polling on resources.
26gLite WMS Architecture
Performs the actual job submission and
monitoring
28The Command Line Interface
- The gLite WMS implements two different services to manage jobs: the Network Server and the WMProxy.
- The recommended method to manage jobs is through the gLite WMS via WMProxy, because it gives the best performance and gives access to the most advanced functionality.
- The WMProxy implements several features, among which:
  - submission of job collections
  - faster authentication
  - faster match-making
  - faster response time for users
  - higher job throughput.
29Delegating a proxy to WMProxy
- Each job submitted to WMProxy must be associated with a proxy credential previously delegated by the owner of the job to the WMProxy server.
- This proxy is then used any time WMProxy needs to interact with other services for job-related operations (e.g. submission to the CE, a GridFTP file transfer, etc.).
- There are two possible mechanisms to ask for a delegation of the user credentials:
  - asking for the automatic delegation of the credentials during the submission operation
  - asking for an explicit delegation
30
- To explicitly delegate a user proxy to WMProxy, the command to use is:
    glite-wms-job-delegate-proxy -d <delegID>
  where <delegID> is a string chosen by the user.
- For example, to delegate a proxy:
    glite-wms-job-delegate-proxy -d mydelegID
    Connecting to the service
    https://rb102.cern.ch:7443/glite_wms_wmproxy_server
    glite-wms-job-delegate-proxy Success
    Your proxy has been successfully delegated to the WMProxy:
    https://rb102.cern.ch:7443/glite_wms_wmproxy_server
    with the delegation identifier: mydelegID
31Submitting a simple job
- Starting from a simple JDL file, we can submit it via WMProxy by doing:
    glite-wms-job-submit -d mydelegID test.jdl
    Connecting to the service
    https://rb102.cern.ch:7443/glite_wms_wmproxy_server
    glite-wms-job-submit Success
    The job has been successfully submitted to the WMProxy
    Your job identifier is:
    https://rb102.cern.ch:9000/vZKKk3gdBla6RySximq_vQ
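Scripts that submit jobs usually need the job identifier for the later status, output and cancel commands. A minimal Python sketch for pulling it out of the captured text follows; `extract_job_id` is a hypothetical helper, and the sample text mirrors the transcript above, so the exact layout printed by a given CLI version is an assumption.

```python
import re

# A job identifier looks like https://<host>:<port>/<unique string>.
JOB_ID_PATTERN = re.compile(r"https://[\w.\-]+:\d+/[\w\-]+")


def extract_job_id(submit_output):
    """Return the first job identifier after 'Your job identifier is', or None."""
    _, found, tail = submit_output.partition("Your job identifier is")
    if not found:
        return None
    match = JOB_ID_PATTERN.search(tail)
    return match.group(0) if match else None


sample = """Connecting to the service
https://rb102.cern.ch:7443/glite_wms_wmproxy_server
glite-wms-job-submit Success
The job has been successfully submitted to the WMProxy
Your job identifier is:
https://rb102.cern.ch:9000/vZKKk3gdBla6RySximq_vQ
"""

print(extract_job_id(sample))
```

Searching only after the marker line avoids picking up the WMProxy endpoint URL, which appears earlier in the same output.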
32Troubleshooting /1
- To submit jobs via WMProxy, a valid VOMS proxy is required; otherwise the submission will fail with an error like:
    Error - Operation failed
    Unable to delegate the credential to the endpoint:
    https://rb102.cern.ch:7443/glite_wms_wmproxy_server
    User not authorized:
    unable to check credential permission (/opt/glite/etc/glite_wms_wmproxy.gacl)
    (credential entry not found)
    credential type: person
    input dn: /C=CH/O=CERN/OU=GRID/CN=John Doe
33Authorization
- The client must be properly authorized when it interacts with the WMProxy service.
- This means that either the FQAN or the DN (in case of globus-style proxies) of the client must be properly listed and authorized in the glite_wms_wmproxy.gacl file on the WMProxy machine.
    cat glite_wms_wmproxy.gacl
    <gacl version='0.0.1'>
    <entry> <voms><fqan>bio/Role=NULL</fqan></voms>
    <allow><exec/></allow>
    </entry>
    ..
    </gacl>
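As a rough illustration of what "listed and authorized" means, the Python sketch below reads a GACL document and reports whether a given FQAN or DN has an entry that allows `exec`. It is a deliberate simplification and not the WMProxy authorization code: `allowed_credentials` and `is_authorized` are hypothetical names, matching is by exact string, deny rules and wildcards are ignored, and the `<person><dn>` entry in the sample is an assumed form for DN-based entries.

```python
import xml.etree.ElementTree as ET

# Sample policy: the first entry follows the slide, the second is an assumed DN entry.
GACL_TEXT = """<gacl version='0.0.1'>
  <entry>
    <voms><fqan>bio/Role=NULL</fqan></voms>
    <allow><exec/></allow>
  </entry>
  <entry>
    <person><dn>/C=CH/O=CERN/OU=GRID/CN=John Doe</dn></person>
    <allow><exec/></allow>
  </entry>
</gacl>"""


def allowed_credentials(gacl_text):
    """Collect the FQANs and DNs of all entries that allow 'exec'."""
    root = ET.fromstring(gacl_text)
    fqans, dns = set(), set()
    for entry in root.findall("entry"):
        if entry.find("allow/exec") is None:
            continue  # entry does not grant exec permission
        fqans.update(e.text.strip() for e in entry.findall("voms/fqan") if e.text)
        dns.update(e.text.strip() for e in entry.findall("person/dn") if e.text)
    return fqans, dns


def is_authorized(gacl_text, fqan=None, dn=None):
    """True if the FQAN or the DN has an exec-allowing entry (exact match only)."""
    fqans, dns = allowed_credentials(gacl_text)
    return fqan in fqans or dn in dns


print(is_authorized(GACL_TEXT, fqan="bio/Role=NULL"))
print(is_authorized(GACL_TEXT, fqan="cms/Role=NULL"))
```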
34Troubleshooting /2
- If the command returns the following error:
    Error - WMProxy Server Error
    LCMAPS failed to map user credential
    Method: getFreeQuota
    Error code: 1208
  it means that there are authentication problems between the UI and the WMProxy server (you may not be authorized to use that WMProxy server).
35Listing CE(s) matching a job
- It is possible to see which CEs are eligible to run a job described by a given JDL using:
    glite-wms-job-list-match -d mydelegID --rank test.jdl
    Connecting to the service
    https://rb102.cern.ch:7443/glite_wms_wmproxy_server
    COMPUTING ELEMENT IDs LIST
    The following CE(s) matching your job requirements have been found:
    CEId                                                Rank
    - CE.pakgrid.org.pk:2119/jobmanager-lcgpbs-cms      0
    - grid-ce0.desy.de:2119/jobmanager-lcgpbs-cms       -10
    - gw-2.ccc.ucl.ac.uk:2119/jobmanager-sge-default    -56
    - grid-ce2.desy.de:2119/jobmanager-lcgpbs-cms       -107
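When the list of matching CEs has to be processed by a script, the ranked output can be turned into (CE, rank) pairs. This Python sketch assumes each result line has the form "- <host>:<port>/<queue> <rank>", as in the transcript above; `parse_list_match` is a hypothetical helper and the real output layout may vary between versions.

```python
import re

# One result line: "- <host>:<port>/<queue>   <rank>"
RESULT_LINE = re.compile(r"^\s*-\s+(?P<ce>\S+:\d+/\S+)\s+(?P<rank>-?\d+(?:\.\d+)?)\s*$")


def parse_list_match(output):
    """Return (CE id, rank) pairs found in the output, highest rank first."""
    pairs = []
    for line in output.splitlines():
        match = RESULT_LINE.match(line)
        if match:
            pairs.append((match.group("ce"), float(match.group("rank"))))
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


sample = """Connecting to the service
https://rb102.cern.ch:7443/glite_wms_wmproxy_server
COMPUTING ELEMENT IDs LIST
The following CE(s) matching your job requirements have been found:
CEId Rank
 - grid-ce0.desy.de:2119/jobmanager-lcgpbs-cms -10
 - CE.pakgrid.org.pk:2119/jobmanager-lcgpbs-cms 0
 - grid-ce2.desy.de:2119/jobmanager-lcgpbs-cms -107
 - gw-2.ccc.ucl.ac.uk:2119/jobmanager-sge-default -56
"""

result = parse_list_match(sample)
for ce, rank in result:
    print(rank, ce)
```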
36Retrieving the status of a job
    glite-wms-job-status https://rb102.cern.ch:9000/fNdD4FW_Xxkt2s2aZJeoeg
    BOOKKEEPING INFORMATION:
    Status info for the Job : https://rb102.cern.ch:9000/fNdD4FW_Xxkt2s2aZJeoeg
    Current Status:     Done (Success)
    Exit code:          0
    Status Reason:      Job terminated successfully
    Destination:        ce1.inrne.bas.bg:2119/jobmanager-lcgpbs-cms
    Submitted:          Mon Dec 4 15:05:43 2006 CET
- The verbosity level controls the amount of information provided. The value of the -v option ranges from 0 to 3.
- The commands to get the job status can take several jobIDs as arguments, i.e. glite-wms-job-status <jobID1> ... or, more conveniently, the -i <file path> option can be used to
37Retrieving the output(s)
    glite-wms-job-output https://rb102.cern.ch:9000/yabp72aERhofLA6W2-LrJw
    Connecting to the service
    https://128.142.160.93:7443/glite_wms_wmproxy_server
    JOB GET OUTPUT OUTCOME
    Output sandbox files for the job:
    https://rb102.cern.ch:9000/yabp72aERhofLA6W2-LrJw
    have been successfully retrieved and stored in the directory:
    /tmp/doe_yabp72aERhofLA6W2-LrJw
- The default location for storing the outputs (normally /tmp) is defined in the UI configuration, but it is possible to specify the directory in which to save the output using the --dir <path name> option.
38Cancelling a job
    glite-wms-job-cancel https://rb102.cern.ch:9000/P1c60RFsrIZ9mnBALa7yZA
    Are you sure you want to remove specified job(s) [y/n]y : y
    Connecting to the service
    https://128.142.160.93:7443/glite_wms_wmproxy_server
    glite-wms-job-cancel Success
    The cancellation request has been successfully submitted for the following job(s):
    - https://rb102.cern.ch:9000/P1c60RFsrIZ9mnBALa7yZA
- If the cancellation is successful, the job will terminate in status CANCELLED.
39Real Time Output Retrieval /1
- The user can enable job perusal by setting the attribute PerusalFileEnable to true in the job JDL.
- This makes the WN upload, at regular time intervals (defined by the PerusalTimeInterval attribute and expressed in seconds), a copy of the output files specified using the glite-wms-job-perusal command to the WMS machine (by default), or to a GridFTP server specified by the attribute PerusalFilesDestURI.
    Executable = "job.sh";
    StdOutput = "stdout.log";
    StdError = "stderr.log";
    InputSandbox = {"job.sh"};
    OutputSandbox = {"stdout.log","stderr.log","testfile.txt"};
    PerusalFileEnable = true;
    PerusalTimeInterval = 30;
    RetryCount = 0;
40Real Time Output Retrieval /2
- After the job has been submitted with glite-wms-job-submit, the user can choose which output files should be inspected:
    glite-wms-job-perusal --set -f testfile.txt \
    https://wms104.cern.ch:9000/B02xR3EQg9ZHHoRc-1nJkQ
    Connecting to the service https://128.142.160.93:7443/glite_wms_wmproxy_server
    Connecting to the service
    https://128.142.160.93:7443/glite_wms_wmproxy_server
    glite-wms-job-perusal Success
    Files perusal has been successfully enabled for the job:
    https://wms104.cern.ch:9000/B02xR3EQg9ZHHoRc-1nJkQ
41Real Time Output Retrieval /3
- .. and, when the job starts, the user can retrieve one output file:
    glite-wms-job-perusal --get -f testfile.txt \
    https://wms104.cern.ch:9000/B02xR3EQg9ZHHoRc-1nJkQ
    Connecting to the service
    https://137.138.45.79:7443/glite_wms_wmproxy_server
    Connecting to the service
    https://137.138.45.79:7443/glite_wms_wmproxy_server
    glite-wms-job-perusal Success
    The retrieved files have been successfully stored in:
    /tmp/doe_OoDVmWCAnhx_HiSPvASGsg
43DAG job
- A DAG is a set of jobs where the input, output, or execution of one or more jobs depends on one or more other jobs.
  - The jobs are nodes (vertices) in the graph.
  - The edges (arcs) identify the dependencies.
- Their management has been improved with:
  - Shared sandboxes
  - Attribute inheritance
  - Attribute references between nodes and with the parent
44
    [
      Type = "dag";
      InputSandbox = {
        "/tmp/foo/*.exe",
        "/home/larocca/bar",
        "gsiftp://neo.datamat.it:5678/tmp/cms_sim.exe",
        "file:///tmp/myconf"
      };
      nodes = [
        nodeA = [
          description = [
            JobType = "Normal";
            Executable = "a.exe";
            InputSandbox = {"/home/larocca/myfile.txt", root.InputSandbox};
          ];
        ];
        nodeF = [
          description = [
            JobType = "Normal";
            Executable = "b.exe";
            Arguments = "1 2 3";
            OutputSandbox = {"myoutput.txt", "myerror.txt"};
          ];
        ];
        nodeD = [
          description = [
            JobType = "Checkpointable";
            Executable = "b.exe";
            Arguments = "1 2 3";
            InputSandbox = {"file:///home/larocca/data.txt", root.nodes.nodeF.description.OutputSandbox[0]};
          ];
        ];
        nodeC = [ file = "/home/larocca/nodec.jdl"; ];
        nodeB = [ file = "foo.jdl"; ];
      ];
      dependencies = { {nodeA, nodeB}, {nodeA, nodeC}, {nodeA, nodeF}, {{nodeB, nodeC, nodeF}, nodeD} };
    ]
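To see the order in which the nodes of this DAG can run, the dependencies can be fed to a topological sort. The Python sketch below uses the standard-library `graphlib` module (Python 3.9 or later) and reads the dependencies attribute as pairs in which the first element must finish before the second starts, which is an interpretation of the flattened slide text; `execution_waves` is a hypothetical helper.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# (parents, children): every parent must finish before any child starts.
dependencies = [
    (["nodeA"], ["nodeB"]),
    (["nodeA"], ["nodeC"]),
    (["nodeA"], ["nodeF"]),
    (["nodeB", "nodeC", "nodeF"], ["nodeD"]),
]


def execution_waves(deps):
    """Group the nodes into successive waves of jobs that can run in parallel."""
    sorter = TopologicalSorter()
    for parents, children in deps:
        for child in children:
            sorter.add(child, *parents)  # child depends on every parent
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


print(execution_waves(dependencies))
# [['nodeA'], ['nodeB', 'nodeC', 'nodeF'], ['nodeD']]
```

nodeA runs first, the three nodes that depend only on it can run concurrently, and nodeD waits for all of them.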
45Job Collection
- A job collection is a set of independent jobs that the user can submit and monitor as if it were a single job.
- Jobs of a collection are submitted as DAG nodes, without dependencies.
- The JDL is a list of ClassAds which describe the subjobs:
    [
      Type = "collection";
      nodes = {
        [ <job descr 1> ],
        [ <job descr 2> ],
        ...
      };
    ]
46
    [
      Type = "collection";
      InputSandbox = {"input_common1.txt","input_common2.txt"};
      nodes = {
        [
          JobType = "Normal";
          NodeName = "node1";
          Executable = "/bin/sh";
          Arguments = "script_node1.sh";
          InputSandbox = {"script_node1.sh", root.InputSandbox[0]};
          StdOutput = "myoutput1";
          StdError = "myerror1";
          OutputSandbox = {"myoutput1","myerror1"};
          ShallowRetryCount = 1;
        ],
        [
          JobType = "Normal";
          NodeName = "node2";
          Executable = "/bin/sh";
Slide callouts: Collection; 1st sub-job; 2nd sub-job; 3rd sub-job.
47Parametric jobs /1
- A parametric job is a job where one or more of its attributes are parametric.
- The values of the attributes vary according to a parameter.
- Job monitoring and management is always done through a unique jobID, as if the job were a single job.
    JobType = "Parametric";
    Executable = "/bin/echo";
    Arguments = "_PARAM_";
    StdOutput = "myoutput_PARAM_.txt";
    StdError = "myerror_PARAM_.txt";
    Parameters = 3;
    ParameterStep = 1;
    ParameterStart = 1;
    OutputSandbox = {"myoutput_PARAM_.txt"};
48Parametric jobs /2
    Executable = "/bin/cat";
    Arguments = "inputMOON.txt";
    InputSandbox = "inputMOON.txt";
    StdOutput = "myoutputMOON.txt";
    StdError = "myerrorMOON.txt";
    OutputSandbox = {"myoutputMOON.txt"};

    Executable = "/bin/cat";
    Arguments = "inputMARS.txt";
    InputSandbox = "inputMARS.txt";
    StdOutput = "myoutputMARS.txt";
    StdError = "myerrorMARS.txt";
    OutputSandbox = {"myoutputMARS.txt"};

    Executable = "/bin/cat";
    Arguments = "inputEARTH.txt";
    InputSandbox = "inputEARTH.txt";
    StdOutput = "myoutputEARTH.txt";
    StdError = "myerrorEARTH.txt";
    OutputSandbox = {"myoutputEARTH.txt"};
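The three job descriptions on this slide are what a parametric job looks like once `_PARAM_` has been substituted. A minimal Python sketch of that substitution follows; the template is a guess reconstructed from the three expanded jobs (the original parametric JDL is not in the text), the parameter values are assumed to be given as a list, and `expand_parametric` is a hypothetical helper.

```python
def expand_parametric(template, values, placeholder="_PARAM_"):
    """Return one job description per value, with the placeholder substituted."""
    jobs = []
    for value in values:
        job = {}
        for name, expr in template.items():
            if isinstance(expr, str):
                job[name] = expr.replace(placeholder, str(value))
            elif isinstance(expr, list):
                # List attributes (sandboxes) are assumed to contain strings only.
                job[name] = [item.replace(placeholder, str(value)) for item in expr]
            else:
                job[name] = expr  # numbers and booleans are copied unchanged
        jobs.append(job)
    return jobs


# Assumed template, inferred from the expanded jobs shown on the slide.
template = {
    "Executable": "/bin/cat",
    "Arguments": "input_PARAM_.txt",
    "InputSandbox": ["input_PARAM_.txt"],
    "StdOutput": "myoutput_PARAM_.txt",
    "StdError": "myerror_PARAM_.txt",
    "OutputSandbox": ["myoutput_PARAM_.txt"],
}

jobs = expand_parametric(template, ["MOON", "MARS", "EARTH"])
for job in jobs:
    print(job["Arguments"], job["StdOutput"])
```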
50References
- WMProxy User's guide
  - https://edms.cern.ch/file/674643/1/EGEE-JRA1-TEC-674643-WMPROXY-guide-v0-3.pdf
- JDL Attributes Specification
  - https://edms.cern.ch/file/555796/1/EGEE-JRA1-TEC-555796-JDL-Attributes-v0-8.pdf
  - https://edms.cern.ch/file/590869/1/EGEE-JRA1-TEC-590869-JDL-Attributes-v0-9.pdf
- gLite 3.1 User's guide
  - https://edms.cern.ch/file/722398/1.2/gLite-3-UserGuide.pdf
- Complex jobs
  - https://grid.ct.infn.it/twiki/bin/view/GILDA/WmProxyUse
- WMProxy API usage
  - https://grid.ct.infn.it/twiki/bin/view/GILDA/ApiJavaWMProxy
  - https://grid.ct.infn.it/twiki/bin/view/GILDA/WMProxyCPPAPI
51Hands-on
- https://grid.ct.infn.it/twiki/bin/view/GILDA/AuthenticationAuthorization
- https://grid.ct.infn.it/twiki/bin/view/GILDA/SimpleJobSubmission
- https://grid.ct.infn.it/twiki/bin/view/GILDA/WmProxyUse
Connect to the gLite User Interface:
- ssh taipeiXX@glite-tutor.ct.infn.it
- OS passwd: GridTAIXX
- PassPhrase: TAIPEI
- where XX = 01,..,60