Title: gLite job submission
1gLite job submission
- Fokke Dijkstra
- Donald Smits Centre for Information Technology, University of Groningen
- Utrecht, Grid Tutorial 2008
2Introduction
3Components in the EGEE Grid
4Workload Management System
- Tasks of the WMS
- Find the best resource for your tasks (jobs)
- Submit jobs to compute resources
- Logging and bookkeeping
- Delegated Grid credential management
5Job preparation
- You need to provide
- A complete (enough) job description
- What program?
- What data?
- Any requirements on OS, installed software, ...
- Possibly a program
- You're submitting in unknown territory!
- Program portably!
- Don't rely on hard-coded paths or special locations
- The program you send may not even be in $HOME!
- Perhaps some input data
- Perhaps instructions on what to do with the
output
6How to Write a Job Description
- Here is a minimal job description (call it hello.jdl)
- We specified
- The program to run and its arguments
- Directed the standard error and output streams to files
- Told it what to do with the output
Executable = "/bin/echo";
Arguments = "Goedemiddag";
StdError = "stderr.log";
StdOutput = "stdout.log";
OutputSandbox = {"stderr.log", "stdout.log"};
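As a rough illustration of the attribute layout used in hello.jdl, the sketch below renders a Python dictionary as JDL text. The helper name to_jdl is hypothetical and not part of gLite; it only handles string values and lists of strings, which is all the minimal example needs.

```python
def to_jdl(attributes):
    """Render a flat dict as JDL text, one 'Attribute = expression;' per line."""
    lines = []
    for name, value in attributes.items():
        if isinstance(value, (list, tuple)):
            # Lists become a brace-delimited, comma-separated set of quoted strings.
            rendered = "{" + ", ".join(f'"{item}"' for item in value) + "}"
        else:
            # Everything else is treated as a quoted string (an assumption of this sketch).
            rendered = f'"{value}"'
        lines.append(f"{name} = {rendered};")
    return "\n".join(lines) + "\n"


hello = {
    "Executable": "/bin/echo",
    "Arguments": "Goedemiddag",
    "StdError": "stderr.log",
    "StdOutput": "stdout.log",
    "OutputSandbox": ["stderr.log", "stdout.log"],
}
jdl_text = to_jdl(hello)
print(jdl_text)
```

Writing jdl_text to a file named hello.jdl would give a description with the same five attributes as the one on this slide.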
7Job Submission Example
- User issues a voms-proxy-init
- enters his certificate's password
- Receives a valid Globus proxy
- User issues a glite-wms-job-submit -a mytest.jdl
- and gets back from the system a unique Job Identifier (JobId)
- User issues a glite-wms-job-status JobId
- to get logging information about the current status of his Job
- When the Done status is reached, the user can issue a glite-wms-job-output JobId
- and the system returns the name of the temporary directory where the job output can be found on the UI machine.
8Submitting it
- voms-proxy-init --voms tutor
- Cannot find file or dir: /admins/fokke/.glite/vomses
- Enter GRID pass phrase:
- Your identity: /O=dutchgrid/O=users/O=rug/OU=rc/CN=Fokke Dijkstra
- Creating temporary proxy ........................................... Done
- Contacting voms.grid.sara.nl:30007 [/O=dutchgrid/O=hosts/OU=sara.nl/CN=voms.grid.sara.nl] "tutor" Done
- Creating proxy ................................................ Done
- Your proxy is valid until Wed Nov 5 23:11:27 2008
- glite-wms-job-submit -a hello.jdl
- Connecting to the service https://wms.grid.sara.nl:7443/glite_wms_wmproxy_server
- glite-wms-job-submit Success
- The job has been successfully submitted to the WMProxy
- Your job identifier is:
- JobId
9A Job Submission Example
- [Diagram: User Interface (UI), Workload Management System (WMS), Information System (IS), LCG File Catalog (LFC), Computing Element (CE) and Storage Element (SE); job status: submitted]
10Checking the status
- glite-wms-job-status https://wms.grid.sara.nl:9000/V7pw7lTR4MeFMVAz12larQ
- BOOKKEEPING INFORMATION:
- Status info for the Job: https://wms.grid.sara.nl:9000/V7pw7lTR4MeFMVAz12larQ
- Current Status: Scheduled
- Status Reason: Job successfully submitted to Globus
- Destination: ce.grid.rug.nl:2119/jobmanager-pbs-long
- Submitted: Wed Nov 5 11:12:15 2008 CET
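When the status has to be checked from a script rather than by eye, the "Key: value" layout of the output above can be picked apart with a few lines of Python. This is only a sketch: parse_status is a hypothetical helper, and the sample text reuses the fields shown on this slide with assumed spacing rather than a captured transcript.

```python
def parse_status(text):
    """Collect 'Key: value' pairs from job-status style output into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so URLs and times in the value stay intact.
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


# Sample text modelled on the slide; exact spacing is an assumption.
sample = """\
Status info for the Job : https://wms.grid.sara.nl:9000/V7pw7lTR4MeFMVAz12larQ
Current Status:     Scheduled
Status Reason:      Job successfully submitted to Globus
Destination:        ce.grid.rug.nl:2119/jobmanager-pbs-long
Submitted:          Wed Nov 5 11:12:15 2008 CET
"""
info = parse_status(sample)
print(info["Current Status"])
```

A script could compare info["Current Status"] against the status it is waiting for before fetching the output.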
11Check status using browser
12A Job Submission Example
- [Diagram: User Interface (UI), Workload Management System (WMS), Information System (IS), LCG File Catalog (LFC), Computing Element (CE) and Storage Element (SE); job status: submitted]
13Getting the Output
- glite-wms-job-output https://wms.grid.sara.nl:9000/V7pw7lTR4MeFMVAz12larQ
- Connecting to the service https://wms.grid.sara.nl:7443/glite_wms_wmproxy_server
- JOB GET OUTPUT OUTCOME
- Output sandbox files for the job:
- https://wms.grid.sara.nl:9000/V7pw7lTR4MeFMVAz12larQ
- have been successfully retrieved and stored in the directory:
- /tmp/jobOutput/fokke_V7pw7lTR4MeFMVAz12larQ
- cat /tmp/jobOutput/fokke_V7pw7lTR4MeFMVAz12larQ
- Goedemiddag
14A Job Submission Example
- [Diagram: User Interface (UI), Workload Management System (WMS), Information System (IS), LCG File Catalog (LFC), Computing Element (CE) and Storage Element (SE); job status sequence: submitted, waiting, ready, scheduled, running, done]
15Job Description Language
- Job Description Language based on Classified Advertisement language
- Lines
- Attribute = expression;
- Can be multiple lines, semicolon is separator
- " for strings
- # and // for comments
- No blanks after the semicolon!
16Types of Attributes
- The supported attributes are grouped in two categories
- Job
- Define the job itself
- Resources
- Taken into account by the WMS for carrying out the matchmaking algorithm
- Computing Resource (Attributes)
- Used to build expressions of Requirements and/or Rank attributes by the user
- Have to be prefixed with "other."
- Data and Storage resources (Attributes)
- Input data to process, SE where to store output data, protocols spoken by application when accessing SEs
17Job Definition Attributes
- Executable (mandatory)
- The command name
- Arguments (optional)
- Job command line arguments
- StdInput, StdOutput, StdError (optional)
- Standard input/output/error of the job
- Environment (optional)
- List of environment settings
- InputSandbox (optional)
- List of files on the UI local disk needed by the job for running
- The listed files are staged from the UI to the remote CE
- Wildcards allowed
- Unique filenames required
- OutputSandbox (optional)
- List of files, generated by the job, which have
to be retrieved
18Resource Attributes
- Requirements
- Job requirements on computing resources
- Specified using attributes of resources published in the Information System
- other.GlueCEStateStatus == "Production" always included (the resource has to be in the Production grid)
- Useful requirements
- Wallclock time and specific sites
- Requirements = other.GlueCEPolicyMaxWallClockTime > 720 && RegExp("nikhef.nl", other.GlueCEUniqueID);
- Specific tag published
- Requirements = Member("VO-ncf-gromacs-3.3.2", other.GlueHostApplicationSoftwareRunTimeEnvironment);
- Logical expressions
- && and
- || or
- ! not
19Data Attributes
- InputData (optional)
- Refers to data used as input by the job (these data are published in the Replica Catalog and stored in the SEs)
- GUIDs and/or LFNs
- Job must be sent to CE that has the data nearby
- DataAccessProtocol (mandatory if InputData specified)
- The protocol or the list of protocols which the application is able to speak with for accessing InputData on a given SE
20WMS match making and ranking
- The WMS has to find the best suitable CE where the job will be executed
- It interacts with Data Management service and Information System
- The CE chosen has to match the job requirements
- If 2 or more CEs satisfy all the requirements, the one with the best Rank is chosen
- Specified using attributes of resources published in the Information Service
- If not specified, default value is used
- Rank = -other.GlueCEStateEstimatedResponseTime;
- quickest response time
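The two-step selection described above (filter by Requirements, then pick the best Rank) can be sketched in a few lines of Python. This is a toy model, not the WMS implementation: match_and_rank is a hypothetical helper, and the CE records, host names and numbers are invented for illustration.

```python
def match_and_rank(ces, requirements, rank):
    """Return the CE that satisfies `requirements` and has the highest `rank`, or None."""
    candidates = [ce for ce in ces if requirements(ce)]
    if not candidates:
        return None
    return max(candidates, key=rank)


# Hypothetical CE records; the attribute names follow the slides, the values are made up.
ces = [
    {"GlueCEUniqueID": "ce-a.example.org",
     "GlueCEPolicyMaxWallClockTime": 600,
     "GlueCEStateEstimatedResponseTime": 10},
    {"GlueCEUniqueID": "ce-b.example.org",
     "GlueCEPolicyMaxWallClockTime": 1440,
     "GlueCEStateEstimatedResponseTime": 300},
    {"GlueCEUniqueID": "ce-c.example.org",
     "GlueCEPolicyMaxWallClockTime": 2880,
     "GlueCEStateEstimatedResponseTime": 120},
]


def requirements(ce):
    # Mirrors: other.GlueCEPolicyMaxWallClockTime > 720
    return ce["GlueCEPolicyMaxWallClockTime"] > 720


def rank(ce):
    # Mirrors the default: the negated estimated response time, so quicker is better.
    return -ce["GlueCEStateEstimatedResponseTime"]


best = match_and_rank(ces, requirements, rank)
print(best["GlueCEUniqueID"])
```

Negating the response time is what turns "highest Rank wins" into "quickest response time wins".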
21Example JDL File
- Executable = "gridTest";
- StdError = "stderr.log";
- StdOutput = "stdout.log";
- InputSandbox = {"/home/joda/test/gridTest"};
- OutputSandbox = {"stderr.log", "stdout.log"};
- InputData = "lfn:/grid/tutor/testbed0-00019";
- DataAccessProtocol = "gridftp";
- Requirements = other.Architecture == "INTEL" && \
other.OpSys == "CentOS" && other.FreeCpus >= 4;
- Rank = other.GlueHostBenchmarkSF00;
22Job Submission
- glite-wms-job-submit -a -d <delegationid> -o <output file> <job.jdl>
- -o: the generated jobId is written in the <output file>
- Useful for other commands, e.g. glite-wms-job-status -i <input file> (or jobId)
- -i: the status information about the edg_jobId contained in the <input file> is displayed
- -a: use automatic delegation
- -d: use an existing delegated proxy at the WMS
- e.g. one generated using
- glite-wms-job-delegate-proxy -d <delegationid>
23Other WMS UI Commands
- glite-wms-job-list-match
- Lists resources matching a job description
- Performs the matchmaking without submitting the job
- glite-wms-job-cancel
- Cancels a given job
- glite-wms-job-status
- Displays the status of the job
- glite-wms-job-output
- Returns the job-output (the OutputSandbox files) to the user
- glite-wms-job-logging-info
- Displays logging information about submitted jobs (all the events pushed by the various components of the WMS)
- Very useful for debug purposes
24Proxy Renewal
- Why?
- To avoid job failure because it outlived the validity of the initial proxy
- To prevent long term proxies from lying around
- Use a safe system for storing long term proxies
- The WMS supports an automatic proxy renewal mechanism as long as the user credentials are handled by a proxy server.
- Create a proxy using
- voms-proxy-init --voms <voname>
- Register this proxy with the MyProxy server using
- myproxy-init -s <server> -c <cred> -t <proxy> -d -n
- server is the server address (e.g. px.matrix.sara.nl)
- cred is the number of hours the proxy should be valid on the server
- proxy is the number of hours renewed proxies should be valid
- The proxy is automatically renewed by the WMS without user intervention for the whole lifetime of the job
25Advanced Job types: Job Collection
- Set of independent jobs
- Collect jobs in single directory
glite-wms-job-submit -a --collection <directory>
- Advanced collection using global set of attributes
[
  Type = "Collection";
  InputSandbox = {"myjob.exe", "fileA"};
  OutputSandboxBaseDestURI = "gsiftp://lxb0707.cern.ch/data/doe";
  DefaultNodeShallowRetryCount = 5;
  Nodes = {
    [
      Executable = "myjob.exe";
      InputSandbox = {root.InputSandbox, "fileB"};
      OutputSandbox = {"myoutput1.txt"};
      Requirements = other.GlueCEPolicyMaxWallClockTime > 1440;
    ],
    [
      NodeName = "mysubjob";
      Executable = "myjob.exe";
      OutputSandbox = {"myoutput2.txt"};
      ShallowRetryCount = 3;
    ],
    [
      File = "/home/doe/test.jdl";
    ]
  };
]
26Advanced Job types: Parametric
- Identical jobs, except the value of a parameter
- Parameters
- List of items
- Number
- ParameterStart and ParameterStep necessary
- _PARAM_ replaced by parameter value
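The _PARAM_ substitution can be illustrated with a short Python sketch. expand_parametric is a hypothetical helper, and the numeric case is modelled here with Python's range, which excludes the upper bound; whether the WMS includes or excludes the upper bound should be checked against the JDL specification.

```python
def expand_parametric(template, values):
    """Return one copy of `template` per value, with _PARAM_ replaced by that value."""
    return [template.replace("_PARAM_", str(value)) for value in values]


# Assumption: a numeric Parameters value with ParameterStart and ParameterStep
# behaves like range(start, stop, step), i.e. the upper bound itself is excluded.
values = range(1, 100, 1)
inputs = expand_parametric("input_PARAM_.txt", values)
print(inputs[0], inputs[-1], len(inputs))

# The list-of-items form works the same way with explicit values.
named = expand_parametric("input_PARAM_.txt", ["alpha", "beta"])
print(named)
```

Each generated name corresponds to one sub-job, so every referenced input file has to exist before submission.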
JobType = "Parametric";
Executable = "myjob.exe";
StdInput = "input_PARAM_.txt";
StdOutput = "output_PARAM_.txt";
StdError = "error_PARAM_.txt";
Parameters = 100;
ParameterStart = 1;
ParameterStep = 1;
InputSandbox = {"myjob.exe", "input_PARAM_.txt"};
OutputSandbox = {"output_PARAM_.txt", "error_PARAM_.txt"};
27Advanced job types: MPI
- For programs using the MPI parallel library
- JobType = "MPICH";
- NodeNumber = <n>;
- Request n cores on the remote cluster
- Scheduling is determined at remote site
- Submit script that starts up your program using MPI
- You can use mpi-start for this
- Scheduling SMP nodes not yet possible
28Other Advanced Job types
- Directed Acyclic Graph
- Graph shows dependencies between jobs
- Interactive
- Opens graphical window that connects to job
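The dependency idea behind a Directed Acyclic Graph job can be sketched with Python's standard graphlib module (Python 3.9 or later). The job names are hypothetical, and this only shows how dependencies constrain execution order; it says nothing about how the WMS itself schedules the nodes.

```python
from graphlib import TopologicalSorter

# Hypothetical DAG: each key lists the jobs that must finish before it can start.
dependencies = {
    "prepare": set(),
    "simulate_a": {"prepare"},
    "simulate_b": {"prepare"},
    "merge": {"simulate_a", "simulate_b"},
}

# static_order() yields every job after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

The two simulate jobs have no dependency on each other, so they may appear in either order and could run in parallel.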
29Pilot jobs
- Send agent to a site first
- Will fetch workload from central service
- Both single and multi user frameworks exist
- Advantages
- Hides problematic sites from the user
- Probably less overhead per workload
- Makes central scheduling possible
- Disadvantage
- Less efficient scheduling at site
- Security concerns with multiple users
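The pull model of a pilot job can be sketched as an agent that starts at a site and keeps fetching work from a central queue until none is left. Everything here is an illustrative assumption: run_pilot, the site name and the in-process queue.Queue stand in for a real pilot framework and its central service.

```python
import queue


def run_pilot(site, central_queue):
    """Fetch tasks from the central queue until it is empty; return (site, name, result) records."""
    results = []
    while True:
        try:
            name, func, arg = central_queue.get_nowait()
        except queue.Empty:
            # No workload left: the pilot exits and frees its slot.
            return results
        results.append((site, name, func(arg)))


# Hypothetical central workload: three small tasks.
central_queue = queue.Queue()
for n in (2, 3, 4):
    central_queue.put((f"square-{n}", lambda x: x * x, n))

results = run_pilot("site-a.example.org", central_queue)
print(results)
```

Because the agent asks for work only once it is running, a site where the pilot never starts never receives any workload, which is how problematic sites stay hidden from the user.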
30How to get your program on the Grid?
- Send it with job
- Binary package
- Compile it on the fly
- Advantages
- You will be able to run anywhere
- Disadvantages
- Portability extremely important, otherwise jobs will fail
- Overhead
- Use software manager accounts
- Write permission on special shared directory
- Publish tags in information system
- Advantages
- Software can be validated
- Central management takes burden from users
- Disadvantages
- Sites have to support this
- Lot of work for software manager
- Use preinstalled packages
- Only possible for special collaborations
- Example: VL-e software stack
- Advantages
- Very easy for users
- Disadvantages
- Sites have to support it
- Lot of work for software packager
31Pointers to advanced topics
- Can be found in gLite user guide http://glite.web.cern.ch/glite/documentation/
- Advanced sandbox play
- Gridftp instead of local files
- No space on WMS needed
- Brokerinfo
- Information about local environment (CE, SEs, etc.)
- Job perusal
- Peek at the output while your job is running
- Automatic retries
- RetryCount
- ShallowRetryCount