Title: The gLite Workload Management System
1. The gLite Workload Management System
Alessandro Maraschini, alessandro.maraschini_at_datamat.it
OGF06, Manchester, UK, 9 May 2007
2. Contents
- JRA1 WMS
- JRA1 overview, scope
- WMS, institutes, task, components
- JDL
- Language overview
- Job types: single, compound, workflows
- News
- New Functionalities
- Latest Activities
- Tests
- Middleware testing activities: results
- Future plans, conclusions
- Workflow Activities
- Ongoing Implementations Activities
- Future Implementations Activities
- WMS and WfMS
3. Introduction: gLite WMS
- Workload Management System (WMS)
- Part of Joint Research Activity 1 (JRA1)
- Institutes involved
- INFN
- Datamat
- CESNET
- Provides distribution and management of tasks across resources available on a Grid
- Accepts a job execution request from a client
- Finds appropriate resources to satisfy the job
- Follows the job until completion
- Different aspects of job management are accomplished by different WMS components
- Implemented as different processes
- Communicating via data structures stored on disk, to avoid data loss
- Jobs are associated with user credentials: operations are done on behalf of the user
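The idea of separate processes exchanging requests through data structures stored on disk can be sketched as follows. This is only an illustration under stated assumptions: the `DiskQueue` class, the one-file-per-request layout and the JSON encoding are hypothetical and are not taken from the WMS code. Each request is written to a temporary file and then renamed atomically, so a crash of the writer or of the reader does not lose or corrupt a request.

```python
import json
import os
import tempfile
import time
import uuid


class DiskQueue:
    """Hypothetical directory-based queue: one JSON file per pending request."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def put(self, request):
        # Write to a temporary file first, then rename it atomically, so a
        # reader never sees a half-written request.
        fd, tmp_path = tempfile.mkstemp(dir=self.directory, suffix=".tmp")
        with os.fdopen(fd, "w") as handle:
            json.dump(request, handle)
            handle.flush()
            os.fsync(handle.fileno())
        # A timestamp prefix keeps the files roughly in arrival order.
        name = f"{time.time_ns():020d}-{uuid.uuid4().hex}.req"
        os.replace(tmp_path, os.path.join(self.directory, name))

    def get(self):
        # Return the oldest pending request, or None if the queue is empty.
        pending = sorted(
            n for n in os.listdir(self.directory) if n.endswith(".req")
        )
        if not pending:
            return None
        path = os.path.join(self.directory, pending[0])
        with open(path) as handle:
            request = json.load(handle)
        # Delete the file only after the request has been read successfully.
        os.remove(path)
        return request
```

A producer process would call `put({"job_id": "job-1", "action": "submit"})` and a consumer process would poll `get()`; because the state lives on disk, either side can be restarted without losing pending requests.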
4. WMS Architecture: JRA1 core components
- WMProxy
- Accept Request from User
- Check Authentication/Authorization
- Set up Local File System
- provide access to the WMS
- Forward request to WM
- Workload Manager (WM)
- Accept and satisfy requests for job management coming from its clients
- Forward request to the appropriate Computing Element (CE) for execution
- Logging and Bookkeeping (LB)
- Tracks jobs in terms of events gathered from various gLite components
- The server processes the incoming events to give a higher-level view of the job states (e.g. Submitted, Running, Done)
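How a stream of low-level events can be reduced to a higher-level job state can be sketched as below. The event names (`registered`, `started`, `finished`) and the mapping are hypothetical, chosen only for illustration; only the state names Submitted, Running and Done come from the slide. The sketch keeps the most advanced state seen so far, so events arriving out of order do not move a job backwards.

```python
# Hypothetical mapping from low-level events to high-level job states.
EVENT_TO_STATE = {
    "registered": "Submitted",
    "started": "Running",
    "finished": "Done",
}

# States listed from least to most advanced.
STATE_ORDER = ["Submitted", "Running", "Done"]


def job_state(events):
    """Return the most advanced state implied by a list of event names."""
    state = None
    for event in events:
        candidate = EVENT_TO_STATE.get(event)
        if candidate is None:
            continue  # this event does not affect the job state
        if state is None or STATE_ORDER.index(candidate) > STATE_ORDER.index(state):
            state = candidate
    return state
```

For example, `job_state(["registered", "started"])` yields `"Running"`, and the same result is obtained if the two events are received in the opposite order.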
5. WMS Architecture overview
6. JDL overview
- Job Description Language (JDL)
- gLite approach to Request Description
- ClassAd-based language
- Fully extensible, flexible, high-level language
- Lets the user provide the information needed for job execution
- Characteristics of the application (Executable, Arguments, Input/Output Sandbox files, ...)
- Requirements/preferences about resources (computational, storage)
- Customized hints for the gLite WMS on how to handle the application (number of retries, proxy renewal, ...)
- Supported Job Types
- Single Jobs
- Compound Jobs
- Workflows (DAGs)
- Collections, Parametrics
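To make the shape of a job description concrete, the sketch below renders a Python dictionary as ClassAd-style text (`Name = value;` pairs inside square brackets, lists in braces). The helper names `Expr`, `render_value` and `render_jdl` are hypothetical, and the attribute values in the usage note are illustrative; this is not the WMS parser or an official client API.

```python
class Expr(str):
    """Marks a value as a raw expression, emitted without quotes."""


def render_value(value):
    """Render one Python value in ClassAd-style syntax."""
    if isinstance(value, Expr):
        return str(value)
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, (list, tuple)):
        return "{" + ", ".join(render_value(v) for v in value) + "}"
    # Plain strings are quoted, with embedded quotes escaped.
    return '"' + str(value).replace('"', '\\"') + '"'


def render_jdl(attributes):
    """Render a dictionary of attributes as a bracketed description."""
    lines = [
        f"  {name} = {render_value(value)};"
        for name, value in attributes.items()
    ]
    return "[\n" + "\n".join(lines) + "\n]"
```

A call such as `render_jdl({"Executable": "/bin/hostname", "OutputSandbox": ["std.out", "std.err"], "RetryCount": 3})` produces one line per attribute, covering the three kinds of information listed above: application characteristics, resource requirements (which would be passed as `Expr` values) and hints such as the number of retries.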
7. JDL: Single Types
- Single Jobs
- Normal: a simple batch job with no special requirements
- MPICH: a parallel application to be run on the nodes of a cluster using the MPICH implementation of the Message Passing Interface (changes to extend support to new MPI flavours are planned for the future)
- Interactive: a job whose standard streams are forwarded to the submitting client, which can interact with and steer the job execution by providing real-time input
- Previously Supported Jobs
- Deprecated due to lack of feedback
- No longer supported
- Checkpointable Jobs
- Partitionable Jobs
8. JDL: Compound Jobs
- Definition
- Aggregate of Normal Jobs
- Benefits
- One-shot submission of up to thousands of jobs
- Submission time reduction
- Single call to WMProxy server
- Single AuthN and AuthZ process
- Sharing of files between jobs
- Single identifier to manage all jobs (father job)
9. JDL: Compound Types
- Compound Jobs: Workflows
- Implemented as Directed Acyclic Graphs (DAGs)
- Set of jobs where the input, output or execution of one or more jobs may depend on one or more other jobs
- Dependencies represent time constraints: a child cannot start before all parents have successfully completed
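The dependency rule for workflows (a child starts only after all of its parents have completed successfully) can be sketched with the standard-library `graphlib` module (Python 3.9 or later). The `run_workflow` function and the job names in the usage note are hypothetical; this is not how the WMS schedules DAGs, only an illustration of the constraint.

```python
from graphlib import TopologicalSorter


def run_workflow(parents, run_job):
    """Run jobs respecting dependencies.

    `parents` maps each job name to the set of jobs it depends on.
    `run_job` is called once per job and returns True on success.
    Returns the list of successfully completed jobs, in execution order.
    """
    sorter = TopologicalSorter(parents)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    completed = []
    while sorter.is_active():
        ready = sorter.get_ready()
        if not ready:
            break  # the remaining jobs are blocked by a failed parent
        for job in ready:
            if run_job(job):
                # Only a successful job unblocks its children.
                sorter.done(job)
                completed.append(job)
    return completed
```

With `parents = {"merge": {"sim_a", "sim_b"}, "sim_a": {"prepare"}, "sim_b": {"prepare"}}`, the job `prepare` runs first and `merge` runs last; if `sim_a` fails, `merge` is never started.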
10. JDL: Compound Types
- Compound Jobs: Parametrics
- Parameterized description of a job
- Automatically converted on the WMS side
- Generates a (possibly) huge number of (similar) jobs
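The conversion of one parameterized description into many similar jobs can be sketched as a simple template expansion. The function name `expand_parametric`, its `start`, `step` and `count` arguments and the use of `_PARAM_` as the placeholder token are assumptions made for this illustration; the actual conversion is performed by the WMS and may follow different rules.

```python
def expand_parametric(template, start, step, count, placeholder="_PARAM_"):
    """Expand one parameterized job description into `count` concrete jobs."""

    def substitute(value, parameter):
        # Replace the placeholder in strings and in lists of strings;
        # leave every other value untouched.
        if isinstance(value, str):
            return value.replace(placeholder, parameter)
        if isinstance(value, list):
            return [substitute(item, parameter) for item in value]
        return value

    jobs = []
    for index in range(count):
        parameter = str(start + index * step)
        jobs.append(
            {name: substitute(value, parameter) for name, value in template.items()}
        )
    return jobs
```

For instance, a template with `"Arguments": "--seed _PARAM_"` expanded with `start=10, step=5, count=3` gives three jobs whose arguments use the seeds 10, 15 and 20, while the template itself is left unchanged.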
11. JDL: Compound Types
- Compound Jobs: Collections
- A set of possibly heterogeneous jobs that can be specified within a single JDL description
- No dependencies among the specified jobs
- Jobs are executed independently across the Grid
12. New Functionalities: WMProxy
- WMProxy server
- Replaced the old C-based socket connection service
- Implements an interoperable interface
- Web Service based
- SOA conformance
- WS-I compliance
- provided new operations
- WMProxy client
- Provides a C-based WMS command-line User Interface (UI), which executes all the needed operations automatically
- Provides multi-language APIs (C, Java and Python)
- Feel free to implement your own client in your preferred language
13. New Functionalities: ICE-CREAM
- Resources supported for WMS job submission
- LCG Computing Element
- gLite Condor-based Computing Element
- In addition, the recently introduced ICE
- Intermediate layer service
- Allows the WMS to send operations directly to CREAM
- CREAM: Computing Resource Execution And Management Service
- Asynchronously receives notifications about job status changes
14. New Functionalities: Sandbox Files
- Sandbox Archiving and Sharing
- Job sandbox files can be automatically compressed
- Different jobs can share the same sandbox
- Dramatically reduces network traffic
- Lets the user save time and bandwidth
- Sandbox Remote Specification
- User can store files directly on a remote machine
- No intermediate copies: the worker node downloads the files directly
- Reduced server load
- Supported File Transfer
- Full support (submission and output file retrieval) for the protocols
- gridftp
- https
15. New Functionalities: Bulk-MM
- Bulk-Matchmaking
- Allows a single matchmaking of similar jobs in one shot
- Job equivalence is based on the significant attributes specified at submission
- Target jobs: sets of independent jobs
- Mainly Collections and Parametrics
- Originally managed with DAGMan
- Saves time and resources
- Improved system stability and performance
- Jobs left in a non-final state decreased from 5% to 0.3% (see next slides)
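The principle behind bulk matchmaking can be sketched as grouping jobs by their significant attributes and running the matchmaking once per group instead of once per job. The choice of `Requirements` and `Rank` as significant attributes, the `bulk_match` function and the job dictionary layout are assumptions made for illustration, not the WMS implementation.

```python
from collections import defaultdict

# Hypothetical choice of the attributes that make two jobs equivalent.
SIGNIFICANT_ATTRIBUTES = ("Requirements", "Rank")


def bulk_match(jobs, match, significant=SIGNIFICANT_ATTRIBUTES):
    """Assign a resource to every job with one `match` call per group.

    `jobs` is a list of dictionaries, each with an "id" key.
    `match` takes one representative job and returns a resource.
    """
    groups = defaultdict(list)
    for job in jobs:
        key = tuple(job.get(name) for name in significant)
        groups[key].append(job)

    assignments = {}
    for members in groups.values():
        # Equivalent jobs share the result of a single matchmaking.
        resource = match(members[0])
        for job in members:
            assignments[job["id"]] = resource
    return assignments
```

For a collection of a thousand jobs with identical requirements, `match` is called once rather than a thousand times, which is where the time and resource savings come from.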
16. Other New Functionalities
- Service Discovery
- Provides additional information by querying external databases of different kinds (R-GMA, BDII)
- Client side
- Queries for available WMProxy endpoints on the network
- No manual reconfiguration of user commands needed
- Server side
- Queries for available LB servers to which job information can be logged
- Job Files Perusal
- Monitors the actual output files produced by a job during its lifecycle
- Adds important information that simple status monitoring does not provide and that was previously available only at job completion
17. New Activities
- New platforms widely deployed on the infrastructure
- In particular Scientific Linux 4 and 64-bit architectures
- Migration to the ETICS build system
- More flexible, in particular in addressing multi-platform support
- Almost impossible using the old gLite build system
- Build achieved for all WMS components
- Client-side manual installation fully working
- Ongoing activity: integration
- Software not yet fully deployed
- Server-side installation not yet available (expected in the short term)
18. Test Results
- Intense testing and constant bug-fixing activities have been performed over the last months
- Improved job submission rate
- Improved service stability
- New Functionalities tested and adopted
- Production quality test Results
- 16K jobs/day over one week of submissions
- No manual intervention on server
- Stable memory usage
- 0.3% of jobs in non-final state
- Aborted jobs mostly due to expired user credentials
19. Test Results
20. gLite WMS: Ongoing Restructuring
- gLite Restructuring
- All activities stopped for 6 months
- Improving usability and portability
- Multi-platform (structural changes needed)
- Cleaning up sections that cause build and porting difficulties
- Removing/reducing dependencies on external software
- Objectives
- Easier service maintenance and usage
- Will increase stability and throughput
- Toward a gLighter User Interface
- Identify and remove all unnecessary dependencies
21. gLite WMS: Future
- Improving Logging and Error Reporting
- Windows working prototype
- Port of gLite to MS Windows
- Improving interoperability
- Supercomputing 06 working prototype
- Basic Execution Service (BES)
- Job Submission Description Language (JSDL)
22. WfMS and gLite WMS
- Possible integration with existing external workflow managers
- Triana, GWES, Taverna, etc.
- Still to be discussed and planned for EGEE III
- In addition, a Workflow Management System (WfMS) architecture proposal for the WMS
- Running on top of gLite middleware
- Independent of the Grid middleware
- Abstract and generic representation
- Translation mechanisms from different language front ends
- Will be presented and discussed at the next CoreGRID forum