The gLite Workload Management System, Alessandro Maraschini (alessandro.maraschini_at_datamat.it)

Enabling Grids for E-sciencE. www.eu-egee.org

1
The gLite Workload Management System
Alessandro Maraschini, alessandro.maraschini_at_datamat.it
OGF06, Manchester, UK, May 9, 2007
2
Contents
  • JRA1 WMS
  • JRA1 overview, scope
  • WMS, institutes, task, components
  • JDL
  • Language overview
  • Job types: single, compound, workflows
  • News
  • New Functionalities
  • Latest Activities
  • Tests
  • Middleware Testing Activities and Results
  • Future Plans and Conclusions
  • Workflow Activities
  • Ongoing Implementations Activities
  • Future Implementations Activities
  • WMS and WfMS

3
Introduction: gLite WMS
  • Workload Management System (WMS)
  • Part of Joint Research Activity 1 (JRA1)
  • Institutes involved
  • INFN
  • Datamat
  • CESNET
  • Provides distribution and management of tasks
    across the resources available on a Grid
  • Accepts a job execution request from a client
  • Finds appropriate resources to satisfy the job
  • Follows the job until completion
  • Different aspects of job management are
    accomplished by different WMS components
  • Implemented as different processes
  • Communicating via data structures stored on
    disk, to avoid data loss
  • Each job is associated with user credentials:
    operations are done on behalf of the user
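
The slides say only that the WMS components are separate processes exchanging data structures stored on disk so that nothing is lost; they do not describe the mechanism. As a minimal sketch of the general idea, assuming a hypothetical spool directory and JSON records (neither is taken from gLite), one process can persist each request atomically and another can pick it up later, even after a restart:

```python
import json
import os
import tempfile

def enqueue(spool_dir, request):
    """Atomically persist one request as a JSON file in spool_dir."""
    os.makedirs(spool_dir, exist_ok=True)
    # Write to a temporary file first, then rename it into place, so a
    # reader never sees a half-written request.
    fd, tmp_path = tempfile.mkstemp(dir=spool_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(request, handle)
        handle.flush()
        os.fsync(handle.fileno())
    final_path = tmp_path[:-len(".tmp")] + ".req"
    os.replace(tmp_path, final_path)
    return final_path

def dequeue_all(spool_dir):
    """Read and remove every complete request found in spool_dir."""
    requests = []
    for name in sorted(os.listdir(spool_dir)):
        if not name.endswith(".req"):
            continue  # skip temporary files still being written
        path = os.path.join(spool_dir, name)
        with open(path) as handle:
            requests.append(json.load(handle))
        os.remove(path)
    return requests
```

The write-then-rename step is what protects against partial data: a crash leaves either no file or a complete one. This sketch does not preserve submission order.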

4
WMS Architecture: JRA1 core components
  • WMProxy
  • Accepts requests from the user
  • Checks authentication/authorization
  • Sets up the local file system
  • Provides access to the WMS
  • Forwards requests to the WM
  • Workload Manager (WM)
  • Accepts and satisfies requests for job management
    coming from its clients
  • Forwards requests to the appropriate Computing
    Element (CE) for execution
  • Logging and Bookkeeping (LB)
  • Tracks jobs in terms of events gathered from
    various gLite components
  • The server processes the incoming events to give
    a higher level view on the job states (e.g.
    Submitted, Running, Done)
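
To illustrate how a stream of low-level events can be folded into a higher-level job state, here is a minimal sketch. Only the state names Submitted, Running and Done come from the slide; the event names and the "Unknown" default are hypothetical and do not reflect the real LB event model.

```python
# Hypothetical event names; only the state names Submitted, Running and
# Done are taken from the slide.
EVENT_TO_STATE = {
    "accepted": "Submitted",
    "started": "Running",
    "finished": "Done",
}

def job_state(events):
    """Return the state implied by the latest recognised event."""
    state = "Unknown"  # hypothetical default before any event arrives
    for event in events:
        # Unrecognised events leave the current state unchanged.
        state = EVENT_TO_STATE.get(event, state)
    return state

print(job_state(["accepted", "started"]))  # prints Running
```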

5
WMS Architecture overview
6
JDL overview
  • Job Description Language (JDL)
  • gLite approach to Request Description
  • ClassAd-based language
  • Fully extensible, flexible, high-level language
  • Allows the user to provide the information needed
    for job execution
  • Characteristics of the application
    (Executable, Arguments, Input/Output Sandbox
    files, ...)
  • Requirements/preferences about resources
    (computational, storage)
  • Customized hints for gLite WMS on how to handle
    the application (number of retries, proxy
    renewal, ...)
  • Supported Job Types
  • Single Jobs
  • Compound Jobs
  • Workflows (DAGs)
  • Collections, Parametrics
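
As an illustration of what a ClassAd-style job description looks like, the sketch below renders a Python dictionary as a bracketed list of `name = value;` pairs. The slide names Executable, Arguments and the input/output sandbox; `StdOutput` and `RetryCount` are attribute names recalled from the gLite JDL and should be checked against the JDL specification, and `to_classad` is a hypothetical helper, not part of any gLite API. Expressions such as Requirements are left out because they are not plain quoted strings.

```python
def to_classad(attributes):
    """Render a dict as a ClassAd-style '[ name = value; ... ]' record."""
    def render(value):
        if isinstance(value, str):
            return '"' + value + '"'
        if isinstance(value, (list, tuple)):
            return "{" + ", ".join(render(item) for item in value) + "}"
        return str(value)

    lines = ["["]
    for name, value in attributes.items():
        lines.append("  " + name + " = " + render(value) + ";")
    lines.append("]")
    return "\n".join(lines)

# Illustrative job description; the values are made up.
job = {
    "Executable": "/bin/hostname",
    "Arguments": "-f",
    "StdOutput": "out.txt",
    "OutputSandbox": ["out.txt"],
    "RetryCount": 3,
}
print(to_classad(job))
```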

7
JDL Single Types
  • Single Jobs
  • Normal: a single, simple batch job with no
    peculiar requirements
  • MPICH: a parallel application to be run on the
    nodes of a cluster using the MPICH implementation
    of the Message Passing Interface (new flavours of
    MPI: changes to extend support planned in future)
  • Interactive: a job whose standard streams are
    forwarded to the submitting client, which can
    interact with and steer the job execution by
    providing real-time input
  • Previously Supported Jobs
  • Deprecated due to lack of feedback
  • No longer supported
  • Checkpointable Jobs
  • Partitionable Jobs

8
JDL Compound Jobs
  • Definition
  • Aggregate of Normal Jobs
  • Benefits
  • One-shot submission for (up to thousands of) jobs
  • Submission time reduction
  • Single call to WMProxy server
  • Single AuthN and AuthZ process
  • Sharing of files between jobs
  • Single identifier to manage all jobs (father
    job)

9
JDL Compound Types
  • Compound Jobs: Workflows
  • Implemented as Directed Acyclic Graphs (DAGs)
  • Set of jobs where the input, output or execution
    of one or more jobs may depend on one or more
    other jobs
  • Dependencies represent time constraints: a child
    cannot start before all its parents have
    successfully completed
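
The dependency rule above (a child may start only once every parent has completed successfully) can be sketched as follows. The workflow shape and the `ready_jobs` helper are hypothetical; this is not how the WMS or DAGMan implements scheduling.

```python
def ready_jobs(parents, succeeded, started):
    """Return jobs not yet started whose parents have all succeeded."""
    return sorted(
        job
        for job, job_parents in parents.items()
        if job not in started and all(p in succeeded for p in job_parents)
    )

# Hypothetical diamond-shaped workflow: A -> (B, C) -> D
parents = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}

print(ready_jobs(parents, succeeded=set(), started=set()))   # ['A']
print(ready_jobs(parents, succeeded={"A"}, started={"A"}))   # ['B', 'C']
# D stays blocked while C is still running.
print(ready_jobs(parents, succeeded={"A", "B"}, started={"A", "B", "C"}))  # []
```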

10
JDL Compound Types
  • Compound Jobs: Parametrics
  • Parameterized description of a job
  • Automatically converted on the WMS side
  • Generates a (possibly) huge number of (similar)
    jobs
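
The expansion of one parameterized description into many similar jobs can be sketched as a simple substitution. The `_PARAM_` placeholder follows the convention recalled from the gLite JDL and should be treated as an assumption; `expand_parametric` and the template values are hypothetical, and the real conversion happens on the WMS side.

```python
def expand_parametric(template, values, placeholder="_PARAM_"):
    """Return one job description per value, with the placeholder replaced."""
    jobs = []
    for value in values:
        job = {}
        for name, field in template.items():
            if isinstance(field, str):
                field = field.replace(placeholder, str(value))
            job[name] = field
        jobs.append(job)
    return jobs

# Illustrative template; the attribute values are made up.
template = {
    "Executable": "simulate.sh",
    "Arguments": "--seed _PARAM_",
    "StdOutput": "out._PARAM_.txt",
}

jobs = expand_parametric(template, range(1, 4))
for job in jobs:
    print(job["Arguments"], job["StdOutput"])
```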

11
JDL Compound Types
  • Compound Jobs: Collections
  • A set of possibly heterogeneous jobs that can be
    specified within a single JDL description
  • No dependencies among the specified jobs
  • Jobs are executed independently across the Grid

12
New Functionalities: WMProxy
  • WMProxy server
  • Replaced the old C++ based socket connection
    service
  • Implements an interoperable interface
  • Web Service based
  • SOA conformance
  • WS-I compliance
  • Provides new operations
  • WMProxy client
  • Provides a C++ based WMS command-line User
    Interface (UI), which executes all the needed
    operations automatically
  • Provides multi-language APIs (C++, Java and
    Python)
  • Feel free to implement your own client in the
    language of your choice

13
New Functionalities: ICE-CREAM
  • Resources supported for WMS job submission
  • LCG Computing Element
  • gLite Condor-based Computing Element
  • Moreover, the recently introduced ICE
  • Intermediate layer service
  • Allows the WMS to send operations directly to
    CREAM
  • Computing Resource Execution And Management
    Service
  • Asynchronously receives notifications about job
    status changes

14
New Functionalities: Sandbox Files
  • Sandbox Archiving and Sharing
  • Job sandbox files can be automatically compressed
  • Different jobs can share the same sandbox
  • Dramatically reduces network traffic
  • Saves the user time and bandwidth
  • Sandbox Remote Specification
  • Users can store files directly on a remote machine
  • No intermediate copies: the worker node downloads
    the files directly
  • Reduced server load
  • Supported File Transfer
  • Full support (submission and output file
    retrieval) for the protocols
  • gridftp
  • https

15
New Functionalities: Bulk-MM
  • Bulk-Matchmaking
  • Allows a single matchmaking of similar jobs in one
    shot
  • Job equivalence is based on the significant
    attributes given at submission
  • Target jobs: bunches of independent jobs
  • Mainly Collections and Parametrics
  • Originally managed with DAGMan
  • Saves time and resources
  • Improved system stability and performance
  • Jobs in non-final status decreased from 5% to
    0.3% (see next slides)
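
The idea behind bulk matchmaking, running the matchmaker once per group of equivalent jobs instead of once per job, can be sketched as below. The attribute names, the `toy_match` function and the resource names are all hypothetical; a real matchmaker would also spread the jobs of a group across the matching resources rather than assign them all to one.

```python
from collections import defaultdict

def bulk_match(jobs, significant, match):
    """Run `match` once per group of jobs equal on the significant attributes."""
    groups = defaultdict(list)
    for job in jobs:
        key = tuple(job.get(name) for name in significant)
        groups[key].append(job)

    assignments = {}
    calls = 0
    for members in groups.values():
        resource = match(members[0])  # one matchmaking call per group
        calls += 1
        for job in members:
            assignments[job["id"]] = resource
    return assignments, calls

def toy_match(job):
    # Hypothetical matchmaker: picks a resource from the CPU request only.
    return "ce-big" if job["cpus"] > 1 else "ce-small"

jobs = [
    {"id": "j1", "cpus": 1},
    {"id": "j2", "cpus": 1},
    {"id": "j3", "cpus": 4},
    {"id": "j4", "cpus": 1},
]
assignments, calls = bulk_match(jobs, ["cpus"], toy_match)
print(calls, assignments)
```

Four jobs fall into two equivalence groups here, so the matchmaker runs twice instead of four times.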

16
Other New Functionalities
  • Service Discovery
  • Provides additional information by performing
    queries to external databases of different kinds
    (R-GMA, BDII)
  • Client side
  • Queries for available WMProxy endpoints on the
    net
  • No manual reconfiguration of user commands needed
  • Server side
  • Queries for available LB servers on which to log
    job information
  • Job Files Perusal
  • Monitors the actual output files produced by a
    job during its lifecycle
  • Adds important information that simple status
    monitoring does not provide and that was
    previously available only at job completion

17
New Activities
  • New platforms widely deployed on the
    infrastructure
  • In particular Scientific Linux 4 and 64-bit
    architectures
  • Migration to ETICS build system
  • More flexible, in particular in
  • Addresses multiple platform support
  • almost impossible using the old gLite build
    system
  • Build achieved for all WMS components
  • Client side manual installation fully working
  • Ongoing activity: integration
  • Software not yet fully deployed
  • Server side installation not yet available
    (expected in the short term)

18
Test Result
  • Intense testing and constant bug fixing
    activities have been performed over the last
    months
  • Improved job submission rate
  • Improved service stability
  • New Functionalities tested and adopted
  • Production-quality test results
  • 16K jobs/day over one week of submissions
  • No manual intervention on server
  • Stable memory usage
  • 0.3% of jobs in non-final state
  • Aborted jobs mostly due to expired user
    credentials
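
To put these figures in absolute terms, the short calculation below takes the slide's numbers at face value, assuming that "16K" means 16,000 jobs per day for seven days and that the non-final figure of 0.3 is a percentage. The resulting counts are derived estimates, not values reported in the presentation.

```python
jobs_per_day = 16_000             # "16K jobs/day" from the slide
days = 7                          # "one week of submissions"
non_final_fraction = 0.3 / 100    # assumption: the slide's 0.3 is a percentage

total_jobs = jobs_per_day * days
non_final_jobs = round(total_jobs * non_final_fraction)
print(total_jobs, non_final_jobs)  # prints 112000 336
```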

19
Test Result
20
gLite WMS: Ongoing Restructuring
  • gLite Restructuring
  • All activities stopped for 6 months
  • Improving usability and portability
  • Multi-platform support (structural changes needed)
  • Cleaning up sections that cause build and porting
    difficulties
  • Removing/reducing dependencies on external
    software
  • Objectives
  • Easier service maintenance and usage
  • Will increase stability and throughput
  • Toward a gLighter User Interface
  • Identify and remove all unnecessary dependencies

21
gLite WMS Future
  • Improving Logging and Error Reporting
  • Windows working prototype
  • gLite porting to MS Windows
  • Improving interoperability
  • Supercomputing 06 working prototype
  • Basic Execution Service (BES)
  • Job Submission Description Language (JSDL)

22
WfMS and gLite WMS
  • Possible integration with external existing
    Workflow managers
  • Triana, GWES, Taverna, etc.
  • Still to be discussed and planned for EGEE III
  • Moreover, a Workflow Management System (WfMS)
    architecture proposal for WMS
  • Running on top of gLite middleware
  • Grid middleware independent
  • Abstract and generic representation
  • Translation mechanisms from different language
    front ends
  • Will be presented/discussed at the next CoreGRID
    forum