Title: WP3 Job management and control
- Viet Tran
- Institute of Informatics
- Slovakia
Work in WP3
- Partners: II-SAS, GC RAS, SCAI, CNRS, CGG
- Task 3.1: Detailed analysis of existing tools and technologies regarding their usability in ES applications
- Task 3.2: Identification of missing technologies and tools
- Task 3.3: Specification of improvements to existing tools and technologies
- Task 3.4: Preparation of test suites
WP3 Structure
T3.1 Existing technologies
WP1 Application requirements
T3.2 Missing technologies
T3.4 Test suites
T3.3 Improvements of technologies
Current situation
T3.1 Existing technologies
WP1 Application requirements
Timeline
T3.2 Missing technologies
T3.4 Test suites
T3.3 Improvements of technologies
Task 3.1 Existing technologies
- Overview of existing Grid technologies
- Analyzed middleware
- Globus Toolkit 4 (GC RAS)
- Unicore (SCAI)
- LCG, gLite (II-SAS)
- Analyzed tools
- Workflow management tools (II-SAS)
- Distributed management tools (CNRS)
- Monitoring tools (CNRS)
Task 3.1 Existing technologies
- Results were reported in D3.1 (PM9)
- Some tools and middleware are still missing from the report (especially those identified after the workshop in Bratislava): ARC, ASKALON, JMS, ...
- The report needs to be updated in the final Deliverable D3.3, "Updated state of the art and gap analysis on application management and control technologies" (PM21)
Task 3.2 Missing technologies
- Ongoing work (PM4-PM18)
- Missing technologies
- Job execution (II-SAS): near-realtime execution, reliability
- Workflow (II-SAS): automatic workflow composition, dynamic workflow
- Monitoring (II-SAS): expected job start/end time, notification, progress
- MPI support (SCAI): different versions of MPI
- Licensing management and scheduling (SCAI)
- Co-scheduling of data (SCAI)
- The first draft of the report is available on the portal
Task 3.3 Specification of improvements
- Just started
- Based on the work in T3.1 and T3.2
Task 3.4 Test suites
- Ongoing task
- Two test suites have been proposed
- Seismology CMT (CNRS): simple application family; test cases focus on distributed job management
- Flood forecasting FFCS (II-SAS): complex workflow application; test cases focus on workflow
- Reports are uploaded on the portal
Plan in WP3
- Near future
- Finishing M3.2 Missing technologies: the draft is available and must be completed shortly after this meeting (PM12)
- Test suites: drafts of two test suites are done; they need to be refined and improved (PM15)
- M3.3 Specification of improvements to existing tools and technologies (PM15)
- Other work
- Continue the gap analysis and the improvement specification
- Continue work on the test suites
Missing technologies
Near-realtime job execution
- Required by many ES applications in operational mode (meteorology NWP, flood FFSC)
- Only partially supported on current infrastructures
- EGEE: short deadline jobs
- Int.eu.grid: interactive applications with CrossBroker
Reliability
- All jobs must finish before given deadlines (QoS, SLA, fault tolerance, scheduling, load balancing, ...)
- Required by ES applications with risk management (e.g. CMT, flood)
- Distributed job management tools (e.g. DIANE) can partially provide fault tolerance and load balancing, and improve job start time
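A minimal sketch of how a DIANE-style master/worker layer can add fault tolerance and load balancing on top of plain job submission, assuming tasks are independent and a failed task can simply be resubmitted. Task execution is simulated by a caller-supplied `execute(worker, task_id, attempt)` function. The names `run_master_worker`, `max_attempts` and `deadline` are hypothetical and do not reflect DIANE's actual API.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical executor: execute(worker, task_id, attempt) -> (succeeded, duration)
Executor = Callable[[str, int, int], Tuple[bool, float]]


@dataclass
class Task:
    task_id: int
    attempts: int = 0


def run_master_worker(
    n_tasks: int,
    workers: List[str],
    execute: Executor,
    max_attempts: int = 3,
    deadline: float = 100.0,
) -> Tuple[Dict[int, str], List[int], bool]:
    """Simulated master/worker run with resubmission of failed tasks.

    Returns (task -> worker that completed it, permanently failed tasks,
    whether the simulated makespan met the deadline).
    """
    queue = deque(Task(i) for i in range(n_tasks))
    clock = {w: 0.0 for w in workers}  # simulated time at which each worker is free
    done: Dict[int, str] = {}
    failed: List[int] = []
    while queue:
        # Pull model: the worker that becomes free first takes the next task,
        # which balances load between fast and slow workers.
        worker = min(clock, key=clock.get)
        task = queue.popleft()
        task.attempts += 1
        ok, duration = execute(worker, task.task_id, task.attempts)
        clock[worker] += duration
        if ok:
            done[task.task_id] = worker
        elif task.attempts < max_attempts:
            queue.append(task)  # fault tolerance: resubmit, possibly to another worker
        else:
            failed.append(task.task_id)
    makespan = max(clock.values(), default=0.0)
    return done, failed, makespan <= deadline


def flaky_executor(worker: str, task_id: int, attempt: int) -> Tuple[bool, float]:
    # Hypothetical behaviour: worker "flaky" fails every first attempt.
    if worker == "flaky" and attempt == 1:
        return False, 1.0
    return True, 2.0


done, failed, on_time = run_master_worker(4, ["good", "flaky"], flaky_executor)
print(sorted(done), failed, on_time)
```

Resubmission hides transient worker failures from the user. Whether a deadline is met still depends on how many retries fit into the remaining time.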
Workflow management
- Most workflow managers require a predefined workflow before execution
- ES applications with decision support require more flexibility
- Automatic workflow composition (e.g. according to input/output data or semantics)
- Ability to change the workflow during execution
- Co-scheduling of data
- Partially covered by work done in K-Wf Grid
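Automatic composition according to input/output data can be illustrated with simple forward chaining. A service is added once all of its inputs are available, and a backward pass then keeps only the services the goal actually depends on. This is a sketch under assumptions: it matches plain data names rather than the semantic (ontology-based) descriptions mentioned above. The service names and data items are hypothetical and loosely modelled on the flood cascade.

```python
from typing import Dict, List, Set, Tuple

# service name -> (required input data, produced output data)
Services = Dict[str, Tuple[Set[str], Set[str]]]


def compose(services: Services, available: Set[str], goal: str) -> List[str]:
    """Compose an ordered list of service calls that produces `goal`."""
    known = set(available)
    producer: Dict[str, str] = {}  # data item -> service that first produced it
    order: List[str] = []
    changed = True
    # Forward chaining: fire any service whose inputs are all known and
    # which still contributes some new data item.
    while goal not in known and changed:
        changed = False
        for name, (inputs, outputs) in services.items():
            if name not in order and inputs <= known and not outputs <= known:
                order.append(name)
                for item in outputs - known:
                    producer[item] = name
                known |= outputs
                changed = True
    if goal not in known:
        raise ValueError(f"no composition of the given services produces {goal!r}")
    # Backward pass: keep only the services the goal actually depends on.
    needed: Set[str] = set()
    stack = [goal]
    while stack:
        service = producer.get(stack.pop())
        if service is not None and service not in needed:
            needed.add(service)
            stack.extend(services[service][0])
    return [name for name in order if name in needed]


# Hypothetical service descriptions, loosely modelled on the flood cascade.
SERVICES: Services = {
    "mm5": ({"boundary_conditions"}, {"precipitation"}),
    "hspf": ({"precipitation", "catchment"}, {"discharge"}),
    "davef": ({"discharge", "terrain"}, {"water_levels"}),
    "mm5_vis": ({"precipitation"}, {"weather_maps"}),
}

print(compose(SERVICES, {"boundary_conditions", "catchment", "terrain"}, "water_levels"))
# ['mm5', 'hspf', 'davef']
```

The visualization service fires during forward chaining but is pruned, because the requested result does not depend on it.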
MPI support
- Unified and transparent way to submit MPI jobs
- Current support on the EGEE infrastructure is not sufficient
- Ongoing work in Int.eu.grid
License management
- Many ES applications have restricted licenses
- The scheduler needs to distribute jobs according to license availability
- Ongoing work in BEinGrid
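A minimal sketch of license-aware matchmaking. It assumes each site advertises its free CPU slots and a count of free license tokens per application, and that a job stays queued when no site holds a free token. The `Site` structure, the greedy "most spare tokens first" rule and the application names are illustrative assumptions. They are not the scheduling policy of any particular middleware or of BEinGrid.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Site:
    name: str
    free_slots: int
    licenses: Dict[str, int]  # application -> free license tokens (hypothetical model)


def schedule(jobs: List[str], sites: List[Site]) -> Dict[int, Optional[str]]:
    """Assign each job (named by the licensed application it runs) to a site
    with both a free slot and a free license token; None means 'keep queued'."""
    plan: Dict[int, Optional[str]] = {}
    for i, app in enumerate(jobs):
        candidates = [s for s in sites if s.free_slots > 0 and s.licenses.get(app, 0) > 0]
        if not candidates:
            plan[i] = None  # wait until a slot and a license token are released
            continue
        # Greedy rule: prefer the site with the most spare tokens for this application.
        best = max(candidates, key=lambda s: s.licenses[app])
        best.free_slots -= 1
        best.licenses[app] -= 1
        plan[i] = best.name
    return plan


sites = [Site("A", 2, {"solver": 1}), Site("B", 2, {"solver": 2, "mesher": 1})]
print(schedule(["solver", "solver", "solver", "mesher", "solver"], sites))
# {0: 'B', 1: 'A', 2: 'B', 3: None, 4: None}
```

The "mesher" job waits even though site B still holds a mesher token. The scheduler has to respect slots and licenses together.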
Monitoring
- Monitoring tools should provide more information, e.g.:
- Expected job start/end time
- Notification
- Progress of running jobs
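The expected end time and notifications can be derived from progress reports. The sketch below assumes the job reports its progress as a fraction in (0, 1] and extrapolates linearly, so it assumes a constant rate of progress. `expected_end`, `notifications` and the default thresholds are hypothetical names chosen for illustration.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Sequence


def expected_end(start: datetime, now: datetime, progress: float) -> Optional[datetime]:
    """Estimate the end time by linear extrapolation of the reported progress."""
    if not 0.0 < progress <= 1.0:
        return None  # no estimate until the job reports some progress
    elapsed: timedelta = now - start
    return start + elapsed / progress


def notifications(reports: Iterable[float], thresholds: Sequence[float] = (0.5, 1.0)) -> List[str]:
    """Emit one message the first time each progress threshold is reached."""
    sent = set()
    messages: List[str] = []
    for p in reports:
        for t in thresholds:
            if p >= t and t not in sent:
                sent.add(t)
                messages.append(f"job reached {t:.0%}")
    return messages


start = datetime(2007, 5, 1, 10, 0)
print(expected_end(start, datetime(2007, 5, 1, 10, 30), 0.25))  # 2007-05-01 12:00:00
print(notifications([0.1, 0.6, 0.7, 1.0]))  # ['job reached 50%', 'job reached 100%']
```

Linear extrapolation is crude for applications with uneven phases, such as a preprocessing step followed by a long simulation. Per-application progress models would give better estimates.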
Test suite of the flood application
Application Architecture
Application Operation
- The user engages an automated workflow management system and enters a description of the desired result
- The system consults the existing ontology and creates a workflow of service calls that will process existing input data into the desired result
- The system shows the workflow to the user and asks him/her to enter input parameters for the service calls in the workflow
- The system then starts the workflow of service calls, which produces the output files and visualized data that can be displayed to the user
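Once the workflow is composed, the system has to run the service calls in dependency order, and independent calls can run in parallel. The sketch below uses Python's standard `graphlib` module (Python 3.9+) to group the flood modules into stages. The dependency edges (meteorology feeding hydrology feeding hydraulics) are an assumption for illustration, not the project's actual workflow definition.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical dependencies: module -> modules whose output it consumes.
FLOOD_DEPS: Dict[str, Set[str]] = {
    "MM5": {"MM5 Preprocessor"},
    "MM5 Watershed Integration": {"MM5"},
    "MM5 Visualization": {"MM5"},
    "HSPF": {"MM5 Watershed Integration"},
    "DaveF": {"HSPF"},
    "DaveF 2D Visualization": {"DaveF"},
}


def stages(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group service calls into stages; calls within one stage are independent."""
    ts = TopologicalSorter(deps)
    ts.prepare()  # raises graphlib.CycleError if the workflow contains a cycle
    result: List[List[str]] = []
    while ts.is_active():
        ready = sorted(ts.get_ready())
        result.append(ready)
        ts.done(*ready)  # a real engine would call done() as each service call finishes
    return result


for i, stage in enumerate(stages(FLOOD_DEPS), start=1):
    print(i, stage)
```

Under these assumed edges, MM5 Visualization and MM5 Watershed Integration land in the same stage, so a workflow engine could submit them concurrently.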
Testing Scenarios
- Testing of computational resources
- Only basic application execution in the grid
- Testing of the grid service container and grid data services
- Testing of the service abstraction layer (SOA interfaces)
- Testing of a workflow engine
- The services are enacted using a workflow automation tool
- Testing of semantic workflow construction
- The workflow is constructed automatically, using the semantic description of its (potential) components
- Testing of the user interface
- Tests of interoperability of the target user interface
Step 1: Installation of Binary Modules
- Meteorology
- MM5 Preprocessor, MM5, MM5 Watershed Integration, MM5 81-Way Watershed Integration, MM5 Visualization
- Hydrology
- HSPF, HSPF-Complex (for 81-Way simulation)
- Hydraulics
- DaveF, DaveF 2D Visualization
- Visualization and presentation (user interface)
- MM5 Visualization User Job Packager, Hydrograph User Job Packager, Waterflow 2D Visualization User Job Packager
Step 2: Deployment of Service Interfaces
- Based on Globus Toolkit 4
- Required components: WS Core container, GridFTP server, RLS service, WS-MDS service
- Available installation packages
- Binary GAR files deployable directly into GT4
- Source files that may be built (Ant build file), modified and reviewed
- Setup of the services is necessary
- Configuration parameters, final tests
CMT Test suite