Title: Scientific workflow management in the VL-e framework
1. Scientific workflow management in the VL-e framework
Sub-program 2.5, Department of Computer Science, Universiteit van Amsterdam
2. Outline
- Background
  - Scientific experiments, workflow and e-Science framework
- Workflow management in the VL-e framework
  - The first prototype VLAM-G
  - Review related work
  - Workflow management in the VL-e PoC
  - Application use cases and workflow support
- Future work
  - New design of the VL-e framework
  - Time line
3. Scientific experiments and e-Science
- Complex experiments
  - have complex processes
  - require interdisciplinary expertise
  - require large scale resources
Grid high level support
Scientific workflows
4. Scientific Workflow Management Systems in an e-Science environment
- Functionalities
  - Automating experiment routines
  - Rapid prototyping of experimental computing systems
  - Hiding integration details between resources
  - Managing experiment lifecycle
- Crosses different layers of middleware for managing
  - Data
  - Computing
  - Information
  - Knowledge
In the VL-e project the targeted e-Science framework is built from the following layers (diagram labels):
- Domain specific applications
- Workflow management system
- e-Science framework: knowledge, information, computing tasks, data management
- Generic Grid middleware
- Grid infrastructure
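As an illustration of what "automating experiment routines" can mean in practice, the following minimal Python sketch runs the steps of an experiment in dependency order. It is not taken from VLAM-G or any VL-e component: the function `run_workflow`, the task names, and the data are all hypothetical, and the standard-library `graphlib` module (Python 3.9+) stands in for a real workflow engine.

```python
from graphlib import TopologicalSorter


def run_workflow(tasks, dependencies):
    """Run each task once all the tasks it depends on have finished.

    tasks: maps a task name to a function taking a dict of upstream results.
    dependencies: maps a task name to the set of task names it depends on.
    """
    results = {}
    # static_order() yields every task after all of its predecessors.
    for name in TopologicalSorter(dependencies).static_order():
        upstream = {dep: results[dep] for dep in dependencies.get(name, ())}
        results[name] = tasks[name](upstream)
    return results


# Hypothetical three-step experiment: load data, scale it, summarise it.
tasks = {
    "load": lambda up: [1.0, 2.0, 3.0, 4.0],
    "scale": lambda up: [2 * x for x in up["load"]],
    "summarise": lambda up: sum(up["scale"]) / len(up["scale"]),
}
dependencies = {"load": set(), "scale": {"load"}, "summarise": {"scale"}}

results = run_workflow(tasks, dependencies)
print(results["summarise"])
```

The user only declares the steps and their dependencies; the ordering and the passing of intermediate results are handled by the engine, which is the integration detail a workflow management system is meant to hide.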
5. VL-e workflow wish list
- A list of 36 points was established to characterise the ideal workflow for the VL-e
- The points are classified into 4 categories
  - Functionality and capability
  - User interface characteristics
  - Run time capabilities
  - Software engineering aspects
- VL-e SIG Workflow meeting, Jan 11th, 2005, 10:00-11:30, H220 (NIKHEF building)
- Present: Belleman, Belloum, Bouwhuis, Breanndán, Kaletas, Konijnenburg, Marshall, Rauwerda, Sterk, Sluiter, Terpstra, Vasunin, Wibisono, Yakali.
6. Prioritize the workflow requirements based on the VL-e applications
- A list of 12 points was established to characterise the practical workflow for VL-e
- The points are classified into 4 categories
  - Application domains model
  - Engineering
  - Underlying middleware
  - Workflow management system
    - Composition / Engine (runtime issues) / User support
- VL-e sub-program 2.5 in collaboration with SP1.X developers
- SP1.X contributors: Belleman, Klous, Konijnenburg, Marshall, Rauwerda, Sluiter, Terpstra
7. Workflow management in VL-e
- First prototype
  - VLAM-G
  - Shortcomings (GUI, control flow, monitoring, software engineering, etc.)
- Approach
  - Collect and analyze application use cases
  - Review the state of the art of workflow systems
  - Propose workflow systems for the PoC environment
  - Be active in use case projects
  - Learn lessons from use cases
  - Propose a new design
Measured against the list of 36 items established to characterise the ideal workflow for the VL-e, VLAM-G scored: 13 yes; 5 supported but in need of reimplementation; 9 no; 2 partially supported; 6 in progress or planned.
8. Application use cases and workflow requirements
- Application use cases
  - Several rounds of meetings
  - Distinguish workflow requirements
- Summary
  - From the resource perspective
    - To support legacy tools
    - To support standard middleware, e.g., web/grid services
    - To be able to invoke resources from different systems
    - To provide a rich library of workflow components
  - From the application process perspective
    - To efficiently manage parallel processes/tasks in an experiment (job farming)
    - To efficiently explore a large parameter space (parameter sweep)
    - To support knowledge-based information processing (semantic level data integration)
  - From the perspective of using a SWMS
    - To provide a friendly user interface (preferably a GUI)
    - To support the development of new workflow components (using Java, scripts, C; providing sufficient documentation and support)
    - To be able to execute tasks on distributed resources (clusters or Grid)
    - To be stable at runtime
    - To be able to interoperate with different workflow management systems
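The two process-level requirements above, parameter sweep and job farming, can be sketched together in a few lines of Python. This is only an illustration under stated assumptions: `simulate`, `parameter_sweep`, and the parameter names are hypothetical, and a local thread pool stands in for the cluster or Grid nodes that a real workflow system would farm the jobs out to.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def simulate(temperature, pressure):
    """Hypothetical stand-in for one run of an experiment."""
    return temperature * pressure


def parameter_sweep(func, grid, max_workers=4):
    """Run func once for every combination of the parameter values in grid."""
    names = list(grid)
    # Parameter sweep: enumerate the full Cartesian product of the values.
    points = [dict(zip(names, values)) for values in product(*grid.values())]
    # Job farming: hand the independent runs to a pool of workers.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(func, **point) for point in points]
        outputs = [future.result() for future in futures]
    return list(zip(points, outputs))


grid = {"temperature": [280, 300], "pressure": [1, 2, 3]}
sweep = parameter_sweep(simulate, grid)
for point, output in sweep:
    print(point, output)
```

Because the runs are independent, the same sweep definition can be executed on more workers without changing the experiment code, which is why both requirements are usually handled by the workflow engine rather than by each application.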
9. Survey of existing workflow systems
http://staff.science.uva.nl/gvlam//doc/P2/WorkflowSurvey
Participants: Belloum, De Boer, Guevara-Masis, Korkhov, Mirzadeh, Terpstra, van Hooft, Vasunin, Wibisono, Yakali, Zhao.
10. Survey results
- Based on the survey and practical tests of the nine workflow systems, we learned:
  - All of the systems are still beta versions (some even alpha) and tend to crash in relatively complex tests.
  - None of the systems supports collaboration, data sharing, or information management.
  - None of the systems enforces best practice or provides support for knowledge capture.
  - Most of the systems are not geared to Grid-based systems: they were built to work on a single system, with some features for submitting jobs to a remote host (the user is still exposed to Grid-related issues such as writing RSLs).
  - We had problems when testing some features described in the documentation.
http://staff.science.uva.nl/gvlam//doc/P2/SWMSRecommendationReport.pdf
Participants: Belloum, De Boer, Korkhov, Terpstra, van Hooft, Vasunin, Wibisono, Zhao.
11. Recommendation for PoC R1 (part of the short term solution)
Kepler
- Version: 1.0.alpha7
- Licence: Open source
- Dependencies: Java 1.4.2, PtolemyII 5.X
- Highlighted features: inherits a powerful, mature framework from Ptolemy; provides a rich library of actors; nice GUI; provides Nimrod support
- Drawbacks: alpha version, not stable, not enough documentation
Taverna
- Version: 1.2
- Licence: Free for distribution
- Highlighted features: web service based meta programming environment; has a big bio-informatics user community
- Drawbacks: limited GUI support, not enough documentation
Triana
- Version: 3.2
- Dependencies: Java 1.4.2, Ant 1.6
- Highlighted features: interface to invoke web services; use of Grid resources (GAP, GAT); deployment of a workflow as a web service; rich library of processing modules; nice GUI
- Drawbacks: unstable when using some features, not enough documentation
12. Use cases and small project teams
- Use case project teams
  - Participants from SPs from P1, P2, P3 and P4.
  - Contributions from the workflow team: distinguish reusable components and provide integration solutions.
  - Apart from that, we are also active in project management, such as decomposing the implementation into concrete tasks and tracking progress.
- Inside SP2.5, we divide ourselves
  - SP1.2: Belloum, Korkhov
  - SP1.3: Belloum, De Boer
  - SP1.4: Zhao, Vasunin
  - SP1.5: Zhao, Wibisono
  - SP1.6: Belloum, Paul De Boer
13. Collaboration with VL-e Applications
- SP1.2 AID-Food informatics-IvI
  - WCFS case searching in Research Management System (Selected by the VLeIT) (ongoing)
- SP1.3 IvI-AMC
  - High-volume data management in the PoC SRB (Selected by the VLeIT) (ongoing)
- SP1.4 IBED-IvI
  - Run KansK toolbox in Workflow environment (Master thesis project) (ongoing)
14. Collaboration with VL-e Applications
- SP1.5 IBU-UvA
  - Histone code - semantic data integration (Selected by VLeIT) (ongoing)
  - Running R scripts on multiple nodes using web service (Finished)
  - Running R scripts in workflows (ongoing)
  - Ridge-O-grammer (ongoing)
- SP1.6 AMOLF-UvA
  - SRB Meta data update from file header (Selected by VLeIT) (ongoing)
18. Ongoing development activities on the rapid prototyping environment
- Simple file management tools for SRB and GridFTP
- R scripts in workflow system
- Parameter sharing between workflow components
- Service discovery using P2P approach
- Parameter sweep and job farming
19. Future Directions
- By far the most active and rapidly progressing WMS is Kepler
  - Beta version March 2006.
- Kepler/Ptolemy has two ways of extending the system
  - Actors
  - Directors
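The split between actors and directors can be illustrated with a toy Python sketch: actors are the processing components, while a director decides when and in what order the actors run. This is not the Kepler or Ptolemy II API (which is Java); the classes `Actor`, `SequentialDirector`, and `StagewiseDirector` are hypothetical names chosen for the example, and the method name `fire` is only loosely borrowed from Ptolemy terminology.

```python
class Actor:
    """A component that consumes one input token and produces one output token."""

    def __init__(self, name, func):
        self.name = name
        self.func = func

    def fire(self, token):
        return self.func(token)


class SequentialDirector:
    """Sends each token through the whole pipeline before taking the next token."""

    def run(self, actors, tokens):
        outputs = []
        for token in tokens:
            for actor in actors:
                token = actor.fire(token)
            outputs.append(token)
        return outputs


class StagewiseDirector:
    """Pushes the whole token stream through one actor before firing the next."""

    def run(self, actors, tokens):
        stream = list(tokens)
        for actor in actors:
            stream = [actor.fire(token) for token in stream]
        return stream


# The same two actors, executed under two different scheduling policies.
pipeline = [Actor("double", lambda x: 2 * x), Actor("increment", lambda x: x + 1)]
per_token = SequentialDirector().run(pipeline, [1, 2, 3])
per_stage = StagewiseDirector().run(pipeline, [1, 2, 3])
print(per_token, per_stage)
```

The point of the separation is that new functionality can be added as an actor without touching the scheduling, and a new execution model can be added as a director without rewriting the actors.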
20. References
- People
  - Adam Belloum (SP2.5 leader), Zhiming Zhao, Paul van Hooft (post doc), Adianto Wibisono, Dmitry Vasyunin, Vladimir Korkhov, Frank Terpstra (Ph.D. students), Piter de Boer (programmer)
- VL-e Reports
  - PoC recommendation report
- Publications
  - Z. Zhao, A. Belloum, H. Yakali, P.M.A. Sloot and L.O. Hertzberger: Dynamic Workflow in a Grid Enabled Problem Solving Environment, in Proceedings of the 5th International Conference on Computer and Information Technology (CIT2005), pp. 339-345. IEEE Computer Society Press, Shanghai, China, September 2005.
  - Z. Zhao, A. Belloum, A. Wibisono, F. Terpstra, P.T. de Boer, P.M.A. Sloot and L.O. Hertzberger: Scientific workflow management between generality and applicability, in Proceedings of the International Workshop on Grid and Peer-to-Peer based Workflows, in conjunction with the 5th International Conference on Quality Software, pp. 357-364. IEEE Computer Society Press, Melbourne, Australia, September 19th-21st 2005.
  - Z. Zhao, A. Belloum, P.M.A. Sloot and L.O. Hertzberger: Agent technology and scientific workflow management in an e-Science environment, in Proceedings of the 17th IEEE International Conference on Tools with Artificial Intelligence (ICTAI05), pp. 19-23. IEEE Computer Society Press, Hong Kong, China, November 14th-16th 2005.
- Activity
  - International workshop on Workflow systems in e-Science, organized by Zhiming Zhao and Adam Belloum, in the context of ICCS06, Reading University, May 28, 2006.
  - Workshop on Workflow systems in e-Science, to be held during the next e-Science conference in Amsterdam, December 2006.
21. SP1.2 WCFS case searching in Research Management System
AID tools
22. SP1.3 High-volume data management in the PoC SRB
23. SP1.4 Run KansK toolbox in Workflow environment
24. SP1.5 Histone code - semantic data integration
25. SP1.5 Running R scripts in workflows
26. SP1.5 Ridge-O-grammer
27. SP1.6 SRB Meta data update from file header