Title: Using Provenance to Improve Workflow Design
1. Using Provenance to Improve Workflow Design
- Frederico Tosta
- Leonardo Murta
- Claudia Werner
- Marta Mattoso
- ftoliveira, murta, werner, marta_at_cos.ufrj.br
- COPPE, Federal University of Rio de Janeiro, Brazil
UFRJ
2. Summary
- Motivation
- Introduction and Background
- Goal
- Approach and Implementation
- Conclusion
COPPE/UFRJ
3. Motivation
Pieces of workflows that occurred in the past
may occur again in the future.
4. Motivation
[Figure: Workflows and WF Services]
- The number of services and bioinformatics operations is growing.
- Taverna has over 3500 (2007).
- VisTrails has over 1200 modules (2008).
5. Motivation
How can we find the pieces or services that are
useful during the design of a new workflow in an
automatic and systematic way?
6. Software Reuse
- The process of creating software systems from existing software (Krueger, 1992).
[Figure: Software Reuse, with the labels Reliability, Reduced Cost, Quality and Productivity]
7. Recommendation Systems
- E-Commerce
- Apply data mining techniques to the problem of helping users find the items they would like to purchase.
[Table: E-commerce concepts mapped into scientific experiment concepts]
[Figure: What is recommended by e-commerce sites]
8. Goal
- Propose a proactive recommendation service that
aims at suggesting frequent combinations of
scientific programs for reuse.
9. Approach
[Figure: Design for reuse and recommendation. Labels: Workflow specification, Design, Provenance DB]
10. Approach
[Figure: Design with reuse and recommendation. Labels: Workflow specification, Design, Provenance DB, Proactive Recommendation]
11. Implementation
- Populating the database
- VisTrails workflows
- Parse provenance XML files to extract the relations.
- MySQL database
- The relations are mapped into a database.
- Each relation contains the modules and how they are connected.
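The population step above can be sketched roughly as follows. This is only an illustration under stated assumptions: the XML layout (`workflow`, `module`, `connection` and `port` elements and their attributes) is a simplified, hypothetical stand-in for the VisTrails provenance files, the table and column names are invented, and `sqlite3` replaces MySQL so the sketch runs without a database server.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, simplified workflow XML; the real VisTrails files may differ.
SAMPLE_XML = """
<workflow id="wf1">
  <module id="1" name="HmmBuild"/>
  <module id="2" name="HmmCalibrate"/>
  <connection>
    <port type="source" moduleId="1" name="StdOut"/>
    <port type="destination" moduleId="2" name="HmmPath"/>
  </connection>
</workflow>
"""


def extract_relations(xml_text):
    """Return one tuple per connection:
    (workflow id, source module, source port, destination module, destination port)."""
    root = ET.fromstring(xml_text.strip())
    workflow_id = root.get("id")
    # Map module ids to module names so each relation is readable on its own.
    names = {m.get("id"): m.get("name") for m in root.findall("module")}
    relations = []
    for conn in root.findall("connection"):
        ports = {p.get("type"): p for p in conn.findall("port")}
        src, dst = ports["source"], ports["destination"]
        relations.append((
            workflow_id,
            names[src.get("moduleId")], src.get("name"),
            names[dst.get("moduleId")], dst.get("name"),
        ))
    return relations


def store_relations(db, relations):
    """Map the extracted relations into a single table (assumed schema)."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS relation ("
        "workflow_id TEXT, src_module TEXT, src_port TEXT, "
        "dst_module TEXT, dst_port TEXT)"
    )
    db.executemany("INSERT INTO relation VALUES (?, ?, ?, ?, ?)", relations)
    db.commit()


# In-memory SQLite database standing in for the MySQL database.
db = sqlite3.connect(":memory:")
store_relations(db, extract_relations(SAMPLE_XML))
```

Each stored row records which output port of one module was connected to which input port of another, which is the information a later recommendation query needs.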
12. Implementation
- Recommendation Metric
- From the example, we can infer that port StdOut
of HmmBuild has been connected to port HmmPath of
HmmCalibrate in 40 of previously designed
workflows.
Ports 1 and 2 are the output ports DestinationDir
and StdOut, respectively. Ports 3, 4 and 5 are
the input ports SourceDir, HmmPath and Dir,
respectively.
[Figure: VisTrails workflow design with recommendation]
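One way to read the recommendation metric is as the share of previously designed workflows in which a given connection appears. The slides do not define the metric formally, so the sketch below is an assumption, not the prototype's actual computation. The function name `recommend`, the tuple layout, the toy data and the module `HmmSearch` are all hypothetical.

```python
from collections import Counter


def recommend(workflows, module, port):
    """Rank candidate (module, input port) targets for an output port.

    `workflows` is a list of workflows, each a list of relations
    (src_module, src_port, dst_module, dst_port). The score is the
    fraction of workflows containing the connection (assumed metric).
    """
    counts = Counter()
    for relations in workflows:
        # Count each target at most once per workflow.
        targets = {
            (dst_module, dst_port)
            for (src_module, src_port, dst_module, dst_port) in relations
            if (src_module, src_port) == (module, port)
        }
        counts.update(targets)
    total = len(workflows)
    if total == 0:
        return []
    return [(target, count / total) for target, count in counts.most_common()]


# Invented toy provenance: five past workflows, two of them empty.
past_workflows = [
    [("HmmBuild", "StdOut", "HmmCalibrate", "HmmPath")],
    [("HmmBuild", "StdOut", "HmmCalibrate", "HmmPath"),
     ("HmmBuild", "DestinationDir", "HmmCalibrate", "Dir")],
    [("HmmBuild", "StdOut", "HmmSearch", "HmmPath")],  # HmmSearch is hypothetical
    [],
    [],
]

ranking = recommend(past_workflows, "HmmBuild", "StdOut")
```

With this toy data, the connection from StdOut of HmmBuild to HmmPath of HmmCalibrate ranks first because it appears in two of the five workflows.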
13. Implementation
[Figure: VisTrails workflow design with recommendation]
14. Conclusion
- We expect this approach to help bring the benefits of software reuse to scientific workflows.
- Reduce the time to design workflows.
- Increase the quality of the workflows designed.
15. Conclusion
- Limitations
- The current version of our prototype recommends only a subsequent component, based on previously used connections.
- Future work
- Improve the approach by recommending a component after investigating the whole path.
- Specify a context for each workflow.
- Apply a weight to each relation based on workflow usage.