Title: Taverna
1. Taverna
- Adding science to eScience
- Tom Oinn, tmo_at_ebi.ac.uk
- 6th March 2004
2. What is Taverna?
- A collection of Java APIs, XML and RDF Schema,
Languages and Java Applications.
- A part of the EPSRC myGrid project.
- Collectively aimed at facilitating standard
scientific procedures in the eScience domain,
especially in workflow systems.
- Reproducibility, Data Provenance and Process
Comprehension and Dissemination
3. Organisation
- Open source (LGPL) and hosted on sourceforge.net.
- Just over a year old as a distinct project.
- Growing community of both users and developers.
- Coordinated by an ad hoc combination of email,
face to face meetings, access grid and beer.
4. Philosophy
- We are trying to build something that works now.
- Incorporate new technologies only where they are
directly useful.
- Assume an open world of services, most of which
we do not control directly.
- Drive development primarily from user
requirements and requests.
- Release often, try to build a community.
5. Availability
- Website at http://taverna.sf.net
- Developer access by SSH/CVS
- Anonymous CVS
- Regular binary and source releases, particularly
for MS Windows, allowing a download-and-run
distribution
- Taverna at beta8, Ouzo (of which more later) at
beta1
6. Taverna API
- Acts as an intermediate layer between user level
applications and workflow enactors such as
FreeFluo.
- Includes object models using a standard MVC
design for both workflow definitions and data
objects within a workflow.
- Used by the Taverna Workbench, DataThing viewer,
workflow portal etc.
7. XScufl Workflow Language
- SCUFL is the Simple Conceptual Unified Flow
Language
- myGrid was originally based on WSFL,
- but no editors were available, and editing even a
simple workflow by hand was tedious and error prone.
- SCUFL provides a much higher level view of
workflows, and is therefore simpler to write by hand.
8. SCUFL features
- Simple: relies upon an inherently connected
environment to reduce the quantity of information
explicitly stated in the workflow definition.
- No port definitions in XScufl
- Processor metadata intelligently gathered from
underlying sources, e.g. WSDL, Soaplab
- Allows optional typing information; can specify
as little or as much as is available
9. SCUFL features (continued)
- Conceptual: one Processor in a SCUFL workflow
maps as far as is possible to one conceptual
operation as viewed by a non-expert user
- Wrap up stateful service interactions into custom
Processor implementations
- Lowers the barrier preventing experts in other
domains such as bioinformatics entering or using
eScience
10. SCUFL features (continued)
- Unified Flow Language: SCUFL does not dictate
how the workflow is to be enacted; it is
inherently declarative in intent.
- Can potentially be translated to other workflow
languages.
- Can be arbitrarily abstract; any given workflow
engine may require further definition of the
language before it can be enacted.
11. Taverna Workbench
- In the first iteration, a demonstrator and test
bed for the various view components of the
Taverna API.
- Now in its eighth release, it has become a
powerful and at least partially user friendly
tool for building or editing workflows.
- In use in the wild, with many known users and
probably more who haven't told us!
12. (No Transcript)
13. Graves' Disease Workflow
14. Taverna Features
- Unsurprisingly, Taverna/FreeFluo can enact
workflows. Taverna adds further value to the
enactor over and above this basic functionality.
- Implicit iteration support
- Result browsing and data encapsulation
- Provenance recording based on semantic web
technologies and LSID
- Fault tolerance features
15. Implicit Iteration
- A computer scientist would say that putting a
String[] into a String doesn't work. She would,
of course, be correct.
- Non-computer scientists may take a different
view, arguing that it makes sense that if
something can process a String then it should
just run multiple times on a String[].
- Our users are mostly not computer scientists.
- Taverna tries to behave the way the non-CS person
expects, hiding the magic as it does.
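The idea of implicit iteration can be sketched in a few lines of Python. This is only an illustration of the behaviour described on this slide, not Taverna's actual implementation; the names `run_processor` and `to_upper` are hypothetical, and `to_upper` stands in for any service that accepts a single string.

```python
from typing import Callable


def run_processor(operation: Callable[[str], str], value):
    """Apply a single-item operation to a value of any list depth.

    If the value is a list, the operation runs once per element
    (recursively), so the user never has to write the loop.
    """
    if isinstance(value, list):
        return [run_processor(operation, item) for item in value]
    return operation(value)


def to_upper(sequence: str) -> str:
    # Hypothetical stand-in for a service that accepts one string.
    return sequence.upper()


single = run_processor(to_upper, "acgt")
many = run_processor(to_upper, ["acgt", "ttga"])
nested = run_processor(to_upper, [["aa"], ["cc", "gg"]])
```

The caller passes either one string or a collection of strings to the same processor; the shape of the output mirrors the shape of the input.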
16. Data Encapsulation
- Workflow engines need a limited understanding of
their data in order to allow features such as
implicit iterators.
- They do not, however, require any more than this,
and should be otherwise agnostic to the data
flowing through the workflow.
- Taverna includes a DataThing class, which can be
tagged with terms from ontologies, free text
descriptions and MIME types, and which may
contain arbitrary collection structures.
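As a rough picture of what such a container holds, here is a minimal Python sketch. It is not the real DataThing class (which is part of the Java API); the class name `DataThingSketch`, its fields, and the example ontology term are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class DataThingSketch:
    """Hypothetical stand-in for a DataThing: a payload plus metadata hints."""

    payload: Any  # a single item, or arbitrarily nested lists of items
    mime_types: List[str] = field(default_factory=list)
    ontology_terms: List[str] = field(default_factory=list)
    description: str = ""

    def depth(self) -> int:
        """Return how many levels of list nesting wrap the leaf items."""
        level, current = 0, self.payload
        while isinstance(current, list):
            level += 1
            if not current:
                break
            current = current[0]
        return level


# Illustrative values only; the ontology term is made up.
proteins = DataThingSketch(
    payload=[["MKV", "MLA"], ["MAA"]],
    mime_types=["text/plain"],
    ontology_terms=["protein_sequence"],
    description="Two groups of short protein fragments",
)
scalar = DataThingSketch(payload="MKV")
```

The enactor only needs the collection depth to decide whether to iterate; everything else is metadata carried along for the benefit of people and viewers.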
17. Data Types, Result Browsing
- Using the metadata hints contained within a
DataThing object we can locate and launch
pluggable view components.
- Hybrid typing scheme allows for a best effort
approach to data typing.
- Required because fully typing life science data
is intractable, in terms of both effort and
completeness.
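One way to picture hint-driven viewer selection with a best effort fallback is the Python sketch below. The registry, the function name `choose_viewer`, and the MIME types used are assumptions for illustration; the actual plug-in mechanism in Taverna may work quite differently.

```python
from typing import Callable, Dict, List

# Hypothetical registry: MIME type -> function that renders a value as text.
VIEWERS: Dict[str, Callable[[object], str]] = {
    "text/plain": lambda value: str(value),
    "text/xml": lambda value: "<xml view> " + str(value),
}


def choose_viewer(mime_hints: List[str]) -> Callable[[object], str]:
    """Return the viewer for the first recognised hint, else a generic one."""
    for hint in mime_hints:
        if hint in VIEWERS:
            return VIEWERS[hint]
    # Best effort: still show something when no hint is recognised.
    return repr


# The first hint is unknown to the registry, so the second one is used.
known = choose_viewer(["chemical/x-pdb", "text/plain"])("abc")
# No hints at all: fall back to the generic representation.
unknown = choose_viewer([])("abc")
```

Because hints are optional and tried in order, data with rich metadata gets a specialised view while untyped data is still displayed.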
18. Example Result Browser
19. Provenance, RDF, LSID
- Providing computational access to services
creates new challenges; workflow technology
amplifies them further.
- Result data is potentially complex in terms of
its derivation.
- Scientists need to be able to show how a given
result in these data is arrived at.
- Metadata about the results is as important as the
result values themselves.
20. Overall Metadata Infrastructure
(Diagram. Labelled components: Workflow server;
Clients; DataThing viewer; Taverna; LSID Launchpad;
Haystack; Web browser; Ouzo API (client);
Ouzo API (server); LSID Authority; MySQL;
LSID Authority / Data service.)
21. LSID Launchpad (IBM)
Launchpad is an application that sits inside MS
Windows and allows links to LSIDs to be resolved
as if they were local or normal web page type
addresses. This mechanism could be used to allow
Taverna to email the user once their workflow
completes, the email containing such links which
would then allow the user to browse the data and
associated metadata from their desktop.
22. LSID and RDF
- LSID provides a uniform naming scheme.
- This naming system allows us to make unambiguous
statements that may then be reasoned over
programmatically.
- RDF allows us to extend base relations, e.g. "is
derived from" or "created by", with domain specific
ones, e.g. "is predicted structure of".
- These additional metadata are expressed as
templates attached to processors in the workflow,
and could come from a variety of sources.
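To make the combination of uniform names and statements concrete, here is a small Python sketch. It relies on the general LSID URN layout, `urn:lsid:authority:namespace:object[:revision]`; everything else (the `example.org` identifiers, the predicate names, and the helper functions) is made up for illustration and does not reflect Taverna's actual RDF vocabulary or storage.

```python
from typing import List, NamedTuple, Optional, Tuple


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text: str) -> Lsid:
    """Split an LSID of the form urn:lsid:authority:namespace:object[:revision]."""
    parts = text.split(":")
    well_formed = (
        len(parts) in (5, 6)
        and parts[0].lower() == "urn"
        and parts[1].lower() == "lsid"
    )
    if not well_formed:
        raise ValueError("not an LSID: " + text)
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(parts[2], parts[3], parts[4], revision)


# Statements as (subject, predicate, object) triples. Identifiers and
# predicate names are hypothetical.
R1 = "urn:lsid:example.org:result:1"
R2 = "urn:lsid:example.org:result:2"
R3 = "urn:lsid:example.org:result:3"
TRIPLES: List[Tuple[str, str, str]] = [
    (R2, "isDerivedFrom", R1),            # base relation
    (R3, "isDerivedFrom", R2),            # base relation
    (R3, "isPredictedStructureOf", R1),   # domain specific relation
]


def derivation_chain(start: str, triples: List[Tuple[str, str, str]]) -> List[str]:
    """Follow isDerivedFrom links from start back to the original input."""
    chain = [start]
    current = start
    while True:
        parents = [o for s, p, o in triples if s == current and p == "isDerivedFrom"]
        if not parents or parents[0] in chain:  # stop at the root or on a cycle
            return chain
        current = parents[0]
        chain.append(current)


chain = derivation_chain(R3, TRIPLES)
```

Because every result has a globally unique name, the same triples can be queried by any tool to reconstruct how a result was produced.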
23. Fault Tolerance
- In an open service world, we have no control over
the majority of analysis services.
- Such services may fail, become inaccessible or
have their APIs change with no notice.
- Taverna allows configurable failure handling
including dynamically rescheduling processors
with alternate implementations.
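A generic retry-with-backoff loop that falls through to alternate implementations might look like the Python sketch below. It illustrates the general technique only; the function `run_with_fallback`, its parameters, and the two example services are hypothetical and are not Taverna's configuration model.

```python
import time
from typing import Callable, List, Sequence


def run_with_fallback(
    implementations: Sequence[Callable[[str], str]],
    data: str,
    max_retries: int = 2,
    initial_delay: float = 1.0,
    backoff: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each implementation in order, retrying each with a growing delay."""
    last_error = None
    for implementation in implementations:
        delay = initial_delay
        for attempt in range(max_retries + 1):
            try:
                return implementation(data)
            except Exception as error:  # a real system would be more selective
                last_error = error
                if attempt < max_retries:
                    sleep(delay)
                    delay *= backoff  # wait longer before each further retry
    raise RuntimeError("all implementations failed") from last_error


calls: List[str] = []
delays: List[float] = []


def broken_service(data: str) -> str:
    # Hypothetical primary service that is always unreachable.
    calls.append("broken")
    raise ConnectionError("service unavailable")


def alternate_service(data: str) -> str:
    # Hypothetical alternate implementation of the same operation.
    calls.append("alternate")
    return data[::-1]


# delays.append replaces time.sleep so the example runs instantly.
result = run_with_fallback(
    [broken_service, alternate_service], "abc", sleep=delays.append
)
```

The primary service is attempted three times, with waits of 1 and 2 seconds between attempts, before the alternate is scheduled in its place.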
24. (No Transcript)
25. Process Provenance View
26. Fault Tolerance Editing
Retry, delay and backoff configuration
Alternate Processor
27. Summary: Taverna and eScience
- Standard workflow language allows peer review and
publication of eScience methods.
- LSID allows universal access to results for
collaboration, as well as for review.
- RDF/LSID explains the context of these results
and provides guidance for further investigations.