Title: myGrid overview
- myGrid is an extensible open platform for e-Science data tools interoperability, built using existing Web services and Grid technologies, that supports:
- In silico experiments based on process flows
- Data provenance and resource change management based on notification and process flow evolution
- Explicit capture of the e-scientist's knowledge
- Personalised views over repositories, personalised process flows, and personal data sets
- The myGrid toolkit can be configured for specific applications, building on the experience of the consortium in user requirements capture and community-based tools
Exploring Williams-Beuren Syndrome Using myGrid
Robert Stevens (a), Hannah J. Tipney (b), Chris Wroe (a), Tom Oinn (c), Martin Senger (c), Phillip Lord (a), Carole Goble (a), Andy Brass (a), May Tassabehji (b)
(a) Department of Computer Science, University of Manchester, Oxford Road, Manchester, United Kingdom, M13 9PL
(b) University of Manchester Academic Unit of Medical Genetics, St Mary's Hospital, Hathersage Road, United Kingdom M13 0JH
(c) European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom CB10 1SD
- The Biological Problem
- Williams-Beuren Syndrome (WBS) is a rare, sporadically occurring disorder characterised by a unique set of physical and behavioural features. WBS is caused by a deletion located in chromosome band 7q11.23, in a region flanked by highly repetitive regions containing both genes and pseudogenes.
- Most WBS individuals have a deletion of about 1.5 Mb, encompassing 24 genes (see right), but a smaller region containing the genes critical to the WBS phenotype has been identified. This smaller region is known as the WBS Critical Region (WBSCR).
- The WBSCR has not yet been fully mapped, primarily because of its complex and repetitive nature. The gaps in the WBSCR may harbour important genes and associated regulatory elements. The purpose of this myGrid application is to help produce the complete, comprehensive and robust map of the WBS region that is vital if we are to fully understand the pathology of WBS.
- The e-Science Process
- In silico experiments necessitate the virtual
organisation of people, data, tools and machines.
The scientific process also necessitates an
awareness of the experience base, covering both
personal data and the wider context of
work. The management of all these data and the
co-ordination of resources to manage such virtual
organisations and the data surrounding them needs
significant computational infrastructure support.
- myGrid, middleware for the Semantic Grid, enables
biologists to perform and manage in silico
experiments, then explore and exploit the results
of their experiments.
- The Bioinformatics Experiment
- We have developed a workflow language, Scufl, a workflow development environment, Taverna, and a workflow enactment engine, Freefluo, that allow biologists and bioinformaticians to represent an experiment design explicitly without the complication of writing a complex bespoke application.
- As well as orchestrating the execution of the workflow's service components, the enactor can also generate provenance information annotations in RDF under the control of user-defined annotation templates.
- The diagram (right) shows a schematic representation of the first workflow created to explore gap regions within the WBSCR. This workflow takes the last verified piece of sequence (< 3000 bp) in the contig flanking a gapped region and produces a shortlist of sequences which may extend the contig into the gap region.
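The shape of that first workflow can be illustrated with a small sketch. This is not the Scufl workflow itself: the function names, the naive suffix/prefix overlap test, and the `min_overlap` parameter are all assumptions made for illustration, standing in for the remote analysis services the real workflow orchestrates. The query is capped at 3000 bases here as an approximation of the "< 3000 bp" limit.

```python
from typing import Iterable, List

MAX_QUERY_BP = 3000  # approximate query-length limit taken from the text


def query_from_contig(contig: str, max_bp: int = MAX_QUERY_BP) -> str:
    """Take the trailing bases of the contig, i.e. the part next to the gap."""
    return contig[-max_bp:]


def overlap_length(query: str, candidate: str, min_overlap: int) -> int:
    """Length of the longest suffix of query that is also a prefix of candidate.

    Hypothetical stand-in for a real sequence-similarity search.
    """
    longest = min(len(query), len(candidate))
    for size in range(longest, min_overlap - 1, -1):
        if query.endswith(candidate[:size]):
            return size
    return 0


def shortlist_extensions(
    contig: str, candidates: Iterable[str], min_overlap: int = 4
) -> List[str]:
    """Keep candidates that overlap the contig end and add new bases beyond it."""
    query = query_from_contig(contig)
    hits = []
    for candidate in candidates:
        size = overlap_length(query, candidate, min_overlap)
        if 0 < size < len(candidate):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    contig = "AAACCCGGGT"
    print(shortlist_extensions(contig, ["GGGTTTTA", "CCCC", "GGGT"]))
```

Only the first candidate both overlaps the end of the contig and carries bases beyond it, so it is the only one that could extend the contig into the gap.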
- Result Co-ordination
- Each run of a series of experiments produces a large number of data files: data are produced for each service, and for each input multiple outputs can be generated.
- To validate results, a biologist needs to be able to trace back through data from each part of the analysis.
- In addition, a biologist needs to look back through a history of experiments on a particular topic; look at experiments on a different topic; look at colleagues' experiments; and also view experiment data holdings in a variety of views suited to the current needs.
- In myGrid, this personalisation comes from co-ordinating these complex, inter-related data holdings according to the myGrid information model through decoration with RDF and LSID references.
- The data graph produced by Freefluo (below) is the counterpart to the workflow graph (left), where the data are the nodes of the graph and the arcs the processes that produced those data.
- myGrid uses Haystack to enable the biologist to view this graph of results and follow the RDF links between results.
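The idea of a data graph, with data items as nodes and the producing processes as arcs, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the myGrid information model or the Freefluo output format: the `DataGraph` class, the process names, and the LSID-style identifiers (which follow the general `urn:lsid:authority:namespace:object` pattern but use a made-up authority) are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Arc = Tuple[str, str, str]  # (output item, process name, input item)


class DataGraph:
    """Data items are nodes; each arc records the process that produced an item."""

    def __init__(self) -> None:
        # output item -> list of (process name, input item) pairs
        self._derived_from: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def record(self, process: str, inputs: Iterable[str], output: str) -> None:
        """Note that `process` produced `output` from each item in `inputs`."""
        for item in inputs:
            self._derived_from[output].append((process, item))

    def trace_back(self, item: str) -> List[Arc]:
        """Return every arc reachable by walking backwards from `item`."""
        arcs: List[Arc] = []
        stack, seen = [item], {item}
        while stack:
            current = stack.pop()
            for process, source in self._derived_from.get(current, []):
                arcs.append((current, process, source))
                if source not in seen:
                    seen.add(source)
                    stack.append(source)
        return arcs


if __name__ == "__main__":
    graph = DataGraph()
    # Hypothetical identifiers and process names, for illustration only.
    seq = "urn:lsid:example.org:data:1"
    masked = "urn:lsid:example.org:data:2"
    hits = "urn:lsid:example.org:data:3"
    graph.record("mask_repeats", [seq], masked)
    graph.record("similarity_search", [masked], hits)
    for output, process, source in graph.trace_back(hits):
        print(f"{output} <- {process} <- {source}")
```

Walking the arcs backwards from a result is what lets a biologist validate it against every intermediate item that contributed to it.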
- Outcomes
- Performing such an analysis manually through Web-based resources can take at least two days of tedious, error-prone cutting and pasting between a host of Web pages.
- These myGrid workflows take about one hour to run and produce a collection of co-ordinated results that facilitate analysis and management of an experiment's data holdings.
- This increase in efficiency in performing an analysis is coupled with the ability to easily replicate the experimental protocol and with systematic management of results.
- The generic results co-ordination system enables a biologist to create an experience base of experimental techniques, data holdings and other organisational information that facilitates personalisation of e-Science.
- The abstract, declarative nature of the workflows means that creation of an analysis provides an alternative to the writing of bespoke software.
- Using myGrid in this way has extended the genetic map into the WBS Critical Region.