A myGrid Project Tutorial (3)

1
A myGrid Project Tutorial (3)
  • Dr Mark Greenwood
  • University of Manchester

With considerable help from Justin Ferris, Peter
Li, Phil Lord, Chris Wroe and the rest of the
myGrid team.
2
Roadmap
1. Describe services and workflows
2. Discover services and workflows
3. Write and run workflows
4. Provenance and data management
Components shown: Registry (services and workflows), Taverna workbench, LSID authorities (data)
3
In a nutshell
  • Pre-prototype: experimental, web-based requirements gathering
  • Prototype 1: demo at ISMB 2003; an architectural workout with all services represented, a NetBeans workbench, API-based integration and an Information Repository-oriented design
  • Then: XML-based process provenance, a workflow enactment engine, a full paper and demo at ISMB 2004, GSK deployment and real biology
4
Two Paths
  • Innovative work
    • Service and workflow registration
    • Semantic discovery
    • Provenance management
    • Text mining
  • Core functionality
    • Services: Soaplab and Gowlab
    • Workflow enactment engine: FreeFluo
    • Workflow workbench: Taverna
    • Data integration: OGSA-DQP
    • Information model management
  • In between
    • Event notification
    • Gateway

5
FreeFluo Features
  • Control flow, iteration and data flow
  • Data sets and nested flows
  • Configurable failure handling
  • Integrated Life Science Identifier (LSID) resolution
  • Provenance and status reporting
  • Type and data management
  • Plug-ins
  • User notification
  • Data entry wizard
  • Libraries of SHIM services
  • Libraries of workflows
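Resolving an LSID starts with splitting it into its parts. LSIDs have the form `urn:lsid:<authority>:<namespace>:<object>[:<revision>]`; the sketch below only parses and re-serialises that string. The class and function names are hypothetical, and the actual resolution step (asking the authority for data or metadata) is not shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LSID:
    """Components of urn:lsid:authority:namespace:object[:revision]."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None

    def __str__(self) -> str:
        parts = ["urn:lsid", self.authority, self.namespace, self.object_id]
        if self.revision is not None:
            parts.append(self.revision)
        return ":".join(parts)


def parse_lsid(text: str) -> LSID:
    """Split an LSID string into its components; raise ValueError if malformed."""
    parts = text.strip().split(":")
    # The "urn:lsid" prefix is case-insensitive; three or four fields must follow it.
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    if not all(parts[2:]):
        raise ValueError(f"empty LSID component in {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)
```

For example, a hypothetical identifier such as `urn:lsid:example.org:sequences:42:1` yields authority `example.org`, namespace `sequences`, object `42` and revision `1`.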

6
Fault Tolerance
Retry, delay and backoff configuration
Alternate Processor
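Retry, delay and backoff, followed by alternate processors, can be sketched as below. The parameter names `retries`, `delay` and `backoff` are assumptions that mirror the slide wording, not FreeFluo's configuration API; the wait before retry k is taken to be `delay * backoff**k`.

```python
import time
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def retry_delays(retries: int, delay: float, backoff: float) -> List[float]:
    """Wait before each retry: delay, delay*backoff, delay*backoff**2, ..."""
    return [delay * backoff ** k for k in range(retries)]


def invoke_with_fault_tolerance(
    processors: Sequence[Callable[[], T]],
    retries: int = 3,
    delay: float = 1.0,
    backoff: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try the primary processor with retries and backoff, then each alternate in turn."""
    last_error: Optional[Exception] = None
    waits = retry_delays(retries, delay, backoff)
    for processor in processors:              # primary first, then alternates
        for attempt in range(retries + 1):    # one initial call plus `retries` retries
            try:
                return processor()
            except Exception as err:          # a real engine would distinguish error kinds
                last_error = err
                if attempt < retries:
                    sleep(waits[attempt])
    raise RuntimeError("primary and all alternate processors failed") from last_error
```

Injecting `sleep` keeps the sketch testable; a caller passing `retries=2, delay=0.5` would wait 0.5 s and then 1.0 s before moving on to the first alternate.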
7
Fault management (state diagram)
  • States: scheduled and waiting for data; data ready; constructing iterator; invoking; invoking with implicit iteration; adding item to result data set; waiting to retry; creating alternate processor; done iterating; done; complete; aborted
  • Decisions: types match?; can iterate?; iterations remain?; retries left?; alternates exist?; alternate available?; allow partials?
  • Events: success; error; timeout; data mismatch; instantiation error; service failure
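The type-matching and implicit-iteration branch of the diagram might be sketched as follows: a matching input is passed straight to the service, a list of matching items is iterated over with each result added to the result data set, and a failure either aborts the processor or, if partials are allowed, is skipped. This is one reading of the diagram, not FreeFluo code; `run_processor` and `Aborted` are hypothetical names, and retries and alternates are omitted for brevity.

```python
from typing import Any, Callable, List


class Aborted(Exception):
    """Raised when the processor ends in the 'aborted' state."""


def run_processor(
    service: Callable[[Any], Any],
    expected_type: type,
    data: Any,
    allow_partials: bool = False,
) -> Any:
    """Invoke once on a matching input, or iterate implicitly over a list of them."""
    if isinstance(data, expected_type):           # types match: invoke once
        return service(data)
    can_iterate = isinstance(data, list) and all(isinstance(x, expected_type) for x in data)
    if not can_iterate:                           # data mismatch
        raise Aborted(f"expected {expected_type.__name__} or a list of them")
    results: List[Any] = []
    for item in data:                             # invoking with implicit iteration
        try:
            results.append(service(item))         # adding item to result data set
        except Exception as err:                  # service failure on this item
            if not allow_partials:
                raise Aborted("service failure") from err
    return results                                # done iterating: complete
```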
8
Domain Services
  • Native WSDL web services
    • DDBJ, NCBI BLAST, PathPort, BioMOBY, JEMBOSS
  • Legacy services wrapped as web services
    • SoapLab: command lines as web services
    • GowLab: web pages as web services
  • Leveraged the EMBOSS suite
  • 159 distinct services
    • Lots of redundant services
  • The joys of firewalls and licensing

9
Domain Services
  • Native WSDL web services
    • DDBJ, NCBI BLAST, PathPort, BioMOBY
  • Wrapped legacy services
    • SoapLab
    • GowLab: web pages as web services, one-button wrapping
  • Leveraged the EMBOSS suite
  • 159 services
    • Lots of them, and lots of redundant services
  • The joys of firewalls and licensing

For each application: CreateJob, Run, WaitFor, GetResults, Destroy
EBI support has agreed to support Soaplab services as core business.
http://industry.ebi.ac.uk/soaplab/
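The per-application job lifecycle (CreateJob, Run, WaitFor, GetResults, Destroy) maps naturally onto a client that always destroys the job, even when a step fails. The `AnalysisService` protocol, its snake_case method names, the `sequence`/`outfile` keys and the `FakeReverser` stand-in are all hypothetical; the real Soaplab interface differs in detail.

```python
from typing import Dict, List, Protocol


class AnalysisService(Protocol):
    """Hypothetical client interface mirroring the operations on the slide."""
    def create_job(self, inputs: Dict[str, str]) -> str: ...
    def run(self, job_id: str) -> None: ...
    def wait_for(self, job_id: str) -> None: ...
    def get_results(self, job_id: str) -> Dict[str, str]: ...
    def destroy(self, job_id: str) -> None: ...


def run_analysis(service: AnalysisService, inputs: Dict[str, str]) -> Dict[str, str]:
    """CreateJob -> Run -> WaitFor -> GetResults, always followed by Destroy."""
    job_id = service.create_job(inputs)
    try:
        service.run(job_id)
        service.wait_for(job_id)
        return service.get_results(job_id)
    finally:
        service.destroy(job_id)   # release server-side resources even on failure


class FakeReverser:
    """In-memory stand-in for a wrapped command-line tool that reverses a sequence."""

    def __init__(self) -> None:
        self.jobs: Dict[str, Dict[str, str]] = {}
        self.calls: List[str] = []
        self._next = 0

    def create_job(self, inputs: Dict[str, str]) -> str:
        self.calls.append("create_job")
        self._next += 1
        job_id = f"job-{self._next}"
        self.jobs[job_id] = dict(inputs)
        return job_id

    def run(self, job_id: str) -> None:
        self.calls.append("run")
        self.jobs[job_id]["outfile"] = self.jobs[job_id]["sequence"][::-1]

    def wait_for(self, job_id: str) -> None:
        self.calls.append("wait_for")   # the fake runs synchronously, so nothing to wait for

    def get_results(self, job_id: str) -> Dict[str, str]:
        self.calls.append("get_results")
        return {"outfile": self.jobs[job_id]["outfile"]}

    def destroy(self, job_id: str) -> None:
        self.calls.append("destroy")
        del self.jobs[job_id]
```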
10
Workflow environment
  • FreeFluo workflow enactment engine
    • http://freefluo.sourceforge.net
  • Taverna development and execution environment
    • http://taverna.sourceforge.net
    • Joint work with HGMP
  • Simple Conceptual Unified Flow Language (Scufl)
  • Rapid development and release cycle on SourceForge (LGPL)
  • A tethered programme with its own open-source development community
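Conceptually, a Scufl workflow is a set of processors connected by data links from output ports to input ports. The sketch below models only that idea and executes processors in dependency order; it does not read the Scufl XML format, and the `"proc:port"` link notation and the reserved `"in"` source name are assumptions of this example. It needs Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, Set, Tuple

# A processor maps named input ports to named output ports.
Processor = Callable[[Dict[str, Any]], Dict[str, Any]]


def run_dataflow(
    processors: Dict[str, Processor],
    links: List[Tuple[str, str]],
    workflow_inputs: Dict[str, Any],
) -> Dict[str, Dict[str, Any]]:
    """Run processors in dependency order; links are ("src:port", "dst:port") pairs.

    A source written as "in:name" reads from workflow_inputs ("in" is reserved).
    """
    deps: Dict[str, Set[str]] = {name: set() for name in processors}
    for src, dst in links:
        src_proc, dst_proc = src.split(":")[0], dst.split(":")[0]
        if src_proc != "in":
            deps[dst_proc].add(src_proc)   # dst must wait for src
    outputs: Dict[str, Dict[str, Any]] = {"in": workflow_inputs}
    for name in TopologicalSorter(deps).static_order():
        inputs: Dict[str, Any] = {}
        for src, dst in links:
            dst_proc, dst_port = dst.split(":")
            if dst_proc == name:
                src_proc, src_port = src.split(":")
                inputs[dst_port] = outputs[src_proc][src_port]
        outputs[name] = processors[name](inputs)
    return outputs
```

`TopologicalSorter` raises `CycleError` on cyclic links, which matches the acyclic data-flow part of the model; control-flow constraints are not modelled here.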

11
Service and Workflow registration
Workflow registration allows peer review and publication of e-Science methods.
  • Description scheme
    • RDFS / DAML+OIL / OWL ontologies
    • Based on DAML-S
  • Reasoning over OWL descriptions
  • Querying over RDF
  • Workflow assembly
    • Semantic service typing of inputs and outputs
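Semantic typing of inputs and outputs lets discovery ask: which services accept the data I have and produce the kind of data I want? The sketch replaces OWL reasoning with a toy single-inheritance concept tree; the concept names and services are hypothetical, and a real registry would use a description-logic reasoner rather than this parent walk.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy ontology: each concept points to its parent (None = root).
PARENT: Dict[str, Optional[str]] = {
    "biological_sequence": None,
    "nucleotide_sequence": "biological_sequence",
    "protein_sequence": "biological_sequence",
    "dna_sequence": "nucleotide_sequence",
    "report": None,
    "blast_report": "report",
}


def is_subclass(concept: str, ancestor: str) -> bool:
    """True if `concept` equals `ancestor` or lies below it in the hierarchy."""
    current: Optional[str] = concept
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT[current]
    return False


def discover(services: Dict[str, Tuple[str, str]], have: str, want: str) -> List[str]:
    """Services whose input accepts `have` and whose output can be used as `want`."""
    return sorted(
        name
        for name, (input_type, output_type) in services.items()
        if is_subclass(have, input_type) and is_subclass(output_type, want)
    )
```

A service typed to accept `nucleotide_sequence` is offered for DNA input because DNA sequences are nucleotide sequences, while one accepting only `protein_sequence` is not.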

12
View Service Architecture (diagram)
  • Components: Taverna Workbench (discovery client), Semantic Find Component, Personalised View Component, Workflow Registry, Service Registry
  • Discovery by describing the services required
  • Service descriptions are extracted to reason over
  • Personalised discovery using UDDI clients, and publishing of personal metadata
  • Service adverts are pulled from global registries
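A personalised view can be thought of as a per-user metadata layer over read-only adverts pulled from a global registry. The sketch is an in-memory stand-in for what the slide describes with UDDI clients; `PersonalView`, its methods and the advert fields are hypothetical.

```python
from typing import Dict, List

Advert = Dict[str, str]


class PersonalView:
    """Per-user overlay: global adverts stay untouched, personal metadata sits on top."""

    def __init__(self, global_adverts: Dict[str, Advert]) -> None:
        self._global = global_adverts
        self._personal: Dict[str, Advert] = {}

    def annotate(self, service: str, key: str, value: str) -> None:
        """Publish personal metadata about a service, e.g. what I think it does."""
        if service not in self._global:
            raise KeyError(service)
        self._personal.setdefault(service, {})[key] = value

    def describe(self, service: str) -> Advert:
        """The global advert with this user's annotations applied on top."""
        return {**self._global[service], **self._personal.get(service, {})}

    def find(self, key: str, value: str) -> List[str]:
        """Names of services whose personalised description has key == value."""
        return sorted(s for s in self._global if self.describe(s).get(key) == value)
```

Keeping the overlay separate means each user's view can disagree with the published advert without altering what other users see.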
13
myGrid Service Stack
Layered diagram, top to bottom:
  • Applications: Taverna workbench, Talisman, web portal
  • Gateway
  • Core services
    • Personalisation, service and workflow discovery, registries, provenance, event notification, ontology management, ontologies, metadata management, views
    • myGrid Information Repository, FreeFluo workflow enactment engine, OGSA-DQP distributed query processor
  • Web Service (Grid Service) communication fabric
  • External services: AMBIT text extraction service, native web services, SoapLab and GowLab (wrapping legacy apps)
14
OGSA-DQP
  • Used in the Graves' disease scenario
  • Uses OGSA-DAI data access services to access individual data resources
  • A single query can access and join data from more than one OGSA-DAI-wrapped data resource
  • Supports orchestration of computational as well as data access services
  • Interactive interface for integrating resources and executing requests
  • Implicit, pipelined and partitioned parallelism

http://www.ogsa-dai.org.uk/dqp
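Partitioned parallelism for a join across two data resources can be illustrated with a hash-partitioned hash join: both inputs are split on the join key so that matching rows land in the same partition, and each pair of partitions is joined independently. Threads here only stand in for distributed evaluation nodes, the row data is hypothetical, and none of this is the OGSA-DQP API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

Row = Dict[str, object]


def partition(rows: List[Row], key: str, n: int) -> List[List[Row]]:
    """Hash-partition rows on the join key so matching keys share a bucket."""
    buckets: List[List[Row]] = [[] for _ in range(n)]
    for row in rows:
        buckets[hash(row[key]) % n].append(row)
    return buckets


def hash_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Build a hash table on `left`, then probe it with each row of `right`."""
    table: Dict[object, List[Row]] = {}
    for lrow in left:
        table.setdefault(lrow[key], []).append(lrow)
    return [{**lrow, **rrow} for rrow in right for lrow in table.get(rrow[key], [])]


def parallel_join(left: List[Row], right: List[Row], key: str, n: int = 4) -> List[Row]:
    """Each worker joins one pair of matching partitions."""
    lparts, rparts = partition(left, key, n), partition(right, key, n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        parts = pool.map(hash_join, lparts, rparts, [key] * n)
    return [row for part in parts for row in part]
```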
15
Event notification
  • Used by a commercial company in India
  • Push and pull
  • Publisher-subscriber
  • Asynchronous
  • Durable topics
  • Dynamic hierarchical namespace for topics
  • http://cvs.mygrid.org.uk/notification-stable/downloads
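The publish-subscribe features listed above (push and pull delivery, durable topics, a hierarchical topic namespace) can be sketched with a toy in-process broker. It is synchronous and in-memory, whereas the real service is asynchronous and networked; the class, its methods and the `/`-separated topic names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Handler = Callable[[str, str], None]


class NotificationBroker:
    """Toy publish/subscribe broker with a hierarchical topic namespace."""

    def __init__(self) -> None:
        self._push: DefaultDict[str, List[Handler]] = defaultdict(list)
        self._queued: DefaultDict[str, List[str]] = defaultdict(list)  # subscriber -> messages
        self._pull_topics: Dict[str, str] = {}                         # subscriber -> topic

    @staticmethod
    def _matches(topic: str, subscription: str) -> bool:
        """A subscription to "a/b" also receives events published on "a/b/c"."""
        return topic == subscription or topic.startswith(subscription + "/")

    def subscribe_push(self, topic: str, handler: Handler) -> None:
        """Push delivery: the handler is called as soon as an event is published."""
        self._push[topic].append(handler)

    def subscribe_pull(self, subscriber: str, topic: str) -> None:
        """Durable pull delivery: events queue up until the subscriber collects them."""
        self._pull_topics[subscriber] = topic

    def publish(self, topic: str, message: str) -> None:
        # Topics need no prior declaration, so the namespace grows dynamically.
        for subscription, handlers in self._push.items():
            if self._matches(topic, subscription):
                for handler in handlers:
                    handler(topic, message)
        for subscriber, subscription in self._pull_topics.items():
            if self._matches(topic, subscription):
                self._queued[subscriber].append(message)

    def pull(self, subscriber: str) -> List[str]:
        """Return and clear everything queued for this subscriber."""
        messages, self._queued[subscriber] = self._queued[subscriber], []
        return messages
```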

16
Text Services Architecture
Diagram:
  • The user client submits an XScufl workflow definition and parameters
  • The user client receives clustered PubMed IDs and titles, and term-annotated Medline abstracts
  • Medline server (Sheffield): Medline abstracts, retrieved by PubMed ID
  • Medline is pre-processed offline to extract biomedical terms, which are indexed by PubMed ID
17
Text Services Interface
  • The user can
    • invoke the Graves or Williams workflow
    • issue an ad hoc query against Medline
  • Workflow output is a set of Medline abstracts listed by title
    • Each title expands to the full abstract
    • Abstracts are clustered by MeSH category
  • The user may navigate by the MeSH tree (further clustering approaches to follow)
  • The user can filter abstracts by selected terms
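Clustering abstracts by MeSH category and filtering by selected terms reduce to grouping and set-containment checks. The sketch assumes each abstract already carries its MeSH headings and the terms extracted offline; the `Abstract` record and its fields are hypothetical, and "filter by selected terms" is read here as "annotated with every selected term".

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class Abstract:
    pmid: str
    title: str
    text: str
    mesh: List[str] = field(default_factory=list)   # MeSH headings attached to the abstract
    terms: Set[str] = field(default_factory=set)    # biomedical terms extracted offline


def cluster_by_mesh(abstracts: Iterable[Abstract]) -> Dict[str, List[str]]:
    """Map each MeSH category to the titles of the abstracts filed under it."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for a in abstracts:
        for heading in a.mesh:
            clusters[heading].append(a.title)
    return dict(clusters)


def filter_by_terms(abstracts: Iterable[Abstract], selected: Set[str]) -> List[Abstract]:
    """Keep only abstracts annotated with every selected term."""
    return [a for a in abstracts if selected <= a.terms]
```

An abstract with several headings appears in several clusters, which matches browsing a MeSH tree where one paper can sit under more than one branch.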

18
Experiment life cycle
  • Forming experiments
  • Personalisation
  • Discovering and reusing experiments and resources
  • Executing and monitoring experiments
  • Managing the lifecycle, provenance and results of experiments
  • Sharing services and experiments
19
Personalisation
  • Dynamic creation of personal data sets.
  • Personal views over repositories.
  • Personalisation of workflows.
  • Personal notification
  • Annotation of datasets and workflows.
  • Personalisation of service descriptions: what I think the service does.

20
Personalised Discovery
21
Roadmap (as on slide 2)
22
Project Follow-ons
  • OGSA-DAIT, ISPIDER, SIMDAT, DQP, FreeFluo, e-Fungi, CLEF, Link-Up
  • Army of PhD students
  • Provenance and semantic discovery: PASOA, DynamO, OntoGrid
23
To Dos
  • Improve results management
  • Deployment of the myGrid Information Repository (mIR)
  • Portal for finding workflows, launching and monitoring workflows, launching Taverna and browsing results
  • Deploying a publicly accessible semantic registry
  • Reinstate service discovery during enactment
  • Large-scale data throughput workflow engine
  • Event notification on services
  • Using provenance graphs for impact analysis
  • Hiding LSIDs
  • Lexicons for concept names
  • Hardening semantic discovery
  • Ambient Text
  • Er... security
  • Etc.
  • myGrid in a box

24
Ongoing/Future Activities
  • Networking
    • Link-Up with BIRN/SEEK/GEON (SDSC) and SCEC/GriPhyN (ISI, USC)
  • Technical follow-ons
    • Best practice (6) and OMII (FreeFluo, Taverna, event notification) bids
  • Research follow-ons
    • Semantic Grids, Data Grids, Workflow, Provenance services
    • PhD students
  • Science follow-ons
    • Life sciences: ISPIDER, e-Fungi
    • Clinical: PsyGrid, CLEF-II
    • PhD students
  • myGrid-in-a-box

25
Wrap Up
  • Managed the transition from generic middleware development to practical, day-to-day useful services
    • Real users (plural) were fundamental to that
  • End-to-end support for an entire scenario
  • A broad view of the e-Science process
  • The show-stoppers for practical adoption are not sexy technical ones:
    • Can I incorporate my favourite service?
    • Can I manage the results?
  • Tapping into (de facto) standards and communities to leverage others' results and tools: LSID, Haystack, Pedro
  • http://www.mygrid.org.uk

26
Acknowledgements
myGrid is an EPSRC-funded UK e-Science Programme pilot project.
Particular thanks to the other members of the Taverna project, http://taverna.sf.net
27
myGrid People
  • Core
    • Matthew Addis, Nedim Alpdemir, Tim Carver, Rich Cawley, Neil Davis, Alvaro Fernandes, Justin Ferris, Robert Gaizauskas, Kevin Glover, Carole Goble, Chris Greenhalgh, Mark Greenwood, Yikun Guo, Ananth Krishna, Peter Li, Phillip Lord, Darren Marvin, Simon Miles, Luc Moreau, Arijit Mukherjee, Tom Oinn, Juri Papay, Savas Parastatidis, Norman Paton, Terry Payne, Matthew Pocock, Milena Radenkovic, Stefan Rennick-Egglestone, Peter Rice, Martin Senger, Nick Sharman, Robert Stevens, Victor Tan, Anil Wipat, Paul Watson and Chris Wroe.
  • Users
    • Simon Pearce and Claire Jennings, Institute of Human Genetics, School of Clinical Medical Sciences, University of Newcastle, UK
    • Hannah Tipney, May Tassabehji, Andy Brass, St Mary's Hospital, Manchester, UK
  • Postgraduates
    • Martin Szomszor, Duncan Hull, Jun Zhao, Pinar Alper, John Dickman, Keith Flanagan, Antoon Goderis, Tracy Craddock, Alastair Hampshire
  • Industrial
    • Dennis Quan, Sean Martin, Michael Niemi, Syd Chapman (IBM)
    • Robin McEntire (GSK)
  • Collaborators
    • Keith Decker

28
Questions?
http://www.mygrid.org.uk
http://taverna.sf.net
http://freefluo.sf.net/
29
Spares
30
Williams-Beuren Syndrome Microdeletion
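Figure: the 1.5 Mb Williams-Beuren syndrome (WBS) microdeletion at 7q11.23 on chromosome 7 (155 Mb), showing repeat blocks A, B and C (centromeric, mid and telomeric copies), patient deletions, and the WBS and SVAS regions.
Genes shown: WBSCR1/EIF4H, WBSCR5/LAB, GTF2IRD1, WBSCR21, WBSCR18, WBSCR22, WBSCR14, POM121, GTF2IRD2, BCL7B, BAZ1B, NOLR1, GTF2I, FKBP6, CYLN2, CLDN4, CLDN3, STX1A, LIMK1, NCF1, RFC2, TBL2, FZD9, ELN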