Title: Part 2: Architecture overview
1Part 2: Architecture overview
- Professor Carole Goble
- University of Manchester
- http://www.mygrid.org.uk
2In a nutshell
- Bioinformatics toolkit
- Open (Web) Services
- myGrid components and external domain services
- Publication, discovery, interoperation, composition, decommissioning of myGrid services
- No control or influence over domain service providers
- Metadata Driven
- LSIDs, Common information model, Ontologies, Semantic Web technologies
- Open extensible architecture
- Assemble your own components
- Designed to work together
- Loosely coupled
Semantic Discovery Feta
Haystack Provenance Browser
Pedro
View UDDI registry
Gateway CHEF Portal
Taverna WfDE
Freefluo WfEE
Event Notification
LSID
Info. Model
mIR
Soaplab Gowlab
3Key Characteristics
- Data intensive, upstream analysis
- Pipelines - experiments as workflows (chiefly)
- Ad hoc exploratory investigative workflows for individuals from no particular a priori community
- Openness: the services are not ours.
- Low activation energy, incremental take-on
- Foundations for sharing knowledge and sharing experimental objects
- Multiple stakeholders
- Collection of components for assembly
4Openness
- Openness
- open source
- open world of services
- open extensible technology
- open to wider eScience context
- open to user feedback
- open to third party metadata
5Platform
- Standards based
- (Web) Service Oriented Architecture
- Publication, discovery, interoperation, composition, decommissioning of myGrid services
- Web services communication fabric
- XML document types
- LSIDs for identifying resources
- Implemented in Java using Axis and Tomcat
- WS-I -> OGSA / WSRF
- Metadata driven
- RDF-coded metadata
- OWL-coded ontologies
- Common information model
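The platform identifies resources with LSIDs. As a minimal sketch, the following shows how an LSID of the general form `urn:lsid:authority:namespace:object[:revision]` could be split into its parts and rebuilt. The class and function names, and the `example.org` identifiers in the usage note, are illustrative assumptions and are not taken from the myGrid code base.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    """Components of a Life Science Identifier (illustrative container)."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split text of the form urn:lsid:authority:namespace:object[:revision]."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    # The "urn" and "lsid" prefixes are matched case-insensitively.
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"empty LSID component in {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(authority, namespace, object_id, revision)


def format_lsid(lsid: LSID) -> str:
    """Rebuild the URN text from its components."""
    base = f"urn:lsid:{lsid.authority}:{lsid.namespace}:{lsid.object_id}"
    return base if lsid.revision is None else f"{base}:{lsid.revision}"
```

For example, `parse_lsid("urn:lsid:example.org:workflow:42:1")` yields the authority `example.org`, the namespace `workflow`, the object `42` and the revision `1`; the revision is optional and is `None` when absent.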
6Stakeholders
- Middleware for
- Tool Developers
- Bioinformaticians
- Service Providers
- Biologists are indirectly supported by the portals and apps that these groups develop.
myGrid users
IS specialists
biologists
systems administrators
tool builders
infrequent
problem specific
bioinformaticians
service provider
bioinformatics tool builders
annotators
7Collections of Tasks
Building
Domain Tasks
Workflow
Service Providers
Enactment
Bioinformaticians
Storage
Scientists
Description
Service Discovery
Provenance
Data Management
Finding
Querying
Annotation providers
8Experimental entities
9Investigation = set of experiments + metadata
- Experimental design components
- Experimental instances that are records of enacted experiments
- Experimental glue that groups and links design and instance components
- Life Science IDs, URIs, RDF
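The "experimental glue" idea, linking design components to instance components through identifiers and RDF-style statements, can be sketched with a tiny in-memory triple store. This is only an illustration: the `ex:` predicates and the `example.org` LSIDs are invented for the example and do not come from the myGrid information model, and a real deployment would use an RDF store rather than a Python set.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TripleStore:
    """A tiny in-memory set of (subject, predicate, object) statements."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add((subject, predicate, obj))

    def match(self, subject: Optional[str] = None,
              predicate: Optional[str] = None,
              obj: Optional[str] = None) -> Iterator[Triple]:
        """Yield matching triples in sorted order; None acts as a wildcard."""
        for s, p, o in sorted(self._triples):
            if subject in (None, s) and predicate in (None, p) and obj in (None, o):
                yield (s, p, o)


# Hypothetical identifiers for one design, one enacted run, one investigation.
DESIGN = "urn:lsid:example.org:workflow:wf1"
RUN = "urn:lsid:example.org:enactment:run7"
INVESTIGATION = "urn:lsid:example.org:investigation:inv1"

store = TripleStore()
store.add(DESIGN, "rdf:type", "ex:ExperimentDesign")
store.add(RUN, "rdf:type", "ex:ExperimentInstance")
store.add(RUN, "ex:enactmentOf", DESIGN)               # glue: instance -> design
store.add(RUN, "ex:partOfInvestigation", INVESTIGATION)

# Which enacted runs are records of this design?
runs_of_design = [s for s, _, _ in store.match(predicate="ex:enactmentOf", obj=DESIGN)]
```

Because every entity is named by an identifier, statements about it can be added by third parties without changing the entity itself.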
10myGrid Service Stack
Taverna Workbench
Haystack
Web Portal
LSID Launch pad
Applications
e-Science Mediator
Provenance Mgt
Event Notification Service
Feta Service WF Discovery
UDDI Registries
Ontology Mgt
Ontologies
Views
Core services
Information Repository
Metadata Store
LSID Authority
FreeFluo Workflow Enactment Engine
OGSA-DQP Distributed Query Processor
Web Service (Grid Service) communication fabric
External services
AMBIT Text Extraction Service
Native Web Services
SoapLab
GowLab
Legacy apps
Legacy apps
11Service stack
Taverna workbench
Web Portal
LSID Launch Pad
Haystack
Apps
e-Science process patterns
e-Science Mediator
e-Science event bus
Service workflow discovery
Core services
Metadata management
Data management
Workflow enactment
Web Service (Grid Service) communication fabric
External services
AMBIT Text Extraction Service
Native Web Services
SoapLab
GowLab
Websites
Legacy apps
12 20,000 feet
Semantic Discovery Registration
Provenance and Data browser Haystack or Portal
Taverna Workbench
View Service
LSID Authority
UDDI
mIR data
Freefluo Workflow Engine
Store Service
mIR metadata
Web services, local tools User interaction etc.
Event Notification Service
13e-Science Mediator
- 1. Application-oriented: directly supports the e-Scientist by
- providing pre-configured e-Science process templates (i.e. system-level workflows)
- helping to capture and maintain context information (via the information model) that is relevant to the interpretation and sharing of the results of e-Science experiments
- facilitating personalisation and collaboration
- 2. Middleware-oriented: contributes to the synergy between myGrid services by
- acting as a sink for e-Science events initiated by myGrid components
- interpreting the intercepted events and triggering interactions with other related components entailed by the semantics of those events
- compensating for possible impedance mismatches with other services, both in terms of data types and interaction protocols
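The middleware-oriented role can be illustrated with a small event sink that adapts incoming events and then triggers the registered follow-up actions. This is a minimal sketch under stated assumptions: the `Mediator` class, its method names, and the event types are hypothetical and do not reflect the actual e-Science Mediator interface.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Event = Dict[str, str]
Handler = Callable[[Event], None]
Adapter = Callable[[Event], Event]


class Mediator:
    """Sink for component events; dispatches each one to registered handlers."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)
        self._adapters: Dict[str, Adapter] = {}
        self.unhandled: List[Event] = []

    def adapt(self, event_type: str, adapter: Adapter) -> None:
        """Register a converter that smooths over data-type mismatches."""
        self._adapters[event_type] = adapter

    def on(self, event_type: str, handler: Handler) -> None:
        """Register an interaction to trigger when this event type arrives."""
        self._handlers[event_type].append(handler)

    def receive(self, event: Event) -> None:
        adapter = self._adapters.get(event["type"])
        if adapter is not None:
            event = adapter(event)
        handlers = self._handlers.get(event["type"], [])
        if not handlers:
            self.unhandled.append(event)   # kept for inspection, not dropped
            return
        for handler in handlers:
            handler(event)


log = []
mediator = Mediator()
# Hypothetical mismatch: the sender reports "OK", the store expects "ok".
mediator.adapt("workflow.completed", lambda e: {**e, "status": e["status"].lower()})
mediator.on("workflow.completed",
            lambda e: log.append(("store", e["workflow"], e["status"])))
mediator.receive({"type": "workflow.completed", "workflow": "wf1", "status": "OK"})
mediator.receive({"type": "unknown"})
```

Keeping adapters separate from handlers means a component that changes its event format only needs a new adapter, while the triggered interactions stay untouched.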
14Supporting the e-scientist
Find Workflow Use-case
Find Workflow Process
- Recurring use-cases can be captured
- Then corresponding process templates can be authored
- The e-Science mediator makes the processes available to the user
Find an interesting workflow for experiment
Create exp. Context for this user
launch semantic Search facility
Examine and modify if necessary
Launch workflow Editor for selected WF
Store to personal repository For later re-use
Enable MIR browser For storage with context
15
- E-Science process templates maintained by the mediator can drive GUI generation and interaction with the user
GUI
E-Science Mediator
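One way to picture a process template is as an ordered list of labelled steps: the labels are what a generated GUI could present, and the actions are what the mediator would run. The sketch below follows the Find Workflow process from the previous slide. The class names, the step order, and the stand-in actions are assumptions made for illustration; the real templates are not shown in these slides.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Context = Dict[str, str]


@dataclass
class Step:
    label: str                           # text a GUI could show for this step
    action: Callable[[Context], None]    # what the step does to the context


@dataclass
class ProcessTemplate:
    name: str
    steps: List[Step] = field(default_factory=list)

    def labels(self) -> List[str]:
        """Step labels in order, e.g. for building a wizard-style GUI."""
        return [step.label for step in self.steps]

    def run(self, context: Context) -> Context:
        for step in self.steps:
            step.action(context)
        return context


# Stand-in actions; each only records what a real service call would do.
def create_context(ctx: Context) -> None:
    ctx["experiment"] = f"experiment-of-{ctx['user']}"

def semantic_search(ctx: Context) -> None:
    ctx["selected_workflow"] = "wf1"     # placeholder for a search result

def open_editor(ctx: Context) -> None:
    ctx["editor"] = f"editing {ctx['selected_workflow']}"

def enable_mir_browser(ctx: Context) -> None:
    ctx["stored_under"] = ctx["experiment"]


find_workflow = ProcessTemplate("Find Workflow", [
    Step("Create experiment context for this user", create_context),
    Step("Launch semantic search facility", semantic_search),
    Step("Launch workflow editor for selected workflow", open_editor),
    Step("Enable MIR browser for storage with context", enable_mir_browser),
])
result = find_workflow.run({"user": "alice"})
```

Since the template is plain data, the same definition can feed both the user interface and the server-side process logic.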
16Mediating between services
- Example: mediation during a workflow execution
- 1 Execution started
- 2 Establish experiment/user context
- 3 Intermediate process completed
- 4 Link process trace to context
- 5 Store intermediate process trace
- 6 Workflow completed
- 7 Get WF results
- 8 Store WF results
- 9 Notify WF completion to subscribers
E-Science Mediator
Notification Service
MIR
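The nine numbered steps of this example can be traced in a short simulation. It is a sketch only: the MIR and the notification service are replaced by in-memory lists, and the class and method names are hypothetical rather than the real service interfaces.

```python
from typing import Dict, List, Tuple


class WorkflowMediation:
    """Simulates the mediator's reactions during one workflow execution."""

    def __init__(self) -> None:
        # (context, kind, payload) records; stand-in for the MIR.
        self.mir: List[Tuple[str, str, str]] = []
        # Messages sent to subscribers; stand-in for the notification service.
        self.notifications: List[str] = []
        # Run id -> experiment/user context.
        self.contexts: Dict[str, str] = {}

    def execution_started(self, run_id: str, user: str, experiment: str) -> None:
        # Steps 1-2: on "execution started", establish the context.
        self.contexts[run_id] = f"{user}/{experiment}"

    def process_completed(self, run_id: str, trace: str) -> None:
        # Steps 3-5: link the process trace to the context and store it.
        self.mir.append((self.contexts[run_id], "trace", trace))

    def workflow_completed(self, run_id: str, results: str) -> None:
        # Steps 6-9: take the results, store them, then notify subscribers.
        context = self.contexts[run_id]
        self.mir.append((context, "results", results))
        self.notifications.append(f"workflow {run_id} completed for {context}")


mediation = WorkflowMediation()
mediation.execution_started("run7", user="alice", experiment="inv1")
mediation.process_completed("run7", trace="step A finished")
mediation.workflow_completed("run7", results="result document")
```

Every record stored in the stand-in MIR carries the same context string, which is what lets traces and results be found together later.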
17Simplified Architecture
Client Side
Client-side e-science process logic
E-Science Mediator client-stubs
Context preserved via myGrid Information Model
E-Science Mediator Service
Server-side e-science process logic
Service Registry
The Grid
18Event notification Service
- Publish/subscribe model
- Topic based (cf. JMS topics, CORBA channels)
- Hierarchic topics
- Persistent event storage
- Subscription leases
- Federation for scalability and reliability
- Event filtering
http://cvs.mygrid.org.uk/notification-stable/downloads
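Several of the listed features (topic-based publish/subscribe, hierarchic topics, event storage, subscription leases and event filtering) fit into a compact sketch. The `Broker` below is an illustrative assumption, not the myGrid notification service API; federation is not modelled, and the clock is passed in explicitly so that lease expiry is easy to follow.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Event = Dict[str, str]
Callback = Callable[[str, Event], None]
Filter = Callable[[Event], bool]


@dataclass
class Subscription:
    topic: str                       # "workflow" also covers "workflow/completed"
    callback: Callback
    expires_at: float                # end of the lease on the broker's clock
    event_filter: Optional[Filter] = None


class Broker:
    def __init__(self) -> None:
        self._subscriptions: List[Subscription] = []
        self.stored: List[Tuple[str, Event]] = []   # stand-in for persistent storage

    def subscribe(self, topic: str, callback: Callback, now: float,
                  lease_seconds: float,
                  event_filter: Optional[Filter] = None) -> Subscription:
        sub = Subscription(topic, callback, now + lease_seconds, event_filter)
        self._subscriptions.append(sub)
        return sub

    def publish(self, topic: str, event: Event, now: float) -> int:
        """Store the event, deliver it, and return the number of deliveries."""
        self.stored.append((topic, event))
        # Drop subscriptions whose lease has run out.
        self._subscriptions = [s for s in self._subscriptions if s.expires_at > now]
        delivered = 0
        for sub in self._subscriptions:
            covers = topic == sub.topic or topic.startswith(sub.topic + "/")
            passes = sub.event_filter is None or sub.event_filter(event)
            if covers and passes:
                sub.callback(topic, event)
                delivered += 1
        return delivered


received = []
broker = Broker()
broker.subscribe("workflow", lambda t, e: received.append(("all", t)),
                 now=0.0, lease_seconds=60.0)
broker.subscribe("workflow/completed", lambda t, e: received.append(("wf1", t)),
                 now=0.0, lease_seconds=10.0,
                 event_filter=lambda e: e.get("workflow") == "wf1")

n1 = broker.publish("workflow/completed", {"workflow": "wf1"}, now=5.0)
n2 = broker.publish("workflow/completed", {"workflow": "wf2"}, now=6.0)
n3 = broker.publish("workflow/completed", {"workflow": "wf1"}, now=20.0)
n4 = broker.publish("data/stored", {}, now=21.0)
```

In this run the first event reaches both subscribers, the second is filtered out for the `wf1` subscriber, the third arrives after that subscriber's ten-second lease has expired, and the last matches no topic at all.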
19Portal toolkit for bioinformaticians
- Target application
- Williams-Beuren Syndrome
- Fixed set of workflows
- Extra myGrid portlets
- Configurable
- Workflow enactment
- Workflow scheduling
- Completion notification
- Results browsing
- Based on CHEF Jetspeed-1
- Portlets for team collaboration
20Text Services
XScufl workflow definition parameters
User Client
Clustered PubMed Ids titles
Term-annotated Medline abstracts
Medline Server (Sheffield)
Medline Abstracts
PubMed Ids
Medline pre-processed offline to extract biomedical terms and indexed
PubMed Ids
21History
Pre-Prototype
- Experimental
- Web-based
- Requirements gathering
Prototype 1
- Demo at ISMB 2003
- Architectural workout
- All services represented
- NetBeans workbench
- API-based integration
- Info Repository oriented
- XML-based process provenance
- Workflow enactment engine
- Full paper and demo at ISMB 2004
- GSK deployment
- Real biology
22Two Paths
- Innovative work
- Service and workflow registration
- Semantic discovery
- Provenance management
- Text mining
- Core functionality
- Services: Soaplab and Gowlab
- Workflow enactment engine: Freefluo
- Workflow workbench: Taverna
- Data integration: OGSA-DQP
- Information model management
- Mediator
- In between
- Event notification
23myGrid People
- Core
- Matthew Addis, Nedim Alpdemir, Tim Carver, Rich Cawley, Neil Davis, Alvaro Fernandes, Justin Ferris, Robert Gaizauskas, Kevin Glover, Carole Goble, Chris Greenhalgh, Mark Greenwood, Yikun Guo, Ananth Krishna, Peter Li, Phillip Lord, Darren Marvin, Simon Miles, Luc Moreau, Arijit Mukherjee, Tom Oinn, Juri Papay, Savas Parastatidis, Norman Paton, Terry Payne, Matthew Pocock, Milena Radenkovic, Stefan Rennick-Egglestone, Peter Rice, Martin Senger, Nick Sharman, Robert Stevens, Victor Tan, Anil Wipat, Paul Watson and Chris Wroe.
- Users
- Simon Pearce and Claire Jennings, Institute of Human Genetics, School of Clinical Medical Sciences, University of Newcastle, UK
- Hannah Tipney, May Tassabehji, Andy Brass, St Mary's Hospital, Manchester, UK
- Steve Kemp, Liverpool, UK
- Postgraduates
- Martin Szomszor, Duncan Hull, Jun Zhao, Pinar Alper, John Dickman, Keith Flanagan, Antoon Goderis, Tracy Craddock, Alastair Hampshire
- Industrial
- Dennis Quan, Sean Martin, Michael Niemi, Syd Chapman (IBM)
- Robin McEntire (GSK)
- Collaborators
- Keith Decker
24Collaboration
http://www.accessgrid.org
25Publications
- R. Stevens, H.J. Tipney, C. Wroe, T. Oinn, M. Senger, P. Lord, C.A. Goble, A. Brass and M. Tassabehji. Exploring Williams-Beuren Syndrome Using myGrid. To appear in Proceedings of the 12th International Conference on Intelligent Systems in Molecular Biology, 31st Jul-4th Aug 2004, Glasgow, UK.
- C.A. Goble, S. Pettifer, R. Stevens and C. Greenhalgh. Knowledge Integration: In silico Experiments in Bioinformatics. In The Grid: Blueprint for a New Computing Infrastructure, Second Edition, eds. Ian Foster and Carl Kesselman, Morgan Kaufmann, November 2003.
- R. Stevens, A. Robinson, and C.A. Goble. myGrid: Personalised Bioinformatics on the Information Grid. In Proceedings of the 11th International Conference on Intelligent Systems in Molecular Biology, 29th June-3rd July 2003, Brisbane, Australia, published in Bioinformatics Vol. 19 Suppl. 1 2003, pp302-304.
26http://www.mygrid.org.uk