Title: myGrid
1. myGrid
- 3rd Steering Meeting
- October 21st 2002, Manchester
2. Meeting Plan
- Reminder of objectives.
- Project context.
- Review progress over past year.
- Pre-prototype: Nov 2001 to April 2002.
- Post pre-prototype: April 2002 to October 2002.
- Review project plans and structure.
- Workbench Demonstrator.
- Provenance and Personalisation.
- Industrial engagement.
- Risk assessment strategy.
3. myGrid: personalised extensible environments for data-intensive in silico experiments in biology
- EPSRC eScience pilot project
- official start 01/10/01
- actual 01/01/02
- end 30/03/05
- 16 RAs, 9 studentships (start 09/03)
4. Circadian Rhythms
- Has anyone else studied the effect of neurotransmitters on the circadian rhythms in Drosophila?
- I've got a cluster of proteins from my experiment. How do their functions interrelate? And what are the proteins with a particular function?
- Is a structure known for my protein? What other proteins have a similar structure?
- Can I build a homology 3D model?
- What is known about a homologous protein?
5. E-Science Q&A
- Who else has asked this question? Can I use/adapt their approach?
- Workflow.
- What were the results at each stage?
- Dynamic Data Repositories.
- When was P12345 last updated?
- Which BLAST did I use?
- Provenance.
- Has PDB changed since I last ran this?
- Notification.
Personalisation.
6. myGrid: in silico experimentation
- Resource Interoperation.
- Workflow Coordination and Database Integration
- Provenance and Change Propagation.
- Improving quality of experiments and data.
- Personalisation and Collaborative working.
- Scientific discovery is personal and global.
- Security, ownership -> valuable assets
- Service based architecture (formerly known as agents)
- Publication, discovery, interoperation, composition, decommissioning of myGrid services
- Metadata.
- Describing stuff, using ontologies, Semantic Web.
7. myGrid outcomes: reminder
- e-Scientists
- Workbench
- Environment built on toolkits for service access, personalisation and community
- Application
- Gene function expression analysis using S. cerevisiae
- Annotation workbench for the PRINTS pattern database
- Developers
- myGrid-in-a-Box developers kit
- Re-purposing DAS, AppLab and OpenBSA
- Integrating ISYS and GlaxoSmithKline platforms
8. myGrid partners
Millennium Pharmaceuticals, LION BioSciences, TurboGenomics
Issue: Incorporating industrial partners.
9. The myGrid team
- Carole Goble
- Norman Paton
- Brian Warboys
- Stephen Pettifer
- Alvaro Fernandes
- Luc Moreau
- Dave De Roure
- Chris Greenhalgh
- Tom Rodden
- John Brooke
- Paul Watson
- Alan Robinson
- Rob Gaizauskas
- Robert Stevens
- Ian Horrocks
- Neil Wipat
- Matthew Addis
- Nick Sharman
- Rich Cawley
- Simon Harper
- Karon Mee
- Simon Miles
- (Vijay Dailani)
- Xiaojian Liu
- Tom Oinn
- Martin Senger
- Milena Radenkovic
- Kevin Glover
- (Angus Roberts)
- Chris Wroe
- Mark Greenwood
- Phil Lord
- Neil Davis
- Darren Marvin
- Justin Ferris
- Peter Li
- Nedim Alpdemir
- Luca Toldo
- Robin McEntire
- Anne Westcott
- Tony Storey
- Bernard Horan
- Paul Smart
- Robert Haynes
10. Global Grid Forum Links
- Open Grid Services Architecture
- http://www.globus.org/ogsa/
- Early Demonstrator (with AstroGrid)
- Database Access and Integration
- GGF OGSA-DAIS and OGSA-DAI project
- Norman Paton, Paul Watson, IBM (WP3)
- OGSI working group
- Tom Oinn (WP1)
- GGF-Semantic Grid Research Group
- Carole Goble, Dave De Roure
- GGF Life Sciences Working Group
11. Links with other Grid projects
- AstroGrid
- Ontologies, Databases
- Geodise
- Ontologies, Databases, Workflow
- Comb-e-Chem
- Workflow, (LabBooks)
- SCEC (USA)
- Ontologies and Service composition
- UTOPIA
- Client application of myGrid middleware.
12. Potential Links with Other Projects
- MIMAIS
- Ontologies
- E-Protein
- Potential client and beta tester
- Macromolecular Structures Database
- Potential client and beta tester
- GONG
- Ontologies and ontology infrastructure
- WonderWeb
- Ontology infrastructure
13. Links with Other Programmes
- I3C
- BioSciences Service Registry (Carole)
- Life Sciences ID (Martin Senger)
- BioMOBY
- Open Source Activity
- BioMOBY registry and object typing.
14. Lots of links take lots of time
The Goals and Status of the e-Science Core
Programme, March 2002
15. myGrid Talks
- BiGUM Bioinformatics Grid User Group, NeSC, 2001
- InfoTechPharm2002, London, Feb 2002 (mentioned by Novartis)
- Finland Grid workshop (via AccessGrid)
- NeSC Opening, NeSC, April 2002
- Agents in Bioinformatics workshop
- Sun HPC Consortia meeting, Glasgow, July 2002
- I3C meeting, Boston, July 2002
- UK eScience All Hands, Sheffield, Sept 2002
- Genes, Proteins and Computing VII, Southampton, Sept 2002
- EMBL-EBI, Hinxton, 30th September 2002
- Wellcome Trust UK Biological Grids Retreat, Hinxton, 1st-3rd October 2002
- BBSRC Grant Holder workshop, 28-29th October 2002
- Objects in Bio and Chem Informatics, Washington, Nov 2002
- DTI Outreach in Bioinformatics, London, Nov 2002
- Sun BioGrid Symposium, Baltimore, Dec 4-5th 2002
- InfoTechPharma Grids Symposium, London, Feb 2003
- O'Reilly Bioinformatics Technology Conference, San Diego, Feb 2003
16. Publications
- Comparative and Functional Genomics
- Scientific Computing article
- International Journal of Cooperative Information Systems
- Book chapter on Semantic Grids
- SIGMOD Record paper on Semantic Grids
- Grid Bible 2 chapter on myGrid invited
- Submission to EuroWeb 2002
- Others???
17. Current programme
- Use case scenarios.
- Rolling programme of prototyping.
- April: myGrid 0.0, October: myGrid 0.1
- Identifying the most important services.
- Agreeing consistent interfaces.
- Integrating with other Grid services.
- Implementing core services.
- Describing services.
- Connecting with other efforts.
18. Project Management
- Management structure has taken longer than expected.
- Recruitment completed, and RA churn commenced (lost Vijay and Angus)
- 9 PhD studentships allocated to start 2003. Structures have taken longer than expected
- Weekly management telephone conference now a monthly Access Grid meeting
- Regular WP meetings and email lists.
- Document repository
- BSCW, WIKI, probably needs bulletin board!
- CVS code repository and software build environment
- Web site
- Software Development environment: UML
- But common software platform still unresolved.
- Open Source license: LGPL.
- Collaboration agreement
19. myGrid phased development
[Timeline figure. Time markers: 6 months (April 2002), 12 months, 24 months, 33 months. Stage labels: Pre-prototype; Architecture; Simple services; Early toolkit trials; Extended services; Application trials; Developers toolkit; Release.]
- Versions of myGrid
- Varying degrees of functionality
20. Next Phases of development
Kick-off meeting
Nov 01
Pre-Prototype
Consolidation Architecture
Prototype Demonstrator
Pre-Release 1.0
Release 1.0
21. Pre-prototype: Purpose
- Requirements gathering
- Technology experimentation
- Web services
- Semantic Web
- Grid
- Not to deliver real supported software
22. Pre-prototype: characteristics
- A number of sequence analysis-based scenarios
- Personal data repository
- Web service-wrapped public data repositories and tools
- Simple Workflow enactment
- Provenance (primitive form)
- Ontology-based service discovery
- Simple Web Portal
- Decoupled text extraction
- No event notification
- No database integration (aka distributed query
processing / instance reconciliation)
23. Pre-prototype: Process
Technology Personnel Induction
User Group
Architecture Group
Specifying the myGrid versions
Pre-Prototype April 2002
24. Client framework
myGrid 0.0
Portal
Repository Client
Ontology Client
Workflow Client
Personal Repository
Workflow Repository
(Metadata) Ontology Server
DAML+OIL Reasoner (FaCT)
(Metadata) Service Type Directory
Workflow enactment
Matcher and Ranker
Service instance directory
Bioinformatics services
25. How do the functions of a cluster of proteins interrelate?
- Some proteins in my personal repository
26. Find services that take a protein and give its functions, and pick the best match.
27. Find another that displays the proteins based on their function. Ontology restricts inputs and outputs.
28. Build a workflow of composed services linked together.
29. See if an appropriate workflow already exists. It could have been made by anyone who will share with you.
30. Pick one and enact it.
31. While it's running it picks the best service instance that can run the service at that time.
32. While it's running it picks the best service instance that can run the service at that time. Or you choose.
33. The workflow finishes with the final display service.
34. Results are put into your personal repository, with a concept from the ontology to tell you and myGrid what they mean.
35. A full provenance record is kept and linked with the results. We could redo or reuse the workflow.
36. IF-2 (Hinxton, October 2002)
- Consolidated IF-1 software and builds.
- Attempted a new cut at the architecture.
- Event notification service
- Workflow enactment engine
- Personal data repository for XML data
- Ontology server
- Ontology of services
- Gateway API
- Pairwise integration.
37. Overview: Development
- Consolidate pre-prototype services
- Develop new services for myGrid 0.1
- Start to develop new services for myGrid 0.2
38. Development: Consolidation
- Bioinformatics services
- SOAPLAB: Web Service access to EMBOSS tools
- Medline BQS
- BLAST Services
- Service directory
- Ported from MS Access to MySQL
- Personal repository
- Revised schema
- Specific support for XML data
39. Development: Consolidation
- Ontology service
- Ported from CGI to Web Service interface
- Workflow enactment engine
- Supports much richer subset of WSFL
- e-Science layer
- Refactored into web Portal and underlying Gateway (API and Web Service)
- Text-only client for lightweight use and scripting added
42. Development: For myGrid 0.1
- Use of myGrid via Talisman
- Click here to run the EMBOSS example workflow
- Notification service
- Based on EJB implementation
- Service describer client
- For introducing new service types
43. Development: For future
- Container-based framework
- programming model abstracts from transport infrastructure
- Distributed query processing support
- as OGSA-DAI service
- Text extraction
- reengineered PASTA
- available via Web Service
44. myGrid Framework
Portal
Work Bench
Applications: UTOPIA
Bio-Medical Services Library: DAS, Talisman, workflow sets
Upper level knowledge-based Grid Common Services: semantic integration, knowledge-based querying, workflow composition, visualisation, provenance management, semantic service discovery
Middle level Grid Common Services: database access, distributed query processing, service discovery, workflow enactment, event notification
Low level Grid Common Services (OGSI): co-scheduling, data shipping, authentication, job execution, resource monitoring, replication
45. User Agent
Custom Application
Presentation Services
Collaboration Support
Management Tools
Portal
Client Framework
Semantic Data Integration
Semantic Aspect
Information Extraction
Semantic Workflow Design
Provenance Validation Assessment
Semantic Discovery
Ontology Service
Preferences
Metadata Aspect
Availability
Preferences
Versioning
Third-party Metadata
QoS
QoS
Provenance
QoS
Coordination Services
Distributed Query
Workflow Enactment
Syntactic Discovery
Event Notification
Networked Services
White Pages Yellow Pages Discovery
Personal Repository
Database Access
JobExecution
Device Access
Device Access
Security Authentication Authorization
Distributed Resources
Database
resources data and tools
46. Review
- Technology focus up until now.
- Tendency to over-develop technology without application focus.
- Lack of user engagement.
- Esp. from yeast and PRINTS annotators.
- Need to reassert
- application perspective
- myGrid distinctiveness: provenance and personalisation.
- Architecture group doesn't seem to be working
47. Issues (1): Work Packages
- Work package structure does not support the cross work package issues.
- User requirements
- Originally planned under WP6, but that isn't how it turned out.
- Provenance
- Personalisation
- Proposed Solution
- New cross-WP work packages in these areas.
48. Issues (2): Application
- User requirements on workflows rather than how they are used.
- Lack of clarity for
- End-user application.
- End-user demonstrators for the application.
- Without this it is easy for the user scenarios to be simplistic and technology focused
- Forgetting how databases, workflows, services will be used.
- Because we don't have a bio-lead
49. Who is myGrid for?
myGrid users
IS specialists
biologists
systems administrators
tool builders
infrequent
problem specific
service provider
bioinformaticians
bioinformatics tool builders
50. An e-Science Workbench
- A lab book metaphor
- Strong provenance and personalisation thread.
- An integrating application
- Populated with a bio-exemplar
- Andy Brass: Cold Carp expression
- Macromolecular Structure Database
51. Applications Framework
Sequence annotation
Cold Carp Gene Expression
MSD
App Demonstrator
Workbench Demonstrator
Application UTOPIA
Apps Builder (Talisman)
Workbench
Web Portal
Gateway API
myGrid Middleware Services
52. IF-3 Proposal
Sequence annotation
Cold Carp Gene Expression
MSD
App Demonstrator
Workbench Demonstrator
Application UTOPIA
Apps Builder (Talisman)
Workbench
Web Portal
Gateway API
myGrid Middleware Services
53. Issues (3): Architecture
- Difficult to get an architecture team going.
- Vested interests.
- Lack of app. focus.
- OGSA confusion.
- Architecture confusion.
- Neglect of physical arch.
- Proposal: build a demonstrator.
- Adopt the 4+1 architecture model.
54. Challenges: Architecture
- Use of service based architecture
- Is this enough?
- Risks stovepipe approach to cross-myGrid issues
- Need more emphasis on data model
- Resources, Services, Provenance,
- Need to address scalability across community,
virtual organizations
55. Challenges: OGSA
- OGSA: Grid meets Web Services
- Being defined and standardized by GGF
- Significant buy-in across community
- myGrid already uses Web Services
- How do we
- conform to OGSA?
- exploit OGSA?
- add value to OGSA?
56. Challenges: myGrid 1.0
- The myGrid proposal
- Phase 2: month 18
- First release of simple services interfaced to a set of biological sources.
- The first demonstration of the toolkits and applications: toolkit trials.
- Formative assessment of the facilities using myGrid workshop, user meetings, and the ESNW Regional Centre to engage the user community.
- Third myGrid workshop.
57. Challenges: myGrid 1.0
- Month 18: October 2003
- myGrid iterations: end of
- January 2003
- May 2003
- September 2003 >>> myGrid 0.1!
58. Work Package Reports
59. Work package leaders
- WP1 fabric resources: Alan Robinson, EBI
- WP2 architecture: Luc Moreau, Southampton
- WP3 databases: Paul Watson, Newcastle (Norman Paton)
- WP4 metadata: Carole Goble, Manchester (Robert Stevens)
- WP5 workflow: Brian Warboys, Manchester (Matthew Addis)
- WP6 toolkits: Chris Greenhalgh, Nottingham
- WP7 information extraction: Rob Gaizauskas, Sheffield
- WP8 management: Nick Sharman, Manchester (Carole Goble)
- WP9, 10 and 11 proposed.
60. WP1: Bio Services
- Leader: Alan Robinson (EBI)
- Two RAs: Martin Senger and Tom Oinn.
- Linking services with Grid Fabric (although this is done by other WPs too).
- Data source preparation.
- Middleware wrapping, security and sources population.
- Globus deployment on hold.
61. WP1 Progress Nov01-Oct02
- SOAPLAB to provide Web services for analysis applications: EMBOSS, BLAST.
- Web services for archives: MEDLINE/BQS, SRS, GadFly, FlyBase.
- GO visualisation tool for IF-1.
- Talisman 1.4 for tool builders.
- Use cases for IF-1 and IF-2 workflows.
- "Bio services" used by IF-1 and IF-2.
- PR to bio community: EBI/Hinxton
- Two Web services workshops.
- "What is myGrid/Grid?" for non-CS.
- Participation in Users Group.
- Engaging with LION over SRS.
- caBIO of NCI, bioMOBY, I3C LSID. Participating in bibliographic objects effort.
62. WP2: Architecture
- Leader: Luc Moreau (Southampton), Dave De Roure, Mike Luck
- Three RAs: Simon Miles, Xiaojian Liu, (Vijay Dailani)
- Sub workpackages
- Service Directory: WP4, WP6
- Notification Service: WP3, WP5, WP6, WP11
- EJB Component Model: all
- Security: all
- Fault Tolerance: WP11
- Issues? Risks?
63. WP2: Service Directory
- A service directory offering personalised and customisable views of multiple existing service directories.
- myGrid 1.0 functionality: Basic implementation of views (where a view's content is specified by queries over service directories or views).
- Next 4 Months: View design and specification; query language over UDDI-M
- Issue: lack of Open Source UDDI; who will deploy the demonstrator UDDI?
- Issue: Link with I3C Registry.
- Link with WP4 Metadata.
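The idea of a view whose content is specified by a query over service directories or other views can be sketched in a few lines. This is an illustrative sketch only, not the WP2 design: the names `ServiceEntry`, `Directory` and `View`, the entry fields and the example services are all assumptions, and no UDDI interface is involved.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Union


@dataclass(frozen=True)
class ServiceEntry:
    """Hypothetical directory record; the fields are illustrative."""
    name: str
    input_type: str
    output_type: str
    provider: str


@dataclass
class Directory:
    """A plain service directory: just a list of entries."""
    entries: List[ServiceEntry] = field(default_factory=list)

    def services(self) -> Iterator[ServiceEntry]:
        return iter(self.entries)


@dataclass
class View:
    """A view: a query (predicate) evaluated over directories or other views."""
    sources: List[Union[Directory, "View"]]
    query: Callable[[ServiceEntry], bool]

    def services(self) -> Iterator[ServiceEntry]:
        seen = set()
        for source in self.sources:
            for entry in source.services():
                # Keep each matching entry once, even if several sources list it.
                if entry not in seen and self.query(entry):
                    seen.add(entry)
                    yield entry


blastp = ServiceEntry("blastp", "protein_sequence", "blast_report", "EBI")
pepstats = ServiceEntry("pepstats", "protein_sequence", "statistics", "Manchester")
revseq = ServiceEntry("revseq", "nucleotide_sequence", "nucleotide_sequence", "EBI")

public = Directory([blastp, revseq])
local = Directory([pepstats, blastp])

# A view over two directories, then a personal view defined over that view.
protein_tools = View([public, local], lambda s: s.input_type == "protein_sequence")
my_view = View([protein_tools], lambda s: s.provider == "EBI")

print([s.name for s in protein_tools.services()])
print([s.name for s in my_view.services()])
```

Because a view exposes the same `services()` method as a directory, views compose without the client needing to know which kind of source it is querying.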
64. WP2: Notification Service
- A peer-to-peer ad hoc network of OGSA-compliant notification services offering end-to-end quality of service.
- myGrid 1.0 functionality: OGSA-compliant notification service, offering elements of quality of service (e.g. max notification rate) and client feedback. Support for personalised views updates.
- Next 4 Months: Implementation of all the business logic to support OGSA interfaces (this includes push clients). Syntactic compatibility with OGSA interface wherever technically feasible.
- Framework
- Links with WP3 Info. management and WP5 Workflow
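One quality-of-service element mentioned above, a maximum notification rate per subscriber, can be illustrated with a small in-process sketch. Everything here is an assumption for illustration (the class name, the `subscribe`/`publish` signatures, and the choice to drop rather than queue excess messages); it is not the OGSA notification interface.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class NotificationService:
    """Toy push-style notifier with a per-subscriber maximum rate."""

    def __init__(self) -> None:
        # topic -> list of subscription records
        self._subs: Dict[str, List[dict]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[str, str], None],
                  max_rate_per_sec: float) -> None:
        self._subs[topic].append({
            "callback": callback,
            "min_interval": 1.0 / max_rate_per_sec,  # seconds between deliveries
            "last": None,                            # time of last delivery
        })

    def publish(self, topic: str, message: str, now: float) -> int:
        """Push message to subscribers whose rate limit allows it.

        `now` is passed in (seconds) so the example is deterministic.
        Returns the number of subscribers actually notified.
        """
        delivered = 0
        for sub in self._subs[topic]:
            if sub["last"] is None or now - sub["last"] >= sub["min_interval"]:
                sub["callback"](topic, message)
                sub["last"] = now
                delivered += 1
        return delivered


received = []
service = NotificationService()
# At most one notification every two seconds for this client.
service.subscribe("PDB", lambda topic, msg: received.append(msg), max_rate_per_sec=0.5)

service.publish("PDB", "update 1", now=0.0)  # delivered
service.publish("PDB", "update 2", now=1.0)  # suppressed: too soon
service.publish("PDB", "update 3", now=2.0)  # delivered
print(received)
```

A real service would more likely coalesce or queue suppressed updates than drop them; dropping keeps the sketch short.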
65. WP2: EJB-based component model
- An EJB-based component model that would allow us to deploy a service's business logic as a myGrid service, where containers would provide default security, support for fault tolerance, provenance context, etc. Services could be exported as OGSA grid services, Web services or EJBs. A client-side library would allow uniform interactions with any of these.
- myGrid 1.0 functionality: Service deployment as OGSA, WS, EJB. Client API. Security container.
- Next 4 Months: Client-side library (with dynamic invocation). Deployment of service (but no "added value" container provided).
66. WP2: Security
- A cross-domain X.509-certificate-based authentication mechanism, and access control based on proxy certificates and/or role certificates. An API to generate non-repudiable provenance traces.
- myGrid 1.0 functionality: Dummy implementation with placeholders for certificates. Dummy implementation of authorisation based on string matching.
- Next 4 Months (starting December): still needs to be finalised.
67. WP2: Fault tolerance
- A fault manager able to orchestrate, in collaboration with the enactment engine, the recovery of a workflow when faults are detected.
- myGrid 1.0 functionality: to be determined.
- Next 4 Months: N/A, as progress needs to be made on the provenance and personalisation front.
68. WP3: Info Management
- Leader: Paul Watson (Newcastle)
- Norman Paton, Alvaro Fernandes (Manchester)
- Three RAs: Peter Li, ??? (Newcastle), Nedim Alpdemir (Manchester)
- Effective use of information
- locate, access, process, combine, share, alert
- Activities
- myGrid Information Repository
- Distributed Query Processing
- Views (personalisation)
- Notification
69. Information Management: Scope
- WP3 enables the scientist to make effective use of information
- locate, access, process, combine, share, alert
- Activities
- MyGrid Information Repository
- Distributed Query Processing
- Views
- Notification
70. MyGrid Information Repository
External Bio Repositories
MyGrid Information Repository
Organisational
Analyse Data
Personal
Browse Annotate
Alert
71. WP3 Progress Nov01-Oct02
- Initial MyGrid Information Repository deployed
- provenance
- data
- metadata
- workflows
- Supports XML and relational
- Notification designed
- Distributed query processing prototype running on
the Grid
72. WP3 Plans
- myGrid 1.0 functionality
- Move to Open Grid Services Architecture
- distributed query processing (July)
- Next 4 Months
- requirements analysis (Nov) -> design (Dec) -> implementation
- Err, more detail please!
73. Distributed Query Processing
- Query:
    select p.proteinId, Blast(p.sequence)
    from gimsprotein p, goproteinTerm t
    where t.termId = 'S92' and p.proteinId = t.proteinId
- Grid resources are acquired to run the operators
- can exploit parallelism
- Query plan (from the slide diagram, root first):
    reduce
      op_call (Blast)
        exchange
          hash_join (proteinId)
            exchange
              reduce
                table_scan termId = 'S92' (goproteinTerm)
            exchange
              reduce
                table_scan (gimsprotein)
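The operators in this plan can be mimicked on in-memory rows to show how the pieces fit together. This is a single-process sketch under stated assumptions: the sample rows are invented, `fake_blast` is a stand-in for a real BLAST service call, the exchange operators (which move tuples between Grid nodes) are omitted, and a thread pool merely hints at the parallelism the operation call could exploit.

```python
from concurrent.futures import ThreadPoolExecutor


def table_scan(rows, predicate=lambda row: True):
    """Read a table, keeping only rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def reduce_op(rows, columns):
    """The plan's 'reduce' operator: project each row onto the given columns."""
    return [{c: row[c] for c in columns} for row in rows]


def hash_join(build_rows, probe_rows, key):
    """Build a hash table on one input, then probe it with the other."""
    table = {}
    for row in build_rows:
        table.setdefault(row[key], []).append(row)
    return [{**b, **p} for p in probe_rows for b in table.get(p[key], [])]


def op_call(rows, func, arg, out, workers=4):
    """Apply an external operation to one column of every row, in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(func, (row[arg] for row in rows)))
    return [{**row, out: r} for row, r in zip(rows, results)]


def fake_blast(sequence):
    """Stand-in for a BLAST call (assumption: no real service is contacted)."""
    return f"hits-for-{sequence}"


# Invented sample data standing in for gimsprotein and goproteinTerm.
protein = [
    {"proteinId": "P1", "sequence": "MKV"},
    {"proteinId": "P2", "sequence": "GAT"},
    {"proteinId": "P3", "sequence": "LLS"},
]
protein_term = [
    {"proteinId": "P1", "termId": "S92"},
    {"proteinId": "P2", "termId": "S10"},
    {"proteinId": "P3", "termId": "S92"},
]

# Evaluate the plan bottom-up, mirroring the tree on the slide.
terms = reduce_op(table_scan(protein_term, lambda r: r["termId"] == "S92"), ["proteinId"])
proteins = reduce_op(table_scan(protein), ["proteinId", "sequence"])
joined = hash_join(terms, proteins, "proteinId")
called = op_call(joined, fake_blast, "sequence", "blast")
result = reduce_op(called, ["proteinId", "blast"])
print(result)
```

Filtering on `termId` before the join keeps the hash table small, and only the proteins that survive the join are sent to the expensive operation call.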
74. WP4: Metadata and Ontologies
- Leader: Carole Goble (Manchester)
- Robert Stevens (Manchester)
- One and a half RAs: Phil Lord (since April 02), Chris Wroe, (Angus Roberts)
- The metadata requirements, services and content needed for publication, registration, discovery, matchmaking and deregistration of services
- Activities
- 1. Ontology languages and services
- 2. Resource discovery: WP2, WP6
- 3. Annotation with metadata: WP3, WP6
- RISKS: too little resource!
75. WP4 Progress Jan02-Oct02
- An ontology of services for myGrid 0.0 and 0.1 in DAML+OIL (to be OWL). Available on web and accepted for publication.
- A survey of metadata requirements (not yet consolidated)
- Mapping service between Web Services and Ontology Server
- Simple service finding tool, service matcher
- SOAP Ontology server for OWL
- User requirements and scenario building
- Build environment
- DAML+OIL training for other e-Science projects
- I3C and BioMOBY tracking and participation
76. WP4 Plans
- myGrid 0.1 functionality
- Integrated service publication (with WP2).
- Service discovery and publication by signature, types as well as ontology concepts
- OGSA compliance.
- Metadata requirements: is an ontology really required for service discovery?
- Simple provenance annotation scheme based on COHSE.
- Complete provenance model.
77. WP4 Plans
- Next 4 months
- Integrated service registration (with WP2).
- Types and ontologies reconciliation.
- myGrid object model and BioMOBY objects
- Extension of ontology for demonstrator
- User requirements for Cold Carp demonstrator
- Begin a provenance model.
78. WP4 Issues
- Focused on describing services using ontologies.
- Information model needs attention.
- Haven't looked at metadata other than ontologies.
- Too few people.
79. Suite
Specialises. All concepts are subclassed from those in the more general ontology.
Contributes concepts to form definitions.
Upper level ontology
Publishing ontology
Informatics ontology
Molecular biology ontology
Organisation ontology
Task ontology
Bioinformatics ontology
Web service ontology
80.
1. User selects values from a drop down list to
create a property based description of their
required service. Values are constrained to
provide only sensible alternatives.
2. Once the user has entered a partial
description they submit it for matching. The
results are displayed below.
3. The user adds the operation to the growing
workflow.
4. The workflow specification is complete and
ready to match against those in the workflow
repository.
81. Client framework
myGrid 0.0
Portal
Repository Client
Ontology Client
Workflow Client
Personal Repository
Workflow Repository
(Meta Data) Ontology Server
DAML+OIL Reasoner (FaCT)
(Meta Data) Service Type Directory
Workflow enactment
Matcher and Ranker
Service instance directory
Bioinformatics services
REGISTRY
82. Uses of ontology
- Labelling data items in databases.
- Semantic typing for controlling inputs and outputs of workflows
- Use by distributed query processing.
- Workflow, database classification.
- Linking and browsing XML-based components
- COHSE
- Soft build of portals.
- Link with the Life Science Identifier (I3C)
- BioMOBY Central service classification
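Semantic typing of workflow inputs and outputs can be illustrated with a toy subclass hierarchy: a step may consume the previous step's output only if the produced concept is the required concept or one of its specialisations. This is a sketch under stated assumptions; the concept names and the dictionary-based hierarchy are invented here, whereas the project itself describes services in DAML+OIL and uses a reasoner.

```python
# Hypothetical concept hierarchy: child -> parent.
SUBCLASS_OF = {
    "protein_sequence": "sequence",
    "nucleotide_sequence": "sequence",
    "sequence": "data",
    "blast_report": "report",
    "report": "data",
}


def is_a(concept, ancestor):
    """True if `concept` equals `ancestor` or is a (transitive) subclass of it."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = SUBCLASS_OF.get(concept)
    return False


def check_workflow(steps):
    """Check a linear workflow of (name, required_input, produced_output) steps.

    Returns a list of human-readable type errors; empty means well typed.
    """
    errors = []
    for (prev_name, _, produced), (next_name, required, _) in zip(steps, steps[1:]):
        if not is_a(produced, required):
            errors.append(f"{prev_name} -> {next_name}: {produced} is not a {required}")
    return errors


good = [
    ("fetch", "data", "protein_sequence"),
    ("blast", "sequence", "blast_report"),
    ("display", "report", "data"),
]
bad = [
    ("fetch", "data", "nucleotide_sequence"),
    ("blastp", "protein_sequence", "blast_report"),
]

print(check_workflow(good))
print(check_workflow(bad))
```

The same `is_a` test is what lets a service discovery tool offer only sensible alternatives when the user builds up a workflow step by step.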
83. (some) Registry Issues
- Find services based on name, signature, types, a word (not just using the ontology).
- Registry management: weeding, authorisation, decommissioning.
- Publishing of services. Keeping their descriptions up to date and faithful.
- Alternative descriptions of services.
- Staged descriptions.
- Maintenance and evolution of the ontology
- Multiple registries: personal, local, enterprise
84. WP5: Workflow
- Leader: Brian Warboys (Manchester), Matthew Addis (IT Innovation, Southampton)
- Two RAs: Mark Greenwood (Manchester); Darren Marvin, Justin Ferris (IT Innovation, shared)
- Activities
- 1. Workflow design and discovery: WP4
- 2. Workflow enactment: WP2
- Risk: split over 2 sites
85. WP5 Progress Apr02-Oct02
- Post pre-prototype documentation, testing and support for demos
- Workflows for 0.1 use cases, both based on pre-prototype and EMBOSS/SOAPLab services
- Metadata for finding workflows and finding services for workflows
- Robust enactment engine using web service standards: WSDL, UDDI, WSFL
- Deployable to standard Tomcat / Axis container combination
- EMBOSS Workflow
- Combined two concurrent application flows
- Executing seven applications
- Forty-five web service invocations
- Simple Provenance
- Date and time, actual services used, intermediate data
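The "simple provenance" listed above (date and time, actual services used, intermediate data) can be sketched as a log written by the enactor as it runs each step. This is a minimal sketch, not the WSFL enactment engine: the `enact` function, the step tuples and the two toy services are hypothetical.

```python
from datetime import datetime, timezone


def enact(workflow, data, clock=lambda: datetime.now(timezone.utc)):
    """Run a linear workflow and record simple provenance for each step.

    `workflow` is a list of (step_name, service_name, callable) tuples.
    Returns the final result and the provenance log.
    """
    provenance = []
    for step_name, service_name, service in workflow:
        started = clock()
        output = service(data)
        provenance.append({
            "step": step_name,
            "service": service_name,         # the actual service instance used
            "started": started.isoformat(),  # date and time
            "input": data,                   # intermediate data in
            "output": output,                # intermediate data out
        })
        data = output
    return data, provenance


# Two toy "services" standing in for real web service invocations.
workflow = [
    ("reverse", "local/reverse-v1", lambda s: s[::-1]),
    ("upper", "local/upper-v2", str.upper),
]

result, log = enact(workflow, "acgt")
print(result)
for record in log:
    print(record["step"], record["service"], record["input"], "->", record["output"])
```

Keeping the inputs and outputs of every step is what makes it possible to answer "which BLAST did I use?" later, or to rerun the workflow from an intermediate point.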
86. WP5 Plans
- myGrid 0.1 functionality
- Sample workflows for publicly available myGrid
- Workflow lifecycle: resolution, personalisation, annotation, composition and development
- Workflow requirements in the context of an application (e.g. relationship to MATLAB, Talisman, ...)
- Provenance: are we generating the right workflow provenance, and how could this be used?
- Support for Secure Invocations using HTTPS and SOAP Digital Signatures
- Data staging for performance
- Greater control of provenance
- Personalisation
- Plug-ins to support data processing, user interaction during workflows, integration with other tools
87. WP5 Issues
- Contacts
- no response from Frank Leymann of IBM
- after an initial positive meeting
- relationship with technology-providing partners?
- When to move from WSFL to BPEL4WS (if at all)
- Technology tracking: BioMOBY, OMG LAB, BPEL4WS, WSCI, LSID, DAML-S, OGSA/OGSI, ...
- How to manage our resources effectively: what alliances should we build (e.g. DiscoveryNet)?
88. WP6: e-Science layer
- Leader: Chris Greenhalgh (Nottingham)
- Two people at Nottingham.
- Milena Radenkovic
- Kevin Glover
- Responsible for portal, workbench, collaboration environments, application development and user requirements
- Propose splitting off User Requirements and possibly Application building.
89. WP6: e-Science layer, Apr02-Oct02
- Extensive discussions with other WPs, esp. ontology and metadata
- Major re-factoring of pre-prototype web portal to support multiple types of client
- Common Gateway web service/API
- Re-factored web portal as gateway client
- Sample command line clients
- Gateway job abstraction: transparent direct invocation of web services as well as workflows
- Common build/deployment environment
90. WP6 Plans
- myGrid 1.0 functionality
- Gateway personalisation: enhanced user agent, simple workflow customisation, presentation
- Gateway provenance: multiple metadata sources, activity logging, template workflow generation
- Direct use of ontology service/facilities
- Enhanced web portal
- Personalisation, arbitrary metadata, helper applications
- Other sample clients: command line, GUI application?
- Next 4 Months
- More generic gateway web service/API
- Enhanced web portal
- New implementation technology??
- Structured browsing, notification, complex operations
- Resume requirements gathering on collaboration support
91. WP7: Text Services
- Leader: Rob Gaizauskas (Sheffield)
- Two RAs: Neil Davis, ???
- Aim: to provide novel text access capabilities to biological science researchers
- Data will be mined/extracted from text collections
- Entity identification (e.g. proteins, residues, species, etc.)
- Attribute extraction (e.g. residue function)
- Relation extraction (e.g. interactions, pathways)
- Extracted data will be provided
- Via the MyGrid portal and via web services
92. WP7: Starting Point in myGrid
- Prototype PASTA (Protein Active Site Template Acquisition) System
- Terminology recognition/classification (e.g. proteins, genes, species, ...)
- Attribute extraction (e.g. residue function)
- Relation extraction (e.g. in_protein(residue, protein))
- Trialled on small Medline corpora (2000 texts)
- PASTAWeb: browser-based interface to extraction results
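To make the in_protein(residue, protein) relation concrete, here is a deliberately naive pattern-based sketch. It is not how PASTA works (PASTA is a full information extraction pipeline); the regular expression, the single-token protein assumption and the example sentence are all invented for illustration.

```python
import re

# Three-letter amino acid codes followed by a position, e.g. "Ser195" or "His-57".
RESIDUE = (r"(?:Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|"
           r"Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val)-?\d+")

# Assumption: the protein is the single token following "of" or "in".
IN_PROTEIN = re.compile(
    rf"\b(?P<residue>{RESIDUE})\s+(?:of|in)\s+(?P<protein>[A-Za-z][\w-]*)"
)


def extract_in_protein(text):
    """Return (residue, protein) pairs, i.e. in_protein(residue, protein) facts."""
    return [(m.group("residue"), m.group("protein")) for m in IN_PROTEIN.finditer(text)]


sentence = ("Ser195 of chymotrypsin attacks the substrate, "
            "while His-57 in trypsin acts as a general base.")
facts = extract_in_protein(sentence)
for residue, protein in facts:
    print(f"in_protein({residue}, {protein})")
```

A pattern like this misses multi-word protein names and anything phrased differently, which is the gap that terminology recognition is meant to fill.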
93. WP7: Progress to Date (1)
- Relational database server for PASTA results
- Previously, extracted data held in flat text files indexed via Perl hashes: non-scalable, limited querying
- Relational tables for extracted results have been defined, implemented and tested
- Mapping procedures to map PASTA output into RDBMS
- Web services (SOAP) interface to PASTA results RDBMS
- Revised web interface to PASTA results
- Using new RDBMS
- Taking into account feedback from biologists
94. WP7: Progress to Date (2)
- Resource acquisition
- UMLS and GO acquired and installed
- Negotiated a full copy of MEDLINE, with access rights for MyGrid partners via web services, to arrive December 02
- Baseline PASTA system (nearly) integrated in GATE-II text engineering architecture
95. WP7: Activities Underway
- Work begun on significant revision/extension to terminology acquisition/management/recognition
- Redesigning lexical databases to include synonym/term variant information
- Investigating automatic term acquisition algorithms
- The views of biological scientists being sought
- To ensure information being extracted is useful and relevant
- To extend system to new domains
- To refine interface/searching capabilities over extracted data
- To elicit novel text-related requirements
96. WP7: Technical Issues
- Text extraction is currently a slow procedure: not feasible in real time
- Re-indexing the data post text extraction increases it by roughly a factor of 10
- It is proposed to pre-process all (or at least a sizeable fraction) of MEDLINE, which will be computationally very intensive
97. WP7: Text Services and myGrid
- How text services will be integrated in myGrid is still not clear
- Text services could be integrated at the simplest level as another web service
- A more ambitious idea is an ambient text system, where potential search terms are gleaned from the workflow to silently provide a library of useful texts on the user's desktop
98. Extra Work Packages
- WP9: User Requirements
- Robert Stevens, Anil Wipat, Peter Li, EBI, Phil Lord
- Scenarios, interviews, web pages
- Issue: getting GSK/AZ/Merck requirements
- Issue: getting user requirements into the other work packages
- Solution: IF-3 demonstrator.
99. Extra Work Packages
- WP10: Application
- Part of WP6, but WP6 doesn't have any biologists.
- Suggestion: New WP responsible for producing the application on top of the workbench.
- IF-3 demonstrator for Cold Carp.
100. Extra Work Packages
- WP11: Provenance
- Hidden in WP4 but pervades the whole of myGrid
- Provenance model
- Simple provenance demonstrators using annotation.
101. Top 10 thoughts
- Application driven by use cases.
- Open Source.
- Data object types, APIs, protocols, ontologies have a longer life span than s/w.
- Components are useful: you don't have to buy into the whole shooting match.
- Don't reinvent the wheel.
- Get others to build services / applications.
- Lower barriers of entry.
- Keep it simple.
- It's distributed and global.
- One solution won't work.