A myGrid Project Tutorial

1
A myGrid Project Tutorial
  • Dr Mark Greenwood
  • University of Manchester

With considerable help from Justin Ferris, Peter
Li, Phil Lord, Chris Wroe, Carole Goble and the
rest of the myGrid team.
2
  • Open Source Upper Middleware for Bioinformatics
  • (Web) Service-based architecture
  • Targeted at Tool Developers, Bioinformaticians
    and Service Providers

Newcastle
Sheffield
Manchester
Nottingham
Hinxton
Southampton
3
myGrid People
  • Core
  • Matthew Addis, Nedim Alpdemir, Tim Carver, Rich
    Cawley, Neil Davis, Alvaro Fernandes, Justin
    Ferris, Robert Gaizauskas, Kevin Glover, Carole
    Goble, Chris Greenhalgh, Mark Greenwood, Yikun
    Guo, Ananth Krishna, Peter Li, Phillip Lord,
    Darren Marvin, Simon Miles, Luc Moreau, Arijit
    Mukherjee, Tom Oinn, Juri Papay, Savas
    Parastatidis, Norman Paton, Terry Payne, Matthew
    Pocock, Milena Radenkovic, Stefan
    Rennick-Egglestone, Peter Rice, Martin Senger,
    Nick Sharman, Robert Stevens, Victor Tan, Anil
    Wipat, Paul Watson and Chris Wroe.
  • Users
  • Simon Pearce and Claire Jennings, Institute of
    Human Genetics, School of Clinical Medical
    Sciences, University of Newcastle, UK
  • Hannah Tipney, May Tassabehji, Andy Brass, St
    Mary's Hospital, Manchester, UK
  • Postgraduates
  • Martin Szomszor, Duncan Hull, Jun Zhao, Pinar
    Alper, John Dickman, Keith Flanagan, Antoon
    Goderis, Tracy Craddock, Alastair Hampshire
  • Industrial
  • Dennis Quan, Sean Martin, Michael Niemi, Syd
    Chapman (IBM)
  • Robin McEntire (GSK)
  • Collaborators
  • Keith Decker

4
Roadmap - start
services
data
5
Philosophy
  • Openness
  • open source
  • open world of services
  • open to wider eScience context
  • open to user feedback
  • open to third party metadata
  • Collection of components for assembly
  • Pick and mix

6
Tenet I
  • High level Middleware services for data intensive
    resource interoperation for Bioinformatics
  • Information Grid not computational Grid
  • Exploratory, ad hoc
  • For individuals
  • In silico experiment as workflow
  • Distributed query processing
  • Information Management

7
Tenet II
  • High level services for e-Science experimental
    management
  • Provenance
  • Event notification
  • Personalisation
  • Sharing knowledge and sharing components
  • Scientific discovery is personal and global.
  • Federated third party registries for workflows
    and services
  • Workflow and service discovery for reuse and
    repurposing

Find
Registry
Annotate
Register
8
Tenet III
  • Open Source and Open Services
  • No control or influence over service providers
  • Open to third party metadata and services
  • Open extensible architecture
  • Assemble your own components
  • Designed to work together
  • Toolkit

9
Tenet IV
  • (Web) Service architecture
  • Publication, discovery, interoperation,
    composition, decommissioning of myGrid services
  • WS-I -> OGSA / WSRF
  • Metadata driven
  • Ontologies
  • Common information model
  • Semantic Web technologies
  • RDF, OWL

10
Tenet V
  • Middleware for
  • Tool Developers
  • Bioinformaticians
  • Service Providers
  • Biologists are indirectly supported by the
    portals and apps that these groups develop.

11
Roadmap
services and workflows
discover services and workflows
run workflows
data
data management
12
Data-intensive bioinformatics
ID   MURA_BACSU STANDARD PRT 429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
DE   (EC 2.5.1.7) (ENOYLPYRUVATE TRANSFERASE)
DE   (UDP-N-ACETYLGLUCOSAMINE ENOLPYRUVYL TRANSFERASE) (EPT).
GN   MURA OR MURZ.
OS   BACILLUS SUBTILIS.
OC   BACTERIA FIRMICUTES BACILLUS/CLOSTRIDIUM GROUP BACILLACEAE
OC   BACILLUS.
KW   PEPTIDOGLYCAN SYNTHESIS CELL WALL TRANSFERASE.
FT   ACT_SITE 116 116 BINDS PEP (BY SIMILARITY).
FT   CONFLICT 374 374 S -> A (IN REF. 3).
SQ   SEQUENCE 429 AA 46016 MW 02018C5C CRC32
     MEKLNIAGGD SLNGTVHISG AKNSAVALIP ATILANSEVT IEGLPEISDI
     ETLRDLLKEI GGNVHFENGE MVVDPTSMIS MPLPNGKVKK LRASYYLMGA
     MLGRFKQAVI GLPGGCHLGP RPIDQHIKGF EALGAEVTNE QGAIYLRAER
     LRGARIYLDV VSVGATINIM LAAVLAEGKT IIENAAKEPE IIDVATLLTS
     MGAKIKGAGT NVIRIDGVKE LHGCKHTIIP DRIEAGTFMI
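Entries of this kind are line-coded flat files: each record starts with a two-letter code (ID, DE, GN, OS, ...) and the sequence follows the SQ line. The Python sketch below shows one way such text might be split into fields. It assumes one record per line and indented sequence lines, uses only a few lines of the slide's entry as sample data, and the function name `parse_entry` is hypothetical. A production pipeline would use an established parser rather than this sketch.

```python
from collections import defaultdict

# A few lines of the slide's entry, one record per line
# (punctuation kept as it appears on the slide).
SAMPLE = """\
ID   MURA_BACSU STANDARD PRT 429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
DE   (EC 2.5.1.7) (ENOYLPYRUVATE TRANSFERASE)
GN   MURA OR MURZ.
OS   BACILLUS SUBTILIS.
FT   ACT_SITE 116 116 BINDS PEP (BY SIMILARITY).
SQ   SEQUENCE 429 AA 46016 MW 02018C5C CRC32
     MEKLNIAGGD SLNGTVHISG AKNSAVALIP
"""


def parse_entry(text):
    """Group lines by two-letter code; indented sequence lines go under 'SEQ'."""
    fields = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.startswith(" "):
            # Sequence data: drop the spaces between blocks of residues.
            fields["SEQ"].append(line.replace(" ", ""))
        else:
            code, _, rest = line.partition(" ")
            fields[code].append(rest.strip())
    # Repeated codes (e.g. DE) are joined with a space; sequence chunks are concatenated.
    return {
        code: "".join(parts) if code == "SEQ" else " ".join(parts)
        for code, parts in fields.items()
    }


entry = parse_entry(SAMPLE)
print(entry["ID"].split()[0])  # entry name
print(entry["OS"])             # organism
print(entry["SEQ"][:10])       # first ten residues
```

Once the fields are separated like this, a workflow step can pick out just the part it needs (for example the sequence) instead of a person cutting and pasting it by hand.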
13
Use Scenarios
  • Graves' Disease
  • Autoimmune disease of the thyroid
  • Simon Pearce and Claire Jennings, Institute of
    Human Genetics, School of Clinical Medical
    Sciences, University of Newcastle
  • Discover all you can about a gene
  • Annotation pipelines and Gene expression analysis
  • Services from Japan, Hong Kong, various sites in
    UK
  • Williams-Beuren Syndrome
  • Microdeletion of 1.55 Mbases on Chromosome 7
  • Hannah Tipney, May Tassabehji, Andy Brass, St
    Mary's Hospital, Manchester, UK
  • Characterise an unknown gene
  • Annotation pipelines and Gene expression analysis
  • Services from USA, Japan, various sites in UK

14
Manually filling a genomic gap
  • Two major steps
  • Extend into the gap: similarity searches
    (RepeatMasker, BLAST)
  • Characterise the new sequence: NIX, Interpro,
    etc.
  • Numerous web-based services (e.g. BLAST,
    RepeatMasker)
  • Cutting and pasting between screens
  • Large number of steps
  • Frequently repeated: info is now rapidly added to
    public databases
  • Don't always get results
  • Time consuming
  • Huge amount of interrelated data is produced,
    handled in lab book and files saved to local hard
    drive
  • Mundane
  • Much knowledge remains undocumented
  • Bioinformatician does the analysis
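The manual process above can be pictured as a chain of steps in which each output is pasted into the next tool. The Python sketch below illustrates the workflow idea under simple assumptions: each step is an ordinary function, and the runner records every intermediate value in place of lab-book notes. The step functions (`clean`, `mask_poly_a`, `gc_content`) are hypothetical local stand-ins, not the real RepeatMasker or BLAST services, and this is not how the myGrid enactor is implemented.

```python
import re


def run_workflow(steps, data):
    """Run named steps in order, feeding each output into the next step.

    Returns the final result and a log of every intermediate value.
    """
    log = []
    for name, step in steps:
        result = step(data)
        log.append({"step": name, "input": data, "output": result})
        data = result
    return data, log


# Hypothetical stand-ins; in practice each step would call a remote service.
def clean(seq):
    """Remove whitespace and upper-case the sequence."""
    return "".join(seq.split()).upper()


def mask_poly_a(seq):
    """Crude stand-in for repeat masking: replace runs of six or more A with N."""
    return re.sub(r"A{6,}", lambda m: "N" * len(m.group()), seq)


def gc_content(seq):
    """Fraction of unmasked bases that are G or C, rounded to two places."""
    bases = [b for b in seq if b in "ACGT"]
    return round(sum(b in "GC" for b in bases) / len(bases), 2)


steps = [("clean", clean), ("mask_poly_a", mask_poly_a), ("gc_content", gc_content)]
result, log = run_workflow(steps, "acgt aaaaaaaa ggcc")

for entry in log:
    print(entry["step"], "->", entry["output"])
```

Because the step list is plain data, the same analysis can be rerun on a new sequence without repeating the cut-and-paste work, and the log keeps the intermediate results that would otherwise go undocumented.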

15
WBS Workflows
(Workflow diagram; labels recovered from the slide)
  • Colour key
  • Pink: outputs/inputs of a service
  • Purple: tailor-made services
  • Green: Emboss soaplab services
  • Yellow: Manchester soaplab services
  • Grey: unknowns
  • Data items
  • Query nucleotide sequence
  • GenBank Accession No
  • URL inc GB identifier
  • GenBank Entry
  • Nucleotide seq (Fasta)
  • Amino Acid translation
  • ORFs, 6 ORFs
  • Coding sequence
  • Services and annotations
  • ncbiBlastWrapper: Blastn vs nr, est databases
  • ncbiBlastWrapper: tblastn vs nr, est, est_mouse,
    est_human databases; Blastp vs nr
  • RepeatMasker: repetitive elements
  • GenScan: coding sequence
  • prettyseq: translation/sequence file, good for
    records and publications
  • epestfind: identifies PEST seq
  • pscan: identifies FingerPRINTS
  • pepstats: MW, length, charge, pI, etc
  • pepcoil: predicts coiled-coil regions
  • restrict: restriction enzyme map
  • cpgreport: CpG Island locations and
  • SignalP, TargetP, PSORTII: predicts cellular
    location
  • InterPro, PFAM, Prosite, Smart: identifies
    functional and structural domains/motifs
  • Pepwindow? Octanol?: hydrophobic regions
  • Other labels: Seqret, sixpack, transeq, Sort for
    appropriate Sequences only
16
Graves' Disease Bioinformatics
Peter Li (1), Claire Jennings (2), Simon Pearce (2)
and Anil Wipat (1) (2003). (1) School of Computing
Science and (2) Institute of Human Genetics,
University of Newcastle-upon-Tyne.
  • Candidate gene pool
  • Annotation Pipeline: What is known about my
    candidate gene?
  • 3D Protein Structure: What is the structure of
    the protein product encoded by my candidate gene?
  • Genotype Assay Design System: Is this SNP present
    in my samples?
  • Primer Design: Emboss Eprimer application in
    SoapLab
  • Use primers designed by myGrid to amplify region
    flanking SNP on the gene
  • Restriction Fragment Length Polymorphism
    experiment
  • Selection of restriction enzyme: Emboss Restrict
    in SoapLab
  • Other diagram labels: Medline, Gene ID, GO, EMBL,
    OMIM, BLAST, SNP, Query, Talisman, DQP
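The enzyme-selection step for a Restriction Fragment Length Polymorphism assay can be illustrated in a few lines of Python. The sketch assumes the task is to find an enzyme whose recognition site is present in one allele of the SNP and absent in the other, so that the two alleles give different fragment patterns. The two flanking sequences are invented for illustration, the function names are hypothetical, and the slide's actual service (Emboss Restrict in SoapLab) works from a full enzyme database rather than the three common enzymes listed here.

```python
# Recognition sites for three widely used enzymes. All three sites are
# palindromic, so scanning one strand is sufficient for this sketch.
ENZYMES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def count_sites(seq, site):
    """Count (possibly overlapping) occurrences of site in seq."""
    return sum(seq.startswith(site, i) for i in range(len(seq) - len(site) + 1))


def distinguishing_enzymes(ref_seq, alt_seq, enzymes=ENZYMES):
    """Return enzymes whose number of cut sites differs between the two alleles."""
    hits = {}
    for name, site in enzymes.items():
        ref_n = count_sites(ref_seq, site)
        alt_n = count_sites(alt_seq, site)
        if ref_n != alt_n:
            hits[name] = (ref_n, alt_n)
    return hits


# Invented example: the SNP (A -> G at the eighth base) destroys an EcoRI site.
reference = "TTACGGAATTCAGGT"
variant = "TTACGGAGTTCAGGT"

print(distinguishing_enzymes(reference, variant))
```

An enzyme reported here would cut the amplified region of one allele but not the other, which is what makes the genotype readable on a gel.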
17
Experiment life cycle
Forming experiments
Personalisation
Discovering and reusing experiments and resources
Executing and monitoring experiments
Managing lifecycle, provenance and results of
experiments
Sharing services and experiments
18
(e-)Scientists
  • Experiment
  • Can workflow be used as an experimental method?
  • How many times has this experiment been run?
  • Analyze
  • How do we manage the results to draw conclusions
    from them?
  • How reliable are these results?
  • Collaborate
  • Can we share workflows, results, metadata etc?
  • Publish
  • Can we link to these workflows and results from
    our papers?
  • Review
  • Can I find, comprehend and review your work?
  • How was that result derived?
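The question "How was that result derived?" is what provenance records are meant to answer. As a minimal sketch, assuming each data item stores the service that produced it and the items it was derived from, a derivation trace is a walk back through those links. The `Record` structure and the item names below are hypothetical and are not the myGrid provenance model.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One data item plus how it was produced (hypothetical structure)."""
    item_id: str
    produced_by: str = "user input"
    inputs: list = field(default_factory=list)


def lineage(item_id, store, depth=0):
    """Return indented lines tracing item_id back to its original inputs."""
    rec = store[item_id]
    lines = ["  " * depth + f"{rec.item_id} <- {rec.produced_by}"]
    for parent in rec.inputs:
        lines.extend(lineage(parent, store, depth + 1))
    return lines


# Illustrative records for a two-step analysis.
store = {
    r.item_id: r
    for r in [
        Record("query_seq"),
        Record("masked_seq", "RepeatMasker", ["query_seq"]),
        Record("blast_report", "BLAST", ["masked_seq"]),
    ]
}

print("\n".join(lineage("blast_report", store)))
```

The same records could also support the other questions on the slide, such as counting how many times an experiment has been run, by querying the store instead of walking a single chain.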

19
Collections of Tasks
  • Roles: Service Providers, Bioinformaticians,
    Scientists, Annotation providers
  • Tasks: Building, Domain Tasks, Workflow,
    Enactment, Storage, Description, Service
    Discovery, Provenance, Data Management, Finding,
    Querying
20
(Architecture diagram; labels recovered from the slide)
  • People: Bioinformaticians, Scientists, Service
    Providers, Annotation providers
  • Components: Registry, Taverna WF Builder,
    Discovery View, FreeFluo Enactor, Pedro
    Annotation tool, mIR, Haystack Provenance
    Browser, Ontology Store, Soaplab, Others
  • Descriptions: Interface Description, WSDL,
    Vocabulary, Data descriptions,
    Annotation/description
  • Activities: Querying/sharing/federating/
    registering, Query, Retrieve, Workflow
    Execution, invoking, Store data/knowledge
21
myGrid Service Stack
  • Applications: Work bench, Taverna, Talisman, Web
    Portal
  • Gateway
  • Personalisation, Service and Workflow Discovery,
    Registries, Provenance, Event Notification,
    Ontology Mgt, Ontologies, Metadata Mgt, Views
  • Core services: myGrid Information Repository,
    FreeFluo Workflow Enactment Engine, OGSA-DQP
    Distributed Query Processor
  • Web Service (Grid Service) communication fabric
  • External services: AMBIT Text Extraction Service,
    Native Web Services, SoapLab, GowLab, Legacy apps
22
Two Paths
  • Innovative work
  • Service and workflow registration
  • Semantic discovery
  • Provenance management
  • Text mining
  • Core functionality
  • Services: Soaplab and Gowlab
  • Workflow enactment engine: Freefluo
  • Workflow workbench: Taverna
  • Data integration: OGSA-DQP
  • Information model management
  • In between
  • Event notification
  • Gateway

25
Run the Workflow
Viewing intermediate results
26
Run the Workflow
27
Drilling Down myGrid and Semantics
  • Workflow and service discovery
  • Prior to and during enactment
  • Semantic registration
  • Workflow assembly
  • Semantic service typing of inputs and outputs
  • Provenance of workflows and other entities
  • Experimental metadata glue
  • Use of RDF, RDFS, DAML+OIL/OWL
  • Instance store, ontology server, reasoner
  • Materialised vs at point of delivery reasoning.
  • myGrid Information Model
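Semantic typing of service inputs and outputs can be sketched with a toy ontology. The Python example below assumes each data type has at most one parent, and a service is treated as a match when its declared input type is the same as, or more general than, the type of the data in hand. The ontology terms, service names and functions are all hypothetical, and the simple parent walk stands in for the ontology server and reasoner named on the slide.

```python
# Toy ontology: each term maps to its parent (None marks a root term).
PARENT = {
    "sequence": None,
    "nucleotide_sequence": "sequence",
    "protein_sequence": "sequence",
    "report": None,
}

# Hypothetical registry entries: service name -> (input type, output type).
SERVICES = {
    "any_sequence_formatter": ("sequence", "sequence"),
    "translate": ("nucleotide_sequence", "protein_sequence"),
    "protein_stats": ("protein_sequence", "report"),
}


def is_a(term, ancestor):
    """True if term equals ancestor or is one of its descendants."""
    while term is not None:
        if term == ancestor:
            return True
        term = PARENT[term]
    return False


def services_accepting(data_type):
    """Names of services whose declared input type subsumes data_type."""
    return sorted(
        name for name, (inp, _out) in SERVICES.items() if is_a(data_type, inp)
    )


print(services_accepting("protein_sequence"))
print(services_accepting("nucleotide_sequence"))
```

A purely syntactic match on the string "protein_sequence" would miss the general-purpose formatter; walking up the ontology is what lets discovery return services registered against broader types.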

28
Semantic Discovery
  • Pedro data capture tool

Drag a workflow entry into the explorer pane and
the workflow loads. Drag a service/workflow to
the scavenger window for inclusion into the
workflow.
  • View annotations on workflow

29
Tutorial focus
  • Innovative work
  • Service and workflow registration
  • Semantic discovery
  • Provenance management
  • Text mining
  • Core functionality
  • Services: Soaplab and Gowlab
  • Workflow enactment engine: Freefluo
  • Workflow workbench: Taverna
  • Data integration: OGSA-DQP
  • Information model management
  • In between
  • Event notification
  • Gateway

30
Roadmap
services
Registry
1. Describe services and workflows
2. Discover services and workflows
3. Write and run workflows
Taverna workbench
LSID authorities
data
4. Provenance and data management
31
Sessions on Details
  • Workflows - hands on with Taverna
  • Semantics
  • Timetable: split sessions
  • Session 1
  • Group 1 hands on (Swanson)
  • Group 2 semantics (Newhaven)
  • Tea break (short)
  • Session 2
  • Group 1 semantics (Newhaven)
  • Group 2 hands on (Swanson)
  • Discussions and Conclusions

32
Questions?
http://www.mygrid.org.uk
http://taverna.sf.net
http://freefluo.sf.net/