Title: ISPIDER
1ISPIDER A Pilot Grid for Integrative Proteomics
- BEP-II grantholders meeting,
- Edinburgh 24th Nov 2004
2Diversity of proteome data
- Gels
- Sequences, e.g.
  >A01562 MAPKATYLIGAADKFHW
  >A01567 MAQQPKEMLNILADKFHWFLYC
- Other data: species, PTMs, pathways, functional annotation, transcriptome data
- Structures/folds
- Mass spec
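As a small illustration of handling the sequence data on this slide, the sketch below reads FASTA-style records into a dictionary. It assumes the two entries are FASTA records with accessions A01562 and A01567 (the `>` header marker and the header-on-its-own-line layout are assumptions), and `parse_fasta` is a hypothetical helper, not part of any ISPIDER component.

```python
from typing import Dict, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-style text into {accession: sequence}."""
    records: Dict[str, str] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: the first token after '>' is taken as the accession.
            parts = line[1:].split()
            current = parts[0] if parts else ""
            records[current] = ""
        elif current is not None:
            # Sequence lines may wrap over several lines; join them.
            records[current] += line
    return records


# Example input, assumed layout (sequence text taken from the slide).
example = """\
>A01562
MAPKATYLIGAADKFHW
>A01567
MAQQPKEMLNILADKF
HWFLYC
"""

sequences = parse_fasta(example)
```

Wrapped sequence lines are joined, so the second record is stored as one continuous string.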
3Integration problems
- Lack of specific middleware
- Existing resources not wrapped
- Lack of data standards
- Standards for proteomics, incl. MS and protein identification, are emerging
- Data not modelled
- New challenges from proteomics
- Data not captured/modelled
- Data not captured
- No mature repositories/databases for some proteome data
- But there is lots of data
4Aims
- To develop an integrated platform of proteomic data resources enabled as Grid/Web services
- Integrate existing proteome resources, enabling them as Grid/Web services
- To develop novel, proteome-specific databases as part of ISPIDER, delivered as Grid/Web and browser-based services
  - A repository for experimental proteome data
  - A proteome protein identification server and database
  - A phosphoproteome-specific database
- To develop middleware support for distributed querying, workflows and other integrated data analysis tasks
- Demonstrate the effectiveness of the resulting infrastructure through studies in proteomics, including
  - Visualisation clients for proteomic data, e.g. LRF data
  - Analyses for fungal species of industrial interest
  - Protein structural/functional trends in experimental proteomics, e.g. linking domain structural patterns
5Integrated Proteomics Informatics Platform - Architecture
[Architecture diagram; labels as they appear on the slide]
- ISPIDER Proteomics Clients: Vanilla Query Client, PPI Validation Analysis Client, Protein ID Client
- Web services
- ISPIDER Proteomics Grid Infrastructure
- Existing E-Science Infrastructure
- Public Proteomic Resources
- Existing Resources
- ISPIDER Resources
- Work packages marked on the diagram: WP1, WP2, WP3, WP4, WP5, WP6
KEY: WS = Web services, GS = Genome sequence, TR = transcriptomic data, PS = protein structure, PF = protein family, FA = functional annotation, PPI = protein-protein interaction data, WP = Work Package
6Work packages
- WP1 - A Skeleton Integrated Proteomics Grid
- WP2 - Integration of gel-based data with structural and functional annotation
- WP3 - Data mining tools for the phosphoproteome
- WP4 - Structural and functional proteomics for the Aspergilli
- WP5 - Integration of protein-protein interaction data with structural and functional annotations
- WP6 - A protein identification server and database
7Personnel
- RA1: Manchester, Khalid Belhajjame (WP1, WP2, WP4, WP6)
- RA2: Manchester, Jennifer Siepen (WP3, WP4, WP6)
- RA3: UCL, TBA (WP1, WP2, WP3)
- RA4: Birkbeck, Lucas Zamboulis / Hao Fan (WP1, WP2, WP5)
- RA5: EBI, Nishia Vinod (WP1, WP2, WP3, WP4, WP5, WP6)
- RA6: EBI, TBA (WP1, WP2, WP3, WP4, WP5, WP6)
8Deliverables
[Table with columns: Deliverable, Primary RA, Also involved. The assignment of RA1 to RA6 to individual deliverables is not recoverable from the slide text.]
- PRIDE db
- Protein ID server
- Phosphoproteome db
- Extended isoform model
- Integrated generic workflows/DQP/etc
- 2D-DAS clients
- Grid wrapped BIOMAP
- Integrated Protein-protein workflows
9Existing infrastructure and skills
- myGRID
- OGSA-DQP
- AutoMed
- PSI/Pedro infrastructure/standards
- Protein id tools at Manchester
- 3 primary data integration strategies
- Workflows
- DQP using OGSA-DAI
- Heterogeneous schema integration technologies
10Workflow Components
- Freefluo: workflow engine to run workflows
- Scufl: Simple Conceptual Unified Flow Language
- Taverna: writing and running workflows, examining results
- SOAPLAB: makes applications available
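To make the idea of a workflow engine concrete, here is a minimal sketch of running steps in dependency order. It is not Scufl, Freefluo or Taverna code; the `run_workflow` function, the step names and the three-step example are all hypothetical, and the ordering relies on Python's standard `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Tuple

# Each step: name -> (function, list of upstream step names).
Step = Tuple[Callable[..., object], List[str]]


def run_workflow(steps: Dict[str, Step]) -> Dict[str, object]:
    """Run every step once, after all of its upstream steps have finished."""
    # Map each step to the set of steps it depends on.
    graph = {name: set(deps) for name, (_, deps) in steps.items()}
    results: Dict[str, object] = {}
    for name in TopologicalSorter(graph).static_order():
        func, deps = steps[name]
        # Pass upstream results as positional arguments, in declared order.
        results[name] = func(*(results[d] for d in deps))
    return results


# Hypothetical three-step workflow: fetch a sequence, measure it, report.
workflow: Dict[str, Step] = {
    "fetch": (lambda: "MAPKATYLIGAADKFHW", []),
    "length": (lambda seq: len(seq), ["fetch"]),
    "report": (lambda seq, n: f"{seq[:5]}... ({n} residues)", ["fetch", "length"]),
}

results = run_workflow(workflow)
```

Declaring dependencies by name keeps the workflow description separate from the engine that executes it, which is the division of labour the components above suggest.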
11OGSA-DQP
- Used in Graves Disease
- Uses OGSA-DAI data access services to access individual data resources.
- A single query to access and join data from more than one OGSA-DAI wrapped data resource.
- Supports orchestration of computational as well as data access services.
- Interactive interface for integrating resources and executing requests.
- Implicit, pipelined and partitioned parallelism and optimisation
http://www.ogsa-dai.org.uk/dqp
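The "single query across more than one data resource" idea can be sketched with a local analogy. The example below does not use OGSA-DQP or OGSA-DAI interfaces; it assumes two separate SQLite in-memory databases standing in for two wrapped resources, and all table names, column names and rows are made up for illustration.

```python
import sqlite3

# Two separate in-memory databases stand in for two wrapped data resources.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS gels")
conn.execute("ATTACH DATABASE ':memory:' AS annotations")

conn.execute("CREATE TABLE gels.spot (spot_id TEXT, accession TEXT)")
conn.execute("CREATE TABLE annotations.protein (accession TEXT, description TEXT)")

# Made-up rows, for illustration only.
conn.executemany(
    "INSERT INTO gels.spot VALUES (?, ?)",
    [("spot-1", "A01562"), ("spot-2", "A01567"), ("spot-3", "A09999")],
)
conn.executemany(
    "INSERT INTO annotations.protein VALUES (?, ?)",
    [("A01562", "example protein 1"), ("A01567", "example protein 2")],
)

# One query joins data held in both resources.
rows = conn.execute(
    """
    SELECT s.spot_id, s.accession, p.description
    FROM gels.spot AS s
    JOIN annotations.protein AS p ON p.accession = s.accession
    ORDER BY s.spot_id
    """
).fetchall()
conn.close()
```

In a distributed setting the two tables would sit behind separate services and the query processor would decide where each part of the join runs; here everything is local, so only the shape of the query carries over.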
12AutoMed infrastructure
- Bidirectional mappings between schemas
- Available in global and local views
- Transformations between schemas
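A bidirectional mapping between schemas can be pictured as a sequence of reversible steps. The sketch below is not the AutoMed API; it assumes a mapping built only from field renames, and the `Rename` class, the field names and the example record are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Rename:
    """A reversible step: rename one field of a record."""
    old: str
    new: str

    def apply(self, record: Dict[str, object]) -> Dict[str, object]:
        return {(self.new if k == self.old else k): v for k, v in record.items()}

    def reverse(self) -> "Rename":
        return Rename(old=self.new, new=self.old)


def transform(record: Dict[str, object], pathway: List[Rename]) -> Dict[str, object]:
    """Apply each step of the pathway in order."""
    for step in pathway:
        record = step.apply(record)
    return record


def reverse_pathway(pathway: List[Rename]) -> List[Rename]:
    # Undo the steps in the opposite order, each one inverted.
    return [step.reverse() for step in reversed(pathway)]


# Hypothetical mapping from a local schema to a global one.
local_to_global = [Rename("acc", "accession"), Rename("seq", "sequence")]
local_record = {"acc": "A01562", "seq": "MAPKATYLIGAADKFHW"}

global_record = transform(local_record, local_to_global)
round_trip = transform(global_record, reverse_pathway(local_to_global))
```

Because every step carries its own inverse, the global-to-local direction is derived from the local-to-global pathway rather than written separately.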
13Potential clients and outputs
- Markup with
- Identified peptides
- Across different tissues
- Different species
- PTMs
- etc
14 2D gel visualisation client
Potential annotations
- Comparative proteomics
- Real vs virtual
- Add/subtract PTMs
- Display pathways
- Functional annotation
- PPIs
- Folds
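One way to picture "add/subtract PTMs" on a virtual gel is as a shift in predicted mass. The sketch below assumes the client would predict a peptide's monoisotopic mass from its sequence and add a fixed mass per phosphorylation; the slide does not say how the client would do this, and `peptide_mass` is a hypothetical helper. The residue masses are standard monoisotopic values.

```python
# Monoisotopic amino acid residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056    # added once per peptide for the free termini
PHOSPHO = 79.96633  # mass added by one phosphorylation (HPO3)


def peptide_mass(sequence: str, phospho_sites: int = 0) -> float:
    """Monoisotopic mass of a peptide, plus optional phosphorylations."""
    backbone = sum(RESIDUE_MASS[aa] for aa in sequence)
    return backbone + WATER + phospho_sites * PHOSPHO


# Sequence text taken from the slide deck; one phosphorylation is assumed.
seq = "MAPKATYLIGAADKFHW"
unmodified = peptide_mass(seq)
phosphorylated = peptide_mass(seq, phospho_sites=1)
shift = phosphorylated - unmodified
```

Subtracting a PTM is the same calculation with the modification count reduced, so a client could toggle annotations without recomputing the backbone mass.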
15Summary
- in silico Proteome Integrated Data Resource
Environment
- Alex Poulovassilis
- Nigel Martin
- Lucas Zamboulis
- Hao Fan
- Simon Hubbard
- Suzanne Embury
- Steve Oliver
- Norman Paton
- Carole Goble
- Robert Stevens
- Jennifer Siepen
- Khalid Belhajjame
- Rolf Apweiler
- Weimin Zhu
- Henning Hermjakob
- Chris Taylor
- Nishia Vinod
- TBA
- David Jones
- Christine Orengo
- TBA