Title: The Cancer Biomedical Informatics Grid
1 The Cancer Biomedical Informatics Grid (caBIG)
Overview of the Integrative Cancer Research Workspace
- Baris Ethem Suzek
- Georgetown University - Lombardi Cancer Center
PIR - bes23_at_georgetown.edu
2 Agenda
- Mission and goals
- Overview of Year 1 and Year 2 Projects
- Data Resources
- Analytical tools
- ICR Project Development
- Example usage scenarios
- Year 3
- Questions
3 Domain and Cross Cutting Workspaces
DOMAIN WORKSPACE 1: Clinical Trial Management Systems
Addresses the need for consistent, open and comprehensive tools for clinical trials management.
DOMAIN WORKSPACE 2: Integrative Cancer Research
Provides tools and systems to enable integration and sharing of information.
DOMAIN WORKSPACE 3: Tissue Banks and Pathology Tools
Provides for the integration, development, and implementation of tissue and pathology tools.
DOMAIN WORKSPACE 4: Imaging
Provides for the sharing and analysis of in vivo imaging data.
CROSS CUTTING WORKSPACE 1: Vocabularies and Common Data Elements
Responsible for evaluating, developing, and integrating systems for vocabulary and ontology content, standards, and software systems for content delivery.
CROSS CUTTING WORKSPACE 2: Architecture
Develops the architectural standards and architecture necessary for the other workspaces.
4 Mission of ICR WS
- Facilitate translational research by integrating clinical and basic research data
- Produce informatics systems and tools that are
  - interoperable
  - modular
  - well-engineered, well-documented
  - validated
5 ICR WS at a Glance
Special Interest Group                 Lead
Pathways Tools                         Shannon McWeeney, Ohio State
Genome Annotation                      Craig Street, U Penn
Data Analysis and Statistical Tools    Ted Liefeld, MIT/Broad
Proteomics                             Tom Moloshok, Fox Chase
Translational Tools                    Terry Braun, U Iowa
caArray Users' Group                   Mervi Heiskanen, NCICB
6 Participating Cancer Centers
- Burnham Institute
- Cold Spring Harbor
- Columbia University (Herbert Irving)
- Dartmouth (Norris Cotton)
- Duke University
- Fox Chase
- Fred Hutchinson Cancer Research Center
- Georgetown University (Lombardi)
- Massachusetts Institute of Technology
- Memorial Sloan Kettering
- Meyer L. Prentis-Karmanos
- New York University
- Northwestern University (Robert H. Lurie)
- Oregon Health and Science University
- Thomas Jefferson University (Kimmel)
- University of California San Francisco
- University of Chicago
- University of Iowa (Holden)
- University of Michigan
- University of North Carolina (Lineberger)
- University of Pennsylvania (Abramson)
- University of South Florida (H. Lee Moffitt)
- Vanderbilt University (Ingram)
- Washington University (Siteman)
- Wistar
7 Overview of Year 1 and Year 2 Projects
- 27 funded Cancer Centers
- 21 development projects in the workspace,
representing 44 total developer/adopter SOWs
8 Pathways Projects
- Reactome Data
  - Developer: Cold Spring Harbor Laboratory
  - Adopter: Memorial Sloan-Kettering Cancer Center
- Pathways Tools (cPath, Cytoscape, BioPAX)
  - Developer: Memorial Sloan-Kettering Cancer Center
  - Adopter: Oregon Health and Science University
- QPACA
  - Developer: University of California, San Francisco
  - Adopter: Oregon Health and Science University
9 Microarray Repositories
- caArray
  - Developer: NCI Center for Bioinformatics
  - Adopters: Georgetown University, New York University, Wistar, Thomas Jefferson University
- NCI-60 data
  - Developer: NCI Center for Cancer Research
  - Adopter: Memorial Sloan-Kettering Cancer Center
10 Proteomics Tools
- RProteomics
  - Developer: Duke University
  - Adopters: University of Pennsylvania, Oregon Health and Science University
- Proteomics LIMS
  - Developer: Fox Chase
  - Adopter: University of South Florida
- Q5
  - Developer: Dartmouth College
  - Adopter: Oregon Health and Science University
11 Genome Annotation
- FunctionExpress
  - Developer: Wash U
  - Adopter: Wistar
- Cancer Molecular Pages
  - Developer: Burnham
  - Adopter: Moffitt
- SEED
  - Developer: U Chicago
  - Adopter: Georgetown
- PIR
  - Developer: Georgetown
  - Adopter: Penn
- GOMiner
  - Developer: CCR
  - Adopter: Wistar
- TrAPSS
  - Developer: U Iowa
  - Adopter: Wistar
- HapMap Data
  - Developer: CSHL
  - Adopter: Wistar
- Vertebrate Promoter Data
  - Developer: CSHL
  - Adopter: MSKCC
12 Typical Project Tasks
- Use case document (developer, with adopter approval)
- Software requirements specification (developer)
- Data model (developer, with VCDE WS approval), data elements registered in caDSR
- Code, compatible with caBIG guidelines (developer, with Architecture WS approval)
- Test Procedures (adopter)
- Installation Guide (developer)
- Training Plan (adopter)
- User Guide (adopter)
13 A Data Resource: gridPIR
- One of three reference implementations from ICR for caGrid
- Developer: PIR of Georgetown University
- Adopter: U. Penn, BMIF
- Provides comprehensive and fully annotated protein-related information for genomic and proteomic cancer research
- Currently 48 objects related to Protein, Gene, Taxonomy and Protein Features are exposed to caGrid
- Developed using Model Driven Approach
14 A Data Resource: gridPIR
Figure: development workflow for gridPIR. Stages shown: Use cases, SRS; Data Model Creation; Semantic Annotation; caDSR Registration; Code Generation; Object/Relational Mapping; Grid Deployment. Data sources shown: UniProt Knowledgebase, iProClass.
Picture is from caCORE SDK Programmers Guide
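The Object/Relational Mapping stage named on this slide can be illustrated with a minimal sketch. It is not the caCORE SDK or gridPIR code: the `Protein` class, the `protein` table, its columns, and the accession values are all hypothetical, and the standard-library `sqlite3` module stands in for whatever database and mapping layer the project actually uses.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Protein:
    """Hypothetical domain object; the real gridPIR model defines its own classes."""
    accession: str
    name: str
    taxon_id: int


def load_proteins(conn: sqlite3.Connection, taxon_id: int) -> list:
    """Map rows of the (hypothetical) protein table onto Protein objects."""
    rows = conn.execute(
        "SELECT accession, name, taxon_id FROM protein "
        "WHERE taxon_id = ? ORDER BY accession",
        (taxon_id,),
    )
    # Each row is a tuple in column order, so it unpacks into the dataclass.
    return [Protein(*row) for row in rows]


def demo() -> list:
    """Build a throwaway in-memory table with made-up rows and query it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE protein (accession TEXT PRIMARY KEY, name TEXT, taxon_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO protein VALUES (?, ?, ?)",
        [
            ("X0002", "example protein B", 9606),
            ("X0001", "example protein A", 9606),
            ("X0003", "example protein C", 10090),
        ],
    )
    return load_proteins(conn, 9606)
```

Calling `demo()` returns the two made-up records whose `taxon_id` is 9606, as `Protein` objects rather than raw rows. Callers work with typed objects and never see the table layout.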
15 Example Scenarios
16 Annotate List of Genes and Proteins
- Example: Get physical and functional properties and homologies for 1500 proteins detected in a serum sample
- Using caBIG standard APIs, query
  - Cancer Molecular Pages (Burnham)
  - PIR (Georgetown), on the caGrid
  - SEED (U. Chicago)
- Retrieve data: protein features, molecular weight, functional domains, modified residues, homologies and more
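The scenario above sends one identifier list to several annotation sources and combines the answers. The sketch below shows that fan-out-and-merge pattern only. It does not use the caBIG or caGrid APIs: each source is a hypothetical in-memory lookup, and the identifiers and annotation values are made up.

```python
from typing import Callable, Dict, Iterable

Annotation = Dict[str, object]
Source = Callable[[str], Annotation]


def make_source(records: Dict[str, Annotation]) -> Source:
    """Wrap a dict as a stand-in for a remote data service (hypothetical)."""
    def lookup(protein_id: str) -> Annotation:
        return records.get(protein_id, {})
    return lookup


def annotate(
    protein_ids: Iterable[str], sources: Dict[str, Source]
) -> Dict[str, Dict[str, Annotation]]:
    """Query every source for every identifier; keep non-empty answers by source name."""
    merged: Dict[str, Dict[str, Annotation]] = {}
    for pid in protein_ids:
        merged[pid] = {}
        for name, lookup in sources.items():
            hit = lookup(pid)
            if hit:
                merged[pid][name] = hit
    return merged


# Made-up records standing in for the three resources named on the slide.
sources = {
    "CMP": make_source({"X0001": {"functional_domains": ["domain-1"]}}),
    "PIR": make_source(
        {"X0001": {"molecular_weight": 12345.0}, "X0002": {"molecular_weight": 67890.0}}
    ),
    "SEED": make_source({}),
}
result = annotate(["X0001", "X0002", "X0003"], sources)
```

Keeping each source's answer under its own key, instead of flattening everything into one record, preserves provenance when two resources disagree about the same protein.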
17 Display Expression Data on Pathways
- Goal: highlight functional roles of genes overexpressed in glioblastoma multiforme samples (compared with normal)
- Query caArray repositories for availability of samples; retrieve data in MAGE-ML format
- Query cPath and Reactome for network data in BioPAX format
  - cPath: protein/protein interaction data (MSKCC)
  - Reactome: curated pathways (CSHL)
- Using Cytoscape, superimpose expression data on a network with gene expression values displayed along a color gradient
  - Cytoscape plugins for cPath, BioPAX, MAGE-ML (MSKCC)
- Use QPACA (UCSF) to assess match between expression data and pathway membership