Title: caBIG: the cancer Biomedical Informatics Grid
1caBIG the cancer Biomedical Informatics Grid
- Ken Buetow
- NCICB/NCI/NIH/DHHS
2NCI biomedical informatics
- Goal: A virtual web of interconnected data, individuals, and organizations redefines how research is conducted, care is provided, and patients/participants interact with the biomedical research enterprise
3etiology, treatment, prevention
4building common architecture, common tools, and
common standards
access portals
participating group nodes
Clinical Trials
Molecular Pathology
caCORE
Cancer Genomics
Mouse Models
5Interoperability
Courtesy Charlie Mead
- interoperability
- ability of a system...to use the parts or equipment of another system (Source: Merriam-Webster web site)
- interoperability
- ability of two or more systems or components to exchange information and to use the information that has been exchanged. (Source: IEEE Standard Computer Dictionary: A Compilation of IEEE Standard Computer Glossaries, IEEE, 1990)
Semantic interoperability
Syntactic interoperability
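The two levels named on this slide can be illustrated with a minimal sketch. It assumes two hypothetical sites that both send JSON records (syntactic agreement) but label the same diagnosis differently; the site names, labels, and concept codes are placeholders invented for illustration, not real vocabulary identifiers.

```python
import json
from typing import Optional

# Hypothetical mapping from each site's local labels to a shared concept code.
LOCAL_TO_SHARED = {
    "site_a": {"breast ca": "CONCEPT_001", "lung ca": "CONCEPT_002"},
    "site_b": {"carcinoma of breast": "CONCEPT_001"},
}


def parse_record(payload: str) -> dict:
    """Syntactic step: both sites agree on JSON with a 'diagnosis' field."""
    record = json.loads(payload)
    if "diagnosis" not in record:
        raise ValueError("record lacks a 'diagnosis' field")
    return record


def shared_concept(site: str, record: dict) -> Optional[str]:
    """Semantic step: resolve the local label to a shared concept, if mapped."""
    return LOCAL_TO_SHARED.get(site, {}).get(record["diagnosis"].lower())


def same_meaning(site_x: str, payload_x: str, site_y: str, payload_y: str) -> bool:
    """True only when both records resolve to the same shared concept."""
    cx = shared_concept(site_x, parse_record(payload_x))
    cy = shared_concept(site_y, parse_record(payload_y))
    return cx is not None and cx == cy
```

Both payloads can parse successfully and still fail `same_meaning`; that gap is the difference between exchanging information and being able to use it.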
6Enterprise Vocabulary
- NCI Meta-Thesaurus (Cross-map standard vocabularies/ontologies, e.g. SNOMED, MedDRA, ICD)
- Semantic integration, inter-vocabulary mapping
- UMLS Metathesaurus extended with cancer-oriented vocabularies
- 800,000 Concepts, 2,000,000 terms and phrases
- Mappings among over 50 vocabularies
- NCI Thesaurus
- Description logic-based
- 18,000 Concepts
- Concept is the semantic unit
- One or more terms describe a Concept (synonymy)
- Semantic relationships between Concepts
biomedical objects
common data elements
controlled vocabulary
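A minimal sketch of the concept-centred structure described on this slide: the Concept is the unit, several terms point to it, and named relationships link Concepts. The class names, the `C_0001`-style codes, and the `is_a` relation are assumptions for illustration; this is not the EVS API.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    code: str
    preferred_name: str
    synonyms: set = field(default_factory=set)
    # Each relation is a (relation name, target concept code) pair.
    relations: list = field(default_factory=list)


class Thesaurus:
    def __init__(self):
        self._by_code = {}
        self._by_term = {}

    def add(self, concept: Concept) -> None:
        self._by_code[concept.code] = concept
        # Every term, preferred or synonym, resolves to the same Concept.
        for term in {concept.preferred_name, *concept.synonyms}:
            self._by_term[term.lower()] = concept.code

    def lookup(self, term: str):
        code = self._by_term.get(term.lower())
        return self._by_code.get(code) if code else None

    def related(self, code: str, relation: str) -> list:
        return [
            self._by_code[target]
            for name, target in self._by_code[code].relations
            if name == relation and target in self._by_code
        ]


thesaurus = Thesaurus()
thesaurus.add(Concept("C_0001", "Neoplasm", {"Tumor"}))
thesaurus.add(Concept("C_0002", "Carcinoma", relations=[("is_a", "C_0001")]))
```

Here `thesaurus.lookup("tumor")` and `thesaurus.lookup("Neoplasm")` return the same Concept object, which is the synonymy the slide refers to.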
7Common Data Elements
- Structured data reporting elements
- Precisely defining the questions and answers
- What question are you asking, exactly?
- What are the possible answers, and what do they
mean?
biomedical objects
common data elements
controlled vocabulary
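The two questions on this slide can be sketched as a data element that pairs one precisely worded question with an enumerated set of answers and their meanings. The element below, its wording, and its codes are hypothetical examples, not entries from caDSR.

```python
from dataclasses import dataclass


@dataclass
class CommonDataElement:
    question: str             # what is being asked, exactly
    permissible_values: dict  # allowed answer -> what that answer means

    def validate(self, answer: str) -> str:
        """Return the meaning of a permitted answer; reject anything else."""
        if answer not in self.permissible_values:
            allowed = ", ".join(sorted(self.permissible_values))
            raise ValueError(f"{answer!r} is not permitted; expected one of: {allowed}")
        return self.permissible_values[answer]


# Hypothetical element, for illustration only.
laterality = CommonDataElement(
    question="On which side of the body is the primary tumor located?",
    permissible_values={"L": "Left", "R": "Right", "B": "Bilateral"},
)
```

Calling `laterality.validate("L")` returns the meaning of the answer, while a free-text answer such as `"left side"` raises `ValueError`, so every reported value stays comparable across forms.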
8Biomedical Information Objects
- Data service infrastructure developed using OMG's Model Driven Architecture approach
- Object models expressed in UML represent actual biomedical research entities such as genes, sequences, chromosomes, cellular pathways, ontologies, clinical protocols, etc.
- The object models form the basis for uniform APIs (Java, SOAP, HTTP-XML, Perl) that provide an abstraction layer and interfaces for developers to access information without worrying about the back-end data stores
biomedical objects
common data elements
controlled vocabulary
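The abstraction-layer idea can be sketched as follows: callers work with a domain object and a service, and never see how the back end stores rows. All class and method names here are assumptions made for illustration; they are not the caBIO API, and the in-memory store stands in for whatever database sits behind it.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Gene:
    symbol: str
    chromosome: str


class GeneStore(ABC):
    """Back-end contract; a relational or remote store would implement this."""

    @abstractmethod
    def find_by_symbol(self, symbol: str):
        ...


class InMemoryGeneStore(GeneStore):
    def __init__(self, rows):
        # Rows use back-end column names that callers never need to know.
        self._rows = {row["symbol"]: row for row in rows}

    def find_by_symbol(self, symbol: str):
        row = self._rows.get(symbol)
        return Gene(row["symbol"], row["chrom"]) if row else None


class GeneService:
    """Uniform entry point: the same call works for any GeneStore."""

    def __init__(self, store: GeneStore):
        self._store = store

    def get_gene(self, symbol: str):
        return self._store.find_by_symbol(symbol.upper())


service = GeneService(InMemoryGeneStore([{"symbol": "TP53", "chrom": "17"}]))
```

Swapping `InMemoryGeneStore` for another `GeneStore` implementation leaves `service.get_gene("tp53")` unchanged for the caller.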
9Standards supporting infrastructure
- Enterprise Vocabulary Services (EVS)
- Browsers
- APIs
- cancer Bioinformatics Infrastructure Objects (caBIO)
- Applications
- APIs
- cancer Data Standards Repository (caDSR)
- CDEs
- Case Report Forms
- Object models
- ISO 11179 model
10Integrating Architecture
Data
Object
Presentation
Client
Domain Objects
HTML (Browsers)
Web Server
Tomcat Servlets JSPs SOAP XML XSL/XSLT
HTML/XML Clients
RMI
Object Managers
SOAP Clients
Meta-Data
Data Access Objects
Perl Clients
Java Applications
11Semantic Integration Modeling Time
Class
Attributes
Mapping to EVS Concepts Done at Modeling Time
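One way to picture modeling-time mapping is a check that every class and attribute in the model carries a concept annotation before the model moves on to registration. The sketch below assumes a hypothetical `Disease` class and placeholder concept codes; the real mapping is done in the UML model, not in Python.

```python
from dataclasses import dataclass, fields


@dataclass
class Disease:
    name: str
    primary_site: str


# Hypothetical annotations: (class name, attribute or None) -> placeholder code.
CONCEPT_MAP = {
    ("Disease", None): "CONCEPT_DISEASE",
    ("Disease", "name"): "CONCEPT_NAME",
}


def unmapped_elements(cls, concept_map) -> list:
    """List the class and attributes that still lack a concept mapping."""
    missing = []
    if (cls.__name__, None) not in concept_map:
        missing.append(cls.__name__)
    for f in fields(cls):
        if (cls.__name__, f.name) not in concept_map:
            missing.append(f"{cls.__name__}.{f.name}")
    return missing
```

With the map above, `unmapped_elements(Disease, CONCEPT_MAP)` reports `Disease.primary_site` as the one element still to be mapped.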
12Semantic Integration Metadata Registration Time
ISO11179 mapping
caDSR loading
UML model, including EVS Concept mappings
Curation Data standards registration for
instance data
13Semantic Integration Runtime
Presentation
Client
Data
Object
HTML/XML Clients (Browsers)
Web Server
Domain Objects: Gene, Disease, Concept, DataElement
Research DBs
Tomcat Servlets ( XML XSL/XSLT ) JSPs SOAP
SOAP Clients
Research DBs
RMI
Object Managers
Perl Clients
Data Access Objects (OJB)
Java Applications
14caGRID caCORE architecture extension
caGRID Extension (Integration of Discovery and
Query Services)
OGSA-DAI Globus
caGRID extension (Concept Discovery)
caGRID extension (Federated Query)
Client
OGSA-DAI
caGRID extension (metadata)
caGRID extension (query)
Grid
Globus
caGRID extension (caBIO adapter)
caBIO client
Data Source
caBIO server
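The federated query idea in this diagram can be sketched in a few lines: one request is sent to every participating node, each node answers from its own data source, and the client merges the results with their provenance. The `Node` class, record layout, and concept codes are assumptions for illustration; the sketch does not use Globus or OGSA-DAI.

```python
from concurrent.futures import ThreadPoolExecutor


class Node:
    """Stand-in for a grid node wrapping a local data source."""

    def __init__(self, name, records):
        self.name = name
        self.records = records

    def query(self, concept_code):
        return [r for r in self.records if r["concept"] == concept_code]


def federated_query(nodes, concept_code):
    """Ask every node the same question and merge the answers."""
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        # pool.map keeps results in the same order as the nodes list.
        per_node = list(pool.map(lambda n: n.query(concept_code), nodes))
    merged = []
    for node, rows in zip(nodes, per_node):
        merged.extend({**row, "source": node.name} for row in rows)
    return merged


nodes = [
    Node("center_1", [{"id": 1, "concept": "CONCEPT_001"}]),
    Node("center_2", [{"id": 7, "concept": "CONCEPT_002"},
                      {"id": 8, "concept": "CONCEPT_001"}]),
]
```

Because every node annotates its data with the same concept codes, a single `federated_query(nodes, "CONCEPT_001")` call can return comparable records from both centers.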
15NCICB applications
- clinical trials support - C3DS
- molecular pathology - caArray
- cancer images - caImage
- pre-clinical models - caModelsDb
- laboratory support - caLIMS
16- Standards-based Data System for the conduct of clinical trials
- C3D (Cancer Central Clinical Database)
- WWW-based eCRF-based primary data capture by protocol
- C3PR (Cancer Central Clinical Participant Registry)
- WWW-based Central registration of participants across protocols
- C3PA (Cancer Central Clinical Protocol Administration)
- Scientific management system for clinical protocols
- C3TR (Cancer Central Clinical Tissue Repository)
- Tissue repository
- C3DW (Cancer Central Clinical Data Warehouse)
- De-identified patient information accessed via
caBIO
19Image Portal
- The NCICB has developed an image portal to allow researchers to search for mouse and human images and annotations
- Human and mouse images and annotations were provided by the MMHCC
20Pathway Database
- Enhance value of imperfect, but available, pathway knowledge
- Make biological assumptions explicit
- Combine sources of data (e.g. KEGG, BioCarta, ...)
- Merge data from separate pathways
- Build a causal framework to support (future) quantitative simulation/analysis
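Combining sources and merging pathways can be sketched by treating each pathway as a set of directed, labelled interactions and keeping track of which source asserted each one. The molecule names, relation labels, and source names below are placeholders; no real pathway content is implied.

```python
def merge_pathways(*pathways):
    """Merge (source, edges) pairs into one map: edge -> set of sources.

    Each edge is an (upstream, relation, downstream) triple, so the causal
    assumption behind every interaction is stated explicitly.
    """
    merged = {}
    for source, edges in pathways:
        for upstream, relation, downstream in edges:
            merged.setdefault((upstream, relation, downstream), set()).add(source)
    return merged


# Placeholder pathways from two hypothetical sources.
pathway_1 = ("source_1", [("MoleculeA", "activates", "MoleculeB")])
pathway_2 = ("source_2", [("MoleculeA", "activates", "MoleculeB"),
                          ("MoleculeB", "inhibits", "MoleculeC")])

merged = merge_pathways(pathway_1, pathway_2)
```

An interaction reported by both sources appears once in `merged` with two entries in its source set, which gives a simple measure of agreement between imperfect sources.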
21Cancer Biomedical Informatics Grid (caBIG)
- Common, widely distributed infrastructure permits cancer research community to focus on innovation
- Shared vocabulary, data elements, data models facilitate information exchange
- Collection of interoperable applications developed to common standard
- Raw published cancer research data is available for mining and integration
22caBIG will facilitate sharing of infrastructure,
applications, and data
23caBIG action plan
- Establish pilot network of Cancer Centers
- Groups agreeing to caBIG principles
- Mixture of capabilities
- Mixture of contributions
- Expanding collection of participants
- Establish consortium development process
- Collecting and sharing expertise
- Identifying and prioritizing community needs
- Expanding development efforts
- Moving at the speed of the internet
24Three Domain Workspaces and two Cross Cutting
Workspaces have been launched during the Pilot
phase
DOMAIN WORKSPACE 1: Clinical Trial Management Systems
addresses the need for consistent, open and comprehensive tools for clinical trials management.
DOMAIN WORKSPACE 2: Integrative Cancer Research
provides tools and systems to enable integration and sharing of information.
DOMAIN WORKSPACE 3: Tissue Banks and Pathology Tools
provides for the integration, development, and implementation of tissue and pathology tools.
CROSS CUTTING WORKSPACE 1: Vocabularies and Common Data Elements
responsible for evaluating, developing, and integrating systems for vocabulary and ontology content, standards, and software systems for content delivery.
CROSS CUTTING WORKSPACE 2: Architecture
developing architectural standards and architecture necessary for other workspaces.
25Key deliverables of caBIG pilot
- Componentized, standards-based Clinical Trials Management System
- e-IND filing/regulatory reporting with FDA
- Electronic management of trials
- Integration of diverse trials
- Tissue Management System
- Systematic description and characterization of tissue resources
- Ability to link tissue resources to clinical and molecular correlative descriptions
- Plug and Play analytic tool set
- microarray
- proteomics
- pathways
- data analysis and statistical methods
- gene annotation
- Diverse library of raw, structured data
26Cancer Molecular Analysis Project (CMAP)- a
prototypic biomedical data integration effort
Profiles, Targets, Agents, Clinical Trials
NCBI
CGAP
CTEP clinical trials
UCSC (via DAS)
NCI drug screening
CGAP gene expression
KEGG
Gene Ontologies
BioCarta
NCI drug screening
39caBIG community contributions
- Infrastructure
- Ontologies
- Databases
- Applications
- Clinical trials support
- Analytic tools
- Data mining
- Data
- Trials
- Experimental outcomes
- Genomic
- Microarray
- Proteomic
41acknowledgements
- NCICB
- Peter Covitz
- Sue Dubman
- Mary Jo Deering
- Leslie Derr
- Carl Schaefer
- Christos Andonyadis
- Mervi Heiskanen
- Denise Hise
- Kotien Wu
- Fei Xu
- Frank Hartel
- LPG/CCR
- Michael Edmundson
- Bob Clifford
- Cu Nguyen
http://ncicb.nci.nih.gov
http://cmap.nci.nih.gov
http://caBIG.nci.nih.gov