Title: Good Cells, Bad Cells, and the Grid
- J. Robert Beck, MD
- Fox Chase Cancer Center, Philadelphia PA USA
- April 2007
2Organization of the Talk
- Cell Biology Research on Grids
- There is not very much; most is molecular
- Most projects are using good tools and establishing proof of concept
- Cancer Research and caBIG
- Three-year pilot phase completed
- New directions uncertain
3The Year 2013 (Silva and Ball, IJMI 02)
- Grid computing
- Intimate computing
- Microelectromechanical computing
- All these technologies combine to redefine clinical care
- Thesis of caBIG project
- Thesis of Medigrid and myGrid projects
4Good Cells: Cell Biology Research
5How a DataGrid Can Help Cellular Research
- Creating a community of scholars
- Providing necessary middleware solutions
- Cellular imaging
- Cellular signaling pathway visualization
6Integrative Biology Projects on the Grid
7myGrid
- Project based in UK
- Type: Service Grid
- Open Grid Service Architecture
- Globus toolkit
- Most proposed applications in molecular biology
- Whole cell studies possible
Stevens et al, Bioinformatics 03
8Integrative Biology Project
- Project based in UK e-Science programme
- Type: Computational and Data Grids, some service components
- Open Middleware
- Globus toolkit
- Applications in disease modeling
- Focus on simulation, should support whole cell
studies
Gavaghan et al, Phil Trans R Soc 05
9Visible Cell Project (Queensland)
- Develop an underlying 3D spatial matrix from cellular tomographic images
- Integrate macromolecular data into the matrix
- Establish the matrix as a dynamic modeling environment
- Position the system as a basis for exploratory research
During the next 5 years, the Visible Cell will
emerge as a sophisticated data environment where
proteomic, genomic, molecular, cell and
developmental biology data from multiple, diverse
sources will be integrated. Through the Visible
Cell interface, researchers will be able to
utilise advanced computational simulations to
model and predict changes in cellular behaviour
at molecular and cellular levels.
Tool: Open Microscopy Environment, a data and potential grid resource for light microscopy
Burrage et al, Brief Bioinf 06
10Grid Response to SARS Outbreak
- Shows Value in Developing Community Service Grids
11SARS: The Disease
- The incubation period of the SARS coronavirus is 3-7 days, so a 10-day quarantine is key to containing the virus.
- Transmission is through physical contact or by breathing in aerosolized particles of a sick person's saliva or phlegm.
- Once infected, patients have a fever for a few days, followed by acute pneumonia.
12Taiwan's Knowledge Innovation National Grid
- KING Steering and Deployment Committees
- Supercomputing Center
- Grid Operation Center
- Network Operation Center (NOC)
- Grid applications
- ECO Grid
- Asthma Grid
- SARS Grid
- Access Grid
- Biology Grid
- Hazard Mitigation Emergency Response Grid
- e-learning Grid
International Innovation and Collaboration
13Establishment of a SARS Grid
- Find an immediate solution from current Grid work: Asthma Grid, Access Grid, and ECO Grid
- Specific medical information network
- Construct Access Grid (AG) nodes for hospitals.
- Integrate GIS system.
- Monitor the quarantined rooms.
- 3 hospitals and the CDC had AG nodes set up and in operation; a dedicated network was deployed.
- SARS Grid portal www.sarshope.org developed
- SARS spread GIS portal developed
- International support: http://www.npaci.edu/Press/03/052903_SARS.html, http://ncmir.ucsd.edu/news/
14Bad Cells
- The Cancer Biomedical Informatics Grid (caBIG)
Project
15Cancer as a Complex Adaptive System
Diagram: base state, then repeated rounds of mutation and selection, ending in the malignant state
16Measure states indirectly
Diagram: base state(s) and malignant state(s) are measured indirectly through:
- Mutation status
- Allele loss
- Constitutional variation
- RNA expression
- Epigenetic variation
18The bench-to-bedside-to-bench cycle
- Promises a future of personalized medicine
- But
19Biomedical information tsunami
- overwhelming volume of data
- multitude of sources
20Informatics tower of Babel
- Each cancer research community speaks its own scientific dialect
- Integration is critical to achieving the promise
21Interdisciplinary Intersection
Computer Science
Grid Radiology, Pathology
Security
Adverse Event Rules Engine
Tumor Microenvironment
Biomedical Informatics
Clinical Research
Basic Biomedical Science
Automated Pathology
Ontologies and Neuroscience
22Biomedical Informatics and Middleware
- Translates and integrates information: natural language processing, ontologies
- Disseminates information: grid, information integration
- Brings in information: grid, information integration
23caBIG promotes the Vision
NCI 2015 challenge goal: eliminate suffering and death due to cancer
"Nearly every facet of NCI's strategic plan to eliminate suffering and death due to cancer is predicated on the revolutionizing potential of caBIG." (Cancer Bulletin, 2005)
A.C. von Eschenbach, M.D., Former Director, National Cancer Institute; Director, Food and Drug Administration
24Scenario, 2009
- A researcher involved in a phase II clinical trial of a new molecularly targeted therapeutic for brain tumors observes that cancers derived from one specific tissue progenitor appear to be strongly affected. The trial has been generating proteomic and microarray data. The researcher would like to identify potential biochemical and signaling pathways that might differ between this cell type and other potential progenitors in cancer, deduce whether anything similar has been observed in other clinical trials involving agents known to affect these specific pathways, and identify any studies in model organisms involving tissues with similar pathway activity.
Diagram labels: Cell Type, Small Molecules, Pathways, Clinical Trials, Animal Models, Therapeutics, Homologous Proteins
Michael Ochs, 2005
25How is such research conducted?
- Today: a lot of manual work finding sources, finding other groups working on the problem, getting data from other sites, re-analyzing, etc.
- With caBIG, much of the work is automated across a data grid, caGrid
- Security model authenticates and authorizes the investigator
- Data is made available for translational use
- Standard tools and architectures exist for analytical flow
26Today vs 2009?
- In 2006, such a study would involve immense manual work gathering information locally and from other sites, precluding identification of the required data and, as a result, any deduction of the likely significance of the trial observation.
- However, with caBIG-compliant components now under development, the researcher would be able to perform the analysis routinely, with data flowing through systems and analysis being automatic.
- This analysis will yield biomarkers and potential drug targets gathered from multiple workspaces, making it possible to develop more effective treatment modalities for patients faster and less expensively.
27Discovery Utilizing caBIG Integrated Cancer Research Tools
Workflow diagram. Steps shown: clinical trial tumor samples (400 brain tumor tissue samples acquired); discrete and manual annotation on tissues; pathology reports; mutation identification; gene expression profiling; identify up-regulated genes in specific pathways; gene annotation; identify recurring promoter elements; analysis and annotation, leading to potential drug targets and biomarkers.
Tools shown: caTissue, caTIES, Clinical Annotation Modules, TrAPSS, caArray, GenePattern, Pathways Tool, FunctionExpress, Promoter DB, Clinical Trials Database, Proteomics LIMS, Q5, PIR
28Thinking about a Solution
- A virtual web of interconnected data, individuals, and organizations redefines how:
- research is conducted
- care is provided
- patients/participants interact with the biomedical research enterprise
29Goals of the caBIG pilot
- Illustrate that a spectrum of Cancer Centers with varying needs and capabilities can be joined in a common grid of communications, shared data, applications, and technologies
- Demonstrate that Cancer Centers, in collaboration with NCI, will develop new enabling tools and systems that could support multiple Cancer Centers
- Create an extensible infrastructure that will continue to be expanded and extended to members of the cancer research community
- Demonstrate that Cancer Centers will actively use the grid and realize greater value in their cancer research endeavors by using the grid
30From the caBIG Strategic Plan, 2005
Vision: caBIG will become a self-sustaining network, which will foster improvements in collaborative projects and increase the speed and efficacy of treatment to benefit patients.
Mission: caBIG participants will develop readily disseminated standards, tools, and information systems for the management of clinical and research activities in oncology.
These will include systems for the management of
cancer clinical trials, standards for integrative
research systems, a coherent approach to
biospecimen informatics management, and the
underlying architectures, vocabularies and data
elements that will facilitate sharing and access
to these systems.
31caBIG Pilot action plan
- Establish pilot network of NCI Cancer Centers
- Groups agreeing to caBIG principles
- Mixture of capabilities
- Mixture of contributions
- Expanding collection of participants
- Establish consortium development process
- Collecting and sharing expertise
- Identifying and prioritizing community needs
- Expanding development efforts
- Moving at the speed of the internet
32caBIG principles
- Open source
- Open access
- Open development
- Federated
33Common needs helped shape priority areas for the
caBIG pilot activities
Bar chart: Number of Needs Reported (axis 0 to 35) for each area: Database/Datasets; Imaging Tools/Databases; Integration; High Performance Computing; Clinical Trial Management Systems; Pathways; Licensing Issues; LIMS; Meeting; Microarray Gene Expression Tools; Tissue Banks/Pathology; Proteomics; Remote/Bandwidth; Visualization Front-End Tools; Statistical Data Analysis Tools; Vocabulary/Ontology Tools/Databases; Integrative Cancer Research; Meta-Project; Common Data Elements/Architecture; Center Integration Management; Tissue Pathology Tools; Access to Data; Translational Research Tools; Distributed Data Sharing/Analysis Tools; Staff Resources; Clinical Data Management Tools
34and we quickly learned.
35This isn't Rocket Science
- A lot of caBIG isn't even computer science
- Most industries did much of this years ago
- Really, this is an engineering project
- But it is hard to achieve; it takes time
- caBIG's goal (oversimplified): facilitate the exchange of data useful for cancer research and care
- Between research domains, systems, investigators, and organizations
- For instance, the caBIG compatibility of a system is determined by how easily the system can exchange data (i.e., its interoperability)
36Four Domain Workspaces and two Cross-Cutting Workspaces were launched
DOMAIN WORKSPACE 1: Clinical Trial Management Systems
addresses the need for consistent, open and comprehensive tools for clinical trials management.
DOMAIN WORKSPACE 2: Integrative Cancer Research
provides tools and systems to enable integration and sharing of information.
DOMAIN WORKSPACE 3: Tissue Banks and Pathology Tools
provides for the integration, development, and implementation of tissue and pathology tools.
DOMAIN WORKSPACE 4: Imaging
provides for the sharing and analysis of in vivo imaging data.
CROSS-CUTTING WORKSPACE 1: Vocabularies and Common Data Elements
responsible for evaluating, developing, and integrating systems for vocabulary and ontology content, standards, and software systems for content delivery.
CROSS-CUTTING WORKSPACE 2: Architecture
developing architectural standards and architecture necessary for other workspaces.
37Strategic Level Workspaces
Data Sharing and Intellectual Capital
Addresses issues related to the sharing of data,
applications and infrastructure both within the
consortium and in the larger cancer research
community.
Training
Developing strategies for providing training in the use of caBIG-developed resources, including online tutorials, workshops, and training programs.
caBIG Strategic Planning
Assists in identifying strategic priorities for
the development and evolution of the caBIG effort.
39Overall Goals for caBIG: Three-Year (mid-2007)
- Develop sufficient research tools and standards to have a positive impact on the cancer research community, as measured by adoption of relevant caBIG principles in project proposals.
- Ensure widespread adoption of developer standards so that funded developer projects are operating under the Gold standard of compatibility.
- Adopt and use caBIG interoperable tools and data sets within the caBIG community.
- Develop mechanisms for engaging and promoting caBIG-compliant technologies and established datasets within the oncology research community.
40Overall Goals for caBIG: Five-Year (2010)
- Ensure widespread adoption, dissemination, and use of caBIG interoperable tools, standards, and data sets within the larger cancer community, to include the biopharmaceutical industry, non-NCI cancer centers, and the national cancer research enterprise.
- Begin to see results of caBIG-compliant interdisciplinary and inter-institutional research affecting clinical oncology care.
41Architecture
- Conceptually, caBIG has adopted two primary guiding principles:
- To bring systems online quickly, caBIG is committed to a bias for action. This implies a commitment to making decisions and moving forward, even if perfection cannot be achieved.
- To allow long-term evolution and improvement of architectural design, caBIG is committed to designing for change.
- To turn these thoughts into action, caBIG has also adopted a two-pronged practical approach:
- If requirements are well understood and good solutions are available, caBIG initiates developmental activities within the architectural workspace.
- If requirements are less clear or if solutions are not yet available, caBIG commissions analysis and assessment activities.
42caBIG™ Compatibility Guidelines
- The caBIG™ compatibility guidelines are designed to ensure that systems developed in a federated environment are still interoperable on the caBIG™ grid, both syntactically and semantically
- Since achieving interoperability is a process, caBIG™ recognizes four levels of compatibility, from Legacy (not interoperable) through Bronze and Silver to Gold (fully interoperable)
- caBIG™ compatibility is all about interfaces rather than the scientific content of the system
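The four levels named above form an ordered scale, which can be modeled in a minimal Python sketch. The sketch assumes that a system at a higher level also meets the requirements of every lower level; the names `Compatibility`, `satisfies`, and `systems_meeting`, and the example system names, are hypothetical and are not part of any caBIG software.

```python
from enum import IntEnum


class Compatibility(IntEnum):
    """caBIG compatibility levels, ordered from least to most interoperable."""
    LEGACY = 0  # not interoperable
    BRONZE = 1
    SILVER = 2
    GOLD = 3    # fully interoperable


def satisfies(actual: Compatibility, required: Compatibility) -> bool:
    """Assumption: levels are cumulative, so a higher level meets any lower requirement."""
    return actual >= required


def systems_meeting(systems: dict[str, Compatibility],
                    required: Compatibility) -> list[str]:
    """Return the sorted names of systems at or above the required level."""
    return sorted(name for name, level in systems.items()
                  if satisfies(level, required))


if __name__ == "__main__":
    # Hypothetical registry of systems and their assessed levels.
    registry = {
        "legacy_lims": Compatibility.LEGACY,
        "array_tool": Compatibility.SILVER,
        "grid_service": Compatibility.GOLD,
    }
    print(systems_meeting(registry, Compatibility.SILVER))  # ['array_tool', 'grid_service']
```

Because the guidelines concern interfaces rather than scientific content, a real assessment would check interface criteria at each level; this sketch captures only the ordering of the levels.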
43caBIG Compatibility Guidelines
Diagram label: SYNTACTIC
44A Lot of Stuff Has Emerged
- And much of it is based on standard tools and
architectures
45caBIG Deliverables: Clinical Trials Management Systems
- Biomedical Research Integrated Domain Group Model (BRIDG)
- Adverse Events Reporting Tool
- Cancer Clinical Comprehensive Dictionary (C3D)
- Cancer Community Clinical Patient Registry (C3PR)
- Clinical Research Information Exchange (CRIX)
- caBIG compatibility evaluation for existing commercial tools
- Harmonization of UML Representations
- Ontological Representations and Data Elements for Clinical Trials
- Metadata Harmonization
- Componentized, interoperable and standards-based
Clinical Trials Management Systems, both
purpose-built and commercial off-the-shelf to
handle, in an automated fashion, many aspects of
developing, managing, conducting, and reporting
Clinical Trials
46Clinical Research IT Infrastructure
Architecture diagram. Components: Clinical Research Information Exchange (HL7 transactional database, HL7/CAM SDK, translation service, lifecycle management); External Reporting; Clinical Systems (labs, EMR, tissue, etc.); Clinical Trials; Adverse Events; FDA; Participant Registry; Sponsor; EDC; NCI; Clinical Data Management; Patient Health Record; Research Data Warehouse; De-identification Services. Message standards shown: HL7 v3, Janus, HL7 v2.x, other.
47caBIG Deliverables: Tissue Banks and Pathology Tools
- caTISSUE Core
- caTIES
- caTISSUE Clinical Annotation Engine
- caTISSUE Experimental Annotation Engine
- Requirements Specifications Survey and Results
- Federated Tissue Data Set White Paper
- Cancer Translational Informatics Platform
(caTRIP)
- Systematic description and characterization of tissue resources: tools to inventory, track, mine, and visualize tissue samples from geographically dispersed repositories, with the ability to link tissue resources to clinical and molecular correlative descriptions
48caBIG Deliverables: Integrative Cancer Research
- caArray
- geWorkbench 2.0
- GenePattern
- Gene Ontology Miner (GOMiner)
- Protein Information Resource (PIR)
- RProteomics
- Pathways Tool Development
- Tools Distance-Weighted Discrimination
- Magellan
- Visual and Statistical Data Analyzer (VISDA)
- Cancer Molecular Pages
- The ICR Workspace seeks to provide for the development of a plug-and-play analytic tool set suitable for a variety of experimental methodologies, including microarrays, proteomics, biological pathways, data analysis and statistical methods, gene annotation, etc. It will also develop a diverse library of raw, structured data and facilitate the integration of different types of data. All of these tools would help integrate clinical and basic research.
49caBIG Deliverables: Integrative Cancer Research (cont'd)
- Proteomics Laboratory Information Management System (LIMS) Prototype
- Q5
- TrAPSS
- Gene Connect
- Integrating Bioconductor and R into caBIG
- Reverse Phase Protein Lysate Array based data for caArray
- Cancer Translational Informatics Platform (caTRIP)
- FunctionExpress
- HapMap, PromoterDB
- SEED
- NCI-60 Data Sharing
- Quantitative Pathway Analysis in Cancer (QPACA)
- Reactome (GKB) Data
51caBIG Deliverables: Architecture
- The Architecture Cross-Cutting Workspace provides
for the development of the underlying standards
used by the program, and ensures that common
mechanisms are used throughout the caBIG
community via mentoring, white papers and a
structured review process.
- caBIG Compatibility Guidelines
- caGrid 0.5 Security White Paper
- caGrid Software Version 0.5
- caGrid 1.0
- Technology Evaluation White Paper
- caBIG - The Security White Paper (Technology Evaluation)
- Workflow Language Recommendations White Paper
- ID Management White Paper
- Common Query Language White Paper
52caBIG Deliverables: Vocabularies and Common Data Elements
- The Vocabularies and Common Data Elements
Cross-Cutting Workspace provides for the
development of the underlying data elements and
vocabularies used by the program, and ensures
that common mechanisms are used throughout the
caBIG community via mentoring, white papers and
a structured review process.
- LexGrid
- CDE Governance Model
- VCDE Guidance Mentoring Teams
- Vocabularies Deployment Document
- Data Standards Approval Guidelines
- Procedures for the Review and Approval of New VCDE Content
- Mouse/Human Anatomy Ontology Mapping
- Nutrition Ontology
53Standards-based interoperability: the cancer common object resource environment (caCORE)
Diagram layers: biomedical objects, common data elements, controlled vocabulary
- Community-driven
- Dynamic implementation
- Built to be upgraded as standards harden and domains expand
54Standards infrastructure and services
- Enterprise Vocabulary Services (EVS)
- Browsers
- APIs
- cancer Bioinformatics Infrastructure Objects (caBIO)
- Applications
- APIs
- cancer Data Standards Repository (caDSR)
- CDEs
- Case Report Forms
- Object models
- ISO 11179 model
- Developer Toolkits
- caCORE SDK
- caAdapter
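The ISO 11179 model listed under caDSR describes a data element as a data element concept (an object class plus a property) paired with a value domain of allowed values. The minimal Python sketch below illustrates that decomposition; the class names and the example "Patient Gender" element are hypothetical illustrations, not caDSR content or the caCORE SDK API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataElementConcept:
    """ISO 11179: an object class combined with a property of that class."""
    object_class: str
    property_name: str


@dataclass(frozen=True)
class ValueDomain:
    """ISO 11179: the set of values a data element may take."""
    datatype: type
    permissible_values: Optional[frozenset] = None  # None means not enumerated

    def accepts(self, value: object) -> bool:
        # A value must have the right type and, if enumerated, be a permitted value.
        if not isinstance(value, self.datatype):
            return False
        return self.permissible_values is None or value in self.permissible_values


@dataclass(frozen=True)
class DataElement:
    """A common data element: a concept paired with a value domain."""
    concept: DataElementConcept
    value_domain: ValueDomain

    @property
    def name(self) -> str:
        return f"{self.concept.object_class} {self.concept.property_name}"

    def validate(self, value: object) -> bool:
        return self.value_domain.accepts(value)


if __name__ == "__main__":
    # Hypothetical example element, not taken from caDSR.
    gender = DataElement(
        DataElementConcept("Patient", "Gender"),
        ValueDomain(str, frozenset({"Male", "Female", "Unknown"})),
    )
    print(gender.name)                # Patient Gender
    print(gender.validate("Female"))  # True
    print(gender.validate("F"))       # False
```

Sharing the concept and the value domain, rather than site-specific field names, is what lets two systems agree on what a field means and which values it may hold.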
55caGrid
- Grid Infrastructure for caBIG
- caGrid Components
- Language (metadata, ontologies)
- Security
- Advertisement and Discovery
- Workflow
- Grid Service Graphical Development Toolkit
56Diagram: caCORE (caBIO, caDSR, EVS) at NCICB, a data mart, and research-center repositories (gene expression data, clinical data, proteomics data, genomics data, tissue bank, analysis tools), with these services:
- Data Services
- Analytical Services
- Annotation Services
- Service Advertisement
- Service Discovery
- Service Query
- Semantic mapping
- Security Services
57caGrid 1.0 Security Needs
- Authentication
- Process of determining whether someone or something is, in fact, who or what it is declared to be.
- Authorization
- Process of determining if an authenticated user may do something on a given resource.
- Can User X perform Operation Y on Resource Z?
- Trust Management
- Supports applications and services in deciding whether or not signers of digital credentials/user attributes can be trusted.
- Secure Communication
- The ability to guarantee the integrity and/or privacy of messages between two parties
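The authorization question on this slide, "Can User X perform Operation Y on Resource Z?", can be illustrated with a minimal Python sketch that checks authentication first and then looks up an explicit (user, operation, resource) grant. This is a hypothetical, in-memory illustration: `Grant`, `AccessPolicy`, and the user and resource names are invented for the example, it does not reflect caGrid's actual security services, and credential checking and trust management are stubbed out.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Grant:
    """Permission for one user to perform one operation on one resource."""
    user: str
    operation: str
    resource: str


@dataclass
class AccessPolicy:
    authenticated: set[str] = field(default_factory=set)
    grants: set[Grant] = field(default_factory=set)

    def authenticate(self, user: str) -> None:
        # Stub: a real system would verify a credential before recording the user.
        self.authenticated.add(user)

    def grant(self, user: str, operation: str, resource: str) -> None:
        self.grants.add(Grant(user, operation, resource))

    def can(self, user: str, operation: str, resource: str) -> bool:
        """Can User X perform Operation Y on Resource Z?"""
        if user not in self.authenticated:
            return False  # authorization only applies to authenticated users
        return Grant(user, operation, resource) in self.grants


if __name__ == "__main__":
    policy = AccessPolicy()
    policy.grant("alice", "query", "tissue-bank")
    policy.grant("bob", "query", "tissue-bank")
    policy.authenticate("alice")
    print(policy.can("alice", "query", "tissue-bank"))   # True
    print(policy.can("alice", "delete", "tissue-bank"))  # False: no grant
    print(policy.can("bob", "query", "tissue-bank"))     # False: not authenticated
```

Keeping the two checks separate mirrors the slide's distinction between authentication and authorization: a grant is never consulted for a user whose identity has not been established.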
58Authorization Notional Architecture
Courtesy of Kenneth Lin, BAH
59caGrid Trust Management
60A caGrid Illustration: Virtual PACS
- Present a PACS interface to analytical and data sources on the grid.
- Use your own DICOM workstation
- Virtual PACS federates services on the grid using caGrid
61Infrastructure: Contemporary
- A compatibility evaluation process for caBIG
program projects and a certification process for
externally developed tools are established
- A rich set of harmonized standards and
vocabularies continues to grow in size
- Tooling available to provide site-specific
vocabularies and ontology management and support
- Mentors actively working in caBIG Community and
beyond to ensure consistency across key projects
and adherence to caBIG goals
- APIs with common interfaces facilitate
scientific workflows
Many applications grid-enabled (e.g., GenePattern, Reactome)
Instantiated formal process for evaluation and harmonization
End-user portal and security infrastructure available
NCICB-housed infrastructure for CDEs and vocabularies
62Infrastructure: The Future
- A rich set of community developed harmonized
standards and vocabularies continues to grow
- Tooling available to provide site specific
vocabularies and ontology management and support
- Certification process for externally developed
tools
- Mentors actively working in caBIG Community and
beyond to ensure consistency across key
projects/adherence to caBIG goals
- Developed standards increase in number
Mechanisms exist for the community to develop and harmonize standards and compatibility guidelines
Functional applications part of standard practice/fully deployed on the grid
Multiple sites host portions of the federated, scalable, standards-based infrastructure
NCICB-housed infrastructure for CDEs and vocabularies
Vocabulary services are federated
63caBIG Tools Today
TBPT
CTMS
ICR
64caBIG Tools Tomorrow
ICR
65caBIG Tools The Future
ICR
66caBIG - Interaction Mechanisms
- For all participants
- Annual meeting
- Online Town Hall quarterly
- Addresses solicited questions
- Monthly program update newsletter (big picture)
- "What's Big This Week" weekly newsletter (e.g., workspace meeting schedule)
- For Cancer Center Directors
- Directors' newsletter
- For Workspace participants
- Monthly teleconferences (more frequently as needed)
- Quarterly meeting (face to face)
- For all participants and the general public
- caBIG website
67caBIG Involves a Large Community with a Wide
Range of Interests
9Star Research; Albert Einstein; Ardais; Argonne National Laboratory; Burnham Institute; California Institute of Technology-JPL; City of Hope; Clinical Trial Information Service (CTIS); Cold Spring Harbor; Columbia University-Herbert Irving; Consumer Advocates in Research and Related Activities (CARRA); Dartmouth-Norris Cotton; Data Works Development; Department of Veterans Affairs; Drexel University; Duke University; EMMES Corporation; First Genetic Trust; Food and Drug Administration; Fox Chase; Fred Hutchinson; GE Global Research Center; Georgetown University-Lombardi; IBM; Indiana University; Internet 2; Jackson Laboratory; Johns Hopkins-Sidney Kimmel; Lawrence Berkeley National Laboratory; Massachusetts Institute of Technology; Mayo Clinic; Memorial Sloan Kettering; Meyer L. Prentis-Karmanos; New York University; Northwestern University-Robert H. Lurie; Ohio State University-Arthur G. James/Richard Solove; Oregon Health and Science University; Roswell Park Cancer Institute; St Jude Children's Research Hospital; Thomas Jefferson University-Kimmel; Translational Genomics Research Institute; Tulane University School of Medicine; University of Alabama at Birmingham; University of Arizona; University of California Irvine-Chao Family; University of California, San Francisco; University of California-Davis; University of Chicago; University of Colorado; University of Hawaii; University of Iowa-Holden; University of Michigan; University of Minnesota; University of Nebraska; University of North Carolina-Lineberger; University of Pennsylvania-Abramson; University of Pittsburgh; University of South Florida-H. Lee Moffitt; University of Southern California-Norris; University of Vermont; University of Wisconsin; Vanderbilt University-Ingram; Velos; Virginia Commonwealth University-Massey; Virginia Tech; Wake Forest University; Washington University-Siteman; Wistar; Yale University
68"If caBIG accomplishes its mission and creates a robust grid for translational and clinical research within the cancer community, it will be deemed a failure."
- Bob Robbins (Fred Hutchinson Cancer Research
Center), at the initial Strategic Planning
Workspace meeting
69Prevention is Better than Cure
- Desiderius Erasmus (1466-1536)
Embedding caBIG in the larger biomedical
research community
70The Future
- A worldwide biomedical grid community
- Bringing translational and clinical research to
personalized medicine