Title: Service-enabling Biomedical Research Enterprise
2. Introduction
- Life sciences have witnessed a flurry of innovations triggered by the sequencing of the human genome, as well as the genomes of other organisms.
- The area of translational medicine aims to improve communication between basic and clinical science, to allow more therapeutic and diagnostic insights.
3. Translational medicine
- From bench to bedside
- Exchange ideas, information and knowledge across organizational, governance, socio-cultural, political and national boundaries.
- Currently mediated by the internet and an exponentially increasing number of resources.
- Digital resources: scientific literature, experimental data, and curated annotation (metadata), both human- and machine-generated. Examples: BLAST searches, NCBI Taxonomy.
4. Driving principles
- Key requirement: a large volume of data must be managed. How? Transform the data so that it is:
  - Digital
  - Machine readable
  - Capable of being filtered
  - Aggregated
  - Transformed automatically
- Context information: capture use and meaning along with content.
- Knowledge integration combines data from research in mouse genetics, cell biology, animal neuropsychology, protein biology, neuropathology, and other areas.
- Drug discovery, systems biology and personalized medicine receive particular attention. All three rely heavily on integrating and interpreting data produced by experiments.
- The data are heterogeneous.
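The following sketch makes the driving principles concrete. It is a hypothetical illustration, not part of any BioSem implementation. The `Record` fields, source names, gene placeholders and scores are invented. The data is machine readable (typed records), so it can be filtered by disease, aggregated by gene, and transformed automatically into subject-predicate-object statements.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """A hypothetical machine-readable experimental record."""
    source: str   # repository the record came from
    gene: str     # gene the measurement refers to
    disease: str  # disease context of the experiment
    score: float  # numeric result; its meaning is assumed


def filter_records(records, disease):
    """Filter: keep only the records for one disease context."""
    return [r for r in records if r.disease == disease]


def aggregate_by_gene(records):
    """Aggregate: average the scores of all records for each gene."""
    groups = defaultdict(list)
    for r in records:
        groups[r.gene].append(r.score)
    return {gene: sum(scores) / len(scores) for gene, scores in groups.items()}


def to_statements(records):
    """Transform: rewrite each record as (subject, predicate, object) statements."""
    statements = []
    for i, r in enumerate(records):
        subject = f"record:{i}"
        statements.append((subject, "from_source", r.source))
        statements.append((subject, "about_gene", r.gene))
        statements.append((subject, "in_disease", r.disease))
    return statements


if __name__ == "__main__":
    data = [  # hypothetical values
        Record("lab_a", "GENE_X", "PD", 1.0),
        Record("lab_b", "GENE_X", "PD", 0.5),
        Record("lab_b", "GENE_Y", "AD", 0.25),
    ]
    pd_records = filter_records(data, "PD")
    print(aggregate_by_gene(pd_records))  # {'GENE_X': 0.75}
    print(to_statements(pd_records)[:3])
```

The transformation step matters most here. Once records become uniform statements, data from different sources can be combined without knowing each source's layout.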
5. BioSem Enterprise Architecture
[Figure: BioSem enterprise architecture. Components: search; transform results (e.g., integrate, generate metadata); dissemination of results; clinical experiments (e.g., drug discovery); diagnostic tools; clinical data (e.g., JNI); research knowledge (e.g., BLAST); ontology; academic knowledge (e.g., cell, psychology, molecular); treatment methods.]
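The architecture can be read as a three-stage pipeline. First, search the knowledge sources. Second, transform the results by integrating them and generating metadata. Third, disseminate them to clinical applications. The sketch below assumes that reading. The stage functions, source names and consumer applications are all hypothetical.

```python
def search(sources, term):
    """Stage 1: find entries in any knowledge source that mention the term."""
    return [
        {"source": name, "text": text}
        for name, entries in sources.items()
        for text in entries
        if term.lower() in text.lower()
    ]


def transform(hits, term):
    """Stage 2: integrate the hits into one package and generate metadata for it."""
    return {
        "term": term,
        "results": hits,
        "metadata": {
            "hit_count": len(hits),
            "sources": sorted({hit["source"] for hit in hits}),
        },
    }


def disseminate(package, consumers):
    """Stage 3: deliver the same integrated package to each downstream application."""
    return {name: consume(package) for name, consume in consumers.items()}


if __name__ == "__main__":
    sources = {  # hypothetical stand-ins for clinical, research and academic knowledge
        "clinical_data": ["PD cohort notes"],
        "research_knowledge": ["BLAST hit for a PD gene", "liver study"],
        "academic_knowledge": ["cell biology of PD"],
    }
    consumers = {  # hypothetical downstream applications
        "drug_discovery": lambda pkg: pkg["metadata"]["hit_count"],
        "diagnostic_tools": lambda pkg: pkg["metadata"]["sources"],
    }
    package = transform(search(sources, "PD"), "PD")
    print(disseminate(package, consumers))
```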
6. Use case
- Parkinson's disease (PD)
- Systems physiology perspective
- Cellular and molecular biology perspective
- Pharmacology relating to chemical compounds that bind to receptors
- Example query: "Show me the neuronal components that bind to a ligand which is a therapeutic agent in Parkinson's disease, in reach of the dopaminergic neurons in the substantia nigra."
- Domain-specific shared semantics and classifications
- Ontologies can help map among the domains and support seamless integration and interoperation.
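The sketch below shows how an ontology maps among domains. It assumes each perspective uses its own term for a shared concept. The vocabularies and the `shared:` concept identifiers are hypothetical. Translating every local term to a shared identifier lets a query phrased in one domain's vocabulary find matches recorded in the others.

```python
# Hypothetical domain vocabularies: local term -> shared ontology concept.
DOMAIN_MAPPINGS = {
    "systems_physiology": {"dopaminergic neuron": "shared:DopaminergicNeuron"},
    "cell_biology": {"DA neuron": "shared:DopaminergicNeuron"},
    "pharmacology": {"D1 receptor": "shared:D1Receptor"},
}


def to_shared(domain, term):
    """Translate a domain-specific term into its shared concept, or None if unmapped."""
    return DOMAIN_MAPPINGS.get(domain, {}).get(term)


def equivalent_terms(concept):
    """List every (domain, term) pair that maps to the same shared concept."""
    return sorted(
        (domain, term)
        for domain, mapping in DOMAIN_MAPPINGS.items()
        for term, shared in mapping.items()
        if shared == concept
    )


if __name__ == "__main__":
    concept = to_shared("systems_physiology", "dopaminergic neuron")
    print(equivalent_terms(concept))
```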
7. Development of ontologies
- Manual interaction between ontologists and domain experts
- Textual descriptions are used to add to this base.
- Link pre-existing ontologies for extensive coverage.
8. Ontology design and creation approach (Fig. 5.1)
The approach draws on three inputs:
- Subject matter knowledge (text): identify core terms and phrases; map phrases to relationships between classes; model terms using ontological constructs (classes, properties); arrange classes and relationships in subsumption hierarchies.
- Information queries: identify new classes and relationships; refine subsumption hierarchies.
- Pre-existing classifications and ontologies: re-use classes and relationships; extend subsumption hierarchies.
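All three inputs end in adding or refining classes in a subsumption hierarchy. The following is a minimal sketch of such a hierarchy in plain Python rather than an ontology language such as OWL. The `Ontology` class and all class names are hypothetical, including the external superclass.

```python
class Ontology:
    """A tiny in-memory ontology: named classes in a subsumption (subclass) hierarchy."""

    def __init__(self):
        self.parents = {}  # class name -> set of direct superclasses

    def add_class(self, name, *superclasses):
        """Add a class, or attach new superclasses to an existing one."""
        self.parents.setdefault(name, set()).update(superclasses)
        for sup in superclasses:
            self.parents.setdefault(sup, set())

    def ancestors(self, name):
        """Every class that subsumes `name`, following subclass links transitively."""
        seen, stack = set(), list(self.parents.get(name, ()))
        while stack:
            cls = stack.pop()
            if cls not in seen:
                seen.add(cls)
                stack.extend(self.parents.get(cls, ()))
        return seen

    def is_subclass_of(self, sub, sup):
        return sup in self.ancestors(sub)


if __name__ == "__main__":
    onto = Ontology()
    # From subject matter text: core terms arranged in a hierarchy.
    onto.add_class("Disease")
    onto.add_class("NeurodegenerativeDisease", "Disease")
    onto.add_class("ParkinsonsDisease", "NeurodegenerativeDisease")
    # From information queries: a new class that the queries need.
    onto.add_class("Receptor", "Protein")
    # From a re-used ontology: extend the hierarchy upward (hypothetical external class).
    onto.add_class("Disease", "external:DiseaseOrDisorder")
    print(onto.is_subclass_of("ParkinsonsDisease", "external:DiseaseOrDisorder"))
```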
9. Identifying concepts and hierarchies
- Start from the text describing PD (p. 105).
- Study the analysis of the text.
- Based on the analysis, identify the important ontological concepts relevant to PD:
  - Genes
  - Proteins
  - Genetic mutations
  - Diseases
- See Fig. 5.2.
- The next step is to identify the relationships among the concepts.
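Each relationship can be recorded together with the classes it connects (its domain and range). This lets statements be checked for consistency. The relation names and instance identifiers below are hypothetical examples built on the four PD concepts listed above. The check uses exact class matches and ignores subclasses, to keep the sketch short.

```python
# Hypothetical relationships among the PD concepts: relation -> (domain, range).
RELATIONS = {
    "encodes": ("Gene", "Protein"),
    "mutation_of": ("GeneticMutation", "Gene"),
    "associated_with": ("GeneticMutation", "Disease"),
}


def violations(statements, types):
    """Return statements whose subject or object lacks the class the relation expects."""
    bad = []
    for subj, rel, obj in statements:
        if rel not in RELATIONS:
            bad.append((subj, rel, obj))  # relation not yet in the ontology
            continue
        domain, range_ = RELATIONS[rel]
        if types.get(subj) != domain or types.get(obj) != range_:
            bad.append((subj, rel, obj))
    return bad


if __name__ == "__main__":
    types = {"gene1": "Gene", "protein1": "Protein", "mut1": "GeneticMutation", "pd": "Disease"}
    statements = [
        ("gene1", "encodes", "protein1"),
        ("mut1", "mutation_of", "gene1"),
        ("mut1", "associated_with", "pd"),
        ("protein1", "encodes", "gene1"),  # wrong direction: flagged
    ]
    print(violations(statements, types))
```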
10. Identifying and extracting relationships
11. Extending the ontology based on information queries
- Consider various queries and identify the concepts and relationships that need to be part of the PD ontology.
- These concepts are needed to retrieve information and knowledge from the system.
- This leads to additional new concepts (see Fig. 5.4).
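The query-driven step amounts to a set difference. Collect the classes and relations a query uses, then subtract what the ontology already contains. In the sketch below, the query patterns are the relationships that the slide 16 sample query needs. The starting ontology holds the four concepts from slide 9 and, by assumption, no relations yet. Representing both as Python sets is an illustrative choice.

```python
def required_vocabulary(query_patterns):
    """Collect the classes and relations used by (class, relation, class) query patterns."""
    classes, relations = set(), set()
    for subj_cls, rel, obj_cls in query_patterns:
        classes.update((subj_cls, obj_cls))
        relations.add(rel)
    return classes, relations


def missing_vocabulary(query_patterns, known_classes, known_relations):
    """Return the classes and relations a query needs that the ontology lacks."""
    classes, relations = required_vocabulary(query_patterns)
    return classes - known_classes, relations - known_relations


SAMPLE_QUERY = [
    ("Compartment", "located_on", "Neuron"),
    ("Receptor", "located_in", "Compartment"),
    ("Ligand", "binds_to", "Receptor"),
    ("Ligand", "associated_with", "Disease"),
]

if __name__ == "__main__":
    known_classes = {"Gene", "Protein", "GeneticMutation", "Disease"}
    known_relations = set()  # assumption: no relations modelled yet
    new_classes, new_relations = missing_vocabulary(SAMPLE_QUERY, known_classes, known_relations)
    print(sorted(new_classes))
    print(sorted(new_relations))
```

The missing classes (Compartment, Ligand, Neuron, Receptor) are the kind of additional concepts this step adds to the ontology.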
12. PD: adding concepts to support information queries
13. Ontology re-use
- It is desirable to re-use the ontologies and vocabularies developed in the healthcare and life-sciences fields.
- Diseases: PD information can be re-used for Huntington's and Alzheimer's diseases. PD can re-use information from the International Classification of Diseases (ICD) and SNOMED.
- Genes: more genes and genomic concepts, such as proteins and pathways, are added to ontologies. Consider connecting to the Gene Ontology.
- Neurological concepts: consider using NeuroNames (2007).
- Enzymes: concepts related to enzymes and other chemicals may be required; consider using Enzyme Nomenclature (2007).
- Be aware of inconsistencies and circularities.
- Multiple models may emerge; the choice among them should be based on use cases and functional requirements.
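Linking pre-existing ontologies can introduce circularities, for example when a mapping accidentally makes a class its own superclass. The following is a minimal sketch of how to catch this. It assumes each hierarchy is a dictionary from a class name to its direct superclasses. The class names and the faulty mapping are hypothetical. The sketch merges the hierarchies and runs a depth-first search for a cycle.

```python
def merge_hierarchies(*hierarchies):
    """Union several class -> direct-superclasses hierarchies into one."""
    merged = {}
    for hierarchy in hierarchies:
        for cls, supers in hierarchy.items():
            merged.setdefault(cls, set()).update(supers)
    return merged


def find_cycle(parents):
    """Return one subclass cycle (first class repeated at the end), or None if acyclic.

    Depth-first search: reaching a superclass that is still on the current
    path (GREY) closes a cycle.
    """
    WHITE, GREY, BLACK = 0, 1, 2
    color, path = {}, []

    def visit(cls):
        color[cls] = GREY
        path.append(cls)
        for sup in parents.get(cls, ()):
            state = color.get(sup, WHITE)
            if state == GREY:
                return path[path.index(sup):] + [sup]
            if state == WHITE:
                cycle = visit(sup)
                if cycle:
                    return cycle
        path.pop()
        color[cls] = BLACK
        return None

    for cls in parents:
        if color.get(cls, WHITE) == WHITE:
            cycle = visit(cls)
            if cycle:
                return cycle
    return None


if __name__ == "__main__":
    local = {
        "ParkinsonsDisease": {"NeurodegenerativeDisease"},
        "NeurodegenerativeDisease": {"Disease"},
    }
    reused = {"Disease": {"ParkinsonsDisease"}}  # hypothetical faulty mapping
    print(find_cycle(local))
    print(find_cycle(merge_hierarchies(local, reused)))
```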
14. Data sources
- To answer the query posed in slide 6, three data sources need to be integrated:
  - Neuron database
  - PDSP Ki database
  - PubChem
15. Data integration
- A centralized approach, where data available through web-based interfaces is converted into RDF and stored in a centralized repository.
- A federated approach, where data continues to reside in the existing repositories and an RDF mediator converts the underlying data into RDF format.
- RDF allows a focus on the logical structure of information, in contrast to a purely representational format (XML) or a storage format (relational).
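The difference between the two approaches can be sketched with a toy repository and a toy mediator. Everything here is hypothetical. The row layout, the adapter function and the two classes stand in for real web interfaces and a real RDF mediator. Triples are plain Python tuples rather than objects from an RDF library. The sketch shows one behavioral difference: the centralized store reflects the sources as they were when loaded, while the mediator converts the current data at query time.

```python
def compartments_to_triples(rows):
    """Hypothetical adapter: relational-style rows -> located_on triples."""
    return {(row["compartment"], "located_on", row["neuron"]) for row in rows}


class CentralizedStore:
    """Convert every source to triples once and keep them in one repository."""

    def __init__(self, adapters):
        self.triples = set()
        for fetch_rows, to_triples in adapters:
            self.triples |= to_triples(fetch_rows())

    def query(self, predicate):
        return {t for t in self.triples if t[1] == predicate}


class FederatedMediator:
    """Leave data in its repositories; convert to triples when a query arrives."""

    def __init__(self, adapters):
        self.adapters = adapters

    def query(self, predicate):
        results = set()
        for fetch_rows, to_triples in self.adapters:
            results |= {t for t in to_triples(fetch_rows()) if t[1] == predicate}
        return results


if __name__ == "__main__":
    neuron_rows = [{"compartment": "compartment1", "neuron": "neuron1"}]
    adapters = [(lambda: neuron_rows, compartments_to_triples)]
    central = CentralizedStore(adapters)
    mediator = FederatedMediator(adapters)

    neuron_rows.append({"compartment": "compartment2", "neuron": "neuron1"})  # source changes
    print(len(central.query("located_on")))   # 1: loaded before the change
    print(len(mediator.query("located_on")))  # 2: converted at query time
```

The trade-off runs the other way for speed. The centralized store answers from one local set, while the mediator repeats the conversion against every source on each query.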
16. Mapping ontological concepts to RDF graphs
- The sample query discussed earlier results in these concepts and relationships:
  - Compartment located_on Neuron
  - Receptor located_in Compartment
  - Ligand binds_to Receptor
  - Ligand associated_with Disease
- The next task is to map these onto RDF graphs in the underlying data sources.
- RDF graphs are extracted using the ontological definitions, the data sources, SPARQL queries, and namespaces.
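The four relationships above form a graph pattern with shared variables, much like the basic graph pattern in a SPARQL `WHERE` clause. The following is a minimal sketch of matching such a pattern over in-memory triples. It is not a SPARQL engine. The `match` function and the `ex:` identifiers in the example data are hypothetical.

```python
def _unify(term, value, binding):
    """Match one pattern term against one triple value, extending `binding` in place."""
    if term.startswith("?"):
        if term in binding:
            return binding[term] == value
        binding[term] = value
        return True
    return term == value


def match(triples, patterns, binding=None):
    """Yield every variable binding that satisfies all (s, p, o) patterns.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        if all(_unify(t, v, candidate) for t, v in zip(first, triple)):
            yield from match(triples, rest, candidate)


SAMPLE_QUERY = [
    ("?compartment", "located_on", "?neuron"),
    ("?receptor", "located_in", "?compartment"),
    ("?ligand", "binds_to", "?receptor"),
    ("?ligand", "associated_with", "?disease"),
]

if __name__ == "__main__":
    triples = {  # hypothetical identifiers, not taken from the real databases
        ("ex:dendrite1", "located_on", "ex:neuron1"),
        ("ex:receptor1", "located_in", "ex:dendrite1"),
        ("ex:ligand1", "binds_to", "ex:receptor1"),
        ("ex:ligand1", "associated_with", "ex:disease1"),
        ("ex:ligand2", "binds_to", "ex:receptor1"),  # no disease link: not returned
    }
    for row in match(triples, SAMPLE_QUERY):
        print(row)
```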
17. Generation and merging of RDF graphs
[Figure: RDF graphs generated from the Neuron Database, the PDSP KI Database and the PubChem database. Nodes, each labelled with a URI: D_Neuron, Neuron, D_Dendrite, D1, 5-H Tryptamine, Parkinson's disease. Edges: type, located_in, binds_to, associated_with. D1 and 5-H Tryptamine each appear in more than one source graph.]
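Nodes that carry the same URI in different source graphs denote the same resource. For graphs without blank nodes, merging therefore reduces to a set union of triples. A full RDF merge also keeps blank nodes from different graphs apart, which this sketch ignores. The sketch below uses hypothetical `ex:` URIs for three per-source graphs. It shows that a path from a neuron to a disease exists only after the merge.

```python
from collections import deque


def merge_graphs(*graphs):
    """Merge graphs whose nodes are all URIs: identical URIs unify, so union suffices."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def connected(graph, start, goal):
    """Breadth-first search, treating each triple as an undirected edge."""
    adjacency = {}
    for s, _, o in graph:
        adjacency.setdefault(s, set()).add(o)
        adjacency.setdefault(o, set()).add(s)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


if __name__ == "__main__":
    # Hypothetical per-source graphs that share URIs.
    neuron_db = {("ex:dendrite1", "located_on", "ex:neuron1"),
                 ("ex:receptor1", "located_in", "ex:dendrite1")}
    pdsp_ki = {("ex:ligand1", "binds_to", "ex:receptor1")}
    pubchem = {("ex:ligand1", "associated_with", "ex:disease1")}
    merged = merge_graphs(neuron_db, pdsp_ki, pubchem)
    print(connected(neuron_db, "ex:neuron1", "ex:disease1"))  # False
    print(connected(merged, "ex:neuron1", "ex:disease1"))     # True
```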
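18. Integrated RDF graph
[Figure: the integrated RDF graph after merging. Each shared node (D1, 5-H Tryptamine) now appears once, connecting D_Neuron, Neuron, D_Dendrite, D1, 5-H Tryptamine and Parkinson's disease through type, located_in, binds_to and associated_with edges.]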
19. Assignment 2
- Consider the PD case study, which used an ontological approach to querying distributed databases.
- Discuss 10 reasons for using this approach as opposed to the common SQL query and relational database approach.
- Why are Google, Yahoo or MSN search not good enough for searching biological databases?
- Discuss the centralized and federated approaches to data integration in the context of this case study.
- Submit a softcopy of the document in the digital drop box.
- How to do this? Read Chapter 5, then read it again. The answers can be formed from the information provided there and from your experience with relational database systems.
20. Summary
- Semantic web technologies provide an attractive informatics foundation for enabling the bench-to-bedside vision.
- Many areas of biomedical research, including drug discovery, systems biology and personalized medicine, rely heavily on integrating and interpreting heterogeneous data sets.
- This is part of ongoing work being performed in the W3C Healthcare and Life Sciences Interest Group.