Title: The Future of Biomedical Informatics
1 The Future of Biomedical Informatics
- Barry Smith
- University at Buffalo
- http://ontology.buffalo.edu/smith
2
- Biomedical Informatics Needs Data
- The Problem of Local Coding Schemes
- NIH Policies for Data Reusability and the Growth of Clinical Research Consortia
- Is SNOMED the Solution?
- The Gene Ontology
- The OBO Foundry
- The National Center for Biomedical Ontology
- Ontology in Buffalo
3 Biomedical Informatics Needs Data
4 Biomedical Informatics Needs Data
- Four sides of the equation of translational medicine
- Biological data, clinical data
- Access, usability
5 Problems of gaining access to clinical data
- privacy, security, liability
- incentives (value of data ...)
- costs (training ...)
6 Making data (re-)usable through standards
- Standards provide
- common structure and terminology
- single data source for review (less redundant data)
- Standards allow
- use of common tools and techniques
- common training
- single validation of data
7 Problems with standards
- Not all standards are of equal quality
- Once a bad standard is set in stone, you are creating problems for your children and for your children's children
- Standards, especially bad standards, have costs
8 The Problem of Local Coding Schemes
9 Multiple kinds of data in multiple kinds of silos
- Lab / pathology data
- Clinical trial data, including regulatory data
- Electronic Health Record data
- Patient histories (free text)
- Medical imaging
- Microarray data
- Protein chip data
- Flow cytometry
- Mass spectrometry data
- Genotype / SNP data
- Mouse data, fly data, chicken data ...
10 How to find your data?
- How to find other people's data?
- How to reason with data when you find it?
- How to work out what data you do not have?
- How to understand the significance of your own data from 3 years ago?
11 NIH Policies for Data Reusability and the Growth of Clinical Research Consortia
12 Sharing Research Data
Investigators submitting an NIH application seeking $500,000 or more in direct costs in any single year are expected to include a plan for data sharing or state why this is not possible (http://grants.nih.gov/grants/policy/data_sharing).
13 Program Announcement (PA) Number: PAR-07-425
Title: Data Ontologies for Biomedical Research (R01)
- NIH Blueprint for Neuroscience Research, http://neuroscienceblueprint.nih.gov/
- National Cancer Institute (NCI), http://www.cancer.gov
- National Center for Research Resources (NCRR), http://www.ncrr.nih.gov/
- National Eye Institute (NEI), http://www.nei.nih.gov/
- National Heart, Lung, and Blood Institute (NHLBI), http://http.nhlbi.nih.gov
- National Human Genome Research Institute (NHGRI), http://www.genome.gov
- National Institute on Alcohol Abuse and Alcoholism (NIAAA), http://www.niaaa.nih.gov/
- National Institute of Biomedical Imaging and Bioengineering (NIBIB), http://www.nibib.nih.gov/
- National Institute of Child Health and Human Development (NICHD), http://www.nichd.nih.gov/
- National Institute on Drug Abuse (NIDA), http://www.nida.nih.gov/
- National Institute of Environmental Health Sciences (NIEHS), http://www.niehs.nih.gov/
- National Institute of General Medical Sciences (NIGMS), http://www.nigms.nih.gov/
- National Institute of Mental Health (NIMH), http://www.nimh.nih.gov/
- National Institute of Neurological Disorders and Stroke (NINDS), http://www.ninds.nih.gov/
- National Institute of Nursing Research (NINR), http://www.ninr.nih.gov
Release/Posted Date: August 3, 2007
Letters of Intent Receipt Date(s): December 18, 2007, August 18, 2008, December 22, 2009, and August 21, 2009 for the four separate receipt dates.
14 Purpose. Optimal use of informatics tools and resources data sets depends upon explicit understandings of concepts related to the data upon which they compute. This is typically accomplished by a tool or resource adopting a formal controlled vocabulary and ontology ... that describes objects and the relationships between those objects in a formal way. ... this FOA solicits Research Project Grant (R01) applications from institutions/organizations that propose to develop an ontology that will make it possible for software to understand how two or more existing data sets relate to each other.
15
- Currently, there is no convenient way to map the knowledge that is contained in one data set to that in another data set, primarily because of differences in language and structure.
- ... in some areas there are emerging standards. Examples include:
- the Unified Medical Language System (UMLS),
- the Gene Ontology, http://www.geneontology.org/,
- the work supported by the caBIG project (https://cabig.nci.nih.gov/workspaces/VCDE/),
- ontologies listed at the Open Biomedical Ontology web site (http://obo.sourceforge.net/).
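The FOA's point that there is no convenient way to map the knowledge in one data set to that in another can be made concrete with a minimal sketch. It assumes that each data set's local codes have already been mapped to terms of one shared ontology; records from both sets can then be aligned by shared term rather than by local code. All codes, record identifiers and the `ONT:` term identifiers below are hypothetical placeholders, not taken from any real terminology.

```python
# Hypothetical mappings from each data set's local codes to shared ontology terms.
SITE_A_MAP = {"A17": "ONT:0001", "A22": "ONT:0002"}
SITE_B_MAP = {"b-03": "ONT:0001", "b-11": "ONT:0003"}

# Hypothetical records: (record id, local code).
SITE_A_RECORDS = [("patient-1", "A17"), ("patient-2", "A22"), ("patient-3", "A99")]
SITE_B_RECORDS = [("subject-9", "b-03"), ("subject-7", "b-11")]


def to_shared_terms(records, code_map):
    """Replace each local code with its shared term (None if unmapped)."""
    return [(record_id, code_map.get(code)) for record_id, code in records]


def align(records_a, records_b):
    """Group records from both data sets under the shared term they map to.

    Records whose code had no mapping are skipped, since nothing links them
    to the other data set.
    """
    by_term = {}
    for source, records in (("A", records_a), ("B", records_b)):
        for record_id, term in records:
            if term is not None:
                by_term.setdefault(term, []).append((source, record_id))
    return by_term


if __name__ == "__main__":
    aligned = align(
        to_shared_terms(SITE_A_RECORDS, SITE_A_MAP),
        to_shared_terms(SITE_B_RECORDS, SITE_B_MAP),
    )
    for term in sorted(aligned):
        print(term, aligned[term])
```

Here `patient-1` and `subject-9` end up under the same term even though their local codes differ, while `patient-3` (code `A99`, unmapped) drops out, which is exactly the gap a shared ontology is meant to close.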
16 This FOA will support limited awards, each of which focuses on integrating information between two (or a few very closely related) data sets in a single subject domain. The hope is that the developed vocabularies and ontologies will serve as nucleation points for other researchers in the area to build upon by adopting and extending the vocabularies and ontologies developed under this FOA. Applicants are expected to identify and adopt emerging standards (such as those listed above) whenever possible. Applicants are also strongly encouraged to federate their data under appropriate infrastructures when possible. One potential infrastructure is provided by the Biomedical Informatics Research Network (http://www.nbirn.net). The caBIG infrastructure (http://cabig.cancer.gov) is another well established infrastructure that researchers should consider.
17 NIH anticipates that once important data sets in a topical area have been unified, others in that area will adopt the emerging standard. The nucleation points should be able to interact with each other, e.g. through the use of tools that are made freely available to the research community, such as those created by the National Center for Biomedical Ontology (NCBO) (http://bioontology.org/) or by caBIG.
18 Another determinant of ontology acceptance is the degree to which the ontology conforms to best practices governing ontology design and construction. Criteria have been developed, and are undergoing empirical validation, by the Vocabulary and Common Data Element Work Group of caBIG. Other criteria have been specified by the OBO Foundry (http://obofoundry.org/). In this FOA, the applicant should specify the criteria with which the ontology will conform and the reasons that those criteria are relevant to the data sets being integrated by the proposed ontology.
19 Growth of Clinical and Translational Research Consortia
- Examples
- PharmGKB
- caBIG
- BIRN (Biomedical Informatics Research Network)
- BIRN Ontology Task Force
20 Is SNOMED the Solution?
21 medical records
SNOMED codes
22 The Systematized Nomenclature of Medicine (SNOMED)
- built by the College of American Pathologists
- now maintained by the International Health Terminology Standards Development Organisation (IHTSDO)
- access via the Virginia Tech SNOMED CT Browser: http://snomed.vetmed.vt.edu/
- (semi-) Open Source
23 SNOMED often includes non-perspicuous terms
- FullySpecifiedName: Coordination observable (observable entity)
- FullySpecifiedName: Coordination (observable entity)
24 and more
- Self-control behavior aggression (observable entity)
- Physical activity target light exercise (finding) is a type of physical activity finding (finding)
25 odd bunchings
- European is an ethnic group
- Other European in New Zealand (ethnic group) is an ethnic group
- Mixed ethnic census group is an ethnic group
- Flathead is an ethnic group
26 Poor modular development
- No clear strategy for improvement
- Difficult to use for coding
- A tax on world health information technology?
27 SNOMED embraces only some of the multiple kinds of siloed data
- Lab / pathology data
- Electronic Health Record data
- Patient histories
- Clinical trial data, including regulatory data
- Medical imaging
- Microarray data
- Protein chip data
- Flow cytometry
- Mass spectrometry data
- Genotype / SNP data
- Mouse data, fly data, chicken data ...
28 The Gene Ontology
30 The Gene Ontology
- Open Source
- Cross-Species
- Impressive annotation resource
- Impressive policies for maintenance
31 How to do Biology across the Genome?
- MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSFYED
EKSGLIKVVKFRTGAMDRKRSFEKVVISVMVGKNVKKFLTFVEDEPDFQG
GPISKYLIPKKINLMVYTLFQVHTLKFNRKDYDTLSLFYLNRGYYNELSF
RVLERCHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKYGYNLAP
YMFLLLHVDELSIFSAYQASLPGEKKVDTERLKRDLCPRKPIEIKYFSQI
CNDMMNKKDRLGDILHIILRACALNFGAGPRGGAGDEEDRSITNEEPIIP
SVDEHGLKVCKLRSPNTPRRLRKTLDAVKALLVSSCACTARDLDIFDDNN
GVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGISLLAFAGPQRNVYV
DDTTRRIQLYTDYNKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICALG
NSYDAFNHDPWMDVVGFEDPNQVTNRDISRIVLYSYMFLNTAKGCLVEYA
TFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGSRFETDLYESATSEL
MANHSVQTGRNIYGVDFSLTSVSGTTATLLQERASERWIQWLGLESDYHC
SFSSTRNAEDV
sequence of X chromosome in baker's yeast
32
MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSFYED
EKSGLIKVVKFRTGAMDRKRSFEKVVISVMVGKNVKKFLTFVEDEPDFQG
GPIPSKYLIPKKINLMVYTLFQVHTLKFNRKDYDTLSLFYLNRGYYNELS
FRVLERCHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKYGYNLA
PYMFLLLHVDELSIFSAYQASLPGEKKVDTERLKRDLCPRKPIEIKYFSQ
ICNDMMNKKDRLGDILHIILRACALNFGAGPRGGAGDEEDRSITNEEPII
PSVDEHGLKVCKLRSPNTPRRLRKTLDAVKALLVSSCACTARDLDIFDDN
NGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGISLLAFAGPQRNVY
VDDTTRRIQLYTDYNKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICAL
GNSYDAFNHDPWMDVVGFEDPNQVTNRDISRIVLYSYMFLNTAKGCLVEY
ATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGSRFETDLYESATSE
LMANHSVQTGRNIYGVDSFSLTSVSGTTATLLQERASERWIQWLGLESDY
HCSFSSTRNAEDVVAGEAASSNHHQKISRVTRKRPREPKSTNDILVAGQK
LFGSSFEFRDLHQLRLCYEIYMADTPSVAVQAPPGYGKTELFHLPLIALA
SKGDVEYVSFLFVPYTVLLANCMIRLGRRGCLNVAPVRNFIEEGYDGVTD
LYVGIYDDLASTNFTDRIAAWENIVECTFRTNNVKLGYLIVDEFHNFETE
VYRQSQFGGITNLDFDAFEKAIFLSGTAPEAVADAALQRIGLTGLAKKSM
DINELKRSEDLSRGLSSYPTRMFNLIKEKSEVPLGHVHKIRKKVESQPEE
ALKLLLALFESEPESKAIVVASTTNEVEELACSWRKYFRVVWIHGKLGAA
EKVSRTKEFVTDGSMQVLIGTKLVTEGIDIKQLMMVIMLDNRLNIIELIQ
GVGRLRDGGLCYLLSRKNSWAARNRKGELPPKEGCITEQVREFYGLESKK
GKKGQHVGCCGSRTDLSADTVELIERMDRLAEKQATASMSIVALPSSFQE
SNSSDRYRKYCSSDEDSNTCIHGSANASTNASTNAITTASTNVRTNATTN
ASTNATTNASTNASTNATTNASTNATTNSSTNATTTASTNVRTSATTTAS
INVRTSATTTESTNSSTNATTTESTNSSTNATTTESTNSNTSATTTASIN
VRTSATTTESTNSSTSATTTASINVRTSATTTKSINSSTNATTTESTNSN
TNATTTESTNSSTNATTTESTNSSTNATTTESTNSNTSAATTESTNSNTS
ATTTESTNASAKEDANKDGNAEDNRFHPVTDINKESYKRKGSQMVLLERK
KLKAQFPNTSENMNVLQFLGFRSDEIKHLFLYGIDIYFCPEGVFTQYGLC
KGCQKMFELCVCWAGQKVSYRRIAWEALAVERMLRNDEEYKEYLEDIEPY
HGDPVGYLKYFSVKRREIYSQIQRNYAWYLAITRRRETISVLDSTRGKQG
SQVFRMSGRQIKELYFKVWSNLRESKTEVLQYFLNWDEKKCQEEWEAKDD
TVVVEALEKGGVFQRLRSMTSAGLQGPQYVKLQFSRHHRQLRSRYELSLG
MHLRDQIALGVTPSKVPHWTAFLSMLIGLFYNKTFRQKLEYLLEQISEVW
LLPHWLDLANVEVLAADDTRVPLYMLMVAVHKELDSDDVPDGRFDILLCR
DSSREVGE
33 what cellular component?
what molecular function?
what biological process?
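The three questions correspond to GO's three namespaces. As a minimal sketch, assuming a hypothetical gene product `geneX` with made-up annotations, the code below groups a gene product's annotations by namespace. The three GO identifiers and names are real GO terms used only as examples; the namespace strings follow the ones used in GO's OBO files.

```python
# The three GO namespaces, one per question on the slide.
NAMESPACES = ("cellular_component", "molecular_function", "biological_process")

# A tiny illustrative subset of GO: term id -> (term name, namespace).
GO_TERMS = {
    "GO:0005634": ("nucleus", "cellular_component"),
    "GO:0003677": ("DNA binding", "molecular_function"),
    "GO:0006281": ("DNA repair", "biological_process"),
}

# Hypothetical annotations: (gene product, GO term id).
ANNOTATIONS = [
    ("geneX", "GO:0005634"),
    ("geneX", "GO:0003677"),
    ("geneX", "GO:0006281"),
]


def go_profile(gene, annotations, terms):
    """Group one gene product's annotated term names by GO namespace."""
    profile = {ns: [] for ns in NAMESPACES}
    for annotated_gene, term_id in annotations:
        if annotated_gene == gene:
            name, namespace = terms[term_id]
            profile[namespace].append(name)
    return profile


if __name__ == "__main__":
    for namespace, names in go_profile("geneX", ANNOTATIONS, GO_TERMS).items():
        print(f"{namespace}: {', '.join(names)}")
```

Run as a script, this prints one line per namespace: `cellular_component: nucleus`, `molecular_function: DNA binding`, `biological_process: DNA repair`.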
34 A strategy for translational medicine
- Sjöblom T, et al. analyzed 13,023 genes in 11 breast and 11 colorectal cancers and, using functional information captured by GO for given gene product types, identified 189 as being mutated at significant frequency and thus as providing targets for diagnostic and therapeutic intervention.
- Science. 2006 Oct 13;314(5797):268-74.
35 GO widely used
36 Benefits of GO
- links people to data
- links data together
- across species (human, mouse, yeast, fly ...)
- across granularities (molecule, cell, organ, organism, population)
- links medicine to biological science
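Linking data across species works because annotations from different organisms point at the same GO term identifiers. The sketch below uses hypothetical species-tagged gene products and annotations (the GO identifiers are only labels here). It collects the gene products annotated to each term and keeps only the terms used in more than one species.

```python
# Hypothetical annotations: (species, gene product, GO term id).
CROSS_SPECIES_ANNOTATIONS = [
    ("human", "geneH1", "GO:0006281"),
    ("mouse", "geneM1", "GO:0006281"),
    ("yeast", "geneY1", "GO:0006281"),
    ("human", "geneH2", "GO:0005634"),
    ("fly", "geneF1", "GO:0005634"),
    ("yeast", "geneY2", "GO:0003677"),
]


def shared_across_species(annotations):
    """Return {GO term: sorted (species, gene) pairs} for terms that are
    annotated in more than one species."""
    by_term = {}
    for species, gene, term in annotations:
        by_term.setdefault(term, set()).add((species, gene))
    return {
        term: sorted(members)
        for term, members in by_term.items()
        if len({species for species, _ in members}) > 1
    }


if __name__ == "__main__":
    for term, members in shared_across_species(CROSS_SPECIES_ANNOTATIONS).items():
        print(term, members)
```

The term annotated only in yeast (`GO:0003677` here) is filtered out; the other two each connect gene products from several organisms.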
37 The OBO Foundry
38 2003
- a shared portal for (so far) 58 ontologies
- (low regimentation)
- http://obo.sourceforge.net → NCBO BioPortal
41 Building out from the original GO
42 OBO Foundry Coordinators
- Ashburner (Cambridge)
- Smith (Buffalo)
- Mungall (Berkeley)
43 The goal
- all biological (biomedical) research data should cumulate to form a single, algorithmically processible whole
- http://obofoundry.org
44 FOUNDRY CRITERIA
- The ontology is open and available to be used by all.
- The ontology is in, or can be instantiated in, a common formal language.
- The developers of the ontology agree in advance to collaborate with developers of other OBO Foundry ontologies where domains overlap.
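One common formal language for OBO ontologies is the OBO flat-file format, in which each `[Term]` stanza carries tags such as `id`, `name` and `is_a`. The sketch below is a deliberately minimal reader, not a full OBO parser: it handles only those three tags in `[Term]` stanzas, and the `EX:` terms are hypothetical. It also shows the kind of algorithmic processing a shared format permits, here collecting every `is_a` ancestor of a term.

```python
# Hypothetical OBO snippet; the EX: identifiers are placeholders.
OBO_TEXT = """\
format-version: 1.2

[Term]
id: EX:0000001
name: anatomical entity

[Term]
id: EX:0000002
name: organ
is_a: EX:0000001 ! anatomical entity

[Term]
id: EX:0000003
name: heart
is_a: EX:0000002 ! organ
"""


def parse_obo(text):
    """Read [Term] stanzas into {id: {"name": ..., "is_a": [...]}}.

    Only the id, name and is_a tags are handled; other tags and
    non-Term stanzas (e.g. [Typedef]) are skipped.
    """
    terms = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("[") and line.endswith("]"):
            # Start of a new stanza; collect only [Term] stanzas.
            current = {"name": None, "is_a": []} if line == "[Term]" else None
            continue
        if current is None or ":" not in line:
            continue  # header lines, blank lines, or skipped stanzas
        tag, value = (part.strip() for part in line.split(":", 1))
        if tag == "id":
            terms[value] = current
        elif tag == "name":
            current["name"] = value
        elif tag == "is_a":
            # Drop the trailing "! comment" that names the parent.
            current["is_a"].append(value.split("!", 1)[0].strip())
    return terms


def ancestors(term_id, terms):
    """Collect every term reachable from term_id via is_a links."""
    found = set()
    stack = list(terms[term_id]["is_a"])
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            # Parents not defined in this file contribute no further links.
            stack.extend(terms.get(parent, {}).get("is_a", []))
    return found


if __name__ == "__main__":
    terms = parse_obo(OBO_TEXT)
    print(terms["EX:0000003"]["name"], sorted(ancestors("EX:0000003", terms)))
```

Real OBO files carry many more tags (for example `def`, `synonym` and `relationship`), which this sketch simply ignores.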
45 CRITERIA
- UPDATE: The developers of each ontology commit to its maintenance in light of scientific advance, and to soliciting community feedback for its improvement.
- ORTHOGONALITY: They commit to working with other Foundry members to ensure that, for any particular domain, there is community convergence on a single controlled vocabulary.
46 Consequences
- OBO Foundry is serving as a benchmark for improvements in discipline-focused terminology resources
- yielding calibration of existing terminologies and data resources and alignment of different views
47 Mature OBO Foundry ontologies (now undergoing reform)
- Cell Ontology (CL)
- Chemical Entities of Biological Interest (ChEBI)
- Foundational Model of Anatomy (FMA)
- Gene Ontology (GO)
- Phenotypic Quality Ontology (PaTO)
- Relation Ontology (RO)
- Sequence Ontology (SO)
48 Ontologies being built to satisfy Foundry principles ab initio
- Common Anatomy Reference Ontology (CARO)
- Ontology for Biomedical Investigations (OBI)
- Protein Ontology (PRO)
- RNA Ontology (RnaO)
- Subcellular Anatomy Ontology (SAO)
49 Ontologies in planning phase
- Biobank/Biorepository Ontology (BrO, part of OBI)
- Environment Ontology (EnvO)
- Immunology Ontology (ImmunO)
- Infectious Disease Ontology (IDO)
50 The National Center for Biomedical Ontology
51 NCBO
- National Center for Biomedical Ontology (NIH Roadmap Center)
- Stanford Medical Informatics
- University of California, San Francisco Medical Center
- Berkeley Drosophila Genome Project
- Cambridge University Department of Genetics
- The Mayo Clinic
- University at Buffalo Department of Philosophy
52 Ontology in Buffalo
53 Ontology Research Group in CoE
- Werner Ceusters
- Louis Goldberg
- Barry Smith
- Robert Arp
- Thomas Bittner
- Maureen Donnelly
- David Koepsell
- Ron Rudnicki
- Shahid Manzoor
54 Ontologies in Buffalo
- Common Anatomy Reference Ontology (CARO)
- Environment Ontology (EnvO)
- Foundational Model of Anatomy (FMA)
- Infectious Disease Ontology (IDO)
- MS Ontology
- Protein Ontology (PRO)
- Relation Ontology (RO)
55 Ontologies planned
- ICF Ontology
- Food Ontology
- Allergy Ontology
- Vaccine Ontology
- Ontology for Community-Based Medicine
- Psychiatry Ontology