The Future of Biomedical Informatics - PowerPoint PPT Presentation

1
The Future of Biomedical Informatics
  • Barry Smith
  • University at Buffalo
  • http://ontology.buffalo.edu/smith

2
  • Biomedical Informatics Needs Data
  • The Problem of Local Coding Schemes
  • NIH Policies for Data Reusability and the Growth
    of Clinical Research Consortia
  • Is SNOMED the Solution?
  • The Gene Ontology
  • The OBO Foundry
  • The National Center for Biomedical Ontology
  • Ontology in Buffalo

4
Biomedical Informatics Needs Data
  • Four sides of the equation of translational
    medicine
  • Biological data clinical data
  • Access usability

5
Problems of gaining access to clinical data
  • privacy, security, liability
  • incentives (value of data ...)
  • costs (training ...)

6
Making data (re-)usable through standards
  • Standards provide
  • common structure and terminology
  • single data source for review (less redundant
    data)
  • Standards allow
  • use of common tools and techniques
  • common training
  • single validation of data

7
Problems with standards
  • Not all standards are of equal quality
  • Once a bad standard is set in stone you are
    creating problems for your children and for your
    children's children
  • Standards, especially bad standards, have costs

8
  • Biomedical Informatics Needs Data
  • The Problem of Local Coding Schemes
  • NIH Policies for Data Reusability and the Growth
    of Clinical Research Consortia
  • Is SNOMED the Solution?
  • The Gene Ontology
  • The OBO Foundry
  • The National Center for Biomedical Ontology
  • Ontology in Buffalo

9
Multiple kinds of data in multiple kinds of silos
  • Lab / pathology data
  • Clinical trial data, including regulatory data
  • Electronic Health Record data
  • Patient histories (free text)
  • Medical imaging
  • Microarray data
  • Protein chip data
  • Flow cytometry
  • Mass spectrometry data
  • Genotype / SNP data
  • Mouse data, fly data, chicken data ...

10
How to find your data?
  • How to find other people's data?
  • How to reason with data when you find it?
  • How to work out what data you do not have?
  • How to understand the significance of your own
    data from 3 years ago?

11
  • Biomedical Informatics Needs Data
  • The Problem of Local Coding Schemes
  • NIH Policies for Data Reusability and the Growth
    of Clinical Research Consortia
  • Is SNOMED the Solution?
  • The Gene Ontology
  • The OBO Foundry
  • The National Center for Biomedical Ontology
  • Ontology in Buffalo

12
Sharing Research Data
Investigators submitting an NIH application seeking
$500,000 or more in direct costs in any single year
are expected to include a plan for data sharing or
state why this is not possible
(http://grants.nih.gov/grants/policy/data_sharing).
13
Program Announcement (PA) Number: PAR-07-425
Title: Data Ontologies for Biomedical Research (R01)
  • NIH Blueprint for Neuroscience Research
    (http://neuroscienceblueprint.nih.gov/)
  • National Cancer Institute (NCI)
    (http://www.cancer.gov)
  • National Center for Research Resources (NCRR)
    (http://www.ncrr.nih.gov/)
  • National Eye Institute (NEI)
    (http://www.nei.nih.gov/)
  • National Heart Lung and Blood Institute (NHLBI)
    (http://http.nhlbi.nih.gov)
  • National Human Genome Research Institute (NHGRI)
    (http://www.genome.gov)
  • National Institute on Alcohol Abuse and Alcoholism
    (NIAAA) (http://www.niaaa.nih.gov/)
  • National Institute of Biomedical Imaging and
    Bioengineering (NIBIB) (http://www.nibib.nih.gov/)
  • National Institute of Child Health and Human
    Development (NICHD) (http://www.nichd.nih.gov/)
  • National Institute on Drug Abuse (NIDA)
    (http://www.nida.nih.gov/)
  • National Institute of Environmental Health Sciences
    (NIEHS) (http://www.niehs.nih.gov/)
  • National Institute of General Medical Sciences
    (NIGMS) (http://www.nigms.nih.gov/)
  • National Institute of Mental Health (NIMH)
    (http://www.nimh.nih.gov/)
  • National Institute of Neurological Disorders and
    Stroke (NINDS) (http://www.ninds.nih.gov/)
  • National Institute of Nursing Research (NINR)
    (http://www.ninr.nih.gov)
Release/Posted Date: August 3, 2007
Letters of Intent Receipt Date(s): December 18, 2007,
August 18, 2008, December 22, 2009, and August 21,
2009 for the four separate receipt dates.
14
Purpose. Optimal use of informatics tools and
resources data sets depends upon explicit
understandings of concepts related to the data
upon which they compute. This is typically
accomplished by a tool or resource adopting a
formal controlled vocabulary and ontology ...
that describes objects and the relationships
between those objects in a formal way. ... this
FOA solicits Research Project Grant (R01)
applications from institutions/organizations
that propose to develop an ontology that will
make it possible for software to understand how
two or more existing data sets relate to each
other.
15
  • Currently, there is no convenient way to map the
    knowledge that is contained in one data set to
    that in another data set, primarily because of
    differences in language and structure.
  • ... in some areas there are emerging standards.
    Examples include:
  • the Unified Medical Language System (UMLS),
  • the Gene Ontology, http://www.geneontology.org/,
  • the work supported by the caBIG project
    (https://cabig.nci.nih.gov/workspaces/VCDE/),
  • ontologies listed at the Open Biomedical Ontology
    web site (http://obo.sourceforge.net/).
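
The mapping problem described on this slide can be illustrated with a small sketch. It assumes two hypothetical data sets whose records use different local codes for the same things, plus a hand-written lookup from each local code to a shared term identifier. The local codes, record fields, sample names, and function names are all invented for illustration; the GO identifiers serve only as examples of shared terms.

```python
# Hypothetical local coding schemes: each lab uses its own labels,
# and each label is mapped by hand to a shared ontology term id.
LAB_A_TO_SHARED = {"apopt": "GO:0006915", "nuc": "GO:0005634"}
LAB_B_TO_SHARED = {"cell-death-prog": "GO:0006915", "mito": "GO:0005739"}

# Invented example records; only the "code" field differs in vocabulary.
lab_a_records = [
    {"sample": "A1", "code": "apopt"},
    {"sample": "A2", "code": "nuc"},
]
lab_b_records = [
    {"sample": "B7", "code": "cell-death-prog"},
    {"sample": "B9", "code": "mito"},
]


def index_by_shared_term(records, mapping):
    """Group sample ids under the shared term their local code maps to."""
    index = {}
    for record in records:
        term = mapping.get(record["code"])
        if term is None:
            continue  # unmapped local code: this record cannot be linked
        index.setdefault(term, []).append(record["sample"])
    return index


def link_data_sets(index_a, index_b):
    """Return, per shared term, the samples from both sets that use it."""
    shared_terms = sorted(set(index_a) & set(index_b))
    return {term: (index_a[term], index_b[term]) for term in shared_terms}


index_a = index_by_shared_term(lab_a_records, LAB_A_TO_SHARED)
index_b = index_by_shared_term(lab_b_records, LAB_B_TO_SHARED)
links = link_data_sets(index_a, index_b)
```

Without the two lookup tables, nothing in the raw records shows that "apopt" and "cell-death-prog" describe the same process; the shared identifier is what makes the two data sets comparable.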

16
This FOA will support limited awards, each of
which focuses on integrating information between
two (or a few very closely related) data sets in
a single subject domain. The hope is that the
developed vocabularies and ontologies will serve
as nucleation points for other researchers in the
area to build upon by adopting and extending the
vocabularies and ontologies developed under this
FOA. Applicants are expected to identify and
adopt emerging standards (such as those listed
above) whenever possible. Applicants are also
strongly encouraged to federate their data under
appropriate infrastructures when possible. One
potential infrastructure is provided by the
Biomedical Informatics Research Network
(http://www.nbirn.net). The caBIG
infrastructure (http://cabig.cancer.gov) is
another well established infrastructure that
researchers should consider.
17
NIH anticipates that once important data sets in
a topical area have been unified, others in
that area will adopt the emerging standard. The
nucleation points should be able to interact with
each other, e.g. through the use of tools that
are made freely available to the research
community, such as those created by the National
Center for Biomedical Ontology (NCBO)
(http://bioontology.org/) or by caBIG.
18
Another determinant of ontology acceptance is the
degree to which the ontology conforms to best
practices governing ontology design and
construction. Criteria have been developed,
and are undergoing empirical validation, by the
Vocabulary and Common Data Element Work Group of
caBIG. Other criteria have been specified by the
OBO Foundry (http://obofoundry.org/). In this
FOA, the applicant should specify the criteria
with which the ontology will conform and the
reasons that those criteria are relevant to the
data sets being integrated by the proposed
ontology.
19
Growth of Clinical and Translational Research
Consortia
  • Examples
  • PharmGKB
  • caBIG
  • BIRN: Biomedical Informatics Research Network
  • BIRN Ontology Task Force

20
  • Biomedical Informatics Needs Data
  • The Problem of Local Coding Schemes
  • NIH Policies for Data Reusability and the Growth
    of Clinical Research Consortia
  • Is SNOMED the Solution?
  • The Gene Ontology
  • The OBO Foundry
  • The National Center for Biomedical Ontology
  • Ontology in Buffalo

21
medical records
SNOMED codes
22
  • The Systematized Nomenclature of Medicine
  • built by College of American Pathologists
  • now maintained by International Health
    Terminology Standards Development Organisation
  • access via Virginia Tech SNOMED CT Browser
    http://snomed.vetmed.vt.edu/
  • (semi-) Open Source

23
SNOMED often includes non-perspicuous terms
  • FullySpecifiedName
  • Coordination observable (observable entity)
  • FullySpecifiedName
  • Coordination (observable entity)

24
and more
  • Self-control behavior aggression (observable
    entity)
  • Physical activity target light exercise (finding)
  • is a type of physical activity finding (finding)

25
odd bunchings
  • European is a ethnic group
  • 6
  • Other European in New Zealand (ethnic group) is a
    ethnic group
  • Mixed ethnic census group is a ethnic group
  • Flathead is a ethnic group
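
One way to surface bunchings like these is to list every term asserted directly under a given parent. The following is a minimal sketch that uses only the four assertions quoted on this slide. The tuple representation and the function name are assumptions made for illustration and do not reflect SNOMED's actual release format.

```python
# (child, parent) pairs copied from the slide; the tuple layout is a
# hypothetical representation, not SNOMED's own file format.
IS_A = [
    ("European", "ethnic group"),
    ("Other European in New Zealand", "ethnic group"),
    ("Mixed ethnic census group", "ethnic group"),
    ("Flathead", "ethnic group"),
]


def direct_children(parent, assertions):
    """Return the terms asserted to be direct subtypes of `parent`."""
    return sorted(child for child, p in assertions if p == parent)


siblings = direct_children("ethnic group", IS_A)
for term in siblings:
    print(term)
```

Reading the sibling list side by side makes it easier to see when terms of very different scope have been placed at the same level.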

26
Poor modular development
  • No clear strategy for improvement
  • Difficult to use for coding
  • A tax on world health information technology?

27
SNOMED embraces only some of the multiple kinds
of siloed data
  • Lab / pathology data
  • Electronic Health Record data
  • Patient histories
  • Clinical trial data, including regulatory data
  • Medical imaging
  • Microarray data
  • Protein chip data
  • Flow cytometry
  • Mass spectrometry data
  • Genotype / SNP data
  • Mouse data, fly data, chicken data ...

28
  • Biomedical Informatics Needs Data
  • The Problem of Local Coding Schemes
  • NIH Policies for Data Reusability and the Growth
    of Clinical Research Consortia
  • Is SNOMED the Solution?
  • The Gene Ontology
  • The OBO Foundry
  • The National Center for Biomedical Ontology
  • Ontology in Buffalo

30
The Gene Ontology
  • Open Source
  • Cross-Species
  • Impressive annotation resource
  • Impressive policies for maintenance

31
How to do Biology across the Genome?
  • MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSFYED
    EKSGLIKVVKFRTGAMDRKRSFEKVVISVMVGKNVKKFLTFVEDEPDFQG
    GPISKYLIPKKINLMVYTLFQVHTLKFNRKDYDTLSLFYLNRGYYNELSF
    RVLERCHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKYGYNLAP
    YMFLLLHVDELSIFSAYQASLPGEKKVDTERLKRDLCPRKPIEIKYFSQI
    CNDMMNKKDRLGDILHIILRACALNFGAGPRGGAGDEEDRSITNEEPIIP
    SVDEHGLKVCKLRSPNTPRRLRKTLDAVKALLVSSCACTARDLDIFDDNN
    GVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGISLLAFAGPQRNVYV
    DDTTRRIQLYTDYNKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICALG
    NSYDAFNHDPWMDVVGFEDPNQVTNRDISRIVLYSYMFLNTAKGCLVEYA
    TFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGSRFETDLYESATSEL
    MANHSVQTGRNIYGVDFSLTSVSGTTATLLQERASERWIQWLGLESDYHC
    SFSSTRNAEDV

sequence of X chromosome in baker's yeast
32
  • MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSFYED
    EKSGLIKVVKFRTGAMDRKRSFEKVVISVMVGKNVKKFLTFVEDEPDFQG
    GPIPSKYLIPKKINLMVYTLFQVHTLKFNRKDYDTLSLFYLNRGYYNELS
    FRVLERCHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKYGYNLA
    PYMFLLLHVDELSIFSAYQASLPGEKKVDTERLKRDLCPRKPIEIKYFSQ
    ICNDMMNKKDRLGDILHIILRACALNFGAGPRGGAGDEEDRSITNEEPII
    PSVDEHGLKVCKLRSPNTPRRLRKTLDAVKALLVSSCACTARDLDIFDDN
    NGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGISLLAFAGPQRNVY
    VDDTTRRIQLYTDYNKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICAL
    GNSYDAFNHDPWMDVVGFEDPNQVTNRDISRIVLYSYMFLNTAKGCLVEY
    ATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGSRFETDLYESATSE
    LMANHSVQTGRNIYGVDSFSLTSVSGTTATLLQERASERWIQWLGLESDY
    HCSFSSTRNAEDVVAGEAASSNHHQKISRVTRKRPREPKSTNDILVAGQK
    LFGSSFEFRDLHQLRLCYEIYMADTPSVAVQAPPGYGKTELFHLPLIALA
    SKGDVEYVSFLFVPYTVLLANCMIRLGRRGCLNVAPVRNFIEEGYDGVTD
    LYVGIYDDLASTNFTDRIAAWENIVECTFRTNNVKLGYLIVDEFHNFETE
    VYRQSQFGGITNLDFDAFEKAIFLSGTAPEAVADAALQRIGLTGLAKKSM
    DINELKRSEDLSRGLSSYPTRMFNLIKEKSEVPLGHVHKIRKKVESQPEE
    ALKLLLALFESEPESKAIVVASTTNEVEELACSWRKYFRVVWIHGKLGAA
    EKVSRTKEFVTDGSMQVLIGTKLVTEGIDIKQLMMVIMLDNRLNIIELIQ
    GVGRLRDGGLCYLLSRKNSWAARNRKGELPPKEGCITEQVREFYGLESKK
    GKKGQHVGCCGSRTDLSADTVELIERMDRLAEKQATASMSIVALPSSFQE
    SNSSDRYRKYCSSDEDSNTCIHGSANASTNASTNAITTASTNVRTNATTN
    ASTNATTNASTNASTNATTNASTNATTNSSTNATTTASTNVRTSATTTAS
    INVRTSATTTESTNSSTNATTTESTNSSTNATTTESTNSNTSATTTASIN
    VRTSATTTESTNSSTSATTTASINVRTSATTTKSINSSTNATTTESTNSN
    TNATTTESTNSSTNATTTESTNSSTNATTTESTNSNTSAATTESTNSNTS
    ATTTESTNASAKEDANKDGNAEDNRFHPVTDINKESYKRKGSQMVLLERK
    KLKAQFPNTSENMNVLQFLGFRSDEIKHLFLYGIDIYFCPEGVFTQYGLC
    KGCQKMFELCVCWAGQKVSYRRIAWEALAVERMLRNDEEYKEYLEDIEPY
    HGDPVGYLKYFSVKRREIYSQIQRNYAWYLAITRRRETISVLDSTRGKQG
    SQVFRMSGRQIKELYFKVWSNLRESKTEVLQYFLNWDEKKCQEEWEAKDD
    TVVVEALEKGGVFQRLRSMTSAGLQGPQYVKLQFSRHHRQLRSRYELSLG
    MHLRDQIALGVTPSKVPHWTAFLSMLIGLFYNKTFRQKLEYLLEQISEVW
    LLPHWLDLANVEVLAADDTRVPLYMLMVAVHKELDSDDVPDGRFDILLCR
    DSSREVGE

33
what cellular component?
what molecular function?
what biological process?
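
These three questions correspond to the three aspects under which GO describes a gene product. The sketch below assumes a hypothetical gene product named "geneX" and an invented record layout; it is not the GO annotation file format. The GO identifiers and labels are real terms chosen only as examples, and nothing here claims that any actual gene carries them.

```python
from dataclasses import dataclass

# The three GO aspects, one per question on the slide.
ASPECTS = ("cellular_component", "molecular_function", "biological_process")


@dataclass(frozen=True)
class Annotation:
    """One statement linking a gene product to a GO term (layout is assumed)."""
    gene_product: str
    aspect: str
    term_id: str
    term_label: str


# Invented annotations for a hypothetical gene product.
ANNOTATIONS = [
    Annotation("geneX", "cellular_component", "GO:0005634", "nucleus"),
    Annotation("geneX", "molecular_function", "GO:0016301", "kinase activity"),
    Annotation("geneX", "biological_process", "GO:0006915", "apoptotic process"),
]


def describe(gene_product, annotations):
    """Answer the three questions for one gene product, grouped by aspect."""
    answers = {aspect: [] for aspect in ASPECTS}
    for annotation in annotations:
        if annotation.gene_product == gene_product:
            answers[annotation.aspect].append(annotation.term_label)
    return answers


summary = describe("geneX", ANNOTATIONS)
```

A gene product with no annotations still gets an entry for each aspect, with an empty list, which makes missing knowledge explicit rather than silent.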
34
A strategy for translational medicine
  • Sjöblom T, et al. analyzed 13,023 genes in 11
    breast and 11 colorectal cancers
  • using functional information captured by GO for
    given gene product types, they identified 189 as
    being mutated at significant frequency and thus as
    providing targets for diagnostic and therapeutic
    intervention.
  • Science. 2006 Oct 13;314(5797):268-74.

35
GO widely used
  • Sjöblom T, et al. analyzed 13,023 genes in 11
    breast and 11 colorectal cancers
  • using functional information captured by GO for
    given gene product types, they identified 189 as
    being mutated at significant frequencies and thus
    as providing targets for diagnostic and therapeutic
    intervention.
  • Science. 2006 Oct 13;314(5797):268-74.

36
Benefits of GO
  • links people to data
  • links data together
  • across species (human, mouse, yeast, fly ...)
  • across granularities (molecule, cell, organ,
    organism, population)
  • links medicine to biological science

37
  • Biomedical Informatics Needs Data
  • The Problem of Local Coding Schemes
  • NIH Policies for Data Reusability and the Growth
    of Clinical Research Consortia
  • Is SNOMED the Solution?
  • The Gene Ontology
  • The OBO Foundry
  • The National Center for Biomedical Ontology
  • Ontology in Buffalo

38
2003
  • a shared portal for (so far) 58 ontologies
  • (low regimentation)
  • http://obo.sourceforge.net → NCBO BioPortal

41
Building out from the original GO
42
OBO Foundry Coordinators
  • Lewis (Berkeley)
  • Ashburner (Cambridge)
  • Smith (Buffalo)
  • Mungall (Berkeley)
43
The goal
  • all biological (biomedical) research data should
    cumulate to form a single, algorithmically
    processible, whole
  • http://obofoundry.org

44
CRITERIA
  • The ontology is open and available to be used by
    all.
  • The ontology is in, or can be instantiated in, a
    common formal language.
  • The developers of the ontology agree in advance
    to collaborate with developers of other OBO
    Foundry ontologies where domains overlap.

FOUNDRY CRITERIA
45
CRITERIA
  • UPDATE: The developers of each ontology commit to
    its maintenance in light of scientific advance,
    and to soliciting community feedback for its
    improvement.
  • ORTHOGONALITY: They commit to working with other
    Foundry members to ensure that, for any
    particular domain, there is community convergence
    on a single controlled vocabulary.
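
The orthogonality criterion can be pictured as a simple check: no domain should be claimed by more than one ontology. The sketch below assumes each ontology declares the domains it covers as a set of labels. The ontology names, domain labels, and function name are hypothetical, and this is not an OBO Foundry tool.

```python
from collections import defaultdict


def overlapping_domains(declared):
    """Map each domain claimed by more than one ontology to its claimants."""
    claimants = defaultdict(list)
    for ontology, domains in declared.items():
        for domain in domains:
            claimants[domain].append(ontology)
    return {
        domain: sorted(ontologies)
        for domain, ontologies in claimants.items()
        if len(ontologies) > 1
    }


# Hypothetical declarations: two ontologies both claim "cell types".
declared = {
    "OntologyA": {"cell types"},
    "OntologyB": {"anatomy", "cell types"},
    "OntologyC": {"chemical entities"},
}
conflicts = overlapping_domains(declared)
```

An empty result would mean that every domain has exactly one vocabulary, which is the convergence the criterion asks developers to work toward; a non-empty result points to where collaboration is needed.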
46
Consequences
  • OBO Foundry is serving as a benchmark for
    improvements in discipline-focused terminology
    resources
  • yielding calibration of existing terminologies
    and data resources and alignment of different
    views
47
Mature OBO Foundry ontologies (now undergoing
reform)
  • Cell Ontology (CL)
  • Chemical Entities of Biological Interest (ChEBI)
  • Foundational Model of Anatomy (FMA)
  • Gene Ontology (GO)
  • Phenotypic Quality Ontology (PATO)
  • Relation Ontology (RO)
  • Sequence Ontology (SO)

48
Ontologies being built to satisfy Foundry
principles ab initio
  • Common Anatomy Reference Ontology (CARO)
  • Ontology for Biomedical Investigations (OBI)
  • Protein Ontology (PRO)
  • RNA Ontology (RnaO)
  • Subcellular Anatomy Ontology (SAO)

49
Ontologies in planning phase
  • Biobank/Biorepository Ontology (BrO, part of OBI)
  • Environment Ontology (EnvO)
  • Immunology Ontology (ImmunO)
  • Infectious Disease Ontology (IDO)

50
  • Biomedical Informatics Needs Data
  • The Problem of Local Coding Schemes
  • NIH Policies for Data Reusability and the Growth
    of Clinical Research Consortia
  • Is SNOMED the Solution?
  • The Gene Ontology
  • The OBO Foundry
  • The National Center for Biomedical Ontology
  • Ontology in Buffalo

51
NCBO
  • National Center for Biomedical Ontology (NIH
    Roadmap Center)
  • Stanford Medical Informatics
  • University of California at San Francisco Medical Center
  • Berkeley Drosophila Genome Project
  • Cambridge University Department of Genetics
  • The Mayo Clinic
  • University at Buffalo Department of Philosophy

52
  • Biomedical Informatics Needs Data
  • The Problem of Local Coding Schemes
  • NIH Policies for Data Reusability and the Growth
    of Clinical Research Consortia
  • Is SNOMED the Solution?
  • The Gene Ontology
  • The OBO Foundry
  • The National Center for Biomedical Ontology
  • Ontology in Buffalo

53
Ontology Research Group in CoE
  • Werner Ceusters
  • Louis Goldberg
  • Barry Smith
  • Robert Arp
  • Thomas Bittner
  • Maureen Donnelly
  • David Koepsell
  • Ron Rudnicki
  • Shahid Manzoor

54
Ontologies in Buffalo
  • Common Anatomy Reference Ontology (CARO)
  • Environment Ontology (EnvO)
  • Foundational Model of Anatomy (FMA)
  • Infectious Disease Ontology (IDO)
  • MS Ontology
  • Protein Ontology (PRO)
  • Relation Ontology (RO)

55
Ontologies planned
  • ICF Ontology
  • Food Ontology
  • Allergy Ontology
  • Vaccine Ontology
  • Ontology for Community-Based Medicine
  • Psychiatry Ontology