Title: Overcoming the Scarcity of Big Numbers
1Overcoming the Scarcity of Big Numbers
- Isaac Zak Kohane, MD, PhD
- HMS Center for Biomedical Informatics
2Overview
- Dire consequences of insufficiently powered studies
- The opportunity to leverage existing information sources to achieve the requisite sample sizes.
- The annotation challenge that remains.
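To make "requisite sample sizes" concrete, here is a minimal sketch of a standard normal-approximation sample-size calculation for comparing an allele frequency between cases and controls. The function names, the control frequency, and the odds ratios are illustrative assumptions, not values or methods taken from the talk.

```python
from math import ceil, sqrt
from statistics import NormalDist


def case_frequency(control_freq, odds_ratio):
    """Allele frequency in cases implied by a control frequency and an odds ratio."""
    odds = odds_ratio * control_freq / (1.0 - control_freq)
    return odds / (1.0 + odds)


def n_per_group(control_freq, odds_ratio, alpha=0.05, power=0.80):
    """Observations per group needed by a two-sided two-proportion z-test.

    Uses the usual normal approximation; odds_ratio must differ from 1,
    otherwise the two frequencies coincide and no finite n suffices.
    """
    p0 = control_freq
    p1 = case_frequency(p0, odds_ratio)
    p_bar = (p0 + p1) / 2.0
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * sqrt(2.0 * p_bar * (1.0 - p_bar))
        + z_power * sqrt(p0 * (1.0 - p0) + p1 * (1.0 - p1))
    ) ** 2
    return ceil(numerator / (p0 - p1) ** 2)


if __name__ == "__main__":
    # Hypothetical inputs: 15% allele frequency in controls.
    for odds_ratio in (0.5, 0.8, 0.9):
        print(odds_ratio, n_per_group(0.15, odds_ratio))
```

The required n grows rapidly as the odds ratio approaches 1, which is why modest effects tend to be missed by small studies.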
3One Very Old Biomarker
- 1410 original articles on PSA screening for cancer
- 179 review articles
- With 1000s of gene products, how do we systematically address this problem?
4The paper that launched 100,000 chips
Alizadeh 2000
Alizadeh et al.
5Example: Shipp 2002
6Example: Rosenwald
7Example: PPARγ Pro12Ala and diabetes
[Figure: estimated risk (Ala allele) by study, on an axis from 0.1 to 2.0, with a "Sample size" label. Studies shown: Oh et al., Deeb et al., Mancini et al., Clement et al., Hegele et al., Hasstedt et al., Lei et al., Ringel et al., Hara et al., Meirhaeghe et al., Douglas et al., Altshuler et al., Mori et al., and all studies combined.]
Overall P value 2 x 10^-7; odds ratio 0.79 (0.72-0.86)
Ala is protective
Courtesy J. Hirschhorn
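One common way to combine several small association studies into a single overall odds ratio is inverse-variance (fixed-effect) pooling of the per-study log odds ratios. The sketch below shows that technique only; it is not necessarily the method behind the figure, and the 2x2 allele counts are made-up placeholders rather than data from any of the listed studies.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def log_or_and_se(a, b, c, d):
    """Log odds ratio and its standard error for one 2x2 table.

    a, b: Ala / non-Ala allele counts in cases
    c, d: Ala / non-Ala allele counts in controls
    All four counts must be positive.
    """
    return log((a * d) / (b * c)), sqrt(1 / a + 1 / b + 1 / c + 1 / d)


def pool_fixed_effect(tables, level=0.95):
    """Inverse-variance pooled odds ratio with a confidence interval."""
    weight_sum = 0.0
    weighted_log_or = 0.0
    for table in tables:
        log_or, se = log_or_and_se(*table)
        weight = 1.0 / se ** 2          # larger studies get more weight
        weight_sum += weight
        weighted_log_or += weight * log_or
    pooled = weighted_log_or / weight_sum
    pooled_se = sqrt(1.0 / weight_sum)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return exp(pooled), exp(pooled - z * pooled_se), exp(pooled + z * pooled_se)


if __name__ == "__main__":
    # Hypothetical counts for three studies of increasing size.
    hypothetical_tables = [
        (30, 170, 40, 160),
        (60, 340, 75, 325),
        (150, 850, 190, 810),
    ]
    print(pool_fixed_effect(hypothetical_tables))
```

Because the pooled standard error shrinks as studies are added, the combined interval can exclude 1 even when the individual studies cannot.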
9Peer SPIN Node
[Diagram labels: User; Query Composer; HTTP; Node Tools; UPDATE; Pathology; Clinical. Peer SPIN Nodes shown for: Children's Hospital, Indiana, MGH, Pittsburgh, UCLA(1), UCLA(2), HCNR, Beth Israel.]
13Aggregate-able and delegated approval
Researcher enters through web browser
HMS eCommons credentialing of researcher
Compose Web Query
HMS IRB approval for entire network
HMS Admin Supernode
Institutional Firewall
BWH MGH
CHMC
BIDMC
Institutional IRB approval to become a node
Different HIPAA-covered entities
Partners IRB Approval
Caregroup IRB Approval
CHMC IRB Approval
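The idea of a query that fans out to peer nodes and returns only aggregate results can be sketched as follows. This is a toy, in-process illustration under stated assumptions: `PeerNode`, `federated_count`, the record fields, and the sample records are all hypothetical, and a real network would send the query over HTTP and apply each institution's own approval and disclosure rules.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, str]


@dataclass
class PeerNode:
    """Hypothetical stand-in for one institution's node."""
    name: str
    records: List[Record]  # row-level data never leaves the node

    def count(self, predicate: Callable[[Record], bool]) -> int:
        # Only an aggregate count crosses the institutional boundary.
        return sum(1 for record in self.records if predicate(record))


def federated_count(nodes: List[PeerNode],
                    predicate: Callable[[Record], bool]) -> Dict[str, int]:
    """Send the same query to every node and collect per-node counts."""
    return {node.name: node.count(predicate) for node in nodes}


if __name__ == "__main__":
    # Made-up records for two hypothetical nodes.
    nodes = [
        PeerNode("node_a", [{"diagnosis": "lymphoma"}, {"diagnosis": "melanoma"}]),
        PeerNode("node_b", [{"diagnosis": "lymphoma"}, {"diagnosis": "lymphoma"}]),
    ]
    counts = federated_count(nodes, lambda r: r["diagnosis"] == "lymphoma")
    print(counts, sum(counts.values()))
```

Keeping the counting logic inside each node is the design choice that lets separate covered entities participate without pooling row-level data centrally.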
14True names: the goal of biomarkers
15Triangulating Human Disease
Genetics
Physiology
Environment
Behavioral Social Milieu
16Substrate: Gene Expression Omnibus
- Currently has 72,566 samples contained in 975 series measured using 3136 platforms
- Gaining 2,500 samples per month
- 1200 curated data sets have been established
- Within each GDS, samples are directly comparable
- Most can be downloaded
17Substrate: The Unified Medical Language System
- Label concepts using the Unified Medical Language System
- Largest unification of over 130 biomedical vocabularies
- Contains over one million inter-related concepts
- 20 million links between concepts
- Covers human and model organisms
- Scales from molecular, to physiological, to pathological
18Courtesy Atul Butte
- Concepts (CUI)
- Synonyms and abbreviations
- Source vocabulary and term
- Defined relations (not all)
- Statistical relations (not all; 4K for diabetes mellitus)
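The concept structure listed above (an identifier, synonyms, source-vocabulary terms, and relations) can be sketched as a small data structure used to map free-text sample labels to concepts. This is an illustration only: the `Concept` class, the helper functions, the CUI values, and the vocabulary names are hypothetical placeholders, not real UMLS content or a real UMLS API.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Set


@dataclass
class Concept:
    """Hypothetical, simplified view of one concept."""
    cui: str                                   # concept unique identifier
    preferred_name: str
    synonyms: Set[str] = field(default_factory=set)
    source_terms: Dict[str, str] = field(default_factory=dict)  # vocabulary -> term
    related: Set[str] = field(default_factory=set)              # CUIs of related concepts


def build_index(concepts: Iterable[Concept]) -> Dict[str, str]:
    """Map every known label (lower-cased) to its concept identifier."""
    index: Dict[str, str] = {}
    for concept in concepts:
        labels = {concept.preferred_name, *concept.synonyms,
                  *concept.source_terms.values()}
        for label in labels:
            index[label.lower()] = concept.cui
    return index


def annotate(text: str, index: Dict[str, str]) -> Optional[str]:
    """Return the identifier for an exact label match, or None if unknown."""
    return index.get(text.strip().lower())


if __name__ == "__main__":
    # Placeholder identifiers and vocabulary names, not real UMLS entries.
    diabetes = Concept(
        cui="C_EXAMPLE_1",
        preferred_name="Diabetes Mellitus",
        synonyms={"DM", "diabetes"},
        source_terms={"VOCAB_A": "Diabetes mellitus (disorder)"},
        related={"C_EXAMPLE_2"},
    )
    index = build_index([diabetes])
    print(annotate("  Diabetes ", index), annotate("unknown label", index))
```

Exact matching is the simplest possible strategy; free-text sample annotations would usually need normalization beyond lower-casing before lookup.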
20Gene-Driven Nosology
21Biomarkers as Continuum
22Instance of multiplicity of biomarkers
- 11 genes related to the aging concept
- G6PDH (with p < 10^-6)
  - Individuals with G6PDH deficiency have been noted to have reduced mortality from cardiovascular disease and increased longevity
- BDNF
  - Has previously been shown to have a significant drop in expression in human skin fibroblasts with aging.
- TNNT1, SYNJ1, TADA2L, SLC7A2, MGAT2 and KCTD2 are also associated with aging.
23Summary
- Adequate numbers of patient samples are ever more essential in the genomic era (big N).
- Distributed solutions for data sharing are workable today.
- Integrating the genome and phenome will require improved clinical annotation as well as big N.
24Bioinformatics and Integrative Genomics (BIG)