Title: Pathology data sharing
1 Pathology data sharing
United States Military Cancer Institute, Walter Reed Army Medical Center
November 16, 2004
Jules J. Berman, Ph.D., M.D.
Program Director, Pathology Informatics, Cancer Diagnosis Program, NCI, NIH
Email: bermanj_at_mail.nih.gov
2 UFO Abductees
Lots of them
They often say about the same thing (independent confirmations)
All walks of life
Minority are a little crazy
Mostly honest and rational people
One problem: no evidence
3 Researchers who don't publish their primary data
Lots of them
They often say about the same thing (independent confirmations)
All walks of life
Minority are a little crazy
Mostly honest and rational people
One problem: no evidence
4 After your research data reaches a certain size,
the data becomes the publication, and the journal
articles become tiny editorials that describe or
interpret the data.
5 In a data-intensive world, the data is the center
of the universe. Manuscripts are satellites
revolving around a central large BLOB of data.
6 Sticks and carrots
NIH Statement on Data Sharing
http://grants.nih.gov/grants/guide/notice-files/NOT-OD-03-032.html
National Research Council UPSIDE: Universal Principle of Sharing Integral Data Expeditiously
http://books.nap.edu/books/0309088593/html/R1.html
7 NIH Funding for data sharing
Shared Pathology Informatics Network
http://grants.nih.gov/grants/guide/rfa-files/RFA-CA-01-006.html
Tools for collaborations that involve data sharing
http://grants1.nih.gov/grants/guide/pa-files/PAR-03-134.html
Infrastructure for data sharing and archiving
http://grants.nih.gov/grants/guide/rfa-files/RFA-HD-03-032.html
caBIG
http://cabig.nci.nih.gov/
8 Confidentiality methods
9 Two U.S. regulations that tell us how we can use medical records in research
Common Rule
HIPAA Privacy Rule
In pathology informatics, the most ambitious research typically involves hundreds of thousands or millions of patient records. Getting informed consent for these studies is not feasible.
HIPAA and Common Rule both work under the principle that medical research is good, and it can be conducted without getting patient consent if you can come up with a way to avoid harming patients (no harm, no consent for harm).
10 HIPAA
11 IRB
12 Corporate Lawyer
13 Irate Human Subject
14 Principal Investigator
15 Articles
Berman JJ. Confidentiality for Medical Data Miners. Artificial Intelligence in Medicine. 26(1-2):25-36, 2002.
Berman JJ. Concept-Match Medical Data Scrubbing: How pathology datasets can be used in research. Arch Pathol Lab Med. 2003 Jun;127(6):680-6.
Berman JJ. Threshold protocol for the exchange of confidential medical data. BMC Medical Research Methodology, 2002, 2:12.
16 More
Berman JJ. A tool for sharing annotated research data: the "Category 0" UMLS (Unified Medical Language System) vocabularies. BMC Med Inform Decis Mak. 2003 Jun 16;3(1):6.
Berman JJ. Zero-check: a zero-knowledge protocol for reconciling patient identities across institutions. Arch Pathol Lab Med. 2004 Mar;128(3):344-6.
Berman JJ. Racing to share pathology data. Am J Clin Pathol. 2004 Feb;121(2):169-71 (editorial).
17 Standard ways of organizing data
(nomenclatures, taxonomies, classifications, data
structures)
18 Director's Challenge: Toward a molecular classification of tumors
In January 1999, the U.S. National Cancer Institute (NCI) issued a challenge to the scientific community "to harness the power of comprehensive molecular analysis technologies to make the classification of tumors vastly more informative. This challenge is intended to lay the groundwork for changing the basis of tumor classification from morphological to molecular characteristics."
19 Impediment: Misunderstanding about the definition of classification
20 Classifications are not:
Identification systems
Taxonomies and nomenclatures
Ontologies
Achieved by analyzing gene expression array data
21 What is a tumor classification?
A grouped taxonomy listing of all tumors with the following properties:
Inheritance: Hierarchical structure, with each class of tumors inheriting properties of its ancestors
Uniqueness: Each tumor occurs in only one place in the classification
Comprehensive: All tumors are included
Intransitive: A tumor from one class does not change into a tumor from another class (e.g. an adenocarcinoma does not become a lymphoma)
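The structural properties on this slide can be checked mechanically. The following is a minimal sketch, assuming a toy hierarchy: the class names, property labels, and function names are illustrative inventions and are not taken from any actual tumor classification. Intransitivity is a biological claim about tumors and is not something the data structure itself can verify.

```python
from collections import Counter

# Toy hierarchy: child class -> parent class (illustrative names only).
PARENT = {
    "tumor": None,
    "endoderm_ectoderm": "tumor",
    "mesoderm": "tumor",
}

# Properties declared directly on each class (hypothetical labels).
CLASS_PROPERTIES = {
    "tumor": {"neoplastic"},
    "endoderm_ectoderm": {"endodermal or ectodermal lineage"},
    "mesoderm": {"mesodermal lineage"},
}

# (tumor name, class) placements in the classification.
PLACEMENTS = [
    ("adenocarcinoma", "endoderm_ectoderm"),
    ("lymphoma", "mesoderm"),
]


def lineage(cls):
    """Return the class followed by all of its ancestors, root last."""
    chain = []
    while cls is not None:
        chain.append(cls)
        cls = PARENT[cls]
    return chain


def inherited_properties(cls):
    """Inheritance: a class carries its own and its ancestors' properties."""
    props = set()
    for c in lineage(cls):
        props |= CLASS_PROPERTIES.get(c, set())
    return props


def duplicated_tumors(placements):
    """Uniqueness: list tumors that occur in more than one place."""
    counts = Counter(name for name, _ in placements)
    return sorted(name for name, n in counts.items() if n > 1)


def missing_tumors(placements, known_tumors):
    """Comprehensiveness: list known tumors that have no placement."""
    placed = {name for name, _ in placements}
    return sorted(set(known_tumors) - placed)
```

For example, `duplicated_tumors(PLACEMENTS)` returns an empty list for the toy data, and adding a second placement for "lymphoma" would make it report that name.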
22 Problems with current tumor classifications
1. Created piecemeal
2. Often based on medical disciplines
3. A given tumor can appear redundantly when subclassifications are merged
4. No tumor classification has been prepared in a standard format designed to exchange, merge or analyze heterogeneous biological data
23 New Tumor Classification
Comprehensive: 122,000 terms (9 Megabytes)
Based on developmental and molecular biologic features of tumors
Heritable class structure with a unique class location for each tumor
XML document that can be cross-annotated with molecular biology databases
Preserves current tumor names, while abandoning purely morphologic categories (e.g. epithelial/stromal)
24 Latest version of the nomenclature
http://www.pathologyinformatics.org/informatics_r.htm
122,000 terms
Copy of paper:
Berman JJ. Tumor classification: molecular analysis meets Aristotle. BMC Cancer 2004, 4:10, 17 March 2004
http://www.biomedcentral.com/1471-2407/4/10
25 Standard ways of exchanging data
26 XML is the greatest information organizing tool since the invention of the book.
Much more important than HTML
Takes advantage of:
Metadata
Namespaces
Internet
External links
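As a rough illustration of how metadata, namespaces, and external links fit together in one XML record, here is a minimal sketch using Python's standard library. The `dc` prefix points at the Dublin Core element set namespace. The `ex` namespace, the element and attribute names under it, and the example.org URLs are hypothetical and are not drawn from any specification mentioned in this talk.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace used only for this illustration.
EX_NS = "http://example.org/hypothetical-pathology#"
# Dublin Core element set namespace, used here for the metadata fields.
DC_NS = "http://purl.org/dc/elements/1.1/"

NAMESPACES = {"dc": DC_NS, "ex": EX_NS}

DOC = """<record xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:ex="http://example.org/hypothetical-pathology#">
  <dc:title>Toy pathology record</dc:title>
  <dc:date>2004-11-16</dc:date>
  <ex:diagnosis ex:source="http://example.org/terms/adenocarcinoma">adenocarcinoma</ex:diagnosis>
</record>"""


def describe(xml_text):
    """Pull the metadata, the data value, and its external link from a record."""
    root = ET.fromstring(xml_text)
    diagnosis = root.find("ex:diagnosis", NAMESPACES)
    return {
        "title": root.find("dc:title", NAMESPACES).text,
        "date": root.find("dc:date", NAMESPACES).text,
        "diagnosis": diagnosis.text,
        # Namespaced attributes are keyed as "{namespace-uri}localname".
        "link": diagnosis.get("{%s}source" % EX_NS),
    }
```

Because every tag is qualified by a namespace, a `title` from one vocabulary cannot be confused with a `title` from another when records from different sources are merged.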
27 Example: Tissue Microarray Data Exchange Specification
The TMA Specification is an open access document that can be used without any restriction. Its development was sponsored by the NCI and by the Association for Pathology Informatics.
29 Basics of the specification:
Jules J Berman, Mary Edgerton and Bruce Friedman. The tissue microarray data exchange specification: a community-based, open source tool for sharing tissue microarray data. BMC Med Inform Decis Mak. 2003 May 23;3:5
Real-world implementation example:
Jules J Berman, Milton Datta, Andre Kajdacsy-Balla, Jonathan Melamed, Jan Orenstein, Kevin Dobbin, Ashok Patel, Rajiv Dhir, Michael J Becich. The tissue microarray data exchange specification: implementation by the Cooperative Prostate Cancer Tissue Resource. BMC Bioinformatics 2004 Feb 27, 5:19
30 LDIP (Laboratory Digital Imaging Project)
Association for Pathology Informatics
Pathology Image Data Exchange Specification
Information available at http://www.pathologyinformatics.org/ldip.htm
31 end