Health Sciences Informatics Research-in-Progress Seminar - PowerPoint PPT Presentation

About This Presentation
Slides: 29
Provided by: ISCS7

Transcript and Presenter's Notes



1
Health Sciences Informatics Research-in-Progress Seminar
The Johns Hopkins Medical School, Meyer Building B-105
11:00 A.M., March 7, 2003
Jules J. Berman, Ph.D., M.D.
Program Director, Pathology Informatics, Cancer Diagnosis Program, DCTD, NCI, NIH
EPN - Room 6028, 6130 Executive Blvd., Rockville, MD 20892
email: bermanj_at_mail.nih.gov
voice: 301-496-7147
2
My Background
15 years as Chief of Surgical Pathology and Cytology at the Baltimore VA Hospital (with occasional adjunct appointments at U of MD, and at Hopkins for the Johns Hopkins Autopsy Resource project)
Last 4 years as Program Director for Pathology Informatics in the Cancer Diagnosis Program at NCI
NCI Coordinator for the Shared Pathology Informatics Network
Somewhere in there, learned Perl
3
NIH role
Help move the field of Pathology Informatics forward by developing NCI-funded initiatives and by identifying and nurturing vital activities in the area.
4
1. Acquisition of Data - 49% of my time
2. Organization of Data - 49% of my time
3. Analysis of Data - 2% of my time
5
1. Acquisition of Data - Getting people to share, working within HIPAA and Common Rule guidelines, meetings
2. Organization of Data - Standards, XML, metadata, self-describing architectures, more meetings, the technical standards committee of API, the Tissue Microarray Data Exchange Standard
3. Analysis of Data - ??? Almost irrelevant at this time. People think it's OK to publish without supporting data.
6
Data Sharing
NIH Statement on Data Sharing: http://grants.nih.gov/grants/guide/notice-files/NOT-OD-03-032.html
National Research Council UPSIDE (Universal Principle of Sharing Integral Data Expeditious): http://books.nap.edu/books/0309088593/html/R1.html
Comment Letter on NIH Data Sharing Proposal: http://www.aamc.org/advocacy/library/research/corres/2002/051102.htm
7
UFO Abductees
Lots of them
They often say about the same thing (independent confirmations)
All walks of life
Generally honest
Minority are a little crazy
One problem: no evidence
8
Researchers who don't publish their primary data
Lots of them
They often say about the same thing (independent confirmations)
All walks of life
Generally honest
Minority are a little crazy
One problem: no evidence
9
After your research data reaches a certain size, the data becomes the publication, and the journal articles become tiny editorials that describe or interpret the data.
Think of the relationship between the earth and the sun. Terra-centrics did not want to think that their planet was not the center of the universe. But the earth is actually a tiny fraction of the size of the sun, and people eventually switched to a heliocentric vision of reality. Research papers are mere editorials that revolve around a central, large BLOB of data. The database is the publication. Everything else is peripheral.
10
After your research data reaches a certain size, the data becomes the publication, and the journal articles become little editorials interpreting the data.
Examples:
Human Genome Project (3 billion bases)
Gene Expression Arrays
Tissue Microarrays (a thousand cores of tissue)
Proteomics
11
All those numbers get multiplied when you start thinking about:
Accruing data (annotating databases with the results of experiments)
Merging and linking data
Executing distributed queries over multiple databases
12
So what's stopping us from making incredibly large medical databases?
Human Subject Protection issues (usually this means confidentiality/privacy issues in the context of research using medical records)
Human nature
Researcher insecurities
Non-existence of organized data
13
Two regulations tell us how we can use medical records in research:
Common Rule
HIPAA Privacy
Both work on the principle that medical research is good and that it can be conducted without getting patient consent if you can come up with a way to avoid harming patients (no harm, no consent for harm). Typically, this is done by de-identifying records.
14
How do you de-identify records?
1. Remove identifiers
2. Ensure non-uniqueness of records
3. Scrub text
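The first two steps can be sketched in Python. This is a minimal illustration under stated assumptions, not a method from the talk: the field names (name, medical_record_number, zip3, birth_year, sex) and the threshold k are hypothetical, and the non-uniqueness check is a k-anonymity-style count over quasi-identifier fields, swapped in here because the slide does not say how uniqueness is tested.

```python
from collections import Counter

# Hypothetical schema: which fields count as direct identifiers is an assumption.
IDENTIFIER_FIELDS = {"name", "medical_record_number"}
# Hypothetical quasi-identifiers: fields that could single out a patient in combination.
QUASI_IDENTIFIERS = ("zip3", "birth_year", "sex")


def remove_identifiers(record):
    """Step 1: drop direct identifier fields from a record."""
    return {field: value for field, value in record.items() if field not in IDENTIFIER_FIELDS}


def quasi_key(record):
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def too_unique(records, k=2):
    """Step 2: return records whose quasi-identifier combination occurs fewer than k times."""
    counts = Counter(quasi_key(r) for r in records)
    return [r for r in records if counts[quasi_key(r)] < k]


# Toy records (hypothetical data).
records = [
    {"name": "Joe Smith", "medical_record_number": "A1", "zip3": "212", "birth_year": 1950, "sex": "M", "diagnosis": "adenoma"},
    {"name": "Bob Jones", "medical_record_number": "A2", "zip3": "212", "birth_year": 1950, "sex": "M", "diagnosis": "hyperplasia"},
    {"name": "Ann Lee", "medical_record_number": "A3", "zip3": "208", "birth_year": 1972, "sex": "F", "diagnosis": "dysplasia"},
]

deidentified = [remove_identifiers(r) for r in records]
flagged = too_unique(deidentified, k=2)
print(len(flagged))  # 1: the only record from zip3 208 born in 1972 is still unique
```

Flagged records would then need to be generalized (for example, with coarser birth years) or withheld before release.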
15
Legal Importance of De-identification Research
1. A scientific field created by HIPAA: HIPAA asks the community to come up with de-identification standards.
2. The Office for Civil Rights will not be looking for misinterpretation. It will probably respond only to complaints; there is no pre-screening of methodology by the Office for Civil Rights.
3. Published research methodology is sure to weigh in if a lawsuit ever occurs. To a certain extent, what's de-identified is what scientists promote and accept in published articles (Daubert v. Merrell Dow Pharmaceuticals (1993) interpretation of admissible expert opinion).
16
One-way hash method described (currently deprecated under HIPAA)
Techniques I've been publishing:
Concept-Match Medical Data Scrubbing (in press, Archives of Pathology)
Threshold Method (published, BMC Methods)
Zero-Check, a Zero-Knowledge HIPAA-compliant Protocol for Reconciling Patient Identities Across Institutions (an answer to the HIPAA attack on one-way hash methodology)
17
One-Way Hash Method for De-identifying
Allows you to get follow-up data on de-identified patients.
A one-way hash algorithm computes a fixed-length string from a character string. It is computationally infeasible to determine the original character string from the hash value. The algorithm always gives the same hash value for any given string. For these reasons, it is typically used as an authenticator for secret messages.
Joe Smith replaced by one-way hash: ekso583a2ldg
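The properties listed above (fixed-length output, the same hash for the same string, no practical way back to the input) can be seen with Python's standard hashlib module. SHA-256 is an assumed choice for illustration; the slide does not name an algorithm, and its example value ekso583a2ldg is illustrative rather than the output of this code.

```python
import hashlib


def one_way_hash(identifier: str) -> str:
    """Return the SHA-256 hex digest of an identifier (SHA-256 is an assumed choice)."""
    return hashlib.sha256(identifier.encode("utf-8")).hexdigest()


h1 = one_way_hash("Joe Smith")
h2 = one_way_hash("Joe Smith")
print(h1 == h2)  # True: the same string always gives the same hash
print(len(h1), len(one_way_hash("Joe Smith, Baltimore VA Hospital")))  # 64 64: fixed length
print(one_way_hash("Joe Smyth") == h1)  # False: a different string gives a different hash
```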
18
One-Way Hash Method for De-identifying
Allows you to get follow-up data on de-identified patients.
Joe Smith replaced by one-way hash: ekso583a2ldg
Joe Smith comes back a year later, and his new record is de-identified with the one-way hash string ekso583a2ldg.
The two de-identified records are merged under the common one-way hash string, ekso583a2ldg.
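A minimal sketch of that merge, assuming a hypothetical record layout with a name field and a finding field, and again assuming SHA-256 as the hash. The name is discarded during de-identification; only its hash survives as the linking key.

```python
import hashlib
from collections import defaultdict


def one_way_hash(identifier: str) -> str:
    # SHA-256 is an assumed choice of one-way hash.
    return hashlib.sha256(identifier.encode("utf-8")).hexdigest()


def deidentify(record):
    """Replace the patient's name with its one-way hash (hypothetical record layout)."""
    return {"patient_key": one_way_hash(record["name"]), "finding": record["finding"]}


def merge_by_hash(records):
    """Group de-identified records that share the same one-way hash."""
    merged = defaultdict(list)
    for record in records:
        merged[record["patient_key"]].append(record["finding"])
    return dict(merged)


# Toy records: Joe Smith's first visit, another patient, and Joe's visit a year later.
visits = [
    {"name": "Joe Smith", "finding": "first specimen"},
    {"name": "Ann Lee", "finding": "unrelated specimen"},
    {"name": "Joe Smith", "finding": "follow-up specimen"},
]

merged = merge_by_hash([deidentify(v) for v in visits])
print(merged[one_way_hash("Joe Smith")])  # ['first specimen', 'follow-up specimen']
```

Because the key is deterministic and unsalted, anyone who can guess a name can hash it and test for a match, which is worth keeping in mind next to the note that this method is deprecated under HIPAA.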
19
Concept-Match algorithm for scrubbing text
1. Parse all input into sentences.
2. Parse each sentence into words.
3. Each "stop word" (high-frequency word) is preserved.
4. Intervening words and phrases are mapped to a standard nomenclature.
5. Each coded term is replaced by an alternate term that maps to the same code.
6. All other words are replaced by a blocking symbol (three asterisks).
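The six steps above can be sketched in Python under heavy simplifying assumptions. The stop-word list and the five-entry nomenclature are hypothetical stand-ins (the codes are copied from the Hopkins examples in this talk); sentence splitting is naive, there is no stemming, intervening words are matched longest phrase first, and each blocked word becomes its own ***. None of these choices is claimed to match the published Concept-Match implementation.

```python
import re

# Hypothetical stop-word list; a real scrubber would use a much longer list.
STOP_WORDS = {"a", "an", "and", "as", "both", "in", "into", "is", "of", "the", "this", "was"}

# Hypothetical toy nomenclature (phrase -> concept code); a real system would load a full vocabulary.
NOMENCLATURE = {
    "diagnosis": "C0348026",
    "sickle cell anemia": "C0002895",
    "herrick anemia": "C0002895",
    "today": "C0750526",
    "unit": "C0439148",
}


def alternate_term(code, original):
    """Step 5: pick a different term with the same code, if the nomenclature has one."""
    for term, term_code in NOMENCLATURE.items():
        if term_code == code and term != original:
            return term
    return original


def concept_match(text):
    output = []
    # Step 1: naive sentence split on ., ? or ! followed by whitespace.
    for sentence in re.split(r"(?<=[.?!])\s+", text.strip()):
        # Step 2: split the sentence into lowercase words (digits count as words).
        words = re.findall(r"[a-z0-9']+", sentence.lower())
        i = 0
        while i < len(words):
            if words[i] in STOP_WORDS:
                output.append(words[i])  # Step 3: stop words are preserved.
                i += 1
                continue
            # Step 4: find the run of intervening (non-stop) words starting at i ...
            end = i
            while end < len(words) and words[end] not in STOP_WORDS:
                end += 1
            # ... and look for the longest phrase starting at i that is in the nomenclature.
            for stop_at in range(end, i, -1):
                phrase = " ".join(words[i:stop_at])
                if phrase in NOMENCLATURE:
                    code = NOMENCLATURE[phrase]
                    output.append(f"({alternate_term(code, phrase)} {code})")
                    i = stop_at
                    break
            else:
                output.append("***")  # Step 6: any other word is blocked.
                i += 1
    return " ".join(output)


print(concept_match("Diagnosis of sickle cell anemia."))
# (diagnosis C0348026) of (herrick anemia C0002895)
print(concept_match("Is this malpractice?"))
# is this ***
```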
20
Examples from Hopkins Pathology Phrase list
Diagnosis of severe dysplasia > (Diagnosi C0348026) of (severe dysplasia C0334048)
Diagnosis of sickle > (Diagnosi C0348026) of
Diagnosis of sickle cell anemia > (Diagnosi C0348026) of (herrick anemia C0002895)
Diagnosis of simple hyperplasia > (Diagnosi C0348026) of (simple C0205352) (hypercellularity C0020507)
Diagnosis of sjogren > (Diagnosi C0348026) of (sjogren disease C0037230)
21
1. Dr. Atkinson killed his patient today. > (patient C0030705) (today C0750526)
2. Is this malpractice? > Is this
3. Senator garfield was admitted today into the psychiatric unit. > was (today C0750526) into the (psychiatric behavioral C0205487) (unit C0439148).
4. Snetor garfield was admitted today into the psyciatric unit. > was (today C0750526) into the (unit C0439148)
5. Dr. truelove's diagnosis is both incorrect and incompetent. > (diagnosi C0348026) is both and
6. The patient's social security number is 523845 > The is
22
Threshold algorithm: a familiar plot device.
23
they suggested that the manifestations were as severe in the mother as in the sons and that this suggested autosomal dominant inheritance.
Bob's Piece 1:
684327ec3b2f020aa3099edb177d3794 > suggested autosomal dominant inheritance
3c188dace2e7977fd6333e4d8010e181 > mother
8c81b4aaf9c2009666d532da3b19d5f8 > manifestations
db277da2e82a4cb7e9b37c8b0c7f66f0 > suggested
e183376eb9cc9a301952c05b5e4e84e3 > sons
22cf107be97ab08b33a62db68b4a390d > severe
Bob's Piece 2:
they db277da2e82a4cb7e9b37c8b0c7f66f0 that the 8c81b4aaf9c2009666d532da3b19d5f8 were as 22cf107be97ab08b33a62db68b4a390d in the 3c188dace2e7977fd6333e4d8010e181 as in the e183376eb9cc9a301952c05b5e4e84e3 and that this 684327ec3b2f020aa3099edb177d3794.
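A minimal Python sketch of how Bob might produce the two pieces, under stated assumptions: the stop-word list is hypothetical (chosen so this sentence splits the way the slide shows), punctuation is kept in the clear like a stop word, and MD5 is used only because the slide's hash values have the 32-hex-character length of an MD5 digest. The sketch does not claim to reproduce those exact values. Sorting Piece 1 by hash is a design choice made here so that its listing order reveals nothing about phrase order.

```python
import hashlib
import re

# Hypothetical stop-word list, chosen so the slide's sentence splits as shown.
STOP_WORDS = {"and", "as", "in", "that", "the", "they", "this", "were"}


def tokenize(text):
    """Split text into words and single punctuation marks, keeping their order."""
    return re.findall(r"[A-Za-z0-9']+|[^\sA-Za-z0-9']", text)


def is_stop(token):
    # Punctuation is kept in the clear, like a stop word.
    return token.lower() in STOP_WORDS or not token[0].isalnum()


def phrase_hash(phrase):
    # MD5 is an assumption based on the 32-character hashes on the slide.
    return hashlib.md5(phrase.encode("utf-8")).hexdigest()


def threshold_split(text):
    """Return (piece1, piece2) for one text.

    piece1: hash -> phrase, one entry per distinct phrase, sorted by hash.
    piece2: the text's tokens in order, with each phrase replaced by its hash.
    """
    piece1, piece2, run = {}, [], []

    def flush():
        # A run of consecutive non-stop words forms one phrase.
        if run:
            phrase = " ".join(run)
            piece1[phrase_hash(phrase)] = phrase
            piece2.append(phrase_hash(phrase))
            run.clear()

    for token in tokenize(text):
        if is_stop(token):
            flush()
            piece2.append(token)
        else:
            run.append(token)
    flush()
    return dict(sorted(piece1.items())), piece2


def reconstruct(piece1, piece2):
    """Swap each hash in Piece 2 back to its phrase; stop words pass through."""
    return " ".join(piece1.get(token, token) for token in piece2)


text = ("they suggested that the manifestations were as severe in the mother "
        "as in the sons and that this suggested autosomal dominant inheritance.")
piece1, piece2 = threshold_split(text)
for h, phrase in piece1.items():
    print(h, ">", phrase)
print(" ".join(piece2))
print(reconstruct(piece1, piece2))
```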
24
Piece 1 (the listing of phrases and their one-way hashes)
1. Contains no information on the frequency of occurrence of the phrases found in the original text (because recurring phrases map to the same hash code and appear as a single entry in Piece 1).
2. Contains no information that Alice can use to connect any patient to any particular patient record. Records do not exist as entities in Piece 1.
3. Contains no information on the order or locations of the phrases found in the original text.
4. Contains all the concepts found in the original text. Splitting text at stop words is a popular method of parsing text into concepts [4,5].
5. Bob can destroy Piece 1 and re-create it later from the original file, using the same threshold algorithm.
6. Alice can use the phrases in Piece 1 to transform, annotate, or search the concepts found in the original file.
7. Alice can transfer Piece 1 to a third party without violating HIPAA privacy rules or Common Rule human subject regulations (in the U.S.). For that matter, Alice can keep Piece 1 and add it to her database of Piece 1 files collected from all of her clients.
8. Piece 1 is not necessarily unique. Different original files may yield the same Piece 1 (if they're composed of the same phrases). Therefore, Piece 1 cannot be used to authenticate the original file used to produce it.
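Properties 1, 3 and 8 can be checked with a short, self-contained sketch. The stop-word list is hypothetical and MD5 is an assumed hash; sorting Piece 1 by hash is a design choice made here so that the listing carries no order information.

```python
import hashlib
import re

STOP_WORDS = {"and", "as", "in", "that", "the", "they", "this", "were"}  # hypothetical


def piece1(text):
    """Build only Piece 1: hash -> phrase for each run of consecutive non-stop words."""
    entries, run = {}, []
    # A trailing "." guarantees that the last run is flushed.
    for token in re.findall(r"[a-z0-9']+|[^\sa-z0-9']", text.lower()) + ["."]:
        if token in STOP_WORDS or not token[0].isalnum():
            if run:
                phrase = " ".join(run)
                entries[hashlib.md5(phrase.encode("utf-8")).hexdigest()] = phrase
                run = []
        else:
            run.append(token)
    # Sorting by hash hides the order in which phrases appeared (property 3).
    return dict(sorted(entries.items()))


text_a = "severe in the mother and severe in the sons."
text_b = "severe in the sons and severe in the mother."

p_a, p_b = piece1(text_a), piece1(text_b)
print(len(p_a))    # 3: "severe" occurs twice but is listed once (property 1)
print(p_a == p_b)  # True: two different files yield the same Piece 1 (property 8)
```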
25
Properties of Piece 2
1. Contains no information that can be used to connect any patient to any particular patient record.
2. Contains nothing but hash values of phrases and stop words, in their correct order of occurrence in the original text.
3. Anyone obtaining Piece 1 and Piece 2 can reconstruct the original text.
4. The original text can be reconstructed from Piece 2 and any file into which Piece 1 has been merged. There is no need to preserve Piece 1 in its original form.
5. Bob can lose or destroy Piece 2 and re-create it later from the original file, using the same threshold algorithm.
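Property 4 is what makes an aggregated Piece 1 database useful. In this sketch (MD5 and the toy phrases are assumptions), Bob's Piece 2 is resolved against a lookup table into which several customers' Piece 1 entries have been merged; the extra entries do no harm because only hashes that occur in Piece 2 are ever looked up.

```python
import hashlib


def phrase_hash(phrase):
    # MD5 is an assumed choice, matching the 32-character hashes on the slides.
    return hashlib.md5(phrase.encode("utf-8")).hexdigest()


# Bob's Piece 2 for the toy text "the mother and the sons".
piece2 = ["the", phrase_hash("mother"), "and", "the", phrase_hash("sons")]

# A hypothetical merged file: Bob's Piece 1 entries plus entries from other customers.
merged_piece1 = {phrase_hash(p): p for p in ["mother", "sons", "severe", "tissue microarray"]}


def reconstruct(piece2, lookup):
    """Replace each hash found in the lookup with its phrase; other tokens pass through."""
    return " ".join(lookup.get(token, token) for token in piece2)


print(reconstruct(piece2, merged_piece1))  # the mother and the sons
print("mother" in " ".join(piece2))        # False: Piece 2 alone holds only hashes and stop words
```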
26
Bob prepares threshold Pieces 1 and 2 and sends
Piece 1 to Alice. Alice may require Bob to prove
the authenticity of Piece 1, but Bob has no
reason to care if Piece 1 is intercepted by an
unauthorized party. Alice uses her software
(which may be secret, or it may require
computational facilities that Bob doesn't have,
or it may require large databases that Bob
doesn't have) to transform or annotate each
phrase from Piece 1. The transformation product
for each phrase can be almost anything that Bob
considers valuable (e.g., a UMLS code, a genome
database link, an image file URL, or a tissue
sample location). Alice substitutes the
transformed text (or simply appends the
transformed text) for each phrase back into Piece
1, co-locating it with the original one-way hash
number associated with the phrase.
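The exchange can be sketched as follows. Everything here is a hypothetical illustration: MD5 is an assumed hash, the toy sentence is made up, and Alice's vocabulary is a two-entry dictionary whose codes are copied from the Concept-Match examples earlier in the talk. Alice only ever sees Piece 1; she appends her annotation to each phrase under the same hash, and Bob merges the result with his Piece 2.

```python
import hashlib


def phrase_hash(phrase):
    # MD5 is an assumed choice of one-way hash.
    return hashlib.md5(phrase.encode("utf-8")).hexdigest()


# Bob's side: Piece 1 (hash -> phrase) and Piece 2 (ordered tokens) for a toy sentence.
piece1 = {phrase_hash(p): p for p in ["diagnosis", "sickle cell anemia"]}
piece2 = [phrase_hash("diagnosis"), "of", phrase_hash("sickle cell anemia")]

# Alice's side: a hypothetical private vocabulary.
ALICE_VOCABULARY = {"diagnosis": "C0348026", "sickle cell anemia": "C0002895"}


def annotate(piece1):
    """Append Alice's annotation to each phrase, keyed by the same one-way hash."""
    return {h: f"{p} [{ALICE_VOCABULARY.get(p, 'no code')}]" for h, p in piece1.items()}


def reconstruct(piece2, annotated):
    """Bob swaps each hash in Piece 2 for Alice's annotated phrase."""
    return " ".join(annotated.get(token, token) for token in piece2)


annotated = annotate(piece1)  # Alice returns this to Bob
print(reconstruct(piece2, annotated))
# diagnosis [C0348026] of sickle cell anemia [C0002895]
```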
27
The original text has been converted into two pieces, neither of which contains any identifying information. There is sufficient information in Piece 1 for Alice to annotate the text and return it to Bob (annotated Piece 1). Bob can reconstruct his original text, including Alice's annotations, thus adding value to his original data without breaching patient confidentiality. Bob can pay Alice for her services. Alice can keep Piece 1 and use it for her own purposes. Alice can make a large database consisting of all the Piece 1 files she receives from all of her customers. Alice's aggregated Piece 1 database can be used by owners of Piece 2 files to reconstruct their original files (along with Alice's value-added annotations). Alice can sell Piece 1 to a third party, if she wishes. Alice can continually update or otherwise enhance her annotations on Piece 1 and sell the updated versions to Bob and others.
28
end