Title: De-Identification Methodologies and De-ID™ Software
1. De-Identification Methodologies and De-ID™ Software
- De-ID Data Corp, LLC
- Dan Wasserstrom
- President
- Steven Merahn, MD
- Chief Medical Officer
2. Overview
- Why de-identify?
- What is de-identification?
- Considerations in choosing a de-identification methodology
- Automated vs. manual methods
- De-ID™ software overview
3. Why De-Identify?
- De-identification is required in order to protect patient privacy and advance the progress of clinical, quality and population research
- A reliable and valid de-identification methodology can increase opportunities to expand data access and to leverage and manage data assets
- De-identification is required by the HIPAA Privacy Rule
4. What is De-Identification?
- A well-defined, but limited, step in a broader research workflow or protocol
- Patient privacy must be a consideration for the complete workflow
- The defined nature of the step includes the obfuscation of individually identifiable information in records and reports
- Obfuscation schemes include redaction, elimination, categorical replacement (e.g., "place"), replacement with proxies (e.g., "Dr X"), and offsets (e.g., "day 1")
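The obfuscation schemes listed above can be illustrated with a minimal Python sketch. The function names, the output formats and the sample note are all hypothetical and are not taken from De-ID; the identifiers are supplied by hand here rather than detected automatically.

```python
from datetime import date

def redact(text, span):
    """Redaction: mask the identifier with a fixed-width blank."""
    return text.replace(span, "*" * 5)

def eliminate(text, span):
    """Elimination: remove the identifier and tidy the leftover spacing."""
    return " ".join(text.replace(span, "").split())

def categorize(text, span, category):
    """Categorical replacement: substitute the type of information."""
    return text.replace(span, f"[{category}]")

def proxy(text, span, proxies, prefix):
    """Proxy replacement: the same identifier always gets the same stand-in."""
    if span not in proxies:
        proxies[span] = f"{prefix} X{len(proxies) + 1}"
    return text.replace(span, proxies[span])

def offset(text, span, event_date, reference_date):
    """Offset: express a date as a day count, with the reference date as day 1."""
    day = (event_date - reference_date).days + 1
    return text.replace(span, f"day {day}")

# Hypothetical sample note, invented for illustration only.
note = "Seen by Dr Smith in Boston on 2004-03-05."
proxies = {}
out = proxy(note, "Dr Smith", proxies, "Dr")
out = categorize(out, "Boston", "place")
out = offset(out, "2004-03-05", date(2004, 3, 5), date(2004, 3, 5))
print(out)  # Seen by Dr X1 in [place] on day 1.
```

Proxies and offsets keep relationships that redaction destroys: the same clinician maps to the same stand-in throughout, and intervals between dates survive even though the calendar dates do not.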
5. Considerations
- When choosing a de-identification methodology, four things need consideration:
- What is the reliability and validity of the methodology?
- Can the method maintain its specificity and sensitivity in local use?
- What are the limitations of the methodology?
- Can files be re-identified?
6. Reliability and Validity
- Despite some claims, manual de-identification is not the gold standard
- Problems with inter-rater reliability, manpower resource and time constraints, over-marking and under-marking
- Automated de-identification eliminates the issues of time, costs and manpower and offers consistency of quality
- The issue then becomes the level of that quality: over-marking (specificity) and under-marking (sensitivity)
- What are acceptable levels of sensitivity and specificity?
- 100% sensitivity for names
- What is the benchmark?
- What is the value of consistency?
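For reference, the two measures can be computed from simple counts. In this setting, sensitivity is the share of true identifiers that were marked (under-marking lowers it), and specificity is the share of non-identifying text that was left unmarked (over-marking lowers it). The sketch below uses the standard definitions; the counts are invented for illustration and are not results from any evaluation.

```python
def sensitivity(true_pos, false_neg):
    """Share of real identifiers that were marked; under-marking lowers it."""
    total = true_pos + false_neg
    if total == 0:
        raise ValueError("no identifiers in the sample")
    return true_pos / total

def specificity(true_neg, false_pos):
    """Share of non-identifying tokens left unmarked; over-marking lowers it."""
    total = true_neg + false_pos
    if total == 0:
        raise ValueError("no non-identifying tokens in the sample")
    return true_neg / total

# Hypothetical counts: 100 identifiers, of which 5 were missed;
# 1000 non-identifying tokens, of which 100 were marked by mistake.
sens = sensitivity(true_pos=95, false_neg=5)
spec = specificity(true_neg=900, false_pos=100)
print(f"sensitivity={sens:.1%} specificity={spec:.1%}")
```

The two errors do not cost the same: a missed identifier is a privacy risk, while an over-marked term only reduces the research value of the text.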
7. Local Use
- While some methods may have good numbers, will they hold up in local use?
- Every community has its own acronyms, place names and other local vocabulary
- Can your methodology be customized to meet local needs?
- What is the protocol to manage local quality?
- Regular checks against manual review
- Formal evaluation research
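A regular check against manual review could be as simple as comparing, report by report, what each method marked. The following is a minimal sketch under assumed data structures (a mapping from report ID to the set of marked strings); the function name and sample data are hypothetical. Because manual review is itself imperfect, the disagreements it lists are best treated as items to adjudicate rather than as confirmed errors.

```python
def compare_to_manual(manual, automated):
    """List disagreements between manual review and automated output.

    Both arguments map a report ID to the set of strings marked as identifiers.
    """
    report = {}
    for report_id in sorted(manual.keys() | automated.keys()):
        m = manual.get(report_id, set())
        a = automated.get(report_id, set())
        report[report_id] = {
            "under_marked": sorted(m - a),  # marked manually, missed by the tool
            "over_marked": sorted(a - m),   # marked by the tool only
        }
    return report

# Hypothetical sample: a local hospital name is missed and a disease eponym is over-marked.
manual = {"r1": {"John Doe", "Mercy General"}, "r2": {"03/05/2004"}}
automated = {"r1": {"John Doe", "Wilson"}, "r2": {"03/05/2004"}}

for report_id, diff in compare_to_manual(manual, automated).items():
    print(report_id, diff)
```

Recurring entries in the under-marked list, such as local place names or acronyms, are natural candidates for local customization.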
8. Limitations
- Based on the use case, different methods may have limitations
- Are files in PDF format?
- Are photographs involved?
- What about other images, captions?
- Know the limits of your methodology
- Manual de-identification personnel may need to be trained in the handling of radiology images
- Automated tools may only de-identify text reports
- Match the features of your method against your use case
9. Re-Identification
- There are two types of re-identification: deliberate and inadvertent
- Deliberate re-identification involves the use of trusted third parties to protect re-identification codes
- Review of de-identified files may reveal that a patient meets eligibility criteria for a clinical trial
- Inadvertent re-identification may occur with under-marking, or when special case information outside of safe harbor guidelines inadvertently allows patient identification
- Example: a patient with Wilson's disease
- Patient privacy issues must be a consideration throughout the overall research workflow
10. Automated De-Identification
- An application that provides a level of consistency for quality, saves time and lowers costs
- A gatekeeping function in a broader research infrastructure
- Increases opportunities to leverage the value of data assets
11. Features of De-ID™ Software
- Accurate, reliable automated de-identification
- Leaves original record intact
- Very simple to install and use
- Network-level operability; batch processing
- Works with any text from reports or databases
- Meets all HIPAA safe harbor guidelines, with the option of limited data sets and custom fields
- Continuous quality improvement program
- New version from published paper
- Sensitivity for names: 100%; overall sensitivity: 99.5%; specificity: 89%
12. De-ID Mechanics
- Heuristics and rule sets identify the presence of any of the 18 HIPAA identifiers within text
- Supplemental dictionaries of locations and names ensure that these can be identified
- UMLS is used to ensure that words or phrases that may be medical terms containing proper names are preserved
- De-ID replaces the identifiable text with specific tags in the form of offsets and proxies
- De-ID can be used as a standalone application, or a Java interface can connect it with other applications for automated batch processing
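The general shape of such a pipeline (rules, dictionaries, a medical-term check, then tag replacement) can be sketched in a few lines of Python. This is an illustration only and not De-ID's implementation: the tiny dictionaries, the two regular-expression rules, the phrase list standing in for a UMLS lookup, and the bracketed category tags are all assumptions made for the example.

```python
import re

# Hypothetical stand-ins; a real system would use far larger resources.
NAME_DICTIONARY = {"wilson", "smith"}
PLACE_DICTIONARY = {"boston"}
# Stand-in for a UMLS lookup: medical phrases that contain proper names.
MEDICAL_PHRASES = ["wilson's disease", "parkinson's disease"]

# Rule set: simple patterns for two of the identifier types.
RULES = [
    ("DATE", re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")),
    ("PHONE", re.compile(r"\b\d{3}-\d{3}-\d{4}\b")),
]

def protected_spans(text):
    """Character ranges covered by a known medical phrase."""
    lower = text.lower()
    spans = []
    for phrase in MEDICAL_PHRASES:
        start = lower.find(phrase)
        while start != -1:
            spans.append((start, start + len(phrase)))
            start = lower.find(phrase, start + 1)
    return spans

def deidentify(text):
    """Return a tagged copy of the text; the input string is left unchanged."""
    protected = protected_spans(text)
    hits = []  # (start, end, tag) for every identifier found
    for tag, pattern in RULES:
        for m in pattern.finditer(text):
            hits.append((m.start(), m.end(), tag))
    for m in re.finditer(r"[A-Za-z]+", text):
        if any(s <= m.start() < e for s, e in protected):
            continue  # part of a medical term, so preserve it
        word = m.group().lower()
        if word in NAME_DICTIONARY:
            hits.append((m.start(), m.end(), "NAME"))
        elif word in PLACE_DICTIONARY:
            hits.append((m.start(), m.end(), "PLACE"))
    # Apply replacements right to left so earlier positions stay valid.
    for start, end, tag in sorted(hits, reverse=True):
        text = text[:start] + f"[{tag}]" + text[end:]
    return text

note = "Mr. Wilson of Boston, seen 03/05/2004, has Wilson's disease. Call 555-123-4567."
print(deidentify(note))
# Mr. [NAME] of [PLACE], seen [DATE], has Wilson's disease. Call [PHONE].
```

The medical-phrase check is what keeps "Wilson's disease" intact while the patient name "Wilson" is still replaced, which is the over-marking problem the UMLS step addresses.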
13. Output Options
[Diagram: records and reports pass through De-ID™ across the HIPAA boundary]
- Records and reports containing protected health information
- Are automatically de-identified based on HIPAA safe harbor guidelines or limited data sets
- Outputs de-identified records and reports as individual files or batch reports
- Diagram labels: word processing; transcribed records or reports; manual feed; XML; H&P, imaging or lab reports; tab- or comma-delimited; Java interface; database storage; database outputs, EMR storage
14. Summary
- De-identification is a critical success factor for clinical and population research and expanded data access
- In choosing a de-identification methodology, issues to consider include manpower requirements, consistency, quality, time and costs
- Automated de-identification methodologies reduce issues of manpower, time and cost
- Quality and consistency benchmarks can be met by software