Title: Open Source Solutions for Tissue Banking Informatics
1Open Source Solutions for Tissue Banking
Informatics
- Jules J. Berman, Ph.D., M.D.
- INFORMATICS FOR REPOSITORIES
- Wednesday, May 21, 2008
- 3:30 pm to 4:05 pm
2Approaches to finding open source solutions
- 1. Generalize (don't specialize). Wherever
possible, don't think of your tissue repository
problems as being unique. Try to think of your
problems as instances of very general informatics
problems.
- In most cases, the same open source solutions
that work for bioinformaticians, astronomers, and
factory inventories will likely work for you.
3Approaches to finding open source solutions
- 2. Learn a popular open source programming
language that is easy to learn and that is
supported by an enthusiastic biomedical
community:
- Perl
- Python
- Ruby
4Approaches to finding open source solutions
- 3. Use open source, unencumbered nomenclatures,
codes, and syntactic formats. Otherwise, you can't
share or post data on the web.
- MeSH (standard, open source, free)
- UMLS (standard, encumbered)
- SNOMED (standard, encumbered)
- Neoplasm Classification (non-standard, open
source, free, standard syntax: XML, RDF)
- http://www.julesberman.info/
5Approaches to finding open source solutions
- 4. Use an open source and general data syntax
- HTML (formatting and linking)
- XML (describing data)
- RDF (getting meaning from described data)
7All data can be specified using RDF, developed by
the W3C. RDF files are collections of
statements expressed as data triples:
<identified subject> <metadata> <data>
Jules Berman   blood glucose level   85
Mary Smith     eye color             brown
Samuel Rice    eye color             blue
Jules Berman   eye color             brown
When you bind a key/value pair to a specified
object, you're moving from the realm of data
structure (i.e., XML) into the realm of data
meaning.
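The triples on this slide can be sketched in Python as plain three-part records. This is only an illustration of the subject/metadata/data idea, not real RDF syntax; the names `Triple`, `values_for`, and `subjects_with` are assumptions made for this sketch.

```python
from collections import namedtuple

# One statement: an identified subject, a piece of metadata, and the data.
Triple = namedtuple("Triple", ["subject", "metadata", "data"])

# The four statements from the slide.
medical_file = [
    Triple("Jules Berman", "blood glucose level", "85"),
    Triple("Mary Smith", "eye color", "brown"),
    Triple("Samuel Rice", "eye color", "blue"),
    Triple("Jules Berman", "eye color", "brown"),
]

def values_for(triples, subject, metadata):
    """Return every data value asserted for this subject and metadata."""
    return [t.data for t in triples
            if t.subject == subject and t.metadata == metadata]

def subjects_with(triples, metadata, data):
    """Return every subject for which the metadata/data pair is asserted."""
    return [t.subject for t in triples
            if t.metadata == metadata and t.data == data]

print(values_for(medical_file, "Jules Berman", "eye color"))
print(subjects_with(medical_file, "eye color", "brown"))
```

Because each key/value pair is bound to a named subject, both questions ("what is this person's eye color?" and "who has brown eyes?") can be answered from the same list.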
8RDF permits data to be merged between different
files
Medical file:
Jules Berman    blood glucose level   85
Mary Smith      eye color             brown
Samuel Rice     eye color             blue
Jules Berman    eye color             brown
Hat file:
Sally Frann     hat size              8
Jules Berman    hat size              9
Fred Garfield   hat size              9
Fred Garfield   hat_type              bowler
Merged Jules Berman database:
Jules Berman    blood glucose level   85
Jules Berman    eye color             brown
Jules Berman    hat size              9
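A minimal Python sketch of this merge, assuming each file has already been read into a list of (subject, metadata, data) tuples. The function name `merge_for_subject` is hypothetical; the data are the statements from the slide.

```python
medical_file = [
    ("Jules Berman", "blood glucose level", "85"),
    ("Mary Smith", "eye color", "brown"),
    ("Samuel Rice", "eye color", "blue"),
    ("Jules Berman", "eye color", "brown"),
]

hat_file = [
    ("Sally Frann", "hat size", "8"),
    ("Jules Berman", "hat size", "9"),
    ("Fred Garfield", "hat size", "9"),
    ("Fred Garfield", "hat_type", "bowler"),
]

def merge_for_subject(subject, *files):
    """Collect every triple about one subject from any number of files.

    File order is preserved and exact duplicate statements are dropped.
    """
    merged = []
    for triples in files:
        for triple in triples:
            if triple[0] == subject and triple not in merged:
                merged.append(triple)
    return merged

for statement in merge_for_subject("Jules Berman", medical_file, hat_file):
    print(statement)
```

The merge needs no shared schema between the two files. It relies only on both files identifying the subject in the same way, which is the assumption that makes the triples mergeable.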
10Approaches...
- 5. Use open source utilities, not software
applications (open source or otherwise).
- Utilities are simple programs that do one type of
job, very well. They often work from the command
line (i.e., no GUI).
- Once you've mastered a dozen or so utilities, you
can handle most informatics tasks that you'll come
across.
- Applications are often complex and seldom provide
the functionality you need (now or in the future).
11Approaches ...
- 6. Learn the algorithms for your discipline.
- Algorithms are process descriptions that work
every time.
- Most informatics algorithms can be implemented in
under ten lines of software code.
- You can think of software applications as many
algorithms working under a GUI.
- If you really understand algorithms, you can make
important contributions to your field.
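As one illustration of how short such an algorithm can be, here is a sketch of a word index over free-text records. The choice of algorithm, the function name `build_index`, and the specimen descriptions are all assumptions made for this example; none of them come from the slides.

```python
from collections import defaultdict

def build_index(records):
    """Map each lowercase word to the sorted ids of the records containing it."""
    index = defaultdict(set)
    for record_id, text in records.items():
        for word in text.lower().split():
            index[word.strip(".,;:()")].add(record_id)
    return {word: sorted(ids) for word, ids in index.items() if word}

# Hypothetical specimen descriptions, invented for this sketch.
records = {
    "S1": "Adenocarcinoma of colon",
    "S2": "Normal colon mucosa",
    "S3": "Adenocarcinoma of lung",
}

index = build_index(records)
print(index["colon"])           # ['S1', 'S2']
print(index["adenocarcinoma"])  # ['S1', 'S3']
```

The function body is five lines, and the same index would serve any collection of short text records, not only tissue descriptions.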
12Approaches...
- 7. De-emphasize standards.
- Most standards are difficult to understand, and
there are many of them, often covering obscure
domains. Many standards are just bad.
- Data kept in a standard today may be non-standard
legacy data tomorrow.
- Unlike physical standards, data standards are
transformable (so why fuss over any one
standard?).
- Standards can be encumbered.
14- Specifications are often a better solution than
standards.
- Specifications are just descriptions of your
data.
- A specification requires a common language for
describing data (so that you and your computer
can understand what it's trying to convey).
- Specifications give you enormous freedom to
create and describe new and unconventional data
objects.
- Usually done in RDF.
- If you've specified your data well, you can port
between standards when you need to.
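A rough sketch of what porting can look like, assuming the data are already held as (subject, metadata, data) triples. The two target layouts here (a nested JSON record and a flat XML list of statements) are hypothetical stand-ins for whatever standards are actually required, and the function names are invented for this example.

```python
import json
import xml.etree.ElementTree as ET

triples = [
    ("Jules Berman", "blood glucose level", "85"),
    ("Jules Berman", "eye color", "brown"),
    ("Jules Berman", "hat size", "9"),
]

def to_json(triples):
    """Port triples to one JSON object per subject.

    Assumes each subject has at most one value per metadata item;
    a later value would overwrite an earlier one.
    """
    records = {}
    for subject, metadata, data in triples:
        records.setdefault(subject, {})[metadata] = data
    return json.dumps(records, sort_keys=True)

def to_xml(triples):
    """Port the same triples to a flat XML list of statements."""
    root = ET.Element("records")
    for subject, metadata, data in triples:
        statement = ET.SubElement(root, "statement",
                                  subject=subject, metadata=metadata)
        statement.text = data
    return ET.tostring(root, encoding="unicode")

print(to_json(triples))
print(to_xml(triples))
```

Both outputs are generated from the same specified data, so supporting another layout means writing another short function rather than re-entering the data.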
15- Example: Pathology image annotation
16- Important descriptors of an image might include
- File information
- Image capture information
- Image format information
- Specimen information
- Patient information
- Pathology information
- Region of interest information
17- JPEG is an image format that is used by millions
of people in all types of professions, including
the medical profession.
- JPEG can now be used without worrying about IP
issues.
- You can put any information you want into the
header of a JPEG image (including an RDF
document) so that specified clinical/pathological
information can be conveyed with the image.
- Because images are non-physical, it is usually
easy to interconvert image formats.
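One way to place text in a JPEG header is a comment (COM) segment: the marker bytes FF FE, a two-byte big-endian length that counts itself, then the text. The sketch below uses only the Python standard library. The function names and the annotation string are assumptions made for illustration, and this is not necessarily the method used by the author's own tools.

```python
import struct

SOI = b"\xff\xd8"  # start-of-image marker
COM = b"\xff\xfe"  # comment marker

def add_jpeg_comment(jpeg, text):
    """Return a copy of the JPEG bytes with text stored in a new COM segment."""
    if not jpeg.startswith(SOI):
        raise ValueError("not a JPEG stream (missing SOI marker)")
    payload = text.encode("utf-8")
    if len(payload) > 65533:
        raise ValueError("text too long for a single COM segment")
    # Skip leading APPn segments (FF E0 to FF EF) so JFIF/Exif data stays first.
    pos = 2
    while (pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF
           and 0xE0 <= jpeg[pos + 1] <= 0xEF):
        (seg_len,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        pos += 2 + seg_len
    segment = COM + struct.pack(">H", len(payload) + 2) + payload
    return jpeg[:pos] + segment + jpeg[pos:]

def read_jpeg_comments(jpeg):
    """Return the text of every COM segment found before the image data."""
    comments = []
    pos = 2
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF:
        marker = jpeg[pos + 1]
        if marker in (0xDA, 0xD9):  # start of scan or end of image
            break
        (seg_len,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        if marker == 0xFE:
            comments.append(jpeg[pos + 4:pos + 2 + seg_len].decode("utf-8"))
        pos += 2 + seg_len
    return comments

# A minimal header-only byte stream for demonstration (not a viewable image).
tiny = (SOI + b"\xff\xe0" + struct.pack(">H", 16)
        + b"JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00" + b"\xff\xd9")

# Hypothetical annotation text; a real file would carry an RDF document here.
annotated = add_jpeg_comment(tiny, "specimen_1 diagnosis adenocarcinoma")
print(read_jpeg_comments(annotated))
```

With a real image, the bytes would come from reading the file in binary mode. A single COM segment holds at most 65,533 bytes, so a longer RDF document would have to be split across several segments.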
18By annotating our images, we can ensure that the
image conveys meaning and value. By using RDF, we
can ensure that the individual triples can be
integrated with heterogeneous data sources beyond
those of images. By using pre-existing
international general standards for describing
any kind of data, we attain interoperability and
avoid the confusion and complexity that occurs
whenever a new standard is created. See
http://www.julesberman.info/spec2img.htm
23- Would you like to write a Tissue
Repository/Tissue Informatics book?
- jjberman_at_alum.mit.edu