Title: Terminologies, Classifications and Groupings
1. Terminologies, Classifications and Groupings
- Dr M N Kamel Boulos (aka Dermatologist), E-mail: dk708_at_city.ac.uk, 2001
MIM Centre, City University, London
2. Why Use a Clinical Terminology? Free-Text Search Weaknesses
- Brute-force free-text search techniques cannot locate relevant knowledge efficiently, for three reasons:
  - The sought page might be using a different term (or synonym) that points to the same concept. "Myocardial infarction" and "coronary thrombosis" cannot be matched, although they refer to the same concept.
  - Spelling mistakes and variants are treated as different terms in a computer environment. For example, "psoriasis" (correct spelling) and "psoriaisis" (typographical error) cannot be matched. Similarly, "anaemia" (correct UK spelling) and "anemia" (correct US spelling) cannot be matched.
3. Why Use a Clinical Terminology? Free-Text Search Weaknesses (Contd)
- Brute-force free-text search techniques cannot locate relevant knowledge efficiently, for three reasons (Contd):
  - In the bibliographic world (e.g., MEDLINE), search engines cannot process HTML intelligently. For example, searching for resources on psoriasis will retrieve all the documents containing this word, but many of them might not be relevant: psoriasis is merely mentioned in passing and is not their actual topic. Some documents might mention psoriasis only in a "See also: psoriasis" line, e.g., at the bottom of a page covering another papulosquamous disease, or in the differential diagnosis section of a page covering another disease, e.g., Reiter's disease or seborrhoeic dermatitis.
4. You might be doing a search on the term "stroke" (cerebral infarction) and end up with documents that teach you about the workings of the two-stroke motorcycle engine. The non-discriminatory free-text method of document retrieval inevitably produces a number of irrelevant leads, or "noise" (Kiley, 1999).
- Structured (headings), coded data entry.
- It is essential that healthcare professionals agree on the nature and content of the component data sets of the different EPRs (electronic patient records), e.g., record structures and headings related to the different medical, surgical and nursing specialities, so that consistent basic models of these records can be constructed and shared reliably. (The framework of headings Web site, http://www.nhsia.nhs.uk/headings/, is a good example of well-organised standardisation efforts.)
Headings are very important. For example, the same term, "schizophrenia", assumes different meanings and implications depending on whether it appears under the "diagnosis", "present/past history" or "family history" heading of an electronic patient record.
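To make the point about headings concrete, here is a minimal Python sketch, assuming a hypothetical record model in which every entry carries the heading it was entered under (the `Heading` names and the `Entry` type are illustrative, not taken from the NHS headings project). A heading-blind search finds "schizophrenia" in the family history, while a heading-aware query does not mistake it for the patient's own diagnosis.

```python
from dataclasses import dataclass
from enum import Enum


class Heading(Enum):
    # Illustrative subset of EPR headings (names are assumptions, not a standard)
    DIAGNOSIS = "Diagnosis"
    PAST_HISTORY = "Present/past history"
    FAMILY_HISTORY = "Family history"


@dataclass(frozen=True)
class Entry:
    heading: Heading  # where in the record the term was entered
    term: str         # the term itself


def free_text_match(record, term):
    """Heading-blind search: finds the term wherever it appears."""
    return any(e.term == term for e in record)


def has_diagnosis(record, term):
    """Heading-aware search: the term counts only under Diagnosis."""
    return any(e.heading is Heading.DIAGNOSIS and e.term == term for e in record)


record = [
    Entry(Heading.FAMILY_HISTORY, "schizophrenia"),  # refers to a relative
    Entry(Heading.DIAGNOSIS, "psoriasis"),
]
print(free_text_match(record, "schizophrenia"))  # True
print(has_diagnosis(record, "schizophrenia"))    # False
```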
6. Introduction to Clinical Coding
- Clinical coding lies at the heart of successful implementation of the EPR and integrated decision support modules.
- EPR coding is done using a fine-granularity terminology (or controlled healthcare vocabulary) like Read CTV3 (Read Clinical Terms Version 3).
- A controlled healthcare vocabulary is a system of concepts used to populate electronic healthcare applications. Controlled healthcare vocabularies are products of the electronic era, designed to support computer-based functionality.
- Read CTV3 allows not only the coding of diagnoses and drugs (treatment), but also the coding of symptoms and signs, and of different tests and investigations. Moreover, Read CTV3 is a compositional terminology, which means that concepts can be constructed from primitive building blocks with rules controlling different …
7. Introduction to Clinical Coding
- On the other hand, a clinical classification
allows categorisation of clinical data according
to intrinsic rules. Formal clinical
classifications have existed for over 100 years,
initially for mortality, but more recently for
morbidity and interventions. Classifications like
the WHO's ICD-10 (The International Statistical
Classification of Diseases and Related Health
Problems, tenth revision) offer a coarser
granularity (1000s of entries vs. 100,000s of
entries in clinical terminologies) and only
single parentage (so that an item cannot be
counted twice under different headings), and are
therefore more suitable for statistical reporting
(national statistics and international
comparisons) using aggregated data.
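The effect of single parentage on aggregated statistics can be sketched in a few lines of Python. The concept labels and their mapping to ICD-10 three-character categories (L40 psoriasis, I20 angina pectoris, I21 acute myocardial infarction) are illustrative assumptions rather than an official map; the point is that each episode lands in exactly one category, so totals never double count, while the plaque/guttate distinction is lost.

```python
from collections import Counter

# Illustrative single-parent map: each fine-grained concept (by label)
# belongs to exactly one ICD-10 three-character category.
TO_ICD10 = {
    "Plaque psoriasis": "L40",
    "Guttate psoriasis": "L40",
    "Acute anterior myocardial infarction": "I21",
    "Acute inferior myocardial infarction": "I21",
    "Unstable angina": "I20",
}


def aggregate(coded_episodes):
    """Count episodes per category; single parentage means each episode
    contributes to exactly one total."""
    return Counter(TO_ICD10[concept] for concept in coded_episodes)


episodes = [
    "Plaque psoriasis", "Guttate psoriasis", "Plaque psoriasis",
    "Unstable angina", "Acute inferior myocardial infarction",
]
counts = aggregate(episodes)
print(counts["L40"], counts["I20"], counts["I21"])  # 3 1 1
# No double counting: the category totals add up to the number of episodes
assert sum(counts.values()) == len(episodes)
```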
8. Introduction to Clinical Coding
- Groupings like HRGs (Healthcare Resource Groups) have an even coarser granularity, lumping together tens of different conditions in single groups according to their resource consumption (100s of groups). Grouping information from aggregated EPRs helps in resource management, planning and budget negotiations.
10. How Do Concept Codes Help in Decision Support and …
All the concepts represented by the terms in Read CTV3 are arranged in a hierarchy (multiple parentage allowed), i.e., they are semantically defined within the clinical vocabulary. The hierarchy describes which concepts are types of something else. Consider this example: "Myocardial infarction" (and its synonyms) is a type of "Ischaemic heart disease" (and its synonyms), which is a type of "Disorder of heart" (and its synonyms), which is a type of "Cardiovascular disorder" (and its synonyms), which is a type of "Disorders" (or diseases), which is a type of "Clinical findings".
11. How Do Concept Codes Help in Decision Support and …
Now suppose that a doctor wishes to prescribe a drug that must not be used by anyone having a heart disorder. Because the clinical terminology knows every condition that is a type of heart disorder, it can automatically check the patient's record to see whether the patient has any of these conditions. This could not have been achieved with a free-text patient record. For instance, angina is a type of heart disorder, but this could not have been detected in a text-based patient record, where the best that can be achieved is to search for the word "heart" in the text. Nor would it have been practical to search for the words "heart OR angina OR coronary OR myocardial OR ...", as there are over 1000 types of heart disorder listed in the Read Codes, for example. The Read Codes hierarchy of concepts supports all sorts of research questions, such as "list all my patients who have eczema, and list these according to the type of eczema that they have." The possibilities are endless...
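The decision support and research examples above both reduce to a subsumption test over the is-a hierarchy. The following Python sketch assumes a small hand-written fragment of such a hierarchy (labels only, no real Read codes) and hypothetical patient problem lists; `ancestors` follows every parent, so multiple parentage is handled.

```python
# Hypothetical is-a fragment (child -> list of parents), labels only
IS_A = {
    "Myocardial infarction": ["Ischaemic heart disease"],
    "Angina": ["Ischaemic heart disease"],
    "Ischaemic heart disease": ["Disorder of heart"],
    "Disorder of heart": ["Cardiovascular disorder"],
    "Cardiovascular disorder": ["Disorders"],
    "Disorders": ["Clinical findings"],
    "Atopic eczema": ["Eczema"],
    "Contact eczema": ["Eczema"],
    "Eczema": ["Disorder of skin"],
    "Disorder of skin": ["Disorders"],
}


def ancestors(concept):
    """All concepts that `concept` is (transitively) a type of."""
    seen, stack = set(), list(IS_A.get(concept, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(IS_A.get(parent, []))
    return seen


def is_a(concept, ancestor):
    return concept == ancestor or ancestor in ancestors(concept)


def contraindicated(problem_list, excluded="Disorder of heart"):
    """Recorded problems that fall anywhere under the excluded concept."""
    return [p for p in problem_list if is_a(p, excluded)]


patients = {  # hypothetical coded problem lists
    "P1": ["Angina", "Atopic eczema"],
    "P2": ["Contact eczema"],
    "P3": ["Myocardial infarction"],
}
print(contraindicated(patients["P1"]))  # ['Angina']

# Research query: patients with eczema, grouped by the recorded type
by_type = {}
for pid, problems in patients.items():
    for p in problems:
        if p != "Eczema" and is_a(p, "Eczema"):
            by_type.setdefault(p, []).append(pid)
print(by_type)  # {'Atopic eczema': ['P1'], 'Contact eczema': ['P2']}
```

No list of heart-disorder words is needed: a new subtype added under "Disorder of heart" is picked up automatically.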
12. Controlled Clinical Terminology Desirable Features (Cimino)
- Concept based
- Completeness (the compositional feature of a terminology helps ensure completeness)
- Synonymy (this makes the terminology less restrictive and richer: all synonyms of a concept point to it and are semantically associated with it)
- Hierarchical
- Multiple classification and multiple parentage
- Compositional
- Semantic definition of concepts
- Mapped to classifications (e.g., Read -> ICD-10; maps can be one-to-one or one-to-many, and usually some detail is lost, as classifications have a coarser granularity than terminologies)
- Language-independent model
13. No Ambiguity or Redundancy
- No duplicate concepts are allowed, i.e., there cannot be two different ways of coding the same thing or concept. For example, "Heart attack" and "Myocardial infarction" cannot be considered two different concepts and given two concept codes; they are just synonyms.
- "Paget disease" cannot be a concept's preferred term (or label), because it is ambiguous: it can point to "Paget disease of the breast" as well as "Paget disease of bone".
- Each concept has one unambiguous preferred term and any number of synonyms. Synonyms may be shared with other concepts, e.g., "Ventricle" is a synonym (but cannot be the preferred term) of both "Cardiac ventricle" and "Brain ventricle".
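These rules can be checked mechanically. This Python sketch, using hypothetical concept identifiers `C1` to `C4`, indexes every term against the concepts it can denote: a shared synonym such as "Ventricle" legitimately points to two concepts, but a preferred term that does so is flagged as ambiguous.

```python
from collections import defaultdict

# Hypothetical concepts: concept id -> (preferred term, synonyms)
CONCEPTS = {
    "C1": ("Myocardial infarction", ["Heart attack"]),
    "C2": ("Cardiac ventricle", ["Ventricle"]),
    "C3": ("Brain ventricle", ["Ventricle"]),
}


def build_index(concepts):
    """Map every term (preferred or synonym) to the concepts it can denote."""
    index = defaultdict(set)
    for cid, (preferred, synonyms) in concepts.items():
        for term in [preferred, *synonyms]:
            index[term.lower()].add(cid)
    return index


def ambiguous_preferred_terms(concepts):
    """Preferred terms that denote more than one concept break the rule."""
    index = build_index(concepts)
    return [pref for pref, _ in concepts.values() if len(index[pref.lower()]) > 1]


index = build_index(CONCEPTS)
print(sorted(index["heart attack"]))        # ['C1']: a synonym, not a second concept
print(sorted(index["ventricle"]))           # ['C2', 'C3']: shared synonym
print(ambiguous_preferred_terms(CONCEPTS))  # []

# Making the shared synonym a concept's preferred term is rejected
bad = {**CONCEPTS, "C4": ("Ventricle", [])}
print(ambiguous_preferred_terms(bad))       # ['Ventricle']
```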
14. Concept and Term Codes
- "Plaque psoriasis" (concept label) has the Read concept code M1614, while the term codes (TermIds) are Y50HZ for the preferred term "Plaque psoriasis" (again), Y50Ha for the synonymous term "Discoid psoriasis" and Y50Hc for "Nummular psoriasis", i.e., four codes for this example: one concept code and three TermIds.
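The concept/term split can be modelled as two record types. This sketch uses the codes quoted above; the `Concept` and `Term` classes are illustrative only and are not the Read CTV3 file format. Whichever synonym the user selects, the record resolves to the same concept code.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Term:
    term_id: str  # TermId: identifies the wording
    text: str


@dataclass
class Concept:
    concept_id: str  # identifies the meaning
    preferred: Term
    synonyms: list = field(default_factory=list)

    def all_terms(self):
        return [self.preferred, *self.synonyms]


# Codes as quoted in the example above
plaque_psoriasis = Concept(
    "M1614",
    Term("Y50HZ", "Plaque psoriasis"),
    [Term("Y50Ha", "Discoid psoriasis"), Term("Y50Hc", "Nummular psoriasis")],
)

# Whichever term the clinician picks, it resolves to the same concept code
term_to_concept = {t.term_id: plaque_psoriasis.concept_id
                   for t in plaque_psoriasis.all_terms()}
print(term_to_concept["Y50Hc"])   # M1614
print(1 + len(term_to_concept))   # 4 codes: one concept code + three TermIds
```

Storing the TermId alongside the concept code lets a record redisplay the exact wording the clinician chose while still querying by meaning.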
16. Arranging Concepts
- Concepts can be arranged orthographically (by spelling, i.e., A to Z), like a dictionary (e.g., Apple, Dog, Orange, Zebra). However, arranging concepts semantically (by meaning), like a thesaurus, is much more useful (e.g., Fruits: Apple, Orange; Animals: Dog, Zebra).
17. Directed Acyclic Graph (DAG)
- A DAG allows multiple parentage and allows concepts to be moved and reclassified as medical knowledge changes (cf. the rigid, code-dependent hierarchy of ICD). With a DAG, unlimited hierarchy depths can be reached (cf. only four levels in ICD), but all these features of the DAG come at the expense of increased complexity for implementers.
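Two properties from the paragraph above can be sketched in Python: reclassification must keep the graph acyclic, and depth is unbounded because it is simply the longest is-a path. The hierarchy fragment is hypothetical; `has_cycle` is a standard depth-first search that reports a cycle when it reaches a node still on the current path.

```python
# Hypothetical is-a fragment (child -> parents) with multiple parentage
IS_A = {
    "Psoriatic arthritis": ["Arthropathy", "Psoriasis"],
    "Arthropathy": ["Disorder of joint"],
    "Disorder of joint": ["Disorders"],
    "Psoriasis": ["Disorders"],
}


def has_cycle(is_a):
    """Depth-first search; meeting a node still on the current path means a cycle."""
    ON_PATH, DONE = 1, 2
    state = {}

    def visit(node):
        state[node] = ON_PATH
        for parent in is_a.get(node, []):
            if state.get(parent) == ON_PATH:
                return True
            if parent not in state and visit(parent):
                return True
        state[node] = DONE
        return False

    return any(node not in state and visit(node) for node in is_a)


def depth(is_a, concept):
    """Length of the longest is-a path to a root (only valid on a DAG)."""
    parents = is_a.get(concept, [])
    return 0 if not parents else 1 + max(depth(is_a, p) for p in parents)


print(has_cycle(IS_A))                     # False
print(depth(IS_A, "Psoriatic arthritis"))  # 3 (via Arthropathy)

# A reclassification that made "Disorders" a child of one of its own
# descendants would break the DAG and must be rejected
bad = {**IS_A, "Disorders": ["Psoriatic arthritis"]}
print(has_cycle(bad))                      # True
```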
19. Enumerative Vs Compositional Terminologies
- Enumerative (pre-coordinated) terminologies, where every possible concept is listed explicitly, result in a combinatorial explosion, and you can never be sure that you have listed all the possibilities.
- A compositional (post-coordinated) terminology, on the other hand, like Read CTV3, seeks to construct concepts from primitive building blocks, governed by validation rules.
- OAV (Object-Attribute-Value) triples constitute the description logic scheme used in Read CTV3, and help achieve semantic definition of concepts.
- In SNOMED (Systematised Nomenclature of Human and Veterinary Medicine, College of American Pathologists), the description logic is KRSS (Knowledge Representation System Specification), while in GALEN it is GRAIL.
- N.B. Read Codes are due to be merged with SNOMED by 2002, to create a new worldwide standard clinical coding scheme. This will be called SNOMED Clinical Terms (SNOMED-CT).
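A compositional scheme can be sketched with OAV triples plus validation rules. In this Python example the `SANCTIONED` table and the attribute names are hypothetical stand-ins for a real terminology's rules; `compose` only accepts refinements the table allows, and the last lines count how many pre-coordinated entries an enumerative scheme would need for the same fracture combinations.

```python
# Hypothetical sanctioning rules: which attribute values may refine
# which base concept. Not taken from Read CTV3 itself.
SANCTIONED = {
    ("Fracture", "site"): {"Neck of femur", "Shaft of femur", "Radius"},
    ("Fracture", "type"): {"Open", "Closed"},
    ("Psoriasis", "site"): {"Scalp", "Nail", "Flexures"},
}


def compose(base, *refinements):
    """Build a post-coordinated concept as (object, attribute, value) triples,
    rejecting any triple the rules do not sanction."""
    triples = []
    for attribute, value in refinements:
        if value not in SANCTIONED.get((base, attribute), set()):
            raise ValueError(f"{base}: {attribute}={value!r} is not sanctioned")
        triples.append((base, attribute, value))
    return triples


print(compose("Fracture", ("site", "Neck of femur"), ("type", "Open")))
# [('Fracture', 'site', 'Neck of femur'), ('Fracture', 'type', 'Open')]

try:
    compose("Psoriasis", ("type", "Open"))  # nonsense combination
except ValueError as err:
    print(err)  # Psoriasis: type='Open' is not sanctioned

# An enumerative scheme would need one pre-coordinated entry per combination
sites, types = SANCTIONED[("Fracture", "site")], SANCTIONED[("Fracture", "type")]
print(len(sites) * len(types))  # 6
```

With more attributes the pre-coordinated count multiplies, while the compositional table only grows additively.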
21. Description Logics
- Description logics (DLs) lie at the heart of any
clinical terminology. DLs are languages that
allow reasoning about information, in particular
supporting the classification of descriptions by
working out how concepts and their instances
relate to one another based on their roles. They
can thus infer knowledge implied by an ontology.
22. Terminology Servers
- Medical terminologies are foundational ontologies used by many applications, and hence they should not be embedded in client applications, but should be shared and reused as distributed resources by implementing them as services through terminology servers.
- A terminology server is a special type of ontology server that allows retrieval of related concepts (parent, child, sibling, cousin and uncle concepts) and synonyms, and querying and cross-mapping multiple terminologies/classifications at the same time. Ideally, it should also support concept mapping, which involves processing free-text queries to identify corresponding terms from a controlled vocabulary; this relieves users from any restrictions while ensuring accurate results (contextual relevancy), and can also support multiple languages.
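The genealogical navigation described above (parent, child, sibling, uncle, cousin) is easy to derive from an is-a graph. `ToyTerminologyServer` is a hypothetical in-memory class, not the API of any real server; a real terminology server would expose similar queries as a shared network service.

```python
from collections import defaultdict


class ToyTerminologyServer:
    """In-memory stand-in for a terminology service (hypothetical API)."""

    def __init__(self, is_a):
        self._parents = {c: set(ps) for c, ps in is_a.items()}
        self._children = defaultdict(set)
        for child, ps in is_a.items():
            for p in ps:
                self._children[p].add(child)

    def parents(self, c):
        return set(self._parents.get(c, ()))

    def children(self, c):
        return set(self._children.get(c, ()))

    def siblings(self, c):
        # Other children of any of c's parents
        return {s for p in self.parents(c) for s in self.children(p)} - {c}

    def uncles(self, c):
        # Siblings of c's parents
        return {u for p in self.parents(c) for u in self.siblings(p)}

    def cousins(self, c):
        # Children of c's uncles
        return {k for u in self.uncles(c) for k in self.children(u)}


server = ToyTerminologyServer({
    "Myocardial infarction": ["Ischaemic heart disease"],
    "Angina": ["Ischaemic heart disease"],
    "Ischaemic heart disease": ["Disorder of heart"],
    "Cardiomyopathy": ["Disorder of heart"],
    "Dilated cardiomyopathy": ["Cardiomyopathy"],
})
print(server.siblings("Myocardial infarction"))  # {'Angina'}
print(server.uncles("Myocardial infarction"))    # {'Cardiomyopathy'}
print(server.cousins("Myocardial infarction"))   # {'Dilated cardiomyopathy'}
```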
23. Terminology Servers (Contd)
- Chute et al mention the following desiderata for a clinical terminology server: word normalisation, word completion, target terminology specification, spelling correction, lexical matching, term completion, semantic locality, and term composition and decomposition.
- Examples of terminology servers include Saphire International (http://www.ohsu.edu/cliniweb/saphint/) and jTerm (http://www.jterm.org), a Java-based open source terminology server.
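Several of these desiderata (word normalisation, spelling correction, term completion) can be sketched with Python's standard library. The vocabulary and the UK/US variant table are hypothetical; `difflib.get_close_matches` is used here as a simple stand-in for a real spelling-correction algorithm, and it catches the "psoriaisis" typing error from the free-text search example.

```python
import difflib
import re

# Hypothetical controlled vocabulary and UK/US spelling-variant table
VOCABULARY = ["psoriasis", "plaque psoriasis", "psoriatic arthritis",
              "anaemia", "pityriasis rosea"]
VARIANTS = {"anemia": "anaemia"}


def normalise(text):
    """Word normalisation: lower-case, keep letters only, map spelling variants."""
    words = re.findall(r"[a-z]+", text.lower())
    return " ".join(VARIANTS.get(w, w) for w in words)


def correct(text, cutoff=0.8):
    """Spelling correction: the closest vocabulary entry, or None."""
    matches = difflib.get_close_matches(normalise(text), VOCABULARY, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def complete(prefix):
    """Term completion: vocabulary entries that start with the typed prefix."""
    p = normalise(prefix)
    return sorted(t for t in VOCABULARY if t.startswith(p))


print(correct("Psoriaisis"))  # psoriasis
print(correct("Anemia"))      # anaemia
print(complete("psor"))       # ['psoriasis', 'psoriatic arthritis']
```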
- A classification is a system of categories to
which entities are assigned according to some
established criteria, e.g., anatomy, disease
process or pathology, aetiology, clinical (like
obstetrics), or a combination of these.
Categories are limited in number, all-encompassing
and stable over time. Common and
important entities are assigned to specific
categories, while uncommon and less significant
entities are included within other categories.
26. Entities' rules of engagement in ICD
- Index, e.g., 443.1 Buerger's disease (ICD-9)
- Inclusions (when an entity is less significant), e.g., 443.6 Other (incl. Acrocyanosis, Diabetic peripheral angiopathy; excl. Chilblains, Frostbite, Immersion foot) (ICD-9)
- Exclusions (see the example above) tell you not to count an entity under this code, as it is listed elsewhere within the classification with another code
- Otherwise specified categories (OS) include other specific but less significant entities
- Unspecified (NOS: Not Otherwise Specified), e.g., 443.9 Unspecified (incl. Intermittent claudication, Spasm of artery; excl. Spasm of cerebral artery (435)) (ICD-9)
- Extensions (5th digit), e.g., to differentiate between closed and open fracture of the neck of femur, as an open fracture is much more liable to infection and complications
- Dagger/asterisk, e.g., Cause: 265.0 Beriberi; Effect: 425.7 Nutritional cardiomyopathies
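The inclusion, exclusion and dagger/asterisk conventions can be sketched as a lookup that checks exclusion notes before inclusions. The rubrics below reuse only the 443.1, 443.9, 435, 265.0 and 425.7 examples quoted above; the lower-cased index labels, including "beriberi heart disease", are assumptions made for illustration, and a real coder works from the full ICD index.

```python
# Fragment of the ICD-9 rubrics quoted above (labels lower-cased for lookup)
RUBRICS = {
    "443.1": {"incl": {"buerger's disease"}, "excl": {}},
    "443.9": {"incl": {"intermittent claudication", "spasm of artery"},
              "excl": {"spasm of cerebral artery": "435"}},
}
# Dagger (cause) / asterisk (effect) pair, using the codes quoted above;
# the index label itself is hypothetical
DAGGER_ASTERISK = {"beriberi heart disease": ("265.0", "425.7")}


def assign(entity):
    """Return the code(s) for an entity, honouring exclusion notes first."""
    e = entity.lower()
    if e in DAGGER_ASTERISK:
        return DAGGER_ASTERISK[e]        # both cause and effect are coded
    for rubric in RUBRICS.values():
        if e in rubric["excl"]:
            return (rubric["excl"][e],)  # excluded here: coded elsewhere
    for code, rubric in RUBRICS.items():
        if e in rubric["incl"]:
            return (code,)
    return None                          # not covered by this fragment


print(assign("Intermittent claudication"))  # ('443.9',)
print(assign("Spasm of cerebral artery"))   # ('435',)
print(assign("Beriberi heart disease"))     # ('265.0', '425.7')
```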
27. Characteristics of a Classification
- All concepts can find a single place in a suitable classification (i.e., all-inclusive, mutually exclusive). A single concept cannot be classified under two different headings (i.e., no multiple parentage); this prevents double counting of a condition, which is essential for reliable statistics and central returns (remember that statistics are the main raison d'être of classifications).
- In classifications, you lose detail (related concepts are aggregated and counted together; no distinction between them is made at the code and statistics levels).
28. Characteristics of a Classification
- Classifications become less accurate with time and will eventually need revision, e.g., when new diseases are discovered. Where should these diseases be put? If we put them under existing categories, the meaning of these categories will drift over time, making comparisons with previous years' statistics done using the same classification less accurate and reliable. Updating a classification also implies preparing an equivalence-mapping table (to compare statistics done using different versions of the classification).
- In addition to ICD (for primary diagnosis), OPCS-4 (a surgical operations and procedures classification) is used in the UK.
29. The Pyramid
30. Mapping and Grouping Tools
CamsCoder is a good example of a mapping tool that translates Read CTV3 terms into ICD-10 and OPCS-4 terms and codes. On its review screen:
- CamsCoder shows the diagnostic and operative procedures …
- The entered statements (which may contain multiple terms and codes) can be re-ordered or deleted using on-screen buttons.
- The coder can set the episode's coding to complete when they are happy with it.
- The episode's HRG is displayed.
- CamsCoder automatically validates the information presented. If there are any problems, a message is displayed explaining what the coder needs to do.
- If the coding is invalid, clicking on the Action button takes the coder through the process of making the episode valid.
- The coder then clicks on OK to finish this coded …
31. The LOINC Codes
- The LOINC (Logical Observation Identifiers Names and Codes) database provides a set of universal names and ID codes for identifying laboratory and clinical observations. The goal of LOINC is to facilitate the exchange and pooling of clinical laboratory results, such as blood haemoglobin or serum potassium, for clinical care, outcomes management and research.
- Currently, many laboratories use HL7 or similar standards to send laboratory results electronically from producer laboratories to clinical care systems in hospitals. Most laboratories identify tests in these messages by means of their internal (and idiosyncratic) code values, so the receiving systems cannot fully "understand" the results they receive unless they either adopt the producer laboratory's codes (which is impossible if they receive results from multiple source laboratories) or invest in work to map each laboratory's code system to the receiver's internal code system.
32. The LOINC Codes
You may download and use the LOINC database browser free of charge from http://www.regenstrief
33. The LOINC Codes
- If laboratories all used the LOINC codes to
identify their results in data transmissions,
this problem would disappear. The receiving
system with LOINC codes in its master vocabulary
file would be able to understand and properly
file HL7 results messages that also use the LOINC
code. Similarly, government agencies would be
able to pool and analyse results for tests from
many sites if they were reported electronically
using the LOINC codes.
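The translation step that a receiving system would otherwise need can be sketched in Python. This example assumes simplified HL7 v2 OBX segments in which the third field carries `identifier^text^coding system` and `LN` marks a LOINC code; the laboratory names and their local codes are invented. 2823-3 (potassium in serum or plasma) and 718-7 (haemoglobin in blood) are LOINC codes, but any real mapping should be checked against the current LOINC release.

```python
from collections import defaultdict

# Hypothetical per-laboratory maps from internal test codes to LOINC codes
LOCAL_TO_LOINC = {
    ("LAB_A", "K"): "2823-3",   # potassium, serum/plasma
    ("LAB_B", "POT"): "2823-3",
    ("LAB_A", "HB"): "718-7",   # haemoglobin, blood
}


def parse_obx(segment, sending_lab):
    """Return (LOINC code, value, units) from a simplified HL7 v2 OBX segment."""
    fields = segment.split("|")
    identifier, _text, system = (fields[3].split("^") + ["", ""])[:3]
    value, units = float(fields[5]), fields[6]  # assumes a numeric (NM) result
    if system == "LN":                          # already LOINC-coded
        return identifier, value, units
    return LOCAL_TO_LOINC[(sending_lab, identifier)], value, units


messages = [
    ("LAB_A", "OBX|1|NM|K^Potassium^LAB_A||4.1|mmol/L"),
    ("LAB_B", "OBX|1|NM|POT^Serum K^LAB_B||3.9|mmol/L"),
    ("LAB_C", "OBX|1|NM|2823-3^Potassium^LN||4.4|mmol/L"),
]
pooled = defaultdict(list)
for lab, segment in messages:
    code, value, units = parse_obx(segment, lab)
    pooled[code].append(value)
print(dict(pooled))  # {'2823-3': [4.1, 3.9, 4.4]}
```

If every sender used LOINC (like `LAB_C`), the `LOCAL_TO_LOINC` table and its maintenance would disappear.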
34. 2001 MeSH (Medical Subject Headings)
- MeSH was originally developed by the United States National Library of Medicine (NLM) to index the world medical literature in MEDLINE (MeSH provides bibliographic headings for indexing); the latest MeSH version is 2001 MeSH. MeSH also forms an essential part of the NLM's Unified Medical Language System (UMLS).
- MeSH qualifiers, or subheadings, are used to better define a topic, narrow retrieval, or express a certain aspect of a main heading.
- It should be noted that MeSH is not an efficient indexing language for tasks such as classifying episodes of patient care. Clinical coding systems (e.g., Read Codes/Clinical Terms Version 3) are better suited to coding the Electronic Patient Record.
35. 2001 MeSH (Medical Subject Headings)
(Figure: MeSH Descriptor Data for Psoriasis, a skin …)
36. 2001 MeSH Tree Structures
- The MeSH hierarchy allows broader (parent or ancestor), sibling and narrower (child or descendant) concept relationships. Moreover, within this hierarchy, a single concept may appear as a narrower concept of more than one broader concept, e.g., "Psoriatic Arthritis" appears under both "Joint Diseases" and "Skin …" (cf. ICD; remember that each coding language or scheme is most suited to particular purposes).
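MeSH records these positions as dotted tree numbers, so broader and narrower descriptors can be found by prefix matching. The tree numbers below are hypothetical placeholders, not real MeSH values; the sketch only shows how a descriptor listed at two tree positions acquires two sets of broader descriptors.

```python
# Hypothetical tree numbers, NOT real MeSH values: a descriptor may sit at
# several positions, which is how one concept gets several broader concepts.
TREE_NUMBERS = {
    "Skin Diseases": ["C17.800"],
    "Psoriasis": ["C17.800.859"],
    "Joint Diseases": ["C05.550"],
    "Psoriatic Arthritis": ["C05.550.114", "C17.800.859.100"],
}


def broader(descriptor):
    """Descriptors at any ancestor position (a dotted prefix) of this one."""
    mine = TREE_NUMBERS[descriptor]
    return sorted(
        other for other, numbers in TREE_NUMBERS.items()
        if other != descriptor
        and any(m.startswith(n + ".") for m in mine for n in numbers)
    )


def narrower(descriptor):
    """Descriptors that have this one among their broader descriptors."""
    return sorted(d for d in TREE_NUMBERS if descriptor in broader(d))


print(broader("Psoriatic Arthritis"))  # ['Joint Diseases', 'Psoriasis', 'Skin Diseases']
print(narrower("Skin Diseases"))       # ['Psoriasis', 'Psoriatic Arthritis']
```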
38. UMLS (Unified Medical Language System)
- The UMLS project (http://umls.nlm.nih.gov/) is a long-term research and development project at the United States' National Library of Medicine (NLM) whose goal is to help health professionals and researchers intelligently retrieve and integrate information from a wide range of disparate electronic biomedical information sources. It can be used to overcome variations in the way similar concepts are expressed in different sources. This makes it easier for users to link information from patient record systems, bibliographic databases, factual databases, expert systems, etc. The UMLS Knowledge Services can also assist in data creation and indexing …
39. UMLS (Unified Medical Language System)
- The UMLS includes machine-readable "Knowledge Sources" that can be used by a wide variety of application programs to compensate for differences in the way concepts are expressed in different machine-readable sources and by different users, and to identify the information sources most relevant to a user inquiry.
- The Metathesaurus contains mappings to MeSH, ICD-9-CM, SNOMED, CPT, and a number of other coding systems.
40. UMLS (Unified Medical Language System)
- The UMLS is not itself a standard; it is a cross-referenced collection of standards and other data and knowledge sources. It is a very valuable resource for solving the most difficult problem in exchanging healthcare information: the multiplicity of coding systems in use today.
- One online use of the UMLS is the Medical World Search site (http://www.mwsearch.com/). When a user searches the Web for a medical concept, Medical World Search uses the UMLS to include synonyms in the query.
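The synonym expansion described above can be sketched as follows, assuming a hypothetical Metathesaurus-style fragment in which one concept identifier links all names with the same meaning (the `CUI-000n` identifiers are placeholders, not real UMLS CUIs, and this is not how Medical World Search is actually implemented).

```python
# Hypothetical Metathesaurus-style fragment: one concept id links all
# names (from different source vocabularies) that share a meaning.
CONCEPT_NAMES = {
    "CUI-0001": {"myocardial infarction", "heart attack", "myocardial infarct"},
    "CUI-0002": {"cerebral infarction", "stroke"},
}
NAME_TO_CONCEPT = {name: cui for cui, names in CONCEPT_NAMES.items() for name in names}


def expand(query):
    """Rewrite a query as an OR of every name linked to the same concept."""
    cui = NAME_TO_CONCEPT.get(query.lower())
    names = sorted(CONCEPT_NAMES[cui]) if cui else [query.lower()]
    return " OR ".join(f'"{n}"' for n in names)


print(expand("Heart attack"))
# "heart attack" OR "myocardial infarct" OR "myocardial infarction"
print(expand("Psoriasis"))  # "psoriasis" (no concept in this fragment)
```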
41. The UMLS Project and its Components
- The project is directed by a multidisciplinary
team, including clinicians, computer and
information scientists, and linguists, and
involves collaboration with many medical
informatics research groups. The project work has
resulted in a set of knowledge sources and
accompanying programs that are updated and
distributed regularly on CD-ROM. Online access to
the UMLS knowledge sources is provided through
the Internet-based UMLS Knowledge Source Server,
which includes an application programming
interface (API) and a World Wide Web interface.
The Web site requires registration.
42. UMLS Metathesaurus
- The Metathesaurus contains information about
biomedical concepts and terms from a large number
of controlled terminologies and thesauri. The
Metathesaurus preserves the information encoded
in the source vocabularies, such as the
hierarchical contexts of the terms, their
meanings and other attributes. The Metathesaurus
is organised by concepts, which means that
alternate names (synonyms, lexical variants, and
translations) for the same meaning are all linked
together as one concept. The Metathesaurus adds
information to the concepts, including semantic
types, definitions, and inter-concept relationships.
43. UMLS Metathesaurus (Contd)
- The Metathesaurus contains hundreds of thousands of concepts from a broad range of vocabularies. These include, for example, all or portions of the following terminologies:
  - the Systematised Nomenclature of Medicine (SNOMED International),
  - the Read Thesaurus,
  - the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM),
  - the Universal Medical Device Nomenclature System,
  - the WHO Adverse Drug Reaction Terminology,
  - the Classification of Nursing Diagnoses (NANDA),
  - the Home Health Care Classification of Nursing Diagnoses and Interventions,
  - the Physicians' Current Procedural Terminology (CPT),
  - the Medical Subject Headings (MeSH),
  - the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV), and
  - the Thesaurus of Psychological Index Terms.
- In addition, translations of some of the terminologies into languages other than English are included.
44. UMLS Semantic Network
- The Semantic Network, through its high-level semantic types, or categories, provides a consistent categorisation of all concepts represented in the Metathesaurus. The links between the semantic types provide the structure for the Network and represent important relationships in the biomedical domain. There are semantic types for organisms, anatomical structures, biologic function, chemicals, events, physical objects, and concepts or ideas. The primary relationship is the "is_a" link, and there are five major categories of additional relationships: physical, spatial, temporal, functional, and conceptual relationships.
50. Software Implementations You Can Experiment With
- Agora's Web-based READ 3.1 browser that you can play with: http://www.agora.co.uk:1080/read/gp.htm
Lightweight browser and search engine for the Read Codes clinical thesaurus by Agora.
51. Software Implementations You Can Experiment With
- CLUE (CIC Look Up Engine from The Clinical Information Consultancy, UK) is a freeware clinical coding solution that helps you add NHS Clinical Terms Version 3 capabilities to clinical applications in hours rather than months (you may use Visual Basic, for example, to access CLUE's API). CLUE also offers a ready-to-use Read Codes browser. You may download the full CLUE package free of charge, but beware that a terminology like Read CTV3, with more than 200,000 concepts, nearly 300,000 terms and over a million access keys, is not a small download, so be prepared for this one (over 20 MB): http://www.clinical-info.co.
52. Software Implementations You Can Experiment With
- CLUE (CIC Read CTV3 Look Up Engine from The
Clinical Information Consultancy, UK)
53. Software Implementations You Can Experiment With
ICD9 CodeFinder lets you search and browse ICD-9 categories and codes. You may download CodeFinder free of charge (298 KB): http://www.winsite.com/in
54. Software Implementations You Can Experiment With
- e-MDs Online ICD-9 Search: http://www.e-mds.com/ic
55. Recommended Web Links and Papers
- Bechhofer SK, Goble CA, Rector AL, Solomon WD, and Nowlan WA. Terminologies and Terminology Servers for Information Environments. In: Proceedings of STEP '97: Software Technology and Engineering Practice, 1997. URI: http://citeseer.nj.nec.com/354766.html
- Chute CG, Elkin PL, Sherertz DD, and Tuttle MS. Desiderata for a Clinical Terminology Server. In: Proceedings of the AMIA '99 Annual Symposium, 1999. URI: http://www.amia.org/pubs/symposia/D005782.PDF
- Rector AL. Clinical Terminology: Why Is it so Hard? Methods Inf Med. 1999;38(4-5):239-52.
- The British Association of Clinical Terminology Specialists: http://www.bacts.org.uk/
- OpenGALEN: http://www.opengalen.org/
- Read Codes Engines: http://www.cams.co.uk/ and http://www.visualread.com
- See also Related Web Links section