Title: Lexical Tools for UMLS Developers
1 T25 Lexical Tools for UMLS Developers 11/10/2002 8:00 AM
2Lexical Tools for UMLS Developers
November 10, 2002
Allen C. Browne, Guy Divita, Chris Lu
Lister Hill National Center for Biomedical Communications
National Library of
4Lexical Tools for UMLS Developers
- The SPECIALIST lexicon (Browne)
- The lexical tools (Divita/Lu)
- Coffee break, 10:00 - 10:30
- Lexical tools, cont. (Divita/Lu)
5Text processing
Lexical tools
- A syntactic lexicon
- Biomedical and general English
- Over 180,000 records
- General English
- 10,000 most frequent words from the American Heritage word frequency list
- 2,000 words used by Longman's Dictionary of Contemporary English
- Verbs and adjectives identified by heuristics
8Lexicon Growth
9George A. Miller The Science of Words 1991
10The SPECIALIST Lexicon
- Morphology
- Inflection
- Derivation
- Orthography
- Spelling variants
- Syntax
- Complementation for verbs, nouns, and adjectives
- Inflectional
- nucleus -- nuclei
- cauterize, cauterizes, cauterized, cauterizing
- red, redder, reddest
- Derivational
- laryngeal -- larynx
- transport -- transportation
Spelling Variation
- align -- aline
- Graves' disease -- Graves's disease -- Graves disease
- anesthetize -- anaesthetise
- esophagus -- oesophagus
14British and American Spelling
- criticise -- criticize
- naturalise -- naturalize
- centre -- center
- foetus -- fetus
15Syntax -- Verb Complements
- intran
- I'll treat.
- tran=np
- He treated the patient.
- ditran=np,pphr(with,np)
- She treated the patient with the drug.
16Syntax -- Verb Complements
base=treat
entry=E0061964
cat=verb
variants=reg
intran
tran=np
tran=pphr(with,np)
tran=pphr(of,np)
ditran=np,pphr(to,np)
ditran=np,pphr(with,np)
ditran=np,pphr(for,np)
cplxtran=np,advbl
no
17The 2003 SPECIALIST Lexicon
19Lexicon Unit Records
base=chronic
entry=E0016869
cat=adj
variants=inv
position=attrib(1)
position=pred
stative

base=Kaposi's sarcoma
spelling_variant=Kaposi sarcoma
entry=E0003576
cat=noun
variants=uncount
variants=reg
variants=glreg

base=aspirate
entry=E0010803
cat=verb
variants=reg
tran=np

base=in
entry=E0033870
cat=prep
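A small parser makes the record layout concrete. This is only a sketch: it assumes each record is stored with one slot per line in slot=value form (bare flags such as stative carry no value) and that records are separated by blank lines. That storage layout and the name parse_records are assumptions for illustration, not the distributed file format or tooling.

```python
def parse_records(text):
    """Split blank-line-separated records into lists of (slot, value) pairs."""
    records = []
    for block in text.strip().split("\n\n"):
        slots = []
        for line in block.splitlines():
            line = line.strip()
            if not line:
                continue
            # Slots without "=" (e.g. intran, stative) are kept as bare flags.
            key, sep, value = line.partition("=")
            slots.append((key, value if sep else None))
        records.append(slots)
    return records


SAMPLE = """\
base=chronic
entry=E0016869
cat=adj
variants=inv
position=attrib(1)
position=pred
stative

base=aspirate
entry=E0010803
cat=verb
variants=reg
tran=np
"""

records = parse_records(SAMPLE)
for rec in records:
    # Repeated slots (position, variants) survive in the list form above;
    # the dict is only used here to pick out single-valued slots.
    single = {k: v for k, v in rec if v is not None}
    print(single["entry"], single["base"], single["cat"])
```

Keeping each record as a list of pairs rather than a dict preserves repeated slots such as position and variants, which the records rely on.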
20Noun Variants
- Kaposi's sarcoma
- Kaposi's sarcomas
- Kaposi's sarcomata
- Kaposi sarcoma
- Kaposi sarcomas
- Kaposi sarcomata
base=Kaposi's sarcoma
spelling_variant=Kaposi sarcoma
entry=E0003576
cat=noun
variants=uncount
variants=reg
variants=glreg
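The six forms on this slide follow from crossing the two spellings (base and spelling variant) with the three variants codes. A minimal sketch of that expansion, assuming uncount contributes no plural, reg adds a plain s, and glreg is limited here to the ma -- mata pattern; the function names are hypothetical and this is not the lexicon's own generator.

```python
def plural_forms(noun, code):
    """Return the plural forms licensed by one variants code (simplified)."""
    if code == "uncount":
        return []                  # uncount: no plural form
    if code == "reg":
        return [noun + "s"]        # simplification: plain -s case only
    if code == "glreg" and noun.endswith("ma"):
        return [noun + "ta"]       # sarcoma -> sarcomata
    raise ValueError(f"unhandled variants code: {code}")


def expand(spellings, codes):
    """List every form: each spelling, then its plurals, without repeats."""
    forms = []
    for spelling in spellings:
        forms.append(spelling)
        for code in codes:
            for form in plural_forms(spelling, code):
                if form not in forms:
                    forms.append(form)
    return forms


forms = expand(["Kaposi's sarcoma", "Kaposi sarcoma"],
               ["uncount", "reg", "glreg"])
for form in forms:
    print(form)
```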
21Regular Nouns
- The plural suffix is s.
- y becomes ie following a consonant before s.
- e is inserted before s if the base ends in s, z, x, ch, or sh.
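The regular noun rules can be written as a short function. This is a sketch with the endings taken as s, z, x, ch, and sh; the function name and the five-letter vowel set are assumptions for illustration.

```python
VOWELS = "aeiou"


def regular_plural(base):
    """Apply the regular noun rules to a base form."""
    # y becomes ie following a consonant, then s is added.
    if base.endswith("y") and len(base) > 1 and base[-2] not in VOWELS:
        return base[:-1] + "ies"
    # e is inserted before s after s, z, x, ch, or sh.
    if base.endswith(("s", "z", "x", "ch", "sh")):
        return base + "es"
    # Otherwise the plural suffix is simply s.
    return base + "s"


for noun in ("sarcoma", "biopsy", "abscess", "stitch", "day"):
    print(noun, "--", regular_plural(noun))
```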
22Regular Nouns
23Greco-Latin Regular Nouns
24Uncount Nouns(abstract or mass)
base=smallpox
entry=E0056359
cat=noun
variants=uncount

base=potassium
entry=E0049387
cat=noun
- a smallpox
- two smallpoxes
- much smallpox
- a potassium
- two potassiums
- much potassium
25Fixed Plural Nouns
base=scissors
entry=E0054633
cat=noun

base=police
entry=E0048616
cat=noun
variants=pl
26Irregular Nouns
base=larynx
entry=E0036919
cat=noun
variants=irreg|larynges|
variants=reg

base=corpus
entry=E0019113
cat=noun
variants=irreg|corpora|
variants=reg
27Regular Verbs
- The third person present tense suffix is s.
- y becomes ie following a consonant before s.
- e is inserted between s, z, x, ch, or sh and s.
- The past tense suffix is ed.
- y becomes ie following a consonant before ed.
- Final e is deleted before ed.
28Regular Verbs
- dismiss -- dismisses, dismissed, dismissing
- agree -- agrees, agreed, agreeing
- dry -- dries, dried, drying
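The verb rules work as an ordered sequence: y becomes ie first, and the final e is then deleted before ed, which is how dry yields dried. A sketch under that reading follows. The sibilant list includes s, as dismiss -- dismisses requires. The ing rule is not stated on the slides and is an assumption here (drop a single final e, keep ee), as are the function names.

```python
VOWELS = "aeiou"


def y_to_ie(stem):
    """y becomes ie following a consonant."""
    if stem.endswith("y") and len(stem) > 1 and stem[-2] not in VOWELS:
        return stem[:-1] + "ie"
    return stem


def third_person(base):
    stem = y_to_ie(base)
    if stem.endswith(("s", "z", "x", "ch", "sh")):
        stem += "e"                      # e is inserted before s
    return stem + "s"


def past(base):
    stem = y_to_ie(base)
    if stem.endswith("e"):
        stem = stem[:-1]                 # final e is deleted before ed
    return stem + "ed"


def present_participle(base):
    # Assumption, not from the slides: drop a single final e, keep "ee".
    stem = base
    if stem.endswith("e") and not stem.endswith("ee"):
        stem = stem[:-1]
    return stem + "ing"


for verb in ("dismiss", "agree", "dry", "cauterize"):
    print(verb, "--", third_person(verb), past(verb), present_participle(verb))
```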
29Regular Doubling Verbs
- End in a CVC pattern
- Double the final consonant before ed and ing.
- Are otherwise regular
- variants=regd
- e.g. control -- controls, controlled, controlling
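A sketch of the doubling pattern, with hypothetical function names. The CVC test is included only to show that the pattern alone does not settle the matter: visit also ends in CVC but gives visited, which is presumably why the code regd is recorded on the entry rather than inferred from spelling.

```python
VOWELS = "aeiou"


def ends_in_cvc(base):
    """True if the base ends in consonant-vowel-consonant."""
    if len(base) < 3:
        return False
    c1, v, c2 = base[-3], base[-2], base[-1]
    return c1 not in VOWELS and v in VOWELS and c2 not in VOWELS


def regd_forms(base):
    """Inflect a verb marked regd: double the final consonant before ed and ing."""
    doubled = base + base[-1]
    return {
        "third": base + "s",             # otherwise regular (plain -s case only)
        "past": doubled + "ed",
        "participle": doubled + "ing",
    }


print(regd_forms("control"))
print(ends_in_cvc("control"), ends_in_cvc("visit"))
```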
30Irregular Verbs
base=dive
cat=verb
variants=reg
intran
intran;part(in)
...
31Dive vs. Dove
32Regular Adjectives and Adverbs
- The comparative suffix is er.
- The superlative suffix is est.
- y becomes ie after a consonant before er or est.
- Final e is deleted before er or est.
- e.g. green -- greener, greenest
33Regular Doubling Adjectives and Adverbs
- CVC final pattern
- Final consonant is doubled before er or est.
- Otherwise regular
- e.g. red -- redder, reddest
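Both adjective patterns fit in one function if the doubling choice is passed in, mirroring how the lexicon records it on the entry. A minimal sketch, with doubling applied before er and est; the function name and the doubling parameter are hypothetical.

```python
VOWELS = "aeiou"


def compare(base, doubling=False):
    """Return (comparative, superlative) for a regular adjective or adverb."""
    stem = base
    if doubling:
        stem += stem[-1]                 # red -> redd
    else:
        # y becomes ie after a consonant, then final e is deleted.
        if stem.endswith("y") and len(stem) > 1 and stem[-2] not in VOWELS:
            stem = stem[:-1] + "ie"
        if stem.endswith("e"):
            stem = stem[:-1]
    return stem + "er", stem + "est"


print(compare("green"))
print(compare("red", doubling=True))
print(compare("happy"))
```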
34Ancillary Data Bases
- Synonymy
- sm.db
- Derivation
- dm.db, dm.rules
- Inflection
- im.rules
- Neoclassical compounds
- nc.db
35Derivational Facts and Rules
dm.facts
treatment|noun|treat|verb
prohibition|noun|prohibitive|adj
cell lineage|noun|cell line|noun
photochemotherapeutic|adj|photochemotherapy|noun
pharmacotherapeutic|adj|pharmacotherapy|n
36Derivational Facts and Rules
dm.rules
e.g. alienation|alienate
ation|noun|ate|verb
ration|rate
station|state
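A derivational rule of this kind can be sketched as a suffix swap with a block list. The sketch reads the ration/rate and station/state pairs as cases where the rule must not apply; that reading, the data structures, and the function name are assumptions for illustration, not the format or behavior of the actual dm.rules file.

```python
# (input suffix, input category, output suffix, output category)
RULE = ("ation", "noun", "ate", "verb")

# Assumed reading: pairs the rule would wrongly produce, so it is blocked.
BLOCKED = {("ration", "rate"), ("station", "state")}


def derive(word, cat):
    """Return (derived word, category), or None if the rule does not apply."""
    suffix_in, cat_in, suffix_out, cat_out = RULE
    if cat != cat_in or not word.endswith(suffix_in):
        return None
    candidate = word[: -len(suffix_in)] + suffix_out
    if (word, candidate) in BLOCKED:
        return None
    return candidate, cat_out


print(derive("alienation", "noun"))
print(derive("ration", "noun"))
```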
37Inflectional Facts and Rules
im.rules
Noun rules (glreg)
us|noun|singular|i|noun|plural
antus|anti
ma|noun|singular|mata|noun|plural
a|noun|singular|ae|noun|plural
um|noun|singular|a|noun|plural
on|noun|singular|a|noun|plural
sis|noun|singular|ses|noun|plural
is|noun|singular|ides|noun|plural
men|noun|singular|mina|noun|plural
ex|noun|singular|ices|noun|plural
x|noun|singular|ces|noun|plural
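The glreg noun rules are suffix pairs, so applying them is a table lookup. The sketch below assumes the longest matching suffix wins (so ma is tried before a, and sis before is). That tie-breaking policy and the function name are assumptions; the antus/anti line is not modeled.

```python
# Singular suffix -> plural suffix, as listed under the glreg noun rules.
GLREG_RULES = [
    ("us", "i"), ("ma", "mata"), ("a", "ae"), ("um", "a"), ("on", "a"),
    ("sis", "ses"), ("is", "ides"), ("men", "mina"), ("ex", "ices"),
    ("x", "ces"),
]


def glreg_plural(singular):
    """Return the Greco-Latin plural, or None if no rule matches."""
    matches = [(s, p) for s, p in GLREG_RULES if singular.endswith(s)]
    if not matches:
        return None
    # Assumed policy: prefer the longest matching suffix.
    suffix, plural_suffix = max(matches, key=lambda rule: len(rule[0]))
    return singular[: -len(suffix)] + plural_suffix


for noun in ("sarcoma", "nucleus", "diagnosis", "foramen", "larynx"):
    print(noun, "--", glreg_plural(noun))
```

For larynx the rules give larynces rather than larynges, which is consistent with that entry listing its plural as irregular instead of glreg.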
38Neoclassical compounds
nc.db
abdomin(o)|abdomen|root
ab|away from|prefix
acanth(o)|prickle|root
acar(o)|mite|root
acetabul(o)|acetabulum|root
ad|towards|prefix
agogue|inducing|terminal
albumin(o)|albumin|root
sis|condition|terminal
stomy|surgical
sm.db
alar|adj|wing|noun
amygdaline|adj|tonsil|noun
articular|adj|joint|noun
bulbar|adj|medulla oblongata|noun
fununcular|adj|boil|noun
genicular|adj|knee|noun
hepatocellular|adj|liver cells|noun
lazar|adj|leprosy|noun
lenticular|adj|crystalline lens|noun
ypsiliform|adj|upsiloid|adj
wolfram|noun|tungsten|noun
double
40Relational Tables
- One line records
- Pipe-separated fields
- Keyed to EUI
- LRAGR matches forms to EUIs
- Word index LRWD
41Relational Tables
- LRAGR - Agreement
- LRCMP - Complements
- LRFIL - Files
- LRFLD - Fields
- LRMOD - Modification
- LRNOM - Nominalization
- LRPRN - Pronouns
- LRPRP - Properties
- LRSPL - Spelling
- LRTRM - Trademarks
- LRWD - Word index
Agreement and Inflection
- EUI - Entry ID
- STR - Inflected form
- SCA - Syntactic category
- AGR - agreement information
- BAS - Base form (morphological)
- CIT - Citation form (base)
E0003576|Kaposi sarcomas|noun|count(thr_plur)|Kaposi sarcoma|Kaposi's sarcoma
E0003576|Kaposi sarcoma|Kaposi's sarcoma
E0003576|Kaposi sarcoma|Kaposi's sarcoma
E0003576|Kaposi sarcoma|Kaposi's sarcoma
E0003576|Kaposi's sarcoma|Kaposi's sarcoma
E0003576|Kaposi's sarcoma|Kaposi's sarcoma
E0003576|Kaposi's sarcoma|Kaposi's sarcoma
E0003576|Kaposi's sarcoma|Kaposi's sarcoma
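Since LRAGR matches forms to EUIs, reading it amounts to splitting each line on the pipe and indexing by the inflected form. A sketch under the assumption that a row carries the six fields in the order listed (EUI, STR, SCA, AGR, BAS, CIT), possibly with a trailing pipe. The function name, the case-insensitive index, and skipping rows with a different field count are choices made for this example.

```python
from collections import defaultdict

FIELDS = ("EUI", "STR", "SCA", "AGR", "BAS", "CIT")


def parse_lragr(lines):
    """Return (rows, index) where index maps a lowercased form to its EUIs."""
    rows = []
    index = defaultdict(set)
    for line in lines:
        line = line.rstrip("\n")
        if line.endswith("|"):
            line = line[:-1]             # tolerate a trailing field separator
        values = line.split("|")
        if len(values) != len(FIELDS):
            continue                     # skip rows that are not six fields
        row = dict(zip(FIELDS, values))
        rows.append(row)
        index[row["STR"].lower()].add(row["EUI"])
    return rows, index


sample = [
    "E0003576|Kaposi sarcomas|noun|count(thr_plur)|Kaposi sarcoma|Kaposi's sarcoma|",
]
rows, index = parse_lragr(sample)
print(index["kaposi sarcomas"])
print(rows[0]["AGR"], "/", rows[0]["CIT"])
```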
44Number Words
- one, thirteen, fifty, thousand, million
- Not in the lexicon.
- No part of speech
- Used to construct number expressions
- Three thousand eight hundred and five
- To be released in the 2003 lexicon.
- Accompanying number tools.
45
base=two
cat=number_word
entry=N0000003
variant=second|ordinal
variant=second|denominator
variant=halves|denominator,plural|full_denominator
number_type=unit
value=2
46
base=twenty
cat=number_word
entry=N0000021
variants=reg
number_type=decade
value=20
digit=2

base=twelve
cat=number_word
entry=N0000013
variants=reg
number_type=teen
value=12

base=billion
cat=number_word
entry=N0000032
variants=reg
number_type=magnitude
power=3

base=sexdecillion
cat=number_word
entry=N0000046
variants=reg
number_type=magnitude
power=17
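The value and power slots are enough to evaluate a number expression. The sketch below treats power as an exponent of 1000, which fits billion (power 3) and sexdecillion (power 17). The entries for three, five, eight, thousand, and million, and the special handling of hundred and "and", are assumptions added for the example; this is not the accompanying number tools.

```python
# value slots: two, twelve, twenty are from the records shown;
# three, five, eight are assumed entries for the example.
VALUES = {"two": 2, "three": 3, "five": 5, "eight": 8,
          "twelve": 12, "twenty": 20}

# power slots, read as exponents of 1000: billion and sexdecillion are
# from the records shown; thousand and million are assumed.
POWERS = {"thousand": 1, "million": 2, "billion": 3, "sexdecillion": 17}


def number_value(expression):
    """Evaluate an English number expression built from the words above."""
    total = 0      # sum of completed magnitude groups
    group = 0      # value being built, below the next magnitude word
    for word in expression.lower().replace("-", " ").split():
        if word == "and":
            continue
        if word in VALUES:
            group += VALUES[word]
        elif word == "hundred":          # assumed special case
            group = max(group, 1) * 100
        elif word in POWERS:
            total += max(group, 1) * 1000 ** POWERS[word]
            group = 0
        else:
            raise ValueError(f"not a known number word: {word}")
    return total + group


print(number_value("Three thousand eight hundred and five"))
```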
47Text processing
Lexical tools
48Lexical Tools
- Wordind -- breaks strings into words
- Produces the Metathesaurus word indexes (MRXW)
- LVG -- performs various lexical transformations
- NORM -- a selection of LVG transformations
- Used for Metathesaurus indexing
- Produces the Metathesaurus normalized word and string indexes (MRXNW, MRXNS)
- Used to access those indexes
- Hodgkin Disease
- Hodgkin's Disease
- Disease, Hodgkin's
- Hodgkin's disease
- Hodgkins Disease
- Hodgkin's disease NOS
- Hodgkin's disease, NOS
- Disease, Hodgkins
- Diseases, Hodgkins
- Hodgkins Diseases
- Hodgkins disease
- hodgkin's disease
- DiseaseHodgkins
- Disease, Hodgkin
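Normalization is what lets most of the strings above meet at a single index key. The following is a rough sketch, not the Norm program itself: it assumes the steps are lowercasing, removing the genitive, turning punctuation into spaces, dropping stop words, uninflecting each word, and sorting the words. The one-word stop list and the crude plural stripper stand in for the real stop list and lexicon-based uninflection.

```python
import re

STOP_WORDS = {"nos"}   # assumption: minimal stop list for this example


def uninflect(word):
    """Crude stand-in for lexicon-based uninflection: strip a plural s."""
    if (len(word) > 3 and word.endswith("s")
            and not word.endswith(("ss", "us", "is"))):
        return word[:-1]
    return word


def norm(string):
    s = string.lower()
    s = re.sub(r"'s\b", "", s)            # remove genitive 's
    s = re.sub(r"[^a-z0-9]+", " ", s)     # punctuation becomes a space
    words = [uninflect(w) for w in s.split() if w not in STOP_WORDS]
    return " ".join(sorted(words))        # word order is discarded


variants = [
    "Hodgkin Disease", "Hodgkin's Disease", "Disease, Hodgkin's",
    "Hodgkins Disease", "Hodgkin's disease, NOS", "Diseases, Hodgkins",
]
for v in variants:
    print(v, "->", norm(v))
```

A string with no separator between the words, such as DiseaseHodgkins, is not brought to the same key by this sketch.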