Additional NLS Tools - PowerPoint PPT Presentation

Provided by: Div62

Transcript and Presenter's Notes

Title: Additional NLS Tools

1
Additional NLS Tools

Knowledge Source Server Java Client API
NLS's Java NLP tools
MMTx
GSpell

2
Knowledge Source Server Java Client API

XML over RMI
Java UMLS Object Model

3
Chapter 5. Building UMLSKS Software Applications
Chapter Contents
5.1 Building and Running Your Program
5.2 API Package Structure
5.3 Program Initialization
5.4 UMLSKS API Functions
5.5 Using the UMLSKS Object Model

4
Knowledge Source Server Java Client API

5
Knowledge Source Server Java Client API
// Initialize the client
KSSRetrieverV2_1 retriever =
    (KSSRetrieverV2_1) Naming.lookup("//umlsks.nlm.nih.gov/KSSRetriever");
// Send a request to the client
char[] result = retriever.findBasicConcept(ksYear, termName, sabs, language,
    KSSRetriever.NormalizeString, false);
// Convert the XML into ...
ConceptVector concepts = ConceptVector.getInstance(String.valueOf(result));

6
Knowledge Source Server Java Client API

<concept>
  <cui>C0032615</cui>
  <cn>Fatty Acids, Polyunsaturated</cn>
  <term>
    <lui>L0032615</lui>
    <tn>Fatty Acids, Polyunsaturated</tn>
    <ts>P</ts>
    <lat>ENG</lat>
    <termVariant>
      <sui>S0010240</sui>
      <stt>VW</stt>
      <str>Acids, Polyunsaturated Fatty</str>
      <strSource>
        <sab>MSH2002</sab><tty>PM</tty><scd>D005231</scd><srl>0</srl>
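
A record like the one above can also be read with the JDK's built-in DOM parser. The following is a minimal sketch, not the UMLSKS ConceptVector class: the sample string is a shortened copy of the slide's fragment with closing tags added by assumption so that it is well-formed, and the class and method names (ConceptXmlSketch, firstText, parse) are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ConceptXmlSketch {
    // Assumption: closing tags are added here; the fragment on the slide is truncated.
    static final String SAMPLE =
          "<concept>"
        + "<cui>C0032615</cui>"
        + "<cn>Fatty Acids, Polyunsaturated</cn>"
        + "<term><lui>L0032615</lui><tn>Fatty Acids, Polyunsaturated</tn>"
        + "<ts>P</ts><lat>ENG</lat></term>"
        + "</concept>";

    /** Parses an XML string into a DOM document, wrapping checked exceptions. */
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse concept XML", e);
        }
    }

    /** Returns the text of the first element with the given tag name, or null if absent. */
    static String firstText(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        // Print the concept identifier and the concept name
        System.out.println(firstText(doc, "cui") + " " + firstText(doc, "cn"));
    }
}
```

In practice the UMLSKS object model does this conversion for you; the sketch only shows what the XML carries.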

7
NLS Java NLP Tools

Tokenizer
Lexical Lookup
NP Parser
Document-centric
Java programs and APIs

8
Java NLP Tools Tokenizer

Tokenizes text into
  Sections (paragraphs)
  Sentences
  Tokens
Can handle
  Free text
  HTML
  MEDLINE abstracts

[Diagram: Document > Sections > Section 1 > Sentences > Sentence 1 > Tokens > Token 1]
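
The Document / Sentence / Token hierarchy can be illustrated with the JDK alone. This is a rough sketch using java.text.BreakIterator, not the NLS tokenizer: the class TokenizerSketch, its Span type, and the choice of inclusive end offsets (to mirror the offsets the tool prints) are all assumptions made for the example.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class TokenizerSketch {
    /** A piece of text with document-level character offsets; end is inclusive. */
    static final class Span {
        final int begin;
        final int end;
        final String text;
        Span(int begin, int end, String text) {
            this.begin = begin;
            this.end = end;
            this.text = text;
        }
    }

    /** Cuts text at the iterator's boundaries, dropping whitespace-only pieces. */
    static List<Span> split(String text, int offset, BreakIterator it) {
        List<Span> spans = new ArrayList<>();
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String piece = text.substring(start, end);
            String trimmed = piece.trim();
            if (trimmed.isEmpty()) {
                continue; // a run of whitespace between words
            }
            int begin = offset + start + piece.indexOf(trimmed);
            spans.add(new Span(begin, begin + trimmed.length() - 1, trimmed));
        }
        return spans;
    }

    static List<Span> sentences(String text) {
        return split(text, 0, BreakIterator.getSentenceInstance(Locale.ENGLISH));
    }

    static List<Span> tokens(Span sentence) {
        return split(sentence.text, sentence.begin, BreakIterator.getWordInstance(Locale.ENGLISH));
    }

    public static void main(String[] args) {
        String text = "The company has forwarded some materials. But those tests have been inconclusive.";
        for (Span sentence : sentences(text)) {
            for (Span token : tokens(sentence)) {
                System.out.println(token.begin + "|" + token.end + "|" + token.text);
            }
        }
    }
}
```

The real tool also handles sections, HTML, and MEDLINE citations, which this sketch ignores.
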
9
Java NLP Tools Tokenizer

Usage
tokenize.bat|sh Options
  --fileName=fileName
  --outputFileName=fileName
  --inputType=freeText|HTML|medlineCitations
  --sections
  --sentences
  --tokens
  --pipedOutput
  --indicate_citation_end

10
Java NLP Tools Tokenizer
tokenize.bat --inputFile=5.txt --inputType=freeText --sentences --tokens --pipedOutput

Sentence|1|97|182|But those follow-up tests have been inconclusive, state and federal officials said.
Token|16|97|99|0|0|But
Token|17|101|105|1|0|those
Token|18|108|113|2|0|follow
Token|19|114|114|2|0|-
Token|20|115|116|3|0|up
Token|21|118|122|4|0|tests
Token|22|124|127|5|0|have
Token|23|129|132|6|0|been
Token|24|134|145|7|0|inconclusive
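
Piped output is meant for post-processing by other programs. The sketch below shows one way to read a token line back into fields. It assumes the fields are separated by '|' (the delimiter did not survive in this transcript) and that a token line has the shape Token|id|begin|end|...|text; the class PipedTokenSketch and its field names are illustrative, and the two fields between the end offset and the text are left uninterpreted.

```java
class PipedTokenSketch {
    final int id;
    final int begin;   // offset of the first character
    final int end;     // offset of the last character
    final String text;

    PipedTokenSketch(int id, int begin, int end, String text) {
        this.id = id;
        this.begin = begin;
        this.end = end;
        this.text = text;
    }

    /** Parses one line of the assumed form Token|id|begin|end|...|text. */
    static PipedTokenSketch parse(String line) {
        String[] f = line.split("\\|", -1);
        if (f.length < 5 || !f[0].equals("Token")) {
            throw new IllegalArgumentException("Not a token line: " + line);
        }
        // Note: a token whose own text is "|" would need special handling.
        return new PipedTokenSketch(
            Integer.parseInt(f[1]),
            Integer.parseInt(f[2]),
            Integer.parseInt(f[3]),
            f[f.length - 1]);
    }

    public static void main(String[] args) {
        PipedTokenSketch t = parse("Token|17|101|105|1|0|those");
        System.out.println(t.text + " spans " + t.begin + ".." + t.end);
    }
}
```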

11
Java NLP Tools Tokenizer

// Create a TokenizeAPI object
TokenizeAPI tokenizer = new TokenizeAPI( argv );
// Tokenize the file
Document aDocument = tokenizer.processDocument( aFile );
Vector tokens = aDocument.getTokens();
int numberOfTokens = tokens.size();
Token aToken = null;
// Print the tokens out
for ( int i = 0; i < numberOfTokens; i++ ) {
  aToken = (Token) tokens.get(i);
  System.out.println( aToken.toPipedString() );
}

12
NLP Tools Lexical Lookup

Chunks tokens into terms
  From the SPECIALIST Lexicon
  From regular expressions

[Diagram: Document > Sections > Section 1 > Sentences > Sentence 1 > LexicalElements > Lexical Element 1 > Tokens]
13
Java NLP Tools Lexical Lookup

Usage
LexicalLookup.bat|sh Options
  --fileName=fileName
  --outputFileName=fileName
  --inputType=freeText|HTML|medlineCitations
  --sections
  --sentences
  --lexicalElements
  --lexicalEntries
  --tokens
  --pipedOutput

14
Java NLP Tools Lexical Lookup
LexicalLookup.bat --inputFile=5.txt --inputType=freeText --lexicalElements --lexicalEntries --pipedOutput

Lexical Element|17|LEXICON|prep|But|97|99
LexicalEntry|but|conj|base|E0014465
LexicalEntry|but|prep|base|E0014464
Lexical Element|18|LEXICON|det|those|101|105
LexicalEntry|those|det|plural|E0060728
LexicalEntry|those|pron|base|E0060729
Lexical Element|20|LEXICON|adj|follow-up|108|116
LexicalEntry|follow-up|adj|base|E0028422
Lexical Element|23|LEXICON|noun|tests|118|122
LexicalEntry|tests|verb|pres3s|E0060349
LexicalEntry|tests|noun|plural|E0060348
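
Each lexical element in the listing above is followed by the lexicon entries it could correspond to. A small sketch of how that one-to-many structure might be rebuilt from the piped lines follows. It assumes a '|' delimiter, that the fifth field of a "Lexical Element" line is the matched text, and that the last field of a "LexicalEntry" line is the entry identifier; the class name LexicalOutputSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LexicalOutputSketch {
    /** Maps each element's matched text to the identifiers of the entry lines that follow it. */
    static Map<String, List<String>> entriesByElement(List<String> lines) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        List<String> current = null;
        for (String line : lines) {
            String[] f = line.split("\\|", -1);
            if (f[0].equals("Lexical Element") && f.length >= 5) {
                current = new ArrayList<>();
                result.put(f[4], current);        // f[4]: matched text (assumed position)
            } else if (f[0].equals("LexicalEntry") && current != null) {
                current.add(f[f.length - 1]);     // last field: entry identifier
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "Lexical Element|17|LEXICON|prep|But|97|99",
            "LexicalEntry|but|conj|base|E0014465",
            "LexicalEntry|but|prep|base|E0014464");
        System.out.println(entriesByElement(lines));
    }
}
```

A repeated word would overwrite the earlier map entry here; keying on the element number instead avoids that.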

15
Java NLP Tools Lexical Lookup
LexicalLookup.bat --inputFile=5.txt --inputType=freeText --lexicalElements --lexicalEntries --pipedOutput

Lexical Element|12|SHAPE|Unlabeled|unknown|Richmond|67|74
Lexical Element|13|LEXICON|prep|for|76|78
Lexical Element|14|LEXICON|adj|further|80|86
Lexical Element|15|LEXICON|verb|testing|88|94
Lexical Element|16|PUNCTUATION|punctuation|.|95|95
Lexical Element|17|LEXICON|prep|But|97|99
Lexical Element|18|LEXICON|det|those|101|105
Lexical Element|20|LEXICON|adj|follow-up|108|116
Lexical Element|23|LEXICON|noun|tests|118|122
Lexical Element|24|LEXICON|aux|have|124|127

16
Java NLP Tools Lexical Lookup

// Create a LexicalLookupAPI object
LexicalLookupAPI look = new LexicalLookupAPI( argv );
// Chunk the file
Document aDocument = look.processDocument( aFile );
Vector les = aDocument.getLexicalElements();
int numberOfLexElements = les.size();
LexicalElement aLexElement = null;
// Print the LexicalElements out
for ( int i = 0; i < numberOfLexElements; i++ ) {
  aLexElement = (LexicalElement) les.get(i);
  System.out.println( aLexElement.toPipedString() );
}

17
NLP Tools NpParser

Chunks sentences into simple phrases

18
Java NLP Tools NpParser

Usage
npParser.bat|sh Options
  --fileName=fileName
  --outputFileName=fileName
  --inputType=freeText|HTML|medlineCitations
  --sections
  --sentences
  --phrases|--nps|--mincoMan
  --lexicalElements
  --lexicalEntries
  --tokens
  --pipedOutput

19
Java NLP Tools NpParser
npParser.bat --inputFile=5.txt --inputType=freeText --phrases --pipedOutput

Phrase|0|0|10|The company|company
Phrase|1|12|14|has
Phrase|2|16|24|forwarded
Phrase|3|26|39|some materials|materials
Phrase|4|41|62|to a state laboratory|state laboratory
Phrase|5|64|74|in Richmond|Richmond
Phrase|6|76|86|for further|further
Phrase|7|88|94|testing

20
Java NLP Tools NpParser

// Create a Parser object
Parser parser = new Parser( argv );
// Parse the file
Document aDocument = parser.processDocument( aFile );
Vector phrases = aDocument.getPhrases();
int numberOfPhrases = phrases.size();
Phrase aPhrase = null;
// Print the Phrases out
for ( int i = 0; i < numberOfPhrases; i++ ) {
  aPhrase = (Phrase) phrases.get(i);
  System.out.println( aPhrase.toPipedString() );
}

21
MMTx: MetaMap Technology Transfer

Maps text phrases to Metathesaurus concepts
Java implementation of MetaMap

22
MMTx
Document
Tokenization
POS Tagger Client
Lexical Lookup
Parser
Variant Generation
Candidate Retrieval
Evaluation
Final Mapping
Post-processing Presentation
23
MMTx

Usage
MMTx <options> --fileName=infile --outputFileName=outfile
  --strict_model|--moderate_model|--relaxed_model
  --KSYear=year|--mm_data_version=customName
  --threshold=lowestScore
  --truncate_candidates_mappings
  --term_processing|--allow_overmatches|--allow_concept_gaps
  --composite_phrases
  --prefer_multiple_concepts
  --fielded_output

24
MMTx
MMTx --inputFile=5.txt --inputType=freeText

Processing 00000000.tx.3: One problem is caused by the VecTest itself, which uses a dipstick to measure the presence of a protein associated with the parasite that causes malaria.
Phrase: "One problem"
Meta Candidates (2):
  861 Problem, NOS [Finding,Pathologic Function]
  694 One [Quantitative Concept]
Meta Mapping (888):
  694 One [Quantitative Concept]
  861 Problem, NOS [Finding,Pathologic Function]

25
MMTx

// Create an MMTxAPI object
MMTxAPI mmtx = new MMTxAPI( argv );
// Analyze the file
Document aDocument = mmtx.processDocument( aFile );
Vector phrases = aDocument.getPhrases();
int numberOfPhrases = phrases.size();
Phrase aPhrase = null;
// Retrieve the final mappings for each phrase
for ( int i = 0; i < numberOfPhrases; i++ ) {
  aPhrase = (Phrase) phrases.get(i);
  finalConcepts = aPhrase.getFinalMappings();
}

26
Useful Text Feature Classes
Many-to-one Relationship
27
GSpell
28
GSpell

Spelling suggestion tool
Pure Java application with Java APIs
Support for multi-word dictionary entries

29
GSpell Usage

Usage
GSpellFind.sh|bat
  --dictionary=NameOfDictionary
  --inputFile=Source --outputFile=target
  --truncate=N --considerNCandidates=N
  --maxEditDistance=N
  --fieldedText --termField=X
  --correctField=Y
  --reportTime --version|--help

30
GSpell Example

anonomous|anonymous|1.0|0.8734230160180236|NGrams
anonomous|allonomous|2.0|0.5819672267388108|NGrams
anonomous|autonomous|2.0|0.5819672267388108|NGrams
anonomous|anadromous|3.0|0.2958160192082048|NGrams
anonomous|analogous|3.0|0.2958160192082048|NGrams
anonomous|anomalous|3.0|0.2958160192082048|NGrams
anonomous|anonymously|3.0|0.295816019208248|NGrams
anonomous|anonymes|3.0|0.2958160192082048|Metaphone
anonomous|anonyms|3.0|0.2958160192082048|Metaphone
anonomous|acoprous|4.0|0.11470810702102521|NGrams
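
The slide does not label the fields, but the first number after each suggestion (1.0, 2.0, 3.0, 4.0) looks like an edit distance from the misspelled term. As an illustration only, here is a standard Levenshtein distance in Java. Whether GSpell computes its distance exactly this way is an assumption, and the class name EditDistanceSketch is hypothetical.

```java
class EditDistanceSketch {
    /** Classic dynamic-programming Levenshtein distance (insert, delete, substitute each cost 1). */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning the empty prefix of a into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into the empty string takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                    Math.min(curr[j - 1] + 1, prev[j] + 1),
                    prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("anonomous", "anonymous"));   // one substitution: o -> y
        System.out.println(levenshtein("anonomous", "allonomous"));  // one substitution plus one insertion
    }
}
```

For the first three rows this gives 1, 2, and 2, matching the values in the listing.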

31
GSpell Indexing

Usage
GSpellIndex.sh|bat
  --dictionary=NameOfDictionary
  --inputFile=SourceFile
  --reportTime --version|--help
Format for the input file
One word per line

32
GSpell Developers Guide

import gov.nih.nlm.nls.gspell.GSpell;     // <------- These come from the gspell.jar
import gov.nih.nlm.nls.gspell.Candidate;

GSpell gspell = new GSpell( _dictionaryName, GSpell.READ_ONLY );
Candidate[] candidates = gspell.find( aTerm );
if ( candidates != null ) {
  for ( int i = 0; i < candidates.length; i++ )
    System.out.println( candidates[i].toString() );
} else {
  System.out.println("No Suggestions");
}
gspell.cleanup();

33
Downloadable Resources

umlsks.nlm.nih.gov
umlsLex.nlm.nih.gov
Lvg
Java NLP Tools
GSpell
mmtx.nlm.nih.gov
Requires a UMLS License Agreement

34
Lexical Tools for UMLS Developers
November 10, 2002
Allen C. Browne, Guy Divita, Chris Lu
Lister Hill National Center for Biomedical Communications
National Library of Medicine

Lexical Systems: umlsLex.nlm.nih.gov
Email: umlslex@nlm.nih.gov
Knowledge Source Server: http://umlsks.nlm.nih.gov
UMLS Information: http://umlsInfo.nlm.nih.gov
35
Appendix