Title: Additional NLS Tools
1. Additional NLS Tools
- Knowledge Source Server Java Client API
- NLS Java NLP Tools
- MMTx
- GSpell
2. Knowledge Source Server Java Client API
- XML over RMI
- Java UMLS Object Model
3. Chapter 5. Building UMLSKS Software Applications
Chapter Contents
- 5.1 Building and Running Your Program
- 5.2 API Package Structure
- 5.3 Program Initialization
- 5.4 UMLSKS API Functions
- 5.5 Using the UMLSKS Object Model
4. Knowledge Source Server Java Client API
5. Knowledge Source Server Java Client API
// Initialize the client
KSSRetrieverV2_1 retriever =
    (KSSRetrieverV2_1) Naming.lookup("//umlsks.nlm.nih.gov/KSSRetriever");
// Send a request to the server
char[] result = retriever.findBasicConcept(ksYear, termName, sabs,
    language, KSSRetriever.NormalizeString, false);
// Convert the XML into ...
ConceptVector concepts = ConceptVector.getInstance(String.valueOf(result));
6. Knowledge Source Server Java Client API
<concept>
  <cui>C0032615</cui>
  <cn>Fatty Acids, Polyunsaturated</cn>
  <term>
    <lui>L0032615</lui>
    <tn>Fatty Acids, Polyunsaturated</tn>
    <ts>P</ts>
    <lat>ENG</lat>
    <termVariant>
      <sui>S0010240</sui>
      <stt>VW</stt>
      <str>Acids, Polyunsaturated Fatty</str>
      <strSource>
        <sab>MSH2002</sab><tty>PM</tty><scd>D005231</scd><srl>0</srl>
        ...
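The record above is plain XML, so it can also be read with the standard Java XML parser when the UMLSKS object model is not at hand. The following is a minimal sketch, not part of the UMLSKS API: the class name `ConceptXmlSketch` and its helper methods are hypothetical, and the sample string is a shortened copy of the record with closing tags added, since the slide truncates the real one.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ConceptXmlSketch {
    /** Parses an XML string and returns its root element. */
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse concept XML", e);
        }
    }

    /** Returns the trimmed text of the first descendant with the given tag, or null if absent. */
    static String firstText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        // Assumption: shortened record with illustrative closing tags;
        // the values are the ones shown on the slide.
        String xml = "<concept><cui>C0032615</cui><cn>Fatty Acids, Polyunsaturated</cn>"
                + "<term><lui>L0032615</lui><tn>Fatty Acids, Polyunsaturated</tn>"
                + "<ts>P</ts><lat>ENG</lat></term></concept>";
        Element concept = parse(xml);
        System.out.println(firstText(concept, "cui") + " " + firstText(concept, "cn"));
    }
}
```

In the client code on the previous slide, `ConceptVector.getInstance(...)` performs this conversion; the sketch only illustrates what the raw response contains.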
7. NLS Java NLP Tools
- Tokenizer
- Lexical Lookup
- NP Parser
- Document centric
- Java programs and APIs
8. Java NLP Tools: Tokenizer
- Tokenizes text into
  - Sections (paragraphs)
  - Sentences
  - Tokens
- Can handle
  - Free text
  - HTML
  - MEDLINE abstracts
[Diagram: Document -> Sections (Section 1) -> Sentences (Sentence 1) -> Tokens (Token 1)]
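The document, sentence, and token hierarchy can be illustrated in plain Java. This is a rough sketch under stated assumptions, not the NLS tokenizer: the class `TokenizerSketch`, its `Span` type, and the two regular expressions are hypothetical, sections are omitted, and the real tool handles many cases (abbreviations, HTML, MEDLINE fields) that these patterns do not. Offsets are inclusive character positions in the original text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenizerSketch {
    /** A piece of text with inclusive character offsets into the original document. */
    static final class Span {
        final int start;
        final int end;
        final String text;

        Span(int start, int end, String text) {
            this.start = start;
            this.end = end;
            this.text = text;
        }
    }

    // Naive patterns (assumptions): a sentence runs up to the next '.', '!' or '?';
    // a token is a run of word characters or a single punctuation character.
    private static final Pattern SENTENCE = Pattern.compile("[^.!?\\s][^.!?]*[.!?]?");
    private static final Pattern TOKEN = Pattern.compile("\\w+|[^\\w\\s]");

    private static List<Span> find(Pattern pattern, String text, int base) {
        List<Span> spans = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            spans.add(new Span(base + m.start(), base + m.end() - 1, m.group()));
        }
        return spans;
    }

    static List<Span> sentences(String document) {
        return find(SENTENCE, document, 0);
    }

    static List<Span> tokens(Span sentence) {
        return find(TOKEN, sentence.text, sentence.start);
    }

    public static void main(String[] args) {
        String doc = "The company has forwarded some materials. "
                + "But those follow-up tests have been inconclusive.";
        for (Span s : sentences(doc)) {
            System.out.println("Sentence|" + s.start + "|" + s.end + "|" + s.text);
            for (Span t : tokens(s)) {
                System.out.println("  Token|" + t.start + "|" + t.end + "|" + t.text);
            }
        }
    }
}
```

Keeping offsets relative to the whole document, rather than to each sentence, lets later stages map any token back to its position in the source file.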
9. Java NLP Tools: Tokenizer
- Usage
- tokenize.bat|sh [Options]
- --fileName=fileName
- --outputFileName=fileName
- --inputType=freeText|HTML|medlineCitations
- --sections
- --sentences
- --tokens
- --pipedOutput
- --indicate_citation_end
10. Java NLP Tools: Tokenizer
tokenize.bat --inputFile=5.txt --inputType=freeText --sentences --tokens --pipedOutput
- Sentence|1|97|182|But those follow-up tests have been inconclusive, state and federal officials said.
- Token|16|97|99|0|0|But
- Token|17|101|105|1|0|those
- Token|18|108|113|2|0|follow
- Token|19|114|114|2|0|-
- Token|20|115|116|3|0|up
- Token|21|118|122|4|0|tests
- Token|22|124|127|5|0|have
- Token|23|129|132|6|0|been
- Token|24|134|145|7|0|inconclusive
11. Java NLP Tools: Tokenizer
// Create a TokenizeAPI object
TokenizeAPI tokenizer = new TokenizeAPI( argv );
// Tokenize the file
Document aDocument = tokenizer.processDocument( aFile );
Vector tokens = aDocument.getTokens();
int numberOfTokens = tokens.size();
Token aToken = null;
// Print the tokens out
for ( int i = 0; i < numberOfTokens; i++ ) {
    aToken = (Token) tokens.get(i);
    System.out.println( aToken.toPipedString() );
}
12. NLP Tools: Lexical Lookup
- Chunks tokens into terms
  - From the SPECIALIST Lexicon
  - From regular expressions
[Diagram: Document -> Sections (Section 1) -> Sentences (Sentence 1) -> LexicalElements (Lexical Element 1) -> Tokens]
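Chunking tokens into lexicon terms can be pictured as a greedy longest-match lookup: at each position, try the longest run of tokens that forms a known term and fall back to the single token otherwise. The sketch below makes several assumptions: the class `LexicalLookupSketch`, the tiny in-memory lexicon, and the joining rule for hyphens are all hypothetical, and it covers only the lexicon side, not the regular-expression side. It is not how the NLS tool is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class LexicalLookupSketch {
    /** Joins tokens with single spaces, except that nothing is inserted around a "-" token. */
    static String join(List<String> tokens) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            String t = tokens.get(i);
            boolean hyphenBoundary = t.equals("-") || (i > 0 && tokens.get(i - 1).equals("-"));
            if (i > 0 && !hyphenBoundary) {
                sb.append(' ');
            }
            sb.append(t);
        }
        return sb.toString();
    }

    /** Greedy longest match: prefers the longest multi-token term found in the lexicon. */
    static List<String> chunk(List<String> tokens, Set<String> lexicon, int maxTermTokens) {
        List<String> elements = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            String term = tokens.get(i);
            int consumed = 1;
            for (int len = Math.min(maxTermTokens, tokens.size() - i); len > 1; len--) {
                String candidate = join(tokens.subList(i, i + len));
                if (lexicon.contains(candidate.toLowerCase())) {
                    term = candidate;
                    consumed = len;
                    break;
                }
            }
            elements.add(term);
            i += consumed;
        }
        return elements;
    }

    public static void main(String[] args) {
        // Hypothetical two-entry lexicon standing in for the SPECIALIST Lexicon.
        Set<String> lexicon = new HashSet<>(Arrays.asList("follow-up", "state laboratory"));
        List<String> tokens = Arrays.asList("those", "follow", "-", "up", "tests");
        System.out.println(chunk(tokens, lexicon, 3));
    }
}
```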
13. Java NLP Tools: Lexical Lookup
- Usage
- LexicalLookup.bat|sh [Options]
- --fileName=fileName
- --outputFileName=fileName
- --inputType=freeText|HTML|medlineCitations
- --sections
- --sentences
- --lexicalElements
- --lexicalEntries
- --tokens
- --pipedOutput
14. Java NLP Tools: Lexical Lookup
LexicalLookup.bat --inputFile=5.txt --inputType=freeText --lexicalElements --lexicalEntries --pipedOutput
- Lexical Element|17|LEXICON|prep|But|97|99
  - LexicalEntry|but|conj|base|E0014465
  - LexicalEntry|but|prep|base|E0014464
- Lexical Element|18|LEXICON|det|those|101|105
  - LexicalEntry|those|det|plural|E0060728
  - LexicalEntry|those|pron|base|E0060729
- Lexical Element|20|LEXICON|adj|follow-up|108|116
  - LexicalEntry|follow-up|adj|base|E0028422
- Lexical Element|23|LEXICON|noun|tests|118|122
  - LexicalEntry|tests|verb|pres3s|E0060349
  - LexicalEntry|tests|noun|plural|E0060348
15. Java NLP Tools: Lexical Lookup
LexicalLookup.bat --inputFile=5.txt --inputType=freeText --lexicalElements --lexicalEntries --pipedOutput
- Lexical Element|12|SHAPE|Unlabeledunknown|Richmond|67|74
- Lexical Element|13|LEXICON|prep|for|76|78
- Lexical Element|14|LEXICON|adj|further|80|86
- Lexical Element|15|LEXICON|verb|testing|88|94
- Lexical Element|16|PUNCTUATION|punctuation|.|95|95
- Lexical Element|17|LEXICON|prep|But|97|99
- Lexical Element|18|LEXICON|det|those|101|105
- Lexical Element|20|LEXICON|adj|follow-up|108|116
- Lexical Element|23|LEXICON|noun|tests|118|122
- Lexical Element|24|LEXICON|aux|have|124|127
16. Java NLP Tools: Lexical Lookup
// Create a LexicalLookupAPI object
LexicalLookupAPI look = new LexicalLookupAPI( argv );
// Chunk the file
Document aDocument = look.processDocument( aFile );
Vector les = aDocument.getLexicalElements();
int numberOfLexElements = les.size();
LexicalElement aLexElement = null;
// Print the LexicalElements out
for ( int i = 0; i < numberOfLexElements; i++ ) {
    aLexElement = (LexicalElement) les.get(i);
    System.out.println( aLexElement.toPipedString() );
}
17. NLP Tools: NpParser
- Chunks sentences into simple phrases
18. Java NLP Tools: NpParser
- Usage
- npParser.bat|sh [Options]
- --fileName=fileName
- --outputFileName=fileName
- --inputType=freeText|HTML|medlineCitations
- --sections
- --sentences
- --phrases|--nps|--mincoMan
- --lexicalElements
- --lexicalEntries
- --tokens
- --pipedOutput
19. Java NLP Tools: NpParser
npParser.bat --inputFile=5.txt --inputType=freeText --phrases --pipedOutput
- Phrase|0|0|10|The company|company
- Phrase|1|12|14|has
- Phrase|2|16|24|forwarded
- Phrase|3|26|39|some materials|materials
- Phrase|4|41|62|to a state laboratory|state laboratory
- Phrase|5|64|74|in Richmond|Richmond
- Phrase|6|76|86|for further|further
- Phrase|7|88|94|testing
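Piped output of this kind is easy to post-process. The sketch below assumes the fields are separated by `|` and reads them as phrase number, start offset, end offset, phrase text, and an optional trailing head word; that field interpretation is inferred from the example rather than taken from the tool's documentation, and the class `PhraseRecordSketch` is hypothetical.

```java
class PhraseRecordSketch {
    final int index;
    final int start;
    final int end;
    final String text;
    final String head; // empty when the record carries no head field

    private PhraseRecordSketch(int index, int start, int end, String text, String head) {
        this.index = index;
        this.start = start;
        this.end = end;
        this.text = text;
        this.head = head;
    }

    /** Parses one piped record such as "Phrase|0|0|10|The company|company". */
    static PhraseRecordSketch parse(String line) {
        // The -1 limit keeps trailing empty fields instead of silently dropping them.
        String[] f = line.split("\\|", -1);
        if (f.length < 5 || !"Phrase".equals(f[0])) {
            throw new IllegalArgumentException("Not a piped Phrase record: " + line);
        }
        String head = f.length > 5 ? f[5] : "";
        return new PhraseRecordSketch(
                Integer.parseInt(f[1]), Integer.parseInt(f[2]), Integer.parseInt(f[3]), f[4], head);
    }

    public static void main(String[] args) {
        PhraseRecordSketch p = parse("Phrase|4|41|62|to a state laboratory|state laboratory");
        System.out.println(p.text + " -> head: " + p.head);
    }
}
```

Inside a Java program, the `Phrase` objects returned by the parser API avoid this text round trip; the sketch is meant for scripts that consume the command-line output.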
20. Java NLP Tools: NpParser
// Create a Parser object
Parser parser = new Parser( argv );
// Parse the file
Document aDocument = parser.processDocument( aFile );
Vector phrases = aDocument.getPhrase();
int numberOfPhrases = phrases.size();
Phrase aPhrase = null;
// Print the Phrases out
for ( int i = 0; i < numberOfPhrases; i++ ) {
    aPhrase = (Phrase) phrases.get(i);
    System.out.println( aPhrase.toPipedString() );
}
21. MMTx: MetaMap Technology Transfer
- Maps text phrases to Metathesaurus concepts
- Java implementation of MetaMap
22. MMTx
Document
Tokenization
POS Tagger Client
Lexical Lookup
Parser
Variant Generation
Candidate Retrieval
Evaluation
Final Mapping
Post-processing Presentation
23. MMTx
- Usage
- MMTx <options> --fileName=infile --outputFileName=outfile
- --strict_model|--moderate_model|--relaxed_model
- --KSYear=year|--mm_data_version=customName
- --threshold=lowestScore
- --truncate_candidates_mappings
- --term_processing|--allow_overmatches|--allow_concept_gaps
- --composite_phrases
- --prefer_multiple_concepts
- --fielded_output
24. MMTx
MMTx --inputFile=5.txt --inputType=freeText
- Processing 00000000.tx.3: One problem is caused by the VecTest itself, which uses a dipstick to measure the presence of a protein associated with the parasite that causes malaria.
- Phrase: "One problem"
- Meta Candidates (2):
  - 861 Problem, NOS [Finding,Pathologic Function]
  - 694 One [Quantitative Concept]
- Meta Mapping (888):
  - 694 One [Quantitative Concept]
  - 861 Problem, NOS [Finding,Pathologic Function]
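Each candidate line in this human-readable output carries a score, a concept name, and a list of semantic types. The sketch below pulls those three parts out with a regular expression, assuming the semantic types are enclosed in square brackets as in MetaMap's human-readable format. The class `CandidateLineSketch` and its field names are hypothetical; for programmatic use the MMTx API or `--fielded_output` is likely the better route.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CandidateLineSketch {
    // score, then concept name, then bracketed comma-separated semantic types
    private static final Pattern LINE =
            Pattern.compile("^\\s*(\\d+)\\s+(.+?)\\s*\\[([^\\]]*)\\]\\s*$");

    final int score;
    final String conceptName;
    final List<String> semanticTypes;

    private CandidateLineSketch(int score, String conceptName, List<String> semanticTypes) {
        this.score = score;
        this.conceptName = conceptName;
        this.semanticTypes = semanticTypes;
    }

    /** Parses a line such as "861 Problem, NOS [Finding,Pathologic Function]". */
    static CandidateLineSketch parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a candidate line: " + line);
        }
        List<String> types = Arrays.asList(m.group(3).trim().split("\\s*,\\s*"));
        return new CandidateLineSketch(Integer.parseInt(m.group(1)), m.group(2), types);
    }

    public static void main(String[] args) {
        CandidateLineSketch c = parse("861 Problem, NOS [Finding,Pathologic Function]");
        System.out.println(c.score + " | " + c.conceptName + " | " + c.semanticTypes);
    }
}
```

The concept name itself may contain commas ("Problem, NOS"), which is why the pattern anchors on the brackets rather than splitting the whole line on commas.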
25. MMTx
// Create a MMTxAPI object
MMTxAPI mmtx = new MMTxAPI( argv );
// Analyze the file
Document aDocument = mmtx.processDocument( aFile );
Vector phrases = aDocument.getPhrases();
int numberOfPhrases = phrases.size();
Phrase aPhrase = null;
// Retrieve the final mappings for each phrase
for ( int i = 0; i < numberOfPhrases; i++ ) {
    aPhrase = (Phrase) phrases.get(i);
    finalConcepts = aPhrase.getFinalMappings();
}
26. Useful Text Feature Classes
Many-to-one Relationship
27. GSpell
28. GSpell
- Spelling suggestion tool
- Pure Java application with Java APIs
- Support for multi-word dictionary entries
29. GSpell Usage
- Usage
- GSpellFind.sh|bat
- --dictionary=NameOfDictionary
- --inputFile=Source --outputFile=target
- --truncate=N --considerNCandidates=N
- --maxEditDistance=N
- --fieldedText --termField=X --correctField=Y
- --reportTime --version|--help
30. GSpell Example
- anonomous|anonymous|1.0|0.8734230160180236|NGrams
- anonomous|allonomous|2.0|0.5819672267388108|NGrams
- anonomous|autonomous|2.0|0.5819672267388108|NGrams
- anonomous|anadromous|3.0|0.2958160192082048|NGrams
- anonomous|analogous|3.0|0.2958160192082048|NGrams
- anonomous|anomalous|3.0|0.2958160192082048|NGrams
- anonomous|anonymously|3.0|0.295816019208248|NGrams
- anonomous|anonymes|3.0|0.2958160192082048|Metaphone
- anonomous|anonyms|3.0|0.2958160192082048|Metaphone
- anonomous|acoprous|4.0|0.11470810702102521|NGrams
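The third column in this output (1.0, 2.0, 3.0, 4.0) looks like an edit distance between the misspelling and each suggestion, which matches the `--maxEditDistance` option. As an illustration only, here is a standard Levenshtein distance in Java; it is an assumption that GSpell's measure resembles it, and the class `EditDistanceSketch` is hypothetical rather than part of the GSpell API.

```java
class EditDistanceSketch {
    /** Classic Levenshtein distance: minimum number of insertions, deletions, and substitutions. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Turning the empty prefix of a into the first j characters of b takes j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting the first i characters of a
            for (int j = 1; j <= b.length(); j++) {
                int substitution = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                curr[j] = Math.min(substitution, Math.min(insertion, deletion));
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        String typo = "anonomous";
        for (String suggestion : new String[] {"anonymous", "autonomous", "analogous"}) {
            System.out.println(typo + "|" + suggestion + "|" + distance(typo, suggestion));
        }
    }
}
```

For the three suggestions used in `main`, this distance agrees with the values shown in the example output (1, 2, and 3). The fourth column and the NGrams or Metaphone label come from GSpell's own ranking and retrieval methods, which this sketch does not attempt to reproduce.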
31. GSpell Indexing
- Usage
- GSpellIndex.sh|bat
- --dictionary=NameOfDictionary
- --inputFile=SourceFile
- --reportTime --version|--help
- Format for the input file
  - One word per line
32. GSpell Developers Guide
import gov.nih.nlm.nls.gspell.GSpell;     // <------- These come from the gspell.jar
import gov.nih.nlm.nls.gspell.Candidate;
GSpell gspell = new GSpell( _dictionaryName, GSpell.READ_ONLY );
candidates = gspell.find( aTerm );
if ( candidates != null )
    for ( int i = 0; i < candidates.length; i++ )
        System.out.println( candidates[i].toString() );
else
    System.out.println( "No Suggestions" );
gspell.cleanup();
33. Downloadable Resources
- umlsks.nlm.nih.gov
- umlsLex.nlm.nih.gov
- Lvg
- Java NLP Tools
- GSpell
- mmtx.nlm.nih.gov
- Requires a UMLS License Agreement
34. Lexical Tools for UMLS Developers
November 10, 2002
Allen C. Browne, Guy Divita, Chris Lu
Lister Hill National Center for Biomedical Communications
National Library of Medicine
Lexical Systems: umlsLex.nlm.nih.gov
Email: umlslex_at_nlm.nih.gov
Knowledge Source Server: http://umlsks.nlm.nih.gov
UMLS Information: http://umlsInfo.nlm.nih.gov
35. Appendix
- NormExample.java
- LvgExampleEasy.java
- LvgExampleHarder.java
- LvgExampleEvenHarder.java
- TokenizeExample.java
- LexicalLookupExample.java
- NpParserExample.java
- MMTxExample.java
- GSpellExample.java
- 5.txt
- 5.tokenized
- 5.lexicalLookuped
- 5.parsed
- 5.mmtxed