Title: Additional NLS Tools


1
Additional NLS Tools
  • Knowledge Source Server Java Client API
  • NLS Java NLP Tools
  • MMTx
  • GSpell

2
Knowledge Source Server Java Client API
  • XML over RMI
  • Java UMLS Object Model

3

Chapter 5. Building UMLSKS Software Applications
Chapter Contents
  • 5.1 Building and Running Your Program
  • 5.2 API Package Structure
  • 5.3 Program Initialization
  • 5.4 UMLSKS API Functions
  • 5.5 Using the UMLSKS Object Model

4
Knowledge Source Server Java Client API

5
Knowledge Source Server Java Client API
    // Initialize the client
    KSSRetrieverV2_1 retriever =
        (KSSRetrieverV2_1) Naming.lookup("//umlsks.nlm.nih.gov/KSSRetriever");
    // Send a request to client
    char[] result = retriever.findBasicConcept(ksYear, termName, sabs, language,
        KSSRetriever.NormalizeString, false);
    // Convert the XML into ...
    ConceptVector concepts = ConceptVector.getInstance(String.valueOf(result));

6
Knowledge Source Server Java Client API
    <concept>
      <cui>C0032615</cui>
      <cn>Fatty Acids, Polyunsaturated</cn>
      <term>
        <lui>L0032615</lui>
        <tn>Fatty Acids, Polyunsaturated</tn>
        <ts>P</ts>
        <lat>ENG</lat>
        <termVariant>
          <sui>S0010240</sui>
          <stt>VW</stt>
          <str>Acids, Polyunsaturated Fatty</str>
          <strSource>
            <sab>MSH2002</sab><tty>PM</tty><scd>D005231</scd><srl>0</srl>
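
The slide converts the returned XML with ConceptVector.getInstance. As a rough illustration of what such a conversion has to do, the sketch below reads the cui and cn fields using only the standard Java XML parser. The class name ConceptXmlSketch, its helper methods, and the trimmed SAMPLE string (with closing tags added so that it parses) are hypothetical and are not part of the UMLSKS API.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class ConceptXmlSketch {
    // Hypothetical, trimmed sample: closing tags were added so the XML is well-formed.
    static final String SAMPLE =
          "<concept>"
        + "<cui>C0032615</cui>"
        + "<cn>Fatty Acids, Polyunsaturated</cn>"
        + "<term><lui>L0032615</lui><tn>Fatty Acids, Polyunsaturated</tn>"
        + "<ts>P</ts><lat>ENG</lat></term>"
        + "</concept>";

    /** Parses an XML string and returns its root element. */
    static Element parse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Returns the text of the first descendant with the given tag, or null if absent. */
    static String firstText(Element root, String tag) {
        NodeList nodes = root.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
    }

    public static void main(String[] args) {
        Element concept = parse(SAMPLE);
        // Print the concept identifier and concept name, pipe-separated.
        System.out.println(firstText(concept, "cui") + "|" + firstText(concept, "cn"));
    }
}
```

The object model shipped with the client presumably does much more than this (terms, variants, sources); the sketch only shows the XML-to-field step.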

7
NLS Java NLP Tools
  • Tokenizer
  • Lexical Lookup
  • NP Parser
  • Document Centric
  • Java Programs and APIs

8
Java NLP Tools Tokenizer
  • Tokenizes text into
    • Sections (paragraphs)
    • Sentences
    • Tokens
  • Can handle
    • Free text
    • HTML
    • MEDLINE abstracts

[Diagram: Document > Sections > Section 1 > Sentences > Sentence 1 > Tokens > Token 1]

9
Java NLP Tools Tokenizer
  • Usage
  • tokenize.bat|sh Options
  • --fileName=fileName
  • --outputFileName=fileName
  • --inputType=freeText|HTML|medlineCitations
  • --sections
  • --sentences
  • --tokens
  • --pipedOutput
  • --indicate_citation_end

10
Java NLP Tools Tokenizer
tokenize.bat --inputFile=5.txt --inputType=freeText --sentences --tokens --pipedOutput
  • Sentence|1|97|182|But those follow-up tests have been inconclusive, state and federal officials said.
  • Token|16|97|99|0|0|But
  • Token|17|101|105|1|0|those
  • Token|18|108|113|2|0|follow
  • Token|19|114|114|2|0|-
  • Token|20|115|116|3|0|up
  • Token|21|118|122|4|0|tests
  • Token|22|124|127|5|0|have
  • Token|23|129|132|6|0|been
  • Token|24|134|145|7|0|inconclusive
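
To make the token offsets in the output above concrete, here is a minimal sketch of offset-preserving tokenization in plain Java. It is not the NLS tokenizer: the class OffsetTokenizerSketch, the Tok type, the regular expression, and the simplified four-field piped layout are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OffsetTokenizerSketch {
    /** A token with inclusive start and end character offsets into the source text. */
    static final class Tok {
        final int id;
        final int start;
        final int end;
        final String text;

        Tok(int id, int start, int end, String text) {
            this.id = id;
            this.start = start;
            this.end = end;
            this.text = text;
        }

        /** Simplified, hypothetical piped layout: Token|id|start|end|text. */
        String toPiped() {
            return "Token|" + id + "|" + start + "|" + end + "|" + text;
        }
    }

    // A run of letters or digits, or any single non-space character (punctuation).
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{N}]+|\\S");

    static List<Tok> tokenize(String text) {
        List<Tok> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            // Matcher.end() is exclusive, so subtract one for an inclusive end offset.
            tokens.add(new Tok(tokens.size(), m.start(), m.end() - 1, m.group()));
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Tok t : tokenize("But those follow-up tests")) {
            System.out.println(t.toPiped());
        }
    }
}
```

As in the slide, the hyphenated word is split into three tokens (follow, -, up), and the offsets let later stages map any token back to its position in the original text.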

11
Java NLP Tools Tokenizer
    // Create a TokenizeAPI object
    TokenizeAPI tokenizer = new TokenizeAPI( argv );
    // Tokenize the file
    Document aDocument = tokenizer.processDocument( aFile );
    Vector tokens = aDocument.getTokens();
    int numberOfTokens = tokens.size();
    Token aToken = null;
    // Print the tokens out
    for ( int i = 0; i < numberOfTokens; i++ ) {
        aToken = (Token) tokens.get(i);
        System.out.println( aToken.toPipedString() );
    }

12
NLP Tools Lexical Lookup
  • Chunks tokens into terms
    • From the SPECIALIST Lexicon
    • From regular expressions

[Diagram: Document > Sections > Section 1 > Sentences > Sentence 1 > LexicalElements > Lexical Element 1 > Tokens]

13
Java NLP Tools Lexical Lookup
  • Usage
  • LexicalLookup.bat|sh Options
  • --fileName=fileName
  • --outputFileName=fileName
  • --inputType=freeText|HTML|medlineCitations
  • --sections
  • --sentences
  • --lexicalElements
  • --lexicalEntries
  • --tokens
  • --pipedOutput

14
Java NLP Tools Lexical Lookup
LexicalLookup.bat --inputFile=5.txt --inputType=freeText --lexicalElements --lexicalEntries --pipedOutput
  • Lexical Element|17|LEXICON|prep|But|97|99
  • LexicalEntry|but|conj|base|E0014465
  • LexicalEntry|but|prep|base|E0014464
  • Lexical Element|18|LEXICON|det|those|101|105
  • LexicalEntry|those|det|plural|E0060728
  • LexicalEntry|those|pron|base|E0060729
  • Lexical Element|20|LEXICON|adj|follow-up|108|116
  • LexicalEntry|follow-up|adj|base|E0028422
  • Lexical Element|23|LEXICON|noun|tests|118|122
  • LexicalEntry|tests|verb|pres3s|E0060349
  • LexicalEntry|tests|noun|plural|E0060348

15
Java NLP Tools Lexical Lookup
LexicalLookup.bat --inputFile=5.txt --inputType=freeText --lexicalElements --lexicalEntries --pipedOutput
  • Lexical Element|12|SHAPE|Unlabeled|unknown|Richmond|67|74
  • Lexical Element|13|LEXICON|prep|for|76|78
  • Lexical Element|14|LEXICON|adj|further|80|86
  • Lexical Element|15|LEXICON|verb|testing|88|94
  • Lexical Element|16|PUNCTUATION|punctuation|.|95|95
  • Lexical Element|17|LEXICON|prep|But|97|99
  • Lexical Element|18|LEXICON|det|those|101|105
  • Lexical Element|20|LEXICON|adj|follow-up|108|116
  • Lexical Element|23|LEXICON|noun|tests|118|122
  • Lexical Element|24|LEXICON|aux|have|124|127
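
The output above shows several tokens (follow, -, up) merged into one lexical element. One common way to get that behavior is greedy longest-match lookup against a term list. The sketch below illustrates that idea only; it is not how the NLS Lexical Lookup is implemented, and the class name, the toy LEXICON set, and the MAX_TERM_TOKENS limit are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class LongestMatchLookupSketch {
    // Toy stand-in for a lexicon: lower-cased entries, tokens joined by single spaces.
    static final Set<String> LEXICON = new HashSet<>(Arrays.asList(
        "but", "those", "follow - up", "tests", "have"));

    // Longest term, in tokens, that the lookup will try to match.
    static final int MAX_TERM_TOKENS = 3;

    /** Greedily groups tokens into the longest terms found in LEXICON. */
    static List<String> chunk(List<String> tokens) {
        List<String> elements = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int matched = 1;                 // fallback: emit the single token
            String element = tokens.get(i);
            int longest = Math.min(MAX_TERM_TOKENS, tokens.size() - i);
            for (int n = longest; n >= 1; n--) {
                String candidate = String.join(" ", tokens.subList(i, i + n));
                if (LEXICON.contains(candidate.toLowerCase(Locale.ROOT))) {
                    matched = n;
                    element = candidate;
                    break;                   // first hit is the longest match
                }
            }
            elements.add(element);
            i += matched;
        }
        return elements;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("But", "those", "follow", "-", "up", "tests");
        System.out.println(chunk(tokens));
    }
}
```

Tokens that match nothing are passed through unchanged here; in the slide, such cases fall to the regular-expression (SHAPE) and punctuation handling instead.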

16
Java NLP Tools Lexical Lookup
    // Create a LexicalLookupAPI object
    LexicalLookupAPI look = new LexicalLookupAPI( argv );
    // Chunk the file
    Document aDocument = look.processDocument( aFile );
    Vector les = aDocument.getLexicalElements();
    int numberOfLexElements = les.size();
    LexicalElement aLexElement = null;
    // Print the LexicalElements out
    for ( int i = 0; i < numberOfLexElements; i++ ) {
        aLexElement = (LexicalElement) les.get(i);
        System.out.println( aLexElement.toPipedString() );
    }

17
NLP Tools NpParser
  • Chunks sentences into simple phrases

18
Java NLP Tools NpParser
  • Usage
  • npParser.bat|sh Options
  • --fileName=fileName
  • --outputFileName=fileName
  • --inputType=freeText|HTML|medlineCitations
  • --sections
  • --sentences
  • --phrases|--nps|--mincoMan
  • --lexicalElements
  • --lexicalEntries
  • --tokens
  • --pipedOutput

19
Java NLP Tools NpParser
npParser.bat --inputFile=5.txt --inputType=freeText --phrases --pipedOutput
  • Phrase|0|0|10|The company|company
  • Phrase|1|12|14|has
  • Phrase|2|16|24|forwarded
  • Phrase|3|26|39|some materials|materials
  • Phrase|4|41|62|to a state laboratory|state laboratory
  • Phrase|5|64|74|in Richmond|Richmond
  • Phrase|6|76|86|for further|further
  • Phrase|7|88|94|testing

20
Java NLP Tools NpParser
    // Create a Parser object
    Parser parser = new Parser( argv );
    // Parse the file
    Document aDocument = parser.processDocument( aFile );
    Vector phrases = aDocument.getPhrase();
    int numberOfPhrases = phrases.size();
    Phrase aPhrase = null;
    // Print the Phrases out
    for ( int i = 0; i < numberOfPhrases; i++ ) {
        aPhrase = (Phrase) phrases.get(i);
        System.out.println( aPhrase.toPipedString() );
    }

21
MMTx: MetaMap Technology Transfer
  • Maps text phrases to Metathesaurus concepts
  • Java implementation of MetaMap

22
MMTx
[Diagram: Document processing stages]
  • Tokenization
  • POS Tagger Client
  • Lexical Lookup
  • Parser
  • Variant Generation
  • Candidate Retrieval
  • Evaluation
  • Final Mapping
  • Post-processing Presentation

23
MMTx
  • Usage
  • MMTx <options> --fileName=infile --outputFileName=outfile
  • --strict_model|--moderate_model|--relaxed_model
  • --KSYear=year|--mm_data_version=customName
  • --threshold=lowestScore
  • --truncate_candidates_mappings
  • --term_processing|--allow_overmatches|--allow_concept_gaps
  • --composite_phrases
  • --prefer_multiple_concepts
  • --fielded_output

24
MMTx
MMTx --inputFile=5.txt --inputType=freeText
  • Processing 00000000.tx.3: One problem is caused by the VecTest itself, which uses a dipstick to measure the presence of a protein associated with the parasite that causes malaria.
  • Phrase: "One problem"
  • Meta Candidates (2)
  • 861 Problem, NOS [Finding,Pathologic Function]
  • 694 One [Quantitative Concept]
  • Meta Mapping (888)
  • 694 One [Quantitative Concept]
  • 861 Problem, NOS [Finding,Pathologic Function]
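
The candidate list above is ordered by score, and the usage slide lists a threshold option that takes a lowest score. A minimal sketch of that kind of filtering and ranking follows. It only illustrates the idea: the CandidateThresholdSketch class, its Candidate type, and the keepAtOrAbove method are hypothetical, and MMTx's own scoring and cut-off rules are not reproduced.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class CandidateThresholdSketch {
    /** A scored concept candidate; the fields are illustrative, not the MMTx classes. */
    static final class Candidate {
        final int score;
        final String concept;

        Candidate(int score, String concept) {
            this.score = score;
            this.concept = concept;
        }
    }

    /** Keeps candidates scoring at least lowestScore, highest score first. */
    static List<Candidate> keepAtOrAbove(List<Candidate> candidates, int lowestScore) {
        List<Candidate> kept = new ArrayList<>();
        for (Candidate c : candidates) {
            if (c.score >= lowestScore) {
                kept.add(c);
            }
        }
        kept.sort(Comparator.comparingInt((Candidate c) -> c.score).reversed());
        return kept;
    }

    public static void main(String[] args) {
        // Scores and names taken from the "One problem" example on the slide.
        List<Candidate> candidates = Arrays.asList(
            new Candidate(694, "One"),
            new Candidate(861, "Problem, NOS"));
        for (Candidate c : keepAtOrAbove(candidates, 700)) {
            System.out.println(c.score + " " + c.concept);
        }
    }
}
```

With a threshold of 700 only "Problem, NOS" survives; with a threshold of 0 both candidates are returned in descending score order.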

25
MMTx
    // Create a MMTxAPI object
    MMTxAPI mmtx = new MMTxAPI( argv );
    // Analyze the file
    Document aDocument = mmtx.processDocument( aFile );
    Vector phrases = aDocument.getPhrases();
    int numberOfPhrases = phrases.size();
    Phrase aPhrase = null;
    // Retrieve the final mappings for each Phrase
    for ( int i = 0; i < numberOfPhrases; i++ ) {
        aPhrase = (Phrase) phrases.get(i);
        finalConcepts = aPhrase.getFinalMappings();
    }

26
Useful Text Feature Classes
Many-to-one Relationship
27
GSpell
28
GSpell
  • Spelling suggestion tool
  • Pure Java application with Java APIs
  • Support for multi word dictionary entries

29
GSpell Usage
  • Usage
  • GSpellFind.sh|bat
  • --dictionary=NameOfDictionary
  • --inputFile=Source --outputFile=target
  • --truncate=N --considerNCandidates=N
  • --maxEditDistance=N
  • --fieldedText --termField=X --correctField=Y
  • --reportTime --version|--help

30
GSpell Example
  • anonomous|anonymous|1.0|0.8734230160180236|NGrams
  • anonomous|allonomous|2.0|0.5819672267388108|NGrams
  • anonomous|autonomous|2.0|0.5819672267388108|NGrams
  • anonomous|anadromous|3.0|0.2958160192082048|NGrams
  • anonomous|analogous|3.0|0.2958160192082048|NGrams
  • anonomous|anomalous|3.0|0.2958160192082048|NGrams
  • anonomous|anonymously|3.0|0.295816019208248|NGrams
  • anonomous|anonymes|3.0|0.2958160192082048|Metaphone
  • anonomous|anonyms|3.0|0.2958160192082048|Metaphone
  • anonomous|acoprous|4.0|0.11470810702102521|NGrams
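
The third column in the example above looks like an edit distance between the misspelling and each suggestion, which would also fit the maxEditDistance option. The sketch below computes the standard Levenshtein distance and uses it to filter and rank a small word list. It is an illustration under that assumption, not GSpell's implementation: the class and method names are hypothetical, and the NGrams and Metaphone retrieval and the fractional score are not reproduced.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class EditDistanceSketch {
    /** Levenshtein distance: minimum number of single-character edits from a to b. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;                      // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;                      // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                    Math.min(curr[j - 1] + 1, prev[j] + 1),  // insertion, deletion
                    prev[j - 1] + cost);                     // substitution or match
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Returns dictionary words within maxEditDistance of term, closest first. */
    static List<String> suggest(String term, List<String> dictionary, int maxEditDistance) {
        List<String> kept = new ArrayList<>();
        for (String word : dictionary) {
            if (levenshtein(term, word) <= maxEditDistance) {
                kept.add(word);
            }
        }
        kept.sort(Comparator.comparingInt((String w) -> levenshtein(term, w)));
        return kept;
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("autonomous", "analogous", "anonymous");
        System.out.println(suggest("anonomous", dictionary, 2));
        // prints [anonymous, autonomous]
    }
}
```

Scanning a whole dictionary this way is slow for large word lists, which is presumably why a tool like this first narrows the candidates through an index before ranking them.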

31
GSpell Indexing
  • Usage
  • GSpellIndex.sh|bat
  • --dictionary=NameOfDictionary
  • --inputFile=SourceFile
  • --reportTime --version|--help
  • Format for the input file
    • One word per line

32
GSpell Developers Guide
    import gov.nih.nlm.nls.gspell.GSpell;     // <------- These come from the gspell.jar
    import gov.nih.nlm.nls.gspell.Candidate;

    GSpell gspell = new GSpell( _dictionaryName, GSpell.READ_ONLY );
    candidates = gspell.find( aTerm );
    if ( candidates != null ) {
        for ( int i = 0; i < candidates.length; i++ )
            System.out.println( candidates[i].toString() );
    } else {
        System.out.println("No Suggestions");
    }
    gspell.cleanup();

33
Downloadable Resources
  • umlsks.nlm.nih.gov
  • umlsLex.nlm.nih.gov
  • Lvg
  • Java NLP Tools
  • GSpell
  • mmtx.nlm.nih.gov
  • Requires a UMLS License Agreement

34
Lexical Tools for UMLS Developers
November 10, 2002
Allen C. Browne, Guy Divita, Chris Lu
Lister Hill National Center for Biomedical Communications
National Library of Medicine
  • Lexical Systems: umlsLex.nlm.nih.gov
  • Email: umlslex@nlm.nih.gov
  • Knowledge Source Server: http://umlsks.nlm.nih.gov
  • UMLS Information: http://umlsInfo.nlm.nih.gov

35
Appendix
  • NormExample.java
  • LvgExampleEasy.java
  • LvgExampleHarder.java
  • LvgExampleEvenHarder.java
  • TokenizeExample.java
  • LexicalLookupExample.java
  • NpParserExample.java
  • MMTxExample.java
  • GSpellExample.java
  • 5.txt
  • 5.tokenized
  • 5.lexicalLookuped
  • 5.parsed
  • 5.mmtxed
