Morphological Processing - PowerPoint PPT Presentation

Slides: 15
Provided by: facultyWa4

1
Morphological Processing: Stemming
  • Using FSAs/FSTs

2
FSAs and Morphology
  • Can be used to validate/recognize an input string
  • For example, consider the Spanish conjugation of
    amar in JM p. 64
  • What would an FSA that recognizes the input
    look like?

[Figure: FSA diagram with states 1-6 and arcs labeled am, a, e, s, m]
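The question above can be explored with a small table-driven recognizer. This is a minimal sketch, not a reconstruction of the slide's diagram: the state numbers, the character-level arcs, and the handful of accepted forms (amo, ama, amas, ame, ames) are all assumptions chosen for illustration.

```python
# Hypothetical FSA for a few forms of Spanish "amar".
# States and arcs are illustrative assumptions, not the slide's diagram.
TRANSITIONS = {
    (0, "a"): 1,
    (1, "m"): 2,   # stem "am" consumed
    (2, "o"): 3,   # amo
    (2, "a"): 4,   # ama
    (2, "e"): 5,   # ame
    (4, "s"): 6,   # amas
    (5, "s"): 6,   # ames
}
ACCEPTING = {3, 4, 5, 6}


def recognizes(word: str) -> bool:
    """Return True if the FSA ends in an accepting state after reading word."""
    state = 0
    for ch in word:
        state = TRANSITIONS.get((state, ch))
        if state is None:  # no arc for this symbol: reject
            return False
    return state in ACCEPTING


for w in ["amo", "amas", "ames", "am", "amos"]:
    print(w, recognizes(w))
```

An FSA only answers yes or no; it says nothing about what the recognized form means.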

3
FSTs and Morphology
  • An FST could output information about the input,
    such as a translation or grammatical info

[Figure: FST diagram; legible arc labels include am:love and e:impf]
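One way to picture an FST is as an FSA whose arcs also emit output. The sketch below is an assumption-laden illustration rather than the slide's machine: the arc table, the state numbers, and the tag strings "+pres ind" and "+pres subj" are invented here, and only the forms amo, ame, and ames are covered.

```python
# Hypothetical FST: (state, input char) -> (next state, output string).
# An empty output string means the arc emits nothing.
ARCS = {
    (0, "a"): (1, ""),
    (1, "m"): (2, "love"),        # stem "am" maps to its gloss
    (2, "o"): (3, "+pres ind"),   # amo
    (2, "e"): (4, "+pres subj"),  # ame
    (4, "s"): (5, ""),            # ames
}
FINAL = {3, 4, 5}


def transduce(word: str):
    """Return the emitted analysis, or None if the word is not accepted."""
    state, out = 0, []
    for ch in word:
        arc = ARCS.get((state, ch))
        if arc is None:
            return None
        state, emitted = arc
        if emitted:
            out.append(emitted)
    return " ".join(out) if state in FINAL else None


print(transduce("amo"))   # love +pres ind
print(transduce("ames"))  # love +pres subj
print(transduce("am"))    # None
```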
4
FSAs and NLP
  • Why even use FSAs in NLP?
  • Memory and storage are cheap
  • Build one large lexicon
  • List all entries and required output:
      amo        amas        ames
      love       love        love
      pres ind   pres impf   pres subj
  • Some NLP apps do this (e.g., AZ Noun Phraser
    (Tolle 2001))

5
FSAs and NLP
  • For more morphologically complex languages, one
    big lexicon not feasible
  • Consider Hungarian and Finnish
  • One verbal form
  • Hundreds of possible inflections
  • Millions of resulting forms
  • A complete word lexicon not feasible
  • Morphological processing essential

6
Hungarian
  • Consider one concept/word in Hungarian
  • ház: house
  • házat: house (object)
  • háznak: of the house
  • házzal: with the house
  • házzá: (turning) into a house
  • házba: into the house
  • házra: to the house

7
Hungarian
  • Now consider plural inflections
  • házak: houses
  • házakat: houses (object)
  • házaknak: of the houses
  • házakkal: with the houses
  • házakká: (turning) into houses
  • házakba: into the houses
  • házakra: to the houses

8
Hungarian
  • And possessives
  • házaim: my houses
  • házaimat: my houses (object)
  • házaimnak: of my houses
  • házaimmal: with my houses
  • házaimmá: (turning) into my houses
  • házaimba: into my houses
  • házaimra: to my houses
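The growth in forms across the three Hungarian slides can be sketched by composing morphemes instead of listing words. This is a deliberately simplified illustration: it uses the stem ház ("house") and only the suffixes that attach by plain concatenation, and it ignores vowel harmony and the consonant assimilation seen in the "with" and "(turning) into" forms. The function name and suffix tables are assumptions made for this example.

```python
from itertools import product

# Simplified morpheme inventory for the stem "ház" (house).
# Only suffixes that attach by plain concatenation are included.
NUMBER_POSSESSOR = ["", "ak", "aim"]    # singular, plural, "my ...s"
CASE = ["", "at", "nak", "ba", "ra"]    # bare, object, of, into, to


def generate_forms(stem: str) -> list[str]:
    """Concatenate stem + number/possessor marker + case suffix."""
    return [stem + mid + case for mid, case in product(NUMBER_POSSESSOR, CASE)]


forms = generate_forms("ház")
print(len(forms))   # 3 markers x 5 cases = 15 forms from one stem
print(forms)
```

Each added suffix slot multiplies the number of forms, which is why a full-form lexicon becomes impractical and a morphological analyzer is preferred.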

9
Stop
10
Stemming
  • Used in many IR applications
  • For building equivalence classes
  • Connect
  • Connected
  • Connecting
  • Connection
  • Connections
  • Porter Stemmer: simple and efficient
  • Website: http://www.tartarus.org/martin/PorterStemmer

Same class; suffixes irrelevant
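The idea of an equivalence class can be shown with a toy suffix stripper. This is not the Porter algorithm, which applies ordered rule sets with conditions on the remaining stem; the suffix list, the minimum stem length of 3, and the function names below are assumptions picked so that the five example words collapse together.

```python
from collections import defaultdict

# Toy suffix list, longest first. NOT the Porter rule set.
SUFFIXES = ("ions", "ion", "ing", "ed", "s")


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least 3 characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def equivalence_classes(words):
    """Group words by their naive stem."""
    classes = defaultdict(list)
    for w in words:
        classes[naive_stem(w)].append(w)
    return dict(classes)


words = ["Connect", "Connected", "Connecting", "Connection", "Connections"]
classes = equivalence_classes(words)
print(classes)
```

In an IR setting, a query for any member of the class would then match documents containing any other member.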
11
Stop
12
Stemming and Performance
  • Does stemming help IR performance?
  • Harman (1991) indicated that it hurt as much as it
    helped
  • Krovetz (1993) shows that stemming does help
  • Porter-like algorithms work well with smaller
    documents
  • Krovetz proposes that stemming loses information
  • Derivational morphemes tell us something that
    helps identify word senses (and helps in IR)
  • Stemming them away means information loss

13
Evaluating Performance
  • Measures of stemming performance rely on metrics
    similar to those used in IR
  • Precision: the proportion of selected items
    that the system got right
  • precision = tp / (tp + fp)
  • Recall: the proportion of the target items
    that the system selected
  • recall = tp / (tp + fn)
  • Rule of thumb: as precision increases, recall
    drops, and vice versa
  • Metrics widely adopted in statistical NLP

14
Precision and Recall
  • Take a given stemming task
  • Suppose there are 100 words that should be stemmed
  • A stemmer gets 52 of these right (tp) and misses
    the other 48 (fn)
  • But it inadvertently stems 10 others (fp)
  • Precision = 52 / (52 + 10) ≈ .84
  • Recall = 52 / (52 + 48) = .52
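The arithmetic in this example can be checked with a few lines of code. The function names are illustrative; the counts come from the slide, with fn derived as the 100 target words minus the 52 true positives.

```python
def precision(tp: int, fp: int) -> float:
    """Proportion of selected items that were correct."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Proportion of target items that were selected."""
    return tp / (tp + fn)


targets = 100          # words that should be stemmed
tp = 52                # stemmed correctly
fp = 10                # stemmed but should not have been
fn = targets - tp      # 48 target words the stemmer missed

print(round(precision(tp, fp), 2))  # 0.84
print(round(recall(tp, fn), 2))     # 0.52
```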