Title: ParaMor
1ParaMor
Across Morphology
Christian Monson
2Turkish Morphology Beads on a String
götür  ül       m         üyor                 sun
take   passive  negative  present progressive  2nd person singular
You are not being taken
4Applications of Computational Morphology
- Machine Translation
  - Turkish-English (Oflazer, 2007)
  - Czech-English (Goldwater and McClosky, 2005)
- Speech Recognition
  - Finnish (Creutz, 2006)
- Information Retrieval
5Challenges of Computational Morphology
- Time Consuming
  - Kemal Oflazer estimates
    - 3-4 months to build basic Turkish analyzer
    - Plus lexicon development and maintenance
- Expertise Needed
  - Greenlandic
    - Official language of Greenland
    - Agglutinative Inuit language
    - 50,000 speakers
  - Per Langaard
6The Solution
Raw Text
Unsupervised Morphology Induction
7Paradigms The Structure of Morphology
Stem   Voice    Polarity  Tense Mood           Person Number
götür  ül       m         üyor                 sun
take   passive  negative  present progressive  2nd person singular
9Paradigms The Structure of Morphology
Slot           Morphemes
Stem           götür (take)
Voice          ül (passive)
Polarity       m (negative)
Tense Mood     üyor (present progressive), yecek (future)
Person Number  um (1st person singular), Ø (3rd person singular), uz (1st person plural)
16Paradigms The Structure of Morphology
ül
m
üyor, yecek
um, Ø, uz
- Paradigm
  - Set of mutually replaceable strings
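To make "mutually replaceable" concrete, here is a minimal sketch that swaps the person/number suffixes from the slides at the end of a single word form. The helper `swap_suffix` and the dictionary are illustrative names, not part of ParaMor, and the empty string stands in for Ø.

```python
# Person/number suffixes from the slides; "" stands in for the null suffix Ø.
PERSON_NUMBER = {
    "um": "1st person singular",
    "": "3rd person singular",
    "uz": "1st person plural",
}


def swap_suffix(word, old, new):
    """Replace the final suffix `old` of `word` with `new`."""
    if not word.endswith(old):
        raise ValueError(f"{word!r} does not end in {old!r}")
    base = word[: len(word) - len(old)]
    return base + new


# Start from götür + ül + m + üyor + um and substitute each paradigm member.
forms = [swap_suffix("götürülmüyorum", "um", s) for s in PERSON_NUMBER]
for form, gloss in zip(forms, PERSON_NUMBER.values()):
    print(form, "=", gloss)
```

Every member of the set can fill the same slot after the same stem, which is the property the definition above relies on.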
20Overview
- ParaMor
  - Unsupervised morphology induction system
- Evaluation Methodology
- Results
23The ParaMor Algorithm
- Paradigm discovery in 3 steps
  - Search for candidate paradigms
  - Cluster candidates modeling the same paradigm
  - Filter
- Segment words using the discovered paradigms
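The final step, segmenting words with the discovered paradigms, could look roughly like the sketch below. It assumes a paradigm is just a set of suffix strings and that a word is split only when its stem also occurs in the vocabulary with another suffix of the same paradigm. Both are simplifying assumptions rather than ParaMor's documented behavior, and `segment`, `vocab`, and `paradigm` are hypothetical names with a toy Spanish vocabulary.

```python
def segment(word, vocab, paradigm):
    """Return (stem, suffix) if the paradigm licenses a split, else (word, "")."""
    # Try longer suffixes first; break ties alphabetically so results are stable.
    for suffix in sorted(paradigm, key=lambda s: (-len(s), s)):
        if not suffix or not word.endswith(suffix) or len(word) == len(suffix):
            continue
        stem = word[: -len(suffix)]
        # Evidence check: the same stem occurs with another suffix of the paradigm.
        if any(stem + other in vocab for other in paradigm if other != suffix):
            return stem, suffix
    return word, ""


# Toy data (assumed for illustration only).
vocab = {"anuncia", "anuncian", "anunciar", "apoya", "apoyan", "casa"}
paradigm = {"a", "an", "ar"}

for w in sorted(vocab):
    print(w, "->", segment(w, vocab, paradigm))
```

Here "casa" stays whole because no other form of the would-be stem "cas" appears in the toy vocabulary.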
29Search for Candidate Paradigms
a ada adas ado ados an ar aron ó 1786
ra rada radas rado rados ran rar raron ró 23
Ø da das do dos n ndo r ron 118
strada stradas strado strar stró 7
a an ar ó 353
rada radas rado rados 53
Ø do n r 354
a as o os 892
strada strado strar stró 8
a an ar 413
rada rado rados 67
Ø n r 509
a o os 1410
strada strado stró 9
Ø r s 287
a an 1049
rada rado 89
Ø n 1874
a o 2304
Ø s 5501
strada strado 12
Ø es 874
strado 15
rado 167
an 1786
n 6051
a 8981
s 10662
es 2751
...
...
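Each line above pairs a candidate suffix set with a count. Assuming the count is the number of stems that occur in the corpus with every suffix in the set (the slide does not say so explicitly), it could be computed as in this sketch. The function names and the toy vocabulary are hypothetical.

```python
def stems_taking(suffix, vocab):
    """Stems left over when `suffix` is stripped from words that end in it."""
    return {
        w[: len(w) - len(suffix)]
        for w in vocab
        if w.endswith(suffix) and len(w) > len(suffix)
    }


def shared_stems(suffixes, vocab):
    """Stems that occur with every suffix in `suffixes`."""
    return set.intersection(*(stems_taking(f, vocab) for f in suffixes))


# Toy vocabulary (assumed for illustration only).
vocab = {"anuncia", "anuncian", "anunciar", "apoya", "apoyan", "celebra"}

for candidate in (["a"], ["a", "an"], ["a", "an", "ar"]):
    print(" ".join(candidate), len(shared_stems(candidate, vocab)))
```

Adding a suffix can only keep or shrink the stem set, which matches the way counts fall along chains such as `a`, `a an`, `a an ar` in the listing.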
30Search for Candidate Paradigms
a ada adas ado ados an ar aron ó 1786
ra rada radas rado rados ran rar raron ró 167
Ø da das do dos n ndo r ron 6051
rada radas rado rados 167
strada stradas strado strar stró 7
a an ar ó 1786
Ø do n r 6051
a as o os 8981
strada strado strar stró 8
rada rado rados 167
a an ar 1786
Ø n r 6051
a o os 8981
Ø r s 287
strada strado stró 9
a an 1786
rada rado 167
Ø n 6051
a o 8981
Ø s 5501
Ø es 10662
strada strado 12
strado 15
rado 167
an 1786
n 6051
a 8981
s 10662
es 10662
...
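One way a search through such candidates could proceed is a greedy walk that starts from a single suffix and repeatedly adds the suffix that keeps the most stems. The sketch below is an assumption-laden illustration of that idea, not a statement of ParaMor's actual search: the stopping rule, the `min_ratio` value, and all names are hypothetical.

```python
def stems_taking(suffix, vocab):
    """Stems left over when `suffix` is stripped from words that end in it."""
    return {
        w[: len(w) - len(suffix)]
        for w in vocab
        if w.endswith(suffix) and len(w) > len(suffix)
    }


def grow(seed, all_suffixes, vocab, min_ratio=0.25):
    """Greedily extend {seed}, one suffix at a time, while enough stems survive."""
    current = frozenset([seed])
    stems = stems_taking(seed, vocab)
    while True:
        best, best_stems = None, set()
        for f in sorted(all_suffixes - current):
            kept = stems & stems_taking(f, vocab)
            if len(kept) > len(best_stems):
                best, best_stems = f, kept
        # Stop when nothing can be added or too many stems would be lost
        # (min_ratio is a hypothetical threshold).
        if best is None or len(best_stems) < min_ratio * len(stems):
            return current, stems
        current, stems = current | {best}, best_stems


# Toy data (assumed for illustration only).
vocab = {
    "anuncia", "anuncian", "anunciar",
    "apoya", "apoyan", "apoyar",
    "celebra", "celebran",
    "casa", "casas",
}
suffixes, stems = grow("an", {"a", "an", "ar", "as"}, vocab)
print(sorted(suffixes), sorted(stems))
```

On this toy data the walk from `an` picks up `a`, then `ar`, and stops before `as`, which shares no stems with the set built so far.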
31a
a ada adas ado ados an ar aron ó 1786
ra rada radas rado rados ran rar raron ró 167
Ø da das do dos n ndo r ron 6051
a ada ado ados an ar aron ó 1786
ra rada radas rado rados rar raron ró 167
Ø da das do dos n r ron 6051
rada radas rado rados rar raron ró 167
trada tradas trado trados trar traron tró 167
a ada ado an ar aron ó 1786
Ø da do dos n r ron 6051
a ado an ar aron ó 1786
Ø da do n r ron 6051
trada tradas trado trados trar tró 167
rada radas rado rados rar ró 167
trada tradas trado trar tró 30
strada stradas strado strar stró 7
a ado an ar ó 1786
rada radas rado rados rar 167
Ø do n r ron 6051
a an ar ó 1786
rada radas rado rados 167
Ø do n r 6051
a as o os 8981
trada trado trar tró 30
strada strado strar stró 8
Ø r s 287
a an ar 1786
rada rado rados 167
Ø n r 6051
a o os 8981
trada trado tró 30
strada strado stró 9
a an 1786
rada rado 167
Ø n 6051
a o 8981
Ø s 5501
trada trado 30
strada strado 12
Ø es 10662
strado 15
trado 30
rado 167
an 1786
n 6051
a 8981
s 10662
es 10662
...
...
...
32a
- 17 Suffixes: a aba aban ada adas ado ados an ando ar ara aron arse ará arán aría ó; Cosine Similarity 0.715; 532 Covered Types
  - 15 Suffixes: a aba aban ada adas ado ados an ando ar aron arse ará arán ó; 25 Stems: anunci, aplic, apoy, celebr, consider, ...; 375 Covered Types
  - 16 Suffixes: a aba ada adas ado ados an ando ar ara aron arse ará arán aría ó; Cosine Similarity 0.664; 451 Covered Types
    - 15 Suffixes: a aba ada adas ado ados an ando ar aron arse ará arán aría ó; 22 Stems: anunci, aplic, apoy, celebr, concentr, ...; 330 Covered Types
    - 15 Suffixes: a aba ada adas ado ados an ando ar ara aron arse ará arán ó; 23 Stems: anunci, apoy, confirm, consider, declar, ...; 345 Covered Types
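The similarity scores on this slide can be reproduced from the covered-type counts alone, assuming the measure is a cosine over the sets of covered word types (the slide does not state this, but the numbers are consistent with it). In the sketch below the overlap of two sets is recovered from their sizes and the size of their union; the function name is hypothetical.

```python
from math import sqrt


def cosine_from_counts(size_a, size_b, size_union):
    """Cosine similarity of two sets, given their sizes and their union's size."""
    overlap = size_a + size_b - size_union  # inclusion-exclusion
    return overlap / sqrt(size_a * size_b)


# Merging the candidates covering 330 and 345 types yields 451 covered types.
inner = cosine_from_counts(330, 345, 451)
# Merging that cluster (451 types) with the 375-type candidate yields 532.
outer = cosine_from_counts(375, 451, 532)

print(round(inner, 3), round(outer, 3))  # 0.664 0.715
```

Both values match the figures printed next to the merged clusters.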
33a
F1 (chart comparing ParaMor Morfessor, Bernhard 2, Morfessor, ParaMor)
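For reference, F1 is the harmonic mean of precision and recall. The sketch below shows the standard formula only; the input values are made-up placeholders, not results for any system on this slide.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Placeholder inputs (hypothetical, for illustration only).
example = f1_score(0.5, 0.5)
print(example)
```

Because it is a harmonic mean, F1 is pulled toward the lower of the two inputs, so a system cannot score well by maximizing only precision or only recall.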
34a
ParaMor
- Identify
  - Search
  - Cluster
  - Filter
- Segment
36Morphology in NLP
götür  ül       m         üyor                 sun
take   passive  negative  present progressive  2nd person singular
You are not being taken