Title: Minimally Supervised Morphological Analysis by Multimodal Alignment
- David Yarowsky
- Richard Wicentowski
Introduction
- The algorithm is capable of inducing inflectional morphological analyses of both regular and highly irregular forms.
- The algorithm combines four original alignment models based on:
  - Relative corpus frequency.
  - Contextual similarity.
  - Weighted string similarity.
  - Incrementally retrained inflectional transduction probabilities.
Lecture Subjects
- Task definition.
- Required and optional resources.
- The algorithm.
- Empirical evaluation.
Task Definition
- The task is treated as three steps:
  - Step 1: Estimate a probabilistic alignment between inflected forms and root forms.
  - Step 2: Train a supervised morphological analysis learner on a weighted subset of these aligned pairs.
  - Step 3: Use the result of step 2 to iteratively refine the alignment in step 1.
Example (POS)
Task Definition cont.
- The target output of step 1.
Required and Optional resources
- For the given language, we need:
  - A table of the inflectional parts of speech (POS).
  - A list of the canonical suffixes.
  - A large text corpus.
Required and Optional resources cont.
- A list of candidate noun, verb and adjective roots (from a dictionary), plus any rough mechanism, not based on morphological analysis, for identifying the candidate POS of the remaining vocabulary.
- A list of the consonants and vowels.
Required and Optional resources cont.
- A list of common function words.
- Distance/similarity tables generated on previously studied languages.
(Not essential; used if available.)
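To make the resource list concrete, here is a minimal sketch of what these inputs might look like for English verbs. The Penn Treebank-style tags, the suffix table, the candidate roots, and the word lists are illustrative assumptions rather than the authors' actual resource files, `RESOURCES` is a hypothetical name, and the large text corpus is omitted.

```python
import string

# Hypothetical English resource bundle (illustrative values, not the authors' files).
RESOURCES = {
    # Required: inflectional parts of speech for the verb paradigm.
    "inflectional_pos": ["VB", "VBD", "VBG", "VBN", "VBZ"],
    # Required: canonical suffix for each inflected POS (regular verbs only).
    "canonical_suffixes": {"VBD": "ed", "VBG": "ing", "VBN": "ed", "VBZ": "s"},
    # Candidate roots, e.g. taken from a dictionary.
    "candidate_roots": {"VB": ["walk", "sip", "sing", "take", "solidify"]},
    # Optional: vowels (treating "y" as a consonant is a simplification).
    "vowels": set("aeiou"),
    # Optional: common function words, used to filter context features.
    "function_words": {"the", "a", "an", "of", "to", "in", "and", "he", "she", "they"},
}
# Every other lowercase letter is treated as a consonant.
RESOURCES["consonants"] = set(string.ascii_lowercase) - RESOURCES["vowels"]
```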
The Algorithm
- The algorithm combines four original alignment models:
  - Alignment by Frequency Similarity.
  - Alignment by Context Similarity.
  - Alignment by Weighted Levenshtein Distance.
  - Alignment by Morphological Transformation Probabilities.
Lemma Alignment by Frequency Similarity
- The frequency table for this model (not preserved here) is based on relative corpus frequency.
Lemma Alignment by Frequency Similarity cont.
- A problem: the true alignments between inflections and their roots are unknown in advance.
- A simplifying assumption: the frequency ratios between inflections and roots are not significantly different for regular and irregular morphological processes.
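Under this assumption, the log frequency ratios of pairs believed to be regular can serve as a reference distribution for scoring any candidate pair, including irregular ones. The sketch below assumes these log ratios are roughly Gaussian and scores a candidate by its log-density; the Gaussian fit, the function names, and the counts are illustrative assumptions (the counts are made up, not taken from a corpus).

```python
import math
from statistics import mean, stdev

def fit_log_ratio(regular_pairs):
    """Mean and standard deviation of log(freq(inflection) / freq(root)).

    regular_pairs: (inflection_count, root_count) tuples for pairs assumed
    to be regular; all counts must be positive.
    """
    logs = [math.log(f_infl / f_root) for f_infl, f_root in regular_pairs]
    return mean(logs), stdev(logs)

def frequency_score(f_infl, f_root, mu, sigma):
    """Gaussian log-density of the candidate's log ratio (higher = more plausible)."""
    x = math.log(f_infl / f_root)
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

# Made-up VBD/VB counts for a few regular verbs.
mu, sigma = fit_log_ratio([(120, 100), (90, 100), (200, 150), (60, 80)])

# A candidate with a typical ratio should outscore one with an extreme ratio.
plausible = frequency_score(110, 100, mu, sigma)
implausible = frequency_score(9, 1200, mu, sigma)
```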
Lemma Alignment by Frequency Similarity cont.
- Similarity between regular and irregular forms.
Lemma Alignment by Frequency Similarity cont.
- The expected frequency should also be estimable from the frequency of any of the other inflectional variants.
- For example, the VBD/VBG and VBD/VBZ ratios could also be used as estimators.
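One way to use several estimators at once is to score the candidate under each available ratio and average the results. In this sketch each ratio gets its own Gaussian fitted on regular verbs. The averaging rule, the tag-to-count dictionaries, and all counts are illustrative assumptions, not the method as published.

```python
import math
from statistics import mean, stdev

def gaussian_logpdf(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def fit_ratio(regular, num, den):
    """Fit a Gaussian to log(count[num] / count[den]) over regular paradigms."""
    logs = [math.log(p[num] / p[den]) for p in regular]
    return mean(logs), stdev(logs)

def multi_estimator_score(candidate, regular,
                          estimators=(("VBD", "VB"), ("VBD", "VBG"), ("VBD", "VBZ"))):
    """Average log-density over every estimator whose counts are available and positive."""
    scores = []
    for num, den in estimators:
        if candidate.get(num, 0) > 0 and candidate.get(den, 0) > 0:
            mu, sigma = fit_ratio(regular, num, den)
            x = math.log(candidate[num] / candidate[den])
            scores.append(gaussian_logpdf(x, mu, sigma))
    return sum(scores) / len(scores) if scores else float("-inf")

# Made-up tag -> count paradigms for regular verbs.
regular = [
    {"VB": 100, "VBD": 120, "VBG": 80, "VBZ": 40},
    {"VB": 100, "VBD": 90, "VBG": 70, "VBZ": 50},
    {"VB": 150, "VBD": 200, "VBG": 100, "VBZ": 60},
]
# A candidate past-tense count paired with the counts of a root's other variants.
good = {"VB": 100, "VBD": 110, "VBG": 75, "VBZ": 45}
bad = {"VB": 1200, "VBD": 9, "VBG": 900, "VBZ": 500}
```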
Lemma Alignment by Context Similarity
- Based on the contextual similarity of the candidate forms.
- Similarity is computed between vectors of weighted and filtered context features.
- This tends to cluster the inflectional variants of a verb (e.g., sipped, sipping, and sip).
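A minimal sketch of context similarity, assuming bag-of-words context windows with function words removed, each feature weighted by 1/distance from the target word, and vectors compared by cosine similarity. The window size, the distance weighting, the function-word list, and the three-sentence corpus are illustrative assumptions; the slides do not specify the weighting or filtering scheme.

```python
import math
from collections import Counter

# Hypothetical function-word filter list.
FUNCTION_WORDS = {"the", "a", "an", "he", "she", "they", "of", "to"}

def context_vector(word, sentences, window=3):
    """Context features of `word`, weighted by 1/distance, function words filtered out."""
    vec = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token != word:
                continue
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i and tokens[j] not in FUNCTION_WORDS:
                    vec[tokens[j]] += 1.0 / abs(i - j)
    return vec

def cosine(u, v):
    """Cosine similarity of two sparse feature vectors (0.0 if either is empty)."""
    dot = sum(weight * v[f] for f, weight in u.items() if f in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

corpus = ["she sipped the tea slowly", "he sips the tea slowly", "they ran the race fast"]
vectors = {w: context_vector(w, corpus) for w in ("sipped", "sips", "ran")}
```

Here "sipped" and "sips" share the context features "tea" and "slowly", while "ran" shares none with them.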
Lemma Alignment by Weighted Levenshtein Distance
- Considers overall stem edit distance.
- Uses a cost matrix whose initial distance costs are set to (0.5, 0.6, 1.0, 0.98).
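A minimal weighted Levenshtein sketch follows. The slide lists four initial costs but not which edit operation each belongs to, so the mapping used here (vowel/vowel substitution 0.5, vowel insertion or deletion 0.6, consonant/consonant substitution 1.0, consonant/vowel substitution 0.98) is an assumption, as is the consonant insertion/deletion cost of 1.0. For simplicity it compares whole word forms rather than pre-segmented stems. What it illustrates is that vowel changes such as sing/sang come out cheaper than consonant changes.

```python
VOWELS = set("aeiou")

# ASSUMED mapping of the slide's four initial costs (0.5, 0.6, 1.0, 0.98).
VOWEL_SUB = 0.5         # vowel -> different vowel
VOWEL_INDEL = 0.6       # insert or delete a vowel
CONSONANT_SUB = 1.0     # consonant -> different consonant
MIXED_SUB = 0.98        # consonant <-> vowel
CONSONANT_INDEL = 1.0   # not among the four listed values; assumed

def sub_cost(a, b):
    if a == b:
        return 0.0
    a_vowel, b_vowel = a in VOWELS, b in VOWELS
    if a_vowel and b_vowel:
        return VOWEL_SUB
    if not a_vowel and not b_vowel:
        return CONSONANT_SUB
    return MIXED_SUB

def indel_cost(c):
    return VOWEL_INDEL if c in VOWELS else CONSONANT_INDEL

def weighted_levenshtein(s, t):
    """Minimum total edit cost to turn s into t (standard dynamic programming)."""
    d = [[0.0] * (len(t) + 1) for _ in range(len(s) + 1)]
    for i in range(1, len(s) + 1):
        d[i][0] = d[i - 1][0] + indel_cost(s[i - 1])
    for j in range(1, len(t) + 1):
        d[0][j] = d[0][j - 1] + indel_cost(t[j - 1])
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            d[i][j] = min(d[i - 1][j] + indel_cost(s[i - 1]),                # delete
                          d[i][j - 1] + indel_cost(t[j - 1]),                # insert
                          d[i - 1][j - 1] + sub_cost(s[i - 1], t[j - 1]))    # substitute or match
    return d[len(s)][len(t)]
```

Under these costs, sing/sang (one vowel substitution, 0.5) is closer than sing/ring (one consonant substitution, 1.0).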
Lemma Alignment by Morphological Transformation Probabilities
- The goal is to generalize a mapping function via
a generative probabilistic model.
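To train such a model, each aligned (root, inflection) pair has to be broken into a stem change plus a canonical suffix, as in the solidify/solidified example later in this section (y→i plus +ed). The sketch below is one simple way to do that, assuming the longest matching canonical suffix is stripped first and the stem change is whatever follows the longest common prefix of root and stem; `decompose` is a hypothetical helper, not the authors' procedure.

```python
def decompose(root, inflection, suffixes):
    """Split an inflection into (root_ending, new_ending, suffix).

    The root is rewritten root_ending -> new_ending, then the suffix is appended.
    """
    # Strip the longest canonical suffix the inflection ends with; else assume none.
    suffix = ""
    for s in sorted(suffixes, key=len, reverse=True):
        if s and inflection.endswith(s):
            suffix = s
            break
    stem = inflection[:len(inflection) - len(suffix)]
    # The stem change is everything after the longest common prefix.
    k = 0
    while k < min(len(root), len(stem)) and root[k] == stem[k]:
        k += 1
    return root[k:], stem[k:], suffix
```

For example, solidify/solidified yields ("y", "i", "ed"), sip/sipped yields ("", "p", "ed") (consonant doubling), and take/took yields ("ake", "ook", "") with no suffix.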
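Lemma Alignment by Morphological Transformation Probabilities cont.
- Example:

$$
\begin{aligned}
P(\text{solidified} \mid \text{solidify}, +\text{ed}, \text{VBD})
  &= P(y \to i \mid \text{solidify}, +\text{ed}, \text{VBD}) \\
  &\approx \lambda_1 P(y \to i \mid \text{ify}, +\text{ed}) \\
  &\quad + (1-\lambda_1)\Big(\lambda_2 P(y \to i \mid \text{fy}, +\text{ed}) \\
  &\quad + (1-\lambda_2)\big(\lambda_3 P(y \to i \mid \text{y}, +\text{ed}) \\
  &\quad + (1-\lambda_3)\big(\lambda_4 P(y \to i \mid +\text{ed}) + (1-\lambda_4)\, P(y \to i)\big)\big)\Big)
\end{aligned}
$$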
In the backed-off terms, the POS tag can be dropped from the conditioning context.
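The interpolated backoff in the formula above can be sketched with relative-frequency estimates at each level: the last three, two, and one letters of the root together with the suffix, then the suffix alone, then no context. The fixed λ values of 0.5, the "y->i" string encoding of stem changes, and the toy training triples are assumptions for illustration; in this simple version an unseen context just contributes a probability of 0 at its level.

```python
from collections import Counter

def backoff_contexts(root, suffix):
    """Conditioning contexts from most to least specific (POS dropped)."""
    return [("end", 3, root[-3:], suffix), ("end", 2, root[-2:], suffix),
            ("end", 1, root[-1:], suffix), ("suffix", suffix)]

def train(examples):
    """examples: (root, suffix, change) triples, e.g. ("solidify", "ed", "y->i")."""
    joint, marginal = Counter(), Counter()
    for root, suffix, change in examples:
        for ctx in backoff_contexts(root, suffix) + [("all",)]:
            joint[(ctx, change)] += 1
            marginal[ctx] += 1
    return joint, marginal

def change_prob(model, root, suffix, change, lambdas=(0.5, 0.5, 0.5, 0.5)):
    """lambda_1 P(change | 3 letters, suffix) + (1 - lambda_1)(lambda_2 ... ), nested."""
    joint, marginal = model

    def rel_freq(ctx):
        return joint[(ctx, change)] / marginal[ctx] if marginal[ctx] else 0.0

    # Start from the unconditioned P(change), then fold in more specific contexts.
    p = rel_freq(("all",))
    for ctx, lam in reversed(list(zip(backoff_contexts(root, suffix), lambdas))):
        p = lam * rel_freq(ctx) + (1 - lam) * p
    return p

model = train([("solidify", "ed", "y->i"), ("classify", "ed", "y->i"),
               ("play", "ed", ""), ("walk", "ed", ""), ("jump", "ed", "")])
```

With this toy data, `change_prob(model, "solidify", "ed", "y->i")` evaluates to 53/60 (about 0.883), while the no-change option gets the remaining 7/60.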
Lemma Alignment by Model Combination and the Pigeonhole Principle
- No single model is sufficiently effective on its own.
- The frequency, Levenshtein and context similarity models retain equal relative weight.
- The morphological transformation model increases in relative weight as it is retrained over the iterations.
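A minimal sketch of this weighting, assuming each model's output has already been normalized to a comparable similarity score between 0 and 1. The linear growth schedule for the transformation model (`growth * iteration`) and the name `combined_score` are hypothetical; the slides only say that this model's relative weight increases while the other three keep equal weight.

```python
def combined_score(scores, iteration, growth=1.0):
    """Weighted average of per-model similarity scores for one candidate pair.

    scores: dict with keys "frequency", "context", "levenshtein" and
    "transformation", each assumed to lie on a comparable 0..1 scale.
    """
    weights = {
        "frequency": 1.0,    # these three models keep equal, fixed weight
        "context": 1.0,
        "levenshtein": 1.0,
        # Hypothetical schedule: weight grows linearly as the model is retrained.
        "transformation": growth * iteration,
    }
    total = sum(weights.values())
    return sum(weights[m] * scores[m] for m in weights) / total
```

At iteration 0 the transformation model has no weight, so the score is the plain average of the other three; in later iterations the retrained model increasingly dominates.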
Lemma Alignment by Model Combination and the Pigeonhole Principle cont.
- The final alignment is based on the pigeonhole principle.
- For a given POS, a root should not have more than one inflection, and multiple inflections of the same POS should not share the same root.
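The slides do not say how this constraint is enforced. The sketch below uses a greedy, highest-score-first one-to-one assignment within a single POS, which is our assumption (an optimal one-to-one assignment, e.g. via the Hungarian algorithm, is an alternative). The candidate scores are made up.

```python
def pigeonhole_align(candidates):
    """Greedy one-to-one alignment for a single POS.

    candidates: (inflection, root, score) triples; higher score = better.
    Each inflection gets at most one root and each root at most one inflection.
    """
    alignment, used_roots = {}, set()
    for inflection, root, _score in sorted(candidates, key=lambda c: c[2], reverse=True):
        if inflection in alignment or root in used_roots:
            continue  # either side is already taken
        alignment[inflection] = root
        used_roots.add(root)
    return alignment

# Made-up combined scores for VBD candidates.
candidates = [("sang", "sing", 0.9), ("sang", "singe", 0.5),
              ("singed", "singe", 0.8), ("singed", "sing", 0.7)]
```

Here "sang" claims "sing" first, so "singed" can no longer align to "sing" and falls back to "singe".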
Empirical Evaluation