Title: Finite State Morphological Parsing
1Finite State Morphological Parsing
2Concepts
- Task: Input -> Output
- Generating: cat -> cats
- Recognizing: cats -> yes
- Parsing: cats -> N pl
3Definitions
- Cascading: running rules in series, with the output of one feeding the input of the next.
- Example: hope -> hope z[pl], then hope z[pl] -> hope s / [-vc] __
- Composing: converting the cascade into a single two-level transducer.
- Stemming: removing affixes.
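The three definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `+PL` marker, the `+` boundary, the toy set of voiceless consonants, and the rough phonemic spelling `hop` for "hope" are all hypothetical choices, and `compose` here merely precomputes a lookup table over a finite list of inputs rather than building a real composed transducer.

```python
import re

def plural_rule(form: str) -> str:
    """Rule 1: spell out the abstract plural marker as the segment z."""
    return form.replace("+PL", "+z")

def devoicing_rule(form: str) -> str:
    """Rule 2: z -> s after a voiceless segment (p, t, k, f in this toy set)."""
    return re.sub(r"(?<=[ptkf])\+z", "+s", form)

RULES = [plural_rule, devoicing_rule]

def cascade(form: str, rules=RULES) -> str:
    """Cascading: run the rules in series; each output feeds the next rule."""
    for rule in rules:
        form = rule(form)
    return form

def compose(inputs, rules=RULES) -> dict:
    """Composing (toy version): collapse the cascade into one input -> output table."""
    return {form: cascade(form, rules) for form in inputs}

def stem(form: str) -> str:
    """Stemming: drop everything from the first affix boundary onward."""
    return form.split("+", 1)[0]

print(cascade("hop+PL"))              # hop+s
print(compose(["hop+PL", "dog+PL"]))  # {'hop+PL': 'hop+s', 'dog+PL': 'dog+z'}
print(stem("hop+PL"))                 # hop
```

The composed table gives the same answers as the cascade but in a single lookup step, which is the practical point of composition.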
4English Inflectional Morphology
- Adjective: comparative (-er), superlative (-est)
- Noun: plural (-s), possessive ('s, s')
- Verb: 3rd pers. sg. (shows), pres. part. (showing), past tense (showed), past part. (shown)
5Rules for English Inflection
- Mostly local rules involving spelling
- fox, foxes
- beg, begging
- busy, busier
- Some long-distance rules
- beautiful, *beautifuler
6English Derivational Morphology
- Semantic constraints on affixes
- un + happy -> unhappy
- un + big -> *unbig
- child + ish -> childish
- spoon + ish -> *spoonish
- spoon + ful -> spoonful
- child + ful -> *childful
7Morphological Complexity
- English is a morphologically finite language
- All inflectional forms and derivational forms can be enumerated in a lexicon without much cost
- This is not so for agglutinative languages like Turkish, Hungarian, and Finnish
8Turkish example
- uygarlastiramadiklarimizdanmissinizcasina
- "Behaving as if you are among those whom we could not cause to become civilized"
- And this does not include the derivational affixes!
- Consider that the affixes for behave, cause, become could all recur in this example, and you can see the possibility of an infinite number of words: an uncomputable problem for a lexicon that simply lists every form.
9Templatic Morphology - Arabic
10Templatic Morphology
- Far more complex than English
- But, unlike agglutination, it does not have the
potential for an infinite number of words
11Representing spoonish and childful (worksheet 2)
12An FSA for English Nouns with inflection
13An FSA for English Nouns with inflection
- In the previous slide, the FSA gives all the letters the same status.
- In particular, the s is not identified as a separate morpheme.
- This FSA works on only one level: the surface level.
14Two FSAs working in parallel: a Finite State Transducer
- Generation goes from lexical to surface
- Recognition and parsing go from surface to lexical
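A minimal sketch of the two directions, assuming a toy transducer for just two nouns (cat, dog). The tags `+N`, `+SG`, `+PL`, the state numbers, and the function names are hypothetical, chosen only for illustration. Each arc pairs a lexical symbol with a surface symbol, and the same arcs are read in either direction.

```python
EPS = ""  # the empty symbol (epsilon)

# state -> list of (lexical symbol, surface symbol, next state)
ARCS = {
    0: [("c", "c", 1), ("d", "d", 4)],
    1: [("a", "a", 2)],
    2: [("t", "t", 3)],
    4: [("o", "o", 5)],
    5: [("g", "g", 3)],
    3: [("+N", EPS, 6)],
    6: [("+SG", EPS, 7), ("+PL", "s", 7)],
    7: [],
}
FINAL = {7}

def transduce(symbols, read, write, state=0):
    """Yield each output sequence, reading arc slot `read` and writing slot `write`."""
    if not symbols and state in FINAL:
        yield []
    for arc in ARCS[state]:
        in_sym, out_sym, nxt = arc[read], arc[write], arc[2]
        if in_sym == EPS:                       # arc consumes no input
            rests = transduce(symbols, read, write, nxt)
        elif symbols and symbols[0] == in_sym:  # arc consumes one input symbol
            rests = transduce(symbols[1:], read, write, nxt)
        else:
            continue
        for rest in rests:
            yield ([out_sym] if out_sym != EPS else []) + rest

def generate(lexical_symbols):
    """Lexical -> surface."""
    return ["".join(out) for out in transduce(lexical_symbols, 0, 1)]

def parse(surface):
    """Surface -> lexical."""
    return ["".join(out) for out in transduce(list(surface), 1, 0)]

def recognize(surface):
    """Surface form accepted or not."""
    return bool(parse(surface))

print(generate(["c", "a", "t", "+N", "+PL"]))  # ['cats']
print(parse("cats"))                           # ['cat+N+PL']
print(recognize("cats"), recognize("cas"))     # True False
```

Recognition falls out of parsing here: a surface form is accepted exactly when at least one lexical analysis exists.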
15An FST for English Nouns with inflection
16FST Conventions
- Symbols
- ^ (morpheme boundary)
- # (word boundary)
- ε (empty element, null)
- @ (any element)
- Input-output pairs are represented as
- Input:output, e.g., PL:s
17An Intermediate Level
- The FST on the previous slide contains both
morphological (s) and part-of-speech (N)
information. This necessitates two levels below
the surface.
^ marks morpheme boundary; # marks word boundary
18Implementing an FST Rule
19e-insertion rule for kisses, foxes, klutzes (worksheet 3)
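The effect of the e-insertion rule can be sketched with a regular expression. This is not the transducer from the worksheet: it assumes the common textbook formulation (insert e after x, s, or z at a morpheme boundary when the next morpheme is s at the end of the word) and assumes `^` and `#` as the morpheme and word boundary symbols on the intermediate level. The function name is hypothetical.

```python
import re

def e_insertion(intermediate: str) -> str:
    """Map an intermediate form such as 'fox^s#' to its surface spelling."""
    # Insert e right after the boundary when the stem ends in x, s, or z
    # and the suffix is a word-final s.
    with_e = re.sub(r"(?<=[xsz]\^)(?=s#)", "e", intermediate)
    # Boundary symbols do not appear on the surface level.
    return with_e.replace("^", "").replace("#", "")

for form in ["kiss^s#", "fox^s#", "klutz^s#", "cat^s#"]:
    print(form, "->", e_insertion(form))
# kiss^s# -> kisses
# fox^s# -> foxes
# klutz^s# -> klutzes
# cat^s# -> cats
```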
20A State Transition Table Example
- State Transition Table
              Input
  State     1     0
  S1        S1    S2
  S2        S2    S1
- All the possible inputs to the FSA are enumerated across the columns of the table.
- All the possible states are enumerated down the rows.
- From the state transition table, it is easy to see that if the FSA is in S1 (the first row) and the next input is the character 1, the FSA stays in S1. If a 0 arrives, the machine transitions to S2, as the second column shows. In the diagram this is shown by the arrow from S1 to S2 labeled with a 0.
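[State diagram: S1 and S2 each have a self-loop labeled 1; arrows labeled 0 run between S1 and S2 in both directions.]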
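The table above translates directly into a lookup structure. A minimal sketch follows; the names `TABLE` and `run` are hypothetical, and S1 is assumed to be the start state.

```python
# Rows are states, columns are input characters, cells are next states.
TABLE = {
    "S1": {"1": "S1", "0": "S2"},
    "S2": {"1": "S2", "0": "S1"},
}

def run(inputs: str, state: str = "S1") -> str:
    """Feed the input characters through the table and return the final state."""
    for ch in inputs:
        state = TABLE[state][ch]
    return state

print(run("1"))     # S1
print(run("0"))     # S2
print(run("0110"))  # S1
```

Each 0 flips the state and each 1 leaves it alone, so the machine ends in S2 exactly when it has read an odd number of 0s.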
21The State Transition Table for the e-insertion rule (path for kisses, starting at the second s)
22PC-KIMMO Sample Rule
- RULE "Voicing s:z <=> V___V" 4 4
         V  s  s  @     (lexical level)
         V  z  @  @     (surface level)
     1:  2  0  1  1
     2:  2  4  3  1
     3:  0  0  1  1
     4.  2  0  0  0
- Each row defines the state transitions for one state; each column is a lexical:surface pair. A colon after the state number marks a final state and a period marks a non-final state.
23Problems with FST Analysis
- Local ambiguity: asses(s)
- Requires a solution to handle non-determinism (back-up, look-ahead, parallelism)
- Long-distance dependencies: *beautifuler
- The rule that determines when to add -er depends on syllable count, which is outside the scope of finite-state machines
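The parallelism option for local ambiguity can be sketched by carrying every live hypothesis forward at once. This is a toy example: the arcs below cover only the two readings of asses(s), and the tags `+N`, `+PL`, `+V`, the state numbers, and the function name are hypothetical.

```python
# state -> list of (surface character, output string, next state)
ARCS = {
    0: [("a", "a", 1)],
    1: [("s", "s", 2)],
    2: [("s", "s", 3)],
    3: [("e", "+N", 4),    # e was inserted: noun stem ass, plural follows
        ("e", "e", 6)],    # e belongs to the stem: assess
    4: [("s", "+PL", 5)],
    6: [("s", "s", 7)],
    7: [("s", "s+V", 8)],
}
FINAL = {5, 8}

def parse_parallel(word: str):
    """Track all (state, output) hypotheses in parallel; return the surviving parses."""
    hyps = {(0, "")}
    for ch in word:
        hyps = {
            (nxt, out + piece)
            for (state, out) in hyps
            for (c, piece, nxt) in ARCS.get(state, [])
            if c == ch
        }
    return sorted(out for (state, out) in hyps if state in FINAL)

print(parse_parallel("asses"))   # ['ass+N+PL']
print(parse_parallel("assess"))  # ['assess+V']
```

After reading "asse" two hypotheses are alive. The next characters decide between them, so no back-up is needed.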