Title: Creating Morphological Data: From Markup to Generalizations
1 Creating Morphological Data: From Markup to Generalizations
- Mike Maxwell, SIL International
- Web-Based Language Documentation and Description, 12-15 December 2000, Philadelphia, USA
2 Tools for lesser-known languages
- "Languages for which no adequate computer
processing is being developed, risk gradually
losing their place in the global Information
Society, or even disappearing, together with the
cultures they embody, to the detriment of one of
humanity's great assets: its cultural diversity."
- Zampolli and Varile, Foreword to the Survey of
the State of the Art in Human Language
Technology (1997: xvi)
3 Methods for creating computational grammars
- Linguistic Expert approach
- Machine Learning approach
- Middle road?
  - Non-expert linguists
  - On-line tools and help
  - Machine testing of hypotheses
4 What would ideal morphology tools do?
- Handle complex morphologies, including
  - Morphological processes (reduplication, infixes, ablaut, suprafixes)
  - Phonological and morphosyntactic features
  - Multidimensional paradigms
  - Complex affix ordering
- Automatically turn raw text data into a morphological grammar and lexicon
5 Next best tools would
- Handle complex morphologies
- Let the user concentrate on the task at hand, e.g. morpheme-level markup
- Learn from the user
- Help diagnose incorrect parses or missing parses
- Display the current state of the grammar
- Provide easy migration path between stages of grammatical analysis
- With non-expert users
6 Morphology tools should...
- Handle morphological processes
  - Reduplication
  - Infixes
  - Ablaut
  - Suprafixes
- Issue for parser/generator and data model
7 Morphology tools should...
- Handle morphosyntactic features
  - Feature systems should be transparent to non-expert user
  - Solution: Grammatical Gloss List incorporating morphosyntactic features
8-24 Grammatical Gloss List [slide content not reproduced]
25 Grammatical Gloss List Summary
- Use Grammatical Gloss List to hide morphosyntactic feature system
- Feature system remains accessible to advanced users
- Interlinear glossing builds language-specific feature system and feeds lexicon
- Possibly similar solution for phonological features
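
One way to picture the summary above: each label the user picks from the gloss list stands for a bundle of morphosyntactic features, so glossing quietly builds feature structures. The sketch below is a minimal Python illustration, not the tool's actual data model; the three gloss labels, the feature names, and the `features_for` helper are all assumptions made for the example.

```python
# Hypothetical gloss list: user-facing labels mapped to feature bundles.
GLOSS_LIST = {
    "PL": {"number": "plural"},
    "PST": {"tense": "past"},
    "1SG": {"person": "1", "number": "singular"},
}


def features_for(glosses):
    """Merge the feature bundles behind a sequence of gloss labels.

    Raises KeyError for a label missing from the gloss list and
    ValueError if two labels assign different values to one feature.
    """
    merged = {}
    for label in glosses:
        for feature, value in GLOSS_LIST[label].items():
            if merged.get(feature, value) != value:
                raise ValueError(
                    f"conflict on {feature!r}: {merged[feature]} vs {value}"
                )
            merged[feature] = value
    return merged


print(features_for(["PST", "1SG"]))
```

The user only ever sees `PST` or `1SG`; the feature names stay behind the list, where an advanced user could still inspect them.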
26 Morphology tools should...
- Learn from the user
  - Lexical information
  - Grammatical information
- Example: interlinear text glossing
27-36 Interlinear Text [slide content not reproduced]
37 Interlinear Text Summary
- Parser automatically picks up some information
  - Lexical entries for new morphemes (form, prefix/suffix/stem status, category of stems, gloss/morphosyntactic features)
  - Preferred parse of wordforms
- User can designate incorrect parses, and diagnose them
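
As a rough illustration of how lexical entries might be picked up from glossing, the Python sketch below derives form, prefix/suffix/stem status, stem category, and gloss from one morpheme-segmented word. It assumes hyphen-separated morphemes, one gloss per morpheme, exactly one stem per word, and the convention that grammatical glosses are upper case; the function name and the entry layout are hypothetical, not the tool's own.

```python
def entries_from_gloss(segmented, glosses, stem_category):
    """Derive lexical entries from one glossed word.

    Assumes hyphen-separated morphemes, one gloss per morpheme, exactly
    one stem, and upper-case labels for grammatical glosses.
    """
    forms = segmented.split("-")
    labels = glosses.split("-")
    if len(forms) != len(labels):
        raise ValueError("morphemes and glosses do not align")
    # The stem is the only morpheme whose gloss is not all upper case.
    stem_positions = [i for i, g in enumerate(labels) if not g.isupper()]
    if len(stem_positions) != 1:
        raise ValueError("expected exactly one stem gloss")
    stem_at = stem_positions[0]
    entries = []
    for i, (form, gloss) in enumerate(zip(forms, labels)):
        if i == stem_at:
            entries.append({"form": form, "type": "stem",
                            "category": stem_category, "gloss": gloss})
        else:
            kind = "prefix" if i < stem_at else "suffix"
            entries.append({"form": form, "type": kind, "gloss": gloss})
    return entries


for entry in entries_from_gloss("walk-ed", "walk-PST", "verb"):
    print(entry)
```

Affix status falls out of position relative to the stem, which is why the user never has to state it separately.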
38 Morphology tools should...
- Help diagnose incorrect parses or missing parses
39 Morphology tools should... [slide content not reproduced]
40 Morphology tools should...
- Handle complex multidimensional paradigms
- Paradigms useful for finding
  - missing inflectional affixes
  - co-occurrence restrictions among inflectional affixes
  - syncretism
  - allomorph constraints
- Solution: Paradigm Charting tool
41-52 Paradigm Charting Tool [slide content not reproduced]
53 Generated paradigm chart [slide content not reproduced]
54 Paradigm Charting Tool Summary
- Dimensions selected from language-specific feature gloss list/morphosyntactic feature system
- Cell fillers from attested wordforms and/or generator
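
The charting idea can be sketched in a few lines of Python: pick two feature dimensions, fill cells from attested wordforms, then look for empty cells and repeated forms. This is only an illustration under assumed data structures. The function names are hypothetical, and the sample data (Spanish imperfect forms of "hablar", with the second person plural deliberately left out) is chosen for the example rather than taken from the tool.

```python
from collections import defaultdict


def paradigm_chart(wordforms, row_feature, col_feature):
    """Arrange (form, features) pairs in a two-dimensional chart.

    Cells with no attested form stay None so that gaps stand out.
    """
    rows = sorted({feats[row_feature] for _, feats in wordforms})
    cols = sorted({feats[col_feature] for _, feats in wordforms})
    chart = {r: {c: None for c in cols} for r in rows}
    for form, feats in wordforms:
        chart[feats[row_feature]][feats[col_feature]] = form
    return chart


def missing_cells(chart):
    """List (row, column) pairs that have no attested form."""
    return [(r, c) for r, row in chart.items()
            for c, form in row.items() if form is None]


def syncretic_forms(chart):
    """Map each form that fills more than one cell to those cells."""
    cells = defaultdict(list)
    for r, row in chart.items():
        for c, form in row.items():
            if form is not None:
                cells[form].append((r, c))
    return {form: where for form, where in cells.items() if len(where) > 1}


ATTESTED = [
    ("hablaba", {"person": "1", "number": "sg"}),
    ("hablabas", {"person": "2", "number": "sg"}),
    ("hablaba", {"person": "3", "number": "sg"}),
    ("hablábamos", {"person": "1", "number": "pl"}),
    ("hablaban", {"person": "3", "number": "pl"}),
]

chart = paradigm_chart(ATTESTED, "person", "number")
print(missing_cells(chart))
print(syncretic_forms(chart))
```

An empty cell is a prompt either to elicit the form or to ask the generator for a candidate; a repeated form flags possible syncretism.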
55 Morphology tools should...
- Help determine affix orders
- Example: inflectional templates
56-57 Create an Inflectional Template [slide content not reproduced]
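
An inflectional template can be thought of as an ordered list of slots, each admitting a set of affixes. The Python sketch below checks whether a sequence of suffix glosses respects such a template. The slot names, the gloss labels, and `fits_template` are hypothetical, and the sketch assumes that slots may be skipped but never reordered or filled twice.

```python
# Hypothetical suffix template: slots in the order they follow the stem.
TEMPLATE = [
    ("aspect", {"PFV", "IPFV"}),
    ("tense", {"PST", "FUT"}),
    ("agreement", {"1SG", "2SG", "3SG"}),
]


def fits_template(suffix_glosses, template=TEMPLATE):
    """Return True if the glosses fill slots left to right.

    Each slot is used at most once; slots may be skipped.
    """
    next_slot = 0
    for gloss in suffix_glosses:
        # Only slots at or after next_slot are still available.
        for index in range(next_slot, len(template)):
            if gloss in template[index][1]:
                next_slot = index + 1
                break
        else:
            return False  # no remaining slot accepts this gloss
    return True


print(fits_template(["PFV", "PST", "1SG"]))
print(fits_template(["PST", "PFV"]))
```

Running attested words through a check like this is one way a tool could confirm a proposed template or point to the words that contradict it.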
58 Morphology tools should...
- Allow the user to debug grammar
  - Turn rules off and on
  - Change rule order
  - Trace the parse
  - Compare traces after changing the grammar
59-62 Tracing and Debugging [slide content not reproduced]
63 Tracing and Debugging Summary
- Easier to understand generation traces (derivations) than parse traces
- Diffing derivations still messy
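
To make the debugging operations concrete, the Python sketch below applies ordered rewrite rules to an underlying form, records a derivation trace, lets a rule be switched off, and diffs two traces with the standard `difflib` module. The two rules are toy rules loosely modeled on the English plural, written in plain ASCII; they, `derive`, and `diff_traces` are assumptions for illustration and say nothing about how the actual tool represents rules.

```python
import difflib
import re


def epenthesis(form):
    """Insert i between a stem-final sibilant and a sibilant suffix."""
    return re.sub(r"([sz])\+([sz])", r"\1+i\2", form)


def devoicing(form):
    """Turn suffix z into s directly after a voiceless consonant."""
    return re.sub(r"([ptkfs])\+z", r"\1+s", form)


def derive(underlying, rules, disabled=()):
    """Apply ordered rules, recording one trace line per rule.

    Rule names listed in `disabled` are skipped (rule turned off).
    """
    form = underlying
    trace = [f"underlying: {form}"]
    for name, rule in rules:
        if name in disabled:
            trace.append(f"{name}: (off)")
            continue
        form = rule(form)
        trace.append(f"{name}: {form}")
    return form, trace


def diff_traces(old, new):
    """Return a unified diff of two derivation traces."""
    return list(difflib.unified_diff(old, new, fromfile="before",
                                     tofile="after", lineterm=""))


GOOD_ORDER = [("epenthesis", epenthesis), ("devoicing", devoicing)]
BAD_ORDER = list(reversed(GOOD_ORDER))

form_a, trace_a = derive("bus+z", GOOD_ORDER)
form_b, trace_b = derive("bus+z", BAD_ORDER)
print(form_a, form_b)
print("\n".join(diff_traces(trace_a, trace_b)))
```

Even with two rules, reordering changes every line of the trace after the first, which gives a feel for why diffing derivations gets messy on a real grammar.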
64 Morphology tools should...
- Help the user see the current state of the grammar
- Dump the grammar out as XML and apply a customizable style sheet
65-69 Grammar Write-up View [slide content not reproduced]
70 Grammar Write-up View Summary
- Grammar information can be dumped out in text/chart form
- Suitable for web
- Could benefit from NL generation techniques
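
A minimal sketch of the dump step, using Python's standard `xml.etree.ElementTree`: the grammar is serialized to XML and prefixed with an `xml-stylesheet` processing instruction, so that a browser or XSLT processor can render it. The element names, the dictionary layout of the grammar, and the `grammar.xsl` file name are all hypothetical; the actual schema of the tool is not shown in these slides.

```python
import xml.etree.ElementTree as ET


def grammar_to_xml(grammar, stylesheet="grammar.xsl"):
    """Serialize a small grammar dictionary to an XML string.

    The output names a stylesheet but does not apply it; rendering is
    left to whatever consumes the file.
    """
    root = ET.Element("grammar", {"language": grammar["language"]})
    affixes = ET.SubElement(root, "affixes")
    for affix in grammar["affixes"]:
        node = ET.SubElement(
            affixes, "affix",
            {"type": affix["type"], "gloss": affix["gloss"]},
        )
        node.text = affix["form"]
    body = ET.tostring(root, encoding="unicode")
    header = (
        '<?xml version="1.0"?>\n'
        f'<?xml-stylesheet type="text/xsl" href="{stylesheet}"?>\n'
    )
    return header + body


SAMPLE = {
    "language": "example",
    "affixes": [
        {"form": "ed", "type": "suffix", "gloss": "PST"},
        {"form": "un", "type": "prefix", "gloss": "NEG"},
    ],
}

print(grammar_to_xml(SAMPLE))
```

Keeping the stylesheet separate from the dump is what makes the write-up customizable: the same XML can feed a chart view, a prose view, or a web page.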
71 Morphology tools should...
- Provide easy migration path between stages of analysis
- Allow the user to capture ad hoc restrictions, but let the user come back to those restrictions to look for generalizations
72-74 Migration Path [slide content not reproduced]
75 Migration Path
- Going from observations to generalizations, e.g.
  - Ad hoc co-occurrence restrictions among allomorphs
  - Lexically listed allomorphs with phonological restrictions
  - Underlying forms with phonological rules to derive allomorphs
- Parsing requires observational adequacy, not necessarily descriptive adequacy
- Database support for finding less-than-adequate areas of analysis
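
The last two stages in the list above can be put side by side in a short Python sketch: one analysis lists allomorphs with phonological restrictions, the other posits a single underlying form plus a rule. The example is a simplified version of the English negative prefix (in-/im- only, ignoring il- and ir-), and every name in it is an assumption for illustration. Both analyses yield the same surface forms on the sample stems, which is the sense in which a parser needs only observational adequacy.

```python
import re

# Stage: allomorphs listed in the lexicon, each with a phonological
# restriction. The elsewhere case comes last.
LISTED = [
    ("im", lambda stem: stem.startswith(("p", "b", "m"))),
    ("in", lambda stem: True),
]


def prefix_listed(stem):
    """Pick the first listed allomorph whose restriction is met."""
    for allomorph, condition in LISTED:
        if condition(stem):
            return allomorph + stem
    raise ValueError(f"no allomorph fits {stem!r}")


# Stage: one underlying form plus a phonological rule.
UNDERLYING = "in"


def prefix_derived(stem):
    """Attach the underlying prefix, then assimilate n to m before a
    labial consonant across the morpheme boundary (+)."""
    form = UNDERLYING + "+" + stem
    form = re.sub(r"n\+([pbm])", r"m+\1", form)
    return form.replace("+", "")


STEMS = ["possible", "balance", "mature", "tolerant", "active"]
for stem in STEMS:
    print(prefix_listed(stem), prefix_derived(stem))
```

Because the two stages agree on the data, the user can start with the listed allomorphs and move to the rule-based account later without breaking any parses.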
76 Symbiosis
- "The information processing equipment, for its
part, will convert hypotheses into testable
models and then test the models against data
(which the human operator may designate roughly
and identify as relevant when the computer
presents them for his approval). The equipment
will answer questions. It will simulate the
mechanisms and models, carry out procedures, and
display the results to the operator... In
general, it will carry out the routinizable,
clerical operations that fill the intervals
between decisions... Finally, it will do as much
diagnosis, pattern matching, and relevance
recognizing as it profitably can, but it will
accept a clearly secondary status in those areas."
- J.C.R. Licklider, "Man-Computer Symbiosis" (1960)
77 Current Status
- Morphology modeling nearly complete
- Testing model with real data
- Evaluating parsers/generators
- Developing tools to diagnose missing/incorrect parses
- Designing migration path tools