Title: Linguistics 187: Grammar Engineering
1Linguistics 187 Grammar Engineering
- Ron Kaplan and Tracy King
2Administrivia
- Schedule, office hours
- Requirements
- Overview
3Applications of Language Engineering
[Figure: applications plotted by domain coverage (narrow to broad) against functionality (low to high); labels: Shallow, Deep, Synthesis]
4Grammar engineering for deep processing
- Draws on theoretical linguistics, software engineering
- Theoretical linguistics -> papers
- Generalizations, universality, idealization (competence)
- Software engineering -> programs
- Coverage, interface, QA, maintainability, efficiency, practicality
- Grammar engineering
- Grammar : Theory = Program : Programming language
- Reflect linguistic generalizations
- Respect special cases of ordinary language
- Deal with large-scale interactions
- Theory/practice trade-offs
5What is a shallow grammar?
- often trained automatically from marked up corpora
- part of speech tagging
- chunking
- trees
6POS tagging and Chunking
- Part of speech tagging
- I/PRP saw/VBD her/PRP duck/VB ./PUNCT
- I/PRP saw/VBD her/PRP$ duck/NN ./PUNCT
- Chunking
- general chunking
- "I begin with an intuition: when I read a sentence, I read it a chunk at a time." (Abney)
- NP chunking
- [NP President Clinton] visited [NP the Hermitage] in [NP Leningrad]
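The NP chunking above can be sketched as a single left-to-right pass over part-of-speech tags. This is a minimal illustration, not how any particular chunker works: the tags are assigned by hand, and the chunk pattern (optional determiner, adjectives, one or more nouns) and the name `chunk_nps` are assumptions made for the example.

```python
# Penn-style noun tags; the tag set used here is an assumption for the sketch.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}

# Hand-assigned tags (an assumption; a real system would run a tagger first).
TAGGED = [
    ("President", "NNP"), ("Clinton", "NNP"), ("visited", "VBD"),
    ("the", "DT"), ("Hermitage", "NNP"), ("in", "IN"), ("Leningrad", "NNP"),
]


def chunk_nps(tagged):
    """Wrap maximal (DT)? (JJ)* (noun)+ sequences in [NP ...]."""
    out, i, n = [], 0, len(tagged)
    while i < n:
        j = i
        if tagged[j][1] == "DT":                      # optional determiner
            j += 1
        while j < n and tagged[j][1] == "JJ":         # any adjectives
            j += 1
        k = j
        while k < n and tagged[k][1] in NOUN_TAGS:    # one or more nouns
            k += 1
        if k > j:                                     # found at least one noun
            out.append("[NP " + " ".join(w for w, _ in tagged[i:k]) + "]")
            i = k
        else:                                         # no chunk starts here
            out.append(tagged[i][0])
            i += 1
    return " ".join(out)


print(chunk_nps(TAGGED))
```

The chunker never looks at the verb or the preposition, which is what makes it shallow: it brackets the noun phrases without saying how they relate to each other.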
7Treebank grammars
- Phrase structure tree (c-structure)
- Annotations for heads, grammatical functions
Collins parser output
8Deep grammars
- Provide detailed syntactic/semantic analyses
- LFG (ParGram), HPSG (LinGO, Matrix)
- Grammatical functions, tense, number, etc.
- Mary wants to leave.
- subj(want1,Mary3)
- comp(want1,leave2)
- subj(leave2,Mary3)
- tense(want1,present)
- Usually manually constructed
- linguistically motivated rules
9Why would you want one?
- Meaning sensitive applications
- overkill for many NLP applications
- Applications which use shallow methods for English may not be able to for "free" word order languages
- can read many functions off of trees in English
- SUBJ: NP sister to VP, as in [S [NP Mary] [VP left]]
- OBJ: first NP sister to V, as in [S [NP Mary] [VP saw [NP John]]]
- need other information in German, Japanese, etc.
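The two positional rules for English can be sketched as a small tree walk. This is an illustration under assumptions: trees are nested tuples of the form `(label, child, ...)`, the verb is given its own `V` node, and the function name `grammatical_functions` is made up for the example.

```python
def label(node):
    """Category label of a tree node, or None for a bare word."""
    return node[0] if isinstance(node, tuple) else None


def words(node):
    """All words dominated by a node, left to right."""
    if isinstance(node, str):
        return [node]
    return [w for child in node[1:] for w in words(child)]


def grammatical_functions(tree):
    """Read SUBJ and OBJ off an English S tree using position alone."""
    funcs = {}
    children = tree[1:]
    labels = [label(c) for c in children]
    if label(tree) != "S" or "VP" not in labels:
        return funcs
    # SUBJ: an NP that is a sister of VP.
    for child in children:
        if label(child) == "NP":
            funcs["SUBJ"] = " ".join(words(child))
            break
    # OBJ: the first NP that is a sister of V inside the VP.
    vp_children = children[labels.index("VP")][1:]
    if any(label(c) == "V" for c in vp_children):
        for child in vp_children:
            if label(child) == "NP":
                funcs["OBJ"] = " ".join(words(child))
                break
    return funcs


INTRANSITIVE = ("S", ("NP", "Mary"), ("VP", ("V", "left")))
TRANSITIVE = ("S", ("NP", "Mary"), ("VP", ("V", "saw"), ("NP", "John")))
print(grammatical_functions(INTRANSITIVE))
print(grammatical_functions(TRANSITIVE))
```

Everything here depends on where a phrase sits in the tree. In a language that marks functions by case or particles rather than position, the same walk has nothing reliable to key on.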
10Deep analysis matters if you care about the answer
- Example
- A delegation led by Vice President Philips, head of the chemical division, flew to Chicago a week after the incident.
- Question: Who flew to Chicago?
- Candidate answers
- division: closest noun
- head: next closest
- V.P. Philips: next
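The candidate ranking above is what a distance-based shallow heuristic produces. A minimal sketch, assuming whitespace tokenization and a hand-picked candidate list (a shallow system would get candidates from a tagger or chunker); `rank_by_distance` is a hypothetical name:

```python
SENTENCE = (
    "A delegation led by Vice President Philips , head of the chemical "
    "division , flew to Chicago a week after the incident ."
).split()

# Candidate answers to the left of the verb, chosen by hand for this sketch.
CANDIDATES = ["delegation", "Philips", "head", "division"]


def rank_by_distance(tokens, verb, candidates):
    """Order candidates by how few tokens separate them from the verb."""
    verb_pos = tokens.index(verb)
    scored = [
        (verb_pos - tokens.index(c), c)
        for c in candidates
        if tokens.index(c) < verb_pos
    ]
    return [c for _, c in sorted(scored)]


print(rank_by_distance(SENTENCE, "flew", CANDIDATES))
```

The grammatical subject of "flew" is "delegation", which this heuristic ranks last; a deep analysis that records the subject relation returns it directly.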
11Traditional Problems
- Time consuming and expensive to write
- Robustness
- want output for any input: real-world applications
- Ambiguity
- Efficiency
- Interfaces to other application components
12Why deep analysis is difficult
- Languages are hard to describe
- Meaning depends on complex properties of words and sequences
- Different languages rely on different properties
- Errors and disfluencies
- Languages are hard to compute
- Expensive to recognize complex patterns
- Sentences are ambiguous
- Ambiguities multiply: explosion in time and space
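A small sketch of why ambiguities multiply. It assumes the local ambiguities are independent, which gives an upper bound; the option labels below are hypothetical placeholders, not analyses from any grammar.

```python
from itertools import product
from math import prod


def count_readings(options):
    """Number of whole-sentence readings if each ambiguity is resolved independently."""
    return prod(len(choices) for choices in options)


def enumerate_readings(options):
    """Naive enumeration: builds every combination, so space grows with the product."""
    return list(product(*options))


# Three hypothetical two-way ambiguities in one sentence.
OPTIONS = [
    ("duck/VB", "duck/NN"),
    ("PP attaches to verb", "PP attaches to noun"),
    ("wide coordination", "narrow coordination"),
]

print(count_readings(OPTIONS))             # three binary choices
print(count_readings([("a", "b")] * 20))   # twenty binary choices
```

Three binary choices give 8 readings and twenty give 1,048,576, so a parser that enumerates readings one by one runs out of time and memory on long sentences.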
13How to overcome this
- Engineer the deep grammars
- theoretical vs. practical
- what is good enough
- Integrate shallow techniques into deep grammars
- Experience based on broad-coverage LFG grammars
(ParGram project)
14Robustness: Sources of Brittleness
- missing vocabulary
- you can't list all the proper names in the world
- missing constructions
- there are many constructions theoretical linguistics rarely considers (e.g. dates, company names)
- easy to miss even core constructions
- ungrammatical input
- real world text is not always perfect
- sometimes it is really horrendous
15Real world Input
- Other weak blue-chip issues included Chevron, which went down 2 to 64 7/8 in Big Board composite trading of 1.3 million shares; Goodyear Tire & Rubber, off 1 1/2 to 46 3/4; and American Express, down 3/4 to 37 1/4. (WSJ, section 13)
- The croaker's done gone from the hook (WSJ, section 13)
- (SOLUTION 27000 20) Without tag P-248 the W7F3 fuse is located in the rear of the machine by the charge power supply (PL3 C14 item 15. (Eureka copier repair tip)
16Missing vocabulary
- Build vocabulary based on the input of shallow methods
- fast
- extensive
- accurate
- Finite-state morphologies
- Part of Speech Taggers
17LFG and XLE: This course
- LFG: a theory of grammar
- XLE: a parsing/generation engine for LFG grammars
18Different patterns code same meaning
The small children are chasing the dog.
English: group, order
Japanese: group, mark
19Different patterns code same meaning
The small children are chasing the dog.
LFG theory: minor adjustments on universal theme
English: group, order
Japanese: group, mark
Chase(small(children), dog)
20LFG architecture
Modularity
- C-structures and f-structures in piecewise
correspondence.
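[Figure: c-structure tree [S [NP John] [VP [V likes] [NP Mary]]] linked to its f-structure by the correspondence f]
- C-structure: formal encoding of order and grouping
- F-structure: formal encoding of grammatical relations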
21LFG grammar
Rules
S  → NP            VP
     (↑ SUBJ)=↓    ↑=↓
VP → V      NP
     ↑=↓    (↑ OBJ)=↓
Lexical entries
John   NP  (↑ PRED)='John'  (↑ NUM)=SG
likes  V   (↑ PRED)='like<SUBJ, OBJ>'  (↑ SUBJ NUM)=SG  (↑ SUBJ PERS)=3
- Context-free rules define valid c-structures (trees).
- Annotations are instantiated at tree nodes to give equational constraints that corresponding f-structures must satisfy.
- Satisfiability of constraints determines grammaticality.
- F-structure is solution for equations (if satisfied).
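The steps above can be sketched for "John likes Mary" with f-structures as nested dictionaries. This is a toy illustration, not how XLE solves constraints: it walks the tree and unifies as it goes instead of solving a general equation system, it ignores completeness, coherence, and the unique instantiation of PRED values, and the entry for "Mary" is an assumption (the slide gives only "John" and "likes"). All names (`unify`, `f_structure`, `RULES`, `LEXICON`) are made up for the example.

```python
import copy


class Clash(Exception):
    """Raised when two equations assign different atomic values to one attribute."""


def unify(a, b):
    """Merge feature structure b into a; raise Clash on conflicting values."""
    for attr, val in b.items():
        if attr not in a:
            a[attr] = val
        elif isinstance(a[attr], dict) and isinstance(val, dict):
            unify(a[attr], val)
        elif a[attr] != val:
            raise Clash(f"{attr}: {a[attr]} vs {val}")
    return a


# Annotated rules: None stands for "up = down"; an attribute name X
# stands for "(up X) = down".
RULES = {
    "S": [("NP", "SUBJ"), ("VP", None)],
    "VP": [("V", None), ("NP", "OBJ")],
}

LEXICON = {
    "John": ("NP", {"PRED": "John", "NUM": "SG"}),
    "Mary": ("NP", {"PRED": "Mary", "NUM": "SG"}),  # assumed entry
    "likes": ("V", {"PRED": "like<SUBJ, OBJ>",
                    "SUBJ": {"NUM": "SG", "PERS": "3"}}),
}


def f_structure(tree):
    """Instantiate the annotations on a c-structure and solve them by unification."""
    cat, *children = tree
    if isinstance(children[0], str):                 # preterminal over a word
        lex_cat, feats = LEXICON[children[0]]
        if lex_cat != cat:
            raise ValueError(f"{children[0]} is not a {cat}")
        return copy.deepcopy(feats)
    rule = RULES[cat]
    if [child[0] for child in children] != [c for c, _ in rule]:
        raise ValueError("no c-structure rule licenses this tree")
    f = {}
    for child, (_, attr) in zip(children, rule):
        child_f = f_structure(child)
        unify(f, child_f if attr is None else {attr: child_f})
    return f


TREE = ("S", ("NP", "John"), ("VP", ("V", "likes"), ("NP", "Mary")))
print(f_structure(TREE))
```

The SUBJ value ends up with PRED and NUM from "John" and PERS from "likes", because both sets of equations describe the same f-structure. If they disagreed on NUM, `unify` would raise `Clash`, which is the toy counterpart of an unsatisfiable set of equations.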
22Rules as well-formedness conditions
[Figure: tree with S over NP and VP; the NP's f-unit is the SUBJ of the S's f-unit]
A tree containing S over NP - VP is OK if
- the f-unit corresponding to the NP node is the SUBJ of the f-unit corresponding to the S node
- the same f-unit corresponds to both the S and VP nodes.
23Inconsistent equations = Ungrammatical
- What's wrong with "They walks"?
f = v and (v SUBJ NUM) = SG => (f SUBJ NUM) = SG
If a valid inference chain yields FALSE, the
premises are unsatisfiable.
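One way the full chain for "They walks" might run is written out below. It is a sketch under assumptions: f, s, and v name the f-structures of the S, NP, and VP nodes, "they" is assumed to contribute (↑ NUM) = PL, and "walks" is assumed to contribute (↑ SUBJ NUM) = SG; neither lexical entry appears on the slides.

```latex
\begin{align*}
&\text{Premises:}
  && (f\ \mathrm{SUBJ}) = s,\quad f = v,\quad
     (s\ \mathrm{NUM}) = \mathrm{PL},\quad
     (v\ \mathrm{SUBJ}\ \mathrm{NUM}) = \mathrm{SG} \\
&f = v,\ (v\ \mathrm{SUBJ}\ \mathrm{NUM}) = \mathrm{SG}
  && \Rightarrow\ (f\ \mathrm{SUBJ}\ \mathrm{NUM}) = \mathrm{SG} \\
&(f\ \mathrm{SUBJ}) = s,\ (f\ \mathrm{SUBJ}\ \mathrm{NUM}) = \mathrm{SG}
  && \Rightarrow\ (s\ \mathrm{NUM}) = \mathrm{SG} \\
&(s\ \mathrm{NUM}) = \mathrm{PL},\ (s\ \mathrm{NUM}) = \mathrm{SG}
  && \Rightarrow\ \mathrm{PL} = \mathrm{SG}\ \Rightarrow\ \mathrm{FALSE}
\end{align*}
```

Each step only substitutes equals for equals, so the contradiction comes from the premises themselves: no f-structure can satisfy all four equations.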
24Pargram project
- Large-scale LFG grammars for several languages
- English, German, Japanese (Korean), French, Norwegian, Chinese, Turkish, Arabic, Hungarian
- Cover real uses of language--newspapers, documents, etc.
- Parallelism: test LFG universality claims
- Common c- to f-structure mapping conventions
- (unless typologically motivated variation)
- Invariant underlying f-structures
- Permits shared disambiguation properties, Glue interpretation premises
- All grammars run on PARC software (XLE)
- International consortium of linguists
- PARC, Stuttgart, Fuji Xerox, Konstanz, Bergen, Sabanci, Oxford, Oman
- Sustained effort--full-week meetings twice a year, 10 years!
- Contributions to linguistics and computational linguistics: books and papers
- Each group is self-funded, self-managed
25History
- Started in 1994
- English (PARC)
- French (XRCE, now PARC)
- German (IMS-Stuttgart)
- Biannual meetings
- Alternating between Palo Alto and Europe/Japan
- 1998 Japanese started (Fuji Xerox)
- 1999 Norwegian started (University of Bergen)
- 2000 Urdu (Konstanz)
- 2002 Danish started (Copenhagen)
- 2003 Korean (PARC) porting experiment
- 2004 Welsh, Malagasy (Essex, Oxford), Chinese (PARC)
- 2005 Arabic (Oman), Turkish (Sabanci), Hungarian
26Goals
- Practical
- Create a capability/platform for NL applications
- translation, information retrieval, ...
- Develop discipline of grammar engineering
- what tools, techniques, conventions make it easy to develop and maintain broad-coverage grammars?
- how long does it take?
- how much does it cost?
- Theoretical
- Refine and guide LFG theory through broad coverage of multiple languages
- Refine and guide the algorithms and implementation (XLE)
27Parallel f-structures (where possible)
28but different c-structures
29Pargram grammars
            German    English   French    Japanese (Korean)
Rules       251       388       180       56
States      3,239     13,655    1,747     368
Disjuncts   13,294    55,725    12,188    2,012
English allows for shallow markup: labeled bracketing, named-entities
30Engineering results
- Grammars and Lexicons
- Parallel f-structures across languages
- Grammar writer's cookbook
- New practical formal devices
- Complex categories for efficiency: NP[nom] vs. NP: (↑ CASE) = NOM
- Optimality marks for robustness
- enlarge grammar without being overrun by peculiar analyses
- Lexical priority: merging different lexicons
31Theoretical results
- New theory of agreement features
- Separate representation of morphosyntactic features
- Phonology-syntax interface
- New analysis of nonconstituent coordination
- Distribution instead of generalization over sets
32XLE Demo