Title: Modelling Natural Language with Finite Automata
1 Formal Languages / FSAs in NLP
Hinrich Schütze, IMS, Uni Stuttgart, WS 2006/07
Most slides borrowed from K. Haenelt and E. Gurari
2 Formal Language Theory
- Two different goals in computational linguistics
- Theoretical interest
  - What is the correct formalization of natural language?
  - What does this formalization tell us about the properties of natural language?
  - What are the limits of NLP algorithms in principle?
  - E.g., "natural language is context-free" would imply that syntactic analysis of natural language is cubic.
- Practical interest
  - Well-understood mathematical and computational framework for solving NLP problems
  - ... even if the formalization is not cognitively sound
- Today: finite state for practical applications
3 Language Technology Based on Finite-State Devices
4 Advantages of Finite-State Devices
- efficiency
  - time: very fast, if deterministic or with low-degree non-determinism
  - space: compressed representations of data; search structure (hash function)
- system development and maintenance
  - modular design and automatic compilation of system components
  - high-level specifications
- language modelling
  - uniform framework for modelling dictionaries and rules
5 Modelling Goal
- specification of a formal language that corresponds as closely as possible to a natural language
- ideally, the formal system should
  - never undergenerate (i.e. accept or generate all the strings that characterise a natural language)
  - never overgenerate (i.e. not accept or generate any string which is not acceptable in the real language)
- realistically
  - natural languages are moving targets (productivity, variation)
  - approximations are achievable
- finite state is "a crude, but useful approximation" (Beesley/Karttunen, 2003)
6 Layers of Linguistic Modelling
[Diagram: four layers of analysis for the sentence "The Commission promotes information on products in third countries":
Text: the surface sentence;
Lex: word forms with lexical categories (dete The, noun Commission, verb promote, noun information, prpo on, noun product, prpo in, noun third country);
Syn: phrase structure (S, NP, VP, NP, PP, NP, PP);
Sem: semantic relations among promote, Commission, information, product, third country (on, in)]
7 Main Types of Transducers for Natural Language Processing
- general case: non-deterministic transducers with ε-transitions
- optimisation: determinisation and minimisation
[Diagram: example transducers over the symbols s, a, w (outputs such as aw) with ε-transitions, shown before and after determinisation and minimisation]
8 Lexical Analysis of Natural Language Texts
[Diagram: lexical analysis = recognition of the input language and mapping to the output language, comprising tokenisation and morphological analysis]
9 Tokenization
- Just use white space?
- Tokenization rules?
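The two options can be contrasted in a small sketch; the rule set here is illustrative, not the lecture's:

```python
import re

def whitespace_tokenize(text):
    """Naive tokenizer: split on white space only."""
    return text.split()

def rule_tokenize(text):
    """Rule-based tokenizer (a minimal sketch): separate punctuation
    from words, but keep decimal numbers and hyphenated words intact."""
    # \d+(?:\.\d+)? : numbers, possibly with a decimal point
    # \w+(?:-\w+)*  : words, allowing internal hyphens
    # [^\w\s]       : any single punctuation character
    pattern = r"\d+(?:\.\d+)?|\w+(?:-\w+)*|[^\w\s]"
    return re.findall(pattern, text)

print(whitespace_tokenize("The lab-coat costs 3.50 euros."))
# → ['The', 'lab-coat', 'costs', '3.50', 'euros.']  (full stop attached)
print(rule_tokenize("The lab-coat costs 3.50 euros."))
# → ['The', 'lab-coat', 'costs', '3.50', 'euros', '.']
```

White space alone leaves "euros." as one token; even these three rules already need ordering decisions (numbers before punctuation), which is why tokenization rules are a design question of their own.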
10 Properties of Natural Language Words
- very large set (how many?)
- word formation: concatenation with constraints
  - simple words: word vs. *dorw
  - compound words
  - productive compounding and derivation: Drosselklappenpotentiometer; organise → organisation → organisational; re-organise → re-organisation → re-organisational; *be-wald-en, *be-feld-en
  - contiguous dependencies: go-es vs. *walk-es; un-expect-ed-ly vs. *un-elephant-ed-ly
  - discontiguous (long-distance) dependencies: expect-s vs. *un-expect-s; mach-st vs. *ge-mach-st
12 Modelling of Natural Language Words
- Concatenation rules expressed in terms of
  - meaningful word components, including simple words (morphs)
  - and their concatenation (morphotactics)
- Modelling approach
  - lexicalisation of sets of meaningful components (morph classes)
  - representation of these dictionaries with finite-state transducers
  - specification of concatenation
[Diagram: letter-tree automaton over morph sequences such as book-ing, book-s, and work, with shared prefix states]
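The dictionary-as-automaton idea behind the diagram can be sketched as a letter tree (trie), where words sharing a prefix share states; the word list is illustrative:

```python
# A minimal trie sketch: each state is a dict from letter to next state,
# so shared prefixes of "book", "booking", "books", "work" share states.

def build_trie(words):
    """Build a trie; a "<final>" key marks accepting states."""
    root = {}
    for word in words:
        state = root
        for ch in word:
            state = state.setdefault(ch, {})
        state["<final>"] = True
    return root

def accepts(trie, word):
    """Deterministic traversal: follow one arc per input letter."""
    state = trie
    for ch in word:
        if ch not in state:
            return False
        state = state[ch]
    return state.get("<final>", False)

lexicon = build_trie(["book", "booking", "books", "work"])
print(accepts(lexicon, "books"))   # → True
print(accepts(lexicon, "boo"))     # → False: a prefix, not a full entry
```

Lookup time depends only on the length of the input word, not on the size of the dictionary, which is the efficiency argument from slide 4.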
13 Modelling of Natural Language Words: System Overview
[Diagram: inputs to the lexicon compiler:
- morpheme classes: affix: (e:noun sg) <> (s:noun pl) <>
- morphotactics (concatenation of morpheme classes): stem: (book:book) <affix> (box:box) <affix>
- phon./orthograph. alternation rules
- special expressions: (0-92.)2190-92 ...
output: lexical transducer]
14 Non-Regular Phenomena
15 Modelling of Words: Standard Case
noun-stem: book <noun-suffix>; work <noun-suffix>
noun-suffix: (e:N Sg) (s:N Pl)
[Diagram: transducer with letter paths b-o-o-k and w-o-r-k leading to the suffix arcs e (N Sg) and s (N Pl)]
- Lexical Transducer
  - deterministic
  - minimal
  - additional output at final states
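A minimal sketch of this standard case, reading the e suffix as the empty string and attaching the feature output at the final state; the table-driven lookup is an illustration, not the lecture's compiled transducer:

```python
# Hypothetical mini-lexicon for the standard case: surface form =
# stem + noun suffix, with morphological features as additional output.

STEMS = {"book", "work"}
# noun-suffix: (e:N Sg) (s:N Pl) -- "e" taken as the empty suffix here
SUFFIXES = {"": "N Sg", "s": "N Pl"}

def analyse(surface):
    """Return (stem, features) if the word is stem + noun-suffix."""
    for stem in STEMS:
        if surface.startswith(stem):
            suffix = surface[len(stem):]
            if suffix in SUFFIXES:
                return stem, SUFFIXES[suffix]
    return None

print(analyse("books"))  # → ('book', 'N Pl')
print(analyse("work"))   # → ('work', 'N Sg')
```

In the real transducer the same information is stored on states and arcs, so the "additional output at final states" is exactly the features dictionary above.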
16 Modelling of Words: Mapping Ambiguities between Input and Output Language
[Diagram: transducer mapping the surface forms leave, leaves, and left to the lemma leave; the forms share initial states, and the outputs (leave, leaves, left) are emitted only once a final state disambiguates the path]
- Lexical Transducer
  - deterministic
  - delayed emission at final states
  - minimal
17 Modelling of Words: Overlapping of Matching Patterns
[Diagram: transducer for German Wachstube, ambiguous between Wach-Stube ("guard room") and Wachs-Tube ("tube of wax"); the two segmentations overlap in the letters w-a-c-h-s-t-u-b-e and require ε-transitions]
- Lexical Transducer
  - non-deterministic with ε-transitions
  - determinisation not possible: infinite delay of output due to a cycle
- Ambiguities
  - cannot be resolved at the lexical level
  - must be preserved for later analysis steps
  - require non-deterministic traversal
18 Modelling of Words: Non-Disjoint Morpheme Classes
- problem
  - multiplication of start sections
  - high degree of non-determinism
- solutions
  - pure finite-state devices
    - transducer with a high degree of non-determinism
    - heavy backtracking / parallel search → slow processing
    - determinisation → explosion of the network in size
  - extended device: feature propagation along paths
    - merging of morpheme classes
    - reduction of the degree of non-determinism
    - minimisation
    - additional checking of bit-vectors
19 Modelling of Words: Non-Disjoint Morpheme Classes: high degree of non-determinism
noun-stem: book <noun-stem> <noun-suffix>; work <noun-stem> <noun-suffix>
verb-stem: book <verb-suffix>; work <verb-suffix>
noun-suffix: (e:N Sg) (s:N Pl)
verb-suffix: (e:V) (ed:V past) (s:V 3rd)
[Diagram: separate noun (book, work: N) and verb (book, work: V) sub-networks reached via ε-transitions]
- problem
  - each of the subdivisions must be searched separately
  - determinization is not feasible: it either leads to an explosion of the network or is not possible
20 Modelling of Words: Non-Disjoint Morpheme Classes: feature propagation
noun-stem: book <noun-stem> <noun-suffix>; work <noun-stem> <noun-suffix>
verb-stem: book <verb-suffix>; work <verb-suffix>
noun-suffix: (e:N Sg) (s:N Pl)
verb-suffix: (e:V) (ed:V past) (s:V 3rd)
[Diagram: one merged network; arcs carry feature sets, e.g. book {N,V}, work {N,V}, e {N}, e {N,V}, ed {V}, s {N,V}]
- solution
  - interpret the continuation class as a bit-feature
    - added to the arcs
    - checked during traversal (feature intersection)
  - merge the dictionaries
    - search only one dictionary (degree of non-determinism reduced considerably)
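The feature-intersection check can be sketched with bit-vectors; the particular feature assignments on the arcs are assumptions for illustration:

```python
# Sketch of feature propagation: one merged dictionary whose arcs carry
# feature bits; a path is valid only if the bitwise AND (intersection)
# of the bits along the path is non-empty.

N, V = 0b01, 0b10  # feature bits for the noun and verb continuation classes

STEMS = {"book": N | V, "work": N | V}      # both stems are N and V
SUFFIXES = {"s": N | V, "ed": V, "": N}     # hypothetical assignments

def valid_path(stem, suffix):
    """Feature intersection along the path stem + suffix."""
    features = STEMS.get(stem, 0) & SUFFIXES.get(suffix, 0)
    return features != 0

print(valid_path("book", "ed"))  # → True: book can continue as a verb
print(valid_path("book", "xy"))  # → False: unknown suffix, empty intersection
```

The intersection is one AND instruction per arc, which is why the merged network with bit checks beats searching several sub-networks in parallel.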
21 Modelling of Words: Long-Distance Dependencies
Constraints on the co-occurrence of morphs within words
- Contiguous constraints
  - mach-e, mach-st, mach-t, mach-en, ...
  - invalid sequences: *ge-mach-e, *ge-mach-st
- Discontiguous constraints
  - ge-mach-t, ge-wachs-t, wachs-e, ge-wach-st, wach-e
22 Modelling of Words: Long-Distance Dependencies
- modelling alternatives
  - pure finite-state transducer
    - copies of the network
    - can cause an explosion in the size of the resulting transducer
  - extended finite-state transducer
    - context-free extension: simple memory flags
    - special treatment by the analysis and generation routine
    - keeps transducers small
23 Modelling of Words: Long-Distance Dependencies: network copies
[Diagram: the verb-stem network (> 15,000 entries) is duplicated, one copy reached via the ge- prefix, each copy connected to its admissible inflection suffixes]
problem: can cause an explosion in the size of the resulting transducer
24 Modelling of Words: Long-Distance Dependencies: simple memory flags
[Diagram: a single verb-stem network; the ge- prefix arc carries @set(ge), and the inflection-suffix arcs are guarded by @require(ge) or @require(-ge)]
solution: state flags with a procedural interpretation (bit-vector operations)
(Beesley/Karttunen, 2003)
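A sketch of the procedural interpretation of such flags; the arc representation and the exact flag semantics are assumptions based on the slide, not Beesley/Karttunen's notation:

```python
# Memory-flag sketch: arcs carry optional actions ('set', flag) or
# ('require', flag) / ('require', '-flag'); the flags are checked
# during traversal, so the ge- circumfix needs no network copies.

def check_flags(arcs):
    """Traverse a path of (label, action) arcs; return True if all
    flag requirements along the path are satisfied."""
    flags = set()
    for label, action in arcs:
        if action is None:
            continue
        op, flag = action
        if op == "set":
            flags.add(flag)
        elif op == "require":
            if flag.startswith("-"):
                if flag[1:] in flags:   # flag must NOT be set
                    return False
            elif flag not in flags:     # flag must be set
                return False
    return True

# ge-mach-t: the prefix sets "ge", the participle suffix -t requires it
print(check_flags([("ge", ("set", "ge")), ("mach", None),
                   ("t", ("require", "ge"))]))    # → True
# *ge-mach-st: the suffix -st requires that "ge" is NOT set
print(check_flags([("ge", ("set", "ge")), ("mach", None),
                   ("st", ("require", "-ge"))]))  # → False
```

The discontiguous constraint between prefix and suffix is enforced in constant extra memory (one bit per flag), which is what keeps the transducer small.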
25 Modelling of Words: Phon./Orth. Alternation Rules
- phenomena
  - pity → pitiless
  - fly → flies
  - swim → swimming
  - delete → deleting
  - fox → foxes
- dictionary
  - noun-stem: dog <noun-suffix>; fox <noun-suffix>
  - noun-suffix: (e:N Sg) (s:N Pl)
- rules: high-level specification of regular expressions, e.g. a → b / _ c (rewrite a as b when followed by c)
(Beesley/Karttunen, 2003: 61)
26 Modelling of Words: Phon./Orth. Alternation Rules
- compilation of the lexical transducer
  - construction of the dictionary transducer
  - construction of the rule transducer
  - composition of the dictionary transducer and the rule transducer
27 Modelling of Words: Phon./Orth. Alternation Rules
e-insertion rule for English plural nouns ending in x, s, z (foxes)
[Diagram: transducer with states r0-r5; symbols other than z, s, x loop in the start region; after a stem-final z, s, or x and the morpheme boundary, an e is inserted before a word-final s; r5 is the failure state]
(Jurafsky/Martin, 2000, p. 78)
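The effect of the rule can be sketched as a rewrite over lexical forms, with ^ marking the morpheme boundary; this string rewrite stands in for the actual transducer run:

```python
import re

def e_insertion(lexical):
    """Sketch of the e-insertion rule: insert e between a stem-final
    x, s, or z and the plural suffix s; then drop the boundary marker.
    fox^s -> foxes, dog^s -> dogs."""
    surface = re.sub(r"([xsz])\^(s)\b", r"\1e\2", lexical)
    return surface.replace("^", "")

print(e_insertion("fox^s"))  # → foxes
print(e_insertion("dog^s"))  # → dogs
```

The transducer on the slide does the same thing one symbol at a time: the states remember whether an x/s/z and a boundary have just been seen, which is exactly the left context of the rewrite rule.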
30 Diagram?
34 Tisch/Tische, Rat/Räte
[Diagram: lexical transducer as on slide 15, with letter paths b-o-o-k and w-o-r-k and suffix arcs for N Sg and N Pl]
35 Numbers to Numerals for English (1-99)
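One way to sketch this exercise: the numerals 1-99 decompose into a small lexicon of ones and tens plus a single concatenation rule, exactly the kind of finite regular relation a small transducer can encode:

```python
# Numbers to English numerals for 1-99: two morph classes (ONES, TENS)
# and one concatenation rule (tens + "-" + ones).

ONES = ["", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen",
        "fourteen", "fifteen", "sixteen", "seventeen", "eighteen",
        "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty",
        "seventy", "eighty", "ninety"]

def numeral(n):
    """English numeral for 1 <= n <= 99."""
    if n < 20:
        return ONES[n]              # irregular forms are pure lexicon
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")

print(numeral(42))  # → forty-two
print(numeral(17))  # → seventeen
```

The irregular teens live entirely in the lexicon; only the regular part (twenty-one ... ninety-nine) needs a concatenation rule, mirroring the dictionary-plus-morphotactics split of the preceding slides.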
36-37 Modelling of Words: Phon./Orth. Alternation Rules
[Diagram: the dictionary transducer (letter paths f-o-x and d-o-g with the suffix arcs e:N Sg and es:N Pl), the e-insertion rule transducer (states over z, s, x vs. other symbols), and their composition dictionary ∘ rule; in the composed transducer the plural of fox passes through the e-insertion states, yielding foxes, while dog takes the plain s path]
38 Time complexity of the transducer?
39 Layers of Linguistic Modelling
[Diagram repeated from slide 6: the Text, Lex, Syn, and Sem layers for "The Commission promotes information on products in third countries"]
40 Syntactic Analysis of Natural Language Texts
- Syntactic analysis: input language → output language
  - assignment of syntactic categories
  - assignment of syntactic structure
[Diagram: "the good example" bracketed into nested NPs]
41 Sequences of Words: Properties and Well-Formedness Conditions: syntactic conditions, type-3
- regular language concatenations
  - local word-ordering principles
    - the good example vs. *example good the
    - could have been done vs. *been could done have
  - global word-ordering principles
    - (we) (gave) (him) (the book) vs. *(gave) (him) (the book) (we)
42 Sequences of Words: Properties and Well-Formedness Conditions: syntactic conditions beyond type-3
- concatenations beyond regular languages
  - centre embedding (S → a S b)
  - obligatorily paired correspondences
    - either ... or, if ... then
    - can be nested inside each other
43 Sequences of Words: Properties and Well-Formedness Conditions: syntactic conditions beyond type-2
- concatenations beyond context-free languages
  - cross-serial dependencies (Swiss German)
Jan säit das mer d'chind em Hans es huus lönd hälfe aastriiche
gloss: John said that we the children-ACC Hans-DAT the house let help paint
("John said that we let the children help Hans paint the house")
[Diagram: crossing dependencies x1-y1, x2-y2, x3-y3 between the three noun phrases and the three verbs]
44 Syntactic Grammars
- complete parsing
  - goal: recover complete, exact parses of sentences
  - closed-world assumption: lexicon and grammar are complete
  - places all types of conditions into one grammar
  - seeks the globally best parse of the entire search space
  - problems: not robust; too slow for mass data processing
- partial parsing
  - goal: recover syntactic information efficiently and reliably from unrestricted text
  - sacrifices completeness and depth of analysis
  - open-world assumption: lexicon and grammar are incomplete
  - local decisions
(Abney, 1996)
45 Syntactic Grammars: Complete Sentence Structure?
[Diagram: a complex NP (bottle, closed, with, stopper, fastening, made of, cork, material, or) annotated with syntactic links (NP, AP, PP), semantic links, and text structure]
46 Syntactic Grammars: Complete Sentence Parsing: computational problem
- combinatorial explosion of readings
(Bod, 1998: 2)
47 All Grammars Leak (Edward Sapir, 1921)
- It is not possible to provide an exact and complete characterization of all well-formed utterances that cleanly divides them from all other sequences of words, which are regarded as ill-formed utterances.
- Rules are not completely ill-founded.
- Somehow we need to make things looser to account for the creativity of language use.
48 All Grammars Leak
- Example of a leaking rule?
49 All Grammars Leak
- Agreement in English
  - "Why do some teachers, parents and religious leaders feel that celebrating their religious observances in home and church are inadequate and deem it necessary to bring those practices into the public schools?"
50 Syntactic Structure: Partial Parsing Approaches
- finite-state approximation of sentence structures (Abney 1995)
- finite-state cascades: sequences of levels of regular expressions
- recognition approximation: tail recursion replaced by iteration
- interpretation approximation: embedding replaced by fixed levels
51 Syntactic Structure: Finite-State Cascades
- functionally equivalent to a composition of transducers, but without intermediate structure output
- the individual transducers are considerably smaller than a composed transducer
52 Syntactic Structure: Finite-State Cascades (Abney)
Finite-State Cascade (regular-expression grammar) for "the woman in the lab coat thought you were sleeping":
L0: the/D woman/N in/P the/D lab/N coat/N thought/V-tns you/Pron were/Aux sleeping/V-ing
T1 → L1: NP P NP VP NP VP
T2 → L2: NP PP VP NP VP
T3 → L3: S
53 Syntactic Structure: Finite-State Cascades (Abney)
- a cascade consists of a sequence of levels
- phrases at one level are built on phrases at the previous level
- no recursion: phrases never contain phrases of the same or a higher level
- two levels of special importance
  - chunks: non-recursive cores (NX, VX) of major phrases (NP, VP)
  - simplex clauses: embedded clauses as siblings
- patterns: reliable indicators of the gist of syntactic structure
54 Syntactic Structure: Finite-State Cascades (Abney)
- each transduction is defined by a set of patterns
  - category plus regular expression
  - the regular expression is translated into a finite-state automaton
- level transducer
  - union of the pattern automata
  - deterministic recognizer
  - each final state is associated with a unique pattern
- heuristics
  - longest match (resolution of ambiguities)
- external control process
  - if the recognizer blocks without reaching a final state, a single input element is punted to the output and recognition resumes at the following word
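The longest-match heuristic and the punting behaviour of a single cascade level can be sketched like this; the two chunk patterns are simplified stand-ins for Abney's grammar, not his actual rules:

```python
import re

# Hypothetical chunk patterns over a space-separated tag sequence.
PATTERNS = [("NP", re.compile(r"(D )?(N )+")),    # optional determiner + nouns
            ("VP", re.compile(r"(Aux )?V\S* "))]  # optional auxiliary + verb

def chunk(tags):
    """One cascade level: greedy left-to-right longest match; where no
    pattern matches, punt a single element to the output and resume."""
    s = " ".join(tags) + " "
    out, i = [], 0
    while i < len(s):
        best = None
        for cat, pat in PATTERNS:
            m = pat.match(s, i)
            if m and (best is None or m.end() > best[1]):
                best = (cat, m.end())   # keep the longest match
        if best:
            out.append(best[0])
            i = best[1]
        else:                            # punt one element unchanged
            j = s.index(" ", i)
            out.append(s[i:j])
            i = j + 1
    return out

# the tag sequence of slide 52's "the woman in the lab coat thought"
print(chunk(["D", "N", "P", "D", "N", "N", "V-tns"]))
# → ['NP', 'P', 'NP', 'VP']
```

The unmatched preposition P is punted rather than causing failure, which is exactly what makes the cascade robust on unrestricted text.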
55 Syntactic Structure: Finite-State Cascades (Abney)
- patterns: reliable indicators of bits of syntactic structure
- parsing
  - easy-first parsing (easy calls first)
  - proceeds by growing islands of certainty into larger and larger phrases
  - no systematic parse tree from bottom to top
  - recognition of recognizable structures
- containment of ambiguity
  - prepositional phrases and the like are left unattached
  - noun-noun modifications are not resolved
56 Syntactic Structure: Bounding of Centre Embedding (Sproat, 2002)
- observation: unbounded centre embedding
  - does not occur in language use
  - seems to be too complex for human mental capacities
- finite-state modelling of bounded centre embedding:
  S → the (man|dog) S1 (bites|walks)
  S1 → the (man|dog) S2 (bites|walks)
  S2 → the (man|dog) (bites|walks)
  S1 → ε
  S2 → ε
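Because the embedding depth is bounded, the whole grammar above compiles into a single regular expression; this is a sketch of that compilation, not Sproat's own code:

```python
import re

NP = r"the (?:man|dog)"
VP = r"(?:bites|walks)"
# S -> NP S1 VP ; S1 -> NP S2 VP | e ; S2 -> NP VP | e
# bounded nesting becomes nested optional groups:
S = re.compile(rf"{NP} (?:{NP} (?:{NP} {VP} )?{VP} )?{VP}$")

print(bool(S.match("the dog walks")))                              # → True
print(bool(S.match("the dog the man bites walks")))                # → True
print(bool(S.match("the dog the man the dog walks bites walks")))  # → True
print(bool(S.match("the dog the man walks")))                      # → False
```

Each level of embedding is just one more optional group, so the NP-VP pairing is enforced up to the fixed depth; only unbounded embedding would escape regular expressions.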
57 Modelling of Natural Language Word Sequences: Approaches
58 Modelling of Natural Language Word Sequences: Cases (1)
59 Modelling of Natural Language Word Sequences: Cases (2)
60 Semantic Analysis: An Example
- message understanding
  - filling in relational database templates from newswire texts
- approach of FASTUS 1): cascade of five transducers
  - recognition of names
  - fixed-form expressions
  - basic noun and verb groups
  - patterns of events
    - <company> <form> <joint venture> with <company>
    - "Bridgestone Sports Co. said Friday it has set up a joint venture in Taiwan with a local concern and a Japanese trading house to produce golf clubs to be shipped to Japan."
  - identification of event structures that describe the same event
1) Hobbs/Appelt/Bear/Israel/Kehler/Martin/Meyers/Kameyama/Stickel/Tyson (1997)
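The joint-venture event pattern can be sketched as a regular expression over a name-tagged string; the [COMPANY ...] tagging format and the verb list are assumptions for illustration, not FASTUS's actual notation:

```python
import re

# <company> <form> <joint venture> with <company>, matched after an
# earlier cascade level has bracketed company names.
PATTERN = re.compile(
    r"(?P<c1>\[COMPANY [^\]]+\]).*?"        # first company name
    r"(?:set up|formed|established).*?"     # hypothetical "form" verbs
    r"a joint venture.*?with.*?"
    r"(?P<c2>\[COMPANY [^\]]+\]|a local concern)")

text = ("[COMPANY Bridgestone Sports Co.] said Friday it has set up "
        "a joint venture in Taiwan with a local concern")
m = PATTERN.search(text)
print(m.group("c1"), "|", m.group("c2"))
# → [COMPANY Bridgestone Sports Co.] | a local concern
```

The pattern ignores everything between its anchors, which is the cascade idea again: each level commits only to the structure it can recognize reliably.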
61 Summary: Linguistic Adequacy
- word formation
  - essentially a regular language
- sentence formation
  - reduced recognition capacity (approximations)
    - corresponds to language use rather than the natural language system
  - flat interpretation structures
    - clearly separate syntactic constraints from other (semantic, textual) constraints
  - partial interpretation structures
    - clearly identify the contribution of syntactic structure in the interplay with other structuring principles
- content
  - suitable for restricted fact extraction
  - deep text understanding is generally still poorly understood
62 Summary: Practical Usefulness
- not all natural language phenomena can be described with finite-state devices
- many actually occurring phenomena can be described with regular devices
- not all practical applications require complete and deep processing of natural language
- partial solutions allow for the development of many useful applications
63 Summary: Complexity of Finite-State Transducers for NLP
- theoretically computationally intractable (Barton/Berwick/Ristad, 1987)
  - but the SAT problem is unnatural
  - natural language problems are bounded in size (input and output alphabets, word length of linguistic words, partiality of functions and relations)
  - combinatorial possibilities are locally restricted
- practically, natural language finite-state systems
  - do not involve complex search
  - are remarkably fast
  - can in many relevant cases be determinised and minimised
64 Summary: Large-Scale Processing
- context-free devices
  - run-time complexity: |G| · n³, where |G| >> n³
  - too slow for mass data processing
- finite-state devices
  - run-time complexity: best case linear (with a low degree of non-determinism)
  - best suited for mass data processing
65 References
- Abney, Steven (1996). Tagging and Partial Parsing. In Ken Church, Steve Young, and Gerrit Bloothooft (eds.), Corpus-Based Methods in Language and Speech. Kluwer Academic Publishers, Dordrecht. http://www.vinartus.net/spa/95a.pdf
- Abney, Steven (1996a). Cascaded Finite-State Parsing. Viewgraphs for a talk given at Xerox Research Centre, Grenoble, France. http://www.vinartus.net/spa/96a.pdf
- Abney, Steven (1995). Partial Parsing via Finite-State Cascades. In Journal of Natural Language Engineering, 2(4): 337-344. http://www.vinartus.net/spa/97a.pdf
- Barton Jr., G. Edward; Berwick, Robert C. and Eric Sven Ristad (1987). Computational Complexity and Natural Language. MIT Press.
- Beesley, Kenneth R. and Lauri Karttunen (2003). Finite-State Morphology. Distributed for the Center for the Study of Language and Information. (CSLI Studies in Computational Linguistics)
- Bod, Rens (1998). Beyond Grammar. An Experience-Based Theory of Language. CSLI Lecture Notes, 88, Stanford, California: Center for the Study of Information and Language.
- Grefenstette, Gregory (1999). Light Parsing as Finite State Filtering. In Kornai 1999, pp. 86-94. Earlier version in Workshop on Extended Finite State Models of Language, Budapest, Hungary, Aug 11-12, 1996. ECAI'96. http://citeseer.nj.nec.com/grefenstette96light.html
- Hobbs, Jerry; Doug Appelt, John Bear, David Israel, Andy Kehler, David Martin, Karen Meyers, Megumi Kameyama, Mark Stickel, Mabry Tyson (1997). Breaking the Text Barrier. FASTUS presentation slides. SRI International. http://www.ai.sri.com/israel/Generic-FASTUS-talk.pdf
- Jurafsky, Daniel and James H. Martin (2000). Speech and Language Processing. An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition. New Jersey: Prentice Hall.
- Kornai, András (ed.) (1999). Extended Finite State Models of Language. (Studies in Natural Language Processing). Cambridge: Cambridge University Press.
- Koskenniemi, Kimmo (1983). Two-Level Morphology: A General Computational Model for Word-Form Recognition and Production. Publication 11, University of Helsinki. Helsinki: Department of General Linguistics.
66 References
- Kunze, Jürgen (2001). Computerlinguistik. Voraussetzungen, Grundlagen, Werkzeuge. Lecture notes. Humboldt-Universität zu Berlin. http://www2.rz.hu-berlin.de/compling/Lehrstuhl/Skripte/Computerlinguistik_1/index.html
- Manning, Christopher D. and Schütze, Hinrich (1999). Foundations of Statistical Natural Language Processing. Cambridge, Mass., London: The MIT Press. http://www.sultry.arts.usyd.edu.au/fsnlp
- Mohri, Mehryar (1997). Finite-State Transducers in Language and Speech Processing. In Computational Linguistics, 23(2), 1997, pp. 269-311. http://citeseer.nj.nec.com/mohri97finitestate.html
- Mohri, Mehryar (1996). On Some Applications of Finite-State Automata Theory to Natural Language Processing. In Journal of Natural Language Engineering, 2, pp. 1-20.
- Mohri, Mehryar and Michael Riley (2002). Weighted Finite-State Transducers in Speech Recognition (Tutorial). Part 1: http://www.research.att.com/mohri/postscript/icslp.ps, Part 2: http://www.research.att.com/mohri/postscript/icslp-tut2.ps
- Partee, Barbara; ter Meulen, Alice and Robert E. Wall (1993). Mathematical Methods in Linguistics. Dordrecht: Kluwer Academic Publishers.
- Pereira, Fernando C. N. and Rebecca N. Wright (1997). Finite-State Approximation of Phrase-Structure Grammars. In Roche/Schabes 1997.
- Roche, Emmanuel and Yves Schabes (eds.) (1997). Finite-State Language Processing. Cambridge (Mass.) and London: MIT Press.
- Sproat, Richard (2002). The Linguistic Significance of Finite-State Techniques. February 18, 2002. http://www.research.att.com/rws
- Strzalkowski, Tomek; Lin, Fang Ge, Jin Wang; Perez-Carballo, Jose (1999). Evaluating Natural Language Processing Techniques in Information Retrieval. In Strzalkowski, Tomek (ed.): Natural Language Information Retrieval, Kluwer Academic Publishers, Holland: 113-145.
- Woods, W. A. (1970). Transition Network Grammars for Natural Language Analysis. In Communications of the ACM 13: 591-602.