Title: 7. Parsing in functional unification grammar
7. Parsing in functional unification grammar
Contents
- 7.1 Functional unification grammar
- 7.1.1 Compilation
- 7.1.2 Attributes and values
- 7.1.3 Unification
- 7.1.4 Patterns and constituent sets
- 7.1.5 Grammar
- 7.2 The parser
- 7.2.1 The General Syntactic Processor
- 7.2.2 The parsing grammar
- 7.3 The compiler
- 7.4 Conclusion
7.1 Functional unification grammar
- The claim that this theory makes on the word functional in its title is therefore supported in three ways.
- 1. It gives primary status to those aspects of language that have often been called functional; logical aspects are not privileged.
- 2. It describes linguistic structures in terms of the function that a part fills in a whole, rather than in terms of parts of speech and ordering relations.
- 3. Most important for this paper, it requires its grammars to function; that is, they must support the practical enterprises of language generation and analysis.
7.1.1 Compilation
- This paper will concentrate on how this translation is actually carried out; it will, in short, be about machine translation between grammatical formalisms.
- The kind of translation to be explored here is known in computer science as compilation, and the computer program that does it is called a compiler.
- The term compilation almost always refers to a process that translates a text produced by a human into a text that is functionally equivalent, but not intended for human consumption.
7.1.2 Attributes and values
- Functional unification grammar knows things by their functional descriptions (FDs). A simple FD is a set of descriptors, and a descriptor is a constituent set, a pattern, or an attribute with an associated value.
- The list of descriptors that make up an FD is written in square brackets, no significance attaching to the order. The attributes in an FD must be distinct from one another, so that if an FD F contains the attribute a, it is always possible to use the phrase "the a of F" to refer unambiguously to a value.
- An attribute is a symbol, that is, a string of letters. A value is either a symbol or another FD.
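One way to picture these definitions is to model a simple FD as a Python dictionary. This is only a sketch: the attribute names and values below are illustrative assumptions, and patterns and constituent sets are left out.

```python
# A simple FD sketched as a dict. Keys are attributes, so they are distinct
# by construction, and dict equality ignores the order of the descriptors.
# All attribute names and values here are illustrative assumptions.
he = {"CAT": "PRON", "NUMBER": "SING", "PERSON": "3"}

sentence = {
    "CAT": "S",        # a value that is a symbol
    "SUBJ": he,        # a value that is itself an FD
    "TENSE": "PRES",
}

def value_of(fd, attribute):
    """Return 'the attribute of fd', or None if fd does not mention it."""
    return fd.get(attribute)
```

With this encoding, `value_of(sentence, "SUBJ")` plays the role of the phrase "the SUBJ of the sentence".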
7.1.3 Unification
- A string of atoms enclosed in angle brackets constitutes a path, and there is at least one that identifies every value in an FD.
- The path $\langle a_1\ a_2 \ldots a_k \rangle$ identifies the value of the attribute $a_k$ in the FD that is the value of $\langle a_1\ a_2 \ldots a_{k-1} \rangle$. It can be read as "the $a_k$ of the $a_{k-1}$ ... of the $a_1$".
- Paths are always interpreted as beginning in the largest FD that encloses them.
- A pair consisting of a path in an FD and the value that the path leads to is a feature of the object described.
- If the value is a symbol, the pair is a basic feature of the FD.
- Example of a path, from the sentence "He likes writing books".
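Paths and basic features can be sketched over the same kind of nested dictionary. The function names `follow` and `basic_features` and the sample FD are assumptions made for illustration, not part of the formalism.

```python
def follow(fd, path):
    """Follow a path (a tuple of attributes) starting from the outermost FD."""
    current = fd
    for attribute in path:
        if not isinstance(current, dict) or attribute not in current:
            return None  # the path identifies no value in this FD
        current = current[attribute]
    return current

def basic_features(fd, prefix=()):
    """Yield every (path, symbol) pair, i.e. every basic feature of fd."""
    for attribute, value in fd.items():
        path = prefix + (attribute,)
        if isinstance(value, dict):
            yield from basic_features(value, path)
        else:
            yield path, value

# Illustrative FD; the attribute names are assumed.
fd = {"CAT": "S", "SUBJ": {"CAT": "PRON", "NUMBER": "SING"}}
```

Here `follow(fd, ("SUBJ", "NUMBER"))` reads as "the NUMBER of the SUBJ" and yields the symbol `"SING"`.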
- The union of a pair of FDs is not, in general, a well-formed FD.
- The reason is this: the requirement that a given attribute appear only once in an FD implies a similar constraint on the set of features corresponding to an FD.
- A path must uniquely identify a value.
- When two or more simple FDs are compatible, they can be combined into one simple FD describing those things that they both describe, by the process of unification.
- Unification is the same as set union except that it yields the null set when applied to incompatible arguments.
- The sign = is used for unification, so that $\alpha = \beta$ denotes the result of unifying $\alpha$ and $\beta$.
- Unification is the fundamental operation underlying the analysis and synthesis of sentences using functional unification grammar.
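A minimal sketch of unification over simple FDs, assuming the nested-dictionary encoding (an assumption of this example, not the paper's implementation). Failure is signalled with `None` rather than an empty set so that it cannot be confused with the empty FD, and values shared between paths are not modelled.

```python
FAIL = None  # stands in for the null result on incompatible arguments

def unify(a, b):
    """Unify two simple FDs (nested dicts) or symbols; FAIL if incompatible."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)  # shallow copy, so neither argument is modified
        for attribute, value in b.items():
            if attribute in result:
                merged = unify(result[attribute], value)
                if merged is FAIL:
                    return FAIL  # some path would lead to two different values
                result[attribute] = merged
            else:
                result[attribute] = value
        return result
    # Symbols, or a symbol against an FD: compatible only when identical.
    return a if a == b else FAIL
```

For compatible arguments the result behaves like set union of their features; for example, unifying `{"CAT": "NP"}` with `{"NUMBER": "SING"}` gives an FD containing both attributes, while two different values for NUMBER make the whole unification fail.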
7.1.4 Patterns and constituent sets
- The value of SUBJ is the FD of a constituent of the sentence, whereas the value of ASPECT is not.
- The purpose of constituent sets and patterns is to identify constituents and to state constraints on the order of their occurrence.
- The value of the C-set attribute covers all constituents.
- Each pattern is a list whose members can be:
- 1. A path. The path may have as its value:
  a. An FD. As in the case of the constituent set, the FD describes a constituent.
  b. A pattern. The pattern is inserted into the current one at this point.
- 2. A string of dots. This matches any number of constituents.
- 3. The symbol #. This matches any one constituent.
- 4. An FD. This will match any constituent whose description is unifiable with it. The unification is made with a copy of the FD in the pattern, rather than with the FD itself, because the intention is to impute its properties to the constituent, but not to unify all the constituents that match this part of the pattern.
- 5. An expression of the form (* fd), where fd is an FD. This matches zero or more constituents, provided they can all be unified with a copy of fd.
- Expressions of pattern
- The pattern (16) requires exactly one constituent to have the property TRACE = NP; all others must have the property TRACE = NONE.
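The matching behaviour of pattern members can be sketched as a small recursive matcher. Everything named here is an assumption made for illustration: the tokens `DOTS` and `ONE`, the `Star` wrapper for the zero-or-more form, and the nested-dictionary encoding of FDs. Paths as pattern members are omitted to keep the sketch short.

```python
import copy

DOTS = "..."  # matches any number of constituents
ONE = "#"     # matches exactly one constituent (token name assumed)

class Star:
    """Hypothetical wrapper: zero or more constituents unifiable with fd."""
    def __init__(self, fd):
        self.fd = fd

def unify(a, b):
    """Unify nested-dict FDs or symbols; None signals incompatibility."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for attribute, value in b.items():
            if attribute in result:
                merged = unify(result[attribute], value)
                if merged is None:
                    return None
                result[attribute] = merged
            else:
                result[attribute] = value
        return result
    return a if a == b else None

def fits(fd, constituent):
    # Unify with a copy, so the FD in the pattern is never shared or altered.
    return unify(copy.deepcopy(fd), constituent) is not None

def matches(pattern, constituents):
    """True if the ordered list of constituent FDs satisfies the pattern."""
    if not pattern:
        return not constituents
    head, rest = pattern[0], pattern[1:]
    if isinstance(head, str) and head == DOTS:
        return any(matches(rest, constituents[i:])
                   for i in range(len(constituents) + 1))
    if isinstance(head, Star):
        if matches(rest, constituents):          # match zero constituents
            return True
        return (bool(constituents) and fits(head.fd, constituents[0])
                and matches(pattern, constituents[1:]))
    if not constituents:
        return False
    if isinstance(head, str) and head == ONE:
        return matches(rest, constituents[1:])
    return fits(head, constituents[0]) and matches(rest, constituents[1:])

# A pattern in the spirit of the TRACE example: exactly one NP trace.
one_trace = [Star({"TRACE": "NONE"}), {"TRACE": "NP"}, Star({"TRACE": "NONE"})]
```

Because matching is by unifiability, a constituent that says nothing about TRACE would also fit either part of `one_trace`; the sketch does not record the properties imputed by a successful match.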
7.1.5 Grammar
- A functional unification grammar is a single FD.
- Example (19) shows a simple grammar, corresponding to a context-free grammar containing the single rule (20).
7.2 The parser
7.2.1 The General Syntactic Processor
- The input is an FD that constitutes the specification of a sentence to be uttered.
- There are two principal data structures, the chart and the agenda.
- The chart is a directed graph, each of whose edges maps onto a substring of the sentence being analyzed.
- The chart has $k+1$ vertices for a sentence of $k$ words.
- Each word in the sentence to be parsed is represented by an edge labeled with an FD obtained by looking that word up in the lexicon.
- If the word is ambiguous, that is, if it has more than one FD, it is represented by more than one edge.
- All the edges for the $i$-th word clearly go from vertex $i-1$ to vertex $i$.
- The label on an active edge has two parts: an FD describing what is known about the putative phrase, and a procedure that will carry the recognition of the phrase one step further forward.
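The initial state of the chart can be sketched as follows. The `Edge` class, the `initial_chart` function and the toy lexicon are assumptions made for this example; an edge with no procedure is treated as inactive.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Edge:
    start: int                             # vertex the edge leaves
    end: int                               # vertex the edge enters
    fd: dict                               # FD labelling the edge
    procedure: Optional[Callable] = None   # present only on active edges

    @property
    def active(self):
        return self.procedure is not None

def initial_chart(words, lexicon):
    """One inactive edge per lexical FD; word i spans vertex i-1 to vertex i."""
    edges = []
    for i, word in enumerate(words, start=1):
        for fd in lexicon[word]:           # an ambiguous word yields several edges
            edges.append(Edge(i - 1, i, fd))
    return edges

# Toy lexicon, assumed for illustration only.
LEXICON = {
    "he": [{"CAT": "PRON"}],
    "likes": [{"CAT": "VERB"}],
    "books": [{"CAT": "NOUN"}, {"CAT": "VERB"}],
}

chart = initial_chart(["he", "likes", "books"], LEXICON)
```

The three-word sentence produces four vertices (0 to 3) and four edges, two of them for the ambiguous word.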
- Parsing proceeds in a series of steps, in each of which the procedure on an active edge is applied to a pair of FDs, one coming from that same active edge, and the other from an inactive edge that leaves the vertex where the active edge ends.
- If $a$ and $i$ are an active and an inactive edge respectively, $a$ being incident to the vertex that $i$ is incident from, the step consists in evaluating $P_a(f_a, f_i)$, where $f_a$ and $f_i$ are the FDs on $a$ and $i$, and $P_a$ is the procedure on $a$.
- This process is carried out for every pair consisting of an active followed by an inactive edge that comes to be part of the chart. Each successful step leads to the introduction of one new edge, but this edge may result in several new pairs.
- Each new pair produced therefore becomes a new item on the agenda, which serves as a queue of pairs waiting to be processed.
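The interplay of chart and agenda can be sketched as a short loop. This is an illustration under stated assumptions, not the General Syntactic Processor itself: the `Edge` class, the convention that a procedure returns a new FD plus the next procedure (or `None` on failure), the empty active edge used to seed the parse, and the toy procedures `expect_pron` and `expect_verb` are all invented for the example.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Edge:
    start: int
    end: int
    fd: dict
    procedure: Optional[Callable] = None   # None marks an inactive edge

def parse(initial_edges):
    chart, agenda = [], deque()

    def add(edge):
        # Every active edge followed by an inactive edge becomes an agenda item.
        for other in chart:
            if edge.procedure and not other.procedure and other.start == edge.end:
                agenda.append((edge, other))
            if other.procedure and not edge.procedure and edge.start == other.end:
                agenda.append((other, edge))
        chart.append(edge)

    for edge in initial_edges:
        add(edge)
    while agenda:
        active, inactive = agenda.popleft()
        result = active.procedure(active.fd, inactive.fd)   # the step P_a(f_a, f_i)
        if result is not None:                              # a successful step
            fd, procedure = result
            add(Edge(active.start, inactive.end, fd, procedure))
    return chart

# Toy procedures for a sentence made of a pronoun followed by a verb.
def expect_verb(matrix, constituent):
    if constituent.get("CAT") == "VERB":
        return {**matrix, "HEAD": constituent}, None        # phrase complete
    return None

def expect_pron(matrix, constituent):
    if constituent.get("CAT") == "PRON":
        return {**matrix, "SUBJ": constituent}, expect_verb
    return None

edges = [
    Edge(0, 1, {"CAT": "PRON"}),
    Edge(1, 2, {"CAT": "VERB"}),
    Edge(0, 0, {"CAT": "S"}, expect_pron),   # assumed seed: an empty active edge
]
final_chart = parse(edges)
```

A parse has succeeded when the chart contains an inactive edge labelled with a sentence FD that spans all the vertices, here from vertex 0 to vertex 2.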
7.2.2 The parsing grammar
- The parsing grammar, as we have seen, takes the form of a set of procedures, each of which operates on a pair of FDs.
- One of these FDs, the matrix FD, is a partial description of a phrase, and the other, the constituent FD, is as complete a description as the parser will ever have of a candidate for inclusion as a constituent of that phrase.
7.3 The compiler
- The compiler has two major sections. The first part is a straightforward application of the generation program to put the grammar, effectively, into disjunctive normal form. The second is concerned with actually building the procedures.
- If F is a grammar, or indeed any complex FD, it is always possible to recast it in the form $F_1 \lor F_2 \lor \ldots \lor F_n$, where the $F_i$ ($1 \le i \le n$) each contain no alternations.
- The process of generation from a particular FD, $f$, effectively selects those members of $F_1 \ldots F_n$ that can be unified with $f$, and then repeats this procedure recursively for each constituent. F is, in general, a conjunct containing some atomic terms and some alternations.
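- Ignoring patterns for the moment, the procedure is as follows:
- 1. Unify the atomic terms of F with $f$. If this fails, the procedure as a whole fails. Some number of alternations now remain to be considered. In other words, that part of F that remains to be unified with $f$ is an expression F' of the form
  $(a_{1,1} \lor a_{1,2} \lor \ldots \lor a_{1,k_1}) \land (a_{2,1} \lor a_{2,2} \lor \ldots \lor a_{2,k_2}) \land \ldots \land (a_{n,1} \lor a_{n,2} \lor \ldots \lor a_{n,k_n})$
- 2. Rewrite F' as an alternation by multiplying out the terms of an arbitrary alternation in F', say the first one. This gives an expression F'' of the form
  $(a_{1,1} \land (a_{2,1} \lor a_{2,2} \lor \ldots \lor a_{2,k_2}) \land \ldots \land (a_{n,1} \lor a_{n,2} \lor \ldots \lor a_{n,k_n})) \lor (a_{1,2} \land (a_{2,1} \lor a_{2,2} \lor \ldots \lor a_{2,k_2}) \land \ldots \land (a_{n,1} \lor a_{n,2} \lor \ldots \lor a_{n,k_n})) \lor \ldots \lor (a_{1,k_1} \land (a_{2,1} \lor a_{2,2} \lor \ldots \lor a_{2,k_2}) \land \ldots \land (a_{n,1} \lor a_{n,2} \lor \ldots \lor a_{n,k_n}))$
- 3. Apply the whole procedure (steps 1-3) separately to each conjunct in F''.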
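The three steps can be sketched as a recursive function. This is a simplified illustration, not the compiler described here: FDs are nested dictionaries, each alternation is a list of simple FDs (nested alternations are not handled), and the names `expand` and `unify` are assumptions of the example.

```python
def unify(a, b):
    """Unify nested-dict FDs or symbols; None signals incompatibility."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for attribute, value in b.items():
            if attribute in result:
                merged = unify(result[attribute], value)
                if merged is None:
                    return None
                result[attribute] = merged
            else:
                result[attribute] = value
        return result
    return a if a == b else None

def expand(atomic, alternations, f):
    """Return every simple FD obtained by unifying f with the atomic terms
    and with one alternative chosen from each alternation."""
    # Step 1: unify the atomic terms with f; failure fails the whole branch.
    for term in atomic:
        f = unify(f, term)
        if f is None:
            return []
    if not alternations:
        return [f]
    # Step 2: multiply out the first alternation.
    first, rest = alternations[0], alternations[1:]
    results = []
    for alternative in first:
        # Step 3: repeat the procedure for each resulting conjunct, in which
        # the chosen alternative now plays the part of an atomic term.
        results.extend(expand([alternative], rest, f))
    return results

# Illustrative grammar fragment; attribute names and values are assumed.
atomic_terms = [{"CAT": "S"}]
alternations = [
    [{"VOICE": "ACTIVE"}, {"VOICE": "PASSIVE"}],
    [{"MOOD": "DECL"}, {"MOOD": "INTERROG"}],
]
```

Starting from the empty FD, `expand(atomic_terms, alternations, {})` enumerates all four combinations, which is the disjunctive normal form of this fragment; starting from a more specific FD prunes the branches that fail to unify.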
- It remains to spell out the alternatives that are implicit in the patterns.
- The basic idea is to generate all permutations of the constituent set of the FD and to eliminate those that do not match all the patterns.
- The result of this phase of the compilation is a list of simple FDs, containing no alternations, and having either no pattern, or a single pattern that specifies the order of constituents uniquely.
- Those that have no pattern become lexical entries, and they are of no further interest to the compiler.
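The permutation idea can be sketched in a few lines. To keep the example short, patterns here contain only constituent names and strings of dots; the function names and the sample constituent set are assumptions of the example.

```python
from itertools import permutations

DOTS = "..."  # matches any number of constituents

def matches(pattern, order):
    """True if the sequence of constituent names satisfies the pattern."""
    if not pattern:
        return not order
    head, rest = pattern[0], pattern[1:]
    if head == DOTS:
        return any(matches(rest, order[i:]) for i in range(len(order) + 1))
    return bool(order) and order[0] == head and matches(rest, order[1:])

def orderings(constituent_set, patterns):
    """All permutations of the constituent set that match every pattern."""
    return [order for order in permutations(sorted(constituent_set))
            if all(matches(tuple(p), order) for p in patterns)]

# Illustrative constituent set and patterns (names assumed).
constituents = {"SUBJ", "VERB", "OBJ"}
patterns = [("SUBJ", DOTS), (DOTS, "VERB", DOTS, "OBJ", DOTS)]
```

Each surviving permutation would then serve as the single pattern that fixes the order of constituents uniquely. Enumerating permutations is factorial in the size of the constituent set, which is tolerable here only because the work is done once, at compile time.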
- The second phase of the compiler centers around a procedure which, given a list of simple FDs and an integer n, attempts to find an attribute, or path, on the basis of which the nth constituent of those FDs can be distinguished.
- The result of this process is (1) a path A, (2) a set of values for A, each associated with the subset of the list of FDs whose nth constituent has that value of A, and (3) a residual subset of the list consisting of FDs whose nth constituent has no value of the attribute A.
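The shape of that result can be sketched for a path that has already been chosen. How the compiler picks the path is not shown; the encoding is also an assumption: each simple FD carries its constituent order under a hypothetical `"PATTERN"` key, n is counted from 0, and only symbol values are considered.

```python
def follow(fd, path):
    """Follow a path (a tuple of attributes) through a nested-dict FD."""
    current = fd
    for attribute in path:
        if not isinstance(current, dict) or attribute not in current:
            return None
        current = current[attribute]
    return current

def partition(fds, n, path):
    """Group FDs by the value of `path` in their nth constituent.

    Returns (by_value, residual): by_value maps each value to the FDs whose
    nth constituent has it; residual holds the FDs with no value for the path.
    """
    by_value, residual = {}, []
    for fd in fds:
        constituent = fd[fd["PATTERN"][n]]   # "PATTERN" is an assumed key
        value = follow(constituent, path)
        if value is None:
            residual.append(fd)
        else:
            by_value.setdefault(value, []).append(fd)
    return by_value, residual

# Illustrative rules; categories and attribute names are assumed.
s_rule = {"CAT": "S", "PATTERN": ["SUBJ", "PRED"],
          "SUBJ": {"CAT": "NP"}, "PRED": {"CAT": "VP"}}
vp_rule = {"CAT": "VP", "PATTERN": ["HEAD", "OBJ"],
           "HEAD": {"CAT": "VERB"}, "OBJ": {"CAT": "NP"}}
np_rule = {"CAT": "NP", "PATTERN": ["DET", "HEAD"],
           "DET": {}, "HEAD": {"CAT": "NOUN"}}

by_value, residual = partition([s_rule, vp_rule, np_rule], 0, ("CAT",))
```

With the path CAT and n = 0, the first two rules are separated by the category of their first constituent, and the third falls into the residual set because its first constituent has no CAT.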
7.4 Conclusion
- Two things can be said to mitigate this to some extent. First, the parsing and generation grammars do indeed describe exactly the same languages, so that much of the work involved in testing prototype grammars can be done with a generator that works directly and efficiently off the competence grammar. The second point is this: the compiler behaves as though any attribute-value pair in the grammar that did not mention CAT was not there at all.
- The resulting set of parsing procedures clearly recognizes at least all the sentences of the language intended, though possibly others in addition.