Title: Tree Adjoining Grammars
1. Tree Adjoining Grammars
2. Context Free Grammars: Derivations

Who does Bill think Harry likes?

(Derivation tree, bracketed: [S who [S does [S [NP Bill] [VP [V think] [S [NP Harry] [VP [V likes] [NP e]]]]]]])
3. Context Free Grammars: Semantics

Who does Bill think Harry likes?

(Same derivation tree as on the previous slide.)

- The meaning relations of the predicate/argument structure are lost in the tree: likes(Harry, who) is not represented by any local subtree.
4. Context Free Grammars: Complexity

- CFGs can be parsed in time proportional to n^3, where n is the length of the input in words, by algorithms such as CKY.
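The n^3 bound comes from CKY's three nested loops over span lengths, start positions, and split points. A minimal sketch, assuming a toy grammar in Chomsky Normal Form (the rules and lexicon below are illustrative, not the grammar from the slides):

```python
# Minimal CKY recognizer for a CFG in Chomsky Normal Form.
# Grammar and lexicon are hypothetical toy fragments.
from itertools import product

grammar = {               # binary rules: (rhs1, rhs2) -> set of lhs
    ("NP", "VP"): {"S"},
    ("V", "NP"): {"VP"},
}
lexicon = {               # terminal rules: word -> set of preterminals
    "Harry": {"NP"}, "Bill": {"NP"}, "likes": {"V"},
}

def cky(words):
    n = len(words)
    # chart[i][j] holds the nonterminals deriving words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1] = set(lexicon.get(w, ()))
    for span in range(2, n + 1):          # O(n) span lengths
        for i in range(n - span + 1):     # O(n) start positions
            j = i + span
            for k in range(i + 1, j):     # O(n) split points -> O(n^3) overall
                for b, c in product(chart[i][k], chart[k][j]):
                    chart[i][j] |= grammar.get((b, c), set())
    return "S" in chart[0][n]

print(cky("Harry likes Bill".split()))   # True
print(cky("likes Harry Bill".split()))   # False
```

Each cell is filled from pairs of smaller cells, so the whole chart costs O(n^3) cell updates (times a grammar-dependent constant).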
5. Transformational Grammars

Who does Bill think Harry likes?

Context-free deep structure plus movement transformations.

(Deep-structure tree: [S does [S [NP Bill] [VP [V think] [S [NP Harry] [VP [V likes] [NP who]]]]]]; "who" is then fronted by a movement transformation.)
6. Transformational Grammars: Complexity

- TGs can be parsed in exponential time: 2^n, where n is the length of the input in words.
- Exponential time is intractable, because exponentials grow so quickly.
7. Lexicalized TAG (LTAG)

- Finite set of elementary trees, each anchored on a lexical item -- encapsulates syntactic and semantic dependencies.
- Elementary trees: Initial and Auxiliary.
8. LTAG: A Set of Elementary Trees
9. LTAG: Examples

(Two elementary trees anchored on "likes": α1, the transitive tree [S [NP↓] [VP [V likes] [NP↓]]], and α2, the object-extraction tree [S [NP↓] [S [NP↓] [VP [V likes] [NP e]]]].)

Some other trees for "likes": subject extraction, topicalization, subject relative, object relative, passive, etc.
10. Lexicalized TAG (LTAG)

- Finite set of elementary trees, each anchored on a lexical item -- encapsulates syntactic and semantic dependencies.
- Elementary trees: Initial and Auxiliary.
- Operations: Substitution and Adjoining.
11. Substitution

(Diagram: initial tree β, rooted in X, is substituted at a node labeled X↓ in tree α, yielding the derived tree γ.)
12. Adjoining

(Diagram: auxiliary tree β, with root and foot node both labeled X, is inserted at an X node inside tree α, yielding the derived tree γ.)

Tree β is adjoined to tree α at the node labeled X in tree α.
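The two operations above can be sketched on simple tree objects. The tree representation and helper names below are illustrative assumptions, not part of any standard TAG toolkit:

```python
# Sketch of TAG's two combining operations on a toy tree class.
class Node:
    def __init__(self, label, children=None, subst=False, foot=False):
        self.label = label
        self.children = children or []  # leaf if empty
        self.subst = subst              # substitution site (X↓)
        self.foot = foot                # foot node of an auxiliary tree (X*)

def substitute(alpha, beta):
    """Replace the first substitution node in alpha whose label matches
    the root label of initial tree beta with beta itself."""
    for i, child in enumerate(alpha.children):
        if child.subst and child.label == beta.label:
            alpha.children[i] = beta
            return True
        if substitute(child, beta):
            return True
    return False

def find_foot(tree):
    if tree.foot:
        return tree
    for c in tree.children:
        f = find_foot(c)
        if f:
            return f
    return None

def adjoin(alpha, beta, label):
    """Adjoin auxiliary tree beta at the first non-substitution node of
    alpha with the given label: that node's children move to beta's foot."""
    for i, child in enumerate(alpha.children):
        if child.label == label and not child.subst:
            find_foot(beta).children = child.children  # excised subtree hangs off the foot
            alpha.children[i] = beta
            return True
        if adjoin(child, beta, label):
            return True
    return False

# Usage: plug "Harry" into the transitive tree for "likes"
alpha = Node("S", [Node("NP", subst=True),
                   Node("VP", [Node("V", [Node("likes")]),
                               Node("NP", subst=True)])])
substitute(alpha, Node("NP", [Node("Harry")]))
```

Substitution only extends the frontier; adjoining splices material into the middle of a tree, which is what gives TAG its extra power over CFG.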
13-20. LTAG: A Derivation (step by step)

(Animation: the derivation of "who does Bill think Harry likes" is built up step by step, starting from α2, the object-extraction tree for "likes" [S [NP↓] [S [NP↓] [VP [V likes] [NP e]]]], with initial trees substituted at the NP nodes and auxiliary trees adjoined at the inner S node.)
21. LTAG: Semantics

who does Bill think Harry likes

(Derived tree: [S who [S does [S [NP Bill] [VP [V think] [S [NP Harry] [VP [V likes] [NP e]]]]]]])

- The meaning relations of the predicate/argument structures are clear in the original base trees!
22. LTAG: A Derivation

who does Bill think Harry likes

(Figure: α2, the object-extraction tree for "likes", has α3 (who) and α4 (Harry) substituted at its NP nodes; the auxiliary tree β1 (think), with α5 (Bill) substituted at its NP node, and the auxiliary tree β2 (does) are adjoined at S nodes.)
23. LTAG: Derivation Tree

who does Bill think Harry likes

likes (α2)
├── who (α3, substitution)
├── Harry (α4, substitution)
└── think (β1, adjoining)
    ├── Bill (α5, substitution)
    └── does (β2, adjoining)

Compositional semantics is defined on this derivation structure, which is closely related to dependency diagrams.
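The connection to dependency diagrams can be made concrete: walking the derivation tree yields head-dependent pairs directly. The traversal code is an illustrative sketch (the data encodes the derivation for "who does Bill think Harry likes"):

```python
# Reading head-dependent pairs off a TAG derivation tree.
# Each node is (anchor, list of (child_node, operation)).
derivation = ("likes", [
    (("who",   []), "substitution"),
    (("Harry", []), "substitution"),
    (("think", [
        (("Bill", []), "substitution"),
        (("does", []), "adjoining"),
    ]), "adjoining"),
])

def dependencies(node):
    head, children = node
    for child, op in children:
        # substitution: the child fills an argument slot of the head;
        # adjoining: the auxiliary tree modifies / takes scope over the head
        yield (head, child[0], op)
        yield from dependencies(child)

for h, d, op in dependencies(derivation):
    print(f"{h} -> {d} ({op})")
```

Substituted children come out as arguments of their anchor, adjoined children as modifiers, which is exactly the dependency reading of the derivation tree.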
24. TAGs: Complexity

- TAGs, like CFGs, can be parsed in polynomial time! Here, n^5 rather than n^3 for CFGs.
- The additional complexity allows TAGs to capture, for example, the cross-serial dependencies of Swiss German and other non-context-free languages.
- TAGs are a prime example of mildly context-sensitive grammars (MCSGs), a class introduced by Joshi and his students to describe such languages.
- It is plausible that the MCSGs are sufficient to capture the grammar of all natural languages.
25. Adequacy vs. Complexity

- Context Free Grammars
  - Structure does not represent well the domains of locality reflecting meaning.
  - Parsed in polynomial time, n^3 (n is the length of the input).
- Transformational Grammars
  - Capture domains of locality, accounting for surface word order by movement.
  - Parsing is intractable, requiring 2^n time.
- Tree Adjoining Grammars
  - Capture domains of locality, with surface discontiguities the result of adjunction.
  - Parsed in polynomial time, n^5 (rather than n^3 for CFGs).
26. TAGs: Complexity

- The additional complexity allows TAGs to capture, for example, the cross-serial dependencies of Swiss German and other non-context-free languages.
- It is plausible that TAGs and related formalisms are sufficient to capture the grammar of all natural languages.
27. English relative clauses are nested

- NP1 [The mouse] VP1 [ate the cheese]
  - Form: NP1 VP1
- NP1 [The mouse NP2 [the cat] VP2 [chased]] VP1 [ate the cheese]
  - Form: NP1 NP2 VP2 VP1 (nested)
- Theorem: Languages of the form w w^R are context-free.
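The theorem is easy to see from the standard grammar S -> a S a | b S b | ε, whose derivations mirror the nesting. A recognizer that follows that grammar directly (a sketch over the alphabet {a, b}):

```python
# Recursive recognizer for the context-free language { w w^R : w in {a,b}* },
# mirroring the grammar S -> a S a | b S b | ε.
def is_wwr(s):
    if s == "":
        return True                       # S -> ε
    return (len(s) >= 2
            and s[0] in "ab"
            and s[0] == s[-1]             # matched outer pair, nested like S -> a S a
            and is_wwr(s[1:-1]))

print(is_wwr("abba"))   # True:  w = "ab"
print(is_wwr("abab"))   # False: this is ww, the copy language
```

Matching the first and last symbols and recursing inward is exactly the nested (stack-like) dependency pattern that CFGs handle; the copy language ww, whose dependencies cross, defeats this strategy.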
28. CFG trees naturally nest structure

(Tree diagram: the relative clause [NP2 the cat] [VP2 chased] nests inside the subject NP, which is sister to VP1 "ate the cheese".)
29. Swiss German sentences are harder

- In English:
  - NP1 [Claudia] VP1 [watched NP2 [Eva] VP2 [make NP3 [Ulrich] VP3 [work]]]
  - Form: NP1 VP1 NP2 VP2 NP3 VP3
  - Not hard
- In Swiss German:
  - NP1 [Claudia] NP2 [Eva] NP3 [Ulrich] VP1 [watched] VP2 [make] VP3 [work]
  - Form: NP1 NP2 NP3 VP1 VP2 VP3 (cross-serial)
- Theorem: Languages of the form ww cannot be generated by Context Free Grammars.
30-32. Scrambling: N1 N2 N3 V1 V2 V3

(Three slides of TAG tree diagrams: nested VP elementary trees, each pairing an Ni with its Vi across an empty element e, are combined by adjoining so that the derived tree yields the cross-serial surface order N1 N2 N3 V1 V2 V3.)
33. A Simple Synchronous TAG Translator
34. Substituting in John and Mary
35. Substituting Apparently
36. Parsing TAGs by Supertagging: Reducing Parsing to POS Tagging
37. Supertag disambiguation -- supertagging

- Given a corpus parsed by an LTAG grammar,
- we have statistics of supertags -- unigram, bigram, trigram, etc.
- These statistics combine the lexical statistics as well as the statistics of the constructions in which the lexical items appear.
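With such n-gram statistics in hand, supertag disambiguation is a Viterbi search over each word's candidate supertags. A toy sketch using bigram transition probabilities (all supertag names, candidate sets, and probabilities below are made-up illustrations, not figures from a real LTAG-parsed corpus; emission probabilities are assumed uniform over a word's candidates and therefore omitted):

```python
# Toy bigram supertag disambiguator: pick the supertag sequence
# maximizing the product of transition probabilities via Viterbi.
import math

candidates = {                         # hypothetical candidate supertags per word
    "price":     ["a_NXN", "a_NP"],
    "includes":  ["a_nx0Vnx1", "b_vx"],
    "companies": ["a_NXN", "a_NP"],
}
trans = {                              # toy P(tag2 | tag1) values
    ("<s>", "a_NXN"): 0.7, ("<s>", "a_NP"): 0.3,
    ("a_NXN", "a_nx0Vnx1"): 0.8, ("a_NXN", "b_vx"): 0.2,
    ("a_NP", "a_nx0Vnx1"): 0.5, ("a_NP", "b_vx"): 0.5,
    ("a_nx0Vnx1", "a_NXN"): 0.6, ("a_nx0Vnx1", "a_NP"): 0.4,
    ("b_vx", "a_NXN"): 0.5, ("b_vx", "a_NP"): 0.5,
}

def supertag(words):
    best = {"<s>": (0.0, [])}          # tag -> (log-prob, best path ending in tag)
    for w in words:
        new = {}
        for t in candidates[w]:
            score, path = max(
                ((s + math.log(trans.get((p, t), 1e-6)), path)
                 for p, (s, path) in best.items()),
                key=lambda x: x[0])
            new[t] = (score, path + [t])
        best = new
    return max(best.values(), key=lambda x: x[0])[1]

print(supertag(["price", "includes", "companies"]))
# ['a_NXN', 'a_nx0Vnx1', 'a_NXN']
```

A real supertagger would use trigram context and lexical emission statistics estimated from the parsed corpus, but the dynamic program is the same shape as for standard POS tagging.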
38. Supertagging

(Figure: a lattice of candidate supertags -- α1 through α13 and β1 through β4 -- arrayed above the words of the sentence "the purchase price includes two ancillary companies".)

On average, a lexical item has about 8 to 10 supertags.
39. Supertagging

(Same supertag lattice over "the purchase price includes two ancillary companies", with the correct supertag for each word highlighted.)

- Select the correct supertag for each word (shown in blue in the original figure).
- The correct supertag for a word means the supertag that corresponds to that word in the correct parse of the sentence.
40. Supertagging -- performance

Performance of a trigram supertagger on the WSJ corpus, Srinivas (1997):

Model      Training corpus   Test corpus   Words correctly supertagged   % correct
Baseline   --                47,000        35,391                        75.3
Trigram    1 million         47,000        43,334                        92.2
41. Abstract character of supertagging

- Complex (richer) descriptions of primitives.
- Contrary to the standard mathematical convention, in which:
  - descriptions of primitives are simple, and
  - complex descriptions are made from simple descriptions.
- Instead, associate with each primitive all the information associated with it.
42. Complex descriptions of primitives

- Making descriptions of primitives more complex:
  - increases the local ambiguity, i.e., there are more descriptions for each primitive;
  - however, these richer descriptions of primitives locally constrain each other.
- Analogy to a jigsaw puzzle -- the richer the description of each primitive, the better.
43. Complex descriptions of primitives

- Making the descriptions of primitives more complex:
  - allows statistics to be computed over these complex descriptions;
  - these statistics are more meaningful.
- Local statistical computations over these complex descriptions lead to robust and efficient processing.
44. A different perspective on LTAG

- Treat the elementary trees associated with a lexical item as if they were super parts of speech (super-POS, or supertags).
- Local statistical techniques have been remarkably successful in disambiguating standard POS.
- Apply these techniques to disambiguating supertags -- "almost parsing".