Title: Parsing with Soft and Hard Constraints on Dependency Length
1. Parsing with Soft and Hard Constraints on Dependency Length
- Jason Eisner and Noah A. Smith
- Department of Computer Science / Center for Language and Speech Processing
- Johns Hopkins University
- jason,nasmith_at_cs.jhu.edu
2. Premise
(cf. other talks here at IWPT 2005: Burstein; Sagae & Lavie; Tsuruoka & Tsujii; Dzikovska; Rosé; ...)
- Many parsing consumers (IE, ASR, MT) will benefit more from fast, precise partial parsing than from full, deep parses that are slow to build.
3. Outline of the Talk
- The Short-Dependency Preference
- Soft constraints:
  - Review of split bilexical grammars (SBGs): O(n³) algorithm
  - Modeling dependency length
  - Experiments
- Hard constraints:
  - Constraining dependency length in a parser: O(n) algorithm, same grammar constant as SBG
  - Experiments
4. Short-Dependency Preference
- A word's dependents (adjuncts, arguments) tend to fall near it in the string.
5. Length of a dependency = surface distance
[diagram: example sentence with dependency arcs labeled by length (3, 1, 1, 1)]
6. About 50% of English dependencies have length 1, another 20% have length 2, 10% have length 3, ...
[histogram: fraction of all dependencies vs. length]
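The distribution above can be tallied directly from any dependency treebank. A minimal sketch, assuming a hypothetical head-index parse format (each parse maps a dependent's position to its head's position):

```python
from collections import Counter

def length_distribution(parses):
    """Each parse maps a dependent's index to its head's index;
    dependency length = surface distance |head - dependent|.
    Returns the fraction of all dependencies at each length."""
    counts = Counter(abs(head - dep)
                     for heads in parses
                     for dep, head in heads.items())
    total = sum(counts.values())
    return {length: n / total for length, n in sorted(counts.items())}

# Toy check on "It takes two to tango" (0-based indices: 'takes' = 1
# heads 'It', 'two', 'to'; 'to' = 3 heads 'tango'):
print(length_distribution([{0: 1, 2: 1, 3: 1, 4: 3}]))  # {1: 0.75, 2: 0.25}
```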
7. Related Ideas
- Score parses based on what's between a head and child (Collins, 1997; Zeman, 2004; McDonald et al., 2005)
- Assume short → faster human processing (Church, 1980; Gibson, 1998)
- "Attach low" heuristic for PPs (English) (Frazier, 1979; Hobbs and Bear, 1990)
- Obligatory and optional re-orderings (English) (see paper)
8. Split Bilexical Grammars (Eisner, 1996; 2000)
- Bilexical: capture relationships between two words using rules of the form
  - X_p → Y_p Z_c
  - X_p → Z_c Y_p
  - X_w → w
- grammar size: N³S²
- Split: left children conditionally independent of right children, given the parent (equivalent to split HAGs; Eisner and Satta, 1999)
9Generating with SBGs
?w0
?w0
- Start with left wall
- Generate root w0
- Generate left children w-1, w-2, ..., w-l from
the FSA ?w0 - Generate right children w1, w2, ..., wr from the
FSA ?w0 - Recurse on each wi for i in -l, ..., -1, 1,
..., r, sampling ai (steps 2-4) - Return al...a-1w0a1...ar
w0
w-1
w1
w-2
w2
...
...
?w-l
w-l
wr
w-l.-1
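The generative story above can be sketched as a toy sampler. The one-state "FSAs" here (just a stop probability and a child distribution per head word) and the example vocabulary are illustrative stand-ins for real weighted automata λ_w and ρ_w:

```python
import random

def sample_children(fsa, max_children=4):
    # fsa = (stop_prob, child_distribution): after each child we stop
    # with probability stop_prob -- a one-state stand-in for a real
    # weighted automaton.
    stop_p, dist = fsa
    children = []
    while dist and len(children) < max_children and random.random() >= stop_p:
        words, weights = zip(*dist.items())
        children.append(random.choices(words, weights)[0])
    return children

def generate(word, lam, rho, depth=0, max_depth=4):
    """Sample a tree under a toy split bilexical grammar: left children
    come from lam[word], right children independently from rho[word],
    then each child recurses.  Returns the surface yield as a list."""
    if depth >= max_depth:
        return [word]
    left = [generate(c, lam, rho, depth + 1) for c in sample_children(lam[word])]
    right = [generate(c, lam, rho, depth + 1) for c in sample_children(rho[word])]
    flat = lambda trees: [w for t in trees for w in t]
    # left children were generated inside-out (w-1, w-2, ...), so reverse
    return flat(left[::-1]) + [word] + flat(right)

stop = (1.0, {})
lam = {'takes': (0.5, {'It': 1.0}), 'It': stop, 'two': stop, 'to': stop, 'tango': stop}
rho = {'takes': (0.3, {'two': 0.4, 'to': 0.3, 'tango': 0.3}),
       'It': stop, 'two': stop, 'to': stop, 'tango': stop}
random.seed(0)
print(generate('takes', lam, rho))
```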
10Naïve Recognition/Parsing
p
goal
O(n5) combinations
O(n5N3) if N nonterminals
r
p
c
i
j
0
k
n
goal
takes
takes
It
to
takes
tango
It
takes
two
to
It
takes
two
to
tango
11. Cubic Recognition/Parsing (Eisner & Satta, 1999)
- A triangle is a head with some left (or right) subtrees.
- One trapezoid per dependency.
[diagram: triangles and trapezoids over "It takes two to tango", combining up to the goal item]
12. Cubic Recognition/Parsing (Eisner & Satta, 1999)
- goal: O(n) combinations
- triangle + triangle → trapezoid: O(n³) combinations
- triangle + trapezoid → triangle: O(n³) combinations
- O(n³g²N) if N nonterminals, polysemy g
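The triangles-and-trapezoids scheme above can be written down directly as a dynamic program. This is a sketch of the unlabeled, arc-factored special case only (no nonterminals or automaton states, so the grammar constant drops out); `score` is a hypothetical arc-weight matrix with word 0 acting as the wall/root:

```python
def eisner(score):
    """Eisner/Satta-style O(n^3) projective dependency parsing.
    score[h][m] = weight of the arc h -> m.  Returns the best total
    score.  'Incomplete' items are the trapezoids (one per dependency);
    'complete' items are the triangles (a head plus its left or right
    subtrees)."""
    n = len(score)
    NEG = float('-inf')
    # [i][j][d]: d = 0 means the head is on the right (at j),
    #            d = 1 means the head is on the left (at i)
    comp = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]
    inc = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]
    for i in range(n):
        comp[i][i][0] = comp[i][i][1] = 0.0
    for width in range(1, n):
        for i in range(n - width):
            j = i + width
            # trapezoid: join a right triangle and a left triangle, add an arc
            best = max(comp[i][k][1] + comp[k + 1][j][0] for k in range(i, j))
            inc[i][j][0] = best + score[j][i]   # j becomes head of i
            inc[i][j][1] = best + score[i][j]   # i becomes head of j
            # triangle: extend a trapezoid with a smaller triangle
            comp[i][j][0] = max(comp[i][k][0] + inc[k][j][0] for k in range(i, j))
            comp[i][j][1] = max(inc[i][k][1] + comp[k][j][1] for k in range(i + 1, j + 1))
    return comp[0][n - 1][1]

# Root 0 plus two words; best tree is 0 -> 1 -> 2 with score 1 + 2 = 3.
print(eisner([[0, 1, 0], [-9, 0, 2], [-9, 0, 0]]))  # 3.0
```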
13. Implementation
- Augment items with (Viterbi) weights; order by weight.
- Agenda-based, best-first algorithm.
- We use Dyna (see the HLT-EMNLP paper) to implement all parsers here.
- Count the number of items built → a measure of runtime.
14Very Simple Model for ?w and ?w
We parse POS tag sequences, not words.
p(child first, parent, direction) p(stop
first, parent, direction) p(child not first,
parent, direction) p(stop not first, parent,
direction)
?takes
?takes
It
takes
two
to
15Baseline
test set recall () test set recall () test set recall () test set runtime (items/word) test set runtime (items/word) test set runtime (items/word)
73 61 77 90 149 49
16. Improvements
Many ways to improve on the 73% baseline:
- smoothing / max-ent
- parse words, not tags
- bigger FSAs / more nonterminals
- LTAG, CCG, etc.
- special NP-treatment, punctuation
- train discriminatively
- ... or model dependency length?
17. Modeling Dependency Length
- When running the parsing algorithm, just multiply in these probabilities at the appropriate time.
- (DEFICIENT) p(3 | r, a, L) · p(2 | r, b, L) · p(1 | b, c, R) · p(1 | r, d, R) · p(1 | d, e, R) · p(1 | e, f, R)
[diagram: example tree over words a ... f with root r, each dependency contributing one length factor]
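Such length factors can be estimated by simple MLE over a treebank and then multiplied into the score whenever a dependency (trapezoid) is built. A sketch; the tag names and arc-tuple format are illustrative:

```python
from collections import Counter, defaultdict

def train_length_model(arcs):
    """MLE of p(length | parent, child, direction).  Each arc is
    (parent, child, head_index, child_index).  Note the resulting
    model is deficient: lengths are also constrained by the tree
    structure, so some probability mass leaks to impossible lengths."""
    counts = defaultdict(Counter)
    for parent, child, h, c in arcs:
        direction = 'R' if c > h else 'L'
        counts[parent, child, direction][abs(c - h)] += 1
    return {ctx: {length: n / sum(ctr.values()) for length, n in ctr.items()}
            for ctx, ctr in counts.items()}

# Hypothetical tagged arcs:
model = train_length_model([('V', 'N', 2, 1), ('V', 'N', 5, 4), ('V', 'P', 2, 5)])
print(model[('V', 'N', 'L')])  # {1: 1.0}
```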
18Modeling Dependency Length
test set recall () test set recall () test set recall () test set runtime (items/word) test set runtime (items/word) test set runtime (items/word)
73 61 77 90 149 49
76 62 75 67 103 31
4.1 1.6 -2.6 -26 -31 -37
length
19. Conclusion (I)
- Modeling dependency length can cut runtime of simple models by 26–37%, with effects ranging from −3% to +4% on recall.
- (Loss on recall perhaps due to deficient/MLE estimation.)
20. Going to Extremes
Longer dependencies are less likely.
What if we eliminate them completely?
21. Hard Constraints
- Disallow dependencies between words of distance > b ...
- Risk: best parse contrived, or no parse at all!
- Solution: allow fragments (partial parsing; Hindle, 1990, inter alia).
- Why not model the sequence of fragments?
22. From SBG to Vine SBG
- An SBG wall ($) has one child: L(ρ_$) = S, L(λ_$) = {ε}
- A vine SBG wall has a sequence of children: L(ρ_$) = S*, L(λ_$) = {ε}
23. Building a Vine SBG Parser
- Grammar generates a sequence of trees from the wall ($)
- Parser recognizes sequences of trees without long dependencies
- Need to modify training data so the model is consistent with the parser.
24. [Diagram: dependency tree for "According to some estimates , the rule changes would cut insider filings by more than a third ." (from the Penn Treebank), with each dependency labeled by its surface length.]
25. [The same tree with b = 4: dependencies longer than 4 are broken, and the resulting fragment roots are grafted to the wall.]
26. [b = 3: more dependencies are broken.]
27. [b = 2]
28. [b = 1]
29. [b = 0: every word becomes its own fragment.]
30. Observation
- Even for small b, bunches can grow to arbitrary size
- But arbitrary center-embedding is out
31. Vine SBG is Finite-State
- Could compile into an FSA and get O(n) parsing!
- Problem: what's the grammar constant? EXPONENTIAL
[diagram: FSA scanning "According to some estimates , the rule changes would cut insider ..."; its state must remember, e.g., that "insider" has no parent yet and that "cut" and "would" can have more children]
32. Alternative
- Instead, we adapt an SBG chart parser (which implicitly shares fragments of stack state) to the vine case, eliminating unnecessary work.
33. Quadratic Recognition/Parsing
- goal/vine items: O(n²b) combinations
- triangle + triangle → trapezoid: only construct trapezoids such that k − i ≤ b, so O(n³) combinations become O(nb²)
- triangle + trapezoid → triangle: O(n³) combinations likewise become O(nb²)
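The effect of the k − i ≤ b filter on the trapezoid step can be sanity-checked by brute-force counting of the (i, k, j) combinations with i ≤ k < j:

```python
def trapezoid_combinations(n, b=None):
    """Count the (i, k, j) combinations with i <= k < j that the
    trapezoid step considers over n positions; with a hard bound b,
    only spans of width j - i <= b are ever built."""
    return sum(j - i                       # number of split points k
               for i in range(n)
               for j in range(i + 1, n)
               if b is None or j - i <= b)

print(trapezoid_combinations(100))       # 166650, growing as n^3/6
print(trapezoid_combinations(100, 10))   # 5115, growing only as n*b^2/2
```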
34. [Diagram: O(nb) vine construction with b = 4 over "According to some , the new changes would cut insider filings by more than a third ."; the fragment roots ("According", ",", "changes", "would", "cut", ".") hang from the wall, and all dependencies have width ≤ 4.]
35. Parsing Algorithm
- Same grammar constant as Eisner and Satta (1999)
- O(n³) → O(nb²) runtime
- Includes some overhead (low-order term) for constructing the vine
- Reality check ... is it worth it?
36. Results: Penn Treebank
[graph: recall vs. runtime as b varies from 1 to 20; evaluation against the original ungrafted Treebank, non-punctuation only]
37. Results: Chinese Treebank
[graph: as above, b = 1 to 20]
38. Results: TIGER Corpus
[graph: as above, b = 1 to 20]
39. Type-Specific Bounds
- b can be specific to dependency type: e.g., b(V–O) can be longer than b(S–V)
- b specific to (parent, child, direction)
- gradually tighten based on training data
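One way to set per-type bounds from training data, as a hedged sketch (the tag names and arc-tuple format are made up): record dependency lengths by (parent, child, direction) and keep the smallest bound covering a chosen fraction of them.

```python
from collections import defaultdict

def learn_bounds(arcs, keep=1.0):
    """Per-type hard bounds b(parent, child, direction): the smallest
    bound keeping a `keep` fraction of the training dependencies of
    that type (keep=1.0 gives the max observed length; lowering it
    gradually tightens the bounds)."""
    lengths = defaultdict(list)
    for parent, child, h, c in arcs:
        direction = 'R' if c > h else 'L'
        lengths[parent, child, direction].append(abs(c - h))
    bounds = {}
    for key, ls in lengths.items():
        ls.sort()
        bounds[key] = ls[max(0, int(keep * len(ls)) - 1)]
    return bounds

# Hypothetical tagged arcs (parent, child, head_index, child_index):
arcs = [('V', 'O', 1, 4), ('V', 'O', 0, 2), ('V', 'S', 3, 2)]
print(learn_bounds(arcs))  # b(V,O,R) = 3 may exceed b(V,S,L) = 1
```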
40. Type-Specific Bounds: Results
- English: runtime cut by 50%, no loss
- Chinese: runtime cut by 55%, no loss
- German: runtime cut by 44%, 2% loss
41. Related Work
- Nederhof (2000) surveys finite-state approximation of context-free languages (CFG → FSA).
- We limit all dependency lengths (not just center-embedding), and derive weights from the Treebank (not by approximation).
- Chart parser → reasonable grammar constant.
42. Future Work
- apply to state-of-the-art parsing models
- better parameter estimation
- applications: MT, IE, grammar induction
43. Conclusion (II)
- Dependency length can be a helpful feature in improving the speed and accuracy (or trading off between them) of simple parsing models that consider dependencies.
44. This Talk in a Nutshell
- Length of a dependency = surface distance
[diagram: example sentence with dependency lengths 3, 1, 1, 1]
- Empirical results (English, Chinese, German):
  - Hard constraints cut runtime in half or more with no accuracy loss (English, Chinese) or by 44% with −2.2% accuracy (German).
  - Soft constraints affect accuracy of simple models by −3% to +4% and cut runtime by 25% to 40%.
- Formal results: a hard bound b on dependency length
  - results in a regular language.
  - allows O(nb²) parsing.