Parsing with Soft and Hard Constraints on Dependency Length

1
Parsing with Soft and Hard Constraints on
Dependency Length
  • Jason Eisner and Noah A. Smith
  • Department of Computer Science /
  • Center for Language and Speech Processing
  • Johns Hopkins University
  • {jason,nasmith}@cs.jhu.edu

2
Premise
here at IWPT 2005: Burstein, Sagae, Lavie, Tsuruoka, Tsujii, Dzikovska, and Rosé ...
  • Many parsing consumers
  • (IE, ASR, MT)
  • will benefit more from
  • fast, precise partial parsing
  • than from full, deep parses that are slow to
    build.

3
Outline of the Talk
  • The Short Dependency Preference
  • Soft constraints:
    • Review of split bilexical grammars (SBGs)
    • O(n³) algorithm
    • Modeling dependency length
    • Experiments
  • Hard constraints:
    • Constraining dependency length in a parser
    • O(n) algorithm, same grammar constant as SBG
    • Experiments
4
Short-Dependency Preference
  • A word's dependents (adjuncts, arguments) tend to fall near it in the string.

5
length of a dependency = surface distance
(figure: example dependency arcs labeled with lengths 3, 1, 1, 1)
6
50% of English dependencies have length 1, another 20% have length 2, 10% have length 3, ...
(histogram: fraction of all dependencies vs. length)
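For concreteness, here is a minimal sketch (my own illustration, not material from the talk) of how these lengths and their distribution can be computed, assuming each sentence is encoded as a list of head indices with 0 standing for the wall/root:

from collections import Counter

def dependency_lengths(heads):
    # Surface distance |child - head| for each non-root dependency.
    # heads[i] is the head position of word i+1; 0 denotes the wall.
    return [abs((i + 1) - h) for i, h in enumerate(heads) if h != 0]

def length_distribution(corpus):
    # Fraction of all dependencies at each length, over a whole corpus.
    counts = Counter(l for heads in corpus for l in dependency_lengths(heads))
    total = sum(counts.values())
    return {length: count / total for length, count in sorted(counts.items())}

# Toy corpus: "It takes two to tango", headed at "takes" (position 2),
# with "tango" attached to "to".
corpus = [[2, 0, 2, 2, 4]]
print(length_distribution(corpus))   # {1: 0.75, 2: 0.25}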
7
Related Ideas
  • Score parses based on what's between a head and child
  • (Collins, 1997; Zeman, 2004; McDonald et al., 2005)
  • Assume short dependencies → faster human processing
  • (Church, 1980; Gibson, 1998)
  • "Attach low" heuristic for PPs (English)
  • (Frazier, 1979; Hobbs and Bear, 1990)
  • Obligatory and optional re-orderings (English)
  • (see paper)

8
Split Bilexical Grammars (Eisner, 1996; 2000)
  • Bilexical: capture relationships between two words (a head and a child) using rules of the form
  •   X[p] → Y[p] Z[c]     (Y[p] carries the head word p; Z[c] carries the child word c)
  •   X[p] → Z[c] Y[p]
  •   X[w] → w
  • grammar size: N³S²
  • Split: a word's left children are conditionally independent of its right children, given the parent
  • (equivalent to split HAGs; Eisner and Satta, 1999)
9
Generating with SBGs

  1. Start with the left wall
  2. Generate root w0
  3. Generate left children w-1, w-2, ..., w-l from the FSA λw0
  4. Generate right children w1, w2, ..., wr from the FSA ρw0
  5. Recurse on each wi for i in {-l, ..., -1, 1, ..., r}, sampling subtree ai (steps 2-4)
  6. Return a-l ... a-1 w0 a1 ... ar
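Purely as an illustration (not the authors' code), the same generative story in Python, with the automata λw and ρw collapsed into the simple first/not-first stop model that appears on slide 14; sample_child and p_stop are hypothetical callables standing in for those automata:

import random

def sample_dependents(parent, direction, sample_child, p_stop):
    # Steps 3-4: emit dependents on one side of `parent` until a stop event.
    children, first = [], True
    while random.random() >= p_stop(first, parent, direction):
        children.append(sample_child(first, parent, direction))
        first = False
    return children

def sample_subtree(parent, sample_child, p_stop, depth=0, max_depth=10):
    # Step 5: recurse on each dependent; returns (head, left_subtrees, right_subtrees).
    if depth > max_depth:   # recursion guard for this sketch only
        return (parent, [], [])
    left = [sample_subtree(c, sample_child, p_stop, depth + 1)
            for c in sample_dependents(parent, "L", sample_child, p_stop)]
    right = [sample_subtree(c, sample_child, p_stop, depth + 1)
             for c in sample_dependents(parent, "R", sample_child, p_stop)]
    return (parent, left, right)

# Steps 1-2: the wall generates the root w0; the full tree is then
#   sample_subtree(w0, sample_child, p_stop)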
10
Naïve Recognition/Parsing
  • O(n⁵) combinations; O(n⁵N³) if N nonterminals
  • (figure: a chart item spans i..j..k within 0..n and records the head and child words; example sentence "It takes two to tango")
11
Cubic Recognition/Parsing (Eisner and Satta, 1999)
  • A triangle is a head with some left (or right) subtrees.
  • One trapezoid per dependency.
  • (figure: triangles and trapezoids covering "It takes two to tango")
12
Cubic Recognition/Parsing (Eisner and Satta, 1999)
  • goal: combine a left triangle over 0..i with a right triangle over i..n: O(n) combinations
  • build a trapezoid over i..k from two triangles meeting at j: O(n³) combinations
  • build a triangle over i..k from a trapezoid and a triangle meeting at j: O(n³) combinations
  • O(n³g²N) if N nonterminals, polysemy g
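The talk's parsers are weighted SBG parsers implemented in Dyna; purely as a reference point, here is a minimal arc-factored version of the same triangle/trapezoid dynamic program in Python, which drops the nonterminal/automaton machinery responsible for the g²N factor:

NEG_INF = float("-inf")

def eisner_viterbi(score):
    # Arc-factored projective Viterbi parsing in O(n^3) time.
    # score[h][m] is the log-score of attaching word m to head h; position 0
    # is the wall.  Returns the best head (0..n) for each word 1..n.
    n = len(score) - 1
    # C[i][j][d]: best triangle over i..j headed at j (d=0) or i (d=1).
    # I[i][j][d]: best trapezoid over i..j with arc j->i (d=0) or i->j (d=1).
    C = [[[NEG_INF, NEG_INF] for _ in range(n + 1)] for _ in range(n + 1)]
    I = [[[NEG_INF, NEG_INF] for _ in range(n + 1)] for _ in range(n + 1)]
    Cb = [[[0, 0] for _ in range(n + 1)] for _ in range(n + 1)]
    Ib = [[[0, 0] for _ in range(n + 1)] for _ in range(n + 1)]
    for i in range(n + 1):
        C[i][i][0] = C[i][i][1] = 0.0
    for width in range(1, n + 1):
        for i in range(n + 1 - width):
            j = i + width
            # One trapezoid per dependency: two triangles plus one new arc.
            for k in range(i, j):
                s = C[i][k][1] + C[k + 1][j][0]
                if s + score[j][i] > I[i][j][0]:
                    I[i][j][0], Ib[i][j][0] = s + score[j][i], k
                if s + score[i][j] > I[i][j][1]:
                    I[i][j][1], Ib[i][j][1] = s + score[i][j], k
            # Triangles: a trapezoid plus a completed triangle on its far side.
            for k in range(i, j):
                s = C[i][k][0] + I[k][j][0]
                if s > C[i][j][0]:
                    C[i][j][0], Cb[i][j][0] = s, k
            for k in range(i + 1, j + 1):
                s = I[i][k][1] + C[k][j][1]
                if s > C[i][j][1]:
                    C[i][j][1], Cb[i][j][1] = s, k
    heads = [0] * (n + 1)
    def backtrack(i, j, d, complete):
        if i == j:
            return
        if complete:
            k = Cb[i][j][d]
            if d == 0:
                backtrack(i, k, 0, True); backtrack(k, j, 0, False)
            else:
                backtrack(i, k, 1, False); backtrack(k, j, 1, True)
        else:
            k = Ib[i][j][d]
            heads[i if d == 0 else j] = j if d == 0 else i
            backtrack(i, k, 1, True); backtrack(k + 1, j, 0, True)
    backtrack(0, n, 1, True)   # goal: the wall's right-headed triangle over 0..n
    return heads[1:]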
13
Implementation
  • Augment items with (Viterbi) weights; order by weight.
  • Agenda-based, best-first algorithm (generic sketch below).
  • We use Dyna (see the HLT-EMNLP paper) to implement all parsers here.
  • Count the number of items built → a measure of runtime.
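The control strategy can be illustrated generically; the loop below is only a sketch of a weighted agenda with best-first popping, where `combine` is a hypothetical function playing the role of the deduction rules above (the Dyna program is the real implementation):

import heapq, itertools

def best_first(axioms, combine, is_goal):
    # axioms: iterable of (weight, item) pairs, higher weight = better.
    # combine(item, chart): yields (weight, new_item) consequences.
    # The number of items popped is the runtime measure used in the talk.
    tie = itertools.count()            # tie-breaker so items are never compared
    agenda = [(-w, next(tie), item) for w, item in axioms]
    heapq.heapify(agenda)
    chart, popped = {}, 0
    while agenda:
        neg_w, _, item = heapq.heappop(agenda)
        if item in chart:
            continue                   # already derived with at least this weight
        chart[item] = -neg_w
        popped += 1
        if is_goal(item):
            return chart[item], popped
        for w, new_item in combine(item, chart):
            if new_item not in chart:
                heapq.heappush(agenda, (-w, next(tie), new_item))
    return None, popped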

14
Very Simple Model for λw and ρw
We parse POS tag sequences, not words.
  • p(child | first, parent, direction)
  • p(stop | first, parent, direction)
  • p(child | not first, parent, direction)
  • p(stop | not first, parent, direction)
(figure: automata λtakes and ρtakes over "It takes two to ...")
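A sketch of this model as the log-probability of one side's dependent sequence (my own rendering; p_child and p_stop are hypothetical lookup tables, estimated from treebank tag sequences, and the only automaton state distinguishes the first child from the rest):

import math

def children_logprob(parent, children, direction, p_child, p_stop):
    # log p(child sequence | parent, direction): before each child, decide to
    # continue with probability 1 - p_stop, pick the child's tag, and finally stop.
    logp, first = 0.0, True
    for tag in children:
        logp += math.log(1.0 - p_stop[(first, parent, direction)])
        logp += math.log(p_child[(tag, first, parent, direction)])
        first = False
    logp += math.log(p_stop[(first, parent, direction)])
    return logp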
15
Baseline
                                 English   Chinese   German
  test-set recall (%)                73        61       77
  test-set runtime (items/word)      90       149       49
16
Improvements
  • smoothing / max ent
  • parse words, not tags
  • bigger FSAs / more nonterminals
  • LTAG, CCG, etc.
  • model dependency length?
  • special NP treatment, punctuation
  • train discriminatively
(figure: candidate improvements branching from the 73% baseline)
17
Modeling Dependency Length
When running the parsing algorithm, just multiply in these probabilities at the appropriate time (illustrated below), e.g.:
  p(3 | r, a, L) · p(2 | r, b, L) · p(1 | b, c, R) · p(1 | r, d, R) · p(1 | d, e, R) · p(1 | e, f, R)
(The slide marks this model DEFICIENT: the generated lengths need not be consistent with the tree that is actually generated.)
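In terms of the arc-factored sketch shown earlier, the soft constraint can be folded into the arc scores before parsing; a hedged illustration, where p_len is a hypothetical table for p(length | head tag, child tag, direction) as on the slide, and tags[0] is a wall symbol:

import math

def add_length_model(score, tags, p_len):
    # Add log p(length | head, child, direction) to each arc score h -> m.
    n = len(score) - 1
    for h in range(n + 1):
        for m in range(1, n + 1):
            if m == h:
                continue
            direction = "R" if m > h else "L"
            score[h][m] += math.log(p_len[(abs(m - h), tags[h], tags[m], direction)])
    return score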
18
Modeling Dependency Length
                                 English   Chinese   German
  test-set recall (%)
    baseline                         73        61       77
    + length model                   76        62       75
    relative change               +4.1%     +1.6%    -2.6%
  test-set runtime (items/word)
    baseline                         90       149       49
    + length model                   67       103       31
    relative change                -26%      -31%     -37%
19
Conclusion (I)
  • Modeling dependency length can cut runtime of simple models by 26-37%,
  • with effects ranging from -3% to +4% on recall.
  • (Loss on recall perhaps due to deficient/MLE estimation.)

20
Going to Extremes
Longer dependencies are less likely.
What if we eliminate them completely?
21
Hard Constraints
  • Disallow dependencies between words at distance > b ...
  • Risk: best parse contrived, or no parse at all!
  • Solution: allow fragments (partial parsing; Hindle, 1990, inter alia).
  • Why not model the sequence of fragments?

22
From SBG to Vine SBG
  • An SBG wall ($) has one child: its right automaton accepts a single word (L(ρ$) ⊆ Σ) and its left automaton accepts only ε.
  • A vine SBG wall has a sequence of children: L(ρ$) ⊆ Σ*, and the left automaton still accepts only ε.
23
Building a Vine SBG Parser
  • Grammar generates a sequence of trees hanging from the wall
  • Parser recognizes sequences of trees without long dependencies
  • Need to modify training data so the model is consistent with the parser (sketched below)
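A minimal sketch of that transformation, assuming trees are stored as head-index arrays (0 = wall): every dependency longer than b is broken and the orphaned child becomes the root of a fragment grafted onto the wall, exactly as in the pictures on the following slides.

def graft_to_vine(heads, b):
    # heads[i] is the head position of word i+1, with 0 for the wall.
    # Returns a new head array in which every dependency has length <= b;
    # children of broken dependencies are re-headed on the wall (the vine).
    new_heads = []
    for i, h in enumerate(heads, start=1):
        if h != 0 and abs(i - h) > b:
            new_heads.append(0)        # graft this subtree onto the vine
        else:
            new_heads.append(h)
    return new_heads

# b = 0 leaves every word as its own fragment; a large b changes nothing.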

24

(Figure: dependency tree, from the Penn Treebank, of "According to some estimates, the rule changes would cut insider filings by more than a third."; each arc is labeled with its length, the longest arcs having lengths 8 and 9.)
25

(Figure: the same sentence with b = 4; dependencies longer than 4 are broken and their subtrees grafted onto the wall. From the Penn Treebank.)
26

(Figure: b = 3; more dependencies are broken and grafted onto the wall.)
27

(Figure: b = 2.)
28

(Figure: b = 1.)
29

(Figure: b = 0; every word becomes its own fragment hanging from the wall.)
30
Observation
  • Even for small b, bunches can grow to arbitrary
    size
  • But arbitrary center embedding is out

31
Vine SBG is Finite-State
  • Could compile into an FSA and get O(n) parsing!
  • Problem: what's the grammar constant? EXPONENTIAL
  • The FSA state must track facts like: insider has no parent yet; cut and would can have more children; ... can have more children.
  • (figure: FSA reading "According to some estimates, the rule changes would cut insider ...")
32
Alternative
  • Instead, we adapt
  • an SBG chart parser
  • which implicitly shares fragments of stack state
  • to the vine case,
  • eliminating unnecessary work.

33
Quadratic Recognition/Parsing
  • goal: assemble the fragment triangles along the vine (labeled O(n2b) in the figure)
  • only construct trapezoids such that k - i ≤ b: O(nb²) combinations instead of O(n³)
  • the triangle-building step is restricted in the same way, instead of O(n³) combinations
34

(Figure: the grafted tree for b = 4, with fragment roots According, changes, cut, would, ',' and '.' hanging from the wall.)
  • O(nb) vine construction; with b = 4, all the widths involved are ≤ 4.
  • "According to some, the new changes would cut insider filings by more than a third."
35
Parsing Algorithm
  • Same grammar constant as Eisner and Satta (1999)
  • O(n³) → O(nb²) runtime (a simplified illustration of the hard constraint follows below)
  • Includes some overhead (low-order term) for constructing the vine
  • Reality check ... is it worth it?
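The talk's parser gets its O(nb²) speed by never building the disallowed chart items at all; as a much simpler (and not faster) way to impose just the hard constraint on the arc-factored sketch from earlier, one can mask the arc scores so that only the wall may take dependents farther away than b:

NEG_INF = float("-inf")

def apply_hard_bound(score, b):
    # Disallow arcs h -> m with |h - m| > b unless h is the wall (0);
    # wall arcs are kept so that fragments can still hang from the vine.
    n = len(score) - 1
    for h in range(1, n + 1):
        for m in range(1, n + 1):
            if m != h and abs(m - h) > b:
                score[h][m] = NEG_INF
    return score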

36
Results: Penn Treebank
  • evaluation against original (ungrafted) Treebank, non-punctuation only
  • (figure: recall vs. runtime as the bound varies from b = 1 to b = 20)
37
Results: Chinese Treebank
  • evaluation against original (ungrafted) Treebank, non-punctuation only
  • (figure: recall vs. runtime as the bound varies from b = 1 to b = 20)
38
Results: TIGER Corpus
  • evaluation against original (ungrafted) Treebank, non-punctuation only
  • (figure: recall vs. runtime as the bound varies from b = 1 to b = 20)
39
Type-Specific Bounds
  • b can be specific to the dependency type
  • e.g., b(V-O) can be longer than b(S-V)
  • b specific to parent, child, direction (sketched below)
  • gradually tighten based on training data
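A hedged sketch of the same masking idea with type-specific bounds; b_table and default_b are hypothetical, with b_table keyed by (parent tag, child tag, direction) and tuned on training data:

NEG_INF = float("-inf")

def apply_typed_bounds(score, tags, b_table, default_b):
    # Disallow arc h -> m if its length exceeds b(parent tag, child tag, direction).
    n = len(score) - 1
    for h in range(1, n + 1):
        for m in range(1, n + 1):
            if m == h:
                continue
            direction = "R" if m > h else "L"
            bound = b_table.get((tags[h], tags[m], direction), default_b)
            if abs(m - h) > bound:
                score[h][m] = NEG_INF
    return score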

40
  • English: runtime cut by 50%, no recall loss
  • Chinese: runtime cut by 55%, no recall loss
  • German: runtime cut by 44%, 2% recall loss

41
Related Work
  • Nederhof (2000) surveys finite-state
    approximation of context-free languages.
  • CFG → FSA
  • We limit all dependency lengths (not just
    center-embedding), and derive weights from the
    Treebank (not by approximation).
  • Chart parser → reasonable grammar constant.

42
Future Work
apply to state-of-the-art parsing models

better parameter estimation
applications: MT, IE, grammar induction
43
Conclusion (II)
  • Dependency length can be a helpful feature in
    improving the
  • speed and accuracy
  • (or trading off between them)
  • of simple parsing models that
  • consider dependencies.

44
This Talk in a Nutshell
length of a dependency = surface distance
(figure: example dependency arcs labeled with lengths 3, 1, 1, 1)
  • Empirical results (English, Chinese, German)
  • Hard constraints cut runtime in half or more with no accuracy loss (English, Chinese) or by 44% with -2.2% accuracy (German).
  • Soft constraints affect accuracy of simple models by -3% to +4% and cut runtime by 25% to 40%.
  • Formal results
  • A hard bound b on dependency length
  •   results in a regular language.
  •   allows O(nb²) parsing.