Title: Syntax
1 Syntax
- Sudeshna Sarkar
- 25 Aug 2008
2 Top-Down and Bottom-Up
- Top-down
- Only searches for trees that can be answers (i.e. Ss)
- But also suggests trees that are not consistent with any of the words
- Bottom-up
- Only forms trees consistent with the words
- But suggests trees that make no sense globally
3 Problems
- Even with the best filtering, backtracking methods are doomed if they don't address certain problems
- Ambiguity
- Shared subproblems
4 Ambiguity
5 Shared Sub-Problems
- No matter what kind of search (top-down or bottom-up or mixed) we choose,
- we don't want to unnecessarily redo work we've already done.
6 Shared Sub-Problems
- Consider
- A flight from Indianapolis to Houston on TWA
7 Shared Sub-Problems
- Assume a top-down parse making bad initial choices on the Nominal rule.
- In particular
- Nominal -> Nominal Noun
- Nominal -> Nominal PP
8 Shared Sub-Problems
9 Shared Sub-Problems
10 Shared Sub-Problems
11 Shared Sub-Problems
12 Parsing
- CKY
- Earley
- Both are dynamic programming solutions that run in O(n^3) time.
- CKY is bottom-up
- Earley is top-down
13 Sample Grammar
14 Dynamic Programming
- DP methods fill tables with partial results and
- Do not do too much avoidable repeated work
- Solve exponential problems in polynomial time (sort of)
- Efficiently store ambiguous structures with shared sub-parts.
15 CKY Parsing
- First we'll limit our grammar to epsilon-free, binary rules (more later)
- Consider the rule A -> B C
- If there is an A in the input then there must be a B followed by a C in the input.
- If the A spans from i to j in the input then there must be some k s.t. i < k < j
- I.e., the B splits from the C someplace.
16 CKY
- So let's build a table so that an A spanning from i to j in the input is placed in cell [i,j] in the table.
- So a non-terminal spanning an entire string will sit in cell [0,n]
- If we build the table bottom-up we'll know that the parts of the A must go from i to k and from k to j
17 CKY
- Meaning that for a rule like A -> B C we should look for a B in [i,k] and a C in [k,j].
- In other words, if we think there might be an A spanning [i,j] in the input AND
- A -> B C is a rule in the grammar THEN
- There must be a B in [i,k] and a C in [k,j] for some i < k < j
18 CKY
- So to fill the table, loop over the cell [i,j] values in some systematic way
- What constraint should we put on that?
- For each cell, loop over the appropriate k values to search for things to add.
19 CKY Table
20 CKY Algorithm
21 CKY Parsing
22 Note
- We arranged the loops to fill the table a column at a time, from left to right, bottom to top.
- This assures us that whenever we're filling a cell, the parts needed to fill it are already in the table (to the left and below), as the sketch below illustrates.
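Below is a minimal Python sketch of that fill order (the n, visited, and assert here are purely illustrative, not part of the CKY algorithm itself); it just demonstrates that when cell [i,j] is reached, every cell [i,k] to its left and every cell [k,j] below it has already been visited.

n = 5
visited = set()
for j in range(1, n + 1):                # columns, left to right
    for i in range(j - 1, -1, -1):       # rows within a column, bottom to top
        for k in range(i + 1, j):        # every split point between i and j
            # the cells we would need are already filled: to the left and below
            assert (i, k) in visited and (k, j) in visited
        visited.add((i, j))
print(len(visited))                      # n*(n+1)/2 = 15 cells for n = 5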
23 Example
24 Other Ways to Do It?
- Are there any other sensible ways to fill the table that still guarantee that the cells we need are already filled?
25 Other Ways to Do It?
26 Sample Grammar
27 Problem
- What if your grammar isn't binary?
- As in the case of the TreeBank grammar?
- Convert it to binary: any arbitrary CFG can be rewritten into Chomsky Normal Form automatically.
- What does this mean?
- The resulting grammar accepts (and rejects) the same set of strings as the original grammar.
- But the resulting derivations (trees) are different.
28 Problem
- More specifically, rules have to be of the form
- A -> B C
- Or
- A -> w
- That is, rules can expand to either 2 non-terminals or to a single terminal.
29 Binarization Intuition
- Eliminate chains of unit productions.
- Introduce new intermediate non-terminals into the grammar that distribute rules with length > 2 over several rules (see the sketch after this list). So
- S -> A B C
- Turns into
- S -> X C
- X -> A B
- Where X is a symbol that doesn't occur anywhere else in the grammar.
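A minimal Python sketch of that step (the function and the fresh symbol names X1, X2, ... are made up for illustration): right-hand sides longer than two symbols are peeled off into new intermediate rules, exactly as in the S -> A B C example.

import itertools

_fresh = itertools.count(1)

def binarize(lhs, rhs):
    """Turn lhs -> rhs (a tuple of symbols) into rules with at most two symbols on the right."""
    rules, rhs = [], list(rhs)
    while len(rhs) > 2:
        new = "X%d" % next(_fresh)              # a symbol that occurs nowhere else in the grammar
        rules.append((new, (rhs[0], rhs[1])))   # X -> first two children
        rhs = [new] + rhs[2:]
    rules.append((lhs, tuple(rhs)))
    return rules

print(binarize("S", ("A", "B", "C")))           # [('X1', ('A', 'B')), ('S', ('X1', 'C'))]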
30 CNF Conversion
31 CKY Algorithm
32 Example
Filling column 5
33 Example
34 Example
35 Example
36 Example
37 END
38 Statistical parsing
- Over the last 12 years statistical parsing has succeeded wonderfully!
- NLP researchers have produced a range of (often free, open source) statistical parsers, which can parse any sentence and often get most of it correct
- These parsers are now a commodity component
- The parsers are still improving year-on-year.
39 Classical NLP Parsing
- Wrote symbolic grammar and lexicon
- S -> NP VP        NN -> interest
- NP -> (DT) NN     NNS -> rates
- NP -> NN NNS      NNS -> raises
- NP -> NNP         VBP -> interest
- VP -> V NP        VBZ -> rates
- ...
- Used proof systems to prove parses from words
- This scaled very badly and didn't give coverage
- Minimal grammar on the "Fed raises" sentence: 36 parses
- Simple 10-rule grammar: 592 parses
- Real-size broad-coverage grammar: millions of parses
40 Classical NLP Parsing: The problem and its solution
- Very constrained grammars attempt to limit unlikely/weird parses for sentences
- But the attempt makes the grammars not robust: many sentences have no parse
- A less constrained grammar can parse more sentences
- But simple sentences end up with ever more parses
- Solution: We need mechanisms that allow us to find the most likely parse(s)
- Statistical parsing lets us work with very loose grammars that admit millions of parses for sentences but still quickly find the best parse(s)
41 The rise of annotated data: The Penn Treebank
- ( (S
- (NP-SBJ (DT The) (NN move))
- (VP (VBD followed)
- (NP
- (NP (DT a) (NN round))
- (PP (IN of)
- (NP
- (NP (JJ similar) (NNS increases))
- (PP (IN by)
- (NP (JJ other) (NNS lenders)))
- (PP (IN against)
- (NP (NNP Arizona) (JJ real) (NN estate) (NNS loans))))))
- (, ,)
- (S-ADV
- (NP-SBJ (-NONE- ))
- (VP (VBG reflecting)
- (NP
- (NP (DT a) (VBG continuing) (NN decline))
- (PP-LOC (IN in
42 The rise of annotated data
- Going into it, building a treebank seems a lot slower and less useful than building a grammar
- But a treebank gives us many things
- Reusability of the labor
- Broad coverage
- Frequencies and distributional information
- A way to evaluate systems
43 Human parsing
- Humans often do ambiguity maintenance
- Have the police eaten their supper?
- come in and look around.
- taken out and shot.
- But humans also commit early and are garden-pathed
- The man who hunts ducks out on weekends.
- The cotton shirts are made from grows in Mississippi.
- The horse raced past the barn fell.
44 Phrase structure grammars = context-free grammars
- G = (T, N, S, R)
- T is a set of terminals
- N is a set of nonterminals
- For NLP, we usually distinguish out a set P ⊆ N of preterminals, which always rewrite as terminals
- S is the start symbol (one of the nonterminals)
- R is a set of rules/productions of the form X -> γ, where X is a nonterminal and γ is a sequence of terminals and nonterminals (possibly an empty sequence)
- A grammar G generates a language L. (A toy encoding of such a G is sketched below.)
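As a concrete illustration (a toy, assumed encoding, not any standard library class), the tuple G = (T, N, S, R) can be written down directly as data; the handful of rules and words here are borrowed from the Classical NLP Parsing slide.

from dataclasses import dataclass

@dataclass(frozen=True)
class CFG:
    terminals: frozenset      # T
    nonterminals: frozenset   # N (preterminals such as NN, NNS, VBZ are simply members of N)
    start: str                # S
    rules: tuple              # R: pairs (X, gamma) with X a nonterminal, gamma a tuple of symbols

toy = CFG(
    terminals=frozenset({"interest", "rates"}),
    nonterminals=frozenset({"S", "NP", "VP", "NN", "NNS", "VBZ"}),
    start="S",
    rules=(("S", ("NP", "VP")), ("NP", ("NN", "NNS")), ("VP", ("VBZ", "NP")),
           ("NN", ("interest",)), ("NNS", ("rates",)), ("VBZ", ("rates",))),
)
print(toy.start, len(toy.rules))   # S 6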
45 Probabilistic or stochastic context-free grammars (PCFGs)
- G = (T, N, S, R, P)
- T is a set of terminals
- N is a set of nonterminals
- For NLP, we usually distinguish out a set P ⊆ N of preterminals, which always rewrite as terminals
- S is the start symbol (one of the nonterminals)
- R is a set of rules/productions of the form X -> γ, where X is a nonterminal and γ is a sequence of terminals and nonterminals (possibly an empty sequence)
- P(R) gives the probability of each rule.
- A grammar G generates a language model L.
46 Soundness and completeness
- A parser is sound if every parse it returns is valid/correct
- A parser terminates if it is guaranteed to not go off into an infinite loop
- A parser is complete if for any given grammar and sentence, it is sound, produces every valid parse for that sentence, and terminates
- (For many purposes, we settle for sound but incomplete parsers, e.g., probabilistic parsers that return a k-best list.)
47 Top-down parsing
- Top-down parsing is goal-directed
- A top-down parser starts with a list of constituents to be built. The top-down parser rewrites the goals in the goal list by matching one against the LHS of the grammar rules, and expanding it with the RHS, attempting to match the sentence to be derived.
- If a goal can be rewritten in several ways, then there is a choice of which rule to apply (search problem)
- Can use depth-first or breadth-first search, and goal ordering. (A small sketch follows.)
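A minimal sketch of such a goal-directed parser: a backtracking, depth-first recursive-descent recognizer over a tiny assumed grammar (the rule choice points are exactly the search problem mentioned above); note that it would loop forever on left-recursive rules, one of the problems listed on a later slide.

GRAMMAR = {
    "S":  [("NP", "VP")],
    "NP": [("DT", "N"), ("N",)],
    "VP": [("V", "NP"), ("V",)],
}
LEXICON = {"DT": {"the"}, "N": {"cats", "people"}, "V": {"scratch"}}

def derive(goals, words):
    """Can the goal list be rewritten into exactly `words`? (depth-first, with backtracking)"""
    if not goals:
        return not words                                   # success only if all words are consumed
    goal, rest = goals[0], goals[1:]
    if goal in LEXICON:                                    # preterminal: match one word bottom-up
        return bool(words) and words[0] in LEXICON[goal] and derive(rest, words[1:])
    return any(derive(list(rhs) + rest, words)             # nonterminal: try each rule (choice point)
               for rhs in GRAMMAR.get(goal, []))

print(derive(["S"], "the cats scratch people".split()))    # True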
48 Top-down parsing
49 Bottom-up parsing
- Bottom-up parsing is data-directed
- The initial goal list of a bottom-up parser is the string to be parsed. If a sequence in the goal list matches the RHS of a rule, then this sequence may be replaced by the LHS of the rule.
- Parsing is finished when the goal list contains just the start category.
- If the RHS of several rules match the goal list, then there is a choice of which rule to apply (search problem)
- Can use depth-first or breadth-first search, and goal ordering.
- The standard presentation is as shift-reduce parsing.
50 Shift-reduce parsing: one path
- cats scratch people with claws
- (Each line below shows the stack followed by the remaining input, labelled with the action that produced it.)
- cats scratch people with claws SHIFT
- N scratch people with claws REDUCE
- NP scratch people with claws REDUCE
- NP scratch people with claws SHIFT
- NP V people with claws REDUCE
- NP V people with claws SHIFT
- NP V N with claws REDUCE
- NP V NP with claws REDUCE
- NP V NP with claws SHIFT
- NP V NP P claws REDUCE
- NP V NP P claws SHIFT
- NP V NP P N REDUCE
- NP V NP P NP REDUCE
- NP V NP PP REDUCE
- NP VP REDUCE
- S REDUCE
- What other search paths are there for parsing this sentence? (A sketch that replays the path above follows.)
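A minimal sketch (plain Python; the lexicon and the rules NP -> N, PP -> P NP, VP -> V NP PP, S -> NP VP are assumed from the example) that mechanically replays the path above: SHIFT moves the next word onto the stack, and REDUCE either rewrites the top word as its preterminal or rewrites a matching suffix of the stack as a rule's left-hand side.

LEX = {"cats": "N", "scratch": "V", "people": "N", "with": "P", "claws": "N"}
RULES = [("NP", ("N",)), ("PP", ("P", "NP")), ("VP", ("V", "NP", "PP")), ("S", ("NP", "VP"))]

def shift(stack, buffer):
    return stack + [buffer[0]], buffer[1:]

def reduce_(stack, buffer):
    if stack and stack[-1] in LEX:                   # lexical reduction: word -> preterminal
        return stack[:-1] + [LEX[stack[-1]]], buffer
    for lhs, rhs in RULES:                           # grammar reduction: stack suffix -> LHS
        if tuple(stack[-len(rhs):]) == rhs:
            return stack[:-len(rhs)] + [lhs], buffer
    raise ValueError("nothing to reduce")

stack, buf = [], "cats scratch people with claws".split()
for action in ("SHIFT REDUCE REDUCE SHIFT REDUCE SHIFT REDUCE REDUCE "
               "SHIFT REDUCE SHIFT REDUCE REDUCE REDUCE REDUCE REDUCE").split():
    stack, buf = (shift if action == "SHIFT" else reduce_)(stack, buf)
    print(" ".join(stack + buf), action)             # the new state plus the action that produced it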
51 Problems with top-down parsing
- Left-recursive rules
- A top-down parser will do badly if there are many different rules for the same LHS. Consider if there are 600 rules for S, 599 of which start with NP, but one of which starts with V, and the sentence starts with V.
- Useless work: expands things that are possible top-down but not there
- Top-down parsers do well if there is useful grammar-driven control: search is directed by the grammar
- Top-down is hopeless for rewriting parts of speech (preterminals) with words (terminals). In practice that is always done bottom-up as lexical lookup.
- Repeated work: anywhere there is common substructure
52 Problems with bottom-up parsing
- Unable to deal with empty categories: termination problem, unless rewriting empties as constituents is somehow restricted (but then it's generally incomplete)
- Useless work: locally possible, but globally impossible.
- Inefficient when there is great lexical ambiguity (grammar-driven control might help here)
- Conversely, it is data-directed: it attempts to parse the words that are there.
- Repeated work: anywhere there is common substructure
53 Repeated work
54 Principles for success: take 1
- If you are going to do parsing-as-search with a grammar as is:
- Left-recursive structures must be found, not predicted
- Empty categories must be predicted, not found
- Doing these things doesn't fix the repeated work problem
- Both TD (LL) and BU (LR) parsers can (and frequently do) do work exponential in the sentence length on NLP problems.
55 Principles for success: take 2
- Grammar transformations can fix both left-recursion and epsilon productions (sketched below)
- Then you parse the same language but with different trees
- Linguists tend to hate you
- But this is a misconception: they shouldn't
- You can fix the trees post hoc
- The transform-parse-detransform paradigm
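For instance, the classic transform for direct left recursion rewrites A -> A α | β as A -> β A', A' -> α A' | ε; note that it trades left recursion for an epsilon rule, which then also has to be dealt with. A minimal sketch with illustrative names:

def remove_direct_left_recursion(lhs, rhss):
    """rhss: the right-hand sides of lhs, as tuples. Returns the transformed rule list."""
    rec  = [rhs[1:] for rhs in rhss if rhs and rhs[0] == lhs]    # A -> A alpha
    base = [rhs for rhs in rhss if not rhs or rhs[0] != lhs]     # A -> beta
    if not rec:
        return [(lhs, rhs) for rhs in rhss]
    new = lhs + "'"                                              # fresh symbol
    return ([(lhs, beta + (new,)) for beta in base] +
            [(new, alpha + (new,)) for alpha in rec] +
            [(new, ())])                                         # the epsilon production

# NP -> NP PP | DT NN   becomes   NP -> DT NN NP' ; NP' -> PP NP' ; NP' -> ()
print(remove_direct_left_recursion("NP", [("NP", "PP"), ("DT", "NN")]))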
56 Principles for success: take 3
- Rather than doing parsing-as-search, we do parsing as dynamic programming
- This is the most standard way to do things
- Q.v. CKY parsing, next time
- It solves the problem of doing repeated work
- But there are also other ways of solving the problem of doing repeated work
- Memoization (remembering solved subproblems)
- Also, next time
- Doing graph-search rather than tree-search.
57 Probabilistic or stochastic context-free grammars (PCFGs)
- G = (T, N, S, R, P)
- T is a set of terminals
- N is a set of nonterminals
- For NLP, we usually distinguish out a set P ⊆ N of preterminals, which always rewrite as terminals
- S is the start symbol (one of the nonterminals)
- R is a set of rules/productions of the form X -> γ, where X is a nonterminal and γ is a sequence of terminals and nonterminals (possibly an empty sequence)
- P(R) gives the probability of each rule.
- A grammar G generates a language model L.
58 PCFGs: Notation
- w1n = w1 ... wn : the word sequence from 1 to n (a sentence of length n)
- wab = wa ... wb : the subsequence from a to b
- Njab : the nonterminal Nj dominating (i.e. spanning) wa ... wb
- We'll write P(Ni -> γj) to mean P(Ni -> γj | Ni)
- We'll want to calculate maxt P(t) over parses t of wab
59 The probability of trees and strings
- P(t): the probability of a tree is the product of the probabilities of the rules used to generate it.
- P(w1n): the probability of the string is the sum of the probabilities of the trees which have that string as their yield
- P(w1n) = Σj P(w1n, tj), where tj is a parse of w1n
-        = Σj P(tj)
60 A Simple PCFG (in CNF)
S -> NP VP 1.0
VP -> V NP 0.7
VP -> VP PP 0.3
PP -> P NP 1.0
P -> with 1.0
V -> saw 1.0
NP -> NP PP 0.4
NP -> astronomers 0.1
NP -> ears 0.18
NP -> saw 0.04
NP -> stars 0.18
NP -> telescope 0.1
61 (No Transcript)
62 (No Transcript)
63 Tree and String Probabilities
- w15 = astronomers saw stars with ears
- P(t1) = 1.0 * 0.1 * 0.7 * 1.0 * 0.4 * 0.18 * 1.0 * 1.0 * 0.18
-       = 0.0009072
- P(t2) = 1.0 * 0.1 * 0.3 * 0.7 * 1.0 * 0.18 * 1.0 * 1.0 * 0.18
-       = 0.0006804
- P(w15) = P(t1) + P(t2)
-        = 0.0009072 + 0.0006804
-        = 0.0015876
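A quick arithmetic check of those numbers in plain Python (the nine factors are just the rule probabilities read off the two trees):

p_t1 = 1.0 * 0.1 * 0.7 * 1.0 * 0.4 * 0.18 * 1.0 * 1.0 * 0.18
p_t2 = 1.0 * 0.1 * 0.3 * 0.7 * 1.0 * 0.18 * 1.0 * 1.0 * 0.18
print(round(p_t1, 7), round(p_t2, 7), round(p_t1 + p_t2, 7))   # 0.0009072 0.0006804 0.0015876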
64 Chomsky Normal Form
- All rules are of the form X -> Y Z or X -> w.
- This makes parsing easier/more efficient
65 Treebank binarization
(Pipeline: N-ary Trees in Treebank -> TreeAnnotations.annotateTree -> Binary Trees -> Lexicon and Grammar -> Parsing. TODO: CKY parsing.)
66 An example before binarization
(ROOT (S (NP (N cats)) (VP (V scratch) (NP (N people)) (PP (P with) (NP (N claws))))))
67 After binarization
(ROOT (S (NP (N cats)) (@S->_NP (VP (V scratch) (@VP->_V (NP (N people)) (@VP->_V_NP (PP (P with) (@PP->_P (NP (N claws))))))))))
68 The CKY algorithm (1960/1965)

function CKY(words, grammar) returns most probable parse/prob
  score = new double[#(words)+1][#(words)+1][#(nonterms)]
  back  = new Pair[#(words)+1][#(words)+1][#(nonterms)]
  for i = 0; i < #(words); i++
    for A in nonterms
      if A -> words[i] in grammar
        score[i][i+1][A] = P(A -> words[i])
    // handle unaries
    boolean added = true
    while added
      added = false
      for A, B in nonterms
        if score[i][i+1][B] > 0 && A -> B in grammar
          prob = P(A -> B) * score[i][i+1][B]
          if prob > score[i][i+1][A]
            score[i][i+1][A] = prob
            back[i][i+1][A] = B
            added = true
69 The CKY algorithm (1960/1965)

  for span = 2 to #(words)
    for begin = 0 to #(words) - span
      end = begin + span
      for split = begin+1 to end-1
        for A, B, C in nonterms
          prob = score[begin][split][B] * score[split][end][C] * P(A -> B C)
          if prob > score[begin][end][A]
            score[begin][end][A] = prob
            back[begin][end][A] = new Triple(split, B, C)
      // handle unaries
      boolean added = true
      while added
        added = false
        for A, B in nonterms
          prob = P(A -> B) * score[begin][end][B]
          if prob > score[begin][end][A]
            score[begin][end][A] = prob
            back[begin][end][A] = B
            added = true
  return buildTree(score, back)
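For concreteness, here is a small runnable Python sketch of the same algorithm, restricted to the binary-plus-lexical PCFG from the earlier "A Simple PCFG (in CNF)" slide (so the unary-handling loops are not needed); variable names mirror the pseudocode above, and the score it finds for S over the whole sentence matches the hand computation on the Tree and String Probabilities slide.

from collections import defaultdict

binary = {  # A -> B C : probability (grammar from the simple PCFG slide)
    ("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 0.7, ("VP", "VP", "PP"): 0.3,
    ("PP", "P", "NP"): 1.0, ("NP", "NP", "PP"): 0.4,
}
lexical = {  # A -> w : probability
    ("P", "with"): 1.0, ("V", "saw"): 1.0, ("NP", "astronomers"): 0.1,
    ("NP", "ears"): 0.18, ("NP", "saw"): 0.04, ("NP", "stars"): 0.18,
    ("NP", "telescope"): 0.1,
}

def cky(words):
    n = len(words)
    score = defaultdict(float)   # (begin, end, A) -> best probability
    back = {}                    # (begin, end, A) -> (split, B, C)
    for i, w in enumerate(words):                          # lexical step
        for (A, word), p in lexical.items():
            if word == w:
                score[i, i + 1, A] = p
    for span in range(2, n + 1):                           # binary step
        for begin in range(n - span + 1):
            end = begin + span
            for split in range(begin + 1, end):
                for (A, B, C), p in binary.items():
                    prob = score[begin, split, B] * score[split, end, C] * p
                    if prob > score[begin, end, A]:
                        score[begin, end, A] = prob
                        back[begin, end, A] = (split, B, C)
    return score, back

def build_tree(back, words, begin, end, A):
    """Follow backpointers to recover the best tree rooted in A over [begin, end)."""
    if (begin, end, A) not in back:
        return (A, words[begin])                           # a lexical cell
    split, B, C = back[begin, end, A]
    return (A, build_tree(back, words, begin, split, B),
               build_tree(back, words, split, end, C))

words = "astronomers saw stars with ears".split()
score, back = cky(words)
print(round(score[0, len(words), "S"], 7))   # 0.0009072, as on the earlier slide
print(build_tree(back, words, 0, len(words), "S"))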
70 CKY chart for "cats scratch walls with claws"
(The empty chart: one cell score[i][j] for each span 0 <= i < j <= 5.)
71 (Lexical step: the first loop of the algorithm fills each diagonal cell [i, i+1] with N -> words[i], P -> words[i], and V -> words[i] and their probabilities.)
72 (Handling unaries then adds NP -> N, @VP->_V -> NP, and @PP->_P -> NP to each diagonal cell.)
73 (First binary step, span 2: cells such as [0,2] get PP -> P @PP->_P and VP -> V @VP->_V, scored as prob = score[begin][split][B] * score[split][end][C] * P(A -> B C); for each A, only the highest-scoring A -> B C is kept.)
74 (Longer spans are filled the same way; each cell keeps the best probability found for each nonterminal, e.g. V -> scratch 0.9285 and NP -> N 0.0859 in cell [1,2].)
75
76 (The completed chart: every cell holds the best score for each nonterminal it can span, with S and ROOT available over the whole sentence in cell [0,5]; finally, buildTree(score, back) is called to recover the best parse.)