Title: PARSING
1. PARSING
2. Analyzing Linguistic Units
Task | Goal | Formal Mechanism | Algorithm | Resulting Representation
Morphology | Analyze words into morphemes | Context dependency rules | FST composition | Morphological structure
Phonology | Analyze words into phonemes | Context dependency rules | FST composition | Phonemic structure
Syntax | Analyze sentences for syntactic relations between words | Grammars (CFGs), PDAs | Top-down, bottom-up, Earley, CKY parsing | Parse tree, derivation tree
- Why should we parse a sentence?
- to detect relations among words
- used to normalize surface syntactic variations.
- invaluable for a number of NLP applications
3. Some Concepts
- Grammar: a generative device that prescribes a set of valid strings.
- Parser: a device that uncovers the sequence of grammar rules that might have generated the input sentence.
  - Input: grammar, sentence
  - Output: parse tree, derivation tree
- Recognizer: a device that returns "yes" if the input string could be generated by the grammar.
  - Input: grammar, sentence
  - Output: boolean
4. Searching for a Parse
- The grammar (a rewrite procedure) encodes
  - all strings generated by the grammar: L(G)
  - all parse trees for each string s it generates: T(G) = ∪_s T_s(G)
- Given an input sentence I, the set of its parse trees is T_I(G).
- Parsing is searching for T_I(G) ⊆ T(G).
- Ideally, the parser finds the appropriate parse for the sentence.
5. CFG for a Fragment of English
S → NP VP
S → Aux NP VP
S → VP
NP → Det Nom
NP → PropN
Nom → N
Nom → N Nom
Nom → Nom PP
VP → V
VP → V NP
PP → Prep NP
Det → that | this | a
N → book | flight | meal | money
V → book | include | prefer
Aux → does
Prep → from | to | on
PropN → Houston | TWA

[Figure: parse tree for "Book that flight" (S dominates VP; VP → V NP with V = Book; NP → Det Nom with Det = that, Nom → N, N = flight), annotated with the bottom-up and top-down parsing directions]
6. Top-down/Bottom-up Parsing
 | Top-down (recursive descent parser) | Bottom-up (shift-reduce parser)
Starts from | S (the goal) | the words (the input)
Algorithm (parallel) | a. pick non-terminals; b. pick rules from the grammar to expand the non-terminals | a. match a sequence of input symbols against the RHS of some rule; b. replace the sequence by the LHS of the matching rule
Termination | Success: the leaves of a tree match the input. Failure: no more non-terminals to expand in any of the trees | Success: S is reached. Failure: no more rewrites are possible
Pros/Cons | Pro: goal-driven, starts with S. Con: constructs trees that may not match the input | Pro: constrained by the input string. Con: constructs constituents that may not lead to the goal S
- Control strategy: how do we explore the search space?
  - Pursue all parses in parallel, or backtrack, or ...?
  - Which rule to apply next?
  - Which node to expand next?
- Work through how top-down and bottom-up parsing proceed on the board for "Book that flight".
7. Top-down, Depth-First, Left-to-Right Parser
- Systematic, incremental expansion of the search space, in contrast to a parallel parser.
- Start state: (S, 0)
- End state: (ε, n), where n is the length of the input to be parsed
- Next-state rules:
  - (w_{j+1} β, j) → (β, j+1)  (the next input word matches the first symbol)
  - (B β, j) → (γ β, j) if B → γ  (B is the left-most non-terminal)
- Agenda: a data structure to keep track of the states to be expanded.
- Expansion is depth-first if the Agenda is a stack. (A sketch of this procedure follows below.)
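To make the state expansion concrete, here is a minimal Python sketch of such a top-down, depth-first, left-to-right recognizer. The toy grammar encoding, the function name recognize_topdown, and the example are illustrative assumptions, not from the slides; left-recursive rules are deliberately left out, since (as a later slide notes) they would make this search loop forever.

```python
# Top-down, depth-first, left-to-right recognizer (sketch).
# State = (tuple of symbols still to be derived, number of input words consumed).
# A stack agenda gives depth-first expansion, as described above.

GRAMMAR = {                      # toy CFG; LHS -> list of possible RHSs
    "S":   [["NP", "VP"], ["VP"]],
    "NP":  [["Det", "Nom"]],
    "Nom": [["N"]],
    "VP":  [["V"], ["V", "NP"]],
    "Det": [["that"]], "N": [["flight"]], "V": [["book"]],
}

def is_terminal(sym):
    return sym not in GRAMMAR

def recognize_topdown(words, start="S"):
    agenda = [((start,), 0)]                 # start state (S, 0)
    while agenda:
        symbols, j = agenda.pop()            # stack => depth-first
        if not symbols:
            if j == len(words):              # end state (epsilon, n)
                return True
            continue
        first, rest = symbols[0], symbols[1:]
        if is_terminal(first):
            # (w_{j+1} beta, j) -> (beta, j+1) if the next word matches
            if j < len(words) and words[j] == first:
                agenda.append((rest, j + 1))
        else:
            # (B beta, j) -> (gamma beta, j) for each rule B -> gamma
            for gamma in GRAMMAR[first]:
                agenda.append((tuple(gamma) + rest, j))
    return False

print(recognize_topdown(["book", "that", "flight"]))   # True
```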
8. Fig. 10.7: CFG
9. Left Corners
- Can we help top-down parsers with some bottom-up information?
  - Unnecessary states are created if there are many B → γ rules.
  - If after successive expansions B ⇒* w δ and w does not match the input, the whole series of expansions is wasted.
- The leftmost symbol derivable from B needs to match the input: look ahead to the left corner of the tree.
  - B is a left corner of A if A ⇒* B γ.
- Build a table of the left corners of all non-terminals in the grammar and consult it before applying a rule.
  - At a given point in state expansion (B β, j), pick the rule B → C γ only if a left corner of C matches the input w_{j+1}. (A sketch follows below.)
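One possible way to precompute and consult such a table, sketched in Python under the same illustrative toy-grammar encoding as above (left_corner_table, could_start, and the preterminal lists are assumptions for illustration):

```python
# Sketch: build a left-corner table for a CFG and use it to filter
# top-down predictions against the next input word.

GRAMMAR = {
    "S":   [["NP", "VP"], ["VP"]],
    "NP":  [["Det", "Nom"]],
    "Nom": [["N"], ["Nom", "PP"]],
    "VP":  [["V"], ["V", "NP"]],
    "PP":  [["Prep", "NP"]],
}
PRETERMINALS = {"Det": {"that", "this", "a"},
                "N": {"book", "flight"},
                "V": {"book", "include"},
                "Prep": {"from", "to"}}

def left_corner_table(grammar):
    # direct left corners: the first symbol of each right-hand side
    lc = {A: {rhs[0] for rhs in rules} for A, rules in grammar.items()}
    changed = True
    while changed:                      # transitive closure: A => B ... => C ...
        changed = False
        for A in lc:
            for B in list(lc[A]):
                for C in lc.get(B, ()):
                    if C not in lc[A]:
                        lc[A].add(C)
                        changed = True
    return lc

def could_start(nonterminal, word, lc):
    """Can `word` be the first word of a phrase of type `nonterminal`?"""
    corners = lc.get(nonterminal, set()) | {nonterminal}
    return any(word in PRETERMINALS.get(X, ()) for X in corners)

LC = left_corner_table(GRAMMAR)
print(could_start("NP", "that", LC))   # True:  NP -> Det Nom, Det -> that
print(could_start("NP", "book", LC))   # False: no NP rule in this toy grammar starts with V or N
```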
10. Limitation of Top-down Parsing: Left Recursion
- Depth-first search will never terminate if the grammar is left-recursive (e.g. NP → NP PP).
- Solutions:
  - Rewrite the grammar to a weakly equivalent one that is not left-recursive
    - NP → NP PP
    - NP → Nom PP
    - NP → Nom
    - This may make the rules unnatural.
  - Fix the depth of the search explicitly.
- Other book-keeping needed in top-down parsing:
  - Memoization, for reusing previously parsed substrings
  - A packed representation, for parse ambiguity
NP → Nom NP′    NP′ → PP NP′    NP′ → ε   (see the general transformation scheme below)
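For reference, the rewrite above is an instance of the standard textbook transformation for removing immediate left recursion (stated here in general form, which the slide does not spell out):

```latex
% Standard removal of immediate left recursion:
% the rule set  A -> A\alpha \mid \beta  is replaced, using a new non-terminal A',
% by the weakly equivalent rules
A \rightarrow \beta\, A' \qquad A' \rightarrow \alpha\, A' \mid \epsilon
% e.g.  NP -> NP\ PP \mid Nom   becomes   NP -> Nom\ NP',\quad NP' -> PP\ NP' \mid \epsilon
```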
11. Dynamic Programming for Parsing
- Memoization
  - Create a table of solutions to sub-problems (e.g. subtrees) as the parse proceeds
  - Look up subtrees for each constituent rather than re-parsing
  - Since all parses are implicitly stored, all are available for later disambiguation
  - Examples: Cocke-Younger-Kasami (CYK) (1960), Graham-Harrison-Ruzzo (GHR) (1980) and Earley (1970) algorithms
- Earley parser: an O(n³) parser
  - Top-down parser with bottom-up information
  - State: (i, A → α • β, j)
    - j is the position in the string that has been parsed
    - i is the position in the string where A begins
  - Top-down prediction: S ⇒* w_1 ... w_i A γ
  - Bottom-up completion: α w_{j+1} ... w_n ⇒* w_{i+1} ... w_n
12. Earley Parser
- Data structure: an array of n+1 cells called the chart
  - For each word position, the chart contains the set of states representing all partial parse trees generated to date.
  - E.g. chart[0] contains all partial parse trees generated at the beginning of the sentence.
- Chart entries represent three types of constituents:
  - predicted constituents (top-down predictions)
  - in-progress constituents (we're in the midst of ...)
  - completed constituents (we've found ...)
- Progress in the parse is represented by dotted rules:
  - the position of the dot (•) indicates the type of constituent
  - 0 Book 1 that 2 flight 3
  - (0, S → • VP, 0)        (predicting a VP)
  - (1, NP → Det • Nom, 2)  (finding an NP)
  - (0, VP → V NP •, 3)     (found a VP)
13. Earley Parser: Parse Success
- The final answer is found by looking at the last entry in the chart.
- If an entry resembles (0, S → α •, n), then the input was parsed successfully.
- But note that the chart will also contain a record of all possible parses of the input string, given the grammar -- not just the successful one(s).
- Why is this useful?
14. Earley Parsing Steps
- Start state: (0, S′ → • S, 0)
- End state: (0, S → α •, n), where n is the input size
- Next-state rules:
  - Scanner (read input):
    - (i, A → α • w_{j+1} β, j) → (i, A → α w_{j+1} • β, j+1)
  - Predictor (add top-down predictions):
    - (i, A → α • B β, j) → (j, B → • γ, j) if B → γ  (B is the non-terminal right after the dot)
  - Completer (move the dot to the right when a new constituent is found):
    - (i, B → α • A β, k), (k, A → γ •, j) → (i, B → α A • β, j)
- No backtracking, and no states are removed: keep the complete history of the parse.
- Why is this useful? (A sketch of the recognizer follows below.)
15. Earley Parser Steps
 | Scanner | Predictor | Completer
When does it apply? | when a terminal is to the right of the dot, e.g. (0, VP → • V NP, 0) | when a non-terminal is to the right of the dot, e.g. (0, S → • VP, 0) | when the dot reaches the end of a rule, e.g. (1, NP → Det Nom •, 3)
Which chart cell is affected? | new states are added to the next cell | new states are added to the current cell | new states are added to the current cell
What goes into the cell? | the dot is moved over the terminal: (0, VP → V • NP, 1) | one new state for each expansion of the non-terminal in the grammar: (0, VP → • V, 0), (0, VP → • V NP, 0) | one state for each rule that was waiting for the constituent, e.g. (0, VP → V • NP, 1) becomes (0, VP → V NP •, 3)
16. Book that flight (Chart[0])
- Seed the chart with top-down predictions for S from the grammar.
17. CFG for a Fragment of English
Det → that | this | a
N → book | flight | meal | money
V → book | include | prefer
Aux → does
Nom → N
Nom → N Nom
NP → PropN
VP → V
Nom → Nom PP
VP → V NP
PP → Prep NP
18. Chart[1]
V → book • is passed to the Completer, which finds 2 states in Chart[0] whose left corner is V and adds them to Chart[1], moving their dots to the right.
19. (No transcript)
20. Retrieving the Parses
- Augment the Completer to add a pointer to the prior states it advances, as a field in the current state
  - i.e. which states combined to arrive here?
- Read the pointers back from the final state.
- What if the final cell does not contain the final state? Error handling.
- Is it a total loss? No...
  - The chart contains every constituent and combination of constituents possible for the input, given the grammar.
  - Useful for partial parsing or shallow parsing, as used in information extraction.
21. Alternative Control Strategies
- Change Earley's top-down strategy to bottom-up, or ...
- Change to a best-first strategy based on the probabilities of constituents
  - Compute and store the probabilities of constituents in the chart as you parse
  - Then, instead of expanding states in a fixed order, let the probabilities control the order of expansion
22. Probabilistic and Lexicalized Parsing
23. Probabilistic CFGs
- Weighted CFGs
  - Attach weights to the rules of the CFG
  - Compute the weights of derivations
  - Use the weights to pick preferred parses
- Utility: pruning and ordering the search space, disambiguation, language model for ASR
- Parsing with weighted grammars (like weighted FAs):
  - T* = argmax_T W(T, S)
- Probabilistic CFGs are one form of weighted CFGs.
24. Probability Model
- Rule probability
  - Attach probabilities to grammar rules
  - The expansions for a given non-terminal sum to 1
    - R1: VP → V        .55
    - R2: VP → V NP     .40
    - R3: VP → V NP NP  .05
  - Estimate the probabilities from annotated corpora: P(R1) = count(R1) / count(VP)
- Derivation probability
  - Derivation: T = R1 ... Rn
  - Probability of a derivation
  - Most likely parse
  - Probability of a sentence: sum over all possible derivations for the sentence
  - (The formulas are written out below.)
  - Note the independence assumption: the parse probability does not change based on where in the derivation a rule is expanded.
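Spelled out, these quantities are the standard PCFG definitions (here T ranges over the derivations T_S(G) of the sentence S, with T = R1 ... Rn):

```latex
P(T) = \prod_{i=1}^{n} P(R_i)                     % probability of a derivation
\hat{T} = \arg\max_{T \in T_S(G)} P(T)            % most likely parse of S
P(S) = \sum_{T \in T_S(G)} P(T)                   % probability of the sentence
```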
25. Structural Ambiguity
- S → NP VP
- VP → V NP
- NP → NP PP
- VP → VP PP
- PP → P NP
- NP → John | Mary | Denver
- V → called
- P → from
John called Mary from Denver
[Figure: parse trees for "John called Mary from Denver", illustrating the PP-attachment ambiguity: the PP "from Denver" can attach to the VP ("called ... from Denver") or to the NP ("Mary from Denver")]
26. Cocke-Younger-Kasami Parser
- Bottom-up parser with top-down filtering
- Start state(s): (A, i, i+1) for each A → w_{i+1}
- End state: (S, 0, n), where n is the input size
- Next-state rule:
  - (B, i, k), (C, k, j) → (A, i, j) if A → B C
- (A sketch of the recognizer follows below.)
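A minimal Python sketch of this CKY recognizer, assuming a grammar already in Chomsky normal form for the "John called Mary from Denver" example (the dictionaries and the function name are illustrative assumptions):

```python
# CKY recognizer (sketch): binary rules A -> B C plus lexical rules A -> w.
BINARY = {                       # A -> B C rules, indexed by (B, C)
    ("NP", "VP"): {"S"},
    ("V", "NP"):  {"VP"},
    ("VP", "PP"): {"VP"},
    ("NP", "PP"): {"NP"},
    ("P", "NP"):  {"PP"},
}
LEXICAL = {"John": {"NP"}, "Mary": {"NP"}, "Denver": {"NP"},
           "called": {"V"}, "from": {"P"}}

def cky_recognize(words, start="S"):
    n = len(words)
    chart = {}                                     # chart[(i, j)] = non-terminals spanning words i..j-1
    for i, w in enumerate(words):                  # base case: (A, i, i+1) for each A -> w_{i+1}
        chart[(i, i + 1)] = set(LEXICAL.get(w, ()))
    for span in range(2, n + 1):                   # widen spans bottom-up
        for i in range(0, n - span + 1):
            j = i + span
            cell = set()
            for k in range(i + 1, j):              # (B, i, k) + (C, k, j) => (A, i, j)
                for B in chart[(i, k)]:
                    for C in chart[(k, j)]:
                        cell |= BINARY.get((B, C), set())
            chart[(i, j)] = cell
    return start in chart[(0, n)]                  # end state (S, 0, n)

print(cky_recognize("John called Mary from Denver".split()))   # True
```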
27. Example
John called Mary from Denver
28. Base Case: A → w
29.–41. Recursive Cases: A → B C
[Figure sequence: the CKY chart for "John called Mary from Denver" being filled in. The base case places NP (John), V (called), NP (Mary), P (from) and NP (Denver) on the diagonal; the recursive steps successively add VP → V NP, PP → P NP, NP → NP PP, VP → VP PP and S → NP VP, with X marking cells where no constituent can be built, until S analyses spanning the whole input are found (one attaching the PP to the VP, one attaching it to the NP)]
42. Probabilistic CKY
- Assign probabilities to constituents as they are completed and placed in the table.
- Computing the probability:
  - Since we are interested in the max, P(S, 0, n)
  - Use the max probability for each constituent (see the recurrence below)
  - Maintain back-pointers to recover the parse.
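Concretely, the chart is filled with maximum probabilities using the standard probabilistic CKY recurrence (base case from the lexical rules):

```latex
P(A, i, i{+}1) = P(A \to w_{i+1})
P(A, i, j) = \max_{A \to B\,C,\; i < k < j} P(A \to B\,C)\; P(B, i, k)\; P(C, k, j)
% the probability of the best parse is then P(S, 0, n)
```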
43. Problems with PCFGs
- The probability model we're using is based only on the rules in the derivation.
- Lexical insensitivity
  - Doesn't use the words in any real way
  - But structural disambiguation is lexically driven
    - PP attachment often depends on the verb, its object, and the preposition
    - I ate pickles with a fork.
    - I ate pickles with relish.
- Context insensitivity of the derivation
  - Doesn't take into account where in the derivation a rule is used
    - Pronouns are more often subjects than objects
    - She hates Mary.
    - Mary hates her.
- Solution: lexicalization
  - Add lexical information to each rule
44. An example of lexical information: Heads
- Make use of the notion of the head of a phrase
  - The head of an NP is its noun
  - The head of a VP is its main verb
  - The head of a PP is its preposition
- The LHS of each rule in the PCFG carries a lexical item (its head)
- Each RHS non-terminal carries a lexical item
- One of these lexical items is shared with the LHS
- If R is the number of binary-branching rules in the CFG, the lexicalized CFG has O(2ℓR) rules, where ℓ is the number of lexical items
- Unary rules contribute O(ℓR)
45. Example (correct parse)
[Figure: attribute grammar (lexicalized parse tree)]
46. Example (less preferred)
47. Computing Lexicalized Rule Probabilities
- We started with rule probabilities:
  - VP → V NP PP        P(rule | VP)
  - e.g., the count of this rule divided by the number of VPs in a treebank
- Now we want lexicalized probabilities:
  - VP(dumped) → V(dumped) NP(sacks) PP(in)
  - P(rule | VP, dumped is the verb, sacks is the head of the NP, in is the head of the PP)
  - Not likely to have significant counts in any treebank
48. Another Example
- Consider the VPs:
  - ate spaghetti with gusto
  - ate spaghetti with marinara
- The dependency is not between mother and child (see the trees below).
[Figure: lexicalized trees for the two VPs. In "ate spaghetti with gusto" the PP(with) attaches to VP(ate); in "ate spaghetti with marinara" the PP(with) attaches to NP(spaghetti).]
49. Log-linear Models for Parsing
- Why restrict the conditioning to the elements of a rule?
- Use an even larger context:
  - word sequence, word types, sub-tree context, etc.
- In general, compute P(y|x), where each feature f_i(x, y) tests a property of the context and λ_i is the weight of that feature (see the formula below).
- Use these as scores in the CKY algorithm to find the best-scoring parse.
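The conditional model alluded to here has the usual log-linear (maximum-entropy) form:

```latex
P(y \mid x) = \frac{\exp\big(\sum_i \lambda_i\, f_i(x, y)\big)}
                   {\sum_{y'} \exp\big(\sum_i \lambda_i\, f_i(x, y')\big)}
```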
50. Supertagging: Almost parsing
Poachers now control the underground trade
[Figure: candidate elementary trees (supertags) anchored by the words of the sentence, e.g. NP and S trees for "poachers", transitive-verb trees for "control", an Adj tree for "underground"]
51. Summary
- Parsing context-free grammars
- Top-down and Bottom-up parsers
- Mixed approaches (CKY, Earley parsers)
- Preferences over parses using probabilities
- Parsing with PCFG and PCKY algorithms
- Enriching the probability model
- Lexicalization
- Log-linear models for parsing