Title: Probabilistic Parsing
1. Probabilistic Parsing
- Ling 571
- Fei Xia
- Week 5, 10/25-10/27/05
2. Outline
- Lexicalized CFG (Recap)
- Hw5 and Project 2
- Parsing evaluation measures: ParseVal
- Collins parser
- TAG
- Parsing summary
3. Lexicalized CFG (recap)
4. Important equations
5. Lexicalized CFG
- Lexicalized rules
- Sparse data problem
- First generate the head word
- Then generate the unlexicalized rule (see the sketch below)
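These two steps correspond to a factoring along the following lines (a sketch of the standard decomposition; the exact conditioning contexts vary from model to model):

  P(A(h) → B_1(h_1) ... H(h) ... B_n(h_n))
    ≈ P(A → B_1 ... H ... B_n | A, h)        (head-rule probability)
    × ∏_{i ≠ head} P(h_i | B_i, A, h)        (head-head probability)

The head child H shares the head word h of its parent A, so only the head words of the non-head children need to be generated.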
6. Lexicalized models
7. An example
8. An example
9. Head-head probability
10. Head-rule probability
11. Estimate parameters
12. Building a statistical tool
- Design a model
- Objective function: generative model vs. discriminative model
- Decomposition: independence assumptions
- The types of parameters and parameter size
- Training: estimate model parameters
- Supervised vs. unsupervised
- Smoothing methods
- Decoding
13. Team Project 1 (Hw5)
- Form a team: programming language, schedule, expertise, etc.
- Understand the lexicalized model
- Design the training algorithm
- Work out the decoding (parsing) algorithm: augment the CYK algorithm
- Illustrate the algorithms with a real example.
14. Team Project 2
- Task: parse real data with a real grammar extracted from a treebank
- Parser: PCFG or lexicalized PCFG
- Training data: English Penn Treebank sections 02-21
- Development data: section 00
15. Team Project 2 (cont.)
- Hw6: extract a PCFG from the treebank
- Hw7: make sure your parser works given a real grammar and real sentences; measure parsing performance
- Hw8: improve parsing results
- Hw10: write a report and give a presentation
16. Parsing evaluation measures
17. Evaluation of parsers: ParseVal
- Labeled recall: # of correct constituents / # of constituents in the gold standard
- Labeled precision: # of correct constituents / # of constituents in the system output
- Labeled F-measure: the harmonic mean of labeled precision and recall
- Complete match: % of sentences where recall and precision are both 100%
- Average crossing: # of crossing brackets per sentence
- No crossing: % of sentences which have no crossing brackets
18. An example
- Gold standard:
  (VP (V saw)
      (NP (Det the) (N man))
      (PP (P with) (NP (Det a) (N telescope))))
- Parser output:
  (VP (V saw)
      (NP (NP (Det the) (N man))
          (PP (P with) (NP (Det a) (N telescope)))))
19. ParseVal measures
- Gold standard:
  (VP, 1, 6), (NP, 2, 3), (PP, 4, 6), (NP, 5, 6)
- System output:
  (VP, 1, 6), (NP, 2, 6), (NP, 2, 3), (PP, 4, 6), (NP, 5, 6)
- Recall = 4/4, Precision = 4/5, Crossing = 0 (see the sketch below)
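For concreteness, here is a minimal Python sketch of these measures; brackets are (label, start, end) tuples with inclusive word indices as on this slide, and the function name is illustrative:

    from collections import Counter

    def parseval(gold, system):
        # A system bracket is correct when the same
        # (label, start, end) triple appears in the gold standard.
        g, s = Counter(gold), Counter(system)
        matched = sum((g & s).values())        # multiset intersection
        recall = matched / len(gold)
        precision = matched / len(system)
        f1 = 2 * precision * recall / (precision + recall)
        # Two spans cross when they overlap but neither contains the other.
        crossing = sum(
            1 for (_, i, j) in system
            if any(i < a <= j < b or a < i <= b < j for (_, a, b) in gold))
        return precision, recall, f1, crossing

    gold = [("VP", 1, 6), ("NP", 2, 3), ("PP", 4, 6), ("NP", 5, 6)]
    system = [("VP", 1, 6), ("NP", 2, 6), ("NP", 2, 3),
              ("PP", 4, 6), ("NP", 5, 6)]
    print(parseval(gold, system))  # precision 0.8, recall 1.0, crossing 0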
20. A different annotation
- Gold standard:
  (VP (V saw)
      (NP (Det the)
          (N (N man)))
      (PP (P with)
          (NP (Det a) (N (N telescope)))))
- Parser output:
  (VP (V saw)
      (NP (Det the)
          (N (N man)
             (PP (P with)
                 (NP (Det a) (N (N telescope)))))))
21. ParseVal measures (cont.)
- Gold standard:
  (VP, 1, 6), (NP, 2, 3), (N, 3, 3), (PP, 4, 6), (NP, 5, 6), (N, 6, 6)
- System output:
  (VP, 1, 6), (NP, 2, 6), (N, 3, 6), (PP, 4, 6), (NP, 5, 6), (N, 6, 6)
- Recall = 4/6, Precision = 4/6, Crossing = 1
22. EVALB
- A tool that calculates ParseVal measures
- To run it:
  evalb -p parameter_file gold_file system_output
- A copy is available in my dropbox
- You will need it for Team Project 2
23. Summary of parsing evaluation measures
- ParseVal is the most widely used; its F-measure is the most important number
- The results depend on the annotation style
- EVALB is a tool that calculates ParseVal measures
- Other measures are used too, e.g., accuracy of dependency links
24. History-based models
25. History-based models
- History-based approaches map (T, S) into a decision sequence (d_1, ..., d_n)
- The probability of tree T for sentence S is then
  P(T, S) = ∏_{i=1..n} P(d_i | d_1, ..., d_{i-1})
26. History-based models (cont.)
- PCFGs can be viewed as history-based models
- There are other history-based models:
- Magerman's parser (1995)
- Collins' parsers (1996, 1997, ...)
- Charniak's parsers (1996, 1997, ...)
- Ratnaparkhi's parser (1997)
27. Collins' models
- Model 1: generative model of (Collins, 1996)
- Model 2: add the complement/adjunct distinction
- Model 3: add wh-movement
28. Model 1
- First generate the head constituent label
- Then generate the left and right dependents (see the sketch below)
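In Collins' notation, Model 1 factors each lexicalized rule roughly as follows (a sketch; the full model also conditions the dependent probabilities on a distance measure):

  P(RHS | P, h) = P_h(H | P, h)
                × ∏_{i=1..m+1} P_l(L_i(l_i) | P, H, h)
                × ∏_{j=1..n+1} P_r(R_j(r_j) | P, H, h)

Here P(h) is the parent with head word h, H is its head child, and the left and right dependent sequences L_1(l_1)...L_m(l_m) and R_1(r_1)...R_n(r_n) are each terminated by a STOP symbol (L_{m+1} = R_{n+1} = STOP).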
29. Model 1 (cont.)
30. An example
Sentence: Last week Marks bought Brooks.
31. Model 2
- Generate a head label H
- Choose left and right subcat frames
- Generate left and right arguments
- Generate left and right modifiers (see the sketch below)
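A sketch of how Model 2 extends the Model 1 factoring (again ignoring distance features): the subcat frames LC and RC are generated right after the head, and each dependent is conditioned on the arguments still owed on its side:

  P(RHS | P, h) = P_h(H | P, h) × P_lc(LC | P, H, h) × P_rc(RC | P, H, h)
                × ∏_i P_l(L_i(l_i) | P, H, h, LC)
                × ∏_j P_r(R_j(r_j) | P, H, h, RC)

An argument is removed from LC or RC once it has been generated, and STOP gets probability zero while the frame on that side is still non-empty.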
32. An example
33. Model 3
- Add traces and wh-movement
- Given that the LHS of a rule has a gap, there are three ways to pass down the gap:
- Head: S(gap) → NP VP(gap)
- Left: S(gap) → NP(gap) VP
- Right: SBAR(that)(gap) → WHNP(that) S(gap)
34. Parsing results
35. Tree Adjoining Grammar (TAG)
36. TAG
- TAG basics
- Extensions of TAG:
- Lexicalized TAG (LTAG)
- Synchronous TAG (STAG)
- Multi-component TAG (MCTAG)
- ...
37. TAG basics
- A tree-rewriting formalism (Joshi et al., 1975)
- It can generate mildly context-sensitive languages
- The primitive elements of a TAG are elementary trees
- Elementary trees are combined by two operations: substitution and adjoining (see the sketch below)
- TAG has been used in
- parsing, semantics, discourse, etc.
- machine translation, summarization, generation, etc.
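A minimal Python sketch of the two operations, assuming trees are simple labeled node objects (the Node class and helper names are illustrative, not from any TAG toolkit):

    from dataclasses import dataclass, field

    @dataclass
    class Node:
        label: str
        children: list = field(default_factory=list)
        subst: bool = False   # substitution site, e.g. NP↓
        foot: bool = False    # foot node of an auxiliary tree, e.g. VP*

    def find_foot(tree):
        # Locate the unique foot node of an auxiliary tree.
        if tree.foot:
            return tree
        for child in tree.children:
            foot = find_foot(child)
            if foot is not None:
                return foot
        return None

    def substitute(site, initial):
        # Replace a substitution node with an initial tree of the same label.
        assert site.subst and site.label == initial.label
        site.children, site.subst = initial.children, False

    def adjoin(node, aux):
        # Splice an auxiliary tree in at `node`: the auxiliary root takes
        # the node's place and the node's subtree moves to the foot.
        assert node.label == aux.label
        foot = find_foot(aux)
        foot.children, foot.foot = node.children, False
        node.children = aux.children

    # "They draft policies" plus an auxiliary tree for "still":
    s = Node("S", [Node("NP", [Node("They")]),
                   Node("VP", [Node("V", [Node("draft")]),
                               Node("NP", [Node("policies")])])])
    aux = Node("VP", [Node("ADV", [Node("still")]), Node("VP", foot=True)])
    adjoin(s.children[1], aux)   # yields "They still draft policies"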
38. Two types of elementary trees
Initial tree
Auxiliary tree
39. Substitution operation
40. They draft policies
41. Adjoining operation
42. They still draft policies
43. Derivation tree
Derived tree
Elementary trees
Derivation tree
44. Derived tree vs. derivation tree
- The mapping is not 1-to-1: several derivations can yield the same derived tree
- Finding the best derivation is therefore not the same as finding the best derived tree.
45. Wh-movement
What do they draft?
46. Long-distance wh-movement
What does John think they draft?
47. Who did you have dinner with?
48. TAG extensions
- Lexicalized TAG (LTAG)
- Synchronous TAG (STAG)
- Multi-component TAG (MCTAG)
- ...
49. STAG
- The primitive elements in STAG are elementary tree pairs
- Used for MT
50. Summary of TAG
- A formalism beyond CFG
- Primitive elements are trees, not rules
- Extended domain of locality
- Two operations: substitution and adjoining
- Parsing algorithm
- Statistical parser for TAG
- Algorithms for extracting TAG from treebanks
51. Parsing summary
52. Types of parsers
- Phrase structure vs. dependency tree
- Statistical vs. rule-based
- Grammar-based or not
- Supervised vs. unsupervised
- Our focus
- Phrase structure
- Mainly statistical
- Mainly grammar-based: CFG, TAG
- Supervised
53. Grammars
- Chomsky hierarchy
- Unrestricted grammar (type 0)
- Context-sensitive grammar
- Context-free grammar
- Regular grammar
- Human languages are beyond context-free
- Other formalisms
- HPSG, LFG
- TAG
- Dependency grammars
54. Parsing algorithms for CFG
- Top-down
- Bottom-up
- Top-down with bottom-up filter
- Earley algorithm
- CYK algorithm
- Requires the CFG to be in CNF
- Can be augmented to handle PCFGs, lexicalized CFGs, etc. (see the sketch below)
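A minimal sketch of the probabilistic (Viterbi) CYK variant, assuming a PCFG already in CNF; the rule-table format here is illustrative:

    import math
    from collections import defaultdict

    def viterbi_cyk(words, lexical, binary, start="S"):
        # lexical: {(A, word): prob} for rules A -> word
        # binary:  {(A, B, C): prob} for rules A -> B C
        n = len(words)
        best = defaultdict(lambda: float("-inf"))  # (i, j, A) -> log prob
        back = {}                                  # backpointers
        for i, w in enumerate(words):              # width-1 spans
            for (A, word), p in lexical.items():
                if word == w:
                    best[i, i + 1, A] = math.log(p)
                    back[i, i + 1, A] = w
        for width in range(2, n + 1):              # wider spans, bottom up
            for i in range(n - width + 1):
                j = i + width
                for k in range(i + 1, j):          # split point
                    for (A, B, C), p in binary.items():
                        score = (math.log(p)
                                 + best[i, k, B] + best[k, j, C])
                        if score > best[i, j, A]:
                            best[i, j, A] = score
                            back[i, j, A] = (k, B, C)
        # Log probability of the best parse (-inf if there is none);
        # the tree itself can be read off the backpointers.
        return best[0, n, start], back

Lexicalizing the grammar amounts to splitting each nonterminal by its head word, which is why the same chart algorithm carries over, at a higher asymptotic cost.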
55. Extensions of CFG
- PCFG: find the most likely parse tree
- Lexicalized CFG
- uses weaker independence assumptions
- accounts for certain types of lexical and structural dependencies
56. Beyond CFG
- History-based models
- Collins' parsers
- TAG
- Tree-rewriting formalism
- Mildly context-sensitive grammar
- Many extensions: LTAG, STAG, ...
57. Statistical approach
- Modeling
- Choose the objective function
- Decompose the function
- Common equations: joint, conditional, and marginal probabilities
- Independence assumptions
- Training
- Supervised vs. unsupervised
- Smoothing
- Decoding
- Dynamic programming
- Pruning
58. Evaluation of parsers
- Accuracy: ParseVal
- Robustness
- Resources needed
- Efficiency
- Richness
59. Other things
- Converting into CNF
- CFG
- PCFG
- Lexicalized CFG
- Treebank annotation
- Tagset: syntactic labels, POS tags, function tags, empty categories
- Format: indentation, brackets