Probabilistic Parsing - PowerPoint PPT Presentation


1
Probabilistic Parsing
  • Ling 571
  • Fei Xia
  • Week 5: 10/25-10/27/05

2
Outline
  • Lexicalized CFG (Recap)
  • Hw5 and Project 2
  • Parsing evaluation measures: ParseVal
  • Collins parser
  • TAG
  • Parsing summary

3
Lexicalized CFG recap
4
Important equations
5
Lexicalized CFG
  • Lexicalized rules
  • Sparse data problem
  • First generate the head
  • Then generate the unlexicalized rule
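
One common way to write this two-step decomposition is sketched below. It assumes a simplified Charniak-style model in which each node n of the tree has a category c(n), a head word h(n), a parent node m(n), and an unlexicalized rule r(n) that expands it; these symbols are assumptions made for illustration and are not taken from the slides.

```latex
% Head-head probability: the head word of n given its category and the parent's head word.
% Head-rule probability: the unlexicalized rule given the category and head word of n.
P(T, S) \;=\; \prod_{n \in T}
    \underbrace{P\bigl(h(n) \mid c(n),\, h(m(n))\bigr)}_{\text{head-head}}
    \;\times\;
    \underbrace{P\bigl(r(n) \mid c(n),\, h(n)\bigr)}_{\text{head-rule}}
```

Splitting the lexicalized rule into these two factors is what eases the sparse data problem: each factor conditions on far fewer events than a fully lexicalized rule would.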

6
Lexicalized models
7
An example
  • he likes her

8
An example
  • he likes her

9
Head-head probability
10
Head-rule probability
11
Estimate parameters
12
Building a statistical tool
  • Design a model
  • Objective function: generative model vs.
    discriminative model
  • Decomposition: independence assumptions
  • The types of parameters and parameter size
  • Training: estimate model parameters
  • Supervised vs. unsupervised
  • Smoothing methods
  • Decoding

13
Team Project 1 (Hw5)
  • Form a team: programming language, schedule,
    expertise, etc.
  • Understand the lexicalized model
  • Design the training algorithm
  • Work out the decoding (parsing) algorithm:
    augment the CYK algorithm.
  • Illustrate the algorithms with a real example.

14
Team Project 2
  • Task: parse real data with a real grammar
    extracted from a treebank.
  • Parser: PCFG or lexicalized PCFG
  • Training data: English Penn Treebank, Sections
    02-21
  • Development data: Section 00

15
Team Project 2 (cont)
  • Hw6: extract a PCFG from the treebank
  • Hw7: make sure your parser works given a real
    grammar and real sentences; measure parsing
    performance
  • Hw8: improve parsing results
  • Hw10: write a report and give a presentation
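
As a rough illustration of what extracting a PCFG from a treebank involves, the sketch below reads bracketed trees, counts the rules they use, and estimates rule probabilities by relative frequency. The function names (`parse_tree`, `count_rules`, `estimate_pcfg`) and the two-tree toy treebank are hypothetical; real Penn Treebank trees also contain function tags and empty categories that this sketch does not handle.

```python
from collections import Counter

def parse_tree(text):
    """Read one bracketed tree, e.g. "(S (NP he) (VP (V likes) (NP her)))"."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1                      # skip "("
            label = tokens[pos]
            pos += 1
            children = []
            while tokens[pos] != ")":
                children.append(read())
            pos += 1                      # skip ")"
            return (label, children)
        word = tokens[pos]
        pos += 1
        return word                       # a leaf is a plain string

    return read()

def count_rules(tree, counts):
    """Add one count for every rule (lhs, rhs) used in the tree."""
    if isinstance(tree, str):
        return
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    counts[(label, rhs)] += 1
    for child in children:
        count_rules(child, counts)

def estimate_pcfg(trees):
    """Relative-frequency estimate: P(lhs -> rhs) = count(lhs -> rhs) / count(lhs)."""
    counts = Counter()
    for t in trees:
        count_rules(parse_tree(t), counts)
    lhs_totals = Counter()
    for (lhs, _), c in counts.items():
        lhs_totals[lhs] += c
    return {rule: c / lhs_totals[rule[0]] for rule, c in counts.items()}

# Toy treebank, invented for illustration only.
toy_treebank = [
    "(S (NP he) (VP (V likes) (NP her)))",
    "(S (NP she) (VP (V sleeps)))",
]
pcfg = estimate_pcfg(toy_treebank)
```

Unsmoothed relative frequencies assign zero probability to any rule not seen in training, which is one reason smoothing appears in the modeling checklist.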

16
Parsing evaluation measures
17
Evaluation of parsers: ParseVal
  • Labeled recall
  • Labeled precision
  • Labeled F-measure
  • Complete match: % of sentences where recall and
    precision are both 100%
  • Average crossing: # of crossing brackets per sentence
  • No crossing: % of sentences which have no crossing brackets

18
An example
  • Gold standard:
    (VP (V saw)
        (NP (Det the) (N man))
        (PP (P with) (NP (Det a) (N telescope))))
  • Parser output:
    (VP (V saw)
        (NP (NP (Det the) (N man))
            (PP (P with) (NP (Det a) (N telescope)))))

19
ParseVal measures
  • Gold standard:
    (VP, 1, 6), (NP, 2, 3), (PP, 4, 6), (NP, 5, 6)
  • System output:
    (VP, 1, 6), (NP, 2, 6), (NP, 2, 3), (PP, 4, 6), (NP, 5, 6)
  • Recall = 4/4, Precision = 4/5, crossing = 0
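
The labeled precision, recall, and F-measure for this example can be checked with a short sketch. It assumes each constituent is a (label, start, end) triple with inclusive word positions, as in the brackets above; the function name `parseval` is hypothetical, and this is not the EVALB implementation.

```python
from collections import Counter

def parseval(gold, system):
    """Labeled precision, recall, and F-measure over (label, start, end) brackets."""
    gold_c, sys_c = Counter(gold), Counter(system)
    matched = sum((gold_c & sys_c).values())   # multiset intersection
    precision = matched / sum(sys_c.values())
    recall = matched / sum(gold_c.values())
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1

gold = [("VP", 1, 6), ("NP", 2, 3), ("PP", 4, 6), ("NP", 5, 6)]
system = [("VP", 1, 6), ("NP", 2, 6), ("NP", 2, 3), ("PP", 4, 6), ("NP", 5, 6)]
p, r, f = parseval(gold, system)
```

Using a multiset intersection means a bracket that occurs twice in one tree is only matched as many times as it occurs in the other.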

20
A different annotation
  • Gold standard:
    (VP (V saw)
        (NP (Det the)
            (N (N man)))
        (PP (P with)
            (NP (Det a) (N (N telescope)))))
  • Parser output:
    (VP (V saw)
        (NP (Det the)
            (N (N man)
               (PP (P with)
                   (NP (Det a) (N (N telescope)))))))

21
ParseVal measures (cont)
  • Gold standard:
    (VP, 1, 6), (NP, 2, 3), (N, 3, 3), (PP, 4, 6),
    (NP, 5, 6), (N, 6, 6)
  • System output:
    (VP, 1, 6), (NP, 2, 6), (N, 3, 6), (PP, 4, 6),
    (NP, 5, 6), (N, 6, 6)
  • Recall = 4/6, Precision = 4/6, crossing = 1
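
The crossing count can be reproduced with the sketch below. It assumes inclusive word positions and counts a system bracket as crossing when it overlaps some gold bracket without either one containing the other; the function names are hypothetical, and EVALB's own counting may differ in details.

```python
def crosses(a, b):
    """True if spans a and b overlap but neither contains the other."""
    (s1, e1), (s2, e2) = a, b
    return (s1 < s2 <= e1 < e2) or (s2 < s1 <= e2 < e1)

def crossing_brackets(gold, system):
    """Number of system brackets that cross at least one gold bracket."""
    gold_spans = {(s, e) for _, s, e in gold}
    return sum(
        1 for _, s, e in system
        if any(crosses((s, e), g) for g in gold_spans)
    )

gold = [("VP", 1, 6), ("NP", 2, 3), ("N", 3, 3), ("PP", 4, 6), ("NP", 5, 6), ("N", 6, 6)]
system = [("VP", 1, 6), ("NP", 2, 6), ("N", 3, 6), ("PP", 4, 6), ("NP", 5, 6), ("N", 6, 6)]
n_cross = crossing_brackets(gold, system)
```

Here only the system bracket (N, 3, 6) crosses a gold bracket, namely (NP, 2, 3): the two share word 3, but neither span contains the other.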

22
EVALB
  • A tool that calculates ParseVal measures
  • To run it:
  • evalb -p parameter_file gold_file system_output
  • A copy is available in my dropbox
  • You will need it for Team Project 2

23
Summary of Parsing evaluation measures
  • ParseVal is widely used; F-measure is the
    most important measure
  • The results depend on annotation style
  • EVALB is a tool that calculates ParseVal measures
  • Other measures are used too, e.g., accuracy of
    dependency links

24
History-based models
25
History-based models
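  • History-based approaches map (T, S) into a
    decision sequence
  • The probability of tree T for sentence S is the
    product of the probabilities of the decisions in
    that sequence, each conditioned on its history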
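
In symbols, assuming the decision sequence is written d_1, ..., d_n and that Φ is a function grouping histories into equivalence classes (both names are notation chosen here, not taken from the slide), the standard history-based factorization is:

```latex
% Chain rule over the decision sequence; \Phi maps a full history to an equivalence class.
P(T, S) \;=\; \prod_{i=1}^{n} P\bigl(d_i \mid d_1, \ldots, d_{i-1}\bigr)
        \;\approx\; \prod_{i=1}^{n} P\bigl(d_i \mid \Phi(d_1, \ldots, d_{i-1})\bigr)
```

The first equality is exact; the approximation is where each model makes its independence assumptions.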

26
History-based models (cont)
  • PCFGs can be viewed as a history-based model
  • There are other history-based models
  • Magerman's parser (1995)
  • Collins' parsers (1996, 1997, ...)
  • Charniak's parsers (1996, 1997, ...)
  • Ratnaparkhi's parser (1997)

27
Collins models
  • Model 1: Generative model of (Collins, 1996)
  • Model 2: Add complement/adjunct distinction
  • Model 3: Add wh-movement

28
Model 1
  • First generate the head constituent label
  • Then generate left and right dependents
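
These two steps are usually written as the product below. It is a sketch that assumes the common presentation of Collins' Model 1 with the distance features omitted: P is the parent label, h its head word, H the head child, and L_i(l_i), R_i(r_i) the left and right dependents with their head words; generation on each side ends with a special STOP symbol.

```latex
% Rule: P(h) -> L_n(l_n) ... L_1(l_1)  H(h)  R_1(r_1) ... R_m(r_m)
% with L_{n+1} = R_{m+1} = STOP
P\bigl(\text{RHS} \mid P, h\bigr) \;=\;
    P_h(H \mid P, h)
    \;\times\; \prod_{i=1}^{n+1} P_l\bigl(L_i(l_i) \mid P, H, h\bigr)
    \;\times\; \prod_{i=1}^{m+1} P_r\bigl(R_i(r_i) \mid P, H, h\bigr)
```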

29
Model 1 (cont)
30
An example
Sentence: Last week Marks bought Brooks.
31
Model 2
  • Generate a head label H
  • Choose left and right subcat frames
  • Generate left and right arguments
  • Generate left and right modifiers

32
An example
33
Model 3
  • Add traces and wh-movement
  • Given that the LHS of a rule has a gap, there are
    three ways to pass down the gap:
  • Head: S(gap) → NP VP(gap)
  • Left: S(gap) → NP(gap) VP
  • Right: SBAR(that)(gap) → WHNP(that) S(gap)

34
Parsing results
35
Tree Adjoining Grammar (TAG)
36
TAG
  • TAG basics
  • Extensions of TAG
  • Lexicalized TAG (LTAG)
  • Synchronous TAG (STAG)
  • Multi-component TAG (MCTAG)
  • ...

37
TAG basics
  • A tree-rewriting formalism (Joshi et al., 1975)
  • It can generate mildly context-sensitive
    languages.
  • The primitive elements of a TAG are elementary
    trees.
  • Elementary trees are combined by two operations:
    substitution and adjoining.
  • TAG has been used in:
  • parsing, semantics, discourse, etc.
  • Machine translation, summarization, generation,
    etc.
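
The two operations can be illustrated with a toy sketch. It assumes trees are nested Python lists of the form `[label, child, ...]` with words as plain strings, a substitution node written as `["X↓"]`, and the foot node of an auxiliary tree written as `["X*"]`; the function names and the elementary trees for "they still draft policies" are assumptions made for this example, not a full TAG implementation.

```python
def substitute(tree, initial):
    """Fill the leftmost substitution node whose label matches the root of `initial`."""
    done = False

    def walk(node):
        nonlocal done
        if isinstance(node, str):
            return node
        if not done and node == [initial[0] + "↓"]:
            done = True
            return initial
        return [node[0]] + [walk(c) for c in node[1:]]

    return walk(tree)

def adjoin(tree, aux):
    """Adjoin `aux` at the first (top-down, left-to-right) node labelled like its root."""
    done = False

    def plug_foot(node, subtree):
        # Copy the auxiliary tree, replacing its foot node with the excised subtree.
        if isinstance(node, str):
            return node
        if node == [aux[0] + "*"]:
            return subtree
        return [node[0]] + [plug_foot(c, subtree) for c in node[1:]]

    def walk(node):
        nonlocal done
        if isinstance(node, str):
            return node
        if not done and node[0] == aux[0] and len(node) > 1:
            done = True
            return plug_foot(aux, node)
        return [node[0]] + [walk(c) for c in node[1:]]

    return walk(tree)

def words(node):
    """Yield of a tree: its words from left to right."""
    if isinstance(node, str):
        return [node]
    return [w for c in node[1:] for w in words(c)]

# Elementary trees (hypothetical shapes chosen for illustration).
draft = ["S", ["NP↓"], ["VP", ["V", "draft"], ["NP↓"]]]    # initial tree
they = ["NP", "they"]                                       # initial tree
policies = ["NP", "policies"]                               # initial tree
still = ["VP", ["ADVP", "still"], ["VP*"]]                  # auxiliary tree

derived = substitute(substitute(draft, they), policies)
derived = adjoin(derived, still)
```

Substitution only fills frontier nodes, while adjoining splices an auxiliary tree into the middle of a tree; that second operation is what takes TAG beyond context-free power.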

38
Two types of elementary trees
Initial tree
Auxiliary tree
39
Substitution operation
40
They draft policies
41
Adjoining operation
42
They still draft policies
43
Derivation tree
Derived tree
Elementary trees
Derivation tree
44
Derived tree vs. derivation tree
  • The mapping is not 1-to-1.
  • Finding the best derivation is not the same as
    finding the best derived tree.

45
Wh-movement
What do they draft?
46
Long-distance wh-movement
What does John think they draft?
47
Who did you have dinner with?
48
TAG extension
  • Lexicalized TAG (LTAG)
  • Synchronous TAG (STAG)
  • Multi-component TAG (MCTAG)
  • ...

49
STAG
  • The primitive elements in STAG are elementary
    tree pairs.
  • Used for MT

50
Summary of TAG
  • A formalism beyond CFG
  • Primitive elements are trees, not rules
  • Extended domain of locality
  • Two operations: substitution and adjoining
  • Parsing algorithm
  • Statistical parser for TAG
  • Algorithms for extracting TAG from treebanks.

51
Parsing summary
52
Types of parsers
  • Phrase structure vs. dependency tree
  • Statistical vs. rule-based
  • Grammar-based or not
  • Supervised vs. unsupervised
  • Our focus:
  • Phrase structure
  • Mainly statistical
  • Mainly grammar-based: CFG, TAG
  • Supervised

53
Grammars
  • Chomsky hierarchy
  • Unrestricted grammar (type 0)
  • Context-sensitive grammar
  • Context-free grammar
  • Regular grammar
  • Human languages are beyond context-free
  • Other formalisms
  • HPSG, LFG
  • TAG
  • Dependency grammars

54
Parsing algorithm for CFG
  • Top-down
  • Bottom-up
  • Top-down with bottom-up filter
  • Earley algorithm
  • CYK algorithm
  • Requiring CFG to be in CNF
  • Can be augmented to deal with PCFG, lexicalized
    CFG, etc.
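
A minimal sketch of how CYK can be augmented for a PCFG (often called Viterbi or probabilistic CYK) follows. It assumes the grammar is already in CNF and stored in two hypothetical dicts, `lexical` for rules A → word and `binary` for rules A → B C; the toy grammar and its probabilities are invented for illustration.

```python
from collections import defaultdict

def pcyk(words, lexical, binary, start="S"):
    """Return (probability, bracketed tree) of the best parse, or None if there is none."""
    n = len(words)
    best = defaultdict(dict)   # best[(i, j)][A] = (prob, backpointer) for words[i:j]

    # Width-1 spans: apply lexical rules A -> word.
    for i, w in enumerate(words):
        for (A, word), p in lexical.items():
            if word == w:
                best[(i, i + 1)][A] = (p, w)

    # Wider spans: combine two adjacent sub-spans with a binary rule A -> B C.
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (A, B, C), p in binary.items():
                    if B in best[(i, k)] and C in best[(k, j)]:
                        prob = p * best[(i, k)][B][0] * best[(k, j)][C][0]
                        if prob > best[(i, j)].get(A, (0.0, None))[0]:
                            best[(i, j)][A] = (prob, (k, B, C))

    if start not in best[(0, n)]:
        return None

    def build(i, j, A):
        _, back = best[(i, j)][A]
        if isinstance(back, str):              # lexical rule
            return f"({A} {back})"
        k, B, C = back
        return f"({A} {build(i, k, B)} {build(k, j, C)})"

    return best[(0, n)][start][0], build(0, n, start)

# Toy CNF grammar, invented for illustration only.
lexical = {("NP", "he"): 0.5, ("NP", "her"): 0.5, ("V", "likes"): 1.0}
binary = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}
result = pcyk("he likes her".split(), lexical, binary)
```

Compared with plain CYK, the only change is that each chart cell keeps the best probability and a backpointer per nonterminal instead of a yes/no flag.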

55
Extensions of CFG
  • PCFG: find the most likely parse trees
  • Lexicalized CFG
  • Uses weaker independence assumptions
  • Accounts for certain types of lexical and
    structural dependencies

56
Beyond CFG
  • History-based models
  • Collins parsers
  • TAG
  • Tree-rewriting
  • Mildly context-sensitive grammar
  • Many extensions: LTAG, STAG, ...

57
Statistical approach
  • Modeling
  • Choose the objective function
  • Decompose the function
  • Common equations: joint, conditional, marginal
    probabilities
  • Independence assumptions
  • Training
  • Supervised vs. unsupervised
  • Smoothing
  • Decoding
  • Dynamic programming
  • Pruning

58
Evaluation of parsers
  • Accuracy: ParseVal
  • Robustness
  • Resources needed
  • Efficiency
  • Richness

59
Other things
  • Converting into CNF
  • CFG
  • PCFG
  • Lexicalized CFG
  • Treebank annotation
  • Tagset: syntactic labels, POS tags, function tags,
    empty categories
  • Format: indentation, brackets
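
For the CNF conversion of a PCFG, the binarization step can be sketched as follows. It assumes rules are stored as a dict from (lhs, rhs tuple) to probability and covers only rules with more than two right-hand-side symbols; unary rules and terminals mixed into longer rules would need separate handling. The function name and the naming scheme for new symbols are hypothetical.

```python
def binarize(rules):
    """Split every rule with more than two RHS symbols into a chain of binary rules."""
    out = {}
    for (lhs, rhs), p in rules.items():
        base = lhs
        while len(rhs) > 2:
            # New intermediate symbol covering everything after the first child.
            new_sym = f"{base}|<{'-'.join(rhs[1:])}>"
            out[(lhs, (rhs[0], new_sym))] = p
            # The intermediate symbol has a single expansion, so its probability is 1.
            lhs, rhs, p = new_sym, rhs[1:], 1.0
        out[(lhs, rhs)] = p
    return out

# Toy rules, invented for illustration only.
rules = {("VP", ("V", "NP", "PP")): 0.2, ("VP", ("V", "NP")): 0.8}
cnf_rules = binarize(rules)
```

Giving the original probability to the first binary rule and probability 1 to the rest keeps the probability of every tree unchanged after conversion.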