Computational Linguistics - PowerPoint PPT Presentation

1
Computational Linguistics
  • Lecture 5 Parsing

Based on Dan Jurafsky's textbook, Speech and
Language Processing, Ch. 13. Thanks to Dan Jurafsky
and Jim Martin and James Pustejovsky for many of
these slides!
2
Outline for Grammar/Parsing Week
  • Context-Free Grammars and Constituency
  • Some common CFG phenomena for English
  • Sentence-level constructions
  • NP, PP, VP
  • Coordination
  • Subcategorization
  • Top-down and Bottom-up Parsing
  • Dynamic Programming Parsing
  • Quick sketch of probabilistic parsing

3
Review
  • Parts-of-speech tagging: for each word in a
    given sentence, determine its part of speech
  • Assigning parts of speech to all the words in a
    sentence
  • In the context of the sentence
  • POS: syntactic/morphological categories of words
  • Syntax: how words group together
  • Morphology: surface word = stem + affixes

4
Syntax
  • Syntax
  • Refers to the way words are arranged together,
    and the relationship between them.
  • Distinction
  • Prescriptive grammar: how people should speak or
    write
  • Descriptive grammar: how people actually do
  • Goal of syntax (grammar)
  • Model the unconscious knowledge people have about
    how to use their native language

5
Syntax
  • Why should we care?
  • Grammar checkers
  • Question answering
  • Information extraction
  • Machine translation

6
Key ideas of syntax
  • Constituency
  • Recursive structure
  • Part of a structure
  • Subcategorization
  • Variation of a structure
  • Type of head word causing the variation
  • Grammatical relations
  • Relation of words in a structure
  • Dependency relations
  • Non-adjacent grammatical relations

7
Context-Free Grammars (CFG)
  • Capture constituency and ordering
  • Ordering: rules governing the ordering of words,
    phrases, clauses in the language
  • E.g., noun phrases such as a high-class spot, a
    high-class spot like Mindy's
  • Constituency: how words group into units and how
    the various kinds of units behave
  • E.g., noun phrase in prepositional phrase and
    verb phrase

8
Constituency
  • Examples of noun phrases (NPs)
  • Three parties from Brooklyn
  • A high-class spot such as Mindy's
  • The Broadway coppers
  • They
  • Harry the Horse
  • The reason he comes into the Hot Box
  • How do we know these form a constituent?
  • Substitution principle

9
Constituency (II)
  • They can all appear before a verb
  • Three parties from Brooklyn arrive
  • A high-class spot such as Mindy's attracts
  • The Broadway coppers love
  • They sit
  • But individual words can't always appear before
    verbs
  • from arrive
  • as attracts
  • the is
  • spot is
  • Must be able to state generalizations like
  • Noun phrases occur before verbs

10
Constituency (III)
  • Preposing and postposing
  • On September 17th, I'd like to fly from Atlanta
    to Denver
  • I'd like to fly on September 17th from Atlanta to
    Denver
  • I'd like to fly from Atlanta to Denver on
    September 17th.
  • But not
  • On September, I'd like to fly 17th from Atlanta
    to Denver
  • On I'd like to fly September 17th from Atlanta
    to Denver
  • Express constituency formally
  • Rewriting rules
  • Grammars
  • Context-free grammar
  • Express constituency formally
  • Rewriting rules
  • Grammars
  • Context-free grammar

11
CFG example
  • S -> NP VP
  • NP -> Det NOMINAL
  • NOMINAL -> Noun
  • VP -> Verb
  • Det -> a
  • Noun -> flight
  • Verb -> left
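This toy grammar is small enough to run directly. Below is a minimal sketch (the encoding and function names are my own, not from the slides) that stores the rules in a Python dictionary and prints a leftmost derivation:

```python
# The toy grammar from this slide, encoded as LHS -> list of possible RHSs.
GRAMMAR = {
    "S":       [["NP", "VP"]],
    "NP":      [["Det", "NOMINAL"]],
    "NOMINAL": [["Noun"]],
    "VP":      [["Verb"]],
    "Det":     [["a"]],
    "Noun":    [["flight"]],
    "Verb":    [["left"]],
}

def derive(symbols):
    """Leftmost derivation: rewrite the first non-terminal until only terminals remain."""
    steps = [symbols]
    while any(s in GRAMMAR for s in symbols):
        i = next(i for i, s in enumerate(symbols) if s in GRAMMAR)
        symbols = symbols[:i] + GRAMMAR[symbols[i]][0] + symbols[i + 1:]
        steps.append(symbols)
    return steps

for step in derive(["S"]):
    print(" ".join(step))
# The last line printed is: a flight left
```

Each printed line is one sentential form of the derivation, from S down to the terminal string.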

12
CFGs: set of rules
  • S -> NP VP
  • This says that there are units called S, NP, and
    VP in this language
  • And that an S consists of an NP followed immediately
    by a VP
  • Doesn't say that that's the only kind of S
  • Nor does it say that this is the only place that
    NPs and VPs occur

13
Generativity
  • As with FSAs you can view these rules as either
    analysis or synthesis machines
  • Generate strings in the language
  • Reject strings not in the language
  • Impose structures (trees) on strings in the
    language
  • How can we define grammatical vs. ungrammatical
    sentences?

14
Derivations
  • A derivation is a sequence of rules applied to a
    string that accounts for that string
  • Covers all the elements in the string
  • Covers only the elements in the string

15
Derivations as Trees
16
CFGs more formally
  • A context-free grammar has 4 parameters
  • A set of non-terminal symbols (variables) N
  • A set of terminal symbols Σ (disjoint from N)
  • A set of productions P, each of the form
  • A -> α
  • Where A is a non-terminal and α is a string of
    symbols from the infinite set of strings (Σ ∪ N)*
  • A start symbol S

17
Defining a CF language via derivation
  • A string A derives a string B if
  • A can be rewritten as B via some series of rule
    applications
  • More formally
  • If A -> β is a production of P
  • α and γ are any strings in the set (Σ ∪ N)*
  • Then we say that
  • αAγ directly derives αβγ, or αAγ ⇒ αβγ
  • Derivation is a generalization of direct
    derivation
  • Let α1, α2, ..., αm be strings in (Σ ∪ N)*, m ≥ 1,
    s.t.
  • α1 ⇒ α2, α2 ⇒ α3, ..., αm-1 ⇒ αm
  • We say that α1 derives αm, or α1 ⇒* αm
  • We then formally define the language L(G) generated by
    grammar G
  • A set of strings composed of terminal symbols
    derived from S
  • L(G) = { w | w is in Σ* and S ⇒* w }

18
Parsing
  • Parsing is the process of taking a string and a
    grammar and returning one or many parse trees for
    the input sentence

19
Context?
  • The notion of context in CFGs
  • The LHS symbol of a rule determines the RHS by itself
    (free of context)
  • Other parts of the string have no influence on the derivation
  • Nom -> N N means that I can rewrite a Nom as N N
    regardless of the context around the Nom

20
Key Constituents of English
  • Key Constituents
  • Sentences
  • Noun phrases
  • Verb phrases
  • Prepositional phrases
  • Similarity exists among languages

21
Sentence-Types
  • Declaratives: A plane left
  • S -> NP VP
  • Imperatives: Leave!
  • S -> VP
  • Yes-no questions: Did the plane leave?
  • S -> Aux NP VP
  • Wh-questions: When did the plane leave?
  • S -> WH Aux NP VP

22
NPs
  • NP -> Pronoun
  • I came, you saw it, they conquered
  • NP -> Proper-Noun
  • Los Angeles is west of Texas
  • John Hennessy is the president of Stanford
  • NP -> Det Noun
  • The president
  • NP -> Nominal
  • Nominal -> Noun Noun
  • A morning flight to Denver

23
PPs
  • PP -> Preposition NP
  • From LA
  • To the store
  • On Tuesday morning
  • With lunch

24
Recursion
  • We'll have to deal with rules such as the
    following, where the non-terminal on the left also
    appears somewhere on the right (directly)
  • NP -> NP PP   The flight to Boston
  • VP -> VP PP   departed Miami at noon

25
Recursion
  • Of course, this is what makes syntax interesting
  • Flights from Denver
  • Flights from Denver to Miami
  • Flights from Denver to Miami in February
  • Flights from Denver to Miami in February on a
    Friday
  • Flights from Denver to Miami in February on a
    Friday under $300
  • Flights from Denver to Miami in February on a
    Friday under $300 with lunch

26
Recursion
  • Flights from Denver
  • Flights from Denver to Miami
  • Flights from Denver to Miami in
    February
  • Flights from Denver to Miami in
    February on a Friday
  • Etc.
  • NP -> NP PP

27
Implications of recursion and context-freeness
  • If you have a rule like
  • VP -> V NP
  • The important thing: the thing after the verb is
    an NP
  • Unimportant: the internal affairs of that NP

28
The point
  • VP -> V NP
  • (I) hate
  • flights from Denver
  • flights from Denver to Miami
  • flights from Denver to Miami in February
  • flights from Denver to Miami in February on a
    Friday
  • flights from Denver to Miami in February on a
    Friday under $300
  • flights from Denver to Miami in February on a
    Friday under $300 with lunch

29
Bracketed Notation
  • [S [NP [PRO I]] [VP [V prefer] [NP [Det a] [Nom
    [N morning] [N flight]]]]]
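The bracketed string is just a flattened tree. A small sketch (the nested-tuple encoding is my own, not from the slides) that renders such a tree in labeled-bracket notation:

```python
# A parse tree as nested tuples: (label, children...); leaves are words.
tree = ("S",
        ("NP", ("PRO", "I")),
        ("VP", ("V", "prefer"),
               ("NP", ("Det", "a"),
                      ("Nom", ("N", "morning"), ("N", "flight")))))

def bracketed(t):
    """Render a tree in labeled-bracket notation."""
    if isinstance(t, str):
        return t
    label, *children = t
    return "[" + label + " " + " ".join(bracketed(c) for c in children) + "]"

print(bracketed(tree))
# [S [NP [PRO I]] [VP [V prefer] [NP [Det a] [Nom [N morning] [N flight]]]]]
```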

30
Coordination Constructions
  • S -> S and S
  • John went to NY and Mary followed him
  • NP -> NP and NP
  • VP -> VP and VP
  • In fact the right rule for English is
  • X -> X and X (Metarule)
  • However we can say
  • He was longwinded and a bully.

31
Problems
  • Agreement
  • Subcategorization
  • Movement (for want of a better term)

32
Agreement
  • This dog
  • Those dogs
  • This dog eats
  • Those dogs eat
  • *This dogs
  • *Those dog
  • *This dog eat
  • *Those dogs eats

33
Possible CFG Solution
  • S -> NP VP
  • NP -> Det Nominal
  • VP -> V NP
  • SgS -> SgNP SgVP
  • PlS -> PlNP PlVP
  • SgNP -> SgDet SgNom
  • PlNP -> PlDet PlNom
  • PlVP -> PlV NP
  • SgVP -> SgV NP

34
CFG Solution for Agreement
  • It works and stays within the power of CFGs
  • But it's ugly
  • And it doesn't scale all that well

35
Subcategorization
  • Sneeze: John sneezed
  • *John sneezed the book
  • Say: You said [United has a flight]S
  • Prefer: I prefer [to leave earlier]TO-VP
  • *I prefer United has a flight
  • Give: Give [me]NP [a cheaper fare]NP
  • Help: Can you help [me]NP [with a flight]PP
  • *Give with a flight

36
Subcategorization
  • Subcat expresses the constraints that a predicate
    (verb for now) places on the number and syntactic
    types of arguments it wants to take (occur with).

37
So?
  • So the various rules for VPs overgenerate
  • They permit the presence of strings containing
    verbs and arguments that don't go together
  • For example
  • VP -> V NP
  • therefore
  • Sneezed the book is a VP, since 'sneeze' is a
    verb and 'the book' is a valid NP

38
Possible CFG Solution
  • VP -> V
  • VP -> V NP
  • VP -> V NP PP
  • VP -> IntransV
  • VP -> TransV NP
  • VP -> TransVwPP NP PP

39
Forward Pointer
  • It turns out that verb subcategorization facts
    will provide a key element for semantic analysis
    (determining who did what to who in an event).

40
Movement
  • Core example
  • My travel agent booked the flight
  • [S [NP My travel agent] [VP booked [NP the flight]]]
  • i.e., book is a straightforward transitive verb.
    It expects a single NP arg within the VP as an
    argument, and a single NP arg as the subject.

41
Movement
  • What about?
  • Which flight do you want me to have the travel
    agent book?
  • The direct object argument to book isn't
    appearing in the right place. It is in fact a
    long way from where it's supposed to appear.
  • And note that it's separated from its verb by 2
    other verbs.

42
CFGs: a summary
  • CFGs appear to be just about what we need to
    account for a lot of basic syntactic structure in
    English.
  • But there are problems
  • That can be dealt with adequately, although not
    elegantly, by staying within the CFG framework.
  • There are simpler, more elegant, solutions that
    take us out of the CFG framework (beyond its
    formal power).
  • Syntactic theories HPSG, LFG, CCG, Minimalism,
    etc.

43
Other syntactic stuff
  • Grammatical relations
  • Subject
  • I booked a flight to New York
  • The flight was booked by my agent
  • Object
  • I booked a flight to New York
  • Complement
  • I said that I wanted to leave

44
Dependency parsing
  • Word to word links instead of constituency
  • Based on the European rather than American
    traditions
  • But dates back to the Greeks
  • The original notions of Subject, Object and the
    progenitor of subcategorization (called
    valence) came out of Dependency theory.
  • Dependency parsing is quite popular as a
    computational model
  • since relationships between words are quite
    useful

45
Dependency parsing
Parse tree: nesting of multi-word constituents
Typed dependency parse: grammatical relations between
individual words
46
Why are dependency parses useful?
  • Example multi-document summarization
  • Need to identify sentences from different
    documents that each say roughly the same thing
  • phrase structure trees of paraphrasing sentences
    which differ in word order can be significantly
    different
  • but dependency representations will be very
    similar

47
Parsing
  • Parsing: assigning correct trees to input strings
  • Correct tree: a tree that covers all and only the
    elements of the input and has an S at the top
  • For now: enumerate all possible trees
  • A further task: disambiguation means choosing
    the correct tree from among all the possible
    trees.

48
Treebanks
  • Parsed corpora in the form of trees
  • The Penn Treebank
  • The Brown corpus
  • The WSJ corpus
  • Tgrep
  • http://www.ldc.upenn.edu/ldc/online/treebank/
  • Tregex
  • http://www-nlp.stanford.edu/nlp/javadoc/javanlp/

49
Parsing involves search
  • As with everything of interest, parsing involves
    a search which involves the making of choices
  • We'll start with some basic (meaning bad) methods
    before moving on to the one or two that you need
    to know

50
For Now
  • Assume
  • You have all the words already in some buffer
  • The input isn't POS-tagged
  • We won't worry about morphological analysis
  • All the words are known

51
Top-Down Parsing
  • Since we're trying to find trees rooted with an S
    (sentences), start with the rules that give us an
    S.
  • Then work your way down from there to the words.

52
Top Down Space
53
Bottom-Up Parsing
  • Of course, we also want trees that cover the
    input words. So start with trees that link up
    with the words in the right way.
  • Then work your way up from there.

54
Bottom-Up Space
55
Control
  • Of course, in both cases we left out how to keep
    track of the search space and how to make choices
  • Which node to try to expand next
  • Which grammar rule to use to expand a node

56
Top-Down, Depth-First, Left-to-Right Search
57
Example
58
Example
59
Example
60
Top-Down and Bottom-Up
  • Top-down
  • Only searches for trees that can be answers (i.e.,
    Ss)
  • But also suggests trees that are not consistent
    with the words
  • Bottom-up
  • Only forms trees consistent with the words
  • But suggests trees that make no sense globally

61
So Combine Them
  • There are a million ways to combine top-down
    expectations with bottom-up data to get more
    efficient searches
  • Most use one kind as the control and the other as
    a filter
  • As in top-down parsing with bottom-up filtering

62
Adding Bottom-Up Filtering
63
3 problems with TDDFLtR Parser
  • Left-Recursion
  • Ambiguity
  • Inefficient reparsing of subtrees

64
Left-Recursion
  • What happens in the following situation
  • S -> NP VP
  • S -> Aux NP VP
  • NP -> NP PP
  • NP -> Det Nominal
  • With the sentence starting with
  • Did the flight

65
Lots of ambiguity
  • VP -> VP PP
  • NP -> NP PP
  • Show me the meal on flight 286 from SF to Denver
  • 14 parses!

66
Lots of ambiguity
  • Church and Patil (1982)
  • Number of parses for such sentences grows at rate
    of number of parenthesizations of arithmetic
    expressions
  • Which grow with Catalan numbers
  • PPs   Parses
  • 1     2
  • 2     5
  • 3     14
  • 4     132
  • 5     429
  • 6     1430
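The Catalan numbers themselves are easy to compute; a quick sketch:

```python
from math import comb

def catalan(n):
    """n-th Catalan number: the number of binary bracketings of n+1 items."""
    return comb(2 * n, n) // (n + 1)

# The number of parses for PP-attachment ambiguities grows with this sequence:
print([catalan(n) for n in range(1, 9)])
# [1, 2, 5, 14, 42, 132, 429, 1430]
```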

67
Avoiding Repeated Work
  • Parsing is hard, and slow. It's wasteful to redo
    stuff over and over and over.
  • Consider an attempt to top-down parse the
    following as an NP
  • A flight from Indianapolis to Houston on TWA

72
We're done with Part I
  • Part I
  • Context-Free Grammars and Constituency
  • Some common CFG features of English
  • Baby parsers: Top-down/Bottom-up Parsing

73
  • Part II
  • CYK and Probabilistic parsers

74
Grammars and Parsing
  • Context-Free Grammars and Constituency
  • Some common CFG phenomena for English
  • Baby parsers: Top-down and Bottom-up Parsing
  • Today, real parsers: Dynamic Programming parsing
  • CKY
  • Probabilistic parsing
  • Optional section: the Earley algorithm

75
Dynamic Programming
  • We need a method that fills a table with partial
    results that
  • Does not do (avoidable) repeated work
  • Does not fall prey to left-recursion
  • Can find all the pieces of an exponential number
    of trees in polynomial time.
  • Two popular methods
  • CKY
  • Earley

76
The CKY (Cocke-Kasami-Younger) Algorithm
  • Requires the grammar to be in Chomsky Normal Form
    (CNF)
  • All rules must be of the following form
  • A -> B C
  • A -> w
  • Any grammar can be converted automatically to
    Chomsky Normal Form

77
Converting to CNF
  • Rules that mix terminals and non-terminals
  • Introduce a new dummy non-terminal that covers
    the terminal
  • INF-VP -> to VP is replaced by
  • INF-VP -> TO VP
  • TO -> to
  • Rules that have a single non-terminal on the right
    (unit productions)
  • Rewrite each unit production with the RHS of
    its expansions
  • Rules whose right-hand side length > 2
  • Introduce dummy non-terminals that spread the
    right-hand side over binary rules
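The last step, binarizing long right-hand sides, can be sketched as follows (the dummy-name convention `VP|V` is my own, purely illustrative):

```python
def binarize(lhs, rhs):
    """Split a long rule A -> B C D ... into binary rules
    by introducing dummy non-terminals for the tail."""
    rules = []
    rhs = tuple(rhs)
    while len(rhs) > 2:
        new = f"{lhs}|{rhs[0]}"          # hypothetical dummy-name scheme
        rules.append((lhs, (rhs[0], new)))
        lhs, rhs = new, rhs[1:]          # keep splitting the remainder
    rules.append((lhs, rhs))
    return rules

print(binarize("VP", ("V", "NP", "PP")))
# [('VP', ('V', 'VP|V')), ('VP|V', ('NP', 'PP'))]
```

Rules that are already binary pass through unchanged.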

78
Automatic Conversion to CNF
79
Sample Grammar
80
Back to CKY Parsing
  • Given rules in CNF
  • Consider the rule A -> B C
  • If there is an A in the input then there must be
    a B followed by a C in the input.
  • If the A goes from i to j in the input then there
    must be some k s.t. i < k < j
  • I.e., the B splits from the C someplace.

81
CKY
  • So let's build a table so that an A spanning from
    i to j in the input is placed in cell [i,j] in
    the table.
  • So a non-terminal spanning an entire string will
    sit in cell [0,n]
  • If we build the table bottom-up we'll know that
    the parts of the A must go from i to k and from k
    to j

82
CKY
  • Meaning that for a rule like A -> B C we should
    look for a B in [i,k] and a C in [k,j].
  • In other words, if we think there might be an A
    spanning [i,j] in the input AND
  • A -> B C is a rule in the grammar THEN
  • There must be a B in [i,k] and a C in [k,j] for
    some i < k < j
  • So just loop over the possible k values
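That loop over split points is the heart of CKY. A minimal recognizer sketch (the grammar encoding is my own, and the toy rules in the usage example are hypothetical):

```python
def cky_recognize(words, lexical, binary, start="S"):
    """CKY recognizer for a CNF grammar.
    lexical: word -> set of categories A with A -> word
    binary:  (B, C) -> set of A with A -> B C
    table[(i, j)] holds every non-terminal spanning words[i:j]."""
    n = len(words)
    table = {}
    for i, w in enumerate(words):
        table[(i, i + 1)] = set(lexical.get(w, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            cell = set()
            for k in range(i + 1, j):          # loop over the split point k
                for B in table[(i, k)]:
                    for C in table[(k, j)]:
                        cell |= binary.get((B, C), set())
            table[(i, j)] = cell
    return start in table[(0, n)]

LEX = {"John": {"NP"}, "called": {"V"}, "Mary": {"NP"},
       "from": {"P"}, "Denver": {"NP"}}
BIN = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}, ("VP", "PP"): {"VP"},
       ("NP", "PP"): {"NP"}, ("P", "NP"): {"PP"}}

print(cky_recognize("John called Mary from Denver".split(), LEX, BIN))  # True
```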

83
CKY Table
  • Filling the [i,j]th cell in the CKY table

84
CKY Algorithm
85
Note
  • We arranged the loops to fill the table a column
    at a time, from left to right, bottom to top.
  • This assures us that whenever we're filling a
    cell, the parts needed to fill it are already in
    the table (to the left and below)
  • Are there other ways to fill the table?

86
0 Book 1 the 2 flight 3 through 4 Houston 5
87
CYK Example
  • S -> NP VP
  • VP -> V NP
  • NP -> NP PP
  • VP -> VP PP
  • PP -> P NP
  • NP -> John, Mary, Denver
  • V -> called
  • P -> from

88
Example
[S [NP John] [VP [VP [V called] [NP Mary]] [PP [P from] [NP Denver]]]]
(the VP-attachment parse of John called Mary from Denver, drawn as a tree on the slide)
89
Example
[S [NP John] [VP [V called] [NP [NP Mary] [PP [P from] [NP Denver]]]]]
(the NP-attachment parse of John called Mary from Denver, drawn as a tree on the slide)
90-103
Example
(the CKY chart for this sentence is filled in cell by cell; shown graphically)
104
Back to Ambiguity
  • Did we solve it?

105
Ambiguity
106
Ambiguity
  • No
  • Both CKY and Earley will result in multiple S
    structures for the [0,n] table entry.
  • They both efficiently store the sub-parts that
    are shared between multiple parses.
  • But neither can tell us which one is right.
  • Not a parser: a recognizer
  • The presence of an S state with the right
    attributes in the right place indicates a
    successful recognition.
  • But no parse tree: no parser
  • That's how we solve (not) an exponential problem
    in polynomial time

107
Converting CKY from Recognizer to Parser
  • With the addition of a few pointers we have a
    parser
  • Augment each new cell in chart to point to where
    we came from.
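A sketch of that augmentation: each chart entry records how it was built, and the trees are read off the backpointers (the encoding is my own; the toy grammar below is hypothetical):

```python
from collections import defaultdict

def cky_parse(words, lexical, binary, start="S"):
    """CKY with backpointers: returns every parse tree as nested tuples."""
    n = len(words)
    chart = defaultdict(lambda: defaultdict(list))  # (i,j) -> A -> ways to build A
    for i, w in enumerate(words):
        for A in lexical.get(w, ()):
            chart[(i, i + 1)][A].append(w)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for B in list(chart[(i, k)]):
                    for C in list(chart[(k, j)]):
                        for A in binary.get((B, C), ()):
                            chart[(i, j)][A].append((k, B, C))  # backpointer
    def trees(i, j, A):
        for back in chart[(i, j)][A]:
            if isinstance(back, str):          # lexical entry
                yield (A, back)
            else:
                k, B, C = back
                for left in trees(i, k, B):
                    for right in trees(k, j, C):
                        yield (A, left, right)
    return list(trees(0, n, start))

LEX = {"John": {"NP"}, "called": {"V"}, "Mary": {"NP"},
       "from": {"P"}, "Denver": {"NP"}}
BIN = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}, ("VP", "PP"): {"VP"},
       ("NP", "PP"): {"NP"}, ("P", "NP"): {"PP"}}

parses = cky_parse("John called Mary from Denver".split(), LEX, BIN)
print(len(parses))  # 2: the PP attaches to the VP or to the NP
```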

108
Optional section: the Earley algorithm
109
Problem (minor)
  • We said CKY requires the grammar to be binary
    (i.e., in Chomsky Normal Form).
  • We showed that any arbitrary CFG can be converted
    to Chomsky Normal Form, so that's not a huge deal
  • Except when you change the grammar the trees come
    out wrong
  • All things being equal we'd prefer to leave the
    grammar alone.

110
Earley Parsing
  • Allows arbitrary CFGs
  • Where CKY is bottom-up, Earley is top-down
  • Fills a table in a single sweep over the input
    words
  • Table is of length N+1, where N is the number of words
  • Table entries represent
  • Completed constituents and their locations
  • In-progress constituents
  • Predicted constituents

111
States
  • The table entries are called states and are
    represented with dotted rules.
  • S -> • VP             A VP is predicted
  • NP -> Det • Nominal   An NP is in progress
  • VP -> V NP •          A VP has been found

112
States/Locations
  • It would be nice to know where these things are
    in the input, so
  • S -> • VP [0,0]  A VP is predicted at the
    start of the sentence
  • NP -> Det • Nominal [1,2]  An NP is in progress;
    the Det goes from 1 to 2
  • VP -> V NP • [0,3]  A VP has been found
    starting at 0 and ending at 3

113
Graphically
114
Earley
  • As with most dynamic programming approaches, the
    answer is found by looking in the table in the
    right place.
  • In this case, there should be an S state in the
    final column that spans from 0 to n+1 and is
    complete.
  • If that's the case you're done.
  • S -> α • [0,n+1]

115
Earley Algorithm
  • March through chart left-to-right.
  • At each step, apply 1 of 3 operators
  • Predictor
  • Create new states representing top-down
    expectations
  • Scanner
  • Match word predictions (rule with word after dot)
    to words
  • Completer
  • When a state is complete, see what rules were
    looking for that completed constituent

116
Predictor
  • Given a state
  • With a non-terminal to right of dot
  • That is not a part-of-speech category
  • Create a new state for each expansion of the
    non-terminal
  • Place these new states into same chart entry as
    generated state, beginning and ending where
    generating state ends.
  • So a predictor looking at
  • S -> • VP [0,0]
  • results in
  • VP -> • Verb [0,0]
  • VP -> • Verb NP [0,0]

117
Scanner
  • Given a state
  • With a non-terminal to right of dot
  • That is a part-of-speech category
  • If the next word in the input matches this
    part-of-speech
  • Create a new state with dot moved over the
    non-terminal
  • So a scanner looking at
  • VP -> • Verb NP [0,0]
  • If the next word, book, can be a verb, add new
    state
  • VP -> Verb • NP [0,1]
  • Add this state to the chart entry following the
    current one
  • Note Earley algorithm uses top-down input to
    disambiguate POS! Only POS predicted by some
    state can get added to chart!

118
Completer
  • Applied to a state when its dot has reached the right
    end of the rule.
  • The parser has discovered a category over some span
    of the input.
  • Find and advance all previous states that were
    looking for this category
  • Copy state, move dot, insert in current chart
    entry
  • Given
  • NP -> Det Nominal • [1,3]
  • VP -> Verb • NP [0,1]
  • Add
  • VP -> Verb NP • [0,3]

119
Earley: how do we know we are done?
  • How do we know when we are done?
  • Find an S state in the final column that spans
    from 0 to n+1 and is complete.
  • If that's the case you're done.
  • S -> α • [0,n+1]

120
Earley
  • So sweep through the table from 0 to n+1
  • New predicted states are created by starting
    top-down from S
  • New incomplete states are created by advancing
    existing states as new constituents are
    discovered
  • New complete states are created in the same way.

121
Earley
  • More specifically
  • Predict all the states you can upfront
  • Read a word
  • Extend states based on matches
  • Add new predictions
  • Go to 2
  • Look at N+1 to see if you have a winner
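Those steps can be sketched as a compact recognizer. This is my own encoding, not from the slides: the dummy GAMMA start rule and the treat-lexicon-categories-as-POS convention are assumptions of this sketch.

```python
def earley_recognize(words, grammar, lexicon, start="S"):
    """Earley recognizer. grammar: phrasal rules (NT -> list of RHS tuples);
    lexicon: word -> set of POS tags. A state is (lhs, rhs, dot, origin)."""
    n = len(words)
    chart = [[] for _ in range(n + 1)]

    def add(i, state):
        if state not in chart[i]:
            chart[i].append(state)

    add(0, ("GAMMA", (start,), 0, 0))            # dummy start state
    for i in range(n + 1):
        j = 0
        while j < len(chart[i]):                  # chart[i] may grow as we scan it
            lhs, rhs, dot, origin = chart[i][j]
            if dot < len(rhs) and rhs[dot] in grammar:        # PREDICTOR
                for prod in grammar[rhs[dot]]:
                    add(i, (rhs[dot], prod, 0, i))
            elif dot < len(rhs):                              # SCANNER (POS category)
                if i < n and rhs[dot] in lexicon.get(words[i], ()):
                    add(i + 1, (rhs[dot], (words[i],), 1, i))
            else:                                             # COMPLETER
                for l2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        add(i, (l2, r2, d2 + 1, o2))
            j += 1
    return ("GAMMA", (start,), 1, 0) in chart[n]

GRAMMAR = {"S": [("NP", "VP"), ("VP",)], "VP": [("V", "NP")],
           "NP": [("Det", "Nom")], "Nom": [("N",)]}
LEXICON = {"book": {"V"}, "that": {"Det"}, "flight": {"N"}}

print(earley_recognize(["book", "that", "flight"], GRAMMAR, LEXICON))  # True
```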

122
Example
  • Book that flight
  • We should find an S from 0 to 3 that is a
    completed state

123
Example
124
Example
125
Example
126
Details
  • What kind of algorithms did we just describe
    (both Earley and CKY)?
  • Not parsers: recognizers
  • The presence of an S state with the right
    attributes in the right place indicates a
    successful recognition.
  • But no parse tree: no parser
  • That's how we solve (not) an exponential problem
    in polynomial time

127
Back to Ambiguity
  • Did we solve it?

128
Ambiguity
129
Ambiguity
  • No
  • Both CKY and Earley will result in multiple S
    structures for the [0,n] table entry.
  • They both efficiently store the sub-parts that
    are shared between multiple parses.
  • But neither can tell us which one is right.
  • Not a parser: a recognizer
  • The presence of an S state with the right
    attributes in the right place indicates a
    successful recognition.
  • But no parse tree: no parser
  • That's how we solve (not) an exponential problem
    in polynomial time

130
Converting Earley from Recognizer to Parser
  • With the addition of a few pointers we have a
    parser
  • Augment the Completer to point to where we came
    from.

131
Augmenting the chart with structural information
(chart with backpointers added to states S8-S13; shown graphically)
132
Retrieving Parse Trees from Chart
  • All the possible parses for an input are in the
    table
  • We just need to read off all the backpointers
    from every complete S in the last column of the
    table
  • Find all the S -> X • [0,N+1]
  • Follow the structural traces from the Completer
  • Of course, this won't be polynomial time, since
    there could be an exponential number of trees
  • But we can at least represent ambiguity
    efficiently

133
How to do parse disambiguation
  • Probabilistic methods
  • Augment the grammar with probabilities
  • Then modify the parser to keep only most probable
    parses
  • And at the end, return the most probable parse

134
Probabilistic CFGs
  • The probabilistic model
  • Assigning probabilities to parse trees
  • Getting the probabilities for the model
  • Parsing with probabilities
  • Slight modification to dynamic programming
    approach
  • Task is to find the max probability tree for an
    input

135
Probability Model
  • Attach probabilities to grammar rules
  • The expansions for a given non-terminal sum to 1
  • VP -> Verb          .55
  • VP -> Verb NP       .40
  • VP -> Verb NP NP    .05
  • Read this as P(specific rule | LHS)

136
PCFG
137
PCFG
138
Probability Model (1)
  • A derivation (tree) consists of the set of
    grammar rules that are in the tree
  • The probability of a tree is just the product of
    the probabilities of the rules in the derivation.
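As a sketch, the product can be computed by walking the tree. Only the VP probabilities come from the earlier slide; the S and NP rule probabilities here are assumed for illustration, and lexical rule probabilities are ignored for brevity:

```python
import math

# Rule probabilities: VP rules from the earlier slide; the rest are assumed.
P = {
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("Verb",)): 0.55,
    ("VP", ("Verb", "NP")): 0.40,
    ("VP", ("Verb", "NP", "NP")): 0.05,
    ("NP", ("Det", "Noun")): 1.0,
}

def rules_of(tree):
    """Yield every internal (lhs, rhs-of-labels) rule used in a nested-tuple tree.
    Preterminal (lexical) rules like Det -> a are skipped for brevity."""
    label, *children = tree
    if all(isinstance(c, tuple) for c in children):
        yield (label, tuple(c[0] for c in children))
        for c in children:
            yield from rules_of(c)

def tree_prob(tree):
    return math.prod(P[r] for r in rules_of(tree))

t = ("S", ("NP", ("Det", "a"), ("Noun", "flight")), ("VP", ("Verb", "left")))
print(tree_prob(t))   # 1.0 * 1.0 * 0.55 = 0.55
```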

139
Probability model
  • P(T,S) = P(T) P(S|T) = P(T), since P(S|T) = 1

140
Probability Model (1.1)
  • The probability of a word sequence P(S) is the
    probability of its tree in the unambiguous case.
  • It's the sum of the probabilities of the trees in
    the ambiguous case.

141
Getting the Probabilities
  • From an annotated database (a treebank)
  • So for example, to get the probability for a
    particular VP rule just count all the times the
    rule is used and divide by the number of VPs
    overall.
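A sketch of that counting, over a tiny hypothetical treebank of nested-tuple trees (encoding mine, not from the slides):

```python
from collections import Counter

def rules_of(tree):
    """Yield every internal (lhs, rhs-of-labels) rule in a nested-tuple tree."""
    label, *children = tree
    if children and all(isinstance(c, tuple) for c in children):
        yield (label, tuple(c[0] for c in children))
        for c in children:
            yield from rules_of(c)

def estimate(treebank):
    """MLE rule probabilities: count(A -> b) / count(A)."""
    rule_counts, lhs_counts = Counter(), Counter()
    for tree in treebank:
        for lhs, rhs in rules_of(tree):
            rule_counts[(lhs, rhs)] += 1
            lhs_counts[lhs] += 1
    return {r: c / lhs_counts[r[0]] for r, c in rule_counts.items()}

treebank = [
    ("S", ("NP", ("Pro", "I")), ("VP", ("V", "left"))),
    ("S", ("NP", ("Pro", "I")),
          ("VP", ("V", "booked"), ("NP", ("Det", "a"), ("N", "flight")))),
]
probs = estimate(treebank)
print(probs[("VP", ("V",))])   # 1 of the 2 VPs -> 0.5
```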

142
TreeBanks
143
Treebanks
144
Treebanks
145
Treebank Grammars
146
Lots of flat rules
147
Example sentences from those rules
  • Total over 17,000 different grammar rules in the
    1-million word Treebank corpus

148
Probabilistic Grammar Assumptions
  • We're assuming that there is a grammar to be used
    to parse with.
  • We're assuming the existence of a large robust
    dictionary with parts of speech
  • We're assuming the ability to parse (i.e., a
    parser)
  • Given all that, we can parse probabilistically

149
Typical Approach
  • Bottom-up (CKY) dynamic programming approach
  • Assign probabilities to constituents as they are
    completed and placed in the table
  • Use the max probability for each constituent
    going up

150
What's that last bullet mean?
  • Say we're talking about a final part of a parse
  • S -> NP[0,i] VP[i,j]
  • The probability of the S is
  • P(S -> NP VP) × P(NP) × P(VP)
  • The green stuff is already known; we're doing
    bottom-up parsing
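A sketch of probabilistic (Viterbi) CKY along these lines; the grammar and all of its probabilities below are made up for illustration:

```python
import math
from collections import defaultdict

def viterbi_cky(words, lexical, binary, start="S"):
    """best[(i, j)][A] = (max log-prob of A over words[i:j], backpointer)."""
    n = len(words)
    best = defaultdict(dict)
    for i, w in enumerate(words):
        for A, p in lexical.get(w, {}).items():
            best[(i, i + 1)][A] = (math.log(p), w)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for B, (pb, _) in best[(i, k)].items():
                    for C, (pc, _) in best[(k, j)].items():
                        for A, p in binary.get((B, C), {}).items():
                            score = math.log(p) + pb + pc
                            # keep only the max-probability way to build A
                            if A not in best[(i, j)] or score > best[(i, j)][A][0]:
                                best[(i, j)][A] = (score, (k, B, C))
    return best[(0, n)].get(start)

# Hypothetical probabilities (each LHS's expansions sum to 1):
LEX = {"John": {"NP": 0.3}, "called": {"V": 1.0}, "Mary": {"NP": 0.3},
       "from": {"P": 1.0}, "Denver": {"NP": 0.2}}
BIN = {("NP", "VP"): {"S": 1.0}, ("V", "NP"): {"VP": 0.6},
       ("VP", "PP"): {"VP": 0.4}, ("NP", "PP"): {"NP": 0.2},
       ("P", "NP"): {"PP": 1.0}}

score, back = viterbi_cky("John called Mary from Denver".split(), LEX, BIN)
print(round(math.exp(score), 5))  # 0.00432
```

Under these numbers the VP-attachment reading wins (0.0144 vs. 0.0072 for the VP span), so the backpointers trace out that tree.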

151
Max
  • I said the P(NP) is known.
  • What if there are multiple NPs for the span of
    text in question (0 to i)?
  • Take the max (where?)

152
Problems with PCFGs
  • The probability model we're using is just based
    on the rules in the derivation
  • Doesn't use the words in any real way
  • Doesn't take into account where in the derivation
    a rule is used

153
Solution
  • Add lexical dependencies to the scheme
  • Infiltrate the predilections of particular words
    into the probabilities in the derivation
  • I.e. Condition the rule probabilities on the
    actual words

154
Heads
  • To do that we're going to make use of the notion
    of the head of a phrase
  • The head of an NP is its noun
  • The head of a VP is its verb
  • The head of a PP is its preposition
  • (It's really more complicated than that, but this
    will do.)

155
Example (right)
Attribute grammar
156
Example (wrong)
157
How?
  • We used to have
  • VP -> V NP PP with P(rule | VP)
  • That's the count of this rule divided by the
    number of VPs in a treebank
  • Now we have
  • VP(dumped) -> V(dumped) NP(sacks) PP(in)
  • P(r | VP ∧ dumped is the verb ∧ sacks is the head
    of the NP ∧ in is the head of the PP)
  • Not likely to have significant counts in any
    treebank

158
Declare Independence
  • When stuck, exploit independence and collect the
    statistics you can
  • We'll focus on capturing two things
  • Verb subcategorization
  • Particular verbs have affinities for particular
    VPs
  • Objects' affinities for their predicates (mostly
    their mothers and grandmothers)
  • Some objects fit better with some predicates than
    others

159
Subcategorization
  • Condition particular VP rules on their head, so
  • r: VP -> V NP PP with P(r | VP)
  • becomes
  • P(r | VP ∧ dumped)
  • What's the count?
  • How many times was this rule used with (head)
    dump, divided by the number of VPs that dump
    appears (as head) in total
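That count-and-divide estimate can be sketched directly. The observations below are hypothetical, as if read off a treebank:

```python
from collections import Counter

# Hypothetical (head, VP-rule) observations, as if read off a treebank.
observations = [
    ("dumped", ("V", "NP", "PP")),
    ("dumped", ("V", "NP", "PP")),
    ("dumped", ("V", "NP")),
    ("slept",  ("V",)),
]

rule_given_head = Counter(observations)
head_totals = Counter(head for head, _ in observations)

def p_rule_given_head(rule, head):
    """P(r | VP, head) = count(head used with r) / count(head as a VP head)."""
    return rule_given_head[(head, rule)] / head_totals[head]

print(p_rule_given_head(("V", "NP", "PP"), "dumped"))   # 2/3
```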

160
Preferences
  • Subcat captures the affinity between VP heads
    (verbs) and the VP rules they go with.
  • What about the affinity between VP heads and the
    heads of the other daughters of the VP
  • Back to our examples

161
Example (right)
162
Example (wrong)
163
Preferences
  • The issue here is the attachment of the PP. So
    the affinities we care about are the ones between
    dumped and into vs. sacks and into.
  • So count the places where dumped is the head of a
    constituent that has a PP daughter with into as
    its head and normalize
  • Vs. the situation where sacks is a constituent
    with into as the head of a PP daughter.

164
Preferences (2)
  • Consider the VPs
  • Ate spaghetti with gusto
  • Ate spaghetti with marinara
  • The affinity of gusto for eat is much larger than
    its affinity for spaghetti
  • On the other hand, the affinity of marinara for
    spaghetti is much higher than its affinity for
    ate

165
Preferences (2)
  • Note the relationship here is more distant and
    doesn't involve a headword, since gusto and
    marinara aren't the heads of the PPs.

[VP(ate) [V ate] [NP spaghetti] [PP(with) with gusto]]
[VP(ate) [V ate] [NP(spag) [NP spaghetti] [PP(with) with marinara]]]
(drawn as trees on the slide)
166
Summary
  • Context-Free Grammars
  • Parsing
  • Top-Down, Bottom-Up Metaphors
  • Dynamic Programming Parsers: CKY, Earley
  • Disambiguation
  • PCFG
  • Probabilistic Augmentations to Parsers
  • Treebanks