LINGUIST 180: Introduction to Computational Linguistics

1
LINGUIST 180: Introduction to Computational Linguistics
  • Dan Jurafsky, Marie-Catherine de Marneffe
  • Lecture 9: Grammar and Parsing (I)

Thanks to Jim Martin for many of these slides!
2
Outline for Grammar/Parsing Week
  • Context-Free Grammars and Constituency
  • Some common CFG phenomena for English
      • Sentence-level constructions
      • NP, PP, VP
      • Coordination
      • Subcategorization
  • Top-down and Bottom-up Parsing
  • Dynamic Programming Parsing
  • Quick sketch of probabilistic parsing

3
Review
  • Parts of Speech
      • Basic syntactic/morphological categories that words belong to
  • Part of Speech tagging
      • Assigning parts of speech to all the words in a sentence

4
Syntax
  • Syntax: from Greek syntaxis, "setting out together, arrangement"
  • Refers to the way words are arranged together, and the relationships
    between them
  • Distinction:
      • Prescriptive grammar: how people ought to talk
      • Descriptive grammar: how they do talk
  • Goal of syntax is to model the knowledge that people unconsciously
    have about the grammar of their native language

5
Syntax
  • Why should we care?
      • Grammar checkers
      • Question answering
      • Information extraction
      • Machine translation

6
Key ideas of syntax
  • Constituency (we'll spend most of our time on this)
  • Subcategorization
  • Grammatical relations
  • Plus one part we won't have time for:
      • Movement/long-distance dependency

7
Context-Free Grammars (CFG)
  • Capture constituency and ordering
      • Ordering: what are the rules that govern the ordering of words and
        bigger units in the language?
      • Constituency: how words group into units and how the various kinds
        of units behave

8
Constituency
  • E.g., noun phrases (NPs):
      • Three parties from Brooklyn
      • A high-class spot such as Mindy's
      • The Broadway coppers
      • They
      • Harry the Horse
      • The reason he comes into the Hot Box
  • How do we know these form a constituent?

9
Constituency (II)
  • They can all appear before a verb:
      • Three parties from Brooklyn arrive
      • A high-class spot such as Mindy's attracts
      • The Broadway coppers love
      • They sit
  • But individual words can't always appear before verbs:
      • *from arrive
      • *as attracts
      • *the is
      • *spot is
  • Must be able to state generalizations like:
      • Noun phrases occur before verbs

10
Constituency (III)
  • Preposing and postposing:
      • On September 17th, I'd like to fly from Atlanta to Denver
      • I'd like to fly on September 17th from Atlanta to Denver
      • I'd like to fly from Atlanta to Denver on September 17th.
  • But not:
      • *On September, I'd like to fly 17th from Atlanta to Denver
      • *On I'd like to fly September 17th from Atlanta to Denver

11
CFG example
  • S -> NP VP
  • NP -> Det NOMINAL
  • NOMINAL -> Noun
  • VP -> Verb
  • Det -> a
  • Noun -> flight
  • Verb -> left

12
CFGs: set of rules
  • S -> NP VP
      • This says that there are units called S, NP, and VP in this language
      • That an S consists of an NP followed immediately by a VP
      • Doesn't say that that's the only kind of S
      • Nor does it say that this is the only place that NPs and VPs occur

13
Generativity
  • As with FSAs, you can view these rules as either analysis or synthesis
    machines:
      • Generate strings in the language
      • Reject strings not in the language
      • Impose structures (trees) on strings in the language
  • How can we define grammatical vs. ungrammatical sentences?
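Here is a minimal Python sketch of the generate/reject view, using the slide-11 grammar. The extra words "the", "plane" and "landed" are hypothetical additions so that the language has more than one member. `generate` and `LANGUAGE` are illustrative names. This toy grammar has no recursion, so its language is finite and can simply be enumerated. Membership in that set is then the accept/reject test.

```python
from itertools import product

# Slide-11 grammar. "the", "plane" and "landed" are hypothetical extra words.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "NOMINAL"]],
    "NOMINAL": [["Noun"]],
    "VP": [["Verb"]],
    "Det": [["a"], ["the"]],
    "Noun": [["flight"], ["plane"]],
    "Verb": [["left"], ["landed"]],
}


def generate(symbol="S"):
    """Yield every word sequence (a tuple) that `symbol` can derive.

    This only terminates because the grammar has no recursive rules."""
    if symbol not in GRAMMAR:  # terminal: the word itself
        yield (symbol,)
        return
    for rhs in GRAMMAR[symbol]:
        # Every combination of one expansion per child, kept in left-to-right order.
        for parts in product(*(list(generate(child)) for child in rhs)):
            yield tuple(word for part in parts for word in part)


LANGUAGE = {" ".join(words) for words in generate()}

# Synthesis: list the language. Analysis: accept or reject a candidate string.
print(sorted(LANGUAGE))
print("the plane left" in LANGUAGE)   # accepted
print("plane the left" in LANGUAGE)   # rejected
```

Imposing structure (trees) is the job of the parsers discussed later in the lecture.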

14
Derivations
  • A derivation is a sequence of rules applied to a string that accounts
    for that string:
      • Covers all the elements in the string
      • Covers only the elements in the string
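The sketch below applies a list of rules from the slide-11 grammar. It always rewrites the leftmost non-terminal, which is only one of several possible derivation orders. The helper names are hypothetical. A derivation accounts for a sentence exactly when its final form equals the sentence's words: nothing missing, nothing extra and no non-terminal left over.

```python
# Non-terminals of the slide-11 grammar (everything else is a word).
NONTERMINALS = {"S", "NP", "NOMINAL", "VP", "Det", "Noun", "Verb"}


def rewrite_leftmost(form, lhs, rhs):
    """Apply lhs -> rhs to the leftmost non-terminal of a sentential form."""
    for i, symbol in enumerate(form):
        if symbol in NONTERMINALS:
            if symbol != lhs:
                raise ValueError(f"{lhs} -> {' '.join(rhs)} cannot rewrite {symbol}")
            return form[:i] + list(rhs) + form[i + 1:]
    raise ValueError("form has no non-terminal left to rewrite")


def derive(rules, start="S"):
    """Apply (lhs, rhs) rules in order; return every intermediate form."""
    forms = [[start]]
    for lhs, rhs in rules:
        forms.append(rewrite_leftmost(forms[-1], lhs, rhs))
    return forms


def accounts_for(rules, sentence):
    """True if the derivation ends in exactly the words of the sentence."""
    return derive(rules)[-1] == sentence.split()


DERIVATION = [
    ("S", ["NP", "VP"]), ("NP", ["Det", "NOMINAL"]), ("Det", ["a"]),
    ("NOMINAL", ["Noun"]), ("Noun", ["flight"]), ("VP", ["Verb"]), ("Verb", ["left"]),
]

for form in derive(DERIVATION):
    print(" ".join(form))
```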

15
Derivations as Trees
16
CFGs more formally
  • A context-free grammar has 4 parameters (it is a 4-tuple):
      • A set of non-terminal symbols (variables) $N$
      • A set of terminal symbols $\Sigma$ (disjoint from $N$)
      • A set of productions $P$, each of the form $A \rightarrow \alpha$,
        where $A$ is a non-terminal and $\alpha$ is a string of symbols from
        the infinite set of strings $(\Sigma \cup N)^*$
      • A designated start symbol $S$

17
Defining a CF language via derivation
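  • A string A derives a string B if A can be rewritten as B via some
    series of rule applications
  • More formally:
      • If $A \rightarrow \beta$ is a production of $P$, and $\alpha$ and
        $\gamma$ are any strings in the set $(\Sigma \cup N)^*$,
      • then we say that $\alpha A \gamma$ directly derives
        $\alpha \beta \gamma$, or $\alpha A \gamma \Rightarrow \alpha \beta \gamma$
  • Derivation is a generalization of direct derivation:
      • Let $\alpha_1, \alpha_2, \ldots, \alpha_m$ be strings in
        $(\Sigma \cup N)^*$, $m \geq 1$, such that
        $\alpha_1 \Rightarrow \alpha_2, \alpha_2 \Rightarrow \alpha_3, \ldots, \alpha_{m-1} \Rightarrow \alpha_m$
      • We say that $\alpha_1$ derives $\alpha_m$, or $\alpha_1 \overset{*}{\Rightarrow} \alpha_m$
  • We then formally define the language $L_G$ generated by grammar $G$:
      • The set of strings composed of terminal symbols that can be derived from $S$
      • $L_G = \{ w \mid w \text{ is in } \Sigma^* \text{ and } S \overset{*}{\Rightarrow} w \}$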
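These definitions translate fairly directly into code. In this sketch, `directly_derives` implements direct derivation by rewriting one non-terminal anywhere in a sentential form. `language` approximates the language of the grammar by breadth-first search over derivations, keeping only forms of bounded length. Both names are hypothetical. The length bound is an assumption, and it is safe only because the slide-11 grammar has no rule with an empty right-hand side, so sentential forms never shrink.

```python
from collections import deque

# G = (N, Sigma, P, S) for the slide-11 grammar.
N = {"S", "NP", "NOMINAL", "VP", "Det", "Noun", "Verb"}
SIGMA = {"a", "flight", "left"}
P = [
    ("S", ("NP", "VP")), ("NP", ("Det", "NOMINAL")), ("NOMINAL", ("Noun",)),
    ("VP", ("Verb",)), ("Det", ("a",)), ("Noun", ("flight",)), ("Verb", ("left",)),
]
START = "S"
assert N.isdisjoint(SIGMA)  # terminals and non-terminals must not overlap


def directly_derives(form):
    """Yield every alpha beta gamma such that alpha A gamma => alpha beta gamma."""
    for i, symbol in enumerate(form):
        for lhs, beta in P:
            if symbol == lhs:
                yield form[:i] + beta + form[i + 1:]


def language(max_len):
    """{w in Sigma* : S =>* w}, limited to strings of at most max_len words."""
    seen = {(START,)}
    queue = deque([(START,)])
    result = set()
    while queue:
        form = queue.popleft()
        if all(symbol in SIGMA for symbol in form):
            result.add(" ".join(form))
            continue
        for nxt in directly_derives(form):
            # Pruning long forms is safe: with no empty rules, forms never shrink.
            if len(nxt) <= max_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return result


print(language(3))
```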

18
Parsing
  • Parsing is the process of taking a string and a
    grammar and returning a (many?) parse tree(s) for
    that string

19
Context?
  • The notion of context in CFGs has nothing to do
    with the ordinary meaning of the word context in
    language
  • All it really means is that the non-terminal on
    the left-hand side of a rule is out there all by
    itself (free of context)
  • A -> B C
  • Means that I can rewrite an A as a B followed by
    a C regardless of the context in which A is found

20
Key Constituents (English)
  • Sentences
  • Noun phrases
  • Verb phrases
  • Prepositional phrases

21
Sentence-Types
  • Declaratives: A plane left
      • S -> NP VP
  • Imperatives: Leave!
      • S -> VP
  • Yes-No Questions: Did the plane leave?
      • S -> Aux NP VP
  • WH Questions: When did the plane leave?
      • S -> WH Aux NP VP

22
NPs
  • NP -> Pronoun
      • I came, you saw it, they conquered
  • NP -> Proper-Noun
      • Los Angeles is west of Texas
      • John Hennessy is the president of Stanford
  • NP -> Det Noun
      • The president
  • NP -> Nominal
  • Nominal -> Noun Noun
      • A morning flight to Denver

23
PPs
  • PP -> Preposition NP
      • From LA
      • To the store
      • On Tuesday morning
      • With lunch

24
Recursion
  • We'll have to deal with rules such as the following, where the
    non-terminal on the left also appears somewhere on the right (directly):
      • NP -> NP PP (The flight to Boston)
      • VP -> VP PP (departed Miami at noon)

25
Recursion
  • Of course, this is what makes syntax interesting
  • Flights from Denver
  • Flights from Denver to Miami
  • Flights from Denver to Miami in February
  • Flights from Denver to Miami in February on a
    Friday
  • Flights from Denver to Miami in February on a
    Friday under 300
  • Flights from Denver to Miami in February on a
    Friday under 300 with lunch

26
Recursion
  • Flights from Denver
  • Flights from Denver to Miami
  • Flights from Denver to Miami in February
  • Flights from Denver to Miami in February on a Friday
  • Etc.
  • NP -> NP PP
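NP -> NP PP can apply to its own output, so there is no upper bound on the length of an NP. In the minimal sketch below, the hypothetical helper `attach` applies the rule once and always makes the previous NP the left daughter. PP contents are left unanalysed. This is only one of the structures the grammar allows, since each PP could also attach lower down.

```python
def attach(phrase, pp):
    """One application of NP -> NP PP: the old NP becomes the left daughter."""
    return f"[NP {phrase} [PP {pp}]]"


PPS = ["from Denver", "to Miami", "in February", "on a Friday"]

phrase = "[NP Flights]"
for pp in PPS:
    phrase = attach(phrase, pp)  # the rule applies to its own output
    print(phrase)
```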

27
Implications of recursion and context-freeness
  • If you have a rule like
      • VP -> V NP
  • It only cares that the thing after the verb is an NP
  • It doesn't have to know about the internal affairs of that NP

28
The point
  • VP -> V NP
  • (I) hate
      • flights from Denver
      • flights from Denver to Miami
      • flights from Denver to Miami in February
      • flights from Denver to Miami in February on a Friday
      • flights from Denver to Miami in February on a Friday under 300
      • flights from Denver to Miami in February on a Friday under 300 with lunch

29
Bracketed Notation
  • [S [NP [PRO I]] [VP [V prefer] [NP [Det a] [Nom [N morning] [Nom [N flight]]]]]]
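Labelled bracketing is easy to read back into a tree. The sketch below is one possible reader; `parse_brackets` and `leaves` are hypothetical names. It turns the bracketed string into nested Python tuples of the form (label, child, ...), with words kept as plain strings.

```python
import re


def parse_brackets(text):
    """Turn labelled bracketing like '[S [NP ...] ...]' into nested tuples
    of the form (label, child, child, ...), where words are plain strings."""
    tokens = re.findall(r"\[|\]|[^\s\[\]]+", text)
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "[":
            raise ValueError(f"expected '[' at token {pos}")
        label, pos = tokens[pos + 1], pos + 2
        children = []
        while tokens[pos] != "]":
            if tokens[pos] == "[":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing ']'
        return (label, *children)

    return node()


def leaves(tree):
    """The words of a tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1:] for word in leaves(child)]


TREE = parse_brackets(
    "[S [NP [PRO I]] [VP [V prefer] [NP [Det a] [Nom [N morning] [Nom [N flight]]]]]]"
)
print(TREE)
print(" ".join(leaves(TREE)))
```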

30
Coordination Constructions
  • S -> S and S
      • John went to NY and Mary followed him
  • NP -> NP and NP
  • VP -> VP and VP
  • In fact, the right rule for English is
      • X -> X and X (metarule)
  • However, we can also say
      • He was longwinded and a bully.
      • Here an adjective is conjoined with an NP, which X -> X and X does not allow
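One way to read the metarule is as a schema that is instantiated once per category. The category list below is a hypothetical sample. The `coordinate` helper, also illustrative, shows why the last example is a problem. It only combines conjuncts with the same label, so an adjective phrase (labelled AdjP here, an assumed name) conjoined with an NP gets no structure.

```python
CATEGORIES = ["S", "NP", "VP", "Nominal", "PP"]  # hypothetical sample


def instantiate_metarule(categories):
    """Expand the schema X -> X and X into one ordinary CFG rule per category."""
    return [f"{x} -> {x} and {x}" for x in categories]


def coordinate(left, right):
    """Build (X, left, 'and', right) when both conjuncts have the same category X.
    Returns None otherwise, which is what the metarule predicts."""
    if left[0] != right[0]:
        return None
    return (left[0], left, "and", right)


print(instantiate_metarule(CATEGORIES))
print(coordinate(("NP", "John"), ("NP", "Mary")))
# "longwinded" (an adjective phrase) and "a bully" (an NP): no structure is built.
print(coordinate(("AdjP", "longwinded"), ("NP", "a bully")))
```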

31
Problems
  • Agreement
  • Subcategorization
  • Movement (for want of a better term)

32
Agreement
  • This dog
  • Those dogs
  • This dog eats
  • Those dogs eat
  • *This dogs
  • *Those dog
  • *This dog eat
  • *Those dogs eats

33
Possible CFG Solution
  • S -> NP VP
  • NP -> Det Nominal
  • VP -> V NP
  • SgS -> SgNP SgVP
  • PlS -> PlNP PlVP
  • SgNP -> SgDet SgNom
  • PlNP -> PlDet PlNom
  • PlVP -> PlV NP
  • SgVP -> SgV NP

34
CFG Solution for Agreement
  • It works and stays within the power of CFGs
  • But it's ugly
  • And it doesn't scale all that well
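To see why the approach does not scale, the sketch below mechanically copies each rule once per combination of agreement-feature values. The rule encoding and the extra person feature are assumptions made for illustration, and `specialize` is a hypothetical name. The glue rules a full grammar would also need are omitted; one example would let an unmarked object NP be either SgNP or PlNP. With number alone the three base rules become six, and adding a three-valued person feature triples that again.

```python
from itertools import product

# (lhs, rhs, positions in rhs that must agree with the lhs)
RULES = [
    ("S", ("NP", "VP"), {0, 1}),          # subject agrees with verb phrase
    ("NP", ("Det", "Nominal"), {0, 1}),   # determiner agrees with noun
    ("VP", ("V", "NP"), {0}),             # the object NP is left unmarked
]

# Hypothetical feature inventory: number as on the slide, plus person.
FEATURES = {"num": ("Sg", "Pl"), "per": ("1", "2", "3")}


def specialize(rules, features):
    """Copy every rule once per combination of feature values, prefixing the
    lhs and the agreeing rhs symbols (e.g. SgS -> SgNP SgVP)."""
    out = []
    for lhs, rhs, agree in rules:
        for values in product(*features.values()):
            tag = "".join(values)
            new_rhs = tuple(tag + s if i in agree else s for i, s in enumerate(rhs))
            out.append((tag + lhs, new_rhs))
    return out


print(len(specialize(RULES, {"num": ("Sg", "Pl")})), "rules with number only")
print(len(specialize(RULES, FEATURES)), "rules with number and person")
```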

35
Subcategorization
  • Sneeze: John sneezed
      • *John sneezed [the book]NP
  • Say: You said [United has a flight]S
  • Prefer: I prefer [to leave earlier]TO-VP
      • *I prefer [United has a flight]S
  • Give: Give [me]NP [a cheaper fare]NP
  • Help: Can you help [me]NP [with a flight]PP
      • *Give [with a flight]PP

36
Subcategorization
  • Subcat expresses the constraints that a predicate
    (verb for now) places on the number and syntactic
    types of arguments it wants to take (occur with).

37
So?
  • So the various rules for VPs overgenerate
      • They permit the presence of strings containing verbs and arguments
        that don't go together
      • For example, given VP -> V NP,
      • "sneezed the book" is a VP, since "sneeze" is a verb and "the book"
        is a valid NP

38
Possible CFG Solution
  • VP -> V
  • VP -> V NP
  • VP -> V NP PP
  • VP -> IntransV
  • VP -> TransV NP
  • VP -> TransVwPP NP PP
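A common alternative to multiplying verb categories is to keep one rule per complement pattern and check each verb against a lexicon of subcategorization frames. The sketch below does that; it is not the lecture's method. The frames come from the slide-35 examples. Real verbs usually have several frames, which is why each entry is a list. `SUBCAT`, `licensed` and `plain_vp_rule` are hypothetical names.

```python
# Subcategorization frames taken from the slide-35 examples.
SUBCAT = {
    "sneeze": [()],
    "say": [("S",)],
    "prefer": [("TO-VP",)],
    "give": [("NP", "NP")],
    "help": [("NP", "PP")],
}


def licensed(verb, complements):
    """True if the verb lists this exact sequence of complement categories."""
    return tuple(complements) in SUBCAT.get(verb, [])


def plain_vp_rule(verb, complements):
    """What the single rule VP -> V NP checks: only the categories, not the verb."""
    return tuple(complements) == ("NP",)


print(plain_vp_rule("sneeze", ["NP"]))  # overgenerates "sneezed the book"
print(licensed("sneeze", ["NP"]))       # the lexicon rules it out
```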

39
Forward Pointer
  • It turns out that verb subcategorization facts will provide a key
    element for semantic analysis (determining who did what to whom in an
    event).

40
Movement
  • Core example
      • My travel agent booked the flight
      • [[My travel agent]NP [booked [the flight]NP]VP]S
      • I.e., "book" is a straightforward transitive verb. It expects a single
        NP arg within the VP as an argument, and a single NP arg as the subject.

41
Movement
  • What about:
      • Which flight do you want me to have the travel agent book?
  • The direct object argument to "book" isn't appearing in the right
    place. It is in fact a long way from where it's supposed to appear.
  • And note that it's separated from its verb by 2 other verbs.

42
CFGs: a summary
  • CFGs appear to be just about what we need to account for a lot of basic
    syntactic structure in English.
  • But there are problems
      • That can be dealt with adequately, although not elegantly, by staying
        within the CFG framework.
  • There are simpler, more elegant solutions that take us out of the CFG
    framework (beyond its formal power).
      • Syntactic theories: HPSG, LFG, CCG, Minimalism, etc.

43
Other syntactic stuff
  • Grammatical relations
      • Subject
          • I booked a flight to New York
          • The flight was booked by my agent
      • Object
          • I booked a flight to New York
      • Complement
          • I said that I wanted to leave

44
Dependency parsing
  • Word to word links instead of constituency
  • Based on the European rather than American
    traditions
  • But dates back to the Greeks
  • The original notions of Subject, Object and the
    progenitor of subcategorization (called
    valence) came out of Dependency theory.
  • Dependency parsing is quite popular as a
    computational model
  • since relationships between words are quite
    useful

45
Dependency parsing
Parse tree: nesting of multi-word constituents
Typed dependency parse: grammatical relations between individual words
46
Why are dependency parses useful?
  • Example multi-document summarization
  • Need to identify sentences from different
    documents that each say roughly the same thing
  • phrase structure trees of paraphrasing sentences
    which differ in word order can be significantly
    different
  • but dependency representations will be very
    similar
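Here is a small illustration of that point. The analyses are hand-written, not parser output. The relation labels follow the style of Stanford typed dependencies (nsubj, dobj, det, tmod), and the helper name is hypothetical. Once each parse is reduced to a set of (relation, head, dependent) triples, two paraphrases that differ only in word order compare as equal.

```python
def dependency_set(tokens, arcs):
    """Map (relation, head_index, dependent_index) arcs to (relation, head, dependent)
    word triples. Positions are dropped, so word order no longer matters."""
    words = [t.lower() for t in tokens]
    return {(rel, words[h], words[d]) for rel, h, d in arcs}


# Two paraphrases that differ only in where "yesterday" appears.
# The analyses are hand-written, not parser output.
S1 = "Yesterday I booked a flight".split()
ARCS1 = [("tmod", 2, 0), ("nsubj", 2, 1), ("det", 4, 3), ("dobj", 2, 4)]
S2 = "I booked a flight yesterday".split()
ARCS2 = [("nsubj", 1, 0), ("det", 3, 2), ("dobj", 1, 3), ("tmod", 1, 4)]

print(S1 == S2)  # word order differs
print(dependency_set(S1, ARCS1) == dependency_set(S2, ARCS2))  # same relations
```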

47
Parsing
  • Parsing: assigning correct trees to input strings
  • Correct tree: a tree that covers all and only the elements of the input
    and has an S at the top
  • For now: enumerate all possible trees
  • A further task, disambiguation, means choosing the correct tree from
    among all the possible trees.

48
Treebanks
  • Parsed corpora in the form of trees
  • The Penn Treebank
      • The Brown corpus
      • The WSJ corpus
  • Tgrep
      • http://www.ldc.upenn.edu/ldc/online/treebank/
  • Tregex
      • http://www-nlp.stanford.edu/nlp/javadoc/javanlp/

49
Parsing involves search
  • As with everything of interest, parsing involves a search, which
    involves making choices
  • We'll start with some basic (meaning bad) methods before moving on to
    the one or two that you need to know

50
For Now
  • Assume:
      • You have all the words already in some buffer
      • The input isn't POS tagged
      • We won't worry about morphological analysis
      • All the words are known

51
Top-Down Parsing
  • Since we're trying to find trees rooted with an S (sentences), start
    with the rules that give us an S.
  • Then work your way down from there to the words.
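Below is a minimal Python sketch of top-down search. It assumes a small hypothetical grammar and lexicon with no left-recursive rules, because this naive strategy cannot handle them. Starting from S, it tries each rule in the order listed and expands each right-hand side from left to right, backtracking through Python generators. `expand`, `expand_sequence` and `parse` are illustrative names. Because the input is not POS-tagged, a part-of-speech symbol succeeds if the lexicon lists it for the next word.

```python
# Hypothetical, simplified grammar (no left recursion) and lexicon.
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"], ["ProperNoun"]],
    "Nominal": [["Noun"], ["Noun", "Nominal"]],
    "VP": [["Verb"], ["Verb", "NP"]],
}
LEXICON = {
    "book": {"Noun", "Verb"}, "that": {"Det"}, "flight": {"Noun"},
    "does": {"Aux"}, "Houston": {"ProperNoun"},
}


def expand(symbol, words, i):
    """Yield (tree, next_position) for each way `symbol` covers words from i.

    Rules are tried in the order listed (depth-first) and each right-hand
    side is expanded left to right, backtracking through the generators."""
    if symbol not in GRAMMAR:  # part of speech: must match the next word
        if i < len(words) and symbol in LEXICON.get(words[i], set()):
            yield (symbol, words[i]), i + 1
        return
    for rhs in GRAMMAR[symbol]:
        yield from expand_sequence(symbol, rhs, words, i, [])


def expand_sequence(parent, rhs, words, i, children):
    if not rhs:  # every daughter found: build the parent node
        yield (parent, *children), i
        return
    for child, j in expand(rhs[0], words, i):
        yield from expand_sequence(parent, rhs[1:], words, j, children + [child])


def parse(sentence):
    """All trees rooted in S that cover every word of the sentence."""
    words = sentence.split()
    return [tree for tree, end in expand("S", words, 0) if end == len(words)]


print(parse("book that flight"))
```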

52
Top Down Space
53
Bottom-Up Parsing
  • Of course, we also want trees that cover the
    input words. So start with trees that link up
    with the words in the right way.
  • Then work your way up from there.
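A bottom-up counterpart can be sketched as an exhaustive shift-reduce search. This is one of several ways to organise bottom-up parsing, chosen here for brevity. The search shifts each word under every part of speech the hypothetical lexicon allows and reduces whenever the top of the stack matches a rule's right-hand side. It accepts when all the words have been reduced to a single S.

```python
# A hypothetical, simplified grammar as (lhs, rhs) pairs, and its lexicon.
GRAMMAR = [
    ("S", ("NP", "VP")), ("S", ("VP",)),
    ("NP", ("Det", "Nominal")),
    ("Nominal", ("Noun",)), ("Nominal", ("Noun", "Nominal")),
    ("VP", ("Verb",)), ("VP", ("Verb", "NP")),
]
LEXICON = {"book": ("Noun", "Verb"), "that": ("Det",), "flight": ("Noun",)}


def bottom_up(words):
    """Return the set of all trees for `words` rooted in S, as nested tuples."""
    results = set()

    def search(stack, i):
        # Accept: every word consumed and everything reduced to one S.
        if i == len(words) and len(stack) == 1 and stack[0][0] == "S":
            results.add(stack[0])
        # Reduce: replace the top k nodes by a new node if they match a rule.
        for lhs, rhs in GRAMMAR:
            k = len(rhs)
            if len(stack) >= k and tuple(node[0] for node in stack[-k:]) == rhs:
                search(stack[:-k] + [(lhs, *stack[-k:])], i)
        # Shift: push the next word under each of its possible parts of speech.
        if i < len(words):
            for pos in LEXICON[words[i]]:
                search(stack + [(pos, words[i])], i + 1)

    search([], 0)
    return results


for tree in bottom_up("book that flight".split()):
    print(tree)
```

The search terminates here because the grammar has no empty rules and no cycles of unary rules.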

54
Bottom-Up Space
55
Control
  • Of course, in both cases we left out how to keep
    track of the search space and how to make choices
  • Which node to try to expand next
  • Which grammar rule to use to expand a node

56
Top-Down, Depth-First, Left-to-Right Search
57
Example
58
Example
59
Example
60
Top-Down and Bottom-Up
  • Top-down
      • Only searches for trees that can be answers (i.e., Ss)
      • But also suggests trees that are not consistent with the words
  • Bottom-up
      • Only forms trees consistent with the words
      • But suggests trees that make no sense globally

61
So Combine Them
  • There are a million ways to combine top-down
    expectations with bottom-up data to get more
    efficient searches
  • Most use one kind as the control and the other as
    a filter
  • As in top-down parsing with bottom-up filtering

62
Adding Bottom-Up Filtering
63
3 problems with TDDFLtR Parser
  • Left-Recursion
  • Ambiguity
  • Inefficient reparsing of subtrees

64
Left-Recursion
  • What happens in the following situation?
      • S -> NP VP
      • S -> Aux NP VP
      • NP -> NP PP
      • NP -> Det Nominal
      • With the sentence starting with: Did the flight...
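The trouble is that a depth-first top-down parser that tries S -> NP VP first, and NP -> NP PP first, keeps expanding the leftmost NP without ever consuming "Did". The sketch below shows that loop for a few steps; the helper names are hypothetical. It also finds left-recursive non-terminals by checking which symbols can appear leftmost in their own expansions, assuming no rule has an empty right-hand side.

```python
# The grammar fragment from this slide.
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"]],
    "NP": [["NP", "PP"], ["Det", "Nominal"]],
}


def leftmost_expansions(grammar, symbol, steps):
    """Mimic a depth-first top-down parser that always tries the first rule
    for the leftmost symbol; return the sentential forms it builds."""
    form, history = [symbol], []
    for _ in range(steps):
        if form[0] not in grammar:  # reached a part of speech: could check a word
            break
        form = grammar[form[0]][0] + form[1:]
        history.append(" ".join(form))
    return history


def left_recursive(grammar):
    """Non-terminals A that can derive a form starting with A (no empty rules assumed)."""
    def left_corners(symbol):
        seen, frontier = set(), [symbol]
        while frontier:
            for rhs in grammar.get(frontier.pop(), []):
                if rhs and rhs[0] in grammar and rhs[0] not in seen:
                    seen.add(rhs[0])
                    frontier.append(rhs[0])
        return seen

    return {a for a in grammar if a in left_corners(a)}


print(leftmost_expansions(GRAMMAR, "S", 4))  # never reaches a word
print(left_recursive(GRAMMAR))
```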

65
Ambiguity
  • One morning I shot an elephant in my pajamas. How he got into my
    pajamas I don't know.
      • Groucho Marx

66
Lots of ambiguity
  • VP -> VP PP
  • NP -> NP PP
  • Show me the meal on flight 286 from SF to Denver
      • 14 parses!

67
Lots of ambiguity
  • Church and Patil (1982)
      • Number of parses for such sentences grows at the rate of the number
        of parenthesizations of arithmetic expressions
      • Which grow with the Catalan numbers
  • Number of parses by number of PPs:
        PPs   Parses
        1     2
        2     5
        3     14
        4     42
        5     132
        6     429
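As a quick check of the table, note that its first rows (2, 5, 14) are the Catalan numbers C(n+1) for n PPs. In the sketch below, a brute-force count of PP attachments in V NP PP_1 ... PP_n reproduces the same sequence. The attachment model is an assumption made for this illustration: each PP attaches to the verb or to any earlier noun, and no two attachments cross. The function names are hypothetical.

```python
from itertools import combinations, product
from math import comb


def catalan(n):
    """The n-th Catalan number, C(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


def count_attachments(num_pps):
    """Brute-force count of parses for V NP PP_1 ... PP_n.

    Heads are numbered V=0, object NP=1, and the NP inside PP_k is k+1.
    PP_k may attach to any head to its left; two attachments may not cross."""
    count = 0
    choices = [range(k + 1) for k in range(1, num_pps + 1)]
    for heads in product(*choices):
        arcs = [(h, k + 2) for k, h in enumerate(heads)]  # (head, PP position)
        # Arcs (ha, a) and (hb, b) with a < b cross exactly when ha < hb < a.
        crossing = any(ha < hb < a for (ha, a), (hb, _) in combinations(arcs, 2))
        if not crossing:
            count += 1
    return count


for n in range(1, 7):
    print(n, count_attachments(n), catalan(n + 1))
```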

68
Avoiding Repeated Work
  • Parsing is hard, and slow. It's wasteful to redo stuff over and over
    and over.
  • Consider an attempt to top-down parse the following as an NP:
      • A flight from Indianapolis to Houston on TWA
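The usual fix is to remember results. The sketch below contrasts a naive top-down recognizer with one memoized by `functools.lru_cache`, which computes each (category, start position) pair at most once. Dynamic-programming parsers build on the same idea. The grammar is a hypothetical right-recursive variant, since a top-down recognizer would loop on NP -> NP PP. `make_recognizer` is an illustrative name, and the call counts only show the difference in work.

```python
from functools import lru_cache

# Hypothetical grammar, written right-recursively so top-down search terminates.
GRAMMAR = {
    "NP": [["Det", "Noun", "PPs"], ["PropN", "PPs"]],
    "PPs": [[], ["PP", "PPs"]],  # zero or more PPs
    "PP": [["Prep", "NP"]],
}
LEXICON = {
    "a": "Det", "flight": "Noun", "from": "Prep", "to": "Prep", "on": "Prep",
    "indianapolis": "PropN", "houston": "PropN", "twa": "PropN",
}


def make_recognizer(words, memoize):
    """Return (spans, calls): spans(symbol, i) gives every j such that
    `symbol` can cover words[i:j]; calls["n"] counts how often a span
    was actually computed rather than looked up."""
    calls = {"n": 0}

    def spans(symbol, i):
        calls["n"] += 1
        if symbol not in GRAMMAR:  # part of speech: match exactly one word
            if i < len(words) and LEXICON.get(words[i].lower()) == symbol:
                return frozenset({i + 1})
            return frozenset()
        ends = set()
        for rhs in GRAMMAR[symbol]:
            positions = {i}
            for child in rhs:  # extend every partial match by one daughter
                positions = {j for p in positions for j in spans(child, p)}
            ends |= positions
        return frozenset(ends)

    if memoize:
        # Recursive calls look up `spans` at call time, so they hit the cache too.
        spans = lru_cache(maxsize=None)(spans)
    return spans, calls


words = "A flight from Indianapolis to Houston on TWA".split()
for memoize in (False, True):
    spans, calls = make_recognizer(words, memoize)
    print(f"memoize={memoize}: whole NP found={len(words) in spans('NP', 0)}, "
          f"subproblems computed={calls['n']}")
```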

73
Grammars and Parsing
  • Context-Free Grammars and Constituency
  • Some common CFG phenomena for English
  • Baby parsers: Top-down and Bottom-up Parsing
  • Thursday: Real parsers: Dynamic Programming parsing
      • CKY
      • Probabilistic parsing
      • Optional section: the Earley algorithm