Parsing: computing the grammatical structure of English sentences

1
Parsing: computing the grammatical structure of
English sentences
School of Computing FACULTY OF ENGINEERING
  • COMP3310 Natural Language Processing
  • Eric Atwell, Language Research Group
  • (with thanks to Katja Markert, Marti Hearst, and
    other contributors)

2
Reminder: Outline for Grammar/Parsing
  • Context-Free Grammars and Constituency
  • Some common CFG phenomena for English
  • Sentence-level constructions
  • NP, PP, VP
  • Coordination
  • Subcategorization
  • Top-down and Bottom-up Parsing
  • Chart Parsing

3
CFG example
  • S -> NP VP
  • NP -> Det NOMINAL
  • NOMINAL -> Noun
  • VP -> Verb
  • Det -> a
  • Noun -> flight
  • Verb -> left
  • Alternatively:
  • S -> NP VP
  • NP -> A N
  • VP -> V
  • A -> a
  • N -> flight
  • V -> left
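
As a rough illustration, the first grammar on this slide can be written as a Python dictionary and expanded exhaustively. The names `GRAMMAR` and `generate` are made up for this sketch, and the approach only works because the toy grammar is not recursive.

```python
# Toy grammar from the slide: each nonterminal maps to a list of right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "NOMINAL"]],
    "NOMINAL": [["Noun"]],
    "VP": [["Verb"]],
    "Det": [["a"]],
    "Noun": [["flight"]],
    "Verb": [["left"]],
}


def generate(symbol, grammar):
    """Yield every word sequence derivable from `symbol` (non-recursive grammars only)."""
    if symbol not in grammar:  # anything without a rule is treated as a word
        yield [symbol]
        return
    for rhs in grammar[symbol]:
        # Expand the right-hand side left to right, combining all alternatives.
        partial = [[]]
        for child in rhs:
            partial = [p + tail for p in partial for tail in generate(child, grammar)]
        yield from partial


sentences = [" ".join(words) for words in generate("S", GRAMMAR)]
print(sentences)  # ['a flight left']
```

With only one rule per symbol, the grammar licenses exactly one sentence; adding a second `Noun` or `Verb` rule would multiply the output accordingly.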

4
Derivations
  • A derivation is a sequence of rules applied to a
    string that accounts for that string
  • Covers all the elements in the string
  • Covers only the elements in the string
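
A derivation can be made concrete by rewriting the leftmost nonterminal one rule at a time. The sketch below assumes the toy grammar from the CFG example slide (one rule per symbol); `leftmost_derivation` is a hypothetical helper name.

```python
GRAMMAR = {
    "S": ["NP", "VP"],
    "NP": ["Det", "NOMINAL"],
    "NOMINAL": ["Noun"],
    "VP": ["Verb"],
    "Det": ["a"],
    "Noun": ["flight"],
    "Verb": ["left"],
}


def leftmost_derivation(start, grammar):
    """Return the list of sentential forms from `start` down to the words."""
    form = [start]
    steps = [list(form)]
    while True:
        # Find the leftmost symbol that still has a rule to apply.
        index = next((i for i, sym in enumerate(form) if sym in grammar), None)
        if index is None:  # only words remain: the derivation is complete
            return steps
        form = form[:index] + grammar[form[index]] + form[index + 1:]
        steps.append(list(form))


steps = leftmost_derivation("S", GRAMMAR)
for step in steps:
    print(" ".join(step))
```

The last line printed is the string itself, which is what "covers all and only the elements in the string" amounts to for this example.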

5
Bracketed Notation
  • [S [NP [PRO I]] [VP [V prefer] [NP [Det a] [Nom
    [N morning] [N flight]]]]]
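
Bracketed notation is easy to read back into a tree. The following sketch assumes the slide's example is written with square brackets, one label after each opening bracket; `read_bracketed` and `leaves` are illustrative names, not part of any library.

```python
import re


def read_bracketed(text):
    """Parse '[S [NP ...] ...]' into nested (label, children) tuples; words stay strings."""
    tokens = re.findall(r"\[|\]|[^\s\[\]]+", text)
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "[", "every constituent starts with '['"
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != "]":
            if tokens[pos] == "[":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return node()


def leaves(tree):
    """Collect the words of a tree from left to right."""
    _, children = tree
    words = []
    for child in children:
        words.extend(leaves(child) if isinstance(child, tuple) else [child])
    return words


tree = read_bracketed(
    "[S [NP [PRO I]] [VP [V prefer] [NP [Det a] [Nom [N morning] [N flight]]]]]"
)
print(tree[0], leaves(tree))  # S ['I', 'prefer', 'a', 'morning', 'flight']
```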

6
CFGs: a summary
  • CFGs appear to be just about what we need to
    account for a lot of basic syntactic structure in
    English.
  • But there are problems
  • They can be dealt with adequately, although not
    elegantly, by staying within the CFG framework.
  • There are simpler, more elegant solutions that
    take us out of the CFG framework (beyond its
    formal power).
  • Syntactic theories: TG, HPSG, LFG, GPSG, etc.

7
Other syntactic stuff
  • Grammatical relations or functions
  • Subject
  • I booked a flight to New York
  • The flight was booked by my agent
  • Object
  • I booked a flight to New York
  • Complement
  • I said that I wanted to leave

8
Dependency parsing
  • Word to word links instead of constituency
  • Based on the European rather than American
    traditions
  • But dates back to ancient Greek and Arab scholars
  • E.g. see the Quranic Arabic Corpus
  • The original notions of Subject, Object and the
    progenitor of subcategorization (called
    valence) came out of Dependency theory.
  • Dependency parsing is quite popular as a
    computational model
  • since relationships between words are quite
    useful

9
Dependency parsing
  • Parse tree: nesting of multi-word constituents
  • Typed dependency parse: grammatical relations
    between individual words
10
Why are dependency parses useful?
  • Example: multi-document summarization
  • Need to identify sentences from different
    documents that each say roughly the same thing
  • Phrase structure trees of paraphrasing sentences
    which differ in word order can be significantly
    different
  • But dependency representations will be very
    similar
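
The point can be illustrated with two paraphrases that differ only in word order. Everything in the sketch below is hand-written for illustration: the bracketings, the relation labels, and the sentences are assumptions, not output from a real parser or from the lecture.

```python
# Hypothetical phrase-structure bracketings for two paraphrases.
PHRASE_STRUCTURE = {
    "yesterday the agent booked a flight":
        "[S [Adv yesterday] [NP the agent] [VP [V booked] [NP a flight]]]",
    "the agent booked a flight yesterday":
        "[S [NP the agent] [VP [V booked] [NP a flight] [Adv yesterday]]]",
}

# Hypothetical typed dependencies as (head, relation, dependent) triples.
# Word order plays no role, so both sentences share one set of triples.
SHARED_TRIPLES = frozenset({
    ("booked", "subject", "agent"),
    ("agent", "det", "the"),
    ("booked", "object", "flight"),
    ("flight", "det", "a"),
    ("booked", "modifier", "yesterday"),
})
DEPENDENCIES = {sentence: SHARED_TRIPLES for sentence in PHRASE_STRUCTURE}


def jaccard(a, b):
    """Overlap of two sets: 1.0 means identical, 0.0 means disjoint."""
    return len(a & b) / len(a | b)


first, second = PHRASE_STRUCTURE
same_tree = PHRASE_STRUCTURE[first] == PHRASE_STRUCTURE[second]
overlap = jaccard(DEPENDENCIES[first], DEPENDENCIES[second])
print(same_tree, overlap)  # False 1.0
```

A summarizer could use such an overlap score to group sentences that say roughly the same thing, even when their trees look different.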

11
Parsing
  • Parsing: assigning correct trees to input strings
  • Correct tree: a tree that covers all and only the
    elements of the input and has an S at the top
  • For now: enumerate all possible trees
  • A further task: disambiguation means choosing
    the correct tree from among all the possible
    trees.

12
Treebanks
  • Parsed corpora in the form of trees
  • The Penn Treebank
  • The Brown corpus
  • The WSJ corpus
  • Tgrep
  • http://www.ldc.upenn.edu/ldc/online/treebank/
  • Tregex
  • http://www-nlp.stanford.edu/nlp/javadoc/javanlp/

13
Parsing involves search
  • As with everything of interest, parsing involves
    a search, which involves the making of choices
  • We'll start with some basic (meaning bad) methods
    before moving on to the one or two that you need
    to know

14
For Now
  • Assume:
  • You have all the words already in some buffer
  • The input isn't POS tagged
  • We won't worry about morphological analysis
  • All the words are known

15
Top-Down Parsing
  • Since we're trying to find trees rooted with an S
    (Sentence), start with the rules that give us an
    S.
  • Then work your way down from there to the words.
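
One minimal way to sketch this is a recursive-descent parser that starts at S and backtracks over rule choices. It assumes the toy grammar from the CFG example slide, with words treated as ordinary right-hand sides because the input is not POS tagged; `expand`, `expand_seq` and `parse` are illustrative names.

```python
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "NOMINAL"]],
    "NOMINAL": [["Noun"]],
    "VP": [["Verb"]],
    "Det": [["a"]],
    "Noun": [["flight"]],
    "Verb": [["left"]],
}


def expand(symbol, words, start):
    """Yield (tree, end) for every way `symbol` can cover words[start:end]."""
    if symbol not in GRAMMAR:  # a word: it must match the next input token
        if start < len(words) and words[start] == symbol:
            yield symbol, start + 1
        return
    for rhs in GRAMMAR[symbol]:  # try each rule for this symbol in order
        for children, end in expand_seq(rhs, words, start):
            yield (symbol, children), end


def expand_seq(symbols, words, start):
    """Yield (trees, end) for a sequence of symbols matched left to right."""
    if not symbols:
        yield [], start
        return
    for tree, mid in expand(symbols[0], words, start):
        for rest, end in expand_seq(symbols[1:], words, mid):
            yield [tree] + rest, end


def parse(words):
    """Return every S tree that covers all and only the input words."""
    return [tree for tree, end in expand("S", words, 0) if end == len(words)]


trees = parse("a flight left".split())
print(trees)
```

This sketch never terminates on a grammar with a left-recursive rule such as NP -> NP PP, because it keeps expanding the leftmost symbol without consuming any input.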

16
Top Down Space
17
Bottom-Up Parsing
  • Of course, we also want trees that cover the
    input words. So start with trees that link up
    with the words in the right way.
  • Then work your way up from there.
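
A bottom-up search can be sketched as repeatedly replacing any stretch of symbols that matches a rule's right-hand side with its left-hand side, until only S remains. This is a brute-force recognizer over the toy grammar from the CFG example slide, not an efficient algorithm; `RULES` and `bottom_up_recognize` are names invented here.

```python
from collections import deque

RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("Det", "NOMINAL")),
    ("NOMINAL", ("Noun",)),
    ("VP", ("Verb",)),
    ("Det", ("a",)),
    ("Noun", ("flight",)),
    ("Verb", ("left",)),
]


def bottom_up_recognize(words, rules=RULES, goal="S"):
    """Return True if the words can be reduced to the single symbol `goal`."""
    start = tuple(words)
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if form == (goal,):
            return True
        for lhs, rhs in rules:
            width = len(rhs)
            for i in range(len(form) - width + 1):
                if form[i:i + width] == rhs:
                    # Reduce: replace the matched right-hand side by its parent.
                    reduced = form[:i] + (lhs,) + form[i + width:]
                    if reduced not in seen:
                        seen.add(reduced)
                        queue.append(reduced)
    return False


accepted = bottom_up_recognize("a flight left".split())
rejected = bottom_up_recognize("left a flight".split())
print(accepted, rejected)  # True False
```

For "left a flight" the search still builds a VP and an NP, which are consistent with the words locally but can never combine into an S in that order.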

18
Bottom-Up Space
19
Control
  • Of course, in both cases we left out how to keep
    track of the search space and how to make choices
  • Which node to try to expand next
  • Which grammar rule to use to expand a node

20
Top-Down, Depth-First, Left-to-Right Search
21
Example
22
Example
23
Example
24
Top-Down and Bottom-Up
  • Top-down
  • Only searches for trees that can be answers (i.e.
    Ss)
  • But also suggests trees that are not consistent
    with the words
  • Bottom-up
  • Only forms trees consistent with the words
  • But suggests trees that make no sense globally

25
So Combine Them
  • There are a million ways to combine top-down
    expectations with bottom-up data to get more
    efficient searches
  • Most use one kind as the control and the other as
    a filter
  • As in top-down parsing with bottom-up filtering
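
One common form of bottom-up filtering uses a left-corner table: before expanding a rule top-down, check whether the next input word could begin its first symbol. The sketch below uses a small grammar assumed for illustration (the lexical rules are invented); `left_corners` and `viable_rules` are hypothetical helper names.

```python
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"]],
    "NP": [["Det", "Nominal"]],
    "Nominal": [["Noun"]],
    "VP": [["Verb"]],
    "Aux": [["did"]],
    "Det": [["the"]],
    "Noun": [["flight"]],
    "Verb": [["leave"]],
}


def left_corners(grammar):
    """Map each nonterminal to every symbol or word that can begin its expansions."""
    table = {symbol: {symbol} for symbol in grammar}
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                first = rhs[0]
                new = table.get(first, {first}) - table[lhs]
                if new:
                    table[lhs] |= new
                    changed = True
    return table


def viable_rules(symbol, next_word, grammar, table):
    """Keep only the rules for `symbol` whose first child can start with `next_word`."""
    return [
        rhs for rhs in grammar[symbol]
        if next_word in table.get(rhs[0], {rhs[0]})
    ]


TABLE = left_corners(GRAMMAR)
print(viable_rules("S", "did", GRAMMAR, TABLE))  # [['Aux', 'NP', 'VP']]
print(viable_rules("S", "the", GRAMMAR, TABLE))  # [['NP', 'VP']]
```

Top-down expectations still drive the search; the table only prunes rules that the next word rules out.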

26
Adding Bottom-Up Filtering
27
3 problems with the TDDFLtR (top-down, depth-first, left-to-right) parser
  • Left-Recursion
  • Ambiguity
  • Inefficient reparsing of subtrees

28
Left-Recursion
  • What happens in the following situation?
  • S -> NP VP
  • S -> Aux NP VP
  • NP -> NP PP
  • NP -> Det Nominal
  • With the sentence starting with
  • Did the flight
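
With these rules, a top-down, depth-first, left-to-right parser that tries S -> NP VP first can expand NP into NP PP, then NP PP PP, and so on, without ever consuming "Did". A small check for left-recursive symbols can flag this in advance. The sketch assumes a grammar with no empty right-hand sides; `left_recursive` is an illustrative name.

```python
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"]],
    "NP": [["NP", "PP"], ["Det", "Nominal"]],
}


def left_recursive(grammar):
    """Return the nonterminals that can reappear as their own leftmost descendant."""
    result = set()
    for start in grammar:
        stack = [rhs[0] for rhs in grammar[start]]
        seen = set()
        while stack:
            symbol = stack.pop()
            if symbol == start:  # reached the start symbol again on the left edge
                result.add(start)
                break
            if symbol in seen or symbol not in grammar:
                continue
            seen.add(symbol)
            stack.extend(rhs[0] for rhs in grammar[symbol])
    return result


recursive_symbols = left_recursive(GRAMMAR)
print(recursive_symbols)  # {'NP'}
```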

29
Ambiguity
  • One morning I shot an elephant in my pajamas.
    How he got into my pajamas I don't know.
  • Groucho Marx

30
Lots of ambiguity
  • VP -> VP PP
  • NP -> NP PP
  • Show me the meal on flight 286 from SF to Denver
  • 14 parses!

31
Lots of ambiguity
  • Church and Patil (1982)
  • Number of parses for such sentences grows at the
    rate of the number of parenthesizations of
    arithmetic expressions
  • Which grows with the Catalan numbers
  • PPs   Parses
  • 1     2
  • 2     5
  • 3     14
  • 4     42
  • 5     132
  • 6     429
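
The Catalan numbers have the closed form C(n) = (2n choose n) / (n + 1), giving 1, 1, 2, 5, 14, 42, 132, 429, 1430, ... The sketch below assumes that a sentence with k trailing PPs has C(k + 1) parses, an offset chosen to match the 14 parses quoted for the three-PP example; `parses_for_pps` is a hypothetical helper name.

```python
from math import comb


def catalan(n):
    """n-th Catalan number: C(n) = (2n choose n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


def parses_for_pps(k):
    # Assumption: k trailing PPs give C(k + 1) parses (3 PPs -> 14).
    return catalan(k + 1)


table = {k: parses_for_pps(k) for k in range(1, 7)}
for pps, parses in table.items():
    print(pps, parses)
```

The growth is exponential in the number of PPs, which is why enumerating every tree quickly becomes impractical.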

32
Chart Parser: Avoiding Repeated Work
  • Parsing is hard, and slow. It's wasteful to redo
    stuff over and over and over.
  • A CHART PARSER maintains a CHART: a table of
    partial parses found so far, to re-use if
    required.
  • Consider an attempt to top-down parse the
    following as an NP:
  • A flight from Indianapolis to Houston on TWA
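
To show the idea of a chart, here is a CKY-style table that stores, for each span of the input, how many ways each category can cover it. This bottom-up scheme is swapped in for simplicity and is not necessarily the chart algorithm the lecture goes on to present. The lexicon and the three binary rules are assumptions made for this sketch.

```python
from collections import defaultdict

# Assumed lexicon: place and airline names are treated directly as NPs.
LEXICON = {
    "a": "Det", "flight": "Noun",
    "from": "Prep", "to": "Prep", "on": "Prep",
    "Indianapolis": "NP", "Houston": "NP", "TWA": "NP",
}
# Assumed binary rules: (parent, left child, right child).
BINARY_RULES = [
    ("NP", "Det", "Noun"),
    ("NP", "NP", "PP"),
    ("PP", "Prep", "NP"),
]


def build_chart(words):
    """chart[(i, j)][label] = number of distinct parses of words[i:j] as `label`."""
    n = len(words)
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        chart[(i, i + 1)][LEXICON[word]] += 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # every split point inside the span
                for parent, left, right in BINARY_RULES:
                    # Re-use the counts already stored for the two shorter spans.
                    count = chart[(i, k)].get(left, 0) * chart[(k, j)].get(right, 0)
                    if count:
                        chart[(i, j)][parent] += count
    return chart


words = "a flight from Indianapolis to Houston on TWA".split()
chart = build_chart(words)
np_parses = chart[(0, len(words))]["NP"]
print(np_parses)  # 5
```

Each sub-span such as "to Houston on TWA" is analysed once and its entry is looked up every time a larger span needs it, instead of being reparsed.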

36
Grammars and Parsing
  • Context-Free Grammars and Constituency
  • Some common CFG phenomena for English
  • Basic parsers: Top-down and Bottom-up Parsing
  • Chart Parser: keep a CHART of partial parses