Parsing Natural Languages with Context-free Grammars

1
Parsing Natural Languages with Context-free
Grammars
  • Martin Volk
  • Computational Linguistics
  • Stockholm University
  • volk_at_ling.su.se

2
The Chomsky Hierarchy
3
The Chomsky Hierarchy
  • states restrictions on rules.
  • Given
  • A, B are non-terminals.
  • x is a string of terminals.
  • α, β, γ are arbitrary strings (of terminals and
    non-terminals).
  • then each rule is of the form
  • Type 3: A → xB or A → x
  • Type 2: A → γ
  • Type 1: α A β → α γ β where γ is not empty
  • Type 0: left side of the rule is not empty
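
The rule shapes listed above can be checked mechanically. The following is a minimal sketch, not a standard library routine: it assumes a rule is a pair of symbol tuples and that a symbol is a non-terminal exactly when it starts with an upper-case letter (both are conventions chosen here for illustration). The function name `rule_type` is hypothetical.

```python
def is_nonterminal(symbol):
    # Assumption of this sketch: upper-case initial = non-terminal.
    return symbol[0].isupper()


def rule_type(lhs, rhs):
    """Return the most restrictive type (3, 2, 1 or 0) whose shape the rule has."""
    if not lhs:
        raise ValueError("the left side of a rule must not be empty")
    if len(lhs) == 1 and is_nonterminal(lhs[0]):
        # Type 3: A -> x B or A -> x, with x a (possibly empty) terminal string.
        x = rhs[:-1] if rhs and is_nonterminal(rhs[-1]) else rhs
        if all(not is_nonterminal(s) for s in x):
            return 3
        return 2  # Type 2: a single non-terminal on the left
    # Type 1: alpha A beta -> alpha gamma beta with a non-empty gamma.
    for i, symbol in enumerate(lhs):
        if not is_nonterminal(symbol):
            continue
        alpha, beta = lhs[:i], lhs[i + 1:]
        keeps_context = (rhs[:len(alpha)] == alpha
                         and rhs[len(rhs) - len(beta):] == beta)
        if keeps_context and len(rhs) > len(alpha) + len(beta):
            return 1
    return 0


print(rule_type(("NP",), ("Det", "N")))          # 2
print(rule_type(("NP", "PP"), ("PP", "NP")))     # 0
```

The check looks only at the shape of a single rule; it says nothing about which class the generated language belongs to.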

4
Context-free grammars
  • (may) have rules like
  • NP → Det N
  • PP → Prep NP
  • cannot have rules like
  • NP PP → PP NP
  • ADV anfangen → fangen ADV an
  • This restriction has implications for the
    processing resources and speed.

5
Issues
  • Why do computational linguists use formal
    grammars for describing natural languages?
  • Are natural languages context-free languages?
  • Are there grammar formalisms that linguists
    prefer? → ID/LP-grammars

6
The goal of Natural Language Processing (NLP)
  • Given a natural language utterance (written or
    spoken)
  • Determine who did what to whom, when, where,
    how, why (for what reasons, for what purpose)?
  • Towards this goal: determine the syntactic
    structure of an utterance.

7
Steps to syntax analysis
  • For every word in the input string determine its
    word class.
  • Group all words into constituents.
  • Determine the linguistic functions (subject,
    object, etc.) of the constituents.
  • Determine the logical functions (agent,
    recipient, transferred object, place, time)

8
An example
  • A book was given to Mary by Peter.

  • Word classes: A/det book/noun was/aux given/verb to/prep Mary/name by/prep Peter/name
9
An example
  • A book was given to Mary by Peter.

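  • Word classes: A/det book/noun was/aux given/verb to/prep Mary/name by/prep Peter/name
  • Constituents: [noun phrase A book] [verb group was given] [prep phrase to Mary] [prep phrase by Peter]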
10
An example
  • A book was given to Mary by Peter.

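  • Word classes: A/det book/noun was/aux given/verb to/prep Mary/name by/prep Peter/name
  • Constituents: [noun phrase A book] [verb group was given] [prep phrase to Mary] [prep phrase by Peter]
  • Next level: verb phrase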
11
An example
  • A book was given to Mary by Peter.

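  • Word classes: A/det book/noun was/aux given/verb to/prep Mary/name by/prep Peter/name
  • Constituents: [noun phrase A book] [verb group was given] [prep phrase to Mary] [prep phrase by Peter]
  • Next level: verb phrase
  • Top level: sentence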
12
An example
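  • A book was given to Mary by Peter.

  • Word classes: A/det book/noun was/aux given/verb to/prep Mary/name by/prep Peter/name
  • Constituents: [noun phrase A book] [verb group was given] [prep phrase to Mary] [prep phrase by Peter]
  • Next level: verb phrase
  • Top level: passive sentence
  • Logical object: A book; logical subject: Peter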
13
Result
  • Agent (the giver): Peter
  • The object: a book
  • Recipient: Mary
  • Action: giving
  • When: in the past
  • Via inference:
  • Who has a book now? Mary

14
The context-free rules of a natural language
grammar
  • Noun_Phrase → Determiner Noun
  • a book
  • the house
  • some houses
  • 50 books
  • Peter's house

15
The context-free rules of a natural language
grammar
  • Adjective_Phrase → Adjective
  • Adjective_Phrase → Adverb Adjective
  • nice
  • nicest
  • very nice
  • hardly finished

16
The context-free rules of a natural language
grammar
  • Noun_Phrase → Det Adjective_Phrase Noun
  • a nice book
  • the old house
  • some very old houses
  • 50 green books

17
The context-free rules of a natural language
grammar
  • Prep_Phrase → Preposition Noun_Phrase
  • with a nice book
  • through the old house
  • in some very old houses
  • for 50 green books

18
The context-free rules of a natural language
grammar
  • (may) include recursion (direct and indirect)
  • Examples
  • NP → NP PP (the bridge over the Nile)
  • NP → NP Srelative (a student who likes this
    course)
  • Srelative → NP VP (who likes this course)

19
[NP [NP [det a] [noun student]]
    [Srel [NP [rel-pron who]] [VP [verb likes] [NP [det this] [noun course]]]]]
20
Formal Definition of a Context-free Grammar
  • A context-free grammar consists of
  • a set of non-terminal symbols N
  • a set of terminals Σ
  • a set of productions A → γ
  • where A ∈ N and γ is a string from (Σ ∪ N)*
  • a designated start symbol (from N)
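
The four components of the definition map directly onto a small data structure. This is a minimal sketch under stated assumptions: the class name `CFG`, its field names, and the `problems` method are hypothetical, and the toy grammar uses a handful of words from the earlier noun-phrase examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    nonterminals: frozenset   # N
    terminals: frozenset      # the terminal alphabet
    productions: tuple        # pairs (A, rhs), rhs a tuple of symbols
    start: str                # designated start symbol

    def problems(self):
        """List the ways in which this object violates the definition."""
        found = []
        if self.nonterminals & self.terminals:
            found.append("non-terminals and terminals must be disjoint")
        if self.start not in self.nonterminals:
            found.append("start symbol is not a non-terminal")
        vocabulary = self.nonterminals | self.terminals
        for lhs, rhs in self.productions:
            if lhs not in self.nonterminals:
                found.append(f"left side {lhs!r} is not a non-terminal")
            for symbol in rhs:
                if symbol not in vocabulary:
                    found.append(f"unknown symbol {symbol!r} in a rule for {lhs}")
        return found


toy = CFG(
    nonterminals=frozenset({"NP", "Det", "Noun"}),
    terminals=frozenset({"a", "the", "book", "house"}),
    productions=(
        ("NP", ("Det", "Noun")),
        ("Det", ("a",)), ("Det", ("the",)),
        ("Noun", ("book",)), ("Noun", ("house",)),
    ),
    start="NP",
)
print(toy.problems())   # [] for a well-formed grammar
```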

21
Context-free grammars for natural language
  • A set of non-terminal symbols N
  • word class symbols (N, V, Adj, Adv, P)
  • linguistic constituent symbols (NP, VP, AdjP,
    AdvP, PP)
  • A set of terminals Σ
  • all words of the English language
  • A set of productions A → γ
  • the grammar rules (e.g. NP → Det AdjP N)
  • A designated start symbol
  • a symbol for the complete sentence

22
How many ...?
  • non-terminals do we need?
  • word class symbols (N, V, Adj, Adv, P)
  • usually between 20 and 50
  • linguistic constituent symbols (NP, VP, ...)
  • usually between 10 and 20
  • terminals do we need?
  • words of the English language?
  • Different word stems (see, walk, give)
  • > 50000
  • Different word forms (see, sees, saw, seen)
  • > 100000

23
How many ...?
  • grammar rules do we need?
  • NP → Name (Mary, Peter)
  • NP → Det Noun (a book)
  • PP → Prep NP (to Mary)
  • VP → V NP PP (gave a book to Mary)
  • VP → V NP NP (gave Mary a book)
  • Problem: This grammar will also accept
  • Peter give Mary a books. (agreement problem)
  • Peter sees Mary a book. (complement problem)
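
The overgeneration can be reproduced with a small recognizer. This is a sketch only: the rule `S -> NP VP`, the word lists, and the function name `accepts` are assumptions added for illustration, and the search strategy (try every split point, with memoization) is just one simple way to test membership.

```python
from functools import lru_cache

RULES = {
    "S": [("NP", "VP")],                    # assumed; not among the rules listed above
    "NP": [("Name",), ("Det", "Noun")],
    "PP": [("Prep", "NP")],
    "VP": [("V", "NP", "PP"), ("V", "NP", "NP")],
}
LEXICON = {                                 # assumed toy lexicon
    "Name": {"Peter", "Mary"},
    "Det": {"a"},
    "Noun": {"book", "books"},
    "Prep": {"to"},
    "V": {"give", "gave", "sees"},
}


def accepts(sentence):
    words = tuple(sentence.rstrip(".").split())

    @lru_cache(maxsize=None)
    def derives(symbol, i, j):
        """True if words[i:j] can be derived from symbol."""
        if symbol in LEXICON:
            return j - i == 1 and words[i] in LEXICON[symbol]
        return any(matches(rhs, i, j) for rhs in RULES[symbol])

    @lru_cache(maxsize=None)
    def matches(rhs, i, j):
        """True if words[i:j] can be split to fit the symbols in rhs."""
        if not rhs:
            return i == j
        first, rest = rhs[0], rhs[1:]
        # Every symbol covers at least one word, so leave room for the rest.
        return any(derives(first, i, k) and matches(rest, k, j)
                   for k in range(i + 1, j - len(rest) + 1))

    return derives("S", 0, len(words))


for s in ["Peter gave Mary a book.",
          "Peter give Mary a books.",
          "Peter sees Mary a book."]:
    print(s, accepts(s))
```

All three sentences are accepted, because nothing in the rules mentions number or the complements a particular verb allows.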

24
Agreement: Why bother?
  • Peter give Mary a books.
  • Consider
  • Peter threw the books into the garbage can that
    are old and grey.
  • Peter threw the books into the garbage can that
    is old and grey.
  • Agreement can help us determine the intended
    meaning!

25
Agreement: First approach
  • NP_sg → Name_sg (Mary, Peter)
  • NP_sg → Det_sg Noun_sg (a book)
  • NP_pl → Det_pl Noun_pl (the books)
  • PP → Prep NP_sg (to Mary)
  • PP → Prep NP_pl (for the books)
  • VP → V NP_sg NP_sg (gave Mary a book)
  • VP → V NP_sg NP_pl (gave Mary the books)
  • VP → V NP_pl NP_sg (gave the kids a book)
  • VP → V NP_pl NP_pl (gave the kids the books)
  • Combinatorial explosion: too many rules
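
The growth is easy to quantify: a rule with two independently marked noun phrases needs one copy per pair of feature values. A minimal sketch follows; the function name and the extra case feature are hypothetical and serve only to show how the count scales.

```python
from itertools import product


def spelled_out_vp_rules(values):
    """One VP rule per combination of feature values on the two object NPs."""
    return [f"VP -> V NP_{x} NP_{y}" for x, y in product(values, repeat=2)]


number = ["sg", "pl"]
print(len(spelled_out_vp_rules(number)))            # 4 rules for number alone

# Hypothetical extension: also distinguish three cases on each NP.
number_and_case = [f"{n}.{c}" for n, c in product(number, ["nom", "dat", "acc"])]
print(len(spelled_out_vp_rules(number_and_case)))   # 36 rules
```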

26
Agreement: Better approach
  • Variables ensure agreement via feature
    unification.
  • NP_Num → Name_Num
  • Mary, Peter
  • NP_Num → Det_Num Noun_Num
  • a book, the books
  • PP → Prep NP_X
  • to Mary, for the books
  • VP_Num → V_Num NP_X NP_Y
  • give Mary a book, gives Mary the books
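
The idea of a shared variable can be sketched with a heavily simplified form of unification over atomic values. This is not full feature-structure unification: `None` stands for an unbound variable, and the lexicon entries and the names `unify`, `np_number`, and `agrees` are assumptions made for illustration.

```python
class Clash(Exception):
    """Raised when two feature values cannot be unified."""


def unify(x, y):
    # None plays the role of an unbound variable.
    if x is None:
        return y
    if y is None or x == y:
        return x
    raise Clash(f"{x} vs {y}")


# Assumed number features; "the" is compatible with singular and plural.
NUM = {"a": "sg", "the": None, "book": "sg", "books": "pl"}


def np_number(det, noun):
    """The single Num variable shared by NP, Det and Noun in the NP rule."""
    return unify(NUM[det], NUM[noun])


def agrees(det, noun):
    try:
        np_number(det, noun)
        return True
    except Clash:
        return False


print(agrees("a", "book"), agrees("the", "books"), agrees("a", "books"))
```

One rule with a variable replaces the separate singular and plural copies; "a books" fails because `sg` and `pl` do not unify.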

27
Subcategorization
  • Verbs have preferences for the kinds of
    constituents they co-occur with.
  • For example
  • VP → Verb (disappear)
  • VP → Verb NP (prefer a morning flight)
  • VP → Verb NP PP (leave Boston in the morning)
  • VP → Verb PP (leaving on Thursday)
  • But not: I disappeared the cat.
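
One common way to encode such preferences is a lexicon that lists the complement frames each verb allows. The table below is a minimal sketch containing only the frames from the examples above; the names `SUBCAT` and `licensed` are hypothetical.

```python
# Each verb maps to the set of complement sequences it licenses.
SUBCAT = {
    "disappear": {()},                      # no complements
    "prefer": {("NP",)},
    "leave": {("NP", "PP"), ("PP",)},
}


def licensed(verb, complements):
    """True if the verb is listed with exactly this sequence of complements."""
    return tuple(complements) in SUBCAT.get(verb, set())


print(licensed("leave", ["NP", "PP"]))      # True
print(licensed("disappear", ["NP"]))        # False: "I disappeared the cat"
```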

28
Parsing as Search
  • Top-down Parsing
  • Bottom-up Parsing
  • → see Jurafsky slides

29
That sounds nice
  • where is the problem?

30
From the Financial Times of Nov. 23, 2004, at
http://news.ft.com/home/europe
  • McDonald's CEO steps down to battle cancer
  • By Neil Buckley in New York
  • Published: November 23 2004 00:51
  • Last updated: November 23 2004 00:51
  • McDonald's said on Monday night Charlie Bell
    would step down as chief executive to devote his
    time to battling colorectal cancer, dealing
    another blow to the world's largest fast food
    company.
  • Mr Bell's resignation comes just seven months
    after James Cantalupo, its former chairman and
    chief executive, died from a heart attack.
  • McDonald's moved quickly to close the gap,
    appointing Jim Skinner, currently vice-chairman,
    to the chief executive's role.

31
Problems when parsing natural language sentences
  • Words that are (perhaps) not in the lexicon.
  • Proper names
  • James Cantalupo, McDonald's, InterContinental, GE
  • Compound words → need to be segmented
  • kurskamrater, kurslitteratur, kursavsnitt,
    kursplaneundersökningarna, kursförluster
  • valutakurs, snabbkurs, säljkurser, aktiekurser,
    valutakursindex
  • Foreign language expressions
  • Don Kerr är Mellanösternspecialist på The
    International Institute for Strategic Studies i
    London, högt ansedd, oberoende thinktank.
  • Multiword expressions
  • Idioms: to deal another blow
  • Metaphors: to battle cancer

32
Problems when parsing natural language sentences
  • Ambiguities
  • Word level (kurs as in valutakurs or kurskamrat)
  • Sentence level
  • He sees the man with the telescope.
  • Old men and women left the occupied city.
  • Additional knowledge sources are needed to
    resolve ambiguities
  • More world knowledge
  • Statistical knowledge (Parsing preferences)
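
The sentence-level ambiguity can be made concrete by counting the analyses a grammar assigns. The grammar and lexicon below are assumptions chosen to cover only the telescope sentence, and `count_parses` is a hypothetical helper that counts trees by trying every split point with memoization.

```python
from functools import lru_cache

RULES = {                                   # assumed toy grammar
    "S": [("NP", "VP")],
    "NP": [("Pron",), ("Det", "Noun"), ("NP", "PP")],
    "VP": [("V", "NP"), ("VP", "PP")],
    "PP": [("Prep", "NP")],
}
LEXICON = {                                 # assumed toy lexicon
    "Pron": {"he"}, "Det": {"the"}, "Noun": {"man", "telescope"},
    "V": {"sees"}, "Prep": {"with"},
}


def count_parses(sentence, start="S"):
    words = tuple(sentence.lower().rstrip(".").split())

    @lru_cache(maxsize=None)
    def trees(symbol, i, j):
        """Number of distinct trees rooted in symbol that cover words[i:j]."""
        if symbol in LEXICON:
            return int(j - i == 1 and words[i] in LEXICON[symbol])
        return sum(splits(rhs, i, j) for rhs in RULES[symbol])

    @lru_cache(maxsize=None)
    def splits(rhs, i, j):
        if not rhs:
            return int(i == j)
        first, rest = rhs[0], rhs[1:]
        # Leaving at least one word per remaining symbol keeps the
        # left-recursive rules (NP -> NP PP, VP -> VP PP) from looping.
        return sum(trees(first, i, k) * splits(rest, k, j)
                   for k in range(i + 1, j - len(rest) + 1))

    return trees(start, 0, len(words))


print(count_parses("He sees the man with the telescope."))   # 2
```

The two trees differ in where the prepositional phrase attaches: to the verb phrase (seeing with the telescope) or to the noun phrase (the man who has the telescope). The grammar alone cannot choose between them.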

33
How can we obtain statistical preferences?
  • From a parsed and manually checked corpus (a
    collection of sentences)
  • Such a corpus is usually a database that contains
    the correct syntax tree with each sentence
    (therefore called a treebank).
  • Building a treebank is very time-consuming.
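
A simple way to turn a treebank into parsing preferences is to count how often each rule is used and divide by the count of its left-hand side (relative-frequency estimation). The two trees below are invented toy data, not taken from any real treebank, and the function names are hypothetical.

```python
from collections import Counter

# Invented toy treebank: each tree is (label, child, ...); leaves are words.
TREEBANK = [
    ("S", ("NP", ("Name", "Peter")),
          ("VP", ("V", "sees"), ("NP", ("Det", "a"), ("Noun", "book")))),
    ("S", ("NP", ("Name", "Mary")),
          ("VP", ("V", "sees"), ("NP", ("Name", "Peter")))),
]


def productions(tree):
    """Yield (lhs, rhs) for every non-lexical node of the tree."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return                              # word class over a word: skip
    yield label, tuple(child[0] for child in children)
    for child in children:
        yield from productions(child)


def rule_probabilities(treebank):
    rule_counts = Counter(p for tree in treebank for p in productions(tree))
    lhs_counts = Counter()
    for (lhs, _), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}


for (lhs, rhs), p in rule_probabilities(TREEBANK).items():
    print(lhs, "->", " ".join(rhs), p)
```

In this toy data NP is rewritten as Name three times out of four, so that rule receives probability 0.75.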

34
  • Can all the syntax of natural language be
    described with context-free rules?
  • Are there phenomena in natural language that
    require context-sensitive rules?

35
Limits of Context-free Grammars
  • It is not possible to write a context-free
    grammar
  • (or to design a Push-Down Automaton (PDA))
  • for the language L = { a^n b^n a^n | n > 0 }
  • Why?
  • Intuitively: The memory component of a PDA works
    like a stack. One stack! So, it can only be used
    to count once.
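
The stack intuition can be illustrated in code. The first function below imitates a push-down automaton for the simpler language of n a's followed by n b's; the second tests membership in the three-block language directly, using unrestricted counting that a single stack does not provide. Both function names are hypothetical, and the code illustrates the intuition rather than proving anything.

```python
import re


def accepts_anbn(s):
    """PDA-style check: push one symbol per a, pop one per b."""
    stack, i = [], 0
    while i < len(s) and s[i] == "a":
        stack.append("a")
        i += 1
    while i < len(s) and s[i] == "b" and stack:
        stack.pop()
        i += 1
    # At this point the stack is empty: the count n has been used up,
    # so nothing is left to compare a third block against.
    return len(s) > 0 and i == len(s) and not stack


def in_anbnan(s):
    """Direct membership test with three independent counts (not a PDA)."""
    m = re.fullmatch(r"(a+)(b+)(a+)", s)
    return bool(m) and len(m.group(1)) == len(m.group(2)) == len(m.group(3))


print(accepts_anbn("aabb"), in_anbnan("aabbaa"), in_anbnan("aabba"))
```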

36
Are natural languages context-free?
  • Yes!
  • But there is a famous paper about some
    constructions in Swiss German of the form
  • w a^n b^m x c^n d^m y
  • Jan säit, das mer (em Hans) (es huus) (hälfed)
    (aastriiche).
  • Jan säit, das mer (d'chind)^n (em Hans)^m (es huus)
    (haend wele laa)^n (hälfe)^m (aastriiche).
  • but they are rather strange and rare.
  • The claim that they are not context-free relies
    on the assumption that n and m are unbounded.

37
The notion of context
  • We need context to understand a natural
    language utterance!
  • This notion of context is different from the
    notion of context in the name context-free
    languages.

38
Do linguists like context-free grammars? Not
really
39
Linguists want
  • to express grammar rules on different abstract
    levels.
  • For example: Instead of saying
  • NP → NP Conj NP (the boy and the girl)
  • VP → VP Conj VP (sang and danced)
  • AdjP → AdjP Conj AdjP (wise and very famous)
  • they would like to say
  • XP → XP Conj XP

40
Linguists want
  • (to be able) to state dominance and precedence
    separately.
  • Peter dropped the course happily.
  • Happily Peter dropped the course.
  • S → Adv S
  • S → S Adv

41
Context-free Grammars
  • Context-free grammar rules encode both
  • Dominance and
  • Precedence information
  • Example
  • A → B C D
  • A dominates B and C and D
  • and B precedes C which in turn precedes D

42
ID/LP-Grammars
  • ID/LP-Grammars have separate rules
  • ID (Immediate dominance) rules and
  • LP (Linear precedence) rules.
  • Example
  • ID-rule: A → B, C, D
  • A dominates B and C and D
  • LP-rule: B < C
  • B precedes C
  • ID/LP Grammars have been proposed in Linguistics,
    e.g. in Generalized Phrase Structure Grammar
    (GPSG by Gazdar, Klein, Pullum, Sag, 1985)

43
ID/LP-Grammars
  • Example from German
  • Gestern hat [VP der Professor der Sekretärin
    diese Blumen geschenkt]. (Yesterday the professor
    gave the secretary these flowers.)
  • Gestern hat [VP der Professor diese Blumen der
    Sekretärin geschenkt].
  • Gestern hat [VP diese Blumen der Professor der
    Sekretärin geschenkt].
  • Gestern hat [VP diese Blumen der Sekretärin der
    Professor geschenkt].
  • Gestern hat [VP der Sekretärin der Professor
    diese Blumen geschenkt].
  • Gestern hat [VP der Sekretärin diese Blumen der
    Professor geschenkt].

44
ID/LP-Grammars
  • The German verb phrase (or Mittelfeld) consists
    of
  • an NP_nominative
  • an NP_dative
  • an NP_accusative
  • a verb
  • Accounting for all order variations requires
    6 context-free grammar rules,
  • but it requires only one ID-rule plus one
    LP-rule
  • VP → NP_accusative, NP_dative, NP_nominative, V
  • NP < V
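
The count of six can be checked by expanding the ID-rule into every daughter order that the LP-rule allows. This is a minimal sketch: the function names are hypothetical, and it assumes that an LP-rule stated for NP applies to every symbol whose name starts with `NP_`.

```python
from itertools import permutations


def category(symbol):
    """NP_dative -> NP; plain symbols such as V are returned unchanged."""
    return symbol.split("_")[0]


def respects(order, lp_rules):
    """True if no LP pair (before, after) is violated in this order."""
    for before, after in lp_rules:
        for i, left in enumerate(order):
            for right in order[i + 1:]:
                # Violation: an `after` symbol stands left of a `before` symbol.
                if category(left) == after and category(right) == before:
                    return False
    return True


def expand(lhs, daughters, lp_rules):
    """Context-free rules equivalent to one ID-rule under the LP-rules."""
    return [(lhs, order) for order in permutations(daughters)
            if respects(order, lp_rules)]


rules = expand("VP", ["NP_nominative", "NP_dative", "NP_accusative", "V"],
               [("NP", "V")])
for lhs, rhs in rules:
    print(lhs, "->", " ".join(rhs))
```

Of the 24 possible orders of the four daughters, only the 6 with the verb in final position survive the LP-rule.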

45
ID/LP-Grammars vs. Context-free Grammars
  • All ID/LP-grammars can be transformed into
    strongly equivalent context-free grammars.
  • Some cf grammars cannot be transformed into
    strongly equivalent ID/LP grammars.
  • Example: The cf grammar consisting of the rule
    A → a c a
  • cannot be transformed into a strongly equivalent
    ID/LP grammar, because of contradictory ordering
    constraints
  • a before c AND c before a
  • An additional non-terminal is required
  • ID-rules: A → Z, a    Z → a, c
  • LP-rules: Z < a, a < c
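
The contradiction can be detected by collecting the precedence pairs that the ordered right-hand sides imply and looking for a pair required in both directions. This is a sketch with hypothetical function names; it checks pairwise conflicts only, which is enough for this example.

```python
def precedence_pairs(rhs):
    """All (earlier, later) symbol pairs implied by one ordered right side."""
    return {(rhs[i], rhs[j])
            for i in range(len(rhs)) for j in range(i + 1, len(rhs))}


def lp_consistent(rules):
    """True if no pair is needed in both orders and no symbol precedes itself.

    LP statements hold for the whole grammar, so the pairs of all rules
    are pooled before checking.
    """
    pairs = set()
    for _, rhs in rules:
        pairs |= precedence_pairs(rhs)
    return all(x != y and (y, x) not in pairs for x, y in pairs)


print(lp_consistent([("A", ("a", "c", "a"))]))                 # False
print(lp_consistent([("A", ("Z", "a")), ("Z", ("a", "c"))]))   # True
```

With the extra non-terminal Z the pooled pairs are Z before a and a before c, which do not conflict.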

46
Summary
  • Why do computational linguists use formal
    grammars for describing natural languages?
  • As an intermediate step to capture the meaning of
    natural language utterances.
  • Are natural languages context-free languages?
  • The syntax of natural languages can be described
    with context-free grammars (in general).
  • What grammar formalisms do linguists prefer?
  • Linguists want to describe natural language as
    precisely and as conveniently as possible. They
    prefer grammar formalisms with feature variables,
    metarules, ID/LP separation, schemata, abstract
    rules

47
Any Questions?