Title: Parsing Natural Languages with Context-free Grammars
- Martin Volk
- Computational Linguistics
- Stockholm University
- volk_at_ling.su.se
The Chomsky Hierarchy
- It states restrictions on grammar rules.
- Given
- A, B are non-terminals.
- x is a string of terminals.
- α, β, γ are arbitrary strings (of terminals and non-terminals).
- then each rule is of the form
- Type 3: A → xB or A → x
- Type 2: A → α
- Type 1: βAγ → βαγ, where α is not empty
- Type 0: the left side of the rule is not empty
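The four rule shapes can be checked mechanically. The sketch below assumes a rule is given as two sequences of symbols and that every symbol not in the non-terminal set counts as a terminal. The function name chomsky_type is ours, not from the slides. It returns the most restrictive type whose shape the rule matches:

```python
from typing import AbstractSet, Optional, Sequence


def chomsky_type(lhs: Sequence[str], rhs: Sequence[str],
                 nonterminals: AbstractSet[str]) -> Optional[int]:
    """Return the most restrictive Chomsky type (3, 2, 1, 0) a rule fits.

    Symbols in `nonterminals` are non-terminals; every other symbol is
    treated as a terminal. Returns None if the left side is empty.
    """
    lhs, rhs = tuple(lhs), tuple(rhs)
    if not lhs:
        return None  # not even a Type 0 rule
    single_nt = len(lhs) == 1 and lhs[0] in nonterminals
    # Type 3: A -> x B or A -> x (x = string of terminals, B optional).
    body = rhs[:-1] if rhs and rhs[-1] in nonterminals else rhs
    if single_nt and all(sym not in nonterminals for sym in body):
        return 3
    # Type 2: A -> alpha (any string on the right side).
    if single_nt:
        return 2
    # Type 1: beta A gamma -> beta alpha gamma, with alpha not empty.
    for i, sym in enumerate(lhs):
        if sym not in nonterminals:
            continue
        beta, gamma = lhs[:i], lhs[i + 1:]
        if (len(rhs) > len(beta) + len(gamma)
                and rhs[:len(beta)] == beta
                and rhs[len(rhs) - len(gamma):] == gamma):
            return 1
    return 0
```

For example, NP PP → PP NP comes out as Type 0 here, because it does not have the shape βAγ → βαγ either.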
Context-free grammars
- (may) have rules like
- NP → Det N
- PP → Prep NP
- cannot have rules like
- NP PP → PP NP
- ADV anfangen → fangen ADV an
- This restriction has implications for the processing resources and speed.
Issues
- Why do computational linguists use formal grammars for describing natural languages?
- Are natural languages context-free languages?
- Are there grammar formalisms that linguists prefer? → ID/LP-grammars
The goal of Natural Language Processing (NLP)
- Given a natural language utterance (written or spoken),
- determine who did what to whom, when, where, how and why (for what reasons, for what purpose).
- Towards this goal: determine the syntactic structure of the utterance.
Steps to syntax analysis
- For every word in the input string, determine its word class.
- Group all words into constituents.
- Determine the linguistic functions (subject, object, etc.) of the constituents.
- Determine the logical functions (agent, recipient, transferred object, place, time, …).
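Steps 1 and 2 can be illustrated on the example sentence of the next slide. This is a minimal sketch. It assumes a hypothetical hand-written lexicon (LEXICON) with exactly one word class per word and a hypothetical table of two-word chunk patterns (PATTERNS). A real system needs a full lexicon, ambiguity handling and a grammar instead of fixed patterns:

```python
# Hypothetical toy lexicon: each word has exactly one word class here.
LEXICON = {
    "a": "det", "book": "noun", "was": "aux", "given": "verb",
    "to": "prep", "by": "prep", "mary": "name", "peter": "name",
}

# Hypothetical chunk patterns: two adjacent word classes -> constituent label.
PATTERNS = {
    ("det", "noun"): "noun phrase",
    ("aux", "verb"): "verb group",
    ("prep", "name"): "prep phrase",
}


def tag(sentence):
    """Step 1: look up the word class of every word (no ambiguity handling)."""
    words = sentence.rstrip(".").split()
    return [(word, LEXICON[word.lower()]) for word in words]


def chunk(tagged):
    """Step 2: group adjacent words into flat constituents, left to right."""
    chunks, i = [], 0
    while i < len(tagged):
        if i + 1 < len(tagged):
            label = PATTERNS.get((tagged[i][1], tagged[i + 1][1]))
            if label is not None:
                chunks.append((label, [tagged[i][0], tagged[i + 1][0]]))
                i += 2
                continue
        # No pattern matched: the word forms a constituent on its own.
        chunks.append((tagged[i][1], [tagged[i][0]]))
        i += 1
    return chunks
```

On "A book was given to Mary by Peter." this yields a noun phrase (A book), a verb group (was given) and two prep phrases (to Mary, by Peter). Steps 3 and 4 are not covered by the sketch.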
An example
- A book was given to Mary by Peter.
- Word classes: A (det) book (noun) was (aux) given (verb) to (prep) Mary (name) by (prep) Peter (name)
- Constituents: [noun phrase A book] [verb group was given] [prep phrase to Mary] [prep phrase by Peter]
- Verb phrase: [verb phrase [verb group was given] [prep phrase to Mary] [prep phrase by Peter]]
- Sentence: [passive sentence [noun phrase A book] [verb phrase was given to Mary by Peter]]
- Logical subject: Peter; logical object: a book
Result
- Agent (the giver): Peter
- Object: a book
- Recipient: Mary
- Action: giving
- When: in the past
- Via inference:
- Who has a book now? Mary
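The mapping from the constituents of this passive sentence to the result above can be sketched for this one sentence type. The heuristics are assumptions for illustration only: the first constituent is the grammatical subject, a form of "be" signals the passive, "by" marks the agent and "to" marks the recipient. The function names are also assumptions:

```python
PAST_BE = {"was", "were"}
BE_FORMS = PAST_BE | {"am", "is", "are"}


def logical_roles(chunks):
    """Map the flat constituents of a simple passive give-sentence to roles.

    chunks: list of (label, words) pairs, e.g. ("prep phrase", ["to", "Mary"]).
    Toy heuristics only, not a general method.
    """
    verb_group = next(words for label, words in chunks if label == "verb group")
    aux = verb_group[0].lower()
    if aux not in BE_FORMS:
        raise ValueError("this sketch only handles passive sentences")
    roles = {
        "object": " ".join(chunks[0][1]),  # grammatical subject = logical object
        "action": verb_group[-1],
        "when": "past" if aux in PAST_BE else "present",
    }
    for label, words in chunks:
        if label == "prep phrase":
            prep, rest = words[0].lower(), " ".join(words[1:])
            if prep == "by":
                roles["agent"] = rest
            elif prep == "to":
                roles["recipient"] = rest
    return roles


def holder_after_giving(roles):
    """Toy inference rule: after an act of giving, the recipient has the object."""
    return roles["recipient"]
```

Given the four constituents of "A book was given to Mary by Peter.", this yields Peter as the agent, "A book" as the object, Mary as the recipient and "past" for the time. The inference step then answers "Mary".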
The context-free rules of a natural language grammar
- Noun_Phrase → Determiner Noun
- a book
- the house
- some houses
- 50 books
- Peter's house
The context-free rules of a natural language grammar
- Adjective_Phrase → Adjective
- Adjective_Phrase → Adverb Adjective
- nice
- nicest
- very nice
- hardly finished
The context-free rules of a natural language grammar
- Noun_Phrase → Det Adjective_Phrase Noun
- a nice book
- the old house
- some very old houses
- 50 green books
The context-free rules of a natural language grammar
- Prep_Phrase → Preposition Noun_Phrase
- with a nice book
- through the old house
- in some very old houses
- for 50 green books
The context-free rules of a natural language grammar
- (may) include recursion (direct and indirect)
- Examples
- NP → NP PP: the bridge over the Nile
- NP → NP Srelative: a student who likes this course
- Srelative → NP VP: who likes this course
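Recursion means the grammar licenses infinitely many noun phrases, so a naive generator would never stop. The sketch below uses the rules NP → Det N, NP → NP PP and PP → Prep NP with a hypothetical toy lexicon. The depth bound is our addition, there only to keep the enumeration finite:

```python
import itertools

# Rules from the slides (NP -> NP PP is directly recursive); the lexicon is a toy.
GRAMMAR = {
    "NP": [["Det", "N"], ["NP", "PP"]],
    "PP": [["Prep", "NP"]],
    "Det": [["the"]],
    "N": [["bridge"], ["Nile"]],
    "Prep": [["over"]],
}


def generate(symbol, depth):
    """Yield every word sequence derivable from `symbol` using at most `depth`
    nested rule applications; the bound stops the recursion from looping."""
    if symbol not in GRAMMAR:
        yield [symbol]  # terminal symbol: the word itself
        return
    if depth == 0:
        return
    for rhs in GRAMMAR[symbol]:
        parts = [list(generate(sym, depth - 1)) for sym in rhs]
        for combo in itertools.product(*parts):
            yield [word for part in combo for word in part]
```

With depth 4, the six generated noun phrases include "the bridge over the Nile". Raising the bound adds ever longer chains of prepositional phrases.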
Parse tree for "a student who likes this course":
- [NP [NP [det a] [noun student]] [Srel [NP [rel-pron who]] [VP [verb likes] [NP [det this] [noun course]]]]]
Formal Definition of a Context-free Grammar
- A context-free grammar consists of
- a set of non-terminal symbols N
- a set of terminals Σ
- a set of productions A → α
- where A ∈ N and α is a string in (Σ ∪ N)*
- a designated start symbol (from N)
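The four components map directly onto a small data structure. This sketch assumes that symbols are plain strings. The class name CFG is hypothetical. The check that N and Σ are disjoint is a standard assumption that the slide leaves implicit:

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

Production = Tuple[str, Tuple[str, ...]]  # (A, alpha)


@dataclass(frozen=True)
class CFG:
    nonterminals: FrozenSet[str]         # N
    terminals: FrozenSet[str]            # Sigma
    productions: FrozenSet[Production]   # rules A -> alpha
    start: str                           # designated start symbol

    def __post_init__(self):
        # Validate the definition: disjoint N and Sigma, start in N,
        # A in N and alpha a string over (Sigma u N) for every production.
        if self.nonterminals & self.terminals:
            raise ValueError("N and Sigma must be disjoint")
        if self.start not in self.nonterminals:
            raise ValueError("start symbol must be in N")
        vocabulary = self.nonterminals | self.terminals
        for lhs, rhs in self.productions:
            if lhs not in self.nonterminals:
                raise ValueError(f"left side {lhs!r} is not in N")
            if any(sym not in vocabulary for sym in rhs):
                raise ValueError(f"right side of {lhs!r} is not over Sigma and N")
```

Constructing a grammar with a rule whose left side is not in N raises a ValueError.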
Context-free grammars for natural language
- A set of non-terminal symbols N
- word class symbols (N, V, Adj, Adv, P)
- linguistic constituent symbols (NP, VP, AdjP, AdvP, PP)
- A set of terminals Σ
- all words of the English language
- A set of productions A → α
- the grammar rules (e.g. NP → Det AdjP N)
- A designated start symbol
- a symbol for the complete sentence
How many …
- non-terminals do we need?
- word class symbols (N, V, Adj, Adv, P)
- usually between 20 and 50
- linguistic constituent symbols (NP, VP, …)
- usually between 10 and 20
- terminals do we need?
- words of the English language?
- Different word stems (see, walk, give)
- > 50,000
- Different word forms (see, sees, saw, seen)
- > 100,000
How many …
- grammar rules do we need?
- NP → Name: Mary, Peter
- NP → Det Noun: a book
- PP → Prep NP: to Mary
- VP → V NP PP: gave a book to Mary
- VP → V NP NP: gave Mary a book
- Problem: this grammar will also accept
- Peter give Mary a books. (agreement problem)
- Peter sees Mary a book. (complement problem)
Agreement: why bother?
- Peter give Mary a books.
- Consider
- Peter threw the books into the garbage can that are old and grey.
- Peter threw the books into the garbage can that is old and grey.
- Agreement can help us determine the intended meaning: "are" attaches the relative clause to the books, "is" attaches it to the garbage can.
Agreement: first approach
- NP_sg → Name_sg: Mary, Peter
- NP_sg → Det_sg Noun_sg: a book
- NP_pl → Det_pl Noun_pl: the books
- PP → Prep NP_sg: to Mary
- PP → Prep NP_pl: for the books
- VP → V NP_sg NP_sg: gave Mary a book
- VP → V NP_sg NP_pl: gave Mary the books
- VP → V NP_pl NP_sg: gave the kids a book
- VP → V NP_pl NP_pl: gave the kids the books
- Combinatorial explosion: too many rules
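The explosion is easy to quantify. With k number values and m NP slots, a rule schema must be spelled out k^m times. This sketch uses a hypothetical helper, expand, and reproduces the four VP rules above:

```python
from itertools import product


def expand(rule, values):
    """Spell out a rule schema: every NP slot gets every feature value."""
    lhs, rhs = rule
    slots = [i for i, sym in enumerate(rhs) if sym == "NP"]
    expanded = []
    for choice in product(values, repeat=len(slots)):
        new_rhs = list(rhs)
        for i, value in zip(slots, choice):
            new_rhs[i] = f"NP_{value}"
        expanded.append((lhs, tuple(new_rhs)))
    return expanded
```

expand(("VP", ("V", "NP", "NP")), ["sg", "pl"]) returns the four rules of the slide. If number and person are combined (2 × 3 = 6 values), the same VP rule already needs 36 variants.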
Agreement: better approach
- Variables ensure agreement via feature unification.
- NP_Num → Name_Num
- Mary, Peter
- NP_Num → Det_Num Noun_Num
- a book, the books
- PP → Prep NP_X
- to Mary, for the books
- VP_Num → V_Num NP_X NP_Y
- give Mary a book, gives Mary the books
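Here is a minimal sketch of the idea behind NP_Num → Det_Num Noun_Num. It assumes flat feature dictionaries instead of full feature structures and uses a hypothetical three-word lexicon. Real unification grammars unify nested structures and bind variables across the whole rule:

```python
def unify(f1, f2):
    """Unify two flat feature dicts; return the merged dict or None on a clash."""
    result = dict(f1)
    for feature, value in f2.items():
        if feature in result and result[feature] != value:
            return None  # e.g. Num=sg clashes with Num=pl
        result[feature] = value
    return result


# Hypothetical toy lexicon: word -> (word class, features).
LEXICON = {
    "a": ("Det", {"Num": "sg"}),
    "the": ("Det", {}),  # underspecified: fits both sg and pl
    "book": ("Noun", {"Num": "sg"}),
    "books": ("Noun", {"Num": "pl"}),
}


def np_features(det, noun):
    """NP_Num -> Det_Num Noun_Num: the shared variable Num forces unification."""
    return unify(LEXICON[det][1], LEXICON[noun][1])
```

"the" carries no Num value, so it combines with both "book" and "books". "a books" fails because sg and pl do not unify.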
Subcategorization
- Verbs have preferences for the kinds of constituents they co-occur with.
- For example:
- VP → Verb (disappear)
- VP → Verb NP (prefer a morning flight)
- VP → Verb NP PP (leave Boston in the morning)
- VP → Verb PP (leaving on Thursday)
- But not: I disappeared the cat.
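Subcategorization can be stored in the lexicon as a list of allowed complement frames per verb. This sketch uses a hypothetical SUBCAT table that holds only the frames listed above:

```python
# Hypothetical subcategorization lexicon: verb -> allowed complement frames.
SUBCAT = {
    "disappear": [()],                  # no complement
    "prefer": [("NP",)],
    "leave": [("NP", "PP"), ("PP",)],
}


def licensed(verb, complements):
    """Check whether the verb accepts this sequence of complement categories."""
    return tuple(complements) in SUBCAT.get(verb, [])
```

licensed("disappear", ["NP"]) is False, which rules out "I disappeared the cat".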
Parsing as Search
- Top-down parsing
- Bottom-up parsing
- → see Jurafsky slides
That sounds nice
From the Financial Times of Nov. 23, 2004, at http://news.ft.com/home/europe
- McDonald's CEO steps down to battle cancer
- By Neil Buckley in New York
- Published: November 23 2004 00:51 | Last updated: November 23 2004 00:51
- McDonald's said on Monday night Charlie Bell would step down as chief executive to devote his time to battling colorectal cancer, dealing another blow to the world's largest fast food company.
- Mr Bell's resignation comes just seven months after James Cantalupo, its former chairman and chief executive, died from a heart attack.
- McDonald's moved quickly to close the gap, appointing Jim Skinner, currently vice-chairman, to the chief executive's role.
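Problems when parsing natural language sentences
- Words that are (perhaps) not in the lexicon
- Proper names
- James Cantalupo, McDonald's, InterContinental, GE
- Compound words → need to be segmented
- kurskamrater, kurslitteratur, kursavsnitt, kursplaneundersökningarna, kursförluster (Swedish: fellow students, course literature, course sections, the syllabus surveys, price losses)
- valutakurs, snabbkurs, säljkurser, aktiekurser, valutakursindex (exchange rate, crash course, selling rates, share prices, exchange rate index)
- Foreign language expressions
- Don Kerr är Mellanösternspecialist på The International Institute for Strategic Studies i London, högt ansedd, oberoende thinktank. (Swedish: "Don Kerr is a Middle East specialist at The International Institute for Strategic Studies in London, a highly regarded, independent think tank.")
- Multiword expressions
- Idioms: to deal another blow
- Metaphors: to battle cancer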
Problems when parsing natural language sentences
- Ambiguities
- Word level: kurs as in valutakurs (rate) or kurskamrat (course)
- Sentence level
- He sees the man with the telescope.
- Old men and women left the occupied city.
- Additional knowledge sources are needed to resolve ambiguities:
- more world knowledge
- statistical knowledge (parsing preferences)
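The attachment ambiguity in "He sees the man with the telescope" becomes visible when the parse trees are counted. The sketch below is a bottom-up CKY chart parser over a hypothetical toy grammar in Chomsky normal form, which allows both VP → VP PP and NP → NP PP:

```python
from collections import defaultdict

BINARY = [  # A -> B C (hypothetical toy grammar in Chomsky normal form)
    ("S", "NP", "VP"), ("VP", "V", "NP"), ("VP", "VP", "PP"),
    ("NP", "NP", "PP"), ("NP", "Det", "N"), ("PP", "P", "NP"),
]
LEXICAL = {"he": {"NP"}, "sees": {"V"}, "the": {"Det"}, "man": {"N"},
           "with": {"P"}, "telescope": {"N"}}


def count_parses(words, start="S"):
    """Bottom-up CKY: chart[i, j][A] = number of trees of A over words[i:j]."""
    n = len(words)
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for cat in LEXICAL[word]:
            chart[i, i + 1][cat] += 1
    for length in range(2, n + 1):          # shorter spans first
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):       # split point
                for a, b, c in BINARY:
                    left, right = chart[i, k][b], chart[k, j][c]
                    if left and right:
                        chart[i, j][a] += left * right
    return chart[0, n][start]
```

The sentence gets 2 parses. The PP attaches either to the verb phrase (seeing with the telescope) or to the noun phrase (the man who has the telescope). Choosing between the two is the job of world knowledge or statistical preferences.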
How can we obtain statistical preferences?
- From a parsed and manually checked corpus (a collection of sentences).
- Such a corpus is usually a database that contains the correct syntax tree for each sentence (hence the name treebank).
- Building a treebank is very time-consuming.
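Given a treebank, the simplest statistical preference is the relative frequency of each rule, P(A → α) = count(A → α) / count(A), as used in probabilistic context-free grammars. This sketch writes trees as nested tuples and uses a two-sentence toy treebank invented purely for illustration:

```python
from collections import Counter

# Invented toy treebank: (label, child, ...) tuples, words as plain strings.
TOY_TREEBANK = [
    ("S", ("NP", ("Name", "Peter")),
          ("VP", ("V", "sees"), ("NP", ("Det", "the"), ("N", "man")))),
    ("S", ("NP", ("Name", "Mary")), ("VP", ("V", "sleeps"))),
]


def rules(tree):
    """Yield (lhs, rhs) for every internal node of a (label, *children) tree."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield label, rhs
    for child in children:
        if not isinstance(child, str):
            yield from rules(child)


def estimate(treebank):
    """Relative frequency: P(A -> alpha) = count(A -> alpha) / count(A)."""
    rule_counts = Counter(r for tree in treebank for r in rules(tree))
    lhs_counts = Counter()
    for (lhs, _), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}
```

On this toy treebank, NP → Name gets probability 2/3, NP → Det N gets 1/3, and each of the two VP rules gets 1/2.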
- Can all the syntax of natural language be described with context-free rules?
- Are there phenomena in natural language that require context-sensitive rules?
Limits of Context-free Grammars
- It is not possible to write a context-free grammar
- (or to design a push-down automaton (PDA))
- for the language $L = \{a^n b^n a^n \mid n > 0\}$.
- Why?
- Intuitively: the memory component of a PDA works like a stack. One stack! So it can only be used to count once. Matching the b's against the a's empties the stack, and nothing is left to check the final block of a's.
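The one-stack intuition can be illustrated in code, though not proved. A single stack recognizes a^n b^n, but checking a third block needs the count a second time. The second function cheats by comparing the block lengths directly, which no pushdown automaton can do for this language. Both function names are ours:

```python
import re


def is_anbn(s):
    """PDA-style recognizer for a^n b^n (n > 0): push per a, pop per b."""
    stack, i = [], 0
    while i < len(s) and s[i] == "a":
        stack.append("A")
        i += 1
    while i < len(s) and s[i] == "b" and stack:
        stack.pop()  # each b consumes one stored a
        i += 1
    # Accept only if all input is read and the stack is exactly empty.
    return len(s) > 0 and i == len(s) and not stack


def is_anbnan(s):
    """Not a PDA: compares three block lengths, i.e. it counts twice."""
    m = re.fullmatch(r"(a+)(b+)(a+)", s)
    return bool(m) and len(m.group(1)) == len(m.group(2)) == len(m.group(3))
```

After is_anbn has matched the b's, its stack is empty. That is exactly why a single stack has no record left for the final a's.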
Are natural languages context-free?
- Yes!
- But there is a famous paper about some constructions in Swiss German of the form $w\,a^n b^m x\,c^n d^m y$:
- Jan säit, das mer (em Hans) (es huus) (hälfed) (aastriiche). ("Jan says that we helped Hans paint the house.")
- Jan säit, das mer (d'chind)^n (em Hans)^m (es huus) (haend wele laa)^n (hälfe)^m (aastriiche). ("Jan says that we have wanted to let the children help Hans paint the house.")
- But such constructions are rather strange and rare.
- The claim that they are not context-free relies on the assumption that n and m are unbounded.
The notion of context
- We need context to understand a natural language utterance!
- This notion of context is different from the notion of context in the name "context-free languages".
Do linguists like context-free grammars? Not really.
Linguists want
- to express grammar rules on different levels of abstraction.
- For example, instead of saying
- NP → NP Conj NP: the boy and the girl
- VP → VP Conj VP: sang and danced
- AdjP → AdjP Conj AdjP: wise and very famous
- they would like to say
- XP → XP Conj XP
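A schema such as XP → XP Conj XP can be read as a rule template whose variable X ranges over head categories. This sketch implements that reading. The function name and the convention that the letter X is reserved for the variable are assumptions. Instantiating the schema for N, V and Adj gives back the three rules above:

```python
def instantiate(schema, heads):
    """Replace the variable X in every symbol of a schema by each head category.

    Assumes X is reserved for the variable and occurs in no other symbol name.
    """
    lhs, rhs = schema
    return [
        (lhs.replace("X", h), tuple(sym.replace("X", h) for sym in rhs))
        for h in heads
    ]
```

instantiate(("XP", ("XP", "Conj", "XP")), ["N", "V", "Adj"]) yields NP → NP Conj NP, VP → VP Conj VP and AdjP → AdjP Conj AdjP. Adding a category to the list adds its coordination rule for free.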
Linguists want
- (to be able) to state dominance and precedence separately.
- Peter dropped the course happily.
- Happily Peter dropped the course.
- S → Adv S
- S → S Adv
Context-free Grammars
- Context-free grammar rules encode both
- dominance and
- precedence information.
- Example
- A → B C D
- A dominates B and C and D,
- and B precedes C, which in turn precedes D.
ID/LP-Grammars
- ID/LP-grammars have separate rules:
- ID (immediate dominance) rules and
- LP (linear precedence) rules.
- Example
- ID-rule: A → B, C, D
- A dominates B and C and D
- LP-rule: B < C
- B precedes C
- ID/LP grammars have been proposed in linguistics, e.g. in Generalized Phrase Structure Grammar (GPSG by Gazdar, Klein, Pullum, Sag, 1985).
ID/LP-Grammars
- Example from German (all six mean "Yesterday the professor gave the secretary these flowers."):
- Gestern hat [VP der Professor der Sekretärin diese Blumen geschenkt].
- Gestern hat [VP der Professor diese Blumen der Sekretärin geschenkt].
- Gestern hat [VP diese Blumen der Professor der Sekretärin geschenkt].
- Gestern hat [VP diese Blumen der Sekretärin der Professor geschenkt].
- Gestern hat [VP der Sekretärin der Professor diese Blumen geschenkt].
- Gestern hat [VP der Sekretärin diese Blumen der Professor geschenkt].
ID/LP-Grammars
- The German verb phrase (or Mittelfeld) consists of
- an NP_nominative
- an NP_dative
- an NP_accusative
- a verb
- Accounting for all order variations requires 6 context-free grammar rules,
- but only one ID-rule plus one LP-rule:
- VP → NP_accusative, NP_dative, NP_nominative, V
- NP < V
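One way to interpret an ID/LP grammar is to generate every order of the ID-rule daughters and keep only those that satisfy all LP rules. This brute-force sketch recovers the six Mittelfeld orders from the single ID-rule and NP < V. The function name and the category-mapping parameter are hypothetical:

```python
from itertools import permutations


def linearize(daughters, lp, category):
    """Return all orders of the ID-rule daughters that respect the LP rules.

    lp: set of (A, B) pairs meaning "A precedes B".
    category: maps a daughter to the category the LP rules mention
    (e.g. every NP_* daughter counts as an NP). Assumes distinct daughters.
    """
    orders = []
    for order in permutations(daughters):
        position = {d: i for i, d in enumerate(order)}
        if all(position[x] < position[y]
               for x in order for y in order
               if (category(x), category(y)) in lp):
            orders.append(order)
    return orders
```

Called with the daughters NP_accusative, NP_dative, NP_nominative and V, the LP rule {("NP", "V")} and category = lambda d: d.split("_")[0], it returns 3! = 6 orders, each with V last. Because it tries every permutation, it is only practical for short rules.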
ID/LP-Grammars vs. Context-free Grammars
- All ID/LP-grammars can be transformed into strongly equivalent context-free grammars.
- Some cf grammars cannot be transformed into strongly equivalent ID/LP grammars.
- Example: the cf grammar consisting of the rule A → a c a
- cannot be transformed into a strongly equivalent ID/LP grammar, because of contradictory ordering constraints:
- a before c AND c before a
- An additional non-terminal is required:
- ID-rules: A → Z, a; Z → a, c
- LP-rules: Z < a; a < c
Summary
- Why do computational linguists use formal grammars for describing natural languages?
- As an intermediate step to capture the meaning of natural language utterances.
- Are natural languages context-free languages?
- The syntax of natural languages can (in general) be described with context-free grammars.
- What grammar formalisms do linguists prefer?
- Linguists want to describe natural language as precisely and as conveniently as possible. They prefer grammar formalisms with feature variables, metarules, ID/LP separation, schemata and abstract rules.
Any questions?