Title: 74.406 Natural Language Processing
174.406 Natural Language Processing
- Christel Kemke
- Department of Computer Science
- University of Manitoba
74.406 Natural Language Processing, 1st term 2004/5
2Evolution of Human Language
- communication for "work"
- social interaction
- basis of cognition and thinking
- (Sapir-Whorf hypothesis)
3Communication
"Communication is the intentional exchange of
information brought about by the production and
perception of signs drawn from a shared system of
conventional signs." (Russell & Norvig, p. 651)
4Natural Language - General
- Natural Language is characterized by
- a common or shared set of signs: alphabet, lexicon
- a systematic procedure to produce combinations of signs: syntax
- a shared meaning of signs and combinations of signs: (constructive) semantics
5Natural Language Processing Overview
- Speech Recognition
- Natural Language Processing
- Syntax
- Semantics
- Pragmatics
- Spoken Language
6Natural Language and Speech
- Speech Recognition
- acoustic signal as input
- conversion into phonemes and written words
- Natural Language Processing
- written text as input: sentences (or 'utterances')
- syntactic analysis: parsing, grammar
- semantic analysis: "meaning", semantic representation
- pragmatics: dialogue, discourse, metaphors
- Spoken Language Processing
- transcribed utterances
- Phenomena of spontaneous speech
7Words
8Morphology
- A morphological analyzer determines (at least)
- the stem and ending of a word,
- and usually delivers related information, like
- the word class,
- the case and
- the person of the word.
- The morphology can be part of the lexicon or implemented as a single component, for example as a rule-based system.
- eats -> eat + s: verb, singular, 3rd person
- dog -> dog: noun, singular
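A rule-based morphological component of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stem list `STEMS`, the single suffix rule for "s", and the function name `analyze` are invented for the example and are not part of the course material.

```python
# Hypothetical mini stem lexicon: stem -> word class.
STEMS = {"eat": "verb", "dog": "noun", "bone": "noun"}


def analyze(word):
    """Split a word into stem and ending and return related information."""
    word = word.lower()
    # Rule 1: the bare stem is listed in the stem lexicon.
    if word in STEMS:
        word_class = STEMS[word]
        features = ["singular"] if word_class == "noun" else []
        return {"stem": word, "ending": "", "class": word_class,
                "features": features}
    # Rule 2: stem + "s" marks 3rd person singular (verbs) or plural (nouns).
    if word.endswith("s") and word[:-1] in STEMS:
        stem = word[:-1]
        word_class = STEMS[stem]
        features = (["singular", "3rd person"] if word_class == "verb"
                    else ["plural"])
        return {"stem": stem, "ending": "s", "class": word_class,
                "features": features}
    return None  # unknown word


print(analyze("eats"))
print(analyze("dog"))
```

A realistic analyzer would need many more rules (irregular forms, spelling changes); the sketch only mirrors the two examples on the slide.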
9Lexicon
- The Lexicon contains information on words, as
- inflected forms (e.g. goes, eats) or
- word-stems (e.g. go, eat).
- The Lexicon usually assigns a syntactic category,
- the word class or Part-of-Speech category
- Sometimes also
- further syntactic information (see Morphology)
- semantic information (e.g. semantic classifications like agent)
- syntactic-semantic information, e.g. on verb complements, like "give requires a direct object".
10Lexicon
- Example contents
- eats -> verb, singular, 3rd person; can have direct object
- dog -> dog, noun, singular; animal (semantic annotation)
11POS (Part-of-Speech) Tagging
- POS Tagging determines the word class or part-of-speech category (basic syntactic category) of single words or word-stems.
- The: det (determiner)
- dog: noun
- eat, eats: verb (3rd singular)
- the: det
- bone: noun
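The simplest form of POS tagging is a lexicon lookup, sketched below. The dictionary `LEXICON` and the function `pos_tag` are assumptions made for this example; a lookup like this ignores context, so it cannot resolve words that belong to more than one category.

```python
# Hypothetical lexicon: word form -> part-of-speech category.
LEXICON = {
    "the": "det",
    "dog": "noun",
    "bone": "noun",
    "eat": "verb",
    "eats": "verb",
}


def pos_tag(sentence):
    """Tag each word by lexicon lookup; unlisted words are tagged 'unknown'."""
    return [(word, LEXICON.get(word.lower(), "unknown"))
            for word in sentence.split()]


print(pos_tag("The dog eats the bone"))
```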
12NLP - Syntactic Analysis
[Figure: components of syntactic analysis. Lexicon and Morphological Analyzer support Part-of-Speech (POS) Tagging (eats -> eat + s: verb, 3rd sing); the Parser applies Grammar Rules (VP -> Verb Noun; VP recognized) and produces the parse tree.]
13Syntax
14Language and Grammar
- Natural Language described as Formal Language L using a Formal Grammar G:
- start-symbol S: sentence
- non-terminals NT: syntactic constituents
- terminals T: lexical entries / words
- production rules P: grammar rules
- Generate sentences or recognize sentences
(Parsing) of the language L through the
application of grammar rules.
15Grammar
- Terminals can be words, part-of-speech categories, or more complex lexical items (including additional syntactic/semantic information related to the word)
- Non-Terminals represent (higher level) syntactic categories
16Grammar
- Most often we deal with Context-free Grammars, with a distinguished Start-symbol S (sentence).
- det -> the
- noun -> dog | bone
- verb -> eat | eats
- NP -> det noun (NP = noun phrase)
- VP -> verb (VP = verb phrase)
- VP -> verb NP
- S -> NP VP (S = sentence)
- Here, POS Tagging is included in the grammar.
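To illustrate how grammar rules generate the sentences of a language, the sketch below encodes the grammar above as a Python dictionary and enumerates every sentence derivable from S. The names `GRAMMAR` and `generate` are assumptions for this example. The enumeration terminates only because this particular grammar has no recursive rules.

```python
from itertools import product

# The context-free grammar from the slide: symbol -> list of right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["det", "noun"]],
    "VP": [["verb"], ["verb", "NP"]],
    "det": [["the"]],
    "noun": [["dog"], ["bone"]],
    "verb": [["eat"], ["eats"]],
}


def generate(symbol):
    """Return every word sequence that can be derived from the symbol."""
    if symbol not in GRAMMAR:  # a terminal, i.e. a word
        return [[symbol]]
    results = []
    for rhs in GRAMMAR[symbol]:
        # Expand each symbol of the right-hand side, then combine the parts.
        parts = [generate(s) for s in rhs]
        for combination in product(*parts):
            results.append([w for part in combination for w in part])
    return results


sentences = [" ".join(words) for words in generate("S")]
for sentence in sentences:
    print(sentence)
```

The output contains "the dog eats the bone" but also "the dog eat", because the grammar as written does not check agreement between subject and verb.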
17Parsing (here LR, bottom-up)
- Determine the syntactic structure of a sentence.
- the -> det (POS Tagging)
- dog -> noun
- det noun -> NP (Rule application)
- eats -> verb
- the -> det
- bone -> noun
- det noun -> NP
- verb NP -> VP
- NP VP -> S
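The bottom-up derivation can be reproduced with a small shift-reduce recognizer. The sketch below is not a table-driven LR parser: it is a simplified backtracking variant, chosen because a purely greedy reducer would apply VP -> verb too early and get stuck. The names `RULES` and `parse` are assumptions for this example.

```python
# Grammar rules as (left-hand side, right-hand side) pairs, lexical rules first.
RULES = [
    ("det", ("the",)),
    ("noun", ("dog",)),
    ("noun", ("bone",)),
    ("verb", ("eat",)),
    ("verb", ("eats",)),
    ("NP", ("det", "noun")),
    ("VP", ("verb", "NP")),
    ("VP", ("verb",)),
    ("S", ("NP", "VP")),
]


def parse(stack, words):
    """Return the steps that reduce stack + words to S, or None if impossible."""
    if not words and stack == ["S"]:
        return []
    # Try every reduction whose right-hand side matches the top of the stack.
    for lhs, rhs in RULES:
        n = len(rhs)
        if tuple(stack[-n:]) == rhs:
            rest = parse(stack[:-n] + [lhs], words)
            if rest is not None:
                return [f"{' '.join(rhs)} -> {lhs}"] + rest
    # Otherwise shift the next word onto the stack.
    if words:
        rest = parse(stack + [words[0]], words[1:])
        if rest is not None:
            return [f"shift {words[0]}"] + rest
    return None  # dead end, the caller backtracks


steps = parse([], "the dog eats the bone".split())
for step in steps:
    print(step)
```

Ignoring the shift steps, the printed reductions follow the same order as the derivation listed on the slide.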
18Syntax Analysis / Parsing
- Syntactic Structure often represented as Parse Tree.
- Connect symbols according to applied grammar rules.
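One simple way to hold a parse tree in a program is as nested tuples, where each node is (label, children...) and a leaf is (category, word). The representation and the helper names `leaves` and `bracketed` are assumptions for this sketch; the tree shown is the one for "the dog eats the bone".

```python
# Parse tree for "the dog eats the bone" as nested tuples.
TREE = ("S",
        ("NP", ("det", "the"), ("noun", "dog")),
        ("VP", ("verb", "eats"),
               ("NP", ("det", "the"), ("noun", "bone"))))


def is_leaf(tree):
    """A leaf node pairs a category with a single word."""
    return len(tree) == 2 and isinstance(tree[1], str)


def leaves(tree):
    """Read the words off the tree from left to right."""
    if is_leaf(tree):
        return [tree[1]]
    return [word for child in tree[1:] for word in leaves(child)]


def bracketed(tree):
    """Render the tree in labelled bracket notation."""
    if is_leaf(tree):
        return f"[{tree[0]} {tree[1]}]"
    return f"[{tree[0]} " + " ".join(bracketed(c) for c in tree[1:]) + "]"


print(leaves(TREE))
print(bracketed(TREE))
```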
19Parse Trees
20Lexical Ambiguity
- Several word senses or word categories
- e.g. chase: noun or verb
- e.g. plant - ????
21Syntactic Ambiguity
- Several parse trees
- e.g. The dog eats the bone in the park.
- e.g. The dog eats the bone in the package.
- Who/what is in the park and who/what is in the package?
- Syntactically speaking: How do I bind the Prepositional Phrase "in the ..."?
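The two readings can be made visible by enumerating all parse trees of the sentence. The sketch below assumes a small grammar with prepositional phrases; the rules PP -> prep NP, NP -> NP PP and VP -> VP PP, the lexicon entries, and the function name `parses` are assumptions for this example (the unary rule VP -> verb is left out because the sample sentence does not need it).

```python
from functools import lru_cache

# Hypothetical lexicon and binary grammar rules (lhs, left child, right child).
LEXICON = {"the": "det", "dog": "noun", "bone": "noun", "park": "noun",
           "eats": "verb", "in": "prep"}
BINARY = [
    ("S", "NP", "VP"),
    ("NP", "det", "noun"),
    ("NP", "NP", "PP"),   # PP attached to the noun phrase
    ("VP", "verb", "NP"),
    ("VP", "VP", "PP"),   # PP attached to the verb phrase
    ("PP", "prep", "NP"),
]


def parses(sentence):
    """Return all parse trees (nested tuples) that cover the whole sentence."""
    words = tuple(sentence.split())

    @lru_cache(maxsize=None)
    def trees(symbol, i, j):
        found = []
        if j - i == 1 and LEXICON.get(words[i]) == symbol:
            found.append((symbol, words[i]))
        for lhs, left, right in BINARY:
            if lhs != symbol:
                continue
            for k in range(i + 1, j):  # every split point inside the span
                for lt in trees(left, i, k):
                    for rt in trees(right, k, j):
                        found.append((symbol, lt, rt))
        return tuple(found)

    return trees("S", 0, len(words))


for tree in parses("the dog eats the bone in the park"):
    print(tree)
```

The grammar alone yields both trees; deciding that the park is where the eating happens, while a package would contain the bone, needs semantic knowledge.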
22Semantics
23Semantic Representation
- Represent the meaning of the sentence.
- Generate
- a logic-based representation or
- a frame-based representation, e.g. Fillmore's case frames
- based on the syntactic structure, lexical entries, and particularly the head-verb (which determines how to arrange the parts of the sentence in the semantic representation).
24Semantic Representation
- Verb-centered representation
- The verb (action, head) is regarded as the center of the verbal expression and determines the case frame with its possible case roles; other parts of the sentence are described in relation to the action as fillers of case slots (cf. also Schank's CD Theory).
- Typing of case roles is possible (e.g. 'agent' refers to a specific sort or concept).
25General Frame for eat
- Agent: animate
- Action: eat
- Patient: food
- Manner: e.g. fast
- Location: e.g. in the yard
- Time: e.g. at noon
26Frame with fillers for sample sentence
- Agent: the dog
- Action: eat
- Patient: the bone / the bone in the package
- Location: in the park
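Filling a typed case frame can be sketched as a dictionary lookup with a sort check. Everything named here is an assumption for illustration: the sort labels in `SORTS`, the "place" requirement for Location, and the function `fill_frame`. Prepositions are dropped so that fillers are plain noun phrases.

```python
# Case frame for "eat": case role -> required sort of the filler.
EAT_FRAME = {"Agent": "animate", "Patient": "food", "Location": "place"}

# Hypothetical sort assignments for the noun phrases of the sample sentences.
SORTS = {"the dog": "animate", "the bone": "food",
         "the park": "place", "the package": "container"}


def fill_frame(action, frame, fillers):
    """Fill the case slots, rejecting fillers whose sort violates the frame."""
    filled = {"Action": action}
    for role, phrase in fillers.items():
        required = frame[role]
        actual = SORTS.get(phrase)
        if actual != required:
            raise ValueError(
                f"{phrase!r} is {actual}, but {role} requires {required}")
        filled[role] = phrase
    return filled


print(fill_frame("eat", EAT_FRAME,
                 {"Agent": "the dog", "Patient": "the bone",
                  "Location": "the park"}))
```

Under these assumed sorts, "the package" is rejected as a Location of eating, which is one way typed case roles can help choose between the two attachments of the prepositional phrase.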
28General Frame for drive / Frame with fillers
Role         General frame        Fillers
Agent        animate              she
Action       drive                drives
Patient      vehicle              the convertible
Manner       the way it is done   fast
Location     Location-spec        in the Rocky Mountains
Source       Location-spec        from home
Destination  Location-spec        to the ASIC conference
Time         Time-spec            in the summer holiday
29Pragmatics
30Pragmatics
- Pragmatics includes context-related aspects of NL expressions (utterances).
- These are in particular anaphoric references, elliptic expressions, deictic expressions, ...
- anaphoric references: refer to items mentioned before
- deictic expressions: simulate pointing gestures
- elliptic expressions: incomplete expressions that relate to an item mentioned before
31Pragmatics
- I put the box on the top shelf.
- I know that. But I can't find it there.
- Labels: deictic expression, anaphoric reference
32Pragmatics
- I put the box on the top shelf.
- I know that. But I can't find it there.
- Label: anaphoric reference
33Pragmatics
- I put the box on the top shelf.
- I know that. But I can't find it there.
- Label: deictic expression
34Pragmatics
- I put the box on the top shelf.
- I know that. But I can't find it there.
- The candy-box?
- Labels: deictic expression, anaphoric reference, elliptic expression
35Pragmatics
- I put the box on the top shelf.
- The candy-box?
- Label: elliptic expression
36Intentions
- Intentions
- One philosophical assumption is that natural language is used to achieve something: "Do things with words."
- The meaning of an utterance is essentially
determined by the intention of the speaker.
37Intentionality - Examples
- What was said: "There is a terrible draft here." What was meant: "Can you please close the window."
- What was said: "How does it look here?" What was meant: "I am really mad; clean up your room."
- What was said: "Will this ever end?" What was meant: "I would prefer to be with my friends than to sit in class now."
38Metaphors
- Metaphors
- The meaning of a sentence or expression is not directly inferable from the sentence structure and the word meanings. Metaphors transfer concepts and relations from one area of discourse into another area, for example, seeing time as a line (in space) or seeing friendship or life as a journey.
39Metaphors - Examples
- This car eats a lot of gas.
- She devoured the book.
- He was tied up with his clients.
- Marriage is like a journey.
- Their marriage was a one-way road into hell.
- (see also George Lakoff, e.g. Women, Fire, and Dangerous Things)
40Dialogue and Discourse
41Discourse / Dialogue Structure
- Grammar for various sentence types (speech acts): dialogue, discourse, story grammar
- Distinguish questions, commands, and statements
- Where is the remote-control?
- Bring the remote-control!
- The remote-control is on the brown table.
- Dialogue Grammars describe possible sequences of Speech Acts in communication, e.g. that a question is followed by an answer/statement.
- Similar for Discourse (like continuous texts).
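A dialogue grammar can be sketched as a table of allowed speech-act sequences. The sketch below makes two strong simplifications that are assumptions of this example, not claims from the slides: speech acts are guessed from the final punctuation mark, and the transition table `ALLOWED_NEXT` encodes only the rule that a question must be followed by a statement.

```python
def speech_act(utterance):
    """Classify an utterance by its final punctuation (a crude heuristic)."""
    text = utterance.strip()
    if text.endswith("?"):
        return "question"
    if text.endswith("!"):
        return "command"
    return "statement"


# Hypothetical dialogue grammar: speech act -> speech acts that may follow it.
ALLOWED_NEXT = {
    "start": {"question", "command", "statement"},
    "question": {"statement"},  # a question is followed by an answer
    "command": {"question", "command", "statement"},
    "statement": {"question", "command", "statement"},
}


def follows_dialogue_grammar(utterances):
    """Check whether the sequence of speech acts is licensed by the grammar."""
    previous = "start"
    for utterance in utterances:
        act = speech_act(utterance)
        if act not in ALLOWED_NEXT[previous]:
            return False
        previous = act
    return True


dialogue = ["Where is the remote-control?",
            "The remote-control is on the brown table."]
print([speech_act(u) for u in dialogue])
print(follows_dialogue_grammar(dialogue))
```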
42Speech
43Speech Processing Systems: Types and Characteristics
- Speech Recognition vs. Speaker Recognition (Voice Recognition, Speaker Identification)
- speaker-dependent vs. speaker-independent
- training?
- unlimited vs. large vs. small vocabulary
- single word vs. continuous speech
44Speech Recognition Phases
- acoustic signal as input
- signal analysis - spectrogram
- feature extraction
- phoneme recognition
- word recognition
- conversion into written words
45Spoken Language
46Spoken Language
- Output of Speech Recognition System as input "text".
- Can be associated with probabilities for different word sequences.
- Contains ungrammatical structures, so-called "disfluencies", e.g. repetitions and corrections.
47Spoken Language - Examples
- no s- straight southwest
- right to my my left
- that is that is correct
Robin J. Lickley. HCRC Disfluency Coding Manual.
http://www.ling.ed.ac.uk/robin/maptask/HCRCdsm-01.html
48Spoken Language - Disfluency
- come to ... walk right to the ... the right-hand side of the page
- Labels: Reparandum, Repair
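The simplest disfluencies, cut-off fragments and immediate repetitions, can be removed with a small filter before parsing. This is a sketch under stated assumptions: fragments are taken to be tokens ending in "-", a repetition is an immediately repeated sequence of up to `max_len` words, and the function name `clean_disfluencies` is invented for the example. Corrections such as "come to ... walk right to" are not handled.

```python
def clean_disfluencies(utterance, max_len=3):
    """Drop cut-off fragments and the first copy of immediately repeated words."""
    # Cut-off word fragments are transcribed with a trailing "-" or "--".
    words = [w for w in utterance.split() if not w.endswith("-")]
    cleaned = []
    i = 0
    while i < len(words):
        for n in range(max_len, 0, -1):
            # Is the next n-word sequence repeated directly afterwards?
            if words[i:i + n] == words[i + n:i + 2 * n]:
                i += n  # skip the first copy, keep the repeat
                break
        else:
            cleaned.append(words[i])
            i += 1
    return " ".join(cleaned)


for example in ["no s- straight southwest",
                "right to my my left",
                "that is that is correct"]:
    print(clean_disfluencies(example))
```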
49Spoken Language - Example
- we're going to g-- ... turn straight back around for testing.
- come to ... walk right to the ... right-hand side of the page.
- right up ... past ... up on the left of the ... white mountain walk ... right up past.
- i'm still ... i've still gone halfway back round the lake again.
50Spoken Language - Example
- I'd d if I need to go
- it's basi-- see if you go over the old mill
- you are going make a gradual slope to your right
- I've got one I don't realize why it is there