Title: 74.419 Artificial Intelligence
174.419 Artificial Intelligence
- Speech and Natural Language Processing
2Speech and Natural Language Processing
- Communication
- Natural Language
- Syntax
- Semantics
- Pragmatics
- Speech
3Evolution of Human Language
- communication for "work"
- social interaction
- basis of cognition and thinking
- (Sapir-Whorf hypothesis)
4Communication
"Communication is the intentional exchange of
information brought about by the production and
perception of signs drawn from a shared system of
conventional signs." (Russell & Norvig, p. 651)
5Natural Language - General
- Natural Language is characterized by
- a common or shared set of signs: alphabet, lexicon
- a systematic procedure to produce combinations of signs: syntax
- a shared meaning of signs and combinations of signs: (constructive) semantics
6Speech and Natural Language
- Speech Recognition
- acoustic signal as input
- conversion into phonemes and written words
- Natural Language Processing
- written text as input: sentences (or 'utterances')
- syntactic analysis: parsing, grammar
- semantic analysis: "meaning", semantic representation
- pragmatics
- dialogue, discourse
- Spoken Language Processing
- transcribed utterances
- Phenomena of spontaneous speech
7Speech Recognition
- Input: acoustic / sound wave
- Filtering, FFT, Spectral Analysis → Frequency Spectrum
- Signal Processing / Analysis → Features (Phonemes, Context)
- Phoneme Recognition (HMM, Neural Networks) → Phonemes
- Grammar or Statistics → Phoneme Sequences / Words
- Grammar or Statistics for likely word sequences → Word Sequence / Sentence
8Areas in Natural Language Processing
- Morphology (word stem + ending)
- Syntax, Grammar and Parsing (syntactic description and analysis)
- Semantics and Pragmatics (meaning: constructive, context-dependent; references; ambiguity)
- Pragmatic Theory of Language: Intentions, Metaphor (Communication as Action)
- Discourse / Dialogue / Text
- Spoken Language Understanding
- Language Learning
9NLP Syntax Analysis - Processes
- Processes: Part-of-Speech (POS) Tagging, Morphological Analyzer, Parser
- Linguistic Background Knowledge: Lexicon, Grammar Rules
- Example: "the" → the, determiner → Det
- Grammar rule NP → Det Noun: NP recognized
- Parse tree: [NP Det Noun]
10NLP - Syntactic Analysis
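- Processes: Part-of-Speech (POS) Tagging, Morphological Analyzer, Parser
- Linguistic Background Knowledge: Lexicon, Grammar Rules
- Example: "eats" → eat + s → eat, verb, 3rd sing → Verb
- Grammar rule VP → Verb Noun: VP recognized
- Parse tree: [VP Verb Noun]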
11Morphology
- A morphological analyzer determines (at least)
- the stem and ending of a word,
- and usually delivers related information, like
- the word class,
- the number,
- the person and
- the case of the word.
- The morphology can be part of the lexicon or implemented as a single component, for example as a rule-based system.
- eats → eat + s: verb, singular, 3rd pers
- dog → dog: noun, singular
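The rule-based variant mentioned above can be sketched as a stem lexicon plus a table of endings. This is only an illustration: the tables `STEMS` and `ENDINGS` and the function `analyze` are hypothetical names, and the entries cover just the two examples from this slide.

```python
# Hypothetical stem lexicon: stem -> word class (an assumption for this sketch).
STEMS = {"eat": "verb", "dog": "noun", "bone": "noun"}

# Hypothetical ending rules: (word class, ending) -> morphological features.
ENDINGS = {
    ("verb", "s"): {"number": "singular", "person": "3rd"},
    ("verb", ""): {"person": "non-3rd"},
    ("noun", "s"): {"number": "plural"},
    ("noun", ""): {"number": "singular"},
}


def analyze(word):
    """Return all (stem, ending, word class, features) readings of a word."""
    readings = []
    word = word.lower()
    # Try every split of the word into stem + ending.
    for i in range(1, len(word) + 1):
        stem, ending = word[:i], word[i:]
        word_class = STEMS.get(stem)
        features = ENDINGS.get((word_class, ending))
        if word_class is not None and features is not None:
            readings.append((stem, ending, word_class, features))
    return readings


print(analyze("eats"))
print(analyze("dog"))
```

A real analyzer would also need spelling rules (e.g. "flies" → "fly" + "s") and irregular forms, which this sketch leaves out.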
12Lexicon
- The Lexicon contains information on words, as
- inflected forms (e.g. goes, eats) or
- word-stems (e.g. go, eat).
- The Lexicon usually assigns a syntactic category,
- the word class or Part-of-Speech category
- Sometimes also
- further syntactic information (see Morphology)
- semantic information (e.g. agent)
- syntactic-semantic information (e.g. verb complements: 'give' requires a direct object).
13Lexicon
- Example contents
- eats → verb, singular, 3rd person (-s)
- can have direct object
- (verb subcategorization)
- dog → dog, noun, singular
- animal
- (semantic annotation)
14POS (Part-of-Speech) Tagging
- POS Tagging determines the word class or part-of-speech category (basic syntactic categories) of single words or word-stems.
- The: det (determiner)
- dog: noun
- eats: verb (3rd person singular)
- the: det
- bone: noun
15Open Word Class Nouns
- Nouns denote objects and concepts.
- Proper Nouns
- Names for specific individual objects, entities
- e.g. the Eiffel Tower, Dr. Kemke
- Common Nouns
- Names for categories or classes or abstracts
- e.g. fruit, banana, table, freedom, sleep, ...
- Count Nouns
- enumerable entities, e.g. two bananas
- Mass Nouns
- not countable items, e.g. water, salt, freedom
16Open Word Class Verbs
- Verbs
- denote actions, processes, states
- e.g. smoke, dream, rest, run
- Several morphological forms, e.g.
- non-3rd person: eat
- 3rd person: eats
- progressive / present participle / gerundive: eating
- past participle: eaten
- Auxiliaries, e.g. be, as sub-class of verbs
17Open Word Class Adjectives
- Adjectives
- denote qualities or properties of objects, e.g. heavy, blue, content
- most languages have concepts for
- colour: white, green, ...
- age: young, old, ...
- value: good, bad, ...
- not all languages have adjectives as a separate class
18Open Word Class Adverbs
- Adverbs
- denote modifications of actions (verbs), qualities (adjectives)
- e.g. walk slowly, heavily drunk
- Directional or Locational Adverbs
- Specify direction or location
- e.g. go home, stay here
- Degree Adverbs
- Specify extent of process, action, property
- e.g. extremely slow, very modest
19Open Word Class Adverbs 2
- Manner Adverbs
- Specify manner of action or process
- e.g. walk slowly, run fast
- Temporal Adverbs
- Specify time of event or action
- e.g. yesterday, Monday
20Closed Word Classes
- prepositions: on, under, over, at, from, to, with, ...
- determiners: a, an, the, ...
- pronouns: he, she, it, his, her, who, I, ...
- conjunctions: and, or, as, if, when, ...
- auxiliary verbs: can, may, should, are
- particles: up, down, on, off, in, out, ...
- numerals: one, two, three, ..., first, second, ...
21Language and Grammar
- Natural Language described as Formal Language L using a Formal Grammar G:
- start-symbol S: sentence
- non-terminals NT: syntactic constituents
- terminals T: lexical entries / words
- production rules P: grammar rules
- Generate sentences or recognize sentences (Parsing) of the language L through the application of grammar rules.
22Grammar
- Here, POS Tags are included in the grammar rules.
- det → the
- noun → dog | bone
- verb → eat
- NP → det noun (NP: noun phrase)
- VP → verb (VP: verb phrase)
- VP → verb NP
- S → NP VP (S: sentence)
- Most often we deal with Context-free Grammars, with a distinguished Start-symbol S (sentence).
23Parsing
- Parsing
- derive the syntactic structure of a sentence based on a language model (grammar)
- construct a parse tree, i.e. the derivation of the sentence based on the grammar (rewrite system)
24Parsing (here bottom-up)
- determine the syntactic structure of the sentence
- the → det
- dog → noun
- det noun → NP
- eats → verb
- the → det
- bone → noun
- det noun → NP
- verb NP → VP
- NP VP → S
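The derivation above can be reproduced with a small backtracking shift-reduce parser. This is a minimal sketch, not the method used in the course: `LEXICON`, `RULES`, `parse` and `bottom_up_parse` are names chosen for illustration, and "eats" is entered directly in the lexicon instead of going through morphology.

```python
# Toy lexicon and grammar for "the dog eats the bone" (assumptions of this sketch).
LEXICON = {"the": "det", "dog": "noun", "bone": "noun", "eats": "verb"}
RULES = [
    ("NP", ("det", "noun")),
    ("VP", ("verb", "NP")),
    ("VP", ("verb",)),
    ("S", ("NP", "VP")),
]


def parse(stack, words, steps):
    """Backtracking shift-reduce search; returns the list of steps or None."""
    if not words and stack == ["S"]:
        return steps
    # Reduce: replace a rule's right-hand side on top of the stack by its left-hand side.
    for lhs, rhs in RULES:
        n = len(rhs)
        if tuple(stack[-n:]) == rhs:
            step = f"{' '.join(rhs)} -> {lhs}"
            result = parse(stack[:-n] + [lhs], words, steps + [step])
            if result is not None:
                return result
    # Shift: look up the category of the next word and push it onto the stack.
    if words:
        tag = LEXICON.get(words[0].lower())
        if tag is not None:
            return parse(stack + [tag], words[1:], steps + [f"{words[0]} -> {tag}"])
    return None


def bottom_up_parse(sentence):
    return parse([], sentence.split(), [])


for step in bottom_up_parse("the dog eats the bone"):
    print(step)
```

Backtracking is needed because a purely greedy reduction would turn "eats" into a complete VP too early and then fail on "the bone".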
25Sample Grammar
Grammar (S, NT, T, P): NT Non-Terminals, T Terminals, P Productions
- Sentence Symbol S ∈ NT
- Word-Classes / Part-of-Speech ∈ NT
- syntactic Constituents ∈ NT
- terminal words ∈ T
- Grammar Rules P ⊆ NT × (NT ∪ T)*
- S → NP VP | Aux NP VP
- NP → Det Nominal | Proper-Noun
- Nominal → Noun | Nominal PP
- VP → Verb | Verb NP | Verb PP | Verb NP PP
- PP → Prep NP
- Det → that | this | a
- Noun → book | flight | meal | money
- Proper-Noun → Houston | American Airlines | TWA
- Verb → book | include | prefer
- Prep → from | to | on
- Aux → do | does
26Sample Parse Tree
Parse of "Does this flight include a meal?":
[S [Aux does] [NP [Det this] [Nominal [Noun flight]]] [VP [Verb include] [NP [Det a] [Nominal [Noun meal]]]]]
27Bottom-up vs. Top-Down Parsing
- Bottom-up: from word-nodes to sentence-symbol
- Top-down Parsing: from sentence-symbol to words
[Figure: parse tree for "does this flight include a meal", with S at the root, Aux, NP and VP below it, and the words as leaves]
28Ambiguity
- "One morning, I shot an elephant in my pajamas. How he got into my pajamas, I don't know." (Groucho Marx)
- syntactical or structural ambiguity: several parse trees
- example: the sentence above
- semantic or lexical ambiguity: several word meanings
- bank (where you get money) and (river) bank
- even different word categories possible (interim)
- "He books the flight." vs. "The books are here."
- "Fruit flies from the balcony." vs. "Fruit flies are on the balcony."
29Lexical Ambiguity
- Several word senses or word categories
- e.g. chase: noun or verb
- e.g. plant - ????
30Syntactic Ambiguity
- Several parse trees
- e.g. The dog eats the bone in the park.
- e.g. The dog eats the bone in the package.
- Who/what is in the park and who/what is in the package?
- Syntactically speaking: How do I bind the Prepositional Phrase "in the ..."?
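To make the two readings countable, the sketch below counts parse trees with a CKY-style chart. It assumes a small binary grammar (the `BINARY` and `LEXICON` tables are invented for this example) in which a PP may attach either to an NP or to a VP.

```python
from collections import defaultdict

# Hypothetical binary grammar: (left-hand side, left child, right child).
BINARY = [
    ("S", "NP", "VP"),
    ("NP", "Det", "Noun"),
    ("NP", "NP", "PP"),   # PP attached to the noun phrase ("the bone in the package")
    ("VP", "Verb", "NP"),
    ("VP", "VP", "PP"),   # PP attached to the verb phrase ("eats ... in the park")
    ("PP", "Prep", "NP"),
]
LEXICON = {"the": "Det", "dog": "Noun", "bone": "Noun", "park": "Noun",
           "package": "Noun", "eats": "Verb", "in": "Prep"}


def count_parses(sentence, goal="S"):
    """Count the distinct parse trees of a sentence (unknown words raise KeyError)."""
    words = sentence.lower().rstrip(".").split()
    n = len(words)
    # chart[(i, j)][X] = number of trees with root X spanning words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        chart[(i, i + 1)][LEXICON[word]] += 1
    for span in range(2, n + 1):
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for lhs, left, right in BINARY:
                    chart[(i, j)][lhs] += chart[(i, k)][left] * chart[(k, j)][right]
    return chart[(0, n)][goal]


print(count_parses("The dog eats the bone in the park."))  # 2
```

The grammar alone gives both sentences the same two trees; choosing between "park" and "package" needs semantic knowledge, which is the point of the slide.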
31Problems in Parsing
- Problems with left-recursive rules like NP → NP PP: we don't know how many times the recursion is needed.
- Pure Bottom-up or Top-down Parsing is inefficient because it generates and explores too many structures which in the end turn out to be invalid.
- Combine top-down and bottom-up approach:
- Start with sentence; use rules top-down (look-ahead); read input; try to find shortest path from input to highest unparsed constituent (from left to right).
- → Chart-Parsing / Earley-Parser
32Chart-Parsing / Earley Algorithm
- Essence
- Integrate top-down and bottom-up parsing.
- Keep recognized sub-structures (sub-trees) for shared use during parsing.
- Top-down Prediction: Start with S-symbol. Generate all applicable rules for S. Go further down with the left-most constituent in rules and add rules for these constituents until you encounter a left-most node on the RHS which is a word category (POS).
- Bottom-up Completion: Read input word and compare. If word matches, mark as recognized and continue the recognition bottom-up, trying to complete active rules.
33Earley Algorithm - Functions
- predictor
- generates new rules for partly recognized RHS with a constituent right of the dot • (top-down generation)
- the dot • indicates how far a rule has been recognized
- scanner
- if a word category (POS) is found right of the •, the Scanner reads the next input word and adds a rule for it to the chart (bottom-up mode)
- completer
- if a rule is completely recognized (the • is far right), the recognition state of earlier rules in the chart advances: the • is moved over the recognized constituent (bottom-up recognition).
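The three functions can be put together into a compact recognizer. The following is a sketch under assumptions: the grammar is a subset of the sample grammar plus the rule S → VP, the `LEXICON` maps words to sets of POS categories, and a state is a tuple (LHS, RHS, dot position, start position). `GAMMA` is a dummy start state introduced for this sketch.

```python
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"], ["Proper-Noun"]],
    "Nominal": [["Noun"], ["Nominal", "PP"]],
    "VP": [["Verb"], ["Verb", "NP"]],
    "PP": [["Prep", "NP"]],
}
LEXICON = {
    "book": {"Noun", "Verb"}, "flight": {"Noun"}, "meal": {"Noun"},
    "this": {"Det"}, "that": {"Det"}, "a": {"Det"},
    "include": {"Verb"}, "prefer": {"Verb"}, "does": {"Aux"},
    "from": {"Prep"}, "to": {"Prep"}, "on": {"Prep"}, "houston": {"Proper-Noun"},
}


def earley_recognize(words, grammar=GRAMMAR, lexicon=LEXICON, start="S"):
    """Return True if the word list is a sentence of the grammar."""
    words = [w.lower() for w in words]
    chart = [[] for _ in range(len(words) + 1)]

    def add(state, k):
        if state not in chart[k]:
            chart[k].append(state)

    add(("GAMMA", (start,), 0, 0), 0)
    for k in range(len(words) + 1):
        i = 0
        while i < len(chart[k]):  # chart[k] may grow while it is processed
            lhs, rhs, dot, origin = chart[k][i]
            i += 1
            if dot < len(rhs):
                nxt = rhs[dot]
                if nxt in grammar:
                    # Predictor: expand the constituent right of the dot.
                    for production in grammar[nxt]:
                        add((nxt, tuple(production), 0, k), k)
                elif k < len(words) and nxt in lexicon.get(words[k], ()):
                    # Scanner: the next word has the expected POS category.
                    add((nxt, (words[k],), 1, k), k + 1)
            else:
                # Completer: move the dot over the finished constituent.
                for l2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        add((l2, r2, d2 + 1, o2), k)
    return ("GAMMA", (start,), 1, 0) in chart[len(words)]


print(earley_recognize("book this flight".split()))  # True
```

The left-recursive rule Nominal → Nominal PP causes no infinite loop here, because a state that is already in the chart is never added twice.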
34Chart
[Chart for "Book this flight", completed edges:]
- S → VP •
- VP → V NP •
- NP → Det Nom •
- Nom → Noun •
- V: Book, Det: this, Noun: flight
36Semantics
37Semantic Representation
- Representation of the meaning of a sentence.
- Generate
- a logic-based representation or
- a frame-based representation
- based on the syntactic structure, lexical
entries, and particularly the head-verb
(determines how to arrange parts of the sentence
in the semantic representation).
38Semantic Representation
- Verb-centered Representation
- The verb (action, head) is regarded as the center of the verbal expression and determines the case frame with possible case roles; other parts of the sentence are described in relation to the action as fillers of case slots. (cf. also Schank's CD Theory)
- Typing of case roles possible (e.g. 'agent' refers to a specific sort or concept)
39General Frame for "eat"
- Agent: animate
- Action: eat
- Patiens: food
- Manner: e.g. fast
- Location: e.g. in the yard
- Time: e.g. at noon
40Example-Frame with Fillers
- Agent: the dog
- Action: eat
- Patiens: the bone / the bone in the package
- Location: in the park
41General Frame for "drive" and Frame with Fillers
- Agent: animate; filler: she
- Action: drive; filler: drives
- Patiens: vehicle; filler: the convertible
- Manner: the way it is done; filler: fast
- Location: Location-spec; filler: in the Rocky Mountains
- Source: Location-spec; filler: from home
- Destination: Location-spec; filler: to the ASIC conference
- Time: Time-spec; filler: in the summer holidays
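A typed case frame of this kind could be represented as a dictionary of slots with sort restrictions. This is a minimal sketch: the tables `FRAMES` and `SORTS` and the function `fill_frame` are hypothetical, and the sort assignments are assumptions made only so that the examples from the slides type-check.

```python
# Hypothetical general frames: action -> typed case roles.
FRAMES = {
    "eat": {"Agent": "animate", "Patiens": "food"},
    "drive": {"Agent": "animate", "Patiens": "vehicle"},
}

# Hypothetical mini-ontology: filler -> sorts it belongs to.
SORTS = {
    "she": {"animate"},
    "the dog": {"animate"},
    "the bone": {"food"},
    "the convertible": {"vehicle"},
}


def fill_frame(action, **fillers):
    """Instantiate the frame for an action; raise ValueError on a sort violation."""
    slots = FRAMES[action]
    frame = {"Action": action}
    for role, filler in fillers.items():
        required = slots.get(role)  # untyped roles (Manner, Location, ...) accept anything
        if required is not None and required not in SORTS.get(filler, set()):
            raise ValueError(f"{filler!r} is not of sort {required!r} required for {role}")
        frame[role] = filler
    return frame


print(fill_frame("drive", Agent="she", Patiens="the convertible", Manner="fast"))
```

Rejecting a filler of the wrong sort is one simple way such typing can help with ambiguity, for example ruling out a vehicle as the agent of "eat".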
42Representation in Logic
- Action: eat
- Agent: the dog
- Patiens: the bone / the bone in the package
- Location: in the park
- eat (dog-1, bone-1, park-1)
- eat is the predicate; dog-1, bone-1 and park-1 are constants
43Representation in Logic
- lexical (constants): eat (dog-1, bone-1, park-1)
- general (variables): eat (x, y, z)
- syntactic: eat (NP-1, NP-2, PP)
- semantic frame: animate-being (x), food (y), location (z)
- syntactic frame: NP-1 (x), NP-2 (y), PP (z)
44Pragmatics
45Pragmatics
- Pragmatics includes context-related aspects of NL expressions (utterances).
- These are in particular anaphoric references, elliptic expressions, deictic expressions, ...
- anaphoric references: refer to items mentioned before
- deictic expressions: simulate pointing gestures
- elliptic expressions: incomplete expressions that relate to an item mentioned before
46Pragmatics
- "I put the box on the top shelf."
- "I know that. But I can't find it there." (anaphoric reference: "it"; deictic expression: "there")
- "The candy-box?" (elliptic expression)
47Intentions
- Intentions
- One philosophical assumption is that natural language is used to achieve things or situations: "Do things with words."
- The meaning of an utterance is essentially determined by the intention of the speaker.
48Intentionality - Examples
- What was said → What was meant
- "There is a terrible draft here." → "Can you please close the window."
- "How does it look here?" → "I am really mad; clean up your room."
- "Will this ever end?" → "I would prefer to be with my friends than to sit in class now."
49Metaphors
- Metaphors
- The meaning of a sentence or expression is not directly inferable from the sentence structure and the word meanings. Metaphors transfer concepts and relations from one area of discourse into another area, for example, seeing time as a line (in space) or seeing friendship or life as a journey.
50Metaphors - Examples
- This car eats a lot of gas.
- She devoured the book.
- He was tied up with his clients.
- Marriage is like a journey.
- Their marriage was a one-way road into hell.
- (see George Lakoff, Women, Fire and Dangerous
Things)
51Dialogue and Discourse
52Discourse / Dialogue Structure
- Grammar for various sentence types (speech acts): dialogue, discourse, story grammar
- Distinguish questions, commands, and statements:
- Where is the remote-control?
- Bring the remote-control!
- The remote-control is on the brown table.
- Dialogue Grammars describe possible sequences of
Speech Acts in communication, e.g. that a
question is followed by an answer/statement.
53Speech
55Speech Production and Reception
- Sound and Hearing
- change in air pressure → sound wave
- reception through inner ear membrane / microphone
- break-up into frequency components: receptors in cochlea / mathematical frequency analysis (e.g. Fast Fourier Transform, FFT) → Frequency Spectrum
- perception/recognition of phonemes and subsequently words (e.g. Neural Networks, Hidden Markov Models)
57Speech Recognition Phases
- Speech Recognition
- acoustic signal as input
- signal analysis - spectrogram
- feature extraction
- phoneme recognition
- word recognition
- conversion into written words
58Speech Signal
- Speech Signal composed of
- harmonic signal (sine waves)
- with different frequencies and amplitudes
- frequency: waves/second → like pitch
- amplitude: height of wave → like loudness
- non-harmonic signal (not a sine wave): noise
60glottis and speech signal in lingWAVES (from http://www.lingcom.de)
61Speech Signal Analysis
- Analog-Digital Conversion of Acoustic Signal
- Sampling in Time Frames (windows)
- frequency: zero-crossings per time frame
- → e.g. 2 crossings/second is 1 Hz (1 wave)
- → e.g. 10 kHz needs sampling rate 20 kHz
- measure amplitudes of signal in time frame
- → digitized wave form
- separate different frequency components
- → FFT (Fast Fourier Transform)
- → spectrogram
- other frequency based representations
- → LPC (linear predictive coding),
- → Cepstrum
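The steps above can be tried out on a synthetic signal. The sketch below is an illustration under assumptions: it samples pure sine tones, estimates frequency from zero crossings (2 crossings per wave), and finds the strongest frequency component with a naive discrete Fourier transform, which computes the same spectrum as an FFT but more slowly. All function names are invented for this example.

```python
import cmath
import math


def sample_sine(freq_hz, rate_hz, duration_s, amplitude=1.0, phase=0.5):
    """Digitize a sine wave: one amplitude value per sampling step."""
    n = int(rate_hz * duration_s)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / rate_hz + phase)
            for i in range(n)]


def zero_crossing_frequency(samples, rate_hz):
    """Estimate frequency from sign changes: 2 crossings per second is 1 Hz."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    duration_s = len(samples) / rate_hz
    return crossings / (2 * duration_s)


def dominant_frequency(samples, rate_hz):
    """Return the frequency (Hz) of the strongest component, via a naive DFT."""
    n = len(samples)

    def magnitude(k):
        return abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                       for i, x in enumerate(samples)))

    best_bin = max(range(1, n // 2), key=magnitude)
    return best_bin * rate_hz / n


def nyquist_rate(max_freq_hz):
    """Minimum sampling rate needed to capture frequencies up to max_freq_hz."""
    return 2 * max_freq_hz


tone = sample_sine(5, 100, 1.0)
overtone = sample_sine(12, 100, 1.0, amplitude=0.5)
mix = [a + b for a, b in zip(tone, overtone)]
print(zero_crossing_frequency(tone, 100))  # 5.0
print(dominant_frequency(mix, 100))        # 5.0
print(nyquist_rate(10_000))                # 20000
```

Zero crossings only work for a single clean tone; for the mixed signal the frequency components have to be separated, which is what the Fourier transform does.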
62Waveform
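[Figure: waveform of "She just had a baby."; amplitude/pressure over time]
63Waveform for Vowel ae
[Figure: waveform of the vowel ae; amplitude/pressure over time]
64Waveform and Spectrogram
65Waveform and LPC Spectrum for Vowel ae
[Figure: waveform (amplitude/pressure over time) and LPC spectrum (energy over frequency, with formants) for the vowel ae]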
66Phoneme Recognition
- Recognition Process based on
- features extracted from spectral analysis
- phonological rules
- statistical properties of language/ pronunciation
- Recognition Methods
- Hidden Markov Models
- Neural Networks
- Pattern Classification in general
67Speech Signal Characteristics
- Derive from signal representation:
- formants: dark stripes in spectrum
- strong frequency components; characterize particular vowels; gender of speaker
- pitch: fundamental frequency
- baseline for higher frequency harmonics like formants; gender characteristic
- change in frequency distribution
- characteristic for e.g. plosives (form of articulation)
68Features for Vowels Consonants
69Probabilistic FAs as Word Models
70Word Recognition with Hidden Markov Model
71Viterbi-Algorithm
- The Viterbi Algorithm finds an optimal sequence of states in continuous Speech Recognition, given an observation sequence of phones and a probabilistic (weighted) FA (state graph). The algorithm returns the path through the automaton which has maximum probability and accepts the observation sequence.
- $a_{s,s'}$ is the transition probability (in the phonetic word model) from current state $s$ to next state $s'$, and $b_{s'}(o_t)$ is the observation likelihood of $s'$ given $o_t$. $b_{s'}(o_t)$ is 1 if the observation symbol matches the state, and 0 otherwise.
- (cf. Jurafsky Ch.5)
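A minimal sketch of the algorithm, assuming a toy word model for "tomato" with two pronunciation variants: the state names, the phone labels and the probabilities 0.6 and 0.4 are invented for illustration. The observation likelihood `emit` is 1 if the phone matches the state and 0 otherwise, as described above.

```python
def viterbi(observations, states, start_p, trans_p, emit):
    """Return (probability, state path) of the most probable path."""
    # best[s] = (probability, path) of the best path that ends in state s
    best = {s: (start_p.get(s, 0.0) * emit(s, observations[0]), [s]) for s in states}
    for obs in observations[1:]:
        new_best = {}
        for s2 in states:
            # Extend every previous path by the transition s -> s2 and keep the best.
            new_best[s2] = max(
                ((p * trans_p.get((s, s2), 0.0) * emit(s2, obs), path + [s2])
                 for s, (p, path) in best.items()),
                key=lambda item: item[0],
            )
        best = new_best
    return max(best.values(), key=lambda item: item[0])


# Hypothetical phonetic word model: state -> phone it accepts.
PHONE = {"t1": "t", "ow1": "ow", "m": "m", "ey": "ey", "aa": "aa", "t2": "t", "ow2": "ow"}
START = {"t1": 1.0}
TRANS = {("t1", "ow1"): 1.0, ("ow1", "m"): 1.0, ("m", "ey"): 0.6, ("m", "aa"): 0.4,
         ("ey", "t2"): 1.0, ("aa", "t2"): 1.0, ("t2", "ow2"): 1.0}


def emit(state, phone):
    return 1.0 if PHONE[state] == phone else 0.0


prob, path = viterbi("t ow m aa t ow".split(), list(PHONE), START, TRANS, emit)
print(path, prob)
```

A probability of 0 means that no path through the automaton accepts the observation sequence; the sketch does not check for a designated final state.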
72Speech Recognizer Architecture
73Speech Processing - Characteristics
- Speech Recognition vs. Speaker Identification (Voice Recognition)
- speaker-dependent vs. speaker-independent
- training
- unlimited vs. large vs. small vocabulary
- single word vs. continuous speech
74Spoken Language
75Spoken Language
- Output of Speech Recognition System as input "text".
- Can be associated with probabilities for different word sequences.
- Contains ungrammatical structures, so-called "disfluencies", e.g. repetitions and corrections.
77Spoken Language - Examples
- no s- straight southwest
- right to my my left
- that is that is correct
From Robin J. Lickley. HCRC Disfluency Coding Manual. http://www.ling.ed.ac.uk/robin/maptask/HCRCdsm-01.html
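The simplest kind of disfluency, an immediate repetition, can be removed with a few lines of code. This is a rough sketch with an invented function name; it is not a method from the coding manual, it would also delete intended repetitions, and it does not handle corrections such as "no s- straight southwest".

```python
def remove_repetitions(utterance, max_len=3):
    """Drop word sequences (up to max_len words) that immediately repeat."""
    words = utterance.split()
    out = []
    i = 0
    while i < len(words):
        skipped = False
        for n in range(max_len, 0, -1):
            # Do the next n words repeat the n words just emitted?
            if len(out) >= n and words[i:i + n] == out[-n:]:
                i += n
                skipped = True
                break
        if not skipped:
            out.append(words[i])
            i += 1
    return " ".join(out)


print(remove_repetitions("right to my my left"))      # right to my left
print(remove_repetitions("that is that is correct"))  # that is correct
```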
78Spoken Language - Examples
- we're going to g-- ... turn straight back around for testing.
- come to ... walk right to the ... right-hand side of the page.
- right up ... past ... up on the left of the ... white mountain walk ... right up past.
- i'm still ... i've still gone halfway back round the lake again.
79Spoken Language - Examples
- I'd d if I need to go
- it's basi-- see if you go over the old mill
- you are going make a gradual slope to your right
- I've got one I don't realize why it is there
80Spoken Language - Disfluency
- come to ... walk right to the ... the right-hand side of the page
- Labels in the figure: Reparandum (the part that is replaced) and Repair (its replacement)
81Additional References
- Jurafsky, D. & J. H. Martin, Speech and Language Processing, Prentice-Hall, 2000
- Huang, X., A. Acero & H. Hon, Spoken Language Processing. A Guide to Theory, Algorithms, and System Development. Prentice-Hall, NJ, 2001
- Kemke, C., 74.793 Natural Language and Speech Processing - Course Notes, 2nd Term 2004, Dept. of Computer Science, U. of Manitoba
- Robin J. Lickley. HCRC Disfluency Coding Manual. http://www.ling.ed.ac.uk/robin/maptask/HCRCdsm-01.html
82Figures
- Figures taken from:
- Jurafsky, D. & J. H. Martin, Speech and Language Processing, Prentice-Hall, 2000, Chapters 5 and 7.
- lingWAVES (from http://www.lingcom.de)