Title: CS 904: Natural Language Processing Spoken Dialogue Systems
1CS 904 Natural Language Processing: Spoken
Dialogue Systems
- L. Venkata Subramaniam
- April 2, 2002
2What is a Spoken Dialogue System?
- A system that allows a user to speak his queries
in natural language and receive useful responses
from it.
- Spoken dialogue systems provide an interface
between the user and a computer-based application
that permits spoken interaction with the
application in a relatively natural manner.
3Issues in Dialogue Systems
- System needs to participate actively to maintain
a natural, smooth-flowing dialogue even in the
event of recognition and interpretation errors.
- Use of acknowledgements to verify understanding.
- Recognize when something is not understood and
generate clarification sub-dialogues.
4Spoken and Written Dialogues
- Spoken Querying
- Recognition errors
- Grammatical errors in speech
- Unclear sentence boundaries
- Omissions and word fragments
- Inversions
- Interjections
- Speech repairs
- Written querying does not pose many of these
problems.
5Interactive Voice Response (IVR) Systems and
Dialogue Systems
- IVR
- Provide an interface between users and computer
databases over telephone lines.
- Employ a touch-tone or DTMF user interface.
- Newer systems allow simple voice commands.
- Spoken Dialogue System
- Permits spoken interaction for a user with the
application in a relatively natural manner.
6Controlled Speech and Spontaneous Speech
- Controlled speech has limited task vocabulary and
grammar.
- Spontaneous speech
- High out-of-vocabulary rate
- Higher Recognition errors
- High grammatical variation
- Unclear sentence boundaries, omissions,
inversions, word fragments, interjections,
restarts, speech repairs.
7System Performance Issues
- It must run in near real time.
- The user should need minimal training and should
not be constrained in what he can say.
- The dialogue should result in something that can
be independently evaluated.
8Performance Assessment
- Confusion matrices for key words
- Number of dialogue turns
- Rate of correction/repair turns
- Time to completion
- Transaction success rate
- Quality of the final solution
There is no single correct answer, so accuracy is
harder to measure than in, for instance, speech
recognition.
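Several of the measures listed above can be computed directly from logged dialogues. The following is a minimal sketch, assuming each dialogue has already been annotated with its turn count, repair turns, duration, and outcome; the `DialogueLog` record and its field names are illustrative assumptions, not part of any particular toolkit.

```python
from dataclasses import dataclass

@dataclass
class DialogueLog:
    """Hypothetical per-dialogue annotation used for evaluation."""
    turns: int          # total number of dialogue turns
    repair_turns: int   # turns spent on corrections or clarifications
    seconds: float      # time to completion
    success: bool       # did the transaction complete correctly?

def summarize(logs):
    """Aggregate per-dialogue logs into corpus-level measures."""
    n = len(logs)
    total_turns = sum(d.turns for d in logs)
    return {
        "mean_turns": total_turns / n,
        "repair_rate": sum(d.repair_turns for d in logs) / total_turns,
        "mean_seconds": sum(d.seconds for d in logs) / n,
        "success_rate": sum(d.success for d in logs) / n,
    }

# Two made-up dialogues, for illustration only.
logs = [DialogueLog(8, 2, 60.0, True), DialogueLog(12, 3, 90.0, False)]
print(summarize(logs))
```

Quality of the final solution and keyword confusion matrices need task-specific judgments and are left out of this sketch.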
9Dialogue Complexity and Example Systems
10Levels of Sophistication in a Dialogue System
- Touch-tone replacement
- System Prompt: "For checking information, press
or say one."
- Caller Response: "One."
- Directed dialogue
- System Prompt: "Would you like checking account
information or rate information?"
- Caller Response: "Checking", or
"checking account," or "rates."
- Natural language
- System Prompt: "What transaction would you like
to perform?"
- Caller Response: "Transfer Rs. 500 from checking
to savings."
11Levels of Complexity in Dialogue Management
- Strict Policy
- User can only specify information relating to
current goal/subgoal
- Context is easier to determine
- Free Policy
- Handle unintended requests or requests that
deviate from the task
- Context more difficult to determine
- Can lead to confusion/errors
12Initiative
- System-initiative: system always has control,
user only responds to system questions
- User-initiative: user always has control, system
passively answers user questions
- Mixed-initiative: control switches between system
and user using fixed rules
- Variable-initiative: control switches between
system and user dynamically based on participant
roles, dialogue history, etc.
13Dialogue and Task Complexity
- Practical Dialogue: dialogue is focused on
accomplishing a concrete task.
14Finite State Dialogue Modeling: Long Dist. Dialing
- System asks a series of questions that the user
answers: "What number would you like to call?",
"Is this a Delhi number?"
- Initiative always with the system.
- Context is fixed by the question being asked.
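A finite state dialogue of this kind can be sketched as a table of states, each with a fixed prompt and a rule for choosing the next state. This is a minimal illustration, not a description of any deployed system: the state names, the `run_dialogue` helper, and the third question ("Which city is the number in?") are assumptions added for the example.

```python
# Each state holds (prompt, slot filled by the reply, function choosing the next state).
STATES = {
    "ask_number": ("What number would you like to call?", "number",
                   lambda reply: "ask_delhi"),
    "ask_delhi": ("Is this a Delhi number?", "is_delhi",
                  lambda reply: "done" if reply.strip().lower().startswith("y")
                  else "ask_city"),
    # Hypothetical follow-up question, not taken from the slide.
    "ask_city": ("Which city is the number in?", "city",
                 lambda reply: "done"),
}

def run_dialogue(replies):
    """Walk the state machine, consuming one scripted user reply per question."""
    replies = iter(replies)
    state, prompts, collected = "ask_number", [], {}
    while state != "done":
        prompt, slot, choose_next = STATES[state]
        prompts.append(prompt)      # the system always asks; the user only answers
        reply = next(replies)
        collected[slot] = reply     # context is fixed by the question just asked
        state = choose_next(reply)
    return prompts, collected

prompts, collected = run_dialogue(["2345 6789", "no", "Mumbai"])
print(prompts)
print(collected)
```

Because each reply is interpreted only against the current question, the recognizer can use a small, state-specific vocabulary; the cost is that the user cannot volunteer information out of order.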
15Frame Based Dialogue Modeling
- System interprets the speech to acquire enough
information in order to perform a specific
action.
- There is a single context that remains fixed for
the system.
- The problem is cast as form filling where the
form specifies all relevant information for an
action.
- Monitor the form for completion.
- From user utterances extract relevant elements.
- Use empty slots as triggers for questions to the
user.
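The form-filling loop described above can be sketched as follows. This is a simplified illustration under stated assumptions: the slot names, the clarification questions, and the small phrase lexicon are all hypothetical, and a real system would extract elements from recognizer output rather than by substring matching.

```python
# Hypothetical form: slot name -> question to ask while the slot is empty.
QUESTIONS = {
    "train": "Which train are you asking about?",
    "station": "Which station?",
    "event": "Do you want the arrival or the departure time?",
}

# Hypothetical lexicon: slot name -> {surface phrase: slot value}.
LEXICON = {
    "train": {"bangalore rajdhani": "Bangalore Rajdhani"},
    "station": {"hazrat nizamuddin": "Hazrat Nizamuddin"},
    "event": {"leave": "departure", "depart": "departure", "arrive": "arrival"},
}

def update_frame(frame, utterance):
    """Extract the relevant elements from an utterance and store them in the form."""
    text = utterance.lower()
    for slot, phrases in LEXICON.items():
        for phrase, value in phrases.items():
            if phrase in text:
                frame[slot] = value
    return frame

def next_question(frame):
    """Use the first empty slot as the trigger for a question; None means complete."""
    for slot, question in QUESTIONS.items():
        if slot not in frame:
            return question
    return None

frame = update_frame({}, "When does the Bangalore Rajdhani arrive?")
print(frame)                 # the station slot is still empty
print(next_question(frame))  # so the system asks for it
```

Unlike the finite state approach, the user may supply several slots in one utterance and in any order; the system only asks about what is still missing.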
16Frame Based: Train Arrival/Departure Info
- "When does the Bangalore Rajdhani leave Hazrat
Nizamuddin?"
- Initiative with the user.
- Context is fixed to train arr./dep. info.
18Sets of Contexts: Banking Transaction
- Sets of contexts, each represented by using the
frame-based approach.
- Initiative with the user.
- Context is fixed by the question asked by the
user.
19Issues in Multiple Context
- System should recognize when context switches.
- Effect changes/corrections: User may want to
change the fixed deposit duration set earlier
based on new information he obtains from the
system on the interest rates.
20Complex Dialogue Modeling
- Plan (Task) Based Model: The dialogue involves
interactively constructing a plan (e.g. kitchen
design consultant, planning a rescue from an
island).
- Agent Based Model: Involves planning and also
executing and monitoring operations in a
dynamically changing world (e.g. emergency rescue
coordination).
21Dialogue System Model
22Spoken Dialogue System
[Figure: block diagram with the components User,
Speech Recognition, Semantic Interpretation,
Discourse Interpretation, Dialogue Management,
Response Generation, Speech Synthesis]
23Parts of the Spoken Dialogue System
- Signal Processing
- Convert the audio wave into a sequence of feature
vectors.
- Speech Recognition
- Decode the sequence of feature vectors into a
sequence of words.
- Semantic Interpretation
- Determine the meaning of the words.
- Discourse Interpretation
- Understand what the user intends by interpreting
utterances in context.
- Dialogue Management
- Determine system goals in response to user
utterances based on user intention.
- Speech Synthesis
- Generate synthetic speech as a response.
24Dialogue System Interfaces
- The dialogue manager interacts with the user to
collect enough information to query the knowledge
source and give a useful reply.
25Robust interpretation of speech in presence of
recognition errors
- Statistical error correction
- Robust syntactic and semantic parsing
- Use of context (discourse)
26Statistical Error Correction
- Corrects the errors made by the speech
recognition unit.
- Given an observed sequence O from the speech
recognizer, it finds the most likely original
word sequence S.
- Finds the S that maximizes Prob(O|S) · Prob(S),
which is equivalent to maximizing Prob(S|O).
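The maximization above can be illustrated with a toy noisy-channel corrector. This is only a sketch: the candidate sentences and every probability below are made-up values standing in for a trained language model Prob(S) and a trained channel model Prob(O|S).

```python
import math

# Hypothetical language model: Prob(S) for each candidate word sequence.
LANGUAGE_MODEL = {
    "open a new account": 0.6,
    "open a new a count": 0.01,
    "open anew account": 0.05,
}

# Hypothetical channel model: Prob(O|S), keyed by (observed O, intended S).
CHANNEL_MODEL = {
    ("open a new a count", "open a new account"): 0.3,
    ("open a new a count", "open a new a count"): 0.6,
    ("open a new a count", "open anew account"): 0.1,
}

def correct(observed):
    """Return the candidate S that maximizes Prob(O|S) * Prob(S)."""
    best, best_score = observed, -math.inf
    for candidate, prior in LANGUAGE_MODEL.items():
        likelihood = CHANNEL_MODEL.get((observed, candidate), 0.0)
        if likelihood == 0.0:
            continue  # this candidate cannot produce the observation
        score = math.log(likelihood) + math.log(prior)  # log of the product
        if score > best_score:
            best, best_score = candidate, score
    return best  # falls back to the observation if no candidate applies

print(correct("open a new a count"))
```

Here the recognizer output itself has the highest channel probability, but the language model prior outweighs it, so the corrected sequence is chosen. Working in log space avoids underflow when the sequences are long.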
27Robust Parsing
- Spoken language is not sentence based.
- A speaker commits a sequence of speech acts: "OK
let's do that then. Open a new account for me."
Acknowledgement ("OK"), an acceptance ("let's do
that"), a request ("Open a new account").
- "Where is Lagaan playing in South Delhi?" The
parsing should do concept extraction/keyword
spotting.
- Utterance Type: where-question
- Movie: Lagaan
- Town: South Delhi
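Concept extraction by keyword spotting can be sketched as below. The word lists are hypothetical stand-ins for a domain dictionary, and the function ignores everything it does not recognize, which is what lets it survive interjections and restarts.

```python
# Hypothetical domain dictionary for a movie-listings application.
QUESTION_WORDS = {"where": "where-question", "when": "when-question"}
MOVIES = ["lagaan"]
TOWNS = ["south delhi", "north delhi"]

def extract_concepts(utterance):
    """Spot known keywords and ignore everything else in the utterance."""
    text = utterance.lower()
    concepts = {}
    for word in text.replace("?", " ").split():
        if word in QUESTION_WORDS:
            concepts["utterance_type"] = QUESTION_WORDS[word]
            break  # keep the first question word found
    for movie in MOVIES:
        if movie in text:
            concepts["movie"] = movie.title()
    for town in TOWNS:
        if town in text:
            concepts["town"] = town.title()
    return concepts

# A disfluent variant of the slide's example still yields the same concepts.
print(extract_concepts("uh where is um Lagaan playing I mean in South Delhi"))
```

No full syntactic parse is attempted, so ungrammatical input does not cause a failure; the trade-off is that word order and negation are ignored.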
28Discourse Interpretation
- Maintains the system's idea of the state of the
discourse.
- Omitted words (or phrases) and pronominal
references are resolved using common sense and
discourse information.
29Reference Resolution
- Domain Knowledge (banking transaction)
- Discourse Knowledge
- World Knowledge
- U: I would like to open a fixed deposit account.
- S: For what amount?
- U: Make it for 8000 Rupees.
- S: For what duration?
- U: What is the interest rate for 3 months?
- S: Six percent.
- U: Oh good then make it for that duration.
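The last user turn refers back to "that duration", which only discourse knowledge can resolve. A minimal sketch of the idea follows, assuming the system remembers the most recently mentioned value of each concept type; the `DiscourseState` class and its patterns are hypothetical and handle only this narrow kind of reference.

```python
import re

# Hypothetical patterns for concept types in the banking domain.
PATTERNS = {
    "duration": re.compile(r"\d+\s+(?:months?|years?)"),
    "amount": re.compile(r"\d+\s+rupees"),
}

class DiscourseState:
    """Remembers the most recent mention of each concept type."""

    def __init__(self):
        self.mentions = {}

    def observe(self, utterance):
        """Record any amount or duration mentioned in the utterance."""
        text = utterance.lower()
        for kind, pattern in PATTERNS.items():
            match = pattern.search(text)
            if match:
                self.mentions[kind] = match.group(0)

    def resolve(self, utterance):
        """Replace 'that duration' / 'that amount' with the remembered value."""
        def substitute(match):
            kind = match.group(1).lower()
            return self.mentions.get(kind, match.group(0))
        return re.sub(r"\bthat (duration|amount)\b", substitute,
                      utterance, flags=re.IGNORECASE)

state = DiscourseState()
state.observe("Make it for 8000 Rupees.")
state.observe("What is the interest rate for 3 months?")
print(state.resolve("Oh good then make it for that duration."))
```

Domain and world knowledge (for example, that a fixed deposit needs both an amount and a duration) would be needed to resolve the bare "it" in the same utterance; that is outside this sketch.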
30Utterance Types
- Possible user utterances are tagged as one of
many types.
- Examples
31Utterance Type Detection
- Words and Word Grammar: Pick the Utterance Type
which is most likely given the word string.
- Discourse Grammar: Pick the Utterance Type which
is most likely given the surrounding utterance
types.
- Prosodic Information: Use pitch contour, energy,
SNR, speaking rate to choose Utterance Type.
32Implementation of Utterance Type Detection
- The discourse structure of a conversation is
modeled using an HMM where the individual dialogue
acts are observations emanating from the HMM
states.
- Constraints on the likely sequence of dialogue
acts are modeled via a dialogue act n-gram.
- The statistical dialogue grammar is combined with
word n-grams, decision trees, and neural networks
modeling the idiosyncratic lexical and prosodic
manifestations of each dialogue act.
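One way to combine a dialogue act n-gram with per-utterance evidence is Viterbi decoding. The sketch below is a simplification under stated assumptions: it treats the dialogue act labels as the hidden states, uses a bigram as the dialogue grammar, and takes the lexical/prosodic likelihoods as given numbers. All act names and probabilities are made up for illustration, and all must be non-zero because logarithms are taken.

```python
import math

ACTS = ["question", "answer", "acknowledgement"]

# Hypothetical dialogue act bigram: Prob(next act | previous act).
TRANSITIONS = {
    "<start>": {"question": 0.6, "answer": 0.1, "acknowledgement": 0.3},
    "question": {"question": 0.2, "answer": 0.7, "acknowledgement": 0.1},
    "answer": {"question": 0.4, "answer": 0.1, "acknowledgement": 0.5},
    "acknowledgement": {"question": 0.5, "answer": 0.2, "acknowledgement": 0.3},
}

def viterbi(likelihoods):
    """likelihoods[t][act] = Prob(words and prosody of utterance t | act)."""
    # best[act] = (log score of the best act sequence ending in act, that sequence)
    best = {
        a: (math.log(TRANSITIONS["<start>"][a]) + math.log(likelihoods[0][a]), [a])
        for a in ACTS
    }
    for evidence in likelihoods[1:]:
        new_best = {}
        for a in ACTS:
            # Choose the best previous act to transition from.
            score, path = max(
                (best[p][0] + math.log(TRANSITIONS[p][a]), best[p][1]) for p in ACTS
            )
            new_best[a] = (score + math.log(evidence[a]), path + [a])
        best = new_best
    return max(best.values())[1]

# The second utterance is lexically ambiguous between answer and acknowledgement.
evidence = [
    {"question": 0.7, "answer": 0.2, "acknowledgement": 0.1},
    {"question": 0.1, "answer": 0.45, "acknowledgement": 0.45},
]
print(viterbi(evidence))
```

In this example the word-level evidence alone cannot decide the second utterance; the dialogue act bigram tips it toward an answer because it follows a question.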
33Training Set
- Typical transaction transcripts for the
application form the training set.
- Dialogues between the system and user are
recorded and transcribed.
- Each sentence from the user (utterance) is hand
classified. Classes include yes-no question,
yes-answer, etc.
34Dialogue Management
- Determine system goals in response to specific
user utterances in carrying out the intent of the
user.
- Interpretation of user input in context.
- Maintenance of discourse context.
- Planning the content of the system responses.
- Managing problem solving and planning.
- Interface between user and system knowledge base.
35Response Generation
- Generate natural language utterances to achieve
specific tasks.
- Content selection: determine what to say
- Utterance realization: determine how to say it
36Application Specific Needs
- Dictionary
- Domain concepts
- Grammar
- Dialog objects
37Spoken Dialogue Systems
38Dialogue Based Systems
- IBM: http://www.software.ibm.com/speech/overview/business/demo.html
- Nuance: http://www.nuance.com/demos/demos.html
- MIT Spoken Language Systems Laboratory: http://www.sls.lcs.mit.edu/sls/whatwedo/applications.html
- SpeechWorks: http://www.speechworks.com/demos/index.cfm
- AT&T: http://www.research.att.com/algor/hmihy/
- CMU Communicator: http://fife.speech.cs.cmu.edu/Communicator/