Spoken Dialogue Systems

Provided by: DanJ85

Learn more at: http://www1.cs.columbia.edu

1
Spoken Dialogue Systems
Julia Hirschberg CS 4706
2
Today

Basic Conversational Agents
ASR
NLU
Generation
Dialogue Manager
Dialogue Manager Design
Finite State
Frame-based
Initiative: User, System, Mixed
Information-State
Dialogue-Act Detection
Dialogue-Act Generation
Evaluation
Utility-based conversational agents
MDP, POMDP

3
Conversational Agents

AKA
Interactive Voice Response Systems
Dialogue Systems
Spoken Dialogue Systems
Applications
Travel arrangements (Amtrak, United airlines)
Telephone call routing
Tutoring
Communicating with robots
Anything with limited screen/keyboard

4
A travel dialog: Communicator
5
Call routing: AT&T HMIHY
6
A tutorial dialogue: ITSPOKE
7
Conversational Structure

Telephone conversations
Stage 1: Enter a conversation
Stage 2: Identification
Stage 3: Establish joint willingness to converse
Stage 4: First topic is raised, usually by caller

8
Why is this customer confused?

Customer: (rings)
Operator: Directory Enquiries, for which town
please?
Customer: Could you give me the phone number of
um Mrs. um Smithson?
Operator: Yes, which town is this at please?
Customer: Huddleston.
Operator: Yes. And the name again?
Customer: Mrs. Smithson

9
Why is this customer confused?

A: And, what day in May did you want to travel?
C: OK, uh, I need to be there for a meeting
that's from the 12th to the 15th.
Note that the client did not answer the question.
Meaning of the client's sentence:
Meeting
Start-of-meeting: 12th
End-of-meeting: 15th
Doesn't say anything about flying!
How does the agent infer that the client is
informing him/her of travel dates?

10
Will this client be confused?

A: There's 3 non-stops today.
This would still be literally true if there were
in fact 7 non-stops today.
But agent means 3 and only 3.
How can client infer that agent means
only 3?

11
Grice: conversational implicature

Implicature means a particular class of licensed
inferences.
Grice (1975) proposed that what enables hearers
to draw correct inferences is the
Cooperative Principle
This is a tacit agreement by speakers and
listeners to cooperate in communication

12
4 Gricean Maxims

Relevance: Be relevant
Quantity: Do not make your contribution more or
less informative than required
Quality: Try to make your contribution one that
is true (don't say things that are false or for
which you lack adequate evidence)
Manner: Avoid ambiguity and obscurity; be brief
and orderly

13
Relevance

A: Is Regina here?
B: Her car is outside.
Implication: yes
Hearer thinks: why would he mention the car? It
must be relevant. How could it be relevant? It
could be, since if her car is here she is probably
here.
Client: I need to be there for a meeting that's
from the 12th to the 15th
Hearer thinks: Speaker is following the maxims and
would only have mentioned the meeting if it was
relevant. How could the meeting be relevant? If
the client meant me to understand that he had to
depart in time for the meeting.

14
Quantity

A: How much money do you have on you?
B: I have 5 dollars
Implication: not 6 dollars
Similarly, 3 non-stops can't mean 7 non-stops
(hearer thinks:
if speaker meant 7 non-stops she would have said
7 non-stops)
A: Did you do the reading for today's class?
B: I intended to
Implication: No
B's answer would be true if B intended to do the
reading AND did the reading, but would then
violate the maxim of Quantity

15
Dialogue System Architecture
16
Speech recognition

Input: acoustic waveform
Output: string of words
Basic components:
A recognizer for phones, small sound units like
[k] or [ae].
A pronunciation dictionary, like cat = [k ae t]
A grammar telling us what words are likely to
follow what words
A search algorithm to find the best string of
words

17
Natural Language Understanding

Or NLU
Or Computational semantics
There are many ways to represent the meaning of
sentences
For speech dialogue systems, most common is
Frame and slot semantics.

18
An example of a frame

Show me morning flights from Boston to SF on
Tuesday.
SHOW:
  FLIGHTS:
    ORIGIN:
      CITY: Boston
      DATE: Tuesday
      TIME: morning
    DEST:
      CITY: San Francisco

19
How to generate this semantics?

Many methods.
Simplest: semantic grammars
We'll come back to these after we've seen
parsing.
But a quick teaser for those of you who might
have already seen parsing:
CFG in which the LHS of rules is a semantic
category
LIST -> show me | I want | can I see
DEPARTTIME -> (after | around | before) HOUR
  | morning | afternoon | evening
HOUR -> one | two | three | ... | twelve (am | pm)
FLIGHTS -> (a) flight | flights
ORIGIN -> from CITY
DESTINATION -> to CITY
CITY -> Boston | San Francisco | Denver
  | Washington
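
As a rough illustration of the rules above, a semantic grammar can be approximated with regular expressions named after the semantic categories. This is only a minimal Python sketch under stated assumptions: the `PATTERNS` table and the `parse` function are hypothetical names introduced here, and only three of the categories are covered.

```python
import re

# Hypothetical mini-grammar: each semantic category maps to a pattern.
CITY = r"(Boston|San Francisco|Denver|Washington)"
PATTERNS = {
    "ORIGIN": re.compile(r"\bfrom " + CITY),
    "DESTINATION": re.compile(r"\bto " + CITY),
    "DEPARTTIME": re.compile(r"\b(morning|afternoon|evening)\b"),
}

def parse(utterance):
    """Return a dict of semantic category -> matched value."""
    frame = {}
    for slot, pattern in PATTERNS.items():
        match = pattern.search(utterance)
        if match:
            frame[slot] = match.group(1)
    return frame

frame = parse("Show me morning flights from Boston to San Francisco on Tuesday")
print(frame)
```

A real semantic grammar would be parsed with a CFG parser rather than flat patterns; the sketch only shows how category names double as slot names.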

20
Semantics for a sentence

LIST      FLIGHTS   ORIGIN
Show me   flights   from Boston
DESTINATION        DEPARTDATE
to San Francisco   on Tuesday
DEPARTTIME
morning

21
Generation and TTS

Generation component
Chooses concepts to express to user
Plans out how to express these concepts in words
Assigns any necessary prosody to the words
TTS component
Takes words and prosodic annotations
Synthesizes a waveform

22
Generation Component

Content Planner
Decides what content to express to user
(ask a question, present an answer, etc)
Often merged with dialogue manager
Language Generation
Chooses syntactic structures and words to express
meaning.
Simplest method
All words in sentence are prespecified!
Template-based generation
Can have variables
What time do you want to leave CITY-ORIG?
Will you return to CITY-ORIG from CITY-DEST?
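
Template-based generation with variables can be sketched in a few lines of Python. The `TEMPLATES` table, the `generate` function, and the underscore spelling of the slot names are assumptions made for this illustration, not part of any particular system.

```python
# Hypothetical templates; names in braces are filled from the frame.
TEMPLATES = {
    "ask_depart_time": "What time do you want to leave {CITY_ORIG}?",
    "ask_return": "Will you return to {CITY_ORIG} from {CITY_DEST}?",
}

def generate(template_name, frame):
    """Fill the named template with slot values from the frame."""
    return TEMPLATES[template_name].format(**frame)

frame = {"CITY_ORIG": "Boston", "CITY_DEST": "San Francisco"}
print(generate("ask_depart_time", frame))
print(generate("ask_return", frame))
```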

23
More sophisticated language generation component

Natural Language Generation
Approach
Dialogue manager builds representation of meaning
of utterance to be expressed
Passes this to a generator
Generators have three components
Sentence planner
Surface realizer
Prosody assigner

24
Architecture of a generator for a dialogue
system(after Walker and Rambow 2002)
25
HCI constraints on generation for dialogue
Coherence

Discourse markers and pronouns (Coherence)
(1) Bad!
Please say the date.
Please say the start time.
Please say the duration.
Please say the subject.
(2) Good!
First, tell me the date.
Next, I'll need the time it starts.
Thanks. <pause> Now, how long is it supposed to
last?
Last of all, I just need a brief description.
26
HCI constraints on generation for dialogue
coherence (II) tapered prompts

Prompts which get incrementally shorter
System: Now, what's the first company to add to
your watch list?
Caller: Cisco
System: What's the next company name? (Or, you
can say, "Finished.")
Caller: IBM
System: Tell me the next company name, or say,
"Finished."
Caller: Intel
System: Next one?
Caller: America Online.
System: Next?
Caller:

27
Dialogue Manager

Controls the architecture and structure of
dialogue
Takes input from ASR/NLU components
Maintains some sort of state
Interfaces with Task Manager
Passes output to NLG/TTS modules

28
Four architectures for dialogue management

Finite State
Frame-based
Information State (including Markov Decision
Processes)
AI Planning

29
Finite-State Dialogue Management

Consider a trivial airline travel system
Ask the user for a departure city
For a destination city
For a time
Whether the trip is round-trip or not

30
Finite State Dialogue Manager
31
Finite-state Dialogue Managers

System completely controls the conversation with
the user
Asks the user a series of questions
Ignores (or misinterprets) anything the user says
that is not a direct answer to the system's
questions
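
A finite-state manager for the trivial airline system can be sketched as a table of states, each with a fixed prompt and a fixed successor. This is a hedged illustration only: the state names, the prompts, and `run_dialogue` are hypothetical, and the list of answers stands in for ASR/NLU output.

```python
# Hypothetical state table: state -> (prompt, next state).
STATES = {
    "ask_origin": ("What city are you leaving from?", "ask_dest"),
    "ask_dest": ("Where are you going?", "ask_time"),
    "ask_time": ("What time would you like to leave?", "ask_round_trip"),
    "ask_round_trip": ("Is this a round trip?", "done"),
}

def run_dialogue(answers):
    """Walk the states in fixed order, consuming one answer per question."""
    state, transcript = "ask_origin", []
    answers = iter(answers)
    while state != "done":
        prompt, next_state = STATES[state]
        # The system never looks at what the answer contains: it simply
        # records it and moves on, which is the rigidity described above.
        transcript.append((prompt, next(answers)))
        state = next_state
    return transcript

for prompt, answer in run_dialogue(["Boston", "Denver", "morning", "no"]):
    print("S:", prompt, "| U:", answer)
```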

32
Dialogue Initiative

Systems that control conversation like this are
system initiative or single initiative.
Initiative: who has control of conversation
In normal human-human dialogue, initiative shifts
back and forth between participants.

33
System Initiative

Systems which completely control the conversation
at all times are called system initiative.
Advantages
Simple to build
User always knows what they can say next
System always knows what user can say next
Known words: Better performance from ASR
Known topic: Better performance from NLU
OK for VERY simple tasks (entering a credit card,
or login name and password)
Disadvantage
Too limited

34
User Initiative

User directs the system
Generally, user asks a single question, system
answers
System can't ask questions back, engage in
clarification dialogue, confirmation dialogue
Used for simple database queries
User asks question, system gives answer
Web search is user initiative dialogue.

35
Problems with System Initiative

Real dialogue involves give and take!
In travel planning, users might want to say
something that is not the direct answer to the
question.
For example, answering more than one question in a
sentence:
Hi, I'd like to fly from Seattle Tuesday morning
I want a flight from Milwaukee to Orlando one way
leaving after 5 p.m. on Wednesday.

36
Single initiative + universals

We can give users a little more flexibility by
adding universal commands
Universals: commands you can say anywhere
As if we augmented every state of FSA with these:
Help
Start over
Correct
This describes many implemented systems
But still doesn't allow users to say what they
want to say

37
Mixed Initiative

Conversational initiative can shift between
system and user
Simplest kind of mixed initiative: use the
structure of the frame itself to guide dialogue
Slot        Question
ORIGIN      What city are you leaving from?
DEST        Where are you going?
DEPT DATE   What day would you like to leave?
DEPT TIME   What time would you like to leave?
AIRLINE     What is your preferred airline?

38
Frames are mixed-initiative

User can answer multiple questions at once.
System asks questions of user, filling any slots
that user specifies
When frame is filled, do database query
If user answers 3 questions at once, system has
to fill slots and not ask these questions again!
Anyhow, we avoid the strict constraints on order
of the finite-state architecture.
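
The frame-driven control described above can be sketched as follows. It is a minimal Python illustration, assuming the NLU already returns a dict of slot values; `QUESTIONS`, `update`, and `next_question` are hypothetical names, and the slot questions are copied from the Mixed Initiative slide.

```python
# Slot -> question, in the order the system would ask them.
QUESTIONS = {
    "ORIGIN": "What city are you leaving from?",
    "DEST": "Where are you going?",
    "DEPT_DATE": "What day would you like to leave?",
    "DEPT_TIME": "What time would you like to leave?",
    "AIRLINE": "What is your preferred airline?",
}

def update(frame, parsed_slots):
    """Merge whatever slots the NLU found in the user's last utterance."""
    frame.update(parsed_slots)
    return frame

def next_question(frame):
    """Return the question for the first unfilled slot, or None if full."""
    for slot, question in QUESTIONS.items():
        if slot not in frame:
            return question
    return None  # frame is filled: time to do the database query

frame = {}
# "Hi, I'd like to fly from Seattle Tuesday morning" fills three slots at once.
update(frame, {"ORIGIN": "Seattle", "DEPT_DATE": "Tuesday", "DEPT_TIME": "morning"})
print(next_question(frame))  # skips the filled slots and asks about DEST
```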

39
Multiple frames

Flights, hotels, rental cars
Flight legs: Each flight can have multiple legs,
which might need to be discussed separately
Presenting the flights (if there are multiple
flights meeting the user's constraints)
This frame has slots like 1ST_FLIGHT or 2ND_FLIGHT
so the user can ask "how much is the second one?"
General route information
"Which airlines fly from Boston to San Francisco?"
Airfare practices
"Do I have to stay over Saturday to get a decent
airfare?"

40
Multiple Frames

Need to be able to switch from frame to frame
Based on what user says.
Disambiguate which slot of which frame an input
is supposed to fill, then switch dialogue control
to that frame.
Main implementation: production rules
Different types of inputs cause different
productions to fire
Each of which can flexibly fill in different
frames
Can also switch control to different frame

41
Defining Mixed Initiative

Mixed Initiative could mean
User can arbitrarily take or give up initiative
in various ways
This is really only possible in very complex
plan-based dialogue systems
No commercial implementations
Important research area
Something simpler and quite specific which we
will define in the next few slides

42
True Mixed Initiative
43
How mixed initiative is usually defined

First we need to define two other factors
Open prompts vs. directive prompts
Restrictive versus non-restrictive grammar

44
Open vs. Directive Prompts

Open prompt
System gives user very few constraints
User can respond how they please
"How may I help you?" "How may I direct your
call?"
Directive prompt
Explicitly instructs user how to respond
"Say yes if you accept the call; otherwise, say
no"

45
Restrictive vs. Non-restrictive grammars

Restrictive grammar
Language model which strongly constrains the ASR
system, based on dialogue state
Non-restrictive grammar
Open language model which is not restricted to a
particular dialogue state

46
Definition of Mixed Initiative
Grammar           Open Prompt          Directive Prompt
Restrictive       Doesn't make sense   System Initiative
Non-restrictive   User Initiative      Mixed Initiative
47
VoiceXML

Voice eXtensible Markup Language
An XML-based dialogue design language
Makes use of ASR and TTS
Deals well with simple, frame-based mixed
initiative dialogue.
Most common in commercial world (too limited for
research systems)
But useful to get a handle on the concepts.

48
Voice XML

Each dialogue is a <form>. (Form is the VoiceXML
word for frame)
Each <form> generally consists of a sequence of
<field>s, with other commands

49
Sample vxml doc

<form>
  <field name="transporttype">
    <prompt>
      Please choose airline, hotel, or rental
      car. </prompt>
    <grammar type="application/x=nuance-gsl">
      [airline hotel "rental car"]
    </grammar>
  </field>
  <block>
    <prompt>
      You have chosen <value expr="transporttype"/>.
    </prompt>
  </block>
</form>

50
VoiceXML interpreter

Walks through a VXML form in document order
Iteratively selecting each item
If multiple fields, visit each one in order.
Special commands for events

51
Another vxml doc (1)

<noinput>
  I'm sorry, I didn't hear you. <reprompt/>
</noinput>
- noinput means silence exceeds a timeout
threshold
<nomatch>
  I'm sorry, I didn't understand that. <reprompt/>
</nomatch>
- nomatch means confidence value for utterance
is too low
- notice reprompt command

52
Another vxml doc (2)

<form>
  <block> Welcome to the air travel
  consultant. </block>
  <field name="origin">
    <prompt> Which city do you want to
    leave from? </prompt>
    <grammar type="application/x=nuance-gsl">
      [(san francisco) denver (new york)
      barcelona]
    </grammar>
    <filled>
      <prompt> OK, from <value expr="origin"/>
      </prompt>
    </filled>
  </field>
- filled tag is executed by interpreter as
soon as field filled by user

53
Another vxml doc (3)

  <field name="destination">
    <prompt> And which city do you want to go
    to? </prompt>
    <grammar type="application/x=nuance-gsl">
      [(san francisco) denver (new york)
      barcelona]
    </grammar>
    <filled>
      <prompt> OK, to <value
      expr="destination"/> </prompt>
    </filled>
  </field>
  <field name="departdate" type="date">
    <prompt> And what date do you want to
    leave? </prompt>
    <filled>
      <prompt> OK, on <value
      expr="departdate"/> </prompt>
    </filled>
  </field>

54
Another vxml doc (4)

  <block>
    <prompt> OK, I have you are departing from
    <value expr="origin"/> to <value
    expr="destination"/> on <value expr="departdate"/>
    </prompt>
    send the info to book a flight...
  </block>
</form>

55
Summary: VoiceXML

Voice eXtensible Markup Language
An XML-based dialogue design language
Makes use of ASR and TTS
Deals well with simple, frame-based mixed
initiative dialogue.
Most common in commercial world (too limited for
research systems)
But useful to get a handle on the concepts.

56
Information-State and Dialogue Acts

If we want a dialogue system to be more than just
form-filling
It needs to:
Decide when the user has asked a question, made a
proposal, rejected a suggestion
Ground a user's utterance, ask clarification
questions, suggest plans
This suggests:
Conversational agent needs sophisticated models
of interpretation and generation
In terms of speech acts and grounding
Needs more sophisticated representation of
dialogue context than just a list of slots

57
Information-state architecture

Information state
Dialogue act interpreter
Dialogue act generator
Set of update rules
Update dialogue state as acts are interpreted
Generate dialogue acts
Control structure to select which update rules to
apply

58
Information-state
59
Dialogue acts

Also called conversational moves
An act with (internal) structure related
specifically to its dialogue function
Incorporates ideas of grounding
Incorporates other dialogue and conversational
functions that Austin and Searle didn't seem
interested in

60
Verbmobil task

Two-party scheduling dialogues
Speakers were asked to plan a meeting at some
future date
Data used to design conversational agents which
would help with this task
(cross-language, translating, scheduling
assistant)

61
Verbmobil Dialogue Acts

THANK             Thanks
GREET             Hello Dan
INTRODUCE         It's me again
BYE               All right, bye
REQUEST-COMMENT   How does that look?
SUGGEST           June 13th through 17th
REJECT            No, Friday I'm booked all day
ACCEPT            Saturday sounds fine
REQUEST-SUGGEST   What is a good day of the week
                  for you?
INIT              I wanted to make an appointment with you
GIVE_REASON       Because I have meetings all
                  afternoon
FEEDBACK          Okay
DELIBERATE        Let me check my calendar here
CONFIRM           Okay, that would be wonderful
CLARIFY           Okay, do you mean Tuesday the 23rd?

62
Automatic Interpretation of Dialogue Acts

How do we automatically identify dialogue acts?
Given an utterance
Decide whether it is a QUESTION, STATEMENT,
SUGGEST, or ACK
Recognizing illocutionary force will be crucial
to building a dialogue agent
Perhaps we can just look at the form of the
utterance to decide?

63
Can we just use the surface syntactic form?

YES-NO-Qs have auxiliary-before-subject syntax
Will breakfast be served on USAir 1557?
STATEMENTs have declarative syntax
I dont care about lunch
COMMANDs have imperative syntax
Show me flights from Milwaukee to Orlando on
Thursday night

64
Surface form ≠ speech act type
Utterance                               Locutionary Force   Illocutionary Force
Can I have the rest of your sandwich?   Question            Request
I want the rest of your sandwich        Declarative         Request
Give me your sandwich!                  Imperative          Request
65
Dialogue act disambiguation is hard! Who's on
First?
Abbott: Well, Costello, I'm going to New York
with you. Bucky Harris the Yankee's manager gave
me a job as coach for as long as you're on the
team.
Costello: Look Abbott, if you're the
coach, you must know all the players.
Abbott: I certainly do.
Costello: Well you know I've never
met the guys. So you'll have to tell me their
names, and then I'll know who's playing on the
team.
Abbott: Oh, I'll tell you their names, but
you know it seems to me they give these ball
players now-a-days very peculiar names.
Costello: You mean funny names?
Abbott: Strange names, pet names...like Dizzy Dean...
Costello: His brother Daffy
Abbott: Daffy Dean...
Costello: And their French cousin.
Abbott: French?
Costello: Goofe'
Abbott: Goofe' Dean.
Well, let's see, we have on the bags, Who's on
first, What's on second, I Don't Know is on
third...
Costello: That's what I want to find
out.
Abbott: I say Who's on first, What's on
second, I Don't Know's on third.
66
Dialogue act ambiguity

Who's on first?
INFO-REQUEST
or
STATEMENT

67
Dialogue Act ambiguity

Can you give me a list of the flights from
Atlanta to Boston?
This looks like an INFO-REQUEST.
If so, the answer is
YES.
But really it's a DIRECTIVE or REQUEST, a polite
form of
Please give me a list of the flights
What looks like a QUESTION can be a REQUEST

68
Dialogue Act ambiguity

Similarly, what looks like a STATEMENT can be a
QUESTION

Us  OPEN-OPTION  I was wanting to make some arrangements for a trip that I'm going to be taking uh to LA uh beginning of the week after next
Ag  HOLD         OK uh let me pull up your profile and I'll be right with you here. [pause]
Ag  CHECK        And you said you wanted to travel next week?
Us  ACCEPT       Uh yes.
69
Indirect speech acts

Utterances which use a surface statement to ask a
question
Utterances which use a surface question to issue
a request

70
DA interpretation as statistical classification

Lots of clues in each sentence that can tell us
which DA it is
Words and Collocations
"Please" or "would you": good cue for REQUEST
"Are you": good cue for INFO-REQUEST
Prosody
Rising pitch is a good cue for INFO-REQUEST
Loudness/stress can help distinguish
yeah/AGREEMENT from yeah/BACKCHANNEL
Conversational Structure
"Yeah" following a proposal is probably AGREEMENT;
"yeah" following an INFORM is probably a BACKCHANNEL

71
Statistical classifier model of dialogue act
interpretation

Our goal is to decide for each sentence what
dialogue act it is
This is a classification task (we are making a
1-of-N classification decision for each sentence)
With N classes (N = number of dialogue acts).
Three probabilistic models corresponding to the 3
kinds of cues from the input sentence.
Conversational Structure: Probability of one
dialogue act following another, P(Answer | Question)
Words and Syntax: Probability of a sequence of
words given a dialogue act, P("do you" |
Question)
Prosody: Probability of prosodic features given a
dialogue act, P("rise at end of sentence" |
Question)
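
One simple way to combine the three kinds of cues is to multiply their probabilities and pick the highest-scoring act. The Python sketch below does this in log space. It assumes the cues are conditionally independent given the act (a naive Bayes simplification not stated in the slides), and every number in the tables is made up for illustration, not estimated from any corpus.

```python
import math

# Illustrative probabilities only (assumptions, not corpus estimates).
P_ACT_GIVEN_PREV = {  # conversational structure: P(act | previous act)
    "QUESTION": {"ANSWER": 0.7, "QUESTION": 0.3},
}
P_CUE_GIVEN_ACT = {  # words and prosody: P(cue | act)
    "starts_with_do_you": {"ANSWER": 0.05, "QUESTION": 0.6},
    "final_rise": {"ANSWER": 0.1, "QUESTION": 0.7},
}

def classify(prev_act, cues):
    """Pick the act maximizing log P(act | prev) + sum of log P(cue | act)."""
    scores = {}
    for act, prior in P_ACT_GIVEN_PREV[prev_act].items():
        score = math.log(prior)
        for cue in cues:
            score += math.log(P_CUE_GIVEN_ACT[cue][act])
        scores[act] = score
    return max(scores, key=scores.get)

# With no lexical or prosodic evidence, dialogue structure alone decides.
print(classify("QUESTION", []))
# Strong question cues can override the structural expectation of an answer.
print(classify("QUESTION", ["starts_with_do_you", "final_rise"]))
```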

72
An example of dialogue act detection: Correction
Detection

Despite all these clever confirmation/rejection
strategies, dialogue systems still make mistakes
(Surprise!)
If system misrecognizes an utterance, and either
Rejects it, or
Via confirmation, displays its misunderstanding
Then user has a chance to make a correction by
Repeating themselves
Rephrasing
Saying "no" to the confirmation question.

73
Corrections

Unfortunately, corrections are harder to
recognize than normal sentences!
Swerts et al. (2000): corrections are misrecognized
twice as often (in terms of WER) as
non-corrections!
Why?
Prosody seems to be the largest factor:
hyperarticulation
English example from Liz Shriberg:
"NO, I am DE-PAR-TING from Jacksonville"
A German example from Bettina Braun, from a
talking elevator

74
A Labeled dialogue (Swerts et al)
75
Machine Learning and Classifiers

Given a labeled training set
We can build a classifier to label observations
into classes
Decision Tree
Regression
SVM
I won't introduce the algorithms here.
But these are at the core of NLP/computational
linguistics/Speech/Dialogue
You can learn them in
AI: CS 121/221
Machine Learning: CS 229

76
Machine learning to detect user corrections

Build classifiers using features like
Lexical information (words "no", "correction", "I
don't", swear words)
Prosodic features (various increases in F0 range,
pause duration, and word duration that
correlate with hyperarticulation)
Length
ASR confidence
LM probability
Various dialogue features (repetition)

77
Generating Dialogue Acts

Confirmation
Rejection

78
Confirmation

Another reason for grounding
Errors: Speech is a pretty errorful channel
Even for humans, so they use grounding to confirm
that they heard correctly
ASR is way worse than humans!
So dialogue systems need to do even more
grounding and confirmation than humans

79
Explicit confirmation

S: Which city do you want to leave from?
U: Baltimore
S: Do you want to leave from Baltimore?
U: Yes

80
Explicit confirmation

U: I'd like to fly from Denver Colorado to New
York City on September 21st in the morning on
United Airlines
S: Let's see then. I have you going from Denver
Colorado to New York on September 21st. Is that
correct?
U: Yes

81
Implicit confirmation: display

U: I'd like to travel to Berlin
S: When do you want to travel to Berlin?
U: Hi, I'd like to fly to Seattle Tuesday morning
S: Traveling to Seattle on Tuesday, August
eleventh in the morning. Your name?

82
Implicit vs. Explicit

Complementary strengths
Explicit: easier for users to correct the system's
mistakes (can just say "no")
But explicit is cumbersome and long
Implicit: much more natural, quicker, simpler (if
system guesses right).

83
Implicit and Explicit

Early systems: all-implicit or all-explicit
Modern systems: adaptive
How to decide?
ASR system can give confidence metric.
This expresses how convinced system is of its
transcription of the speech
If high confidence, use implicit confirmation
If low confidence, use explicit confirmation

84
Computing confidence

Simplest: use acoustic log-likelihood of user's
utterance
More features
Prosodic: utterances with longer pauses, F0
excursions, longer durations
Backoff: did we have to back off in the LM?
Cost of an error: Explicit confirmation before
moving money or booking flights

85
Rejection

e.g., VoiceXML nomatch
"I'm sorry, I didn't understand that."
Reject when:
ASR confidence is low
Best interpretation is semantically ill-formed
Might have four-tiered level of confidence:
Below confidence threshold, reject
Above threshold, explicit confirmation
If even higher, implicit confirmation
Even higher, no confirmation
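
The four-tiered scheme amounts to a hand-written mapping from a confidence score to an action. A minimal Python sketch, assuming confidence is a number in [0, 1]; the three threshold values are invented for illustration and would have to be tuned on real data.

```python
# Hypothetical thresholds (assumptions for illustration only).
REJECT_BELOW = 0.3
EXPLICIT_BELOW = 0.6
IMPLICIT_BELOW = 0.9

def confirmation_action(confidence):
    """Map an ASR confidence score in [0, 1] to one of four actions."""
    if confidence < REJECT_BELOW:
        return "reject"
    if confidence < EXPLICIT_BELOW:
        return "explicit_confirmation"
    if confidence < IMPLICIT_BELOW:
        return "implicit_confirmation"
    return "no_confirmation"

for score in (0.1, 0.5, 0.8, 0.95):
    print(score, confirmation_action(score))
```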

86
Dialogue System Evaluation

Key point about SLP.
Whenever we design a new algorithm or build a new
application, need to evaluate it
Two kinds of evaluation
Extrinsic: embedded in some external task
Intrinsic: some sort of more local evaluation.
How to evaluate a dialogue system?
What constitutes success or failure for a
dialogue system?

87
Dialogue System Evaluation

It turns out we'll need an evaluation metric for
two reasons
1) The normal reason: we need a metric to help us
compare different implementations
Can't improve it if we don't know where it fails
Can't decide between two algorithms without a
goodness metric
2) A new reason: we will need a metric for how
well a dialogue went, as an input to
reinforcement learning
Automatically improve our conversational agent
performance via learning

88
Evaluating Dialogue Systems

PARADISE framework (Walker et al. 2000)
Performance of a dialogue system is affected
both by what gets accomplished by the user and
the dialogue agent and how it gets accomplished

Maximize Task Success
Minimize Costs
Efficiency Measures
Qualitative Measures
Slide from Julia Hirschberg
89
PARADISE evaluation again

Maximize Task Success
Minimize Costs
Efficiency Measures
Quality Measures
PARADISE (PARAdigm for Dialogue System Evaluation)

90
Task Success

% of subtasks completed
Correctness of each question/answer/error msg
Correctness of total solution
Attribute-Value matrix (AVM)
Kappa coefficient
User's perception of whether task was completed

91
Task Success

Task goals seen as Attribute-Value Matrix
ELVIS e-mail retrieval task (Walker et al. 1997)
"Find the time and place of your meeting with
Kim."

Attribute             Value
Selection Criterion   Kim or Meeting
Time                  10:30 a.m.
Place                 2D516

Task success can be defined by match between AVM
values at end of task with true values for AVM

Slide from Julia Hirschberg
92
Efficiency Cost

Polifroni et al. (1992), Danieli and Gerbino
(1995), Hirschman and Pao (1993)
Total elapsed time in seconds or turns
Number of queries
Turn correction ratio: number of system or user
turns used solely to correct errors, divided by
total number of turns

93
Quality Cost

Number of times ASR system failed to return any
sentence
Number of ASR rejection prompts
Number of times user had to barge in
Number of time-out prompts
Inappropriateness (verbose, ambiguous) of
system's questions, answers, error messages

94
Another key quality cost

Concept accuracy or Concept error rate
% of semantic concepts that the NLU component
returns correctly
"I want to arrive in Austin at 5:00"
DESTCITY: Boston
Time: 5:00
Concept accuracy = 50%
Average this across entire dialogue
"How many of the sentences did the system
understand correctly"
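
The Austin/Boston example can be checked with a short computation. This Python sketch assumes both the reference and the NLU output are available as slot-value dicts; `concept_accuracy` is a hypothetical helper, not a standard library function.

```python
def concept_accuracy(reference, hypothesis):
    """Fraction of reference slots whose value the NLU returned correctly."""
    correct = sum(1 for slot, value in reference.items()
                  if hypothesis.get(slot) == value)
    return correct / len(reference)

# The user said Austin, but the NLU returned Boston; the time was correct.
reference = {"DESTCITY": "Austin", "TIME": "5:00"}
hypothesis = {"DESTCITY": "Boston", "TIME": "5:00"}
print(concept_accuracy(reference, hypothesis))  # 0.5
```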

95
PARADISE: Regress against user satisfaction
96
Regressing against user satisfaction

Questionnaire to assign each dialogue a user
satisfaction rating: this is the dependent measure
Set of cost and success factors are the independent
measures
Use regression to train weights for each factor

97
Experimental Procedures

Subjects given specified tasks
Spoken dialogues recorded
Cost factors, states, dialog acts automatically
logged; ASR accuracy, barge-in hand-labeled
Users specify task solution via web page
Users complete User Satisfaction surveys
Use multiple linear regression to model User
Satisfaction as a function of Task Success and
Costs; test for significant predictive factors

Slide from Julia Hirschberg
98
User Satisfaction: Sum of Many Measures

Was the system easy to understand? (TTS
Performance)
Did the system understand what you said? (ASR
Performance)
Was it easy to find the message/plane/train you
wanted? (Task Ease)
Was the pace of interaction with the system
appropriate? (Interaction Pace)
Did you know what you could say at each point of
the dialog? (User Expertise)
How often was the system sluggish and slow to
reply to you? (System Response)
Did the system work the way you expected it to in
this conversation? (Expected Behavior)
Do you think you'd use the system regularly in
the future? (Future Use)

Adapted from Julia Hirschberg
99
Performance Functions from Three Systems

ELVIS: User Sat. = .21 * COMP + .47 * MRS - .15 * ET
TOOT: User Sat. = .35 * COMP + .45 * MRS - .14 * ET
ANNIE: User Sat. = .33 * COMP + .25 * MRS + .33 * Help
COMP: User perception of task completion (task
success)
MRS: Mean (concept) recognition accuracy (cost)
ET: Elapsed time (cost)
Help: Help requests (cost)
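
A performance function of this kind is just a weighted sum of measures. The Python sketch below evaluates the ELVIS and TOOT functions using the coefficients from the slide. It assumes the input measures have already been normalized to a common scale; the function name and the sample input values are hypothetical.

```python
# Coefficients copied from the ELVIS and TOOT lines of the slide.
WEIGHTS = {
    "ELVIS": {"COMP": 0.21, "MRS": 0.47, "ET": -0.15},
    "TOOT": {"COMP": 0.35, "MRS": 0.45, "ET": -0.14},
}

def predicted_satisfaction(system, measures):
    """Weighted sum of (assumed normalized) success and cost measures."""
    return sum(weight * measures[name]
               for name, weight in WEIGHTS[system].items())

# Made-up normalized measures for one dialogue.
dialogue = {"COMP": 1.0, "MRS": 0.5, "ET": -0.2}
print(predicted_satisfaction("ELVIS", dialogue))
print(predicted_satisfaction("TOOT", dialogue))
```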

Slide from Julia Hirschberg
100
Performance Model

Perceived task completion and mean recognition
score (concept accuracy) are consistently
significant predictors of User Satisfaction
Performance model useful for system development
Making predictions about system modifications
Distinguishing good dialogues from bad
dialogues
As part of a learning model

101
Now that we have a success metric

Could we use it to help drive learning?
In recent work we use this metric to help us
learn an optimal policy or strategy for how the
conversational agent should behave

102
New Idea: Modeling a dialogue system as a
probabilistic agent

A conversational agent can be characterized by
The current knowledge of the system
A set of states S the agent can be in
A set of actions A the agent can take
A goal G, which implies
A success metric that tells us how well the agent
achieved its goal
A way of using this metric to create a strategy
or policy π for what action to take in any
particular state.

103
What do we mean by actions A and policies π?

Kinds of decisions a conversational agent needs
to make
When should I ground/confirm/reject/ask for
clarification on what the user just said?
When should I ask a directive prompt, when an
open prompt?
When should I use user, system, or mixed
initiative?

104
A threshold is a human-designed policy!

Could we learn what the right action is
Rejection
Explicit confirmation
Implicit confirmation
No confirmation
By learning a policy which,
given various information about the current
state,
dynamically chooses the action which maximizes
dialogue success

105
Another strategy decision

Open versus directive prompts
When to do mixed initiative

106
Outline

The Linguistics of Conversation
Basic Conversational Agents
ASR
NLU
Generation
Dialogue Manager
Dialogue Manager Design
Finite State
Frame-based
Initiative: User, System, Mixed
VoiceXML
Information-State
Dialogue-Act Detection
Dialogue-Act Generation
Evaluation
Utility-based conversational agents
MDP, POMDP

107
END OF TODAY'S LECTURE

THE FOLLOWING SLIDES ARE AN OPTIONAL ADVANCED
DISCUSSION OF MARKOV-DECISION-PROCESS DIALOGUE
SYSTEMS.

108
Review Open vs. Directive Prompts

Open prompt
System gives user very few constraints
User can respond how they please
"How may I help you?" "How may I direct your
call?"
Directive prompt
Explicitly instructs user how to respond
"Say yes if you accept the call; otherwise, say
no"

109
Review: Restrictive vs. Non-restrictive grammars

Restrictive grammar
Language model which strongly constrains the ASR
system, based on dialogue state
Non-restrictive grammar
Open language model which is not restricted to a
particular dialogue state

110
Kinds of Initiative

How do I decide which of these initiatives to use
at each point in the dialogue?

Grammar           Open Prompt          Directive Prompt
Restrictive       Doesn't make sense   System Initiative
Non-restrictive   User Initiative      Mixed Initiative
111
Modeling a dialogue system as a probabilistic
agent

A conversational agent can be characterized by
The current knowledge of the system
A set of states S the agent can be in
a set of actions A the agent can take
A goal G, which implies
A success metric that tells us how well the agent
achieved its goal
A way of using this metric to create a strategy
or policy π for what action to take in any
particular state.

112
Goals are not enough

Goal: user satisfaction
OK, that's all very well, but
Many things influence user satisfaction
We don't know user satisfaction until after the
dialogue is done
How do we know, state by state and action by
action, what the agent should do?
We need a more helpful metric that can apply to
each state

113
Utility

A utility function
maps a state or state sequence
onto a real number
describing the goodness of that state
I.e. the resulting happiness of the agent
Principle of Maximum Expected Utility
A rational agent should choose an action that
maximizes the agent's expected utility

114
Maximum Expected Utility

Principle of Maximum Expected Utility
A rational agent should choose an action that
maximizes the agent's expected utility
Action A has possible outcome states $\mathrm{Result}_i(A)$
E: agent's evidence about current state of world
Before doing A, agent estimates probability of each
outcome:
$P(\mathrm{Result}_i(A) \mid \mathrm{Do}(A), E)$
Thus can compute expected utility

115
Utility (Russell and Norvig)
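
The expected utility of an action is the probability-weighted sum of the utilities of its possible outcomes, EU(A|E) = Σ_i P(Result_i(A) | Do(A), E) × U(Result_i(A)), which is the standard form given by Russell and Norvig. A minimal sketch of choosing an action this way follows; the action names, probabilities, and utilities are made-up illustrative values, not taken from the lecture.

```python
def expected_utility(outcomes):
    """Sum of P(outcome) * U(outcome) over (probability, utility) pairs."""
    return sum(p * u for p, u in outcomes)


def best_action(action_outcomes):
    """Pick the action with maximum expected utility (MEU principle)."""
    return max(action_outcomes,
               key=lambda a: expected_utility(action_outcomes[a]))


# Hypothetical numbers, for illustration only.
ACTION_OUTCOMES = {
    "confirm_explicitly": [(0.9, 5.0), (0.1, -2.0)],
    "skip_confirmation": [(0.6, 8.0), (0.4, -10.0)],
}

chosen = best_action(ACTION_OUTCOMES)
```

With these assumed values, explicit confirmation has the higher expected utility even though skipping it has the larger best-case payoff.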
116
Markov Decision Processes

Or MDP
Characterized by
a set of states S an agent can be in
a set of actions A the agent can take
A reward r(a,s) that the agent receives for
taking an action in a state
(Some other things I'll come back to (gamma,
state transition probabilities))

117
A brief tutorial example

Levin et al. (2000)
A Day-and-Month dialogue system
Goal: fill in a two-slot frame
Month: November
Day: 12th
Via the shortest possible interaction with user

118
What is a state?

In principle, MDP state could include any
possible information about dialogue
Complete dialogue history so far
Usually use a much more limited set
Values of slots in current frame
Most recent question asked to user
User's most recent answer
ASR confidence
etc

119
State in the Day-and-Month example

Values of the two slots, day and month.
Total:
2 special states: initial state $s_i$ and final state $s_f$
365 states with a day and month
1 state for leap year
12 states with a month but no day
31 states with a day but no month
411 total states
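
The state count can be checked with a few lines of arithmetic. This is only a sanity check of the tally on the slide; the variable names are illustrative, and 2023 is an arbitrary non-leap year used to get the days per month.

```python
import calendar

# Days per month in a non-leap year (2023 is an arbitrary choice).
days_in_month = [calendar.monthrange(2023, m)[1] for m in range(1, 13)]

special_states = 2               # initial state s_i and final state s_f
full_dates = sum(days_in_month)  # states with both day and month filled
leap_day = 1                     # February 29
month_only = 12                  # month filled, day unknown
day_only = 31                    # day filled, month unknown

total_states = (special_states + full_dates + leap_day
                + month_only + day_only)
print(total_states)
```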

120
Actions in MDP models of dialogue

Speech acts!
Ask a question
Explicit confirmation
Rejection
Give the user some database information
Tell the user their choices
Do a database query

121
Actions in the Day-and-Month example

$a_d$: a question asking for the day
$a_m$: a question asking for the month
$a_{dm}$: a question asking for the day+month
$a_f$: a final action submitting the form and
terminating the dialogue

122
A simple reward function

For this example, let's use a cost function
A cost function for entire dialogue
Let
$N_i$ = number of interactions (duration of dialogue)
$N_e$ = number of errors in the obtained values (0-2)
$N_f$ = expected distance from goal
(0 for complete date, 1 if either day or month
is missing, 2 if both missing)
Then (weighted) cost is
$C = (w_i \times N_i) + (w_e \times N_e) + (w_f \times N_f)$

123
3 possible policies
Dumb
Open prompt
Directive prompt
P1 = probability of error in open prompt
P2 = probability of error in directive prompt
124
3 possible policies
Strategy 3 is better than strategy 2 when
improved error rate justifies longer interaction
P1 = probability of error in open prompt
P2 = probability of error in directive prompt
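
The trade-off between the three policies can be sketched numerically using the weighted cost C = w_i×N_i + w_e×N_e + w_f×N_f. The sketch takes strategy 2 to be the open prompt and strategy 3 the directive prompts, as the slide order suggests. It also assumes that the final submission counts as one interaction and that each of the two slots is misrecognized with probability P1 (open) or P2 (directive). The weights and error rates are illustrative values, not figures from the lecture.

```python
def expected_cost(n_i, n_e, n_f, w_i, w_e, w_f):
    """Weighted dialogue cost C = w_i*N_i + w_e*N_e + w_f*N_f."""
    return w_i * n_i + w_e * n_e + w_f * n_f


def policy_costs(p1, p2, w_i=1.0, w_e=5.0, w_f=5.0):
    """Expected cost of each of the three policies (assumed weights)."""
    return {
        # Dumb: submit immediately; 1 interaction, both slots missing.
        "dumb": expected_cost(1, 0, 2, w_i, w_e, w_f),
        # Open: one day+month question, then submit; 2 slots at rate p1.
        "open": expected_cost(2, 2 * p1, 0, w_i, w_e, w_f),
        # Directive: ask day, ask month, submit; 2 slots at rate p2.
        "directive": expected_cost(3, 2 * p2, 0, w_i, w_e, w_f),
    }


def directive_beats_open(p1, p2, w_i=1.0, w_e=5.0):
    """3*w_i + 2*p2*w_e < 2*w_i + 2*p1*w_e  <=>  p1 - p2 > w_i / (2*w_e)."""
    return (p1 - p2) > w_i / (2 * w_e)


costs = policy_costs(p1=0.3, p2=0.1)
```

Under these assumptions the directive policy wins only when the drop in error rate, P1 - P2, exceeds w_i / (2 × w_e), that is, when the extra turn costs less than the errors it avoids.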
125
That was an easy optimization

Only two actions, only a tiny number of policies
In general, number of actions, states, policies
is quite large
So finding optimal policy π is harder
We need reinforcement learning
Back to MDPs

126
MDP

We can think of a dialogue as a trajectory in
state space
The best policy π is the one with the greatest
expected reward over all trajectories
How to compute a reward for a state sequence?

127
Reward for a state sequence

One common approach: discounted rewards
Cumulative reward Q of a sequence is discounted
sum of utilities of individual states
Discount factor γ between 0 and 1
Makes agent care more about current than future
rewards; the more future a reward, the more
discounted its value
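
In the usual formulation, the discounted sum is Q = R(s_0,a_0) + γ×R(s_1,a_1) + γ²×R(s_2,a_2) + ... A minimal sketch of that computation follows; the per-turn rewards are hypothetical (a small cost per turn and a payoff at the end), chosen only to show the effect of γ.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards[t] * gamma**t over one trajectory."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))


# Hypothetical rewards: two turns that each cost 1, then a payoff of 10.
trajectory_rewards = [-1.0, -1.0, 10.0]

patient = discounted_return(trajectory_rewards, gamma=0.9)
impatient = discounted_return(trajectory_rewards, gamma=0.5)
```

The smaller γ is, the less the final payoff counts, so the same dialogue looks less attractive to an agent that discounts heavily.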

128
The Markov assumption

MDP assumes that state transitions are Markovian
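
Written out, the Markov assumption says that the next state depends only on the current state and action, not on the earlier history. In the usual notation (the turn index t and the symbol P_T for the transition model are assumptions; the slide text does not fix them):

```latex
P(s_{t+1} \mid s_t, s_{t-1}, \ldots, s_0, a_t, a_{t-1}, \ldots, a_0)
  = P_T(s_{t+1} \mid s_t, a_t)
```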

129
Expected reward for an action

Expected cumulative reward Q(s,a) for taking a
particular action from a particular state can be
computed by the Bellman equation
Expected cumulative reward for a given
state/action pair is
immediate reward for current state
+ expected discounted utility of all possible
next states s'
Weighted by probability of moving to that state
s'
And assuming once there we take optimal action a'
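
The verbal description above corresponds to the standard form of the Bellman equation for Q-values. Written with R for the immediate reward, γ for the discount factor, and s' and a' for the next state and the action taken there:

```latex
Q(s, a) = R(s, a) + \gamma \sum_{s'} P(s' \mid s, a) \, \max_{a'} Q(s', a')
```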

130
What we need for Bellman equation

A model of $p(s' \mid s, a)$
Estimate of R(s,a)
How to get these?
If we had labeled training data
$P(s' \mid s, a) = C(s, s', a) / C(s, a)$
If we knew the final reward for whole dialogue
$R(s_1, a_1, s_2, a_2, \ldots, s_n)$
Given these parameters, can use value iteration
algorithm to learn Q values (pushing back reward
values over state sequences) and hence best policy
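
The two steps, estimating transition probabilities from counts and then running value iteration, can be sketched on a toy problem. Everything concrete here is hypothetical: the three states, the two actions, the observed transitions, and the rewards are invented for illustration, and the sketch uses a fixed number of sweeps rather than a convergence test.

```python
from collections import Counter


def estimate_transitions(observed):
    """Relative-frequency estimate P(s'|s,a) = C(s,s',a) / C(s,a).

    observed is a list of (s, a, s_next) triples from labeled dialogues.
    """
    pair_counts, triple_counts = Counter(), Counter()
    for s, a, s_next in observed:
        pair_counts[(s, a)] += 1
        triple_counts[(s, a, s_next)] += 1
    return {(s, a, s2): c / pair_counts[(s, a)]
            for (s, a, s2), c in triple_counts.items()}


def value_iteration(states, actions, P, R, gamma=0.9, sweeps=100):
    """Apply the Bellman update to every (state, action) pair repeatedly."""
    Q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(sweeps):
        new_Q = {}
        for s in states:
            for a in actions:
                future = sum(
                    P.get((s, a, s2), 0.0)
                    * max(Q[(s2, a2)] for a2 in actions)
                    for s2 in states
                )
                new_Q[(s, a)] = R.get((s, a), 0.0) + gamma * future
        Q = new_Q
    return Q


def greedy_policy(Q, states, actions):
    """For each state, pick the action with the highest Q value."""
    return {s: max(actions, key=lambda a: Q[(s, a)]) for s in states}


# Hypothetical toy problem: one slot that is empty, filled, or submitted.
STATES = ["empty", "filled", "done"]
ACTIONS = ["ask", "submit"]
OBSERVED = (
    [("empty", "ask", "filled")] * 3
    + [("empty", "ask", "empty")]
    + [("empty", "submit", "done")]
    + [("filled", "submit", "done")] * 2
    + [("filled", "ask", "filled")]
)
REWARDS = {("empty", "ask"): -1.0, ("filled", "ask"): -1.0,
           ("empty", "submit"): -5.0, ("filled", "submit"): 10.0}

P = estimate_transitions(OBSERVED)
Q = value_iteration(STATES, ACTIONS, P, REWARDS)
policy = greedy_policy(Q, STATES, ACTIONS)
```

In this toy setting the learned policy asks while the slot is empty and submits once it is filled, because the reward for a correct submission is pushed back to the earlier state through the Q values.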

131
Final reward

What is the final reward for whole dialogue
$R(s_1, a_1, s_2, a_2, \ldots, s_n)$?
This is what our automatic evaluation metric
PARADISE computes!
The general goodness of a whole dialogue!!!!!

132
How to estimate $p(s' \mid s, a)$ without labeled data

Have random conversations with real people
Carefully hand-tune small number of states and
policies
Then can build a dialogue system which explores
state space by generating a few hundred random
conversations with real humans
Set probabilities from this corpus
Have random conversations with simulated people
Now you can have millions of conversations with
simulated people
So you can have a slightly larger state space

133
An example

Singh, S., D. Litman, M. Kearns, and M. Walker.
2002. Optimizing Dialogue Management with
Reinforcement Learning: Experiments with the
NJFun System. Journal of AI Research.
NJFun system: people asked questions about
recreational activities in New Jersey
Idea of paper: use reinforcement learning to make
a small set of optimal policy decisions

134
Very small number of states and actions

States: specified by values of 8 features
Which slot in frame is being worked on (1-4)
ASR confidence value (0-5)
How many times a current slot question had been
asked
Restrictive vs. non-restrictive grammar
Result: 62 states
Actions: each state has only 2 possible actions
Asking questions: system versus user initiative
Receiving answers: explicit versus no
confirmation.

135
Ran system with real users

311 conversations
Simple binary reward function
1 if completed task (finding museums, theater,
winetasting in NJ area)
0 if not
System learned good dialogue strategy. Roughly:
Start with user initiative
Back off to mixed or system initiative when
re-asking for an attribute
Confirm only at lower confidence values

136
State of the art

Only a few such systems
From (former) AT&T Laboratories researchers, now
dispersed
And Cambridge UK lab
Hot topics
Partially observable MDPs (POMDPs)
We don't REALLY know the user's state (we only
know what we THOUGHT the user said)
So need to take actions based on our BELIEF,
i.e., a probability distribution over states
rather than the true state
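
A belief of this kind can be maintained with a Bayes-rule update after each recognized utterance. The sketch below is a simplified illustration, not a full POMDP: it ignores state transitions and system actions, and the state names and recognition likelihoods are hypothetical.

```python
def update_belief(belief, likelihood):
    """Bayes update: new b(s) is proportional to P(observation | s) * b(s)."""
    unnormalized = {s: likelihood.get(s, 0.0) * p
                    for s, p in belief.items()}
    total = sum(unnormalized.values())
    if total == 0:
        return dict(belief)  # observation is uninformative; keep old belief
    return {s: v / total for s, v in unnormalized.items()}


# Hypothetical user goals with a uniform prior.
prior = {"wants_museum": 0.5, "wants_theater": 0.5}

# Hypothetical: the recognizer output "museum", assumed to be four times
# as likely when the user really wants a museum.
heard_museum = {"wants_museum": 0.8, "wants_theater": 0.2}

posterior = update_belief(prior, heard_museum)
```

The system then acts on the posterior distribution instead of committing to a single recognized value, which is what separates the POMDP view from the MDP view.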
