Spoken Dialogue Systems

1
Spoken Dialogue Systems
Julia Hirschberg CS 4706
2
Today
  • Basic Conversational Agents
  • ASR
  • NLU
  • Generation
  • Dialogue Manager
  • Dialogue Manager Design
  • Finite State
  • Frame-based
  • Initiative: User, System, Mixed
  • Information-State
  • Dialogue-Act Detection
  • Dialogue-Act Generation
  • Evaluation
  • Utility-based conversational agents
  • MDP, POMDP

3
Conversational Agents
  • AKA
  • Interactive Voice Response Systems
  • Dialogue Systems
  • Spoken Dialogue Systems
  • Applications
  • Travel arrangements (Amtrak, United airlines)
  • Telephone call routing
  • Tutoring
  • Communicating with robots
  • Anything with limited screen/keyboard

4
A travel dialogue: Communicator
5
Call routing: AT&T HMIHY
6
A tutorial dialogue: ITSPOKE
7
Conversational Structure
  • Telephone conversations
  • Stage 1: Enter a conversation
  • Stage 2: Identification
  • Stage 3: Establish joint willingness to converse
  • Stage 4: First topic is raised, usually by caller

8
Why is this customer confused?
  • Customer: (rings)
  • Operator: Directory Enquiries, for which town
    please?
  • Customer: Could you give me the phone number of
    um Mrs. um Smithson?
  • Operator: Yes, which town is this at please?
  • Customer: Huddleston.
  • Operator: Yes. And the name again?
  • Customer: Mrs. Smithson

9
Why is this customer confused?
  • A: And, what day in May did you want to travel?
  • C: OK, uh, I need to be there for a meeting
    that's from the 12th to the 15th.
  • Note that the client did not answer the question.
  • Meaning of client's sentence:
  • Meeting
  • Start-of-meeting: 12th
  • End-of-meeting: 15th
  • Doesn't say anything about flying!
  • How does the agent infer that the client is
    informing him/her of travel dates?

10
Will this client be confused?
  • A: There are 3 non-stops today.
  • This would be true even if there were in fact 7
    non-stops today.
  • But the agent means 3 and only 3.
  • How can the client infer that the agent means
    only 3?

11
Grice: conversational implicature
  • Implicature means a particular class of licensed
    inferences.
  • Grice (1975) proposed that what enables hearers
    to draw correct inferences is the
  • Cooperative Principle
  • This is a tacit agreement by speakers and
    listeners to cooperate in communication

12
4 Gricean Maxims
  • Relevance: Be relevant.
  • Quantity: Do not make your contribution more or
    less informative than required.
  • Quality: Try to make your contribution one that
    is true (don't say things that are false or for
    which you lack adequate evidence).
  • Manner: Avoid ambiguity and obscurity; be brief
    and orderly.

13
Relevance
  • A: Is Regina here?
  • B: Her car is outside.
  • Implication: yes.
  • Hearer thinks: why would he mention the car? It
    must be relevant. How could it be relevant? It
    could be, since if her car is here she is probably
    here.
  • Client: I need to be there for a meeting that's
    from the 12th to the 15th.
  • Hearer thinks: the speaker is following the maxims;
    he would only have mentioned the meeting if it was
    relevant. How could the meeting be relevant? If the
    client meant me to understand that he had to depart
    in time for the meeting.

14
Quantity
  • A: How much money do you have on you?
  • B: I have 5 dollars.
  • Implication: not 6 dollars.
  • Similarly, "3 non-stops" can't mean 7 non-stops
    (the hearer thinks:
  • if the speaker meant 7 non-stops she would have
    said "7 non-stops")
  • A: Did you do the reading for today's class?
  • B: I intended to.
  • Implication: No.
  • B's answer would be true if B intended to do the
    reading AND did the reading, but would then
    violate the maxim.

15
Dialogue System Architecture
16
Speech recognition
  • Input: acoustic waveform
  • Output: string of words
  • Basic components:
  • a recognizer for phones, small sound units like
    [k] or [ae]
  • a pronunciation dictionary, e.g., cat = [k ae t]
  • a grammar telling us what words are likely to
    follow what words
  • a search algorithm to find the best string of
    words

17
Natural Language Understanding
  • Or NLU
  • Or Computational semantics
  • There are many ways to represent the meaning of
    sentences
  • For speech dialogue systems, the most common is
    frame-and-slot semantics.

18
An example of a frame
  • Show me morning flights from Boston to SF on
    Tuesday.

    SHOW:
      FLIGHTS:
        ORIGIN:
          CITY: Boston
          DATE: Tuesday
          TIME: morning
        DEST:
          CITY: San Francisco

19
How to generate this semantics?
  • Many methods; the simplest: semantic grammars
  • We'll come back to these after we've seen
    parsing.
  • But a quick teaser for those of you who might
    have already seen parsing:
  • A CFG in which the LHS of each rule is a semantic
    category:
  • LIST → show me | I want | can I see | ...
  • DEPARTTIME → (after | around | before) HOUR |
    morning | afternoon | evening
  • HOUR → one | two | three | ... | twelve (am | pm)
  • FLIGHTS → (a) flight | flights
  • ORIGIN → from CITY
  • DESTINATION → to CITY
  • CITY → Boston | San Francisco | Denver |
    Washington
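
To make the idea concrete, here is a minimal sketch (not from the original slides; the rule set and helper names are hypothetical) of how a few of these semantic-grammar categories could be approximated with regular-expression patterns that fill frame slots:

    import re

    # Hypothetical patterns standing in for a few of the CFG rules above.
    CITY = r"(boston|san francisco|denver|washington)"
    RULES = {
        "ORIGIN":      re.compile(r"from " + CITY),
        "DESTINATION": re.compile(r"to " + CITY),
        "DEPARTTIME":  re.compile(r"\b(morning|afternoon|evening)\b"),
    }

    def parse_to_frame(utterance):
        """Fill a flat frame by matching each semantic category's pattern."""
        frame = {}
        text = utterance.lower()
        for slot, pattern in RULES.items():
            m = pattern.search(text)
            if m:
                frame[slot] = m.group(1)
        return frame

    print(parse_to_frame("Show me morning flights from Boston to San Francisco"))
    # {'ORIGIN': 'boston', 'DESTINATION': 'san francisco', 'DEPARTTIME': 'morning'}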

20
Semantics for a sentence
  • LIST:        Show me
  • FLIGHTS:     flights
  • ORIGIN:      from Boston
  • DESTINATION: to San Francisco
  • DEPARTDATE:  on Tuesday
  • DEPARTTIME:  morning

21
Generation and TTS
  • Generation component
  • Chooses concepts to express to user
  • Plans out how to express these concepts in words
  • Assigns any necessary prosody to the words
  • TTS component
  • Takes words and prosodic annotations
  • Synthesizes a waveform

22
Generation Component
  • Content Planner
  • Decides what content to express to user
  • (ask a question, present an answer, etc.)
  • Often merged with dialogue manager
  • Language Generation
  • Chooses syntactic structures and words to express
    meaning.
  • Simplest method
  • All words in sentence are prespecified!
  • Template-based generation
  • Can have variables
  • What time do you want to leave CITY-ORIG?
  • Will you return to CITY-ORIG from CITY-DEST?
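
A template realizer of this kind is only a few lines of code. Below is a minimal sketch (template names and the frame are invented for illustration) using Python string formatting as the variable-substitution mechanism:

    # Hypothetical templates with variables filled from the current frame.
    TEMPLATES = {
        "ask_depart_time": "What time do you want to leave {CITY_ORIG}?",
        "ask_return":      "Will you return to {CITY_ORIG} from {CITY_DEST}?",
    }

    def generate(template_name, frame):
        """Instantiate a prespecified sentence template with slot values."""
        return TEMPLATES[template_name].format(**frame)

    frame = {"CITY_ORIG": "Milwaukee", "CITY_DEST": "Orlando"}
    print(generate("ask_return", frame))
    # Will you return to Milwaukee from Orlando?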

23
More sophisticated language generation component
  • Natural Language Generation
  • Approach
  • Dialogue manager builds representation of meaning
    of utterance to be expressed
  • Passes this to a generator
  • Generators have three components
  • Sentence planner
  • Surface realizer
  • Prosody assigner

24
Architecture of a generator for a dialogue
system (after Walker and Rambow 2002)
25
HCI constraints on generation for dialogue
Coherence
  • Discourse markers and pronouns (coherence)
  • (1) Bad:
  • Please say the date.
  • Please say the start time.
  • Please say the duration.
  • Please say the subject.
  • (2) Good:
  • First, tell me the date.
  • Next, I'll need the time it starts.
  • Thanks. <pause> Now, how long is it supposed to
    last?
  • Last of all, I just need a brief description.
26
HCI constraints on generation for dialogue
coherence (II) tapered prompts
  • Prompts which get incrementally shorter
  • System: Now, what's the first company to add to
    your watch list?
  • Caller: Cisco
  • System: What's the next company name? (Or, you
    can say, "Finished.")
  • Caller: IBM
  • System: Tell me the next company name, or say,
    "Finished."
  • Caller: Intel
  • System: Next one?
  • Caller: America Online.
  • System: Next?
  • Caller: ...

27
Dialogue Manager
  • Controls the architecture and structure of
    dialogue
  • Takes input from ASR/NLU components
  • Maintains some sort of state
  • Interfaces with Task Manager
  • Passes output to NLG/TTS modules

28
Architectures for dialogue management
  • Finite State
  • Frame-based
  • Information State
  • Markov Decision Processes
  • AI Planning

29
Finite-State Dialogue Management
  • Consider a trivial airline travel system
  • Ask the user for a departure city
  • For a destination city
  • For a time
  • Whether the trip is round-trip or not

30
Finite State Dialogue Manager
31
Finite-state Dialogue Managers
  • System completely controls the conversation with
    the user
  • Asks the user a series of questions
  • Ignores (or misinterprets) anything the user says
    that is not a direct answer to the system's
    questions
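
As a concrete illustration, here is a minimal sketch (states, prompts, and helper names are hypothetical) of such a system-initiative finite-state manager for the trivial airline task above:

    # Fixed question sequence: the system controls the conversation and
    # treats each user turn as the answer to the current question only.
    STATES = [
        ("origin",      "Which city are you leaving from?"),
        ("destination", "Where are you going?"),
        ("time",        "What time do you want to leave?"),
        ("round_trip",  "Is this a round trip?"),
    ]

    def run_dialogue(get_user_input):
        frame = {}
        for slot, prompt in STATES:
            print("System:", prompt)
            frame[slot] = get_user_input()  # anything off-topic is ignored
        return frame

    # Canned answers standing in for ASR output:
    answers = iter(["Seattle", "Baltimore", "9 a.m.", "no"])
    print(run_dialogue(lambda: next(answers)))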

32
Dialogue Initiative
  • Systems that control conversation like this are
    system initiative or single initiative.
  • Initiative: who has control of the conversation
  • In normal human-human dialogue, initiative shifts
    back and forth between participants.

33
System Initiative
  • Systems which completely control the conversation
    at all times are called system initiative.
  • Advantages
  • Simple to build
  • User always knows what they can say next
  • System always knows what user can say next
  • Known words: better performance from ASR
  • Known topic: better performance from NLU
  • Ok for VERY simple tasks (entering a credit card,
    or login name and password)
  • Disadvantage
  • Too limited

34
User Initiative
  • User directs the system
  • Generally, user asks a single question, system
    answers
  • System can't ask questions back, engage in
    clarification dialogue, confirmation dialogue
  • Used for simple database queries
  • User asks question, system gives answer
  • Web search is user initiative dialogue.

35
Problems with System Initiative
  • Real dialogue involves give and take!
  • In travel planning, users might want to say
    something that is not the direct answer to the
    question.
  • For example, answering more than one question in a
    sentence:
  • Hi, I'd like to fly from Seattle Tuesday morning.
  • I want a flight from Milwaukee to Orlando one way
    leaving after 5 p.m. on Wednesday.

36
Single initiative universals
  • We can give users a little more flexibility by
    adding universal commands
  • Universals: commands you can say anywhere
  • As if we augmented every state of the FSA with these:
  • Help
  • Start over
  • Correct
  • This describes many implemented systems
  • But still doesn't allow the user to say what they
    want to say

37
Mixed Initiative
  • Conversational initiative can shift between
    system and user
  • Simplest kind of mixed initiative: use the
    structure of the frame itself to guide the dialogue
  • Slot: Question
  • ORIGIN: What city are you leaving from?
  • DEST: Where are you going?
  • DEPT DATE: What day would you like to leave?
  • DEPT TIME: What time would you like to leave?
  • AIRLINE: What is your preferred airline?

38
Frames are mixed-initiative
  • User can answer multiple questions at once.
  • System asks questions of the user, filling any slots
    that the user specifies
  • When the frame is filled, do a database query
  • If the user answers 3 questions at once, the system
    has to fill those slots and not ask the questions
    again!
  • Anyhow, we avoid the strict constraints on order
    of the finite-state architecture.
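
A minimal sketch of this frame-based control loop (question wording hypothetical; parse_to_frame stands for any NLU component that maps one utterance to zero or more slot/value pairs, like the semantic-grammar sketch earlier):

    QUESTIONS = {
        "ORIGIN":      "What city are you leaving from?",
        "DESTINATION": "Where are you going?",
        "DEPARTTIME":  "What time would you like to leave?",
    }

    def fill_frame(get_user_input, parse_to_frame):
        frame = {}
        while len(frame) < len(QUESTIONS):
            # Ask about the first still-empty slot...
            slot = next(s for s in QUESTIONS if s not in frame)
            print("System:", QUESTIONS[slot])
            # ...but merge in every slot the answer happens to fill,
            # so already-answered questions are never asked again.
            frame.update(parse_to_frame(get_user_input()))
        return frame  # when the frame is full, do the database query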

39
Multiple frames
  • flights, hotels, rental cars
  • Flight legs: each flight can have multiple legs,
    which might need to be discussed separately
  • Presenting the flights (if there are multiple
    flights meeting the user's constraints):
  • has slots like 1ST_FLIGHT or 2ND_FLIGHT so the
    user can ask "how much is the second one?"
  • General route information:
  • Which airlines fly from Boston to San Francisco?
  • Airfare practices:
  • Do I have to stay over Saturday to get a decent
    airfare?

40
Multiple Frames
  • Need to be able to switch from frame to frame
  • Based on what user says.
  • Disambiguate which slot of which frame an input
    is supposed to fill, then switch dialogue control
    to that frame.
  • Main implementation: production rules
  • Different types of inputs cause different
    productions to fire
  • Each of which can flexibly fill in different
    frames
  • Can also switch control to different frame

41
Defining Mixed Initiative
  • Mixed Initiative could mean
  • User can arbitrarily take or give up initiative
    in various ways
  • This is really only possible in very complex
    plan-based dialogue systems
  • No commercial implementations
  • Important research area
  • Something simpler and quite specific which we
    will define in the next few slides

42
True Mixed Initiative
43
How mixed initiative is usually defined
  • First we need to define two other factors:
  • Open prompts vs. directive prompts
  • Restrictive vs. non-restrictive grammars

44
Open vs. Directive Prompts
  • Open prompt
  • System gives user very few constraints
  • User can respond how they please
  • How may I help you? How may I direct your
    call?
  • Directive prompt
  • Explicitly instructs the user how to respond
  • "Say yes if you accept the call; otherwise, say
    no."

45
Restrictive vs. Non-restrictive grammars
  • Restrictive grammar
  • Language model which strongly constrains the ASR
    system, based on dialogue state
  • Non-restrictive grammar
  • Open language model which is not restricted to a
    particular dialogue state

46
Definition of Mixed Initiative
Grammar          Open Prompt          Directive Prompt
Restrictive      Doesn't make sense   System Initiative
Non-restrictive  User Initiative      Mixed Initiative
47
VoiceXML
  • Voice eXtensible Markup Language
  • An XML-based dialogue design language
  • Makes use of ASR and TTS
  • Deals well with simple, frame-based mixed
    initiative dialogue.
  • Most common in commercial world (too limited for
    research systems)
  • But useful to get a handle on the concepts.

48
Voice XML
  • Each dialogue is a <form>. (Form is the VoiceXML
    word for frame.)
  • Each <form> generally consists of a sequence of
    <field>s, with other commands

49
Sample vxml doc
    <form>
      <field name="transporttype">
        <prompt>
          Please choose airline, hotel, or rental car.
        </prompt>
        <grammar type="application/x-nuance-gsl">
          [airline hotel "rental car"]
        </grammar>
      </field>
      <block>
        <prompt>
          You have chosen <value expr="transporttype">.
        </prompt>
      </block>
    </form>

50
VoiceXML interpreter
  • Walks through a VXML form in document order
  • Iteratively selecting each item
  • If multiple fields, visit each one in order.
  • Special commands for events

51
Another vxml doc (1)
    <noinput>
      I'm sorry, I didn't hear you. <reprompt/>
    </noinput>

    - noinput means silence exceeds a timeout threshold

    <nomatch>
      I'm sorry, I didn't understand that. <reprompt/>
    </nomatch>

    - nomatch means the confidence value for the utterance is too low
    - notice the reprompt command

52
Another vxml doc (2)
    <form>
      <block> Welcome to the air travel consultant. </block>
      <field name="origin">
        <prompt> Which city do you want to leave from? </prompt>
        <grammar type="application/x-nuance-gsl">
          [(san francisco) denver (new york) barcelona]
        </grammar>
        <filled>
          <prompt> OK, from <value expr="origin"> </prompt>
        </filled>
      </field>

    - the filled tag is executed by the interpreter as soon as the
      field is filled by the user

53
Another vxml doc (3)
      <field name="destination">
        <prompt> And which city do you want to go to? </prompt>
        <grammar type="application/x-nuance-gsl">
          [(san francisco) denver (new york) barcelona]
        </grammar>
        <filled>
          <prompt> OK, to <value expr="destination"> </prompt>
        </filled>
      </field>
      <field name="departdate" type="date">
        <prompt> And what date do you want to leave? </prompt>
        <filled>
          <prompt> OK, on <value expr="departdate"> </prompt>
        </filled>
      </field>

54
Another vxml doc (4)
      <block>
        <prompt> OK, I have you departing from
          <value expr="origin"> to <value expr="destination">
          on <value expr="departdate">.
        </prompt>
        send the info to book a flight...
      </block>
    </form>

55
Summary VoiceXML
  • Voice eXtensible Markup Language
  • An XML-based dialogue design language
  • Makes use of ASR and TTS
  • Deals well with simple, frame-based mixed
    initiative dialogue.
  • Most common in commercial world (too limited for
    research systems)
  • But useful to get a handle on the concepts.

56
Information-State and Dialogue Acts
  • If we want a dialogue system to be more than just
    form-filling
  • It needs to:
  • Decide when the user has asked a question, made a
    proposal, rejected a suggestion
  • Ground a user's utterance, ask clarification
    questions, suggest plans
  • This suggests:
  • A conversational agent needs sophisticated models
    of interpretation and generation
  • In terms of speech acts and grounding
  • And a more sophisticated representation of
    dialogue context than just a list of slots

57
Information-state architecture
  • Information state
  • Dialogue act interpreter
  • Dialogue act generator
  • Set of update rules
  • Update dialogue state as acts are interpreted
  • Generate dialogue acts
  • Control structure to select which update rules to
    apply

58
Information-state
59
Dialogue acts
  • Also called conversational moves
  • An act with (internal) structure related
    specifically to its dialogue function
  • Incorporates ideas of grounding
  • Incorporates other dialogue and conversational
    functions that Austin and Searle didn't seem
    interested in

60
Verbmobil task
  • Two-party scheduling dialogues
  • Speakers were asked to plan a meeting at some
    future date
  • Data used to design conversational agents which
    would help with this task
  • (cross-language, translating, scheduling
    assistant)

61
Verbmobil Dialogue Acts
  • THANK: thanks
  • GREET: Hello Dan
  • INTRODUCE: It's me again
  • BYE: Alright, bye
  • REQUEST-COMMENT: How does that look?
  • SUGGEST: June 13th through 17th
  • REJECT: No, Friday I'm booked all day
  • ACCEPT: Saturday sounds fine
  • REQUEST-SUGGEST: What is a good day of the week
    for you?
  • INIT: I wanted to make an appointment with you
  • GIVE_REASON: Because I have meetings all
    afternoon
  • FEEDBACK: Okay
  • DELIBERATE: Let me check my calendar here
  • CONFIRM: Okay, that would be wonderful
  • CLARIFY: Okay, do you mean Tuesday the 23rd?

62
Automatic Interpretation of Dialogue Acts
  • How do we automatically identify dialogue acts?
  • Given an utterance
  • Decide whether it is a QUESTION, STATEMENT,
    SUGGEST, or ACK
  • Recognizing illocutionary force will be crucial
    to building a dialogue agent
  • Perhaps we can just look at the form of the
    utterance to decide?

63
Can we just use the surface syntactic form?
  • YES-NO-Qs have auxiliary-before-subject syntax
  • Will breakfast be served on USAir 1557?
  • STATEMENTs have declarative syntax
  • I don't care about lunch
  • COMMANDs have imperative syntax
  • Show me flights from Milwaukee to Orlando on
    Thursday night

64
Surface form ≠ speech act type

                                        Locutionary Force  Illocutionary Force
Can I have the rest of your sandwich?   Question           Request
I want the rest of your sandwich        Declarative        Request
Give me your sandwich!                  Imperative         Request
65
Dialogue act disambiguation is hard! Who's on
First?
Abbott: Well, Costello, I'm going to New York with
you. Bucky Harris, the Yankees' manager, gave me a
job as coach for as long as you're on the team.
Costello: Look, Abbott, if you're the coach, you
must know all the players.
Abbott: I certainly do.
Costello: Well, you know I've never met the guys.
So you'll have to tell me their names, and then
I'll know who's playing on the team.
Abbott: Oh, I'll tell you their names, but you
know it seems to me they give these ball players
now-a-days very peculiar names.
Costello: You mean funny names?
Abbott: Strange names, pet names... like Dizzy
Dean...
Costello: His brother Daffy.
Abbott: Daffy Dean...
Costello: And their French cousin.
Abbott: French?
Costello: Goofe'.
Abbott: Goofe' Dean. Well, let's see, we have on
the bags, Who's on first, What's on second, I
Don't Know is on third...
Costello: That's what I want to find out.
Abbott: I say Who's on first, What's on second, I
Don't Know's on third.
66
Dialogue act ambiguity
  • Who's on first?
  • INFO-REQUEST
  • or
  • STATEMENT

67
Dialogue Act ambiguity
  • Can you give me a list of the flights from
    Atlanta to Boston?
  • This looks like an INFO-REQUEST.
  • If so, the answer is:
  • YES.
  • But really it's a DIRECTIVE or REQUEST, a polite
    form of:
  • Please give me a list of the flights...
  • What looks like a QUESTION can be a REQUEST

68
Dialogue Act ambiguity
  • Similarly, what looks like a STATEMENT can be a
    QUESTION

Us  OPEN-OPTION  I was wanting to make some arrangements for a trip that I'm going to be taking, uh, to LA, uh, the beginning of the week after next.
Ag  HOLD         OK, uh, let me pull up your profile and I'll be right with you here. [pause]
Ag  CHECK        And you said you wanted to travel next week?
Us  ACCEPT       Uh, yes.
69
Indirect speech acts
  • Utterances which use a surface statement to ask a
    question
  • Utterances which use a surface question to issue
    a request

70
DA interpretation as statistical classification
  • Lots of clues in each sentence that can tell us
    which DA it is
  • Words and Collocations
  • "Please" or "would you": good cue for REQUEST
  • "Are you": good cue for INFO-REQUEST
  • Prosody
  • Rising pitch is a good cue for INFO-REQUEST
  • Loudness/stress can help distinguish
    yeah/AGREEMENT from yeah/BACKCHANNEL
  • Conversational Structure
  • "Yeah" following a proposal is probably AGREEMENT;
    "yeah" following an INFORM is probably a BACKCHANNEL

71
Statistical classifier model of dialogue act
interpretation
  • Our goal is to decide for each sentence what
    dialogue act it is
  • This is a classification task (we are making a
    1-of-N classification decision for each sentence)
  • With N classes (= number of dialogue acts)
  • Three probabilistic models corresponding to the 3
    kinds of cues from the input sentence:
  • Conversational Structure: probability of one
    dialogue act following another, P(Answer | Question)
  • Words and Syntax: probability of a sequence of
    words given a dialogue act, P("do you..." | Question)
  • Prosody: probability of prosodic features given a
    dialogue act, P(rise at end of sentence | Question)
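
A toy sketch of how the three cue models combine (all probability tables invented for illustration; a real system would estimate them from a labeled corpus): pick the act d maximizing P(d | previous act) · P(words | d) · P(prosody | d).

    import math

    P_STRUCT = {("PROPOSAL", "AGREEMENT"): 0.4, ("PROPOSAL", "BACKCHANNEL"): 0.1}
    P_WORDS  = {("yeah", "AGREEMENT"): 0.3, ("yeah", "BACKCHANNEL"): 0.5}
    P_PROS   = {("stressed", "AGREEMENT"): 0.6, ("stressed", "BACKCHANNEL"): 0.2}

    def classify(prev_act, words, prosody, acts=("AGREEMENT", "BACKCHANNEL")):
        def score(d):  # sum of log probabilities of the three cues
            return (math.log(P_STRUCT.get((prev_act, d), 1e-6))
                    + math.log(P_WORDS.get((words, d), 1e-6))
                    + math.log(P_PROS.get((prosody, d), 1e-6)))
        return max(acts, key=score)

    # A stressed "yeah" after a proposal scores higher as an AGREEMENT:
    print(classify("PROPOSAL", "yeah", "stressed"))  # AGREEMENT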

72
An example of dialogue act detection Correction
Detection
  • Despite all these clever confirmation/rejection
    strategies, dialogue systems still make mistakes
    (surprise!)
  • If the system misrecognizes an utterance, and either
  • Rejects it
  • Or, via confirmation, displays its misunderstanding
  • Then the user has a chance to make a correction, by
  • Repeating themselves
  • Rephrasing
  • Saying "no" to the confirmation question

73
Corrections
  • Unfortunately, corrections are harder to
    recognize than normal sentences!
  • Swerts et al. (2000): corrections are misrecognized
    twice as often (in terms of WER) as
    non-corrections!
  • Why?
  • Prosody seems to be the largest factor:
    hyperarticulation
  • English example from Liz Shriberg:
  • "NO, I am DE-PAR-TING from Jacksonville"
  • A German example from Bettina Braun, from a
    talking elevator

74
A labeled dialogue (Swerts et al.)
75
Machine Learning and Classifiers
  • Given a labeled training set
  • We can build a classifier to label observations
    into classes
  • Decision Tree
  • Regression
  • SVM
  • I won't introduce the algorithms here.
  • But these are at the core of NLP/computational
    linguistics/speech/dialogue
  • You can learn them in:
  • AI: CS 121/221
  • Machine Learning: CS 229

76
Machine learning to detect user corrections
  • Build classifiers using features like:
  • Lexical information (words like "no", "correction",
    "I don't", swear words)
  • Prosodic features (increases in F0 range,
    pause duration, and word duration that
    correlate with hyperarticulation)
  • Length
  • ASR confidence
  • LM probability
  • Various dialogue features (e.g., repetition)
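
A minimal sketch of what one such feature vector might look like (feature choices follow the list above; names and example values are hypothetical); the resulting dicts would be fed to any standard classifier trained on labeled corrections:

    CUE_WORDS = {"no", "wrong", "correction"}

    def correction_features(utterance, prev_utterance, asr_confidence,
                            lm_logprob, f0_range, pause_dur):
        words = utterance.lower().split()
        return {
            "has_cue_word":  bool(set(words) & CUE_WORDS),
            "is_repetition": utterance.lower() == prev_utterance.lower(),
            "length":        len(words),
            "asr_conf":      asr_confidence,
            "lm_logprob":    lm_logprob,
            "f0_range":      f0_range,   # hyperarticulation correlates with
            "pause_dur":     pause_dur,  # expanded F0 range and long pauses
        }

    print(correction_features("NO I am DEPARTING from Jacksonville",
                              "I am departing from Jacksonville",
                              asr_confidence=0.42, lm_logprob=-35.1,
                              f0_range=180.0, pause_dur=0.6))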

77
Generating Dialogue Acts
  • Confirmation
  • Rejection

78
Confirmation
  • Another reason for grounding
  • Errors: speech is a pretty errorful channel
  • Even for humans; so they use grounding to confirm
    that they heard correctly
  • ASR is way worse than humans!
  • So dialogue systems need to do even more
    grounding and confirmation than humans

79
Explicit confirmation
  • S: Which city do you want to leave from?
  • U: Baltimore
  • S: Do you want to leave from Baltimore?
  • U: Yes

80
Explicit confirmation
  • U: I'd like to fly from Denver, Colorado, to New
    York City on September 21st in the morning on
    United Airlines
  • S: Let's see then. I have you going from Denver,
    Colorado, to New York on September 21st. Is that
    correct?
  • U: Yes

81
Implicit confirmation display
  • U: I'd like to travel to Berlin
  • S: When do you want to travel to Berlin?
  • U: Hi, I'd like to fly to Seattle Tuesday morning
  • S: Traveling to Seattle on Tuesday, August
    eleventh, in the morning. Your name?

82
Implicit vs. Explicit
  • Complementary strengths
  • Explicit: easier for users to correct the system's
    mistakes (can just say "no")
  • But explicit is cumbersome and long
  • Implicit: much more natural, quicker, simpler (if
    the system guesses right)

83
Implicit and Explicit
  • Early systems: all-implicit or all-explicit
  • Modern systems: adaptive
  • How to decide?
  • ASR system can give confidence metric.
  • This expresses how convinced system is of its
    transcription of the speech
  • If high confidence, use implicit confirmation
  • If low confidence, use explicit confirmation

84
Computing confidence
  • Simplest: use acoustic log-likelihood of the user's
    utterance
  • More features:
  • Prosody: utterances with longer pauses, F0
    excursions, longer durations
  • Backoff: did we have to back off in the LM?
  • Cost of an error: explicit confirmation before
    moving money or booking flights

85
Rejection
  • e.g., VoiceXML nomatch:
  • "I'm sorry, I didn't understand that."
  • Reject when:
  • ASR confidence is low
  • Best interpretation is semantically ill-formed
  • Might have a four-tiered level of confidence:
  • Below a confidence threshold: reject
  • Above the threshold: explicit confirmation
  • If even higher: implicit confirmation
  • Even higher: no confirmation
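
Such a tiered policy is a one-function sketch in code (threshold values invented; real systems tune them on held-out dialogues, and may raise them when the cost of an error is high):

    def confirmation_action(confidence):
        if confidence < 0.3:
            return "reject"                 # "I'm sorry, I didn't understand"
        elif confidence < 0.6:
            return "explicit_confirmation"  # "Do you want to leave from X?"
        elif confidence < 0.9:
            return "implicit_confirmation"  # "Traveling to X. Your name?"
        return "no_confirmation"

    print(confirmation_action(0.75))  # implicit_confirmation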

86
Dialogue System Evaluation
  • Key point about SLP:
  • Whenever we design a new algorithm or build a new
    application, we need to evaluate it
  • Two kinds of evaluation:
  • Extrinsic: embedded in some external task
  • Intrinsic: some sort of more local evaluation
  • How do we evaluate a dialogue system?
  • What constitutes success or failure for a
    dialogue system?

87
Dialogue System Evaluation
  • It turns out we'll need an evaluation metric for
    two reasons:
  • 1) The normal reason: we need a metric to help us
    compare different implementations
  • Can't improve it if we don't know where it fails
  • Can't decide between two algorithms without a
    goodness metric
  • 2) A new reason: we will need a metric for "how
    good a dialogue went" as an input to
    reinforcement learning:
  • automatically improve our conversational agent's
    performance via learning

88
Evaluating Dialogue Systems
  • PARADISE framework (Walker et al. 2000)
  • "Performance" of a dialogue system is affected
    both by what gets accomplished by the user and
    the dialogue agent and how it gets accomplished

(Figure: performance = maximize task success while minimizing costs,
where costs include efficiency measures and qualitative measures)
Slide from Julia Hirschberg
89
PARADISE evaluation again
  • Maximize Task Success
  • Minimize Costs
  • Efficiency Measures
  • Quality Measures
  • PARADISE (PARAdigm for Dialogue System Evaluation)

90
Task Success
  • % of subtasks completed
  • Correctness of each question/answer/error message
  • Correctness of total solution
  • Attribute-Value Matrix (AVM)
  • Kappa coefficient
  • User's perception of whether task was completed

91
Task Success
  • Task goals seen as an Attribute-Value Matrix
  • ELVIS e-mail retrieval task (Walker et al. 1997)
  • "Find the time and place of your meeting with
    Kim."

    Attribute            Value
    Selection Criterion  Kim or Meeting
    Time                 10:30 a.m.
    Place                2D516

  • Task success can be defined by the match between
    AVM values at the end of the task and the true
    values for the AVM

Slide from Julia Hirschberg
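
In its simplest form, scoring the AVM is a slot-by-slot comparison, as in this sketch (PARADISE actually uses the kappa coefficient over such matrices to correct for chance agreement; the values below follow the ELVIS example):

    def avm_match(observed, true_values):
        # Fraction of attributes whose observed value matches the true value.
        hits = sum(observed.get(attr) == val for attr, val in true_values.items())
        return hits / len(true_values)

    true_avm = {"Selection Criterion": "Kim or Meeting",
                "Time": "10:30 a.m.", "Place": "2D516"}
    observed = {"Selection Criterion": "Kim or Meeting",
                "Time": "10:30 a.m.", "Place": "2D516"}
    print(avm_match(observed, true_avm))  # 1.0 -- task fully succeeded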
92
Efficiency Cost
  • Polifroni et al. (1992), Danieli and Gerbino
    (1995), Hirschman and Pao (1993)
  • Total elapsed time in seconds or turns
  • Number of queries
  • Turn correction ratio: number of system or user
    turns used solely to correct errors, divided by
    total number of turns

93
Quality Cost
  • # of times ASR system failed to return any
    sentence
  • # of ASR rejection prompts
  • # of times user had to barge in
  • # of time-out prompts
  • Inappropriateness (verbose, ambiguous) of
    system's questions, answers, error messages

94
Another key quality cost
  • Concept accuracy or concept error rate
  • % of semantic concepts that the NLU component
    returns correctly
  • I want to arrive in Austin at 5:00
  • DEST-CITY: Boston
  • Time: 5:00
  • Concept accuracy = 50%
  • Average this across the entire dialogue
  • How many of the sentences did the system
    understand correctly?
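
Computed directly for the example above (a sketch; concept names are hypothetical):

    def concept_accuracy(returned, reference):
        correct = sum(returned.get(c) == v for c, v in reference.items())
        return correct / len(reference)

    reference = {"DEST-CITY": "Austin", "TIME": "5:00"}  # what the user meant
    returned  = {"DEST-CITY": "Boston", "TIME": "5:00"}  # what NLU produced
    print(concept_accuracy(returned, reference))  # 0.5, i.e. 50%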

95
PARADISE: Regressing against user satisfaction
96
Regressing against user satisfaction
  • Questionnaire to assign each dialogue a user
    satisfaction rating: this is the dependent measure
  • Set of cost and success factors are the independent
    measures
  • Use regression to train weights for each factor
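
A minimal sketch of that regression step (the dialogue data here is invented; rows are dialogues, columns are success/cost factors, y is the survey rating):

    import numpy as np

    # Columns: task success, concept accuracy, elapsed time (seconds).
    X = np.array([[1.0, 0.9, 120.0],
                  [0.5, 0.6, 300.0],
                  [1.0, 0.8, 200.0],
                  [0.0, 0.4, 400.0]])
    y = np.array([4.5, 2.5, 4.0, 1.5])   # user satisfaction ratings

    X1 = np.column_stack([np.ones(len(X)), X])        # add an intercept term
    weights, *_ = np.linalg.lstsq(X1, y, rcond=None)  # least-squares fit
    print(weights)  # learned weight of each factor on satisfaction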

97
Experimental Procedures
  • Subjects given specified tasks
  • Spoken dialogues recorded
  • Cost factors, states, dialogue acts automatically
    logged; ASR accuracy, barge-in hand-labeled
  • Users specify task solution via web page
  • Users complete User Satisfaction surveys
  • Use multiple linear regression to model User
    Satisfaction as a function of Task Success and
    Costs; test for significant predictive factors

Slide from Julia Hirschberg
98
User Satisfaction: Sum of Many Measures
  • Was the system easy to understand? (TTS
    Performance)
  • Did the system understand what you said? (ASR
    Performance)
  • Was it easy to find the message/plane/train you
    wanted? (Task Ease)
  • Was the pace of interaction with the system
    appropriate? (Interaction Pace)
  • Did you know what you could say at each point of
    the dialog? (User Expertise)
  • How often was the system sluggish and slow to
    reply to you? (System Response)
  • Did the system work the way you expected it to in
    this conversation? (Expected Behavior)
  • Do you think you'd use the system regularly in
    the future? (Future Use)

Adapted from Julia Hirschberg
99
Performance Functions from Three Systems
  • ELVIS: User Sat. = .21·COMP + .47·MRS − .15·ET
  • TOOT: User Sat. = .35·COMP + .45·MRS − .14·ET
  • ANNIE: User Sat. = .33·COMP + .25·MRS + .33·Help
  • COMP: user perception of task completion (task
    success)
  • MRS: mean (concept) recognition accuracy (cost)
  • ET: elapsed time (cost)
  • Help: help requests (cost)

Slide from Julia Hirschberg
100
Performance Model
  • Perceived task completion and mean recognition
    score (concept accuracy) are consistently
    significant predictors of User Satisfaction
  • Performance model useful for system development
  • Making predictions about system modifications
  • Distinguishing good dialogues from bad
    dialogues
  • As part of a learning model

101
Now that we have a success metric
  • Could we use it to help drive learning?
  • In recent work we use this metric to help us
    learn an optimal policy or strategy for how the
    conversational agent should behave

102
New Idea: Modeling a dialogue system as a
probabilistic agent
  • A conversational agent can be characterized by:
  • The current knowledge of the system
  • A set of states S the agent can be in
  • A set of actions A the agent can take
  • A goal G, which implies:
  • A success metric that tells us how well the agent
    achieved its goal
  • A way of using this metric to create a strategy
    or policy π for what action to take in any
    particular state

103
What do we mean by actions A and policies π?
  • Kinds of decisions a conversational agent needs
    to make
  • When should I ground/confirm/reject/ask for
    clarification on what the user just said?
  • When should I ask a directive prompt, when an
    open prompt?
  • When should I use user, system, or mixed
    initiative?

104
A threshold is a human-designed policy!
  • Could we learn what the right action is
  • Rejection
  • Explicit confirmation
  • Implicit confirmation
  • No confirmation
  • By learning a policy which,
  • given various information about the current
    state,
  • dynamically chooses the action which maximizes
    dialogue success

105
Another strategy decision
  • Open versus directive prompts
  • When to do mixed initiative

106
Outline
  • The Linguistics of Conversation
  • Basic Conversational Agents
  • ASR
  • NLU
  • Generation
  • Dialogue Manager
  • Dialogue Manager Design
  • Finite State
  • Frame-based
  • Initiative User, System, Mixed
  • VoiceXML
  • Information-State
  • Dialogue-Act Detection
  • Dialogue-Act Generation
  • Evaluation
  • Utility-based conversational agents
  • MDP, POMDP

107
END OF TODAY'S LECTURE
  • THE FOLLOWING SLIDES ARE AN OPTIONAL ADVANCED
    DISCUSSION OF MARKOV-DECISION-PROCESS DIALOGUE
    SYSTEMS.

108
Review Open vs. Directive Prompts
  • Open prompt
  • System gives user very few constraints
  • User can respond how they please
  • How may I help you? How may I direct your
    call?
  • Directive prompt
  • Explicitly instructs the user how to respond
  • "Say yes if you accept the call; otherwise, say
    no."

109
Review: Restrictive vs. Non-restrictive grammars
  • Restrictive grammar
  • Language model which strongly constrains the ASR
    system, based on dialogue state
  • Non-restrictive grammar
  • Open language model which is not restricted to a
    particular dialogue state

110
Kinds of Initiative
  • How do I decide which of these initiatives to use
    at each point in the dialogue?

Grammar          Open Prompt          Directive Prompt
Restrictive      Doesn't make sense   System Initiative
Non-restrictive  User Initiative      Mixed Initiative
111
Modeling a dialogue system as a probabilistic
agent
  • A conversational agent can be characterized by
  • The current knowledge of the system
  • A set of states S the agent can be in
  • a set of actions A the agent can take
  • A goal G, which implies
  • A success metric that tells us how well the agent
    achieved its goal
  • A way of using this metric to create a strategy
    or policy π for what action to take in any
    particular state

112
Goals are not enough
  • Goal: user satisfaction
  • OK, that's all very well, but...
  • Many things influence user satisfaction
  • We don't know user satisfaction until after the
    dialogue is done
  • How do we know, state by state and action by
    action, what the agent should do?
  • We need a more helpful metric that can apply to
    each state

113
Utility
  • A utility function:
  • maps a state or state sequence
  • onto a real number
  • describing the goodness of that state
  • i.e., the resulting happiness of the agent
  • Principle of Maximum Expected Utility:
  • A rational agent should choose an action that
    maximizes the agent's expected utility

114
Maximum Expected Utility
  • Principle of Maximum Expected Utility:
  • A rational agent should choose an action that
    maximizes the agent's expected utility
  • Action A has possible outcome states Result_i(A)
  • E: the agent's evidence about the current state of
    the world
  • Before doing A, the agent estimates the probability
    of each outcome:
  • P(Result_i(A) | Do(A), E)
  • Thus it can compute the expected utility

115
Utility (Russell and Norvig)
116
Markov Decision Processes
  • Or MDP
  • Characterized by
  • a set of states S an agent can be in
  • a set of actions A the agent can take
  • A reward r(a,s) that the agent receives for
    taking an action in a state
  • (Some other things I'll come back to: gamma,
    state transition probabilities)

117
A brief tutorial example
  • Levin et al. (2000)
  • A Day-and-Month dialogue system
  • Goal: fill in a two-slot frame:
  • Month: November
  • Day: 12th
  • via the shortest possible interaction with the user

118
What is a state?
  • In principle, MDP state could include any
    possible information about dialogue
  • Complete dialogue history so far
  • Usually use a much more limited set
  • Values of slots in current frame
  • Most recent question asked to the user
  • User's most recent answer
  • ASR confidence
  • etc

119
State in the Day-and-Month example
  • Values of the two slots day and month.
  • Total:
  • 2 special states, initial s_i and final s_f
  • 365 states with a day and month
  • 1 state for leap year (February 29)
  • 12 states with a month but no day
  • 31 states with a day but no month
  • 411 total states

120
Actions in MDP models of dialogue
  • Speech acts!
  • Ask a question
  • Explicit confirmation
  • Rejection
  • Give the user some database information
  • Tell the user their choices
  • Do a database query

121
Actions in the Day-and-Month example
  • a_d: a question asking for the day
  • a_m: a question asking for the month
  • a_dm: a question asking for both the day and the
    month
  • a_f: a final action, submitting the form and
    terminating the dialogue

122
A simple reward function
  • For this example, let's use a cost function
  • A cost function for the entire dialogue
  • Let:
  • N_i = number of interactions (duration of dialogue)
  • N_e = number of errors in the obtained values (0-2)
  • N_f = expected distance from goal
  • (0 for a complete date, 1 if either the day or the
    month is missing, 2 if both are missing)
  • Then the (weighted) cost is:
  • C = w_i·N_i + w_e·N_e + w_f·N_f
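
In code, the cost function is one line (the weight values below are hypothetical; they encode how much interaction length, errors, and incompleteness each hurt):

    def dialogue_cost(n_interactions, n_errors, n_missing,
                      w_i=1.0, w_e=5.0, w_f=3.0):
        # C = w_i*N_i + w_e*N_e + w_f*N_f
        return w_i * n_interactions + w_e * n_errors + w_f * n_missing

    # e.g., two questions asked, no errors, complete date:
    print(dialogue_cost(n_interactions=2, n_errors=0, n_missing=0))  # 2.0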

123
3 possible policies
(Figure: three policies — "dumb", open prompt, directive prompt;
P1 = probability of error in open prompt,
P2 = probability of error in directive prompt)
124
3 possible policies
Strategy 3 is better than strategy 2 when the
improved error rate justifies the longer interaction.
(Figure: open vs. directive policies;
P1 = probability of error in open prompt,
P2 = probability of error in directive prompt)
125
That was an easy optimization
  • Only two actions, only a tiny # of policies
  • In general, the number of actions, states, and
    policies is quite large
  • So finding the optimal policy π* is harder
  • We need reinforcement learning
  • Back to MDPs

126
MDP
  • We can think of a dialogue as a trajectory in
    state space
  • The best policy π* is the one with the greatest
    expected reward over all trajectories
  • How to compute a reward for a state sequence?

127
Reward for a state sequence
  • One common approach: discounted rewards
  • Cumulative reward Q of a sequence is the discounted
    sum of the utilities of the individual states
  • Discount factor γ between 0 and 1
  • Makes the agent care more about current than future
    rewards; the more future a reward, the more
    discounted its value
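
The discounted sum is easy to state in code (a sketch; γ = 0.9 chosen arbitrarily):

    def discounted_return(rewards, gamma=0.9):
        # Q = r_0 + gamma*r_1 + gamma^2*r_2 + ...
        return sum((gamma ** t) * r for t, r in enumerate(rewards))

    # A reward of 10 two steps in the future counts for only 8.1 now:
    print(discounted_return([0, 0, 10]))  # 8.1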

128
The Markov assumption
  • MDP assumes that state transitions are Markovian

129
Expected reward for an action
  • Expected cumulative reward Q(s,a) for taking a
    particular action from a particular state can be
    computed by the Bellman equation:
  • Q(s,a) = R(s,a) + γ Σ_s' P(s'|s,a) max_a' Q(s',a')
  • Expected cumulative reward for a given
    state/action pair is:
  • the immediate reward for the current state
  • plus the expected discounted utility of all possible
    next states s'
  • weighted by the probability of moving to that state
    s'
  • and assuming once there we take the optimal action a'

130
What we need for Bellman equation
  • A model of P(s'|s,a)
  • An estimate of R(s,a)
  • How to get these?
  • If we had labeled training data:
  • P(s'|s,a) = C(s,s',a) / C(s,a)
  • If we knew the final reward for the whole dialogue
    R(s1,a1,s2,a2,...,sn)
  • Given these parameters, we can use the value
    iteration algorithm to learn Q values (pushing back
    reward values over state sequences) and hence the
    best policy
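
Below is a toy sketch of value iteration on a two-state dialogue MDP (all transition probabilities and rewards invented): the open prompt is cheaper per turn but fails more often than the directive prompt, and iterating the Bellman update Q(s,a) = R(s,a) + γ Σ_s' P(s'|s,a) max_a' Q(s',a') recovers the better trade-off:

    GAMMA = 0.9
    P = {("ask", "open"):      {"filled": 0.4, "ask": 0.6},  # P(s'|s,a)
         ("ask", "direct"):    {"filled": 0.9, "ask": 0.1},
         ("filled", "submit"): {"done": 1.0}}
    R = {("ask", "open"): -1, ("ask", "direct"): -2, ("filled", "submit"): 10}

    Q = {sa: 0.0 for sa in P}
    for _ in range(100):  # repeat the Bellman update until convergence
        Q = {(s, a): R[(s, a)] + GAMMA * sum(
                 p * max((Q[(s2, a2)] for (s2, a2) in Q if s2 == nxt),
                         default=0.0)
                 for nxt, p in P[(s, a)].items())
             for (s, a) in P}

    policy = {s: max((a for (s2, a) in Q if s2 == s), key=lambda a: Q[(s, a)])
              for s in {s for (s, _) in Q}}
    print(policy)  # 'direct' in state 'ask'; 'submit' in state 'filled'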

131
Final reward
  • What is the final reward for the whole dialogue
    R(s1,a1,s2,a2,...,sn)?
  • This is what our automatic evaluation metric
    PARADISE computes!
  • The general goodness of a whole dialogue!

132
How to estimate P(s'|s,a) without labeled data
  • Have random conversations with real people:
  • Carefully hand-tune a small number of states and
    policies
  • Then build a dialogue system which explores the
    state space by generating a few hundred random
    conversations with real humans
  • Set probabilities from this corpus
  • Have random conversations with simulated people:
  • Now you can have millions of conversations with
    simulated people
  • So you can have a slightly larger state space

133
An example
  • Singh, S., D. Litman, M. Kearns, and M. Walker.
    2002. Optimizing Dialogue Management with
    Reinforcement Learning: Experiments with the
    NJFun System. Journal of AI Research.
  • NJFun system: people asked questions about
    recreational activities in New Jersey
  • Idea of the paper: use reinforcement learning to
    make a small set of optimal policy decisions

134
Very small # of states and acts
  • States specified by values of 8 features:
  • Which slot in the frame is being worked on (1-4)
  • ASR confidence value (0-5)
  • How many times the current slot's question had
    been asked
  • Restrictive vs. non-restrictive grammar
  • Result: 62 states
  • Actions: each state has only 2 possible actions:
  • Asking questions: system versus user initiative
  • Receiving answers: explicit versus no
    confirmation

135
Ran system with real users
  • 311 conversations
  • Simple binary reward function:
  • 1 if user completed the task (finding museums,
    theater, wine tasting in the NJ area)
  • 0 if not
  • System learned a good dialogue strategy. Roughly:
  • Start with user initiative
  • Back off to mixed or system initiative when
    re-asking for an attribute
  • Confirm only at lower confidence values

136
State of the art
  • Only a few such systems
  • From (former) AT&T Laboratories researchers, now
    dispersed
  • And the Cambridge UK lab
  • Hot topics:
  • Partially observable MDPs (POMDPs)
  • We don't REALLY know the user's state (we only
    know what we THOUGHT the user said)
  • So we need to take actions based on our BELIEF,
    i.e., a probability distribution over states
    rather than the true state