Title: Spoken Dialogue Systems
Issues
- Error avoidance
- Error detection
  - From the system side: how likely is it that the system made an error?
  - From the user side: what cues does the user provide to indicate an error?
- Error handling: what can the system do when it thinks an error has occurred?
- Evaluation: how do you know what needs fixing most?
Avoiding Misunderstandings
- By imitating human performance
- Timing and grounding (Clark 03)
Recognizing Problematic Dialogues
- Hastie et al., "What's the Trouble?" (ACL 2002)
Recognizing Problematic Utterances (Hirschberg et al., 1999 onward)
- Collect a corpus from an interactive voice response system
- Identify speaker turns:
  - incorrectly recognized
  - where speakers first become aware of an error
  - that correct misrecognitions
- Identify prosodic features of turns in each category and compare them to other turns
- Use machine learning techniques to train a classifier to make these distinctions automatically (a sketch follows below)
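
As a rough illustration of that last step, here is a minimal sketch of training such a classifier, assuming prosodic features (e.g., F0 maximum, RMS energy, duration, tempo, prior pause) have already been extracted per turn. The feature set, data, and learner below are placeholders, not the actual setup from Hirschberg et al.

```python
# Sketch: classify turns as misrecognized vs. correctly recognized from
# per-turn prosodic features. Features and data are synthetic placeholders.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# One row per speaker turn: [f0_max, rms_max, duration, tempo, prior_pause]
X = rng.normal(size=(200, 5))
# Synthetic labels loosely tied to pitch and duration so the tree has
# something to learn; 1 = misrecognized turn, 0 = correctly recognized.
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(0, 0.5, 200) > 0).astype(int)

clf = DecisionTreeClassifier(max_depth=3, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
print(f"estimated classification error: {1 - scores.mean():.2f}")
```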
Turn Types

TOOT: Hi. This is AT&T Amtrak Schedule System. This is TOOT. How may I help you?
User: Hello. I would like trains from Philadelphia to New York leaving on Sunday at ten thirty in the evening.  [misrecognition]
TOOT: Which city do you want to go to?
User: New York.  [aware site, correction]
Results
- Reduced error in predicting misrecognized turns to 8.64%
- Error in predicting aware sites: ~12%
- Error in predicting corrections: 18-21%
Evidence from Human Performance
- Users provide explicit positive and negative feedback
- Corpus-based vs. laboratory experiments: do these tell us different things?
- Bell & Gustafson 00
  - What do we learn from this?
  - What functions does feedback serve?
- Krahmer et al.
  - "go on" and "go back" signals in grounding situations (implicit/explicit verification)
- Positive cues: short turns, unmarked word order, confirmation, answers, no corrections or repetitions, new info
- Negative cues: long turns, marked word order, disconfirmation, no answer, corrections, repetitions, no new info
- Hypotheses supported, but:
  - Can these cues be identified automatically? (see the sketch after this list)
  - How might they affect the design of SDS?
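
To make the automation question concrete, a toy detector over the surface cues listed above might look like the following. The word list, thresholds, and voting scheme are invented for illustration; they are not from Krahmer et al.

```python
# Toy detector for negative ("go back") cues: long turns, disconfirmation,
# and repetition of material from the system's prior turn. Word lists and
# thresholds are illustrative guesses, not values from the literature.
DISCONFIRMATIONS = {"no", "nope", "wrong", "incorrect", "not"}

def looks_negative(user_turn: str, prev_system_turn: str = "") -> bool:
    words = user_turn.lower().split()
    long_turn = len(words) > 8                                 # long turn
    disconfirms = bool(DISCONFIRMATIONS & set(words))          # disconfirmation
    repeated = len(set(words) & set(prev_system_turn.lower().split())) >= 3
    return sum([long_turn, disconfirms, repeated]) >= 2        # simple vote

print(looks_negative("no I want to leave from Philadelphia not Baltimore",
                     "Do you want to leave from Baltimore?"))  # True
```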
Error Handling Strategies
- Goldberg et al. 03: how should systems best inform the user that they don't understand?
- System rephrasing vs. repetition vs. statement of non-understanding
- Apologies
- What behaviors might these produce?
  - Hyperarticulation
  - User frustration
  - User repetition or rephrasing
- What lessons do we learn?
  - What produces the least frustration?
  - What produces the best-recognized input?
Evaluating Dialogue Systems
- PARADISE framework (Walker et al. 00)
- Performance of a dialogue system is affected both by what gets accomplished by the user and the dialogue agent, and by how it gets accomplished (see the objective sketched below):
  - Maximize task success
  - Minimize costs: efficiency measures and qualitative measures
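
The PARADISE performance function, roughly as given by Walker et al., weighs normalized task success against a weighted sum of normalized costs; N denotes Z-score normalization, and alpha and the w_i are estimated by the regression described later:

```latex
\mathrm{Performance} \;=\; \alpha \cdot \mathcal{N}(\kappa) \;-\; \sum_{i=1}^{n} w_i \cdot \mathcal{N}(c_i)
```

where kappa is the task-success measure and the c_i are the cost measures.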
Task Success
- Task goals seen as an Attribute-Value Matrix (AVM)
- ELVIS e-mail retrieval task (Walker et al. 97)
  - "Find the time and place of your meeting with Kim."

  Attribute            Value
  Selection Criterion  Kim or Meeting
  Time                 10:30 a.m.
  Place                2D516

- Task success defined by the match between the AVM values at the end of the dialogue and the true values for the AVM (a sketch follows below)
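
A minimal sketch of that comparison, assuming both the scenario key and the values logged at the end of the dialogue are attribute-to-value dicts. Walker et al. actually pool dialogues into a confusion matrix per attribute and compute the kappa statistic; this per-dialogue version is a simplification.

```python
# Sketch: score task success by comparing the end-of-dialogue AVM against
# the scenario key. Walker et al. compute kappa over a confusion matrix
# pooled across dialogues; this is a simplified per-dialogue version.
from sklearn.metrics import cohen_kappa_score

scenario_key = {"selection": "Kim or Meeting", "time": "10:30 a.m.", "place": "2D516"}
logged_avm   = {"selection": "Kim or Meeting", "time": "10:30 a.m.", "place": "2D516"}

attrs = sorted(scenario_key)
truth = [scenario_key[a] for a in attrs]
guess = [logged_avm.get(a, "<missing>") for a in attrs]
# With many dialogues, extend `truth` and `guess` before computing kappa.
print(cohen_kappa_score(truth, guess))  # 1.0 for a perfect match
```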
Metrics
- Efficiency of the interaction: user turns, system turns, elapsed time
- Quality of the interaction: ASR rejections, timeout prompts, help requests, barge-ins, mean recognition score (concept accuracy), cancellation requests
- User satisfaction
- Task success: perceived completion, information extracted (one way to log these measures is sketched below)
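
One way to record these per dialogue is a simple log record; the field names below are shorthand for the measures above, not identifiers from the PARADISE papers.

```python
# Per-dialogue metric log covering the efficiency, quality, and
# task-success measures listed above. Field names are illustrative.
from dataclasses import dataclass

@dataclass
class DialogueMetrics:
    user_turns: int                 # efficiency measures
    system_turns: int
    elapsed_time_s: float
    asr_rejections: int             # quality measures
    timeout_prompts: int
    help_requests: int
    barge_ins: int
    mean_recognition_score: float   # concept accuracy, 0..1
    cancellation_requests: int
    perceived_completion: bool      # task success
```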
Experimental Procedures
- Subjects given specified tasks
- Spoken dialogues recorded
- Cost factors, states, and dialogue acts automatically logged; ASR accuracy and barge-ins hand-labeled
- Users specify the task solution via a web page
- Users complete user satisfaction surveys
- Use multiple linear regression to model user satisfaction as a function of task success and costs; test for significant predictive factors (a sketch follows below)
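
A sketch of that final regression step on synthetic data; in the real procedure the rows come from the logged dialogues and surveys, and the factors are Z-score normalized so the fitted weights are comparable.

```python
# Sketch: fit User Satisfaction as a linear function of task success and
# costs. Data are synthetic stand-ins for the logged corpus.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 60
comp = rng.integers(0, 2, n).astype(float)   # perceived task completion
mrs  = rng.uniform(0.5, 1.0, n)              # mean recognition score
et   = rng.uniform(60, 600, n)               # elapsed time, seconds
user_sat = 10 * comp + 20 * mrs - 0.01 * et + rng.normal(0, 1, n)

X = np.column_stack([comp, mrs, et])
X = (X - X.mean(axis=0)) / X.std(axis=0)     # Z-score normalize, as in PARADISE
model = LinearRegression().fit(X, user_sat)
print(dict(zip(["COMP", "MRS", "ET"], model.coef_.round(2))))
```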
User Satisfaction: Sum of Many Measures
- Was Annie easy to understand in this conversation? (TTS Performance)
- In this conversation, did Annie understand what you said? (ASR Performance)
- In this conversation, was it easy to find the message you wanted? (Task Ease)
- Was the pace of interaction with Annie appropriate in this conversation? (Interaction Pace)
- In this conversation, did you know what you could say at each point of the dialog? (User Expertise)
- How often was Annie sluggish and slow to reply to you in this conversation? (System Response)
- Did Annie work the way you expected her to in this conversation? (Expected Behavior)
- From your current experience with using Annie to get your email, do you think you'd use Annie regularly to access your mail when you are away from your desk? (Future Use)
Performance Functions from Three Systems
- ELVIS: User Sat. = .21 COMP + .47 MRS - .15 ET
- TOOT: User Sat. = .35 COMP + .45 MRS - .14 ET
- ANNIE: User Sat. = .33 COMP + .25 MRS + .33 Help
- COMP: user perception of task completion (task success)
- MRS: mean recognition accuracy (cost)
- ET: elapsed time (cost)
- Help: help requests (cost)
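
Plugging hypothetical normalized scores into the ELVIS function shows how the weights trade off; the input values here are made up.

```python
# Hypothetical Z-normalized scores for one ELVIS dialogue.
comp, mrs, et = 1.0, 0.8, -0.5   # task completed, decent ASR, faster than average
user_sat = 0.21 * comp + 0.47 * mrs - 0.15 * et
print(round(user_sat, 3))        # 0.661
```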
Performance Model
- Perceived task completion and mean recognition score are consistently significant predictors of user satisfaction
- Performance model useful for system development:
  - Making predictions about system modifications
  - Distinguishing good dialogues from bad dialogues
- But can we also tell on-line when a dialogue is going wrong?
Next Week
- Speech summarization and data mining