1
Belief Updating in Spoken Dialog Systems

Dan Bohus
www.cs.cmu.edu/dbohus
dbohus@cs.cmu.edu
Computer Science Department
Carnegie Mellon University
Pittsburgh, PA, 15217

2
problem
spoken language interfaces lack robustness when
faced with understanding errors.

stems mostly from speech recognition
spans most domains and interaction types

3
more concretely

S: What city are you leaving from?
U: Urbana Champaign [OKAY IN THAT SAME PAY]
S: Sorry, I'm not sure I understood what you said. What city are you leaving from?
U: Urbana Champaign [FOR MINUTE SINCE HEY]
S: I'm still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
U: Chicago [CHICAGO]
S: traveling from Chicago. Where would you like to go?
U: Huntsville [SEOUL]
S: traveling to Seoul. What day did you need to travel?
U: no no I'm traveling to Birmingham [THE TRAVELING TO BERLIN P_M]
S: traveling in the afternoon. Okay, what day would you be departing Chicago?
U: /uh/ the tenth of august [AT THE TENTH OF AUGUST]
S: a flight tomorrow. I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
U: my destination is Birmingham [FLIGHT DESTINATION MR WEEKEND]
S: traveling on Saturday, August 12th. I have a flight departing Chicago at 1:40pm, arrives Seoul at ...

4
non- and misunderstandings

non-understandings: the system obtains no interpretation of the user turn (the first two user turns in the dialog on the previous slide)
misunderstandings: the system obtains an incorrect interpretation (Huntsville recognized as SEOUL, Birmingham as BERLIN)

5
approaches for increasing robustness

fix recognition

gracefully handle errors through interaction

detect the problems
develop a set of recovery strategies
know how to choose between them (policy)

6
six not-so-easy pieces
7
belief updating
misunderstandings

construct more accurate beliefs by integrating
information over multiple turns

detection
S: Where would you like to go?
U: Huntsville [SEOUL / 0.65]
destination = seoul/0.65
S: traveling to Seoul. What day did you need to travel?
U: no no I'm traveling to Birmingham [THE TRAVELING TO BERLIN P_M / 0.60]
destination = ?
8
belief updating: problem statement

given
an initial belief $P_{\text{initial}}(C)$ over concept C
a system action SA
a user response R
construct an updated belief
$P_{\text{updated}}(C) \leftarrow f(P_{\text{initial}}(C), SA, R)$

destination = seoul/0.65
S: traveling to Seoul. What day did you need to travel?
U: [THE TRAVELING TO BERLIN P_M / 0.60]
destination = ?
9
outline

related work
a restricted version
data
user response analysis
experiments and results
some caveats and future work

related work restricted version data user
response analysis experiment results
caveats future work
10
confidence annotation and heuristic updates

confidence annotation
traditionally focused on word-level errors [Chase, Cox, Bansal, Ravishankar]
more recently: semantic confidence annotation [Walker, San-Segundo, Bohus]
machine learning approach
results fairly good, but not perfect
heuristic updates
explicit confirmation: no → don't trust; yes → trust
implicit confirmation: no → don't trust; otherwise → trust
suboptimal for several reasons
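
The heuristic rules listed on this slide can be sketched as a small function. This is only an illustration under stated assumptions: the names `Action`, `Response` and `heuristic_update` are hypothetical, "trust" and "don't trust" are modeled as confidences of 1.0 and 0.0, and an OTHER response to an explicit confirmation is assumed to leave the belief unchanged, since the slide does not say what happens in that case.

```python
from enum import Enum


class Action(Enum):
    EXPLICIT_CONFIRM = "explicit"
    IMPLICIT_CONFIRM = "implicit"


class Response(Enum):
    YES = "yes"
    NO = "no"
    OTHER = "other"


def heuristic_update(conf_initial: float, action: Action, response: Response) -> float:
    """Return an updated confidence for the top hypothesis of a concept."""
    if response is Response.NO:
        # Both confirmation types: a "no" means the value is not trusted.
        return 0.0
    if action is Action.EXPLICIT_CONFIRM:
        if response is Response.YES:
            return 1.0
        # Assumption: the slide gives no rule for OTHER, so keep the belief as is.
        return conf_initial
    # Implicit confirmation: anything that is not a "no" is taken as acceptance.
    return 1.0


# The Seoul example: implicit confirmation followed by "no no I'm traveling to ..."
print(heuristic_update(0.65, Action.IMPLICIT_CONFIRM, Response.NO))
```

Note that the rule depends on recognizing the "no" in the first place, and it ignores the initial confidence score whenever it fires.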

11
correction detection

detect if the user is trying to correct the system [Litman, Swerts, Hirschberg, Krahmer, Levow]
machine learning approach
features from different knowledge sources in the
system
results fairly good, but not perfect

12
integration

confidence annotation and correction detection
are useful tools
but separately, neither solves the problem
bridge together in a unified approach to
accurately track beliefs

13
outline

related work
a restricted version
data
user response analysis
experiments and results
some caveats and future work

14
belief updating: general form

given
an initial belief $P_{\text{initial}}(C)$ over concept C
a system action SA
a user response R
construct an updated belief
$P_{\text{updated}}(C) \leftarrow f(P_{\text{initial}}(C), SA, R)$

15
restricted version: 2 simplifications

compact belief
system unlikely to hear more than 3 or 4 values
single vs. multiple recognition results
in our data: max 3 values, only 6.9% have >1 value
confidence score of top hypothesis
updates after confirmation actions
reduced problem
$ConfTop_{\text{updated}}(C) \leftarrow f(ConfTop_{\text{initial}}(C), SA, R)$

16
outline

related work
a restricted version
data
user response analysis
experiments and results
some caveats and future work

17
data

collected with RoomLine
a phone-based mixed-initiative spoken dialog
system
conference room reservation
search and negotiation
explicit and implicit confirmations
confidence threshold model (+ some exploration)
unplanned implicit confirmations

I found 10 rooms for Friday between 1 and 3 p.m.
Would you like a small room or a large one?

18
corpus

user study
46 participants (naïve users)
10 scenario-based interactions each
compensated per task success
corpus
449 sessions, 8848 user turns
orthographically transcribed
rich annotation: correct concepts, corrections, etc.

19
outline

related work
a restricted version
data
user response analysis
experiments and results
some caveats and future work

20
user response types

following Krahmer and Swerts
study on Dutch train-table information system
3 user response types
YES: yes, right, that's right, correct, etc.
NO: no, wrong, etc.
OTHER
cross-tabulated against correctness of
confirmations

21
user responses to explicit confirmations

from transcripts (numbers in brackets from Krahmer & Swerts)

            YES          NO           Other
CORRECT     94% [93%]    0% [0%]      5% [7%]
INCORRECT   1% [6%]      72% [57%]    27% [37%]

from decoded

            YES    NO     Other
CORRECT     87%    1%     12%
INCORRECT   1%     61%    38%
22
other responses to explicit confirmations

70%: users repeat the correct value
15%: users don't address the question
attempt to shift conversation focus

            User does not correct    User corrects
CORRECT     1159                     0
INCORRECT   29 [10% of incor]        250 [90% of incor]
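
The bracketed percentages in the INCORRECT row follow directly from the two counts. A minimal check, where the helper name `row_percentages` and the dictionary keys are purely illustrative:

```python
def row_percentages(row):
    """Convert one row of raw counts into percentages of the row total."""
    total = sum(row.values())
    return {label: 100.0 * count / total for label, count in row.items()}


# Counts for incorrect explicit confirmations, taken from the table.
incorrect = {"does_not_correct": 29, "corrects": 250}
pct = row_percentages(incorrect)
print({label: round(value) for label, value in pct.items()})
```

So after an incorrect explicit confirmation, users left the error uncorrected in 29 of 279 cases.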
23
user responses to implicit confirmations

Transcripts (numbers in brackets from Krahmer & Swerts)

            YES         NO           Other
CORRECT     30% [0%]    7% [0%]      63% [100%]
INCORRECT   6% [0%]     33% [15%]    61% [85%]

Decoded

            YES    NO     Other
CORRECT     28%    5%     67%
INCORRECT   7%     27%    66%
24
ignoring errors in implicit confirmations

            User does not correct    User corrects
CORRECT     552                      2
INCORRECT   118 [51% of incor]       111 [49% of incor]

users correct later (40% of 118)
users interact strategically
correct only if essential

                does not correct later    corrects later
not critical    55                        2
critical        14                        47
25
outline

related work
a restricted version
data
user response analysis
experiments and results
some caveats and future work

26
machine learning approach

need good probability outputs
low cross-entropy between model predictions and reality
cross-entropy = negative average log posterior
logistic regression
sample efficient
stepwise approach → feature selection
logistic model tree for each action
root splits on response-type
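
The cross-entropy criterion on this slide, the negative average log posterior, can be sketched as follows. The function name and the toy inputs are illustrative assumptions, not taken from the talk; `probs[i]` stands for the predicted probability that the value is correct and `labels[i]` for whether it actually was.

```python
import math


def cross_entropy(probs, labels):
    """Negative average log posterior assigned to the true outcomes."""
    eps = 1e-12  # clip to avoid log(0) on overconfident predictions
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return -total / len(labels)


# Toy values only: an uninformative model vs. a confident and right one.
print(cross_entropy([0.5, 0.5], [1, 0]))
print(cross_entropy([0.9, 0.1], [1, 0]))
```

Lower is better: a model is rewarded for well-calibrated probabilities, not just for landing on the right side of a threshold.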

27
features. target.

initial situation
initial confidence score
concept identity, dialog state, turn number
system action
other actions performed in parallel
features of the user response
acoustic / prosodic features
lexical features
grammatical features
dialog-level features
target: was the value correct?
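
To make the mapping from features to target concrete, here is a rough sketch of a model whose root splits on response type and whose leaves are logistic regressions. Everything specific in it is an assumption for illustration: the weights are hand-picked rather than fitted, only two of the listed features are used, and the names `LEAF_MODELS` and `updated_confidence` are hypothetical.

```python
import math

# Hypothetical, hand-picked weights (not learned from any data).
# One logistic model per response type, i.e. a tree whose root splits on response type.
LEAF_MODELS = {
    "YES": {"bias": 2.0, "initial_conf": 1.0, "barge_in": -0.5},
    "NO": {"bias": -3.0, "initial_conf": 1.0, "barge_in": -0.5},
    "OTHER": {"bias": -0.5, "initial_conf": 2.0, "barge_in": -1.0},
}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def updated_confidence(response_type, features):
    """Estimate P(value is correct) from the response type and a feature dict."""
    weights = LEAF_MODELS[response_type]
    z = weights["bias"]
    for name, weight in weights.items():
        if name != "bias":
            z += weight * features.get(name, 0.0)  # missing features count as 0
    return sigmoid(z)


features = {"initial_conf": 0.65, "barge_in": 0.0}
print(updated_confidence("YES", features), updated_confidence("NO", features))
```

With real data the weights would be fitted per system action, and the stepwise procedure would decide which features enter each leaf.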

28
baselines

initial baseline
accuracy of system beliefs before the update
heuristic baseline
accuracy of heuristic rule currently used in the
system
oracle baseline
accuracy if we knew exactly when the user is
correcting the system

29
results: explicit confirmation
[charts: hard error (%) and soft error]
30
results: implicit confirmation
[charts: hard error (%) and soft error]
31
results: unplanned implicit confirmation
[charts: hard error (%) and soft error]
32
informative features

initial confidence score
prosody features
barge-in
expectation match
repeated grammar slots
concept id

33
outline

related work
a reduced version. approach
data
user response analysis
experiments and results
some caveats and future work

34
eliminate simplification 1

current restricted version
belief = confidence score of top hypothesis
only 6.9% of cases had more than 1 hypothesis

extend to
N hypotheses + 1 (other), where N is a small integer (2 or 3)
approach: multinomial generalized linear model
use information from multiple recognition hypotheses
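
One way to picture the proposed extension is a softmax over the N hypotheses plus the "other" bucket, which is the usual output stage of a multinomial logistic model. This is a sketch only: the scores below are made-up numbers standing in for whatever linear combination of features the model would compute, and `multinomial_belief` is a hypothetical name.

```python
import math


def multinomial_belief(scores):
    """Turn per-hypothesis linear scores into a probability distribution (softmax)."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {name: math.exp(score - top) for name, score in scores.items()}
    total = sum(exps.values())
    return {name: value / total for name, value in exps.items()}


# Made-up scores for N = 2 recognized hypotheses plus "other".
belief = multinomial_belief({"seoul": 0.4, "berlin": 0.9, "other": -0.2})
print(belief)
```

Unlike the single top-hypothesis confidence, the result keeps probability mass on competing values and on the possibility that none of the heard values is right.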

35
eliminate simplification 2

current restricted version
only updates following system confirmation
actions

users might correct the system at any point

extend to
updates after all system actions

36
shameless self promotion
- rejection threshold adaptation
- non-understanding impact on performance [Interspeech-05]
- comparative analysis of 10 recovery strategies [SIGdial-05]