Title: Belief Updating in Spoken Dialog Systems
1Belief Updating in Spoken Dialog Systems
- Dan Bohus
- www.cs.cmu.edu/dbohus
- dbohus_at_cs.cmu.edu
- Computer Science Department
- Carnegie Mellon University
- Pittsburgh, PA, 15217
2problem
spoken language interfaces lack robustness when faced with understanding errors.
- stems mostly from speech recognition
- spans most domains and interaction types
3more concretely
- S: What city are you leaving from?
- U: Urbana Champaign [OKAY IN THAT SAME PAY]
- S: Sorry I'm not sure I understood what you said. What city are you leaving from?
- U: Urbana Champaign [FOR MINUTE SINCE HEY]
- S: I'm still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
- U: Chicago [CHICAGO]
- S: traveling from Chicago. Where would you like to go?
- U: Huntsville [SEOUL]
- S: traveling to Seoul. What day did you need to travel?
- U: no no I'm traveling to Birmingham [THE TRAVELING TO BERLIN P_M]
- S: traveling in the afternoon. Okay what day would you be departing chicago
- U: /uh/ the tenth of august [AT THE TENTH OF AUGUST]
- S: a flight tomorrow I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
- U: my destination is Birmingham [FLIGHT DESTINATION MR WEEKEND]
- S: traveling on Saturday, August 12th I have a flight departing Chicago at 1:40pm arrives Seoul at
4non- and misunderstandings
- S: What city are you leaving from?
- U: Urbana Champaign [OKAY IN THAT SAME PAY]
- S: Sorry I'm not sure I understood what you said. What city are you leaving from?
- U: Urbana Champaign [FOR MINUTE SINCE HEY]
- S: I'm still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
- U: Chicago [CHICAGO]
- S: traveling from Chicago. Where would you like to go?
- U: Huntsville [SEOUL]
- S: traveling to Seoul. What day did you need to travel?
- U: no no I'm traveling to Birmingham [THE TRAVELING TO BERLIN P_M]
- S: traveling in the afternoon. Okay what day would you be departing chicago
- U: /uh/ the tenth of august [AT THE TENTH OF AUGUST]
- S: a flight tomorrow I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
- U: my destination is Birmingham [FLIGHT DESTINATION MR WEEKEND]
- S: traveling on Saturday, August 12th I have a flight departing Chicago at 1:40pm arrives Seoul at
5approaches for increasing robustness
- gracefully handle errors through interaction
- detect the problems
- develop a set of recovery strategies
- know how to choose between them (policy)
6six not-so-easy pieces
7belief updating
misunderstandings
- construct more accurate beliefs by integrating
information over multiple turns
detection
S: Where would you like to go?
U: Huntsville [SEOUL / 0.65]
destination: seoul/0.65
S: traveling to Seoul. What day did you need to travel?
U: no no I'm traveling to Birmingham [THE TRAVELING TO BERLIN P_M / 0.60]
destination: ?
8belief updating problem statement
- given
- an initial belief $P_{\text{initial}}(C)$ over concept C
- a system action SA
- a user response R
- construct an updated belief
- $P_{\text{updated}}(C) \leftarrow f(P_{\text{initial}}(C), SA, R)$
destination: seoul/0.65
S: traveling to Seoul. What day did you need to travel?
U: [THE TRAVELING TO BERLIN P_M / 0.60]
destination: ?
9outline
- related work
- a restricted version
- data
- user response analysis
- experiments and results
- some caveats and future work
related work restricted version data user
response analysis experiment results
caveats future work
10confidence annotation heuristic updates
- confidence annotation
- traditionally focused on word-level errors [Chase, Cox, Bansal, Ravishankar]
- more recently: semantic confidence annotation [Walker, San-Segundo, Bohus]
- machine learning approach
- results fairly good, but not perfect
- heuristic updates
- explicit confirmation: no → don't trust; yes → trust
- implicit confirmation: no → don't trust; otherwise → trust
- suboptimal for several reasons
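The heuristic rules above can be written down as a small function. This is only a sketch under stated assumptions: the function name, the string labels, and the use of 0.0 / 1.0 for "don't trust" / "trust" are illustrative, and the slide does not say what happens after an explicit confirmation when the user answers neither yes nor no, so the sketch leaves the belief unchanged in that case.

```python
def heuristic_update(initial_conf: float, action: str, response: str) -> float:
    """Heuristic belief update after a confirmation action.

    action   -- "explicit" or "implicit" (hypothetical labels)
    response -- "yes", "no", or "other" (hypothetical labels)
    """
    if action not in ("explicit", "implicit"):
        raise ValueError(f"unknown action: {action}")
    if response == "no":
        return 0.0              # no -> don't trust (both confirmation types)
    if action == "implicit":
        return 1.0              # anything other than "no" -> trust
    if response == "yes":
        return 1.0              # explicit confirmation, yes -> trust
    return initial_conf         # assumption: otherwise keep the initial belief


# Example: the system implicitly confirmed "Seoul" and the user did not say "no".
updated = heuristic_update(0.65, "implicit", "other")
```

In the Seoul example the rule ends up trusting a wrong value, which is one way such hard rules can be suboptimal.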
11correction detection
- detect if the user is trying to correct the system [Litman, Swerts, Hirschberg, Krahmer, Levow]
- machine learning approach
- features from different knowledge sources in the system
- results fairly good, but not perfect
12integration
- confidence annotation and correction detection are useful tools
- but separately, neither solves the problem
- bridge together in a unified approach to accurately track beliefs
13outline
- related work
- a restricted version
- data
- user response analysis
- experiments and results
- some caveats and future work
14belief updating general form
- given
- an initial belief $P_{\text{initial}}(C)$ over concept C
- a system action SA
- a user response R
- construct an updated belief
- $P_{\text{updated}}(C) \leftarrow f(P_{\text{initial}}(C), SA, R)$
15restricted version 2 simplifications
- compact belief
- system unlikely to hear more than 3 or 4 values
- single vs. multiple recognition results
- in our data: max 3 values, only 6.9% have >1 value
- confidence score of top hypothesis
- updates after confirmation actions
- reduced problem
- $\mathrm{ConfTop}_{\text{updated}}(C) \leftarrow f(\mathrm{ConfTop}_{\text{initial}}(C), SA, R)$
16outline
- related work
- a restricted version
- data
- user response analysis
- experiments and results
- some caveats and future work
17data
- collected with RoomLine
- a phone-based mixed-initiative spoken dialog system
- conference room reservation
- search and negotiation
- explicit and implicit confirmations
- confidence threshold model (+ some exploration)
- unplanned implicit confirmations
- I found 10 rooms for Friday between 1 and 3 p.m. Would you like a small room or a large one?
18corpus
- user study
- 46 participants (naïve users)
- 10 scenario-based interactions each
- compensated per task success
- corpus
- 449 sessions, 8848 user turns
- orthographically transcribed
- rich annotation: correct concepts, corrections, etc.
19outline
- related work
- a restricted version
- data
- user response analysis
- experiments and results
- some caveats and future work
20user response types
- following Krahmer and Swerts
- study on Dutch train-table information system
- 3 user response types
- YES: yes, right, that's right, correct, etc.
- NO: no, wrong, etc.
- OTHER
- cross-tabulated against correctness of confirmations
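A cross-tabulation of this kind can be sketched in a few lines. The keyword sets and the exact-match rule below are assumptions made for illustration (the slide only gives example words followed by "etc."), and the function names are hypothetical, not taken from the study.

```python
from collections import Counter

# Assumed keyword lists; the slide lists these examples followed by "etc."
YES_WORDS = {"yes", "right", "that's right", "correct"}
NO_WORDS = {"no", "wrong"}


def response_type(transcript: str) -> str:
    """Map a transcribed user turn to YES, NO, or OTHER (exact match only)."""
    text = transcript.strip().lower()
    if text in YES_WORDS:
        return "YES"
    if text in NO_WORDS:
        return "NO"
    return "OTHER"


def cross_tabulate(turns):
    """turns: iterable of (confirmed_value_was_correct, user_transcript)."""
    counts = Counter()
    for was_correct, transcript in turns:
        row = "CORRECT" if was_correct else "INCORRECT"
        counts[(row, response_type(transcript))] += 1
    return counts


def row_percentages(counts, row):
    """Percentage of each response type within one row of the table."""
    total = sum(n for (r, _), n in counts.items() if r == row)
    if total == 0:
        return {}
    return {col: 100.0 * counts[(row, col)] / total
            for col in ("YES", "NO", "OTHER")}
```

Each row of the resulting table is normalized separately, so the percentages in a CORRECT or INCORRECT row sum to 100.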
21user responses to explicit confirmations
- from transcripts
- numbers in brackets from Krahmer & Swerts
            YES          NO           Other
CORRECT     94% [93%]    0% [0%]      5% [7%]
INCORRECT   1% [6%]      72% [57%]    27% [37%]
- from decoded
            YES    NO     Other
CORRECT     87%    1%     12%
INCORRECT   1%     61%    38%
22other responses to explicit confirmations
- 70%: users repeat the correct value
- 15%: users don't address the question
- attempt to shift conversation focus
            User does not correct    User corrects
CORRECT     1159                     0
INCORRECT   29 (10% of incorrect)    250 (90% of incorrect)
23user responses to implicit confirmations
- Transcripts
- numbers in brackets from Krahmer & Swerts
            YES         NO           Other
CORRECT     30% [0%]    7% [0%]      63% [100%]
INCORRECT   6% [0%]     33% [15%]    61% [85%]
- Decoded
            YES    NO     Other
CORRECT     28%    5%     67%
INCORRECT   7%     27%    66%
24ignoring errors in implicit confirmations
            User does not correct     User corrects
CORRECT     552                       2
INCORRECT   118 (51% of incorrect)    111 (49% of incorrect)
- users correct later (40% of 118)
- users interact strategically
- correct only if essential
                does not correct later    corrects later
not critical    55                        2
critical        14                        47
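The arithmetic behind these counts can be checked with a short script. The counts are copied from the slide; the reading of the second table is an assumption, since the negation marks were lost in extraction: rows are taken to be "error not critical" / "error critical" and columns "does not correct later" / "corrects later". Variable names are illustrative.

```python
# Implicit confirmations of an INCORRECT value (counts from the slide).
not_corrected, corrected = 118, 111
incorrect_total = not_corrected + corrected

# Assumed layout of the 2x2 table covering the 118 uncorrected errors.
later = {
    "not critical": {"does not correct later": 55, "corrects later": 2},
    "critical": {"does not correct later": 14, "corrects later": 47},
}


def pct(part, whole):
    """Percentage of `part` in `whole`."""
    return 100.0 * part / whole


# Share of incorrect implicit confirmations the user lets pass at first.
share_ignored = pct(not_corrected, incorrect_total)

# Share of those that are corrected at some later point in the dialog.
corrected_later = sum(row["corrects later"] for row in later.values())
share_later = pct(corrected_later, not_corrected)

# Later-correction rate, split by whether the error was critical.
later_rate = {name: pct(row["corrects later"], sum(row.values()))
              for name, row in later.items()}
```

Under this reading, about 77% of the critical errors (47 of 61) are corrected later, against about 4% of the non-critical ones (2 of 57), which fits the "correct only if essential" observation.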
25outline
- related work
- a restricted version
- data
- user response analysis
- experiments and results
- some caveats and future work
26machine learning approach
- need good probability outputs
- low cross-entropy between model predictions and reality
- cross-entropy = negative average log posterior
- logistic regression
- sample efficient
- stepwise approach → feature selection
- logistic model tree for each action
- root splits on response-type
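As a rough illustration of the two ingredients named above, the sketch below shows a logistic regression prediction and the cross-entropy measure (negative average log posterior of what actually happened). The function names, the feature vector layout, and the clipping constant are assumptions for this example; it does not reproduce the stepwise feature selection or the model tree.

```python
import math


def logistic(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, bias, features) -> float:
    """Probability that the top hypothesis is correct, under a linear model."""
    return logistic(bias + sum(w * x for w, x in zip(weights, features)))


def cross_entropy(predictions, outcomes, eps=1e-12) -> float:
    """Negative average log probability assigned to the true outcome.

    predictions -- model probabilities that the value is correct
    outcomes    -- True where the value really was correct
    """
    total = 0.0
    for p, correct in zip(predictions, outcomes):
        p = min(max(p, eps), 1.0 - eps)   # avoid log(0)
        total += math.log(p if correct else 1.0 - p)
    return -total / len(predictions)
```

A model that always outputs 0.5 scores log 2 (about 0.693) on any data set; lower values mean the predicted probabilities track reality more closely.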
27features. target.
- initial situation
- initial confidence score
- concept identity, dialog state, turn number
- system action
- other actions performed in parallel
- features of the user response
- acoustic / prosodic features
- lexical features
- grammatical features
- dialog-level features
- target: was the value correct?
28baselines
- initial baseline
- accuracy of system beliefs before the update
- heuristic baseline
- accuracy of heuristic rule currently used in the system
- oracle baseline
- accuracy if we knew exactly when the user is correcting the system
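To make the initial and oracle baselines concrete, here is a minimal sketch on made-up toy data (not from the corpus). It assumes that a belief counts as "trusted" at or above 0.5, that the error measure is the fraction of wrong trust decisions, and that the oracle trusts a value exactly when the user is not correcting it; all names are hypothetical.

```python
def hard_error(beliefs, outcomes, threshold=0.5):
    """Fraction of cases where thresholding the belief gives the wrong answer."""
    wrong = sum((b >= threshold) != bool(o) for b, o in zip(beliefs, outcomes))
    return wrong / len(outcomes)


def oracle_update(user_is_correcting: bool) -> float:
    """Assumed oracle rule: distrust the value exactly when the user corrects it."""
    return 0.0 if user_is_correcting else 1.0


# Toy, made-up turns: (initial confidence, user is correcting, value is correct)
toy = [
    (0.90, False, True),
    (0.65, True, False),
    (0.40, False, True),
    (0.70, False, False),   # wrong value that the user lets pass
]

truth = [correct for _, _, correct in toy]
initial_error = hard_error([conf for conf, _, _ in toy], truth)
oracle_error = hard_error([oracle_update(c) for _, c, _ in toy], truth)
```

On this toy set the oracle still makes one mistake, because the user does not correct every wrong value.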
29results explicit confirmation
[Figure: hard error (%) and soft error]
30results implicit confirmation
[Figure: hard error (%) and soft error]
31results unplanned implicit confirmation
[Figure: hard error (%) and soft error]
32informative features
- initial confidence score
- prosody features
- barge-in
- expectation match
- repeated grammar slots
- concept id
33outline
- related work
- a reduced version. approach
- data
- user response analysis
- experiments and results
- some caveats and future work
34eliminate simplification 1
- current restricted version
- belief = confidence score of top hypothesis
- only 6.9% of cases had more than 1 hypothesis
- extend to
- N hypotheses + 1 (other), where N is a small integer (2 or 3)
- approach: multinomial generalized linear model
- use information from multiple recognition hypotheses
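One way to picture the proposed extension is a softmax over linear scores, one score per hypothesis plus one for the "other" class. This is a sketch of a generic multinomial logistic model, not the model from the talk; the function names, the shared weight vector, and the fixed score for "other" are all assumptions.

```python
import math


def softmax(scores):
    """Turn arbitrary real scores into probabilities that sum to 1."""
    m = max(scores)                      # subtract the max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear_score(weights, features):
    """Score of one hypothesis from its feature vector."""
    return sum(w * x for w, x in zip(weights, features))


def update_belief(hypotheses, feature_vectors, weights, other_score=0.0):
    """Belief over N hypotheses plus a catch-all "other" entry.

    hypotheses      -- candidate values, e.g. ["seoul", "berlin"]
    feature_vectors -- one feature list per hypothesis (hypothetical features)
    weights         -- weight vector shared by all hypotheses (assumption)
    """
    scores = [linear_score(weights, f) for f in feature_vectors]
    probs = softmax(scores + [other_score])
    return dict(zip(list(hypotheses) + ["other"], probs))


# Example with made-up features: [initial confidence, heard again this turn]
belief = update_belief(["seoul", "berlin"], [[0.65, 0.0], [0.60, 1.0]], [2.0, 1.5])
```

Keeping an explicit "other" entry lets the belief put probability on a value the recognizer has not produced yet.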
35eliminate simplification 2
- current restricted version
- only updates following system confirmation actions
- users might correct the system at any point
- extend to
- updates after all system actions
36shameless self promotion
- rejection threshold adaptation
- non-understanding impact on performance [Interspeech-05]
- comparative analysis of 10 recovery strategies [SIGdial-05]
- wizard experiment
- towards learning non-understanding recovery policies [SIGdial-05]
37shameless CMU promotion
- Ananlada (Moss) Chotimongkol
- automatic concept and task structure acquisition
- Antoine Raux
- turn-taking, conversation micro-management
- Jahanzeb Sherwani
- multimodal personal information management
- Satanjeev Banerjee
- meeting understanding
- Stefanie Tomko
- universal speech interface
- Thomas Harris
- multi-participant dialog
- DoD / Young Researchers Roundtable
38thankyou!
39a more subtle caveat
- distribution of training data
- confidence annotator + heuristic update rules
- distribution of run-time data
- confidence annotator + learned model
- always a problem when interacting with the world
- hopefully, distribution shift will not cause large degradation in performance
- remains to validate empirically
- maybe a bootstrap approach?