Title: belief updating in spoken dialog systems
1. belief updating in spoken dialog systems
- Dan Bohus
- Computer Science Department
- Carnegie Mellon University
- Pittsburgh, PA 15213
acknowledgements: Alex Rudnicky (CMU), Eric Horvitz, Tim Paek (MSR), Antoine Raux
2. spoken dialog systems
- use natural language processing technology
- engage in a goal-oriented conversation
- research community: information access, command-and-control, personal assistants, taskable agents, tutoring systems
- industry: simpler automated phone systems
3. the problem
- misunderstandings lead to interaction breakdowns
4. more concretely
(each user turn shows what the user said, then in brackets what the recognizer heard)
- S: What city are you leaving from?
- U: Urbana Champaign [OKAY IN THAT same PAY]
- S: Sorry, I'm not sure I understood what you said. What city are you leaving from?
- U: Urbana Champaign [for minute since HEY]
- S: I'm still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
- U: Chicago [CHICAGO]
- S: traveling from Chicago. Where would you like to go?
- U: Huntsville [SEOUL]
- S: traveling to Seoul. What day did you need to travel?
- U: no no I'm traveling to Birmingham [THE TRAVELING to berlin P_M]
- S: traveling in the afternoon. Okay what day would you be departing chicago
- U: /uh/ the tenth of august [AT THE TENTH OF AUGUST]
- S: a flight tomorrow... I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
- U: my destination is Birmingham [flight destination mr WEEKEND]
- S: traveling on Saturday, August 12th... I have a flight departing Chicago at 1:40pm arrives Seoul at
5. understanding errors
- stem from the speech recognition process
- difficult operating conditions
- typical word-error-rates: 20-30%, up to 50% for non-natives
- strong negative impact on interactions
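The word-error-rate figures above can be made concrete with a small sketch. It assumes the usual definition, word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words; the function name `word_error_rate` is illustrative and is not taken from the talk.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One inserted word against four reference words.
    print(word_error_rate("the tenth of august", "at the tenth of august"))
```

Because insertions count as errors, the rate can exceed 100% when the recognizer outputs more wrong words than the user actually spoke.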
6. pathways to increased robustness
- gracefully handle errors through conversation
- detect the problems
- develop a set of recovery strategies
- know how to choose between them (policy)
7. guarding against misunderstandings
(user turns show only what the recognizer heard)
- S: What city are you leaving from?
- U: for minute since HEY
- S: I'm still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
- U: CHICAGO
- S: traveling from Chicago. Where would you like to go?
- U: SEOUL
- S: traveling to Seoul. What day did you need to travel?
- U: THE TRAVELING to berlin P_M
- S: traveling in the afternoon. Okay what day would you be departing chicago
- U: AT THE TENTH OF AUGUST
- S: a flight tomorrow... I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
- U: flight destination mr WEEKEND
8. guarding against misunderstandings
(confidence score shown after each recognized user turn)
- S: What city are you leaving from?
- U: for minute since HEY / 0.07
- S: I'm still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
- U: CHICAGO / 0.72
- S: traveling from Chicago. Where would you like to go?
- U: SEOUL / 0.65
arrival = Seoul / 0.65
- S: traveling to Seoul. What day did you need to travel?
- U: THE TRAVELING to berlin P_M / 0.35
- S: traveling in the afternoon. Okay what day would you be departing chicago
- U: AT THE TENTH OF AUGUST / 0.58
- S: a flight tomorrow... I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
- U: flight destination mr WEEKEND / 0.28
- confirmation actions
- reject
- explicit confirmation: Did you say Seoul?
- implicit confirmation: traveling to Seoul. What day did you need to travel?
- accept
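One common way to connect confidence scores to the confirmation actions listed on this slide is a set of thresholds. The sketch below assumes that scheme; the threshold values and the name `choose_confirmation_action` are hypothetical and are not taken from the talk.

```python
# Hypothetical thresholds, chosen only for illustration.
REJECT_BELOW = 0.2
EXPLICIT_BELOW = 0.5
IMPLICIT_BELOW = 0.8


def choose_confirmation_action(confidence: float) -> str:
    """Map a recognition confidence score in [0, 1] to a confirmation action."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    if confidence < REJECT_BELOW:
        return "reject"
    if confidence < EXPLICIT_BELOW:
        return "explicit_confirm"
    if confidence < IMPLICIT_BELOW:
        return "implicit_confirm"
    return "accept"


if __name__ == "__main__":
    for score in (0.07, 0.35, 0.65, 0.95):
        print(score, choose_confirmation_action(score))
```

A fixed threshold policy of this kind looks only at the current score, which is the limitation that belief updating across turns is meant to address.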
9. belief updating
arrival = Seoul / 0.65
- S: traveling to Seoul. What day did you need to travel?
- U: THE TRAVELING to berlin P_M / 0.35
arrival = ?
10. belief updating problem statement
- given
- an initial belief B_initial(C) over concept C
- a system action SA(C)
- a user response R
- construct an updated belief
- B_updated(C) ← f(B_initial(C), SA(C), R)
- S: traveling to Seoul. What day did you need to travel?
- U: THE TRAVELING to berlin P_M
11. outline
- related work
- proposed approach
- data
- experiments and results
- effects on global performance
- conclusion and future work
related work proposed approach data
experiments and results global performance
conclusion
12. detecting misunderstandings and corrections
- confidence annotation
- word-level: Cox, Chase, Bansal, Ravishankar, etc.
- semantic confidence annotation: Walker, San-Segundo, Bohus, etc.
- correction detection: Litman, Swerts, Hirschberg, Krahmer, Levow
- detect when the user corrects the system
arrival = Seoul / 0.65
- S: traveling to Seoul. What day did you need to travel?
- U: THE TRAVELING to berlin P_M (Conf = 0.35, Corr = 0.47)
arrival = ?
13. current solutions for tracking beliefs
- most systems only track single values
- new values overwrite old values
- use simple heuristic rules
- explicit confirmation
- S: did you say you wanted to fly to Seoul?
- yes → trust hypothesis
- no → delete hypothesis
- other → non-understanding
- implicit confirmation
- S: traveling to Seoul. What day did you need to travel?
- rely on new values overwriting old values
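The heuristic rules on this slide can be sketched in a few lines. This is a minimal illustration that assumes a belief holding one value plus a "trusted" flag; the type `SingleValueBelief`, the function `heuristic_update`, and the action labels are hypothetical names, not the systems' actual code.

```python
from typing import NamedTuple, Optional


class SingleValueBelief(NamedTuple):
    value: Optional[str]  # the single hypothesis the system tracks
    trusted: bool         # True once the user has confirmed it


def heuristic_update(belief: SingleValueBelief, system_action: str,
                     user_said: str,
                     new_value: Optional[str] = None) -> SingleValueBelief:
    """Apply the simple rules: yes/no for explicit, overwrite for implicit."""
    if system_action == "explicit_confirm":
        if user_said == "yes":
            return SingleValueBelief(belief.value, True)   # trust hypothesis
        if user_said == "no":
            return SingleValueBelief(None, False)          # delete hypothesis
        return belief  # anything else is treated as a non-understanding
    if system_action == "implicit_confirm":
        if new_value is not None:
            return SingleValueBelief(new_value, False)     # new overwrites old
        return belief
    raise ValueError(f"unknown system action: {system_action}")


if __name__ == "__main__":
    seoul = SingleValueBelief("Seoul", False)
    print(heuristic_update(seoul, "explicit_confirm", "no"))
    print(heuristic_update(seoul, "implicit_confirm", "other", "Birmingham"))
```

Note that the rules never use the confidence of the user's response, so a misrecognized "yes" or a spurious new value changes the belief just as readily as a correct one.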
15. belief updating problem statement
arrival = Seoul / 0.65
- S: traveling to Seoul. What day did you need to travel?
- U: THE TRAVELING to berlin P_M / 0.35
f → arrival = ?
- given
- an initial belief B_initial(C) over concept C
- a system action SA(C)
- a user response R
- construct an updated belief
- B_updated(C) ← f(B_initial(C), SA(C), R)
16. belief representation
B_updated(C) ← f(B_initial(C), SA(C), R)
[figure: departure concept]
- most accurate representation
- probability distribution over the set of possible values
- however
- system hears only a small number of conflicting values for a concept throughout a session
- max 3 conflicting values heard
- only in 7% of cases, more than 1 value heard
17. belief representation
B_updated(C) ← f(B_initial(C), SA(C), R)
- compressed belief representation
- k hypotheses + other
- dynamically add and drop hypotheses
- remember m hypotheses, add n new ones (m + n = k)
- S: flying from Aspen... what is your destination?
- U: NO NO I DIDNT THAT THAT
- B(C) is a multinomial variable of degree k + 1
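The compressed representation can be sketched as follows. This is an illustration under assumptions: the helper names `compress_belief` and `candidate_hypotheses`, the dictionary layout, and the rule of ranking hypotheses by probability are hypothetical choices, not the talk's implementation.

```python
from typing import Dict, List

OTHER = "other"


def compress_belief(belief: Dict[str, float], k: int) -> Dict[str, float]:
    """Keep the k most probable values and pool the remaining mass in 'other'."""
    total = sum(belief.values())
    if total <= 0:
        raise ValueError("belief must carry positive probability mass")
    ranked = sorted(belief.items(), key=lambda item: item[1], reverse=True)
    compressed = {value: prob / total for value, prob in ranked[:k]}
    compressed[OTHER] = max(0.0, 1.0 - sum(compressed.values()))
    return compressed  # at most k + 1 outcomes


def candidate_hypotheses(current: Dict[str, float], heard: List[str],
                         m: int, n: int) -> List[str]:
    """Remember the m best current hypotheses and add up to n newly heard ones."""
    ranked = sorted((item for item in current.items() if item[0] != OTHER),
                    key=lambda item: item[1], reverse=True)
    remembered = [value for value, _ in ranked[:m]]
    added = [value for value in heard if value not in remembered][:n]
    return remembered + added  # at most k = m + n hypotheses


if __name__ == "__main__":
    print(compress_belief({"Seoul": 0.6, "Berlin": 0.3, "Boston": 0.1}, k=2))
    print(candidate_hypotheses({"Seoul": 0.65, OTHER: 0.35}, ["Berlin"], m=1, n=1))
```

Pooling everything outside the top k into a single "other" outcome is what keeps the updated belief a multinomial variable with k + 1 outcomes, regardless of how many values the concept can take.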
18. system action
B_updated(C) ← f(B_initial(C), SA(C), R)
19. user response
B_updated(C) ← f(B_initial(C), SA(C), R)
20. approach
B_updated(C) ← f(B_initial(C), SA(C), R)
- multinomial regression problem
- multinomial generalized linear model
- sample efficient
- stepwise approach
- feature selection
- BIC to control over-fitting
- one separate model for each system action
- B_updated(C) ← f_SA(C)(B_initial(C), R)
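The modeling recipe on this slide can be sketched in plain Python. The sketch assumes a softmax (multinomial logistic) link and a greedy forward selection scored by BIC; the names `predict_belief`, `bic`, and `forward_stepwise`, and the `fit` callback that stands in for model training, are hypothetical and not the talk's implementation.

```python
import math
from typing import Callable, List, Sequence, Tuple


def predict_belief(weights: Sequence[Sequence[float]],
                   features: Sequence[float]) -> List[float]:
    """Multinomial logistic model: one weight row per outcome (k hyps + other)."""
    scores = [sum(w * x for w, x in zip(row, features)) for row in weights]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def bic(log_likelihood: float, n_params: int, n_samples: int) -> float:
    """Bayesian information criterion; lower is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_samples)


def forward_stepwise(candidates: Sequence[str],
                     fit: Callable[[List[str]], Tuple[float, int]],
                     n_samples: int) -> List[str]:
    """Greedily add the feature that lowers BIC most; stop when none does.

    `fit(features)` is an assumed training routine that returns
    (log_likelihood, n_params) for a model using those features.
    """
    selected: List[str] = []
    best = bic(*fit(selected), n_samples)
    improved = True
    while improved:
        improved = False
        for feature in [c for c in candidates if c not in selected]:
            score = bic(*fit(selected + [feature]), n_samples)
            if score < best:
                best, choice, improved = score, feature, True
        if improved:
            selected.append(choice)
    return selected
```

Under the "one separate model for each system action" design, such a selection would be run once per action (for example once for explicit and once for implicit confirmation), each on the turns that follow that action.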
22. data
- collected with RoomLine
- a phone-based mixed-initiative spoken dialog system
- conference room reservation
- explicit and implicit confirmations
- simple heuristic rules for belief updating
- explicit confirm: yes / no
- implicit confirm: new values overwrite old ones
23. corpus
- user study
- 46 participants (first-time users)
- 10 scenario-based interactions each
- corpus
- 449 sessions, 8848 user turns
- orthographically transcribed
- manually annotated
- misunderstandings
- corrections
- correct concept values
25. models
- k=2 + other (m=1, n=1)
- k=3 + other (m=2, n=1)
- k=4 + other (m=3, n=1)
- full model
- all features
- basic model
- all features except priors and confusability
- runtime model
- all features available at runtime
26. baselines
- initial baseline
- accuracy of system beliefs before the update
- heuristic baseline
- accuracy of heuristic update rule used by the system
- correction baseline
- accuracy if we knew exactly when the user corrects the system
27. results for k=2 hyps + other
[figure: results for explicit confirm, comparing initial baseline (i), heuristic baseline (h), basic model (BM), full model (FM), runtime model (RM), correction baseline (c)]
28. a question remains
30. a new user study
- implemented models in RavenClaw
- 40 participants, first-time, non-native users
- improvements more likely at high word-error-rates
- 10 scenario-driven interactions each
- between-subjects: 2 gender-balanced groups
- control: RoomLine using heuristic update rules
- treatment: RoomLine using runtime models
31. effect on task success
- logistic ANOVA on task success
p = 0.009
logit(TaskSuccess) ← 2.09 - 0.05·WER + 0.69·Condition
[figure: probability of task success (0-100) vs. word error rate (0-100)]
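To show how the fitted model on this slide turns into a probability, here is a small sketch. It reads the model as logit = 2.09 - 0.05 WER + 0.69 Condition; the positive sign on the Condition term, the coding of Condition as 0 for control and 1 for treatment, and WER being in percentage points are all assumptions, as is the function name `p_task_success`.

```python
import math

INTERCEPT = 2.09
WER_COEF = -0.05        # per percentage point of word error rate (assumed unit)
CONDITION_COEF = 0.69   # sign and 0/1 coding are assumptions


def p_task_success(wer_percent: float, treatment: bool) -> float:
    """Inverse-logit of the linear predictor read off the slide."""
    logit = (INTERCEPT
             + WER_COEF * wer_percent
             + CONDITION_COEF * (1.0 if treatment else 0.0))
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    for wer in (0, 20, 40, 60):
        print(wer,
              round(p_task_success(wer, treatment=False), 3),
              round(p_task_success(wer, treatment=True), 3))
```

Under these assumptions the two curves share the same shape and the treatment curve is shifted toward higher word error rates, which matches the slide's plot of task success against WER.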
32. how about efficiency?
- ANOVA on task duration for successful tasks
- Duration ← -0.21 + 0.013·WER - 0.106·Condition
- significant improvement
- equivalent to 7.9% absolute reduction in word-error-rate
p = 0.0003
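The "equivalent reduction in word-error" figure can be approximated from the coefficients printed on this slide. The sketch assumes the model is linear in WER and Condition, so the equivalence is the ratio of the two coefficients; the function name `wer_equivalent` is hypothetical.

```python
WER_COEF = 0.013         # duration change per point of word error rate
CONDITION_COEF = -0.106  # duration change from control to treatment


def wer_equivalent(condition_coef: float, wer_coef: float) -> float:
    """WER change that shifts the prediction as much as switching condition."""
    if wer_coef == 0:
        raise ValueError("wer_coef must be non-zero")
    return condition_coef / wer_coef


if __name__ == "__main__":
    print(round(wer_equivalent(CONDITION_COEF, WER_COEF), 2))
```

With the rounded coefficients the ratio comes out at roughly 8.2 points rather than 7.9; the slide's figure was presumably computed from unrounded coefficients.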
34. summary
- U: CHICAGO / 0.72
- S: traveling from Chicago. Where would you like to go?
- U: SEOUL / 0.65
arrival = Seoul / 0.65
- S: traveling to Seoul. What day did you need to travel?
- U: THE TRAVELING to berlin P_M / 0.35
arrival = ?
- S: traveling in the afternoon. Okay what day would you be departing chicago
- approach for constructing accurate beliefs
- integrate information across multiple turns
- large gains in task success and efficiency
35. other advantages
- learns from data
- tuned to the domain in which it operates
- sample efficient / scalable
- performs a local one-turn optimization
- works independently on concepts
- portable
- decoupled from dialog task specification
- no strong assumptions about dialog management
36. future work
- integrate information from n-best list
- integrate other high-level knowledge
- domain-specific constraints
- inter-concept dependencies
- unsupervised / implicit learning
- domain-specificity
37. thank you! questions
38. improvements at different WER
[figure: absolute improvement in task success vs. word-error-rate]
39. user study
- 10 scenarios, fixed order
- presented graphically (explained during briefing)
- participants compensated per task success
40. informative features
- priors and confusability
- initial confidence scores
- concept identity
- barge-in
- expectation match
- repeated grammar slots