Title: k hypotheses + other: belief updating in spoken dialog systems
1k hypotheses + other: belief updating in spoken dialog systems
- Dialogs on Dialogs Talk, March 2006
- Dan Bohus, Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213
- www.cs.cmu.edu/dbohus
- dbohus_at_cs.cmu.edu
2problem
- spoken language interfaces lack robustness when
faced with understanding errors
- errors stem mostly from speech recognition
- typical word error rates: 20-30%
- significant negative impact on interactions
3guarding against understanding errors
- use confidence scores
- machine learning approaches for detecting misunderstandings [Walker, Litman, San-Segundo, Wright, and others]
- engage in confirmation actions
- explicit confirmation
- did you say you wanted to fly to Seoul?
- yes → trust hypothesis
- no → delete hypothesis
- other → non-understanding
- implicit confirmation
- traveling to Seoul ... what day did you need to travel?
- rely on new values overwriting old values
related work data user response analysis
proposed approach experiments and results
conclusion
4today's talk
- construct accurate beliefs by integrating information over multiple turns in a conversation
S: Where would you like to go?
U: Huntsville [SEOUL / 0.65]
destination = {seoul/0.65}
S: traveling to Seoul. What day did you need to travel?
U: no no I'm traveling to Birmingham [THE TRAVELING TO BERLIN P_M / 0.60]
destination = {?}
5belief updating problem statement
- given
- an initial belief B_initial(C) over concept C
- a system action SA
- a user response R
- construct an updated belief
- B_updated(C) ← f(B_initial(C), SA, R)
destination = {seoul/0.65}
S: traveling to Seoul. What day did you need to travel?
U: [THE TRAVELING TO BERLIN P_M / 0.60]
destination = {?}
6outline
- proposed approach
- data
- experiments and results
- effect on dialog performance
- conclusion
proposed approach data experiments and results
effect on dialog performance conclusion
7belief updating problem statement
destination = {seoul/0.65}
S: traveling to Seoul. What day did you need to travel?
U: [THE TRAVELING TO BERLIN P_M / 0.60]
destination = {?}
- given
- an initial belief B_initial(C) over concept C
- a system action SA(C)
- a user response R
- construct an updated belief
- B_updated(C) ← f(B_initial(C), SA(C), R)
8belief representation
B_updated(C) ← f(B_initial(C), SA(C), R)
- most accurate representation
- probability distribution over the set of possible values
- however
- system will hear only a small number of conflicting values for a concept within a dialog session
- in our data
- max 3 (conflicting values heard)
- only in 6.9% of cases, more than 1 value heard
9belief representation
B_updated(C) ← f(B_initial(C), SA(C), R)
- compressed belief representation
- k hypotheses + other
- at each turn, the system retains the top m initial hypotheses and adds n new hypotheses from the input (m + n = k)
10belief representation
B_updated(C) ← f(B_initial(C), SA(C), R)
- B(C) modeled as a multinomial variable
- {h1, h2, ..., hk, other}
- B(C) = <c_h1, c_h2, ..., c_hk, c_other>
- where c_h1 + c_h2 + ... + c_hk + c_other = 1
- belief updating can be cast as multinomial regression problem
- B_updated(C) ← B_initial(C) + SA(C) + R
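The compressed representation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`ranked`, `select_hypotheses`, `as_belief`) are hypothetical, beliefs are held as plain dictionaries, and "other" simply absorbs whatever probability mass the tracked hypotheses leave over. The confidences that follow an update would come from the regression model, not from this code.

```python
from typing import Dict, List

OTHER = "other"  # label for the lumped "everything else" bucket (assumed name)


def ranked(conf: Dict[str, float]) -> List[str]:
    """Hypothesis names sorted by decreasing confidence."""
    return sorted(conf, key=lambda h: conf[h], reverse=True)


def select_hypotheses(initial: Dict[str, float], heard: Dict[str, float],
                      m: int, n: int) -> List[str]:
    """Keep the top-m initial hypotheses and add up to n new ones (m + n = k)."""
    kept = [h for h in ranked(initial) if h != OTHER][:m]
    new = [h for h in ranked(heard) if h not in kept and h != OTHER][:n]
    return kept + new


def as_belief(confidences: Dict[str, float], hypotheses: List[str]) -> Dict[str, float]:
    """Build <c_h1, ..., c_hk, c_other>; the components sum to 1."""
    belief = {h: confidences.get(h, 0.0) for h in hypotheses}
    remainder = 1.0 - sum(belief.values())
    if remainder < 0:
        raise ValueError("confidences sum to more than 1")
    belief[OTHER] = remainder
    return belief


# Example from the talk: the system believes "seoul", then hears "berlin".
initial = as_belief({"seoul": 0.65}, ["seoul"])
tracked = select_hypotheses(initial, {"berlin": 0.60}, m=1, n=1)
print(initial)
print(tracked)
```

With m = 1 and n = 1 the system tracks two named hypotheses plus "other", which corresponds to the k = 2 case.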
11system action
B_updated(C) ← f(B_initial(C), SA(C), R)
12user response
B_updated(C) ← f(B_initial(C), SA(C), R)
13approach
B_updated(C) ← f(B_initial(C), SA(C), R)
- problem
- <uc_h1, ..., uc_hk, uc_oth> ← f(<ic_h1, ..., ic_hk, ic_oth>, SA(C), R)
- approach: multinomial generalized linear model
- regression model, multinomial independent variable
- sample efficient
- stepwise approach
- feature selection
- BIC to control over-fitting
- one model for each system action
- <uc_h1, ..., uc_hk, uc_oth> ← f_SA(C)(<ic_h1, ..., ic_hk, ic_oth>, R)
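To make the "one model for each system action" idea concrete, here is a minimal sketch that uses multinomial logistic regression (a softmax over linear scores), which is one common instance of a multinomial generalized linear model. Everything specific in it is an assumption for illustration: the feature layout, the weight values, and the names `MODELS` and `update_belief` are hypothetical, and the talk does not state which link function or features the trained models actually use.

```python
import math
from typing import Dict, List


def softmax(scores: List[float]) -> List[float]:
    """Turn arbitrary scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def update_belief(features: List[float],
                  weights: Dict[str, List[float]]) -> Dict[str, float]:
    """Score each outcome linearly, then normalize: <uc_h1, ..., uc_hk, uc_oth>."""
    outcomes = list(weights)
    scores = [sum(w * x for w, x in zip(weights[o], features)) for o in outcomes]
    return dict(zip(outcomes, softmax(scores)))


# Hypothetical hand-set weights for k = 2; real weights would be fit from data.
# Feature order (assumed): [bias, initial conf. of h1, conf. of new h2, user said "no"]
MODELS = {
    "implicit_confirm": {
        "h1": [0.0, 2.0, -1.0, -2.0],
        "h2": [0.0, -1.0, 2.0, 1.0],
        "other": [0.0, 0.0, 0.0, 0.0],  # reference category
    },
}

# One model per system action: look up the model, then apply it to the response.
updated = update_belief([1.0, 0.65, 0.60, 1.0], MODELS["implicit_confirm"])
print(updated)
```

Keeping a separate weight table per system action mirrors f_SA(C): the same user response can carry different evidence after an explicit confirmation than after an implicit one.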
14outline
- proposed approach
- data
- experiments and results
- effect on dialog performance
- conclusion
15data
- collected with RoomLine
- a phone-based mixed-initiative spoken dialog system
- conference room reservation
- explicit and implicit confirmations
- simple heuristic rules for belief updating
- explicit confirm: yes / no
- implicit confirm: new values overwrite old ones
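The heuristic rules listed above can be written down as a short function. This is a sketch rather than RoomLine's actual code: the function name, the string labels for actions and responses, and the choice to leave the belief unchanged after a non-understanding are all assumptions made for illustration.

```python
from typing import Optional


def heuristic_update(action: str, current: Optional[str], response: str,
                     new_value: Optional[str] = None) -> Optional[str]:
    """Return the concept value the system believes after one user turn."""
    if action == "explicit_confirm":
        if response == "yes":
            return current   # trust the hypothesis
        if response == "no":
            return None      # delete the hypothesis
        return current       # non-understanding: keep as-is (assumption)
    if action == "implicit_confirm":
        # New values overwrite old ones; with nothing new heard, keep the old one.
        return new_value if new_value is not None else current
    raise ValueError(f"unknown system action: {action}")


# The Seoul / Berlin example: a misrecognized correction overwrites the old value.
print(heuristic_update("implicit_confirm", "seoul", "other", new_value="berlin"))
```

The last line shows the weakness that motivates a learned model: the rule replaces one wrong value with another without weighing the confidence of either.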
16corpus
- user study
- 46 participants (naïve users)
- 10 scenario-based interactions each
- compensated per task success
- corpus
- 449 sessions, 8848 user turns
- orthographically transcribed
- manually annotated
- misunderstandings
- corrections
- correct concept values
17outline
- proposed approach
- data
- experiments and results
- effect on dialog performance
- conclusion
18baselines
- initial baseline
- accuracy of system beliefs before the update
- heuristic baseline
- accuracy of heuristic update rule used by the system
- oracle baseline
- accuracy if we knew exactly when the user corrects
19k=2 hypotheses + other
Informative features
- priors and confusability
- initial confidence score
- concept identity
- barge-in
- expectation match
- repeated grammar slots
20outline
- proposed approach
- data
- experiments and results
- effect on dialog performance
- conclusion
21a question remains
what is the effect on global dialog performance?
22let's run an experiment
- guinea pigs from Speech Lab for exp: $0
- getting change from guys in the lab: $2/$3/$5
- real subjects for the experiment: $25
- picture with advisor of the VERY last exp at CMU: priceless!!!!
- [courtesy of Mohit Kumar]
23a new user study
- implemented models in RavenClaw, performed a new user study
- 40 participants, first-time users
- 10 scenario-driven interactions each
- non-native speakers of North-American English
- improvements more likely at higher WER
- supported by empirical evidence
- between-subjects; 2 gender-balanced groups
- control: RoomLine using heuristic update rules
- treatment: RoomLine using runtime models
24effect on task success
- task success: 73.6% (control) vs. 81.3% (treatment)
25effect on task success a closer look
[figure: probability of task success plotted against word error rate]
Task Success ← 2.09 - 0.05·WER + 0.69·Condition (p = 0.001)
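The fitted model can be turned into predicted success probabilities. The sketch below rests on several assumptions that the slide text does not spell out: that the model is a logistic regression (so the right-hand side is a log-odds), that WER is expressed in percentage points, that Condition is coded 0 for control and 1 for treatment, and that the Condition term enters with a plus sign. The function name `p_task_success` is hypothetical.

```python
import math


def p_task_success(wer: float, treatment: bool) -> float:
    """Predicted probability of task success under the assumed logistic model."""
    condition = 1.0 if treatment else 0.0
    log_odds = 2.09 - 0.05 * wer + 0.69 * condition
    return 1.0 / (1.0 + math.exp(-log_odds))


# Compare the two conditions at a few word error rates.
for wer in (10, 30, 50):
    control = p_task_success(wer, treatment=False)
    treated = p_task_success(wer, treatment=True)
    print(f"WER {wer}%: control {control:.2f}, treatment {treated:.2f}")
```

Under these assumptions the gap between conditions is widest in the middle of the WER range, where the logistic curve is steepest.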
26improvements at different WER
[figure: absolute improvement in task success plotted against word error rate]
27effect on task duration (for successful tasks)
- ANOVA on task duration for successful tasks
- Duration ← -0.21 + 0.013·WER - 0.106·Condition
- significant improvement, equivalent to 7.9% absolute reduction in WER
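The "equivalent WER reduction" can be checked with simple arithmetic: it is the change in WER that would shift predicted duration by as much as switching from control to treatment. The sketch assumes the model reads Duration ← -0.21 + 0.013·WER - 0.106·Condition, with WER in percentage points; the function name `wer_equivalent` is hypothetical.

```python
def wer_equivalent(condition_coef: float, wer_coef: float) -> float:
    """WER change (in points) with the same effect on duration as the condition."""
    return abs(condition_coef) / abs(wer_coef)


print(round(wer_equivalent(-0.106, 0.013), 2))
```

With the rounded coefficients shown on the slide this comes to roughly 8 points, close to the 7.9 reported; the small gap may simply reflect rounding of the published coefficients.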
28outline
- proposed approach
- data
- experiments and results
- effect on dialog performance
- conclusion
29summary
- data-driven approach for constructing accurate system beliefs
- integrate information across multiple turns
- bridge together detection of misunderstandings and corrections
- significantly outperforms current heuristics
- significantly improves effectiveness and efficiency
30other advantages
- sample efficient
- performs a local one-turn optimization
- good local performance leads to good global performance
- scalable
- works independently on concepts
- 29 concepts, varying cardinalities
- portable
- decoupled from dialog task specification
- doesn't make strong assumptions about dialog management technology
31thank you! questions?
32user study
- 10 scenarios, fixed order
- presented graphically (explained during briefing)
- participants compensated per task success