"k hypotheses + other" belief updating in spoken dialog systems

Slides: 33

Provided by: danb7

Learn more at: http://www.cs.cmu.edu

Transcript and Presenter's Notes

Title: "k hypotheses + other" belief updating in spoken dialog systems

1
"k hypotheses + other" belief updating in spoken dialog systems

Dialogs on Dialogs Talk, March 2006
Dan Bohus
Computer Science Department, Carnegie Mellon University
Pittsburgh, PA 15213
www.cs.cmu.edu/dbohus
dbohus_at_cs.cmu.edu

2
problem

spoken language interfaces lack robustness when
faced with understanding errors

errors stem mostly from speech recognition
typical word error rates: 20-30%
significant negative impact on interactions

3
guarding against understanding errors

use confidence scores
machine learning approaches for detecting misunderstandings [Walker, Litman, San-Segundo, Wright, and others]
engage in confirmation actions
explicit confirmation
"did you say you wanted to fly to Seoul?"
yes → trust hypothesis
no → delete hypothesis
other → non-understanding
implicit confirmation
"traveling to Seoul. What day did you need to travel?"
rely on new values overwriting old values

4
today's talk

construct accurate beliefs by integrating
information over multiple turns in a conversation

S: Where would you like to go?
U: Huntsville [SEOUL / 0.65]
destination = {seoul/0.65}
S: traveling to Seoul. What day did you need to travel?
U: no no I'm traveling to Birmingham [THE TRAVELING TO BERLIN P_M / 0.60]
destination = {?}
5
belief updating problem statement

given
an initial belief Binitial(C) over concept C
a system action SA
a user response R
construct an updated belief
Bupdated(C) ← f(Binitial(C), SA, R)

destination = {seoul/0.65}
S: traveling to Seoul. What day did you need to travel?
U: [THE TRAVELING TO BERLIN P_M / 0.60]
destination = {?}
6
outline

proposed approach
data
experiments and results
effect on dialog performance
conclusion

proposed approach data experiments and results
effect on dialog performance conclusion
7
belief updating problem statement
destination = {seoul/0.65}
S: traveling to Seoul. What day did you need to travel?
U: [THE TRAVELING TO BERLIN P_M / 0.60]
destination = {?}

given
an initial belief Binitial(C) over concept C
a system action SA(C)
a user response R
construct an updated belief
Bupdated(C) ← f(Binitial(C), SA(C), R)

8
Bupdated(C) ← f(Binitial(C), SA(C), R)
belief representation

most accurate representation
probability distribution over the set of possible
values
however
system will hear only a small number of
conflicting values for a concept within a dialog
session
in our data
at most 3 conflicting values heard
more than 1 value heard in only 6.9% of cases

9
belief representation
Bupdated(C) ← f(Binitial(C), SA(C), R)

compressed belief representation
k hypotheses + other
at each turn, the system retains the top m initial hypotheses and adds n new hypotheses from the input (m + n = k)
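
The compressed representation above can be sketched in a few lines of Python. This is only an illustration of the idea, not the system's code: the function names `compress` and `candidate_slots`, the dictionary layout, and the rule of folding all unretained probability mass into "other" are assumptions made for the example.

```python
from typing import Dict, List


def compress(belief: Dict[str, float], k: int) -> Dict[str, float]:
    """Keep the k most probable values and fold the remaining mass into 'other'.

    Assumption: the belief is a dict of value -> probability summing to 1.
    """
    ranked = sorted(
        ((value, prob) for value, prob in belief.items() if value != "other"),
        key=lambda pair: pair[1],
        reverse=True,
    )
    kept = dict(ranked[:k])
    kept["other"] = max(0.0, 1.0 - sum(kept.values()))
    return kept


def candidate_slots(initial: Dict[str, float], heard: Dict[str, float],
                    m: int, n: int) -> List[str]:
    """Hypotheses tracked this turn: top m initial ones plus up to n new ones.

    `heard` maps values recognized in the current user turn to their
    confidence scores (hypothetical input format).
    """
    old = [value for value in compress(initial, m) if value != "other"]
    new = sorted((v for v in heard if v not in old),
                 key=lambda v: heard[v], reverse=True)
    return old + new[:n] + ["other"]


# Example from the talk: the system believes "seoul" and then hears "berlin".
slots = candidate_slots({"seoul": 0.65, "other": 0.35}, {"berlin": 0.60}, m=1, n=1)
print(slots)
```

With m = 1 and n = 1 this gives the k = 2 case: one retained hypothesis, one new hypothesis, and "other". The confidence assigned to each slot is then the job of the update model, not of this bookkeeping step.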

10
belief representation
Bupdated(C) ← f(Binitial(C), SA(C), R)

B(C) modeled as a multinomial variable
{h1, h2, ..., hk, other}
B(C) = <ch1, ch2, ..., chk, cother>
where ch1 + ch2 + ... + chk + cother = 1
belief updating can be cast as a multinomial regression problem
Bupdated(C) ← Binitial(C) + SA(C) + R

11
system action
Bupdated(C) ← f(Binitial(C), SA(C), R)
12
user response
Bupdated(C) ← f(Binitial(C), SA(C), R)
13
approach
Bupdated(C) ← f(Binitial(C), SA(C), R)

problem
<uch1, ..., uchk, ucoth> ← f(<ich1, ..., ichk, icoth>, SA(C), R)
approach: multinomial generalized linear model
regression model, multinomial dependent (response) variable
sample efficient
stepwise approach
feature selection
BIC to control over-fitting
one model for each system action
<uch1, ..., uchk, ucoth> ← fSA(C)(<ich1, ..., ichk, icoth>, R)
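
A minimal sketch of what such a per-action model could look like at prediction time, assuming a multinomial logistic (softmax) link, which is one common multinomial GLM; the slides do not state the link function. The feature layout and the weights in `WEIGHTS_IMPLICIT_CONFIRM` are invented for illustration and are not fitted values from the talk. The `bic` helper uses the standard definition, BIC = p ln(n) - 2 ln(L), which a stepwise procedure can use to decide whether an added feature is worth its extra parameters.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn linear scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def update_belief(weights: Sequence[Sequence[float]],
                  features: Sequence[float]) -> List[float]:
    """Return updated confidences <uc_h1, ..., uc_hk, uc_oth>.

    weights: one row per outcome (h1, ..., hk, other).
    features: a leading 1.0 (intercept) followed by response features.
    """
    scores = [sum(w * x for w, x in zip(row, features)) for row in weights]
    return softmax(scores)


def bic(log_likelihood: float, num_params: int, num_samples: int) -> float:
    """Bayesian information criterion; lower is better."""
    return num_params * math.log(num_samples) - 2.0 * log_likelihood


# Hypothetical model for one system action (implicit confirmation), k = 2.
# Features: [intercept, initial confidence of h1, confidence of new value h2].
# The "other" row is all zeros, i.e. it serves as the reference category.
WEIGHTS_IMPLICIT_CONFIRM = [
    [0.2, 1.5, -0.5],   # h1: the value already believed (made-up weights)
    [-0.3, -0.5, 2.0],  # h2: the newly heard value (made-up weights)
    [0.0, 0.0, 0.0],    # other
]

updated = update_belief(WEIGHTS_IMPLICIT_CONFIRM, [1.0, 0.65, 0.60])
print([round(p, 3) for p in updated])
```

Keeping one weight table per system action mirrors the "one model for each system action" line above: the action selects the model, and only the initial belief and the user response enter as features.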

14
outline

proposed approach
data
experiments and results
effect on dialog performance
conclusion

15
data

collected with RoomLine
a phone-based mixed-initiative spoken dialog
system
conference room reservation
explicit and implicit confirmations
simple heuristic rules for belief updating
explicit confirm: yes / no
implicit confirm: new values overwrite old ones
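
The heuristic rules listed above can be sketched as follows. This is a reading of the slides, not RoomLine's actual code: the function name `heuristic_update`, the string labels for actions and responses, and the choice to leave the belief unchanged on a non-understanding are assumptions.

```python
from typing import Optional


def heuristic_update(current: Optional[str], action: str, response: str,
                     new_value: Optional[str] = None) -> Optional[str]:
    """Return the concept value the system believes after the user's turn.

    action: "explicit_confirm" or "implicit_confirm" (hypothetical labels).
    response: "yes", "no", or "other".
    new_value: a value for the same concept heard in the user's turn, if any.
    """
    if action == "explicit_confirm":
        if response == "yes":
            return current   # trust the hypothesis
        if response == "no":
            return None      # delete the hypothesis
        return current       # non-understanding: keep as is (assumption)
    if action == "implicit_confirm":
        # New values simply overwrite old ones.
        return new_value if new_value is not None else current
    return current


# The misrecognized correction from the talk: "Birmingham" is heard as
# "berlin", and the overwrite rule replaces one wrong value with another.
print(heuristic_update("seoul", "implicit_confirm", "other", new_value="berlin"))
```

The example shows why the overwrite rule is fragile: it ignores the confidence of both the old and the new value, which is the information a learned update model can exploit.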

16
corpus

user study
46 participants (naïve users)
10 scenario-based interactions each
compensated per task success
corpus
449 sessions, 8848 user turns
orthographically transcribed
manually annotated
misunderstandings
corrections
correct concept values

17
outline

proposed approach
data
experiments and results
effect on dialog performance
conclusion

18
baselines

initial baseline
accuracy of system beliefs before the update
heuristic baseline
accuracy of heuristic update rule used by the
system
oracle baseline
accuracy if we knew exactly when the user corrects

19
k=2 hypotheses + other
Informative features

priors and confusability
initial confidence score
concept identity
barge-in
expectation match
repeated grammar slots

20
outline

proposed approach
data
experiments and results
effect on dialog performance
conclusion

21
a question remains

does this really matter?

what is the effect on global dialog performance?
22
let's run an experiment
guinea pigs from Speech Lab for exp: $0
getting change from guys in the lab: $2/$3/$5
real subjects for the experiment: $25
picture with advisor of the VERY last exp at CMU: priceless!!!!
[courtesy of Mohit Kumar]
23
a new user study

implemented models in RavenClaw, performed a new
user study
40 participants, first-time users
10 scenario-driven interactions each
non-native speakers of North-American English
improvements more likely at higher WER
supported by empirical evidence
between-subjects: 2 gender-balanced groups
control: RoomLine using heuristic update rules
treatment: RoomLine using runtime models

24
effect on task success
task success: control 73.6%, treatment 81.3%
25
effect on task success a closer look
[plot: probability of task success vs. word error rate]
Task Success ← 2.09 - 0.05·WER + 0.69·Condition
p0.001
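
One way to read the fitted relation on this slide is as a logistic regression. The sketch below makes several assumptions that the transcript does not confirm: a logit link, WER expressed in percentage points, Condition coded 0 for control and 1 for treatment, and a positive sign on the Condition coefficient (consistent with the higher task success reported for the treatment group).

```python
import math


def p_success(wer: float, treatment: bool) -> float:
    """Probability of task success under the assumed logistic model.

    wer: word error rate in percentage points (assumption).
    treatment: True for the runtime-model condition, False for control.
    """
    z = 2.09 - 0.05 * wer + 0.69 * (1.0 if treatment else 0.0)
    return 1.0 / (1.0 + math.exp(-z))


# Absolute improvement in task success at a few WER levels.
for wer in (10, 20, 30, 40):
    control = p_success(wer, treatment=False)
    treated = p_success(wer, treatment=True)
    print(f"WER {wer}%: control {control:.3f}, treatment {treated:.3f}, "
          f"gain {treated - control:.3f}")
```

Under these assumptions the predicted success probability falls as WER rises in both conditions, and the treatment curve lies above the control curve at every WER.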
26
improvements at different WER
[plot: absolute improvement in task success vs. word error rate]
27
effect on task duration (for successful tasks)