Title: A POMDP Approach to Affective Dialogue Management
Slide 1: A POMDP Approach to Affective Dialogue Management
- Trung H. Bui
- Mannes Poel
- Anton Nijholt
- Job Zwiers
- University of Twente
- Vietri sul Mare, 10 September 2006
- International School "Neural Networks E. R. Caianiello", XI Course on "The Fundamentals of Verbal and Non-verbal Communication and the Biometrical Issue"
Slide 2: Outline
- Motivation
- MDP/POMDP dialogue management
- Affective dialogue modeling
- Example
- Conclusions and future work
Slide 3: Motivation
- An affective dialogue management (ADM) model is a dialogue manager that is able to take some aspects of the user's emotional state into account and act appropriately
- Scope of the ADM we are focusing on:
  - human-computer interaction using multimodal input/output
  - acting appropriately given uncertain knowledge of the user's emotional state and action (the task is not emotion recognition or dialogue-act recognition)
- The POMDP provides an elegant framework for this type of dialogue model
Slide 4: Markov Decision Process (Howard, 1960)
- Agent: frog
- Environment: lily pond
- States: lily pads
- Actions: jump, look
- Rewards: successful jump → +10, failed jump → -10, look → -1
(A value-iteration sketch for this kind of MDP follows below.)
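A minimal sketch of how such an MDP could be solved by value iteration. Only the reward values come from the slide; the pond layout, transition probabilities, and discount factor are assumptions for illustration.

```python
import numpy as np

# Toy frog MDP: 3 lily pads; layout and probabilities are assumed.
n_states = 3
gamma = 0.95                       # discount factor (assumed)
p_success = 0.8                    # assumed chance a jump lands as intended

# T[a][s, s'] = P(s' | s, a); "look" leaves the frog where it is.
T = {
    "jump": np.array([[0.1, 0.8, 0.1],
                      [0.1, 0.1, 0.8],
                      [0.8, 0.1, 0.1]]),
    "look": np.eye(n_states),
}
# Expected rewards from the slide: +10 successful jump, -10 failed, -1 look.
R = {
    "jump": np.full(n_states, p_success * 10 + (1 - p_success) * -10),
    "look": np.full(n_states, -1.0),
}

# Value iteration: V(s) <- max_a [ R(s, a) + gamma * sum_s' T(s, a, s') V(s') ]
V = np.zeros(n_states)
for _ in range(200):
    V = np.max([R[a] + gamma * T[a] @ V for a in T], axis=0)
print(V)   # converged state values
```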
Slide 5: Partially Observable Markov Decision Process
- Agent: frog
- Environment: fog-shrouded lily pond
- States: lily pads
- Action set (A): jump, look
- Observations (noisy, because of the fog)
- Reward model (R): successful jump → +10, failed jump → -10, look → -1
Slide 6: Partially Observable Markov Decision Process
- A POMDP is a tuple ⟨S, A, Z, T, O, R⟩:
  - S: state set, A: action set, Z: observation set
  - T: transition model, O: observation model, R: reward model
- Related notation:
  - b is the agent's belief state
  - π is the agent's policy for selecting actions
- Two main tasks:
  - computing the belief state
  - finding the optimal policy
(A minimal container for the tuple is sketched below.)
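One way to hold the tuple in code; the class and field names are illustrative, not taken from the authors' implementation.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class POMDP:
    states: list        # S
    actions: list       # A
    observations: list  # Z
    T: np.ndarray       # T[a, s, s'] = P(s' | s, a)   (transition model)
    O: np.ndarray       # O[a, s', z] = P(z | s', a)   (observation model)
    R: np.ndarray       # R[a, s] = expected reward    (reward model)
```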
Slide 7: Example (Roy et al., 2000)
(Figure: a spoken dialogue system in which the POMDP observation is the output of the ASR)
Slide 8: Computing the belief state
- The new belief is obtained from the old belief via the transition and observation models, divided by a normalizing constant:

  b_{t+1}(s') = O(s', a_t, o_{t+1}) · Σ_{s∈S} T(s, a_t, s') · b_t(s) / P(o_{t+1} | a_t, b_t)

  where O is the observation model, T the transition model, b_t the old belief, b_{t+1} the new belief, and P(o_{t+1} | a_t, b_t) the normalizing constant (a code sketch follows below)
- Example: S = {s1, s2}, A = {a1, a2}, Z = {z1, z2, z3}
- (Figure: b_t(s1) and the updated b_{t+1}(s1), given a_t = a1 and o_{t+1} = z1, plotted over P(s1))
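A sketch of this update in code, with models stored as dictionaries keyed by action. The set sizes match the slide's example; the numeric values are assumptions.

```python
import numpy as np

def belief_update(b, a, z, T, O):
    """Bayes filter: b'(s') ∝ O(s', a, z) * sum_s T(s, a, s') * b(s)."""
    b_next = O[a][:, z] * (T[a].T @ b)   # numerator of the update
    return b_next / b_next.sum()         # divide by P(z | a, b), the normalizer

# Toy model for S = {s1, s2}, A = {a1, a2}, Z = {z1, z2, z3}; values assumed.
T = {"a1": np.array([[0.7, 0.3],
                     [0.4, 0.6]])}        # T[a][s, s']
O = {"a1": np.array([[0.6, 0.3, 0.1],
                     [0.2, 0.3, 0.5]])}   # O[a][s', z]
b = np.array([0.5, 0.5])                  # b_t
print(belief_update(b, "a1", 0, T, O))    # b_{t+1} after a_t = a1, o_{t+1} = z1
```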
Slide 9: Finding the optimal policy
- V^π(b): the expected total discounted future reward starting from belief b and following policy π
- γ is the discount factor
- The optimal policy: π* = argmax_π V^π(b)
- (Figure: the value function plotted over the belief b = P(s1); each linear segment is labeled with its maximizing action, a1 or a2. A sketch of this representation follows below.)
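The POMDP value function is piecewise linear and convex, so it can be represented as a set of alpha-vectors, each tied to an action; the greedy policy picks the action of the maximizing vector. The numbers below are made up for a two-state belief.

```python
import numpy as np

# Illustrative alpha-vectors over S = {s1, s2}; each is tagged with an action.
alphas = {"a1": np.array([10.0, -2.0]),
          "a2": np.array([-2.0, 10.0])}

def value(b):
    """V(b) = max over alpha-vectors of alpha · b (piecewise linear, convex)."""
    return max(alpha @ b for alpha in alphas.values())

def optimal_action(b):
    """pi*(b): the action whose alpha-vector achieves the max at belief b."""
    return max(alphas, key=lambda a: alphas[a] @ b)

b = np.array([0.8, 0.2])            # belief with P(s1) = 0.8
print(value(b), optimal_action(b))  # -> 7.6 a1
```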
Slide 10: POMDP dialogue management
- Focus: spoken dialogue management in a noisy environment
Slide 11: Proposed POMDP affective dialogue model
- Uses a factored POMDP
- The state set and observation set are composed of six features in total
- State features: user's goal (Gu), user's affective state (Eu), user's action (Au), user's dialogue state (Du)
- Observation features: observed user's action (OAu), observed user's affective state (OEu)
(A data-structure sketch follows.)
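A data-structure sketch of the factored state and observation; the field names mirror the slide's features, and the concrete values are assumptions.

```python
from collections import namedtuple

# Factored state: user's goal Gu, affective state Eu, action Au, dialogue state Du.
State = namedtuple("State", ["goal", "emotion", "action", "dialogue_state"])
# Factored observation: noisy views of the user's action (OAu) and emotion (OEu).
Obs = namedtuple("Obs", ["obs_action", "obs_emotion"])

s = State(goal="a", emotion="stress", action="a",
          dialogue_state="location-not-specified")
o = Obs(obs_action="a", obs_emotion="no-stress")   # the observation may be wrong
```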
Slide 12: Transition model and observation model
- No data available → use hand-set parameters:
  - pgc and pec are the probabilities that the user's goal and affective state change
  - pe is the probability of the user's action error being induced by emotion
  - poa and poe are the error probabilities of the observed action and the observed affective state (see the sketch below)
- Partial or full data available → construct and adjust the model from the collected data
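One way the hand-set parameters poa and poe could induce an observation model is to corrupt the true action and emotion with those probabilities. The uniform choice among the wrong values below is an assumption, not something stated in the talk.

```python
import random

ACTIONS = ["a", "b", "c", "yes", "no"]
EMOTIONS = ["stress", "no-stress"]

def observe(true_action, true_emotion, poa=0.1, poe=0.2):
    """Sample (OAu, OEu): flip to a uniformly chosen wrong value with prob poa/poe."""
    oa = (true_action if random.random() > poa
          else random.choice([x for x in ACTIONS if x != true_action]))
    oe = (true_emotion if random.random() > poe
          else random.choice([x for x in EMOTIONS if x != true_emotion]))
    return oa, oe

print(observe("a", "stress"))   # usually ('a', 'stress'), occasionally corrupted
```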
Slide 13: Example: simulated route navigation in an unsafe tunnel
(Figure: tunnel map with locations a, b, and c and the route-description actions rd-a, rd-b, and rd-c)
Slide 14: Model specification
- State space (including an absorbing end state):
  - Gu = {a, b, c}
  - Eu = {stress, no-stress}
  - Au = {a, b, c, yes, no}
  - Du = {location-specified, location-not-specified}
- System actions:
  - A = {ask, confirm-a, confirm-b, confirm-c, rd-a, rd-b, rd-c, fail}
- Observations:
  - OEu = {stress, no-stress}
  - OAu = {a, b, c, yes, no}
- Reward (sketched in code below):
  - any confirm action before the location is specified → -2
  - the fail action → -5
  - rd-x with Gu = x → +10, otherwise → -10
  - any action taken in the end state → 0
  - any other action → -1
- (rd means "give route description")
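The reward model above translates directly into a function. This sketch assumes the factored state fields from slide 11 and uses "end" to mark the absorbing state.

```python
def reward(system_action, user_goal, dialogue_state):
    """Reward model from the slide; 'end' marks the absorbing end state."""
    if dialogue_state == "end":
        return 0        # any action taken in the end state
    if system_action == "fail":
        return -5
    if system_action.startswith("confirm") and dialogue_state == "location-not-specified":
        return -2       # confirming before the location is specified
    if system_action.startswith("rd-"):
        return 10 if system_action == "rd-" + user_goal else -10
    return -1           # any other action, e.g. ask

print(reward("rd-a", "a", "location-specified"))            # -> 10
print(reward("confirm-b", "a", "location-not-specified"))   # -> -2
```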
Slide 15: Possible dialogue strategies
- ask (user: a) → rd-a
- ask (user: a) → confirm-a (user: yes) → rd-a
- ask (user: a) → ask (user: a) → confirm-a (user: yes) → rd-a
Some of them are useful. Which ones are optimal?
Slide 16: Optimal policy (using the standard PBVI algorithm, 27.83 s)
- Test case
- Reformulated model
Slide 17: Value function table
(Table: the optimal action to start with, given the initial belief)
Slide 18: Expected return vs. the user's action error induced by stress (pe)
(Plot: expected return as a function of pe, with three curves: no, low, and high observation error)
Test results were obtained with the Perseus algorithm on the full POMDP model (61 states, 8 actions, 10 observations)
Slide 19: Comparing the results (using the simulated user)
Slide 20: Conclusions
- The optimal dialogue strategy depends on the correlation between the user's affective state and the user's action
- The factored POMDP allows integrating the features of states, actions, and observations in a flexible way
- But: finding the optimal policy is computationally intractable, for both exact and some approximate algorithms, on anything beyond small, toy dialogue problems
- Recent advances in approximate POMDP techniques, plus heuristics in dialogue model design, are expected to make real-world dialogue applications tractable
Slide 21: Future work
- Scaling up the model with larger state, action, and observation sets for real-world dialogue management problems
- Extending the model representation, e.g. with correlations between the user's emotion and goal
- Collecting and generating both real and artificial data to build and train the model