A POMDP Approach to Affective Dialogue Management



1
A POMDP Approach to Affective Dialogue Management
  • Trung H. Bui
  • Mannes Poel
  • Anton Nijholt
  • Job Zwiers
  • University of Twente
  • Vietri sul Mare, 10 September 2006
  • INTERNATIONAL SCHOOL "NEURAL NETWORKS E. R. CAIANIELLO"
  • XI COURSE on The Fundamentals of Verbal and Non-verbal Communication
    and the Biometrical Issue

2
Outline
  • Motivation
  • MDP/POMDP dialogue management
  • Affective dialogue modeling
  • Example
  • Conclusions and future work

3
Motivation
  • An affective dialogue management (ADM) model is a
    dialogue manager that takes aspects of the user's
    emotional state into account and acts appropriately
  • Scope of the ADM we focus on:
  • human-computer interaction using multimodal
    input/output
  • acting appropriately given uncertain knowledge of
    the user's emotional state and the user's action
    (the task is not emotion recognition or
    dialogue-act recognition)
  • POMDPs provide an elegant framework for this type
    of dialogue model

4
Markov Decision Process (Howard, 1960)
  • Agent: frog
  • Environment: lily pond
  • States: lily pads
  • Actions: jump, look
  • Rewards: jump successfully → +10, else -10; look → -1
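As a concrete illustration, a minimal Python sketch of this frog MDP; the pad names and the 0.8 jump success probability are illustrative assumptions, not values from the slides.

import random

# Minimal frog MDP: lily pads as states, jump/look as actions.
PADS = ["pad0", "pad1", "pad2"]

def step(state, action):
    """Return (next_state, reward) for one transition of the frog MDP."""
    if action == "look":
        return state, -1                      # look costs -1
    if random.random() < 0.8:                 # assumed jump success rate
        next_state = random.choice([p for p in PADS if p != state])
        return next_state, 10                 # successful jump -> +10
    return state, -10                         # failed jump -> -10

For example, step("pad0", "jump") samples one transition and its reward.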
5
Partially Observable Markov Decision Process
  • Agent: frog
  • Environment: fog-shrouded lily pond
  • States: lily pads
  • Action set (A): jump, look
  • Observations
  • Reward model (R): jump successfully → +10, else -10; look → -1
6
Partially Observable Markov Decision Process
  • ⟨S, A, Z, T, O, R⟩
  • S: state set, A: action set, Z: observation set
  • T: transition model, O: observation model, R: reward model
  • Related notation:
  • b is the agent's belief state
  • π is the agent's policy for selecting actions
  • Two main tasks:
  • computing the belief state
  • finding the optimal policy
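For reference, a minimal Python container for such a tuple; the field names and callable signatures are assumptions, used by the sketches on the following slides.

from dataclasses import dataclass
from typing import Callable, Sequence

# Container for a discrete POMDP <S, A, Z, T, O, R>.
# T(s, a, s') = P(s' | s, a); O(s', a, z) = P(z | s', a); R(s, a) = reward.
@dataclass
class POMDP:
    states: Sequence[str]                    # S
    actions: Sequence[str]                   # A
    observations: Sequence[str]              # Z
    T: Callable[[str, str, str], float]      # transition model
    O: Callable[[str, str, str], float]      # observation model
    R: Callable[[str, str], float]           # reward model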
7
Example (Roy et al., 2000)
  • ⟨S, A, Z, T, O, R⟩
  • Observations: the output of the ASR
8
Computing the belief state
  • The new belief is computed from the old belief using the transition
    model, the observation model, and a normalizing constant:
  • b_{t+1}(s') = η · O(s', a_t, o_{t+1}) · Σ_s T(s, a_t, s') · b_t(s)
  • where η = 1 / P(o_{t+1} | a_t, b_t) is the normalizing constant
  • Example: S = {s1, s2}, A = {a1, a2}, Z = {z1, z2, z3}
  • (figure: the belief b_t(s1) and the updated belief b_{t+1}(s1) given
    a_t = a1, s_{t+1} = s1, o_{t+1} = z1, plotted over P(s1))
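A direct transcription of this update as a Python sketch, using the POMDP container above and beliefs as state-indexed dictionaries.

def update_belief(pomdp, belief, action, obs):
    """b'(s') = η · O(s', a, z) · Σ_s T(s, a, s') · b(s)."""
    new_belief = {}
    for s_next in pomdp.states:
        prior = sum(pomdp.T(s, action, s_next) * belief[s]
                    for s in pomdp.states)
        new_belief[s_next] = pomdp.O(s_next, action, obs) * prior
    norm = sum(new_belief.values())   # P(z | a, b), the normalizing constant
    if norm == 0:
        raise ValueError("observation impossible under this belief")
    return {s: p / norm for s, p in new_belief.items()}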
9
Finding the optimal policy
  • V^π(b): the expected total discounted future reward when starting
    from belief b and following policy π:
  • V^π(b) = E[ Σ_t γ^t · r_t | b_0 = b, π ]
  • γ is the discount factor
  • The optimal policy: π* = argmax_π V^π(b)
  • (figure: the value function over the belief space P(s1) is piecewise
    linear, with one linear segment per action a1, a2 and a belief
    point b)
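Because the value function is piecewise linear and convex, point-based methods such as PBVI (used later in these slides) back it up only at a finite set of belief points. A sketch of one such backup follows, assuming alpha-vectors as NumPy arrays ordered like pomdp.states; this is an illustration of the standard point-based backup, not the authors' implementation.

import numpy as np

def backup(b, alphas, pomdp, gamma=0.95):
    """One PBVI-style point-based backup at belief point b.

    b: np.ndarray over pomdp.states; alphas: list of alpha-vectors
    (np.ndarray, same state ordering). Returns the new alpha-vector."""
    S, A, Z = pomdp.states, pomdp.actions, pomdp.observations
    best_alpha, best_value = None, -np.inf
    for a in A:
        # Immediate reward vector for action a.
        vec = np.array([pomdp.R(s, a) for s in S], dtype=float)
        for z in Z:
            # g_{a,z,alpha}(s) = Σ_{s'} O(s',a,z) · T(s,a,s') · alpha(s')
            gs = [np.array([sum(pomdp.O(sn, a, z) * pomdp.T(s, a, sn) * alpha[j]
                                for j, sn in enumerate(S))
                            for s in S])
                  for alpha in alphas]
            # Keep the vector that is best at this particular belief point.
            vec = vec + gamma * max(gs, key=lambda g: g @ b)
        if vec @ b > best_value:
            best_alpha, best_value = vec, vec @ b
    return best_alpha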
10
POMDP Dialogue management
Focus: spoken dialogue management in a noisy environment
11
Proposed POMDP affective dialogue model
  • Uses a factored POMDP
  • The state set and observation set are composed of 6
    features (see the sketch after this list)
  • State set: user's goal (Gu), user's affective state
    (Eu), user's action (Au), user's dialogue state (Du)
  • Observation set: observed user's action (OAu) and
    observed user's affective state (OEu)
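A minimal Python encoding of the factored state and observation; the field names are shorthand for the slide's six features.

from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    gu: str   # user's goal
    eu: str   # user's affective state
    au: str   # user's action
    du: str   # user's dialogue state

@dataclass(frozen=True)
class Observation:
    oau: str  # observed user's action
    oeu: str  # observed user's affective state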

12
Transition model and observation model
  • No data available → use parameters (a sketch follows this list):
  • pgc and pec are the probabilities that the user's goal and
    affective state change
  • pe is the probability of the user's action error being induced by
    emotion
  • poa and poe are the probabilities of the observed-action and
    observed-affective-state errors
  • Partial or full data available → construct and adjust the model
    from the collected data
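One plausible reading of these parameters as observation and transition probabilities; the uniform split of the error mass over the remaining values is an assumption, not necessarily the authors' parameterization.

def obs_action_prob(oau, au, actions, poa):
    """P(OAu = oau | Au = au): correct with probability 1 - poa,
    otherwise the error is spread uniformly over the other actions."""
    if oau == au:
        return 1.0 - poa
    return poa / (len(actions) - 1)

def affect_change_prob(eu_next, eu, emotions, pec):
    """P(Eu' = eu_next | Eu = eu): the affective state persists with
    probability 1 - pec, otherwise changes uniformly at random."""
    if eu_next == eu:
        return 1.0 - pec
    return pec / (len(emotions) - 1)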

13
Example: simulated route navigation in an unsafe tunnel
(figure: a tunnel map with locations a, b, and c and the corresponding
route-description actions rd-a, rd-b, and rd-c)
14
Model specification
  • State space (including an absorbing end state):
  • Gu = {a, b, c}
  • Eu = {stress, no-stress}
  • Au = {a, b, c, yes, no}
  • Du = {1: location-specified, 2: location-not-specified}
  • System actions:
  • A = {ask, confirm-a, confirm-b, confirm-c, rd-a, rd-b, rd-c, fail}
  • Observations:
  • OEu = {stress, no-stress}
  • OAu = {a, b, c, yes, no}
  • Reward (sketched in code below):
  • a confirm before the location is specified → reward -2
  • the fail action → reward -5
  • rd-x with Gu = x → +10, otherwise -10
  • the reward for any action taken in the end state is 0
  • the reward for any other action is -1

(rd-x means: give the route description for location x)
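Transcribing the reward specification above into Python, using the State dataclass from slide 11; representing the absorbing end state as a sentinel string is an assumption.

END = "end"  # absorbing end state, represented here as a sentinel

def reward(state, action):
    """Reward model for the route-navigation example."""
    if state == END:
        return 0                                 # any action in the end state
    if action == "fail":
        return -5
    if action.startswith("confirm") and state.du == "location-not-specified":
        return -2                                # confirming too early
    if action.startswith("rd-"):
        return 10 if action == "rd-" + state.gu else -10
    return -1                                    # every other action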
15
Possible dialogue strategies
  • ask (user: a) → rd-a
  • ask (user: a) → confirm-a (user: yes) → rd-a
  • ask → ask (user: a) → confirm-a (user: yes) → rd-a
Some of these strategies are useful. Which ones are optimal?
16
Optimal policy (computed with the standard PBVI algorithm in 27.83 s)
  • Test case
  • Reformulated model

17
Value function table
The optimal action to start with, given the initial belief
18
Expected return vs. the probability of the user's action error being
induced by stress (pe)
(figure: three curves, for no, low, and high observation error)
Results were obtained with the Perseus algorithm on the full POMDP
model (61 states, 8 actions, 10 observations)
19
Comparing the results (using a simulated user)
20
Conclusions
  • The optimal dialogue strategy depends on the
    correlation between the user's affective state and
    action
  • The factored POMDP allows integrating the features
    of states, actions, and observations in a flexible
    way
  • But:
  • finding the optimal policy is computationally
    intractable, with both exact and some approximate
    algorithms, for all but small, toy dialogue problems
  • recent advances in approximate POMDP techniques,
    plus heuristics in dialogue model design, are
    expected to make real-world dialogue applications
    tractable

21
Future work
  • Scaling up the model with larger state, action, and
    observation sets for real-world dialogue management
    problems
  • Extending the model representation, e.g. modeling
    correlations between the user's emotion and goal
  • Collecting and generating both real and artificial
    data to build and train the model

22
  • Thank you