Title: Learning of Mediation Strategies for Heterogeneous Agents Cooperation
1Learning of Mediation Strategies for
Heterogeneous Agents Cooperation
- R. Charton, A. Boyer and F. Charpillet
- Maia Team - LORIA France
- ICTAI'03, Sacramento, CA, USA, November 4th, 2003
2Context of our works
- Industrial collaboration for the design of
adaptive services that are multimedia,
interactive, and aimed at the general public
- Focus on Information Retrieval assistance
- Constraints
  - User: occasional, novice
  - Information Source: ownership, costs
- Goal: to enhance the service quality
3Cooperation in heterogeneous multi-agent systems
- Agents of different natures: human, software,
robots
How to make these agents cooperate? Achieve
together applicative goals that satisfy a subset
of agents.
4Presentation Overview
- Typical example of interaction
- Mediator and Mediation Strategies
- Towards an MDP based Mediation
- Our prototype of Mediator
- Experiments and results
5An example of a problem: a flight booking system
Customer
6Role of the mediator agent
- The mediator has to perform a useful task
  - Build a query that best matches the user's goal
  - Provide relevant results to the user
- Maximize a utility approximation
  - User satisfaction to be maximized
  - Information Source cost to be minimized
- At any time, the mediator can
  - Ask a question to the user about the query
  - Send the query to the information source
  - Propose a limited number of results to the user
In return, it perceives the other agents' answers:
values, results, selections, rejections
7Mediation and Mediation strategies
A mediation is a sequence of question-answer
interactions between the agents, directed by the
mediator. It is successful if the user gets the
relevant results or if the mediator discovers
that the source cannot give any result.
A mediation strategy specifies which action the
mediator must select to control the mediation,
according to the current state of the
interactions.
- Now, the question is how to
  - produce the mediator's behavior?
  - optimize the service quality?
- This requires finding an optimal mediation
strategy
8Progressive Query building
[Figure: query precision (totally unknown, partially specified, sufficiently specified, fully specified) against the number of interactions]
9Requirements for the Mediator
- Management of uncertainty and of imperfect
knowledge
  - agents
    - users may misunderstand the questions
    - users may have a partial knowledge of their needs
  - environment
    - noise during communication
    - imperfect sensors (for instance speech
    recognition)
- This requires an adaptive behavior
- We propose to model the mediation problem with an
MDP and to compute a stochastic behavior for the mediator.
10Markov Decision Process (MDP)
- Stochastic model $\langle S, A, T, R \rangle$
- Take a decision according to a policy
- $\pi : S \times A \to [0, 1]$
Computing a mediation strategy amounts to computing a
stochastic policy
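As a minimal sketch of the definition above, a stochastic policy can be stored as a table of probabilities, one distribution over actions per state, and sampled whenever a decision is needed. The two states, the two actions and the probabilities below are hypothetical and are not taken from the talk.

```python
import random

# Hypothetical toy states and actions (assumptions, not from the talk).
STATES = ["query_unknown", "query_specified"]
ACTIONS = ["ask_user", "ask_source"]

# Stochastic policy pi: S x A -> [0, 1]; each row sums to 1.
POLICY = {
    "query_unknown": {"ask_user": 0.9, "ask_source": 0.1},
    "query_specified": {"ask_user": 0.2, "ask_source": 0.8},
}


def is_valid_policy(policy):
    """Check that every state maps to a probability distribution over actions."""
    for probs in policy.values():
        if any(p < 0.0 or p > 1.0 for p in probs.values()):
            return False
        if abs(sum(probs.values()) - 1.0) > 1e-9:
            return False
    return True


def sample_action(policy, state, rng=random):
    """Draw an action a with probability pi(state, a)."""
    actions = list(policy[state])
    weights = [policy[state][a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]
```

Passing a seeded `random.Random` instance as `rng` makes the draws reproducible, which is convenient when comparing strategies.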
11Modeling of the flight booking example
- Define the model
  - S: state space
  - A: mediator's actions
  - T: transitions
  - R: rewards
12States: how to describe goals and objects?
- Form filling approach (Goddeau et al. 1996)
- Queries and source objects are described within a
referential. The referential is built on a set of
attributes: $Ref = \{At_1, \ldots, At_m\}$
- Example of referential
  - Departure: London, Geneva, Paris, Berlin, ...
  - Arrival: Sacramento, Beijing, Moscow, ...
  - Class: Business, Normal, Economic, ...
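The form-filling idea can be sketched as a dictionary with one slot per attribute of the referential. This is only an illustration under stated assumptions: the value lists contain just the values named on the slide (the real lists are longer), and the helper names `empty_query`, `fill` and `matches` are hypothetical.

```python
# Referential: attribute name -> values named on the slide (lists are truncated there).
REFERENTIAL = {
    "Departure": ["London", "Geneva", "Paris", "Berlin"],
    "Arrival": ["Sacramento", "Beijing", "Moscow"],
    "Class": ["Business", "Normal", "Economic"],
}

UNKNOWN = "?"  # slot not filled yet


def empty_query(referential):
    """Start with a form in which every attribute is unknown."""
    return {name: UNKNOWN for name in referential}


def fill(query, referential, attribute, value):
    """Return a copy of the query with one attribute set to a legal value."""
    if attribute not in referential:
        raise KeyError(f"unknown attribute: {attribute}")
    if value not in referential[attribute]:
        raise ValueError(f"{value!r} is not a value of {attribute}")
    updated = dict(query)
    updated[attribute] = value
    return updated


def matches(query, obj):
    """A source object matches when it agrees on every filled attribute."""
    return all(v == UNKNOWN or obj.get(k) == v for k, v in query.items())
```

Because queries and source objects share the same referential, matching reduces to comparing slots one by one.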
13State space S
[Figure: User, Source, Mediator]
$S = S_U \times S_R$
14State abstraction
- The size of the state space S is $(2^n + 1) \cdot (2 + i)^m$,
where
  - n: total count of objects of the information
  source
  - m: number of attributes
  - i: average number of values per attribute
→ The size of the abstract state space is $4 \cdot 3^m$
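One way to read the abstraction is that each attribute collapses to one of three symbols and the result set to one of four classes, which gives 4 · 3^m abstract states. The sketch below assumes the symbols "?" (unknown), "A" (assigned) and "F" (the user has no preference), as used in the mediation example of the talk. The names "few" and "many" for the two non-empty result classes, and the threshold `nr_max`, are placeholders chosen for illustration.

```python
UNKNOWN = "?"   # attribute not asked yet
FREE = "F"      # user answered "I don't know"
ASSIGNED = "A"  # a concrete value was given


def abstract_attribute(value):
    """Collapse a concrete attribute value to one of '?', 'F', 'A'."""
    if value in (UNKNOWN, FREE):
        return value
    return ASSIGNED


def abstract_results(result_count, nr_max):
    """Collapse the result set to one of four classes.

    result_count is None while the source has not been queried.
    'few' and 'many' are placeholder labels (assumption).
    """
    if result_count is None:
        return "?"
    if result_count == 0:
        return "0"
    return "few" if result_count <= nr_max else "many"


def abstract_state(attributes, result_count, nr_max=10):
    """Map a real state to its abstract counterpart."""
    symbols = tuple(abstract_attribute(v) for v in attributes)
    return symbols + (abstract_results(result_count, nr_max),)


def abstract_space_size(m):
    """Three symbols per attribute, four result classes."""
    return 4 * 3 ** m
```

The abstract space no longer depends on the number of objects n or on the number of values per attribute i, only on the number of attributes m.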
15Actions of the Mediator
[Figure: User, Source, Mediator]
16Rewards
[Figure: User, Source, Mediator]
17Example of mediation with the flight booking
service
| State s | Abstraction of s | Mediator action | Answers | Rewards |
|---|---|---|---|---|
| <?, ?, ? ?> | <?, ?, ? ?> | Ask user for departure | Paris | 0 |
| <Paris, ?, ? ?> | <A, ?, ? ?> | Ask source for results | 1700 flights | -R_Overnum |
| <Paris, ?, ? nr_Max first flights> | <A, ?, ? > | Ask user for destination | Sacramento | 0 |
| <Paris, Sacramento, ? ?> | <A, A, ? ?> | Ask user for flight class | I don't know | 0 |
| <Paris, Sacramento, F ?> | <A, A, F ?> | Ask source for results | 4 flights | 0 |
| <Paris, Sacramento, F 4 flights> | <A, A, F > | Ask user for selection | Selection 2 | R_Selection |
[Legend: colors identify the Mediator, the User and the Source]
18Compute the Mediation Strategy
- Problem: two parts of the model are unknown!
  - T = f(user, information source)
  - R = f(user, information source)
→ Learn the mediation strategy by reinforcement
19Reinforcement Learning
[Figure: Dynamic System]
20Q-Learning (Watkins 89)
- Reinforcement Learning method
- Can be used online
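The standard tabular Q-learning update can be sketched as follows. The slides do not give the learning rate, discount factor or exploration scheme used by the mediator, so `alpha`, `gamma` and the epsilon-greedy selection below are assumptions made for illustration, as are the function names.

```python
import random
from collections import defaultdict


def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.1, gamma=0.9):
    """One Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    An empty next_actions list marks a terminal state (future value 0).
    """
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


def epsilon_greedy(q, state, actions, epsilon, rng):
    """Explore with probability epsilon, otherwise pick the best-valued action."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


# Q-table with a default value of 0.0 for unseen (state, action) pairs.
q_table = defaultdict(float)
```

Since each update uses only the latest observed transition and reward, the table can be refined online while mediations are taking place, without a model of T or R.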
21Mediator Architecture
Mediator Agent
Decision Module (Q-Learning)
User Profile
Task Manager (real state)
Interaction Manager
User / Client Agent
Information Agent Source
22Experimentation on the flight-booking
application
- We trained the mediator on tasks with
  - 3 attributes (cities of departure/arrival and
  flight class)
  - 4 attributes (+ the time of day for taking off)
  - 5 attributes (+ the airline)
Complexity growth as a function of the number of
attributes:
| # of attributes (m) | # of abstract states ($4 \cdot 3^m$) | # of actions ($3m + 2$) | # of Q-values ($(12m + 8) \cdot 3^m$) |
|---|---|---|---|
| 3 | 108 | 11 | 1,188 |
| 4 | 324 | 14 | 4,536 |
| 5 | 972 | 17 | 16,524 |
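The figures in the table can be checked with a few lines of arithmetic, reading the column formulas as 4 · 3^m abstract states, 3m + 2 actions, and one Q-value per (state, action) pair, which gives (12m + 8) · 3^m. The function names are illustrative only.

```python
def abstract_states(m):
    """Number of abstract states for m attributes: 4 * 3^m."""
    return 4 * 3 ** m


def actions(m):
    """Number of mediator actions for m attributes: 3m + 2."""
    return 3 * m + 2


def q_values(m):
    """One Q-value per (abstract state, action) pair: (12m + 8) * 3^m."""
    return abstract_states(m) * actions(m)


for m in (3, 4, 5):
    print(m, abstract_states(m), actions(m), q_values(m))
# 3 108 11 1188
# 4 324 14 4536
# 5 972 17 16524
```

The number of actions grows linearly with m, but the Q-table roughly triples with each added attribute, which fits the slower convergence reported for 5 attributes.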
23Learning results for 3-5 attributes (% of hits)
- 3 and 4 attributes: 99% of selection (close to
optimal)
- 5 attributes: 90% of selection (more time
required to converge)
24Learning results for 3-5 attributes (avg. length)
- 3 and 4 attributes: the minimal length of the
mediation is reached
- 5 attributes: longer mediations
25Conclusion
- Advantages
  - MDP + RL makes it possible to learn mediation strategies
  - Answers the needs of a majority of users
  (profiles)
  - Designer-oriented → user-oriented
  - Incremental approach
  - Implemented solution
- Limits
  - User is partially observable, especially through
  imperfect sensors, like speech recognition
  - Degradation of performance for more complex tasks
26Future works
- Use other probabilistic models and methods
  - Learn on a pre-established policy
  - Learn the model (Sutton's Dyna-Q, classifiers)
  - POMDP approach (modified Q-learning, Baxter's
  gradient)
- For more generic / complex tasks
  - Abstraction scalability: change the abstract
  state space for a better guidance of the process
  in the real state
  - Hierarchical decomposition (H-MDP, H-POMDP) with
  attribute dependencies management
  (e.g. City → Possible Company → Specific options)
27Thank you for your attention. Any questions?
28References
- (Allen et al. 2000) Allen J., Byron D., Dzikovska
M., Ferguson G., Galescu L., Stent A., An
Architecture for a Generic Dialogue Shell. In
Natural Language Engineering, Cambridge
University Press, vol. 6, 2000.
- (Young 1999) Young S., Probabilistic Methods in
Spoken Dialog Systems. In Royal Society, London,
September 1999.
- (Levin et al. 1998) Levin E., Pieraccini R. and
Eckert W., Using Markov Decision Process for
Learning Dialogue Strategies. In Proceedings of
ICASSP'98, Seattle, USA, 1998.
- (Goddeau et al. 1996) Goddeau D., Meng H.,
Polifroni J., Seneff S., Busayapongchai S., A
Form-Based Dialogue Manager For Spoken Language
Applications. In Proceedings of ICSLP'96,
Philadelphia, 1996.
- (Sutton & Barto 1998) Sutton R. S. and Barto A. G.,
Reinforcement Learning: An Introduction. MIT
Press, Cambridge, MA, 1998.
- (Watkins 1989) Watkins C., Learning from Delayed
Rewards. PhD thesis, King's College,
University of Cambridge, England, 1989.
- (Shardanand & Maes 1995) Shardanand U. and Maes
P., Social Information Filtering: Algorithms for
Automating "Word of Mouth". In Proceedings of ACM
CHI'95, Vol. 1, pp. 210-217, 1995.
29A trace in the Abstract State Space
<A, ? ?> <A, ?, 0> <A, ? > <A, ? >
<A, A ?> <A, A, 0> <A, A > <A, A >
<F, ? ?> <F, ?, 0> <F, ? > <F, ? >
<F, A ?> <F, A, 0> <F, A > <F, A >
<?, ? ?> <?, ? 0> <?, ? > <?, ? >
<?, A ?> <?, A, 0> <?, A > <?, A >
<A, F ?> <A, F, 0> <A, F > <A, F >
<?, F ?> <?, F, 0> <?, F > <?, F >
<F, F ?> <F, F, 0> <F, F > <F, F >
30Implementation of the mediator
Agent Body SmallMu
Database
User
31Roles and Service Classes