Learning of Mediation Strategies for Heterogeneous Agents Cooperation

1
Learning of Mediation Strategies for
Heterogeneous Agents Cooperation

R. Charton, A. Boyer and F. Charpillet
Maia Team - LORIA France
ICTAI'03, Sacramento, CA, USA, November 4th, 2003

2
Context of our work

Industrial collaboration for the design of
adaptive services that are multimedia,
interactive, and aimed at the general public.
Focus: Information Retrieval assistance
Constraints:
User: occasional, novice
Information Source: ownership, costs
Goal: to enhance the service quality

3
Cooperation in heterogeneous multi-agent systems

Agents of different natures: humans, software,
robots

How can these agents be made to cooperate? They must
achieve together application goals that satisfy a subset
of the agents.
4
Presentation Overview

Typical example of interaction
Mediator and Mediation Strategies
Towards an MDP based Mediation
Our prototype of Mediator
Experiments and results

5
An example problem: a flight booking system
[Figure; surviving label: Customer]
6
Role of the mediator agent

The mediator has to perform a useful task:
Build a query that best matches the user's goal
Provide relevant results to the user
Maximize an approximation of the utility:
User satisfaction, to be maximized
Information Source cost, to be minimized

At any time, the mediator can:
Ask the user a question about the query
Send the query to the information source
Propose a limited number of results to the user

In return, it perceives the other agents' answers:
values, results, selections, rejections
7
Mediation and Mediation strategies
A mediation is a sequence of question-answer
interactions between the agents, directed by the
mediator. It is successful if the user gets the
relevant results or if the mediator discovers
that the source cannot give any result.
A mediation strategy specifies which action the
mediator must select to control the mediation,
according to the current state of the
interactions.

Now, the question is how to:
produce the mediator's behavior?
optimize the service quality?

This requires finding an optimal mediation
strategy.

8
Progressive query building
[Figure: query precision (totally unknown, partially specified, sufficiently specified, fully specified) plotted against the number of interactions]
9
Requirements for the Mediator

Management of uncertainty and of imperfect
knowledge
Agents:
users may misunderstand the questions
users may have only partial knowledge of their needs
Environment:
noise during communication
imperfect sensors (for instance speech
recognition)

This requires an adaptive behavior.
We propose to model the mediation problem as an
MDP and to
compute a stochastic behavior for the mediator.

10
Markov Decision Process (MDP)

Stochastic model $\langle S, A, T, R \rangle$

Reward $R : S \times A \times S \to \mathbb{R}$

Take a decision according to a policy
$\pi : S \times A \to [0, 1]$

Computing a mediation strategy amounts to computing a
stochastic policy
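
As an illustration of what a stochastic policy looks like in code, here is a minimal sketch in which each state maps to a probability distribution over actions. The state names, action names, and probabilities are hypothetical and do not come from the slides.

```python
import random

# Hypothetical toy policy: for each state, a probability distribution over actions.
# State and action names are illustrative only.
policy = {
    "query_unknown": {"ask_user": 0.9, "ask_source": 0.1},
    "query_partial": {"ask_user": 0.5, "ask_source": 0.5},
}


def is_valid_policy(pi, tol=1e-9):
    """Check that every state's action probabilities lie in [0, 1] and sum to 1."""
    for dist in pi.values():
        if any(p < 0.0 or p > 1.0 for p in dist.values()):
            return False
        if abs(sum(dist.values()) - 1.0) > tol:
            return False
    return True


def sample_action(pi, state, rng=random):
    """Draw one action for `state` according to the distribution pi(state, .)."""
    actions = list(pi[state].keys())
    weights = list(pi[state].values())
    return rng.choices(actions, weights=weights, k=1)[0]
```

A deterministic strategy is the special case where one action per state has probability 1.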
11
Modeling of the flight booking example

Define the model:
S: State space
A: Mediator's actions
T: Transitions
R: Rewards

12
States: how to describe goals and objects?

Form-filling approach (Goddeau et al. 1996)
Queries and source objects are described within a
referential. The referential is built on a set of
attributes:
$Ref = \{At_1, \dots, At_m\}$

Example of referential:
Departure: {London, Geneva, Paris, Berlin, ...}
Arrival: {Sacramento, Beijing, Moscow, ...}
Class: {Business, Normal, Economic, ...}
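
The form-filling idea can be sketched as follows, assuming a query is a form with one slot per attribute of the referential. The value sets are the ones listed on the slide (which are themselves partial); the `None` encoding for an empty slot and the helper names are assumptions made for illustration.

```python
# Referential: each attribute with the (partial) value domain listed on the slide.
REFERENTIAL = {
    "Departure": {"London", "Geneva", "Paris", "Berlin"},
    "Arrival": {"Sacramento", "Beijing", "Moscow"},
    "Class": {"Business", "Normal", "Economic"},
}

UNKNOWN = None  # assumed encoding for a slot that has not been filled yet


def empty_query():
    """A form with one empty slot per attribute of the referential."""
    return {attribute: UNKNOWN for attribute in REFERENTIAL}


def fill(query, attribute, value):
    """Return a copy of the form with one slot filled, validated against the referential."""
    if attribute not in REFERENTIAL:
        raise KeyError(f"unknown attribute: {attribute}")
    if value not in REFERENTIAL[attribute]:
        raise ValueError(f"{value!r} is not a known value of {attribute}")
    updated = dict(query)
    updated[attribute] = value
    return updated


def matches(query, obj):
    """A source object matches if it agrees with every filled slot of the query."""
    return all(v is UNKNOWN or obj.get(a) == v for a, v in query.items())
```

Because queries and source objects share the same attributes, matching reduces to a slot-by-slot comparison.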

13
State space S
[Figure; surviving labels: User, Source, Mediator]
$S = S_U \times S_R$
14
State abstraction

The size of the state space S is $(2^n + 1) \times (2 + i)^m$
where
n: total count of objects of the information
source
m: number of attributes
i: average number of values per attribute

→ The size of the abstract state space is $4 \times 3^m$
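
To give a feel for the reduction, the sketch below evaluates both sizes. It assumes the slide's formulas read (2^n + 1)(2 + i)^m for the real state space and 4 · 3^m for the abstract one; the sample values n = 10 and i = 4 are hypothetical.

```python
def real_state_space_size(n, m, i):
    """(2**n + 1) * (2 + i)**m, the reading of the slide's formula assumed here."""
    return (2**n + 1) * (2 + i) ** m


def abstract_state_space_size(m):
    """4 * 3**m: depends only on the number of attributes m."""
    return 4 * 3**m


# Hypothetical setting: 10 source objects, 3 attributes, 4 values per attribute.
print(real_state_space_size(n=10, m=3, i=4))  # 221400
print(abstract_state_space_size(m=3))         # 108
```

The abstract size no longer depends on the number of source objects or on the number of values per attribute, which is what keeps a tabular method tractable.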
15
Actions of the Mediator
[Figure; surviving labels: User, Source, Mediator]
16
Rewards
[Figure; surviving labels: User, Source, Mediator]

Rewards can be obtained
17
Example of mediation with the flight booking
service
State s                               Abstraction of s  Mediator action            Answers       Rewards
⟨?, ?, ? | ?⟩                         ⟨?, ?, ? | ?⟩     Ask user for departure     Paris         0
⟨Paris, ?, ? | ?⟩                     ⟨A, ?, ? | ?⟩     Ask source for results     1700 flights  - R_Overnum
⟨Paris, ?, ? | nr_Max first flights⟩  ⟨A, ?, ? | ⟩      Ask user for destination   Sacramento    0
⟨Paris, Sacramento, ? | ?⟩            ⟨A, A, ? | ?⟩     Ask user for flight class  I don't know  0
⟨Paris, Sacramento, F | ?⟩            ⟨A, A, F | ?⟩     Ask source for results     4 flights     0
⟨Paris, Sacramento, F | 4 flights⟩    ⟨A, A, F | ⟩      Ask user for selection     Selection 2   R_Selection
[Legend: colors on the original slide distinguish Mediator, User, and Source]
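
The mapping from a real state to its abstraction in the trace above can be sketched as follows. Attribute values follow the trace: a known value becomes A, an "I don't know" answer is F, and an unasked attribute stays ?. For the result part, only ? and 0 are legible in the transcript, so the labels "ok" and "too_many" and the threshold `NR_MAX` are hypothetical stand-ins.

```python
UNKNOWN, ASSIGNED, FREE = "?", "A", "F"
NR_MAX = 10  # hypothetical cap on the number of results proposed to the user


def abstract_attribute(value):
    """Forget the concrete value; keep only whether the slot is unknown, free, or assigned."""
    if value == UNKNOWN:
        return UNKNOWN
    if value == FREE:
        return FREE
    return ASSIGNED


def abstract_results(count):
    """Abstract the result set by its size; `count` is None before the source is queried."""
    if count is None:
        return "?"
    if count == 0:
        return "0"
    # Hypothetical labels for the two classes whose symbols are lost in the transcript.
    return "ok" if count <= NR_MAX else "too_many"


def abstract_state(attributes, result_count):
    """Map a real state (attribute values, result count) to its abstract state."""
    return tuple(abstract_attribute(v) for v in attributes), abstract_results(result_count)


print(abstract_state(("Paris", "Sacramento", "F"), 4))
```

With three attribute classes and four result classes, this yields the 4 · 3^m abstract states counted earlier in the deck.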
18
Compute the Mediation Strategy

Problem: two parts of the model are unknown!
T = f(user, information source)
R = f(user, information source)

→ Learn the mediation strategy by reinforcement
19
Reinforcement Learning
[Figure; surviving label: Dynamic System]
20
Q-Learning (Watkins 1989)

Reinforcement Learning method
Can be used online
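
The update rule itself did not survive in the transcript. As a reminder, the standard tabular Q-learning update is Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)]. The sketch below implements it together with an epsilon-greedy action choice; the values of α, γ, and ε and the state and action names are hypothetical, not the settings used by the authors.

```python
import random
from collections import defaultdict


def q_update(Q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step; Q maps (state, action) pairs to values."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])


def epsilon_greedy(Q, state, actions, epsilon=0.1, rng=random):
    """Explore with probability epsilon, otherwise pick the action with the highest Q-value."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])


# Usage with hypothetical state and action names.
ACTIONS = ["ask_user", "ask_source"]
Q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
q_update(Q, "s0", "ask_user", reward=1.0, next_state="s1", actions=ACTIONS)
```

Because each update only needs the last observed transition, the rule can be applied after every interaction, which is what makes online use possible.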

21
Mediator Architecture
Mediator Agent
Decision Module (Q-Learning)
User Profile
Task Manager (real state)
Interaction Manager
User / Client Agent
Information Agent Source
22
Experimentation on the flight-booking
application

We trained the mediator on tasks with:
3 attributes (cities of departure/arrival and
flight class)
4 attributes (+ the time of day of take-off)
5 attributes (+ the airline)

Complexity growth as a function of the number of
attributes:
# of attributes (m)  # of abstract states ($4 \cdot 3^m$)  # of actions ($3m + 2$)  # of Q-values ($(12m + 8) \cdot 3^m$)
3                    108                                   11                       1,188
4                    324                                   14                       4,536
5                    972                                   17                       16,524
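
The figures in this table can be re-derived with a few lines of code, assuming the column formulas read 4 · 3^m abstract states and 3m + 2 actions. The number of Q-values is then simply states × actions, which equals (12m + 8) · 3^m.

```python
def complexity_row(m):
    """Return (m, abstract states, actions, Q-values) for m attributes."""
    states = 4 * 3**m
    actions = 3 * m + 2
    return m, states, actions, states * actions


for m in (3, 4, 5):
    print(*complexity_row(m))
```

Each extra attribute multiplies the number of abstract states by three, so the Q-table grows exponentially with m even after abstraction.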
23
Learning results for 3-5 attributes (% of hits)

3 and 4 attributes: 99% of selection (close to
optimal)
5 attributes: 90% of selection (more time
required to converge)

24
Learning results for 3-5 attributes (avg. length)

3 and 4 attributes: the minimal length of the
mediation is reached
5 attributes: longer mediations

25
Conclusion

Advantages:
MDP + RL makes it possible to learn mediation strategies
Answers the needs of a majority of users
(profiles)
Designer-oriented → user-oriented
Incremental approach
Implemented solution

Limits:
The user is only partially observable, especially through
imperfect sensors such as speech recognition
Performance degrades for more complex tasks

26
Future work

Use other probabilistic models and methods:
Learn from a pre-established policy
Learn the model (Sutton's Dyna-Q, classifiers)
POMDP approach (modified Q-learning, Baxter's
gradient)
For more generic / complex tasks:
Abstraction scalability: change the abstract
state space for better guidance of the process
in the real state space
Hierarchical decomposition (H-MDP, H-POMDP) with
management of attribute dependencies
(e.g. City → Possible company → Specific options)

27
Thank you for your attention. Any questions?
28
References

(Allen et al. 2000) Allen J., Byron D., Dzikovska
M., Ferguson G., Galescu L., Stent A., An
Architecture for a Generic Dialogue Shell. In
Natural Language Engineering, Cambridge
University Press, vol. 6, 2000.
(Young 1999) Young S., Probabilistic Methods in
Spoken Dialog Systems. In Royal Society, London,
September 1999.
(Levin et al. 1998) Levin E., Pieraccini R. and
Eckert W., Using Markov Decision Process for
Learning Dialogue Strategies. In Proceedings of
ICASSP'98, Seattle, USA, 1998.
(Goddeau et al. 1996) Goddeau D., Meng H.,
Polifroni J., Seneff S., Busayapongchai S., A
Form-Based Dialogue Manager for Spoken Language
Applications. In Proceedings of ICSLP'96,
Philadelphia, 1996.
(Sutton & Barto 1998) Sutton R. S. and Barto A. G.,
Reinforcement Learning: An Introduction. MIT
Press, Cambridge, MA, 1998.
(Watkins 1989) Watkins C., Learning from Delayed
Rewards. PhD Thesis, King's College,
University of Cambridge, England, 1989.
(Shardanand & Maes 1995) Shardanand U. and Maes
P., Social Information Filtering: Algorithms for
Automating "Word of Mouth". In Proceedings of ACM
CHI'95, Vol. 1, pp. 210-217, 1995.

29
A trace in the Abstract State Space
[Figure: grid of the 36 abstract states for two attributes; the symbols of the last two result classes are missing from the transcript]
⟨A, ? | ?⟩  ⟨A, ? | 0⟩  ⟨A, ? | ⟩  ⟨A, ? | ⟩
⟨A, A | ?⟩  ⟨A, A | 0⟩  ⟨A, A | ⟩  ⟨A, A | ⟩
⟨F, ? | ?⟩  ⟨F, ? | 0⟩  ⟨F, ? | ⟩  ⟨F, ? | ⟩
⟨F, A | ?⟩  ⟨F, A | 0⟩  ⟨F, A | ⟩  ⟨F, A | ⟩
⟨?, ? | ?⟩  ⟨?, ? | 0⟩  ⟨?, ? | ⟩  ⟨?, ? | ⟩
⟨?, A | ?⟩  ⟨?, A | 0⟩  ⟨?, A | ⟩  ⟨?, A | ⟩
⟨A, F | ?⟩  ⟨A, F | 0⟩  ⟨A, F | ⟩  ⟨A, F | ⟩
⟨?, F | ?⟩  ⟨?, F | 0⟩  ⟨?, F | ⟩  ⟨?, F | ⟩
⟨F, F | ?⟩  ⟨F, F | 0⟩  ⟨F, F | ⟩  ⟨F, F | ⟩
30
Implementation of the mediator
Agent Body SmallMu
Database
User
31
Roles and Service Classes
