Learning of Mediation Strategies for Heterogeneous Agents Cooperation - PowerPoint PPT Presentation

Title:

Learning of Mediation Strategies for Heterogeneous Agents Cooperation

Description:

Learning of Mediation Strategies for Heterogeneous Agents Cooperation R. Charton, A. Boyer and F. Charpillet Maia Team - LORIA France ICTAI'03 Sacramento, CA ... – PowerPoint PPT presentation



1
Learning of Mediation Strategies for
Heterogeneous Agents Cooperation
  • R. Charton, A. Boyer and F. Charpillet
  • Maia Team - LORIA France
  • ICTAI'03, Sacramento, CA, USA, November 4th,
    2003

2
Context of our work
  • Industrial collaboration for the design of
    adaptive services that are multimedia,
    interactive, and aimed at the general public.
  • Focus on Information Retrieval assistance
  • Constraints:
  • User: occasional, novice
  • Information Source: ownership, costs
  • Goal: to enhance the service quality

3
Cooperation in heterogeneous multi-agent systems
  • Agents of different natures: human, software,
    robots

How to make these agents cooperate? That is, achieve
together application goals that satisfy a subset
of the agents.
4
Presentation Overview: Learning of Mediation
Strategies for Heterogeneous Agents Cooperation
  • Typical example of interaction
  • Mediator and Mediation Strategies
  • Towards an MDP-based Mediation
  • Our prototype of Mediator
  • Experiments and results

5
An example problem: a flight booking system
[Figure; surviving label: Customer]
6
Role of the mediator agent
  • The mediator has to perform a useful task:
  • Build the query that best matches the user's goal
  • Provide relevant results to the user
  • Maximize a utility approximation:
  • User satisfaction to be maximized
  • Information Source cost to be minimized
  • At any time, the mediator can:
  • Ask the user a question about the query
  • Send the query to the information source
  • Propose a limited number of results to the user

In return, it perceives the other agents' answers:
values, results, selections, rejections
7
Mediation and Mediation strategies
A mediation is a sequence of question-answer
interactions between the agents, directed by the
mediator. It is successful if the user gets the
relevant results or if the mediator discovers
that the source can't give any result.
A mediation strategy specifies which action the
mediator must select to control the mediation,
according to the current state of the
interactions.
  • Now, the question is how to:
  • produce the mediator's behavior?
  • optimize the service quality?
  • This requires finding an optimal mediation
    strategy

8
Progressive Query building
[Figure: query precision (Totally Unknown, Partially specified,
Sufficiently specified, Fully specified) against the number of
interactions]
9
Requirements for the Mediator
  • Management of uncertainty and of imperfect
    knowledge
  • Agents:
  • users may misunderstand the questions
  • users may have only partial knowledge of their needs
  • Environment:
  • noise during communication
  • imperfect sensors (for instance speech
    recognition)
  • This requires an adaptive behavior
  • We propose to model the mediation problem with an
    MDP and to compute a stochastic behavior for the
    mediator.

10
Markov Decision Process (MDP)
  • Stochastic model $\langle S, A, T, R \rangle$
  • Reward $R : S \times A \times S \to \mathbb{R}$
  • Take a decision according to a policy
  • $\pi : S \times A \to [0, 1]$

Computing a mediation strategy amounts to computing a
stochastic policy
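A stochastic policy assigns to each (state, action) pair a probability of choosing that action. The sketch below is only an illustration: the state names, action names, and probabilities are hypothetical and do not come from the slides. It checks that a policy is well formed and samples one action from it.

```python
import random

# Hypothetical toy policy (not from the slides):
# pi[state][action] = probability of choosing `action` in `state`.
pi = {
    "query_unknown": {"ask_user": 0.9, "ask_source": 0.1},
    "query_partial": {"ask_user": 0.5, "ask_source": 0.5},
}


def is_valid_policy(policy, tol=1e-9):
    """Return True if, in every state, probabilities are in [0, 1] and sum to 1."""
    return all(
        all(0.0 <= p <= 1.0 for p in dist.values())
        and abs(sum(dist.values()) - 1.0) <= tol
        for dist in policy.values()
    )


def sample_action(policy, state, rng=random):
    """Draw one action for `state` according to the policy's probabilities."""
    actions = list(policy[state])
    weights = [policy[state][a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]
```

Passing a seeded `random.Random` instance as `rng` makes the sampling reproducible.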
11
Modeling of the flight booking example
  • Define the model:
  • S: state space
  • A: mediator's actions
  • T: transitions
  • R: rewards

12
States: how to describe goals and objects?
  • Form-filling approach (Goddeau et al. 1996)
  • Queries and source objects are described within a
    referential. The referential is built on a set of
    attributes
  • $Ref = \{At_1, \dots, At_m\}$
  • Example of referential:
  • Departure: London, Geneva, Paris, Berlin, ...
  • Arrival: Sacramento, Beijing, Moscow, ...
  • Class: Business, Normal, Economic, ...

13
State space S
[Figure labels: User, Source, Mediator]
$S = S_U \times S_R$
14
State abstraction
  • The size of the state space S is $(2^n + 1) \times (2i)^m$,
    where
  • n: total count of objects of the information
    source
  • m: number of attributes
  • i: average number of values per attribute

→ The size of the abstract state space is $4 \times 3^m$
15
Actions of the Mediator
[Figure labels: User, Source, Mediator]
16
Rewards
  • Rewards can be obtained

[Figure labels: User, Source, Mediator]
17
Example of mediation with the flight booking
service
State s | Abstraction s | Mediator Action | Answers | Rewards
<?, ?, ? ?> | <?, ?, ? ?> | Ask user for departure | Paris | 0
<Paris, ?, ? ?> | <A, ?, ? ?> | Ask source for results | 1700 flights | - R Overnum
<Paris, ?, ? nr Max first flights> | <A, ?, ? > | Ask user for destination | Sacramento | 0
<Paris, Sacramento, ? ?> | <A, A, ? ?> | Ask user for flight class | I don't know | 0
<Paris, Sacramento, F ?> | <A, A, F ?> | Ask source for results | 4 flights | 0
<Paris, Sacramento, F 4 flights> | <A, A, F > | Ask user for selection | Selection 2 | R Selection
Colors used: Mediator, User, Source
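The attribute part of the abstraction in the example above can be sketched in a few lines. This is a reading of the example, not the authors' code: it assumes that "?" marks an attribute not yet asked, that "F" marks an attribute the user could not answer ("I don't know"), and that any concrete value is abstracted to "A". The abstraction of the results part is left out because its symbols are not legible in the transcript. The function names are hypothetical.

```python
UNKNOWN = "?"  # attribute not asked yet (as written in the example)
FAILED = "F"   # assumed meaning: the user could not give a value


def abstract_attribute(value):
    """Map one attribute slot to its abstract symbol: '?', 'F', or 'A'."""
    if value in (UNKNOWN, FAILED):
        return value
    return "A"  # assumed meaning: some concrete value is assigned


def abstract_query(query):
    """Abstract the attribute part of a state, slot by slot."""
    return tuple(abstract_attribute(v) for v in query)
```

For instance, `abstract_query(("Paris", "Sacramento", "F"))` gives `("A", "A", "F")`, which matches the attribute part of the last two rows of the example.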
18
Compute the Mediation Strategy
  • Problem: two parts of the model are unknown!
  • T = f(user, information source)
  • R = f(user, information source)

→ Learn the mediation strategy by reinforcement
19
Reinforcement Learning
[Figure; surviving label: Dynamic System]
20
Q-Learning (Watkins 89)
  • Reinforcement Learning method
  • Can be used online
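As a reminder of what the method does, here is a minimal sketch of the standard tabular Q-learning update, $Q(s,a) \leftarrow Q(s,a) + \alpha \, [r + \gamma \max_{a'} Q(s',a') - Q(s,a)]$, together with an epsilon-greedy action choice. The slides do not give the mediator's actual learning rate, discount factor, or exploration scheme, so `alpha`, `gamma`, `epsilon`, and the function names below are illustrative assumptions.

```python
import random
from collections import defaultdict


def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9, terminal=False):
    """Apply one tabular Q-learning update for the transition (s, a, r, s_next)."""
    # Value of the best action in the next state (0 if the episode ended).
    best_next = 0.0 if terminal else max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])


def epsilon_greedy(Q, s, actions, epsilon=0.1, rng=random):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda b: Q[(s, b)])


# Q-table with default value 0 for unseen (state, action) pairs.
Q = defaultdict(float)
```

Because each update only needs the latest transition, the table can be refined after every interaction, which is what makes online use possible.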

21
Mediator Architecture
[Figure components: Mediator Agent; Decision Module (Q-Learning);
User Profile; Task Manager (real state); Interaction Manager;
User / Client Agent; Information Agent Source]
22
Experimentation on the flight-booking
application
  • We trained the mediator on tasks with:
  • 3 attributes (cities of departure/arrival and
    flight class)
  • 4 attributes (+ the time of day for taking off)
  • 5 attributes (+ the airline)

Complexity growth as a function of the number of
attributes.
# of Attributes (m) | # of Abstract states ($4 \cdot 3^m$) | # of Actions ($3m + 2$) | # of Q-Values ($(12m + 8) \cdot 3^m$)
3 | 108 | 11 | 1,188
4 | 324 | 14 | 4,536
5 | 972 | 17 | 16,524
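The figures in the table can be checked with a short script. It reads the column headers as $4 \cdot 3^m$ abstract states, $3m + 2$ actions, and one Q-value per (abstract state, action) pair, which gives $(12m + 8) \cdot 3^m$; the function names are illustrative.

```python
def abstract_states(m):
    """Number of abstract states for m attributes: 4 * 3^m."""
    return 4 * 3**m


def mediator_actions(m):
    """Number of mediator actions for m attributes: 3m + 2."""
    return 3 * m + 2


def q_values(m):
    """One Q-value per (abstract state, action) pair: (12m + 8) * 3^m."""
    return abstract_states(m) * mediator_actions(m)


# Rows of (m, abstract states, actions, Q-values) for the trained tasks.
rows = [(m, abstract_states(m), mediator_actions(m), q_values(m)) for m in (3, 4, 5)]
for row in rows:
    print(*row)
```

The Q-table roughly quadruples with each added attribute, which is consistent with the slower convergence reported for 5 attributes.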
23
Learning results for 3-5 attributes (% of hits)
  • 3 and 4 attributes: 99% of selection (close to
    optimal)
  • 5 attributes: 90% of selection (more time
    required to converge)

24
Learning results for 3-5 attributes (avg. length)
  • 3 and 4 attributes: the minimal length of the
    mediation is reached
  • 5 attributes: longer mediations

25
Conclusion
  • Advantages:
  • MDP + RL allows mediation strategies to be learned
  • Answers the needs of a majority of users
    (profiles)
  • Designer oriented → user oriented
  • Incremental approach
  • Implemented solution
  • Limits:
  • User is only partially observable, especially through
    imperfect sensors, like speech recognition
  • Degradation of performance for more complex tasks

26
Future work
  • Use other probabilistic models and methods:
  • Learn on a pre-established policy
  • Learn the model (Sutton's Dyna-Q, classifiers)
  • POMDP approach (modified Q-learning, Baxter's
    gradient)
  • For more generic / complex tasks:
  • Abstraction and scalability: change the abstract
    state space for a better guidance of the process
    in the real state
  • Hierarchical decomposition (H-MDP, H-POMDP) with
    attribute dependencies management
  • (e.g. City → Possible Company → Specific options)

27
Thank you for your attention. Any questions?
28
References
  • (Allen et al. 2000) Allen J., Byron D., Dzikovska
    M., Ferguson G., Galescu L., Stent A., An
    Architecture for a Generic Dialogue Shell. In
    Natural Language Engineering, Cambridge
    University Press, vol. 6, 2000.
  • (Young 1999) Young S., Probabilistic Methods in
    Spoken Dialog Systems. In Royal Society, London,
    September 1999.
  • (Levin et al. 1998) Levin E., Pieraccini R. and
    Eckert W., Using Markov Decision Process for
    Learning Dialogue Strategies. In Proceedings of
    ICASSP'98, Seattle, USA, 1998.
  • (Goddeau et al. 1996) Goddeau D., Meng H.,
    Polifroni J., Seneff S., Busayapongchai S., A
    Form-Based Dialogue Manager For Spoken Language
    Applications. In Proceedings of ICSLP'96,
    Philadelphia, 1996.
  • (Sutton & Barto 1998) Sutton R. S. and Barto A. G.,
    Reinforcement Learning: An Introduction. MIT
    Press, Cambridge, MA, 1998.
  • (Watkins 1989) Watkins C., Learning from Delayed
    Rewards. PhD thesis, King's College,
    University of Cambridge, England, 1989.
  • (Shardanand & Maes 1995) Shardanand U. and Maes
    P., Social Information Filtering: Algorithms for
    Automating "Word of Mouth". In Proceedings of ACM
    CHI'95, Vol. 1, pp. 210-217, 1995.

29
A trace in the Abstract State Space
<A, ? ?> | <A, ?, 0> | <A, ? > | <A, ? >
<A, A ?> | <A, A, 0> | <A, A > | <A, A >
<F, ? ?> | <F, ?, 0> | <F, ? > | <F, ? >
<F, A ?> | <F, A, 0> | <F, A > | <F, A >
<?, ? ?> | <?, ? 0> | <?, ? > | <?, ? >
<?, A ?> | <?, A, 0> | <?, A > | <?, A >
<A, F ?> | <A, F, 0> | <A, F > | <A, F >
<?, F ?> | <?, F, 0> | <?, F > | <?, F >
<F, F ?> | <F, F, 0> | <F, F > | <F, F >
30
Implementation of the mediator
Agent Body SmallMu
Database
User
31
Roles and Service Classes