Intelligent Agents: Technology and Applications - Multi-agent Learning

1
Intelligent Agents: Technology and Applications
Multi-agent Learning
  • IST 597B
  • Spring 2003
  • John Yen

2
Learning Objectives
  • How to identify goals for agent projects?
  • How to design agents?
  • How to identify risks/obstacles early on?

3
Multi-Agent Learning
4
Multi-Agent Learning
  • The learned behavior can be used as a basis for
    more complex interactive behavior
  • Enables agents to participate in higher-level
    collaborative or adversarial learning situations
  • Learning would not be possible if the agent were
    isolated

5
Examples
  • Examples of single agent learning in a
    multi-agent environment
  • Reinforcement Learning agent which incorporates
    information gathered by another agent (Tan, 93)
  • Agent learning the negotiating techniques of
    another using Bayesian learning (Zeng and Sycara, 96)
  • Class of multi-agent learning in which an agent
    attempts to model another agent

6
Examples
  • Training scenario in which a novice agent learns
    from a knowledgeable agent (Clouse, 96)
  • Common to all these examples is that the learning
    agent is interacting with other agents

7
Predator/Prey (Pursuit) Domain
  • Introduced by Benda et al. (86)
  • Four predators and one prey
  • Goal: to capture (or surround) the prey
  • Not a complex real-world domain; a toy domain that
    helps concretize concepts

8
Predator/Prey (Pursuit) Domain
9
Taxonomy of MAS
  • Taxonomy organized along
  • the degree of heterogeneity, and
  • the degree of communication
  • Homogeneous, Non-Communicating Agents
  • Heterogeneous, Non-Communicating Agents
  • Homogeneous, Communicating Agents
  • Heterogeneous, Communicating Agents

10
Taxonomy of MAS
11
Taxonomy of MAS
12
1. Homogeneous, Non-Communicating Agents
  • All agents have the same internal structure
  • Goals
  • Domain knowledge
  • Actions
  • The only difference is their sensory input and
    the actions that they take
  • They are situated differently in the world
  • Korf (1992) introduces a policy for each
    predator based on an attractive force toward the
    prey and a repulsive force from other predators
    (see the sketch below)
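
A minimal sketch of such a force-based choice on a grid, in Python. The Manhattan-distance scoring, the repulsion weight, and the tie-breaking are illustrative assumptions, not Korf's exact formulation.

    # Illustrative force-based predator policy (not Korf's exact formulation):
    # each candidate move is scored by attraction toward the prey minus a
    # penalty for crowding the other predators; the best-scoring move is taken.

    def manhattan(a, b):
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    def choose_move(predator, prey, other_predators, repulsion_weight=0.5):
        moves = [(0, 1), (0, -1), (1, 0), (-1, 0), (0, 0)]
        def score(move):
            pos = (predator[0] + move[0], predator[1] + move[1])
            attraction = -manhattan(pos, prey)                           # closer to the prey is better
            repulsion = sum(manhattan(pos, p) for p in other_predators)  # farther from the other predators is better
            return attraction + repulsion_weight * repulsion
        return max(moves, key=score)

    # Example: predator at (0, 0), prey at (3, 2), two other predators nearby.
    print(choose_move((0, 0), (3, 2), [(1, 0), (0, 1)]))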

13
1. Homogeneous, Non-Communicating Agents
  • Korf concludes that explicit cooperation is not
    necessary
  • Haynes and Sen show that Korf's heuristic does not
    work for certain instantiations of the domain

14
1. Homogeneous, Non-Communicating Agents
  • Issues
  • Reactive vs. deliberative agents
  • Local vs. global perspective
  • Modeling of other agents
  • How to affect others
  • Further learning opportunities

15
1 Reactive vs. Deliberative Agents
  • Reactive agents do not maintain an internal state
    and simply retrieve pre-set behaviors
  • Deliberative agents maintain an internal state
    and behave by searching through a space of
    behaviors, predicting the action of other agents
    and the effect of actions

16
2 Local vs. Global Perspective
  • How much sensory input should be available to
    agents? (observability)
  • Having a global view might lead to sub-optimal
    results
  • Agents with less knowledge can sometimes perform
    better: 'Ignorance is Bliss'

17
3 Modeling of Other Agents
  • Since agents are identical, they can predict each
    other's actions given the sensory input
  • The Recursive Modeling Method (RMM) models the
    internal state of another agent in order to
    predict its actions
  • Each predator bases its move on the predicted
    move of other predators and vice versa
  • Since reasoning can recurse indefinitely, it
    should be limited in terms of time or recursion
    depth (see the sketch below)
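
A minimal sketch of depth-limited recursive modeling for two identical agents, in Python. The one-dimensional setting, the default assumption at the recursion limit, and the step rule are illustrative assumptions, not the RMM formalism itself.

    # Illustrative depth-limited recursive modeling: to choose its own move, an
    # agent predicts the other agent's move by simulating the other agent's
    # (identical) reasoning, which in turn models this agent, down to a fixed depth.

    def choose_move(my_pos, other_pos, depth):
        if depth == 0:
            # Recursion limit reached: fall back to assuming the other agent stays put.
            predicted_other = other_pos
        else:
            # Simulate the other agent's reasoning one level deeper (roles swapped).
            predicted_other = other_pos + choose_move(other_pos, my_pos, depth - 1)
        # Simple reactive rule on a line: step toward the other agent's predicted position.
        if predicted_other > my_pos:
            return 1
        if predicted_other < my_pos:
            return -1
        return 0

    # Example: two agents at positions 2 and 7, reasoning 3 levels deep.
    print(choose_move(2, 7, depth=3))   # prints 1: step toward the other agent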

18
3 Modeling of Other Agents
  • If agents know too much, RMM could recurse
    indefinitely
  • For coordination to be possible, some potential
    knowledge must be ignored
  • Schmidhuber (1996) shows that agents can
    cooperate without modeling each other
  • They consider each other as part of the
    environment

19
4 How to Affect Others
  • Without communication, agents cannot affect each
    other directly
  • Can affect each other indirectly in several ways
  • They can be sensed by other agents
  • Change the state of another agent (e.g. by
    pushing it)
  • Affect each other by stigmergy (Beckers et al., 94)

20
4 How to Affect Others
  • Active stigmergy:
  • an agent alters the environment so as to affect
    the sensory input of another agent, e.g. an agent
    might leave a marker for other agents to observe
  • Passive stigmergy:
  • altering the environment so that the effect of
    another agent's actions changes. If an agent turns
    off the main water valve of a building, the effect
    of another agent turning on the faucet is altered

21
4 How to Affect Others
  • Example: a number of robots in an area with many
    pucks scattered around. Robots reactively move
    straight (turning at walls) until they are
    pushing 3 or more pucks; then they back up and
    turn away (see the sketch below)
  • Although robots do not communicate, they can
    collect the pucks in a single pile over time
  • When a robot approaches an existing pile, it adds
    the pucks and turns away
  • A robot approaching an existing pile obliquely
    might take a puck away, but over time the desired
    result is accomplished
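
A minimal sketch of this reactive controller in Python, assuming a simulator that reports how many pucks the robot is currently pushing and whether a wall is directly ahead; the sensor values and command names are illustrative.

    # Illustrative reactive puck-clustering controller: no communication, no map,
    # no internal state; the robot only reacts to what it senses right now.

    def control_step(pucks_being_pushed, wall_ahead):
        """Return a primitive command for one control cycle."""
        if pucks_being_pushed >= 3:
            return "back_up_and_turn"    # leave the pucks where they are, growing a pile
        if wall_ahead:
            return "turn"                # bounce off walls
        return "move_straight"           # keep pushing whatever pucks are in front

    # Example trace of sensor readings mapped to commands.
    for pucks, wall in [(0, False), (2, False), (3, False), (1, True)]:
        print(pucks, wall, "->", control_step(pucks, wall))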

22
5 Further Learning Opportunities
  • An agent might try to learn to take actions that
    will not help it directly in the current
    situation, but may allow other agents to be more
    effective in the future.
  • In traditional RL, if an action leads to a reward
    for another agent, the acting agent may have no
    way of reinforcing that action

23
2. Heterogeneous, Non-Communicating Agents
  • Can be heterogeneous in any of following
  • Goals
  • Actions
  • Domain knowledge
  • In the pursuit domain, the prey can be modeled as
    an agent
  • Haynes et al. have used a GA and case-based
    reasoning to make predators learn to cooperate in
    the absence of communication

24
2. Heterogeneous, Non-Communicating Agents
  • They also explore the possibility of evolving
    both predators and the prey
  • Predators use Korf's greedy heuristic
  • Though one might think this will result in
    repeated improvement of predator and prey with no
    convergence, a prey behavior emerges that always
    succeeds
  • Prey simply moves in a constant straight line
  • Haynes et al. conclude that Korf's greedy
    algorithm relies on random prey movement

25
2. Heterogeneous, Non-Communicating Agents
  • Issues
  • Benevolence vs. competitiveness
  • Fixed vs. learning agents
  • Modeling of other agents
  • Resource management
  • Social conventions

26
1 Benevolence vs. Competitiveness
  • Agents can be benevolent even if they have
    different goals (if they are willing to help each
    other)
  • Selfish agents are considered more effective and
    biologically plausible
  • Agents cooperate because it is in their own best
    interest

27
1 Benevolence vs. Competitiveness
  • Prisoner's dilemma: two burglars are captured.
    Each has to choose whether or not to confess and
    implicate the other. If neither confesses, they
    will both serve 1 year. If both confess, they will
    both serve 10 years. If one confesses and the
    other does not, the one who confessed will go free
    and the other will serve 20 years

28
1 Benevolence vs. Competitiveness
29
1 Benevolence vs. Competitiveness
  • Each agent will decide to confess to maximize its
    own interest
  • If both confess, they will get 10 years each
  • If they had acted irrationally and kept quiet,
    they would each get 1 year (see the sketch below)
  • Mor et al. (1995) show that in the repeated
    prisoner's dilemma, cooperative behavior can emerge
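
A minimal Python sketch that encodes these payoffs (years served, so lower is better) and shows why confessing is each agent's dominant strategy; the dictionary encoding and function names are just one illustrative representation.

    # Years served for (agent_0_choice, agent_1_choice); lower is better.
    YEARS = {
        ("quiet",   "quiet"):   (1, 1),
        ("quiet",   "confess"): (20, 0),
        ("confess", "quiet"):   (0, 20),
        ("confess", "confess"): (10, 10),
    }

    def best_reply(other_choice, me):
        """Choose the option that minimizes this agent's own sentence, given the other's choice."""
        def my_years(choice):
            pair = (choice, other_choice) if me == 0 else (other_choice, choice)
            return YEARS[pair][me]
        return min(("quiet", "confess"), key=my_years)

    # Whatever the other agent does, confessing is the better individual reply ...
    for other in ("quiet", "confess"):
        print("other:", other, "-> best reply:", best_reply(other, me=0))
    # ... so both confess and serve 10 years each, although (quiet, quiet) gives only 1 year each.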

30
1 Benevolence vs. Competitiveness
  • In zero-sum games cooperation is not sensible
  • If a third dimension were to be added to the
    taxonomy, besides the degree of heterogeneity and
    communication, it would be benevolence vs.
    competitiveness

31
2 Fixed vs. Learning Agents
  • Learning agents desirable in dynamic environments
  • Competitive vs. cooperative learning
  • Possibility of an 'arms race' in competitive
    learning: competing agents continually adapt to
    each other in more and more specialized ways,
    never stabilizing at a good behavior

32
2 Fixed vs. Learning Agents
  • Credit-assignment problem: when the performance of
    an agent improves, it is not clear whether the
    improvement is due to better behavior by the agent
    or worse behavior by the opponent. The same
    problem arises if the performance of an agent gets
    worse.
  • One solution is to fix one agent while allowing
    the other to learn, and then to switch. This,
    however, can encourage an arms race even more.

33
3 Modeling of other agents
  • Goals, actions and domain knowledge of other
    agents may be unknown and need modeling
  • Without communication, modeling is done strictly
    through observation
  • RMM is good for modeling the states of homogeneous
    agents
  • Tambe (1995) takes it one step further, studying
    how agents can learn models of teams of agents

34
4 Resource Management
  • Examples
  • Network traffic problem: several agents send
    information through the same network (GA)
  • Load balancing: several users have a limited
    amount of computing power to share among them (RL)
  • Braess' paradox (Glance et al., 1995): adding
    more resources to a network but getting worse
    performance

35
5 Social Conventions
  • Imagine you are to meet a friend in Paris. You
    both arrive on the same day but were unable to
    get in touch to set a time and place. Where will
    you go and when?
  • 75% of the audience at the AAAI-95 Symposium on
    Active Learning answered (without prior
    communication) that they would go to the Eiffel
    Tower at noon.
  • Even without communication, agents are able to
    coordinate their actions

36
3. Homogeneous, Communicating Agents
  • Communication can be either broadcast or
    point-to-point
  • Issues
  • Distributed sensing
  • Distributed vision project (Matsuyama, 1997)
  • Trafficopter system (Moukas et al., 1997)
  • Communication content
  • What should they communicate: states or goals?
  • Further learning opportunities
  • When to communicate?

37
4. Heterogeneous, Communicating Agents
  • Tradeoff between cost and freedom
  • Osawa suggests predators should go through 4
    phases
  • Autonomy, communication, negotiation, and control
  • When they stop making progress using one
    strategy, they should move to the next, more
    expensive strategy
  • Increasing order of cost (decreasing order of
    freedom)

38
4. Heterogeneous, Communicating Agents
  • Important issues
  • Understanding each other
  • Planning communication acts
  • Negotiation
  • Commitment/decommitment
  • Further learning opportunities

39
1 Understanding Each Other
  • Need some set protocol for communication
  • Aspects of the protocol
  • Information content: KIF (Genesereth, 92)
  • Message format: KQML (Finin, 94)
  • Coordination: COOL (Barbuceanu, 95)

40
2 Planning Communication Acts
  • The theory of communication as action is called
    speech acts
  • Communication acts have preconditions and effects
  • Effects might be to alter an agent's belief about
    the state of another agent or agents

41
3 Negotiation
  • Negotiating MAS can be designed based on the law
    of supply and demand
  • Contract nets (Smith, 1990)
  • Agents have their own goals, are self-interested,
    and have limited reasoning resources. They bid to
    accept tasks from other agents and can then
    either perform the task or subcontract it to
    another agent. Agents must pay to contract out
    their tasks.

42
3 Negotiation
  • MAS controlling air temperature in different
    rooms of a building
  • An agent can set the thermostat to any
    temperature. Depending on the actual air
    temperature, the agent can buy hot or cold air
    from another room that has an excess. At the same
    time the agent can sell the excess air at the
    current temperature to other rooms. Modeling the
    loss of heat in transfer from one room to
    another, the agents try to buy and sell at the
    best possible prices (see the sketch below).
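
A toy Python sketch of one round of this kind of market-based exchange, assuming each room agent posts a bid or ask price and surplus heat is matched greedily from the cheapest seller; the rooms, prices, loss factor, and matching rule are illustrative assumptions, not a specific negotiation protocol.

    # Toy market round: rooms above their setpoint offer surplus heat for sale,
    # rooms below their setpoint bid to buy it; trades go to the cheapest
    # acceptable seller first, and each transfer loses a fraction of the heat.

    TRANSFER_LOSS = 0.2   # fraction of heat lost moving air between rooms (illustrative)

    rooms = {
        "office":  {"temp": 24.0, "setpoint": 21.0, "ask_price": 1.0},
        "lab":     {"temp": 18.0, "setpoint": 21.0, "bid_price": 2.0},
        "hallway": {"temp": 22.0, "setpoint": 21.0, "ask_price": 3.0},
    }

    def surplus(room):   # units of heat above (positive) or below (negative) the setpoint
        return room["temp"] - room["setpoint"]

    sellers = sorted((name for name, r in rooms.items() if surplus(r) > 0),
                     key=lambda name: rooms[name]["ask_price"])
    buyers = [name for name, r in rooms.items() if surplus(r) < 0]

    for buyer in buyers:
        need = -surplus(rooms[buyer])
        for seller in sellers:
            if need <= 0:
                break
            available = surplus(rooms[seller])
            if available <= 0 or rooms[seller]["ask_price"] > rooms[buyer]["bid_price"]:
                continue   # seller has nothing left, or is too expensive for this buyer
            traded = min(need, available)
            rooms[seller]["temp"] -= traded
            rooms[buyer]["temp"] += traded * (1 - TRANSFER_LOSS)   # buyer receives less due to loss
            need -= traded
            print(buyer, "buys", traded, "units from", seller)

    print({name: round(r["temp"], 1) for name, r in rooms.items()})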

43
4 Commitment/Decommitment
  • An agent agrees to pursue a given goal regardless
    of how much it serves its own interest
  • Commitments can make systems run more smoothly by
    making agents trust each other
  • It is unclear how to make self-interested agents
    commit to others
  • Belief/desire/intention (BDI): a popular technique
    for modeling other agents
  • Used in the OASIS air traffic control system

44
5 Further Learning Opportunities
  • Instead of predefining a protocol, allow the
    agents to learn for themselves what to
    communicate and how to interpret it
  • Possible result would be more efficient
    communication

45
Q Learning
  • Assess state-action pairs (s, a) using a Q value
  • Learn the Q value using rewards/feedback
  • A reward received at time t is discounted back to
    previous state-action pairs (using a discount
    factor)
  • The goal of learning is to find an optimal policy
    for selecting actions.

46
The Q value
The Q value of taking action a in state x:

  Q(x, a) = R(x, a) + γ Σ_y P_xy(a) V(y)

R: reward. P_xy(a): the probability of reaching state
y from x by taking action a. γ (gamma): discount
factor (between 0 and 1). V(y): the expected total
discounted return starting in y and following the
policy. Policy: a sequence of actions.
47
The expected total discounted return V(x) for a state
x is the maximal Q value among all actions that can be
taken in that state (following the rest of the
policy):

  V(x) = max_a Q(x, a)
48
(No Transcript)
49
Learning rule for the Q value:

  Q(x, a) := Q(x, a) + α (r + γ V(y) - Q(x, a))

α (alpha): learning rate
50

The learning algorithm (Q-learning with eligibility
traces Tr and trace-decay parameter λ):

1. Q(x, a) := 0 and Tr(x, a) := 0, for all x and a
2. Do Forever:
   (a) x := the current state
   (b) Choose an action a that maximizes Q(x, a) over
       all a
   (c) Carry out action a in the world. Let the
       short-term reward be r, and the new state be y
   (d) e' := r + γ V(y) - V(x)
   (e) e := r + γ V(y) - Q(x, a)
   (f) For each state-action pair (x', a') do
       Tr(x', a') := λ γ Tr(x', a')
       Q(x', a') := Q(x', a') + α Tr(x', a') e'
   (g) Q(x, a) := Q(x, a) + α e
   (h) Tr(x, a) := Tr(x, a) + 1
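
A minimal, self-contained Python sketch of the simpler one-step version of this update (no eligibility traces), run on a toy two-state problem; the environment, parameter values, and the epsilon-greedy exploration (standing in for the Boltzmann rule on the next slide) are illustrative assumptions.

    import random

    # One-step tabular Q-learning on a toy 2-state, 2-action problem.
    ALPHA, GAMMA, STEPS = 0.1, 0.9, 2000
    STATES, ACTIONS = (0, 1), (0, 1)
    Q = {(x, a): 0.0 for x in STATES for a in ACTIONS}

    def step(state, action):
        """Toy dynamics: action 1 moves to state 1, which is the only rewarding state."""
        next_state = action
        reward = 1.0 if next_state == 1 else 0.0
        return reward, next_state

    x = 0
    for _ in range(STEPS):
        if random.random() < 0.1:                              # occasional exploration
            a = random.choice(ACTIONS)
        else:                                                  # otherwise act greedily on Q
            a = max(ACTIONS, key=lambda act: Q[(x, act)])
        r, y = step(x, a)
        V_y = max(Q[(y, act)] for act in ACTIONS)              # V(y) = max_a' Q(y, a')
        Q[(x, a)] += ALPHA * (r + GAMMA * V_y - Q[(x, a)])     # the learning rule from slide 49
        x = y

    print({k: round(v, 2) for k, v in Q.items()})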
51
Probability for the agent to select action a_i, based
on the Q values (Boltzmann exploration):

  P(a_i | x) = e^(Q(x, a_i)/T) / Σ_j e^(Q(x, a_j)/T)

T: temperature parameter that determines the
randomness of decisions.
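
A small Python sketch of this Boltzmann (softmax) selection rule; the Q values and temperatures used in the example are illustrative.

    import math, random

    def boltzmann_choice(q_values, temperature):
        """Pick an action index with probability proportional to exp(Q/T)."""
        weights = [math.exp(q / temperature) for q in q_values]
        total = sum(weights)
        probabilities = [w / total for w in weights]
        chosen = random.choices(range(len(q_values)), weights=probabilities)[0]
        return chosen, probabilities

    # High temperature -> nearly uniform (random); low temperature -> nearly greedy.
    q = [1.0, 2.0, 0.5]
    for T in (10.0, 0.5):
        _, probs = boltzmann_choice(q, T)
        print("T =", T, [round(p, 2) for p in probs])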
52
Towards Collaborative and Adversarial Learning: A
Case Study in Robotic Soccer
  • Peter Stone and Manuela Veloso

53
Introduction
  • Layered learning: develop complex multi-agent
    behaviors from simple ones
  • A simple multi-agent behavior in robotic soccer:
    shooting a moving ball
  • Passer
  • Shooter
  • Behavior to be learned: when the shooter should
    begin to move (the shooting policy)

54
Simple Behavior
55
Parameters
  • Ball speed (fixed vs. variable)
  • Ball trajectory (fixed vs. variable)
  • Goal location (fixed vs. variable)
  • Action quadrant (fixed vs. variable)

56
Parameters
57
Fixed Ball Motion
  • Simple shooting policy: begin accelerating when
    the ball's distance to its projected point of
    intersection with the agent's path reaches 110
    units
  • 100% success rate if the shooter position is fixed
  • 61% success rate if the shooter position is variable
  • Use a neural network
  • Inputs to the NN (coordinate-independent):
  • Ball distance
  • Agent distance
  • Heading offset
  • Output: 1 or 0 (shot successful or not)
  • Use a random shooting policy for training (see the
    sketch below)
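
A rough Python sketch of how the learned shooting policy could be queried at run time, assuming a trained classifier over the three inputs listed above; the predict_success surrogate, its coefficients, and the decision threshold are illustrative stand-ins, not the paper's actual network or training setup.

    # Illustrative use of the learned shooting policy: at each time step the shooter
    # feeds the three coordinate-independent features to the trained classifier and
    # starts its approach once the predicted chance of success is high enough.

    def predict_success(ball_distance, agent_distance, heading_offset):
        """Stand-in for the trained neural network; returns an estimated success probability."""
        score = 1.5 - 0.01 * abs(ball_distance - 110) - 0.005 * agent_distance - 0.02 * abs(heading_offset)
        return max(0.0, min(1.0, score))

    def should_start_moving(ball_distance, agent_distance, heading_offset, threshold=0.8):
        return predict_success(ball_distance, agent_distance, heading_offset) >= threshold

    # Example: as the ball closes in along its trajectory, start moving when the predictor says so.
    for ball_distance in (200, 150, 115, 108):
        print(ball_distance, should_start_moving(ball_distance, agent_distance=60, heading_offset=10))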

58
Neural Network
59
Results
60
Varying Ball Speed
  • Add a fourth input to the NN: ball speed

61
Varying the Ball's Trajectory
  • Use the same shooting policy
  • Use another NN to determine the direction the
    shooter should steer (the shooter's aiming policy)

62
Moving the Goal
  • Can think of it as aiming for different parts of
    the goal
  • Change nothing but the shooter's knowledge of the
    goal location

63
Cooperative Learning
  • Passing a moving ball
  • Passer: where to aim the pass
  • Shooter: where to position itself

64
Cooperative Learning
65
Adversarial Learning
66
References
  • Peter Stone and Manuela Veloso, 2000, Multi-Agent
    Systems: A Survey from a Machine Learning
    Perspective
  • Ming Tan, 1993, Multi-Agent Reinforcement
    Learning: Independent vs. Cooperative Agents
  • Peter Stone and Manuela Veloso, 1998, Towards
    Collaborative and Adversarial Learning: A Case
    Study in Robotic Soccer