Title: Intelligent Agents: Technology and Applications - Multi-Agent Learning
Intelligent Agents: Technology and Applications
Multi-Agent Learning
- IST 597B
- Spring 2003
- John Yen
Learning Objectives
- How to identify goals for agent projects?
- How to design agents?
- How to identify risks/obstacles early on?
Multi-Agent Learning
- The learned behavior can be used as a basis for more complex interactive behavior
- Enables the agent to participate in higher-level collaborative or adversarial learning situations
- Such learning would not be possible if the agent were isolated
Examples
- Examples of single-agent learning in a multi-agent environment:
- A reinforcement learning agent that incorporates information gathered by another agent (Tan, 93)
- An agent learning the negotiating techniques of another agent using Bayesian learning (Zeng and Sycara, 96)
- A class of multi-agent learning in which an agent attempts to model another agent
Examples
- A training scenario in which a novice agent learns from a knowledgeable agent (Clouse, 96)
- Common to all of these examples: the learning agent is interacting with other agents
Predator/Prey (Pursuit) Domain
- Introduced by Benda et al. (86)
- Four predators and one prey
- Goal: to capture (or surround) the prey
- Not a complex real-world domain; a toy domain that helps concretize concepts
Taxonomy of MAS
- Taxonomy organized along
- the degree of heterogeneity, and
- the degree of communication
- Homogeneous, Non-Communicating Agents
- Heterogeneous, Non-Communicating Agents
- Homogeneous, Communicating Agents
- Heterogeneous, Communicating Agents
1. Homogeneous, Non-Communicating Agents
- All agents have the same internal structure
- Goals
- Domain knowledge
- Actions
- The only differences are their sensory inputs and the actions that they take
- They are situated differently in the world
- Korf (1992) introduces a policy for each predator based on an attractive force toward the prey and a repulsive force from the other predators (see the sketch below)
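A minimal sketch of such a force-based policy, assuming a simple grid world. The field strengths, the inverse-square repulsion, and the move-selection rule are illustrative assumptions rather than Korf's exact formulation.

```python
import math

# A minimal sketch of a force-based pursuit policy in the spirit of Korf (1992).
# The grid world, the inverse-square repulsion, and the move-selection rule are
# illustrative assumptions, not Korf's exact formulation.

MOVES = {"stay": (0, 0), "north": (0, 1), "south": (0, -1),
         "east": (1, 0), "west": (-1, 0)}

def force_vector(me, prey, predators):
    """Attractive force toward the prey plus repulsive forces from other predators."""
    fx, fy = prey[0] - me[0], prey[1] - me[1]
    norm = math.hypot(fx, fy) or 1.0
    fx, fy = fx / norm, fy / norm              # unit attraction toward the prey
    for other in predators:
        if other == me:
            continue
        dx, dy = me[0] - other[0], me[1] - other[1]
        d = math.hypot(dx, dy) or 1.0
        fx += dx / d ** 3                      # inverse-square repulsion (assumed)
        fy += dy / d ** 3
    return fx, fy

def choose_move(me, prey, predators):
    """Pick the grid move whose direction best matches the combined force."""
    fx, fy = force_vector(me, prey, predators)
    return max(MOVES, key=lambda m: MOVES[m][0] * fx + MOVES[m][1] * fy)

# Example: one predator deciding its next step.
print(choose_move((0, 0), prey=(4, 3), predators=[(0, 0), (1, -1), (-2, 2), (5, 5)]))
```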
1. Homogeneous, Non-Communicating Agents
- Korf concludes that explicit cooperation is not necessary
- Haynes and Sen show that Korf's heuristic does not work for certain instantiations of the domain
1. Homogeneous, Non-Communicating Agents
- Issues
- Reactive vs. deliberative agents
- Local vs. global perspective
- Modeling of other agents
- How to affect others
- Further learning opportunities
1. Reactive vs. Deliberative Agents
- Reactive agents do not maintain an internal state and simply retrieve pre-set behaviors
- Deliberative agents maintain an internal state and behave by searching through a space of behaviors, predicting the actions of other agents and the effects of actions
2. Local vs. Global Perspective
- How much sensory input should be available to agents? (observability)
- Having a global view might lead to sub-optimal results
- Agents with less knowledge can perform better: ignorance is bliss
3. Modeling of Other Agents
- Since agents are identical, they can predict each other's actions given the sensory input
- The Recursive Modeling Method (RMM) models the internal state of another agent in order to predict its actions
- Each predator bases its move on the predicted moves of the other predators, and vice versa
- Since this reasoning can recurse indefinitely, it should be limited in terms of time or recursion depth
3. Modeling of Other Agents
- If agents know too much, RMM could recurse indefinitely
- For coordination to be possible, some potential knowledge must be ignored (e.g., by capping the recursion depth, as sketched below)
- Schmidhuber (1996) shows that agents can cooperate without modeling each other
- They consider each other as part of the environment
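A toy sketch of depth-limited recursive modeling in the spirit of RMM: an agent predicts the other's choice by simulating the same reasoning one level down, and falls back to a default policy when the depth budget runs out. The payoff function, the default move, and the two-agent setup are illustrative assumptions, not the published RMM formulation.

```python
# A toy sketch of depth-limited recursive modeling (in the spirit of RMM).
# The pursuit-style payoff, the default policy at depth 0, and the symmetric
# two-agent setup are illustrative assumptions.

MOVES = ["north", "south", "east", "west", "stay"]

def payoff(my_move, other_move):
    """Assumed joint payoff: predators do better when they do not pick the same move."""
    return 1.0 if my_move != other_move else 0.0

def best_move(depth):
    """Choose my move against a prediction of the other agent's move.

    depth == 0: fall back to a fixed default instead of modeling the other agent.
    depth  > 0: model the other agent as running this same procedure at depth - 1.
    """
    if depth == 0:
        return "north"                       # default policy when modeling stops
    predicted_other = best_move(depth - 1)   # recursive model of the other agent
    return max(MOVES, key=lambda m: payoff(m, predicted_other))

# Capping the recursion keeps the mutual "I think that you think ..." reasoning finite.
print(best_move(depth=3))
```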
4. How to Affect Others
- Without communication, agents cannot affect each other directly
- They can affect each other indirectly in several ways
- They can be sensed by other agents
- They can change the state of another agent (e.g. by pushing it)
- They can affect each other by stigmergy (Beckers, 94)
4. How to Affect Others
- Active stigmergy
- An agent alters the environment so as to affect the sensory input of another agent, e.g. an agent might leave a marker for other agents to observe
- Passive stigmergy
- Altering the environment so that the effect of another agent's actions changes, e.g. if an agent turns off the main water valve of a building, the effect of another agent turning on a faucet is altered
4. How to Affect Others
- Example: a number of robots are in an area with many pucks scattered around. The robots reactively move straight (turning at walls) until they are pushing 3 or more pucks; then they back up and turn away (see the sketch below)
- Although the robots do not communicate, they collect the pucks into a single pile over time
- When a robot approaches an existing pile, it adds its pucks and turns away
- A robot approaching an existing pile obliquely might take a puck away, but over time the desired result is accomplished
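A minimal sketch of this reactive puck-clustering rule. The wraparound grid, the explicit puck count, and the random new heading (standing in for "turn away") are illustrative assumptions; the original robots relied on purely physical sensing and pushing.

```python
import random

# A minimal sketch of the reactive puck-clustering rule described above.
# Grid size, wraparound movement, and the random turn are illustrative assumptions.

GRID = 20           # square arena size (assumed)
PUCK_THRESHOLD = 3  # back up and turn away once this many pucks are being pushed
HEADINGS = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def step(robot, pucks):
    """One control step: push pucks straight ahead, drop them once the threshold is hit."""
    x, y = robot["pos"]
    dx, dy = robot["heading"]
    ahead = ((x + dx) % GRID, (y + dy) % GRID)
    if ahead in pucks:                       # start pushing a puck that is in the way
        pucks.remove(ahead)
        robot["pushing"] += 1
    if robot["pushing"] >= PUCK_THRESHOLD:   # leave the pile here, back up, turn away
        pucks.extend([ahead] * robot["pushing"])
        robot["pushing"] = 0
        robot["heading"] = random.choice(HEADINGS)
    else:                                    # otherwise keep moving straight
        robot["pos"] = ahead

# Example: pucks is a list of grid cells; repeated cells form a pile over time.
robot = {"pos": (0, 0), "heading": (1, 0), "pushing": 0}
pucks = [(3, 0), (5, 0), (9, 0), (12, 4)]
for _ in range(200):
    step(robot, pucks)
```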
5. Further Learning Opportunities
- An agent might try to learn to take actions that will not help it directly in the current situation, but may allow other agents to be more effective in the future
- In traditional RL, if an action leads to a reward obtained by another agent, the acting agent may have no way of reinforcing that action
2. Heterogeneous, Non-Communicating Agents
- Agents can be heterogeneous in any of the following
- Goals
- Actions
- Domain knowledge
- In the pursuit domain, the prey can be modeled as an agent
- Haynes et al. have used GAs and case-based reasoning to make predators learn to cooperate in the absence of communication
2. Heterogeneous, Non-Communicating Agents
- They also explore the possibility of evolving both the predators and the prey
- The predators use Korf's greedy heuristic
- Though one might expect repeated improvement of predators and prey with no convergence, a prey behavior emerges that always succeeds
- The prey simply moves in a constant straight line
- Haynes et al. conclude that Korf's greedy algorithm relies on random prey movement
2. Heterogeneous, Non-Communicating Agents
- Issues
- Benevolence vs. competitiveness
- Fixed vs. learning agents
- Modeling of other agents
- Resource management
- Social conventions
1. Benevolence vs. Competitiveness
- Agents can be benevolent even if they have different goals (if they are willing to help each other)
- Selfish agents are more effective and biologically plausible
- Agents cooperate because it is in their own best interest
1. Benevolence vs. Competitiveness
- Prisoner's dilemma: two burglars are captured. Each has to choose whether or not to confess and implicate the other. If neither confesses, they will both serve 1 year. If both confess, they will both serve 10 years. If one confesses and the other does not, the one who has collaborated goes free and the other serves 20 years
1. Benevolence vs. Competitiveness
- Each agent will decide to confess to maximize its own interest (as the sketch below shows)
- If both confess, they each get 10 years
- If they had acted irrationally and kept quiet, they would each get 1 year
- Mor et al. (1995) show that in the repeated prisoner's dilemma, cooperative behavior can emerge
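The reasoning on this slide can be written out directly from the payoffs above. A small sketch; the "quiet"/"confess" labels and the tuple encoding of sentences are just a convenient representation.

```python
# Prisoner's dilemma payoffs from the slide (years in prison, so lower is better).
YEARS = {
    ("quiet",   "quiet"):   (1, 1),
    ("confess", "confess"): (10, 10),
    ("confess", "quiet"):   (0, 20),
    ("quiet",   "confess"): (20, 0),
}

def best_response(other_choice):
    """My choice that minimizes my own sentence, given the other's fixed choice."""
    return min(("quiet", "confess"), key=lambda mine: YEARS[(mine, other_choice)][0])

# Whatever the other burglar does, confessing is the self-interested choice ...
print(best_response("quiet"), best_response("confess"))      # confess confess
# ... yet mutual confession (10, 10) is worse for both than mutual silence (1, 1).
print(YEARS[("confess", "confess")], YEARS[("quiet", "quiet")])
```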
1. Benevolence vs. Competitiveness
- In zero-sum games, cooperation is not sensible
- If a third dimension were added to the taxonomy, besides the degrees of heterogeneity and communication, it would be benevolence vs. competitiveness
2. Fixed vs. Learning Agents
- Learning agents are desirable in dynamic environments
- Competitive vs. cooperative learning
- Possibility of an arms race in competitive learning: competing agents continually adapt to each other in more and more specialized ways, never stabilizing at a good behavior
2. Fixed vs. Learning Agents
- Credit-assignment problem: when the performance of an agent improves, it is not clear whether the improvement is due to an improvement in the agent's own behavior or to a deterioration in the opponent's behavior. The same problem arises if the performance of an agent gets worse
- One solution is to fix one agent while allowing the other to learn, and then to switch. This encourages the arms race even more!
3. Modeling of Other Agents
- The goals, actions and domain knowledge of other agents may be unknown and need modeling
- Without communication, modeling is done strictly through observation
- RMM is good for modeling the states of homogeneous agents
- Tambe (1995) takes it one step further, studying how agents can learn models of teams of agents
4. Resource Management
- Examples
- Network traffic problem: several agents send information through the same network (GA)
- Load balancing: several users have a limited amount of computing power to share among them (RL)
- Braess' paradox (Glance et al., 1995): adding more resources to a network but getting worse performance
5. Social Conventions
- Imagine you are to meet a friend in Paris. You both arrive on the same day but were unable to get in touch to set a time and place. Where will you go, and when?
- 75% of the audience at the AAAI-95 Symposium on Active Learning answered (without prior communication) that they would go to the Eiffel Tower at noon
- Even without communication, agents are able to coordinate their actions
3. Homogeneous, Communicating Agents
- Communication can be either broadcast or point-to-point
- Issues
- Distributed sensing
- Distributed vision project (Matsuyama, 1997)
- Trafficopter system (Moukas et al., 1997)
- Communication content
- What should they communicate? States, or goals?
- Further learning opportunities
- When to communicate?
4. Heterogeneous, Communicating Agents
- Tradeoff between cost and freedom
- Osawa suggests predators should go through 4 phases
- Autonomy, communication, negotiation, and control
- When they stop making progress using one strategy, they should move on to the next, more expensive strategy
- The phases are in increasing order of cost (and decreasing order of freedom)
4. Heterogeneous, Communicating Agents
- Important issues
- Understanding each other
- Planning communication acts
- Negotiation
- Commitment/decommitment
- Further learning opportunities
1. Understanding Each Other
- Need some set protocol for communication
- Aspects of the protocol
- Information content: KIF (Genesereth, 92)
- Message format: KQML (Finin, 94)
- Coordination: COOL (Barbuceanu, 95)
2. Planning Communication Acts
- The theory of communication as action is called speech acts
- Communication acts have preconditions and effects
- Effects might be to alter an agent's beliefs about the state of another agent or agents (see the toy sketch below)
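A toy sketch of a communicative act treated like a planning operator, with a precondition and an effect on the hearer's beliefs. The "tell" act and the belief-set representation are illustrative assumptions, not a particular agent communication language.

```python
from dataclasses import dataclass, field

# A toy model of a speech act with a precondition and an effect on beliefs.
# The act name and the belief-set representation are illustrative assumptions.

@dataclass
class Agent:
    name: str
    beliefs: set = field(default_factory=set)

def tell(speaker: Agent, hearer: Agent, fact: str) -> bool:
    """Speech act 'tell': precondition is that the speaker believes the fact;
    the effect is that the hearer comes to believe it too."""
    if fact not in speaker.beliefs:      # precondition not met: act is not applicable
        return False
    hearer.beliefs.add(fact)             # effect: alter the hearer's belief state
    return True

passer, shooter = Agent("passer", {"ball_is_mine"}), Agent("shooter")
tell(passer, shooter, "ball_is_mine")
print(shooter.beliefs)                   # {'ball_is_mine'}
```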
3. Negotiation
- Design negotiating MAS based on the law of supply and demand
- Contract nets (Smith, 1990)
- Agents have their own goals, are self-interested, and have limited reasoning resources. They bid to accept tasks from other agents and can then either perform the task or subcontract it to another agent. Agents must pay to contract out their tasks (see the sketch below)
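A minimal sketch of a contract-net style exchange: a manager announces a task, self-interested contractors bid, and the task is awarded to the cheapest bidder. The single-round protocol, the cost model, and all names here are illustrative assumptions, not the full protocol from Smith (1990).

```python
# Minimal contract-net style sketch: announce a task, collect bids, award it.
# The linear cost model and single-round awarding are illustrative assumptions.

class Contractor:
    def __init__(self, name, cost_per_unit):
        self.name = name
        self.cost_per_unit = cost_per_unit

    def bid(self, task):
        """Return the price this self-interested agent asks to perform the task."""
        return task["size"] * self.cost_per_unit

def announce(task, contractors):
    """Manager side: broadcast the task, collect bids, and award the contract."""
    bids = {c.name: c.bid(task) for c in contractors}
    winner = min(bids, key=bids.get)
    return winner, bids[winner]

contractors = [Contractor("A", 3.0), Contractor("B", 2.0), Contractor("C", 4.5)]
task = {"name": "deliver-parts", "size": 10}
print(announce(task, contractors))   # ('B', 20.0): the manager pays B to do the task
```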
3. Negotiation
- MAS controlling the air temperature in different rooms of a building
- An agent can set the thermostat to any temperature. Depending on the actual air temperature, the agent can buy hot or cold air from another room that has an excess. At the same time, the agent can sell excess air at the current temperature to other rooms. Modeling the loss of heat in transfer from one room to another, the agents try to buy and sell at the best possible prices.
4. Commitment/Decommitment
- An agent agrees to pursue a given goal regardless of how much it serves its own interest
- Commitments can make systems run more smoothly by making agents trust each other
- It is unclear how to make self-interested agents commit to others
- Belief/desire/intention (BDI) is a popular technique for modeling other agents
- Used in the OASIS air traffic control system
5. Further Learning Opportunities
- Instead of predefining a protocol, allow the agents to learn for themselves what to communicate and how to interpret it
- A possible result would be more efficient communication
Q-Learning
- Assess state-action pairs (s, a) using a Q value
- Learn the Q value using rewards/feedback
- A reward received at time t is discounted back to previous state-action pairs (using a discount factor)
- The goal of learning is to find an optimal policy for selecting actions
The Q Value

  Q(x, a) = R(x, a) + \gamma \sum_y P_{xy}(a) V(y)

- R: reward
- P_xy(a): the probability of reaching state y from x by taking action a
- gamma: the discount factor (between 0 and 1)
- V(y): the expected total discounted return starting in y and following the policy
- Policy: a sequence of actions
The Expected Total Discounted Return

  V(y) = \max_a Q(y, a)

- V for a state is the maximal Q value among all actions that can be taken in that state (following the rest of the policy)
Learning Rule for the Q Value

  Q(x, a) \leftarrow Q(x, a) + \alpha ( r + \gamma V(y) - Q(x, a) )

- alpha: the learning rate
Q-Learning Algorithm

1. For all states x and actions a, initialize Q(x, a) = 0, and
2. for all states x, initialize V(x) = 0.

Do forever:
(a) Observe the current state x.
(b) Choose an action a that maximizes Q(x, a) over all actions.
(c) Carry out action a in the world. Let the short-term reward be r, and the new state be y.
(d) Update Q(x, a) using the learning rule above.
(e) For each state-action pair updated, keep V(x) = \max_a Q(x, a).
Probability for the agent to select action a_i, based on the Q values (Boltzmann exploration, sketched in code below):

  P(a_i) = \frac{e^{Q(x, a_i)/T}}{\sum_j e^{Q(x, a_j)/T}}

- T: a temperature parameter that determines the randomness of decisions
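To make the loop above concrete, here is a minimal tabular Q-learning sketch with Boltzmann exploration. The five-state chain environment, the reward, and the hyperparameter values (alpha, gamma, T) are illustrative assumptions; only the update rule and the selection probability follow the formulas above.

```python
import math
import random
from collections import defaultdict

# Minimal tabular Q-learning with Boltzmann exploration. The chain environment
# and the hyperparameters are illustrative assumptions.

N_STATES, ACTIONS = 5, ["left", "right"]
ALPHA, GAMMA, T = 0.1, 0.9, 0.5

Q = defaultdict(float)          # Q[(state, action)], initialized to 0

def env_step(x, a):
    """Toy chain world: reward 1 for pushing against the right end, 0 otherwise."""
    y = min(x + 1, N_STATES - 1) if a == "right" else max(x - 1, 0)
    r = 1.0 if (a == "right" and x == N_STATES - 1) else 0.0
    return r, y

def boltzmann(x):
    """P(a_i) = exp(Q(x, a_i)/T) / sum_j exp(Q(x, a_j)/T)."""
    weights = [math.exp(Q[(x, a)] / T) for a in ACTIONS]
    return random.choices(ACTIONS, weights=weights)[0]

def v(x):
    """V(x) = max_a Q(x, a)."""
    return max(Q[(x, a)] for a in ACTIONS)

x = 0
for _ in range(5000):
    a = boltzmann(x)                                      # (b) choose an action
    r, y = env_step(x, a)                                 # (c) act, observe r and y
    Q[(x, a)] += ALPHA * (r + GAMMA * v(y) - Q[(x, a)])   # (d) learning rule
    x = y

print([round(v(s), 2) for s in range(N_STATES)])          # learned state values
```

After training, the state values increase toward the rewarding end of the chain, which is the behavior the discounted-return definitions above describe.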
Towards Collaborative and Adversarial Learning: A Case Study in Robotic Soccer
- Peter Stone and Manuela Veloso
Introduction
- Layered learning: develop complex multi-agent behaviors from simple ones
- A simple multi-agent behavior in robotic soccer: shooting a moving ball
- Passer
- Shooter
- Behavior to be learned: when the shooter should begin to move (the shooting policy)
Simple Behavior
Parameters
- Ball speed (fixed vs. variable)
- Ball trajectory (fixed vs. variable)
- Goal location (fixed vs. variable)
- Action quadrant (fixed vs. variable)
Fixed Ball Motion
- Simple shooting policy: begin accelerating when the ball's distance to its projected point of intersection with the agent's path reaches 110 units
- 100% success rate if the shooter position is fixed
- 61% success rate if the shooter position is variable
- Use a neural network (see the sketch below)
- Inputs to the NN (coordinate independent)
- Ball distance
- Agent distance
- Heading offset
- Output: 1 or 0 (shot successful or not)
- Use a random shooting policy for training
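A rough sketch of how such a learned shooting policy could be represented: a small neural network maps the three coordinate-independent inputs to a predicted probability of success, and the shooter starts moving once that probability is high enough. The synthetic training data, the network size, and the 0.5 threshold are placeholders; in the paper the training examples come from simulated shots taken under a random shooting policy.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Sketch of a learned shooting policy: a small NN predicts shot success from the
# three coordinate-independent inputs. The synthetic data, network size, and
# decision threshold are placeholders, not the setup used in the paper.

rng = np.random.default_rng(0)

# Columns: ball distance, agent distance, heading offset (assumed pre-scaled to [0, 1]).
X = rng.uniform(0.0, 1.0, size=(1000, 3))
# Placeholder labels standing in for recorded "shot succeeded" outcomes.
y = (X[:, 0] + 0.3 * rng.normal(size=1000) > X[:, 1]).astype(int)

net = MLPClassifier(hidden_layer_sizes=(4,), max_iter=2000, random_state=0)
net.fit(X, y)

def should_start_moving(ball_dist, agent_dist, heading_offset, threshold=0.5):
    """Shooter policy: begin accelerating once predicted success is high enough."""
    p_success = net.predict_proba([[ball_dist, agent_dist, heading_offset]])[0, 1]
    return p_success >= threshold

print(should_start_moving(0.9, 0.4, 0.1))
```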
Varying Ball Speed
- Add a fourth input to the NN: ball speed
Varying the Ball's Trajectory
- Use the same shooting policy
- Use another NN to determine the direction in which the shooter should steer (the shooter's aiming policy)
Moving the Goal
- Can be thought of as aiming for different parts of the goal
- Change nothing but the shooter's knowledge of the goal location
Cooperative Learning
- Passing a moving ball
- Passer: where to aim the pass
- Shooter: where to position itself
Adversarial Learning
References
- Peter Stone and Manuela Veloso, 2000, Multiagent Systems: A Survey from a Machine Learning Perspective
- Ming Tan, 1993, Multi-Agent Reinforcement Learning: Independent vs. Cooperative Agents
- Peter Stone and Manuela Veloso, 1998, Towards Collaborative and Adversarial Learning: A Case Study in Robotic Soccer