Title: Intelligent Agents: Technology and Applications - Multi-Agent Learning
Intelligent Agents: Technology and Applications
Multi-Agent Learning
- IST 597B
- Spring 2003
- John Yen
Learning Objectives
- How to identify goals for agent projects?
- How to design agents?
- How to identify risks/obstacles early on?
Multi-Agent Learning
- The learned behavior can be used as a basis for more complex interactive behavior
- Enables the agent to participate in higher-level collaborative or adversarial learning situations
- Such learning would not be possible if the agent were isolated
Examples
- Examples of single-agent learning in a multi-agent environment:
- A reinforcement learning agent that incorporates information gathered by another agent (Tan, 93)
- An agent learning the negotiating techniques of another agent using Bayesian learning (Zeng and Sycara, 96)
- A class of multi-agent learning in which an agent attempts to model another agent
Examples
- A training scenario in which a novice agent learns from a knowledgeable agent (Clouse, 96)
- Common to all of these examples: the learning agent is interacting with other agents
Predator/Prey (Pursuit) Domain
- Introduced by Benda et al. (86)
- Four predators and one prey
- Goal: to capture (or surround) the prey
- Not a complex real-world domain; a toy domain that helps concretize concepts
Taxonomy of MAS
- Taxonomy organized along
- the degree of heterogeneity, and
- the degree of communication
- Homogeneous, Non-Communicating Agents
- Heterogeneous, Non-Communicating Agents
- Homogeneous, Communicating Agents
- Heterogeneous, Communicating Agents
1. Homogeneous, Non-Communicating Agents
- All agents have the same internal structure
- Goals
- Domain knowledge
- Actions
- The only differences are their sensory inputs and the actions that they take
- They are situated differently in the world
- Korf (1992) introduces a policy for each predator based on an attractive force toward the prey and a repulsive force from the other predators (see the sketch below)
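A minimal sketch of such a force-based policy, assuming a simple grid world. The field strengths, the inverse-square repulsion, and the move-selection rule are illustrative assumptions rather than Korf's exact formulation.

```python
import math

# A minimal sketch of a force-based pursuit policy in the spirit of Korf (1992).
# The grid world, the inverse-square repulsion, and the move-selection rule are
# illustrative assumptions, not Korf's exact formulation.

MOVES = {"stay": (0, 0), "north": (0, 1), "south": (0, -1),
         "east": (1, 0), "west": (-1, 0)}

def force_vector(me, prey, predators):
    """Attractive force toward the prey plus repulsive forces from other predators."""
    fx, fy = prey[0] - me[0], prey[1] - me[1]
    norm = math.hypot(fx, fy) or 1.0
    fx, fy = fx / norm, fy / norm              # unit attraction toward the prey
    for other in predators:
        if other == me:
            continue
        dx, dy = me[0] - other[0], me[1] - other[1]
        d = math.hypot(dx, dy) or 1.0
        fx += dx / d ** 3                      # inverse-square repulsion (assumed)
        fy += dy / d ** 3
    return fx, fy

def choose_move(me, prey, predators):
    """Pick the grid move whose direction best matches the combined force."""
    fx, fy = force_vector(me, prey, predators)
    return max(MOVES, key=lambda m: MOVES[m][0] * fx + MOVES[m][1] * fy)

# Example: one predator deciding its next step.
print(choose_move((0, 0), prey=(4, 3), predators=[(0, 0), (1, -1), (-2, 2), (5, 5)]))
```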
1. Homogeneous, Non-Communicating Agents
- Korf concludes that explicit cooperation is not necessary
- Haynes and Sen show that Korf's heuristic does not work for certain instantiations of the domain
1. Homogeneous, Non-Communicating Agents
- Issues
- Reactive vs. deliberative agents
- Local vs. global perspective
- Modeling of other agents
- How to affect others
- Further learning opportunities
1. Reactive vs. Deliberative Agents
- Reactive agents do not maintain an internal state and simply retrieve pre-set behaviors
- Deliberative agents maintain an internal state and behave by searching through a space of behaviors, predicting the actions of other agents and the effects of actions
2. Local vs. Global Perspective
- How much sensory input should be available to agents? (observability)
- Having a global view might lead to sub-optimal results
- Agents with less knowledge can perform better: ignorance is bliss
3. Modeling of Other Agents
- Since agents are identical, they can predict each other's actions given the sensory input
- The Recursive Modeling Method (RMM) models the internal state of another agent in order to predict its actions
- Each predator bases its move on the predicted moves of the other predators, and vice versa
- Since this reasoning can recurse indefinitely, it should be limited in terms of time or recursion depth
3. Modeling of Other Agents
- If agents know too much, RMM could recurse indefinitely
- For coordination to be possible, some potential knowledge must be ignored (e.g., by capping the recursion depth, as sketched below)
- Schmidhuber (1996) shows that agents can cooperate without modeling each other
- They consider each other as part of the environment
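A toy sketch of depth-limited recursive modeling in the spirit of RMM: an agent predicts the other's choice by simulating the same reasoning one level down, and falls back to a default policy when the depth budget runs out. The payoff function, the default move, and the two-agent setup are illustrative assumptions, not the published RMM formulation.

```python
# A toy sketch of depth-limited recursive modeling (in the spirit of RMM).
# The pursuit-style payoff, the default policy at depth 0, and the symmetric
# two-agent setup are illustrative assumptions.

MOVES = ["north", "south", "east", "west", "stay"]

def payoff(my_move, other_move):
    """Assumed joint payoff: predators do better when they do not pick the same move."""
    return 1.0 if my_move != other_move else 0.0

def best_move(depth):
    """Choose my move against a prediction of the other agent's move.

    depth == 0: fall back to a fixed default instead of modeling the other agent.
    depth  > 0: model the other agent as running this same procedure at depth - 1.
    """
    if depth == 0:
        return "north"                       # default policy when modeling stops
    predicted_other = best_move(depth - 1)   # recursive model of the other agent
    return max(MOVES, key=lambda m: payoff(m, predicted_other))

# Capping the recursion keeps the mutual "I think that you think ..." reasoning finite.
print(best_move(depth=3))
```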
4. How to Affect Others
- Without communication, agents cannot affect each other directly
- They can affect each other indirectly in several ways
- They can be sensed by other agents
- They can change the state of another agent (e.g. by pushing it)
- They can affect each other by stigmergy (Beckers, 94)
4. How to Affect Others
- Active stigmergy
- An agent alters the environment so as to affect the sensory input of another agent, e.g. an agent might leave a marker for other agents to observe
- Passive stigmergy
- Altering the environment so that the effect of another agent's actions changes, e.g. if an agent turns off the main water valve of a building, the effect of another agent turning on a faucet is altered
4. How to Affect Others
- Example: a number of robots are in an area with many pucks scattered around. The robots reactively move straight (turning at walls) until they are pushing 3 or more pucks; then they back up and turn away (see the sketch below)
- Although the robots do not communicate, they collect the pucks into a single pile over time
- When a robot approaches an existing pile, it adds its pucks and turns away
- A robot approaching an existing pile obliquely might take a puck away, but over time the desired result is accomplished
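A minimal sketch of this reactive puck-clustering rule. The wraparound grid, the explicit puck count, and the random new heading (standing in for "turn away") are illustrative assumptions; the original robots relied on purely physical sensing and pushing.

```python
import random

# A minimal sketch of the reactive puck-clustering rule described above.
# Grid size, wraparound movement, and the random turn are illustrative assumptions.

GRID = 20           # square arena size (assumed)
PUCK_THRESHOLD = 3  # back up and turn away once this many pucks are being pushed
HEADINGS = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def step(robot, pucks):
    """One control step: push pucks straight ahead, drop them once the threshold is hit."""
    x, y = robot["pos"]
    dx, dy = robot["heading"]
    ahead = ((x + dx) % GRID, (y + dy) % GRID)
    if ahead in pucks:                       # start pushing a puck that is in the way
        pucks.remove(ahead)
        robot["pushing"] += 1
    if robot["pushing"] >= PUCK_THRESHOLD:   # leave the pile here, back up, turn away
        pucks.extend([ahead] * robot["pushing"])
        robot["pushing"] = 0
        robot["heading"] = random.choice(HEADINGS)
    else:                                    # otherwise keep moving straight
        robot["pos"] = ahead

# Example: pucks is a list of grid cells; repeated cells form a pile over time.
robot = {"pos": (0, 0), "heading": (1, 0), "pushing": 0}
pucks = [(3, 0), (5, 0), (9, 0), (12, 4)]
for _ in range(200):
    step(robot, pucks)
```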
5. Further Learning Opportunities
- An agent might try to learn to take actions that will not help it directly in the current situation, but may allow other agents to be more effective in the future
- In traditional RL, if an action leads to a reward obtained by another agent, the acting agent may have no way of reinforcing that action
2. Heterogeneous, Non-Communicating Agents
- Agents can be heterogeneous in any of the following
- Goals
- Actions
- Domain knowledge
- In the pursuit domain, the prey can be modeled as an agent
- Haynes et al. have used GAs and case-based reasoning to make predators learn to cooperate in the absence of communication
2. Heterogeneous, Non-Communicating Agents
- They also explore the possibility of evolving both the predators and the prey
- The predators use Korf's greedy heuristic
- Though one might expect repeated improvement of predators and prey with no convergence, a prey behavior emerges that always succeeds
- The prey simply moves in a constant straight line
- Haynes et al. conclude that Korf's greedy algorithm relies on random prey movement
2. Heterogeneous, Non-Communicating Agents
- Issues
- Benevolence vs. competitiveness
- Fixed vs. learning agents
- Modeling of other agents
- Resource management
- Social conventions
1. Benevolence vs. Competitiveness
- Agents can be benevolent even if they have different goals (if they are willing to help each other)
- Selfish agents are more effective and biologically plausible
- Agents cooperate because it is in their own best interest
1. Benevolence vs. Competitiveness
- Prisoner's dilemma: two burglars are captured. Each has to choose whether or not to confess and implicate the other. If neither confesses, they will both serve 1 year. If both confess, they will both serve 10 years. If one confesses and the other does not, the one who has collaborated goes free and the other serves 20 years
1. Benevolence vs. Competitiveness
- Each agent will decide to confess to maximize its own interest (as the sketch below shows)
- If both confess, they each get 10 years
- If they had acted irrationally and kept quiet, they would each get 1 year
- Mor et al. (1995) show that in the repeated prisoner's dilemma, cooperative behavior can emerge
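The reasoning on this slide can be written out directly from the payoffs above. A small sketch; the "quiet"/"confess" labels and the tuple encoding of sentences are just a convenient representation.

```python
# Prisoner's dilemma payoffs from the slide (years in prison, so lower is better).
YEARS = {
    ("quiet",   "quiet"):   (1, 1),
    ("confess", "confess"): (10, 10),
    ("confess", "quiet"):   (0, 20),
    ("quiet",   "confess"): (20, 0),
}

def best_response(other_choice):
    """My choice that minimizes my own sentence, given the other's fixed choice."""
    return min(("quiet", "confess"), key=lambda mine: YEARS[(mine, other_choice)][0])

# Whatever the other burglar does, confessing is the self-interested choice ...
print(best_response("quiet"), best_response("confess"))      # confess confess
# ... yet mutual confession (10, 10) is worse for both than mutual silence (1, 1).
print(YEARS[("confess", "confess")], YEARS[("quiet", "quiet")])
```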
1. Benevolence vs. Competitiveness
- In zero-sum games, cooperation is not sensible
- If a third dimension were added to the taxonomy, besides the degrees of heterogeneity and communication, it would be benevolence vs. competitiveness
2. Fixed vs. Learning Agents
- Learning agents are desirable in dynamic environments
- Competitive vs. cooperative learning
- Possibility of an arms race in competitive learning: competing agents continually adapt to each other in more and more specialized ways, never stabilizing at a good behavior
2. Fixed vs. Learning Agents
- Credit-assignment problem: when the performance of an agent improves, it is not clear whether the improvement is due to an improvement in the agent's own behavior or to a deterioration in the opponent's behavior. The same problem arises if the performance of an agent gets worse
- One solution is to fix one agent while allowing the other to learn, and then to switch. This encourages the arms race even more!
3. Modeling of Other Agents
- The goals, actions and domain knowledge of other agents may be unknown and need modeling
- Without communication, modeling is done strictly through observation
- RMM is good for modeling the states of homogeneous agents
- Tambe (1995) takes it one step further, studying how agents can learn models of teams of agents
4. Resource Management
- Examples
- Network traffic problem: several agents send information through the same network (GA)
- Load balancing: several users have a limited amount of computing power to share among them (RL)
- Braess' paradox (Glance et al., 1995): adding more resources to a network but getting worse performance
5. Social Conventions
- Imagine you are to meet a friend in Paris. You both arrive on the same day but were unable to get in touch to set a time and place. Where will you go, and when?
- 75% of the audience at the AAAI-95 Symposium on Active Learning answered (without prior communication) that they would go to the Eiffel Tower at noon
- Even without communication, agents are able to coordinate their actions
3. Homogeneous, Communicating Agents
- Communication can be either broadcast or point-to-point
- Issues
- Distributed sensing
- Distributed vision project (Matsuyama, 1997)
- Trafficopter system (Moukas et al., 1997)
- Communication content
- What should they communicate? States, or goals?
- Further learning opportunities
- When to communicate?
4. Heterogeneous, Communicating Agents
- Tradeoff between cost and freedom
- Osawa suggests predators should go through 4 phases
- Autonomy, communication, negotiation, and control
- When they stop making progress using one strategy, they should move on to the next, more expensive strategy
- The phases are in increasing order of cost (and decreasing order of freedom)
4. Heterogeneous, Communicating Agents
- Important issues
- Understanding each other
- Planning communication acts
- Negotiation
- Commitment/decommitment
- Further learning opportunities
1. Understanding Each Other
- Need some set protocol for communication
- Aspects of the protocol
- Information content: KIF (Genesereth, 92)
- Message format: KQML (Finin, 94)
- Coordination: COOL (Barbuceanu, 95)
2. Planning Communication Acts
- The theory of communication as action is called speech acts
- Communication acts have preconditions and effects
- Effects might be to alter an agent's beliefs about the state of another agent or agents (see the toy sketch below)
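A toy sketch of a communicative act treated like a planning operator, with a precondition and an effect on the hearer's beliefs. The "tell" act and the belief-set representation are illustrative assumptions, not a particular agent communication language.

```python
from dataclasses import dataclass, field

# A toy model of a speech act with a precondition and an effect on beliefs.
# The act name and the belief-set representation are illustrative assumptions.

@dataclass
class Agent:
    name: str
    beliefs: set = field(default_factory=set)

def tell(speaker: Agent, hearer: Agent, fact: str) -> bool:
    """Speech act 'tell': precondition is that the speaker believes the fact;
    the effect is that the hearer comes to believe it too."""
    if fact not in speaker.beliefs:      # precondition not met: act is not applicable
        return False
    hearer.beliefs.add(fact)             # effect: alter the hearer's belief state
    return True

passer, shooter = Agent("passer", {"ball_is_mine"}), Agent("shooter")
tell(passer, shooter, "ball_is_mine")
print(shooter.beliefs)                   # {'ball_is_mine'}
```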
3. Negotiation
- Design negotiating MAS based on the law of supply and demand
- Contract nets (Smith, 1990)
- Agents have their own goals, are self-interested, and have limited reasoning resources. They bid to accept tasks from other agents and can then either perform the task or subcontract it to another agent. Agents must pay to contract out their tasks (see the sketch below)
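A minimal sketch of a contract-net style exchange: a manager announces a task, self-interested contractors bid, and the task is awarded to the cheapest bidder. The single-round protocol, the cost model, and all names here are illustrative assumptions, not the full protocol from Smith (1990).

```python
# Minimal contract-net style sketch: announce a task, collect bids, award it.
# The linear cost model and single-round awarding are illustrative assumptions.

class Contractor:
    def __init__(self, name, cost_per_unit):
        self.name = name
        self.cost_per_unit = cost_per_unit

    def bid(self, task):
        """Return the price this self-interested agent asks to perform the task."""
        return task["size"] * self.cost_per_unit

def announce(task, contractors):
    """Manager side: broadcast the task, collect bids, and award the contract."""
    bids = {c.name: c.bid(task) for c in contractors}
    winner = min(bids, key=bids.get)
    return winner, bids[winner]

contractors = [Contractor("A", 3.0), Contractor("B", 2.0), Contractor("C", 4.5)]
task = {"name": "deliver-parts", "size": 10}
print(announce(task, contractors))   # ('B', 20.0): the manager pays B to do the task
```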
3. Negotiation
- MAS controlling the air temperature in different rooms of a building
- An agent can set the thermostat to any temperature. Depending on the actual air temperature, the agent can buy hot or cold air from another room that has an excess. At the same time, the agent can sell excess air at the current temperature to other rooms. Modeling the loss of heat in transfer from one room to another, the agents try to buy and sell at the best possible prices.
4. Commitment/Decommitment
- An agent agrees to pursue a given goal regardless of how much it serves its own interest
- Commitments can make systems run more smoothly by making agents trust each other
- It is unclear how to make self-interested agents commit to others
- Belief/desire/intention (BDI) is a popular technique for modeling other agents
- Used in the OASIS air traffic control system
5. Further Learning Opportunities
- Instead of predefining a protocol, allow the agents to learn for themselves what to communicate and how to interpret it
- A possible result would be more efficient communication
Q-Learning
- Assess state-action pairs (s, a) using a Q value
- Learn the Q value using rewards/feedback
- A reward received at time t is discounted back to previous state-action pairs (using a discount factor)
- The goal of learning is to find an optimal policy for selecting actions
The Q Value

  Q(x, a) = R(x, a) + \gamma \sum_y P_{xy}(a) V(y)

- R: reward
- P_xy(a): the probability of reaching state y from x by taking action a
- gamma: the discount factor (between 0 and 1)
- V(y): the expected total discounted return starting in y and following the policy
- Policy: a sequence of actions
The Expected Total Discounted Return

  V(y) = \max_a Q(y, a)

- V for a state is the maximal Q value among all actions that can be taken in that state (following the rest of the policy)
Learning Rule for the Q Value

  Q(x, a) \leftarrow Q(x, a) + \alpha ( r + \gamma V(y) - Q(x, a) )

- alpha: the learning rate
Q-Learning Algorithm

1. For all states x and actions a, initialize Q(x, a) = 0, and
2. for all states x, initialize V(x) = 0.

Do forever:
(a) Observe the current state x.
(b) Choose an action a that maximizes Q(x, a) over all actions.
(c) Carry out action a in the world. Let the short-term reward be r, and the new state be y.
(d) Update Q(x, a) using the learning rule above.
(e) For each state-action pair updated, keep V(x) = \max_a Q(x, a).
Probability for the agent to select action a_i, based on the Q values (Boltzmann exploration, sketched in code below):

  P(a_i) = \frac{e^{Q(x, a_i)/T}}{\sum_j e^{Q(x, a_j)/T}}

- T: a temperature parameter that determines the randomness of decisions
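To make the loop above concrete, here is a minimal tabular Q-learning sketch with Boltzmann exploration. The five-state chain environment, the reward, and the hyperparameter values (alpha, gamma, T) are illustrative assumptions; only the update rule and the selection probability follow the formulas above.

```python
import math
import random
from collections import defaultdict

# Minimal tabular Q-learning with Boltzmann exploration. The chain environment
# and the hyperparameters are illustrative assumptions.

N_STATES, ACTIONS = 5, ["left", "right"]
ALPHA, GAMMA, T = 0.1, 0.9, 0.5

Q = defaultdict(float)          # Q[(state, action)], initialized to 0

def env_step(x, a):
    """Toy chain world: reward 1 for pushing against the right end, 0 otherwise."""
    y = min(x + 1, N_STATES - 1) if a == "right" else max(x - 1, 0)
    r = 1.0 if (a == "right" and x == N_STATES - 1) else 0.0
    return r, y

def boltzmann(x):
    """P(a_i) = exp(Q(x, a_i)/T) / sum_j exp(Q(x, a_j)/T)."""
    weights = [math.exp(Q[(x, a)] / T) for a in ACTIONS]
    return random.choices(ACTIONS, weights=weights)[0]

def v(x):
    """V(x) = max_a Q(x, a)."""
    return max(Q[(x, a)] for a in ACTIONS)

x = 0
for _ in range(5000):
    a = boltzmann(x)                                      # (b) choose an action
    r, y = env_step(x, a)                                 # (c) act, observe r and y
    Q[(x, a)] += ALPHA * (r + GAMMA * v(y) - Q[(x, a)])   # (d) learning rule
    x = y

print([round(v(s), 2) for s in range(N_STATES)])          # learned state values
```

After training, the state values increase toward the rewarding end of the chain, which is the behavior the discounted-return definitions above describe.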
Towards Collaborative and Adversarial Learning: A Case Study in Robotic Soccer
- Peter Stone and Manuela Veloso
Introduction
- Layered learning: develop complex multi-agent behaviors from simple ones
- A simple multi-agent behavior in robotic soccer: shooting a moving ball
- Passer
- Shooter
- Behavior to be learned: when the shooter should begin to move (the shooting policy)
Simple Behavior
Parameters
- Ball speed (fixed vs. variable)
- Ball trajectory (fixed vs. variable)
- Goal location (fixed vs. variable)
- Action quadrant (fixed vs. variable)
Fixed Ball Motion
- Simple shooting policy: begin accelerating when the ball's distance to its projected point of intersection with the agent's path reaches 110 units
- 100% success rate if the shooter position is fixed
- 61% success rate if the shooter position is variable
- Use a neural network (see the sketch below)
- Inputs to the NN (coordinate independent)
- Ball distance
- Agent distance
- Heading offset
- Output: 1 or 0 (shot successful or not)
- Use a random shooting policy for training
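A rough sketch of how such a learned shooting policy could be represented: a small neural network maps the three coordinate-independent inputs to a predicted probability of success, and the shooter starts moving once that probability is high enough. The synthetic training data, the network size, and the 0.5 threshold are placeholders; in the paper the training examples come from simulated shots taken under a random shooting policy.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Sketch of a learned shooting policy: a small NN predicts shot success from the
# three coordinate-independent inputs. The synthetic data, network size, and
# decision threshold are placeholders, not the setup used in the paper.

rng = np.random.default_rng(0)

# Columns: ball distance, agent distance, heading offset (assumed pre-scaled to [0, 1]).
X = rng.uniform(0.0, 1.0, size=(1000, 3))
# Placeholder labels standing in for recorded "shot succeeded" outcomes.
y = (X[:, 0] + 0.3 * rng.normal(size=1000) > X[:, 1]).astype(int)

net = MLPClassifier(hidden_layer_sizes=(4,), max_iter=2000, random_state=0)
net.fit(X, y)

def should_start_moving(ball_dist, agent_dist, heading_offset, threshold=0.5):
    """Shooter policy: begin accelerating once predicted success is high enough."""
    p_success = net.predict_proba([[ball_dist, agent_dist, heading_offset]])[0, 1]
    return p_success >= threshold

print(should_start_moving(0.9, 0.4, 0.1))
```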
Varying Ball Speed
- Add a fourth input to the NN: ball speed
Varying the Ball's Trajectory
- Use the same shooting policy
- Use another NN to determine the direction in which the shooter should steer (the shooter's aiming policy)
Moving the Goal
- Can be thought of as aiming for different parts of the goal
- Change nothing but the shooter's knowledge of the goal location
Cooperative Learning
- Passing a moving ball
- Passer: where to aim the pass
- Shooter: where to position itself
Adversarial Learning
References
- Peter Stone and Manuela Veloso, 2000, Multiagent Systems: A Survey from a Machine Learning Perspective
- Ming Tan, 1993, Multi-Agent Reinforcement Learning: Independent vs. Cooperative Agents
- Peter Stone and Manuela Veloso, 1998, Towards Collaborative and Adversarial Learning: A Case Study in Robotic Soccer