Multi-Agent Exploration - PowerPoint PPT Presentation

About This Presentation
Title:

Multi-Agent Exploration

Description:

Title: Autonomous Inter-Task Transfer in Reinforcement Learning Domains
Author: Matthew E. Taylor
Last modified by: Matthew Taylor
Created Date: 12/2/2006 10:58:49 PM


Transcript and Presenter's Notes



1
Multi-Agent Exploration
Matthew E. Taylor
http://teamcore.usc.edu/taylorm/
2
DCOPs: Distributed Constraint Optimization Problems
  • Multiple domains
    • Multi-agent plan coordination
    • Sensor networks
    • Meeting scheduling
    • Traffic light coordination
    • RoboCup soccer
  • Distributed
  • Robust to failure
  • Scalable
  • (In)Complete
  • Quality bounds

3
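DCOP Framework
Agents a1, a2, a3; constraints on links a2-a3 and a1-a2
Link a2-a3 reward table: 10, 0, 0, 6 (one reward per joint assignment of the two agents' values)
Link a1-a2 reward table: 10, 0, 0, 6 (one reward per joint assignment of the two agents' values)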
Different levels of coordination possible
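To make the framework concrete, here is a minimal Python sketch of the three-agent example. It assumes each agent picks a binary value and that the rewards 10, 0, 0, 6 belong to the joint assignments (0,0), (0,1), (1,0), (1,1) in that order; the slide does not say which order. The team objective is the sum of link rewards, and brute force is used only because the instance is tiny.

```python
from itertools import product

# Assumed encoding: each agent picks a value in {0, 1}, and the slide's rewards
# (10, 0, 0, 6) map to joint assignments (0,0), (0,1), (1,0), (1,1) in that order.
LINK_REWARD = {(0, 0): 10, (0, 1): 0, (1, 0): 0, (1, 1): 6}

# Constraint graph from the slide: links a1-a2 and a2-a3.
LINKS = [("a1", "a2"), ("a2", "a3")]
AGENTS = ["a1", "a2", "a3"]


def global_reward(assignment):
    """Team reward: sum of the rewards on every link under a full assignment."""
    return sum(LINK_REWARD[(assignment[u], assignment[v])] for u, v in LINKS)


def brute_force_optimum():
    """Enumerate all 2^3 assignments and keep the best (only viable for tiny instances)."""
    best, best_value = None, None
    for values in product((0, 1), repeat=len(AGENTS)):
        assignment = dict(zip(AGENTS, values))
        value = global_reward(assignment)
        if best_value is None or value > best_value:
            best, best_value = assignment, value
    return best, best_value


print(brute_force_optimum())  # ({'a1': 0, 'a2': 0, 'a3': 0}, 20)
```

Under this assumed encoding, the assignment (1, 1, 1) earns only 12, yet no single agent can improve it by changing alone; reaching 20 requires agents to change together, which is one way to read "different levels of coordination".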
4
Motivation: DCOP Extension
  • Unrealistic: often the environment is not fully known!
  • Agents need to learn
  • Maximize total reward
  • Real-world applications
    • Mobile ad-hoc networks
    • Sensor networks

5
Problem Statement
  • DCEE: Distributed Coordination of Exploration and Exploitation
  • Address challenges:
    • Local communication
    • Network of (known) interactions
    • Cooperative
    • Unknown rewards
    • Maximize on-line reward
    • Limited time horizon
    • (Effectively) infinite reward matrix

6
Mobile Ad-Hoc Network
  • Rewards: signal strength between agents, in [1, 200]
  • Goal: maximize signal strength over time
  • Assumes:
    • Small-scale fading dominates
    • Topology is fixed

Figure: agents a1-a4 with link signal strengths 75, 95, 100, and 50
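The setting can be simulated with a toy environment. The sketch below is an illustration under assumptions, not the authors' simulator: the set of links is fixed, each agent's position is an integer index, and, as one way to model "small-scale fading dominates", the signal strength for any new pair of positions is an independent uniform draw from [1, 200] that stays fixed once observed. The reward matrix is thus unknown in advance and effectively infinite, and is revealed only as agents visit positions. The class and method names are hypothetical.

```python
import random


class ToyManet:
    """Hypothetical DCEE environment: fixed links, rewards revealed only when visited."""

    def __init__(self, links, low=1, high=200, seed=0):
        self.links = list(links)           # fixed topology, e.g. [("a1", "a2"), ...]
        self.low, self.high = low, high    # reward range from the slide: [1, 200]
        self.rng = random.Random(seed)
        self.cache = {}                    # (u, v, pos_u, pos_v) -> observed reward

    def link_reward(self, u, v, pos_u, pos_v):
        """Signal strength on link (u, v): sampled on first visit, then static."""
        key = (u, v, pos_u, pos_v)
        if key not in self.cache:
            self.cache[key] = self.rng.randint(self.low, self.high)
        return self.cache[key]

    def total_reward(self, positions):
        """Team reward for one round: sum of signal strengths over all links."""
        return sum(self.link_reward(u, v, positions[u], positions[v])
                   for u, v in self.links)


env = ToyManet([("a1", "a2"), ("a2", "a3"), ("a3", "a4")])
print(env.total_reward({"a1": 0, "a2": 0, "a3": 0, "a4": 0}))
```

Caching each sampled reward keeps rewards static, so an agent that returns to a previous location sees the same signal strength again.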
7
MGM (Maximum Gain Message)
  • Review
  • Ideas?
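As a refresher, MGM runs in synchronous rounds: each agent computes the best improvement (gain) it could make by changing only its own value while its neighbors hold still, shares that gain with its neighbors, and changes value only if its gain is strictly the largest in its neighborhood. Because no two neighbors move in the same round, team reward never decreases, and the process stops at an assignment no single agent can improve. The sketch below assumes a small binary DCOP (chain a1-a2-a3, one symmetric reward table on every link); breaking ties by agent name is this sketch's choice.

```python
# Assumed instance: binary values, the same symmetric reward table on every link,
# and the chain a1-a2-a3. Ties between neighbors are broken by agent name.
LINK_REWARD = {(0, 0): 10, (0, 1): 0, (1, 0): 0, (1, 1): 6}
DOMAIN = (0, 1)
NEIGHBORS = {"a1": ["a2"], "a2": ["a1", "a3"], "a3": ["a2"]}


def local_reward(agent, value, assignment):
    """Reward on the agent's own links if it takes `value` while neighbors hold still."""
    # The table is symmetric, so (own value, neighbor value) order does not matter here.
    return sum(LINK_REWARD[(value, assignment[n])] for n in NEIGHBORS[agent])


def best_move(agent, assignment):
    """Return (gain, value) for the agent's best unilateral change."""
    best_value = max(DOMAIN, key=lambda v: local_reward(agent, v, assignment))
    gain = (local_reward(agent, best_value, assignment)
            - local_reward(agent, assignment[agent], assignment))
    return gain, best_value


def mgm(assignment, max_rounds=100):
    """Synchronous MGM: only agents whose gain beats every neighbor's gain move."""
    assignment = dict(assignment)
    for _ in range(max_rounds):
        moves = {a: best_move(a, assignment) for a in assignment}  # the "gain messages"
        winners = [
            a for a, (gain, _) in moves.items()
            if gain > 0 and all((gain, a) > (moves[n][0], n) for n in NEIGHBORS[a])
        ]
        if not winners:
            break  # no agent can improve alone: the assignment is 1-optimal
        for a in winners:
            assignment[a] = moves[a][1]
    return assignment


print(mgm({"a1": 1, "a2": 0, "a3": 1}))  # {'a1': 1, 'a2': 1, 'a3': 1}
print(mgm({"a1": 0, "a2": 1, "a3": 0}))  # {'a1': 0, 'a2': 0, 'a3': 0}
```

From (1, 0, 1), only a2 moves (gain 12 beats its neighbors' 10), and MGM stops at (1, 1, 1) with reward 12, a local optimum below the best value of 20; from (0, 1, 0) it reaches (0, 0, 0).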

8
Static Estimation: SE-Optimistic
Rewards in [1, 200]
If I move, I'd get R = 200
Figure: agents a1-a4 with link rewards 100, 50, and 75
9
Static Estimation: SE-Optimistic
Rewards in [1, 200]
If I move, I'd gain 275
If I move, I'd gain 250
If I move, I'd gain 100
If I move, I'd gain 125
Figure: agents a1-a4 with link rewards 100, 50, and 75, annotated with the gains above
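The gains follow from assuming that any move would lift every link an agent participates in to the maximum reward of 200. An agent's estimated gain is therefore the sum of (200 - current reward) over its links. The sketch below reproduces the four numbers under one topology consistent with them, the chain a1-a2 (100), a2-a3 (50), a3-a4 (75). That topology is inferred from the figure values and is an assumption. The sketch then applies an MGM-style rule: an agent moves only if its gain beats all of its neighbors' gains.

```python
R_MAX = 200  # top of the slide's reward range [1, 200]

# Assumed topology: a chain whose link rewards reproduce the slide's gains.
LINK_REWARDS = {("a1", "a2"): 100, ("a2", "a3"): 50, ("a3", "a4"): 75}
AGENTS = ["a1", "a2", "a3", "a4"]


def neighbors(agent):
    """Agents that share a link with `agent`."""
    return [v if u == agent else u for (u, v) in LINK_REWARDS if agent in (u, v)]


def se_optimistic_gain(agent):
    """Assume a move lifts every incident link to R_MAX; sum the improvements."""
    return sum(R_MAX - r for (u, v), r in LINK_REWARDS.items() if agent in (u, v))


def movers(gains):
    """MGM-style rule: move only if your gain is positive and beats every neighbor's."""
    return [a for a in AGENTS
            if gains[a] > 0 and all(gains[a] > gains[n] for n in neighbors(a))]


gains = {a: se_optimistic_gain(a) for a in AGENTS}
print(gains)          # {'a1': 100, 'a2': 250, 'a3': 275, 'a4': 125}
print(movers(gains))  # ['a3']
```

The optimism is deliberate: assuming every move reaches R = 200 pushes agents on weak links to explore.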
10
Results (simulation): maximize total reward (area under curve)
Plotted: SE-Optimistic, No Movement
11
Balanced Exploration Techniques
  • BE-Backtrack
    • Decision-theoretic calculation of exploration
    • Track previous best location, Rb
    • Bid to explore for some number of steps (te)

Expected reward of exploring for te steps = reward while exploring + P(improve reward) × reward while exploiting (new best) + P(NOT improve reward) × reward while exploiting (back at Rb)
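The three terms can be made concrete under assumptions that belong to this sketch, not to the slide. It models one agent on one link over a horizon of T rounds, with each unexplored location's reward drawn independently from a continuous uniform distribution on [1, 200]. Exploring for te steps then earns the mean reward per step. Afterwards, the agent exploits the better of Rb and the best reward found for the remaining T - te steps, and that exploitation splits into the "improve" and "not improve" cases. The function names are hypothetical.

```python
LOW, HIGH = 1.0, 200.0       # assumed: unexplored rewards ~ continuous Uniform[1, 200]
MEAN = (LOW + HIGH) / 2      # expected reward of one exploration step


def explore_value(te, rb, horizon):
    """Expected total reward of exploring te steps, then exploiting the best location."""
    if te == 0:
        return horizon * rb                  # stay at the previous best for every step
    u = (rb - LOW) / (HIGH - LOW)            # P(one new sample <= Rb)
    p_not_improve = u ** te                  # none of the te samples beats Rb
    p_improve = 1.0 - p_not_improve
    # P(improve) * E[best sample | best sample > Rb], integrated from the density
    # of the maximum of te uniform samples.
    improve_part = LOW * p_improve + (HIGH - LOW) * te / (te + 1) * (1.0 - u ** (te + 1))
    exploit_steps = horizon - te
    reward_while_exploring = te * MEAN
    exploit_if_improve = exploit_steps * improve_part
    exploit_if_not_improve = exploit_steps * p_not_improve * rb   # backtrack to Rb
    return reward_while_exploring + exploit_if_improve + exploit_if_not_improve


def bid(rb, horizon):
    """Choose the exploration length te with the highest expected total reward."""
    return max(range(horizon + 1), key=lambda te: explore_value(te, rb, horizon))


print(bid(200.0, 10))      # 0
print(bid(1.0, 10) > 0)    # True
```

An agent already at the maximum reward bids te = 0, while one at the minimum with ten rounds left bids to explore; in a team setting these bids would then be coordinated with neighbors.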
12
Results (simulation): maximize total reward (area under curve)
Plotted: BE-Backtrack, SE-Optimistic, No Movement
13
Omniscient Algorithm
  • (Artificially) convert DCEE to DCOP
  • Run MGM algorithm (Pearce & Tambe, 2007)
  • Quickly find local optimum
  • Establish upper bound
  • Only works in simulation
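The omniscient baseline can be sketched by giving MGM the full reward matrix up front. Below, each agent chooses among a few candidate positions (a hypothetical NUM_POSITIONS). Every link's reward for every pair of positions is pre-sampled uniformly from [1, 200], and MGM climbs to an assignment no single agent can improve. This works only in simulation, where the whole matrix can be generated. The chain topology, position count, and seed are illustrative assumptions.

```python
import random
from itertools import product

NUM_POSITIONS = 5                                    # hypothetical candidate locations per agent
LINKS = [("a1", "a2"), ("a2", "a3"), ("a3", "a4")]   # assumed chain topology
AGENTS = ["a1", "a2", "a3", "a4"]

rng = random.Random(42)
# "Convert DCEE to DCOP": reveal every link reward for every pair of positions.
REWARD = {
    (u, v, pu, pv): rng.randint(1, 200)
    for (u, v) in LINKS
    for pu, pv in product(range(NUM_POSITIONS), repeat=2)
}


def local(agent, pos, assignment):
    """Reward on the agent's links if it sits at `pos` and its neighbors stay put."""
    total_reward = 0
    for u, v in LINKS:
        if u == agent:
            total_reward += REWARD[(u, v, pos, assignment[v])]
        elif v == agent:
            total_reward += REWARD[(u, v, assignment[u], pos)]
    return total_reward


def neighbors(agent):
    return [v if u == agent else u for u, v in LINKS if agent in (u, v)]


def total(assignment):
    """Team reward under a full assignment of positions."""
    return sum(REWARD[(u, v, assignment[u], assignment[v])] for u, v in LINKS)


def omniscient_mgm(assignment, max_rounds=1000):
    """MGM on the fully revealed matrix; stops when no agent can improve alone."""
    assignment = dict(assignment)
    for _ in range(max_rounds):
        moves = {}
        for a in AGENTS:
            best = max(range(NUM_POSITIONS), key=lambda p: local(a, p, assignment))
            moves[a] = (local(a, best, assignment) - local(a, assignment[a], assignment), best)
        winners = [a for a in AGENTS if moves[a][0] > 0
                   and all((moves[a][0], a) > (moves[n][0], n) for n in neighbors(a))]
        if not winners:
            break
        for a in winners:
            assignment[a] = moves[a][1]
    return assignment


final = omniscient_mgm({a: 0 for a in AGENTS})
print(final, total(final))
```

Since each round strictly increases an integer, bounded team reward, the loop terminates quickly. The result is a local optimum that serves as a reference point for the on-line algorithms, which must discover rewards as they go.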

14
Results (simulation): maximize total reward (area under curve)
Plotted: Omniscient, BE-Backtrack, SE-Optimistic, No Movement
15
Balanced Exploration Techniques
  • BE-Rebid
    • Allows agents to backtrack
    • Re-evaluate every time step [Montemerlo04]
    • Allows for on-the-fly reasoning
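BE-Rebid re-evaluates at every time step with the current best reward Rb and the remaining horizon, instead of committing to te steps once. The sketch below shows that loop for a single agent. It assumes unexplored rewards are continuous Uniform[1, 200], backtracking to the best location costs no extra step, and exploring te steps earns the mean reward per step before exploiting the best location found. The function names are hypothetical.

```python
import random

LOW, HIGH = 1.0, 200.0      # assumed continuous uniform reward range
MEAN = (LOW + HIGH) / 2


def expected_best(rb, te):
    """E[max(Rb, best of te fresh uniform samples)], in closed form."""
    u = (rb - LOW) / (HIGH - LOW)
    return HIGH - (HIGH - LOW) * (1 - u ** (te + 1)) / (te + 1)


def explore_value(te, rb, steps_left):
    """Explore te steps at the mean reward, then exploit the best location found."""
    return te * MEAN + (steps_left - te) * expected_best(rb, te)


def be_rebid(horizon, rb, seed=0):
    """Re-bid every step: explore one more step only if some te >= 1 beats staying."""
    rng = random.Random(seed)
    rewards = []
    for t in range(horizon):
        steps_left = horizon - t
        te = max(range(steps_left + 1), key=lambda k: explore_value(k, rb, steps_left))
        if te > 0:
            r = rng.uniform(LOW, HIGH)   # move to an unexplored location
            rb = max(rb, r)              # remember the best location seen (backtrack target)
        else:
            r = rb                       # backtrack to, or stay at, the best location
        rewards.append(r)
    return rewards


print(be_rebid(5, rb=200.0))  # [200.0, 200.0, 200.0, 200.0, 200.0]
```

Re-bidding lets the agent stop exploring early after a lucky draw, at the cost of recomputing the bid every round.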

16
Balanced Exploration Techniques
  • BE-Stay
    • Agents unable to backtrack
    • True for some types of robots
    • Dynamic programming approach
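Without backtracking, deciding when to stop exploring becomes an optimal-stopping problem that a dynamic program can solve. The sketch assumes one agent on one link, integer rewards uniform on 1..200 at each new location, and an agent that keeps its current location once it stops exploring (rewards are static, so waiting reveals nothing new). With t steps left at a location worth r, staying earns t × r. Moving earns a fresh reward X plus V_{t-1}(X). Because the moving branch does not depend on r, the policy reduces to a per-step threshold. The names are hypothetical.

```python
REWARDS = range(1, 201)  # assumed: a new location's reward is uniform on the integers 1..200


def stay_thresholds(horizon):
    """thresholds[t]: smallest current reward at which staying is optimal with t steps left.

    V_t(r) = max(t * r, C_t), where C_t = E[X + V_{t-1}(X)] is the value of moving
    to a fresh location (earning reward X this step), and V_0 = 0.
    """
    move_value = [0.0] * (horizon + 1)   # C_t
    thresholds = [0.0] * (horizon + 1)
    for t in range(1, horizon + 1):
        total = 0.0
        for x in REWARDS:
            v_prev = max((t - 1) * x, move_value[t - 1])  # V_{t-1}(x); equals 0 when t == 1
            total += x + v_prev
        move_value[t] = total / len(REWARDS)
        thresholds[t] = move_value[t] / t                # stay iff t * r >= C_t
    return thresholds


def should_stay(reward, steps_left, thresholds):
    """No backtracking: once the agent stays, it keeps this reward for the rest of the horizon."""
    return reward >= thresholds[steps_left]


th = stay_thresholds(5)
print(th[1], th[2])  # 100.5 113.0
```

The thresholds rise as more steps remain, so an agent with a long horizon holds out for a better location before stopping.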

17
Results (simulation)
(10 agents, random graphs with 15-20 links)
18
Results (simulation)
(chain topology, 100 rounds)
19
Results (simulation)
(20 agents, 100 rounds)
20
Also Tested on Physical Robots
Used iRobot Creates (unfortunately, they don't vacuum)
21
Sample Robot Results
22
k-Optimality
  • Increased coordination
  • Find pairs of agents to change variables (location)
  • Higher communication overhead
  • SE-Optimistic, SE-Optimistic-2, SE-Optimistic-3
  • SE-Mean, SE-Mean-2
  • BE-Rebid, BE-Rebid-2
  • BE-Stay, BE-Stay-2
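With k = 2, a pair of linked agents can evaluate moving together. One plausible reading of SE-Optimistic-2 is that every link touched by a mover is assumed to jump to 200, with each link counted once, including the link the pair shares. The joint gain is then not simply the sum of the two individual gains. The sketch assumes the chain a1-a2 (100), a2-a3 (50), a3-a4 (75), a topology consistent with the SE-Optimistic slide's numbers; the function names are hypothetical.

```python
R_MAX = 200
LINK_REWARDS = {("a1", "a2"): 100, ("a2", "a3"): 50, ("a3", "a4"): 75}  # assumed chain


def group_gain(group):
    """Optimistic gain if every agent in `group` moves: each touched link goes to R_MAX once."""
    return sum(R_MAX - r for (u, v), r in LINK_REWARDS.items() if u in group or v in group)


def best_pair():
    """Pick the linked pair with the largest joint optimistic gain."""
    return max(LINK_REWARDS, key=group_gain)


print(group_gain(("a2",)) + group_gain(("a3",)))  # 525
print(group_gain(("a2", "a3")))                   # 375
print(best_pair())                                # ('a2', 'a3')
```

Adding the individual gains (250 + 275) double-counts the shared a2-a3 link; the joint estimate of 375 counts it once.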

23
Confirm Previous DCOP Results
If (artificially) provided rewards, k = 2 outperforms k = 1
24
Sample coordination results
Panels: Full Graph, Chain Graph
25
Surprising Result: Increased Coordination Can Hurt
26
Surprising Result: Increased Coordination Can Hurt
27
Regular Graphs