Title: A Learning Process Architecture for Continuous Strategic Games
1. A Learning Process Architecture for Continuous Strategic Games
- By Jonathan Gibbs
- Mentor: Richard Murray
- Co-Mentor: Ling Shi
2. Artificial Intelligence Overview
- "It is the science and engineering of making intelligent machines, especially intelligent computer programs. It is related to the similar task of using computers to understand human intelligence, but AI does not have to confine itself to methods that are biologically observable." (John McCarthy, Stanford University)
- "To obtain a scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines." (American Association for Artificial Intelligence)
3. Artificial Intelligence in Games
4. The RoboFlag Game
- Up to 6-on-6 capture-the-flag game
- Limited sensing and communication capability
- Simulator and hardware testbed
- Each robot operates as a separate entity
Image courtesy of Richard Murray
5. Objectives
- Create a learning process architecture that does not rely on predefined strategies
- Implement the architecture so that a simple strategy can be defeated in a small number of tries
- Make the process cooperative
6. Typical Learning Processes
- State Definition
- Reward Scheme
- Mathematical Model
- Strategy Database
- Probabilistic decision maker (a minimal sketch follows the diagram below)
- Solve the game as a math problem
- Solve a probabilistic graph
[Diagram: two decision loops. Database-driven: Current State -> Game -> Database -> Next Action. Model-driven: Current State -> Game -> Model -> Next Action.]
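The slides do not show how the decision maker works internally; as a hedged sketch (the function and constant names are assumptions, not the actual RoboFlag code), one common way to realize the probabilistic decision step is roulette-wheel sampling over learned action weights:

#include <cstdlib>

// Hypothetical sketch: pick the next action in proportion to its learned
// weight. NUM_ACTIONS mirrors the eight probability fields in the state
// definition shown later in the deck.
const int NUM_ACTIONS = 8;

int chooseAction(const float prob[NUM_ACTIONS]) {
    float total = 0.0f;
    for (int i = 0; i < NUM_ACTIONS; ++i) total += prob[i];
    float r = total * (float)std::rand() / (float)RAND_MAX;
    for (int i = 0; i < NUM_ACTIONS; ++i) {
        r -= prob[i];
        if (r <= 0.0f) return i;   // action i drawn with probability prob[i]/total
    }
    return NUM_ACTIONS - 1;        // guard against floating-point rounding
}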
7. Challenges with RoboFlag
- RoboFlag is a dynamic game, NOT a board game
- Limited model detail
- Limited database size
- Limited computation time
- Small amount of useful information available
- The limited state definition must be efficient and effective
- Limited sharing capability
- The reward system must be aggressive
[Diagram: the decision loops under RoboFlag's constraints: Current State -> Game -> Next Action.]
8. State Definition
struct JRobotStatus {
    float radius;         // radius from the flag
    float theta;          // theta (angle) from the flag
    BOOL  myside;         // which side of the field we are on
    BOOL  enemy_present;  // is there an enemy in front of us?
    BOOL  gotflag;        // do we have the flag?
    float prob1;          // probabilities of the assigned actions
    float prob2;
    float prob3;
    float prob4;
    float prob5;
    float prob6;
    float prob7;
    float prob8;
};
- Contain relevant information
- Easy to interpret
- Small
- Computationally efficient (a sketch of filling the pose fields follows this list)
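As a minimal sketch of how the pose fields above might be filled in (the Vec2 type, the updatePose name, and the midline-at-zero convention are assumptions, not the actual RoboFlag interfaces):

#include <cmath>

struct Vec2 { float x, y; };  // assumed 2-D position type

// Fill the polar pose fields of JRobotStatus from robot and flag positions.
void updatePose(JRobotStatus& s, Vec2 robot, Vec2 flag) {
    float dx = robot.x - flag.x;
    float dy = robot.y - flag.y;
    s.radius = std::sqrt(dx * dx + dy * dy);  // distance from the flag
    s.theta  = std::atan2(dy, dx);            // bearing relative to the flag
    s.myside = (robot.y > 0.0f);              // assumes the midline is y = 0
}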
9. Reward Scheme
- Aggressive
- Robust
- Efficient
enum JReward { Tagged = -5, Ambig = 0, MovedCloser = 2, InZone = 10, GotFlag = 10 };
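The slides list the reward values but not how they feed back into the action probabilities; a minimal sketch, assuming a simple additive update with an invented learning rate and a floor that keeps every action selectable:

// Hypothetical update: nudge the weight of the action just taken by the
// reward it earned. The learning rate and the 0.01 floor are assumptions.
void applyReward(float prob[8], int action, JReward r) {
    const float alpha = 0.05f;            // assumed learning rate
    prob[action] += alpha * (float)r;     // Tagged lowers the weight, GotFlag raises it
    if (prob[action] < 0.01f) prob[action] = 0.01f;  // keep the action selectable
}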
10. The Architecture (Good)
[Diagram: the learning architecture in closed loop with the RoboFlag game.]
11. The Opposition (Evil)
- Man-to-man strategy
- Feasible for one robot to beat
- Spiral approach
- Change directions
12. Results
- Very little movement
- No reaction based on enemy location
- Many inconclusive events
- Flag was never captured
13. Changes
- Changed default probabilities
- Replaced two Boolean variables with enemy location information
- Cosmetic changes to the update function
- Added the ability to read an old log file
14. Results
- More movement toward the flag
- New probability weights made enemy information insignificant
- Did capture the flag
- Logger failed
15. Conclusions
- The architecture did not achieve its original objective but showed potential
- No matter how much learning the computer does, the mechanisms by which it learns must be continuously tweaked
- Trial and error is easy to implement but is probably not the best approach
- There are severe limitations when using mathematics to model reasoning
16. Future Work
- Increase the state definition size until it becomes computationally too expensive
- Implement a mechanism for cooperation with other robots
- Perfect the architecture so that it can learn defensive and offensive strategy at the same time
17. Acknowledgments
- Richard Murray
- Ling Shi
- Brian Beck and Jing Xiong
- CDS Staff
- MURF 2004