Title: Evolving & Adapting Opponents in PacMan

1. Evolving & Adapting Opponents in PacMan
- PRESENTED BY
- DEBARGHYA MAJUMDAR
- SOMDAS BANDYOPADHYAY
- KAUSTAV DEY BISWAS
2. Topics to be discussed
- PacMan
- The Game
- Areas of Intelligence
- Classical Ghost Algorithm
- Basic Neuro-Evolution
- The Model
- The ANN
- Offline Learning
- Online Learning
- Neuro-Evolution of Augmenting Topologies
- The Model
- The ANN
- Training Algorithm
- Experiments
- Discussions
- Conclusions
- References
3. PacMan: The Game
- Classical PacMan released by Namco (Japan) in 1980
- Single-player predator/prey game
- Player plays as the PacMan
- Navigates through a 2D maze
- Eats pellets
- Avoids Ghosts
- Ghosts played by the computer
- Usually 4 in number
- Chase the PacMan and try to catch it
- Winning / Losing
- PacMan wins if it has eaten all pellets or survived for 180 sec
- PacMan loses if caught by a Ghost
- Special Power-pills
- Allow PacMan to eat the Ghosts for a short period of time
- Eaten Ghosts respawn
4. PacMan: Areas of Intelligence
- Programming the PacMan to replace the human player
- Heavily researched field
- Gives insight into AI, but doesn't add value to the game engine
- Making the Ghosts more intelligent
- Strives to
- Make the Ghosts more efficient killers
- Incorporate teamwork
- Minimize the score of the PacMan
- Make the Ghosts learn and adapt
- Make the game more 'interesting'
- Adds value to the game
- Scarcely researched field
- Currently our topic of interest
5. PacMan: The Classical Ghost Algorithm
- Two modes of play
- Attack: chase down PacMan
- Scatter: break away when PacMan has a power-pill
- Decide target cell on reaching an intersection
- Attack targets
- Red Ghost: current position of PacMan
- Pink Ghost: cell 4 positions ahead of PacMan
- Blue Ghost: cell 2 positions ahead of PacMan
- Orange Ghost: targets PacMan when far away, retires to its corner when nearby
- Scatter target
- Pseudo-random behaviour
- Minimize distance from target cell
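The per-ghost targeting rules above can be sketched in code. This is an illustrative approximation, not the exact Namco implementation: the `ahead` helper, the orange ghost's distance threshold, its home-corner cell, and the Manhattan distance metric are all assumptions.

```python
def ahead(pacman_pos, pacman_dir, n):
    """Cell n positions ahead of PacMan along its current direction (assumed helper)."""
    x, y = pacman_pos
    dx, dy = pacman_dir
    return (x + n * dx, y + n * dy)

def attack_target(ghost, ghost_pos, pacman_pos, pacman_dir):
    """Target cell for each ghost in attack mode, per the rules above."""
    if ghost == "red":                       # current position of PacMan
        return pacman_pos
    if ghost == "pink":                      # cell 4 positions ahead
        return ahead(pacman_pos, pacman_dir, 4)
    if ghost == "blue":                      # cell 2 positions ahead
        return ahead(pacman_pos, pacman_dir, 2)
    # orange: chase when far away, retire to its corner when nearby
    gx, gy = ghost_pos
    px, py = pacman_pos
    if abs(gx - px) + abs(gy - py) > 8:      # assumed distance threshold
        return pacman_pos
    return (0, 0)                            # assumed home corner

def choose_move(ghost_pos, target, options):
    """At an intersection, pick the option minimizing distance to the target cell."""
    tx, ty = target
    return min(options, key=lambda c: abs(c[0] - tx) + abs(c[1] - ty))
```

In scatter mode the target would instead be drawn pseudo-randomly, which is why the classical ghosts remain somewhat unpredictable without any learning.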
6. PacMan: Making the Ghosts Intelligent
- We will discuss two approaches for programming intelligence, learnability and adaptability into the Ghosts
- Local optimization using basic Neuro-Evolution
- Global optimization using Neuro-Evolution of Augmenting Topologies (NEAT)
7. Basic Neuro-Evolution: The Model
- Game field modeled as a grid. Each square can hold any of
- Wall block
- Pellet
- PacMan
- Ghost(s)
- Power-pills not incorporated, to keep things simple
- Ghosts have only local info about the grid
- Game proceeds in cycles of 2 steps
- Gather information about the environment
- Make a move by processing the information
- Each Ghost controlled independently by a dedicated ANN
- ANNs trained by an Evolutionary Algorithm (GA)
- Offline (train, then deploy)
- Online (learn as you go)
- Weight-adjusting is the only means of training the ANNs
8. Basic Neuro-Evolution: The ANN
- A 4-5-4 feed-forward neural controller, comprising sigmoid neurons, is employed to manage the Ghosts' motion.
- Inputs
- Using their sensors, Ghosts inspect the environment from their own point of view and decide their next action.
- Each Ghost receives input information from its environment, expressed in the neural network's input array of dimension 4
- Outputs
- Four scores, one for each of the four directions
- The direction with the maximum score is selected
9. Basic Neuro-Evolution: The ANN Inputs
- The input array consists of
- Δx_P = x_g − x_p
- Δy_P = y_g − y_p
- Δx_C = x_g − x_c
- Δy_C = y_g − y_c
Picture courtesy Ref. 1
where (x_g, y_g), (x_p, y_p) and (x_c, y_c) are the Cartesian co-ordinates of the current positions of the current Ghost, PacMan and the closest Ghost respectively.
10. Basic Neuro-Evolution: The Network
[Network diagram: inputs Δx_P, Δy_P, Δx_C, Δy_C feed a hidden layer of 5 sigmoid neurons; outputs are the scores for UP, DOWN, LEFT and RIGHT]
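The controller above can be sketched as a plain 4-5-4 forward pass. The weight layout and the bias-free neurons are assumptions: the slides specify only the layer sizes and sigmoid activations.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, w_hidden, w_out):
    """inputs: [dx_P, dy_P, dx_C, dy_C]; returns the 4 direction scores."""
    hidden = [sigmoid(sum(w * i for w, i in zip(row, inputs)))
              for row in w_hidden]                 # 5 hidden sigmoid neurons
    return [sigmoid(sum(w * h for w, h in zip(row, hidden)))
            for row in w_out]                      # 4 output scores

DIRECTIONS = ["UP", "DOWN", "LEFT", "RIGHT"]

def choose_direction(ghost, pacman, closest, w_hidden, w_out):
    """Build the input array from positions and pick the max-score move."""
    gx, gy = ghost
    px, py = pacman
    cx, cy = closest
    inputs = [gx - px, gy - py, gx - cx, gy - cy]  # the Δ input array above
    scores = forward(inputs, w_hidden, w_out)
    return DIRECTIONS[scores.index(max(scores))]
```

Note the controller never sees the maze layout directly: only relative offsets to PacMan and the closest fellow Ghost, which is what makes it a local-information model.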
11. Basic Neuro-Evolution: Offline Learning
- An offline evolutionary learning approach is used to produce some good (i.e. in terms of performance) initial behaviors for the online learning mechanism.
- The neural networks that determine the behavior of the Ghosts are themselves evolved.
- The evolving process is limited to the connection weights of the neural network.
- The evolutionary procedure is based on a genetic algorithm.
- Each Ghost has a genome that encodes the connection weights of its neural network.
- A population of neural networks (Ghosts) is initialized with uniformly distributed random connection weights that lie within [-5, 5].
12. Basic Neuro-Evolution: Offline Learning Algorithm
- At each generation
- Step 1
- Every Ghost in the population is cloned 4 times.
- These 4 clones are placed in the PacMan game field and play N games, each one for an evaluation period of t simulation steps.
- The outcome of these games is to ascertain the time t_k taken to kill PacMan in each game.
13. Basic Neuro-Evolution: Offline Learning Algorithm (contd.)
- Step 2
- Each Ghost is evaluated for each game and its fitness value is the average E{f} over the N games, where f is a function of the killing time t_k.
- By the use of the fitness function f, we promote PacMan-killing behaviors capable of achieving high performance values.
14. Basic Neuro-Evolution: Offline Learning Algorithm (contd.)
- Step 3
- A pure elitism selection method is used where
- only the 10 best-fit solutions determine the members of the intermediate population and, therefore, are able to breed.
- Step 4
- Each parent clones an equal number of offspring in order to replace the non-picked solutions from elitism.
15. Basic Neuro-Evolution: Offline Learning Algorithm (contd.)
- Step 5
- Mutation occurs in each gene (connection weight) of each offspring's genome with a small probability. A uniform random distribution is used to define the mutated value of the connection weight.
- The algorithm terminates when a predetermined number of generations g is reached (e.g. g = 1000) and the best-fit Ghost's connection weights are saved.
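Steps 1-5 can be sketched as a single evolutionary loop. The population size, mutation rate, and `play_games` are illustrative stand-ins, and the fitness here is an assumed form that merely rewards short killing times t_k; the exact formula is in Ref. 1.

```python
import random

POP_SIZE = 50
N_WEIGHTS = 4 * 5 + 5 * 4    # connection weights of the 4-5-4 network
ELITE = 10                   # Step 3: only the 10 best-fit solutions breed
MUT_RATE = 0.05              # Step 5: small per-gene mutation probability

def random_genome():
    """Uniformly random connection weights within [-5, 5]."""
    return [random.uniform(-5, 5) for _ in range(N_WEIGHTS)]

def play_games(genome, n_games=5, t_max=200):
    """Placeholder for Steps 1-2: returns the killing times t_k of N games."""
    return [random.uniform(10, t_max) for _ in range(n_games)]

def fitness(genome):
    """Assumed fitness: grows as the Ghosts kill PacMan faster, averaged over N games."""
    times = play_games(genome)
    return sum(1.0 / tk for tk in times) / len(times)

def evolve(generations=20):
    pop = [random_genome() for _ in range(POP_SIZE)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[:ELITE]                      # Step 3: pure elitism
        children = []
        while len(elite) + len(children) < POP_SIZE:
            child = list(random.choice(elite))   # Step 4: parents clone offspring
            for i in range(len(child)):          # Step 5: uniform mutation
                if random.random() < MUT_RATE:
                    child[i] = random.uniform(-5, 5)
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)                 # best-fit Ghost's weights
```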
16. Basic Neuro-Evolution: Offline Learning
- Pros
- Computationally efficient: minimal in-game computation
- Can be tailor-made for specific maps
- Cons
- Cannot adapt to changing maps
- May overfit to the training player's characteristics
17. Basic Neuro-Evolution: Online Learning
- This learning approach is based on the idea of Ghosts that learn while they are playing against PacMan.
- In other words, Ghosts that are reactive to any player's behavior and learn from its strategy instead of being predictable and uninteresting.
- Furthermore, this approach's additional objective is to keep the game's interest at a high level for as long as it is being played.
18. Basic Neuro-Evolution: Online Learning (contd.)
- Beginning from any initial group of homogeneous offline-trained (OLT) Ghosts, the online learning (OLL) mechanism attempts to transform them into a group of heterogeneous Ghosts that are interesting to play against.
- An OLT Ghost is cloned 4 times and its clones are placed in the PacMan game field to play against a selected PacMan type of player.
19. Basic Neuro-Evolution: Online Learning Algorithm
- At each generation
- Step 1
- Each Ghost is evaluated every t simulation steps while the game is played. The fitness function is
- f = d_1 − d_t
- where
- d_i = distance between the Ghost and PacMan at the i-th simulation step.
- This fitness function promotes Ghosts that move towards PacMan within an evaluation period of t simulation steps.
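The online fitness above is simply the gap closed over the evaluation window: positive when the Ghost ended the window closer to PacMan than it started.

```python
def online_fitness(distances):
    """f = d_1 - d_t, where distances[i] is the Ghost-PacMan
    distance at simulation step i+1 of the evaluation window."""
    return distances[0] - distances[-1]
```

For example, a Ghost whose distance shrank from 10 to 4 over the window scores 6, while one that drifted from 3 to 5 scores -2.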
20. Basic Neuro-Evolution: Online Learning Algorithm (contd.)
- Step 2
- A pure elitism selection method is used where only the fittest solution is able to breed. The best-fit parent clones an offspring.
- Step 3
- Mutation occurs in each gene (connection weight) of each offspring's genome with a probability that is inversely proportional to the entropy of the group of Ghosts.
21. Basic Neuro-Evolution: Online Learning Algorithm (contd.)
- Step 4
- The cloned offspring is evaluated briefly in offline mode, that is, by replacing the worst-fit member of the population and playing an offline (i.e. no visualization of the actions) short game of t simulation steps.
- The fitness values of the mutated offspring and the worst-fit Ghost are compared and the better one is kept for the next generation.
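One online generation (Steps 2-4) can be sketched as follows. The genome representation, the particular diversity measure used for the entropy, and the `simulate` callback are assumptions made for illustration.

```python
import math
import random

def behaviour_entropy(pop):
    """Assumed diversity measure: entropy of the genomes' weight-sign patterns."""
    counts = {}
    for g in pop:
        key = tuple(w > 0 for w in g)
        counts[key] = counts.get(key, 0) + 1
    n = len(pop)
    return -sum(c / n * math.log(c / n) for c in counts.values())

def online_generation(pop, fitnesses, simulate):
    # Step 2: pure elitism - only the fittest Ghost clones one offspring.
    best = max(range(len(pop)), key=lambda i: fitnesses[i])
    child = list(pop[best])
    # Step 3: mutation probability inversely proportional to group entropy
    # (assumed form), so a homogeneous group mutates more aggressively.
    p_mut = 1.0 / (1.0 + behaviour_entropy(pop))
    for i in range(len(child)):
        if random.random() < p_mut:
            child[i] = random.uniform(-5, 5)
    # Step 4: evaluate the offspring in a short offline game in place of
    # the worst-fit Ghost; keep whichever of the two scores better.
    worst = min(range(len(pop)), key=lambda i: fitnesses[i])
    child_fit = simulate(child)
    if child_fit > fitnesses[worst]:
        pop[worst], fitnesses[worst] = child, child_fit
    return pop, fitnesses
```

The offline trial in Step 4 is what keeps the live game safe: a badly mutated offspring is discarded before it ever controls a visible Ghost.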
22. Basic Neuro-Evolution: Online Learning
- Pros
- Adapts easily to varying maps and player mindsets
- Highly generalized
- Cons
- Slow due to intensive computations during run-time
- May take some time to re-train on new maps
23. Neuro-Evolution of Augmenting Topologies: The Model
- Takes into account teamwork and strategic formations
- Operates on global data
- Has three modes of operation
- Chasing (pursuing PacMan)
- Fleeing (evading PacMan when PacMan has consumed a power-pill)
- Returning (returning to the hideout to be restored)
- Optimizes the team of Ghosts as a whole
- Each Ghost controlled independently by a dedicated ANN
- ANNs trained by an Evolutionary Algorithm (GA)
- ANN training affects
- Weights of the edges
- Interconnection of the perceptrons (ANN topology)
- Ghosts trained in real-time
- Training proceeds in parallel with the game
- Adapts the Ghosts over short time slices
- Ghosts classified according to their distance from PacMan
- Each distance class has a dedicated ANN population which evolves genetically
- Multiple populations aid in heterogeneous strategy development
24. Neuro-Evolution of Augmenting Topologies: The ANN
- Each ANN represents a Ghost
- Input
- Current status of the Ghost
- Current status of the closest Ghost
- Current status of the Ghost closest to PacMan
- Distances to objects of interest (PacMan, Ghost, power-pill, pellet, intersection, etc.)
- Distances between PacMan and objects of interest (Ghost, power-pill, pellet, intersection, etc.)
- Output
- Score of a cell
- Applied 4 times, once for each adjacent cell
- Cell with the maximum score selected for making the move
- Connections
- Minimally connected
- Evolves through NEAT
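The single-output scoring scheme above can be sketched as follows: the same (NEAT-evolved) network is applied once per adjacent cell and the Ghost moves to the highest-scoring cell. `score_cell` stands in for the evolved network, and `features_for` is an assumed helper that would assemble the input list from the game state.

```python
def select_move(score_cell, adjacent_cells, game_state):
    """Apply the one-output scoring network to each adjacent cell
    and return the cell with the maximum score."""
    best_cell, best_score = None, float("-inf")
    for cell in adjacent_cells:                     # up to 4 applications
        features = game_state.features_for(cell)    # assumed feature builder
        s = score_cell(features)
        if s > best_score:
            best_cell, best_score = cell, s
    return best_cell
```

Scoring cells rather than directions is what lets a single evolved network generalize across Ghosts in different positions and modes.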
25. Neuro-Evolution of Augmenting Topologies: Training Algorithm
- Initialize
- A number of random neural network populations are generated, each corresponding to Ghosts classified according to their distance to PacMan.
- The game is divided into time slices of a small number of moves. G_n represents the state of the game at the beginning of time slice n.
26. Neuro-Evolution of Augmenting Topologies: Training Algorithm (contd.)
- Algorithm
- Mark a Ghost for learning during the current time slice, beginning at G_n.
- Look ahead (based on the models of the other Ghosts and PacMan) and store the game state expected at the beginning of the next slice through simulated play (eG_{n+1}). This will be the starting state for the NEAT simulation runs.
- The fitness of a Ghost strategy is determined by evaluating the game state that we expect to reach when the strategy is used in place of the marked Ghost (eG_{n+2}). This is an evaluation of the end state. Various fitness schemes are considered.
- In parallel with the running of the actual game, run NEAT until the actual game reaches G_{n+1}.
- The best individual from the simulations is substituted into the game, replacing the marked Ghost.
- Repeat the process for the next Ghost in turn.
27. Neuro-Evolution of Augmenting Topologies: Training Algorithm - Pictorial Representation
Picture courtesy Ref. 2
28. Neuro-Evolution of Augmenting Topologies: Experiment 1 - Chasing and Evading PacMan
Fitness ranking: Rank 1: PacMan's number of lives; Rank 2: …

               Score    Lives Lost
Classical AI   4808.4   1.44
Experiment 1   4127.6   1.12

Table courtesy Ref. 2
- Improvement over Classical AI
- Ghosts tend to form clusters, reducing effectiveness
29. Neuro-Evolution of Augmenting Topologies: Experiment 2 - Remaining Dispersed
Fitness ranking: Rank 1: PacMan's number of lives; Rank 2: …; Rank 3: …

               Score    Lives Lost
Classical AI   4808.4   1.44
Experiment 1   4127.6   1.12
Experiment 2   4930.8   1.52

Table courtesy Ref. 2
- Inefficient as compared to Experiment 1
- Ghosts tend to oscillate in dispersed locations
30. Neuro-Evolution of Augmenting Topologies: Experiment 3 - Protection Behaviour
Fitness ranking: Rank 1: PacMan's number of lives; Rank 2: count(Ghost_r); Rank 3: count(Ghost_f); Rank 4: …; Rank 5: …

               Score    Lives Lost
Classical AI   4808.4   1.44
Experiment 1   4127.6   1.12
Experiment 2   4930.8   1.52
Experiment 3   4271.6   1.64

Table courtesy Ref. 2
- Teamwork improved
- Ghosts committing suicide!
31. Neuro-Evolution of Augmenting Topologies: Experiment 4 - Ambushing PacMan
Fitness ranking: Rank 1: PacMan's number of lives; Rank 2: intersections controlled by PacMan; Rank 3: …; Rank 4: …; Rank 5: PacMan's score

               Score    Lives Lost
Classical AI   4808.4   1.44
Experiment 1   4127.6   1.12
Experiment 2   4930.8   1.52
Experiment 3   4271.6   1.64
Experiment 4   4494.4   1.96

Table courtesy Ref. 2
- Kill rate significantly increased
32. Neuro-Evolution of Augmenting Topologies: Discussions
- Uses high-level global data about the state of the game
- Reduces computational lag by looking ahead and employing parallelism
- Encourages the system to learn short-term strategies rather than generalized long-term ones
- Lays out basic fitness schemes, opening up a horizon for many more
- Demonstrates complex team behaviours
33. Conclusion
- PacMan serves as a good test-bed for programming intelligent agents
- Generalized strategies are applicable to a vast class of predator/prey games
- Programming Ghosts gives good insight into efficient team strategies
34. References
- [1] G. N. Yannakakis and J. Hallam, "Evolving Opponents for Interesting Interactive Computer Games," in Proceedings of the 8th International Conference on the Simulation of Adaptive Behavior (SAB'04): From Animals to Animats 8, pp. 499-508, Los Angeles, CA, USA, July 13-17, 2004. The MIT Press.
- [2] M. Wittkamp, L. Barone, and P. Hingston, "Using NEAT for Continuous Adaptation and Teamwork Formation in Pacman," in 2008 IEEE Symposium on Computational Intelligence and Games (CIG'08).
- [3] K. O. Stanley and R. Miikkulainen, "Evolving Neural Networks through Augmenting Topologies," Evolutionary Computation, 2002. The MIT Press Journals.