Backgammon project

About This Presentation

Slides: 17

Provided by: salz151

Transcript and Presenter's Notes

Title: Backgammon project

2
Backgammon project

3
Agenda

4
Objectives

Developing an agent that learns to play
backgammon by playing against itself, using
reinforcement learning techniques
Inspired by Tesauro's TD-Gammon version 0.0

5
Learning Algorithm - general

Evaluating positions using a neural network
Greedy policy
When the game ends the agent gets a reward
according to the result (2, 1, -1, -2)
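
The greedy policy above can be sketched as follows. This is a minimal illustration, not the project's code: `greedy_move` and `evaluate` are hypothetical names, `evaluate` stands in for the neural network, and the reading of the rewards (2 and -2 for a gammon, 1 and -1 for a normal game) is an assumption.

```python
from typing import Callable, Iterable, TypeVar

Position = TypeVar("Position")

# Terminal rewards listed on the slide. Assumed meaning: +2/-2 for a
# gammon win/loss, +1/-1 for a normal win/loss.
TERMINAL_REWARDS = (2, 1, -1, -2)


def greedy_move(
    candidates: Iterable[Position],
    evaluate: Callable[[Position], float],
) -> Position:
    """Return the candidate position with the highest estimated value.

    `candidates` are the positions reachable with the current dice roll;
    `evaluate` is a stand-in for the network, assumed to score a position
    from the point of view of the player who just moved.
    """
    options = list(candidates)
    if not options:
        raise ValueError("no legal moves for this roll")
    return max(options, key=evaluate)


# Toy usage: positions are tuples and the "network" is just their sum.
best = greedy_move([(1, 2), (5, 0), (0, 0)], evaluate=sum)
```

With a greedy policy there is no explicit exploration step; in backgammon the dice rolls already vary the positions the agent visits.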

6
TD-Gammon: problematic points

7
The Race Problem

In a race, a more algorithmic approach is required
for choosing a move
Three solutions were considered
Designing a manual algorithm
Using a different network for races
Using the same network, but with each feature
dedicated either to a race or to a non-race
position.
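
The third option can be sketched as follows, under stated assumptions. The board encoding (24 points, positive counts for the agent, negative for the opponent, the agent moving toward index 0) and the names `is_race` and `route_features` are hypothetical; the slides do not give the actual board definition or feature layout.

```python
from typing import List, Sequence


def is_race(points: Sequence[int], bar_agent: int = 0, bar_opponent: int = 0) -> bool:
    """Return True when no further contact between the two sides is possible.

    Assumed encoding: points[i] > 0 means agent checkers on point i,
    points[i] < 0 means opponent checkers. The agent moves toward index 0
    and the opponent toward index 23, so contact remains while some agent
    checker sits at a higher index than some opponent checker.
    """
    if bar_agent or bar_opponent:
        return False  # a checker on the bar still has to re-enter
    agent = [i for i, n in enumerate(points) if n > 0]
    opponent = [i for i, n in enumerate(points) if n < 0]
    if not agent or not opponent:
        return True  # one side has borne off everything
    return max(agent) < min(opponent)


def route_features(base: Sequence[float], race: bool) -> List[float]:
    """Place the base features in a race slot or a non-race slot.

    The network input is twice as long as `base`; the unused half is zero,
    so each input unit only ever sees one kind of position.
    """
    zeros = [0.0] * len(base)
    return zeros + list(base) if race else list(base) + zeros


board = [0] * 24
board[2], board[20] = 3, -3           # the sides have passed each other
inputs = route_features([1.0, 2.0], is_race(board))
```

Compared with a separate network for races, this keeps a single set of output weights while still letting the input weights specialise by position type.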

8
Experiments

Various settings of parameters were checked
Learning step (0.1, 0.3, 0.8)
Lambda (0.1, 0.3, 0.5, 0.7, 0.9)
Discount factor (0.95, 0.97, 0.98, 0.999)
For each setting the agent played between half a
million and five million games.
All versions were compared to one golden version
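
To make the role of the three parameters concrete, here is a hedged sketch. It enumerates the listed values as a full grid (the slides do not say whether every combination was actually run) and shows one TD(lambda) step for a linear value function, which is a simplification of the neural network used in the project. All names are hypothetical.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

LEARNING_STEPS = (0.1, 0.3, 0.8)
LAMBDAS = (0.1, 0.3, 0.5, 0.7, 0.9)
DISCOUNTS = (0.95, 0.97, 0.98, 0.999)


def all_settings() -> List[Dict[str, float]]:
    """Every combination of the values listed on the slide (an upper bound)."""
    return [
        {"alpha": a, "lam": l, "gamma": g}
        for a, l, g in product(LEARNING_STEPS, LAMBDAS, DISCOUNTS)
    ]


def td_lambda_step(
    weights: Sequence[float],
    trace: Sequence[float],
    features: Sequence[float],
    next_value: float,
    reward: float,
    alpha: float,
    lam: float,
    gamma: float,
) -> Tuple[List[float], List[float]]:
    """One TD(lambda) update for a linear value function V(s) = w . x(s)."""
    value = sum(w * x for w, x in zip(weights, features))
    # For a linear V the gradient with respect to the weights is x(s).
    new_trace = [gamma * lam * e + x for e, x in zip(trace, features)]
    # TD error: reward plus discounted next estimate minus current estimate.
    delta = reward + gamma * next_value - value
    new_weights = [w + alpha * delta * e for w, e in zip(weights, new_trace)]
    return new_weights, new_trace


settings = all_settings()
w, e = td_lambda_step([0.0, 0.0], [0.0, 0.0], [1.0, 0.0],
                      next_value=0.0, reward=1.0, **settings[0])
```

The learning step scales every weight change, lambda controls how far back along the game credit is spread, and the discount factor shrinks the value of rewards that are many moves away.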

9
Experiment results
10
Experiment results
11
Conclusions

A learning step of 0.1 yielded the best results
High discount factors (0.98, 0.999) performed
better than lower ones.
Lambda values of 0.1 and 0.9 were inferior to the
others. Among 0.3, 0.5, and 0.7, 0.5 seemed the
best.
None of the versions outperformed the golden
version

12
Future development

13
END
14
Learning Algorithm - general

The agent plays against itself and gets a reward
(-2, -1, 1, 2) when the game ends.
The network weights are updated using the
following formulas
The eligibility trace is updated by
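
The formulas themselves are not included in this transcript. A standard TD(lambda) form that is consistent with the parameters named in the experiments (learning step alpha, lambda, discount factor gamma) is given here as an assumption; it is not necessarily the exact form the authors used. V is the network's evaluation of a position, w the weight vector, and e the eligibility trace:

```latex
\begin{align}
\delta_t &= r_{t+1} + \gamma\, V(s_{t+1}) - V(s_t) \\
w_{t+1}  &= w_t + \alpha\, \delta_t\, e_t \\
e_t      &= \gamma \lambda\, e_{t-1} + \nabla_w V(s_t), \qquad e_{-1} = 0
\end{align}
```

Under this reading, the reward r is zero on every move except the last one, where it takes one of the values (-2, -1, 1, 2), and the value of the terminal position is taken to be zero.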

15
The Features
16
Backgammon Board Definitions
