Title: Backgammon project
2 Backgammon project
- Oren Salzman
- Guy Levit
- Instructors:
  - Part A: Ishai Menashe
  - Part B: Yaki Engel
3 Agenda
- Project Objectives
- The Learning Algorithm
- TD-Gammon Problematic Points
- The Race Problem
- Experimental Results
- Future Development
4 Objectives
- Developing an agent that learns to play backgammon by playing against itself, using reinforcement learning techniques
- Inspired by Tesauro's TD-Gammon version 0.0
5 Learning Algorithm - general
- Evaluating positions using a neural network
- Greedy policy
- When the game ends the agent gets a reward
according to the result (2, 1, -1, -2)
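
A minimal sketch of the greedy policy and the end-of-game reward described on this slide. The function names, the `evaluate` callback, and the reading of a reward of 2 as a gammon (double) win are assumptions made for illustration; the slide only lists the reward values.

```python
def greedy_move(positions, evaluate):
    """Return the index of the candidate position with the highest estimated value.

    `positions` holds the positions reachable with the current dice roll;
    `evaluate` stands in for the position evaluator (hypothetical callback).
    """
    return max(range(len(positions)), key=lambda i: evaluate(positions[i]))


def terminal_reward(agent_won, gammon):
    """Map a finished game to one of the rewards (2, 1, -1, -2).

    Assumption: magnitude 2 means a gammon, magnitude 1 a single game.
    """
    magnitude = 2 if gammon else 1
    return magnitude if agent_won else -magnitude


# Toy evaluator over two made-up features per position.
def toy_evaluate(features):
    return 0.5 * features[0] - 0.2 * features[1]


best = greedy_move([[0, 1], [1, 0], [1, 1]], toy_evaluate)
```

The policy is greedy in the sense that it never explores: it always picks the candidate the evaluator currently rates highest.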
6 TD-Gammon Problematic Points
- Nonlinear neural network
- Policy changes during training
- Environment changes during training
- Solutions:
  - Linear network
  - Learning in alternations
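
One possible reading of "learning in alternations" is that two copies of the agent take turns: one learns while the other stays frozen, and the roles swap after a fixed number of games, so the learner faces a stationary opponent within each phase. The sketch below follows that reading, which is an assumption; `train_in_alternations` and the `play_and_learn` callback are hypothetical names.

```python
def train_in_alternations(weights_a, weights_b, play_and_learn,
                          num_phases, games_per_phase):
    """Alternate which of two players learns; the other is frozen per phase.

    `play_and_learn(learner_weights, opponent_weights)` is a hypothetical
    callback that plays one game and returns the learner's updated weights.
    """
    players = [list(weights_a), list(weights_b)]
    for phase in range(num_phases):
        learner = phase % 2                            # roles swap every phase
        frozen_opponent = tuple(players[1 - learner])  # immutable snapshot
        for _ in range(games_per_phase):
            players[learner] = play_and_learn(players[learner], frozen_opponent)
    return players


# Dummy "learning" that just increments weights, to show the schedule.
result = train_in_alternations(
    [0.0], [0.0], lambda w, opp: [x + 1 for x in w],
    num_phases=3, games_per_phase=2,
)
```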
7 The Race Problem
- In a race, a more algorithmic approach is required for choosing a move
- Three solutions were considered:
  - Designing a manual algorithm
  - Using a different network for races
  - Using the same network, with each feature dedicated to either race or non-race positions
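
The third option, one network whose features are each dedicated to race or non-race positions, can be sketched as follows. The board encoding (points numbered 1 to 24 from White's side, White moving toward point 1) and both function names are assumptions for illustration, not the project's actual representation.

```python
def is_race(white_points, black_points):
    """True when the two sides can no longer make contact.

    Assumption: points are numbered 1..24 from White's perspective, White
    moves toward 1 and Black toward 24, so contact is over once White's
    rearmost checker is behind Black's rearmost checker.
    """
    return max(white_points) < min(black_points)


def split_features(base_features, race):
    """Place the base features in the race or the non-race half of the input.

    The unused half is zero, so every weight of a linear evaluator is trained
    on, and used for, only one kind of position.
    """
    zeros = [0.0] * len(base_features)
    return zeros + list(base_features) if race else list(base_features) + zeros


race_input = split_features([1.0, 2.0], is_race([1, 2, 3], [10, 20]))
contact_input = split_features([1.0, 2.0], is_race([5, 12], [10]))
```

With a linear evaluator this behaves like two separate weight sets, while still being trained and stored as a single network.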
8 Experiments
- Various parameter settings were checked:
  - Learning step (0.1, 0.3, 0.8)
  - Lambda (0.1, 0.3, 0.5, 0.7, 0.9)
  - Discount factor (0.95, 0.97, 0.98, 0.999)
- For each setting the agent played between half a million and five million games.
- All versions were compared to one reference ("golden") version
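
The parameter values above can be enumerated as a grid. The slide does not say whether every combination was actually run, so the full Cartesian product below is an assumption, and the dictionary keys are illustrative names.

```python
from itertools import product

LEARNING_STEPS = (0.1, 0.3, 0.8)
LAMBDAS = (0.1, 0.3, 0.5, 0.7, 0.9)
DISCOUNT_FACTORS = (0.95, 0.97, 0.98, 0.999)

# One dictionary per candidate setting (assumes a full grid).
settings = [
    {"learning_step": alpha, "lambda": lam, "discount": gamma}
    for alpha, lam, gamma in product(LEARNING_STEPS, LAMBDAS, DISCOUNT_FACTORS)
]
```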
9 Experimental results
11 Conclusions
- A learning step of 0.1 yielded the best results
- High discount factors (0.98, 0.999) performed better than lower ones
- Lambda values of 0.1 and 0.9 were inferior to the others; among 0.3, 0.5, and 0.7, 0.5 seemed the best
- None of the versions outperformed the golden version
12 Future development
- More than 1-ply search
- Adding features
- Going back to a nonlinear network
- Letting both agents learn simultaneously
- Connecting the player to the internet
- Graphical User Interface
13 END
14 Learning Algorithm - general
- The agent plays against itself and gets a reward (-2, -1, 1, 2) when the game ends.
- The network weights are updated using the following formulas
- The eligibility trace is updated by
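
The formulas themselves are not reproduced here. A common form of the TD(lambda) update with a discount factor, consistent with the parameters named in the experiments (learning step alpha, lambda, discount factor gamma), is: delta = r + gamma * V(s') - V(s), e <- gamma * lambda * e + grad_w V(s), w <- w + alpha * delta * e. The sketch below applies it to a linear evaluator, where the gradient is simply the feature vector. Whether the project used exactly this form is an assumption, and `td_lambda_step` is a hypothetical name.

```python
def td_lambda_step(weights, trace, features, value_next, reward,
                   alpha, lam, gamma):
    """One TD(lambda) update for a linear evaluator V(s) = w . x(s).

    `value_next` is V(s') for the following position (0 at the end of the
    game); `reward` is 0 during play and the game result at the end.
    Returns the new weights and the new eligibility trace.
    """
    value = sum(w * x for w, x in zip(weights, features))
    # For a linear evaluator, the gradient of V with respect to w is x(s).
    new_trace = [gamma * lam * e + x for e, x in zip(trace, features)]
    delta = reward + gamma * value_next - value  # TD error
    new_weights = [w + alpha * delta * e for w, e in zip(weights, new_trace)]
    return new_weights, new_trace


# Final move of a game won with reward 2, starting from zero weights.
w, e = td_lambda_step([0.0, 0.0], [0.0, 0.0], [1.0, 0.0],
                      value_next=0.0, reward=2,
                      alpha=0.1, lam=0.5, gamma=0.98)
```

The eligibility trace should be reset to zeros at the start of every game, so that credit from one game does not leak into the next.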
15 The Features
16 Backgammon Board Definitions