Backgammon project

Slides: 17
Provided by: salz151

2
Backgammon project
  • Oren Salzman
  • Guy Levit
  • Instructors:
    • Part A: Ishai Menashe
    • Part B: Yaki Engel

3
Agenda
  • Project Objectives
  • The Learning Algorithm
  • TD-Gammon: Problematic Points
  • The Race Problem
  • Experimental Results
  • Future Development

4
Objectives
  • Develop an agent that learns to play
    backgammon by playing against itself, using
    reinforcement learning techniques
  • Inspired by Tesauro's TD-Gammon version 0.0

5
Learning Algorithm - general
  • Positions are evaluated by a neural network
  • Greedy policy: the agent chooses the move that
    leads to the highest-valued position
  • When the game ends, the agent receives a reward
    according to the result (2, 1, -1, -2)
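The evaluation and greedy selection described above can be sketched as follows. This is a minimal illustration, not the project's code: the names `evaluate`, `greedy_move`, and `final_reward` are hypothetical, a linear evaluator stands in for the network, and the mapping of the rewards 2 and -2 to gammons is an assumption based on standard backgammon scoring.

```python
def evaluate(weights, features):
    """Stand-in evaluator: a linear score of a position's feature vector."""
    return sum(w * x for w, x in zip(weights, features))


def greedy_move(weights, candidate_features):
    """Return the index of the candidate position with the highest value.

    candidate_features holds one feature vector per legal move, describing
    the position reached after that move.
    """
    values = [evaluate(weights, f) for f in candidate_features]
    return max(range(len(values)), key=values.__getitem__)


def final_reward(agent_won, gammon):
    """Terminal reward in {2, 1, -1, -2}.

    Assumption: magnitude 2 corresponds to a gammon, 1 to a normal game.
    """
    magnitude = 2 if gammon else 1
    return magnitude if agent_won else -magnitude
```

For example, with weights `[1.0, -1.0]` and candidates `[[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]`, `greedy_move` selects index 1, the candidate with the largest score.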

6
TD-Gammon: problematic points
  • Nonlinear neural network
  • The policy changes during training
  • The environment changes during training
  • Solutions:
    • Linear network
    • Learning in alternations
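One possible reading of "learning in alternations" is that only one of the two self-play agents updates its weights at a time while the other stays frozen, and the roles swap after a fixed number of games. The sketch below illustrates that reading only. The names `alternating_schedule`, `train_in_alternations`, and `play_and_learn`, and the phase length, are assumptions and do not come from the slides.

```python
def alternating_schedule(total_games, games_per_phase):
    """Yield (game_index, learner) pairs; the learner switches every phase."""
    for game in range(total_games):
        phase = game // games_per_phase
        yield game, ("A" if phase % 2 == 0 else "B")


def train_in_alternations(total_games, games_per_phase, play_and_learn):
    """Run self-play where only one agent learns in any given game.

    play_and_learn(learner, frozen) is a hypothetical callback that plays
    one game and updates only the learner's weights.
    """
    for _, learner in alternating_schedule(total_games, games_per_phase):
        frozen = "B" if learner == "A" else "A"
        play_and_learn(learner, frozen)
```

Under this scheme the learner always faces a fixed opponent within a phase, so its environment stays stationary for that phase.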

7
The Race Problem
  • In a race, a more algorithmic approach is required
    for choosing a move
  • Three solutions were considered:
    • Designing a manual algorithm
    • Using a different network for races
    • Using the same network, with each feature
      dedicated to either race or non-race
      positions
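The third option can be illustrated with a gated feature vector: the same base features are written into one of two slots depending on whether the position is a race, and the other slot is zeroed. This is a sketch under assumptions. The function name `gated_features` and the idea of reusing identical base features in both slots are hypothetical, and the slides do not say how a race position is detected, so that decision is passed in as a flag.

```python
def gated_features(base_features, is_race):
    """Return a vector of twice the base length.

    Layout (assumed): [non-race slot | race slot]. Only the slot matching
    the position type carries the base features; the other is all zeros.
    """
    zeros = [0.0] * len(base_features)
    if is_race:
        return zeros + list(base_features)
    return list(base_features) + zeros
```

With a linear network, this layout behaves like two separate weight sets stored in one vector, since the zeroed slot contributes nothing to the score.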

8
Experiments
  • Various parameter settings were tested:
    • Learning step (0.1, 0.3, 0.8)
    • Lambda (0.1, 0.3, 0.5, 0.7, 0.9)
    • Discount factor (0.95, 0.97, 0.98, 0.999)
  • For each setting, the agent played between half a
    million and five million games
  • All versions were compared to one "golden" version
9
Experimental results
10
Experimental results
11
Conclusions
  • A learning step of 0.1 yielded the best results
  • High discount factors (0.98, 0.999) performed
    better than lower ones
  • Lambda values of 0.1 and 0.9 were inferior to the
    others; among 0.3, 0.5, and 0.7, 0.5 seemed the best
  • None of the versions outperformed the golden
    version

12
Future development
  • More than 1-ply search
  • Adding features
  • Going back to a nonlinear network
  • Letting both agents learn simultaneously
  • Connecting the player to the internet
  • Graphical User Interface

13
END
14
Learning Algorithm - general
  • The agent plays against itself and receives a
    reward (-2, -1, 1, 2) when the game ends.
  • The network weights are updated using a weight
    update formula (not preserved in this transcript)
  • The eligibility trace is updated by a second
    formula (not preserved in this transcript)
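The slide's own formulas are not available here, so the sketch below uses the standard textbook TD(lambda) update with accumulating eligibility traces for a linear evaluator V(s) = w . x(s). It may differ from what the authors used. In plain notation: e <- gamma * lambda * e + x(s), delta = r + gamma * V(s') - V(s), and w <- w + alpha * delta * e. The function name `td_lambda_step` and its argument names are hypothetical.

```python
def td_lambda_step(weights, trace, features, next_value, reward,
                   alpha, lam, gamma):
    """One TD(lambda) update for a linear evaluator.

    features   -- feature vector x(s) of the current position
    next_value -- V(s') of the following position (0.0 at the end of a game)
    reward     -- 0 during play, the final reward at the end of a game
    Returns the updated (weights, trace) as new lists.
    """
    value = sum(w * x for w, x in zip(weights, features))
    # Decay the trace, then add the gradient of V, which is x(s) for a linear V.
    new_trace = [gamma * lam * e + x for e, x in zip(trace, features)]
    # Temporal-difference error between successive estimates.
    delta = reward + gamma * next_value - value
    new_weights = [w + alpha * delta * e for w, e in zip(weights, new_trace)]
    return new_weights, new_trace
```

The trace should be reset to zeros at the start of each game, so that credit from one game does not leak into the next.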

15
The Features
16
Backgammon Board Definitions