How to Win a Chinese Chess Game

Transcript and Presenter's Notes

1
How to Win a Chinese Chess Game
  • Reinforcement Learning
  • Cheng, Wen Ju

2
Set Up
(board diagram; the central RIVER divides the two sides)
3
General
4
Guard
5
Minister
6
Rook
7
Knight
8
Cannon
9
Pawn
10
Training
  • how long does it take for a human?
  • how long does it take for a computer?
  • The chess program KnightCap used TD learning for
    its evaluation function while playing on the Free
    Internet Chess Server (FICS, fics.onenet.net),
    and improved from a 1650 rating to a 2100 rating
    (US Master level; the world champion is rated
    around 2900) in just 308 games and 3 days of play.

11
Training
  • to play a series of games in a self-play learning
    mode using temporal difference learning
  • The goal is to learn some simple strategies
  • piece values or weights

12
Why Temporal Difference Learning
  • the average branching factor for the game tree is
    usually around 30
  • the average game lasts around 100 ply
  • the size of the game tree is about 30^100
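A back-of-the-envelope check of that figure (the variable names are illustrative):

```python
# With an average branching factor of ~30 and games of ~100 ply, a
# full game tree has on the order of 30^100 positions -- far beyond
# exhaustive search, which is why a learned evaluation function plus
# a shallow search is attractive.
branching_factor = 30
average_ply = 100
tree_size = branching_factor ** average_ply
print(len(str(tree_size)))  # 30^100 has 148 decimal digits
```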

13
Searching
  • alpha-beta search
  • 3 ply search vs 4 ply search
  • horizon effect
  • quiescence cutoff search
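The alpha-beta search above, in negamax form, can be sketched as follows. For illustration the game tree is an explicit nested list with numeric leaves (scored for the side to move at that leaf); the real program searches board positions to a fixed depth and adds quiescence handling.

```python
# Minimal alpha-beta (negamax) sketch over an explicit game tree.
def alpha_beta(node, alpha, beta):
    if isinstance(node, (int, float)):  # leaf: static evaluation
        return node
    best = -float("inf")
    for child in node:
        # the child's score is from the opponent's view, so negate it
        score = -alpha_beta(child, -beta, -alpha)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:               # cutoff: opponent avoids this line
            break
    return best

inf = float("inf")
print(alpha_beta([[3, 5], [2, 9]], -inf, inf))  # 3
```

A fixed search depth is what produces the horizon effect discussed next: a bad exchange can be pushed just past the last ply, which a quiescence extension mitigates by searching captures beyond the nominal depth.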

14
Horizon Effect
(figure: board positions at times t, t+1, t+2, t+3)
15
Evaluation Function
  • feature
  • a property of the game
  • feature evaluators
  • Rook, Knight, Cannon, Minister, Guard, and Pawn
  • weight
  • the value of a specific piece type
  • feature function f
  • returns the current player's piece advantage on a
    scale from -1 to 1
  • evaluation function Y
  • Y = Σk=1..7 wk fk
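The evaluation function above can be sketched in Python. The slides give only the [-1, 1] range for f, so scaling by the starting piece counts is an assumption, as is including the General as the seventh piece type.

```python
# Linear evaluation Y = sum over the 7 piece types of w_k * f_k.
STARTING_COUNTS = {"General": 1, "Guard": 2, "Minister": 2,
                   "Rook": 2, "Knight": 2, "Cannon": 2, "Pawn": 5}

def feature(piece, own, opp):
    """Piece-advantage feature f_k, scaled into [-1, 1] (assumed scaling)."""
    return (own[piece] - opp[piece]) / STARTING_COUNTS[piece]

def evaluate(weights, own, opp):
    """Evaluation Y for the player whose counts are in `own`."""
    return sum(weights[p] * feature(p, own, opp) for p in STARTING_COUNTS)

# Example: Red has captured one of Blue's Rooks, all else equal.
w = {p: 1.0 for p in STARTING_COUNTS}
red = dict(STARTING_COUNTS)
blue = dict(STARTING_COUNTS)
blue["Rook"] = 1
print(evaluate(w, red, blue))  # 0.5
```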

16
TD(λ) and Updating the Weights
  • wi,t+1 = wi,t + α (Yt+1 - Yt) Σk=1..t λ^(t-k) ∂Yk/∂wi
  • Δwi,t = α (Yt+1 - Yt)(fi,t + λ fi,t-1 +
    λ^2 fi,t-2 + ... + λ^(t-1) fi,1)
  • α = 0.01
  • learning rate
  • how quickly the weights can change
  • λ = 0.01
  • feedback coefficient
  • how much to discount past values
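Because Y is linear in the weights, the gradient ∂Yk/∂wi is just the feature value fi,k, which gives the second form of the update. A minimal sketch, with dictionary-based names that are illustrative rather than from the presentation:

```python
# One TD(λ) weight update:
#   Δw_i = α (Y_{t+1} - Y_t) Σ_{k=1..t} λ^{t-k} f_{i,k}
ALPHA = 0.01   # learning rate (value from the slide)
LAMB = 0.01    # feedback coefficient λ (value from the slide)

def td_lambda_update(weights, feature_history, y_next, y_now):
    """feature_history[j][i] holds f_{i,k} for time steps k = j + 1 = 1..t."""
    t = len(feature_history)
    error = y_next - y_now
    for i in weights:
        # λ-discounted trace of past feature values (most recent undiscounted)
        trace = sum(LAMB ** (t - 1 - j) * feature_history[j][i]
                    for j in range(t))
        weights[i] += ALPHA * error * trace
    return weights

w = {"Rook": 1.0}
hist = [{"Rook": 0.5}, {"Rook": 0.5}]       # f at t = 1 and t = 2
td_lambda_update(w, hist, y_next=0.2, y_now=0.1)
# trace = 0.01*0.5 + 0.5 = 0.505, so w["Rook"] grows by ≈ 0.000505
print(w["Rook"])
```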

17
Features Table
Array of Weights
18
Example
(figure: board positions at times t+5, t+6, t+7, t+8)
19
Final Reward
  • loser
  • if the game is a draw, the final reward is 0
  • if the board evaluation is negative, the final
    reward is twice the board evaluation
  • if the board evaluation is positive, the final
    reward is -2 times the board evaluation
  • winner
  • if the game is a draw, the final reward is 0
  • if the board evaluation is negative, the final
    reward is -2 times the board evaluation
  • if the board evaluation is positive, the final
    reward is twice the board evaluation
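The rules above can be collapsed into one function (the function name and flag arguments are illustrative). Note the effect: the winner's reward is always non-negative and the loser's always non-positive, with magnitude twice the terminal board evaluation.

```python
# Final reward assigned after the game, per the slide's case analysis.
def final_reward(board_eval, is_winner, is_draw=False):
    if is_draw:
        return 0.0
    if is_winner:
        # positive evaluation was right: reward 2x; negative was wrong: -2x
        return 2 * board_eval if board_eval > 0 else -2 * board_eval
    # loser: negative evaluation was right: 2x; positive was wrong: -2x
    return 2 * board_eval if board_eval < 0 else -2 * board_eval
```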

20
Final Reward
  • the weights are normalized by dividing by the
    greatest weight
  • any negative weights are set to zero
  • the most valuable piece has weight 1
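A sketch of this normalization step, assuming the weights are kept in a dictionary keyed by piece type:

```python
# Clamp negative weights to zero, then divide by the greatest weight
# so the most valuable piece ends up with weight 1.
def normalize(weights):
    clipped = {p: max(0.0, w) for p, w in weights.items()}
    top = max(clipped.values())
    if top == 0:
        return clipped          # degenerate case: no positive weight
    return {p: w / top for p, w in clipped.items()}

print(normalize({"Rook": 2.0, "Pawn": 0.5, "Guard": -0.1}))
# {'Rook': 1.0, 'Pawn': 0.25, 'Guard': 0.0}
```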

21
Summary of Main Events
  1. Red's turn
  2. Update weights for Red using TD(λ)
  3. Red does alpha-beta search
  4. Red executes the best move found
  5. Blue's turn
  6. Update weights for Blue using TD(λ)
  7. Blue does alpha-beta search
  8. Blue executes the best move found (go to 1)

22
After the Game Ends
  • Calculate and assign final reward for losing
    player
  • Calculate and assign final reward for winning
    player
  • Normalize the weights between 0 and 1

23
Results
  • 10-game series
  • 100-game series
  • learned weights are carried over into the next
    series
  • began with all weights initialized to 1
  • the goal is to learn piece values close to the
    default values defined by H.T. Lau, or even
    better

24
Observed Behavior
  • in the early stages
  • played pretty randomly
  • after 20 games
  • had identified the most valuable piece, the Rook
  • after 250 games
  • played better
  • protecting its valuable pieces, and trying to
    capture a valuable piece

25
Weights
26
Testing
  • self-play games
  • Red played using the learned weights after 250
    games
  • Blue used H.T. Lau's equivalent weights
  • 5 games
  • Red won 3
  • Blue won once
  • one draw

27
Future Work
  • 8 different types or "categories" of features
  • Piece Values
  • Comparative Piece Advantage
  • Mobility
  • Board Position
  • Piece Proximity
  • Time Value of Pieces
  • Piece Combinations
  • Piece Configurations

28
Examples
29
Cannon behind Knight
30
Conclusion
  • Computer Chinese chess has been studied for more
    than twenty years. Recently, due to advances in
    AI research and improvements in computer hardware
    in both speed and capacity, some Chinese chess
    programs of grandmaster level (about 6-dan in
    Taiwan) have been successfully developed.
  • Professor Shun-Chin Hsu of Chang-Jung University
    (CJU), who has been involved in the development
    of computer Chinese chess programs for a long
    time, points out that the strength of Chinese
    chess programs increases by 1 dan every three
    years. He also predicts that a computer program
    will beat the world champion of Chinese chess
    before 2012.

31
When and What
  • 2004 World Computer Chinese Chess Championship
  • Competition dates
  • June 25-26, 2004
  • Prizes
  • (1) First place: USD 1,500 + a gold medal
  • (2) Second place: USD 900 + a silver medal
  • (3) Third place: USD 600 + a bronze medal
  • (4) Fourth place: USD 300

32
References
  • C. Szeto. Chinese Chess and Temporal Difference
    Learning.
  • J. Baxter. KnightCap: A chess program that
    learns by combining TD(λ) with minimax search.
  • T. Trinh. Temporal Difference Learning in
    Chinese Chess.
  • http://chess.ncku.edu.tw/index.html