How to Win a Chinese Chess Game

About This Presentation

Slides: 33

Provided by: wenju4

Learn more at: http://www.sci.brooklyn.cuny.edu

Transcript and Presenter's Notes

1
How to Win a Chinese Chess Game

Reinforcement Learning
Cheng, Wen Ju

2
Set Up
RIVER
3
General
4
Guard
5
Minister
6
Rook
7
Knight
8
Cannon
9
Pawn
10
Training

how long does it take for a human?
how long does it take for a computer?
The chess program KnightCap used TD learning to learn its
evaluation function while playing on the Free
Internet Chess Server (FICS, fics.onenet.net). It
improved from a 1650 rating to a 2100 rating (about the
level of a US Master; the world champion is rated
around 2900) in just 308 games and 3 days of play.

11
Training

the program plays a series of games in a self-play
learning mode using temporal difference learning
the goal is to learn some simple strategies:
piece values, or weights

12
Why Temporal Difference Learning

the average branching factor for the game tree is
usually around 30
the average game lasts around 100 ply
the size of a game tree is about $30^{100}$ (roughly $10^{148}$)
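
As a rough check on the game-tree estimate, the following Python sketch computes the order of magnitude of 30 raised to the 100th power. The two constants come from the slide; the variable names are illustrative only.

```python
import math

BRANCHING_FACTOR = 30   # average legal moves per position (from the slide)
GAME_LENGTH_PLY = 100   # average game length in ply (from the slide)

# Number of leaf positions in a full game tree of this shape: b ** d.
tree_size = BRANCHING_FACTOR ** GAME_LENGTH_PLY

# Power of ten of the leading digit, and the total number of decimal digits.
exponent = math.floor(GAME_LENGTH_PLY * math.log10(BRANCHING_FACTOR))
digits = len(str(tree_size))

print(f"30^100 is about 10^{exponent}, a {digits}-digit number")
```

A tree of this size cannot be searched exhaustively, which is why the program relies on a learned evaluation function and a shallow search.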

13
Searching

alpha-beta search
3 ply search vs 4 ply search
horizon effect
quiescence cutoff search
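
A minimal sketch of alpha-beta search on a toy game tree, in Python. The nested-list tree representation is an assumption made for illustration and is not taken from the presentation. A real engine would stop at a fixed depth (for example 3 or 4 ply) and call its evaluation function there, rather than reading leaf scores directly.

```python
import math

def alpha_beta(node, alpha=-math.inf, beta=math.inf, maximizing=True):
    """Return the minimax value of `node` using alpha-beta pruning.

    `node` is either a number (a leaf score) or a list of child nodes.
    """
    if not isinstance(node, list):      # leaf: the score is already known
        return node
    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alpha_beta(child, alpha, beta, False))
            alpha = max(alpha, best)
            if alpha >= beta:           # opponent will avoid this branch
                break
        return best
    best = math.inf
    for child in node:
        best = min(best, alpha_beta(child, alpha, beta, True))
        beta = min(beta, best)
        if alpha >= beta:               # we already have a better option
            break
    return best

# Two-ply toy tree: the root maximizes, its children minimize.
toy_tree = [[3, 5], [2, 9], [0, 1]]
toy_value = alpha_beta(toy_tree)
print(toy_value)
```

In this toy tree the leaves 9 and 1 are never visited, because the first leaf of each later branch already shows that the branch cannot beat the first one.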

14
Horizon Effect
[Figure: board positions at successive time steps t, t+1, t+2, t+3]
15
Evaluation Function

feature
a property of the game
feature evaluators
Rook, Knight, Cannon, Minister, Guard, and Pawn
weight
the value of a specific piece type
feature function f
returns the current player's piece advantage on a
scale from -1 to 1
evaluation function Y
$Y = \sum_{k=1}^{7} w_k f_k$
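
The evaluation function is a weighted sum of per-piece features, and can be sketched in Python as follows. The slides only say that each feature is a piece advantage on a scale from -1 to 1; the specific formula used here (difference in piece counts divided by the starting count) is an assumption, as are the function and variable names.

```python
# Starting number of each piece type per side in Chinese chess.
INITIAL_COUNTS = {
    "General": 1, "Guard": 2, "Minister": 2,
    "Rook": 2, "Knight": 2, "Cannon": 2, "Pawn": 5,
}

def piece_advantage(own, opponent, piece):
    """Assumed feature function f: count difference scaled into [-1, 1]."""
    return (own.get(piece, 0) - opponent.get(piece, 0)) / INITIAL_COUNTS[piece]

def evaluate(weights, own, opponent):
    """Evaluation Y: sum over piece types of weight times feature value."""
    return sum(weights[p] * piece_advantage(own, opponent, p)
               for p in INITIAL_COUNTS)

# Example: all weights start at 1; the opponent has lost a Rook and a Pawn.
start_weights = {piece: 1.0 for piece in INITIAL_COUNTS}
own_pieces = dict(INITIAL_COUNTS)
opponent_pieces = dict(INITIAL_COUNTS, Rook=1, Pawn=4)
example_y = evaluate(start_weights, own_pieces, opponent_pieces)
print(example_y)
```

With equal weights, the lost Rook contributes 0.5 and the lost Pawn 0.2; learning is meant to pull these weights apart so that a Rook counts for more than a Pawn.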

16
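TD(λ) and Updating the Weights

$w_{i,t+1} = w_{i,t} + \alpha (Y_{t+1} - Y_t) \sum_{k=1}^{t} \lambda^{t-k} \nabla_{w_i} Y_k$
$\quad = w_{i,t} + \alpha (Y_{t+1} - Y_t)(f_{i,t} + \lambda f_{i,t-1} + \lambda^2 f_{i,t-2} + \dots + \lambda^{t-1} f_{i,1})$
$\alpha = 0.01$
learning rate
how quickly the weights can change
$\lambda = 0.01$
feedback coefficient
how much to discount past values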
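
A minimal Python sketch of one TD(λ) weight update for a linear evaluation function. Because Y is a weighted sum of features, the gradient of Y with respect to weight i at step k is just feature i at step k, so each weight moves by the learning rate times the prediction difference times a discounted sum of that feature's past values. The function name and the list-based data layout are assumptions; the default rate and feedback coefficient of 0.01 are the values given on the slide.

```python
def td_lambda_update(weights, feature_history, y_next, y_curr,
                     alpha=0.01, lam=0.01):
    """Return updated weights after one TD(lambda) step.

    weights:         list of current weights w_i
    feature_history: list of feature vectors f_1 .. f_t, oldest first
    y_next, y_curr:  evaluations Y_{t+1} and Y_t
    """
    t = len(feature_history)
    delta = y_next - y_curr                 # temporal difference
    updated = []
    for i, w in enumerate(weights):
        # Discounted sum: f_{i,t} + lam * f_{i,t-1} + ... + lam^(t-1) * f_{i,1}
        trace = sum(lam ** (t - k) * feature_history[k - 1][i]
                    for k in range(1, t + 1))
        updated.append(w + alpha * delta * trace)
    return updated

# Example with two features observed over two time steps.
history = [[0.5, 0.0], [0.5, 0.2]]
new_weights = td_lambda_update([1.0, 1.0], history, y_next=0.9, y_curr=0.7)
print(new_weights)
```

With a feedback coefficient as small as 0.01, almost all of the update comes from the most recent feature values; older positions contribute very little.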

17
Features Table
Array of Weights
18
Example
[Figure: example board positions at time steps 5, 6, 7, and 8]
19
Final Reward

loser
if the game is a draw, the final reward is 0
if the board evaluation is negative, then the
final reward is twice the board evaluation
if the board evaluation is positive, then the
final reward is -2 times the board evaluation
winner
if the game is a draw, the final reward is 0
if the board evaluation is negative, then the
final reward is -2 times the board evaluation
if the board evaluation is positive, then the
final reward is twice the board evaluation
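
The reward rules for the loser and the winner collapse into one pattern: the loser always receives minus twice the magnitude of the board evaluation, and the winner always receives plus twice the magnitude. A Python sketch, where the function name and the outcome strings are hypothetical:

```python
def final_reward(board_evaluation, outcome):
    """Final reward for one player.

    outcome is "win", "loss", or "draw" from that player's point of view.
    """
    if outcome == "draw":
        return 0.0
    magnitude = 2 * abs(board_evaluation)
    if outcome == "win":
        return magnitude        # always non-negative for the winner
    if outcome == "loss":
        return -magnitude       # always non-positive for the loser
    raise ValueError(f"unknown outcome: {outcome!r}")

print(final_reward(-0.3, "loss"), final_reward(-0.3, "win"))
```

The effect is that a winner is rewarded even if its own evaluation said it was behind, and a loser is penalized even if its evaluation said it was ahead.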

20
Final Reward

the weights are normalized by dividing by the
greatest weight
any negative weights are set to zero
the most valuable piece has weight 1
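
The normalization step can be sketched in Python as below. Clipping negative weights before dividing, and leaving the weights unchanged when none is positive, are assumptions; the slide does not state the order of operations or that edge case.

```python
def normalize_weights(weights):
    """Set negative weights to zero, then divide by the greatest weight."""
    clipped = [max(w, 0.0) for w in weights]
    greatest = max(clipped)
    if greatest == 0:
        return clipped          # assumed fallback: nothing to scale by
    return [w / greatest for w in clipped]

normalized = normalize_weights([2.0, -0.5, 1.0, 4.0])
print(normalized)
```

After this step every weight lies between 0 and 1, and the most valuable piece has weight exactly 1.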

21
Summary of Main Events

Red's turn
Update weights for Red using TD(λ)
Red does alpha-beta search
Red executes the best move found
Blue's turn
Update weights for Blue using TD(λ)
Blue does alpha-beta search
Blue executes the best move found (then back to Red's turn)

22
After the Game Ends

Calculate and assign final reward for losing
player
Calculate and assign final reward for winning
player
Normalize the weights between 0 and 1

23
Results

10-game series
100-game series
learned weights are carried over into the next
series
began with all weights initialized to 1
The goal is to learn piece values that are
close to the default values defined by
H.T. Lau, or even better

24
Observed Behavior

the early stages
played pretty randomly
after 20 games
had identified the most valuable piece, the Rook
after 250 games
played better
protecting the valuable pieces, and trying to
capture a valuable piece

25
Weights
26
Testing

self-play games
Red played using the weights learned after 250
games
Blue used H.T. Lau's equivalent weights
5 games
Red won 3
Blue won 1
1 draw

27
Future Work

8 different types or "categories" of features
Piece Values
Comparative Piece Advantage
Mobility
Board Position
Piece Proximity
Time Value of Pieces
Piece Combinations
Piece Configurations

28
Examples
29
Cannon behind Knight
30
Conclusion

Computer Chinese chess has been studied for more
than twenty years. Recently, thanks to advances
in AI research and improvements in the speed and
capacity of computer hardware, some Chinese chess
programs of grandmaster level (about 6-dan in
Taiwan) have been successfully developed.
Professor Shun-Chin Hsu of Chang-Jung University
(CJU), who has long been involved in the
development of computer Chinese chess programs,
points out that the strength of Chinese chess
programs increases by 1 dan every three years.
He also predicts that a computer program
will beat the world champion of Chinese chess
before 2012.

31
When and What

2004 World Computer Chinese Chess Championship
Competition Dates
June 25-26, 2004
Prizes
(1) First Place: USD 1,500 and a gold medal
(2) Second Place: USD 900 and a silver medal
(3) Third Place: USD 600 and a bronze medal
(4) Fourth Place: USD 300

32
References

C. Szeto. Chinese Chess and Temporal Difference
Learning.
J. Baxter. KnightCap: A chess program that
learns by combining TD(λ) with minimax search.
T. Trinh. Temporal Difference Learning in
Chinese Chess.
http://chess.ncku.edu.tw/index.html
