Title: Solving Probabilistic Combinatorial Games
1Solving Probabilistic Combinatorial Games
- Ling Zhao, Martin Mueller
- University of Alberta
- September 7, 2005
Paper link: http://www.cs.ualberta.ca/zhao/PCG.pdf
2Motivations
- Ken Chen's previous work on maximizing the winning chance in Go.
- Maximizing points vs. maximizing winning probability.
- How to solve the abstract game efficiently, or play the abstract game well?
3Motivations
Results (Black plays first): 15 (80%), -7 (20%)
4Combinatorial Games
5Probabilistic Combinatorial Games
- A terminal node, which is expressed as a probability distribution d = ((p1, v1), (p2, v2), ..., (pn, vn)), is a PCG.
- If A1, A2, ..., An, B1, B2, ..., Bn are PCGs, then {A1, A2, ..., An | B1, B2, ..., Bn} is a PCG. A1, ..., An are the Left options and B1, ..., Bn are the Right options.
- A sum of PCGs is a PCG.
A move in a sum game consists of a move to an
option in exactly one subgame and leaves all
other subgames unchanged.
6Simple PCG (SPCG)
- Each PCG has exactly one left option and one right option.
- Each option leads immediately to a terminal node.
- Each distribution has exactly 2 values with associated probabilities.
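One possible in-memory representation of an SPCG sum, as a minimal sketch. The names `Dist`, `SimpleGame`, `left` and `right`, and the example numbers, are illustrative assumptions and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Tuple

# A two-point distribution: ((probability, value), (probability, value)).
Dist = Tuple[Tuple[float, float], Tuple[float, float]]


@dataclass(frozen=True)
class SimpleGame:
    """One SPCG subgame: a single Left option and a single Right option,
    each leading directly to a terminal two-valued distribution."""
    left: Dist   # terminal reached if Left moves in this subgame
    right: Dist  # terminal reached if Right moves in this subgame

    def __post_init__(self) -> None:
        # Validate both terminals: exactly 2 outcomes, probabilities sum to 1.
        for dist in (self.left, self.right):
            total = sum(p for p, _ in dist)
            if len(dist) != 2 or abs(total - 1.0) > 1e-9:
                raise ValueError(
                    "each terminal needs 2 outcomes with probabilities summing to 1"
                )


# A sum game is a tuple of subgames; a move picks exactly one of them
# and leaves the others unchanged. The numbers below are made up.
game = (
    SimpleGame(left=((0.8, 15.0), (0.2, -7.0)), right=((0.5, -3.0), (0.5, -10.0))),
    SimpleGame(left=((0.6, 4.0), (0.4, 1.0)), right=((0.3, 2.0), (0.7, -6.0))),
)
```

Keeping the subgames immutable makes it easy to use positions as keys in a transposition table later.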
7Problems to Address
- How to solve PCGs efficiently?
- How to play PCGs well if resources are limited or
fast play is required?
8Game Tree Analysis
- Very regular game tree: a node at depth k has exactly n-k children, so there are n!/(n-k)! nodes in total at depth k.
- Very large number of transpositions: only C(n, k) C(k, k/2) distinct nodes at depth k.
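The two counts can be compared directly with a short sketch. It assumes that k/2 is rounded down for odd k (the binomial coefficient is the same either way), and the function names are hypothetical.

```python
from math import comb, perm


def nodes_at_depth(n: int, k: int) -> int:
    """Nodes at depth k when every move sequence is counted separately: n!/(n-k)!."""
    return perm(n, k)


def distinct_nodes_at_depth(n: int, k: int) -> int:
    """Distinct positions at depth k: choose which k subgames have been played,
    then which k // 2 of those were played by the second player."""
    return comb(n, k) * comb(k, k // 2)


if __name__ == "__main__":
    n = 14  # number of subgames used in the experiments
    for k in range(n + 1):
        print(k, nodes_at_depth(n, k), distinct_nodes_at_depth(n, k))
```

For n = 14 and k = 4, for example, the tree has 24024 nodes at that depth but only 6006 distinct positions, which is why a transposition table pays off.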
9Terminal Node Evaluation
- A terminal node is a sum of probability distributions.
- Winning probability
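A minimal sketch of exact terminal evaluation, under two assumptions that are not stated on the slide: the distributions are independent, and a "win" means the summed value is strictly greater than a threshold of 0. The function name is hypothetical.

```python
from itertools import product
from typing import Sequence, Tuple

Dist = Sequence[Tuple[float, float]]  # [(probability, value), ...]


def exact_win_probability(dists: Sequence[Dist], threshold: float = 0.0) -> float:
    """Enumerate every combination of outcomes (2^n combinations for n
    two-valued distributions) and add up the probability of the
    combinations whose total value exceeds the threshold."""
    win = 0.0
    for combo in product(*dists):
        prob = 1.0
        total = 0.0
        for p, v in combo:
            prob *= p      # independent outcomes: probabilities multiply
            total += v     # values add in a sum game
        if total > threshold:
            win += prob
    return win
```

The cost grows as 2^n in the number of distributions, which is the motivation for sampling-based approximations.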
10Monte-Carlo Terminal Evaluation (MCTE)
- Use Monte-Carlo methods to randomly collect k samples from the 2^n data points in the sum of n distributions.
- Use the average winning percentage of the samples (P'w) to approximate the overall winning probability (Pw).
- Theory from statistics: P'w - Pw is normally distributed, with mean 0 and std dev < [bound missing in source]
- Experimental results
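A hedged sketch of Monte-Carlo terminal evaluation. It assumes each sample is drawn according to the outcome probabilities and that a win means a positive total; the paper's exact sampling scheme may differ, and the function names are hypothetical. As a standard fact about sample proportions (not necessarily the bound shown on the slide), the fraction of wins among k independent samples has standard deviation sqrt(p(1-p)/k), which is at most 1/(2 sqrt(k)).

```python
import random
from typing import Optional, Sequence, Tuple

Dist = Sequence[Tuple[float, float]]  # [(probability, value), ...]


def sample_value(dist: Dist, rng: random.Random) -> float:
    """Draw one value from a discrete distribution."""
    r = rng.random()
    cumulative = 0.0
    for p, v in dist:
        cumulative += p
        if r < cumulative:
            return v
    return dist[-1][1]  # guard against floating-point round-off


def mc_terminal_eval(
    dists: Sequence[Dist],
    k: int,
    rng: Optional[random.Random] = None,
    threshold: float = 0.0,
) -> float:
    """Estimate the winning probability from k random samples of the sum."""
    if rng is None:
        rng = random.Random()
    wins = 0
    for _ in range(k):
        total = sum(sample_value(d, rng) for d in dists)
        if total > threshold:
            wins += 1
    return wins / k
```

With k = 10000 samples the standard deviation of the estimate is at most 0.005, independent of the number of distributions in the sum.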
11Monte-Carlo Interior Evaluation (MCIE)
- Evaluation of a node is approximated by averaging the values of terminal nodes reached from it through random play.
- Proposed by Abramson in 1990, using 4x4 tic-tac-toe and 6x6 Othello for experiments.
- Applied to Monte-Carlo Go by Bouzy and Helmstetter and several other researchers.
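A minimal sketch of Monte-Carlo interior evaluation for a sum of simple subgames. Several details are assumptions made for illustration: positions are represented as a list of unplayed subgames plus the terminal distributions already chosen, random play picks an unplayed subgame uniformly, terminals are evaluated exactly, and a win means a positive total. All names are hypothetical.

```python
import random
from itertools import product
from typing import Optional, Sequence, Tuple

Dist = Tuple[Tuple[float, float], ...]   # ((probability, value), ...)
Subgame = Tuple[Dist, Dist]              # (Left's option, Right's option)


def exact_win_probability(dists: Sequence[Dist]) -> float:
    """Probability that the summed value is positive (assumed win condition)."""
    win = 0.0
    for combo in product(*dists):
        prob, total = 1.0, 0.0
        for p, v in combo:
            prob *= p
            total += v
        if total > 0:
            win += prob
    return win


def random_playout(
    remaining: Sequence[Subgame],
    chosen: Sequence[Dist],
    left_to_move: bool,
    rng: random.Random,
) -> float:
    """Play uniformly random moves until no subgame is left, then evaluate."""
    remaining = list(remaining)
    chosen = list(chosen)
    while remaining:
        left_opt, right_opt = remaining.pop(rng.randrange(len(remaining)))
        chosen.append(left_opt if left_to_move else right_opt)
        left_to_move = not left_to_move
    return exact_win_probability(chosen)


def mc_interior_eval(
    remaining: Sequence[Subgame],
    chosen: Sequence[Dist],
    left_to_move: bool,
    playouts: int,
    rng: Optional[random.Random] = None,
) -> float:
    """Average Left's winning probability over random playouts from this node."""
    if rng is None:
        rng = random.Random()
    total = sum(
        random_playout(remaining, chosen, left_to_move, rng)
        for _ in range(playouts)
    )
    return total / playouts
```

The returned value is an average over random play, not a minimax value, so it is only a heuristic estimate of how good the node is.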
12SPCG Solver and Player
- Solver: alpha-beta search, transposition tables, move ordering (MCIE, MCTE).
- Player: alpha-beta search to a certain depth, using Monte-Carlo interior evaluation for frontier nodes.
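To illustrate the role of the transposition table, here is a simplified exact solver: plain minimax with memoization on the set of subgames each side has played. It is a sketch only; alpha-beta pruning and Monte-Carlo move ordering, which the actual solver uses, are omitted for brevity. The bitmask state encoding, the positive-total win condition, and all names are assumptions.

```python
from functools import lru_cache
from itertools import product
from typing import Tuple

Dist = Tuple[Tuple[float, float], ...]   # ((probability, value), ...)
Subgame = Tuple[Dist, Dist]              # (Left's option, Right's option)


def solve(subgames: Tuple[Subgame, ...], left_first: bool = True) -> float:
    """Left's winning probability under optimal play by both sides."""
    n = len(subgames)
    full = (1 << n) - 1

    def terminal_value(left_mask: int) -> float:
        # Subgame i contributes Left's option if Left played it, else Right's.
        dists = [
            subgames[i][0] if (left_mask >> i) & 1 else subgames[i][1]
            for i in range(n)
        ]
        win = 0.0
        for combo in product(*dists):
            prob, total = 1.0, 0.0
            for p, v in combo:
                prob *= p
                total += v
            if total > 0:
                win += prob
        return win

    @lru_cache(maxsize=None)  # acts as the transposition table
    def search(left_mask: int, right_mask: int, left_to_move: bool) -> float:
        played = left_mask | right_mask
        if played == full:
            return terminal_value(left_mask)
        values = []
        for i in range(n):
            bit = 1 << i
            if played & bit:
                continue
            if left_to_move:
                values.append(search(left_mask | bit, right_mask, False))
            else:
                values.append(search(left_mask, right_mask | bit, True))
        # Left maximizes its winning probability, Right minimizes it.
        return max(values) if left_to_move else min(values)

    return search(0, 0, left_first)
```

Because the position depends only on which subgames each side has played, not on the order of play, every transposition maps to the same cache key.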
13Experimental Results
- 100 randomly generated games; each game has 14 subgames.
- Value distribution: probability from 0 to 1, value from -1000 to 1000.
- AMD 2400MHz CPUs
- 2^20 cache entries (terminal nodes have higher priority)
- About 8 seconds to solve a game.
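A sketch of how such test games could be generated. The slide gives only the ranges; drawing uniformly, using integer values, and letting the second outcome take the complementary probability are assumptions, as are the function names.

```python
import random
from typing import List, Optional, Tuple

Dist = Tuple[Tuple[float, float], Tuple[float, float]]
Subgame = Tuple[Dist, Dist]  # (Left's option, Right's option)


def random_dist(rng: random.Random) -> Dist:
    """Two-valued distribution: probability in [0, 1), values in [-1000, 1000]."""
    p = rng.random()
    v1 = float(rng.randint(-1000, 1000))  # integer values are an assumption
    v2 = float(rng.randint(-1000, 1000))
    return ((p, v1), (1.0 - p, v2))


def random_game(num_subgames: int = 14,
                rng: Optional[random.Random] = None) -> List[Subgame]:
    """One random SPCG sum with the given number of subgames."""
    if rng is None:
        rng = random.Random()
    return [(random_dist(rng), random_dist(rng)) for _ in range(num_subgames)]


# Example: a reproducible test set of 100 games.
test_set = [random_game(14, random.Random(seed)) for seed in range(100)]
```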
14Solver Performance
- Monte-Carlo move ordering (dm): depth limit up to which Monte-Carlo move ordering is used; otherwise the history heuristic is used.
- Monte-Carlo interior evaluation (nt): percentage of all of the current node's descendant terminal nodes that are sampled.
- Monte-Carlo terminal evaluation (nc): number of data points sampled.
- Accurate terminal evaluation occupies 90% of the overall running time.
15Solver Performance Results
16Monte-Carlo Player
- Test against the perfect player.
- Each game has two rounds: each side plays first once.
- Winning probability is the average of the two rounds.
- Parameters: search depth and nc.
17Monte-Carlo Player Results
18Error in Move Ordering
- Average probability error, example:
    Move               A        B
    Actual win prob    0.18  <  0.19
    Estimate           0.32  >  0.30
    Win prob lost      0.01
- Average probability error is the average of the winning probability lost over all move pairs of a node.
- Worst probability error: the probability lost because the wrong best move is chosen.
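The two error measures can be sketched as follows. This is one reading of the definitions above, stated as an assumption: a move pair loses the difference in actual winning probability when the estimate ranks the pair in the wrong order, and loses nothing otherwise. The function name is hypothetical.

```python
from itertools import combinations
from typing import Dict, Tuple


def ordering_errors(
    actual: Dict[str, float], estimate: Dict[str, float]
) -> Tuple[float, float]:
    """Return (average probability error, worst probability error) for one node."""
    moves = list(actual)
    lost = []
    for a, b in combinations(moves, 2):
        # A pair is misordered when the estimate prefers the move that is
        # actually worse.
        misordered = (actual[a] - actual[b]) * (estimate[a] - estimate[b]) < 0
        lost.append(abs(actual[a] - actual[b]) if misordered else 0.0)
    average_error = sum(lost) / len(lost) if lost else 0.0

    # Worst error: what is lost by playing the move the estimate ranks first.
    chosen = max(moves, key=lambda m: estimate[m])
    worst_error = max(actual.values()) - actual[chosen]
    return average_error, worst_error


# The example from the slide: the estimate wrongly prefers A over B.
avg, worst = ordering_errors({"A": 0.18, "B": 0.19}, {"A": 0.32, "B": 0.30})
```

For the slide's example both measures come out at about 0.01, matching the "win prob lost" row.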
19Results
20Conclusions
- Efficient exact and heuristic solvers for SPCG.
- Successfully incorporated Monte-Carlo move ordering into alpha-beta search for SPCG.
- A heuristic evaluation technique based on Monte-Carlo with performance close to the perfect player.
- Extensive experiments for the two solvers.
21Future Work
- Better algorithms to accurately evaluate terminal nodes?
- Progressive pruning.
- Why does Abramson's simple Expected Outcome model perform so well in move ordering?
22New Directions to Apply Monte-Carlo to Computer Go
[Diagram labels: Monte-Carlo Go, New Direction]
23Future Work Transformation
[Diagram labels: ?, ?, PCG solver or player]