Title: Nesterov's excessive gap technique and poker
Slide 1: Nesterov's excessive gap technique and poker
- Andrew Gilpin
- CMU Theory Lunch
- Feb 28, 2007
- Joint work with Samid Hoda, Javier Peña, Troels Sørensen, and Tuomas Sandholm
Slide 2: Outline
- Two-person zero-sum sequential games
- First-order methods for convex optimization
- Nesterov's excessive gap technique (EGT)
- EGT for sequential games
- Heuristics for EGT
- Application to Texas Hold'em poker
Slide 3: We want to solve
max_{x ∈ Q1} min_{y ∈ Q2} x^T A y
If Q1 and Q2 are simplices, this is the Nash equilibrium problem for two-person zero-sum matrix games.
If Q1 and Q2 are complexes, this is the Nash equilibrium problem for two-person zero-sum sequential games.
Slide 4: What's a complex?
It's just like a simplex, but more complex.
Each player's complex encodes her set of realization plans in the game. In particular, player 1's complex is Q1 = { x ≥ 0 : Ex = e }, where the matrix E and vector e depend on the game.
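As a concrete illustration, here is the constraint system Ex = e for a hypothetical tiny game (this example game is invented for illustration, not taken from the talk): player 1 first chooses a or b, and after a chooses c or d. Each row of E says that the probability mass entering a decision point is split among its actions.

```python
import numpy as np

# Hypothetical tiny game for player 1: at the root she chooses a or b;
# after a she chooses c or d.  Sequences: (empty, a, b, ac, ad).
# Realization-plan constraints Ex = e, x >= 0 (sequence form):
#   x_empty = 1,  x_a + x_b = x_empty,  x_ac + x_ad = x_a
E = np.array([
    [ 1,  0,  0,  0,  0],   # x_empty = 1
    [-1,  1,  1,  0,  0],   # x_a + x_b - x_empty = 0
    [ 0, -1,  0,  1,  1],   # x_ac + x_ad - x_a = 0
], dtype=float)
e = np.array([1.0, 0.0, 0.0])

# Behavioral strategy: play a with prob 0.6, then c with prob 0.5.
# Its realization plan multiplies the probabilities along each sequence.
x = np.array([1.0, 0.6, 0.4, 0.3, 0.3])

assert np.allclose(E @ x, e) and np.all(x >= 0)  # x is a valid realization plan
```

Every behavioral strategy maps to a point of this polytope in the same way, and vice versa, which is why the equilibrium problem stays a linear saddle-point problem over the complexes.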
Slide 5: [Figure: example game tree with nodes labeled A through H]
Slide 6: Recall our problem
max_{x ∈ Q1} min_{y ∈ Q2} x^T A y, where Q1 and Q2 are complexes.
Since Q1 and Q2 have a linear description, this problem can be solved as an LP. However, current LP solution methods do not scale.
Slide 7: (Un)scalability of LP solvers
- Rhode Island poker [Shi & Littman 01]
  - LP has 91 million rows and columns
  - Applying the GameShrink automated abstraction algorithm yields an LP with only 1.2 million rows and columns, and 50 million non-zeros [G. & Sandholm 06a]
  - Solution requires 25 GB RAM and over a week of CPU time
- Texas Hold'em poker
  - 10^18 nodes in the game tree
  - Lossy abstractions need to be performed
  - Limitations of current solver technology are the primary obstacle to achieving expert-level strategies [G. & Sandholm 06b, 07a]
- Instead of standard LP solvers, what about a first-order method?
Slide 8: Convex optimization
Suppose we want to solve
min_x f(x), where f is convex.
Note that this formulation captures ALL convex optimization problems (the feasible set can be modeled using an indicator function).
For general f, convergence requires O(1/ε^2) iterations (e.g., for subgradient methods). For smooth, strongly convex f with Lipschitz-continuous gradient, it can be done in O(1/ε^(1/2)) iterations.
This analysis is based on a black-box oracle access model. Can we do better by looking inside the box?
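A toy illustration of the slow non-smooth regime (this example function and step-size constant are mine, not from the talk): a subgradient method on f(x) = |x − 1| with the classic 1/√k step sizes, which attains the O(1/ε²) black-box rate.

```python
# Toy subgradient method on the non-smooth f(x) = |x - 1|.
# With step sizes 1/sqrt(k), the best iterate converges at the slow
# O(1/eps^2) rate typical of black-box non-smooth methods.
def f(x):
    return abs(x - 1.0)

def subgrad(x):
    return 1.0 if x > 1.0 else -1.0   # a valid subgradient of f at x

x, best = 0.3, float("inf")
for k in range(1, 10001):
    best = min(best, f(x))
    x -= subgrad(x) / k ** 0.5        # step size 1/sqrt(k)

assert best < 0.05   # slow, but it gets there
```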
Slide 9: Strong convexity
- A function d is strongly convex if there exists σ > 0 such that
  d(αx + (1−α)y) ≤ α d(x) + (1−α) d(y) − (σ/2) α(1−α) ‖x − y‖^2
  for all x, y in the domain and all α ∈ [0, 1]
- σ is the strong convexity parameter of d
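A quick numerical sanity check of this definition, using d(x) = ½‖x‖² (strongly convex with σ = 1 in the Euclidean norm; this particular test function is my choice, not the talk's):

```python
import numpy as np

rng = np.random.default_rng(0)

def d(x):                      # half squared Euclidean norm: sigma = 1
    return 0.5 * np.dot(x, x)

sigma = 1.0
ok = True
for _ in range(1000):
    x, y = rng.normal(size=5), rng.normal(size=5)
    a = rng.uniform()
    lhs = d(a * x + (1 - a) * y)
    rhs = a * d(x) + (1 - a) * d(y) \
          - 0.5 * sigma * a * (1 - a) * np.dot(x - y, x - y)
    ok &= lhs <= rhs + 1e-9    # the strong convexity inequality
assert ok
```

For this quadratic d the inequality actually holds with equality, which is why σ = 1 is the largest valid parameter.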
Slide 10: Recall our problem
max_{x ∈ Q1} min_{y ∈ Q2} x^T A y, where Q1 and Q2 are complexes.
Equivalently, max_{x ∈ Q1} F(x) = min_{y ∈ Q2} f(y), where F(x) = min_{y ∈ Q2} x^T A y and f(y) = max_{x ∈ Q1} x^T A y.
Slide 11: Smoothing
Unfortunately, F and f are non-smooth. Fortunately, they have a special structure.
Let d1, d2 be smooth and strongly convex on Q1, Q2. These are called prox-functions. Now let µ > 0 and consider
F_µ(x) = min_{y ∈ Q2} { x^T A y + µ d2(y) }
f_µ(y) = max_{x ∈ Q1} { x^T A y − µ d1(x) }
These are well-defined, smooth functions.
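In the simplex case with the entropy prox function (assumptions: Q2 a simplex, d2(y) = Σ y_j ln y_j + ln n; the sequential-game case replaces simplices with complexes), the smoothed F_µ has a log-sum-exp closed form, and one can check numerically that it approaches the non-smooth F from above as µ → 0:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 4))          # payoff matrix of a small random game
x = np.full(3, 1 / 3)                # a point of the simplex Q1

# With the entropy prox d2(y) = sum_j y_j ln y_j + ln n (whose min over the
# simplex is 0), F_mu(x) = min_{y in Q2} { x'Ay + mu*d2(y) } has the
# log-sum-exp closed form below.
def F_mu(x, mu):
    c = A.T @ x
    return -mu * np.log(np.mean(np.exp(-c / mu)))

F = (A.T @ x).min()                  # the non-smooth F(x) = min_j (A'x)_j
mus = (1.0, 0.1, 0.01)
gaps = [F_mu(x, mu) - F for mu in mus]

# F_mu over-estimates F by at most mu*ln(n) and decreases toward F with mu
assert all(-1e-9 <= g <= mu * np.log(4) + 1e-9 for g, mu in zip(gaps, mus))
assert gaps[0] >= gaps[1] >= gaps[2]
```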
Slide 12: Excessive gap condition
From weak duality, we have that f(y) ≥ F(x). The excessive gap condition requires the reverse inequality for the smoothed functions:
f_µ(y) ≤ F_µ(x)    (EGC)
The algorithm maintains (EGC) and gradually decreases µ. As µ decreases, the smoothed functions approach the non-smooth functions, and thus iterates satisfying (EGC) converge to optimal solutions.
Slide 13: Nesterov's main theorem
- Theorem [Nesterov 05]
  - There exists an algorithm such that after at most N iterations, the iterates have duality gap at most (4 ‖A‖ / (N + 1)) · sqrt(D1 D2 / (σ1 σ2)), where D_i = max of d_i over Q_i and σ_i is the strong convexity parameter of d_i
  - Furthermore, each iteration only requires solving three problems of the form max_{x ∈ Q} { g^T x − d(x) } and performing three matrix-vector product operations on A.
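Plugging the simplex/entropy case (D_i = ln n_i, σ_i = 1) into this gap bound gives a concrete iteration count; a small arithmetic sketch, assuming the bound in the form (4‖A‖/(N+1))·sqrt(D1 D2/(σ1 σ2)):

```python
import math

# Iterations needed for duality gap <= eps under the bound
#   gap <= (4*||A||/(N+1)) * sqrt(D1*D2/(sigma1*sigma2)).
# Entropy prox on simplices of sizes n1, n2: D_i = ln n_i, sigma_i = 1.
def iterations_needed(norm_A, n1, n2, eps):
    return math.ceil(4 * norm_A * math.sqrt(math.log(n1) * math.log(n2)) / eps - 1)

# e.g. ||A|| = 1 and a million pure strategies per player
N = iterations_needed(1.0, 10**6, 10**6, 1e-3)
assert 0 < N < 10**5   # dimension enters only logarithmically
```

Note the O(1/ε) dependence on the target accuracy, versus O(1/ε²) for black-box subgradient methods, and that the problem size appears only through logarithms.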
Slide 14: Nice prox functions
- A prox function d for Q is nice if
  - it is strongly convex, continuous everywhere in Q, and differentiable in the relative interior of Q
  - the min of d over Q is 0
  - maps of the form sargmax_d(g) := argmax_{x ∈ Q} { g^T x − d(x) } are easily computable
Slide 15: Nice simplex prox function 1: Entropy
d(x) = Σ_i x_i ln x_i + ln n
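For the entropy prox function the sargmax map has a softmax closed form, so it costs O(n); a minimal sketch, assuming the prox form d(x) = Σ x_i ln x_i + ln n:

```python
import numpy as np

rng = np.random.default_rng(2)

def d_entropy(x):
    # entropy prox on the simplex: d(x) = sum x_i ln x_i + ln n, min value 0
    n = len(x)
    return np.sum(np.where(x > 0, x * np.log(np.where(x > 0, x, 1.0)), 0.0)) + np.log(n)

def sargmax_entropy(g):
    # argmax_{x in simplex} { g'x - d(x) } is the softmax of g
    z = np.exp(g - g.max())          # subtract max for numerical stability
    return z / z.sum()

g = rng.normal(size=5)
xs = sargmax_entropy(g)
best = g @ xs - d_entropy(xs)
# sanity check: no random simplex point does better
for _ in range(2000):
    x = rng.dirichlet(np.ones(5))
    assert g @ x - d_entropy(x) <= best + 1e-9
```

The closed form follows from the Lagrange conditions: g_i − ln x_i − 1 − λ = 0 forces x_i ∝ exp(g_i).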
Slide 16: Nice simplex prox function 2: Euclidean
d(x) = (1/2) Σ_i (x_i − 1/n)^2
The sargmax can be computed in O(n log n) time.
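Assuming the Euclidean prox d(x) = ½ Σ (x_i − 1/n)², the sargmax reduces to Euclidean projection onto the simplex, computable by the standard O(n log n) sort-and-threshold routine:

```python
import numpy as np

def project_simplex(v):
    # Euclidean projection of v onto the probability simplex,
    # via the classic O(n log n) sort-and-threshold algorithm
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1))[0][-1]
    theta = (css[rho] - 1) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def sargmax_euclidean(g, n):
    # With d(x) = 0.5*||x - (1/n)1||^2, completing the square shows that
    # argmax_{x in simplex} { g'x - d(x) } = Proj_simplex(g + (1/n)1)
    return project_simplex(g + 1.0 / n)

rng = np.random.default_rng(3)
g = rng.normal(size=6)
xs = sargmax_euclidean(g, 6)
assert abs(xs.sum() - 1) < 1e-9 and np.all(xs >= 0)

# optimality check against random feasible points
d_euc = lambda x: 0.5 * np.sum((x - 1 / 6) ** 2)
best = g @ xs - d_euc(xs)
for _ in range(2000):
    x = rng.dirichlet(np.ones(6))
    assert g @ x - d_euc(x) <= best + 1e-9
```

The sort dominates the cost, which is where the O(n log n) on the slide comes from.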
Slide 17: From the simplex to the complex
- Theorem [Hoda, G., & Peña 06]
  - A nice prox function can be constructed for the complex via a recursive application of any nice prox function for the simplex
Slide 18: Prox function example
Let d be any nice simplex prox function. [The slide shows the constraint matrix of an example complex and the recursively constructed prox function for it.]
Slide 19: Solving [algorithm pseudocode shown on slide]
Slide 20: [continuation of the algorithm; similar to steps b(i–vii)]
Slide 21: Heuristics [G., Hoda, Peña, & Sandholm 07]
- Heuristic 1: Aggressive µ reduction
  - The µ given in the previous algorithm is a conservative choice guaranteeing convergence
  - In practice, we can do much better by aggressively shrinking µ, while checking that the excessive gap condition is still satisfied
- Heuristic 2: Balanced µ reduction
  - To prevent one µ from dominating the other, we also perform periodic adjustments to keep them within a small factor of one another
Slide 22: Matrix-vector multiplication in poker [G., Hoda, Peña, & Sandholm 07]
- The main time and space bottleneck of the algorithm is the matrix-vector product with A
- Instead of storing the entire matrix, we can represent it as a composition of Kronecker products
- We can also effectively take advantage of parallelization in the matrix-vector product to achieve near-linear speedup
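The space saving comes from the standard Kronecker identity kron(B, C)·vec(X) = vec(B·X·Cᵀ), which lets one multiply by a Kronecker product without ever materializing it; a sketch with small random factors (the poker matrices themselves are far larger):

```python
import numpy as np

rng = np.random.default_rng(5)
B = rng.normal(size=(3, 4))
C = rng.normal(size=(2, 5))
v = rng.normal(size=B.shape[1] * C.shape[1])   # vector of length 4*5 = 20

# Never materialize kron(B, C) (6 x 20 here; astronomically larger for the
# real poker instances).  With row-major vec, the Kronecker identity
#     kron(B, C) @ vec(X) = vec(B @ X @ C.T)
# lets us do the product using only the small factors.
def kron_matvec(B, C, v):
    X = v.reshape(B.shape[1], C.shape[1])
    return (B @ X @ C.T).reshape(-1)

assert np.allclose(np.kron(B, C) @ v, kron_matvec(B, C, v))
```

Storage drops from the product of the two matrices' sizes to their sum, and the two inner multiplications parallelize naturally.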
Slide 23: Memory usage comparison
Slide 24: Poker
- Poker is a recognized challenge problem in AI because (among other reasons)
  - the other players' cards are hidden
  - bluffing and other deceptive strategies are needed in a good player
  - there is uncertainty about future events
- Texas Hold'em is the most popular variant of poker
  - Its two-player game tree has 10^18 nodes
Slide 25: Potential-aware automated abstraction [G., Sandholm, & Sørensen 07]
- Most prior automated abstraction algorithms employ a myopic expected-value computation as a similarity metric
  - This ignores hands like flush draws, where although the probability of winning is small, the payoff could be high
- Our newest algorithm considers higher-dimensional spaces consisting of histograms over abstracted classes of states from later stages of the game
- This enables our bottom-up abstraction algorithm to automatically take into account positive and negative potential
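A toy illustration of why the histogram view matters (the numbers are invented for illustration, not the paper's data): a mediocre made hand and a flush draw can have identical expected winning probability, so a myopic expected-value metric lumps them together, while the histograms over future outcomes separate them.

```python
import numpy as np

# Hypothetical histograms over 5 abstracted classes of future outcomes
# (probability of winning: 0, 0.25, 0.5, 0.75, 1):
made_hand  = np.array([0.10, 0.20, 0.40, 0.20, 0.10])  # mass near the middle
flush_draw = np.array([0.45, 0.05, 0.00, 0.05, 0.45])  # mass at the extremes

bins = np.linspace(0, 1, 5)
mean_made = made_hand @ bins
mean_draw = flush_draw @ bins

# A myopic expected-value metric considers the two hands identical...
assert abs(mean_made - mean_draw) < 1e-9
# ...but the histogram (L2) distance separates them clearly,
# so a histogram-based clustering keeps them in different buckets
assert np.linalg.norm(made_hand - flush_draw) > 0.5
```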
Slide 26: Solving the four-round model
- Computed an abstraction with
  - 20 first-round buckets
  - 800 second-round buckets
  - 4,800 third-round buckets
  - 28,800 fourth-round buckets
- The algorithm runs using 30 GB RAM
  - Simply representing the problem as an LP would require 32 TB
- Outputs a new, improved solution every 2.5 days
Slides 27–29: [Experimental results; G., Sandholm, & Sørensen 07]
Slide 30: Future research
- Customizing second-order methods (e.g., interior-point methods) for the equilibrium problem
- Additional heuristics for improving the practical performance of the EGT algorithm
- Techniques for finding an optimal solution from an ε-solution
Slide 31: Thank you