An effective and easy-to-use library for solving extensive-form games.

LiteEFG is an efficient Python library for solving extensive-form games (EFGs) based on computation graphs. By defining the local update rule for an information set (infoset) in Python, LiteEFG automatically distributes the computation graph to all infosets of the game. Compared to a pure-Python implementation, LiteEFG provides up to 100x acceleration.
To demonstrate the efficiency of LiteEFG, the computation time of running Counterfactual Regret Minimization (CFR) [1] on several benchmark games is listed below. The results were measured on Ubuntu with a 13th Gen Intel(R) Core(TM) i7-13700K CPU at 3.40 GHz.

- Kuhn Poker: $\leq 0.5s$
- Leduc Poker: $10s$
- Goofspiel: $10s$
- 4-sided Liar's Dice: $45s$
- Liar's Dice: $2$ hours
```bash
pip install LiteEFG
```

or

```bash
git clone --recursive https://github.com/liumy2010/LiteEFG.git
cd LiteEFG
pip install .
```
- open_spiel >= 1.2
- A compiler with C++17 support
- Pip 10+ or CMake >= 3.4 (or 3.14+ on Windows, which was the first version to support VS 2019)
- Ninja or Pip 10+
To use LiteEFG, the user first needs to specify the computation graph of the algorithm. The computation graph has two parts: backward and forward.

- Backward graph. LiteEFG enumerates the infosets in reversed breadth-first order, and the computation defined in the backward graph is executed. The variables of the backward graph are stored locally in each infoset.
- Forward graph. LiteEFG enumerates the infosets in breadth-first order, and the computation defined in the forward graph is executed. The variables of the forward graph are stored locally in each infoset.
To define graph nodes for the backward / forward graph, the user may use the following two classes.

- `LiteEFG.backward(is_static=False, color=0)`: within the `with` statement of this class, all graph nodes defined below are placed in the backward graph.
  - `is_static` indicates whether the graph nodes are static. If so, they are only updated at initialization; otherwise, they are updated every time `Environment.update` is called.
  - `color`: the color of the nodes defined below. During each update, the user may specify which node colors will be updated.
- `LiteEFG.forward(is_static=False, color=0)`: within the `with` statement of this class, all graph nodes defined below are placed in the forward graph.
Here's an example of computing the Fibonacci sequence.
```python
import LiteEFG

with LiteEFG.backward(is_static=True):
    a = LiteEFG.const(size=1, val=1.0)
    b = LiteEFG.const(size=1, val=1.0)

with LiteEFG.backward(is_static=False):
    c = a + b
    a.inplace(b.copy())  # change the value of a to that of b
    b.inplace(c.copy())  # change the value of b to that of c
    # The details of inplace and copy can be found below
```
- Initialize. The static computation is executed once when the environment is initialized. The order is still backward -> forward. Initialization is done automatically when calling `env.set_graph(graph)`.
- Update. At every iteration, when `environment.update` is called, the backward graph is updated first, followed by the forward graph (see the sketch below).
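To make this order concrete, here is a minimal driver sketch for the Fibonacci graph above. It assumes the graph is wrapped in a `LiteEFG.Graph` subclass and attached to an OpenSpiel environment (Kuhn Poker here); the class name `Fibonacci` and the dummy uniform strategy are illustrative, not part of the library.

```python
import LiteEFG
import pyspiel

class Fibonacci(LiteEFG.Graph):
    def __init__(self):
        super().__init__()
        with LiteEFG.backward(is_static=True):
            # executed once, when env.set_graph(graph) is called
            self.a = LiteEFG.const(size=1, val=1.0)
            self.b = LiteEFG.const(size=1, val=1.0)
            # dummy uniform strategy, only needed so env.update can traverse the game
            self.strategy = LiteEFG.const(self.action_set_size, 1.0 / self.action_set_size)
        with LiteEFG.backward(is_static=False):
            # executed at every call to env.update
            c = self.a + self.b
            self.a.inplace(self.b.copy())
            self.b.inplace(c.copy())

env = LiteEFG.OpenSpielEnv(pyspiel.load_game("kuhn_poker"))
graph = Fibonacci()
env.set_graph(graph)            # initialization: static backward -> forward, once
for _ in range(5):
    env.update(graph.strategy)  # each update advances (a, b) by one Fibonacci step
```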
LiteEFG prepares several default variables for easier implementation of algorithms. Note that all variables are stored locally for each infoset; an illustrative fragment follows the list below.

- `utility`: updated every iteration; stores the expected utility of each action of an infoset. For a two-player zero-sum game $\max_{\mathbf{x}}\min_{\mathbf{y}}\mathbf{x}^\top \mathbf{A} \mathbf{y}$, where $\mathbf{A}$ is the utility matrix, `utility` $=(\mathbf{A}\mathbf{y}^{tr})_{(I,a)}$ for the max-player $\mathbf{x}$, where $(\mathbf{x}^{tr}, \mathbf{y}^{tr})$ is the strategy fed into `env.update`.
- `action_set_size`: the size of the action set of an infoset.
- `reach_prob`: the reach probability of player $p$ at an infoset, where $p$ is the player to take actions at that infoset.
- `opponent_reach_prob`: the reach probability of the players other than $p$ at an infoset.
- `subtree_size`: a vector of size `action_set_size` storing, for each action, the number of terminal (infoset, action) pairs in the subtree reached by taking that action.
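As a hedged illustration of how these default variables appear inside a graph (the class and variable names are illustrative; note that `utility` alone ignores the values of descendant infosets, which the full CFR example later in this README adds back via `leg.aggregate`):

```python
import LiteEFG as leg

class Sketch(leg.Graph):
    def __init__(self):
        super().__init__()
        with leg.backward(is_static=True):
            # one probability per action, i.e., a vector of size action_set_size
            self.strategy = leg.const(self.action_set_size, 1.0 / self.action_set_size)
        with leg.backward():
            # expected value of the current strategy against the per-action utilities
            ev = leg.dot(self.utility, self.strategy)
            # instantaneous per-action regret at this infoset (descendants ignored)
            regret = self.utility - ev
```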
- `LiteEFG.set_seed`: set the seed for the pseudo-random number generator.
- `GraphNode.inplace(expression)`: the value of the expression is stored at the original address of the `GraphNode`.
- `GraphNode.sum() / LiteEFG.sum(x)`: return the sum of all elements of the `GraphNode`.
- `GraphNode.mean() / LiteEFG.mean(x)`: return the mean of all elements of the `GraphNode`.
- `GraphNode.max() / LiteEFG.max(x)`: return the maximum of all elements of the `GraphNode`.
- `GraphNode.min() / LiteEFG.min(x)`: return the minimum of all elements of the `GraphNode`.
- `GraphNode.copy()`: return a copy of the `GraphNode`.
- `GraphNode.exp() / LiteEFG.exp(x)`: return the elementwise exponential of the `GraphNode`. For every index $i$, $x_i\to e^{x_i}$.
- `GraphNode.log() / LiteEFG.log(x)`: return the elementwise logarithm of the `GraphNode`. For every index $i$, $x_i\to \ln(x_i)$.
- `GraphNode.argmax() / LiteEFG.argmax(x)`: return the argmax of the `GraphNode`.
- `GraphNode.argmin() / LiteEFG.argmin(x)`: return the argmin of the `GraphNode`.
- `GraphNode.euclidean() / LiteEFG.euclidean(x)`: return $\frac{1}{2}\sum_i x_i^2$, where $\mathbf{x}$ is the vector stored at the `GraphNode`.
- `GraphNode.negative_entropy(shifted=False) / LiteEFG.negative_entropy(x, shifted=False)`: return $\sum_i x_i\ln(x_i)+\mathbf{1}(\text{shifted})\log N$, where $\mathbf{x}\in\mathbf{R}^N$ is the vector stored at the `GraphNode`.
- `GraphNode.normalize(p_norm, ignore_negative=False) / LiteEFG.normalize(x, p_norm, ignore_negative=False)`: normalize $\mathbf{x}$ by its $p$-norm. Specifically, $x_i\to \frac{x_i}{(\sum_j |x_j|^p)^{1/p}}$. When `ignore_negative=True`, $x_i\to \frac{[x_i]^+}{(\sum_j ([x_j]^+)^p)^{1/p}}$, where $[x_i]^+=x_i\cdot \mathbf{1}(x_i\geq 0)$.
- `GraphNode.dot(y) / LiteEFG.dot(x, y)`: return $\langle \mathbf{x}, \mathbf{y}\rangle$.
- `GraphNode.maximum(y) / LiteEFG.maximum(x, y)`: return $\mathbf{z}$ with $z_i=\max(x_i,y_i)$. Also supports `GraphNode.maximum(scalar) / LiteEFG.maximum(x, scalar)`.
- `GraphNode.minimum(y) / LiteEFG.minimum(x, y)`: return $\mathbf{z}$ with $z_i=\min(x_i,y_i)$. Also supports `GraphNode.minimum(scalar) / LiteEFG.minimum(x, scalar)`.
- `x**y`: `y` should be a scalar; returns $(x_1^y, x_2^y, \ldots, x_n^y)$.
- `LiteEFG.cat(nodes_list: list)`: concatenate all nodes in `nodes_list`.
- `GraphNode.project(distance_name : ["L2", "KL"], gamma=0.0, mu=uniform_distribution)`: project $\mathbf{x}\in\mathbf{R}^N$ onto the perturbed simplex $\Delta_N:=\{\mathbf{v}\succeq \gamma\mathbf{\mu}\colon \sum_i v_i=1\}$, with respect to either the Euclidean distance or the KL-divergence. By default, $\gamma=0.0$ and $\mathbf{\mu}=\frac{1}{N}\mathbf{1}$.
- `LiteEFG.project(x, distance_name, gamma=0.0, mu=uniform_distribution)`: return `x.project(distance_name, gamma, mu)`.
- `LiteEFG.aggregate(x, aggregator_name : ["sum", "mean", "max", "min"], object_name : ["children", "parent"]="children", player : ["self", "opponents"]="self", padding=0.0)`: aggregate the variable `x` from the `object_name` infosets. By default, `object_name="children"`.
  - `object_name="children"`: for each action `a` of infoset `I`, the vectors stored in `GraphNode` `x` at the subsequent infosets under `(I,a)` are concatenated into one vector, which is then aggregated via `aggregator_name`. Finally, the resulting scalars of all `(I,a)` pairs are concatenated again and returned, so the returned `GraphNode` stores a vector of size `action_set_size`. If there is no subsequent infoset under an `(I,a)` pair, `padding` is used as the scalar for `(I,a)`.
  - `object_name="parent"`: assume `I` is a child of the infoset-action pair `(I',a')`. If the vector stored at `GraphNode` `x` has size `action_set_size` of infoset `I'`, then its $(a')^{th}$ component is returned. Otherwise, if `x` is simply a scalar, `x` is returned. If no parent exists, `padding` is returned.
  - `player` indicates whether to aggregate infosets belonging to the player herself or to her opponents. An example can be found in `LiteEFG/baselines/QFR.py`.
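A hedged sketch combining several of the operations above into a regret-matching step (the class and variable names are illustrative; the commented-out line shows the perturbed-simplex projection as an alternative):

```python
import LiteEFG as leg

class Ops(leg.Graph):
    def __init__(self):
        super().__init__()
        with leg.backward(is_static=True):
            self.regret = leg.const(self.action_set_size, 0.0)
            self.strategy = leg.const(self.action_set_size, 1.0 / self.action_set_size)
        with leg.backward():
            # regret matching: treat negative entries as 0, then normalize by the 1-norm
            self.strategy.inplace(leg.normalize(self.regret, p_norm=1.0, ignore_negative=True))
            # alternative: Euclidean projection onto the simplex perturbed by gamma = 0.01
            # self.strategy.inplace(self.regret.project("L2", gamma=0.01))
```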
- `Environment.update(strategies, upd_player=-1, upd_color=[-1], traverse_type="default")`: update the computation graph stored in the environment. `strategies` is a list of length `num_players` specifying the strategy each player uses to traverse the game. `upd_player=-1` means the graphs of all players are updated; otherwise, only the graph of `upd_player` is updated. `upd_color=[-1]` means all nodes are updated; otherwise, only nodes whose color is in `upd_color` are updated (an example can be found in `LiteEFG/baselines/CMD.py`). When `traverse_type` is `"default"`, the environment is traversed using the traverse type specified when the environment was defined; otherwise, the user may specify one of `["Enumerate", "External", "Outcome"]`.
- `Environment.update(strategy, upd_player=-1, upd_color=[-1], traverse_type="default")`: same as `Environment.update([strategy, strategy, ..., strategy], upd_player, upd_color)`, i.e. all players use `strategy` to traverse the game.
- `Environment.update_strategy(strategy, update_best=False)`: store the sequence-form strategy corresponding to the behavior-form strategy stored in `strategy`.
  - Last-iterate: $\mathbf{x}_T$
  - Average-iterate: $\frac{1}{T} \sum\limits_{t=1}^{T} \mathbf{x}_t$
  - Linear average-iterate: $\frac{2}{T(T+1)} \sum\limits_{t=1}^{T} t\cdot \mathbf{x}_t$
  - When `update_best=True`, also compute the exploitability and store the sequence-form strategy with the lowest exploitability.
- `Environment.exploitability(strategy, type_name="default")`: return the exploitability of each player when all players use `strategy`.
  - `type_name="default"`: compute the sequence-form strategy in real time from the behavior-form strategy stored at `strategy`.
  - `type_name="last-iterate"`: requires calling `Environment.update_strategy(strategy)` first; then computes the exploitability of the last-iterate of the stored sequence-form strategy.
  - `type_name="avg-iterate"`: requires calling `Environment.update_strategy(strategy)` first; then computes the exploitability of the average-iterate of the stored sequence-form strategy.
  - `type_name="linear-avg-iterate"`: requires calling `Environment.update_strategy(strategy)` first; then computes the exploitability of the linear average-iterate of the stored sequence-form strategy.
  - `type_name="best-iterate"`: requires calling `Environment.update_strategy(strategy, update_best=True)` first; then computes the exploitability of the best-iterate of the stored sequence-form strategy.
- `Environment.exploitability(strategy_list, type_name="default")`: return the exploitability of each player when player `i` uses `strategy_list[i-1]`.
- `Environment.utility(strategy, type_name="default")`: similar to `Environment.exploitability` above, but returns the utility of each player when all players use `strategy`.
- `Environment.utility(strategy_list, type_name="default")`: similar to `Environment.exploitability` above, but returns the utility of each player when player `i` uses `strategy_list[i-1]`.
- `Environment.get_value(player, node)`: return a list of `(infoset, vector)` pairs, where the vector is the value of `node` at the infoset.
- `Environment.get_strategy(player, strategy, type_name="default")`: return a list of `(infoset, vector)` pairs, where the vector is the value of `strategy` at the infoset. `type_name` has the same meaning as in `Environment.exploitability`.
- `Environment.set_value(player, node, values: list(list(float)))`: set the variables at `node` to `values`.
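Putting the environment API together, here is a hedged sketch of a typical train-and-evaluate loop. It assumes `env` and `alg` are set up as in the full CFR example below; the 1-based player index passed to `get_strategy` is an assumption inferred from the `strategy_list[i-1]` convention above.

```python
for t in range(1000):
    alg.update_graph(env)                        # calls Environment.update internally
    env.update_strategy(alg.current_strategy())  # accumulate last / avg / linear-avg iterates
    if t % 100 == 0:
        print(t, env.exploitability(alg.current_strategy(), "avg-iterate"))

# inspect the averaged behavior strategy at each infoset of player 1
for infoset, probs in env.get_strategy(1, alg.current_strategy(), "avg-iterate"):
    print(infoset, probs)
```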
LiteEFG is compatible with OpenSpiel [8]. To use an environment implemented in OpenSpiel, one can call `LiteEFG.OpenSpielEnv(game: pyspiel.game, traverse_type="Enumerate", regenerate=False)`. The game should be a `pyspiel.game` instance, e.g. `pyspiel.load_game("kuhn_poker")`. The traverse type specifies whether the environment is explored by enumerating all nodes at each iteration, by external sampling, or by outcome sampling [2]. `regenerate` denotes whether to regenerate the game file regardless of whether it already exists.
- `OpenSpielEnv.get_strategy(strategy, type_name="default")`: return `(open_spiel.python.policy.TabularPolicy, [pandas.DataFrame])`. The function wraps the strategy given by `Environment.get_strategy` into an `open_spiel.python.policy.TabularPolicy`. In addition, each player's strategy is stored in a `pandas.DataFrame`, and the per-player strategies are concatenated into a list of `pandas.DataFrame`.
- `OpenSpielEnv.interact(policy: TabularPolicy, controlled_player=0, reveal_private=True, epochs=1000)`: interact with the policy stored in `policy`. `controlled_player` indicates which player the user wants to control; the remaining players play their policies stored in `policy`. `reveal_private` indicates whether to reveal the private information of all players at the end of each round, such as revealing the private cards of all players in poker games. `epochs` indicates how many rounds the user wants to play.
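For example, a hedged sketch of playing against a trained policy (assuming `env` is an `OpenSpielEnv` and `alg` a trained graph, as in the CFR example below):

```python
# wrap the averaged strategy into an OpenSpiel TabularPolicy, then play against it
policy, dataframes = env.get_strategy(alg.current_strategy(), "avg-iterate")
env.interact(policy, controlled_player=0, reveal_private=True, epochs=10)
```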
An example can be found in `LiteEFG/LiteEFG/src/Environment/OpenSpiel/OpenSpielToGameFile.py`.
To support games not implemented in OpenSpiel, LiteEFG also accepts games described in a specific game-file format. Examples can be found in `LiteEFG/game_instances/*.game`.
For further flexibility, LiteEFG also supports writing new environments in C++. Examples are provided in `LiteEFG/LiteEFG/src/Environment/Leduc/` and `LiteEFG/LiteEFG/src/Environment/NFG/`.
```python
import LiteEFG as leg
import pyspiel
import argparse
from tqdm import tqdm

class CFR(leg.Graph):  # inherits the computation graph

    def __init__(self):
        super().__init__()  # initialize the computation graph

        with leg.backward(is_static=True):
            # In the following, we define the static backward-graph variables
            ev = leg.const(size=1, val=0.0)  # initialize the expected value as the scalar 0.0
            self.strategy = leg.const(self.action_set_size, 1.0 / self.action_set_size)
            # initialize the strategy as the uniform strategy
            self.regret_buffer = leg.const(self.action_set_size, 0.0)
            # initialize the regret buffer as the all-zero vector

        with leg.backward():
            # In the following, we define the (non-static) backward-graph variables
            gradient = leg.aggregate(ev, aggregator="sum")
            # For the current infoset I and action a, aggregate first concatenates the
            # variable "ev" of all infosets whose parent equals (I,a).
            # Then, the aggregator "sum" reduces that vector to one scalar.
            # Finally, a vector of size "action_set_size" is returned,
            # since aggregate maintains one scalar per action.
            gradient.inplace(gradient + self.utility)
            # does not change the address of gradient; replaces it with the new value
            ev.inplace(leg.dot(gradient, self.strategy))
            # replace the expected value
            self.regret_buffer.inplace(self.regret_buffer + gradient - ev)
            # update the regret buffer
            self.strategy.inplace(leg.normalize(self.regret_buffer, p_norm=1.0, ignore_negative=True))
            # Normalize the strategy by its p-norm.
            # ignore_negative means that negative components are treated as 0 during normalization.
            # When the p-norm is 0, a uniform distribution is returned (the all-one vector divided by the dimension).

    def update_graph(self, env: leg.Environment) -> None:
        env.update(self.strategy)
        # update the graph for both players;
        # self.strategy is the traversal strategy used to compute the expected utility

    def current_strategy(self) -> leg.GraphNode:
        return self.strategy

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--traverse_type", type=str, choices=["Enumerate", "External"], default="Enumerate")
    parser.add_argument("--iter", type=int, default=100000)
    parser.add_argument("--print_freq", type=int, default=1000)
    args = parser.parse_args()

    leduc_poker = pyspiel.load_game("leduc_poker")
    env = leg.OpenSpielEnv(leduc_poker, traverse_type=args.traverse_type)  # load the environment
    alg = CFR()
    env.set_graph(alg)  # pass the graph to the environment

    for i in tqdm(range(args.iter)):
        alg.update_graph(env)  # update the graph
        env.update_strategy(alg.current_strategy())
        # automatically update the last-iterate, best-iterate, avg-iterate,
        # and linear-avg-iterate of the strategy
        if i % args.print_freq == 0:
            print(i, env.exploitability(alg.current_strategy(), "avg-iterate"))  # output the exploitability
```
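Assuming the script above is saved to a file, e.g. `cfr_example.py` (the file name is arbitrary), it can be run as `python cfr_example.py --traverse_type Enumerate --iter 100000 --print_freq 1000`, using the command-line flags defined by the `argparse` block.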
In LiteEFG, the baselines are stored in `LiteEFG/LiteEFG/baselines`. Currently, the following algorithms are implemented.
- Counterfactual Regret Minimization (CFR) [1]
- Monte-Carlo Counterfactual Regret Minimization (MCCFR) [2]
- Counterfactual Regret Minimization+ (CFR+) [5]
- Dilated Optimistic Mirror Descent (DOMD) [6]
- Magnetic Mirror Descent (MMD) [7]
- Clairvoyant Mirror Descent (CMD) [9]
- Discounted Counterfactual Regret Minimization (DCFR) [10]
- Predictive Counterfactual Regret Minimization (PCFR) [11]
- Q-Function based Regret Minimization (QFR)
- Regularized Dilated Optimistic Mirror Descent (Reg-DOMD) [12]
- Regularized Counterfactual Regret Minimization (Reg-CFR) [12]
- Implicit Exploration Online Mirror Descent (IXOMD) [13]
- Balanced Online Mirror Descent (Balanced OMD) [14]
If you use LiteEFG in your research, please cite the paper with the following BibTeX:

```bibtex
@article{liu2024liteefgefficientpythonlibrary,
  title={{LiteEFG}: An Efficient Python Library for Solving Extensive-form Games},
  author={Mingyang Liu and Gabriele Farina and Asuman Ozdaglar},
  year={2024},
  eprint={2407.20351},
  archivePrefix={arXiv},
  primaryClass={cs.GT},
  url={https://arxiv.org/abs/2407.20351},
}
```
[1] Zinkevich, Martin, et al. "Regret minimization in games with incomplete information." Advances in Neural Information Processing Systems 20 (2007).

[2] Lanctot, Marc, et al. "Monte Carlo sampling for regret minimization in extensive games." Advances in Neural Information Processing Systems 22 (2009).

[3] Kuhn, Harold W. "A simplified two-person poker." Contributions to the Theory of Games 1.417 (1950): 97-103.

[4] Southey, Finnegan, et al. "Bayes' bluff: opponent modelling in poker." Conference on Uncertainty in Artificial Intelligence (2005).

[5] Tammelin, Oskari. "Solving large imperfect information games using CFR+." arXiv preprint arXiv:1407.5042 (2014).

[6] Lee, Chung-Wei, Christian Kroer, and Haipeng Luo. "Last-iterate convergence in extensive-form games." Advances in Neural Information Processing Systems (2021).

[7] Sokota, Samuel, et al. "A Unified Approach to Reinforcement Learning, Quantal Response Equilibria, and Two-Player Zero-Sum Games." International Conference on Learning Representations (2023).

[8] Lanctot, Marc, et al. "OpenSpiel: A framework for reinforcement learning in games." arXiv preprint arXiv:1908.09453 (2019).

[9] Piliouras, Georgios, Ryann Sim, and Stratis Skoulakis. "Beyond time-average convergence: Near-optimal uncoupled online learning via clairvoyant multiplicative weights update." Advances in Neural Information Processing Systems (2022).

[10] Brown, Noam, and Tuomas Sandholm. "Solving imperfect-information games via discounted regret minimization." Proceedings of the AAAI Conference on Artificial Intelligence (2019).

[11] Farina, Gabriele, Christian Kroer, and Tuomas Sandholm. "Faster game solving via predictive Blackwell approachability: Connecting regret matching and mirror descent." Proceedings of the AAAI Conference on Artificial Intelligence (2021).

[12] Liu, Mingyang, et al. "The power of regularization in solving extensive-form games." International Conference on Learning Representations (2023).

[13] Kozuno, Tadashi, et al. "Learning in two-player zero-sum partially observable Markov games with perfect recall." Advances in Neural Information Processing Systems (2021).

[14] Bai, Yu, et al. "Near-optimal learning of extensive-form games with imperfect information." International Conference on Machine Learning (2022).