Parallelizing Counterfactual Regret Minimization

Juho Kim; Tuomas Sandholm

arxiv: 2605.14277 · v1 · pith:2GCDGRS4new · submitted 2026-05-14 · 💻 cs.AI · cs.GT

Parallelizing Counterfactual Regret Minimization

Juho Kim , Tuomas Sandholm This is my paper

Pith reviewed 2026-05-15 02:35 UTC · model grok-4.3

classification 💻 cs.AI cs.GT

keywords counterfactual regret minimizationparallelizationGPUimperfect-information gameslinear algebraCFR+game solvingregret matching

0 comments

The pith

Counterfactual regret minimization can be reframed as linear algebra operations to run up to four orders of magnitude faster on GPUs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to parallelize the family of counterfactual regret minimization algorithms by recasting their core updates as sequences of matrix and vector operations. This lets standard techniques for speeding up linear algebra on GPUs be applied directly to CFR and its variants such as CFR+, discounted CFR, and predictive CFR. A sympathetic reader would care because CFR has been central to solving large imperfect-information games, and the reported speedups would make it practical to handle games that are currently out of reach on CPUs. The authors implement the approach and measure up to 10,000 times faster execution than OpenSpiel's CPU versions. The claim rests on the idea that the linear-algebra form preserves the original regret-matching logic while unlocking hardware parallelism.

Core claim

By expressing the regret and strategy updates of CFR as linear algebra operations, existing parallel linear-algebra routines can be used to accelerate the algorithm on GPUs without changing its convergence behavior, and the same transformation extends to other members of the CFR family including CFR+, discounted CFR, and predictive variants.

What carries the argument

A generalized parallelization framework that reformulates CFR regret and strategy updates as linear algebra operations so that GPU-accelerated matrix routines can be substituted for the original sequential loops.

If this is right

The same linear-algebra reformulation applies directly to CFR+, discounted CFR, and predictive CFR variants.
A GPU implementation achieves up to four orders of magnitude speedup over existing CPU baselines such as OpenSpiel.
Larger imperfect-information games become solvable in the same wall-clock time as smaller ones were before.
The framework is general enough to be reused for any tabular CFR-family algorithm.
Training time for game-solving agents drops dramatically, enabling more iterations or bigger game trees.
pith_inferences:[

Load-bearing premise

The linear-algebra reformulation must preserve the exact regret bounds and convergence properties of the original CFR without introducing floating-point or parallelism errors.

What would settle it

Execute both the original serial CFR and the new GPU version on Kuhn poker until convergence and verify whether the final strategies and exploitability values match within floating-point tolerance.

Figures

Figures reproduced from arXiv: 2605.14277 by Juho Kim, Tuomas Sandholm.

**Figure 2.** Figure 2: Iteration times and (CUDA) memory usage versus the size of the game being solved. [PITH_FULL_IMAGE:figures/full_fig_p014_2.png] view at source ↗

read the original abstract

Parallelization has played an instrumental role in the field of artificial intelligence (AI), drastically reducing the time taken to train and evaluate large AI models. In contrast to its impact in the broader field of AI, applying parallelization to computational game solving is relatively unexplored, despite its great potential. In this paper, we parallelize the family of counterfactual regret minimization (CFR) algorithms, which were central to important breakthroughs for solving large imperfect-information games. We present a generalized parallelization framework, reframing CFR as a series of linear algebra operations. Then, existing techniques for parallelizing linear algebra operations can be applied to accelerate CFR. We also describe how our technique can be applied to other tabular members of the CFR family of algorithms, including the state-of-the-art, such as CFR+, discounted CFR, and predictive variants of CFR. Experimentally, we show that our CFR implementation on a GPU is up to four orders of magnitude faster than Google DeepMind OpenSpiel's CFR implementations on a CPU.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper recasts CFR as linear algebra ops to run on GPUs and reports big speedups, but the rewrite must be shown to match the original exactly or the regret bounds are at risk.

read the letter

The main thing to know is that they rewrite the CFR steps—regret matching, counterfactual values, and updates—as matrix operations so the whole algorithm runs in parallel on a GPU. They claim up to 10,000x faster than OpenSpiel's CPU version and say the same trick covers CFR+, discounted CFR, and predictive variants. If the mapping is exact, this is a practical win for anyone who needs to solve bigger imperfect-information games without waiting days for convergence. The experiments appear to back the runtime numbers with real runs, which is the strongest part of the work. The framework is systematic rather than a one-off hack, and that looks new relative to earlier parallel game-solving papers. The soft spot is the missing verification that the linear-algebra version produces identical iterates to the sequential version. Regret matching uses a max(0,·) and the updates depend on reach probabilities; any change in reduction order or floating-point behavior could alter the path and break the O(1/√T) guarantee. The abstract asserts exactness but does not show the derivation, so the convergence claim rests on an unproven equivalence. If the full paper supplies a clear proof or a side-by-side check that the outputs match to machine precision, the result is solid. This is for people who actually run large CFR instances in multi-agent settings or security games. A reader who needs faster tabular solvers will find the implementation details useful even if they have to re-derive the bounds themselves. It deserves peer review because the empirical gains are large and the idea is straightforward to test, though the theory section will need close scrutiny.

Referee Report

2 major / 2 minor

Summary. The paper claims to introduce a generalized parallelization framework for counterfactual regret minimization (CFR) by reframing the algorithm family as sequences of linear algebra operations, enabling the application of existing parallel linear-algebra techniques. It states that this applies to CFR, CFR+, discounted CFR, and predictive CFR variants, and reports that a GPU implementation achieves speedups of up to four orders of magnitude over OpenSpiel's CPU CFR implementations.

Significance. If the reformulation is shown to be mathematically equivalent to standard CFR and to preserve its regret bounds and convergence properties, the work would provide a practical route to scaling CFR to substantially larger games, potentially expanding the range of imperfect-information problems that can be solved exactly or approximately within reasonable compute budgets.

major comments (2)

[Framework section (reformulation)] The central claim that the linear-algebra reformulation is exact (and therefore inherits the O(1/√T) regret bound) requires an explicit derivation or proof of equivalence. No such derivation, error analysis, or verification that parallel reductions, matrix multiplications for reach probabilities, and the max(0,·) operation in regret matching preserve semantics appears in the manuscript; without it the reported speedups cannot be claimed to solve the same problem.
[Experiments] Experimental results must include a direct comparison of the strategies and cumulative regrets produced by the parallel GPU version versus the sequential CPU version on the same games; floating-point associativity or execution-order differences could silently violate the regret-matching guarantee even if wall-clock time improves.

minor comments (2)

[Abstract] The abstract should state the specific game sizes, number of information sets, and iteration counts used to obtain the four-order-of-magnitude speedup so readers can assess the practical scope of the result.
[Framework section] Notation for the matrix forms of counterfactual values and regret vectors should be introduced with explicit mappings back to the standard CFR recurrence to aid readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments. We address each major point below and will revise the manuscript accordingly to strengthen the presentation of the linear-algebra reformulation and the experimental validation.

read point-by-point responses

Referee: [Framework section (reformulation)] The central claim that the linear-algebra reformulation is exact (and therefore inherits the O(1/√T) regret bound) requires an explicit derivation or proof of equivalence. No such derivation, error analysis, or verification that parallel reductions, matrix multiplications for reach probabilities, and the max(0,·) operation in regret matching preserve semantics appears in the manuscript; without it the reported speedups cannot be claimed to solve the same problem.

Authors: We agree that an explicit derivation is necessary to rigorously confirm equivalence and inheritance of the regret bound. In the revised manuscript we will insert a new subsection (tentatively 3.3) that derives the equivalence step by step: (i) showing that the reach-probability computation is identical to the standard product of strategy and transition matrices, (ii) demonstrating that the regret update is a direct matrix subtraction and accumulation, and (iii) proving that the regret-matching operator (including the element-wise max(0,·) and normalization) produces the identical strategy vector as the sequential formulation. We will also add a short floating-point error discussion noting that all operations remain within the same associative order as the original algorithm when reductions are performed with the same precision. revision: yes
Referee: [Experiments] Experimental results must include a direct comparison of the strategies and cumulative regrets produced by the parallel GPU version versus the sequential CPU version on the same games; floating-point associativity or execution-order differences could silently violate the regret-matching guarantee even if wall-clock time improves.

Authors: We accept this requirement. The revised Experiments section will contain a new table (and accompanying text) that reports, for each benchmark game, the L1 distance between the final strategy profiles produced by the GPU and CPU implementations, as well as the absolute difference in cumulative regret after a fixed number of iterations. We will also include a short convergence plot overlaying the two curves to confirm that any numerical discrepancy remains negligible and does not alter the observed convergence rate. revision: yes

Circularity Check

0 steps flagged

No circularity: CFR reframed as linear-algebra operations using standard parallelization techniques

full rationale

The paper reframes CFR (regret matching, counterfactual values, strategy updates) as a sequence of matrix and vector operations so that existing GPU linear-algebra libraries can be applied directly. No equation is defined in terms of its own output, no parameter is fitted to a subset of data and then relabeled a prediction, and no uniqueness theorem or ansatz is imported via self-citation. The derivation chain consists of explicit algebraic rewrites whose equivalence to the original CFR recurrence is asserted by construction of the reformulation itself; this is an independent translation step rather than a reduction to fitted inputs or prior self-referential results. The reported speedups are therefore empirical consequences of the parallel execution of the rewritten operations, not circularly forced by the paper's own definitions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper relies on the standard correctness of CFR and the well-known parallelizability of linear algebra primitives; no new free parameters, ad-hoc axioms, or invented entities are introduced in the abstract.

axioms (1)

domain assumption CFR regret updates can be expressed exactly as matrix-vector operations
Invoked when the authors reframe CFR as linear algebra; correctness of this equivalence is assumed rather than re-derived.

pith-pipeline@v0.9.0 · 5464 in / 1148 out tokens · 20959 ms · 2026-05-15T02:35:14.273833+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages

[1]

G. E. Blelloch. Programming parallel algorithms.Commun. ACM, 39(3):85–97, 1996

work page 1996
[2]

Brown and T

N. Brown and T. Sandholm. Regret-based pruning in extensive-form games. InProceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2015

work page 2015
[3]

Brown and T

N. Brown and T. Sandholm. Reduced space and faster convergence in imperfect-information games via pruning. InProceedings of the International Conference on Machine Learning (ICML), 2017

work page 2017
[4]

Brown and T

N. Brown and T. Sandholm. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals.Science, 359(6374):418–424, 2018

work page 2018
[5]

Brown and T

N. Brown and T. Sandholm. Superhuman AI for multiplayer poker.Science, 365(6456):885–890, 2019

work page 2019
[6]

Brown and T

N. Brown and T. Sandholm. Solving imperfect-information games via discounted regret minimization. InProceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2019

work page 2019
[7]

Brown, C

N. Brown, C. Kroer, and T. Sandholm. Dynamic thresholding and pruning for regret minimiza- tion. InProceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2017

work page 2017
[8]

Brown, A

N. Brown, A. Lerer, S. Gross, and T. Sandholm. Deep counterfactual regret minimization. In Proceedings of the International Conference on Machine Learning (ICML), 2019

work page 2019
[9]

J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, M. Mao, M. a. Ranzato, A. Senior, P. Tucker, K. Yang, Q. Le, and A. Ng. Large scale distributed deep networks. InProceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2012

work page 2012
[10]

Farina, C

G. Farina, C. Kroer, and T. Sandholm. Regret circuits: Composability of regret minimizers. In Proceedings of the International Conference on Machine Learning (ICML), 2019

work page 2019
[11]

Farina, C

G. Farina, C. K. Ling, F. Fang, and T. Sandholm. Correlation in extensive-form games: Saddle- point formulation and benchmarks. InProceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2019

work page 2019
[12]

Farina, C

G. Farina, C. Kroer, and T. Sandholm. Stochastic regret minimization in extensive-form games. InProceedings of the International Conference on Machine Learning (ICML), 2020

work page 2020
[13]

Farina, C

G. Farina, C. Kroer, and T. Sandholm. Faster game solving via predictive Blackwell approacha- bility: Connecting regret matching and mirror descent. InProceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2021

work page 2021
[14]

Gilpin and T

A. Gilpin and T. Sandholm. Speeding up gradient-based algorithms for sequential games. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2010

work page 2010
[15]

Gilpin, S

A. Gilpin, S. Hoda, J. Peña, and T. Sandholm. Gradient-based algorithms for finding Nash equilibria in extensive form games. InProceedings of the International Workshop on Internet and Network Economics (WINE), 2007

work page 2007
[16]

Goyal, P

P. Goyal, P. Dollár, R. Girshick, P. Noordhuis, L. Wesolowski, A. Kyrola, A. Tulloch, Y . Jia, and K. He. Accurate, large minibatch SGD: Training ImageNet in 1 hour, 2018

work page 2018
[17]

Hart and A

S. Hart and A. Mas-Colell. A simple adaptive procedure leading to correlated equilibrium. Econometrica, 68(5):1127–1150, 2000. 10

work page 2000
[18]

S. Hoda, A. Gilpin, J. Pe `na, and T. Sandholm. Smoothing techniques for computing Nash equilibria of sequential games.Math. Oper. Res., 35(2):494–512, 2010

work page 2010
[19]

Kepner, P

J. Kepner, P. Aaltonen, D. Bader, A. Buluç, F. Franchetti, J. Gilbert, D. Hutchison, M. Kumar, A. Lumsdaine, H. Meyerhenke, S. McMillan, C. Yang, J. D. Owens, M. Zalewski, T. Mattson, and J. Moreira. Mathematical foundations of the GraphBLAS. InProceedings of the IEEE High Performance Extreme Computing Conference (HPEC), 2016

work page 2016
[20]

Koller, N

D. Koller, N. Megiddo, and B. von Stengel. Efficient computation of equilibria for extensive two-person games.Games and Economic Behavior, 14(2):247–259, 1996

work page 1996
[21]

Kroer, G

C. Kroer, G. Farina, and T. Sandholm. Solving large sequential games with the excessive gap technique. InProceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2018

work page 2018
[22]

Lanctot, K

M. Lanctot, K. Waugh, M. Zinkevich, and M. Bowling. Monte Carlo sampling for regret mini- mization in extensive games. InProceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2009

work page 2009
[23]

Lanctot, E

M. Lanctot, E. Lockhart, J.-B. Lespiau, V . Zambaldi, S. Upadhyay, J. Pérolat, S. Srinivasan, F. Timbers, K. Tuyls, S. Omidshafiei, D. Hennes, D. Morrill, P. Muller, T. Ewalds, R. Faulkner, J. Kramár, B. D. Vylder, B. Saeta, J. Bradbury, D. Ding, S. Borgeaud, M. Lai, J. Schrit- twieser, T. Anthony, E. Hughes, I. Danihelka, and J. Ryan-Davis. OpenSpiel: A ...

work page 2019
[24]

S. M. McAleer, G. Farina, M. Lanctot, and T. Sandholm. ESCHER: Eschewing importance sampling in games by computing a history value function to estimate regret. InProceedings of the International Conference on Learning Representations (ICLR), 2023

work page 2023
[25]

Morav ˇcík, M

M. Morav ˇcík, M. Schmid, N. Burch, V . Lisý, D. Morrill, N. Bard, T. Davis, K. Waugh, M. Johanson, and M. Bowling. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker.Science, 356(6337):508–513, 2017

work page 2017
[26]

Nesterov

Y . Nesterov. Excessive gap technique in nonsmooth convex minimization.SIAM Journal on Optimization, 16(1):235–249, 2005

work page 2005
[27]

Okuta, Y

R. Okuta, Y . Unno, D. Nishino, S. Hido, and C. Loomis. CuPy: A NumPy-compatible library for NVIDIA GPU calculations. InProceedings of the Workshop on Machine Learning Systems (LearningSys) in the Annual Conference on Neural Information Processing Systems (NeurIPS), 2017

work page 2017
[28]

J. Reis. A GPU implementation of counterfactual regret minimization. Master’s thesis, Univer- sity of Porto, 2015

work page 2015
[29]

J. Rudolf. Counterfactual regret minimization on GPU, 2021

work page 2021
[30]

Sergeev and M

A. Sergeev and M. D. Balso. Horovod: fast and easy distributed deep learning in TensorFlow, 2018

work page 2018
[31]

Tammelin

O. Tammelin. Solving large imperfect information games using CFR+, 2014

work page 2014
[32]

Tammelin, N

O. Tammelin, N. Burch, M. Johanson, and M. Bowling. Solving heads-up limit Texas Hold’em. InProceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2015

work page 2015
[33]

von Stengel

B. von Stengel. Efficient computation of behavior strategies.Games and Economic Behavior, 14(2):220–246, 1996

work page 1996
[34]

H. Xu, K. Li, H. Fu, Q. Fu, J. Xing, and J. Cheng. Dynamic discounted counterfactual regret minimization. InProceedings of the International Conference on Learning Representations (ICLR), 2024

work page 2024
[35]

B. H. Zhang, I. Anagnostides, and T. Sandholm. Scale-invariant regret matching and online learning with optimal convergence: Bridging theory and practice in zero-sum games, 2026. 11

work page 2026
[36]

Zhang, S

N. Zhang, S. McAleer, and T. Sandholm. Faster game solving via hyperparameter schedules. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2026

work page 2026
[37]

Zinkevich, M

M. Zinkevich, M. Johanson, M. Bowling, and C. Piccione. Regret minimization in games with incomplete information. InProceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2007. 12 A Omitted proofs in Section 3.4 This appendix section contains proofs excluded from the main paper content due to space constraints. A.1 Proof ...

work page 2007

[1] [1]

G. E. Blelloch. Programming parallel algorithms.Commun. ACM, 39(3):85–97, 1996

work page 1996

[2] [2]

Brown and T

N. Brown and T. Sandholm. Regret-based pruning in extensive-form games. InProceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2015

work page 2015

[3] [3]

Brown and T

N. Brown and T. Sandholm. Reduced space and faster convergence in imperfect-information games via pruning. InProceedings of the International Conference on Machine Learning (ICML), 2017

work page 2017

[4] [4]

Brown and T

N. Brown and T. Sandholm. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals.Science, 359(6374):418–424, 2018

work page 2018

[5] [5]

Brown and T

N. Brown and T. Sandholm. Superhuman AI for multiplayer poker.Science, 365(6456):885–890, 2019

work page 2019

[6] [6]

Brown and T

N. Brown and T. Sandholm. Solving imperfect-information games via discounted regret minimization. InProceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2019

work page 2019

[7] [7]

Brown, C

N. Brown, C. Kroer, and T. Sandholm. Dynamic thresholding and pruning for regret minimiza- tion. InProceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2017

work page 2017

[8] [8]

Brown, A

N. Brown, A. Lerer, S. Gross, and T. Sandholm. Deep counterfactual regret minimization. In Proceedings of the International Conference on Machine Learning (ICML), 2019

work page 2019

[9] [9]

J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, M. Mao, M. a. Ranzato, A. Senior, P. Tucker, K. Yang, Q. Le, and A. Ng. Large scale distributed deep networks. InProceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2012

work page 2012

[10] [10]

Farina, C

G. Farina, C. Kroer, and T. Sandholm. Regret circuits: Composability of regret minimizers. In Proceedings of the International Conference on Machine Learning (ICML), 2019

work page 2019

[11] [11]

Farina, C

G. Farina, C. K. Ling, F. Fang, and T. Sandholm. Correlation in extensive-form games: Saddle- point formulation and benchmarks. InProceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2019

work page 2019

[12] [12]

Farina, C

G. Farina, C. Kroer, and T. Sandholm. Stochastic regret minimization in extensive-form games. InProceedings of the International Conference on Machine Learning (ICML), 2020

work page 2020

[13] [13]

Farina, C

G. Farina, C. Kroer, and T. Sandholm. Faster game solving via predictive Blackwell approacha- bility: Connecting regret matching and mirror descent. InProceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2021

work page 2021

[14] [14]

Gilpin and T

A. Gilpin and T. Sandholm. Speeding up gradient-based algorithms for sequential games. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2010

work page 2010

[15] [15]

Gilpin, S

A. Gilpin, S. Hoda, J. Peña, and T. Sandholm. Gradient-based algorithms for finding Nash equilibria in extensive form games. InProceedings of the International Workshop on Internet and Network Economics (WINE), 2007

work page 2007

[16] [16]

Goyal, P

P. Goyal, P. Dollár, R. Girshick, P. Noordhuis, L. Wesolowski, A. Kyrola, A. Tulloch, Y . Jia, and K. He. Accurate, large minibatch SGD: Training ImageNet in 1 hour, 2018

work page 2018

[17] [17]

Hart and A

S. Hart and A. Mas-Colell. A simple adaptive procedure leading to correlated equilibrium. Econometrica, 68(5):1127–1150, 2000. 10

work page 2000

[18] [18]

S. Hoda, A. Gilpin, J. Pe `na, and T. Sandholm. Smoothing techniques for computing Nash equilibria of sequential games.Math. Oper. Res., 35(2):494–512, 2010

work page 2010

[19] [19]

Kepner, P

J. Kepner, P. Aaltonen, D. Bader, A. Buluç, F. Franchetti, J. Gilbert, D. Hutchison, M. Kumar, A. Lumsdaine, H. Meyerhenke, S. McMillan, C. Yang, J. D. Owens, M. Zalewski, T. Mattson, and J. Moreira. Mathematical foundations of the GraphBLAS. InProceedings of the IEEE High Performance Extreme Computing Conference (HPEC), 2016

work page 2016

[20] [20]

Koller, N

D. Koller, N. Megiddo, and B. von Stengel. Efficient computation of equilibria for extensive two-person games.Games and Economic Behavior, 14(2):247–259, 1996

work page 1996

[21] [21]

Kroer, G

C. Kroer, G. Farina, and T. Sandholm. Solving large sequential games with the excessive gap technique. InProceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2018

work page 2018

[22] [22]

Lanctot, K

M. Lanctot, K. Waugh, M. Zinkevich, and M. Bowling. Monte Carlo sampling for regret mini- mization in extensive games. InProceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2009

work page 2009

[23] [23]

Lanctot, E

M. Lanctot, E. Lockhart, J.-B. Lespiau, V . Zambaldi, S. Upadhyay, J. Pérolat, S. Srinivasan, F. Timbers, K. Tuyls, S. Omidshafiei, D. Hennes, D. Morrill, P. Muller, T. Ewalds, R. Faulkner, J. Kramár, B. D. Vylder, B. Saeta, J. Bradbury, D. Ding, S. Borgeaud, M. Lai, J. Schrit- twieser, T. Anthony, E. Hughes, I. Danihelka, and J. Ryan-Davis. OpenSpiel: A ...

work page 2019

[24] [24]

S. M. McAleer, G. Farina, M. Lanctot, and T. Sandholm. ESCHER: Eschewing importance sampling in games by computing a history value function to estimate regret. InProceedings of the International Conference on Learning Representations (ICLR), 2023

work page 2023

[25] [25]

Morav ˇcík, M

M. Morav ˇcík, M. Schmid, N. Burch, V . Lisý, D. Morrill, N. Bard, T. Davis, K. Waugh, M. Johanson, and M. Bowling. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker.Science, 356(6337):508–513, 2017

work page 2017

[26] [26]

Nesterov

Y . Nesterov. Excessive gap technique in nonsmooth convex minimization.SIAM Journal on Optimization, 16(1):235–249, 2005

work page 2005

[27] [27]

Okuta, Y

R. Okuta, Y . Unno, D. Nishino, S. Hido, and C. Loomis. CuPy: A NumPy-compatible library for NVIDIA GPU calculations. InProceedings of the Workshop on Machine Learning Systems (LearningSys) in the Annual Conference on Neural Information Processing Systems (NeurIPS), 2017

work page 2017

[28] [28]

J. Reis. A GPU implementation of counterfactual regret minimization. Master’s thesis, Univer- sity of Porto, 2015

work page 2015

[29] [29]

J. Rudolf. Counterfactual regret minimization on GPU, 2021

work page 2021

[30] [30]

Sergeev and M

A. Sergeev and M. D. Balso. Horovod: fast and easy distributed deep learning in TensorFlow, 2018

work page 2018

[31] [31]

Tammelin

O. Tammelin. Solving large imperfect information games using CFR+, 2014

work page 2014

[32] [32]

Tammelin, N

O. Tammelin, N. Burch, M. Johanson, and M. Bowling. Solving heads-up limit Texas Hold’em. InProceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2015

work page 2015

[33] [33]

von Stengel

B. von Stengel. Efficient computation of behavior strategies.Games and Economic Behavior, 14(2):220–246, 1996

work page 1996

[34] [34]

H. Xu, K. Li, H. Fu, Q. Fu, J. Xing, and J. Cheng. Dynamic discounted counterfactual regret minimization. InProceedings of the International Conference on Learning Representations (ICLR), 2024

work page 2024

[35] [35]

B. H. Zhang, I. Anagnostides, and T. Sandholm. Scale-invariant regret matching and online learning with optimal convergence: Bridging theory and practice in zero-sum games, 2026. 11

work page 2026

[36] [36]

Zhang, S

N. Zhang, S. McAleer, and T. Sandholm. Faster game solving via hyperparameter schedules. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2026

work page 2026

[37] [37]

Zinkevich, M

M. Zinkevich, M. Johanson, M. Bowling, and C. Piccione. Regret minimization in games with incomplete information. InProceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2007. 12 A Omitted proofs in Section 3.4 This appendix section contains proofs excluded from the main paper content due to space constraints. A.1 Proof ...

work page 2007