pith. sign in

arxiv: 2605.14277 · v1 · pith:2GCDGRS4new · submitted 2026-05-14 · 💻 cs.AI · cs.GT

Parallelizing Counterfactual Regret Minimization

Pith reviewed 2026-05-15 02:35 UTC · model grok-4.3

classification 💻 cs.AI cs.GT
keywords counterfactual regret minimizationparallelizationGPUimperfect-information gameslinear algebraCFR+game solvingregret matching
0
0 comments X

The pith

Counterfactual regret minimization can be reframed as linear algebra operations to run up to four orders of magnitude faster on GPUs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to parallelize the family of counterfactual regret minimization algorithms by recasting their core updates as sequences of matrix and vector operations. This lets standard techniques for speeding up linear algebra on GPUs be applied directly to CFR and its variants such as CFR+, discounted CFR, and predictive CFR. A sympathetic reader would care because CFR has been central to solving large imperfect-information games, and the reported speedups would make it practical to handle games that are currently out of reach on CPUs. The authors implement the approach and measure up to 10,000 times faster execution than OpenSpiel's CPU versions. The claim rests on the idea that the linear-algebra form preserves the original regret-matching logic while unlocking hardware parallelism.

Core claim

By expressing the regret and strategy updates of CFR as linear algebra operations, existing parallel linear-algebra routines can be used to accelerate the algorithm on GPUs without changing its convergence behavior, and the same transformation extends to other members of the CFR family including CFR+, discounted CFR, and predictive variants.

What carries the argument

A generalized parallelization framework that reformulates CFR regret and strategy updates as linear algebra operations so that GPU-accelerated matrix routines can be substituted for the original sequential loops.

If this is right

  • The same linear-algebra reformulation applies directly to CFR+, discounted CFR, and predictive CFR variants.
  • A GPU implementation achieves up to four orders of magnitude speedup over existing CPU baselines such as OpenSpiel.
  • Larger imperfect-information games become solvable in the same wall-clock time as smaller ones were before.
  • The framework is general enough to be reused for any tabular CFR-family algorithm.
  • Training time for game-solving agents drops dramatically, enabling more iterations or bigger game trees.
  • pith_inferences:[

Load-bearing premise

The linear-algebra reformulation must preserve the exact regret bounds and convergence properties of the original CFR without introducing floating-point or parallelism errors.

What would settle it

Execute both the original serial CFR and the new GPU version on Kuhn poker until convergence and verify whether the final strategies and exploitability values match within floating-point tolerance.

Figures

Figures reproduced from arXiv: 2605.14277 by Juho Kim, Tuomas Sandholm.

Figure 1
Figure 1. Figure 1: Exploitabilities versus wall-clock time for our CFR implementations and those of OpenSpiel. [PITH_FULL_IMAGE:figures/full_fig_p008_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Iteration times and (CUDA) memory usage versus the size of the game being solved. [PITH_FULL_IMAGE:figures/full_fig_p014_2.png] view at source ↗
read the original abstract

Parallelization has played an instrumental role in the field of artificial intelligence (AI), drastically reducing the time taken to train and evaluate large AI models. In contrast to its impact in the broader field of AI, applying parallelization to computational game solving is relatively unexplored, despite its great potential. In this paper, we parallelize the family of counterfactual regret minimization (CFR) algorithms, which were central to important breakthroughs for solving large imperfect-information games. We present a generalized parallelization framework, reframing CFR as a series of linear algebra operations. Then, existing techniques for parallelizing linear algebra operations can be applied to accelerate CFR. We also describe how our technique can be applied to other tabular members of the CFR family of algorithms, including the state-of-the-art, such as CFR+, discounted CFR, and predictive variants of CFR. Experimentally, we show that our CFR implementation on a GPU is up to four orders of magnitude faster than Google DeepMind OpenSpiel's CFR implementations on a CPU.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims to introduce a generalized parallelization framework for counterfactual regret minimization (CFR) by reframing the algorithm family as sequences of linear algebra operations, enabling the application of existing parallel linear-algebra techniques. It states that this applies to CFR, CFR+, discounted CFR, and predictive CFR variants, and reports that a GPU implementation achieves speedups of up to four orders of magnitude over OpenSpiel's CPU CFR implementations.

Significance. If the reformulation is shown to be mathematically equivalent to standard CFR and to preserve its regret bounds and convergence properties, the work would provide a practical route to scaling CFR to substantially larger games, potentially expanding the range of imperfect-information problems that can be solved exactly or approximately within reasonable compute budgets.

major comments (2)
  1. [Framework section (reformulation)] The central claim that the linear-algebra reformulation is exact (and therefore inherits the O(1/√T) regret bound) requires an explicit derivation or proof of equivalence. No such derivation, error analysis, or verification that parallel reductions, matrix multiplications for reach probabilities, and the max(0,·) operation in regret matching preserve semantics appears in the manuscript; without it the reported speedups cannot be claimed to solve the same problem.
  2. [Experiments] Experimental results must include a direct comparison of the strategies and cumulative regrets produced by the parallel GPU version versus the sequential CPU version on the same games; floating-point associativity or execution-order differences could silently violate the regret-matching guarantee even if wall-clock time improves.
minor comments (2)
  1. [Abstract] The abstract should state the specific game sizes, number of information sets, and iteration counts used to obtain the four-order-of-magnitude speedup so readers can assess the practical scope of the result.
  2. [Framework section] Notation for the matrix forms of counterfactual values and regret vectors should be introduced with explicit mappings back to the standard CFR recurrence to aid readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments. We address each major point below and will revise the manuscript accordingly to strengthen the presentation of the linear-algebra reformulation and the experimental validation.

read point-by-point responses
  1. Referee: [Framework section (reformulation)] The central claim that the linear-algebra reformulation is exact (and therefore inherits the O(1/√T) regret bound) requires an explicit derivation or proof of equivalence. No such derivation, error analysis, or verification that parallel reductions, matrix multiplications for reach probabilities, and the max(0,·) operation in regret matching preserve semantics appears in the manuscript; without it the reported speedups cannot be claimed to solve the same problem.

    Authors: We agree that an explicit derivation is necessary to rigorously confirm equivalence and inheritance of the regret bound. In the revised manuscript we will insert a new subsection (tentatively 3.3) that derives the equivalence step by step: (i) showing that the reach-probability computation is identical to the standard product of strategy and transition matrices, (ii) demonstrating that the regret update is a direct matrix subtraction and accumulation, and (iii) proving that the regret-matching operator (including the element-wise max(0,·) and normalization) produces the identical strategy vector as the sequential formulation. We will also add a short floating-point error discussion noting that all operations remain within the same associative order as the original algorithm when reductions are performed with the same precision. revision: yes

  2. Referee: [Experiments] Experimental results must include a direct comparison of the strategies and cumulative regrets produced by the parallel GPU version versus the sequential CPU version on the same games; floating-point associativity or execution-order differences could silently violate the regret-matching guarantee even if wall-clock time improves.

    Authors: We accept this requirement. The revised Experiments section will contain a new table (and accompanying text) that reports, for each benchmark game, the L1 distance between the final strategy profiles produced by the GPU and CPU implementations, as well as the absolute difference in cumulative regret after a fixed number of iterations. We will also include a short convergence plot overlaying the two curves to confirm that any numerical discrepancy remains negligible and does not alter the observed convergence rate. revision: yes

Circularity Check

0 steps flagged

No circularity: CFR reframed as linear-algebra operations using standard parallelization techniques

full rationale

The paper reframes CFR (regret matching, counterfactual values, strategy updates) as a sequence of matrix and vector operations so that existing GPU linear-algebra libraries can be applied directly. No equation is defined in terms of its own output, no parameter is fitted to a subset of data and then relabeled a prediction, and no uniqueness theorem or ansatz is imported via self-citation. The derivation chain consists of explicit algebraic rewrites whose equivalence to the original CFR recurrence is asserted by construction of the reformulation itself; this is an independent translation step rather than a reduction to fitted inputs or prior self-referential results. The reported speedups are therefore empirical consequences of the parallel execution of the rewritten operations, not circularly forced by the paper's own definitions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper relies on the standard correctness of CFR and the well-known parallelizability of linear algebra primitives; no new free parameters, ad-hoc axioms, or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption CFR regret updates can be expressed exactly as matrix-vector operations
    Invoked when the authors reframe CFR as linear algebra; correctness of this equivalence is assumed rather than re-derived.

pith-pipeline@v0.9.0 · 5464 in / 1148 out tokens · 20959 ms · 2026-05-15T02:35:14.273833+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages

  1. [1]

    G. E. Blelloch. Programming parallel algorithms.Commun. ACM, 39(3):85–97, 1996

  2. [2]

    Brown and T

    N. Brown and T. Sandholm. Regret-based pruning in extensive-form games. InProceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2015

  3. [3]

    Brown and T

    N. Brown and T. Sandholm. Reduced space and faster convergence in imperfect-information games via pruning. InProceedings of the International Conference on Machine Learning (ICML), 2017

  4. [4]

    Brown and T

    N. Brown and T. Sandholm. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals.Science, 359(6374):418–424, 2018

  5. [5]

    Brown and T

    N. Brown and T. Sandholm. Superhuman AI for multiplayer poker.Science, 365(6456):885–890, 2019

  6. [6]

    Brown and T

    N. Brown and T. Sandholm. Solving imperfect-information games via discounted regret minimization. InProceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2019

  7. [7]

    Brown, C

    N. Brown, C. Kroer, and T. Sandholm. Dynamic thresholding and pruning for regret minimiza- tion. InProceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2017

  8. [8]

    Brown, A

    N. Brown, A. Lerer, S. Gross, and T. Sandholm. Deep counterfactual regret minimization. In Proceedings of the International Conference on Machine Learning (ICML), 2019

  9. [9]

    J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, M. Mao, M. a. Ranzato, A. Senior, P. Tucker, K. Yang, Q. Le, and A. Ng. Large scale distributed deep networks. InProceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2012

  10. [10]

    Farina, C

    G. Farina, C. Kroer, and T. Sandholm. Regret circuits: Composability of regret minimizers. In Proceedings of the International Conference on Machine Learning (ICML), 2019

  11. [11]

    Farina, C

    G. Farina, C. K. Ling, F. Fang, and T. Sandholm. Correlation in extensive-form games: Saddle- point formulation and benchmarks. InProceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2019

  12. [12]

    Farina, C

    G. Farina, C. Kroer, and T. Sandholm. Stochastic regret minimization in extensive-form games. InProceedings of the International Conference on Machine Learning (ICML), 2020

  13. [13]

    Farina, C

    G. Farina, C. Kroer, and T. Sandholm. Faster game solving via predictive Blackwell approacha- bility: Connecting regret matching and mirror descent. InProceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2021

  14. [14]

    Gilpin and T

    A. Gilpin and T. Sandholm. Speeding up gradient-based algorithms for sequential games. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2010

  15. [15]

    Gilpin, S

    A. Gilpin, S. Hoda, J. Peña, and T. Sandholm. Gradient-based algorithms for finding Nash equilibria in extensive form games. InProceedings of the International Workshop on Internet and Network Economics (WINE), 2007

  16. [16]

    Goyal, P

    P. Goyal, P. Dollár, R. Girshick, P. Noordhuis, L. Wesolowski, A. Kyrola, A. Tulloch, Y . Jia, and K. He. Accurate, large minibatch SGD: Training ImageNet in 1 hour, 2018

  17. [17]

    Hart and A

    S. Hart and A. Mas-Colell. A simple adaptive procedure leading to correlated equilibrium. Econometrica, 68(5):1127–1150, 2000. 10

  18. [18]

    S. Hoda, A. Gilpin, J. Pe `na, and T. Sandholm. Smoothing techniques for computing Nash equilibria of sequential games.Math. Oper. Res., 35(2):494–512, 2010

  19. [19]

    Kepner, P

    J. Kepner, P. Aaltonen, D. Bader, A. Buluç, F. Franchetti, J. Gilbert, D. Hutchison, M. Kumar, A. Lumsdaine, H. Meyerhenke, S. McMillan, C. Yang, J. D. Owens, M. Zalewski, T. Mattson, and J. Moreira. Mathematical foundations of the GraphBLAS. InProceedings of the IEEE High Performance Extreme Computing Conference (HPEC), 2016

  20. [20]

    Koller, N

    D. Koller, N. Megiddo, and B. von Stengel. Efficient computation of equilibria for extensive two-person games.Games and Economic Behavior, 14(2):247–259, 1996

  21. [21]

    Kroer, G

    C. Kroer, G. Farina, and T. Sandholm. Solving large sequential games with the excessive gap technique. InProceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2018

  22. [22]

    Lanctot, K

    M. Lanctot, K. Waugh, M. Zinkevich, and M. Bowling. Monte Carlo sampling for regret mini- mization in extensive games. InProceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2009

  23. [23]

    Lanctot, E

    M. Lanctot, E. Lockhart, J.-B. Lespiau, V . Zambaldi, S. Upadhyay, J. Pérolat, S. Srinivasan, F. Timbers, K. Tuyls, S. Omidshafiei, D. Hennes, D. Morrill, P. Muller, T. Ewalds, R. Faulkner, J. Kramár, B. D. Vylder, B. Saeta, J. Bradbury, D. Ding, S. Borgeaud, M. Lai, J. Schrit- twieser, T. Anthony, E. Hughes, I. Danihelka, and J. Ryan-Davis. OpenSpiel: A ...

  24. [24]

    S. M. McAleer, G. Farina, M. Lanctot, and T. Sandholm. ESCHER: Eschewing importance sampling in games by computing a history value function to estimate regret. InProceedings of the International Conference on Learning Representations (ICLR), 2023

  25. [25]

    Morav ˇcík, M

    M. Morav ˇcík, M. Schmid, N. Burch, V . Lisý, D. Morrill, N. Bard, T. Davis, K. Waugh, M. Johanson, and M. Bowling. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker.Science, 356(6337):508–513, 2017

  26. [26]

    Nesterov

    Y . Nesterov. Excessive gap technique in nonsmooth convex minimization.SIAM Journal on Optimization, 16(1):235–249, 2005

  27. [27]

    Okuta, Y

    R. Okuta, Y . Unno, D. Nishino, S. Hido, and C. Loomis. CuPy: A NumPy-compatible library for NVIDIA GPU calculations. InProceedings of the Workshop on Machine Learning Systems (LearningSys) in the Annual Conference on Neural Information Processing Systems (NeurIPS), 2017

  28. [28]

    J. Reis. A GPU implementation of counterfactual regret minimization. Master’s thesis, Univer- sity of Porto, 2015

  29. [29]

    J. Rudolf. Counterfactual regret minimization on GPU, 2021

  30. [30]

    Sergeev and M

    A. Sergeev and M. D. Balso. Horovod: fast and easy distributed deep learning in TensorFlow, 2018

  31. [31]

    Tammelin

    O. Tammelin. Solving large imperfect information games using CFR+, 2014

  32. [32]

    Tammelin, N

    O. Tammelin, N. Burch, M. Johanson, and M. Bowling. Solving heads-up limit Texas Hold’em. InProceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2015

  33. [33]

    von Stengel

    B. von Stengel. Efficient computation of behavior strategies.Games and Economic Behavior, 14(2):220–246, 1996

  34. [34]

    H. Xu, K. Li, H. Fu, Q. Fu, J. Xing, and J. Cheng. Dynamic discounted counterfactual regret minimization. InProceedings of the International Conference on Learning Representations (ICLR), 2024

  35. [35]

    B. H. Zhang, I. Anagnostides, and T. Sandholm. Scale-invariant regret matching and online learning with optimal convergence: Bridging theory and practice in zero-sum games, 2026. 11

  36. [36]

    Zhang, S

    N. Zhang, S. McAleer, and T. Sandholm. Faster game solving via hyperparameter schedules. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2026

  37. [37]

    Zinkevich, M

    M. Zinkevich, M. Johanson, M. Bowling, and C. Piccione. Regret minimization in games with incomplete information. InProceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS), 2007. 12 A Omitted proofs in Section 3.4 This appendix section contains proofs excluded from the main paper content due to space constraints. A.1 Proof ...