Fast Rates in $\alpha$-Potential Games via Regularized Mirror Descent

Claire Chen; Yuheng Zhang

arxiv: 2605.00268 · v2 · pith:JTDISI2Vnew · submitted 2026-04-30 · 💻 cs.GT

Fast Rates in α-Potential Games via Regularized Mirror Descent

Claire Chen , Yuheng Zhang This is my paper

Pith reviewed 2026-05-21 00:33 UTC · model grok-4.3

classification 💻 cs.GT

keywords alpha-potential gamesoffline learningNash equilibriummirror descentmulti-agent learningKL regularizationfast statistical ratesdecentralized algorithms

0 comments

The pith

A decentralized algorithm achieves O(1/n) statistical rates for learning Nash equilibria in alpha-potential games under reference-anchored data coverage.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that alpha-potential games admit fast offline learning of Nash equilibria when data coverage is anchored to a known reference policy. It develops a KL-regularized mirror descent method that converts the potential structure into an accelerated convergence rate of order 1/n. This rate improves on the slower square-root rates standard in offline multi-agent settings. The approach remains fully decentralized, so each player can update independently while still converging to an approximate equilibrium. A sympathetic reader would care because it makes sample-efficient learning practical in structured multi-player interactions without requiring knowledge of the optimal policy.

Core claim

The paper claims that Offline Potential Mirror Descent (OPMD) attains a fast statistical rate of order 1/n for finding approximate Nash equilibria in alpha-potential games. This holds under a Reference-Anchored offline data coverage condition that requires the collected data to cover a known reference policy sufficiently well. The KL regularization in the mirror-descent updates exploits the potential structure to cancel bias terms and produce the accelerated rate.

What carries the argument

Offline Potential Mirror Descent (OPMD), a decentralized KL-regularized mirror descent procedure that performs per-player updates using the potential function approximation.

If this is right

Players can learn approximate equilibria from finite offline data at a rate linear in the number of samples.
The decentralized nature allows each agent to run its own mirror-descent update without a central coordinator.
The reference-anchored coverage condition is checkable in advance using only the reference policy and the collected dataset.
The same regularization technique converts the structural bias alpha into a controllable error term that vanishes at the fast rate.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same reference-anchored coverage idea could be tested in other structured game classes such as near-potential or weakly acyclic games.
In applied domains like routing or resource allocation, the method would reduce the number of historical interaction samples needed to reach stable outcomes.
Extending the analysis to continuous action spaces or stochastic payoffs would clarify how the fast rate behaves outside the finite discrete setting studied here.

Load-bearing premise

Offline data must satisfy coverage with respect to a known reference policy rather than the unknown optimal policy.

What would settle it

A controlled experiment that measures the empirical convergence rate of OPMD on synthetic alpha-potential games when reference-anchored coverage holds versus when it is deliberately violated while standard coverage remains.

read the original abstract

An $\alpha$-potential game is a multi-player non-cooperative interaction in which a global potential function approximates individual player rewards up to a structural bias $\alpha$. While identifying a Nash Equilibrium (NE) in generic general-sum games is known to be computationally intractable, the potential game structure enables tractable NE identification. In this paper, we study the offline learning of NE in $\alpha$-potential games using KL regularization. To analyze this process, we propose a novel Reference-Anchored offline data coverage framework--a verifiable condition that anchors data requirements to a known reference policy rather than an unknown optimum. Building on this, we propose Offline Potential Mirror Descent (OPMD), a decentralized algorithm that achieves an accelerated $\widetilde{\mathcal{O}}(1/n)$ statistical rate, surpassing the standard $\widetilde{\mathcal{O}}(1/\sqrt{n})$ rate typical of offline multi-agent learning. This work characterizes the first fast-rate offline learning approach for $\alpha$-potential games.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper gives the first fast-rate offline result for alpha-potential games via regularized mirror descent but the alpha bias likely keeps the final guarantee to an approximate equilibrium.

read the letter

The key takeaway is that this work gives the first fast-rate result for offline equilibrium learning in alpha-potential games, with an O(1/n) statistical rate via regularized mirror descent and a new reference-anchored coverage condition. What stands out is how they leverage the potential structure to get better rates than the usual 1/sqrt(n) in offline multi-agent settings. The algorithm OPMD is decentralized, which is a plus for practical multi-agent systems. Introducing a coverage framework that ties data requirements to a known reference policy rather than the optimum makes the assumption more verifiable and less circular. That seems like a genuine improvement over standard concentrability conditions in offline RL. The soft spot is around the role of alpha. As the stress-test notes, in an alpha-potential game the potential function approximates rewards only up to alpha, so any equilibrium of the potential is at best an alpha-NE of the original game. Their rate applies to recovering the maximizer of the empirical potential, but the final distance to a true Nash equilibrium of the given game will include a term linear in alpha. The abstract does not make clear whether the bound is something like O(1/n + alpha) or if they assume alpha vanishes. If alpha is a fixed structural parameter, then the fast rate is only to an approximate equilibrium, which is still valuable but changes the interpretation. I would check the exact theorem to see how they handle this. The rest of the analysis appears to follow standard mirror descent arguments adapted to the potential game setting, with no obvious internal contradictions or fitting issues. The citation pattern draws from relevant prior work on potential games and offline learning without gaps that jump out. This paper is aimed at people in algorithmic game theory and multi-agent reinforcement learning who focus on offline methods and structured game classes. A reader interested in sample complexity for equilibrium finding would find the new coverage notion and the rate improvement useful. It has enough concrete novelty and technical grounding to deserve a serious referee. I would recommend sending this to peer review.

Referee Report

2 major / 2 minor

Summary. The paper introduces a Reference-Anchored offline data coverage framework for verifiable data requirements anchored to a known reference policy. It proposes the decentralized Offline Potential Mirror Descent (OPMD) algorithm that employs KL regularization to learn Nash equilibria in α-potential games, claiming an accelerated Õ(1/n) statistical rate that improves upon the standard Õ(1/√n) rate in offline multi-agent learning.

Significance. If the central claims hold, this would be the first fast-rate offline learning result for α-potential games, providing a concrete algorithmic and analytical advance over existing offline multi-agent methods that typically achieve only slow rates. The Reference-Anchored coverage condition is a verifiable structural assumption that could be useful beyond this setting.

major comments (2)

[Abstract, §4] Abstract and §4 (main theorem): the claimed Õ(1/n) rate is stated for 'learning of NE in α-potential games,' yet an α-potential game only guarantees that any maximizer of the potential Φ is an O(α)-NE of the original game. The manuscript does not clarify whether the final bound on distance to exact NE absorbs a fixed α term (yielding O(1/n + α)) or treats α as vanishing; this distinction is load-bearing for the 'accelerated rate to NE' claim.
[§3.2] §3.2 (Reference-Anchored coverage definition): the framework anchors coverage to a known reference policy π_ref rather than the unknown optimum. It is unclear whether the coverage constant depends on α or on the distance between π_ref and the α-NE; if the constant grows with 1/α, the practical sample complexity may not improve over standard coverage assumptions.

minor comments (2)

[§2] Notation for the potential approximation error α is introduced in the abstract but the precise definition (e.g., whether it is uniform or per-player) should be restated in §2 for clarity.
The abstract uses both O(1/n) and Õ(1/n); consistency in the use of tilde notation for logarithmic factors should be maintained throughout.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments. We address each major point below with clarifications on the approximation quality of the equilibria and the dependence of the coverage condition. These can be resolved by revisions to the presentation without altering the technical results.

read point-by-point responses

Referee: [Abstract, §4] Abstract and §4 (main theorem): the claimed Õ(1/n) rate is stated for 'learning of NE in α-potential games,' yet an α-potential game only guarantees that any maximizer of the potential Φ is an O(α)-NE of the original game. The manuscript does not clarify whether the final bound on distance to exact NE absorbs a fixed α term (yielding O(1/n + α)) or treats α as vanishing; this distinction is load-bearing for the 'accelerated rate to NE' claim.

Authors: We agree that this distinction requires explicit clarification. By definition, an α-potential game satisfies that the potential Φ approximates each player's payoff up to an additive α term, so any maximizer of Φ is an O(α)-approximate Nash equilibrium of the original game. The OPMD algorithm and the analysis in §4 establish an Õ(1/n) rate on the suboptimality gap of the potential function. This directly yields a bound of Õ(1/n) + O(α) on the distance to an exact NE. We do not treat α as vanishing with n; it is a fixed structural parameter of the game class. We will revise the abstract and the statement of Theorem 4.1 to state explicitly that the accelerated rate applies to learning an O(α)-NE, with the total error being the sum of the statistical term and the approximation error α. This change improves precision but does not affect the validity or novelty of the fast-rate result. revision: yes
Referee: [§3.2] §3.2 (Reference-Anchored coverage definition): the framework anchors coverage to a known reference policy π_ref rather than the unknown optimum. It is unclear whether the coverage constant depends on α or on the distance between π_ref and the α-NE; if the constant grows with 1/α, the practical sample complexity may not improve over standard coverage assumptions.

Authors: The Reference-Anchored coverage condition is defined solely with respect to the known reference policy π_ref and the offline data distribution. Our analysis shows that the coverage constant C is independent of α. The KL regularization term and the potential-game structure permit error propagation bounds that do not require the coverage factor to scale with 1/α or with the distance between π_ref and the α-NE. The constant depends only on the divergence between the data and π_ref. We will insert a remark immediately after Definition 3.2 and add an explicit statement of this independence (with the precise functional dependence) in the statement of the main theorem to remove any ambiguity. revision: yes

Circularity Check

0 steps flagged

No circularity: O(1/n) rate derived from independent coverage and mirror-descent analysis

full rationale

The derivation introduces a verifiable Reference-Anchored coverage condition and the OPMD algorithm, then obtains the accelerated rate via standard KL-regularized mirror descent applied to the potential function. No equation reduces a fitted parameter to a prediction by construction, no load-bearing claim rests on self-citation, and the α-bias is treated as a fixed structural parameter rather than absorbed into the statistical term. The analysis is self-contained against external benchmarks and does not rename known results or smuggle ansatzes via prior work.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Review is based solely on the abstract; the central domain assumption is the alpha-potential game structure that enables tractable analysis, and the reference-anchored framework is a new concept introduced here.

axioms (1)

domain assumption The interaction is an alpha-potential game where a global potential approximates individual rewards up to bias alpha
This structural assumption is what distinguishes the setting from intractable general-sum games and enables the fast-rate analysis.

invented entities (1)

Reference-Anchored offline data coverage framework no independent evidence
purpose: To provide a verifiable condition anchoring data requirements to a known reference policy
New concept proposed to support the statistical analysis of the algorithm.

pith-pipeline@v0.9.0 · 5691 in / 1371 out tokens · 57714 ms · 2026-05-21T00:33:32.935186+00:00 · methodology

Review history (2 revisions) →

discussion (0)

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Offline Two-Player Zero-Sum Markov Games with KL Regularization
cs.LG 2026-05 unverdicted novelty 8.0

KL regularization enables Õ(1/n) convergence for offline Nash equilibria in zero-sum Markov games under unilateral concentrability via the ROSE framework and SOS-MD algorithm.
Pessimism-Free Offline Learning in General-Sum Games via KL Regularization
cs.LG 2026-04 unverdicted novelty 7.0

KL regularization enables pessimism-free offline learning in general-sum games, recovering regularized Nash equilibria at accelerated rate O(1/n) via GANE and converging to coarse correlated equilibria at standard rat...
Pessimism-Free Offline Learning in General-Sum Games via KL Regularization
cs.LG 2026-04 unverdicted novelty 6.0

KL regularization enables pessimism-free offline learning in general-sum games by recovering regularized Nash equilibria at rate O(1/n) via GANE and converging to coarse correlated equilibria at O(1/sqrt(n) + 1/T) via GAMD.

Reference graph

Works this paper leans on

300 extracted references · 300 canonical work pages · cited by 2 Pith papers · 6 internal anchors

[1]

Offline Two-Player Zero-Sum Markov Games with KL Regularization

Offline Two-Player Zero-Sum Markov Games with KL Regularization , author=. arXiv preprint arXiv:2605.13025 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[2]

International Conference on Learning Representations , volume=

Efficient policy evaluation with safety constraint for reinforcement learning , author=. International Conference on Learning Representations , volume=

work page
[3]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Efficient multi-policy evaluation for reinforcement learning , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

work page
[4]

Unpublished Manuscript, http://ttic

On the duality of strong convexity and strong smoothness: Learning applications and matrix regularization , author=. Unpublished Manuscript, http://ttic. uchicago. edu/shai/papers/KakadeShalevTewari09. pdf , volume=

work page
[5]

Games and economic behavior , volume=

Potential games , author=. Games and economic behavior , volume=. 1996 , publisher=

work page 1996
[6]

Pessimism-Free Offline Learning in General-Sum Games via KL Regularization

Pessimism-Free Offline Learning in General-Sum Games via KL Regularization , author=. arXiv preprint arXiv:2605.00264 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[7]

Fast Rates in $\alpha$-Potential Games via Regularized Mirror Descent

Fast Rates in alpha -Potential Games via Regularized Mirror Descent , author=. arXiv preprint arXiv:2605.00268 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[8]

International Conference on Machine Learning , pages=

Independent policy gradient for large-scale markov potential games: Sharper rates, function approximation, and game-agnostic convergence , author=. International Conference on Machine Learning , pages=. 2022 , organization=

work page 2022
[9]

IEEE Control Systems Letters , volume=

Learning Nash in constrained Markov games with an -potential , author=. IEEE Control Systems Letters , volume=. 2024 , publisher=

work page 2024
[10]

2025 IEEE 64th Conference on Decision and Control (CDC) , pages=

Markov potential game construction and multi-agent reinforcement learning with applications to autonomous driving , author=. 2025 IEEE 64th Conference on Decision and Control (CDC) , pages=. 2025 , organization=

work page 2025
[11]

arXiv preprint arXiv:2106.01969 , year=

Global convergence of multi-agent policy gradient in markov potential games , author=. arXiv preprint arXiv:2106.01969 , year=

work page arXiv
[12]

IEEE Transactions on Automatic Control , year=

Markov -Potential Games , author=. IEEE Transactions on Automatic Control , year=

work page
[13]

2025 , publisher=

An -Potential Game Framework for Non-Cooperative Dynamic Games: Theory and Algorithms , author=. 2025 , publisher=

work page 2025
[14]

SIAM Journal on Control and Optimization , volume=

An-Potential Game Framework for-Player Dynamic Games , author=. SIAM Journal on Control and Optimization , volume=. 2025 , publisher=

work page 2025
[15]

arXiv preprint arXiv:2305.12553 , year=

Markov -Potential Games , author=. arXiv preprint arXiv:2305.12553 , year=

work page arXiv
[16]

arXiv preprint arXiv:2310.06243 , year=

Sample-efficient multi-agent rl: An optimization perspective , author=. arXiv preprint arXiv:2310.06243 , year=

work page arXiv
[17]

Corruption-robust Offline Multi-agent Reinforcement Learning From Human Feedback

Corruption-robust Offline Multi-agent Reinforcement Learning From Human Feedback , author=. arXiv preprint arXiv:2603.28281 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[18]

Journal of the ACM (JACM) , volume=

Settling the complexity of computing two-player Nash equilibria , author=. Journal of the ACM (JACM) , volume=. 2009 , publisher=

work page 2009
[19]

Journal of Computer and system Sciences , volume=

On the complexity of the parity argument and other inefficient proofs of existence , author=. Journal of Computer and system Sciences , volume=. 1994 , publisher=

work page 1994
[20]

Communications of the ACM , volume=

The complexity of computing a Nash equilibrium , author=. Communications of the ACM , volume=. 2009 , publisher=

work page 2009
[21]

International conference on machine learning , pages=

Pessimistic q-learning for offline reinforcement learning: Towards optimal sample complexity , author=. International conference on machine learning , pages=. 2022 , organization=

work page 2022
[22]

Operations research , volume=

Model-based reinforcement learning for offline zero-sum Markov games , author=. Operations research , volume=. 2024 , publisher=

work page 2024
[23]

Beyond Pessimism: Offline Learning in KL-regularized Games

Beyond Pessimism: Offline Learning in KL-regularized Games , author=. arXiv preprint arXiv:2604.06738 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[24]

Advances in neural information processing systems , volume=

Direct preference optimization: Your language model is secretly a reward model , author=. Advances in neural information processing systems , volume=

work page
[25]

2016 , publisher=

Twenty lectures on algorithmic game theory , author=. 2016 , publisher=

work page 2016
[26]

2006 , publisher=

Prediction, learning, and games , author=. 2006 , publisher=

work page 2006
[27]

Theory of computing , volume=

The multiplicative weights update method: a meta-algorithm and applications , author=. Theory of computing , volume=. 2012 , publisher=

work page 2012
[28]

Machine Intelligence Research , volume=

Offline pre-trained multi-agent decision transformer , author=. Machine Intelligence Research , volume=. 2023 , publisher=

work page 2023
[29]

arXiv preprint arXiv:2102.04402 , year=

Contrasting centralized and decentralized critics in multi-agent reinforcement learning , author=. arXiv preprint arXiv:2102.04402 , year=

work page arXiv
[30]

International Journal of Group Decision and Negotiation , volume=

Automated negotiation: prospects, methods and challenges , author=. International Journal of Group Decision and Negotiation , volume=

work page
[31]

2001 , publisher=

Strategic negotiation in multiagent environments , author=. 2001 , publisher=

work page 2001
[32]

Communications of the ACM , volume=

Algorithmic game theory , author=. Communications of the ACM , volume=. 2010 , publisher=

work page 2010
[33]

Econometrica: Journal of the Econometric Society , pages=

A theory of auctions and competitive bidding , author=. Econometrica: Journal of the Econometric Society , pages=. 1982 , publisher=

work page 1982
[34]

Games and Economic Behavior , volume=

On the value of information in a strategic conflict , author=. Games and Economic Behavior , volume=. 1990 , publisher=

work page 1990
[35]

1995 , publisher=

Repeated games with incomplete information , author=. 1995 , publisher=

work page 1995
[36]

Bayesian

Games with incomplete information played by “Bayesian” players, I--III Part I. The basic model , author=. Management science , volume=. 1967 , publisher=

work page 1967
[37]

Mathematics of operations research , volume=

Optimal auction design , author=. Mathematics of operations research , volume=. 1981 , publisher=

work page 1981
[38]

The Journal of finance , volume=

Counterspeculation, auctions, and competitive sealed tenders , author=. The Journal of finance , volume=. 1961 , publisher=

work page 1961
[39]

Proceedings of the national academy of sciences , volume=

Stochastic games , author=. Proceedings of the national academy of sciences , volume=. 1953 , publisher=

work page 1953
[40]

Behavior Regularized Offline Reinforcement Learning

Behavior regularized offline reinforcement learning , author=. arXiv preprint arXiv:1911.11361 , year=

work page internal anchor Pith review Pith/arXiv arXiv 1911
[41]

Proceedings of the AAAI conference on artificial intelligence , volume=

Adaptive trust region policy optimization: Global convergence and faster rates for regularized mdps , author=. Proceedings of the AAAI conference on artificial intelligence , volume=

work page
[42]

Foundations and Trends

Online learning and online convex optimization , author=. Foundations and Trends. 2025 , publisher=

work page 2025
[43]

2000 , publisher=

Empirical Processes in M-estimation , author=. 2000 , publisher=

work page 2000
[44]

IEEE transactions on information theory , volume=

Minimum complexity density estimation , author=. IEEE transactions on information theory , volume=. 2002 , publisher=

work page 2002
[45]

, author=

On general minimax theorems. , author=

work page
[46]

Dynamic Games and Applications , volume=

Upper and lower values in zero-sum stochastic games with asymmetric information , author=. Dynamic Games and Applications , volume=. 2021 , publisher=

work page 2021
[47]

Games and Economic Behavior , volume=

Adaptive game playing using multiplicative weights , author=. Games and Economic Behavior , volume=. 1999 , publisher=

work page 1999
[48]

Iterative Nash Policy Optimization: Aligning

Yuheng Zhang and Dian Yu and Baolin Peng and Linfeng Song and Ye Tian and Mingyue Huo and Nan Jiang and Haitao Mi and Dong Yu , booktitle=. Iterative Nash Policy Optimization: Aligning

work page
[49]

1994 , publisher=

A course in game theory , author=. 1994 , publisher=

work page 1994
[50]

1998 , publisher=

Dynamic noncooperative game theory , author=. 1998 , publisher=

work page 1998
[51]

Handbook of reinforcement learning and control , pages=

Multi-agent reinforcement learning: A selective overview of theories and algorithms , author=. Handbook of reinforcement learning and control , pages=. 2021 , publisher=

work page 2021
[52]

Machine learning proceedings 1994 , pages=

Markov games as a framework for multi-agent reinforcement learning , author=. Machine learning proceedings 1994 , pages=. 1994 , publisher=

work page 1994
[53]

arXiv preprint arXiv:2102.00479 , year=

Fast rates for the regret of offline reinforcement learning , author=. arXiv preprint arXiv:2102.00479 , year=

work page arXiv
[54]

International Conference on Machine Learning , pages=

Pessimistic minimax value iteration: Provably efficient equilibrium learning from offline datasets , author=. International Conference on Machine Learning , pages=. 2022 , organization=

work page 2022
[55]

Advances in Neural Information Processing Systems , volume=

When are offline two-player zero-sum Markov games solvable? , author=. Advances in Neural Information Processing Systems , volume=

work page
[56]

International conference on machine learning , pages=

Is pessimism provably efficient for offline rl? , author=. International conference on machine learning , pages=. 2021 , organization=

work page 2021
[57]

Advances in neural information processing systems , volume=

Bellman-consistent pessimism for offline reinforcement learning , author=. Advances in neural information processing systems , volume=

work page
[58]

International Conference on Machine Learning , pages=

Offline learning in markov games with general function approximation , author=. International Conference on Machine Learning , pages=. 2023 , organization=

work page 2023
[59]

International conference on machine learning , pages=

A theory of regularized markov decision processes , author=. International conference on machine learning , pages=. 2019 , organization=

work page 2019
[60]

Asadi and Idan Shenfeld and Youssef Mroueh , booktitle=

Gholamali Aminian and Amir R. Asadi and Idan Shenfeld and Youssef Mroueh , booktitle=

work page
[61]

ArXiv Preprint , year=

G\"odel's Poetry , author=. ArXiv Preprint , year=

work page
[62]

2025 , journal=

ProofAug: Efficient Neural Theorem Proving via Fine-grained Proof Structure Analysis , author=. 2025 , journal=

work page 2025
[63]

ArXiv Preprint , year=

Hilbert: Recursively Building Formal Proofs with Informal Reasoning , author=. ArXiv Preprint , year=

work page
[64]

ArXiv Preprint , year=

APOLLO: Automated LLM and Lean Collaboration for Advanced Formal Reasoning , author=. ArXiv Preprint , year=

work page
[65]

ArXiv Preprint , year=

Solving formal math problems by decomposition and iterative reflection , author=. ArXiv Preprint , year=

work page
[66]

ArXiv Preprint , year=

Formal theorem proving by rewarding llms to decompose proofs hierarchically , author=. ArXiv Preprint , year=

work page
[67]

ArXiv Preprint , year=

Lemmanaid: Neuro-Symbolic Lemma Conjecturing , author=. ArXiv Preprint , year=

work page
[68]

2022 , journal =

Sivaraman, Aishwarya and Sanchez-Stern, Alex and Chen, Bretton and Lerner, Sorin and Millstein, Todd , title =. 2022 , journal =

work page 2022
[69]

ArXiv Preprint

LEGO-Prover: Neural Theorem Proving with Growing Libraries , author=. ArXiv Preprint

work page
[70]

ArXiv Preprint

LeanConjecturer: Automatic Generation of Mathematical Conjectures for Theorem Proving. ArXiv Preprint. 2025

work page 2025
[71]

Discovering New Theorems via LLMs with In-Context Proof Learning in Lean

Kazumi Kasaura and Naoto Onda and Yuta Oriike and Masaya Taniguchi and Akiyoshi Sannai and Sho Sonoda. Discovering New Theorems via LLMs with In-Context Proof Learning in Lean. ArXiv Preprint. 2025

work page 2025
[72]

ArXiv Preprint , year=

Aristotle: Imo-level automated theorem proving , author=. ArXiv Preprint , year=

work page
[73]

ArXiv Preprint , year=

Goedel-prover-v2: Scaling formal theorem proving with scaffolded data synthesis and self-correction , author=. ArXiv Preprint , year=

work page
[74]

ArXiv Preprint , year=

Goedel-prover: A frontier model for open-source automated theorem proving , author=. ArXiv Preprint , year=

work page
[75]

Nature , year=

Olympiad-level formal mathematical reasoning with reinforcement learning , author=. Nature , year=

work page
[76]

ArXiv Preprint , year=

Gold-medalist performance in solving olympiad geometry with alphageometry2 , author=. ArXiv Preprint , year=

work page
[77]

ArXiv Preprint , year=

Seed-prover: Deep and broad reasoning for automated theorem proving , author=. ArXiv Preprint , year=

work page
[78]

ArXiv Preprint , year=

Minif2f: a cross-system benchmark for formal olympiad-level mathematics , author=. ArXiv Preprint , year=

work page
[79]

ArXiv Preprint , year=

Formalmath: Benchmarking formal mathematical reasoning of large language models , author=. ArXiv Preprint , year=

work page
[80]

ArXiv Preprint , year=

Proofnet: Autoformalizing and formally proving undergraduate-level mathematics , author=. ArXiv Preprint , year=

work page

Showing first 80 references.

[1] [1]

Offline Two-Player Zero-Sum Markov Games with KL Regularization

Offline Two-Player Zero-Sum Markov Games with KL Regularization , author=. arXiv preprint arXiv:2605.13025 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[2] [2]

International Conference on Learning Representations , volume=

Efficient policy evaluation with safety constraint for reinforcement learning , author=. International Conference on Learning Representations , volume=

work page

[3] [3]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Efficient multi-policy evaluation for reinforcement learning , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

work page

[4] [4]

Unpublished Manuscript, http://ttic

On the duality of strong convexity and strong smoothness: Learning applications and matrix regularization , author=. Unpublished Manuscript, http://ttic. uchicago. edu/shai/papers/KakadeShalevTewari09. pdf , volume=

work page

[5] [5]

Games and economic behavior , volume=

Potential games , author=. Games and economic behavior , volume=. 1996 , publisher=

work page 1996

[6] [6]

Pessimism-Free Offline Learning in General-Sum Games via KL Regularization

Pessimism-Free Offline Learning in General-Sum Games via KL Regularization , author=. arXiv preprint arXiv:2605.00264 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[7] [7]

Fast Rates in $\alpha$-Potential Games via Regularized Mirror Descent

Fast Rates in alpha -Potential Games via Regularized Mirror Descent , author=. arXiv preprint arXiv:2605.00268 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[8] [8]

International Conference on Machine Learning , pages=

Independent policy gradient for large-scale markov potential games: Sharper rates, function approximation, and game-agnostic convergence , author=. International Conference on Machine Learning , pages=. 2022 , organization=

work page 2022

[9] [9]

IEEE Control Systems Letters , volume=

Learning Nash in constrained Markov games with an -potential , author=. IEEE Control Systems Letters , volume=. 2024 , publisher=

work page 2024

[10] [10]

2025 IEEE 64th Conference on Decision and Control (CDC) , pages=

Markov potential game construction and multi-agent reinforcement learning with applications to autonomous driving , author=. 2025 IEEE 64th Conference on Decision and Control (CDC) , pages=. 2025 , organization=

work page 2025

[11] [11]

arXiv preprint arXiv:2106.01969 , year=

Global convergence of multi-agent policy gradient in markov potential games , author=. arXiv preprint arXiv:2106.01969 , year=

work page arXiv

[12] [12]

IEEE Transactions on Automatic Control , year=

Markov -Potential Games , author=. IEEE Transactions on Automatic Control , year=

work page

[13] [13]

2025 , publisher=

An -Potential Game Framework for Non-Cooperative Dynamic Games: Theory and Algorithms , author=. 2025 , publisher=

work page 2025

[14] [14]

SIAM Journal on Control and Optimization , volume=

An-Potential Game Framework for-Player Dynamic Games , author=. SIAM Journal on Control and Optimization , volume=. 2025 , publisher=

work page 2025

[15] [15]

arXiv preprint arXiv:2305.12553 , year=

Markov -Potential Games , author=. arXiv preprint arXiv:2305.12553 , year=

work page arXiv

[16] [16]

arXiv preprint arXiv:2310.06243 , year=

Sample-efficient multi-agent rl: An optimization perspective , author=. arXiv preprint arXiv:2310.06243 , year=

work page arXiv

[17] [17]

Corruption-robust Offline Multi-agent Reinforcement Learning From Human Feedback

Corruption-robust Offline Multi-agent Reinforcement Learning From Human Feedback , author=. arXiv preprint arXiv:2603.28281 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[18] [18]

Journal of the ACM (JACM) , volume=

Settling the complexity of computing two-player Nash equilibria , author=. Journal of the ACM (JACM) , volume=. 2009 , publisher=

work page 2009

[19] [19]

Journal of Computer and system Sciences , volume=

On the complexity of the parity argument and other inefficient proofs of existence , author=. Journal of Computer and system Sciences , volume=. 1994 , publisher=

work page 1994

[20] [20]

Communications of the ACM , volume=

The complexity of computing a Nash equilibrium , author=. Communications of the ACM , volume=. 2009 , publisher=

work page 2009

[21] [21]

International conference on machine learning , pages=

Pessimistic q-learning for offline reinforcement learning: Towards optimal sample complexity , author=. International conference on machine learning , pages=. 2022 , organization=

work page 2022

[22] [22]

Operations research , volume=

Model-based reinforcement learning for offline zero-sum Markov games , author=. Operations research , volume=. 2024 , publisher=

work page 2024

[23] [23]

Beyond Pessimism: Offline Learning in KL-regularized Games

Beyond Pessimism: Offline Learning in KL-regularized Games , author=. arXiv preprint arXiv:2604.06738 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[24] [24]

Advances in neural information processing systems , volume=

Direct preference optimization: Your language model is secretly a reward model , author=. Advances in neural information processing systems , volume=

work page

[25] [25]

2016 , publisher=

Twenty lectures on algorithmic game theory , author=. 2016 , publisher=

work page 2016

[26] [26]

2006 , publisher=

Prediction, learning, and games , author=. 2006 , publisher=

work page 2006

[27] [27]

Theory of computing , volume=

The multiplicative weights update method: a meta-algorithm and applications , author=. Theory of computing , volume=. 2012 , publisher=

work page 2012

[28] [28]

Machine Intelligence Research , volume=

Offline pre-trained multi-agent decision transformer , author=. Machine Intelligence Research , volume=. 2023 , publisher=

work page 2023

[29] [29]

arXiv preprint arXiv:2102.04402 , year=

Contrasting centralized and decentralized critics in multi-agent reinforcement learning , author=. arXiv preprint arXiv:2102.04402 , year=

work page arXiv

[30] [30]

International Journal of Group Decision and Negotiation , volume=

Automated negotiation: prospects, methods and challenges , author=. International Journal of Group Decision and Negotiation , volume=

work page

[31] [31]

2001 , publisher=

Strategic negotiation in multiagent environments , author=. 2001 , publisher=

work page 2001

[32] [32]

Communications of the ACM , volume=

Algorithmic game theory , author=. Communications of the ACM , volume=. 2010 , publisher=

work page 2010

[33] [33]

Econometrica: Journal of the Econometric Society , pages=

A theory of auctions and competitive bidding , author=. Econometrica: Journal of the Econometric Society , pages=. 1982 , publisher=

work page 1982

[34] [34]

Games and Economic Behavior , volume=

On the value of information in a strategic conflict , author=. Games and Economic Behavior , volume=. 1990 , publisher=

work page 1990

[35] [35]

1995 , publisher=

Repeated games with incomplete information , author=. 1995 , publisher=

work page 1995

[36] [36]

Bayesian

Games with incomplete information played by “Bayesian” players, I--III Part I. The basic model , author=. Management science , volume=. 1967 , publisher=

work page 1967

[37] [37]

Mathematics of operations research , volume=

Optimal auction design , author=. Mathematics of operations research , volume=. 1981 , publisher=

work page 1981

[38] [38]

The Journal of finance , volume=

Counterspeculation, auctions, and competitive sealed tenders , author=. The Journal of finance , volume=. 1961 , publisher=

work page 1961

[39] [39]

Proceedings of the national academy of sciences , volume=

Stochastic games , author=. Proceedings of the national academy of sciences , volume=. 1953 , publisher=

work page 1953

[40] [40]

Behavior Regularized Offline Reinforcement Learning

Behavior regularized offline reinforcement learning , author=. arXiv preprint arXiv:1911.11361 , year=

work page internal anchor Pith review Pith/arXiv arXiv 1911

[41] [41]

Proceedings of the AAAI conference on artificial intelligence , volume=

Adaptive trust region policy optimization: Global convergence and faster rates for regularized mdps , author=. Proceedings of the AAAI conference on artificial intelligence , volume=

work page

[42] [42]

Foundations and Trends

Online learning and online convex optimization , author=. Foundations and Trends. 2025 , publisher=

work page 2025

[43] [43]

2000 , publisher=

Empirical Processes in M-estimation , author=. 2000 , publisher=

work page 2000

[44] [44]

IEEE transactions on information theory , volume=

Minimum complexity density estimation , author=. IEEE transactions on information theory , volume=. 2002 , publisher=

work page 2002

[45] [45]

, author=

On general minimax theorems. , author=

work page

[46] [46]

Dynamic Games and Applications , volume=

Upper and lower values in zero-sum stochastic games with asymmetric information , author=. Dynamic Games and Applications , volume=. 2021 , publisher=

work page 2021

[47] [47]

Games and Economic Behavior , volume=

Adaptive game playing using multiplicative weights , author=. Games and Economic Behavior , volume=. 1999 , publisher=

work page 1999

[48] [48]

Iterative Nash Policy Optimization: Aligning

Yuheng Zhang and Dian Yu and Baolin Peng and Linfeng Song and Ye Tian and Mingyue Huo and Nan Jiang and Haitao Mi and Dong Yu , booktitle=. Iterative Nash Policy Optimization: Aligning

work page

[49] [49]

1994 , publisher=

A course in game theory , author=. 1994 , publisher=

work page 1994

[50] [50]

1998 , publisher=

Dynamic noncooperative game theory , author=. 1998 , publisher=

work page 1998

[51] [51]

Handbook of reinforcement learning and control , pages=

Multi-agent reinforcement learning: A selective overview of theories and algorithms , author=. Handbook of reinforcement learning and control , pages=. 2021 , publisher=

work page 2021

[52] [52]

Machine learning proceedings 1994 , pages=

Markov games as a framework for multi-agent reinforcement learning , author=. Machine learning proceedings 1994 , pages=. 1994 , publisher=

work page 1994

[53] [53]

arXiv preprint arXiv:2102.00479 , year=

Fast rates for the regret of offline reinforcement learning , author=. arXiv preprint arXiv:2102.00479 , year=

work page arXiv

[54] [54]

International Conference on Machine Learning , pages=

Pessimistic minimax value iteration: Provably efficient equilibrium learning from offline datasets , author=. International Conference on Machine Learning , pages=. 2022 , organization=

work page 2022

[55] [55]

Advances in Neural Information Processing Systems , volume=

When are offline two-player zero-sum Markov games solvable? , author=. Advances in Neural Information Processing Systems , volume=

work page

[56] [56]

International conference on machine learning , pages=

Is pessimism provably efficient for offline rl? , author=. International conference on machine learning , pages=. 2021 , organization=

work page 2021

[57] [57]

Advances in neural information processing systems , volume=

Bellman-consistent pessimism for offline reinforcement learning , author=. Advances in neural information processing systems , volume=

work page

[58] [58]

International Conference on Machine Learning , pages=

Offline learning in markov games with general function approximation , author=. International Conference on Machine Learning , pages=. 2023 , organization=

work page 2023

[59] [59]

International conference on machine learning , pages=

A theory of regularized markov decision processes , author=. International conference on machine learning , pages=. 2019 , organization=

work page 2019

[60] [60]

Asadi and Idan Shenfeld and Youssef Mroueh , booktitle=

Gholamali Aminian and Amir R. Asadi and Idan Shenfeld and Youssef Mroueh , booktitle=

work page

[61] [61]

ArXiv Preprint , year=

G\"odel's Poetry , author=. ArXiv Preprint , year=

work page

[62] [62]

2025 , journal=

ProofAug: Efficient Neural Theorem Proving via Fine-grained Proof Structure Analysis , author=. 2025 , journal=

work page 2025

[63] [63]

ArXiv Preprint , year=

Hilbert: Recursively Building Formal Proofs with Informal Reasoning , author=. ArXiv Preprint , year=

work page

[64] [64]

ArXiv Preprint , year=

APOLLO: Automated LLM and Lean Collaboration for Advanced Formal Reasoning , author=. ArXiv Preprint , year=

work page

[65] [65]

ArXiv Preprint , year=

Solving formal math problems by decomposition and iterative reflection , author=. ArXiv Preprint , year=

work page

[66] [66]

ArXiv Preprint , year=

Formal theorem proving by rewarding llms to decompose proofs hierarchically , author=. ArXiv Preprint , year=

work page

[67] [67]

ArXiv Preprint , year=

Lemmanaid: Neuro-Symbolic Lemma Conjecturing , author=. ArXiv Preprint , year=

work page

[68] [68]

2022 , journal =

Sivaraman, Aishwarya and Sanchez-Stern, Alex and Chen, Bretton and Lerner, Sorin and Millstein, Todd , title =. 2022 , journal =

work page 2022

[69] [69]

ArXiv Preprint

LEGO-Prover: Neural Theorem Proving with Growing Libraries , author=. ArXiv Preprint

work page

[70] [70]

ArXiv Preprint

LeanConjecturer: Automatic Generation of Mathematical Conjectures for Theorem Proving. ArXiv Preprint. 2025

work page 2025

[71] [71]

Discovering New Theorems via LLMs with In-Context Proof Learning in Lean

Kazumi Kasaura and Naoto Onda and Yuta Oriike and Masaya Taniguchi and Akiyoshi Sannai and Sho Sonoda. Discovering New Theorems via LLMs with In-Context Proof Learning in Lean. ArXiv Preprint. 2025

work page 2025

[72] [72]

ArXiv Preprint , year=

Aristotle: Imo-level automated theorem proving , author=. ArXiv Preprint , year=

work page

[73] [73]

ArXiv Preprint , year=

Goedel-prover-v2: Scaling formal theorem proving with scaffolded data synthesis and self-correction , author=. ArXiv Preprint , year=

work page

[74] [74]

ArXiv Preprint , year=

Goedel-prover: A frontier model for open-source automated theorem proving , author=. ArXiv Preprint , year=

work page

[75] [75]

Nature , year=

Olympiad-level formal mathematical reasoning with reinforcement learning , author=. Nature , year=

work page

[76] [76]

ArXiv Preprint , year=

Gold-medalist performance in solving olympiad geometry with alphageometry2 , author=. ArXiv Preprint , year=

work page

[77] [77]

ArXiv Preprint , year=

Seed-prover: Deep and broad reasoning for automated theorem proving , author=. ArXiv Preprint , year=

work page

[78] [78]

ArXiv Preprint , year=

Minif2f: a cross-system benchmark for formal olympiad-level mathematics , author=. ArXiv Preprint , year=

work page

[79] [79]

ArXiv Preprint , year=

Formalmath: Benchmarking formal mathematical reasoning of large language models , author=. ArXiv Preprint , year=

work page

[80] [80]

ArXiv Preprint , year=

Proofnet: Autoformalizing and formally proving undergraduate-level mathematics , author=. ArXiv Preprint , year=

work page