A new Gradient TD Algorithm with only One Step-size: Convergence Rate Analysis using L - Smoothness
4 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (4)
Verdicts: UNVERDICTED (4)
citing papers explorer
- Offline Two-Player Zero-Sum Markov Games with KL Regularization
  KL regularization enables Õ(1/n) convergence for offline Nash equilibria in zero-sum Markov games under unilateral concentrability via the ROSE framework and SOS-MD algorithm.
- Almost Sure Convergence Rates of Stochastic Approximation and Reinforcement Learning via a Poisson-Moreau Drift
  Establishes almost sure convergence rates arbitrarily close to o(n^{1-2η}) for power-law rates η in (1/2,1) and o(n^{-1}) for harmonic rates in contractive stochastic approximation with Markovian noise.
- Fast Rates in $\alpha$-Potential Games via Regularized Mirror Descent
  Proposes OPMD algorithm achieving accelerated O(1/n) rates for offline Nash equilibrium learning in alpha-potential games via reference-anchored data coverage.
- Pessimism-Free Offline Learning in General-Sum Games via KL Regularization
  KL regularization enables pessimism-free offline learning in general-sum games, recovering regularized Nash equilibria at accelerated rate O(1/n) via GANE and converging to coarse correlated equilibria at standard rate O(1/sqrt(n)+1/T) via GAMD.