Handbook of reinforcement learning and control
5 Pith papers cite this work. Polarity classification is still indexing.
citation summary
years: 2026 (5)
verdicts: UNVERDICTED (5)
roles: background (1)
polarities: background (1)
citing papers explorer
- Offline Two-Player Zero-Sum Markov Games with KL Regularization
  KL regularization enables Õ(1/n) convergence for offline Nash equilibria in zero-sum Markov games under unilateral concentrability via the ROSE framework and SOS-MD algorithm.
- Equilibrium and Pricing in Consumer Networks with Nonlinear Utilities: An Online Shape-Constrained Learning Approach
  The paper establishes equilibrium existence and uniqueness for consumer networks with nonlinear utilities under contraction conditions, and proposes a shape-constrained isotonic regression approach with strict no-regret convergence for learning utilities in targeted monopoly pricing.
- AstroAlertBench: Evaluating the Accuracy, Reasoning, and Honesty of Multimodal LLMs in Astronomical Classification
  AstroAlertBench evaluates multimodal LLMs on astronomical classification accuracy, reasoning, and honesty using real ZTF alerts, revealing that high accuracy often diverges from self-assessed reasoning quality.
- Fast Rates in $\alpha$-Potential Games via Regularized Mirror Descent
  Proposes the OPMD algorithm, which achieves accelerated O(1/n) rates for offline Nash equilibrium learning in $\alpha$-potential games via reference-anchored data coverage.
- Pessimism-Free Offline Learning in General-Sum Games via KL Regularization
  KL regularization enables pessimism-free offline learning in general-sum games, recovering regularized Nash equilibria at an accelerated O(1/n) rate via GANE and converging to coarse correlated equilibria at the standard O(1/sqrt(n) + 1/T) rate via GAMD.