With opponent-action feedback in zero-sum games, an efficient algorithm achieves near-optimal t^{-1/2} last-iterate convergence in duality gap with high probability.
Advances in Applied Mathematics
3 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.LG 3
years
2026 3
verdicts
UNVERDICTED 3
representative citing papers
Consumers transfer brand-level regularities across contexts using low-D boundedly rational meta-learning approximations that fit choice data better than no-transfer or fully integrated Bayesian benchmarks.
A unified bandit framework for general open multi-agent systems with global-UCB algorithms and regret bounds linear in entry uncertainty and dependent on system stability and agent patterns.
citing papers explorer
- Near-Optimal Last-Iterate Convergence for Zero-Sum Games with Bandit Feedback and Opponent Actions
  With opponent-action feedback in zero-sum games, an efficient algorithm achieves near-optimal t^{-1/2} last-iterate convergence in duality gap with high probability.
- Boundedly Rational Meta-Learning in Sequential Consumer Choice
  Consumers transfer brand-level regularities across contexts using low-D boundedly rational meta-learning approximations that fit choice data better than no-transfer or fully integrated Bayesian benchmarks.
- Bandit Learning in General Open Multi-agent Systems
  A unified bandit framework for general open multi-agent systems with global-UCB algorithms and regret bounds linear in entry uncertainty and dependent on system stability and agent patterns.