With opponent-action feedback in zero-sum games, an efficient algorithm achieves near-optimal t^{-1/2} last-iterate convergence in duality gap with high probability.
arXiv preprint arXiv:1906.01217
2 Pith papers cite this work.
Fields: cs.LG (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
Provides the first finite-time convergence guarantees for Q-value iteration in general-sum Stackelberg Markov games.
Citing papers explorer
- Near-Optimal Last-Iterate Convergence for Zero-Sum Games with Bandit Feedback and Opponent Actions
  With opponent-action feedback in zero-sum games, an efficient algorithm achieves near-optimal t^{-1/2} last-iterate convergence in duality gap with high probability.
- Finite-Time Analysis of Q-Value Iteration for General-Sum Stackelberg Games
  Provides the first finite-time convergence guarantees for Q-value iteration in general-sum Stackelberg Markov games.