Sample complexity of asynchronous Q-learning: sharper analysis and variance reduction
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.LG (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
Unified ODE convergence analysis for smooth Q-learning variants via p-norm Lyapunov functions, valid even when the Bellman operator is not a contraction.
Citing papers
- Finite-Time Analysis of Q-Value Iteration for General-Sum Stackelberg Games
  Provides the first finite-time convergence guarantees for Q-value iteration in general-sum Stackelberg Markov games.
- Toward a Unified Lyapunov-Certified ODE Convergence Analysis of Smooth Q-Learning with p-Norms
  Unified ODE convergence analysis for smooth Q-learning variants via p-norm Lyapunov functions, valid even when the Bellman operator is not a contraction.