Last-Iterate Analyses of FTRL with the 1/2-Tsallis Entropy in Stochastic Bandits

· 2025 · cs.LG · arXiv 2510.22819

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

open full Pith review browse 1 citing papers arXiv PDF

abstract

The convergence analysis of online learning algorithms is central to machine learning theory, where the last-iterate convergence is particularly important, as it captures the learner's actual decisions and describes the evolution of the learning process over time. However, in multi-armed bandits, most existing algorithmic analyses mainly focus on the order of regret, while the last-iterate (simple regret) convergence rate remains less explored -- especially for the widely studied Follow-the-Regularized-Leader (FTRL) algorithms. Recently, FTRL with the $1/2$-Tsallis entropy regularizer $\Psi(p) = -4\sum_{i=1}^d \sqrt{p_i}$ (the $1/2$-Tsallis-INF algorithm, by arXiv:1807.07623) was shown to achieve logarithmic regret in stochastic bandits. Nevertheless, its last-iterate convergence rate has not yet been studied. Intuitively, logarithmic regret should correspond to a $t^{-1}$ last-iterate convergence rate. This paper studies the $1/2$-Tsallis-INF algorithm and partially confirms this intuition through theoretical analysis, showing that the Bregman divergence, defined by $\Psi(p)$, between the point mass on the optimal arm and the probability distribution over the arm set obtained at iteration $t$, decays at a rate of $t^{-1/2}$.

representative citing papers

Near-Optimal Last-Iterate Convergence for Zero-Sum Games with Bandit Feedback and Opponent Actions

cs.LG · 2026-05-10 · unverdicted · novelty 8.0

With opponent-action feedback in zero-sum games, an efficient algorithm achieves near-optimal t^{-1/2} last-iterate convergence in duality gap with high probability.

citing papers explorer

Showing 1 of 1 citing paper.

Near-Optimal Last-Iterate Convergence for Zero-Sum Games with Bandit Feedback and Opponent Actions cs.LG · 2026-05-10 · unverdicted · none · ref 259 · internal anchor
With opponent-action feedback in zero-sum games, an efficient algorithm achieves near-optimal t^{-1/2} last-iterate convergence in duality gap with high probability.

Last-Iterate Analyses of FTRL with the 1/2-Tsallis Entropy in Stochastic Bandits

fields

years

verdicts

representative citing papers

citing papers explorer