Non-Stationary Bandit Convex Optimization: An Optimal Algorithm with Two-Point Feedback

Bo Jiang; Chang He; Shuzhong Zhang

arxiv: 2508.04654 · v3 · pith:6WZ4WSJCnew · submitted 2025-08-06 · 🧮 math.OC

Non-Stationary Bandit Convex Optimization: An Optimal Algorithm with Two-Point Feedback

Chang He , Bo Jiang , Shuzhong Zhang This is my paper

classification 🧮 math.OC

keywords mathcalalgorithmsqrtbanditboundoptimalconvexeuclidean

0 comments

read the original abstract

This paper studies bandit convex optimization in non-stationary environments with two-point feedback, using dynamic regret as the performance measure. We propose an algorithm based on bandit mirror descent that extends naturally to non-Euclidean settings. Let $T$ be the total number of iterations and $\mathcal{P}_{T,p}$ the path variation with respect to the $\ell_p$-norm. In Euclidean space, our algorithm matches the optimal regret bound $\mathcal{O}(\sqrt{dT(1+\mathcal{P}_{T,2})})$, improving upon \citet{zhao2021bandit} by a factor of $\mathcal{O}(\sqrt{d})$. Beyond Euclidean settings, our algorithm achieves an upper bound of $\mathcal{O}(\sqrt{d\log(d)T\log(T)(1 + \mathcal{P}_{T,1})})$ on the simplex, which is nearly optimal up to log factors. For the cross-polytope, the bound reduces to $\mathcal{O}(\sqrt{d\log(d)T(1+\mathcal{P}_{T,p})})$ for some $p = 1 + 1/\log(d)$.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Bandit Convex Optimization with Gradient Prediction Adaptivity
cs.LG 2026-05 unverdicted novelty 7.0

TP-VR-OPT achieves O(√(d E[S_T])) prediction-adaptive regret in two-point bandit convex optimization, with a matching Ω(√E[S_T]) lower bound up to √d, while single-point feedback cannot benefit from predictions.