pith.

arxiv: 2601.21167 · v2 · pith:GGC2JTYH · submitted 2026-01-29 · 💻 cs.LG

Learning What to Recommend: Minimax Optimal Simple Regret in Logistic Bandits

classification 💻 cs.LG
keywords action · final · bound · hard · logistic · recommendation · actions · bandits

We study stochastic logistic bandits with $d$-dimensional action features under the simple-regret objective, where a learner uses $T$ rounds of exploration to output a single final action. The logistic structure is essential here: because the informativeness of an action depends on the local curvature of the sigmoid, actions that are best for immediate reward need not be the most useful for identifying the best final recommendation. We show that the first-order minimax difficulty is governed by $\kappa_*$, the inverse slope of the sigmoid at the optimal action. The lower bound is attained by a family of shifted, saturated hard instances in which saturation simultaneously limits the information available about the final decision and controls the value loss from a wrong recommendation. This reveals a hardness mechanism distinct from cumulative-regret constructions, even though online-to-batch reductions recover the same leading order in expectation. We then develop two curvature-aware algorithms: MULog, a pure-exploration method whose final recommendation satisfies a high-probability upper bound of order $\tilde O(d/\sqrt{\kappa_* T})$, matching the lower bound up to logarithmic factors; and THATS, a Thompson-sampling-style method that offers a computationally lighter alternative. Experiments on both hard and easy geometries support the same picture: informative low-reward actions can make instances substantially easier, and the curvature-aware methods exploit this structure especially effectively.
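To make the role of $\kappa_*$ concrete, here is a minimal Python sketch. It assumes a finite action set, a known parameter `theta`, and the standard logistic link $\mu(z) = 1/(1+e^{-z})$. The helper names (`sigmoid_slope`, `kappa_star`, `leading_order_rate`) are hypothetical, and `leading_order_rate` drops all constants and logarithmic factors from the $\tilde O(d/\sqrt{\kappa_* T})$ bound. The example also illustrates the curvature point: the reward-maximizing action sits on a flat part of the sigmoid, while a low-reward action near $z = 0$ has a much larger slope and therefore carries more Fisher information, $\mu'(x^\top\theta)\,x x^\top$, per pull.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def sigmoid(z: float) -> float:
    """Logistic link mu(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_slope(z: float) -> float:
    """mu'(z) = mu(z) * (1 - mu(z)); scales the Fisher information of one pull."""
    m = sigmoid(z)
    return m * (1.0 - m)


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def kappa_star(theta: Vector, actions: Sequence[Vector]) -> float:
    """Hypothetical helper: inverse sigmoid slope at the reward-maximizing action."""
    x_star = max(actions, key=lambda x: dot(x, theta))
    return 1.0 / sigmoid_slope(dot(x_star, theta))


def leading_order_rate(d: int, kappa: float, T: int) -> float:
    """d / sqrt(kappa * T): leading-order simple-regret scale, constants and logs dropped."""
    return d / math.sqrt(kappa * T)


theta = [4.0, 0.0]
actions = [[1.0, 0.0], [0.0, 1.0]]  # optimal action has z = 4; the other has z = 0

k = kappa_star(theta, actions)
print(f"kappa_* = {k:.2f}")  # equals e^4 + 2 + e^-4, about 56.62
print(f"slope at optimal action: {sigmoid_slope(4.0):.4f}")
print(f"slope at low-reward action: {sigmoid_slope(0.0):.4f}")
print(f"rate for d=2, T=10000: {leading_order_rate(2, k, 10_000):.5f}")
```

Pushing `theta[0]` higher moves the optimal action further into saturation, which increases $\kappa_*$ and therefore decreases $d/\sqrt{\kappa_* T}$.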

This paper has not been read by Pith yet.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Active Context Selection Improves Simple Regret in Contextual Bandits

    cs.LG 2026-05 accept novelty 7.0

    Active sampling with allocation $q_j \propto p_j^{2/3}$ achieves tight regret $\sqrt{n/T}\,\|p\|_{2/3}$ for a known context distribution $p$, with improvement up to $\Theta(k^{1/4})$ over passive sampling.
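The allocation rule in this summary can be checked numerically. The sketch below assumes a proxy objective that is not stated on this page: context $j$ receives $q_j T$ samples and contributes error $p_j \sqrt{n/(q_j T)}$, with $n$ as in the summary. Under that assumption, minimizing $\sum_j p_j q_j^{-1/2}$ over the simplex gives $q_j \propto p_j^{2/3}$ and the value $\sqrt{n/T}\,\|p\|_{2/3}$, where $\|p\|_{2/3} = (\sum_j p_j^{2/3})^{3/2}$; passive sampling ($q = p$) gives $\sqrt{n/T}\sum_j \sqrt{p_j}$ instead. All function names and example values are hypothetical.

```python
import math
from typing import List, Sequence


def quasi_norm(p: Sequence[float], r: float) -> float:
    """||p||_r = (sum_j p_j^r)^(1/r); for r = 2/3 this is only a quasi-norm."""
    return sum(pj ** r for pj in p) ** (1.0 / r)


def active_allocation(p: Sequence[float]) -> List[float]:
    """Sampling weights q_j proportional to p_j^(2/3), normalized to sum to 1."""
    w = [pj ** (2.0 / 3.0) for pj in p]
    total = sum(w)
    return [wj / total for wj in w]


def proxy_regret(p: Sequence[float], q: Sequence[float], n: int, T: int) -> float:
    """Assumed proxy: sum_j p_j * sqrt(n / (q_j * T)); contexts with p_j = 0 are skipped."""
    return sum(pj * math.sqrt(n / (qj * T)) for pj, qj in zip(p, q) if pj > 0)


p = [0.7, 0.1, 0.1, 0.1]  # hypothetical context distribution
n, T = 5, 1000             # hypothetical problem size and budget

active = proxy_regret(p, active_allocation(p), n, T)
passive = proxy_regret(p, p, n, T)
closed_form = math.sqrt(n / T) * quasi_norm(p, 2.0 / 3.0)

print(f"active  = {active:.4f}")
print(f"closed  = {closed_form:.4f}")
print(f"passive = {passive:.4f}")
```

For a uniform distribution the two allocations coincide, so under this proxy any gain from active sampling comes from a skewed $p$.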