SIAM Journal on Computing

The nonstochastic multiarmed bandit problem · 2002

6 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

Near-Optimal Last-Iterate Convergence for Zero-Sum Games with Bandit Feedback and Opponent Actions

cs.LG · 2026-05-10 · unverdicted · novelty 8.0

With opponent-action feedback in zero-sum games, an efficient algorithm achieves near-optimal t^{-1/2} last-iterate convergence in duality gap with high probability.

Toward Optimal Regret in Robust Pricing: Decoupling Corruption and Time

cs.LG · 2026-05-08 · unverdicted · novelty 8.0

A robust variant of binary search achieves regret O(C + log T) for dynamic pricing with known corruption C and O(C + log² T) when unknown.

Fast Rates for Offline Contextual Bandits with Forward-KL Regularization under Single-Policy Concentrability

cs.LG · 2026-05-09 · unverdicted · novelty 7.0

The paper establishes the first Õ(ε^{-1}) upper bounds and matching lower bounds for forward-KL-regularized offline contextual bandits under single-policy concentrability in both tabular and general function approximation settings.

On Characterizing Learnability for Adversarial Noisy Bandits

cs.LG · 2026-05-09 · unverdicted · novelty 7.0

Learnability of adversarial noisy bandits is characterized by the convexified generalized maximin volume for oblivious adversaries and for adaptive adversaries when the arm space is countable.

Nonparametric Learning and Earning with One-Point Feedback under Nonstationarity

cs.LG · 2026-05-20 · unverdicted · novelty 5.0

A restarting-based nonparametric online learning method for dynamic pricing with one-point revenue feedback that achieves regret bounds scaling with time horizon and total market variation.

Online Learning-to-Defer with Varying Experts

stat.ML · 2026-05-12 · 2 refs

citing papers explorer

Showing 6 of 6 citing papers.

Near-Optimal Last-Iterate Convergence for Zero-Sum Games with Bandit Feedback and Opponent Actions cs.LG · 2026-05-10 · unverdicted · none · ref 168
With opponent-action feedback in zero-sum games, an efficient algorithm achieves near-optimal t^{-1/2} last-iterate convergence in duality gap with high probability.
Toward Optimal Regret in Robust Pricing: Decoupling Corruption and Time cs.LG · 2026-05-08 · unverdicted · none · ref 73
A robust variant of binary search achieves regret O(C + log T) for dynamic pricing with known corruption C and O(C + log² T) when unknown.
Fast Rates for Offline Contextual Bandits with Forward-KL Regularization under Single-Policy Concentrability cs.LG · 2026-05-09 · unverdicted · none · ref 64
The paper establishes the first Õ(ε^{-1}) upper bounds and matching lower bounds for forward-KL-regularized offline contextual bandits under single-policy concentrability in both tabular and general function approximation settings.
On Characterizing Learnability for Adversarial Noisy Bandits cs.LG · 2026-05-09 · unverdicted · none · ref 6
Learnability of adversarial noisy bandits is characterized by the convexified generalized maximin volume for oblivious adversaries and for adaptive adversaries when the arm space is countable.
Nonparametric Learning and Earning with One-Point Feedback under Nonstationarity cs.LG · 2026-05-20 · unverdicted · none · ref 51
A restarting-based nonparametric online learning method for dynamic pricing with one-point revenue feedback that achieves regret bounds scaling with time horizon and total market variation.
Online Learning-to-Defer with Varying Experts stat.ML · 2026-05-12 · unreviewed · ref 145 · 2 links
