arXiv preprint arXiv:1910.04928 , year=

Old dog learns new tricks: Randomized ucb for bandit problems , author= · 1910 · arXiv 1910.04928

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

representative citing papers

Constrained Contextual Bandits with Adversarial Contexts

cs.LG · 2026-05-07 · unverdicted · novelty 7.0

A modular reduction from budget-constrained contextual bandits with adversarial contexts to unconstrained bandits via surrogate rewards, yielding improved guarantees and an efficient algorithm based on SquareCB.

Improved Guarantees for Constrained Online Convex Optimization via Self-Contraction

cs.LG · 2026-05-20 · unverdicted · novelty 6.0

A projection-based algorithm for COCO achieves O(log T) regret and O(log T) CCV for strongly convex losses and O(sqrt(T)) for convex losses by leveraging self-contracted curves.

Neural Variance-aware Dueling Bandits with Deep Representation and Shallow Exploration

cs.LG · 2025-06-02 · unverdicted · novelty 6.0

Variance-aware neural dueling bandit algorithms achieve sublinear regret of order O(d sqrt(sum sigma_t^2) + sqrt(d T)) for wide networks on nonlinear utilities.

citing papers explorer

Showing 3 of 3 citing papers.

Constrained Contextual Bandits with Adversarial Contexts cs.LG · 2026-05-07 · unverdicted · none · ref 290
A modular reduction from budget-constrained contextual bandits with adversarial contexts to unconstrained bandits via surrogate rewards, yielding improved guarantees and an efficient algorithm based on SquareCB.
Improved Guarantees for Constrained Online Convex Optimization via Self-Contraction cs.LG · 2026-05-20 · unverdicted · none · ref 300
A projection-based algorithm for COCO achieves O(log T) regret and O(log T) CCV for strongly convex losses and O(sqrt(T)) for convex losses by leveraging self-contracted curves.
Neural Variance-aware Dueling Bandits with Deep Representation and Shallow Exploration cs.LG · 2025-06-02 · unverdicted · none · ref 28
Variance-aware neural dueling bandit algorithms achieve sublinear regret of order O(d sqrt(sum sigma_t^2) + sqrt(d T)) for wide networks on nonlinear utilities.

arXiv preprint arXiv:1910.04928 , year=

fields

years

verdicts

representative citing papers

citing papers explorer