A modular reduction from budget-constrained contextual bandits with adversarial contexts to unconstrained bandits via surrogate rewards, yielding improved guarantees and an efficient algorithm based on SquareCB.
arXiv preprint arXiv:1910.04928 , year=
3 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.LG 3verdicts
UNVERDICTED 3representative citing papers
A projection-based algorithm for COCO achieves O(log T) regret and O(log T) CCV for strongly convex losses and O(sqrt(T)) for convex losses by leveraging self-contracted curves.
Variance-aware neural dueling bandit algorithms achieve sublinear regret of order O(d sqrt(sum sigma_t^2) + sqrt(d T)) for wide networks on nonlinear utilities.
citing papers explorer
-
Constrained Contextual Bandits with Adversarial Contexts
A modular reduction from budget-constrained contextual bandits with adversarial contexts to unconstrained bandits via surrogate rewards, yielding improved guarantees and an efficient algorithm based on SquareCB.
-
Improved Guarantees for Constrained Online Convex Optimization via Self-Contraction
A projection-based algorithm for COCO achieves O(log T) regret and O(log T) CCV for strongly convex losses and O(sqrt(T)) for convex losses by leveraging self-contracted curves.
-
Neural Variance-aware Dueling Bandits with Deep Representation and Shallow Exploration
Variance-aware neural dueling bandit algorithms achieve sublinear regret of order O(d sqrt(sum sigma_t^2) + sqrt(d T)) for wide networks on nonlinear utilities.