Online Improper Learning with an Approximation Oracle

Elad Hazan; Wei Hu; Yuanzhi Li; Zhiyuan Li

arxiv: 1804.07837 · v1 · pith:OHPTB4PVnew · submitted 2018-04-20 · 💻 cs.LG · stat.ML

keywords: learning, oracle, setting, algorithms, approximation, improper, online, regret

We revisit the question of reducing online learning to approximate optimization of the offline problem. In this setting, we give two algorithms with near-optimal performance in the full information setting: they guarantee optimal regret and require only poly-logarithmically many calls to the approximation oracle per iteration. Furthermore, these algorithms apply to the more general improper learning problems. In the bandit setting, our algorithm also significantly improves the best previously known oracle complexity while maintaining the same regret.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Constrained Contextual Bandits with Adversarial Contexts
cs.LG 2026-05 unverdicted novelty 7.0

A modular reduction from budget-constrained contextual bandits with adversarial contexts to unconstrained bandits via surrogate rewards, yielding improved guarantees and an efficient algorithm based on SquareCB.

Improved Guarantees for Constrained Online Convex Optimization via Self-Contraction
cs.LG 2026-05 unverdicted novelty 6.0

A projection-based algorithm for COCO achieves O(log T) regret and O(log T) CCV for strongly convex losses and O(sqrt(T)) for convex losses by leveraging self-contracted curves.