An Optimistic Algorithm for Online Convex Optimization with Adversarial Constraints

Jordan Lekeufack; Michael I. Jordan

arxiv: 2412.08060 · v2 · pith:CR3AK2P6new · submitted 2024-12-11 · 📊 stat.ML · cs.LG· math.OC

An Optimistic Algorithm for Online Convex Optimization with Adversarial Constraints

Jordan Lekeufack , Michael I. Jordan This is my paper

classification 📊 stat.ML cs.LGmath.OC

keywords constraintsqrtlossconstraintscumulativefunctionsadversarialalgorithm

0 comments

read the original abstract

We study Online Convex Optimization (OCO) with adversarial constraints, where an online algorithm must make sequential decisions to minimize both convex loss functions and cumulative constraint violations. We focus on a setting where the algorithm has access to predictions of the loss and constraint functions. Our results show that we can improve the current best bounds of $ O(\sqrt{T}) $ regret and $ \tilde{O}(\sqrt{T}) $ cumulative constraint violations to $ O(\sqrt{E_T(f)}) $ and $ \tilde{O}(\sqrt{E_T(g^+)}) $, respectively, where $ E_T(f) $ and $E_T(g^+)$ represent the cumulative prediction errors of the loss and constraint functions. In the worst case, where $E_T(f) = O(T) $ and $ E_T(g^+) = O(T) $ (assuming bounded gradients of the loss and constraint functions), our rates match the prior $ O(\sqrt{T}) $ results. However, when the loss and constraint predictions are accurate, our approach yields significantly smaller regret and cumulative constraint violations. Finally, we apply this to the setting of adversarial contextual bandits with sequential risk constraints, obtaining optimistic bounds $O (\sqrt{E_T(f)} T^{1/3})$ regret and $O(\sqrt{E_T(g^+)} T^{1/3})$ constraints violation, yielding better performance than existing results when prediction quality is sufficiently high.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Constrained Online Convex Optimization without Slater's Condition
cs.LG 2026-06 unverdicted novelty 7.0

A primal-dual framework with adaptive dual regularizer achieves O(√T) regret and O(√T log T) constraint violation for constrained OCO without Slater's condition under stochastic constraints, with extensions to adversa...
Online Learning with Gradient-Variation Interval Regret
cs.LG 2026-06 unverdicted novelty 7.0

Introduces first algorithm for interval regret scaling with gradient variation via two-layer ensemble, plus Lipschitz-smoothness agnostic variant, with extensions to dynamic regret and stochastic settings.