Beyond mathcal{O}(sqrt{T}) Regret: Decoupling Learning and Decision-making in Online Linear Programming
read the original abstract
Online linear programming plays an important role in both revenue management and resource allocation, and recent research has focused on developing efficient first-order online learning algorithms. Despite the empirical success of first-order methods, they typically achieve a regret no better than $\mathcal{O} ( \sqrt{T} )$, which is suboptimal compared to the $\mathcal{O} (\log T)$ bound guaranteed by the state-of-the-art linear programming (LP)-based online algorithms. This paper establishes a general framework that improves upon the $\mathcal{O} ( \sqrt{T} )$ result when the LP dual problem exhibits certain error bound conditions. For the first time, we show that first-order learning algorithms achieve $o( \sqrt{T} )$ regret in the continuous support setting and $\mathcal{O} (\log T)$ regret in the finite support setting beyond the non-degeneracy assumption. Our results significantly improve the state-of-the-art regret results and provide new insights for sequential decision-making.
This paper has not been read by Pith yet.
Forward citations
Cited by 2 Pith papers
-
Tight Lower Bounds for the Multi-Secretary Problem via Bellman Certificates
Establishes Ω((log T)^2) lower bound on regret for multi-secretary problem with gapped distributions via Bellman certificates, showing prior O((log T)^2) upper bounds are tight.
-
Online Resource Allocation with Continuous Random Consumption: Regret under Degeneracy
Derives regret lower and upper bounds for online resource allocation under continuous consumption using active weighted-mass exponent p, attaining o(sqrt(T)) regret without non-degeneracy assumptions.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.