Beyond $\mathcal{O}(\sqrt{T})$ Regret: Decoupling Learning and Decision-making in Online Linear Programming

Chenyu Xue; Chunlin Sun; Dongdong Ge; Wenzhi Gao; Yinyu Ye

arxiv: 2501.02761 · v1 · pith:7SN4C2PGnew · submitted 2025-01-06 · 📊 stat.ML · cs.LG· math.OC

Beyond mathcal{O}(sqrt{T}) Regret: Decoupling Learning and Decision-making in Online Linear Programming

Wenzhi Gao , Dongdong Ge , Chenyu Xue , Chunlin Sun , Yinyu Ye This is my paper

classification 📊 stat.ML cs.LGmath.OC

keywords mathcalregretonlinesqrtalgorithmsfirst-orderlearninglinear

0 comments

read the original abstract

Online linear programming plays an important role in both revenue management and resource allocation, and recent research has focused on developing efficient first-order online learning algorithms. Despite the empirical success of first-order methods, they typically achieve a regret no better than $\mathcal{O} ( \sqrt{T} )$, which is suboptimal compared to the $\mathcal{O} (\log T)$ bound guaranteed by the state-of-the-art linear programming (LP)-based online algorithms. This paper establishes a general framework that improves upon the $\mathcal{O} ( \sqrt{T} )$ result when the LP dual problem exhibits certain error bound conditions. For the first time, we show that first-order learning algorithms achieve $o( \sqrt{T} )$ regret in the continuous support setting and $\mathcal{O} (\log T)$ regret in the finite support setting beyond the non-degeneracy assumption. Our results significantly improve the state-of-the-art regret results and provide new insights for sequential decision-making.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Tight Lower Bounds for the Multi-Secretary Problem via Bellman Certificates
cs.DS 2026-07 unverdicted novelty 7.0

Establishes Ω((log T)^2) lower bound on regret for multi-secretary problem with gapped distributions via Bellman certificates, showing prior O((log T)^2) upper bounds are tight.
Online Resource Allocation with Continuous Random Consumption: Regret under Degeneracy
cs.LG 2026-07 unverdicted novelty 6.0

Derives regret lower and upper bounds for online resource allocation under continuous consumption using active weighted-mass exponent p, attaining o(sqrt(T)) regret without non-degeneracy assumptions.