pith. sign in

arxiv: 2501.02761 · v1 · pith:7SN4C2PGnew · submitted 2025-01-06 · 📊 stat.ML · cs.LG· math.OC

Beyond mathcal{O}(sqrt{T}) Regret: Decoupling Learning and Decision-making in Online Linear Programming

classification 📊 stat.ML cs.LGmath.OC
keywords mathcalregretonlinesqrtalgorithmsfirst-orderlearninglinear
0
0 comments X
read the original abstract

Online linear programming plays an important role in both revenue management and resource allocation, and recent research has focused on developing efficient first-order online learning algorithms. Despite the empirical success of first-order methods, they typically achieve a regret no better than $\mathcal{O} ( \sqrt{T} )$, which is suboptimal compared to the $\mathcal{O} (\log T)$ bound guaranteed by the state-of-the-art linear programming (LP)-based online algorithms. This paper establishes a general framework that improves upon the $\mathcal{O} ( \sqrt{T} )$ result when the LP dual problem exhibits certain error bound conditions. For the first time, we show that first-order learning algorithms achieve $o( \sqrt{T} )$ regret in the continuous support setting and $\mathcal{O} (\log T)$ regret in the finite support setting beyond the non-degeneracy assumption. Our results significantly improve the state-of-the-art regret results and provide new insights for sequential decision-making.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Tight Lower Bounds for the Multi-Secretary Problem via Bellman Certificates

    cs.DS 2026-07 unverdicted novelty 7.0

    Establishes Ω((log T)^2) lower bound on regret for multi-secretary problem with gapped distributions via Bellman certificates, showing prior O((log T)^2) upper bounds are tight.

  2. Online Resource Allocation with Continuous Random Consumption: Regret under Degeneracy

    cs.LG 2026-07 unverdicted novelty 6.0

    Derives regret lower and upper bounds for online resource allocation under continuous consumption using active weighted-mass exponent p, attaining o(sqrt(T)) regret without non-degeneracy assumptions.