Online Resource Allocation with Continuous Random Consumption: Regret under Degeneracy
Pith reviewed 2026-07-03 16:48 UTC · model grok-4.3
The pith
Online policies must incur regret of order at least T to the power 1/2 minus 1 over 2p when the weighted-mass exponent p exceeds 1, matched by a sample-path marginal policy even with degenerate fluid relaxations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that additive regret is governed by the active weighted-mass exponent p. When p > 1 every online policy must incur regret of order at least T^{1/2 - 1/(2p)}; a sample-path marginal policy matches this lower bound up to polylog factors, and when p = 1 it attains O((log T)^2) regret. The results hold without any fluid non-degeneracy assumption, allowing both primal degeneracy and dual non-uniqueness.
What carries the argument
The active weighted-mass exponent p, which formalizes the size-weighted mass of requests whose value-to-size ratios lie near the active acceptance cutoffs and governs the regret scaling.
If this is right
- Every online policy incurs regret of order at least T^{1/2 - 1/(2p)} for any p > 1.
- The sample-path marginal policy matches the lower bound up to polylogarithmic factors when p > 1.
- When p = 1 the marginal policy attains O((log T)^2) regret.
- The regret bounds continue to hold when the deterministic fluid relaxation is degenerate.
Where Pith is reading between the lines
- Estimating p from observed request data could let a system choose or tune policies according to the realized mass near cutoffs.
- The same exponent idea might extend to vector-valued resource consumption if the mass can be defined along the relevant hyperplane.
- Degeneracy of the fluid solution may not be the dominant source of hardness once consumption sizes are continuous.
Load-bearing premise
The size-weighted mass of requests near the active acceptance cutoffs can be captured by a single exponent p that controls the regret rate.
What would settle it
Generate request data with size and value-to-size ratio independent and uniform, then measure whether the marginal policy achieves O((log T)^2) regret; repeat with size and reward independent and uniform to check for T^{1/4} scaling.
read the original abstract
We study online resource allocation when both rewards and consumption sizes may be continuously distributed. Requests arrive sequentially and must be accepted or rejected irrevocably under fixed resource capacities. Each request belongs to one of finitely many observable types; conditional on an observable request type, both the reward and the scalar size are random, and the realized size scales a fixed type-specific resource-consumption vector. The model allows the deterministic fluid relaxation to be degenerate. We show that additive regret is governed by the size-weighted mass of requests whose value-to-size ratios lie near the active acceptance cutoffs. We formalize this quantity through an active weighted-mass exponent p. When p > 1, this cutoff mass is thin, and the problem is genuinely hard: every online policy must incur regret of order at least $T^{1/2 - 1/(2p)}$, and this holds for every p > 1. A sample-path marginal policy matches this lower bound up to polylogarithmic factors; and when p = 1, so that the mass grows linearly near the cutoff, it attains $O((\log T)^2)$ regret. For example, if the size and the value-to-size ratio are independent and uniformly distributed, then p = 1; if instead the size and the reward are independent and uniformly distributed, then p = 2. Thus the policy achieves $o(\sqrt{T})$ regret throughout this regularity class without any fluid non-degeneracy assumption, allowing both primal degeneracy and dual non-uniqueness.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript studies online resource allocation with continuous random rewards and consumption sizes under fixed capacities, allowing degeneracy in the deterministic fluid relaxation and dual non-uniqueness. It formalizes the size-weighted mass of requests near active acceptance cutoffs via a single active weighted-mass exponent p. For p > 1 the paper claims every online policy incurs regret at least T^{1/2 - 1/(2p)}, matched up to polylog factors by a sample-path marginal policy; when p = 1 the same policy attains O((log T)^2) regret. Concrete examples are given: independent uniform size and value-to-size yields p = 1; independent uniform size and reward yields p = 2.
Significance. If the claimed matching bounds hold, the work is significant because it removes the standard non-degeneracy assumption while still obtaining o(sqrt(T)) regret for an entire regularity class parameterized by p. The introduction of a single exponent p that simultaneously governs the lower bound, the upper bound, and the transition to logarithmic regret under degeneracy is a clean way to quantify problem difficulty from the distribution near cutoffs. The explicit policy that works under both primal degeneracy and dual non-uniqueness, together with the two illustrative distributions, strengthens the contribution to online stochastic optimization.
minor comments (2)
- The abstract states that the lower and upper bounds match up to polylog factors and that the examples attain p = 1 and p = 2, but does not display the explicit definition of p or the verification steps for the two distributions. The main text should place the formal definition of p (presumably via an integral or limit involving the size-weighted density near the cutoff) in the model section and include a short appendix or paragraph confirming the two examples.
- Notation for the sample-path marginal policy and the active cutoffs should be introduced once and used consistently; the abstract refers to both “active acceptance cutoffs” and “value-to-size ratios” without cross-reference.
Simulated Author's Rebuttal
We thank the referee for the careful reading and positive evaluation of the manuscript. The referee's summary correctly identifies the core technical contribution: the introduction of the active weighted-mass exponent p to characterize regret under degeneracy (both primal and dual), the matching lower and upper bounds of order T^{1/2 - 1/(2p)} for p > 1, the logarithmic regret when p = 1, and the fact that the sample-path marginal policy achieves o(sqrt(T)) regret without non-degeneracy assumptions. We are pleased that the referee finds the parameterization by p and the concrete distributional examples to be a clean way to quantify problem difficulty.
Circularity Check
No significant circularity; derivation parameterized by externally defined p
full rationale
The paper introduces the active weighted-mass exponent p as a property of the input distribution (size-weighted mass near cutoffs), then derives regret bounds (lower bound T^{1/2-1/(2p)} for p>1, O((log T)^2) for p=1) and shows a policy matches them. This is a standard parameterization of a theoretical result by an instance-dependent quantity, not a reduction of the claimed bounds to fitted parameters or self-referential definitions. No self-citations are load-bearing for the central claims, no ansatz is smuggled, and the examples (p=1 for uniform size/value-to-size, p=2 for uniform size/reward) are direct calculations from the distributions rather than circular fits. The derivation chain is self-contained against the stated assumptions.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Conditional on observable request type, reward and scalar size are random with continuous distributions; realized size scales a fixed type-specific consumption vector.
invented entities (1)
-
active weighted-mass exponent p
no independent evidence
Reference graph
Works this paper leans on
-
[1]
A dynamic near-optimal algorithm for online linear programming
Shipra Agrawal, Zizhuo Wang, and Yinyu Ye. A dynamic near-optimal algorithm for online linear programming. Operations Research, 62 0 (4): 0 876--890, 2014
2014
-
[2]
Uniformly bounded regret in the multisecretary problem
Alessandro Arlotto and Itai Gurvich. Uniformly bounded regret in the multisecretary problem. Stochastic Systems, 9 0 (3): 0 231--260, 2019
2019
-
[3]
Logarithmic regret in the dynamic and stochastic knapsack problem with equal rewards
Alessandro Arlotto and Xinchang Xie. Logarithmic regret in the dynamic and stochastic knapsack problem with equal rewards. Stochastic Systems, 10 0 (2): 0 170--191, 2020
2020
-
[4]
Online resource allocation with limited flexibility
Arash Asadpour, Xuan Wang, and Jiawei Zhang. Online resource allocation with limited flexibility. Management Science, 66 0 (2): 0 642--666, 2020
2020
-
[5]
Balseiro, Omar Besbes, and Dana Pizarro
Santiago R. Balseiro, Omar Besbes, and Dana Pizarro. Survey of dynamic resource-constrained reward collection problems: Unified model and analysis. Operations Research, 72 0 (5): 0 2168--2189, 2024
2024
-
[6]
Balseiro, Haihao Lu, and Vahab Mirrokni
Santiago R. Balseiro, Haihao Lu, and Vahab Mirrokni. The best of many worlds: Dual mirror descent for online allocation problems. Operations Research, 71 0 (1): 0 101--119, 2023
2023
-
[7]
Good prophets know when the end is near
Siddhartha Banerjee and Daniel Freund. Good prophets know when the end is near. Management Science, 71 0 (6): 0 4877--4894, 2025
2025
-
[8]
Dynamic resource allocation: Algorithmic design principles and spectrum of achievable performances
Omar Besbes, Yash Kanoria, and Akshit Kumar. Dynamic resource allocation: Algorithmic design principles and spectrum of achievable performances. Operations Research, 73 0 (3): 0 1273--1288, 2025
2025
-
[9]
Improved approximation results for stochastic knapsack problems
Anand Bhalgat, Ashish Goel, and Sanjeev Khanna. Improved approximation results for stochastic knapsack problems. In Proceedings of the Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1647--1665, 2011
2011
-
[10]
Boyd, Stephen, Lieven Vandenberghe. 2004. Convex Optimization. Cambridge University Press
2004
-
[11]
Robert L. Bray. Logarithmic regret in multisecretary and online linear programs with continuous valuations. Operations Research, 73 0 (4): 0 2188--2203, 2025
2025
-
[12]
A re-solving heuristic with uniformly bounded loss for network revenue management
Pornpawee Bumpensanti and He Wang. A re-solving heuristic with uniformly bounded loss for network revenue management. Management Science, 66 0 (7): 0 2993--3009, 2020
2020
-
[13]
Hoffman constant of the argmin mapping in linear optimization
J. Camacho, M. J. C \'a novas, H. Gfrerer, and J. Parra. Hoffman constant of the argmin mapping in linear optimization. arXiv preprint arXiv:2307.01034, version 2, 2026
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[14]
Beyond non-degeneracy: Revisiting certainty equivalent heuristic for online linear programming
Yilun Chen and Wenjia Wang. Beyond non-degeneracy: Revisiting certainty equivalent heuristic for online linear programming. arXiv preprint arXiv:2501.01716, 2025
-
[15]
Dean, Michel X
Brian C. Dean, Michel X. Goemans, and Jan Vondr \'a k. Adaptivity and approximation for stochastic packing problems. In Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 395--404, 2005
2005
-
[16]
Dean, Michel X
Brian C. Dean, Michel X. Goemans, and Jan Vondr \'a k. Approximating the stochastic knapsack problem: The benefit of adaptivity. Mathematics of Operations Research, 33 0 (4): 0 945--964, 2008
2008
-
[17]
Prophet inequalities made easy: Stochastic optimization by pricing nonstochastic inputs
Paul D \"u tting, Michal Feldman, Thomas Kesselheim, and Brendan Lucier. Prophet inequalities made easy: Stochastic optimization by pricing nonstochastic inputs. SIAM Journal on Computing, 49 0 (3): 0 540--582, 2020
2020
-
[18]
Overbooking with bounded loss
Daniel Freund and Jiayu Zhao. Overbooking with bounded loss. Mathematics of Operations Research, 48 0 (3): 0 1344--1363, 2023
2023
-
[19]
Optimal dynamic pricing of inventories with stochastic demand over finite horizons
Guillermo Gallego and Garrett van Ryzin. Optimal dynamic pricing of inventories with stochastic demand over finite horizons. Management Science, 40 0 (8): 0 999--1020, 1994
1994
-
[20]
Wenzhi Gao, Dongdong Ge, Chenyu Xue, Chunlin Sun, and Yinyu Ye. Beyond O( T) regret. arXiv preprint arXiv:2501.02761, 2025
-
[21]
Greedy algorithm for multiway matching with bounded regret
Varun Gupta. Greedy algorithm for multiway matching with bounded regret. Operations Research, 72 0 (3): 0 1139--1155, 2024
2024
-
[22]
Online resource allocation without re-solving: The effectiveness of primal-dual policies
Shuangchi He, Yifan Wei, Jinzhi Xu, and Shuanghao Yu. Online resource allocation without re-solving: The effectiveness of primal-dual policies. Working paper, SSRN 5133857, 2025
2025
-
[23]
Alan J. Hoffman. On approximate solutions of systems of linear inequalities. Journal of Research of the National Bureau of Standards, 49 0 (4): 0 263--265, 1952
1952
-
[24]
A re-solving heuristic with bounded revenue loss for network revenue management with customer choice
Stefanus Jasin and Sunil Kumar. A re-solving heuristic with bounded revenue loss for network revenue management with customer choice. Mathematics of Operations Research, 37 0 (2): 0 313--345, 2012
2012
-
[25]
Online stochastic optimization with W asserstein-based nonstationarity
Jiashuo Jiang, Xiaocheng Li, and Jiawei Zhang. Online stochastic optimization with W asserstein-based nonstationarity. Management Science, 71 0 (11): 0 9104--9122, 2025
2025
-
[26]
Degeneracy is OK : Logarithmic regret for network revenue management with indiscrete distributions
Jiashuo Jiang, Will Ma, and Jiawei Zhang. Degeneracy is OK : Logarithmic regret for network revenue management with indiscrete distributions. Operations Research, 73 0 (6): 0 3405--3420, 2025
2025
-
[27]
Tight guarantees for multi-unit prophet inequalities and online stochastic knapsack
Jiashuo Jiang, Will Ma, and Jiawei Zhang. Tight guarantees for multi-unit prophet inequalities and online stochastic knapsack. Operations Research, 73 0 (3): 0 1703--1721, 2025
2025
-
[28]
Online resource allocation with stochastic resource consumption
Jiashuo Jiang and Jiawei Zhang. Online resource allocation with stochastic resource consumption. arXiv preprint arXiv:2012.07933, 2020
-
[29]
o nnis, and Berthold V \
Thomas Kesselheim, Klaus Radke, Andreas T \"o nnis, and Berthold V \"o cking. Primal beats dual on online packing LP s in the random-order model. SIAM Journal on Computing, 47 0 (5): 0 1939--1964, 2018
1939
-
[30]
Kleywegt and Jason D
Anton J. Kleywegt and Jason D. Papastavrou. The dynamic and stochastic knapsack problem. Operations Research, 46 0 (1): 0 17--35, 1998
1998
-
[31]
Kleywegt and Jason D
Anton J. Kleywegt and Jason D. Papastavrou. The dynamic and stochastic knapsack problem with random sized items. Operations Research, 49 0 (1): 0 26--41, 2001
2001
-
[32]
Online linear programming: Dual convergence, new algorithms, and regret bounds
Xiaocheng Li and Yinyu Ye. Online linear programming: Dual convergence, new algorithms, and regret bounds. Operations Research, 70 0 (5): 0 2948--2966, 2022
2022
-
[33]
Simple and fast algorithm for binary integer and online linear programming
Xiaocheng Li, Chunlin Sun, and Yinyu Ye. Simple and fast algorithm for binary integer and online linear programming. Mathematical Programming, 200: 0 831--875, 2023
2023
-
[34]
Infrequent resolving algorithm for online linear programming
Guokai Li, Zizhuo Wang, and Jingwei Zhang. Infrequent resolving algorithm for online linear programming. arXiv preprint arXiv:2408.00465, 2024
-
[35]
George S. Lueker. Average-case analysis of off-line and on-line knapsack problems. Journal of Algorithms, 29 0 (2): 0 277--305, 1998
1998
-
[36]
Improvements and generalizations of stochastic knapsack and M arkovian bandit approximation algorithms
Will Ma. Improvements and generalizations of stochastic knapsack and M arkovian bandit approximation algorithms. Mathematics of Operations Research, 43 0 (3): 0 789--812, 2018
2018
-
[37]
Wanteng Ma, Ying Cao, Danny H. K. Tsang, and Dong Xia. Optimal regularized online allocation by adaptive re-solving. Operations Research, 73 0 (4): 0 2079--2096, 2025
2079
-
[38]
Stochastic on-line knapsack problems
Alberto Marchetti-Spaccamela and Carlo Vercellis. Stochastic on-line knapsack problems. Mathematical Programming, 68: 0 73--104, 1995
1995
-
[39]
Reiman and Qiong Wang
Martin I. Reiman and Qiong Wang. An asymptotically optimal policy for a quantity-based network revenue management problem. Mathematics of Operations Research, 33 0 (2): 0 257--282, 2008
2008
-
[40]
Talluri and Garrett J
Kalyan T. Talluri and Garrett J. van Ryzin. The Theory and Practice of Revenue Management. Springer, 2004
2004
-
[41]
The Bayesian prophet: A low-regret framework for online decision making
Alberto Vera and Siddhartha Banerjee. The Bayesian prophet: A low-regret framework for online decision making. Management Science, 67 0 (3): 0 1368--1391, 2021
2021
-
[42]
Online allocation and pricing: Constant regret via B ellman inequalities
Alberto Vera, Siddhartha Banerjee, and Itai Gurvich. Online allocation and pricing: Constant regret via B ellman inequalities. Operations Research, 69 0 (3): 0 821--840, 2021
2021
-
[43]
Constant regret primal-dual policy for multi-way dynamic matching
Yifan Wei, Jinzhi Xu, and Shuanghao Yu. Constant regret primal-dual policy for multi-way dynamic matching. Working paper, SSRN 4357216, 2023
2023
-
[44]
Tight lower bounds for the multi-secretary problem via B ellman certificates
Jiawei Zhang. Tight lower bounds for the multi-secretary problem via B ellman certificates. SSRN working paper, abstract no. 6772762, posted May 22, 2026, revised June 3, 2026. Available at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6772762
2026
-
[45]
Online resource allocation with continuously distributed reward and consumption: unknown distributions case
Jiawei Zhang. Online resource allocation with continuously distributed reward and consumption: unknown distributions case. Working paper, 2026
2026
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.