Online resource allocation with stochastic resource consumption

Jiashuo Jiang, Jiawei Zhang · 2020 · arXiv 2012.07933

3 Pith papers cite this work.

representative citing papers

Tight Lower Bounds for the Multi-Secretary Problem via Bellman Certificates

cs.DS · 2026-07-02 · unverdicted · novelty 7.0

Establishes an Ω((log T)^2) lower bound on regret for the multi-secretary problem with gapped distributions via Bellman certificates, showing that prior O((log T)^2) upper bounds are tight.

Cross-Epoch Adaptive Rollout Optimization for RL Post-Training

cs.LG · 2026-06-04 · unverdicted · novelty 7.0

CERO uses Beta posteriors and Fenchel-dual online optimization to adaptively allocate a fixed rollout budget across prompts and epochs in LLM RL, outperforming fixed-allocation GRPO on math reasoning benchmarks.

Online Resource Allocation with Continuous Random Consumption: Regret under Degeneracy

cs.LG · 2026-07-02 · unverdicted · novelty 6.0

Derives lower and upper regret bounds for online resource allocation under continuous consumption using an active weighted-mass exponent p, attaining o(sqrt(T)) regret without non-degeneracy assumptions.

citing papers explorer

Tight Lower Bounds for the Multi-Secretary Problem via Bellman Certificates · cs.DS · 2026-07-02 · unverdicted · none · ref 23
Cross-Epoch Adaptive Rollout Optimization for RL Post-Training · cs.LG · 2026-06-04 · unverdicted · none · ref 5
Online Resource Allocation with Continuous Random Consumption: Regret under Degeneracy · cs.LG · 2026-07-02 · unverdicted · none · ref 28
