Online resource allocation with stochastic resource consumption
3 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
citing papers explorer
- Tight Lower Bounds for the Multi-Secretary Problem via Bellman Certificates
  Establishes Ω((log T)^2) lower bound on regret for multi-secretary problem with gapped distributions via Bellman certificates, showing prior O((log T)^2) upper bounds are tight.
- Cross-Epoch Adaptive Rollout Optimization for RL Post-Training
  CERO uses Beta posteriors and Fenchel-dual online optimization to adaptively allocate a fixed rollout budget across prompts and epochs in LLM RL, outperforming fixed-allocation GRPO on math reasoning benchmarks.
- Online Resource Allocation with Continuous Random Consumption: Regret under Degeneracy
  Derives regret lower and upper bounds for online resource allocation under continuous consumption using active weighted-mass exponent p, attaining o(sqrt(T)) regret without non-degeneracy assumptions.