A Lagrangian-relaxation plus imitation-learning pipeline adaptively allocates test-time compute to LLMs, outperforming uniform baselines by up to 12.8% relative accuracy on MATH while staying within a fixed average budget.
Title resolution pending
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.LG 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Adaptive Test-Time Compute Allocation for Reasoning LLMs via Constrained Policy Optimization
A Lagrangian-relaxation plus imitation-learning pipeline adaptively allocates test-time compute to LLMs, outperforming uniform baselines by up to 12.8% relative accuracy on MATH while staying within a fixed average budget.