BudgetThinker: Empowering Budget-aware LLM Reasoning with Control Tokens

Feifei Zhang; Hao Wen; Jie Wang; Liye Chen; Xinrui Wu; Ya-Qin Zhang; Yi Sun; Yuanchun Li; Yunhao Liu; Yunxin Liu

arxiv: 2508.17196 · v2 · pith:MI3IKDKLnew · submitted 2025-08-24 · 💻 cs.LG · cs.AI


Hao Wen, Xinrui Wu, Yi Sun, Feifei Zhang, Liye Chen, Jie Wang, Yunxin Liu, Yunhao Liu, Ya-Qin Zhang, Yuanchun Li


keywords: reasoning, budget, budgetthinker, control, budget-aware, effective, llms, model


Recent advancements in Large Language Models (LLMs) have leveraged increased test-time computation to enhance reasoning capabilities, a strategy that, while effective, incurs significant latency and resource costs, limiting their applicability in real-world time-constrained or cost-sensitive scenarios. This paper introduces BudgetThinker, a novel framework designed to empower LLMs with budget-aware reasoning, enabling precise control over the length of their thought processes. We propose a methodology that periodically inserts special control tokens during inference to continuously inform the model of its remaining token budget. This approach is coupled with a comprehensive two-stage training pipeline, beginning with Supervised Fine-Tuning (SFT) to familiarize the model with budget constraints, followed by a curriculum-based Reinforcement Learning (RL) phase that utilizes a length-aware reward function to optimize for both accuracy and budget adherence. We demonstrate that BudgetThinker significantly surpasses strong baselines in maintaining performance across a variety of reasoning budgets on challenging mathematical benchmarks. Our method provides a scalable and effective solution for developing efficient and controllable LLM reasoning, making advanced models more practical for deployment in resource-constrained and real-time environments.
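The two mechanisms named in the abstract, periodic control tokens that report the remaining budget and a length-aware reward, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, since the abstract does not specify it: the `<remaining:N>` token format, the fixed insertion `interval`, the choice not to count control tokens against the budget, the linear overshoot penalty, and the `next_token` callable that stands in for a real model.

```python
from typing import Callable, List


def control_token(remaining: int) -> str:
    # Hypothetical token format; the abstract does not define one.
    return f"<remaining:{remaining}>"


def generate_with_budget(
    next_token: Callable[[List[str]], str],
    prompt: List[str],
    budget: int,
    interval: int,
    eos: str = "<eos>",
) -> List[str]:
    """Decode up to `budget` tokens, inserting a control token every
    `interval` generated tokens to report the remaining budget.

    Assumption: control tokens do not count against the budget.
    """
    context = list(prompt)
    output: List[str] = []
    used = 0
    while used < budget:
        if used % interval == 0:
            marker = control_token(budget - used)
            context.append(marker)
            output.append(marker)
        tok = next_token(context)
        if tok == eos:
            break
        context.append(tok)
        output.append(tok)
        used += 1
    return output


def length_aware_reward(
    correct: bool, used: int, budget: int, penalty: float = 1.0
) -> float:
    """Illustrative reward: 1 for a correct answer, minus a penalty
    proportional to the relative overshoot of the budget."""
    overshoot = max(0, used - budget) / budget
    return (1.0 if correct else 0.0) - penalty * overshoot


if __name__ == "__main__":
    # Stand-in "model" that always emits the same token.
    trace = generate_with_budget(lambda ctx: "x", ["Q"], budget=6, interval=3)
    print(trace)
    print(length_aware_reward(True, used=120, budget=100))
```

With the stand-in model, the trace interleaves one control token before every three generated tokens, so the model's context always contains a recent statement of how much budget is left. The reward is unchanged when the response stays within budget and decreases linearly once it exceeds it.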


Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Nice Fold or Hero Call: Learning Budget-Efficient Thinking for Adaptive Reasoning
cs.AI 2026-05 unverdicted novelty 6.0

BET reduces reasoning tokens by about 55% on average while improving performance across benchmarks by learning to short-solve easy queries, fold early on unsolvable ones, and preserve budget for hard solvable queries.

Behavior Cue Reasoning: Monitorable Reasoning Improves Efficiency and Safety through Oversight
cs.AI 2026-05 unverdicted novelty 6.0

Behavior Cue Reasoning trains LLMs to emit special tokens before behaviors, enabling monitors to prune up to 50% of wasted tokens and recover safe actions from 80% of unsafe traces, more than doubling success rates wi...

Behavior Cue Reasoning: Monitorable Reasoning Improves Efficiency and Safety through Oversight
cs.AI 2026-05 conditional novelty 6.0

Behavior Cue Reasoning trains LLMs to emit special tokens before behaviors, enabling monitors to cut up to 50% wasted reasoning tokens and recover safe actions from 80% of unsafe traces, more than doubling success rat...