
arxiv: 2601.07036 · v2 · pith:4QTCFJWSnew · submitted 2026-01-11 · 💻 cs.CL · cs.AI · cs.LG

Mid-Think: Training-Free Intermediate-Budget Reasoning via Token-Level Triggers

classification: cs.CL, cs.AI, cs.LG
keywords: reasoning, mid-think, training, behavior, controlled, instructions, intermediate-budget, prompting

Hybrid reasoning language models are commonly controlled through high-level Think/No-think instructions that regulate reasoning behavior, yet we find that such mode switching is largely driven by a small set of trigger tokens rather than by the instructions themselves. Through attention analysis and controlled prompting experiments, we show that a leading "Okay" token induces reasoning behavior, while the newline pattern following "</think>" suppresses it. Based on this observation, we propose Mid-Think, a simple training-free prompting format that combines these triggers to achieve intermediate-budget reasoning, consistently outperforming fixed-token and prompt-based baselines on the accuracy-length trade-off. Furthermore, applying Mid-Think to RL training after SFT reduces training time by approximately 15% while improving the final performance of Qwen3-8B on AIME from 69.8% to 72.4% and on GPQA from 58.5% to 61.1%, demonstrating its effectiveness for both inference-time control and RL-based reasoning training.
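The trigger idea in the abstract can be illustrated with a minimal sketch of how an assistant-turn prefix might be assembled for each mode. The abstract does not give the exact Mid-Think string, so the `mid_think` layout below is an assumption, as are the function name `build_assistant_prefix` and the mode labels. The `no_think` layout follows the empty think block that Qwen3-style chat templates emit when thinking is disabled.

```python
# Hypothetical sketch: prefix strings pre-filled at the start of the
# assistant turn. Only the two triggers (a leading "Okay" and the newline
# pattern after "</think>") come from the abstract; the exact layouts,
# especially for "mid_think", are assumptions for illustration.

THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"


def build_assistant_prefix(mode: str) -> str:
    """Return the text to pre-fill before the model starts generating."""
    if mode == "think":
        # A leading "Okay" is the token the abstract says induces reasoning.
        return f"{THINK_OPEN}\nOkay"
    if mode == "no_think":
        # Empty think block followed by newlines: the suppressing pattern.
        return f"{THINK_OPEN}\n\n{THINK_CLOSE}\n\n"
    if mode == "mid_think":
        # ASSUMPTION: combine both triggers by placing "Okay" inside a
        # think block that is closed immediately, then the newline pattern.
        return f"{THINK_OPEN}\nOkay\n{THINK_CLOSE}\n\n"
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    for mode in ("think", "no_think", "mid_think"):
        # repr() makes the newline characters visible.
        print(mode, repr(build_assistant_prefix(mode)))
```

In practice such a prefix would be appended to the rendered chat prompt before generation; how the paper actually formats and evaluates the prompt should be taken from the full text rather than from this sketch.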


Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. CausalGuard: Conformal Inference under Graph Uncertainty

    cs.LG 2026-05 unverdicted novelty 6.0

    CausalGuard aggregates LLM-proposed and data-pruned DAGs to weight doubly robust pseudo-outcomes and applies conformal calibration to deliver finite-sample marginal coverage for conditional average treatment effects u...

  2. Reliability-Gated Source Anchoring for Continual Test-Time Adaptation

    cs.LG 2026-05 unverdicted novelty 6.0

    RMemSafe gates source anchoring via entropy in CTTA, reducing error by 1.05pp on ResNet-50 when source accuracy collapses and showing shallower degradation slope than prior methods.

  3. Reliability-Gated Source Anchoring for Continual Test-Time Adaptation

    cs.LG 2026-05 unverdicted novelty 6.0

    RMemSafe attenuates source anchoring via entropy gating when the frozen source model degrades, yielding lower error than prior methods on continual corruption benchmarks and shallower degradation under source failure.