Dynamic Rollout Editing reduces overthinking in RL-trained LLMs by editing post-answer continuations in successful rollouts and preferring the edited versions within GRPO groups.
Answer convergence as a signal for early stopping in reasoning.arXiv preprint arXiv:2506.02536
7 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 7representative citing papers
RecurGuard monitors recurrence rate, volume growth, and query progress in exposed reasoning traces to terminate generation on token-consumption attacks, reporting 99% detection on OverThink and 92% on ExtendAttack with near-zero false positives.
RL-trained lightweight controller using answer statistics improves trade-offs among correctness, latency, and total samples in adaptive sampling for LLM test-time scaling.
Conformal risk control with upper and lower thresholds lets LLMs adaptively stop reasoning while guaranteeing a maximum error rate and minimizing token use.
Entropy After </Think> (EAT) enables early exiting in reasoning LLMs by tracking entropy stabilization after a </think> token, cutting token use 12-22% on MATH500 and AIME2025 with no accuracy loss.
TRACE aggregates answer consistency and confidence trajectory over multiple reasoning steps to decide when to halt inference, reducing token usage by 25-30% while keeping accuracy within 1-2% of full reasoning.
DTSR enables large reasoning models to dynamically assess chain-of-thought sufficiency via reflection signals and a sufficiency check, reducing reasoning length by 28.9-34.9% with minimal performance loss on Qwen3 models.
citing papers explorer
-
Entropy After </Think> for reasoning model early exiting
Entropy After </Think> (EAT) enables early exiting in reasoning LLMs by tracking entropy stabilization after a </think> token, cutting token use 12-22% on MATH500 and AIME2025 with no accuracy loss.