Wait, we don’t need to “wait”! Removing thinking tokens improves reasoning efficiency

Chenlong Wang, Yuanning Feng, Dongping Chen, Zhaoyang Chu, Ranjay Krishna, Tianyi Zhou · 2025 · DOI 10.18653/v1/2025.findings-emnlp.394

4 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Stop When Reasoning Converges: Semantic-Preserving Early Exit for Reasoning Models

cs.CL · 2026-05-17 · unverdicted · novelty 6.0

PUMA detects reasoning-level semantic redundancy to enable early exit in chains of thought, achieving 26.2% average token reduction across five LRMs and five benchmarks while preserving accuracy and CoT quality.

The Hot Mess of AI: How Does Misalignment Scale With Model Intelligence and Task Complexity?

cs.AI · 2026-01-30 · unverdicted · novelty 6.0

AI model failures on complex tasks become increasingly incoherent with longer reasoning chains, making consistent misalignment less likely than chaotic errors as capabilities scale.

Efficient Test-Time Scaling via Temporal Reasoning Aggregation

cs.AI · 2026-04-19 · unverdicted · novelty 5.0

TRACE aggregates answer consistency and confidence trajectory over multiple reasoning steps to decide when to halt inference, reducing token usage by 25-30% while keeping accuracy within 1-2% of full reasoning.

SAT: Balancing Reasoning Accuracy and Efficiency with Stepwise Adaptive Thinking

cs.AI · 2026-04-09 · unverdicted · novelty 5.0

SAT reduces reasoning tokens by up to 40% across multiple large reasoning models and benchmarks by adaptively pruning steps based on difficulty while maintaining or improving accuracy.

citing papers explorer

Stop When Reasoning Converges: Semantic-Preserving Early Exit for Reasoning Models · cs.CL · 2026-05-17 · unverdicted · none · ref 69

The Hot Mess of AI: How Does Misalignment Scale With Model Intelligence and Task Complexity? · cs.AI · 2026-01-30 · unverdicted · none · ref 9

Efficient Test-Time Scaling via Temporal Reasoning Aggregation · cs.AI · 2026-04-19 · unverdicted · none · ref 70

SAT: Balancing Reasoning Accuracy and Efficiency with Stepwise Adaptive Thinking · cs.AI · 2026-04-09 · unverdicted · none · ref 33
