Mitigating overthinking in large reasoning models via manifold steering

Yao Huang, Huanran Chen, Shouwei Ruan, Yichi Zhang, Xingxing Wei, Yinpeng Dong · 2025 · arXiv 2505.22411

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: background 2

citation-polarity summary: background 2

representative citing papers

Benchmarking and Evaluating VLMs for Software Architecture Diagram Understanding

cs.SE · 2026-04-05 · accept · novelty 7.0

SADU benchmark shows top VLMs reach only 70% accuracy on software architecture diagram tasks, revealing gaps in visual reasoning for engineering artifacts.

Nice Fold or Hero Call: Learning Budget-Efficient Thinking for Adaptive Reasoning

cs.AI · 2026-05-12 · unverdicted · novelty 6.0

BET reduces reasoning tokens by about 55% on average while improving performance across benchmarks by learning to short-solve easy queries, fold early on unsolvable ones, and preserve budget for hard solvable queries.

CLEAR: Revealing How Noise and Ambiguity Degrade Reliability in LLMs for Medicine

cs.CL · 2026-05-01 · unverdicted · novelty 6.0 · 2 refs

CLEAR reveals that LLMs' accuracy on medical questions drops and their 'humility deficit' grows as the number of plausible answers increases and abstention options shift from assertive to uncertain phrasing.

Reasoning Models Don't Just Think Longer, They Move Differently

cs.CL · 2026-05-14 · unverdicted · novelty 5.0

After length-correcting hidden-state trajectories during chain-of-thought, reasoning models show systematically different geometry on harder problems than baselines, strongest in competitive programming.

citing papers explorer

Benchmarking and Evaluating VLMs for Software Architecture Diagram Understanding · cs.SE · 2026-04-05 · accept · none · ref 27
Nice Fold or Hero Call: Learning Budget-Efficient Thinking for Adaptive Reasoning · cs.AI · 2026-05-12 · unverdicted · none · ref 22
CLEAR: Revealing How Noise and Ambiguity Degrade Reliability in LLMs for Medicine · cs.CL · 2026-05-01 · unverdicted · none · ref 25 · 2 links
Reasoning Models Don't Just Think Longer, They Move Differently · cs.CL · 2026-05-14 · unverdicted · none · ref 1
