hub

Between underthinking and overthinking: An empirical study of reasoning length and correctness in llms

Jinyan Su, Jennifer Healey, Preslav Nakov, Claire Cardie · 2025 · arXiv 2505.00127

12 Pith papers cite this work. Polarity classification is still indexing.

12 Pith papers citing it

read on arXiv browse 12 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

On the Cost and Benefit of Chain of Thought: A Learning-Theoretic Perspective

cs.LG · 2026-05-20 · unverdicted · novelty 7.0

Chain of Thought risk decomposes into oracle-trajectory benefit and trajectory-mismatch cost, with stability determining bounded, linear, or exponential error growth.

GRASP: Learning to Ground Social Reasoning in Multi-Person Non-Verbal Interactions

cs.CV · 2026-05-15 · unverdicted · novelty 7.0

GRASP is a large-scale dataset and benchmark for social reasoning grounded in gaze and gesture events in multi-person videos, with Social Grounding Reward (SGR) proposed to improve model performance on GRASP-Bench.

LACO: Adaptive Latent Communication for Collaborative Driving

cs.AI · 2026-05-21 · unverdicted · novelty 6.0

LACO introduces Iterative Latent Deliberation, Cross-Horizon Saliency Attribution, and Structured Semantic Knowledge Distillation to enable low-latency latent communication in collaborative driving while preserving performance in CARLA simulations.

CLORE: Content-Level Optimization for Reasoning Efficiency

cs.AI · 2026-05-21 · unverdicted · novelty 6.0

CLORE augments correct on-policy rollouts by deleting repetitive and irrelevant segments then optimizes with auxiliary DPO to improve accuracy-efficiency trade-off on math benchmarks.

Efficient Agentic Reasoning Through Self-Regulated Simulative Planning

cs.AI · 2026-05-21 · unverdicted · novelty 6.0

SR²AM achieves competitive Pass@1 accuracy on diverse tasks with 25.8-95.3% fewer reasoning tokens than much larger models by using self-regulated simulative planning trained via supervised learning and RL.

Stop When Reasoning Converges: Semantic-Preserving Early Exit for Reasoning Models

cs.CL · 2026-05-17 · unverdicted · novelty 6.0

PUMA detects reasoning-level semantic redundancy to enable early exit in chains of thought, achieving 26.2% average token reduction across five LRMs and five benchmarks while preserving accuracy and CoT quality.

Implicit Compression Regularization: Concise Reasoning via Internal Shorter Distributions in RL Post-Training

cs.AI · 2026-05-08 · unverdicted · novelty 6.0

ICR creates a virtual shorter distribution from shortest correct on-policy responses to regularize RL post-training toward concise yet accurate reasoning, improving the accuracy-length Pareto frontier on math and knowledge benchmarks.

Reasoning Models Don't Just Think Longer, They Move Differently

cs.CL · 2026-05-14 · unverdicted · novelty 5.0

After length-correcting hidden-state trajectories during chain-of-thought, reasoning models show systematically different geometry on harder problems than baselines, strongest in competitive programming.

Investigating Thinking Behaviours of Reasoning-Based Language Models for Social Bias Mitigation

cs.CL · 2025-10-20 · unverdicted · novelty 5.0

Reasoning LLMs aggregate social biases through stereotype repetition and irrelevant information injection in their thinking processes, and a self-review prompt mitigates this on BBQ, StereoSet, and BOLD benchmarks.

Self-Aligned Reward: Towards Effective and Efficient Reasoners

cs.LG · 2025-09-05 · unverdicted · novelty 5.0

Self-aligned reward uses relative perplexity differences to encourage concise, query-specific reasoning in LLMs, yielding 4% accuracy gains and 30% lower inference cost when added to PPO or GRPO.

Towards Efficient Large Language Reasoning Models via Extreme-Ratio Chain-of-Thought Compression

cs.LG · 2026-02-09

The Shape of Reasoning: Topological Analysis of Reasoning Traces in Large Language Models

cs.AI · 2025-10-23

citing papers explorer

Showing 12 of 12 citing papers.

On the Cost and Benefit of Chain of Thought: A Learning-Theoretic Perspective cs.LG · 2026-05-20 · unverdicted · none · ref 84
Chain of Thought risk decomposes into oracle-trajectory benefit and trajectory-mismatch cost, with stability determining bounded, linear, or exponential error growth.
GRASP: Learning to Ground Social Reasoning in Multi-Person Non-Verbal Interactions cs.CV · 2026-05-15 · unverdicted · none · ref 73
GRASP is a large-scale dataset and benchmark for social reasoning grounded in gaze and gesture events in multi-person videos, with Social Grounding Reward (SGR) proposed to improve model performance on GRASP-Bench.
LACO: Adaptive Latent Communication for Collaborative Driving cs.AI · 2026-05-21 · unverdicted · none · ref 29
LACO introduces Iterative Latent Deliberation, Cross-Horizon Saliency Attribution, and Structured Semantic Knowledge Distillation to enable low-latency latent communication in collaborative driving while preserving performance in CARLA simulations.
CLORE: Content-Level Optimization for Reasoning Efficiency cs.AI · 2026-05-21 · unverdicted · none · ref 43
CLORE augments correct on-policy rollouts by deleting repetitive and irrelevant segments then optimizes with auxiliary DPO to improve accuracy-efficiency trade-off on math benchmarks.
Efficient Agentic Reasoning Through Self-Regulated Simulative Planning cs.AI · 2026-05-21 · unverdicted · none · ref 92
SR²AM achieves competitive Pass@1 accuracy on diverse tasks with 25.8-95.3% fewer reasoning tokens than much larger models by using self-regulated simulative planning trained via supervised learning and RL.
Stop When Reasoning Converges: Semantic-Preserving Early Exit for Reasoning Models cs.CL · 2026-05-17 · unverdicted · none · ref 50
PUMA detects reasoning-level semantic redundancy to enable early exit in chains of thought, achieving 26.2% average token reduction across five LRMs and five benchmarks while preserving accuracy and CoT quality.
Implicit Compression Regularization: Concise Reasoning via Internal Shorter Distributions in RL Post-Training cs.AI · 2026-05-08 · unverdicted · none · ref 5
ICR creates a virtual shorter distribution from shortest correct on-policy responses to regularize RL post-training toward concise yet accurate reasoning, improving the accuracy-length Pareto frontier on math and knowledge benchmarks.
Reasoning Models Don't Just Think Longer, They Move Differently cs.CL · 2026-05-14 · unverdicted · none · ref 6
After length-correcting hidden-state trajectories during chain-of-thought, reasoning models show systematically different geometry on harder problems than baselines, strongest in competitive programming.
Investigating Thinking Behaviours of Reasoning-Based Language Models for Social Bias Mitigation cs.CL · 2025-10-20 · unverdicted · none · ref 15
Reasoning LLMs aggregate social biases through stereotype repetition and irrelevant information injection in their thinking processes, and a self-review prompt mitigates this on BBQ, StereoSet, and BOLD benchmarks.
Self-Aligned Reward: Towards Effective and Efficient Reasoners cs.LG · 2025-09-05 · unverdicted · none · ref 41
Self-aligned reward uses relative perplexity differences to encourage concise, query-specific reasoning in LLMs, yielding 4% accuracy gains and 30% lower inference cost when added to PPO or GRPO.
Towards Efficient Large Language Reasoning Models via Extreme-Ratio Chain-of-Thought Compression cs.LG · 2026-02-09 · unreviewed · ref 27
The Shape of Reasoning: Topological Analysis of Reasoning Traces in Large Language Models cs.AI · 2025-10-23 · unreviewed · ref 13

Between underthinking and overthinking: An empirical study of reasoning length and correctness in llms

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer