Between Underthinking and Overthinking: An Empirical Study of Reasoning Length and Correctness in LLMs. arXiv preprint

Jinyan Su, Jennifer Healey, Preslav Nakov, Claire Cardie · 2025 · arXiv 2505.00127

12 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

On the Cost and Benefit of Chain of Thought: A Learning-Theoretic Perspective

cs.LG · 2026-05-20 · unverdicted · novelty 7.0

Chain of Thought risk decomposes into oracle-trajectory benefit and trajectory-mismatch cost, with stability determining bounded, linear, or exponential error growth.

GRASP: Learning to Ground Social Reasoning in Multi-Person Non-Verbal Interactions

cs.CV · 2026-05-15 · unverdicted · novelty 7.0

GRASP is a large-scale dataset and benchmark for social reasoning grounded in gaze and gesture events in multi-person videos, with Social Grounding Reward (SGR) proposed to improve model performance on GRASP-Bench.

LACO: Adaptive Latent Communication for Collaborative Driving

cs.AI · 2026-05-21 · unverdicted · novelty 6.0

LACO introduces Iterative Latent Deliberation, Cross-Horizon Saliency Attribution, and Structured Semantic Knowledge Distillation to enable low-latency latent communication in collaborative driving while preserving performance in CARLA simulations.

CLORE: Content-Level Optimization for Reasoning Efficiency

cs.AI · 2026-05-21 · unverdicted · novelty 6.0

CLORE augments correct on-policy rollouts by deleting repetitive and irrelevant segments then optimizes with auxiliary DPO to improve accuracy-efficiency trade-off on math benchmarks.

Efficient Agentic Reasoning Through Self-Regulated Simulative Planning

cs.AI · 2026-05-21 · unverdicted · novelty 6.0

SR²AM achieves competitive Pass@1 accuracy on diverse tasks with 25.8-95.3% fewer reasoning tokens than much larger models by using self-regulated simulative planning trained via supervised learning and RL.

Stop When Reasoning Converges: Semantic-Preserving Early Exit for Reasoning Models

cs.CL · 2026-05-17 · unverdicted · novelty 6.0

PUMA detects reasoning-level semantic redundancy to enable early exit in chains of thought, achieving 26.2% average token reduction across five LRMs and five benchmarks while preserving accuracy and CoT quality.

Implicit Compression Regularization: Concise Reasoning via Internal Shorter Distributions in RL Post-Training

cs.AI · 2026-05-08 · unverdicted · novelty 6.0

ICR creates a virtual shorter distribution from shortest correct on-policy responses to regularize RL post-training toward concise yet accurate reasoning, improving the accuracy-length Pareto frontier on math and knowledge benchmarks.

Investigating Thinking Behaviours of Reasoning-Based Language Models for Social Bias Mitigation

cs.CL · 2025-10-20 · unverdicted · novelty 5.0

Reasoning LLMs aggregate social biases through stereotype repetition and irrelevant information injection in their thinking processes, and a self-review prompt mitigates this on BBQ, StereoSet, and BOLD benchmarks.

Self-Aligned Reward: Towards Effective and Efficient Reasoners

cs.LG · 2025-09-05 · unverdicted · novelty 5.0

Self-aligned reward uses relative perplexity differences to encourage concise, query-specific reasoning in LLMs, yielding 4% accuracy gains and 30% lower inference cost when added to PPO or GRPO.

Reasoning Models Don't Just Think Longer, They Move Differently

cs.CL · 2026-05-14

Towards Efficient Large Language Reasoning Models via Extreme-Ratio Chain-of-Thought Compression

cs.LG · 2026-02-09

The Shape of Reasoning: Topological Analysis of Reasoning Traces in Large Language Models

cs.AI · 2025-10-23