pith. sign in

hub Canonical reference

Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs

Canonical reference. 93% of citing Pith papers cite this work as background.

43 Pith papers citing it
Background 93% of classified citations
abstract

The remarkable performance of models like the OpenAI o1 can be attributed to their ability to emulate human-like long-time thinking during inference. These models employ extended chain-of-thought (CoT) processes, exploring multiple strategies to enhance problem-solving capabilities. However, a critical question remains: How to intelligently and efficiently scale computational resources during testing. This paper presents the first comprehensive study on the prevalent issue of overthinking in these models, where excessive computational resources are allocated for simple problems with minimal benefit. We introduce novel efficiency metrics from both outcome and process perspectives to evaluate the rational use of computational resources by o1-like models. Using a self-training paradigm, we propose strategies to mitigate overthinking, streamlining reasoning processes without compromising accuracy. Experimental results show that our approach successfully reduces computational overhead while preserving model performance across a range of testsets with varying difficulty levels, such as GSM8K, MATH500, GPQA, and AIME.

hub tools

citation-role summary

background 15

citation-polarity summary

representative citing papers

HiRO-Nav: Hybrid ReasOning Enables Efficient Embodied Navigation

cs.AI · 2026-04-09 · unverdicted · novelty 6.0

HiRO-Nav adaptively triggers reasoning only on high-entropy actions via a hybrid training pipeline and shows better success-token trade-offs than always-reason or never-reason baselines on the CHORES-S benchmark.

Entropy After </Think> for reasoning model early exiting

cs.LG · 2025-09-30 · unverdicted · novelty 6.0

Entropy After </Think> (EAT) enables early exiting in reasoning LLMs by tracking entropy stabilization after a </think> token, cutting token use 12-22% on MATH500 and AIME2025 with no accuracy loss.

Thinking Sparks!: Emergent Attention Heads in Reasoning Models During Post Training

cs.AI · 2025-09-30 · unverdicted · novelty 6.0

Post-training on reasoning tasks sparks the emergence of specialized attention heads that enable structured computation, with SFT adding stable heads while GRPO uses dynamic activation and pruning tied to reward signals, and controllable think models relying on compensatory heads instead of specific

Muon is Scalable for LLM Training

cs.LG · 2025-02-24 · unverdicted · novelty 6.0

Muon optimizer with weight decay and update scaling achieves ~2x efficiency over AdamW for large LLMs, shown via the Moonlight 3B/16B MoE model trained on 5.7T tokens.

Reasoning Compression with Mixed-Policy Distillation

cs.AI · 2026-05-09 · unverdicted · novelty 5.0

Mixed-Policy Distillation transfers concise reasoning behavior from larger to smaller LLMs by having the teacher compress student-generated trajectories, cutting token usage up to 27% while raising benchmark scores.

citing papers explorer

Showing 43 of 43 citing papers.