o1-like models overthink easy tasks; self-training reduces compute use without accuracy loss on GSM8K, MATH500, GPQA, and AIME.
arXiv preprint arXiv:2404.12253 , year=
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 2polarities
background 2representative citing papers
Self-evolving LLM agents exhibit capability erosion under continual adaptation, which Capability-Preserving Evolution mitigates by raising retained simple-task performance from 41.8% to 52.8% in workflow evolution under GPT-5.1.
Repeated sampling scales problem coverage log-linearly with sample count, improving SWE-bench Lite performance from 15.9% to 56% using 250 samples.
OmegaPRM automates collection of 1.5 million process supervision labels via binary-search MCTS, raising Gemini Pro math accuracy from 51% to 69.4% on MATH500 and Gemma2 27B from 42.3% to 58.2%.
A literature survey that introduces a taxonomy for LLM reasoning paradigms, analyzes methodological trends, and synthesizes failure modes from over 300 papers.
citing papers explorer
No citing papers match the current filters.