LIMO achieves 63.3% on AIME24 and 95.6% on MATH500 via supervised fine-tuning on roughly 1% of the data used by prior models, supporting the claim that minimal strategic examples suffice when pre-training has already encoded domain knowledge.
Can large language models reason and plan? , volume=
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 3roles
background 1polarities
background 1representative citing papers
IFPV integrates multi-perspective hierarchical agents for generative planning with an adversarial cognitive simulation engine for verification, reporting 19.4% higher mission success, 41.7% lower cost versus LLM baseline, and 31.8% higher suppression versus rule-based validation in combat simulation
A survey of 87 agents for computer use and 33 datasets that introduces a three-dimensional taxonomy across domain, interaction, and agent perspectives and identifies six research gaps.
citing papers explorer
-
LIMO: Less is More for Reasoning
LIMO achieves 63.3% on AIME24 and 95.6% on MATH500 via supervised fine-tuning on roughly 1% of the data used by prior models, supporting the claim that minimal strategic examples suffice when pre-training has already encoded domain knowledge.
-
IFPV: An Integrated Multi-Agent Framework for Generative Operational Planning and High-Fidelity Plan Verification
IFPV integrates multi-perspective hierarchical agents for generative planning with an adversarial cognitive simulation engine for verification, reporting 19.4% higher mission success, 41.7% lower cost versus LLM baseline, and 31.8% higher suppression versus rule-based validation in combat simulation
-
A Comprehensive Survey of Agents for Computer Use: Foundations, Challenges, and Future Directions
A survey of 87 agents for computer use and 33 datasets that introduces a three-dimensional taxonomy across domain, interaction, and agent perspectives and identifies six research gaps.