pith. machine review for the scientific record.

arxiv: 2601.12538 · v1 · submitted 2026-01-18 · 💻 cs.AI · cs.CL

Recognition: 2 theorem links · Lean Theorem

Agentic Reasoning for Large Language Models

Authors on Pith: no claims yet

Pith reviewed 2026-05-17 15:08 UTC · model grok-4.3

classification 💻 cs.AI cs.CL
keywords agentic reasoning · large language models · autonomous agents · planning and tool use · self-evolving agents · multi-agent systems · in-context reasoning · reinforcement learning

The pith

Agentic reasoning turns large language models into autonomous agents that plan, act, and adapt through interaction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This survey organizes methods that let large language models function as agents in open and changing environments instead of closed problems. It divides the approaches into three layers: foundational capabilities for planning and tool use in stable settings, self-evolving processes where agents improve through feedback and memory, and collective systems where multiple agents coordinate and share knowledge. The work also separates in-context orchestration used at test time from optimization through training, and it reviews applications across science, robotics, and healthcare. It ends by identifying open problems such as personalization and long-horizon interaction needed for practical use.
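The plan-act-adapt loop that the survey's foundational layer describes can be sketched as a toy Python loop. The tool registry, the hard-coded single-step planner, and the stopping rule below are illustrative stand-ins, not code from the paper; a real agent would replace `agent_step` with a language-model call.

```python
# Minimal sketch of a foundational agentic loop: plan a tool call,
# act on the environment, observe the result, and carry the history
# forward so later steps can adapt. All names are hypothetical.

def calculator(expression: str) -> str:
    """A toy tool the agent can invoke (arithmetic only)."""
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def agent_step(goal: str, history: list) -> tuple[str, str]:
    """Stand-in for an LLM policy: choose a tool and its argument.

    A real agent would prompt a model with the goal and the
    accumulated history; here we hard-code a one-step plan.
    """
    return "calculator", goal

def run_agent(goal: str, max_steps: int = 3) -> str:
    history = []
    for _ in range(max_steps):
        tool, arg = agent_step(goal, history)   # plan
        observation = TOOLS[tool](arg)          # act
        history.append((tool, arg, observation))  # adapt via context
        if observation:                         # toy stopping criterion
            return observation
    return ""

print(run_agent("2 + 3 * 4"))  # → 14
```

The same skeleton generalizes: adding entries to `TOOLS` and making `agent_step` model-driven recovers the planning-plus-tool-use pattern the survey groups under its foundational layer.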

Core claim

Agentic reasoning reframes large language models as autonomous agents that plan, act, and learn through continual interaction with their environments. The survey organizes this capability along three complementary dimensions: foundational agentic reasoning that establishes core single-agent skills including planning, tool use, and search in stable environments; self-evolving agentic reasoning that studies refinement through feedback, memory, and adaptation; and collective multi-agent reasoning that extends intelligence to collaborative coordination, knowledge sharing, and shared goals. These layers are further split into in-context reasoning, which scales test-time interaction through structured orchestration, and post-training reasoning, which optimizes behaviors via reinforcement learning and supervised fine-tuning.

What carries the argument

The three complementary dimensions—foundational agentic reasoning for core single-agent capabilities, self-evolving agentic reasoning for refinement through feedback and adaptation, and collective multi-agent reasoning for coordination and shared goals—organize the field and bridge thought with action.

If this is right

  • Foundational methods support reliable planning and tool use by single agents in stable environments.
  • Self-evolving techniques enable agents to improve their own performance using memory and feedback over repeated interactions.
  • Collective reasoning allows multiple agents to coordinate actions and share knowledge toward common objectives.
  • Applications in robotics, healthcare, and autonomous research follow directly from applying the organized roadmap.
  • Open challenges in long-horizon interaction and scalable multi-agent training must be resolved for broader deployment.
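The self-evolving bullet above, where agents improve using memory and feedback over repeated interactions, can be sketched as a toy loop in which episode outcomes update an episodic memory that biases future actions. The environment, action names, and scoring rule are hypothetical illustrations, not mechanisms from the survey.

```python
# Sketch of the self-evolving layer: an agent keeps a memory of
# action -> running success score, and prefers actions that have
# succeeded before. Environment and actions are toy stand-ins.

class SelfEvolvingAgent:
    def __init__(self, actions):
        self.actions = list(actions)
        self.memory = {a: 0.0 for a in self.actions}  # episodic feedback store

    def act(self) -> str:
        # Exploit what the memory has accumulated so far.
        return max(self.actions, key=lambda a: self.memory[a])

    def update(self, action: str, reward: float) -> None:
        # Environment feedback refines future behavior.
        self.memory[action] += reward

def environment(action: str) -> float:
    # Toy environment: only "search" is rewarded.
    return 1.0 if action == "search" else -1.0

agent = SelfEvolvingAgent(["guess", "search", "wait"])
for _ in range(5):                 # repeated interaction episodes
    a = agent.act()
    agent.update(a, environment(a))

print(agent.act())  # converges to "search"
```

Swapping the scalar score for stored textual reflections, as in Reflexion-style methods the survey covers, keeps the same loop structure while moving the adaptation into the agent's context rather than a numeric table.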

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The three-dimension roadmap could guide researchers in systematically identifying gaps for personalization of agent behaviors.
  • Integrating explicit world modeling may emerge naturally as an extension of the foundational and self-evolving layers.
  • Governance requirements for real-world agents might be derived from the coordination mechanisms in the collective dimension.
  • Testable extensions could involve applying the in-context versus post-training split to new benchmarks in mathematics or science.

Load-bearing premise

The three dimensions of foundational, self-evolving, and collective agentic reasoning cover the entire field comprehensively without significant overlap or omission.

What would settle it

Discovery of a major agentic reasoning method or framework that requires a fourth distinct category or shows substantial overlap across the proposed dimensions would challenge the survey's organizational structure.

read the original abstract

Reasoning is a fundamental cognitive process underlying inference, problem-solving, and decision-making. While large language models (LLMs) demonstrate strong reasoning capabilities in closed-world settings, they struggle in open-ended and dynamic environments. Agentic reasoning marks a paradigm shift by reframing LLMs as autonomous agents that plan, act, and learn through continual interaction. In this survey, we organize agentic reasoning along three complementary dimensions. First, we characterize environmental dynamics through three layers: foundational agentic reasoning, which establishes core single-agent capabilities including planning, tool use, and search in stable environments; self-evolving agentic reasoning, which studies how agents refine these capabilities through feedback, memory, and adaptation; and collective multi-agent reasoning, which extends intelligence to collaborative settings involving coordination, knowledge sharing, and shared goals. Across these layers, we distinguish in-context reasoning, which scales test-time interaction through structured orchestration, from post-training reasoning, which optimizes behaviors via reinforcement learning and supervised fine-tuning. We further review representative agentic reasoning frameworks across real-world applications and benchmarks, including science, robotics, healthcare, autonomous research, and mathematics. This survey synthesizes agentic reasoning methods into a unified roadmap bridging thought and action, and outlines open challenges and future directions, including personalization, long-horizon interaction, world modeling, scalable multi-agent training, and governance for real-world deployment.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

0 major / 2 minor

Summary. This survey organizes agentic reasoning for LLMs along three complementary dimensions: foundational agentic reasoning establishing core single-agent capabilities (planning, tool use, search) in stable environments; self-evolving agentic reasoning focusing on refinement via feedback, memory, and adaptation; and collective multi-agent reasoning addressing coordination, knowledge sharing, and shared goals. It distinguishes in-context orchestration from post-training optimization, reviews frameworks in applications like science, robotics, healthcare, autonomous research, and mathematics, and outlines open challenges such as personalization, long-horizon interaction, world modeling, scalable multi-agent training, and governance.

Significance. If the taxonomy holds, this survey makes a useful contribution by synthesizing a rapidly growing literature into a unified roadmap that connects reasoning processes with agentic action. The explicit separation of in-context scaling from post-training optimization provides a practical lens for comparing approaches, and the enumeration of concrete open challenges (personalization, world modeling, governance) supplies clear signposts for future work. The review of domain-specific frameworks adds concrete grounding to the high-level structure.

minor comments (2)
  1. [Introduction] The positioning of the three dimensions as complementary and comprehensive would be clearer if the manuscript briefly noted selection criteria for the taxonomy and acknowledged possible boundary overlaps (e.g., adaptive multi-agent systems) rather than treating the partition as self-evident.
  2. [Applications and benchmarks] A compact summary table mapping representative frameworks to the three dimensions, listing primary techniques and benchmark results, would make the review more scannable and allow readers to assess coverage at a glance.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of our survey and for the recommendation of minor revision. The referee's summary accurately captures the three-layer taxonomy (foundational, self-evolving, and collective), the distinction between in-context orchestration and post-training optimization, and the enumerated open challenges. We appreciate the recognition that this structure provides a useful roadmap connecting reasoning processes with agentic action.

Circularity Check

0 steps flagged

No significant circularity in this literature survey

full rationale

This paper is a literature survey synthesizing existing agentic reasoning methods for LLMs into a high-level roadmap. It organizes the field along three complementary dimensions (foundational, self-evolving, and collective) and distinguishes in-context from post-training approaches, but presents this taxonomy explicitly as an editorial organizing lens drawn from external references rather than a derived result. No mathematical derivations, equations, fitted parameters, predictions, or uniqueness theorems appear in the manuscript. All claims are positioned as reviews of prior work, with no self-citation chains or self-definitional reductions that would make the central synthesis equivalent to its inputs by construction. The paper is therefore self-contained against external benchmarks and receives a score of zero.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

This is a literature survey containing no new free parameters, invented entities, or original axioms beyond standard assumptions in AI research. All substantive claims rest on the body of cited prior work.

axioms (1)
  • domain assumption: LLMs can be reframed as autonomous agents capable of planning, acting, and learning through interaction in dynamic environments
    Presented in the abstract as the core paradigm shift underlying the entire survey.

pith-pipeline@v0.9.0 · 5638 in / 1313 out tokens · 64578 ms · 2026-05-17T15:08:51.511537+00:00 · methodology

discussion (0)


Forward citations

Cited by 19 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Hackers or Hallucinators? A Comprehensive Analysis of LLM-Based Automated Penetration Testing

    cs.CR 2026-04 unverdicted novelty 8.0

    The first SoK on LLM-based AutoPT frameworks provides a six-dimension taxonomy of agent designs and a unified empirical benchmark evaluating 15 frameworks via over 10 billion tokens and 1,500 manually reviewed logs.

  2. ClawForge: Generating Executable Interactive Benchmarks for Command-Line Agents

    cs.AI 2026-05 conditional novelty 7.0

    ClawForge supplies a generator that turns scenario templates into reproducible command-line tasks testing state conflict handling, where the strongest frontier model scores only 45.3 percent strict accuracy.

  3. Learning Agentic Policy from Action Guidance

    cs.CL 2026-05 unverdicted novelty 7.0

    ActGuide-RL uses human action data as plan-style guidance in mixed-policy RL to overcome exploration barriers in LLM agents, matching SFT+RL performance on search benchmarks without cold-start training.

  4. MAP: A Map-then-Act Paradigm for Long-Horizon Interactive Agent Reasoning

    cs.AI 2026-05 unverdicted novelty 6.0

    MAP improves LLM agent reasoning by constructing a structured cognitive map of the environment before task execution, yielding performance gains on benchmarks like ARC-AGI-3 and superior training data via the new MAP-...

  5. The Bystander Effect in Multi-Agent Reasoning: Quantifying Cognitive Loafing in Collaborative Interactions

    cs.MA 2026-05 unverdicted novelty 6.0

    Multi-agent LLM interactions induce cognitive loafing via a formalized Interaction Depth Limit and Sovereignty Gap, where models subjugate correct derivations to social compliance, with lead agent identity disproporti...

  6. Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning

    cs.AI 2026-05 unverdicted novelty 6.0

    Skill1 trains one policy to jointly evolve skill query generation, re-ranking, task solving, and distillation from a single task-success signal, with low-frequency trends crediting selection and high-frequency variati...

  7. Confidence Estimation in Automatic Short Answer Grading with LLMs

    cs.CL 2026-04 unverdicted novelty 6.0

    A hybrid confidence framework for LLM-based short answer grading combines model signals with aleatoric uncertainty from semantic clustering of responses and improves selective grading reliability over single-source methods.

  8. HypEHR: Hyperbolic Modeling of Electronic Health Records for Efficient Question Answering

    cs.AI 2026-04 unverdicted novelty 6.0

    HypEHR is a hyperbolic embedding model for EHR data that uses Lorentzian geometry and hierarchy-aware pretraining to answer clinical questions nearly as well as large language models but with much smaller size.

  9. Agentic Frameworks for Reasoning Tasks: An Empirical Study

    cs.AI 2026-04 unverdicted novelty 6.0

    An empirical evaluation of 22 agentic frameworks on BBH, GSM8K, and ARC benchmarks shows stable performance in 12 frameworks but highlights orchestration failures and weaker mathematical reasoning.

  10. Mixture of Sequence: Theme-Aware Mixture-of-Experts for Long-Sequence Recommendation

    cs.IR 2026-03 unverdicted novelty 6.0

    MoS applies theme-aware routing to extract multi-scale theme-specific subsequences from noisy long user sequences, achieving state-of-the-art recommendation performance with fewer FLOPs than comparable MoE models.

  11. M2A: Synergizing Mathematical and Agentic Reasoning in Large Language Models

    cs.AI 2026-05 unverdicted novelty 5.0

    M2A uses null-space model merging to combine mathematical and agentic reasoning in LLMs, raising SWE-Bench Verified performance from 44.0% to 51.2% on Qwen3-8B without retraining.

  12. Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning

    cs.AI 2026-05 unverdicted novelty 5.0

    Skill1 trains a single RL policy to co-evolve skill selection, utilization, and distillation in language model agents from one task-outcome reward, using low-frequency trends to credit selection and high-frequency var...

  13. Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning

    cs.AI 2026-05 unverdicted novelty 5.0

    Skill1 co-evolves skill selection, utilization, and distillation inside a single policy using only task-outcome reward, with low-frequency trends crediting selection and high-frequency variation crediting distillation...

  14. Confidence Estimation in Automatic Short Answer Grading with LLMs

    cs.CL 2026-04 unverdicted novelty 5.0

    A hybrid confidence framework for LLM-based automatic short answer grading integrates model-based signals with aleatoric uncertainty from semantic clustering of responses and yields more reliable estimates than single...

  15. Heterogeneous Scientific Foundation Model Collaboration

    cs.AI 2026-04 unverdicted novelty 5.0

    Eywa enables language-based agentic AI systems to collaborate with specialized scientific foundation models for improved performance on structured data tasks.

  16. TDD Governance for Multi-Agent Code Generation via Prompt Engineering

    cs.SE 2026-04 unverdicted novelty 5.0

    An AI-native TDD framework operationalizes classical TDD principles as prompt-level and workflow-level governance mechanisms in a layered multi-agent architecture to improve stability and reproducibility of LLM code g...

  17. Reinforcement Learning for LLM-based Multi-Agent Systems through Orchestration Traces

    cs.CL 2026-05 unverdicted novelty 4.0

    This survey organizes RL for LLM multi-agent systems into reward families, credit units, and five orchestration sub-decisions, notes the absence of explicit stopping-decision training in its paper pool, and releases a...

  18. WebUncertainty: Dual-Level Uncertainty Driven Planning and Reasoning For Autonomous Web Agent

    cs.AI 2026-04 unverdicted novelty 4.0

    WebUncertainty improves web agent performance on benchmarks by adaptively selecting planning modes based on task uncertainty and using confidence-induced action uncertainty in MCTS to quantify aleatoric and epistemic ...

  19. ActionNex: A Virtual Outage Manager for Cloud Computing

    cs.AI 2026-04 unverdicted novelty 4.0

    ActionNex is an agentic system for cloud outage management that compresses multimodal signals into critical events, uses hierarchical memory for reasoning, and recommends actions with 71.4% precision on real Azure outages.

Reference graph

Works this paper leans on

300 extracted references · 300 canonical work pages · cited by 16 Pith papers · 55 internal anchors

  1. [1]

    Chain-of-thought prompting elicits reasoning in large language models.Advances in neural information processing systems, 35:24824–24837, 2022

    Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. Chain-of-thought prompting elicits reasoning in large language models.Advances in neural information processing systems, 35:24824–24837, 2022

  2. [2]

    Least-to-Most Prompting Enables Complex Reasoning in Large Language Models

    Denny Zhou, Nathanael Schärli, Le Hou, Jason Wei, Nathan Scales, Xuezhi Wang, Dale Schuurmans, Claire Cui, Olivier Bousquet, Quoc Le, et al. Least-to-most prompting enables complex reasoning in large language models.arXiv preprint arXiv:2205.10625, 2022

  3. [3]

    Pal: Program-aided language models

    Luyu Gao, Aman Madaan, Shuyan Zhou, Uri Alon, Pengfei Liu, Yiming Yang, Jamie Callan, and Graham Neubig. Pal: Program-aided language models. InInternational Conference on Machine Learning, pages 10764–10799. PMLR, 2023

  4. [4]

    Tree of thoughts: Deliberate problem solving with large language models.Advances in neural information processing systems, 36:11809–11822, 2023

    Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Tom Griffiths, Yuan Cao, and Karthik Narasimhan. Tree of thoughts: Deliberate problem solving with large language models.Advances in neural information processing systems, 36:11809–11822, 2023

  5. [5]

    React: Synergizing reasoning and acting in language models

    Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. React: Synergizing reasoning and acting in language models. InInternational Conference on Learning Representations (ICLR), 2023

  6. [6]

    Toolformer: Language models can teach themselves to use tools.Advances in Neural Information Processing Systems, 36:68539–68551, 2023

    Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. Toolformer: Language models can teach themselves to use tools.Advances in Neural Information Processing Systems, 36:68539–68551, 2023

  7. [7]

    Hugginggpt: Solving ai tasks with chatgpt and its friends in hugging face.Advances in Neural Information Processing Systems, 36:38154–38180, 2023

    Yongliang Shen, Kaitao Song, Xu Tan, Dongsheng Li, Weiming Lu, and Yueting Zhuang. Hugginggpt: Solving ai tasks with chatgpt and its friends in hugging face.Advances in Neural Information Processing Systems, 36:38154–38180, 2023

  8. [8]

    A survey on large language model based autonomous agents.Frontiers of Computer Science, 18(6):186345, 2024

    Lei Wang, Chen Ma, Xueyang Feng, Zeyu Zhang, Hao Yang, Jingsen Zhang, Zhiyuan Chen, Jiakai Tang, Xu Chen, Yankai Lin, et al. A survey on large language model based autonomous agents.Frontiers of Computer Science, 18(6):186345, 2024

  9. [9]

    Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG

    Aditi Singh, Abul Ehtesham, Saket Kumar, and Tala Talaei Khoei. Agentic retrieval-augmented generation: A survey on agentic rag.arXiv preprint arXiv:2501.09136, 2025

  10. [10]

    A Survey on Retrieval-Augmented Text Generation for Large Language Models

    Yizheng Huang and Jimmy Huang. A survey on retrieval-augmented text generation for large language models.arXiv preprint arXiv:2404.10981, 2024

  11. [11]

    OpenHands: An Open Platform for AI Software Developers as Generalist Agents

    Xingyao Wang, Boxuan Li, Yufan Song, Frank F Xu, Xiangru Tang, Mingchen Zhuge, Jiayi Pan, Yueqi Song, Bowen Li, Jaskirat Singh, et al. Openhands: An open platform for ai software developers as generalist agents.arXiv preprint arXiv:2407.16741, 2024

  12. [12]

    Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory

    Prateek Chhikara, Dev Khant, Saket Aryan, Taranjeet Singh, and Deshraj Yadav. Mem0: Building production-ready ai agents with scalable long-term memory.arXiv preprint arXiv:2504.19413, 2025

  13. [13]

    Memos: An operating system for memory-augmented generation (mag) in large language models.arXiv preprint arXiv:2505.22101, 2025

    Zhiyu Li, Shichao Song, Hanyu Wang, Simin Niu, Ding Chen, Jiawei Yang, Chenyang Xi, Huayi Lai, Jihao Zhao, Yezhaohui Wang, et al. Memos: An operating system for memory-augmented generation (mag) in large language models.arXiv preprint arXiv:2505.22101, 2025

  14. [14]

    Reflexion: Language agents with verbal reinforcement learning.Advances in Neural Information Processing Systems, 36:8634–8652, 2023

    Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. Reflexion: Language agents with verbal reinforcement learning.Advances in Neural Information Processing Systems, 36:8634–8652, 2023

  15. [15]

    Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning

    Sikuan Yan, Xiufeng Yang, Zuchao Huang, Ercong Nie, Zifeng Ding, Zonggen Li, Xiaowen Ma, Kristian Kersting, Jeff Z. Pan, Hinrich Schütze, Volker Tresp, and Yunpu Ma. Memory-r1: Enhancing large language model agents to manage and utilize memories via reinforcement learning.arXiv preprint arXiv:2508.19828, 2025

  16. [16]

    Autoagents: A framework for automatic agent generation.arXiv preprint arXiv:2309.17288, 2023

    Guangyao Chen, Siwei Dong, Yu Shu, Ge Zhang, Jaward Sesay, Börje F Karlsson, Jie Fu, and Yemin Shi. Autoagents: A framework for automatic agent generation.arXiv preprint arXiv:2309.17288, 2023

  17. [17]

    MetaGPT: Meta programming for a multi-agent collaborative framework

    Sirui Hong, Mingchen Zhuge, Jonathan Chen, Xiawu Zheng, Yuheng Cheng, Jinlin Wang, Ceyao Zhang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, Chenyu Ran, Lingfeng Xiao, Chenglin Wu, and Jürgen Schmidhuber. MetaGPT: Meta programming for a multi-agent collaborative framework. InThe Twelfth International Conference on Learning Representations, 202...

  18. [18]

    Unleashing cognitive synergy in large language models: A task-solving agent through multi-persona self-collaboration

    Zhenhailong Wang, Shaoguang Mao, Wenshan Wu, Tao Ge, Furu Wei, and Heng Ji. Unleashing cognitive synergy in large language models: A task-solving agent through multi-persona self-collaboration. In Proc. 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024), 2024

  19. [19]

    Battleagentbench: A benchmark for evaluating cooperation and competition capabilities of language models in multi-agent systems.arXiv preprint arXiv:2408.15971, 2024

    Wei Wang, Dan Zhang, Tao Feng, Boyan Wang, and Jie Tang. Battleagentbench: A benchmark for evaluating cooperation and competition capabilities of language models in multi-agent systems.arXiv preprint arXiv:2408.15971, 2024

  20. [20]

    AgentBench: Evaluating LLMs as Agents

    Xiao Liu, Hao Yu, Hanchen Zhang, Yifan Xu, Xuanyu Lei, Hanyu Lai, Yu Gu, Hangliang Ding, Kaiwen Men, Kejuan Yang, Shudan Zhang, Xiang Deng, Aohan Zeng, Zhengxiao Du, Chenhui Zhang, Sheng Shen, Tianjun Zhang, Yu Su, Huan Sun, Minlie Huang, Yuxiao Dong, and Jie Tang. Agentbench: Evaluating llms as agents.arXiv preprint arXiv:2308.03688, 2023. URL https://www...

  21. [21]

    Multiagentbench: Evaluating the collaboration and competition of llm agents.arXiv preprint arXiv:2503.01935, 2025

    Kunlun Zhu, Hongyi Du, Zhaochen Hong, Xiaocheng Yang, Shuyi Guo, Zhe Wang, Zhenhailong Wang, Cheng Qian, Xiangru Tang, Heng Ji, et al. Multiagentbench: Evaluating the collaboration and competition of llm agents.arXiv preprint arXiv:2503.01935, 2025

  22. [22]

    Tree-of-code: A self-growing tree framework for end-to-end code generation and execution in complex tasks

    Ziyi Ni, Yifan Li, Ning Yang, Dou Shen, Pin Lyu, and Daxiang Dong. Tree-of-code: A self-growing tree framework for end-to-end code generation and execution in complex tasks. InFindings of the Association for Computational Linguistics: ACL 2025, pages 9804–9819, 2025

  23. [23]

    Search-o1: Agentic Search-Enhanced Large Reasoning Models

    Xiaoxi Li, Guanting Dong, Jiajie Jin, Yuyao Zhang, Yujia Zhou, Yutao Zhu, Peitian Zhang, and Zhicheng Dou. Search-o1: Agentic search-enhanced large reasoning models.arXiv preprint arXiv:2501.05366, 2025

  24. [24]

    A-MEM: Agentic Memory for LLM Agents

    Wujiang Xu, Kai Mei, Hang Gao, Juntao Tan, Zujie Liang, and Yongfeng Zhang. A-mem: Agentic memory for llm agents.arXiv preprint arXiv:2502.12110, 2025

  25. [25]

    Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory

    Tianxin Wei, Noveen Sachdeva, Benjamin Coleman, Zhankui He, Yuanchen Bei, Xuying Ning, Mengting Ai, Yunzhe Li, Jingrui He, Ed H Chi, et al. Evo-memory: Benchmarking llm agent test-time learning with self-evolving memory.arXiv preprint arXiv:2511.20857, 2025

  26. [26]

    Coevolving with the other you: Fine-tuning llm with sequential cooperative multi-agent reinforcement learning

    Hao Ma, Tianyi Hu, Zhiqiang Pu, Liu Boyin, Xiaolin Ai, Yanyan Liang, and Min Chen. Coevolving with the other you: Fine-tuning llm with sequential cooperative multi-agent reinforcement learning. Advances in Neural Information Processing Systems, 37:15497–15525, 2024

  27. [27]

    Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

    Bowen Jin, Hansi Zeng, Zhenrui Yue, Jinsung Yoon, Sercan Arik, Dong Wang, Hamed Zamani, and Jiawei Han. Search-r1: Training llms to reason and leverage search engines with reinforcement learning.arXiv preprint arXiv:2503.09516, 2025

  28. [28]

    Webagent-r1: Training web agents via end-to-end multi-turn reinforcement learning.arXiv preprint arXiv:2505.16421, 2025

    Zhepei Wei, Wenlin Yao, Yao Liu, Weizhi Zhang, Qin Lu, Liang Qiu, Changlong Yu, Puyang Xu, Chao Zhang, Bing Yin, et al. Webagent-r1: Training web agents via end-to-end multi-turn reinforcement learning.arXiv preprint arXiv:2505.16421, 2025

  29. [29]

    Solving olympiad geometry without human demonstrations.Nature, 625(7995):476–482, 2024

    Trieu H Trinh, Yuhuai Wu, Quoc V Le, He He, and Thang Luong. Solving olympiad geometry without human demonstrations.Nature, 625(7995):476–482, 2024

  30. [30]

    Mathematical discoveries from program search with large language models.Nature, 625(7995): 468–475, 2024

    Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Matej Balog, M Pawan Kumar, Emilien Dupont, Francisco JR Ruiz, Jordan S Ellenberg, Pengming Wang, Omar Fawzi, et al. Mathematical discoveries from program search with large language models.Nature, 625(7995): 468–475, 2024

  31. [31]

    Vibe coding vs. agentic coding: Fundamentals and practical implications of agentic AI

    Ranjan Sapkota, Konstantinos I Roumeliotis, and Manoj Karkee. Vibe coding vs. agentic coding: Fundamentals and practical implications of agentic AI, 2025

  32. [32]

    Vibe coding — wikipedia.https://en.wikipedia.org/wiki/Vibe_coding, 2025

    Andrej Karpathy. Vibe coding — wikipedia.https://en.wikipedia.org/wiki/Vibe_coding, 2025

  33. [33]

    ChemCrow: Augmenting large-language models with chemistry tools

    Andres M Bran, Sam Cox, Oliver Schilter, Carlo Baldassari, Andrew D White, and Philippe Schwaller. Chemcrow: Augmenting large-language models with chemistry tools.arXiv preprint arXiv:2304.05376, 2023

  34. [34]

    Physical ai agents: Integrating cognitive intelligence with real-world action

    Fouad Bousetouane. Physical ai agents: Integrating cognitive intelligence with real-world action. arXiv preprint arXiv:2501.08944, 2025

  35. [35]

    Matexpert: Decomposing materials discovery by mimicking human experts.arXiv preprint arXiv:2410.21317, 2024

    Qianggang Ding, Santiago Miret, and Bang Liu. Matexpert: Decomposing materials discovery by mimicking human experts.arXiv preprint arXiv:2410.21317, 2024

  36. [36]

    Voyager: An Open-Ended Embodied Agent with Large Language Models

    Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, and Anima Anandkumar. Voyager: An open-ended embodied agent with large language models.arXiv preprint arXiv:2305.16291, 2023

  37. [37]

    Embodiedrag: Dynamic 3d scene graph retrieval for efficient and scalable robot task planning.arXiv preprint arXiv:2410.23968, 2024

    Booker Meghan, Byrd Grayson, Kemp Bethany, Schmidt Aurora, and Rivera Corban. Embodiedrag: Dynamic 3d scene graph retrieval for efficient and scalable robot task planning.arXiv preprint arXiv:2410.23968, 2024. URL https://www.arxiv.org/abs/2410.23968

  38. [38]

    Embodied-R: Collaborative framework for activating embodied spatial reasoning in foundation models via reinforcement learning

    Baining Zhao, Ziyou Wang, Jianjie Fang, Chen Gao, Fanhang Man, Jinqiang Cui, Xin Wang, Xinlei Chen, Yong Li, and Wenwu Zhu. Embodied-r: Collaborative framework for activating embodied spatial reasoning in foundation models via reinforcement learning.arXiv preprint arXiv:2504.12680, 2025

  39. [39]

    Mmedagent: Learning to use medical tools with multi-modal agent.arXiv preprint arXiv:2407.02483, 2024

    Binxu Li, Tiankai Yan, Yuanting Pan, Jie Luo, Ruiyang Ji, Jiayuan Ding, Zhe Xu, Shilong Liu, Haoyu Dong, Zihao Lin, et al. Mmedagent: Learning to use medical tools with multi-modal agent.arXiv preprint arXiv:2407.02483, 2024

  40. [40]

    Biomni: A general-purpose biomedical ai agent.biorxiv, 2025

    Kexin Huang, Serena Zhang, Hanchen Wang, Yuanhao Qu, Yingzhou Lu, Yusuf Roohani, Ryan Li, Lin Qiu, Gavin Li, Junze Zhang, et al. Biomni: A general-purpose biomedical ai agent.biorxiv, 2025

  41. [41]

    WebSailor: Navigating Super-human Reasoning for Web Agent

    Kuan Li, Zhongwang Zhang, Huifeng Yin, Liwen Zhang, Litu Ou, Jialong Wu, Wenbiao Yin, Baixuan Li, Zhengwei Tao, Xinyu Wang, et al. Websailor: Navigating super-human reasoning for web agent. arXiv preprint arXiv:2507.02592, 2025

  42. [42]

    SkillWeaver: Web Agents can Self-Improve by Discovering and Honing Skills

    Boyuan Zheng, Michael Y Fatemi, Xiaolong Jin, Zora Zhiruo Wang, Apurva Gandhi, Yueqi Song, Yu Gu, Jayanth Srinivasa, Gaowen Liu, Graham Neubig, et al. Skillweaver: Web agents can self-improve by discovering and honing skills.arXiv preprint arXiv:2504.07079, 2025

  43. [43]

    AI agents vs. agentic AI: A conceptual taxonomy, applications and challenges

    Ranjan Sapkota, Konstantinos I Roumeliotis, and Manoj Karkee. Ai agents vs. agentic ai: A conceptual taxonomy, applications and challenges.arXiv preprint arXiv:2505.10468, 2025

  44. [44]

    A dynamic llm-powered agent network for task-oriented agent collaboration

    Zijun Liu, Yanzhe Zhang, Peng Li, Yang Liu, and Diyi Yang. A dynamic llm-powered agent network for task-oriented agent collaboration. InFirst Conference on Language Modeling, 2024

  45. [45]

    Shuyan Zhou, Frank F. Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng, Tianyue Ou, Yonatan Bisk, Daniel Fried, Uri Alon, and Graham Neubig. WebArena: A realistic web environment for building autonomous agents. arXiv preprint arXiv:2307.13854, 2023. URL https://www.arxiv.org/abs/2307.13854

  46. [46]

    Jing Yu Koh, Robert Lo, Lawrence Jang, Vikram Duvvur, Ming Chong Lim, Po-Yu Huang, Graham Neubig, Shuyan Zhou, Ruslan Salakhutdinov, and Daniel Fried. VisualWebArena: Evaluating multimodal agents on realistic visual web tasks. arXiv preprint arXiv:2401.13649, 2024. URL https://www.arxiv.org/abs/2401.13649

  47. [47]

    Lawrence Jang, Yinheng Li, Dan Zhao, Charles Ding, Justin Lin, Paul Pu Liang, Rogerio Bonatti, and Kazuhito Koishida. VideoWebArena: Evaluating long context multimodal agents with video understanding web tasks. arXiv preprint arXiv:2410.19100, 2024

  48. [48]

    Mohit Shridhar, Xingdi Yuan, Marc-Alexandre Côté, Yonatan Bisk, Adam Trischler, and Matthew Hausknecht. ALFWorld: Aligning text and embodied environments for interactive learning. arXiv preprint arXiv:2010.03768, 2020

  49. [49]

    Xiang Deng, Yu Gu, Boyuan Zheng, Shijie Chen, Sam Stevens, Boshi Wang, Huan Sun, and Yu Su. Mind2Web: Towards a generalist agent for the web. Advances in Neural Information Processing Systems, 36:28091–28114, 2023

  50. [50]

    Boyu Gou, Zanming Huang, Yuting Ning, Yu Gu, Michael Lin, Weijian Qi, Andrei Kopanev, Botao Yu, Bernal Jiménez Gutiérrez, Yiheng Shu, et al. Mind2Web 2: Evaluating agentic search with agent-as-a-judge. arXiv preprint arXiv:2506.21506, 2025

  51. [51]

    Jie Huang and Kevin Chen-Chuan Chang. Towards reasoning in large language models: A survey. arXiv preprint arXiv:2212.10403, 2022

  52. [52]

    Qiguang Chen, Libo Qin, Jinhao Liu, Dengyun Peng, Jiannan Guan, Peng Wang, Mengkang Hu, Yuhang Zhou, Te Gao, and Wanxiang Che. Towards reasoning era: A survey of long chain-of-thought for reasoning large language models. arXiv preprint arXiv:2503.09567, 2025

  53. [53]

    Fengli Xu, Qianyue Hao, Zefang Zong, Jingwei Wang, Yunke Zhang, Jingyi Wang, Xiaochong Lan, Jiahui Gong, Tianjian Ouyang, Fanjin Meng, et al. Towards large reasoning models: A survey of reinforced reasoning with large language models. arXiv preprint arXiv:2501.09686, 2025

  54. [54]

    Zixuan Ke, Fangkai Jiao, Yifei Ming, Xuan-Phi Nguyen, Austin Xu, Do Xuan Long, Minzhi Li, Chengwei Qin, Peifeng Wang, Silvio Savarese, et al. A survey of frontiers in LLM reasoning: Inference scaling, learning to reason, and agentic systems. arXiv preprint arXiv:2504.09037, 2025

  55. [55]

    Kaiyan Zhang, Yuxin Zuo, Bingxiang He, Youbang Sun, Runze Liu, Che Jiang, Yuchen Fan, Kai Tian, Guoli Jia, Pengfei Li, et al. A survey of reinforcement learning for large reasoning models. arXiv preprint arXiv:2509.08827, 2025

  56. [56]

    Guibin Zhang, Hejia Geng, Xiaohang Yu, Zhenfei Yin, Zaibin Zhang, Zelin Tan, Heng Zhou, Zhongzhi Li, Xiangyuan Xue, Yijiang Li, et al. The landscape of agentic reinforcement learning for LLMs: A survey. arXiv preprint arXiv:2509.02547, 2025

  57. [57]

    Minhua Lin, Zongyu Wu, Zhichao Xu, Hui Liu, Xianfeng Tang, Qi He, Charu Aggarwal, Xiang Zhang, and Suhang Wang. A comprehensive survey on reinforcement learning-based agentic search: Foundations, roles, optimizations, evaluations, and applications. arXiv preprint arXiv:2510.16724, 2025

  58. [58]

    Jinyuan Fang, Yanwen Peng, Xi Zhang, Yingxu Wang, Xinhao Yi, Guibin Zhang, Yi Xu, Bin Wu, Siwei Liu, Zihao Li, et al. A comprehensive survey of self-evolving AI agents: A new paradigm bridging foundation models and lifelong agentic systems. arXiv preprint arXiv:2508.07407, 2025

  59. [59]

    Huan-ang Gao, Jiayi Geng, Wenyue Hua, Mengkang Hu, Xinzhe Juan, Hongzhang Liu, Shilong Liu, Jiahao Qiu, Xuan Qi, Yiran Wu, et al. A survey of self-evolving agents: On path to artificial super intelligence. arXiv preprint arXiv:2507.21046, 2025

  60. [60]

    Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, et al. DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. arXiv preprint arXiv:2501.12948, 2025

  61. [61]

    Pengcheng Jiang, Jiacheng Lin, Lang Cao, Runchu Tian, SeongKu Kang, Zifeng Wang, Jimeng Sun, and Jiawei Han. DeepRetrieval: Hacking real search engines and retrievers with large language models via reinforcement learning. arXiv preprint arXiv:2503.00223, 2025

  62. [62]

    John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017

  63. [63]

    Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, YK Li, Yang Wu, et al. DeepSeekMath: Pushing the limits of mathematical reasoning in open language models. arXiv preprint arXiv:2402.03300, 2024

  64. [64]

    Fanbin Lu, Zhisheng Zhong, Shu Liu, Chi-Wing Fu, and Jiaya Jia. ARPO: End-to-end policy optimization for GUI agents with experience replay. arXiv preprint arXiv:2505.16282, 2025

  65. [65]

    Qiying Yu, Zheng Zhang, Ruofei Zhu, Yufeng Yuan, Xiaochen Zuo, Yu Yue, Weinan Dai, Tiantian Fan, Gaohong Liu, Lingjun Liu, et al. DAPO: An open-source LLM reinforcement learning system at scale. arXiv preprint arXiv:2503.14476, 2025

  66. [66]

    Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, Ahmed Hassan Awadallah, Ryen W White, Doug Burger, and Chi Wang. AutoGen: Enabling next-gen LLM applications via multi-agent conversations. In First Conference on Language Modeling, 2024. URL https://openreview.net/forum?id=BAakY1hNKS

  67. [67]

    Guohao Li, Hasan Hammoud, Hani Itani, Dmitrii Khizbullin, and Bernard Ghanem. CAMEL: Communicative agents for "mind" exploration of large language model society. Advances in Neural Information Processing Systems, 36:51991–52008, 2023

  68. [68]

    Mingchen Zhuge, Wenyi Wang, Louis Kirsch, Francesco Faccio, Dmitrii Khizbullin, and Jürgen Schmidhuber. GPTSwarm: Language agents as optimizable graphs. In Forty-first International Conference on Machine Learning, 2024

  69. [69]

    Haoyang Hong, Jiajun Yin, Yuan Wang, Jingnan Liu, Zhe Chen, Ailing Yu, Ji Li, Zhiling Ye, Hansong Xiao, Yefei Chen, et al. Multi-agent deep research: Training multi-agent systems with M-GRPO. arXiv preprint arXiv:2511.13288, 2025

  70. [70]

    Alexander Novikov, Ngân Vũ, Marvin Eisenberger, Emilien Dupont, Po-Sen Huang, Adam Zsolt Wagner, Sergey Shirobokov, Borislav Kozlovskii, Francisco JR Ruiz, Abbas Mehrabian, et al. AlphaEvolve: A coding agent for scientific and algorithmic discovery. arXiv preprint arXiv:2506.13131, 2025

  71. [71]

    Binfeng Xu, Zhiyuan Peng, Bowen Lei, Subhabrata Mukherjee, Yuchen Liu, and Dongkuan Xu. ReWOO: Decoupling reasoning from observations for efficient augmented language models. arXiv preprint arXiv:2305.18323, 2023

  72. [72]

    Bo Liu, Yuqian Jiang, Xiaohan Zhang, Qiang Liu, Shiqi Zhang, Joydeep Biswas, and Peter Stone. LLM+P: Empowering large language models with optimal planning proficiency. arXiv preprint arXiv:2304.11477, 2023

  73. [73]

    Karthik Valmeekam, Matthew Marquez, Sarath Sreedharan, and Subbarao Kambhampati. On the planning abilities of large language models: A critical investigation. Advances in Neural Information Processing Systems, 36:75993–76005, 2023

  74. [74]

    Maciej Besta, Nils Blach, Ales Kubicek, Robert Gerstenberger, Michal Podstawski, Lukas Gianinazzi, Joanna Gajda, Tomasz Lehmann, Hubert Niewiadomski, Piotr Nyczyk, et al. Graph of thoughts: Solving elaborate problems with large language models. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 17682–17690, 2024

  75. [75]

    Bilgehan Sel, Ahmad Al-Tawaha, Vanshaj Khattar, Ruoxi Jia, and Ming Jin. Algorithm of thoughts: Enhancing exploration of ideas in large language models. arXiv preprint arXiv:2308.10379, 2023

  76. [76]

    Runquan Gui, Zhihai Wang, Jie Wang, Chi Ma, Huiling Zhen, Mingxuan Yuan, Jianye Hao, Defu Lian, Enhong Chen, and Feng Wu. Hypertree planning: Enhancing LLM reasoning via hierarchical thinking. arXiv preprint arXiv:2505.02322, 2025

  77. [77]

    Jihwan Jeong, Xiaoyu Wang, Jingmin Wang, Scott Sanner, and Pascal Poupart. Reflect-then-plan: Offline model-based planning through a doubly Bayesian lens. arXiv preprint arXiv:2506.06261, 2025

  78. [78]

    Shishir G Patil, Tianjun Zhang, Xin Wang, and Joseph E Gonzalez. Gorilla: Large language model connected with massive APIs. Advances in Neural Information Processing Systems, 37:126544–126565, 2024

  79. [79]

    Tanmay Gupta, Luca Weihs, and Aniruddha Kembhavi. CodeNav: Beyond tool-use to using real-world codebases with LLM agents. arXiv preprint arXiv:2406.12276, 2024

  80. [80]

    Liyi Chen, Panrong Tong, Zhongming Jin, Ying Sun, Jieping Ye, and Hui Xiong. Plan-on-graph: Self-correcting adaptive planning of large language model on knowledge graphs. Advances in Neural Information Processing Systems, 37:37665–37691, 2024

Showing first 80 references.