pith. machine review for the scientific record.

arxiv: 2601.12538 · v1 · submitted 2026-01-18 · 💻 cs.AI · cs.CL

Recognition: 2 theorem links · Lean Theorem

Agentic Reasoning for Large Language Models

Authors on Pith: no claims yet

Pith reviewed 2026-05-17 15:08 UTC · model grok-4.3

classification 💻 cs.AI cs.CL
keywords agentic reasoning · large language models · autonomous agents · planning and tool use · self-evolving agents · multi-agent systems · in-context reasoning · reinforcement learning

The pith

Agentic reasoning turns large language models into autonomous agents that plan, act, and adapt through interaction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This survey organizes methods that let large language models function as agents in open and changing environments instead of closed problems. It divides the approaches into three layers: foundational capabilities for planning and tool use in stable settings, self-evolving processes where agents improve through feedback and memory, and collective systems where multiple agents coordinate and share knowledge. The work also separates in-context orchestration used at test time from optimization through training, and it reviews applications across science, robotics, and healthcare. It ends by identifying open problems such as personalization and long-horizon interaction needed for practical use.
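The plan-act-adapt loop that the survey's foundational layer describes can be sketched as a toy Python loop. The tool registry, the hard-coded single-step planner, and the stopping rule below are illustrative stand-ins, not code from the paper; a real agent would replace `agent_step` with a language-model call.

```python
# Minimal sketch of a foundational agentic loop: plan a tool call,
# act on the environment, observe the result, and carry the history
# forward so later steps can adapt. All names are hypothetical.

def calculator(expression: str) -> str:
    """A toy tool the agent can invoke (arithmetic only)."""
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def agent_step(goal: str, history: list) -> tuple[str, str]:
    """Stand-in for an LLM policy: choose a tool and its argument.

    A real agent would prompt a model with the goal and the
    accumulated history; here we hard-code a one-step plan.
    """
    return "calculator", goal

def run_agent(goal: str, max_steps: int = 3) -> str:
    history = []
    for _ in range(max_steps):
        tool, arg = agent_step(goal, history)   # plan
        observation = TOOLS[tool](arg)          # act
        history.append((tool, arg, observation))  # adapt via context
        if observation:                         # toy stopping criterion
            return observation
    return ""

print(run_agent("2 + 3 * 4"))  # → 14
```

The same skeleton generalizes: adding entries to `TOOLS` and making `agent_step` model-driven recovers the planning-plus-tool-use pattern the survey groups under its foundational layer.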

Core claim

Agentic reasoning reframes large language models as autonomous agents that plan, act, and learn through continual interaction with their environments. The survey organizes this capability along three complementary dimensions: foundational agentic reasoning that establishes core single-agent skills including planning, tool use, and search in stable environments; self-evolving agentic reasoning that studies refinement through feedback, memory, and adaptation; and collective multi-agent reasoning that extends intelligence to collaborative coordination, knowledge sharing, and shared goals. These layers are further split into in-context reasoning, which scales test-time interaction through structured orchestration, and post-training reasoning, which optimizes behaviors via reinforcement learning and supervised fine-tuning.

What carries the argument

The three complementary dimensions—foundational agentic reasoning for core single-agent capabilities, self-evolving agentic reasoning for refinement through feedback and adaptation, and collective multi-agent reasoning for coordination and shared goals—organize the field and bridge thought with action.

If this is right

  • Foundational methods support reliable planning and tool use by single agents in stable environments.
  • Self-evolving techniques enable agents to improve their own performance using memory and feedback over repeated interactions.
  • Collective reasoning allows multiple agents to coordinate actions and share knowledge toward common objectives.
  • Applications in robotics, healthcare, and autonomous research follow directly from applying the organized roadmap.
  • Open challenges in long-horizon interaction and scalable multi-agent training must be resolved for broader deployment.
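The self-evolving bullet above, where agents improve using memory and feedback over repeated interactions, can be sketched as a toy loop in which episode outcomes update an episodic memory that biases future actions. The environment, action names, and scoring rule are hypothetical illustrations, not mechanisms from the survey.

```python
# Sketch of the self-evolving layer: an agent keeps a memory of
# action -> running success score, and prefers actions that have
# succeeded before. Environment and actions are toy stand-ins.

class SelfEvolvingAgent:
    def __init__(self, actions):
        self.actions = list(actions)
        self.memory = {a: 0.0 for a in self.actions}  # episodic feedback store

    def act(self) -> str:
        # Exploit what the memory has accumulated so far.
        return max(self.actions, key=lambda a: self.memory[a])

    def update(self, action: str, reward: float) -> None:
        # Environment feedback refines future behavior.
        self.memory[action] += reward

def environment(action: str) -> float:
    # Toy environment: only "search" is rewarded.
    return 1.0 if action == "search" else -1.0

agent = SelfEvolvingAgent(["guess", "search", "wait"])
for _ in range(5):                 # repeated interaction episodes
    a = agent.act()
    agent.update(a, environment(a))

print(agent.act())  # converges to "search"
```

Swapping the scalar score for stored textual reflections, as in Reflexion-style methods the survey covers, keeps the same loop structure while moving the adaptation into the agent's context rather than a numeric table.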

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The three-dimension roadmap could guide researchers in systematically identifying gaps for personalization of agent behaviors.
  • Integrating explicit world modeling may emerge naturally as an extension of the foundational and self-evolving layers.
  • Governance requirements for real-world agents might be derived from the coordination mechanisms in the collective dimension.
  • Testable extensions could involve applying the in-context versus post-training split to new benchmarks in mathematics or science.

Load-bearing premise

The three dimensions of foundational, self-evolving, and collective agentic reasoning cover the entire field comprehensively without significant overlap or omission.

What would settle it

Discovery of a major agentic reasoning method or framework that requires a fourth distinct category or shows substantial overlap across the proposed dimensions would challenge the survey's organizational structure.

read the original abstract

Reasoning is a fundamental cognitive process underlying inference, problem-solving, and decision-making. While large language models (LLMs) demonstrate strong reasoning capabilities in closed-world settings, they struggle in open-ended and dynamic environments. Agentic reasoning marks a paradigm shift by reframing LLMs as autonomous agents that plan, act, and learn through continual interaction. In this survey, we organize agentic reasoning along three complementary dimensions. First, we characterize environmental dynamics through three layers: foundational agentic reasoning, which establishes core single-agent capabilities including planning, tool use, and search in stable environments; self-evolving agentic reasoning, which studies how agents refine these capabilities through feedback, memory, and adaptation; and collective multi-agent reasoning, which extends intelligence to collaborative settings involving coordination, knowledge sharing, and shared goals. Across these layers, we distinguish in-context reasoning, which scales test-time interaction through structured orchestration, from post-training reasoning, which optimizes behaviors via reinforcement learning and supervised fine-tuning. We further review representative agentic reasoning frameworks across real-world applications and benchmarks, including science, robotics, healthcare, autonomous research, and mathematics. This survey synthesizes agentic reasoning methods into a unified roadmap bridging thought and action, and outlines open challenges and future directions, including personalization, long-horizon interaction, world modeling, scalable multi-agent training, and governance for real-world deployment.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

0 major / 2 minor

Summary. This survey organizes agentic reasoning for LLMs along three complementary dimensions: foundational agentic reasoning establishing core single-agent capabilities (planning, tool use, search) in stable environments; self-evolving agentic reasoning focusing on refinement via feedback, memory, and adaptation; and collective multi-agent reasoning addressing coordination, knowledge sharing, and shared goals. It distinguishes in-context orchestration from post-training optimization, reviews frameworks in applications like science, robotics, healthcare, autonomous research, and mathematics, and outlines open challenges such as personalization, long-horizon interaction, world modeling, scalable multi-agent training, and governance.

Significance. If the taxonomy holds, this survey makes a useful contribution by synthesizing a rapidly growing literature into a unified roadmap that connects reasoning processes with agentic action. The explicit separation of in-context scaling from post-training optimization provides a practical lens for comparing approaches, and the enumeration of concrete open challenges (personalization, world modeling, governance) supplies clear signposts for future work. The review of domain-specific frameworks adds concrete grounding to the high-level structure.

minor comments (2)
  1. [Introduction] The positioning of the three dimensions as complementary and comprehensive would be clearer if the manuscript briefly noted selection criteria for the taxonomy and acknowledged possible boundary overlaps (e.g., adaptive multi-agent systems) rather than treating the partition as self-evident.
  2. [Applications and benchmarks] A compact summary table mapping representative frameworks to the three dimensions, listing primary techniques and benchmark results, would make the review more scannable and allow readers to assess coverage at a glance.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of our survey and for the recommendation of minor revision. The referee's summary accurately captures the three-layer taxonomy (foundational, self-evolving, and collective), the distinction between in-context orchestration and post-training optimization, and the enumerated open challenges. We appreciate the recognition that this structure provides a useful roadmap connecting reasoning processes with agentic action.

Circularity Check

0 steps flagged

No significant circularity in this literature survey

full rationale

This paper is a literature survey synthesizing existing agentic reasoning methods for LLMs into a high-level roadmap. It organizes the field along three complementary dimensions (foundational, self-evolving, and collective) and distinguishes in-context from post-training approaches, but presents this taxonomy explicitly as an editorial organizing lens drawn from external references rather than a derived result. No mathematical derivations, equations, fitted parameters, predictions, or uniqueness theorems appear in the manuscript. All claims are positioned as reviews of prior work, with no self-citation chains or self-definitional reductions that would make the central synthesis equivalent to its inputs by construction. The paper is therefore self-contained against external benchmarks and receives a score of zero.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

This is a literature survey containing no new free parameters, invented entities, or original axioms beyond standard assumptions in AI research. All substantive claims rest on the body of cited prior work.

axioms (1)
  • domain assumption: LLMs can be reframed as autonomous agents capable of planning, acting, and learning through interaction in dynamic environments
    Presented in the abstract as the core paradigm shift underlying the entire survey.

pith-pipeline@v0.9.0 · 5638 in / 1313 out tokens · 64578 ms · 2026-05-17T15:08:51.511537+00:00 · methodology

discussion (0)


Forward citations

Cited by 19 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Hackers or Hallucinators? A Comprehensive Analysis of LLM-Based Automated Penetration Testing

    cs.CR 2026-04 unverdicted novelty 8.0

    The first SoK on LLM-based AutoPT frameworks provides a six-dimension taxonomy of agent designs and a unified empirical benchmark evaluating 15 frameworks via over 10 billion tokens and 1,500 manually reviewed logs.

  2. ClawForge: Generating Executable Interactive Benchmarks for Command-Line Agents

    cs.AI 2026-05 conditional novelty 7.0

    ClawForge supplies a generator that turns scenario templates into reproducible command-line tasks testing state conflict handling, where the strongest frontier model scores only 45.3 percent strict accuracy.

  3. Learning Agentic Policy from Action Guidance

    cs.CL 2026-05 unverdicted novelty 7.0

    ActGuide-RL uses human action data as plan-style guidance in mixed-policy RL to overcome exploration barriers in LLM agents, matching SFT+RL performance on search benchmarks without cold-start training.

  4. MAP: A Map-then-Act Paradigm for Long-Horizon Interactive Agent Reasoning

    cs.AI 2026-05 unverdicted novelty 6.0

    MAP improves LLM agent reasoning by constructing a structured cognitive map of the environment before task execution, yielding performance gains on benchmarks like ARC-AGI-3 and superior training data via the new MAP-...

  5. The Bystander Effect in Multi-Agent Reasoning: Quantifying Cognitive Loafing in Collaborative Interactions

    cs.MA 2026-05 unverdicted novelty 6.0

    Multi-agent LLM interactions induce cognitive loafing via a formalized Interaction Depth Limit and Sovereignty Gap, where models subjugate correct derivations to social compliance, with lead agent identity disproporti...

  6. Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning

    cs.AI 2026-05 unverdicted novelty 6.0

    Skill1 trains one policy to jointly evolve skill query generation, re-ranking, task solving, and distillation from a single task-success signal, with low-frequency trends crediting selection and high-frequency variati...

  7. Confidence Estimation in Automatic Short Answer Grading with LLMs

    cs.CL 2026-04 unverdicted novelty 6.0

    A hybrid confidence framework for LLM-based short answer grading combines model signals with aleatoric uncertainty from semantic clustering of responses and improves selective grading reliability over single-source methods.

  8. HypEHR: Hyperbolic Modeling of Electronic Health Records for Efficient Question Answering

    cs.AI 2026-04 unverdicted novelty 6.0

    HypEHR is a hyperbolic embedding model for EHR data that uses Lorentzian geometry and hierarchy-aware pretraining to answer clinical questions nearly as well as large language models but with much smaller size.

  9. Agentic Frameworks for Reasoning Tasks: An Empirical Study

    cs.AI 2026-04 unverdicted novelty 6.0

    An empirical evaluation of 22 agentic frameworks on BBH, GSM8K, and ARC benchmarks shows stable performance in 12 frameworks but highlights orchestration failures and weaker mathematical reasoning.

  10. Mixture of Sequence: Theme-Aware Mixture-of-Experts for Long-Sequence Recommendation

    cs.IR 2026-03 unverdicted novelty 6.0

    MoS applies theme-aware routing to extract multi-scale theme-specific subsequences from noisy long user sequences, achieving state-of-the-art recommendation performance with fewer FLOPs than comparable MoE models.

  11. M2A: Synergizing Mathematical and Agentic Reasoning in Large Language Models

    cs.AI 2026-05 unverdicted novelty 5.0

    M2A uses null-space model merging to combine mathematical and agentic reasoning in LLMs, raising SWE-Bench Verified performance from 44.0% to 51.2% on Qwen3-8B without retraining.

  12. Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning

    cs.AI 2026-05 unverdicted novelty 5.0

    Skill1 trains a single RL policy to co-evolve skill selection, utilization, and distillation in language model agents from one task-outcome reward, using low-frequency trends to credit selection and high-frequency var...

  13. Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning

    cs.AI 2026-05 unverdicted novelty 5.0

    Skill1 co-evolves skill selection, utilization, and distillation inside a single policy using only task-outcome reward, with low-frequency trends crediting selection and high-frequency variation crediting distillation...

  14. Confidence Estimation in Automatic Short Answer Grading with LLMs

    cs.CL 2026-04 unverdicted novelty 5.0

    A hybrid confidence framework for LLM-based automatic short answer grading integrates model-based signals with aleatoric uncertainty from semantic clustering of responses and yields more reliable estimates than single...

  15. Heterogeneous Scientific Foundation Model Collaboration

    cs.AI 2026-04 unverdicted novelty 5.0

    Eywa enables language-based agentic AI systems to collaborate with specialized scientific foundation models for improved performance on structured data tasks.

  16. TDD Governance for Multi-Agent Code Generation via Prompt Engineering

    cs.SE 2026-04 unverdicted novelty 5.0

    An AI-native TDD framework operationalizes classical TDD principles as prompt-level and workflow-level governance mechanisms in a layered multi-agent architecture to improve stability and reproducibility of LLM code g...

  17. Reinforcement Learning for LLM-based Multi-Agent Systems through Orchestration Traces

    cs.CL 2026-05 unverdicted novelty 4.0

    This survey organizes RL for LLM multi-agent systems into reward families, credit units, and five orchestration sub-decisions, notes the absence of explicit stopping-decision training in its paper pool, and releases a...

  18. WebUncertainty: Dual-Level Uncertainty Driven Planning and Reasoning For Autonomous Web Agent

    cs.AI 2026-04 unverdicted novelty 4.0

    WebUncertainty improves web agent performance on benchmarks by adaptively selecting planning modes based on task uncertainty and using confidence-induced action uncertainty in MCTS to quantify aleatoric and epistemic ...

  19. ActionNex: A Virtual Outage Manager for Cloud Computing

    cs.AI 2026-04 unverdicted novelty 4.0

    ActionNex is an agentic system for cloud outage management that compresses multimodal signals into critical events, uses hierarchical memory for reasoning, and recommends actions with 71.4% precision on real Azure outages.

Reference graph

Works this paper leans on

300 extracted references · 300 canonical work pages · cited by 16 Pith papers · 55 internal anchors

  1. [1]

    Chain-of-thought prompting elicits reasoning in large language models.Advances in neural information processing systems, 35:24824–24837, 2022

    Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. Chain-of-thought prompting elicits reasoning in large language models.Advances in neural information processing systems, 35:24824–24837, 2022

  2. [2]

    Least-to-Most Prompting Enables Complex Reasoning in Large Language Models

    Denny Zhou, Nathanael Schärli, Le Hou, Jason Wei, Nathan Scales, Xuezhi Wang, Dale Schuurmans, Claire Cui, Olivier Bousquet, Quoc Le, et al. Least-to-most prompting enables complex reasoning in large language models.arXiv preprint arXiv:2205.10625, 2022

  3. [3]

    Pal: Program-aided language models

    Luyu Gao, Aman Madaan, Shuyan Zhou, Uri Alon, Pengfei Liu, Yiming Yang, Jamie Callan, and Graham Neubig. Pal: Program-aided language models. InInternational Conference on Machine Learning, pages 10764–10799. PMLR, 2023

  4. [4]

    Tree of thoughts: Deliberate problem solving with large language models.Advances in neural information processing systems, 36:11809–11822, 2023

    Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Tom Griffiths, Yuan Cao, and Karthik Narasimhan. Tree of thoughts: Deliberate problem solving with large language models.Advances in neural information processing systems, 36:11809–11822, 2023

  5. [5]

    React: Synergizing reasoning and acting in language models

    Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. React: Synergizing reasoning and acting in language models. InInternational Conference on Learning Representations (ICLR), 2023

  6. [6]

    Toolformer: Language models can teach themselves to use tools.Advances in Neural Information Processing Systems, 36:68539–68551, 2023

    Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. Toolformer: Language models can teach themselves to use tools.Advances in Neural Information Processing Systems, 36:68539–68551, 2023

  7. [7]

    Hugginggpt: Solving ai tasks with chatgpt and its friends in hugging face.Advances in Neural Information Processing Systems, 36:38154–38180, 2023

    Yongliang Shen, Kaitao Song, Xu Tan, Dongsheng Li, Weiming Lu, and Yueting Zhuang. Hugginggpt: Solving ai tasks with chatgpt and its friends in hugging face.Advances in Neural Information Processing Systems, 36:38154–38180, 2023

  8. [8]

    A survey on large language model based autonomous agents.Frontiers of Computer Science, 18(6):186345, 2024

    Lei Wang, Chen Ma, Xueyang Feng, Zeyu Zhang, Hao Yang, Jingsen Zhang, Zhiyuan Chen, Jiakai Tang, Xu Chen, Yankai Lin, et al. A survey on large language model based autonomous agents.Frontiers of Computer Science, 18(6):186345, 2024

  9. [9]

    Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG

    Aditi Singh, Abul Ehtesham, Saket Kumar, and Tala Talaei Khoei. Agentic retrieval-augmented generation: A survey on agentic rag.arXiv preprint arXiv:2501.09136, 2025

  10. [10]

    A Survey on Retrieval-Augmented Text Generation for Large Language Models

    Yizheng Huang and Jimmy Huang. A survey on retrieval-augmented text generation for large language models.arXiv preprint arXiv:2404.10981, 2024

  11. [11]

    OpenHands: An Open Platform for AI Software Developers as Generalist Agents

    Xingyao Wang, Boxuan Li, Yufan Song, Frank F Xu, Xiangru Tang, Mingchen Zhuge, Jiayi Pan, Yueqi Song, Bowen Li, Jaskirat Singh, et al. Openhands: An open platform for ai software developers as generalist agents.arXiv preprint arXiv:2407.16741, 2024

  12. [12]

    Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory

    Prateek Chhikara, Dev Khant, Saket Aryan, Taranjeet Singh, and Deshraj Yadav. Mem0: Building production-ready ai agents with scalable long-term memory.arXiv preprint arXiv:2504.19413, 2025

  13. [13]

    Memos: An operating system for memory-augmented generation (mag) in large language models.arXiv preprint arXiv:2505.22101, 2025

    Zhiyu Li, Shichao Song, Hanyu Wang, Simin Niu, Ding Chen, Jiawei Yang, Chenyang Xi, Huayi Lai, Jihao Zhao, Yezhaohui Wang, et al. Memos: An operating system for memory-augmented generation (mag) in large language models.arXiv preprint arXiv:2505.22101, 2025

  14. [14]

    Reflexion: Language agents with verbal reinforcement learning.Advances in Neural Information Processing Systems, 36:8634–8652, 2023

    Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. Reflexion: Language agents with verbal reinforcement learning.Advances in Neural Information Processing Systems, 36:8634–8652, 2023

  15. [15]

    Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning

    Sikuan Yan, Xiufeng Yang, Zuchao Huang, Ercong Nie, Zifeng Ding, Zonggen Li, Xiaowen Ma, Kristian Kersting, Jeff Z. Pan, Hinrich Schütze, Volker Tresp, and Yunpu Ma. Memory-r1: Enhancing large language model agents to manage and utilize memories via reinforcement learning.arXiv preprint arXiv:2508.19828, 2025

  16. [16]

    Autoagents: A framework for automatic agent generation.arXiv preprint arXiv:2309.17288, 2023

    Guangyao Chen, Siwei Dong, Yu Shu, Ge Zhang, Jaward Sesay, Börje F Karlsson, Jie Fu, and Yemin Shi. Autoagents: A framework for automatic agent generation.arXiv preprint arXiv:2309.17288, 2023

  17. [17]

    MetaGPT: Meta programming for a multi-agent collaborative framework

    Sirui Hong, Mingchen Zhuge, Jonathan Chen, Xiawu Zheng, Yuheng Cheng, Jinlin Wang, Ceyao Zhang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, Chenyu Ran, Lingfeng Xiao, Chenglin Wu, and Jürgen Schmidhuber. MetaGPT: Meta programming for a multi-agent collaborative framework. InThe Twelfth International Conference on Learning Representations, 202...

  18. [18]

    Unleashing cognitive synergy in large language models: A task-solving agent through multi-persona self-collaboration

    Zhenhailong Wang, Shaoguang Mao, Wenshan Wu, Tao Ge, Furu Wei, and Heng Ji. Unleashing cognitive synergy in large language models: A task-solving agent through multi-persona self-collaboration. In Proc. 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024), 2024

  19. [19]

    Battleagentbench: A benchmark for evaluating cooperation and competition capabilities of language models in multi-agent systems.arXiv preprint arXiv:2408.15971, 2024

    Wei Wang, Dan Zhang, Tao Feng, Boyan Wang, and Jie Tang. Battleagentbench: A benchmark for evaluating cooperation and competition capabilities of language models in multi-agent systems.arXiv preprint arXiv:2408.15971, 2024

  20. [20]

    AgentBench: Evaluating LLMs as Agents

    Xiao Liu, Hao Yu, Hanchen Zhang, Yifan Xu, Xuanyu Lei, Hanyu Lai, Yu Gu, Hangliang Ding, Kaiwen Men, Kejuan Yang, Shudan Zhang, Xiang Deng, Aohan Zeng, Zhengxiao Du, Chenhui Zhang, Sheng Shen, Tianjun Zhang, Yu Su, Huan Sun, Minlie Huang, Yuxiao Dong, and Jie Tang. Agentbench: Evaluating llms as agents.arXiv preprint arXiv:2308.03688, 2023. URL https://www...

  21. [21]

    Multiagentbench: Evaluating the collaboration and competition of llm agents.arXiv preprint arXiv:2503.01935, 2025

    Kunlun Zhu, Hongyi Du, Zhaochen Hong, Xiaocheng Yang, Shuyi Guo, Zhe Wang, Zhenhailong Wang, Cheng Qian, Xiangru Tang, Heng Ji, et al. Multiagentbench: Evaluating the collaboration and competition of llm agents.arXiv preprint arXiv:2503.01935, 2025

  22. [22]

    Tree-of-code: A self-growing tree framework for end-to-end code generation and execution in complex tasks

    Ziyi Ni, Yifan Li, Ning Yang, Dou Shen, Pin Lyu, and Daxiang Dong. Tree-of-code: A self-growing tree framework for end-to-end code generation and execution in complex tasks. InFindings of the Association for Computational Linguistics: ACL 2025, pages 9804–9819, 2025

  23. [23]

    Search-o1: Agentic Search-Enhanced Large Reasoning Models

    Xiaoxi Li, Guanting Dong, Jiajie Jin, Yuyao Zhang, Yujia Zhou, Yutao Zhu, Peitian Zhang, and Zhicheng Dou. Search-o1: Agentic search-enhanced large reasoning models.arXiv preprint arXiv:2501.05366, 2025

  24. [24]

    A-MEM: Agentic Memory for LLM Agents

    Wujiang Xu, Kai Mei, Hang Gao, Juntao Tan, Zujie Liang, and Yongfeng Zhang. A-mem: Agentic memory for llm agents.arXiv preprint arXiv:2502.12110, 2025

  25. [25]

    Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory

    Tianxin Wei, Noveen Sachdeva, Benjamin Coleman, Zhankui He, Yuanchen Bei, Xuying Ning, Mengting Ai, Yunzhe Li, Jingrui He, Ed H Chi, et al. Evo-memory: Benchmarking llm agent test-time learning with self-evolving memory.arXiv preprint arXiv:2511.20857, 2025

  26. [26]

    Coevolving with the other you: Fine-tuning llm with sequential cooperative multi-agent reinforcement learning

    Hao Ma, Tianyi Hu, Zhiqiang Pu, Liu Boyin, Xiaolin Ai, Yanyan Liang, and Min Chen. Coevolving with the other you: Fine-tuning llm with sequential cooperative multi-agent reinforcement learning. Advances in Neural Information Processing Systems, 37:15497–15525, 2024

  27. [27]

    Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

    Bowen Jin, Hansi Zeng, Zhenrui Yue, Jinsung Yoon, Sercan Arik, Dong Wang, Hamed Zamani, and Jiawei Han. Search-r1: Training llms to reason and leverage search engines with reinforcement learning.arXiv preprint arXiv:2503.09516, 2025

  28. [28]

    Webagent-r1: Training web agents via end-to-end multi-turn reinforcement learning.arXiv preprint arXiv:2505.16421, 2025

    Zhepei Wei, Wenlin Yao, Yao Liu, Weizhi Zhang, Qin Lu, Liang Qiu, Changlong Yu, Puyang Xu, Chao Zhang, Bing Yin, et al. Webagent-r1: Training web agents via end-to-end multi-turn reinforcement learning.arXiv preprint arXiv:2505.16421, 2025

  29. [29]

    Solving olympiad geometry without human demonstrations.Nature, 625(7995):476–482, 2024

    Trieu H Trinh, Yuhuai Wu, Quoc V Le, He He, and Thang Luong. Solving olympiad geometry without human demonstrations.Nature, 625(7995):476–482, 2024

  30. [30]

    Mathematical discoveries from program search with large language models.Nature, 625(7995): 468–475, 2024

    Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Matej Balog, M Pawan Kumar, Emilien Dupont, Francisco JR Ruiz, Jordan S Ellenberg, Pengming Wang, Omar Fawzi, et al. Mathematical discoveries from program search with large language models.Nature, 625(7995): 468–475, 2024

  31. [31]

    Vibe coding vs. agentic coding: Fundamentals and practical implications of agentic AI

    Ranjan Sapkota, Konstantinos I Roumeliotis, and Manoj Karkee. Vibe coding vs. agentic coding: Fundamentals and practical implications of agentic AI, 2025

  32. [32]

    Vibe coding — wikipedia.https://en.wikipedia.org/wiki/Vibe_coding, 2025

    Andrej Karpathy. Vibe coding — wikipedia.https://en.wikipedia.org/wiki/Vibe_coding, 2025

  33. [33]

    ChemCrow: Augmenting large-language models with chemistry tools

    Andres M Bran, Sam Cox, Oliver Schilter, Carlo Baldassari, Andrew D White, and Philippe Schwaller. Chemcrow: Augmenting large-language models with chemistry tools.arXiv preprint arXiv:2304.05376, 2023

  34. [34]

    Physical ai agents: Integrating cognitive intelligence with real-world action

    Fouad Bousetouane. Physical ai agents: Integrating cognitive intelligence with real-world action. arXiv preprint arXiv:2501.08944, 2025

  35. [35]

    Matexpert: Decomposing materials discovery by mimicking human experts.arXiv preprint arXiv:2410.21317, 2024

    Qianggang Ding, Santiago Miret, and Bang Liu. Matexpert: Decomposing materials discovery by mimicking human experts.arXiv preprint arXiv:2410.21317, 2024

  36. [36]

    Voyager: An Open-Ended Embodied Agent with Large Language Models

    Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, and Anima Anandkumar. Voyager: An open-ended embodied agent with large language models.arXiv preprint arXiv:2305.16291, 2023

  37. [37]

    Embodiedrag: Dynamic 3d scene graph retrieval for efficient and scalable robot task planning.arXiv preprint arXiv:2410.23968, 2024

    Booker Meghan, Byrd Grayson, Kemp Bethany, Schmidt Aurora, and Rivera Corban. Embodiedrag: Dynamic 3d scene graph retrieval for efficient and scalable robot task planning.arXiv preprint arXiv:2410.23968, 2024. URL https://www.arxiv.org/abs/2410.23968

  38. [38]

    Embodied-R: Collaborative framework for activating embodied spatial reasoning in foundation models via reinforcement learning

    Baining Zhao, Ziyou Wang, Jianjie Fang, Chen Gao, Fanhang Man, Jinqiang Cui, Xin Wang, Xinlei Chen, Yong Li, and Wenwu Zhu. Embodied-r: Collaborative framework for activating embodied spatial reasoning in foundation models via reinforcement learning.arXiv preprint arXiv:2504.12680, 2025

  39. [39]

    Mmedagent: Learning to use medical tools with multi-modal agent.arXiv preprint arXiv:2407.02483, 2024

    Binxu Li, Tiankai Yan, Yuanting Pan, Jie Luo, Ruiyang Ji, Jiayuan Ding, Zhe Xu, Shilong Liu, Haoyu Dong, Zihao Lin, et al. Mmedagent: Learning to use medical tools with multi-modal agent.arXiv preprint arXiv:2407.02483, 2024

  40. [40]

    Biomni: A general-purpose biomedical ai agent.biorxiv, 2025

    Kexin Huang, Serena Zhang, Hanchen Wang, Yuanhao Qu, Yingzhou Lu, Yusuf Roohani, Ryan Li, Lin Qiu, Gavin Li, Junze Zhang, et al. Biomni: A general-purpose biomedical ai agent.biorxiv, 2025

  41. [41]

    WebSailor: Navigating Super-human Reasoning for Web Agent

    Kuan Li, Zhongwang Zhang, Huifeng Yin, Liwen Zhang, Litu Ou, Jialong Wu, Wenbiao Yin, Baixuan Li, Zhengwei Tao, Xinyu Wang, et al. Websailor: Navigating super-human reasoning for web agent. arXiv preprint arXiv:2507.02592, 2025

  42. [42]

    SkillWeaver: Web Agents can Self-Improve by Discovering and Honing Skills

    Boyuan Zheng, Michael Y Fatemi, Xiaolong Jin, Zora Zhiruo Wang, Apurva Gandhi, Yueqi Song, Yu Gu, Jayanth Srinivasa, Gaowen Liu, Graham Neubig, et al. Skillweaver: Web agents can self-improve by discovering and honing skills.arXiv preprint arXiv:2504.07079, 2025

  43. [43]

    AI agents vs. agentic AI: A conceptual taxonomy, applications and challenges

    Ranjan Sapkota, Konstantinos I Roumeliotis, and Manoj Karkee. Ai agents vs. agentic ai: A conceptual taxonomy, applications and challenges.arXiv preprint arXiv:2505.10468, 2025

  44. [44]

    A dynamic llm-powered agent network for task-oriented agent collaboration

    Zijun Liu, Yanzhe Zhang, Peng Li, Yang Liu, and Diyi Yang. A dynamic llm-powered agent network for task-oriented agent collaboration. InFirst Conference on Language Modeling, 2024

  45. [45]

    Shuyan Zhou, Frank F. Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng, Tianyue Ou, Yonatan Bisk, Daniel Fried, Uri Alon, and Graham Neubig. WebArena: A realistic web environment for building autonomous agents. arXiv preprint arXiv:2307.13854, 2023. URL https://www.arxiv.org/abs/2307.13854

  46. [46]

    Jing Yu Koh, Robert Lo, Lawrence Jang, Vikram Duvvur, Ming Chong Lim, Po-Yu Huang, Graham Neubig, Shuyan Zhou, Ruslan Salakhutdinov, and Daniel Fried. VisualWebArena: Evaluating multimodal agents on realistic visual web tasks. arXiv preprint arXiv:2401.13649, 2024. URL https://www.arxiv.org/abs/2401.13649

  47. [47]

    Lawrence Jang, Yinheng Li, Dan Zhao, Charles Ding, Justin Lin, Paul Pu Liang, Rogerio Bonatti, and Kazuhito Koishida. VideoWebArena: Evaluating long context multimodal agents with video understanding web tasks. arXiv preprint arXiv:2410.19100, 2024

  48. [48]

    Mohit Shridhar, Xingdi Yuan, Marc-Alexandre Côté, Yonatan Bisk, Adam Trischler, and Matthew Hausknecht. ALFWorld: Aligning text and embodied environments for interactive learning. arXiv preprint arXiv:2010.03768, 2020

  49. [49]

    Xiang Deng, Yu Gu, Boyuan Zheng, Shijie Chen, Sam Stevens, Boshi Wang, Huan Sun, and Yu Su. Mind2Web: Towards a generalist agent for the web. Advances in Neural Information Processing Systems, 36:28091–28114, 2023

  50. [50]

    Boyu Gou, Zanming Huang, Yuting Ning, Yu Gu, Michael Lin, Weijian Qi, Andrei Kopanev, Botao Yu, Bernal Jiménez Gutiérrez, Yiheng Shu, et al. Mind2Web 2: Evaluating agentic search with agent-as-a-judge. arXiv preprint arXiv:2506.21506, 2025

  51. [51]

    Jie Huang and Kevin Chen-Chuan Chang. Towards reasoning in large language models: A survey. arXiv preprint arXiv:2212.10403, 2022

  52. [52]

    Qiguang Chen, Libo Qin, Jinhao Liu, Dengyun Peng, Jiannan Guan, Peng Wang, Mengkang Hu, Yuhang Zhou, Te Gao, and Wanxiang Che. Towards reasoning era: A survey of long chain-of-thought for reasoning large language models. arXiv preprint arXiv:2503.09567, 2025

  53. [53]

    Fengli Xu, Qianyue Hao, Zefang Zong, Jingwei Wang, Yunke Zhang, Jingyi Wang, Xiaochong Lan, Jiahui Gong, Tianjian Ouyang, Fanjin Meng, et al. Towards large reasoning models: A survey of reinforced reasoning with large language models. arXiv preprint arXiv:2501.09686, 2025

  54. [54]

    Zixuan Ke, Fangkai Jiao, Yifei Ming, Xuan-Phi Nguyen, Austin Xu, Do Xuan Long, Minzhi Li, Chengwei Qin, Peifeng Wang, Silvio Savarese, et al. A survey of frontiers in LLM reasoning: Inference scaling, learning to reason, and agentic systems. arXiv preprint arXiv:2504.09037, 2025

  55. [55]

    Kaiyan Zhang, Yuxin Zuo, Bingxiang He, Youbang Sun, Runze Liu, Che Jiang, Yuchen Fan, Kai Tian, Guoli Jia, Pengfei Li, et al. A survey of reinforcement learning for large reasoning models. arXiv preprint arXiv:2509.08827, 2025

  56. [56]

    Guibin Zhang, Hejia Geng, Xiaohang Yu, Zhenfei Yin, Zaibin Zhang, Zelin Tan, Heng Zhou, Zhongzhi Li, Xiangyuan Xue, Yijiang Li, et al. The landscape of agentic reinforcement learning for LLMs: A survey. arXiv preprint arXiv:2509.02547, 2025

  57. [57]

    Minhua Lin, Zongyu Wu, Zhichao Xu, Hui Liu, Xianfeng Tang, Qi He, Charu Aggarwal, Xiang Zhang, and Suhang Wang. A comprehensive survey on reinforcement learning-based agentic search: Foundations, roles, optimizations, evaluations, and applications. arXiv preprint arXiv:2510.16724, 2025

  58. [58]

    Jinyuan Fang, Yanwen Peng, Xi Zhang, Yingxu Wang, Xinhao Yi, Guibin Zhang, Yi Xu, Bin Wu, Siwei Liu, Zihao Li, et al. A comprehensive survey of self-evolving AI agents: A new paradigm bridging foundation models and lifelong agentic systems. arXiv preprint arXiv:2508.07407, 2025

  59. [59]

    Huan-ang Gao, Jiayi Geng, Wenyue Hua, Mengkang Hu, Xinzhe Juan, Hongzhang Liu, Shilong Liu, Jiahao Qiu, Xuan Qi, Yiran Wu, et al. A survey of self-evolving agents: On path to artificial super intelligence. arXiv preprint arXiv:2507.21046, 2025

  60. [60]

    Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, et al. DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. arXiv preprint arXiv:2501.12948, 2025

  61. [61]

    Pengcheng Jiang, Jiacheng Lin, Lang Cao, Runchu Tian, SeongKu Kang, Zifeng Wang, Jimeng Sun, and Jiawei Han. DeepRetrieval: Hacking real search engines and retrievers with large language models via reinforcement learning. arXiv preprint arXiv:2503.00223, 2025

  62. [62]

    John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017

  63. [63]

    Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, YK Li, Yang Wu, et al. DeepSeekMath: Pushing the limits of mathematical reasoning in open language models. arXiv preprint arXiv:2402.03300, 2024

  64. [64]

    Fanbin Lu, Zhisheng Zhong, Shu Liu, Chi-Wing Fu, and Jiaya Jia. ARPO: End-to-end policy optimization for GUI agents with experience replay. arXiv preprint arXiv:2505.16282, 2025

  65. [65]

    Qiying Yu, Zheng Zhang, Ruofei Zhu, Yufeng Yuan, Xiaochen Zuo, Yu Yue, Weinan Dai, Tiantian Fan, Gaohong Liu, Lingjun Liu, et al. DAPO: An open-source LLM reinforcement learning system at scale. arXiv preprint arXiv:2503.14476, 2025

  66. [66]

    Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, Ahmed Hassan Awadallah, Ryen W White, Doug Burger, and Chi Wang. AutoGen: Enabling next-gen LLM applications via multi-agent conversations. In First Conference on Language Modeling, 2024. URL https://openreview.net/forum?id=BAakY1hNKS

  67. [67]

    Guohao Li, Hasan Hammoud, Hani Itani, Dmitrii Khizbullin, and Bernard Ghanem. CAMEL: Communicative agents for "mind" exploration of large language model society. Advances in Neural Information Processing Systems, 36:51991–52008, 2023

  68. [68]

    Mingchen Zhuge, Wenyi Wang, Louis Kirsch, Francesco Faccio, Dmitrii Khizbullin, and Jürgen Schmidhuber. GPTSwarm: Language agents as optimizable graphs. In Forty-first International Conference on Machine Learning, 2024

  69. [69]

    Haoyang Hong, Jiajun Yin, Yuan Wang, Jingnan Liu, Zhe Chen, Ailing Yu, Ji Li, Zhiling Ye, Hansong Xiao, Yefei Chen, et al. Multi-agent deep research: Training multi-agent systems with M-GRPO. arXiv preprint arXiv:2511.13288, 2025

  70. [70]

    Alexander Novikov, Ngân Vũ, Marvin Eisenberger, Emilien Dupont, Po-Sen Huang, Adam Zsolt Wagner, Sergey Shirobokov, Borislav Kozlovskii, Francisco JR Ruiz, Abbas Mehrabian, et al. AlphaEvolve: A coding agent for scientific and algorithmic discovery. arXiv preprint arXiv:2506.13131, 2025

  71. [71]

    Binfeng Xu, Zhiyuan Peng, Bowen Lei, Subhabrata Mukherjee, Yuchen Liu, and Dongkuan Xu. ReWOO: Decoupling reasoning from observations for efficient augmented language models. arXiv preprint arXiv:2305.18323, 2023

  72. [72]

    Bo Liu, Yuqian Jiang, Xiaohan Zhang, Qiang Liu, Shiqi Zhang, Joydeep Biswas, and Peter Stone. LLM+P: Empowering large language models with optimal planning proficiency. arXiv preprint arXiv:2304.11477, 2023

  73. [73]

    Karthik Valmeekam, Matthew Marquez, Sarath Sreedharan, and Subbarao Kambhampati. On the planning abilities of large language models: A critical investigation. Advances in Neural Information Processing Systems, 36:75993–76005, 2023

  74. [74]

    Maciej Besta, Nils Blach, Ales Kubicek, Robert Gerstenberger, Michal Podstawski, Lukas Gianinazzi, Joanna Gajda, Tomasz Lehmann, Hubert Niewiadomski, Piotr Nyczyk, et al. Graph of thoughts: Solving elaborate problems with large language models. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 17682–17690, 2024

  75. [75]

    Bilgehan Sel, Ahmad Al-Tawaha, Vanshaj Khattar, Ruoxi Jia, and Ming Jin. Algorithm of thoughts: Enhancing exploration of ideas in large language models. arXiv preprint arXiv:2308.10379, 2023

  76. [76]

    Runquan Gui, Zhihai Wang, Jie Wang, Chi Ma, Huiling Zhen, Mingxuan Yuan, Jianye Hao, Defu Lian, Enhong Chen, and Feng Wu. Hypertree planning: Enhancing LLM reasoning via hierarchical thinking. arXiv preprint arXiv:2505.02322, 2025

  77. [77]

    Jihwan Jeong, Xiaoyu Wang, Jingmin Wang, Scott Sanner, and Pascal Poupart. Reflect-then-plan: Offline model-based planning through a doubly Bayesian lens. arXiv preprint arXiv:2506.06261, 2025

  78. [78]

    Shishir G Patil, Tianjun Zhang, Xin Wang, and Joseph E Gonzalez. Gorilla: Large language model connected with massive APIs. Advances in Neural Information Processing Systems, 37:126544–126565, 2024

  79. [79]

    Tanmay Gupta, Luca Weihs, and Aniruddha Kembhavi. CodeNav: Beyond tool-use to using real-world codebases with LLM agents. arXiv preprint arXiv:2406.12276, 2024

  80. [80]

    Liyi Chen, Panrong Tong, Zhongming Jin, Ying Sun, Jieping Ye, and Hui Xiong. Plan-on-graph: Self-correcting adaptive planning of large language model on knowledge graphs. Advances in Neural Information Processing Systems, 37:37665–37691, 2024

Showing first 80 references.