Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG
Pith reviewed 2026-05-13 03:16 UTC · model grok-4.3
The pith
Embedding autonomous agents into RAG pipelines enables dynamic retrieval strategies and adaptive workflows that static systems cannot achieve.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Agentic RAG embeds autonomous AI agents into the RAG pipeline. These agents leverage design patterns of reflection, planning, tool use, and multi-agent collaboration to dynamically manage retrieval strategies, iteratively refine contextual understanding, and adapt workflows through operational structures ranging from sequential steps to adaptive collaboration, thereby delivering flexibility, scalability, and context-awareness across diverse applications.
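The control flow this claim describes can be illustrated with a toy sketch. The code below is not taken from the paper: the corpus, the word-overlap retriever, and the names `retrieve`, `missing_terms`, `static_rag`, and `agentic_rag` are all hypothetical stand-ins, and the "reflection" step is reduced to a check for uncovered terms. It only aims to show the difference between a single fixed retrieval and a loop that inspects its own context and plans a follow-up query.

```python
from dataclasses import dataclass, field

# Toy corpus standing in for a document store (hypothetical content).
CORPUS = {
    "doc1": "aspirin reduces fever and mild pain",
    "doc2": "ibuprofen interacts with blood pressure medication",
    "doc3": "aspirin interacts with ibuprofen when taken together",
}


def retrieve(query: str, k: int = 1) -> list[str]:
    """Rank documents by word overlap with the query (stand-in for a real retriever)."""
    query_words = set(query.lower().split())
    ranked = sorted(
        CORPUS,
        key=lambda doc_id: len(query_words & set(CORPUS[doc_id].split())),
        reverse=True,
    )
    return ranked[:k]


@dataclass
class AgentState:
    needed_terms: set[str]                             # what the answer must cover
    context: list[str] = field(default_factory=list)   # retrieved document ids
    queries: list[str] = field(default_factory=list)   # queries issued so far


def missing_terms(state: AgentState) -> set[str]:
    """Reflection step (simplified): which needed terms does the context not cover yet?"""
    seen: set[str] = set()
    for doc_id in state.context:
        seen |= set(CORPUS[doc_id].split())
    return state.needed_terms - seen


def static_rag(question: str) -> list[str]:
    """Static pipeline: one retrieval, no inspection of the result."""
    return retrieve(question)


def agentic_rag(question: str, needed_terms: set[str], max_steps: int = 3) -> AgentState:
    """Agentic loop: retrieve, reflect on gaps, plan a new query, repeat."""
    state = AgentState(needed_terms=set(needed_terms))
    query = question
    for _ in range(max_steps):
        state.queries.append(query)
        for doc_id in retrieve(query):
            if doc_id not in state.context:
                state.context.append(doc_id)
        gaps = missing_terms(state)
        if not gaps:
            break  # context covers everything the agent was asked to cover
        query = " ".join(sorted(gaps))  # planning step: target the gaps directly
    return state
```

Calling `agentic_rag("does aspirin help fever", {"fever", "interacts"})` issues a second, reformulated query because the first retrieval leaves "interacts" uncovered, whereas `static_rag` stops after one document. A real system would replace the overlap scorer with a retriever and the gap check with an LLM-based critique.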
What carries the argument
A four-axis taxonomy based on agent cardinality, control structure, autonomy, and knowledge representation that classifies Agentic RAG architectures and surfaces their design trade-offs.
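One way to picture a four-axis taxonomy is as a product of four enumerations, where each system occupies one cell. The sketch below is an illustration under assumptions: the axis names come from the paper's abstract, but the values on each axis (for example `SINGLE`/`MULTI` or `TEXT`/`GRAPH`) and the system names are hypothetical placeholders, not the paper's actual categories.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


# Axis names follow the abstract; the member values are assumed for illustration.
class Cardinality(Enum):
    SINGLE = "single-agent"
    MULTI = "multi-agent"


class Control(Enum):
    SEQUENTIAL = "sequential"
    HIERARCHICAL = "hierarchical"
    ADAPTIVE = "adaptive"


class Autonomy(Enum):
    LOW = "low"
    HIGH = "high"


class Knowledge(Enum):
    TEXT = "unstructured text"
    GRAPH = "graph"


@dataclass(frozen=True)
class Cell:
    """One combination of values across the four axes."""
    cardinality: Cardinality
    control: Control
    autonomy: Autonomy
    knowledge: Knowledge


# Every possible cell in this hypothetical instantiation of the taxonomy.
ALL_CELLS = [Cell(*combo) for combo in product(Cardinality, Control, Autonomy, Knowledge)]


def empty_cells(mapping: dict[str, Cell]) -> list[Cell]:
    """Return cells that no system occupies, i.e. candidate gaps in the design space."""
    occupied = set(mapping.values())
    return [cell for cell in ALL_CELLS if cell not in occupied]


# Hypothetical system-to-cell mapping; a dict forces exactly one cell per system.
example_mapping = {
    "system_a": Cell(Cardinality.SINGLE, Control.SEQUENTIAL, Autonomy.LOW, Knowledge.TEXT),
    "system_b": Cell(Cardinality.MULTI, Control.ADAPTIVE, Autonomy.HIGH, Knowledge.GRAPH),
}
```

With these assumed values the space has 2 × 3 × 2 × 2 = 24 cells, and `empty_cells(example_mapping)` lists the 22 that remain unoccupied. A system that cannot be expressed as a `Cell` at all would be evidence that an axis is missing.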
If this is right
- Agentic designs support multi-step reasoning and complex task management where static RAG fails.
- Healthcare, finance, education, and enterprise document systems gain flexibility and context-awareness from the added agent patterns.
- Designers obtain concrete lessons on trade-offs from the comparative analysis of current frameworks.
- Future work must address evaluation methods, agent coordination, memory management, efficiency, and governance.
Where Pith is reading between the lines
- Similar agent patterns could be tested in non-RAG LLM settings such as pure planning or tool-augmented generation to check for broader applicability.
- New systems that require extra classification dimensions would indicate the taxonomy needs expansion.
- Multi-agent collaboration structures raise questions about how human oversight can be integrated without losing autonomy benefits.
Load-bearing premise
The four-axis taxonomy comprehensively captures all meaningful design trade-offs in existing and future Agentic RAG systems.
What would settle it
A working Agentic RAG implementation that delivers clear gains in adaptability yet cannot be placed on any of the four taxonomy axes, or controlled tests in which no agent-equipped system outperforms static RAG on multi-step reasoning benchmarks.
Original abstract
Large Language Models (LLMs) have advanced artificial intelligence by enabling human-like text generation and natural language understanding. However, their reliance on static training data limits their ability to respond to dynamic, real-time queries, resulting in outdated or inaccurate outputs. Retrieval-Augmented Generation (RAG) has emerged as a solution, enhancing LLMs by integrating real-time data retrieval to provide contextually relevant and up-to-date responses. Despite its promise, traditional RAG systems are constrained by static workflows and lack the adaptability required for multi-step reasoning and complex task management. Agentic Retrieval-Augmented Generation (Agentic RAG) transcends these limitations by embedding autonomous AI agents into the RAG pipeline. These agents leverage agentic design patterns (reflection, planning, tool use, and multi-agent collaboration) to dynamically manage retrieval strategies, iteratively refine contextual understanding, and adapt workflows through operational structures ranging from sequential steps to adaptive collaboration. This integration enables Agentic RAG systems to deliver flexibility, scalability, and context-awareness across diverse applications. This paper presents an analytical survey of Agentic RAG systems. It traces the evolution of RAG paradigms, introduces a principled taxonomy of Agentic RAG architectures based on agent cardinality, control structure, autonomy, and knowledge representation, and provides a comparative analysis of design trade-offs across existing frameworks. The survey examines applications in healthcare, finance, education, and enterprise document processing, and distills practical lessons for system designers and practitioners. Finally, it identifies key open research challenges related to evaluation, coordination, memory management, efficiency, and governance, outlining directions for future research.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper surveys Agentic Retrieval-Augmented Generation (Agentic RAG), which augments traditional RAG by embedding autonomous AI agents that employ design patterns such as reflection, planning, tool use, and multi-agent collaboration. It traces the evolution from static RAG systems, introduces a four-axis taxonomy (agent cardinality, control structure, autonomy, knowledge representation), performs a comparative analysis of design trade-offs, reviews applications in healthcare, finance, education, and enterprise document processing, and identifies open challenges in evaluation, coordination, memory management, efficiency, and governance.
Significance. If the taxonomy can be shown to be comprehensive and the mappings of existing systems made explicit, the survey would provide a useful organizing framework for an emerging subfield, helping researchers compare architectures and identify gaps. The synthesis of agentic patterns and domain applications adds value for practitioners seeking to move beyond static retrieval pipelines.
major comments (2)
- [Taxonomy section] The manuscript asserts that the taxonomy based on agent cardinality, control structure, autonomy, and knowledge representation is 'principled' and captures design trade-offs, yet provides no derivation, completeness argument, or exhaustive mapping showing that every cited system fits uniquely into one combination of axes and that omitted dimensions (e.g., retrieval-frequency adaptation or cost-latency trade-offs) are redundant. This directly undermines the central claim that the taxonomy organizes all meaningful Agentic RAG architectures.
- [Applications and comparative analysis section] The claimed flexibility, scalability, and context-awareness enabled by Agentic RAG are supported only by high-level descriptions of patterns rather than concrete quantitative comparisons, ablation results, or specific system-to-taxonomy mappings with performance metrics; without these, the practical lessons for system designers rest on assertion rather than demonstrated evidence.
minor comments (2)
- [Abstract] The abstract could explicitly state how many frameworks were reviewed and what the primary quantitative or qualitative findings of the comparative analysis are.
- [Taxonomy presentation] In the figure or table presenting the taxonomy, ensure each reviewed system is explicitly assigned to a cell with a brief justification so readers can verify the classification.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback, which highlights important opportunities to strengthen the rigor of our taxonomy and the evidential basis of our analysis. We address each major comment below and commit to targeted revisions that preserve the survey's scope while improving clarity and substantiation.
Point-by-point responses
-
Referee: [Taxonomy section] The manuscript asserts that the taxonomy based on agent cardinality, control structure, autonomy, and knowledge representation is 'principled' and captures design trade-offs, yet provides no derivation, completeness argument, or exhaustive mapping showing that every cited system fits uniquely into one combination of axes and that omitted dimensions (e.g., retrieval-frequency adaptation or cost-latency trade-offs) are redundant. This directly undermines the central claim that the taxonomy organizes all meaningful Agentic RAG architectures.
Authors: We agree that an explicit rationale for the taxonomy's construction would strengthen the central claim. The four axes were identified through iterative analysis of recurring design decisions across the surveyed literature, where they best differentiate architectural families and associated trade-offs in flexibility versus complexity. While surveys commonly introduce taxonomies via synthesis rather than axiomatic derivation, we will add a dedicated subsection detailing the selection process, why alternative dimensions (such as retrieval-frequency adaptation) are treated as secondary and subsumed under autonomy and control, and an exhaustive mapping table assigning every cited system to a unique axis combination. This revision will make the completeness argument explicit without altering the taxonomy itself.
Revision: partial
-
Referee: [Applications and comparative analysis section] The claimed flexibility, scalability, and context-awareness enabled by Agentic RAG are supported only by high-level descriptions of patterns rather than concrete quantitative comparisons, ablation results, or specific system-to-taxonomy mappings with performance metrics; without these, the practical lessons for system designers rest on assertion rather than demonstrated evidence.
Authors: As a survey, the manuscript synthesizes existing literature rather than conducting new experiments or ablations. We will nevertheless enhance the applications and comparative analysis sections by adding explicit system-to-taxonomy mappings and by extracting and tabulating concrete quantitative results (e.g., accuracy gains, latency or cost reductions) reported in the original papers for each reviewed system. Where such metrics are unavailable in the source literature, we will note the limitation. These additions will ground the claimed benefits and practical lessons in documented evidence while remaining within the survey format.
Revision: partial
Circularity Check
No significant circularity: survey performs synthesis and classification without derivations or reductions.
Full rationale
This paper is an analytical survey that traces RAG evolution, proposes a four-axis taxonomy for classification, reviews applications, and lists open challenges. No equations, predictions, fitted parameters, or derivation chains exist that could reduce to prior definitions or self-citations by construction. The taxonomy is introduced as an organizing framework for existing systems rather than derived from first principles in a self-referential manner. Any self-citations (if present) support literature review and are not load-bearing for a central claim that reduces to them. The work is self-contained as synthesis against external literature benchmarks.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith.Foundation.DAlembert.Inevitability · bilinear_family_forced (tag: unclear)
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "Agentic RAG transcends these limitations by embedding autonomous AI agents into the RAG pipeline. These agents leverage agentic design patterns (reflection, planning, tool use, and multi-agent collaboration)"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 29 Pith papers
-
CuSearch: Curriculum Rollout Sampling via Search Depth for Agentic RAG
CuSearch reallocates rollout budget in RLVR toward deeper-search trajectories as a proxy for retrieval supervision density, yielding up to 11.8 exact-match gains over uniform GRPO sampling on ZeroSearch.
-
LatentRAG: Latent Reasoning and Retrieval for Efficient Agentic RAG
LatentRAG performs agentic RAG by generating latent tokens for thoughts and subqueries in one forward pass, matching explicit methods' accuracy on seven benchmarks while reducing latency by ~90%.
-
Retrieval from Within: An Intrinsic Capability of Attention-Based Models
Attention-based models can intrinsically retrieve and reuse pre-encoded evidence chunks via decoder attention queries, unifying retrieval with generation and outperforming external RAG pipelines on QA benchmarks.
-
SCOUT: Active Information Foraging for Long-Text Understanding with Decoupled Epistemic States
SCOUT achieves state-of-the-art long-text understanding with up to 8x lower token use by actively foraging for sparse query-relevant information and updating a compact provenance-grounded epistemic state.
-
RAG-Reflect: Agentic Retrieval-Augmented Generation with Reflections for Comment-Driven Code Maintenance on Stack Overflow
RAG-Reflect achieves F1=0.78 on valid comment-edit prediction using retrieval-augmented reasoning and self-reflection, outperforming baselines and approaching fine-tuned models without retraining.
-
A-MAR: Agent-based Multimodal Art Retrieval for Fine-Grained Artwork Understanding
A-MAR decomposes art queries into reasoning plans to condition retrieval, leading to improved explanation quality and multi-step reasoning on art benchmarks compared to baselines.
-
E2E-REME: Towards End-to-End Microservices Auto-Remediation via Experience-Simulation Reinforcement Fine-Tuning
E2E-REME outperforms nine LLMs in accuracy and efficiency for end-to-end microservice remediation by using experience-simulation reinforcement fine-tuning on a new benchmark called MicroRemed.
-
Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory
Evo-Memory is a new benchmark for self-evolving memory in LLM agents across task streams, with baseline ExpRAG and proposed ReMem method that integrates reasoning, actions, and memory updates for continual improvement.
-
Retrieval from Within: An Intrinsic Capability of Attention-Based Models
Attention-based models can retrieve evidence intrinsically by using decoder attention to score and reuse their own pre-encoded chunks, outperforming separate retrieval pipelines on QA benchmarks.
-
Agentic Retrieval-Augmented Generation for Financial Document Question Answering
FinAgent-RAG achieves 76.81-78.46% execution accuracy on financial QA benchmarks by combining contrastive retrieval, program-of-thought code generation, and adaptive strategy routing, outperforming baselines by 5.62-9...
-
An Agentic Approach to Metadata Reasoning
Metadata Reasoner uses agentic LLM reasoning on metadata to select sufficient and minimal data sources, achieving 83.16% F1 on KramaBench and 85.5% F1 on noisy synthetic benchmarks while avoiding low-quality tables 99...
-
Don't Retrieve, Navigate: Distilling Enterprise Knowledge into Navigable Agent Skills for QA and RAG
Corpus2Skill distills corpora into navigable hierarchical skill trees that LLM agents actively explore for QA and RAG, outperforming dense retrieval and RAPTOR on enterprise benchmarks and characterizing when navigati...
-
ADAM: A Systematic Data Extraction Attack on Agent Memory via Adaptive Querying
ADAM extracts data from LLM agent memory with up to 100% attack success rate by estimating data distribution and selecting queries via entropy guidance.
-
CuSearch: Curriculum Rollout Sampling via Search Depth for Agentic RAG
CuSearch reallocates fixed training budget toward deeper-search rollouts in RLVR for agentic RAG, treating search depth as an annotation-free proxy for supervision density and reporting up to 11.8 exact-match gains ov...
-
Personalized Deep Research: A User-Centric Framework, Dataset, and Hybrid Evaluation for Knowledge Discovery
PDR is a user-context-aware framework for LLM research agents that improves report relevance over static baselines, supported by a new dataset and hybrid evaluation.
-
AgenticRAG: Agentic Retrieval for Enterprise Knowledge Bases
AgenticRAG equips an LLM with iterative retrieval and navigation tools, delivering 49.6% recall@1 on BRIGHT, 0.96 factuality on WixQA, and 92% correctness on FinanceBench.
-
When Agents Handle Secrets: A Survey of Confidential Computing for Agentic AI
A survey providing a taxonomy of TEE platforms, an agent-centric threat model, and open challenges for applying confidential computing to secure agentic AI systems.
-
SiriusHelper: An LLM Agent-Based Operations Assistant for Big Data Platforms
SiriusHelper deploys an LLM agent with intent routing, DeepSearch multi-hop retrieval, and automated SOP distillation to outperform alternatives and reduce ticket volume by 20.8% on Tencent's big data platform.
-
Can QPP Choose the Right Query Variant? Evaluating Query Variant Selection for RAG Pipelines
QPP methods can select query variants that boost end-to-end RAG quality over the original query, though retrieval-optimized variants often fail to produce the best generated answers, revealing a utility gap.
-
Mind DeepResearch Technical Report
MindDR combines a Planning Agent, DeepSearch Agent, and Report Agent with SFT cold-start, Search-RL, Report-RL, and preference alignment to reach competitive scores on research benchmarks using 30B-scale models.
-
When Agents Handle Secrets: A Survey of Confidential Computing for Agentic AI
A structured survey of confidential computing for agentic AI that catalogs TEE platforms, agent-specific threats, transferable defenses, and remaining gaps in end-to-end frameworks.
-
Adaptive ToR: Complexity-Aware Tree-Based Retrieval for Pareto-Optimal Multi-Intent NLU
Adaptive ToR uses a query complexity classifier to route multi-intent queries to either fast single-step or deeper hierarchical retrieval, improving accuracy by 9.7% and cutting latency by 37.6% on NLU benchmarks.
-
LARA: Validation-Driven Agentic Supercomputer Workflows for Atomistic Modeling
LARA-HPC introduces a validation-first agentic system with dry-run verification and multi-phase refinement that improves robustness of AI-generated DFT workflows on HPC systems.
-
From LLM Reasoning to Autonomous AI Agents: A Comprehensive Review
A survey consolidating benchmarks, agent frameworks, real-world applications, and protocols for LLM-based autonomous agents into a proposed taxonomy with recommendations for future research.
-
Toward Agentic RAG for Ukrainian
Agentic RAG for Ukrainian improves answer accuracy via retries but is still limited by document and page retrieval quality.
-
PAL: Personal Adaptive Learner
PAL is an AI platform that converts lecture videos into real-time adaptive interactive learning with dynamic questions and tailored end-of-session summaries.
-
Automotive Engineering-Centric Agentic AI Workflow Framework
The paper presents the Agentic Engineering Intelligence (AEI) framework for modeling automotive engineering workflows as sequential decision processes with AI agent support.
-
A Brief Overview: Agentic Reinforcement Learning In Large Language Models
This review synthesizes conceptual foundations, methods, challenges, and future directions for agentic reinforcement learning in large language models.
-
A Brief Overview: Agentic Reinforcement Learning In Large Language Models
The paper surveys the conceptual foundations, methodological innovations, challenges, and future directions of agentic reinforcement learning frameworks that embed cognitive capabilities like meta-reasoning and self-r...
Reference graph
Works this paper leans on
-
[1]
Large language models: A survey, 2024
Shervin Minaee, Tomas Mikolov, Narjes Nikzad, Meysam Chenaghlu, Richard Socher, Xavier Amatriain, and Jianfeng Gao. Large language models: A survey, 2024
work page 2024
-
[2]
Exploring language models: A comprehensive survey and analysis
Aditi Singh. Exploring language models: A comprehensive survey and analysis. In2023 International Con- ference on Research Methodologies in Knowledge Management, Artificial Intelligence and Telecommunication Engineering (RMKMATE), pages 1–4, 2023
work page 2023
-
[3]
A survey of large language models, 2024
Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang Ren, Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jian-Yun Nie, and Ji-Rong Wen. A survey of large language models, 2024
work page 2024
-
[4]
A complete survey on llm-based ai chatbots, 2024
Sumit Kumar Dam, Choong Seon Hong, Yu Qiao, and Chaoning Zhang. A complete survey on llm-based ai chatbots, 2024
work page 2024
-
[5]
A survey of ai text-to-image and ai text-to-video generators
Aditi Singh. A survey of ai text-to-image and ai text-to-video generators. In2023 4th International Conference on Artificial Intelligence, Robotics and Control (AIRC), pages 32–36, 2023
work page 2023
-
[6]
Exploring prompt engineering: A systematic review with swot analysis, 2024
Aditi Singh, Abul Ehtesham, Gaurav Kumar Gupta, Nikhil Kumar Chatta, Saket Kumar, and Tala Talaei Khoei. Exploring prompt engineering: A systematic review with swot analysis, 2024
work page 2024
-
[7]
Lei Huang, Weijiang Yu, Weitao Ma, Weihong Zhong, Zhangyin Feng, Haotian Wang, Qianglong Chen, Weihua Peng, Xiaocheng Feng, Bing Qin, and Ting Liu. A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions.ACM Transactions on Information Systems, November 2024
work page 2024
-
[8]
Ioannidis, Huzefa Rangwala, and Christos Faloutsos
Meng-Chieh Lee, Qi Zhu, Costas Mavromatis, Zhen Han, Soji Adeshina, Vassilis N. Ioannidis, Huzefa Rangwala, and Christos Faloutsos. Agent-g: An agentic framework for graph retrieval augmented generation, 2024
work page 2024
-
[9]
Retrieval-augmented generation for ai-generated content: A survey, 2024
Penghao Zhao, Hailin Zhang, Qinhan Yu, Zhengren Wang, Yunteng Geng, Fangcheng Fu, Ling Yang, Wentao Zhang, Jie Jiang, and Bin Cui. Retrieval-augmented generation for ai-generated content: A survey, 2024
work page 2024
-
[10]
Xu, Luyu Gao, Zhiqing Sun, Qian Liu, Jane Dwivedi-Yu, Yiming Yang, Jamie Callan, and Graham Neubig
Zhengbao Jiang, Frank F. Xu, Luyu Gao, Zhiqing Sun, Qian Liu, Jane Dwivedi-Yu, Yiming Yang, Jamie Callan, and Graham Neubig. Active retrieval augmented generation, 2023. 36 Table 5: Downstream Tasks and Datasets for RAG Evaluation (Adapted from [21] Category Task Type Datasets and References QA Single-hop QA Natural Questions (NQ) [68], TriviaQA [69], SQu...
work page 2023
-
[11]
A comprehensive survey on vector database: Storage and retrieval technique, challenge, 2023
Yikun Han, Chunjiang Liu, and Pengfei Wang. A comprehensive survey on vector database: Storage and retrieval technique, challenge, 2023
work page 2023
-
[12]
Ranjan Sapkota, Konstantinos I. Roumeliotis, and Manoj Karkee. Ai agents vs. agentic ai: A conceptual taxonomy, applications and challenges.Information Fusion, 126:103599, 2026
work page 2026
-
[13]
Building effective agents, 2024
Anthropic. Building effective agents, 2024. https://www.anthropic.com/research/ building-effective-agents. Accessed: January 15, 2026
work page 2024
-
[14]
A Practical Guide for Designing, Developing, and Deploying Production-Grade Agentic AI Workflows,
E. Bandara, R. Gore, P. Foytik, S. Shetty, R. Mukkamala, A. Rahman, X. Liang, S. H. Bouk, A. Hass, S. Rajapakse, N. W. Keong, K. De Zoysa, A. Withanage, and N. Loganathan. A practical guide for designing, developing, and deploying production-grade agentic ai workflows.arXiv preprint, abs/2512.08769, 2025. Old Dominion University, Norfolk, V A, USA; Deloit...
-
[15]
Agentic retrieval-augmented generation for time series analysis, 2024
Chidaksh Ravuru, Sagar Srinivas Sakhinana, and Venkataramana Runkana. Agentic retrieval-augmented generation for time series analysis, 2024
work page 2024
-
[16]
Towards reasoning in large language models: A survey, 2023
Jie Huang and Kevin Chen-Chuan Chang. Towards reasoning in large language models: A survey, 2023
work page 2023
-
[17]
Graph retrieval-augmented generation: A survey, 2024
Boci Peng, Yun Zhu, Yongchao Liu, Xiaohe Bo, Haizhou Shi, Chuntao Hong, Yan Zhang, and Siliang Tang. Graph retrieval-augmented generation: A survey, 2024
work page 2024
-
[18]
Revolutionizing mental health care through langchain: A journey with a large language model
Aditi Singh, Abul Ehtesham, Saifuddin Mahmud, and Jong-Hoon Kim. Revolutionizing mental health care through langchain: A journey with a large language model. InIEEE 14th Annual Computing and Communication Workshop and Conference (CCWC), pages 0073–0078, 2024
work page 2024
-
[19]
Gaurav Kumar Gupta, Aditi Singh, Sijo Valayakkad Manikandan, and Abul Ehtesham. Digital diagnostics: The potential of large language models in recognizing symptoms of common illnesses.AI, 6(1), 2025
work page 2025
-
[20]
Encouraging responsible use of generative ai in education: A reward-based learning approach
Aditi Singh, Abul Ehtesham, Saket Kumar, Gaurav Kumar Gupta, and Tala Talaei Khoei. Encouraging responsible use of generative ai in education: A reward-based learning approach. In Tim Schlippe, Eric C. K. Cheng, and Tianchong Wang, editors,Artificial Intelligence in Education Technologies: New Development and Innovative Practices, pages 404–413, Singapore...
work page 2025
-
[21]
Retrieval-augmented generation for large language models: A survey, 2024
Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Meng Wang, and Haofen Wang. Retrieval-augmented generation for large language models: A survey, 2024
work page 2024
-
[22]
Dense passage retrieval for open-domain question answering, 2020
Vladimir Karpukhin, Barlas O˘guz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen tau Yih. Dense passage retrieval for open-domain question answering, 2020
work page 2020
-
[23]
A survey on the memory mechanism of large language model based agents, 2024
Zeyu Zhang, Xiaohe Bo, Chen Ma, Rui Li, Xu Chen, Quanyu Dai, Jieming Zhu, Zhenhua Dong, and Ji-Rong Wen. A survey on the memory mechanism of large language model based agents, 2024
work page 2024
-
[24]
Critic: Large language models can self-correct with tool-interactive critiquing, 2024
Zhibin Gou, Zhihong Shao, Yeyun Gong, Yelong Shen, Yujiu Yang, Nan Duan, and Weizhu Chen. Critic: Large language models can self-correct with tool-interactive critiquing, 2024
work page 2024
-
[25]
Understanding the planning of llm agents: A survey, 2024
Xu Huang, Weiwen Liu, Xiaolong Chen, Xingmei Wang, Hao Wang, Defu Lian, Yasheng Wang, Ruiming Tang, and Enhong Chen. Understanding the planning of llm agents: A survey, 2024
work page 2024
-
[26]
Enhancing ai systems with agentic workflows patterns in large language model
Aditi Singh, Abul Ehtesham, Saket Kumar, and Tala Talaei Khoei. Enhancing ai systems with agentic workflows patterns in large language model. InIEEE World AI IoT Congress (AIIoT), pages 527–532, 2024
work page 2024
-
[27]
How agents can improve llm performance
DeepLearning.AI. How agents can improve llm performance. https://www.deeplearning.ai/the-batch/ how-agents-can-improve-llm-performance/?ref=dl-staging-website.ghost.io , 2024. Ac- cessed: 2026-01-13
work page 2024
-
[28]
Self-refine: Iterative refinement with self-feedback, 2023
Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, Shashank Gupta, Bodhisattwa Prasad Majumder, Katherine Hermann, Sean Welleck, Amir Yazdanbakhsh, and Peter Clark. Self-refine: Iterative refinement with self-feedback, 2023
work page 2023
-
[29]
Reflexion: Language agents with verbal reinforcement learning, 2023
Noah Shinn, Federico Cassano, Edward Berman, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. Reflexion: Language agents with verbal reinforcement learning, 2023
work page 2023
-
[30]
Chawla, Olaf Wiest, and Xiangliang Zhang
Taicheng Guo, Xiuying Chen, Yaqi Wang, Ruidi Chang, Shichao Pei, Nitesh V . Chawla, Olaf Wiest, and Xiangliang Zhang. Large language model based multi-agents: A survey of progress and challenges, 2024
work page 2024
-
[31]
Langgraph workflows tutorial, 2025
LangChain. Langgraph workflows tutorial, 2025. https://langchain-ai.github.io/langgraph/ tutorials/workflows/. Accessed: January 18, 2026
work page 2025
-
[32]
Weaviate Blog. What is agentic rag? https://weaviate.io/blog/what-is-agentic-rag#:~:text=is% 20Agentic%20RAG%3F-,%E2%80%8B,of%20the%20non%2Dagentic%20pipeline.Accessed: 2026-01-14
work page 2026
-
[33]
Corrective retrieval augmented generation, 2024
Shi-Qi Yan, Jia-Chen Gu, Yun Zhu, and Zhen-Hua Ling. Corrective retrieval augmented generation, 2024
work page 2024
-
[34]
Langgraph crag: Contextualized retrieval-augmented generation tutorial
LangGraph CRAG Tutorial. Langgraph crag: Contextualized retrieval-augmented generation tutorial. https: //langchain-ai.github.io/langgraph/tutorials/rag/langgraph_crag/. Accessed: 2026-01-14
work page 2026
-
[35]
Soyeong Jeong, Jinheon Baek, Sukmin Cho, Sung Ju Hwang, and Jong C. Park. Adaptive-rag: Learning to adapt retrieval-augmented large language models through question complexity, 2024
work page 2024
-
[36]
Langgraph adaptive rag: Adaptive retrieval-augmented generation tu- torial
LangGraph Adaptive RAG Tutorial. Langgraph adaptive rag: Adaptive retrieval-augmented generation tu- torial. https://langchain-ai.github.io/langgraph/tutorials/rag/langgraph_adaptive_rag/. Accessed: 2026-01-14
work page 2026
-
[37]
Zhili Shen, Chenxin Diao, Pavlos V ougiouklis, Pascual Merita, Shriram Piramanayagam, Damien Graux, Dandan Tu, Zeren Jiang, Ruofei Lai, Yang Ren, and Jeff Z. Pan. Gear: Graph-enhanced agent for retrieval-augmented generation, 2024. 38
work page 2024
-
[38]
Introducing agentic document workflows
LlamaIndex. Introducing agentic document workflows. https://www.llamaindex.ai/blog/ introducing-agentic-document-workflows, 2025. Accessed: 2026-01-13
work page 2025
-
[39]
How twitch used agentic workflow with rag on amazon bedrock to supercharge ad sales
AWS Machine Learning Blog. How twitch used agentic workflow with rag on amazon bedrock to supercharge ad sales. https://aws.amazon.com/blogs/machine-learning/ how-twitch-used-agentic-workflow-with-rag-on-amazon-bedrock-to-supercharge-ad-sales/ ,
-
[40]
Accessed: 2026-01-13
work page 2026
-
[41]
Patient case summary workflow using llamacloud
LlamaCloud Demo Repository. Patient case summary workflow using llamacloud. https: //github.com/run-llama/llamacloud-demo/blob/main/examples/document_workflows/ patient_case_summary/patient_case_summary.ipynb, 2025. Accessed: 2026-01-13
work page 2025
-
[42]
Contract review workflow using llamacloud
LlamaCloud Demo Repository. Contract review workflow using llamacloud. https://github.com/ run-llama/llamacloud-demo/blob/main/examples/document_workflows/contract_review/ contract_review.ipynb, 2025. Accessed: 2026-01-13
work page 2025
-
[43]
Auto insurance claims workflow using llamacloud
LlamaCloud Demo Repository. Auto insurance claims workflow using llamacloud. https: //github.com/run-llama/llamacloud-demo/blob/main/examples/document_workflows/auto_ insurance_claims/auto_insurance_claims.ipynb, 2025. Accessed: 2026-01-13
work page 2025
-
[44]
Research paper report generation workflow using llamacloud
LlamaCloud Demo Repository. Research paper report generation workflow using llamacloud. https://github.com/run-llama/llamacloud-demo/blob/main/examples/report_generation/ research_paper_report_generation.ipynb, 2025. Accessed: 2026-01-13
work page 2025
-
[45]
Langgraph agentic rag: Nodes and edges tutorial
LangGraph Agentic RAG Tutorial. Langgraph agentic rag: Nodes and edges tutorial. https://langchain-ai.github.io/langgraph/tutorials/rag/langgraph_agentic_rag/#nodes-and-edges. Accessed: 2026-01-14
work page 2026
-
[46]
LlamaIndex Blog. Agentic rag with llamaindex. https://www.llamaindex.ai/blog/agentic-rag-with-llamaindex-2721b8a49ff6. Accessed: 2026-01-14
work page 2026
-
[47]
Hugging Face Cookbook. Agentic rag: Turbocharge your retrieval-augmented generation with query reformulation and self-query. https://huggingface.co/learn/cookbook/en/agent_rag. Accessed: 2026-01-14
work page 2026
-
[48]
Agentic rag: Combining rag with agents for enhanced information retrieval
Qdrant Blog. Agentic rag: Combining rag with agents for enhanced information retrieval. https://qdrant.tech/articles/agentic-rag/. Accessed: 2026-01-14
work page 2026
-
[49]
crewai: A github repository for ai projects
crewAI Inc. crewai: A github repository for ai projects. https://github.com/crewAIInc/crewAI, 2025. Accessed: 2026-01-15
work page 2025
-
[50]
Ag2: A github repository for advanced generative ai research
AG2AI Contributors. Ag2: A github repository for advanced generative ai research. https://github.com/ag2ai/ag2, 2025. Accessed: 2026-01-15
work page 2025
-
[51]
Autogen: Enabling next-gen llm applications via multi-agent conversation framework
Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, Ahmed Hassan Awadallah, Ryen W White, Doug Burger, and Chi Wang. Autogen: Enabling next-gen llm applications via multi-agent conversation framework. 2023
work page 2023
-
[52]
Training language model agents without modifying language models
Shaokun Zhang, Jieyu Zhang, Jiale Liu, Linxin Song, Chi Wang, Ranjay Krishna, and Qingyun Wu. Training language model agents without modifying language models. ICML’24, 2024
work page 2024
-
[53]
Swarm: Lightweight multi-agent orchestration framework
OpenAI. Swarm: Lightweight multi-agent orchestration framework. https://github.com/openai/swarm. Accessed: 2026-01-14
work page 2026
-
[54]
LlamaIndex Documentation. Agentic rag using vertex ai. https://docs.llamaindex.ai/en/stable/examples/agent/agentic_rag_using_vertex_ai/. Accessed: 2026-01-14
work page 2026
-
[55]
Semantic kernel overview, 2025
Microsoft. Semantic kernel overview, 2025. https://learn.microsoft.com/en-us/semantic-kernel/overview/. Accessed: January 18, 2026
work page 2025
-
[56]
Semantic kernel github repository, 2025
Microsoft. Semantic kernel github repository, 2025. https://github.com/microsoft/semantic-kernel. Accessed: January 18, 2026
work page 2025
-
[57]
Agentic rag: Ai agents with ibm granite models
IBM Granite Community. Agentic rag: Ai agents with ibm granite models. https://github.com/ibm-granite-community/granite-snack-cookbook/blob/main/recipes/AI-Agents/Agentic_RAG.ipynb. Accessed: 2026-01-14
work page 2026
-
[58]
From LLM Reasoning to Autonomous AI Agents: A Comprehensive Review
Mohamed Amine Ferrag, Norbert Tihanyi, and Merouane Debbah. From llm reasoning to autonomous ai agents: A comprehensive review. arXiv preprint, abs/2504.19678, 2025. Guelma University, Algeria; Technology Innovation Institute, UAE; Eotvos Lorand University, Hungary; Khalifa University of Science and Technology, UAE; Corresponding author: ferrag.mohamedamin...
work page internal anchor Pith review arXiv 2025
-
[59]
Beir: A heterogenous benchmark for zero-shot evaluation of information retrieval models, 2021
Nandan Thakur, Nils Reimers, Andreas Rücklé, Abhishek Srivastava, and Iryna Gurevych. Beir: A heterogenous benchmark for zero-shot evaluation of information retrieval models, 2021.
work page 2021
-
[60]
Ms marco: A human generated machine reading comprehension dataset, 2018
Payal Bajaj, Daniel Campos, Nick Craswell, Li Deng, Jianfeng Gao, Xiaodong Liu, Rangan Majumder, Andrew McNamara, Bhaskar Mitra, Tri Nguyen, Mir Rosenberg, Xia Song, Alina Stoica, Saurabh Tiwary, and Tong Wang. Ms marco: A human generated machine reading comprehension dataset, 2018
work page 2018
-
[61]
Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos, Jimmy Lin, Ellen M. Voorhees, and Ian Soboroff. Overview of the trec 2022 deep learning track. In Text REtrieval Conference (TREC). NIST, TREC, March 2023
work page 2022
-
[62]
Musique: Multihop questions via single-hop question composition, 2022
Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, and Ashish Sabharwal. Musique: Multihop questions via single-hop question composition, 2022
work page 2022
-
[63]
Constructing a multi-hop qa dataset for comprehensive evaluation of reasoning steps, 2020
Xanh Ho, Anh-Khoa Duong Nguyen, Saku Sugawara, and Akiko Aizawa. Constructing a multi-hop qa dataset for comprehensive evaluation of reasoning steps, 2020
work page 2020
-
[64]
Hotpotqa: A dataset for diverse, explainable multi-hop question answering, 2018
Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William W. Cohen, Ruslan Salakhutdinov, and Christopher D. Manning. Hotpotqa: A dataset for diverse, explainable multi-hop question answering, 2018
work page 2018
-
[65]
Ragbench: Explainable benchmark for retrieval-augmented generation systems, 2024
Robert Friel, Masha Belyi, and Atindriyo Sanyal. Ragbench: Explainable benchmark for retrieval-augmented generation systems, 2024
work page 2024
-
[66]
Bergen: A benchmarking library for retrieval-augmented generation, 2024
David Rau, Hervé Déjean, Nadezhda Chirkova, Thibault Formal, Shuai Wang, Vassilina Nikoulina, and Stéphane Clinchant. Bergen: A benchmarking library for retrieval-augmented generation, 2024
work page 2024
-
[67]
Flashrag: A modular toolkit for efficient retrieval-augmented generation research, 2024
Jiajie Jin, Yutao Zhu, Xinyu Yang, Chenghao Zhang, and Zhicheng Dou. Flashrag: A modular toolkit for efficient retrieval-augmented generation research, 2024
work page 2024
-
[68]
Gnn-rag: Graph neural retrieval for large language model reasoning, 2024
Costas Mavromatis and George Karypis. Gnn-rag: Graph neural retrieval for large language model reasoning, 2024
work page 2024
-
[69]
Natural questions: A benchmark for question answering research
Tom Kwiatkowski, Jennimaria Palomaki, Olivia Redfield, Michael Collins, Ankur Parikh, Chris Alberti, Danielle Epstein, Illia Polosukhin, Jacob Devlin, Kenton Lee, Kristina Toutanova, Llion Jones, Matthew Kelcey, Ming-Wei Chang, Andrew M. Dai, Jakob Uszkoreit, Quoc Le, and Slav Petrov. Natural questions: A benchmark for question answering research. Transact...
work page 2019
-
[70]
Mandar Joshi, Eunsol Choi, Daniel S. Weld, and Luke Zettlemoyer. Triviaqa: A large scale distantly supervised challenge dataset for reading comprehension, 2017
work page 2017
-
[71]
Squad: 100,000+ questions for machine comprehension of text, 2016
Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. Squad: 100,000+ questions for machine comprehension of text, 2016
work page 2016
-
[72]
Semantic parsing on freebase from question-answer pairs
Jonathan Berant, Andrew K. Chou, Roy Frostig, and Percy Liang. Semantic parsing on freebase from question-answer pairs. In Conference on Empirical Methods in Natural Language Processing, 2013
work page 2013
-
[73]
Alex Mallen, Akari Asai, Victor Zhong, Rajarshi Das, Daniel Khashabi, and Hannaneh Hajishirzi. When not to trust language models: Investigating effectiveness of parametric and non-parametric memories. In Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki, editors, Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Vol...
work page 2023
-
[74]
Eli5: Long form question answering, 2019
Angela Fan, Yacine Jernite, Ethan Perez, David Grangier, Jason Weston, and Michael Auli. Eli5: Long form question answering, 2019
work page 2019
-
[75]
The narrativeqa reading comprehension challenge
Tomáš Kočiský, Jonathan Schwarz, Phil Blunsom, Chris Dyer, Karl Moritz Hermann, Gábor Melis, and Edward Grefenstette. The narrativeqa reading comprehension challenge. 2017
work page 2017
-
[76]
Asqa: Factoid questions meet long-form answers, 2023
Ivan Stelmakh, Yi Luan, Bhuwan Dhingra, and Ming-Wei Chang. Asqa: Factoid questions meet long-form answers, 2023
work page 2023
-
[77]
QMSum: A new benchmark for query-based multi-domain meeting summarization
Ming Zhong, Da Yin, Tao Yu, Ahmad Zaidi, Mutethia Mutuma, Rahul Jha, Ahmed Hassan Awadallah, Asli Celikyilmaz, Yang Liu, Xipeng Qiu, and Dragomir Radev. QMSum: A new benchmark for query-based multi-domain meeting summarization. pages 5905–5921, June 2021
work page 2021
-
[78]
Pradeep Dasigi, Kyle Lo, Iz Beltagy, Arman Cohan, Noah A. Smith, and Matt Gardner. A dataset of information-seeking questions and answers anchored in research papers. In Kristina Toutanova, Anna Rumshisky, Luke Zettlemoyer, Dilek Hakkani-Tur, Iz Beltagy, Steven Bethard, Ryan Cotterell, Tanmoy Chakraborty, and Yichao Zhou, editors, Proceedings of the 2021 ...
work page 2021
-
[79]
COVID-QA: A question answering dataset for COVID-19
Timo Möller, Anthony Reina, Raghavan Jayakumar, and Malte Pietsch. COVID-QA: A question answering dataset for COVID-19. In ACL 2020 Workshop on Natural Language Processing for COVID-19 (NLP-COVID), 2020.
work page 2020
-
[80]
Cmb: A comprehensive medical benchmark in chinese, 2024
Xidong Wang, Guiming Hardy Chen, Dingjie Song, Zhiyi Zhang, Zhihong Chen, Qingying Xiao, Feng Jiang, Jianquan Li, Xiang Wan, Benyou Wang, and Haizhou Li. Cmb: A comprehensive medical benchmark in chinese, 2024
work page 2024