pith. machine review for the scientific record.

arxiv: 2501.09136 · v4 · submitted 2025-01-15 · 💻 cs.AI · cs.CL · cs.IR

Recognition: 2 theorem links

· Lean Theorem

Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG

Pith reviewed 2026-05-13 03:16 UTC · model grok-4.3

classification 💻 cs.AI · cs.CL · cs.IR
keywords agentic rag · retrieval-augmented generation · large language models · autonomous agents · multi-agent collaboration · rag taxonomy · dynamic workflows · ai agents

The pith

Embedding autonomous agents into RAG pipelines enables dynamic retrieval strategies and adaptive workflows that static systems cannot achieve.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that traditional RAG systems are limited by fixed workflows and cannot support multi-step reasoning or complex task management. It shows that inserting autonomous AI agents equipped with reflection, planning, tool use, and multi-agent collaboration overcomes these constraints by allowing real-time adjustment of retrieval and context refinement. The survey supplies a four-axis taxonomy to organize existing architectures, compares their trade-offs, reviews domain applications, and flags open issues in evaluation and coordination that matter to anyone building reliable LLM systems for evolving data environments.

Core claim

Agentic RAG embeds autonomous AI agents into the RAG pipeline. These agents leverage design patterns of reflection, planning, tool use, and multi-agent collaboration to dynamically manage retrieval strategies, iteratively refine contextual understanding, and adapt workflows through operational structures ranging from sequential steps to adaptive collaboration, thereby delivering flexibility, scalability, and context-awareness across diverse applications.
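The contrast between a fixed retrieve-then-generate pass and an agent that reflects and re-plans can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the survey's method: the toy `CORPUS`, the keyword `retrieve` function, the `reflect` coverage check, and the `AgenticRAG` class are all hypothetical names introduced here, and a real system would use an LLM and a proper retriever in their place.

```python
from dataclasses import dataclass, field

# Toy corpus; contents are illustrative only (assumption, not from the paper).
CORPUS = {
    "rag": "RAG augments a language model with retrieved documents.",
    "agent": "An agent plans, uses tools, and reflects on its own output.",
    "taxonomy": "The survey classifies systems along four axes.",
}


def retrieve(query: str) -> list[str]:
    """Hypothetical keyword retriever: return passages whose key occurs in the query."""
    q = query.lower()
    return [text for key, text in CORPUS.items() if key in q]


def reflect(required_terms: list[str], context: list[str]) -> list[str]:
    """Reflection step: list the required terms the gathered context does not cover."""
    joined = " ".join(context).lower()
    return [term for term in required_terms if term not in joined]


@dataclass
class AgenticRAG:
    max_steps: int = 3
    trace: list[str] = field(default_factory=list)  # queries issued, in order

    def run(self, question: str, required_terms: list[str]) -> list[str]:
        context: list[str] = []
        query = question
        for _ in range(self.max_steps):
            self.trace.append(query)
            # Tool use: call the retriever and keep only new passages.
            for passage in retrieve(query):
                if passage not in context:
                    context.append(passage)
            missing = reflect(required_terms, context)
            if not missing:
                break
            # Planning: reformulate the next query around the first uncovered term.
            query = missing[0]
        return context
```

With the question "What is RAG?" and required terms `["rag", "agent"]`, a single call to `retrieve` finds one passage, while `AgenticRAG().run(...)` issues a second, reformulated query and ends with two. The loop bound `max_steps` is the only control on cost in this sketch.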

What carries the argument

A four-axis taxonomy based on agent cardinality, control structure, autonomy, and knowledge representation that classifies Agentic RAG architectures and surfaces their design trade-offs.
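One way to make the four axes concrete is to treat each classified system as a point in a small product type. The sketch below assumes two placeholder values per axis; the axis names come from the paper, but the enum members, the `TaxonomyCell` class, and the `unclassified` helper are hypothetical and do not reproduce the survey's actual categories.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


# Axis names follow the paper; the member values are illustrative assumptions.
class Cardinality(Enum):
    SINGLE = "single-agent"
    MULTI = "multi-agent"


class Control(Enum):
    SEQUENTIAL = "sequential"
    ADAPTIVE = "adaptive"


class Autonomy(Enum):
    LOW = "low"
    HIGH = "high"


class Knowledge(Enum):
    VECTOR = "vector"
    GRAPH = "graph"


@dataclass(frozen=True)
class TaxonomyCell:
    """One combination of the four axes; frozen so cells are hashable and comparable."""
    cardinality: Cardinality
    control: Control
    autonomy: Autonomy
    knowledge: Knowledge


def unclassified(systems: dict[str, Optional[TaxonomyCell]]) -> list[str]:
    """Return the names of systems that could not be assigned to any cell."""
    return sorted(name for name, cell in systems.items() if cell is None)
```

A mapping from system name to `TaxonomyCell` (or `None` when no cell fits) gives a direct check of coverage: a non-empty result from `unclassified` would point to a system the four axes do not place.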

If this is right

  • Agentic designs support multi-step reasoning and complex task management where static RAG fails.
  • Healthcare, finance, education, and enterprise document systems gain flexibility and context-awareness from the added agent patterns.
  • Designers obtain concrete lessons on trade-offs from the comparative analysis of current frameworks.
  • Future work must address evaluation methods, agent coordination, memory management, efficiency, and governance.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar agent patterns could be tested in non-RAG LLM settings such as pure planning or tool-augmented generation to check for broader applicability.
  • New systems that require extra classification dimensions would indicate the taxonomy needs expansion.
  • Multi-agent collaboration structures raise questions about how human oversight can be integrated without losing autonomy benefits.

Load-bearing premise

The four-axis taxonomy comprehensively captures all meaningful design trade-offs in existing and future Agentic RAG systems.

What would settle it

A working Agentic RAG implementation that delivers clear gains in adaptability yet cannot be placed on any of the four taxonomy axes, or controlled tests in which no agent-equipped system outperforms static RAG on multi-step reasoning benchmarks.

read the original abstract

Large Language Models (LLMs) have advanced artificial intelligence by enabling human-like text generation and natural language understanding. However, their reliance on static training data limits their ability to respond to dynamic, real-time queries, resulting in outdated or inaccurate outputs. Retrieval-Augmented Generation (RAG) has emerged as a solution, enhancing LLMs by integrating real-time data retrieval to provide contextually relevant and up-to-date responses. Despite its promise, traditional RAG systems are constrained by static workflows and lack the adaptability required for multi-step reasoning and complex task management. Agentic Retrieval-Augmented Generation (Agentic RAG) transcends these limitations by embedding autonomous AI agents into the RAG pipeline. These agents leverage agentic design patterns (reflection, planning, tool use, and multi-agent collaboration) to dynamically manage retrieval strategies, iteratively refine contextual understanding, and adapt workflows through operational structures ranging from sequential steps to adaptive collaboration. This integration enables Agentic RAG systems to deliver flexibility, scalability, and context-awareness across diverse applications. This paper presents an analytical survey of Agentic RAG systems. It traces the evolution of RAG paradigms, introduces a principled taxonomy of Agentic RAG architectures based on agent cardinality, control structure, autonomy, and knowledge representation, and provides a comparative analysis of design trade-offs across existing frameworks. The survey examines applications in healthcare, finance, education, and enterprise document processing, and distills practical lessons for system designers and practitioners. Finally, it identifies key open research challenges related to evaluation, coordination, memory management, efficiency, and governance, outlining directions for future research.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper surveys Agentic Retrieval-Augmented Generation (Agentic RAG), which augments traditional RAG by embedding autonomous AI agents that employ design patterns such as reflection, planning, tool use, and multi-agent collaboration. It traces the evolution from static RAG systems, introduces a four-axis taxonomy (agent cardinality, control structure, autonomy, knowledge representation), performs a comparative analysis of design trade-offs, reviews applications in healthcare, finance, education, and enterprise document processing, and identifies open challenges in evaluation, coordination, memory management, efficiency, and governance.

Significance. If the taxonomy can be shown to be comprehensive and the mappings of existing systems made explicit, the survey would provide a useful organizing framework for an emerging subfield, helping researchers compare architectures and identify gaps. The synthesis of agentic patterns and domain applications adds value for practitioners seeking to move beyond static retrieval pipelines.

major comments (2)
  1. [Taxonomy section] Taxonomy section (the section introducing the four-axis taxonomy): the manuscript asserts that the taxonomy based on agent cardinality, control structure, autonomy, and knowledge representation is 'principled' and captures design trade-offs, yet provides no derivation, completeness argument, or exhaustive mapping showing that every cited system fits uniquely into one combination of axes and that omitted dimensions (e.g., retrieval-frequency adaptation or cost-latency trade-offs) are redundant. This directly undermines the central claim that the taxonomy organizes all meaningful Agentic RAG architectures.
  2. [Applications and comparative analysis section] Applications and comparative analysis section: the claimed flexibility, scalability, and context-awareness enabled by Agentic RAG are supported only by high-level descriptions of patterns rather than concrete quantitative comparisons, ablation results, or specific system-to-taxonomy mappings with performance metrics; without these, the practical lessons for system designers rest on assertion rather than demonstrated evidence.
minor comments (2)
  1. [Abstract] Abstract: could explicitly state how many frameworks were reviewed and what the primary quantitative or qualitative findings from the comparative analysis are.
  2. [Taxonomy presentation] Figure or table presenting the taxonomy: ensure each reviewed system is explicitly assigned to a cell with a brief justification so readers can verify the classification.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback, which highlights important opportunities to strengthen the rigor of our taxonomy and the evidential basis of our analysis. We address each major comment below and commit to targeted revisions that preserve the survey's scope while improving clarity and substantiation.

read point-by-point responses
  1. Referee: [Taxonomy section] Taxonomy section (the section introducing the four-axis taxonomy): the manuscript asserts that the taxonomy based on agent cardinality, control structure, autonomy, and knowledge representation is 'principled' and captures design trade-offs, yet provides no derivation, completeness argument, or exhaustive mapping showing that every cited system fits uniquely into one combination of axes and that omitted dimensions (e.g., retrieval-frequency adaptation or cost-latency trade-offs) are redundant. This directly undermines the central claim that the taxonomy organizes all meaningful Agentic RAG architectures.

    Authors: We agree that an explicit rationale for the taxonomy's construction would strengthen the central claim. The four axes were identified through iterative analysis of recurring design decisions across the surveyed literature, where they best differentiate architectural families and associated trade-offs in flexibility versus complexity. While surveys commonly introduce taxonomies via synthesis rather than axiomatic derivation, we will add a dedicated subsection detailing the selection process, why alternative dimensions (such as retrieval-frequency adaptation) are treated as secondary and subsumed under autonomy and control, and an exhaustive mapping table assigning every cited system to a unique axis combination. This revision will make the completeness argument explicit without altering the taxonomy itself. revision: partial

  2. Referee: [Applications and comparative analysis section] Applications and comparative analysis section: the claimed flexibility, scalability, and context-awareness enabled by Agentic RAG are supported only by high-level descriptions of patterns rather than concrete quantitative comparisons, ablation results, or specific system-to-taxonomy mappings with performance metrics; without these, the practical lessons for system designers rest on assertion rather than demonstrated evidence.

    Authors: As a survey, the manuscript synthesizes existing literature rather than conducting new experiments or ablations. We will nevertheless enhance the applications and comparative analysis sections by adding explicit system-to-taxonomy mappings and by extracting and tabulating concrete quantitative results (e.g., accuracy gains, latency or cost reductions) reported in the original papers for each reviewed system. Where such metrics are unavailable in the source literature, we will note the limitation. These additions will ground the claimed benefits and practical lessons in documented evidence while remaining within the survey format. revision: partial

Circularity Check

0 steps flagged

No significant circularity: survey performs synthesis and classification without derivations or reductions.

full rationale

This paper is an analytical survey that traces RAG evolution, proposes a four-axis taxonomy for classification, reviews applications, and lists open challenges. No equations, predictions, fitted parameters, or derivation chains exist that could reduce to prior definitions or self-citations by construction. The taxonomy is introduced as an organizing framework for existing systems rather than derived from first principles in a self-referential manner. Any self-citations (if present) support literature review and are not load-bearing for a central claim that reduces to them. The work is self-contained as synthesis against external literature benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a survey paper that synthesizes existing research on RAG and agentic systems; it introduces no new free parameters, mathematical axioms, or postulated entities.

pith-pipeline@v0.9.0 · 5610 in / 1181 out tokens · 105602 ms · 2026-05-13T03:16:41.115715+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 29 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. CuSearch: Curriculum Rollout Sampling via Search Depth for Agentic RAG

    cs.AI 2026-05 unverdicted novelty 7.0

    CuSearch reallocates rollout budget in RLVR toward deeper-search trajectories as a proxy for retrieval supervision density, yielding up to 11.8 exact-match gains over uniform GRPO sampling on ZeroSearch.

  2. LatentRAG: Latent Reasoning and Retrieval for Efficient Agentic RAG

    cs.CL 2026-05 unverdicted novelty 7.0

    LatentRAG performs agentic RAG by generating latent tokens for thoughts and subqueries in one forward pass, matching explicit methods' accuracy on seven benchmarks while reducing latency by ~90%.

  3. Retrieval from Within: An Intrinsic Capability of Attention-Based Models

    cs.LG 2026-05 unverdicted novelty 7.0

    Attention-based models can intrinsically retrieve and reuse pre-encoded evidence chunks via decoder attention queries, unifying retrieval with generation and outperforming external RAG pipelines on QA benchmarks.

  4. SCOUT: Active Information Foraging for Long-Text Understanding with Decoupled Epistemic States

    cs.CL 2026-05 unverdicted novelty 7.0

    SCOUT achieves state-of-the-art long-text understanding with up to 8x lower token use by actively foraging for sparse query-relevant information and updating a compact provenance-grounded epistemic state.

  5. RAG-Reflect: Agentic Retrieval-Augmented Generation with Reflections for Comment-Driven Code Maintenance on Stack Overflow

    cs.SE 2026-04 unverdicted novelty 7.0

    RAG-Reflect achieves F1=0.78 on valid comment-edit prediction using retrieval-augmented reasoning and self-reflection, outperforming baselines and approaching fine-tuned models without retraining.

  6. A-MAR: Agent-based Multimodal Art Retrieval for Fine-Grained Artwork Understanding

    cs.AI 2026-04 unverdicted novelty 7.0

    A-MAR decomposes art queries into reasoning plans to condition retrieval, leading to improved explanation quality and multi-step reasoning on art benchmarks compared to baselines.

  7. E2E-REME: Towards End-to-End Microservices Auto-Remediation via Experience-Simulation Reinforcement Fine-Tuning

    cs.SE 2026-04 unverdicted novelty 7.0

    E2E-REME outperforms nine LLMs in accuracy and efficiency for end-to-end microservice remediation by using experience-simulation reinforcement fine-tuning on a new benchmark called MicroRemed.

  8. Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory

    cs.CL 2025-11 unverdicted novelty 7.0

    Evo-Memory is a new benchmark for self-evolving memory in LLM agents across task streams, with baseline ExpRAG and proposed ReMem method that integrates reasoning, actions, and memory updates for continual improvement.

  9. Retrieval from Within: An Intrinsic Capability of Attention-Based Models

    cs.LG 2026-05 unverdicted novelty 6.0

    Attention-based models can retrieve evidence intrinsically by using decoder attention to score and reuse their own pre-encoded chunks, outperforming separate retrieval pipelines on QA benchmarks.

  10. Agentic Retrieval-Augmented Generation for Financial Document Question Answering

    cs.AI 2026-05 unverdicted novelty 6.0

    FinAgent-RAG achieves 76.81-78.46% execution accuracy on financial QA benchmarks by combining contrastive retrieval, program-of-thought code generation, and adaptive strategy routing, outperforming baselines by 5.62-9...

  11. An Agentic Approach to Metadata Reasoning

    cs.DB 2026-04 unverdicted novelty 6.0

    Metadata Reasoner uses agentic LLM reasoning on metadata to select sufficient and minimal data sources, achieving 83.16% F1 on KramaBench and 85.5% F1 on noisy synthetic benchmarks while avoiding low-quality tables 99...

  12. Don't Retrieve, Navigate: Distilling Enterprise Knowledge into Navigable Agent Skills for QA and RAG

    cs.IR 2026-04 unverdicted novelty 6.0

    Corpus2Skill distills corpora into navigable hierarchical skill trees that LLM agents actively explore for QA and RAG, outperforming dense retrieval and RAPTOR on enterprise benchmarks and characterizing when navigati...

  13. ADAM: A Systematic Data Extraction Attack on Agent Memory via Adaptive Querying

    cs.CR 2026-04 unverdicted novelty 6.0

    ADAM extracts data from LLM agent memory with up to 100% attack success rate by estimating data distribution and selecting queries via entropy guidance.

  14. CuSearch: Curriculum Rollout Sampling via Search Depth for Agentic RAG

    cs.AI 2026-05 unverdicted novelty 5.0

    CuSearch reallocates fixed training budget toward deeper-search rollouts in RLVR for agentic RAG, treating search depth as an annotation-free proxy for supervision density and reporting up to 11.8 exact-match gains ov...

  15. Personalized Deep Research: A User-Centric Framework, Dataset, and Hybrid Evaluation for Knowledge Discovery

    cs.IR 2026-05 conditional novelty 5.0

    PDR is a user-context-aware framework for LLM research agents that improves report relevance over static baselines, supported by a new dataset and hybrid evaluation.

  16. AgenticRAG: Agentic Retrieval for Enterprise Knowledge Bases

    cs.AI 2026-05 unverdicted novelty 5.0

    AgenticRAG equips an LLM with iterative retrieval and navigation tools, delivering 49.6% recall@1 on BRIGHT, 0.96 factuality on WixQA, and 92% correctness on FinanceBench.

  17. When Agents Handle Secrets: A Survey of Confidential Computing for Agentic AI

    cs.CR 2026-05 unverdicted novelty 5.0

    A survey providing a taxonomy of TEE platforms, an agent-centric threat model, and open challenges for applying confidential computing to secure agentic AI systems.

  18. SiriusHelper: An LLM Agent-Based Operations Assistant for Big Data Platforms

    cs.DB 2026-04 unverdicted novelty 5.0

    SiriusHelper deploys an LLM agent with intent routing, DeepSearch multi-hop retrieval, and automated SOP distillation to outperform alternatives and reduce ticket volume by 20.8% on Tencent's big data platform.

  19. Can QPP Choose the Right Query Variant? Evaluating Query Variant Selection for RAG Pipelines

    cs.IR 2026-04 unverdicted novelty 5.0

    QPP methods can select query variants that boost end-to-end RAG quality over the original query, though retrieval-optimized variants often fail to produce the best generated answers, revealing a utility gap.

  20. Mind DeepResearch Technical Report

    cs.AI 2026-04 unverdicted novelty 5.0

    MindDR combines a Planning Agent, DeepSearch Agent, and Report Agent with SFT cold-start, Search-RL, Report-RL, and preference alignment to reach competitive scores on research benchmarks using 30B-scale models.

  21. When Agents Handle Secrets: A Survey of Confidential Computing for Agentic AI

    cs.CR 2026-05 unverdicted novelty 4.0

    A structured survey of confidential computing for agentic AI that catalogs TEE platforms, agent-specific threats, transferable defenses, and remaining gaps in end-to-end frameworks.

  22. Adaptive ToR: Complexity-Aware Tree-Based Retrieval for Pareto-Optimal Multi-Intent NLU

    cs.AI 2026-04 unverdicted novelty 4.0

    Adaptive ToR uses a query complexity classifier to route multi-intent queries to either fast single-step or deeper hierarchical retrieval, improving accuracy by 9.7% and cutting latency by 37.6% on NLU benchmarks.

  23. LARA: Validation-Driven Agentic Supercomputer Workflows for Atomistic Modeling

    physics.comp-ph 2026-04 unverdicted novelty 4.0

    LARA-HPC introduces a validation-first agentic system with dry-run verification and multi-phase refinement that improves robustness of AI-generated DFT workflows on HPC systems.

  24. From LLM Reasoning to Autonomous AI Agents: A Comprehensive Review

    cs.AI 2025-04 accept novelty 4.0

    A survey consolidating benchmarks, agent frameworks, real-world applications, and protocols for LLM-based autonomous agents into a proposed taxonomy with recommendations for future research.

  25. Toward Agentic RAG for Ukrainian

    cs.AI 2026-04 unverdicted novelty 3.0

    Agentic RAG for Ukrainian improves answer accuracy via retries but is still limited by document and page retrieval quality.

  26. PAL: Personal Adaptive Learner

    cs.AI 2026-04 unverdicted novelty 3.0

    PAL is an AI platform that converts lecture videos into real-time adaptive interactive learning with dynamic questions and tailored end-of-session summaries.

  27. Automotive Engineering-Centric Agentic AI Workflow Framework

    cs.AI 2026-04 unverdicted novelty 3.0

    The paper presents the Agentic Engineering Intelligence (AEI) framework for modeling automotive engineering workflows as sequential decision processes with AI agent support.

  28. A Brief Overview: Agentic Reinforcement Learning In Large Language Models

    cs.AI 2026-04 unverdicted novelty 2.0

    This review synthesizes conceptual foundations, methods, challenges, and future directions for agentic reinforcement learning in large language models.

  29. A Brief Overview: Agentic Reinforcement Learning In Large Language Models

    cs.AI 2026-04 unverdicted novelty 2.0

    The paper surveys the conceptual foundations, methodological innovations, challenges, and future directions of agentic reinforcement learning frameworks that embed cognitive capabilities like meta-reasoning and self-r...

Reference graph

Works this paper leans on

105 extracted references · 105 canonical work pages · cited by 25 Pith papers · 1 internal anchor

  1. [1]

    Large language models: A survey, 2024

    Shervin Minaee, Tomas Mikolov, Narjes Nikzad, Meysam Chenaghlu, Richard Socher, Xavier Amatriain, and Jianfeng Gao. Large language models: A survey, 2024

  2. [2]

    Exploring language models: A comprehensive survey and analysis

    Aditi Singh. Exploring language models: A comprehensive survey and analysis. In 2023 International Conference on Research Methodologies in Knowledge Management, Artificial Intelligence and Telecommunication Engineering (RMKMATE), pages 1–4, 2023

  3. [3]

    A survey of large language models, 2024

    Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang Ren, Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jian-Yun Nie, and Ji-Rong Wen. A survey of large language models, 2024

  4. [4]

    A complete survey on llm-based ai chatbots, 2024

    Sumit Kumar Dam, Choong Seon Hong, Yu Qiao, and Chaoning Zhang. A complete survey on llm-based ai chatbots, 2024

  5. [5]

    A survey of ai text-to-image and ai text-to-video generators

    Aditi Singh. A survey of ai text-to-image and ai text-to-video generators. In 2023 4th International Conference on Artificial Intelligence, Robotics and Control (AIRC), pages 32–36, 2023

  6. [6]

    Exploring prompt engineering: A systematic review with swot analysis, 2024

    Aditi Singh, Abul Ehtesham, Gaurav Kumar Gupta, Nikhil Kumar Chatta, Saket Kumar, and Tala Talaei Khoei. Exploring prompt engineering: A systematic review with swot analysis, 2024

  7. [7]

    A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. ACM Transactions on Information Systems, November 2024

    Lei Huang, Weijiang Yu, Weitao Ma, Weihong Zhong, Zhangyin Feng, Haotian Wang, Qianglong Chen, Weihua Peng, Xiaocheng Feng, Bing Qin, and Ting Liu. A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. ACM Transactions on Information Systems, November 2024

  8. [8]

    Agent-g: An agentic framework for graph retrieval augmented generation, 2024

    Meng-Chieh Lee, Qi Zhu, Costas Mavromatis, Zhen Han, Soji Adeshina, Vassilis N. Ioannidis, Huzefa Rangwala, and Christos Faloutsos. Agent-g: An agentic framework for graph retrieval augmented generation, 2024

  9. [9]

    Retrieval-augmented generation for ai-generated content: A survey, 2024

    Penghao Zhao, Hailin Zhang, Qinhan Yu, Zhengren Wang, Yunteng Geng, Fangcheng Fu, Ling Yang, Wentao Zhang, Jie Jiang, and Bin Cui. Retrieval-augmented generation for ai-generated content: A survey, 2024

  10. [10]

    Active retrieval augmented generation, 2023

    Zhengbao Jiang, Frank F. Xu, Luyu Gao, Zhiqing Sun, Qian Liu, Jane Dwivedi-Yu, Yiming Yang, Jamie Callan, and Graham Neubig. Active retrieval augmented generation, 2023

  11. [11]

    A comprehensive survey on vector database: Storage and retrieval technique, challenge, 2023

    Yikun Han, Chunjiang Liu, and Pengfei Wang. A comprehensive survey on vector database: Storage and retrieval technique, challenge, 2023

  12. [12]

    Ai agents vs. agentic ai: A conceptual taxonomy, applications and challenges

    Ranjan Sapkota, Konstantinos I. Roumeliotis, and Manoj Karkee. Ai agents vs. agentic ai: A conceptual taxonomy, applications and challenges. Information Fusion, 126:103599, 2026

  13. [13]

    Building effective agents, 2024

    Anthropic. Building effective agents, 2024. https://www.anthropic.com/research/building-effective-agents. Accessed: January 15, 2026

  14. [14]

    A Practical Guide for Designing, Developing, and Deploying Production-Grade Agentic AI Workflows,

    E. Bandara, R. Gore, P. Foytik, S. Shetty, R. Mukkamala, A. Rahman, X. Liang, S. H. Bouk, A. Hass, S. Rajapakse, N. W. Keong, K. De Zoysa, A. Withanage, and N. Loganathan. A practical guide for designing, developing, and deploying production-grade agentic ai workflows. arXiv preprint, abs/2512.08769, 2025. Old Dominion University, Norfolk, VA, USA; Deloit...

  15. [15]

    Agentic retrieval-augmented generation for time series analysis, 2024

    Chidaksh Ravuru, Sagar Srinivas Sakhinana, and Venkataramana Runkana. Agentic retrieval-augmented generation for time series analysis, 2024

  16. [16]

    Towards reasoning in large language models: A survey, 2023

    Jie Huang and Kevin Chen-Chuan Chang. Towards reasoning in large language models: A survey, 2023

  17. [17]

    Graph retrieval-augmented generation: A survey, 2024

    Boci Peng, Yun Zhu, Yongchao Liu, Xiaohe Bo, Haizhou Shi, Chuntao Hong, Yan Zhang, and Siliang Tang. Graph retrieval-augmented generation: A survey, 2024

  18. [18]

    Revolutionizing mental health care through langchain: A journey with a large language model

    Aditi Singh, Abul Ehtesham, Saifuddin Mahmud, and Jong-Hoon Kim. Revolutionizing mental health care through langchain: A journey with a large language model. In IEEE 14th Annual Computing and Communication Workshop and Conference (CCWC), pages 0073–0078, 2024

  19. [19]

    Digital diagnostics: The potential of large language models in recognizing symptoms of common illnesses. AI, 6(1), 2025

    Gaurav Kumar Gupta, Aditi Singh, Sijo Valayakkad Manikandan, and Abul Ehtesham. Digital diagnostics: The potential of large language models in recognizing symptoms of common illnesses. AI, 6(1), 2025

  20. [20]

    Encouraging responsible use of generative ai in education: A reward-based learning approach

    Aditi Singh, Abul Ehtesham, Saket Kumar, Gaurav Kumar Gupta, and Tala Talaei Khoei. Encouraging responsible use of generative ai in education: A reward-based learning approach. In Tim Schlippe, Eric C. K. Cheng, and Tianchong Wang, editors, Artificial Intelligence in Education Technologies: New Development and Innovative Practices, pages 404–413, Singapore...

  21. [21]

    Retrieval-augmented generation for large language models: A survey, 2024

    Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Meng Wang, and Haofen Wang. Retrieval-augmented generation for large language models: A survey, 2024

  22. [22]

    Dense passage retrieval for open-domain question answering, 2020

    Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. Dense passage retrieval for open-domain question answering, 2020

  23. [23]

    A survey on the memory mechanism of large language model based agents, 2024

    Zeyu Zhang, Xiaohe Bo, Chen Ma, Rui Li, Xu Chen, Quanyu Dai, Jieming Zhu, Zhenhua Dong, and Ji-Rong Wen. A survey on the memory mechanism of large language model based agents, 2024

  24. [24]

    Critic: Large language models can self-correct with tool-interactive critiquing, 2024

    Zhibin Gou, Zhihong Shao, Yeyun Gong, Yelong Shen, Yujiu Yang, Nan Duan, and Weizhu Chen. Critic: Large language models can self-correct with tool-interactive critiquing, 2024

  25. [25]

    Understanding the planning of llm agents: A survey, 2024

    Xu Huang, Weiwen Liu, Xiaolong Chen, Xingmei Wang, Hao Wang, Defu Lian, Yasheng Wang, Ruiming Tang, and Enhong Chen. Understanding the planning of llm agents: A survey, 2024

  26. [26]

    Enhancing ai systems with agentic workflows patterns in large language model

    Aditi Singh, Abul Ehtesham, Saket Kumar, and Tala Talaei Khoei. Enhancing ai systems with agentic workflows patterns in large language model. In IEEE World AI IoT Congress (AIIoT), pages 527–532, 2024

  27. [27]

    How agents can improve llm performance

    DeepLearning.AI. How agents can improve llm performance. https://www.deeplearning.ai/the-batch/how-agents-can-improve-llm-performance/?ref=dl-staging-website.ghost.io, 2024. Accessed: 2026-01-13

  28. [28]

    Self-refine: Iterative refinement with self-feedback, 2023

    Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, Shashank Gupta, Bodhisattwa Prasad Majumder, Katherine Hermann, Sean Welleck, Amir Yazdanbakhsh, and Peter Clark. Self-refine: Iterative refinement with self-feedback, 2023

  29. [29]

    Reflexion: Language agents with verbal reinforcement learning, 2023

    Noah Shinn, Federico Cassano, Edward Berman, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. Reflexion: Language agents with verbal reinforcement learning, 2023

  30. [30]

    Large language model based multi-agents: A survey of progress and challenges, 2024

    Taicheng Guo, Xiuying Chen, Yaqi Wang, Ruidi Chang, Shichao Pei, Nitesh V. Chawla, Olaf Wiest, and Xiangliang Zhang. Large language model based multi-agents: A survey of progress and challenges, 2024

  31. [31]

    Langgraph workflows tutorial, 2025

    LangChain. Langgraph workflows tutorial, 2025. https://langchain-ai.github.io/langgraph/tutorials/workflows/. Accessed: January 18, 2026

  32. [32]

    What is agentic rag?

    Weaviate Blog. What is agentic rag? https://weaviate.io/blog/what-is-agentic-rag#:~:text=is%20Agentic%20RAG%3F-,%E2%80%8B,of%20the%20non%2Dagentic%20pipeline. Accessed: 2026-01-14

  33. [33]

    Corrective retrieval augmented generation, 2024

    Shi-Qi Yan, Jia-Chen Gu, Yun Zhu, and Zhen-Hua Ling. Corrective retrieval augmented generation, 2024

  34. [34]

    Langgraph crag: Contextualized retrieval-augmented generation tutorial

    LangGraph CRAG Tutorial. Langgraph crag: Contextualized retrieval-augmented generation tutorial. https://langchain-ai.github.io/langgraph/tutorials/rag/langgraph_crag/. Accessed: 2026-01-14

  35. [35]

    Soyeong Jeong, Jinheon Baek, Sukmin Cho, Sung Ju Hwang, and Jong C. Park. Adaptive-rag: Learning to adapt retrieval-augmented large language models through question complexity, 2024

  36. [36]

    Langgraph adaptive rag: Adaptive retrieval-augmented generation tutorial

    LangGraph Adaptive RAG Tutorial. Langgraph adaptive rag: Adaptive retrieval-augmented generation tutorial. https://langchain-ai.github.io/langgraph/tutorials/rag/langgraph_adaptive_rag/. Accessed: 2026-01-14

  37. [37]

    Zhili Shen, Chenxin Diao, Pavlos Vougiouklis, Pascual Merita, Shriram Piramanayagam, Damien Graux, Dandan Tu, Zeren Jiang, Ruofei Lai, Yang Ren, and Jeff Z. Pan. Gear: Graph-enhanced agent for retrieval-augmented generation, 2024

  38. [38]

    Introducing agentic document workflows

    LlamaIndex. Introducing agentic document workflows. https://www.llamaindex.ai/blog/introducing-agentic-document-workflows, 2025. Accessed: 2026-01-13

  39. [39]

    How twitch used agentic workflow with rag on amazon bedrock to supercharge ad sales

    AWS Machine Learning Blog. How twitch used agentic workflow with rag on amazon bedrock to supercharge ad sales. https://aws.amazon.com/blogs/machine-learning/how-twitch-used-agentic-workflow-with-rag-on-amazon-bedrock-to-supercharge-ad-sales/,

  40. [40]

    Accessed: 2026-01-13

  41. [41]

    Patient case summary workflow using llamacloud

    LlamaCloud Demo Repository. Patient case summary workflow using llamacloud. https://github.com/run-llama/llamacloud-demo/blob/main/examples/document_workflows/patient_case_summary/patient_case_summary.ipynb, 2025. Accessed: 2026-01-13

  42. [42]

    Contract review workflow using llamacloud

    LlamaCloud Demo Repository. Contract review workflow using llamacloud. https://github.com/run-llama/llamacloud-demo/blob/main/examples/document_workflows/contract_review/contract_review.ipynb, 2025. Accessed: 2026-01-13

  43. [43]

    Auto insurance claims workflow using llamacloud

    LlamaCloud Demo Repository. Auto insurance claims workflow using llamacloud. https://github.com/run-llama/llamacloud-demo/blob/main/examples/document_workflows/auto_insurance_claims/auto_insurance_claims.ipynb, 2025. Accessed: 2026-01-13

  44. [44]

    Research paper report generation workflow using llamacloud

    LlamaCloud Demo Repository. Research paper report generation workflow using llamacloud. https://github.com/run-llama/llamacloud-demo/blob/main/examples/report_generation/research_paper_report_generation.ipynb, 2025. Accessed: 2026-01-13

  45. [45]

    Langgraph agentic rag: Nodes and edges tutorial

    LangGraph Agentic RAG Tutorial. Langgraph agentic rag: Nodes and edges tutorial. https://langchain-ai.github.io/langgraph/tutorials/rag/langgraph_agentic_rag/#nodes-and-edges. Accessed: 2026-01-14

  46. [46]

    Agentic rag with llamaindex

    LlamaIndex Blog. Agentic rag with llamaindex. https://www.llamaindex.ai/blog/agentic-rag-with-llamaindex-2721b8a49ff6. Accessed: 2026-01-14

  47. [47]

    Agentic rag: Turbocharge your retrieval-augmented generation with query reformulation and self-query

    Hugging Face Cookbook. Agentic rag: Turbocharge your retrieval-augmented generation with query reformulation and self-query. https://huggingface.co/learn/cookbook/en/agent_rag. Accessed: 2026-01-14

  48. [48]

    Agentic rag: Combining rag with agents for enhanced information retrieval

    Qdrant Blog. Agentic rag: Combining rag with agents for enhanced information retrieval. https://qdrant.tech/articles/agentic-rag/. Accessed: 2026-01-14

  49. [49]

    crewai: A github repository for ai projects

    crewAI Inc. crewai: A github repository for ai projects. https://github.com/crewAIInc/crewAI, 2025. Accessed: 2026-01-15

  50. [50]

    Ag2: A github repository for advanced generative ai research

    AG2AI Contributors. Ag2: A github repository for advanced generative ai research. https://github.com/ag2ai/ag2, 2025. Accessed: 2026-01-15

  51. [51]

    Autogen: Enabling next-gen llm applications via multi-agent conversation framework

    Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, Ahmed Hassan Awadallah, Ryen W White, Doug Burger, and Chi Wang. Autogen: Enabling next-gen llm applications via multi-agent conversation framework. 2023

  52. [52]

    Training language model agents without modifying language models

    Shaokun Zhang, Jieyu Zhang, Jiale Liu, Linxin Song, Chi Wang, Ranjay Krishna, and Qingyun Wu. Training language model agents without modifying language models. ICML’24, 2024

  53. [53]

    Swarm: Lightweight multi-agent orchestration framework

    OpenAI. Swarm: Lightweight multi-agent orchestration framework. https://github.com/openai/swarm. Accessed: 2026-01-14

  54. [54]

    Agentic rag using vertex ai

    LlamaIndex Documentation. Agentic rag using vertex ai. https://docs.llamaindex.ai/en/stable/examples/agent/agentic_rag_using_vertex_ai/. Accessed: 2026-01-14

  55. [55]

    Semantic kernel overview, 2025

    Microsoft. Semantic kernel overview, 2025. https://learn.microsoft.com/en-us/semantic-kernel/overview/. Accessed: January 18, 2026

  56. [56]

    Semantic kernel github repository, 2025

    Microsoft. Semantic kernel github repository, 2025. https://github.com/microsoft/semantic-kernel. Accessed: January 18, 2026

  57. [57]

    Agentic rag: Ai agents with ibm granite models

    IBM Granite Community. Agentic rag: Ai agents with ibm granite models. https://github.com/ibm-granite-community/granite-snack-cookbook/blob/main/recipes/AI-Agents/Agentic_RAG.ipynb. Accessed: 2026-01-14

  58. [58]

    From LLM Reasoning to Autonomous AI Agents: A Comprehensive Review

    Mohamed Amine Ferrag, Norbert Tihanyi, and Merouane Debbah. From llm reasoning to autonomous ai agents: A comprehensive review. arXiv preprint, abs/2504.19678, 2025

  59. [59]

    Beir: A heterogenous benchmark for zero-shot evaluation of information retrieval models, 2021

    Nandan Thakur, Nils Reimers, Andreas Rücklé, Abhishek Srivastava, and Iryna Gurevych. Beir: A heterogenous benchmark for zero-shot evaluation of information retrieval models, 2021

  60. [60]

    Ms marco: A human generated machine reading comprehension dataset, 2018

    Payal Bajaj, Daniel Campos, Nick Craswell, Li Deng, Jianfeng Gao, Xiaodong Liu, Rangan Majumder, Andrew McNamara, Bhaskar Mitra, Tri Nguyen, Mir Rosenberg, Xia Song, Alina Stoica, Saurabh Tiwary, and Tong Wang. Ms marco: A human generated machine reading comprehension dataset, 2018

  61. [61]

    Overview of the trec 2022 deep learning track

    Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos, Jimmy Lin, Ellen M. Voorhees, and Ian Soboroff. Overview of the trec 2022 deep learning track. In Text REtrieval Conference (TREC). NIST, TREC, March 2023

  62. [62]

    Musique: Multihop questions via single-hop question composition, 2022

    Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, and Ashish Sabharwal. Musique: Multihop questions via single-hop question composition, 2022

  63. [63]

    Constructing a multi-hop qa dataset for comprehensive evaluation of reasoning steps, 2020

    Xanh Ho, Anh-Khoa Duong Nguyen, Saku Sugawara, and Akiko Aizawa. Constructing a multi-hop qa dataset for comprehensive evaluation of reasoning steps, 2020

  64. [64]

    Hotpotqa: A dataset for diverse, explainable multi-hop question answering, 2018

    Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William W. Cohen, Ruslan Salakhutdinov, and Christopher D. Manning. Hotpotqa: A dataset for diverse, explainable multi-hop question answering, 2018

  65. [65]

    Ragbench: Explainable benchmark for retrieval-augmented generation systems, 2024

    Robert Friel, Masha Belyi, and Atindriyo Sanyal. Ragbench: Explainable benchmark for retrieval-augmented generation systems, 2024

  66. [66]

    Bergen: A benchmarking library for retrieval-augmented generation, 2024

    David Rau, Hervé Déjean, Nadezhda Chirkova, Thibault Formal, Shuai Wang, Vassilina Nikoulina, and Stéphane Clinchant. Bergen: A benchmarking library for retrieval-augmented generation, 2024

  67. [67]

    Flashrag: A modular toolkit for efficient retrieval-augmented generation research, 2024

    Jiajie Jin, Yutao Zhu, Xinyu Yang, Chenghao Zhang, and Zhicheng Dou. Flashrag: A modular toolkit for efficient retrieval-augmented generation research, 2024

  68. [68]

    Gnn-rag: Graph neural retrieval for large language model reasoning, 2024

    Costas Mavromatis and George Karypis. Gnn-rag: Graph neural retrieval for large language model reasoning, 2024

  69. [69]

    Natural questions: A benchmark for question answering research

    Tom Kwiatkowski, Jennimaria Palomaki, Olivia Redfield, Michael Collins, Ankur Parikh, Chris Alberti, Danielle Epstein, Illia Polosukhin, Jacob Devlin, Kenton Lee, Kristina Toutanova, Llion Jones, Matthew Kelcey, Ming-Wei Chang, Andrew M. Dai, Jakob Uszkoreit, Quoc Le, and Slav Petrov. Natural questions: A benchmark for question answering research. Transact...

  70. [70]

    Triviaqa: A large scale distantly supervised challenge dataset for reading comprehension, 2017

    Mandar Joshi, Eunsol Choi, Daniel S. Weld, and Luke Zettlemoyer. Triviaqa: A large scale distantly supervised challenge dataset for reading comprehension, 2017

  71. [71]

    Squad: 100,000+ questions for machine comprehension of text, 2016

    Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. Squad: 100,000+ questions for machine comprehension of text, 2016

  72. [72]

    Semantic parsing on freebase from question-answer pairs

    Jonathan Berant, Andrew K. Chou, Roy Frostig, and Percy Liang. Semantic parsing on freebase from question-answer pairs. In Conference on Empirical Methods in Natural Language Processing, 2013

  73. [73]

    When not to trust language models: Investigating effectiveness of parametric and non-parametric memories

    Alex Mallen, Akari Asai, Victor Zhong, Rajarshi Das, Daniel Khashabi, and Hannaneh Hajishirzi. When not to trust language models: Investigating effectiveness of parametric and non-parametric memories. In Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki, editors, Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Vol...

  74. [74]

    Eli5: Long form question answering, 2019

    Angela Fan, Yacine Jernite, Ethan Perez, David Grangier, Jason Weston, and Michael Auli. Eli5: Long form question answering, 2019

  75. [75]

    The narrativeqa reading comprehension challenge

    Tomáš Kočiský, Jonathan Schwarz, Phil Blunsom, Chris Dyer, Karl Moritz Hermann, Gábor Melis, and Edward Grefenstette. The narrativeqa reading comprehension challenge. 2017

  76. [76]

    Asqa: Factoid questions meet long-form answers, 2023

    Ivan Stelmakh, Yi Luan, Bhuwan Dhingra, and Ming-Wei Chang. Asqa: Factoid questions meet long-form answers, 2023

  77. [77]

    QMSum: A new benchmark for query-based multi-domain meeting summarization

    Ming Zhong, Da Yin, Tao Yu, Ahmad Zaidi, Mutethia Mutuma, Rahul Jha, Ahmed Hassan Awadallah, Asli Celikyilmaz, Yang Liu, Xipeng Qiu, and Dragomir Radev. QMSum: A new benchmark for query-based multi-domain meeting summarization. pages 5905–5921, June 2021

  78. [78]

    A dataset of information-seeking questions and answers anchored in research papers

    Pradeep Dasigi, Kyle Lo, Iz Beltagy, Arman Cohan, Noah A. Smith, and Matt Gardner. A dataset of information-seeking questions and answers anchored in research papers. In Kristina Toutanova, Anna Rumshisky, Luke Zettlemoyer, Dilek Hakkani-Tur, Iz Beltagy, Steven Bethard, Ryan Cotterell, Tanmoy Chakraborty, and Yichao Zhou, editors, Proceedings of the 2021 ...

  79. [79]

    COVID-QA: A question answering dataset for COVID-19

    Timo Möller, Anthony Reina, Raghavan Jayakumar, and Malte Pietsch. COVID-QA: A question answering dataset for COVID-19. In ACL 2020 Workshop on Natural Language Processing for COVID-19 (NLP-COVID), 2020

  80. [80]

    Cmb: A comprehensive medical benchmark in chinese, 2024

    Xidong Wang, Guiming Hardy Chen, Dingjie Song, Zhiyi Zhang, Zhihong Chen, Qingying Xiao, Feng Jiang, Jianquan Li, Xiang Wan, Benyou Wang, and Haizhou Li. Cmb: A comprehensive medical benchmark in chinese, 2024

Showing first 80 references.