arxiv: 2501.09136 · v4 · submitted 2025-01-15 · 💻 cs.AI · cs.CL · cs.IR

Recognition: 2 theorem links · Lean Theorem

Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG

Aditi Singh, Abul Ehtesham, Saket Kumar, Tala Talaei Khoei, Athanasios V. Vasilakos

Pith reviewed 2026-05-13 03:16 UTC · model grok-4.3

classification 💻 cs.AI · cs.CL · cs.IR

keywords: agentic rag · retrieval-augmented generation · large language models · autonomous agents · multi-agent collaboration · rag taxonomy · dynamic workflows · ai agents

The pith

Embedding autonomous agents into RAG pipelines enables dynamic retrieval strategies and adaptive workflows that static systems cannot achieve.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that traditional RAG systems are limited by fixed workflows and cannot support multi-step reasoning or complex task management. It shows that inserting autonomous AI agents equipped with reflection, planning, tool use, and multi-agent collaboration overcomes these constraints by allowing real-time adjustment of retrieval and context refinement. The survey supplies a four-axis taxonomy to organize existing architectures, compares their trade-offs, reviews domain applications, and flags open issues in evaluation and coordination that matter to anyone building reliable LLM systems for evolving data environments.

Core claim

Agentic RAG embeds autonomous AI agents into the RAG pipeline. These agents leverage design patterns of reflection, planning, tool use, and multi-agent collaboration to dynamically manage retrieval strategies, iteratively refine contextual understanding, and adapt workflows through operational structures ranging from sequential steps to adaptive collaboration, thereby delivering flexibility, scalability, and context-awareness across diverse applications.
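The contrast between a fixed retrieval step and an agent that adjusts retrieval after reflecting on its context can be made concrete with a toy sketch. Everything here is an illustrative assumption rather than the paper's method: the corpus, the word-overlap retriever, and the names `retrieve`, `reflect`, and `agentic_retrieve` are hypothetical stand-ins for a real retriever and an LLM-based critic.

```python
import re

# Toy corpus invented for illustration; not taken from the paper.
CORPUS = [
    "Static RAG retrieves documents once before generation.",
    "Agents plan tasks and reflect on intermediate results.",
    "Tools let agents query external data sources.",
]


def tokens(text: str) -> set[str]:
    """Lowercase word set, used as a crude relevance signal."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query: str, k: int) -> list[str]:
    """Return the k documents sharing the most words with the query."""
    ranked = sorted(
        CORPUS,
        key=lambda doc: len(tokens(query) & tokens(doc)),
        reverse=True,
    )
    return ranked[:k]


def reflect(query: str, context: list[str]) -> set[str]:
    """Reflection step: report query terms the retrieved context does not cover."""
    covered = set().union(*(tokens(doc) for doc in context))
    return tokens(query) - covered


def agentic_retrieve(query: str, max_rounds: int = 3) -> tuple[list[str], int]:
    """Widen retrieval until reflection finds no gaps or the round budget runs out."""
    context: list[str] = []
    rounds = 0
    for k in range(1, max_rounds + 1):
        rounds = k
        context = retrieve(query, k)
        if not reflect(query, context):
            break
    return context, rounds


if __name__ == "__main__":
    static_context = retrieve("agents plan tools", k=1)  # fixed, single-shot retrieval
    adaptive_context, used = agentic_retrieve("agents plan tools")
    print(len(static_context), len(adaptive_context), used)
```

The single-shot call stands in for a static pipeline, while the loop keeps retrieving until its own check reports no uncovered terms. A real system would replace the overlap heuristic with a learned retriever and a model-driven critique.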

What carries the argument

A four-axis taxonomy based on agent cardinality, control structure, autonomy, and knowledge representation that classifies Agentic RAG architectures and surfaces their design trade-offs.
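One way to picture the taxonomy is as a record with one field per axis. The sketch below is a hypothetical encoding: the four axis names come from the paper, but the category values on each axis (for example `SEQUENTIAL` or `GRAPH`) are placeholders assumed for illustration, since this page does not list the paper's actual categories.

```python
from dataclasses import dataclass
from enum import Enum


# Axis names follow the paper; the values on each axis are assumed placeholders.
class Cardinality(Enum):
    SINGLE = "single-agent"
    MULTI = "multi-agent"


class Control(Enum):
    SEQUENTIAL = "sequential"
    HIERARCHICAL = "hierarchical"
    ADAPTIVE = "adaptive"


class Autonomy(Enum):
    LOW = "low"
    HIGH = "high"


class Knowledge(Enum):
    TEXT = "unstructured text"
    GRAPH = "graph"


@dataclass(frozen=True)
class TaxonomyCell:
    """One position in the four-axis design space."""
    cardinality: Cardinality
    control: Control
    autonomy: Autonomy
    knowledge: Knowledge


def cell_count() -> int:
    """Number of distinct axis combinations under the assumed values."""
    return len(Cardinality) * len(Control) * len(Autonomy) * len(Knowledge)


if __name__ == "__main__":
    example = TaxonomyCell(Cardinality.MULTI, Control.ADAPTIVE, Autonomy.HIGH, Knowledge.GRAPH)
    print(example, cell_count())
```

Because the record is frozen, cells are hashable and can serve as dictionary keys when grouping systems by design.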

If this is right

Agentic designs support multi-step reasoning and complex task management where static RAG fails.
Healthcare, finance, education, and enterprise document systems gain flexibility and context-awareness from the added agent patterns.
Designers obtain concrete lessons on trade-offs from the comparative analysis of current frameworks.
Future work must address evaluation methods, agent coordination, memory management, efficiency, and governance.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar agent patterns could be tested in non-RAG LLM settings such as pure planning or tool-augmented generation to check for broader applicability.
New systems that require extra classification dimensions would indicate the taxonomy needs expansion.
Multi-agent collaboration structures raise questions about how human oversight can be integrated without losing autonomy benefits.

Load-bearing premise

The four-axis taxonomy comprehensively captures all meaningful design trade-offs in existing and future Agentic RAG systems.

What would settle it

A working Agentic RAG implementation that delivers clear gains in adaptability yet cannot be placed on any of the four taxonomy axes, or controlled tests in which no agent-equipped system outperforms static RAG on multi-step reasoning benchmarks.

Original abstract

Large Language Models (LLMs) have advanced artificial intelligence by enabling human-like text generation and natural language understanding. However, their reliance on static training data limits their ability to respond to dynamic, real-time queries, resulting in outdated or inaccurate outputs. Retrieval-Augmented Generation (RAG) has emerged as a solution, enhancing LLMs by integrating real-time data retrieval to provide contextually relevant and up-to-date responses. Despite its promise, traditional RAG systems are constrained by static workflows and lack the adaptability required for multi-step reasoning and complex task management. Agentic Retrieval-Augmented Generation (Agentic RAG) transcends these limitations by embedding autonomous AI agents into the RAG pipeline. These agents leverage agentic design patterns (reflection, planning, tool use, and multi-agent collaboration) to dynamically manage retrieval strategies, iteratively refine contextual understanding, and adapt workflows through operational structures ranging from sequential steps to adaptive collaboration. This integration enables Agentic RAG systems to deliver flexibility, scalability, and context-awareness across diverse applications. This paper presents an analytical survey of Agentic RAG systems. It traces the evolution of RAG paradigms, introduces a principled taxonomy of Agentic RAG architectures based on agent cardinality, control structure, autonomy, and knowledge representation, and provides a comparative analysis of design trade-offs across existing frameworks. The survey examines applications in healthcare, finance, education, and enterprise document processing, and distills practical lessons for system designers and practitioners. Finally, it identifies key open research challenges related to evaluation, coordination, memory management, efficiency, and governance, outlining directions for future research.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

A survey that maps Agentic RAG with a new four-axis taxonomy but gives no real argument that those axes are exhaustive or the best cut.

The main thing to know is that this paper is a survey that introduces a taxonomy for Agentic RAG organized around agent cardinality, control structure, autonomy, and knowledge representation. It walks through the move from static RAG to systems that add reflection, planning, tool use, and multi-agent patterns, then applies the taxonomy to compare designs and covers applications in healthcare, finance, education, and enterprise work before listing open issues like evaluation and memory management.

Referee Report

2 major / 2 minor

Summary. The paper surveys Agentic Retrieval-Augmented Generation (Agentic RAG), which augments traditional RAG by embedding autonomous AI agents that employ design patterns such as reflection, planning, tool use, and multi-agent collaboration. It traces the evolution from static RAG systems, introduces a four-axis taxonomy (agent cardinality, control structure, autonomy, knowledge representation), performs a comparative analysis of design trade-offs, reviews applications in healthcare, finance, education, and enterprise document processing, and identifies open challenges in evaluation, coordination, memory management, efficiency, and governance.

Significance. If the taxonomy can be shown to be comprehensive and the mappings of existing systems made explicit, the survey would provide a useful organizing framework for an emerging subfield, helping researchers compare architectures and identify gaps. The synthesis of agentic patterns and domain applications adds value for practitioners seeking to move beyond static retrieval pipelines.

major comments (2)

[Taxonomy section] The manuscript asserts that the taxonomy based on agent cardinality, control structure, autonomy, and knowledge representation is 'principled' and captures design trade-offs, yet provides no derivation, completeness argument, or exhaustive mapping showing that every cited system fits uniquely into one combination of axes and that omitted dimensions (e.g., retrieval-frequency adaptation or cost-latency trade-offs) are redundant. This directly undermines the central claim that the taxonomy organizes all meaningful Agentic RAG architectures.
[Applications and comparative analysis section] The claimed flexibility, scalability, and context-awareness enabled by Agentic RAG are supported only by high-level descriptions of patterns rather than concrete quantitative comparisons, ablation results, or specific system-to-taxonomy mappings with performance metrics; without these, the practical lessons for system designers rest on assertion rather than demonstrated evidence.

minor comments (2)

[Abstract] The abstract could explicitly state how many frameworks were reviewed and what the primary quantitative or qualitative findings of the comparative analysis are.
[Taxonomy presentation] Figure or table presenting the taxonomy: ensure each reviewed system is explicitly assigned to a cell with a brief justification so readers can verify the classification.
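The verification the referee asks for (each system in exactly one cell) is mechanical once assignments are written down. The following is a minimal sketch under stated assumptions: the axis values, the system names `system_a` to `system_c`, and the `audit` helper are all hypothetical and do not reflect the survey's actual classifications.

```python
from itertools import product

# Hypothetical axis values; the survey's actual categories may differ.
AXES = {
    "cardinality": ("single", "multi"),
    "control": ("sequential", "adaptive"),
    "autonomy": ("low", "high"),
    "knowledge": ("text", "graph"),
}
ALL_CELLS = set(product(*AXES.values()))


def audit(assignments: dict[str, list[tuple[str, ...]]]) -> dict[str, list]:
    """Flag systems with no cell, several cells, or off-grid cells, plus unused cells."""
    used = {cell for cells in assignments.values() for cell in cells}
    return {
        "unplaced": sorted(name for name, cells in assignments.items() if not cells),
        "ambiguous": sorted(name for name, cells in assignments.items() if len(cells) > 1),
        "off_grid": sorted(used - ALL_CELLS),
        "empty_cells": sorted(ALL_CELLS - used),
    }


if __name__ == "__main__":
    # Placeholder systems, invented for the example.
    report = audit({
        "system_a": [("single", "sequential", "low", "text")],
        "system_b": [("multi", "adaptive", "high", "graph"),
                     ("multi", "sequential", "high", "graph")],
        "system_c": [],
    })
    for key, value in report.items():
        print(key, len(value))
```

A non-empty `unplaced` or `off_grid` list would suggest a missing dimension, while `ambiguous` entries would suggest overlapping categories. Empty cells are not an error in themselves, since they may simply mark unexplored designs.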

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback, which highlights important opportunities to strengthen the rigor of our taxonomy and the evidential basis of our analysis. We address each major comment below and commit to targeted revisions that preserve the survey's scope while improving clarity and substantiation.

Referee: [Taxonomy section] The manuscript asserts that the taxonomy based on agent cardinality, control structure, autonomy, and knowledge representation is 'principled' and captures design trade-offs, yet provides no derivation, completeness argument, or exhaustive mapping showing that every cited system fits uniquely into one combination of axes and that omitted dimensions (e.g., retrieval-frequency adaptation or cost-latency trade-offs) are redundant. This directly undermines the central claim that the taxonomy organizes all meaningful Agentic RAG architectures.

Authors: We agree that an explicit rationale for the taxonomy's construction would strengthen the central claim. The four axes were identified through iterative analysis of recurring design decisions across the surveyed literature, where they best differentiate architectural families and associated trade-offs in flexibility versus complexity. While surveys commonly introduce taxonomies via synthesis rather than axiomatic derivation, we will add a dedicated subsection detailing the selection process, explaining why alternative dimensions (such as retrieval-frequency adaptation) are treated as secondary and subsumed under autonomy and control, and providing an exhaustive mapping table that assigns every cited system to a unique axis combination. This revision will make the completeness argument explicit without altering the taxonomy itself.

revision: partial

Referee: [Applications and comparative analysis section] The claimed flexibility, scalability, and context-awareness enabled by Agentic RAG are supported only by high-level descriptions of patterns rather than concrete quantitative comparisons, ablation results, or specific system-to-taxonomy mappings with performance metrics; without these, the practical lessons for system designers rest on assertion rather than demonstrated evidence.

Authors: As a survey, the manuscript synthesizes existing literature rather than conducting new experiments or ablations. We will nevertheless enhance the applications and comparative analysis sections by adding explicit system-to-taxonomy mappings and by extracting and tabulating concrete quantitative results (e.g., accuracy gains, latency or cost reductions) reported in the original papers for each reviewed system. Where such metrics are unavailable in the source literature, we will note the limitation. These additions will ground the claimed benefits and practical lessons in documented evidence while remaining within the survey format.

revision: partial

Circularity Check

0 steps flagged

No significant circularity: survey performs synthesis and classification without derivations or reductions.

This paper is an analytical survey that traces RAG evolution, proposes a four-axis taxonomy for classification, reviews applications, and lists open challenges. No equations, predictions, fitted parameters, or derivation chains exist that could reduce to prior definitions or self-citations by construction. The taxonomy is introduced as an organizing framework for existing systems rather than derived from first principles in a self-referential manner. Any self-citations (if present) support literature review and are not load-bearing for a central claim that reduces to them. The work is self-contained as synthesis against external literature benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a survey paper that synthesizes existing research on RAG and agentic systems; it introduces no new free parameters, mathematical axioms, or postulated entities.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith.Foundation.DAlembert.Inevitability bilinear_family_forced unclear


Agentic RAG transcends these limitations by embedding autonomous AI agents into the RAG pipeline. These agents leverage agentic design patterns (reflection, planning, tool use, and multi-agent collaboration)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 29 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

CuSearch: Curriculum Rollout Sampling via Search Depth for Agentic RAG
cs.AI 2026-05 unverdicted novelty 7.0

CuSearch reallocates rollout budget in RLVR toward deeper-search trajectories as a proxy for retrieval supervision density, yielding up to 11.8 exact-match gains over uniform GRPO sampling on ZeroSearch.
LatentRAG: Latent Reasoning and Retrieval for Efficient Agentic RAG
cs.CL 2026-05 unverdicted novelty 7.0

LatentRAG performs agentic RAG by generating latent tokens for thoughts and subqueries in one forward pass, matching explicit methods' accuracy on seven benchmarks while reducing latency by ~90%.
Retrieval from Within: An Intrinsic Capability of Attention-Based Models
cs.LG 2026-05 unverdicted novelty 7.0

Attention-based models can intrinsically retrieve and reuse pre-encoded evidence chunks via decoder attention queries, unifying retrieval with generation and outperforming external RAG pipelines on QA benchmarks.
SCOUT: Active Information Foraging for Long-Text Understanding with Decoupled Epistemic States
cs.CL 2026-05 unverdicted novelty 7.0

SCOUT achieves state-of-the-art long-text understanding with up to 8x lower token use by actively foraging for sparse query-relevant information and updating a compact provenance-grounded epistemic state.
RAG-Reflect: Agentic Retrieval-Augmented Generation with Reflections for Comment-Driven Code Maintenance on Stack Overflow
cs.SE 2026-04 unverdicted novelty 7.0

RAG-Reflect achieves F1=0.78 on valid comment-edit prediction using retrieval-augmented reasoning and self-reflection, outperforming baselines and approaching fine-tuned models without retraining.
A-MAR: Agent-based Multimodal Art Retrieval for Fine-Grained Artwork Understanding
cs.AI 2026-04 unverdicted novelty 7.0

A-MAR decomposes art queries into reasoning plans to condition retrieval, leading to improved explanation quality and multi-step reasoning on art benchmarks compared to baselines.
E2E-REME: Towards End-to-End Microservices Auto-Remediation via Experience-Simulation Reinforcement Fine-Tuning
cs.SE 2026-04 unverdicted novelty 7.0

E2E-REME outperforms nine LLMs in accuracy and efficiency for end-to-end microservice remediation by using experience-simulation reinforcement fine-tuning on a new benchmark called MicroRemed.
Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory
cs.CL 2025-11 unverdicted novelty 7.0

Evo-Memory is a new benchmark for self-evolving memory in LLM agents across task streams, with baseline ExpRAG and proposed ReMem method that integrates reasoning, actions, and memory updates for continual improvement.
Retrieval from Within: An Intrinsic Capability of Attention-Based Models
cs.LG 2026-05 unverdicted novelty 6.0

Attention-based models can retrieve evidence intrinsically by using decoder attention to score and reuse their own pre-encoded chunks, outperforming separate retrieval pipelines on QA benchmarks.
Agentic Retrieval-Augmented Generation for Financial Document Question Answering
cs.AI 2026-05 unverdicted novelty 6.0

FinAgent-RAG achieves 76.81-78.46% execution accuracy on financial QA benchmarks by combining contrastive retrieval, program-of-thought code generation, and adaptive strategy routing, outperforming baselines by 5.62-9...
An Agentic Approach to Metadata Reasoning
cs.DB 2026-04 unverdicted novelty 6.0

Metadata Reasoner uses agentic LLM reasoning on metadata to select sufficient and minimal data sources, achieving 83.16% F1 on KramaBench and 85.5% F1 on noisy synthetic benchmarks while avoiding low-quality tables 99...
Don't Retrieve, Navigate: Distilling Enterprise Knowledge into Navigable Agent Skills for QA and RAG
cs.IR 2026-04 unverdicted novelty 6.0

Corpus2Skill distills corpora into navigable hierarchical skill trees that LLM agents actively explore for QA and RAG, outperforming dense retrieval and RAPTOR on enterprise benchmarks and characterizing when navigati...
ADAM: A Systematic Data Extraction Attack on Agent Memory via Adaptive Querying
cs.CR 2026-04 unverdicted novelty 6.0

ADAM extracts data from LLM agent memory with up to 100% attack success rate by estimating data distribution and selecting queries via entropy guidance.
CuSearch: Curriculum Rollout Sampling via Search Depth for Agentic RAG
cs.AI 2026-05 unverdicted novelty 5.0

CuSearch reallocates fixed training budget toward deeper-search rollouts in RLVR for agentic RAG, treating search depth as an annotation-free proxy for supervision density and reporting up to 11.8 exact-match gains ov...
Personalized Deep Research: A User-Centric Framework, Dataset, and Hybrid Evaluation for Knowledge Discovery
cs.IR 2026-05 conditional novelty 5.0

PDR is a user-context-aware framework for LLM research agents that improves report relevance over static baselines, supported by a new dataset and hybrid evaluation.
AgenticRAG: Agentic Retrieval for Enterprise Knowledge Bases
cs.AI 2026-05 unverdicted novelty 5.0

AgenticRAG equips an LLM with iterative retrieval and navigation tools, delivering 49.6% recall@1 on BRIGHT, 0.96 factuality on WixQA, and 92% correctness on FinanceBench.
When Agents Handle Secrets: A Survey of Confidential Computing for Agentic AI
cs.CR 2026-05 unverdicted novelty 5.0

A survey providing a taxonomy of TEE platforms, an agent-centric threat model, and open challenges for applying confidential computing to secure agentic AI systems.
SiriusHelper: An LLM Agent-Based Operations Assistant for Big Data Platforms
cs.DB 2026-04 unverdicted novelty 5.0

SiriusHelper deploys an LLM agent with intent routing, DeepSearch multi-hop retrieval, and automated SOP distillation to outperform alternatives and reduce ticket volume by 20.8% on Tencent's big data platform.
Can QPP Choose the Right Query Variant? Evaluating Query Variant Selection for RAG Pipelines
cs.IR 2026-04 unverdicted novelty 5.0

QPP methods can select query variants that boost end-to-end RAG quality over the original query, though retrieval-optimized variants often fail to produce the best generated answers, revealing a utility gap.
Mind DeepResearch Technical Report
cs.AI 2026-04 unverdicted novelty 5.0

MindDR combines a Planning Agent, DeepSearch Agent, and Report Agent with SFT cold-start, Search-RL, Report-RL, and preference alignment to reach competitive scores on research benchmarks using 30B-scale models.
When Agents Handle Secrets: A Survey of Confidential Computing for Agentic AI
cs.CR 2026-05 unverdicted novelty 4.0

A structured survey of confidential computing for agentic AI that catalogs TEE platforms, agent-specific threats, transferable defenses, and remaining gaps in end-to-end frameworks.
Adaptive ToR: Complexity-Aware Tree-Based Retrieval for Pareto-Optimal Multi-Intent NLU
cs.AI 2026-04 unverdicted novelty 4.0

Adaptive ToR uses a query complexity classifier to route multi-intent queries to either fast single-step or deeper hierarchical retrieval, improving accuracy by 9.7% and cutting latency by 37.6% on NLU benchmarks.
LARA: Validation-Driven Agentic Supercomputer Workflows for Atomistic Modeling
physics.comp-ph 2026-04 unverdicted novelty 4.0

LARA-HPC introduces a validation-first agentic system with dry-run verification and multi-phase refinement that improves robustness of AI-generated DFT workflows on HPC systems.
From LLM Reasoning to Autonomous AI Agents: A Comprehensive Review
cs.AI 2025-04 accept novelty 4.0

A survey consolidating benchmarks, agent frameworks, real-world applications, and protocols for LLM-based autonomous agents into a proposed taxonomy with recommendations for future research.
Toward Agentic RAG for Ukrainian
cs.AI 2026-04 unverdicted novelty 3.0

Agentic RAG for Ukrainian improves answer accuracy via retries but is still limited by document and page retrieval quality.
PAL: Personal Adaptive Learner
cs.AI 2026-04 unverdicted novelty 3.0

PAL is an AI platform that converts lecture videos into real-time adaptive interactive learning with dynamic questions and tailored end-of-session summaries.
Automotive Engineering-Centric Agentic AI Workflow Framework
cs.AI 2026-04 unverdicted novelty 3.0

The paper presents the Agentic Engineering Intelligence (AEI) framework for modeling automotive engineering workflows as sequential decision processes with AI agent support.
A Brief Overview: Agentic Reinforcement Learning In Large Language Models
cs.AI 2026-04 unverdicted novelty 2.0

This review synthesizes conceptual foundations, methods, challenges, and future directions for agentic reinforcement learning in large language models.
A Brief Overview: Agentic Reinforcement Learning In Large Language Models
cs.AI 2026-04 unverdicted novelty 2.0

The paper surveys the conceptual foundations, methodological innovations, challenges, and future directions of agentic reinforcement learning frameworks that embed cognitive capabilities like meta-reasoning and self-r...

Reference graph

Works this paper leans on

105 extracted references · 105 canonical work pages · cited by 25 Pith papers · 1 internal anchor

[1] Shervin Minaee, Tomas Mikolov, Narjes Nikzad, Meysam Chenaghlu, Richard Socher, Xavier Amatriain, and Jianfeng Gao. Large language models: A survey, 2024.
[2] Aditi Singh. Exploring language models: A comprehensive survey and analysis. In 2023 International Conference on Research Methodologies in Knowledge Management, Artificial Intelligence and Telecommunication Engineering (RMKMATE), pages 1–4, 2023.
[3] Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang Ren, Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jian-Yun Nie, and Ji-Rong Wen. A survey of large language models, 2024.
[4] Sumit Kumar Dam, Choong Seon Hong, Yu Qiao, and Chaoning Zhang. A complete survey on llm-based ai chatbots, 2024.
[5] Aditi Singh. A survey of ai text-to-image and ai text-to-video generators. In 2023 4th International Conference on Artificial Intelligence, Robotics and Control (AIRC), pages 32–36, 2023.
[6] Aditi Singh, Abul Ehtesham, Gaurav Kumar Gupta, Nikhil Kumar Chatta, Saket Kumar, and Tala Talaei Khoei. Exploring prompt engineering: A systematic review with swot analysis, 2024.
[7] Lei Huang, Weijiang Yu, Weitao Ma, Weihong Zhong, Zhangyin Feng, Haotian Wang, Qianglong Chen, Weihua Peng, Xiaocheng Feng, Bing Qin, and Ting Liu. A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. ACM Transactions on Information Systems, November 2024.
[8] Meng-Chieh Lee, Qi Zhu, Costas Mavromatis, Zhen Han, Soji Adeshina, Vassilis N. Ioannidis, Huzefa Rangwala, and Christos Faloutsos. Agent-g: An agentic framework for graph retrieval augmented generation, 2024.
[9] Penghao Zhao, Hailin Zhang, Qinhan Yu, Zhengren Wang, Yunteng Geng, Fangcheng Fu, Ling Yang, Wentao Zhang, Jie Jiang, and Bin Cui. Retrieval-augmented generation for ai-generated content: A survey, 2024.
[10] Zhengbao Jiang, Frank F. Xu, Luyu Gao, Zhiqing Sun, Qian Liu, Jane Dwivedi-Yu, Yiming Yang, Jamie Callan, and Graham Neubig. Active retrieval augmented generation, 2023.
[11] Yikun Han, Chunjiang Liu, and Pengfei Wang. A comprehensive survey on vector database: Storage and retrieval technique, challenge, 2023.
[12] Ranjan Sapkota, Konstantinos I. Roumeliotis, and Manoj Karkee. Ai agents vs. agentic ai: A conceptual taxonomy, applications and challenges. Information Fusion, 126:103599, 2026.
[13] Anthropic. Building effective agents, 2024. https://www.anthropic.com/research/building-effective-agents. Accessed: January 15, 2026.
[14] E. Bandara, R. Gore, P. Foytik, S. Shetty, R. Mukkamala, A. Rahman, X. Liang, S. H. Bouk, A. Hass, S. Rajapakse, N. W. Keong, K. De Zoysa, A. Withanage, and N. Loganathan. A practical guide for designing, developing, and deploying production-grade agentic ai workflows. arXiv preprint, abs/2512.08769, 2025. Old Dominion University, Norfolk, VA, USA; Deloit...
[15] Chidaksh Ravuru, Sagar Srinivas Sakhinana, and Venkataramana Runkana. Agentic retrieval-augmented generation for time series analysis, 2024.
[16] Jie Huang and Kevin Chen-Chuan Chang. Towards reasoning in large language models: A survey, 2023.
[17] Boci Peng, Yun Zhu, Yongchao Liu, Xiaohe Bo, Haizhou Shi, Chuntao Hong, Yan Zhang, and Siliang Tang. Graph retrieval-augmented generation: A survey, 2024.
[18] Aditi Singh, Abul Ehtesham, Saifuddin Mahmud, and Jong-Hoon Kim. Revolutionizing mental health care through langchain: A journey with a large language model. In IEEE 14th Annual Computing and Communication Workshop and Conference (CCWC), pages 0073–0078, 2024.
[19] Gaurav Kumar Gupta, Aditi Singh, Sijo Valayakkad Manikandan, and Abul Ehtesham. Digital diagnostics: The potential of large language models in recognizing symptoms of common illnesses. AI, 6(1), 2025.
[20] Aditi Singh, Abul Ehtesham, Saket Kumar, Gaurav Kumar Gupta, and Tala Talaei Khoei. Encouraging responsible use of generative ai in education: A reward-based learning approach. In Tim Schlippe, Eric C. K. Cheng, and Tianchong Wang, editors, Artificial Intelligence in Education Technologies: New Development and Innovative Practices, pages 404–413, Singapore...
[21] Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Meng Wang, and Haofen Wang. Retrieval-augmented generation for large language models: A survey, 2024.
[22] Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. Dense passage retrieval for open-domain question answering, 2020.
[23] Zeyu Zhang, Xiaohe Bo, Chen Ma, Rui Li, Xu Chen, Quanyu Dai, Jieming Zhu, Zhenhua Dong, and Ji-Rong Wen. A survey on the memory mechanism of large language model based agents, 2024.
[24] Zhibin Gou, Zhihong Shao, Yeyun Gong, Yelong Shen, Yujiu Yang, Nan Duan, and Weizhu Chen. Critic: Large language models can self-correct with tool-interactive critiquing, 2024.
[25] Xu Huang, Weiwen Liu, Xiaolong Chen, Xingmei Wang, Hao Wang, Defu Lian, Yasheng Wang, Ruiming Tang, and Enhong Chen. Understanding the planning of llm agents: A survey, 2024.
[26] Aditi Singh, Abul Ehtesham, Saket Kumar, and Tala Talaei Khoei. Enhancing ai systems with agentic workflows patterns in large language model. In IEEE World AI IoT Congress (AIIoT), pages 527–532, 2024.
[27] DeepLearning.AI. How agents can improve llm performance. https://www.deeplearning.ai/the-batch/how-agents-can-improve-llm-performance/?ref=dl-staging-website.ghost.io, 2024. Accessed: 2026-01-13.
[28] Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, Shashank Gupta, Bodhisattwa Prasad Majumder, Katherine Hermann, Sean Welleck, Amir Yazdanbakhsh, and Peter Clark. Self-refine: Iterative refinement with self-feedback, 2023.
[29]

Reflexion: Language agents with verbal reinforcement learning, 2023

Noah Shinn, Federico Cassano, Edward Berman, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. Reflexion: Language agents with verbal reinforcement learning, 2023

work page 2023
[30]

Large language model based multi-agents: A survey of progress and challenges, 2024

Taicheng Guo, Xiuying Chen, Yaqi Wang, Ruidi Chang, Shichao Pei, Nitesh V. Chawla, Olaf Wiest, and Xiangliang Zhang. Large language model based multi-agents: A survey of progress and challenges, 2024

work page 2024
[31]

Langgraph workflows tutorial, 2025

LangChain. Langgraph workflows tutorial, 2025. https://langchain-ai.github.io/langgraph/tutorials/workflows/. Accessed: January 18, 2026

work page 2025
[32]

What is agentic rag?

Weaviate Blog. What is agentic rag? https://weaviate.io/blog/what-is-agentic-rag#:~:text=is%20Agentic%20RAG%3F-,%E2%80%8B,of%20the%20non%2Dagentic%20pipeline. Accessed: 2026-01-14

work page 2026
[33]

Corrective retrieval augmented generation, 2024

Shi-Qi Yan, Jia-Chen Gu, Yun Zhu, and Zhen-Hua Ling. Corrective retrieval augmented generation, 2024

work page 2024
[34]

Langgraph crag: Contextualized retrieval-augmented generation tutorial

LangGraph CRAG Tutorial. Langgraph crag: Contextualized retrieval-augmented generation tutorial. https://langchain-ai.github.io/langgraph/tutorials/rag/langgraph_crag/. Accessed: 2026-01-14

work page 2026
[35]

Soyeong Jeong, Jinheon Baek, Sukmin Cho, Sung Ju Hwang, and Jong C. Park. Adaptive-rag: Learning to adapt retrieval-augmented large language models through question complexity, 2024

work page 2024
[36]

Langgraph adaptive rag: Adaptive retrieval-augmented generation tutorial

LangGraph Adaptive RAG Tutorial. Langgraph adaptive rag: Adaptive retrieval-augmented generation tutorial. https://langchain-ai.github.io/langgraph/tutorials/rag/langgraph_adaptive_rag/. Accessed: 2026-01-14

work page 2026
[37]

Zhili Shen, Chenxin Diao, Pavlos Vougiouklis, Pascual Merita, Shriram Piramanayagam, Damien Graux, Dandan Tu, Zeren Jiang, Ruofei Lai, Yang Ren, and Jeff Z. Pan. Gear: Graph-enhanced agent for retrieval-augmented generation, 2024

work page 2024
[38]

Introducing agentic document workflows

LlamaIndex. Introducing agentic document workflows. https://www.llamaindex.ai/blog/introducing-agentic-document-workflows, 2025. Accessed: 2026-01-13

work page 2025
[39]

How twitch used agentic workflow with rag on amazon bedrock to supercharge ad sales

AWS Machine Learning Blog. How twitch used agentic workflow with rag on amazon bedrock to supercharge ad sales. https://aws.amazon.com/blogs/machine-learning/how-twitch-used-agentic-workflow-with-rag-on-amazon-bedrock-to-supercharge-ad-sales/. Accessed: 2026-01-13

work page 2026
[41]

Patient case summary workflow using llamacloud

LlamaCloud Demo Repository. Patient case summary workflow using llamacloud. https://github.com/run-llama/llamacloud-demo/blob/main/examples/document_workflows/patient_case_summary/patient_case_summary.ipynb, 2025. Accessed: 2026-01-13

work page 2025
[42]

Contract review workflow using llamacloud

LlamaCloud Demo Repository. Contract review workflow using llamacloud. https://github.com/run-llama/llamacloud-demo/blob/main/examples/document_workflows/contract_review/contract_review.ipynb, 2025. Accessed: 2026-01-13

work page 2025
[43]

Auto insurance claims workflow using llamacloud

LlamaCloud Demo Repository. Auto insurance claims workflow using llamacloud. https://github.com/run-llama/llamacloud-demo/blob/main/examples/document_workflows/auto_insurance_claims/auto_insurance_claims.ipynb, 2025. Accessed: 2026-01-13

work page 2025
[44]

Research paper report generation workflow using llamacloud

LlamaCloud Demo Repository. Research paper report generation workflow using llamacloud. https://github.com/run-llama/llamacloud-demo/blob/main/examples/report_generation/ research_paper_report_generation.ipynb, 2025. Accessed: 2026-01-13

work page 2025
[45]

Langgraph agentic rag: Nodes and edges tutorial

LangGraph Agentic RAG Tutorial. Langgraph agentic rag: Nodes and edges tutorial. https://langchain-ai.github.io/langgraph/tutorials/rag/langgraph_agentic_rag/#nodes-and-edges. Accessed: 2026-01-14

work page 2026
[46]

Agentic rag with llamaindex

LlamaIndex Blog. Agentic rag with llamaindex. https://www.llamaindex.ai/blog/agentic-rag-with-llamaindex-2721b8a49ff6. Accessed: 2026-01-14

work page 2026
[47]

Agentic rag: Turbocharge your retrieval-augmented generation with query reformulation and self-query

Hugging Face Cookbook. Agentic rag: Turbocharge your retrieval-augmented generation with query reformulation and self-query. https://huggingface.co/learn/cookbook/en/agent_rag. Accessed: 2026-01-14

work page 2026
[48]

Agentic rag: Combining rag with agents for enhanced information retrieval

Qdrant Blog. Agentic rag: Combining rag with agents for enhanced information retrieval. https://qdrant.tech/articles/agentic-rag/. Accessed: 2026-01-14

work page 2026
[49]

crewai: A github repository for ai projects

crewAI Inc. crewai: A github repository for ai projects. https://github.com/crewAIInc/crewAI, 2025. Accessed: 2026-01-15

work page 2025
[50]

Ag2: A github repository for advanced generative ai research

AG2AI Contributors. Ag2: A github repository for advanced generative ai research. https://github.com/ag2ai/ag2, 2025. Accessed: 2026-01-15

work page 2025
[51]

Autogen: Enabling next-gen llm applications via multi-agent conversation framework

Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, Ahmed Hassan Awadallah, Ryen W White, Doug Burger, and Chi Wang. Autogen: Enabling next-gen llm applications via multi-agent conversation framework. 2023

work page 2023
[52]

Training language model agents without modifying language models. ICML’24, 2024

Shaokun Zhang, Jieyu Zhang, Jiale Liu, Linxin Song, Chi Wang, Ranjay Krishna, and Qingyun Wu. Training language model agents without modifying language models. ICML’24, 2024

work page 2024
[53]

Swarm: Lightweight multi-agent orchestration framework

OpenAI. Swarm: Lightweight multi-agent orchestration framework. https://github.com/openai/swarm. Accessed: 2026-01-14

work page 2026
[54]

Agentic rag using vertex ai

LlamaIndex Documentation. Agentic rag using vertex ai. https://docs.llamaindex.ai/en/stable/examples/agent/agentic_rag_using_vertex_ai/. Accessed: 2026-01-14

work page 2026
[55]

Semantic kernel overview, 2025

Microsoft. Semantic kernel overview, 2025. https://learn.microsoft.com/en-us/semantic-kernel/overview/. Accessed: January 18, 2026

work page 2025
[56]

Semantic kernel github repository, 2025

Microsoft. Semantic kernel github repository, 2025. https://github.com/microsoft/semantic-kernel. Accessed: January 18, 2026

work page 2025
[57]

Agentic rag: Ai agents with ibm granite models

IBM Granite Community. Agentic rag: Ai agents with ibm granite models. https://github.com/ibm-granite-community/granite-snack-cookbook/blob/main/recipes/AI-Agents/Agentic_RAG.ipynb. Accessed: 2026-01-14

work page 2026
[58]

From LLM Reasoning to Autonomous AI Agents: A Comprehensive Review

Mohamed Amine Ferrag, Norbert Tihanyi, and Merouane Debbah. From llm reasoning to autonomous ai agents: A comprehensive review. arXiv preprint, abs/2504.19678, 2025. Guelma University, Algeria; Technology Innovation Institute, UAE; Eotvos Lorand University, Hungary; Khalifa University of Science and Technology, UAE; Corresponding author: ferrag.mohamedamin...

work page 2025
[59]

Beir: A heterogenous benchmark for zero-shot evaluation of information retrieval models, 2021

Nandan Thakur, Nils Reimers, Andreas Rücklé, Abhishek Srivastava, and Iryna Gurevych. Beir: A heterogenous benchmark for zero-shot evaluation of information retrieval models, 2021

work page 2021
[60]

Ms marco: A human generated machine reading comprehension dataset, 2018

Payal Bajaj, Daniel Campos, Nick Craswell, Li Deng, Jianfeng Gao, Xiaodong Liu, Rangan Majumder, Andrew McNamara, Bhaskar Mitra, Tri Nguyen, Mir Rosenberg, Xia Song, Alina Stoica, Saurabh Tiwary, and Tong Wang. Ms marco: A human generated machine reading comprehension dataset, 2018

work page 2018
[61]

Overview of the trec 2022 deep learning track

Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos, Jimmy Lin, Ellen M. Voorhees, and Ian Soboroff. Overview of the trec 2022 deep learning track. In Text REtrieval Conference (TREC). NIST, TREC, March 2023

work page 2022
[62]

Musique: Multihop questions via single-hop question composition, 2022

Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, and Ashish Sabharwal. Musique: Multihop questions via single-hop question composition, 2022

work page 2022
[63]

Constructing a multi-hop qa dataset for comprehensive evaluation of reasoning steps, 2020

Xanh Ho, Anh-Khoa Duong Nguyen, Saku Sugawara, and Akiko Aizawa. Constructing a multi-hop qa dataset for comprehensive evaluation of reasoning steps, 2020

work page 2020
[64]

Hotpotqa: A dataset for diverse, explainable multi-hop question answering, 2018

Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William W. Cohen, Ruslan Salakhutdinov, and Christopher D. Manning. Hotpotqa: A dataset for diverse, explainable multi-hop question answering, 2018

work page 2018
[65]

Ragbench: Explainable benchmark for retrieval-augmented generation systems, 2024

Robert Friel, Masha Belyi, and Atindriyo Sanyal. Ragbench: Explainable benchmark for retrieval-augmented generation systems, 2024

work page 2024
[66]

Bergen: A benchmarking library for retrieval-augmented generation, 2024

David Rau, Hervé Déjean, Nadezhda Chirkova, Thibault Formal, Shuai Wang, Vassilina Nikoulina, and Stéphane Clinchant. Bergen: A benchmarking library for retrieval-augmented generation, 2024

work page 2024
[67]

Flashrag: A modular toolkit for efficient retrieval-augmented generation research, 2024

Jiajie Jin, Yutao Zhu, Xinyu Yang, Chenghao Zhang, and Zhicheng Dou. Flashrag: A modular toolkit for efficient retrieval-augmented generation research, 2024

work page 2024
[68]

Gnn-rag: Graph neural retrieval for large language model reasoning, 2024

Costas Mavromatis and George Karypis. Gnn-rag: Graph neural retrieval for large language model reasoning, 2024

work page 2024
[69]

Natural questions: A benchmark for question answering research

Tom Kwiatkowski, Jennimaria Palomaki, Olivia Redfield, Michael Collins, Ankur Parikh, Chris Alberti, Danielle Epstein, Illia Polosukhin, Jacob Devlin, Kenton Lee, Kristina Toutanova, Llion Jones, Matthew Kelcey, Ming-Wei Chang, Andrew M. Dai, Jakob Uszkoreit, Quoc Le, and Slav Petrov. Natural questions: A benchmark for question answering research. Transact...

work page 2019
[70]

Triviaqa: A large scale distantly supervised challenge dataset for reading comprehension, 2017

Mandar Joshi, Eunsol Choi, Daniel S. Weld, and Luke Zettlemoyer. Triviaqa: A large scale distantly supervised challenge dataset for reading comprehension, 2017

work page 2017
[71]

Squad: 100,000+ questions for machine comprehension of text, 2016

Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. Squad: 100,000+ questions for machine comprehension of text, 2016

work page 2016
[72]

Semantic parsing on freebase from question-answer pairs

Jonathan Berant, Andrew K. Chou, Roy Frostig, and Percy Liang. Semantic parsing on freebase from question-answer pairs. In Conference on Empirical Methods in Natural Language Processing, 2013

work page 2013
[73]

When not to trust language models: Investigating effectiveness of parametric and non-parametric memories

Alex Mallen, Akari Asai, Victor Zhong, Rajarshi Das, Daniel Khashabi, and Hannaneh Hajishirzi. When not to trust language models: Investigating effectiveness of parametric and non-parametric memories. In Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki, editors, Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Vol...

work page 2023
[74]

Eli5: Long form question answering, 2019

Angela Fan, Yacine Jernite, Ethan Perez, David Grangier, Jason Weston, and Michael Auli. Eli5: Long form question answering, 2019

work page 2019
[75]

The narrativeqa reading comprehension challenge

Tomáš Kočiský, Jonathan Schwarz, Phil Blunsom, Chris Dyer, Karl Moritz Hermann, Gábor Melis, and Edward Grefenstette. The narrativeqa reading comprehension challenge. 2017

work page 2017
[76]

Asqa: Factoid questions meet long-form answers, 2023

Ivan Stelmakh, Yi Luan, Bhuwan Dhingra, and Ming-Wei Chang. Asqa: Factoid questions meet long-form answers, 2023

work page 2023
[77]

QMSum: A new benchmark for query-based multi-domain meeting summarization

Ming Zhong, Da Yin, Tao Yu, Ahmad Zaidi, Mutethia Mutuma, Rahul Jha, Ahmed Hassan Awadallah, Asli Celikyilmaz, Yang Liu, Xipeng Qiu, and Dragomir Radev. QMSum: A new benchmark for query-based multi-domain meeting summarization. pages 5905–5921, June 2021

work page 2021
[78]

A dataset of information-seeking questions and answers anchored in research papers

Pradeep Dasigi, Kyle Lo, Iz Beltagy, Arman Cohan, Noah A. Smith, and Matt Gardner. A dataset of information-seeking questions and answers anchored in research papers. In Kristina Toutanova, Anna Rumshisky, Luke Zettlemoyer, Dilek Hakkani-Tur, Iz Beltagy, Steven Bethard, Ryan Cotterell, Tanmoy Chakraborty, and Yichao Zhou, editors, Proceedings of the 2021 ...

work page 2021
[79]

COVID-QA: A question answering dataset for COVID-19

Timo Möller, Anthony Reina, Raghavan Jayakumar, and Malte Pietsch. COVID-QA: A question answering dataset for COVID-19. In ACL 2020 Workshop on Natural Language Processing for COVID-19 (NLP-COVID), 2020

work page 2020
[80]

Cmb: A comprehensive medical benchmark in chinese, 2024

Xidong Wang, Guiming Hardy Chen, Dingjie Song, Zhiyi Zhang, Zhihong Chen, Qingying Xiao, Feng Jiang, Jianquan Li, Xiang Wan, Benyou Wang, and Haizhou Li. Cmb: A comprehensive medical benchmark in chinese, 2024

work page 2024

Showing first 80 references.