LLM-Powered AI Agent Systems and Their Applications in Industry

Guannan Liang; Qianqian Tong

arxiv: 2505.16120 · v2 · submitted 2025-05-22 · 💻 cs.AI

Pith reviewed 2026-05-22 14:01 UTC · model grok-4.3

classification 💻 cs.AI

keywords LLM-powered agents · AI agent systems · industry applications · multi-modal LLMs · agent challenges · software-based agents · hybrid agent systems · agent evolution

The pith

LLM-powered agents deliver flexibility and cross-domain reasoning that rule-based systems lack, supporting uses from customer service to healthcare.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This review traces agent systems from rigid pre-LLM designs to current architectures built around large language models. It shows how these systems gain the ability to process text, images, audio, and tabular data for adaptive behavior in real settings. The authors group the systems into software-based, physical, and adaptive hybrid types, then map them to concrete industry uses. They also list practical drawbacks such as response delays and security gaps and suggest fixes. A reader would care because the shift could change how organizations automate varied, language-heavy tasks without heavy custom coding.

Core claim

The paper establishes that unlike traditional rule-based agents with limited task scope, LLM-powered agents offer greater flexibility, cross-domain reasoning, and natural language interaction. With multi-modal LLMs, these systems process diverse data types including text, images, audio, and structured data to enable richer real-world behavior. The review categorizes current systems into software-based, physical, and adaptive hybrid types, surveys applications in customer service, software development, manufacturing automation, personalized education, financial trading, and healthcare, and examines challenges such as high inference latency, output uncertainty, lack of evaluation metrics, and security vulnerabilities, for which it proposes potential solutions.

What carries the argument

The categorization of agent systems into software-based, physical, and adaptive hybrid systems, which organizes the current landscape and shows how each type supports adaptive real-world tasks.

If this is right

Customer service can shift to natural-language interactions without scripted responses.
Software development gains agents that reason across code and requirements.
Manufacturing automation incorporates multi-modal data for adaptive control.
Personalized education and financial trading receive more responsive agent support.
Healthcare applications benefit from agents that handle mixed data sources.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Industry teams may begin testing hybrid agents that combine software logic with physical sensors for specific workflows.
Reducing latency through optimized inference could open real-time uses in trading and control systems.
Mitigations for security gaps and output uncertainty might become standard requirements before deployment in regulated sectors.
The same categorization lens could help compare agents across new domains like logistics or legal review.

Load-bearing premise

The three-way split into software-based, physical, and adaptive hybrid systems is assumed to cover existing agent designs without major omissions or the need for different groupings.

What would settle it

A survey or deployment record showing a widely used agent system that fits none of the three categories, or one showing that LLM-powered agents perform no better than rule-based agents on flexibility and cross-domain tasks.

Figures

Figures reproduced from arXiv: 2505.16120 by Guannan Liang, Qianqian Tong.

**Figure 1.** LLM-Powered AI Agent System. By combining the capabilities of software-based and physical agents, hybrid agents emerge as a powerful class of systems that enable seamless integration with the real world. Adaptive and Hybrid Agents (Real-World Integration) operate in a feedback-driven environment, continuously learning from both digital and physical interactions by processing multimodal data such as text,…

**Figure 2.** Architecture of LLM-Powered Agent System.

Original abstract

The emergence of Large Language Models (LLMs) has reshaped agent systems. Unlike traditional rule-based agents with limited task scope, LLM-powered agents offer greater flexibility, cross-domain reasoning, and natural language interaction. Moreover, with the integration of multi-modal LLMs, current agent systems are highly capable of processing diverse data modalities, including text, images, audio, and structured tabular data, enabling richer and more adaptive real-world behavior. This paper comprehensively examines the evolution of agent systems from the pre-LLM era to current LLM-powered architectures. We categorize agent systems into software-based, physical, and adaptive hybrid systems, highlighting applications across customer service, software development, manufacturing automation, personalized education, financial trading, and healthcare. We further discuss the primary challenges posed by LLM-powered agents, including high inference latency, output uncertainty, lack of evaluation metrics, and security vulnerabilities, and propose potential solutions to mitigate these concerns.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a plain descriptive survey that organizes existing LLM agent work into three categories and lists applications plus challenges, but adds no new results or fixes.

Colleague, the main thing to know is that this paper is a literature survey on LLM-powered agents in industry. It does not introduce new architectures, run experiments, or derive any fresh technical claims. Instead it traces the move from rule-based systems to LLM ones, notes multi-modal extensions, and groups current systems into software-based, physical, and adaptive hybrid types while naming uses in customer service, software development, manufacturing, education, trading, and healthcare. It also flags four practical problems: inference latency, output uncertainty, missing evaluation metrics, and security risks, then sketches possible mitigations.

That organization is the paper's main contribution. The structure is straightforward and the listed application areas line up with what is already discussed in the broader literature, so a reader new to the topic can get a quick map of the landscape. The challenge section is similarly grounded in common complaints rather than in any original analysis.

The soft spots are modest but real. The three-way taxonomy is presented as useful without evidence that it is exhaustive or that the boundaries are sharp; hybrid systems in particular tend to blur the lines in practice. Because the paper is purely descriptive, it does not test whether the proposed solutions to latency or uncertainty actually work, nor does it compare the cited systems on any shared benchmark. The abstract claims comprehensiveness, yet a survey of this length inevitably leaves out some recent papers and alternative taxonomies that other authors have used. No internal contradictions appear, and the citations seem to be drawn from the standard sources in the area.

This paper is for someone who wants an organized entry point rather than for specialists looking for new mechanisms or data. A practitioner scanning for application ideas or a student preparing a background section could get value from the examples and the challenge list. It is not the kind of work that will reshape how agents are built, but the framing is coherent enough that a journal or workshop that accepts surveys could reasonably send it out for review. I would not cite it for a technical claim, yet I would not object to it appearing as a reference for the current state of industrial LLM agents.

Referee Report

0 major / 3 minor

Summary. The paper surveys the evolution of agent systems from pre-LLM rule-based approaches to current LLM-powered architectures. It emphasizes advantages in flexibility, cross-domain reasoning, natural language interaction, and multi-modal data processing. The manuscript categorizes agent systems into software-based, physical, and adaptive hybrid systems, reviews applications in customer service, software development, manufacturing automation, personalized education, financial trading, and healthcare, and outlines challenges including high inference latency, output uncertainty, lack of evaluation metrics, and security vulnerabilities while suggesting potential solutions.

Significance. As a descriptive survey without new empirical results, formal proofs, or falsifiable predictions, the paper's value lies in its organizational framework and synthesis of existing literature. If the taxonomy is well-motivated and the review of applications and challenges is balanced and up-to-date, it could help researchers and practitioners navigate the field. The manuscript receives credit for attempting to structure a fast-moving area through a three-way categorization lens rather than claiming exhaustiveness.

minor comments (3)

[Categorization section] The categorization into software-based, physical, and adaptive hybrid systems is presented without explicit discussion of boundary cases or comparison to alternative taxonomies in the literature; adding a short subsection justifying the framework would improve clarity.
[Applications section] Applications are listed at a high level; including one or two concrete, cited examples per domain (e.g., a specific manufacturing automation case) would make the claims more tangible without altering the survey nature.
[Challenges and solutions] The proposed solutions to challenges such as inference latency and security vulnerabilities should be tied to specific references or ongoing work rather than left as high-level suggestions.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of the manuscript and for recommending minor revision. We appreciate the recognition that the paper's value lies in its organizational framework and synthesis of the literature on LLM-powered agent systems. The three-way categorization into software-based, physical, and adaptive hybrid systems is intended to help structure this fast-moving area, and we are glad this approach was noted favorably.

Circularity Check

0 steps flagged

No significant circularity in descriptive survey

The paper is a review article that summarizes the evolution of agent systems, describes LLM-powered architectures, proposes a high-level categorization into software-based, physical, and adaptive hybrid systems as an organizational framework, and lists applications and challenges. No equations, derivations, predictions, or fitted parameters exist. The taxonomy is presented as a review lens rather than a claim derived from or reducing to its own inputs. External literature is cited without load-bearing self-citation chains that would make central claims equivalent to unverified prior work by the same authors. The content remains self-contained against external benchmarks as a descriptive survey.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a survey paper containing no free parameters, mathematical axioms, or invented entities. All content rests on summaries of previously published work in the LLM and agent literature.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

LLM-Based Multi-Agent Systems for Code Generation: A Multi-Vocal Literature Review
cs.SE 2026-02 unverdicted novelty 3.0

A review of 114 studies classifies motivations into nine categories, analyzes common models and benchmarks, synthesizes challenges into six categories with 26 subcategories and solutions, and identifies six future res...

Reference graph

Works this paper leans on

122 extracted references · 122 canonical work pages · cited by 1 Pith paper · 14 internal anchors

[1]

Multi-agent systems: which research for which applications,

E. Oliveira, K. Fischer, and O. Stepankova, “Multi-agent systems: which research for which applications,”Robotics and Autonomous Systems, vol. 27, no. 1-2, pp. 91–106, 1999

work page 1999
[2]

Agent AI: Surveying the Horizons of Multimodal Interaction

Z. Durante, Q. Huang, N. Wake, R. Gong, J. S. Park, B. Sarkar, R. Taori, Y . Noda, D. Terzopoulos, Y . Choiet al., “Agent ai: Surveying the horizons of multimodal interaction,”arXiv preprint arXiv:2401.03568, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[3]

The Landscape of Emerging AI Agent Architectures for Reasoning, Planning, and Tool Calling: A Survey

T. Masterman, S. Besen, M. Sawtell, and A. Chao, “The landscape of emerging ai agent architectures for reasoning, planning, and tool calling: A survey,”arXiv preprint arXiv:2404.11584, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[4]

Large multimodal agents: A survey,

J. Xie, Z. Chen, R. Zhang, X. Wan, and G. Li, “Large multimodal agents: A survey,”arXiv preprint arXiv:2402.15116, 2024

work page arXiv 2024
[5]

Multi-agent systems: A survey,

A. Dorri, S. S. Kanhere, and R. Jurdak, “Multi-agent systems: A survey,”Ieee Access, vol. 6, pp. 28 573–28 593, 2018

work page 2018
[6]

Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security

Y . Li, H. Wen, W. Wang, X. Li, Y . Yuan, G. Liu, J. Liu, W. Xu, X. Wang, Y . Sunet al., “Personal llm agents: Insights and sur- vey about the capability, efficiency and security,”arXiv preprint arXiv:2401.05459, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[7]

Understanding the planning of LLM agents: A survey

X. Huang, W. Liu, X. Chen, X. Wang, H. Wang, D. Lian, Y . Wang, R. Tang, and E. Chen, “Understanding the planning of llm agents: A survey,”arXiv preprint arXiv:2402.02716, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[8]

Large Language Model based Multi-Agents: A Survey of Progress and Challenges

T. Guo, X. Chen, Y . Wang, R. Chang, S. Pei, N. V . Chawla, O. Wiest, and X. Zhang, “Large language model based multi-agents: A survey of progress and challenges,”arXiv preprint arXiv:2402.01680, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[9]

A survey on llm- based multi-agent systems: workflow, infrastructure, and challenges,

X. Li, S. Wang, S. Zeng, Y . Wu, and Y . Yang, “A survey on llm- based multi-agent systems: workflow, infrastructure, and challenges,” Vicinagearth, vol. 1, no. 1, p. 9, 2024

work page 2024
[10]

Mycin: a knowledge-based consultation program for infectious disease diagnosis,

W. Van Melle, “Mycin: a knowledge-based consultation program for infectious disease diagnosis,”International journal of man-machine studies, vol. 10, no. 3, pp. 313–322, 1978

work page 1978
[11]

Dendral and meta-dendral: Their applications dimension,

B. G. Buchanan and E. A. Feigenbaum, “Dendral and meta-dendral: Their applications dimension,” inReadings in artificial intelligence. Elsevier, 1981, pp. 313–322

work page 1981
[12]

Multi-agent deep reinforcement learning: a survey,

S. Gronauer and K. Diepold, “Multi-agent deep reinforcement learning: a survey,”Artificial Intelligence Review, vol. 55, no. 2, pp. 895–943, 2022

work page 2022
[13]

Deep reinforcement learning for multiagent systems: A review of challenges, solutions, and applications,

T. T. Nguyen, N. D. Nguyen, and S. Nahavandi, “Deep reinforcement learning for multiagent systems: A review of challenges, solutions, and applications,”IEEE transactions on cybernetics, vol. 50, no. 9, pp. 3826–3839, 2020

work page 2020
[14]

A survey and critique of multiagent deep reinforcement learning,

P. Hernandez-Leal, B. Kartal, and M. E. Taylor, “A survey and critique of multiagent deep reinforcement learning,”Autonomous Agents and Multi-Agent Systems, vol. 33, no. 6, pp. 750–797, 2019

work page 2019
[15]

Deep reinforcement learning: A brief survey,

K. Arulkumaran, M. P. Deisenroth, M. Brundage, and A. A. Bharath, “Deep reinforcement learning: A brief survey,”IEEE Signal Processing Magazine, vol. 34, no. 6, pp. 26–38, 2017

work page 2017
[16]

Intelligent agents: Theory and practice,

M. Wooldridge and N. R. Jennings, “Intelligent agents: Theory and practice,”The knowledge engineering review, vol. 10, no. 2, pp. 115– 152, 1995

work page 1995
[17]

Chatgpt by openai,

OpenAI, “Chatgpt by openai,” https://openai.com/index/chatgpt/

work page
[18]

Claude: An ai assistant by anthropic,

Anthropic, “Claude: An ai assistant by anthropic,” https://www.anthropic.com/claude

work page
[19]

Gemini: Ai by google,

Google, “Gemini: Ai by google,” https://gemini.google.com/app

work page
[20]

Deepseek: Next-generation open llms,

DeepSeek, “Deepseek: Next-generation open llms,” https://www.deepseek.com/

work page
[21]

Llmfactor: Extracting profitable factors through prompts for explainable stock movement prediction,

M. Wang, K. Izumi, and H. Sakaji, “Llmfactor: Extracting profitable factors through prompts for explainable stock movement prediction,” arXiv preprint arXiv:2406.10811, 2024

work page arXiv 2024
[22]

Can large language models beat wall street? unveiling the potential o f ai in stock selection

G. Fatouros, K. Metaxas, J. Soldatos, and D. Kyriazis, “Can large language models beat wall street? unveiling the potential of ai in stock selection,”arXiv preprint arXiv:2401.03737, 2024

work page arXiv 2024
[23]

Can chatgpt forecast stock price movements? return pre- dictability and large language models

A. Lopez-Lira and Y . Tang, “Can chatgpt forecast stock price move- ments? return predictability and large language models,”arXiv preprint arXiv:2304.07619, 2023

work page arXiv 2023
[24]

TradingGPT: Multi-agent system with layered memory and distinct characters for enhanced financial trading performance.arXiv preprint arXiv:2309.03736,

Y . Li, Y . Yu, H. Li, Z. Chen, and K. Khashanah, “Tradinggpt: Multi- agent system with layered memory and distinct characters for enhanced financial trading performance,”arXiv preprint arXiv:2309.03736, 2023

work page arXiv 2023
[25]

A multimodal foundation agent for financial trading: Tool-augmented, diversified, and generalist,

W. Zhang, L. Zhao, H. Xia, S. Sun, J. Sun, M. Qin, X. Li, Y . Zhao, Y . Zhao, X. Caiet al., “A multimodal foundation agent for financial trading: Tool-augmented, diversified, and generalist,” inProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024, pp. 4314–4325

work page 2024
[26]

Llm-enhanced human-machine interaction for adaptive decision making in dynamic manufacturing process environments,

Z. Keskin, D. Joosten, N. Klasen, M. Huber, C. Liu, B. Drescher, and R. H. Schmitt, “Llm-enhanced human-machine interaction for adaptive decision making in dynamic manufacturing process environments,” IEEE Access, 2025

work page 2025
[27]

The use of artificial intelligence to optimize the routing of vehicles and reduce traffic congestion in urban areas,

S. Dikshit, A. Atiq, M. Shahid, V . Dwivedi, and A. Thusu, “The use of artificial intelligence to optimize the routing of vehicles and reduce traffic congestion in urban areas,”EAI Endorsed Transactions on Energy Web, vol. 10, pp. 1–13, 2023

work page 2023
[28]

Mdagents: An adaptive collaboration of llms for medical decision-making,

Y . Kim, C. Park, H. Jeong, Y . S. Chan, X. Xu, D. McDuff, H. Lee, M. Ghassemi, C. Breazeal, H. Parket al., “Mdagents: An adaptive collaboration of llms for medical decision-making,”Advances in Neural Information Processing Systems, vol. 37, pp. 79 410–79 452, 2024

work page 2024
[29]

Medaide: Towards an omni medical aide via specialized llm-based multi-agent collaboration,

J. Wei, D. Yang, Y . Li, Q. Xu, Z. Chen, M. Li, Y . Jiang, X. Hou, and L. Zhang, “Medaide: Towards an omni medical aide via specialized llm-based multi-agent collaboration,”arXiv preprint arXiv:2410.12532, 2024

work page arXiv 2024
[30]

Polaris: A safety-focused llm constellation architecture for healthcare,

S. Mukherjee, P. Gamble, M. S. Ausin, N. Kant, K. Aggarwal, N. Manjunath, D. Datta, Z. Liu, J. Ding, S. Busaccaet al., “Polaris: A safety-focused llm constellation architecture for healthcare,”arXiv preprint arXiv:2403.13313, 2024

work page arXiv 2024
[31]

Evaluating large language models as agents in the clinic,

N. Mehandru, B. Y . Miao, E. R. Almaraz, M. Sushil, A. J. Butte, and A. Alaa, “Evaluating large language models as agents in the clinic,” NPJ digital medicine, vol. 7, no. 1, p. 84, 2024

work page 2024
[32]

Ai-powered product data management in industry 4.0: A bibliographical analysis

S. Mazumdar, “Ai-powered product data management in industry 4.0: A bibliographical analysis.”

work page
[33]

Ai for predictive maintenance in industrial systems,

A. Abbas, “Ai for predictive maintenance in industrial systems,” International Journal of Advanced Engineering Technologies and In- novations, vol. 1, no. 1, pp. 31–51, 2024

work page 2024
[34]

Ai-powered supply chains towards greater efficiency,

N. Shobhana, “Ai-powered supply chains towards greater efficiency,” in Complex AI Dynamics and Interactions in Management. IGI Global, 2024, pp. 229–249

work page 2024
[35]

Automated decision making comes of age,

T. H. Davenport and J. G. Harris, “Automated decision making comes of age,”MIT Sloan Management Review, vol. 46, no. 4, p. 83, 2005

work page 2005
[36]

Q-learning: Theory and applications,

J. Clifton and E. Laber, “Q-learning: Theory and applications,”Annual Review of Statistics and Its Application, vol. 7, no. 1, pp. 279–301, 2020

work page 2020
[37]

Conservative q-learning for offline reinforcement learning,

A. Kumar, A. Zhou, G. Tucker, and S. Levine, “Conservative q-learning for offline reinforcement learning,”Advances in neural information processing systems, vol. 33, pp. 1179–1191, 2020

work page 2020
[38]

Deep reinforcement learning with double q-learning,

H. Van Hasselt, A. Guez, and D. Silver, “Deep reinforcement learning with double q-learning,” inProceedings of the AAAI conference on artificial intelligence, vol. 30, no. 1, 2016

work page 2016
[39]

A survey of deep q-networks used for reinforcement learning: state of the art,

A. M. Hafiz, “A survey of deep q-networks used for reinforcement learning: state of the art,”Intelligent Communication Technologies and Virtual Mobile Networks: Proceedings of ICICV 2022, pp. 393–402, 2022

work page 2022
[40]

Policy gradient methods for reinforcement learning with function approximation,

R. S. Sutton, D. McAllester, S. Singh, and Y . Mansour, “Policy gradient methods for reinforcement learning with function approximation,” Advances in neural information processing systems, vol. 12, 1999

work page 1999
[41]

Mastering the game of go without human knowledge,

D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Boltonet al., “Mastering the game of go without human knowledge,”nature, vol. 550, no. 7676, pp. 354–359, 2017

work page 2017
[42]

A Comprehensive Overview of Large Language Models

H. Naveed, A. U. Khan, S. Qiu, M. Saqib, S. Anwar, M. Usman, N. Akhtar, N. Barnes, and A. Mian, “A comprehensive overview of large language models,”arXiv preprint arXiv:2307.06435, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[43]

arXiv, ://arxiv.org/abs/2502.17516, arXiv:2502.17516 [cs], doi:10.48550/arXiv.2502.17516

Z. Lin, S. Basu, M. Beigi, V . Manjunatha, R. A. Rossi, Z. Wang, Y . Zhou, S. Balasubramanian, A. Zarei, K. Rezaeiet al., “A survey on mechanistic interpretability for multi-modal foundation models,”arXiv preprint arXiv:2502.17516, 2025

work page arXiv 2025
[44]

A survey of llm-based agents in medicine: How far are we from baymax?

W. Wang, Z. Ma, Z. Wang, C. Wu, W. Chen, X. Li, and Y . Yuan, “A survey of llm-based agents in medicine: How far are we from baymax?” arXiv preprint arXiv:2502.11211, 2025

work page arXiv 2025
[45]

Large language model agent in financial trading: A survey,

H. Ding, Y . Li, J. Wang, and H. Chen, “Large language model agent in financial trading: A survey,”arXiv preprint arXiv:2408.06361, 2024

work page arXiv 2024
[46]

arXiv preprint arXiv:2410.21418 (2024)

Y . Li, H. Zhao, H. Jiang, Y . Pan, Z. Liu, Z. Wu, P. Shu, J. Tian, T. Yang, S. Xuet al., “Large language models for manufacturing,” arXiv preprint arXiv:2410.21418, 2024

work page arXiv 2024
[47]

Framework for llm applications in manufacturing,

C. I. Garcia, M. A. DiBattista, T. A. Letelier, H. D. Halloran, and J. A. Camelio, “Framework for llm applications in manufacturing,” Manufacturing Letters, vol. 41, pp. 253–263, 2024

work page 2024
[48]

A large language model-based multi-agent manufacturing system for intelligent shopfloor,

Z. Zhao, D. Tang, H. Zhu, Z. Zhang, K. Chen, C. Liu, and Y . Ji, “A large language model-based multi-agent manufacturing system for intelligent shopfloor,”arXiv preprint arXiv:2405.16887, 2024

work page arXiv 2024
[49]

Introducing the model context protocol,

Anthropic, “Introducing the model context protocol,” https://www.anthropic.com/news/model-context-protocol, 2024, accessed: 2024-04-10

work page 2024
[50]

A survey of the model context protocol (mcp): Standardizing context to enhance large language models (llms),

A. Singh, A. Ehtesham, S. Kumar, and T. T. Khoei, “A survey of the model context protocol (mcp): Standardizing context to enhance large language models (llms),” 2025

work page 2025
[51]

Retrieval- augmented generation for knowledge-intensive nlp tasks,

P. Lewis, E. Perez, A. Piktus, F. Petroni, V . Karpukhin, N. Goyal, H. K ¨uttler, M. Lewis, W.-t. Yih, T. Rockt ¨aschelet al., “Retrieval- augmented generation for knowledge-intensive nlp tasks,”Advances in neural information processing systems, vol. 33, pp. 9459–9474, 2020

work page 2020
[52]

Retrieval-Augmented Generation for Large Language Models: A Survey

Y . Gao, Y . Xiong, X. Gao, K. Jia, J. Pan, Y . Bi, Y . Dai, J. Sun, H. Wang, and H. Wang, “Retrieval-augmented generation for large language models: A survey,”arXiv preprint arXiv:2312.10997, vol. 2, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[53]

On faithfulness and factuality in abstractive summarization

J. Maynez, S. Narayan, B. Bohnet, and R. McDonald, “On faith- fulness and factuality in abstractive summarization,”arXiv preprint arXiv:2005.00661, 2020

work page arXiv 2005
[54]

Survey of hallucination in natural language generation,

Z. Ji, N. Lee, R. Frieske, T. Yu, D. Su, Y . Xu, E. Ishii, Y . J. Bang, A. Madotto, and P. Fung, “Survey of hallucination in natural language generation,”ACM computing surveys, vol. 55, no. 12, pp. 1–38, 2023

work page 2023
[55]

Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations

H. Inan, K. Upasani, J. Chi, R. Rungta, K. Iyer, Y . Mao, M. Tontchev, Q. Hu, B. Fuller, D. Testuggineet al., “Llama guard: Llm-based input-output safeguard for human-ai conversations,”arXiv preprint arXiv:2312.06674, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[56]

Safeguarding large language models: A survey

Y . Dong, R. Mu, Y . Zhang, S. Sun, T. Zhang, C. Wu, G. Jin, Y . Qi, J. Hu, J. Menget al., “Safeguarding large language models: A survey,” arXiv preprint arXiv:2406.02622, 2024

work page arXiv 2024
[57]

Building guardrails for large language models,

Y . Dong, R. Mu, G. Jin, Y . Qi, J. Hu, X. Zhao, J. Meng, W. Ruan, and X. Huang, “Building guardrails for large language models,”arXiv preprint arXiv:2402.01822, 2024

work page arXiv 2024
[58]

Llm-based chatbots for mining software repositories: Challenges and opportunities,

S. Abedu, A. Abdellatif, and E. Shihab, “Llm-based chatbots for mining software repositories: Challenges and opportunities,” inProceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering, 2024, pp. 201–210

work page 2024
[59]

Large-language-models (llm)-based ai chatbots: Ar- chitecture, in-depth analysis and their performance evaluation,

V . Kumar, P. Srivastava, A. Dwivedi, I. Budhiraja, D. Ghosh, V . Goyal, and R. Arora, “Large-language-models (llm)-based ai chatbots: Ar- chitecture, in-depth analysis and their performance evaluation,” in International Conference on Recent Trends in Image Processing and Pattern Recognition. Springer, 2023, pp. 237–249

work page 2023
[60]

A complete survey on llm-based ai chatbots,

S. K. Dam, C. S. Hong, Y . Qiao, and C. Zhang, “A complete survey on llm-based ai chatbots,”arXiv preprint arXiv:2406.16937, 2024

work page arXiv 2024
[61]

Just-in-time news: An ai chatbot for the modern information age

F. Sufi, “Just-in-time news: An ai chatbot for the modern information age.”AI, vol. 6, no. 2, 2025

work page 2025
[62]

13 generative ai and llm: Case study in e-commerce,

R. Iyer, V . C. Maralapalle, P. Mahesh, and D. Patil, “13 generative ai and llm: Case study in e-commerce,”Generative AI and LLMs: Natural Language Processing and Generative Adversarial Networks, p. 253, 2024

work page 2024
[63]

From llms to llm- based agents for software engineering: A survey of current, challenges and future,

H. Jin, L. Huang, H. Cai, J. Yan, B. Li, and H. Chen, “From llms to llm- based agents for software engineering: A survey of current, challenges and future,”arXiv preprint arXiv:2408.02479, 2024

work page arXiv 2024
[64]

Automatic programming: Large language models and beyond,

M. R. Lyu, B. Ray, A. Roychoudhury, S. H. Tan, and P. Thongtanunam, “Automatic programming: Large language models and beyond,”ACM Transactions on Software Engineering and Methodology, 2024

work page 2024
[65]

A comprehensive overview of large language models (llms) for cyber defences: Opportunities and direc- tions,

M. Hassanin and N. Moustafa, “A comprehensive overview of large language models (llms) for cyber defences: Opportunities and direc- tions,”arXiv preprint arXiv:2405.14487, 2024

work page arXiv 2024
[66]

From vulnerability to defense: The role of large language models in enhancing cybersecurity,

W. Kasri, Y . Himeur, H. A. Alkhazaleh, S. Tarapiah, S. Atalla, W. Man- soor, and H. Al-Ahmad, “From vulnerability to defense: The role of large language models in enhancing cybersecurity,”Computation, vol. 13, no. 2, p. 30, 2025

work page 2025
[67]

Github copilot: Your ai pair programmer,

GitHub, “Github copilot: Your ai pair programmer,” https://github.com/features/copilot

work page
[68]

Cursor: Ai-powered code editor,

Cursor, “Cursor: Ai-powered code editor,” https://www.cursor.com/en

work page
[69]

Llm agents for education: Advances and applications,

Z. Chu, S. Wang, J. Xie, T. Zhu, Y . Yan, J. Ye, A. Zhong, X. Hu, J. Liang, P. S. Yuet al., “Llm agents for education: Advances and applications,”arXiv preprint arXiv:2503.11733, 2025

work page arXiv 2025
[70]

The role of large language models in personalized learning: a systematic review of educational impact,

S. Sharma, P. Mittal, M. Kumar, and V . Bhardwaj, “The role of large language models in personalized learning: a systematic review of educational impact,”Discover Sustainability, vol. 6, no. 1, pp. 1– 24, 2025

work page 2025
[71] S. Xu, X. Zhang, and L. Qin, “Eduagent: Generative student agents in learning,” arXiv preprint arXiv:2404.07963, 2024

[72] H. Jin, M. Yoo, J. Park, Y. Lee, X. Wang, and J. Kim, “Teachtune: Reviewing pedagogical agents against diverse student profiles with simulated students,” arXiv preprint arXiv:2410.04078, 2024

[73] C. E. Mower and H. Bou-Ammar, “Al-khwarizmi: Discovering physical laws with foundation models,” arXiv preprint arXiv:2502.01702, 2025

[74] K. Yang, Y. Chu, T. Darwin, A. Han, H. Li, H. Wen, Y. Copur-Gencturk, J. Tang, and H. Liu, “Content knowledge identification with multi-agent large language models (llms),” in International Conference on Artificial Intelligence in Education. Springer, 2024, pp. 284–292

[75] Y. Yan, S. Wang, J. Huo, P. S. Yu, X. Hu, and Q. Wen, “Mathagent: Leveraging a mixture-of-math-agent framework for real-world multi-modal mathematical error detection,” arXiv preprint arXiv:2503.18132, 2025

[76] Y. R. Wang, J. Duan, D. Fox, and S. Srinivasa, “Newton: Are large language models capable of physical reasoning?” arXiv preprint arXiv:2310.07018, 2023

[77] A. M. Bran, S. Cox, O. Schilter, C. Baldassari, A. D. White, and P. Schwaller, “Augmenting large language models with chemistry tools,” Nature Machine Intelligence, vol. 6, no. 5, pp. 525–535, 2024

[78] S. Wang, H. Yuan, L. M. Ni, and J. Guo, “Quantagent: Seeking holy grail in trading by self-improving large language model,” arXiv preprint arXiv:2402.03755, 2024

[79] S. Wang, H. Yuan, L. Zhou, L. M. Ni, H.-Y. Shum, and J. Guo, “Alpha-gpt: Human-ai interactive alpha mining for quantitative investment,” arXiv preprint arXiv:2308.00016, 2023

[80] W. Xu, J. Chen, P. Zheng, X. Yi, T. Tian, W. Zhu, Q. Wan, H. Wang, Y. Fan, Q. Su et al., “Deploying foundation model powered agent services: A survey,” arXiv preprint arXiv:2412.13437, 2024

Showing first 80 references.
