{"total":34,"items":[{"citing_arxiv_id":"2605.22733","ref_index":3,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"HarnessAPI: A Skill-First Framework for Unified Streaming APIs and MCP Tools","primary_cat":"cs.AI","submitted_at":"2026-05-21T17:03:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"HarnessAPI derives streaming HTTP endpoints, OpenAPI UI, and MCP tools from a single handler.py plus Pydantic schemas, cutting framework boilerplate by 74%.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.15184","ref_index":27,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Is Grep All You Need? How Agent Harnesses Reshape Agentic Search","primary_cat":"cs.CL","submitted_at":"2026-05-14T17:58:41+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Grep retrieval generally outperforms vector retrieval in agentic search tasks, with performance varying strongly by agent harness and tool-calling style.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.13762","ref_index":7,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"EconAI: Dynamic Persona Evolution and Memory-Aware Agents in Evolving Economic Environments","primary_cat":"cs.MA","submitted_at":"2026-05-13T16:41:21+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"EconAI adds memory weighting and economic sentiment indexing to LLM agents so they adapt short-term actions to long-term goals inside a single macro/micro simulation 
loop.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.10555","ref_index":23,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Agent-First Tool API: A Semantic Interface Paradigm for Enterprise AI Agent Systems","primary_cat":"cs.AI","submitted_at":"2026-05-11T13:30:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"The Agent-First Tool API paradigm raises AI agent task success from 64% to 88% and cuts human interventions by 72.7% through semantic phases, structured contracts, and risk governance in a production enterprise system.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Lu, and Y. Zhuang, \"HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face,\" in Advances in Neural Information Processing Systems (NeurIPS), 2023. [22] Y. Ge, W. Hua, K. Mei, J. Ji, J. Tan, S. Xu, Z. Li, and Y. Zhang, \"OpenAGI: When LLM meets domain experts,\" in Advances in Neural Information Processing Systems (NeurIPS), 2023. [23] L. Wang, C. Ma, X. Feng, Z. Zhang, H. Yang, J. Zhang, Z. Chen, J. Tang, X. Chen, Y. Lin, W. X. Zhao, Z. Wei, and J. Wen, \"A survey on large language model based autonomous agents,\" arXiv preprint arXiv:2308.11432, 2023. [24] J. Ruan, Y. Chen, B. Zhang, Z. Xu, T. Bao, G. Du, S. Shi, H. Mao, X. Zeng, and R. 
Zhao, \"TPTU: Task planning and tool usage of large"},{"citing_arxiv_id":"2605.08904","ref_index":42,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"OPT-BENCH: Evaluating the Iterative Self-Optimization of LLM Agents in Large-Scale Search Spaces","primary_cat":"cs.AI","submitted_at":"2026-05-09T11:51:34+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"OPT-BENCH and OPT-Agent evaluate LLM self-optimization in large search spaces, showing stronger models improve via feedback but stay constrained by base capacity and below human performance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.08769","ref_index":30,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"EvoMAS: Learning Execution-Time Workflows for Multi-Agent Systems","primary_cat":"cs.AI","submitted_at":"2026-05-09T07:55:12+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"EvoMAS trains a workflow adapter with policy gradients to dynamically instantiate stage-specific multi-agent workflows from a fixed agent pool, using explicit task-state construction and terminal success signals, and outperforms static baselines on GAIA, HLE, and DeepResearcher.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Additional results indicate that process reward is most useful when terminal success is extremely sparse, and qualitative case studies illustrate that EvoMAS adapts agent coordination as the task state evolves. 
1 Introduction Large language model (LLM)-based agents [23, 16, 9] extend pure language generation with capabilities such as tool use [34, 24, 25, 20, 22, 14, 27], planning [30, 33], and contextual reasoning [12], enabling strong empirical performance across tasks including question answering, data analysis, code generation, and web interaction [17, 18, 2, 43, 5]. Building on single-agent systems, multi-agent frameworks further improve performance through specialization, parallel reasoning, and mutual verification among agents [32, 9, 13, 6, 3], offering a promising approach for solving complex tasks"},{"citing_arxiv_id":"2605.00741","ref_index":19,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Self-Adaptive Multi-Agent LLM-Based Security Pattern Selection for IoT Systems","primary_cat":"cs.CR","submitted_at":"2026-05-01T15:42:54+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"ASPO combines multi-agent LLM proposals with deterministic enforcement in a MAPE-K loop to select conflict-free, resource-feasible security patterns for IoT, delivering 100% safety invariants and 21-23% tail latency/energy reductions on testbed workloads.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.19657","ref_index":35,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"An AI Agent Execution Environment to Safeguard User Data","primary_cat":"cs.CR","submitted_at":"2026-04-21T16:45:30+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"GAAP guarantees confidentiality of private user data for AI agents by enforcing user-specified permissions deterministically through persistent information flow tracking, without trusting the agent or requiring attack-free 
models.","context_count":1,"top_context_role":"dataset","top_context_polarity":"use_dataset","context_text":"taken from other sources to fit our threat model, removing Table 3.A sample of tasks from our benchmark suite, with the number of tools potentially used and the task's source. Task ID Description Tools Source 1 Order food. 3 [ 1] 2 Analyze website with network tools. 7 [ 72] 3 Schedule meeting across time zones. 3 [ 72] 5 Classify csv data and send. 5 [ 78] 9 Access remote DB and send. 4 [ 35] 14 Read file and follow instructions. 5 [ 13] 19 Filter file data and send. 4 [ 13] private data from user prompts and using GAAP's private data DB to store it instead. In total, these tasks have access to a collection of 10 MCP servers with 48 tools, some of which we implemented ourselves and some of which we imported from open-source repositories [43, 44, 56, 68]."},{"citing_arxiv_id":"2604.16966","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Visual Inception: Compromising Long-term Planning in Agentic Recommenders via Multimodal Memory Poisoning","primary_cat":"cs.CR","submitted_at":"2026-04-18T11:15:37+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Visual Inception poisons images to hijack long-term memory in agentic recommenders and steer planning, while CognitiveGuard reduces success to about 10% via perceptual sanitization and reasoning verification.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.13180","ref_index":1,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"SciFi: A Safe, Lightweight, User-Friendly, and Fully Autonomous Agentic AI Workflow for Scientific 
Applications","primary_cat":"cs.AI","submitted_at":"2026-04-14T18:02:20+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"SciFi is a safe, lightweight agentic AI framework that automates structured scientific tasks with minimal human intervention via isolated environments and layered self-assessing agents.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.12129","ref_index":36,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Aethon: A Reference-Based Replication Primitive for Constant-Time Instantiation of Stateful AI Agents","primary_cat":"cs.AI","submitted_at":"2026-04-13T23:23:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Aethon enables near-constant-time instantiation of stateful AI agents via reference-based replication over compositional views, layered memory, and copy-on-write semantics.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.10513","ref_index":27,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Agent Mentor: Framing Agent Knowledge through Semantic Trajectory Analysis","primary_cat":"cs.AI","submitted_at":"2026-04-12T08:02:54+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Agent Mentor analyzes semantic trajectories in agent logs to identify undesired behaviors and derives corrective prompt instructions, yielding measurable accuracy gains on benchmark tasks across three agent setups.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"deviations can propagate through subsequent actions, producing trace-dependent failures that cannot be diagnosed from final outputs alone [28]. 
Prior work has primarily addressed this variability through sampling and selection. Self-consistency and Tree-of-Thoughts frame reasoning as a search process over multiple candidate trajectories, aggregating outputs to improve per-instance accuracy [27, 31]. Other approaches reduce harmful variance through alignment tuning, constitutional methods, or constrained decoding [3, 4, 16]. While effective at improving outcomes, these techniques do not modify the underlying specifications that repeatedly generate unstable behavior. Observability, Diagnosis, and Trace Analysis. As agent pipelines grow more complex, observability has become essential for reli-"},{"citing_arxiv_id":"2605.16282","ref_index":53,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Taxonomy and Consistency Analysis of Safety Benchmarks for AI Agents","primary_cat":"cs.CY","submitted_at":"2026-04-11T04:25:19+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":8.0,"formal_verification":"none","one_line_summary":"This paper delivers the first systematic taxonomy and cross-benchmark consistency analysis of 40 agent safety benchmarks, finding broad but shallow risk coverage, no ranking concordance across evaluations, and that benchmark choice systematically alters reported safety.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.09917","ref_index":34,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Toward Explanatory Equilibrium: Verifiable Reasoning as a Coordination Mechanism under Asymmetric Information","primary_cat":"cs.MA","submitted_at":"2026-04-10T21:21:16+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Structured reasoning artifacts enable coordination in LLM multi-agent systems by preventing approval and welfare collapse under asymmetric information while keeping bad-approval 
rates low across audit regimes.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.09889","ref_index":83,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"In-situ process monitoring for defect detection in wire-arc additive manufacturing: an agentic AI approach","primary_cat":"cs.AI","submitted_at":"2026-04-10T20:36:37+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A multi-agent AI framework using processing and acoustic agents achieves 91.6% accuracy and 0.821 F1 score for in-situ porosity defect detection in wire-arc additive manufacturing.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.08407","ref_index":52,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Your Agent Is Mine: Measuring Malicious Intermediary Attacks on the LLM Supply Chain","primary_cat":"cs.CR","submitted_at":"2026-04-09T16:06:41+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"Malicious LLM API routers actively perform payload injection and secret exfiltration, with 9 of 428 tested routers showing malicious behavior and further poisoning risks from leaked credentials.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.08601","ref_index":1,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"OpenKedge: Governing Agentic Mutation with Execution-Bound Safety and Evidence Chains","primary_cat":"cs.AI","submitted_at":"2026-04-07T22:51:08+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"OpenKedge redefines AI agent state mutations as a governed process using intent proposals, policy-evaluated execution 
contracts, and cryptographic evidence chains to enable safe, auditable agentic behavior.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.06217","ref_index":66,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"The End of the Foundation Model Era: Open-Weight Models, Sovereign AI, and Inference as Infrastructure","primary_cat":"cs.CY","submitted_at":"2026-03-18T04:49:10+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"Open-weight models have ended the foundation model era by eliminating pre-training as a durable moat and enabling sovereign AI control through direct access to model weights.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2602.20867","ref_index":15,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"SoK: Agentic Skills -- Beyond Tool Use in LLM Agents","primary_cat":"cs.CR","submitted_at":"2026-02-24T13:11:38+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"The paper systematizes agentic skills beyond tool use, providing design pattern and representation-scope taxonomies plus security analysis of malicious skill infiltration in agent marketplaces.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2511.03690","ref_index":22,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"The OpenHands Software Agent SDK: A Composable and Extensible Foundation for Production Agents","primary_cat":"cs.SE","submitted_at":"2025-11-05T18:16:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"The OpenHands Software Agent SDK provides a composable architecture for production software 
agents featuring native sandboxed execution, multi-LLM routing, and built-in security analysis.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2510.16853","ref_index":54,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Agentic Inequality","primary_cat":"cs.CY","submitted_at":"2025-10-19T14:32:46+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Introduces the concept of agentic inequality and develops a three-dimensional framework (availability, quality, quantity) to analyze how autonomous AI agents could deepen or mitigate existing divides through scalable goal delegation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2507.21035","ref_index":120,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"GenoMAS: A Multi-Agent Framework for Scientific Discovery via Code-Driven Gene Expression Analysis","primary_cat":"cs.AI","submitted_at":"2025-07-28T17:55:08+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"GenoMAS deploys six specialized LLM agents with guided planning to preprocess transcriptomic data and identify genes, reaching 89.13% composite similarity and 60.48% F1 on the GenoTEX benchmark while outperforming prior methods.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"into manageable sub-goals [128, 161, 33, 115, 77] and executing them sequentially. More sophisticated agents organize reasoning into tree [145, 41] or graph structures [10], enabling exploration of multiple solution paths. 
Critical to agent performance are mechanisms for self-reflection and iterative refinement [134, 74, 123, 19, 140], consistency checking [120], and integration with external tools and knowledge bases [64, 160, 86, 37, 85], which promises to transform LLMs from passive text generators into active problem-solving agents. Multi-Agent System Given the capabilities of LLMs, multi-agent collaboration is expected to further enhance problem-solving performance [126, 107, 30, 117, 159]. In such systems, agents adopt specialized"},{"citing_arxiv_id":"2507.04227","ref_index":24,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Mobile GUI Agents under Real-world Threats: Are We There Yet?","primary_cat":"cs.CR","submitted_at":"2025-07-06T03:31:36+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Introduces an app-content instrumentation framework and benchmark showing that examined GUI agents suffer 42.0% and 36.1% average misleading rates from third-party content in dynamic and static tests respectively.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2504.01990","ref_index":63,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems","primary_cat":"cs.AI","submitted_at":"2025-03-31T18:00:29+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":2.0,"formal_verification":"none","one_line_summary":"This survey frames foundation agents using brain-inspired modular architectures and reviews challenges in evolution, collaboration, and safety.","context_count":1,"top_context_role":"baseline","top_context_polarity":"baseline","context_text":"also serve as guideposts for future enhancements, ensuring our approach remains both grounded in rigorous science and open to 
ongoing innovation. 1.4 Navigating This Book This book is structured to provide a comprehensive, modular, and interdisciplinary examination of intelligent agents, drawing inspiration from cognitive science, neuroscience, and other disciplines to guide the next wave of advancements in AI. While many existing surveys [59, 60, 61, 62, 63, 64, 65, 66, 67] offer valuable insights into various aspects of agent research, we provide a detailed comparison of their focal points in Table 1.3. Our work distinguishes itself by systematically comparing biological cognition with computational frameworks to identify synergies, gaps, and opportunities for innovation. By bridging these domains, we aim to provide a unique perspective that highlights not only where agents excel but also"},{"citing_arxiv_id":"2411.04468","ref_index":54,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks","primary_cat":"cs.AI","submitted_at":"2024-11-07T06:36:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Magentic-One is a modular multi-agent system that matches state-of-the-art performance on GAIA, AssistantBench, and WebArena using an orchestrator-led team of specialized agents.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2410.23218","ref_index":29,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"OS-ATLAS: A Foundation Action Model for Generalist GUI Agents","primary_cat":"cs.CL","submitted_at":"2024-10-30T17:10:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"OS-Atlas, trained on the largest open-source cross-platform GUI grounding corpus of 13 million elements, outperforms prior open-source models on six benchmarks across mobile, desktop, 
and web platforms.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2406.10162","ref_index":212,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models","primary_cat":"cs.AI","submitted_at":"2024-06-14T16:26:20+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"LLMs trained on simple specification gaming generalize to zero-shot reward tampering including rewriting their own reward function.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2404.13501","ref_index":3,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"A Survey on the Memory Mechanism of Large Language Model based Agents","primary_cat":"cs.AI","submitted_at":"2024-04-21T01:49:46+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":3.0,"formal_verification":"none","one_line_summary":"A systematic review of memory designs, evaluation methods, applications, limitations, and future directions for LLM-based agents.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"example, if a trip-planning agent intends to book a ticket, it should send an order request to the ticket website, and observe the response before taking the next action. A personal assistant agent should adjust its behaviors according to the user's feedback, providing personalized responses to improve user's satisfaction. To further push the boundary of LLMs towards AGI, recent years have witnessed a large number of studies on LLM-based agents [3, 4], where the key is to equip LLMs with additional modules to enhance their self-evolving capability in real-world environments. 
Among all the added modules, memory is a key component that differentiates the agents from original LLMs, making an agent truly an agent (see Figure 1). It plays an extremely important role in determining how the agent accumulates knowledge, processes historical experience, retrieves"},{"citing_arxiv_id":"2402.06196","ref_index":175,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Large Language Models: A Survey","primary_cat":"cs.CL","submitted_at":"2024-02-09T05:37:09+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":3.0,"formal_verification":"none","one_line_summary":"The paper surveys key large language models, their training methods, datasets, evaluation benchmarks, and future research directions in the field.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"[172], [173], [174]. a) Prompt engineering techniques for agents: Like RAG and Tools, prompt engineering techniques that specifically address the needs of LLM-based agents have been developed. Three such examples are Reasoning without Observation (ReWOO), Reason and Act (ReAct), and Dialog-Enabled Resolving Agents (DERA). Reasoning without Observation (ReWOO) [175] aims to decouple reasoning from direct observations. ReWOO operates by enabling LLMs to formulate comprehensive reasoning plans or meta-plans without immediate reliance on external data or tools. 
This approach allows the agent to create a structured framework for reasoning that can be executed once the necessary data or observations are available."},{"citing_arxiv_id":"2402.02716","ref_index":44,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Understanding the planning of LLM agents: A survey","primary_cat":"cs.AI","submitted_at":"2024-02-05T04:25:24+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":4.0,"formal_verification":"none","one_line_summary":"A survey that provides a taxonomy of methods for improving planning in LLM-based agents across task decomposition, plan selection, external modules, reflection, and memory.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2401.03568","ref_index":52,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Agent AI: Surveying the Horizons of Multimodal Interaction","primary_cat":"cs.AI","submitted_at":"2024-01-07T19:11:18+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"The paper defines Agent AI as interactive multimodal systems that perceive grounded data and generate embodied actions, arguing this approach can mitigate hallucinations in foundation models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2310.07099","ref_index":51,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"ClausewitzGPT Framework: A New Frontier in Theoretical Large Language Model Enhanced Information Operations","primary_cat":"cs.CY","submitted_at":"2023-10-11T00:39:55+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":2.0,"formal_verification":"none","one_line_summary":"Introduces the ClausewitzGPT equation as a mathematical formulation to quantify risks in LLM-augmented information operations, drawing on Clausewitz 
principles and emphasizing ethical autonomous AI agents.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2309.07864","ref_index":90,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"The Rise and Potential of Large Language Model Based Agents: A Survey","primary_cat":"cs.AI","submitted_at":"2023-09-14T17:12:03+00:00","verdict":"ACCEPT","verdict_confidence":"HIGH","novelty_score":4.0,"formal_verification":"none","one_line_summary":"The paper surveys the origins, frameworks, applications, and open challenges of AI agents built on large language models.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"emergent capabilities and have gained immense popularity [24; 25; 26; 41], researchers have started to leverage these models to construct AI agents [22; 27; 28; 89]. Specifically, they employ LLMs as the primary component of brain or controller of these agents and expand their perceptual and action space through strategies such as multimodal perception and tool utilization [90; 91; 92; 93; 94]. These LLM-based agents can exhibit reasoning and planning abilities comparable to symbolic agents through techniques like Chain-of-Thought (CoT) and problem decomposition [95; 96; 97; 98; 99; 100; 101]. 
They can also acquire interactive capabilities with the environment, akin to reactive agents, by learning from feedback and performing new actions [102; 103; 104]."},{"citing_arxiv_id":"2307.06435","ref_index":231,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"A Comprehensive Overview of Large Language Models","primary_cat":"cs.CL","submitted_at":"2023-07-12T20:01:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":2.0,"formal_verification":"none","one_line_summary":"A survey paper providing an overview of Large Language Models, their background, and recent advances in the field.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"ing LLMs-powered agents [224, 216], where LLMs behave as the brain of agents. LLMs have been incorporated in web agents [166, 167], coding agents [229], tool agents [27, 223], embodied agents [26], and conversational agents [195], requiring minimal to no fine-tuning. Below we summarize the research in LLMs-based autonomous agents. For a more detailed discussion, please refer to [230, 231]. LLMs Steering Autonomous Agents: LLMs are the cognitive controllers of the autonomous agents. They generate plans, reason about tasks, incorporate memory to complete tasks, and adapt the outline depending on the feedback from the environment. Depending on the acquired capabilities of LLMs, many methods fine-tune, propose a better prompting approach, or uti-"}],"limit":50,"offset":0}