{"total":31,"items":[{"citing_arxiv_id":"2606.26614","ref_index":45,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"HiLSVA: Design and Evaluation of a Human-in-the-Loop Agentic System for Scientific Visualization","primary_cat":"cs.HC","submitted_at":"2026-06-25T05:19:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"HiLSVA introduces a plan-first multi-agent LLM system for scientific visualization that incorporates explicit human oversight, stepwise provenance, and learn-at-test-time adaptation, evaluated via case studies and a 12-participant user study.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.10106","ref_index":31,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"What makes a harness a harness: necessary and sufficient conditions for an agent harness","primary_cat":"cs.SE","submitted_at":"2026-06-08T19:35:37+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Proposes and tests a constitutive definition of 'agent harness' via conceptual analysis of literature and six real systems.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.17675","ref_index":3,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Bridging the Gap on AI-Assisted Scientific Software Development Through Transparency and Traceability","primary_cat":"cs.SE","submitted_at":"2026-05-17T22:08:52+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Proposes guidance for responsible AI use in scientific software development under NQA-1 standards, illustrated with TMAP8 V&V cases to ensure accountability and 
auditability.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.16689","ref_index":49,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Against the Monolithic Wireless World Model: Why NextG Needs Composable and Agentic Intelligence","primary_cat":"eess.SP","submitted_at":"2026-05-15T22:56:11+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.11706","ref_index":18,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"GRAFT: Graph-Tokenized LLMs for Tool Planning","primary_cat":"cs.LG","submitted_at":"2026-05-12T07:59:41+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"GRAFT internalizes tool dependency graphs via dedicated special tokens in LLMs and applies on-policy context distillation to achieve higher exact sequence matching and dependency legality than prior external-graph methods.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"dency legality, advancing the development of more reliable LLMs for practical workflows. 2 Related Work 2.1 LLM-based Tool Planning Early LLM-based methods mainly formulate tool planning as subtask-tool matching. They decompose a user query into subtasks and select suitable tools through prompting [13, 14, 15], retrieval [15, 16], or tool-token generation [17]. More recently, agentic tool-use methods [18] formulate planning as a closed-loop decision process, where the LLM selects the next tool action based on the previous context, and execution observations, e.g., ReAct [19], AgentGym [20], and AgentFlow [21]. 
This interaction-based paradigm can refine decisions after tool execution, but it relies on a strong backbone LLM and requires repeated LLM calls with observation updates."},{"citing_arxiv_id":"2605.07069","ref_index":65,"ref_count":3,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Social Theory Should Be a Structural Prior for Agentic AI: A Formal Framework for Multi-Agent Social Systems","primary_cat":"cs.MA","submitted_at":"2026-05-08T00:30:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Agentic AI needs social theory as structural priors in the MASS framework to model emergent dynamics from multi-agent interactions.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"not only by their individual capabilities but by interactions with other agents - and humans - over time. In these settings, outcomes are not determined only by the performance of a single agent, but emerge from the patterns of interactions across agent-agent and agent-human populations within structured networks. We have built agents capable of reasoning, planning and negotiation [66, 92, 62], but have largely studied them in terms of individual task completion or as independent units contributing to a collective objective. 
A recent NeurIPS position paper argues \"Large Language Models Miss the Multi-Agent Mark\" because of the failure to account for population dynamics [56]; indeed, today's agents have mostly been single-task agentic systems, and emerging social groups remain unstudied."},{"citing_arxiv_id":"2605.06812","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Towards Security-Auditable LLM Agents: A Unified Graph Representation","primary_cat":"cs.AI","submitted_at":"2026-05-07T18:14:29+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Agent-BOM is a unified hierarchical attributed directed graph that models static capability bases and dynamic semantic states of LLM agents for path-level security auditing and risk assessment.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Large language models (LLMs) are evolving from single-turn text generators into agentic AI systems with autonomous and persistent execution capabilities [1]. Modern agent frameworks Corresponding author: Yutao Hu. commonly support goal decomposition, context management, long-term memory, external tool invocation, environment interaction, and multi-agent collaboration [2]. These mechanisms allow agents to go beyond natural-language generation. Agents can read external resources, invoke software interfaces, change environment states, and carry out complex tasks across multiple turns or multiple agents. This expansion reshapes the security boundary of agentic systems. 
Traditional software behavior is usually driven by"},{"citing_arxiv_id":"2605.01247","ref_index":75,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"FP-Agent: Fingerprinting AI Browsing Agents","primary_cat":"cs.CR","submitted_at":"2026-05-02T04:58:08+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Behavioral fingerprints distinguish AI browsing agents from humans and each other, enabling superior detection compared to current bot systems.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.27859","ref_index":59,"ref_count":3,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Rethinking Agentic Reinforcement Learning In Large Language Models","primary_cat":"cs.AI","submitted_at":"2026-04-30T13:43:25+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"The paper reviews conceptual foundations, methodological innovations, effective designs, critical challenges, and future directions for LLM-based Agentic Reinforcement Learning.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Gui-r1: A generalist r1-style vision-language action model for gui agents.arXiv preprint arXiv:2504.10458(2025d). [58] Xinji Mai, Haotian Xu, Xing Wang, Weinong Ma, Jian Li, Yingying Zhang, and Wenqiang Zhang. 2025. Agent rl scaling law: Agent rl with spontaneous code execution for mathematical problem solving.arXiv preprint arXiv:2505.07773(2025). [59] Tula Masterman, Sandi Besen Smith, Mason Sawtell, and Alex Chao. 2024. The landscape of emerging ai agent architectures for reasoning, planning, and tool calling: A survey.arXiv preprint arXiv:2404.11584(2024). [60] Yu Meng, Mengzhou Xia, and Danqi Chen. 2024. SimPO: Simple preference optimization with a reference-free reward. (2024). 
https://openreview."},{"citing_arxiv_id":"2604.17708","ref_index":103,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Co-evolving Agent Architectures and Interpretable Reasoning for Automated Optimization","primary_cat":"cs.AI","submitted_at":"2026-04-20T01:44:18+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.15972","ref_index":6,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Weak-Link Optimization for Multi-Agent Reasoning and Collaboration","primary_cat":"cs.AI","submitted_at":"2026-04-17T11:36:20+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"WORC improves multi-agent LLM reasoning to 82.2% average accuracy by predicting and compensating for the weakest agent via targeted extra sampling rather than uniform reinforcement.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"sponse, researchers proposed reasoning methods such as Chain of Thought (CoT) [3], which formalize human reasoning approaches into prompt templates and emphasize subtask decomposition and multi-step reasoning. Recent studies have further explored task-driven alignment and structure-aware reasoning-chain optimization [4], [5]. Concurrently, the emergence of AI Agents [6], particularly multi-agent frameworks [7] leveraging planning, reflection, and tool utilization capabilities across collaborating specialized agents, has significantly enhanced LLMs' performance on complex problem-solving tasks [8]. 
Recent advances further extend collaborative reasoning beyond This work was supported by the National Natural Science Foundation of"},{"citing_arxiv_id":"2604.13800","ref_index":50,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"EmbodiedClaw: Conversational Workflow Execution for Embodied AI Development","primary_cat":"cs.RO","submitted_at":"2026-04-15T12:36:59+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"EmbodiedClaw automates embodied AI development workflows through conversation, reducing manual effort and improving consistency and reproducibility.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.11945","ref_index":12,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"AutoSurrogate: An LLM-Driven Multi-Agent Framework for Autonomous Construction of Deep Learning Surrogate Models in Subsurface Flow","primary_cat":"cs.LG","submitted_at":"2026-04-13T18:36:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"AutoSurrogate is a multi-agent LLM framework that autonomously constructs, tunes, and validates deep learning surrogates for subsurface flow from natural language, outperforming expert baselines on a 3D carbon storage task.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.06296","ref_index":18,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"AgentOpt v0.1 Technical Report: Client-Side Optimization for LLM-Based Agent","primary_cat":"cs.LG","submitted_at":"2026-04-07T17:13:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"AgentOpt introduces a framework-agnostic package that uses algorithms like UCB-E to find cost-effective 
model assignments in multi-step LLM agent pipelines, cutting evaluation budgets by 62-76% while maintaining near-optimal accuracy on benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.03976","ref_index":30,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Quantifying Trust: Financial Risk Management for Trustworthy AI Agents","primary_cat":"cs.AI","submitted_at":"2026-04-05T05:42:20+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"The paper introduces the Agentic Risk Standard (ARS) as a payment settlement framework that delivers predefined compensation for AI agent execution failures, misalignment, or unintended outcomes.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2603.28166","ref_index":15,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Evaluating Privilege Usage of Agents with Real-World Tools","primary_cat":"cs.CR","submitted_at":"2026-03-30T08:35:00+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"GrantBox evaluates LLM agents using real-world tools and finds they remain vulnerable to sophisticated prompt injection attacks with an 84.80% average success rate.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.13848","ref_index":36,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"GraphBit: A Graph-based Agentic Framework for Non-Linear Agent Orchestration","primary_cat":"cs.AI","submitted_at":"2026-03-08T18:32:28+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"GraphBit is a DAG-based engine-orchestrated framework for agentic LLMs that achieves 67.6% 
accuracy with zero hallucinations on GAIA benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2602.11224","ref_index":19,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Agent-Diff: Benchmarking LLM Agents on Enterprise API Tasks via Code Execution with State-Diff-Based Evaluation","primary_cat":"cs.SE","submitted_at":"2026-02-11T13:31:18+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Agent-Diff benchmarks LLM agents on enterprise API tasks using code execution and state-diff contracts to define success, evaluated on nine models across 224 tasks with code released.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2509.19185","ref_index":11,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"An Empirical Study of Testing Practices in Open Source AI Agent Frameworks and Agentic Applications","primary_cat":"cs.SE","submitted_at":"2025-09-23T16:02:09+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Empirical study of open-source AI agents shows testing effort concentrates on deterministic tools and workflows (over 70%) while the FM-based plan body gets under 5% and prompts appear in only 1% of tests.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2509.12626","ref_index":23,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"DoubleAgents: Human-Agent Alignment in a Socially Embedded Workflow","primary_cat":"cs.HC","submitted_at":"2025-09-16T03:43:13+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DoubleAgents shows that a distributed-cognition design with coordination agent, 
dashboard, and policy module increases user comfort and reliance on AI agents for coordination tasks over time.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2509.02547","ref_index":23,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"The Landscape of Agentic Reinforcement Learning for LLMs: A Survey","primary_cat":"cs.AI","submitted_at":"2025-09-02T17:46:26+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Survey that defines agentic RL for LLMs via POMDPs, introduces a taxonomy of planning/tool-use/memory/reasoning capabilities and domains, and compiles open environments from over 500 papers.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"of reasoning, acting, and interacting as defining features of agentic LLMs. Tool use, encompassing retrieval-augmented generation (RAG) and API utilization, is a central paradigm, extensively discussed in Li et al. [21] and further conceptualized by Wang et al. [22]. Planning and reasoning strategies form another pillar, with surveys such as Masterman et al. [23] and Kumar et al. [24] highlighting common design patterns like plan-execute-reflect loops, while Tao et al. [25] extend this to self-evolution, where agents iteratively refine knowledge and strategies without substantial human intervention. 
Other directions explore collaborative, cross-modal, and embodied settings, from multi-agent systems [26] to multimodal integration [27], and"},{"citing_arxiv_id":"2508.08127","ref_index":13,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"BlindGuard: Safeguarding LLM-based Multi-Agent Systems under Unknown Attacks","primary_cat":"cs.AI","submitted_at":"2025-08-11T16:04:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"BlindGuard introduces an unsupervised hierarchical agent encoder plus corruption-guided contrastive detector that identifies malicious agents in LLM-based multi-agent systems without any attack labels or prior knowledge of malicious behaviors.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2506.11763","ref_index":15,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents","primary_cat":"cs.CL","submitted_at":"2025-06-13T13:17:32+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DeepResearch Bench supplies 100 expert-crafted PhD-level tasks and two human-aligned evaluation frameworks to measure deep research agents on report quality and citation accuracy.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2506.02153","ref_index":55,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Small Language Models are the Future of Agentic AI","primary_cat":"cs.AI","submitted_at":"2025-06-02T18:35:16+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Small language models are sufficiently capable, more suitable, and far more economical than large models for the repetitive tasks 
that dominate agentic AI systems.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2505.23723","ref_index":1,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"ML-Agent: Reinforcing LLM Agents for Autonomous Machine Learning Engineering","primary_cat":"cs.CL","submitted_at":"2025-05-29T17:54:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A 7B Qwen-2.5 LLM trained with a new RL framework on only 9 ML tasks achieves performance comparable to much larger proprietary LLM agents at lower computational cost with cross-task generalization.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2505.16120","ref_index":3,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"LLM-Powered AI Agent Systems and Their Applications in Industry","primary_cat":"cs.AI","submitted_at":"2025-05-22T01:52:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":2.0,"formal_verification":"none","one_line_summary":"A survey categorizing LLM-powered agent systems into software-based, physical, and hybrid types, covering industrial applications and challenges such as latency and security.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Artificial Intelligence (AI) techniques enable the development of AI agent systems, which integrates perception, reasoning, learning, and action to behave intelligently in a dynamic environment [2]. Recent progress in large language models (LLMs) has significantly changed the AI agent system, driving advances in automation and human-AI collaboration [3]-[9]. 
Compared to traditional agent systems, which mainly relied on task-specific rules [10], [11] or reinforcement learning (RL) [12]-[15], LLM-powered AI agent system provides significantly more adaptability in dynamic and open environments. Agents can process and generate insights from diverse data modalities, including text, images, audio, and structured tabular data."},{"citing_arxiv_id":"2504.01990","ref_index":65,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems","primary_cat":"cs.AI","submitted_at":"2025-03-31T18:00:29+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":2.0,"formal_verification":"none","one_line_summary":"This survey frames foundation agents using brain-inspired modular architectures and reviews challenges in evolution, collaboration, and safety.","context_count":1,"top_context_role":"baseline","top_context_polarity":"baseline","context_text":"these domains, we aim to provide a unique perspective that highlights not only where agents excel but also where significant advancements are needed to unlock their full potential. Table 1.3: Summary of existing reviews with different focal points. • indicates primary focus while ◦ indicates secondary or minor focus. Survey Cognition Memory World Model Reward Action Self Evolve MultiAgent Safety Zhang et al. [66] • • ◦ ◦ ◦ • ◦ ◦ Guo et al. [65] • • ◦ ◦ ◦ • • ◦ Yu et al. [67] • • ◦ ◦ • ◦ • • Wang et al. [62] • • ◦ ◦ • ◦ • ◦ Masterman et al. [64] • • ◦ ◦ • ◦ • ◦ Xi et al. [61] • • ◦ ◦ • • • • Huang et al. [60] • • ◦ • • • • • Durante et al. 
[59] • • ◦ • • • • • This Book • • • • • • • • The book is divided into four key parts: •In Part I: Modular Design of Intelligent Agents, we introduce the core modules of agents, including"},{"citing_arxiv_id":"2503.21460","ref_index":23,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Large Language Model Agent: A Survey on Methodology, Applications and Challenges","primary_cat":"cs.CL","submitted_at":"2025-03-27T12:50:17+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"A survey that deconstructs LLM agent systems via a methodology-centered taxonomy linking design principles to emergent behaviors, applications, and challenges.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"into their fundamental methodological components, including role definition, memory mechanisms, planning capabilities, and action execution [21]. 2) Build-Collaborate-Evolve framework: We analyze three interconnected dimensions of LLM agents - construction, collaboration, and evolution - offering a more holistic understanding than previous approaches [22], [23]. This integrated architectural perspective highlights the continuity between individual LLM agent design and collaborative systems, whereas prior studies have often examined these aspects separately [22], [24]. 
3) Frontier applications and real-world focus: Beyond addressing theoretical concepts, our work examines cutting-edge tools, communication protocols, and di-"},{"citing_arxiv_id":"2411.18279","ref_index":202,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Large Language Model-Brained GUI Agents: A Survey","primary_cat":"cs.AI","submitted_at":"2024-11-27T12:13:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"MODERATE","novelty_score":4.0,"formal_verification":"none","one_line_summary":"A survey consolidating frameworks, data practices, large action models, benchmarks, applications, and research gaps in LLM-brained GUI agents.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"External Knowledge Long-term Other external knowledge sources aiding task completion External Knowledge Base Task Success Metrics Long-term Metrics from task success or failure rates across sessions Database, Disk based on accumulated knowledge. The agent's memory is generally divided into two main types: Short-Term Memory [201] and Long-Term Memory [202]. We show an overview of different types of memory in GUI agents in Table 6. 5.6.1 Short-Term Memory Short-Term Memory (STM) provides the primary, ephemeral context used by the LLM during runtime [203]. 
STM stores information pertinent to the current task, such as recent plans, actions, results, and environmental states, and continuously updates to reflect the task's ongoing status."},{"citing_arxiv_id":"2411.04468","ref_index":27,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks","primary_cat":"cs.AI","submitted_at":"2024-11-07T06:36:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Magentic-One is a modular multi-agent system that matches state-of-the-art performance on GAIA, AssistantBench, and WebArena using an orchestrator-led team of specialized agents.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2409.02977","ref_index":234,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Large Language Model-Based Agents for Software Engineering: A Survey","primary_cat":"cs.SE","submitted_at":"2024-09-04T15:59:41+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"A literature survey that collects and categorizes 124 papers on LLM-based agents for software engineering from SE and agent perspectives.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"MetaGPT [229]✓Waterfall Pre-defined Vertical Direct Communication + Memory AgentVerse [230]✓- Task-Adaptive Vertical Direct Communication AutoAgents [231]✓- Task-Adaptive Vertical Direct Communication + Memory Rasheed et al. [232]✓Waterfall Pre-defined Vertical + Horizontal Direct Communication Co-Learning [233]✓- Pre-defined Vertical + Horizontal Direct Communication AISD [234]✓Waterfall Pre-defined Vertical Direct Communication LLM4PLC [235]×- Pre-defined Vertical Direct Communication CodePori [236]✓Waterfall Pre-defined Vertical Direct 
Communication FlowGen Waterfall [237]✓Waterfall Pre-defined Vertical + Horizontal Direct Communication FlowGen TDD [237]✓Agile Pre-defined Vertical Direct Communication FlowGen Scrum [237]✓Agile Pre-defined Vertical + Horizontal Direct Communication"}],"limit":50,"offset":0}