The Landscape of Emerging AI Agent Architectures for Reasoning, Planning, and Tool Calling: A Survey

Tula Masterman, Sandi Besen, Mason Sawtell, Alex Chao · 2024 · cs.AI · arXiv 2404.11584

Canonical reference. 91% of citing Pith papers cite this work as background.

31 Pith papers citing it

abstract

This survey paper examines the recent advancements in AI agent implementations, with a focus on their ability to achieve complex goals that require enhanced reasoning, planning, and tool execution capabilities. The primary objectives of this work are to a) communicate the current capabilities and limitations of existing AI agent implementations, b) share insights gained from our observations of these systems in action, and c) suggest important considerations for future developments in AI agent design. We achieve this by providing overviews of single-agent and multi-agent architectures, identifying key patterns and divergences in design choices, and evaluating their overall impact on accomplishing a provided goal. Our contribution outlines key themes when selecting an agentic architecture, the impact of leadership on agent systems, agent communication styles, and key phases for planning, execution, and reflection that enable robust AI agent systems.

citation-role summary

background: 10 · baseline: 1

citation-polarity summary

background: 10 · baseline: 1

representative citing papers

FP-Agent: Fingerprinting AI Browsing Agents

cs.CR · 2026-05-02 · unverdicted · novelty 7.0

Behavioral fingerprints distinguish AI browsing agents from humans and each other, enabling superior detection compared to current bot systems.

Weak-Link Optimization for Multi-Agent Reasoning and Collaboration

cs.AI · 2026-04-17 · unverdicted · novelty 7.0

WORC improves multi-agent LLM reasoning to 82.2% average accuracy by predicting and compensating for the weakest agent via targeted extra sampling rather than uniform reinforcement.

GraphBit: A Graph-based Agentic Framework for Non-Linear Agent Orchestration

cs.AI · 2026-03-08 · unverdicted · novelty 7.0

GraphBit is a DAG-based engine-orchestrated framework for agentic LLMs that achieves 67.6% accuracy with zero hallucinations on GAIA benchmarks.

Agent-Diff: Benchmarking LLM Agents on Enterprise API Tasks via Code Execution with State-Diff-Based Evaluation

cs.SE · 2026-02-11 · unverdicted · novelty 7.0

Agent-Diff benchmarks LLM agents on enterprise API tasks using code execution and state-diff contracts to define success, evaluated on nine models across 224 tasks with code released.

An Empirical Study of Testing Practices in Open Source AI Agent Frameworks and Agentic Applications

cs.SE · 2025-09-23 · conditional · novelty 7.0

Empirical study of open-source AI agents shows testing effort concentrates on deterministic tools and workflows (over 70%) while the FM-based plan body gets under 5% and prompts appear in only 1% of tests.

ML-Agent: Reinforcing LLM Agents for Autonomous Machine Learning Engineering

cs.CL · 2025-05-29 · unverdicted · novelty 7.0

A 7B Qwen-2.5 LLM trained with a new RL framework on only 9 ML tasks achieves performance comparable to much larger proprietary LLM agents at lower computational cost with cross-task generalization.

GRAFT: Graph-Tokenized LLMs for Tool Planning

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

GRAFT internalizes tool dependency graphs via dedicated special tokens in LLMs and applies on-policy context distillation to achieve higher exact sequence matching and dependency legality than prior external-graph methods.

Towards Security-Auditable LLM Agents: A Unified Graph Representation

cs.AI · 2026-05-07 · unverdicted · novelty 6.0

Agent-BOM is a unified hierarchical attributed directed graph that models static capability bases and dynamic semantic states of LLM agents for path-level security auditing and risk assessment.

AutoSurrogate: An LLM-Driven Multi-Agent Framework for Autonomous Construction of Deep Learning Surrogate Models in Subsurface Flow

cs.LG · 2026-04-13 · unverdicted · novelty 6.0

AutoSurrogate is a multi-agent LLM framework that autonomously constructs, tunes, and validates deep learning surrogates for subsurface flow from natural language, outperforming expert baselines on a 3D carbon storage task.

Quantifying Trust: Financial Risk Management for Trustworthy AI Agents

cs.AI · 2026-04-05 · unverdicted · novelty 6.0

The paper introduces the Agentic Risk Standard (ARS) as a payment settlement framework that delivers predefined compensation for AI agent execution failures, misalignment, or unintended outcomes.

Evaluating Privilege Usage of Agents with Real-World Tools

cs.CR · 2026-03-30 · unverdicted · novelty 6.0

GrantBox evaluates LLM agents using real-world tools and finds they remain vulnerable to sophisticated prompt injection attacks with an 84.80% average success rate.

DoubleAgents: Human-Agent Alignment in a Socially Embedded Workflow

cs.HC · 2025-09-16 · unverdicted · novelty 6.0

DoubleAgents shows that a distributed-cognition design with coordination agent, dashboard, and policy module increases user comfort and reliance on AI agents for coordination tasks over time.

The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

cs.AI · 2025-09-02 · accept · novelty 6.0

Survey that defines agentic RL for LLMs via POMDPs, introduces a taxonomy of planning/tool-use/memory/reasoning capabilities and domains, and compiles open environments from over 500 papers.

BlindGuard: Safeguarding LLM-based Multi-Agent Systems under Unknown Attacks

cs.AI · 2025-08-11 · unverdicted · novelty 6.0

BlindGuard introduces an unsupervised hierarchical agent encoder plus corruption-guided contrastive detector that identifies malicious agents in LLM-based multi-agent systems without any attack labels or prior knowledge of malicious behaviors.

DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents

cs.CL · 2025-06-13 · conditional · novelty 6.0

DeepResearch Bench supplies 100 expert-crafted PhD-level tasks and two human-aligned evaluation frameworks to measure deep research agents on report quality and citation accuracy.

EmbodiedClaw: Conversational Workflow Execution for Embodied AI Development

cs.RO · 2026-04-15 · unverdicted · novelty 5.0

EmbodiedClaw automates embodied AI development workflows through conversation, reducing manual effort and improving consistency and reproducibility.

AgentOpt v0.1 Technical Report: Client-Side Optimization for LLM-Based Agent

cs.LG · 2026-04-07 · unverdicted · novelty 5.0

AgentOpt introduces a framework-agnostic package that uses algorithms like UCB-E to find cost-effective model assignments in multi-step LLM agent pipelines, cutting evaluation budgets by 62-76% while maintaining near-optimal accuracy on benchmarks.

Small Language Models are the Future of Agentic AI

cs.AI · 2025-06-02 · unverdicted · novelty 5.0

Small language models are sufficiently capable, more suitable, and far more economical than large models for the repetitive tasks that dominate agentic AI systems.

Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks

cs.AI · 2024-11-07 · unverdicted · novelty 5.0

Magentic-One is a modular multi-agent system that matches state-of-the-art performance on GAIA, AssistantBench, and WebArena using an orchestrator-led team of specialized agents.

HiLSVA: Design and Evaluation of a Human-in-the-Loop Agentic System for Scientific Visualization

cs.HC · 2026-06-25 · unverdicted · novelty 4.0

HiLSVA introduces a plan-first multi-agent LLM system for scientific visualization that incorporates explicit human oversight, stepwise provenance, and learn-at-test-time adaptation, evaluated via case studies and a 12-participant user study.

What makes a harness a harness: necessary and sufficient conditions for an agent harness

cs.SE · 2026-06-08 · unverdicted · novelty 4.0

Proposes and tests a constitutive definition of 'agent harness' via conceptual analysis of literature and six real systems.

Bridging the Gap on AI-Assisted Scientific Software Development Through Transparency and Traceability

cs.SE · 2026-05-17 · conditional · novelty 4.0

Proposes guidance for responsible AI use in scientific software development under NQA-1 standards, illustrated with TMAP8 V&V cases to ensure accountability and auditability.

Social Theory Should Be a Structural Prior for Agentic AI: A Formal Framework for Multi-Agent Social Systems

cs.MA · 2026-05-08 · unverdicted · novelty 4.0 · 3 refs

Agentic AI needs social theory as structural priors in the MASS framework to model emergent dynamics from multi-agent interactions.

Large Language Model-Brained GUI Agents: A Survey

cs.AI · 2024-11-27 · unverdicted · novelty 4.0

A survey consolidating frameworks, data practices, large action models, benchmarks, applications, and research gaps in LLM-brained GUI agents.

citing papers explorer

Against the Monolithic Wireless World Model: Why NextG Needs Composable and Agentic Intelligence

eess.SP · 2026-05-15 · unreviewed · ref 49 · internal anchor
