hub Canonical reference

Automated Design of Agentic Systems

Shengran Hu, Cong Lu, Jeff Clune · 2024 · cs.AI · DOI 10.48550/arxiv.2408.08435 · arXiv 2408.08435

Canonical reference. 93% of citing Pith papers cite this work as background.

55 Pith papers citing it

Background 93% of classified citations

open full Pith review browse 55 citing papers arXiv PDF

abstract

Researchers are investing substantial effort in developing powerful general-purpose agents, wherein Foundation Models are used as modules within agentic systems (e.g. Chain-of-Thought, Self-Reflection, Toolformer). However, the history of machine learning teaches us that hand-designed solutions are eventually replaced by learned solutions. We describe a newly forming research area, Automated Design of Agentic Systems (ADAS), which aims to automatically create powerful agentic system designs, including inventing novel building blocks and/or combining them in new ways. We further demonstrate that there is an unexplored yet promising approach within ADAS where agents can be defined in code and new agents can be automatically discovered by a meta agent programming ever better ones in code. Given that programming languages are Turing Complete, this approach theoretically enables the learning of any possible agentic system: including novel prompts, tool use, workflows, and combinations thereof. We present a simple yet effective algorithm named Meta Agent Search to demonstrate this idea, where a meta agent iteratively programs interesting new agents based on an ever-growing archive of previous discoveries. Through extensive experiments across multiple domains including coding, science, and math, we show that our algorithm can progressively invent agents with novel designs that greatly outperform state-of-the-art hand-designed agents. Importantly, we consistently observe the surprising result that agents invented by Meta Agent Search maintain superior performance even when transferred across domains and models, demonstrating their robustness and generality. Provided we develop it safely, our work illustrates the potential of an exciting new research direction toward automatically designing ever-more powerful agentic systems to benefit humanity.

hub tools

JSON dossier citing papers JSON publisher DOI arXiv source

citation-role summary

background 13 baseline 1

citation-polarity summary

background 13 baseline 1

representative citing papers

Learning to Hand Off: Provably Convergent Workflow Learning under Interface Constraints

cs.AI · 2026-05-18 · unverdicted · novelty 8.0

Formalizes interface-constrained semi-Markov decision processes and proves a finite-sample bound for neural IC-Q that decomposes into neural approximation error, interface gap, and mixing-time residual, with experiments showing parity to centralized oracles.

FlowCompile: An Optimizing Compiler for Structured LLM Workflows

cs.CL · 2026-05-13 · unverdicted · novelty 8.0

FlowCompile performs compile-time design space exploration on structured LLM workflows to produce reusable high-quality configuration sets that outperform routing baselines with up to 6.4x speedup.

SimWorld Studio: Automatic Environment Generation with Evolving Coding Agent for Embodied Agent Learning

cs.AI · 2026-05-10 · accept · novelty 8.0 · 2 refs

SimWorld Studio deploys an evolving coding agent to create adaptive 3D environments that co-evolve with embodied learners, delivering 18-point success-rate gains over fixed environments in navigation benchmarks.

LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling

cs.CL · 2026-05-08 · conditional · novelty 8.0 · 2 refs

AutoTTS discovers width-depth test-time scaling controllers through agentic search in a pre-collected trajectory environment, yielding better accuracy-cost tradeoffs than hand-designed baselines on math reasoning tasks at low cost.

AI Trading's Alpha Singularity: Emergent Market Reasoning through Agent-to-Agent Self-Evolution

cs.AI · 2026-06-28 · reject · novelty 7.0

Multi-agent LLM system Agora under Sealed Joint Search conditions produces +1.87 holdout Sharpe on CSI 1000 over a 91-day sealed period, exceeding the best baseline at +1.334 under favorable seed.

Glite ARF: Verifier-Driven Research with Parallel LLM Coding Agents

cs.MA · 2026-06-25 · accept · novelty 7.0

Glite ARF introduces a verifier-driven three-role framework for parallel LLM coding agents, demonstrated by first- and second-place finishes in the BEA 2026 vocabulary-difficulty shared task across three languages with 29.9-35.9% RMSE reduction at ~$450 API cost.

Self-Harness: Harnesses That Improve Themselves

cs.CL · 2026-06-08 · unverdicted · novelty 7.0

Self-Harness lets LLM agents autonomously refine their interaction harnesses through weakness mining, proposal generation, and validation, raising held-out pass rates on Terminal-Bench-2.0 from 40.5% to 61.9%, 23.8% to 38.1%, and 42.9% to 57.1% across three models.

PACE: Anytime-Valid Acceptance Tests for Self-Evolving Agents

cs.AI · 2026-06-06 · unverdicted · novelty 7.0

PACE is a training-free anytime-valid commit gate using testing-by-betting e-processes that controls per-candidate false-commit probability for self-evolving agents and reduces spurious edits compared to greedy acceptance.

MasFACT: Continual Multi-Agent Topology Learning via Geometry-Aware Posterior Transfer

cs.LG · 2026-05-17 · conditional · novelty 7.0

Geometry-aware FGW prior transfer plus PAC-Bayes residual adaptation reduces topology forgetting and raises average accuracy across continual multi-agent topology learning streams.

Harnessing Agentic Evolution

cs.AI · 2026-05-13 · unverdicted · novelty 7.0

AEvo introduces a meta-agent that edits the evolution procedure or agent context based on accumulated state, outperforming baselines by 26% relative improvement on agentic benchmarks and achieving SOTA on open-ended tasks.

TacoMAS: Test-Time Co-Evolution of Topology and Capability in LLM-based Multi-Agent Systems

cs.CL · 2026-05-10 · unverdicted · novelty 7.0

TacoMAS performs test-time co-evolution of agent capabilities and communication topology in LLM multi-agent systems via fast capability updates and slow meta-LLM topology edits, delivering 13.3% average gains over strong baselines on four benchmarks.

Synthesizing Multi-Agent Harnesses for Vulnerability Discovery

cs.CR · 2026-04-22 · unverdicted · novelty 7.0

AgentFlow uses a typed graph DSL covering roles, prompts, tools, topology and protocol plus a runtime-signal feedback loop to optimize multi-agent harnesses, reaching 84.3% on TerminalBench-2 and discovering ten new zero-days in Chrome including two critical sandbox escapes.

Steerable Instruction Following Coding Data Synthesis with Actor-Parametric Schema Co-Evolution

cs.SE · 2026-02-27 · unverdicted · novelty 7.0

IFCodeEvolve synthesizes coding data via actor-schema co-evolution with MCTS, boosting a 32B model's performance to match proprietary SOTA on instruction following.

When Agents Fail: A Comprehensive Study of Bugs in LLM Agents with Automated Labeling

cs.SE · 2026-01-21 · unverdicted · novelty 7.0

A large-scale empirical study categorizes bugs in LLM agents and demonstrates that a specialized LLM agent can annotate them accurately at very low cost.

Who Broke the System? Failure Localization in LLM-Based Multi-Agent Systems

cs.CR · 2026-07-08 · conditional · novelty 6.0

AgentLocate localizes multi-agent LLM failures to a responsible agent and earliest decisive step via judge hypotheses, confidence-weighted multi-evaluator verification, and LoRA refinement.

AutoMem: Automated Learning of Memory as a Cognitive Skill

cs.AI · 2026-07-01 · unverdicted · novelty 6.0

AutoMem automates memory structure revision and proficiency training in LLMs, delivering 2x-4x performance gains on long-horizon games without altering task-action behavior.

MetaPS: Adaptive Programmatic Strategy Selection for Market Agents

cs.AI · 2026-06-21 · unverdicted · novelty 6.0

MetaPS trains models via simulation rollouts to select from programmatic strategy libraries for market agents, yielding better performance than fixed or direct LLM baselines across model sizes.

Recursive Self-Evolving Agents via Held-Out Selection

cs.AI · 2026-06-17 · unverdicted · novelty 6.0

RSEA adds a strict held-out keep-better gate to recursive self-evolution of agent artifacts, yielding monotone-safe gains or parity with the base ReAct agent on ALFWorld, GAIA, τ-bench, and WebShop.

OPD-Evolver: Cultivating Holistic Agent Evolver via On-Policy Distillation

cs.CL · 2026-06-16 · unverdicted · novelty 6.0

OPD-Evolver uses on-policy self-distillation in fast interaction and slow attribution loops to build agents with holistic memory competence, outperforming prior systems by up to 11.5% and allowing a 9B model to compete with much larger ones.

The Illusion of Multi-Agent Advantage

cs.AI · 2026-06-11 · unverdicted · novelty 6.0

Automatically generated multi-agent systems underperform CoT-SC on benchmarks and a new diagnostic dataset, exposing architectural bloat that fails to deliver functional utility.

BenchEvolver: Frontier Task Synthesis via Solution-Centric Evolution

cs.SE · 2026-05-31 · unverdicted · novelty 6.0

BenchEvolver evolves coding problem solutions to generate harder, valid tasks, producing LiveCodeBench-Plus where frontier models score 27.5-62.6% and enabling RL gains on held-out tests.

FALAT: Tracing Failures in LLM Agent Trajectories via Dependency-Guided Search

cs.AI · 2026-05-30 · unverdicted · novelty 6.0

FALAT improves failure attribution in LLM agent trajectories via dependency-guided search, achieving 46.0% step-level accuracy on algorithm-generated and 29.1% on hand-crafted trajectories in the Who&When benchmark.

Learning to Construct Practical Agentic Systems

cs.LG · 2026-05-29 · unverdicted · novelty 6.0

A modular agent framework with pseudo-tools and learned fixed workflows that are cheaper and more accurate than dynamic planning, plus multi-objective optimization for cost and quality.

Towards Direct Evaluation of Harness Optimizers via Priority Ranking

cs.AI · 2026-05-21 · unverdicted · novelty 6.0

Priority ranking offers a low-cost direct evaluation for harness optimizers that correlates with their real multi-step optimization performance, supported by the Shor dataset of 182 scenarios.

citing papers explorer

Showing 50 of 55 citing papers.

Learning to Hand Off: Provably Convergent Workflow Learning under Interface Constraints cs.AI · 2026-05-18 · unverdicted · none · ref 20 · internal anchor
Formalizes interface-constrained semi-Markov decision processes and proves a finite-sample bound for neural IC-Q that decomposes into neural approximation error, interface gap, and mixing-time residual, with experiments showing parity to centralized oracles.
FlowCompile: An Optimizing Compiler for Structured LLM Workflows cs.CL · 2026-05-13 · unverdicted · none · ref 24 · internal anchor
FlowCompile performs compile-time design space exploration on structured LLM workflows to produce reusable high-quality configuration sets that outperform routing baselines with up to 6.4x speedup.
SimWorld Studio: Automatic Environment Generation with Evolving Coding Agent for Embodied Agent Learning cs.AI · 2026-05-10 · accept · none · ref 35 · 2 links · internal anchor
SimWorld Studio deploys an evolving coding agent to create adaptive 3D environments that co-evolve with embodied learners, delivering 18-point success-rate gains over fixed environments in navigation benchmarks.
LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling cs.CL · 2026-05-08 · conditional · none · ref 43 · 2 links · internal anchor
AutoTTS discovers width-depth test-time scaling controllers through agentic search in a pre-collected trajectory environment, yielding better accuracy-cost tradeoffs than hand-designed baselines on math reasoning tasks at low cost.
AI Trading's Alpha Singularity: Emergent Market Reasoning through Agent-to-Agent Self-Evolution cs.AI · 2026-06-28 · reject · none · ref 12 · internal anchor
Multi-agent LLM system Agora under Sealed Joint Search conditions produces +1.87 holdout Sharpe on CSI 1000 over a 91-day sealed period, exceeding the best baseline at +1.334 under favorable seed.
Glite ARF: Verifier-Driven Research with Parallel LLM Coding Agents cs.MA · 2026-06-25 · accept · none · ref 45 · internal anchor
Glite ARF introduces a verifier-driven three-role framework for parallel LLM coding agents, demonstrated by first- and second-place finishes in the BEA 2026 vocabulary-difficulty shared task across three languages with 29.9-35.9% RMSE reduction at ~$450 API cost.
Self-Harness: Harnesses That Improve Themselves cs.CL · 2026-06-08 · unverdicted · none · ref 3 · internal anchor
Self-Harness lets LLM agents autonomously refine their interaction harnesses through weakness mining, proposal generation, and validation, raising held-out pass rates on Terminal-Bench-2.0 from 40.5% to 61.9%, 23.8% to 38.1%, and 42.9% to 57.1% across three models.
PACE: Anytime-Valid Acceptance Tests for Self-Evolving Agents cs.AI · 2026-06-06 · unverdicted · none · ref 6 · internal anchor
PACE is a training-free anytime-valid commit gate using testing-by-betting e-processes that controls per-candidate false-commit probability for self-evolving agents and reduces spurious edits compared to greedy acceptance.
MasFACT: Continual Multi-Agent Topology Learning via Geometry-Aware Posterior Transfer cs.LG · 2026-05-17 · conditional · none · ref 17 · internal anchor
Geometry-aware FGW prior transfer plus PAC-Bayes residual adaptation reduces topology forgetting and raises average accuracy across continual multi-agent topology learning streams.
Harnessing Agentic Evolution cs.AI · 2026-05-13 · unverdicted · none · ref 9 · internal anchor
AEvo introduces a meta-agent that edits the evolution procedure or agent context based on accumulated state, outperforming baselines by 26% relative improvement on agentic benchmarks and achieving SOTA on open-ended tasks.
TacoMAS: Test-Time Co-Evolution of Topology and Capability in LLM-based Multi-Agent Systems cs.CL · 2026-05-10 · unverdicted · none · ref 9 · internal anchor
TacoMAS performs test-time co-evolution of agent capabilities and communication topology in LLM multi-agent systems via fast capability updates and slow meta-LLM topology edits, delivering 13.3% average gains over strong baselines on four benchmarks.
Synthesizing Multi-Agent Harnesses for Vulnerability Discovery cs.CR · 2026-04-22 · unverdicted · none · ref 21 · internal anchor
AgentFlow uses a typed graph DSL covering roles, prompts, tools, topology and protocol plus a runtime-signal feedback loop to optimize multi-agent harnesses, reaching 84.3% on TerminalBench-2 and discovering ten new zero-days in Chrome including two critical sandbox escapes.
Steerable Instruction Following Coding Data Synthesis with Actor-Parametric Schema Co-Evolution cs.SE · 2026-02-27 · unverdicted · none · ref 11 · internal anchor
IFCodeEvolve synthesizes coding data via actor-schema co-evolution with MCTS, boosting a 32B model's performance to match proprietary SOTA on instruction following.
When Agents Fail: A Comprehensive Study of Bugs in LLM Agents with Automated Labeling cs.SE · 2026-01-21 · unverdicted · none · ref 28 · internal anchor
A large-scale empirical study categorizes bugs in LLM agents and demonstrates that a specialized LLM agent can annotate them accurately at very low cost.
Who Broke the System? Failure Localization in LLM-Based Multi-Agent Systems cs.CR · 2026-07-08 · conditional · none · ref 70 · internal anchor
AgentLocate localizes multi-agent LLM failures to a responsible agent and earliest decisive step via judge hypotheses, confidence-weighted multi-evaluator verification, and LoRA refinement.
AutoMem: Automated Learning of Memory as a Cognitive Skill cs.AI · 2026-07-01 · unverdicted · none · ref 3 · internal anchor
AutoMem automates memory structure revision and proficiency training in LLMs, delivering 2x-4x performance gains on long-horizon games without altering task-action behavior.
MetaPS: Adaptive Programmatic Strategy Selection for Market Agents cs.AI · 2026-06-21 · unverdicted · none · ref 69 · internal anchor
MetaPS trains models via simulation rollouts to select from programmatic strategy libraries for market agents, yielding better performance than fixed or direct LLM baselines across model sizes.
Recursive Self-Evolving Agents via Held-Out Selection cs.AI · 2026-06-17 · unverdicted · none · ref 7 · internal anchor
RSEA adds a strict held-out keep-better gate to recursive self-evolution of agent artifacts, yielding monotone-safe gains or parity with the base ReAct agent on ALFWorld, GAIA, τ-bench, and WebShop.
OPD-Evolver: Cultivating Holistic Agent Evolver via On-Policy Distillation cs.CL · 2026-06-16 · unverdicted · none · ref 130 · internal anchor
OPD-Evolver uses on-policy self-distillation in fast interaction and slow attribution loops to build agents with holistic memory competence, outperforming prior systems by up to 11.5% and allowing a 9B model to compete with much larger ones.
The Illusion of Multi-Agent Advantage cs.AI · 2026-06-11 · unverdicted · none · ref 13 · internal anchor
Automatically generated multi-agent systems underperform CoT-SC on benchmarks and a new diagnostic dataset, exposing architectural bloat that fails to deliver functional utility.
BenchEvolver: Frontier Task Synthesis via Solution-Centric Evolution cs.SE · 2026-05-31 · unverdicted · none · ref 29 · internal anchor
BenchEvolver evolves coding problem solutions to generate harder, valid tasks, producing LiveCodeBench-Plus where frontier models score 27.5-62.6% and enabling RL gains on held-out tests.
FALAT: Tracing Failures in LLM Agent Trajectories via Dependency-Guided Search cs.AI · 2026-05-30 · unverdicted · none · ref 42 · internal anchor
FALAT improves failure attribution in LLM agent trajectories via dependency-guided search, achieving 46.0% step-level accuracy on algorithm-generated and 29.1% on hand-crafted trajectories in the Who&When benchmark.
Learning to Construct Practical Agentic Systems cs.LG · 2026-05-29 · unverdicted · none · ref 15 · internal anchor
A modular agent framework with pseudo-tools and learned fixed workflows that are cheaper and more accurate than dynamic planning, plus multi-objective optimization for cost and quality.
Towards Direct Evaluation of Harness Optimizers via Priority Ranking cs.AI · 2026-05-21 · unverdicted · none · ref 13 · internal anchor
Priority ranking offers a low-cost direct evaluation for harness optimizers that correlates with their real multi-step optimization performance, supported by the Shor dataset of 182 scenarios.
AgentCo-op: Retrieval-Based Synthesis of Interoperable Multi-Agent Workflows cs.AI · 2026-05-19 · unverdicted · none · ref 10 · internal anchor
AgentCo-op retrieves and assembles existing agents and tools into interoperable workflows for open-world scientific tasks, showing effectiveness in genomics case studies and competitive benchmark results with lower costs.
optimize_anything: A Universal API for Optimizing any Text Parameter cs.CL · 2026-05-19 · unverdicted · none · ref 12 · internal anchor
A universal LLM optimizer for text artifacts achieves SOTA results on six tasks including tripling ARC-AGI accuracy and cutting cloud costs by 40% via cross-task transfer and side information.
Harnesses for Inference-Time Alignment over Execution Trajectories cs.LG · 2026-05-15 · unverdicted · none · ref 37 · internal anchor
Partial harnesses for LLM agents, specifying only initial execution steps, achieve higher pass rates than fully decomposed workflows, as analyzed through trajectory alignment and validated in synthetic and terminal benchmarks.
LEMON: Learning Executable Multi-Agent Orchestration via Counterfactual Reinforcement Learning cs.AI · 2026-05-14 · unverdicted · none · ref 17 · internal anchor
LEMON trains an LLM orchestrator with counterfactual-augmented GRPO to produce deployable multi-agent specifications that reach state-of-the-art results on six reasoning and coding benchmarks.
SkillEvolver: Skill Learning as a Meta-Skill cs.AI · 2026-05-11 · unverdicted · none · ref 3 · internal anchor
A meta-skill authors and refines prose-and-code skills for agents by learning from post-deployment failures with an overfit audit, achieving 56.8% accuracy on SkillsBench tasks versus 43.6% for human-curated skills.
Do Self-Evolving Agents Forget? Capability Degradation and Preservation in Lifelong LLM Agent Adaptation cs.AI · 2026-05-10 · unverdicted · none · ref 2 · internal anchor
Self-evolving LLM agents exhibit capability erosion under continual adaptation, which Capability-Preserving Evolution mitigates by raising retained simple-task performance from 41.8% to 52.8% in workflow evolution under GPT-5.1.
AgentPSO: Evolving Agent Reasoning Skill via Multi-agent Particle Swarm Optimization cs.AI · 2026-05-09 · unverdicted · none · ref 17 · 2 links · internal anchor
AgentPSO applies a particle-swarm-inspired update rule to evolve natural-language reasoning skills across multiple LLM agents, yielding gains over static and test-time multi-agent baselines with cross-benchmark transfer.
Learning to Communicate: Toward End-to-End Optimization of Multi-Agent Language Systems cs.AI · 2026-04-23 · unverdicted · none · ref 33 · internal anchor
DiffMAS jointly optimizes latent communication and reasoning in multi-agent LLM systems via parameter-efficient supervised training on trajectories, yielding consistent gains over baselines on math, science, and code benchmarks.
SkillGraph: Self-Evolving Multi-Agent Collaboration with Multimodal Graph Topology cs.AI · 2026-04-19 · unverdicted · none · ref 14 · internal anchor
SkillGraph jointly evolves agent skills and collaboration topologies in multi-agent vision-language systems using a multimodal graph transformer and a skill designer, yielding consistent performance gains on benchmarks.
AgentGA: Evolving Code Solutions in Agent-Seed Space cs.AI · 2026-04-16 · unverdicted · none · ref 12 · 2 links · internal anchor
AgentGA optimizes agent seeds with genetic algorithms and parent-archive inheritance to improve autonomous code generation, beating a baseline on 15 of 16 Kaggle competitions.
Prompt Optimization Is a Coin Flip: Diagnosing When It Helps in Compound AI Systems cs.AI · 2026-04-16 · conditional · none · ref 4 · internal anchor
End-to-end prompt optimization in compound AI systems is no better than chance unless the task has exploitable output structure the model can produce but does not default to.
Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems cs.SE · 2026-04-14 · conditional · none · ref 20 · internal anchor
Claude Code answers recurring agent design questions with a thin model loop wrapped in dense safety, context, extensibility, and persistence harnesses, and those same questions get different answers in OpenClaw and Hermes.
Self-Optimizing Multi-Agent Systems for Deep Research cs.IR · 2026-04-03 · unverdicted · none · ref 6 · internal anchor
Multi-agent deep research systems self-optimize prompts through self-play to match or outperform expert-crafted versions.
Autonomous Business System via Neuro-symbolic AI cs.AI · 2026-01-22 · conditional · none · ref 30 · internal anchor
AUTOBUS is a neuro-symbolic architecture that uses AI agents to generate executable logic programs from business instructions and knowledge graphs for end-to-end process automation with human supervision.
ShinkaEvolve: Towards Open-Ended And Sample-Efficient Program Evolution cs.CL · 2025-09-17 · unverdicted · none · ref 219 · internal anchor
ShinkaEvolve improves sample efficiency in LLM-driven program evolution via parent sampling, code novelty rejection-sampling, and bandit LLM ensemble selection, achieving new SOTA circle packing with 150 samples and gains on math reasoning and competitive programming tasks.
An AI system to help scientists write expert-level empirical software cs.AI · 2025-09-08 · unverdicted · none · ref 18 · 2 links · internal anchor
ERA combines LLMs and tree search to produce expert-level empirical software that outperforms top human methods on single-cell analysis leaderboards and CDC COVID-19 forecasts.
GenoMAS: A Multi-Agent Framework for Scientific Discovery via Code-Driven Gene Expression Analysis cs.AI · 2025-07-28 · unverdicted · none · ref 46 · internal anchor
GenoMAS deploys six specialized LLM agents with guided planning to preprocess transcriptomic data and identify genes, reaching 89.13% composite similarity and 60.48% F1 on the GenoTEX benchmark while outperforming prior methods.
Textual Bayes: Quantifying Prompt Uncertainty in LLM-Based Systems cs.LG · 2025-06-11 · unverdicted · none · ref 22 · internal anchor
Introduces a Bayesian framework viewing LLM prompts as textual parameters and proposes MHLP, a novel MCMC algorithm using LLM proposals, to perform inference and improve accuracy plus uncertainty quantification on benchmarks.
SKILL-DISCO: Distilling and Compiling Agent Traces into Reusable Procedural Skills cs.AI · 2026-06-25 · unverdicted · none · ref 47 · internal anchor
SkillDisCo distills reusable PFSM subgraphs from successful agent traces and compiles them into callable procedural skills, improving success rates and reducing turns on ALFWorld and WebArena.
HiLSVA: Design and Evaluation of a Human-in-the-Loop Agentic System for Scientific Visualization cs.HC · 2026-06-25 · conditional · none · ref 28 · internal anchor
HiLSVA shows that a human-in-the-loop LLM agent system can help novices and experts complete scientific visualization tasks, while human oversight adds measurable execution time.
TailorMind: Towards Preference-Aligned Multimodal Content Generation cs.AI · 2026-06-22 · unverdicted · none · ref 1 · internal anchor
TailorMind links hypergraph collaborative filtering and textual gradient descent with multimodal generation to produce user-tailored content, showing gains in novelty, aesthetics, and reranking recall on a new benchmark from three platforms.
Sakana Fugu Technical Report cs.LG · 2026-06-19 · unverdicted · none · ref 44 · internal anchor
Sakana Fugu trains LLM orchestrators using fine-tuning, evolutionary algorithms, and RL to build query-adaptive multi-agent scaffolds, claiming SOTA results on benchmarks including SWE-Bench Pro and GPQA-Diamond.
Do More Agents Help? Controlled and Protocol-Aligned Evaluation of LLM Agent Workflows cs.AI · 2026-06-04 · conditional · none · ref 74 · internal anchor
Under controlled identical protocols, only one of six multi-agent LLM systems marginally exceeds a single-agent baseline on benchmark-balanced accuracy while the rest trail and cost more; a runtime workflow reaches 66.72% on GAIA.
Insights Generator: Systematic Corpus-Level Trace Diagnostics for LLM Agents cs.AI · 2026-05-20 · unverdicted · none · ref 8 · 3 links · internal anchor
Insights Generator is a multi-agent system that produces evidence-backed insights from corpora of LLM agent traces and yields 30.4pp performance gains when humans apply the reports.
Web2BigTable: A Bi-Level Multi-Agent LLM System for Internet-Scale Information Search and Extraction cs.AI · 2026-04-29 · unverdicted · none · ref 8 · internal anchor
Web2BigTable introduces a bi-level multi-agent system that achieves new state-of-the-art results on wide-coverage and deep web-to-table search benchmarks through orchestration, coordination, and closed-loop reflection.
Role-Agent: Bootstrapping LLM Agents via Dual-Role Evolution cs.AI · 2026-06-09 · conditional · none · ref 2 · internal anchor
A single LLM trained as both agent and environment—predicting future states and re-weighting training data by failure mode—improves multi-step agent performance by about 4% over GiGPO baselines.

Automated Design of Agentic Systems

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer