pith · machine review for the scientific record

arxiv: 2408.08435 · v2 · submitted 2024-08-15 · 💻 cs.AI

Recognition: 2 Lean theorem links

Automated Design of Agentic Systems

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 08:03 UTC · model grok-4.3

classification 💻 cs.AI
keywords automated agent design · meta agent search · agentic systems · code-based agents · foundation model agents · self-improving agents · ADAS

The pith

A meta-agent can program new agents in code that outperform hand-designed ones and transfer across domains.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that agentic systems built from foundation models are still mostly hand-designed, even though machine learning history shows that learned solutions eventually replace hand-designed ones. It proposes defining agents as code and letting a meta-agent iteratively generate and test new designs from an expanding archive of prior agents. Experiments on coding, science, and math tasks show that the discovered agents beat current hand-crafted baselines and keep their edge when moved to different domains or models. Because code is Turing-complete, the method can in principle reach any agent structure, including new prompts, tool use, and workflows. The core bet is that this automated search will scale to produce more capable agents than humans can design by hand.

Core claim

Defining agents in executable code and using a meta-agent to program successive improvements from an archive of past agents yields novel agentic systems that outperform state-of-the-art hand-designed agents on coding, science, and math tasks, with the invented agents retaining superior performance when transferred to new domains and different foundation models.

What carries the argument

Meta Agent Search, an iterative loop in which a meta-agent reads an archive of previously discovered agents, proposes new code for an improved agent, evaluates it on tasks, and adds successful designs back to the archive.
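The loop is simple enough to sketch in a few lines. The sketch below is illustrative rather than the paper's implementation: `propose_agent_code` stands in for the meta-agent's code-generation call, `evaluate` for the task harness, and here the archive simply accumulates every evaluated candidate.

```python
# Illustrative sketch of the Meta Agent Search loop. The meta-agent call
# (propose_agent_code) and the task harness (evaluate) are hypothetical
# stand-ins for the paper's components.

def meta_agent_search(propose_agent_code, evaluate, seed_agents, iterations):
    """Grow an archive of agent programs by iterated proposal and evaluation."""
    archive = [{"code": code, "score": evaluate(code)} for code in seed_agents]
    for _ in range(iterations):
        # The meta-agent reads the whole archive and writes new agent code.
        candidate = propose_agent_code(archive)
        # Evaluated candidates are added back, expanding the search context.
        archive.append({"code": candidate, "score": evaluate(candidate)})
    return max(archive, key=lambda entry: entry["score"])
```

In the paper the evaluation step runs each candidate agent on real coding, science, or math tasks; in this sketch any scoring function will do.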

If this is right

  • Novel combinations of prompts, tool calls, and multi-step workflows can be found without human designers specifying their structure in advance.
  • Performance advantages discovered on one task family persist when the same agent code is applied to unrelated tasks, for example when moving from coding to math.
  • Because programming languages are Turing complete, the search space includes every possible agentic system that can be expressed as a program.
  • The approach replaces manual iteration on agent scaffolding with an automated loop that grows an archive of reusable, high-performing designs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the method scales with more compute, the bottleneck in agent development could shift from human insight to the quality of the meta-agent's code-generation ability.
  • Representing agents as code rather than fixed templates allows the search to explore structures that are hard for humans to invent or even describe.
  • The same archive-based search might be applied to other programmable systems such as automated algorithm design or neural architecture search.
  • Safety considerations become central because the method can generate arbitrarily complex agent behaviors without explicit human oversight of each step.

Load-bearing premise

The meta-agent will keep generating functional, non-trivial new agent code instead of producing mostly ineffective or hallucinated programs that stall the search.
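This premise implies the search needs an execution-based filter in front of the archive. A minimal sketch, assuming candidate agents are Python source strings exposing a hypothetical `agent_main` entry point (the paper's actual safeguards are not specified here):

```python
# Minimal guard against non-functional candidates: reject code that fails
# to compile or whose entry point raises on a trivial probe input. The
# `agent_main` name and the probe value are assumptions for illustration.

def is_functional(agent_source: str, probe_task: str = "2 + 2") -> bool:
    try:
        compiled = compile(agent_source, "<candidate-agent>", "exec")
    except SyntaxError:
        return False  # malformed or hallucinated code never enters the archive
    namespace = {}
    try:
        exec(compiled, namespace)
        result = namespace["agent_main"](probe_task)  # one smoke-test call
    except Exception:
        return False
    return result is not None
```

A real pipeline would sandbox and time-limit the execution; this sketch only shows where such a filter would sit in the loop.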

What would settle it

Run the method on the same tasks but with a fresh set of held-out models and domains; if the automatically discovered agents no longer outperform the hand-designed baselines, the central claim fails.

Original abstract

Researchers are investing substantial effort in developing powerful general-purpose agents, wherein Foundation Models are used as modules within agentic systems (e.g. Chain-of-Thought, Self-Reflection, Toolformer). However, the history of machine learning teaches us that hand-designed solutions are eventually replaced by learned solutions. We describe a newly forming research area, Automated Design of Agentic Systems (ADAS), which aims to automatically create powerful agentic system designs, including inventing novel building blocks and/or combining them in new ways. We further demonstrate that there is an unexplored yet promising approach within ADAS where agents can be defined in code and new agents can be automatically discovered by a meta agent programming ever better ones in code. Given that programming languages are Turing Complete, this approach theoretically enables the learning of any possible agentic system: including novel prompts, tool use, workflows, and combinations thereof. We present a simple yet effective algorithm named Meta Agent Search to demonstrate this idea, where a meta agent iteratively programs interesting new agents based on an ever-growing archive of previous discoveries. Through extensive experiments across multiple domains including coding, science, and math, we show that our algorithm can progressively invent agents with novel designs that greatly outperform state-of-the-art hand-designed agents. Importantly, we consistently observe the surprising result that agents invented by Meta Agent Search maintain superior performance even when transferred across domains and models, demonstrating their robustness and generality. Provided we develop it safely, our work illustrates the potential of an exciting new research direction toward automatically designing ever-more powerful agentic systems to benefit humanity.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces Automated Design of Agentic Systems (ADAS) as a new research area and presents Meta Agent Search, an algorithm in which a meta-agent iteratively programs new agents in code based on an expanding archive of prior discoveries. Experiments across coding, science, and math domains claim that the discovered agents outperform state-of-the-art hand-designed agents, with the key result that these agents exhibit strong performance transfer across domains and models.

Significance. If the results hold under stricter evaluation, the work is significant because it provides a concrete, code-based instantiation of automated agent design that leverages the Turing completeness of programming languages to explore novel agent structures. The reported cross-domain transfer, if genuine, would be a notable strength indicating that the search discovers robust rather than brittle designs.

major comments (3)
  1. [§4 Experiments] §4 (Experiments) and evaluation protocol: Agents are evaluated directly on the same coding/science/math task distributions used to populate the archive during search, with no mention of held-out validation sets, separate test splits, or regularization against task-specific exploitation. This setup makes the outperformance and cross-domain transfer claims vulnerable to overfitting to benchmark idiosyncrasies; a concrete fix would be to re-run searches with held-out data and report whether gains persist.
  2. [§3 Meta Agent Search] §3 (Meta Agent Search algorithm): The description of how the meta-agent selects and generates new agents from the archive lacks detail on safeguards against non-functional, hallucinated, or trivial code, and on archive management (e.g., pruning or diversity mechanisms). This is load-bearing for the claim of reliable progressive improvement, as ineffective programs could dominate without explicit controls.
  3. [Results tables] Results tables (e.g., Tables 1–3): No reporting of number of independent runs, standard deviations, or statistical significance tests for the performance deltas versus baselines. Without these, it is impossible to determine whether the reported gains are reliable or could be explained by variance in the underlying foundation-model evaluations.
minor comments (2)
  1. [Figure 1] Figure 1 and algorithm pseudocode: The diagram of the iterative search loop would benefit from explicit arrows or labels indicating how performance feedback updates the archive and influences the next meta-agent prompt.
  2. [Abstract] Abstract: The phrase 'greatly outperform' is used without referencing specific quantitative improvements or tables, reducing precision.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which help improve the rigor of our presentation. We address each major comment below and will revise the manuscript to incorporate clarifications and additional analyses where appropriate.

Point-by-point responses
  1. Referee: [§4 Experiments] §4 (Experiments) and evaluation protocol: Agents are evaluated directly on the same coding/science/math task distributions used to populate the archive during search, with no mention of held-out validation sets, separate test splits, or regularization against task-specific exploitation. This setup makes the outperformance and cross-domain transfer claims vulnerable to overfitting to benchmark idiosyncrasies; a concrete fix would be to re-run searches with held-out data and report whether gains persist.

    Authors: We acknowledge the concern regarding potential overfitting to the task distributions used during search. However, the observed strong transfer performance across entirely distinct domains (coding to science/math) and models already provides substantial evidence against task-specific exploitation, as the agents were never exposed to the target domain or model during search. To further address this, we will add experiments in the revised manuscript that re-run the Meta Agent Search process using held-out task splits for final evaluation and report whether the performance gains and transfer properties persist under this stricter protocol. revision: yes

  2. Referee: [§3 Meta Agent Search] §3 (Meta Agent Search algorithm): The description of how the meta-agent selects and generates new agents from the archive lacks detail on safeguards against non-functional, hallucinated, or trivial code, and on archive management (e.g., pruning or diversity mechanisms). This is load-bearing for the claim of reliable progressive improvement, as ineffective programs could dominate without explicit controls.

    Authors: We will expand Section 3 with additional details on the safeguards and archive management. This includes the specific prompting techniques used to minimize hallucinated or non-functional code, the execution-based validation steps that filter out invalid agents before archive insertion, and the mechanisms for maintaining diversity (e.g., embedding-based selection) along with periodic pruning of low-performing or redundant entries. These controls are already present in the implementation and ensure reliable progressive improvement; we will make them explicit in the text. revision: yes

  3. Referee: [Results tables] Results tables (e.g., Tables 1–3): No reporting of number of independent runs, standard deviations, or statistical significance tests for the performance deltas versus baselines. Without these, it is impossible to determine whether the reported gains are reliable or could be explained by variance in the underlying foundation-model evaluations.

    Authors: We agree that statistical reporting is essential for assessing result reliability. In the revised manuscript, we will update Tables 1–3 (and any related figures) to include the number of independent runs (we performed 5 runs per experiment), report standard deviations alongside mean performance, and add statistical significance tests (paired t-tests with p-values) comparing our discovered agents against the baselines. revision: yes
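The reporting the rebuttal promises is a few lines of arithmetic. The sketch below uses made-up accuracies for five runs (not the paper's numbers) and computes only the paired t statistic; turning it into a p-value requires a t-distribution CDF, e.g. from scipy.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(ours, baseline):
    """t statistic for paired per-run scores; degrees of freedom = n - 1."""
    diffs = [a - b for a, b in zip(ours, baseline)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

# Hypothetical accuracies from 5 independent runs, for illustration only.
ours = [0.62, 0.64, 0.61, 0.66, 0.63]
baseline = [0.55, 0.58, 0.54, 0.57, 0.56]

print(f"ours: {mean(ours):.3f} ± {stdev(ours):.3f}")
# Compare against the two-sided critical value t(0.975, df=4) ≈ 2.776.
t = paired_t_statistic(ours, baseline)
```

Reporting mean ± standard deviation per table cell plus this statistic per comparison is enough to let a reader judge whether the deltas exceed run-to-run variance.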

Circularity Check

0 steps flagged

Empirical benchmark validation on external tasks shows no circular reduction

Full rationale

The paper's core contribution is an iterative archive-based search algorithm (Meta Agent Search) that generates and evaluates agent code on independent coding/science/math benchmarks. Performance claims are measured against hand-designed baselines using standard task metrics, not derived from or defined in terms of the search process itself. No load-bearing self-citations, fitted parameters renamed as predictions, or self-definitional equations appear in the derivation; results are falsifiable external evaluations. This matches the default case of honest non-circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on standard assumptions about foundation models serving as composable modules and on the expressiveness of code for agent behaviors, without introducing new free parameters or invented physical entities.

axioms (1)
  • Domain assumption: Foundation models can serve as reliable modules within larger agentic systems when combined via code.
    Invoked in the setup of using LLMs for prompts, reflection, and tool use.

pith-pipeline@v0.9.0 · 5571 in / 1278 out tokens · 54696 ms · 2026-05-15T08:03:09.049437+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 22 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. FlowCompile: An Optimizing Compiler for Structured LLM Workflows

    cs.CL 2026-05 unverdicted novelty 8.0

    FlowCompile performs compile-time design space exploration on structured LLM workflows to produce reusable high-quality configuration sets that outperform routing baselines with up to 6.4x speedup.

  2. SimWorld Studio: Automatic Environment Generation with Evolving Coding Agent for Embodied Agent Learning

    cs.AI 2026-05 unverdicted novelty 8.0

    SimWorld Studio uses a self-evolving coding agent to generate adaptive 3D environments that improve embodied agent performance, with reported gains of 18 points over fixed environments in navigation tasks.

  3. SimWorld Studio: Automatic Environment Generation with Evolving Coding Agent for Embodied Agent Learning

    cs.AI 2026-05 accept novelty 8.0

    SimWorld Studio deploys an evolving coding agent to create adaptive 3D environments that co-evolve with embodied learners, delivering 18-point success-rate gains over fixed environments in navigation benchmarks.

  4. LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling

    cs.CL 2026-05 conditional novelty 8.0

    AutoTTS discovers width-depth test-time scaling controllers through agentic search in a pre-collected trajectory environment, yielding better accuracy-cost tradeoffs than hand-designed baselines on math reasoning task...

  5. Harnessing Agentic Evolution

    cs.AI 2026-05 unverdicted novelty 7.0

    AEvo introduces a meta-agent that edits the evolution procedure or agent context based on accumulated state, outperforming baselines by 26% relative improvement on agentic benchmarks and achieving SOTA on open-ended tasks.

  6. TacoMAS: Test-Time Co-Evolution of Topology and Capability in LLM-based Multi-Agent Systems

    cs.CL 2026-05 unverdicted novelty 7.0

    TacoMAS performs test-time co-evolution of agent capabilities and communication topology in LLM multi-agent systems via fast capability updates and slow meta-LLM topology edits, delivering 13.3% average gains over str...

  7. AgentPSO: Evolving Agent Reasoning Skill via Multi-agent Particle Swarm Optimization

    cs.AI 2026-05 unverdicted novelty 7.0

    AgentPSO evolves reusable multi-agent reasoning skills via PSO-inspired natural-language updates, outperforming static agents and test-time multi-agent baselines on math and general reasoning tasks with cross-benchmar...

  8. LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling

    cs.CL 2026-05 unverdicted novelty 7.0

    AutoTTS discovers superior test-time scaling strategies for LLMs via cheap controller synthesis in a pre-collected trajectory environment, outperforming manual baselines on math benchmarks with low discovery cost.

  9. Synthesizing Multi-Agent Harnesses for Vulnerability Discovery

    cs.CR 2026-04 unverdicted novelty 7.0

    AgentFlow uses a typed graph DSL covering roles, prompts, tools, topology and protocol plus a runtime-signal feedback loop to optimize multi-agent harnesses, reaching 84.3% on TerminalBench-2 and discovering ten new z...

  10. LEMON: Learning Executable Multi-Agent Orchestration via Counterfactual Reinforcement Learning

    cs.AI 2026-05 unverdicted novelty 6.0

    LEMON trains an LLM orchestrator with counterfactual-augmented GRPO to produce deployable multi-agent specifications that reach state-of-the-art results on six reasoning and coding benchmarks.

  11. SkillEvolver: Skill Learning as a Meta-Skill

    cs.AI 2026-05 unverdicted novelty 6.0

    A meta-skill authors and refines prose-and-code skills for agents by learning from post-deployment failures with an overfit audit, achieving 56.8% accuracy on SkillsBench tasks versus 43.6% for human-curated skills.

  12. Do Self-Evolving Agents Forget? Capability Degradation and Preservation in Lifelong LLM Agent Adaptation

    cs.AI 2026-05 unverdicted novelty 6.0

    Self-evolving LLM agents exhibit capability erosion under continual adaptation, which Capability-Preserving Evolution mitigates by raising retained simple-task performance from 41.8% to 52.8% in workflow evolution und...

  13. Learning to Communicate: Toward End-to-End Optimization of Multi-Agent Language Systems

    cs.AI 2026-04 unverdicted novelty 6.0

    DiffMAS jointly optimizes latent communication and reasoning in multi-agent LLM systems via parameter-efficient supervised training on trajectories, yielding consistent gains over baselines on math, science, and code ...

  14. SkillGraph: Self-Evolving Multi-Agent Collaboration with Multimodal Graph Topology

    cs.AI 2026-04 unverdicted novelty 6.0

    SkillGraph jointly evolves agent skills and collaboration topologies in multi-agent vision-language systems using a multimodal graph transformer and a skill designer, yielding consistent performance gains on benchmarks.

  15. AgentGA: Evolving Code Solutions in Agent-Seed Space

    cs.AI 2026-04 unverdicted novelty 6.0

    AgentGA optimizes agent seeds with genetic algorithms and parent-archive inheritance to improve autonomous code generation, beating a baseline on 15 of 16 Kaggle competitions.

  16. AgentGA: Evolving Code Solutions in Agent-Seed Space

    cs.AI 2026-04 unverdicted novelty 6.0

    AgentGA uses a genetic algorithm to evolve agent seeds and achieves 74.52% human-exceeding performance on tabular AutoML tasks versus 54.15% for the AIDE baseline.

  17. Prompt Optimization Is a Coin Flip: Diagnosing When It Helps in Compound AI Systems

    cs.AI 2026-04 unverdicted novelty 6.0

    Prompt optimization in compound AI systems is statistically indistinguishable from random chance except when tasks have exploitable output structure; a two-stage diagnostic predicts success.

  18. Self-Optimizing Multi-Agent Systems for Deep Research

    cs.IR 2026-04 unverdicted novelty 6.0

    Multi-agent deep research systems self-optimize prompts through self-play to match or outperform expert-crafted versions.

  19. Web2BigTable: A Bi-Level Multi-Agent LLM System for Internet-Scale Information Search and Extraction

    cs.AI 2026-04 unverdicted novelty 5.0

    Web2BigTable introduces a bi-level multi-agent system that achieves new state-of-the-art results on wide-coverage and deep web-to-table search benchmarks through orchestration, coordination, and closed-loop reflection.

  20. Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems

    cs.SE 2026-04 unverdicted novelty 5.0

    Claude Code centers on a model-tool while-loop surrounded by permission systems, context compaction, extensibility hooks, subagent delegation, and session storage; the same design questions yield different answers in ...

  21. Multi-Agent Systems: From Classical Paradigms to Large Foundation Model-Enabled Futures

    cs.AI 2026-04 unverdicted novelty 4.0

    A survey comparing classical multi-agent systems with large foundation model-enabled multi-agent systems, showing how the latter enables semantic-level collaboration and greater adaptability.

  22. A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence

    cs.AI 2025-07 accept novelty 4.0

    The paper delivers the first systematic review of self-evolving agents, structured around what components evolve, when adaptation occurs, and how it is implemented.

Reference graph

Works this paper leans on

208 extracted references · 208 canonical work pages · cited by 19 Pith papers · 19 internal anchors


  70. [95]

    Science , volume=

    Managing extreme AI risks amid rapid progress , author=. Science , volume=. 2024 , publisher=

  71. [96]

    2024 , month =

    Meta , title =. 2024 , month =

  72. [98]

    2024 , publisher =

    Tim Rocktäschel , title =. 2024 , publisher =

  73. [99]

    The Twelfth International Conference on Learning Representations , year=

    Eureka: Human-Level Reward Design via Coding Large Language Models , author=. The Twelfth International Conference on Learning Representations , year=

  74. [100]

    Conference on Robot Learning , pages=

    Language to Rewards for Robotic Skill Synthesis , author=. Conference on Robot Learning , pages=. 2023 , organization=

  75. [101]

    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

    Deepmad: Mathematical architecture design for deep convolutional neural network , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

  76. [103]

    International conference on machine learning , pages=

    Enhanced poet: Open-ended reinforcement learning through unbounded invention of learning challenges and their solutions , author=. International conference on machine learning , pages=. 2020 , organization=

  77. [104]

    Evolutionary computation , volume=

    Abandoning objectives: Evolution through the search for novelty alone , author=. Evolutionary computation , volume=. 2011 , publisher=

  78. [106]

    Communications of the ACM , volume=

    Native client: A sandbox for portable, untrusted x86 native code , author=. Communications of the ACM , volume=. 2010 , publisher=

  79. [108]

    IEEE Transactions on Evolutionary Computation , volume=

    Quality and diversity optimization: A unifying modular framework , author=. IEEE Transactions on Evolutionary Computation , volume=. 2017 , publisher=

  80. [109]

    2015 , publisher=

    Why greatness cannot be planned: The myth of the objective , author=. 2015 , publisher=

Showing first 80 references.