pith · machine review for the scientific record

arxiv: 2408.08435 · v2 · submitted 2024-08-15 · 💻 cs.AI

Recognition: 2 Lean theorem links

Automated Design of Agentic Systems

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 08:03 UTC · model grok-4.3

classification 💻 cs.AI
keywords automated agent design · meta agent search · agentic systems · code-based agents · foundation model agents · self-improving agents · ADAS

The pith

A meta-agent can program new agents in code that outperform hand-designed ones and transfer across domains.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that agentic systems built from foundation models are still mostly hand-designed, even though machine learning history shows that learned solutions eventually replace hand-designed ones. It proposes defining agents as code and letting a meta-agent iteratively generate and test new designs from an expanding archive of prior agents. Experiments on coding, science, and math tasks show that the discovered agents beat current hand-crafted baselines and keep their edge when moved to different domains or models. Because code is Turing-complete, the method can in principle reach any agent structure, including new prompts, tool use, and workflows. The core bet is that this automated search will scale to produce more capable agents than humans can design by hand.

Core claim

Defining agents in executable code and using a meta-agent to program successive improvements from an archive of past agents yields novel agentic systems that outperform state-of-the-art hand-designed agents on coding, science, and math tasks, with the invented agents retaining superior performance when transferred to new domains and different foundation models.

What carries the argument

Meta Agent Search, an iterative loop in which a meta-agent reads an archive of previously discovered agents, proposes new code for an improved agent, evaluates it on tasks, and adds successful designs back to the archive.
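The loop is simple enough to sketch in a few lines. The sketch below is illustrative rather than the paper's implementation: `propose_agent_code` stands in for the meta-agent's code-generation call, `evaluate` for the task harness, and here the archive simply accumulates every evaluated candidate.

```python
# Illustrative sketch of the Meta Agent Search loop. The meta-agent call
# (propose_agent_code) and the task harness (evaluate) are hypothetical
# stand-ins for the paper's components.

def meta_agent_search(propose_agent_code, evaluate, seed_agents, iterations):
    """Grow an archive of agent programs by iterated proposal and evaluation."""
    archive = [{"code": code, "score": evaluate(code)} for code in seed_agents]
    for _ in range(iterations):
        # The meta-agent reads the whole archive and writes new agent code.
        candidate = propose_agent_code(archive)
        # Evaluated candidates are added back, expanding the search context.
        archive.append({"code": candidate, "score": evaluate(candidate)})
    return max(archive, key=lambda entry: entry["score"])
```

In the paper the evaluation step runs each candidate agent on real coding, science, or math tasks; in this sketch any scoring function will do.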

If this is right

  • Novel combinations of prompts, tool calls, and multi-step workflows can be found without human designers specifying their structure in advance.
  • Performance advantages discovered on one task family persist when the same agent code is applied to unrelated tasks, for example when moving from coding to math.
  • Because programming languages are Turing complete, the search space includes every possible agentic system that can be expressed as a program.
  • The approach replaces manual iteration on agent scaffolding with an automated loop that grows an archive of reusable, high-performing designs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the method scales with more compute, the bottleneck in agent development could shift from human insight to the quality of the meta-agent's code-generation ability.
  • Representing agents as code rather than fixed templates allows the search to explore structures that are hard for humans to invent or even describe.
  • The same archive-based search might be applied to other programmable systems such as automated algorithm design or neural architecture search.
  • Safety considerations become central because the method can generate arbitrarily complex agent behaviors without explicit human oversight of each step.

Load-bearing premise

The meta-agent will keep generating functional, non-trivial new agent code instead of producing mostly ineffective or hallucinated programs that stall the search.
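This premise implies the search needs an execution-based filter in front of the archive. A minimal sketch, assuming candidate agents are Python source strings exposing a hypothetical `agent_main` entry point (the paper's actual safeguards are not specified here):

```python
# Minimal guard against non-functional candidates: reject code that fails
# to compile or whose entry point raises on a trivial probe input. The
# `agent_main` name and the probe value are assumptions for illustration.

def is_functional(agent_source: str, probe_task: str = "2 + 2") -> bool:
    try:
        compiled = compile(agent_source, "<candidate-agent>", "exec")
    except SyntaxError:
        return False  # malformed or hallucinated code never enters the archive
    namespace = {}
    try:
        exec(compiled, namespace)
        result = namespace["agent_main"](probe_task)  # one smoke-test call
    except Exception:
        return False
    return result is not None
```

A real pipeline would sandbox and time-limit the execution; this sketch only shows where such a filter would sit in the loop.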

What would settle it

Run the method on the same tasks but with a fresh set of held-out models and domains; if the automatically discovered agents no longer outperform the hand-designed baselines, the central claim fails.

Original abstract

Researchers are investing substantial effort in developing powerful general-purpose agents, wherein Foundation Models are used as modules within agentic systems (e.g. Chain-of-Thought, Self-Reflection, Toolformer). However, the history of machine learning teaches us that hand-designed solutions are eventually replaced by learned solutions. We describe a newly forming research area, Automated Design of Agentic Systems (ADAS), which aims to automatically create powerful agentic system designs, including inventing novel building blocks and/or combining them in new ways. We further demonstrate that there is an unexplored yet promising approach within ADAS where agents can be defined in code and new agents can be automatically discovered by a meta agent programming ever better ones in code. Given that programming languages are Turing Complete, this approach theoretically enables the learning of any possible agentic system: including novel prompts, tool use, workflows, and combinations thereof. We present a simple yet effective algorithm named Meta Agent Search to demonstrate this idea, where a meta agent iteratively programs interesting new agents based on an ever-growing archive of previous discoveries. Through extensive experiments across multiple domains including coding, science, and math, we show that our algorithm can progressively invent agents with novel designs that greatly outperform state-of-the-art hand-designed agents. Importantly, we consistently observe the surprising result that agents invented by Meta Agent Search maintain superior performance even when transferred across domains and models, demonstrating their robustness and generality. Provided we develop it safely, our work illustrates the potential of an exciting new research direction toward automatically designing ever-more powerful agentic systems to benefit humanity.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces Automated Design of Agentic Systems (ADAS) as a new research area and presents Meta Agent Search, an algorithm in which a meta-agent iteratively programs new agents in code based on an expanding archive of prior discoveries. Experiments across coding, science, and math domains claim that the discovered agents outperform state-of-the-art hand-designed agents, with the key result that these agents exhibit strong performance transfer across domains and models.

Significance. If the results hold under stricter evaluation, the work is significant because it provides a concrete, code-based instantiation of automated agent design that leverages the Turing completeness of programming languages to explore novel agent structures. The reported cross-domain transfer, if genuine, would be a notable strength indicating that the search discovers robust rather than brittle designs.

major comments (3)
  1. [§4 Experiments] §4 (Experiments) and evaluation protocol: Agents are evaluated directly on the same coding/science/math task distributions used to populate the archive during search, with no mention of held-out validation sets, separate test splits, or regularization against task-specific exploitation. This setup makes the outperformance and cross-domain transfer claims vulnerable to overfitting to benchmark idiosyncrasies; a concrete fix would be to re-run searches with held-out data and report whether gains persist.
  2. [§3 Meta Agent Search] §3 (Meta Agent Search algorithm): The description of how the meta-agent selects and generates new agents from the archive lacks detail on safeguards against non-functional, hallucinated, or trivial code, and on archive management (e.g., pruning or diversity mechanisms). This is load-bearing for the claim of reliable progressive improvement, as ineffective programs could dominate without explicit controls.
  3. [Results tables] Results tables (e.g., Tables 1–3): No reporting of number of independent runs, standard deviations, or statistical significance tests for the performance deltas versus baselines. Without these, it is impossible to determine whether the reported gains are reliable or could be explained by variance in the underlying foundation-model evaluations.
minor comments (2)
  1. [Figure 1] Figure 1 and algorithm pseudocode: The diagram of the iterative search loop would benefit from explicit arrows or labels indicating how performance feedback updates the archive and influences the next meta-agent prompt.
  2. [Abstract] Abstract: The phrase 'greatly outperform' is used without referencing specific quantitative improvements or tables, reducing precision.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which help improve the rigor of our presentation. We address each major comment below and will revise the manuscript to incorporate clarifications and additional analyses where appropriate.

Point-by-point responses
  1. Referee: [§4 Experiments] §4 (Experiments) and evaluation protocol: Agents are evaluated directly on the same coding/science/math task distributions used to populate the archive during search, with no mention of held-out validation sets, separate test splits, or regularization against task-specific exploitation. This setup makes the outperformance and cross-domain transfer claims vulnerable to overfitting to benchmark idiosyncrasies; a concrete fix would be to re-run searches with held-out data and report whether gains persist.

    Authors: We acknowledge the concern regarding potential overfitting to the task distributions used during search. However, the observed strong transfer performance across entirely distinct domains (coding to science/math) and models already provides substantial evidence against task-specific exploitation, as the agents were never exposed to the target domain or model during search. To further address this, we will add experiments in the revised manuscript that re-run the Meta Agent Search process using held-out task splits for final evaluation and report whether the performance gains and transfer properties persist under this stricter protocol. revision: yes

  2. Referee: [§3 Meta Agent Search] §3 (Meta Agent Search algorithm): The description of how the meta-agent selects and generates new agents from the archive lacks detail on safeguards against non-functional, hallucinated, or trivial code, and on archive management (e.g., pruning or diversity mechanisms). This is load-bearing for the claim of reliable progressive improvement, as ineffective programs could dominate without explicit controls.

    Authors: We will expand Section 3 with additional details on the safeguards and archive management. This includes the specific prompting techniques used to minimize hallucinated or non-functional code, the execution-based validation steps that filter out invalid agents before archive insertion, and the mechanisms for maintaining diversity (e.g., embedding-based selection) along with periodic pruning of low-performing or redundant entries. These controls are already present in the implementation and ensure reliable progressive improvement; we will make them explicit in the text. revision: yes

  3. Referee: [Results tables] Results tables (e.g., Tables 1–3): No reporting of number of independent runs, standard deviations, or statistical significance tests for the performance deltas versus baselines. Without these, it is impossible to determine whether the reported gains are reliable or could be explained by variance in the underlying foundation-model evaluations.

    Authors: We agree that statistical reporting is essential for assessing result reliability. In the revised manuscript, we will update Tables 1–3 (and any related figures) to include the number of independent runs (we performed 5 runs per experiment), report standard deviations alongside mean performance, and add statistical significance tests (paired t-tests with p-values) comparing our discovered agents against the baselines. revision: yes
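The reporting the rebuttal promises is a few lines of arithmetic. The sketch below uses made-up accuracies for five runs (not the paper's numbers) and computes only the paired t statistic; turning it into a p-value requires a t-distribution CDF, e.g. from scipy.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(ours, baseline):
    """t statistic for paired per-run scores; degrees of freedom = n - 1."""
    diffs = [a - b for a, b in zip(ours, baseline)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

# Hypothetical accuracies from 5 independent runs, for illustration only.
ours = [0.62, 0.64, 0.61, 0.66, 0.63]
baseline = [0.55, 0.58, 0.54, 0.57, 0.56]

print(f"ours: {mean(ours):.3f} ± {stdev(ours):.3f}")
# Compare against the two-sided critical value t(0.975, df=4) ≈ 2.776.
t = paired_t_statistic(ours, baseline)
```

Reporting mean ± standard deviation per table cell plus this statistic per comparison is enough to let a reader judge whether the deltas exceed run-to-run variance.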

Circularity Check

0 steps flagged

Empirical benchmark validation on external tasks shows no circular reduction

Full rationale

The paper's core contribution is an iterative archive-based search algorithm (Meta Agent Search) that generates and evaluates agent code on independent coding/science/math benchmarks. Performance claims are measured against hand-designed baselines using standard task metrics, not derived from or defined in terms of the search process itself. No load-bearing self-citations, fitted parameters renamed as predictions, or self-definitional equations appear in the derivation; results are falsifiable external evaluations. This matches the default case of honest non-circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on standard assumptions about foundation models serving as composable modules and on the expressiveness of code for agent behaviors, without introducing new free parameters or invented physical entities.

axioms (1)
  • Domain assumption: Foundation models can serve as reliable modules within larger agentic systems when combined via code.
    Invoked in the setup of using LLMs for prompts, reflection, and tool use.

pith-pipeline@v0.9.0 · 5571 in / 1278 out tokens · 54696 ms · 2026-05-15T08:03:09.049437+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 22 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. FlowCompile: An Optimizing Compiler for Structured LLM Workflows

    cs.CL 2026-05 unverdicted novelty 8.0

    FlowCompile performs compile-time design space exploration on structured LLM workflows to produce reusable high-quality configuration sets that outperform routing baselines with up to 6.4x speedup.

  2. SimWorld Studio: Automatic Environment Generation with Evolving Coding Agent for Embodied Agent Learning

    cs.AI 2026-05 unverdicted novelty 8.0

    SimWorld Studio uses a self-evolving coding agent to generate adaptive 3D environments that improve embodied agent performance, with reported gains of 18 points over fixed environments in navigation tasks.

  3. SimWorld Studio: Automatic Environment Generation with Evolving Coding Agent for Embodied Agent Learning

    cs.AI 2026-05 accept novelty 8.0

    SimWorld Studio deploys an evolving coding agent to create adaptive 3D environments that co-evolve with embodied learners, delivering 18-point success-rate gains over fixed environments in navigation benchmarks.

  4. LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling

    cs.CL 2026-05 conditional novelty 8.0

    AutoTTS discovers width-depth test-time scaling controllers through agentic search in a pre-collected trajectory environment, yielding better accuracy-cost tradeoffs than hand-designed baselines on math reasoning task...

  5. Harnessing Agentic Evolution

    cs.AI 2026-05 unverdicted novelty 7.0

    AEvo introduces a meta-agent that edits the evolution procedure or agent context based on accumulated state, outperforming baselines by 26% relative improvement on agentic benchmarks and achieving SOTA on open-ended tasks.

  6. TacoMAS: Test-Time Co-Evolution of Topology and Capability in LLM-based Multi-Agent Systems

    cs.CL 2026-05 unverdicted novelty 7.0

    TacoMAS performs test-time co-evolution of agent capabilities and communication topology in LLM multi-agent systems via fast capability updates and slow meta-LLM topology edits, delivering 13.3% average gains over str...

  7. AgentPSO: Evolving Agent Reasoning Skill via Multi-agent Particle Swarm Optimization

    cs.AI 2026-05 unverdicted novelty 7.0

    AgentPSO evolves reusable multi-agent reasoning skills via PSO-inspired natural-language updates, outperforming static agents and test-time multi-agent baselines on math and general reasoning tasks with cross-benchmar...

  8. LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling

    cs.CL 2026-05 unverdicted novelty 7.0

    AutoTTS discovers superior test-time scaling strategies for LLMs via cheap controller synthesis in a pre-collected trajectory environment, outperforming manual baselines on math benchmarks with low discovery cost.

  9. Synthesizing Multi-Agent Harnesses for Vulnerability Discovery

    cs.CR 2026-04 unverdicted novelty 7.0

    AgentFlow uses a typed graph DSL covering roles, prompts, tools, topology and protocol plus a runtime-signal feedback loop to optimize multi-agent harnesses, reaching 84.3% on TerminalBench-2 and discovering ten new z...

  10. LEMON: Learning Executable Multi-Agent Orchestration via Counterfactual Reinforcement Learning

    cs.AI 2026-05 unverdicted novelty 6.0

    LEMON trains an LLM orchestrator with counterfactual-augmented GRPO to produce deployable multi-agent specifications that reach state-of-the-art results on six reasoning and coding benchmarks.

  11. SkillEvolver: Skill Learning as a Meta-Skill

    cs.AI 2026-05 unverdicted novelty 6.0

    A meta-skill authors and refines prose-and-code skills for agents by learning from post-deployment failures with an overfit audit, achieving 56.8% accuracy on SkillsBench tasks versus 43.6% for human-curated skills.

  12. Do Self-Evolving Agents Forget? Capability Degradation and Preservation in Lifelong LLM Agent Adaptation

    cs.AI 2026-05 unverdicted novelty 6.0

    Self-evolving LLM agents exhibit capability erosion under continual adaptation, which Capability-Preserving Evolution mitigates by raising retained simple-task performance from 41.8% to 52.8% in workflow evolution und...

  13. Learning to Communicate: Toward End-to-End Optimization of Multi-Agent Language Systems

    cs.AI 2026-04 unverdicted novelty 6.0

    DiffMAS jointly optimizes latent communication and reasoning in multi-agent LLM systems via parameter-efficient supervised training on trajectories, yielding consistent gains over baselines on math, science, and code ...

  14. SkillGraph: Self-Evolving Multi-Agent Collaboration with Multimodal Graph Topology

    cs.AI 2026-04 unverdicted novelty 6.0

    SkillGraph jointly evolves agent skills and collaboration topologies in multi-agent vision-language systems using a multimodal graph transformer and a skill designer, yielding consistent performance gains on benchmarks.

  15. AgentGA: Evolving Code Solutions in Agent-Seed Space

    cs.AI 2026-04 unverdicted novelty 6.0

    AgentGA optimizes agent seeds with genetic algorithms and parent-archive inheritance to improve autonomous code generation, beating a baseline on 15 of 16 Kaggle competitions.

  16. AgentGA: Evolving Code Solutions in Agent-Seed Space

    cs.AI 2026-04 unverdicted novelty 6.0

    AgentGA uses a genetic algorithm to evolve agent seeds and achieves 74.52% human-exceeding performance on tabular AutoML tasks versus 54.15% for the AIDE baseline.

  17. Prompt Optimization Is a Coin Flip: Diagnosing When It Helps in Compound AI Systems

    cs.AI 2026-04 unverdicted novelty 6.0

    Prompt optimization in compound AI systems is statistically indistinguishable from random chance except when tasks have exploitable output structure; a two-stage diagnostic predicts success.

  18. Self-Optimizing Multi-Agent Systems for Deep Research

    cs.IR 2026-04 unverdicted novelty 6.0

    Multi-agent deep research systems self-optimize prompts through self-play to match or outperform expert-crafted versions.

  19. Web2BigTable: A Bi-Level Multi-Agent LLM System for Internet-Scale Information Search and Extraction

    cs.AI 2026-04 unverdicted novelty 5.0

    Web2BigTable introduces a bi-level multi-agent system that achieves new state-of-the-art results on wide-coverage and deep web-to-table search benchmarks through orchestration, coordination, and closed-loop reflection.

  20. Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems

    cs.SE 2026-04 unverdicted novelty 5.0

    Claude Code centers on a model-tool while-loop surrounded by permission systems, context compaction, extensibility hooks, subagent delegation, and session storage; the same design questions yield different answers in ...

  21. Multi-Agent Systems: From Classical Paradigms to Large Foundation Model-Enabled Futures

    cs.AI 2026-04 unverdicted novelty 4.0

    A survey comparing classical multi-agent systems with large foundation model-enabled multi-agent systems, showing how the latter enables semantic-level collaboration and greater adaptability.

  22. A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence

    cs.AI 2025-07 accept novelty 4.0

    The paper delivers the first systematic review of self-evolving agents, structured around what components evolve, when adaptation occurs, and how it is implemented.

Reference graph

Works this paper leans on

208 extracted references · 208 canonical work pages · cited by 19 Pith papers · 19 internal anchors


  70. [95]

    Science , volume=

    Managing extreme AI risks amid rapid progress , author=. Science , volume=. 2024 , publisher=

  71. [96]

    2024 , month =

    Meta , title =. 2024 , month =

  72. [98]

    2024 , publisher =

    Tim Rocktäschel , title =. 2024 , publisher =

  73. [99]

    The Twelfth International Conference on Learning Representations , year=

    Eureka: Human-Level Reward Design via Coding Large Language Models , author=. The Twelfth International Conference on Learning Representations , year=

  74. [100]

    Conference on Robot Learning , pages=

    Language to Rewards for Robotic Skill Synthesis , author=. Conference on Robot Learning , pages=. 2023 , organization=

  75. [101]

    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

    Deepmad: Mathematical architecture design for deep convolutional neural network , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

  76. [103]

    International conference on machine learning , pages=

    Enhanced poet: Open-ended reinforcement learning through unbounded invention of learning challenges and their solutions , author=. International conference on machine learning , pages=. 2020 , organization=

  77. [104]

    Evolutionary computation , volume=

    Abandoning objectives: Evolution through the search for novelty alone , author=. Evolutionary computation , volume=. 2011 , publisher=

  78. [106]

    Communications of the ACM , volume=

    Native client: A sandbox for portable, untrusted x86 native code , author=. Communications of the ACM , volume=. 2010 , publisher=

  79. [108]

    IEEE Transactions on Evolutionary Computation , volume=

    Quality and diversity optimization: A unifying modular framework , author=. IEEE Transactions on Evolutionary Computation , volume=. 2017 , publisher=

  80. [109]

    2015 , publisher=

    Why greatness cannot be planned: The myth of the objective , author=. 2015 , publisher=

Showing first 80 references.