pith. sign in

hub Canonical reference

Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents

Canonical reference. 85% of citing Pith papers cite this work as background.

42 Pith papers citing it
Background 85% of classified citations
abstract

Today's AI systems have human-designed, fixed architectures and cannot autonomously and continuously improve themselves. The advance of AI could itself be automated. If done safely, that would accelerate AI development and allow us to reap its benefits much sooner. Meta-learning can automate the discovery of novel algorithms, but is limited by first-order improvements and the human design of a suitable search space. The G\"odel machine proposed a theoretical alternative: a self-improving AI that repeatedly modifies itself in a provably beneficial manner. Unfortunately, proving that most changes are net beneficial is impossible in practice. We introduce the Darwin G\"odel Machine (DGM), a self-improving system that iteratively modifies its own code (thereby also improving its ability to modify its own codebase) and empirically validates each change using coding benchmarks. Inspired by Darwinian evolution and open-endedness research, the DGM maintains an archive of generated coding agents. It grows the archive by sampling an agent from it and using a foundation model to create a new, interesting, version of the sampled agent. This open-ended exploration forms a growing tree of diverse, high-quality agents and allows the parallel exploration of many different paths through the search space. Empirically, the DGM automatically improves its coding capabilities (e.g., better code editing tools, long-context window management, peer-review mechanisms), increasing performance on SWE-bench from 20.0% to 50.0%, and on Polyglot from 14.2% to 30.7%. Furthermore, the DGM significantly outperforms baselines without self-improvement or open-ended exploration. All experiments were done with safety precautions (e.g., sandboxing, human oversight). The DGM is a significant step toward self-improving AI, capable of gathering its own stepping stones along paths that unfold into endless innovation.

hub tools

citation-role summary

background 11 dataset 1 other 1

citation-polarity summary

years

2026 36 2025 6

clear filters

representative citing papers

Co-Evolving Skill Generation and Policy Optimization

cs.CL · 2026-06-07 · unverdicted · novelty 7.0

Framework estimates context-dependent marginal utility of candidate skills via reward gaps in matched base vs. skill-augmented rollouts to filter skills and co-train policy as generator.

PACE: Anytime-Valid Acceptance Tests for Self-Evolving Agents

cs.AI · 2026-06-06 · unverdicted · novelty 7.0

PACE is a training-free anytime-valid commit gate using testing-by-betting e-processes that controls per-candidate false-commit probability for self-evolving agents and reduces spurious edits compared to greedy acceptance.

Harnessing Agentic Evolution

cs.AI · 2026-05-13 · unverdicted · novelty 7.0

AEvo introduces a meta-agent that edits the evolution procedure or agent context based on accumulated state, outperforming baselines by 26% relative improvement on agentic benchmarks and achieving SOTA on open-ended tasks.

Optimizing ground state preparation protocols with autoresearch

quant-ph · 2026-04-28 · unverdicted · novelty 7.0 · 2 refs

AI coding agents evolve simple ground-state protocols into improved versions for VQE, DMRG, and AFQMC on spin models and molecules by using executable energy scores under fixed compute budgets.

Analytic Concept-Centric Memory for Agentic Embodied Manipulation

cs.RO · 2026-06-29 · unverdicted · novelty 6.0

Proposes a structured concept-centric memory system for embodied agents that connects object, scene, transition, and skill memories to support coarse-to-fine retrieval and improve task performance over baselines.

Symbolon: Symbolic Execution by Learning Code Transformation

cs.CR · 2026-06-27 · unverdicted · novelty 6.0

Symbolon learns diverse code transformations via search on small programs, distills them into agent skills, and applies them to improve KLEE symbolic execution, yielding 3.69x coverage gains and 21 new Linux kernel bugs.

Toward Generalist Autonomous Research via Hypothesis-Tree Refinement

cs.CL · 2026-06-10 · unverdicted · novelty 6.0

Arbor combines a coordinator, executors, and a hypothesis tree to enable cumulative autonomous research, outperforming Codex and Claude Code by over 2.5x on six real tasks and reaching 86.36% Any Medal on MLE-Bench Lite.

MemPro: Agentic Memory Systems as Evolvable Programs

cs.CL · 2026-05-30 · unverdicted · novelty 6.0

MemPro evolves the entire MCR pipeline as runnable programs via failure-guided refinement on a version tree and outperforms static baselines on LongMemEval, LoCoMo, HotpotQA, and NarrativeQA.

Open-Ended Task Discovery via Bayesian Optimization

cs.AI · 2026-05-08 · unverdicted · novelty 6.0

Generate-Select-Refine is an open-ended Bayesian optimization method that generates tasks and concentrates evaluations on the best one with only logarithmic regret overhead relative to standard single-task optimization.

citing papers explorer

Showing 2 of 2 citing papers after filters.

  • CyberEvolver: Structured Self-Evolution for Cybersecurity Agents On the Fly cs.CR · 2026-05-25 · unverdicted · none · ref 75 · internal anchor

    CyberEvolver introduces a four-layer self-evolving agent architecture with trace-to-diagnosis and population beam search that raises seed agent success rates by 13.6% on CTF, exploitation, and penetration tasks across four LLMs.

  • Symbolon: Symbolic Execution by Learning Code Transformation cs.CR · 2026-06-27 · unverdicted · none · ref 139 · internal anchor

    Symbolon learns diverse code transformations via search on small programs, distills them into agent skills, and applies them to improve KLEE symbolic execution, yielding 3.69x coverage gains and 21 new Linux kernel bugs.