CORAL: Towards Autonomous Multi-Agent Evolution for Open-Ended Discovery

Canonical reference. 100% of citing Pith papers cite this work as background.

12 Pith papers citing it
Background 100% of classified citations
abstract

Large language model (LLM)-based evolution is a promising approach for open-ended discovery, where progress requires sustained search and knowledge accumulation. Existing methods still rely heavily on fixed heuristics and hard-coded exploration rules, which limit the autonomy of LLM agents. We present CORAL, the first framework for autonomous multi-agent evolution on open-ended problems. CORAL replaces rigid control with long-running agents that explore, reflect, and collaborate through shared persistent memory, asynchronous multi-agent execution, and heartbeat-based interventions. It also provides practical safeguards, including isolated workspaces, evaluator separation, resource management, and agent session and health management. Evaluated on diverse mathematical, algorithmic, and systems optimization tasks, CORAL sets new state-of-the-art results on 10 tasks, achieving 3-10 times higher improvement rates with far fewer evaluations than fixed evolutionary search baselines across tasks. On Anthropic's kernel engineering task, four co-evolving agents improve the best known score from 1363 to 1103 cycles. Mechanistic analyses further show how these gains arise from knowledge reuse and multi-agent exploration and communication. Together, these results suggest that greater agent autonomy and multi-agent evolution can substantially improve open-ended discovery. Code is available at https://github.com/Human-Agent-Society/CORAL.

citation summary

years: 2026 (12)
roles: background (6)
polarities: background (6)

representative citing papers

Harnessing Agentic Evolution

cs.AI · 2026-05-13 · unverdicted · novelty 7.0

AEvo introduces a meta-agent that edits the evolution procedure or agent context based on accumulated state, outperforming baselines by 26% relative improvement on agentic benchmarks and achieving SOTA on open-ended tasks.

RMA: an Agentic System for Research-Level Mathematical Problems

cs.AI · 2026-05-20 · unverdicted · novelty 6.0

RMA, a multi-agent system with structured memory and iterative feedback loops, solves 8 out of 10 research-level math problems on the new First Proof benchmark and outperforms GPT-5.2R and Aletheia according to expert evaluation.

Evaluation-driven Scaling for Scientific Discovery

cs.LG · 2026-04-21 · unverdicted · novelty 6.0

SimpleTES scales test-time evaluation in LLMs to discover state-of-the-art solutions on 21 scientific problems across six domains, outperforming frontier models and optimization pipelines with examples like 2x faster LASSO and new Erdős constructions.

Evolutionary Ensemble of Agents

cs.NE · 2026-05-09 · unverdicted · novelty 5.0 · 2 refs

EvE co-evolves code solvers and guidance states via synchronous races and Elo updates, discovering a rescale-then-interpolate mechanism that enables example-count generalization in ICON.

Prism: An Evolutionary Memory Substrate for Multi-Agent Open-Ended Discovery

cs.AI · 2026-04-08 · unverdicted · novelty 5.0

Prism unifies file, vector, graph, and evolutionary memory under a decision-theoretic framework with entropy-gated stratification, causal graphs, value-of-information retrieval, heartbeat consolidation, and replicator-decay dynamics, reporting 88.1 on LOCOMO and 2.8x gains on CORAL tasks.

AI for Auto-Research: Roadmap & User Guide

cs.AI · 2026-05-18 · unverdicted · novelty 4.0

The paper delivers a stage-by-stage roadmap for AI in research, showing reliable assistance in retrieval and tool tasks but fragility in novelty and judgment, advocating human-governed collaboration.

citing papers explorer

Showing 12 of 12 citing papers.

  • FrontierSmith: Synthesizing Open-Ended Coding Problems at Scale cs.LG · 2026-05-14 · conditional · none · ref 29 · internal anchor

    FrontierSmith automates synthesis of open-ended coding problems from closed-ended seeds and shows measurable gains on two open-ended LLM coding benchmarks.

  • Harnessing Agentic Evolution cs.AI · 2026-05-13 · unverdicted · none · ref 23 · internal anchor

    AEvo introduces a meta-agent that edits the evolution procedure or agent context based on accumulated state, outperforming baselines by 26% relative improvement on agentic benchmarks and achieving SOTA on open-ended tasks.

  • TacoMAS: Test-Time Co-Evolution of Topology and Capability in LLM-based Multi-Agent Systems cs.CL · 2026-05-10 · unverdicted · none · ref 14 · internal anchor

    TacoMAS performs test-time co-evolution of agent capabilities and communication topology in LLM multi-agent systems via fast capability updates and slow meta-LLM topology edits, delivering 13.3% average gains over strong baselines on four benchmarks.

  • Towards Direct Evaluation of Harness Optimizers via Priority Ranking cs.AI · 2026-05-21 · unverdicted · none · ref 30 · internal anchor

    Priority ranking offers a low-cost direct evaluation for harness optimizers that correlates with their real multi-step optimization performance, supported by the Shor dataset of 182 scenarios.

  • RMA: an Agentic System for Research-Level Mathematical Problems cs.AI · 2026-05-20 · unverdicted · none · ref 22 · internal anchor

    RMA, a multi-agent system with structured memory and iterative feedback loops, solves 8 out of 10 research-level math problems on the new First Proof benchmark and outperforms GPT-5.2R and Aletheia according to expert evaluation.

  • DrugSAGE: Self-evolving Agent Experience for Efficient State-of-the-Art Drug Discovery cs.LG · 2026-05-14 · unverdicted · none · ref 25 · internal anchor

    DrugSAGE accumulates cross-task memory of skills, statistical evidence, and recurring errors to let LLM agents achieve top-ranked performance on molecular property prediction tasks with reduced or zero test-time search.

  • HMACE: Heterogeneous Multi-Agent Collaborative Evolution for Combinatorial Optimization cs.AI · 2026-05-08 · unverdicted · none · ref 38 · internal anchor

    HMACE deploys Proposer, Generator, Evaluator, and Reflector agents in an evolutionary loop to generate and refine heuristics for NP-hard problems, reporting lower optimality gaps and token costs than baselines on TSP and Online BPP.

  • CTM-AI: A Blueprint for General AI Inspired by a Model of Consciousness q-bio.NC · 2026-04-30 · unverdicted · none · ref 19 · internal anchor

    CTM-AI combines a formal consciousness model with foundation models to report state-of-the-art results on sarcasm detection, humor, and agentic tool-use benchmarks.

  • Evaluation-driven Scaling for Scientific Discovery cs.LG · 2026-04-21 · unverdicted · none · ref 100 · internal anchor

    SimpleTES scales test-time evaluation in LLMs to discover state-of-the-art solutions on 21 scientific problems across six domains, outperforming frontier models and optimization pipelines with examples like 2x faster LASSO and new Erdős constructions.

  • Evolutionary Ensemble of Agents cs.NE · 2026-05-09 · unverdicted · none · ref 15 · 2 links · internal anchor

    EvE co-evolves code solvers and guidance states via synchronous races and Elo updates, discovering a rescale-then-interpolate mechanism that enables example-count generalization in ICON.

  • Prism: An Evolutionary Memory Substrate for Multi-Agent Open-Ended Discovery cs.AI · 2026-04-08 · unverdicted · none · ref 2 · internal anchor

    Prism unifies file, vector, graph, and evolutionary memory under a decision-theoretic framework with entropy-gated stratification, causal graphs, value-of-information retrieval, heartbeat consolidation, and replicator-decay dynamics, reporting 88.1 on LOCOMO and 2.8x gains on CORAL tasks.

  • AI for Auto-Research: Roadmap & User Guide cs.AI · 2026-05-18 · unverdicted · none · ref 155 · internal anchor

    The paper delivers a stage-by-stage roadmap for AI in research, showing reliable assistance in retrieval and tool tasks but fragility in novelty and judgment, advocating human-governed collaboration.