hub Canonical reference

CORAL: Towards Autonomous Multi-Agent Evolution for Open-Ended Discovery

Ao Qu, Han Zheng, Zijian Zhou, Yihao Yan, Yihong Tang, Shao Yong Ong, Fenglu Hong, Kaichen Zhou, Chonghe Jiang, Minwei Kong, et al · 2026 · cs.AI · arXiv 2604.01658

Canonical reference. 100% of citing Pith papers cite this work as background.

12 Pith papers citing it

Background 100% of classified citations

open full Pith review browse 12 citing papers arXiv PDF

abstract

Large language model (LLM)-based evolution is a promising approach for open-ended discovery, where progress requires sustained search and knowledge accumulation. Existing methods still rely heavily on fixed heuristics and hard-coded exploration rules, which limit the autonomy of LLM agents. We present CORAL, the first framework for autonomous multi-agent evolution on open-ended problems. CORAL replaces rigid control with long-running agents that explore, reflect, and collaborate through shared persistent memory, asynchronous multi-agent execution, and heartbeat-based interventions. It also provides practical safeguards, including isolated workspaces, evaluator separation, resource management, and agent session and health management. Evaluated on diverse mathematical, algorithmic, and systems optimization tasks, CORAL sets new state-of-the-art results on 10 tasks, achieving 3-10 times higher improvement rates with far fewer evaluations than fixed evolutionary search baselines across tasks. On Anthropic's kernel engineering task, four co-evolving agents improve the best known score from 1363 to 1103 cycles. Mechanistic analyses further show how these gains arise from knowledge reuse and multi-agent exploration and communication. Together, these results suggest that greater agent autonomy and multi-agent evolution can substantially improve open-ended discovery. Code is available at https://github.com/Human-Agent-Society/CORAL.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 6

citation-polarity summary

background 6

representative citing papers

FrontierSmith: Synthesizing Open-Ended Coding Problems at Scale

cs.LG · 2026-05-14 · conditional · novelty 7.0

FrontierSmith automates synthesis of open-ended coding problems from closed-ended seeds and shows measurable gains on two open-ended LLM coding benchmarks.

Harnessing Agentic Evolution

cs.AI · 2026-05-13 · unverdicted · novelty 7.0

AEvo introduces a meta-agent that edits the evolution procedure or agent context based on accumulated state, outperforming baselines by 26% relative improvement on agentic benchmarks and achieving SOTA on open-ended tasks.

TacoMAS: Test-Time Co-Evolution of Topology and Capability in LLM-based Multi-Agent Systems

cs.CL · 2026-05-10 · unverdicted · novelty 7.0

TacoMAS performs test-time co-evolution of agent capabilities and communication topology in LLM multi-agent systems via fast capability updates and slow meta-LLM topology edits, delivering 13.3% average gains over strong baselines on four benchmarks.

Towards Direct Evaluation of Harness Optimizers via Priority Ranking

cs.AI · 2026-05-21 · unverdicted · novelty 6.0

Priority ranking offers a low-cost direct evaluation for harness optimizers that correlates with their real multi-step optimization performance, supported by the Shor dataset of 182 scenarios.

RMA: an Agentic System for Research-Level Mathematical Problems

cs.AI · 2026-05-20 · unverdicted · novelty 6.0

RMA, a multi-agent system with structured memory and iterative feedback loops, solves 8 out of 10 research-level math problems on the new First Proof benchmark and outperforms GPT-5.2R and Aletheia according to expert evaluation.

DrugSAGE:Self-evolving Agent Experience for Efficient State-of-the-Art Drug Discovery

cs.LG · 2026-05-14 · unverdicted · novelty 6.0

DrugSAGE accumulates cross-task memory of skills, statistical evidence, and recurring errors to let LLM agents achieve top-ranked performance on molecular property prediction tasks with reduced or zero test-time search.

HMACE: Heterogeneous Multi-Agent Collaborative Evolution for Combinatorial Optimization

cs.AI · 2026-05-08 · unverdicted · novelty 6.0

HMACE deploys Proposer, Generator, Evaluator, and Reflector agents in an evolutionary loop to generate and refine heuristics for NP-hard problems, reporting lower optimality gaps and token costs than baselines on TSP and Online BPP.

CTM-AI: A Blueprint for General AI Inspired by a Model of Consciousness

q-bio.NC · 2026-04-30 · unverdicted · novelty 6.0

CTM-AI combines a formal consciousness model with foundation models to report state-of-the-art results on sarcasm detection, humor, and agentic tool-use benchmarks.

Evaluation-driven Scaling for Scientific Discovery

cs.LG · 2026-04-21 · unverdicted · novelty 6.0

SimpleTES scales test-time evaluation in LLMs to discover state-of-the-art solutions on 21 scientific problems across six domains, outperforming frontier models and optimization pipelines with examples like 2x faster LASSO and new Erdos constructions.

Evolutionary Ensemble of Agents

cs.NE · 2026-05-09 · unverdicted · novelty 5.0 · 2 refs

EvE co-evolves code solvers and guidance states via synchronous races and Elo updates, discovering a rescale-then-interpolate mechanism that enables example-count generalization in ICON.

Prism: An Evolutionary Memory Substrate for Multi-Agent Open-Ended Discovery

cs.AI · 2026-04-08 · unverdicted · novelty 5.0

Prism unifies file, vector, graph, and evolutionary memory under a decision-theoretic framework with entropy-gated stratification, causal graphs, value-of-information retrieval, heartbeat consolidation, and replicator-decay dynamics, reporting 88.1 on LOCOMO and 2.8x gains on CORAL tasks.

AI for Auto-Research: Roadmap & User Guide

cs.AI · 2026-05-18 · unverdicted · novelty 4.0

The paper delivers a stage-by-stage roadmap for AI in research, showing reliable assistance in retrieval and tool tasks but fragility in novelty and judgment, advocating human-governed collaboration.

citing papers explorer

Showing 12 of 12 citing papers.

FrontierSmith: Synthesizing Open-Ended Coding Problems at Scale cs.LG · 2026-05-14 · conditional · none · ref 29 · internal anchor
FrontierSmith automates synthesis of open-ended coding problems from closed-ended seeds and shows measurable gains on two open-ended LLM coding benchmarks.
Harnessing Agentic Evolution cs.AI · 2026-05-13 · unverdicted · none · ref 23 · internal anchor
AEvo introduces a meta-agent that edits the evolution procedure or agent context based on accumulated state, outperforming baselines by 26% relative improvement on agentic benchmarks and achieving SOTA on open-ended tasks.
TacoMAS: Test-Time Co-Evolution of Topology and Capability in LLM-based Multi-Agent Systems cs.CL · 2026-05-10 · unverdicted · none · ref 14 · internal anchor
TacoMAS performs test-time co-evolution of agent capabilities and communication topology in LLM multi-agent systems via fast capability updates and slow meta-LLM topology edits, delivering 13.3% average gains over strong baselines on four benchmarks.
Towards Direct Evaluation of Harness Optimizers via Priority Ranking cs.AI · 2026-05-21 · unverdicted · none · ref 30 · internal anchor
Priority ranking offers a low-cost direct evaluation for harness optimizers that correlates with their real multi-step optimization performance, supported by the Shor dataset of 182 scenarios.
RMA: an Agentic System for Research-Level Mathematical Problems cs.AI · 2026-05-20 · unverdicted · none · ref 22 · internal anchor
RMA, a multi-agent system with structured memory and iterative feedback loops, solves 8 out of 10 research-level math problems on the new First Proof benchmark and outperforms GPT-5.2R and Aletheia according to expert evaluation.
DrugSAGE:Self-evolving Agent Experience for Efficient State-of-the-Art Drug Discovery cs.LG · 2026-05-14 · unverdicted · none · ref 25 · internal anchor
DrugSAGE accumulates cross-task memory of skills, statistical evidence, and recurring errors to let LLM agents achieve top-ranked performance on molecular property prediction tasks with reduced or zero test-time search.
HMACE: Heterogeneous Multi-Agent Collaborative Evolution for Combinatorial Optimization cs.AI · 2026-05-08 · unverdicted · none · ref 38 · internal anchor
HMACE deploys Proposer, Generator, Evaluator, and Reflector agents in an evolutionary loop to generate and refine heuristics for NP-hard problems, reporting lower optimality gaps and token costs than baselines on TSP and Online BPP.
CTM-AI: A Blueprint for General AI Inspired by a Model of Consciousness q-bio.NC · 2026-04-30 · unverdicted · none · ref 19 · internal anchor
CTM-AI combines a formal consciousness model with foundation models to report state-of-the-art results on sarcasm detection, humor, and agentic tool-use benchmarks.
Evaluation-driven Scaling for Scientific Discovery cs.LG · 2026-04-21 · unverdicted · none · ref 100 · internal anchor
SimpleTES scales test-time evaluation in LLMs to discover state-of-the-art solutions on 21 scientific problems across six domains, outperforming frontier models and optimization pipelines with examples like 2x faster LASSO and new Erdos constructions.
Evolutionary Ensemble of Agents cs.NE · 2026-05-09 · unverdicted · none · ref 15 · 2 links · internal anchor
EvE co-evolves code solvers and guidance states via synchronous races and Elo updates, discovering a rescale-then-interpolate mechanism that enables example-count generalization in ICON.
Prism: An Evolutionary Memory Substrate for Multi-Agent Open-Ended Discovery cs.AI · 2026-04-08 · unverdicted · none · ref 2 · internal anchor
Prism unifies file, vector, graph, and evolutionary memory under a decision-theoretic framework with entropy-gated stratification, causal graphs, value-of-information retrieval, heartbeat consolidation, and replicator-decay dynamics, reporting 88.1 on LOCOMO and 2.8x gains on CORAL tasks.
AI for Auto-Research: Roadmap & User Guide cs.AI · 2026-05-18 · unverdicted · none · ref 155 · internal anchor
The paper delivers a stage-by-stage roadmap for AI in research, showing reliable assistance in retrieval and tool tasks but fragility in novelty and judgment, advocating human-governed collaboration.

CORAL: Towards Autonomous Multi-Agent Evolution for Open-Ended Discovery

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer