Maestro: Open-ended environment design for multi-agent reinforcement learning

Samvelyan, Mikayel, Khan, Akbir, Dennis, Michael, Jiang, Minqi and · 2023 · arXiv 2303.03376

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

read on arXiv browse 3 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

SimWorld Studio: Automatic Environment Generation with Evolving Coding Agent for Embodied Agent Learning

cs.AI · 2026-05-10 · accept · novelty 8.0 · 2 refs

SimWorld Studio deploys an evolving coding agent to create adaptive 3D environments that co-evolve with embodied learners, delivering 18-point success-rate gains over fixed environments in navigation benchmarks.

Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution

cs.CL · 2023-09-28 · unverdicted · novelty 8.0

Promptbreeder evolves both task prompts and the mutation prompts that improve them using LLMs, outperforming Chain-of-Thought and Plan-and-Solve on arithmetic and commonsense reasoning benchmarks.

PACE: Parameter Change for Unsupervised Environment Design

cs.LG · 2026-05-02 · unverdicted · novelty 7.0

PACE uses the squared L2 norm of policy parameter changes from a first-order approximation as an efficient proxy for environment value in UED, outperforming baselines with higher IQM and lower optimality gap on MiniGrid and Craftax OOD tests.

citing papers explorer

Showing 3 of 3 citing papers.

SimWorld Studio: Automatic Environment Generation with Evolving Coding Agent for Embodied Agent Learning cs.AI · 2026-05-10 · accept · none · ref 67 · 2 links
SimWorld Studio deploys an evolving coding agent to create adaptive 3D environments that co-evolve with embodied learners, delivering 18-point success-rate gains over fixed environments in navigation benchmarks.
Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution cs.CL · 2023-09-28 · unverdicted · none · ref 282
Promptbreeder evolves both task prompts and the mutation prompts that improve them using LLMs, outperforming Chain-of-Thought and Plan-and-Solve on arithmetic and commonsense reasoning benchmarks.
PACE: Parameter Change for Unsupervised Environment Design cs.LG · 2026-05-02 · unverdicted · none · ref 10
PACE uses the squared L2 norm of policy parameter changes from a first-order approximation as an efficient proxy for environment value in UED, outperforming baselines with higher IQM and lower optimality gap on MiniGrid and Craftax OOD tests.

Maestro: Open-ended environment design for multi-agent reinforcement learning

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer