Evolving Curricula with Regret-Based Environment Design

Edward Grefenstette; Jack Parker-Holder; Jakob Foerster; Michael Dennis; Mikayel Samvelyan; Minqi Jiang; Tim Rocktäschel

arxiv: 2203.01302 · v3 · pith:3MMLPTF5new · submitted 2022-03-02 · 💻 cs.LG

Jack Parker-Holder, Minqi Jiang, Michael Dennis, Mikayel Samvelyan, Jakob Foerster, Edward Grefenstette, Tim Rocktäschel

classification 💻 cs.LG

keywords: environment, levels, regret-based, curricula, design, methods, accel, agent

It remains a significant challenge to train generally capable agents with reinforcement learning (RL). A promising avenue for improving the robustness of RL agents is through the use of curricula. One such class of methods frames environment design as a game between a student and a teacher, using regret-based objectives to produce environment instantiations (or levels) at the frontier of the student agent's capabilities. These methods benefit from their generality, with theoretical guarantees at equilibrium, yet they often struggle to find effective levels in challenging design spaces. By contrast, evolutionary approaches seek to incrementally alter environment complexity, resulting in potentially open-ended learning, but often rely on domain-specific heuristics and vast amounts of computational resources. In this paper we propose to harness the power of evolution in a principled, regret-based curriculum. Our approach, which we call Adversarially Compounding Complexity by Editing Levels (ACCEL), seeks to constantly produce levels at the frontier of an agent's capabilities, resulting in curricula that start simple but become increasingly complex. ACCEL maintains the theoretical benefits of prior regret-based methods, while providing significant empirical gains in a diverse set of environments. An interactive version of the paper is available at accelagent.github.io.
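
The loop the abstract describes (keep a set of high-regret levels, train on them, and grow complexity by making small edits) can be illustrated with a toy sketch. Everything below is an assumption made for illustration rather than the authors' implementation: levels are plain integers standing for difficulty, `ToyStudent` is a hypothetical one-parameter learner, the `score` method is a stand-in for a regret estimate (the paper's actual estimators are not reproduced here), and an edit simply moves difficulty up or down by one.

```python
import math
import random


class ToyStudent:
    """Hypothetical one-parameter student; not the agent used in the paper."""

    def __init__(self, skill=0.0, lr=0.2):
        self.skill = skill
        self.lr = lr

    def solve_prob(self, level):
        # Chance of solving a level of integer difficulty `level`.
        return 1.0 / (1.0 + math.exp(level - self.skill))

    def score(self, level):
        # Stand-in for a regret estimate (an assumption): largest when the
        # outcome is most uncertain, small for mastered or out-of-reach levels.
        p = self.solve_prob(level)
        return 4.0 * p * (1.0 - p)

    def train(self, level):
        # Toy update: the student improves most on high-scoring levels.
        self.skill += self.lr * self.score(level)


def evolve_curriculum(student, seed_levels, steps, capacity=4, seed=0):
    """Replay high-scoring levels, edit them, and keep the best `capacity`."""
    rng = random.Random(seed)
    buffer = set(seed_levels)
    for _ in range(steps):
        # Sample a parent level with probability proportional to its score.
        levels = sorted(buffer)
        weights = [student.score(level) for level in levels]
        parent = rng.choices(levels, weights=weights, k=1)[0]
        student.train(parent)
        # Edit: a small random change to the parent (difficulty +/- 1).
        child = max(0, parent + rng.choice((-1, 1)))
        buffer.add(child)
        # Retain only the highest-scoring levels under the current student.
        ranked = sorted(buffer, key=lambda level: (-student.score(level), level))
        buffer = set(ranked[:capacity])
    return buffer
```

Calling `evolve_curriculum(ToyStudent(), seed_levels=[0, 1], steps=300)` starts from two easy levels and returns the levels retained at the end. In this toy setting the retained levels drift toward higher difficulty as the student's `skill` grows, which mirrors the "start simple but become increasingly complex" behaviour the abstract attributes to ACCEL, though only qualitatively.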

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

PROWL: Prioritized Regret-Driven Optimization for World Model Learning
cs.LG 2026-05 unverdicted novelty 5.0

PROWL introduces a KL-constrained adversarial curriculum and prioritized adversarial trajectory buffer to actively discover and correct rare failure modes in action-conditioned video world models.

Learning to Reason at the Frontier of Learnability
cs.LG 2025-02 unverdicted novelty 4.0

A curriculum sampling questions with high variance in success rate improves reinforcement learning performance for LLM reasoning tasks.