Mix&Match - Agent Curricula for Reinforcement Learning

Leonard Hasenclever; Max Jaderberg; Nicolas Heess; Razvan Pascanu; Siddhant M. Jayakumar; Simon Osindero; Wojciech Marian Czarnecki; Yee Whye Teh

arXiv:1806.01780v1 · submitted 2018-06-05 · cs.LG, stat.ML

Abstract

We introduce Mix&Match (M&M) - a training framework designed to facilitate rapid and effective learning in RL agents, especially those that would be too slow or too challenging to train otherwise. The key innovation is a procedure that allows us to automatically form a curriculum over agents. Through such a curriculum we can progressively train more complex agents by, effectively, bootstrapping from solutions found by simpler agents. In contradistinction to typical curriculum learning approaches, we do not gradually modify the tasks or environments presented, but instead use a process to gradually alter how the policy is represented internally. We show the broad applicability of our method by demonstrating significant performance gains in three different experimental setups: (1) We train an agent able to control more than 700 actions in a challenging 3D first-person task; using our method to progress through an action-space curriculum we achieve both faster training and better final performance than one obtains using traditional methods. (2) We further show that M&M can be used successfully to progress through a curriculum of architectural variants defining an agent's internal state. (3) Finally, we illustrate how a variant of our method can be used to improve agent performance in a multitask setting.
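
The abstract does not spell out how a simple and a complex agent are combined into one curriculum step. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the two agents are blended as a convex mixture of their action distributions, with a mixing weight alpha moved from 0 (act as the simple agent) to 1 (act as the complex agent), plus a KL term that would pull the two policies together during training. For the action-space curriculum, the simple agent's small action set is assumed to map onto a subset of the large one through a hypothetical index array `small_to_large`. The names `embed_small_policy`, `mixed_policy`, `kl` and `alpha_schedule` are illustrative, and the linear schedule is only a placeholder for whatever procedure actually adjusts alpha.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - np.max(logits, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)


def embed_small_policy(p_small, small_to_large, n_large):
    """Lift a distribution over a small action set into the large action space.

    small_to_large[i] is the (hypothetical) index of small action i in the
    large action space; large actions with no small counterpart get probability 0.
    """
    p = np.zeros(p_small.shape[:-1] + (n_large,))
    p[..., small_to_large] = p_small
    return p


def mixed_policy(p_simple, p_complex, alpha):
    """Convex mixture of two action distributions over the same action space.

    alpha = 0 reproduces the simple agent, alpha = 1 the complex agent.
    """
    return (1.0 - alpha) * p_simple + alpha * p_complex


def kl(p, q, eps=1e-12):
    """Mean KL(p || q) over the batch; a hypothetical distillation term that
    keeps the complex agent close to the simple one early in the curriculum."""
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)))


def alpha_schedule(step, total_steps):
    """Hypothetical linear schedule moving alpha from 0 to 1, then holding at 1."""
    return min(1.0, step / total_steps)


# Toy example: a 3-action "simple" agent and an 8-action "complex" agent.
rng = np.random.default_rng(0)
n_small, n_large = 3, 8
small_to_large = np.array([0, 3, 5])  # hypothetical action correspondence

p_simple = embed_small_policy(softmax(rng.normal(size=n_small)), small_to_large, n_large)
p_complex = softmax(rng.normal(size=n_large))

for step in range(5):
    alpha = alpha_schedule(step, total_steps=4)
    p_mm = mixed_policy(p_simple, p_complex, alpha)
    action = rng.choice(n_large, p=p_mm)  # act with the mixed policy
    distill = kl(p_simple, p_complex)     # in training, minimised w.r.t. the complex agent
    print(f"alpha={alpha:.2f} action={action} KL={distill:.3f}")
```

In this sketch the mixed agent starts out behaving exactly like the simple agent, so the complex agent is trained on experience the simple agent already makes useful; once alpha reaches 1 the simple agent no longer influences action selection.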

Forward citations

Cited by 3 reviewed papers in the Pith corpus, sorted by Pith novelty score:

Dota 2 with Large Scale Deep Reinforcement Learning
cs.LG 2019-12 accept novelty 7.0

OpenAI Five achieved superhuman performance in Dota 2 by defeating the world champions using scaled self-play reinforcement learning.

Attentive Multi-Task Deep Reinforcement Learning
cs.LG 2019-07 unverdicted novelty 6.0

Attention mechanism dynamically groups task knowledge at state granularity in multi-task DRL to enable positive transfer and avoid negative transfer, matching or exceeding prior methods with fewer parameters.

Growing Action Spaces
cs.LG 2019-06 unverdicted novelty 5.0

A curriculum of growing action spaces combined with simultaneous off-policy value estimation accelerates learning in large multi-agent action spaces.