Multi-task Deep Reinforcement Learning with PopArt

Hado van Hasselt; Hubert Soyer; Lasse Espeholt; Matteo Hessel; Simon Schmitt; Wojciech Czarnecki

arXiv: 1809.04474 · v1 · pith:2RYF5OYS · submitted 2018-09-12 · 💻 cs.LG · stat.ML

Matteo Hessel, Hubert Soyer, Lasse Espeholt, Wojciech Czarnecki, Simon Schmitt, Hado van Hasselt

keywords: learning, tasks, performance, agent, single, task, algorithms, multi-task

Abstract

The reinforcement learning community has made great strides in designing algorithms capable of exceeding human performance on specific tasks. These algorithms are mostly trained one task at a time, with each new task requiring a brand-new agent instance to be trained. This means the learning algorithm is general, but each solution is not; each agent can only solve the one task it was trained on. In this work, we study the problem of learning to master not one but multiple sequential-decision tasks at once. A general issue in multi-task learning is that a balance must be found between the needs of multiple tasks competing for the limited resources of a single learning system. Many learning algorithms can get distracted by certain tasks in the set of tasks to solve. Such tasks appear more salient to the learning process, for instance because of the density or magnitude of the in-task rewards. This causes the algorithm to focus on those salient tasks at the expense of generality. We propose to automatically adapt the contribution of each task to the agent's updates, so that all tasks have a similar impact on the learning dynamics. This resulted in state-of-the-art performance on learning to play all games in a set of 57 diverse Atari games. Excitingly, our method learned a single trained policy, with a single set of weights, that exceeds median human performance. To our knowledge, this was the first time a single agent surpassed human-level performance on this multi-task domain. The same approach also demonstrated state-of-the-art performance on a set of 30 tasks in the 3D reinforcement learning platform DeepMind Lab.
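
The abstract does not spell out how each task's contribution is balanced; the title points to PopArt (Preserving Outputs Precisely, while Adaptively Rescaling Targets). Below is a minimal, hypothetical Python sketch of per-task PopArt value normalization, not the authors' implementation. It assumes a linear value head per task on shared features, exponential moving averages of the first and second moments of returns with step size `beta`, and a floor `min_sigma` on the scale; the class `PopArtHead` and all its parameters are illustrative names. The value loss regresses a normalized prediction toward a normalized target, so a task with large or dense rewards does not dominate the gradient. Whenever the statistics change, the head's weights and bias are rescaled so that the unnormalized value predictions stay exactly the same.

```python
import math


class PopArtHead:
    """Hypothetical per-task PopArt value head (illustrative sketch only).

    Assumes a linear head per task on shared features h, EMA statistics
    with step size beta, and a floor on sigma for numerical stability.
    """

    def __init__(self, num_tasks, feature_dim, beta=3e-4, min_sigma=1e-4):
        self.beta = beta
        self.min_sigma = min_sigma
        self.w = [[0.0] * feature_dim for _ in range(num_tasks)]
        self.b = [0.0] * num_tasks
        self.mu = [0.0] * num_tasks  # running first moment of returns
        self.nu = [1.0] * num_tasks  # running second moment of returns

    def sigma(self, task):
        # Scale = sqrt(nu - mu^2), clipped from below.
        var = self.nu[task] - self.mu[task] ** 2
        return max(math.sqrt(max(var, 0.0)), self.min_sigma)

    def normalized_value(self, task, h):
        # Output of the linear head, in normalized units.
        return sum(wi * hi for wi, hi in zip(self.w[task], h)) + self.b[task]

    def value(self, task, h):
        # Unnormalized value: sigma * n + mu.
        return self.sigma(task) * self.normalized_value(task, h) + self.mu[task]

    def normalized_target(self, task, target):
        return (target - self.mu[task]) / self.sigma(task)

    def update_stats(self, task, target):
        # Adaptively rescale targets: update this task's EMA statistics.
        old_mu, old_sigma = self.mu[task], self.sigma(task)
        self.mu[task] = (1 - self.beta) * self.mu[task] + self.beta * target
        self.nu[task] = (1 - self.beta) * self.nu[task] + self.beta * target ** 2
        new_mu, new_sigma = self.mu[task], self.sigma(task)
        # Preserve outputs precisely: rescale the head so value() is unchanged.
        scale = old_sigma / new_sigma
        self.w[task] = [wi * scale for wi in self.w[task]]
        self.b[task] = (old_sigma * self.b[task] + old_mu - new_mu) / new_sigma

    def sgd_step(self, task, h, target, lr=1e-3):
        # Gradient step on 0.5 * (normalized prediction - normalized target)^2.
        error = self.normalized_value(task, h) - self.normalized_target(task, target)
        self.w[task] = [wi - lr * error * hi for wi, hi in zip(self.w[task], h)]
        self.b[task] -= lr * error
```

For example, with `beta=0.5` chosen only to make the statistics move visibly, calling `head.update_stats(0, 1000.0)` leaves `head.value(0, h)` unchanged, while `head.normalized_target(0, 1000.0)` is close to 1. A task whose returns are in the thousands therefore produces regression targets on roughly the same scale as a task whose returns are near 1, and the statistics of other tasks are untouched.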

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Attentive Multi-Task Deep Reinforcement Learning
cs.LG 2019-07 unverdicted novelty 6.0

An attention mechanism dynamically groups task knowledge at the granularity of individual states in multi-task DRL, enabling positive transfer while avoiding negative transfer, and matches or exceeds prior methods with fewer parameters.

Learning World Graphs to Accelerate Hierarchical Reinforcement Learning
cs.LG 2019-07 unverdicted novelty 6.0

A two-stage framework learns a world graph of pivotal states task-agnostically by jointly training a latent model and a curiosity-driven policy, then uses the graph to accelerate hierarchical RL on maze tasks.

Disentangled Skill Embeddings for Reinforcement Learning
cs.LG 2019-06 unverdicted novelty 6.0

Disentangled Skill Embeddings (DSE) is a variational inference framework for multi-task RL that combines shared parameters with task-specific latent embeddings; the embeddings support generalization to unseen conditions and serve as skills in hierarchical RL.

TOPPO: Rethinking PPO for Multi-Task Reinforcement Learning with Critic Balancing
cs.AI 2026-05 unverdicted novelty 5.0

TOPPO reformulates PPO with critic balancing to address gradient ill-conditioning in multi-task RL and reports stronger mean and tail performance than SAC baselines on Meta-World+ using fewer parameters and steps.