Many-Goals Reinforcement Learning

Junhyuk Oh; Satinder Singh; Vivek Veeriah

arxiv: 1806.09605 · v1 · pith:7VVFXH6Vnew · submitted 2018-06-22 · 💻 cs.LG · cs.AI · stat.ML

Vivek Veeriah, Junhyuk Oh, Satinder Singh

classification 💻 cs.LG · cs.AI · stat.ML

keywords updating · explore · many-goals · goals · task · used · better · extensions

All-goals updating exploits the off-policy nature of Q-learning to update all possible goals an agent could have from each transition in the world, and was introduced into Reinforcement Learning (RL) by Kaelbling (1993). In prior work this was mostly explored in small-state RL problems that allowed tabular representations and where all possible goals could be explicitly enumerated and learned separately. In this paper we empirically explore 3 different extensions of the idea of updating many (instead of all) goals in the context of RL with deep neural networks (or DeepRL for short). First, in a direct adaptation of Kaelbling's approach we explore if many-goals updating can be used to achieve mastery in non-tabular visual-observation domains. Second, we explore whether many-goals updating can be used to pre-train a network to subsequently learn faster and better on a single main task of interest. Third, we explore whether many-goals updating can be used to provide auxiliary task updates in training a network to learn faster and better on a single main task of interest. We provide comparisons to baselines for each of the 3 extensions.
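
To make the core idea concrete, here is a minimal tabular sketch of all-goals updating, written for illustration and not taken from the paper. It assumes that every state doubles as a goal, that goal g pays a reward of 1 when the next state equals g and 0 otherwise, and that reaching g ends the episode for that goal. The function name `all_goals_update`, the table layout, and the values of `alpha` and `gamma` are hypothetical choices.

```python
import numpy as np

def all_goals_update(Q, s, a, s_next, alpha=0.5, gamma=0.9):
    """Apply one Q-learning update per goal from a single transition.

    Q has shape (n_goals, n_states, n_actions); goal g means "reach state g".
    The 0/1 goal-reaching reward below is an assumption made for this sketch.
    """
    goals = np.arange(Q.shape[0])
    reached = goals == s_next                 # goal g is achieved iff s_next == g
    reward = reached.astype(Q.dtype)          # assumed reward: 1 on reaching g, else 0
    bootstrap = Q[:, s_next, :].max(axis=1)   # max over a' of Q_g(s_next, a'), per goal
    # No bootstrapping for the goal that was just reached (terminal for that goal).
    target = reward + gamma * (~reached) * bootstrap
    # Off-policy step: every goal learns from the same (s, a, s_next),
    # regardless of which goal the behaviour policy was pursuing.
    Q[:, s, a] += alpha * (target - Q[:, s, a])
    return Q

# Usage: a 5-state problem with 2 actions, one observed transition 2 -> 3.
Q = np.zeros((5, 5, 2))
all_goals_update(Q, s=2, a=1, s_next=3)
```

Because Q-learning is off-policy, the single transition above updates the value estimate for every goal at once, which is the property the abstract refers to. With neural networks the goal table is no longer enumerable, which is why the paper considers updating many goals instead of all of them.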

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Unifying Goal-Conditioned RL and Unsupervised Skill Learning via Control-Maximization
cs.LG 2026-05 unverdicted novelty 8.0

GCRL and MISL are unified as control maximization, with three inequivalent GCRL formulations each matched to a MISL objective via bounds on goal-sensitivity.

Preemptive Solving of Future Problems: Multitask Preplay in Humans and Machines
cs.LG 2025-07 unverdicted novelty 7.0

Multitask Preplay replays experience from pursued tasks as starting points for counterfactual simulation of unpursued tasks to learn predictive representations that support fast generalization in humans and machines.

Goal-Conditioned Agents that Learn Everything All at Once
cs.LG 2026-05 unverdicted novelty 6.0

LEO enables efficient all-goals learning in goal-conditioned RL by jointly predicting for all goals in one network pass, yielding >250x speedup over relabelling and better performance on Craftax.