
arxiv: 1806.09605 · v1 · pith:7VVFXH6Vnew · submitted 2018-06-22 · 💻 cs.LG · cs.AI · stat.ML

Many-Goals Reinforcement Learning

classification 💻 cs.LG cs.AI stat.ML
keywords updating, explore, many-goals, goals, task, used, better, extensions

All-goals updating exploits the off-policy nature of Q-learning to update all possible goals an agent could have from each transition in the world, and was introduced into Reinforcement Learning (RL) by Kaelbling (1993). In prior work this was mostly explored in small-state RL problems that allowed tabular representations and where all possible goals could be explicitly enumerated and learned separately. In this paper we empirically explore 3 different extensions of the idea of updating many (instead of all) goals in the context of RL with deep neural networks (or DeepRL for short). First, in a direct adaptation of Kaelbling's approach we explore if many-goals updating can be used to achieve mastery in non-tabular visual-observation domains. Second, we explore whether many-goals updating can be used to pre-train a network to subsequently learn faster and better on a single main task of interest. Third, we explore whether many-goals updating can be used to provide auxiliary task updates in training a network to learn faster and better on a single main task of interest. We provide comparisons to baselines for each of the 3 extensions.
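The all-goals idea in the first sentence of the abstract can be illustrated with a minimal tabular sketch. This is not the paper's deep-network method. It assumes a toy 1-D chain world, treats every state as a possible goal, and uses a hypothetical reward of 1 on reaching the goal and 0 otherwise. The names `all_goals_update` and `run_chain` are made up for illustration.

```python
import random


def all_goals_update(Q, s, a, s_next, alpha=0.5, gamma=0.9):
    """Apply one Q-learning update per goal from a single transition.

    Q[g][s][a] is the value of action a in state s when the goal is state g.
    Because Q-learning is off-policy, the same (s, a, s_next) transition can
    update every goal's table, whichever goal the agent was pursuing.
    """
    for g in range(len(Q)):
        reached = (s_next == g)
        reward = 1.0 if reached else 0.0  # assumed goal-reaching reward
        # Do not bootstrap past the goal: reaching g ends the episode for goal g.
        bootstrap = 0.0 if reached else gamma * max(Q[g][s_next])
        Q[g][s][a] += alpha * (reward + bootstrap - Q[g][s][a])


def run_chain(n_states=5, steps=5000, seed=0):
    """Random walk on a chain; action 0 moves left, action 1 moves right."""
    rng = random.Random(seed)
    Q = [[[0.0, 0.0] for _ in range(n_states)] for _ in range(n_states)]
    s = rng.randrange(n_states)
    for _ in range(steps):
        a = rng.randrange(2)  # purely random behaviour policy
        s_next = min(max(s + (1 if a == 1 else -1), 0), n_states - 1)
        all_goals_update(Q, s, a, s_next)
        s = s_next
    return Q


if __name__ == "__main__":
    Q = run_chain()
    # Greedy action from state 0 toward goal 4, and from state 4 toward goal 0.
    print(max(range(2), key=lambda a: Q[4][0][a]))
    print(max(range(2), key=lambda a: Q[0][4][a]))
```

The behaviour policy here never pursues any particular goal, yet each goal's table still receives an update on every step. The many-goals variants in the abstract update a subset of goals rather than all of them, which this sketch does not attempt to reproduce.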

This paper has not been read by Pith yet.


Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Unifying Goal-Conditioned RL and Unsupervised Skill Learning via Control-Maximization

    cs.LG 2026-05 unverdicted novelty 8.0

    GCRL and MISL are unified as control maximization, with three inequivalent GCRL formulations each matched to a MISL objective via bounds on goal-sensitivity.

  2. Preemptive Solving of Future Problems: Multitask Preplay in Humans and Machines

    cs.LG 2025-07 unverdicted novelty 7.0

    Multitask Preplay replays experience from pursued tasks as starting points for counterfactual simulation of unpursued tasks to learn predictive representations that support fast generalization in humans and machines.

  3. Goal-Conditioned Agents that Learn Everything All at Once

    cs.LG 2026-05 unverdicted novelty 6.0

    LEO enables efficient all-goals learning in goal-conditioned RL by jointly predicting for all goals in one network pass, yielding >250x speedup over relabelling and better performance on Craftax.