Universal Successor Features Approximators

2018 · cs.LG · arXiv 1812.07626

5 Pith papers cite this work. Polarity classification is still indexing.

abstract

The ability of a reinforcement learning (RL) agent to learn about many reward functions at the same time has many potential benefits, such as the decomposition of complex tasks into simpler ones, the exchange of information between tasks, and the reuse of skills. We focus on one aspect in particular, namely the ability to generalise to unseen tasks. Parametric generalisation relies on the interpolation power of a function approximator that is given the task description as input; one of its most common forms is the universal value function approximator (UVFA). Another way to generalise to new tasks is to exploit structure in the RL problem itself. Generalised policy improvement (GPI) combines solutions of previous tasks into a policy for the unseen task; this relies on instantaneous policy evaluation of old policies under the new reward function, which is made possible through successor features (SFs). Our proposed universal successor features approximators (USFAs) combine the advantages of all of these, namely the scalability of UVFAs, the instant inference of SFs, and the strong generalisation of GPI. We discuss the challenges involved in training a USFA and its generalisation properties, and demonstrate its practical benefits and transfer abilities on a large-scale domain in which the agent has to navigate a three-dimensional environment from a first-person perspective.
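
The GPI step described in the abstract can be illustrated with a minimal sketch. It assumes the standard successor-features setting in which the reward is linear in some features, so that the value of an old policy under a new task vector `w` is the dot product of that policy's successor features with `w`. The function name `gpi_action`, the data layout, and the toy numbers are all hypothetical and are not taken from the paper; the sketch covers only the tabular GPI step, not the USFA network itself.

```python
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def gpi_action(
    psi: List[List[List[float]]], w: Sequence[float]
) -> Tuple[int, List[List[float]]]:
    """Generalised policy improvement at a single state (illustrative only).

    psi[i][a] holds the successor features of old policy i for action a
    at the current state (assumed layout); w is the new task vector.
    """
    # Evaluate every old policy under the new task: q[i][a] = psi[i][a] . w
    q = [[dot(sf, w) for sf in per_action] for per_action in psi]
    n_actions = len(q[0])
    # For each action, keep the best value over all old policies.
    best = [max(q_i[a] for q_i in q) for a in range(n_actions)]
    # Act greedily with respect to that maximum.
    action = max(range(n_actions), key=lambda a: best[a])
    return action, q


# Toy example with two old policies, two actions, two features (made-up numbers).
psi = [
    [[1.0, 0.0], [0.0, 0.5]],  # policy 0: SFs for actions 0 and 1
    [[0.2, 0.2], [0.0, 2.0]],  # policy 1: SFs for actions 0 and 1
]
w_new = [0.0, 1.0]  # new task rewards only the second feature
action, q = gpi_action(psi, w_new)
```

Because evaluating an old policy on the new task is a single dot product, no further learning is needed to act on the unseen task, which is the "instant inference" property the abstract attributes to SFs.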

citation-role summary

background 1

citation-polarity summary

unclear 1

representative citing papers

Preemptive Solving of Future Problems: Multitask Preplay in Humans and Machines

cs.LG · 2025-07-08 · unverdicted · novelty 7.0

Multitask Preplay replays experience from pursued tasks as starting points for counterfactual simulation of unpursued tasks to learn predictive representations that support fast generalization in humans and machines.

Goal-Conditioned Agents that Learn Everything All at Once

cs.LG · 2026-05-22 · unverdicted · novelty 6.0

LEO enables efficient all-goals learning in goal-conditioned RL by jointly predicting for all goals in one network pass, yielding >250x speedup over relabelling and better performance on Craftax.

Robust Remote Reinforcement Learning over Unreliable Communication Channels using Homomorphic State Encoding

cs.LG · 2025-08-11 · unverdicted · novelty 6.0

HR3L enables robust remote RL training over unreliable channels via homomorphic state encoding without gradient exchange, outperforming prior methods in sample efficiency and adapting to packet loss, delays, and bandwidth limits.

When Dynamics Shift, Robust Task Inference Wins: Offline Imitation Learning with Behavior Foundation Models Revisited

cs.LG · 2026-05-16 · unverdicted · novelty 5.0

Robust minimax task inference in BFMs achieves dynamics-shift robustness from nominal offline data alone and outperforms standard baselines.

Intention-Conditioned Flow Occupancy Models

cs.LG · 2025-06-10 · unverdicted · novelty 5.0

InFOM applies flow matching to model intention-conditioned occupancy measures for RL pre-training, reporting 1.8x median return gains and 36% higher success rates on benchmarks.

citing papers explorer

Showing 5 of 5 citing papers.

Preemptive Solving of Future Problems: Multitask Preplay in Humans and Machines
cs.LG · 2025-07-08 · unverdicted · none · ref 18 · internal anchor
Goal-Conditioned Agents that Learn Everything All at Once
cs.LG · 2026-05-22 · unverdicted · none · ref 30 · internal anchor
Robust Remote Reinforcement Learning over Unreliable Communication Channels using Homomorphic State Encoding
cs.LG · 2025-08-11 · unverdicted · none · ref 30 · internal anchor
When Dynamics Shift, Robust Task Inference Wins: Offline Imitation Learning with Behavior Foundation Models Revisited
cs.LG · 2026-05-16 · unverdicted · none · ref 13 · internal anchor
Intention-Conditioned Flow Occupancy Models
cs.LG · 2025-06-10 · unverdicted · none · ref 12 · internal anchor
