pith. machine review for the scientific record.

arxiv: 1606.02396 · v1 · submitted 2016-06-08 · 📊 stat.ML · cs.AI · cs.LG · cs.NE

Recognition: unknown

Deep Successor Reinforcement Learning

Authors on Pith: no claims yet
classification 📊 stat.ML · cs.AI · cs.LG · cs.NE
keywords successor · reward · given · learning · deep · reinforcement · state · value
original abstract

Learning robust value functions given raw observations and rewards is now possible with model-free and model-based deep reinforcement learning algorithms. There is a third alternative, called Successor Representations (SR), which decomposes the value function into two components -- a reward predictor and a successor map. The successor map represents the expected future state occupancy from any given state and the reward predictor maps states to scalar rewards. The value function of a state can be computed as the inner product between the successor map and the reward weights. In this paper, we present DSR, which generalizes SR within an end-to-end deep reinforcement learning framework. DSR has several appealing properties including: increased sensitivity to distal reward changes due to factorization of reward and world dynamics, and the ability to extract bottleneck states (subgoals) given successor maps trained under a random policy. We show the efficacy of our approach on two diverse environments given raw pixel observations -- simple grid-world domains (MazeBase) and the Doom game engine.
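The decomposition the abstract describes can be sketched in the tabular case: the successor map is the discounted expected future state occupancy under a fixed policy, and the value function is its inner product with the reward weights. The toy chain MDP, transition matrix `P`, and reward vector `w` below are illustrative assumptions, not the paper's DSR network, which learns both components end-to-end from pixels.

```python
import numpy as np

gamma = 0.9

# Transition matrix under a fixed policy on a 3-state chain:
# state i moves to state i+1; the last state is absorbing.
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])

# Successor map: M[s, s'] = expected discounted future occupancy of s'
# starting from s, i.e. M = (I - gamma * P)^-1 in the tabular case.
M = np.linalg.inv(np.eye(3) - gamma * P)

# Reward predictor: scalar reward per state (reward only at the goal state).
w = np.array([0.0, 0.0, 1.0])

# Value function as the inner product of successor rows with reward weights.
V = M @ w
print(V)  # -> [ 8.1  9.  10. ]
```

Changing `w` alone (a "distal reward change") immediately yields a new value function without relearning `M`, which is the sensitivity property the abstract highlights; rows of `M` with unusually high occupancy mass concentrated on a few states are what the bottleneck-state (subgoal) extraction exploits.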

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. A Reward-Free Viewpoint on Multi-Objective Reinforcement Learning

    cs.LG 2026-04 unverdicted novelty 7.0

    Adapting RFRL objectives as auxiliary tasks with preference-guided exploration outperforms prior MORL methods in performance and data efficiency on MO-Gymnasium tasks.

  2. Spectral Alignment in Forward-Backward Representations via Temporal Abstraction

    cs.LG 2026-03 unverdicted novelty 4.0

    Temporal abstraction functions as a low-pass filter on transition dynamics to lower the effective rank of successor representations while bounding value function error in forward-backward learning.