Temporal Abstraction in Reinforcement Learning with the Successor Representation

Andre Barreto; Doina Precup; Marlos C. Machado; Michael Bowling

arxiv: 2110.05740 · v3 · pith:OBKRCVP3new · submitted 2021-10-12 · 💻 cs.LG · cs.AI

Marlos C. Machado, Andre Barreto, Doina Precup, Michael Bowling

classification 💻 cs.LG cs.AI

keywords options, representation, results, abstraction, discovery, learning, option, temporal

Reasoning at multiple levels of temporal abstraction is one of the key attributes of intelligence. In reinforcement learning, this is often modeled through temporally extended courses of action called options. Options allow agents to make predictions and to operate at different levels of abstraction within an environment. Nevertheless, approaches based on the options framework often start with the assumption that a reasonable set of options is known beforehand. When this is not the case, there is no definitive answer as to which options one should consider. In this paper, we argue that the successor representation (SR), which encodes states based on the pattern of state visitation that follows them, can be seen as a natural substrate for the discovery and use of temporal abstractions. To support our claim, we take a big-picture view of recent results, showing how the SR can be used to discover options that facilitate either temporally extended exploration or planning. We cast these results as instantiations of a general framework for option discovery in which the agent's representation is used to identify useful options, which are then used to further improve its representation. This results in a virtuous, never-ending cycle in which both the representation and the options are constantly refined based on each other. Beyond option discovery itself, we also discuss how the SR allows us to augment a set of options into a combinatorially large counterpart without additional learning. This is achieved through the combination of previously learned options. Our empirical evaluation focuses on options discovered for exploration and on the use of the SR to combine them. The results of our experiments shed light on important design decisions involved in the definition of options and demonstrate the synergy of different methods based on the SR, such as eigenoptions and the option keyboard.
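To make the abstract's description of the SR concrete, the following is a minimal sketch and not the paper's implementation. It assumes a tabular setting with a fixed policy, so that the SR is the expected discounted number of future visits to each state, i.e. the sum over t of gamma^t P^t, where P is the state-to-state transition matrix under that policy. The 4-state chain, the uniform random-walk policy, the discount gamma = 0.9, and the function name are all illustrative assumptions.

```python
def successor_representation(P, gamma=0.9, iters=500):
    """Approximate the tabular SR by iterating Psi <- I + gamma * P @ Psi.

    For gamma < 1 and a row-stochastic P this converges to
    sum_t gamma^t P^t, the expected discounted state-visitation counts.
    """
    n = len(P)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    psi = [row[:] for row in identity]
    for _ in range(iters):
        psi = [
            [
                identity[i][j]
                + gamma * sum(P[i][k] * psi[k][j] for k in range(n))
                for j in range(n)
            ]
            for i in range(n)
        ]
    return psi


# Hypothetical 4-state chain under a uniform random walk; at either end,
# the move that would leave the chain keeps the agent in place.
P = [
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.5, 0.5],
]

psi = successor_representation(P, gamma=0.9)

# Row i of psi is the SR encoding of state i: states that tend to be
# visited soon after i receive larger entries than distant ones.
for row in psi:
    print([round(x, 3) for x in row])
```

In this sketch each row sums to 1 / (1 - gamma), because every time step contributes exactly one discounted visit. Nearby states get larger entries than distant ones, which is the sense in which the SR encodes a state by the visitation pattern that follows it.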

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Dynamic Latent Routing
cs.LG 2026-05 unverdicted novelty 7.0

Dynamic Latent Routing jointly learns discrete latent codes, routing policies, and model parameters via dynamic search to match or exceed supervised fine-tuning by 6.6 points on average in low-data settings across fou...

Exploration and Online Transfer with Behavioral Foundation Models
cs.AI 2026-06 unverdicted novelty 6.0

Proposes framing online zero-shot RL transfer as a bandit problem solved by BFMs, deriving eigenvalue minimization of an uncertainty matrix for exploration under linear reward approximation, validated on a simple environment.

Exploration and Online Transfer with Behavioral Foundation Models
cs.AI 2026-06 unverdicted novelty 6.0

Frames online zero-shot transfer with BFMs as a bandit problem and derives an eigenvalue-minimization exploration strategy under linear reward approximation.