arxiv: 1906.09480 · v1 · pith:RII6RVZ7new · submitted 2019-06-22 · 📊 stat.ML · cs.LG · cs.NE · q-bio.NC

A neurally plausible model learns successor representations in partially observable environments

Pith reviewed 2026-05-25 17:53 UTC · model grok-4.3

classification 📊 stat.ML · cs.LG · cs.NE · q-bio.NC
keywords successor representations · partially observable environments · distributional codes · reinforcement learning · neural plausibility · value function · uncertainty representation · noisy observations

The pith

A model extends the distributed distributional code to successor features, enabling neurally plausible reinforcement learning from noisy partial observations where direct policy learning fails.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a model that learns successor representations in partially observable noisy environments by extending a distributed code for uncertainty. This provides a middle ground between model-based and model-free reinforcement learning, supporting fast value computation and adaptation to reward changes. A sympathetic reader would care because many real-world tasks, such as navigation or predator avoidance, involve inferring hidden states from sensory noise rather than observing them directly. The approach shows that such representations can yield effective policies even when standard methods cannot learn them. It grounds the mechanism in a framework intended to match neural computation.

Core claim

We introduce a neurally plausible model using distributional successor features, which builds on the distributed distributional code for the representation and computation of uncertainty, and which allows for efficient value function computation in partially observed environments via the successor representation. We show that distributional successor features can support reinforcement learning in noisy environments in which direct learning of successful policies is infeasible.

What carries the argument

Distributional successor features: an extension of the distributed distributional code that represents uncertainty over future states to compute values under partial observability.
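
As a rough illustration of the underlying idea, the sketch below shows how a distributed distributional code (DDC) can stand in for a belief: the distribution is summarised by the expected values of a fixed set of encoding functions, and expectations of other functions are read out linearly. This is a toy under stated assumptions, not the paper's model: the Gaussian-bump encoding functions, the helper names `features` and `f`, the 1-D latent state, and all numerical settings are hypothetical choices made here.

```python
import numpy as np

def features(s, centers, width=0.5):
    """Hypothetical Gaussian-bump encoding functions psi_i(s)."""
    s = np.atleast_1d(s)[:, None]
    return np.exp(-0.5 * ((s - centers) / width) ** 2)

def f(s):
    """Example function whose expectation we want under the belief."""
    return s ** 2

rng = np.random.default_rng(0)
centers = np.linspace(-4.0, 4.0, 17)  # assumed tuning-curve centres

# A belief over a 1-D latent state, represented here by samples.
samples = rng.normal(loc=0.8, scale=0.4, size=5000)

# DDC-style code: expected value of each encoding function under the belief.
mu = features(samples, centers).mean(axis=0)

# Fit a linear readout alpha so that f(s) ~ alpha . psi(s) on a grid.
grid = np.linspace(-3.0, 3.0, 400)
alpha, *_ = np.linalg.lstsq(features(grid, centers), f(grid), rcond=None)

# The expectation of f under the belief is then approximated by alpha . mu.
approx = float(alpha @ mu)
exact = float(np.mean(f(samples)))
print(f"linear readout: {approx:.3f}, sample average: {exact:.3f}")
```

The point of the design is that once the code `mu` is available, any quantity that is approximately linear in the encoding functions (a reward or a value, for instance) can be obtained with a single dot product, without representing the full distribution.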

If this is right

  • Enables efficient value function computation without requiring full state observability.
  • Supports rapid adaptation when the reward function or goal locations change.
  • Yields successful policies in noisy settings where direct learning of policies is infeasible.
  • Produces representations whose features match patterns seen in neural responses.
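
The first two points can be made concrete with the standard tabular successor representation in a fully observed setting. The sketch below is not the paper's distributional algorithm; the 5-state ring, the random-walk policy, the reward vectors, and the belief `b` are hypothetical choices for illustration.

```python
import numpy as np

n_states, gamma = 5, 0.9

# Random-walk policy on a ring of 5 states (hypothetical toy environment).
P = np.zeros((n_states, n_states))
for s in range(n_states):
    P[s, (s - 1) % n_states] = 0.5
    P[s, (s + 1) % n_states] = 0.5

# Successor representation: M = sum_k gamma^k P^k = (I - gamma P)^{-1}.
M = np.linalg.inv(np.eye(n_states) - gamma * P)

def value(reward):
    """Value under the fixed policy is a linear readout of the SR."""
    return M @ reward

r_old = np.array([0.0, 0.0, 1.0, 0.0, 0.0])
r_new = np.array([1.0, 0.0, 0.0, 0.0, 0.0])  # the goal moves to state 0

v_old = value(r_old)
v_new = value(r_new)  # M is reused; only the reward vector changed

# With an uncertain state, a belief b gives the expected value b . (M r).
# This holds here because the random-walk policy ignores observations.
b = np.array([0.1, 0.6, 0.3, 0.0, 0.0])
expected_value = float(b @ v_new)
```

Because `M` depends only on the policy and the dynamics, moving the goal requires one matrix-vector product rather than relearning values from scratch.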

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same code could be tested for its ability to handle multi-step planning under sensory noise in larger state spaces.
  • If neural recordings show population codes matching the distributional successor features during partial-observation tasks, that would align with the model's predictions.
  • The framework suggests a route to combine successor representations with other forms of uncertainty propagation in sequential decision problems.

Load-bearing premise

The distributed distributional code for uncertainty can be extended to successor features while preserving neural plausibility and enabling efficient computation in partially observed settings.

What would settle it

A simulation of a noisy partially observable task in which the model produces no better policies than direct policy learning methods that the paper claims are infeasible.

Figures

Figures reproduced from arXiv: 1906.09480 by Eszter Vértes, Maneesh Sahani.

Figure 1. Learning and inference in a state-space model parametrized by a DDC. (a) The structure of …
Figure 2. Value functions computed using successor features under a random walk policy
Figure 3. Value functions computed by SFs under the learned policy. Top row shows reward and …
Figure 1. Learning and inference in the DDC state-space model
Original abstract

Animals need to devise strategies to maximize returns while interacting with their environment based on incoming noisy sensory observations. Task-relevant states, such as the agent's location within an environment or the presence of a predator, are often not directly observable but must be inferred using available sensory information. Successor representations (SR) have been proposed as a middle-ground between model-based and model-free reinforcement learning strategies, allowing for fast value computation and rapid adaptation to changes in the reward function or goal locations. Indeed, recent studies suggest that features of neural responses are consistent with the SR framework. However, it is not clear how such representations might be learned and computed in partially observed, noisy environments. Here, we introduce a neurally plausible model using distributional successor features, which builds on the distributed distributional code for the representation and computation of uncertainty, and which allows for efficient value function computation in partially observed environments via the successor representation. We show that distributional successor features can support reinforcement learning in noisy environments in which direct learning of successful policies is infeasible.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper proposes a neurally plausible model that extends the distributed distributional code to successor features, enabling learning of successor representations (SR) in partially observable environments. It claims this approach supports efficient value function computation and reinforcement learning in noisy POMDPs where direct policy learning is infeasible, building on prior work on distributional codes for uncertainty representation.

Significance. If the simulations and derivations hold, the work provides a concrete mechanism linking neural uncertainty representations to SR-based RL, offering potential explanations for biological value computation in uncertain settings and a practical algorithm for POMDPs. The modeling proposal is grounded in existing frameworks and addresses a clear gap in applying SR to partial observability.

minor comments (3)
  1. The abstract and introduction would benefit from a brief explicit statement of the key equations defining the distributional successor features (e.g., how the code for successor distributions is updated) to allow readers to assess the extension from the base distributed distributional code without immediately consulting the methods.
  2. Figure captions should include more detail on simulation parameters, such as noise levels in observations and number of trials, to make the results in the POMDP experiments reproducible from the figures alone.
  3. Notation for the successor features and value computation should be unified across sections; currently the transition from the standard SR to the distributional version is not always signposted with equation references.
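
For orientation on the first comment, the fully observed relations that such a statement would presumably start from are sketched below. These are the standard successor-feature definitions from the literature (Dayan 1993; Barreto et al. 2017), not the paper's distributional equations, which are not reproduced in the text above. The symbols φ (state features), w (reward weights), ψ^π (successor features), γ (discount factor), and α (learning rate) are notation chosen here.

```latex
\begin{align}
  r(s) &= w^\top \phi(s), \\
  \psi^\pi(s) &= \mathbb{E}_\pi\!\left[\sum_{k=0}^{\infty} \gamma^{k}\,
      \phi(s_{t+k}) \,\middle|\, s_t = s\right], \\
  V^\pi(s) &= w^\top \psi^\pi(s), \\
  \psi(s_t) &\leftarrow \psi(s_t) + \alpha\left[\phi(s_t)
      + \gamma\,\psi(s_{t+1}) - \psi(s_t)\right].
\end{align}
```

The third line follows from the first two by linearity of expectation, and the last line is the usual temporal-difference update for learning ψ from experience.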

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of the manuscript, the accurate summary of our contribution, and the recommendation for minor revision. No specific major comments were provided in the report.

Circularity Check

0 steps flagged

Minor self-citation to prior distributional code work; central model extension remains independent

full rationale

The paper introduces a new model for distributional successor features in POMDPs by extending the distributed distributional code. It explicitly references building on that prior framework, but the abstract presents the extension itself (neural plausibility, efficient value computation via SR, support for RL where direct policies fail) as the novel contribution without any shown equations, fitted parameters, or self-defined terms that reduce the claimed result to its inputs by construction. No load-bearing uniqueness theorems, ansatzes, or renaming of known results are evident from the provided text. This is a standard modeling proposal whose support would be evaluated externally; the self-citation is not circular.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Abstract-only review limits visibility into parameters and assumptions; the model relies on extension of a prior distributional code framework whose details are not re-derived here.

free parameters (1)
  • model hyperparameters for feature learning and distribution parameters
    Likely present for training the successor features but unspecified in abstract
axioms (1)
  • domain assumption: The distributed distributional code provides a neurally plausible representation of uncertainty that can be extended to successor features.
    The model is described as building directly on this prior framework.

pith-pipeline@v0.9.0 · 5715 in / 1096 out tokens · 35144 ms · 2026-05-25T17:53:42.892987+00:00 · methodology
