
arxiv: 2605.27929 · v1 · pith:2DD4PFBDnew · submitted 2026-05-27 · 🧬 q-bio.NC · cs.LG

Exploratory Experience Shapes the Geometry of Predictive Representations

Pith reviewed 2026-06-29 09:36 UTC · model grok-4.3

classification 🧬 q-bio.NC cs.LG
keywords exploration · exploitation · predictive coding · representational geometry · maze navigation · latent space · mouse behavior · active sensing

The pith

Exploratory behavior produces more spatially organized predictive representations that preserve maze transitions in both agents and mice.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how the balance between exploration and exploitation in behavior shapes the geometry of internal predictive representations. It introduces an artificial agent whose controllable parameter switches between information-gain-driven exploration and reward-driven exploitation while updating a predictive-coding model of future states. Exploratory regimes yield latent representations organized around spatial locations and transition structure, whereas exploitative regimes produce less organized ones. When the identical model is trained on real mouse trajectories through the same maze, animals with broader visitation patterns generate geometries that align with those of exploratory agents, while restricted patterns align with exploitative ones. This establishes a direct link between behavioral regime and the structure of learned predictive models.

Core claim

Exploratory agents develop representations that are more spatially organized and better preserve the structure of maze transitions in latent space. In contrast, exploitative agents learn less organized representations. More exploratory mice show representational geometries that closely match those of exploratory agents.

What carries the argument

Predictive-coding perception model updated from the agent's own trajectories, predicting future maze states and reward probability, with a controllable parameter selecting actions by expected information gain during exploration or by predicted reward during exploitation.
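
As a rough illustration of the switching described above, the following Python sketch picks an action under either regime. It is not the paper's implementation: the function name `select_action`, the dictionaries `eig` and `value`, and the choice to draw the regime independently at every step with probability `p_reward` are all assumptions made for illustration.

```python
import random

def select_action(valid_actions, eig, value, p_reward, rng=random):
    """Pick one action from valid_actions.

    eig, value: dicts mapping action -> non-negative score. They are
    hypothetical stand-ins for the model's expected information gain and
    its reward-derived value map.
    p_reward: probability of acting in the reward-driven regime, assumed
    here to be applied independently at each step.
    """
    if rng.random() < p_reward:
        # Reward-driven regime: act greedily on the value map.
        return max(valid_actions, key=lambda a: value[a])
    # Exploratory regime: sample in proportion to expected information gain.
    weights = [eig[a] for a in valid_actions]
    if sum(weights) <= 0:
        # No informative action: fall back to a uniform choice.
        return rng.choice(valid_actions)
    return rng.choices(valid_actions, weights=weights, k=1)[0]
```

With `p_reward = 1.0` the sketch always exploits, and with `p_reward = 0.0` it always explores, so intermediate values interpolate between the two regimes.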

Load-bearing premise

That training the same predictive-coding model on mouse trajectories produces representations directly comparable to those from the artificial agent, with differences in mouse visitation patterns reflecting the same exploratory versus exploitative regimes as the agent's parameter.

What would settle it

The claim would be undermined if training the model on trajectories from the most exploratory mice produced latent geometries that fail to match the spatial organization and transition preservation seen in exploratory agents.

Figures

Figures reproduced from arXiv: 2605.27929 by Abdelrahman Sharafeldin, Advay Balakrishnan, Hannah Choi, Kseniia Shilova.

Figure 1: Predictive-coding agent with exploration-exploitation switching. Top: Overview of the action-perception loop. At each step, the agent evaluates locally valid actions using its current predictive model. In the exploratory regime, actions are sampled according to expected information gain (EIG); in the reward-driven regime, actions are selected using a value map constructed from learned reward predictions an…
Figure 2: Behavioral consequences of exploration-exploitation balance. A: Example trajectories of an exploratory agent and a more reward-driven agent. The reward-driven agent visits the water port more often, whereas the exploratory agent samples a broader range of maze branches. B: Evolution of the learned reward map over training. The reward map is constructed from recent experience by averaging predicted reward a…
Figure 3: Exploration shapes the geometry of predictive latent representations in agents and mice. A: UMAP visualizations of prior latent states for exploratory and reward-driven agents, and for two example mice. Exploratory agents develop depth-aligned, branched latent spaces with clearer separation between outward (root-to-leaf) and inward (leaf-to-root) transitions. More reward-driven agents show less organized l…
Figure 4: Additional task and mouse-trajectory visualizations. A: Binary-tree maze used in the task. Nodes are colored and labeled by index; node 116 is the fixed water-port node. B: Example trajectories from exploratory mouse B5 and more reward-focused mouse C8. Color indicates normalized step within the plotted bouts. C8 repeatedly follows trajectories toward the water-port, whereas B5 samples the maze more broadl…
Figure 5: Additional analyses of reward-map learning and behavioral regimes. A: Transformation of the learned reward map into a value map. B: Example learned reward maps from individual agents, shown without averaging across random seeds. C: Relationship between reward-driven switching probability and behavioral regime. Top: Duration of reward-driven episodes as a function of the switching probability p. Bottom: Ex…
Figure 6: Single-unit spatial and path-transition tuning. A: Selected units from the recurrent state h_t show spatial tuning across the maze. Examples include units tuned to broad maze regions, smaller subregions, terminal leaves, and specific paths. Similar tuning patterns are observed in a long-trained agent and in a combined-mouse model. B: Path-transition tuning of h_t units along an example trajectory. Columns co…
Abstract

Active sensing links behavior and learning through an action-perception loop: actions determine the observations used to update internal predictive models of perception, which subsequently guide the next actions. Predictive-coding frameworks provide a natural way to model this process, since internal representations are continuously updated to predict future observations. Here, we ask how exploratory and exploitative behavioral strategies shape these internal predictive representations. We build an online learning agent in a tree-like maze with a controllable parameter regulating the balance between exploratory and exploitative regimes. The agent updates a predictive-coding-based perception model from experience generated by its own behavior. The model predicts both future maze states and reward probability, allowing the agent to select actions either by expected information gain during exploration or by predicted reward during exploitation. We show that the resulting internal predictive representations depend strongly on the agent's behavioral regime. Exploratory agents develop representations that are more spatially organized and better preserve the structure of maze transitions in latent space. In contrast, exploitative agents learn less organized representations. We then train this predictive model on natural trajectories of water-deprived mice navigating the same maze and compare the resulting representations with those learned from agent trajectories. More exploratory mice show representational geometries that closely match those of exploratory agents, whereas mice with more restricted visitation patterns resemble reward-driven, exploitative agents. Together, these findings suggest that exploration enables predictive models to form generalized internal representations by organizing latent space around both spatial location and transition context in artificial agents and animals.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper claims that a predictive-coding agent in a tree-like maze develops more spatially organized latent representations that better preserve transition structure when its behavior is biased toward exploration (via a controllable balance parameter) rather than exploitation; the same model trained on water-deprived mouse trajectories yields geometries that align with the exploratory-agent regime for mice showing broader visitation and with the exploitative regime for mice showing restricted visitation, suggesting that exploratory experience shapes generalized internal predictive representations in both artificial agents and animals.

Significance. If the central comparison holds after controlling for experienced transition statistics, the result would link behavioral regime directly to the geometry of predictive representations and provide a concrete, testable bridge between controllable artificial agents and biological data.

major comments (1)
  1. [Results on mouse–agent representational comparison] The central mouse–agent alignment claim (abstract and Results on mouse trajectories) rests on the assumption that post-hoc partitioning of mice by visitation metric isolates the same exploratory/exploitative regime as the agent’s controllable balance parameter. The abstract gives no indication that training data volume, visit counts, or empirical transition matrices were equalized across conditions before geometry comparison; if the reported metrics (spatial organization, transition preservation) are sensitive to the distribution of experienced transitions rather than the regime per se, the alignment could be an artifact of unequal coverage.
minor comments (2)
  1. [Methods] Clarify the exact definition and units of the “balance parameter” and how it is held fixed versus varied across agent runs.
  2. [Results] Specify the precise geometry metrics (e.g., which distance or correlation measure quantifies “spatial organization” and “transition preservation”) and report effect sizes with confidence intervals.
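
For concreteness, one candidate for a "transition preservation" measure is the correlation between pairwise latent distances and path lengths in the maze graph. The sketch below is a hypothetical example of such a metric, not the one used in the paper. It assumes a heap-style node labeling (root = 1, children 2n and 2n + 1) and a single latent vector per node, and the names `tree_distance` and `distance_preservation` are illustrative.

```python
import math
from itertools import combinations

def tree_distance(a, b):
    """Path length between nodes a and b in a heap-labeled binary tree (root = 1)."""
    steps = 0
    while a != b:
        # The larger label is never shallower, so move it one step toward the root.
        if a > b:
            a //= 2
        else:
            b //= 2
        steps += 1
    return steps

def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def distance_preservation(latents):
    """latents: dict mapping node label -> latent vector (sequence of floats).

    Returns the correlation between maze path lengths and Euclidean latent
    distances over all node pairs; values near 1 mean the latent layout
    mirrors the maze graph.
    """
    graph_d, latent_d = [], []
    for a, b in combinations(sorted(latents), 2):
        graph_d.append(tree_distance(a, b))
        latent_d.append(math.dist(latents[a], latents[b]))
    return pearson(graph_d, latent_d)
```

A rank correlation or a confidence interval obtained by bootstrapping over node pairs would be natural variations on this; which of these the authors use is exactly what the comment asks them to state.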

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive comments, which highlight an important methodological consideration for the mouse–agent comparison. We address the concern point-by-point below and will incorporate revisions to strengthen the manuscript.

  1. Referee: [Results on mouse–agent representational comparison] The central mouse–agent alignment claim (abstract and Results on mouse trajectories) rests on the assumption that post-hoc partitioning of mice by visitation metric isolates the same exploratory/exploitative regime as the agent’s controllable balance parameter. The abstract gives no indication that training data volume, visit counts, or empirical transition matrices were equalized across conditions before geometry comparison; if the reported metrics (spatial organization, transition preservation) are sensitive to the distribution of experienced transitions rather than the regime per se, the alignment could be an artifact of unequal coverage.

    Authors: We agree that equalizing experienced transition statistics is essential to isolate the effect of behavioral regime from coverage differences. The visitation metric used for partitioning directly operationalizes the regime (broader vs. restricted exploration), and the agent’s controllable parameter produces analogous differences in coverage. However, the referee correctly notes that the abstract does not explicitly state controls for data volume or transition matrices. In the revision we will: (1) report visit counts, trajectory numbers, and transition-matrix statistics for each mouse group in the abstract and Results; (2) add a supplementary analysis that subsamples mouse trajectories to match the empirical transition distributions and total visit counts of the agent conditions as closely as possible, then recompute the geometry metrics (spatial organization and transition preservation); and (3) verify whether the mouse–agent alignment persists under these matched conditions. If the alignment remains, it supports that regime per se shapes the representations; if not, we will qualify the claim accordingly. This control directly addresses the potential artifact raised. revision: yes
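
The matching step proposed in item (2) could be sketched roughly as follows. This is an illustrative assumption about how such a control might be set up, not the authors' procedure: it treats transitions as independent events, whereas a real pipeline would likely need to subsample whole bouts so that sequence structure is preserved. The function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction

def transition_counts(trajectory):
    """Count (state, next_state) pairs in a sequence of visited nodes."""
    return Counter(zip(trajectory, trajectory[1:]))

def matched_quota(source_counts, target_counts):
    """Per-transition sample sizes that follow the target proportions.

    The quotas are scaled down until no transition asks for more samples
    than the source data contains. Transitions absent from the target get
    no quota. Exact fractions avoid floating-point rounding in the floor.
    """
    scale = min(
        Fraction(source_counts.get(t, 0), c)
        for t, c in target_counts.items()
        if c > 0
    )
    return {t: int(scale * c) for t, c in target_counts.items()}
```

For example, with a mouse trajectory `[1, 2, 1, 2, 1, 3, 1]` as the source and an agent trajectory `[1, 2, 1, 3, 1, 2, 1, 3, 1]` as the target, the scarce `1 → 3` and `3 → 1` transitions in the mouse data limit every quota to one sample. If any target transition is missing from the source, all quotas drop to zero, which is itself a useful signal that coverage cannot be matched.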

Circularity Check

0 steps flagged

No circularity: representations shaped by regime-specific trajectories; mouse comparison is external validation


The paper constructs an agent whose behavior parameter controls trajectory statistics, then trains the same predictive model on those trajectories and measures geometry metrics on the resulting latents. This is an empirical demonstration that different input distributions produce different geometries, not a self-definitional loop or a fitted parameter renamed as prediction. The mouse analysis partitions real trajectories post-hoc by a visitation metric and applies the identical model; no equations or self-citations are shown that would make the geometry metrics reduce to the regime parameter by construction. The derivation chain is self-contained against the external mouse data and does not rely on load-bearing self-citation.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The claim rests on the predictive-coding framework as a model of the action-perception loop and on the assumption that the same model architecture can be trained on both synthetic and biological trajectories.

free parameters (1)
  • balance parameter between exploration and exploitation
    Controllable parameter regulating the balance between exploratory and exploitative regimes in the agent.
axioms (1)
  • domain assumption: Predictive-coding frameworks provide a natural way to model the action-perception loop
    Stated directly in the abstract as the modeling basis.



A Methods

A.1 Maze environment

The maze is represented as a full binary tree with N = 127...
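
A full binary tree with 127 nodes has seven levels (127 = 2^7 - 1), so a maze of this shape can be sketched in a few lines. The heap-style labeling below (root = 1, children 2n and 2n + 1) is an assumption made for illustration; the source does not state how nodes are indexed, and the helper names are hypothetical.

```python
N_NODES = 127  # from the source; equals 2**7 - 1

def neighbors(node, n_nodes=N_NODES):
    """Locally valid moves from `node` under the assumed heap labeling."""
    moves = []
    if node > 1:
        moves.append(node // 2)  # inward step, toward the root
    for child in (2 * node, 2 * node + 1):
        if child <= n_nodes:
            moves.append(child)  # outward step, toward a leaf
    return moves

def depth(node):
    """Number of edges between `node` and the root (the root has depth 0)."""
    return node.bit_length() - 1
```

Under this labeling the root offers two moves, interior nodes offer three, and each of the 64 leaves offers only the step back toward the root.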