pith. machine review for the scientific record.

arxiv: 2604.08756 · v1 · submitted 2026-04-09 · 💻 cs.AI

Recognition: unknown

Artifacts as Memory Beyond the Agent Boundary

Authors on Pith · no claims yet

Pith reviewed 2026-05-10 16:44 UTC · model grok-4.3

classification 💻 cs.AI
keywords reinforcement learning · external memory · situated cognition · artifacts · history representation · RL memory · environmental resources

The pith

In reinforcement learning, certain persistent observations, called artifacts, let the environment carry part of the agent's history, reducing the information the agent must store internally to represent it.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a mathematical framing in which the environment functions as part of an agent's memory through persistent observations. It defines artifacts as elements of the sensory stream that carry history information and proves that they lower the internal representation burden. Experiments show that agents navigating with visible spatial paths need less memory to reach good policies, with the benefit emerging automatically from what the agent sees. The work translates the situated-cognition idea into standard RL terms without requiring special mechanisms, suggesting that agents can sometimes solve tasks by treating their surroundings as external storage rather than encoding everything internally.
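A minimal sketch of our own (not code from the paper) makes the idea concrete: in a corridor task, a persistent trail of visited cells lets a memoryless policy succeed where, without the trail, the same policy has no way to recover its direction of travel.

```python
# Hypothetical toy illustration (not the paper's code): a corridor where the
# agent must reach the right end without oscillating. With a persistent
# "trail" artifact (visited cells stay marked), a memoryless policy that
# simply avoids marked cells solves the task; without the trail, the same
# memoryless policy has nothing to condition on and loops forever.

def run(corridor_len=10, artifact_visible=True, max_steps=50):
    pos, visited = 0, {0}
    for _ in range(max_steps):
        if pos == corridor_len - 1:
            return True  # reached the goal
        # Memoryless policy: look only at the neighbouring cells' observations.
        neighbours = [p for p in (pos - 1, pos + 1) if 0 <= p < corridor_len]
        if artifact_visible:
            # Prefer an unmarked neighbour; the trail encodes the history.
            fresh = [p for p in neighbours if p not in visited]
            pos = fresh[0] if fresh else neighbours[0]
        else:
            # No artifact: the policy cannot tell which way it came from.
            pos = neighbours[0]  # deterministically oscillates near the start
        visited.add(pos)
    return False

assert run(artifact_visible=True) is True
assert run(artifact_visible=False) is False
```

Here the trail plays the role of an artifact: the history ("which cells I have visited") lives in the environment, so the policy needs no internal state at all.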

Core claim

We introduce a mathematical framing for how the environment can functionally serve as an agent's memory, and prove that certain observations, which we call artifacts, can reduce the information needed to represent history. We corroborate our theory with experiments showing that when agents observe spatial paths, the amount of memory required to learn a performant policy is reduced. Interestingly, this effect arises unintentionally, and implicitly through the agent's sensory stream. We discuss the implications of our findings, and show they satisfy qualitative properties previously used to ground accounts of external memory.
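One way to unpack the claimed reduction, in notation of our own (the paper's exact theorem statement may differ):

```latex
% Hedged sketch, not the paper's theorem. Let $H_t$ be the history up to
% time $t$ and $O_t$ the current observation. An internal state sufficient
% for control must resolve whatever history information the observation
% does not already carry:
%   \text{internal burden} \;\approx\; H(H_t \mid O_t).
% If $O_t$ contains an artifact $A_t = f(O_t)$ with $I(H_t; A_t) > 0$, then
%   H(H_t \mid O_t) \;=\; H(H_t) - I(H_t; O_t) \;\le\; H(H_t) - I(H_t; A_t),
% since $I(H_t; O_t) \ge I(H_t; f(O_t))$ by the data-processing inequality.
% The internal burden shrinks by at least the history information the
% environment persists on the agent's behalf.
```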

What carries the argument

Artifacts: persistent, observable elements in the environment that carry information about past states or actions, thereby allowing a reduction in the agent's internal history representation.

If this is right

  • Agents that can observe spatial paths learn performant policies with smaller internal memory.
  • The memory reduction occurs automatically through the normal sensory stream without any explicit external-memory design.
  • The framing meets the qualitative tests previously used for accounts of external memory.
  • Environmental features can substitute for some explicit internal memory in RL agents.
  • The approach points toward identifying environments that systematically lower memory demands.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Environments could be deliberately shaped with more persistent artifacts to reduce the memory hardware needed for complex RL agents.
  • The same principle might apply to physical robots, where objects left in the workspace serve as memory without constant internal updating.
  • Non-spatial tasks could be tested to check whether artifacts produce similar memory savings outside navigation settings.
  • This framing invites comparisons with how humans use notes or tools to offload cognitive load.

Load-bearing premise

The agent's sensory observations include persistent, observable artifacts whose information content can be treated as reducing the agent's internal history representation without additional assumptions about how those artifacts are generated or maintained.

What would settle it

Run the same path-navigation tasks but block the agent's ability to observe the spatial paths or other artifacts while keeping the task identical; if the theory holds, the memory size needed for equivalent performance must increase.
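A back-of-envelope version of that prediction, under our own toy assumptions (a step-counting task where the agent may drop one pebble per step):

```python
# Hedged toy, not the paper's benchmark. The task is to report how many
# steps elapsed in an episode. If a pebble pile (a persistent artifact)
# is visible, the answer can be read off the environment with zero
# internal bits; masking the pile forces an internal counter whose size
# grows logarithmically with the horizon.
import math

def internal_bits_needed(horizon, artifact_visible):
    if artifact_visible:
        return 0  # the pile itself encodes the count
    return math.ceil(math.log2(horizon + 1))  # must count internally

assert internal_bits_needed(100, artifact_visible=True) == 0
assert internal_bits_needed(100, artifact_visible=False) == 7
```

If blocking the artifact did not raise the required memory in the real experiments, this simple account would be falsified.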

Figures

Figures reproduced from arXiv: 2604.08756 by Amy Pajak, Esra'a Saleh, Fraser Mince, John D. Martin.

Figure 1
Figure 1. Page Keeping example: transitions are labeled with probabilities and observations; actions are omitted for clarity. Interaction starts from s0. A is an artifact of B, and is only observed after s1. Observing A at t = 5 implies that B was observed in the past, specifically at t = 2.
Figure 2
Figure 2. The base environment used throughout experiments.
Figure 3
Figure 3. Observing a spatial path reduces the necessary capacity to navigate. Averages of total reward for Linear Q-learning (left) and DQN (right) are shown, along with standard-error bars. Learning in the presence of a shortest path improves performance for nearly every agent and capacity; many improvements are statistically significant (see Supplement B.2).
Figure 4
Figure 4. Performance improves when agents observe the shortest path: average reward tends to increase when the shortest path is visible. This holds for nearly every capacity of Linear-Q and DQN; the effect appears most significant for higher-capacity systems but is also stark in the low-capacity regime. Averages and standard-error regions are computed with 30 seeds.
Figure 5
Figure 5. Environments considered for learning in the presence of other fixed artifacts.
Figure 6
Figure 6. Externalization arises with other fixed artifacts: average total reward is observed for three paths and one set of geometric landmarks. Evidence of externalized memory appears across all artifacts, with the following number of instances for linear agents: Suboptimal (3), Misleading (2), Random (2), Landmarks (1); for DQN: Suboptimal (2), Landmarks (2), Random (1), Misleading (0).
Figure 7
Figure 7. Externalization with a dynamic path: the left panel illustrates the environment, in which the current policy generates a path that vanishes over time. Performance increases uniformly over nearly all capacities, and Empirical Condition 1 is satisfied for C = 256.
Figure 8
Figure 8. Experiment configuration details: hyperparameters used for linear agents (top), DQN agents (middle), and dynamic experiments (bottom).
Figure 9
Figure 9. Environments considered in Experiment 1.
Figure 10
Figure 10. Experiment 1 step-size sweeps: Linear Q-learning (top half), DQN (bottom half). The selected step-size is marked with a star.
Figure 11
Figure 11. Experiment 1 significance tests. Let Pi and Pj be the performances associated with the capacities of the row and column; reading row-wise, a green (i, j)-cell means Pi is significantly higher than Pj.
Figure 12
Figure 12. Capacity vs. performance of Linear-Q in the presence of fixed artifacts. Total reward for each artifact type is shown; each data point is an average with standard error over 30 seeds. Capacity ranges from 1^2 to 24^2 (1 to 576).
Figure 13
Figure 13. Environments considered in Experiment 2: Random (top left), Misleading (top right).
Figure 14
Figure 14. Experiment 2 step-size sweeps: Linear Q-learning (top half), DQN (bottom half).
Figure 15
Figure 15. Experiment 2 average reward: Linear-Q (top half), DQN (bottom half).
Figure 16
Figure 16. Experiment 2 linear significance tests. Let Pi and Pj be the performances associated with the capacities of the row and column; reading row-wise, a green (i, j)-cell means Pi is significantly higher than Pj.
Figure 17
Figure 17. Experiment 2 DQN significance tests, read the same way as Figure 16.
Figure 18
Figure 18. Experiment 3 dynamic-path average reward and step-size sweeps (Linear-Q).
Figure 19
Figure 19. Experiment 3 linear significance tests, read the same way as Figure 16.
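The inference pattern in Figure 1 (a persistent artifact revealing when a past observation occurred) can be sketched in a few lines, using a toy of our own construction rather than the paper's environment:

```python
# Toy illustration of Figure 1's inference (our own construction, not the
# paper's exact MDP). Observation "B" is emitted once; a persistent
# artifact "A" first appears `lag` steps later and remains visible.
# Seeing "A" therefore tells an observer that "B" happened at least `lag`
# steps ago, without the observer having stored "B" internally.

def emit(t, b_time=2, lag=3):
    if t == b_time:
        return "B"
    if t >= b_time + lag:
        return "A"  # the artifact persists once created
    return "o"      # uninformative observation

stream = [emit(t) for t in range(6)]   # ['o', 'o', 'B', 'o', 'o', 'A']

# Observing A at t = 5 implies B occurred no later than t - lag = 2:
t_seen = 5
assert stream[t_seen] == "A"
assert stream.index("B") <= t_seen - 3
```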
read the original abstract

The situated view of cognition holds that intelligent behavior depends not only on internal memory, but on an agent's active use of environmental resources. Here, we begin formalizing this intuition within Reinforcement Learning (RL). We introduce a mathematical framing for how the environment can functionally serve as an agent's memory, and prove that certain observations, which we call artifacts, can reduce the information needed to represent history. We corroborate our theory with experiments showing that when agents observe spatial paths, the amount of memory required to learn a performant policy is reduced. Interestingly, this effect arises unintentionally, and implicitly through the agent's sensory stream. We discuss the implications of our findings, and show they satisfy qualitative properties previously used to ground accounts of external memory. Moving forward, we anticipate further work on this subject could reveal principled ways to exploit the environment as a substitute for explicit internal memory.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 3 minor

Summary. The paper introduces a mathematical framing in reinforcement learning for how persistent environmental observations, called artifacts, can functionally serve as an agent's external memory by reducing the information needed to represent history. It claims to prove this reduction and corroborates it with experiments on spatial path navigation where agents observe paths, showing unintentional memory savings through the sensory stream. The work discusses implications and alignment with qualitative properties of external memory accounts.

Significance. If the central proof is rigorous and the experiments properly isolate the artifact mechanism without confounding factors, this provides a formal information-theoretic basis for situated cognition in RL, potentially enabling more efficient agents that leverage environmental persistence as memory. The unintentional emergence in experiments and parameter-free framing are notable strengths that could influence future work on externalized memory in AI systems.

major comments (1)
  1. The abstract and introduction claim a proof that artifacts reduce history representation information, but the specific theorem statement, assumptions on artifact persistence, and derivation steps (likely in the theory section) require explicit verification for rigor; without controls showing the reduction is due to artifacts rather than general observation richness, the central claim risks overgeneralization.
minor comments (3)
  1. Clarify notation for history representation and information measures early in the paper to aid readability.
  2. Add more details on experimental controls, baselines, and statistical significance in the results section to strengthen the empirical corroboration.
  3. Ensure all references to prior work on situated cognition and external memory are complete and accurately cited.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the positive review and recommendation for minor revision. We address the concern about the rigor of the central claim and the need for isolating controls below, and will update the manuscript accordingly.

read point-by-point responses
  1. Referee: The abstract and introduction claim a proof that artifacts reduce history representation information, but the specific theorem statement, assumptions on artifact persistence, and derivation steps (likely in the theory section) require explicit verification for rigor; without controls showing the reduction is due to artifacts rather than general observation richness, the central claim risks overgeneralization.

    Authors: We agree that the theorem statement, persistence assumptions, and proof steps should be made more prominent and self-contained to avoid any ambiguity. In the revision we will move the full statement of Theorem 1 and its assumptions (artifacts remain fixed in the environment across timesteps unless the agent explicitly modifies them) into the main text, expand the derivation in Section 3 with explicit intermediate steps using the chain rule on conditional mutual information, and add a new paragraph clarifying that the reduction holds only under the persistence condition. We also accept the point on controls: the current experiments compare path observations against no-path baselines, but do not fully isolate persistence from general observation richness. We have therefore run additional ablations in which the observation space is enriched with non-persistent random static features of comparable dimensionality; these yield no comparable memory reduction. The new results and figures will be inserted into Section 5 of the revised manuscript. revision: yes
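The chain-rule step the rebuttal mentions plausibly has this shape (our notation, not necessarily the revised Section 3's):

```latex
% Hedged sketch. Splitting the history information jointly available from
% the observation $O_t$ and internal state $S_t$ via the chain rule for
% mutual information:
%   I(H_t;\, O_t, S_t) \;=\; I(H_t; O_t) \;+\; I(H_t; S_t \mid O_t).
% Holding the left-hand side fixed at the level the task requires, any
% history information an artifact contributes to $I(H_t; O_t)$ directly
% reduces the residual $I(H_t; S_t \mid O_t)$ that the internal state
% must supply.
```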

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper introduces a new formal framing in RL where persistent artifacts in observations are defined to allow reduced internal history representation, then proves an information-theoretic reduction under that definition. The central claim does not reduce to fitted parameters, self-referential definitions, or load-bearing self-citations; the proof targets a general property of observation streams containing persistent elements, corroborated by experiments on spatial paths that demonstrate unintentional memory savings without presupposing the result. No step equates a prediction or theorem to its own inputs by construction, and the weakest assumption (presence of such artifacts) is stated explicitly without smuggling in the target conclusion.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on defining artifacts as a new functional category of observations and assuming standard RL observation and history models; no free parameters are mentioned.

axioms (1)
  • domain assumption The agent's observations are generated by an environment that can contain persistent, observable structures carrying historical information.
    Invoked when stating that artifacts reduce the information needed to represent history.
invented entities (1)
  • artifacts · no independent evidence
    purpose: Observations that functionally serve as external memory by reducing required internal history representation.
    New term and concept introduced to formalize the situated cognition intuition within RL.

pith-pipeline@v0.9.0 · 5441 in / 1198 out tokens · 50472 ms · 2026-05-10T16:44:51.930686+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

72 extracted references · 27 canonical work pages · 1 internal anchor

  1. [1]

    On the convergence of bounded agents

David Abel, André Barreto, Hado van Hasselt, Benjamin Van Roy, Doina Precup, and Satinder Singh. On the convergence of bounded agents. arXiv preprint arXiv:2307.11044, 2023

  2. [2]

Rules of the Mind

    John R. Anderson. Rules of the Mind. Lawrence Erlbaum Associates, Hillsdale, NJ, 1993. ISBN 978-0805812343

  3. [3]

    Ronald C. Arkin. Behavior-based Robotics. The MIT Press, Cambridge, MA, 1998

  4. [4]

    The Routledge Handbook of Philosophy of Memory

    Sven Bernecker and Kourken Michaelian (eds.). The Routledge Handbook of Philosophy of Memory. Routledge, New York, 1st edition, 2017. ISBN 9781138909366

  5. [5]

Mathematical Statistics: Basic Ideas and Selected Topics

Peter J. Bickel and Kjell A. Doksum. Mathematical Statistics: Basic Ideas and Selected Topics, Volumes I--II Package. Chapman and Hall/CRC, 1st edition, 2015. ISBN 9781498740319

  6. [6]

    Model-Free Episodic Control

    Charles Blundell, Benigno Uria, Alexander Pritzel, Yazhe Li, Avraham Ruderman, Joel Z. Leibo, Jack Rae, Daan Wierstra, and Demis Hassabis. Model-free episodic control. arXiv preprint arXiv:1606.04460, 2016

  7. [7]

Image noise models

Charles G. Boncelet. Image noise models. In Al Bovik (ed.), Handbook of Image and Video Processing, pp. 397--409. Academic Press, 2nd edition, 2005. ISBN 9780121197926. doi:10.1016/B978-012119792-6/50087-5

  8. [8]

    Observational learning by reinforcement learning

    Diana Borsa, Nicolas Heess, Bilal Piot, Siqi Liu, Leonard Hasenclever, Remi Munos, and Olivier Pietquin. Observational learning by reinforcement learning. In Proceedings of the 18th International Conference on Autonomous Agents and Multi-agent Systems, AAMAS '19, pp.\ 1117–1124, Richland, SC, 2019. International Foundation for Autonomous Agents and Multia...

  9. [9]

    Settling the reward hypothesis

    Michael Bowling, John D Martin, David Abel, and Will Dabney. Settling the reward hypothesis. In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett (eds.), Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine Learning Research, pp.\ 3003--3020. PMLR, 2...

  10. [10]

    Reinforcement learning applied to linear quadratic regulation

    Steven Bradtke. Reinforcement learning applied to linear quadratic regulation. In S. Hanson, J. Cowan, and C. Giles (eds.), Advances in Neural Information Processing Systems, volume 5. Morgan-Kaufmann, 1992

  11. [11]

Quantized feedback stabilization of linear systems

Roger W. Brockett and Daniel Liberzon. Quantized feedback stabilization of linear systems. IEEE Transactions on Automatic Control, 45(7): 1279--1289, July 2000. doi:10.1109/9.867021

  12. [12]

    Rodney A. Brooks. A robust layered control system for a mobile robot. IEEE Journal on Robotics and Automation, 2 0 (1): 0 14--23, 1986. doi:10.1109/JRA.1986.1087032

  13. [13]

    Rodney A. Brooks. Intelligence without representation. Artificial Intelligence, 47 0 (1): 0 139--159, 1991. ISSN 0004-3702. doi:https://doi.org/10.1016/0004-3702(91)90053-M

  14. [14]

    Language models are few-shot learners

    Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33: 0 1877--1901, 2020

  15. [15]

    Being there: Putting brain, body, and world together again

    Andy Clark. Being there: Putting brain, body, and world together again. MIT press, 1998

  16. [16]

    The extended mind

    Andy Clark and David Chalmers. The extended mind. Analysis, 58 0 (1): 0 7--19, 1998

  17. [17]

Stabilizing a linear system with quantized state feedback

David F. Delchamps. Stabilizing a linear system with quantized state feedback. IEEE Transactions on Automatic Control, 35(8): 916--924, August 1990

  18. [18]

    Simple agent, complex environment: efficient reinforcement learning with agent states

    Shi Dong, Benjamin Van Roy, and Zhengyuan Zhou. Simple agent, complex environment: efficient reinforcement learning with agent states. Journal of Machine Learning Research, 23 0 (1), January 2022. ISSN 1532-4435

  19. [19]

Discovering faster matrix multiplication algorithms with reinforcement learning

Alhussein Fawzi, Matej Balog, Aja Huang, Thomas Hubert, Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Francisco J. R. Ruiz, Julian Schrittwieser, Grzegorz Swirszcz, et al. Discovering faster matrix multiplication algorithms with reinforcement learning. Nature, 610(7930): 47--53, 2022

  20. [20]

The Cognitive Neurosciences

Michael S. Gazzaniga (ed.). The Cognitive Neurosciences. The MIT Press, 2009. ISBN 9780262303101. doi:10.7551/mitpress/8029.001.0001

  21. [21]

Reinforcement learning and episodic memory in humans and animals: An integrative framework

Samuel J. Gershman and Nathaniel D. Daw. Reinforcement learning and episodic memory in humans and animals: An integrative framework. Annual Review of Psychology, 68: 101--128, 2017. ISSN 1545-2085. doi:10.1146/annurev-psych-122414-033625

  22. [22]

    La reconstruction du nid et les coordinations interindividuelles chez bellicositermes natalensis et cubitermes sp

Pierre-Paul Grassé. La reconstruction du nid et les coordinations interindividuelles chez Bellicositermes natalensis et Cubitermes sp. La théorie de la stigmergie: essai d'interprétation du comportement des termites constructeurs. Insectes Sociaux, 6(1): 41--80, March 1959. ISSN 1420-9098. doi:10.1007/BF02223791

  23. [23]

    Deep recurrent q-learning for partially observable mdps

    Matthew Hausknecht and Peter Stone. Deep recurrent q-learning for partially observable mdps. In AAAI Fall Symposium Series: Sequential Decision Making for Intelligent Agents, pp.\ 29--37, 2015

  24. [24]

    Varieties of artifacts: Embodied, perceptual, cognitive, and affective

    Richard Heersmink. Varieties of artifacts: Embodied, perceptual, cognitive, and affective. Topics in Cognitive Science, 13 0 (4): 0 573--596, 2021. doi:10.1111/tops.12549

  25. [25]

    Stigmergy as a universal coordination mechanism: Components, varieties and applications

    Francis Heylighen. Stigmergy as a universal coordination mechanism: Components, varieties and applications. https://pespmc1.vub.ac.be/Papers/Stigmergy-Springer.pdf, 2015

  26. [26]

    Generalizable episodic memory for deep reinforcement learning

    Hao Hu, Jianing Ye, Guangxiang Zhu, Zhizhou Ren, and Chongjie Zhang. Generalizable episodic memory for deep reinforcement learning. In Marina Meila and Tong Zhang (eds.), Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pp.\ 4380--4390. PMLR, 18--24 Jul 2021

  27. [27]

    Cognition in the Wild

    Edwin Hutchins. Cognition in the Wild. The MIT Press, 02 1995. ISBN 9780262275972

  28. [28]

    Cognitive artifacts

    Edwin Hutchins. Cognitive artifacts. In The MIT Encyclopedia of the Cognitive Sciences , pp.\ 126--127. MIT Press, Cambridge, MA, 2001

  29. [29]

    Universal Artificial Intelligence

    Marcus Hutter. Universal Artificial Intelligence. Texts in Theoretical Computer Science. An EATCS Series. Springer, Berlin, Heidelberg, 1st edition, 2005. ISBN 978-3-540-22139-5. doi:10.1007/b138233

  30. [30]

The act of remembering: a study in partially observable reinforcement learning

Rodrigo Toro Icarte, Richard Valenzano, Toryn Q. Klassen, Phillip Christoffersen, Amir-massoud Farahmand, and Sheila A. McIlraith. The act of remembering: a study in partially observable reinforcement learning, 2020

  31. [31]

    Observable operator models for discrete stochastic time series

    Herbert Jaeger. Observable operator models for discrete stochastic time series. Neural Computation, 12 0 (6): 0 1371--1398, 06 2000. ISSN 0899-7667. doi:10.1162/089976600300015411

  32. [32]

Planning and acting in partially observable stochastic domains

Leslie Pack Kaelbling, Michael L. Littman, and Anthony R. Cassandra. Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101(1-2): 99--134, 1998. ISSN 0004-3702. doi:10.1016/S0004-3702(98)00023-X

  33. [33]

    Scaling Laws for Neural Language Models

    Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. Scaling laws for neural language models. arXiv preprint arXiv:2001.08361, 2020

  34. [34]

    Arbaaz Khan, Clark Zhang, Nikolay Atanasov, Konstantinos Karydis, Vijay Kumar, and Daniel D. Lee. Memory augmented control networks. In International Conference on Learning Representations, 2018

  35. [35]

    Stanley B. Klein. What memory is. WIREs Cognitive Science, 6 0 (1): 0 1--38, 2015. doi:https://doi.org/10.1002/wcs.1333

  36. [36]

    Hippocampal contributions to control: The third way

Máté Lengyel and Peter Dayan. Hippocampal contributions to control: The third way. In J. Platt, D. Koller, Y. Singer, and S. Roweis (eds.), Advances in Neural Information Processing Systems, volume 20, pp. 889--896, 2007

  37. [37]

    Self-improving reactive agents based on reinforcement learning, planning and teaching

    Long-Ji Lin. Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning, 8 0 (3): 0 293--321, May 1992. ISSN 1573-0565. doi:10.1007/BF00992699

  38. [38]

    Episodic memory deep q-networks

    Zichuan Lin, Tianqi Zhao, Guangwen Yang, and Lintao Zhang. Episodic memory deep q-networks. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18 , pp.\ 2433--2439. International Joint Conferences on Artificial Intelligence Organization, 7 2018. doi:10.24963/ijcai.2018/337

  39. [39]

    Michael L. Littman. Memoryless policies: theoretical limitations and practical results. In Proceedings of the Third International Conference on Simulation of Adaptive Behavior: From Animals to Animats 3: From Animals to Animats 3, SAB94, pp.\ 238–245, Cambridge, MA, USA, 1994. The MIT Press. ISBN 0262531224

  40. [40]

Predictive representations of state

Michael L. Littman, Richard S. Sutton, and Satinder Singh. Predictive representations of state. In T. Dietterich, S. Becker, and Z. Ghahramani (eds.), Advances in Neural Information Processing Systems, volume 14, pp. 1555--1561, 2001

  41. [41]

    Reinforcement Learning for Embedded Agents Facing Complex Tasks

Mario Martín Muñoz. Reinforcement Learning for Embedded Agents Facing Complex Tasks. PhD thesis, Universitat Politècnica de Catalunya, Barcelona, Spain, 1998

  42. [42]

    Cognitive integration and the extended mind

    Richard Menary. Cognitive integration and the extended mind. The extended mind, pp.\ 227--243, 2010

  43. [43]

    Is external memory memory? biological memory and extended mind

    Kourken Michaelian. Is external memory memory? biological memory and extended mind. Consciousness and Cognition, 21 0 (3): 0 1154--1165, 2012. ISSN 1053-8100. doi:https://doi.org/10.1016/j.concog.2012.04.008

  44. [44]

    Kourken Michaelian and John Sutton. Memory . In Edward N. Zalta (ed.), The Stanford Encyclopedia of Philosophy . Metaphysics Research Lab, Stanford University, S ummer 2017 edition, 2017

  45. [45]

Human-level control through deep reinforcement learning

    Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. Human-level control through deep reinforcement l...

  46. [46]

    Unified Theories of Cognition

    Allen Newell. Unified Theories of Cognition. Harvard University Press, Cambridge, MA, 1990. ISBN 9780674920996

  47. [47]

    Control of memory, active perception, and action in minecraft

    Junhyuk Oh, Valliappa Chockalingam, Satinder Singh, and Honglak Lee. Control of memory, active perception, and action in minecraft. In Maria Florina Balcan and Kilian Q. Weinberger (eds.), Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pp.\ 2790--2799, New York, New York, USA, 2...

  48. [48]

    Empirical design in reinforcement learning

    Andrew Patterson, Samuel Neumann, Martha White, and Adam White. Empirical design in reinforcement learning. Journal of Machine Learning Research, 25 0 (318): 0 1--63, 2024. URL https://jmlr.org/papers/v25/23-0183.html

  49. [49]

    Learning policies with external memory

    Leonid Peshkin, Nicolas Meuleau, and Leslie Pack Kaelbling. Learning policies with external memory. In Proceedings of the 16th International Conference on Machine Learning, ICML '99, pp.\ 307–314, San Francisco, CA, USA, 1999. Morgan Kaufmann Publishers Inc. ISBN 1558606122

  50. [50]

    Neural episodic control

Alexander Pritzel, Benigno Uria, Sriram Srinivasan, Adrià Puigdomènech Badia, Oriol Vinyals, Demis Hassabis, Daan Wierstra, and Charles Blundell. Neural episodic control. In Doina Precup and Yee Whye Teh (eds.), Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pp. 2827--28...

  51. [51]

BDI agents: From theory to practice

Anand S. Rao and Michael P. Georgeff. BDI agents: From theory to practice. In Proceedings of the First International Conference on Multi-Agent Systems (ICMAS-95), pp. 312--319, 1995

  52. [52]

    Slime mold uses an externalized spatial “memory” to navigate in complex environments

    Chris R Reid, Tanya Latty, Audrey Dussutour, and Madeleine Beekman. Slime mold uses an externalized spatial “memory” to navigate in complex environments. Proceedings of the National Academy of Sciences, 109 0 (43): 0 17490--17494, 2012

  53. [53]

    Cognitive stigmergy: Towards a framework based on agents and artifacts

    Alessandro Ricci, Andrea Omicini, Mirko Viroli, Luca Gardelli, and Enrico Oliva. Cognitive stigmergy: Towards a framework based on agents and artifacts. In Danny Weyns, H. Van Dyke Parunak, and Fabien Michel (eds.), Environments for Multi-Agent Systems III, pp.\ 124--140. Springer, 2007. ISBN 978-3-540-71103-2

  54. [54]

    The Analysis of Mind

    Bertrand Russell. The Analysis of Mind. G. Allen & Unwin, London, 1921

  55. [55]

    Prioritized experience replay

    Tom Schaul, John Quan, Ioannis Antonoglou, and David Silver. Prioritized experience replay. In International Conference on Learning Representations, 2016

  56. [56]

    Mastering the game of go with deep neural networks and tree search

    David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. Mastering the game of go with deep neural networks and tree search. nature, 529 0 (7587): 0 484--489, 2016

  57. [57]

    Matthew Sims and Julian Kiverstein. Externalized memory in slime mould and the extended (non-neuronal) mind. Cognitive Systems Research, 73: 26--35, 2022. ISSN 1389-0417. doi:10.1016/j.cogsys.2021.12.001

  58. [58]

    Satinder Singh, Michael R. James, and Matthew R. Rudary. Predictive state representations: a new theory for modeling dynamical systems. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, UAI '04, pp. 512--519, Arlington, Virginia, USA, 2004. AUAI Press. ISBN 0974903906

  59. [59]

    Satinder P. Singh, Tommi Jaakkola, and Michael I. Jordan. Learning without state-estimation in partially observable markovian decision processes. In William W. Cohen and Haym Hirsh (eds.), Machine Learning Proceedings 1994, pp. 284--292. Morgan Kaufmann, San Francisco (CA), 1994. ISBN 978-1-55860-335-6. doi:10.1016/B978-1-55860-335-6.50042-8

  60. [60]

    Kim Sterelny. Minds: extended or scaffolded? Phenomenology and the Cognitive Sciences, 9(4): 465--481, December 2010. ISSN 1572-8676. doi:10.1007/s11097-010-9174-y

  61. [61]

    David J.T Sumpter and Madeleine Beekman. From nonlinearity to optimality: pheromone trail foraging by ants. Animal Behaviour, 66(2): 273--280, 2003. ISSN 0003-3472. doi:10.1006/anbe.2003.2224

  62. [62]

    John Sutton. Constructive memory and distributed cognition: Towards an interdisciplinary framework. In B. Kokinov and W. Hirst (eds.), Constructive Memory, pp. 290--303. New Bulgarian University, 2003

  63. [63]

    Richard S. Sutton. The bitter lesson. Incomplete Ideas (blog), 2019. URL http://www.incompleteideas.net/IncIdeas/BitterLesson.html

  64. [64]

    Richard S. Sutton. The quest for a common model of the intelligent decision maker. In Proceedings of the 5th Multi-disciplinary Conference on Reinforcement Learning and Decision Making (RLDM 2022), Providence, Rhode Island, USA, 2022

  65. [65]

    Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. The MIT Press, Cambridge, MA, 2nd edition, 2018

  66. [66]

    Massimiliano Tamborski and David Abel. Memory allocation in resource-constrained reinforcement learning. arXiv preprint arXiv:2506.17263, 2025

  67. [67]

    B. Thierry, G. Theraulaz, J.Y. Gautier, and B. Stiegler. Joint memory. Behavioural Processes, 35(1): 127--140, 1995. ISSN 0376-6357. doi:10.1016/0376-6357(95)00039-9. Cognition and Evolution

  68. [68]

    Endel Tulving. Episodic and semantic memory. In Organization of Memory, pp. 381--403. Academic Press, London, UK, 1972

  69. [69]

    Christopher J. C. H. Watkins and Peter Dayan. Q-learning. Machine Learning, 8(3--4): 279--292, May 1992. ISSN 1573-0565. doi:10.1007/BF00992698

  70. [70]

    Edward O. Wilson. Chemical communication among workers of the fire ant Solenopsis saevissima (Fr. Smith) 1. The organization of mass-foraging. Animal Behaviour, 10(1): 134--147, 1962. ISSN 0003-3472. doi:10.1016/0003-3472(62)90141-0

  71. [71]

    Guangxiang Zhu*, Zichuan Lin*, Guangwen Yang, and Chongjie Zhang. Episodic reinforcement learning with associative memory. In International Conference on Learning Representations, 2020

  72. [72]

    write newline

    " write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION format.date year duplicate empty "emp...