pith. machine review for the scientific record.

arxiv: 2604.08756 · v1 · submitted 2026-04-09 · 💻 cs.AI

Recognition: unknown

Artifacts as Memory Beyond the Agent Boundary

Authors on Pith · no claims yet

Pith reviewed 2026-05-10 16:44 UTC · model grok-4.3

classification 💻 cs.AI
keywords reinforcement learning · external memory · situated cognition · artifacts · history representation · RL memory · environmental resources

The pith

In reinforcement learning, certain persistent observations, called artifacts, let the environment carry part of the agent's history, reducing the information the agent must store internally to represent it.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a mathematical framing in which the environment functions as part of an agent's memory through persistent observations. It defines artifacts as elements of the sensory stream that carry history information and proves that they lower the internal representation burden. Experiments show that agents navigating with visible spatial paths need less memory to reach good policies, with the benefit emerging automatically from what the agent sees. The work translates the situated-cognition idea into standard RL terms without requiring special mechanisms, suggesting that agents can sometimes solve tasks by treating their surroundings as external storage rather than encoding everything internally.
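A minimal sketch of our own (not code from the paper) makes the idea concrete: in a corridor task, a persistent trail of visited cells lets a memoryless policy succeed where, without the trail, the same policy has no way to recover its direction of travel.

```python
# Hypothetical toy illustration (not the paper's code): a corridor where the
# agent must reach the right end without oscillating. With a persistent
# "trail" artifact (visited cells stay marked), a memoryless policy that
# simply avoids marked cells solves the task; without the trail, the same
# memoryless policy has nothing to condition on and loops forever.

def run(corridor_len=10, artifact_visible=True, max_steps=50):
    pos, visited = 0, {0}
    for _ in range(max_steps):
        if pos == corridor_len - 1:
            return True  # reached the goal
        # Memoryless policy: look only at the neighbouring cells' observations.
        neighbours = [p for p in (pos - 1, pos + 1) if 0 <= p < corridor_len]
        if artifact_visible:
            # Prefer an unmarked neighbour; the trail encodes the history.
            fresh = [p for p in neighbours if p not in visited]
            pos = fresh[0] if fresh else neighbours[0]
        else:
            # No artifact: the policy cannot tell which way it came from.
            pos = neighbours[0]  # deterministically oscillates near the start
        visited.add(pos)
    return False

assert run(artifact_visible=True) is True
assert run(artifact_visible=False) is False
```

Here the trail plays the role of an artifact: the history ("which cells I have visited") lives in the environment, so the policy needs no internal state at all.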

Core claim

We introduce a mathematical framing for how the environment can functionally serve as an agent's memory, and prove that certain observations, which we call artifacts, can reduce the information needed to represent history. We corroborate our theory with experiments showing that when agents observe spatial paths, the amount of memory required to learn a performant policy is reduced. Interestingly, this effect arises unintentionally, and implicitly through the agent's sensory stream. We discuss the implications of our findings, and show they satisfy qualitative properties previously used to ground accounts of external memory.
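One way to unpack the claimed reduction, in notation of our own (the paper's exact theorem statement may differ):

```latex
% Hedged sketch, not the paper's theorem. Let $H_t$ be the history up to
% time $t$ and $O_t$ the current observation. An internal state sufficient
% for control must resolve whatever history information the observation
% does not already carry:
%   \text{internal burden} \;\approx\; H(H_t \mid O_t).
% If $O_t$ contains an artifact $A_t = f(O_t)$ with $I(H_t; A_t) > 0$, then
%   H(H_t \mid O_t) \;=\; H(H_t) - I(H_t; O_t) \;\le\; H(H_t) - I(H_t; A_t),
% since $I(H_t; O_t) \ge I(H_t; f(O_t))$ by the data-processing inequality.
% The internal burden shrinks by at least the history information the
% environment persists on the agent's behalf.
```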

What carries the argument

Artifacts: persistent, observable elements in the environment that carry information about past states or actions, thereby allowing a reduction in the agent's internal history representation.

If this is right

  • Agents that can observe spatial paths learn performant policies with smaller internal memory.
  • The memory reduction occurs automatically through the normal sensory stream without any explicit external-memory design.
  • The framing meets the qualitative tests previously used for accounts of external memory.
  • Environmental features can substitute for some explicit internal memory in RL agents.
  • The approach points toward identifying environments that systematically lower memory demands.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Environments could be deliberately shaped with more persistent artifacts to reduce the memory hardware needed for complex RL agents.
  • The same principle might apply to physical robots, where objects left in the workspace serve as memory without constant internal updating.
  • Non-spatial tasks could be tested to check whether artifacts produce similar memory savings outside navigation settings.
  • This framing invites comparisons with how humans use notes or tools to offload cognitive load.

Load-bearing premise

The agent's sensory observations include persistent, observable artifacts whose information content can be treated as reducing the agent's internal history representation without additional assumptions about how those artifacts are generated or maintained.

What would settle it

Run the same path-navigation tasks but block the agent's ability to observe the spatial paths or other artifacts while keeping the task identical; if the theory holds, the memory size needed for equivalent performance must increase.
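A back-of-envelope version of that prediction, under our own toy assumptions (a step-counting task where the agent may drop one pebble per step):

```python
# Hedged toy, not the paper's benchmark. The task is to report how many
# steps elapsed in an episode. If a pebble pile (a persistent artifact)
# is visible, the answer can be read off the environment with zero
# internal bits; masking the pile forces an internal counter whose size
# grows logarithmically with the horizon.
import math

def internal_bits_needed(horizon, artifact_visible):
    if artifact_visible:
        return 0  # the pile itself encodes the count
    return math.ceil(math.log2(horizon + 1))  # must count internally

assert internal_bits_needed(100, artifact_visible=True) == 0
assert internal_bits_needed(100, artifact_visible=False) == 7
```

If blocking the artifact did not raise the required memory in the real experiments, this simple account would be falsified.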

Figures

Figures reproduced from arXiv: 2604.08756 by Amy Pajak, Esra'a Saleh, Fraser Mince, John D. Martin.

Figure 1
Figure 1. Page Keeping example: transitions are labeled with probabilities and observations; actions are omitted for clarity. Interaction starts from s0. A is an artifact of B, and is only observed after s1. Observing A at t = 5 implies that B was observed in the past, specifically at t = 2.
Figure 2
Figure 2. The base environment used throughout experiments.
Figure 3
Figure 3. Observing a spatial path reduces the necessary capacity to navigate. Averages of total reward for Linear Q-learning (left) and DQN (right) are shown, along with standard-error bars. Learning in the presence of a shortest path improves performance for nearly every agent and capacity; many improvements are statistically significant (see Supplement B.2).
Figure 4
Figure 4. Performance improves when agents observe the shortest path: average reward tends to increase when the shortest path is visible. This holds for nearly every capacity of Linear-Q and DQN; the effect appears most significant for higher-capacity systems but is also stark in the low-capacity regime. Averages and standard-error regions are computed with 30 seeds.
Figure 5
Figure 5. Environments considered for learning in the presence of other fixed artifacts.
Figure 6
Figure 6. Externalization arises with other fixed artifacts: average total reward is observed for three paths and one set of geometric landmarks. Evidence of externalized memory appears across all artifacts, with the following number of instances for linear agents: Suboptimal (3), Misleading (2), Random (2), Landmarks (1); for DQN: Suboptimal (2), Landmarks (2), Random (1), Misleading (0).
Figure 7
Figure 7. Externalization with a dynamic path: the left panel illustrates the environment, in which the current policy generates a path that vanishes over time. Performance increases uniformly over nearly all capacities, and Empirical Condition 1 is satisfied for C = 256.
Figure 8
Figure 8. Experiment configuration details: hyperparameters used for linear agents (top), DQN agents (middle), and dynamic experiments (bottom).
Figure 9
Figure 9. Environments considered in Experiment 1.
Figure 10
Figure 10. Experiment 1 step-size sweeps: Linear Q-learning (top half), DQN (bottom half). The selected step-size is marked with a star.
Figure 11
Figure 11. Experiment 1 significance tests. Let Pi and Pj be the performances associated with the capacities of the row and column; reading row-wise, a green (i, j)-cell means Pi is significantly higher than Pj.
Figure 12
Figure 12. Capacity vs. performance of Linear-Q in the presence of fixed artifacts. Total reward for each artifact type is shown; each data point is an average with standard error over 30 seeds. Capacity ranges from 1^2 to 24^2 (1 to 576).
Figure 13
Figure 13. Environments considered in Experiment 2: Random (top left), Misleading (top right).
Figure 14
Figure 14. Experiment 2 step-size sweeps: Linear Q-learning (top half), DQN (bottom half).
Figure 15
Figure 15. Experiment 2 average reward: Linear-Q (top half), DQN (bottom half).
Figure 16
Figure 16. Experiment 2 linear significance tests. Let Pi and Pj be the performances associated with the capacities of the row and column; reading row-wise, a green (i, j)-cell means Pi is significantly higher than Pj.
Figure 17
Figure 17. Experiment 2 DQN significance tests, read the same way as Figure 16.
Figure 18
Figure 18. Experiment 3 dynamic-path average reward and step-size sweeps (Linear-Q).
Figure 19
Figure 19. Experiment 3 linear significance tests, read the same way as Figure 16.
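The inference pattern in Figure 1 (a persistent artifact revealing when a past observation occurred) can be sketched in a few lines, using a toy of our own construction rather than the paper's environment:

```python
# Toy illustration of Figure 1's inference (our own construction, not the
# paper's exact MDP). Observation "B" is emitted once; a persistent
# artifact "A" first appears `lag` steps later and remains visible.
# Seeing "A" therefore tells an observer that "B" happened at least `lag`
# steps ago, without the observer having stored "B" internally.

def emit(t, b_time=2, lag=3):
    if t == b_time:
        return "B"
    if t >= b_time + lag:
        return "A"  # the artifact persists once created
    return "o"      # uninformative observation

stream = [emit(t) for t in range(6)]   # ['o', 'o', 'B', 'o', 'o', 'A']

# Observing A at t = 5 implies B occurred no later than t - lag = 2:
t_seen = 5
assert stream[t_seen] == "A"
assert stream.index("B") <= t_seen - 3
```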
read the original abstract

The situated view of cognition holds that intelligent behavior depends not only on internal memory, but on an agent's active use of environmental resources. Here, we begin formalizing this intuition within Reinforcement Learning (RL). We introduce a mathematical framing for how the environment can functionally serve as an agent's memory, and prove that certain observations, which we call artifacts, can reduce the information needed to represent history. We corroborate our theory with experiments showing that when agents observe spatial paths, the amount of memory required to learn a performant policy is reduced. Interestingly, this effect arises unintentionally, and implicitly through the agent's sensory stream. We discuss the implications of our findings, and show they satisfy qualitative properties previously used to ground accounts of external memory. Moving forward, we anticipate further work on this subject could reveal principled ways to exploit the environment as a substitute for explicit internal memory.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 3 minor

Summary. The paper introduces a mathematical framing in reinforcement learning for how persistent environmental observations, called artifacts, can functionally serve as an agent's external memory by reducing the information needed to represent history. It claims to prove this reduction and corroborates it with experiments on spatial path navigation where agents observe paths, showing unintentional memory savings through the sensory stream. The work discusses implications and alignment with qualitative properties of external memory accounts.

Significance. If the central proof is rigorous and the experiments properly isolate the artifact mechanism without confounding factors, this provides a formal information-theoretic basis for situated cognition in RL, potentially enabling more efficient agents that leverage environmental persistence as memory. The unintentional emergence in experiments and parameter-free framing are notable strengths that could influence future work on externalized memory in AI systems.

major comments (1)
  1. The abstract and introduction claim a proof that artifacts reduce history representation information, but the specific theorem statement, assumptions on artifact persistence, and derivation steps (likely in the theory section) require explicit verification for rigor; without controls showing the reduction is due to artifacts rather than general observation richness, the central claim risks overgeneralization.
minor comments (3)
  1. Clarify notation for history representation and information measures early in the paper to aid readability.
  2. Add more details on experimental controls, baselines, and statistical significance in the results section to strengthen the empirical corroboration.
  3. Ensure all references to prior work on situated cognition and external memory are complete and accurately cited.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the positive review and recommendation for minor revision. We address the concern about the rigor of the central claim and the need for isolating controls below, and will update the manuscript accordingly.

read point-by-point responses
  1. Referee: The abstract and introduction claim a proof that artifacts reduce history representation information, but the specific theorem statement, assumptions on artifact persistence, and derivation steps (likely in the theory section) require explicit verification for rigor; without controls showing the reduction is due to artifacts rather than general observation richness, the central claim risks overgeneralization.

    Authors: We agree that the theorem statement, persistence assumptions, and proof steps should be made more prominent and self-contained to avoid any ambiguity. In the revision we will move the full statement of Theorem 1 and its assumptions (artifacts remain fixed in the environment across timesteps unless the agent explicitly modifies them) into the main text, expand the derivation in Section 3 with explicit intermediate steps using the chain rule on conditional mutual information, and add a new paragraph clarifying that the reduction holds only under the persistence condition. We also accept the point on controls: the current experiments compare path observations against no-path baselines, but do not fully isolate persistence from general observation richness. We have therefore run additional ablations in which the observation space is enriched with non-persistent random static features of comparable dimensionality; these yield no comparable memory reduction. The new results and figures will be inserted into Section 5 of the revised manuscript. revision: yes
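The chain-rule step the rebuttal mentions plausibly has this shape (our notation, not necessarily the revised Section 3's):

```latex
% Hedged sketch. Splitting the history information jointly available from
% the observation $O_t$ and internal state $S_t$ via the chain rule for
% mutual information:
%   I(H_t;\, O_t, S_t) \;=\; I(H_t; O_t) \;+\; I(H_t; S_t \mid O_t).
% Holding the left-hand side fixed at the level the task requires, any
% history information an artifact contributes to $I(H_t; O_t)$ directly
% reduces the residual $I(H_t; S_t \mid O_t)$ that the internal state
% must supply.
```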

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper introduces a new formal framing in RL where persistent artifacts in observations are defined to allow reduced internal history representation, then proves an information-theoretic reduction under that definition. The central claim does not reduce to fitted parameters, self-referential definitions, or load-bearing self-citations; the proof targets a general property of observation streams containing persistent elements, corroborated by experiments on spatial paths that demonstrate unintentional memory savings without presupposing the result. No step equates a prediction or theorem to its own inputs by construction, and the weakest assumption (presence of such artifacts) is stated explicitly without smuggling in the target conclusion.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on defining artifacts as a new functional category of observations and assuming standard RL observation and history models; no free parameters are mentioned.

axioms (1)
  • domain assumption The agent's observations are generated by an environment that can contain persistent, observable structures carrying historical information.
    Invoked when stating that artifacts reduce the information needed to represent history.
invented entities (1)
  • artifacts · no independent evidence
    purpose: Observations that functionally serve as external memory by reducing required internal history representation.
    New term and concept introduced to formalize the situated cognition intuition within RL.

pith-pipeline@v0.9.0 · 5441 in / 1198 out tokens · 50472 ms · 2026-05-10T16:44:51.930686+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

72 extracted references · 27 canonical work pages · 1 internal anchor

  1. [1]

    On the convergence of bounded agents

David Abel, André Barreto, Hado van Hasselt, Benjamin Van Roy, Doina Precup, and Satinder Singh. On the convergence of bounded agents. arXiv preprint arXiv:2307.11044, 2023

  2. [2]

Rules of the Mind

    John R. Anderson. Rules of the Mind. Lawrence Erlbaum Associates, Hillsdale, NJ, 1993. ISBN 978-0805812343

  3. [3]

    Ronald C. Arkin. Behavior-based Robotics. The MIT Press, Cambridge, MA, 1998

  4. [4]

    The Routledge Handbook of Philosophy of Memory

    Sven Bernecker and Kourken Michaelian (eds.). The Routledge Handbook of Philosophy of Memory. Routledge, New York, 1st edition, 2017. ISBN 9781138909366

  5. [5]

Mathematical Statistics: Basic Ideas and Selected Topics

Peter J. Bickel and Kjell A. Doksum. Mathematical Statistics: Basic Ideas and Selected Topics, Volumes I--II Package. Chapman and Hall/CRC, 1st edition, 2015. ISBN 9781498740319

  6. [6]

    Model-Free Episodic Control

    Charles Blundell, Benigno Uria, Alexander Pritzel, Yazhe Li, Avraham Ruderman, Joel Z. Leibo, Jack Rae, Daan Wierstra, and Demis Hassabis. Model-free episodic control. arXiv preprint arXiv:1606.04460, 2016

  7. [7]

Image noise models

Charles G. Boncelet. Image noise models. In Al Bovik (ed.), Handbook of Image and Video Processing, pp. 397--409. Academic Press, 2nd edition, 2005. ISBN 9780121197926. doi:10.1016/B978-012119792-6/50087-5

  8. [8]

    Observational learning by reinforcement learning

    Diana Borsa, Nicolas Heess, Bilal Piot, Siqi Liu, Leonard Hasenclever, Remi Munos, and Olivier Pietquin. Observational learning by reinforcement learning. In Proceedings of the 18th International Conference on Autonomous Agents and Multi-agent Systems, AAMAS '19, pp.\ 1117–1124, Richland, SC, 2019. International Foundation for Autonomous Agents and Multia...

  9. [9]

    Settling the reward hypothesis

    Michael Bowling, John D Martin, David Abel, and Will Dabney. Settling the reward hypothesis. In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett (eds.), Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine Learning Research, pp.\ 3003--3020. PMLR, 2...

  10. [10]

    Reinforcement learning applied to linear quadratic regulation

    Steven Bradtke. Reinforcement learning applied to linear quadratic regulation. In S. Hanson, J. Cowan, and C. Giles (eds.), Advances in Neural Information Processing Systems, volume 5. Morgan-Kaufmann, 1992

  11. [11]

Quantized feedback stabilization of linear systems

Roger W. Brockett and Daniel Liberzon. Quantized feedback stabilization of linear systems. IEEE Transactions on Automatic Control, 45(7): 1279--1289, July 2000. doi:10.1109/9.867021

  12. [12]

    Rodney A. Brooks. A robust layered control system for a mobile robot. IEEE Journal on Robotics and Automation, 2 0 (1): 0 14--23, 1986. doi:10.1109/JRA.1986.1087032

  13. [13]

    Rodney A. Brooks. Intelligence without representation. Artificial Intelligence, 47 0 (1): 0 139--159, 1991. ISSN 0004-3702. doi:https://doi.org/10.1016/0004-3702(91)90053-M

  14. [14]

    Language models are few-shot learners

    Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33: 0 1877--1901, 2020

  15. [15]

    Being there: Putting brain, body, and world together again

    Andy Clark. Being there: Putting brain, body, and world together again. MIT press, 1998

  16. [16]

    The extended mind

    Andy Clark and David Chalmers. The extended mind. Analysis, 58 0 (1): 0 7--19, 1998

  17. [17]

Stabilizing a linear system with quantized state feedback

David F. Delchamps. Stabilizing a linear system with quantized state feedback. IEEE Transactions on Automatic Control, 35(8): 916--924, August 1990

  18. [18]

    Simple agent, complex environment: efficient reinforcement learning with agent states

    Shi Dong, Benjamin Van Roy, and Zhengyuan Zhou. Simple agent, complex environment: efficient reinforcement learning with agent states. Journal of Machine Learning Research, 23 0 (1), January 2022. ISSN 1532-4435

  19. [19]

Discovering faster matrix multiplication algorithms with reinforcement learning

Alhussein Fawzi, Matej Balog, Aja Huang, Thomas Hubert, Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Francisco J. R. Ruiz, Julian Schrittwieser, Grzegorz Swirszcz, et al. Discovering faster matrix multiplication algorithms with reinforcement learning. Nature, 610(7930): 47--53, 2022

  20. [20]

The Cognitive Neurosciences

Michael S. Gazzaniga (ed.). The Cognitive Neurosciences. The MIT Press, 2009. ISBN 9780262303101. doi:10.7551/mitpress/8029.001.0001

  21. [21]

Reinforcement learning and episodic memory in humans and animals: An integrative framework

Samuel J. Gershman and Nathaniel D. Daw. Reinforcement learning and episodic memory in humans and animals: An integrative framework. Annual Review of Psychology, 68: 101--128, 2017. ISSN 1545-2085. doi:10.1146/annurev-psych-122414-033625

  22. [22]

    La reconstruction du nid et les coordinations interindividuelles chez bellicositermes natalensis et cubitermes sp

Pierre-Paul Grassé. La reconstruction du nid et les coordinations interindividuelles chez Bellicositermes natalensis et Cubitermes sp. La théorie de la stigmergie: essai d'interprétation du comportement des termites constructeurs. Insectes Sociaux, 6(1): 41--80, March 1959. ISSN 1420-9098. doi:10.1007/BF02223791

  23. [23]

    Deep recurrent q-learning for partially observable mdps

    Matthew Hausknecht and Peter Stone. Deep recurrent q-learning for partially observable mdps. In AAAI Fall Symposium Series: Sequential Decision Making for Intelligent Agents, pp.\ 29--37, 2015

  24. [24]

    Varieties of artifacts: Embodied, perceptual, cognitive, and affective

    Richard Heersmink. Varieties of artifacts: Embodied, perceptual, cognitive, and affective. Topics in Cognitive Science, 13 0 (4): 0 573--596, 2021. doi:10.1111/tops.12549

  25. [25]

    Stigmergy as a universal coordination mechanism: Components, varieties and applications

    Francis Heylighen. Stigmergy as a universal coordination mechanism: Components, varieties and applications. https://pespmc1.vub.ac.be/Papers/Stigmergy-Springer.pdf, 2015

  26. [26]

    Generalizable episodic memory for deep reinforcement learning

    Hao Hu, Jianing Ye, Guangxiang Zhu, Zhizhou Ren, and Chongjie Zhang. Generalizable episodic memory for deep reinforcement learning. In Marina Meila and Tong Zhang (eds.), Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pp.\ 4380--4390. PMLR, 18--24 Jul 2021

  27. [27]

    Cognition in the Wild

    Edwin Hutchins. Cognition in the Wild. The MIT Press, 02 1995. ISBN 9780262275972

  28. [28]

    Cognitive artifacts

    Edwin Hutchins. Cognitive artifacts. In The MIT Encyclopedia of the Cognitive Sciences , pp.\ 126--127. MIT Press, Cambridge, MA, 2001

  29. [29]

    Universal Artificial Intelligence

    Marcus Hutter. Universal Artificial Intelligence. Texts in Theoretical Computer Science. An EATCS Series. Springer, Berlin, Heidelberg, 1st edition, 2005. ISBN 978-3-540-22139-5. doi:10.1007/b138233

  30. [30]

The act of remembering: a study in partially observable reinforcement learning

Rodrigo Toro Icarte, Richard Valenzano, Toryn Q. Klassen, Phillip Christoffersen, Amir-massoud Farahmand, and Sheila A. McIlraith. The act of remembering: a study in partially observable reinforcement learning, 2020

  31. [31]

    Observable operator models for discrete stochastic time series

    Herbert Jaeger. Observable operator models for discrete stochastic time series. Neural Computation, 12 0 (6): 0 1371--1398, 06 2000. ISSN 0899-7667. doi:10.1162/089976600300015411

  32. [32]

Planning and acting in partially observable stochastic domains

Leslie Pack Kaelbling, Michael L. Littman, and Anthony R. Cassandra. Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101(1-2): 99--134, 1998. ISSN 0004-3702. doi:10.1016/S0004-3702(98)00023-X

  33. [33]

    Scaling Laws for Neural Language Models

    Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. Scaling laws for neural language models. arXiv preprint arXiv:2001.08361, 2020

  34. [34]

    Arbaaz Khan, Clark Zhang, Nikolay Atanasov, Konstantinos Karydis, Vijay Kumar, and Daniel D. Lee. Memory augmented control networks. In International Conference on Learning Representations, 2018

  35. [35]

    Stanley B. Klein. What memory is. WIREs Cognitive Science, 6 0 (1): 0 1--38, 2015. doi:https://doi.org/10.1002/wcs.1333

  36. [36]

    Hippocampal contributions to control: The third way

Máté Lengyel and Peter Dayan. Hippocampal contributions to control: The third way. In J. Platt, D. Koller, Y. Singer, and S. Roweis (eds.), Advances in Neural Information Processing Systems, volume 20, pp. 889--896, 2007

  37. [37]

    Self-improving reactive agents based on reinforcement learning, planning and teaching

    Long-Ji Lin. Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning, 8 0 (3): 0 293--321, May 1992. ISSN 1573-0565. doi:10.1007/BF00992699

  38. [38]

    Episodic memory deep q-networks

    Zichuan Lin, Tianqi Zhao, Guangwen Yang, and Lintao Zhang. Episodic memory deep q-networks. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18 , pp.\ 2433--2439. International Joint Conferences on Artificial Intelligence Organization, 7 2018. doi:10.24963/ijcai.2018/337

  39. [39]

    Michael L. Littman. Memoryless policies: theoretical limitations and practical results. In Proceedings of the Third International Conference on Simulation of Adaptive Behavior: From Animals to Animats 3: From Animals to Animats 3, SAB94, pp.\ 238–245, Cambridge, MA, USA, 1994. The MIT Press. ISBN 0262531224

  40. [40]

Predictive representations of state

Michael L. Littman, Richard S. Sutton, and Satinder Singh. Predictive representations of state. In T. Dietterich, S. Becker, and Z. Ghahramani (eds.), Advances in Neural Information Processing Systems, volume 14, pp. 1555--1561, 2001

  41. [41]

    Reinforcement Learning for Embedded Agents Facing Complex Tasks

Mario Martín Muñoz. Reinforcement Learning for Embedded Agents Facing Complex Tasks. PhD thesis, Universitat Politècnica de Catalunya, Barcelona, Spain, 1998

  42. [42]

    Cognitive integration and the extended mind

    Richard Menary. Cognitive integration and the extended mind. The extended mind, pp.\ 227--243, 2010

  43. [43]

    Is external memory memory? biological memory and extended mind

    Kourken Michaelian. Is external memory memory? biological memory and extended mind. Consciousness and Cognition, 21 0 (3): 0 1154--1165, 2012. ISSN 1053-8100. doi:https://doi.org/10.1016/j.concog.2012.04.008

  44. [44]

    Kourken Michaelian and John Sutton. Memory . In Edward N. Zalta (ed.), The Stanford Encyclopedia of Philosophy . Metaphysics Research Lab, Stanford University, S ummer 2017 edition, 2017

  45. [45]

Human-level control through deep reinforcement learning

    Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. Human-level control through deep reinforcement l...

  46. [46]

    Unified Theories of Cognition

    Allen Newell. Unified Theories of Cognition. Harvard University Press, Cambridge, MA, 1990. ISBN 9780674920996

  47. [47]

    Control of memory, active perception, and action in minecraft

    Junhyuk Oh, Valliappa Chockalingam, Satinder Singh, and Honglak Lee. Control of memory, active perception, and action in minecraft. In Maria Florina Balcan and Kilian Q. Weinberger (eds.), Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pp.\ 2790--2799, New York, New York, USA, 2...

  48. [48]

    Empirical design in reinforcement learning

    Andrew Patterson, Samuel Neumann, Martha White, and Adam White. Empirical design in reinforcement learning. Journal of Machine Learning Research, 25 0 (318): 0 1--63, 2024. URL https://jmlr.org/papers/v25/23-0183.html

  49. [49]

    Learning policies with external memory

    Leonid Peshkin, Nicolas Meuleau, and Leslie Pack Kaelbling. Learning policies with external memory. In Proceedings of the 16th International Conference on Machine Learning, ICML '99, pp.\ 307–314, San Francisco, CA, USA, 1999. Morgan Kaufmann Publishers Inc. ISBN 1558606122

  50. [50]

    Neural episodic control

Alexander Pritzel, Benigno Uria, Sriram Srinivasan, Adrià Puigdomènech Badia, Oriol Vinyals, Demis Hassabis, Daan Wierstra, and Charles Blundell. Neural episodic control. In Doina Precup and Yee Whye Teh (eds.), Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pp. 2827--28...

  51. [51]

BDI agents: From theory to practice

Anand S. Rao and Michael P. Georgeff. BDI agents: From theory to practice. In Proceedings of the First International Conference on Multi-Agent Systems (ICMAS-95), pp. 312--319, 1995

  52. [52]

    Slime mold uses an externalized spatial “memory” to navigate in complex environments

    Chris R Reid, Tanya Latty, Audrey Dussutour, and Madeleine Beekman. Slime mold uses an externalized spatial “memory” to navigate in complex environments. Proceedings of the National Academy of Sciences, 109 0 (43): 0 17490--17494, 2012

  53. [53]

    Cognitive stigmergy: Towards a framework based on agents and artifacts

    Alessandro Ricci, Andrea Omicini, Mirko Viroli, Luca Gardelli, and Enrico Oliva. Cognitive stigmergy: Towards a framework based on agents and artifacts. In Danny Weyns, H. Van Dyke Parunak, and Fabien Michel (eds.), Environments for Multi-Agent Systems III, pp.\ 124--140. Springer, 2007. ISBN 978-3-540-71103-2

  54. [54]

    The Analysis of Mind

    Bertrand Russell. The Analysis of Mind. G. Allen & Unwin, London, 1921

  55. [55]

    Prioritized experience replay

    Tom Schaul, John Quan, Ioannis Antonoglou, and David Silver. Prioritized experience replay. In International Conference on Learning Representations, 2016

  56. [56]

    Mastering the game of go with deep neural networks and tree search

    David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. Mastering the game of go with deep neural networks and tree search. nature, 529 0 (7587): 0 484--489, 2016

  57. [57]

    Matthew Sims and Julian Kiverstein. Externalized memory in slime mould and the extended (non-neuronal) mind. Cognitive Systems Research, 73: 26--35, 2022. ISSN 1389-0417. doi:10.1016/j.cogsys.2021.12.001

  58. [58]

    Satinder Singh, Michael R. James, and Matthew R. Rudary. Predictive state representations: a new theory for modeling dynamical systems. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, UAI '04, pp. 512--519, Arlington, Virginia, USA, 2004. AUAI Press. ISBN 0974903906

  59. [59]

    Satinder P. Singh, Tommi Jaakkola, and Michael I. Jordan. Learning without state-estimation in partially observable markovian decision processes. In William W. Cohen and Haym Hirsh (eds.), Machine Learning Proceedings 1994, pp. 284--292. Morgan Kaufmann, San Francisco (CA), 1994. ISBN 978-1-55860-335-6. doi:10.1016/B978-1-55860-335-6.50042-8

  60. [60]

    Kim Sterelny. Minds: extended or scaffolded? Phenomenology and the Cognitive Sciences, 9(4): 465--481, December 2010. ISSN 1572-8676. doi:10.1007/s11097-010-9174-y

  61. [61]

    David J.T Sumpter and Madeleine Beekman. From nonlinearity to optimality: pheromone trail foraging by ants. Animal Behaviour, 66(2): 273--280, 2003. ISSN 0003-3472. doi:10.1006/anbe.2003.2224

  62. [62]

    John Sutton. Constructive memory and distributed cognition: Towards an interdisciplinary framework. In B. Kokinov and W. Hirst (eds.), Constructive Memory, pp. 290--303. New Bulgarian University, 2003

  63. [63]

    Richard S. Sutton. The bitter lesson. Incomplete Ideas (blog), 2019. URL http://www.incompleteideas.net/IncIdeas/BitterLesson.html

  64. [64]

    Richard S. Sutton. The quest for a common model of the intelligent decision maker. In Proceedings of the 5th Multi-disciplinary Conference on Reinforcement Learning and Decision Making (RLDM 2022), Providence, Rhode Island, USA, 2022

  65. [65]

    Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. The MIT Press, Cambridge, MA, 2nd edition, 2018

  66. [66]

    Massimiliano Tamborski and David Abel. Memory allocation in resource-constrained reinforcement learning. arXiv preprint arXiv:2506.17263, 2025

  67. [67]

    B. Thierry, G. Theraulaz, J.Y. Gautier, and B. Stiegler. Joint memory. Behavioural Processes, 35(1): 127--140, 1995. ISSN 0376-6357. doi:10.1016/0376-6357(95)00039-9. Cognition and Evolution

  68. [68]

    Endel Tulving. Episodic and semantic memory. In Organization of Memory, pp. 381--403. Academic Press, London, UK, 1972

  69. [69]

    Christopher J. C. H. Watkins and Peter Dayan. Q-learning. Machine Learning, 8(3--4): 279--292, May 1992. ISSN 1573-0565. doi:10.1007/BF00992698

  70. [70]

    Edward O. Wilson. Chemical communication among workers of the fire ant Solenopsis saevissima (Fr. Smith) 1. The organization of mass-foraging. Animal Behaviour, 10(1): 134--147, 1962. ISSN 0003-3472. doi:10.1016/0003-3472(62)90141-0

  71. [71]

    Guangxiang Zhu*, Zichuan Lin*, Guangwen Yang, and Chongjie Zhang. Episodic reinforcement learning with associative memory. In International Conference on Learning Representations, 2020

  72. [72]

    write newline

    " write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION format.date year duplicate empty "emp...