eMEM: A Hybrid Spatio-Temporal Memory System For Embodied Agents
Pith reviewed 2026-06-28 09:39 UTC · model grok-4.3
The pith
A hybrid graph-based memory system unifies semantic, spatial, and temporal indexes for embodied agents.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
eMEM uses a multi-index architecture consisting of structured storage, approximate nearest-neighbour semantic search, and spatial indexing, all unified behind one graph model, together with a tiered consolidation pipeline that compresses raw perceptual observations into summaries; this design yields strong results on probes for context-dependent retrieval and lure rejection while maintaining retention across long simulated delays.
What carries the argument
The multi-index architecture (structured storage, semantic vector search, and spatial indexing) unified behind a single graph model, together with a tiered consolidation pipeline that transforms raw perceptual observations into compressed summaries.
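To make the multi-index idea concrete, here is a minimal sketch under stated assumptions. It is not eMEM's code or API: the class `TinyHybridMemory`, its method names, and the toy two-dimensional embeddings are all hypothetical. To stay dependency-free it uses SQLite (via Python's `sqlite3`) for structured storage, a brute-force cosine scan as a stand-in for hnswlib's approximate search, and a plain bounding-box `WHERE` clause as a stand-in for an R-tree.

```python
import math
import sqlite3
from dataclasses import dataclass


@dataclass
class Observation:
    obs_id: int
    text: str
    embedding: list[float]  # toy vector; a real system would use a text encoder
    x: float
    y: float
    timestamp: float


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(p * q for p, q in zip(a, b))
    na = math.sqrt(sum(p * p for p in a))
    nb = math.sqrt(sum(q * q for q in b))
    return dot / (na * nb) if na and nb else 0.0


class TinyHybridMemory:
    """Hypothetical illustration of one store queried by meaning, space, and time."""

    def __init__(self) -> None:
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE obs (id INTEGER PRIMARY KEY, text TEXT, x REAL, y REAL, ts REAL)"
        )
        self.vectors: dict[int, list[float]] = {}

    def add(self, o: Observation) -> None:
        self.db.execute(
            "INSERT INTO obs VALUES (?, ?, ?, ?, ?)",
            (o.obs_id, o.text, o.x, o.y, o.timestamp),
        )
        self.vectors[o.obs_id] = o.embedding

    def semantic(self, query: list[float], k: int = 3) -> list[int]:
        # Exact scan; an ANN index would replace this at scale.
        ranked = sorted(self.vectors.items(), key=lambda kv: -cosine(query, kv[1]))
        return [obs_id for obs_id, _ in ranked[:k]]

    def spatial(self, xmin: float, ymin: float, xmax: float, ymax: float) -> list[int]:
        # Bounding-box filter; an R-tree would answer this without a full scan.
        rows = self.db.execute(
            "SELECT id FROM obs WHERE x BETWEEN ? AND ? AND y BETWEEN ? AND ? ORDER BY id",
            (xmin, xmax, ymin, ymax),
        )
        return [r[0] for r in rows]

    def temporal(self, t0: float, t1: float) -> list[int]:
        rows = self.db.execute(
            "SELECT id FROM obs WHERE ts BETWEEN ? AND ? ORDER BY id", (t0, t1)
        )
        return [r[0] for r in rows]

    def hybrid(self, query, box, window, k: int = 3) -> list[int]:
        # Filter by space and time first, then rank the survivors by meaning.
        allowed = set(self.spatial(*box)) & set(self.temporal(*window))
        ranked = self.semantic(query, k=len(self.vectors))
        return [i for i in ranked if i in allowed][:k]
```

The point of the sketch is the `hybrid` method: a single query is constrained by a region and a time window and only then ranked semantically, which a flat vector store cannot express directly.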
If this is right
- Ten recall primitives, including concept-to-location resolution and cross-layer recall, become available as direct operations for LLM tool calling.
- The system runs fully embedded and in-process with the agent.
- Retention of room-unique items stays at ceiling level across simulated delays from one hour to one year.
- Multi-layer storage improves context-dependent retrieval while consolidation improves rejection of false associations.
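As an illustration of what a recall primitive exposed for LLM tool calling might look like, the sketch below implements a toy concept-to-location lookup. Everything here is an assumption: the names `Sighting`, `LOCATE_TOOL`, and `locate_concept` are hypothetical, the JSON-Schema-style tool description is a generic convention rather than the paper's interface, and substring matching stands in for semantic search.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Sighting:
    label: str
    room: str
    x: float
    y: float
    timestamp: float


# Hypothetical tool description in a generic JSON-Schema style.
LOCATE_TOOL = {
    "name": "locate_concept",
    "description": "Return the most recent known location of a named concept.",
    "parameters": {
        "type": "object",
        "properties": {"concept": {"type": "string"}},
        "required": ["concept"],
    },
}


def locate_concept(concept: str, sightings: list[Sighting]) -> Optional[dict]:
    """Resolve a concept to the place it was last seen, or None if never seen.

    Substring matching is a placeholder for embedding-based retrieval.
    """
    needle = concept.strip().lower()
    matches = [s for s in sightings if needle in s.label.lower()]
    if not matches:
        return None
    latest = max(matches, key=lambda s: s.timestamp)
    return {
        "label": latest.label,
        "room": latest.room,
        "position": (latest.x, latest.y),
        "timestamp": latest.timestamp,
    }
```

Returning `None` for an unseen concept, rather than the nearest semantic neighbour, is a deliberate choice in this sketch: it keeps the tool from inventing a location, which is the failure mode that lure-rejection probes are designed to expose.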
Where Pith is reading between the lines
- The consolidation step could be applied to other sensor streams beyond vision in embodied settings.
- The graph unification might reduce the need for separate external memory services in deployed agents.
- Psychology-derived tasks could serve as a diagnostic layer for memory components in other agent designs.
Load-bearing premise
Performance on the eight cognitive-psychology paradigms in simulated environments accurately measures the memory needs of embodied agents in real physical settings.
What would settle it
Testing the same set of probes on physical robots moving through actual rooms and checking whether the advantage over a flat retrieval baseline disappears.
Original abstract
We present eMEM (Embodied Memory), a hybrid graph-based memory system for embodied agents operating in physical environments. Current agent memory architectures, such as Generative Agents, MemGPT, and A-MEM, treat memory as text streams or knowledge graphs, but embodied agents require memory that is simultaneously searchable by meaning, space, and time. eMEM fills this gap with a multi-index architecture (SQLite for structured storage, hnswlib for approximate nearest neighbour semantic search, and an R-tree for spatial queries) unified behind a single graph model. A tiered consolidation pipeline transforms raw perceptual observations into compressed summaries, mirroring hippocampal-neocortical consolidation in biological systems. Ten agent-facing recall tools expose memory retrieval primitives, including concept-to-location resolution and cross-layer recall, as first-class operations for LLM tool calling. The system is fully embedded and runs in-process alongside the agent. In addition, we introduce eMEM-Bench v1, a benchmark we construct over ProcTHOR-10K scenes for embodied memory evaluation. The benchmark is organised explicitly around eight cognitive-psychology paradigms (DRM lures, pattern separation, pattern completion, source monitoring, context-dependent retrieval, long-horizon interference, serial position, and a foil-augmented retention curve), each chosen so that the result is interpretable against the broader memory-systems literature in humans and prior agent-memory systems; a level of diagnostic that surface-task benchmarks like LoCoMo or OpenEQA cannot provide. eMEM scores 80.8 weighted mean over 988 probes, with a flat retention curve at ceiling from 1 h to 1 yr of simulated delay on room-unique items. We show that a pure RAG baseline (the flat_rag ablation) loses 30 pt on context-dependent retrieval and 29 pt on DRM lure rejection, isolating the contribution of multi-layer storage and consolidation respectively.
We release both the system and the benchmark code.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents eMEM, a hybrid graph-based memory system for embodied agents that uses a multi-index architecture (SQLite for structured data, hnswlib for semantic search, R-tree for spatial queries) unified by a graph model, along with a tiered consolidation pipeline inspired by hippocampal-neocortical processes. It introduces eMEM-Bench, a benchmark over ProcTHOR-10K scenes organized around eight cognitive-psychology paradigms (DRM lures, pattern separation, etc.). It reports an 80.8 weighted mean score over 988 probes, a flat retention curve at ceiling from 1 h to 1 yr on room-unique items, and ablation results showing that a pure RAG baseline loses 30 pt on context-dependent retrieval and 29 pt on DRM lure rejection.
Significance. If the empirical results and benchmark hold, the work supplies a more diagnostic evaluation framework for agent memory that aligns with human memory-systems literature, while demonstrating concrete gains from multi-layer storage and consolidation over text-stream or flat RAG approaches; the release of code and benchmark supports reproducibility.
major comments (2)
- [Benchmark construction (abstract and methods)] The headline performance claims (80.8 weighted mean, flat retention 1h–1yr) and ablation margins rest on eMEM-Bench as a valid proxy for embodied-agent memory demands, yet the benchmark is constructed exclusively in discrete ProcTHOR-10K scenes with perfect perceptual access and synthetic timestamps; the manuscript provides no analysis or experiments addressing how sensor noise, partial observability, motor-induced state changes, or continuous event streams would affect retrieval and consolidation behavior.
- [Ablation experiments] The ablation isolating multi-index storage and consolidation (flat_rag loses 30pt on context-dependent retrieval and 29pt on DRM lure rejection) is load-bearing for the architectural contribution claim, but the manuscript does not detail how the ablation controls for confounding factors such as index size, consolidation parameters, or query formulation across the 988 probes.
minor comments (3)
- [Results] The weighting scheme used to compute the 80.8 weighted mean across the eight paradigms and 988 probes is not specified, making it difficult to interpret the aggregate score.
- [System description] The ten agent-facing recall tools are described at a high level; the manuscript would benefit from explicit pseudocode or interface signatures for the concept-to-location resolution and cross-layer recall primitives.
- [Results] The manuscript should include error bars, per-paradigm breakdowns, or statistical tests for the reported ablation differences to strengthen the quantitative claims.
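The first minor comment matters because the aggregate depends on the weights. The sketch below shows, with made-up numbers that are not from the paper, how a probe-count-weighted mean and an unweighted (macro) mean over the same per-paradigm scores can differ. Probe-count weighting is only one plausible reading of "weighted mean"; the paradigm names and values are hypothetical.

```python
def weighted_mean(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-paradigm scores.

    Raises ValueError if the two mappings cover different paradigms
    or if the weights do not sum to a positive number.
    """
    if scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same paradigms")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[name] * weights[name] for name in scores) / total


# Hypothetical per-paradigm accuracies and probe counts (illustration only).
example_scores = {"paradigm_a": 90.0, "paradigm_b": 60.0}
example_probe_counts = {"paradigm_a": 300, "paradigm_b": 100}

by_probe_count = weighted_mean(example_scores, example_probe_counts)
macro = weighted_mean(example_scores, {name: 1.0 for name in example_scores})
```

With these illustrative inputs the probe-count weighting gives 82.5 while the macro average gives 75.0, so a single headline number cannot be interpreted without knowing which scheme produced it.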
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major point below and indicate the revisions we will make to improve clarity and completeness.
- Referee: [Benchmark construction (abstract and methods)] The headline performance claims (80.8 weighted mean, flat retention 1h–1yr) and ablation margins rest on eMEM-Bench as a valid proxy for embodied-agent memory demands, yet the benchmark is constructed exclusively in discrete ProcTHOR-10K scenes with perfect perceptual access and synthetic timestamps; the manuscript provides no analysis or experiments addressing how sensor noise, partial observability, motor-induced state changes, or continuous event streams would affect retrieval and consolidation behavior.
  Authors: We agree that eMEM-Bench uses idealized discrete scenes with perfect access and synthetic timestamps. This controlled setup was chosen to enable direct mapping to cognitive-psychology paradigms and isolate memory-system effects. We will add a new limitations subsection discussing how sensor noise, partial observability, and continuous streams could affect performance, along with suggested extensions of the benchmark to those regimes. No new experiments are feasible within the current scope, but the discussion will be added.
  Revision: yes
- Referee: [Ablation experiments] The ablation isolating multi-index storage and consolidation (flat_rag loses 30pt on context-dependent retrieval and 29pt on DRM lure rejection) is load-bearing for the architectural contribution claim, but the manuscript does not detail how the ablation controls for confounding factors such as index size, consolidation parameters, or query formulation across the 988 probes.
  Authors: The flat_rag baseline was constructed by using a single hnswlib index with the identical embedding model and query templates as eMEM's semantic component, while disabling all consolidation and multi-index routing. Index capacity was matched to the total size of eMEM's three stores, and the same 988 probe queries were executed verbatim. We will expand the methods and appendix with explicit parameter tables, pseudocode for the ablation configuration, and verification steps confirming that only the architectural differences were varied.
  Revision: yes
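The control the authors describe can be expressed as a configuration check. The following is a sketch under assumptions, not the paper's ablation code: `MemoryConfig`, its field names, and the placeholder values are hypothetical. It encodes the rebuttal's claim that the baseline shares the embedding model, query templates, and index capacity with the full system and differs only in the two architectural switches.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class MemoryConfig:
    # All field names and defaults are placeholders for illustration.
    embedding_model: str = "shared-encoder"
    query_template: str = "default"
    index_capacity: int = 100_000
    use_consolidation: bool = True
    use_multi_index_routing: bool = True


FULL = MemoryConfig()

# Derive the baseline from the full config so shared settings cannot drift.
FLAT_RAG = replace(FULL, use_consolidation=False, use_multi_index_routing=False)


def changed_fields(a: MemoryConfig, b: MemoryConfig) -> set[str]:
    """Names of the fields whose values differ between two configurations."""
    da, db = asdict(a), asdict(b)
    return {name for name in da if da[name] != db[name]}
```

Deriving the baseline with `replace` rather than constructing it separately means any future change to the shared settings propagates to both arms, and `changed_fields(FULL, FLAT_RAG)` gives a one-line verification that only the intended switches were varied.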
Circularity Check
No circularity: empirical system and benchmark results
The paper presents a hybrid memory architecture and eMEM-Bench without any mathematical derivations, parameter fitting, or predictive claims that reduce to inputs by construction. Performance numbers (80.8 weighted mean, ablation deltas) are reported from direct evaluation on 988 probes in ProcTHOR-10K scenes; no equations, self-citations, or ansatzes are invoked to derive these quantities from prior fitted values or author theorems. The benchmark construction and consolidation pipeline are described as engineering choices motivated by cognitive literature, not as outputs forced by self-referential definitions. This is the common case of a self-contained empirical contribution.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: a graph model can usefully unify SQLite, hnswlib, and R-tree indexes for agent memory queries.
- Ad hoc to paper: tiered consolidation that mirrors hippocampal-neocortical processes will improve agent memory performance.
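To show the kind of compression the second axiom refers to, here is a deliberately simple consolidation sketch. It is an assumption-laden toy, not the paper's pipeline: the types `RawObs` and `Summary`, the function `consolidate`, and the one-hour window are hypothetical, and counting labels per room and time window stands in for whatever summarisation the real system performs.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RawObs:
    label: str
    room: str
    timestamp: float  # seconds since the start of the episode


@dataclass(frozen=True)
class Summary:
    room: str
    window_start: float
    label_counts: tuple[tuple[str, int], ...]  # sorted (label, count) pairs
    n_raw: int  # number of raw observations folded into this summary


def consolidate(observations: list[RawObs], window_s: float = 3600.0) -> list[Summary]:
    """Fold raw observations into one summary per (room, time window)."""
    buckets: dict[tuple[str, float], list[str]] = defaultdict(list)
    for o in observations:
        start = (o.timestamp // window_s) * window_s
        buckets[(o.room, start)].append(o.label)

    summaries = []
    for (room, start), labels in sorted(buckets.items()):
        counts = tuple(sorted(Counter(labels).items()))
        summaries.append(Summary(room, start, counts, len(labels)))
    return summaries
```

A second tier could apply the same function to the summaries themselves with a longer window; whether such tiering helps performance is exactly the assumption this ledger entry flags as untested outside the paper.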