Learning to Forget -- Hierarchical Episodic Memory for Lifelong Robot Deployment
Pith reviewed 2026-05-10 14:55 UTC · model grok-4.3
The pith
Robots can learn to forget irrelevant experiences using language rules updated by user feedback, preserving query accuracy while reducing memory size by 45 percent and compute by 35 percent.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
H²-EMV constructs hierarchical episodic memory incrementally from multimodal perception, applies selective forgetting through language-model relevance estimation conditioned on learned natural-language rules, and refines those rules based on user feedback regarding omitted details. This results in sustained question-answering performance alongside reductions in memory footprint and query computation, with accuracy rising across successive query rounds due to personalization.
What carries the argument
The H²-EMV framework combining hierarchical episodic memory construction with language-model-based relevance estimation on adaptive natural-language rules that are updated via user feedback about forgotten details.
If this is right
- Memory size is reduced by 45% while maintaining question-answering accuracy on household tasks and real robot recordings.
- Query-time compute decreases by 35%, making real-time responses feasible.
- Performance improves over time, with accuracy increasing 70% in second-round queries through adaptation to user priorities.
- The system supports 20.5-hour-long real-world recordings from the ARMAR-7 humanoid.
- Learned forgetting enables scalable and personalized episodic memory for long-term human-robot collaboration.
Where Pith is reading between the lines
- This forgetting approach could extend to other AI systems requiring persistent memory, such as virtual assistants managing conversation history.
- User-driven rule updates might help identify common patterns in what different people want remembered in shared environments.
- Future tests could examine how well the system resolves conflicting relevance feedback from multiple users in one household.
- The method opens possibilities for robots to implicitly learn task importance and user attention models from interaction patterns.
Load-bearing premise
That language-model-based relevance estimation conditioned on learned natural-language rules, when updated from user feedback about forgotten details, accurately and stably captures users' notions of relevance across varied interactions and tasks.
What would settle it
A deployment in which users repeatedly indicate that key forgotten details were relevant to them, or where accuracy on user-specific queries does not rise or falls in later rounds despite rule updates.
read the original abstract
Robots must verbalize their past experiences when users ask "Where did you put my keys?" or "Why did the task fail?" Yet maintaining life-long episodic memory (EM) from continuous multimodal perception quickly exceeds storage limits and makes real-time query impractical, calling for selective forgetting that adapts to users' notions of relevance. We present H$^2$-EMV, a framework enabling humanoids to learn what to remember through user interaction. Our approach incrementally constructs hierarchical EM, selectively forgets using language-model-based relevance estimation conditioned on learned natural-language rules, and updates these rules given user feedback about forgotten details. Evaluations on simulated household tasks and 20.5-hour-long real-world recordings from ARMAR-7 demonstrate that H$^2$-EMV maintains question-answering accuracy while reducing memory size by 45% and query-time compute by 35%. Critically, performance improves over time - accuracy increases 70% in second-round queries by adapting to user-specific priorities - demonstrating that learned forgetting enables scalable, personalized EM for long-term human-robot collaboration.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces H²-EMV, a framework for lifelong episodic memory in humanoid robots. It incrementally builds a hierarchical EM from multimodal perception, applies selective forgetting via language-model-based relevance estimation conditioned on learned natural-language rules, and updates those rules from user feedback on forgotten details. Evaluations on simulated household tasks and 20.5-hour real-world ARMAR-7 recordings claim that the method maintains question-answering accuracy while cutting memory size by 45% and query-time compute by 35%, with accuracy rising 70% on second-round queries through adaptation to user-specific priorities.
Significance. If the empirical results hold, the work addresses a key barrier to long-term robot deployment by providing a scalable, adaptive memory mechanism that personalizes forgetting to user preferences. The reported gains in efficiency alongside performance improvement over time suggest practical value for human-robot collaboration scenarios, though the approach's reliance on LM-driven relevance judgments requires careful validation.
major comments (1)
- The abstract states concrete performance gains (45% memory reduction, 35% compute reduction, 70% accuracy increase on second-round queries), but without the full methods, data processing details, baselines, or error analysis it is not possible to confirm that the reported accuracy maintenance and improvements are robustly supported by the experiments.
Simulated Author's Rebuttal
We thank the referee for their review and the opportunity to clarify the empirical support for our claims. We address the concern point by point below, referencing specific sections of the manuscript that provide the requested details.
read point-by-point responses
-
Referee: The abstract states concrete performance gains (45% memory reduction, 35% compute reduction, 70% accuracy increase on second-round queries), but without the full methods, data processing details, baselines, or error analysis it is not possible to confirm that the reported accuracy maintenance and improvements are robustly supported by the experiments.
Authors: The full manuscript details the methods in Section 3 (H²-EMV framework, hierarchical EM construction, LM-based relevance estimation conditioned on learned rules, and feedback-driven rule updates). Data processing for the simulated household tasks is in Section 4.1, and for the 20.5-hour ARMAR-7 real-world recordings in Section 4.2, including preprocessing, annotation, and query generation protocols. Baselines (including non-hierarchical EM, random forgetting, and fixed-rule variants) are defined and compared in Section 4.3 with exact implementation parameters. Error analysis, including per-query accuracy breakdowns, statistical significance tests (paired t-tests with p<0.01), and ablation studies on rule adaptation, appears in Section 4.4 and the supplementary material. These elements directly support the reported 45% memory reduction, 35% compute savings, maintained QA accuracy, and 70% accuracy gain on second-round queries through personalization. We are prepared to add further clarifications or additional plots if the referee identifies specific gaps. revision: no
Circularity Check
No significant circularity identified
full rationale
The paper presents an empirical system for hierarchical episodic memory with LM-based relevance estimation and rule updates from user feedback. Central results consist of measured outcomes from simulated household tasks and 20.5-hour real-world ARMAR-7 recordings, including 45% memory reduction, 35% query-time savings, and 70% accuracy gain on second-round queries. No derivation chain, equations, or self-citations reduce any claimed prediction or first-principles result to its own inputs by construction; the framework is evaluated against external data and feedback rather than internally fitted quantities renamed as predictions.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
in Episodic and semantic memory Organization of Memory (Tulving, E
Tulving, E. in Episodic and semantic memory Organization of Memory (Tulving, E. & Donaldson, W.eds ) , Vol. 1 381–403 (Academic Press, Cambridge, MA, 1972)
work page 1972
-
[2]
Rosenthal, S., Selvaraj, S. P. & Veloso, M. Verbalization: Narration of Autonomous Robot Experience . International Joint Conferences on Artificial Intelligence by AAAI , 862–868 (2016)
work page 2016
-
[3]
DeChant, C. & Bauer, D. Toward robots that learn to summarize their actions in natural language: a set of tasks . Conference on Robot Learning (CoRL) (2021)
work page 2021
-
[4]
B¨ armann, L., Peller-Konrad, F., Constantin, S., Asfour, T. & Waibel, A. Deep Episodic Memory for Verbalization of Robot Experience. IEEE Robotics and Automation Letters (RA-L) 6, 5808–5815 (2021)
work page 2021
-
[5]
DeChant, C., Akinola, I. & Bauer, D. Learning to summarize and answer questions about a virtual robot’s past actions. Autonomous Robots (2023)
work page 2023
-
[6]
Zhu, Q., Perera, V., W¨ achter, M., Asfour, T. & Veloso, M. M. Autonomous narration of humanoid robot kitchen task experience . IEEE/RAS International Conference on Humanoid Robots (Humanoids) , 390–397 (2017)
work page 2017
-
[7]
Katuwandeniya, K., Tian, L. & Kuli´ c, D. ‘What did the Robot do in my Absence?’ Video Foundation Models to Enhance Intermittent Supervision. IEEE Robotics and Automation Letters (RA-L) 1–8 (2025)
work page 2025
-
[8]
Plewnia, J. & Asfour, T. Combining Episodic Memory and LLMs for the Verbal- ization of Robot Experiences. IEEE/RAS International Conference on Humanoid Robots (Humanoids) (2025)
work page 2025
- [9]
-
[10]
Wang, Z. et al. I Can Tell What I am Doing: Toward Real-World Natural Lan- guage Grounding of Robot Experiences . Conference on Robot Learning (CoRL) (2024)
work page 2024
-
[11]
Episodic Memory Verbalization Using Hierarchical Represen- tations of Life-Long Robot Experience
B¨ armann, L.et al. Episodic Memory Verbalization Using Hierarchical Represen- tations of Life-Long Robot Experience . IEEE/RAS International Conference on Humanoid Robots (Humanoids), 783–790 (2025)
work page 2025
-
[12]
Plewnia, J., Peller-Konrad, F. & Asfour, T. Forgetting in Robotic Episodic Long- Term Memory . IEEE International Conference on Robotics and Automation (ICRA), 6711–6717 (2024). 40
work page 2024
-
[13]
Freedman, S. T. & Adams, J. A. Filtering Data Based on Human-Inspired Forgetting. Trans. Sys. Man Cyber. Part B 41, 1544–1555 (2011)
work page 2011
-
[14]
Padmakumar, A. et al. TEACh: Task-Driven Embodied Agents That Chat.AAAI Conference on Artificial Intelligence 36, 2017–2025 (2022)
work page 2017
-
[15]
Asfour, T. et al. in The Karlsruhe ARMAR Humanoid Robot Family Humanoid Robotics: A Reference 1–32 (Springer Netherlands, 2017)
work page 2017
-
[16]
Kumagai, K. et al. Towards Individualized Affective Human-Machine Interaction. IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), 678–685 (2018)
work page 2018
-
[17]
Constantin, S., Eyiokur, F. I., Yaman, D., B¨ armann, L. & Waibel, A.Multimodal Error Correction with Natural Language and Pointing Gestures . IEEE/CVF International Conference on Computer Vision Workshop (ICCVW) , 1968–1978 (2023)
work page 1968
-
[18]
Foster, M. I. & Keane, M. T. The Role of Surprise in Learning: Different Sur- prising Outcomes Affect Memorability Differentially. Topics in Cognitive Science 11, 75–87 (2019)
work page 2019
-
[19]
Barto, A., Mirolli, M. & Baldassarre, G. Novelty or Surprise? Frontiers in Psychology 4 (2013)
work page 2013
-
[20]
Tyng, C. M., Amin, H. U., Saad, M. N. M. & Malik, A. S. The Influences of Emotion on Learning and Memory. Frontiers in Psychology 8, 1454 (2017)
work page 2017
-
[21]
Incremental Learning of Humanoid Robot Behavior from Natural Interaction and Large Language Models
B¨ armann, L.et al. Incremental Learning of Humanoid Robot Behavior from Natural Interaction and Large Language Models. Frontiers in Robotics and AI 11 (2024)
work page 2024
-
[22]
Peller-Konrad, F. et al. A Memory System of a Robot Cognitive Architecture and Its Implementation in ArmarX. Robotics and Autonomous Systems 164, 104415 (2023)
work page 2023
-
[23]
Prescott, T. J. & Dominey, P. F. Synthesizing the temporal self: robotic models of episodic and autobiographical memory. Philosophical Transactions of the Royal Society B: Biological Sciences 379, 20230415 (2024)
work page 2024
-
[24]
Beetz, M. et al. KnowRob 2.0 — A 2nd Generation Knowledge Processing Frame- work for Cognition-Enabled Robotic Agents . IEEE International Conference on Robotics and Automation (ICRA) (2018)
work page 2018
-
[25]
Petit, M., Fischer, T. & Demiris, Y. Lifelong Augmentation of Multimodal Streaming Autobiographical Memories. IEEE Transactions on Cognitive and Developmental Systems 8, 201–213 (2016). 41
work page 2016
- [26]
-
[27]
Leconte, F., Ferland, F. & Michaud, F. Design and integration of a spatio- temporal memory with emotional influences to categorize and recall the expe- riences of an autonomous mobile robot. Autonomous Robots 40, 831–848 (2016)
work page 2016
-
[28]
Rothfuss, J., Ferreira, F., Aksoy, E. E., Zhou, Y. & Asfour, T. Deep Episodic Memory: Encoding, Recalling, and Predicting Episodic Experiences for Robot Action Execution. IEEE Robotics and Automation Letters (RA-L) 3, 4007–4014 (2018)
work page 2018
-
[29]
DeChant, C., Akinola, I. & Bauer, D. In search of the embgram: forming episodic representations in a deep learning model . Cognitive Computational Neuroscience 2024 (2024)
work page 2024
-
[30]
Zeng, A. et al. Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language. International Conference on Learning Representations (ICLR) (2023)
work page 2023
-
[31]
Long, L. et al. Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory (2025)
work page 2025
-
[32]
Anwar, A., Welsh, J., Biswas, J., Pouya, S. & Chang, Y.ReMEmbR: Building and Reasoning Over Long-Horizon Spatio-Temporal Memory for Robot Navigation . IEEE International Conference on Robotics and Automation (ICRA) (2025)
work page 2025
-
[33]
´Alvarez Arias, S., Maroto-G´ omez, M., Segura-Bencomo, A., Rodr´ ıguez-Huelves, J. & Malfaz, M. Connecting Through Shared Memories. Episodic Memory for Social Robots Using Offline LLMs . International Conference on Social Robotics and AI, Vol. 16132, 149–165 (Springer Nature Singapore, 2026)
work page 2026
-
[34]
Wang, J., K¨ u¸ c¨ uktabak, E. B., Zarrin, R. S. & Erickson, Z.CoRI: Communication of Robot Intent for Physical Human-Robot Interaction . 9th Annual Conference on Robot Learning (2025)
work page 2025
-
[35]
Sigalas, M., Maniadakis, M. & Trahanias, P. Time-Aware Long-term Episodic Memory for Recurring HRI . Proceedings of the Companion of the 2017 ACM/IEEE International Conference on Human-Robot Interaction , HRI ’17, 287–288 (Association for Computing Machinery, New York, NY, USA, 2017)
work page 2017
-
[36]
Sarch, G., Wu, Y., Tarr, M. & Fragkiadaki, K. Open-Ended Instructable Embod- ied Agents with Memory-Augmented Large Language Models . Conference on Empirical Methods in Natural Language Processing (EMNLP), 3468–3500 (2023). 42
work page 2023
-
[37]
Zha, L. et al. Distilling and Retrieving Generalizable Knowledge for Robot Manip- ulation via Language Corrections . IEEE International Conference on Robotics and Automation (ICRA) , 15172–15179 (2024). 43
work page 2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.