PEAM: Parametric Embodied Agent Memory through Contrastive Internalization of Experience in Minecraft
Pith reviewed 2026-06-29 16:39 UTC · model grok-4.3
The pith
PEAM internalizes Minecraft agent experiences into parameter-based skills using contrastive learning on failure-correction pairs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PEAM claims that embodied agents in Minecraft can achieve better long-horizon task performance, less forgetting of consolidated skills, and higher efficiency than retrieval-based or other parametric approaches. It does so by internalizing failure-correction trajectories through a contrastive objective into physically isolated per-category LoRA adapters, with consolidation governed by a parameterization-worthiness score and a scale-free self-triggered mechanism.
What carries the argument
The multimodal Mixture-of-Experts LoRA architecture with per-category physically isolated adapters, which enables parameter-level continual learning without catastrophic forgetting.
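To make the isolation idea concrete, here is a minimal sketch of a frozen base layer with one low-rank adapter per skill category. It is an illustration under stated assumptions, not the paper's architecture: the class name `IsolatedLoRABank`, the category names, the hard routing by category label, and the toy update rule are all hypothetical, and the multimodal inputs and any learned gating are omitted.

```python
import random

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

class IsolatedLoRABank:
    """Frozen base weight plus one low-rank adapter (A, B) per category.

    Hypothetical sketch: routing is a hard lookup by category name.
    """

    def __init__(self, d_in, d_out, rank, categories, seed=0):
        rng = random.Random(seed)
        self.base = [[rng.gauss(0, 0.1) for _ in range(d_in)] for _ in range(d_out)]
        # B starts at zero, so every adapter is initially a no-op (usual LoRA init).
        self.adapters = {
            c: {
                "A": [[rng.gauss(0, 0.1) for _ in range(d_in)] for _ in range(rank)],
                "B": [[0.0] * rank for _ in range(d_out)],
            }
            for c in categories
        }

    def forward(self, x, category):
        """Compute base(x) + B_c (A_c x) using only the selected category's adapter."""
        ad = self.adapters[category]
        base_out = matvec(self.base, x)
        delta = matvec(ad["B"], matvec(ad["A"], x))
        return [b + d for b, d in zip(base_out, delta)]

    def update(self, category, grad_B, lr=0.1):
        """Toy gradient step that touches only the named category's B matrix."""
        B = self.adapters[category]["B"]
        for i, row in enumerate(grad_B):
            for j, g in enumerate(row):
                B[i][j] -= lr * g

bank = IsolatedLoRABank(d_in=4, d_out=3, rank=2, categories=["mining", "crafting"])
x = [1.0, 0.5, -0.5, 2.0]
before = bank.forward(x, "crafting")
bank.update("mining", grad_B=[[1.0, 1.0]] * 3)  # train only the "mining" adapter
after = bank.forward(x, "crafting")
```

Because each category owns separate parameters and the base weights stay frozen, the update to "mining" cannot change the "crafting" output. That is the sense in which isolation rules out interference between consolidated skills, at the cost of parameters that grow with the number of categories.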
Load-bearing premise
The parameterization-worthiness score combined with the scale-free consolidation trigger allows effective transfer to new task distributions without any re-tuning.
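The abstract does not specify how the trigger is computed, so the following is only a generic illustration of what "scale-free" can mean in practice: a decision rule based on a standardized statistic, which is unchanged when all scores are multiplied by a positive constant. The function `should_consolidate`, the z-score form, and the default `k` are assumptions made for this sketch and are not taken from the paper.

```python
import statistics

def should_consolidate(history, recent, k=1.0):
    """Hypothetical trigger: fire when the mean of recent worthiness scores
    lies more than k historical standard deviations above the historical mean.

    The z-score is dimensionless, so rescaling every score by a positive
    constant leaves the decision unchanged.
    """
    mu = statistics.mean(history)
    sigma = statistics.pstdev(history)
    if sigma == 0:
        return False  # no spread in the history, so nothing to standardize against
    z = (statistics.mean(recent) - mu) / sigma
    return z > k

history = [1.0, 2.0, 3.0, 2.0, 1.0, 2.0, 3.0]  # made-up scores for illustration
high = [4.0, 5.0]
low = [2.0, 2.0]

fire_high = should_consolidate(history, high)
fire_low = should_consolidate(history, low)

# Same decisions after rescaling all scores by 100.
scaled_high = should_consolidate([100 * s for s in history], [100 * s for s in high])
scaled_low = should_consolidate([100 * s for s in history], [100 * s for s in low])
```

A rule of this kind would transfer across task distributions whose scores differ only in scale, but not necessarily across distributions whose score shapes differ, which is why the premise above is load-bearing.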
What would settle it
Testing the agent on a significantly different Minecraft task distribution after consolidation without any parameter updates, and observing whether performance improves or forgetting occurs compared to baselines.
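One way to quantify the forgetting part of such a test is the average drop in success rate on previously consolidated skills, a common continual-learning measure. The sketch below assumes per-skill success rates are available at two checkpoints. The function name, skill names, and numbers are placeholders for illustration, not results from the paper.

```python
def forgetting(success_after_learning, success_at_end):
    """Average drop in success rate across previously consolidated skills.

    Positive values mean performance on old skills fell; zero or negative
    values mean no measured forgetting.
    """
    drops = [
        success_after_learning[skill] - success_at_end[skill]
        for skill in success_after_learning
    ]
    return sum(drops) / len(drops)

# Placeholder values, not measured data.
peak = {"mine_iron": 0.75, "craft_pickaxe": 0.5}
final = {"mine_iron": 0.5, "craft_pickaxe": 0.5}
score = forgetting(peak, final)
```

Computing the same quantity for each baseline on the shifted task distribution, with no parameter updates after consolidation, would give the comparison this section asks for.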
Original abstract
We present PEAM, a Parametric Embodied Agent Memory framework in Minecraft that transforms agent memory from inference-time retrieval into parameter-resident skills internalized through experience. PEAM pairs a slow deliberative LLM for open-ended reasoning with a fast parametric module for reflexive execution of consolidated skills. The fast module is a multimodal Mixture-of-Experts LoRA architecture with per-category physically isolated adapters, enabling parameter-level continual learning without catastrophic forgetting. We treat failure as a first-class training signal: failure-correction trajectory pairs are internalized through a joint behavioral-cloning and contrastive objective, so the agent learns not only what succeeds but also how corrected actions differ from failed ones. To govern consolidation, PEAM introduces a parameterization-worthiness score for deciding which experience should be internalized, and a scale-free self-triggered consolidation mechanism for deciding when to internalize without task-specific hand-tuned thresholds, making the agent self-evolving as the trigger transfers across task distributions without re-tuning. Experiments in Minecraft show that PEAM improves long-horizon task performance, mitigates forgetting on previously consolidated skills, and improves parametric-versus-retrieval efficiency over retrieval-based embodied agents and parametric memory variants.
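The joint objective described in the abstract can be sketched as a behavioral-cloning term on the corrected action plus a contrastive term that pushes the corrected action's log-probability above the failed one's. The abstract does not give the exact loss, so the pairwise logistic form, the weights `lam` and `beta`, and the function names below are assumptions chosen for illustration.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of action logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def joint_loss(logits, corrected_action, failed_action, lam=1.0, beta=1.0):
    """Hypothetical joint loss for one failure-correction pair at one state.

    bc:          negative log-likelihood of the corrected action.
    contrastive: -log sigmoid(beta * (logp[corrected] - logp[failed])),
                 an assumed pairwise logistic form.
    """
    logp = log_softmax(logits)
    bc = -logp[corrected_action]
    margin = beta * (logp[corrected_action] - logp[failed_action])
    contrastive = math.log1p(math.exp(-margin))
    return bc + lam * contrastive

# Policy that already prefers the corrected action (index 0) over the failed one (index 1).
good = joint_loss([2.0, 0.0, -1.0], corrected_action=0, failed_action=1)
# Policy that still prefers the failed action.
bad = joint_loss([0.0, 2.0, -1.0], corrected_action=0, failed_action=1)
# With lam=0 the loss reduces to plain behavioral cloning.
bc_only = joint_loss([2.0, 0.0, -1.0], corrected_action=0, failed_action=1, lam=0.0)
```

The behavioral-cloning term alone only says which action succeeded. The contrastive term adds a gradient that specifically lowers the failed action relative to the corrected one, which matches the abstract's point about learning how corrected actions differ from failed ones.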
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces PEAM, a Parametric Embodied Agent Memory framework for Minecraft that converts agent memory from inference-time retrieval to parameter-resident skills. It pairs a slow deliberative LLM with a fast multimodal MoE-LoRA module using per-category physically isolated adapters for continual learning without forgetting. Failure-correction trajectory pairs are internalized via joint behavioral cloning and contrastive objectives. Consolidation is controlled by a parameterization-worthiness score and a scale-free self-triggered mechanism claimed to transfer across task distributions without re-tuning. Experiments are stated to show gains in long-horizon task performance, reduced forgetting on consolidated skills, and improved parametric-versus-retrieval efficiency over baselines.
Significance. If the empirical claims hold under rigorous evaluation, the work would be significant for embodied AI by showing how contrastive internalization of failures and self-triggered parametric consolidation can produce self-evolving agents that scale better than retrieval-heavy systems. The isolated-adapter design and scale-free trigger address real continual-learning and efficiency bottlenecks in long-horizon settings such as Minecraft.
Major comments (1)
- [Abstract] The abstract asserts that 'Experiments in Minecraft show that PEAM improves long-horizon task performance, mitigates forgetting..., and improves parametric-versus-retrieval efficiency' yet supplies no metrics, baselines, trial counts, statistical tests, or data-exclusion criteria, so a reader cannot judge from the abstract alone whether the data support the central claims.
Simulated Author's Rebuttal
We thank the referee for their careful reading and constructive comment on the abstract. We address the point below and will revise the manuscript accordingly.
Referee: [Abstract] The abstract asserts that 'Experiments in Minecraft show that PEAM improves long-horizon task performance, mitigates forgetting..., and improves parametric-versus-retrieval efficiency' yet supplies no metrics, baselines, trial counts, statistical tests, or data-exclusion criteria, rendering it impossible to evaluate whether the data support the central claims.
Authors: We agree that the abstract would benefit from additional quantitative detail to allow readers to assess the strength of the empirical claims at a glance. The full manuscript reports these specifics (including metrics, baselines, number of trials, and statistical comparisons) in the Experiments section. To address the concern, we will revise the abstract to include representative performance numbers, baseline comparisons, and trial counts while preserving conciseness.
Revision: yes
Circularity Check
No significant circularity
The provided abstract and description contain no equations, derivations, self-citations, or load-bearing mechanisms that reduce any claimed prediction or result to its inputs by construction. All elements are descriptive of an empirical framework and experimental outcomes in Minecraft, with no visible mathematical chain or ansatz that could trigger the enumerated circularity patterns.