PEAM: Parametric Embodied Agent Memory through Contrastive Internalization of Experience in Minecraft

Hongmin Cai; Junli Gong; Weicheng Wang; Weifeng Su; Yiu-ming Cheung; Yuchen Guo

arxiv: 2605.27762 · v2 · pith:32FKARX6new · submitted 2026-05-26 · 💻 cs.AI


Yuchen Guo, Junli Gong, Weicheng Wang, Hongmin Cai, Yiu-ming Cheung, Weifeng Su

Pith reviewed 2026-06-29 16:39 UTC · model grok-4.3

classification 💻 cs.AI

keywords: embodied agents, Minecraft, parametric memory, contrastive learning, continual learning, LoRA adapters, mixture of experts


The pith

PEAM internalizes Minecraft agent experiences into parameter-based skills using contrastive learning on failure-correction pairs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents PEAM as a way to move embodied agent memory from retrieval at inference time to internalized parameters. It combines a slow reasoning LLM with a fast multimodal Mixture-of-Experts module using isolated LoRA adapters for different skill categories. Experiences are turned into skills by training on pairs of failed and corrected trajectories with both cloning and contrastive losses. A parameterization-worthiness score decides which experiences to keep, and a scale-free trigger decides when to consolidate without manual thresholds, allowing the agent to improve over time across different tasks.
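The joint cloning-plus-contrastive training signal can be illustrated with a minimal sketch. The abstract does not give the loss formula, so everything here is an assumption: the behavioral-cloning term is taken to be the negative log-likelihood of the corrected action, the contrastive term is a pairwise logistic loss on the log-probability gap between the corrected and failed actions, and `lam` is a hypothetical mixing weight. The function names are illustrative only.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of action logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def softplus(x):
    """Stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def joint_loss(logits, corrected_action, failed_action, lam=0.5):
    """Assumed form: BC loss + lam * pairwise contrastive loss.

    - BC term: -log p(corrected action)
    - Contrastive term: -log sigmoid(log p(corrected) - log p(failed)),
      which shrinks as the corrected action outscores the failed one.
    """
    logp = log_softmax(logits)
    bc = -logp[corrected_action]
    margin = logp[corrected_action] - logp[failed_action]
    contrastive = softplus(-margin)
    return bc + lam * contrastive

# Example: the loss is lower when the policy prefers the corrected action.
good = joint_loss([2.0, 0.0, 0.0], corrected_action=0, failed_action=1)
bad = joint_loss([0.0, 2.0, 0.0], corrected_action=0, failed_action=1)
print(good < bad)
```

The point of the second term in this sketch is that the failed action is pushed down explicitly rather than only implicitly through normalization, which is one plausible reading of "how corrected actions differ from failed ones".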

Core claim

PEAM demonstrates that by internalizing experience through contrastive objectives on failure-correction trajectories into physically isolated per-category LoRA adapters, governed by a parameterization-worthiness score and scale-free self-triggered consolidation, embodied agents in Minecraft can achieve better long-horizon task performance, reduced forgetting of consolidated skills, and higher efficiency compared to retrieval-based or other parametric approaches.

What carries the argument

The multimodal Mixture-of-Experts LoRA architecture with per-category physically isolated adapters, which enables parameter-level continual learning without catastrophic forgetting.
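A toy sketch of what "physically isolated" per-category adapters could mean in practice: a frozen base weight shared by all skills, plus one low-rank pair (A, B) per category that is never shared. This is not the paper's implementation. The class name, the hard routing by category label, the rank, and the initialization are all assumptions made for illustration, and the example uses plain Python lists rather than a tensor library.

```python
import random

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

class IsolatedLoRALayer:
    """Frozen base weight W plus one private low-rank adapter per category."""

    def __init__(self, base_weight, rank=2, seed=0):
        self.W = base_weight          # frozen, shared across categories
        self.rank = rank
        self.rng = random.Random(seed)
        self.adapters = {}            # category name -> {"A": ..., "B": ...}

    def add_category(self, name):
        d_out, d_in = len(self.W), len(self.W[0])
        # Common LoRA-style init (assumed here): A small random, B zero,
        # so a fresh adapter leaves the base output unchanged.
        A = [[self.rng.gauss(0.0, 0.1) for _ in range(d_in)]
             for _ in range(self.rank)]
        B = [[0.0] * self.rank for _ in range(d_out)]
        self.adapters[name] = {"A": A, "B": B}

    def forward(self, x, category):
        # y = W x + B (A x), using only the adapter of the given category.
        base = matvec(self.W, x)
        ad = self.adapters[category]
        delta = matvec(ad["B"], matvec(ad["A"], x))
        return [b + d for b, d in zip(base, delta)]

layer = IsolatedLoRALayer([[1.0, 0.0], [0.0, 1.0]])
layer.add_category("mining")      # hypothetical skill categories
layer.add_category("crafting")
```

Because each category owns its own (A, B) and the base weight is frozen, updating the "mining" adapter cannot change what the layer computes for "crafting". That structural separation is the property the forgetting claim rests on.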

Load-bearing premise

The parameterization-worthiness score combined with the scale-free consolidation trigger allows effective transfer to new task distributions without any re-tuning.
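The abstract does not define the trigger, so the following is only one hypothetical way a consolidation rule could avoid hand-tuned, task-specific thresholds: compare each new score to running statistics of past scores and fire on a dimensionless z-score. The class name, the `z_threshold` and `warmup` parameters, and the use of Welford's running variance are all assumptions, not the paper's mechanism.

```python
import math

class ScaleFreeTrigger:
    """Fires when a score is unusually high relative to its own history."""

    def __init__(self, z_threshold=2.0, warmup=5):
        self.z_threshold = z_threshold  # dimensionless, not tied to score units
        self.warmup = warmup            # observations needed before firing
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0                   # running sum of squared deviations

    def observe(self, score):
        fire = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and (score - self.mean) / std > self.z_threshold:
                fire = True
        # Welford's online update of mean and squared deviations.
        self.n += 1
        delta = score - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (score - self.mean)
        return fire

scores = [1.0, 1.2, 0.9, 1.1, 1.0, 1.05, 5.0, 1.0]
trigger = ScaleFreeTrigger()
decisions = [trigger.observe(s) for s in scores]

# Rescaling every score by a constant leaves the decisions unchanged.
rescaled = ScaleFreeTrigger()
decisions_rescaled = [rescaled.observe(s * 1024.0) for s in scores]
print(decisions == decisions_rescaled)
```

In this sketch the decision depends only on ratios of score differences, so a task whose scores live on a different numeric scale needs no new threshold. Whether PEAM's actual trigger has this form is exactly what the full manuscript would have to show.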

What would settle it

Testing the agent on a significantly different Minecraft task distribution after consolidation without any parameter updates, and observing whether performance improves or forgetting occurs compared to baselines.

Figures

Figures reproduced from arXiv: 2605.27762 by Hongmin Cai, Junli Gong, Weicheng Wang, Weifeng Su, Yiu-ming Cheung, Yuchen Guo.

**Figure 2.** PEAM architecture. Successful and corrected trajectories produced by the slow tier are staged in episodic ...

**Figure 3.** PV and STC make consolidation selective and self-triggered. (a) Full PV scoring ranks candidate skills ...

**Figure 4.** Forgetting under sequential consolidation.

Original abstract

We present PEAM, a Parametric Embodied Agent Memory framework in Minecraft that transforms agent memory from inference-time retrieval into parameter-resident skills internalized through experience. PEAM pairs a slow deliberative LLM for open-ended reasoning with a fast parametric module for reflexive execution of consolidated skills. The fast module is a multimodal Mixture-of-Experts LoRA architecture with per-category physically isolated adapters, enabling parameter-level continual learning without catastrophic forgetting. We treat failure as a first-class training signal: failure-correction trajectory pairs are internalized through a joint behavioral-cloning and contrastive objective, so the agent learns not only what succeeds but also how corrected actions differ from failed ones. To govern consolidation, PEAM introduces a parameterization-worthiness score for deciding which experience should be internalized, and a scale-free self-triggered consolidation mechanism for deciding when to internalize without task-specific hand-tuned thresholds, making the agent self-evolving as the trigger transfers across task distributions without re-tuning. Experiments in Minecraft show that PEAM improves long-horizon task performance, mitigates forgetting on previously consolidated skills, and improves parametric-versus-retrieval efficiency over retrieval-based embodied agents and parametric memory variants.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

PEAM sketches a parametric memory shift for Minecraft agents using contrastive failure pairs and isolated LoRAs, but the abstract supplies no experiment details so the performance claims cannot be checked.


PEAM's main move is turning retrieval-based memory into parameter-resident skills for embodied agents. It pairs a slow LLM with a fast MoE LoRA module that uses per-category physically isolated adapters, trains on failure-correction trajectory pairs with a joint behavioral cloning and contrastive loss, and adds a parameterization-worthiness score plus a scale-free self-triggered consolidation rule that is meant to work across task distributions without retuning.

The combination of contrastive internalization from failures and the isolated adapters is not a routine extension of prior work, and the self-triggered mechanism is a concrete attempt to reduce hand-tuning. The framework description is clear on the intended flow from experience to parameters.

The clear weakness is the complete absence of experimental information. The abstract states gains in long-horizon performance, reduced forgetting, and better efficiency, yet gives no metrics, baselines, Minecraft task list, data exclusion rules, or ablation results. Without those, it is impossible to tell whether the new components actually produce the reported effects or whether the gains come from other factors. The parameterization-worthiness score and scale-free trigger are described at a high level but not shown with equations or validation.

This is for researchers working on continual learning and memory architectures in embodied AI. Someone already thinking about LoRA-based agents or failure-driven training could pick up usable ideas here.

It deserves peer review so the experiments can be examined directly; the architectural sketch is coherent enough to warrant that step.

Referee Report

1 major / 0 minor

Summary. The manuscript introduces PEAM, a Parametric Embodied Agent Memory framework for Minecraft that converts agent memory from inference-time retrieval to parameter-resident skills. It pairs a slow deliberative LLM with a fast multimodal MoE-LoRA module using per-category physically isolated adapters for continual learning without forgetting. Failure-correction trajectory pairs are internalized via joint behavioral cloning and contrastive objectives. Consolidation is controlled by a parameterization-worthiness score and a scale-free self-triggered mechanism claimed to transfer across task distributions without re-tuning. Experiments are stated to show gains in long-horizon task performance, reduced forgetting on consolidated skills, and improved parametric-versus-retrieval efficiency over baselines.

Significance. If the empirical claims hold under rigorous evaluation, the work would be significant for embodied AI by showing how contrastive internalization of failures and self-triggered parametric consolidation can produce self-evolving agents that scale better than retrieval-heavy systems. The isolated-adapter design and scale-free trigger address real continual-learning and efficiency bottlenecks in long-horizon settings such as Minecraft.

major comments (1)

[Abstract] The abstract asserts that 'Experiments in Minecraft show that PEAM improves long-horizon task performance, mitigates forgetting..., and improves parametric-versus-retrieval efficiency' yet supplies no metrics, baselines, trial counts, statistical tests, or data-exclusion criteria, making it impossible to evaluate whether the data support the central claims.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful reading and constructive comment on the abstract. We address the point below and will revise the manuscript accordingly.


Referee: [Abstract] The abstract asserts that 'Experiments in Minecraft show that PEAM improves long-horizon task performance, mitigates forgetting..., and improves parametric-versus-retrieval efficiency' yet supplies no metrics, baselines, trial counts, statistical tests, or data-exclusion criteria, making it impossible to evaluate whether the data support the central claims.

Authors: We agree that the abstract would benefit from additional quantitative detail to allow readers to assess the strength of the empirical claims at a glance. The full manuscript reports these specifics (including metrics, baselines, number of trials, and statistical comparisons) in the Experiments section. To address the concern, we will revise the abstract to include representative performance numbers, baseline comparisons, and trial counts while preserving conciseness.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity


The provided abstract and description contain no equations, derivations, self-citations, or load-bearing mechanisms that reduce any claimed prediction or result to its inputs by construction. All elements are descriptive of an empirical framework and experimental outcomes in Minecraft, with no visible mathematical chain or ansatz that could trigger the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review prevents identification of specific free parameters, axioms, or invented entities. The framework introduces several novel components whose implementation details and dependencies are not provided.


