arxiv: 2605.27762 · v2 · pith:32FKARX6new · submitted 2026-05-26 · 💻 cs.AI

PEAM: Parametric Embodied Agent Memory through Contrastive Internalization of Experience in Minecraft

Pith reviewed 2026-06-29 16:39 UTC · model grok-4.3

classification 💻 cs.AI
keywords embodied agents · Minecraft · parametric memory · contrastive learning · continual learning · LoRA adapters · mixture of experts

The pith

PEAM internalizes Minecraft agent experiences into parameter-based skills using contrastive learning on failure-correction pairs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents PEAM as a way to move embodied agent memory from retrieval at inference time to internalized parameters. It combines a slow reasoning LLM with a fast multimodal Mixture-of-Experts module using isolated LoRA adapters for different skill categories. Experiences are turned into skills by training on pairs of failed and corrected trajectories with both cloning and contrastive losses. A parameterization-worthiness score decides which experiences to keep, and a scale-free trigger decides when to consolidate without manual thresholds, allowing the agent to improve over time across different tasks.
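
The summary names a joint cloning-plus-contrastive loss on failure-correction pairs but gives no formula. The following is a minimal sketch of one possible form for a single decision step, assuming a discrete action space and a hinge-style margin between the log-probabilities of the corrected and failed actions. The function names, the weight `lam`, and `margin` are hypothetical and not taken from the paper.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of action logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def joint_loss(logits, corrected_action, failed_action, lam=0.5, margin=1.0):
    """Behavioral cloning plus a margin contrastive term (illustrative only).

    logits           : policy scores for each action in one state
    corrected_action : index of the action from the corrected trajectory
    failed_action    : index of the action from the failed trajectory
    lam, margin      : hypothetical hyperparameters, not from the paper
    """
    logp = log_softmax(logits)
    # Cloning term: raise the likelihood of the corrected action.
    bc = -logp[corrected_action]
    # Contrastive term: penalize the policy unless the corrected action
    # beats the failed action by at least `margin` in log-probability.
    gap = logp[corrected_action] - logp[failed_action]
    contrastive = max(0.0, margin - gap)
    return bc + lam * contrastive
```

Under this form the contrastive term vanishes once the corrected action is clearly preferred, so the failed trajectory only contributes gradient while the policy still confuses the two.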

Core claim

PEAM demonstrates that by internalizing experience through contrastive objectives on failure-correction trajectories into physically isolated per-category LoRA adapters, governed by a parameterization-worthiness score and scale-free self-triggered consolidation, embodied agents in Minecraft can achieve better long-horizon task performance, reduced forgetting of consolidated skills, and higher efficiency compared to retrieval-based or other parametric approaches.

What carries the argument

The multimodal Mixture-of-Experts LoRA architecture with per-category physically isolated adapters, which enables parameter-level continual learning without catastrophic forgetting.
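
To make "physically isolated" concrete, here is a small sketch of a linear layer with a frozen base weight and one separate low-rank adapter pair per skill category. It assumes hard routing by a known category label; the paper's actual router, ranks, and category names are not described in the abstract, so `IsolatedLoRALayer`, `"mining"`, and `"crafting"` are illustrative placeholders.

```python
def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

class IsolatedLoRALayer:
    """Frozen base weight W plus one (A, B) low-rank pair per category."""

    def __init__(self, base_weight, categories, rank=1):
        self.W = base_weight  # shape d_out x d_in, never updated
        d_out, d_in = len(base_weight), len(base_weight[0])
        # B starts at zero, so every adapter is a no-op until trained.
        self.adapters = {
            c: {"A": [[0.01] * d_in for _ in range(rank)],
                "B": [[0.0] * rank for _ in range(d_out)]}
            for c in categories
        }

    def forward(self, x, category):
        # Hard routing (an assumption): only the named adapter is used.
        adapter = self.adapters[category]
        base = matvec(self.W, x)
        low_rank = matvec(adapter["A"], x)         # project down to rank
        delta = matvec(adapter["B"], low_rank)     # project back to d_out
        return [b + d for b, d in zip(base, delta)]

layer = IsolatedLoRALayer([[1.0, 0.0], [0.0, 1.0]], ["mining", "crafting"])
before = layer.forward([1.0, 2.0], "crafting")
layer.adapters["mining"]["B"][0][0] = 5.0          # stand-in for a training update
after = layer.forward([1.0, 2.0], "crafting")
```

Because the two categories share no trainable parameters, `before` and `after` are identical: an update to one adapter cannot overwrite another. That is the structural argument behind the no-forgetting claim, although it says nothing about interference introduced by the router.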

Load-bearing premise

The parameterization-worthiness score combined with the scale-free consolidation trigger allows effective transfer to new task distributions without any re-tuning.
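
The abstract does not say how the trigger is computed. As an illustration of what "scale-free" can mean, the sketch below fires when recent worthiness scores rise above their own history by more than one historical standard deviation, so rescaling every score by a constant leaves the decision unchanged. The function `should_consolidate`, the window size, and the rule itself are assumptions, not the paper's mechanism.

```python
import statistics

def should_consolidate(scores, window=3):
    """Hypothetical scale-invariant trigger over a stream of scores.

    Compares the mean of the last `window` scores against the earlier
    history, in units of the history's own standard deviation.
    """
    if len(scores) <= window:
        return False                      # not enough history yet
    history, recent = scores[:-window], scores[-window:]
    spread = statistics.pstdev(history)
    if spread == 0.0:
        return False                      # no variation to normalize by
    shift = (statistics.mean(recent) - statistics.mean(history)) / spread
    return shift > 1.0                    # dimensionless comparison

rising = [0.10, 0.20, 0.15, 0.12, 0.18, 0.60, 0.70, 0.65]
flat = [0.10, 0.20, 0.15, 0.12, 0.18, 0.14, 0.16, 0.15]
```

The comparison still contains a constant, but it is dimensionless, which is the property that would let a rule of this kind carry over to a task distribution whose scores live on a different scale.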

What would settle it

Testing the agent on a significantly different Minecraft task distribution after consolidation without any parameter updates, and observing whether performance improves or forgetting occurs compared to baselines.
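
One common way to quantify forgetting in continual learning is the drop from a skill's best earlier success rate to its final success rate, averaged over skills. The sketch below computes that quantity; whether the paper uses this exact definition is not stated in the abstract, and the table layout `success[t][k]` is an assumption made for the example.

```python
def average_forgetting(success):
    """Mean drop from best earlier success rate to final success rate.

    success[t][k] is the success rate on skill k measured after
    consolidation step t. Pass only skills that were consolidated
    before the final step, and at least two measurement steps.
    """
    if len(success) < 2:
        raise ValueError("need at least two measurement steps")
    final = success[-1]
    drops = []
    for k in range(len(final)):
        best_earlier = max(row[k] for row in success[:-1])
        drops.append(best_earlier - final[k])
    return sum(drops) / len(drops)

# Illustrative numbers only, not results from the paper.
example = [
    [0.9, 0.0],
    [0.7, 0.8],
    [0.6, 0.5],
]
forgetting = average_forgetting(example)
```

A value near zero means consolidated skills were retained; a negative value indicates backward transfer, where later consolidation improved earlier skills.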

Figures

Figures reproduced from arXiv: 2605.27762 by Hongmin Cai, Junli Gong, Weicheng Wang, Weifeng Su, Yiu-ming Cheung, Yuchen Guo.

Figure 1: PEAM turns episodic agent memory into parametric embodied skills. A slow deliberative LLM explores, ...
Figure 2: PEAM architecture. Successful and corrected trajectories produced by the slow tier are staged in episodic ...
Figure 3: PV and STC make consolidation selective and self-triggered. (a) Full PV scoring ranks candidate skills ...
Figure 4: Forgetting under sequential consolidation.
Original abstract

We present PEAM, a Parametric Embodied Agent Memory framework in Minecraft that transforms agent memory from inference-time retrieval into parameter-resident skills internalized through experience. PEAM pairs a slow deliberative LLM for open-ended reasoning with a fast parametric module for reflexive execution of consolidated skills. The fast module is a multimodal Mixture-of-Experts LoRA architecture with per-category physically isolated adapters, enabling parameter-level continual learning without catastrophic forgetting. We treat failure as a first-class training signal: failure-correction trajectory pairs are internalized through a joint behavioral-cloning and contrastive objective, so the agent learns not only what succeeds but also how corrected actions differ from failed ones. To govern consolidation, PEAM introduces a parameterization-worthiness score for deciding which experience should be internalized, and a scale-free self-triggered consolidation mechanism for deciding when to internalize without task-specific hand-tuned thresholds, making the agent self-evolving as the trigger transfers across task distributions without re-tuning. Experiments in Minecraft show that PEAM improves long-horizon task performance, mitigates forgetting on previously consolidated skills, and improves parametric-versus-retrieval efficiency over retrieval-based embodied agents and parametric memory variants.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript introduces PEAM, a Parametric Embodied Agent Memory framework for Minecraft that converts agent memory from inference-time retrieval to parameter-resident skills. It pairs a slow deliberative LLM with a fast multimodal MoE-LoRA module using per-category physically isolated adapters for continual learning without forgetting. Failure-correction trajectory pairs are internalized via joint behavioral cloning and contrastive objectives. Consolidation is controlled by a parameterization-worthiness score and a scale-free self-triggered mechanism claimed to transfer across task distributions without re-tuning. Experiments are stated to show gains in long-horizon task performance, reduced forgetting on consolidated skills, and improved parametric-versus-retrieval efficiency over baselines.

Significance. If the empirical claims hold under rigorous evaluation, the work would be significant for embodied AI by showing how contrastive internalization of failures and self-triggered parametric consolidation can produce self-evolving agents that scale better than retrieval-heavy systems. The isolated-adapter design and scale-free trigger address real continual-learning and efficiency bottlenecks in long-horizon settings such as Minecraft.

major comments (1)
  1. [Abstract] The abstract asserts that 'Experiments in Minecraft show that PEAM improves long-horizon task performance, mitigates forgetting..., and improves parametric-versus-retrieval efficiency' yet supplies no metrics, baselines, trial counts, statistical tests, or data-exclusion criteria, rendering it impossible to evaluate whether the data support the central claims.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful reading and constructive comment on the abstract. We address the point below and will revise the manuscript accordingly.

Point-by-point responses
  1. Referee: [Abstract] The abstract asserts that 'Experiments in Minecraft show that PEAM improves long-horizon task performance, mitigates forgetting..., and improves parametric-versus-retrieval efficiency' yet supplies no metrics, baselines, trial counts, statistical tests, or data-exclusion criteria, rendering it impossible to evaluate whether the data support the central claims.

    Authors: We agree that the abstract would benefit from additional quantitative detail to allow readers to assess the strength of the empirical claims at a glance. The full manuscript reports these specifics (including metrics, baselines, number of trials, and statistical comparisons) in the Experiments section. To address the concern, we will revise the abstract to include representative performance numbers, baseline comparisons, and trial counts while preserving conciseness.

    revision: yes

Circularity Check

0 steps flagged

No significant circularity

The provided abstract and description contain no equations, derivations, self-citations, or load-bearing mechanisms that reduce any claimed prediction or result to its inputs by construction. All elements are descriptive of an empirical framework and experimental outcomes in Minecraft, with no visible mathematical chain or ansatz that could trigger the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review prevents identification of specific free parameters, axioms, or invented entities. The framework introduces several novel components whose implementation details and dependencies are not provided.

pith-pipeline@v0.9.1-grok · 5748 in / 1127 out tokens · 46194 ms · 2026-06-29T16:39:19.087521+00:00 · methodology
