MemCompiler: Compile, Don't Inject -- State-Conditioned Memory for Embodied Agents

Xin Ding; Xinrui Wang; Yifan Yang; Hao Wu; Shiqi Jiang; Qianxi Zhang; Liang Mi; Hanxin Zhu; Kun Li; Yunxin Liu; Zhibo Chen; Ting Cao

arxiv: 2605.07594 · v2 · submitted 2026-05-08 · 💻 cs.RO


Pith reviewed 2026-05-15 05:53 UTC · model grok-4.3

classification 💻 cs.RO

keywords memory compilation · embodied agents · state-conditioned memory · MemCompiler · ALFWorld · EmbodiedBench · ScienceWorld · latent memory channels


The pith

MemCompiler replaces static memory injection with a learned compiler that reads the agent's current state and compiles only relevant guidance at each step.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard memory systems for embodied agents load all retrieved memory at the start of an episode. This static context soon drifts out of sync with the agent's evolving situation, and it can push lightweight executors below a no-memory baseline. MemCompiler reframes the problem as state-conditioned compilation. A learned compiler examines a structured Brief State of the current execution, selects the pertinent memory, and turns it into executable guidance. The guidance travels through an explicit text channel and also through a latent Soft-Mem channel, which carries perceptual details that text cannot express. On ALFWorld, EmbodiedBench, and ScienceWorld, the method raises success rates over no-memory baselines by up to 129 percent (relative) across open-source backbones. It also matches or approaches closed-source frontier systems and cuts per-step latency by 60 percent.
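The sketch below shows one compilation step in Python, so the select-then-render idea is concrete. Plain word overlap stands in for the learned Memory Compiler. The names `MemoryEntry`, `compile_step`, and `top_k` are hypothetical. The sketch covers only the text channel and does not reproduce the paper's trained model or its Soft-Mem channel.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryEntry:
    """One stored experience, e.g. a lesson distilled from a past trajectory (hypothetical)."""
    text: str


def _words(s: str) -> set[str]:
    # Lowercased alphabetic tokens; a crude placeholder for a learned relevance model.
    return set(re.findall(r"[a-z]+", s.lower()))


def compile_step(memory: list[MemoryEntry], brief_state: str, top_k: int = 2) -> str:
    """Select the entries most related to the current brief state and render text guidance.

    Hypothetical stand-in: the paper's compiler is learned, not a word-overlap scorer.
    """
    state_words = _words(brief_state)
    scored = [(len(_words(m.text) & state_words), m) for m in memory]
    # Highest overlap first (stable sort keeps library order on ties);
    # entries sharing no word with the state are dropped entirely.
    ranked = sorted(scored, key=lambda pair: -pair[0])
    relevant = [m for score, m in ranked if score > 0][:top_k]
    if not relevant:
        return "No stored guidance applies to the current state."
    return "Guidance:\n" + "\n".join(f"- {m.text}" for m in relevant)


if __name__ == "__main__":
    library = [
        MemoryEntry("open the fridge before looking for milk"),
        MemoryEntry("the knife is usually on the countertop"),
        MemoryEntry("heat the mug in the microwave"),
    ]
    print(compile_step(library, "goal: put milk in mug; location: fridge"))
```

Selection is recomputed from the current state, so calling `compile_step` again after the state changes can return different entries. Static injection at episode start lacks this property.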

Core claim

MemCompiler reframes memory utilization as State-Conditioned Memory Compilation. A learned Memory Compiler reads a structured Brief State that captures the agent's current execution state and dynamically selects and compiles only relevant memory into executable guidance, delivered through a text channel and a latent Soft-Mem channel that preserves perceptual information not expressible in text.
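The contrast can also be written compactly. M and m*_t follow the notation in the paper's AMMI/SCMC passage, which is quoted in the Lean section. The remaining symbols are labels introduced here as a reading of the abstract: C_θ (compiler), b_t (Brief State at step t), h_t (interaction history), π (executor policy), and the split of m*_t into a text part and a latent part. These are not equations reproduced from the paper.

```latex
\begin{aligned}
\text{AMMI:}\quad & a_t \sim \pi\left(\cdot \mid M,\ h_t\right)
  && \text{(full library } M \text{ injected once, at } t = 0\text{)} \\
\text{SCMC:}\quad & m^{*}_{t} = \left(g^{\text{text}}_{t},\ z^{\text{soft}}_{t}\right) = C_\theta\left(M,\ b_t\right), \\
& a_t \sim \pi\left(\cdot \mid g^{\text{text}}_{t},\ z^{\text{soft}}_{t},\ h_t\right)
  && \text{(recompiled at every step } t\text{)}
\end{aligned}
```

Under this reading, AMMI conditions every step on the whole library. SCMC instead conditions each step on a state-dependent subset, so the context the executor reads can be much smaller than M.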

What carries the argument

The Memory Compiler, which ingests the agent's Brief State and outputs compiled guidance via text plus latent Soft-Mem channels to the executor.

If this is right

Open-source executors with MemCompiler match or approach closed-source frontier systems on ALFWorld, EmbodiedBench, and ScienceWorld.
Per-step latency drops by 60 percent compared with static memory injection.
Lightweight executors no longer fall below the no-memory baseline when memory is added.
Dynamic selection keeps guidance aligned with the agent's evolving execution state.
The same compiler-plus-executor split works across multiple open-source backbones with consistent gains.
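The headline numbers are relative changes, not percentage-point differences. This distinction matters when the no-memory baseline is low. The small helper below makes the arithmetic explicit. The success rates and latencies in it are invented for illustration and are not the paper's figures.

```python
def relative_change(baseline: float, new: float) -> float:
    """Return (new - baseline) / baseline, e.g. 1.29 for a +129% change."""
    if baseline == 0:
        raise ValueError("relative change is undefined for a zero baseline")
    return (new - baseline) / baseline


# Hypothetical success rates: a +129% relative gain from a 30% baseline
# amounts to 38.7 percentage points in absolute terms.
gain = relative_change(0.30, 0.687)
# Hypothetical per-step latencies in seconds: a 60% cut leaves 40% of the original.
latency_change = relative_change(2.5, 1.0)

print(f"success: {gain:+.0%}, latency: {latency_change:+.0%}")
```

Success rate cannot exceed 100%. A +129% relative gain is therefore only possible from a baseline of at most 1/2.29, about 43.7%, so the largest reported gain must come from a weak no-memory setting.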

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The Soft-Mem latent channel could let vision-language executors retain spatial or sensory details that would otherwise be lost in text summaries.
Training the compiler on more diverse state traces might reduce the need for hand-crafted prompts in new environments.
Because compilation happens per step, the architecture naturally supports long-horizon tasks where early memory would otherwise become obsolete.

Load-bearing premise

A learned compiler can consistently extract only the relevant subset of memory from the Brief State without introducing selection mistakes that harm the downstream executor.
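Per-step relevance labels would allow a direct check of this premise: score the compiler's selections against them. The sketch below computes selection precision and recall against a gold set of relevant memory IDs. Both the ID scheme and the existence of such gold labels are assumptions; the abstract does not report them.

```python
def selection_precision_recall(selected: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision penalizes injected noise; recall penalizes missed guidance.

    By convention, selecting nothing injects no noise (precision 1.0), and a step
    with no relevant memory cannot be missed (recall 1.0).
    """
    hits = len(selected & relevant)
    precision = hits / len(selected) if selected else 1.0
    recall = hits / len(relevant) if relevant else 1.0
    return precision, recall


def mean_selection_scores(steps: list[tuple[set[str], set[str]]]) -> tuple[float, float]:
    """Average per-step precision and recall over (selected, relevant) pairs from one episode."""
    if not steps:
        raise ValueError("no steps to score")
    pairs = [selection_precision_recall(sel, rel) for sel, rel in steps]
    return (
        sum(p for p, _ in pairs) / len(pairs),
        sum(r for _, r in pairs) / len(pairs),
    )
```

Low precision would signal the selection mistakes the premise rules out. Low recall would instead mean guidance the executor needed never reached it.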

What would settle it

An experiment in which the compiler is forced to include clearly irrelevant memory on a benchmark where the no-memory baseline already succeeds, producing a measurable drop in success rate.
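A two-proportion z-test on episode success counts is one simple way to decide whether such a drop is "measurable". It compares runs with and without the forced irrelevant memory. The choice of test and the counts in the example are illustrative assumptions. With only a few episodes per condition, an exact test would be safer.

```python
import math


def two_proportion_z(successes_a: int, n_a: int, successes_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided two-proportion z-test with a pooled variance estimate.

    Returns (z, p_value); z > 0 means condition A succeeded more often than B.
    """
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("all episodes succeeded, or all failed, in both conditions")
    z = (p_a - p_b) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: clean compilation vs. forced irrelevant memory, 100 episodes each.
    print(two_proportion_z(90, 100, 75, 100))
```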

Original abstract

Existing memory systems for embodied agents typically inject retrieved memory as static context at episode start, a paradigm we term Ahead-of-time Monolithic Memory Injection (AMMI). However, this static design quickly becomes misaligned with the agent's evolving state and may degrade lightweight executors below the no-memory baseline. To address this, we propose MemCompiler, which reframes memory utilization as State-Conditioned Memory Compilation. A learned Memory Compiler reads a structured Brief State capturing the agent's current execution state and dynamically selects and compiles only relevant memory into executable guidance. This guidance is delivered through a text channel and a latent Soft-Mem channel that preserves perceptual information not expressible in text. Across ALFWorld, EmbodiedBench, and ScienceWorld, MemCompiler consistently improves over no-memory across open-source backbones (up to +129%), matches or approaches frontier closed-source systems, and reduces per-step latency by 60%, demonstrating that state-aware memory compilation improves both effectiveness and efficiency.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note private letter to a colleague

MemCompiler swaps static upfront memory dumps for a learned compiler that pulls relevant pieces from a brief state summary and feeds them via text plus latent channels, with clear benchmark gains but thin details on how the state is built.


The paper's useful move is to stop treating memory as a fixed block injected at the start of an episode. Instead it trains a compiler that reads a structured snapshot of the agent's current situation and assembles only the parts that matter right now, shipping them as ordinary text plus a latent vector that keeps perceptual information text cannot hold. That state-conditioned framing and the dual-channel delivery are the concrete additions over the AMMI baseline they criticize.
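The abstract does not say how Soft-Mem vectors enter the executor. One common pattern for latent channels is a soft prefix. A learned projection maps a memory vector to a few pseudo-token embeddings, which are prepended to the executor's input embeddings alongside the text guidance. The pure-Python sketch below shows only that shape. `to_soft_tokens`, `build_executor_input`, the dimensions, and the toy weights are hypothetical, and the paper's actual Soft-Mem design may differ.

```python
def matvec(weights: list[list[float]], vec: list[float]) -> list[float]:
    """Multiply a (rows x len(vec)) matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def to_soft_tokens(memory_vec: list[float], projections: list[list[list[float]]]) -> list[list[float]]:
    """Map one latent memory vector to k pseudo-token embeddings, one projection per slot.

    Hypothetical: in a trained system these projections would be learned parameters.
    """
    return [matvec(proj, memory_vec) for proj in projections]


def build_executor_input(
    soft_tokens: list[list[float]], text_embeddings: list[list[float]]
) -> list[list[float]]:
    """Prepend soft tokens to the embedded text sequence (guidance + observation)."""
    if text_embeddings:
        dim = len(text_embeddings[0])
        if any(len(t) != dim for t in soft_tokens):
            raise ValueError("soft tokens must match the executor's embedding size")
    return soft_tokens + text_embeddings
```

The text channel remains ordinary tokens. The prefix adds k extra positions that carry information the executor never sees as words.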

Referee Report

2 major / 2 minor

Summary. The paper proposes MemCompiler as an alternative to Ahead-of-time Monolithic Memory Injection (AMMI) for embodied agents. It introduces a learned Memory Compiler that reads a structured Brief State of the agent's current execution context and dynamically compiles only relevant memory into executable guidance, delivered via a text channel and a latent Soft-Mem channel. Experiments on ALFWorld, EmbodiedBench, and ScienceWorld report consistent gains over no-memory baselines (up to +129% across open-source backbones), performance matching or approaching closed-source systems, and a 60% reduction in per-step latency.

Significance. If validated, the shift from static memory injection to state-conditioned compilation could improve both effectiveness and efficiency for long-horizon embodied tasks. The dual-channel design (text plus Soft-Mem) addresses a concrete limitation of text-only memory systems by preserving perceptual details, which is a substantive technical contribution if the empirical gains prove robust.

major comments (2)

[Abstract] Abstract: The headline claims of up to +129% improvement and 60% latency reduction rest on an empirical comparison whose controls, baseline implementations, statistical significance, and Brief State construction are not described. Without these, it is impossible to determine whether the gains arise from the proposed state-conditioned compilation or from unstated differences in prompting, training data, or executor fine-tuning.
[Abstract] Abstract / Experimental section: The central premise that the learned compiler reliably extracts only relevant memory without introducing selection errors or degrading the executor is not supported by any reported ablation on Brief State completeness, compiler training distribution, or failure cases on the ALFWorld / EmbodiedBench / ScienceWorld trajectories. If the Brief State omits temporal or perceptual details, the Soft-Mem channel could still inject incorrect guidance, undermining the claim that state-aware compilation is strictly superior to AMMI.
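The first major comment asks for statistical significance. The authors' response cites mean and standard deviation over five random seeds. The standard-library sketch below recomputes such a summary. The per-seed success rates in the usage line are invented. Using the sample (n - 1) standard deviation is an assumption about the paper's convention.

```python
import statistics


def summarize_seeds(success_rates: list[float]) -> str:
    """Format per-seed success rates as 'mean ± sample std (n=...)'."""
    if len(success_rates) < 2:
        raise ValueError("need at least two seeds for a standard deviation")
    mean = statistics.mean(success_rates)
    std = statistics.stdev(success_rates)  # sample standard deviation, n - 1 denominator
    return f"{mean:.3f} ± {std:.3f} (n={len(success_rates)})"


if __name__ == "__main__":
    # Hypothetical success rates from five seeds of one method on one benchmark.
    print(summarize_seeds([0.62, 0.66, 0.60, 0.64, 0.68]))
```

With only five seeds, gaps between methods that are smaller than about one standard deviation deserve caution.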

minor comments (2)

[Abstract] Abstract: The acronym AMMI is defined but the full phrase 'Ahead-of-time Monolithic Memory Injection' appears only once; repeating the expansion on first use in the main text would aid readability.
[Abstract] Abstract: The phrase 'matches or approaches frontier closed-source systems' is vague; specifying the exact closed-source models and the metric on which they are matched would strengthen the claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful and constructive feedback. We address each major comment below with clarifications drawn from the manuscript and indicate where revisions will strengthen the presentation.


Referee: [Abstract] Abstract: The headline claims of up to +129% improvement and 60% latency reduction rest on an empirical comparison whose controls, baseline implementations, statistical significance, and Brief State construction are not described. Without these, it is impossible to determine whether the gains arise from the proposed state-conditioned compilation or from unstated differences in prompting, training data, or executor fine-tuning.

Authors: We agree the abstract is concise and will expand it with explicit references. Section 4.1 states that all methods share identical executor backbones, prompting templates, and training distributions; only memory handling differs. Statistical significance is reported via mean and standard deviation over five random seeds in Table 1 and Appendix C. Brief State construction (current goal, inventory, recent observations, action history) is defined in Section 3.2. We will add a summary table of controls and a footnote in the abstract pointing to these sections. revision: yes
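The response lists the Brief State fields as current goal, inventory, recent observations, and action history. The sketch below is a minimal container for those four fields, with a bounded window over recent steps. The class name, the window size, and the text rendering are hypothetical; the response gives only the field list.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class BriefState:
    """Hypothetical container for the four Brief State fields named in the response."""
    goal: str
    inventory: list[str] = field(default_factory=list)
    window: int = 3  # assumed number of recent steps to keep
    recent_observations: deque = field(init=False, default_factory=deque)
    action_history: deque = field(init=False, default_factory=deque)

    def __post_init__(self) -> None:
        # Bounded deques silently drop the oldest step once the window is full.
        self.recent_observations = deque(maxlen=self.window)
        self.action_history = deque(maxlen=self.window)

    def update(self, action: str, observation: str) -> None:
        """Record one executor step."""
        self.action_history.append(action)
        self.recent_observations.append(observation)

    def render(self) -> str:
        """Serialize the state as compact text for the compiler's input."""
        return "\n".join([
            f"goal: {self.goal}",
            f"inventory: {', '.join(self.inventory) or 'empty'}",
            f"recent actions: {' | '.join(self.action_history)}",
            f"recent observations: {' | '.join(self.recent_observations)}",
        ])
```

A fixed window keeps the compiler's input small. It is also exactly the kind of omission of older temporal detail that the referee's completeness ablation would probe.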
Referee: [Abstract] Abstract / Experimental section: The central premise that the learned compiler reliably extracts only relevant memory without introducing selection errors or degrading the executor is not supported by any reported ablation on Brief State completeness, compiler training distribution, or failure cases on the ALFWorld / EmbodiedBench / ScienceWorld trajectories. If the Brief State omits temporal or perceptual details, the Soft-Mem channel could still inject incorrect guidance, undermining the claim that state-aware compilation is strictly superior to AMMI.

Authors: Ablations on Brief State completeness appear in Appendix B, showing performance drops when temporal or perceptual elements are omitted. The compiler is trained on trajectories drawn from the same environment distributions used at test time. Section 5.3 analyzes failure trajectories and attributes most errors to executor limitations rather than compilation mistakes; the Soft-Mem channel demonstrably preserves details that improve rather than degrade performance. We will move key ablation results into the main experimental section and expand the failure-case discussion with additional examples. revision: partial

Circularity Check

0 steps flagged

No circularity: empirical evaluation of state-conditioned memory compilation

full rationale

The paper defines AMMI as a baseline paradigm and introduces MemCompiler as an alternative that uses a learned compiler on Brief State to produce guidance via text and Soft-Mem channels. All reported gains (+129% over no-memory, 60% latency reduction) are presented as outcomes of benchmark experiments on ALFWorld, EmbodiedBench, and ScienceWorld rather than as predictions derived from equations or self-referential definitions. No load-bearing step reduces a claimed result to a fitted parameter renamed as prediction, a self-citation chain, or an ansatz smuggled via prior work. The derivation chain is therefore self-contained empirical comparison against external baselines.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 3 invented entities

The abstract introduces three new components whose behavior is learned rather than derived; no explicit free parameters, axioms, or independent evidence for the invented entities are stated.

invented entities (3)

Memory Compiler no independent evidence
purpose: Learned module that reads Brief State and compiles relevant memory into guidance
Core learned component whose selection logic is not further specified in the abstract.
Brief State no independent evidence
purpose: Structured capture of the agent's current execution state
Input representation required for conditioning the compiler.
Soft-Mem channel no independent evidence
purpose: Latent channel preserving perceptual information not expressible in text
Additional delivery mechanism introduced to complement text guidance.

pith-pipeline@v0.9.0 · 5500 in / 1190 out tokens · 43838 ms · 2026-05-15T05:53:20.290741+00:00 · methodology

Review history: 2 revisions


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

Relation between the paper passage and the cited Recognition theorem.

MemCompiler reframes memory utilization as State-Conditioned Memory Compilation. A learned Memory Compiler reads a structured Brief State... delivers through text channel and latent Soft-Mem channel
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

Relation between the paper passage and the cited Recognition theorem.

AMMI injects the full M at episode start... SCMC retains M as source library and compiles only state-relevant content m*_t at each step

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

73 extracted references · 73 canonical work pages · 14 internal anchors

[1]

Physical reasoning and object planning for household embodied agents.arXiv preprint arXiv:2311.13577, 2023

Ayush Agrawal, Raghav Prabhakar, Anirudh Goyal, and Dianbo Liu. Physical reasoning and object planning for household embodied agents.arXiv preprint arXiv:2311.13577, 2023

work page arXiv 2023
[2]

Multi-agent embodied ai: Advances and future directions.Science China Information Sciences, 69(5):151202, 2026

Zhaohan Feng, Ruiqi Xue, Lei Yuan, Yang Yu, Ning Ding, Meiqin Liu, Bingzhao Gao, Jian Sun, Xinhu Zheng, and Gang Wang. Multi-agent embodied ai: Advances and future directions.Science China Information Sciences, 69(5):151202, 2026

work page 2026
[3]

Gemini Robotics 1.5: Pushing the Frontier of Generalist Robots with Advanced Embodied Reasoning, Thinking, and Motion Transfer

Gemini Robotics Team, Abbas Abdolmaleki, Saminda Abeyruwan, Joshua Ainslie, Jean-Baptiste Alayrac, Montserrat Gonzalez Arenas, Ashwin Balakrishna, Nathan Batchelor, Alex Bewley, Jeff Bingham, et al. Gemini robotics 1.5: Pushing the frontier of generalist robots with advanced embodied reasoning, thinking, and motion transfer.arXiv preprint arXiv:2510.03342, 2025

work page internal anchor Pith review arXiv 2025
[4]

arXiv preprint arXiv:2504.21716 (2025)

Marc Glocker, Peter Hönig, Matthias Hirschmanner, and Markus Vincze. Llm-empowered embodied agent for memory-augmented task planning in household robotics.arXiv preprint arXiv:2504.21716, 2025

work page arXiv 2025
[5]

Habitat 2.0: Training home assistants to rearrange their habitat.Advances in neural information processing systems, 34:251–266, 2021

Andrew Szot, Alexander Clegg, Eric Undersander, Erik Wijmans, Yili Zhao, John Turner, Noah Maestre, Mustafa Mukadam, Devendra Singh Chaplot, Oleksandr Maksymets, et al. Habitat 2.0: Training home assistants to rearrange their habitat.Advances in neural information processing systems, 34:251–266, 2021

work page 2021
[6]

Large language models as generalizable policies for embodied tasks

Andrew Szot, Max Schwarzer, Harsh Agrawal, Bogdan Mazoure, Rin Metcalf, Walter Talbott, Natalie Mackraz, R Devon Hjelm, and Alexander T Toshev. Large language models as generalizable policies for embodied tasks. InThe Twelfth International Conference on Learning Representations, 2023

work page 2023
[7]

Vlmbench: A compositional benchmark for vision-and-language manipulation.Advances in Neural Information Processing Systems, 35:665– 678, 2022

Kaizhi Zheng, Xiaotong Chen, Odest Chadwicke Jenkins, and Xin Wang. Vlmbench: A compositional benchmark for vision-and-language manipulation.Advances in Neural Information Processing Systems, 35:665– 678, 2022

work page 2022
[8]

Remember Me, Refine Me: A Dynamic Procedural Memory Framework for Experience-Driven Agent Evolution

Zouying Cao, Jiaji Deng, Li Yu, Weikang Zhou, Zhaoyang Liu, Bolin Ding, and Hai Zhao. Remember me, refine me: A dynamic procedural memory framework for experience-driven agent evolution.arXiv preprint arXiv:2512.10696, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[9]

Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning

Sikuan Yan, Xiufeng Yang, Zuchao Huang, Ercong Nie, Zifeng Ding, Zonggen Li, Xiaowen Ma, Jinhe Bi, Kristian Kersting, Jeff Z Pan, et al. Memory-r1: Enhancing large language model agents to manage and utilize memories via reinforcement learning.arXiv preprint arXiv:2508.19828, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[10]

ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory

Siru Ouyang, Jun Yan, I Hsu, Yanfei Chen, Ke Jiang, Zifeng Wang, Rujun Han, Long T Le, Samira Daruki, Xiangru Tang, et al. Reasoningbank: Scaling agent self-evolving with reasoning memory.arXiv preprint arXiv:2509.25140, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[11]

Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, Y

Alireza Rezazadeh, Zichao Li, Wei Wei, and Yujia Bao. From isolated conversations to hierarchical schemas: Dynamic tree memory representation for llms.arXiv preprint arXiv:2410.14052, 2024

work page arXiv 2024
[12]

H-mem: Harnessing synaptic plasticity with hebbian memory networks.Advances in Neural Information Processing Systems, 33:21627–21637, 2020

Thomas Limbacher and Robert Legenstein. H-mem: Harnessing synaptic plasticity with hebbian memory networks.Advances in Neural Information Processing Systems, 33:21627–21637, 2020

work page 2020
[13]

Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory

Prateek Chhikara, Dev Khant, Saket Aryan, Taranjeet Singh, and Deshraj Yadav. Mem0: Building production- ready ai agents with scalable long-term memory.arXiv preprint arXiv:2504.19413, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[14]

Sgmem: Sentence graph memory for long-term conversational agents.arXiv preprint arXiv:2509.21212, 2025

Yaxiong Wu, Yongyue Zhang, Sheng Liang, and Yong Liu. Sgmem: Sentence graph memory for long-term conversational agents.arXiv preprint arXiv:2509.21212, 2025. 12

work page arXiv 2025
[15]

Hiagent: Hierarchical working memory management for solving long-horizon agent tasks with large language model

Mengkang Hu, Tianxing Chen, Qiguang Chen, Yao Mu, Wenqi Shao, and Ping Luo. Hiagent: Hierarchical working memory management for solving long-horizon agent tasks with large language model. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 32779–32798, 2025

work page 2025
[16]

Seeing, listening, remembering, and reasoning: A multi- modal agent with long-term memory.arXiv preprint arXiv:2508.09736, 2025

Lin Long, Yichen He, Wentao Ye, Yiyuan Pan, Yuan Lin, Hang Li, Junbo Zhao, and Wei Li. Seeing, listening, remembering, and reasoning: A multimodal agent with long-term memory.arXiv preprint arXiv:2508.09736, 2025

work page arXiv 2025
[17]

Openclaw: The ai that actually does things

OpenClaw. Openclaw: The ai that actually does things. https://openclaw.ai/, 2026. Accessed: 2026-05-04

work page 2026
[18]

Claude code: Ai-powered coding assistant for developers.https://claude.com/product/ claude-code, 2026

Anthropic. Claude code: Ai-powered coding assistant for developers.https://claude.com/product/ claude-code, 2026. Accessed: 2026-05-04

work page 2026
[19]

Alfred: A benchmark for interpreting grounded instructions for everyday tasks

Mohit Shridhar, Jesse Thomason, Daniel Gordon, Yonatan Bisk, Winson Han, Roozbeh Mottaghi, Luke Zettlemoyer, and Dieter Fox. Alfred: A benchmark for interpreting grounded instructions for everyday tasks. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10740–10749, 2020

work page 2020
[20]

Behavior-1k: A benchmark for embodied ai with 1,000 everyday activities and realistic simulation

Chengshu Li, Ruohan Zhang, Josiah Wong, Cem Gokmen, Sanjana Srivastava, Roberto Martín-Martín, Chen Wang, Gabrael Levine, Michael Lingelbach, Jiankai Sun, et al. Behavior-1k: A benchmark for embodied ai with 1,000 everyday activities and realistic simulation. InConference on Robot Learning, pages 80–93. PMLR, 2023

work page 2023
[21]

Robot-r1: Reinforcement learning for enhanced embodied reasoning in robotics

Dongyoung Kim, Sumin Park, Huiwon Jang, Jinwoo Shin, Jaehyung Kim, and Younggyo Seo. Robot-r1: Reinforcement learning for enhanced embodied reasoning in robotics.arXiv preprint arXiv:2506.00070, 2025

work page arXiv 2025
[22]

Robocasa365: A large-scale simulation framework for training and benchmarking generalist robots, 2026

Soroush Nasiriany, Sepehr Nasiriany, Abhiram Maddukuri, and Yuke Zhu. Robocasa365: A large-scale simulation framework for training and benchmarking generalist robots.arXiv preprint arXiv:2603.04356, 2026

work page arXiv 2026
[23]

Behavior: Benchmark for everyday household activities in virtual, interactive, and ecological environments

Sanjana Srivastava, Chengshu Li, Michael Lingelbach, Roberto Martín-Martín, Fei Xia, Kent Elliott Vainio, Zheng Lian, Cem Gokmen, Shyamal Buch, Karen Liu, et al. Behavior: Benchmark for everyday household activities in virtual, interactive, and ecological environments. InConference on robot learning, pages 477–490. PMLR, 2022

work page 2022
[24]

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, YK Li, Yang Wu, et al. Deepseekmath: Pushing the limits of mathematical reasoning in open language models. arXiv preprint arXiv:2402.03300, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[25]

Generative agents: Interactive simulacra of human behavior

Joon Sung Park, Joseph O’Brien, Carrie Jun Cai, Meredith Ringel Morris, Percy Liang, and Michael S Bernstein. Generative agents: Interactive simulacra of human behavior. InProceedings of the 36th annual acm symposium on user interface software and technology, pages 1–22, 2023

work page 2023
[26]

Mobile-agent-v2: Mobile device operation assistant with effective navigation via multi-agent collaboration.Advances in Neural Information Processing Systems, 37:2686–2710, 2024

Junyang Wang, Haiyang Xu, Haitao Jia, Xi Zhang, Ming Yan, Weizhou Shen, Ji Zhang, Fei Huang, and Jitao Sang. Mobile-agent-v2: Mobile device operation assistant with effective navigation via multi-agent collaboration.Advances in Neural Information Processing Systems, 37:2686–2710, 2024

work page 2024
[27]

Reflexion: Language agents with verbal reinforcement learning.Advances in neural information processing systems, 36:8634–8652, 2023

Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. Reflexion: Language agents with verbal reinforcement learning.Advances in neural information processing systems, 36:8634–8652, 2023

work page 2023
[28]

Memgpt: towards llms as operating systems

Charles Packer, Vivian Fang, Shishir_G Patil, Kevin Lin, Sarah Wooders, and Joseph_E Gonzalez. Memgpt: towards llms as operating systems. 2023

work page 2023
[29]

Expel: Llm agents are experiential learners

Andrew Zhao, Daniel Huang, Quentin Xu, Matthieu Lin, Yong-Jin Liu, and Gao Huang. Expel: Llm agents are experiential learners. InProceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 19632–19642, 2024. 13

work page 2024
[30]

Hipporag: Neurobiologically inspired long-term memory for large language models.Advances in neural information processing systems, 37:59532–59569, 2024

Bernal J Gutiérrez, Yiheng Shu, Yu Gu, Michihiro Yasunaga, and Yu Su. Hipporag: Neurobiologically inspired long-term memory for large language models.Advances in neural information processing systems, 37:59532–59569, 2024

work page 2024
[31]

Kadavy, Inc., 2021

David Kadavy.Digital Zettelkasten: Principles, Methods, & Examples. Kadavy, Inc., 2021

work page 2021
[32]

Sönke Ahrens, 2022

Sönke Ahrens.How to take smart notes: One simple technique to boost writing, learning and thinking. Sönke Ahrens, 2022

work page 2022
[33]

A-MEM: Agentic Memory for LLM Agents

Wujiang Xu, Zujie Liang, Kai Mei, Hang Gao, Juntao Tan, and Yongfeng Zhang. A-mem: Agentic memory for llm agents.arXiv preprint arXiv:2502.12110, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[34]

Introducingclaudeopus4.6

Anthropic. Introducingclaudeopus4.6. https://www.anthropic.com/news/claude-opus-4-6,2026. Online; accessed 2026-05-05

work page 2026
[35]

Language models are few-shot learners

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901, 2020

work page 1901
[36]

Gemini: A Family of Highly Capable Multimodal Models

Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M Dai, Anja Hauth, Katie Millican, et al. Gemini: a family of highly capable multimodal models.arXiv preprint arXiv:2312.11805, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[37]

Voyager: An Open-Ended Embodied Agent with Large Language Models

Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, and An- ima Anandkumar. Voyager: An open-ended embodied agent with large language models.arXiv preprint arXiv:2305.16291, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[38]

Jarvis-1: Open-world multi-task agents with memory-augmented multimodal language models.IEEE Transactions on Pattern Analysis and Machine Intelligence, 47(3):1894–1907, 2024

Zihao Wang, Shaofei Cai, Anji Liu, Yonggang Jin, Jinbing Hou, Bowei Zhang, Haowei Lin, Zhaofeng He, Zilong Zheng, Yaodong Yang, et al. Jarvis-1: Open-world multi-task agents with memory-augmented multimodal language models.IEEE Transactions on Pattern Analysis and Machine Intelligence, 47(3):1894–1907, 2024

work page 1907
[39]

G- memory: Tracing hierarchical memory for multi-agent systems, 2025

Guibin Zhang, Muxin Fu, Guancheng Wan, Miao Yu, Kun Wang, and Shuicheng Yan. G-memory: Tracing hierarchical memory for multi-agent systems.arXiv preprint arXiv:2506.07398, 2025

work page arXiv 2025
[40]

Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory

Tianxin Wei, Noveen Sachdeva, Benjamin Coleman, Zhankui He, Yuanchen Bei, Xuying Ning, Mengting Ai, Yunzhe Li, Jingrui He, Ed H Chi, et al. Evo-memory: Benchmarking llm agent test-time learning with self-evolving memory.arXiv preprint arXiv:2511.20857, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[41]

General agentic memory via deep research

BY Yan, Chaofan Li, Hongjin Qian, Shuqi Lu, and Zheng Liu. General agentic memory via deep research. arXiv preprint arXiv:2511.18423, 2025

work page arXiv 2025
[42]

Robomemory: A brain-inspired multi-memory agentic framework for lifelong learning in physical embodied systems

Mingcong Lei, Honghao Cai, Zezhou Cui, Liangchen Tan, Junkun Hong, Gehan Hu, Shuangyu Zhu, Yimou Wu, Shaohan Jiang, Ge Wang, et al. Robomemory: A brain-inspired multi-memory agentic framework for lifelong learning in physical embodied systems. InNeurIPS 2025 Workshop on Space in Vision, Language, and Embodied AI, 2025

work page 2025
[43]

Nextmem: Towards latent factual memory for llm-based agents.arXiv preprint arXiv:2603.15634, 2026

Zeyu Zhang, Rui Li, Xiaoyan Zhao, Yang Zhang, Wenjie Wang, Xu Chen, and Tat-Seng Chua. Nextmem: Towards latent factual memory for llm-based agents.arXiv preprint arXiv:2603.15634, 2026

work page arXiv 2026
[44]

Appagent: Multimodal agents as smartphone users

Guibin Zhang, Muxin Fu, and Shuicheng Yan. Memgen: Weaving generative latent memory for self-evolving agents.arXiv preprint arXiv:2509.24704, 2025

work page arXiv 2025
[45]

Vismem: Latent vision memory unlocks potential of vision-language models,

Xinlei Yu, Chengming Xu, Guibin Zhang, Zhangquan Chen, Yudong Zhang, Yongbo He, Peng-Tao Jiang, Jiangning Zhang, Xiaobin Hu, and Shuicheng Yan. Vismem: Latent vision memory unlocks potential of vision-language models.arXiv preprint arXiv:2511.11007, 2025

work page arXiv 2025
[46]

ALFWorld: Aligning Text and Embodied Environments for Interactive Learning

Mohit Shridhar, Xingdi Yuan, Marc-Alexandre Côté, Yonatan Bisk, Adam Trischler, and Matthew Hausknecht. Alfworld: Aligning text and embodied environments for interactive learning.arXiv preprint arXiv:2010.03768, 2020. 14

work page internal anchor Pith review Pith/arXiv arXiv 2010
[47]

EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents

Rui Yang, Hanyang Chen, Junyu Zhang, Mark Zhao, Cheng Qian, Kangrui Wang, Qineng Wang, Teja Venkat Koripella, Marziyeh Movahedi, Manling Li, et al. Embodiedbench: Comprehensive benchmarking multi-modal large language models for vision-driven embodied agents.arXiv preprint arXiv:2502.09560, 2025

work page internal anchor Pith review arXiv 2025
[48]

Scienceworld: Is your agent smarter than a 5th grader? InProceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 11279–11298, 2022

Ruoyao Wang, Peter Jansen, Marc-Alexandre Côté, and Prithviraj Ammanabrolu. Scienceworld: Is your agent smarter than a 5th grader? InProceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 11279–11298, 2022

work page 2022
[49]

Langmem: A framework for long-term memory in llm agents.https://github.com/ langchain-ai/langmem, 2026

LangChain AI. Langmem: A framework for long-term memory in llm agents.https://github.com/ langchain-ai/langmem, 2026. Accessed: 2026-05-04

work page 2026
[50]

Qwen2.5-vl technical report, 2025

Shuai Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Sibo Song, Kai Dang, Peng Wang, Shijie Wang, Jun Tang, Humen Zhong, Yuanzhi Zhu, Mingkun Yang, Zhaohai Li, Jianqiang Wan, Pengfei Wang, Wei Ding, Zheren Fu, Yiheng Xu, Jiabo Ye, Xi Zhang, Tianbao Xie, Zesen Cheng, Hang Zhang, Zhibo Yang, Haiyang Xu, and Junyang Lin. Qwen2.5-vl technical report, 2025

work page 2025
[51] Shuai Bai, Yuxuan Cai, Ruizhe Chen, Keqin Chen, Xionghui Chen, Zesen Cheng, Lianghao Deng, Wei Ding, Chang Gao, Chunjiang Ge, et al. Qwen3-VL technical report. arXiv preprint arXiv:2511.21631, 2025.

[52] Marc-Alexandre Côté, Akos Kádár, Xingdi Yuan, Ben Kybartas, Tavian Barnes, Emery Fine, James Moore, Matthew Hausknecht, Layla El Asri, Mahmoud Adada, et al. TextWorld: A learning environment for text-based games. In Workshop on Computer Games, pages 41–75. Springer, 2018.

[53] Eric Kolve, Roozbeh Mottaghi, Winson Han, Eli VanderBilt, Luca Weihs, Alvaro Herrasti, Matt Deitke, Kiana Ehsani, Daniel Gordon, Yuke Zhu, et al. AI2-THOR: An interactive 3D environment for visual AI. arXiv preprint arXiv:1712.05474, 2017.

[54] Berk Calli, Arjun Singh, Aaron Walsman, Siddhartha Srinivasa, Pieter Abbeel, and Aaron M Dollar. The YCB object and model set: Towards common benchmarks for manipulation research. In 2015 International Conference on Advanced Robotics (ICAR), pages 510–517. IEEE, 2015.

[55] Yu Shang, Yu Li, Keyu Zhao, Likai Ma, Jiahe Liu, Fengli Xu, and Yong Li. AgentSquare: Automatic LLM agent search in modular design space. In The Thirteenth International Conference on Learning Representations, 2024.

[56] Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Liang Wang, Weizhu Chen, et al. LoRA: Low-rank adaptation of large language models. ICLR, 1(2):3, 2022.

A. Progress Rate on EmbodiedBench

Table 6 reports Progress Rate, which credits partial subgoal completion, as a complement to the strict Success Rate in Table 1. The Progr...

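The sketch below illustrates the difference between the two metrics. It assumes each episode is scored as a list of boolean subgoal flags. Progress Rate is taken here as the mean fraction of completed subgoals, and Success Rate as the fraction of episodes in which every subgoal is completed. EmbodiedBench's actual subgoal definitions are not reproduced, and the function names are hypothetical.

```python
from typing import Sequence

def success_rate(episodes: Sequence[Sequence[bool]]) -> float:
    """Strict metric: an episode counts only if every subgoal is completed."""
    if not episodes:
        return 0.0
    return sum(all(e) for e in episodes) / len(episodes)

def progress_rate(episodes: Sequence[Sequence[bool]]) -> float:
    """Partial credit (assumed definition): mean fraction of subgoals completed per episode."""
    if not episodes:
        return 0.0
    fractions = [sum(e) / len(e) if e else 0.0 for e in episodes]
    return sum(fractions) / len(fractions)

# Two episodes: one fully solved, one with 1 of 4 subgoals done.
runs = [[True, True], [True, False, False, False]]
print(success_rate(runs))   # 0.5
print(progress_rate(runs))  # 0.625
```

An agent that consistently gets partway through a task scores zero on Success Rate but still earns credit under Progress Rate. This is why the two metrics are reported side by side.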
Environment Initialization and Task Loading: Initialize the target environment and load the task goal. Concurrently, maintain a dynamically updated task memory (M) to store successful and failed trajectories accumulated from historical interactions.

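The following is a minimal sketch of a task memory M that keeps both successful and failed trajectories, as the step above describes. The `Trajectory` and `TaskMemory` classes are hypothetical; the paper does not specify how M is stored.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Trajectory:
    goal: str
    actions: List[str]
    success: bool

@dataclass
class TaskMemory:
    """Hypothetical stand-in for M: keeps successful and failed trajectories side by side."""
    trajectories: List[Trajectory] = field(default_factory=list)

    def add(self, traj: Trajectory) -> None:
        self.trajectories.append(traj)

    def successes(self) -> List[Trajectory]:
        return [t for t in self.trajectories if t.success]

    def failures(self) -> List[Trajectory]:
        return [t for t in self.trajectories if not t.success]

M = TaskMemory()
M.add(Trajectory("clean a mug", ["go to sinkbasin 1", "clean mug 1 with sinkbasin 1"], True))
M.add(Trajectory("heat an egg", ["go to fridge 1", "go to fridge 1"], False))
print(len(M.successes()), len(M.failures()))  # 1 1
```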
State-Conditioned Memory Compilation: At each timestep t during task execution, the following procedures are performed:
• Memory Selection: Using the current task goal as the query, retrieve relevant historical trajectory segments from M.
• Brief State Construction: Encode the current environment observation, historical action sequence, and the task goal into a structured brief state (s_brief). ...

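The two procedures above can be sketched as follows, under loud assumptions. Token-overlap (Jaccard) similarity stands in for whatever retriever the authors actually use. The `BriefState` layout (goal, observation, recent actions) is a hypothetical rendering of s_brief.

```python
from dataclasses import dataclass
from typing import List, Tuple

def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; a stand-in for whichever retriever is actually used."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def select_memory(goal: str, memory: List[Tuple[str, str]], k: int = 2) -> List[str]:
    """Memory Selection: rank (stored_goal, segment) pairs by similarity to the current goal."""
    ranked = sorted(memory, key=lambda item: jaccard(goal, item[0]), reverse=True)
    return [segment for _, segment in ranked[:k]]

@dataclass
class BriefState:
    """Brief State Construction: goal + current observation + recent action history."""
    goal: str
    observation: str
    action_history: List[str]

    def render(self, max_actions: int = 5) -> str:
        recent = self.action_history[-max_actions:]
        return (f"GOAL: {self.goal}\n"
                f"OBS: {self.observation}\n"
                f"RECENT ACTIONS: {'; '.join(recent) or 'none'}")

memory = [
    ("put a hot apple in fridge", "heat apple 1 with microwave 1"),
    ("clean a mug", "clean mug 1 with sinkbasin 1"),
    ("put a cold apple in fridge", "cool apple 1 with fridge 1"),
]
print(select_memory("put a cold apple in the fridge", memory, k=1))
# ['cool apple 1 with fridge 1']
print(BriefState("put a cold apple in the fridge", "You see a fridge 1.", []).render())
```

Selection is keyed on the task goal alone, while the brief state also captures the current observation and history. The compiler therefore receives a goal-level candidate set together with a step-level view of where the agent currently is.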
Action Execution and Feedback: The Executor uses the guidance g together with the current environment observation to output a specific action. The execution results and environment feedback are recorded and then used to update the task memory at the end of the episode.

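The per-step loop described above can be sketched as a control-flow skeleton. `compile_fn`, `execute_fn`, and `env_step` are hypothetical callables standing in for the Memory Compiler, the Executor, and the environment, and memory selection is folded into `compile_fn`. The sketch shows two points from the text. Guidance is recompiled at every step, and M is written only once the episode ends.

```python
from typing import Callable, List, Tuple

# Hypothetical record type for M: (goal, action sequence, success flag).
Record = Tuple[str, List[str], bool]

def run_episode(goal: str, first_obs: str, memory: List[Record],
                compile_fn: Callable[[str, List[Record]], str],
                execute_fn: Callable[[str, str], str],
                env_step: Callable[[str], Tuple[str, bool, bool]],
                max_steps: int = 30) -> bool:
    """One episode of the compile-then-act loop."""
    obs, actions, success = first_obs, [], False
    for _ in range(max_steps):
        brief = f"goal={goal} | obs={obs} | recent={actions[-5:]}"  # toy brief state
        guidance = compile_fn(brief, memory)   # guidance g is recompiled at every step
        action = execute_fn(guidance, obs)     # Executor sees g plus the raw observation
        actions.append(action)
        obs, done, success = env_step(action)
        if done:
            break
    memory.append((goal, actions, success))    # M is updated only once the episode ends
    return success

def toy_env_step(action: str) -> Tuple[str, bool, bool]:
    done = action == "open fridge 1"
    return ("the fridge is open" if done else "nothing happens", done, done)

mem: List[Record] = []
ok = run_episode("open the fridge", "you are in the kitchen", mem,
                 compile_fn=lambda brief, m: "open fridge 1",
                 execute_fn=lambda g, o: g,
                 env_step=toy_env_step)
print(ok, len(mem))  # True 1
```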
SFT Sample Extraction and Encapsulation: After traversing the training set, successful trajectories are filtered and encapsulated into two types of supervision signals:
• Compiler Data: Takes s_brief and the retrieved memory as inputs, with the teacher-generated guidance g as the target output. This data is used to train the Memory Compiler.
• Executor Data:...

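The sketch below covers only the Compiler Data path, because the Executor Data format is elided above. The trajectory layout and field names (`brief`, `memory`, `teacher_guidance`) and the prompt template are hypothetical. The filter keeps successful trajectories only, and the target is the teacher-generated guidance.

```python
from typing import Dict, List

def extract_compiler_samples(trajectories: List[Dict]) -> List[Dict[str, str]]:
    """Filter to successful trajectories and emit (input, target) pairs for the Memory Compiler.

    Assumed (hypothetical) trajectory layout:
    {"success": bool, "steps": [{"brief": str, "memory": str, "teacher_guidance": str}, ...]}
    """
    samples: List[Dict[str, str]] = []
    for traj in trajectories:
        if not traj["success"]:
            continue  # only successful trajectories become supervision
        for step in traj["steps"]:
            samples.append({
                "input": f"[BRIEF STATE]\n{step['brief']}\n[MEMORY]\n{step['memory']}",
                "target": step["teacher_guidance"],
            })
    return samples

logs = [
    {"success": True, "steps": [{"brief": "goal: cool an apple; holding apple 1",
                                 "memory": "cool X with fridge 1",
                                 "teacher_guidance": "Cool apple 1 with fridge 1."}]},
    {"success": False, "steps": [{"brief": "goal: heat an egg",
                                  "memory": "", "teacher_guidance": "Go to microwave 1."}]},
]
compiler_data = extract_compiler_samples(logs)
print(len(compiler_data), compiler_data[0]["target"])  # 1 Cool apple 1 with fridge 1.
```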
**EXPERIENCE (Intervention):**
- **Triggers:** Executor is stuck (loops), deviating from goal, or a sub-task is done (needs new instruction).
- **Rules:**
  - Provide **single-step** strategic goals (No "do X then Y").
  - If looping, explicitly say "Do not [action] again".
  - If "move" fails at goal, suggest "put" (and vice versa).
  - If task item observed/clo...

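In MemCompiler these rules are followed by a learned compiler reading a prompt, not by hard-coded checks. As an illustration only, two of them can be written as explicit predicates. The 3-action loop window and the "then" heuristic are assumptions chosen for the sketch.

```python
import re
from typing import List, Optional

def loop_intervention(history: List[str], window: int = 3) -> Optional[str]:
    """Emit the 'Do not [action] again' guidance when the last `window` actions are identical."""
    if len(history) < window:
        return None
    recent = history[-window:]
    return f"Do not {recent[0]} again." if len(set(recent)) == 1 else None

def is_single_step(guidance: str) -> bool:
    """Crude check for the single-step rule: reject guidance that chains actions with 'then'."""
    return re.search(r"\bthen\b", guidance, flags=re.IGNORECASE) is None

print(loop_intervention(["go to desk 1"] * 3))          # Do not go to desk 1 again.
print(is_single_step("Go to fridge 1, then open it."))  # False
```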
Brevity: Keep reasons and guidance concise.

Figure 8 | Memory Compiler System Prompt on Alfworld.

You are the **High-Level Strategic Planner** for an embodied agent operating in a household environment. You observe the environment through first-person RGB images. You guide the Low-Level Executor using visual ob...

**EXPERIENCE (Intervention):**
- **Triggers:** Executor is stuck (loops), deviating from goal, or a sub-task is done (needs new instruction).
- **Rules:**
  - Provide **single-step** strategic goals (No "do X then Y").
  - If looping, explicitly say "Do not [action] again".
  - If task item observed/closed, strictly guide to interact/open it.

**BRIEF (Brief State Ops):**
- **Triggers:** New discovery (loc/state), state change (e.g., door opened), or task progress state update needed.
- **Rules:** `UPDATE`/`DELETE` must match existing text exactly. `FOLD` if >10 items.

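The BRIEF rules can be sketched on a Brief State held as a list of strings. `UPDATE`/`DELETE` are rejected unless the target matches an existing item exactly, and a fold is triggered once the list exceeds 10 items, as the prompt specifies. Several parts are assumptions not given in the prompt: the `ADD` op, what `FOLD` actually produces (here, merging older items into one summary line), and the `keep=5` parameter.

```python
from typing import List

MAX_ITEMS = 10  # the prompt's "FOLD if >10 items" threshold

def fold(items: List[str], keep: int = 5) -> List[str]:
    """Hypothetical FOLD: merge all but the newest `keep` items into one summary line."""
    old, recent = items[:-keep], items[-keep:]
    return ["FOLDED: " + " | ".join(old)] + recent

def apply_brief_op(items: List[str], op: str, old: str = "", new: str = "") -> List[str]:
    """Apply one BRIEF op and return a new list; UPDATE/DELETE need an exact text match."""
    items = list(items)
    if op == "ADD":                      # assumed op for recording new discoveries
        items.append(new)
    elif op in ("UPDATE", "DELETE"):
        if old not in items:             # exact, whole-item match only
            raise ValueError(f"{op} target not found verbatim: {old!r}")
        idx = items.index(old)
        if op == "UPDATE":
            items[idx] = new
        else:
            del items[idx]
    else:
        raise ValueError(f"unknown op {op!r}")
    return fold(items) if len(items) > MAX_ITEMS else items

state = apply_brief_op(["fridge 1: closed"], "UPDATE", "fridge 1: closed", "fridge 1: open")
print(state)  # ['fridge 1: open']
```

Requiring an exact match turns a paraphrased `UPDATE` target into a hard error rather than a silent no-op, which keeps the Brief State from drifting out of sync with what the compiler believes it contains.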
**NOACTION:** Executor is progressing logically; no new info to record and no need to guide.

## OUTPUT FORMAT SPECIFICATIONS

Use the **Component Definitions** below to fill the strictly required XML structure for your chosen type.

**A. [Guidance_Fields] Definition:**
<brief_reason>Why intervention is necessary</brief_reason>
<guidance>Strategic instructio...

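The sketch below shows how a caller might read the guidance fields back from the compiler's reply. It uses Python's standard `xml.etree.ElementTree` and enforces the prompts' "only output the XML block, no markdown outside tags" constraint. The synthetic `<reply>` wrapper is an assumption that lets sibling top-level tags parse; the full XML schema is truncated in the source.

```python
import xml.etree.ElementTree as ET
from typing import Dict

def parse_guidance(raw: str) -> Dict[str, str]:
    """Extract brief_reason/guidance and reject any text outside the XML tags."""
    root = ET.fromstring(f"<reply>{raw.strip()}</reply>")  # hypothetical wrapper element
    stray = [root.text or ""] + [child.tail or "" for child in root]
    if any(s.strip() for s in stray):
        raise ValueError("text found outside XML tags")
    fields: Dict[str, str] = {}
    for tag in ("brief_reason", "guidance"):
        node = root.find(f".//{tag}")
        if node is not None:
            fields[tag] = (node.text or "").strip()
    return fields

reply = ("<brief_reason>Executor looped on go to desk 1.</brief_reason>\n"
         "<guidance>Do not go to desk 1 again.</guidance>")
print(parse_guidance(reply)["guidance"])  # Do not go to desk 1 again.
```

A reply wrapped in a markdown code fence leaves the fence text outside any tag, so it is rejected rather than silently accepted.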
Brevity: Keep reasons and guidance concise.

Figure 9 | Memory Compiler System Prompt on EB-ALFRED and EB-Habitat.

You are the **High-Level Strategic Planner** for a science experiment agent operating in the ScienceWorld simulation environment. You guide the Low-Level Executor using observation, ...

**EXPERIENCE (Intervention):**
- **Triggers:** Executor is stuck (loops), performing irrelevant actions, or a sub-goal is done (needs new direction).
- **Rules:**
  - Provide **single-step** strategic goals (No "do X then Y").
  - If looping, explicitly say "Do not [action] again".
  - **CRITICAL about "focus on"**: The "focus on" action is NOT an observation —...

**BRIEF (Brief State Ops):**
- **Triggers:** New discovery (object location, state change), experiment progress, or task progress state update needed.
- **Rules:** `UPDATE`/`DELETE` must match existing text exactly. `FOLD` if >10 items.

**HYBRID:** Both Intervention and Brief State Ops are needed simultaneously.

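The four response types (EXPERIENCE, BRIEF, HYBRID, NOACTION) decide what the rest of the system receives. The sketch below routes them. The `(op, old, new)` tuple encoding of Brief State ops and the `route` function are hypothetical, since the exact XML field layout is truncated in the source.

```python
from typing import List, Optional, Tuple

BriefOp = Tuple[str, str, str]  # (op, old_text, new_text); hypothetical encoding

def route(response_type: str, guidance: Optional[str],
          brief_ops: List[BriefOp]) -> Tuple[Optional[str], List[BriefOp]]:
    """Map a response type onto (guidance for the Executor, ops for the Brief State)."""
    if response_type not in {"EXPERIENCE", "BRIEF", "HYBRID", "NOACTION"}:
        raise ValueError(f"unknown response type {response_type!r}")
    to_executor = guidance if response_type in {"EXPERIENCE", "HYBRID"} else None
    to_brief = list(brief_ops) if response_type in {"BRIEF", "HYBRID"} else []
    return to_executor, to_brief

print(route("HYBRID", "Open fridge 1.", [("ADD", "", "fridge 1: open")]))
```

Under NOACTION nothing is forwarded, so the Executor keeps acting on the raw observation alone for that step.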
**NOACTION:** Executor is progressing logically; no new info to record and no need to guide.

## OUTPUT FORMAT SPECIFICATIONS

Use the **Component Definitions** below to fill the strictly required XML structure for your chosen type.

**A. [Guidance_Fields] Definition:**
<brief_reason>Why intervention is necessary</brief_reason>
<guidance>Strategic instructio...

Output Strictness: Only output the XML block. No markdown outside tags.

Read-only Task Memory; modify Brief State only via BRIEF ops.

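The constraint above separates what the compiler may read from what it may write. The sketch below makes that separation explicit. It exposes Task Memory through `types.MappingProxyType`, which raises `TypeError` on assignment, and changes the Brief State only through a single op method. The `CompilerWorkspace` class and its two-op subset are hypothetical.

```python
from types import MappingProxyType
from typing import Dict, List, Mapping, Tuple

class CompilerWorkspace:
    """Hypothetical: Task Memory is exposed read-only; the Brief State changes only via brief_op."""

    def __init__(self, task_memory: Dict[str, str]):
        self._task_memory = dict(task_memory)  # private copy the compiler cannot mutate
        self._brief: List[str] = []

    @property
    def task_memory(self) -> Mapping[str, str]:
        return MappingProxyType(self._task_memory)

    @property
    def brief(self) -> Tuple[str, ...]:
        return tuple(self._brief)  # callers get a snapshot, not the live list

    def brief_op(self, op: str, old: str = "", new: str = "") -> None:
        if op == "ADD":
            self._brief.append(new)
        elif op == "DELETE" and old in self._brief:
            self._brief.remove(old)
        else:
            raise ValueError(f"rejected BRIEF op {op!r}")

ws = CompilerWorkspace({"cool apple": "cool apple 1 with fridge 1"})
ws.brief_op("ADD", new="fridge 1: open")
print(ws.brief)  # ('fridge 1: open',)
```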
Brevity: Keep reasons and guidance concise.

Figure 10 | Memory Compiler System Prompt on Scienceworld.

You are a household robot executor operating in a simulated home environment. You are now in a household environment called Alfworld, and your tasks include locating objects, heating or cooling items, and ...
