Pith · machine review for the scientific record

arXiv:2507.07957 · v1 · submitted 2025-07-10 · 💻 cs.CL · cs.AI

Recognition: 1 Lean theorem link

MIRIX: Multi-Agent Memory System for LLM-Based Agents

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 05:53 UTC · model grok-4.3

classification: 💻 cs.CL · cs.AI
keywords: MIRIX · multi-agent memory · LLM agents · multimodal memory · long-term memory · ScreenshotVQA · LOCOMO · memory system

The pith

MIRIX uses six specialized memory types coordinated by multiple agents to enable LLM-based agents to accurately recall long-term multimodal user data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Current AI agents struggle with flat memory that limits personalization and reliable recall of user information over time. MIRIX introduces a modular system built around six memory types (Core, Episodic, Semantic, Procedural, Resource Memory, and Knowledge Vault) managed by a multi-agent framework that handles dynamic updates and retrieval. This structure supports rich visual and textual experiences, such as long sequences of computer screenshots, making memory practical for ongoing agent use. On the ScreenshotVQA benchmark, whose sequences each comprise nearly 20,000 high-resolution screenshots, the system delivers 35% higher accuracy than a RAG baseline while cutting storage by 99.9%. It also reaches 85.4% on the LOCOMO long-conversation benchmark, establishing new performance levels for memory-augmented agents.

Core claim

MIRIX consists of six distinct, carefully structured memory types—Core, Episodic, Semantic, Procedural, Resource Memory, and Knowledge Vault—coupled with a multi-agent framework that dynamically controls and coordinates updates and retrieval. This design enables agents to persist, reason over, and accurately retrieve diverse, long-term user data at scale, as shown by 35% higher accuracy than the RAG baseline on ScreenshotVQA with 99.9% reduced storage and state-of-the-art 85.4% performance on LOCOMO.

What carries the argument

A multi-agent framework that dynamically controls updates and retrieval across six memory types: Core, Episodic, Semantic, Procedural, Resource Memory, and Knowledge Vault.
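To make that machinery concrete, below is a minimal sketch of a typed-memory store with an agent-style router. The six memory-type names come from the paper; everything else (the class names, the routing placeholder, substring retrieval) is an illustrative assumption, not MIRIX's published implementation.

    # Illustrative sketch, not the paper's code: the six type names are
    # MIRIX's; the storage backend and routing heuristic are assumptions.
    from dataclasses import dataclass, field
    from enum import Enum, auto

    class MemoryType(Enum):
        CORE = auto()             # stable user-profile facts
        EPISODIC = auto()         # time-stamped events and interactions
        SEMANTIC = auto()         # distilled knowledge about the user/world
        PROCEDURAL = auto()       # how-to knowledge and workflows
        RESOURCE = auto()         # documents, screenshots, files
        KNOWLEDGE_VAULT = auto()  # sensitive verbatim records

    @dataclass
    class MemoryEntry:
        content: str
        timestamp: float
        mem_type: MemoryType

    @dataclass
    class MemoryRouter:
        """Fans writes out to typed stores and merges retrieval results."""
        stores: dict = field(default_factory=lambda: {t: [] for t in MemoryType})

        def write(self, entry: MemoryEntry) -> None:
            # In MIRIX a manager agent decides the target type(s); this
            # trivial routing is a stand-in for that LLM-driven decision.
            self.stores[entry.mem_type].append(entry)

        def retrieve(self, query: str, top_k: int = 5) -> list:
            # Real scoring would use embeddings; substring match keeps
            # the sketch self-contained.
            hits = [e for store in self.stores.values() for e in store
                    if query.lower() in e.content.lower()]
            return sorted(hits, key=lambda e: e.timestamp, reverse=True)[:top_k]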

If this is right

  • Agents maintain accurate recall across sequences of nearly 20,000 high-resolution screenshots.
  • Storage needs for memory drop by 99.9% relative to standard retrieval-augmented methods.
  • State-of-the-art results are achieved on long-form textual conversation benchmarks.
  • Agents can personalize responses using accumulated visual and textual user histories.
  • Real-time screen monitoring becomes viable for building and querying personalized memory bases locally.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Continuous memory accumulation could support agents that adapt to daily user patterns without periodic retraining.
  • The modular memory design may extend naturally to additional input types such as audio streams.
  • Local secure storage emphasis opens pathways for privacy-focused deployment on personal devices.
  • Coordination patterns used here might apply to other multi-agent tasks beyond memory handling.

Load-bearing premise

The multi-agent coordination mechanism can reliably manage updates and retrieval across the six memory types without introducing retrieval errors or inconsistencies in long sequences.

What would settle it

A test showing retrieval errors rising or accuracy dropping below the RAG baseline on extended screenshot sequences or additional long-conversation data would falsify the performance claims.
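Sketched below is one way such a test could be operationalized, under assumed interfaces: memory_system, rag_baseline, and dataset.slice are hypothetical stand-ins, and exact-match scoring is a placeholder for whatever metric the benchmark actually uses.

    # Hypothetical falsification harness, not published evaluation code.
    def accuracy(system, episodes, questions) -> float:
        system.reset()
        for ep in episodes:          # ingest the screenshot or text stream
            system.observe(ep)
        correct = sum(system.answer(q.text) == q.gold for q in questions)
        return correct / len(questions)

    def falsification_test(memory_system, rag_baseline, dataset,
                           lengths=(1_000, 5_000, 10_000, 20_000)):
        # The claim fails if the memory system ever drops below the RAG
        # baseline as the ingested sequence grows toward ~20k items.
        for n in lengths:
            episodes, questions = dataset.slice(n)
            mem_acc = accuracy(memory_system, episodes, questions)
            rag_acc = accuracy(rag_baseline, episodes, questions)
            if mem_acc < rag_acc:
                return f"falsified at length {n}: {mem_acc:.3f} < {rag_acc:.3f}"
        return "claim survives at all tested lengths"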

Original abstract

Although memory capabilities of AI agents are gaining increasing attention, existing solutions remain fundamentally limited. Most rely on flat, narrowly scoped memory components, constraining their ability to personalize, abstract, and reliably recall user-specific information over time. To this end, we introduce MIRIX, a modular, multi-agent memory system that redefines the future of AI memory by solving the field's most critical challenge: enabling language models to truly remember. Unlike prior approaches, MIRIX transcends text to embrace rich visual and multimodal experiences, making memory genuinely useful in real-world scenarios. MIRIX consists of six distinct, carefully structured memory types: Core, Episodic, Semantic, Procedural, Resource Memory, and Knowledge Vault, coupled with a multi-agent framework that dynamically controls and coordinates updates and retrieval. This design enables agents to persist, reason over, and accurately retrieve diverse, long-term user data at scale. We validate MIRIX in two demanding settings. First, on ScreenshotVQA, a challenging multimodal benchmark comprising nearly 20,000 high-resolution computer screenshots per sequence, requiring deep contextual understanding and where no existing memory systems can be applied, MIRIX achieves 35% higher accuracy than the RAG baseline while reducing storage requirements by 99.9%. Second, on LOCOMO, a long-form conversation benchmark with single-modal textual input, MIRIX attains state-of-the-art performance of 85.4%, far surpassing existing baselines. These results show that MIRIX sets a new performance standard for memory-augmented LLM agents. To allow users to experience our memory system, we provide a packaged application powered by MIRIX. It monitors the screen in real time, builds a personalized memory base, and offers intuitive visualization and secure local storage to ensure privacy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces MIRIX, a modular multi-agent memory system for LLM-based agents comprising six structured memory types (Core, Episodic, Semantic, Procedural, Resource Memory, and Knowledge Vault) coordinated by a dynamic multi-agent framework for updates and retrieval. It claims to enable scalable, long-term multimodal memory and reports 35% higher accuracy than a RAG baseline with 99.9% storage reduction on the ScreenshotVQA benchmark (nearly 20k high-resolution screenshots) plus state-of-the-art 85.4% performance on the LOCOMO long-form conversation benchmark.

Significance. If the empirical claims hold after proper validation, MIRIX would represent a practical advance in memory-augmented agents by moving beyond flat retrieval to a typed, multimodal, long-horizon memory architecture. The provision of a packaged real-time screen-monitoring application is a positive step toward reproducibility and usability.

major comments (3)
  1. [Abstract / Experiments] The central performance claims (35% accuracy lift and 99.9% storage reduction on ScreenshotVQA; 85.4% on LOCOMO) are stated without reference to tables, figures, statistical significance tests, or error bars, and no ablation isolating the multi-agent coordinator from the six memory types is described. This makes it impossible to determine whether the reported gains require the coordination mechanism or could be obtained from the memory schemas alone.
  2. [Methods] The multi-agent coordination mechanism for dynamic updates and retrieval across the six memory types is presented at a high level, with no description of conflict resolution, consistency invariants, or failure modes in long sequences. The weakest assumption in the design, that coordination reliably avoids retrieval inconsistencies, therefore remains untested.
  3. [Results] No quantitative breakdown is given for how the 99.9% storage reduction is achieved (e.g., per-memory-type compression ratios, the deduplication strategy, or a comparison against a single-memory baseline with identical content). Without these details the storage claim cannot be evaluated.
minor comments (2)
  1. [Abstract] The abstract would benefit from a one-sentence definition or example for each of the six memory types to make the architecture immediately intelligible.
  2. [Methods] The paper should include a clear diagram or pseudocode for the multi-agent update/retrieval loop to clarify control flow.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback, which highlights important areas for improving clarity and rigor. We have revised the manuscript to address each major comment by adding explicit references, new analyses, and expanded methodological details.

Point-by-point responses
  1. Referee: [Abstract / Experiments] The central performance claims (35% accuracy lift and 99.9% storage reduction on ScreenshotVQA; 85.4% on LOCOMO) are stated without reference to tables, figures, statistical significance tests, or error bars, and no ablation isolating the multi-agent coordinator from the six memory types is described. This makes it impossible to determine whether the reported gains require the coordination mechanism or could be obtained from the memory schemas alone.

    Authors: We agree that explicit cross-references and supporting analyses strengthen the presentation. In the revised manuscript, the Abstract now cites Table 1 (ScreenshotVQA results) and Table 2 (LOCOMO results). The Experiments section includes statistical significance tests, error bars, and a new ablation study comparing the full MIRIX system (with multi-agent coordinator) against a variant using only the six memory types. This ablation confirms that the coordinator contributes measurably to the observed gains beyond the memory schemas alone. revision: yes

  2. Referee: [Methods] The multi-agent coordination mechanism for dynamic updates and retrieval across the six memory types is presented at a high level, with no description of conflict resolution, consistency invariants, or failure modes in long sequences. The weakest assumption in the design, that coordination reliably avoids retrieval inconsistencies, therefore remains untested.

    Authors: We have expanded the Methods section with a detailed account of the coordination mechanism. It now covers conflict resolution via priority-based merging (factoring recency and memory type), consistency invariants (timestamp ordering and cross-type semantic checks), and a failure-mode analysis for long sequences, with mitigation via periodic reconciliation; a sketch of such a merge rule appears after this list. New experiments in the revised paper demonstrate robustness over extended interactions, directly testing the assumption. revision: yes

  3. Referee: [Results] No quantitative breakdown is given for how the 99.9% storage reduction is achieved (e.g., per-memory-type compression ratios, the deduplication strategy, or a comparison against a single-memory baseline with identical content). Without these details the storage claim cannot be evaluated.

    Authors: We have added a dedicated subsection in Results providing the requested breakdown. It reports per-type compression ratios (e.g., 99.5% for Knowledge Vault via embedding-based semantic compression), the deduplication approach (similarity thresholding with periodic pruning; see the second sketch after this list), and a direct comparison against a single flat-memory baseline holding identical content. These details substantiate the overall 99.9% reduction. revision: yes
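To illustrate the conflict-resolution scheme described in response 2, here is a minimal sketch of priority-based merging that factors recency and memory type, with the timestamp-ordering invariant enforced for same-type conflicts. The priority weights and half-life are hypothetical; the paper does not state them.

    # Hypothetical priority-based merge; weights and half-life are invented
    # for illustration and are not values from the paper.
    TYPE_PRIORITY = {
        "knowledge_vault": 3.0,  # verbatim records outrank derived ones
        "core": 2.5,
        "semantic": 2.0,
        "episodic": 1.5,
        "procedural": 1.5,
        "resource": 1.0,
    }

    def priority(entry: dict, now: float, half_life: float = 86_400.0) -> float:
        # Exponential recency decay with a one-day half-life (assumed).
        recency = 0.5 ** ((now - entry["timestamp"]) / half_life)
        return TYPE_PRIORITY[entry["mem_type"]] * recency

    def merge_conflicting(a: dict, b: dict, now: float) -> dict:
        # Invariant from the rebuttal: within a type, newer entries win
        # (timestamp ordering); across types, priority decides.
        if a["mem_type"] == b["mem_type"]:
            return a if a["timestamp"] >= b["timestamp"] else b
        return a if priority(a, now) >= priority(b, now) else b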
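And to illustrate response 3, a sketch of similarity-threshold deduplication with periodic pruning. Cosine similarity over precomputed embeddings and the 0.95 threshold are assumptions; the paper's actual thresholds and embedding model are not given here.

    # Hypothetical dedup pass; the 0.95 threshold and the assumption of
    # precomputed embeddings (entry["emb"]) are illustrative only.
    import math

    def cosine(u: list[float], v: list[float]) -> float:
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv) if nu and nv else 0.0

    def prune(entries: list[dict], threshold: float = 0.95) -> list[dict]:
        """Keep an entry only if it is not a near-duplicate of one already
        kept; run periodically so storage tracks distinct content rather
        than raw input volume."""
        kept: list[dict] = []
        for e in entries:
            if all(cosine(e["emb"], k["emb"]) < threshold for k in kept):
                kept.append(e)
        return kept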

Circularity Check

0 steps flagged

No circularity: engineering design with no equations, fitted parameters, or self-referential derivations.

Full rationale

The paper describes MIRIX as a modular system with six explicitly defined memory types (Core, Episodic, Semantic, Procedural, Resource, Knowledge Vault) plus a multi-agent coordinator. Performance claims (35% accuracy lift on ScreenshotVQA, 85.4% on LOCOMO) are presented as outcomes of empirical evaluation against baselines, not as predictions derived from equations or parameters that reduce to the inputs by construction. No mathematical derivations, ansatzes, uniqueness theorems, or self-citations appear as load-bearing steps in the provided text. The architecture is justified by functional requirements for multimodal long-term memory rather than by any self-referential logic.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the unstated assumption that structured multi-agent memory coordination improves retrieval accuracy and efficiency over flat RAG without introducing new failure modes; no free parameters or invented physical entities are mentioned.

axioms (1)
  • domain assumption: LLM agents benefit from explicit separation of memory into Core, Episodic, Semantic, Procedural, Resource, and Knowledge Vault types.
    Invoked implicitly when claiming the six-type design enables better personalization and recall.

pith-pipeline@v0.9.0 · 5608 in / 1220 out tokens · 33292 ms · 2026-05-15T05:53:20.135092+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith.Foundation.LawOfExistence defect_zero_iff_one · tagged: unclear

    Relation between the paper passage and the cited Recognition theorem.

    Linked passage: "MIRIX consists of six distinct, carefully structured memory types: Core, Episodic, Semantic, Procedural, Resource Memory, and Knowledge Vault, coupled with a multi-agent framework that dynamically controls and coordinates updates and retrieval."

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 26 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. MedMemoryBench: Benchmarking Agent Memory in Personalized Healthcare

    cs.AI 2026-05 conditional novelty 8.0

    MedMemoryBench supplies a 2,000-session synthetic medical trajectory dataset and an evaluate-while-constructing streaming protocol to expose memory saturation and reasoning failures in current agent architectures for ...

  2. ShadowMerge: A Novel Poisoning Attack on Graph-Based Agent Memory via Relation-Channel Conflicts

    cs.CR 2026-05 unverdicted novelty 8.0

    ShadowMerge poisons graph-based agent memory by creating relation-channel conflicts that get extracted and retrieved, achieving 93.8% attack success rate on Mem0 and datasets like PubMedQA while evading prior defenses.

  3. ShadowMerge: A Novel Poisoning Attack on Graph-Based Agent Memory via Relation-Channel Conflicts

    cs.CR 2026-05 unverdicted novelty 8.0

    ShadowMerge poisons graph-based agent memory via relation-channel conflicts using an AIR pipeline, achieving 93.8% average attack success rate on Mem0 and three real-world datasets while bypassing existing defenses.

  4. ClawForge: Generating Executable Interactive Benchmarks for Command-Line Agents

    cs.AI 2026-05 conditional novelty 7.0

    ClawForge supplies a generator that turns scenario templates into reproducible command-line tasks testing state conflict handling, where the strongest frontier model scores only 45.3 percent strict accuracy.

  5. EquiMem: Calibrating Shared Memory in Multi-Agent Debate via Game-Theoretic Equilibrium

    cs.AI 2026-05 unverdicted novelty 7.0

    EquiMem calibrates shared memory in multi-agent debate by computing a game-theoretic equilibrium from agent queries and paths, outperforming heuristics and LLM validators across benchmarks while remaining robust to ad...

  6. Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory

    cs.CL 2026-05 unverdicted novelty 7.0

    MemCoE learns memory organization guidelines via contrastive feedback and then trains a guideline-aligned RL policy for memory updates, yielding consistent gains on personalization benchmarks.

  7. Four-Axis Decision Alignment for Long-Horizon Enterprise AI Agents

    cs.AI 2026-04 unverdicted novelty 7.0

    Long-horizon enterprise AI agents' decisions decompose into four measurable axes, with benchmark experiments on six memory architectures revealing distinct weaknesses and reversing a pre-registered prediction on summa...

  8. MemEvoBench: Benchmarking Memory MisEvolution in LLM Agents

    cs.CL 2026-04 unverdicted novelty 7.0

    MemEvoBench is the first benchmark for long-horizon memory safety in LLM agents, using QA tasks across 7 domains and 36 risks plus workflow tasks with noisy tools to measure behavioral drift from biased memory updates.

  9. Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory

    cs.CL 2025-11 unverdicted novelty 7.0

    Evo-Memory is a new benchmark for self-evolving memory in LLM agents across task streams, with baseline ExpRAG and proposed ReMem method that integrates reasoning, actions, and memory updates for continual improvement.

  10. Hypergraph Enterprise Agentic Reasoner over Heterogeneous Business Systems

    cs.AI 2026-05 unverdicted novelty 6.0

    HEAR uses a stratified hypergraph ontology to orchestrate evidence-driven multi-hop reasoning over heterogeneous business systems, reaching 94.7% accuracy on supply-chain root-cause tasks with open-weight models.

  11. Cognifold: Always-On Proactive Memory via Cognitive Folding

    cs.AI 2026-05 unverdicted novelty 6.0

    Cognifold is a new proactive memory architecture that folds event streams into emergent cognitive structures by extending complementary learning systems theory with a prefrontal intent layer and graph topology self-or...

  12. $\delta$-mem: Efficient Online Memory for Large Language Models

    cs.AI 2026-05 unverdicted novelty 6.0

    δ-mem augments frozen LLMs with an 8x8 online memory state updated by delta-rule learning to generate low-rank attention corrections, delivering 1.10x average gains over the backbone and larger improvements on memory-...

  13. SAGE: A Self-Evolving Agentic Graph-Memory Engine for Structure-Aware Associative Memory

    cs.AI 2026-05 unverdicted novelty 6.0

    SAGE is a self-evolving agentic graph-memory engine that dynamically constructs and refines structured memory graphs via writer-reader feedback, yielding performance gains on multi-hop QA, open-domain retrieval, and l...

  14. HAGE: Harnessing Agentic Memory via RL-Driven Weighted Graph Evolution

    cs.AI 2026-05 unverdicted novelty 6.0

    HAGE proposes a trainable weighted graph memory framework with LLM intent classification, dynamic edge modulation, and RL optimization that improves long-horizon reasoning accuracy in agentic LLMs over static baselines.

  15. Tree-based Credit Assignment for Multi-Agent Memory System

    cs.MA 2026-05 unverdicted novelty 6.0

    TreeMem assigns credit to agents in multi-agent memory systems by expanding outputs into a tree and using Monte Carlo averaging of final rewards to optimize each agent's policy.

  16. Learning When to Remember: Risk-Sensitive Contextual Bandits for Abstention-Aware Memory Retrieval in LLM-Based Coding Agents

    cs.CL 2026-04 unverdicted novelty 6.0

    RSCB-MC is a risk-sensitive contextual bandit memory controller for LLM coding agents that chooses safe actions including abstention, achieving 60.5% proxy success with 0% false positives and low latency in 200-case v...

  17. From Soliloquy to Agora: Memory-Enhanced LLM Agents with Decentralized Debate for Optimization Modeling

    math.OC 2026-04 unverdicted novelty 6.0

    Agora-Opt uses decentralized debate among LLM agent teams plus a read-write memory bank to produce more accurate optimization models from text than prior LLM methods.

  18. Stateless Decision Memory for Enterprise AI Agents

    cs.AI 2026-04 unverdicted novelty 6.0

    Deterministic Projection Memory (DPM) delivers stateless, deterministic decision memory for enterprise AI agents that matches or exceeds summarization-based approaches at tight memory budgets while improving speed, de...

  19. MemSearch-o1: Empowering Large Language Models with Reasoning-Aligned Memory Growth in Agentic Search

    cs.IR 2026-04 unverdicted novelty 6.0

    MemSearch-o1 mitigates memory dilution in agentic LLM search through reasoning-aligned token-level memory growth, retracing with a contribution function, and path reorganization, improving reasoning activation on benchmarks.

  20. MemSearch-o1: Empowering Large Language Models with Reasoning-Aligned Memory Growth in Agentic Search

    cs.IR 2026-04 unverdicted novelty 6.0

    MemSearch-o1 uses reasoning-aligned memory growth from seed tokens, retracing via contribution functions, and path reorganization to mitigate memory dilution in LLM agentic search.

  21. Decocted Experience Improves Test-Time Inference in LLM Agents

    cs.AI 2026-04 unverdicted novelty 6.0

    Decocted experience—extracting and organizing the essence from accumulated interactions—enables more effective context construction that improves test-time inference in LLM agents on math, web, and software tasks.

  22. PersonaVLM: Long-Term Personalized Multimodal LLMs

    cs.CL 2026-03 unverdicted novelty 6.0

    PersonaVLM adds memory extraction, multi-turn retrieval-based reasoning, and personality inference to multimodal LLMs, yielding 22.4% gains on a new long-term personalization benchmark and outperforming GPT-4o.

  23. Joint Optimization of Multi-agent Memory System

    cs.MA 2026-03 unverdicted novelty 6.0

    CoMAM jointly optimizes agents in multi-agent LLM memory systems via end-to-end RL and adaptive credit assignment to improve collaboration and performance.

  24. Security Considerations for Multi-agent Systems

    cs.CR 2026-03 unverdicted novelty 6.0

    No existing AI security framework covers a majority of the 193 identified multi-agent system threats in any category, with OWASP Agentic Security Initiative achieving the highest overall coverage at 65.3%.

  25. Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering

    cs.SE 2026-04 accept novelty 5.0

    LLM agent progress depends on externalizing cognitive functions into memory, skills, protocols, and harness engineering that coordinates them reliably.

  26. MemOS: A Memory OS for AI System

    cs.CL 2025-07 unverdicted novelty 5.0

    MemOS introduces a unified memory management framework for LLMs using MemCubes to handle and evolve different memory types for improved controllability and evolvability.
