hub Canonical reference

MIRIX: Multi-Agent Memory System for LLM-Based Agents

Yu Wang, Xi Chen · 2025 · cs.CL · arXiv 2507.07957

Canonical reference. 71% of citing Pith papers cite this work as background.

49 Pith papers citing it

Background 71% of classified citations

open full Pith review browse 49 citing papers arXiv PDF

abstract

Although memory capabilities of AI agents are gaining increasing attention, existing solutions remain fundamentally limited. Most rely on flat, narrowly scoped memory components, constraining their ability to personalize, abstract, and reliably recall user-specific information over time. To this end, we introduce MIRIX, a modular, multi-agent memory system that redefines the future of AI memory by solving the field's most critical challenge: enabling language models to truly remember. Unlike prior approaches, MIRIX transcends text to embrace rich visual and multimodal experiences, making memory genuinely useful in real-world scenarios. MIRIX consists of six distinct, carefully structured memory types: Core, Episodic, Semantic, Procedural, Resource Memory, and Knowledge Vault, coupled with a multi-agent framework that dynamically controls and coordinates updates and retrieval. This design enables agents to persist, reason over, and accurately retrieve diverse, long-term user data at scale. We validate MIRIX in two demanding settings. First, on ScreenshotVQA, a challenging multimodal benchmark comprising nearly 20,000 high-resolution computer screenshots per sequence, requiring deep contextual understanding and where no existing memory systems can be applied, MIRIX achieves 35% higher accuracy than the RAG baseline while reducing storage requirements by 99.9%. Second, on LOCOMO, a long-form conversation benchmark with single-modal textual input, MIRIX attains state-of-the-art performance of 85.4%, far surpassing existing baselines. These results show that MIRIX sets a new performance standard for memory-augmented LLM agents. To allow users to experience our memory system, we provide a packaged application powered by MIRIX. It monitors the screen in real time, builds a personalized memory base, and offers intuitive visualization and secure local storage to ensure privacy.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 5 baseline 2

citation-polarity summary

background 5 baseline 2

representative citing papers

MedMemoryBench: Benchmarking Agent Memory in Personalized Healthcare

cs.AI · 2026-05-12 · conditional · novelty 8.0

MedMemoryBench supplies a 2,000-session synthetic medical trajectory dataset and an evaluate-while-constructing streaming protocol to expose memory saturation and reasoning failures in current agent architectures for personalized healthcare.

ShadowMerge: A Novel Poisoning Attack on Graph-Based Agent Memory via Relation-Channel Conflicts

cs.CR · 2026-05-09 · unverdicted · novelty 8.0 · 3 refs

ShadowMerge exploits relation-channel conflicts to poison graph-based agent memory, achieving 93.8% average attack success rate on Mem0 and real-world datasets while bypassing existing defenses.

MemEvoBench: Benchmarking Safety Risks from Memory Misevolution in LLM Agents

cs.CL · 2026-04-17 · unverdicted · novelty 8.0 · 2 refs

MemEvoBench is presented as the first standardized benchmark for long-horizon memory safety in LLM agents, covering adversarial memory injection, noisy tool outputs, and biased feedback across QA and workflow tasks.

MemSyco-Bench: Benchmarking Sycophancy in Agent Memory

cs.IR · 2026-07-01 · unverdicted · novelty 7.0 · 2 refs

MemSyco-Bench is a benchmark covering five tasks to evaluate memory-induced sycophancy in LLM agents, testing rejection of invalid memory, scope respect, conflict resolution, update tracking, and valid personalization.

MemTrace: Probing What Final Accuracy Misses in Long-Term Memory

cs.AI · 2026-06-15 · unverdicted · novelty 7.0

MemTrace shows that evidence utilization, not retrieval, is the dominant failure mode in LLM long-term memory systems across tested configurations.

Rosetta Memory: Adaptive Memory for Cross-LLM Agents

cs.LG · 2026-06-05 · unverdicted · novelty 7.0

Rosetta Memory trains two profile-conditioned operators with a minimum-gain sampling curriculum and performance-gap reward to enable memory transfer between LLMs, showing gains on multi-hop QA benchmarks and robustness to unseen models.

Agent Memory: Characterization and System Implications of Stateful Long-Horizon Workloads

cs.AI · 2026-06-04 · unverdicted · novelty 7.0

The paper delivers the first systems characterization of agent memory, with a four-axis taxonomy, phase-aware profiler, evaluation of ten systems on two benchmarks, and ten design recommendations.

SMMBench: A Benchmark for Source-Distributed Multimodal Agent Memory

cs.CL · 2026-05-15 · unverdicted · novelty 7.0

SMMBench is a benchmark evaluating multimodal agents on cross-source reasoning, conflict resolution, preference reasoning, and action prediction, showing current systems struggle with evidence distributed across heterogeneous sources.

ClawForge: Generating Executable Interactive Benchmarks for Command-Line Agents

cs.AI · 2026-05-13 · unverdicted · novelty 7.0 · 2 refs

ClawForge is a generator framework that creates reproducible executable benchmarks for command-line agents under state conflict, with ClawForge-Bench showing frontier models reach at most 45.3% strict accuracy and that state inspection drives most performance gaps.

EquiMem: Calibrating Shared Memory in Multi-Agent Debate via Game-Theoretic Equilibrium

cs.AI · 2026-05-10 · unverdicted · novelty 7.0

EquiMem calibrates shared memory in multi-agent debate by computing a game-theoretic equilibrium from agent queries and paths, outperforming heuristics and LLM validators across benchmarks while remaining robust to adversarial agents.

Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory

cs.CL · 2026-05-01 · unverdicted · novelty 7.0

MemCoE learns memory organization guidelines via contrastive feedback and then trains a guideline-aligned RL policy for memory updates, yielding consistent gains on personalization benchmarks.

Four-Axis Decision Alignment for Long-Horizon Enterprise AI Agents

cs.AI · 2026-04-21 · unverdicted · novelty 7.0

Long-horizon enterprise AI agents' decisions decompose into four measurable axes, with benchmark experiments on six memory architectures revealing distinct weaknesses and reversing a pre-registered prediction on summarization.

PERMA: Benchmarking Personalized Memory Agents via Event-Driven Preference and Realistic Task Environments

cs.AI · 2026-03-24 · unverdicted · novelty 7.0

PERMA is a new benchmark using temporally ordered events, text variability, and linguistic alignment to evaluate LLM memory agents on persona consistency beyond simple retrieval.

EvoDiagram: Agentic Editable Diagram Creation via Design Expertise Evolution

cs.HC · 2026-02-20 · unverdicted · novelty 7.0

EvoDiagram uses a coordinated multi-agent system and design knowledge evolution to generate editable diagrams via canvas schema, with a new CanvasBench benchmark showing strong performance over baselines.

ECHO: Prune to act, trace to learn with selective turn memory in agentic RL

cs.LG · 2026-06-30 · unverdicted · novelty 6.0

ECHO is a selective turn-memory framework for agentic RL that compresses turns into indexed records, selects them for bounded contexts, and uses source indices to assign outcome credit to supporting evidence, reaching 43.4% accuracy on BrowseComp-Plus versus 28.9% for GRPO and 36.1% for SUPO.

M$^3$Exam: Benchmarking Multimodal Memory for Realistic User-Agent Interactions

cs.CL · 2026-06-05 · unverdicted · novelty 6.0

Presents M³Exam benchmark for multimodal conversational memory in user-agent settings and M³Proctor method that raises accuracy 13% while cutting construction time and tokens over 70%.

cs.AI · 2026-06-04 · unverdicted · novelty 6.0

MemGate is a 9M-parameter neural gate inserted between vector memory and LLM that converts similarity search into task-conditioned admission, reducing memory-induced threats across agent frameworks while preserving utility.

HMARS: A Hierarchical Multi-Agent Memory System for Long-Context Reasoning

cs.IR · 2026-06-03 · unverdicted · novelty 6.0

HMARS introduces a hierarchical multi-agent memory system that outperforms standard retrieval and other baselines on long-document and multi-turn reasoning tasks through improved evidence coverage.

Rethinking Memory as Continuously Evolving Connectivity

cs.CL · 2026-05-27 · unverdicted · novelty 6.0

FluxMem evolves memory as a heterogeneous graph via three refinement stages and reports consistent state-of-the-art results on LoCoMo, Mind2Web, and GAIA benchmarks.

From Facts to Insights: A Persona-Driven Dual Memory Framework and Dataset for Role-Playing Agents

cs.CL · 2026-05-25 · unverdicted · novelty 6.0

RoleMemo dataset and DualMem dual-memory framework let role-playing agents interpret facts through personas, with a 4B model beating larger zero-shot systems on fidelity.

Auto-Dreamer: Learning Offline Memory Consolidation for Language Agents

cs.CL · 2026-05-20 · unverdicted · novelty 6.0

Auto-Dreamer trains an offline memory consolidator via GRPO on agent performance to abstract cross-session patterns, outperforming baselines by 7 points on ScienceWorld with 12x smaller memory and generalizing to ALFWorld and WebArena.

Hypergraph Enterprise Agentic Reasoner over Heterogeneous Business Systems

cs.AI · 2026-05-14 · unverdicted · novelty 6.0

HEAR uses a stratified hypergraph ontology to orchestrate evidence-driven multi-hop reasoning over heterogeneous business systems, reaching 94.7% accuracy on supply-chain root-cause tasks with open-weight models.

$\delta$-mem: Efficient Online Memory for Large Language Models

cs.AI · 2026-05-12 · unverdicted · novelty 6.0

δ-mem augments frozen LLMs with an 8x8 online memory state updated by delta-rule learning to generate low-rank attention corrections, delivering 1.10x average gains over the backbone and larger improvements on memory-heavy tasks.

SAGE: A Self-Evolving Agentic Graph-Memory Engine for Structure-Aware Associative Memory

cs.AI · 2026-05-12 · unverdicted · novelty 6.0

SAGE is a self-evolving agentic graph-memory engine that dynamically constructs and refines structured memory graphs via writer-reader feedback, yielding performance gains on multi-hop QA, open-domain retrieval, and long-term agent benchmarks.

citing papers explorer

Showing 49 of 49 citing papers.

MedMemoryBench: Benchmarking Agent Memory in Personalized Healthcare cs.AI · 2026-05-12 · conditional · none · ref 32 · internal anchor
MedMemoryBench supplies a 2,000-session synthetic medical trajectory dataset and an evaluate-while-constructing streaming protocol to expose memory saturation and reasoning failures in current agent architectures for personalized healthcare.
ShadowMerge: A Novel Poisoning Attack on Graph-Based Agent Memory via Relation-Channel Conflicts cs.CR · 2026-05-09 · unverdicted · none · ref 11 · 3 links · internal anchor
ShadowMerge exploits relation-channel conflicts to poison graph-based agent memory, achieving 93.8% average attack success rate on Mem0 and real-world datasets while bypassing existing defenses.
MemEvoBench: Benchmarking Safety Risks from Memory Misevolution in LLM Agents cs.CL · 2026-04-17 · unverdicted · none · ref 15 · 2 links · internal anchor
MemEvoBench is presented as the first standardized benchmark for long-horizon memory safety in LLM agents, covering adversarial memory injection, noisy tool outputs, and biased feedback across QA and workflow tasks.
MemSyco-Bench: Benchmarking Sycophancy in Agent Memory cs.IR · 2026-07-01 · unverdicted · none · ref 30 · 2 links · internal anchor
MemSyco-Bench is a benchmark covering five tasks to evaluate memory-induced sycophancy in LLM agents, testing rejection of invalid memory, scope respect, conflict resolution, update tracking, and valid personalization.
MemTrace: Probing What Final Accuracy Misses in Long-Term Memory cs.AI · 2026-06-15 · unverdicted · none · ref 20 · internal anchor
MemTrace shows that evidence utilization, not retrieval, is the dominant failure mode in LLM long-term memory systems across tested configurations.
Rosetta Memory: Adaptive Memory for Cross-LLM Agents cs.LG · 2026-06-05 · unverdicted · none · ref 21 · internal anchor
Rosetta Memory trains two profile-conditioned operators with a minimum-gain sampling curriculum and performance-gap reward to enable memory transfer between LLMs, showing gains on multi-hop QA benchmarks and robustness to unseen models.
Agent Memory: Characterization and System Implications of Stateful Long-Horizon Workloads cs.AI · 2026-06-04 · unverdicted · none · ref 29 · internal anchor
The paper delivers the first systems characterization of agent memory, with a four-axis taxonomy, phase-aware profiler, evaluation of ten systems on two benchmarks, and ten design recommendations.
SMMBench: A Benchmark for Source-Distributed Multimodal Agent Memory cs.CL · 2026-05-15 · unverdicted · none · ref 25 · internal anchor
SMMBench is a benchmark evaluating multimodal agents on cross-source reasoning, conflict resolution, preference reasoning, and action prediction, showing current systems struggle with evidence distributed across heterogeneous sources.
ClawForge: Generating Executable Interactive Benchmarks for Command-Line Agents cs.AI · 2026-05-13 · unverdicted · none · ref 99 · 2 links · internal anchor
ClawForge is a generator framework that creates reproducible executable benchmarks for command-line agents under state conflict, with ClawForge-Bench showing frontier models reach at most 45.3% strict accuracy and that state inspection drives most performance gaps.
EquiMem: Calibrating Shared Memory in Multi-Agent Debate via Game-Theoretic Equilibrium cs.AI · 2026-05-10 · unverdicted · none · ref 75 · internal anchor
EquiMem calibrates shared memory in multi-agent debate by computing a game-theoretic equilibrium from agent queries and paths, outperforming heuristics and LLM validators across benchmarks while remaining robust to adversarial agents.
Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory cs.CL · 2026-05-01 · unverdicted · none · ref 158 · internal anchor
MemCoE learns memory organization guidelines via contrastive feedback and then trains a guideline-aligned RL policy for memory updates, yielding consistent gains on personalization benchmarks.
Four-Axis Decision Alignment for Long-Horizon Enterprise AI Agents cs.AI · 2026-04-21 · unverdicted · none · ref 7 · internal anchor
Long-horizon enterprise AI agents' decisions decompose into four measurable axes, with benchmark experiments on six memory architectures revealing distinct weaknesses and reversing a pre-registered prediction on summarization.
PERMA: Benchmarking Personalized Memory Agents via Event-Driven Preference and Realistic Task Environments cs.AI · 2026-03-24 · unverdicted · none · ref 58 · internal anchor
PERMA is a new benchmark using temporally ordered events, text variability, and linguistic alignment to evaluate LLM memory agents on persona consistency beyond simple retrieval.
EvoDiagram: Agentic Editable Diagram Creation via Design Expertise Evolution cs.HC · 2026-02-20 · unverdicted · none · ref 14 · internal anchor
EvoDiagram uses a coordinated multi-agent system and design knowledge evolution to generate editable diagrams via canvas schema, with a new CanvasBench benchmark showing strong performance over baselines.
ECHO: Prune to act, trace to learn with selective turn memory in agentic RL cs.LG · 2026-06-30 · unverdicted · none · ref 31 · internal anchor
ECHO is a selective turn-memory framework for agentic RL that compresses turns into indexed records, selects them for bounded contexts, and uses source indices to assign outcome credit to supporting evidence, reaching 43.4% accuracy on BrowseComp-Plus versus 28.9% for GRPO and 36.1% for SUPO.
M$^3$Exam: Benchmarking Multimodal Memory for Realistic User-Agent Interactions cs.CL · 2026-06-05 · unverdicted · none · ref 50 · internal anchor
Presents M³Exam benchmark for multimodal conversational memory in user-agent settings and M³Proctor method that raises accuracy 13% while cutting construction time and tokens over 70%.
Beyond Similarity: Trustworthy Memory Search for Personal AI Agents cs.AI · 2026-06-04 · unverdicted · none · ref 7 · internal anchor
MemGate is a 9M-parameter neural gate inserted between vector memory and LLM that converts similarity search into task-conditioned admission, reducing memory-induced threats across agent frameworks while preserving utility.
HMARS: A Hierarchical Multi-Agent Memory System for Long-Context Reasoning cs.IR · 2026-06-03 · unverdicted · none · ref 92 · internal anchor
HMARS introduces a hierarchical multi-agent memory system that outperforms standard retrieval and other baselines on long-document and multi-turn reasoning tasks through improved evidence coverage.
Rethinking Memory as Continuously Evolving Connectivity cs.CL · 2026-05-27 · unverdicted · none · ref 50 · internal anchor
FluxMem evolves memory as a heterogeneous graph via three refinement stages and reports consistent state-of-the-art results on LoCoMo, Mind2Web, and GAIA benchmarks.
From Facts to Insights: A Persona-Driven Dual Memory Framework and Dataset for Role-Playing Agents cs.CL · 2026-05-25 · unverdicted · none · ref 70 · internal anchor
RoleMemo dataset and DualMem dual-memory framework let role-playing agents interpret facts through personas, with a 4B model beating larger zero-shot systems on fidelity.
Auto-Dreamer: Learning Offline Memory Consolidation for Language Agents cs.CL · 2026-05-20 · unverdicted · none · ref 30 · internal anchor
Auto-Dreamer trains an offline memory consolidator via GRPO on agent performance to abstract cross-session patterns, outperforming baselines by 7 points on ScienceWorld with 12x smaller memory and generalizing to ALFWorld and WebArena.
Hypergraph Enterprise Agentic Reasoner over Heterogeneous Business Systems cs.AI · 2026-05-14 · unverdicted · none · ref 62 · internal anchor
HEAR uses a stratified hypergraph ontology to orchestrate evidence-driven multi-hop reasoning over heterogeneous business systems, reaching 94.7% accuracy on supply-chain root-cause tasks with open-weight models.
$\delta$-mem: Efficient Online Memory for Large Language Models cs.AI · 2026-05-12 · unverdicted · none · ref 15 · internal anchor
δ-mem augments frozen LLMs with an 8x8 online memory state updated by delta-rule learning to generate low-rank attention corrections, delivering 1.10x average gains over the backbone and larger improvements on memory-heavy tasks.
SAGE: A Self-Evolving Agentic Graph-Memory Engine for Structure-Aware Associative Memory cs.AI · 2026-05-12 · unverdicted · none · ref 238 · internal anchor
SAGE is a self-evolving agentic graph-memory engine that dynamically constructs and refines structured memory graphs via writer-reader feedback, yielding performance gains on multi-hop QA, open-domain retrieval, and long-term agent benchmarks.
HAGE: Harnessing Agentic Memory via RL-Driven Weighted Graph Evolution cs.AI · 2026-05-11 · unverdicted · none · ref 4 · internal anchor
HAGE proposes a trainable weighted graph memory framework with LLM intent classification, dynamic edge modulation, and RL optimization that improves long-horizon reasoning accuracy in agentic LLMs over static baselines.
Tree-based Credit Assignment for Multi-Agent Memory System cs.MA · 2026-05-06 · unverdicted · none · ref 12 · internal anchor
TreeMem assigns credit to agents in multi-agent memory systems by expanding outputs into a tree and using Monte Carlo averaging of final rewards to optimize each agent's policy.
Learning When to Remember: Risk-Sensitive Contextual Bandits for Abstention-Aware Memory Retrieval in LLM-Based Coding Agents cs.CL · 2026-04-30 · unverdicted · none · ref 6 · internal anchor
RSCB-MC is a risk-sensitive contextual bandit memory controller for LLM coding agents that chooses safe actions including abstention, achieving 60.5% proxy success with 0% false positives and low latency in 200-case validation.
From Soliloquy to Agora: Memory-Enhanced LLM Agents with Decentralized Debate for Optimization Modeling math.OC · 2026-04-28 · unverdicted · none · ref 18 · internal anchor
Agora-Opt uses decentralized debate among LLM agent teams plus a read-write memory bank to produce more accurate optimization models from text than prior LLM methods.
Stateless Decision Memory for Enterprise AI Agents cs.AI · 2026-04-22 · unverdicted · none · ref 6 · internal anchor
Deterministic Projection Memory (DPM) delivers stateless, deterministic decision memory for enterprise AI agents that matches or exceeds summarization-based approaches at tight memory budgets while improving speed, determinism, and auditability.
MemSearch-o1: Empowering Large Language Models with Reasoning-Aligned Memory Growth in Agentic Search cs.IR · 2026-04-19 · unverdicted · none · ref 53 · internal anchor
MemSearch-o1 mitigates memory dilution in agentic LLM search through reasoning-aligned token-level memory growth, retracing with a contribution function, and path reorganization, improving reasoning activation on benchmarks.
Decocted Experience Improves Test-Time Inference in LLM Agents cs.AI · 2026-04-06 · unverdicted · none · ref 2 · internal anchor
Decocted experience—extracting and organizing the essence from accumulated interactions—enables more effective context construction that improves test-time inference in LLM agents on math, web, and software tasks.
PersonaVLM: Long-Term Personalized Multimodal LLMs cs.CL · 2026-03-20 · unverdicted · none · ref 40 · internal anchor
PersonaVLM adds memory extraction, multi-turn retrieval-based reasoning, and personality inference to multimodal LLMs, yielding 22.4% gains on a new long-term personalization benchmark and outperforming GPT-4o.
Joint Optimization of Multi-agent Memory System cs.MA · 2026-03-13 · unverdicted · none · ref 29 · internal anchor
CoMAM jointly optimizes agents in multi-agent LLM memory systems via end-to-end RL and adaptive credit assignment to improve collaboration and performance.
Security Considerations for Multi-agent Systems cs.CR · 2026-03-09 · unverdicted · none · ref 221 · internal anchor
No existing AI security framework covers a majority of the 193 identified multi-agent system threats in any category, with OWASP Agentic Security Initiative achieving the highest overall coverage at 65.3%.
HyMem: Hybrid Memory Architecture with Dynamic Retrieval Scheduling cs.AI · 2026-02-15 · unverdicted · none · ref 15 · internal anchor
HyMem introduces dual-granular memory storage with a lightweight summary module for fast responses and selective activation of a deep LLM module for complex queries, outperforming full-context baselines by 92.6% lower computational cost on LOCOMO and LongMemEval benchmarks.
Holos: A Web-Scale LLM-Based Multi-Agent System for the Agentic Web cs.AI · 2026-01-18 · unverdicted · none · ref 95 · internal anchor
Holos is a five-layer LLM-based multi-agent system architecture using the Nuwa engine for agent generation, a market-driven Orchestrator for coordination, and an endogenous value cycle for incentive-compatible persistence in the Agentic Web.
Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory cs.CL · 2025-11-25 · unverdicted · none · ref 49 · internal anchor
Evo-Memory is a new streaming benchmark and evaluation framework for self-evolving memory in LLM agents, unifying over ten memory modules and introducing the ReMem pipeline for continual improvement on multi-turn and reasoning datasets.
MemSlides: A Hierarchical Memory Driven Agent Framework for Personalized Slide Generation with Multi-turn Local Revision cs.CL · 2026-06-15 · unverdicted · none · ref 48 · internal anchor
MemSlides introduces a three-part memory hierarchy (user profile, working, tool) with scoped local revision for multi-turn personalized slide generation.
ActiveMem: Distributed Active Memory for Long-Horizon LLM Reasoning cs.AI · 2026-06-09 · unverdicted · none · ref 15 · internal anchor
ActiveMem proposes a heterogeneous distributed memory framework for LLM agents that separates planning from active memory management, reporting SOTA accuracy with lower overhead on BrowseComp-Plus and GAIA.
ConMem: Structured Memory-Guided Adaptation in Training-Free Multi-Agent Systems cs.AI · 2026-06-07 · unverdicted · none · ref 16 · internal anchor
ConMem distills agent trajectories into structured memory cards organized in a relation-aware graph to enable training-free, relation-coordinated adaptation in LLM-based multi-agent systems.
VikingMem: A Memory Base Management System for Stateful LLM-based Applications cs.AI · 2026-05-28 · unverdicted · none · ref 64 · internal anchor
VikingMem implements the Memory Base paradigm via event-centric extraction and entity updates on VikingDB with temporal compression, claiming up to 30% better retrieval effectiveness on long-term memory benchmarks.
Is Agent Memory a Database? Rethinking Data Foundations for Long-Term AI Agent Memory cs.AI · 2026-05-25 · unverdicted · none · ref 41 · internal anchor
Proposes Governed Evolving Memory (GEM) as a state-trajectory workload for long-term AI agent memory using four operators and six correctness conditions that record-level systems cannot satisfy.
Code as Agent Harness cs.CL · 2026-05-18 · accept · none · ref 195 · internal anchor
A survey that organizes existing work on LLM-based agents around code as the central harness, structured in three layers of interfaces, mechanisms, and multi-agent scaling, with applications across domains and listed open challenges.
PyraVid: Hierarchical Multimodal Memory for Long-Horizon Video Reasoning cs.MA · 2026-05-16 · unverdicted · none · ref 42 · internal anchor
PyraVid is a hierarchical multimodal memory system that structures long videos into pyramids to improve long-horizon reasoning and evidence aggregation.
MemEye: A Visual-Centric Evaluation Framework for Multimodal Agent Memory cs.CV · 2026-05-14 · unverdicted · none · ref 42 · internal anchor
MemEye benchmark evaluates multimodal memory on visual granularity and evidence synthesis, finding that 13 methods across 4 VLMs struggle with fine details and temporal state changes.
CogniFold: Always-On Proactive Memory via Cognitive Folding cs.AI · 2026-05-13 · unverdicted · none · ref 63 · 2 links · internal anchor
CogniFold extends Complementary Learning Systems theory to three layers with a prefrontal intent layer and uses graph self-organization to build proactive agent memory from continuous event streams.
Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering cs.SE · 2026-04-09 · accept · none · ref 151 · internal anchor
LLM agent progress depends on externalizing cognitive functions into memory, skills, protocols, and harness engineering that coordinates them reliably.
What Deserves Memory: Adaptive Memory Distillation for LLM Agents cs.AI · 2025-08-05 · unverdicted · none · ref 30 · internal anchor
NEMORI is an adaptive memory distillation framework for LLM agents that transforms raw interactions into narratives and extracts insights via prediction error to decide what deserves retention.
MemOS: A Memory OS for AI System cs.CL · 2025-07-04 · unverdicted · none · ref 106 · internal anchor
MemOS introduces a unified memory management framework for LLMs using MemCubes to handle and evolve different memory types for improved controllability and evolvability.

MIRIX: Multi-Agent Memory System for LLM-Based Agents

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer