Mnemis: Dual-Route Retrieval on Hierarchical Graphs for Long-Term LLM Memory
3 Pith papers cite this work.
Abstract
AI memory, specifically how models organize and retrieve historical messages, has become increasingly valuable to Large Language Models (LLMs), yet existing methods (RAG and Graph-RAG) primarily retrieve memory through similarity-based mechanisms. While efficient, such System-1-style retrieval struggles with scenarios that require global reasoning or comprehensive coverage of all relevant information. In this work, we propose Mnemis, a novel memory framework that integrates System-1 similarity search with a complementary System-2 mechanism, termed Global Selection. Mnemis organizes memory into a base graph for similarity retrieval and a hierarchical graph that enables top-down, deliberate traversal over semantic hierarchies. By combining the complementary strengths of both retrieval routes, Mnemis retrieves memory items that are both semantically and structurally relevant. Mnemis achieves state-of-the-art performance among all compared methods on long-term memory benchmarks, scoring 93.9 on LoCoMo and 91.6 on LongMemEval-S with GPT-4.1-mini.
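The dual-route idea in the abstract can be illustrated with a toy sketch. It assumes each memory item already has an embedding and sits under a small topic hierarchy. All names (`Node`, `similarity_route`, `global_selection`, `dual_route`, the example data) are hypothetical. The `select` callback is a simple keyword stand-in for whatever deliberate selection Mnemis actually performs, so this is not the paper's Global Selection, only an illustration of combining a similarity route with a top-down hierarchical route.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical hierarchy node; memory item ids are attached to leaves only."""
    name: str
    children: list["Node"] = field(default_factory=list)
    items: list[str] = field(default_factory=list)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def similarity_route(query_vec, embeddings, k):
    """System-1-style route: rank every item by cosine similarity, keep the top k."""
    ranked = sorted(embeddings, key=lambda i: cosine(query_vec, embeddings[i]), reverse=True)
    return ranked[:k]


def global_selection(root, select):
    """System-2-style route (stand-in): expand the hierarchy top-down,
    descending only into children that `select` accepts, and collect leaf items."""
    selected, frontier = [], [root]
    while frontier:
        node = frontier.pop()
        if not node.children:
            selected.extend(node.items)
            continue
        frontier.extend(c for c in node.children if select(c))
    return selected


def dual_route(query_vec, embeddings, root, select, k=2):
    """Union of both routes, de-duplicated while preserving first-seen order."""
    merged = []
    for item in similarity_route(query_vec, embeddings, k) + global_selection(root, select):
        if item not in merged:
            merged.append(item)
    return merged


# Hypothetical example data: m3 and m4 are far from the query in embedding space
# but live under the "travel" branch that the deliberate route chooses to explore.
EMBEDDINGS = {"m1": [1.0, 0.0], "m2": [0.9, 0.1], "m3": [0.0, 1.0], "m4": [0.1, 0.9]}
ROOT = Node("root", children=[
    Node("travel", children=[Node("trips-2023", items=["m3", "m4"])]),
    Node("work", items=["m1"]),
])
QUERY = [1.0, 0.0]


def select_by_topic(node):
    return node.name in {"travel", "trips-2023"}


print(dual_route(QUERY, EMBEDDINGS, ROOT, select_by_topic))  # ['m1', 'm2', 'm3', 'm4']
```

In this toy run the similarity route alone returns only `m1` and `m2`; the hierarchical route adds `m3` and `m4` because the traversal follows the `travel` branch regardless of embedding distance, which is the kind of coverage gap the abstract attributes to similarity-only retrieval.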
Citing papers by year: 2026 (3). Citation polarity verdicts: unverdicted (3).
Citing papers
- Structured Belief State and the First Precision-Aware Benchmark for LLM Memory Retrieval
Introduces PrecisionMemBench, an 89-case benchmark for isolated retrieval precision in LLM memory systems, and Tenure, a structured store achieving 1.0 mean precision on all cases.
- Remember the Decision, Not the Description: A Rate-Distortion Framework for Agent Memory
Memory for long-horizon agents should preserve distinctions that affect decisions under a fixed budget, not descriptive features, yielding an exact forgetting boundary and a new online learner DeMem with regret guarantees.
- MemCoT: Test-Time Scaling through Memory-Driven Chain-of-Thought
MemCoT transforms long-context LLM reasoning into an iterative stateful search using multi-view memory for evidence localization and dual short-term memory for guiding decisions, achieving SOTA on LoCoMo and LongMemEval-S benchmarks.