GroupMemBench is a new benchmark exposing that LLM agent memory systems fail on group conversation properties like speaker-grounded tracking and audience-adapted responses, with top systems at 46% accuracy.
Accessed 2026- 05-27
8 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 8roles
background 1polarities
background 1representative citing papers
Introduces PrecisionMemBench, an 89-case benchmark for isolated retrieval precision in LLM memory systems, and Tenure, a structured store achieving 1.0 mean precision on all cases.
Memory for long-horizon agents should preserve distinctions that affect decisions under a fixed budget, not descriptive features, yielding an exact forgetting boundary and a new online learner DeMem with regret guarantees.
Eywa introduces a provenance-grounded memory system for persistent AI agents featuring evidence-first storage, typed validation, and deterministic multi-route retrieval, reporting 90.19% accuracy on LoCoMo and 88.2% on LongMemEval-S.
ActiveGraph inverts traditional agent frameworks by treating the append-only event log as the primary source of truth, from which the reactive graph is projected, yielding deterministic replay, forking, and lineage tracking.
True Memory is a verbatim-event retrieval pipeline running on a single SQLite file that reaches 93% accuracy on LoCoMo multi-session questions, outperforming Mem0, Supermemory, Zep, and matching or exceeding EverMemOS and Hindsight on other long-context benchmarks.
Intern-Atlas constructs a methodological evolution graph with 9.4 million edges from 1.03 million AI papers to capture how methods emerge, adapt, and transition, enabling better idea evaluation and generation for AI-driven research.
Memanto delivers 89.8% and 87.1% accuracy on LongMemEval and LoCoMo benchmarks using typed semantic memory and information-theoretic retrieval, outperforming hybrid graph and vector systems with a single query and zero ingestion cost.
citing papers explorer
-
GroupMemBench: Benchmarking LLM Agent Memory in Multi-Party Conversations
GroupMemBench is a new benchmark exposing that LLM agent memory systems fail on group conversation properties like speaker-grounded tracking and audience-adapted responses, with top systems at 46% accuracy.
-
Structured Belief State and the First Precision-Aware Benchmark for LLM Memory Retrieval
Introduces PrecisionMemBench, an 89-case benchmark for isolated retrieval precision in LLM memory systems, and Tenure, a structured store achieving 1.0 mean precision on all cases.
-
Remember the Decision, Not the Description: A Rate-Distortion Framework for Agent Memory
Memory for long-horizon agents should preserve distinctions that affect decisions under a fixed budget, not descriptive features, yielding an exact forgetting boundary and a new online learner DeMem with regret guarantees.
-
Eywa: Provenance-Grounded Long-Term Memory for AI Agents
Eywa introduces a provenance-grounded memory system for persistent AI agents featuring evidence-first storage, typed validation, and deterministic multi-route retrieval, reporting 90.19% accuracy on LoCoMo and 88.2% on LongMemEval-S.
-
The Log is the Agent: Event-Sourced Reactive Graphs for Auditable, Forkable Agentic Systems
ActiveGraph inverts traditional agent frameworks by treating the append-only event log as the primary source of truth, from which the reactive graph is projected, yielding deterministic replay, forking, and lineage tracking.
-
Intern-Atlas: A Methodological Evolution Graph as Research Infrastructure for AI Scientists
Intern-Atlas constructs a methodological evolution graph with 9.4 million edges from 1.03 million AI papers to capture how methods emerge, adapt, and transition, enabling better idea evaluation and generation for AI-driven research.
-
Memanto: Typed Semantic Memory with Information-Theoretic Retrieval for Long-Horizon Agents
Memanto delivers 89.8% and 87.1% accuracy on LongMemEval and LoCoMo benchmarks using typed semantic memory and information-theoretic retrieval, outperforming hybrid graph and vector systems with a single query and zero ingestion cost.