RealMem: Benchmarking LLMs in Real-World Memory-Driven Interaction

RealMem: Benchmarking LLMs in Real-World Memory-Driven Interaction · 2026 · arXiv 2601.06966

11 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

GateMem: Benchmarking Memory Governance in Multi-Principal Shared-Memory Agents

cs.LG · 2026-06-17 · unverdicted · novelty 7.0

GateMem benchmark shows no existing memory method for LLM agents achieves strong utility, access control, and reliable forgetting simultaneously in multi-principal shared settings.

MemTrace: Probing What Final Accuracy Misses in Long-Term Memory

cs.AI · 2026-06-15 · unverdicted · novelty 7.0

MemTrace shows that evidence utilization, not retrieval, is the dominant failure mode in LLM long-term memory systems across tested configurations.

PersonaTree: Structured Lifecycle Memory for Person Understanding in LLM Agents

cs.CL · 2026-06-03 · unverdicted · novelty 7.0

PersonaTree is a new hierarchical memory framework for persistent LLM agents that structures evidence into persona claims via support paths and outperforms baselines on six person-understanding benchmarks.

HEART-Bench: Do LLM Agents Exhibit Human-like Psychology?

cs.CL · 2026-05-28 · unverdicted · novelty 7.0

HEART-Bench evaluates LLM agents on psychological consistency using 11 Big-Five-grounded characters with 1,000 episodic memories each and 64 DIAMONDS-based decision scenarios, yielding 673 validated MCQs.

From Experience to Skill: Multi-Agent Generative Engine Optimization via Reusable Strategy Learning

cs.AI · 2026-04-21 · unverdicted · novelty 7.0

MAGEO is a multi-agent system that distills validated editing patterns into reusable optimization skills for generative engines, outperforming heuristic baselines on visibility and fidelity via a new benchmark and evaluation protocol.

PERMA: Benchmarking Personalized Memory Agents via Event-Driven Preference and Realistic Task Environments

cs.AI · 2026-03-24 · unverdicted · novelty 7.0

PERMA is a new benchmark using temporally ordered events, text variability, and linguistic alignment to evaluate LLM memory agents on persona consistency beyond simple retrieval.

WorldLines: Benchmarking and Modeling Long-Horizon Stateful Embodied Agents

cs.AI · 2026-06-17 · unverdicted · novelty 6.0

The paper creates the WorldLines benchmark for long-horizon embodied household tasks and proposes ObsMem as an observer-grounded memory architecture that maintains visibility-aware state trails.

HMARS: A Hierarchical Multi-Agent Memory System for Long-Context Reasoning

cs.IR · 2026-06-03 · unverdicted · novelty 6.0

HMARS introduces a hierarchical multi-agent memory system that outperforms standard retrieval and other baselines on long-document and multi-turn reasoning tasks through improved evidence coverage.

Opal: Private Memory for Personal AI

cs.CR · 2026-04-02 · unverdicted · novelty 6.0

Opal enables private long-term memory for personal AI by decoupling reasoning to a trusted enclave with a lightweight knowledge graph and piggybacking reindexing on ORAM accesses.

Temporal Order Matters for Agentic Memory: Segment Trees for Long-Horizon Agents

cs.CL · 2026-06-03 · unverdicted · novelty 5.0

SegTreeMem organizes agent conversation history as a temporally ordered segment tree and shows improved answer quality on long-horizon benchmarks when chronological order is preserved during insertion and retrieval.

MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning

cs.AI · 2026-01-29 · unverdicted · novelty 5.0

MemOCR renders structured memory as images with adaptive visual density to improve long-horizon reasoning under tight context budgets.

citing papers explorer

GateMem: Benchmarking Memory Governance in Multi-Principal Shared-Memory Agents cs.LG · 2026-06-17 · unverdicted · none · ref 3
GateMem benchmark shows no existing memory method for LLM agents achieves strong utility, access control, and reliable forgetting simultaneously in multi-principal shared settings.
