Ross Mitchell

Mohammad Tavakoli, Alireza Salemi, Carrie Ye, Mohamed Abdalla, Hamed Zamani, J Ross Mitchell · 2025 · arXiv 2510.27246

9 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Beyond Static Dialogues: Benchmarking Realistic, Heterogeneous, and Evolving Long-Term Memory

cs.CL · 2026-05-29 · unverdicted · novelty 7.0

RHELM is a benchmark for LLM long-term memory with dynamic profiles, heterogeneous sources, and 27 memory characteristics that reveals weaknesses in existing models for multi-source aggregation and contextual reasoning.

ElasticMem: Latent Memory as a Learnable Resource for LLM Agents

cs.CL · 2026-05-29 · unverdicted · novelty 7.0

ElasticMem enables LLM agents to learn adaptive latent memory retrieval and elastic budget allocation, improving QA accuracy by 24-26% and ALFWorld success by 27-66% over baselines with lower token cost.

LongMemEval-V2: Evaluating Long-Term Agent Memory Toward Experienced Colleagues

cs.CL · 2026-05-12 · unverdicted · novelty 7.0

LongMemEval-V2 is a new benchmark where AgentRunbook-C reaches 72.5% accuracy on long-term agent memory tasks, beating RAG baselines at 48.5% and basic coding agents at 69.3%.

Cross-Session Threats in AI Agents: Benchmark, Evaluation, and Algorithms

cs.CR · 2026-04-22 · conditional · novelty 7.0

Introduces CSTM-Bench with 26 cross-session attack taxonomies, demonstrates recall loss in session-bound and full-log detectors, and proposes a bounded-memory coreset reader with the CSTM metric balancing detection and serving stability.

The Missing Knowledge Layer in Cognitive Architectures for AI Agents

cs.AI · 2026-04-13 · conditional · novelty 7.0

Cognitive architectures for AI agents require a distinct Knowledge layer with indefinite supersession persistence, separate from Memory decay, Wisdom evidence-gating, and Intelligence ephemerality.

PERMA: Benchmarking Personalized Memory Agents via Event-Driven Preference and Realistic Task Environments

cs.AI · 2026-03-24 · unverdicted · novelty 7.0

PERMA is a new benchmark using temporally ordered events, text variability, and linguistic alignment to evaluate LLM memory agents on persona consistency beyond simple retrieval.

Storage Is Not Memory: A Retrieval-Centered Architecture for Agent Recall

cs.CL · 2026-05-06 · conditional · novelty 6.0

True Memory is a verbatim-event retrieval pipeline running on a single SQLite file that reaches 93% accuracy on LoCoMo multi-session questions, outperforming Mem0, Supermemory, and Zep, and matching or exceeding EverMemOS and Hindsight on other long-context benchmarks.

From Storage to Experience: A Survey on the Evolution of LLM Agent Memory Mechanisms

cs.AI · 2026-05-07 · unverdicted · novelty 4.0

LLM agent memory is organized into Storage (preserving trajectories), Reflection (refining them), and Experience (abstracting into reusable knowledge) stages driven by needs for long-range consistency, dynamic adaptation, and continual learning.

ATANT v1.1: Positioning Continuity Evaluation Against Memory, Long-Context, and Agentic-Memory Benchmarks

cs.AI · 2026-04-13 · unverdicted · novelty 4.0

Existing memory benchmarks each cover at most two of the seven continuity properties from ATANT v1.0, with a median of one.

citing papers explorer

Showing 3 of 3 citing papers after filters.

Cross-Session Threats in AI Agents: Benchmark, Evaluation, and Algorithms

cs.CR · 2026-04-22 · conditional · none · ref 3

Introduces CSTM-Bench with 26 cross-session attack taxonomies, demonstrates recall loss in session-bound and full-log detectors, and proposes a bounded-memory coreset reader with the CSTM metric balancing detection and serving stability.

The Missing Knowledge Layer in Cognitive Architectures for AI Agents

cs.AI · 2026-04-13 · conditional · none · ref 35

Cognitive architectures for AI agents require a distinct Knowledge layer with indefinite supersession persistence, separate from Memory decay, Wisdom evidence-gating, and Intelligence ephemerality.

Storage Is Not Memory: A Retrieval-Centered Architecture for Agent Recall

cs.CL · 2026-05-06 · conditional · none · ref 14

True Memory is a verbatim-event retrieval pipeline running on a single SQLite file that reaches 93% accuracy on LoCoMo multi-session questions, outperforming Mem0, Supermemory, and Zep, and matching or exceeding EverMemOS and Hindsight on other long-context benchmarks.
