Ross Mitchell

Mohammad Tavakoli, Alireza Salemi, Carrie Ye, Mohamed Abdalla, Hamed Zamani, J Ross Mitchell · 2026 · arXiv 2510.27246

7 Pith papers cite this work. Polarity classification is still indexing.

7 Pith papers citing it

representative citing papers

LongMemEval-V2: Evaluating Long-Term Agent Memory Toward Experienced Colleagues

cs.CL · 2026-05-12 · unverdicted · novelty 7.0

LongMemEval-V2 is a new benchmark where AgentRunbook-C reaches 72.5% accuracy on long-term agent memory tasks, beating RAG baselines at 48.5% and basic coding agents at 69.3%.

Cross-Session Threats in AI Agents: Benchmark, Evaluation, and Algorithms

cs.CR · 2026-04-22 · conditional · novelty 7.0

Introduces CSTM-Bench with 26 cross-session attack taxonomies, demonstrates recall loss in session-bound and full-log detectors, and proposes a bounded-memory coreset reader with the CSTM metric balancing detection and serving stability.

The Missing Knowledge Layer in Cognitive Architectures for AI Agents

cs.AI · 2026-04-13 · conditional · novelty 7.0

Cognitive architectures for AI agents require a distinct Knowledge layer with indefinite supersession persistence, separate from Memory decay, Wisdom evidence-gating, and Intelligence ephemerality.

PERMA: Benchmarking Personalized Memory Agents via Event-Driven Preference and Realistic Task Environments

cs.AI · 2026-03-24 · unverdicted · novelty 7.0

PERMA is a new benchmark using temporally ordered events, text variability, and linguistic alignment to evaluate LLM memory agents on persona consistency beyond simple retrieval.

Storage Is Not Memory: A Retrieval-Centered Architecture for Agent Recall

cs.CL · 2026-05-06 · conditional · novelty 6.0

True Memory is a verbatim-event retrieval pipeline running on a single SQLite file that reaches 93% accuracy on LoCoMo multi-session questions, outperforming Mem0, Supermemory, Zep, and matching or exceeding EverMemOS and Hindsight on other long-context benchmarks.

From Storage to Experience: A Survey on the Evolution of LLM Agent Memory Mechanisms

cs.AI · 2026-05-07 · unverdicted · novelty 4.0

LLM agent memory is organized into Storage (preserving trajectories), Reflection (refining them), and Experience (abstracting into reusable knowledge) stages driven by needs for long-range consistency, dynamic adaptation, and continual learning.

ATANT v1.1: Positioning Continuity Evaluation Against Memory, Long-Context, and Agentic-Memory Benchmarks

cs.AI · 2026-04-13 · unverdicted · novelty 4.0

Existing memory benchmarks cover at most two of the seven continuity properties from ATANT v1.0, with a median of one and none covering more than two.

citing papers explorer

Showing 7 of 7 citing papers.

LongMemEval-V2: Evaluating Long-Term Agent Memory Toward Experienced Colleagues cs.CL · 2026-05-12 · unverdicted · none · ref 91
LongMemEval-V2 is a new benchmark where AgentRunbook-C reaches 72.5% accuracy on long-term agent memory tasks, beating RAG baselines at 48.5% and basic coding agents at 69.3%.
Cross-Session Threats in AI Agents: Benchmark, Evaluation, and Algorithms cs.CR · 2026-04-22 · conditional · none · ref 3
Introduces CSTM-Bench with 26 cross-session attack taxonomies, demonstrates recall loss in session-bound and full-log detectors, and proposes a bounded-memory coreset reader with the CSTM metric balancing detection and serving stability.
The Missing Knowledge Layer in Cognitive Architectures for AI Agents cs.AI · 2026-04-13 · conditional · none · ref 35
Cognitive architectures for AI agents require a distinct Knowledge layer with indefinite supersession persistence, separate from Memory decay, Wisdom evidence-gating, and Intelligence ephemerality.
PERMA: Benchmarking Personalized Memory Agents via Event-Driven Preference and Realistic Task Environments cs.AI · 2026-03-24 · unverdicted · none · ref 52
PERMA is a new benchmark using temporally ordered events, text variability, and linguistic alignment to evaluate LLM memory agents on persona consistency beyond simple retrieval.
Storage Is Not Memory: A Retrieval-Centered Architecture for Agent Recall cs.CL · 2026-05-06 · conditional · none · ref 14
True Memory is a verbatim-event retrieval pipeline running on a single SQLite file that reaches 93% accuracy on LoCoMo multi-session questions, outperforming Mem0, Supermemory, Zep, and matching or exceeding EverMemOS and Hindsight on other long-context benchmarks.
From Storage to Experience: A Survey on the Evolution of LLM Agent Memory Mechanisms cs.AI · 2026-05-07 · unverdicted · none · ref 18
LLM agent memory is organized into Storage (preserving trajectories), Reflection (refining them), and Experience (abstracting into reusable knowledge) stages driven by needs for long-range consistency, dynamic adaptation, and continual learning.
ATANT v1.1: Positioning Continuity Evaluation Against Memory, Long-Context, and Agentic-Memory Benchmarks cs.AI · 2026-04-13 · unverdicted · none · ref 9
Existing memory benchmarks cover at most two of the seven continuity properties from ATANT v1.0, with a median of one and none covering more than two.

Ross Mitchell

fields

years

verdicts

representative citing papers

citing papers explorer