MEMTRACK: Evaluating long-term memory and state tracking in multi-platform dynamic agent environments. arXiv preprint arXiv:2510.01353, 2025

Darshan Deshpande, Varun Gangal, Hersh Mehta, Anand Kannappan, Rebecca Qian, Peng Wang · 2025 · arXiv 2510.01353

6 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2 · method 1

citation-polarity summary

background 2 · use method 1

representative citing papers

MemTrace: Probing What Final Accuracy Misses in Long-Term Memory

cs.AI · 2026-06-15 · unverdicted · novelty 7.0

MemTrace shows that evidence utilization, not retrieval, is the dominant failure mode in LLM long-term memory systems across tested configurations.

Simulating Human Memory with Language Models

cs.CL · 2026-05-25 · unverdicted · novelty 6.0

Language models outperform humans on memory tasks from psychology experiments, but prompting and compaction can make their forgetting more human-like, yielding better user simulators.

Evaluating Memory Condensation Strategies for Coding Agents in Data-Driven Scientific Discovery

cs.LG · 2026-05-13 · unverdicted · novelty 6.0

Empirical evaluation of eight memory condensation strategies on 480 DiscoveryBench tasks finds no significant impact on hypothesis quality but domain-dependent differences in token efficiency.

AgentCollabBench: Diagnosing When Good Agents Make Bad Collaborators

cs.CL · 2026-05-09 · unverdicted · novelty 6.0

AgentCollabBench shows that multi-agent reliability is limited by communication topology, with converging-DAG nodes causing synthesis bottlenecks that discard constraints and explain 7-40% of information loss variance.

From Unstructured Recall to Schema-Grounded Memory: Reliable AI Memory via Iterative, Schema-Aware Extraction

cs.AI · 2026-04-30 · unverdicted · novelty 6.0

Schema-aware iterative extraction turns AI memory into a verified system of record, reaching 90-97% accuracy on extraction and end-to-end memory benchmarks where retrieval baselines score 80-87%.

FileGram: Grounding Agent Personalization in File-System Behavioral Traces

cs.CV · 2026-04-06 · unverdicted · novelty 6.0

FileGram grounds AI agent personalization in file-system behavioral traces via a data simulation engine, a diagnostic benchmark, and a bottom-up memory architecture.

citing papers explorer

MemTrace: Probing What Final Accuracy Misses in Long-Term Memory · cs.AI · 2026-06-15 · unverdicted · none · ref 24
Simulating Human Memory with Language Models · cs.CL · 2026-05-25 · unverdicted · none · ref 45
Evaluating Memory Condensation Strategies for Coding Agents in Data-Driven Scientific Discovery · cs.LG · 2026-05-13 · unverdicted · none · ref 3
AgentCollabBench: Diagnosing When Good Agents Make Bad Collaborators · cs.CL · 2026-05-09 · unverdicted · none · ref 9
From Unstructured Recall to Schema-Grounded Memory: Reliable AI Memory via Iterative, Schema-Aware Extraction · cs.AI · 2026-04-30 · unverdicted · none · ref 9
FileGram: Grounding Agent Personalization in File-System Behavioral Traces · cs.CV · 2026-04-06 · unverdicted · none · ref 4
