MedMemoryBench: Benchmarking Agent Memory in Personalized Healthcare

2026 · cs.AI · arXiv 2605.11814

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

The large-scale deployment of personalized healthcare agents demands memory mechanisms that are exceptionally precise, safe, and capable of long-term clinical tracking. However, existing benchmarks primarily focus on daily open-domain conversations, failing to capture the high-stakes complexity of real-world medical applications. Motivated by the stringent production requirements of an industry-leading health management agent serving tens of millions of active users, we introduce MedMemoryBench. We develop a human-agent collaborative pipeline to synthesize highly realistic, long-horizon medical trajectories based on clinically grounded, synthetic patient archetypes. This process yields a massive, expertly validated dataset comprising approximately 2,000 sessions and 16,000 interaction turns. Crucially, MedMemoryBench departs from traditional static evaluations by pioneering an "evaluate-while-constructing" streaming assessment protocol, which precisely mirrors dynamic memory accumulation in production environments. Furthermore, we formalize and systematically investigate the critical phenomenon of memory saturation, where sustained information influx actively degrades retrieval and reasoning robustness. Comprehensive benchmarking reveals severe bottlenecks in mainstream architectures, particularly concerning complex medical reasoning and noise resilience. By exposing these fundamental flaws, MedMemoryBench establishes a vital foundation for developing robust, production-ready medical agents.
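
The "evaluate-while-constructing" idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the benchmark's data format, memory interface, or scoring, so everything named here (`Probe`, `KeywordMemory`, `stream_evaluate`, and the substring-match scoring) is a hypothetical stand-in, not the authors' implementation. The only point shown is the ordering: each question is asked right after a given session has been written to memory, so the agent can only draw on what it has accumulated up to that moment.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    """Hypothetical test item: asked right after session `after_session` is ingested."""
    after_session: int
    question: str
    expected: str


class KeywordMemory:
    """Toy memory (an assumption for illustration): stores raw turns and
    returns the stored turn sharing the most words with the question."""

    def __init__(self):
        self.turns = []

    def write(self, turn):
        self.turns.append(turn)

    def answer(self, question):
        q_words = set(question.lower().split())
        # Pick the stored turn with the largest word overlap; "" if memory is empty.
        return max(
            self.turns,
            key=lambda t: len(q_words & set(t.lower().split())),
            default="",
        )


def stream_evaluate(sessions, probes, memory):
    """Interleave memory construction and evaluation, session by session."""
    results = []
    for idx, session in enumerate(sessions):
        for turn in session:
            memory.write(turn)  # memory grows as the trajectory unfolds
        for probe in probes:
            if probe.after_session == idx:
                reply = memory.answer(probe.question)
                # Simplistic scoring assumption: expected string appears in the reply.
                results.append(probe.expected.lower() in reply.lower())
    return results


sessions = [
    ["patient reports allergy to penicillin"],
    ["patient started metformin 500 mg daily"],
]
probes = [
    Probe(0, "what allergy does the patient have", "penicillin"),
    Probe(1, "which drug did the patient started", "metformin"),
    Probe(0, "which drug metformin dose", "metformin"),  # asked before the fact exists
]
print(stream_evaluate(sessions, probes, KeywordMemory()))  # [True, False, True]
```

The third probe fails because it is asked after session 0, before the metformin turn has been stored. A static evaluation that loads the whole history first would not show this.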

representative citing papers

AgensFlow: A Coordination-Policy Substrate for Multi-Agent Systems

cs.MA · 2026-05-26 · unverdicted · novelty 6.0

AgensFlow learns coordination policies from task trajectories and outperforms fixed pipelines on distributed-systems incident and security-advisory tasks.

citing papers explorer

Showing 1 of 1 citing paper.

AgensFlow: A Coordination-Policy Substrate for Multi-Agent Systems

cs.MA · 2026-05-26 · unverdicted · none · ref 18 · internal anchor

AgensFlow learns coordination policies from task trajectories and outperforms fixed pipelines on distributed-systems incident and security-advisory tasks.
