MedMemoryBench supplies a 2,000-session synthetic medical trajectory dataset and an evaluate-while-constructing streaming protocol to expose memory saturation and reasoning failures in current agent architectures for personalized healthcare.
The probabilistic relevance framework: BM25 and beyond.Foundations and Trends in Information Retrieval, 3(4):333–389
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
baseline 1
citation-polarity summary
years
2026 2verdicts
CONDITIONAL 2roles
baseline 1polarities
baseline 1representative citing papers
CUE-R uses REMOVE, REPLACE, and DUPLICATE interventions on individual evidence items to quantify their per-item utility in RAG along correctness, grounding faithfulness, and confidence axes.
citing papers explorer
-
MedMemoryBench: Benchmarking Agent Memory in Personalized Healthcare
MedMemoryBench supplies a 2,000-session synthetic medical trajectory dataset and an evaluate-while-constructing streaming protocol to expose memory saturation and reasoning failures in current agent architectures for personalized healthcare.
-
CUE-R: Beyond the Final Answer in Retrieval-Augmented Generation
CUE-R uses REMOVE, REPLACE, and DUPLICATE interventions on individual evidence items to quantify their per-item utility in RAG along correctness, grounding faithfulness, and confidence axes.