MedMemoryBench supplies a 2,000-session synthetic medical trajectory dataset and an evaluate-while-constructing streaming protocol to expose memory saturation and reasoning failures in current agent architectures for personalized healthcare.
hub Canonical reference
Halumem: Evaluating hallucinations in memory systems of agents
Canonical reference. 83% of citing Pith papers cite this work as background.
hub tools
citation-role summary
citation-polarity summary
years
2026 11representative citing papers
AlpsBench supplies 2500 real-dialogue sequences with verified memories to benchmark LLM extraction, updating, retrieval, and utilization of personalized information.
MemConflict provides a benchmark for testing LLM long-term memory systems under dynamic, static, and conditional conflicts involving temporal validity, factual correctness, and contextual applicability.
SensorPersona uses LLMs for hierarchical reasoning on longitudinal mobile sensor streams to continually extract stable personas, showing up to 31.4% higher recall and 85.7% win rate over baselines on a 20-user dataset.
Empirical evaluation of eight memory condensation strategies on 480 DiscoveryBench tasks finds no significant impact on hypothesis quality but domain-dependent differences in token efficiency.
SAGE is a self-evolving agentic graph-memory engine that dynamically constructs and refines structured memory graphs via writer-reader feedback, yielding performance gains on multi-hop QA, open-domain retrieval, and long-term agent benchmarks.
MemPrivacy uses edge-side privacy span detection and semantic placeholders to enable cloud memory management for LLM agents while limiting utility loss to 1.6% and outperforming masking baselines.
GAM decouples event-level memory encoding from topic-level consolidation in LLM agents using hierarchical graphs to reduce interference and improve long-term coherence and retrieval.
MemReader uses distilled passive and GRPO-trained active extractors to selectively write low-noise long-term memories, outperforming passive baselines on knowledge updating, temporal reasoning, and hallucination tasks.
Opal enables private long-term memory for personal AI by decoupling reasoning to a trusted enclave with a lightweight knowledge graph and piggybacking reindexing on ORAM accesses.
MemReranker applies multi-stage distillation to Qwen3-Reranker to produce reasoning-aware rerankers that outperform baselines on memory tasks with temporal and causal constraints.
citing papers explorer
-
MedMemoryBench: Benchmarking Agent Memory in Personalized Healthcare
MedMemoryBench supplies a 2,000-session synthetic medical trajectory dataset and an evaluate-while-constructing streaming protocol to expose memory saturation and reasoning failures in current agent architectures for personalized healthcare.
-
AlpsBench: An LLM Personalization Benchmark for Real-Dialogue Memorization and Preference Alignment
AlpsBench supplies 2500 real-dialogue sequences with verified memories to benchmark LLM extraction, updating, retrieval, and utilization of personalized information.
-
MemConflict: Evaluating Long-Term Memory Systems Under Memory Conflicts
MemConflict provides a benchmark for testing LLM long-term memory systems under dynamic, static, and conditional conflicts involving temporal validity, factual correctness, and contextual applicability.
-
SensorPersona: An LLM-Empowered System for Continual Persona Extraction from Longitudinal Mobile Sensor Streams
SensorPersona uses LLMs for hierarchical reasoning on longitudinal mobile sensor streams to continually extract stable personas, showing up to 31.4% higher recall and 85.7% win rate over baselines on a 20-user dataset.
-
Evaluating Memory Condensation Strategies for Coding Agents in Data-Driven Scientific Discovery
Empirical evaluation of eight memory condensation strategies on 480 DiscoveryBench tasks finds no significant impact on hypothesis quality but domain-dependent differences in token efficiency.
-
SAGE: A Self-Evolving Agentic Graph-Memory Engine for Structure-Aware Associative Memory
SAGE is a self-evolving agentic graph-memory engine that dynamically constructs and refines structured memory graphs via writer-reader feedback, yielding performance gains on multi-hop QA, open-domain retrieval, and long-term agent benchmarks.
-
MemPrivacy: Privacy-Preserving Personalized Memory Management for Edge-Cloud Agents
MemPrivacy uses edge-side privacy span detection and semantic placeholders to enable cloud memory management for LLM agents while limiting utility loss to 1.6% and outperforming masking baselines.
-
GAM: Hierarchical Graph-based Agentic Memory for LLM Agents
GAM decouples event-level memory encoding from topic-level consolidation in LLM agents using hierarchical graphs to reduce interference and improve long-term coherence and retrieval.
-
MemReader: From Passive to Active Extraction for Long-Term Agent Memory
MemReader uses distilled passive and GRPO-trained active extractors to selectively write low-noise long-term memories, outperforming passive baselines on knowledge updating, temporal reasoning, and hallucination tasks.
-
Opal: Private Memory for Personal AI
Opal enables private long-term memory for personal AI by decoupling reasoning to a trusted enclave with a lightweight knowledge graph and piggybacking reindexing on ORAM accesses.
-
MemReranker: Reasoning-Aware Reranking for Agent Memory Retrieval
MemReranker applies multi-stage distillation to Qwen3-Reranker to produce reasoning-aware rerankers that outperform baselines on memory tasks with temporal and causal constraints.