MedMemoryBench supplies a 2,000-session synthetic medical trajectory dataset and an evaluate-while-constructing streaming protocol to expose memory saturation and reasoning failures in current agent architectures for personalized healthcare.
hub Canonical reference
Memos: An operating system for memory-augmented generation (mag) in large language models
Canonical reference. 100% of citing Pith papers cite this work as background.
hub tools
citation-role summary
citation-polarity summary
years
2026 11roles
background 5polarities
background 5representative citing papers
A new evaluation protocol shows agent memory reliability degrades variably with added irrelevant sessions depending on agent, memory interface, and scale.
SeqMem-Eval reveals that high final accuracy in sequential LLM memory tasks often coexists with substantial forgetting and negative transfer, exposing stability-adaptability trade-offs hidden by standard aggregate metrics.
SAGE is a self-evolving agentic graph-memory engine that dynamically constructs and refines structured memory graphs via writer-reader feedback, yielding performance gains on multi-hop QA, open-domain retrieval, and long-term agent benchmarks.
The paper defines and evaluates Trojan Hippo attacks on LLM agent memory, showing 85-100% success in data exfiltration across backends and reduced rates with defenses at varying utility costs.
Opal enables private long-term memory for personal AI by decoupling reasoning to a trusted enclave with a lightweight knowledge graph and piggybacking reindexing on ORAM accesses.
Memory-R2 proposes LoGo-GRPO to fix unfair trajectory comparisons in RL training of memory-augmented LLM agents by combining global end-to-end rewards with local rerollouts from identical memory states.
PyraVid is a hierarchical multimodal memory system that structures long videos into pyramids to improve long-horizon reasoning and evidence aggregation.
OIDA is a proposed framework that represents organizational knowledge as epistemic Knowledge Objects with class-specific importance decay and signed contradictions, plus a QUESTION mechanism that surfaces modeled ignorance via inverse decay.
MemMachine stores entire conversational episodes and applies contextualized retrieval plus adaptive query routing to achieve 0.9169 accuracy on LoCoMo and 93 percent on LongMemEvalS while using 80 percent fewer tokens than Mem0.
The survey structures agentic reasoning for LLMs into foundational, self-evolving, and collective multi-agent layers while distinguishing in-context orchestration from post-training optimization and reviewing applications across domains.
citing papers explorer
-
MedMemoryBench: Benchmarking Agent Memory in Personalized Healthcare
MedMemoryBench supplies a 2,000-session synthetic medical trajectory dataset and an evaluate-while-constructing streaming protocol to expose memory saturation and reasoning failures in current agent architectures for personalized healthcare.
-
When Stored Evidence Stops Being Usable: Scale-Conditioned Evaluation of Agent Memory
A new evaluation protocol shows agent memory reliability degrades variably with added irrelevant sessions depending on agent, memory interface, and scale.
-
Is One Score Enough? Rethinking the Evaluation of Sequentially Evolving LLM Memory
SeqMem-Eval reveals that high final accuracy in sequential LLM memory tasks often coexists with substantial forgetting and negative transfer, exposing stability-adaptability trade-offs hidden by standard aggregate metrics.
-
SAGE: A Self-Evolving Agentic Graph-Memory Engine for Structure-Aware Associative Memory
SAGE is a self-evolving agentic graph-memory engine that dynamically constructs and refines structured memory graphs via writer-reader feedback, yielding performance gains on multi-hop QA, open-domain retrieval, and long-term agent benchmarks.
-
Trojan Hippo: Weaponizing Agent Memory for Data Exfiltration
The paper defines and evaluates Trojan Hippo attacks on LLM agent memory, showing 85-100% success in data exfiltration across backends and reduced rates with defenses at varying utility costs.
-
Opal: Private Memory for Personal AI
Opal enables private long-term memory for personal AI by decoupling reasoning to a trusted enclave with a lightweight knowledge graph and piggybacking reindexing on ORAM accesses.
-
Memory-R2: Fair Credit Assignment for Long-Horizon Memory-Augmented LLM Agents
Memory-R2 proposes LoGo-GRPO to fix unfair trajectory comparisons in RL training of memory-augmented LLM agents by combining global end-to-end rewards with local rerollouts from identical memory states.
-
PyraVid: Hierarchical Multimodal Memory for Long-Horizon Video Reasoning
PyraVid is a hierarchical multimodal memory system that structures long videos into pyramids to improve long-horizon reasoning and evidence aggregation.
-
Retrieval Is Not Enough: Why Organizational AI Needs Epistemic Infrastructure
OIDA is a proposed framework that represents organizational knowledge as epistemic Knowledge Objects with class-specific importance decay and signed contradictions, plus a QUESTION mechanism that surfaces modeled ignorance via inverse decay.
-
MemMachine: A Ground-Truth-Preserving Memory System for Personalized AI Agents
MemMachine stores entire conversational episodes and applies contextualized retrieval plus adaptive query routing to achieve 0.9169 accuracy on LoCoMo and 93 percent on LongMemEvalS while using 80 percent fewer tokens than Mem0.
-
Agentic Reasoning for Large Language Models
The survey structures agentic reasoning for LLMs into foundational, self-evolving, and collective multi-agent layers while distinguishing in-context orchestration from post-training optimization and reviewing applications across domains.